MTVRP: Multi-task VRP environment¶
This environment can handle any of the following variants:
| VRP Variant | Capacity (C) | Open Route (O) | Backhaul (B) | Duration Limit (L) | Time Window (TW) |
|---|---|---|---|---|---|
| CVRP | ✔ | | | | |
| OVRP | ✔ | ✔ | | | |
| VRPB | ✔ | | ✔ | | |
| VRPL | ✔ | | | ✔ | |
| VRPTW | ✔ | | | | ✔ |
| OVRPTW | ✔ | ✔ | | | ✔ |
| OVRPB | ✔ | ✔ | ✔ | | |
| OVRPL | ✔ | ✔ | | ✔ | |
| VRPBL | ✔ | | ✔ | ✔ | |
| VRPBTW | ✔ | | ✔ | | ✔ |
| VRPLTW | ✔ | | | ✔ | ✔ |
| OVRPBL | ✔ | ✔ | ✔ | ✔ | |
| OVRPBTW | ✔ | ✔ | ✔ | | ✔ |
| OVRPLTW | ✔ | ✔ | | ✔ | ✔ |
| VRPBLTW | ✔ | | ✔ | ✔ | ✔ |
| OVRPBLTW | ✔ | ✔ | ✔ | ✔ | ✔ |
The environment is fully batched, meaning that different variants can even be mixed within the same batch!
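To make the naming scheme concrete, here is a small illustrative helper (hypothetical, not part of RL4CO; the environment itself provides `env.get_variant_names` for this) that maps the four attribute flags to the variant names in the table above:

def variant_name(open_route: bool, backhaul: bool, duration_limit: bool, time_window: bool) -> str:
    """Concatenate the active attribute flags into the variant name (see table above)."""
    suffix = ("B" if backhaul else "") + ("L" if duration_limit else "") + ("TW" if time_window else "")
    if open_route:
        return "OVRP" + suffix
    return "CVRP" if suffix == "" else "VRP" + suffix

assert variant_name(False, False, False, False) == "CVRP"
assert variant_name(True, False, False, True) == "OVRPTW"
assert variant_name(True, True, True, True) == "OVRPBLTW"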
In [1]:
%load_ext autoreload
%autoreload 2
from rl4co.envs.routing.mtvrp.env import MTVRPEnv
from rl4co.envs.routing.mtvrp.generator import MTVRPGenerator
Let's now generate some variants! By default, we can generate a mix of all variants; this is controlled by the `variant_preset` argument.
In [2]:
# Single feat: generate a distribution of single-featured environments
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env = MTVRPEnv(generator, check_solution=False)
td_data = env.generator(8)
env.get_variant_names(td_data)
Out[2]:
['VRPLTW', 'OVRP', 'VRPLTW', 'OVRPLTW', 'OVRPL', 'VRPB', 'OVRPTW', 'OVRPB']
In [3]:
# Here is the list of presets and their probabilities of being generated (fully customizable)
env.print_presets()
all: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat_otw: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5, 'OTW': 0.5}
cvrp: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
ovrp: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
vrpb: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
vrpl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrptw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrptw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrpb: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
ovrpl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrpbl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
vrpbtw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
vrpltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
ovrpbl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
ovrpbtw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
ovrpltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
vrpbltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
ovrpbltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
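The numbers above are per-attribute probabilities. A reasonable mental model (a sketch of the idea only, not the generator's actual implementation) is that each attribute is switched on independently for every instance in the batch, which is exactly why a single batch can mix variants:

import torch

# Illustrative only: sample attribute flags for 8 instances with the "all" preset.
# MTVRPGenerator handles this internally (and disables feature combination for
# single-variant presets such as "vrpb").
probs = {"O": 0.5, "TW": 0.5, "L": 0.5, "B": 0.5}
batch_size = 8
flags = {attr: torch.rand(batch_size) < p for attr, p in probs.items()}
print(flags)  # e.g. {'O': tensor([ True, False, ...]), ...}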
We can change the preset to generate a specific variant, for instance the VRPB:
In [4]:
# Change generator
generator = MTVRPGenerator(num_loc=50, variant_preset="vrpb")
env.generator = generator
td_data = env.generator(8)
env.get_variant_names(td_data)
vrpb selected. Will not use feature combination!
Out[4]:
['VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB']
Greedy rollout and plot¶
In [5]:
import torch

from rl4co.utils.ops import gather_by_index


# Simple heuristic (nearest neighbor + capacity check)
def greedy_policy(td):
    """Select the closest available action"""
    available_actions = td["action_mask"]
    # distances from the current node to all nodes
    curr_node = td["current_node"]
    loc_cur = gather_by_index(td["locs"], curr_node)
    distances_next = torch.cdist(loc_cur[:, None, :], td["locs"], p=2.0).squeeze(1)
    distances_next[~available_actions.bool()] = float("inf")
    # do not select the depot while some linehaul capacity is left;
    # once capacity is exhausted, force a return to the depot
    has_capacity = (td["used_capacity_linehaul"] < td["vehicle_capacity"]).squeeze(-1)
    distances_next[has_capacity, 0] = float("inf")
    distances_next[~has_capacity, 0] = -float("inf")
    action = torch.argmin(distances_next, dim=-1)
    td.set("action", action)
    return td


def rollout(env, td, policy=greedy_policy, max_steps: int = None):
    """Helper function to roll out a policy step by step. We need this because
    TorchRL's `env.rollout()` cannot keep stepping a batch in which instances
    finish (are `done`) at different steps.
    """
    max_steps = float("inf") if max_steps is None else max_steps
    actions = []
    steps = 0
    while not td["done"].all():
        td = policy(td)
        actions.append(td["action"])
        td = env.step(td)["next"]
        steps += 1
        if steps > max_steps:
            print("Max steps reached")
            break
    return torch.stack(actions, dim=1)
In [6]:
# NOTE: there is still a minor bug in either masking or variant subselection:
# if we do not select a preset that includes all features (ovrpbltw / "all"),
# the rollout below may not work
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env.generator = generator

td_data = env.generator(3)
variant_names = env.get_variant_names(td_data)
td = env.reset(td_data)
actions = rollout(env, td.clone(), greedy_policy)
rewards = env.get_reward(td, actions)

for idx in [0, 1, 2]:
    env.render(td[idx], actions[idx])
    print("Cost: ", -rewards[idx].item())
    print("Problem: ", variant_names[idx])
Cost:  17.503389358520508
Problem:  OVRPLTW
Cost:  18.86773109436035
Problem:  CVRP
Cost:  15.39835262298584
Problem:  VRPB
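As a sanity check on what the reported cost means, here is a minimal sketch (an approximation assuming closed routes with the depot at index 0; the environment's `get_reward` additionally handles open routes, where the final leg back to the depot is skipped) of recovering the tour length from an action sequence:

import torch

def tour_length(locs, actions):
    """Total Euclidean length of the route encoded by `actions` for one instance.

    locs: (num_nodes, 2) coordinates with the depot at index 0.
    actions: (num_steps,) visited node indices (0 = return to the depot);
    trailing depot visits from padding contribute zero length.
    """
    depot = torch.zeros(1, dtype=actions.dtype)
    route = torch.cat([depot, actions, depot])  # start and end at the depot
    coords = locs[route]
    return (coords[1:] - coords[:-1]).norm(dim=-1).sum()

# For a closed-route instance, tour_length(td["locs"][0], actions[0]) should be
# close to the reported cost -rewards[0]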
Train MVMoE on Multiple Problems¶
In [7]:
from rl4co.utils.trainer import RL4COTrainer
from rl4co.models.zoo import MVMoE_POMO
device_id = 0
device = torch.device(f"cuda:{device_id}" if torch.cuda.is_available() else "cpu")
generator = MTVRPGenerator(num_loc=50, variant_preset="single_feat")
env = MTVRPEnv(generator, check_solution=False)
single_feat selected. Will not use feature combination!
In [8]:
moe_kwargs = {
    "encoder": {"hidden_act": "ReLU", "num_experts": 4, "k": 2, "noisy_gating": True},
    "decoder": {"light_version": False, "num_experts": 4, "k": 2, "noisy_gating": True},
}

model = MVMoE_POMO(
    env,
    moe_kwargs=moe_kwargs,
    batch_size=128,
    train_data_size=10000,  # per epoch
    val_batch_size=100,
    val_data_size=1000,
    optimizer="Adam",
    optimizer_kwargs={"lr": 1e-4, "weight_decay": 1e-6},
    lr_scheduler="MultiStepLR",
    lr_scheduler_kwargs={"milestones": [451], "gamma": 0.1},
)

trainer = RL4COTrainer(
    max_epochs=3,
    accelerator="gpu",
    devices=[device_id],
    logger=None,
)

trainer.fit(model)
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
val_file not set. Generating dataset instead
test_file not set. Generating dataset instead

  | Name     | Type                 | Params
--------------------------------------------------
0 | env      | MTVRPEnv             | 0
1 | policy   | AttentionModelPolicy | 3.7 M
2 | baseline | SharedBaseline       | 0
--------------------------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.868    Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=3` reached.
In [34]:
# Greedy rollouts over trained model (same states as previous plot)
policy = model.policy.to(device)
out = policy(td.to(device).clone(), env, phase="test", decode_type="greedy")
actions_mvmoe = out["actions"].cpu().detach()
rewards_mvmoe = out["reward"].cpu().detach()

for idx in [0, 1, 2]:
    env.render(td[idx], actions_mvmoe[idx])
    print("Cost: ", -rewards_mvmoe[idx].item())
    print("Problem: ", variant_names[idx])
Cost:  17.188127517700195
Problem:  OVRPLTW
Cost:  14.578388214111328
Problem:  CVRP
Cost:  12.24499797821045
Problem:  VRPB
Getting gaps to classical solvers¶
We additionally offer an optional `solve` API to obtain solutions from classical solvers. We can use this to measure the gap of our policies to strong (near-optimal) reference solutions.
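Concretely, the gap reported below is the usual relative gap to the reference cost (here the PyVRP/HGS solution), averaged over the batch: gap (%) = 100 × mean((cost − cost_ref) / cost_ref), which is exactly what the `calculate_gap` helper further down computes.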
In [31]:
# PyVRP - HGS
pyvrp_actions, pyvrp_costs = env.solve(td, max_runtime=5, num_procs=10, solver="pyvrp")
rewards_pyvrp = env.get_reward(td, pyvrp_actions)
In [36]:
def calculate_gap(cost, bks):
    """Mean percentage gap of `cost` with respect to the best known solution `bks`"""
    gaps = (cost - bks) / bks
    return gaps.mean() * 100


# Greedy nearest-neighbor heuristic (labelled NI in the prints below)
actions = rollout(env, td.clone(), greedy_policy)
rewards_ni = env.get_reward(td, actions)

print(rewards_mvmoe, rewards_ni, rewards_pyvrp)
print(f"Gap to HGS (NI): {calculate_gap(-rewards_ni, -rewards_pyvrp):.2f}%")
print(f"Gap to HGS (MVMoE): {calculate_gap(-rewards_mvmoe, -rewards_pyvrp):.2f}%")
tensor([-17.1881, -14.5784, -12.2450]) tensor([-17.5034, -18.8677, -15.3984]) tensor([-12.6954, -11.9107,  -9.9261])
Gap to HGS (NI): 50.47%
Gap to HGS (MVMoE): 27.05%
With only a few short training epochs, the learned MVMoE policy already beats the greedy nearest-neighbor heuristic!