MTVRP: Multi-task VRP environment¶
This environment can handle any of the following variants:
VRP Variant | Capacity (C) | Open Route (O) | Backhaul (B) | Duration Limit (L) | Time Window (TW) |
---|---|---|---|---|---|
CVRP | ✔ | | | | |
OVRP | ✔ | ✔ | | | |
VRPB | ✔ | | ✔ | | |
VRPL | ✔ | | | ✔ | |
VRPTW | ✔ | | | | ✔ |
OVRPTW | ✔ | ✔ | | | ✔ |
OVRPB | ✔ | ✔ | ✔ | | |
OVRPL | ✔ | ✔ | | ✔ | |
VRPBL | ✔ | | ✔ | ✔ | |
VRPBTW | ✔ | | ✔ | | ✔ |
VRPLTW | ✔ | | | ✔ | ✔ |
OVRPBL | ✔ | ✔ | ✔ | ✔ | |
OVRPBTW | ✔ | ✔ | ✔ | | ✔ |
OVRPLTW | ✔ | ✔ | | ✔ | ✔ |
VRPBLTW | ✔ | | ✔ | ✔ | ✔ |
OVRPBLTW | ✔ | ✔ | ✔ | ✔ | ✔ |
The environment is fully batched, meaning that different variants can be mixed within the same batch!
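As a small illustration of this, here is a hypothetical sketch (variable names `env_demo` and `td_demo` are made up) using only the generator and environment classes introduced in the cells that follow: it builds one batch with the "all" preset and prints the variant of every instance, so several different names appear in the same batch.

# Minimal sketch (hypothetical variable names): one batch, several variants
from rl4co.envs.routing.mtvrp.env import MTVRPEnv
from rl4co.envs.routing.mtvrp.generator import MTVRPGenerator

env_demo = MTVRPEnv(MTVRPGenerator(num_loc=50, variant_preset="all"), check_solution=False)
td_demo = env_demo.generator(4)  # a single batch of 4 instances
print(env_demo.get_variant_names(td_demo))  # e.g. ['VRPLTW', 'OVRP', 'VRPB', 'OVRPTW']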
In [1]:
%load_ext autoreload
%autoreload 2
from rl4co.envs.routing.mtvrp.env import MTVRPEnv
from rl4co.envs.routing.mtvrp.generator import MTVRPGenerator
Let's now generate some variants! By default, we can generate all of them by setting the `variant_preset` variable to "all".
In [2]:
# All variants: generate a mixed distribution over all variant combinations
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env = MTVRPEnv(generator, check_solution=False)
td_data = env.generator(8)
env.get_variant_names(td_data)
Out[2]:
['VRPLTW', 'OVRP', 'VRPLTW', 'OVRPLTW', 'OVRPL', 'VRPB', 'OVRPTW', 'OVRPB']
In [3]:
# Here is the list of presets and their probabilities of being generated (fully customizable)
env.print_presets()
all: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat_otw: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5, 'OTW': 0.5}
cvrp: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
ovrp: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
vrpb: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
vrpl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrptw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrptw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrpb: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
ovrpl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrpbl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
vrpbtw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
vrpltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
ovrpbl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
ovrpbtw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
ovrpltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
vrpbltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
ovrpbltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
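Each value is the probability that the corresponding feature (O, TW, L, B) is switched on for a given instance, so a preset can be read roughly as independent per-instance draws. A purely illustrative sketch (the real sampling happens inside `MTVRPGenerator`; `probs` and `flags` are hypothetical names):

# Illustration only: interpreting a preset as per-instance Bernoulli draws
import torch

probs = {"O": 0.5, "TW": 0.5, "L": 0.5, "B": 0.5}  # the "all" preset
flags = {feat: torch.rand(8) < p for feat, p in probs.items()}  # 8 instances
print(flags)  # each instance gets its own feature subset, hence its own variant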
We can change the preset to generate a specific variant, for instance VRPB:
In [4]:
# Change generator
generator = MTVRPGenerator(num_loc=50, variant_preset="vrpb")
env.generator = generator
td_data = env.generator(8)
env.get_variant_names(td_data)
vrpb selected. Will not use feature combination!
Out[4]:
['VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB']
Greedy rollout and plot¶
In [5]:
import torch

from rl4co.utils.ops import gather_by_index


# Simple heuristic (nearest neighbor + capacity check)
def greedy_policy(td):
    """Select the closest available action."""
    available_actions = td["action_mask"]
    # distances from the current node to all locations
    curr_node = td["current_node"]
    loc_cur = gather_by_index(td["locs"], curr_node)
    distances_next = torch.cdist(loc_cur[:, None, :], td["locs"], p=2.0).squeeze(1)
    distances_next[~available_actions.bool()] = float("inf")
    # do not select the depot while some linehaul capacity is left
    distances_next[:, 0] = float("inf") * (
        td["used_capacity_linehaul"] < td["vehicle_capacity"]
    ).float().squeeze(-1)
    # # if the sum of available actions is 0, select the depot
    # distances_next[available_actions.sum(-1) == 0, 0] = 0
    action = torch.argmin(distances_next, dim=-1)
    td.set("action", action)
    return td


def rollout(env, td, policy=greedy_policy, max_steps: int = None):
    """Helper function to roll out a policy. Currently, TorchRL's `env.rollout()` does not
    allow stepping environments past `done`; we need this helper because instances in a
    batch may complete at different steps.
    """
    max_steps = float("inf") if max_steps is None else max_steps
    actions = []
    steps = 0
    while not td["done"].all():
        td = policy(td)
        actions.append(td["action"])
        td = env.step(td)["next"]
        steps += 1
        if steps > max_steps:
            print("Max steps reached")
            break
    return torch.stack(actions, dim=1)
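As a quick sanity check of these helpers, a small sketch (hypothetical variable names `td_small` and `acts`; it uses whichever generator is currently set on `env`) rolls out the greedy heuristic on a tiny batch:

# Sketch: roll out the greedy heuristic on a small batch
td_small = env.reset(env.generator(2))
acts = rollout(env, td_small.clone(), greedy_policy, max_steps=1000)
print(acts.shape)                      # [batch_size, num_decoded_steps]
print(env.get_reward(td_small, acts))  # reward = negative tour cost per instance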
In [6]:
# NOTE: if we don't select ovrpbltw, the code below does not work; there is still a
# minor bug in either the masking or the variant subselection
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env.generator = generator
td_data = env.generator(3)
variant_names = env.get_variant_names(td_data)

td = env.reset(td_data)
actions = rollout(env, td.clone(), greedy_policy)
rewards = env.get_reward(td, actions)

for idx in [0, 1, 2]:
    env.render(td[idx], actions[idx])
    print("Cost: ", -rewards[idx].item())
    print("Problem: ", variant_names[idx])
Cost:  17.503389358520508
Problem:  OVRPLTW
Cost:  18.86773109436035
Problem:  CVRP
Cost:  15.39835262298584
Problem:  VRPB
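Since `variant_names` and `rewards` are aligned, a small sketch (hypothetical helper variable `costs_by_variant`) can group the greedy costs by problem type:

# Sketch: mean greedy cost per variant, using the variables from the cell above
from collections import defaultdict

costs_by_variant = defaultdict(list)
for name, r in zip(variant_names, rewards):
    costs_by_variant[name].append(-r.item())
print({name: sum(c) / len(c) for name, c in costs_by_variant.items()})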
Train MVMoE on Multiple Problems¶
In [7]:
from rl4co.utils.trainer import RL4COTrainer
from rl4co.models.zoo import MVMoE_POMO
device_id = 0
device = torch.device(f"cuda:{device_id}" if torch.cuda.is_available() else "cpu")
generator = MTVRPGenerator(num_loc=50, variant_preset="single_feat")
env = MTVRPEnv(generator, check_solution=False)
single_feat selected. Will not use feature combination!
In [8]:
moe_kwargs = {"encoder": {"hidden_act": "ReLU", "num_experts": 4, "k": 2, "noisy_gating": True},
"decoder": {"light_version": False, "num_experts": 4, "k": 2, "noisy_gating": True}}
model = MVMoE_POMO(
env,
moe_kwargs=moe_kwargs,
batch_size=128,
train_data_size=10000, # each epoch,
val_batch_size=100,
val_data_size=1000,
optimizer="Adam",
optimizer_kwargs={"lr": 1e-4, "weight_decay": 1e-6},
lr_scheduler="MultiStepLR",
lr_scheduler_kwargs={"milestones": [451, ], "gamma": 0.1},
)
trainer = RL4COTrainer(
max_epochs=3,
accelerator="gpu",
devices=[device_id],
logger=None
)
trainer.fit(model)
moe_kwargs = {"encoder": {"hidden_act": "ReLU", "num_experts": 4, "k": 2, "noisy_gating": True},
"decoder": {"light_version": False, "num_experts": 4, "k": 2, "noisy_gating": True}}
model = MVMoE_POMO(
env,
moe_kwargs=moe_kwargs,
batch_size=128,
train_data_size=10000, # each epoch,
val_batch_size=100,
val_data_size=1000,
optimizer="Adam",
optimizer_kwargs={"lr": 1e-4, "weight_decay": 1e-6},
lr_scheduler="MultiStepLR",
lr_scheduler_kwargs={"milestones": [451, ], "gamma": 0.1},
)
trainer = RL4COTrainer(
max_epochs=3,
accelerator="gpu",
devices=[device_id],
logger=None
)
trainer.fit(model)
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
val_file not set. Generating dataset instead
test_file not set. Generating dataset instead
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

  | Name     | Type                 | Params
--------------------------------------------------
0 | env      | MTVRPEnv             | 0
1 | policy   | AttentionModelPolicy | 3.7 M
2 | baseline | SharedBaseline       | 0
--------------------------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.868    Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=3` reached.
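After training, the weights can be persisted with standard Lightning checkpointing; a minimal sketch (the file name is arbitrary and hypothetical):

# Sketch: save and reload the trained model (standard Lightning checkpointing)
trainer.save_checkpoint("mvmoe_pomo_mtvrp.ckpt")  # hypothetical file name
model_reloaded = MVMoE_POMO.load_from_checkpoint("mvmoe_pomo_mtvrp.ckpt")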
In [34]:
# Greedy rollouts over trained model (same states as previous plot)
policy = model.policy.to(device)
out = policy(td.to(device).clone(), env, phase="test", decode_type="greedy")
actions_mvmoe = out["actions"].cpu().detach()
rewards_mvmoe = out["reward"].cpu().detach()

for idx in [0, 1, 2]:
    env.render(td[idx], actions_mvmoe[idx])
    print("Cost: ", -rewards_mvmoe[idx].item())
    print("Problem: ", variant_names[idx])