MTVRP: Multi-task VRP environment¶
This environment can handle any of the following variants:
| VRP Variant | Capacity (C) | Open Route (O) | Backhaul (B) | Duration Limit (L) | Time Window (TW) |
|---|---|---|---|---|---|
| CVRP | ✔ | | | | |
| OVRP | ✔ | ✔ | | | |
| VRPB | ✔ | | ✔ | | |
| VRPL | ✔ | | | ✔ | |
| VRPTW | ✔ | | | | ✔ |
| OVRPTW | ✔ | ✔ | | | ✔ |
| OVRPB | ✔ | ✔ | ✔ | | |
| OVRPL | ✔ | ✔ | | ✔ | |
| VRPBL | ✔ | | ✔ | ✔ | |
| VRPBTW | ✔ | | ✔ | | ✔ |
| VRPLTW | ✔ | | | ✔ | ✔ |
| OVRPBL | ✔ | ✔ | ✔ | ✔ | |
| OVRPBTW | ✔ | ✔ | ✔ | | ✔ |
| OVRPLTW | ✔ | ✔ | | ✔ | ✔ |
| VRPBLTW | ✔ | | ✔ | ✔ | ✔ |
| OVRPBLTW | ✔ | ✔ | ✔ | ✔ | ✔ |
The environment is fully batched, meaning that different variants can even be mixed within the same batch!
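To make the naming scheme concrete, here is a small illustrative helper (hypothetical, not part of RL4CO; the environment itself provides `env.get_variant_names` for this) that maps the four attribute flags to the variant names in the table above:

def variant_name(open_route: bool, backhaul: bool, duration_limit: bool, time_window: bool) -> str:
    """Concatenate the active attribute flags into the variant name (see table above)."""
    suffix = ("B" if backhaul else "") + ("L" if duration_limit else "") + ("TW" if time_window else "")
    if open_route:
        return "OVRP" + suffix
    return "CVRP" if suffix == "" else "VRP" + suffix

assert variant_name(False, False, False, False) == "CVRP"
assert variant_name(True, False, False, True) == "OVRPTW"
assert variant_name(True, True, True, True) == "OVRPBLTW"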
In [1]:
%load_ext autoreload
%autoreload 2
from rl4co.envs.routing.mtvrp.env import MTVRPEnv
from rl4co.envs.routing.mtvrp.generator import MTVRPGenerator
Let's now generate some variants! By default, we can generate a mix of all variants; this is controlled by the `variant_preset` argument.
In [2]:
# Single feat: generate a distribution of single-featured environments
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env = MTVRPEnv(generator, check_solution=False)
td_data = env.generator(8)
env.get_variant_names(td_data)
Out[2]:
['VRPLTW', 'OVRP', 'VRPLTW', 'OVRPLTW', 'OVRPL', 'VRPB', 'OVRPTW', 'OVRPB']
In [3]:
# Here is the list of presets and their probabilities of being generated (fully customizable)
env.print_presets()
all: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5}
single_feat_otw: {'O': 0.5, 'TW': 0.5, 'L': 0.5, 'B': 0.5, 'OTW': 0.5}
cvrp: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
ovrp: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 0.0}
vrpb: {'O': 0.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
vrpl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrptw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrptw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 0.0}
ovrpb: {'O': 1.0, 'TW': 0.0, 'L': 0.0, 'B': 1.0}
ovrpl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 0.0}
vrpbl: {'O': 0.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
vrpbtw: {'O': 0.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
vrpltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
ovrpbl: {'O': 1.0, 'TW': 0.0, 'L': 1.0, 'B': 1.0}
ovrpbtw: {'O': 1.0, 'TW': 1.0, 'L': 0.0, 'B': 1.0}
ovrpltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 0.0}
vrpbltw: {'O': 0.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
ovrpbltw: {'O': 1.0, 'TW': 1.0, 'L': 1.0, 'B': 1.0}
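The numbers above are per-attribute probabilities. A reasonable mental model (a sketch of the idea only, not the generator's actual implementation) is that each attribute is switched on independently for every instance in the batch, which is exactly why a single batch can mix variants:

import torch

# Illustrative only: sample attribute flags for 8 instances with the "all" preset.
# MTVRPGenerator handles this internally (and disables feature combination for
# single-variant presets such as "vrpb").
probs = {"O": 0.5, "TW": 0.5, "L": 0.5, "B": 0.5}
batch_size = 8
flags = {attr: torch.rand(batch_size) < p for attr, p in probs.items()}
print(flags)  # e.g. {'O': tensor([ True, False, ...]), ...}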
We can change the preset to generate a specific variant, for instance the VRPB:
In [4]:
# Change generator
generator = MTVRPGenerator(num_loc=50, variant_preset="vrpb")
env.generator = generator
td_data = env.generator(8)
env.get_variant_names(td_data)
vrpb selected. Will not use feature combination!
Out[4]:
['VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB', 'VRPB']
Greedy rollout and plot¶
In [5]:
import torch

from rl4co.utils.ops import gather_by_index


# Simple heuristic (nearest neighbor + capacity check)
def greedy_policy(td):
    """Select the closest available action"""
    available_actions = td["action_mask"]
    # distances from the current node to all nodes
    curr_node = td["current_node"]
    loc_cur = gather_by_index(td["locs"], curr_node)
    distances_next = torch.cdist(loc_cur[:, None, :], td["locs"], p=2.0).squeeze(1)
    distances_next[~available_actions.bool()] = float("inf")
    # do not select the depot while some linehaul capacity is left;
    # once capacity is exhausted, force a return to the depot
    has_capacity = (td["used_capacity_linehaul"] < td["vehicle_capacity"]).squeeze(-1)
    distances_next[has_capacity, 0] = float("inf")
    distances_next[~has_capacity, 0] = -float("inf")
    action = torch.argmin(distances_next, dim=-1)
    td.set("action", action)
    return td


def rollout(env, td, policy=greedy_policy, max_steps: int = None):
    """Helper function to roll out a policy step by step. We need this because
    TorchRL's `env.rollout()` cannot keep stepping a batch in which instances
    finish (are `done`) at different steps.
    """
    max_steps = float("inf") if max_steps is None else max_steps
    actions = []
    steps = 0
    while not td["done"].all():
        td = policy(td)
        actions.append(td["action"])
        td = env.step(td)["next"]
        steps += 1
        if steps > max_steps:
            print("Max steps reached")
            break
    return torch.stack(actions, dim=1)
In [6]:
# NOTE: there is still a minor bug in either masking or variant subselection:
# if we do not select a preset that includes all features (ovrpbltw / "all"),
# the rollout below may not work
generator = MTVRPGenerator(num_loc=50, variant_preset="all")
env.generator = generator

td_data = env.generator(3)
variant_names = env.get_variant_names(td_data)
td = env.reset(td_data)
actions = rollout(env, td.clone(), greedy_policy)
rewards = env.get_reward(td, actions)

for idx in [0, 1, 2]:
    env.render(td[idx], actions[idx])
    print("Cost: ", -rewards[idx].item())
    print("Problem: ", variant_names[idx])
Cost:  17.503389358520508
Problem:  OVRPLTW
Cost:  18.86773109436035
Problem:  CVRP
Cost:  15.39835262298584
Problem:  VRPB
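As a sanity check on what the reported cost means, here is a minimal sketch (an approximation assuming closed routes with the depot at index 0; the environment's `get_reward` additionally handles open routes, where the final leg back to the depot is skipped) of recovering the tour length from an action sequence:

import torch

def tour_length(locs, actions):
    """Total Euclidean length of the route encoded by `actions` for one instance.

    locs: (num_nodes, 2) coordinates with the depot at index 0.
    actions: (num_steps,) visited node indices (0 = return to the depot);
    trailing depot visits from padding contribute zero length.
    """
    depot = torch.zeros(1, dtype=actions.dtype)
    route = torch.cat([depot, actions, depot])  # start and end at the depot
    coords = locs[route]
    return (coords[1:] - coords[:-1]).norm(dim=-1).sum()

# For a closed-route instance, tour_length(td["locs"][0], actions[0]) should be
# close to the reported cost -rewards[0]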
Train MVMoE on Multiple Problems¶
In [7]:
from rl4co.utils.trainer import RL4COTrainer
from rl4co.models.zoo import MVMoE_POMO
device_id = 0
device = torch.device(f"cuda:{device_id}" if torch.cuda.is_available() else "cpu")
generator = MTVRPGenerator(num_loc=50, variant_preset="single_feat")
env = MTVRPEnv(generator, check_solution=False)
single_feat selected. Will not use feature combination!
In [8]:
moe_kwargs = {
    "encoder": {"hidden_act": "ReLU", "num_experts": 4, "k": 2, "noisy_gating": True},
    "decoder": {"light_version": False, "num_experts": 4, "k": 2, "noisy_gating": True},
}

model = MVMoE_POMO(
    env,
    moe_kwargs=moe_kwargs,
    batch_size=128,
    train_data_size=10000,  # per epoch
    val_batch_size=100,
    val_data_size=1000,
    optimizer="Adam",
    optimizer_kwargs={"lr": 1e-4, "weight_decay": 1e-6},
    lr_scheduler="MultiStepLR",
    lr_scheduler_kwargs={"milestones": [451], "gamma": 0.1},
)

trainer = RL4COTrainer(
    max_epochs=3,
    accelerator="gpu",
    devices=[device_id],
    logger=None,
)

trainer.fit(model)
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
val_file not set. Generating dataset instead
test_file not set. Generating dataset instead

  | Name     | Type                 | Params
--------------------------------------------------
0 | env      | MTVRPEnv             | 0
1 | policy   | AttentionModelPolicy | 3.7 M
2 | baseline | SharedBaseline       | 0
--------------------------------------------------
3.7 M     Trainable params
0         Non-trainable params
3.7 M     Total params
14.868    Total estimated model params size (MB)
`Trainer.fit` stopped: `max_epochs=3` reached.
In [34]:
# Greedy rollouts over trained model (same states as previous plot)
policy = model.policy.to(device)
out = policy(td.to(device).clone(), env, phase="test", decode_type="greedy")
actions_mvmoe = out["actions"].cpu().detach()
rewards_mvmoe = out["reward"].cpu().detach()

for idx in [0, 1, 2]:
    env.render(td[idx], actions_mvmoe[idx])
    print("Cost: ", -rewards_mvmoe[idx].item())
    print("Problem: ", variant_names[idx])
Cost:  17.188127517700195
Problem:  OVRPLTW
Cost:  14.578388214111328
Problem:  CVRP
Cost:  12.24499797821045
Problem:  VRPB
Getting gaps to classical solvers¶
We additionally offer an optional `solve` API to obtain solutions from classical solvers. We can use this to measure the gap of our policies to strong (near-optimal) reference solutions.
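Concretely, the gap reported below is the usual relative gap to the reference cost (here the PyVRP/HGS solution), averaged over the batch: gap (%) = 100 × mean((cost − cost_ref) / cost_ref), which is exactly what the `calculate_gap` helper further down computes.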
In [31]:
# PyVRP - HGS
pyvrp_actions, pyvrp_costs = env.solve(td, max_runtime=5, num_procs=10, solver="pyvrp")
rewards_pyvrp = env.get_reward(td, pyvrp_actions)
In [36]:
def calculate_gap(cost, bks):
    """Mean percentage gap of `cost` with respect to the best known solution `bks`"""
    gaps = (cost - bks) / bks
    return gaps.mean() * 100


# Greedy nearest-neighbor heuristic (labelled NI in the prints below)
actions = rollout(env, td.clone(), greedy_policy)
rewards_ni = env.get_reward(td, actions)

print(rewards_mvmoe, rewards_ni, rewards_pyvrp)
print(f"Gap to HGS (NI): {calculate_gap(-rewards_ni, -rewards_pyvrp):.2f}%")
print(f"Gap to HGS (MVMoE): {calculate_gap(-rewards_mvmoe, -rewards_pyvrp):.2f}%")
tensor([-17.1881, -14.5784, -12.2450]) tensor([-17.5034, -18.8677, -15.3984]) tensor([-12.6954, -11.9107,  -9.9261])
Gap to HGS (NI): 50.47%
Gap to HGS (MVMoE): 27.05%
With only a few short training epochs, the learned MVMoE policy already beats the greedy nearest-neighbor heuristic!