Skip to content

Routing Problems

See also the Multi-Task VRP at the bottom of this page, that includes 16 variants!

Asymmetric Traveling Salesman Problem (ATSP)

ATSPEnv

ATSPEnv(
    generator: ATSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Asymmetric Traveling Salesman Problem (ATSP) environment At each step, the agent chooses a customer to visit. The reward is 0 unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.

Observations
  • distance matrix between customers
  • the current customer
  • the first customer (for calculating the reward)
  • the remaining unvisited customers
Constraints
  • the tour starts and ends at the same customer.
  • each customer must be visited exactly once.
Finish Condition
  • the agent has visited all customers.
Reward
  • (minus) the negative length of the path.

Parameters:

  • generator (ATSPGenerator, default: None ) –

    ATSPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Source code in rl4co/envs/routing/atsp/env.py
47
48
49
50
51
52
53
54
55
56
57
def __init__(
    self,
    generator: ATSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = ATSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

ATSPGenerator

ATSPGenerator(
    num_loc: int = 10,
    min_dist: float = 0.0,
    max_dist: float = 1.0,
    dist_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    tmat_class: bool = True,
    **kwargs
)

Bases: Generator

Data generator for the Asymmetric Travelling Salesman Problem (ATSP) Generate distance matrices inspired by the reference MatNet (Kwon et al., 2021) We satifsy the triangle inequality (TMAT class) in a batch

Parameters:

  • num_loc (int, default: 10 ) –

    number of locations (customers) in the TSP

  • min_dist (float, default: 0.0 ) –

    minimum value for the distance between nodes

  • max_dist (float, default: 1.0 ) –

    maximum value for the distance between nodes

  • dist_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the distance between nodes

  • tmat_class (bool, default: True ) –

    whether to generate a class of distance matrix

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer

Source code in rl4co/envs/routing/atsp/generator.py
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def __init__(
    self,
    num_loc: int = 10,
    min_dist: float = 0.0,
    max_dist: float = 1.0,
    dist_distribution: int | float | str | type | Callable = Uniform,
    tmat_class: bool = True,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_dist = min_dist
    self.max_dist = max_dist
    self.tmat_class = tmat_class

    # Distance distribution
    if kwargs.get("dist_sampler", None) is not None:
        self.dist_sampler = kwargs["dist_sampler"]
    else:
        self.dist_sampler = get_sampler("dist", dist_distribution, 0.0, 1.0, **kwargs)

Capacitated Vehicle Routing Problem (CVRP)

CVRPEnv

CVRPEnv(
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Capacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations
  • location of the depot.
  • locations and demand of each customer.
  • current location of the vehicle.
  • the remaining customer of the vehicle,
Constraints
  • the tour starts and ends at the depot.
  • each customer must be visited exactly once.
  • the vehicle cannot visit customers exceed the remaining capacity.
  • the vehicle can return to the depot to refill the capacity.
Finish Condition
  • the vehicle has visited all customers and returned to the depot.
Reward
  • (minus) the negative length of the path.

Parameters:

  • generator (CVRPGenerator, default: None ) –

    CVRPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Methods:

Source code in rl4co/envs/routing/cvrp/env.py
57
58
59
60
61
62
63
64
65
66
67
def __init__(
    self,
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = CVRPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

check_solution_validity staticmethod

check_solution_validity(td: TensorDict, actions: Tensor)

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded

Source code in rl4co/envs/routing/cvrp/env.py
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor):
    """Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded"""
    # Check if tour is valid, i.e. contain 0 to n-1
    batch_size, graph_size = td["demand"].size()
    sorted_pi = actions.data.sort(1)[0]

    # Sorting it should give all zeros at front and then 1...n
    assert (
        torch.arange(1, graph_size + 1, out=sorted_pi.data.new())
        .view(1, -1)
        .expand(batch_size, graph_size)
        == sorted_pi[:, -graph_size:]
    ).all() and (sorted_pi[:, :-graph_size] == 0).all(), "Invalid tour"

    # Visiting depot resets capacity so we add demand = -capacity (we make sure it does not become negative)
    demand_with_depot = torch.cat((-td["vehicle_capacity"], td["demand"]), 1)
    d = demand_with_depot.gather(1, actions)

    used_cap = torch.zeros_like(td["demand"][:, 0])
    for i in range(actions.size(1)):
        used_cap += d[
            :, i
        ]  # This will reset/make capacity negative if i == 0, e.g. depot visited
        # Cannot use less than 0
        used_cap[used_cap < 0] = 0
        assert (
            used_cap <= td["vehicle_capacity"] + 1e-5
        ).all(), "Used more than capacity"

load_data staticmethod

load_data(fpath, batch_size=[])

Dataset loading from file Normalize demand by capacity to be in [0, 1]

Source code in rl4co/envs/routing/cvrp/env.py
188
189
190
191
192
193
194
195
@staticmethod
def load_data(fpath, batch_size=[]):
    """Dataset loading from file
    Normalize demand by capacity to be in [0, 1]
    """
    td_load = load_npz_to_tensordict(fpath)
    td_load.set("demand", td_load["demand"] / td_load["capacity"][:, None])
    return td_load

replace_selected_actions

replace_selected_actions(
    cur_actions: Tensor,
    new_actions: Tensor,
    selection_mask: Tensor,
) -> Tensor

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/cvrp/env.py
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
def replace_selected_actions(
    self,
    cur_actions: torch.Tensor,
    new_actions: torch.Tensor,
    selection_mask: torch.Tensor,
) -> torch.Tensor:
    """
    Replace selected current actions with updated actions based on `selection_mask`.

    Args:
        cur_actions [batch_size, num_loc]
        new_actions [batch_size, num_loc]
        selection_mask [batch_size,]
    """
    diff_length = cur_actions.size(-1) - new_actions.size(-1)
    if diff_length > 0:
        new_actions = torch.nn.functional.pad(
            new_actions, (0, diff_length, 0, 0), mode="constant", value=0
        )
    elif diff_length < 0:
        cur_actions = torch.nn.functional.pad(
            cur_actions, (0, -diff_length, 0, 0), mode="constant", value=0
        )
    cur_actions[selection_mask] = new_actions[selection_mask]
    return cur_actions

CVRPGenerator

CVRPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    depot_distribution: (
        int | float | str | type | Callable
    ) = None,
    min_demand: int = 1,
    max_demand: int = 10,
    demand_distribution: (
        int | float | type | Callable
    ) = Uniform,
    vehicle_capacity: float = 1.0,
    capacity: float = None,
    **kwargs
)

Bases: Generator

Data generator for the Capacitated Vehicle Routing Problem (CVRP).

Parameters:

  • num_loc (int, default: 20 ) –

    number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)

  • min_loc (float, default: 0.0 ) –

    minimum value for the location coordinates

  • max_loc (float, default: 1.0 ) –

    maximum value for the location coordinates

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the location coordinates

  • depot_distribution (int | float | str | type | Callable, default: None ) –

    distribution for the depot location. If None, sample the depot from the locations

  • min_demand (int, default: 1 ) –

    minimum value for the demand of each customer

  • max_demand (int, default: 10 ) –

    maximum value for the demand of each customer

  • demand_distribution (int | float | type | Callable, default: Uniform ) –

    distribution for the demand of each customer

  • capacity (float, default: None ) –

    capacity of the vehicle

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size]: capacity of the vehicle

Source code in rl4co/envs/routing/cvrp/generator.py
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: int | float | str | type | Callable = Uniform,
    depot_distribution: int | float | str | type | Callable = None,
    min_demand: int = 1,
    max_demand: int = 10,
    demand_distribution: int | float | type | Callable = Uniform,
    vehicle_capacity: float = 1.0,
    capacity: float = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_demand = min_demand
    self.max_demand = max_demand
    self.vehicle_capacity = vehicle_capacity

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = (
            get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
            if depot_distribution is not None
            else None
        )

    # Demand distribution
    if kwargs.get("demand_sampler", None) is not None:
        self.demand_sampler = kwargs["demand_sampler"]
    else:
        self.demand_sampler = get_sampler(
            "demand", demand_distribution, min_demand - 1, max_demand - 1, **kwargs
        )

    # Capacity
    if (
        capacity is None
    ):  # If not provided, use the default capacity from Kool et al. 2019
        capacity = CAPACITIES.get(num_loc, None)
    if (
        capacity is None
    ):  # If not in the table keys, find the closest number of nodes as the key
        closest_num_loc = min(CAPACITIES.keys(), key=lambda x: abs(x - num_loc))
        capacity = CAPACITIES[closest_num_loc]
        log.warning(
            f"The capacity capacity for {num_loc} locations is not defined. Using the closest capacity: {capacity}\
                with {closest_num_loc} locations."
        )
    self.capacity = capacity

Multiple Traveling Salesman Problem (mTSP)

MTSPEnv

MTSPEnv(
    generator: MTSPGenerator = None,
    generator_params: dict = {},
    cost_type: str = "minmax",
    **kwargs
)

Bases: RL4COEnvBase

Multiple Traveling Salesman Problem environment At each step, an agent chooses to visit a city. A maximum of num_agents agents can be employed to visit the cities. The cost can be defined in two ways:

- `minmax`: (default) the reward is the maximum of the path lengths of all the agents
- `sum`: the cost is the sum of the path lengths of all the agents

Reward is - cost, so the goal is to maximize the reward (minimize the cost).

Observations
  • locations of the depot and each customer.
  • number of agents.
  • the current agent index.
  • the current location of the vehicle.
Constrains
  • each agent's tour starts and ends at the depot.
  • each customer must be visited exactly once.
Finish condition
  • all customers are visited and all agents back to the depot.
Reward

There are two ways to calculate the cost (-reward):

  • minmax: (default) the cost is the maximum of the path lengths of all the agents.
  • sum: the cost is the sum of the path lengths of all the agents.

Parameters:

  • cost_type (str, default: 'minmax' ) –

    type of cost to use, either minmax or sum

  • generator (MTSPGenerator, default: None ) –

    MTSPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Source code in rl4co/envs/routing/mtsp/env.py
50
51
52
53
54
55
56
57
58
59
60
61
62
def __init__(
    self,
    generator: MTSPGenerator = None,
    generator_params: dict = {},
    cost_type: str = "minmax",
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = MTSPGenerator(**generator_params)
    self.generator = generator
    self.cost_type = cost_type
    self._make_spec(self.generator)

MTSPGenerator

MTSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    min_num_agents: int = 5,
    max_num_agents: int = 5,
    **kwargs
)

Bases: Generator

Data generator for the Multiple Travelling Salesman Problem (mTSP).

Parameters:

  • num_loc (int, default: 20 ) –

    number of locations (customers) in the TSP

  • min_loc (float, default: 0.0 ) –

    minimum value for the location coordinates

  • max_loc (float, default: 1.0 ) –

    maximum value for the location coordinates

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the location coordinates

  • min_num_agents (int, default: 5 ) –

    minimum number of agents (vehicles), include

  • max_num_agents (int, default: 5 ) –

    maximum number of agents (vehicles), include

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer num_agents [batch_size]: number of agents (vehicles)

Source code in rl4co/envs/routing/mtsp/generator.py
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: int | float | str | type | Callable = Uniform,
    min_num_agents: int = 5,
    max_num_agents: int = 5,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_num_agents = min_num_agents
    self.max_num_agents = max_num_agents

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

Orienteering Problem (OP)

OPEnv

OPEnv(
    generator: OPGenerator = None,
    generator_params: dict = {},
    prize_type: str = "dist",
    **kwargs
)

Bases: RL4COEnvBase

Orienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.

Observations
  • location of the depot
  • locations and prize of each customer
  • current location of the vehicle
  • current tour length
  • current total prize
  • the remaining length of the path
Constraints
  • the tour starts and ends at the depot
  • not all customers need to be visited
  • the vehicle cannot visit customers exceed the remaining length of the path
Finish Condition
  • the vehicle back to the depot
Reward
  • the sum of the prizes of visited nodes

Parameters:

  • generator (OPGenerator, default: None ) –

    OPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Methods:

  • get_action_mask

    Get action mask with 1 = feasible action, 0 = infeasible action.

  • check_solution_validity

    Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.

Source code in rl4co/envs/routing/op/env.py
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
def __init__(
    self,
    generator: OPGenerator = None,
    generator_params: dict = {},
    prize_type: str = "dist",
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = OPGenerator(**generator_params)
    self.generator = generator
    self.prize_type = prize_type
    assert self.prize_type in [
        "dist",
        "unif",
        "const",
    ], f"Invalid prize_type: {self.prize_type}"
    self._make_spec(self.generator)

get_action_mask staticmethod

get_action_mask(td: TensorDict) -> Tensor

Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.

Source code in rl4co/envs/routing/op/env.py
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
@staticmethod
def get_action_mask(td: TensorDict) -> torch.Tensor:
    """Get action mask with 1 = feasible action, 0 = infeasible action.
    Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.
    """
    current_loc = gather_by_index(td["locs"], td["current_node"])[..., None, :]
    exceeds_length = (
        td["tour_length"][..., None] + (td["locs"] - current_loc).norm(p=2, dim=-1)
        > td["max_length"]
    )
    mask = td["visited"] | td["visited"][..., 0:1] | exceeds_length

    action_mask = ~mask  # 1 = feasible action, 0 = infeasible action

    # Depot can always be visited: we do not hardcode knowledge that this is strictly suboptimal if other options are available
    action_mask[..., 0] = 1
    return action_mask

check_solution_validity staticmethod

check_solution_validity(
    td: TensorDict,
    actions: Tensor,
    add_distance_to_depot: bool = True,
) -> None

Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded. If add_distance_to_depot if True, then the distance to the depot is added to max length since by default, the max length is modified in the reset function to account for the distance to the depot.

Source code in rl4co/envs/routing/op/env.py
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
@staticmethod
def check_solution_validity(
    td: TensorDict, actions: torch.Tensor, add_distance_to_depot: bool = True
) -> None:
    """Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.
    If `add_distance_to_depot` if True, then the distance to the depot is added to max length since by default, the max length is
    modified in the reset function to account for the distance to the depot.
    """

    # Check that tours are valid, i.e. contain 0 to n -1
    sorted_actions = actions.data.sort(1)[0]
    # Make sure each node visited once at most (except for depot)
    assert (
        (sorted_actions[:, 1:] == 0)
        | (sorted_actions[:, 1:] > sorted_actions[:, :-1])
    ).all(), "Duplicates"

    # Gather locations in order of tour and get the length of tours
    locs_ordered = gather_by_index(td["locs"], actions)
    length = get_tour_length(locs_ordered)

    max_length = td["max_length"]
    if add_distance_to_depot:
        max_length = (
            max_length
            + (td["locs"][..., 0:1, :] - td["locs"]).norm(p=2, dim=-1)
            + 1e-6
        )
    assert (
        length[..., None] <= max_length + 1e-5
    ).all(), "Max length exceeded by {}".format(
        (length[..., None] - max_length).max()
    )

OPGenerator

OPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    depot_distribution: (
        int | float | str | type | Callable
    ) = None,
    min_prize: float = 1.0,
    max_prize: float = 1.0,
    prize_distribution: (
        int | float | type | Callable
    ) = Uniform,
    prize_type: str = "dist",
    max_length: float | Tensor = None,
    **kwargs
)

Bases: Generator

Data generator for the Orienteering Problem (OP).

Parameters:

  • num_loc (int, default: 20 ) –

    number of locations (customers) in the OP, without the depot. (e.g. 10 means 10 locs + 1 depot)

  • min_loc (float, default: 0.0 ) –

    minimum value for the location coordinates

  • max_loc (float, default: 1.0 ) –

    maximum value for the location coordinates

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the location coordinates

  • depot_distribution (int | float | str | type | Callable, default: None ) –

    distribution for the depot location. If None, sample the depot from the locations

  • min_prize (float, default: 1.0 ) –

    minimum value for the prize of each customer

  • max_prize (float, default: 1.0 ) –

    maximum value for the prize of each customer

  • prize_distribution (int | float | type | Callable, default: Uniform ) –

    distribution for the prize of each customer

  • max_length (float | Tensor, default: None ) –

    maximum length of the path

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot prize [batch_size, num_loc]: prize of each customer max_length [batch_size, 1]: maximum length of the path for each customer

Source code in rl4co/envs/routing/op/generator.py
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: int | float | str | type | Callable = Uniform,
    depot_distribution: int | float | str | type | Callable = None,
    min_prize: float = 1.0,
    max_prize: float = 1.0,
    prize_distribution: int | float | type | Callable = Uniform,
    prize_type: str = "dist",
    max_length: float | torch.Tensor = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.min_prize = min_prize
    self.max_prize = max_prize
    self.prize_type = prize_type
    self.max_length = max_length

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = (
            get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
            if depot_distribution is not None
            else None
        )

    # Prize distribution
    if kwargs.get("prize_sampler", None) is not None:
        self.prize_sampler = kwargs["prize_sampler"]
    elif (
        prize_distribution == "dist"
    ):  # If prize_distribution is 'dist', then the prize is the distance from the depot
        self.prize_sampler = None
    else:
        self.prize_sampler = get_sampler(
            "prize", prize_distribution, min_prize, max_prize, **kwargs
        )

    # Max length
    if max_length is not None:
        self.max_length = max_length
    else:
        self.max_length = MAX_LENGTHS.get(num_loc, None)
    if self.max_length is None:
        closest_num_loc = min(MAX_LENGTHS.keys(), key=lambda x: abs(x - num_loc))
        self.max_length = MAX_LENGTHS[closest_num_loc]
        log.warning(
            f"The max length for {num_loc} locations is not defined. Using the closest max length: {self.max_length}\
                with {closest_num_loc} locations."
        )

Pickup and Delivery Problem (PDP)

PDPEnv

PDPEnv(
    generator: PDPGenerator = None,
    generator_params: dict = {},
    force_start_at_depot: bool = False,
    **kwargs
)

Bases: RL4COEnvBase

Pickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):

- 1 depot
- `num_loc` / 2 pickup locations
- `num_loc` / 2 delivery locations

The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location

Observations
  • locations of the depot, pickup, and delivery locations
  • current location of the vehicle
  • the remaining locations to deliver
  • the visited locations
  • the current step
Constraints
  • the tour starts and ends at the depot
  • each pickup location must be visited before its corresponding delivery location
  • the vehicle cannot visit the same location twice
Finish Condition
  • the vehicle has visited all locations
Reward
  • (minus) the negative length of the path

Parameters:

  • generator (PDPGenerator, default: None ) –

    PDPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

  • force_start_at_depot (bool, default: False ) –

    whether to force the agent to start at the depot If False (default), the agent won't consider the depot, which is added in the get_reward method If True, the only valid action at the first step is to visit the depot (=0)

Methods:

  • get_num_starts

    Only half of the nodes (i.e. pickup nodes) can be start nodes

  • select_start_nodes

    Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected

Source code in rl4co/envs/routing/pdp/env.py
52
53
54
55
56
57
58
59
60
61
62
63
64
def __init__(
    self,
    generator: PDPGenerator = None,
    generator_params: dict = {},
    force_start_at_depot: bool = False,
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = PDPGenerator(**generator_params)
    self.generator = generator
    self.force_start_at_depot = force_start_at_depot
    self._make_spec(self.generator)

get_num_starts

get_num_starts(td)

Only half of the nodes (i.e. pickup nodes) can be start nodes

Source code in rl4co/envs/routing/pdp/env.py
228
229
230
def get_num_starts(self, td):
    """Only half of the nodes (i.e. pickup nodes) can be start nodes"""
    return (td["locs"].shape[-2] - 1) // 2

select_start_nodes

select_start_nodes(td, num_starts)

Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected

Source code in rl4co/envs/routing/pdp/env.py
232
233
234
235
236
237
238
239
240
def select_start_nodes(self, td, num_starts):
    """Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected"""
    num_possible_starts = (td["locs"].shape[-2] - 1) // 2
    selected = (
        torch.arange(num_starts, device=td.device).repeat_interleave(td.shape[0])
        % num_possible_starts
        + 1
    )
    return selected

PDPGenerator

PDPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    depot_distribution: (
        int | float | str | type | Callable
    ) = None,
    **kwargs
)

Bases: Generator

Data generator for the Pickup and Delivery Problem (PDP). Args: num_loc: number of locations (customers) in the PDP, without the depot. (e.g. 10 means 10 locs + 1 depot)

    - 1 depot
    - `num_loc` / 2 pickup locations
    - `num_loc` / 2 delivery locations
min_loc: minimum value for the location coordinates
max_loc: maximum value for the location coordinates
init_sol_type: the method type used for generating initial solutions (random or greedy)
loc_distribution: distribution for the location coordinates
depot_distribution: distribution for the depot location. If None, sample the depot from the locations

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot

Source code in rl4co/envs/routing/pdp/generator.py
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: int | float | str | type | Callable = Uniform,
    depot_distribution: int | float | str | type | Callable = None,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.init_sol_type = init_sol_type

    # Number of locations must be even
    if num_loc % 2 != 0:
        log.warn(
            "Number of locations must be even. Adding 1 to the number of locations."
        )
        self.num_loc += 1

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = (
            get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
            if depot_distribution is not None
            else None
        )

Prize Collecting Traveling Salesman Problem (PCTSP)

PCTSPEnv

PCTSPEnv(
    generator: PCTSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Prize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.

Observations
  • locations of the nodes
  • prize and penalty of each node
  • current location of the vehicle
  • current total prize
  • current total penalty
  • visited nodes
  • prize required to visit a node
  • the current step
Constraints
  • the tour starts and ends at the depot
  • the vehicle cannot visit nodes exceed the remaining prize
Finish Condition
  • the vehicle back to the depot
Reward
  • the sum of the saved penalties

Parameters:

  • generator (PCTSPGenerator, default: None ) –

    OPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Methods:

  • get_action_mask

    Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes

  • check_solution_validity

    Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

Source code in rl4co/envs/routing/pctsp/env.py
52
53
54
55
56
57
58
59
60
61
62
def __init__(
    self,
    generator: PCTSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = PCTSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

get_action_mask staticmethod

get_action_mask(td: TensorDict) -> Tensor

Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes

Source code in rl4co/envs/routing/pctsp/env.py
149
150
151
152
153
154
155
156
@staticmethod
def get_action_mask(td: TensorDict) -> torch.Tensor:
    """Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes"""
    mask = td["visited"] | td["visited"][..., 0:1]
    mask[..., 0] = (td["cur_total_prize"] < 1.0) & (
        td["visited"][..., 1:].int().sum(-1) < td["visited"][..., 1:].size(-1)
    )
    return ~(mask > 0)  # Invert mask, since 1 means feasible action

check_solution_validity staticmethod

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited

Source code in rl4co/envs/routing/pctsp/env.py
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited"""

    # Check that tours are valid, i.e. contain 0 to n -1
    sorted_actions = actions.data.sort(1)[0]

    # Make sure each node visited once at most (except for depot)
    assert (
        (sorted_actions[..., 1:] == 0)
        | (sorted_actions[..., 1:] > sorted_actions[..., :-1])
    ).all(), "Duplicates"

    prize = td["real_prize"][..., 1:]  # Remove depot
    prize_with_depot = torch.cat((torch.zeros_like(prize[:, :1]), prize), 1)
    p = prize_with_depot.gather(1, actions)

    # Either prize constraint should be satisfied or all prizes should be visited
    assert (
        (p.sum(-1) >= 1 - 1e-5)
        | (
            sorted_actions.size(-1) - (sorted_actions == 0).int().sum(-1)
            == (td["locs"].size(-2) - 1)
        )  # no depot
    ).all(), "Total prize does not satisfy min total prize"

PCTSPGenerator

PCTSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    depot_distribution: (
        int | float | str | type | Callable
    ) = None,
    penalty_factor: float = 3.0,
    prize_required: float = 1.0,
    **kwargs
)

Bases: Generator

Data generator for the Prize-collecting Traveling Salesman Problem (PCTSP).

Parameters:

  • num_loc (int, default: 20 ) –

    number of locations (customers) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)

  • min_loc (float, default: 0.0 ) –

    minimum value for the location coordinates

  • max_loc (float, default: 1.0 ) –

    maximum value for the location coordinates

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the location coordinates

  • depot_distribution (int | float | str | type | Callable, default: None ) –

    distribution for the depot location. If None, sample the depot from the locations

  • min_demand

    minimum value for the demand of each customer

  • max_demand

    maximum value for the demand of each customer

  • demand_distribution

    distribution for the demand of each customer

  • capacity

    capacity of the vehicle

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each city depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size, 1]: capacity of the vehicle

Source code in rl4co/envs/routing/pctsp/generator.py
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: int | float | str | type | Callable = Uniform,
    depot_distribution: int | float | str | type | Callable = None,
    penalty_factor: float = 3.0,
    prize_required: float = 1.0,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.penalty_fctor = penalty_factor
    self.prize_required = prize_required

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    # Depot distribution
    if kwargs.get("depot_sampler", None) is not None:
        self.depot_sampler = kwargs["depot_sampler"]
    else:
        self.depot_sampler = (
            get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)
            if depot_distribution is not None
            else None
        )

    # Prize distribution
    self.deterministic_prize_sampler = get_sampler(
        "deterministric_prize", "uniform", 0.0, 4.0 / self.num_loc, **kwargs
    )
    self.stochastic_prize_sampler = get_sampler(
        "stochastic_prize", "uniform", 0.0, 2.0, **kwargs
    )

    # For the penalty to make sense it should be not too large (in which case all nodes will be visited) nor too small
    # so we want the objective term to be approximately equal to the length of the tour, which we estimate with half
    # of the nodes by half of the tour length (which is very rough but similar to op)
    # This means that the sum of penalties for all nodes will be approximately equal to the tour length (on average)
    # The expected total (uniform) penalty of half of the nodes (since approx half will be visited by the constraint)
    # is (n / 2) / 2 = n / 4 so divide by this means multiply by 4 / n,
    # However instead of 4 we use penalty_factor (3 works well) so we can make them larger or smaller
    self.max_penalty = kwargs.get("max_penalty", None)
    if self.max_penalty is None:  # If not provided, use the default max penalty
        self.max_penalty = MAX_LENGTHS.get(num_loc, None)
    if (
        self.max_penalty is None
    ):  # If not in the table keys, find the closest number of nodes as the key
        closest_num_loc = min(MAX_LENGTHS.keys(), key=lambda x: abs(x - num_loc))
        self.max_penalty = MAX_LENGTHS[closest_num_loc]
        log.warning(
            f"The max penalty for {num_loc} locations is not defined. Using the closest max penalty: {self.max_penalty}\
                with {closest_num_loc} locations."
        )

    # Adjust as in Kool et al. (2019)
    self.max_penalty *= penalty_factor / self.num_loc
    self.penalty_sampler = get_sampler(
        "penalty", "uniform", 0.0, self.max_penalty, **kwargs
    )

Split Delivery Vehicle Routing Problem (SDVRP)

SDVRPEnv

SDVRPEnv(
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: CVRPEnv

Split Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations
  • location of the depot.
  • locations and demand/remaining demand of each customer
  • current location of the vehicle.
  • the remaining capacity of the vehicle.
Constraints
  • the tour starts and ends at the depot.
  • each customer can be visited multiple times.
  • the vehicle cannot visit customers exceed the remaining capacity.
  • the vehicle can return to the depot to refill the capacity.
Finish Condition
  • the vehicle has finished all customers demand and returned to the depot.
Reward
  • (minus) the negative length of the path.

Parameters:

  • generator (CVRPGenerator, default: None ) –

    CVRPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Methods:

Source code in rl4co/envs/routing/sdvrp/env.py
50
51
52
53
54
55
56
def __init__(
    self,
    generator: CVRPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(generator, generator_params, **kwargs)

check_solution_validity staticmethod

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that the solution is valid (all demand is satisfied)

Source code in rl4co/envs/routing/sdvrp/env.py
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that the solution is valid (all demand is satisfied)"""

    batch_size, graph_size = td["demand"].size()

    # Each node can be visited multiple times, but we always deliver as much demand as possible
    # We check that at the end all demand has been satisfied
    demands = torch.cat((-td["vehicle_capacity"], td["demand"]), 1)

    rng = torch.arange(batch_size, out=demands.data.new().long())
    used_cap = torch.zeros_like(td["demand"][..., 0])
    a_prev = None
    for a in actions.transpose(0, 1):
        assert (
            a_prev is None or (demands[((a_prev == 0) & (a == 0)), :] == 0).all()
        ), "Cannot visit depot twice if any nonzero demand"
        d = torch.min(demands[rng, a], td["vehicle_capacity"].squeeze(-1) - used_cap)
        demands[rng, a] -= d
        used_cap += d
        used_cap[a == 0] = 0
        a_prev = a
    assert (demands == 0).all(), "All demand must be satisfied"

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)

SPCTSPEnv

SPCTSPEnv(**kwargs)

Bases: PCTSPEnv

Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.

Note

The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).

Source code in rl4co/envs/routing/spctsp/env.py
19
20
def __init__(self, **kwargs):
    super().__init__(**kwargs)

Traveling Salesman Problem (TSP)

TSPEnv

TSPEnv(
    generator: TSPGenerator = None,
    generator_params: dict = {},
    **kwargs
)

Bases: RL4COEnvBase

Traveling Salesman Problem (TSP) environment At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.

Observations
  • locations of each customer.
  • the current location of the vehicle.
Constraints
  • the tour must return to the starting customer.
  • each customer must be visited exactly once.
Finish condition
  • the agent has visited all customers and returned to the starting customer.
Reward
  • (minus) the negative length of the path.

Parameters:

  • generator (TSPGenerator, default: None ) –

    TSPGenerator instance as the data generator

  • generator_params (dict, default: {} ) –

    parameters for the generator

Methods:

Source code in rl4co/envs/routing/tsp/env.py
50
51
52
53
54
55
56
57
58
59
60
def __init__(
    self,
    generator: TSPGenerator = None,
    generator_params: dict = {},
    **kwargs,
):
    super().__init__(**kwargs)
    if generator is None:
        generator = TSPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

check_solution_validity staticmethod

check_solution_validity(
    td: TensorDict, actions: Tensor
) -> None

Check that solution is valid: nodes are visited exactly once

Source code in rl4co/envs/routing/tsp/env.py
160
161
162
163
164
165
166
167
168
@staticmethod
def check_solution_validity(td: TensorDict, actions: torch.Tensor) -> None:
    """Check that solution is valid: nodes are visited exactly once"""
    assert (
        torch.arange(actions.size(1), out=actions.data.new())
        .view(1, -1)
        .expand_as(actions)
        == actions.data.sort(1)[0]
    ).all(), "Invalid tour"

replace_selected_actions

replace_selected_actions(
    cur_actions: Tensor,
    new_actions: Tensor,
    selection_mask: Tensor,
) -> Tensor

Replace selected current actions with updated actions based on selection_mask.

Source code in rl4co/envs/routing/tsp/env.py
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
def replace_selected_actions(
    self,
    cur_actions: torch.Tensor,
    new_actions: torch.Tensor,
    selection_mask: torch.Tensor,
) -> torch.Tensor:
    """
    Replace selected current actions with updated actions based on `selection_mask`.

    Args:
        cur_actions [batch_size, num_loc]
        new_actions [batch_size, num_loc]
        selection_mask [batch_size,]
    """
    cur_actions[selection_mask] = new_actions[selection_mask]
    return cur_actions

TSPGenerator

TSPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    **kwargs
)

Bases: Generator

Data generator for the Travelling Salesman Problem (TSP).

Parameters:

  • num_loc (int, default: 20 ) –

    number of locations (customers) in the TSP

  • min_loc (float, default: 0.0 ) –

    minimum value for the location coordinates

  • max_loc (float, default: 1.0 ) –

    maximum value for the location coordinates

  • init_sol_type (str, default: 'random' ) –

    the method type used for generating initial solutions (random or greedy)

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    distribution for the location coordinates

Returns:

  • A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer

Source code in rl4co/envs/routing/tsp/generator.py
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    init_sol_type: str = "random",
    loc_distribution: int | float | str | type | Callable = Uniform,
    **kwargs,
):
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    self.init_sol_type = init_sol_type

    # Location distribution
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

Multi-Task Vehicle Routing Problem (MTVRP)

MTVRPEnv

MTVRPEnv(
    generator: MTVRPGenerator = None,
    generator_params: dict = {},
    check_solution: bool = False,
    **kwargs
)

Bases: RL4COEnvBase

MTVRPEnv is a Multi-Task VRP environment which can take any combination of the following constraints:

Features:

  • Capacity (C) - Each vehicle has a maximum capacity \(Q\), restricting the total load that can be in the vehicle at any point of the route. - The route must be planned such that the sum of demands and pickups for all customers visited does not exceed this capacity.
  • Time Windows (TW) - Every node \(i\) has an associated time window \([e_i, l_i]\) during which service must commence. - Additionally, each node has a service time \(s_i\). Vehicles must reach node \(i\) within its time window; early arrivals must wait at the node location until time \(e_i\).
  • Open Routes (O) - Vehicles are not required to return to the depot after serving all customers. - Note that this does not need to be counted as a constraint since it can be modelled by setting zero costs on arcs returning to the depot \(c_{i0} = 0\) from any customer \(i \in C\), and not counting the return arc as part of the route.
  • Backhauls (B) - Backhauls generalize demand to also account for return shipments. Customers are either linehaul or backhaul customers. - Linehaul customers require delivery of a demand \(q_i > 0\) that needs to be transported from the depot to the customer, whereas backhaul customers need a pickup of an amount \(p_i > 0\) that is transported from the client back to the depot. - It is possible for vehicles to serve a combination of linehaul and backhaul customers in a single route, but then any linehaul customers must precede the backhaul customers in the route.
  • Duration Limits (L) - Imposes a limit on the total travel duration (or length) of each route, ensuring a balanced workload across vehicles.

The environment covers the following 16 variants depending on the data generation:

VRP Variant Capacity (C) Open Route (O) Backhaul (B) Duration Limit (L) Time Window (TW)
CVRP
OVRP
VRPB
VRPL
VRPTW
OVRPTW
OVRPB
OVRPL
VRPBL
VRPBTW
VRPLTW
OVRPBL
OVRPBTW
OVRPLTW
VRPBLTW
OVRPBLTW

You may also check out the following papers as reference:

Tip

Have a look at https://pyvrp.org/ for more information about VRP and its variants and their solutions. Kudos to their help and great job!

Parameters:

  • generator (MTVRPGenerator, default: None ) –

    Generator for the environment, see :class:MTVRPGenerator.

  • generator_params (dict, default: {} ) –

    Parameters for the generator.

Methods:

  • load_data

    Dataset loading from file

  • render

    Simple wrapper for render function

  • select_start_nodes

    Select available start nodes for the environment (e.g. for POMO-based training)

  • solve

    Classical solver for the environment. This is a wrapper for the baselines solver.

  • check_variants

    Check if the problem has the variants

Source code in rl4co/envs/routing/mtvrp/env.py
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
def __init__(
    self,
    generator: MTVRPGenerator = None,
    generator_params: dict = {},
    check_solution: bool = False,
    **kwargs,
):
    if check_solution:
        log.warning(
            "Solution checking is enabled. This may slow down the environment."
            " We recommend disabling this for training by passing `check_solution=False`."
        )

    super().__init__(check_solution=check_solution, **kwargs)

    if generator is None:
        generator = MTVRPGenerator(**generator_params)
    self.generator = generator
    self._make_spec(self.generator)

load_data

load_data(fpath, batch_size=[], scale=False)

Dataset loading from file Normalize demand by capacity to be in [0, 1]

Source code in rl4co/envs/routing/mtvrp/env.py
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
def load_data(self, fpath, batch_size=[], scale=False):
    """Dataset loading from file
    Normalize demand by capacity to be in [0, 1]
    """
    td_load = load_npz_to_tensordict(fpath)
    if scale:
        td_load.set(
            "demand_linehaul",
            td_load["demand_linehaul"] / td_load["capacity_original"],
        )
        td_load.set(
            "demand_backhaul",
            td_load["demand_backhaul"] / td_load["capacity_original"],
        )
    return td_load

render staticmethod

render(*args, **kwargs)

Simple wrapper for render function

Source code in rl4co/envs/routing/mtvrp/env.py
378
379
380
381
382
383
@staticmethod
def render(*args, **kwargs):
    """Simple wrapper for render function"""
    from .render import render

    return render(*args, **kwargs)

select_start_nodes

select_start_nodes(td, num_starts)

Select available start nodes for the environment (e.g. for POMO-based training)

Source code in rl4co/envs/routing/mtvrp/env.py
385
386
387
388
389
390
391
392
393
def select_start_nodes(self, td, num_starts):
    """Select available start nodes for the environment (e.g. for POMO-based training)"""
    num_loc = td["locs"].shape[-2] - 1
    selected = (
        torch.arange(num_starts, device=td.device).repeat_interleave(td.shape[0])
        % num_loc
        + 1
    )
    return selected

solve staticmethod

solve(
    instances: TensorDict,
    max_runtime: float,
    num_procs: int = 1,
    solver: str = "pyvrp",
    **kwargs
) -> tuple[Tensor, Tensor]

Classical solver for the environment. This is a wrapper for the baselines solver. Available solvers are: pyvrp, ortools, lkh. Returns the actions and costs.

Source code in rl4co/envs/routing/mtvrp/env.py
395
396
397
398
399
400
401
402
403
404
405
406
407
408
@staticmethod
def solve(
    instances: TensorDict,
    max_runtime: float,
    num_procs: int = 1,
    solver: str = "pyvrp",
    **kwargs,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Classical solver for the environment. This is a wrapper for the baselines solver.
    Available solvers are: `pyvrp`, `ortools`, `lkh`. Returns the actions and costs.
    """
    from .baselines.solve import solve

    return solve(instances, max_runtime, num_procs, solver, **kwargs)

check_variants staticmethod

check_variants(td)

Check if the problem has the variants

Source code in rl4co/envs/routing/mtvrp/env.py
457
458
459
460
461
462
463
464
@staticmethod
def check_variants(td):
    """Check if the problem has the variants"""
    has_open = td["open_route"].squeeze(-1)
    has_tw = (td["time_windows"][:, :, 1] != float("inf")).any(-1)
    has_limit = (td["distance_limit"] != float("inf")).squeeze(-1)
    has_backhaul = (td["demand_backhaul"] != 0).any(-1)
    return has_open, has_tw, has_limit, has_backhaul

MTVRPGenerator

MTVRPGenerator(
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: (
        int | float | str | type | Callable
    ) = Uniform,
    capacity: float = None,
    min_demand: int = 1,
    max_demand: int = 10,
    min_backhaul: int = 1,
    max_backhaul: int = 10,
    scale_demand: bool = True,
    max_time: float = 4.6,
    backhaul_ratio: float = 0.2,
    distance_limit: float = 3.0,
    speed: float = 1.0,
    prob_open: float = 0.5,
    prob_time_window: float = 0.5,
    prob_limit: float = 0.5,
    prob_backhaul: float = 0.5,
    variant_preset=None,
    use_combinations=True,
    subsample=True,
    **kwargs
)

Bases: Generator

MTVRP Generator. Class to generate instances of the MTVRP problem. If a variant is declared and Subsample is True, the generator will sample the problem based on the variant probabilities. By default, we use Mixed-Batch Training as in Berto et al. 2024 (RouteFinder), i.e. one batch can contain multiple variants.

Example presets:

  • "all": Sample uniformly from 16 variants
  • "single_feat": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW (as done in Liu et al. 2024 (MTPOMO))
  • "single_feat_otw": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW, OVRPTW (as done in Zhou et al. 2024 (MVMoE))
  • "cvrp": Only CVRP (similarly for other variants)

Parameters:

  • num_loc (int, default: 20 ) –

    Number of locations to generate

  • min_loc (float, default: 0.0 ) –

    Minimum location value

  • max_loc (float, default: 1.0 ) –

    Maximum location value

  • loc_distribution (int | float | str | type | Callable, default: Uniform ) –

    Distribution to sample locations from

  • capacity (float, default: None ) –

    Vehicle capacity. If None, get value based on get_vehicle_capacity

  • min_demand (int, default: 1 ) –

    Minimum demand value

  • max_demand (int, default: 10 ) –

    Maximum demand value

  • min_backhaul (int, default: 1 ) –

    Minimum backhaul value

  • max_backhaul (int, default: 10 ) –

    Maximum backhaul value

  • scale_demand (bool, default: True ) –

    Scale demand values (by default, generate between 1 and 10)

  • max_time (float, default: 4.6 ) –

    Maximum time window value (at depot)

  • backhaul_ratio (float, default: 0.2 ) –

    Fraction of backhauls (e.g. 0.2 means 20% of nodes are backhaul)

  • distance_limit (float, default: 3.0 ) –

    Distance limit

  • speed (float, default: 1.0 ) –

    Speed of vehicle. Defaults to 1

  • subsample

    If False, we always sample all attributes (i.e., OVRPBLTW) If true, we use the

  • **kwargs

    Additional keyword arguments

Methods:

  • subsample_problems

    Create subproblems starting from seed probabilities depending on their variant.

  • generate_locations

    Generate seed locations.

  • generate_demands

    Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation.

  • generate_time_windows

    Generate time windows (TW) and service times for each location including depot.

  • generate_distance_limit

    Generates distance limits (L) and checks their feasibilities.

  • generate_open_route

    Generate open route flags (O). Here we could have a sampler but we simply return True here so all

  • generate_speed

    We simply generate the speed as constant here

Source code in rl4co/envs/routing/mtvrp/generator.py
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
def __init__(
    self,
    num_loc: int = 20,
    min_loc: float = 0.0,
    max_loc: float = 1.0,
    loc_distribution: int | float | str | type | Callable = Uniform,
    capacity: float = None,
    min_demand: int = 1,
    max_demand: int = 10,
    min_backhaul: int = 1,
    max_backhaul: int = 10,
    scale_demand: bool = True,
    max_time: float = 4.6,
    backhaul_ratio: float = 0.2,
    distance_limit: float = 3.0,
    speed: float = 1.0,
    prob_open: float = 0.5,
    prob_time_window: float = 0.5,
    prob_limit: float = 0.5,
    prob_backhaul: float = 0.5,
    variant_preset=None,
    use_combinations=True,
    subsample=True,
    **kwargs,
) -> None:
    # Location distribution
    self.num_loc = num_loc
    self.min_loc = min_loc
    self.max_loc = max_loc
    if kwargs.get("loc_sampler", None) is not None:
        self.loc_sampler = kwargs["loc_sampler"]
    else:
        self.loc_sampler = get_sampler(
            "loc", loc_distribution, min_loc, max_loc, **kwargs
        )

    if capacity is None:
        capacity = get_vehicle_capacity(num_loc)
    self.capacity = capacity
    self.min_demand = min_demand
    self.max_demand = max_demand
    self.min_backhaul = min_backhaul
    self.max_backhaul = max_backhaul
    self.scale_demand = scale_demand
    self.backhaul_ratio = backhaul_ratio

    self.max_time = max_time
    self.distance_limit = distance_limit
    self.speed = speed

    assert not (
        subsample and (variant_preset is None)
    ), "Cannot use subsample if variant_preset is not specified. "
    if variant_preset is not None:
        log.info(f"Using variant generation preset {variant_preset}")
        variant_probs = VARIANT_GENERATION_PRESETS.get(variant_preset)
        assert (
            variant_probs is not None
        ), f"Variant generation preset {variant_preset} not found. \
            Available presets are {VARIANT_GENERATION_PRESETS.keys()} with probabilities {VARIANT_GENERATION_PRESETS.values()}"
    else:
        variant_probs = {
            "O": prob_open,
            "TW": prob_time_window,
            "L": prob_limit,
            "B": prob_backhaul,
        }
    # check probabilities
    for key, prob in variant_probs.items():
        assert 0 <= prob <= 1, f"Probability {key} must be between 0 and 1"
    self.variant_probs = variant_probs
    self.variant_preset = variant_preset
    if isinstance(variant_preset, str) and variant_preset != "all":
        log.warning(f"{variant_preset} selected. Will not use feature combination!")
        use_combinations = False
    self.use_combinations = use_combinations
    self.subsample = subsample

subsample_problems

subsample_problems(td)

Create subproblems starting from seed probabilities depending on their variant. If random seed sampled in [0, 1] in batch is greater than prob, remove the constraint thus, if prob high, it is less likely to remove the constraint (i.e. prob=0.9, 90% chance to keep constraint)

Source code in rl4co/envs/routing/mtvrp/generator.py
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
def subsample_problems(self, td):
    """Create subproblems starting from seed probabilities depending on their variant.
    If random seed sampled in [0, 1] in batch is greater than prob, remove the constraint
    thus, if prob high, it is less likely to remove the constraint (i.e. prob=0.9, 90% chance to keep constraint)
    """
    batch_size = td.batch_size[0]

    variant_probs = torch.tensor(list(self.variant_probs.values()))

    if self.use_combinations:
        # in a batch, multiple variants combinations can be picked
        keep_mask = torch.rand(batch_size, 4) >= variant_probs  # O, TW, L, B
    else:
        # in a batch, only a variant can be picked.
        # we assign a 0.5 prob to the last variant (which is normal cvrp)
        if self.variant_preset in list(
            VARIANT_GENERATION_PRESETS.keys()
        ) and self.variant_preset not in (
            "all",
            "cvrp",
            "single_feat",
            "single_feat_otw",
        ):
            cvrp_prob = 0
        else:
            cvrp_prob = 0.5
        if self.variant_preset in ("all", "cvrp", "single_feat", "single_feat_otw"):
            indices = torch.distributions.Categorical(
                torch.Tensor(list(self.variant_probs.values()) + [cvrp_prob])[
                    None
                ].repeat(batch_size, 1)
            ).sample()
            if self.variant_preset == "single_feat_otw":
                keep_mask = torch.zeros((batch_size, 6), dtype=torch.bool)
                keep_mask[torch.arange(batch_size), indices] = True

                # If keep_mask[:, 4] is True, make both keep_mask[:, 0] and keep_mask[:, 1] True
                keep_mask[:, :2] |= keep_mask[:, 4:5]
            else:
                keep_mask = torch.zeros((batch_size, 5), dtype=torch.bool)
                keep_mask[torch.arange(batch_size), indices] = True
        else:
            # if the variant is specified, we keep the attributes with probability > 0
            keep_mask = torch.zeros((batch_size, 4), dtype=torch.bool)
            indices = torch.nonzero(variant_probs).squeeze()
            keep_mask[:, indices] = True

    td = self._default_open(td, ~keep_mask[:, 0])
    td = self._default_time_window(td, ~keep_mask[:, 1])
    td = self._default_distance_limit(td, ~keep_mask[:, 2])
    td = self._default_backhaul(td, ~keep_mask[:, 3])

    return td

generate_locations

generate_locations(batch_size, num_loc) -> Tensor

Generate seed locations.

Returns:

  • locs ( Tensor ) –

    [B, N+1, 2] where the first location is the depot.

Source code in rl4co/envs/routing/mtvrp/generator.py
317
318
319
320
321
322
323
324
325
326
def generate_locations(self, batch_size, num_loc) -> torch.Tensor:
    """Generate seed locations.

    Returns:
        locs: [B, N+1, 2] where the first location is the depot.
    """
    locs = torch.FloatTensor(*batch_size, num_loc + 1, 2).uniform_(
        self.min_loc, self.max_loc
    )
    return locs

generate_demands

generate_demands(batch_size: int, num_loc: int) -> Tensor

Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation. Initialize the demand for nodes except the depot, which are added during reset. Demand sampling Following Kool et al. (2019), demands as integers between 1 and 10. Generates a slightly different distribution than using torch.randint.

Returns:

  • linehaul_demand ( Tensor ) –

    [B, N]

  • backhaul_demand ( Tensor ) –

    [B, N]

Source code in rl4co/envs/routing/mtvrp/generator.py
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
def generate_demands(self, batch_size: int, num_loc: int) -> torch.Tensor:
    """Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation.
    Initialize the demand for nodes except the depot, which are added during reset.
    Demand sampling Following Kool et al. (2019), demands as integers between 1 and 10.
    Generates a slightly different distribution than using torch.randint.

    Returns:
        linehaul_demand: [B, N]
        backhaul_demand: [B, N]
    """
    linehaul_demand = (
        torch.FloatTensor(*batch_size, num_loc)
        .uniform_(self.min_demand - 1, self.max_demand - 1)
        .int()
        + 1
    ).float()
    # Backhaul demand sampling
    backhaul_demand = (
        torch.FloatTensor(*batch_size, num_loc)
        .uniform_(self.min_backhaul - 1, self.max_backhaul - 1)
        .int()
        + 1
    ).float()
    is_linehaul = torch.rand(*batch_size, num_loc) > self.backhaul_ratio
    backhaul_demand = (
        backhaul_demand * ~is_linehaul
    )  # keep only values where they are not linehauls
    linehaul_demand = linehaul_demand * is_linehaul
    return linehaul_demand, backhaul_demand

generate_time_windows

generate_time_windows(
    locs: Tensor, speed: Tensor
) -> Tensor

Generate time windows (TW) and service times for each location including depot. We refer to the generation process in "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization" (Liu et al., 2024). Note that another way to generate is from "Learning to Delegate for Large-scale Vehicle Routing" (Li et al, 2021) which is used in "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024). Note that however, in that case the distance limit would have no influence when time windows are present, since the tw for depot is the same as distance with speed=1. This function can be overridden for that implementation. See also https://github.com/RoyalSkye/Routing-MVMoE

Parameters:

  • locs (Tensor) –

    [B, N+1, 2] (depot, locs)

  • speed (Tensor) –

    [B]

Returns:

  • time_windows ( Tensor ) –

    [B, N+1, 2]

  • service_time ( Tensor ) –

    [B, N+1]

Source code in rl4co/envs/routing/mtvrp/generator.py
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
def generate_time_windows(
    self,
    locs: torch.Tensor,
    speed: torch.Tensor,
) -> torch.Tensor:
    """Generate time windows (TW) and service times for each location including depot.
    We refer to the generation process in "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization"
    (Liu et al., 2024). Note that another way to generate is from "Learning to Delegate for Large-scale Vehicle Routing" (Li et al, 2021) which
    is used in "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024). Note that however, in that case
    the distance limit would have no influence when time windows are present, since the tw for depot is the same as distance with speed=1.
    This function can be overridden for that implementation.
    See also https://github.com/RoyalSkye/Routing-MVMoE

    Args:
        locs: [B, N+1, 2] (depot, locs)
        speed: [B]

    Returns:
        time_windows: [B, N+1, 2]
        service_time: [B, N+1]
    """

    batch_size, n_loc = locs.shape[0], locs.shape[1] - 1  # no depot

    a, b, c = 0.15, 0.18, 0.2
    service_time = a + (b - a) * torch.rand(batch_size, n_loc)
    tw_length = b + (c - b) * torch.rand(batch_size, n_loc)
    d_0i = get_distance(locs[:, 0:1], locs[:, 1:])
    h_max = (self.max_time - service_time - tw_length) / d_0i * speed - 1
    tw_start = (1 + (h_max - 1) * torch.rand(batch_size, n_loc)) * d_0i / speed
    tw_end = tw_start + tw_length

    # Depot tw is 0, max_time
    time_windows = torch.stack(
        (
            torch.cat((torch.zeros(batch_size, 1), tw_start), -1),  # start
            torch.cat((torch.full((batch_size, 1), self.max_time), tw_end), -1),
        ),  # en
        dim=-1,
    )
    # depot service time is 0
    service_time = torch.cat((torch.zeros(batch_size, 1), service_time), dim=-1)
    return time_windows, service_time  # [B, N+1, 2], [B, N+1]

generate_distance_limit

generate_distance_limit(
    shape: Tuple[int, int], locs: Tensor
) -> Tensor

Generates distance limits (L) and checks their feasibilities.

Returns:

  • distance_limit ( Tensor ) –

    [B, 1]

Source code in rl4co/envs/routing/mtvrp/generator.py
402
403
404
405
406
407
408
409
410
411
412
413
414
415
def generate_distance_limit(
    self, shape: Tuple[int, int], locs: torch.Tensor
) -> torch.Tensor:
    """Generates distance limits (L) and checks their feasibilities.

    Returns:
        distance_limit: [B, 1]
    """
    # calculate distance of all locations to depot
    dist_to_depot = torch.cdist(locs, locs[:, 0:1, :], p=2)
    assert (
        dist_to_depot * 2 < self.distance_limit  # go back and forth
    ).all(), "Distance limit too low, not all nodes can be reached from the depot."
    return torch.full(shape, self.distance_limit, dtype=torch.float32)

generate_open_route

generate_open_route(shape: Tuple[int, int])

Generate open route flags (O). Here we could have a sampler but we simply return True here so all routes are open. Afterwards, we subsample the problems.

Source code in rl4co/envs/routing/mtvrp/generator.py
417
418
419
420
421
def generate_open_route(self, shape: Tuple[int, int]):
    """Generate open route flags (O). Here we could have a sampler but we simply return True here so all
    routes are open. Afterwards, we subsample the problems.
    """
    return torch.ones(shape, dtype=torch.bool)

generate_speed

generate_speed(shape: Tuple[int, int])

We simply generate the speed as constant here

Source code in rl4co/envs/routing/mtvrp/generator.py
423
424
425
426
def generate_speed(self, shape: Tuple[int, int]):
    """We simply generate the speed as constant here"""
    # in this version, the speed is constant but this class may be overridden
    return torch.full(shape, self.speed, dtype=torch.float32)