Routing Problems¶
See also the Multi-Task VRP at the bottom of this page, that includes 16 variants!
Asymmetric Traveling Salesman Problem (ATSP)¶
ATSPEnv
¶
ATSPEnv(
generator: ATSPGenerator = None,
generator_params: dict = {},
**kwargs
)
Bases: RL4COEnvBase
Asymmetric Traveling Salesman Problem (ATSP) environment At each step, the agent chooses a customer to visit. The reward is 0 unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length. Unlike the TSP, the distance matrix is asymmetric, i.e., the distance from A to B is not necessarily the same as the distance from B to A.
Observations
- distance matrix between customers
- the current customer
- the first customer (for calculating the reward)
- the remaining unvisited customers
Constraints
- the tour starts and ends at the same customer.
- each customer must be visited exactly once.
Finish Condition
- the agent has visited all customers.
Reward
- (minus) the negative length of the path.
Parameters:
-
generator
(ATSPGenerator
, default:None
) –ATSPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Source code in rl4co/envs/routing/atsp/env.py
47 48 49 50 51 52 53 54 55 56 57 |
|
ATSPGenerator
¶
ATSPGenerator(
num_loc: int = 10,
min_dist: float = 0.0,
max_dist: float = 1.0,
dist_distribution: (
int | float | str | type | Callable
) = Uniform,
tmat_class: bool = True,
**kwargs
)
Bases: Generator
Data generator for the Asymmetric Travelling Salesman Problem (ATSP) Generate distance matrices inspired by the reference MatNet (Kwon et al., 2021) We satifsy the triangle inequality (TMAT class) in a batch
Parameters:
-
num_loc
(int
, default:10
) –number of locations (customers) in the TSP
-
min_dist
(float
, default:0.0
) –minimum value for the distance between nodes
-
max_dist
(float
, default:1.0
) –maximum value for the distance between nodes
-
dist_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the distance between nodes
-
tmat_class
(bool
, default:True
) –whether to generate a class of distance matrix
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer
Source code in rl4co/envs/routing/atsp/generator.py
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
|
Capacitated Vehicle Routing Problem (CVRP)¶
CVRPEnv
¶
CVRPEnv(
generator: CVRPGenerator = None,
generator_params: dict = {},
**kwargs
)
Bases: RL4COEnvBase
Capacitated Vehicle Routing Problem (CVRP) environment. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
Observations
- location of the depot.
- locations and demand of each customer.
- current location of the vehicle.
- the remaining customer of the vehicle,
Constraints
- the tour starts and ends at the depot.
- each customer must be visited exactly once.
- the vehicle cannot visit customers exceed the remaining capacity.
- the vehicle can return to the depot to refill the capacity.
Finish Condition
- the vehicle has visited all customers and returned to the depot.
Reward
- (minus) the negative length of the path.
Parameters:
-
generator
(CVRPGenerator
, default:None
) –CVRPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Methods:
-
check_solution_validity
–Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded
-
load_data
–Dataset loading from file
-
replace_selected_actions
–Replace selected current actions with updated actions based on
selection_mask
.
Source code in rl4co/envs/routing/cvrp/env.py
57 58 59 60 61 62 63 64 65 66 67 |
|
check_solution_validity
staticmethod
¶
check_solution_validity(td: TensorDict, actions: Tensor)
Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded
Source code in rl4co/envs/routing/cvrp/env.py
158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 |
|
load_data
staticmethod
¶
load_data(fpath, batch_size=[])
Dataset loading from file Normalize demand by capacity to be in [0, 1]
Source code in rl4co/envs/routing/cvrp/env.py
188 189 190 191 192 193 194 195 |
|
replace_selected_actions
¶
replace_selected_actions(
cur_actions: Tensor,
new_actions: Tensor,
selection_mask: Tensor,
) -> Tensor
Replace selected current actions with updated actions based on selection_mask
.
Source code in rl4co/envs/routing/cvrp/env.py
230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 |
|
CVRPGenerator
¶
CVRPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
depot_distribution: (
int | float | str | type | Callable
) = None,
min_demand: int = 1,
max_demand: int = 10,
demand_distribution: (
int | float | type | Callable
) = Uniform,
vehicle_capacity: float = 1.0,
capacity: float = None,
**kwargs
)
Bases: Generator
Data generator for the Capacitated Vehicle Routing Problem (CVRP).
Parameters:
-
num_loc
(int
, default:20
) –number of locations (cities) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
-
min_loc
(float
, default:0.0
) –minimum value for the location coordinates
-
max_loc
(float
, default:1.0
) –maximum value for the location coordinates
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the location coordinates
-
depot_distribution
(int | float | str | type | Callable
, default:None
) –distribution for the depot location. If None, sample the depot from the locations
-
min_demand
(int
, default:1
) –minimum value for the demand of each customer
-
max_demand
(int
, default:10
) –maximum value for the demand of each customer
-
demand_distribution
(int | float | type | Callable
, default:Uniform
) –distribution for the demand of each customer
-
capacity
(float
, default:None
) –capacity of the vehicle
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size]: capacity of the vehicle
Source code in rl4co/envs/routing/cvrp/generator.py
55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 |
|
Multiple Traveling Salesman Problem (mTSP)¶
MTSPEnv
¶
MTSPEnv(
generator: MTSPGenerator = None,
generator_params: dict = {},
cost_type: str = "minmax",
**kwargs
)
Bases: RL4COEnvBase
Multiple Traveling Salesman Problem environment
At each step, an agent chooses to visit a city. A maximum of num_agents
agents can be employed to visit the cities.
The cost can be defined in two ways:
- `minmax`: (default) the reward is the maximum of the path lengths of all the agents
- `sum`: the cost is the sum of the path lengths of all the agents
Reward is - cost, so the goal is to maximize the reward (minimize the cost).
Observations
- locations of the depot and each customer.
- number of agents.
- the current agent index.
- the current location of the vehicle.
Constrains
- each agent's tour starts and ends at the depot.
- each customer must be visited exactly once.
Finish condition
- all customers are visited and all agents back to the depot.
Reward
There are two ways to calculate the cost (-reward):
minmax
: (default) the cost is the maximum of the path lengths of all the agents.sum
: the cost is the sum of the path lengths of all the agents.
Parameters:
-
cost_type
(str
, default:'minmax'
) –type of cost to use, either
minmax
orsum
-
generator
(MTSPGenerator
, default:None
) –MTSPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Source code in rl4co/envs/routing/mtsp/env.py
50 51 52 53 54 55 56 57 58 59 60 61 62 |
|
MTSPGenerator
¶
MTSPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
min_num_agents: int = 5,
max_num_agents: int = 5,
**kwargs
)
Bases: Generator
Data generator for the Multiple Travelling Salesman Problem (mTSP).
Parameters:
-
num_loc
(int
, default:20
) –number of locations (customers) in the TSP
-
min_loc
(float
, default:0.0
) –minimum value for the location coordinates
-
max_loc
(float
, default:1.0
) –maximum value for the location coordinates
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the location coordinates
-
min_num_agents
(int
, default:5
) –minimum number of agents (vehicles), include
-
max_num_agents
(int
, default:5
) –maximum number of agents (vehicles), include
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer num_agents [batch_size]: number of agents (vehicles)
Source code in rl4co/envs/routing/mtsp/generator.py
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
|
Orienteering Problem (OP)¶
OPEnv
¶
OPEnv(
generator: OPGenerator = None,
generator_params: dict = {},
prize_type: str = "dist",
**kwargs
)
Bases: RL4COEnvBase
Orienteering Problem (OP) environment. At each step, the agent chooses a location to visit in order to maximize the collected prize. The total length of the path must not exceed a given threshold.
Observations
- location of the depot
- locations and prize of each customer
- current location of the vehicle
- current tour length
- current total prize
- the remaining length of the path
Constraints
- the tour starts and ends at the depot
- not all customers need to be visited
- the vehicle cannot visit customers exceed the remaining length of the path
Finish Condition
- the vehicle back to the depot
Reward
- the sum of the prizes of visited nodes
Parameters:
-
generator
(OPGenerator
, default:None
) –OPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Methods:
-
get_action_mask
–Get action mask with 1 = feasible action, 0 = infeasible action.
-
check_solution_validity
–Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.
Source code in rl4co/envs/routing/op/env.py
50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 |
|
get_action_mask
staticmethod
¶
get_action_mask(td: TensorDict) -> Tensor
Get action mask with 1 = feasible action, 0 = infeasible action. Cannot visit if already visited, if depot has been visited, or if the length exceeds the maximum length.
Source code in rl4co/envs/routing/op/env.py
148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 |
|
check_solution_validity
staticmethod
¶
check_solution_validity(
td: TensorDict,
actions: Tensor,
add_distance_to_depot: bool = True,
) -> None
Check that solution is valid: nodes are not visited twice except depot and capacity is not exceeded.
If add_distance_to_depot
if True, then the distance to the depot is added to max length since by default, the max length is
modified in the reset function to account for the distance to the depot.
Source code in rl4co/envs/routing/op/env.py
177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 |
|
OPGenerator
¶
OPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
depot_distribution: (
int | float | str | type | Callable
) = None,
min_prize: float = 1.0,
max_prize: float = 1.0,
prize_distribution: (
int | float | type | Callable
) = Uniform,
prize_type: str = "dist",
max_length: float | Tensor = None,
**kwargs
)
Bases: Generator
Data generator for the Orienteering Problem (OP).
Parameters:
-
num_loc
(int
, default:20
) –number of locations (customers) in the OP, without the depot. (e.g. 10 means 10 locs + 1 depot)
-
min_loc
(float
, default:0.0
) –minimum value for the location coordinates
-
max_loc
(float
, default:1.0
) –maximum value for the location coordinates
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the location coordinates
-
depot_distribution
(int | float | str | type | Callable
, default:None
) –distribution for the depot location. If None, sample the depot from the locations
-
min_prize
(float
, default:1.0
) –minimum value for the prize of each customer
-
max_prize
(float
, default:1.0
) –maximum value for the prize of each customer
-
prize_distribution
(int | float | type | Callable
, default:Uniform
) –distribution for the prize of each customer
-
max_length
(float | Tensor
, default:None
) –maximum length of the path
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot prize [batch_size, num_loc]: prize of each customer max_length [batch_size, 1]: maximum length of the path for each customer
Source code in rl4co/envs/routing/op/generator.py
39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 |
|
Pickup and Delivery Problem (PDP)¶
PDPEnv
¶
PDPEnv(
generator: PDPGenerator = None,
generator_params: dict = {},
force_start_at_depot: bool = False,
**kwargs
)
Bases: RL4COEnvBase
Pickup and Delivery Problem (PDP) environment. The environment is made of num_loc + 1 locations (cities):
- 1 depot
- `num_loc` / 2 pickup locations
- `num_loc` / 2 delivery locations
The goal is to visit all the pickup and delivery locations in the shortest path possible starting from the depot The conditions is that the agent must visit a pickup location before visiting its corresponding delivery location
Observations
- locations of the depot, pickup, and delivery locations
- current location of the vehicle
- the remaining locations to deliver
- the visited locations
- the current step
Constraints
- the tour starts and ends at the depot
- each pickup location must be visited before its corresponding delivery location
- the vehicle cannot visit the same location twice
Finish Condition
- the vehicle has visited all locations
Reward
- (minus) the negative length of the path
Parameters:
-
generator
(PDPGenerator
, default:None
) –PDPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
-
force_start_at_depot
(bool
, default:False
) –whether to force the agent to start at the depot If False (default), the agent won't consider the depot, which is added in the
get_reward
method If True, the only valid action at the first step is to visit the depot (=0)
Methods:
-
get_num_starts
–Only half of the nodes (i.e. pickup nodes) can be start nodes
-
select_start_nodes
–Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected
Source code in rl4co/envs/routing/pdp/env.py
52 53 54 55 56 57 58 59 60 61 62 63 64 |
|
get_num_starts
¶
get_num_starts(td)
Only half of the nodes (i.e. pickup nodes) can be start nodes
Source code in rl4co/envs/routing/pdp/env.py
228 229 230 |
|
select_start_nodes
¶
select_start_nodes(td, num_starts)
Only nodes from [1 : num_loc // 2 +1] (i.e. pickups) can be selected
Source code in rl4co/envs/routing/pdp/env.py
232 233 234 235 236 237 238 239 240 |
|
PDPGenerator
¶
PDPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
init_sol_type: str = "random",
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
depot_distribution: (
int | float | str | type | Callable
) = None,
**kwargs
)
Bases: Generator
Data generator for the Pickup and Delivery Problem (PDP). Args: num_loc: number of locations (customers) in the PDP, without the depot. (e.g. 10 means 10 locs + 1 depot)
- 1 depot
- `num_loc` / 2 pickup locations
- `num_loc` / 2 delivery locations
min_loc: minimum value for the location coordinates
max_loc: maximum value for the location coordinates
init_sol_type: the method type used for generating initial solutions (random or greedy)
loc_distribution: distribution for the location coordinates
depot_distribution: distribution for the depot location. If None, sample the depot from the locations
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer depot [batch_size, 2]: location of the depot
Source code in rl4co/envs/routing/pdp/generator.py
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 |
|
Prize Collecting Traveling Salesman Problem (PCTSP)¶
PCTSPEnv
¶
PCTSPEnv(
generator: PCTSPGenerator = None,
generator_params: dict = {},
**kwargs
)
Bases: RL4COEnvBase
Prize-collecting TSP (PCTSP) environment. The goal is to collect as much prize as possible while minimizing the total travel cost. The environment is stochastic, the prize is only revealed when the node is visited.
Observations
- locations of the nodes
- prize and penalty of each node
- current location of the vehicle
- current total prize
- current total penalty
- visited nodes
- prize required to visit a node
- the current step
Constraints
- the tour starts and ends at the depot
- the vehicle cannot visit nodes exceed the remaining prize
Finish Condition
- the vehicle back to the depot
Reward
- the sum of the saved penalties
Parameters:
-
generator
(PCTSPGenerator
, default:None
) –OPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Methods:
-
get_action_mask
–Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes
-
check_solution_validity
–Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited
Source code in rl4co/envs/routing/pctsp/env.py
52 53 54 55 56 57 58 59 60 61 62 |
|
get_action_mask
staticmethod
¶
get_action_mask(td: TensorDict) -> Tensor
Cannot visit depot if not yet collected 1 total prize and there are unvisited nodes
Source code in rl4co/envs/routing/pctsp/env.py
149 150 151 152 153 154 155 156 |
|
check_solution_validity
staticmethod
¶
check_solution_validity(
td: TensorDict, actions: Tensor
) -> None
Check that the solution is valid, i.e. contains all nodes once at most, and either prize constraint is met or all nodes are visited
Source code in rl4co/envs/routing/pctsp/env.py
180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 |
|
PCTSPGenerator
¶
PCTSPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
depot_distribution: (
int | float | str | type | Callable
) = None,
penalty_factor: float = 3.0,
prize_required: float = 1.0,
**kwargs
)
Bases: Generator
Data generator for the Prize-collecting Traveling Salesman Problem (PCTSP).
Parameters:
-
num_loc
(int
, default:20
) –number of locations (customers) in the VRP, without the depot. (e.g. 10 means 10 locs + 1 depot)
-
min_loc
(float
, default:0.0
) –minimum value for the location coordinates
-
max_loc
(float
, default:1.0
) –maximum value for the location coordinates
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the location coordinates
-
depot_distribution
(int | float | str | type | Callable
, default:None
) –distribution for the depot location. If None, sample the depot from the locations
-
min_demand
–minimum value for the demand of each customer
-
max_demand
–maximum value for the demand of each customer
-
demand_distribution
–distribution for the demand of each customer
-
capacity
–capacity of the vehicle
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each city depot [batch_size, 2]: location of the depot demand [batch_size, num_loc]: demand of each customer capacity [batch_size, 1]: capacity of the vehicle
Source code in rl4co/envs/routing/pctsp/generator.py
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 |
|
Split Delivery Vehicle Routing Problem (SDVRP)¶
SDVRPEnv
¶
SDVRPEnv(
generator: CVRPGenerator = None,
generator_params: dict = {},
**kwargs
)
Bases: CVRPEnv
Split Delivery Vehicle Routing Problem (SDVRP) environment. SDVRP is a generalization of CVRP, where nodes can be visited multiple times and a fraction of the demand can be met. At each step, the agent chooses a customer to visit depending on the current location and the remaining capacity. When the agent visits a customer, the remaining capacity is updated. If the remaining capacity is not enough to visit any customer, the agent must go back to the depot. The reward is the -infinite unless the agent visits all the customers. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
Observations
- location of the depot.
- locations and demand/remaining demand of each customer
- current location of the vehicle.
- the remaining capacity of the vehicle.
Constraints
- the tour starts and ends at the depot.
- each customer can be visited multiple times.
- the vehicle cannot visit customers exceed the remaining capacity.
- the vehicle can return to the depot to refill the capacity.
Finish Condition
- the vehicle has finished all customers demand and returned to the depot.
Reward
- (minus) the negative length of the path.
Parameters:
-
generator
(CVRPGenerator
, default:None
) –CVRPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Methods:
-
check_solution_validity
–Check that the solution is valid (all demand is satisfied)
Source code in rl4co/envs/routing/sdvrp/env.py
50 51 52 53 54 55 56 |
|
check_solution_validity
staticmethod
¶
check_solution_validity(
td: TensorDict, actions: Tensor
) -> None
Check that the solution is valid (all demand is satisfied)
Source code in rl4co/envs/routing/sdvrp/env.py
137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 |
|
Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP)¶
SPCTSPEnv
¶
SPCTSPEnv(**kwargs)
Bases: PCTSPEnv
Stochastic Prize Collecting Traveling Salesman Problem (SPCTSP) environment.
Note
The only difference with deterministic PCTSP is that the prizes are stochastic (i.e. the expected prize is not the same as the real prize).
Source code in rl4co/envs/routing/spctsp/env.py
19 20 |
|
Traveling Salesman Problem (TSP)¶
TSPEnv
¶
TSPEnv(
generator: TSPGenerator = None,
generator_params: dict = {},
**kwargs
)
Bases: RL4COEnvBase
Traveling Salesman Problem (TSP) environment At each step, the agent chooses a city to visit. The reward is 0 unless the agent visits all the cities. In that case, the reward is (-)length of the path: maximizing the reward is equivalent to minimizing the path length.
Observations
- locations of each customer.
- the current location of the vehicle.
Constraints
- the tour must return to the starting customer.
- each customer must be visited exactly once.
Finish condition
- the agent has visited all customers and returned to the starting customer.
Reward
- (minus) the negative length of the path.
Parameters:
-
generator
(TSPGenerator
, default:None
) –TSPGenerator instance as the data generator
-
generator_params
(dict
, default:{}
) –parameters for the generator
Methods:
-
check_solution_validity
–Check that solution is valid: nodes are visited exactly once
-
replace_selected_actions
–Replace selected current actions with updated actions based on
selection_mask
.
Source code in rl4co/envs/routing/tsp/env.py
50 51 52 53 54 55 56 57 58 59 60 |
|
check_solution_validity
staticmethod
¶
check_solution_validity(
td: TensorDict, actions: Tensor
) -> None
Check that solution is valid: nodes are visited exactly once
Source code in rl4co/envs/routing/tsp/env.py
160 161 162 163 164 165 166 167 168 |
|
replace_selected_actions
¶
replace_selected_actions(
cur_actions: Tensor,
new_actions: Tensor,
selection_mask: Tensor,
) -> Tensor
Replace selected current actions with updated actions based on selection_mask
.
Source code in rl4co/envs/routing/tsp/env.py
170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 |
|
TSPGenerator
¶
TSPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
init_sol_type: str = "random",
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
**kwargs
)
Bases: Generator
Data generator for the Travelling Salesman Problem (TSP).
Parameters:
-
num_loc
(int
, default:20
) –number of locations (customers) in the TSP
-
min_loc
(float
, default:0.0
) –minimum value for the location coordinates
-
max_loc
(float
, default:1.0
) –maximum value for the location coordinates
-
init_sol_type
(str
, default:'random'
) –the method type used for generating initial solutions (random or greedy)
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –distribution for the location coordinates
Returns:
-
–
A TensorDict with the following keys: locs [batch_size, num_loc, 2]: locations of each customer
Source code in rl4co/envs/routing/tsp/generator.py
29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
|
Multi-Task Vehicle Routing Problem (MTVRP)¶
MTVRPEnv
¶
MTVRPEnv(
generator: MTVRPGenerator = None,
generator_params: dict = {},
check_solution: bool = False,
**kwargs
)
Bases: RL4COEnvBase
MTVRPEnv is a Multi-Task VRP environment which can take any combination of the following constraints:
Features:
- Capacity (C) - Each vehicle has a maximum capacity \(Q\), restricting the total load that can be in the vehicle at any point of the route. - The route must be planned such that the sum of demands and pickups for all customers visited does not exceed this capacity.
- Time Windows (TW) - Every node \(i\) has an associated time window \([e_i, l_i]\) during which service must commence. - Additionally, each node has a service time \(s_i\). Vehicles must reach node \(i\) within its time window; early arrivals must wait at the node location until time \(e_i\).
- Open Routes (O) - Vehicles are not required to return to the depot after serving all customers. - Note that this does not need to be counted as a constraint since it can be modelled by setting zero costs on arcs returning to the depot \(c_{i0} = 0\) from any customer \(i \in C\), and not counting the return arc as part of the route.
- Backhauls (B) - Backhauls generalize demand to also account for return shipments. Customers are either linehaul or backhaul customers. - Linehaul customers require delivery of a demand \(q_i > 0\) that needs to be transported from the depot to the customer, whereas backhaul customers need a pickup of an amount \(p_i > 0\) that is transported from the client back to the depot. - It is possible for vehicles to serve a combination of linehaul and backhaul customers in a single route, but then any linehaul customers must precede the backhaul customers in the route.
- Duration Limits (L) - Imposes a limit on the total travel duration (or length) of each route, ensuring a balanced workload across vehicles.
The environment covers the following 16 variants depending on the data generation:
VRP Variant | Capacity (C) | Open Route (O) | Backhaul (B) | Duration Limit (L) | Time Window (TW) |
---|---|---|---|---|---|
CVRP | ✔ | ||||
OVRP | ✔ | ✔ | |||
VRPB | ✔ | ✔ | |||
VRPL | ✔ | ✔ | |||
VRPTW | ✔ | ✔ | |||
OVRPTW | ✔ | ✔ | ✔ | ||
OVRPB | ✔ | ✔ | ✔ | ||
OVRPL | ✔ | ✔ | ✔ | ||
VRPBL | ✔ | ✔ | ✔ | ||
VRPBTW | ✔ | ✔ | ✔ | ||
VRPLTW | ✔ | ✔ | ✔ | ||
OVRPBL | ✔ | ✔ | ✔ | ✔ | |
OVRPBTW | ✔ | ✔ | ✔ | ✔ | |
OVRPLTW | ✔ | ✔ | ✔ | ✔ | |
VRPBLTW | ✔ | ✔ | ✔ | ✔ | |
OVRPBLTW | ✔ | ✔ | ✔ | ✔ | ✔ |
You may also check out the following papers as reference:
- "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization" (Liu et al, 2024)
- "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024)
- "RouteFinder: Towards Foundation Models for Vehicle Routing Problems" (Berto et al, 2024)
Tip
Have a look at https://pyvrp.org/ for more information about VRP and its variants and their solutions. Kudos to their help and great job!
Parameters:
-
generator
(MTVRPGenerator
, default:None
) –Generator for the environment, see :class:
MTVRPGenerator
. -
generator_params
(dict
, default:{}
) –Parameters for the generator.
Methods:
-
load_data
–Dataset loading from file
-
render
–Simple wrapper for render function
-
select_start_nodes
–Select available start nodes for the environment (e.g. for POMO-based training)
-
solve
–Classical solver for the environment. This is a wrapper for the baselines solver.
-
check_variants
–Check if the problem has the variants
Source code in rl4co/envs/routing/mtvrp/env.py
75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 |
|
load_data
¶
load_data(fpath, batch_size=[], scale=False)
Dataset loading from file Normalize demand by capacity to be in [0, 1]
Source code in rl4co/envs/routing/mtvrp/env.py
362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 |
|
render
staticmethod
¶
render(*args, **kwargs)
Simple wrapper for render function
Source code in rl4co/envs/routing/mtvrp/env.py
378 379 380 381 382 383 |
|
select_start_nodes
¶
select_start_nodes(td, num_starts)
Select available start nodes for the environment (e.g. for POMO-based training)
Source code in rl4co/envs/routing/mtvrp/env.py
385 386 387 388 389 390 391 392 393 |
|
solve
staticmethod
¶
solve(
instances: TensorDict,
max_runtime: float,
num_procs: int = 1,
solver: str = "pyvrp",
**kwargs
) -> tuple[Tensor, Tensor]
Classical solver for the environment. This is a wrapper for the baselines solver.
Available solvers are: pyvrp
, ortools
, lkh
. Returns the actions and costs.
Source code in rl4co/envs/routing/mtvrp/env.py
395 396 397 398 399 400 401 402 403 404 405 406 407 408 |
|
check_variants
staticmethod
¶
check_variants(td)
Check if the problem has the variants
Source code in rl4co/envs/routing/mtvrp/env.py
457 458 459 460 461 462 463 464 |
|
MTVRPGenerator
¶
MTVRPGenerator(
num_loc: int = 20,
min_loc: float = 0.0,
max_loc: float = 1.0,
loc_distribution: (
int | float | str | type | Callable
) = Uniform,
capacity: float = None,
min_demand: int = 1,
max_demand: int = 10,
min_backhaul: int = 1,
max_backhaul: int = 10,
scale_demand: bool = True,
max_time: float = 4.6,
backhaul_ratio: float = 0.2,
distance_limit: float = 3.0,
speed: float = 1.0,
prob_open: float = 0.5,
prob_time_window: float = 0.5,
prob_limit: float = 0.5,
prob_backhaul: float = 0.5,
variant_preset=None,
use_combinations=True,
subsample=True,
**kwargs
)
Bases: Generator
MTVRP Generator. Class to generate instances of the MTVRP problem. If a variant is declared and Subsample is True, the generator will sample the problem based on the variant probabilities. By default, we use Mixed-Batch Training as in Berto et al. 2024 (RouteFinder), i.e. one batch can contain multiple variants.
Example presets:
- "all": Sample uniformly from 16 variants
- "single_feat": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW (as done in Liu et al. 2024 (MTPOMO))
- "single_feat_otw": Sample uniformly between CVRP, OVRP, VRPB, VRPL, VRPTW, OVRPTW (as done in Zhou et al. 2024 (MVMoE))
- "cvrp": Only CVRP (similarly for other variants)
Parameters:
-
num_loc
(int
, default:20
) –Number of locations to generate
-
min_loc
(float
, default:0.0
) –Minimum location value
-
max_loc
(float
, default:1.0
) –Maximum location value
-
loc_distribution
(int | float | str | type | Callable
, default:Uniform
) –Distribution to sample locations from
-
capacity
(float
, default:None
) –Vehicle capacity. If None, get value based on
get_vehicle_capacity
-
min_demand
(int
, default:1
) –Minimum demand value
-
max_demand
(int
, default:10
) –Maximum demand value
-
min_backhaul
(int
, default:1
) –Minimum backhaul value
-
max_backhaul
(int
, default:10
) –Maximum backhaul value
-
scale_demand
(bool
, default:True
) –Scale demand values (by default, generate between 1 and 10)
-
max_time
(float
, default:4.6
) –Maximum time window value (at depot)
-
backhaul_ratio
(float
, default:0.2
) –Fraction of backhauls (e.g. 0.2 means 20% of nodes are backhaul)
-
distance_limit
(float
, default:3.0
) –Distance limit
-
speed
(float
, default:1.0
) –Speed of vehicle. Defaults to 1
-
subsample
–If False, we always sample all attributes (i.e., OVRPBLTW) If true, we use the
-
**kwargs
–Additional keyword arguments
Methods:
-
subsample_problems
–Create subproblems starting from seed probabilities depending on their variant.
-
generate_locations
–Generate seed locations.
-
generate_demands
–Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation.
-
generate_time_windows
–Generate time windows (TW) and service times for each location including depot.
-
generate_distance_limit
–Generates distance limits (L) and checks their feasibilities.
-
generate_open_route
–Generate open route flags (O). Here we could have a sampler but we simply return True here so all
-
generate_speed
–We simply generate the speed as constant here
Source code in rl4co/envs/routing/mtvrp/generator.py
92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 |
|
subsample_problems
¶
subsample_problems(td)
Create subproblems starting from seed probabilities depending on their variant. If random seed sampled in [0, 1] in batch is greater than prob, remove the constraint thus, if prob high, it is less likely to remove the constraint (i.e. prob=0.9, 90% chance to keep constraint)
Source code in rl4co/envs/routing/mtvrp/generator.py
235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 |
|
generate_locations
¶
generate_locations(batch_size, num_loc) -> Tensor
Generate seed locations.
Returns:
-
locs
(Tensor
) –[B, N+1, 2] where the first location is the depot.
Source code in rl4co/envs/routing/mtvrp/generator.py
317 318 319 320 321 322 323 324 325 326 |
|
generate_demands
¶
Classical lineahul demand / delivery from depot (C) and backhaul demand / pickup to depot (B) generation. Initialize the demand for nodes except the depot, which are added during reset. Demand sampling Following Kool et al. (2019), demands as integers between 1 and 10. Generates a slightly different distribution than using torch.randint.
Returns:
Source code in rl4co/envs/routing/mtvrp/generator.py
328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 |
|
generate_time_windows
¶
Generate time windows (TW) and service times for each location including depot. We refer to the generation process in "Multi-Task Learning for Routing Problem with Cross-Problem Zero-Shot Generalization" (Liu et al., 2024). Note that another way to generate is from "Learning to Delegate for Large-scale Vehicle Routing" (Li et al, 2021) which is used in "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts" (Zhou et al, 2024). Note that however, in that case the distance limit would have no influence when time windows are present, since the tw for depot is the same as distance with speed=1. This function can be overridden for that implementation. See also https://github.com/RoyalSkye/Routing-MVMoE
Parameters:
Returns:
Source code in rl4co/envs/routing/mtvrp/generator.py
358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 |
|
generate_distance_limit
¶
Generates distance limits (L) and checks their feasibilities.
Returns:
-
distance_limit
(Tensor
) –[B, 1]
Source code in rl4co/envs/routing/mtvrp/generator.py
402 403 404 405 406 407 408 409 410 411 412 413 414 415 |
|
generate_open_route
¶
Generate open route flags (O). Here we could have a sampler but we simply return True here so all routes are open. Afterwards, we subsample the problems.
Source code in rl4co/envs/routing/mtvrp/generator.py
417 418 419 420 421 |
|
generate_speed
¶
We simply generate the speed as constant here
Source code in rl4co/envs/routing/mtvrp/generator.py
423 424 425 426 |
|