Constructive Non-Autoregressive¶
DeepACO¶
Classes:

- AntSystem – Implements the Ant System algorithm: https://doi.org/10.1109/3477.484436.

AntSystem¶
AntSystem(
log_heuristic: Tensor,
n_ants: int = 20,
alpha: float = 1.0,
beta: float = 1.0,
decay: float = 0.95,
Q: Optional[float] = None,
pheromone: Optional[Tensor | int] = None,
use_local_search: bool = False,
use_nls: bool = False,
n_perturbations: int = 1,
local_search_params: dict = {},
perturbation_params: dict = {},
)
Implements the Ant System algorithm: https://doi.org/10.1109/3477.484436.
Parameters:

- log_heuristic (Tensor) – Logarithm of the heuristic matrix.
- n_ants (int, default: 20) – Number of ants to be used in the algorithm. Defaults to 20.
- alpha (float, default: 1.0) – Importance of pheromone in the decision-making process. Defaults to 1.0.
- beta (float, default: 1.0) – Importance of heuristic information in the decision-making process. Defaults to 1.0.
- decay (float, default: 0.95) – Rate at which pheromone evaporates. Should be between 0 and 1. Defaults to 0.95.
- Q (Optional[float], default: None) – Rate at which pheromone deposits. Defaults to 1 / n_ants.
- pheromone (Optional[Tensor | int], default: None) – Initial pheromone matrix. Defaults to torch.ones_like(log_heuristic).
- use_local_search (bool, default: False) – Whether to use the local_search provided by the env. Defaults to False.
- use_nls (bool, default: False) – Whether to use the neural-guided local search provided by the env. Defaults to False.
- n_perturbations (int, default: 1) – Number of perturbations to be used for NLS. Defaults to 1.
- local_search_params (dict, default: {}) – Arguments to be passed to the local_search.
- perturbation_params (dict, default: {}) – Arguments to be passed to the perturbation used for NLS.
Methods:

- run – Run the Ant System algorithm for a specified number of iterations.
- local_search – Perform local search on the actions and reward obtained.
Source code in rl4co/models/zoo/deepaco/antsystem.py
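The following is a minimal sketch of constructing an AntSystem directly from the signature above; the batch size, instance size, and random heuristic are illustrative assumptions, not values from the source.

```python
import torch

from rl4co.models.zoo.deepaco.antsystem import AntSystem

# Hypothetical dense heuristic for a batch of one 20-node instance;
# AntSystem expects the *logarithm* of the heuristic matrix.
heuristic = torch.rand(1, 20, 20).clamp_min(1e-10)

aco = AntSystem(
    log_heuristic=heuristic.log(),
    n_ants=20,    # ants per instance
    alpha=1.0,    # weight of pheromone trails
    beta=1.0,     # weight of heuristic information
    decay=0.95,   # pheromone evaporation rate
)
```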
run¶
run(
td_initial: TensorDict,
env: RL4COEnvBase,
n_iterations: int,
decoding_kwargs: dict,
disable_tqdm: bool = True,
) -> Tuple[Tensor, dict[int, Tensor]]
Run the Ant System algorithm for a specified number of iterations.
Parameters:

- td_initial (TensorDict) – Initial state of the problem.
- env (RL4COEnvBase) – Environment representing the problem.
- n_iterations (int) – Number of iterations to run the algorithm.
- decoding_kwargs (dict) – Keyword arguments for the decoding strategy.
- disable_tqdm (bool, default: True) – Whether to disable the tqdm progress bar. Defaults to True.
Returns:

- td (Tensor) – The final state of the problem.
- actions (dict[int, Tensor]) – The final actions chosen by the algorithm.
- reward – The final reward achieved by the algorithm.
Source code in rl4co/models/zoo/deepaco/antsystem.py
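Continuing the sketch above, a hedged example of running the solver on a TSP instance; the env construction and the two-value unpacking follow the annotated signature and the usual RL4CO env interface, not code shown on this page.

```python
from rl4co.envs import TSPEnv

env = TSPEnv(generator_params=dict(num_loc=20))
td_init = env.reset(batch_size=[1])

# Per the annotated return type, run() yields the final state plus a
# dict[int, Tensor] of actions (assumed keyed by iteration); an empty
# decoding_kwargs keeps the decoding defaults.
td_final, actions = aco.run(td_init, env, n_iterations=20, decoding_kwargs={})
```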
local_search¶
local_search(
td: TensorDict,
env: RL4COEnvBase,
actions: Tensor,
decoding_kwargs: dict,
) -> Tuple[Tensor, Tensor]
Perform local search on the actions and reward obtained.
Parameters:

- td (TensorDict) – Current state of the problem.
- env (RL4COEnvBase) – Environment representing the problem.
- actions (Tensor) – Actions chosen by the algorithm.
- decoding_kwargs (dict) – Keyword arguments for the decoding strategy.

Returns:

- Tuple[Tensor, Tensor] – The improved actions and the corresponding reward.
Source code in rl4co/models/zoo/deepaco/antsystem.py
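A sketch of refining constructed tours, continuing the example above; that the two returned tensors are the improved actions and their reward is an inference from the docstring and signature. Assumes use_local_search=True at construction and an env exposing a local_search operator (e.g., 2-opt for TSP).

```python
# Hypothetical: pick one iteration's actions out of the run() output.
tours = next(iter(actions.values()))

improved_actions, improved_reward = aco.local_search(
    td_final, env, tours, decoding_kwargs={}
)
```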
Classes:

- DeepACO – Implements DeepACO: https://arxiv.org/abs/2309.14032.

DeepACO¶
DeepACO(
env: RL4COEnvBase,
policy: Optional[DeepACOPolicy] = None,
baseline: Union[REINFORCEBaseline, str] = "no",
train_with_local_search: bool = True,
ls_reward_aug_W: float = 0.95,
policy_kwargs: dict = {},
baseline_kwargs: dict = {},
**kwargs
)
Bases: REINFORCE
Implements DeepACO: https://arxiv.org/abs/2309.14032.
Parameters:

- env (RL4COEnvBase) – Environment to use for the algorithm
- policy (Optional[DeepACOPolicy], default: None) – Policy to use for the algorithm
- baseline (Union[REINFORCEBaseline, str], default: 'no') – REINFORCE baseline. Defaults to "no" because the shared baseline is manually implemented.
- train_with_local_search (bool, default: True) – Whether to train with local search. Defaults to True.
- ls_reward_aug_W (float, default: 0.95) – Coefficient used for the reward augmentation with the local search. Defaults to 0.95.
- policy_kwargs (dict, default: {}) – Keyword arguments for the policy
- baseline_kwargs (dict, default: {}) – Keyword arguments for the baseline
- **kwargs – Keyword arguments passed to the superclass
Methods:

- calculate_loss – Calculate loss for REINFORCE algorithm.
Source code in rl4co/models/zoo/deepaco/model.py
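A training sketch under the usual RL4CO Lightning workflow; the import paths and trainer usage mirror standard RL4CO examples and are not taken from this page, so treat them as assumptions.

```python
from rl4co.envs import TSPEnv
from rl4co.models.zoo.deepaco import DeepACO
from rl4co.utils.trainer import RL4COTrainer

env = TSPEnv(generator_params=dict(num_loc=50))
model = DeepACO(
    env,
    train_with_local_search=True,   # augment rewards with local-search results
    ls_reward_aug_W=0.95,           # weight of the local-search reward term
    policy_kwargs=dict(n_ants=20),  # forwarded to DeepACOPolicy
)

trainer = RL4COTrainer(max_epochs=3, devices=1)
trainer.fit(model)
```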
calculate_loss¶
calculate_loss(
td: TensorDict,
batch: TensorDict,
policy_out: dict,
reward: Optional[Tensor] = None,
log_likelihood: Optional[Tensor] = None,
)
Calculate loss for REINFORCE algorithm.
Parameters:

- td (TensorDict) – TensorDict containing the current state of the environment
- batch (TensorDict) – Batch of data. This is used to get the extra loss terms, e.g., the REINFORCE baseline
- policy_out (dict) – Output of the policy network
- reward (Optional[Tensor], default: None) – Reward tensor. If None, it is taken from policy_out
- log_likelihood (Optional[Tensor], default: None) – Log-likelihood tensor. If None, it is taken from policy_out
Source code in rl4co/models/zoo/deepaco/model.py
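Schematically, calculate_loss implements the REINFORCE estimator sketched below. This is a simplified illustration, not the library's exact code; as noted above, DeepACO implements the shared baseline manually rather than through a REINFORCEBaseline object.

```python
import torch

def reinforce_loss(
    reward: torch.Tensor,          # e.g. [batch, n_ants]
    log_likelihood: torch.Tensor,  # e.g. [batch, n_ants]
    baseline_val: torch.Tensor,    # broadcastable to reward
) -> torch.Tensor:
    """Schematic REINFORCE loss: minimize -(advantage * log-likelihood)."""
    advantage = reward - baseline_val
    return -(advantage * log_likelihood).mean()

# E.g., with the mean reward over ants as a shared baseline:
# loss = reinforce_loss(reward, logprobs, reward.mean(-1, keepdim=True))
```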
Classes:

- DeepACOPolicy – Implements the DeepACO policy based on :class:NonAutoregressivePolicy. Introduced by Ye et al. (2023): https://arxiv.org/abs/2309.14032.

DeepACOPolicy¶
DeepACOPolicy(
encoder: Optional[NonAutoregressiveEncoder] = None,
env_name: str = "tsp",
temperature: float = 1.0,
top_p: float = 0.0,
top_k: int = 0,
aco_class: Optional[Type[AntSystem]] = None,
aco_kwargs: dict = {},
train_with_local_search: bool = False,
n_ants: Optional[Union[int, dict]] = None,
n_iterations: Optional[Union[int, dict]] = None,
start_node: Optional[int] = None,
multistart: bool = False,
k_sparse: Optional[int] = None,
**encoder_kwargs
)
Bases: NonAutoregressivePolicy
Implements the DeepACO policy based on :class:NonAutoregressivePolicy. Introduced by Ye et al. (2023): https://arxiv.org/abs/2309.14032.
This policy uses a Non-Autoregressive Graph Neural Network to generate heatmaps,
which are then used to run Ant Colony Optimization (ACO) to construct solutions.
Parameters:

- encoder (Optional[NonAutoregressiveEncoder], default: None) – Encoder module. Can be passed by sub-classes
- env_name (str, default: 'tsp') – Name of the environment used to initialize embeddings
- temperature (float, default: 1.0) – Temperature for the softmax during decoding. Defaults to 1.0.
- aco_class (Optional[Type[AntSystem]], default: None) – Class representing the ACO algorithm to be used. Defaults to :class:AntSystem.
- aco_kwargs (dict, default: {}) – Additional arguments to be passed to the ACO algorithm.
- train_with_local_search (bool, default: False) – Whether to train with local search. Defaults to False.
- n_ants (Optional[Union[int, dict]], default: None) – Number of ants to be used in the ACO algorithm. Can be an integer or dictionary. Defaults to 20.
- n_iterations (Optional[Union[int, dict]], default: None) – Number of iterations to run the ACO algorithm. Can be an integer or dictionary. Defaults to dict(train=1, val=20, test=100).
- encoder_kwargs – Additional arguments to be passed to the encoder.
Methods:

- forward – Forward method. During validation and testing, the policy runs the ACO algorithm to construct solutions.
Source code in rl4co/models/zoo/deepaco/policy.py
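A construction sketch using the documented defaults; the phase-keyed dictionary for n_iterations restates the default given above, and the import path is assumed from the source location.

```python
from rl4co.models.zoo.deepaco import DeepACOPolicy

policy = DeepACOPolicy(
    env_name="tsp",
    n_ants=20,                                     # resolves to the documented default
    n_iterations=dict(train=1, val=20, test=100),  # documented default
    aco_kwargs=dict(decay=0.9),                    # forwarded to the ACO class (AntSystem)
)
```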
forward¶
forward(
td_initial: TensorDict,
env: Optional[Union[str, RL4COEnvBase]] = None,
phase: str = "train",
return_actions: bool = True,
return_hidden: bool = True,
actions=None,
**decoding_kwargs
)
Forward method. During validation and testing, the policy runs the ACO algorithm to construct solutions. See :class:NonAutoregressivePolicy for details on the training phase.
Source code in rl4co/models/zoo/deepaco/policy.py
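A usage sketch, reusing the policy, env, and td_init from the sketches above; the "actions" output key is assumed from the common RL4CO policy interface rather than stated on this page.

```python
# phase="test" triggers the ACO loop (100 iterations by default);
# the encoder's heatmap seeds the solver, which constructs the solutions.
out = policy(td_init, env, phase="test", return_actions=True)
actions = out["actions"]  # assumed output key
```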
NAR-GNN¶
Classes:

- NARGNNPolicy – Base non-autoregressive policy for NCO construction methods.

NARGNNPolicy¶
NARGNNPolicy(
encoder: Optional[NonAutoregressiveEncoder] = None,
decoder: Optional[NonAutoregressiveDecoder] = None,
embed_dim: int = 64,
env_name: str = "tsp",
init_embedding: Optional[Module] = None,
edge_embedding: Optional[Module] = None,
graph_network: Optional[Module] = None,
heatmap_generator: Optional[Module] = None,
num_layers_heatmap_generator: int = 5,
num_layers_graph_encoder: int = 15,
act_fn="silu",
agg_fn="mean",
linear_bias: bool = True,
train_decode_type: str = "multistart_sampling",
val_decode_type: str = "multistart_greedy",
test_decode_type: str = "multistart_greedy",
**constructive_policy_kw
)
Bases: NonAutoregressivePolicy
Base non-autoregressive policy for NCO construction methods. This creates an N×N heatmap for N nodes (i.e., a heuristic) that models the probability of moving from one node to another, for all node pairs.
The policy performs the following steps:
- Encode the environment initial state into node embeddings
- Decode (non-autoregressively) to construct the solution to the NCO problem
Warning
The effectiveness of the non-autoregressive approach can vary significantly across different problem types and configurations. It may require careful tuning of the model architecture and decoding strategy to achieve competitive results.
Parameters:

- encoder (Optional[NonAutoregressiveEncoder], default: None) – Encoder module. Can be passed by sub-classes
- decoder (Optional[NonAutoregressiveDecoder], default: None) – Decoder module. Note that this module defaults to the non-autoregressive decoder
- embed_dim (int, default: 64) – Dimension of the embeddings
- env_name (str, default: 'tsp') – Name of the environment used to initialize embeddings
- init_embedding (Optional[Module], default: None) – Model to use for the initial embedding. If None, use the default embedding for the environment
- edge_embedding (Optional[Module], default: None) – Model to use for the edge embedding. If None, use the default embedding for the environment
- graph_network (Optional[Module], default: None) – Model to use for the graph network. If None, use the default network for the environment
- heatmap_generator (Optional[Module], default: None) – Model to use for the heatmap generator. If None, use the default generator for the environment
- num_layers_heatmap_generator (int, default: 5) – Number of layers in the heatmap generator
- num_layers_graph_encoder (int, default: 15) – Number of layers in the graph encoder
- act_fn (default: 'silu') – Activation function to use in the encoder
- agg_fn (default: 'mean') – Aggregation function to use in the encoder
- linear_bias (bool, default: True) – Whether to use bias in the encoder
- train_decode_type (str, default: 'multistart_sampling') – Type of decoding during training
- val_decode_type (str, default: 'multistart_greedy') – Type of decoding during validation
- test_decode_type (str, default: 'multistart_greedy') – Type of decoding during testing
- **constructive_policy_kw – Unused keyword arguments
Source code in rl4co/models/zoo/nargnn/policy.py
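A construction sketch; the shallower-than-default depths below are illustrative assumptions, and the import path is assumed from the source location.

```python
from rl4co.models.zoo.nargnn import NARGNNPolicy

policy = NARGNNPolicy(
    embed_dim=64,
    env_name="tsp",
    num_layers_graph_encoder=5,      # shallower than the default 15
    num_layers_heatmap_generator=3,  # shallower than the default 5
)
```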
Classes:

- EdgeHeatmapGenerator – MLP for converting edge embeddings to heatmaps.
- NARGNNEncoder – Anisotropic Graph Neural Network encoder with edge-gating mechanism as in Joshi et al. (2022), and used in DeepACO (Ye et al., 2023).
- NARGNNNodeEncoder – Uses the node embeddings from the graph without transforming them into a heatmap.

EdgeHeatmapGenerator¶
EdgeHeatmapGenerator(
embed_dim: int,
num_layers: int,
act_fn: str | Callable = "silu",
linear_bias: bool = True,
undirected_graph: bool = True,
)
Bases: Module
MLP for converting edge embeddings to heatmaps.
Parameters:

- embed_dim (int) – Dimension of the embeddings
- num_layers (int) – The number of linear layers in the network.
- act_fn (str | Callable, default: 'silu') – Activation function. Defaults to "silu".
- linear_bias (bool, default: True) – Use bias in linear layers. Defaults to True.
- undirected_graph (bool, default: True) – Whether the graph is undirected. Defaults to True.
Source code in rl4co/models/zoo/nargnn/encoder.py
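Construction only, following the signature above; the forward pass consumes the edge embeddings produced by the graph network and emits per-edge heatmap values, so no input tensor is fabricated here.

```python
from rl4co.models.zoo.nargnn.encoder import EdgeHeatmapGenerator

heatmap_mlp = EdgeHeatmapGenerator(
    embed_dim=64,
    num_layers=5,
    undirected_graph=True,  # symmetrize the heatmap for undirected problems
)
```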
NARGNNEncoder¶
NARGNNEncoder(
embed_dim: int = 64,
env_name: str = "tsp",
init_embedding: Optional[Module] = None,
edge_embedding: Optional[Module] = None,
graph_network: Optional[Module] = None,
heatmap_generator: Optional[Module] = None,
num_layers_heatmap_generator: int = 5,
num_layers_graph_encoder: int = 15,
act_fn="silu",
agg_fn="mean",
linear_bias: bool = True,
k_sparse: Optional[int] = None,
)
Bases: NonAutoregressiveEncoder
Anisotropic Graph Neural Network encoder with an edge-gating mechanism, as in Joshi et al. (2022) and used in DeepACO (Ye et al., 2023). This creates an N×N heatmap for N nodes (i.e., a heuristic) that models the probability of moving from one node to another, for all node pairs. The model uses a multi-layer perceptron (MLP) to predict edge attributes directly from the input graph features, which are then transformed into a heatmap representation to facilitate decoding of the solution. The decoding process is managed by a specified strategy, which can vary from simple greedy selection to more complex sampling methods.
Tip
This encoder's performance relies heavily on the ability of the MLP to capture the dependencies between different parts of the solution without the iterative refinement provided by autoregressive models. It is particularly useful in scenarios where the solution space can be effectively explored in a parallelized manner or when the solution components are largely independent.
Parameters:

- embed_dim (int, default: 64) – Dimension of the node embeddings
- env_name (str, default: 'tsp') – Name of the environment used to initialize embeddings
- num_layers – Number of layers in the encoder
- init_embedding (Optional[Module], default: None) – Model to use for the initial embedding. If None, use the default embedding for the environment
- edge_embedding (Optional[Module], default: None) – Model to use for the edge embedding. If None, use the default embedding for the environment
- graph_network (Optional[Module], default: None) – Model to use for the graph network. If None, use the default network for the environment
- heatmap_generator (Optional[Module], default: None) – Model to use for the heatmap generator. If None, use the default network for the environment
- num_layers_heatmap_generator (int, default: 5) – Number of layers in the heatmap generator
- num_layers_graph_encoder (int, default: 15) – Number of layers in the graph encoder
- act_fn (default: 'silu') – The activation function to use in each GNNLayer; see https://pytorch.org/docs/stable/nn.functional.html#non-linear-activation-functions for available options. Defaults to 'silu'.
- agg_fn (default: 'mean') – The aggregation function to use in each GNNLayer for pooling features. Options: 'add', 'mean', 'max'. Defaults to 'mean'.
- linear_bias (bool, default: True) – Use bias in linear layers. Defaults to True.
- k_sparse (Optional[int], default: None) – Number of edges to keep for each node. Defaults to None.
Methods:

- forward – Forward pass of the encoder.
Source code in rl4co/models/zoo/nargnn/encoder.py
forward¶
forward(td: TensorDict)
Forward pass of the encoder. Transform the input TensorDict into the latent representation.
Source code in rl4co/models/zoo/nargnn/encoder.py
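A forward-pass sketch; the env setup follows the usual RL4CO interface, and the structure of the returned latent representation is deliberately left opaque here since this page only documents it as "the latent representation".

```python
from rl4co.envs import TSPEnv
from rl4co.models.zoo.nargnn.encoder import NARGNNEncoder

env = TSPEnv(generator_params=dict(num_loc=20))
td = env.reset(batch_size=[2])

encoder = NARGNNEncoder(embed_dim=64, env_name="tsp", k_sparse=10)
latent = encoder(td)  # latent representation used for NAR decoding
```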
NARGNNNodeEncoder¶
NARGNNNodeEncoder(
embed_dim: int = 64,
env_name: str = "tsp",
init_embedding: Optional[Module] = None,
edge_embedding: Optional[Module] = None,
graph_network: Optional[Module] = None,
heatmap_generator: Optional[Module] = None,
num_layers_heatmap_generator: int = 5,
num_layers_graph_encoder: int = 15,
act_fn="silu",
agg_fn="mean",
linear_bias: bool = True,
k_sparse: Optional[int] = None,
)
Bases: NARGNNEncoder
In this case, we just use the node embeddings from the graph without transforming them into a heatmap.
Methods:

- forward – Forward pass of the encoder.
Source code in rl4co/models/zoo/nargnn/encoder.py
forward¶
forward(td: TensorDict)
Forward pass of the encoder. Transform the input TensorDict into the latent representation.
Source code in rl4co/models/zoo/nargnn/encoder.py