Improvement Methods¶
These methods are trained to improve existing solutions iteratively, akin to local search algorithms. They focus on refining existing solutions rather than generating them from scratch.
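The improvement setting can be sketched without any learned component. The block below is a minimal illustration, not the library's code: `random_swap` is a hypothetical stand-in for a trained policy, and `improve`, `tour_length`, and the toy coordinates are assumptions made for the example. It shows the basic loop: start from a complete solution, apply one move per step, and keep the best solution seen so far.

```python
import random

def tour_length(tour, coords):
    """Total length of the closed tour visiting coords in the given order."""
    total = 0.0
    for a, b in zip(tour, tour[1:] + tour[:1]):
        (x1, y1), (x2, y2) = coords[a], coords[b]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

def random_swap(tour, rng):
    """Hypothetical placeholder policy: swap two random positions."""
    i, j = rng.sample(range(len(tour)), 2)
    new_tour = list(tour)
    new_tour[i], new_tour[j] = new_tour[j], new_tour[i]
    return new_tour

def improve(tour, coords, steps=200, seed=0):
    """Apply `steps` moves to the current solution and return the best tour seen."""
    rng = random.Random(seed)
    current = list(tour)
    best, best_cost = list(current), tour_length(current, coords)
    for _ in range(steps):
        current = random_swap(current, rng)  # the move is always applied, even if worse
        cost = tour_length(current, coords)
        if cost < best_cost:                 # only the incumbent is protected
            best, best_cost = list(current), cost
    return best, best_cost

coords = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]
best, best_cost = improve(start, coords)
```

A learned improvement method replaces the random move with a policy that scores candidate moves from the current solution.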
DACT¶
Classes:
-
DACTEncoder
–Dual-Aspect Collaborative Transformer Encoder as in Ma et al. (2021)
DACTEncoder
¶
DACTEncoder(
embed_dim: int = 64,
init_embedding: Module = None,
pos_embedding: Module = None,
env_name: str = "tsp_kopt",
pos_type: str = "CPE",
num_heads: int = 4,
num_layers: int = 3,
normalization: str = "layer",
feedforward_hidden: int = 64,
)
Bases: ImprovementEncoder
Dual-Aspect Collaborative Transformer Encoder as in Ma et al. (2021)
Parameters:
-
embed_dim
(int
, default:64
) –Dimension of the embedding space
-
init_embedding
(Module
, default:None
) –Module to use for the initialization of the node embeddings
-
pos_embedding
(Module
, default:None
) –Module to use for the initialization of the positional embeddings
-
env_name
(str
, default:'tsp_kopt'
) –Name of the environment used to initialize embeddings
-
pos_type
(str
, default:'CPE'
) –Name of the used positional encoding method (CPE or APE)
-
num_heads
(int
, default:4
) –Number of heads in the attention layers
-
num_layers
(int
, default:3
) –Number of layers in the attention network
-
normalization
(str
, default:'layer'
) –Normalization type in the attention layers
-
feedforward_hidden
(int
, default:64
) –Hidden dimension in the feedforward layers
Source code in rl4co/models/zoo/dact/encoder.py
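The `pos_type` option distinguishes a cyclic (CPE) from an absolute (APE) positional encoding. As a rough illustration of the absolute variant only, the sketch below implements the standard sinusoidal encoding from the Transformer literature in plain Python. Whether the library's APE uses exactly this form is an assumption, the function name `absolute_positional_encoding` is hypothetical, and CPE is not shown.

```python
import math

def absolute_positional_encoding(num_positions, embed_dim):
    """Sinusoidal encoding: even dimensions use sin, odd dimensions use cos."""
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(embed_dim):
            # Dimensions 2i and 2i+1 share the same frequency.
            angle = pos / (10000 ** (2 * (k // 2) / embed_dim))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# One row per position in the current solution (assumed indexing).
pe = absolute_positional_encoding(num_positions=5, embed_dim=4)
```

Each row would be looked up by a node's position in the current solution, so the same node receives a different positional vector whenever the solution changes.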
Classes:
-
DACTDecoder
–DACT decoder based on Ma et al. (2021)
DACTDecoder
¶
Bases: ImprovementDecoder
DACT decoder based on Ma et al. (2021). Given the environment state and the dual sets of embeddings (PFE and NFE embeddings), compute the logits for selecting two nodes for the 2-opt local search from the current solution.
Parameters:
-
embed_dim
(int
, default:64
) –Embedding dimension
-
num_heads
(int
, default:4
) –Number of attention heads
Methods:
-
forward
–Compute the logits for selecting a node pair for the 2-opt move on the current solution
Source code in rl4co/models/zoo/dact/decoder.py
forward
¶
forward(
td: TensorDict, final_h: Tensor, final_p: Tensor
) -> Tensor
Compute the logits for selecting a node pair for the 2-opt move on the current solution
Parameters:
-
td
(TensorDict
) –TensorDict with the current environment state
-
final_h
(Tensor
) –Final NFE embeddings
-
final_p
(Tensor
) –Final PFE embeddings
Source code in rl4co/models/zoo/dact/decoder.py
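The decoder scores pairs of nodes for a 2-opt move. For readers unfamiliar with the move itself, here is a textbook 2-opt sketch in plain Python. It assumes the two indices are positions in a tour stored as a list; how the environment maps the selected node pair onto the tour is not shown here, and `two_opt_move` is a hypothetical helper name.

```python
def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] (inclusive) reversed."""
    if i > j:
        i, j = j, i
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

tour = [0, 1, 2, 3, 4, 5]
new_tour = two_opt_move(tour, 1, 4)  # reverse positions 1..4
# new_tour == [0, 4, 3, 2, 1, 5]
```

Reversing the segment replaces two edges of the tour with two new ones while keeping every node visited exactly once.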
Classes:
-
DACTPolicy
–DACT Policy based on Ma et al. (2021)
DACTPolicy
¶
DACTPolicy(
embed_dim: int = 64,
num_encoder_layers: int = 3,
num_heads: int = 4,
normalization: str = "layer",
feedforward_hidden: int = 64,
env_name: str = "tsp_kopt",
pos_type: str = "CPE",
init_embedding: Module = None,
pos_embedding: Module = None,
temperature: float = 1.0,
tanh_clipping: float = 6.0,
train_decode_type: str = "sampling",
val_decode_type: str = "sampling",
test_decode_type: str = "sampling",
)
Bases: ImprovementPolicy
DACT Policy based on Ma et al. (2021)
This model first encodes the input graph and current solution using a DACT encoder (DACTEncoder) and then decodes the 2-opt action (DACTDecoder).
Parameters:
-
embed_dim
(int
, default:64
) –Dimension of the node embeddings
-
num_encoder_layers
(int
, default:3
) –Number of layers in the encoder
-
num_heads
(int
, default:4
) –Number of heads in the attention layers
-
normalization
(str
, default:'layer'
) –Normalization type in the attention layers
-
feedforward_hidden
(int
, default:64
) –Dimension of the hidden layer in the feedforward network
-
env_name
(str
, default:'tsp_kopt'
) –Name of the environment used to initialize embeddings
-
pos_type
(str
, default:'CPE'
) –Name of the used positional encoding method (CPE or APE)
-
init_embedding
(Module
, default:None
) –Module to use for the initialization of the embeddings
-
pos_embedding
(Module
, default:None
) –Module to use for the initialization of the positional embeddings
-
temperature
(float
, default:1.0
) –Temperature for the softmax
-
tanh_clipping
(float
, default:6.0
) –Tanh clipping value (see Bello et al., 2016)
-
train_decode_type
(str
, default:'sampling'
) –Type of decoding to use during training
-
val_decode_type
(str
, default:'sampling'
) –Type of decoding to use during validation
-
test_decode_type
(str
, default:'sampling'
) –Type of decoding to use during testing
Methods:
-
forward
–Forward pass of the policy.
Source code in rl4co/models/zoo/dact/policy.py
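The `tanh_clipping` and `temperature` parameters both reshape the logits before an action is drawn. The sketch below shows one common way to combine them, following the clipping of Bello et al. (2016): bound the logits with `C * tanh(x)`, then divide by the temperature. The order of the two operations and the helper names `clip_and_scale` and `softmax` are assumptions for illustration, not the library's code.

```python
import math

def clip_and_scale(logits, tanh_clipping=6.0, temperature=1.0):
    """Bound logits to [-tanh_clipping, tanh_clipping], then divide by the temperature."""
    return [tanh_clipping * math.tanh(x) / temperature for x in logits]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

raw = [12.0, 0.0, -3.0]
probs = softmax(clip_and_scale(raw))
```

Clipping keeps any single logit from dominating the distribution, and a temperature above 1.0 flattens it further, which encourages exploration during sampling.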
forward
¶
forward(
td: TensorDict,
env: str | RL4COEnvBase = None,
phase: str = "train",
return_actions: bool = True,
return_embeds: bool = False,
only_return_embed: bool = False,
actions=None,
**decoding_kwargs
) -> dict
Forward pass of the policy.
Parameters:
-
td
(TensorDict
) –TensorDict containing the environment state
-
env
(str | RL4COEnvBase
, default:None
) –Environment to use for decoding. If None, the environment is instantiated from
env_name
. Note that it is more efficient to pass an already instantiated environment each time for fine-grained control.
-
phase
(str
, default:'train'
) –Phase of the algorithm (train, val, test)
-
return_actions
(bool
, default:True
) –Whether to return the actions
-
actions
–Actions to use for evaluating the policy. If passed, use these actions instead of sampling from the policy to calculate log likelihood
-
decoding_kwargs
–Keyword arguments for the decoding strategy. See
rl4co.utils.decoding.DecodingStrategy
for more information.
Returns:
-
out
(dict
) –Dictionary containing the reward, log likelihood, and optionally the actions and entropy
Source code in rl4co/models/zoo/dact/policy.py
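When `actions` is passed, the policy evaluates the log likelihood of those actions instead of sampling new ones. The idea can be sketched for a single categorical decision in plain Python; `action_log_likelihood` and `log_softmax` are hypothetical helpers, and the real policy works on batched tensors rather than lists.

```python
import math

def log_softmax(logits):
    """Log-probabilities of a categorical distribution defined by logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def action_log_likelihood(logits, action):
    """Log-probability of a fixed action index under the categorical policy."""
    return log_softmax(logits)[action]

logits = [2.0, 1.0, 0.0]
ll = action_log_likelihood(logits, action=0)
```

This is the quantity a PPO-style update needs when it re-scores previously collected actions under the current policy parameters.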
Classes:
-
DACT
–DACT Model based on n_step Proximal Policy Optimization (PPO) with a DACT model policy.
DACT
¶
DACT(
env: RL4COEnvBase,
policy: Module = None,
critic: CriticNetwork = None,
policy_kwargs: dict = {},
critic_kwargs: dict = {},
**kwargs
)
Bases: n_step_PPO
DACT Model based on n_step Proximal Policy Optimization (PPO) with a DACT model policy. We default to the DACT model policy and the improvement Critic Network.
Parameters:
-
env
(RL4COEnvBase
) –Environment to use for the algorithm
-
policy
(Module
, default:None
) –Policy to use for the algorithm
-
critic
(CriticNetwork
, default:None
) –Critic to use for the algorithm
-
policy_kwargs
(dict
, default:{}
) –Keyword arguments for policy
-
critic_kwargs
(dict
, default:{}
) –Keyword arguments for critic
Source code in rl4co/models/zoo/dact/model.py
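The model trains its policy with PPO. As a reminder of what the PPO objective looks like, the sketch below computes the standard clipped surrogate loss in plain Python. The parameter name `clip_range` and its default are assumptions for illustration; the hyperparameters actually used by `n_step_PPO` are not documented in this section.

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_range=0.2):
    """Negative clipped surrogate objective, averaged over samples."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)

# With identical old and new policies the ratio is 1, so the loss is -mean(advantage).
loss = ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])
```

The clipping removes the incentive to move the policy far from the one that collected the data within a single update.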
N2S¶
Classes:
-
N2SEncoder
–Neural Neighborhood Search Encoder as in Ma et al. (2022)
N2SEncoder
¶
N2SEncoder(
embed_dim: int = 128,
init_embedding: Module = None,
pos_embedding: Module = None,
env_name: str = "pdp_ruin_repair",
pos_type: str = "CPE",
num_heads: int = 4,
num_layers: int = 3,
normalization: str = "layer",
feedforward_hidden: int = 128,
)
Bases: ImprovementEncoder
Neural Neighborhood Search Encoder as in Ma et al. (2022). First embed the input and then process it with a Graph Attention Network.
Parameters:
-
embed_dim
(int
, default:128
) –Dimension of the embedding space
-
init_embedding
(Module
, default:None
) –Module to use for the initialization of the node embeddings
-
pos_embedding
(Module
, default:None
) –Module to use for the initialization of the positional embeddings
-
env_name
(str
, default:'pdp_ruin_repair'
) –Name of the environment used to initialize embeddings
-
pos_type
(str
, default:'CPE'
) –Name of the used positional encoding method (CPE or APE)
-
num_heads
(int
, default:4
) –Number of heads in the attention layers
-
num_layers
(int
, default:3
) –Number of layers in the attention network
-
normalization
(str
, default:'layer'
) –Normalization type in the attention layers
-
feedforward_hidden
(int
, default:128
) –Hidden dimension in the feedforward layers
Source code in rl4co/models/zoo/n2s/encoder.py
Classes:
-
NodePairRemovalDecoder
–N2S Node-Pair Removal decoder based on Ma et al. (2022)
-
NodePairReinsertionDecoder
–N2S Node-Pair Reinsertion decoder based on Ma et al. (2022)
NodePairRemovalDecoder
¶
Bases: ImprovementDecoder
N2S Node-Pair Removal decoder based on Ma et al. (2022). Given the environment state and the node embeddings (positional embeddings are discarded), compute the logits for selecting a pair of pickup and delivery nodes for node pair removal from the current solution.
Parameters:
-
embed_dim
(int
, default:128
) –Embedding dimension
-
num_heads
(int
, default:4
) –Number of attention heads
Methods:
-
forward
–Compute the logits for removing a node pair from the current solution
Source code in rl4co/models/zoo/n2s/decoder.py
forward
¶
forward(
td: TensorDict, final_h: Tensor, final_p: Tensor
) -> Tensor
Compute the logits for removing a node pair from the current solution
Parameters:
-
td
(TensorDict
) –TensorDict with the current environment state
-
final_h
(Tensor
) –final node embeddings
-
final_p
(Tensor
) –final positional embeddings
Source code in rl4co/models/zoo/n2s/decoder.py
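The removal step itself is simple once a pickup and delivery pair has been chosen. The sketch below shows it on a tour stored as a Python list. The node numbering (depot 0, pickups 1 and 2 paired with deliveries 3 and 4) and the helper `remove_pair` are hypothetical and chosen only for the example.

```python
def remove_pair(tour, pickup, delivery):
    """Return the tour with one pickup node and its delivery node taken out."""
    return [node for node in tour if node not in (pickup, delivery)]

# Depot 0; hypothetical pairing: pickup 1 -> delivery 3, pickup 2 -> delivery 4.
tour = [0, 1, 2, 3, 4]
partial = remove_pair(tour, pickup=1, delivery=3)
# partial == [0, 2, 4]
```

The decoder's job is to decide which pair to remove; the remaining partial tour is then handed to the reinsertion step.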
NodePairReinsertionDecoder
¶
Bases: ImprovementDecoder
N2S Node-Pair Reinsertion decoder based on Ma et al. (2022). Given the environment state, the node embeddings (positional embeddings are discarded), and the removed node from the NodePairRemovalDecoder, compute the logits for finding places to re-insert the removed pair of pickup and delivery nodes to form a new solution.
Parameters:
-
embed_dim
(int
, default:128
) –Embedding dimension
-
num_heads
(int
, default:4
) –Number of attention heads
Source code in rl4co/models/zoo/n2s/decoder.py
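Reinsertion has to respect the pickup-before-delivery precedence. The following plain-Python sketch enumerates the feasible position pairs and applies one of them. It assumes positions are indices into the partial tour and that each node is inserted directly after the chosen index; `reinsert_pair` and `feasible_positions` are hypothetical helpers, not the decoder's implementation.

```python
def reinsert_pair(partial, pickup, delivery, pos_p, pos_d):
    """Insert pickup after index pos_p and delivery after index pos_d.

    Indices refer to the partial tour; pos_d >= pos_p keeps pickup before delivery.
    """
    if not (0 <= pos_p <= pos_d < len(partial)):
        raise ValueError("need 0 <= pos_p <= pos_d < len(partial)")
    out = list(partial)
    out.insert(pos_d + 1, delivery)  # insert the later node first so pos_p stays valid
    out.insert(pos_p + 1, pickup)
    return out

def feasible_positions(partial):
    """All (pos_p, pos_d) pairs that respect the precedence constraint."""
    n = len(partial)
    return [(p, d) for p in range(n) for d in range(p, n)]

partial = [0, 2, 4]
candidates = feasible_positions(partial)
new_tour = reinsert_pair(partial, pickup=1, delivery=3, pos_p=0, pos_d=2)
# new_tour == [0, 1, 2, 4, 3]
```

A learned decoder would assign a logit to each candidate pair and mask the infeasible ones rather than enumerate them explicitly.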
Classes:
-
N2SPolicy
–N2S Policy based on Ma et al. (2022)
N2SPolicy
¶
N2SPolicy(
embed_dim: int = 128,
num_encoder_layers: int = 3,
num_heads: int = 4,
normalization: str = "layer",
feedforward_hidden: int = 128,
env_name: str = "pdp_ruin_repair",
pos_type: str = "CPE",
init_embedding: Module = None,
pos_embedding: Module = None,
temperature: float = 1.0,
tanh_clipping: float = 6.0,
train_decode_type: str = "sampling",
val_decode_type: str = "sampling",
test_decode_type: str = "sampling",
)
Bases: ImprovementPolicy
N2S Policy based on Ma et al. (2022)
This model first encodes the input graph and current solution using an N2S encoder (N2SEncoder) and then decodes the node-pair removal and reinsertion action using the Node-Pair Removal (NodePairRemovalDecoder) and Reinsertion (NodePairReinsertionDecoder) decoders.
Parameters:
-
embed_dim
(int
, default:128
) –Dimension of the node embeddings
-
num_encoder_layers
(int
, default:3
) –Number of layers in the encoder
-
num_heads
(int
, default:4
) –Number of heads in the attention layers
-
normalization
(str
, default:'layer'
) –Normalization type in the attention layers
-
feedforward_hidden
(int
, default:128
) –Dimension of the hidden layer in the feedforward network
-
env_name
(str
, default:'pdp_ruin_repair'
) –Name of the environment used to initialize embeddings
-
pos_type
(str
, default:'CPE'
) –Name of the used positional encoding method (CPE or APE)
-
init_embedding
(Module
, default:None
) –Module to use for the initialization of the embeddings
-
pos_embedding
(Module
, default:None
) –Module to use for the initialization of the positional embeddings
-
temperature
(float
, default:1.0
) –Temperature for the softmax
-
tanh_clipping
(float
, default:6.0
) –Tanh clipping value (see Bello et al., 2016)
-
train_decode_type
(str
, default:'sampling'
) –Type of decoding to use during training
-
val_decode_type
(str
, default:'sampling'
) –Type of decoding to use during validation
-
test_decode_type
(str
, default:'sampling'
) –Type of decoding to use during testing
Methods:
-
forward
–Forward pass of the policy.
Source code in rl4co/models/zoo/n2s/policy.py
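The `*_decode_type` parameters control how an action is picked from the policy's distribution in each phase. The sketch below contrasts sampling, the default listed above, with a greedy choice. The `"greedy"` option and the helper `select_action` are assumptions for illustration and are not taken from this section.

```python
import random

def select_action(probs, decode_type="sampling", rng=None):
    """Pick an action index from a probability vector."""
    if decode_type == "greedy":
        # Deterministic: take the most likely action.
        return max(range(len(probs)), key=lambda i: probs[i])
    if decode_type == "sampling":
        # Stochastic: draw an index in proportion to its probability.
        rng = rng or random.Random()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    raise ValueError(f"unknown decode_type: {decode_type!r}")

probs = [0.1, 0.7, 0.2]
greedy_action = select_action(probs, "greedy")
sampled_action = select_action(probs, "sampling", rng=random.Random(0))
```

Sampling in every phase is a natural default for improvement methods, since a deterministic choice can cycle between the same few solutions.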
forward
¶
forward(
td: TensorDict,
env: str | RL4COEnvBase = None,
phase: str = "train",
return_actions: bool = True,
return_embeds: bool = False,
only_return_embed: bool = False,
actions=None,
**decoding_kwargs
) -> dict
Forward pass of the policy.
Parameters:
-
td
(TensorDict
) –TensorDict containing the environment state
-
env
(str | RL4COEnvBase
, default:None
) –Environment to use for decoding. If None, the environment is instantiated from
env_name
. Note that it is more efficient to pass an already instantiated environment each time for fine-grained control.
-
phase
(str
, default:'train'
) –Phase of the algorithm (train, val, test)
-
return_actions
(bool
, default:True
) –Whether to return the actions
-
actions
–Actions to use for evaluating the policy. If passed, use these actions instead of sampling from the policy to calculate log likelihood
-
decoding_kwargs
–Keyword arguments for the decoding strategy. See
rl4co.utils.decoding.DecodingStrategy
for more information.
Returns:
-
out
(dict
) –Dictionary containing the reward, log likelihood, and optionally the actions and entropy
Source code in rl4co/models/zoo/n2s/policy.py
Classes:
-
N2S
–N2S Model based on n_step Proximal Policy Optimization (PPO) with an N2S model policy.
N2S
¶
N2S(
env: RL4COEnvBase,
policy: Module = None,
critic: CriticNetwork = None,
policy_kwargs: dict = {},
critic_kwargs: dict = {},
**kwargs
)
Bases: n_step_PPO
N2S Model based on n_step Proximal Policy Optimization (PPO) with an N2S model policy. We default to the N2S model policy and the improvement Critic Network.
Parameters:
-
env
(RL4COEnvBase
) –Environment to use for the algorithm
-
policy
(Module
, default:None
) –Policy to use for the algorithm
-
critic
(CriticNetwork
, default:None
) –Critic to use for the algorithm
-
policy_kwargs
(dict
, default:{}
) –Keyword arguments for policy
-
critic_kwargs
(dict
, default:{}
) –Keyword arguments for critic
Source code in rl4co/models/zoo/n2s/model.py
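The "n_step" in the base class name refers to updating from short segments of an improvement trajectory rather than from complete episodes. One common way to build targets for such a segment is the bootstrapped discounted return sketched below. The discount `gamma`, the helper `n_step_returns`, and the assumption that the library computes its targets this way are all illustrative.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one n-step segment, bootstrapped from the critic."""
    returns = []
    running = bootstrap_value  # critic's estimate for the state after the segment
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

targets = n_step_returns([1.0, 1.0], bootstrap_value=0.0, gamma=0.5)
# targets == [1.5, 1.0]
```

Bootstrapping from the critic is what lets training proceed on long improvement horizons without waiting for the final step.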
NeuOpt¶
Classes:
-
RDSDecoder
–RDS Decoder for flexible k-opt based on Ma et al. (2023)
RDSDecoder
¶
RDSDecoder(embed_dim: int = 128)
Bases: ImprovementDecoder
RDS Decoder for flexible k-opt based on Ma et al. (2023). Given the environment state and the node embeddings (positional embeddings are discarded), compute the logits for selecting a k-opt exchange on basis moves (S-move, I-move, E-move) from the current solution.
Parameters:
-
embed_dim
(int
, default:128
) –Embedding dimension
-
num_heads
–Number of attention heads
Source code in rl4co/models/zoo/neuopt/decoder.py
Classes:
-
CustomizeTSPInitEmbedding
–Initial embedding for the Traveling Salesman Problems (TSP).
-
NeuOptPolicy
–NeuOpt Policy based on Ma et al. (2023)
CustomizeTSPInitEmbedding
¶
CustomizeTSPInitEmbedding(embed_dim, linear_bias=True)
Bases: Module
Initial embedding for the Traveling Salesman Problems (TSP). Embed the following node features to the embedding space:
- locs: x, y coordinates of the cities
Source code in rl4co/models/zoo/neuopt/policy.py
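An initial embedding of this kind maps each city's `(x, y)` coordinates into a vector of size `embed_dim`. The sketch below does this with a single linear projection in plain Python. Treating the module as one linear layer is an assumption based on the `linear_bias` argument, and `make_linear` and `embed_locs` are hypothetical helpers with random weights standing in for learned ones.

```python
import random

def make_linear(in_dim, out_dim, bias=True, seed=0):
    """Random weights standing in for a learned linear layer."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-1, 1) for _ in range(out_dim)] if bias else [0.0] * out_dim
    return weight, b

def embed_locs(locs, weight, b):
    """Map each (x, y) location to an out_dim-sized vector: W @ [x, y] + b."""
    return [
        [sum(w * v for w, v in zip(row, loc)) + bi for row, bi in zip(weight, b)]
        for loc in locs
    ]

weight, b = make_linear(in_dim=2, out_dim=4)
emb = embed_locs([(0.1, 0.9), (0.5, 0.5), (0.8, 0.2)], weight, b)
```

The result has one row per city, which is the shape the encoder's attention layers expect as node features.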
NeuOptPolicy
¶
NeuOptPolicy(
embed_dim: int = 128,
num_encoder_layers: int = 3,
num_heads: int = 4,
normalization: str = "layer",
feedforward_hidden: int = 128,
env_name: str = "tsp_kopt",
pos_type: str = "CPE",
init_embedding: Module = None,
pos_embedding: Module = None,
temperature: float = 1.0,
tanh_clipping: float = 6.0,
train_decode_type: str = "sampling",
val_decode_type: str = "sampling",
test_decode_type: str = "sampling",
)
Bases: ImprovementPolicy
NeuOpt Policy based on Ma et al. (2023)
This model first encodes the input graph and current solution using an N2S encoder (N2SEncoder) and then decodes the k-opt action (RDSDecoder).
Parameters:
-
embed_dim
(int
, default:128
) –Dimension of the node embeddings
-
num_encoder_layers
(int
, default:3
) –Number of layers in the encoder
-
num_heads
(int
, default:4
) –Number of heads in the attention layers
-
normalization
(str
, default:'layer'
) –Normalization type in the attention layers
-
feedforward_hidden
(int
, default:128
) –Dimension of the hidden layer in the feedforward network
-
env_name
(str
, default:'tsp_kopt'
) –Name of the environment used to initialize embeddings
-
pos_type
(str
, default:'CPE'
) –Name of the used positional encoding method (CPE or APE)
-
init_embedding
(Module
, default:None
) –Module to use for the initialization of the embeddings
-
pos_embedding
(Module
, default:None
) –Module to use for the initialization of the positional embeddings
-
temperature
(float
, default:1.0
) –Temperature for the softmax
-
tanh_clipping
(float
, default:6.0
) –Tanh clipping value (see Bello et al., 2016)
-
train_decode_type
(str
, default:'sampling'
) –Type of decoding to use during training
-
val_decode_type
(str
, default:'sampling'
) –Type of decoding to use during validation
-
test_decode_type
(str
, default:'sampling'
) –Type of decoding to use during testing
Methods:
-
forward
–Forward pass of the policy.
Source code in rl4co/models/zoo/neuopt/policy.py
forward
¶
forward(
td: TensorDict,
env: str | RL4COEnvBase = None,
phase: str = "train",
return_actions: bool = True,
return_embeds: bool = False,
only_return_embed: bool = False,
actions=None,
**decoding_kwargs
) -> dict
Forward pass of the policy.
Parameters:
-
td
(TensorDict
) –TensorDict containing the environment state
-
env
(str | RL4COEnvBase
, default:None
) –Environment to use for decoding. If None, the environment is instantiated from
env_name
. Note that it is more efficient to pass an already instantiated environment each time for fine-grained control.
-
phase
(str
, default:'train'
) –Phase of the algorithm (train, val, test)
-
return_actions
(bool
, default:True
) –Whether to return the actions
-
actions
–Actions to use for evaluating the policy. If passed, use these actions instead of sampling from the policy to calculate log likelihood
-
decoding_kwargs
–Keyword arguments for the decoding strategy. See
rl4co.utils.decoding.DecodingStrategy
for more information.
Returns:
-
out
(dict
) –Dictionary containing the reward, log likelihood, and optionally the actions and entropy
Source code in rl4co/models/zoo/neuopt/policy.py
Classes:
-
NeuOpt
–NeuOpt Model based on n_step Proximal Policy Optimization (PPO) with a NeuOpt model policy.
NeuOpt
¶
NeuOpt(
env: RL4COEnvBase,
policy: Module = None,
critic: CriticNetwork = None,
policy_kwargs: dict = {},
critic_kwargs: dict = {},
**kwargs
)
Bases: n_step_PPO
NeuOpt Model based on n_step Proximal Policy Optimization (PPO) with a NeuOpt model policy. We default to the NeuOpt model policy and the improvement Critic Network.
Parameters:
-
env
(RL4COEnvBase
) –Environment to use for the algorithm
-
policy
(Module
, default:None
) –Policy to use for the algorithm
-
critic
(CriticNetwork
, default:None
) –Critic to use for the algorithm
-
policy_kwargs
(dict
, default:{}
) –Keyword arguments for policy
-
critic_kwargs
(dict
, default:{}
) –Keyword arguments for critic
Source code in rl4co/models/zoo/neuopt/model.py