Transductive Methods

These methods update the policy parameters during online testing to improve the solution quality on a specific instance.

Active Search (AS)

ActiveSearch

ActiveSearch(
    env,
    policy,
    dataset: Union[Dataset, str],
    batch_size: int = 1,
    max_iters: int = 200,
    augment_size: int = 8,
    augment_dihedral: bool = True,
    num_parallel_runs: int = 1,
    max_runtime: int = 86400,
    save_path: str = None,
    optimizer: Union[str, Optimizer, partial] = "Adam",
    optimizer_kwargs: dict = {
        "lr": 0.00026,
        "weight_decay": 1e-06,
    },
    **kwargs
)

Bases: TransductiveModel

Active Search for Neural Combinatorial Optimization from Bello et al. (2016). Fine-tunes the whole policy network (encoder + decoder) on a batch of instances. Reference: https://arxiv.org/abs/1611.09940

Parameters:

  • env

    RL4CO environment to be solved

  • policy

    policy network

  • dataset (Union[Dataset, str]) –

    dataset to be used for training

  • batch_size (int, default: 1 ) –

    batch size for training

  • max_iters (int, default: 200 ) –

    maximum number of iterations

  • augment_size (int, default: 8 ) –

    number of augmentations per state

  • augment_dihedral (bool, default: True ) –

    whether to augment with dihedral rotations

  • num_parallel_runs (int, default: 1 ) –

    number of parallel runs

  • max_runtime (int, default: 86400 ) –

    maximum runtime in seconds

  • save_path (str, default: None ) –

    path to save solution checkpoints

  • optimizer (Union[str, Optimizer, partial], default: 'Adam' ) –

    optimizer to use for training

  • optimizer_kwargs (dict, default: {'lr': 0.00026, 'weight_decay': 1e-06} ) –

    keyword arguments for optimizer

  • **kwargs

    additional keyword arguments
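
A minimal usage sketch is shown below. It assumes a pretrained RL4CO policy and a matching environment; TSPEnv, AttentionModelPolicy, RL4COTrainer, the dataset-creation call, and the checkpoint path are illustrative, not prescribed by this class. Since batch_size must be 1, each instance is adapted to individually.

import torch
from rl4co.envs import TSPEnv
from rl4co.models.zoo import AttentionModelPolicy
from rl4co.models.zoo.active_search.search import ActiveSearch
from rl4co.utils.trainer import RL4COTrainer

# Environment and (ideally pretrained) policy; load real weights in practice
env = TSPEnv()
policy = AttentionModelPolicy(env_name=env.name)

# Test instances to adapt to (Active Search requires batch_size=1)
dataset = env.dataset(batch_size=[16])

model = ActiveSearch(
    env,
    policy,
    dataset,
    max_iters=200,
    max_runtime=600,            # stop each search after 10 minutes
    save_path="as_results.pt",  # illustrative path
)
trainer = RL4COTrainer(max_epochs=1, devices=1)
trainer.fit(model)  # runs the search loop over all instances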

Source code in rl4co/models/zoo/active_search/search.py
def __init__(
    self,
    env,
    policy,
    dataset: Union[Dataset, str],
    batch_size: int = 1,
    max_iters: int = 200,
    augment_size: int = 8,
    augment_dihedral: bool = True,
    num_parallel_runs: int = 1,
    max_runtime: int = 86_400,
    save_path: str = None,
    optimizer: Union[str, torch.optim.Optimizer, partial] = "Adam",
    optimizer_kwargs: dict = {"lr": 2.6e-4, "weight_decay": 1e-6},
    **kwargs,
):
    self.save_hyperparameters(logger=False)

    assert batch_size == 1, "Batch size must be 1 for active search"

    super(ActiveSearch, self).__init__(
        env,
        policy=policy,
        dataset=dataset,
        batch_size=batch_size,
        max_iters=max_iters,
        max_runtime=max_runtime,
        save_path=save_path,
        optimizer=optimizer,
        optimizer_kwargs=optimizer_kwargs,
        **kwargs,
    )

setup

setup(stage='fit')

Setup base class and instantiate:

  • augmentation
  • instance solutions and rewards
  • original policy state dict
Source code in rl4co/models/zoo/active_search/search.py
def setup(self, stage="fit"):
    """Setup base class and instantiate:
    - augmentation
    - instance solutions and rewards
    - original policy state dict
    """
    log.info("Setting up active search...")
    super(ActiveSearch, self).setup(stage)

    # Instantiate augmentation
    self.augmentation = StateAugmentation(
        num_augment=self.hparams.augment_size,
        augment_fn="dihedral8" if self.hparams.augment_dihedral else "symmetric",
    )

    # Store original policy state dict
    self.original_policy_state = self.policy.state_dict()

    # Get dataset size and problem size
    dataset_size = len(self.dataset)
    _batch = next(iter(self.train_dataloader()))
    self.problem_size = self.env.reset(_batch)["action_mask"].shape[-1]
    self.instance_solutions = torch.zeros(
        dataset_size, self.problem_size * 2, dtype=int
    )
    self.instance_rewards = torch.zeros(dataset_size)

on_train_batch_start

on_train_batch_start(batch: Any, batch_idx: int)

Called before training (i.e. search) for a new batch begins. We re-load the original policy state dict and configure the optimizer.

Source code in rl4co/models/zoo/active_search/search.py
def on_train_batch_start(self, batch: Any, batch_idx: int):
    """Called before training (i.e. search) for a new batch begins.
    We re-load the original policy state dict and configure the optimizer.
    """
    self.policy.load_state_dict(self.original_policy_state)
    self.configure_optimizers(self.policy.parameters())

training_step

training_step(batch, batch_idx)

Main search loop. We use the training step to effectively adapt to a batch of instances.

Source code in rl4co/models/zoo/active_search/search.py
def training_step(self, batch, batch_idx):
    """Main search loop. We use the training step to effectively adapt to a `batch` of instances."""
    # Augment state
    batch_size = batch.shape[0]
    td_init = self.env.reset(batch)
    n_aug, n_start, n_runs = (
        self.augmentation.num_augment,
        self.env.get_num_starts(td_init),
        self.hparams.num_parallel_runs,
    )
    td_init = self.augmentation(td_init)
    td_init = batchify(td_init, n_runs)

    # Solution and reward buffer
    max_reward = torch.full((batch_size,), -float("inf"), device=batch.device)
    best_solutions = torch.zeros(
        batch_size, self.problem_size * 2, device=batch.device, dtype=int
    )

    # Init search
    t_start = time.time()
    for i in range(self.hparams.max_iters):
        # Evaluate policy with sampling multistarts (as in POMO)
        out = self.policy(
            td_init.clone(),
            env=self.env,
            decode_type="multistart_sampling",
            num_starts=n_start,
            return_actions=True,
        )

        if i == 0:
            log.info(f"Initial reward: {out['reward'].max():.2f}")

        # Update best solution and reward found
        max_reward_iter = out["reward"].max()
        if max_reward_iter > max_reward:
            max_reward_idx = out["reward"].argmax()
            best_solution_iter = out["actions"][max_reward_idx]
            max_reward = max_reward_iter
            best_solutions[0, : best_solution_iter.shape[0]] = best_solution_iter

        # Compute REINFORCE loss with shared baseline
        reward = unbatchify(out["reward"], (n_runs, n_aug, n_start))
        ll = unbatchify(out["log_likelihood"], (n_runs, n_aug, n_start))
        advantage = reward - reward.mean(dim=-1, keepdim=True)
        loss = -(advantage * ll).mean()

        # Backpropagate loss
        # perform manual optimization following the Lightning routine
        # https://lightning.ai/docs/pytorch/stable/common/optimization.html
        opt = self.optimizers()
        opt.zero_grad()
        self.manual_backward(loss)

        self.log_dict(
            {
                "loss": loss,
                "max_reward": max_reward,
                "step": i,
                "time": time.time() - t_start,
            },
            on_step=self.log_on_step,
        )

        # Stop if max runtime is exceeded
        if time.time() - t_start > self.hparams.max_runtime:
            break

    return {"max_reward": max_reward, "best_solutions": best_solutions}

on_train_batch_end

on_train_batch_end(
    outputs: STEP_OUTPUT, batch: Any, batch_idx: int
) -> None

We store the best solution and reward found.

Source code in rl4co/models/zoo/active_search/search.py
def on_train_batch_end(
    self, outputs: STEP_OUTPUT, batch: Any, batch_idx: int
) -> None:
    """We store the best solution and reward found."""
    max_rewards, best_solutions = outputs["max_reward"], outputs["best_solutions"]
    self.instance_rewards[batch_idx] = max_rewards
    self.instance_solutions[batch_idx, :] = best_solutions.squeeze(
        0
    )  # only one instance
    log.info(f"Best reward: {max_rewards.mean():.2f}")

on_train_epoch_end

on_train_epoch_end() -> None

Called when the training ends. If the epoch ends, it means we have finished searching over the instances, thus the trainer should stop.

Source code in rl4co/models/zoo/active_search/search.py
def on_train_epoch_end(self) -> None:
    """Called when the training ends.
    If the epoch ends, it means we have finished searching over the
    instances, thus the trainer should stop.
    """
    save_path = self.hparams.save_path
    if save_path is not None:
        log.info(f"Saving solutions and rewards to {save_path}...")
        torch.save(
            {"solutions": self.instance_solutions, "rewards": self.instance_rewards},
            save_path,
        )

    # https://github.com/Lightning-AI/lightning/issues/1406
    self.trainer.should_stop = True
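
If save_path is set, the checkpoint written above can be read back with plain torch.load; the file name below is illustrative:

import torch

ckpt = torch.load("as_results.pt")  # whatever was passed as save_path
solutions, rewards = ckpt["solutions"], ckpt["rewards"]
print(f"mean best reward over instances: {rewards.mean():.2f}")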

Efficient Active Search (EAS)

EAS

EAS(
    env,
    policy,
    dataset: Union[Dataset, str],
    use_eas_embedding: bool = True,
    use_eas_layer: bool = False,
    eas_emb_cache_keys: List[str] = ["logit_key"],
    eas_lambda: float = 0.013,
    batch_size: int = 2,
    max_iters: int = 200,
    augment_size: int = 8,
    augment_dihedral: bool = True,
    num_parallel_runs: int = 1,
    baseline: str = "multistart",
    max_runtime: int = 86400,
    save_path: str = None,
    optimizer: Union[str, Optimizer, partial] = "Adam",
    optimizer_kwargs: dict = {
        "lr": 0.0041,
        "weight_decay": 1e-06,
    },
    verbose: bool = True,
    **kwargs
)

Bases: TransductiveModel

Efficient Active Search for Neural Combinatorial Optimization from Hottung et al. (2022). Fine-tunes a subset of parameters (such as node embeddings or newly added layers), thus avoiding expensive re-encoding of the problem. Reference: https://openreview.net/pdf?id=nO5caZwFwYu

Parameters:

  • env

    RL4CO environment to be solved

  • policy

    policy network

  • dataset (Union[Dataset, str]) –

    dataset to be used for training

  • use_eas_embedding (bool, default: True ) –

    whether to use EAS embedding (EASEmb)

  • use_eas_layer (bool, default: False ) –

    whether to use EAS layer (EASLay)

  • eas_emb_cache_keys (List[str], default: ['logit_key'] ) –

    keys to cache in the embedding

  • eas_lambda (float, default: 0.013 ) –

    weight (lambda) of the imitation learning (IL) loss term

  • batch_size (int, default: 2 ) –

    batch size for training

  • max_iters (int, default: 200 ) –

    maximum number of iterations

  • augment_size (int, default: 8 ) –

    number of augmentations per state

  • augment_dihedral (bool, default: True ) –

    whether to augment with dihedral rotations

  • num_parallel_runs (int, default: 1 ) –

    number of parallel runs

  • baseline (str, default: 'multistart' ) –

    REINFORCE baseline type (multistart, symmetric, full)

  • max_runtime (int, default: 86400 ) –

    maximum runtime in seconds

  • save_path (str, default: None ) –

    path to save solution checkpoints

  • optimizer (Union[str, Optimizer, partial], default: 'Adam' ) –

    optimizer to use for training

  • optimizer_kwargs (dict, default: {'lr': 0.0041, 'weight_decay': 1e-06} ) –

    keyword arguments for optimizer

  • verbose (bool, default: True ) –

    whether to print progress for each iteration
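
Usage mirrors Active Search; the sketch below reuses the illustrative env, policy, and dataset from the Active Search example above. EASEmb and EASLay (documented below) are thin shortcuts that preselect the adaptation mode:

from rl4co.models.zoo.eas.search import EASEmb
from rl4co.utils.trainer import RL4COTrainer

# env, policy, and dataset as in the Active Search example above
model = EASEmb(  # shortcut for EAS(use_eas_embedding=True, use_eas_layer=False, ...)
    env,
    policy,
    dataset,
    batch_size=2,
    max_iters=200,
    baseline="multistart",
    save_path="eas_results.pt",  # illustrative path
)
trainer = RL4COTrainer(max_epochs=1, devices=1)
trainer.fit(model)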

Source code in rl4co/models/zoo/eas/search.py
def __init__(
    self,
    env,
    policy,
    dataset: Union[Dataset, str],
    use_eas_embedding: bool = True,
    use_eas_layer: bool = False,
    eas_emb_cache_keys: List[str] = ["logit_key"],
    eas_lambda: float = 0.013,
    batch_size: int = 2,
    max_iters: int = 200,
    augment_size: int = 8,
    augment_dihedral: bool = True,
    num_parallel_runs: int = 1,
    baseline: str = "multistart",
    max_runtime: int = 86_400,
    save_path: str = None,
    optimizer: Union[str, torch.optim.Optimizer, partial] = "Adam",
    optimizer_kwargs: dict = {"lr": 0.0041, "weight_decay": 1e-6},
    verbose: bool = True,
    **kwargs,
):
    self.save_hyperparameters(logger=False)

    assert (
        self.hparams.use_eas_embedding or self.hparams.use_eas_layer
    ), "At least one of `use_eas_embedding` or `use_eas_layer` must be True."

    super(EAS, self).__init__(
        env,
        policy=policy,
        dataset=dataset,
        batch_size=batch_size,
        max_iters=max_iters,
        max_runtime=max_runtime,
        save_path=save_path,
        optimizer=optimizer,
        optimizer_kwargs=optimizer_kwargs,
        **kwargs,
    )

    assert self.hparams.baseline in [
        "multistart",
        "symmetric",
        "full",
    ], f"Baseline {self.hparams.baseline} not supported."

setup

setup(stage='fit')

Setup base class and instantiate:

  • augmentation
  • instance solutions and rewards
  • original policy state dict
Source code in rl4co/models/zoo/eas/search.py
def setup(self, stage="fit"):
    """Setup base class and instantiate:
    - augmentation
    - instance solutions and rewards
    - original policy state dict
    """
    log.info(
        f"Setting up Efficient Active Search (EAS) with: \n"
        f"- EAS Embedding: {self.hparams.use_eas_embedding} \n"
        f"- EAS Layer: {self.hparams.use_eas_layer} \n"
    )
    super(EAS, self).setup(stage)

    # Instantiate augmentation
    self.augmentation = StateAugmentation(
        num_augment=self.hparams.augment_size,
        augment_fn="dihedral8" if self.hparams.augment_dihedral else "symmetric",
    )

    # Store original policy state dict
    self.original_policy_state = self.policy.state_dict()

    # Get dataset size and problem size
    len(self.dataset)
    _batch = next(iter(self.train_dataloader()))
    self.problem_size = self.env.reset(_batch)["action_mask"].shape[-1]
    self.instance_solutions = []
    self.instance_rewards = []

on_train_batch_start

on_train_batch_start(batch: Any, batch_idx: int)

Called before training (i.e. search) for a new batch begins. We re-load the original policy state dict and configure all parameters not to require gradients. We do the rest in the training step.

Source code in rl4co/models/zoo/eas/search.py
def on_train_batch_start(self, batch: Any, batch_idx: int):
    """Called before training (i.e. search) for a new batch begins.
    We re-load the original policy state dict and configure all parameters not to require gradients.
    We do the rest in the training step.
    """
    self.policy.load_state_dict(self.original_policy_state)

    # Set all policy parameters to not require gradients
    for param in self.policy.parameters():
        param.requires_grad = False

training_step

training_step(batch, batch_idx)

Main search loop. We use the training step to effectively adapt to a batch of instances.

Source code in rl4co/models/zoo/eas/search.py
def training_step(self, batch, batch_idx):
    """Main search loop. We use the training step to effectively adapt to a `batch` of instances."""
    # Augment state
    batch_size = batch.shape[0]
    td_init = self.env.reset(batch)
    n_aug, n_start, n_runs = (
        self.augmentation.num_augment,
        self.env.get_num_starts(td_init),
        self.hparams.num_parallel_runs,
    )
    td_init = self.augmentation(td_init)
    td_init = batchify(td_init, n_runs)
    num_instances = batch_size * n_aug * n_runs  # NOTE: no num_starts!
    # batch_r = n_runs * batch_size # effective batch size
    group_s = (
        n_start + 1
    )  # number of different rollouts per instance (+1 for incumbent solution construction)

    # Get encoder and decoder for simplicity
    encoder = self.policy.encoder
    decoder = self.policy.decoder

    # Precompute the cache of the embeddings (i.e. q,k,v and logit_key)
    embeddings, _ = encoder(td_init)
    cached_embeds = decoder._precompute_cache(embeddings)

    # Collect optimizer parameters
    opt_params = []
    if self.hparams.use_eas_layer:
        # EASLay: replace forward of logit attention computation. EASLayer
        eas_layer = EASLayerNet(num_instances, decoder.embed_dim).to(batch.device)
        decoder.pointer.eas_layer = partial(eas_layer, decoder.pointer)
        decoder.pointer.forward = partial(
            forward_pointer_attn_eas_lay, decoder.pointer
        )
        for param in eas_layer.parameters():
            opt_params.append(param)
    if self.hparams.use_eas_embedding:
        # EASEmb: set gradient of emb_key to True
        # for all the keys, wrap the embedding in a nn.Parameter
        for key in self.hparams.eas_emb_cache_keys:
            setattr(
                cached_embeds, key, torch.nn.Parameter(getattr(cached_embeds, key))
            )
            opt_params.append(getattr(cached_embeds, key))
    decoder.forward_eas = partial(forward_eas, decoder)

    # We pass attributes saved in policy too
    def set_attr_if_exists(attr):
        if hasattr(self.policy, attr):
            setattr(decoder, attr, getattr(self.policy, attr))

    for attr in ["temperature", "tanh_clipping", "mask_logits"]:
        set_attr_if_exists(attr)

    self.configure_optimizers(opt_params)

    # Solution and reward buffer
    max_reward = torch.full((batch_size,), -float("inf"), device=batch.device)
    best_solutions = torch.zeros(
        batch_size, self.problem_size * 2, device=batch.device, dtype=int
    )  # i.e. incumbent solutions

    # Init search
    t_start = time.time()
    for iter_count in range(self.hparams.max_iters):
        # Evaluate policy with sampling multistarts passing the cached embeddings
        best_solutions_expanded = best_solutions.repeat(n_aug, 1).repeat(n_runs, 1)
        logprobs, actions, td_out, reward = decoder.forward_eas(
            td_init.clone(),
            cached_embeds=cached_embeds,
            best_solutions=best_solutions_expanded,
            iter_count=iter_count,
            env=self.env,
            decode_type="multistart_sampling",
            num_starts=n_start,
        )

        # Unbatchify to get correct dimensions
        ll = get_log_likelihood(logprobs, actions, td_out.get("mask", None))
        ll = unbatchify(ll, (n_runs * batch_size, n_aug, group_s)).squeeze()
        reward = unbatchify(reward, (n_runs * batch_size, n_aug, group_s)).squeeze()
        actions = unbatchify(actions, (n_runs * batch_size, n_aug, group_s)).squeeze()

        # Compute REINFORCE loss with shared baselines
        # compared to original EAS, we also support symmetric and full baselines
        group_reward = reward[..., :-1]  # exclude incumbent solution
        if self.hparams.baseline == "multistart":
            bl_val = group_reward.mean(dim=-1, keepdim=True)
        elif self.hparams.baseline == "symmetric":
            bl_val = group_reward.mean(dim=-2, keepdim=True)
        elif self.hparams.baseline == "full":
            bl_val = group_reward.mean(dim=-1, keepdim=True).mean(
                dim=-2, keepdim=True
            )
        else:
            raise ValueError(f"Baseline {self.hparams.baseline} not supported.")

        # REINFORCE loss
        advantage = group_reward - bl_val
        loss_rl = -(advantage * ll[..., :-1]).mean()
        # IL loss
        loss_il = -ll[..., -1].mean()
        # Total loss
        loss = loss_rl + self.hparams.eas_lambda * loss_il

        # Manual backpropagation
        opt = self.optimizers()
        opt.zero_grad()
        self.manual_backward(loss)

        # Save best solutions and rewards
        # Get max reward for each group and instance
        max_reward = reward.max(dim=2)[0].max(dim=1)[0]

        # Reshape and rank rewards
        reward_group = reward.reshape(n_runs * batch_size, -1)
        _, top_indices = torch.topk(reward_group, k=1, dim=1)

        # Obtain best solutions found so far
        solutions = actions.reshape(n_runs * batch_size, n_aug * group_s, -1)
        best_solutions_iter = gather_by_index(solutions, top_indices, dim=1)
        best_solutions[:, : best_solutions_iter.shape[1]] = best_solutions_iter

        self.log_dict(
            {
                "loss": loss,
                "max_reward": max_reward.mean(),
                "step": iter_count,
                "time": time.time() - t_start,
            },
            on_step=self.log_on_step,
        )

        log.info(
            f"{iter_count}/{self.hparams.max_iters} | "
            f" Reward: {max_reward.mean().item():.2f} "
        )

        # Stop if max runtime is exceeded
        if time.time() - t_start > self.hparams.max_runtime:
            log.info(f"Max runtime of {self.hparams.max_runtime} seconds exceeded.")
            break

    return {"max_reward": max_reward, "best_solutions": best_solutions}

on_train_batch_end

on_train_batch_end(
    outputs: STEP_OUTPUT, batch: Any, batch_idx: int
) -> None

We store the best solution and reward found.

Source code in rl4co/models/zoo/eas/search.py
def on_train_batch_end(
    self, outputs: STEP_OUTPUT, batch: Any, batch_idx: int
) -> None:
    """We store the best solution and reward found."""
    max_rewards, best_solutions = outputs["max_reward"], outputs["best_solutions"]
    self.instance_solutions.append(best_solutions)
    self.instance_rewards.append(max_rewards)
    log.info(f"Best reward: {max_rewards.mean():.2f}")

on_train_epoch_end

on_train_epoch_end() -> None

Called when training ends.

Source code in rl4co/models/zoo/eas/search.py
def on_train_epoch_end(self) -> None:
    """Called when the train ends."""
    save_path = self.hparams.save_path
    # concatenate solutions and rewards
    self.instance_solutions = pad_sequence(
        self.instance_solutions, batch_first=True, padding_value=0
    ).squeeze()
    self.instance_rewards = torch.cat(self.instance_rewards, dim=0).squeeze()
    if save_path is not None:
        log.info(f"Saving solutions and rewards to {save_path}...")
        torch.save(
            {"solutions": self.instance_solutions, "rewards": self.instance_rewards},
            save_path,
        )

    # https://github.com/Lightning-AI/lightning/issues/1406
    self.trainer.should_stop = True

EASEmb

EASEmb(*args, **kwargs)

Bases: EAS

EAS with embedding adaptation

Source code in rl4co/models/zoo/eas/search.py
def __init__(
    self,
    *args,
    **kwargs,
):
    if not kwargs.get("use_eas_embedding", False) or kwargs.get(
        "use_eas_layer", True
    ):
        log.warning(
            "Setting `use_eas_embedding` to True and `use_eas_layer` to False. Use EAS base class to override."
        )
    kwargs["use_eas_embedding"] = True
    kwargs["use_eas_layer"] = False
    super(EASEmb, self).__init__(*args, **kwargs)

EASLay

EASLay(*args, **kwargs)

Bases: EAS

EAS with layer adaptation

Source code in rl4co/models/zoo/eas/search.py
def __init__(
    self,
    *args,
    **kwargs,
):
    if kwargs.get("use_eas_embedding", False) or not kwargs.get(
        "use_eas_layer", True
    ):
        log.warning(
            "Setting `use_eas_embedding` to True and `use_eas_layer` to False. Use EAS base class to override."
        )
    kwargs["use_eas_embedding"] = False
    kwargs["use_eas_layer"] = True
    super(EASLay, self).__init__(*args, **kwargs)

forward_pointer_attn_eas_lay

forward_pointer_attn_eas_lay(
    self, query, key, value, logit_key, mask
)

Add a layer to the forward pass of logit attention, i.e., single-head attention.

Source code in rl4co/models/zoo/eas/decoder.py
def forward_pointer_attn_eas_lay(self, query, key, value, logit_key, mask):
    """Add layer to the forward pass of logit attention, i.e.
    Single-head attention.
    """
    # Compute inner multi-head attention with no projections.
    heads = self._inner_mha(query, key, value, mask)

    # Add residual for EAS layer if is set
    if getattr(self, "eas_layer", None) is not None:
        heads = heads + self.eas_layer(heads)

    glimpse = self.project_out(heads)

    # Batch matrix multiplication to compute logits (batch_size, num_steps, graph_size)
    # bmm is slightly faster than einsum and matmul
    logits = (
        torch.bmm(glimpse, logit_key.squeeze(1).transpose(-2, -1))
        / math.sqrt(glimpse.size(-1))
    ).squeeze(1)

    return logits

forward_eas

forward_eas(
    self,
    td: TensorDict,
    cached_embeds,
    best_solutions,
    iter_count: int = 0,
    env: Union[str, RL4COEnvBase] = None,
    decode_type: str = "multistart_sampling",
    num_starts: int = None,
    mask_logits: bool = True,
    temperature: float = 1.0,
    tanh_clipping: float = 0,
    **decode_kwargs
)

Forward pass of the decoder. Given the environment state and the pre-computed embeddings, compute the logits and sample actions.

Parameters:

  • td (TensorDict) –

    Input TensorDict containing the environment state

  • cached_embeds

    Precomputed embeddings for the nodes, cached in the form of q, k, v, and logit_key

  • env (Union[str, RL4COEnvBase], default: None ) –

    Environment to use for decoding. If None, the environment is instantiated from env_name. Note that it is more efficient to pass an already instantiated environment each time for fine-grained control

  • decode_type (str, default: 'multistart_sampling' ) –

    Type of decoding to use. Can be one of:

    • "sampling": sample from the logits
    • "greedy": take the argmax of the logits
    • "multistart_sampling": sample as sampling, but with multi-start decoding
    • "multistart_greedy": sample as greedy, but with multi-start decoding
  • num_starts (int, default: None ) –

    Number of multi-starts to use. If None, will be calculated from the action mask

  • calc_reward

    Whether to calculate the reward for the decoded sequence
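
The key difference from standard multi-start decoding is that one extra rollout per instance replays the incumbent (best-so-far) solution, which feeds the imitation loss in EAS. A plain-PyTorch sketch of that injection step, assuming the flat batch groups rollouts with the start index varying slowest (as the permute/reshape in the source below suggests); sizes are hypothetical:

import torch

batch_size, num_starts = 4, 10
group_s = num_starts + 1  # +1 slot reserved for the incumbent rollout

# Sampled actions for all batch_size * group_s parallel rollouts (flat layout)
action = torch.randint(0, 20, (group_s * batch_size,))
incumbent_action = torch.randint(0, 20, (batch_size,))  # best_solutions[:, decode_step]

# Reshape to [batch_size, group_s], overwrite the last rollout with the incumbent,
# then flatten back to the layout the decoder expects
action = action.view(group_s, batch_size).permute(1, 0)
action[:, -1] = incumbent_action
action = action.permute(1, 0).reshape(-1)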

Source code in rl4co/models/zoo/eas/decoder.py
def forward_eas(
    self,
    td: TensorDict,
    cached_embeds,
    best_solutions,
    iter_count: int = 0,
    env: Union[str, RL4COEnvBase] = None,
    decode_type: str = "multistart_sampling",
    num_starts: int = None,
    mask_logits: bool = True,
    temperature: float = 1.0,
    tanh_clipping: float = 0,
    **decode_kwargs,
):
    """Forward pass of the decoder
    Given the environment state and the pre-computed embeddings, compute the logits and sample actions

    Args:
        td: Input TensorDict containing the environment state
        cached_embeds: Precomputed embeddings for the nodes, already cached in the form of q, k, v, and logit_key
        env: Environment to use for decoding. If None, the environment is instantiated from `env_name`. Note that
            it is more efficient to pass an already instantiated environment each time for fine-grained control
        decode_type: Type of decoding to use. Can be one of:
            - "sampling": sample from the logits
            - "greedy": take the argmax of the logits
            - "multistart_sampling": sample as sampling, but with multi-start decoding
            - "multistart_greedy": sample as greedy, but with multi-start decoding
        num_starts: Number of multi-starts to use. If None, will be calculated from the action mask
        calc_reward: Whether to calculate the reward for the decoded sequence
    """
    # TODO: this could be refactored by decoding strategies

    # Collect logprobs
    logprobs = []
    actions = []

    decode_step = 0
    # Multi-start decoding: first action is chosen by ad-hoc node selection
    if num_starts > 1 or "multistart" in decode_type:
        action = env.select_start_nodes(td, num_starts + 1) % num_starts
        # Append incumbent solutions
        if iter_count > 0:
            action = unbatchify(action, num_starts + 1)
            action[:, -1] = best_solutions[:, decode_step]
            action = action.permute(1, 0).reshape(-1)

        # Expand td to batch_size * (num_starts + 1)
        td = batchify(td, num_starts + 1)

        td.set("action", action)
        td = env.step(td)["next"]
        logp = torch.zeros_like(
            td["action_mask"], device=td.device
        )  # first logprobs is 0, so p = logprobs.exp() = 1

        logprobs.append(logp)
        actions.append(action)

    # Main decoding: loop until all sequences are done
    while not td["done"].all():
        decode_step += 1
        logits, mask = self.forward(td, cached_embeds, num_starts + 1)

        logp = process_logits(
            logits,
            mask,
            temperature=self.temperature if self.temperature is not None else temperature,
            tanh_clipping=self.tanh_clipping
            if self.tanh_clipping is not None
            else tanh_clipping,
            mask_logits=self.mask_logits if self.mask_logits is not None else mask_logits,
        )

        # Select the indices of the next nodes in the sequences, result (batch_size) long
        action = decode_logprobs(logp, mask, decode_type=decode_type)

        if iter_count > 0:  # append incumbent solutions
            init_shp = action.shape
            action = unbatchify(action, num_starts + 1)
            action[:, -1] = best_solutions[:, decode_step]
            action = action.permute(1, 0).reshape(init_shp)

        td.set("action", action)
        td = env.step(td)["next"]

        # Collect output of step
        logprobs.append(logp)
        actions.append(action)

    logprobs, actions = torch.stack(logprobs, 1), torch.stack(actions, 1)
    rewards = env.get_reward(td, actions)
    return logprobs, actions, td, rewards

EASLayerNet

EASLayerNet(num_instances: int, emb_dim: int)

Bases: Module

Instantiate weights and biases for the added layer. The layer is defined as: h = relu(emb * W1 + b1); out = h * W2 + b2. Wrapping the tensors in nn.Parameter makes them trainable (i.e., sets requires_grad to True).

Parameters:

  • num_instances (int) –

    Number of instances in the dataset

  • emb_dim (int) –

    Dimension of the embedding
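
A minimal shape sketch (sizes are illustrative). Because W2 and b2 start at zero, the layer's initial output is zero, so adding it as a residual leaves the attention heads unchanged until the first gradient update:

import torch
from rl4co.models.zoo.eas.nn import EASLayerNet

num_instances, group_num, emb_dim = 8, 5, 128
layer = EASLayerNet(num_instances, emb_dim)

emb = torch.randn(num_instances, group_num, emb_dim)
out = layer(emb)  # residual term added to the attention heads
assert out.shape == emb.shape
assert torch.allclose(out, torch.zeros_like(out))  # zero output before any update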

Source code in rl4co/models/zoo/eas/nn.py
def __init__(self, num_instances: int, emb_dim: int):
    super().__init__()
    # W2 and b2 are initialized to zero so the layer outputs zero at the first iteration (the residual connection then acts as the identity)
    self.W1 = nn.Parameter(torch.randn(num_instances, emb_dim, emb_dim))
    self.b1 = nn.Parameter(torch.randn(num_instances, 1, emb_dim))
    self.W2 = nn.Parameter(torch.zeros(num_instances, emb_dim, emb_dim))
    self.b2 = nn.Parameter(torch.zeros(num_instances, 1, emb_dim))
    torch.nn.init.xavier_uniform_(self.W1)
    torch.nn.init.xavier_uniform_(self.b1)

forward

forward(*args)

emb: [num_instances, group_num, emb_dim]

Source code in rl4co/models/zoo/eas/nn.py
def forward(self, *args):
    """emb: [num_instances, group_num, emb_dim]"""
    # get tensor arg (from partial instantiation)
    emb = [arg for arg in args if isinstance(arg, torch.Tensor)][0]
    h = torch.relu(torch.matmul(emb, self.W1) + self.b1.expand_as(emb))
    return torch.matmul(h, self.W2) + self.b2.expand_as(h)