A2C

A2C(
    env: RL4COEnvBase,
    policy: Module,
    critic: CriticNetwork = None,
    critic_kwargs: dict = {},
    actor_optimizer_kwargs: dict = {"lr": 0.0001},
    critic_optimizer_kwargs: dict = None,
    **kwargs
)

Bases: REINFORCE

Advantage Actor-Critic (A2C) algorithm. A2C is a variant of REINFORCE in which the baseline is provided by a critic network. This implementation additionally supports separate optimizers (and optimizer settings) for the actor and the critic.

Parameters:

  • env (RL4COEnvBase) – Environment to use for the algorithm
  • policy (Module) – Policy to use for the algorithm
  • critic (CriticNetwork, default: None) – Critic network to use for the algorithm
  • critic_kwargs (dict, default: {}) – Keyword arguments to pass to the critic network
  • actor_optimizer_kwargs (dict, default: {'lr': 0.0001}) – Keyword arguments for the policy (=actor) optimizer
  • critic_optimizer_kwargs (dict, default: None) – Keyword arguments for the critic optimizer. If None, use the same as actor_optimizer_kwargs
  • **kwargs – Keyword arguments passed to the superclass

Source code in rl4co/models/rl/a2c/a2c.py
def __init__(
    self,
    env: RL4COEnvBase,
    policy: nn.Module,
    critic: CriticNetwork = None,
    critic_kwargs: dict = {},
    actor_optimizer_kwargs: dict = {"lr": 1e-4},
    critic_optimizer_kwargs: dict = None,
    **kwargs,
):
    if critic is None:
        log.info("Creating critic network for {}".format(env.name))
        critic = create_critic_from_actor(policy, **critic_kwargs)

    # The baseline is directly created here, so we eliminate the baseline argument
    kwargs.pop("baseline", None)

    super().__init__(env, policy, baseline=CriticBaseline(critic), **kwargs)
    self.actor_optimizer_kwargs = actor_optimizer_kwargs
    self.critic_optimizer_kwargs = (
        critic_optimizer_kwargs
        if critic_optimizer_kwargs is not None
        else actor_optimizer_kwargs
    )
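
For orientation, here is a minimal usage sketch. The environment and policy classes (TSPEnv, AttentionModelPolicy) and the import paths are assumptions based on the RL4CO package layout and may need adjusting for your version; any RL4COEnvBase environment and nn.Module policy will do.

# Minimal usage sketch (assumed imports; adjust to your RL4CO version)
from rl4co.envs import TSPEnv
from rl4co.models import AttentionModelPolicy, A2C

env = TSPEnv()                                    # any RL4COEnvBase works
policy = AttentionModelPolicy(env_name=env.name)  # any nn.Module policy works

# No critic is passed, so one is created from the policy via
# create_critic_from_actor; actor and critic get separate learning rates.
model = A2C(
    env,
    policy,
    actor_optimizer_kwargs={"lr": 1e-4},
    critic_optimizer_kwargs={"lr": 1e-3},
)

Since A2C inherits from REINFORCE (a Lightning module), the resulting model can then be trained with the usual Lightning/RL4CO trainer workflow.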

configure_optimizers

configure_optimizers()

Configure the optimizers for the policy and the critic network (=baseline)

Source code in rl4co/models/rl/a2c/a2c.py
def configure_optimizers(self):
    """Configure the optimizers for the policy and the critic network (=baseline)"""
    parameters = [
        {"params": self.policy.parameters(), **self.actor_optimizer_kwargs},
    ] + [{"params": self.baseline.parameters(), **self.critic_optimizer_kwargs}]

    return super().configure_optimizers(parameters)
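
The per-group dictionaries follow standard torch.optim behavior: each group supplies its own hyperparameters (here the learning rate), and anything not given falls back to the optimizer defaults. A minimal standalone illustration in plain PyTorch (not RL4CO-specific; the Linear modules stand in for the policy and critic):

import torch
from torch import nn

actor = nn.Linear(8, 8)   # stand-in for the policy
critic = nn.Linear(8, 1)  # stand-in for the critic

# Two parameter groups, each with its own learning rate
optimizer = torch.optim.Adam(
    [
        {"params": actor.parameters(), "lr": 1e-4},   # actor group
        {"params": critic.parameters(), "lr": 1e-3},  # critic group
    ]
)
print([g["lr"] for g in optimizer.param_groups])  # [0.0001, 0.001]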