Hydra Configuration
Hydra makes it very convenient to configure projects with many parameter settings, such as the RL4CO library.
While you don't need Hydra to use RL4CO, we recommend it for your own projects to make it easier to manage the configuration of your experiments.
Hydra uses config files in .yaml format for this. These files can be found in the configs/ folder, where the subfolders define configurations for specific parts of the framework, which are then combined in the main.yaml configuration. In this tutorial we will have a look at how to use these different configuration files and how to add new parameters to the configuration.
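Roughly, the folder is laid out as follows (a sketch inferred from the configuration keys used throughout this tutorial; the exact contents of your checkout may differ):
configs/
├── main.yaml       # top-level config that composes the defaults below
├── model/          # model configs, e.g. default.yaml, pomo.yaml
├── env/            # environment configs
├── trainer/        # trainer settings
├── callbacks/      # callback configs
├── logger/         # logger configs, e.g. wandb
├── experiment/     # experiment configs using the # @package _global_ directive
└── debug/          # lightweight debugging configs, e.g. default.yaml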
from hydra import compose, initialize
from omegaconf import OmegaConf
ROOT_DIR = "../../" # relative to this file
# context initialization
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main")
Hydra stores the configuration in a dictionary-like object of type DictConfig from the OmegaConf library:
type(cfg)
omegaconf.dictconfig.DictConfig
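As a quick aside (not needed for the rest of the tutorial), a DictConfig can be converted into a plain Python dictionary if you ever need one:
# OmegaConf was already imported above
plain = OmegaConf.to_container(cfg, resolve=False)  # nested dicts and lists
print(type(plain))  # <class 'dict'>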
The different subfolders in the configs folder are represented as distinct keys in the OmegaConf object:
list(cfg.keys())
['mode', 'tags', 'train', 'test', 'compile', 'ckpt_path', 'seed', 'matmul_precision', 'model', 'callbacks', 'logger', 'trainer', 'paths', 'extras', 'env']
Keys can be accessed using dot notation (e.g. cfg.model) or via normal dictionary indexing:
print(cfg.model == cfg["model"])
True
The dot notation is, however, more convenient, especially in nested structures:
print(cfg.model._target_ == cfg["model"]["_target_"])
True
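For instance, a deeply nested value such as the learning rate can be read in a single expression (its value comes from the default model config printed below):
print(cfg.model.optimizer_kwargs.lr)  # 0.0001 in the default configuration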
For example, let's look at the model configuration (which corresponds to the model/default.yaml configuration):
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  test:
  - reward
  log_on_step: true
_target_: rl4co.models.AttentionModel
baseline: rollout
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
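As an aside, the _target_ key is what hydra.utils.instantiate uses to build the actual Python object from a config node. A minimal sketch, assuming the environment is instantiated first and passed to the model (the exact call in rl4co/tasks/train.py may differ):
from hydra.utils import instantiate

# Hypothetical sketch: build environment and model from the composed config
env = instantiate(cfg.env)
model = instantiate(cfg.model, env)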
If we want to change parts of the configuration, it is generally good practice to make the changes via the command line when executing the respective Python script (in the case of RL4CO, for example, rl4co/tasks/train.py). For example, if we want to use a different model configuration, we can do something like:
python train.py model=pomo model.batch_size=32
Here we use the model/pomo.yaml configuration for the model and also change the batch size during training to 32.
Note: check out the override syntax documentation on the Hydra website for more!
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","model.batch_size=32"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
batch_size: 32
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
It is also possible to add new parameters to a config using the + prefix. Using ++ will add a new parameter if it does not exist and overwrite it if it does (see the sketch after the next example).
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","model.batch_size=32","+model.num_starts=10"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
batch_size: 32
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
num_starts: 10
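As a minimal sketch of the ++ prefix (here applied to model.batch_size, which already exists in the composed config and is therefore overwritten):
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo", "++model.batch_size=64"])

print(cfg.model.batch_size)  # 64: the existing value is overwritten; a missing key would simply be added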
Likewise, we can also remove unwanted parts of the configuration. For example, if we do not want to use any experiment configuration, we can remove the changes made by experiment/base.yaml using the ~ prefix:
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","~experiment"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
As you can see, parameters like "batch_size" were removed from the model config, as those were set by the experiment config base.yaml. Through the package directive
# @package _global_
in configs/experiment/base.yaml, this configuration is able to make changes to all parts of the configuration (like model, trainer, logger). So instead of adding a new key to the OmegaConf object, configurations with a # @package _global_
directive typically alter other parts of the configuration.
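As an illustration, a config with this directive could look roughly like the following (the keys shown here are made up for the sketch; see configs/experiment/base.yaml for the actual values):
# @package _global_
# Because of the directive above, these keys patch the top-level configuration
# (model, trainer, ...) instead of being nested under a new "experiment" key.
model:
  batch_size: 512
  train_data_size: 1280000
trainer:
  max_epochs: 100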
Another example of such a configuration is debug/default.yaml, which sets all parameters to a lightweight debugging mode:
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["debug=default"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  test:
  - reward
  log_on_step: true
_target_: rl4co.models.AttentionModel
baseline: rollout
batch_size: 8
val_batch_size: 32
test_batch_size: 32
train_data_size: 64
val_data_size: 1000
test_data_size: 1000
optimizer_kwargs:
  lr: 0.0001
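From the command line, the same lightweight debugging setup would be selected in the usual way:
python train.py debug=default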
Summary
- Reference config files using the CLI flag <key>=<config_file> (e.g. model=am)
- Add parameters (or even entire keys) to the config using the "+" prefix (e.g. +model.batch_size=32)
- Remove parameters (or even entire keys) from the config using the "~" prefix (e.g. ~logger.wandb)
- The # @package _global_ directive allows a config file to modify any part of the global configuration
- Turn on debugging mode using debug=default