Hydra Configuration
Hydra makes it very convenient to configure projects with many parameter settings, such as the RL4CO library.
While you don't need Hydra to use RL4CO, we recommend it for your own projects to make it easier to manage the configuration of your experiments.
Hydra uses config files in .yaml format for this. These files can be found in the configs/ folder, where the subfolders define configurations for specific parts of the framework, which are then combined in the main.yaml configuration. In this tutorial we will have a look at how to use these different configuration files and how to add new parameters to the configuration.
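Roughly, the folder is laid out as follows (a sketch inferred from the configuration keys used throughout this tutorial; the exact contents of your checkout may differ):
configs/
├── main.yaml       # top-level config that composes the defaults below
├── model/          # model configs, e.g. default.yaml, pomo.yaml
├── env/            # environment configs
├── trainer/        # trainer settings
├── callbacks/      # callback configs
├── logger/         # logger configs, e.g. wandb
├── experiment/     # experiment configs using the # @package _global_ directive
└── debug/          # lightweight debugging configs, e.g. default.yaml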
from hydra import compose, initialize
from omegaconf import OmegaConf
ROOT_DIR = "../../" # relative to this file
# context initialization
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main")
Hydra stores the configuration in a dictionary-like object of type DictConfig from the OmegaConf library:
type(cfg)
omegaconf.dictconfig.DictConfig
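As a quick aside (not needed for the rest of the tutorial), a DictConfig can be converted into a plain Python dictionary if you ever need one:
# OmegaConf was already imported above
plain = OmegaConf.to_container(cfg, resolve=False)  # nested dicts and lists
print(type(plain))  # <class 'dict'>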
The different subfolders in the configs folder are represented as distinct keys in the OmegaConf object:
list(cfg.keys())
['mode', 'tags', 'train', 'test', 'compile', 'ckpt_path', 'seed', 'matmul_precision', 'model', 'callbacks', 'logger', 'trainer', 'paths', 'extras', 'env']
Keys can be accessed using dot notation (e.g. cfg.model) or via normal dictionary indexing:
print(cfg.model == cfg["model"])
True
The dot notation is, however, more convenient, especially in nested structures:
print(cfg.model._target_ == cfg["model"]["_target_"])
True
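For instance, a deeply nested value such as the learning rate can be read in a single expression (its value comes from the default model config printed below):
print(cfg.model.optimizer_kwargs.lr)  # 0.0001 in the default configuration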
For example, let's look at the model configuration (which corresponds to the model/default.yaml configuration):
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  test:
  - reward
  log_on_step: true
_target_: rl4co.models.AttentionModel
baseline: rollout
batch_size: 512
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
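As an aside, the _target_ key is what hydra.utils.instantiate uses to build the actual Python object from a config node. A minimal sketch, assuming the environment is instantiated first and passed to the model (the exact call in rl4co/tasks/train.py may differ):
from hydra.utils import instantiate

# Hypothetical sketch: build environment and model from the composed config
env = instantiate(cfg.env)
model = instantiate(cfg.model, env)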
If we want to change parts of the configuration, it is generally good practice to make the changes via the command line when executing the respective Python script (in the case of RL4CO, for example, rl4co/tasks/train.py). For example, if we want to use a different model configuration, we can do something like:
python train.py model=pomo model.batch_size=32
Here we use the model/pomo.yaml configuration for the model and also change the batch size during training to 32.
Note: check out the override syntax documentation on the Hydra website for more!
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","model.batch_size=32"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
batch_size: 32
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
It is also possible to add new parameters to a config using the + prefix. Using ++ will add a new parameter if it does not exist and overwrite it if it does (see the sketch after the next example).
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","model.batch_size=32","+model.num_starts=10"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
batch_size: 32
val_batch_size: 1024
test_batch_size: 1024
train_data_size: 1280000
val_data_size: 10000
test_data_size: 10000
optimizer_kwargs:
  lr: 0.0001
num_starts: 10
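As a minimal sketch of the ++ prefix (here applied to model.batch_size, which already exists in the composed config and is therefore overwritten):
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo", "++model.batch_size=64"])

print(cfg.model.batch_size)  # 64: the existing value is overwritten; a missing key would simply be added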
Likewise, we can also remove unwanted parts of the configuration. For example, if we do not want to use any experiment configuration, we can remove the changes made by experiment/base.yaml using the ~ prefix:
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["model=pomo","~experiment"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  - max_reward
  - max_aug_reward
  test: ${metrics.val}
  log_on_step: true
_target_: rl4co.models.POMO
num_augment: 8
As you can see, parameters like "batch_size" were removed from the model config, as those were set by the experiment config base.yaml. Through the package directive
# @package _global_
in configs/experiment/base.yaml, this configuration is able to make changes to all parts of the configuration (like model, trainer, logger). So instead of adding a new key to the OmegaConf object, configurations with a # @package _global_
directive typically alter other parts of the configuration.
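As an illustration, a config with this directive could look roughly like the following (the keys shown here are made up for the sketch; see configs/experiment/base.yaml for the actual values):
# @package _global_
# Because of the directive above, these keys patch the top-level configuration
# (model, trainer, ...) instead of being nested under a new "experiment" key.
model:
  batch_size: 512
  train_data_size: 1280000
trainer:
  max_epochs: 100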
Another example of such a configuration is debug/default.yaml, which sets all parameters to a lightweight debugging mode:
with initialize(version_base=None, config_path=ROOT_DIR+"configs"):
    cfg = compose(config_name="main", overrides=["debug=default"])
print(OmegaConf.to_yaml(cfg.model))
generate_default_data: true
metrics:
  train:
  - loss
  - reward
  val:
  - reward
  test:
  - reward
  log_on_step: true
_target_: rl4co.models.AttentionModel
baseline: rollout
batch_size: 8
val_batch_size: 32
test_batch_size: 32
train_data_size: 64
val_data_size: 1000
test_data_size: 1000
optimizer_kwargs:
  lr: 0.0001
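From the command line, the same lightweight debugging setup would be selected in the usual way:
python train.py debug=default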
Summary
- Reference config files using the CLI flag <key>=<config_file> (e.g. model=am)
- Add parameters (or even entire keys) to the config using the "+" prefix (e.g. +model.batch_size=32)
- Remove parameters (or even entire keys) from the config using the "~" prefix (e.g. ~logger.wandb)
- The # @package _global_ directive allows a config file to modify any part of the global configuration
- Turn on debugging mode using debug=default