Engine
Training
- class ezflow.engine.trainer.DistributedTrainer(cfg, model, train_loader_creator, val_loader_creator)[source]
Trainer class for distributed training and evaluation of models in a single-node, multi-GPU environment.
- Parameters
cfg (CfgNode) – Configuration object for training
model (torch.nn.Module) – Model to be trained
train_loader_creator (ezflow.DataloaderCreator) – DataloaderCreator instance for training
val_loader_creator (ezflow.DataloaderCreator) – DataloaderCreator instance for validation
- train(loss_fn=None, optimizer=None, scheduler=None, total_iterations=None, start_iteration=None)[source]
Method to train the model in a distributed fashion using PyTorch DistributedDataParallel (DDP).
- Parameters
loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).
optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).
scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).
total_iterations (int, optional) – The number of epochs or steps to train for. Defaults to None (which uses the number of epochs or steps specified in the config file).
start_iteration (int, optional) – The epoch or step number to resume training from. Defaults to None (which starts from 0).
- class ezflow.engine.trainer.Trainer(cfg, model, train_loader, val_loader)[source]
Trainer class for training and evaluating models on a single device (CPU or GPU).
- Parameters
cfg (CfgNode) – Configuration object for training
model (torch.nn.Module) – Model to be trained
train_loader (torch.utils.data.DataLoader) – DataLoader for training
val_loader (torch.utils.data.DataLoader) – DataLoader for validation
- train(loss_fn=None, optimizer=None, scheduler=None, total_iterations=None, start_iteration=None)[source]
Method to train the model on a single CPU or GPU device.
- Parameters
loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).
optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).
scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).
total_iterations (int, optional) – The number of epochs or steps to train for. Defaults to None (which uses the number of epochs or steps specified in the config file).
start_iteration (int, optional) – The epoch or step number to resume training from. Defaults to None (which starts from 0).
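The `None` defaults and the resume behaviour of `train` can be pictured with a short sketch. This is not ezflow's implementation: `resolve_training_range` and its `cfg_iterations` argument are hypothetical names, used only to illustrate how an explicit argument might take precedence over the config value and how `start_iteration` shortens the remaining schedule.

```python
def resolve_training_range(total_iterations=None, start_iteration=None, cfg_iterations=100):
    """Return the (start, stop) range a training loop would cover.

    Hypothetical helper: `cfg_iterations` stands in for the value read from the config.
    """
    # An explicit argument wins; None falls back to the config value.
    stop = cfg_iterations if total_iterations is None else total_iterations
    # None means "train from scratch", i.e. begin at iteration 0.
    start = 0 if start_iteration is None else start_iteration
    if not 0 <= start <= stop:
        raise ValueError(f"start_iteration={start} must lie in [0, {stop}]")
    return start, stop


start, stop = resolve_training_range(total_iterations=10, start_iteration=4)
remaining = list(range(start, stop))  # iterations still to run after resuming
print(remaining)
```

Under these assumptions, resuming at iteration 4 of 10 leaves six iterations to run, and calling the helper with no arguments falls back entirely to the config value.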
Evaluation
- ezflow.engine.eval.eval_model(model, dataloader, device, metric=None, profiler=None, flow_scale=1.0, pad_divisor=1)[source]
Evaluates a model on a dataloader and optionally profiles model characteristics such as memory usage, inference time, and evaluation metric
- Parameters
model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric (function, optional) – Function to be used to calculate the evaluation metric
profiler (ezflow.engine.Profiler, optional) – Profiler to be used for profiling model characteristics
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
pad_divisor (int, optional) – Images are padded so that their dimensions are evenly divisible by this value, by default 1
- Returns
Average evaluation metric
- Return type
float
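To make the role of `pad_divisor` concrete, the sketch below computes how many pixels of padding are needed for each dimension. It only shows the arithmetic: `compute_padding` is a hypothetical helper, and how ezflow distributes the padding across the image borders is not stated here.

```python
def compute_padding(height, width, pad_divisor=1):
    """Return (pad_h, pad_w) needed to make both dimensions divisible by pad_divisor."""
    if pad_divisor < 1:
        raise ValueError("pad_divisor must be a positive integer")
    # (-x) % d is the distance from x up to the next multiple of d (0 if already a multiple).
    pad_h = (-height) % pad_divisor
    pad_w = (-width) % pad_divisor
    return pad_h, pad_w


pad_h, pad_w = compute_padding(436, 1024, pad_divisor=8)
print(436 + pad_h, 1024 + pad_w)  # padded size is divisible by 8
```

With the default `pad_divisor=1`, every size is already divisible, so no padding is added.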
- ezflow.engine.eval.profile_inference(model, dataloader, device, metric_fn, profiler, flow_scale=1.0, count_params=False, pad_divisor=1)[source]
Uses a model to perform inference on a dataloader and profiles model characteristics such as memory usage, inference time, and evaluation metric
- Parameters
model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric_fn (function) – Function to be used to calculate the evaluation metric
profiler (ezflow.engine.Profiler) – Profiler to be used for collecting performance metrics of the model
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
count_params (bool, optional) – Flag to indicate whether to count model parameters
pad_divisor (int, optional) – Images are padded so that their dimensions are evenly divisible by this value, by default 1
- Returns
metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information
avg_inference_time (float) – Average inference time
- ezflow.engine.eval.run_inference(model, dataloader, device, metric_fn, flow_scale=1.0, pad_divisor=1)[source]
Uses a model to perform inference on a dataloader and captures inference time and evaluation metric
- Parameters
model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric_fn (function) – Function to be used to calculate the evaluation metric
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
pad_divisor (int, optional) – Images are padded so that their dimensions are evenly divisible by this value, by default 1
- Returns
metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information
avg_inference_time (float) – Average inference time
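The two return values can be illustrated with a minimal, framework-free sketch. The `AverageMeter` below is a simplified stand-in rather than ezflow's class, and `run_timed_inference` is a hypothetical function that mirrors only the general idea: time each forward pass and keep a running average of the metric.

```python
import time


class AverageMeter:
    """Minimal running-average tracker (a stand-in, not ezflow's AverageMeter)."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.sum += value * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


def run_timed_inference(model, batches, metric_fn):
    """Run model over batches, returning (metric_meter, avg_inference_time)."""
    metric_meter = AverageMeter()
    time_meter = AverageMeter()
    for inputs, target in batches:
        start = time.perf_counter()
        pred = model(inputs)
        time_meter.update(time.perf_counter() - start)  # time the forward pass only
        metric_meter.update(metric_fn(pred, target))
    return metric_meter, time_meter.avg


# Toy data: the "model" doubles its input; the metric is the absolute error.
batches = [(1.0, 2.0), (2.0, 5.0), (3.0, 6.0)]
meter, avg_time = run_timed_inference(lambda x: 2 * x, batches, lambda p, t: abs(p - t))
print(meter.avg)
```

On a CUDA device, kernels are launched asynchronously, so a real measurement would also need to synchronize (for example with `torch.cuda.synchronize()`) before reading the clock.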
- ezflow.engine.eval.warmup(model, dataloader, device, pad_divisor=1)[source]
Performs one iteration of data loading and model prediction to warm up the CUDA device.
- Parameters
model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
pad_divisor (int, optional) – Images are padded so that their dimensions are evenly divisible by this value, by default 1
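The purpose of a warm-up pass is to pay one-time setup costs before any timing begins. The toy sketch below illustrates that idea without a GPU; `LazyModel` and `warm_up` are hypothetical names, and the setup counter merely stands in for work such as CUDA context creation.

```python
class LazyModel:
    """Toy model whose first call pays a one-time setup cost (hypothetical)."""

    def __init__(self):
        self.initialized = False
        self.setup_calls = 0

    def __call__(self, x):
        if not self.initialized:
            self.setup_calls += 1  # stands in for one-time device initialisation
            self.initialized = True
        return x + 1


def warm_up(model, batches):
    """Run a single batch through the model and discard the result."""
    first = next(iter(batches))
    model(first)


model = LazyModel()
warm_up(model, [10, 20, 30])
outputs = [model(x) for x in [10, 20, 30]]  # later passes no longer include setup
print(outputs, model.setup_calls)
```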
Pruning
Profiler
- class ezflow.engine.profiler.Profiler(model_name, log_dir, profile_cpu=False, profile_cuda=False, profile_memory=False, record_shapes=False, skip_first=0, wait=0, warmup=1, active=1, repeat=10)[source]
This class is a wrapper that initializes the parameters of the PyTorch profiler. An instance of this class can be passed as an argument to ezflow.engine.eval_model to enable profiling of the model during inference.
Official documentation on torch.profiler
- Parameters
model_name (str) – Name of the model
log_dir (str) – Path to save the profiling logs
profile_cpu (bool, optional) – Enable CPU profiling, by default False
profile_cuda (bool, optional) – Enable CUDA profiling, by default False
profile_memory (bool, optional) – Enable memory profiling, by default False
record_shapes (bool, optional) – Enable shape recording for tensors, by default False
skip_first (int, optional) – Number of initial iterations to skip before the profiling schedule starts, by default 0
wait (int, optional) – Number of iterations to stay idle at the start of each profiling cycle, by default 0
warmup (int, optional) – Number of iterations to warmup the profiler, by default 1
active (int, optional) – Number of iterations to profile, by default 1
repeat (int, optional) – Number of times to repeat the profiling, by default 10
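The interaction of the five scheduling parameters is easiest to see by listing the phase of each step. The sketch below assumes that `Profiler` forwards `skip_first`, `wait`, `warmup`, `active`, and `repeat` to `torch.profiler.schedule`, where all of them are counted in profiler steps. `profiler_phase` is a hypothetical pure-Python helper that mimics that schedule, and the phase names are illustrative labels rather than library constants.

```python
def profiler_phase(step, skip_first=0, wait=0, warmup=1, active=1, repeat=10):
    """Return the phase ("idle", "warmup" or "record") for a given step index."""
    cycle = wait + warmup + active
    if cycle <= 0:
        raise ValueError("wait + warmup + active must be positive")
    if step < skip_first:
        return "idle"  # initial steps are ignored entirely
    step -= skip_first
    if repeat > 0 and step >= repeat * cycle:
        return "idle"  # all cycles have completed
    pos = step % cycle  # position inside the current cycle
    if pos < wait:
        return "idle"
    if pos < wait + warmup:
        return "warmup"
    return "record"


phases = [
    profiler_phase(s, skip_first=1, wait=1, warmup=1, active=2, repeat=1)
    for s in range(7)
]
print(phases)
```

In this example, one step is skipped, one is spent idle, one warms up the profiler, two are recorded, and the rest are idle because the single cycle is over. With the class defaults, each cycle is one warmup step followed by one recorded step, repeated ten times.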
Retrieve
Adapted from Detectron2 (https://github.com/facebookresearch/detectron2)
- ezflow.engine.retrieve.get_training_cfg(cfg_path=None, cfg_name=None, custom=True)[source]
- Parameters
cfg_path (str) – Path to the config file.
cfg_name (str) – Name of the config file.
custom (bool) – If True, the config file is assumed to be a custom config file. If False, the config file is assumed to be a standard config file present in ezflow/configs/trainers.
- Returns
cfg – The config object
- Return type