Engine

Training

class ezflow.engine.trainer.DistributedTrainer(cfg, model, train_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator, val_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator)

Trainer class for distributed training and evaluation of models in a single-node, multi-GPU environment.

Parameters

cfg (CfgNode) – Configuration object for training
model (torch.nn.Module) – Model to be trained
train_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for training
val_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for validation

train(loss_fn=None, optimizer=None, scheduler=None, total_epochs=None, start_epoch=None)

Trains the model in a distributed fashion using PyTorch DistributedDataParallel (DDP).

Parameters

loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).
optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).
scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).
total_epochs (int, optional) – The number of epochs to train for. Defaults to None (which uses the number of epochs specified in the config file).
start_epoch (int, optional) – The epoch number to resume training from. Defaults to None (which starts from 0).
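Under DDP, each GPU process normally trains on its own disjoint slice of the data in every epoch. As an illustration only, the pure-Python sketch below mimics how torch.utils.data.DistributedSampler assigns indices when shuffle=False and drop_last=False: it pads the index list by wrapping around so every rank gets the same number of samples, then gives each rank every world_size-th index. Whether DistributedTrainer relies on this exact sampler is an assumption, and shard_indices is a hypothetical helper, not an ezflow function.

```python
import math
from typing import List


def shard_indices(dataset_len: int, world_size: int, rank: int) -> List[int]:
    """Indices one DDP rank would see (hypothetical helper, not part of ezflow).

    Mimics torch.utils.data.DistributedSampler with shuffle=False, drop_last=False.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    num_samples = math.ceil(dataset_len / world_size)  # samples per rank
    total_size = num_samples * world_size
    indices = list(range(dataset_len))
    padding = total_size - dataset_len
    if padding:
        # Wrap around to the start so total_size is divisible by world_size
        indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Rank r takes indices r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]


for r in range(4):
    print(r, shard_indices(10, 4, r))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6, 0]
# 3 [3, 7, 1]
```

With 10 samples on 4 GPUs, indices 0 and 1 are reused as padding so that every rank processes 3 samples per epoch.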

class ezflow.engine.trainer.Trainer(cfg, model, train_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator, val_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator)

Trainer class for training and evaluating models on a single device (CPU or GPU).

Parameters

cfg (CfgNode) – Configuration object for training
model (torch.nn.Module) – Model to be trained
train_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for training
val_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for validation

train(loss_fn=None, optimizer=None, scheduler=None, total_epochs=None, start_epoch=None)

Trains the model on a single CPU or GPU device.

Parameters

loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).
optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).
scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).
total_epochs (int, optional) – The number of epochs to train for. Defaults to None (which uses the number of epochs specified in the config file).
start_epoch (int, optional) – The epoch to resume training from. Defaults to None (which starts from 0).
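Each train() argument left as None falls back to a default, as listed above. A minimal sketch of that resolution for the two epoch arguments, assuming a plain dict stands in for the CfgNode; resolve_train_args and the "EPOCHS" key are hypothetical names, not ezflow's. The same check-for-None pattern would apply to loss_fn, optimizer and scheduler.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_train_args(
    cfg: Dict[str, Any],
    total_epochs: Optional[int] = None,
    start_epoch: Optional[int] = None,
) -> Tuple[int, int]:
    """Apply train()'s documented None defaults (hypothetical helper).

    cfg stands in for the CfgNode; the "EPOCHS" key is an assumed name.
    """
    if total_epochs is None:
        total_epochs = cfg["EPOCHS"]  # fall back to the config file value
    if start_epoch is None:
        start_epoch = 0  # documented default: start from epoch 0
    return start_epoch, total_epochs


print(resolve_train_args({"EPOCHS": 100}))                  # (0, 100)
print(resolve_train_args({"EPOCHS": 100}, start_epoch=40))  # (40, 100)
print(resolve_train_args({"EPOCHS": 100}, total_epochs=5))  # (0, 5)
```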

Evaluation

ezflow.engine.eval.eval_model(model, dataloader, device, metric=None, profiler=None, flow_scale=1.0, pad_divisor=1)

Evaluates a model on a dataloader and optionally profiles model characteristics such as memory usage and inference time.

Parameters

model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric (function, optional) – Function to be used to calculate the evaluation metric
profiler (ezflow.engine.Profiler, optional) – Profiler to be used for profiling model characteristics
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
pad_divisor (int, optional) – Images are padded so that their height and width are divisible by this value, by default 1

Returns

Average evaluation metric

Return type

float
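For optical flow, the evaluation metric is commonly the average end-point error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors. The pure-Python sketch below shows how flow_scale rescales predictions before the metric is computed and how per-batch scores are reduced to a single float. The flat list-of-(u, v) layout, endpoint_error and evaluate are illustrative assumptions, not ezflow's metric implementation.

```python
import math
from typing import Callable, Iterable, List, Tuple

Flow = List[Tuple[float, float]]  # one (u, v) displacement per pixel


def endpoint_error(pred: Flow, target: Flow) -> float:
    """Mean Euclidean distance between predicted and ground-truth vectors."""
    dists = [math.hypot(pu - tu, pv - tv) for (pu, pv), (tu, tv) in zip(pred, target)]
    return sum(dists) / len(dists)


def evaluate(predict: Callable[[object], Flow],
             batches: Iterable[Tuple[object, Flow]],
             flow_scale: float = 1.0) -> float:
    """Scale each prediction by flow_scale, score it, and average the scores."""
    scores = []
    for inputs, target in batches:
        pred = [(u * flow_scale, v * flow_scale) for u, v in predict(inputs)]
        scores.append(endpoint_error(pred, target))
    return sum(scores) / len(scores)  # unweighted mean of per-batch scores


def half_scale_model(inputs: object) -> Flow:
    # Toy "model" whose predictions are half the true displacement
    return [(0.5, 1.0), (1.5, 0.0)]


data = [("frame pair", [(1.0, 2.0), (3.0, 0.0)])]
print(evaluate(half_scale_model, data, flow_scale=2.0))  # 0.0
```

Here flow_scale=2.0 exactly undoes the toy model's halving, so the EPE drops to zero.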

ezflow.engine.eval.profile_inference(model, dataloader, device, metric_fn, profiler, flow_scale=1.0, count_params=False, pad_divisor=1)

Runs inference with a model on a dataloader, profiling model characteristics such as memory usage and inference time and computing the evaluation metric.

Parameters

model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric_fn (function) – Function to be used to calculate the evaluation metric
profiler (ezflow.engine.Profiler) – Profiler to be used for collecting performance metrics of the model
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
count_params (bool, optional) – Flag to indicate whether to count model parameters
pad_divisor (int, optional) – Images are padded so that their height and width are divisible by this value, by default 1

Returns

metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information
avg_inference_time (float) – Average inference time
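Both profile_inference and run_inference return an AverageMeter. The class below is a sketch of the common running-average pattern such meters follow, where each update is a batch mean weighted by the batch size; the attribute names val, sum, count and avg are assumptions about ezflow's version.

```python
class AverageMeter:
    """Tracks the latest value, running sum, sample count and running average."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.val = 0.0
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val: float, n: int = 1) -> None:
        # val is the mean over n samples, so weight it by n
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = AverageMeter()
meter.update(2.0, n=4)  # batch of 4 samples with mean EPE 2.0
meter.update(5.0, n=2)  # batch of 2 samples with mean EPE 5.0
print(meter.avg)        # (8.0 + 10.0) / 6 = 3.0
```

Weighting by n keeps a small final batch from skewing the average.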

ezflow.engine.eval.run_inference(model, dataloader, device, metric_fn, flow_scale=1.0, pad_divisor=1)

Runs inference with a model on a dataloader and records the inference time and the evaluation metric.

Parameters

model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
metric_fn (function) – Function to be used to calculate the evaluation metric
flow_scale (float, optional) – Scale factor to be applied to the predicted flow
pad_divisor (int, optional) – Images are padded so that their height and width are divisible by this value, by default 1

Returns

metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information
avg_inference_time (float) – Average inference time
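Measuring average inference time usually means excluding the first call(s), which pay one-time costs such as CUDA context setup. The sketch below times each batch with time.perf_counter and skips a configurable number of warmup batches; timed_inference is a hypothetical helper, not ezflow's implementation. On a GPU, torch.cuda.synchronize() should be called before each clock read because CUDA kernels launch asynchronously; this stdlib-only sketch omits that.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def timed_inference(predict: Callable[[Any], Any],
                    batches: Iterable[Any],
                    warmup_batches: int = 1) -> Tuple[List[Any], float]:
    """Run predict on every batch; return outputs and mean seconds per timed batch.

    The first warmup_batches calls run normally but are excluded from the average.
    On a GPU you would also call torch.cuda.synchronize() before each clock read.
    """
    outputs: List[Any] = []
    times: List[float] = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        out = predict(batch)
        elapsed = time.perf_counter() - start
        outputs.append(out)
        if i >= warmup_batches:
            times.append(elapsed)
    avg_time = sum(times) / len(times) if times else 0.0
    return outputs, avg_time


outputs, avg_time = timed_inference(lambda x: x * 2, [1, 2, 3])
print(outputs)  # [2, 4, 6]
```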

ezflow.engine.eval.warmup(model, dataloader, device, pad_divisor=1)

Performs one iteration of data loading and model prediction to warm up the CUDA device, so that one-time initialization costs do not distort later timing measurements.

Parameters

model (torch.nn.Module) – Model to be used for prediction / inference
dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference
device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference
pad_divisor (int, optional) – Images are padded so that their height and width are divisible by this value, by default 1
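All the evaluation helpers in this section, including warmup, take pad_divisor. A minimal sketch of computing the padding that makes both image dimensions multiples of pad_divisor, returned in the (left, right, top, bottom) order that torch.nn.functional.pad uses for the last two dimensions. Splitting the padding between opposite edges is an assumption (some implementations pad only one side), and pad_amounts is a hypothetical helper.

```python
from typing import Tuple


def pad_amounts(height: int, width: int, pad_divisor: int = 1) -> Tuple[int, int, int, int]:
    """(left, right, top, bottom) padding making height and width divisible by pad_divisor."""
    if pad_divisor < 1:
        raise ValueError("pad_divisor must be >= 1")
    pad_h = -height % pad_divisor  # rows to add
    pad_w = -width % pad_divisor   # columns to add
    # Split each amount between opposite edges; any odd pixel goes right/bottom
    return pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2


print(pad_amounts(436, 1024, 8))  # Sintel-sized frame: (0, 0, 2, 2)
print(pad_amounts(375, 1242, 8))  # KITTI-sized frame: (3, 3, 0, 1)
print(pad_amounts(375, 1242))     # pad_divisor=1 adds nothing: (0, 0, 0, 0)
```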

Pruning

ezflow.engine.pruning.prune_l1_structured(model, layer_type, proportion)

Applies L1 structured pruning to the layers of type layer_type in the model.

Parameters

model (torch.nn.Module) – The model to prune
layer_type (type) – The layer class to prune, e.g. torch.nn.Conv2d
proportion (float) – The proportion of weights to prune
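Structured pruning removes whole structures, such as entire rows of a weight matrix or a convolution's output channels, rather than individual weights. In PyTorch this is usually done with torch.nn.utils.prune.ln_structured(module, "weight", amount, n=1, dim=...); which dimension ezflow prunes along is not stated here. The pure-Python sketch below zeroes the rows with the smallest L1 norm, rounding proportion * rows the same way PyTorch rounds a float amount; l1_structured_prune is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def l1_structured_prune(weight: Matrix, proportion: float) -> Matrix:
    """Zero the rows (e.g. output channels) with the smallest L1 norm."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    n_prune = round(proportion * len(weight))
    norms = [sum(abs(w) for w in row) for row in weight]
    # Indices of the n_prune rows with the smallest L1 norms
    pruned = set(sorted(range(len(weight)), key=lambda i: norms[i])[:n_prune])
    return [[0.0] * len(row) if i in pruned else list(row) for i, row in enumerate(weight)]


w = [[1.0, -1.0], [0.1, 0.2], [3.0, 0.0], [-0.5, 0.1]]
print(l1_structured_prune(w, 0.5))
# [[1.0, -1.0], [0.0, 0.0], [3.0, 0.0], [0.0, 0.0]]
```

The row norms are 2.0, 0.3, 3.0 and 0.6, so with proportion=0.5 the two weakest rows (0.3 and 0.6) are zeroed as whole units.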

ezflow.engine.pruning.prune_l1_unstructured(model, layer_type, proportion)

Applies L1 unstructured pruning to the layers of type layer_type in the model.

Parameters

model (torch.nn.Module) – The model to prune
layer_type (type) – The layer class to prune, e.g. torch.nn.Conv2d
proportion (float) – The proportion of weights to prune
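Unstructured pruning zeroes individual weights with the smallest absolute value (L1 magnitude), regardless of where they sit in the tensor. The PyTorch counterpart is torch.nn.utils.prune.l1_unstructured(module, "weight", amount); whether ezflow calls it this way is an assumption. The pure-Python sketch below shows the selection rule on a flat list of weights; l1_unstructured_prune is a hypothetical helper.

```python
from typing import List


def l1_unstructured_prune(weights: List[float], proportion: float) -> List[float]:
    """Zero the individual weights with the smallest absolute values."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    n_prune = round(proportion * len(weights))
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


print(l1_unstructured_prune([0.9, -0.05, 0.3, -1.2, 0.01], 0.4))
# [0.9, 0.0, 0.3, -1.2, 0.0]
```

Unlike the structured variant, the zeros end up scattered, so the tensor keeps its shape and dense matrix operations do not get cheaper without sparse kernels.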

Profiler

class ezflow.engine.profiler.Profiler(model_name, log_dir, profile_cpu=False, profile_cuda=False, profile_memory=False, record_shapes=False, skip_first=0, wait=0, warmup=1, active=1, repeat=10)

A wrapper that stores the parameters used to initialize the PyTorch profiler. An instance of this class can be passed as the profiler argument of ezflow.engine.eval_model to profile the model during inference.

See the official documentation on torch.profiler.

Parameters

model_name (str) – Name of the model
log_dir (str) – Path to save the profiling logs
profile_cpu (bool, optional) – Enable CPU profiling, by default False
profile_cuda (bool, optional) – Enable CUDA profiling, by default False
profile_memory (bool, optional) – Enable memory profiling, by default False
record_shapes (bool, optional) – Enable shape recording for tensors, by default False
skip_first (int, optional) – Number of initial iterations to skip before the first profiling cycle starts, by default 0
wait (int, optional) – Number of iterations at the start of each cycle during which the profiler stays idle, by default 0
warmup (int, optional) – Number of iterations per cycle during which the profiler traces but discards the results, by default 1
active (int, optional) – Number of iterations per cycle that are actively recorded, by default 1
repeat (int, optional) – Number of wait/warmup/active cycles to run, by default 10
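The parameters skip_first, wait, warmup, active and repeat share their names with the arguments of torch.profiler.schedule, so the class presumably forwards them there (an assumption). The pure-Python sketch below reproduces that schedule logic to show which iterations get recorded; profiler_action is a hypothetical helper, and its string results stand in for torch.profiler.ProfilerAction values.

```python
def profiler_action(step: int, skip_first: int = 0, wait: int = 0,
                    warmup: int = 1, active: int = 1, repeat: int = 10) -> str:
    """What the profiler does at a 0-based step, following torch.profiler.schedule."""
    cycle = wait + warmup + active
    if cycle <= 0:
        raise ValueError("wait + warmup + active must be positive")
    if step < skip_first:
        return "none"
    step -= skip_first
    if repeat > 0 and step // cycle >= repeat:
        return "none"  # all repeat cycles are finished
    pos = step % cycle
    if pos < wait:
        return "none"
    if pos < wait + warmup:
        return "warmup"  # tracing runs but results are discarded
    # The last active step of each cycle also hands the trace to on_trace_ready
    return "record_and_save" if pos == cycle - 1 else "record"


# Defaults (wait=0, warmup=1, active=1, repeat=10): 10 cycles of 2 steps each
print([profiler_action(s) for s in range(4)])
# ['warmup', 'record_and_save', 'warmup', 'record_and_save']
print(profiler_action(20))  # 'none': profiling stops after 20 steps
```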

Retrieve

Adapted from Detectron2 (https://github.com/facebookresearch/detectron2)

ezflow.engine.retrieve.get_training_cfg(cfg_path=None, cfg_name=None, custom=True)

Loads a training configuration file and returns it as a CfgNode object.

Parameters

cfg_path (str) – Path to the config file.
cfg_name (str) – Name of the config file.
custom (bool) – If True, the config file is assumed to be a custom config file. If False, the config file is assumed to be a standard config file present in ezflow/configs/trainers.

Returns

cfg – The config object

Return type

CfgNode