Engine

Training

class ezflow.engine.trainer.DistributedTrainer(cfg, model, train_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator, val_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator)

Trainer class for distributed training and evaluation of models in a single-node, multi-GPU environment.

Parameters
  • cfg (CfgNode) – Configuration object for training

  • model (torch.nn.Module) – Model to be trained

  • train_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for training

  • val_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for validation

train(loss_fn=None, optimizer=None, scheduler=None, total_epochs=None, start_epoch=None)

Trains the model in a distributed fashion using DistributedDataParallel (DDP).

Parameters
  • loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).

  • optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).

  • scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).

  • total_epochs (int, optional) – The number of epochs to train for. Defaults to None (which uses the number of epochs specified in the config file).

  • start_epoch (int, optional) – The epoch number to resume training from. Defaults to None (which starts from 0).

class ezflow.engine.trainer.Trainer(cfg, model, train_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator, val_loader_creator: ezflow.data.dataloader.dataloader_creator.DataloaderCreator)

Trainer class for training and evaluating models on a single device (CPU or GPU).

Parameters
  • cfg (CfgNode) – Configuration object for training

  • model (torch.nn.Module) – Model to be trained

  • train_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for training

  • val_loader_creator (ezflow.data.DataloaderCreator) – DataloaderCreator instance for validation

train(loss_fn=None, optimizer=None, scheduler=None, total_epochs=None, start_epoch=None)

Trains the model on a single CPU or GPU device.

Parameters
  • loss_fn (torch.nn.modules.loss, optional) – The loss function to be used. Defaults to None (which uses the loss function specified in the config file).

  • optimizer (torch.optim.Optimizer, optional) – The optimizer to be used. Defaults to None (which uses the optimizer specified in the config file).

  • scheduler (torch.optim.lr_scheduler, optional) – The learning rate scheduler to be used. Defaults to None (which uses the scheduler specified in the config file).

  • total_epochs (int, optional) – The number of epochs to train for. Defaults to None (which uses the number of epochs specified in the config file).

  • start_epoch (int, optional) – The epoch to resume training from. Defaults to None (which starts from 0).
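
Every optional argument of train() above falls back to a configured or default value when it is left as None. The following is a minimal sketch of that fallback pattern in plain Python. The helper `resolve` and the dictionary `cfg_defaults` (including its key names and values) are hypothetical stand-ins for illustration, not ezflow's internals.

```python
def resolve(explicit, configured):
    """Return the explicitly passed value, or the configured one when it is None."""
    return configured if explicit is None else explicit


# Hypothetical stand-in for values that would come from the training config.
cfg_defaults = {"total_epochs": 100, "start_epoch": 0}

# The caller overrides only the epoch count; start_epoch keeps its default of 0.
total_epochs = resolve(10, cfg_defaults["total_epochs"])
start_epoch = resolve(None, cfg_defaults["start_epoch"])
```

The comparison is against None rather than truthiness, so an explicit value such as 0 is still honoured.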

Evaluation

ezflow.engine.eval.eval_model(model, dataloader, device, metric=None, profiler=None, flow_scale=1.0, pad_divisor=1)

Evaluates a model on a dataloader and optionally profiles model characteristics such as memory usage, inference time, and the evaluation metric.

Parameters
  • model (torch.nn.Module) – Model to be used for prediction / inference

  • dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference

  • device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference

  • metric (function, optional) – Function to be used to calculate the evaluation metric

  • profiler (torch.profiler.profile, optional) – Profiler to be used for profiling model characteristics

  • flow_scale (float, optional) – Scale factor to be applied to the predicted flow

  • pad_divisor (int, optional) – Divisor that the image dimensions are padded to be evenly divisible by; by default 1

Returns

Average evaluation metric

Return type

float
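
The effect of pad_divisor can be illustrated with a little arithmetic: each image side is padded up to the next multiple of the divisor. The sketch below only computes how many extra rows and columns are needed. The function name `pad_amounts` is hypothetical, and how ezflow distributes the padding across the image borders is not specified here.

```python
def pad_amounts(height, width, divisor=1):
    """Return (pad_h, pad_w): extra rows/columns so each side is a multiple of divisor."""
    pad_h = (-height) % divisor
    pad_w = (-width) % divisor
    return pad_h, pad_w


# A 436 x 1024 frame needs 4 extra rows and no extra columns for a divisor of 8.
pad_h, pad_w = pad_amounts(436, 1024, divisor=8)
```

With the default divisor of 1 both amounts are 0, so no padding is applied.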

ezflow.engine.eval.profile_inference(model, dataloader, device, metric_fn, profiler, flow_scale=1.0, count_params=False, pad_divisor=1)

Uses a model to perform inference on a dataloader and profiles model characteristics such as memory usage, inference time, and the evaluation metric.

Parameters
  • model (torch.nn.Module) – Model to be used for prediction / inference

  • dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference

  • device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference

  • metric_fn (function) – Function to be used to calculate the evaluation metric

  • profiler (ezflow.engine.Profiler) – Profiler to be used for collecting performance metrics of the model

  • flow_scale (float, optional) – Scale factor to be applied to the predicted flow

  • count_params (bool, optional) – Flag to indicate whether to count model parameters

  • pad_divisor (int, optional) – Divisor that the image dimensions are padded to be evenly divisible by; by default 1

Returns

  • metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information

  • avg_inference_time (float) – Average inference time

ezflow.engine.eval.run_inference(model, dataloader, device, metric_fn, flow_scale=1.0, pad_divisor=1)

Uses a model to perform inference on a dataloader and captures the inference time and the evaluation metric.

Parameters
  • model (torch.nn.Module) – Model to be used for prediction / inference

  • dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference

  • device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference

  • metric_fn (function) – Function to be used to calculate the evaluation metric

  • flow_scale (float, optional) – Scale factor to be applied to the predicted flow

  • pad_divisor (int, optional) – Divisor that the image dimensions are padded to be evenly divisible by; by default 1

Returns

  • metric_meter (AverageMeter) – AverageMeter object containing the evaluation metric information

  • avg_inference_time (float) – Average inference time
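
The loop described above (predict, scale the flow, score it, time it) can be sketched in plain Python without torch. Everything here is an illustration under stated assumptions: `RunningAverage` is a hypothetical stand-in for an AverageMeter, `endpoint_error` is one possible metric function with an assumed `(pred, target)` signature, and flow_scale is assumed to multiply each predicted flow component.

```python
import time


class RunningAverage:
    """Hypothetical stand-in for an AverageMeter: tracks a running mean."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.sum += value * n
        self.count += n

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0


def endpoint_error(pred, target):
    """Mean Euclidean distance between predicted and target (u, v) flow vectors."""
    dists = [((pu - tu) ** 2 + (pv - tv) ** 2) ** 0.5
             for (pu, pv), (tu, tv) in zip(pred, target)]
    return sum(dists) / len(dists)


def run_inference_sketch(predict, batches, metric_fn, flow_scale=1.0):
    """Return (metric_meter, avg_inference_time) over all batches."""
    metric_meter = RunningAverage()
    time_meter = RunningAverage()
    for inputs, target in batches:
        start = time.perf_counter()
        pred = predict(inputs)
        time_meter.update(time.perf_counter() - start)
        # Assumption: flow_scale multiplies each predicted flow component.
        pred = [(u * flow_scale, v * flow_scale) for u, v in pred]
        metric_meter.update(metric_fn(pred, target))
    return metric_meter, time_meter.avg


# Toy data: the "model" echoes its input and the targets are the input doubled.
batches = [([(1.0, 0.0)], [(2.0, 0.0)]), ([(0.0, 3.0)], [(0.0, 6.0)])]
meter, avg_time = run_inference_sketch(lambda x: x, batches, endpoint_error, flow_scale=2.0)
```

Only the prediction call is timed, so the metric computation does not inflate the reported inference time.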

ezflow.engine.eval.warmup(model, dataloader, device, pad_divisor=1)

Performs one iteration of data loading and model prediction to warm up the CUDA device.

Parameters
  • model (torch.nn.Module) – Model to be used for prediction / inference

  • dataloader (torch.utils.data.DataLoader) – Dataloader to be used for prediction / inference

  • device (torch.device) – Device (CUDA / CPU) to be used for prediction / inference

  • pad_divisor (int, optional) – Divisor that the image dimensions are padded to be evenly divisible by; by default 1

Pruning

ezflow.engine.pruning.prune_l1_structured(model, layer_type, proportion)

L1 structured pruning

Parameters
  • model (torch.nn.Module) – The model to prune

  • layer_type (torch.nn.Module) – The layer type to prune

  • proportion (float) – The proportion of weights to prune
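
As a rough illustration of the idea behind L1 structured pruning, the sketch below removes whole rows of a weight matrix, ranked by the sum of absolute values in each row. It is a pure-Python toy, not ezflow's or PyTorch's implementation. The function name, the choice of rows as the pruned structure, and the rounding of the row count are all assumptions.

```python
def l1_structured_sketch(weight, proportion):
    """Zero the rows of `weight` that have the smallest L1 norm.

    `weight` is a list of rows; each row stands for one output channel.
    """
    n_prune = round(proportion * len(weight))
    norms = [sum(abs(w) for w in row) for row in weight]
    # Indices of the n_prune rows with the smallest L1 norm.
    drop = set(sorted(range(len(weight)), key=lambda i: norms[i])[:n_prune])
    return [[0.0] * len(row) if i in drop else list(row)
            for i, row in enumerate(weight)]


weight = [[0.1, -0.2], [1.0, 2.0], [-0.5, 0.5], [3.0, -1.0]]
pruned = l1_structured_sketch(weight, 0.5)
```

Here the two rows with the smallest L1 norms are zeroed as complete units, while the remaining rows are left untouched.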

ezflow.engine.pruning.prune_l1_unstructured(model, layer_type, proportion)

L1 unstructured pruning

Parameters
  • model (torch.nn.Module) – The model to prune

  • layer_type (torch.nn.Module) – The layer type to prune

  • proportion (float) – The proportion of weights to prune
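
Unstructured pruning, by contrast, ranks individual weights rather than whole rows or channels. A minimal pure-Python sketch of the idea follows. The function name and the rounding of the pruned count are assumptions, and the code is not the library's implementation.

```python
def l1_unstructured_sketch(weights, proportion):
    """Zero the `proportion` of entries in a flat weight list with the smallest |w|."""
    n_prune = round(proportion * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.05, -0.9, 0.3, -0.01, 0.7, 0.2]
pruned = l1_unstructured_sketch(weights, 0.5)
```

The three smallest-magnitude entries are zeroed regardless of where they sit, which is why the result is sparse but keeps its original shape.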

Profiler

class ezflow.engine.profiler.Profiler(model_name, log_dir, profile_cpu=False, profile_cuda=False, profile_memory=False, record_shapes=False, skip_first=0, wait=0, warmup=1, active=1, repeat=10)

Wrapper that initializes the parameters of the PyTorch profiler. An instance of this class can be passed as an argument to ezflow.engine.eval_model to enable profiling of the model during inference.

See the official documentation on torch.profiler.

Parameters
  • model_name (str) – Name of the model

  • log_dir (str) – Path to save the profiling logs

  • profile_cpu (bool, optional) – Enable CPU profiling, by default False

  • profile_cuda (bool, optional) – Enable CUDA profiling, by default False

  • profile_memory (bool, optional) – Enable memory profiling, by default False

  • record_shapes (bool, optional) – Enable shape recording for tensors, by default False

  • skip_first (int, optional) – Number of warmup iterations to skip, by default 0

  • wait (int, optional) – Number of steps to idle before the warmup phase of each profiling cycle, by default 0

  • warmup (int, optional) – Number of iterations to warmup the profiler, by default 1

  • active (int, optional) – Number of iterations to profile, by default 1

  • repeat (int, optional) – Number of times to repeat the profiling, by default 10
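
The interplay of skip_first, wait, warmup, active and repeat is easiest to see step by step. The sketch below assumes these arguments are forwarded unchanged to torch.profiler.schedule, which counts each of them in profiler steps. The function `profiler_phase` and its phase labels are hypothetical and only mimic that scheduling logic in plain Python.

```python
def profiler_phase(step, skip_first=0, wait=0, warmup=1, active=1, repeat=10):
    """Return 'skip', 'wait', 'warmup', 'active' or 'done' for a 0-based step."""
    if step < skip_first:
        return "skip"
    step -= skip_first
    cycle = wait + warmup + active
    # repeat=0 means the cycles continue for as long as steps are taken.
    if repeat > 0 and step >= repeat * cycle:
        return "done"
    pos = step % cycle
    if pos < wait:
        return "wait"
    if pos < wait + warmup:
        return "warmup"
    return "active"


# Defaults: each cycle is one warmup step followed by one recorded step.
default_phases = [profiler_phase(s) for s in range(4)]
custom_phases = [profiler_phase(s, skip_first=2, wait=1) for s in range(5)]
```

Under these assumptions the default settings record 10 steps in total, one per cycle, and stop after step 19.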

Retrieve

Adapted from Detectron2 (https://github.com/facebookresearch/detectron2)

ezflow.engine.retrieve.get_training_cfg(cfg_path=None, cfg_name=None, custom=True)

Parameters
  • cfg_path (str) – Path to the config file.

  • cfg_name (str) – Name of the config file.

  • custom (bool) – If True, the config file is assumed to be a custom config file. If False, the config file is assumed to be a standard config file present in ezflow/configs/trainers.

Returns

cfg – The config object

Return type

CfgNode
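
The role of the custom flag can be sketched as a small path-selection helper. This is an illustration only: `resolve_cfg_path`, the error handling, and the example file names are hypothetical. The actual function may locate packaged configs differently, and it returns a CfgNode rather than a path.

```python
from pathlib import PurePosixPath

# Directory named in the description of the `custom` parameter above.
STANDARD_CFG_DIR = PurePosixPath("ezflow/configs/trainers")


def resolve_cfg_path(cfg_path=None, cfg_name=None, custom=True):
    """Hypothetical helper: choose which config file would be read."""
    if custom:
        if cfg_path is None:
            raise ValueError("cfg_path is required when custom=True")
        return PurePosixPath(cfg_path)
    if cfg_name is None:
        raise ValueError("cfg_name is required when custom=False")
    return STANDARD_CFG_DIR / cfg_name


# File names below are placeholders, not configs shipped with the library.
custom_cfg = resolve_cfg_path(cfg_path="experiments/my_trainer.yaml")
standard_cfg = resolve_cfg_path(cfg_name="example.yaml", custom=False)
```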