Customize mixed algorithms
Here we show how to customize mixed algorithms with our algorithm components. We take AutoSlim as an example.
Note
Why is AutoSlim a mixed algorithm?
In AutoSlim, the sandwich rule and inplace distillation are introduced to enhance training; together they are called slimmable training. The sandwich rule means that each step trains the model at the smallest width, the largest width and (n − 2) random widths, instead of n random widths. Inplace distillation means that the predicted label of the model at the largest width is used as the training label for the other widths, while the largest width itself is trained with the ground truth. AutoSlim therefore uses both a KD algorithm and a pruning algorithm.
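The two rules above can be illustrated with a small, framework-free sketch. It is not MMRazor code: `sandwich_plan`, `training_target` and the candidate width list are hypothetical names chosen for illustration, and the "targets" are plain strings rather than tensors.

```python
import random
from typing import List, Tuple


def sandwich_plan(candidate_widths: List[float], n: int,
                  rng: random.Random) -> List[Tuple[str, float]]:
    """Return n (kind, width) pairs: max, min and n - 2 random widths."""
    if n < 2:
        raise ValueError('the sandwich rule needs at least 2 widths per step')
    # The largest width comes first so that its prediction is available
    # as the training label for every other width in the same step.
    plan = [('max', max(candidate_widths)), ('min', min(candidate_widths))]
    for i in range(n - 2):
        plan.append((f'random{i}', rng.choice(candidate_widths)))
    return plan


def training_target(kind: str) -> str:
    """Inplace distillation: only the largest width sees the ground truth."""
    return 'ground_truth' if kind == 'max' else 'max_width_prediction'


candidates = [0.25, 0.5, 0.75, 1.0]
plan = sandwich_plan(candidates, n=4, rng=random.Random(0))
targets = [training_target(kind) for kind, _ in plan]
```

Choosing the target by kind rather than by width means a random sample that happens to hit the largest width is still distilled from the `max` pass.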
Register a new algorithm
Create a new file mmrazor/models/algorithms/nas/autoslim.py in which class AutoSlim inherits from class BaseAlgorithm. Because AutoSlim is a mixed algorithm, you need to build both the KD algorithm component (distiller) and the pruning algorithm component (mutator).
Note
You can also inherit from an existing algorithm instead of BaseAlgorithm if your algorithm is similar to it.
Note
You can choose existing algorithm components in MMRazor, such as the OneShotChannelMutator and ConfigurableDistiller used in AutoSlim.
If the components in MMRazor don't meet your needs, you can customize new algorithm components for your algorithm. See the following references:
Customize NAS algorithms, Customize Pruning algorithms, Customize KD algorithms
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Dict, List, Optional, Union

import torch
from mmengine.model import BaseModel
from torch import nn

from mmrazor.models.distillers import ConfigurableDistiller
from mmrazor.models.mutators import OneShotChannelMutator
from mmrazor.registry import MODELS
from ..base import BaseAlgorithm

VALID_MUTATOR_TYPE = Union[OneShotChannelMutator, Dict]
VALID_DISTILLER_TYPE = Union[ConfigurableDistiller, Dict]


@MODELS.register_module()
class AutoSlim(BaseAlgorithm):

    def __init__(self,
                 mutator: VALID_MUTATOR_TYPE,
                 distiller: VALID_DISTILLER_TYPE,
                 architecture: Union[BaseModel, Dict],
                 data_preprocessor: Optional[Union[Dict, nn.Module]] = None,
                 num_random_samples: int = 2,
                 init_cfg: Optional[Dict] = None) -> None:
        super().__init__(architecture, data_preprocessor, init_cfg)

        self.mutator = self._build_mutator(mutator)
        # `prepare_from_supernet` must be called before the distiller is
        # initialized
        self.mutator.prepare_from_supernet(self.architecture)

        # The same supernet acts as both teacher and student.
        self.distiller = self._build_distiller(distiller)
        self.distiller.prepare_from_teacher(self.architecture)
        self.distiller.prepare_from_student(self.architecture)

        ......

    def _build_mutator(self,
                       mutator: VALID_MUTATOR_TYPE) -> OneShotChannelMutator:
        """Build mutator."""
        if isinstance(mutator, dict):
            mutator = MODELS.build(mutator)
        if not isinstance(mutator, OneShotChannelMutator):
            raise TypeError('mutator should be a `dict` or '
                            '`OneShotChannelMutator` instance, but got '
                            f'{type(mutator)}')
        return mutator

    def _build_distiller(
            self, distiller: VALID_DISTILLER_TYPE) -> ConfigurableDistiller:
        """Build distiller."""
        if isinstance(distiller, dict):
            distiller = MODELS.build(distiller)
        if not isinstance(distiller, ConfigurableDistiller):
            raise TypeError('distiller should be a `dict` or '
                            '`ConfigurableDistiller` instance, but got '
                            f'{type(distiller)}')
        return distiller
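Both `_build_mutator` and `_build_distiller` follow the same "config dict or ready-made instance" pattern. The sketch below reproduces that pattern without MMRazor so it can be run on its own. `ToyRegistry`, `ToyMutator` and `build_component` are hypothetical stand-ins written for illustration; they are not the real `MODELS` registry and only mimic the behaviour visible in the code above.

```python
from typing import Any, Callable, Dict


class ToyRegistry:
    """Hypothetical, minimal stand-in for a registry such as `MODELS`."""

    def __init__(self) -> None:
        self._classes: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._classes[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        kwargs = dict(cfg)  # copy so the caller's config is not mutated
        cls = self._classes[kwargs.pop('type')]
        return cls(**kwargs)


TOY_MODELS = ToyRegistry()


@TOY_MODELS.register_module()
class ToyMutator:
    def __init__(self, num_choices: int = 4) -> None:
        self.num_choices = num_choices


def build_component(component: Any, expected: type) -> Any:
    """Accept a config dict or an instance and always return an instance."""
    if isinstance(component, dict):
        component = TOY_MODELS.build(component)
    if not isinstance(component, expected):
        raise TypeError(f'expected a `dict` or `{expected.__name__}` '
                        f'instance, but got {type(component)}')
    return component


from_cfg = build_component(dict(type='ToyMutator', num_choices=8), ToyMutator)
from_obj = build_component(ToyMutator(), ToyMutator)
```

Checking the type after building, rather than before, catches both a wrong instance and a config whose `type` resolves to an unexpected class.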
Implement the core logic in train_step
In train_step, both the mutator and the distiller play an important role. For example, sample_subnet, set_max_subnet and set_min_subnet are supported by the mutator, and distill_step is mainly implemented by the distiller.
# Additional imports used by this excerpt
from mmengine.optim import OptimWrapper
from mmengine.structures import BaseDataElement

from mmrazor.models.utils import add_prefix


@MODELS.register_module()
class AutoSlim(BaseAlgorithm):

    ......

    def train_step(self, data: List[dict],
                   optim_wrapper: OptimWrapper) -> Dict[str, torch.Tensor]:

        def distill_step(
            batch_inputs: torch.Tensor, data_samples: List[BaseDataElement]
        ) -> Dict[str, torch.Tensor]:
            ......

        ......

        batch_inputs, data_samples = self.data_preprocessor(data, True)

        total_losses = dict()
        for kind in self.sample_kinds:
            # update the max subnet loss.
            if kind == 'max':
                self.set_max_subnet()
                ......
                total_losses.update(add_prefix(max_subnet_losses, 'max_subnet'))
            # update the min subnet loss.
            elif kind == 'min':
                self.set_min_subnet()
                min_subnet_losses = distill_step(batch_inputs, data_samples)
                total_losses.update(add_prefix(min_subnet_losses, 'min_subnet'))
            # update the random subnets loss.
            elif 'random' in kind:
                self.set_subnet(self.sample_subnet())
                random_subnet_losses = distill_step(batch_inputs, data_samples)
                total_losses.update(
                    add_prefix(random_subnet_losses, f'{kind}_subnet'))

        return total_losses
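The loop above relies on two pieces that the excerpt does not show: how `self.sample_kinds` is populated and what `add_prefix` does to the loss keys. The following standalone sketch shows one plausible reading. `make_sample_kinds`, `add_prefix_sketch` and `collect_losses` are hypothetical helpers, and both the `random{i}` naming and the `.` separator are assumptions rather than behaviour confirmed by the text.

```python
from typing import Callable, Dict, List


def make_sample_kinds(num_random_samples: int) -> List[str]:
    """Assumed layout: 'max', 'min', then one entry per random subnet."""
    return ['max', 'min'] + [f'random{i}' for i in range(num_random_samples)]


def add_prefix_sketch(losses: Dict[str, float],
                      prefix: str) -> Dict[str, float]:
    """Prefix every key so that losses of different subnets do not collide."""
    return {f'{prefix}.{name}': value for name, value in losses.items()}


def collect_losses(
        sample_kinds: List[str],
        loss_fn: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Mirror the accumulation pattern of the loop: one update per kind."""
    total_losses: Dict[str, float] = {}
    for kind in sample_kinds:
        total_losses.update(add_prefix_sketch(loss_fn(kind), f'{kind}_subnet'))
    return total_losses


def fake_loss(kind: str) -> Dict[str, float]:
    """Dummy loss that depends only on the kind name, for demonstration."""
    return {'loss': float(len(kind))}


total = collect_losses(make_sample_kinds(2), fake_loss)
```

Without the prefix, every subnet would write to the same `loss` key and `dict.update` would keep only the last one.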
Import the class
You can either add the following lines to mmrazor/models/algorithms/nas/__init__.py
from .autoslim import AutoSlim
__all__ = ['AutoSlim']
or alternatively add
custom_imports = dict(
    imports=['mmrazor.models.algorithms.nas.autoslim'],
    allow_failed_imports=False)
to the config file to avoid modifying the original code.
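Either route works for the same reason: importing the module executes the `@MODELS.register_module()` decorator, which is what makes the class visible to the registry. As a rough illustration of what a `custom_imports` entry asks for, the sketch below imports a list of module names with the standard library. `import_custom_modules` is a hypothetical helper, not the framework's implementation, and the module names are standard-library placeholders.

```python
import importlib
from types import ModuleType
from typing import List


def import_custom_modules(
        imports: List[str],
        allow_failed_imports: bool = False) -> List[ModuleType]:
    """Import each named module; optionally skip the ones that fail."""
    loaded = []
    for name in imports:
        try:
            # Importing runs module-level code, including class decorators.
            loaded.append(importlib.import_module(name))
        except ImportError:
            if not allow_failed_imports:
                raise
    return loaded


modules = import_custom_modules(['json', 'collections.abc'])
skipped = import_custom_modules(['no_such_module_for_demo'],
                                allow_failed_imports=True)
```

Keeping `allow_failed_imports=False`, as in the config above, turns a misspelled module path into an immediate error instead of a later "not in the registry" failure.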
Use the algorithm in your config file
model = dict(
    type='mmrazor.AutoSlim',
    architecture=...,
    mutator=dict(
        type='OneShotChannelMutator',
        ...),
    distiller=dict(
        type='ConfigurableDistiller',
        ...),
    ...)