brainspy.algorithms package

Module contents

This package provides default algorithms for brains-py that already include several features specific to dopant networks. There are two main families of algorithms: genetic algorithm and gradient descent. Each can be run by importing and calling its corresponding ‘train’ function. For general-purpose use, the corresponding train function can be loaded from brainspy.utils.manager. For more advanced use cases, implementing a custom algorithm is recommended. Check the wiki for more information.

Submodules

brainspy.algorithms.ga module

File containing the genetic algorithm methods and the optimizer class for training a single DNPU.

class brainspy.algorithms.ga.GeneticOptimizer(gene_ranges: list, partition: list | Tensor, epochs: int, alpha: float = 0.6, beta: float = 0.4)

Bases: object

A genetic algorithm optimiser for training DNPUs on and off chip, with an interface that resembles a PyTorch optimizer.

crossover(new_pool: Tensor, chosen: list)

In genetic algorithms and evolutionary computation, crossover, also called recombination, is a genetic operator used to combine the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population, and is analogous to the crossover that happens during reproduction in biology.

Parameters:
  • new_pool (torch.Tensor) – A copy of the genome pool containing the set of all control voltage solutions ordered by fitness performance.

  • chosen (list) – Parents that have been selected as potentially good solutions.

Returns:

new_pool – A genome pool containing the set of all control voltage solutions obtained by applying the crossover method to the solutions with the highest fitness scores.

Return type:

torch.Tensor

crossover_blxab(parent1: Tensor, parent2: Tensor)

Creates a new offspring genome (voltage combination) from two parents using blend alpha-beta crossover (BLX-α-β). For each gene, a random value is selected from the interval between the two alleles of the parent solutions. This interval is extended towards the solution with better fitness by the factor alpha, and towards the solution with worse fitness by the factor beta. Here, parent1 has a higher (or equal) fitness than parent2.

Parameters:
  • parent1 (torch.Tensor) – Set of control voltages corresponding to a particular solution that has higher or equal fitness than parent2.

  • parent2 (torch.Tensor) – Set of control voltages corresponding to a particular solution that has lower or equal fitness than parent1.

Returns:

New genome (voltage combination) from two parent control voltages.

Return type:

torch.Tensor
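The interval construction described above can be sketched in plain Python as follows. This is a minimal sketch, not the brains-py implementation: it uses lists instead of torch.Tensor, assumes the interval is widened by alpha and beta times the distance between the two alleles (the usual BLX-α-β definition), uses the GeneticOptimizer defaults alpha = 0.6 and beta = 0.4, and does not clip the result to the gene ranges. The function name blx_alpha_beta is hypothetical.

```python
import random


def blx_alpha_beta(parent1, parent2, alpha=0.6, beta=0.4, rng=random):
    """Hypothetical BLX-alpha-beta sketch; parent1 is the fitter parent."""
    child = []
    for g1, g2 in zip(parent1, parent2):
        d = abs(g1 - g2)  # distance between the two alleles
        if g1 <= g2:
            # The better parent sits at the lower end: widen downwards by alpha
            low, high = g1 - alpha * d, g2 + beta * d
        else:
            # The better parent sits at the upper end: widen upwards by alpha
            low, high = g2 - beta * d, g1 + alpha * d
        # Draw the child gene uniformly from the widened interval
        child.append(rng.uniform(low, high))
    return child


random.seed(0)
child = blx_alpha_beta([0.5, -0.2], [0.1, 0.3])
```

For the first gene (0.5 against 0.1, distance 0.4) the child is drawn from [-0.06, 0.74]: the interval is widened more on the side of parent1, which biases offspring towards the fitter solution while still allowing some exploration beyond both parents.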

linear_rank()

Linear ranking scheme used by the stochastic universal sampling method.

Returns:

Tensor with the probability of each genome being chosen. The first probability corresponds to the genome with the highest fitness, and so on.

Return type:

torch.Tensor
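One common form of linear ranking (Baker's scheme) can be sketched as below. The selection-pressure parameter and its default of 1.5 are assumptions for illustration; the docstring does not state which ranking constants linear_rank uses. Index 0 is the fittest genome, matching the ordering described above, and the probabilities decrease linearly and sum to 1. The function name is hypothetical.

```python
def linear_rank_probabilities(pool_size, pressure=1.5):
    """Hypothetical linear-ranking sketch.

    pressure (between 1 and 2) is the expected number of offspring of the
    best genome; the worst genome gets 2 - pressure.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be positive")
    if not 1.0 <= pressure <= 2.0:
        raise ValueError("pressure must lie in [1, 2]")
    if pool_size == 1:
        return [1.0]
    probs = []
    for rank in range(pool_size):
        # Linear interpolation from `pressure` (best) to `2 - pressure` (worst)
        expected = pressure - 2.0 * (pressure - 1.0) * rank / (pool_size - 1)
        probs.append(expected / pool_size)
    return probs


probs = linear_rank_probabilities(5)
```

For a pool of five genomes this gives probabilities of about 0.3, 0.25, 0.2, 0.15 and 0.1. Ranking by position rather than by raw fitness keeps the selection pressure stable even when fitness values are very close together or very far apart.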

mutation(pool: Tensor)

Mutates all genomes except the first partition[0] ones, using a triangular distribution over each gene range with the mode set to the value of the gene being mutated.

Parameters:

pool (torch.Tensor) – Genome pool containing the set of all control voltage solutions ordered by fitness performance.

Returns:

pool – Genome pool containing a new mutated set of all control voltage solutions based on the best performing solutions.

Return type:

torch.Tensor
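The mutation described above can be sketched with Python's random.triangular, which takes low, high and mode arguments. This is a minimal sketch under stated assumptions: plain lists replace torch.Tensor, n_keep plays the role of partition[0], gene_ranges is assumed to be a list of (low, high) pairs, and each gene is assumed to be mutated independently with probability mutation_rate (the docstring does not say how the mutation rate is applied). The function name mutate_pool is hypothetical.

```python
import random


def mutate_pool(pool, gene_ranges, n_keep, mutation_rate, rng=random):
    """Hypothetical mutation sketch; pool is sorted from best to worst."""
    # The first n_keep genomes (the best ones) are copied unchanged
    new_pool = [list(genome) for genome in pool[:n_keep]]
    for genome in pool[n_keep:]:
        mutated = []
        for gene, (low, high) in zip(genome, gene_ranges):
            if rng.random() < mutation_rate:
                # Triangular distribution over the gene range, peaked at the current gene
                mutated.append(rng.triangular(low, high, gene))
            else:
                mutated.append(gene)
        new_pool.append(mutated)
    return new_pool


random.seed(1)
ranges = [(-1.2, 0.6), (-0.7, 0.3)]
pool = [[0.1, 0.0], [0.4, -0.5], [-1.0, 0.2]]
new_pool = mutate_pool(pool, ranges, n_keep=1, mutation_rate=0.5)
```

Using the current gene as the mode keeps most mutations close to the existing solution, while the bounded range guarantees that mutated control voltages never leave their electrode ranges.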

remove_duplicates(pool: Tensor)

Checks the entire pool for duplicate genomes and replaces each duplicate with a new genome obtained by putting it through a triangular distribution.

Parameters:

pool (torch.Tensor) – Genome pool containing a new mutated set of all control voltage solutions based on the best performing solutions.

Returns:

pool – Genome pool containing a new mutated set of all control voltage solutions based on the best performing solutions without any repeated solution.

Return type:

torch.Tensor
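A minimal sketch of this check in plain Python follows. It assumes genomes are compared by exact equality, that gene_ranges is a list of (low, high) pairs, and that a duplicate is resampled gene by gene from a triangular distribution whose mode is the duplicated value. The function signature and the max_tries guard (which avoids looping forever on degenerate ranges) are hypothetical additions.

```python
import random


def remove_duplicates(pool, gene_ranges, rng=random, max_tries=100):
    """Hypothetical sketch: replace repeated genomes by perturbed copies."""
    seen = set()
    unique_pool = []
    for genome in pool:
        candidate = list(genome)
        tries = 0
        while tuple(candidate) in seen:
            if tries == max_tries:
                raise RuntimeError("could not generate a unique genome")
            # Resample every gene from a triangular distribution peaked at its value
            candidate = [rng.triangular(low, high, gene)
                         for gene, (low, high) in zip(candidate, gene_ranges)]
            tries += 1
        seen.add(tuple(candidate))
        unique_pool.append(candidate)
    return unique_pool


random.seed(2)
ranges = [(-1.2, 0.6), (-0.7, 0.3)]
pool = [[0.1, 0.0], [0.1, 0.0], [0.4, -0.5]]
unique = remove_duplicates(pool, ranges)
```

Removing duplicates keeps the population diverse: identical genomes would otherwise waste evaluations on the DNPU and increase the chance of premature convergence.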

step(criterion_pool: Tensor)

Performs an epoch step that produces a new generation of solutions. First, it sorts the genome pool by fitness. Then, it generates offspring from the best solutions by applying blend alpha-beta crossover (BLX-α-β). Finally, it mutates every genome except those that the partition variable specifies should not be updated.

Parameters:

criterion_pool (torch.Tensor) – A pool storing the results of the criterion (fitness function), in the same order as the genome pool containing all solutions. It is used to sort the pool solutions by fitness.

Returns:

pool – Genome pool containing the set of all control voltage solutions ordered by fitness performance.

Return type:

torch.Tensor
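The first stage of step, sorting the genome pool by the scores in criterion_pool, can be sketched as follows. It assumes that a higher criterion score means a better solution (consistent with the stop_threshold behaviour of train) and uses plain lists instead of tensors; the function name is hypothetical.

```python
def sort_pool_by_fitness(pool, criterion_pool):
    """Hypothetical sketch: reorder genomes from highest to lowest fitness."""
    if len(pool) != len(criterion_pool):
        raise ValueError("pool and criterion_pool must have the same length")
    # Indices of the genomes, ordered by descending criterion score
    order = sorted(range(len(pool)), key=lambda i: criterion_pool[i], reverse=True)
    return [pool[i] for i in order], [criterion_pool[i] for i in order]


pool, scores = sort_pool_by_fitness([[0.0], [1.0], [2.0]], [0.2, 0.9, 0.5])
```

Sorting indices rather than the pool itself keeps each genome aligned with its score, so the later crossover and mutation stages can rely on position 0 being the fittest solution.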

universal_sampling()

Selects potentially useful solutions for crossover using stochastic universal sampling, a selection technique used in genetic algorithms. More information can be found at: https://en.wikipedia.org/wiki/Stochastic_universal_sampling#cite_note-baker-1

Returns:

The chosen parents. The length of the list is len(self.fitness), which equals len(self.pool).

Return type:

list
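A minimal sketch of stochastic universal sampling in plain Python is shown below. It takes selection probabilities such as those produced by linear_rank, places n_select evenly spaced pointers after a single random offset, and returns the selected indices. The function name and the choice of returning indices are assumptions for illustration.

```python
import random


def stochastic_universal_sampling(probabilities, n_select, rng=random):
    """Select n_select indices with evenly spaced pointers (SUS)."""
    total = sum(probabilities)
    step = total / n_select  # distance between consecutive pointers
    start = rng.uniform(0.0, step)  # single random offset shared by all pointers
    chosen = []
    index = 0
    cumulative = probabilities[0]
    for k in range(n_select):
        pointer = start + k * step
        # Advance to the genome whose cumulative slot contains the pointer
        while pointer >= cumulative and index < len(probabilities) - 1:
            index += 1
            cumulative += probabilities[index]
        chosen.append(index)
    return chosen


random.seed(3)
chosen = stochastic_universal_sampling([0.5, 0.25, 0.25], 4)
```

Because all pointers share one random offset, a genome with probability p is selected either floor(n_select * p) or ceil(n_select * p) times, which gives lower variance than repeated roulette-wheel spins. Calling it with n_select = len(pool) yields one parent per pool slot, matching the returned length described above.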

update_mutation_rate()

Dynamic parameter control of mutation rate. This method updates the mutation rate based on the generation counter.

Returns:

Mutation rate parameter.

Return type:

float

brainspy.algorithms.ga.evaluate_population(inputs: Tensor, targets: Tensor, pool: Tensor, model: Module, criterion)

Evaluates, on the DNPU model or hardware, the fitness of every solution in a given genome pool. The pool contains all candidate control voltage solutions of a genetic algorithm.

Parameters:
  • inputs (torch.Tensor) – The whole dataset of inputs in a single batch.

  • targets (torch.Tensor) – The whole dataset of target values in a single batch.

  • pool (torch.Tensor) – Array of different control voltage values that are going to be evaluated. The array has a shape of (pool_size, control_electrode_no).

  • model (torch.nn.Module) – Model against which all the solutions will be measured. It can be a Processor, representing either a hardware DNPU or a DNPU surrogate model. Refer to the documentation of the train function to see how a model can be defined.

  • criterion (<method>) – Fitness function that will be used to train the model.

Returns:

  • outputs_pool (torch.Tensor) – All the outputs from all the measurements of the models.

  • criterion_pool (torch.Tensor) – Scores of the criterion fitness function used in the algorithm, one per solution. They can be used to rank the solutions by score.
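The evaluation loop can be sketched in plain Python as follows. In brains-py the control voltages are applied to the DNPU model or hardware rather than passed in, so passing genome as an explicit argument to model, as well as toy_model and negative_mse, are hypothetical stand-ins used only to make the sketch self-contained.

```python
def evaluate_population(inputs, targets, pool, model, criterion):
    """Hypothetical sketch: score every genome in the pool."""
    outputs_pool, criterion_pool = [], []
    for genome in pool:
        # Assumption: the model receives the control voltages as an explicit argument
        outputs = model(inputs, genome)
        outputs_pool.append(outputs)
        criterion_pool.append(criterion(outputs, targets))
    return outputs_pool, criterion_pool


def toy_model(xs, genome):
    # Stand-in for a DNPU: a line whose slope and offset act as "control voltages"
    return [genome[0] * x + genome[1] for x in xs]


def negative_mse(outputs, targets):
    # Higher is better, as expected for a GA fitness function
    return -sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


outputs_pool, criterion_pool = evaluate_population(
    [0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [[2.0, 1.0], [0.0, 0.0]], toy_model, negative_mse
)
```

Here the genome [2.0, 1.0] reproduces the targets exactly and scores 0, while [0.0, 0.0] scores -35/3, so criterion_pool can be used directly to rank the solutions.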

brainspy.algorithms.ga.train(model: Module, dataloaders: list, criterion, optimizer, configs: dict, save_dir: str | None = None, average_plateaus: bool = False)

Main training loop for the genetic algorithm. It supports training a single DNPU, either on chip (hardware) or off chip (simulation). It only supports a training dataset (not a validation one). More information on genetic algorithms can be found at: https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3

Parameters:
  • model (torch.nn.Module) – Model to be trained. Note that the model cannot be an instance of SurrogateModel, HardwareProcessor or Processor; it can only consist of a single DNPU instance.

  • dataloaders (list) – A list containing a single PyTorch DataLoader with the training dataset. More information about dataloaders can be found at: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

  • criterion (<method>) – Fitness function that will be used to train the model.

  • optimizer (GeneticOptimizer) – Optimization method for sorting the genome pool by fitness and creating new offspring based on the best resulting genomes.

  • configs (dict) – A dictionary containing extra configurations for the algorithm:

    stop_threshold : float When the criterion fitness function reaches the specified threshold, or a higher value, the algorithm will stop.

  • save_dir (Optional[str]) – Folder where the trained model is going to be saved. When None, the model will not be saved. By default None.

Returns:

  • model (torch.nn.Module) – Trained model with best results according to the criterion fitness function.

  • training_data (dict) – Dictionary returning relevant data produced while training the model.

Notes

A) After the end of the last epoch, the algorithm saves two main files: model_raw.pt: This file is only saved when the model is not hardware (simulation). It is an exact copy of the model after the end of the training process, and it can be loaded directly as an instance of the model using: my_model_instance = torch.load(‘model_raw.pt’). training_data.pickle: A PyTorch pickle file which contains the following keys:

epochs: int Number of epochs used for training the model

algorithm: str Algorithm type that was being used. Either ‘genetic’ or ‘gradient’.

model_state_dict: OrderedDict It contains the value of the learnable parameters (weights, or in this case, control voltages) at the point where all the training was finished.

performance: list A list of the fitness function performance over all epochs

correlations: list A list of the correlation over all epochs

genome_history: list A list of the genomes that were used in each epoch

B) Each time the fitness performance is better than in previous epochs, the following files are saved: best_model_raw.pt: This file is only saved when the model is not hardware (simulation). It is an exact copy of the model when it achieved the best fitness, and it can be loaded directly as an instance of the model using: my_model_instance_at_best_fitness = torch.load(‘best_model_raw.pt’). best_training_data.pickle: A PyTorch pickle file which contains the following keys:

epoch: int Epoch at which the model with the best fitness was found.

algorithm: str Algorithm type that was being used. Either ‘genetic’ or ‘gradient’.

model_state_dict: OrderedDict It contains the value of the learnable parameters (weights, or in this case, control voltages) at the point where the best fitness was achieved.

best_fitness: float Best training fitness achieved.

correlation: float Correlation achieved at the epoch with the best fitness.
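The overall structure of such a loop, including the stop_threshold check and the bookkeeping behind the files listed in the notes, can be sketched as below. This is a simplified, hypothetical outline rather than the brains-py code: evaluate and step stand in for evaluate_population and GeneticOptimizer.step, a higher score is assumed to be better, and nothing is written to disk.

```python
def ga_train_loop(pool, epochs, evaluate, step, stop_threshold):
    """Hypothetical sketch of a GA training loop with a stop threshold."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    performance = []
    best_fitness, best_genome, best_epoch = float("-inf"), None, None
    for epoch in range(epochs):
        scores = evaluate(pool)
        top = max(range(len(pool)), key=lambda i: scores[i])
        performance.append(scores[top])
        if scores[top] > best_fitness:
            # Corresponds to the point where the best_* files would be saved
            best_fitness, best_genome, best_epoch = scores[top], list(pool[top]), epoch
        if best_fitness >= stop_threshold:
            break  # fitness reached configs["stop_threshold"]
        pool = step(pool, scores)  # new generation (sort, crossover, mutation)
    training_data = {"epochs": epoch + 1, "performance": performance}
    best = {"epoch": best_epoch, "best_fitness": best_fitness}
    return best_genome, training_data, best


best_genome, training_data, best = ga_train_loop(
    pool=[[0.0], [1.0]],
    epochs=10,
    evaluate=lambda pool: [genome[0] for genome in pool],
    step=lambda pool, scores: [[genome[0] + 1.0] for genome in pool],
    stop_threshold=3.0,
)
```

In this toy run every generation improves the best score by 1, so the loop stops after three epochs, once the best fitness reaches the threshold of 3.0, instead of running all ten.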

brainspy.algorithms.gd module

File containing gradient descent algorithm methods adapted for DNPU classes, and for custom torch.nn.Module subclasses that contain DNPU classes or DNPU-based modules from brainspy.processors.modules.

brainspy.algorithms.gd.default_train_step(model, epoch, dataloader, criterion, optimizer, logger=None, constraint_control_voltages=None)

Default training step for training a torch model with gradient descent. The method calculates the training loss in each training step, which indicates how well the model is fitting the training data.

More information about training loss can be found at https://www.baeldung.com/cs/learning-curve-ml

The method returns the trained model and the running loss of that step, which is used to calculate the training loss.

Parameters:
  • model (torch.nn.Module) – The model to be trained. It should be an instance of a torch.nn.Module. It can be a Processor, representing a hardware DNPU or a DNPU model, but it can also be a model with a more complex architecture that uses several processors. Refer to the documentation of the train function for more information about defining a model.

  • epoch (int) – Current epoch number (an epoch is one pass through the entire training dataset).

  • dataloader (torch.utils.data.DataLoader) – A PyTorch DataLoader corresponding to the training dataset. More information about dataloaders can be found at: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

  • criterion (Object <method>) – Loss function criterion that will be used to optimise the model. More information on several loss functions supported can be found at: https://pytorch.org/docs/stable/nn.html#loss-functions

  • optimizer (torch.optim.Optimizer) – Optimisation algorithm to be used during the training process. More on Pytorch’s optimizer package can be found at: https://pytorch.org/docs/stable/optim.html

  • logger (logging (optional)) –

    It provides a way for applications to configure different log handlers. By default None. The logger should be an already initialised object that contains a method called ‘log_output’, whose input is a single numpy array variable. It can be any class, and the data can be handled in whatever way the user wants. More information about loggers can be found at https://pytorch.org/docs/stable/tensorboard.html

    Logger directory info:

    log_train_step: to log each step in the training process

  • constraint_control_voltages (str, optional) –

    When training models, it is typically desirable for the control voltages to stay within the ranges in which the model was trained, in order to avoid extrapolating or reaching the clipping values. By default None. This str key can have the following values:

    1. ‘regul’ : It applies a penalty to the loss function when control voltages go outside the ranges in which the model was trained. This allows some flexibility, making it possible to find solutions that are, in some cases, slightly outside the control voltage ranges. To use it, the model must also have a method called ‘regularizer’, which computes that penalty. An example can be found at: brainspy.processors.dnpu, inside the class DNPU, method regularizer.

    2. ‘clip’ : It applies clipping after the backward pass and optimiser step, ensuring that the control voltages stay within the ranges in which the model was trained. To use it, the model should have a method called ‘constraint_weights’. An example can be found at: brainspy.processors.dnpu, inside the class DNPU, method constraint_weights.

Returns:

  • model (torch.nn.Module) – Trained model with best results according to the criterion fitness function.

  • running loss (int) – Accumulated loss of the step, used to assess the training loss, i.e. how far the model's predictions are from the actual targets.
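The difference between the two constraint_control_voltages options can be illustrated with a one-parameter toy problem in plain Python. This is a hypothetical sketch, not brains-py code: the scalar w stands in for a control voltage, a linear distance-outside-the-range penalty replaces a DNPU regularizer method, and the gradient is computed by hand instead of by autograd.

```python
def train_step(w, batches, lr, w_range, constraint=None):
    """Hypothetical sketch: one pass over the batches for a scalar parameter w."""
    low, high = w_range
    running_loss = 0.0
    for xs, ys in batches:
        n = len(xs)
        preds = [w * x for x in xs]
        # Mean squared error and its derivative with respect to w
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        if constraint == "regul":
            # Penalty grows linearly with the distance outside [low, high]
            loss += max(0.0, w - high) + max(0.0, low - w)
            grad += 1.0 if w > high else (-1.0 if w < low else 0.0)
        w -= lr * grad  # plain gradient-descent update (the optimiser step)
        if constraint == "clip":
            # Clip after the optimiser step so that w never leaves [low, high]
            w = min(max(w, low), high)
        running_loss += loss
    return w, running_loss


batches = [([1.0, 2.0], [2.0, 4.0])]  # one batch generated by y = 2x
results = {}
for mode in (None, "regul", "clip"):
    w = 0.0
    for _epoch in range(60):
        w, _running_loss = train_step(w, batches, lr=0.1, w_range=(-1.0, 1.0), constraint=mode)
    results[mode] = w
```

With a range of [-1, 1], unconstrained training converges to w ≈ 2.0, ‘regul’ settles at about 1.8 (where the data gradient balances the penalty, so the solution is allowed to lie slightly outside the range), and ‘clip’ keeps w at exactly 1.0.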

brainspy.algorithms.gd.default_val_step(epoch, model, dataloader, criterion, logger=None)

Calculates the validation loss during gradient descent training. The validation loss indicates how well the model fits unseen data. More information about validation loss and training loss can be found at https://www.baeldung.com/cs/learning-curve-ml

Parameters:
  • epoch (int) – Current epoch number (an epoch is one pass through the entire training dataset).

  • model (torch.nn.Module) – The model to be evaluated. It should be an instance of a torch.nn.Module. It can be a Processor, representing a hardware DNPU or a DNPU model, but it can also be a model with a more complex architecture that uses several processors. Refer to the documentation of the train function for more information about defining a model.

  • dataloader (torch.utils.data.DataLoader) – A PyTorch DataLoader corresponding to the validation dataset. More information about dataloaders can be found at: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

  • criterion (Object <method>) – Loss function criterion that will be used to optimise the model. More information on several loss functions supported can be found at: https://pytorch.org/docs/stable/nn.html#loss-functions

  • logger (logging (optional)) –

    It provides a way for applications to configure different log handlers. By default None. The logger should be an already initialised object that contains a method called ‘log_output’, whose input is a single numpy array variable. It can be any class, and the data can be handled in whatever way the user wants. More information about loggers can be found at https://pytorch.org/docs/stable/tensorboard.html

    Logger directory info:

    log_val_step: to log each step in the validation process

Returns:

val_loss – Validation loss, used to assess how well the model fits new data. It is the sum of the errors made for each example in the validation set.

Return type:

int

brainspy.algorithms.gd.train(model: Module, dataloaders: list, criterion, optimizer: Optimizer, configs: dict, logger=None, save_dir: str | None = None, return_best_model: bool = True)

Main training loop for off-chip gradient descent training with early stopping using PyTorch. It is a default training loop for simple training tasks, but its code can be used as a reference for implementing a training loop for more specific or complex tasks.

Parameters:
  • model (torch.nn.Module) –

    The model to be trained. It should be an instance of a torch.nn.Module. It can be a Processor, representing a hardware DNPU or a DNPU model, but it can also be a model with a more complex architecture that uses several processors.

    Note that the model can be a custom model (child of torch.nn.Module) containing multiple DNPU instances, but the model cannot be an instance of SurrogateModel or HardwareProcessor. If the model is a custom model, it should have the following methods implemented:

    format_targets : The hardware processor uses a waveform to represent points (see 5.1 in Introduction of the Wiki). Each point is represented by some slope points and some plateau points. As a result, the output of the device has a different length (in points) than the input. This function extends the targets so that they have the same length as the outputs: it simply repeats each point as many times as there are points in the plateau. In this way, targets can be compared against hardware outputs in the loss function. This function should take the following input: x (torch.Tensor), the targets of the supervised learning problem, which will be extended to have the same length as the outputs from the processor.

    regularizer : When the constraint_control_voltages parameter is set to “regul”, the result of the custom regularizer method will be added to the loss function. It is used to penalise the loss function when the control voltages found are outside the control electrode ranges. The developer decides how this value is computed. Each DNPU class contains a regularizer method that returns how far the current control voltages of the DNPU are outside the control electrode ranges. In a custom model, the custom regularizer can be composed by calling the regularizer method of each instantiated DNPU. This method only needs to be implemented in a custom model if constraint_control_voltages = “regul” in the configs. An example can be found at: brainspy.processors.dnpu, inside the class DNPU.

    constraint_weights : When the constraint_control_voltages parameter is set to “clip”, the trainer will call this method to clip the current control voltages if they are outside the control electrode ranges to which they correspond. Each DNPU class contains a clip method (constraint_weights) that clips the current control voltages in this way. This method only needs to be implemented in a custom model if constraint_control_voltages = “clip” in the configs.

  • dataloaders (list) –

    A list containing one or two PyTorch dataloaders. The first dataloader corresponds to the training dataset. The second dataloader is optional and corresponds to the validation dataset. If no validation dataset is given, the training loop will train the model and return it only after reaching the last epoch. If a validation dataset is given, only the models that improve on the lowest validation loss so far will be saved. It is recommended to keep an additional, separate test dataset to evaluate the model after training it with a validation dataset.

    More information about dataloaders can be found at: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

  • criterion (Object <method>) – Loss function criterion that will be used to optimise the model. More information on several loss functions supported can be found at: https://pytorch.org/docs/stable/nn.html#loss-functions

  • optimizer (torch.optim.Optimizer) – Optimisation algorithm to be used during the training process. More on Pytorch’s optimizer package can be found at: https://pytorch.org/docs/stable/optim.html

  • configs (dict) –

    Dictionary containing the following extra configuration keys:

    epochs : int Number of passes through the entire training dataset.

    constraint_control_voltages : str When training models, it is typically desirable for the control voltages to stay within the ranges in which the model was trained, in order to avoid extrapolating or reaching the clipping values. This str key can have the following values:

    1. ‘regul’ : It applies a penalty to the loss function when control voltages go outside the ranges in which the model was trained. This allows some flexibility, making it possible to find solutions that are, in some cases, slightly outside the control voltage ranges. To use it, the model must also have a method called ‘regularizer’, which computes that penalty. An example can be found at: brainspy.processors.dnpu, inside the class DNPU, method regularizer.

    2. ‘clip’ : It applies clipping after the backward pass and optimiser step, ensuring that the control voltages stay within the ranges in which the model was trained. To use it, the model should have a method called ‘constraint_weights’. An example can be found at: brainspy.processors.dnpu, inside the class DNPU, method constraint_weights.

  • logger (logging (optional)) –

    It provides a way for applications to configure different log handlers. By default None. The logger should be an already initialised object that contains a method called ‘log_output’, whose input is a single numpy array variable. It can be any class, and the data can be handled in whatever way the user wants. More information about loggers can be found at https://pytorch.org/docs/stable/tensorboard.html

    Logger directory info:

    1. log_train_step: to log each step in the training process

    2. log_val_step: to log each step in the validation process

  • save_dir (Optional[str]) – Folder where the trained model is going to be saved. When None, the model will not be saved. By default None.

  • return_best_model (bool, optional) – Whether to return the trained model instead of saving it to a directory. By default True.
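The repetition performed by a custom format_targets method can be sketched in plain Python as follows. The list-based signature and the plateau_length argument are assumptions for illustration (a real custom model would operate on torch.Tensor and take the plateau length from its own configuration). With tensors, torch.repeat_interleave(x, plateau_length, dim=0) performs the same repetition along the first dimension.

```python
def format_targets(targets, plateau_length):
    """Hypothetical sketch: repeat each target once per plateau point."""
    if plateau_length < 1:
        raise ValueError("plateau_length must be at least 1")
    # Each target value is repeated consecutively, matching the plateau it was sent on
    return [t for t in targets for _ in range(plateau_length)]


expanded = format_targets([0.0, 1.0, 1.0, 0.0], plateau_length=3)
```

Four targets with a plateau of three points become twelve values, so they line up element by element with the plateau samples of the hardware output.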

Returns:

  • model (torch.nn.Module) – Trained model with best results according to the criterion fitness function.

  • training_data (dict) – Dictionary returning relevant data produced while training the model.

  • configs[‘return_best_model’] (boolean) – The configs dictionary is also updated with whether the algorithm returned the best model, stored at configs[‘return_best_model’].

Notes

A) After the end of the last epoch, the algorithm saves two main files: model_raw.pt: An exact copy of the model after the end of the training process. It can be loaded directly as an instance of the model using: my_model_instance = torch.load(‘model_raw.pt’). training_data.pickle: A PyTorch pickle file which contains the following keys:

epochs: int Number of epochs used for training the model

algorithm: str Algorithm type that was being used. Either ‘genetic’ or ‘gradient’.

optimizer_state_dict: OrderedDict State of the optimizer at the end of last epoch. It can be used to resume model training at that exact point.

model_state_dict: OrderedDict It contains the value of the learnable parameters (weights, or in this case, control voltages) at the point where all the training was finished.

train_losses: list A list of the training loss over all epochs

val_losses: list A list of the validation loss over all epochs

B) If a validation dataset is present and return_best_model is set to True, each time the validation loss improves on the previous best, the algorithm saves the following files: best_model_raw.pt: An exact copy of the model when it achieved the best validation results. It can be loaded directly as an instance of the model using: my_model_instance_at_best_val_results = torch.load(‘best_model_raw.pt’). best_training_data.pickle: A PyTorch pickle file which contains the following keys:

epoch: int Epoch at which the model with best validation loss was found.

algorithm: str Algorithm type that was being used. Either ‘genetic’ or ‘gradient’.

optimizer_state_dict: OrderedDict State of the optimizer at the moment when the best validation loss was achieved. It can be used to resume model training at that exact point.

model_state_dict: OrderedDict It contains the value of the learnable parameters (weights, or in this case, control voltages) at the point where the best validation was achieved.

train_loss: float Training loss at the point where the best validation was achieved.

validation_loss: float Best validation loss achieved.
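The best-model bookkeeping described in note B can be sketched in plain Python as below. It is a hypothetical outline, not the brains-py loop: train_epoch and val_epoch stand in for default_train_step and default_val_step, a dictionary stands in for the model state, snapshots are kept in memory instead of being saved to save_dir, and return_best_model is assumed to mean that the best snapshot is returned instead of the final state. The dictionary keys mirror those listed for training_data.pickle and best_training_data.pickle.

```python
import copy


def train_with_best_model(state, epochs, train_epoch, val_epoch=None, return_best_model=True):
    """Hypothetical sketch of keeping the best validation snapshot."""
    train_losses, val_losses = [], []
    best = None
    for epoch in range(epochs):
        state, train_loss = train_epoch(state, epoch)
        train_losses.append(train_loss)
        if val_epoch is not None:
            val_loss = val_epoch(state, epoch)
            val_losses.append(val_loss)
            if best is None or val_loss < best["validation_loss"]:
                # Where best_model_raw.pt / best_training_data.pickle would be written
                best = {"epoch": epoch, "model_state_dict": copy.deepcopy(state),
                        "train_loss": train_loss, "validation_loss": val_loss}
    training_data = {"epochs": epochs, "algorithm": "gradient",
                     "model_state_dict": copy.deepcopy(state),
                     "train_losses": train_losses, "val_losses": val_losses}
    if return_best_model and best is not None:
        state = copy.deepcopy(best["model_state_dict"])
    return state, training_data, best


def toy_train_epoch(state, epoch):
    # Each "epoch" moves the single parameter by +1 and reports a dummy loss
    return {"w": state["w"] + 1.0}, 1.0 / (epoch + 1)


def toy_val_epoch(state, epoch):
    # Validation loss is smallest when w == 2
    return (state["w"] - 2.0) ** 2


final_state, training_data, best = train_with_best_model(
    {"w": 0.0}, epochs=5, train_epoch=toy_train_epoch, val_epoch=toy_val_epoch
)
```

The toy run keeps training for all five epochs (ending at w = 5.0), but the returned state is the snapshot from epoch 1, where the validation loss reached its minimum.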

brainspy.algorithms.gd.train_checks(model, dataloaders, criterion, optimizer, configs, save_dir, return_best_model)

Its description, parameters, return values and notes are documented identically to those of brainspy.algorithms.gd.train; refer to that entry. Unlike train, its signature does not include the logger argument.