Networks

Network architectures in CN24 are defined using a JSON file. The basic layout looks like the following example:

{
  "hyperparameters": { /* See section on hyperparameters */ },
  "net": {
    "task":        "detection",
    "input":       "conv1",
    "output":      "fc7",
    "nodes":       { /* See section on layer types */ },
    "error_layer": "square"
  },
  "data_input": { /* See section on data input */ }
}

Hyperparameters

This section controls the optimization process. The following hyperparameters can (and should) be set:

  • batch_size_parallel: Sets the fourth (batch) dimension of the network’s input, i.e. the number of samples processed in parallel. This directly affects VRAM usage if you are using a GPU.

  • batch_size_sequential: If you want to use a larger minibatch size than your memory allows with batch_size_parallel, you can increase batch_size_sequential instead. The effective minibatch size is the product of both values.

  • epoch_iterations: The number of iterations (gradient steps) per epoch. This value can be chosen freely. If it is not set, an epoch will have one iteration per training sample.

  • optimization_method: Choose the optimizer you want to use for your network. Currently, the following optimization methods are supported:

    • adam: The Adam optimizer. It can be configured using the following hyperparameter keys:

      • ad_step_size, ad_beta1 and ad_beta2: Match the \(\alpha, \beta_1\) and \(\beta_2\) parameters from the Adam paper.
      • ad_epsilon: Matches the \(\epsilon\) parameter from the Adam paper.
      • ad_sqrt_step_size: If set to 1, the effective step size will be \(\alpha\) divided by the square root of the number of iterations already processed.
    • gd: Standard stochastic gradient descent with momentum. With \(t\) denoting the number of iterations already processed, the effective learning rate is \(\eta (1 + \gamma t)^q\). SGD supports the following hyperparameter keys (an example block is shown further below):

      • learning_rate: Sets the learning rate \(\eta\) for gradient descent.
      • learning_rate_exponent: Sets the exponent \(q\) for the effective learning rate.
      • learning_rate_gamma: Sets the coefficient \(\gamma\) for the effective learning rate.
      • gd_momentum: Sets the momentum coefficient.

  • l1: The coefficient for \(L_1\) regularization of weights.

  • l2: The coefficient for \(L_2\) regularization of weights.

An example block might look like this:

"hyperparameters": {
  "testing_ratio": 1,
  "batch_size_parallel": 2,
  "batch_size_sequential": 32,
  "epoch_iterations": 100,
  "l1": 0,
  "l2": 0.0005,
  "optimization_method": "adam",
  "ad_step_size": 0.000001
}
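
In the example above, the effective minibatch size is \(2 \cdot 32 = 64\) samples, and ad_step_size sets Adam's step size \(\alpha\).

A corresponding block for stochastic gradient descent could look like the following sketch. It only uses the hyperparameter keys documented above; the values are illustrative and not tuned for any particular task:

"hyperparameters": {
  "batch_size_parallel": 2,
  "batch_size_sequential": 32,
  "epoch_iterations": 100,
  "l1": 0,
  "l2": 0.0005,
  "optimization_method": "gd",
  "learning_rate": 0.001,
  "learning_rate_gamma": 0.0001,
  "learning_rate_exponent": -0.75,
  "gd_momentum": 0.9
}

With these values, after \(t = 10000\) iterations the effective learning rate is \(0.001 \cdot (1 + 0.0001 \cdot 10000)^{-0.75} = 0.001 \cdot 2^{-0.75} \approx 5.9 \cdot 10^{-4}\).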

Data Input

This section specifies the input size of the network. It is required because the node list does not contain any information about the input or output shapes of the nodes.

Layer Types

Convolution Layer

Maximum Pooling Layer