.. _elsa-ml-first-example:

***************
Getting started
***************

.. contents:: Table of Contents

A first example
===============

Let's start with a first example and build a network to learn the famous
`MNIST `_ task, which consists of 60,000 training
images of handwritten digits, each 28 pixels high and 28 pixels wide.
We start by defining a VolumeDescriptor describing the input shape of a single
MNIST image and then define our first network layer.

.. code-block:: cpp

   VolumeDescriptor inputDescriptor{{28, 28, 1}};
   auto input = ml::Input(inputDescriptor, /* batch-size */ 10);

In this first example we design a fully-connected network. We must therefore
flatten our input using a Flatten layer. We set the input of ``flatten`` to be
our input layer defined above:

.. code-block:: cpp

   auto flatten = ml::Flatten();
   flatten.setInput(&input);

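
To make the effect of flattening concrete, the following sketch computes how
many entries a shape has once it is flattened. It uses only the C++ standard
library. ``flattenedSize`` is a hypothetical helper for illustration and is not
part of elsa.

.. code-block:: cpp

   #include <cstddef>
   #include <functional>
   #include <numeric>
   #include <vector>

   // Hypothetical helper, not an elsa function.
   // Flattening keeps the number of entries and only drops the
   // multi-dimensional shape, so the result is the product of all extents.
   std::size_t flattenedSize(const std::vector<std::size_t>& shape)
   {
       return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                              std::multiplies<std::size_t>());
   }

For the MNIST input shape, ``flattenedSize({28, 28, 1})`` yields 784, which is
the number of inputs that each neuron of a subsequent Dense layer receives.
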
Let's add a Dense layer and set the input. The Dense layer is defined by
specifying the number of neurons (128 in this case) and the activation function.
Again, we set the input of ``dense`` to be ``flatten``.

.. code-block:: cpp

   // A dense layer with 128 neurons and Relu activation
   auto dense = ml::Dense(128, ml::Activation::Relu);
   dense.setInput(&flatten);

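
As a rough illustration of what a dense layer with a Relu activation computes,
the sketch below evaluates ``y = relu(W x + b)`` in plain C++. It is not elsa's
implementation. The function name ``denseRelu`` and the row-by-row weight
layout are assumptions made for this example only.

.. code-block:: cpp

   #include <algorithm>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper, not an elsa function.
   // weights holds one row of x.size() values per neuron (assumed layout),
   // bias holds one value per neuron.
   std::vector<float> denseRelu(const std::vector<float>& weights,
                                const std::vector<float>& bias,
                                const std::vector<float>& x)
   {
       std::vector<float> y(bias.size());
       for (std::size_t neuron = 0; neuron < bias.size(); ++neuron) {
           float sum = bias[neuron];
           for (std::size_t i = 0; i < x.size(); ++i)
               sum += weights[neuron * x.size() + i] * x[i];
           y[neuron] = std::max(0.0f, sum); // Relu clamps negative sums to zero
       }
       return y;
   }

With 784 inputs and 128 neurons, ``weights`` would hold 784 * 128 values and
``bias`` 128 values.
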
Finally we add a second Dense layer, followed by a Softmax layer:

.. code-block:: cpp

   // A dense layer with 10 neurons and Relu activation
   auto dense2 = ml::Dense(10, ml::Activation::Relu);
   dense2.setInput(&dense);

   auto softmax = ml::Softmax();
   softmax.setInput(&dense2);

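
The Softmax layer turns the 10 outputs of the last Dense layer into values that
are positive and sum to one, so they can be read as class probabilities. A
minimal standalone sketch of that computation follows. The function below is a
hypothetical illustration, not elsa's implementation, and it assumes a
non-empty input.

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper, not an elsa function. Requires a non-empty input.
   std::vector<float> softmax(const std::vector<float>& logits)
   {
       // Subtracting the maximum does not change the result but avoids
       // overflow in std::exp for large inputs.
       const float maxLogit = *std::max_element(logits.begin(), logits.end());

       std::vector<float> probs(logits.size());
       float sum = 0.0f;
       for (std::size_t i = 0; i < logits.size(); ++i) {
           probs[i] = std::exp(logits[i] - maxLogit);
           sum += probs[i];
       }
       for (float& p : probs)
           p /= sum; // normalize so that all entries sum to one
       return probs;
   }

Two equal inputs, for example, receive a probability of 0.5 each.
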
Now it's time to construct a Model out of the layers above. This only requires
specifying the input and output of our network:

.. code-block:: cpp

   auto model = ml::Model(&input, &softmax);


Sequential Networks
===================

Our fully-connected network above consists of layers with at most one input
and at most one output. We call such a network *sequential*, again following
Keras's naming convention.
In the case of a sequential network we have an easier way to build our model.
The above network can also be constructed by the following snippet:

.. code-block:: cpp

   auto model = ml::Sequential(ml::Input(inputDescriptor, /* batch-size */ 1),
                               ml::Dense(128, ml::Activation::Relu),
                               ml::Dense(10, ml::Activation::Relu),
                               ml::Softmax());


Graphs and pretty printing
==========================

While constructing a model, elsa defines an internal graph representation of the
network. Observing this graph can be helpful. Elsa exports network graphs in
`Graphviz' DOT language `_:

.. code-block:: cpp

   ml::Utils::Plotting::modelToDot(model, "myModel.dot");

By using e.g.

.. code-block:: bash

   $ dot -Tpng myModel.dot > myModel.png

the network model defined in the previous section is plotted as

.. image:: myModel.png
   :width: 300
   :align: center
   :alt: NetworkGraph

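
For readers unfamiliar with DOT, the sketch below writes a minimal DOT
description of a simple chain of layers by hand. It only illustrates the
format: the file that ``modelToDot`` produces may contain additional attributes
and will generally look different. ``chainToDot`` is a hypothetical helper, not
an elsa function.

.. code-block:: cpp

   #include <cstddef>
   #include <string>
   #include <vector>

   // Hypothetical helper, not an elsa function.
   // Emits one directed edge per pair of consecutive layer names.
   std::string chainToDot(const std::vector<std::string>& layerNames)
   {
       std::string dot = "digraph model {\n";
       for (std::size_t i = 0; i + 1 < layerNames.size(); ++i)
           dot += "  " + layerNames[i] + " -> " + layerNames[i + 1] + ";\n";
       dot += "}\n";
       return dot;
   }

Calling it with the names ``input_0`` and ``flatten_1`` gives a graph with the
single edge ``input_0 -> flatten_1;``, which ``dot`` can render as shown above.
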
Another way to inspect the architecture of a model is to print the model to
the console:

.. code-block:: cpp

   std::cout << model << "\n";

In our example this would result in

.. code-block:: none

   Model:
   ________________________________________________________________________________
   Layer (type)          Output Shape       Param #     Connected to
   ================================================================================
   input_0 (Input)       (28, 28, 1)        0           flatten_1
   ________________________________________________________________________________
   flatten_1 (Flatten)   (784)              0           dense_2
   ________________________________________________________________________________
   dense_2 (Dense)       (128)              100480      dense_4
   ________________________________________________________________________________
   dense_4 (Dense)       (10)               1290        softmax_6
   ________________________________________________________________________________
   softmax_6 (Softmax)   (10)               0
   ================================================================================
   Total trainable params: 101770
   ________________________________________________________________________________

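
The parameter counts in this summary can be checked by hand: a dense layer has
one weight per input and neuron plus one bias per neuron. The short sketch
below reproduces the numbers. ``denseParams`` is a hypothetical helper for
illustration, not part of elsa.

.. code-block:: cpp

   #include <cstddef>

   // Hypothetical helper, not an elsa function.
   // Trainable parameters of a dense layer: weights plus one bias per neuron.
   constexpr std::size_t denseParams(std::size_t inputs, std::size_t neurons)
   {
       return inputs * neurons + neurons;
   }

   // 784 flattened inputs feeding 128 neurons
   static_assert(denseParams(784, 128) == 100480, "first Dense layer");
   // 128 inputs feeding 10 neurons
   static_assert(denseParams(128, 10) == 1290, "second Dense layer");
   // Input, Flatten and Softmax contribute no trainable parameters
   static_assert(denseParams(784, 128) + denseParams(128, 10) == 101770, "total");
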

Compiling a model
=================

So far we only defined what we call a front-end model. To really do something
meaningful we need to compile the model. This is also the point where we
set the loss function we want to use as well as the optimizer.
While compiling a model, elsa performs a lot of detail work under the hood. In
particular, backend resources get allocated.

Let's compile our toy MNIST model with a SparseCategoricalCrossentropy loss
and the well-known Adam optimizer:

.. code-block:: cpp

   // Define an Adam optimizer
   auto opt = ml::Adam();

   // Compile the model
   model.compile(ml::SparseCategoricalCrossentropy(), &opt);

After the model is compiled we are ready for training or inference.
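
For intuition about the loss chosen above: in Keras's convention, which elsa's
naming follows, a sparse categorical cross-entropy takes the true class as an
integer index rather than a one-hot vector and penalizes the negative logarithm
of the probability predicted for that class. The sketch below shows this for a
single sample. It is an assumption-laden illustration, not elsa's
implementation, and it leaves out details such as averaging over a batch.

.. code-block:: cpp

   #include <cmath>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper, not an elsa function.
   // probs: predicted class probabilities (e.g. a softmax output)
   // label: index of the true class, must be smaller than probs.size()
   float sparseCategoricalCrossentropy(const std::vector<float>& probs,
                                       std::size_t label)
   {
       return -std::log(probs[label]);
   }

A confident correct prediction (probability 1 for the true class) gives a loss
of zero, while a probability of 0.5 gives a loss of about 0.693.
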
Train a model
===============
Training a model is straightforward. Most of the work is preparing the
training data, which is of course independent of elsa. Assume we bundled our
MNIST images into a ``std::vector`` ``inputs`` and our labels
into ``labels`` respectively.

Following Keras, we can train our model as easily as

.. code-block:: cpp

   model.fit(inputs, labels, /* epochs */ 10);