2.5 Roadmap for DeepLearningKit
Here follows a brief overview of things we are working on or that are on our roadmap.
1. Use FFT-based convolution, with precalculated convolution filters [14, 15] (a brief sketch of the idea follows this list)
2. Use lower-precision floating point in order to increase performance and support larger models (for now it uses 32-bit floats, or complex numbers, i.e. 2x32 bits per complex number, to prepare for FFT-based convolution) [16, 17]; sketched below
3. Avoid copying memory between CPU and GPU more than needed [18]; sketched below
4. Add support for other types of pre-trained networks than deep convolutional neural networks, e.g. recurrent neural networks [19, 20]
5. Look into more in-place calculations to save memory, i.e. to support larger models
6. Try to exploit larger parts of the Metal API with respect to memory layout and threadgroups to increase performance (this relates to item 1) [21, 22, 23, 24, 25]; sketched below
7. Look into teacher-student deep networks or other compressed models for even smaller but still high-quality models (recent research has shown AlexNet models being compressed from 240 MB to 6.9 MB), see the paper [A Deep Neural Network Compression Pipeline]
8. Look into algorithms for approximate matrix multiplication (i.e. convolution step speedup) to further increase speed and reduce energy usage; interesting techniques include [Approximating matrix multiplication and low-rank approximation], [Fast Approximate Matrix Multiplication by Solving Linear Systems] and [Fast Monte-Carlo Algorithms for Approximate Matrix Multiplications]. A Monte-Carlo sketch follows the list.
9. Look into a broad set of Deep Learning applications, e.g. the categories in Figure 7 from the research bibliography at [http://Deeplearning.University]. There may be application-specific optimizations that can be done, e.g. in the case of natural language processing with convolutional neural networks one uses 1D convolution instead of the 2D convolution used in image classification (sketched below).
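For roadmap item 1, the following is a minimal sketch of the idea: by the convolution theorem, convolution becomes pointwise multiplication in the frequency domain, so the filter's transform can be precalculated once and reused for every input. For brevity the transform here is a naive O(n^2) DFT rather than a true FFT (in practice one would use Accelerate/vDSP or a Metal kernel); the function names and values are illustrative only, not part of DeepLearningKit.

```swift
import Foundation

struct ComplexF { var re: Float; var im: Float }

// Naive discrete Fourier transform; stands in for a real FFT in this sketch.
func dft(_ x: [ComplexF], inverse: Bool = false) -> [ComplexF] {
    let n = x.count
    let sign: Float = inverse ? 1 : -1
    var out = [ComplexF](repeating: ComplexF(re: 0, im: 0), count: n)
    for k in 0..<n {
        for t in 0..<n {
            let angle = sign * 2 * Float.pi * Float(k * t) / Float(n)
            out[k].re += x[t].re * cos(angle) - x[t].im * sin(angle)
            out[k].im += x[t].re * sin(angle) + x[t].im * cos(angle)
        }
        if inverse { out[k].re /= Float(n); out[k].im /= Float(n) }
    }
    return out
}

// Circular convolution of two equal-length (zero-padded) signals via frequency-domain
// multiplication; the filter's spectrum could be cached and reused across inputs.
func frequencyDomainConvolve(_ signal: [Float], _ filter: [Float]) -> [Float] {
    let signalSpectrum = dft(signal.map { ComplexF(re: $0, im: 0) })
    let filterSpectrum = dft(filter.map { ComplexF(re: $0, im: 0) })  // precalculate once per filter
    let product = zip(signalSpectrum, filterSpectrum).map {
        ComplexF(re: $0.re * $1.re - $0.im * $1.im,
                 im: $0.re * $1.im + $0.im * $1.re)
    }
    return dft(product, inverse: true).map { $0.re }
}
```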
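For roadmap item 2, a minimal sketch of the precision reduction, assuming 16-bit halves are the target format (Float16 requires Swift 5.3+ and Apple's ARM hardware; the corresponding GPU-side Metal type is half). Out-of-range values would need scaling or clamping, which is omitted, and the weight values are made up.

```swift
// Halve the memory per weight by storing 16-bit values.
let weights32: [Float] = [0.127, -0.953, 1.004, 0.0008]
let weights16: [Float16] = weights32.map { Float16($0) }        // 2 bytes per weight
let roundTrip: [Float] = weights16.map { Float($0) }            // compare against weights32 to gauge precision loss
print(MemoryLayout<Float>.stride, MemoryLayout<Float16>.stride) // 4 vs 2 bytes
```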
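For roadmap item 3, a minimal sketch of keeping data visible to both CPU and GPU. On iOS devices the CPU and GPU share physical memory, so a Metal buffer created with .storageModeShared can be written by the CPU and read by a compute kernel without an explicit copy step; the weight values here are placeholders.

```swift
import Metal

if let device = MTLCreateSystemDefaultDevice() {
    let weights: [Float] = [0.1, 0.2, 0.3, 0.4]
    // One allocation, addressable from both CPU and GPU - no blit/copy step needed.
    let buffer = device.makeBuffer(bytes: weights,
                                   length: weights.count * MemoryLayout<Float>.stride,
                                   options: .storageModeShared)
    // Reading it back on the CPU touches the same memory a GPU kernel would see.
    if let ptr = buffer?.contents().bindMemory(to: Float.self, capacity: weights.count) {
        print(ptr[0])  // 0.1
    }
}
```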
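For roadmap item 6, a minimal sketch of choosing threadgroup sizes from the compute pipeline's own properties rather than hard-coding them; `encoder` and `pipeline` are assumed to be an active MTLComputeCommandEncoder and an already-compiled MTLComputePipelineState for some kernel.

```swift
import Metal

func dispatchKernel(encoder: MTLComputeCommandEncoder,
                    pipeline: MTLComputePipelineState,
                    width: Int, height: Int) {
    // SIMD group width and per-threadgroup thread budget reported by Metal.
    let w = pipeline.threadExecutionWidth
    let h = pipeline.maxTotalThreadsPerThreadgroup / w
    let threadsPerGroup = MTLSize(width: w, height: h, depth: 1)
    // Enough threadgroups to cover a width x height output, rounding up.
    let groups = MTLSize(width: (width + w - 1) / w,
                         height: (height + h - 1) / h,
                         depth: 1)
    encoder.setComputePipelineState(pipeline)
    encoder.dispatchThreadgroups(groups, threadsPerThreadgroup: threadsPerGroup)
}
```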
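For roadmap item 8, a minimal sketch in the spirit of the Monte-Carlo matrix multiplication papers listed above: instead of summing all n column-row outer products of A and B, sample s of them with probability proportional to their norms and rescale so the estimate is unbiased. Matrices are plain row-major arrays here; a real implementation would be a tuned GPU kernel.

```swift
import Foundation

// Approximate A (m x n) times B (n x p) by sampling s column-row outer products.
// Assumes at least one column-row pair has nonzero norm.
func approxMatMul(_ a: [[Float]], _ b: [[Float]], samples s: Int) -> [[Float]] {
    let m = a.count, n = b.count, p = b[0].count
    // Sampling probabilities proportional to ||A[:,k]|| * ||B[k,:]||.
    var weights = [Float](repeating: 0, count: n)
    for k in 0..<n {
        let colNorm = sqrt((0..<m).reduce(Float(0)) { $0 + a[$1][k] * a[$1][k] })
        let rowNorm = sqrt(b[k].reduce(Float(0)) { $0 + $1 * $1 })
        weights[k] = colNorm * rowNorm
    }
    let total = weights.reduce(0, +)
    let probs = weights.map { $0 / total }
    var c = [[Float]](repeating: [Float](repeating: 0, count: p), count: m)
    for _ in 0..<s {
        // Draw one column-row pair k with probability probs[k].
        var r = Float.random(in: 0..<1), k = 0
        while k < n - 1 && r >= probs[k] { r -= probs[k]; k += 1 }
        let scale = 1 / (Float(s) * probs[k])   // rescaling keeps the estimate unbiased
        for i in 0..<m { for j in 0..<p { c[i][j] += scale * a[i][k] * b[k][j] } }
    }
    return c
}
```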
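For roadmap item 9, a minimal sketch of the 1D (valid) convolution used for text: the filter slides along the token sequence only, instead of along both image axes as in the 2D case. The values are made up, and in an NLP setting each position would be an embedding vector rather than a single scalar.

```swift
// 1D "valid" convolution (as is conventional in deep learning, without flipping the filter).
func convolve1D(_ signal: [Float], _ filter: [Float]) -> [Float] {
    guard signal.count >= filter.count else { return [] }
    return (0...(signal.count - filter.count)).map { start in
        zip(filter.indices, filter).reduce(Float(0)) { acc, kv in
            acc + signal[start + kv.0] * kv.1
        }
    }
}

let tokens: [Float] = [0.2, 0.9, -0.3, 0.5, 0.1]   // e.g. one feature per token
let filter: [Float] = [0.5, -0.5, 0.25]            // width-3 filter
print(convolve1D(tokens, filter))                  // 3 output positions
```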
3 App Store for Deep Learning Models
Given the immense asymmetry in the time taken to train a Deep Learning Model versus the time needed to
use it (e.g. to do image recognition), it makes perfect sense to build a large repository of pre-trained
models that can be (re)used several times. Since there are several popular tools used to train Deep
Learning models (e.g. Caffe, Torch, Theano, DeepLearning4J, PyLearn and Nervana), we are working
on supporting the import of pre-trained models from those tools into an app store for deep learning models
(currently we have primarily been working with Caffe CNN models).
The tweet in Figure 8 illustrates how much energy is required to train a Deep Network (per night);
some Deep Learning Models can take weeks of training on GPUs such as the Nvidia Titan X, or in other
words, piles of wood worth of energy. Using a model is quite different, since it requires less energy than
lighting a match. See Figures 9 and 10 for an illustration of this.
Deep Learning Models also typically have a (low) limit on the number of classes they can predict per
model (e.g. in the ImageNet competition there are 1000 classes, in CIFAR-100 100 classes and in CIFAR-10
10 classes). This means that in order to create real-life applications, one needs to intelligently switch
between several Deep Learning Models, loading them very rapidly from SSD into GPU-accessible RAM,
or, if there is enough capacity, run several models in parallel on the same GPU; a sketch of such model switching follows below.
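The following is a minimal sketch of such model switching, assuming a small cache of resident models that are loaded from flash storage on demand and evicted when memory runs low. LoadedModel, loadModel and the model names are hypothetical placeholders, not part of any existing DeepLearningKit API.

```swift
// Hypothetical placeholder for a network whose weights have been loaded into
// GPU-accessible memory; real loading and inference are omitted.
struct LoadedModel { let name: String }

final class ModelCache {
    private var resident: [String: LoadedModel] = [:]
    private var order: [String] = []          // least recently used first
    private let maxResident = 2               // limited GPU-accessible RAM

    // Hypothetical loader; in practice this would parse weights from SSD.
    private func loadModel(named name: String) -> LoadedModel {
        return LoadedModel(name: name)
    }

    func model(named name: String) -> LoadedModel {
        if let cached = resident[name] {
            order.removeAll { $0 == name }
            order.append(name)
            return cached
        }
        if resident.count >= maxResident, let evicted = order.first {
            resident[evicted] = nil            // drop the least recently used model
            order.removeFirst()
        }
        let model = loadModel(named: name)
        resident[name] = model
        order.append(name)
        return model
    }
}

let cache = ModelCache()
let scenes = cache.model(named: "places-cnn")    // e.g. scene classification
let objects = cache.model(named: "imagenet-cnn") // e.g. object classification
```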
Selecting an appropriate Deep Learning model (i.e. the one most likely to work well in a given
context) is to our knowledge not a well-studied field of research, and in some ways it resembles the
meta or universal search problem found in web search (e.g. cross-model ranking), but latency plays
an even bigger part in the mobile on-device case (there is not enough time to run many models). We have
some ideas for a meta model for selecting a model to use, which can take input such as location, time of
day, and camera history to predict which models might be most relevant; a sketch follows below.
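The following is a minimal sketch of what such a meta model could look like: a simple scoring function over context signals that ranks the locally available models. The context fields, model descriptors and weights are entirely hypothetical, and in practice the scoring function itself could be learned rather than hand-written.

```swift
import Foundation

// Hypothetical context signals available on the device.
struct Context {
    let isIndoors: Bool
    let hourOfDay: Int
    let recentLabels: [String]   // labels seen in recent camera history
}

// Hypothetical descriptor for a locally stored model.
struct CandidateModel {
    let name: String
    let indoorBias: Float        // > 0 favors indoor scenes
    let relatedLabels: Set<String>
}

// Score each candidate from the context and pick the highest-scoring model.
func selectModel(_ candidates: [CandidateModel], for context: Context) -> CandidateModel? {
    func score(_ m: CandidateModel) -> Float {
        var s: Float = 0
        s += context.isIndoors ? m.indoorBias : -m.indoorBias
        s += Float(context.recentLabels.filter { m.relatedLabels.contains($0) }.count)
        if context.hourOfDay >= 20 || context.hourOfDay < 6 { s -= 0.5 }  // arbitrary low-light penalty
        return s
    }
    return candidates.max { score($0) < score($1) }
}
```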
With state-of-the-art compression techniques for Convolutional Neural Networks, the (groundbreaking)
AlexNet model from 2012 can be compressed from 240 MB to 6.9 MB. This means that one
could theoretically fit more than eighteen thousand AlexNet models (128 GB / 6.9 MB is roughly 18,500) on a
128 GB mobile device like the iPhone 6!