How many times have you thought is that a 4 or a 9 when your best mate wrote a number on a piece of paper? Well, if that's hard for humans how possibly could it be simpler for a computer to learn. Welcome in the kingdom of deep learning where certain tasks can be taught to computer with super-humans capacity. And when I say "taught" I mean it. Here, we don't code algorithms for solving problems. No, here we code algorithms for learning how to solve a problem. Then, we take a bunch of examples and the computer will learn from them. Kinda of cool, no?

So let's start.

First, we need a dataset with handwritten characters and luckily we have one handy. That's MNIST (http://yann.lecun.com/exdb/mnist/) which is produced by Yan LeCun the guru of deep learning, currently at Facebook. He invented something known as ConvNets which broke any previous result in learning in so many different application domains. I think he will get the Turing Award one day. Convnets are simple and effective as we will see in follow up posting.

Second, we need some high level library for coding deep-learning in a simple and effective way. Here we are super-lucky because in the last year there has been a Cambrian explosion of deep learning libraries with all the big players giving a contribution from Google, to Facebook, to Microsoft, to the Academic world. After testing many (Theano, Google's Tensorflow, Lasagne, Block, Neon) I decided to go for Keras because it is clean and minimalist. Plus it runs on the top of Theano and TensorFlow which are the state of the art today and you can switch the backend transparently. Keras supports both CPUs and GPUs computation.

Third, let's show directly some code which I wrote and can get to an accuracy of >98%

import numpy as np import matplotlib.pyplot as plt import time np.random.seed(1111) # for reproducibility from keras.datasets import mnist from keras.models import Sequential from keras.layers.core import Dense, Dropout, Activation, Flatten from keras.layers.convolutional import Convolution2D, MaxPooling2D from keras.utils import np_utils from keras.regularizers import l2, activity_l2 from keras.utils.visualize_util import plot from keras.optimizers import SGD, Adam, RMSprop from keras.callbacks import EarlyStopping import inspect # # save the graph produced by the experiment # def print_Graph( # Training log fitlog, # elapsed time elapsed, # input parameters for the experiment args, # input values for the experiment values): experiment_label = "\n".join(['%s=%s' % (i, values[i]) for i in args]) experiment_file = experiment_label+"-Time= %02d" % elapsed + "sec" experiment_file = experiment_file.replace("\n", "-")+'.png' fig = plt.figure(figsize=(6, 3)) plt.plot(fitlog.history["val_acc"]) plt.title('val_accuracy') plt.ylabel('val_accuracy') plt.xlabel('iteration') fig.text(.7,.15,experiment_label, size='6') plt.savefig(experiment_file, format="png") # # A LeNet-like convnet for classifying MINST handwritten characters 28x28 # def convNet_LeNet( VERBOSE=1, # normlize NORMALIZE = True, # Network Parameters BATCH_SIZE = 128, NUM_EPOCHS = 20, # Number of convolutional filters NUM_FILTERS = 32, # side length of maxpooling square NUM_POOL = 2, # side length of convolution square NUM_CONV = 3, # dropout rate for regularization DROPOUT_RATE = 0.5, # hidden number of neurons first layer NUM_HIDDEN = 128, # validation data VALIDATION_SPLIT=0.2, # 20% # optimizer used OPTIMIZER = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True) ): # Output classes, number of MINST DIGITS NUM_CLASSES = 10 # Shape of an MINST digit image SHAPE_X, SHAPE_Y = 28, 28 # Channels on MINST IMG_CHANNELS = 1 # LOAD the MINST DATA split in training and test data (X_train, Y_train), (X_test, Y_test) = mnist.load_data() X_train = X_train.reshape(X_train.shape[0], 1, SHAPE_X, SHAPE_Y) X_test = X_test.reshape(X_test.shape[0], 1, SHAPE_X, SHAPE_Y) # convert in float32 representation for GPU computation X_train = X_train.astype("float32") X_test = X_test.astype("float32") if (NORMALIZE): # NORMALIZE each pixerl by dividing by max_value=255 X_train /= 255 X_test /= 255 print('X_train shape:', X_train.shape) print(X_train.shape[0], 'train samples') print(X_test.shape[0], 'test samples') # KERAS needs to represent each output class into OHE representation Y_train = np_utils.to_categorical(Y_train, NUM_CLASSES) Y_test = np_utils.to_categorical(Y_test, NUM_CLASSES) nn = Sequential() #FIRST LAYER OF CONVNETS, POOLING, DROPOUT # apply a NUM_CONV x NUM_CONF convolution with NUM_FILTERS output # for the first layer it is also required to define the input shape # activation function is rectified linear nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV, input_shape=(IMG_CHANNELS, SHAPE_X, SHAPE_Y) )) nn.add(Activation('relu')) nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV)) nn.add(Activation('relu')) nn.add(MaxPooling2D(pool_size = (NUM_POOL, NUM_POOL))) nn.add(Dropout(DROPOUT_RATE)) #SECOND LAYER OF CONVNETS, POOLING, DROPOUT # apply a NUM_CONV x NUM_CONF convolution with NUM_FILTERS output nn.add(Convolution2D( NUM_FILTERS, NUM_CONV, NUM_CONV)) nn.add(Activation('relu')) nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV)) nn.add(Activation('relu')) nn.add(MaxPooling2D(pool_size = (NUM_POOL, NUM_POOL) )) nn.add(Dropout(DROPOUT_RATE)) # FLATTEN the shape for dense connections nn.add(Flatten()) # FIRST HIDDEN LAYER OF DENSE NETWORK nn.add(Dense(NUM_HIDDEN)) nn.add(Activation('relu')) nn.add(Dropout(DROPOUT_RATE)) # OUTFUT LAYER with NUM_CLASSES OUTPUTS # ACTIVATION IS SOFTMAX, REGULARIZATION IS L2 nn.add(Dense(NUM_CLASSES, W_regularizer=l2(0.01) )) nn.add(Activation('softmax') ) #summary nn.summary() #plot the model plot(nn) # set an early-stopping value early_stopping = EarlyStopping(monitor='val_loss', patience=2) # COMPILE THE MODEL # loss_function is categorical_crossentropy # optimizer is parametric nn.compile(loss='categorical_crossentropy', optimizer=OPTIMIZER, metrics=["accuracy"]) start = time.time() # FIT THE MODEL WITH VALIDATION DATA fitlog = nn.fit(X_train, Y_train, \ batch_size=BATCH_SIZE, nb_epoch=NUM_EPOCHS, \ verbose=VERBOSE, validation_split=VALIDATION_SPLIT, \ callbacks=[early_stopping]) elapsed = time.time() - start # Test the network results = nn.evaluate(X_test, Y_test, verbose=VERBOSE) print('accuracy:', results[1]) # just to get the list of input parameters and their value frame = inspect.currentframe() args, _, _, values = inspect.getargvalues(frame) # used for printing pretty arguments print_Graph(fitlog, elapsed, args, values) return fitlog # 2 epochs #log = convNet_LeNet(OPTIMIZER = 'Adam', NUM_EPOCHS=2) #print(log.history) # 20 epochs #log = convNet_LeNet(OPTIMIZER = 'Adam', NUM_EPOCHS=20) #print(log.history) # default optimizer = SGD #log = convNet_LeNet(NUM_EPOCHS=20) #print(log.history) # default optimizer = RMSProp #log = convNet_LeNet(OPTIMIZER=RMSprop(), NUM_EPOCHS=20) #print(log.history) ## default optimizer #log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0) #print(log.history) # default optimizer #log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.1) #print(log.history) # default optimizer #log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.2) #print(log.history) # default optimizer #log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.4) #print(log.history) # default optimizer #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=64) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=128) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=256) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=512) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=1024) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=2048) #print(log.history) # #log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=4096) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.8) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.6) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.4) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2, NORMALIZE=False) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2, NUM_FILTERS=64) #print(log.history) log = convNet_LeNet(OPTIMIZER='Adam', NUM_FILTERS=128) print(log.history) # log = convNet_LeNet(OPTIMIZER='Adam', NUM_FILTERS=256) # print(log.history) # x log = convNet_LeNet(OPTIMIZER='Adam', NUM_POOL=4) # x print(log.history) # log = convNet_LeNet(OPTIMIZER='Adam', NUM_POOL=8) # print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_CONV=4) #print(log.history) # x log = convNet_LeNet(OPTIMIZER='Adam', NUM_CONV=8) # x print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=32) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=64) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=256) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=512) #print(log.history) #log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=1024) #print(log.history) # VERBOSE=1, # # normlize # NORMALIZE = True, # # Network Parameters # BATCH_SIZE = 128, # NUM_EPOCHS = 100, # # Number of convolutional filters # NUM_FILTERS = 32, # # side length of maxpooling square # NUM_POOL = 2, # # side length of convolution square # NUM_CONV = 3, # # dropout rate for regularization # DROPOUT_RATE = 0.5, # # hidden number of neurons first layer # N_HIDDEN = 128, # # validation data # VALIDATION_SPLIT=0.2, # 20% # # optimizer used # OPTIMIZER = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True) #plt.show()

Next posting is about describing the code. Then, you will see dozens of experiments for exploring the hyper-parameters' space and inferring some rules of thumbs for fine tuning our deep learning nets.

Stay tuned, during the next months we will see more than 20 nets for deep learning in different contexts and show super-human capacity

## No comments:

## Post a Comment