The R Bootcamp @ DHLab
By the end of this practical you will know how to:

- use Keras to build deep feedforward neural networks.
- use Keras to fit and evaluate deep feedforward neural networks.
- use Keras to optimize the predictive performance of deep feedforward neural networks.

Open your neuralnets R project.
Open a new R script and save it as a new file called deep_nn_practical.R in the 2_Code folder.
Using library(), load the packages tidyverse and keras.
# install.packages("tidyverse")
# install.packages("keras")
# Load packages necessary for this exercise
library(tidyverse)
library(keras)
Use readRDS() to load the fashion.RDS dataset as a new object.

# MNIST fashion data
fashion <- readRDS(file = "1_Data/fashion.RDS")
Inspect the fashion object using str(). You will see a list with two elements named train and test, each of which consists of two elements: x (the images) and y (the item depicted).

# Inspect contents
str(fashion)
source() the helper.R file in your 2_Code folder.

# Load helper.R
source("2_Code/helper.R")
Split the data into images and items, separately for training and test. Use the code below.

# split fashion train
c(fashion_train_images, fashion_train_items) %<-% fashion$train

# split fashion test
c(fashion_test_images, fashion_test_items) %<-% fashion$test
Use the array_reshape() function to serialize the images of both training and test, such that every image becomes a vector of 28*28 = 784 elements (and the resulting object a matrix with that many columns). Then rescale the pixel values to lie between 0 and 1 by dividing by 255. Use the code below.

# reshape images
fashion_train_images_serialized <- array_reshape(fashion_train_images, c(nrow(fashion_train_images), 784))
fashion_test_images_serialized <- array_reshape(fashion_test_images, c(nrow(fashion_test_images), 784))

# rescale images
fashion_train_images_serialized <- fashion_train_images_serialized / 255
fashion_test_images_serialized <- fashion_test_images_serialized / 255
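If you want to double-check the result, you can inspect the dimensions of the new objects; each should be a matrix with one row per image and 784 columns.

# quick check: one row per image, 784 columns per image
dim(fashion_train_images_serialized)
dim(fashion_test_images_serialized)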
Use to_categorical() to expand the criterion (the item codes) into a one-hot representation, i.e., a matrix with a 1 sitting in the position of the integer and 0s otherwise.

# expand criterion
fashion_train_items_onehot <- to_categorical(fashion_train_items, 10)
fashion_test_items_onehot <- to_categorical(fashion_test_items, 10)
Use head(fashion_train_items_onehot) to inspect the first few rows and compare them to head(fashion_train_items). Do things line up?

Next, create a vector with the fashion labels using the code below.

# fashion items
fashion_labels = c('T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle boot')
Use the plt_imgs() function, which you have loaded earlier with the helper.R script, to illustrate the images. You have to add 1, because the indices in fashion_train_items start at 0.

# plot images
plt_imgs(fashion_train_images[1:25,,], fashion_labels[fashion_train_items[1:25]+1])
Use keras_model_sequential() to start building a network.

# begin building network
net <- keras_model_sequential()
Add a layer to the network using layer_dense(). Inside the function you will have to specify three arguments: input_shape, units, and activation. For a moment, think about what values to use for these three given the kind of data that you wish to model. The answers come in a sec.

# add layer
net %>% layer_dense(
  input_shape = XX,
  units = XX,
  activation = "XX"
)
The correct solutions are input_shape = 784 to specify that there must be 784 input nodes, one for each pixel, units = 10 to specify that there must be 10 different output nodes, and activation = 'softmax' to specify that the final activations should be probabilities that sum to 1 across all output nodes. After you have entered these values, use summary(net) to see the model information.
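With these values entered, the completed chunk looks like this:

# add layer (filled in)
net %>% layer_dense(
  input_shape = 784,
  units = 10,
  activation = "softmax"
)

# model information
summary(net)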
Take a look at the Param # column in the printout. Why is the number 7850 rather than 7840 = 784 * 10? Any ideas?

Yes, keras automatically adds a bias to each node in a layer.
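You can verify this with a quick back-of-the-envelope calculation:

# 784 weights per output node, times 10 output nodes, plus 10 biases
784 * 10 + 10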
Use compile() to compile the network. You will need to specify at least two arguments: optimizer and loss. Think about what we've used in the presentation. Would the same values make sense here?

# loss, optimizers, & metrics
net %>% compile(
  optimizer = 'XX',
  loss = 'XX',
  metrics = c('accuracy')
)
Set optimizer = 'adam' and loss = 'categorical_crossentropy' and run the chunk. You see that I've also added 'accuracy' as an additional metric, which can be useful to track during fitting, as it is much easier to interpret than crossentropy.
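The filled-in chunk thus looks like this:

# compile (filled in)
net %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)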
Fit the network using fit(). Specify the arguments x, y, batch_size, and epochs. Think for a moment about what the appropriate values for these arguments could be.

# fit the network
history <- net %>% fit(
  x = XX,
  y = XX,
  batch_size = XX,
  epochs = XX
)
The arguments x
and y
specify the
training features and training criterion, respectively, so
x = fashion_train_images_serialized
and
y = fashion_train_items_onehot
. The arguments
batch_size
and epochs
control how often the
weights will be updated and for how many iterations of the data set.
Useful (and somewhat arbitrary) values are batch_size = 32
and epochs = 10
. Use these values and then run the
chunk.
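Filled in, the chunk looks like this:

# fit the network (filled in)
history <- net %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)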
The fit() function automatically provides you with useful information on the progression of the fit indices. You can additionally use the history object to create the same illustration as a ggplot. Run plot(history) + theme_minimal().
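As a chunk, following the pattern of the other code blocks:

# plot training history as a ggplot
plot(history) + theme_minimal()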
When you inspect the plot, what tells you that the network indeed has learned? And how good is the final performance of the network?

The network has learned, which can be gathered from the decrease in loss and the increase in accuracy. The non-linear pattern is very characteristic: almost always, most of the gains are achieved in the first epoch or two. To get accurate values for the final performance you can simply print history. What do you think: how well will this network perform in predicting fashion items out-of-sample? Find out in the next section.
Evaluate the network's out-of-sample performance using evaluate(), while supplying the function with the test images and items.

# evaluate
net %>% evaluate(XX, XX, verbose = 0)
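Filled in with the serialized test images and the one-hot test items from earlier, the call looks like this:

# evaluate (filled in)
net %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)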
The network is slightly worse than in training, but still pretty good given that guessing performance is only 10% accuracy. This, again, is very characteristic. In machine learning, "simple" models often get a long way towards the desired level of performance, though one might question whether a model with 7,850 parameters can still be considered "simple".
Compare the predictions made by the network with the actual fashion labels. Do you note any patterns? Can you maybe understand the errors that the network has made?
# compare predictions to truth
pred = net %>% predict(fashion_test_images_serialized) %>% k_argmax() %>% as.numeric()
table(fashion_labels[fashion_test_items+1], fashion_labels[pred+1])
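One optional way to spot patterns (not part of the original exercise) is to compute the per-class accuracy from the confusion table, assuming every item class shows up among the predictions:

# optional: per-class accuracy from the confusion table
conf_tab <- table(fashion_labels[fashion_test_items+1], fashion_labels[pred+1])
round(diag(conf_tab) / rowSums(conf_tab), 2)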
Now build a deeper network called deepnet with two hidden layers that use the 'relu' activation function. See the template below. The final layer will again be the output layer and must be supplied with the same values as before. Print the summary at the end.

# initialize deepnet
deepnet <- keras_model_sequential()

# add layers
deepnet %>%
  layer_dense(input_shape = 784, units = XX, activation = "XX") %>%
  layer_dense(units = XX, activation = "XX") %>%
  layer_dense(units = XX, activation = "XX")

# model information
summary(deepnet)
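One way to fill in the template (the hidden-layer sizes are your choice; 1,500 'relu' units per hidden layer are used here purely for illustration):

# initialize deepnet (filled in; hidden-layer sizes are illustrative)
deepnet <- keras_model_sequential()

# add layers
deepnet %>%
  layer_dense(input_shape = 784, units = 1500, activation = "relu") %>%
  layer_dense(units = 1500, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")

# model information
summary(deepnet)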
How many parameters are there now in the network?
A whole lot more parameters. There are more than 300 times as
many parameters as before. Let’s see what this network can achieve.
Compile and train the network using the exact same steps as before and
then evaluate the network on the test data. The only change in the code
you need to make is to replace net
with
deepnet
.
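For example, using the same values as before with deepnet in place of net:

# compile, fit, and evaluate deepnet
deepnet %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)
history <- deepnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)
deepnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)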
The test performance has improved by about 7 percentage points. Not bad. Also, the drop from fitting to testing performance is small, suggesting minimal overfitting of the data. Let's see how far we can take it and build a really deep network.
# initialize realdeepnet
realdeepnet <- keras_model_sequential()
# add layers
realdeepnet %>%
layer_dense(input_shape = 784, units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX")
# model information
summary(realdeepnet)
The number of parameters in the network only increased by a factor of 2, because the lion's share stems from the weights between the input and the first hidden layer, which remained unchanged. Nonetheless, there is reason to believe that this network fares differently. Try it out using the exact same steps as before.
The model did not fare a whole lot better. The fit performance increased a bit, but the predictive accuracy more or less remained constant. Fit the model again. To do this, you can simply run the fit() function again, which will continue training from where it ended before. After training has finished, evaluate the predictive accuracy again.
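Concretely, this amounts to re-running the same fit and evaluate calls, here sketched assuming you trained realdeepnet with batch_size = 32 and epochs = 10:

# continue training realdeepnet (weights are kept from the previous run)
history <- realdeepnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)

# evaluate again
realdeepnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)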
Not all too much is happening, though both fitting and prediction performance went up by another notch. This is again characteristic of neural networks. To really max out on performance, many epochs of training will often be necessary. However, at the same time, this risks giving the model the opportunity to overfit the data.
Build a new network called crazycomplexnet, which should have twice as many nodes in the first hidden layer as there are input nodes. Add as many hidden layers as you like, then run the network for 20 epochs with a batch size of 100 and evaluate the network's performance.

# initialize crazycomplexnet
crazycomplexnet <- keras_model_sequential()

# add layers
crazycomplexnet %>%
  layer_dense(input_shape = 784, units = 1568, activation = "relu") %>%
  layer_dense(units = XX, activation = "XX") %>%
  layer_dense(units = 10, activation = "softmax")

# model information
summary(crazycomplexnet)
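To train and evaluate it with the requested settings (optimizer and loss as before), the remaining steps could look like this:

# compile, fit, and evaluate crazycomplexnet
crazycomplexnet %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)
history <- crazycomplexnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 100,
  epochs = 20
)
crazycomplexnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)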
Finally, check whether adding layer_dropout(rate = .3) in between the layer_dense() layers helps preserve some more of the fitting performance for the test data. Layer dropout sets the activations of a random subset of nodes to zero, which momentarily eliminates the weights emanating from those nodes and, thus, constrains model flexibility.

# initialize crazycomplexnet
crazycomplexnet <- keras_model_sequential()

# add layers
crazycomplexnet %>%
  layer_dense(input_shape = 784, units = 1568, activation = "relu") %>%
  layer_dropout(rate = XX) %>%
  layer_dense(units = XX, activation = "XX") %>%
  layer_dropout(rate = XX) %>%
  layer_dense(units = 10, activation = "softmax")

# model information
summary(crazycomplexnet)