Automating Keras hyperparameter optimization with Talos
A Deep Learning model is NOT a black box. It requires tuning for good performance.
In the two previous posts I showed you my first steps with Keras. I used examples found on the internet and changed the dataset into something trivial, meaning I generated the data myself and knew the expected values. But I also told you that I had no idea why parameters like neurons, epochs and batch_size had the values they had.
So what we have is not really a black box: on the outside there are switches and screws that very much need our attention. In this post I use Talos, 'Hyperparameter Optimization for Keras, TensorFlow (tf.keras) and PyTorch', see links below, which is intended to automate the process of selecting the optimal parameters.
At the end of this post is the code so you can try it yourself.
Hyperparameters
Parameters like the number of neurons, epochs and batch_size are called hyperparameters, and tuning them is essential for good model performance. There are some interesting articles on the internet about how to tune them. You could call hyperparameter optimization the holy grail of neural networks; according to Merriam-Webster, a holy grail is 'an object or goal that is sought after for its great significance'. Unless you know what you are doing, it is easy to select suboptimal, or even totally wrong, values.
Loss functions
To perform optimization we need values that indicate how well our model performs. These values are calculated by loss functions, and we use different loss functions for regression and classification; see for example the article 'Overview of loss functions for Machine Learning' in the links below. A short sketch of how this looks in Keras follows after the lists.
Regression loss functions:
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Huber
- Log-Cosh
- Quantile
Classification loss functions:
- Binary Cross Entropy
- Multi-Class Cross Entropy
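In Keras you select the loss function when compiling the model. A minimal sketch, not taken from the example below, just to show where the loss function goes (the layer sizes are arbitrary):
from keras.models import Sequential
from keras.layers import Dense

# regression: mean squared error loss, mean absolute error as an extra metric
reg_model = Sequential([Dense(1, input_shape=(2,), activation='linear')])
reg_model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])

# binary classification: binary cross entropy loss
clf_model = Sequential([Dense(1, input_shape=(2,), activation='sigmoid')])
clf_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])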
Talos summary
I like the sentence in the article 'Hyperparameter Optimization with Keras', see links below:
Make no mistake; EVEN WHEN WE DO GET THE PERFORMANCE METRIC RIGHT (yes I'm yelling), we need to consider what happens in the process of optimizing a model.
With Talos we parameterize our model. The number of combinations of epochs, batch_size, etc. can be huge. Talos randomly picks a number of combinations and for each one builds and trains a new model with training and validation data. This can take minutes, but also hours or even days.
Once finished, we can use the scores, change parameter values and/or add parameters, and run again. In the meantime we can do other things like drinking coffee, talking to a friend or, even better, trying to learn more about Machine Learning optimization. Let's see how this works.
Example
I will run Talos for a very simple Neural Network model based on 'Keras 101: A simple (and interpretable) Neural Network model for House Pricing regression', see links below.
The dataset is generated with this function:
# define the target function for the dataset
def fx(x0, x1):
    y = x0 + 2*x1
    return y
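The full code at the end of this post builds the dataset by calling this function on a small grid of integer inputs:
import numpy as np

# generate the inputs and expected outputs
X_items = []
y_items = []
for x0 in range(0, 18, 3):
    for x1 in range(2, 27, 3):
        X_items.append([x0, x1])
        y_items.append(fx(x0, x1))
X = np.array(X_items).reshape((-1, 2))
y = np.array(y_items)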
The model:
model = Sequential()
# add layers
model.add(Dense(100, input_shape=(2,), activation='relu', name='layer_input'))
model.add(Dense(50, activation='relu', name='layer_hidden_1'))
model.add(Dense(1, activation='linear', name='layer_output'))
# compile
model.compile(optimizer='adam', loss='mse', metrics=['mean_absolute_error'])
And the fit function:
history = model.fit(X_train, y_train, epochs=100, validation_split=0.05)
Like I told you before, I have no idea why the parameters have these values. Ok, well a little bit.
Talos
To use this in Talos we replace the hardcoded parameters with variables that Talos can control. First we create a parameter dictionary with the values we want to vary. Do not start with all the parameters you want to change; before you know it, you will be waiting and waiting. I also started, for example, the batch_size parameter with two values that are far apart to get a first impression. With the parameters below there are 2 x 2 x 2 x 2 = 16 combinations; Talos did 16 runs and the total time on my PC (without GPU) was 42 seconds.
# parameters
p = dict(
    first_neuron=[24, 192],
    activation=['relu', 'elu'],
    epochs=[50, 200],
    batch_size=[8, 32]
)
Then we change the model for Talos:
model = Sequential()
# add layers
model.add(Dense(
    params['first_neuron'],
    input_shape=(2,),
    activation=params['activation'],
    name='layer_input')
)
model.add(Dense(
    50,
    activation=params['activation'],
    name='layer_hidden_1')
)
model.add(Dense(
    1,
    activation='linear',
    name='layer_output')
)
# compile
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mean_absolute_error'],
)
And the fit function becomes:
history = model.fit(
    x=x_train,
    y=y_train,
    validation_data=[x_val, y_val],
    epochs=params['epochs'],
    batch_size=params['batch_size'],
    verbose=0,
)
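Talos does not call this code directly; it calls a single function that receives the training data, the validation data and one parameter combination, and that must return both the history and the model. In the full code below this is the method model2scan; as a standalone sketch (assuming get_model_for_params wraps the model code shown above):
def model2scan(x_train, y_train, x_val, y_val, params):
    # build the model for this parameter combination
    model = get_model_for_params(params)
    # fit it and return (history, model), which is what Talos expects
    history = model.fit(
        x=x_train,
        y=y_train,
        validation_data=[x_val, y_val],
        epochs=params['epochs'],
        batch_size=params['batch_size'],
        verbose=0,
    )
    return history, model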
Run Talos and analyze
Time to run! After the run I print the full table of results.
# perform scan
so = talos.Scan(
    x=X,
    y=y,
    model=dlm.model2scan,
    params=p,
    experiment_name='model2scan',
    val_split=0.3,
)
print('scan details {}'.format(so.details))
print('analyze ...')
a = talos.Analyze(so)
a_table = a.table('val_loss', sort_by='val_loss', exclude=['start', 'end', 'duration'])
print('a_table = \n{}'.format(a_table))
Here is the table. Note that it is sorted by val_loss from worst to best, meaning that the last row is the winner:
activation epochs round_epochs first_neuron val_mean_absolute_error mean_absolute_error batch_size val_loss loss
4 relu 50 50 24 7.786795 10.516150 32 94.053894 143.701889
12 elu 50 50 24 5.021943 2.713378 32 30.090796 10.598367
13 elu 50 50 192 2.880891 2.392172 32 10.355382 7.829942
5 relu 50 50 192 2.306647 1.435794 32 8.680595 3.228125
8 elu 50 50 24 2.205534 1.599179 8 5.929257 3.529320
9 elu 50 50 192 1.564395 0.991934 8 3.059430 1.329003
1 relu 50 50 192 0.813473 0.418985 8 1.315141 0.284857
14 elu 200 200 24 0.934512 0.557581 32 1.210560 0.448432
0 relu 50 50 24 0.680375 0.463401 8 0.818936 0.343270
15 elu 200 200 192 0.773358 0.466824 32 0.776512 0.313476
6 relu 200 200 24 0.510728 0.256091 32 0.515720 0.105524
7 relu 200 200 192 0.473588 0.219744 32 0.471352 0.076440
2 relu 200 200 24 0.562183 0.213781 8 0.431688 0.075045
3 relu 200 200 192 0.261547 0.062857 8 0.179677 0.006133
10 elu 200 200 24 0.327835 0.207104 8 0.140866 0.063864
11 elu 200 200 192 0.218479 0.116624 8 0.071611 0.028682
Looking at the data we see that epochs=200 gives much better results than epochs=50; we can try 100 to see if that is also good. batch_size=8 also gives better results than batch_size=32; we can try 16. We can also try other values for first_neuron. The new parameters for Talos:
p = dict(
    first_neuron=[128, 192],
    activation=['relu', 'elu'],
    epochs=[100, 200],
    batch_size=[8, 16]
)
Let's rerun with these values. The result:
batch_size activation epochs val_mean_absolute_error val_loss loss round_epochs first_neuron mean_absolute_error
13 16 elu 100 0.884498 1.053809 0.726826 100 192 0.706334
12 16 elu 100 0.779756 0.834097 0.669265 100 128 0.696979
4 16 relu 100 0.415210 0.235131 0.124713 100 128 0.291552
9 8 elu 100 0.395605 0.204896 0.157592 100 192 0.321956
5 16 relu 100 0.290187 0.109819 0.064319 100 192 0.211642
15 16 elu 200 0.234876 0.108920 0.070559 200 192 0.220270
14 16 elu 200 0.274542 0.107709 0.075080 200 128 0.216383
0 8 relu 100 0.280294 0.104701 0.049495 100 128 0.188951
1 8 relu 100 0.268457 0.100130 0.041598 100 192 0.160658
6 16 relu 200 0.228168 0.079410 0.033531 200 128 0.138397
11 8 elu 200 0.175497 0.070203 0.041988 200 192 0.154247
10 8 elu 200 0.150712 0.057377 0.021882 200 128 0.108372
8 8 elu 100 0.205943 0.055572 0.048045 100 128 0.187398
2 8 relu 200 0.182463 0.046856 0.018731 200 128 0.096890
7 16 relu 200 0.135524 0.025975 0.010142 200 192 0.073783
3 8 relu 200 0.078327 0.009304 0.004181 200 192 0.042721
The best results improved, but not by much. epochs=200 is still the best, as is batch_size=8. The final parameters are:
p = dict(
    first_neuron=192,
    activation='relu',
    epochs=200,
    batch_size=8,
)
Full code
Below is the code in case you want to try it yourself. Set run_optimizer = True to run Talos. The dataset is split into training data and test data, and the training data is split again into training data and validation data.
After running the optimizer you can plug the new parameter values into the model, generate graphs, run evaluation against test data and run some predictions.
# optimizing keras hyperparameters with talos
from keras.models import Sequential, load_model
from keras.layers import Dense
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
import talos
# use plotly or pyplot
from matplotlib import pyplot
# your input: select talos optimizer or normal operation
run_optimizer = False
#run_optimizer = True
# your input: train model or use saved model
use_saved_model = False
#use_saved_model = True
# create dataset
def fx(x0, x1):
    y = x0 + 2*x1
    return y

X_items = []
y_items = []
for x0 in range(0, 18, 3):
    for x1 in range(2, 27, 3):
        y = fx(x0, x1)
        X_items.append([x0, x1])
        y_items.append(y)
X = np.array(X_items).reshape((-1, 2))
y = np.array(y_items)
print('X = {}'.format(X))
print('y = {}'.format(y))
X_data_shape = X.shape
print('X_data_shape = {}'.format(X_data_shape))
class DLM:

    def __init__(
        self,
        model_name='my_model',
    ):
        self.model_name = model_name
        self.layer_input_shape = (2,)
        # your input: final model parameters
        self.params = dict(
            # layers
            first_neuron=192,
            activation='relu',
            # compile
            # fit
            val_split=0.3,
            epochs=200,
            batch_size=8,
            verbose=0,
        )
    def data_split_train_test(
        self,
        X,
        y,
    ):
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, test_size=0.3, random_state=1)
        print('self.X_train = {}'.format(self.X_train))
        print('self.X_test = {}'.format(self.X_test))
        print('self.y_train = {}'.format(self.y_train))
        print('self.y_test = {}'.format(self.y_test))
        print('training data row count = {}'.format(len(self.y_train)))
        print('test data row count = {}'.format(len(self.y_test)))
        X_train_data_shape = self.X_train.shape
        print('X_train_data_shape = {}'.format(X_train_data_shape))
    def get_model(
        self,
    ):
        self.model = self.get_model_for_params(self.params)
        return self.model

    def get_model_for_params(self, params):
        model = Sequential()
        # add layers
        model.add(Dense(
            params['first_neuron'],
            input_shape=self.layer_input_shape,
            activation=params['activation'],
            name='layer_input')
        )
        model.add(Dense(
            50,
            activation=params['activation'],
            name='layer_hidden_1')
        )
        model.add(Dense(
            1,
            activation='linear',
            name='layer_output')
        )
        # compile
        model.compile(
            optimizer='adam',
            loss='mse',
            metrics=['mean_absolute_error'],
        )
        return model
    def model_summary(
        self,
        model,
    ):
        model.summary()
    def fit(
        self,
        model,
        plot=False,
    ):
        # split training data
        X_train, X_val, y_train, y_val = train_test_split(self.X_train, self.y_train, test_size=self.params['val_split'], random_state=1)
        print('X_train = {}'.format(X_train))
        print('y_train = {}'.format(y_train))
        print('X_val = {}'.format(X_val))
        print('y_val = {}'.format(y_val))
        print('training data row count = {}'.format(len(y_train)))
        print('validation data row count = {}'.format(len(y_val)))
        history = self.fit_model(model, X_train, y_train, X_val, y_val, self.params)
        if plot:
            fig = go.Figure()
            fig.add_trace(go.Scattergl(y=history.history['loss'], name='Train'))
            fig.add_trace(go.Scattergl(y=history.history['val_loss'], name='Valid'))
            fig.update_layout(height=500, width=700, xaxis_title='Epoch', yaxis_title='Loss')
            fig.show()
            fig = go.Figure()
            fig.add_trace(go.Scattergl(y=history.history['mean_absolute_error'], name='Train'))
            fig.add_trace(go.Scattergl(y=history.history['val_mean_absolute_error'], name='Valid'))
            fig.update_layout(height=500, width=700, xaxis_title='Epoch', yaxis_title='Mean Absolute Error')
            fig.show()
            pyplot.plot(history.history['loss'], label='Train')
            pyplot.plot(history.history['val_loss'], label='Valid')
            pyplot.legend()
            #pyplot.show()
            pyplot.savefig('ex_dl_loss.png')
        return history
    def fit_model(self, model, x_train, y_train, x_val, y_val, params):
        history = model.fit(
            x=x_train,
            y=y_train,
            validation_data=[x_val, y_val],
            epochs=params['epochs'],
            batch_size=params['batch_size'],
            verbose=0,
        )
        return history
    def model2scan(self, x_train, y_train, x_val, y_val, params):
        model = self.get_model_for_params(params)
        history = self.fit_model(model, x_train, y_train, x_val, y_val, params)
        return history, model
    def evaluate(
        self,
        model,
    ):
        score = model.evaluate(self.X_test, self.y_test)
        print('test data - loss = {}'.format(score[0]))
        print('test data - mean absolute error = {}'.format(score[1]))
        return score
    def predict(
        self,
        model,
        x0,
        x1,
        fx=None,
    ):
        x = np.array([[x0, x1]]).reshape((-1, 2))
        predictions = model.predict(x)
        expected = ''
        if fx is not None:
            expected = ', expected = {}'.format(fx(x0, x1))
        print('for x = {}, predictions = {}{}'.format(x, predictions, expected))
        return predictions
    def save_model(
        self,
        model,
    ):
        model.save(self.model_name)

    def load_saved_model(
        self,
    ):
        self.model = load_model(self.model_name)
        return self.model
dlm = DLM()
if not run_optimizer:
    # create & save, or use saved model
    if use_saved_model:
        model = dlm.load_saved_model()
    else:
        dlm.data_split_train_test(X, y)
        model = dlm.get_model()
        # remove plot=True for no plot
        dlm.fit(model, plot=True)
        dlm.evaluate(model)
        dlm.save_model(model)
    # predict
    dlm.predict(model, 4, 17, fx=fx)
    dlm.predict(model, 23, 79, fx=fx)
    dlm.predict(model, 40, 33, fx=fx)
    dlm.predict(model, 140, 68, fx=fx)
else:
    # talos
    # your input: parameters run 1
    p = dict(
        first_neuron=[24, 192],
        activation=['relu', 'elu'],
        epochs=[50, 200],
        batch_size=[8, 32]
    )
    # your input: parameters run 2 (change p2 to p)
    p2 = dict(
        first_neuron=[128, 192],
        activation=['relu', 'elu'],
        epochs=[100, 200],
        batch_size=[8, 16]
    )
    # perform scan
    so = talos.Scan(
        x=X,
        y=y,
        model=dlm.model2scan,
        params=p,
        experiment_name='model2scan',
        val_split=0.3,
    )
    print('scan details {}'.format(so.details))
    print('analyze ...')
    a = talos.Analyze(so)
    # dump table
    a_table = a.table('val_loss', sort_by='val_loss', exclude=['start', 'end', 'duration'])
    print('a_table = \n{}'.format(a_table))
Some thoughts
Are we really converging in the right direction? This is a very complex problem: we have an N-dimensional world with highs and lows everywhere, which means the starting point can be very important. In the end you should always do a run with as many parameters as possible. If you have a large dataset you should start by reducing it, for example by picking random samples.
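A minimal sketch of such subsampling, assuming the data is in NumPy arrays X and y as in the example above (the sample size of 1000 is arbitrary):
import numpy as np

# pick a random subset of the rows before running the optimizer
rng = np.random.default_rng(seed=1)
sample_size = min(1000, len(X))
idx = rng.choice(len(X), size=sample_size, replace=False)
X_sample = X[idx]
y_sample = y[idx]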
I hope it is clear that you need a very good understanding of what you are doing. A system that is optimized for speed (an expensive GPU) also helps a lot.
Summary
Talos is a very nice tool that does a lot of work for you. The documentation could be improved, but I am not complaining. It has many more features, but I did not try them all. I could not make my (multi-step) univariate LSTM examples work with it because of the more complex nature of the dataset; I must look into this more.
Can we optimize the optimizer? Of course. As a next step we could take the results table, analyze it automatically and derive a new selection of parameters for the next run.
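A rough sketch of that idea, assuming the results are available as a pandas DataFrame with the same columns as the tables above (the helper name next_params is hypothetical):
import pandas as pd

def next_params(results: pd.DataFrame) -> dict:
    # take the row with the lowest validation loss and build a new
    # parameter dictionary around it for the next run
    best = results.sort_values('val_loss').iloc[0]
    return dict(
        first_neuron=[int(best['first_neuron'])],
        activation=[best['activation']],
        epochs=[int(best['epochs'])],
        batch_size=[int(best['batch_size'])],
    )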
Links / credits
10 Hyperparameter optimization frameworks
https://towardsdatascience.com/10-hyperparameter-optimization-frameworks-8bc87bc8b7e3
5 Regression Loss Functions All Machine Learners Should Know
https://heartbeat.comet.ml/5-regression-loss-functions-all-machine-learners-should-know-4fb140e9d4b0
How do I choose the optimal batch size?
https://ai.stackexchange.com/questions/8560/how-do-i-choose-the-optimal-batch-size
How to Develop LSTM Models for Time Series Forecasting
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting
How to get reproducible results in keras
https://stackoverflow.com/questions/32419510/how-to-get-reproducible-results-in-keras
How to Manually Optimize Machine Learning Model Hyperparameters
https://machinelearningmastery.com/manually-optimize-hyperparameters
How to tune the number of epochs and batch_size in Keras-tuner?
https://kegui.medium.com/how-to-tune-the-number-of-epochs-and-batch-size-in-keras-tuner-c2ab2d40878d
Hyperparameter Optimization for Keras, TensorFlow (tf.keras) and PyTorch
https://github.com/autonomio/talos
Hyperparameter Optimization with Keras
https://towardsdatascience.com/hyperparameter-optimization-with-keras-b82e6364ca53
Keras 101: A simple (and interpretable) Neural Network model for House Pricing regression
https://towardsdatascience.com/keras-101-a-simple-and-interpretable-neural-network-model-for-house-pricing-regression-31b1a77f05ae
Overview of loss functions for Machine Learning
https://medium.com/analytics-vidhya/overview-of-loss-functions-for-machine-learning-61829095fa8a
What is batch size in neural network?
https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network