The manual search for the optimal hyperparameter values of a machine learning model is a tedious and time-consuming process. KerasTuner is a framework that automates hyperparameter tuning and is the focus of today’s post.
To illustrate how to use the KerasTuner library, I have tuned the hyperparameters of two image classifiers for the PneumoniaMNIST dataset. The first model, referred to as the simple model, was developed from scratch as a series of convolutional layers, while the second one applies transfer learning to a pre-trained InceptionV3 model (Szegedy et al., 2016), adding a custom classification layer for pneumonia detection.
The hyperparameters of the simple model are the following:
- the learning rate;
- the number of convolutional layers;
- the number of filters of the first convolutional layer;
- the dropout rate;
- the number of neurons of the dense layer.
The pre-trained model, instead, has three hyperparameters to be optimized:
- the learning rate;
- the dropout rate;
- the number of neurons of the dense layer.
The repository containing the code and the trained models is available here. I have developed the codebase using Python 3.12, TensorFlow 2.16.1 and KerasTuner 1.4.7. The structure of the repository is as follows (I may add new packages or modules for future posts):
repo/
├── config/
│ └── params.template.yaml
├── hyperparameter_tuning/
│ ├── pre_trained_model_best_hyperparameters.pkl
│ ├── pre_trained_model_history.txt
│ ├── simple_model_best_hyperparameters.pkl
│ └── simple_model_history.txt
├── images/
│ ├── healthy.png
│ └── pneumonia.png
├── src/
│ ├── __version__.py
│ ├── builder.py
│ ├── evaluator.py
│ ├── hyperparameter_tuner.py
│ ├── plotter.py
│ ├── preprocessor.py
│ ├── trainer.py
│ └── utils.py
├── training/
│ ├── pre_trained_model_history.pkl
│ ├── pre_trained_model.keras
│ ├── simple_model_model_history.pkl
│ └── simple_model.keras
├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
The config directory stores params.template.yaml, which lets you set the image size (for the pre-trained model), the epoch count, the batch size, and the model to be tuned. This file is version-controlled; to modify it without Git tracking the changes, copy the file and remove the .template part from the filename.
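The configuration is then read into a SimpleNamespace that the other modules consume as params. Below is a minimal sketch of such a loader, assuming a YAML file with keys like mode, image_size, epoch_count and batch_size; the actual loader lives in utils.py and may differ.
from types import SimpleNamespace

import yaml  # requires PyYAML


def load_params(path: str = "config/params.yaml") -> SimpleNamespace:
    # Hypothetical sketch: read the YAML file and expose its keys as attributes
    # (params.mode, params.image_size, params.epoch_count, params.batch_size, ...).
    with open(path, "r") as file:
        raw = yaml.safe_load(file)
    return SimpleNamespace(**raw)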
The sets of optimal hyperparameters and the tuning histories are stored in the hyperparameter_tuning directory, while the training histories and the trained models are available in the training folder.
The src folder contains the source code files. First, the optimal hyperparameter values are computed by running the hyperparameter_tuner.py module, and they are then used to train the model with the trainer.py module. Both hyperparameter tuning and model training leverage the training and validation datasets. Lastly, the evaluator.py module evaluates the model accuracy against the test set, i.e. a dataset not seen by the model during training. The training, validation and test datasets are generated by the preprocessor.py module. Let’s dive deeper into the details.
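Putting the modules together, the overall flow looks roughly like this. This is just a sketch: the exact entry points and function names of trainer.py and evaluator.py are not shown in this post, and load_params is the hypothetical config loader sketched above.
# Hypothetical end-to-end sketch of the workflow described above.
from preprocessor import compute_datasets
from hyperparameter_tuner import tune_model

params = load_params()  # hypothetical config loader, see the earlier sketch
training_dataset, validation_dataset, test_dataset = compute_datasets(params)
tune_model(training_dataset, validation_dataset, params)
# trainer.py then fits the chosen model with the best hyperparameters found,
# and evaluator.py reports its accuracy on test_dataset.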
PneumoniaMNIST is based on a dataset of 5,856 pediatric chest X-ray images and supports the binary classification of pneumonia versus normal. The source training set is split with a ratio of 9:1 into training and validation sets, while the source validation set is used as the test set. The source images are gray-scale, and their sizes are (384–2,916) × (127–2,713). The images are center-cropped with a window whose side length equals the short edge and then resized to 1 × 28 × 28 (Yang et al., 2023). The dataset can be downloaded either from Zenodo, a popular open repository developed under the European OpenAIRE program and operated by CERN, or from TensorFlow Datasets, a collection of datasets ready to use with TensorFlow or other Python machine learning frameworks, such as JAX. I have recently added PneumoniaMNIST to the TensorFlow Datasets library; for further details take a look at this pull request. In this repository, PneumoniaMNIST is downloaded from TensorFlow Datasets, which can be installed by running the command pip install tfds-nightly within your local virtual environment. As a side note, the requirements.txt file is provided to quickly and easily set up the virtual environment.
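As a quick sanity check, the dataset and its metadata can be inspected directly from TensorFlow Datasets; a minimal snippet, assuming tfds-nightly is installed:
import tensorflow_datasets as tfds

# Load the three splits together with the dataset metadata.
(training, validation, test), info = tfds.load(
    "pneumonia_mnist",
    split=["train", "val", "test"],
    as_supervised=True,
    with_info=True,
)
print(info.splits["train"].num_examples)  # size of the training split
print(info.features)                      # image and label specifications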
The compute_datasets function of the preprocessor.py module returns the training, validation and test datasets.
def compute_datasets(params: SimpleNamespace) -> Tuple[Dataset, Dataset, Dataset]:
"""
Preprocess data.
Parameters
----------
params : SimpleNamespace
Returns
-------
Tuple[Dataset, Dataset, Dataset]
"""
training_dataset, validation_dataset, test_dataset = tfds.load(
"pneumonia_mnist", split=["train", "val", "test"], as_supervised=True
)
if params.mode == 0:
preprocess_image_ = preprocess_image_simple_model
else:
image_size = [params.image_size, params.image_size]
preprocess_image_ = functools.partial(
preprocess_image_pre_trained_model, target_size=image_size
)
training_dataset = preprocess_dataset(
training_dataset, preprocess_image_, params.batch_size
)
training_dataset = augment_data(training_dataset)
validation_dataset = preprocess_dataset(
validation_dataset, preprocess_image_, params.batch_size
)
test_dataset = preprocess_dataset(
test_dataset, preprocess_image_, params.batch_size
)
return training_dataset, validation_dataset, test_dataset
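The preprocess_dataset helper called above is not reproduced in this post. A plausible sketch of what it does, under the assumption that it simply maps the per-image preprocessing function over the dataset and then batches it, is the following (the actual implementation in preprocessor.py may differ):
from typing import Callable

import tensorflow as tf


def preprocess_dataset(
    dataset: tf.data.Dataset, preprocess_image_: Callable, batch_size: int
) -> tf.data.Dataset:
    # Hypothetical sketch: apply the preprocessing to every (image, label) pair
    # in parallel, then batch and prefetch the data.
    dataset = dataset.map(preprocess_image_, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)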
The InceptionV3 model requires the images to have three input channels, and their width and height must be no smaller than 75 pixels. Thus, for tuning the pre-trained model, each image is resized to 150 × 150 pixels, to better leverage the network depth and complexity, and converted to RGB. Moreover, for both models the images are rescaled to the [0, 1] range. The preprocessing of each image is handled by the preprocess_image_simple_model and preprocess_image_pre_trained_model functions.
def preprocess_image_pre_trained_model(
image: tf.Tensor, label: tf.Tensor, target_size: Tuple[int, int]
) -> Tuple[tf.Tensor, tf.Tensor]:
"""
Preprocess image for the pre-trained model.
Parameters
----------
image : tf.Tensor
label : tf.Tensor
target_size : Tuple[int, int]
Returns
-------
Tuple[tf.Tensor, tf.Tensor]
"""
image /= 255
image = tf.image.resize(image, target_size)
image = tf.image.grayscale_to_rgb(image)
return image, label
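Its counterpart for the simple model is not reproduced in this post; based on the description above (no resizing or RGB conversion, only rescaling to [0, 1]), it presumably looks something along these lines:
from typing import Tuple

import tensorflow as tf


def preprocess_image_simple_model(
    image: tf.Tensor, label: tf.Tensor
) -> Tuple[tf.Tensor, tf.Tensor]:
    # Hypothetical sketch: rescale the 28 x 28 grayscale image to the [0, 1]
    # range; the simple model needs neither resizing nor RGB conversion.
    image = tf.cast(image, tf.float32) / 255.0
    return image, label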
The training dataset is augmented by applying random rotations, zooms, and horizontal flips to the images, as shown in the code snippet below. Refer to the TensorFlow 2.16.1 API documentation for further details.
def compute_augmentation_layers() -> Model:
"""
Compute augmentation layers.
Returns
-------
Model
"""
return tf.keras.Sequential(
[
layers.RandomFlip("horizontal"),
layers.RandomRotation((-0.1, 0.1)),
layers.RandomZoom((-0.1, 0.1)),
]
)
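The augment_data function called in compute_datasets is not shown in the post either. A minimal sketch, assuming it simply maps these augmentation layers over the (already batched) training dataset, could look like this:
import tensorflow as tf


def augment_data(dataset: tf.data.Dataset) -> tf.data.Dataset:
    # Hypothetical sketch: build the augmentation pipeline once and apply it
    # to every batch of training images, leaving the labels untouched.
    augmentation_layers = compute_augmentation_layers()
    return dataset.map(
        # training=True keeps the random transformations active.
        lambda image, label: (augmentation_layers(image, training=True), label),
        num_parallel_calls=tf.data.AUTOTUNE,
    )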
The builder.py module provides the build_simple_model and build_pre_trained_model functions for building the two models. Both functions require the hyperparameter set as input. The simple model is created by using the Keras sequential API, which groups a linear stack of layers. The Adam optimizer is adopted, i.e. an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments (Kingma & Ba, 2014).
def build_simple_model(hyper_parameters: kt.HyperParameters) -> Sequential:
"""
Build a simple parametric convolutional neural network.
Parameters
----------
hyper_parameters : kt.HyperParameters
Returns
-------
tf.keras.models.Sequential
"""
hyper_parameters_specs = SimpleModelHyperParametersConfig(hyper_parameters)
model = Sequential()
for i in range(hyper_parameters_specs.convolutional_layer_count):
filter_count = hyper_parameters_specs.convolutional_first_layer_filter_count
if i == 0:
model.add(
layers.Conv2D(
filter_count, (3, 3), activation="relu", input_shape=(28, 28, 1)
)
)
else:
model.add(layers.Conv2D(filter_count * i * 2, (3, 3), activation="relu"))
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Flatten())
model.add(
layers.Dense(
units=hyper_parameters_specs.dense_layer_neuron_count, activation="relu"
)
)
model.add(layers.Dropout(hyper_parameters_specs.dropout))
model.add(layers.Dense(1, activation="sigmoid"))
adam_optimizer = Adam(learning_rate=hyper_parameters_specs.learning_rate)
model.compile(
optimizer=adam_optimizer, loss="binary_crossentropy", metrics=["accuracy"]
)
return model
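As a side note, the builder can also be exercised outside of a tuner run by passing it a fresh kt.HyperParameters instance, in which case each Choice falls back to its first value. A quick sketch, not part of the repository:
import keras_tuner as kt

# An empty hyperparameter set: every Choice uses its default (first) value.
hyper_parameters = kt.HyperParameters()
model = build_simple_model(hyper_parameters)
model.summary()
print(hyper_parameters.values)  # the hyperparameter values that were actually used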
Regarding the pre-trained model, first an instance of the InceptionV3 model is created with its include_top parameter set to False, to exclude the fully-connected layer at the top, i.e. the output layer, which is replaced by the custom classification layer. The weights parameter is left at its default value, imagenet, i.e. the pre-trained weights based on the ImageNet dataset. The classification layer is coded using the Keras functional API, which enables building models with non-linear topology, shared layers, and multiple inputs or outputs. I have used the functional API just for educational purposes, as the sequential API would have been sufficient for modelling the output layers. Finally, the model is compiled, and I have chosen Root Mean Squared Propagation (RMSprop) as the optimizer.
def build_pre_trained_model(
hyper_parameters: kt.HyperParameters, image_size: int
) -> Model:
"""
Build a model starting from a pre-trained one. A parametric classifier is added.
Parameters
----------
hyper_parameters : kt.HyperParameters
image_size : int
Returns
-------
Model
"""
hyper_parameters_ = PreTrainedModelHyperParametersConfig(hyper_parameters)
pre_trained_model = InceptionV3(
input_shape=(image_size, image_size, 3), include_top=False
)
for layer in pre_trained_model.layers:
layer.trainable = False
last_layer = pre_trained_model.get_layer("mixed7")
last_output = last_layer.output
x = layers.Flatten()(last_output)
x = layers.Dense(hyper_parameters_.neuron_count, activation="relu")(x)
x = layers.Dropout(hyper_parameters_.dropout)(x)
x = layers.Dense(1, activation="sigmoid")(x)
model = Model(pre_trained_model.input, x)
model.compile(
optimizer=RMSprop(learning_rate=hyper_parameters_.learning_rate),
loss="binary_crossentropy",
metrics=["accuracy"],
)
return model
The SimpleModelHyperParametersConfig and PreTrainedModelHyperParametersConfig dataclasses store the possible values that the hyperparameters can take. Further details about how to define the hyperparameter space are available here.
@dataclass
class SimpleModelHyperParametersConfig:
"""
Keras tuner hyperparameters.
Parameters
----------
hyper_parameters : kt.HyperParameters
"""
hyper_parameters: kt.HyperParameters
learning_rate: kt.HyperParameters.Choice = field(init=False)
convolutional_layer_count: kt.HyperParameters.Choice = field(init=False)
convolutional_first_layer_filter_count: kt.HyperParameters.Choice = field(
init=False
)
dropout: kt.HyperParameters.Choice = field(init=False)
dense_layer_neuron_count: kt.HyperParameters.Choice = field(init=False)
def __post_init__(self) -> None:
self.learning_rate = self.hyper_parameters.Choice(
"learning_rate", values=[1e-4, 1e-3, 1e-2]
)
self.convolutional_layer_count = self.hyper_parameters.Choice(
"convolutional_layer_count", [1, 2, 3]
)
self.convolutional_first_layer_filter_count = self.hyper_parameters.Choice(
"convolutional_first_layer_filter_count", [16, 32, 48, 64]
)
self.dropout = self.hyper_parameters.Choice(
"dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
)
self.dense_layer_neuron_count = self.hyper_parameters.Choice(
"dense_layer_neuron_count", values=[256, 512, 1024]
)
@dataclass
class PreTrainedModelHyperParametersConfig:
"""
Keras tuner hyperparameters.
Parameters
----------
hyper_parameters : kt.HyperParameters
"""
hyper_parameters: kt.HyperParameters
learning_rate: kt.HyperParameters.Choice = field(init=False)
dropout: kt.HyperParameters.Choice = field(init=False)
neuron_count: kt.HyperParameters.Choice = field(init=False)
def __post_init__(self) -> None:
self.learning_rate = self.hyper_parameters.Choice(
"learning_rate", values=[1e-4, 1e-3, 1e-2]
)
self.dropout = self.hyper_parameters.Choice(
"dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
)
self.neuron_count = self.hyper_parameters.Choice(
"neuron_count", values=[256, 512, 1024]
)
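In this project only Choice hyperparameters are used, but KerasTuner also supports other ways of defining the search space, such as integer and float ranges. A generic illustration, not part of the repository:
import keras_tuner as kt

hp = kt.HyperParameters()
# An integer range with a step, e.g. for the number of neurons in a dense layer.
neuron_count = hp.Int("neuron_count", min_value=128, max_value=1024, step=128)
# A float range sampled on a log scale, well suited to learning rates.
learning_rate = hp.Float("learning_rate", min_value=1e-4, max_value=1e-2, sampling="log")
# A boolean flag, e.g. to toggle an optional dropout layer on and off.
use_dropout = hp.Boolean("use_dropout")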
The hyperparameter tuning is performed by the hyperparameter_tuner.py module, specifically by the tune_model function, which leverages the Hyperband algorithm, a bandit-based approach to hyperparameter optimization (Li et al., 2018). As explained here:
The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. Hyperband determines the number of models to train in a bracket by computing \(1 + \log_{\text{factor}}(\text{max\_epochs})\) and rounding it up to the nearest integer.
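As a quick worked example, with KerasTuner’s default factor of 3 and max_epochs set to 30, this gives \(1 + \log_{3}(30) \approx 4.1\), which rounds up to 5 models to train in a bracket.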
The optimal hyperparameters are then saved to a pickle file in the hyperparameter_tuning folder. Other tuning algorithms are available in KerasTuner, such as GridSearch and BayesianOptimization.
def tune_model(
training_dataset: Dataset, validation_dataset: Dataset, params: SimpleNamespace
) -> None:
"""
Tune the model.
Parameters
----------
training_dataset : Dataset
validation_dataset : Dataset
params : SimpleNamespace
"""
mode_to_model_builder_type = {
0: build_simple_model,
1: lambda hyper_parameters: build_pre_trained_model(
hyper_parameters, params.image_size
),
}
label = mode_to_label[params.mode]
tuner = kt.Hyperband(
mode_to_model_builder_type[params.mode],
objective="val_accuracy",
max_epochs=params.epoch_count,
project_name=label,
directory=r"..\hyperparameter_tuning",
)
tuner.search_space_summary(extended=True)
stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5)
callbacks = get_callbacks(label)
callbacks.append(stop_early)
tuner.search(
training_dataset,
validation_data=validation_dataset,
epochs=params.epoch_count,
callbacks=callbacks,
)
tuner.results_summary()
best_hyperparameters = tuner.get_best_hyperparameters()[0]
with open(
rf"..\hyperparameter_tuning\{label}_best_hyperparameters.pkl", "wb"
) as file:
pickle.dump(best_hyperparameters, file)
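The trainer.py module presumably reloads the pickled hyperparameters and rebuilds the model with them before training; a minimal sketch of that step (the actual code may differ):
import pickle

# Reload the best hyperparameters found by the tuner for the simple model.
with open(
    r"..\hyperparameter_tuning\simple_model_best_hyperparameters.pkl", "rb"
) as file:
    best_hyperparameters = pickle.load(file)

# Rebuild the model with the optimal values and train it for the full schedule.
model = build_simple_model(best_hyperparameters)
history = model.fit(
    training_dataset, validation_data=validation_dataset, epochs=params.epoch_count
)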
The optimal hyperparameter values discovered by the tuner for the simple model are the following:
and for the pre-trained model:
Finally, the simple and pre-trained models with optimal hyperparameters were tested on the test set (by using the evaluate method of the tf.keras.Model class), yielding an accuracy of 0.8686 and 0.9054, respectively.
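For reference, the evaluation step essentially boils down to loading a trained model and calling evaluate on the test dataset; a minimal sketch of what evaluator.py is likely doing (details may differ):
import tensorflow as tf

# Load a trained model from the training folder and score it on the test set.
model = tf.keras.models.load_model(r"..\training\simple_model.keras")
test_loss, test_accuracy = model.evaluate(test_dataset)
print(f"Test accuracy: {test_accuracy:.4f}")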
You should now have a clearer understanding of how to use KerasTuner for the hyperparameter tuning of a machine learning model. If you need help, feel free to contact me, for instance by opening an issue on GitHub.