Manually searching for the optimal hyperparameter values of a machine learning model is a tedious and time-consuming process. KerasTuner, the focus of today’s post, is a framework that automates hyperparameter tuning.
To illustrate how to use the KerasTuner library, I have developed a binary image classifier for the PneumoniaMNIST dataset. Instead of training a neural network from scratch, I have adopted the pre-trained InceptionV3 model (Szegedy et al., 2016) via transfer learning and trained a custom classification layer for pneumonia detection. The following hyperparameters are optimized using KerasTuner:
- the learning rate of the optimizer;
- the dropout rate of the dropout layer;
- the number of neurons of the dense classification layer.
The repository containing the code and the trained model is available here. I have developed the codebase using Python 3.12, TensorFlow 2.16.1 and KerasTuner 1.4.7. The structure of the repository is as follows:
repo/
├── config/
│ └── params.template.yaml
├── hyperparameter_tuning/
│ ├── best_hyperparameters.pkl
│ └── history.txt
├── images/
│ ├── healthy.png
│ └── pneumonia.png
├── src/
│ ├── __version__.py
│ ├── evaluator.py
│ ├── hyperparameter_tuner.py
│ ├── plotter.py
│ ├── preprocessor.py
│ ├── trainer.py
│ └── utils.py
├── training/
│ ├── history.pkl
│ └── model.keras
├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt
The config directory stores the params.template.yaml file, which allows you to set the image size (further details later on), the epoch count, and the batch size. This file is version-controlled; to modify it without triggering Git, copy the file and remove the .template string from the filename.
The set of optimal hyperparameters and the tuning history are stored in the hyperparameter_tuning directory, while the training history and the trained model are available in the training folder.
The src folder contains the source code files. First, the optimal hyperparameter values are computed by running hyperparameter_tuner.py, and then utilized to train the model with the trainer.py module. Both hyperparameter tuning and model training leverage the training and validation datasets. Lastly, the evaluator.py module evaluates the model accuracy against the test set, i.e. a dataset not seen by the model during training. The training, validation and test datasets are generated by the preprocessor.py module. Let’s dive deeper into the details.
PneumoniaMNIST is based on a dataset of 5,856 pediatric chest X-ray images and supports the binary classification of pneumonia versus normal cases. The source training set is split with a ratio of 9:1 into training and validation sets, while the source validation set is used as the test set. The source images are gray-scale, and their sizes fall within (384–2,916) × (127–2,713) pixels; each image is center-cropped with a window whose side length equals the short edge and then resized to 1 × 28 × 28 (Yang et al., 2023). The dataset can be downloaded either from Zenodo, a popular open repository developed under the European OpenAIRE program and operated by CERN, or from TensorFlow Datasets, a collection of datasets ready to use with TensorFlow or other Python machine learning frameworks, such as JAX. I have recently added PneumoniaMNIST to the TensorFlow Datasets library; for further details, take a look at this pull request. In this repository, PneumoniaMNIST is downloaded from TensorFlow Datasets, which can be installed by running the command pip install tfds-nightly
within your local virtual environment. As a side note, the requirements.txt file is provided to quickly and easily set up the virtual environment.
The compute_datasets function of the preprocessor.py module returns the training, validation and test datasets, and requires the image size and batch size as inputs.
# Imports used by the preprocessor.py snippets shown in this post
import functools
from typing import Tuple

import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.data import Dataset
from tensorflow.keras import layers
from tensorflow.keras.models import Model


def compute_datasets(
    image_size: int, batch_size: int
) -> Tuple[Dataset, Dataset, Dataset]:
"""
Preprocess data.
Parameters
----------
image_size : int
batch_size : int
Returns
-------
Tuple[Dataset, Dataset, Dataset]
"""
training_dataset, validation_dataset, test_dataset = tfds.load(
"pneumonia_mnist", split=["train", "val", "test"], as_supervised=True
)
    # Build the (height, width) target size expected by tf.image.resize
    target_size = [image_size, image_size]
    preprocess_image_ = functools.partial(preprocess_image, target_size=target_size)
training_dataset = preprocess_dataset(
training_dataset, preprocess_image_, batch_size
)
training_dataset = augment_data(training_dataset)
validation_dataset = preprocess_dataset(
validation_dataset, preprocess_image_, batch_size
)
test_dataset = preprocess_dataset(test_dataset, preprocess_image_, batch_size)
return training_dataset, validation_dataset, test_dataset
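The preprocess_dataset helper is also part of preprocessor.py but is not reproduced in the post. Assuming it simply maps the preprocessing function over the image–label pairs and then batches the result, a minimal sketch could look as follows (the Callable annotation and the prefetch step are my assumptions):

from typing import Callable


def preprocess_dataset(
    dataset: Dataset, preprocess_image_: Callable, batch_size: int
) -> Dataset:
    """Map the image preprocessing function, then batch and prefetch."""
    return (
        dataset.map(preprocess_image_, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )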
The InceptionV3 model requires the images to have three input channels, and their width and height should be no smaller than 75 pixels. Here each image is resized to 150 × 150 pixels to better leverage the network depth and complexity, and converted to RGB. Moreover, the pixel values are rescaled to the [0, 1] range. The preprocessing of each image is handled by the preprocess_image function.
def preprocess_image(
image: tf.Tensor, label: tf.Tensor, target_size: Tuple[int, int]
) -> Tuple[tf.Tensor, tf.Tensor]:
"""
Preprocess image.
Parameters
----------
image : tf.Tensor
label : tf.Tensor
target_size : Tuple[int, int]
Returns
-------
Tuple[tf.Tensor, tf.Tensor]
"""
    # Rescale the pixel values to the [0, 1] range
    image /= 255
    # Resize to the target spatial size
    image = tf.image.resize(image, target_size)
    # InceptionV3 expects three input channels
    image = tf.image.grayscale_to_rgb(image)
    return image, label
The training dataset is augmented by applying random rotations, zooms, and horizontal flips to the images, as shown in the code snippet below. Refer to the TensorFlow 2.16.1 API documentation for further details.
def compute_augmentation_layers() -> Model:
"""
Compute augmentation layers.
Returns
-------
Model
"""
    return tf.keras.Sequential(
        [
            # Randomly flip images horizontally
            layers.RandomFlip("horizontal"),
            # Rotate by a random factor in [-0.1, 0.1] of a full circle (2π)
            layers.RandomRotation((-0.1, 0.1)),
            # Zoom in or out by up to 10%
            layers.RandomZoom((-0.1, 0.1)),
        ]
    )
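Similarly, augment_data is not shown in the post. Assuming it applies the augmentation layers above to each batch of the training set, it might look like this sketch:

def augment_data(dataset: Dataset) -> Dataset:
    """Apply the random augmentation layers to each training batch."""
    augmentation = compute_augmentation_layers()
    return dataset.map(
        # training=True keeps the random transformations active
        lambda images, labels: (augmentation(images, training=True), labels),
        num_parallel_calls=tf.data.AUTOTUNE,
    )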
The hyperparameter_tuner.py module provides the build_model function, which returns the model to train and requires the hyperparameter set and image size as inputs. First, an instance of the InceptionV3 model is created, whose include_top parameter is set to False to exclude the fully-connected layer at the top, i.e. the output layer, which is replaced by the custom classification layer. The weights parameter is left at its default value, imagenet, i.e. the pre-trained weights based on the ImageNet dataset. Finally, the model is compiled; I have chosen Root Mean Squared Propagation (RMSprop) as the optimizer.
def build_model(hyper_parameters: kt.HyperParameters, image_size: int) -> Model:
"""
Build a model starting from a pre-trained one. A parametric classifier is added.
Parameters
----------
hyper_parameters : kt.HyperParameters
image_size : int
Returns
-------
Model
"""
    hyper_parameters_ = HyperParametersConfig(hyper_parameters)
    # Load InceptionV3 with the pre-trained ImageNet weights (the default) and
    # without the top fully-connected layer
    pre_trained_model = InceptionV3(
        input_shape=(image_size, image_size, 3), include_top=False
    )
    # Freeze the pre-trained layers so that only the custom classifier is trained
    for layer in pre_trained_model.layers:
        layer.trainable = False
    # Cut the network at the mixed7 layer and attach the custom classifier
    last_layer = pre_trained_model.get_layer("mixed7")
    last_output = last_layer.output
    x = layers.Flatten()(last_output)
    x = layers.Dense(hyper_parameters_.neuron_count, activation="relu")(x)
    x = layers.Dropout(hyper_parameters_.dropout)(x)
    # Single sigmoid unit for binary classification
    x = layers.Dense(1, activation="sigmoid")(x)
    model = Model(pre_trained_model.input, x)
model.compile(
optimizer=RMSprop(learning_rate=hyper_parameters_.learning_rate),
loss="binary_crossentropy",
metrics=["accuracy"],
)
return model
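As a quick sanity check, build_model can also be exercised outside of a tuning run by passing a fresh HyperParameters instance. A hypothetical snippet, using the 150-pixel image size adopted in this post:

import keras_tuner as kt

# Hypothetical smoke test: build and inspect the model once before launching
# a full hyperparameter search
model = build_model(kt.HyperParameters(), image_size=150)
model.summary()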
The hyperparameter tuning is performed by the tune_model function, which leverages the Hyperband algorithm, a bandit-based approach to hyperparameter optimization (Li et al., 2018). As explained here:
The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. Hyperband determines the number of models to train in a bracket by computing \(1 + \log_{\text{factor}}(\text{max\_epochs})\) and rounding it up to the nearest integer.
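For instance, with the default factor of 3 and max_epochs = 27, this evaluates to \(\lceil 1 + \log_{3}(27) \rceil = 4\).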
The optimal hyperparameters are then saved to a pickle file in the hyperparameter_tuning folder. Other tuning algorithms are available in KerasTuner, such as GridSearch and BayesianOptimization.
def tune_model(
training_dataset: Dataset,
validation_dataset: Dataset,
epoch_count: int,
image_size: int,
) -> None:
"""
Tune the model.
Parameters
----------
training_dataset : Dataset
validation_dataset : Dataset
epoch_count : int
image_size : int
"""
    tuner = kt.Hyperband(
        # Bind the image size so that the tuner only passes the hyperparameters
        lambda hyper_parameters: build_model(hyper_parameters, image_size),
        objective="val_accuracy",
        max_epochs=epoch_count,
        project_name="pneumoniamnist",
    )
    tuner.search_space_summary(extended=True)
    # Stop a trial early if the validation loss does not improve for 5 epochs
    stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
    # tensorboard_callback and model_checkpoint_callback are defined elsewhere
    # in the module
    callbacks = [tensorboard_callback, model_checkpoint_callback, stop_early]
    tuner.search(
        training_dataset,
        validation_data=validation_dataset,
        epochs=epoch_count,
        callbacks=callbacks,
    )
    tuner.results_summary()
    # Persist the best hyperparameter set for the training step
    best_hyperparameters = tuner.get_best_hyperparameters()[0]
    with open(r"..\hyperparameter_tuning\best_hyperparameters.pkl", "wb") as file:
        pickle.dump(best_hyperparameters, file)
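The trainer.py module then consumes this pickle. Assuming it simply reloads the hyperparameters and rebuilds the model before training, the relevant step might look like this sketch:

import pickle

# Illustrative sketch: reload the tuned hyperparameters and rebuild the model
# (the path mirrors the one used in tune_model)
with open(r"..\hyperparameter_tuning\best_hyperparameters.pkl", "rb") as file:
    best_hyperparameters = pickle.load(file)
model = build_model(best_hyperparameters, image_size=150)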
The HyperParametersConfig dataclass stores the possible values that each hyperparameter can take. Further details about how to define the hyperparameter search space are available here.
@dataclass
class HyperParametersConfig:
"""
Keras tuner hyperparameters.
Parameters
----------
hyper_parameters : kt.HyperParameters
"""
hyper_parameters: kt.HyperParameters
    learning_rate: kt.HyperParameters.Choice = field(init=False)
    dropout: kt.HyperParameters.Choice = field(init=False)
    neuron_count: kt.HyperParameters.Choice = field(init=False)
def __post_init__(self) -> None:
self.learning_rate = self.hyper_parameters.Choice(
"learning_rate", values=[1e-4, 1e-3, 1e-2]
)
self.dropout = self.hyper_parameters.Choice(
"dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
)
self.neuron_count = self.hyper_parameters.Choice(
"neuron_count", values=[128, 256, 512, 1024]
)
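Note that instantiating the dataclass is what registers the search space: each hyper_parameters.Choice call declares the allowed values and returns the value sampled for the current trial. Outside of a search, a fresh HyperParameters object returns the defaults, i.e. the first entry of each values list, as this illustrative check shows:

import keras_tuner as kt

# Illustrative check: outside of a search, each Choice returns its default,
# i.e. the first entry of its values list
config = HyperParametersConfig(kt.HyperParameters())
print(config.learning_rate, config.dropout, config.neuron_count)
# 0.0001 0.1 128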
The tuner performed 30 iterations and the optimal hyperparameter values, listed below, were discovered at trial #26:
Finally, the model with the optimal hyperparameters was tested on the test set (by using the evaluate method of the tf.keras.Model class), yielding an accuracy of 0.9199.
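For reference, the evaluation step boils down to a call along these lines (a sketch assuming the model saved by the training step and the test dataset returned by compute_datasets):

import tensorflow as tf

# Illustrative sketch: load the trained model and measure its accuracy on the
# held-out test set
model = tf.keras.models.load_model(r"..\training\model.keras")
loss, accuracy = model.evaluate(test_dataset)
print(f"Test accuracy: {accuracy:.4f}")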
You should now have a clearer understanding of how to use KerasTuner to tune the hyperparameters of a machine learning model. If you need help, feel free to contact me, for instance by opening an issue on GitHub.