KerasTuner: Machine Learning Hyperparameter Optimization

Manually searching for the optimal hyperparameter values of a machine learning model is a tedious and time-consuming process. KerasTuner is a framework that automates hyperparameter tuning, and it is the focus of today’s post.

To illustrate how to use the KerasTuner library, I have tuned the hyperparameters of two image classifiers for the PneumoniaMNIST dataset. The first model, referred to as the simple model, was developed from scratch and consists of a series of convolutional layers. The second one leverages a pre-trained InceptionV3 model (Szegedy et al., 2016) via transfer learning, with a custom classification layer for pneumonia detection.

The hyperparameters of the simple model are the following:

Number of convolutional layers
Number of filters of the first convolutional layer
Number of neurons in the dense layer
Dropout rate
Learning rate

The pre-trained model, by contrast, has three hyperparameters to optimize:

Number of neurons in the dense layer
Dropout rate
Learning rate

The repository containing the code and the trained models is available here. I have developed the codebase using Python 3.12, TensorFlow 2.16.1 and KerasTuner 1.4.7. The structure of the repository is as follows (I may add new packages or modules for future posts):

repo/
├── config/
│   └── params.template.yaml
├── hyperparameter_tuning/
│   ├── pre_trained_model_best_hyperparameters.pkl
│   ├── pre_trained_model_history.txt
│   ├── simple_model_best_hyperparameters.pkl
│   └── simple_model_history.txt
├── images/
│   ├── healthy.png
│   └── pneumonia.png
├── src/
│   ├── __version__.py
│   ├── builder.py
│   ├── evaluator.py
│   ├── hyperparameter_tuner.py
│   ├── plotter.py
│   ├── preprocessor.py
│   ├── trainer.py
│   └── utils.py
├── training/
│   ├── pre_trained_model_history.pkl
│   ├── pre_trained_model.keras
│   ├── simple_model_model_history.pkl
│   └── simple_model.keras
├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt

The config directory stores params.template.yaml, which sets the image size (for the pre-trained model), the epoch count, the batch size, and the model to be tuned. This file is version-controlled; to change the settings without Git tracking the modification, copy the file and remove the .template string from the name of the copy.
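The copy step can be scripted. Below is a minimal sketch, assuming the untracked copy lives next to the template and is named params.yaml; the helper name copy_template is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def copy_template(template_path: Path) -> Path:
    """Copy e.g. params.template.yaml to params.yaml in the same directory.

    An existing copy is left untouched so that local edits are not overwritten.
    """
    # Drop the first ".template" occurrence from the file name only.
    target_name = template_path.name.replace(".template", "", 1)
    target = template_path.with_name(target_name)
    if not target.exists():
        shutil.copy(template_path, target)
    return target


# Example call (hypothetical location, relative to the repository root):
# copy_template(Path("config") / "params.template.yaml")
```

Skipping the copy when the target already exists is a design choice of this sketch: re-running it never discards settings you have already customized.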

The sets of optimal hyperparameters and tuning histories are stored in the hyperparameter_tuning directory, while the training histories and trained models are available in the training folder.

The src folder contains the source code files. First, the optimal hyperparameter values are computed by running hyperparameter_tuner.py; they are then used to train the model with the trainer.py module. Both hyperparameter tuning and model training rely on the training and validation datasets. Lastly, the evaluator.py module evaluates the model accuracy against the test set, i.e. a dataset not seen by the model during training. The training, validation and test datasets are generated by the preprocessor.py module. Let’s dive deeper into the details.

PneumoniaMNIST is based on a dataset of 5,856 pediatric chest X-ray images and supports binary classification of pneumonia against normal. The source training set is split with a ratio of 9:1 into training and validation sets, and the source validation set is used as the test set. The source images are gray-scale, and their sizes are (384–2,916) × (127–2,713). The images are center-cropped with a square window whose side equals the length of the short edge, and then resized to 1 × 28 × 28 (Yang et al., 2023).

The dataset can be downloaded either from Zenodo, a popular open repository developed under the European OpenAIRE program and operated by CERN, or from TensorFlow Datasets, a collection of datasets ready to use with TensorFlow or other Python machine learning frameworks, such as JAX. I have recently added PneumoniaMNIST to the TensorFlow Datasets library; for further details take a look at this pull request. In this repository, PneumoniaMNIST is downloaded from TensorFlow Datasets, which can be installed by running the command pip install tfds-nightly within your local virtual environment. As a side note, the requirements.txt file is provided to quickly and easily set up the virtual environment.
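To make the center-cropping step concrete, here is a minimal sketch that computes the crop window for an image of a given size. It is not the code used to build PneumoniaMNIST: the function name center_crop_box and the rounding convention (integer division when the margin is odd) are my assumptions.

```python
from typing import Tuple


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the centered square crop.

    The side of the square equals the short edge of the image.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

For a hypothetical 1000 × 600 image, the window is (200, 0, 800, 600), i.e. a 600 × 600 square that would then be resized to 28 × 28.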

The compute_datasets function of the preprocessor.py module returns the training, validation and test datasets.

def compute_datasets(params: SimpleNamespace) -> Tuple[Dataset, Dataset, Dataset]:
    """
    Preprocess data.

    Parameters
    ----------
    params : SimpleNamespace

    Returns
    -------
    Tuple[Dataset, Dataset, Dataset]

    """
    training_dataset, validation_dataset, test_dataset = tfds.load(
        "pneumonia_mnist", split=["train", "val", "test"], as_supervised=True
    )

    if params.mode == 0:
        preprocess_image_ = preprocess_image_simple_model
    else:
        image_size = [params.image_size, params.image_size]
        preprocess_image_ = functools.partial(
            preprocess_image_pre_trained_model, target_size=image_size
        )

    training_dataset = preprocess_dataset(
        training_dataset, preprocess_image_, params.batch_size
    )
    training_dataset = augment_data(training_dataset)
    validation_dataset = preprocess_dataset(
        validation_dataset, preprocess_image_, params.batch_size
    )
    test_dataset = preprocess_dataset(
        test_dataset, preprocess_image_, params.batch_size
    )

    return training_dataset, validation_dataset, test_dataset

The InceptionV3 model requires the images to have three input channels, and their width and height must be no smaller than 75 pixels. Thus, for tuning the pre-trained model, each image is resized to 150 × 150 pixels to better leverage the network depth and complexity, and converted to RGB. Moreover, for both models the pixel values are rescaled to the [0, 1] range. The preprocessing of each image is handled by the preprocess_image_simple_model and preprocess_image_pre_trained_model functions.

def preprocess_image_pre_trained_model(
    image: tf.Tensor, label: tf.Tensor, target_size: Tuple[int, int]
) -> Tuple[tf.Tensor, tf.Tensor]:
    """
    Preprocess image for the pre-trained model.

    Parameters
    ----------
    image : tf.Tensor
    label : tf.Tensor
    target_size : Tuple[int, int]

    Returns
    -------
    Tuple[tf.Tensor, tf.Tensor]

    """
    image /= 255
    image = tf.image.resize(image, target_size)
    image = tf.image.grayscale_to_rgb(image)
    return image, label

The training dataset is augmented by applying random rotations, zooms, and horizontal flips to the images, as shown in the code snippet below. Refer to the TensorFlow 2.16.1 API documentation for further details.

def compute_augmentation_layers() -> Model:
    """
    Compute augmentation layers.

    Returns
    -------
    Model

    """
    return tf.keras.Sequential(
        [
            layers.RandomFlip("horizontal"),
            layers.RandomRotation((-0.1, 0.1)),
            layers.RandomZoom((-0.1, 0.1)),
        ]
    )
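According to the Keras documentation, the RandomRotation factor is expressed as a fraction of a full turn (2π). The small sketch below converts the factor used above into degrees; the helper name rotation_range_degrees is hypothetical and only serves as an illustration.

```python
from typing import Tuple


def rotation_range_degrees(factor: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a RandomRotation factor (fraction of a full turn) to degrees."""
    lower, upper = factor
    return lower * 360.0, upper * 360.0
```

With the factor (-0.1, 0.1), each training image is therefore rotated by a random angle of up to about 36 degrees in either direction. Similarly, the RandomZoom factor (-0.1, 0.1) zooms in or out by up to 10%.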

The builder.py module provides the build_simple_model and build_pre_trained_model functions for building the two models. Both functions require the hyperparameter set as input. The simple model is created with the Keras sequential API, which groups a linear stack of layers into a model. The Adam optimizer is adopted, i.e. an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments (Kingma & Ba, 2014).

def build_simple_model(hyper_parameters: kt.HyperParameters) -> Sequential:
    """
    Build a simple parametric convolutional neural network.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters

    Returns
    -------
    tf.keras.models.Sequential

    """
    hyper_parameters_specs = SimpleModelHyperParametersConfig(hyper_parameters)
    model = Sequential()

    for i in range(hyper_parameters_specs.convolutional_layer_count):
        filter_count = hyper_parameters_specs.convolutional_first_layer_filter_count
        if i == 0:
            model.add(
                layers.Conv2D(
                    filter_count, (3, 3), activation="relu", input_shape=(28, 28, 1)
                )
            )
        else:
            model.add(layers.Conv2D(filter_count * i * 2, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D(2, 2))

    model.add(layers.Flatten())
    model.add(
        layers.Dense(
            units=hyper_parameters_specs.dense_layer_neuron_count, activation="relu"
        )
    )
    model.add(layers.Dropout(hyper_parameters_specs.dropout))
    model.add(layers.Dense(1, activation="sigmoid"))

    adam_optimizer = Adam(learning_rate=hyper_parameters_specs.learning_rate)
    model.compile(
        optimizer=adam_optimizer, loss="binary_crossentropy", metrics=["accuracy"]
    )
    return model
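To see how the hyperparameters shape the network, the sketch below reproduces the filter rule of build_simple_model and tracks the feature map size after each Conv2D and MaxPooling2D pair. It assumes the Keras defaults used above (3 × 3 convolutions with "valid" padding and 2 × 2 pooling with stride 2); the function name simple_model_feature_maps is hypothetical.

```python
from typing import List, Tuple


def simple_model_feature_maps(
    layer_count: int, first_filter_count: int, input_size: int = 28
) -> List[Tuple[int, int, int]]:
    """Return (height, width, channels) after each Conv2D + MaxPooling2D pair."""
    shapes = []
    size = input_size
    for i in range(layer_count):
        # Same rule as build_simple_model: the first layer uses the base count,
        # the following ones use base * i * 2.
        filters = first_filter_count if i == 0 else first_filter_count * i * 2
        size -= 2  # 3x3 convolution with "valid" padding
        size //= 2  # 2x2 max pooling with stride 2
        shapes.append((size, size, filters))
    return shapes
```

With three convolutional layers and 16 filters in the first one, the feature maps are (13, 13, 16), (5, 5, 32) and (1, 1, 64), so the Flatten layer receives 64 values. This also shows why the layer count cannot go beyond three for a 28 × 28 input: a fourth 3 × 3 convolution would have nothing left to convolve.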

Regarding the pre-trained model, first an instance of the InceptionV3 model is created with the include_top parameter set to False, which excludes the fully-connected layer at the top, i.e. the output layer, so that it can be replaced by the custom classification layer. The weights parameter is left at its default value imagenet, i.e. the pre-trained weights based on the ImageNet dataset. All pre-trained layers are frozen, and the classifier is attached to the output of the mixed7 layer. The classification layer is coded using the Keras functional API, which enables building models with non-linear topology, shared layers, and multiple inputs or outputs. I have used the functional API just for educational purposes, as the sequential API would have been sufficient for modelling the output layers. Finally, the model is compiled, and I have chosen Root Mean Square Propagation (RMSprop) as the optimizer.

def build_pre_trained_model(
    hyper_parameters: kt.HyperParameters, image_size: int
) -> Model:
    """
    Build a model starting from a pre-trained one. A parametric classifier is added.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters
    image_size : int

    Returns
    -------
    Model

    """
    hyper_parameters_ = PreTrainedModelHyperParametersConfig(hyper_parameters)

    pre_trained_model = InceptionV3(
        input_shape=(image_size, image_size, 3), include_top=False
    )
    for layer in pre_trained_model.layers:
        layer.trainable = False

    last_layer = pre_trained_model.get_layer("mixed7")
    last_output = last_layer.output

    x = layers.Flatten()(last_output)
    x = layers.Dense(hyper_parameters_.neuron_count, activation="relu")(x)
    x = layers.Dropout(hyper_parameters_.dropout)(x)
    x = layers.Dense(1, activation="sigmoid")(x)

    model = Model(pre_trained_model.input, x)
    model.compile(
        optimizer=RMSprop(learning_rate=hyper_parameters_.learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

The SimpleModelHyperParametersConfig and PreTrainedModelHyperParametersConfig dataclasses store the possible values that the hyperparameters can take. Further details about how to define the hyperparameter space are available here.

@dataclass
class SimpleModelHyperParametersConfig:
    """
    Keras tuner hyperparameters.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters

    """

    hyper_parameters: kt.HyperParameters
    learning_rate: kt.HyperParameters.Choice = field(init=False)
    convolutional_layer_count: kt.HyperParameters.Choice = field(init=False)
    convolutional_first_layer_filter_count: kt.HyperParameters.Choice = field(
        init=False
    )
    dropout: kt.HyperParameters.Choice = field(init=False)
    dense_layer_neuron_count: kt.HyperParameters.Choice = field(init=False)

    def __post_init__(self) -> None:
        self.learning_rate = self.hyper_parameters.Choice(
            "learning_rate", values=[1e-4, 1e-3, 1e-2]
        )
        self.convolutional_layer_count = self.hyper_parameters.Choice(
            "convolutional_layer_count", [1, 2, 3]
        )
        self.convolutional_first_layer_filter_count = self.hyper_parameters.Choice(
            "convolutional_first_layer_filter_count", [16, 32, 48, 64]
        )
        self.dropout = self.hyper_parameters.Choice(
            "dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
        )
        self.dense_layer_neuron_count = self.hyper_parameters.Choice(
            "dense_layer_neuron_count", values=[256, 512, 1024]
        )

@dataclass
class PreTrainedModelHyperParametersConfig:
    """
    Keras tuner hyperparameters.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters

    """

    hyper_parameters: kt.HyperParameters
    learning_rate: kt.HyperParameters.Choice = field(init=False)
    dropout: kt.HyperParameters.Choice = field(init=False)
    neuron_count: kt.HyperParameters.Choice = field(init=False)

    def __post_init__(self) -> None:
        self.learning_rate = self.hyper_parameters.Choice(
            "learning_rate", values=[1e-4, 1e-3, 1e-2]
        )
        self.dropout = self.hyper_parameters.Choice(
            "dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
        )
        self.neuron_count = self.hyper_parameters.Choice(
            "neuron_count", values=[256, 512, 1024]
        )
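Since every hyperparameter is a discrete choice, the size of each search space is simply the product of the number of values per hyperparameter. The sketch below counts the combinations; the dictionaries mirror the values of the two dataclasses above, while the names SIMPLE_MODEL_SPACE, PRE_TRAINED_MODEL_SPACE and combination_count are hypothetical.

```python
import math
from typing import Dict, List

SIMPLE_MODEL_SPACE: Dict[str, List[float]] = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "convolutional_layer_count": [1, 2, 3],
    "convolutional_first_layer_filter_count": [16, 32, 48, 64],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "dense_layer_neuron_count": [256, 512, 1024],
}

PRE_TRAINED_MODEL_SPACE: Dict[str, List[float]] = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "neuron_count": [256, 512, 1024],
}


def combination_count(space: Dict[str, List[float]]) -> int:
    """Number of distinct hyperparameter sets in a purely discrete space."""
    return math.prod(len(values) for values in space.values())
```

The simple model has 540 possible configurations and the pre-trained model has 45, which gives an idea of why an exhaustive search quickly becomes expensive.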

The hyperparameter tuning is performed by the hyperparameter_tuner.py module, specifically by the tune_model function, which leverages the Hyperband algorithm, a bandit-based approach to hyperparameter optimization (Li et al., 2018). As explained here:

The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. Hyperband determines the number of models to train in a bracket by computing \(1 + \log_{\text{factor}}(\text{max\_epochs})\) and rounding it up to the nearest integer.
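The formula in the quote can be evaluated as in the minimal sketch below. It only mirrors the quoted expression and is not KerasTuner's internal implementation, which may handle the bookkeeping differently. The function name hyperband_bracket_count is hypothetical, and factor = 3 is assumed because it is the default of kt.Hyperband. Integer arithmetic is used to avoid floating-point surprises when max_epochs is an exact power of the factor.

```python
def hyperband_bracket_count(max_epochs: int, factor: int = 3) -> int:
    """Return 1 + log_factor(max_epochs), rounded up to the nearest integer."""
    if max_epochs < 1 or factor < 2:
        raise ValueError("max_epochs must be >= 1 and factor must be >= 2")
    # Find the smallest exponent such that factor ** exponent >= max_epochs,
    # which equals ceil(log_factor(max_epochs)).
    exponent = 0
    power = 1
    while power < max_epochs:
        power *= factor
        exponent += 1
    return 1 + exponent
```

For instance, with max_epochs = 10 and factor = 3, the expression is 1 + log_3(10) ≈ 3.1, which gives 4 after rounding up.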

The optimal hyperparameters are then saved to a pickle file in the hyperparameter_tuning folder. Other tuning algorithms are available in KerasTuner, such as GridSearch and BayesianOptimization.

def tune_model(
    training_dataset: Dataset, validation_dataset: Dataset, params: SimpleNamespace
) -> None:
    """
    Tune the model.

    Parameters
    ----------
    training_dataset : Dataset
    validation_dataset : Dataset
    params : SimpleNamespace

    """
    mode_to_model_builder_type = {
        0: build_simple_model,
        1: lambda hyper_parameters: build_pre_trained_model(
            hyper_parameters, params.image_size
        ),
    }
    label = mode_to_label[params.mode]

    tuner = kt.Hyperband(
        mode_to_model_builder_type[params.mode],
        objective="val_accuracy",
        max_epochs=params.epoch_count,
        project_name=label,
        # Portable across Windows and POSIX; assumes "import os" at module level.
        directory=os.path.join("..", "hyperparameter_tuning"),
    )
    tuner.search_space_summary(extended=True)

    stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=5)
    callbacks = get_callbacks(label)
    callbacks.append(stop_early)

    tuner.search(
        training_dataset,
        validation_data=validation_dataset,
        epochs=params.epoch_count,
        callbacks=callbacks,
    )
    tuner.results_summary()

    best_hyperparameters = tuner.get_best_hyperparameters()[0]

    # Portable across Windows and POSIX; assumes "import os" at module level.
    with open(
        os.path.join("..", "hyperparameter_tuning", f"{label}_best_hyperparameters.pkl"),
        "wb",
    ) as file:
        pickle.dump(best_hyperparameters, file)
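The pickled hyperparameters can be read back before training. The sketch below is an assumption about how that might look, not the code of trainer.py; the function name load_best_hyperparameters is hypothetical. Keep in mind that pickle files should only be loaded from trusted sources.

```python
import pickle
from pathlib import Path
from typing import Any


def load_best_hyperparameters(directory: Path, label: str) -> Any:
    """Load the object pickled by the tuning step for the given model label."""
    path = directory / f"{label}_best_hyperparameters.pkl"
    with path.open("rb") as file:
        return pickle.load(file)


# Example call (hypothetical, run from the src folder):
# best = load_best_hyperparameters(Path("..") / "hyperparameter_tuning", "simple_model")
# print(best.values)  # a kt.HyperParameters object exposes its values as a dict
```

Unpickling the real file requires KerasTuner to be installed, because the stored object is a kt.HyperParameters instance.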

The optimal hyperparameter values discovered by the tuner for the simple model are the following:

Number of convolutional layers = 3
Number of filters of the first convolutional layer = 16
Number of neurons in the dense layer = 1024
Dropout rate = 0.1
Learning rate = 0.001

and for the pre-trained model:

Number of neurons in the dense layer = 256
Dropout rate = 0.1
Learning rate = 0.001

Finally, the simple and pre-trained models with optimal hyperparameters were evaluated on the test set (by using the evaluate method of the tf.keras.Model class), yielding accuracies of 0.8686 and 0.9054, respectively.
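For a single sigmoid output trained with binary cross-entropy, the Keras "accuracy" metric is, as far as I know, the binary accuracy with a 0.5 threshold. The sketch below shows what that metric computes; it is a plain Python illustration with a hypothetical function name, not the Keras implementation.

```python
from typing import Sequence


def binary_accuracy(
    labels: Sequence[int], probabilities: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of samples whose thresholded probability matches the label."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    # A probability strictly above the threshold is counted as class 1.
    correct = sum(
        int(probability > threshold) == label
        for label, probability in zip(labels, probabilities)
    )
    return correct / len(labels)
```

As a toy example, the labels [1, 0, 1, 1] with predicted probabilities [0.9, 0.2, 0.4, 0.7] give an accuracy of 0.75, since only the third sample is misclassified.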

You should now have a clearer understanding of how to use KerasTuner for the hyperparameter tuning of a machine learning model. If you need help, feel free to contact me, for instance by opening an issue on GitHub.

References

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826. https://doi.org/10.48550/arXiv.1512.00567
Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., & Ni, B. (2023). MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data, 10(1), 41. https://doi.org/10.1038/s41597-022-01721-8
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://doi.org/10.48550/arXiv.1412.6980
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. Journal of Machine Learning Research, 18(185), 1–52. http://jmlr.org/papers/v18/16-558.html
