KerasTuner: Machine Learning Hyperparameter Optimization

October 26, 2024 · 16 mins read

Manually searching for the optimal hyperparameter values of a machine learning model is tedious and time-consuming. KerasTuner is a framework that automates hyperparameter tuning, and it is the focus of today’s post.

To illustrate how to use the KerasTuner library, I have developed a binary image classifier for the PneumoniaMNIST dataset. Instead of training a neural network from scratch, I have applied transfer learning: the pre-trained InceptionV3 model (Szegedy et al., 2016) is reused, and only a custom classification layer is trained for pneumonia detection. The following hyperparameters are optimized using KerasTuner:

  • Learning rate
  • Number of neurons in the dense classification layer
  • Dropout rate

The repository containing the code and the trained model is available here. I have developed the codebase using Python 3.12, TensorFlow 2.16.1 and KerasTuner 1.4.7. The structure of the repository is as follows:

repo/
├── config/
│   └── params.template.yaml
├── hyperparameter_tuning/
│   ├── best_hyperparameters.pkl
│   └── history.txt
├── images/
│   ├── healthy.png
│   └── pneumonia.png
├── src/
│   ├── __version__.py
│   ├── evaluator.py
│   ├── hyperparameter_tuner.py
│   ├── plotter.py
│   ├── preprocessor.py
│   ├── trainer.py
│   └── utils.py
├── training/
│   ├── history.pkl
│   └── model.keras
├── .gitattributes
├── .gitignore
├── LICENSE
├── README.md
└── requirements.txt

The config directory stores params.template.yaml, which sets the image size (further details later on), the epoch count, and the batch size. This file is version-controlled; to modify it without producing changes that Git tracks, copy the file and remove the .template string from the copy's filename.
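
The copy step above can be sketched as follows. This is only a minimal illustration: the helper name create_local_config is hypothetical and not part of the repository, and it assumes that the resulting params.yaml is ignored by Git.

```python
import shutil
from pathlib import Path


def create_local_config(template_path: Path) -> Path:
    """Copy a template file next to itself, dropping ".template" from the name."""
    # Hypothetical helper: params.template.yaml -> params.yaml
    local_name = template_path.name.replace(".template", "", 1)
    local_path = template_path.with_name(local_name)
    if not local_path.exists():  # never overwrite existing local edits
        shutil.copyfile(template_path, local_path)
    return local_path
```

Calling create_local_config(Path("config/params.template.yaml")) from the repository root would create config/params.yaml, which can then be edited freely.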

The set of optimal hyperparameters and tuning history are stored in the hyperparameter_tuning directory, while the training history and trained model are available in the training folder.

The src folder contains the source code files. First, the optimal hyperparameter values are computed by running hyperparameter_tuner.py, and then they are used to train the model with the trainer.py module. Both hyperparameter tuning and model training use the training and validation datasets. Lastly, the evaluator.py module evaluates the model accuracy on the test set, i.e. a dataset not seen by the model during training. The training, validation and test datasets are generated by the preprocessor.py module. Let’s dive deeper into the details.

PneumoniaMNIST is based on a dataset of 5,856 pediatric chest X-ray images and supports binary classification of pneumonia against normal. The source training set is split with a ratio of 9:1 into training and validation sets, and the source validation set is used as the test set. The source images are gray-scale, with sizes ranging over (384–2,916) × (127–2,713) pixels. The images are center-cropped with a square window whose side equals the length of the short edge and resized to 1 × 28 × 28 (Yang et al., 2023).

The dataset can be downloaded either from Zenodo, a popular open repository developed under the European OpenAIRE program and operated by CERN, or from TensorFlow Datasets, a collection of datasets ready to use with TensorFlow or other Python machine learning frameworks, such as JAX. I have recently added PneumoniaMNIST to the TensorFlow Datasets library; for further details, take a look at this pull request. In this repository, PneumoniaMNIST is downloaded from TensorFlow Datasets, which can be installed by running the command pip install tfds-nightly within your local virtual environment. As a side note, requirements.txt is provided to quickly and easily set up the virtual environment.
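
The center-crop step described above for the source images can be sketched in a few lines. The function name center_crop_box is hypothetical, rounding the offset down is an assumption, and the two sizes are merely taken from the ends of the quoted ranges, so they do not necessarily correspond to actual images.

```python
from typing import Tuple


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the centered square crop window."""
    side = min(width, height)  # window side = length of the short edge
    left = (width - side) // 2  # assumption: round the offset down
    top = (height - side) // 2
    return left, top, left + side, top + side


print(center_crop_box(384, 127))    # (128, 0, 255, 127)
print(center_crop_box(2916, 2713))  # (101, 0, 2814, 2713)
```

The cropped square is then resized to 28 × 28 pixels, so every image ends up with the same shape regardless of its original aspect ratio.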

The compute_datasets function of the preprocessor.py module returns the training, validation and test datasets, and requires the image size and batch size as inputs.

def compute_datasets(
    image_size: int, batch_size: int
) -> Tuple[Dataset, Dataset, Dataset]:
    """
    Preprocess data.

    Parameters
    ----------
    image_size : int
    batch_size : int

    Returns
    -------
    Tuple[Dataset, Dataset, Dataset]

    """
    training_dataset, validation_dataset, test_dataset = tfds.load(
        "pneumonia_mnist", split=["train", "val", "test"], as_supervised=True
    )

    image_size = [image_size, image_size]
    preprocess_image_ = functools.partial(preprocess_image, target_size=image_size)

    training_dataset = preprocess_dataset(
        training_dataset, preprocess_image_, batch_size
    )
    training_dataset = augment_data(training_dataset)
    validation_dataset = preprocess_dataset(
        validation_dataset, preprocess_image_, batch_size
    )
    test_dataset = preprocess_dataset(test_dataset, preprocess_image_, batch_size)

    return training_dataset, validation_dataset, test_dataset
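
The functools.partial call in compute_datasets binds target_size once, so that the resulting callable only takes an image and a label. The sketch below shows the idea with a stand-in function that works on plain Python values; it is not the real preprocess_image, and how preprocess_dataset (not shown in this post) applies the callable is an assumption, presumably through a per-element mapping.

```python
import functools
from typing import List, Tuple


def preprocess_image(
    image: str, label: int, target_size: List[int]
) -> Tuple[str, int]:
    """Stand-in for the real function: it only records the requested size."""
    return f"{image} resized to {target_size[0]}x{target_size[1]}", label


# Bind target_size once; the result takes just (image, label).
preprocess_image_ = functools.partial(preprocess_image, target_size=[150, 150])

result = preprocess_image_("scan", 1)
print(result)  # ('scan resized to 150x150', 1)
```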

The InceptionV3 model requires the images to have three input channels, and their width and height to be no smaller than 75 pixels. Here, each image is resized to 150 × 150 pixels to better leverage the network depth and complexity, and converted to RGB. Moreover, the pixel values are rescaled to the [0, 1] range. The preprocessing of each image is handled by the preprocess_image function.

def preprocess_image(
    image: tf.Tensor, label: tf.Tensor, target_size: Tuple[int, int]
) -> Tuple[tf.Tensor, tf.Tensor]:
    """
    Preprocess image.

    Parameters
    ----------
    image : tf.Tensor
    label : tf.Tensor
    target_size : Tuple[int, int]

    Returns
    -------
    Tuple[tf.Tensor, tf.Tensor]

    """
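    # Cast to float and rescale the pixel values from [0, 255] to [0, 1].
    image = tf.cast(image, tf.float32) / 255.0
    # Resize to the target size, then replicate the single gray channel
    # three times to obtain an RGB image.
    image = tf.image.resize(image, target_size)
    image = tf.image.grayscale_to_rgb(image)
    return image, label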

The training dataset is augmented by applying random rotations, zooms, and horizontal flips to the images, as shown in the code snippet below. Refer to the TensorFlow 2.16.1 API documentation for further details.

def compute_augmentation_layers() -> Model:
    """
    Compute augmentation layers.

    Returns
    -------
    Model

    """
    return tf.keras.Sequential(
        [
            layers.RandomFlip("horizontal"),
            layers.RandomRotation((-0.1, 0.1)),
            layers.RandomZoom((-0.1, 0.1)),
        ]
    )
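
To make the augmentation ranges more concrete: the factor passed to RandomRotation is expressed as a fraction of a full turn (2π), so it can be converted to degrees as in the sketch below. The function name rotation_range_degrees is hypothetical.

```python
from typing import Tuple


def rotation_range_degrees(factor: Tuple[float, float]) -> Tuple[float, float]:
    """Convert a RandomRotation factor (fraction of a full turn) to degrees."""
    lower, upper = factor
    return lower * 360.0, upper * 360.0


lower_degrees, upper_degrees = rotation_range_degrees((-0.1, 0.1))
print(round(lower_degrees, 1), round(upper_degrees, 1))
```

With the factor used here, each training image is rotated by a random angle of up to about 36 degrees in either direction. Similarly, RandomZoom((-0.1, 0.1)) zooms in (negative values) or out (positive values) by up to 10%.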

The hyperparameter_tuner.py module provides the build_model function, which returns the model to train and requires the hyperparameter set and image size as inputs. First, an instance of the InceptionV3 model is created with the include_top parameter set to False, which excludes the fully-connected layer at the top, i.e. the output layer; it is replaced by the custom classification layer. The weights parameter keeps its default value imagenet, i.e. the weights pre-trained on the ImageNet dataset. All the pre-trained layers are frozen, and the output of the mixed7 layer feeds the custom classifier. Finally, the model is compiled, and I have chosen Root Mean Square Propagation (RMSprop) as the optimizer.

def build_model(hyper_parameters: kt.HyperParameters, image_size: int) -> Model:
    """
    Build a model starting from a pre-trained one. A parametric classifier is added.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters
    image_size : int

    Returns
    -------
    Model

    """
    hyper_parameters_ = HyperParametersConfig(hyper_parameters)

    pre_trained_model = InceptionV3(
        input_shape=(image_size, image_size, 3), include_top=False
    )
    for layer in pre_trained_model.layers:
        layer.trainable = False

    last_layer = pre_trained_model.get_layer("mixed7")
    last_output = last_layer.output

    x = layers.Flatten()(last_output)
    x = layers.Dense(hyper_parameters_.neuron_count, activation="relu")(x)
    x = layers.Dropout(hyper_parameters_.dropout)(x)
    x = layers.Dense(1, activation="sigmoid")(x)

    model = Model(pre_trained_model.input, x)
    model.compile(
        optimizer=RMSprop(learning_rate=hyper_parameters_.learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
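
To get a feeling for how the neuron count affects the model size, the sketch below estimates the number of trainable parameters of the first dense layer. The layer list is an assumption based on the Keras implementation of InceptionV3 (only the unpadded layers shrink the feature map), and the names are hypothetical; the numbers are worth checking against model.summary().

```python
from typing import List, Tuple

# (kernel, stride) of the unpadded ("valid") layers between the input and
# mixed7; layers with "same" padding and stride 1 keep the size unchanged.
VALID_LAYERS: List[Tuple[int, int]] = [
    (3, 2),  # first convolution
    (3, 1),  # second convolution
    (3, 2),  # max pooling
    (3, 1),  # 3x3 convolution after the 1x1 convolution
    (3, 2),  # max pooling
    (3, 2),  # reduction block (mixed3)
]
MIXED7_CHANNELS = 768  # assumed number of channels at the mixed7 output


def mixed7_side(image_size: int) -> int:
    """Spatial side of the mixed7 feature map for a square input."""
    side = image_size
    for kernel, stride in VALID_LAYERS:
        side = (side - kernel) // stride + 1
    return side


def dense_parameter_count(image_size: int, neuron_count: int) -> int:
    """Weights plus biases of the dense layer that follows Flatten."""
    flattened = mixed7_side(image_size) ** 2 * MIXED7_CHANNELS
    return flattened * neuron_count + neuron_count


for neuron_count in (128, 256, 512, 1024):
    print(neuron_count, dense_parameter_count(150, neuron_count))
```

Under these assumptions, a 150 × 150 input yields a 7 × 7 × 768 feature map at mixed7, i.e. 37,632 flattened features, so the dense layer accounts for roughly 4.8 to 38.5 million trainable parameters depending on the neuron count.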

The hyperparameter tuning is performed by the tune_model function, which leverages the Hyperband algorithm, a bandit-based approach to hyperparameter optimization (Li et al., 2018). As explained here:

The Hyperband tuning algorithm uses adaptive resource allocation and early-stopping to quickly converge on a high-performing model. This is done using a sports championship style bracket. The algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of models to the next round. Hyperband determines the number of models to train in a bracket by computing \(1 + \log_{\text{factor}}(\text{max\_epochs})\) and rounding it up to the nearest integer.
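
The expression quoted above can be evaluated with a short sketch. The function name is hypothetical; factor=3 is the KerasTuner default, while max_epochs=10 is only an illustrative value, since the epoch count is read from the configuration file.

```python
def hyperband_bracket_term(max_epochs: int, factor: int = 3) -> int:
    """Return 1 + log_factor(max_epochs), rounded up to the nearest integer."""
    # An integer loop is used instead of math.log to avoid floating-point
    # errors when max_epochs is an exact power of factor.
    exponent, power = 0, 1
    while power < max_epochs:
        power *= factor
        exponent += 1
    return 1 + exponent


print(hyperband_bracket_term(max_epochs=10))  # 4
```

For example, with max_epochs=10 and factor=3, log_3(10) ≈ 2.1, so the expression evaluates to about 3.1 and is rounded up to 4.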

The optimal hyperparameters are then saved to a pickle file in the hyperparameter_tuning folder. Other tuning algorithms are available in KerasTuner, such as GridSearch and BayesianOptimization.

def tune_model(
    training_dataset: Dataset,
    validation_dataset: Dataset,
    epoch_count: int,
    image_size: int,
) -> None:
    """
    Tune the model.

    Parameters
    ----------
    training_dataset : Dataset
    validation_dataset : Dataset
    epoch_count : int
    image_size : int

    """
    tuner = kt.Hyperband(
        lambda hyper_parameters: build_model(hyper_parameters, image_size),
        objective="val_accuracy",
        max_epochs=epoch_count,
        project_name="pneumoniamnist",
    )
    tuner.search_space_summary(extended=True)

    stop_early = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
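    # tensorboard_callback and model_checkpoint_callback are not shown in this
    # snippet; they are assumed to be defined elsewhere in the module.
    callbacks = [tensorboard_callback, model_checkpoint_callback, stop_early]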

    tuner.search(
        training_dataset,
        validation_data=validation_dataset,
        epochs=epoch_count,
        callbacks=callbacks,
    )
    tuner.results_summary()

    best_hyperparameters = tuner.get_best_hyperparameters()[0]
    # Forward slashes keep the relative path portable across operating systems.
    with open("../hyperparameter_tuning/best_hyperparameters.pkl", "wb") as file:
        pickle.dump(best_hyperparameters, file)
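
The EarlyStopping callback used in tune_model stops a trial when the validation loss has not improved for five consecutive epochs. The sketch below is a simplified model of that rule, not the Keras implementation (it ignores options such as min_delta), and the loss values are made up for illustration.

```python
from typing import List, Optional


def early_stopping_epoch(val_losses: List[float], patience: int = 5) -> Optional[int]:
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value is reached at epoch 2.
losses = [0.50, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.30]
print(early_stopping_epoch(losses))  # 7
```

In this made-up run, training stops at epoch 7, so the lower loss that would have appeared at epoch 8 is never reached; a larger patience trades longer trials for a smaller risk of stopping too early.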

The HyperParametersConfig dataclass stores the possible values that the hyperparameters can take. Further details about how to define the hyperparameter space are available here.

@dataclass
class HyperParametersConfig:
    """
    Keras tuner hyperparameters.

    Parameters
    ----------
    hyper_parameters : kt.HyperParameters

    """

    hyper_parameters: kt.HyperParameters
    # Each hyperparameter method returns the sampled value, so the fields
    # hold plain numbers.
    learning_rate: float = field(init=False)
    dropout: float = field(init=False)
    neuron_count: int = field(init=False)

    def __post_init__(self) -> None:
        self.learning_rate = self.hyper_parameters.Choice(
            "learning_rate", values=[1e-4, 1e-3, 1e-2]
        )
        self.dropout = self.hyper_parameters.Choice(
            "dropout", values=[0.1, 0.2, 0.3, 0.4, 0.5]
        )
        self.neuron_count = self.hyper_parameters.Choice(
            "neuron_count", values=[128, 256, 512, 1024]
        )
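
Since all three hyperparameters are defined through a discrete set of choices, the size of the search space can be enumerated directly, as in this short sketch:

```python
from itertools import product

learning_rates = [1e-4, 1e-3, 1e-2]
dropouts = [0.1, 0.2, 0.3, 0.4, 0.5]
neuron_counts = [128, 256, 512, 1024]

# Every combination of the three hyperparameters.
search_space = list(product(learning_rates, dropouts, neuron_counts))
print(len(search_space))  # 60
```

The space contains 3 × 5 × 4 = 60 combinations, so an exhaustive grid search would require 60 full training runs.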

The tuner ran 30 trials, and the optimal hyperparameter values, listed below, were found at trial #26:

  • Learning rate = 0.001
  • Number of neurons in the dense classification layer = 512
  • Dropout rate = 0.1

Finally, the model with optimal hyperparameters was tested on the test set (by using the evaluate method of the tf.keras.Model class), yielding an accuracy of 0.9199.
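
To put the accuracy into perspective, it can be converted into a number of correctly classified images. The sketch assumes that the test split contains 624 images, which is the size of the PneumoniaMNIST test split in MedMNIST v2 and is not stated in this post, and that the whole split was evaluated.

```python
TEST_IMAGE_COUNT = 624  # assumption: PneumoniaMNIST test split size
accuracy = 0.9199

correct = round(accuracy * TEST_IMAGE_COUNT)
print(correct, TEST_IMAGE_COUNT - correct)   # 574 50
print(round(correct / TEST_IMAGE_COUNT, 4))  # 0.9199
```

Under these assumptions, the model classifies 574 of the 624 test images correctly and misclassifies 50.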

You should now have a clearer understanding of how to use KerasTuner for the hyperparameter tuning of a machine learning model. If you need help, feel free to contact me, for instance by opening an issue on GitHub.

References

  1. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826. https://doi.org/10.48550/arXiv.1512.00567
  2. Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., & Ni, B. (2023). MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification. Scientific Data, 10(1), 41. https://doi.org/10.1038/s41597-022-01721-8
  3. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2018). Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. Journal of Machine Learning Research, 18(185), 1–52. http://jmlr.org/papers/v18/16-558.html