
Convolutional Neural Networks Tuning

In our previous attempt at image classification, we built a pretty decent convolutional neural network and achieved a respectable 75% accuracy on the CIFAR-10 dataset. However, we also observed a clear sign of overfitting - while training accuracy climbed, validation accuracy began to plateau and even dip.

In this notebook, we will try to mitigate this issue with several tuning techniques that should improve the model’s accuracy and make it much more robust.

Data Preparation

We will start by loading and preparing the CIFAR-10 dataset, just as we did before. This involves loading the data, defining human-readable class names, and normalizing the pixel values.

from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
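
A quick sanity check confirms the expected shapes and value range: 50,000 training and 10,000 test images, each 32×32 pixels with 3 color channels, now scaled to [0, 1].

print(x_train.shape, x_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)
print(x_train.min(), x_train.max())  # 0.0 1.0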

Data Augmentation

Then, we can apply a technique called data augmentation. It’s one of the most effective ways to combat overfitting and improve model generalization, especially with image data.

This technique involves applying random (but realistic) transformations to our existing training images, effectively creating new training samples on the fly. This helps the model learn to be invariant to these slight variations instead of simply memorizing the training images.

from tensorflow.keras import layers, Sequential

data_augmentation = Sequential([
    layers.RandomCrop(32, 32),
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomContrast(0.2),
    layers.RandomZoom(0.15),
])

Let’s visualize what these augmentations look like on a few sample images from our training set. You can see that each image differs slightly from its original, yet remains clearly recognizable. Note that we clip values to [0, 1] for proper display after augmentation, as some transformations might push pixel values slightly out of this range.

import matplotlib.pyplot as plt
import numpy as np

# Call the augmentation pipeline in training mode so the random transformations are actually applied
augmented_example = data_augmentation(x_train[:25], training=True)
plt.figure(figsize=[10, 10])

for i in range(len(augmented_example)):
  plt.subplot(5, 5, i + 1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(np.clip(augmented_example[i], 0, 1), cmap=plt.cm.binary)
  plt.xlabel(class_names[y_train[i][0]])

plt.show()
[Figure: a 5×5 grid of augmented training images, each labeled with its class name]

Label Encoding

As before, we one-hot encode the integer class labels so they match the softmax output of our classifier.

from tensorflow.keras.utils import to_categorical
y_train_encoded = to_categorical(y_train, num_classes=len(class_names))
y_test_encoded = to_categorical(y_test, num_classes=len(class_names))
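
For example, the label 3 (cat) becomes a length-10 vector with a single 1 at index 3:

print(to_categorical([3], num_classes=10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]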

Building and Training the Model

For our improved model, we’ll adopt a more robust, VGG-inspired architecture, incorporating multiple normalization layers to stabilize training and deeper convolutional blocks to learn more intricate features.

Its core idea is to use repeating blocks of convolutional layers. Each block will consist of:

  • Convolutional Layers: We use two convolutional layers back-to-back. The first one finds initial features (like edges), and the second one looks at those features to find slightly more complex patterns (like corners or textures made of those edges) before we simplify things. It’s like taking a first look, then a closer second look.
  • Batch Normalization: After our convolutional layers work their magic, it steps in. It helps keep the learning process smooth and steady, like a good guide keeping everyone on track. This helps the network train faster and can also prevent it from getting too stuck on the training data (overfitting).
  • Activation Function: Just like before, the ReLU activation helps the network make non-linear decisions, deciding which features are important enough to pass on. Note that it is placed after the batch normalization, which is a fairly common practice.
  • Max Pooling: After finding detailed features, this layer keeps only the strongest signals and shrinks the spatial dimensions. This makes our model more efficient and helps it recognize objects even if they are slightly shifted.
  • Dropout: To stop our network from simply memorizing the training images (which would make it bad at recognizing new images), it randomly ignores some of the learned features during training. This forces the network to learn more robust and general ways to identify objects.

By stacking these blocks and progressively increasing the number of filters, we can build a neural network able to perform complex visual understanding. The initial blocks might learn simple edges and colors, while deeper blocks combine these to recognize textures, parts of objects, and eventually the objects themselves.

from tensorflow.keras import layers
def vgg_block(filters, dropout_rate=0.15):
    return layers.Pipeline([
        layers.Conv2D(filters, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.Conv2D(filters, (3, 3), padding='same'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(dropout_rate),
    ])

Just as before, our full model will consist of two parts: a feature-learning part, now built from the integrated data augmentation layer followed by several stacked VGG blocks, and a classification part, which takes the learned features and decides which object is in the image.

from tensorflow.keras import Sequential

num_classes = len(class_names)
input_shape = x_train.shape[1:]

feature_learning = Sequential([
    layers.Input(shape=input_shape),
    data_augmentation,
    vgg_block(32),
    vgg_block(64),
    vgg_block(128),
    vgg_block(256),
])

classification = Sequential([
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.35),
    layers.Dense(num_classes, activation='softmax'),
])

model = Sequential([
    feature_learning,
    classification,
])
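
As a quick optional check, we can print a summary of the feature-learning part; each VGG block should halve the spatial dimensions (32 → 16 → 8 → 4 → 2) while the number of filters grows from 32 to 256:

feature_learning.summary()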

To aid our training process and combat overfitting more effectively, we will use a couple of callbacks:

  • Reduce Learning Rate: This will reduce the learning rate when the validation loss stops improving, helping the model to fine-tune.
  • Early Stopping: This will stop training if the validation loss doesn’t improve for a set number of epochs, also restoring the weights from the best epoch. We’ll give it a bit more patience this time.

from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
# Stop after 20 epochs without improvement in val_loss and roll back to the best weights seen
earlystop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)
# Cut the learning rate (by the default factor of 0.1) after 8 stagnant epochs, down to a floor of 1e-5
reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=8, min_lr=0.00001)
callbacks = [reduce_lr, earlystop]

We’ll also use the AdamW optimizer, an extension of the Adam optimizer that incorporates a regularization technique called weight decay, often leading to better generalization.

Weight decay helps prevent overfitting by shrinking the model’s weights a little at every update step (classic L2 regularization achieves the same effect by adding a penalty to the loss, while AdamW applies the decay directly to the weights). Keeping the weights small encourages a simpler model that is less likely to fit noise in the training data, and therefore generalizes better to unseen data.

from tensorflow.keras.optimizers import AdamW
optimizer = AdamW(weight_decay=0.001)
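
To make the idea concrete, here is a minimal sketch of one decoupled weight-decay update, simplified to plain gradient descent and ignoring Adam’s moment estimates (the names w, grad, lr and weight_decay are purely illustrative):

import numpy as np

def decoupled_weight_decay_step(w, grad, lr=0.001, weight_decay=0.001):
    w = w - lr * grad                  # usual gradient step on the loss
    return w - lr * weight_decay * w   # decoupled decay nudges every weight toward zero

print(decoupled_weight_decay_step(np.array([0.5, -2.0, 1.5]), np.array([0.1, -0.3, 0.2])))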

Now, let’s compile and train our tuned model. This time we’ll start using the GPU - our model is finally complex enough to leverage the parallelism it offers and speed up the training process.
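
Before that, we can quickly confirm that TensorFlow actually sees a GPU (optional; the exact output depends on your setup):

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))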

from tensorflow import device
with device('/GPU'):
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train_encoded, epochs=120, batch_size=64, callbacks=callbacks, validation_split=0.2)
Output
Epoch 1/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 49s 65ms/step - accuracy: 0.3173 - loss: 2.0250 - val_accuracy: 0.5261 - val_loss: 1.3037 - learning_rate: 0.0010
Epoch 2/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.5015 - loss: 1.3959 - val_accuracy: 0.6225 - val_loss: 1.0665 - learning_rate: 0.0010
Epoch 3/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.5868 - loss: 1.1730 - val_accuracy: 0.5550 - val_loss: 1.2927 - learning_rate: 0.0010
Epoch 4/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.6337 - loss: 1.0515 - val_accuracy: 0.6321 - val_loss: 1.1115 - learning_rate: 0.0010
Epoch 5/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.6613 - loss: 0.9739 - val_accuracy: 0.6468 - val_loss: 1.0793 - learning_rate: 0.0010
Epoch 6/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 62ms/step - accuracy: 0.6865 - loss: 0.9024 - val_accuracy: 0.6738 - val_loss: 0.9711 - learning_rate: 0.0010
Epoch 7/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.7038 - loss: 0.8640 - val_accuracy: 0.7018 - val_loss: 0.8726 - learning_rate: 0.0010
Epoch 8/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.7166 - loss: 0.8186 - val_accuracy: 0.7518 - val_loss: 0.7230 - learning_rate: 0.0010
Epoch 9/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7347 - loss: 0.7756 - val_accuracy: 0.6818 - val_loss: 1.0146 - learning_rate: 0.0010
Epoch 10/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7434 - loss: 0.7508 - val_accuracy: 0.7361 - val_loss: 0.8191 - learning_rate: 0.0010
Epoch 11/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7498 - loss: 0.7304 - val_accuracy: 0.7549 - val_loss: 0.7467 - learning_rate: 0.0010
Epoch 12/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7639 - loss: 0.6798 - val_accuracy: 0.7483 - val_loss: 0.7403 - learning_rate: 0.0010
Epoch 13/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7680 - loss: 0.6833 - val_accuracy: 0.7752 - val_loss: 0.6775 - learning_rate: 0.0010
Epoch 14/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7698 - loss: 0.6661 - val_accuracy: 0.7787 - val_loss: 0.6567 - learning_rate: 0.0010
Epoch 15/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.7803 - loss: 0.6380 - val_accuracy: 0.7981 - val_loss: 0.6100 - learning_rate: 0.0010
Epoch 16/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 42s 67ms/step - accuracy: 0.7865 - loss: 0.6228 - val_accuracy: 0.7918 - val_loss: 0.6278 - learning_rate: 0.0010
Epoch 17/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.7910 - loss: 0.6136 - val_accuracy: 0.7938 - val_loss: 0.6335 - learning_rate: 0.0010
Epoch 18/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 60ms/step - accuracy: 0.7947 - loss: 0.6002 - val_accuracy: 0.7894 - val_loss: 0.6192 - learning_rate: 0.0010
Epoch 19/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8000 - loss: 0.5841 - val_accuracy: 0.8074 - val_loss: 0.5701 - learning_rate: 0.0010
Epoch 20/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 62ms/step - accuracy: 0.8073 - loss: 0.5638 - val_accuracy: 0.7641 - val_loss: 0.7581 - learning_rate: 0.0010
Epoch 21/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8115 - loss: 0.5459 - val_accuracy: 0.7910 - val_loss: 0.6468 - learning_rate: 0.0010
Epoch 22/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8213 - loss: 0.5317 - val_accuracy: 0.8133 - val_loss: 0.5658 - learning_rate: 0.0010
Epoch 23/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8165 - loss: 0.5289 - val_accuracy: 0.8425 - val_loss: 0.4705 - learning_rate: 0.0010
Epoch 24/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 61ms/step - accuracy: 0.8177 - loss: 0.5365 - val_accuracy: 0.8287 - val_loss: 0.4994 - learning_rate: 0.0010
Epoch 25/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 59ms/step - accuracy: 0.8247 - loss: 0.5153 - val_accuracy: 0.8093 - val_loss: 0.5767 - learning_rate: 0.0010
Epoch 26/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 60ms/step - accuracy: 0.8324 - loss: 0.4902 - val_accuracy: 0.8383 - val_loss: 0.4807 - learning_rate: 0.0010
Epoch 27/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 59ms/step - accuracy: 0.8285 - loss: 0.4958 - val_accuracy: 0.8342 - val_loss: 0.5064 - learning_rate: 0.0010
Epoch 28/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 58ms/step - accuracy: 0.8328 - loss: 0.4927 - val_accuracy: 0.8434 - val_loss: 0.4567 - learning_rate: 0.0010
Epoch 29/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 59ms/step - accuracy: 0.8335 - loss: 0.4877 - val_accuracy: 0.8533 - val_loss: 0.4342 - learning_rate: 0.0010
Epoch 30/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.8385 - loss: 0.4680 - val_accuracy: 0.8215 - val_loss: 0.5694 - learning_rate: 0.0010
Epoch 31/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8421 - loss: 0.4631 - val_accuracy: 0.8139 - val_loss: 0.5582 - learning_rate: 0.0010
Epoch 32/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.8428 - loss: 0.4576 - val_accuracy: 0.8476 - val_loss: 0.4537 - learning_rate: 0.0010
Epoch 33/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 59ms/step - accuracy: 0.8396 - loss: 0.4561 - val_accuracy: 0.8579 - val_loss: 0.4322 - learning_rate: 0.0010
Epoch 34/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8484 - loss: 0.4436 - val_accuracy: 0.8479 - val_loss: 0.4545 - learning_rate: 0.0010
Epoch 35/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.8494 - loss: 0.4379 - val_accuracy: 0.8498 - val_loss: 0.4454 - learning_rate: 0.0010
Epoch 36/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8522 - loss: 0.4257 - val_accuracy: 0.8493 - val_loss: 0.4577 - learning_rate: 0.0010
Epoch 37/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8534 - loss: 0.4337 - val_accuracy: 0.8417 - val_loss: 0.4792 - learning_rate: 0.0010
Epoch 38/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8535 - loss: 0.4282 - val_accuracy: 0.8083 - val_loss: 0.6126 - learning_rate: 0.0010
Epoch 39/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.8546 - loss: 0.4198 - val_accuracy: 0.8356 - val_loss: 0.5021 - learning_rate: 0.0010
Epoch 40/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 61ms/step - accuracy: 0.8536 - loss: 0.4244 - val_accuracy: 0.8552 - val_loss: 0.4324 - learning_rate: 0.0010
Epoch 41/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8619 - loss: 0.4047 - val_accuracy: 0.8412 - val_loss: 0.5104 - learning_rate: 0.0010
Epoch 42/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 38s 61ms/step - accuracy: 0.8671 - loss: 0.3913 - val_accuracy: 0.8723 - val_loss: 0.3853 - learning_rate: 1.0000e-04
Epoch 43/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 37s 60ms/step - accuracy: 0.8777 - loss: 0.3583 - val_accuracy: 0.8728 - val_loss: 0.3825 - learning_rate: 1.0000e-04
Epoch 44/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 65ms/step - accuracy: 0.8802 - loss: 0.3466 - val_accuracy: 0.8739 - val_loss: 0.3780 - learning_rate: 1.0000e-04
Epoch 45/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 65ms/step - accuracy: 0.8845 - loss: 0.3368 - val_accuracy: 0.8711 - val_loss: 0.3920 - learning_rate: 1.0000e-04
Epoch 46/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.8816 - loss: 0.3400 - val_accuracy: 0.8740 - val_loss: 0.3830 - learning_rate: 1.0000e-04
Epoch 47/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.8850 - loss: 0.3301 - val_accuracy: 0.8789 - val_loss: 0.3728 - learning_rate: 1.0000e-04
Epoch 48/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 42s 68ms/step - accuracy: 0.8868 - loss: 0.3322 - val_accuracy: 0.8757 - val_loss: 0.3834 - learning_rate: 1.0000e-04
Epoch 49/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8888 - loss: 0.3230 - val_accuracy: 0.8756 - val_loss: 0.3784 - learning_rate: 1.0000e-04
Epoch 50/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 65ms/step - accuracy: 0.8914 - loss: 0.3189 - val_accuracy: 0.8769 - val_loss: 0.3720 - learning_rate: 1.0000e-04
Epoch 51/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.8929 - loss: 0.3130 - val_accuracy: 0.8801 - val_loss: 0.3644 - learning_rate: 1.0000e-04
Epoch 52/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 62ms/step - accuracy: 0.8952 - loss: 0.3114 - val_accuracy: 0.8796 - val_loss: 0.3712 - learning_rate: 1.0000e-04
Epoch 53/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 42s 66ms/step - accuracy: 0.8902 - loss: 0.3222 - val_accuracy: 0.8795 - val_loss: 0.3718 - learning_rate: 1.0000e-04
Epoch 54/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8938 - loss: 0.3060 - val_accuracy: 0.8787 - val_loss: 0.3676 - learning_rate: 1.0000e-04
Epoch 55/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 65ms/step - accuracy: 0.8931 - loss: 0.3097 - val_accuracy: 0.8755 - val_loss: 0.3823 - learning_rate: 1.0000e-04
Epoch 56/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 65ms/step - accuracy: 0.8940 - loss: 0.3038 - val_accuracy: 0.8758 - val_loss: 0.3838 - learning_rate: 1.0000e-04
Epoch 57/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 42s 68ms/step - accuracy: 0.8955 - loss: 0.3075 - val_accuracy: 0.8794 - val_loss: 0.3719 - learning_rate: 1.0000e-04
Epoch 58/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 41s 66ms/step - accuracy: 0.8916 - loss: 0.3152 - val_accuracy: 0.8739 - val_loss: 0.3895 - learning_rate: 1.0000e-04
Epoch 59/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 42s 68ms/step - accuracy: 0.8925 - loss: 0.3060 - val_accuracy: 0.8800 - val_loss: 0.3652 - learning_rate: 1.0000e-04
Epoch 60/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.8952 - loss: 0.2977 - val_accuracy: 0.8796 - val_loss: 0.3708 - learning_rate: 1.0000e-05
Epoch 61/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.8974 - loss: 0.3007 - val_accuracy: 0.8790 - val_loss: 0.3722 - learning_rate: 1.0000e-05
Epoch 62/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 39s 63ms/step - accuracy: 0.8975 - loss: 0.2991 - val_accuracy: 0.8798 - val_loss: 0.3711 - learning_rate: 1.0000e-05
Epoch 63/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 44s 71ms/step - accuracy: 0.8970 - loss: 0.2975 - val_accuracy: 0.8789 - val_loss: 0.3730 - learning_rate: 1.0000e-05
Epoch 64/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.8997 - loss: 0.2942 - val_accuracy: 0.8789 - val_loss: 0.3736 - learning_rate: 1.0000e-05
Epoch 65/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 64ms/step - accuracy: 0.8987 - loss: 0.2912 - val_accuracy: 0.8793 - val_loss: 0.3746 - learning_rate: 1.0000e-05
Epoch 66/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.8956 - loss: 0.3028 - val_accuracy: 0.8794 - val_loss: 0.3704 - learning_rate: 1.0000e-05
Epoch 67/120
625/625 ━━━━━━━━━━━━━━━━━━━━ 40s 63ms/step - accuracy: 0.8988 - loss: 0.2932 - val_accuracy: 0.8788 - val_loss: 0.3701 - learning_rate: 1.0000e-05

Result

Finally, let’s evaluate the tuned model on the held-out test set and look at its per-class performance.

from sklearn.metrics import classification_report
import numpy as np
with device('/GPU'):
    y_pred_probs = model.predict(x_test, verbose=False)
    y_pred_labels = np.argmax(y_pred_probs, axis=1)
    y_true_labels = np.argmax(y_test_encoded, axis=1)
    print(classification_report(y_true_labels, y_pred_labels, target_names=class_names))
              precision    recall  f1-score   support

    airplane       0.90      0.90      0.90      1000
  automobile       0.92      0.96      0.94      1000
        bird       0.83      0.85      0.84      1000
         cat       0.81      0.72      0.76      1000
        deer       0.86      0.86      0.86      1000
         dog       0.86      0.78      0.82      1000
        frog       0.84      0.95      0.89      1000
       horse       0.91      0.91      0.91      1000
        ship       0.94      0.92      0.93      1000
       truck       0.88      0.93      0.90      1000

    accuracy                           0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000
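
To dig deeper into the remaining errors, we could also plot a confusion matrix, which shows exactly which pairs of classes the model mixes up most often. A quick optional sketch, reusing the predicted labels from above:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(
    y_true_labels, y_pred_labels, display_labels=class_names, xticks_rotation='vertical')
plt.show()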

Conclusion

By systematically incorporating data augmentation, architectural changes, and intelligent training callbacks like learning rate reduction and early stopping, we’ve substantially boosted our model’s performance. The leap from 75% to a much more compelling 88% accuracy on the CIFAR-10 test set clearly demonstrates the power of these combined techniques.

More importantly, these improvements weren’t just about chasing a higher accuracy figure - they were crucial in addressing the overfitting observed previously. The model now generalizes better to unseen data, making it much more reliable.

While this tuned model shows significant progress, future improvements could involve experimenting with even deeper or wider networks, attention mechanisms, or leveraging transfer learning for even greater accuracy.