
GitHub Repository: better-data-science/TensorFlow
Path: blob/main/005_Optimize_Neural_Network_Architecture.ipynb
Kernel: Python 3.9.7 64-bit ('env_tensorflow': conda)
import os
import numpy as np
import pandas as pd
import itertools
import warnings

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
warnings.filterwarnings('ignore')

df = pd.read_csv('data/winequalityN.csv')
df.sample(5)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Prepare the data
df = df.dropna()
df['is_white_wine'] = [1 if typ == 'white' else 0 for typ in df['type']]
df['is_good_wine'] = [1 if quality >= 6 else 0 for quality in df['quality']]
df.drop(['type', 'quality'], axis=1, inplace=True)

# Train/test split
X = df.drop('is_good_wine', axis=1)
y = df['is_good_wine']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
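As a quick sanity check (my own addition, not in the original notebook), you can inspect the prepared data before moving on; the exact shapes depend on how many rows dropna() removed:

# Hypothetical sanity check -- shapes depend on the rows dropped by dropna()
print(X_train_scaled.shape, X_test_scaled.shape)
print(y.value_counts(normalize=True))  # class balance of is_good_wine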

How we'll approach optimization

import tensorflow as tf

tf.random.set_seed(42)
Init Plugin
Init Graph Optimizer
Init Kernel
  • Let's declare some constants

    • We want to optimize a network with 3 hidden layers

    • Each hidden layer can have from 64 to 256 nodes

    • The step size between node counts is 64

      • So the possibilities are: 64, 128, 192, 256

num_layers = 3
min_nodes_per_layer, max_nodes_per_layer = 64, 256
node_step_size = 64
  • Possibilities:

node_options = list(range(
    min_nodes_per_layer, max_nodes_per_layer + 1, node_step_size
))
node_options
[64, 128, 192, 256]
  • Extending this to two layers:

two_layer_possibilities = [node_options, node_options]
two_layer_possibilities
[[64, 128, 192, 256], [64, 128, 192, 256]]
  • And now it's just a matter of computing all combinations of these two lists, which is exactly what itertools.product() gives us (the Cartesian product):

list(itertools.product(*two_layer_possibilities))
[(64, 64), (64, 128), (64, 192), (64, 256), (128, 64), (128, 128), (128, 192), (128, 256), (192, 64), (192, 128), (192, 192), (192, 256), (256, 64), (256, 128), (256, 192), (256, 256)]
  • We want to optimize a network with 3 hidden layers, so there will be a few more combinations:

layer_possibilities = [node_options] * num_layers
layer_possibilities
[[64, 128, 192, 256], [64, 128, 192, 256], [64, 128, 192, 256]]
  • Here are the permutations:

layer_node_permutations = list(itertools.product(*layer_possibilities))
layer_node_permutations
[(64, 64, 64), (64, 64, 128), (64, 64, 192), (64, 64, 256), (64, 128, 64), (64, 128, 128), (64, 128, 192), (64, 128, 256), (64, 192, 64), (64, 192, 128), (64, 192, 192), (64, 192, 256), (64, 256, 64), (64, 256, 128), (64, 256, 192), (64, 256, 256), (128, 64, 64), (128, 64, 128), (128, 64, 192), (128, 64, 256), (128, 128, 64), (128, 128, 128), (128, 128, 192), (128, 128, 256), (128, 192, 64), (128, 192, 128), (128, 192, 192), (128, 192, 256), (128, 256, 64), (128, 256, 128), (128, 256, 192), (128, 256, 256), (192, 64, 64), (192, 64, 128), (192, 64, 192), (192, 64, 256), (192, 128, 64), (192, 128, 128), (192, 128, 192), (192, 128, 256), (192, 192, 64), (192, 192, 128), (192, 192, 192), (192, 192, 256), (192, 256, 64), (192, 256, 128), (192, 256, 192), (192, 256, 256), (256, 64, 64), (256, 64, 128), (256, 64, 192), (256, 64, 256), (256, 128, 64), (256, 128, 128), (256, 128, 192), (256, 128, 256), (256, 192, 64), (256, 192, 128), (256, 192, 192), (256, 192, 256), (256, 256, 64), (256, 256, 128), (256, 256, 192), (256, 256, 256)]
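Each of the 3 layers independently takes one of the 4 node counts, so we expect 4 ** 3 = 64 combinations. A quick check (my addition, not in the original notebook) confirms the count:

# 4 node options per layer, 3 hidden layers -> 4 ** 3 = 64 combinations
assert len(layer_node_permutations) == len(node_options) ** num_layers
len(layer_node_permutations)  # 64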

We'll iterate over the permutations, and then iterate again over the values inside each individual permutation to get the node count for each hidden layer:

for permutation in layer_node_permutations[:2]:
    for nodes_at_layer in permutation:
        print(nodes_at_layer)
    print()
64
64
64

64
64
128
  • We'll create a new Sequential model at each iteration

    • And add an InputLayer to it with a shape of (12,) (the number of feature columns in our dataset)

  • Then, we'll iterate over the items in a single permutation and add a Dense layer to the model with the current number of nodes

  • Finally, we'll add a Dense output layer

  • We'll also set a name on each model so it's easier to compare them later:

models = []

for permutation in layer_node_permutations:
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(12,)))
    model_name = ''

    for nodes_at_layer in permutation:
        model.add(tf.keras.layers.Dense(nodes_at_layer, activation='relu'))
        model_name += f'dense{nodes_at_layer}_'

    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
    model._name = model_name[:-1]
    models.append(model)
Metal device set to: Apple M1

systemMemory: 8.00 GB
maxCacheSize: 2.67 GB
  • Here's what a single model looks like:

models[0].summary()
Model: "dense64_dense64_dense64" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 64) 832 _________________________________________________________________ dense_1 (Dense) (None, 64) 4160 _________________________________________________________________ dense_2 (Dense) (None, 64) 4160 _________________________________________________________________ dense_3 (Dense) (None, 1) 65 ================================================================= Total params: 9,217 Trainable params: 9,217 Non-trainable params: 0 _________________________________________________________________
  • Not too bad, right?

  • Let's wrap all this logic into a single function next.



Get architecture possibilities from a function

  • This one will have a lot of parameters

  • But it doesn't do anything we haven't discussed so far:

def get_models(num_layers: int,
               min_nodes_per_layer: int,
               max_nodes_per_layer: int,
               node_step_size: int,
               input_shape: tuple,
               hidden_layer_activation: str = 'relu',
               num_nodes_at_output: int = 1,
               output_layer_activation: str = 'sigmoid') -> list:

    node_options = list(range(min_nodes_per_layer, max_nodes_per_layer + 1, node_step_size))
    layer_possibilities = [node_options] * num_layers
    layer_node_permutations = list(itertools.product(*layer_possibilities))

    models = []
    for permutation in layer_node_permutations:
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.InputLayer(input_shape=input_shape))
        model_name = ''

        for nodes_at_layer in permutation:
            model.add(tf.keras.layers.Dense(nodes_at_layer, activation=hidden_layer_activation))
            model_name += f'dense{nodes_at_layer}_'

        model.add(tf.keras.layers.Dense(num_nodes_at_output, activation=output_layer_activation))
        model._name = model_name[:-1]
        models.append(model)

    return models
  • Let's test it:

all_models = get_models(
    num_layers=3,
    min_nodes_per_layer=64,
    max_nodes_per_layer=256,
    node_step_size=64,
    input_shape=(12,)
)
  • Let's print the names and the count:

print(f'#Models = {len(all_models)}')
print()

for model in all_models:
    print(model.name)
#Models = 64

dense64_dense64_dense64
dense64_dense64_dense128
dense64_dense64_dense192
dense64_dense64_dense256
dense64_dense128_dense64
dense64_dense128_dense128
dense64_dense128_dense192
dense64_dense128_dense256
dense64_dense192_dense64
dense64_dense192_dense128
dense64_dense192_dense192
dense64_dense192_dense256
dense64_dense256_dense64
dense64_dense256_dense128
dense64_dense256_dense192
dense64_dense256_dense256
dense128_dense64_dense64
dense128_dense64_dense128
dense128_dense64_dense192
dense128_dense64_dense256
dense128_dense128_dense64
dense128_dense128_dense128
dense128_dense128_dense192
dense128_dense128_dense256
dense128_dense192_dense64
dense128_dense192_dense128
dense128_dense192_dense192
dense128_dense192_dense256
dense128_dense256_dense64
dense128_dense256_dense128
dense128_dense256_dense192
dense128_dense256_dense256
dense192_dense64_dense64
dense192_dense64_dense128
dense192_dense64_dense192
dense192_dense64_dense256
dense192_dense128_dense64
dense192_dense128_dense128
dense192_dense128_dense192
dense192_dense128_dense256
dense192_dense192_dense64
dense192_dense192_dense128
dense192_dense192_dense192
dense192_dense192_dense256
dense192_dense256_dense64
dense192_dense256_dense128
dense192_dense256_dense192
dense192_dense256_dense256
dense256_dense64_dense64
dense256_dense64_dense128
dense256_dense64_dense192
dense256_dense64_dense256
dense256_dense128_dense64
dense256_dense128_dense128
dense256_dense128_dense192
dense256_dense128_dense256
dense256_dense192_dense64
dense256_dense192_dense128
dense256_dense192_dense192
dense256_dense192_dense256
dense256_dense256_dense64
dense256_dense256_dense128
dense256_dense256_dense192
dense256_dense256_dense256
  • So we have 64 models in total

  • It will take some time to optimize

  • Let's declare another function for that



Model optimization function

  • This one will accept the list of models, training and testing sets (both features and the target), and optionally a number of epochs and verbosity

    • It's advised to set verbosity to 0 so you don't get overwhelmed with the console output

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
def optimize(models: list,
             X_train: np.array,
             y_train: np.array,
             X_test: np.array,
             y_test: np.array,
             epochs: int = 50,
             verbose: int = 0) -> pd.DataFrame:
    # We'll store the results here
    results = []

    def train(model: tf.keras.Sequential) -> dict:
        # Change this however you want
        model.compile(
            loss=tf.keras.losses.binary_crossentropy,
            optimizer=tf.keras.optimizers.Adam(),
            metrics=[
                tf.keras.metrics.BinaryAccuracy(name='accuracy')
            ]
        )

        # Train the model
        model.fit(
            X_train,
            y_train,
            epochs=epochs,
            verbose=verbose
        )

        # Make predictions on the test set
        preds = model.predict(X_test)
        prediction_classes = [1 if prob > 0.5 else 0 for prob in np.ravel(preds)]

        # Return evaluation metrics on the test set
        return {
            'model_name': model.name,
            'test_accuracy': accuracy_score(y_test, prediction_classes),
            'test_precision': precision_score(y_test, prediction_classes),
            'test_recall': recall_score(y_test, prediction_classes),
            'test_f1': f1_score(y_test, prediction_classes)
        }

    # Train every model and save results
    for model in models:
        try:
            print(model.name, end=' ... ')
            res = train(model=model)
            results.append(res)
        except Exception as e:
            print(f'{model.name} --> {str(e)}')

    return pd.DataFrame(results)
  • Let's optimize the architecture!

  • It will take some time, so the dry-run sketch below can help estimate the total runtime first

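Before committing to all 64 models, here's a rough timing sketch; this is my own addition using the variables already in scope, and the numbers it prints are entirely machine-dependent:

import time

# Hypothetical dry run (not in the original notebook): time one model
# to roughly extrapolate the total runtime across all architectures
start = time.time()
_ = optimize(models=all_models[:1], X_train=X_train_scaled, y_train=y_train,
             X_test=X_test_scaled, y_test=y_test)
elapsed = time.time() - start
print(f'~{elapsed * len(all_models) / 60:.0f} minutes estimated for all {len(all_models)} models')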
optimization_results = optimize(
    models=models,
    X_train=X_train_scaled,
    y_train=y_train,
    X_test=X_test_scaled,
    y_test=y_test
)
dense64_dense64_dense64 ... dense64_dense64_dense128 ... dense64_dense64_dense192 ... dense64_dense64_dense256 ... dense64_dense128_dense64 ... dense64_dense128_dense128 ... dense64_dense128_dense192 ... dense64_dense128_dense256 ... dense64_dense192_dense64 ... dense64_dense192_dense128 ... dense64_dense192_dense192 ... dense64_dense192_dense256 ... dense64_dense256_dense64 ... dense64_dense256_dense128 ... dense64_dense256_dense192 ... dense64_dense256_dense256 ... dense128_dense64_dense64 ... dense128_dense64_dense128 ... dense128_dense64_dense192 ... dense128_dense64_dense256 ... dense128_dense128_dense64 ... dense128_dense128_dense128 ... dense128_dense128_dense192 ... dense128_dense128_dense256 ... dense128_dense192_dense64 ... dense128_dense192_dense128 ... dense128_dense192_dense192 ... dense128_dense192_dense256 ... dense128_dense256_dense64 ... dense128_dense256_dense128 ... dense128_dense256_dense192 ... dense128_dense256_dense256 ... dense192_dense64_dense64 ... dense192_dense64_dense128 ... dense192_dense64_dense192 ... dense192_dense64_dense256 ... dense192_dense128_dense64 ... dense192_dense128_dense128 ... dense192_dense128_dense192 ... dense192_dense128_dense256 ... dense192_dense192_dense64 ... dense192_dense192_dense128 ... dense192_dense192_dense192 ... dense192_dense192_dense256 ... dense192_dense256_dense64 ... dense192_dense256_dense128 ... dense192_dense256_dense192 ... dense192_dense256_dense256 ... dense256_dense64_dense64 ... dense256_dense64_dense128 ... dense256_dense64_dense192 ... dense256_dense64_dense256 ... dense256_dense128_dense64 ... dense256_dense128_dense128 ... dense256_dense128_dense192 ... dense256_dense128_dense256 ... dense256_dense192_dense64 ... dense256_dense192_dense128 ... dense256_dense192_dense192 ... dense256_dense192_dense256 ... dense256_dense256_dense64 ... dense256_dense256_dense128 ... dense256_dense256_dense192 ... dense256_dense256_dense256 ...
optimization_results.sort_values(by='test_accuracy', ascending=False)
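The sorted DataFrame shows the best architectures at the top. If you want to pull out the single best one programmatically, here's a minimal sketch (my addition, using the columns returned by optimize() above):

# Hypothetical follow-up: grab the top architecture by test accuracy
best = optimization_results.sort_values(by='test_accuracy', ascending=False).iloc[0]
print(f"Best architecture: {best['model_name']} (test accuracy = {best['test_accuracy']:.4f})")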
  • And there you have it!