
Creating and Tuning Neural Network Models

Context

Nielsen reports that U.S. card fraud (credit, debit, etc.) reached roughly $9 billion in 2016 and was expected to increase to $12 billion by 2020. For perspective, in 2017 both PayPal’s and Mastercard’s revenue was about $10.8 billion each. It is therefore important for credit card companies to be able to recognize fraudulent transactions so that customers are not charged for items they did not purchase.

Objective:

Suppose you are working as a data scientist at a credit card company named “CCFraud”. Given the credit card transactions, you need to build a model (a multilayer perceptron) for fraud detection using Keras.

This notebook covers,

  1. Creating a Model
  2. Adding Layers
  3. Activations
  4. Optimizers and Loss functions
  5. Early Stopping
  6. Weight Initialization
  7. Dropout
  8. Model Evaluation

Dataset Description

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data are not provided. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are ‘Time’ and ‘Amount’.

Time contains the seconds elapsed between each transaction and the first transaction in the dataset.

Amount is the transaction amount; this feature can be used for example-dependent cost-sensitive learning.

Class is the response variable and it takes value 1 in case of fraud and 0 otherwise.

Import all necessary libraries

In [1]:

#importing tensorflow
import tensorflow as tf
print(tf.__version__)
2.7.0

In [2]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, auc

import tensorflow as tf
import keras
from tensorflow.keras import optimizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

In [47]:

# Mounting the Google Drive
from google.colab import drive
drive.mount('/content/drive/')

Importing data

In [6]:

#Defining the path of the dataset
project_path = 'My Drive'
dataset_file = 'creditcard.csv'

In [7]:

#reading dataset
data = pd.read_csv(dataset_file)

Overview of Dataset

In [8]:

data.head()

Out[8]:

[data.head() output: first 5 rows of the dataframe, with columns Time, V1–V28, Amount and Class; the display is truncated in the middle (V10–V20 hidden).]

5 rows × 31 columns

Let’s check for missing values

In [9]:

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
  • This shows that there are 284,807 instances and 31 attributes, including the class attribute.
  • As you can see, there are no null values in any of the columns.

In [10]:

#Number of distinct categories or classes i.e., Fraudulent and Genuine
data['Class'].nunique()

Out[10]:

2
  • As expected, there are only 2 classes.

In [11]:

#checking the percentage of each class in the dataset
(data.Class.value_counts())/(data.Class.count())

Out[11]:

0    0.998273
1    0.001727
Name: Class, dtype: float64
  • This shows a severe class imbalance: 99.82% of the instances are ‘Genuine’ (0) and only 0.17% are ‘Fraudulent’ (1). In other words, we are aiming to predict rare, anomalous events.

In [12]:

print("*********Losses due to fraud:************\n")
print("Total amount lost to fraud")
print(data.Amount[data.Class == 1].sum())
print("Mean amount per fraudulent transaction")
print(data.Amount[data.Class == 1].mean())
print("Compare to normal transactions:")
print("Total amount from normal transactions")
print(data.Amount[data.Class == 0].sum())
print("Mean amount per normal transactions")
print(data.Amount[data.Class == 0].mean())
*********Losses due to fraud:************

Total amount lost to fraud
60127.97
Mean amount per fraudulent transaction
122.21132113821133
Compared to normal transactions:
Total amount from normal transactions
25102462.04
Mean amount per normal transaction
88.29102242225574

Let’s Explore the data

In [13]:

#visual representation of instances per class
data.Class.value_counts().plot.bar()

Out[13]:

<AxesSubplot:>

The bar plot above does not give a very good visual representation of the class imbalance. The plot below, produced after PCA, gives a better picture of the imbalance in the dataset: PCA helps to project the high-dimensional data into a lower-dimensional space for visualization.

In [14]:

#PCA is performed for visualization only

pca= PCA(n_components=2)
creditcard_2d= pd.DataFrame(pca.fit_transform(data.iloc[:,0:30]))
creditcard_2d= pd.concat([creditcard_2d, data['Class']], axis=1)
creditcard_2d.columns= ['x', 'y', 'Class']
sns.lmplot(x='x', y='y', data=creditcard_2d, fit_reg=False, hue='Class')

Out[14]:

<seaborn.axisgrid.FacetGrid at 0x1a08571af70>
  • As you can see, the PCA projection gives a better view of the imbalance in the dataset.

In [15]:

# Histogram of the Time feature
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(12,4))

ax1.hist(data["Time"][data["Class"] == 1], bins = 50)
ax1.set_title('Fraudulent')

ax2.hist(data["Time"][data["Class"] == 0], bins = 50)
ax2.set_title('Genuine')

plt.xlabel('Seconds after transaction number zero')
plt.ylabel('Number of Transactions')
plt.show()
  • The transactions occur in a cyclic way, but the Time feature does not provide much useful information since the time at which the first transaction was initiated is not given. Thus, we’ll drop this feature.

In [16]:

#Dropping time feature
data = data.drop("Time", axis = 1)

Let’s take a look at the V1,…,V28 features.

In [17]:

Vfeatures = data.iloc[:, 0:28].columns   # columns V1 through V28
print(Vfeatures)
Index(['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11',
       'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20', 'V21',
       'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28'],
      dtype='object')

In [18]:

data

Out[18]:

[Full dataframe display after dropping Time: columns V1–V28, Amount and Class; the middle columns (V11–V20) are hidden in the truncated output.]

284807 rows × 30 columns

Separating response variable and predictors

In [19]:

X_data = data.iloc[:,0:29]
y_data = data.iloc[:, -1]

In [20]:

#printing the shape of the data 
print(y_data.shape)
print(X_data.shape)
(284807,)
(284807, 29)

In [21]:

X_data

Out[21]:

[X_data display: columns V1–V28 and Amount; the middle columns (V11–V19) are hidden in the truncated output.]

284807 rows × 29 columns

Data Pre-processing

In [22]:

#Standardizing the Amount column (all other 'V' columns are already scaled as they've undergone the PCA transformation)
from sklearn.preprocessing import StandardScaler
X_data['normalizedAmount'] = StandardScaler().fit_transform(X_data['Amount'].values.reshape(-1,1))  # standardize 'Amount' to zero mean and unit variance
X_data= X_data.drop(['Amount'],axis=1)

Splitting the Data into train and test set

In [23]:

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size = 0.2, random_state = 7)

In [24]:

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(227845, 29)
(56962, 29)
(227845,)
(56962,)

Model Building

Random Forest

In [25]:

from sklearn.ensemble import RandomForestClassifier

In [26]:

random_forest = RandomForestClassifier(n_estimators=100)

In [27]:

# Pandas Series.ravel() function returns the flattened underlying data as an ndarray.
random_forest.fit(X_train,y_train.values.ravel())    # np.ravel() Return a contiguous flattened array

Out[27]:

RandomForestClassifier()

In [28]:

y_pred = random_forest.predict(X_test)

In [29]:

random_forest.score(X_test,y_test)

Out[29]:

0.9995786664794073

In [30]:

def make_confusion_matrix(cf,
                          group_names=None,
                          categories='auto',
                          count=True,
                          percent=True,
                          cbar=True,
                          xyticks=True,
                          xyplotlabels=True,
                          sum_stats=True,
                          figsize=None,
                          cmap='Blues',
                          title=None):
    '''
    This function will make a pretty plot of an sklearn Confusion Matrix cm using a Seaborn heatmap visualization.
    Arguments
    '''


    # CODE TO GENERATE TEXT INSIDE EACH SQUARE
    blanks = ['' for i in range(cf.size)]

    if group_names and len(group_names)==cf.size:
        group_labels = ["{}\n".format(value) for value in group_names]
    else:
        group_labels = blanks

    if count:
        group_counts = ["{0:0.0f}\n".format(value) for value in cf.flatten()]
    else:
        group_counts = blanks

    if percent:
        group_percentages = ["{0:.2%}".format(value) for value in cf.flatten()/np.sum(cf)]
    else:
        group_percentages = blanks

    box_labels = [f"{v1}{v2}{v3}".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]
    box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])


    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS
    if sum_stats:
        #Accuracy is sum of diagonal divided by total observations
        accuracy  = np.trace(cf) / float(np.sum(cf))

        #if it is a binary confusion matrix, show some more stats
        if len(cf)==2:
            #Metrics for Binary Confusion Matrices
            precision = cf[1,1] / sum(cf[:,1])
            recall    = cf[1,1] / sum(cf[1,:])
            f1_score  = 2*precision*recall / (precision + recall)
            stats_text = "\n\nAccuracy={:0.3f}\nPrecision={:0.3f}\nRecall={:0.3f}\nF1 Score={:0.3f}".format(
                accuracy,precision,recall,f1_score)
        else:
            stats_text = "\n\nAccuracy={:0.3f}".format(accuracy)
    else:
        stats_text = ""


    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS
    if figsize==None:
        #Get default figure size if not set
        figsize = plt.rcParams.get('figure.figsize')

    if xyticks==False:
        #Do not show categories if xyticks is False
        categories=False


    # MAKE THE HEATMAP VISUALIZATION
    plt.figure(figsize=figsize)
    sns.heatmap(cf,annot=box_labels,fmt="",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)

    if xyplotlabels:
        plt.ylabel('True label')
        plt.xlabel('Predicted label' + stats_text)
    else:
        plt.xlabel(stats_text)
    
    if title:
        plt.title(title)

In [32]:

cm3=confusion_matrix(y_test, y_pred)
labels = ['True Negative','False Positive','False Negative','True Positive']
make_confusion_matrix(cm3, 
                      group_names=labels,
                      #categories=categories, 
                      cmap='Blues')

Model evaluation criterion

The model can make two kinds of wrong predictions:

  • Predicting that a transaction is fraudulent when it is not (a false positive)
  • Predicting that a transaction is not fraudulent when it actually is (a false negative)

Which case is more important?

  • Predicting that a transaction is not fraudulent when it actually is. Missing fraud enables criminal activity and causes heavy losses to the bank.

How do we reduce this loss, i.e. reduce the number of false negatives?

  • The company would want recall to be maximized: the greater the recall, the fewer false negatives. Hence the focus should be on increasing recall, i.e. on correctly identifying the true positives (Class 1) so that fraudulent transactions are caught (a quick check is shown below).
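
A quick way to put numbers on this, using the metric functions imported at the top and the Random Forest predictions y_pred from the cells above (a minimal sketch):

# Recall = TP / (TP + FN): fraction of actual frauds that the model catches
# Precision = TP / (TP + FP): fraction of flagged transactions that are really fraud
print("Recall   :", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))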

Conclusion:

  • While only 5 regular transactions are wrongly predicted as fraudulent, the model detects only 81% of the fraudulent transactions. As a consequence, 19 fraudulent transactions go undetected (false negatives).
  • Let’s see if we can improve this performance with other machine learning / deep learning models in the rest of the notebook.

Let’s now explore Neural Network models

Deep neural network

Model-1

  • We will use a simple neural network made of 5 fully connected layers with ReLU activations. The network takes a vector of length 29 as input, i.e. the 29 feature columns describing each transaction. For each transaction, the final layer outputs a fraud probability (sigmoid activation), which is thresholded to classify the transaction as either not fraudulent (0) or fraudulent (1).
  • A dropout layer is included to prevent overfitting.

Dropout

Dropout is a regularization technique for neural network models proposed by Srivastava, et al. in their 2014 paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped-out” randomly.
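
To see the mechanism in isolation: during training a Dropout layer zeroes a random fraction of its inputs and scales the survivors by 1/(1 - rate) so the expected sum stays the same, while at inference it does nothing. A minimal illustration (not part of the fraud model):

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))
print(drop(x, training=True))    # roughly half the entries become 0, the rest are scaled to 2.0
print(drop(x, training=False))   # at inference dropout is a no-op: all ones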

Creating a model

A Keras model object can be created with the Sequential class.

At the outset the model is empty; it is completed by adding layers and then compiling it.

Adding layers [layers and activations]

Keras layers can be added to the model one at a time.

Adding layers is like stacking Lego blocks one by one.

Note that since this is a binary classification problem, the output layer should use a sigmoid activation (softmax would be used for multi-class problems).

In [34]:

#initialize the model
model = Sequential()
# This adds the input layer (by specifying input dimension) AND the first hidden layer (units)
model.add(Dense(units=16, input_dim = 29,activation='relu'))   # input of 29 columns as shown above
# hidden layer
model.add(Dense(units=24,activation='relu'))
#Adding Dropout to prevent overfitting 
model.add(Dropout(0.5))
model.add(Dense(24,activation='relu'))
model.add(Dense(24,activation='relu'))
# Adding the output layer
# Notice that we do not need to specify input dim. 
# we have an output of 1 node, which is the desired dimension of our output (fraud or not)
# We use the sigmoid because we want probability outcomes
model.add(Dense(1,activation='sigmoid'))                        # binary classification fraudulent or not

Model compile [optimizers and loss functions]

A Keras model must be “compiled” prior to training.

The loss function and the optimizer are specified at this step.

In [35]:

# Create optimizer with default learning rate
# Compile the model
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])

Let’s print the summary of the model

In [36]:

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                480       
                                                                 
 dense_1 (Dense)             (None, 24)                408       
                                                                 
 dropout (Dropout)           (None, 24)                0         
                                                                 
 dense_2 (Dense)             (None, 24)                600       
                                                                 
 dense_3 (Dense)             (None, 24)                600       
                                                                 
 dense_4 (Dense)             (None, 1)                 25        
                                                                 
=================================================================
Total params: 2,113
Trainable params: 2,113
Non-trainable params: 0
_________________________________________________________________

Training [Forward pass and Backpropagation]

Training the model

In [37]:

#fitting the model
history=model.fit(X_train,y_train,batch_size=15,epochs=10,validation_split=0.2)
Epoch 1/10
12152/12152 [==============================] - 12s 926us/step - loss: 0.0103 - accuracy: 0.9985 - val_loss: 0.0026 - val_accuracy: 0.9995
Epoch 2/10
12152/12152 [==============================] - 11s 916us/step - loss: 0.0046 - accuracy: 0.9992 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 3/10
12152/12152 [==============================] - 11s 919us/step - loss: 0.0041 - accuracy: 0.9992 - val_loss: 0.0023 - val_accuracy: 0.9995
Epoch 4/10
12152/12152 [==============================] - 12s 1ms/step - loss: 0.0039 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 5/10
12152/12152 [==============================] - 12s 955us/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 6/10
12152/12152 [==============================] - 11s 931us/step - loss: 0.0039 - accuracy: 0.9993 - val_loss: 0.0019 - val_accuracy: 0.9994
Epoch 7/10
12152/12152 [==============================] - 11s 940us/step - loss: 0.0037 - accuracy: 0.9994 - val_loss: 0.0022 - val_accuracy: 0.9993
Epoch 8/10
12152/12152 [==============================] - 12s 948us/step - loss: 0.0036 - accuracy: 0.9993 - val_loss: 0.0019 - val_accuracy: 0.9994
Epoch 9/10
12152/12152 [==============================] - 11s 943us/step - loss: 0.0033 - accuracy: 0.9993 - val_loss: 0.0023 - val_accuracy: 0.9994
Epoch 10/10
12152/12152 [==============================] - 11s 932us/step - loss: 0.0032 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995

Plotting the train and validation loss

In [38]:

# Capturing learning history per epoch
hist  = pd.DataFrame(history.history)
hist['epoch'] = history.epoch

# Plotting the training and validation loss at each epoch
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.legend(("train" , "valid") , loc =0)

Out[38]:

<matplotlib.legend.Legend at 0x1a08a302130>

Evaluation

A Keras model can be evaluated with the evaluate() function.

The evaluation results are returned as a list (the loss followed by the metrics).

In [39]:

score = model.evaluate(X_test, y_test)
1781/1781 [==============================] - 1s 528us/step - loss: 0.0034 - accuracy: 0.9995
  • The model achieves an accuracy of 99.95%! Is this good performance?
  • Remember that our dataset consists overwhelmingly of non-fraudulent samples, with only about 172 fraudulent transactions per 100,000. Consequently, a model predicting every transaction as ‘non-fraudulent’ would achieve roughly 99.83% accuracy despite being unable to detect a single fraudulent case (verified in the quick check below).
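
That baseline is easy to verify on our own test split; a trivial “always genuine” rule gets almost the same accuracy while catching zero frauds:

# Accuracy of a trivial model that labels every transaction as genuine (class 0)
baseline_acc = (y_test == 0).mean()
print("Always-genuine baseline accuracy:", baseline_acc)   # ~0.998, yet its recall on fraud is 0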

In [40]:

print(score)
[0.0033923292066901922, 0.9994909167289734]
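
Because evaluate() returns a bare list, pairing it with the model’s metrics_names attribute makes the values self-describing:

# Map each value returned by evaluate() to the corresponding metric name
print(dict(zip(model.metrics_names, score)))   # e.g. {'loss': ..., 'accuracy': ...}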

Let’s print the confusion matrix

In [41]:

## Confusion matrix on the unseen test set
y_pred1 = model.predict(X_test)
# Threshold the predicted probabilities at 0.5
for i in range(len(y_test)):
    if y_pred1[i]>0.5:
        y_pred1[i]=1 
    else:
        y_pred1[i]=0



cm2=confusion_matrix(y_test, y_pred1)
labels = ['True Negative','False Positive','False Negative','True Positive']
#categories = [ 'Not_Fraud','Fraud']
make_confusion_matrix(cm2, 
                      group_names=labels,
                      #categories=categories, 
                      cmap='Blues')

Detection of fraudulent transactions did not improve compared to the previous machine learning model (Random Forest).

  • There are 100 fraudulent transactions in the test data, and 15 of them are not identified (false negatives), which remains an issue. Our objective must be to detect as many fraudulent transactions as possible since missed fraud can have a huge negative impact.
  • 15 regular transactions are flagged as potentially fraudulent by the model. These are false positives, and their number is negligible.

Conclusion:

We must find ways to further reduce the number of false negatives.

Model-2

Let’s try another architecture to get a better recall.

There are some basic hyperparameters and training techniques that can help improve model performance.

Early stopping:

During training, the model is evaluated on a holdout validation dataset after each epoch. If performance on the validation dataset stops improving or starts to degrade (e.g. the validation loss begins to increase), training is stopped after a certain number of epochs of patience. The model weights at the point where training is stopped (or the best weights seen so far) are then used, and this is known to give good generalization performance.

This procedure is called “early stopping” and is perhaps one of the oldest and most widely used forms of neural network regularization.
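
In Keras this behaviour is provided by the EarlyStopping callback; a minimal sketch of the options used in the Model-2 cell further below:

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',            # quantity checked after every epoch
    patience=15,                   # epochs to wait without improvement before stopping
    restore_best_weights=True,     # roll the model back to the weights of the best epoch
    mode='min')                    # a lower val_loss counts as an improvement
# later passed to fit() via callbacks=[early_stop]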

Weight Initialization

Weight initialization is an important consideration in the design of a neural network model.

The nodes in a neural network are parameterized by weights, which are used to compute a weighted sum of the inputs.

Neural network models are fit using an optimization algorithm called stochastic gradient descent that incrementally changes the network weights to minimize a loss function, hopefully resulting in a set of weights for the model that is capable of making useful predictions.

This optimization algorithm requires a starting point in the space of possible weight values from which to begin the optimization process. Weight initialization is a procedure to set the weights of a neural network to small random values that define the starting point for the optimization (learning or training) of the neural network model.

There are many weight initialization techniques, for example (each is shown in the Keras sketch after this list):

1) Random normal initialization

2) Random uniform initialization

3) Xavier (Glorot) initialization

4) He initialization
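
In Keras each of these schemes is available as a kernel_initializer; a minimal sketch (Model-2 below uses 'he_normal', which is well suited to ReLU activations):

from tensorflow.keras import layers, initializers

layers.Dense(24, kernel_initializer=initializers.RandomNormal(stddev=0.05))                 # 1) random normal
layers.Dense(24, kernel_initializer=initializers.RandomUniform(minval=-0.05, maxval=0.05))  # 2) random uniform
layers.Dense(24, kernel_initializer=initializers.GlorotUniform())                           # 3) Xavier / Glorot
layers.Dense(24, kernel_initializer='he_normal')                                            # 4) He initialization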

In [48]:

#Training a multi-layer perceptron with 2 hidden layers

#adding an EarlyStopping callback
es = keras.callbacks.EarlyStopping(monitor='val_loss',
                                   min_delta=0,
                                   patience=15,
                                   verbose=0, mode='min', restore_best_weights=True)

n_inputs = X_train.shape[1]
Model2 = Sequential()
#Initializing the weights using he_normal
Model2.add(Dense(65, input_shape=(n_inputs, ), kernel_initializer='he_normal', activation='relu'))
Model2.add(Dropout(0.5))
Model2.add(Dense(65, kernel_initializer='he_normal', activation='relu'))
Model2.add(Dropout(0.5))
Model2.add(Dense(1, kernel_initializer='he_normal', activation='sigmoid'))

Model2.compile(optimizers.Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

his_mod2 = Model2.fit(X_train, y_train, validation_split=0.2, batch_size=700, epochs=40, callbacks=[es], shuffle=True, verbose=1)
Epoch 1/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0915 - accuracy: 0.9766 - val_loss: 0.0089 - val_accuracy: 0.9986
Epoch 2/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0143 - accuracy: 0.9984 - val_loss: 0.0040 - val_accuracy: 0.9994
Epoch 3/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0097 - accuracy: 0.9988 - val_loss: 0.0034 - val_accuracy: 0.9994
Epoch 4/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0078 - accuracy: 0.9988 - val_loss: 0.0032 - val_accuracy: 0.9994
Epoch 5/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0069 - accuracy: 0.9989 - val_loss: 0.0029 - val_accuracy: 0.9995
Epoch 6/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0062 - accuracy: 0.9989 - val_loss: 0.0027 - val_accuracy: 0.9994
Epoch 7/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0058 - accuracy: 0.9990 - val_loss: 0.0025 - val_accuracy: 0.9995
Epoch 8/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0052 - accuracy: 0.9991 - val_loss: 0.0025 - val_accuracy: 0.9995
Epoch 9/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0051 - accuracy: 0.9990 - val_loss: 0.0024 - val_accuracy: 0.9995
Epoch 10/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0051 - accuracy: 0.9991 - val_loss: 0.0023 - val_accuracy: 0.9995
Epoch 11/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0047 - accuracy: 0.9991 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 12/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0044 - accuracy: 0.9992 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 13/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0047 - accuracy: 0.9991 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 14/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0044 - accuracy: 0.9992 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 15/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0044 - accuracy: 0.9992 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 16/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0042 - accuracy: 0.9992 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 17/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0041 - accuracy: 0.9992 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 18/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0036 - accuracy: 0.9992 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 19/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0038 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 20/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0038 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 21/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0038 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 22/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0022 - val_accuracy: 0.9995
Epoch 23/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 24/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 25/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0036 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 26/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 27/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0035 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 28/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0033 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 29/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0034 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 30/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0030 - accuracy: 0.9993 - val_loss: 0.0019 - val_accuracy: 0.9995
Epoch 31/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0032 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 32/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0032 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 33/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0031 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 34/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0032 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 35/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0031 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 36/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0030 - accuracy: 0.9993 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 37/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0031 - accuracy: 0.9993 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 38/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0029 - accuracy: 0.9994 - val_loss: 0.0020 - val_accuracy: 0.9995
Epoch 39/40
261/261 [==============================] - 1s 2ms/step - loss: 0.0028 - accuracy: 0.9994 - val_loss: 0.0021 - val_accuracy: 0.9995
Epoch 40/40
261/261 [==============================] - 1s 3ms/step - loss: 0.0029 - accuracy: 0.9994 - val_loss: 0.0021 - val_accuracy: 0.9995

Plotting the train and validation loss

In [49]:

# Capturing learning history per epoch
hist  = pd.DataFrame(his_mod2.history)
hist['epoch'] = his_mod2.epoch

# Plotting the training and validation loss at each epoch
plt.plot(hist['loss'])
plt.plot(hist['val_loss'])
plt.legend(("train" , "valid") , loc =0)

Out[49]:

<matplotlib.legend.Legend at 0x1a08aa22a00>

Plotting confusion matrix

In [50]:

## Confusion matrix on the unseen test set
y_pred1 = Model2.predict(X_test)
# Threshold the predicted probabilities at 0.5
for i in range(len(y_test)):
    if y_pred1[i]>0.5:
        y_pred1[i]=1 
    else:
        y_pred1[i]=0



cm2=confusion_matrix(y_test, y_pred1)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Not_Fraud','Fraud']
make_confusion_matrix(cm2, 
                      group_names=labels,
                      #categories=categories, 
                      cmap='Blues')

Conclusion:

As you can see, the recall of this model has not improved; it is worse than the previous ANN model as well as the Random Forest, although the precision has changed.

Let’s try a weighted loss for the imbalanced dataset

Weighted loss to account for the large class imbalance in the training dataset

  • We will compensate for the class imbalance by giving additional weight to the loss associated with errors made on fraudulent transactions.

We will reuse our first ANN model and apply the weighted loss.

Let’s review the process:

In [56]:

from sklearn.utils import class_weight

# Compute balanced class weights: weight_c = n_samples / (n_classes * n_c)
class_weights = class_weight.compute_class_weight(class_weight="balanced", classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(class_weights))
class_weights

Out[56]:

{0: 0.5008617164864829, 1: 290.6186224489796}
  • The class ‘Fraudulent’ (y=1) is assigned a weight of about 290 versus 0.5 for the class ‘Not fraudulent’, because of the very low prevalence we observed during data exploration (the arithmetic is sketched below). This makes the model give more importance to errors made on fraudulent cases during training.
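
These values follow the ‘balanced’ heuristic weight_c = n_samples / (n_classes * n_c); with the 227,845 training rows and the roughly 392 fraudulent samples that end up in this particular split (counts inferred from the weights above), the arithmetic looks like this:

# 'balanced' heuristic: weight_c = n_samples / (n_classes * n_c)
n_samples = len(y_train)              # 227845
n_fraud   = int(y_train.sum())        # ~392 frauds in this split
n_genuine = n_samples - n_fraud
print("weight for class 0:", n_samples / (2 * n_genuine))   # ~0.50
print("weight for class 1:", n_samples / (2 * n_fraud))     # ~290.6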

Training the model

In [57]:

model.fit(X_train,y_train,batch_size=15,epochs=5, class_weight=class_weights, shuffle=True)
Epoch 1/5
15190/15190 [==============================] - 30s 2ms/step - loss: 0.2691 - accuracy: 0.9814
Epoch 2/5
15190/15190 [==============================] - 28s 2ms/step - loss: 0.3214 - accuracy: 0.9781
Epoch 3/5
15190/15190 [==============================] - 27s 2ms/step - loss: 0.3466 - accuracy: 0.9738
Epoch 4/5
15190/15190 [==============================] - 27s 2ms/step - loss: 0.2964 - accuracy: 0.9612
Epoch 5/5
15190/15190 [==============================] - 29s 2ms/step - loss: 0.2773 - accuracy: 0.9669

Out[57]:

<keras.callbacks.History at 0x1a08b409940>

In [58]:

score_weighted = model.evaluate(X_test, y_test)
1781/1781 [==============================] - 2s 899us/step - loss: 0.0401 - accuracy: 0.9850

Plotting confusion matrix

In [59]:

## Confusion matrix on the unseen test set
y_pred1 = model.predict(X_test)
# Threshold the predicted probabilities at 0.5
for i in range(len(y_test)):
    if y_pred1[i]>0.5:
        y_pred1[i]=1 
    else:
        y_pred1[i]=0



cm2=confusion_matrix(y_test, y_pred1)
labels = ['True Negative','False Positive','False Negative','True Positive']
categories = [ 'Not_Fraud','Fraud']
make_confusion_matrix(cm2, 
                      group_names=labels,
                      #categories=categories, 
                      cmap='Blues')

Conclusion:

As you can see, the recall has increased but the precision is now very poor. There is still a lot of room for improvement, for example:

1) The decision threshold can be tuned to find a better operating point (a sketch follows this list)

2) Resampling techniques can be applied to balance the data before training the model

3) Hyperparameter tuning can be applied to tune the different hyperparameters
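
As a sketch of the first idea, the precision_recall_curve utility imported at the top exposes the precision/recall trade-off at every candidate threshold, so a threshold can be chosen to hit a target recall (the 0.90 target here is just an illustrative assumption):

# Sketch of threshold tuning on the weighted model's raw probabilities
probs = model.predict(X_test).ravel()
precisions, recalls, thresholds = precision_recall_curve(y_test, probs)

target_recall = 0.90                                            # illustrative target
# thresholds has one fewer element than precisions/recalls
candidates = [t for t, r in zip(thresholds, recalls[:-1]) if r >= target_recall]
chosen = max(candidates) if candidates else 0.5                 # highest threshold that still reaches the target

y_pred_tuned = (probs >= chosen).astype(int)
print("Chosen threshold:", chosen)
print("Recall   :", recall_score(y_test, y_pred_tuned))
print("Precision:", precision_score(y_test, y_pred_tuned))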

Based on the above analysis, we can select Model-1 as our final model.
