Deep Learning Approaches for Security Threats in IoT Environments
An expert discussion of the application of deep learning methods in the IoT security environment
In Deep Learning Approaches for Security Threats in IoT Environments, a team of distinguished cybersecurity educators delivers an insightful and robust exploration of how to approach and measure the security of Internet-of-Things (IoT) systems and networks. In this book, readers will examine critical concepts in artificial intelligence (AI) and IoT, and apply effective strategies to help secure and protect IoT networks. The authors discuss supervised, semi-supervised, and unsupervised deep learning techniques, as well as reinforcement and federated learning methods for privacy preservation.
This book applies deep learning approaches to IoT networks to address the security problems that professionals frequently encounter in the field, and it describes ways in which smart devices can help solve cybersecurity issues.
Readers will also get access to a companion website with PowerPoint presentations, links to supporting videos, and additional resources. The book includes:
* A thorough introduction to artificial intelligence and the Internet of Things, including key concepts like deep learning, security, and privacy
* Comprehensive discussions of the architectures, protocols, and standards that form the foundation of deep learning for securing modern IoT systems and networks
* In-depth examinations of the architectural design of cloud, fog, and edge computing networks
* Detailed presentations of the security requirements, threats, and countermeasures relevant to IoT networks
Perfect for professionals working in the AI, cybersecurity, and IoT industries, Deep Learning Approaches for Security Threats in IoT Environments will also earn a place in the libraries of undergraduate and graduate students studying deep learning, cybersecurity, privacy preservation, and the security of IoT networks.
Mohamed Abdel-Basset, PhD, is an Associate Professor in the Faculty of Computers and Informatics at Zagazig University, Egypt. He is a Senior Member of the IEEE.
Nour Moustafa, PhD, is a Postgraduate Discipline Coordinator (Cyber) and Senior Lecturer in Cybersecurity and Computing at the School of Engineering and Information Technology at the University of New South Wales, UNSW Canberra, Australia.
Hossam Hawash is an Assistant Lecturer in the Department of Computer Science, Faculty of Computers and Informatics at Zagazig University, Egypt.
About the Authors
1 Introducing Deep Learning for IoT Security
1.1 Introduction
1.2 Internet of Things (IoT) Architecture
1.2.1 Physical Layer
1.2.2 Network Layer
1.2.3 Application Layer
1.3 Internet of Things' Vulnerabilities and Attacks
1.3.1 Passive Attacks
1.3.2 Active Attacks
1.4 Artificial Intelligence
1.5 Deep Learning
1.6 Taxonomy of Deep Learning Models
1.6.1 Supervision Criterion
1.6.1.1 Supervised Deep Learning
1.6.1.2 Unsupervised Deep Learning
1.6.1.3 Semi-Supervised Deep Learning
1.6.1.4 Deep Reinforcement Learning
1.6.2 Incrementality Criterion
1.6.2.1 Batch Learning
1.6.2.2 Online Learning
1.6.3 Generalization Criterion
1.6.3.1 Model-Based Learning
1.6.3.2 Instance-Based Learning
1.6.4 Centralization Criterion
1.7 Supplementary Materials
References
2 Deep Neural Networks
2.1 Introduction
2.2 From Biological Neurons to Artificial Neurons
2.2.1 Biological Neurons
2.2.2 Artificial Neurons
2.3 Artificial Neural Network
2.3.1 Input Layer
2.3.2 Hidden Layer
2.3.3 Output Layer
2.4 Activation Functions
2.4.1 Types of Activation
2.4.1.1 Binary Step Function
2.4.1.2 Linear Activation Function
2.4.1.3 Nonlinear Activation Functions
2.5 The Learning Process of ANN
2.5.1 Forward Propagation
2.5.2 Backpropagation (Gradient Descent)
2.6 Loss Functions
2.6.1 Regression Loss Functions
2.6.1.1 Mean Absolute Error (MAE) Loss
2.6.1.2 Mean Squared Error (MSE) Loss
2.6.1.3 Huber Loss
2.6.1.4 Mean Bias Error (MBE) Loss
2.6.1.5 Mean Squared Logarithmic Error (MSLE)
2.6.2 Classification Loss Functions
2.6.2.1 Binary Cross Entropy (BCE) Loss
2.6.2.2 Categorical Cross Entropy (CCE) Loss
2.6.2.3 Hinge Loss
2.6.2.4 Kullback-Leibler Divergence (KL) Loss
2.7 Supplementary Materials
References
3 Training Deep Neural Networks
3.1 Introduction
3.2 Gradient Descent Revisited
3.2.1 Gradient Descent
3.2.2 Stochastic Gradient Descent
3.2.3 Mini-batch Gradient Descent
3.3 Gradient Vanishing and Explosion
3.4 Gradient Clipping
3.5 Parameter Initialization
3.5.1 Zero Initialization
3.5.2 Random Initialization
3.5.3 LeCun Initialization
3.5.4 Xavier Initialization
3.5.5 Kaiming (He) Initialization
3.6 Faster Optimizers
3.6.1 Momentum Optimization
3.6.2 Nesterov Accelerated Gradient
3.6.3 AdaGrad
3.6.4 RMSProp
3.6.5 Adam Optimizer
3.7 Model Training Issues
3.7.1 Bias
3.7.2 Variance
3.7.3 Overfitting Issues
3.7.4 Underfitting Issues
3.7.5 Model Capacity
3.8 Supplementary Materials
References
4 Evaluating Deep Neural Networks
4.1 Introduction
4.2 Validation Dataset
4.3 Regularization Methods
4.3.1 Early Stopping
4.3.2 L1 and L2 Regularization
4.3.3 Dropout
4.3.4 Max-Norm Regularization
4.3.5 Data Augmentation
4.4 Cross-Validation
4.4.1 Hold-Out Cross-Validation
4.4.2 k-Folds Cross-Validation
4.4.3 Stratified k-Folds Cross-Validation
4.4.4 Repeated k-Folds Cross-Validation
4.4.5 Leave-One-Out Cross-Validation
4.4.6 Leave-p-Out Cross-Validation
4.4.7 Time Series Cross-Validation
4.4.8 Rolling Cross-Validation
4.4.9 Block Cross-Validation
4.5 Performance Metrics
4.5.1 Regression Metrics
4.5.1.1 Mean Absolute Error (MAE)
4.5.1.2 Root Mean Squared Error (RMSE)
4.5.1.3 Coefficient of Determination (R²)
4.5.1.4 Adjusted R²
4.5.2 Classification Metrics
4.5.2.1 Confusion Matrix
4.5.2.2 Accuracy
4.5.2.3 Precision
4.5.2.4 Recall
4.5.2.5 Precision-Recall Curve
4.5.2.6 F1-Score
4.5.2.7 Beta F1 Score
4.5.2.8 False Positive Rate (FPR)
4.5.2.9 Specificity
4.5.2.10 Receiver Operating Characteristic (ROC) Curve
4.6 Supplementary Materials
References
5 Convolutional Neural Networks
5.1 Introduction
5.2 Shift from Fully Connected to Convolutional
5.3 Basic Architecture
5.3.1 The Cross-Correlation Operation
5.3.2 Convolution Operation
5.3.3 Receptive Field
5.3.4 Padding and Stride
5.3.4.1 Padding
5.3.4.2 Stride
5.4 Multiple Channels
5.4.1 Multi-Channel Inputs
5.4.2 Multi-Channel Output
5.4.3 Convolutional Kernel 1 × 1
5.5 Pooling Layers
5.5.1 Max Pooling
5.5.2 Average Pooling
5.6 Normalization Layers
5.6.1 Batch Normalization
5.6.2 Layer Normalization
5.6.3 Instance Normalization
5.6.4 Group Normalization
5.6.5 Weight Normalization
5.7 Convolutional Neural Networks (LeNet)
5.8 Case Studies
5.8.1 Handwritten Digit Classification (One Channel Input)
5.8.2 Dog vs. Cat Image Classification (Multi-Channel Input)
5.9 Supplementary Materials
References
6 Dive Into Convolutional Neural Networks
6.1 Introduction
6.2 One-Dimensional Convolutional Network
6.2.1 One-Dimensional Convolution
6.2.2 One-Dimensional Pooling
6.3 Three-Dimensional Convolutional Network
6.3.1 Three-Dimensional Convolution
6.3.2 Three-Dimensional Pooling
6.4 Transposed Convolution Layer
6.5 Atrous/Dilated Convolution
6.6 Separable Convolutions
6.6.1 Spatially Separable Convolutions
6.6.2 Depth-wise Separable (DS) Convolutions
6.7 Grouped Convolution
6.8 Shuffled Grouped Convolution
6.9 Supplementary Materials
References
7 Advanced Convolutional Neural Network
7.1 Introduction
7.2 AlexNet
7.3 Block-wise Convolutional Network (VGG)
7.4 Network in Network
7.5 Inception Networks
7.5.1 GoogLeNet
7.5.2 Inception Network v2 (Inception v2)
7.5.3 Inception Network v3 (Inception v3)
7.6 Residual Convolutional Networks
7.7 Dense Convolutional Networks
7.8 Temporal Convolutional Network
7.8.1 One-Dimensional Convolutional Network
7.8.2 Causal and Dilated Convolution
7.8.3 Residual Blocks
7.9 Supplementary Materials
References
8 Introducing Recurrent Neural Networks
8.1 Introduction
8.2 Recurrent Neural Networks
8.2.1 Recurrent Neurons
8.2.2 Memory Cell
8.2.3 Recurrent Neural Network
8.3 Different Categories of RNNs
8.3.1 One-to-One RNN
8.3.2 One-to-Many RNN
8.3.3 Many-to-One RNN
8.3.4 Many-to-Many RNN
8.4 Backpropagation Through Time
8.5 Challenges Facing Simple RNNs
8.5.1 Vanishing Gradient
8.5.2 Exploding Gradient
8.5.2.1 Truncated Backpropagation Through Time (TBPTT)
8.5.2.2 Penalty on the Recurrent Weights Whh
8.5.2.3 Clipping Gradients
8.6 Case Study: Malware Detection
8.7 Supplementary Material
References
9 Dive Into Recurrent Neural Networks
9.1 Introduction
9.2 Long Short-Term Memory (LSTM)
9.2.1 LSTM Gates
9.2.2 Candidate Memory Cells
9.2.3 Memory Cell
9.2.4 Hidden State
9.3 LSTM with Peephole Connections
9.4 Gated Recurrent Units (GRU)
9.4.1 GRU Cell Gates
9.4.2 Candidate State
9.4.3 Hidden State
9.5 ConvLSTM
9.6 Unidirectional vs. Bidirectional Recurrent Network
9.7 Deep Recurrent Network
9.8 Insights
9.9 Case Study of Malware Detection
9.10 Supplementary Materials
References
10 Attention Neural Networks
10.1 Introduction
10.2 From Biological to Computerized Attention
10.2.1 Biological Attention
10.2.2 Queries, Keys, and Values
10.3 Attention Pooling: Nadaraya-Watson Kernel Regression
10.4 Attention-Scoring Functions
10.4.1 Masked Softmax Operation
10.4.2 Additive Attention (AA)
10.4.3 Scaled Dot-Product Attention
10.5 Multi-Head Attention (MHA)
10.6 Self-Attention Mechanism
10.6.1 Self-Attention (SA) Mechanism
10.6.2 Positional Encoding
10.7 Transformer Network
10.8 Supplementary Materials
References
11 Autoencoder Networks
11.1 Introduction
11.2 Introducing Autoencoders
11.2.1 Definition of Autoencoder
11.2.2 Structural Design
11.3 Convolutional Autoencoder
11.4 Denoising Autoencoder
11.5 Sparse Autoencoders
11.6 Contractive Autoencoders
11.7 Variational Autoencoders
11.8 Case Study
11.9 Supplementary Materials
References
12 Generative Adversarial Networks (GANs)
12.1 Introduction
12.2 Foundation of Generative Adversarial Network
12.3 Deep Convolutional GAN
12.4 Conditional GAN
12.5 Supplementary Materials
References
13 Dive Into Generative Adversarial Networks
13.1 Introduction
13.2 Wasserstein GAN
13.2.1 Distance Functions
13.2.2 Distance Function in GANs
13.2.3 Wasserstein Loss
13.3 Least-Squares GAN (LSGAN)
13.4 Auxiliary Classifier GAN (ACGAN)
13.5 Supplementary Materials
References
14 Disentangled Representation GANs
14.1 Introduction
14.2 Disentangled Representations
14.3 InfoGAN
14.4 StackedGAN
14.5 Supplementary Materials
References
15 Introducing Federated Learning for Internet of Things (IoT)
15.1 Introduction
15.2 Federated Learning in the Internet of Things
15.3 Taxonomic View of Federated Learning
15.3.1 Network Structure
15.3.1.1 Centralized Federated Learning
15.3.1.2 Decentralized Federated Learning
15.3.1.3 Hierarchical Federated Learning
15.3.2 Data Partition
15.3.3 Horizontal Federated Learning
15.3.4 Vertical Federated Learning
15.3.5 Federated Transfer Learning
15.4 Open-Source Frameworks
15.4.1 TensorFlow Federated
15.4.2 PySyft and PyGrid
15.4.3 FedML
15.4.4 LEAF
15.4.5 PaddleFL
15.4.6 Federated AI Technology Enabler (FATE)
15.4.7 OpenFL
15.4.8 IBM Federated Learning
15.4.9 NVIDIA Federated Learning Application Runtime Environment (NVIDIA FLARE)
15.4.10 Flower
15.4.11 Sherpa.ai
15.5 Supplementary Materials
References
16 Privacy-Preserved Federated Learning
16.1 Introduction
16.2 Statistical Challenges in Federated Learning
16.2.1 Nonindependent and Identically Distributed (Non-IID) Data
16.2.1.1 Class Imbalance
16.2.1.2 Distribution Imbalance
16.2.1.3 Size Imbalance
16.2.2 Model Heterogeneity
16.2.2.1 Extracting the Essence of a Subject
16.2.3 Block Cycles
16.3 Security Challenge in Federated Learning
16.3.1 Untargeted Attacks
16.3.2 Targeted Attacks
16.4 Privacy Challenges in Federated Learning
16.4.1 Secure Aggregation
16.4.1.1 Homomorphic Encryption (HE)
16.4.1.2 Secure Multiparty Computation
16.4.1.3 Blockchain
16.4.2 Perturbation Method
16.5 Supplementary Materials
References
Index