Muography. Group of Authors
As presented in the first section, the weights of the different input values are set by the learning procedure so that the network performs the required task better.
In this study, the ANN model was constructed as follows. The input layer consisted of seven neurons that were fed with the daily average flux values. The output was a single neuron giving the probability of a volcanic eruption on the eighth day. Hidden layers were placed between the input and output layers. ReLU was used as the activation function for the input and hidden neurons, and the sigmoid function was used at the output to obtain the probability of eruption. Dropout was applied to the input and hidden layers to avoid overfitting (Srivastava et al., 2014). Batch normalization was also applied before the ReLU function (Ioffe & Szegedy, 2015). The weights of the neurons were optimized with the Adam method, a gradient‐based optimization algorithm that computes adaptive learning rates for each parameter from estimates of the lower‐order moments of the gradients (Kingma & Ba, 2015).
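The forward pass of such a network can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the hidden layer sizes, the random weights, and the flux values are hypothetical placeholders, and dropout and batch normalization are left out because they mainly matter during training.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: one output per row of `weights`."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholder initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(flux_week, layers):
    """Map seven daily average flux values to a probability in (0, 1)."""
    activations = list(flux_week)
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)   # hidden layers
    weights, biases = layers[-1]
    return dense(activations, weights, biases, sigmoid)[0]        # single output neuron

rng = random.Random(0)
sizes = [7, 8, 8, 1]  # 7 inputs, 1 output; the hidden sizes are hypothetical
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
flux_week = [1.02, 0.98, 1.01, 0.97, 1.00, 0.99, 1.03]  # made-up normalised flux values
probability = forward(flux_week, layers)
```

With untrained random weights the returned value carries no meaning; the sketch only shows how seven inputs pass through ReLU layers to a single sigmoid output.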
Figure 4.5 (a) Cross‐validation scores of the support vector machine plotted as a function of the C and γ parameters of the radial basis function kernel. (b) Receiver operating characteristic (ROC) curve for the support vector machine with C = 925.83 and γ = 1.75. The circle shows the optimal cutoff point of the ROC curve.
Bayesian optimization was used to tune the hyperparameters of the ANN with 500 epochs. An early stopping callback with a patience setting was applied to evaluate the performance of the ANN on the training data set after each epoch. The training was stopped after 50 epochs to avoid overfitting of the data. The AUC of the ROC curve was used as the criterion for selecting the optimal hyperparameters. The optimal number of hidden layers was found to be 3, with 64, 265, and 256 neurons, respectively. The optimal dropout ratio was found to be 0.281, and the batch size was found to be 64. The learning rate and the exponential decay rate parameters of the Adam method were found to be 4.48 × 10⁻⁴ and 0.903, respectively. Fig. 4.6 shows the ROC curve that was obtained for the test data sets. The AUC score of the fine‐tuned ANN only slightly exceeded 0.5, the value expected for random guessing. These results suggest that conventional models fed with the average muon flux values cannot accurately predict impending eruptions of Sakurajima volcano. Although collecting further data is expected to reduce undertraining of the ANN, significantly higher scores in the ROC analysis are not expected because the ANN cannot extract the features of muographic images.
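The idea of stopping training once the monitored score stops improving can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `early_stopping_epoch`, the toy score list, the patience of 3, and the convention that a higher score is better are all assumptions made for the example.

```python
def early_stopping_epoch(scores, patience, max_epochs):
    """Return (stop_epoch, best_epoch) for per-epoch scores, where higher is better.

    Training stops once `patience` epochs have passed without a new best score,
    or when `max_epochs` (or the end of `scores`) is reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(scores[:max_epochs], start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(scores), max_epochs), best_epoch

# Hypothetical monitored scores for seven epochs.
scores = [0.50, 0.52, 0.55, 0.54, 0.53, 0.55, 0.54]
stop_epoch, best_epoch = early_stopping_epoch(scores, patience=3, max_epochs=500)
```

In this toy run the best score is reached at epoch 3 and training halts at epoch 6, after three epochs without a strict improvement.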
Figure 4.6 Receiver operating characteristic curve for the neural network. The circle shows the optimal cutoff point of the ROC curve.
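For readers unfamiliar with ROC analysis, the sketch below shows how a ROC curve, its AUC, and a cutoff point can be computed from predicted probabilities. The labels and scores are made up for illustration. The text does not state which criterion defines the optimal cutoff, so maximizing Youden's J statistic (sensitivity + specificity − 1) is an assumption here.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) triples, ordered by decreasing threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0, float("inf"))]  # nothing is classified as positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives, threshold))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def youden_cutoff(points):
    """Pick the ROC point that maximises tpr - fpr (Youden's J); assumed criterion."""
    return max(points, key=lambda p: p[1] - p[0])

# Hypothetical data: 1 = eruption on the eighth day, 0 = no eruption.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.70, 0.75, 0.90]

points = roc_points(labels, scores)
area = auc(points)
fpr, tpr, threshold = youden_cutoff(points)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation; sensitivity is the true positive rate `tpr`, and specificity is `1 - fpr`.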
4.4.3 Muographic Image Processing With Convolutional Neural Network
Applying a series of convolutional layers reveals the hidden features of images on a layer‐by‐layer basis, and the fully connected neuron layers can then process the extracted features to predict the eruptions. A schematic drawing of the applied CNN model is shown in Fig. 4.7. The input convolutional layer was fed with seven consecutive daily muographic images extracted from the three investigated angular regions. The output layer was a single neuron with sigmoid activation that provided the probability of a volcanic eruption on the eighth day. The hidden layers consisted of convolutional layers and one fully connected layer. The filter size was set to 3 × 3 for each convolutional layer to scan the input muographic images. ReLU activation functions were applied after the input and hidden layers. The CNN was trained using the Adam method.
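The operation performed by one 3 × 3 filter followed by ReLU can be sketched in plain Python. The toy image and the filter values are hypothetical, since in the CNN the filter weights are learned during training. The example also uses a single input channel and no padding; how the seven daily images are stacked and padded is not specified in the text, so both choices are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding.

    This is cross-correlation, which is what deep-learning libraries
    conventionally call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu_map(feature_map):
    """Apply ReLU element-wise to a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy 4 x 4 "image": a bright frame around a dark centre (made-up values).
image = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
# Hypothetical 3 x 3 filter that responds to dark-to-bright transitions from left to right.
kernel = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]
feature_map = relu_map(conv2d_valid(image, kernel))
```

The filter responds positively where the image brightens from left to right, and ReLU sets the negative responses to zero, so only the right‐hand column of the 2 × 2 feature map is non‐zero.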
The hyperparameters of the CNN model were tuned with Bayesian optimization. The tuning was performed with 100 epochs, and the early stopping patience was set to 10. The tuned hyperparameters and their ranges were the following: number of convolutional layers (1–4), number of filters on the convolutional layers (2²–2⁸), number of neurons on the fully connected layer (2²–2⁸), dropout (0.2–0.7), batch size (2¹–2⁷), learning rate (10⁻⁶–10⁻²), and decay rate (0.9–0.99). The optimal parameters were those that gave the best AUC of the ROC curves. Table 4.1 summarizes the optimal CNN hyperparameters and lists the results of the ROC analysis (Fig. 4.8). The fine‐tuned CNN model achieved moderate performance in forecasting impending eruptions of the Minamidake crater, with the largest AUC score of 0.761 and the highest sensitivity of 0.737. Although the CNN achieved the highest specificity with images captured through the Surface region, the sensitivity was the smallest in this region, in which volcanological processes did not occur before the eruptions. It is worth noting that the CNN is considered a black‐box function; thus, it is not known how the features are extracted or what the features are. Furthermore, no intuitive explanation can be given for the number of convolutional layers and the other parameters. Interpretable ML is expected to overcome these limitations of the CNN and to explore the hidden features of muographic images (Rudin, 2019).
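The search space described above can be written down explicitly, as in the sketch below. It reads the filter, neuron, and batch size ranges as powers of two and the learning rate range as powers of ten, which is consistent with the values in Table 4.1. Plain random sampling stands in for Bayesian optimization, which would instead propose new candidates based on the AUC of earlier trials. The function name, the dictionary keys, and the log‐uniform sampling of the learning rate are assumptions.

```python
import random

def sample_config(rng):
    """Draw one candidate hyperparameter set from the stated ranges."""
    n_conv = rng.randint(1, 4)                                      # 1-4 convolutional layers
    return {
        "conv_layers": n_conv,
        "filters": [2 ** rng.randint(2, 8) for _ in range(n_conv)],  # 4-256 per layer
        "fc_neurons": 2 ** rng.randint(2, 8),                        # 4-256
        "dropout": rng.uniform(0.2, 0.7),
        "batch_size": 2 ** rng.randint(1, 7),                        # 2-128
        "learning_rate": 10 ** rng.uniform(-6, -2),                  # log-uniform (assumed)
        "decay_rate": rng.uniform(0.9, 0.99),
    }

rng = random.Random(42)  # fixed seed so the draw is reproducible
candidates = [sample_config(rng) for _ in range(5)]
```

In a full tuning loop each candidate would be trained with early stopping and scored by the AUC of its ROC curve, and the configuration with the best AUC would be kept.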
Figure 4.7 Schematic diagram of muographic image processing with a convolutional neural network. The input of the CNN is fed with seven daily muon flux images. The hidden features of the muographic images are extracted by a series of convolutional layers activated by ReLU. A fully connected (FC) layer of neurons is responsible for the classification of eruptions. The result of the classification is extracted from a single output neuron activated by a sigmoid function on an eruption‐by‐eruption basis.
Table 4.1 The tuned hyperparameters of the convolutional neural network and the results of the tests with receiver operating characteristic analyses. The “–” signs denote missing values for the number of filters where the corresponding convolutional layer is absent.
Region | Minamidake | Showa | Surface |
---|---|---|---|
Convolutional Layers | 2 | 2 | 3 |
Filters on 1st Conv. Layer | 16 | 64 | 8 |
Filters on 2nd Conv. Layer | 64 | 32 | 8 |
Filters on 3rd Conv. Layer | – | – | 4 |
Neurons on FC Layer | 32 | 128 | 32 |
Dropout | 0.215 | 0.313 | 0.332 |
Batch Size | 16 | 8 | 32 |
Learning Rate |