Machine Learning Algorithms and Applications. Group of authors
1. K-Means Clustering Outcomes: As explained in the methodology section, we applied K-means clustering to group our unlabeled data into clusters that serve as classes. To find the optimal number of clusters, the Silhouette coefficient was calculated. It is computed for each sample from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b), where b is the mean distance between the sample and the members of the nearest cluster that the sample is not a part of. The Silhouette coefficient for a sample is (b - a)/max(a, b). For our experiments, we set the number of clusters to 7 using the Elbow method. After clustering, the clustered data were assigned air quality labels using the AQI table. The required range for the different air control parameters is shown in Table 1.1.
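As an illustration of the per-sample formula (b - a)/max(a, b), the following is a minimal sketch in plain Python. The function names and the four toy points are assumptions made for this example and are not taken from the study's data; Euclidean distance is also assumed, and at least two clusters must be present.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def silhouette_sample(index, points, labels):
    """Silhouette coefficient (b - a) / max(a, b) for one sample."""
    own_label = labels[index]
    # Collect distances from this sample to every other sample, per cluster.
    distances = {}
    for j, (point, label) in enumerate(zip(points, labels)):
        if j != index:
            distances.setdefault(label, []).append(euclidean(points[index], point))
    if own_label not in distances:
        # Sample is alone in its cluster: use the common convention of 0.
        return 0.0
    # a: mean distance to the other members of the sample's own cluster.
    a = sum(distances[own_label]) / len(distances[own_label])
    # b: smallest mean distance to the members of any other cluster.
    b = min(sum(d) / len(d) for label, d in distances.items() if label != own_label)
    return (b - a) / max(a, b)


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all samples."""
    return sum(silhouette_sample(i, points, labels) for i in range(len(points))) / len(points)


# Toy data (illustrative only): two tight groups that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = mean_silhouette(toy_points, [0, 0, 1, 1])  # groups match clusters
bad_score = mean_silhouette(toy_points, [0, 1, 0, 1])   # groups are mixed up
print(round(good_score, 3), round(bad_score, 3))
```

A labeling that matches the natural groups scores close to 1, while a labeling that mixes them scores below 0. Comparing the mean score across candidate cluster counts is one way to use the coefficient when choosing the number of clusters.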
We worked on six parameters, namely, NO2, O3, PM10, PM2.5, SO2, and CO. K-means clustering was applied first.
2. SVM outcomes: The data values (1,870) were divided into training and testing sets, with 80% used for training and 20% for testing. The SVM was trained on the clustered data against the air quality labels so that air quality could be determined from the values of all parameters. The scikit-learn (sklearn) library was used for this [14]. The SVM was tuned using GridSearchCV with 10-fold cross-validation (k = 10). Results on the 374 test samples are shown in Table 1.2. The best parameter set found was {C: 0.1, gamma: 0.001, kernel: linear}.
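The split and cross-validated search described above could be set up roughly as follows. This is only a sketch: the data are synthetic blobs generated to mimic the shape of the problem (1,870 samples, six features, seven classes), and the candidate values in `param_grid` are assumptions, since the text reports only the best parameter set and not the grid that was searched.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the clustered data (NOT the study's measurements):
# 1,870 samples, 6 pollutant-like features, 7 cluster-derived classes.
X, y = make_blobs(n_samples=1870, n_features=6, centers=7,
                  cluster_std=2.0, random_state=0)

# 80% training / 20% testing, which leaves 374 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hypothetical search grid; the source gives only the winning combination.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.001, 0.01],
    "kernel": ["linear", "rbf"],
}

# cv=10 requests 10-fold cross-validation for every parameter combination.
search = GridSearchCV(SVC(), param_grid, cv=10)
search.fit(X_train, y_train)

# Accuracy of the refitted best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The best parameters found on synthetic data will generally differ from those reported in the text. Note that `gamma` has no effect when the kernel is linear, so those grid entries only repeat the same fit.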
3. LSTM outcomes: To build the LSTM models, we trained a model for each of 14 different places in India, namely, Visakhapatnam (GVMC Ram Nagar), Ajmer (Civil Lines), Alwar, Vasundhara (Ghaziabad), Gurgaon (Vikas Sadan), Bandra (Maharashtra), Bhiwadi Industrial Area, Bengaluru (BWSSB Kadabesanaha), Amritsar (Golden Temple), Anand Vihar, R K Puram, Punjabi Bagh, NSIT (Dwarka), and Sector 62 Noida. Five thousand samples were used for training and 500 samples for testing each model.
Each model had different values for hyperparameters such as kernel initializer, batch size, and number of epochs, chosen during hyperparameter tuning. We used the Keras library in Python [15]. Performance was evaluated with two metrics: Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Table 1.3 shows the MAE and RMSE values obtained. MAE is calculated as $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - x_i|$, and RMSE as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - x_i)^2}$, where $y_i$ is the predicted value, $x_i$ is the actual value, and $n$ is the number of samples.
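The two metrics can be checked by hand with a few lines of plain Python. The sketch below follows the definitions above, with y as the predicted value and x as the actual value; the four example numbers are made up for illustration and are not results from the study.

```python
import math


def mean_absolute_error(predicted, actual):
    """MAE: mean of |y - x| over all samples."""
    n = len(actual)
    return sum(abs(y - x) for y, x in zip(predicted, actual)) / n


def root_mean_square_error(predicted, actual):
    """RMSE: square root of the mean of (y - x)^2 over all samples."""
    n = len(actual)
    return math.sqrt(sum((y - x) ** 2 for y, x in zip(predicted, actual)) / n)


# Illustrative values only: the errors are 2, -2, 1, and -4.
predicted = [52.0, 48.0, 61.0, 55.0]
actual = [50.0, 50.0, 60.0, 59.0]

mae = mean_absolute_error(predicted, actual)      # (2 + 2 + 1 + 4) / 4 = 2.25
rmse = root_mean_square_error(predicted, actual)  # sqrt((4 + 4 + 1 + 16) / 4) = 2.5
print(mae, rmse)
```

Because the errors are squared before averaging, RMSE weights the single large error (4) more heavily than MAE does, which is why it comes out larger here.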
Figure 1.4 shows the predicted values for Bengaluru at the present hour as well as for 2 days 3 hours after 13th December, 2017. Figure 1.5 shows the predicted values for 2 days 3 hours after 6th June, 2020. We observed that, on average, Bengaluru is a cleaner city than the other cities, even during November and December. This could be due to rainy weather: Bengaluru gets rain almost every day, which washes down the majority of air pollutants and thus results in reduced air pollution.
Figure 1.6 shows the predicted values at the present hour and for the next 1 day 3 hours for Anand Vihar, New Delhi, after 13th December, 2017. New Delhi suffers from heavy pollution, and the observed air quality was therefore very poor. The PM2.5 level remains high, making the air not only toxic but also likely to cause breathing problems. We have also generated an advisory for the users of the app. Figure 1.7 shows the predicted values for 1 day 3 hours for Anand Vihar, New Delhi, after 6th June, 2020. It can clearly be seen that pollution levels have reduced drastically and air quality has improved due to the imposed lockdown, as there is less traffic and lower industrial emissions.
The experiments were performed for batch sizes of 10, 24, 15, 8, and 6 with 10 and 100 epochs. The MAE scores for the LSTM hyperparameters for NO2, O3, PM10, PM2.5, and SO2 are shown in Table 1.4. After careful analysis of these scores, we zeroed in on the batch size with minimum bias.
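The search over batch sizes and epochs can be organized as a small grid, sketched below. The `evaluate` callable is a hypothetical stand-in: in practice it would train a Keras LSTM with the given settings and return its MAE, but here a dummy function with an artificial score is used so that the sketch runs on its own. Selecting the configuration with the lowest MAE is an assumption about how the scores were compared.

```python
from itertools import product

# Batch sizes and epoch counts taken from the experiments described above.
BATCH_SIZES = (10, 24, 15, 8, 6)
EPOCHS = (10, 100)


def grid_search(evaluate, batch_sizes=BATCH_SIZES, epochs=EPOCHS):
    """Score every (batch_size, epochs) pair and return the lowest-scoring one.

    `evaluate` is a caller-supplied function (hypothetical here) that takes
    batch_size and epochs and returns an error score such as MAE.
    """
    scores = {
        (batch_size, n_epochs): evaluate(batch_size, n_epochs)
        for batch_size, n_epochs in product(batch_sizes, epochs)
    }
    best = min(scores, key=scores.get)
    return best, scores


def dummy_evaluate(batch_size, n_epochs):
    """Artificial score for demonstration only; NOT a real training result."""
    return abs(batch_size - 15) + 1.0 / n_epochs


best_config, all_scores = grid_search(dummy_evaluate)
print(best_config, len(all_scores))
```

With five batch sizes and two epoch settings, the grid contains ten configurations per pollutant model.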
4. Data Visualization: One of the main objectives of the project was to provide better visualizations for the general public, who may not be able to interpret the relations between the values of different air pollutants. We therefore generated Heat Maps of the different parameters. Individual Heat Maps for each parameter as well as combined Heat Maps for all parameters have been provided.
Figure 1.8 shows the Heat Map for ozone (O3) for 12th and 13th December, 2017. From the map, we could observe that O3 fluctuates most between day and night intervals. O3 levels fall at midnight and are very high on the evening of 13th December. This could be due to heavy vehicular traffic during evening hours. Figure 1.9 shows the Heat Map for O3 for 6th to 8th June, 2020, which clearly shows reduced O3 levels during a period of less vehicular traffic and reduced industrial emissions.
Figure 1.10 shows the Heat Map for all the parameters for 11th, 12th, and 13th December, 2017, at Sector 62, Noida. From the Heat Maps it could be observed that PM2.5 is the main pollution-causing parameter in the air. It could also be observed that it remains at dangerous levels on all days, during the day as well as at night. Figure 1.11 shows the Heat Map for all the parameters for 6th, 7th, and 8th June, 2020, at Sector 62, Noida. The reduced levels of all pollutants as a result of the imposed lockdown can clearly be seen from the Heat Map. However, PM2.5 still remains the top contributing factor toward pollution in the area.
Figure 1.12 shows the predicted values of O3 for Anand Vihar, New Delhi, in December, 2017, and a decline in O3 levels can be observed. Figure 1.13 shows the predicted values of PM10 for Sector 62, Noida, in June, 2020, and a decline in PM10 levels can be observed.
The quality of air as shown in