Rank-Based Methods for Shrinkage and Selection. A. K. Md. Ehsanes Saleh


List of Figures

1 Chapter 1
Figure 1.1 Four plots using different versions of the telephone data set with fitted lines.
Figure 1.2 Histograms and ordered residual plots of LS and Theil estimators.
Figure 1.3 Effect of a single outlier on LS and rank estimators.
Figure 1.4 Gradients of absolute value (Bn′(θ)) and dispersion (Dn′(θ)) functions.
Figure 1.5 Scoring functions ϕ(u) = √12(u − 0.5) and ϕ+(u) = √3u.
Figure 1.6 Dispersion functions and derivative plots for 1.1(d).
Figure 1.7 Key shrinkage characteristics of LASSO and ridge.
Figure 1.8 Geometric interpretation of ridge.
Figure 1.9 Geometric interpretation of LASSO.

2 Chapter 2
Figure 2.1 The first-order nature of shrinkage due to ridge.
Figure 2.2 Two outliers found in the Q–Q plot for the Swiss data set.
Figure 2.3 Sampling distributions of rank estimates.
Figure 2.4 Shrinkage of β₅ due to increase in ridge tuning parameter, λ₂.
Figure 2.5 Ridge traces for orthonormal, diagonal, LS, and rank estimators (m = 40).
Figure 2.6 MSE derivative plot to find optimal λ₂ for the diagonal case.
Figure 2.7 Bias, variance and MSE for the Swiss data set...
Figure 2.8 MSE for training, CV and test sets, and coefficients from the ridge trace.
Figure 2.9 The first-order nature of shrinkage due to LASSO.
Figure 2.10 Diamond-warping effect of weights in the aLASSO estimator for p = 2.
Figure 2.11 Comparison of LASSO and aLASSO traces for the Swiss data set.
Figure 2.12 Variable ordering from R-LASSO and R-aLASSO traces for the Swiss data set.
Figure 2.13 Ranked residuals of the diabetes data set. (Source: Rfit() package in R.)
Figure 2.14 Rank-aLASSO trace of the diabetes data set showing variable importance.
Figure 2.15 Diabetes data set showing variable ordering and adjusted R² plot.
Figure 2.16 Rank-aLASSO cleaning followed by rank-ridge estimation.
Figure 2.17 R-ridge traces and CV scheme with optimal λ₂.
Figure 2.18 MSE and MAE plots for five-fold CV scheme producing similar optimal λ₂.
Figure 2.19 LS-Enet traces for α = 0.0, 0.2, 0.4, 0.8, 1.0.
Figure 2.20 LS-Enet traces and five-fold CV results for α = 0.6 from glmnet().

3 Chapter 3
Figure 3.1 Key shrinkage R-estimators to be considered.
Figure 3.2 The ADRE of the shrinkage R-estimator using the optimal c and URE.
Figure 3.3 The ADRE of the preliminary test (or hard threshold) R-estimator for different Δ² based on λ*=2ln(2).
Figure 3.4 The ADRE of nEnet R-estimators.
Figure 3.5 Figure of the ADRE of all R-estimators for different Δ².

4 Chapter 4
Figure 4.1 Boxplot and Q–Q plot using ANOVA table data.
Figure 4.2 LS-ridge and ridge R traces for fertilizer problem from ANOVA table data.
Figure 4.3 LS-LASSO and LASSOR traces for the fertilizer problem from the ANOVA table data.
Figure 4.4 Effect of variance on shrinkage using ridge and LASSO traces.
Figure 4.5 Hard threshold and positive-rule Stein–Saleh traces for ANOVA table data.

5 Chapter 8
Figure 8.1 Left: the Q–Q plot for the diabetes data set; Right: the distribution of the residuals.

6 Chapter 11
Figure 11.1 Sigmoid function.
Figure 11.2 Outlier in the context of logistic regression.
Figure 11.3 LLR vs. RLR with one outlier.
Figure 11.4 LLR vs. RLR with no outliers.
Figure 11.5 LLR vs. RLR with two outliers.
Figure 11.6 Binary classification – nonlinear decision boundary.
Figure 11.7 Binary classification comparison – nonlinear boundary.
Figure 11.8 Ridge comparison of number of correct solutions with n = 337.
Figure 11.9 LLR-ridge regularization showing the shrinking decision boundary.
Figure 11.10 LLR, RLR and SVM on the circular data set with mixed outliers.
Figure 11.11 Histogram of passengers: (a) age and (b) fare.
Figure 11.12 Histogram of residuals associated with the null, LLR, RLR, and SVM cases for the Titanic data set. SVM probabilities were extracted from the sklearn.svm package.
Figure 11.13 RLR-ridge trace for Titanic data set.
Figure 11.14 RLR-LASSO trace for the Titanic data set.
Figure 11.15 RLR-aLASSO trace for the Titanic data set.

7 Chapter 12
Figure 12.1 Computational unit (neuron) for neural networks.
Figure 12.2 Sigmoid and relu activation functions.
Figure 12.3 Four-layer neural network.
Figure 12.4 Neural network example of back propagation.
Figure 12.5 Forward propagation matrix and vector operations.
Figure 12.6 ROC curve and random guess classifier line based on the RLR classifier on the Titanic data...
Figure 12.7 Neural network architecture for the circular data set.
Figure 12.8 LNNs and RNNs on the circular data set (n = 337) with nonlinear decision boundaries.
Figure 12.9 Convergence plots for LNNs and RNNs for the circular data set.
Figure 12.10 ROC plots for LNNs and RNNs for the circular data set.
Figure 12.11 Typical setup for supervised learning methods. The training set is used to build the model.
Figure 12.12 Examples from test data set with cat = 1, dog = 0.
Figure 12.13 Unrolling of an RGB image into a single vector.
Figure 12.14 Effect of over-fitting, under-fitting and regularization.
Figure 12.15 Convergence plots for LNNs and RNNs (test size = 35).
Figure 12.16 ROC plots for LNNs and RNNs (test size = 35).
Figure 12.17 Ten representative images from the MNIST data set.
Figure 12.18 LNN and RNN convergence traces – loss vs. iterations (×100).
Figure 12.19 Residue histograms for LNNs (0 outliers) and RNNs (50 outliers).
Figure 12.20 These are 49 potential outlier images reported by RNNs.
Figure 12.21 LNN (0 outliers) and RNN (144 outliers) residue histograms.
Figure 12.22 LNN and RNN confusion matrices and MCC scores.

List of Tables

1 Chapter 1
Table 1.1 Comparison of mean and median on three data sets.
Table 1.2 Examples comparing order and rank statistics.
Table 1.3 Belgium telephone data set.
Table 1.4 Comparison of LS and Theil estimations...
Table 1.5 Walsh averages for the set {0.1, 1.2, 2.3, 3.4, 4.5, 5.0, 6.6, 7.7, 8.8, 9.9, 10.5}.
Table 1.6 The individual terms that are summed in Dn(β) and Ln(β) for the telephone data set.
Table 1.7 The terms that are summed in Dn(θ) and Ln(θ) for the telephone data set.
Table 1.8 The LS and R estimations of slope and intercept...
Table 1.9 Interpretation of L₁/L₂ loss and penalty functions.

2 Chapter 2
Table 2.1 Swiss fertility data set.
Table 2.2 Swiss fertility data set definitions.
Table 2.3 Swiss fertility estimates and standard errors for least squares (LS) and rank (R).
Table 2.4 Swiss data subset ordering using |t.value|.
Table 2.5 Swiss data models with adjusted R² values.
Table 2.6 Estimates with outliers from diabetes data before standardization.
Table 2.7 Estimates, MSE and MAE for the diabetes data.
Table 2.8 Enet estimates, training MSE and test MSE as a function of α for the diabetes data.

3 Chapter 3
Table 3.1 The ADRE values of ridge for different values of Δ².
Table 3.2 Maximum and minimum guaranteed ADRE of the preliminary test R-estimator for different values of α.
Table 3.3 The ADRE values of the Saleh-type R-estimator for λmax*=2π and different Δ².
Table 3.4 The ADRE values of the positive-rule Saleh-type R-estimator for λmax*=2π and different Δ².
Table 3.5 The ADRE of all R-estimators for different Δ².

4 Chapter 4
Table 4.1 Table of (hypothetical) corn crop yield from six different fertilizers.
Table 4.2 Table of p-values from pairwise comparisons of fertilizers.

5 Chapter 8
Table 8.1 The VIF values of the diabetes data set.
Table 8.2 Estimations for the diabetes data*. (The numbers in parentheses are the corresponding standard errors.)

6 Chapter 11
Table 11.1 LLR algorithm.
Table 11.2 RLR algorithm.
Table 11.3 Car data set.
Table 11.4 Ridge accuracy vs. λ₂ with n = 337 (six outliers).
Table 11.5 RLR-LASSO estimates vs. λ₁ with number of correct predictions.
Table 11.6 Sample of Titanic training data.
Table 11.7 Specifications for the Titanic data set.
Table 11.8 Number of actual data entries in each column.
Table 11.9 Cross-tabulation of survivors based on sex.
Table 11.10 Cross-tabulation using Embarked for the Titanic data set.
Table 11.11 Sample of Titanic numerical training data.
Table 11.12 Number of correct predictions for Titanic training and test sets.
Table 11.13 Train/test set accuracy for LLR-ridge. Optimal value at (*).
Table 11.14 Train/test set accuracy for RLR-ridge. Optimal value at (*).
Table 11.15 Train/test set accuracy for LLR-LASSO. Optimal value at (*).
Table 11.16 Train/test set accuracy for RLR-LASSO. Optimal value at (*).

7 Chapter
