Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic


$$
\begin{aligned}
L &= \sum_{j=1}^{N} \sum_{g=1}^{G} y_{gj} \ln\!\left( \frac{e^{X_j B_g}}{\sum_{s=1}^{G} e^{X_j B_s}} \right) \\
  &= \sum_{j=1}^{N} \left[ \sum_{g=1}^{G} y_{gj} X_j B_g - \ln\!\left( \sum_{g=1}^{G} e^{X_j B_g} \right) \right]
\end{aligned}
$$

Maximum likelihood estimates of the β’s are those values that maximize this log likelihood equation. This is accomplished by calculating the partial derivatives and setting them to zero. These equations are $\partial L / \partial \beta_{gk} = \sum_{j=1}^{N} x_{kj}\,(y_{gj} - \pi_{gj}) = 0$ for g = 1, 2, …, G and k = 1, 2, …, p, where $\pi_{gj} = e^{X_j B_g} / \sum_{s=1}^{G} e^{X_j B_s}$ is the probability of group g for observation j. Since all coefficients are zero for g = 1, the effective range of g is from 2 to G.
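As a minimal sketch of the log likelihood and its partial derivatives, the following plain-Python fragment evaluates both for a toy multinomial data set. The function names (`probabilities`, `log_likelihood`, `score`), the data, and the coefficient values are illustrative assumptions, not taken from the text; group 1 (index 0) is the reference group with all coefficients fixed at zero.

```python
import math

def probabilities(x_j, B):
    """pi_gj for one observation: exp(X_j B_g) / sum_s exp(X_j B_s)."""
    scores = [sum(x * b for x, b in zip(x_j, B_g)) for B_g in B]
    denom = sum(math.exp(s) for s in scores)
    return [math.exp(s) / denom for s in scores]

def log_likelihood(X, Y, B):
    """L = sum_j [ sum_g y_gj X_j B_g - ln(sum_g exp(X_j B_g)) ]."""
    total = 0.0
    for x_j, y_j in zip(X, Y):
        scores = [sum(x * b for x, b in zip(x_j, B_g)) for B_g in B]
        total += sum(y * s for y, s in zip(y_j, scores))
        total -= math.log(sum(math.exp(s) for s in scores))
    return total

def score(X, Y, B, g, k):
    """dL/dbeta_gk = sum_j x_kj (y_gj - pi_gj)."""
    return sum(x_j[k] * (y_j[g] - probabilities(x_j, B)[g])
               for x_j, y_j in zip(X, Y))

# Hypothetical data: N = 4 observations, intercept plus one predictor, G = 3 groups.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]      # one-hot y_gj
B = [[0.0, 0.0], [0.4, -0.7], [-0.2, 0.9]]            # B[0] is the reference group

print("log likelihood:", log_likelihood(X, Y, B))
print("dL/dbeta for g=2, k=1:", score(X, Y, B, 1, 1))
```

At the maximum likelihood estimate every value returned by `score` for g = 2, …, G would be zero; at the arbitrary coefficients used here it is not.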

Because of the nonlinear nature of the parameters, there is no closed‐form solution to these equations, and they must be solved iteratively. The Newton–Raphson [4–7] method is used to solve these equations. This method makes use of the information matrix, I(β), which is formed from the matrix of second partial derivatives.

The elements of the information matrix are given by

$$
\frac{\partial^2 L}{\partial \beta_{gk}\,\partial \beta_{gk'}} = -\sum_{j=1}^{N} x_{kj}\, x_{k'j}\, \pi_{gj}\,(1 - \pi_{gj})
$$

and

$$
\frac{\partial^2 L}{\partial \beta_{gk}\,\partial \beta_{g'k'}} = \sum_{j=1}^{N} x_{kj}\, x_{k'j}\, \pi_{gj}\, \pi_{g'j}, \qquad g \neq g',
$$

where g and g′ index groups, k and k′ index independent variables, j indexes observations, and π_gj is the probability of group g for observation j.

The information matrix is used because the asymptotic covariance matrix of the maximum likelihood estimates is equal to the inverse of the information matrix. That is, $V(\hat{\beta}) = I(\beta)^{-1}$. This covariance matrix is used in the calculation of confidence intervals for the regression coefficients, odds ratios, and predicted probabilities.
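The iteration and the resulting covariance matrix can be sketched for the simplest case, a binary response (G = 2) with an intercept and one predictor, so that the information matrix is 2 × 2 and can be inverted by hand. This is an illustrative sketch under those assumptions, not the implementation used in the cited references; the function names and the data are hypothetical, and the information matrix is taken here as the negative of the matrix of second partial derivatives.

```python
import math

def score_and_information(b0, b1, xs, ys):
    """Score vector and information matrix for ln(p/(1-p)) = b0 + b1*x."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        g0 += y - p                 # derivative with respect to b0
        g1 += x * (y - p)           # derivative with respect to b1
        w = p * (1.0 - p)           # weight pi * (1 - pi)
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)

def invert_2x2(i00, i01, i11):
    """Inverse of the symmetric matrix [[i00, i01], [i01, i11]]."""
    det = i00 * i11 - i01 * i01
    return i11 / det, -i01 / det, i00 / det

def fit_newton_raphson(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson updates: beta <- beta + I(beta)^-1 * score(beta)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), info = score_and_information(b0, b1, xs, ys)
        c00, c01, c11 = invert_2x2(*info)
        step0 = c00 * g0 + c01 * g1
        step1 = c01 * g0 + c11 * g1
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Covariance estimate: inverse information evaluated at the final estimate.
    _, info = score_and_information(b0, b1, xs, ys)
    return (b0, b1), invert_2x2(*info)

# Hypothetical, non-separable data.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
(b0_hat, b1_hat), (v00, v01, v11) = fit_newton_raphson(xs, ys)
print("estimates:", b0_hat, b1_hat)
print("standard errors:", math.sqrt(v00), math.sqrt(v11))
```

The square roots of the diagonal entries of the inverse information matrix serve as the standard errors that enter the confidence intervals mentioned above.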

The interpretation of the estimated regression coefficients is not straightforward. In logistic regression, not only is the relationship between X and Y nonlinear, but also, if the dependent variable has more than two unique values, there are several regression equations. Consider the usual case of a binary dependent variable, Y, and a single independent variable, X. Assume that Y is coded so it takes on the values 0 and 1. In this case, the logistic regression equation is ln(p/(1 − p)) = β₀ + β₁X. Now consider the impact of a unit increase in X. The logistic regression equation becomes ln(p′/(1 − p′)) = β₀ + β₁(X + 1) = β₀ + β₁X + β₁. We can isolate the slope by taking the difference between these two equations. We have

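$$
\beta_1 = \ln\!\left(\frac{p'}{1-p'}\right) - \ln\!\left(\frac{p}{1-p}\right) = \ln\!\left(\frac{p'/(1-p')}{p/(1-p)}\right) = \ln\!\left(\frac{\text{odds}'}{\text{odds}}\right) \tag{2.9}
$$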

That is, β₁ is the log of the ratio of the odds at X + 1 and X. Removing the logarithm by exponentiating both sides gives $e^{\beta_1} = \text{odds}'/\text{odds}$. The regression coefficient β₁ is therefore interpreted as the log of the odds ratio comparing the odds after a one-unit increase in X to the original odds. Note that although this odds ratio is the same at every X, the corresponding change in probability depends on the particular value of X, since the probability values, p and p′, vary with X.
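A short numerical illustration may help here. The sketch below assumes arbitrary coefficient values (β₀ = −1, β₁ = 0.5, chosen only for illustration) and compares the odds at X and X + 1 for several X; the helper names are hypothetical.

```python
import math

def probability(b0, b1, x):
    """p from the logistic equation ln(p/(1-p)) = b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    return p / (1.0 - p)

def odds_ratio(b0, b1, x):
    """Odds at x + 1 divided by odds at x."""
    return odds(probability(b0, b1, x + 1.0)) / odds(probability(b0, b1, x))

b0, b1 = -1.0, 0.5   # illustrative values, not estimates from any data set
for x in (0.0, 2.0, 5.0):
    p = probability(b0, b1, x)
    p_next = probability(b0, b1, x + 1.0)
    print(f"X={x}: p={p:.4f}, p'={p_next:.4f}, "
          f"p'-p={p_next - p:.4f}, odds ratio={odds_ratio(b0, b1, x):.4f}")
print("exp(b1) =", math.exp(b1))
```

The odds ratio column matches exp(β₁) at every X, while the probability difference p′ − p changes from row to row.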

Inferences about individual regression coefficients, groups of regression coefficients, goodness of fit, mean responses, and predictions of group membership of new observations are all of interest. These inference procedures can be treated by considering hypothesis tests and/or confidence intervals. The inference procedures in logistic regression rely on large sample sizes for accuracy. Two procedures are available for testing the significance of one or more independent variables in a logistic regression: likelihood ratio tests and Wald tests. Simulation studies usually show that the likelihood ratio test performs better than the Wald test. However, the Wald test is still used to test the significance of individual regression coefficients because of its ease of calculation.

The likelihood ratio test statistic is −2 times the difference between the log likelihoods of two models, one of which is a subset of the other. It is defined as LR = −2[L_subset − L_full] = −2 ln(l_subset/l_full), where L denotes a log likelihood and l the corresponding likelihood. When the full model in the likelihood ratio test statistic is the saturated model, LR is referred to as the deviance. A saturated model is one that includes all possible terms (including interactions) so that the predicted values from the model equal the original data. The formula for the deviance is D = −2[L_Reduced − L_Saturated]. The deviance may be calculated directly using the formula for the deviance residuals:

$$
D = 2 \sum_{j=1}^{J} \sum_{g=1}^{G} w_{gj} \ln\!\left( \frac{w_{gj}}{n_j\, p_{gj}} \right) \tag{2.10}
$$

This expression may be used to calculate the log likelihood of the saturated model without actually fitting a saturated model. The formula is L_Saturated = L_Reduced + D/2.
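The relationship between (2.10) and the two log likelihoods can be checked numerically. The sketch below assumes that w_gj is the number of observations in group g for covariate pattern j, that n_j is the total count for pattern j, and that p_gj is the fitted probability from the reduced model; these readings of the symbols, the counts, and the fitted probabilities are all assumptions made for illustration. Terms with w_gj = 0 are taken to contribute zero.

```python
import math

def deviance(w, p):
    """D = 2 * sum_j sum_g w_gj * ln(w_gj / (n_j * p_gj))."""
    total = 0.0
    for w_j, p_j in zip(w, p):
        n_j = sum(w_j)
        for w_gj, p_gj in zip(w_j, p_j):
            if w_gj > 0:                      # 0 * ln(0) is treated as 0
                total += w_gj * math.log(w_gj / (n_j * p_gj))
    return 2.0 * total

def log_likelihood(w, p):
    """Log likelihood kernel: sum_j sum_g w_gj * ln(p_gj)."""
    return sum(w_gj * math.log(p_gj)
               for w_j, p_j in zip(w, p)
               for w_gj, p_gj in zip(w_j, p_j) if w_gj > 0)

# Hypothetical counts: J = 3 covariate patterns, G = 2 groups.
w = [[3, 7], [5, 5], [8, 2]]
# Hypothetical fitted probabilities from some reduced model.
p_reduced = [[0.35, 0.65], [0.50, 0.50], [0.70, 0.30]]
# The saturated model reproduces the observed proportions w_gj / n_j.
p_saturated = [[w_gj / sum(w_j) for w_gj in w_j] for w_j in w]

D = deviance(w, p_reduced)
L_reduced = log_likelihood(w, p_reduced)
L_saturated = log_likelihood(w, p_saturated)
print("D from (2.10):          ", D)
print("-2[L_Reduced - L_Sat]:  ", -2.0 * (L_reduced - L_saturated))
print("L_Reduced + D/2:        ", L_reduced + D / 2.0)
```

The multinomial coefficients are omitted from `log_likelihood` because they are identical in both models and cancel in the difference.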

The deviance in logistic regression is analogous to the residual sum of squares in multiple regression. In fact, when the deviance is calculated in multiple regression, it is equal to the sum of the squared residuals. Deviance residuals, to be discussed later, may be squared and summed as an alternative way to calculate the deviance D.
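For two nested models, the likelihood ratio statistic LR = −2[L_subset − L_full] can be turned into a p-value. The sketch below assumes the standard large-sample result that LR is approximately chi-square distributed, and it handles only the case where the two models differ by one parameter (one degree of freedom), where the upper-tail probability reduces to erfc(√(x/2)). The log likelihood values are made up for illustration.

```python
import math

def likelihood_ratio(L_subset, L_full):
    """LR = -2 * (L_subset - L_full) for nested models."""
    return -2.0 * (L_subset - L_full)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical log likelihoods of a model without and with one extra variable.
L_subset, L_full = -52.3, -49.1
LR = likelihood_ratio(L_subset, L_full)
print("LR =", LR, "approximate p-value =", chi2_sf_1df(LR))
```

Because the saturated-model term cancels, the same statistic is obtained as the difference between the deviances of the two models.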

The change in deviance, ΔD, due to excluding (or including) one or more variables is used in logistic regression just as the partial F test is used in multiple regression. Many texts use the letter G to represent ΔD, but we have already used G to represent the number of groups in Y. Instead of using the F
