Sunday, August 25, 2019

Easy Binary Logistic Regression Interpretation in SPSS

What is binary logistic regression?
Binary logistic regression belongs to the family of logistic regression analyses in which the dependent (outcome) variable is binary and there are one or more nominal, ordinal, interval, or ratio-level independent variables. Like all regression analyses, logistic regression is a predictive analysis.
Binary logistic regression is used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables. In binary logistic regression, the log of the odds of the dependent variable is modeled as a linear combination of the independent variables. Log odds are an alternative way of expressing probabilities, which simplifies the process of updating them with new evidence.

What is the primary assumption when using binary logistic regression?
The dependent variable in binary logistic regression should be binary (dichotomous) in nature. Examples include pass/fail, alive/dead, male/female, yes/no, approved/disapproved, and with/without.

What kinds of studies use binary logistic regression?
*Predicting whether law graduates will pass the bar exam or not.
*Predicting whether teachers will use the new instructional media or not.
*Predicting whether the divorce bill in the Philippines will be approved or not.

Are there assumptions in logistic regression?
Logistic regression does not rely on several of the key assumptions of linear regression, but certain conditions must still be met before it is used. The assumptions for logistic regression are the following:
1. It does not require a linear relationship between the dependent and independent variables (it does, however, assume that continuous independent variables are linearly related to the log odds).
2. The error terms (residuals) do not need to be normally distributed.
3. Binary logistic regression requires a dependent variable to be binary and ordinal logistic regression requires the dependent variable to be ordinal.
4. Logistic regression requires observations to be independent of each other.
5. Logistic regression requires there to be little or no multicollinearity among the independent variables. 
6. Finally, logistic regression typically requires a large sample size.  A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model. For example, if you have 5 independent variables and the expected probability of your least frequent outcome is .10, then you would need a minimum sample size of 500 (10*5 / .10).
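
The sample size guideline in item 6 can be sketched in a few lines of Python. The function name `minimum_sample_size` and its parameters are assumptions made for illustration; only the rule of 10 cases per independent variable and the worked example (5 predictors, probability .10) come from the text above.

```python
import math

def minimum_sample_size(n_predictors, p_least_frequent, cases_per_predictor=10):
    """Rule-of-thumb minimum N: (cases per predictor * predictors) / probability
    of the least frequent outcome. Hypothetical helper, not an SPSS feature."""
    if not 0 < p_least_frequent <= 1:
        raise ValueError("p_least_frequent must be in (0, 1]")
    raw = cases_per_predictor * n_predictors / p_least_frequent
    # Round first so floating-point noise does not push the ceiling up by one.
    return math.ceil(round(raw, 9))

# Worked example from the text: 5 predictors, least frequent outcome at .10
print(minimum_sample_size(5, 0.10))
```

This is only a guideline; it does not replace a formal power analysis.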

Can I use Linear Regression when the dependent variable is binary?
No. Using simple or multiple linear regression with a binary dependent variable violates the assumptions of linear regression: with only two possible outcome values the residuals cannot be normally distributed, and the predicted values can fall outside the 0 to 1 range.

What is the formula for binary logistic regression?
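Binary logistic regression is similar to ordinary least squares (OLS) regression in that the predictors enter a linear equation, but the quantity being predicted is the log odds of the outcome. With four independent variables, the prediction equation is:

log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

where p is the probability that the dependent variable equals 1, b0 is the constant, and b1 to b4 are the coefficients of the independent variables x1 to x4.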
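
To make the link between probabilities and log odds concrete, here is a minimal Python sketch. The function names `log_odds` and `probability` are illustrative assumptions; the formulas are the standard logit and its inverse.

```python
import math

def log_odds(p):
    """Convert a probability (0 < p < 1) to log odds: log(p / (1 - p))."""
    return math.log(p / (1 - p))

def probability(logit):
    """Convert log odds back to a probability: 1 / (1 + e^(-logit))."""
    return 1 / (1 + math.exp(-logit))

# A probability of .50 corresponds to odds of 1 and log odds of 0.
print(log_odds(0.5))
# Converting to log odds and back recovers the original probability.
print(probability(log_odds(0.8)))
```

Because the two functions are inverses, any value produced by the right-hand side of the prediction equation can be turned back into a probability between 0 and 1.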

What are the steps in interpreting the Binary Logistic Regression result in an easy way?

Here are the steps in interpreting the results of binary logistic regression using SPSS.

Case Processing Summary
Unweighted Cases(a)                          N      Percent
Selected Cases     Included in Analysis      100    100.0
                   Missing Cases             0      .0
                   Total                     100    100.0
Unselected Cases                             0      .0
Total                                        100    100.0
a. If weight is in effect, see classification table for the total number of cases.

The first table above shows a breakdown of the number of cases used and not used in the analysis. Based on the table there are no missing cases in the dataset. 

Dependent Variable Encoding
Original Value     Internal Value
Will Not Adopt     0
Adopt              1
 
The second table above gives the coding for the outcome variable, adoption.   

Categorical Variables Codings
                                         Frequency    Parameter coding (1)
Farm_Size         3 Hectares or less     52           1.000
                  4 Hectares or more     48           .000
Farm_Locations    Rural                  48           1.000
                  Urban                  52           .000

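The table above shows how the categorical variables farm size and farm location were coded. Each enters the model as a dummy variable: farms of 3 hectares or less and rural farms are coded 1, while farms of 4 hectares or more and urban farms are coded 0 (the reference categories).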

Classification Table(a,b)
                                          Predicted Adoption
Observed                                  Will Not Adopt    Adopt    Percentage Correct
Step 0    Adoption    Will Not Adopt      54                0        100.0
                      Adopt               46                0        .0
          Overall Percentage                                         54.0

The block 0 output is for a model that includes only the intercept (constant). It is a null model, a model with no predictors. 


Variables in the Equation
                      B        S.E.     Wald     df    Sig.     Exp(B)
Step 0    Constant    -.160    .201     .639     1     .424     .852

The Exp(B) value of .852 is the predicted odds of adopting the farming technology under the null model. Since 46 of the farmers decided to adopt the technology and 54 decided not to, the observed odds are 46/54 = .852. The constant B = -.160 is the natural log of these odds.
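
The null-model odds can be checked by hand. The short Python sketch below uses only the counts reported in the Step 0 classification table (46 adopters, 54 non-adopters); the variable names are illustrative.

```python
import math

adopt, not_adopt = 46, 54        # observed counts from the Step 0 table

odds = adopt / not_adopt         # odds of adopting under the null model
constant = math.log(odds)        # the intercept B is the log of those odds

print(round(odds, 3))            # compare with Exp(B) in the table
print(round(constant, 3))        # compare with B in the table
```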

Variables not in the Equation
                                             Score      df    Sig.
Step 0    Variables    Awareness             58.237     1     .000
                       Farm_Locations(1)     31.975     1     .000
                       Farm_Size(1)          51.794     1     .000
          Overall Statistics                 70.867     3     .000

This table gives the results of a score test, also known as a Lagrange multiplier (LM) test, which tests a hypothesis about the parameters in a likelihood framework. The Lagrange multiplier test requires only the parameter estimates obtained under the restrictions (here, the null model), while Wald tests are based on unrestricted estimates.
The column labeled Score gives the estimated change in model fit if the term is added to the model; the other two columns give the degrees of freedom and the p-value (labeled Sig.) for the estimated change. Based on the table above, all three predictors (awareness, location, and size) are expected to improve the fit of the model.

Block 1: Method = Enter

Omnibus Tests of Model Coefficients
                    Chi-square    df    Sig.
Step 1    Step      94.698        3     .000
          Block     94.698        3     .000
          Model     94.698        3     .000

The table above gives the overall test for the model that includes the predictors. The chi-square value of 94.698 with 3 degrees of freedom and a p-value reported as .000 (that is, p < .001) tells us that our model as a whole fits significantly better than an empty model (i.e., a model with no predictors).


The Omnibus Tests of Model Coefficients is used to check that the new model (with explanatory variables included) is an improvement over the baseline model. It uses chi-square tests to see if there is a significant difference between the Log-likelihoods.
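
SPSS prints the p-value as .000 because it rounds to three decimals. As a rough check, the upper-tail probability of a chi-square statistic with 3 degrees of freedom has a closed form that can be evaluated with the Python standard library. The function name `chi2_sf_df3` is an assumption for illustration, and the formula below applies to 3 degrees of freedom only.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with df = 3.
    Closed form: erfc(sqrt(x/2)) + (2/sqrt(pi)) * sqrt(x/2) * exp(-x/2)."""
    z = x / 2
    return math.erfc(math.sqrt(z)) + 2 / math.sqrt(math.pi) * math.sqrt(z) * math.exp(-z)

p_value = chi2_sf_df3(94.698)    # model chi-square from the Omnibus table
print(p_value < 0.001)           # far below .001, hence displayed as .000
```

For other degrees of freedom a statistics library would be the more practical choice.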

Classification Table(a)
                                          Predicted Adoption
Observed                                  Will Not Adopt    Adopt    Percentage Correct
Step 1    Adoption    Will Not Adopt      50                4        92.6
                      Adopt               2                 44       95.7
          Overall Percentage                                         94.0
a. The cut value is .500
 
With the addition of the 3 predictors, 95.7 percent of the respondents who adopted and 92.6 percent of those who did not adopt the new farming technology were correctly predicted, giving an overall percentage of 94.0. This is considerably higher than the 54.0 percent achieved by the null model.
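
The percentages in the classification table follow directly from its four cell counts. A minimal Python sketch, with variable names chosen for illustration:

```python
# Cell counts from the Step 1 classification table
correct_not_adopt, wrong_as_adopt = 50, 4    # observed "Will Not Adopt" row
wrong_as_not_adopt, correct_adopt = 2, 44    # observed "Adopt" row

n_not_adopt = correct_not_adopt + wrong_as_adopt
n_adopt = wrong_as_not_adopt + correct_adopt
total = n_not_adopt + n_adopt

pct_not_adopt = 100 * correct_not_adopt / n_not_adopt
pct_adopt = 100 * correct_adopt / n_adopt
pct_overall = 100 * (correct_not_adopt + correct_adopt) / total
# The null model predicts the more frequent category for everyone.
pct_null = 100 * max(n_not_adopt, n_adopt) / total

print(round(pct_not_adopt, 1), round(pct_adopt, 1), round(pct_overall, 1), round(pct_null, 1))
```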

Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       43.291(a)            .612                    .818
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
The -2 Log Likelihood is 43.291. This statistic measures how poorly the model predicts the decisions; the smaller the value, the better the model. Adding the model chi-square to it (94.698 + 43.291 = 137.989) gives the -2 Log Likelihood of the model without the predictors. The null model therefore has a much higher value, meaning it predicts the decisions far more poorly than the model in this summary.
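
The -2 Log Likelihood of the null model can also be computed directly from the outcome counts, which gives an independent check on the 137.989 figure. The sketch below assumes only the 46 adopters and 54 non-adopters from the Step 0 table; variable names are illustrative.

```python
import math

adopt, not_adopt = 46, 54
n = adopt + not_adopt

# In the null model every case gets the same predicted probability: 46/100.
null_neg2ll = -2 * (adopt * math.log(adopt / n) + not_adopt * math.log(not_adopt / n))

model_neg2ll = 43.291                       # from the Model Summary table
model_chi_square = null_neg2ll - model_neg2ll

print(round(null_neg2ll, 3))
print(round(model_chi_square, 3))           # compare with the Omnibus chi-square
```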
The Cox & Snell R Square and Nagelkerke R Square are pseudo-R-squares. They approximate the proportion of variation in the outcome that the model explains. In this example, the full model explains about 61 to 82 percent of the variation in whether farmers will adopt the new farming technology, given the set of independent variables.
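
Both pseudo-R-squares can be reproduced from values already reported in the output, using their standard definitions. The sketch below takes n = 100 from the Case Processing Summary, the model chi-square of 94.698, and the null -2 Log Likelihood of 94.698 + 43.291; the variable names are assumptions for illustration.

```python
import math

n = 100                                  # cases included in the analysis
model_chi_square = 94.698                # from the Omnibus Tests table
null_neg2ll = 94.698 + 43.291            # -2 Log Likelihood of the null model

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written in terms of the chi-square
cox_snell = 1 - math.exp(-model_chi_square / n)
# Nagelkerke rescales Cox & Snell by its maximum attainable value
max_cox_snell = 1 - math.exp(-null_neg2ll / n)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Nagelkerke is always at least as large as Cox & Snell because Cox & Snell cannot reach 1, which is why the two values bracket the "61 to 82 percent" range.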


Variables in the Equation
                                  B         S.E.     Wald      df    Sig.    Exp(B)
Step 1(a)    Awareness            1.110     .265     17.575    1     .000    3.035
             Farm_Locations(1)    .163      .893     .033      1     .855    1.177
             Farm_Size(1)         -3.513    .891     15.551    1     .000    .030
             Constant             -2.944    1.192    6.100     1     .014    .053
a. Variable(s) entered on step 1: Awareness, Farm_Locations, Farm_Size.
 
       Based on the table, awareness and farm size significantly predict the likelihood that farmers will adopt the new farming technology (Wald = 17.575, p < .01 and Wald = 15.551, p < .01, respectively). Farm location is not a significant predictor (Wald = .033, p = .855).
       For farmers whose farm size is 3 hectares or less, the odds of adopting the new farming technology are .030 times the odds for farmers with 4 hectares or more, that is, about 97 percent lower. For each one-unit increase in awareness level (for example, from 1 to 2), the odds of adopting the new farming technology are multiplied by 3.035.
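
The fitted coefficients can be plugged into the prediction equation to estimate the probability of adoption for a given farmer. The sketch below is illustrative only: the function name `adoption_probability` and the awareness value of 3 are hypothetical, and the dummy coding (rural = 1, farm of 3 hectares or less = 1) follows the Categorical Variables Codings table.

```python
import math

# B values from the Step 1 "Variables in the Equation" table
B = {"Constant": -2.944, "Awareness": 1.110,
     "Farm_Locations(1)": 0.163, "Farm_Size(1)": -3.513}

def adoption_probability(awareness, rural, small_farm):
    """Predicted probability of adopting; rural and small_farm are 0/1 dummies."""
    logit = (B["Constant"] + B["Awareness"] * awareness
             + B["Farm_Locations(1)"] * rural + B["Farm_Size(1)"] * small_farm)
    return 1 / (1 + math.exp(-logit))

# Hypothetical rural farmers with awareness level 3, differing only in farm size
p_large = adoption_probability(awareness=3, rural=1, small_farm=0)
p_small = adoption_probability(awareness=3, rural=1, small_farm=1)
print(round(p_large, 3), round(p_small, 3))
```

With the cut value of .500 used by SPSS, the first farmer would be classified as "Adopt" and the second as "Will Not Adopt".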

B gives the coefficients of the logistic regression equation for predicting the dependent variable from the independent variables. They are expressed in log-odds units.

S.E. gives the standard errors associated with the coefficients.

Wald and Sig. columns provide the Wald chi-square value and 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0.  Coefficients having p-values less than alpha are statistically significant.  

df column lists the degrees of freedom for each of the tests of the coefficients.

Exp(B) are the odds ratios for the predictors. They are the exponentiation of the coefficients. 
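
The Exp(B) and Wald columns can be approximated from the B and S.E. columns: Exp(B) is e raised to B, and for a test with 1 degree of freedom the Wald statistic is (B / S.E.) squared. The Python sketch below uses the rounded values printed in the table, so small differences from the SPSS output are expected because SPSS works with unrounded estimates.

```python
import math

# (B, S.E.) pairs from the Step 1 "Variables in the Equation" table
coefficients = {
    "Awareness": (1.110, 0.265),
    "Farm_Locations(1)": (0.163, 0.893),
    "Farm_Size(1)": (-3.513, 0.891),
    "Constant": (-2.944, 1.192),
}

results = {}
for name, (b, se) in coefficients.items():
    odds_ratio = math.exp(b)       # Exp(B)
    wald = (b / se) ** 2           # Wald chi-square with 1 df
    results[name] = (odds_ratio, wald)
    print(f"{name}: Exp(B) = {odds_ratio:.3f}, Wald = {wald:.3f}")
```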





