©Research for Beginners: Easy Binary Logistic Regression Interpretation in SPSS

What is binary logistic regression?

Binary logistic regression belongs to the family of logistic regression analyses in which the dependent (outcome) variable is binary, that is, it has exactly two categories, and there are one or more nominal, ordinal, interval, or ratio-level independent variables. Like linear regression, logistic regression is a predictive analysis.

Binary logistic regression is used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables, which may be categorical or continuous (interval or ratio scale). In binary logistic regression, the log odds of the dependent variable are modeled as a linear combination of the independent variables. Log odds are an alternative way of expressing probabilities, which simplifies the process of updating them with new evidence.


What is the primary assumption when using binary logistic regression?

The dependent variable in binary logistic regression must be binary (dichotomous). Examples include pass/fail, alive/dead, male/female, yes/no, approved/disapproved, and with/without.

What kinds of studies use binary logistic regression?

*Predicting the likelihood that law graduates will pass the bar exam.

*Determining whether or not teachers will use the new instructional media.

*Predicting the likelihood that the divorce bill in the Philippines will be approved.

Are there assumptions in logistic regression?

Logistic regression does not rely on several key assumptions of ordinary least squares regression, such as normally distributed residuals. Even so, it has assumptions of its own that should be checked before the analysis is run. The assumptions for logistic regression are the following:

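1. It does not require a linear relationship between the dependent and independent variables. It does, however, assume that continuous independent variables are linearly related to the log odds of the outcome.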

2. The error terms (residuals) do not need to be normally distributed.

3. Binary logistic regression requires the dependent variable to be binary, and ordinal logistic regression requires the dependent variable to be ordinal.

4. Logistic regression requires observations to be independent of each other.

5. Logistic regression requires there to be little or no multicollinearity among the independent variables.

6. Finally, logistic regression typically requires a large sample size. A general guideline is that you need a minimum of 10 cases with the least frequent outcome for each independent variable in your model. For example, if you have 5 independent variables and the expected probability of your least frequent outcome is .10, then you would need a minimum sample size of 500 (10*5 / .10).
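The sample-size guideline in item 6 can be sketched as a small calculation. This is only an illustration of the rule of thumb stated above; the function name `minimum_sample_size` and its parameters are made up for this example and are not part of SPSS.

```python
import math

def minimum_sample_size(n_predictors, p_least_frequent, cases_per_predictor=10):
    """Rule of thumb: cases_per_predictor * n_predictors / p_least_frequent."""
    if not 0 < p_least_frequent <= 0.5:
        raise ValueError("p_least_frequent must be in (0, 0.5]")
    raw = cases_per_predictor * n_predictors / p_least_frequent
    # Round away floating-point noise before rounding up to a whole case.
    return math.ceil(round(raw, 9))

# The example from the text: 5 predictors, least frequent outcome expected 10% of the time.
print(minimum_sample_size(5, 0.10))  # 500
```

With the same expected probability, each additional predictor adds 100 cases to the minimum.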

Can I use Linear Regression when the dependent variable is binary?
No. Simple or multiple linear regression is not appropriate when the dependent variable is binary. A binary outcome violates the assumption of normally distributed residuals, and a fitted straight line can produce predicted values below 0 or above 1, which cannot be interpreted as probabilities.

What is the formula for binary logistic regression?

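The prediction equation for binary logistic regression resembles that of ordinary least squares regression, except that the left-hand side is the log odds of the outcome rather than the outcome itself:

log(p/(1-p)) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

Here p is the probability that the outcome equals 1, b0 is the constant, and b1 to b4 are the coefficients of the independent variables x1 to x4.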
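To make the left-hand side of the equation concrete, here is a minimal Python sketch of how a probability, its odds, and its log odds relate to one another. The helper names `logit` and `inverse_logit` are chosen for this illustration only.

```python
import math

def logit(p):
    """Log odds of a probability p, i.e. log(p / (1 - p))."""
    return math.log(p / (1 - p))

def inverse_logit(log_odds):
    """Convert log odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds))

for p in (0.10, 0.50, 0.90):
    z = logit(p)
    print(f"p={p:.2f}  odds={p / (1 - p):.3f}  "
          f"log odds={z:+.3f}  back to p={inverse_logit(z):.2f}")
```

A probability of .50 corresponds to odds of 1 and log odds of 0. Probabilities below .50 give negative log odds and probabilities above .50 give positive log odds.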

What are the steps in interpreting the Binary Logistic Regression result in an easy way?

Here are the steps in interpreting the results of binary logistic regression using SPSS.

Case Processing Summary
Unweighted Cases^a		N	Percent
Selected Cases	Included in Analysis	100	100.0
	Missing Cases	0	.0
	Total	100	100.0
Unselected Cases		0	.0
Total		100	100.0
a. If weight is in effect, see classification table for the total number of cases.

The first table above shows a breakdown of the number of cases used and not used in the analysis. Based on the table, all 100 cases were included in the analysis and there are no missing cases in the dataset.

Dependent Variable Encoding
Original Value	Internal Value
Will Not Adopt	0
Adopt	1

The second table above gives the coding for the outcome variable, adoption. Because "Adopt" is coded 1, the model predicts the probability that a farmer will adopt.

Categorical Variables Codings
		Frequency	Parameter coding (1)
Farm_Size	3 Hectares or less	52	1.000
Farm_Size	4 Hectares or more	48	.000
Farm_Locations	Rural	48	1.000
Farm_Locations	Urban	52	.000

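The table above shows how the values of the categorical variables farm size and farm location were coded. Each enters the model as a dummy variable: farms of 3 hectares or less and rural farms are coded 1, while farms of 4 hectares or more and urban farms are coded 0 and serve as the reference categories.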

Classification Table^a,b
Observed			Predicted
			Adoption		Percentage Correct
			Will Not Adopt	Adopt
Step 0	Adoption	Will Not Adopt	54	0	100.0
		Adopt	46	0	.0
	Overall Percentage				54.0

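The Block 0 output is for a model that includes only the intercept (constant). It is a null model, a model with no predictors. Because "Will Not Adopt" is the more frequent outcome, the null model predicts it for every case and is therefore correct 54.0 percent of the time.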

Variables in the Equation
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 0	Constant	-.160	.201	.639	1	.424	.852

The Exp(B) value of .852 is the predicted odds of adopting the farming technology under the null model. Since 46 of our subjects decided to adopt the technology and 54 decided not to adopt, the observed odds are 46/54 = .852. The coefficient B = -.160 is the natural log of these odds.
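The Block 0 figures can be reproduced from the two outcome counts alone. The short sketch below uses only the counts reported in the Step 0 classification table; the variable names are illustrative.

```python
import math

# Observed counts from the Step 0 classification table.
adopt, not_adopt = 46, 54
n = adopt + not_adopt

odds = adopt / not_adopt          # Exp(B) of the constant
b0 = math.log(odds)               # B of the constant (log odds)
# The null model predicts the more frequent outcome for every case.
null_accuracy = 100 * max(adopt, not_adopt) / n

print(f"Exp(B) = {odds:.3f}")
print(f"B = {b0:.3f}")
print(f"Overall percentage correct = {null_accuracy:.1f}")
```

The results match the .852, -.160, and 54.0 shown in the SPSS output.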

Variables not in the Equation
			Score	df	Sig.
Step 0	Variables	Awareness	58.237	1	.000
		Farm_Locations(1)	31.975	1	.000
		Farm_Size(1)	51.794	1	.000
	Overall Statistics		70.867	3	.000

This table gives the results of a score test, also known as a Lagrange multiplier (LM) test. The Lagrange multiplier test evaluates a hypothesis about the parameters in a likelihood framework. It uses only the parameter estimates obtained under the restrictions (here, the null model), whereas Wald tests are based on unrestricted estimates.

The column labeled Score gives the estimated change in model fit if the term is added to the model. The other two columns give the degrees of freedom and the p-value (labeled Sig.) for the estimated change. Based on the table above, all three predictors (awareness, location, and size) are expected to improve the fit of the model.

Block 1: Method = Enter

Omnibus Tests of Model Coefficients
		Chi-square	df	Sig.
Step 1	Step	94.698	3	.000
	Block	94.698	3	.000
	Model	94.698	3	.000

The table above gives the overall test for the model that includes the predictors. The chi-square value of 94.698 with 3 degrees of freedom and a p-value reported as .000 (that is, p < .001) tells us that our model as a whole fits significantly better than an empty model (i.e., a model with no predictors).

The Omnibus Tests of Model Coefficients table is used to check that the new model (with explanatory variables included) is an improvement over the baseline model. It uses a chi-square test to see whether there is a significant difference between the -2 log likelihoods of the baseline model and the new model.
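SPSS prints .000 because it rounds the p-value to three decimals. As a rough check, the upper-tail probability of a chi-square value with 3 degrees of freedom can be computed with the standard library. The function `chi2_sf_df3` is a helper written for this illustration; its closed form holds only for 3 degrees of freedom, and a statistics library would normally be used for the general case.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form valid only for 3 degrees of freedom.
    """
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi2_sf_df3(94.698)   # model chi-square from the omnibus table
print(f"p = {p_value:.3g}")
print(f"rounded to three decimals: {p_value:.3f}")
```

The exact p-value is far below .001, which is why it should be reported as p < .001 rather than p = .000.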

Classification Table
Observed			Predicted
			Adoption		Percentage Correct
			Will Not Adopt	Adopt
Step 1	Adoption	Will Not Adopt	50	4	92.6
		Adopt	2	44	95.7
	Overall Percentage				94.0
a. The cut value is .500

With the addition of the three predictors, 95.7 percent of the respondents observed to adopt and 92.6 percent of those observed not to adopt the new farming technology were correctly predicted, which gives an overall percentage of 94.0. This is considerably higher than the 54.0 percent achieved by the null model.
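The percentages in the classification table follow directly from its four cell counts. The sketch below recomputes them; the dictionary layout and the function name `percentage_correct` are choices made for this example.

```python
# Observed outcome -> counts of predicted outcomes, copied from the Step 1 table.
observed = {
    "Will Not Adopt": {"Will Not Adopt": 50, "Adopt": 4},
    "Adopt": {"Will Not Adopt": 2, "Adopt": 44},
}

def percentage_correct(table):
    """Return per-outcome and overall percentages of correct predictions."""
    per_class = {}
    correct = total = 0
    for outcome, predicted in table.items():
        row_total = sum(predicted.values())
        per_class[outcome] = 100 * predicted[outcome] / row_total
        correct += predicted[outcome]
        total += row_total
    return per_class, 100 * correct / total

per_class, overall = percentage_correct(observed)
for outcome, pct in per_class.items():
    print(f"{outcome}: {pct:.1f}")
print(f"Overall: {overall:.1f}")
```

Each row percentage is the number of correctly predicted cases divided by the number of observed cases in that row.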

Model Summary
Step	-2 Log likelihood	Cox & Snell R Square	Nagelkerke R Square
1	43.291^a	.612	.818
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

The -2 Log Likelihood is 43.291. This statistic measures how poorly the model predicts the decisions: the smaller the value, the better the fit. Adding the model chi-square gives 94.698 + 43.291 = 137.989, which is the -2 log likelihood of the model without the predictors. The null model therefore has the higher value, meaning it predicts the decisions much more poorly than the model with the three predictors.
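The -2 log likelihood of the null model can also be computed directly from the outcome counts, which gives an independent check on the arithmetic above. This is a sketch using the 46 adopters and 54 non-adopters from the Block 0 table; the variable names are illustrative.

```python
import math

adopt, not_adopt = 46, 54
n = adopt + not_adopt
p = adopt / n   # the null model predicts the same probability for every case

# -2 log likelihood of the intercept-only model.
null_minus2ll = -2 * (adopt * math.log(p) + not_adopt * math.log(1 - p))

model_minus2ll = 43.291   # from the Model Summary table
chi_square = null_minus2ll - model_minus2ll

print(f"-2LL of the null model: {null_minus2ll:.3f}")
print(f"Model chi-square: {chi_square:.3f}")
```

Up to rounding, the two values agree with the 137.989 and 94.698 derived from the SPSS tables.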

The Cox & Snell R Square and Nagelkerke R Square are pseudo-R-squares. They approximate the proportion of variation in the outcome that the model explains, but they are not equivalent to the R-square of linear regression and should be interpreted with caution. In the given example, the full model explains roughly 61 to 82 percent of the variation in whether farmers adopt the new farming technology, given the set of independent variables.
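Both pseudo-R-squares can be recovered from the -2 log likelihood values and the sample size. The sketch below applies the usual textbook definitions of the two measures to the numbers reported above; the variable names are illustrative.

```python
import math

n = 100
model_minus2ll = 43.291              # Model Summary table
null_minus2ll = 94.698 + 43.291      # model chi-square + model -2LL

# Cox & Snell: 1 - (L_null / L_model)^(2/n), written with -2LL values.
cox_snell = 1 - math.exp(-(null_minus2ll - model_minus2ll) / n)
# Nagelkerke rescales Cox & Snell by its maximum possible value.
max_cox_snell = 1 - math.exp(-null_minus2ll / n)
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R Square: {cox_snell:.3f}")
print(f"Nagelkerke R Square: {nagelkerke:.3f}")
```

Cox & Snell cannot reach 1 even for a perfect model, which is why Nagelkerke divides it by its maximum and is always the larger of the two.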

Variables in the Equation
		B	S.E.	Wald	df	Sig.	Exp(B)
Step 1^a	Awareness	1.110	.265	17.575	1	.000	3.035
	Farm_Locations(1)	.163	.893	.033	1	.855	1.177
	Farm_Size(1)	-3.513	.891	15.551	1	.000	.030
	Constant	-2.944	1.192	6.100	1	.014	.053
a. Variable(s) entered on step 1: Awareness, Farm_Locations, Farm_Size.

Based on the table, awareness and farm size significantly predict the likelihood that farmers adopt the new farming technology (Wald = 17.575, p < .01 and Wald = 15.551, p < .01, respectively). Farm location is not a significant predictor (Wald = .033, p = .855).

The odds of adopting the new farming technology for farmers whose farm size is 3 hectares or less are .030 times the odds for farmers with 4 hectares or more, that is, about 97 percent lower, holding the other predictors constant. Each one-unit increase in awareness multiplies the odds of adoption by 3.035. For example, farmers whose awareness level is 2 have 3.035 times the odds of adopting compared with farmers whose awareness level is 1.
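To see what these odds ratios mean in terms of probabilities, the fitted coefficients can be plugged into the prediction equation. This is a sketch only: the awareness value of 3 is a hypothetical input chosen for illustration, since the awareness scale is not described here, and the function names are made up for this example.

```python
import math

# Coefficients (B) from the Step 1 "Variables in the Equation" table.
B = {"Constant": -2.944, "Awareness": 1.110,
     "Farm_Locations(1)": 0.163, "Farm_Size(1)": -3.513}

def adoption_probability(awareness, rural, small_farm):
    """Predicted probability of adoption; rural and small_farm are 0/1 dummies."""
    log_odds = (B["Constant"] + B["Awareness"] * awareness
                + B["Farm_Locations(1)"] * rural
                + B["Farm_Size(1)"] * small_farm)
    return 1 / (1 + math.exp(-log_odds))

def odds(p):
    return p / (1 - p)

# Hypothetical rural farmer with awareness level 3, small versus large farm.
p_small = adoption_probability(awareness=3, rural=1, small_farm=1)
p_large = adoption_probability(awareness=3, rural=1, small_farm=0)

print(f"P(adopt | 3 hectares or less) = {p_small:.3f}")
print(f"P(adopt | 4 hectares or more) = {p_large:.3f}")
print(f"Odds ratio = {odds(p_small) / odds(p_large):.3f}")
```

The ratio of the two odds equals Exp(B) for farm size whatever awareness level is chosen, but the gap between the two probabilities does depend on that level.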

B gives the coefficients of the logistic regression equation for predicting the dependent variable from the independent variables. They are expressed in log-odds units.

S.E. gives the standard errors associated with the coefficients.

Wald and Sig. columns provide the Wald chi-square value and 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0. Coefficients having p-values less than alpha are statistically significant.

df column lists the degrees of freedom for each of the tests of the coefficients.

Exp(B) gives the odds ratios for the predictors. Each odds ratio is the exponentiated coefficient, that is, e raised to the power of B.
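The Wald, Sig., and Exp(B) columns can be approximately reconstructed from the B and S.E. columns. The sketch below assumes the usual definitions: Wald is the squared ratio of B to its standard error, and Sig. is the upper-tail chi-square probability with 1 degree of freedom. Because it starts from coefficients already rounded to three decimals, the results differ slightly from the SPSS output.

```python
import math

# (B, S.E.) pairs from the Step 1 "Variables in the Equation" table.
coefficients = {
    "Awareness": (1.110, 0.265),
    "Farm_Locations(1)": (0.163, 0.893),
    "Farm_Size(1)": (-3.513, 0.891),
    "Constant": (-2.944, 1.192),
}

results = {}
for name, (b, se) in coefficients.items():
    wald = (b / se) ** 2                    # Wald chi-square, 1 df
    sig = math.erfc(math.sqrt(wald / 2))    # upper tail of chi-square with 1 df
    results[name] = {"Wald": wald, "Sig.": sig, "Exp(B)": math.exp(b)}

for name, r in results.items():
    print(f"{name:18s} Wald={r['Wald']:7.3f}  "
          f"Sig.={r['Sig.']:.3f}  Exp(B)={r['Exp(B)']:.3f}")
```

Comparing each row with the SPSS table shows the same pattern: awareness and farm size are significant, and farm location is not.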



Sunday, August 25, 2019
