Regression and
correlation are closely related. Both techniques involve the relationship
between two variables, and both use the same set of paired scores taken from
the same subjects. However, whereas correlation is concerned with the
magnitude and direction of the relationship, regression focuses on using the
relationship for prediction.
In terms of prediction, if two variables are perfectly correlated, then
knowing the value of one score permits a perfect prediction of the score on the
second variable. More generally, whenever two variables are significantly
correlated, the researcher may use the score on one variable to predict the
score on the other.
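To make the idea of prediction concrete, here is a minimal sketch, in Python rather than in the SPSS menus used later in this post, of fitting a simple linear regression and using it to predict a new score. The aptitude and exam scores below are invented purely for illustration.

```python
# Minimal illustration: predicting one variable from another with simple
# linear regression. The scores below are invented for this sketch.
from scipy import stats

aptitude = [55, 62, 70, 48, 75, 66, 80, 58]   # predictor (X)
exam     = [60, 65, 74, 52, 78, 70, 85, 61]   # outcome   (Y)

result = stats.linregress(aptitude, exam)
print(f"slope={result.slope:.3f}, intercept={result.intercept:.3f}, "
      f"r={result.rvalue:.3f}, p={result.pvalue:.4f}")

# Predict the exam score of a student whose aptitude score is 68
predicted = result.intercept + result.slope * 68
print(f"predicted exam score: {predicted:.1f}")
```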
There are many
reasons why researchers want to predict one variable from another. For example,
knowing a person’s I.Q., what can we say about this person’s prospects of
successfully completing a university course? Knowing a person’s prior voting
record, can we make any informed guesses concerning his vote in the coming
election? Knowing his mathematics aptitude score, can we estimate the quality
of his performance in a course in statistics? These questions involve
predictions from one variable to another, and psychologists, educators,
biologists, sociologists, and economists are constantly being called upon to
perform this function.
Problems in regression analysis often arise when its assumptions are not
satisfied. For example, the predictive power of the regression equation depends
on the assumption that the residuals from that regression satisfy certain
statistical properties. This section discusses these assumptions and how to
check them.
Multiple linear regression requires at least two independent
variables, which can be nominal, ordinal, or interval/ratio level
variables. When there is only one independent variable, the analysis is called
“simple” linear regression. A rule of thumb for sample size is that regression
analysis requires at least 20 cases per independent variable in the analysis.
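For readers who prefer code to the SPSS menus, the sketch below shows what a multiple linear regression with two predictors looks like in Python (statsmodels). The variable names and values are hypothetical, and a real analysis would need far more cases to satisfy the 20-cases-per-predictor rule of thumb.

```python
# Hypothetical example: multiple linear regression fitted in Python
# instead of SPSS. Data are invented and far too small for real use.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "mortality":    [28, 25, 30, 18, 15, 22, 12, 10, 20, 16],
    "immunization": [60, 65, 55, 80, 85, 70, 90, 95, 75, 82],
    "income":       [2.1, 2.4, 1.9, 3.0, 3.4, 2.6, 3.8, 4.1, 2.9, 3.2],
})

# Two independent variables (immunization, income), one dependent (mortality)
model = smf.ols("mortality ~ immunization + income", data=data).fit()
print(model.summary())   # coefficients, R-squared, p-values
```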
1. LEVEL OF MEASUREMENT
The dependent variable, or outcome variable, is a scale variable, while the independent variable, or predictor variable, is scale or nominal.
2. LINEARITY
Correlation and regression are
measures of association between variables. Prior to performing regression
analysis, it is important to run the correlation test to determine the strength
of the linear relationship between the two variables.
The linearity test determines whether
the relationship between the independent variable and the dependent variable is
linear. Linearity is a requirement in both correlation and linear regression
analysis.
Decision-making process in the linearity test:
1. If the Sig. value for Deviation from Linearity is > 0.05, the relationship
between the independent and dependent variables is linear.
2. If the Sig. value for Deviation from Linearity is < 0.05, the relationship
between the independent and dependent variables is not linear.
Steps:
1. On the SPSS menu, select Analyze, then click Compare Means, and then click Means.
2. A new dialog box named Means will appear. Enter the independent
variable(s) in the Independent List and the dependent variable(s) in the Dependent List.
3. Click Options. A dialog box named Means: Options appears. At the bottom of the box, under Statistics for First Layer, select Test for Linearity, and then click Continue.
4. Click OK to run the command; the output will then appear.
Interpretation of Linearity Test Result
Based on the ANOVA output table, the Sig. value for Deviation from Linearity
is greater than 0.05, so it can be concluded that there is a linear
relationship between the immunization rate and the child mortality rate.
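SPSS produces the Deviation from Linearity test through the Means procedure. A rough Python analogue, sketched below with invented data, compares a straight-line model with a model that fits a separate mean for each distinct value of the independent variable (a classical lack-of-fit F-test).

```python
# Rough analogue of SPSS's "Deviation from Linearity" test. The data are
# invented; X must have repeated values for the grouped model to be testable.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "immunization": [60, 60, 70, 70, 70, 80, 80, 90, 90, 90],
    "mortality":    [30, 28, 24, 25, 23, 18, 20, 12, 14, 13],
})

linear_fit  = smf.ols("mortality ~ immunization", data=data).fit()
grouped_fit = smf.ols("mortality ~ C(immunization)", data=data).fit()

# F-test comparing the two nested models: a non-significant p-value (> 0.05)
# means no significant deviation from linearity.
print(anova_lm(linear_fit, grouped_fit))
```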
3. NORMALITY OF VALUES
The assumption of normality means that you should make sure your data roughly
fit a bell-curve shape before running certain statistical tests or a
regression. The normality test determines the distribution of the data in the
variables that will be used in the analysis. The Shapiro-Wilk and
Kolmogorov-Smirnov tests are commonly used to determine whether variables are
normally distributed. Shapiro-Wilk is used for samples of fewer than 50 cases;
for samples of more than 50 cases, Kolmogorov-Smirnov is used.
Decision-making process in the normality test:
1. If the Asymp. Sig. value is > 0.05, the data are normally distributed.
2. If the Asymp. Sig. value is < 0.05, the data are not normally distributed.
Steps:
1. On the SPSS menu, select Analyze, click Descriptive Statistics, and then click Explore.
2. A new dialog box will appear. Enter the independent variable(s) in the
Factor List and the dependent variable(s) in the Dependent List.
3. Click Plots. Uncheck Stem-and-leaf and check Normality plots with tests. Click Continue, then OK.
Interpretation of Normality Test Result
Based on the table, the Sig. values are > .05; therefore, the data are normally distributed.
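The same two normality tests can also be run outside SPSS. The sketch below uses Python with a simulated sample of 40 scores (Shapiro-Wilk for n < 50, plus the Lilliefors-corrected Kolmogorov-Smirnov test for comparison).

```python
# Normality checks on a single variable. The sample is simulated here;
# in practice you would test your own scores.
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=40)   # hypothetical sample, n < 50

w, p_sw = stats.shapiro(scores)                  # preferred when n < 50
print(f"Shapiro-Wilk: W={w:.3f}, p={p_sw:.3f}")

d, p_ks = lilliefors(scores, dist="norm")        # K-S with Lilliefors correction
print(f"Kolmogorov-Smirnov (Lilliefors): D={d:.3f}, p={p_ks:.3f}")

# Decision rule from the text: p > 0.05 -> treat the data as normally distributed.
```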
4. MULTICOLLINEARITY TEST
A key goal of regression analysis is to isolate the relationship between each independent variable and the dependent variable. The interpretation of a regression coefficient is that it
represents the mean change in the dependent variable for each 1 unit change
in an independent variable when you hold all of the other independent variables constant.
Multicollinearity occurs when independent
variables in a regression model are correlated. This correlation is a problem because independent variables should be independent. If the degree of
correlation between variables is high enough, it can cause problems when you
fit the model and interpret the results.
Strong similarities between the independent variables will result in very
strong correlations among them. Collinearity between independent variables
should not occur in a good regression model.
The Variance Inflation Factor (VIF) is often used to detect multicollinearity.
The variance inflation factor for an independent variable is defined as
VIF = 1/T, where T (the tolerance) is 1 minus the R² obtained by regressing
that variable on all of the other independent variables. With a VIF > 5 there
is an indication that multicollinearity may be present; with a VIF > 10 there
is certainly multicollinearity among the variables.
If multicollinearity is found
in the data, centering the data (that is, subtracting the mean of the variable
from each score) might help to solve the problem. However, the simplest
way to address the problem is to remove independent variables with high VIF
values.
Decision-making process in the collinearity test:
1. If the VIF value lies between 1 and 10, there is no multicollinearity.
2. If the VIF value is greater than 10, there is multicollinearity.
Steps:
1. On the SPSS menu, select Analyze. Click Regression, then Linear.
2. A new dialog box will appear. Enter the dependent variable in the Dependent box and the independent variable(s) in the Independent(s) box.
3. Click Statistics. A new dialog box will open; check Collinearity diagnostics.
4. Click Continue, and then OK.
Interpretation of Multicollinearity Test Result
Based on the coefficient output, the VIF values lie between 1 and 10; therefore, there is no multicollinearity.
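The VIF values reported by SPSS can also be reproduced directly from the definition VIF = 1/T. The Python sketch below does this for a hypothetical set of predictors; the variable names and values are invented.

```python
# Computing VIF for each predictor. Variable names and values are invented.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.DataFrame({
    "immunization": [60, 65, 55, 80, 85, 70, 90, 95, 75, 82],
    "income":       [2.1, 2.4, 1.9, 3.0, 3.4, 2.6, 3.8, 4.1, 2.9, 3.2],
    "sanitation":   [50, 55, 48, 72, 78, 60, 85, 88, 66, 74],
})
X = sm.add_constant(X)   # include the intercept, as the regression model does

# VIF = 1 / (1 - R^2 of each predictor regressed on all the others)
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(f"{name}: VIF = {variance_inflation_factor(X.values, i):.2f}")
```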
5. EQUALITY OF VARIANCE
Statistical tests,
such as analysis of variance (ANOVA), assume that although different
samples can come from populations with different means, they have the same
variance. Equal variance, also called homoscedasticity, means that the
variances are approximately the same across the samples.
In linear regression analysis, if the errors of the model (also called
residuals) are not homoscedastic, the coefficients estimated using ordinary
least squares (OLS) remain unbiased but are no longer the minimum-variance
estimates, and the estimates of their variances (and hence the standard errors
and significance tests) are not reliable.
Decision-making process in the homoscedasticity test:
1. If the data do not show an obvious pattern, they are homoscedastic.
2. If the data have a very tight distribution to the left of the plot and a
very wide distribution to the right of the plot, or vice versa, the data are
not homoscedastic.
Steps:
1. On the SPSS menu, select Analyze. Click Regression, then Linear.
2. A new dialog box will appear. Enter the dependent variable in the Dependent box and the independent variable(s) in the Independent(s) box.
3. Click Plots.
4. A new dialog box will open. Move *ZPRED to X and *ZRESID to Y.
5. Click Continue, and then OK.
Interpretation of Result
Homoscedastic: the data have no obvious pattern.
Heteroscedastic: below is an example of heteroscedastic values. There is a
tight distribution to the left of the plot and a very wide distribution to the
right of the plot. If you were to draw a line around your data, it would look
like a cone.
Image: Data from SPSS Version 20
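The same check can be done in Python: plot the residuals against the predicted values (the analogue of the *ZPRED/*ZRESID plot) and, if a formal test is wanted, add a Breusch-Pagan test. The data in the sketch below are invented.

```python
# Residuals-versus-predicted plot plus a Breusch-Pagan test. Data are invented.
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

data = pd.DataFrame({
    "mortality":    [28, 25, 30, 18, 15, 22, 12, 10, 20, 16],
    "immunization": [60, 65, 55, 80, 85, 70, 90, 95, 75, 82],
})
fit = smf.ols("mortality ~ immunization", data=data).fit()

# No obvious pattern suggests homoscedasticity; a cone shape suggests
# heteroscedasticity.
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()

# Breusch-Pagan: p > 0.05 is consistent with equal variance.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan p-value: {lm_p:.3f}")
```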