
Chapter 5: Bivariate Correlation

Correlation and Causation

Before addressing bivariate correlation and Analysis of Variance, it is important to provide an overview of correlation and causation. The essence of causality may be captured by the notion of manipulation: if one could intervene, without changing the surrounding circumstances, and make a change in the first thing, a change in the second thing would follow from that manipulation.

What are the criteria for causality?

There are three criteria necessary for causation: association, temporal order, and nonspuriousness.

Association: The first criterion for causality is that an association must exist between the presumed cause and its effect. If two variables do not co-vary, meaning that as one changes the other changes in a corresponding manner, then neither can be considered a candidate to exert causal influence on the other.

Temporal Order: For variable A to be considered a causal candidate for the occurrence of B, it must occur before B in time. Temporal order in the social and behavioral sciences is often obvious, but because of the feedback nature of many of the things we study, the order is not always easy to determine. Two things that occur at the same time, for example, could be neither a cause nor an effect of the other.

Nonspuriousness: The third criterion is that the relationship must not statistically disappear when the influence of other variables is considered.

It is also useful to distinguish between necessary and sufficient causes. A necessary cause or condition is one that must be present for an effect to follow. A sufficient cause is a cause or condition that by itself is able to produce an event.

Bivariate Correlation

The next procedure is bivariate correlation. Bivariate correlation is used to evaluate whether two ratio or scale (in other words, continuous) variables are correlated, that is, associated with each other. This procedure should not be used with nominal or ordinal variables. If necessary, review the discussion regarding levels of measurement in Chapter 2. First, it is important to remember that correlation is not synonymous with causation. For example, the number of firemen is positively correlated with fire damage, but does that mean the firemen cause the fire damage? No! The fire itself, and more specifically the size of the fire, is the cause of the damage, not the number of firemen. Think of it in terms of necessary and sufficient cause. Note: We can never completely satisfy the necessary and sufficient criterion of causality, and we never will (Walsh and Ollenburger 2001).

Bivariate correlation is often the first step before doing multivariate regression (Chapter 6). It allows us to evaluate whether there is an association between two variables absent any other influence, that is, without controls. Multivariate regression allows one to evaluate whether that association is still present while controlling for other variables (plausible alternative causes).

To illustrate how to do a bivariate correlation, consider the following research question: Is the percent of students who have used marijuana negatively associated with high school graduation rates? Using the STATES10 data set, we can examine this question with Percent of High School Students That Use Marijuana: 2007 (HTC255) and Percent of Population Graduated from High School: 2008 (EDS131).

First, OPEN the STATES10 data set in PSPP.
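
The walkthrough that follows uses the PSPP menus. If you prefer to follow along in Python instead, the equivalent first step is simply to load the data. The minimal sketch below assumes, hypothetically, that the STATES10 data have been exported to a CSV file named states10.csv and that the columns keep the PSPP variable names HTC255 and EDS131; it is not part of the PSPP procedure itself.

import pandas as pd

# Hypothetical CSV export of the STATES10 data set (filename is an assumption).
states = pd.read_csv("states10.csv")

# Quick look at the two variables used in this example.
print(states[["HTC255", "EDS131"]].describe())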

Graphic 5.1

Then, as illustrated in Graphic 5.2, go to:

Analyze → Bivariate Correlation (left click)

Graphic 5.2

After left-clicking, the following dialogue box appears:

Graphic 5.3

Left-click on the first variable so it is highlighted, as shown in Graphic 5.3, and then begin typing HTC255 (Percent of High School Students That Use Marijuana); the proper variable should become highlighted, as shown in Graphic 5.4.

Graphic 5.4

Then click the arrow in the middle to move HTC255 to the RIGHT so that it appears as shown in Graphic 5.5.

Graphic 5.5

Follow the same procedure, but highlight the top variable and then begin typing EDS131 so that Percent of Population Graduated from High School is highlighted, as shown in Graphic 5.6.

Graphic 5.6

Move EDS131 over to the RIGHT as you did with HTC255.

Graphic 5.7

Then click OK, and you will get an output that looks like what is displayed in Graphic 5.8.

Graphic 5.8

How should you read this output table? First, you have a 2 x 2 table, and you need to make sure you are reading it properly, at the intersection of the two variables. The top row contains the correlation of HTC255 with HTC255 (row 1, column 1) and of HTC255 with EDS131 (row 1, column 2); the second row contains the correlation of EDS131 with HTC255 (row 2, column 1) and of EDS131 with EDS131 (row 2, column 2). So the correlation between the two variables appears in the cells at row 1, column 2 and row 2, column 1.

So what do the results for the bivariate correlation tell us? In each cell:

The first line is the Pearson correlation: -.12
The second line is the significance level (Sig. (2-tailed)): .468
The third line is N, which is the number of cases: 51

The Pearson correlation is -.12, which is negative as expected, and it indicates a weak negative correlation between the two variables. Before addressing the significance, note that the strength of the association depends on the size of the Pearson correlation, which will be some value between -1 and 1. The closer the Pearson correlation is to 1 or -1, the stronger the relationship; conversely, the closer it is to 0, the weaker the relationship. As noted, there appears to be a weak negative correlation here. In the social sciences, a correlation of .40 (or -.40) is considered “strong.”
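
If you want to verify these three numbers outside PSPP, the correlation and its two-tailed significance level can be reproduced with a short Python sketch. As before, this assumes a hypothetical CSV export of STATES10 named states10.csv whose columns keep the PSPP variable names HTC255 and EDS131.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical CSV export of the STATES10 data set.
states = pd.read_csv("states10.csv")

# Keep complete cases for the two variables of interest.
pair = states[["HTC255", "EDS131"]].dropna()

r, p = pearsonr(pair["HTC255"], pair["EDS131"])
print(f"Pearson r = {r:.2f}")        # the chapter reports -.12
print(f"Sig. (2-tailed) = {p:.3f}")  # the chapter reports .468
print(f"N = {len(pair)}")            # the chapter reports 51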

Graphic 5.9

Back to our example: -.12 is a weak relationship (at best). Moreover, the level of significance is .468, which is above the .05 threshold, suggesting that these two variables are not significantly correlated and that the weak association could be due to chance. A scatterplot will visually illustrate the association, or lack thereof, as shown in Graphic 5.9.

Next, let's look at an example of what a significant bivariate correlation looks like. We will use Percent of Population Graduated from High School: 2008 (EDS131) from the first example and, for the second variable, Per Capita Income (ECS100). Unlike the previous example, we will begin with the scatterplot (Graphic 5.10).

Graphic 5.10
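
If you are following along outside PSPP, a scatterplot along the lines of Graphic 5.10 could be sketched with matplotlib. This is a minimal sketch under the same hypothetical assumption as before: a CSV export of STATES10 (states10.csv) that preserves the PSPP variable names EDS131 and ECS100.

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV export of the STATES10 data set.
states = pd.read_csv("states10.csv")

# Per capita income on the x-axis, graduation rate on the y-axis.
plt.scatter(states["ECS100"], states["EDS131"])
plt.xlabel("Per Capita Income (ECS100)")
plt.ylabel("Percent Graduated from High School, 2008 (EDS131)")
plt.title("Graduation Rate by Per Capita Income")
plt.show()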

Note the upward-sloping cluster of most of the observations; however, to know whether there is a significant correlation between per capita income and graduation rates, we need to run the bivariate correlation procedure. Beginning with the procedure discussed above, move EDS131 over to the RIGHT as shown in Graphic 5.11 (if you have been following along, just move HTC255 back to the LEFT if it is still there).

Graphic 5.11

Next, move ECS100 over to the RIGHT as shown below (Graphic 5.12).

Graphic 5.12

Then click OK and the following output will appear (Graphic 5.13).

Graphic 5.13

So what does the output tell us about the bivariate correlation? Does it confirm what appears in the above scatterplot (Graphic 5.10)? Based on the output:

The first line is the Pearson correlation: .30
The second line is the significance level (Sig. (2-tailed)): .036
The third line is N, which is the number of cases: 51

The Pearson correlation is .30, which indicates a moderate correlation. The level of significance is .036, which is less than .05, indicating that the moderate correlation is statistically significant. This suggests that graduation rates are higher in states with higher per capita income. More sophisticated analysis, such as multivariate regression, would be necessary to determine whether this correlation reflects causation. Multivariate regression will be covered in Chapter 6.
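
As with the first example, these figures can be checked outside PSPP. The sketch below, under the same hypothetical CSV-export assumption, rebuilds the 2 x 2 correlation matrix with pandas and then recovers the two-tailed significance level with scipy; if the exported data match the chapter's, the printed values should be close to the .30 and .036 reported above.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical CSV export of the STATES10 data set.
states = pd.read_csv("states10.csv")
pair = states[["EDS131", "ECS100"]].dropna()

# The correlation matrix mirrors the 2 x 2 table in Graphic 5.13:
# 1s on the diagonal, the correlation of interest off the diagonal.
print(pair.corr())

r, p = pearsonr(pair["EDS131"], pair["ECS100"])
print(f"Pearson r = {r:.2f}")        # the chapter reports .30
print(f"Sig. (2-tailed) = {p:.3f}")  # the chapter reports .036
print(f"N = {len(pair)}")            # the chapter reports 51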