Tuesday, May 2, 2017

Assignment 5

Goals and Background


This assignment is designed to build an understanding of correlation through the use of several software packages. Students begin by running correlations in IBM SPSS. Using that output, they are expected to interpret the correlations and determine the significance of the relationships. In the next task, students download U.S. Census data, join it in ArcMap using GEOIDs, and then run the joined data through GeoDa to view Moran's I values and create LISA cluster maps.


Part 1


Part 1's goal is to use census data for Milwaukee, WI to find correlation values between various datasets. Correlation measures the strength of the relationship between two variables. Results range from -1 to 1, and the closer a value is to either end of that range, the stronger the relationship. A positive relationship means that both variables increase together, while a negative relationship means that one decreases as the other increases. Correlation does not imply causation, however.
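To make the definition concrete, here is a minimal sketch of how a Pearson correlation coefficient can be computed by hand. It is not the SPSS procedure used for this assignment, and the tract values below are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its variable's mean.
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    # Sum of cross-products divided by the geometric mean of the sums of squares.
    covariation = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical tract values, invented for illustration only.
population = [1200, 2500, 3100, 4000, 5200]
income_up = [31000, 38000, 41000, 52000, 60000]
income_down = [60000, 52000, 41000, 38000, 31000]

print(round(pearson_r(population, income_up), 3))    # close to +1: both rise together
print(round(pearson_r(population, income_down), 3))  # close to -1: one falls as the other rises
```

A value near zero from the same function would indicate little or no linear relationship between the two variables.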

The variable codes used in Figure 1 below are:


White  = White Pop. for the Census Tracts in Milwaukee County
Black = Black Pop
Hispanic = Hispanic Pop
MedInc = Median Household Income
Manu = Number of Manufacturing Employees
Retail = Number of Retail Employees
Finance = Number of Finance Employees


Figure 1: Bivariate correlation matrix results from the provided Excel datasheet.


One of the most apparent results of the matrix is the relationship between the White population and employment in each industry, which the other races do not share to the same degree. Aside from the correlation values comparing the White data against the other races, the Pearson Correlation values are very strongly positive between the White data and all of the employee data. These results suggest that White populations hold more jobs in manufacturing, retail, and finance than Black and Hispanic populations. This claim is supported by the Pearson Correlations between each race and median income as well. The White population's correlation with median income is positive at .585, while the Black and Hispanic correlations are both negative. This suggests that median income tends to rise as the White population of a tract rises, and tends to fall as the Black or Hispanic population rises. In other words, there is a negative relationship between the Black and Hispanic populations and median income.
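When reading a correlation matrix like the one in Figure 1, two quantities derived from the coefficient r are often useful: r squared, the share of variance in one variable that is linearly associated with the other, and the t statistic used to test whether r differs from zero (with n - 2 degrees of freedom). The sketch below shows both. The r and n values are hypothetical and are not taken from the SPSS output.

```python
import math


def r_squared(r):
    """Share of variance shared by two variables with Pearson coefficient r."""
    return r * r


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical values for illustration only.
r = 0.5
n = 27
print(r_squared(r))               # 0.25
print(round(t_statistic(r, n), 3))
```

A larger absolute t value, for the same number of observations, corresponds to a smaller p-value, which is what the significance flags in a correlation matrix summarize.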

Part 2


Introduction


The following work is for a project for the Texas Election Commission (TEC). The goal is to analyze election patterns to determine whether there is spatial autocorrelation in voting patterns and voter turnout in the State of Texas. Spatial autocorrelation describes how similar a value at one location is to the values around it. When the models are run, a value is returned that indicates how strongly the data set clusters in space. Unlike correlation, the sign does not describe the direction of a relationship between two variables: positive values indicate that similar values cluster together, and negative values indicate that dissimilar values sit next to each other. The study focuses on the voting results for the Presidential Elections of 1980 and 2016. The data obtained covers the percent Democratic vote and the voter turnout for each year. Ultimately, the results of this study will be provided to the Governor to see whether election patterns have changed over the 36 years between the two Presidential Elections.

Methodology


The data table provided by the TEC contains the following codes:

  • VTP80 = Voter Turnout 1980
  • VTP16 = Voter Turnout 2016
  • PRES80D = % Democratic Vote 1980
  • PRES16D = % Democratic Vote 2016

The 2015 Hispanic population was also needed to complete the study. It was gathered from the US Census Bureau's 2015 ACS 5-Year Estimates. The Texas county shapefile was also obtained from the US Census Bureau.

Once the data was downloaded from the Census Bureau, it was cleaned up by deleting unnecessary columns and fields. It was then brought into ArcMap along with the Texas voting data from the TEC and the Texas county shapefile. All three data sets were joined on the GEOID provided by the US Census Bureau. The joined dataset was then exported to a new shapefile so that it could be used later.
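The join itself was done in ArcMap, but the logic of a GEOID join can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the ArcMap workflow: the function names, the `PCT_HISP` field, and all values are hypothetical. County GEOIDs are five characters (two-digit state code plus three-digit county code), so they are treated as text and padded to keep any leading zeros.

```python
def normalize_geoid(value):
    """Treat GEOIDs as 5-character text so numeric and text keys match."""
    return str(value).strip().zfill(5)


def join_on_geoid(base_rows, *other_tables):
    """Left-join attribute tables onto base_rows using the GEOID field."""
    # Build one lookup per table, keyed by the normalized GEOID.
    lookups = [
        {normalize_geoid(row["GEOID"]): row for row in table}
        for table in other_tables
    ]
    joined = []
    for row in base_rows:
        key = normalize_geoid(row["GEOID"])
        merged = dict(row)
        merged["GEOID"] = key
        for lookup in lookups:
            # Unmatched counties simply keep their original fields.
            for field, value in lookup.get(key, {}).items():
                if field != "GEOID":
                    merged[field] = value
        joined.append(merged)
    return joined


# Hypothetical rows for illustration only.
counties = [{"GEOID": "48001", "NAME": "Anderson"},
            {"GEOID": "48003", "NAME": "Andrews"}]
voting = [{"GEOID": 48001, "VTP16": 55.0}]          # GEOID stored as a number
hispanic = [{"GEOID": "48003", "PCT_HISP": 50.0}]   # GEOID stored as text

for record in join_on_geoid(counties, voting, hispanic):
    print(record)
```

Checking for rows that failed to match before exporting is a useful habit, since a GEOID stored as a number in one table and as text in another is a common reason for a join to come back empty.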


The new shapefile was then brought into GeoDa for further analysis. A new project was created, and within it a spatial weight was developed to assist in determining spatial autocorrelation later. Once this was done, Moran's I plots were created using the weight against the variables. This helped to show whether spatial autocorrelation was a factor in the elections. Similar to correlation coefficients, Moran's I generally returns a value between -1 and 1. Values near 1 indicate strong clustering of similar values, values near 0 indicate little or no spatial autocorrelation, and negative values indicate that neighboring values tend to be dissimilar. After these plots were created, LISA Cluster Maps were created for each data set for further analysis of spatial autocorrelation. Together, these helped in determining the strength of spatial autocorrelation between counties.
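For readers who want to see what is behind the Moran's I value, the following is a minimal sketch of the global statistic. It assumes a simple binary contiguity weight (1 for neighbors, 0 otherwise) supplied as a hypothetical `neighbors` dictionary. GeoDa's own weights and options may differ (for example, row-standardized weights), so this sketch is not expected to reproduce the values reported below.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values    -- list of observations, one per areal unit
    neighbors -- dict mapping each unit index to a list of neighbor indices
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights; with binary weights it is the link count.
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    # Cross-products of deviations for every pair of neighboring units.
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    ss = sum(d * d for d in dev)
    return (n / s0) * (cross / ss)


# Four hypothetical units in a row: 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(morans_i([1, 2, 3, 4], line))  # positive: similar values sit together
print(morans_i([1, 4, 1, 4], line))  # negative: high and low values alternate
```

The smooth gradient gives a positive value and the alternating pattern gives a negative one, which matches the interpretation of clustering versus dispersion.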

Results



Percent Hispanic


The first data set run through GeoDa's Moran's I and LISA Cluster Map tools was the percent Hispanic residents. The Moran's I shows very strong spatial autocorrelation, with a score of .78 (Figure 2). This can be seen in the LISA Cluster Map in Figure 3 as well. The southern border of Texas exhibits High, High values almost all the way across, extending multiple counties inland. This means that these counties have high Hispanic populations and are surrounded by other counties with high Hispanic populations. The opposite end of the spectrum is also visible, with the northeast side of the state exhibiting Low, Low values. This means that these counties have very low Hispanic populations and are surrounded by other counties with very low Hispanic populations.
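The High, High and Low, Low labels come from comparing each county's value with the average of its neighbors. The sketch below shows only that quadrant logic, using a hypothetical `neighbors` dictionary and invented values. It assumes every unit has at least one neighbor, and it leaves out the significance test that decides which counties are actually colored on a LISA Cluster Map.

```python
def moran_quadrants(values, neighbors):
    """Label each unit by its own value and its neighbors' average,
    both measured relative to the overall mean."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    labels = []
    for i in range(len(values)):
        nbrs = neighbors[i]
        # Spatial lag: the average deviation of the unit's neighbors.
        lag = sum(dev[j] for j in nbrs) / len(nbrs)
        own = "High" if dev[i] > 0 else "Low"
        around = "High" if lag > 0 else "Low"
        labels.append(f"{own}, {around}")
    return labels


# Four hypothetical units in a row: 0 - 1 - 2 - 3.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(moran_quadrants([10, 9, 2, 1], line))
print(moran_quadrants([10, 1, 10, 1], line))
```

In the first call the high values sit next to each other, so the units fall in the High, High and Low, Low quadrants. In the second call high and low values alternate, producing the High, Low and Low, High outlier quadrants instead.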


Figure 2: Moran's I scatterplot of Hispanic populations in Texas counties.

Figure 3: LISA Cluster Map of Hispanic populations spread across the counties in The State of Texas.



Voter Turnout for 1980 Presidential Election


The next set of data to analyze was the voter turnout for the 1980 Presidential Election. The Moran's I of .47 indicates moderate spatial autocorrelation (Figure 4). This means that spatial autocorrelation exists in the data set, but it is not particularly strong. On the LISA Cluster Map, this is visible in just a few areas around the state. The southern and eastern portions in blue exhibit Low, Low values, which means that these counties had low voter turnout and are surrounded by counties that also had low turnout (Figure 3). The northern part of the state, highlighted in red, exhibits High, High values, which means that these counties had high voter turnout and are surrounded by other counties with high turnout.



Figure 4: Moran's I scatterplot of voter turnout for the 1980 Presidential Election in Texas counties.





Figure 3: LISA Cluster Map of voter turnout for the 1980 Presidential Election spread across the counties in The State of Texas.



Voter Turnout For the 2016 Presidential Election


The Moran's I for voter turnout in the 2016 Presidential Election is lower than it was for voter turnout in 1980. With a Moran's I value of .29, the spatial autocorrelation is fairly weak for this data set (Figure 4). This may mean that voters and non-voters were more evenly spread out than in the 1980 election. In this election, only the southern tip and a northwestern portion of the state exhibit Low, Low values, and only a small area in the center of the state has High, High values (Figure 5). This finding supports the claim that voters and non-voters were more evenly spread out, as much less of the state is covered by extremes on the LISA map.



Figure 4: Moran's I scatterplot of voter turnout for the 2016 Presidential Election in Texas counties.





Figure 5: LISA Cluster Map of voter turnout for the 2016 Presidential Election spread across the counties in The State of Texas.



Percent Democrat Vote in 1980


The Moran's I value for the Democratic vote in 1980 was relatively strong, with a score of .58 (Figure 6). This means that there is a fair amount of spatial autocorrelation between counties in Texas that voted Democrat. Comparing the LISA Cluster Map (Figure 7) to the voter turnout LISA map for 1980 (Figure 3), one can see similarities. The southeastern High, High section of Figure 7 is the same area where there was Low, Low voter turnout in 1980. The same can be said for the Low, Low section of Figure 7, as it covers almost the same area as the High, High turnout in 1980 (Figure 3).



Figure 6: Moran's I scatterplot of percent democrat vote in 1980 within Texas counties.




Figure 7: LISA Cluster Map of percent democratic vote for the 1980 Presidential Election spread across the counties in The State of Texas.



Percent Democrat Vote in 2016


The Moran's I value for percent Democrat is higher in 2016 than in 1980, with a score of .69 (Figure 8). This indicates a high amount of spatial autocorrelation in how counties voted along party lines in the 2016 Presidential Election. Looking at the LISA map, one can see that there is a larger area of High, High Democrat voters along the southern edge of the state (Figure 9). Looking back at the Hispanic LISA map (Figure 3), one can see that this area also has High, High Hispanic populations. One could speculate that Hispanic voter turnout was larger in 2016 than it was in 1980, although these maps alone cannot confirm that.




Figure 8: Moran's I scatterplot of percent democrat vote in 2016 within Texas counties.




Figure 9: LISA Cluster Map of percent democratic vote for the 2016 Presidential Election spread across the counties in The State of Texas.



Conclusion


Part 1 shows that SPSS is a robust statistics package that allows users to get fast, accurate results. Part 2 shows that data from a variety of sources can be aggregated into a single file and analyzed in GeoDa to find compelling results. There was not very strong evidence of a difference in voter turnout between 1980 and 2016. However, there is evidence that there could have been a stronger Hispanic voter turnout in the South West portion of the state in 2016.

