Author(s): Kai Barner, Ha Dao, Abigail Kokkinakis, Alexandra Diaz Merida, Amanda Webber
Mentor(s): Taylor Anderson, Geography & Geoinformation Science; Travis Gallo, Environmental Science and Policy; Mariaelena Pierobon, Biohealth
Abstract
Abigail G Kokkinakis 0:09
Hello everyone. The objective of this study is to examine the associations between the urban built environment, sociodemographic variables, and breast cancer mortality, using an approach that captures spatial heterogeneity in those associations. According to the American Cancer Society, breast cancer is the most common cancer among American women, apart from skin cancers. One in eight women will be diagnosed with breast cancer in their lifetime, and one in 39 will die from it. Many studies rely on individual-level sociodemographic data and global regression techniques. However, these techniques assume spatial stationarity, which ignores local heterogeneity in the associations between variables and breast cancer mortality.
We created a US model that focuses on urban built environment variables without the limitations of most global regression techniques.
We collected SEER data for the most recent five-year interval, 2015 to 2019. Note the white space on the map shown: these are the counties for which no data were provided, roughly one third of the data.
Ha Dao 1:10
For privacy reasons, SEER suppresses the counts for counties with fewer than 10 deaths. We therefore assumed a count of 9 for these counties and calculated crude rates on that basis. Moving on to data collection and preprocessing: at the beginning of our research, we obtained 65 urban built environment and sociodemographic determinants of breast cancer. During data cleaning and wrangling, we applied log transformation and scaling to selected variables to reduce skewness and normalize the range of the independent variables. After running Pearson correlation and Variance Inflation Factor (VIF) analyses, we removed highly correlated variables to avoid multicollinearity in the regression model. Lastly, we used the leaps and bounds algorithm to find the subset of variables that gives the best-fitting model, which you can see on the screen here.
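The preprocessing steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the team's pipeline: the function names, the use of `log1p` rather than a plain log, the Pearson cutoff of 0.8 and the VIF cutoff of 10 are assumptions (the cutoffs are common rules of thumb). Suppressed SEER counts are assumed to arrive as `NaN`, and the leaps and bounds subset search itself is not reproduced here.

```python
import numpy as np

SUPPRESSED_FILL = 9  # assumed count for SEER-suppressed counties (fewer than 10 deaths)


def crude_rate_per_100k(deaths, population, fill=SUPPRESSED_FILL):
    """Crude mortality rate per 100,000; suppressed counts are passed in as NaN."""
    deaths = np.asarray(deaths, dtype=float)
    population = np.asarray(population, dtype=float)
    filled = np.where(np.isnan(deaths), fill, deaths)
    return filled / population * 100_000


def log_and_standardize(x):
    """log1p to reduce right skew, then z-score so predictors share one scale."""
    logged = np.log1p(np.asarray(x, dtype=float))
    return (logged - logged.mean()) / logged.std()


def high_corr_pairs(X, names, threshold=0.8):
    """Predictor pairs whose absolute Pearson correlation exceeds `threshold`."""
    r = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    p = len(names)
    return [(names[i], names[j], r[i, j])
            for i in range(p) for j in range(i + 1, p)
            if abs(r[i, j]) > threshold]


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on all the other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        scores[j] = 1 / (1 - r2)
    return scores


def drop_high_vif(X, names, threshold=10.0):
    """Repeatedly drop the predictor with the largest VIF until all fall below `threshold`."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names
```

Dropping one variable at a time and recomputing matters because removing a single member of a collinear pair usually brings its partner's VIF back down, so both are not discarded needlessly.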
Amanda A Webber 1:59
Our model contained 18 variables. For the sake of time, we're showing the three most significant variables as well as the variables that were not significant. The three most significant variables are access to mammograms, access to primary care doctors, and the percentage of the population that is uninsured. The variables that are not significant, based on their p-values, are textile manufacturing, commute, radiance, and unemployment.
You can see from this map of our county residuals that predictive accuracy varied across counties and that the residuals appear spatially correlated. The R-squared value reflects the proportion of the variation in the data set that is explained by the model. On a scale from 0 to 1, this model's R-squared indicates that it explains roughly 50% of the variation in the data.
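For reference, R-squared and adjusted R-squared can be computed from an OLS fit as below. The apparent spatial pattern in a residual map is commonly quantified with global Moran's I; the presentation does not say whether that test was run, so the `morans_i` helper and the spatial weights matrix `W` (for example, 1 for counties that share a border and 0 otherwise) are assumptions for illustration.

```python
import numpy as np


def ols_r2(X, y):
    """Fit OLS with an intercept; return coefficients, residuals, R^2 and adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    # Penalize for the p predictors (the intercept is not counted in p)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, resid, r2, adj_r2


def morans_i(values, W):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), with z the centered values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(W, dtype=float)
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)
```

Calling `morans_i(resid, W)` on the OLS residuals gives a value near its null expectation of -1/(n-1) when residuals are spatially random, and a clearly positive value when neighboring counties tend to share over- or under-prediction.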
Kai Barner 2:47
Big picture, this project set out to account for spatial heterogeneity. A primary limitation of OLS regression is that it assumes each explanatory variable has the same relationship with the dependent variable across the entire study area: a one-unit change in an explanatory variable produces the same change in the dependent variable, regardless of location. Multiscale Geographically Weighted Regression (MGWR) takes location into account by borrowing the concept of neighborhoods from standard Geographically Weighted Regression (GWR) and allowing their size, or bandwidth, to vary between the variables in the data set. These multiple scales reflect differences in how each variable's influence is distributed and how far it extends.
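To make the per-variable bandwidth idea concrete, here is a minimal NumPy sketch of the backfitting scheme commonly used to fit MGWR: each term of the model, intercept included, is re-estimated in turn by a local weighted regression on its partial residual, using its own adaptive bisquare bandwidth. This is an illustration under stated assumptions, not the team's code: the bandwidths are fixed inputs here rather than selected by a criterion such as AICc as full MGWR implementations do, the kernel choice and function names are hypothetical, and a real analysis would use a dedicated package such as PySAL's `mgwr`.

```python
import numpy as np


def adaptive_bisquare(D, k):
    """n x n kernel matrix. Row i weights every location relative to location i;
    weights fall to zero at the distance of i's k-th nearest location (itself counted first)."""
    h = np.sort(D, axis=1)[:, k - 1][:, None]
    return np.where(D < h, (1 - (D / h) ** 2) ** 2, 0.0)


def mgwr_backfit(coords, X, y, bandwidths, n_iter=50):
    """Return an n x (p + 1) array of local coefficients (column 0 is the intercept).
    `bandwidths` holds one neighbor count per column, intercept first; each must be >= 2."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    assert len(bandwidths) == design.shape[1], "one bandwidth per column, intercept included"
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    kernels = [adaptive_bisquare(D, k) for k in bandwidths]
    betas = np.zeros_like(design)
    for _ in range(n_iter):
        for j, W in enumerate(kernels):
            x = design[:, j]
            # Partial residual: what is left of y after every *other* term's current fit
            partial = y - (design * betas).sum(axis=1) + x * betas[:, j]
            # Local weighted least squares of the partial residual on x alone, at every location
            betas[:, j] = (W @ (x * partial)) / (W @ (x * x))
    return betas
```

Giving one covariate a small neighbor count and another a count equal to the number of locations lets the first coefficient vary sharply over space while the second stays close to a single value, which is the behavior the bandwidths in an MGWR model describe.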
This project examined data at the county level, so each bandwidth is a number of neighboring counties: when analyzing the influence of a variable on an individual county, the model took that many neighbors into account at the same time. There are just over 3,100 counties in the continental US, and some of the neighborhoods actually encompassed all of them. At the other end of the spectrum, a small bandwidth indicates that the spatial process changes quickly from one location to its neighbors, as we see with food index, exercise, access to primary health care, and mammograms.
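The effect of the bandwidth can be seen with an ordinary single-bandwidth GWR. With a small neighbor count, the local coefficients can follow a process that changes quickly across space; with a neighbor count equal to the number of locations, every local estimate is pulled toward one near-global value. This sketch assumes an adaptive bisquare kernel and synthetic data, and the function name is hypothetical.

```python
import numpy as np


def gwr_local_slopes(coords, x, y, k):
    """Single-bandwidth GWR with one predictor: local intercept and slope at every location,
    using an adaptive bisquare kernel whose radius is the distance to the k-th nearest location."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])
    betas = np.empty((len(y), 2))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        h = np.sort(d)[k - 1]  # location i itself is the 1st nearest
        w = np.where(d < h, (1 - (d / h) ** 2) ** 2, 0.0)
        sw = np.sqrt(w)
        # Weighted least squares: rescale each row by the square root of its weight
        betas[i], *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return betas
```

For example, if the true slope flips from -1 to +1 halfway across the study area, `k=25` recovers the flip in the local slopes, whereas `k` equal to the number of locations smooths it into a gentle gradient.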
Whereas our OLS model had an adjusted R-squared of 0.478, our MGWR model had an adjusted R-squared of 0.918, explaining about 44 percentage points more of the variance.
Here are our MGWR results for mammograms. The mean coefficient is -0.14, so the typical relationship nationwide is inverse: as the number of mammograms increases, breast cancer mortality decreases, and we see this norm in cornflower blue. As we move through the purples into magenta, this negative association becomes even stronger. On the other hand, areas in cyan range from a weaker negative association to an actually positive coefficient. In these areas, as mammograms increase, breast cancer mortality also increases.
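A map like this is a classification of the local coefficients estimated for each county. The sketch below summarizes such an array and labels each county relative to the mean; the `band` half-width and the label names are hypothetical and do not reproduce the slide's legend. Before interpreting positive local coefficients, it is also worth checking them against local t-values, since MGWR analyses commonly adjust the critical value for multiple testing.

```python
import numpy as np


def classify_local_coefficients(betas, band=0.05):
    """Summarize local coefficients and label each county relative to the mean.
    Assumes a negative mean, as with the mammogram coefficients; `band` is hypothetical."""
    betas = np.asarray(betas, dtype=float)
    mean = betas.mean()
    labels = np.where(betas < mean - band, "stronger negative",
             np.where(betas > 0, "positive",
             np.where(betas > mean + band, "weaker negative", "typical")))
    summary = {
        "mean": float(mean),
        "share_negative": float((betas < 0).mean()),
        "share_positive": float((betas > 0).mean()),
    }
    return summary, labels
```

For the coefficients `[-0.30, -0.14, -0.12, -0.05, 0.02]`, the mean is -0.118, 80% are negative, and the labels run from "stronger negative" through "typical" and "weaker negative" to "positive".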
These local differences warrant further examination of other variables and underlying factors, and they point to the local complexities of trying to improve breast cancer outcomes. There is no one-size-fits-all solution.
Alexandra Diaz Merida 5:05
Another variable we looked at was access to exercise, which measures how close an individual is to a location for physical activity, either a park or a recreation center. In this map we see that, in general, there is a negative association: the greater the access to exercise opportunities, the lower the breast cancer mortality. Looking at these maps, we can see which counties may need more access to exercise in order to decrease breast cancer mortality.
Similar to access to exercise, we also looked at the food index, which describes how close an individual is to healthy food and their ability to access it given cost barriers. It is also negatively associated with mortality: the higher the food index, the lower the mortality. These maps can help inform local governments about breast cancer mortality rates and about how improving these variables could decrease mortality.
In summary, MGWR accounts for spatial heterogeneity better than OLS, providing better explanations of how urban built environment variables are associated with breast cancer mortality. Thank you.
Kai Barner 6:12
Thank you.
Abigail G Kokkinakis 6:13
Thank you.