Role of Urban Built Environment of Breast Cancer Mortality Health Disparities – OSCAR Celebration of Student Scholarship and Impact

Author(s): Kai Barner, Ha Dao, Abigail Kokkinakis, Alexandra Diaz Merida, Amanda Webber

Mentor(s): Taylor Anderson, Geography & Geoinformation Science; Travis Gallo, Environmental Science and Policy; Mariaelena Pierobon, Biohealth

Abstract

Breast cancer is the second-leading cause of cancer-related death among women in the United States, but mortality rates vary across the population. Previous studies use individual- and county-level data to examine the association between breast cancer mortality and various sociodemographic and environmental variables. However, global regression models assume spatial stationarity, meaning that associations between explanatory variables and breast cancer mortality are the same across geographic space and scales. Therefore, the objective of the study is to use county-level data that describe the social and built environment across the contiguous United States to explain breast cancer mortality rates, as reported in the Surveillance, Epidemiology, and End Results (SEER) database, using a multi-scale geographically weighted regression (MGWR) model that accounts for spatial heterogeneity. We compare our MGWR model with a baseline global linear regression model (OLS). The MGWR model outperformed the global linear regression model with an adjusted R² of 0.91, explaining about 43 percentage points more variance than the OLS model. The local R² for each county is high in the western part of the US and decreases towards the east. In comparison to global linear regression, MGWR showed the same overall trends in the relationships between explanatory variables and mortality. For example, as mammogram screenings, health food index, and primary healthcare physician ratio increased, breast cancer mortality decreased. However, MGWR reveals spatial heterogeneity in the magnitude and direction of each relationship across counties. Such an approach allows for greater consideration of where certain variables are most influential in breast cancer mortality, allowing for location-specific interventions.

Audio Transcript


Abigail G Kokkinakis 0:09

Hello everyone. The objective of this study is to examine the association between the urban built environment and sociodemographic variables and breast cancer mortality, using an approach that will capture spatial heterogeneity in those associations. According to the American Cancer Society, breast cancer is the most common cancer in American females. One in eight females will be diagnosed with breast cancer in their lifetime, and one in 39 females will die from breast cancer. A lot of studies look at individual-level sociodemographic data and regression techniques. However, these regression techniques assume spatial stationarity, which ignores local heterogeneities in the associations between variables and breast cancer mortality.

Our study created a US model focusing on urban built environment variables without the limitations of most global regression techniques.

We collected SEER data from the most recent five-year interval, which was 2015 to 2019. Note the white space in the map shown. These are the counties that did not have data provided, which was roughly one-third of the counties.

Ha Dao 1:10

For privacy reasons, SEER omits counties with a death count of less than 10. Therefore, we decided to treat these missing counts as 9 and calculate the crude rate based on this. Moving forward with data collection and preprocessing: at the beginning of our research, we obtained 65 urban built environment and sociodemographic determinants of breast cancer. Throughout the process of data cleaning and data wrangling, log transformation and scaling were applied to selected variables to reduce skewness and normalize the range of the independent variables. After running Pearson correlation and Variance Inflation Factor checks, we removed variables that were highly correlated so we could avoid multicollinearity in the regression model. Lastly, the leaps and bounds algorithm was used to find the variables that explain the model best, which you can see on the screen here.
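
The preprocessing steps described here can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the per-100,000 scaling, the log(1 + x) transform, the z-score scaling, and the county figures are all assumptions made for the example. The talk itself only states that suppressed counts were treated as 9 and that log transformation and scaling were applied.

```python
import math
from statistics import mean, pstdev

# From the talk: suppressed counts (fewer than 10 deaths) are treated as 9.
SUPPRESSED_FILL = 9


def crude_rate(deaths, population, per=100_000):
    """Crude mortality rate per `per` people.

    `deaths` is None when the count was suppressed. The per-100,000
    scaling is an assumption; the talk does not state the rate base.
    """
    count = SUPPRESSED_FILL if deaths is None else deaths
    return count / population * per


def log_transform(values):
    """log(1 + x), one common way to reduce right skew (assumed variant)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Z-score scaling: zero mean, unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical counties: (deaths over the interval, population at risk).
counties = [(120, 400_000), (None, 30_000), (35, 90_000)]

rates = [crude_rate(d, p) for d, p in counties]
scaled = standardize(log_transform(rates))
```

Filling every suppressed county with 9 is an upper bound on the true count, so rates for small counties computed this way are upper bounds as well.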

Amanda A Webber 1:59

Our model contained 18 variables. For the sake of time, we're showing the three most significant variables as well as the variables that have no significance. The three most significant variables are access to mammograms, access to primary care doctors, and the percent of the population that is uninsured. The variables that are not significant based on their p-values are textile manufacturing, commute, radiance, and unemployment.

You can see from this map of our county residuals that predictive accuracy varied across counties and appears spatially correlated. The R-squared value reflects the amount of variation in the data set that is explained by our model. On a scale from 0 to 1, this model's R-squared indicates that it explains roughly 50% of the variation within the data.

Kai Barner 2:47

Big picture, this project wanted to account for spatial heterogeneity. A primary limitation of OLS regression is that it assumes that each explanatory variable has the same relationship with the dependent variable across the entire study area. So a one-unit change in an explanatory variable results in the same change in the dependent variable, no matter the location. Multiscale Geographically Weighted Regression (MGWR) takes location into account by taking the concept of neighborhoods from standard Geographically Weighted Regression and allowing their size, or bandwidth, to vary between different variables in the data set. Multiple scales reflect diversity in distribution and extent of influence.

This project examined data at the county level, so when analyzing the influence of a variable on an individual county, the bandwidth is the number of neighboring counties taken into account at the same time. There are just over 3100 counties in the continental US, and we can see that some of the neighborhoods actually encompassed all of them. But at the other end of the spectrum, a small bandwidth indicates that the spatial process changes quickly from one location to its neighboring locations. And we see this with food index, exercise, access to primary health care, and mammograms.
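
To make the neighborhood idea concrete, here is a minimal Python sketch of single-bandwidth geographically weighted regression with one explanatory variable. It is not MGWR and not the model used in this project: MGWR additionally fits a separate bandwidth per variable. The adaptive bisquare kernel, the one-dimensional "locations", and the synthetic data are all assumptions chosen so the effect is easy to see.

```python
def bisquare_weights(dists, k):
    """Adaptive bisquare kernel (assumed kernel choice).

    The bandwidth h is the distance to the k-th nearest point, counting
    the focal point itself; points at or beyond h get weight 0.
    Requires k >= 2 so that h is not zero.
    """
    h = sorted(dists)[k - 1]
    return [(1 - (d / h) ** 2) ** 2 if d < h else 0.0 for d in dists]


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def local_slope(i, coords, x, y, k):
    """Slope of the regression fitted around location i using k neighbors."""
    dists = [abs(c - coords[i]) for c in coords]
    return weighted_fit(x, y, bisquare_weights(dists, k))[1]


# Synthetic data: 20 hypothetical locations on a line. The true slope is
# -1 in the "west" (coords 0-9) and +1 in the "east" (coords 10-19).
coords = list(range(20))
x = [float(i % 5) for i in coords]
y = [(-1.0 if c < 10 else 1.0) * xi for c, xi in zip(coords, x)]

# Equal weights everywhere reproduce a global OLS fit.
global_slope = weighted_fit(x, y, [1.0] * len(x))[1]
west_slope = local_slope(2, coords, x, y, k=6)
east_slope = local_slope(17, coords, x, y, k=6)
```

In this toy data the global slope comes out at about 0, while the local slopes are about -1 in the west and +1 in the east. A single global coefficient averages away two opposite local relationships, which is the situation a small bandwidth is able to reveal.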

Where our OLS model had an adjusted R-squared of 0.478, our MGWR model explained about 44 percentage points more variance, with an adjusted R-squared of 0.918.
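
The comparison above is an absolute difference between two adjusted R-squared values. A short Python check of the arithmetic follows, together with the textbook adjusted R-squared formula. The formula and its arguments (n observations, p predictors) are general background rather than something stated in the talk, and software for local models may substitute an effective number of parameters for p.

```python
def adjusted_r2(r2, n, p):
    """Textbook adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Values reported in the talk.
ols_adj_r2 = 0.478
mgwr_adj_r2 = 0.918

# Absolute gain, in units of R-squared (x100 gives percentage points).
point_gain = mgwr_adj_r2 - ols_adj_r2

# Relative gain: how much larger the MGWR value is than the OLS value.
relative_gain = mgwr_adj_r2 / ols_adj_r2 - 1

print(f"gain: {point_gain * 100:.0f} percentage points")
print(f"relative increase: {relative_gain * 100:.0f}%")
```

The gain is 44 percentage points, which in relative terms is an increase of about 92% over the OLS value, so the two ways of stating the improvement should not be mixed up.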

Here we have our MGWR results for mammograms. The mean coefficient is negative 0.14, so the standard behavior nationwide is an inverse relationship. As the number of mammograms increases, breast cancer mortality decreases, and we see this norm in cornflower blue. As we move through the purples into magenta, this negative correlation becomes even stronger. But on the other hand, areas in cyan range from having a weaker negative correlation to actually having a positive coefficient. In these areas, as mammograms increase, breast cancer mortality also increases.

These local differences warrant further examination of other variables and other factors of underlying influence, alluding to the local complexities behind trying to improve breast cancer health outcomes. There is no one-size-fits-all solution.

Alexandra Diaz Merida 5:05

Another variable that we looked at was access to exercise, which measures how close an individual is to a location for physical activity. This can be either parks or recreation centers. In this map we see that in general there is a negative correlation, which means that the higher the access to exercise opportunities, the lower the breast cancer mortality. Looking at these graphs, we can see which counties need more access to exercise in order to decrease breast cancer mortality.

Similar to access to exercise, we also looked at food index, which describes how close an individual is to healthy foods and their ability to access them given cost barriers. It is also negatively correlated, which again means that the higher the food index, the lower the mortality. These maps will help inform local governments about breast cancer mortality rates and how improving these variables can decrease mortality.

In summary, MGWR explains the results better than OLS because it accounts for spatial heterogeneity, providing better explanations for how urban built environment variables affect breast cancer mortality. Thank you.

Kai Barner 6:12

Thank you.

Abigail G Kokkinakis 6:13

Thank you.
