Spatio Temporal Prediction of Human Mobility – OSCAR Celebration of Student Scholarship and Impact

Author(s): Dhruv Gandhi

Mentor(s): Hamdi Kavak, Computational and Data Sciences; Taylor Anderson, Geography and Geoinformation Science; Andreas Zufle, Geography and Geoinformation Science; Amira Roess, Global and Community Health; Samiul Islam, Fahad Aloraini, Graduate Assistants

Abstract

Many location prediction studies have tried to predict the number of visitors at a given location; however, these studies are limited by a lack of data. This study attempts to create a location prediction system based on SafeGraph data for Fairfax County. SafeGraph data provides a sample of approximately 10 percent of the visitors from each census block group to each place of interest from January 2018 to June 2021. Tensor factorization was used to reduce the noise present in the SafeGraph data. We developed three baseline prediction models to compare against more sophisticated approaches: weekly rolling average [model 1], using the previous 4 weeks [model 2], and using the previous 4 weeks weighted [model 3]. The other approaches used were long short-term memory (LSTM), regression, Croston's method, autoregressive integrated moving average (ARIMA), and exponential smoothing. The baseline approaches, specifically model 2 and model 3, consistently performed better than the more sophisticated approaches, and tensor factorization improved our models in every case we tested. Given access to this abundance of data, and using factorization together with simple baseline algorithms, businesses and other entities can predict human mobility and identify potential target groups for marketing and advertising, to their mutual benefit.
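The three baseline models named above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the array layout (weeks, CBGs, POIs), the function names, and the weights in model 3 are all assumptions, since the abstract does not state them. Model 1 follows the description in the video transcript (reuse the previous week's value).

```python
import numpy as np

def predict_model_1(history):
    """Model 1 (as described in the transcript): reuse the most recent week."""
    return history[-1].astype(float)

def predict_model_2(history):
    """Model 2: plain average of the previous 4 weeks."""
    return history[-4:].mean(axis=0)

def predict_model_3(history, weights=(0.1, 0.2, 0.3, 0.4)):
    """Model 3: weighted average of the previous 4 weeks.

    The weights are hypothetical (oldest to newest); the source does not
    give the values that were actually used.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    return np.tensordot(w, history[-4:], axes=(0, 0))

def rmse(actual, predicted):
    """Root mean squared error over all CBG/POI cells."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy data: 4 weeks of visit counts for 1 CBG and 2 POIs (made-up numbers).
history = np.array([[[10, 0]], [[12, 2]], [[14, 4]], [[16, 6]]])
actual_next_week = np.array([[15, 5]])

for name, model in [("model 1", predict_model_1),
                    ("model 2", predict_model_2),
                    ("model 3", predict_model_3)]:
    print(name, rmse(actual_next_week, model(history)))
```

Each function expects at least four weeks of history and returns one predicted CBG-by-POI matrix for the following week, so the same `rmse` call can score every model against the observed week.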

Video Transcript

Our presentation is on spatio-temporal prediction of human mobility. Our research objective is to predict human mobility in Fairfax County, from a census block group to a place of interest, using data from SafeGraph. So we're going to define points of interest and census block groups. A point of interest (POI) is basically any location that shows up on a GPS, like a hotel or a gas station, and a census block group (CBG) is a geographical unit used by the Census Bureau. To simplify that a little more: we're looking at someone from CBG 1 going to a place of interest such as Walmart or Panera Bread, and the same for CBG 2, and that's how we're getting our data. Next slide, please.

Through that we have 649 CBGs, which are the rows, and over 15,000 places of interest, which are the columns: Walmart, Panera, a gym, etc. That data ranges from January 2018 to June 2021, and we have data on a weekly basis, so we have over a billion data points to work with. The workflow is: pre-processing the data a little bit, creating the matrix, then creating the sparse matrix, and then getting to matrix factorization, non-negative and regular. This is just to cancel the noise, get rid of the outliers, and keep the signal. Then we look at our approaches, create some models, see if the models are scalable, sample them, and then fill in the gaps with more factorization so it's more scalable. Next slide, please.

These are some of the approaches we're looking at. Approach one is just looking at the previous week's data and using that to predict the next week. Approach two is looking at the previous four weeks of data to predict the next week. Model three is to look at the previous four weeks on a weighted scale to predict the next week. We also tried some regression models, and we also tried long short-term memory (LSTM).

Given the approaches we have tried, these are the results we have. We have tried all of the approaches on the raw, or actual, matrix and also on the factorized matrix, and we used Tucker for the factorization. The models we tried were models one, two, and three, regression, polynomial regression with a degree of 2, and LSTM. As you can clearly see if you focus on the right-side plot, models two and three are outperforming all the other models, the regressions and the LSTM. For that reason, with a smaller subset, we would only focus on models one, two, and three, basically the baseline approaches, to evaluate.

If we go to the next slide, as you can see here, we have the subset. This is not a geographical subset but a subset in terms of the number of POIs, since we are selecting the entire Fairfax County but only 1,670 POIs, and these are the most visited POIs. We are focusing on the first 52 weeks, which is basically the year 2018. These are the results from 2018, and if you compare the RMSE score between models one, two, and three, you may see that the factorized matrix always tends to perform better than the raw, or actual, matrix, and this has been observed in all cases. Models two and three perform better than model one, even though in this case model one tends to perform as well as models two and three, but we suggest two or three for further approaches.

Going forward, we'd like to focus on the CBG and POI key trends, and we'd like to develop maybe a multi-threaded approach for a larger area. We are in the process of trying some of the sophisticated approaches, and hopefully in the next iteration of our work you will be able to see that. We'd like to give a special thanks to the National Science Foundation, the Aspiring Scientists Summer Internship Program, and the Summer Team Impact Projects.
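The noise-cancelling factorization step mentioned in the talk can be illustrated with a small sketch. The talk says only that Tucker was used; the code below assumes a truncated higher-order SVD, which is one simple way to obtain a Tucker-style low-rank approximation, and may differ from the authors' method. The (weeks, CBGs, POIs) tensor layout, the ranks, the function names, and the toy data are all hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the given axis."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def hosvd_denoise(tensor, ranks):
    """Low-rank approximation via truncated higher-order SVD.

    Keeps the leading `ranks[m]` left singular vectors of each unfolding,
    projects the tensor onto them (the core), then maps back to full size.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)      # compress each axis
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_dot(approx, u, mode)    # expand back to original shape
    return approx

# Toy example: a rank-1 "signal" of weekly visits plus random noise.
rng = np.random.default_rng(0)
weeks, cbgs, pois = np.arange(1.0, 9.0), np.arange(1.0, 7.0), np.arange(1.0, 6.0)
signal = np.einsum("i,j,k->ijk", weeks, cbgs, pois)
noisy = signal + rng.normal(scale=0.5, size=signal.shape)

denoised = hosvd_denoise(noisy, ranks=(1, 1, 1))
print("error before:", np.sqrt(np.mean((noisy - signal) ** 2)))
print("error after: ", np.sqrt(np.mean((denoised - signal) ** 2)))
```

Choosing small ranks discards the directions that carry mostly noise, which is the intuition behind "keep the signal"; on real visit counts the ranks would need tuning, and a non-negative variant would need a different algorithm.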

For more on this topic see:

Examining Different Disease Transmission Approaches in Data-Driven Agent-Based Models

Analyzing Changes in US Mobility Trends During 2020a

Measuring the Changes in Sentiment and Emotion Towards COVID-19 Over Time in Tweets Posted from Within United States Counties
