
Time Series Forecasting 101 – Part 1. COVID-19 data preparation with ArcGIS Notebooks in ArcGIS Pro

The new Time Series Forecasting toolset in ArcGIS Pro 2.6 gives you a variety of ways to forecast and evaluate your space-time data. During the 2020 User Conference, Lauren Bennett demonstrated using this toolset to analyze cumulative and daily confirmed cases of COVID-19. The Time Series Forecasting 101 series will help you understand the whole workflow behind the demo and recommends useful approaches to data preparation, model fitting, and interpretation. This first article in the series covers the first step of every analysis: data preparation.

Time Series Forecasting demo from the UC 2020 plenary

Introduction

Forecasts have been key to responding to COVID-19. There are many ways to model and forecast cases, including the epidemiological models that many of us have relied on. Most of these models consume not only daily new or cumulative confirmed cases, but also explanatory variables such as population, hospitalization rate, doubling time, and interventions like social distancing. An alternative approach, and the one we take here, focuses on modeling the patterns and trends in the time-series data itself. The data for this approach is a collection of historical records, observations, or measurements taken at equal time intervals over a period of time.
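Because this approach depends on observations taken at equal time intervals, it can help to check the spacing of the dates before building a cube. The following is a minimal pandas sketch on a made-up series (the variable names are hypothetical, not taken from the notebook). It measures the gaps between consecutive dates and uses `asfreq` to place the series on a complete daily calendar, so a missing day shows up as NaN instead of silently shortening the series.

```python
import pandas as pd

# A made-up county series with one missing day (2020-01-24)
cum_cases = pd.Series(
    [0, 1, 3, 4],
    index=pd.to_datetime(["2020-01-22", "2020-01-23", "2020-01-25", "2020-01-26"]),
)

# Equal intervals means every gap between consecutive timestamps is the same
gaps = cum_cases.index.to_series().diff().dropna()
print(gaps.value_counts())
is_regular = gaps.nunique() == 1

# Reindex onto a complete daily calendar; days with no record become NaN
daily = cum_cases.asfreq("D")
print(daily)
```

How to fill such gaps (carry the last cumulative value forward, treat it as zero new cases, and so on) is a separate modeling decision that this sketch leaves open.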

To use the Time Series Forecasting toolset, we need to convert the raw time-series data into a space-time cube. In this blog, we'll demonstrate how to use ArcGIS Notebooks in Pro to create a space-time cube containing cumulative confirmed cases and daily new confirmed cases of COVID-19 at the US county level.

Data preparation

The raw data are two CSV files downloaded from USAFacts: cumulative confirmed cases and cumulative deaths from COVID-19 for each county in the US. The format of the original table downloaded from the website is shown on the left in Fig 1, and the format required for the Related Table parameter of the Create Space Time Cube From Defined Locations tool is shown on the right.

Fig 1. How to prepare the Related Table for Create Space Time Cube From Defined Locations
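The reshaping in Fig 1 can be sketched in pandas with `melt`, which turns the date columns into rows, followed by a `merge` that puts cases and deaths side by side. This is an illustration under assumptions, not the notebook's code: the identifier columns (`countyFIPS`, `County Name`, `State`), the `m/d/yy` date headers, and the output column names are assumed, and the values are made up.

```python
import pandas as pd

# Identifier columns assumed for the USAFacts download; check your file's header
ID_COLS = ["countyFIPS", "County Name", "State"]


def to_long(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Reshape one row per county / one column per date into one row per county per date."""
    long = wide.melt(id_vars=ID_COLS, var_name="date", value_name=value_name)
    # Date headers are assumed to look like "1/22/20"
    long["date"] = pd.to_datetime(long["date"], format="%m/%d/%y")
    return long.rename(columns={"countyFIPS": "FIPS"})


# Illustrative values only, not real USAFacts counts
cases_wide = pd.DataFrame({
    "countyFIPS": [1001, 6037],
    "County Name": ["County A", "County B"],
    "State": ["AL", "CA"],
    "1/22/20": [0, 0],
    "1/23/20": [0, 1],
    "1/24/20": [2, 1],
})
deaths_wide = cases_wide[ID_COLS].assign(
    **{"1/22/20": [0, 0], "1/23/20": [0, 0], "1/24/20": [0, 1]}
)

cases_long = to_long(cases_wide, "cum_cases")
deaths_long = to_long(deaths_wide, "cum_deaths")

# Join the two variables so each row holds one county on one day
related = (
    cases_long.merge(deaths_long[["FIPS", "date", "cum_deaths"]], on=["FIPS", "date"])
    .sort_values(["FIPS", "date"])
    .reset_index(drop=True)
)
print(related)
```

The FIPS code is kept as an integer here so it can serve as a shared key between the county polygons and the related table.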

So how did we prepare this data? We created an ArcGIS Notebook in Pro and used pandas to do the data engineering work. You can download the notebook, the US county polygon features, and a sample space-time cube, all packaged in the ZIP file, and add them to a new ArcGIS Pro project. Then open the notebook from the Catalog pane, modify the workspace if necessary, and reproduce the workflow using the most recent data from the website. Here are the key steps in the data preparation process that the notebook covers:

Other useful processing tasks you could achieve by customizing this notebook include:

Here are examples of what the 7-day moving average looks like compared to the reported data: Los Angeles County, California in Fig 2 and Manhattan, New York City in Fig 3. The 7-day moving average is shown in orange and the reported data in blue. Four variables are plotted: cumulative confirmed cases, daily new confirmed cases, cumulative deaths, and daily new deaths, for the reporting period from Jan 22 to July 12, 2020. As the plots show, the cumulative curves are already smooth, so a moving average isn't necessary for them, but the daily new counts fluctuate sharply from day to day, so applying a moving average to them is very important.

Fig 2. 7-day moving average (orange) vs reported data (blue) in LA County, California. (The left charts show cumulative cases and deaths; the right charts show daily cases and deaths.)
Fig 3. 7-day moving average (orange) vs reported data (blue) in Manhattan, New York City. (The left charts show cumulative cases and deaths; the right charts show daily cases and deaths.)
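Daily new counts and their 7-day moving average can be derived per county with pandas. This is a sketch under stated assumptions, not the notebook's exact logic: a long table with hypothetical columns `FIPS`, `date`, and `cum_cases`, daily new cases computed by differencing the cumulative counts, and a trailing window with `min_periods=1` so the first six days still get a value. The notebook may align the window differently.

```python
import pandas as pd

# Two hypothetical counties over nine days; values are made up
dates = pd.date_range("2020-03-01", periods=9, freq="D")
df = pd.DataFrame({
    "FIPS": [1001] * 9 + [6037] * 9,
    "date": list(dates) * 2,
    "cum_cases": [0, 1, 3, 6, 10, 15, 21, 28, 36] + [2] * 9,
})
df = df.sort_values(["FIPS", "date"]).reset_index(drop=True)

# Daily new cases = today's cumulative count minus yesterday's, within each county.
# The first day has no predecessor, so it takes the cumulative value itself.
df["new_cases"] = df.groupby("FIPS")["cum_cases"].diff().fillna(df["cum_cases"])

# Trailing 7-day moving average per county. Column names are kept to 10
# characters or fewer, the dBASE field-name limit, in case the table is exported to DBF.
df["new_ma7"] = df.groupby("FIPS")["new_cases"].transform(
    lambda s: s.rolling(window=7, min_periods=1).mean()
)

print(df[df["FIPS"] == 1001])
```

If a source revises a cumulative count downward, the difference for that day is negative; this sketch leaves such values as they are.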

Now that we have generated a CSV file containing all the data we need, the next step in the notebook is to define a function that adds the CSV file to the table of contents, exports the table to DBF so that an OBJECTID field is added, and uses the DBF to create a cube with the Create Space Time Cube From Defined Locations tool. From then on, whenever you need to create a space-time cube from the CSV file, you call this function instead of writing three arcpy commands. To adapt the function to another dataset, you may need to run the Create Space Time Cube From Defined Locations tool manually once to get the arcpy command with your own Input Features and Variables, and then modify the arcpy command for this tool inside the function accordingly. You can see how to run the tool step by step in the UI in part 4 of this series.
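A function along these lines could wrap the conversion and the cube creation. It is a sketch under several assumptions, not the notebook's function. It skips adding the CSV to the map, uses `arcpy.conversion.TableToTable` for the DBF export (newer Pro releases also offer an Export Table tool), and the field names `FIPS`, `date`, `cum_cases`, and `new_cases`, the `NO_TEMPORAL_AGGREGATION` setting, and the `ZEROS` fill type are placeholders to replace with the values from a manual run of the tool. The hypothetical `cube_name` helper appends the last date of the data so a rerun writes a new cube instead of overwriting an old one. `arcpy` is imported inside the function because it only exists in ArcGIS Pro's Python environment.

```python
import os
from datetime import date


def cube_name(prefix: str, last_date: date) -> str:
    """Hypothetical naming helper: append the last data date so reruns keep older cubes."""
    return f"{prefix}_{last_date:%Y%m%d}.nc"


def create_cube_from_csv(csv_path: str, county_features: str, out_folder: str,
                         last_date: date,
                         location_id: str = "FIPS",       # assumed key field in both tables
                         time_field: str = "date",        # assumed date field in the CSV
                         variables: str = "cum_cases ZEROS;new_cases ZEROS") -> str:
    """Sketch: CSV -> dBASE table (gains an OBJECTID) -> space-time cube.

    Parameter values are placeholders; copy the real ones from a manual run
    of Create Space Time Cube From Defined Locations on your own data.
    """
    import arcpy  # available only inside ArcGIS Pro's Python environment

    # Copy the CSV to a .dbf so the related table has an object ID field
    dbf_name = os.path.splitext(os.path.basename(csv_path))[0] + ".dbf"
    arcpy.conversion.TableToTable(csv_path, out_folder, dbf_name)
    related_table = os.path.join(out_folder, dbf_name)

    # Build the cube from the county polygons plus the related time-series table
    out_cube = os.path.join(out_folder, cube_name("covid_cube", last_date))
    arcpy.stpm.CreateSpaceTimeCubeDefinedLocations(
        in_features=county_features,
        output_cube=out_cube,
        location_id=location_id,
        temporal_aggregation="NO_TEMPORAL_AGGREGATION",  # assumption; match your manual run
        time_field=time_field,
        variables=variables,
        in_related_table=related_table,
        related_location_id=location_id,
    )
    return out_cube
```

For example, `cube_name("covid_cube", date(2020, 7, 12))` gives `covid_cube_20200712.nc`, so a later download produces a differently named cube.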

Key takeaway

The best part about doing data preparation in an ArcGIS Notebook in Pro is that, because the website updates the data frequently, you can re-download the CSV file, run all cells to automate the whole process, and the space-time cube will be ready for you. A rerun won't overwrite previous cubes, since the last date of the data is included as a suffix in the cube name. You can also share the notebook with someone who may not be very experienced with Python scripting but wants to analyze the most up-to-date COVID-19 data with the Space Time Pattern Mining tools.

If you want to skip the data processing and just grab a cube that is ready to use in the Time Series Forecasting tools, a sample space-time cube with COVID-19 time-series data is also included in the same ZIP file.

Now that we have the cube, the next question is which forecast method or methods are most suitable for daily new confirmed cases, and which are more appropriate for cumulative confirmed cases. Let's start with forecasting daily new confirmed cases of COVID-19 in part 2 of this series.

About the author

Jie Liu is a senior product engineer on the Spatial Statistics team. Jie earned her bachelor's degree in Urban Planning, with a minor in Economics, at Peking University, and earned dual degrees, a Master of City Planning and a Master of Urban and Spatial Analytics, from the School of Design at the University of Pennsylvania. She dives deep into spatial statistics algorithms but is also design- and user-focused. She loves applying spatial data science to solve transportation planning and socio-economic problems. In her free time, Jie enjoys snowboarding, hiking, backpacking, cooking, and playing the ukulele.
