Matching COVID-19 Cases to Facilities: Lessons Learned

During the COVID-19 pandemic, the authors, who are epidemiologists at the Baltimore City Health Department, matched COVID-19 case and death data with lists of facilities most likely to be impacted by outbreaks. This information assisted the department with its response operations. Their article describes considerations for matching health data with facilities using ArcGIS tools.


The COVID-19 pandemic has disproportionately affected older adults, particularly those living in facilities such as nursing homes, and people residing in congregate living settings such as shelters and correctional facilities.

In Baltimore City, an estimated 28 percent of deaths from COVID-19 occurred in facilities for older adults and in congregate settings from March 2020 to March 2023. While cases within facilities represent a small percentage of all cases citywide, many of the deaths caused by COVID-19 occurred within facilities. Nursing homes had 19 percent of all deaths, followed by senior housing with 6 percent of deaths and assisted living facilities with 2 percent of deaths.

Within the first few days of the pandemic, Baltimore City Health Department epidemiologists created an internal map focusing on potential risk to older adults and facilities in Baltimore City. As the pandemic progressed, they matched COVID-19 case and death addresses with facility addresses.

Address matching could potentially identify more deaths that occurred in facilities than the death certificate information alone. Address matching reports were used daily by the contact tracing and outbreak investigation teams to guide their work.

When Interviews Can’t Be Completed

Some COVID-19 case investigations and outbreak interviews cannot be completed for reasons ranging from lack of a phone number to an unresponsive subject. However, address information is available for cases from COVID-19 test results. When a health-care provider orders a COVID-19 test, the patient’s address must be entered in the test requisition form.

Information about a case’s residential/facility address on the test requisition form can be used to glean information useful for operations. While some facility type and address matching could be completed using code, it was far more efficient to export case data and perform geographic matching using ArcGIS tools.

COVID-19 case data varies across jurisdictions. It may contain many types of location data including residence; test location; test organization; place of employment; places visited; place vaccinated; and travel, both domestic and international. There is also contact tracing data for these cases, and the contacts may later become cases.

An Overview of the Process

Before matching health data to facility lists, determine the concerns of the people who will use this information. In this case, the consumers of the information were the assistant commissioner for clinical services, the Office of Acute Communicable Diseases, and the contact tracing team.

Identifying Which Facility Types to Track

At the Baltimore City Health Department, the epidemiologists decided to focus on licensed nonhospital health facilities and other high-risk settings for COVID-19. This list has grown and changed over time.

The epidemiologists discussed which facilities to track. They looked at laboratory records and talked to outbreak investigation staff and contact tracing staff. The facilities tracked varied over time as priorities changed or outbreaks among certain groups were detected.

Obtaining Lists of Facilities

Obtaining up-to-date lists of medical facilities can be challenging. However, some state licensing boards maintain lists of licensed facilities. In Maryland, the Office of Health Care Quality maintains publicly available lists. Some facilities may have a single address while others may have multiple addresses or address ranges. For example, colleges and universities were tracked using a list of locations with high frequencies of cases that included dormitories and student apartments.
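One way to handle facilities with multiple addresses or address ranges is to expand each facility into a flat lookup of individual addresses. The following is a minimal Python sketch, not the department's actual workflow. The function names, the facility name, and the street are hypothetical, and the same-side-of-street option reflects a common numbering convention that may not hold for every building.

```python
from typing import Dict, List

def expand_address_range(low: int, high: int, street: str,
                         same_side_only: bool = True) -> List[str]:
    """List every house number from low to high on one street."""
    # Assumption: odd and even numbers sit on opposite sides of the street,
    # so a single building usually spans numbers of one parity.
    step = 2 if same_side_only else 1
    return [f"{number} {street}" for number in range(low, high + 1, step)]

def build_facility_lookup(facilities: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each individual address to the facility that owns it."""
    lookup = {}
    for name, addresses in facilities.items():
        for address in addresses:
            lookup[address] = name
    return lookup

# Hypothetical facility covering the address range 100-104 Example St.
facility_lookup = build_facility_lookup({
    "Example Student Housing": expand_address_range(100, 104, "EXAMPLE ST"),
})
```

A lookup like this lets a single case address be checked against every address a facility occupies, rather than only its mailing address.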

Cleaning Address Fields

The COVID-19 address data was unstandardized because it came from multiple providers and included handwritten addresses. The first address field could sometimes contain the facility name or apartment, suite, unit, or floor number. These address fields can easily be cleaned up using a Python script in ArcGIS or a statistical program such as R or SAS.
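As an illustration, a cleanup step of this kind might look like the Python sketch below. It assumes each record carries one free-text street address string. The function name `clean_address`, the list of unit designators, and the sample addresses are hypothetical, and the pattern is deliberately simple: it will not catch every variant (for example, "APT4B" with no space) and it does not remove facility names.

```python
import re

# Unit designators to strip. An illustrative list, not an exhaustive one.
UNIT_PATTERN = re.compile(
    r"\b(?:APARTMENT|APT|UNIT|SUITE|STE|FLOOR|FL|ROOM|RM)\b\s*#?\s*[A-Z0-9-]+\b"
    r"|#\s*[A-Z0-9-]+"
)

def clean_address(raw: str) -> str:
    """Return an uppercased street address with unit details removed."""
    text = raw.upper()
    text = re.sub(r"[.,]", " ", text)         # drop periods and commas
    text = UNIT_PATTERN.sub(" ", text)        # drop apartment/suite/floor parts
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

print(clean_address("123 N. Main St., Apt 4B"))
```

Standardizing case, punctuation, and unit information before geocoding means that two spellings of the same street address are more likely to resolve to the same location.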

Remember, when working with sensitive data, be sure to examine potential legal and ethical issues in advance. At Baltimore City Health Department, access to lists of cases matched with facilities was limited to outbreak staff and senior contact tracing staff.

Using an Offline Address Locator

With ArcGIS products, address locators and their reference data can be packaged by an organization and shared with other users, who can then unpack and use the locator. Having up-to-date location data is important as new communities are developed, populations move, large apartment complexes close, or those complexes are redeveloped.

Since many geocoding services are not approved for Protected Health Information (PHI), Baltimore City Health Department uses an offline address locator with locally stored reference data. [PHI is individually identifiable health information such as a name, address, or medical record number.] This allows large datasets to be geocoded on site.

The epidemiologists at the Baltimore City Health Department obtained a packaged locator with reference data from the Baltimore City GIS Office. If you do not have an address locator with reference data stored locally, consider contacting your state or local GIS office. Be sure to check the address locator properties to see where reference data is being pulled from.

There are also a small number of geocoding services approved for PHI. These services require a Business Associate Agreement (BAA). [A BAA is a legal agreement that outlines the responsibilities and obligations of a business associate (e.g., vendor) to handle PHI in a compliant and secure manner.]

Joining Addresses with a Facilities List

Adding a join to match address fields between the cases and the facilities list in ArcGIS Pro is easy. Several factors should be considered when creating a join between address fields and facilities.

Creating a list of the locations with the most cases—a high frequency list—is important. In this case, the high frequency list consisted of locations or addresses with many COVID-19 cases. This list included both addresses that matched facility lists and addresses that did not match facility lists but still had many cases. For example, university student housing showed up clearly in the data. Real property data can help you better understand location types.
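A high frequency list of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the department's method: it assumes addresses have already been cleaned to a consistent form, and the function name, the `min_cases` threshold of 3, and the sample data are hypothetical.

```python
from collections import Counter

def high_frequency_list(case_addresses, facility_by_address, min_cases=3):
    """Return (address, case_count, facility_or_None) rows, most cases first."""
    counts = Counter(case_addresses)
    return [
        # facility is None when the address is not on any facility list
        (address, n, facility_by_address.get(address))
        for address, n in counts.most_common()
        if n >= min_cases
    ]

# Hypothetical cleaned case addresses and facility lookup.
cases = ["100 MAIN ST"] * 4 + ["200 OAK AVE"] * 3 + ["5 ELM CT"]
facilities = {"100 MAIN ST": "Example Nursing Home"}

for row in high_frequency_list(cases, facilities):
    print(row)
```

Rows with no facility name are the ones worth checking against real property data, since they may be congregate settings missing from the facility lists.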

How to Summarize Results

Daily, Baltimore City Health Department epidemiologists produced a summary report of facilities with the number of cases and deaths. This report was sent to the outbreak investigation and contact tracing teams to facilitate their work. Case line lists by facility were also rapidly produced to allow investigators to compare against lists manually created as cases were reported by facilities.
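The summary step can be sketched as a simple aggregation. The example below is a minimal Python illustration that assumes each matched case is a record with a facility name and a death indicator. The field names `facility` and `died`, the function name, and the sample records are hypothetical.

```python
from collections import defaultdict

def summarize_by_facility(matched_cases):
    """Return (facility, cases, deaths) rows, largest case count first."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for case in matched_cases:
        row = totals[case["facility"]]
        row["cases"] += 1
        if case["died"]:
            row["deaths"] += 1
    # Sort by case count (descending), then by name so ties are stable.
    return sorted(
        ((name, row["cases"], row["deaths"]) for name, row in totals.items()),
        key=lambda r: (-r[1], r[0]),
    )

# Hypothetical matched records.
matched = [
    {"facility": "Example Nursing Home", "died": True},
    {"facility": "Example Nursing Home", "died": False},
    {"facility": "Example Shelter", "died": False},
]
summary = summarize_by_facility(matched)
```

The same matched records, grouped rather than counted, would give the case line lists by facility.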

Impact of Case and Facility Matching

Providing a daily summary report of COVID-19 cases and deaths gave health department staff a better starting point when investigating COVID-19 cases and outbreaks and performing contact tracing. These summaries spared staff from manually scrutinizing lengthy lists of unstandardized addresses and enabled epidemiologists to quickly identify patterns. As a result, Baltimore City Health Department’s COVID-19 outbreak and contact tracing operations became more efficient and effective. This was crucial, given the scale and complexity of the public health response to the pandemic.

An interactive map was created using ArcGIS Online to highlight populations of older adults and people in facilities potentially at risk of COVID-19 during the first few days of the pandemic.

Five Lessons Learned

Sometimes the end may just be the beginning. Lessons learned should be incorporated to improve processes over the long term. Matching health records with facilities lists can provide timely operational information. Facility matching has been expanded from cases to deaths and vaccination data. Matching can potentially shorten the time from laboratory test to outbreak detection, or at the very least allow contact tracing to begin sooner.

  1. Providing matched lists of cases and facilities greatly aids outbreak investigation and contact tracing.
  2. Simple cleaning of address fields can greatly improve the matching process during geocoding—making it faster and more accurate.
  3. Facility staff can be matched to their work location, rather than their residential location.
  4. Consider big data approaches when cleaning address fields. These could include breaking addresses into component parts and using fuzzy matching. Large organizations could consider paying for address cleaning.
  5. Realize that address matching has limitations. Matching will almost always underestimate the burden of disease in facilities.
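The approach in lesson 4, breaking addresses into component parts and using fuzzy matching, can be sketched with Python's standard library. This is one possible illustration, not the authors' implementation. It assumes cleaned addresses that begin with a house number, requires the house number to match exactly, and compares street names with `difflib.SequenceMatcher`. The function names, the 0.8 threshold, and the sample addresses are hypothetical.

```python
from difflib import SequenceMatcher

def split_address(address):
    """Split '100 MAIN ST' into ('100', 'MAIN ST')."""
    number, _, street = address.partition(" ")
    return number, street

def best_facility_match(address, facility_addresses, threshold=0.8):
    """Return the closest facility address, or None if nothing is close enough."""
    number, street = split_address(address)
    best, best_score = None, 0.0
    for candidate in facility_addresses:
        cand_number, cand_street = split_address(candidate)
        if cand_number != number:
            continue  # house numbers must agree exactly
        score = SequenceMatcher(None, street, cand_street).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None

facility_addresses = ["100 MAIN ST", "200 OAK AVE"]
print(best_facility_match("100 MIAN ST", facility_addresses))
```

Requiring an exact house number keeps a misspelled street from matching a neighboring building, at the cost of missing cases where the number itself was mistyped. Any threshold should be checked against a sample of known matches before it is relied on.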

For more information,
contact Jonathan Gross at

About the authors

Jonathan Gross

Jonathan Gross is an epidemiologist at the Baltimore City Health Department. Throughout the COVID-19 pandemic, he created maps and performed spatial analysis of COVID-19 cases, deaths, hospitalizations, and vaccinations. His background includes a master’s degree in public health in epidemiology from the University of Michigan, Ann Arbor. He holds a certification in public health and a graduate certificate in geographic information systems from Johns Hopkins Advanced Academic Programs.

Darcy Phelan-Emrick

Darcy Phelan-Emrick is the chief epidemiologist at the Baltimore City Health Department. She leads the department’s efforts to use existing data sources and new data collection tools to assess public health needs and inform policy development and assurance. She is a faculty member at the Johns Hopkins Bloomberg School of Public Health. Phelan-Emrick earned a doctorate in public health and a master of health science degree in epidemiology at Johns Hopkins University.