Join Features allows you to transfer attributes between datasets based on spatial or attribute relationships, making it one of the most powerful and commonly used tools in ArcGIS Online. The June 2020 update of ArcGIS Online includes a new parameter for Join Features, Keep all target features, that allows you to choose whether to keep unmatched features from the target layer in the output dataset.
In this blog, you’ll learn the basics of creating joins in your data and how to use the new Keep all target features parameter, and see a real-world scenario that shows why the Keep all target features parameter is important in your analysis.
What are dataset joins?
Joins are used to take the features from two datasets and combine them together based on shared values, shared locations, or both. For example, you can use a join to add locations to a nonspatial table using fields with county codes or add road information to a dataset with traffic collision locations using a spatial relationship.
In ArcGIS Online, you begin with a target layer and append information to it from a join layer. It’s important to choose the correct target layer because the spatial information in the output dataset will be based on the target layer. If you are adding attributes from a nonspatial table to county boundaries, the county boundaries will be the target layer and the nonspatial table will be the join layer. The output layer will contain the county boundaries along with the information from the joined table.
In this example, each target feature has only one matching join feature. In other cases, there may be multiple join features for a single target feature. You can choose a one-to-many join, which creates an output dataset containing multiple entries of the same target feature, each with a different matching join feature. You can also create a one-to-one join, which either summarizes the join features using statistics like sum or average, or returns only the first or latest matching join feature for each target feature. For more information about join operations and statistics, see Usage notes in the Join Features documentation.
To keep things simple, the remaining examples will continue to use only one matching join feature for each target feature.
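The difference between a one-to-many join and a one-to-one join with a summary statistic can be sketched in plain Python. This is only an illustration of the concept, not how the Join Features tool is implemented. The county IDs, field names, and collision counts below are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical target layer: one record per county.
target = [
    {"CountyID": 1001, "Name": "County A"},
    {"CountyID": 1002, "Name": "County B"},
]

# Hypothetical join table: several rows can share the same CountyID.
join_rows = [
    {"CountyID": 1001, "Collisions": 4},
    {"CountyID": 1001, "Collisions": 6},
    {"CountyID": 1002, "Collisions": 3},
]

# Group the join rows by the shared key.
matches = defaultdict(list)
for row in join_rows:
    matches[row["CountyID"]].append(row)

# One-to-many: the target feature is repeated once per matching join row.
one_to_many = [
    {**t, "Collisions": j["Collisions"]}
    for t in target
    for j in matches.get(t["CountyID"], [])
]

# One-to-one with a sum statistic: one output row per matched target feature.
one_to_one_sum = [
    {**t, "Collisions": sum(j["Collisions"] for j in matches[t["CountyID"]])}
    for t in target
    if t["CountyID"] in matches
]

print(len(one_to_many))   # County A appears twice, County B once
print(one_to_one_sum)     # County A is summarized into a single row
```

In the one-to-many result, County A is repeated for each of its two join rows. In the one-to-one result, those two rows are collapsed into a single summed value.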
Keep all target features
When you create a join, whether spatial or attribute based, the Join Features tool looks for matches between the target layer and the join layer. The Keep all target features parameter determines which features from the target layer are included in the output dataset, and it gives you two options. The first option is to leave the parameter unchecked so that the output includes only the features from the target layer that have a match in the join layer. In the diagram below, the target layer includes the CountyID 1237. Since there is no matching county in the join layer, the feature is not included in the output.
This join type is commonly called an inner join, because it only includes overlapping or matching features from both input datasets.
The second option is to include all features from the target layer even if they don’t have a match in the join layer by enabling the Keep all target features parameter. In the diagram below, the output includes county 1237, even though there are no matching values from the join layer.
This join type is commonly called a left outer join, because it includes all features from the left (or target) dataset, plus the matching features from the join dataset.
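The two behaviors can be sketched as a small attribute join in plain Python. The function `join_features` and its `keep_all_target_features` argument are hypothetical names chosen to mirror the tool's parameter, not the ArcGIS API. Apart from CountyID 1237, the IDs, names, and population values are illustrative assumptions.

```python
def join_features(target, join, key, keep_all_target_features=False):
    """Attribute join on `key`, assuming at most one join row per key value."""
    lookup = {row[key]: row for row in join}
    # Fields contributed by the join layer, excluding the shared key.
    join_fields = sorted({f for row in join for f in row if f != key})
    output = []
    for feature in target:
        match = lookup.get(feature[key])
        if match is not None:
            # Matched: append the join attributes to the target feature.
            output.append({**feature, **match})
        elif keep_all_target_features:
            # Unmatched but kept: join attributes are left empty (left outer join).
            output.append({**feature, **{f: None for f in join_fields}})
        # Unmatched and not kept: the feature is dropped (inner join).
    return output


# Hypothetical layers; county 1237 has no row in the join layer.
target = [
    {"CountyID": 1235, "Name": "County A"},
    {"CountyID": 1236, "Name": "County B"},
    {"CountyID": 1237, "Name": "County C"},
]
join = [
    {"CountyID": 1235, "Population": 52000},
    {"CountyID": 1236, "Population": 18000},
]

inner = join_features(target, join, "CountyID")
left_outer = join_features(target, join, "CountyID", keep_all_target_features=True)

print([f["CountyID"] for f in inner])
print([f["CountyID"] for f in left_outer])
```

With the parameter off, county 1237 is absent from the result. With it on, county 1237 is present and its `Population` value is empty.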
Now that you know how Keep all target features works, let’s see it in action. We’ll look at a scenario comparing the COVID-19 incident rate to unemployment percentages across counties in the continental United States using data from the COVID-19 GIS Hub.
The data for COVID-19 cases by county is available in the COVID-19 Cases US dataset by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
The unemployment data by county is available in the Bureau of Labor Statistics Monthly Unemployment (current 14 months) dataset by Esri demographics.
To understand the relationship between COVID-19 cases and unemployment, we have to join the two datasets so that both attributes are available in the same layer. Since the unemployment layer has data for every county, we’ll use it as the target layer. The COVID-19 cases will be the join layer that gets appended onto the unemployment data. These two datasets can be joined either spatially or using the county code.
When the Keep all target features parameter is left unchecked, the output layer is missing the counties that have no recorded COVID-19 cases.
By contrast, when Keep all target features is checked, the output layer contains all counties.
The layer can be used to create a Relationship map that shows the interaction of high and low values for both unemployment percentage and COVID-19 incident rate. Counties missing one or more values are styled with a gray symbol for Other.
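As a rough sketch of why the unmatched counties still matter, the snippet below flags records from a joined output that are missing either value, the same condition that puts a county in the gray Other class. The field names, the `symbol_class` helper, and all values are placeholders invented for illustration. They are not taken from the actual datasets or from the map styling logic.

```python
# Placeholder records standing in for the joined output layer (values are made up).
joined = [
    {"County": "County A", "UnemploymentPct": 4.2, "IncidentRate": 310.5},
    {"County": "County B", "UnemploymentPct": 6.8, "IncidentRate": None},
    {"County": "County C", "UnemploymentPct": 3.1, "IncidentRate": 95.0},
]

FIELDS = ("UnemploymentPct", "IncidentRate")


def symbol_class(record, fields=FIELDS):
    """Return "Other" when any mapped field is missing, else "Classified"."""
    if any(record.get(f) is None for f in fields):
        return "Other"
    return "Classified"


classes = {r["County"]: symbol_class(r) for r in joined}
print(classes)
```

County B only appears here because the join kept every target feature. With an inner join it would be dropped from the layer instead of being drawn as Other.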
- Join Features is a powerful tool used to combine information from two datasets.
- The Keep all target features parameter allows you to choose whether your output should be based on an inner join or a left outer join.