More adventures in overlay: point in polygon

Counting the number of points in a polygon is a common overlay operation. But unless you’re aware of what happens when points fall on polygon boundaries, or when points fall just outside the coverage of your polygons, you may not be getting the results you expect.

Here’s the scenario:

The map to the left shows the household points with graduated symbols based on the HHSize field.

There are lots of ways to perform the overlay to get the number of households and the sum of household size. For now, let’s look at what I would describe as the “standard method”: Spatial Join, shown to the right. The Target Features are the polygons (Districts) and the Join Features are the points (Households). Since I want the sum of HHSize, I right-click the HHSize field in the Field Map and choose Sum as the Merge Rule. The Match Option is INTERSECT.

The figure below shows the output of Spatial Join, the Districts_SpatialJoin feature class, symbolized by the sum of HHSize, and its table with statistics about the Join_Count field (the number of points found in the polygon) and the HHSize field (the sum of all HHSize values for the polygon). These statistics tell us that 327 points were overlaid and the sum of HHSize for all polygons is 1621.

Now let’s go back to the Households table and get some statistics.  There are 340 points with a HHSize total of 1670.

Huh?  Out of the original 340 points, only 327 were found to intersect the polygons.  The HHSize sums are off as well; instead of 1670, there’s 1621.  Why the difference?

The issue is that some points are ambiguous: some fall outside the coverage of the polygons and some fall exactly on polygon boundaries, as shown on the map to the left. The red triangles show points that fall outside the coverage of the polygons. They do not overlay any polygon, so they don’t get counted at all. More interesting are the blue asterisks, which show points that fall exactly on a polygon boundary. When a point falls exactly on a polygon boundary, it gets overlaid with each polygon that shares the boundary. In the figure, point A overlays two districts and gets counted twice, once in district 5 and once in district 6. Point B overlays districts 2, 4, and 5 and gets counted three times. Point C, although on a border, gets counted only once since it falls on an exterior boundary. Note that the two effects pull in opposite directions: uncounted points lower the totals and over-counted points raise them, so the net shortfall of 13 points understates how many points are actually affected.
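To make the arithmetic concrete, here is a minimal sketch in plain Python (not ArcGIS code). The two rectangular districts, the six household points, and the function names are all made up for illustration; the only thing borrowed from Spatial Join is the idea that an INTERSECT-style match treats a polygon's boundary as part of the polygon.

```python
# Hypothetical districts as (xmin, ymin, xmax, ymax); they share the edge x = 10.
districts = {
    "D1": (0, 0, 10, 10),
    "D2": (10, 0, 20, 10),
}

# Hypothetical households as (x, y, hhsize).
households = [
    (5, 5, 4),    # interior of D1
    (15, 5, 3),   # interior of D2
    (10, 5, 6),   # exactly on the shared edge
    (0, 5, 2),    # on an exterior edge of D1
    (21, 5, 5),   # just outside D2
    (25, 8, 2),   # outside D2
]

def intersects(rect, x, y):
    """Boundary-inclusive test: a point on the edge counts as a match."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax

def overlay(districts, households):
    """Count points and sum hhsize per district, one match per intersecting district."""
    result = {name: {"join_count": 0, "sum_hhsize": 0} for name in districts}
    for x, y, hhsize in households:
        for name, rect in districts.items():
            if intersects(rect, x, y):
                result[name]["join_count"] += 1
                result[name]["sum_hhsize"] += hhsize
    return result

result = overlay(districts, households)
total_count = sum(r["join_count"] for r in result.values())
total_hhsize = sum(r["sum_hhsize"] for r in result.values())

print(len(households), sum(h[2] for h in households))  # input totals
print(total_count, total_hhsize)                       # overlay totals
```

With this toy data the input has 6 points and a household-size total of 22, while the overlay reports 5 and 21: two outside points are dropped (minus 2 points, minus 7 in size) and the shared-edge point is counted in both districts (plus 1 point, plus 6 in size). The point on the exterior edge is counted once, like point C.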

Finding boundary points

To find boundary points, use the Select By Location tool found in the Selection menu of ArcMap. The Target layer is the point layer (Households) and the Source layer is the polygon layer (Districts). The Spatial selection method is “touch the boundary of the source layer feature”.

The result is a selection of points that fall exactly on the boundary of a polygon.

Finding points outside

To find points that fall outside the coverage of the polygons, use Select By Location with the same inputs as above (be sure to clear any selections beforehand), but choose “are within the source layer feature” for the Spatial selection method. This selects all points that are covered by the polygons. Next, right-click the point layer (Households), and click Selection > Switch Selection. The result is a selection of points that fall outside the coverage of the polygons.
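If you want to see the logic behind these two selections, here is a rough sketch in plain Python. It is not what Select By Location does internally; it is just one common way (an on-segment check plus ray casting) to sort points into boundary and outside groups. The polygons, point names, and function names are hypothetical, and integer coordinates are used so the on-boundary test is exact.

```python
def on_segment(p, a, b):
    """True if point p lies exactly on the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False  # not collinear with the segment
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)

def classify(point, polygon):
    """Return 'boundary', 'interior', or 'outside' for a simple polygon (vertex list)."""
    n = len(polygon)
    edges = [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]
    if any(on_segment(point, a, b) for a, b in edges):
        return "boundary"
    px, py = point
    inside = False
    for (ax, ay), (bx, by) in edges:
        # Does a horizontal ray from the point toward +x cross this edge?
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if x_cross > px:
                inside = not inside
    return "interior" if inside else "outside"

def find_ambiguous(points, polygons):
    """Split point names into those on any boundary and those outside every polygon."""
    boundary, outside = [], []
    for name, pt in points.items():
        kinds = {classify(pt, poly) for poly in polygons.values()}
        if "boundary" in kinds:
            boundary.append(name)
        elif "interior" not in kinds:
            outside.append(name)
    return boundary, outside

# Hypothetical data: two square districts sharing the edge x = 10.
polygons = {
    "D1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "D2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
points = {"A": (10, 5), "B": (5, 5), "C": (0, 3), "D": (22, 5), "E": (10, 10)}

boundary, outside = find_ambiguous(points, polygons)
print(boundary, outside)
```

With real coordinates stored as floating-point numbers, an exact test like this is fragile, so a small tolerance would be needed.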

If you have any ambiguous points (border points or outside points), you need to determine what to do with them before doing any point-in-polygon analysis.

  1. Are the outside points truly outside? In this example, the outside points are just a few meters away from a polygon boundary, and I want them to overlay the closest polygon.  For your data, you’ll have to decide based on your data resolution and quality whether outside points should overlay a polygon.
  2. For boundary points, do you care which polygon they overlay?  For some analyses, you may want the point to overlay each polygon that shares a boundary (because, technically, the point does belong in each polygon).  Or perhaps you don’t care, and just want the software to randomly assign the point to one (and only one) of the polygons that share the boundary.

You could, of course, edit the points and move them inside the polygon to which they belong, but this requires that you have more information than just the location of the point. In practice, that means an attribute on each point that tells you which polygon the point should overlay. And if you have this information, you may not have to do an overlay at all; you’d use the attribute in something like Summary Statistics to calculate totals for each polygon. If your points are the result of geocoding addresses, see the Address Matching section at the end of this post; it may be that you need to geocode your addresses again using a side offset.
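As a rough illustration of the attribute route, here is a small plain-Python sketch of a count-and-sum grouped by a district attribute, which is the kind of result Summary Statistics would give you with a case field. The records, the field names `district` and `hhsize`, and the function name are made up for the example.

```python
from collections import defaultdict

# Hypothetical household records that already carry a district attribute.
households = [
    {"district": 5, "hhsize": 4},
    {"district": 5, "hhsize": 6},
    {"district": 6, "hhsize": 3},
    {"district": 2, "hhsize": 5},
]

def summary_statistics(rows, case_field, sum_field):
    """Group rows by case_field; return the frequency and the sum of sum_field."""
    totals = defaultdict(lambda: {"frequency": 0, "sum": 0})
    for row in rows:
        key = row[case_field]
        totals[key]["frequency"] += 1
        totals[key]["sum"] += row[sum_field]
    return dict(totals)

stats = summary_statistics(households, "district", "hhsize")
print(stats)
```

Because no geometry is involved, every record is counted exactly once no matter where its point falls.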

So, if editing and moving the points isn’t an option (either because there are too many or because there’s no other information), you need something other than the “standard method” of using Spatial Join shown above. My favorite is…

Use proximity, not overlay

It may seem counter-intuitive, but instead of using an overlay function to do point-in-polygon analysis, use a proximity function.

In ArcGIS, the proximity tools (such as Near and Generate Near Table) have a unique (but not unexpected) behavior: when two or more nearby features are equidistant from the target feature, one of the nearby features is chosen at random.  When a point falls on a polygon boundary, it is equidistant from all bounding polygons (the distance to each is zero).  That means that one of the bounding polygons is chosen at random to be the closest polygon, and over-counting is eliminated.  Points that fall just outside the polygons are also handled, because each one is assigned to its closest polygon.
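Here is a minimal plain-Python sketch of the idea, under some loud assumptions: districts are simple rectangles, the data and function names are invented, and the random tie-break is done with `random.choice` to mimic the behavior described above rather than to reproduce how the Near tool is implemented. The optional `search_radius` argument is my own addition to the sketch, for the case where you decide that far-away points should stay unassigned.

```python
import math
import random

# Hypothetical districts as (xmin, ymin, xmax, ymax); they share the edge x = 10.
districts = {"D1": (0, 0, 10, 10), "D2": (10, 0, 20, 10)}

# Hypothetical households as (x, y, hhsize).
households = [
    (5, 5, 4), (15, 5, 3),   # interior points
    (10, 5, 6),              # on the shared edge
    (0, 5, 2),               # on an exterior edge
    (21, 5, 5), (25, 8, 2),  # outside the districts
]

def distance_to_rect(rect, x, y):
    """Distance from a point to a rectangle; 0 if inside or on the boundary."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0, x - xmax)
    dy = max(ymin - y, 0, y - ymax)
    return math.hypot(dx, dy)

def nearest_district(districts, x, y, rng, search_radius=None):
    """Return the closest district, picking at random among equidistant ones."""
    distances = {name: distance_to_rect(rect, x, y) for name, rect in districts.items()}
    best = min(distances.values())
    if search_radius is not None and best > search_radius:
        return None  # too far from every district to be assigned
    tied = sorted(name for name, d in distances.items() if d == best)
    return rng.choice(tied)

def summarize(districts, households, seed=0, search_radius=None):
    """Assign each household to exactly one district, then count and sum per district."""
    rng = random.Random(seed)
    totals = {name: {"count": 0, "sum_hhsize": 0} for name in districts}
    for x, y, hhsize in households:
        name = nearest_district(districts, x, y, rng, search_radius)
        if name is not None:
            totals[name]["count"] += 1
            totals[name]["sum_hhsize"] += hhsize
    return totals

totals = summarize(districts, households)
print(totals)
```

Whichever district wins the tie for the shared-edge point, every household is assigned at most once, so the counts and sums across districts add up to the input totals when no search radius is set.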

Here’s a model that does point-in-polygon analysis using the Near tool.

Below are details about each tool in the model.


Summary Statistics

Join Field

Address Matching

Points that fall on polygon boundaries are usually the result of geocoding addresses to street centerlines that also form polygon boundaries (in the example above, the Districts feature class uses street centerlines as polygon boundaries).  If a side offset is not specified when geocoding addresses, the geocoded points will fall exactly on the street centerlines and, by association, the polygon boundaries.  So, to avoid ambiguous points, you’ll want to specify a side offset on the address locator you use for geocoding.  An offset of just a few feet or meters is all you need to avoid ambiguity.
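To show what a side offset does geometrically, here is a small plain-Python sketch. It is not the address locator's algorithm; the function name, the `fraction` parameter (how far along the segment the address is interpolated), and the "L"/"R" side codes are assumptions made for the example.

```python
import math

def offset_point(start, end, fraction, offset, side):
    """Return a point `offset` units to the left ('L') or right ('R') of segment start->end."""
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    # Position on the centerline where the address would be interpolated.
    cx, cy = x1 + fraction * dx, y1 + fraction * dy
    # Unit normal pointing to the left of the direction of travel.
    nx, ny = -dy / length, dx / length
    sign = 1 if side == "L" else -1
    return (cx + sign * offset * nx, cy + sign * offset * ny)

# Hypothetical centerline running north along x = 10, which is also a district boundary.
left = offset_point((10, 0), (10, 10), 0.5, 3, "L")
right = offset_point((10, 0), (10, 10), 0.5, 3, "R")
print(left, right)
```

With no offset, both addresses would land at (10, 5), exactly on the boundary. With a 3-unit offset, the left-side address lands at x = 7 and the right-side address at x = 13, so each falls cleanly inside one district.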

Best Practices

To wrap this up, if you’re doing a point-in-polygon analysis, you need to investigate your data before performing the analysis:
