Parallel processing is available for a number of geoprocessing tools in the Analysis toolbox. It can improve the performance of your analysis if your data has a very large number of features (hundreds of thousands or millions), the data is complex, and the machine you are running the analysis on has enough resources to process the data in parallel.
This blog introduces the tools that use parallel processing, explains how parallel processing works differently depending on the tools you are using, and offers insight into whether you should consider parallel processing for your analysis workflows.
A bit of history…
ArcGIS Pro 1.0 introduced the Pairwise Intersect tool, the Pairwise Dissolve tool was added at 1.1, and the Pairwise Buffer tool was added at 1.3. The Pairwise Dissolve and Pairwise Buffer tools replicate the basic functionality of the standard Dissolve and Buffer tools that have come with ArcGIS for many years. However, the output of Pairwise Intersect can be quite different from that of the standard Intersect tool, so using Pairwise Intersect in place of Intersect will require some reworking of your workflows. In ArcGIS Pro 1.2 parallel processing was enabled for the Pairwise tools. ArcGIS Pro 1.3 introduced parallel processing for area-area (polygon on polygon) overlay operations. Clip, Erase, Identity, Intersect, Split, Union, and Update can all run using parallel processing if the inputs are polygons (with more parallel options coming in future releases). Parallel processing is controlled using the parallelProcessingFactor geoprocessing environment.
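As a rough illustration of setting that environment from Python, here is a minimal sketch. The helper names normalize_factor and set_parallel_factor are hypothetical, not part of arcpy. The sketch assumes the environment accepts an empty string (each tool decides), a whole number such as "0" or "4", or a percentage such as "75%", and it only touches arcpy.env.parallelProcessingFactor when arcpy can actually be imported.

```python
import re


def normalize_factor(value):
    """Return a cleaned-up string for the parallelProcessingFactor environment.

    Assumed accepted forms: "" (let each tool decide), a whole number such
    as "0" or "4", or a percentage such as "75%".
    """
    text = str(value).strip()
    if text == "" or re.fullmatch(r"\d+%?", text):
        return text
    raise ValueError(f"unsupported parallel processing factor: {value!r}")


def set_parallel_factor(value):
    """Apply the factor if arcpy is installed; report whether it was applied."""
    factor = normalize_factor(value)
    try:
        import arcpy  # only available inside an ArcGIS Pro Python environment
    except ImportError:
        return factor, False
    arcpy.env.parallelProcessingFactor = factor
    return factor, True
```

Calling set_parallel_factor("0") before running a tool is one way to compare a run with parallel OFF against a run with, say, set_parallel_factor("100%").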
What is it?
Parallel Processing allows tools to distribute their work across multiple cores. For a deeper understanding, see the following resources on parallel processing:
How does it work in ArcGIS?
Pairwise tools and Parallel Processing
The Pairwise tools were designed to deal with data in discrete, well-defined batches. The data is generally examined feature by feature (Pairwise Buffer and Pairwise Dissolve) or by pairs of features (Pairwise Intersect), and these algorithms are more straightforward to run in parallel. The Pairwise tools attempt to ‘batch’ the data in such a way as to keep as many of the CPU cores as possible busy processing these jobs.
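To make the batching idea concrete, here is a small sketch of feature-by-feature work split into batches and handed to a pool of workers. It is not how the Pairwise tools are implemented: the 'features' are plain bounding boxes, buffer_box is a stand-in for a real buffer, and all function names are made up for illustration. Threads are used only to keep the sketch portable. A real CPU-bound job would use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def buffer_box(feature, distance):
    """Stand-in for a per-feature operation: grow a bounding box."""
    xmin, ymin, xmax, ymax = feature
    return (xmin - distance, ymin - distance, xmax + distance, ymax + distance)


def process_batch(batch, distance):
    """Each feature is handled independently of every other feature."""
    return [buffer_box(feature, distance) for feature in batch]


def pairwise_style_buffer(features, distance, workers=4, batch_size=2):
    """Hand batches to a pool of workers and collect results in input order."""
    batches = make_batches(features, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda batch: process_batch(batch, distance), batches)
        return [out for batch_result in results for out in batch_result]
```

Because no feature depends on any other, the batches can be processed in any order and on any worker, which is what makes this style of algorithm easy to run in parallel.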
Analysis Overlay Tools
Found in the Analysis toolbox under the Extract and Overlay toolsets, the Analysis Overlay tools we’ve known and loved for many years (where would we be without them?!) were not designed to deal with data in discrete, well-defined batches. For most of the overlay tools, the underlying engine that does all the work, the Topology Engine (TE), creates a topological fabric across the entire extent of the data. It’s more of an ‘all at once’ algorithm than a feature-by-feature or feature-pair algorithm. Its goal is to maintain the topological integrity of the data across its entire extent and to determine all spatial interactions between all the input features, regardless of which input feature class each came from. On top of this, the Topology Engine aims to process any data, regardless of size or complexity, while maintaining this topological integrity, and to do so as efficiently as possible. It has been designed to recognize when the machine it is running on cannot handle the processing of the data (due to size or complexity relative to available resources) and, if necessary, will begin to spatially tile the data.
The TE’s parallel implementation (area-area overlay only… more coming in later releases) processes tiles in groups of 4, using a maximum of 4 cores. With parallel OFF, the TE first checks whether all the data will fit into one tile. With parallel ON, the TE instead tiles the data immediately, creating an n x n initial tiling based on an analysis of the input data and available memory. The minimum is a 2 x 2 tiling (4 tiles in total), and the TE treats each of these tiles as if it were the upper level tile (tile level 0).
The job of processing each tile is sent to a separate core. Available memory is estimated at the start of each job and at the start of each tile, and each job sets a limit that is more conservative than with parallel OFF (because more processing jobs are running at once). If these tiles also need to be tiled (because their data does not fit within the estimated memory limit), the subdivided tiles are sent to a job stack and the parallel TE continues to process them as cores become available. Tiles are processed in batches of 4 as much as possible. After all tiles are processed, all output units that straddle one or more tile boundaries are fused in a separate parallel process.
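The tile-and-stack flow described above can be simulated in a few lines. This is only a toy model, not the Topology Engine: the data is a list of points, the 'memory limit' is a maximum point count per tile, tiles are always split into four equal quadrants, and the final fusing step is left out. The function names and the max_depth safety guard are assumptions added for the sketch.

```python
def split_tile(tile):
    """Split a tile (xmin, ymin, xmax, ymax) into four equal quadrants."""
    xmin, ymin, xmax, ymax = tile
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [(xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
            (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax)]


def points_in(tile, points):
    """Points inside a tile; the max edges are excluded so tiles never overlap."""
    xmin, ymin, xmax, ymax = tile
    return [(x, y) for x, y in points if xmin <= x < xmax and ymin <= y < ymax]


def tile_and_process(points, extent, limit, max_depth=8):
    """Start from a 2 x 2 tiling and work through a job stack in batches of 4.

    A tile holding more than `limit` points stands in for data that does not
    fit the memory estimate: it is subdivided and its children go back on the
    stack. Returns the finished tiles with their point counts and the number
    of batches that were needed.
    """
    stack = [(tile, 0) for tile in split_tile(extent)]
    finished, batches = [], 0
    while stack:
        batch = [stack.pop() for _ in range(min(4, len(stack)))]
        batches += 1
        for tile, depth in batch:  # in the real engine these run on separate cores
            inside = points_in(tile, points)
            if len(inside) > limit and depth < max_depth:
                stack.extend((child, depth + 1) for child in split_tile(tile))
            elif inside:
                finished.append((tile, len(inside)))
    return finished, batches
```

With four points clustered in one corner of a 16 x 16 extent and a limit of 2, the crowded corner is subdivided twice, so three batches are needed where evenly spread data would need one.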
Parallel Processing faster? Maybe.
If your analysis runs against a few thousand features or fewer, the data is fairly simple, and processing takes just a few minutes or less, it is unlikely that parallel processing will provide performance gains. Running in parallel with small amounts of simple data may actually make things slower. Something to keep in mind… sometimes you get performance gains because of the large amount of data, and other times because of the complexity of the data. Sometimes you have both large amounts of data and huge complexity, and parallel provides improved performance. Sometimes it does not. It’s complicated, so giving it a try is the best way to know whether parallel processing will improve performance for you.
Running the Pairwise tools in parallel generally speeds up performance. The number of cores and the amount of available RAM will help determine whether parallel processing provides performance improvements. The distribution of the data can also be a factor, as can the amount of data. In rare cases there can be performance issues because the data is randomly ordered (spatially). This can occur in very active editing workflows. If your data is randomly ordered, you will notice that it draws randomly in a map rather than in blocks of features. In these cases, spatially sorting the data using the Sort tool can help improve performance.
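To show what 'spatially sorted' means, here is a sketch that orders points along a Z-order (Morton) curve so that features close together on the map end up close together in the list. This is a swapped-in technique for illustration only. It is not the algorithm the Sort tool uses, and the function names are hypothetical. The extent is assumed to have a non-zero width and height.

```python
def interleave(value, bits=16):
    """Spread the low `bits` bits of value so there is a gap after each bit."""
    out = 0
    for i in range(bits):
        out |= ((value >> i) & 1) << (2 * i)
    return out


def morton_key(x, y, extent, bits=16):
    """Z-order key for a point inside extent (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    scale = (1 << bits) - 1
    # Snap the point onto a 2^bits x 2^bits grid covering the extent.
    ix = int((x - xmin) / (xmax - xmin) * scale)
    iy = int((y - ymin) / (ymax - ymin) * scale)
    return interleave(ix, bits) | (interleave(iy, bits) << 1)


def spatial_sort(points, extent, bits=16):
    """Return the points ordered along the Z-order curve."""
    return sorted(points, key=lambda p: morton_key(p[0], p[1], extent, bits))
```

After sorting, consecutive records tend to fall in the same part of the map, which is the 'draws in blocks' behavior described above.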
Standard Overlay Tools
The Topology Engine was designed to be very efficient with extremely large, complex data within the bounds of the available resources on the machine on which you are running the overlay tools. We’ve found that the point at which running the TE in parallel improves performance is often quite far down the road in terms of the size and complexity of the data. We have seen massively complex cases that take hours and hours with parallel OFF take only minutes with parallel processing turned ON. However, this will not always be the case. We still see cases that take the same amount of time or longer with parallel turned ON.
For the standard overlay tools that use the Topology Engine, we sometimes see slower performance when parallel processing is used with data that the Topology Engine can process in a single tile with parallel processing turned OFF; in other words, data that is neither big enough nor complex enough to require tiling when run with parallel OFF. Why the slowdown? There is a trade-off between tile subdivision and parallel processing. The basic idea is that the parallel operation will compensate for the extra work required by tiling and yield a net performance gain, but this is not always the case.
As mentioned above, with parallel TE turned ON we take the data and immediately create a 2 x 2 tiling. The TE then processes each of these ‘tiles’ separately as if it were the upper level tile, sending the job for each individual tile to its own core. Since we are now processing 4 tiles at once, each of these tiles gets a bit less memory at the start of each job and tile than it would with parallel TE OFF. The allocated memory is calculated by the TE as 60% of available physical memory, subject to an 8 GB limit. We need to be careful about the amount of memory each tile can use so as not to push the machine into a state where it runs out of memory and has to page. In these cases each tile gets slightly less memory than if we had been able to proceed with the operation using one big tile for all the processing (as it would with parallel OFF). Processing the data within one of the parallel tiles may therefore require additional tiling to deal with the size or complexity of the data in that quadrant, even though the data, taken as a whole with parallel TE OFF, did not need tiling at all. More tiling often means more time.
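The memory rule quoted above (60% of available physical memory, subject to an 8 GB limit) works out as a simple capped percentage. The sketch below just does that arithmetic. It assumes 1 GB means 1024^3 bytes, and the function name is made up. How the TE then shares memory among the 4 tiles in a batch is not modeled here.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def te_memory_limit(available_bytes, percent=60, cap_bytes=8 * GB):
    """60% of available physical memory, but never more than the 8 GB cap."""
    return min(available_bytes * percent // 100, cap_bytes)
```

For example, a machine with 10 GB available gets a 6 GB limit, while any machine with more than about 13.3 GB available hits the 8 GB cap.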
Another scenario where you may see parallel being slower is in cases where the data is not distributed evenly across the extent of the data. For example, where one of the 4 parallel tiles ends up with most of the data… or most of the complex data. We have 4 upper tiles to process, but one has ‘all’ the data or ‘all’ the complex data with complex feature interaction (overlapping features that generate thousands – millions of new features representing the unique overlap in the inputs). This one tile, allocated less memory than it would have gotten if run with parallel OFF, has to work with fewer resources and ends up tiling the data a great deal more (which is expensive). The subdivided tiles go on an input stack for the next batch of 4 tiles. Each tile in a 4 tile “batch” will always have the same amount of allocated memory to work within.
To be clear, you may get the impression that every time a tile is subdivided there is less memory to work with. This is not so. Each batch of 4 tiles has approximately the same amount of memory to work with regardless of the subdivision level. When tile data is read, we estimate whether the data will fit within the estimated memory limit. If not, the read is discarded and the tile is subdivided. Normally the only expense is the partial re-reading of the data, and reading data is very fast.
It may seem obvious, but another reminder is always a good thing: having your input and output data on a solid-state drive (SSD) will provide the best read/write performance, whether you are using the TE or the Pairwise tools.
The benefits of using parallel processing with tools that use the TE or Pairwise processing depend on the data itself and on the available resources of the machine it is run on. You will want your machine to have at least 4 cores, but more are recommended. In ArcGIS Pro 1.3 the TE tools will only use 4 cores. Pairwise tools will use from 1 up to the number of cores on the machine (all the available cores will be used, provided there is enough work to keep them busy). As far as memory (RAM) goes, the more the merrier. There are no general guidelines on how much memory you’ll need; the possibilities are endless, depending on the size and complexity of your data.
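One rough way to spot the uneven-distribution scenario described above is to count how many features fall into each quadrant of the data’s extent. The sketch below does this for feature centroids given as plain (x, y) pairs. It assumes a simple midpoint 2 x 2 split, which may not match the initial tiling the TE actually chooses, and the function name is hypothetical.

```python
from collections import Counter


def quadrant_shares(points, extent):
    """Fraction of points falling in each quadrant of (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    counts = Counter()
    for x, y in points:
        label = ("S" if y < ymid else "N") + ("W" if x < xmid else "E")
        counts[label] += 1
    total = len(points)
    if total == 0:
        return {}
    return {label: counts[label] / total for label in ("SW", "SE", "NW", "NE")}
```

If one quadrant holds most of the features, that is a hint that one of the 4 parallel tiles could end up doing most of the work.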
The default behavior of the tools and parallel processing gives you a hint as to where you should start when investigating if parallel processing is going to help your analysis perform more quickly.
For Pairwise tools, parallel processing is turned ON by default. Generally, but not always, when run in parallel the Pairwise tools will run as fast as or faster than when run with parallel off for a large variety of data.
For Standard Overlay tools the default is for parallel processing to be OFF (remember, at 1.3 only area-area overlay supports parallel processing). For small, simple data parallel processing will most likely slow things down for standard overlay tools. Mid to very large, complex overlay operations may benefit from parallel processing if the machine the operations are being run on can support it.
So, should you try parallel? I believe there are a lot of performance gains to be had with many different types of data when using the Pairwise Buffer and Pairwise Dissolve tools in parallel. Results will vary, but I’d give them a try and compare the results against running the same analysis with the standard Buffer and Dissolve tools. The Pairwise Intersect tool run in parallel will have more varied results, but there will be performance benefits to be had (keep in mind that the output differences between Pairwise Intersect and Intersect will make you rethink your workflow). For the tools that use the Topology Engine, if the inputs are polygons and you have analysis workflows that you run over and over against extremely large, complex data, give it a try to see if you can get an improvement in performance. Just remember, the more memory you have, the more likely you will have success!