Raster Image Processing Tips and Tricks — Part 4: Image Classification

By Allison Muise

This is the fourth in a series of blog posts covering tips and tricks for georeferencing, mosaicking, masking and classifying a series of aerial images using ArcGIS 10.0.

These images are from a project I recently completed looking at the structure of a seabird colony off the coast of Nova Scotia, Canada, and are representative of the less-than-ideal imagery many of us have to work with regularly. The final goal of the project was to produce a detailed classification of the island’s vegetation from a series of digital aerial photographs. The georeferencing and mosaicking of the imagery were covered in previous posts, as was creating a polygon mask of the island. In this final section, I’ll focus on classification techniques, including identifying parts of the image that need more focused training areas, what makes a good training area, and how you can quickly and easily clean up your classified image. An excellent resource on classification in ArcGIS 10.0 is available here.

A high-quality vegetation classification can be a powerful tool in vegetation and habitat analysis, as it can provide a lot of information about a large area with relatively little work. Initially I tried to classify the entire image, but reflections in the water and waves breaking on shore made it difficult for the software to construct an accurate classification, even using hundreds of training areas. To reduce the number of potential misclassifications, the area covered by ocean was excluded from the image using the Extract by Mask tool and the outline of the island created in the previous post.
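To make the masking idea concrete, here is a minimal sketch in plain Python of what masking does to a single band. It is not how the Extract by Mask tool is implemented; the tiny grid, the island mask and the NODATA value are all made up for illustration.

```python
NODATA = -1  # hypothetical placeholder for cells outside the mask


def extract_by_mask(band, mask, nodata=NODATA):
    """Keep cells where the mask is truthy; set every other cell to nodata."""
    if len(band) != len(mask) or any(len(r) != len(m) for r, m in zip(band, mask)):
        raise ValueError("band and mask must have the same shape")
    return [
        [value if keep else nodata for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(band, mask)
    ]


# One band of a made-up 3 x 3 image: bright values are water glare.
band = [
    [200, 210, 90],
    [205, 80, 85],
    [198, 202, 88],
]
# 1 = island, 0 = ocean (this would come from the island outline).
island = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
masked = extract_by_mask(band, island)
```

Only the island cells keep their values, so the bright water pixels can no longer influence the classification.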

Identify Trouble Areas using Unsupervised Classification

As seen in the previous post, the Iso Cluster Unsupervised Classification did a great job of separating the island from the ocean, but it really struggled with grouping the vegetation into meaningful classes. Even using as many as 30 classes, the maximum number where I could still make sense of the results, the software could not distinguish between three of the major vegetation types due to variations in the foliage. While I didn’t get a meaningful classification using this method, it took less than 2 minutes to run, and it let me identify the problem areas where I would need to focus my training areas to get a successful Maximum Likelihood Classification.

Figure 1: Results of a 30 class IsoCluster Unsupervised classification

Dig Deeper with Supervised Classification

A Maximum Likelihood Classification can be time consuming to prepare, but the results can be fabulous. The key? – training areas – lots and lots of good training areas. While you can certainly have too few, I really don’t think you can have too many good quality training areas. The help documentation covers several of these topics in detail, including creating, evaluating and managing training samples.

So… how do you choose good training areas?

1 Enhance your image

If you are working with an RGB image it can be difficult to distinguish subtle differences in the vegetation. The Image Analysis Window offers several simple ways to adjust the display of an image in order to enhance the variations. Applying a stretch renderer such as Histogram-Equalization can make it easier to identify areas of different vegetation. Don’t worry, modifying how the image is rendered (displayed) won’t affect the classification results.
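For readers curious about what a histogram equalization stretch does, the sketch below is the textbook algorithm in plain Python. It is an illustration only: the renderer in ArcGIS may differ in its details, and the pixel values are made up. Note that the function returns new display values and leaves the source values alone, which is why the stretch has no effect on the classification.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer values in 0..levels-1."""
    if not pixels:
        return []
    # Count how often each value occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Build the cumulative distribution.
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # only one distinct value, nothing to stretch
    # Spread the cumulative distribution over the full display range.
    lookup = [
        round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0
        for c in cdf
    ]
    return [lookup[p] for p in pixels]


# Made-up values: four nearly identical greens that are hard to tell apart.
original = [100, 100, 101, 102, 102, 103]
display = equalize(original)  # [0, 0, 64, 191, 191, 255]
```

Values that differed by only one or two units are now spread across the whole display range, which is what makes subtle vegetation differences visible.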

Figure 2: Histogram-Equalization stretched image

2 Choose areas you are familiar with

It makes more sense to have a ‘grasses’ training area than a ‘light green’ training area. There may be several ‘light green’ areas in the imagery and they may not all be the same type of vegetation, but if you know two patches of light green are both grass, these training areas can later be grouped into more meaningful classes. If you can’t visit the area yourself, consult the literature and/or talk to people who are familiar with the area.

3 Make homogeneous training areas

Content is more important than size. The recommended number of pixels per training area is from 10n to 100n, where n is the number of bands in the imagery. These numbers are recommended because they generally supply the software with enough information to determine how to classify each pixel in an image, but the range of values in that training area must also be considered. In my experience, better classification results come from using training areas which are as homogeneous as possible. I have found that sometimes this means creating multiple training areas which are smaller than recommended, but which can later be merged together with the help of information from statistics, scatterplots and histograms. More information is available here on how Maximum Likelihood Classification works.

As an example, trees may appear primarily as mid-tone greens overall, but they usually will also have areas of highlights and shadows. If all three of these areas are included in a single training area, the range of pixel values will be very large, and may fully encompass training areas defining other vegetation types. This can result in misclassifications when the software has to guess which of the two overlapping classes is most appropriate. The areas of highlight and shadow may not be large enough to meet the recommended sizes for training areas, but small training areas created over these areas can later be merged into classes which will contain sufficient pixels.
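The arithmetic behind the 10n to 100n guideline, and the idea of merging small training areas later, can be sketched as follows. The pixel counts and area names are hypothetical.

```python
def recommended_pixel_range(n_bands):
    """Return the (minimum, maximum) recommended pixel counts: 10n and 100n."""
    return 10 * n_bands, 100 * n_bands


def meets_minimum(pixel_counts, n_bands):
    """True if the combined pixel count reaches the 10n minimum."""
    minimum, _ = recommended_pixel_range(n_bands)
    return sum(pixel_counts) >= minimum


N_BANDS = 3  # an RGB image, so the guideline is 30 to 300 pixels

# Hypothetical pixel counts for three small training areas drawn over trees.
tree_areas = {"tree highlight": 12, "tree mid-tone": 40, "tree shadow": 14}

# Areas that are below the guideline on their own.
too_small_alone = [
    name for name, count in tree_areas.items()
    if not meets_minimum([count], N_BANDS)
]
# The same areas once merged into a single class.
merged_ok = meets_minimum(tree_areas.values(), N_BANDS)
```

Here the highlight and shadow areas are too small on their own, but the merged class has 66 pixels and clears the 30 pixel minimum.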

4 Use the scatterplots, histograms and statistics

Evaluate your training areas as you create them to ensure they are homogeneous. Each training area should appear as a fairly tight cluster of points in the scatterplots display. Similarly, training areas should have a single peak in the histograms and should have unique statistics. If a training area has multiple clusters on the scatterplot, multiple peaks on the histograms, or shows a lot of overlap in statistics, it isn’t homogeneous enough, which could result in misclassifications. More information is available here on evaluating training samples.
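One rough way to put a number on "overlap in statistics" is to compare the per-band mean and standard deviation of two training areas. The sketch below is an illustration in plain Python, not what the Training Sample Manager computes; the sample pixels and the choice of two standard deviations (k = 2) are assumptions.

```python
from statistics import mean, pstdev


def band_stats(samples):
    """samples: list of (r, g, b) pixels. Returns [(mean, stdev), ...] per band."""
    return [(mean(band), pstdev(band)) for band in zip(*samples)]


def overlaps_in_every_band(stats_a, stats_b, k=2.0):
    """True if the intervals mean +/- k * stdev overlap in all bands.

    k = 2 is an arbitrary threshold chosen for this illustration.
    """
    return all(
        abs(mean_a - mean_b) <= k * (sd_a + sd_b)
        for (mean_a, sd_a), (mean_b, sd_b) in zip(stats_a, stats_b)
    )


# Hypothetical RGB samples from three training areas.
fern = [(60, 120, 50), (62, 122, 52), (58, 118, 48)]
fern_shadow = [(57, 117, 47), (59, 119, 49), (61, 121, 51)]
grass = [(110, 180, 90), (112, 182, 92), (108, 178, 88)]

fern_vs_grass = overlaps_in_every_band(band_stats(fern), band_stats(grass))
fern_vs_shadow = overlaps_in_every_band(band_stats(fern), band_stats(fern_shadow))
```

With these made-up samples, fern and grass are cleanly separated, while fern and fern shadow overlap in every band and are candidates for grouping.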

Figure 3: Selecting good training areas in a forested area. Note the dense, distinct clusters on the scatterplots and the shape and distribution of the histograms even though all of these training areas cover the same vegetation type.

5 Group your training areas into distinct classes

If training areas overlap in the histograms and scatterplots, it will be hard for the software to decide how a particular pixel should be classified. Sometimes that’s ok. You would expect “fern” and “fern shadow” training areas to be similar. Although very similar training areas may be grouped into classes, if the training areas are distinct enough on their own, it’s best to leave them separate for now, even if they represent the same vegetation type. Narrower classes will help the computer distinguish them from other vegetation types and they can be grouped together post-classification. You can also use the Interactive Supervised Classification to peek at how the classification will look based on your current training areas.

6 Record and save your training areas

The classified image will contain only the values listed in the ‘Value’ column of the Training Sample Manager. Before you run the classification, I would suggest using the arrows to group related training areas and classes together in the list, and then resetting the class values so they will be ordered the same way in the classification. Depending on the number of classes, I would also record somewhere what vegetation type each value represents. Once you’ve done that, save your training areas as a shapefile or feature class. If you need to modify or re-do your classification later, this will save you a lot of work! Finally, don’t forget to create a signature file. You’ll need it to run a Maximum Likelihood classification.
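To show why the signature matters, here is a simplified sketch of the idea behind a maximum likelihood classifier. It is an illustration under stated assumptions, not the ArcGIS implementation: each band is treated independently (a real signature also stores the covariance between bands), all classes are given equal prior weight, and the class values and sample pixels are hypothetical.

```python
import math
from statistics import mean, pvariance


def make_signature(training):
    """training: {class_value: [(r, g, b), ...]}.

    Returns {class_value: [(mean, variance), ...]} with one pair per band.
    """
    signature = {}
    for value, samples in training.items():
        signature[value] = [
            (mean(band), max(pvariance(band), 1e-6))  # avoid zero variance
            for band in zip(*samples)
        ]
    return signature


def log_likelihood(pixel, stats):
    """Log of the Gaussian likelihood of a pixel, bands treated independently."""
    total = 0.0
    for x, (mu, var) in zip(pixel, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total


def classify(pixel, signature):
    """Return the class value whose signature makes the pixel most likely."""
    return max(signature, key=lambda value: log_likelihood(pixel, signature[value]))


# Hypothetical class values, as they might appear in the 'Value' column.
GRASS, FERN = 1, 2
training = {
    GRASS: [(110, 180, 90), (112, 182, 92), (108, 178, 88)],
    FERN: [(60, 120, 50), (62, 122, 52), (58, 118, 48)],
}
signature = make_signature(training)
result = [classify(p, signature) for p in [(61, 121, 49), (109, 179, 91)]]
```

The output contains only the values that were assigned to the training classes, which is why it pays to order and record those values before running the classification.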

Clarifying the Results

Whether you’ve used unsupervised or supervised classification, the results are likely going to require some finishing touches to finalize the classes and remove excess pixelation.

Figure 4: Results of a Maximum Likelihood classification

Now is the time to regroup your classes into recognizable vegetation categories. Before making the reclassification permanent with the Reclassify tool, try assigning common symbology to the classes you think should be regrouped together. This can be a quick way to make sure the vegetation has been classified in a way that makes sense given your knowledge of the area. Once you are satisfied, go ahead and reclassify the raster cell values permanently.
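The regrouping itself is just a lookup from old class values to new ones. A minimal sketch in plain Python follows; the class values, the vegetation names in the comments and the small grid are hypothetical.

```python
from collections import Counter


def reclassify(grid, remap):
    """Return a new grid with every cell value replaced through remap.

    Raises KeyError if a value in the grid has no entry in remap, so that
    forgotten classes are noticed rather than silently kept.
    """
    return [[remap[value] for value in row] for row in grid]


# Hypothetical remap table: narrow training classes to final categories.
remap = {
    1: 10,  # grass            -> grass
    2: 20,  # fern             -> fern
    3: 20,  # fern shadow      -> fern
    4: 30,  # tree highlight   -> trees
    5: 30,  # tree shadow      -> trees
}

classified = [
    [1, 2, 3],
    [4, 5, 2],
    [1, 1, 3],
]
regrouped = reclassify(classified, remap)
cells_per_class = Counter(value for row in regrouped for value in row)
```

Counting the cells per final class is a quick sanity check that no category has unexpectedly vanished or ballooned after regrouping.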

Small, isolated groups of pixels can be processed away using the Majority Filter tool. I would suggest running it until it is no longer making obvious changes and then using the Boundary Clean tool. Be aware that these tools may shrink, or remove entirely, small but important areas of vegetation.
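A simplified majority filter, repeated until nothing changes, can be sketched as below. This is an illustration of the general idea only: it uses the eight surrounding cells, replaces a cell when one value holds more than half of the neighbours that exist, and ignores the contiguity and edge rules of the actual Majority Filter tool. The grid is made up.

```python
from collections import Counter


def majority_filter(grid):
    """One pass: replace each cell by the value held by most of its neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
            ]
            if not neighbours:
                continue
            value, count = Counter(neighbours).most_common(1)[0]
            if count * 2 > len(neighbours):  # strictly more than half
                out[r][c] = value
    return out


def filter_until_stable(grid, max_passes=20):
    """Repeat the filter until a pass makes no change (or max_passes is hit)."""
    for _ in range(max_passes):
        filtered = majority_filter(grid)
        if filtered == grid:
            break
        grid = filtered
    return grid


# A single stray pixel of class 2 inside a patch of class 1.
speckled = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
cleaned = filter_until_stable(speckled)
```

The stray pixel disappears, which is the desired clean-up, but the same logic would also erase a genuine one-pixel patch of a rare vegetation type.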

Correct Localized Misclassifications

Some classes may require more work to correct localized misclassification. In this case there were several areas where grass, which is only found in the interior of the island, was classified as beach pea, which is only found on the beach. These errors can be manually corrected by converting the raster to a polygon and editing the grid code values associated with the incorrect polygons. This may sound tedious, but it can be done quickly and easily with a few editing tricks, and can dramatically improve the quality of the classification result. The steps are listed below:

Use Select By Attributes to select all the polygons with the grid code that was mistakenly assigned, in this case the beach pea class.
Display the selection in its own layer above the main classification and make this new layer the only selectable layer. I will refer to this layer as the ‘display layer’ for the remainder of these steps.
Open an edit session and, from the display layer, select all the incorrect polygons (i.e., areas of beach pea in the interior of the island) which should be the same new grid value (i.e., grass) by clicking and dragging the selection tool over them. If the new grid value will vary across the image, only select the polygons for one grid value at a time.
Use the Field Calculator to change the grid code value of the selected polygons to their new grid code. Because the display layer is only another view of the classification layer, these updates are automatically made in the classification layer as well. After saving your edits, the display layer can be removed from the Table of Contents and the changes will be preserved in the classification layer.
Once the layers have been cleaned up, the Dissolve tool can be used to merge polygons with the same value together.
Finally, choose representative but distinct colors for each class, and show off your classification!
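The logic of the steps above can be mimicked in plain Python to show why edits made through the display layer also appear in the classification layer: both refer to the same underlying records. The grid codes, polygon ids and the hand-picked selection are all hypothetical, and geometry is left out entirely.

```python
from collections import Counter

GRASS, BEACH_PEA = 1, 3  # hypothetical grid codes

# The classification layer: one record per polygon.
polygons = [
    {"id": 1, "gridcode": GRASS},
    {"id": 2, "gridcode": BEACH_PEA},  # on the beach: correct
    {"id": 3, "gridcode": BEACH_PEA},  # interior: misclassified
    {"id": 4, "gridcode": BEACH_PEA},  # interior: misclassified
]

# Select By Attributes: the display layer is a view of the same records.
display_layer = [p for p in polygons if p["gridcode"] == BEACH_PEA]

# Manual selection: ids of the interior polygons picked by the editor.
interior_ids = {3, 4}
selected = [p for p in display_layer if p["id"] in interior_ids]

# Field Calculator: assign the corrected grid code to the selection.
for polygon in selected:
    polygon["gridcode"] = GRASS

# The original layer reflects the edit; count polygons per grid code.
codes = [p["gridcode"] for p in polygons]
polygons_per_code = Counter(codes)
```

After the edit, three polygons carry the grass code and only the genuine beach polygon remains beach pea; a dissolve would then merge neighbours that share a code.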

Figure 5: Final classified image

Project Conclusions

Visually, the vegetation classification results shown in Figure 5 match what I experienced while on the island this past spring. The spatial accuracy of the results, though, depends almost entirely on earlier stages of the project.

The accuracy of the georeferencing was improved by placing the links on evenly distributed, defined objects and monitoring the impact each link had on the overall RMS value. In those areas without clear features, other techniques were used to increase the accuracy of links placed on the features which did exist.

Mosaicking produced a seamless image, which matters for classification: pixels are grouped based on their RGB values, so changes in lighting across the image would have affected the ability of the software to accurately distinguish between vegetation types. Repositioning the seam in the image also removed the discontinuities by eliminating areas which had been georeferenced with less accuracy.

An unsupervised classification combined with generalization tools produced an outline of the island much more accurate than one traced by hand. The outline, used as a mask to isolate the dry land area of the island, focused the classification on the vegetation – my true area of interest.

Combining all these tools and techniques, I was able to produce a vegetation classification in which I have a high degree of confidence, and which corresponds to my knowledge of the area.