Where do statistics, spatial statistics, and geostatistics fit in GIS projects? Dr. Lauren Scott, a product engineer on Esri's geoprocessing team and an expert in the use of statistics in a geospatial context, answers that question and others in an interview conducted by Matt Artz, Esri's GIS and science marketing manager and editor of the GISandScience.com blog.
At Esri, Scott is responsible for software support, education, documentation, and development of spatial statistics tools in ArcGIS. She received her Ph.D. in 1999 from the Joint Doctoral Program at San Diego State University and the University of California, Santa Barbara. She holds an M.A. and a B.A. in geography from California State University, Fullerton.
Artz: How do traditional statistics, spatial statistics, and geostatistics differ from each other?
Scott: Traditional or non-spatial statistics are typically used in two different ways. In the first case, we have a large set of data values that we want to understand, and we can use descriptive statistics to try to summarize them. In the second case, we may have a set of samples and we want to know how reflective those samples are of the broader population.
Artz: Where do spatial statistics come into play?
Scott: Spatial statistics were designed specifically for use with spatial data, that is, with geographic data. These methods actually use space (area, length, proximity, direction, orientation, or some notion of how the features in a dataset interact with each other) right in the mathematics. That's really what makes spatial statistics different from traditional statistical methods.
Artz: Are there different types of spatial statistics?
Scott: Yes, there are many different types. There are descriptive spatial statistics similar to descriptive traditional statistics. For example, if we have lots of points on the map, we might want to know where the center of those points is located. (The equivalent traditional statistic would involve computing the mean for a set of data values.) We might also want to know how spread out those points are around the center. (This is similar to computing the standard deviation for a set of values.)
Other statistical methods involve spatial pattern analysis: we try to identify whether there is any structure in the data we're looking at. For example, are features clustered? Are they dispersed? Are high values all found together? Are there "hot spots" in the data? Spatial pattern analysis tools can help us to identify anomalous or unusual spending patterns, find unexpected areas with high disease rates, crime, or fire incidents, or track diffusion of some environmental contaminant. There are really lots of applications.
Then there are spatial statistics concerned with identifying and measuring spatial relationships. Imagine we are looking at a hot spot map for 911 calls. We might be curious about why we are seeing so many calls, or hot spots, in certain locations. We can use regression and spatial regression analysis to examine relationships and to identify the factors promoting the spatial pattern we're observing: factors that would help us explain why 911 rates are so high.
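Editor's illustration: the descriptive measures mentioned earlier in this answer (the center of a set of points and how spread out the points are around it) can be sketched in a few lines of Python. This is a minimal sketch, not the ArcGIS implementation. The function names `mean_center` and `standard_distance` and the sample coordinates are assumptions made up for the example, and the points are assumed to be in planar (projected) units.

```python
import math

def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center.

    Plays the role for point locations that the standard deviation
    plays for a list of values.
    """
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

# Hypothetical sample: the four corners of a 4 x 4 square.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
center = mean_center(points)
spread = standard_distance(points)
print(center, spread)
```

For the four corners of a square, the center lands in the middle of the square and the spread equals the distance from the middle to any corner.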
Artz: And how would you define geostatistics?
Scott: Geostatistics are a type of spatial statistics. Kriging, for example, is a very powerful geostatistical technique that goes beyond interpolation, looking not only at nearby features to predict values where you don't have sample data, but actually utilizing spatial relationships to give you stronger, more accurate predictions.
Traditionally, geostatistics have been used to analyze geologic and environmental data (for example, rainfall or elevation), the goal being to create a surface from sampled data points. These methods are widely used in the petroleum and mining industries. But geostatistics are ideal for analyzing and predicting the values associated with nearly any kind of spatially continuous phenomenon.
Artz: How has Esri addressed geostatistics and spatial statistics in its product offerings?
Scott: Many people have probably heard of the ArcGIS Geostatistical Analyst extension, a specialized set of geostatistical tools. It's most useful if you're working with sample data taken from a continuous phenomenon such as rainfall, temperature, geology, or soils and your goal is to create a surface: a probability surface, a prediction surface, or an error surface. However, as the product has been enhanced over the years, its capabilities now extend beyond creating surfaces and the tools are valuable for a large variety of applications.
All ArcGIS users also get the Spatial Statistics Toolbox with tools for analyzing spatial distributions, patterns, processes, and relationships as part of the core software at all license levels. These statistical tools let you do a number of things, including determining central tendency or identifying the overarching directional trend, identifying hot and cold spots or spatial outliers, assessing overall patterns of clustering or dispersion, and modeling spatial relationships. I'm so happy with how many people now use these tools! When I first started developing the Spatial Statistics Toolbox as a set of sample scripts, I didn't really envision how successful they would become.
Artz: Are there other statistical tools that users can leverage inside ArcGIS?
Scott: Certainly. Esri Business Analyst has statistical methods for identifying market share, service areas, sales territories, and potential customers. It also comes with lots of data to use with those methods. The ArcGIS Spatial Analyst extension includes statistical methods to help classify remote sensing data. So statistical tools are found throughout the ArcGIS family of products. And the geoprocessing framework in ArcGIS is also very extensible, so it's pretty easy to connect to traditional statistical packages. You can also create your own custom tools; these custom tools work just like any other out-of-the-box geoprocessing tool in ArcToolbox.
For people who already use SAS software, both SAS and Esri provide a product called the SAS Bridge, which makes it easy to work in both software environments at the same time. We also have some sample scripts available for people to download from the Geoprocessing Resource Center for using R, an open source statistical package, within the ArcGIS framework.
Artz: Why should people consider using spatial statistics?
Scott: When we analyze our data outside of their spatial context, when we remove space and time from our data, it's like we're only getting half the story. Things happen in space and time, and if we ignore that, our analysis is going to be incomplete. This is an important difference between traditional statistics and spatial statistics: traditional statistics often make the assumption that data are free of something called spatial autocorrelation.
Artz: What is spatial autocorrelation?
Scott: It's a big word, but it's a very simple concept: spatial autocorrelation just means that there is spatial structure in your data. That structure might be clustering, or some kind of dispersion, but in any case, the distribution of your features, or of the data values associated with your features, is not random. Jobs, houses, manufacturing, shopping opportunities... these are not randomly sprinkled across the landscape; they cluster together into cities and districts and land-use zones. Spatially autocorrelated data violate the assumptions of some traditional statistical methods, and so spatial autocorrelation is often seen as a nuisance by traditional statisticians.
GIS analysts and spatial statisticians, however, get excited when they see spatial autocorrelation in their data, when they observe clustering in the landscape, because it's evidence that underlying spatial processes are at work. And that's exciting! Something out there is causing this clustering or structure, is promoting different types of relationships and spatial patterns; often understanding that "something" is what we are most interested in. Why are people persistently dying at a younger age in this part of the country? What might be the factors explaining why kids in this school district consistently turn in high test scores?
Spatial processes are often invisible, but by using tools in the Spatial Statistics Toolbox to measure the strength and scale of their outcome (spatial clustering or dispersion, hot spots, or spatial outliers), we learn more about them and we get a much better understanding of our data.
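Editor's illustration: one common way to put a number on spatial autocorrelation is the global Moran's I statistic, which is positive when similar values sit next to each other and negative when neighboring values tend to differ. The sketch below is a plain-Python illustration under stated assumptions, not the ArcGIS tool. The function name `morans_i`, the four-cell strip, the binary "shares an edge" weights, and the data values are all invented for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a weights dict.

    `weights` maps (i, j) index pairs to a spatial weight w_ij;
    pairs that are not listed are treated as weight 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights.values())  # sum of all weights
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    total_sq = sum(d * d for d in dev)
    return (n / s0) * (cross / total_sq)

# Hypothetical study area: four cells in a row; neighbors share an edge.
pairs = [(0, 1), (1, 2), (2, 3)]
weights = {}
for i, j in pairs:
    weights[(i, j)] = 1.0  # i is a neighbor of j
    weights[(j, i)] = 1.0  # and j is a neighbor of i

clustered = [1.0, 1.0, 5.0, 5.0]    # low values together, high values together
alternating = [1.0, 5.0, 1.0, 5.0]  # every neighbor differs

i_clustered = morans_i(clustered, weights)
i_alternating = morans_i(alternating, weights)
print(i_clustered, i_alternating)
```

The clustered strip yields a positive index and the alternating strip a negative one. A real analysis would also test whether the index differs significantly from what a random arrangement would produce, which this sketch omits.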
Artz: You talk to a lot of GIS people about statistics. What do you think is most often misunderstood about spatial statistics?
Scott: In the GIS community, the most common misconception is probably just that spatial statistics are hard! People hear "statistics" and they immediately have bad memories of a class they took in high school, and they just shut down. And I think that's too bad, because, to me, while traditional statistics are interesting, spatial statistics are really fascinating! And they aren't that difficult. Some spatial statistics reflect very simple concepts, but still they can be used in very powerful ways.
Artz: Can you give me an example of a statistical tool that's simple, yet powerful?
Scott: The simplest tool in the Spatial Statistics Toolbox is the mean center tool. It works by taking all your x-coordinates, and computing the average. It then takes all your y-coordinates, and computes the average for those. The mean center is that average x and y coordinate location. How much more simple can we get than that? But you can use this tool in powerful ways. For example, we looked at population data by county for California over the last 100 years. We were interested in finding the population center and in seeing if it changed over time, so we computed a weighted mean center. In the early part of the century, the population center was near San Francisco, a reflection of the growing banking industry there. Each decade the population center moved south, at first very quickly, reflecting growth in southern California associated with the oil industry, with Hollywood, aerospace, and everything else going on there. The southward shift in the population center slowed down, however, toward the end of the century.
The simplest tool in the Spatial Statistics Toolbox allows us to visualize a complex spatial trend; how quickly the mean center moves, and where it moves, provides interesting information about the spatial processes promoting this southern shift in population.
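Editor's illustration: the weighted mean center described in this answer is the same average of x and y coordinates, with each location counted in proportion to a weight such as population. The sketch below assumes made-up data: the three centroids, the population figures, and the function name `weighted_mean_center` are hypothetical and are not the California county data Scott refers to. Larger y is taken to mean farther north.

```python
def weighted_mean_center(points, weights):
    """Weighted average x and y of (x, y) points, e.g. weighted by population."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

# Hypothetical county centroids in arbitrary planar units (north = larger y).
centroids = [(0.0, 10.0), (2.0, 6.0), (4.0, 0.0)]

# Hypothetical populations for the same three counties, by census year.
populations_by_year = {
    1900: [300, 100, 100],
    1950: [300, 200, 500],
    2000: [300, 300, 900],
}

centers = {
    year: weighted_mean_center(centroids, pops)
    for year, pops in populations_by_year.items()
}
for year in sorted(centers):
    print(year, centers[year])
```

With these invented numbers the center's y coordinate falls from one year to the next, and falls less in the second interval than in the first, which mimics the kind of slowing southward drift described above.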
Artz: But some of the tools are not as straightforward as mean center?
Scott: True. Most GIS tools are fairly straightforward; you just fill out the parameters and go. For some of the spatial statistics tools, however, you do have to think a little bit more about spatial relationships, the scale of your analysis, study area boundaries, and so on. But we try very hard to include good strategies in the ArcGIS documentation that explain the proper use of the tools and help you decide on the right parameters for your particular analysis.
Artz: Where can people learn more about using statistics in their GIS projects?
Scott: In the book The Esri Guide to GIS Analysis, Volume 2, by Andy Mitchell, every chapter corresponds to a tool in the Spatial Statistics Toolbox. This is a great resource for people who are starting with little or no knowledge of spatial statistics. We also have some free webinars and tutorials available through the Esri Virtual Campus and the ArcGIS Geoprocessing Resource Center. Your blog GISandScience.com contains quite a few resources for learning about spatial statistics and spatial analysis in a more general sense.