Making Effective Use of Geostatistics
By Witold Fraczek, Esri Application Prototype Lab, and Andrzej Bytnerowicz, USDA Forest Service
Applying GIS tools for studying the geographic distribution of regionalized variables
Geostatistics is a branch of science that applies statistical methods to spatial interpolation. Although geostatistics was developed independently of GIS, it has become an integral part of GIS. Without a computer and GIS mapping ability, it wouldn't be known outside a small group of geostatistical gurus. Just as one does not have to be a GIS expert to use GIS, one doesn't need to be a geostatistician to make effective use of geostatistics. Meteorologists, soil scientists, geologists, oceanographers, foresters, and other scientists can benefit from using appropriate geostatistical methods.
The functionality of geostatistics is applicable when the studied phenomena are regionalized variables, which fall between random and deterministic variables. The geographic distribution of a regionalized variable cannot be described by a deterministic mathematical function, yet the distribution of its intensity is not random either. Most natural phenomena that take place in the atmosphere, seawater, or soil meet the criteria of this category. The distribution of air temperature, the salinity of an ocean, soil moisture, and the concentration of an ore deposit in a geologic layer are examples of regionalized variables. Crop yields and air pollution, although they are not purely natural phenomena, may also be subjects for geostatistical analysis.
Because it is rarely practical, or even possible, to make exhaustive real-world observations, sampling is used for these analyses. The ultimate goal of sampling is to obtain a good representation of the phenomenon under study. Spatial sampling is an important consideration in environmental studies because the sample configuration influences the reliability, effectiveness, and cost of a survey. Intensive sampling gives a precise picture of the spatial variability of a phenomenon but is expensive; sparse sampling is less expensive but may miss significant spatial features. Practical sampling constraints and the availability of existing information should inform the development of a sampling scheme.
To ensure a high level of confidence in the results of any geostatistical interpolation, it is important to have a sufficient number of well-distributed sampling stations in the monitoring network. How many stations are sufficient, and how can their distribution be optimized? GIS, and particularly the ArcGIS Geostatistical Analyst extension, can help answer these questions.
One technique used to design an optimal sampling network for a regionalized variable, such as air pollution, is sequential sampling. Sequential sampling draws on extended knowledge of the area to be sampled and expertise in the factors controlling the distribution of the regionalized variable. Familiarity with the terrain and the phenomenon should inform the initial placement of sites in the sampling network. The results of this preliminary study are then used to optimize the scheme by adding new sampling points both in areas having the lowest reliability and in possible hot spot areas (e.g., areas of maximum concentration, high variability, or uncertain measurements).
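The "add points where reliability is lowest" step of this workflow can be sketched in a few lines of Python. The station coordinates below are invented, and the distance to the nearest existing station stands in as a crude proxy for the kriging standard error surface a real study would use to rank candidate sites:

```python
import numpy as np

# Hypothetical station coordinates (km) for a preliminary monitoring network.
stations = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 5.0]])

# Candidate locations: a regular grid over the study area.
gx, gy = np.meshgrid(np.linspace(0, 6, 25), np.linspace(0, 6, 25))
candidates = np.column_stack([gx.ravel(), gy.ravel()])

def add_stations(stations, candidates, k):
    """Greedily add k stations where the network is least reliable.

    Distance to the nearest existing station is used here as a
    stand-in for a mapped uncertainty surface."""
    net = stations.copy()
    for _ in range(k):
        # Distance from every candidate to every station in the network.
        d = np.linalg.norm(candidates[:, None, :] - net[None, :, :], axis=2)
        # Pick the candidate farthest from any existing station.
        worst = np.argmax(d.min(axis=1))
        net = np.vstack([net, candidates[worst]])
    return net

new_net = add_stations(stations, candidates, 3)
```

In practice the ranking surface would come from a kriging standard error map rather than raw distance, and hot spot candidates identified by expert knowledge could be appended to the candidate list.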
The kriging interpolator is considered the most sophisticated and accurate way to estimate the intensity of a phenomenon at unmeasured locations. The weights kriging assigns to surrounding measured values depend not only on the distance between the measured points and the prediction location but also on the overall spatial arrangement of the measured points. In addition to generating a prediction, kriging can provide a measure of the error, or uncertainty, of the estimated surface. Because the estimation variances can be mapped, the confidence placed in the estimates can be calculated and its spatial distribution presented on a map to assist in the decision-making process. A prediction standard error map shows the distribution of the square root of the prediction variance, the variation associated with differences between the measured and predicted values. The prediction standard error quantifies the uncertainty of a prediction.
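A minimal ordinary kriging calculation illustrates both points: the weight given to each measured value comes from solving a linear system built from a semivariogram model of the spatial arrangement, and the same solve yields the prediction variance whose square root is the standard error. The 1-D station positions, measured values, and variogram parameters below are invented for illustration:

```python
import numpy as np

# Hypothetical 1-D monitoring stations: position (km) and measured value.
x = np.array([0.0, 2.0, 3.0, 6.0, 8.0])
z = np.array([42.0, 45.0, 47.0, 40.0, 38.0])

def gamma(h, sill=12.0, rng=5.0):
    """Exponential semivariogram model (parameters assumed, not fitted)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def ordinary_krige(x0):
    """Predict the value at x0 and its standard error by ordinary kriging."""
    n = len(x)
    # Kriging system: semivariances between stations, bordered by the
    # Lagrange-multiplier row/column that forces the weights to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.abs(x[:, None] - x[None, :]))
    A[n, n] = 0.0
    # Semivariances between each station and the prediction location.
    b = np.ones(n + 1)
    b[:n] = gamma(np.abs(x - x0))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    prediction = w @ z
    variance = w @ b[:n] + mu   # kriging (prediction) variance
    return prediction, np.sqrt(max(variance, 0.0))

pred, se = ordinary_krige(4.0)  # estimate between two stations
```

Evaluating the standard error over a grid of locations produces the prediction standard error map described above; Geostatistical Analyst performs the equivalent computation, with interactive variogram fitting, behind its wizard interface.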