ArcUser Magazine Spring 2009

Special Section

to occur when spatial autocorrelation is removed from the dependent and explanatory variables. This is the approach of a traditional statistician for dealing with spatial autocorrelation. It is only appropriate if spatial autocorrelation is the result of data redundancy (i.e., the sampling scheme is too fine).
• Isolate the spatial and nonspatial components of each input variable using a spatial filtering regression method. Space is removed from each variable, but then it is put back into the regression model as a new variable to account for spatial effects/spatial structure. Spatial filtering regression methods will be added to ArcGIS in a future release.
• Incorporate spatial autocorrelation into the regression model using spatial econometric regression methods. Econometric spatial regression methods will be added to ArcGIS in a future release.

Regional Variation

Global models, such as OLS regression, create equations that best describe the overall data relationships in a study area. When those relationships are consistent across the study area, the OLS regression equation models those relationships well. However, when those relationships behave differently in different parts of the study area, the regression equation produces more of an average of the mix of relationships present. When those relationships represent two extremes, the global average will not model either extreme well. When your explanatory variables exhibit nonstationary relationships (i.e., regional variation), global models tend to fall apart unless robust methods are used to compute regression results. Ideally, you will be able to identify a full set of explanatory variables to capture the regional variation inherent in your dependent variable. However, if you cannot identify all these spatial variables, you will again notice statistically significant spatial autocorrelation in your model residuals and/or lower-than-expected R-squared values.
(R-squared values are a measure of model performance. Values vary from 0.0 to 1.0, with higher values being preferable.) Unfortunately, you cannot trust your regression results until this is remedied. There are at least four ways to deal with regional variation in OLS regression models:
• Include a variable in the model that explains the regional variation. If you see that your model is always overpredicting in the north and underpredicting in the south, for example, add a regional variable set to 1 for northern features and 0 for southern features.
• Use methods that incorporate regional variation into the regression model such as GWR.
• Consult robust regression standard errors and probabilities to determine if variable coefficients are statistically significant. In the ArcGIS Desktop Help Online, see the topic "Interpreting OLS regression results." GWR is still the recommended tool.
• Redefine/reduce the size of the study area so processes within it are all stationary and no longer exhibit regional variation.

Learning More about Using These Tools

This article provides an introduction to the OLS and GWR tools that were released in ArcGIS Desktop 9.3. Esri provides many resources for understanding how to intelligently use regression analysis and other spatial statistics tools. A great place to start learning about spatial statistics in general, and these tools specifically, is the ArcGIS Desktop Web help available through the ArcGIS Desktop and Geoprocessing Resource Centers (resources.arcgis.com) as well as the knowledge bases, communities, and blogs on those sites. Several training courses offered by Esri cover spatial statistics. Understanding Spatial Statistics in ArcGIS 9, a training seminar, is available at no charge from www.esri.com/training. Advanced Analysis with ArcGIS, an instructor-led course, includes an introduction to the Spatial Statistics toolbox, analyzing patterns, and measuring geographic distributions.
A book from Esri Press, The Esri Guide to GIS Analysis, Volume 2: Spatial Measurements and Statistics, by Andy Mitchell, explains how these analyses are performed and used effectively.

[Figure: observed versus estimated values.] R-squared is a measure of model performance, summarizing how well the estimated y values match the observed y values.

[Figure: maps of observed values, predicted values, and residuals.] Residuals are the unexplained portion of the dependent variable, represented in the regression equation as the random error term (ε).

Known values for the dependent variable are used to build and calibrate the regression model. Using known values for the dependent variable (y) and known values for all of the explanatory variables (the Xs), the regression tool constructs an equation that will predict those known y values as well as possible. The predicted values will rarely match the observed values exactly. The differences between the observed y values and the predicted y values are called the residuals. The magnitude of the residuals from a regression equation is one measure of model fit. Large residuals indicate poor model fit.

[Figure: Regression residuals. Residual = Observed - Predicted.]
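The definitions of residuals and R-squared, and the suggestion to add a regional variable set to 1 for northern features and 0 for southern features, can be illustrated with a minimal sketch. It uses NumPy's least-squares solver as a stand-in for the ArcGIS OLS tool, not the tool itself. The data are synthetic, and every name in it (`fit_ols`, `r_squared`, `north`, the coefficients used to generate `y`) is hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, predicted y)."""
    design = np.column_stack([np.ones(len(y)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs, design @ coeffs

def r_squared(observed, predicted):
    """1 - (sum of squared residuals / total sum of squares about the mean)."""
    residuals = observed - predicted  # Residual = Observed - Predicted
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic study area (assumption): the same slope everywhere, but
# northern features have systematically lower y values than southern ones.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
north = (np.arange(n) < n // 2).astype(float)  # 1 = northern feature, 0 = southern
y = 50 + 5 * x - 30 * north + rng.normal(0, 2, n)

# Global model that ignores the regional difference.
_, pred_global = fit_ols(x.reshape(-1, 1), y)
resid_global = y - pred_global

# Same model with the regional indicator added as an explanatory variable.
_, pred_regional = fit_ols(np.column_stack([x, north]), y)
resid_regional = y - pred_regional

r2_global = r_squared(y, pred_global)
r2_regional = r_squared(y, pred_regional)

print("Mean residual, north (global model):", resid_global[north == 1].mean())
print("Mean residual, south (global model):", resid_global[north == 0].mean())
print("R-squared without regional variable:", r2_global)
print("R-squared with regional variable:   ", r2_regional)
```

In this sketch the global model overpredicts in the north (negative mean residual) and underpredicts in the south (positive mean residual). Adding the regional indicator removes that pattern and raises R-squared. Real data rarely split this cleanly, which is one reason the article recommends GWR when relationships vary across the study area.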