Special Section
[Figure: three scatterplots titled Strong Positive Relationship, Strong Negative Relationship, and No Relationship. Axis variables shown: Household Units, % College Educated, Pet Owners, Foreclosures, Unemployment Rate.]
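The three panel types in the figure (strong positive, strong negative, no relationship) are often summarized numerically with Pearson's correlation coefficient r, which is near +1, near -1, or near 0 respectively. The sketch below uses NumPy and synthetic data only; the helper name `pearson_r` and all generated values are hypothetical and do not come from the figure.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd = x - x.mean()  # deviations from the mean of x
    yd = y - y.mean()  # deviations from the mean of y
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

# Synthetic data: one explanatory variable and three different responses
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=200)
noise = rng.normal(0, 5, size=200)

positive = 2.0 * x + noise                # y rises as x rises
negative = -2.0 * x + noise               # y falls as x rises
unrelated = rng.normal(50, 20, size=200)  # y generated independently of x

for label, y in [("positive", positive), ("negative", negative), ("none", unrelated)]:
    print(f"{label:>8}: r = {pearson_r(x, y):+.2f}")
```

The positive and negative cases should print values close to +1 and -1, while the unrelated case should print a value near 0; the exact digits depend on the random seed.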
These graphs show relationships between two variables when that relationship is positive, when it is negative, and when no relationship is indicated.

…the extent that changes in one or more variables jointly affect changes in another. For example, understanding the key characteristics of the habitat of an endangered bird species (e.g., precipitation, food sources, vegetation, or predators) may help in designing legislation that protects that species more effectively.

Modeling a phenomenon to predict its values at other places or other times is another valid and valuable use of regression analysis. In this case, the basic objective is to build a prediction model that is consistent and accurate. For example, you could use such a model to predict rainfall in places where there are no gauges (such as peaks or valleys), based on a set of variables that explain observed precipitation values. Regression may be used in cases where interpolation is not effective because of insufficient sampling.

You can also use regression analysis to test hypotheses. Suppose you are modeling residential crime to better understand it and implement policy to prevent it. At the outset, you probably have questions or hypotheses you want to test. Some of these might include:

- Will there be a positive relationship between vandalism incidents and residential burglary (i.e., the Broken Windows Theory)?
- Is there a relationship between illegal drug use and burglary? Might drug addicts steal to support their habits?
- Are burglars predatory? Might there be more incidents in residential neighborhoods with higher proportions of elderly or female-headed households?
- Are people at greater risk for burglary if they live in a rich or a poor neighborhood?

Using regression analysis, you can explore these relationships to answer these questions. Building a regression model is an iterative process that involves finding effective independent variables to explain the process
Continued on page 42
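To make the hypothesis-testing idea concrete, here is a minimal sketch of testing the first question in the list (a positive relationship between vandalism and burglary) with ordinary least squares and a t-statistic on the slope. It assumes NumPy only; the function `ols_with_t`, the neighborhood counts, and the "true" slope of 0.8 are all hypothetical and synthetic. A production analysis would normally use a dedicated statistics package and also examine residuals and spatial autocorrelation.

```python
import numpy as np

def ols_with_t(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares (hypothetical helper).

    Returns (coefficients, t_statistics); index 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones for the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, k = X.shape
    sigma2 = residuals @ residuals / (n - k)   # unbiased estimate of the error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))                 # standard error of each coefficient
    return beta, beta / se

# Synthetic neighborhoods: burglary generated to rise with vandalism
rng = np.random.default_rng(42)
n = 150
vandalism = rng.poisson(20, size=n).astype(float)
burglary = 5.0 + 0.8 * vandalism + rng.normal(0, 3, size=n)

beta, t = ols_with_t(vandalism, burglary)
print(f"slope = {beta[1]:.2f}, t = {t[1]:.1f}")
# A positive slope with |t| well above about 2 suggests the relationship
# is unlikely to be zero at roughly the 5% significance level.
```

The sign of the slope answers the direction part of the hypothesis, and the t-statistic indicates whether that slope is distinguishable from zero given the noise in the data.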
Dependent variable (y) is the variable representing the process you are trying to predict or understand (e.g., residential burglary, foreclosure, rainfall). In the regression equation, it appears on the left side of the equal sign. While you can use regression to predict the dependent variable, you always start with a set of known y values and use these to build (or to calibrate) the regression model. The known y values are often referred to as observed values.

Independent/Explanatory variables (X) are used to model or predict the dependent variable values. In the regression equation, they appear on the right side of the equal sign. We say that the dependent variable is a function of the independent (or explanatory) variables. If you are interested in predicting annual purchases for a proposed store, you might include in your model explanatory variables representing the number of potential customers, distance to competition, store visibility, and local spending patterns.
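For reference, the standard textbook form of a multiple linear regression equation places the dependent variable y on the left of the equal sign and the explanatory variables X₁ through Xₙ on the right. Here β₀ is the intercept, β₁ through βₙ are the regression coefficients, and ε is a random error (residual) term; ε is a standard part of the model but is an addition not discussed in this section.

```latex
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon
```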
www.esri.com
Regression coefficients (β) are computed by the regression tool. They are values, one for each explanatory variable, that represent the strength and type of relationship the explanatory variable has to the dependent variable. Suppose you are modeling fire frequency as a function of solar radiation, vegetation, precipitation, and aspect. You might expect a positive relationship between fire frequency and solar radiation (i.e., the more sun, the more frequently fires occur). When the relationship is positive, the sign of the associated coefficient is also positive. You might expect a negative relationship between fire frequency and precipitation (places with more rain have fewer fires). Coefficients for negative relationships have negative signs. When the relationship is a strong one, the coefficient is large in magnitude; weak relationships are associated with coefficients near zero. Because a coefficient's size also depends on the units of its variable, magnitudes are only directly comparable across variables measured on similar scales. β₀ is the regression intercept. It represents the expected value of the dependent variable if all of the independent variables are zero.
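The fire-frequency example can be sketched with synthetic data to show how coefficient signs and the intercept come out of a least-squares fit. This assumes NumPy only; the variables (`solar`, `precip`, `fires`), their ranges, and the "true" coefficients 0.05 and -0.01 are hypothetical values chosen for illustration, and vegetation and aspect are left out to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
solar = rng.uniform(100, 300, size=n)    # hypothetical solar radiation values
precip = rng.uniform(200, 1200, size=n)  # hypothetical annual precipitation values
# Synthetic process: more sun -> more fires, more rain -> fewer fires
fires = 10.0 + 0.05 * solar - 0.01 * precip + rng.normal(0, 1.0, size=n)

# Design matrix: the column of ones lets the fit estimate the intercept b0
X = np.column_stack([np.ones(n), solar, precip])
beta, *_ = np.linalg.lstsq(X, fires, rcond=None)
b0, b_solar, b_precip = beta

print(f"intercept b0       = {b0:.2f}")
print(f"solar coefficient  = {b_solar:+.4f}")   # expected to be positive
print(f"precip coefficient = {b_precip:+.4f}")  # expected to be negative

# The intercept is the model's prediction when every explanatory variable is zero
pred_at_zero = np.array([1.0, 0.0, 0.0]) @ beta
print(f"prediction at all zeros = {pred_at_zero:.2f}")
```

In this synthetic setup the precipitation coefficient is smaller in magnitude than the solar coefficient, yet across each variable's range both shift predicted fire frequency by a similar amount, because precipitation is measured on a larger numeric scale. Also note that zero solar radiation and zero precipitation lie outside the sampled ranges, so the intercept here is an extrapolation.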
Continued on page 42
ArcUser Spring 2009 41