Many of the benefits GIS provides for decision-making are due to its ability to leverage non-spatial information in a geographic context. This non-spatial information, stored as a series of attributes, provides rich and descriptive characteristics for your features. Such attributes are commonly stored as text values because of the flexibility they provide: they can include letters, numbers, punctuation, or other special characters. Unfortunately, this flexibility also increases the risk of introducing errors into your data.
Data Reviewer’s Regular Expression check helps to simplify and automate quality control in this component of your GIS by ensuring that attribute data meets defined requirements. Here are some examples that illustrate the use of this check.
NENA Globally Unique Identification Number (NGUID)
The National Emergency Number Association’s (NENA) Standard for NG9-1-1 GIS Data Model (NENA-STA-006.1.1-2020) defines the GIS data, formats, requirements, and related information that support NENA Next Generation 9-1-1 services. The NENA Globally Unique ID (NGUID) is a required attribute for all GIS data elements defined in the data model. This includes features such as road centerlines, site or structure address points, and related management boundaries. Values stored in this attribute support the reporting and resolution of errors resulting from quality control processes.
This attribute contains values that combine a locally assigned ID (which can be numeric and/or text) and an Agency Identifier (a domain representing that authority). For example, a road with a locally unique ID of RCL12085303, combined with an Agency Identifier of “mycounty.mystate.us”, would result in an NGUID value of “RCL12085303@mycounty.mystate.us”.
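The composition described above can be sketched in a few lines of Python. This is only an illustration based on the example value: the helper names make_nguid and split_nguid are hypothetical, and the sketch assumes the “@” character is the only separator between the two parts.

```python
from __future__ import annotations


def make_nguid(local_id: str, agency_id: str) -> str:
    """Join a locally assigned ID and an Agency Identifier with '@'."""
    return f"{local_id}@{agency_id}"


def split_nguid(nguid: str) -> tuple[str, str]:
    """Split an NGUID into (local_id, agency_id) at the first '@'."""
    local_id, sep, agency_id = nguid.partition("@")
    # Reject values with no separator or with an empty part on either side.
    if not sep or not local_id or not agency_id:
        raise ValueError(f"not a well-formed NGUID: {nguid!r}")
    return local_id, agency_id


nguid = make_nguid("RCL12085303", "mycounty.mystate.us")
print(nguid)               # RCL12085303@mycounty.mystate.us
print(split_nguid(nguid))  # ('RCL12085303', 'mycounty.mystate.us')
```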
Implementing quality control for attribute data
The following steps outline a workflow for implementing automated quality control for attribute values that must adhere to the NGUID formatting standard. In this example, the Regular Expression check will be used to implement a validation attribute rule to identify existing road centerline features that do not comply with this requirement.
Step 1 – From ArcGIS Pro, in the Catalog pane, select the feature class or standalone table that contains NGUID values requiring validation.
Step 2 – Right-click the data source and click Data Design -> Attribute Rules in the context menu to open the Attribute Rules view for the selected data source.
Step 3 – From the Attribute Rules tab, in the Add Rules group, click Ready to Use Rules to display a gallery of Data Reviewer checks available for use in creating attribute rules for the data source.
Step 4 – From the Ready to Use Rules gallery, Validation group, click the Regular Expression item to create a new validation attribute rule based on the Regular Expression check.
Step 5 – From the check panel, Search Goal parameter, enter the regular expression pattern in the Expression text box for the NGUID field and click Validate to check the syntax of the expression. For the above example, entering the pattern “RCL[0-9]*@mycounty\.mystate\.us” (without quotes) will identify attribute values in the RCL_NGUID field that do not match the expected format.
Step 6 – From the check panel, Details section, enter a descriptive Rule Name for use in error reporting and management, a brief Description of the error to facilitate corrective workflows, a Severity value to communicate the importance of errors from this rule relative to other errors, and optional Tags for requirements tracking.
Step 7 – Press Save to save the new rule to your geodatabase.
Step 8 – Follow the steps outlined here to evaluate your features and identify those that do not comply with the new rule.
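Before saving a rule, it can help to try the pattern from Step 5 against a few sample values outside ArcGIS Pro. The following is a minimal Python sketch that uses the standard re module, not Data Reviewer itself. It assumes the entire value must match the pattern (re.fullmatch), which may differ from how the check applies the expression. The function name find_noncompliant and the sample values are hypothetical.

```python
import re

# Pattern from Step 5. The escaped dots match literal '.' characters only.
NGUID_PATTERN = re.compile(r"RCL[0-9]*@mycounty\.mystate\.us")


def find_noncompliant(values):
    """Return the values that do not match the pattern in full."""
    return [
        v for v in values
        if v is None or NGUID_PATTERN.fullmatch(v) is None
    ]


sample = [
    "RCL12085303@mycounty.mystate.us",  # compliant
    "rcl12085303@mycounty.mystate.us",  # lowercase prefix
    "RCL12085303@mycountyXmystate.us",  # 'X' where a literal '.' is required
    "RCL12085303",                      # Agency Identifier missing
    "RCL@mycounty.mystate.us",          # accepted: [0-9]* allows zero digits
]

for value in find_noncompliant(sample):
    print(value)
```

Note that the last sample value is accepted because the `*` quantifier matches zero or more digits. If a local ID must contain at least one digit, `[0-9]+` is the stricter choice.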
NOTE: To completely address the NGUID requirement, it is also important to use the Unique Field Value check to identify any non-unique values in the dataset.
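The idea behind a uniqueness test can be illustrated with a short Python sketch. It is not the Unique Field Value check; it simply counts values in a list with the standard library, and the function name find_duplicates and the sample values are hypothetical.

```python
from collections import Counter


def find_duplicates(values):
    """Return a dict of each value that occurs more than once and its count."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


nguids = [
    "RCL12085303@mycounty.mystate.us",
    "RCL12085304@mycounty.mystate.us",
    "RCL12085303@mycounty.mystate.us",  # repeats the first value
]

print(find_duplicates(nguids))
```

A value can match the expected pattern and still be a duplicate, which is why format validation and uniqueness validation are separate tests.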
Additional quality requirements for attribute data
Here are some additional use-cases where the Regular Expression check can be used to identify pattern-related errors in text:
There are many online resources available for creating, testing, and sharing regular expression (aka regex) patterns. Many of the above use-cases are based on shared samples used to identify common errors found in textual data.
As you can see from these examples, regular expressions are a powerful technology for identifying pattern-related errors in data. They originated in Unix and are now integrated into many programming languages, where software developers rely on them as a flexible method for manipulating text. By leveraging Data Reviewer’s Regular Expression check, you too can integrate this capability into your quality control workflows.
Check out the resources below to learn more about Data Reviewer.