Grow Your Open Data Ecosystem

As the world has become digitized, sheaves of paper and file drawers have been transformed into ephemeral, infinite, and globally accessible databases. The data recorded in information systems, whether generated to comply with regulatory requirements or for specific tasks, has a nominal life cycle of birth, growth, and, finally, obsolescence. While the data never truly dies, it often atrophies in old, unmaintained systems. It is archived and stored for potential reuse later, if anyone remembers where the data is or how the old systems work.

When data is instead accessible, reusable, and open to continuous improvement, it is often of higher quality and has a greater impact on communities. We’ve seen this happen with software over the decades: once stored on fragile punch cards, it is now freely and rapidly shared for reuse and collaboration. The result has been faster, more sophisticated, and higher-quality software innovation.

As Eric S. Raymond wrote in The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary, “given enough eyeballs, all bugs are shallow.” In other words, code that other software engineers can read, use, and fix results in better software with fewer problems. Beyond fixing bugs, making source code publicly available allows it to grow and improve continually, as it has for many key projects that now underpin computer systems around the world.

The City of Washington, DC’s ArcGIS Hub site
The City of Washington, DC, empowers city departments to make appropriate decisions regarding open data by providing clear guidance on what data they can share and what data should remain restricted.

Open data is undergoing a transformation similar to that of software. Databases that were once locked in single-use silos are now available and accessible to anyone. This means that people in other departments, municipalities, businesses, and community groups can immediately reuse the data in their own work, gaining better context for evaluating complex relationships, making important decisions, and measuring program outcomes. It also means that journalists can use the data when researching and reporting on specific issues or trends. And all consumers of the data can provide feedback on data quality issues, corrections that need to be made, and other potential improvements.

The Benefits of Open Data

Organizations often face scrutiny of their policies to ensure that they are equitable, effective, efficient, and evolving. When organizations and programs openly share the data that fuels their decision-making—including data from other organizations that relates to the topic and geography of their work—it encourages independent evaluation. This, in turn, can dramatically improve organizational transparency and, thus, trust.

What’s more, solving complex problems, such as disaster response or the need for affordable housing, depends on developing and maintaining partnerships. These types of relationships are strengthened when information is shared freely across groups. Data that is already available and connected can produce opportunities for better collaboration and action.

All this requires increased, ongoing investment in data maintenance and improvement. That investment can be made with existing infrastructure, and it generates a multitude of benefits. For one, sharing data can reduce an organization’s operating costs by making staff members more efficient at finding data for their work and by encouraging interdepartmental collaboration earlier in a project. It can also support the development and growth of markets by making it easier to compare business metrics with contextual trends and by enhancing data-driven decision-making. By contrast, restricting data sharing through cost-recovery policies, under which potential users must pay for access to public information, is demonstrably regressive.

Sharing data also encourages the development of best practices that help other people understand and appropriately use that data. For example, to ensure that users always have the most up-to-date and accurate data, it’s useful to make data accessible through web links so that it can easily be integrated into apps, websites, and software tools. It’s also advisable to make sure that data can be downloaded in open file formats that work with common tools such as spreadsheets, dashboards, and developer APIs. Users can then be notified automatically when the data is updated so that they can quickly download new versions.
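One way to support automatic update notices is to publish each release in an open format such as CSV and compare a content fingerprint against the previous release. The Python sketch that follows illustrates that idea under stated assumptions: the function names (`export_csv`, `fingerprint`, `should_notify`) and the sample park records are hypothetical, and a real data portal would also need somewhere to store the last fingerprint and a way to deliver the notices, both of which are left out here.

```python
from __future__ import annotations

import csv
import hashlib
import io


def export_csv(rows: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Serialize records to CSV text, an open format that spreadsheets can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest that changes whenever the published content changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_notify(previous_fingerprint: str | None, current_text: str) -> bool:
    """Notify subscribers only when the content differs from the last release.

    A previous_fingerprint of None means nothing has been published yet.
    """
    return previous_fingerprint != fingerprint(current_text)


# Hypothetical example records; the field names are illustrative only.
parks = [
    {"park_id": 1, "name": "Riverside Park", "acres": 12.5},
    {"park_id": 2, "name": "Oak Hill", "acres": 4.0},
]
fields = ["park_id", "name", "acres"]

release_1 = export_csv(parks, fields)
last_seen = fingerprint(release_1)

# Re-exporting unchanged data yields the same fingerprint, so no notice is sent.
print(should_notify(last_seen, export_csv(parks, fields)))  # False

# Adding a record changes the CSV text, so subscribers would be notified.
parks.append({"park_id": 3, "name": "Mill Creek", "acres": 7.25})
print(should_notify(last_seen, export_csv(parks, fields)))  # True
```

Hashing the exported content, rather than relying only on a timestamp, means a routine re-export of unchanged data does not trigger a notice.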

In addition, the metadata should be complete and readable, which improves discovery in search engines such as Google, the most common way people look for and find information. Providing a data dictionary that describes each attribute, along with information about the data’s origin, usage, and relationships to other data, ensures that users know what the data means and how best to use it in analyses. Including contact information with the data also makes it possible for people to ask questions and provide useful feedback that can, ultimately, improve it.
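A data dictionary can be kept as structured records alongside the dataset so that it is both readable by people and checkable by software. The following is a minimal Python sketch, assuming a hypothetical schema (`DatasetMetadata`, `FieldEntry`) rather than any particular metadata standard or ArcGIS Hub format. It bundles origin, usage notes, related datasets, and a contact address with field descriptions, and it flags published columns that the dictionary does not yet document.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldEntry:
    """One attribute (column) described in the data dictionary."""
    name: str
    data_type: str
    description: str


@dataclass
class DatasetMetadata:
    """Hypothetical dataset-level metadata plus a field-by-field data dictionary."""
    title: str
    source: str                  # where the data originates
    usage_notes: str             # how the data should (and should not) be used
    related_datasets: list[str]  # other datasets this one connects to
    contact_email: str           # who to ask questions or send feedback to
    fields: list[FieldEntry] = field(default_factory=list)

    def undocumented_columns(self, columns: list[str]) -> list[str]:
        """Return published columns that have no entry in the data dictionary."""
        documented = {entry.name for entry in self.fields}
        return [column for column in columns if column not in documented]

    def summary(self) -> str:
        """Render a short, human-readable description of the dataset and its fields."""
        lines = [f"{self.title} (source: {self.source}; contact: {self.contact_email})"]
        for entry in self.fields:
            lines.append(f"- {entry.name} [{entry.data_type}]: {entry.description}")
        return "\n".join(lines)


# Hypothetical example; every value here is illustrative.
parks_meta = DatasetMetadata(
    title="Parks",
    source="Hypothetical county parks department asset inventory",
    usage_notes="Boundaries are approximate; not for legal surveys.",
    related_datasets=["Trails"],
    contact_email="opendata@example.org",
    fields=[
        FieldEntry("park_id", "integer", "Unique identifier for each park"),
        FieldEntry("name", "text", "Official park name"),
    ],
)

print(parks_meta.undocumented_columns(["park_id", "name", "acres"]))  # ['acres']
```

Running a check like `undocumented_columns` before each release is one simple way to keep the dictionary complete as new columns are added.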

Getting Started with Open Data

Achieving the ideals of data sharing can seem daunting. Many organizations face policy restrictions on data sharing or are unsure about how to start sharing data. Fortunately, over the past few decades, many successful pioneers have developed useful strategies for iteratively growing their own healthy open data programs.

The Wake County Open Data site
Wake County, North Carolina, started its open data program by releasing a few datasets at a time and highlighting various ways to reuse them, such as integrating the data into popular consumer apps.

For instance, the City of Washington, DC, has a spectrum of data, from open data to private and license-restricted data, and needed a comprehensive policy to cover a wide range of data sharing options. The city published a draft data policy that anyone could read and comment on. Over several weeks, hundreds of respondents posted specific comments, recommendations, and requests to change various aspects of the policy. Based on that feedback, the city created a structured taxonomy that runs from Level 0 data, which is completely open and can be reused by anyone, to Level 4 data, which is restricted and confidential. This simple structure gives city departments clear guidance on open data and empowers them to make explicit and appropriate decisions about what data to share.

Many organizations are unsure where to begin with open data. Wake County, North Carolina, started small by releasing a few datasets each month. This phased approach allowed county employees to focus their limited time on improving data quality and strengthening their data sharing processes. Each release came with a focused story that highlighted the purpose of the data and opportunities for reusing it, including integrating open data into popular consumer apps like Yelp and Waze. Sometimes the best place for open data is inside the apps people already use.

Regional organizations that connect different government administrations are another strong driver of open data. While data infrastructure and services are usually managed within specific administrative boundaries, the data itself inevitably crosses those boundaries, and acting on it requires coordination with nearby municipalities. Regional data sharing programs, like NC OneMap in North Carolina, define common data standards and priorities that support data sharing among local governments across the state.

Open data has become a widespread movement that continues to grow. National governments are leading the way with modern policies and programs that encourage local municipalities to share their data. Commercial companies are sharing data that accelerates innovation in health, energy, and transportation, and nonprofit organizations are sharing data that validates their important work and builds trust among their partners and donors. Together, these organizations around the world are developing a healthy ecosystem of data that overlaps and interconnects. But there are still significant gaps in available data, so regressive or restrictive policies need to evolve to provide open, free, and well-maintained data for all. Doing so is an effective way to better support local businesses and constituents.

It doesn’t take much to share more data. Start small with focused open data releases that support specific priorities and issues. Find a champion in your organization who can lead this effort and demonstrate the outcomes of an open data program. Build partnerships both within and outside your organization, including with data providers, analysts, and users, to grow the reach and success of your program. Then join the thriving open data ecosystem.

About the author

Andrew Turner

Andrew Turner is the director and chief technology officer of Esri’s Research and Development Center in Washington, DC, where he leads the development of ArcGIS Hub and supports Esri’s strategies for open standards and open-source data. Turner joined Esri in 2012 from GeoIQ and has been developing open data sharing systems for more than 20 years.