The Mapping Challenges of Big Data
The Relevance of Cartography: A Cartographer's Perspective
A column by Menno-Jan Kraak, President of the International Cartographic Association
Big data is everywhere. At least, that is what we're told. But has anyone ever fact-checked that statement to see if it is true? Is big data really everywhere, or is it just hype?
To begin to answer that question, we need to ask a couple more. First, what is big data? And second, is what we determine to be big data really ubiquitous?
As is often the case, answering these questions is a matter of definition and context. Even then, there can be multiple perspectives.
Big Data Is…
So what is big data? From a mapping viewpoint, I can think of all kinds of very big datasets.
Imagine, for example, a global dataset of 2.5-meter contour lines. You might wonder at first why someone would have such dense contour lines for such a large area. Well, in the Netherlands, we have them on our 1:25,000 topographic maps to help defend land located below sea level. So they matter.
If someone has this global dataset and puts it on your desk, though, you might have some trouble processing it because of its size. But the data is not actually complex because the contour lines never cross each other. It's just a very large amount of data.
Now imagine receiving a collection of five years' worth of georeferenced social media posts taken from Twitter and Facebook. Like the contour lines, there is a lot of data. But it is also more complex because the content is varied, with photos, text, and videos about diverse aspects of users' daily lives. The geography might also be fuzzy because of the subjective nature of user-generated geotagging. (If a woman tags her post as "New York City," for instance, which borough is she in?) This data will clearly be less coherent than the contours, which suddenly look easy.
But the lingering question remains, are both sets of data considered big data?
The Five Vs
To answer that, let's look at some definitions. Big data can be characterized by five words that start with V.
The first is volume: a considerable quantity. Both of our datasets are sizable, to be sure, though it is prudent to keep in mind that proportions are defined by context.
The second word is variety: a collection of heterogeneous things. Here, our contour lines do not qualify because they're all the same. Our social media data, however, does.
Third, we have variability: in geographic terms, differences in spatial and temporal distribution, scale, and attributes. Our contour dataset has spatial variability—some areas are flat and have virtually no contour lines, while others are mountainous, with contour lines that are very close to one another. The social media data also has spatial and temporal variability in that posts come from different places and appear at different times.
The fourth word is velocity: the rate at which the data is updated. For the contour lines, this is likely limited because their attributes don't get updated very often. But it could apply to the location of the contour lines themselves: in a river delta, for example, land and water alternate positions, swiftly changing the contour lines. The social media data, on the other hand, gets updated constantly; therefore, it fits this definition.
Fifth, we have veracity: the accuracy of the data. This might vary a bit for our contour lines due to the surveying methods used, but, in general, contours are within the bounds of good-quality data. Our social media dataset, however, is probably more prone to imprecision because it is crowdsourced information.
So what does all this mean? Although neither dataset adheres to all five V-words, each conforms to enough criteria to be considered big data. What qualifies as big data, then, depends on context, which can be analyzed by looking at the dataset's volume, variety, variability, velocity, and veracity. By analyzing each dataset according to these guideposts, we can figure out whether big data actually is everywhere. What does seem clear is that contemporary datasets keep getting larger.
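For readers who like to see a checklist written out, here is a minimal sketch of the five-V comparison in Python. It is only an illustration under stated assumptions: the `Dataset` class, the `looks_like_big_data` function, and the threshold of three criteria are all hypothetical, and the trait assignments are one reading of the discussion above (treating veracity as "generally accurate" and velocity as "updated constantly").

```python
from dataclasses import dataclass

# The five criteria discussed in the column.
FIVE_VS = ("volume", "variety", "variability", "velocity", "veracity")


@dataclass(frozen=True)
class Dataset:
    """Hypothetical container: a name plus the V-words the dataset meets."""
    name: str
    traits: frozenset


def satisfied(dataset):
    """Return the V-words the dataset meets, in the column's order."""
    return [v for v in FIVE_VS if v in dataset.traits]


def looks_like_big_data(dataset, threshold=3):
    """Assumed rule of thumb: meeting `threshold` criteria is 'enough'."""
    return len(satisfied(dataset)) >= threshold


# One reading of the text: the contours are large, spatially variable,
# and accurate, but uniform and rarely updated.
contours = Dataset(
    "global 2.5-meter contours",
    frozenset({"volume", "variability", "veracity"}),
)

# The social media posts are large, varied, variable, and fast-changing,
# but prone to imprecision.
posts = Dataset(
    "five years of georeferenced posts",
    frozenset({"volume", "variety", "variability", "velocity"}),
)

for ds in (contours, posts):
    print(ds.name, satisfied(ds), looks_like_big_data(ds))
```

Changing the threshold changes the verdict, which is the point: whether something counts as big data depends on where context draws the line.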
Summarizing Versus Interpreting
Since maps are visual representations of an environment, they are the interface between users and data. Big data, with its increasing prevalence, has a substantial effect on how cartographers work.
In the past, cartographers were expected to create authoritative products—products that, in large part, went unchallenged. Often, cartographers had to fill in gaps in information to present the essence of the message—in an election map, for example, turning certain areas red or blue to show which way they are trending before all the votes are tallied. This required cartographers to employ specific skills, such as interpolation or extrapolation, to interpret missing data or inaccuracies. But when the map was finished, its message was mostly fixed and stable.
Today, however, many cartographers retrieve the essence of their message from big data. They, therefore, are required to summarize more than interpret. Traditional cartographic practices, then—such as generalizing, classifying, and aggregating—no longer suffice. Cartography now requires different skills, along with teamwork.
To deal with big data, the cartographer of today has to be able to do computer programming and communicate with data analysts to get to the core of the message. This message is likely to change over time, however, due to the velocity at which information is updated and the variability of the insights contained within the data. Cartographers now have to create flexible summary maps that can present multiple perspectives. That is challenging for the cartographic community, since it requires coming up with novel designs and even map types. Although summary maps might be simple and schematic, they also require the user to possess the skills to read them.
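To give a flavor of the kind of programming involved, here is a small Python sketch of one possible summarizing step: counting georeferenced posts per grid cell so that the same data can be viewed at different resolutions. It is an assumption-laden illustration rather than a recommended method. The function names, the sample coordinates, and the simple longitude/latitude grid are all hypothetical, and a real summary map would call for a proper projection and far more data.

```python
import math
from collections import Counter


def grid_cell(lon, lat, cell_deg=1.0):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def summarize(points, cell_deg=1.0):
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(lon, lat, cell_deg) for lon, lat in points)


# Hypothetical post locations as (longitude, latitude) pairs:
# two in the New York area and two in the Netherlands.
posts = [(-73.99, 40.73), (-73.95, 40.78), (4.90, 52.37), (6.89, 52.22)]

fine = summarize(posts, cell_deg=1.0)     # 1-degree cells
coarse = summarize(posts, cell_deg=10.0)  # 10-degree cells

print(dict(fine))
print(dict(coarse))
```

At one degree, the two Dutch posts land in separate cells; at ten degrees, they merge into one. A flexible summary map lets the reader move between such views instead of fixing a single message.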
To help cartographers rise above these new hurdles in map design, the International Cartographic Association (ICA) tasked four of its commissions with putting together a research agenda around how big data affects maps. The Commission on Cognitive Issues in Geographic Information Visualization; the Commission on Visual Analytics; the Commission on Use, User and Usability Issues; and the Commission on Map Design published their findings in the most recent edition of the International Journal of Cartography.
About the Author
Menno-Jan Kraak is professor of geovisual analytics and cartography at the University of Twente in the Netherlands, where he has been teaching since 1996. He has a degree in cartography from the Faculty of Geographical Sciences at Utrecht University and received his PhD in cartography from Delft University of Technology. Kraak has written extensively on cartography and GIS. His book Cartography: Visualization of Spatial Data, written with Ferjan Ormeling, has been translated into five languages. He also wrote Mapping Time: Illustrated by Minard's Map of Napoleon's Russian Campaign of 1812, published by Esri Press in 2014. Kraak is a member of the editorial boards of several cartography journals, including the International Journal of Cartography. He currently serves as president of the International Cartographic Association.