The technology tides have shifted again and, as the notion of cloud computing is becoming mainstream across most industries, a new buzzword is emerging: Big Data. Never heard of it? Simply put, the term refers to the ever-growing mountain of data, generated from myriad sources, that organizations must effectively address.
For instance, according to a recent MeriTalk survey, 96% of Federal IT professionals expect their agency's stored data to grow by an average of 64% over the next two years.
Big Data is often described using the three "V"s: Volume, Velocity, and Variety. By way of example, consider a few of the real-world case studies gathered by IBM and shared by Mike Rhodin, Senior Vice President of IBM Software Solutions:
- Volume: utility companies record 350 meter readings per year.
- Velocity: the financial services industry clocks 5,000,000 trade events.
- Variety: the data formats generated range from structured, traditional file formats to unstructured video, audio, imagery, email, web logs, and pretty much anything else you can think of.
As you might expect, the typical C-level executive is not overly concerned with the definition of Big Data. What really interests her, according to Gartner, is finding ways to take advantage of Big Data to drive better business outcomes. So how does one extract value from data that is terabytes large (or larger) and distributed at scale, both geographically and practically? How do you store all this data? And how do you achieve results that are meaningful to your organization and your customers? These are a few of the challenges of Big Data.
The good news is that a growing number of technologies let organizations store and analyze data along all three Vs. For example, MapReduce is the original set of distributed computing ideas now embodied in Apache Hadoop. Other Big Data technologies include Apache Cassandra, Apache Hive, and NoSQL databases such as MongoDB, to name a few. Also emerging are applications and methodologies for performing data mining and analysis on Big Data through pleasing dashboards and an intuitive user experience (UX).
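To make the MapReduce idea concrete, here is a minimal, single-process sketch of the map/shuffle/reduce pattern in plain Python. This is not Hadoop itself (which distributes these phases across a cluster), and the sample log records are invented for illustration:

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# Toy input standing in for a large, distributed log dataset.
logs = [
    "error disk full",
    "error network timeout",
    "disk replaced",
]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"])  # 2
```

In a real Hadoop job, the same map and reduce functions would run in parallel on chunks of data spread across many machines, with the framework handling the shuffle between them.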
A side effect of the growth of these offerings is increased demand for workers with a blended skill set to fill the role of data scientist: half research scientist, half data analyst.
Enter: Serious business analytics.
To put the potential for Big Data into perspective, in 2011, GigaOm shared a few interesting examples of real-world situations where Big Data problems were solved:
- A New York University PhD student conducted a comprehensive analysis of several terabytes worth of Wikileaks data to determine key trends around U.S. and coalition troop activity in Afghanistan.
- A global non-profit analyzed 80 million documents to confirm the validity of claims about the Guatemalan genocide of the 1990s.
- A California genomics company analyzed over 100 million gene samples to predict markers for coronary artery disease.
It is not surprising, and quite a natural progression, that the discussion of Big Data arrives on the heels of cloud computing. The cloud gives organizations and agencies the ability to store tremendous amounts of data in a [hopefully] highly reliable, distributed environment. It also provides the ability to scale dynamically, leverage existing algorithms for analysis, and take advantage of robust data center hardware cost-effectively, without building from the ground up.
Esri has been testing Amazon Web Services' Elastic MapReduce product and deploying prototypes on the AWS cloud, as well as exploring and providing MongoDB examples for plugging NoSQL data sources into ArcGIS. More visible is Esri's geospatial analysis of tweets generated on Twitter and collected through its Big Data partner, Gnip. You can see examples of social media monitoring in the public information maps, where tweets are captured and then displayed across relevant geographies. Esri's other current partners in the Big Data space include Microsoft, IBM, TerraEchos, and CloudTrigger.
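As a rough illustration of how geotagged social media records might flow toward a mapping platform, here is a hypothetical sketch that converts tweet-like documents (shaped the way one might pull them from a MongoDB collection) into GeoJSON point features. The field names and sample tweets are assumptions, not Esri's or Gnip's actual schema:

```python
import json

# Hypothetical tweet documents, shaped like records one might retrieve
# from a MongoDB collection; the "coordinates" ([lon, lat]) and "text"
# field names are assumptions for illustration only.
tweets = [
    {"text": "Road closed downtown", "coordinates": [-122.33, 47.61]},
    {"text": "Flooding near the river", "coordinates": [-122.45, 47.66]},
    {"text": "No location on this one", "coordinates": None},
]

def to_geojson(docs):
    """Wrap documents carrying [lon, lat] pairs as GeoJSON point features,
    skipping documents without coordinates."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": d["coordinates"]},
            "properties": {"text": d["text"]},
        }
        for d in docs
        if d.get("coordinates")
    ]
    return {"type": "FeatureCollection", "features": features}

print(json.dumps(to_geojson(tweets), indent=2))
```

A GeoJSON feature collection like this is a common interchange format that web mapping platforms can render directly across the relevant geographies.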
Every organization sees its data as a core asset that drives business and decision-making. Mining location data from these assets and making sense of it is perhaps one of the biggest challenges of Big Data. Typically this information is collected haphazardly and then locked away: access is limited, or the data is archived and forgotten. A more democratic platform, such as ArcGIS Online, can greatly increase the speed at which location data assets are understood and shared. As a result, individual users are empowered with the information they need to make the most effective and innovative decisions, affecting the future of government and society, science and business.
That is the big deal about Big Data.