Storing large volumes of data in the cloud
Storing large volumes of imagery in the cloud and transmitting the data to client applications has always posed a quandary: do you want it large, slow, and accurate, or fast, small, and of unknown accuracy? For the first option, you can store and transmit the rasters with no compression or lossless compression, but the compression factors are relatively small. For the second option, you can use a lossy compression, but you have no real control over the resulting accuracy.
Tradeoff: accurate, but large and slow? Or fast and small, but accuracy takes a hit?
Lossless – Stores the data at full accuracy, but the compression factors are small.
Lossy – Sets the compression based on a quality setting (e.g., 80%) or a compression factor (e.g., 15x), with no guarantee on accuracy.
The problem with lossy compression is that you don’t know how much your data was changed to achieve that compression factor. Imagine you’re storing a 32-bit elevation dataset that you know is accurate to 10 cm. If you store it as lossy, you might get values that deviate by more than 10 m. If you store it as lossless, it’s still a large, unwieldy dataset.
To address these issues, Esri has developed a new approach to data compression in which you set a tolerance for how much the compressed values can deviate from the original values. We call the algorithm that does this LERC (Limited Error Raster Compression).
Using LERC, you can set a tolerance of 10 cm when you compress the data, and the result is a dataset that is compressed as much as possible while every value stays within that 10 cm tolerance. You can also set the tolerance to 0, which makes the compression lossless. In most cases, LERC provides better lossless compression than traditional lossless methods such as LZW and Deflate.
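The core idea can be illustrated with a simple quantization sketch (this is not the actual LERC algorithm, which adds tiling, block statistics, and bit-packing; the variable names and sample data here are hypothetical): rounding each value to the nearest multiple of twice the tolerance guarantees that no value moves by more than the tolerance, while collapsing nearby values so the data compresses far better.

```python
import random
import struct
import zlib

def quantize(values, tolerance):
    """Round each value to the nearest multiple of 2*tolerance,
    so no value changes by more than the tolerance."""
    step = 2.0 * tolerance
    return [round(v / step) * step for v in values]

# Hypothetical elevation samples, nominally accurate to 10 cm.
random.seed(0)
elevations = [450.0 + random.uniform(-5, 5) for _ in range(10_000)]

quantized = quantize(elevations, tolerance=0.10)

# Every quantized value stays within the 10 cm tolerance.
assert all(abs(a - b) <= 0.10 + 1e-9 for a, b in zip(elevations, quantized))

# The quantized stream compresses much better, because many values now repeat.
raw = zlib.compress(struct.pack(f"{len(elevations)}d", *elevations))
quant = zlib.compress(struct.pack(f"{len(quantized)}d", *quantized))
print(len(raw), len(quant))
```

Setting the tolerance to 0 would skip the rounding entirely, which is the lossless case described above.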
With scientists and analysts worldwide using imagery to tackle complex global problems such as climate change, air pollution, and food sustainability, having accurate information available quickly matters. Speedy, accurate web access to imagery will improve the way people understand and analyze changes in the Earth, and because it is web-enabled and fast, it will be easier for scientists and analysts to share their results with the public and policy makers.
The Nitty Gritty
LERC is implemented within MRF (Meta Raster Format), a file format designed by NASA JPL for fast access to raster datasets. Without getting too far into the technical details, this file format takes advantage of tiling schemes to enable web-based caching. (If you do want the technical details behind this relationship, check out this forum on GeoNet.) MRF is implemented in open-source GDAL, and Esri has contributed the LERC implementation (https://github.com/nasa-gibs/mrf). Now, using LERC and MRF, we all have the ability to store and access massive volumes of imagery at lower cost in an open format. For organizations with large volumes of imagery, this is significant.
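As a sketch of what this looks like in practice with GDAL's MRF driver (the option names below follow the GDAL MRF driver documentation; verify them against your GDAL build, and the file names are placeholders), you can translate a GeoTIFF into a LERC-compressed MRF with a 10 cm maximum error:

```shell
# Convert an elevation GeoTIFF to MRF with LERC compression.
# LERC_PREC sets the maximum per-value error (here 0.1, i.e. 10 cm);
# setting it to 0 makes the compression lossless.
gdal_translate -of MRF -co COMPRESS=LERC -co OPTIONS="LERC_PREC=0.1" \
    elevation.tif elevation.mrf
```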
To date, LERC has been available only internally within ArcGIS, but we are now making it available to the geospatial community so that developers can make use of the technology and continue to develop new ways of exploring geospatial data. To help serve this community, we have created a GitHub repository where you can access the code.