Storing large volumes of imagery and rasters in the cloud

By kevin_butler

Storing large volumes of data in the cloud

Storing large volumes of imagery in the cloud and transmitting it to client applications has always posed a quandary. Do you want it large, slow, and accurate, or small, fast, and of unknown accuracy? For the first option, you can store and transmit the rasters uncompressed or with lossless compression, but the compression factors are relatively small. For the second option, you can use lossy compression, but you have no real control over the resulting accuracy.

Tradeoff: Accurate, but large and slow? Or, fast and small, but accuracy takes a hit?

Terms:

Lossless – Stores data at full accuracy, but the data is not compressed much.

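Lossy – Sets the compression based on a quality setting, such as 80%, or a compression factor, such as 15x. The data is compressed much more, but its accuracy is reduced by an amount you do not control.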

The problem with lossy compression is that you don’t know how much your data was changed to obtain that compression factor. Imagine you’re storing a 32-bit elevation dataset and you know the data is accurate to 10 cm. If you store it with lossy compression, you might get values that deviate by more than 10 m. If you store it with lossless compression, it’s still a large, unwieldy dataset.

LERC

To address these issues, Esri has developed a new approach to data compression where you set a tolerance for how much the compressed values can change from the original values. We call the algorithm that does this LERC (Limited Error Raster Compression).

Using LERC, you can set a tolerance of 10 cm when you compress the data, and the result is a dataset that is compressed as much as possible while staying within the 10 cm tolerance you set. You can also set the tolerance to 0, which makes the compression lossless. In most cases, LERC provides better lossless compression than traditional lossless methods such as LZW and Deflate.
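To make the tolerance idea concrete, here is a minimal Python sketch of error-bounded quantization. It is not the LERC codec itself, only an illustration of the principle that you choose the maximum error up front and compress against it. The elevation profile, the function names, and the use of zlib as the back-end compressor are all assumptions made for this example, and the sketch only covers a tolerance greater than 0.

```python
import math
import random
import struct
import zlib


def quantize(values, max_error):
    """Snap each value to a grid with spacing 2 * max_error.

    Rounding to the nearest grid point moves a value by at most half the
    spacing, so no value changes by more than max_error.
    """
    if max_error <= 0:
        raise ValueError("this sketch only covers a tolerance greater than 0")
    step = 2.0 * max_error
    base = min(values)
    codes = [round((v - base) / step) for v in values]
    return base, step, codes


def dequantize(base, step, codes):
    """Rebuild approximate values from the integer grid codes."""
    return [base + code * step for code in codes]


# Hypothetical elevation profile in meters: a smooth hill plus
# centimeter-scale noise (made-up data for illustration).
rng = random.Random(0)
elevations = [
    1000.0 + 50.0 * math.sin(i / 400.0) + rng.gauss(0.0, 0.02)
    for i in range(10_000)
]

tolerance = 0.10  # 10 cm
base, step, codes = quantize(elevations, tolerance)
restored = dequantize(base, step, codes)
worst = max(abs(a - b) for a, b in zip(elevations, restored))

# Compare a general-purpose compressor on the 32-bit float samples
# against the same compressor on the small integer codes.
raw_bytes = zlib.compress(struct.pack(f"<{len(elevations)}f", *elevations))
code_bytes = zlib.compress(struct.pack(f"<{len(codes)}I", *codes))

print(f"worst error: {worst:.4f} m (tolerance {tolerance} m)")
print(f"float32 + zlib:   {len(raw_bytes)} bytes")
print(f"quantized + zlib: {len(code_bytes)} bytes")
```

The point of the sketch is the guarantee, not the exact byte counts: the worst-case error is fixed by the tolerance you pass in, and the compressor only has to store small integers instead of noisy floating-point values.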

With scientists and analysts worldwide trying to use imagery to solve complex world problems, such as climate change, air pollution, and food sustainability, having accurate information available very quickly is exciting news. Speedy and accurate web access to imagery will improve the way people understand and analyze changes in the Earth. Because it is web-enabled and fast, it will also make it easier for scientists and analysts to share their results with the public and policymakers.

The Nitty Gritty

We’ve talked about tradeoffs between accuracy and size and how LERC is optimized for compressing large datasets while maintaining the accuracy you need. Another tradeoff you see in traditional compression techniques is the amount of time it takes to compress and decompress a file. Most compression algorithms focus on making the compressed file as small as possible, but require substantial computing time to do so. Although the resulting file is small, it takes a long time both to compress and to decompress. Under the hood, one of the great things about LERC is that the algorithm is very efficient. Because of this, it is blazing fast at both packing and unpacking imagery. LERC decompression can also be implemented in JavaScript, making it compatible with every web browser in the world, so that you can create dynamic web applications built on accurate, high-speed imagery.

LERC is implemented within MRF (Meta Raster Format), a file format designed by NASA JPL for fast access to raster datasets. Without getting too far into the technical details, this file format takes advantage of tiling schemes to enable web-based caching. (If you do want to know the technical details behind this relationship, check out this forum on GeoNet.) MRF is implemented in open source GDAL, and Esri has contributed the LERC implementation (https://github.com/nasa-gibs/mrf). Now, using LERC and MRF, we all have the ability to store and access massive volumes of imagery at a lower cost in an open format. For organizations with large volumes of imagery, this is significant.
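As a rough illustration of what writing such a file might look like, the Python sketch below builds a `gdal_translate` command line that converts a raster to MRF with LERC compression and a 10 cm tolerance. The file names and the helper function are hypothetical, and the creation options shown (`COMPRESS=LERC` and a `LERC_PREC` value passed through `OPTIONS`) are an assumption about the GDAL MRF driver that should be checked against the documentation for your GDAL version before use.

```python
import shlex


def mrf_lerc_command(src, dst, tolerance):
    """Build a gdal_translate argument list for an MRF with LERC tiles.

    The creation options are assumptions to verify against the GDAL MRF
    driver documentation; src and dst are hypothetical file names.
    """
    return [
        "gdal_translate",
        "-of", "MRF",                             # write Meta Raster Format
        "-co", "COMPRESS=LERC",                   # compress tiles with LERC
        "-co", f"OPTIONS=LERC_PREC={tolerance}",  # maximum allowed error
        src,
        dst,
    ]


cmd = mrf_lerc_command("elevation.tif", "elevation.mrf", 0.1)
print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Building the argument list separately from running it keeps the sketch usable on a machine without GDAL installed and makes the chosen options easy to review.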

To date, LERC has been available only within ArcGIS, but we are now making it available to the geospatial community so that developers can make use of the technology and continue to develop new ways of exploring geospatial data. To help serve this community, we have created a GitHub repository where you can access the code.
