ArcGIS Pro

Transforming Raw Data into a NetCDF File for the Voxel Layer

1 Introduction

A voxel cube can be thought of as a 3D pixel, where many voxel cubes together can generate a great representation of data. The newly released ArcGIS Pro 2.6 is able to provide a Voxel Layer with the initial input data format of NetCDF.   Although many NetCDF datasets are widely available online, creating one’s own NetCDF is often desired. This blog, that was created in collaboration with Maximilian Haverkamp, will walk you through an example of how to process data to comply with the Voxel Layer. The opportunities for beautiful data visualization and meaningful analysis are available more now than ever before through the voxel layer.

Network Common Data Form (NetCDF) is used to save multidimensional data in arrays. A NetCDF file contains dimensions, variables and attributes. Dimensions define the area in which the data is located, variables are arrays covering the data, and attributes are used to define variables which include items such as names and units. 

As an examplea NetCDF file about temperature data can include three dimensions: latitude, longitude and height. There are three variables, one for each dimension and a fourth for temperature, which size is defined by  all three dimensions.  The temperature in this example is dense volumetric data, which the voxel layer serves as as a perfect medium for visualization.

Processing any raw data into NetCDF can be cumbersome.  This blog will hopefully alleviate some of the research time and inspire you to create your own beautiful datasets.

GeoMol Data Post-Processing
GeoMol Data Post-Processing

1.1 Example Data 

The data is contains geothermal energy in Switzerland. It is provided by the Federal Office of Topography Switzerland, which analyzed over 5000km of seismic data and more than 60 deep wells and can be found here.

The data expands over more than 1 million values of underground temperature data and the corresponding x, y, z coordinates in the Swiss coordinate system CH1903. 

Table of data from geomol
Table of data from geomol

2 Data creation with Python

2.1 Modules

The following libraries and modules are used to import and then convert our geothermal data to NetCDF.  The netCDF4 module will be used to create the netcdf file.  Numpy is a well-known library specializing in handling single and multidimensional arrays. Here it is used for the coordinates and temperature data arrays.  The module Functools is used to process the first row of the data, which has attribute and/or dimension names.  Depending on the set coordinate system of the origin data, pyproj is able to perform transforms. Finally, scipy is a scientific python module which will be used to interpolate the data.

from netCDF4 import Dataset
import numpy as np
from functools import reduce
from scipy import interpolate
from pyproj import Transformer

2.2 Import and prepare data

The geomol file in our example is a DAT file, so it can simply be opened and read with standard Python commands. Based on your source data, you may have different means of creating the entry point for your data.  Additionally, we will count the number of rows in the file which we will need later.

Geomol_file_path = r'C:\temp\Geomol.dat'
file_buffer = open(Geomol_file_path, 'r')
linecount = len(file_buffer.readlines())

The first line of our file are column names, which will be used as variable and attribute information.  So we must isolate these items into there own array.

firstline = file_buffer.readline().rsplit()
column_names = []
for word in firstline:
  sw = word.strip()

In the case of the GeoMol data, we wish to reduce the amount of attributes. The above snippet recognizes white space, such as tabs and spaces, as separate attributes.  The lines below correct this by reducing the amount of attributes by combining them appropriately. This is not always required, but dependent on the data.  Items such as ‘Color Num’ would be two separate labels in the in the column_names array,  so these must be combined to correctly distribute the data.

column_names[0:2] = [reduce(lambda x, y: x + y, column_names[0:2])]
column_names[7:9] = [reduce(lambda x, y: x + y, column_names[7:9])]
column_names[8:10] = [reduce(lambda x, y: x + y, column_names[8:10])]
column_names[9:11] = [reduce(lambda x, y: x + y, column_names[9:11])]
column_names[10:12] = [reduce(lambda x, y: x + y, column_names[10:12])]
column_names[11:13] = [reduce(lambda x, y: x + y, column_names[11:13])]

An array will be filled with all information from the Geomol file (besides the first line).  The file buffer can also be closed at this step.

column_values = []
for i in range(linecount):
  lines = file_buffer.readline().rsplit()
  for word in lines:
    sw = word.strip().split()

Now an empty matrix must be created and filled. The size of the matrix is equal to the length of values of one column times the amount of columns. Due to the list being one dimension, it can be divided by the number of attribute types to get the number of appropriate values of each column.

The values will be placed in the matrix to match the row and column from the original data keeping the values in the correct spot.  A value will be casted into a float if possible, otherwise fills the matrix with the original data type.

row_size = int((len(column_values)) / len(column_names))
column_size = int(len(column_names))
matrix = np.empty((row_size,column_size),"object")
j,k = 0,0
for i in range(0, len(column_values)):
    matrix[j][k] = float(column_values[i])
    matrix[j][k] = (column_values[i])
  if k == len(column_names)-1:
    k = 0
    j = j + 1
    k = k + 1

In order to correctly process the data, there must be an ordering.  In this example, the data is sorted by x, then y, and finally z.  The ordering is going to be dependent on an individual implementation of filling the matrix as seen later in this example.

Sort by X, then Y, then Z.
Sort by X, then Y, then Z.
matrix = matrix[matrix[:,1].argsort()]
matrix = matrix[matrix[:,2].argsort(kind = 'mergesort')]
matrix = matrix[matrix[:,3].argsort(kind = 'mergesort')]

Values are extracted from the matrix and put into their own individual arrays.  For this example, the x dimension is at 1, y dimension is at 2, and so on.  Then the x and y values are transformed from one coordinate system to another.

x_values = matrix[:,1]
y_values = matrix[:,2]
z_values = matrix[:,3]
temp_values = matrix[:,13]
transformer = Transformer.from_crs("epsg:2056", "epsg:2056")
y_values, x_values = transformer.transform(x_values, y_values)

2.3 Interpolation

Since our data is not guaranteed (actually it is almost never the case) to have data points equal in distance from each other in perfect steps, the necessity for interpolation appears. The interpolation will be performed on the x,y plane for every slice in z.  In order to perform this, a list of unique z values must be gathered and sorted.  Casting the z values list into a set will get each unique value with no repeated values, then it will be cast back into a list to be sorted.  The x,y step will also be decided, which determines the size of the voxels.  For this example, the data is split into 200 steps in the x,y dimensions.

Example of reducing list
Example of reducing list
z_values_unique = list(set(z_values))
x_range = np.arrange(min(x_values), max(x_values),
((max(x_values) - min(x_values)) / 200)
y_range = np.arrange(min(y_values), max(y_values),
((max(y_values) - min(y_values)) / 200

A matrix the size of the unique z values, and amount of steps in both the x and y axes.  The matrix is ordered by Z,Y,X because of the necessary ordering to consume NetCDF.  This will be discussed further below.

data_matrix = np.zeros((len(z_range),len(y_range),len(x_range)),dtype = np.float64)

Every unique z value will used to create an x,y slice by iterating through the size of the unique z value list. The value at each unique z value will be accessed, from least to greatest, and a boolean mask will be created.  This mask will determine which rows to select from the matrix that have that specific z value. An x value buffer,y value buffer, and data value buffer will be created by using the mask and the column of the matrix.  These will be one dimensional arrays. Now grids must be created from the x and y range to be passed into the interpolation function. Interpolate over the grid, and assign the result to a temporary buffer.  Swap the axes on the temporary buffer to match the data matrix created before.  Before exiting the for loop the temporary buffer must be assigned to a the ith slice of the data matrix.

for i in range(len(z_values_unique)):
  z_mask = (z_values == z_values_unique[i])
  x_value_buffer = matrix[z_mask][:,1].astype(np.float64)
  y_value_buffer = matrix[z_mask][:,2].astype(np.float64)
  data_value_buffer = matrix[z_mask][:,13].astype(np.float64)
  x_grid,y_grid = np.meshgrid(x_range,y_range)
  temp_buffer = interpolate.griddata((x_value_buffer,y_value_buffer),
                method = "cubic")
  data_matrix[i,:,:] = temp_buffer

2.3.1 A word on the interpolation method

The interpolation function being used is griddata from the scipy module.  This function works in a 2D space, however other methods of interpolation could be used to work in a 3D space. The method passed, “cubic”, interpolates over the grid on the points given. Therefore, points outside of the calculated convex hull will be set to nan.  Below the image describes the area, in green, where interpolation will give a non-nan value given the points are the edges of the original data.  The rest of the grid area, surrounded by a red outline, will be the area of where nan values are given. If extrapolation is needed to fit the data points across the grid area, the the parameter “nearest” can be passed to method, however this may give less desirable results.

Interpolated area of convex hull over a grid
Interpolated area of convex hull over a grid. The green area will receive non-nan values for interpolation.

2.4 Creating NetCDF

To begin the NetCDF creation, a new empty NetCDF file must be created. A NetCDF file consists of dimensions and attributes, which have already been extracted above. The size of a dimension is as big as the length of the data array, e.g. 20 latitude values equals in a dimension size of 20. The size can also be set to none or zero, which causes the dimension to be unlimited.

ncdf = Dataset("", "w")
ncdf.history = "Created using python script"
ncdf.createDimension("x", (len(x_range)))
ncdf.createDimension("y", (len(y_range)))
ncdf.createDimension("z", (len(z_values_unique)))

After the dimensions are created, variable creation is next.  Variables are defined by and define dimensions.  For example, the x variable consists of the x dimension only, and gives the array values of the x dimension.  Other variables, such as temperature in this example, are defined by multiple dimensions and have a value for each combination of dimension.  Note that the dimensions must be in the order of Z,Y,X to be compatible with the voxel layer.

Other information can be appended to variables as well, such as units, standard_name, and more.

xvar = ncdf.createVariable("x", "d", ("x",))
xvar.units = "degrees"
xvar.standard_name = "Projected X"
yvar = ncdf.createVariable("y", "d", ("y",))
yvar.units = "degrees"
yvar.standard_name = "Projected Y",
zvar = ncdf.createVariable("z", "d", ("z",))
zvar.units = "meter"
zvar.standard_name = "level"
zvar.positive = "up" 
T = ncdf.createVariable("T", "d", ("z","y","x",))
T.units = "Celsius"
T.standard_name = "Temperature"

Now the final step is to set the NetCDF variable arrays to their corresponding previously processed arrays.  Closing the NetCDF file is also recommended.

xvar[:] = x_range
yvar[:] = y_range
zvar[:] = z_range
T[:,:,:] = data_matrix

3 Adding NetCDF to Pro

3.1 Requirements

The NetCDF file must fulfill the Climate and Forecast (CF) conditionswhich can be validated using the compliance checker toolMore importantly, the coordinate arrays must be monotonic, which means they are sorted ascending or descending.

If the NetCDF file has no set coordinate systema projection file must be created in the same folder location. A projection file is a text file with the same name as the NetCDF file and .prj as the file extension. It contains one line, which defines the projection, thus must match the coordinates. The appropriate projection file can be found here.  For this example WKID 4314 will be used.

3.2 Add data into ArcGIS Pro

Once the file meets the requirements it can be added into ArcGIS Pro. First create a local scene, then via Map ribbon – Add Data – Multidimensional Voxel Layer. This is only accessible in a local scene.  

Add NetCDF to Pro
Add NetCDF to Pro

Now a dialog is displayed and the NetCDF file can be consumed, it is also possible to specify which variables shall be displayed, if you have more than one variable. You may notice a yellow triangle with an exclamation mark to the right of the Input Data Source Path. This will inform you of any warnings or errors that may be present in your NetCDF file.

If the file cannot be added, it is most likely due to it not meeting the requirements (see 3.1).

Select variable from NetCDF
Select variable from NetCDF

After clicking OK you may visualize and explore your voxel data! For more information, please check out What is a voxel layer? in the help topics.

About the author

Richard is a Product Engineer (SDET) on the 3D Scene Layers Team at ESRI. In this role he works on multidimensional voxel and 3D object layers, as well as a maintainer for the ESRI I3S specification. Richard has a Bachelor of Science in Computer Science from California State University San Bernardino, and is currently enrolled at Georgia Institute of Technology for a Master of Science in Computer Science. Outside of work, Richard likes to indulge in video games, fiddle with the guitar, and go on hikes.


Leave a Reply

Please Login to comment

Next Article

ArcGIS Hub says goodbye to Internet Explorer and Edge Legacy

Read this article