{"id":2162682,"date":"2023-11-15T14:37:06","date_gmt":"2023-11-15T22:37:06","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2162682"},"modified":"2023-11-16T09:40:18","modified_gmt":"2023-11-16T17:40:18","slug":"leverage-apache-arrow-in-arcgis-pro","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro","title":{"rendered":"Leverage Apache Arrow in ArcGIS Pro"},"author":122991,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341,23851,738191],"tags":[772592,31181,230422,24341],"industry":[],"product":[36841,765842,36561],"class_list":["post-2162682","blog","type-blog","status-publish","format-standard","hentry","category-analytics","category-data-management","category-developers","tag-apache-arrow","tag-arcpy","tag-data","tag-python","product-api-python","product-geoanalytics-engine","product-arcgis-pro"],"acf":{"authors":[{"ID":122991,"user_firstname":"Hannes","user_lastname":"Ziegler","nickname":"hziegler","user_nicename":"hziegler","display_name":"Hannes Ziegler","user_email":"hziegler@esri.com","user_url":"","user_registered":"2020-10-30 20:06:29","user_description":"Hannes is a product engineer on the Python team. 
He has five years of experience streamlining spatial data analysis workflows in the public and private sectors, and has been with Esri since 2019, where he focuses on the design, evaluation, and documentation of new and existing Python functionality.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/10\/HannesZiegler2-465x465.jpg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"Leverage Apache Arrow to transport data at increased efficiency between ArcGIS Pro and open-source components.","flexible_content":[{"acf_fc_layout":"content","content":"<p>ArcGIS continues to grow as a geospatial data science platform, incorporating specialized geospatial data science tools with open-source components. As this network of components and the volume of data both keep growing, we need an efficient way to connect them. Apache Arrow may help.<\/p>\n<h1>Introduction to Apache Arrow<\/h1>\n<p><a href=\"https:\/\/arrow.apache.org\/\">Apache Arrow<\/a> is a burgeoning, ambitious, open-source project by Wes McKinney and partners. For some time now it has been slowly finding its way into <a href=\"https:\/\/arrow.apache.org\/powered_by\/\">various popular data and analytics platforms<\/a>. In short, Apache Arrow is an in-memory, columnar, cross-platform, cross-language, and open-source data representation that allows you to efficiently transfer data between components. 
It is intended to sit low in the stack:<\/p>\n"},{"acf_fc_layout":"quote","author_name":"The Apache Software Foundation","author_profession_organization":"","image":false,"text":"[Apache Arrow] is designed to both improve the performance of analytical algorithms and the efficiency of moving data from one system or programming language to another."},{"acf_fc_layout":"content","content":"<p>In other words, unlike user-facing Pandas and Spark data frames, Apache Arrow\u2019s data representation is intended to sit behind the scenes at a lower level, efficiently running the logistics regardless of platform or language.<\/p>\n<p>One of the most powerful promises of Arrow is to serve as a sort of Esperanto (or common language) for data transport: a super-efficient, often zero-copy vehicle that can thread the interfaces between various platforms, including ArcGIS Pro.<\/p>\n<p>In this blog post, you\u2019ll learn how to leverage Apache Arrow to improve your workflows across components like Pandas (including Spatially Enabled DataFrames and Geopandas), Spark, Parquet, and ArcPy.<\/p>\n<h1>Leverage Apache Arrow in ArcPy<\/h1>\n<p>We added support for reading and writing Arrow Tables to ArcPy at ArcGIS Pro 2.9. With the release of ArcGIS Pro 3.2, we improved upon this feature by adding support for additional data types and geometry encodings. This allows you to connect your ArcGIS Pro workflows with other data and analytics platforms by transporting your geospatial data using Arrow Tables. As Apache Arrow grows in popularity and adoption, support for it will expand on other platforms. 
So, if you\u2019re searching for an efficient path for bringing your geospatial data from other projects into ArcGIS (or vice versa), leveraging ArcPy\u2019s integration with Arrow may, in some cases, offer the best solution.<\/p>\n<h3>Apache Arrow in Python<\/h3>\n<p>Apache Arrow\u2019s interface for Python is provided by the <a href=\"https:\/\/arrow.apache.org\/docs\/python\/index.html\">PyArrow<\/a> library.<\/p>\n<h3>Arrow Tables<\/h3>\n<p><a href=\"https:\/\/arrow.apache.org\/docs\/python\/data.html#tables\">Arrow Tables<\/a> are a tabular data representation composed of columns, in which each column has a field name, data type, and the data itself (as well as optional metadata, more on this later).<\/p>\n<h2>Write an Arrow Table from ArcPy<\/h2>\n<p>To convert a Featureclass to an Arrow Table, you can use the <code>arcpy.da.TableToArrowTable<\/code> function.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"import arcpy\r\n\r\nnsowlnests_path = r'C:\\data\\forestry.gdb\\northern_spotted_owl_nests'\r\n\r\n# Convert a Featureclass to an Arrow Table.\r\nnsowlnests_at = arcpy.da.TableToArrowTable(nsowlnests_path)  # _at for Arrow Table","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>The geometry column in the resulting Arrow Table will be encoded in the <em>EsriShape<\/em> binary format. This format is efficient and lossless, but it is also incompatible with most other analytics platforms. When you need the exported geometry data to be compatible with another platform, you can choose a different geometry encoding with the optional <code>geometry_encoding<\/code> parameter, which supports the additional geometry encodings <em>EsriJSON<\/em>, <em>GeoJSON<\/em>, <em>WKT<\/em>, and <em>WKB<\/em>. 
These are publicly documented formats for representing geometries, which you can read about at the following sites:<\/p>\n<ul>\n<li><a href=\"https:\/\/developers.arcgis.com\/documentation\/common-data-types\/geometry-objects.htm\">EsriJSON Specification<\/a><\/li>\n<li><a href=\"https:\/\/geojson.org\/\">GeoJSON Specification<\/a><\/li>\n<li><a href=\"https:\/\/www.ogc.org\/standard\/sfa\/\">Well-known Text (WKT) and Well-known Binary (WKB) Specification<\/a><\/li>\n<\/ul>\n<p>Most geospatial analytics and data platforms will support reading or writing at least one of these formats.<\/p>\n<h2>Read an Arrow Table to ArcPy<\/h2>\n<p>Reading an Arrow Table into ArcGIS Pro is done by passing the Arrow Table into <code>arcpy.management.CopyRows<\/code> (for Tables) or <code>arcpy.management.CopyFeatures<\/code> (for Featureclasses). In fact, geoprocessing tools accept Arrow Tables as a data source, so you can directly use an Arrow Table as input to a geoprocessing tool.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"buffer = '0.7 MILES'\r\noutfc = r'C:\\data\\forestry.gdb\\protected_areas'\r\n\r\n# Use the Arrow Table in a Geoprocessing tool.\r\narcpy.analysis.Buffer(nsowlnests_at, outfc, buffer)\r\n","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>ArcPy can read Arrow Tables with all the same geometry encodings it can write (EsriShape, EsriJSON, GeoJSON, WKT, and WKB). However, when the Arrow Table did not originate from ArcPy, you may need to do some additional prep work to ensure the table can be successfully read by ArcPy. You\u2019ll learn about that in the next section.<\/p>\n<h1>Interoperability with other analytics components<\/h1>\n<p>ArcPy uses metadata keys embedded with the Arrow Table columns to determine how to interpret the data. The metadata is stored as part of a table\u2019s schema. 
When using Arrow Tables as a vehicle for moving data between different geospatial data and analytics platforms, it is important to understand the schema specification for Apache Arrow Tables that ArcPy supports. You can view an Arrow Table\u2019s schema using its <code>schema<\/code> attribute.<\/p>\n<p><code>&lt;Arrow Table Object&gt;.schema<\/code><\/p>\n<p>For the Arrow Table from the previous example, <code>nsowlnests_at<\/code>, which contains the columns OID, Shape, NEST_ID, TREE_SPECIES, NEST_HEIGHT_M, LAST_ACTIVE_YEAR, and NOTES, the schema looks like this:<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"OID: int64 not null\r\n  -- field metadata --\r\n  esri.oid: 'esri.int64'\r\nShape: binary\r\n  -- field metadata --\r\n  esri.sr_wkt: 'GEOGCS[\"GCS_WGS_1984\",DATUM[\"D_WGS_1984\",SPHEROID[\"WGS_19' + 111\r\n  esri.encoding: 'EsriShape'\r\nNEST_ID: int32\r\nTREE_SPECIES: string\r\nNEST_HEIGHT_M: int32\r\nLAST_ACTIVE_YEAR: int32\r\nNOTES: string","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>Note the metadata attached to the Shape field. The <code>esri.sr_wkt<\/code> key defines the coordinate system of the geometry stored in this column using the <a href=\"https:\/\/www.ogc.org\/standard\/wkt-crs\/\">Well-known text representation of coordinate reference systems<\/a> (WKT CRS). The <code>esri.encoding<\/code> key specifies the geometry encoding of the data, in this case EsriShape. The data type of the Shape field is binary. Note that different geometry encodings may require different field data types. 
For example, if the Shape field held GeoJSON-encoded geometry instead, it would need to be of data type string.<\/p>\n<p>You can find additional information about the required schema and the mappings and metadata for the supported field data types in the Type conversions section of the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/arcpy\/get-started\/working-with-arrow-in-arcgis.htm\">Apache Arrow in ArcGIS documentation<\/a>.<\/p>\n<p>The schema profile for Apache Arrow Tables supported by ArcPy is not the only Arrow Table schema profile for geospatial data; there is also the <a href=\"https:\/\/geoarrow.org\/\">GeoArrow specification<\/a>. ArcPy also supports reading Arrow Tables with a GeoArrow schema. However, ArcPy will not create Arrow Tables with the GeoArrow schema.<\/p>\n<p>Some platforms may not preserve an Arrow Table\u2019s original schema or produce an Arrow Table with a schema ArcPy understands. In cases where they don\u2019t, you will need to reconstruct the schema either from scratch or using the original schema.<\/p>\n<h2>Parquet<\/h2>\n<p>While Apache Arrow is an efficient but temporary in-memory data structure for fast operations, <a href=\"https:\/\/parquet.apache.org\/\">Apache Parquet<\/a> is an on-disk data structure for space-efficient long-term storage. In short, Apache Arrow is for processing and moving data, and Apache Parquet is for storage. The two formats are optimized for compatibility. 
This compatibility means that the schema will be preserved when writing an Arrow Table to a parquet file for long-term storage.<\/p>\n<p>Here&#8217;s how you can move geospatial data between Parquet files and ArcGIS Pro using Apache Arrow:<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"import os\r\nimport pyarrow.parquet as pq\r\n\r\n# Using the `nsowlnests_at` created previously from ArcPy:\r\n\r\n# Write to parquet for long-term storage.\r\nws = r'C:\\data'\r\nnsowlnests_pq = os.path.join(ws, \"northern_spotted_owl_nests.parquet\")\r\n# _pq for Parquet\r\n\r\npq.write_table(nsowlnests_at, nsowlnests_pq)\r\n\r\n# After some time in storage...\r\n\r\n# Read (retrieve from storage) the parquet file to an Arrow Table. \r\n# The original Arrow Table\u2019s schema is preserved.\r\nretrieved_at = pq.read_table(nsowlnests_pq)\r\n\r\n# Now use `retrieved_at` in ArcPy.","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<h2>Pandas DataFrames<\/h2>\n<p>The <a href=\"https:\/\/pandas.pydata.org\/\">Pandas<\/a> DataFrame is a table-like in-memory data structure with an interface for data analysis. The Pandas team plans to completely back Pandas with Apache Arrow (instead of NumPy) when Pandas 3.0 is released. With the recently released Pandas 2.0, backing a DataFrame with Apache Arrow is optional. 
ArcGIS Pro 3.2 ships with Pandas version 2.0.2, so you can try this out yourself.<\/p>\n<p>In the following example, we will use Arrow to move geospatial data between a Pandas DataFrame and ArcGIS Pro, and leverage the new Arrow-backed data types in Pandas:<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"import pandas as pd\r\nimport pyarrow as pa\r\n\r\n# Using the `nsowlnests_at` created previously from ArcPy:\r\n\r\n# Store the Arrow Table\u2019s schema in `schema` for later, because\r\n# it will not be preserved during the conversion to a Pandas DataFrame.\r\nschema = nsowlnests_at.schema\r\n\r\n# Define a data type mapping (to arrow data types) for Pandas to use.\r\ndtype_mapping = {\r\n    pa.int8(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int8()),\r\n    pa.int16(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int16()),\r\n    pa.int32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int32()),\r\n    pa.int64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int64()),\r\n    pa.uint8(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint8()),\r\n    pa.uint16(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint16()),\r\n    pa.uint32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint32()),\r\n    pa.uint64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint64()),\r\n    pa.float32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.float32()),\r\n    pa.float64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.float64()),\r\n    pa.bool_(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.bool_()),\r\n    pa.binary(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.binary()),\r\n    pa.string(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.string())\r\n}\r\n\r\n# Convert the Arrow Table to a Pandas DataFrame using `dtype_mapping`.\r\nnsowlnests_pdf = nsowlnests_at.to_pandas(types_mapper=dtype_mapping.get)  \r\n# _pdf for Pandas DataFrame\r\n\r\n# After some 
processing performed on the Pandas DataFrame...\r\n\r\n# Convert the Pandas DataFrame back to an Arrow Table,\r\n# applying the schema stored earlier.\r\nretrieved_at = pa.Table.from_pandas(nsowlnests_pdf, schema=schema)\r\n\r\n# Now use `retrieved_at` in ArcPy.\r\n","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>In testing, the <code>from_pandas<\/code> operation sees a significant performance boost of roughly 40 percent from the Pandas DataFrame being backed with Arrow data types rather than NumPy, but your mileage may vary. Pandas 3.0 is expected to standardize this once it is released.<\/p>\n<p>While moving data between ArcGIS and Pandas can be useful, Pandas has no inherent geospatial data processing and analysis capabilities. For this, you will need to look to the ArcGIS API Spatially Enabled DataFrame in the next section.<\/p>\n<h2>Spatially Enabled DataFrames<\/h2>\n<p>The <a href=\"https:\/\/developers.arcgis.com\/python\/\">ArcGIS API for Python<\/a>\u2019s Spatially Enabled DataFrame (SEDF) is built on top of Pandas. Essentially, it extends the Pandas DataFrame with geospatial capabilities, with interoperability between SEDF and ArcPy. An SEDF can be created from a Featureclass using the ArcGIS API, and ArcPy can directly read the SEDF format as input to geoprocessing tools, so you don\u2019t necessarily need to use Arrow. However, you can use Arrow in this transaction as well. 
By converting the SEDF to an Arrow Table first, and then using the Arrow Table with ArcPy instead of the SEDF, testing resulted in roughly a 14 percent boost in performance (again, your mileage may vary).<\/p>\n<p>The following code shows how you can leverage Arrow to move geospatial data between the ArcGIS API SEDF and ArcGIS Pro to gain a slight performance boost:<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"import arcgis\r\nimport pandas as pd\r\n\r\n# Convert a Featureclass to an ArcGIS API SEDF.\r\nnsowlnests_path = r'C:\\data\\forestry.gdb\\northern_spotted_owl_nests'\r\nnsowlnests_sedf = pd.DataFrame.spatial.from_featureclass(nsowlnests_path)\r\n\r\n# After some geospatial processing performed on the ArcGIS API SEDF...\r\n\r\n# Convert the ArcGIS API SEDF to an Arrow Table.\r\nretrieved_at = nsowlnests_sedf.spatial.to_arrow()\r\n\r\n# Now use `retrieved_at` in ArcPy.\r\n","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>Note that the Arrow Table that results from the <code>spatial.to_arrow<\/code> method adheres to the GeoArrow specification instead of Esri\u2019s schema profile for Apache Arrow Tables.<\/p>\n<h2>Geopandas<\/h2>\n<p>The <a href=\"https:\/\/geopandas.org\/en\/stable\/\">Geopandas<\/a> GeoDataFrame is also built on top of Pandas and, like SEDF, extends the Pandas DataFrame with geospatial capabilities. You can convert a Featureclass to a GeoDataFrame using <code>geopandas.read_file<\/code>. However, converting a GeoDataFrame to a Featureclass is not directly supported. 
You can go one of two routes here: either convert the GeoDataFrame to an SEDF using <code>pd.DataFrame.spatial.from_geodataframe<\/code>, or leverage Arrow.<\/p>\n<p>In this example, we will use Arrow to move geospatial data between a Geopandas GeoDataFrame and ArcGIS Pro:<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"import arcpy\r\nimport geopandas  # Must install into environment before import\r\nimport pyarrow as pa\r\n\r\n# Read a featureclass to a Geopandas GeoDataFrame\r\nws = arcpy.env.workspace\r\nnsowlnests_gdf = geopandas.read_file(ws, layer=\"northern_spotted_owl_nests\")\r\n# _gdf for GeoDataFrame\r\n\r\n# After some geospatial processing performed on the Geopandas GeoDataFrame...\r\n\r\n# The GeoDataFrame geometry format is incompatible with ArcPy,\r\n# so convert it to WKB.\r\nnsowlnests_gdf2 = nsowlnests_gdf.to_wkb()\r\n\r\n# Create (from scratch) the schema for the Arrow Table. \r\n# The schema must adhere to Esri\u2019s schema profile \r\n# for Apache Arrow Tables.\r\n\r\n# You can grab the spatial reference from the original layer,\r\n# (\"northern_spotted_owl_nests\").\r\nsr = arcpy.Describe(\"northern_spotted_owl_nests\").spatialReference.exportToString()\r\n\r\n# To help with determining the Arrow data types, use nsowlnests_gdf2.dtypes \r\n# to view the existing DataFrame data types.\r\n# The table below shows the mapping chosen for this table:\r\n# ColumnName    Pandas dtype  -&gt;  Arrow dtype\r\n# ------------------------------------------\r\n# NEST_ID              int64  -&gt;  int64\r\n# TREE_SPECIES        object  -&gt;  string\r\n# NEST_HEIGHT_M        int64  -&gt;  uint8\r\n# LAST_ACTIVE_YEAR     int64  -&gt;  uint16\r\n# NOTES               object  -&gt;  string\r\n# geometry            object  -&gt;  binary\r\n# You will have to decide the appropriate Arrow data types to map \r\n# to the Pandas data types.\r\nfields = [\r\n    pa.field(\"NEST_ID\", pa.int64()),\r\n    
pa.field(\"TREE_SPECIES\", pa.string()),\r\n    pa.field(\"NEST_HEIGHT_M\", pa.uint8()),\r\n    pa.field(\"LAST_ACTIVE_YEAR\", pa.uint16()),\r\n    pa.field(\"NOTES\", pa.string()),\r\n    pa.field(\r\n        \"geometry\",\r\n        pa.binary(),\r\n        metadata={b'esri.encoding': \"WKB\", b'esri.sr_wkt': sr}\r\n    )\r\n]\r\nschema = pa.schema(fields)\r\n\r\nretrieved_at = pa.Table.from_pandas(nsowlnests_gdf2, schema=schema)\r\n\r\n# Now use `retrieved_at` in ArcPy.","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>Because you must create the ArcPy-compatible schema for the Arrow Table from scratch, this workflow is quite a bit more involved than simply converting the GeoDataFrame to an SEDF. Consider it an example of moving geospatial data by brute force. In this case, converting to an SEDF first is the better alternative, but other third-party analytics components may not offer such integrations, so an approach like this may come in handy.<\/p>\n<h2>Apache Spark<\/h2>\n<p>Apache Spark is a scalable distributed data processing and analytics engine. 
It can also be run locally, but the real benefit of using Spark comes from its ability to parallel-process large data distributed over clusters of computers.<\/p>\n<p>The following example shows how you can leverage Arrow to move geospatial data between a Spark DataFrame and ArcGIS Pro.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"","image_reference":false,"layout":"code_snippet","image_reference_figure":"","snippet":"# Running Spark requires setting up an environment with additional packages.\r\n#\r\n# Prior to starting, \r\n# Run the following commands from the Python Command Prompt:\r\n#   conda create --clone arcgispro-py3 -n arcgispro-py3-spark --pinned\r\n#   proswap arcgispro-py3-spark\r\n#   conda install deep-learning-essentials\r\n#   conda install openjdk\r\n#\r\n# Now you are ready to start a basic (local) Spark session.\r\nfrom pyspark.sql import SparkSession  # Must have PySpark &amp; Java, see above\r\nimport arcpy\r\nimport pandas as pd\r\nimport pyarrow as pa\r\n\r\n# Start a SparkSession with Arrow enabled\r\nspark = (\r\n    SparkSession.builder\r\n    .appName(\"Moving Data With Arrow\")\r\n    .config(\"spark.sql.execution.arrow.enabled\", \"true\")\r\n    .getOrCreate()\r\n)\r\n\r\n# Convert a Featureclass to an Arrow Table with WKB encoded geometry\r\nnsowlnests_at = arcpy.da.TableToArrowTable(\r\n    \"northern_spotted_owl_nests\",\r\n    geometry_encoding=\"WKB\"\r\n)\r\n\r\n# Store the Arrow Table\u2019s schema in `schema` for later, because\r\n# it will not be preserved during the conversion to a Pandas DataFrame.\r\nschema = nsowlnests_at.schema\r\n\r\n# Define a data type mapping (to arrow data types) for Pandas to use.\r\ndtype_mapping = {\r\n    pa.int8(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int8()),\r\n    pa.int16(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int16()),\r\n    pa.int32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int32()),\r\n    pa.int64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.int64()),\r\n    pa.uint8(): 
pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint8()),\r\n    pa.uint16(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint16()),\r\n    pa.uint32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint32()),\r\n    pa.uint64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.uint64()),\r\n    pa.float32(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.float32()),\r\n    pa.float64(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.float64()),\r\n    pa.bool_(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.bool_()),\r\n    pa.binary(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.binary()),\r\n    pa.string(): pd.core.arrays.arrow.dtype.ArrowDtype(pa.string())\r\n}\r\n\r\n# Convert the Arrow Table to a Pandas DataFrame using `dtype_mapping`.\r\nnsowlnests_pdf = nsowlnests_at.to_pandas(types_mapper=dtype_mapping.get)\r\n\r\n# Convert the Pandas DataFrame to a Spark DataFrame\r\nnsowlnests_sdf = spark.createDataFrame(nsowlnests_pdf)  # _sdf for Spark DataFrame\r\n\r\n# After some processing performed on the Spark DataFrame...\r\n\r\n# Convert the Spark DataFrame back to a Pandas DataFrame\r\nnsowlnests_pdf2 = nsowlnests_sdf.select(\"*\").toPandas()\r\n\r\n# Convert the Pandas DataFrame back to an Arrow Table\r\nnsowlnests_at2 = pa.Table.from_pandas(nsowlnests_pdf2, schema=schema)\r\n\r\n# Now use `nsowlnests_at2` in ArcPy.\r\narcpy.management.CopyFeatures(nsowlnests_at2, \"TestPoint_Copy\")","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>This is one of many ways to prepare your data for use in Spark. Engines typically have user-friendly interfaces for common data transport operations. To perform geospatial analysis on your Spark DataFrame, you will need to look to geospatial analytics engines. 
For example, Esri\u2019s <a href=\"https:\/\/developers.arcgis.com\/geoanalytics\/\">GeoAnalytics Engine<\/a> includes over 100 functions and tools that operate on Spark DataFrames to manage, enrich, summarize, or analyze entire geospatial datasets. However, discussing these engines in detail is beyond the scope of this blog.<\/p>\n<h2>Conclusion<\/h2>\n<p>Integrating with the Apache Arrow ecosystem opens the door for you to transport geospatial data from other participating components into ArcGIS Pro, and vice versa. The Apache Arrow story is still developing, and ArcGIS integration will grow alongside it. As the Arrow platform becomes more integrated in other components, it will continue removing barriers to allow you to leverage data from various open-source geospatial data and analytics components with the ArcGIS Pro platform.<\/p>\n"}],"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/01\/arrows-c-1.jpg","wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/01\/arrows-b-1.jpg","related_articles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Leverage Apache Arrow in ArcGIS Pro<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Leverage Apache Arrow in ArcGIS Pro\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" 
content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2023-11-16T17:40:18+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\"},\"author\":{\"name\":\"Hannes Ziegler\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/ea3735f79040105f9ee9a92313e0ff82\"},\"headline\":\"Leverage Apache Arrow in ArcGIS Pro\",\"datePublished\":\"2023-11-15T22:37:06+00:00\",\"dateModified\":\"2023-11-16T17:40:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\"},\"wordCount\":6,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"Apache Arrow\",\"ArcPy\",\"data\",\"python\"],\"articleSection\":[\"Analytics\",\"Data Management\",\"Developers\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\",\"name\":\"Leverage Apache Arrow in ArcGIS 
Pro\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2023-11-15T22:37:06+00:00\",\"dateModified\":\"2023-11-16T17:40:18+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Leverage Apache Arrow in ArcGIS Pro\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/ea3735f79040105f9ee9a92313e0ff82\",\"name\":\"Hannes Ziegler\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/10\/HannesZiegler2-465x465.jpg\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/10\/HannesZiegler2-465x465.jpg\",\"caption\":\"Hannes Ziegler\"},\"description\":\"Hannes is a product engineer on the Python team. 
He has five years of experience streamlining spatial data analysis workflows in the public and private sectors, and has been with Esri since 2019, where he focuses on the design, evaluation, and documentation of new and existing Python functionality.\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/hziegler\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Leverage Apache Arrow in ArcGIS Pro","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro","og_locale":"en_US","og_type":"article","og_title":"Leverage Apache Arrow in ArcGIS Pro","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2023-11-16T17:40:18+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro"},"author":{"name":"Hannes Ziegler","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/ea3735f79040105f9ee9a92313e0ff82"},"headline":"Leverage Apache Arrow in ArcGIS Pro","datePublished":"2023-11-15T22:37:06+00:00","dateModified":"2023-11-16T17:40:18+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro"},"wordCount":6,"commentCount":0,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["Apache 
Arrow","ArcPy","data","python"],"articleSection":["Analytics","Data Management","Developers"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro","name":"Leverage Apache Arrow in ArcGIS Pro","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2023-11-15T22:37:06+00:00","dateModified":"2023-11-16T17:40:18+00:00","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/developers\/leverage-apache-arrow-in-arcgis-pro#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Leverage Apache Arrow in ArcGIS Pro"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/ea3735f79040105f9ee9a92313e0ff82","name":"Hannes Ziegler","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/10\/HannesZiegler2-465x465.jpg","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/10\/HannesZiegler2-465x465.jpg","caption":"Hannes Ziegler"},"description":"Hannes is a product engineer on the Python team. 
He has five years of experience streamlining spatial data analysis workflows in the public and private sectors, and has been with Esri since 2019, where he focuses on the design, evaluation, and documentation of new and existing Python functionality.","url":"https:\/\/www.esri.com\/arcgis-blog\/author\/hziegler"}]}},"text_date":"November 15, 2023","author_name":"Hannes Ziegler","author_page":"https:\/\/www.esri.com\/arcgis-blog\/author\/hziegler","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/01\/arrows-b-1.jpg","primary_product":"ArcGIS Pro","tag_data":[{"term_id":772592,"name":"Apache Arrow","slug":"apache-arrow","term_group":0,"term_taxonomy_id":772592,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":31181,"name":"ArcPy","slug":"arcpy","term_group":0,"term_taxonomy_id":31181,"taxonomy":"post_tag","description":"","parent":0,"count":32,"filter":"raw"},{"term_id":230422,"name":"data","slug":"data","term_group":0,"term_taxonomy_id":230422,"taxonomy":"post_tag","description":"","parent":0,"count":33,"filter":"raw"},{"term_id":24341,"name":"python","slug":"python","term_group":0,"term_taxonomy_id":24341,"taxonomy":"post_tag","description":"","parent":0,"count":171,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1325,"filter":"raw"},{"term_id":23851,"name":"Data Management","slug":"data-management","term_group":0,"term_taxonomy_id":23851,"taxonomy":"category","description":"","parent":0,"count":920,"filter":"raw"},{"term_id":738191,"name":"Developers","slug":"developers","term_group":0,"term_taxonomy_id":738191,"taxonomy":"category","description":"","parent":0,"count":420,"filter":"raw"}],"product_data":[{"term_id":36841,"name":"ArcGIS API for 
Python","slug":"api-python","term_group":0,"term_taxonomy_id":36841,"taxonomy":"product","description":"","parent":36601,"count":151,"filter":"raw"},{"term_id":765842,"name":"ArcGIS GeoAnalytics Engine","slug":"geoanalytics-engine","term_group":0,"term_taxonomy_id":765842,"taxonomy":"product","description":"","parent":36601,"count":23,"filter":"raw"},{"term_id":36561,"name":"ArcGIS Pro","slug":"arcgis-pro","term_group":0,"term_taxonomy_id":36561,"taxonomy":"product","description":"","parent":0,"count":2035,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=arcgis-pro","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2162682","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/122991"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2162682"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2162682\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2162682"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2162682"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2162682"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2162682"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2162682"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}