{"id":2789462,"date":"2025-05-20T08:03:29","date_gmt":"2025-05-20T15:03:29","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2789462"},"modified":"2025-05-16T13:53:45","modified_gmt":"2025-05-16T20:53:45","slug":"solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","title":{"rendered":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue"},"author":323502,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341],"tags":[],"industry":[],"product":[765842],"class_list":["post-2789462","blog","type-blog","status-publish","format-standard","hentry","category-analytics","product-geoanalytics-engine"],"acf":{"authors":[{"ID":323502,"user_firstname":"Arif","user_lastname":"Masrur","nickname":"Arif Masrur","user_nicename":"amasrur","display_name":"Arif Masrur","user_email":"amasrur@esri.com","user_url":"","user_registered":"2022-12-01 19:41:13","user_description":"Arif Masrur is a Sr. Solutions Engineer at Esri with expertise in spatial data science and GeoAI. 
He has a PhD in Geography (GIScience and Data Analytics) from Penn State, and loves transforming space-time data into actionable intelligence.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/10\/g2513150-dev-2022-portraits-1042-scaled-e1759350705582-213x200.jpg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"},{"ID":342532,"user_firstname":"Sarah","user_lastname":"Battersby","nickname":"Sarah Battersby","user_nicename":"sbattersby","display_name":"Sarah Battersby","user_email":"sbattersby@esri.com","user_url":"","user_registered":"2023-07-17 22:33:34","user_description":"Sarah is a Product Manager for ArcGIS GeoAnalytics Engine. She has a PhD in Geography \/ Cognitive Science from UC Santa Barbara, and enjoys finding ways to make spatial technologies easier to use, understand, and trust.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/07\/Sarah_Battersby-213x200.png' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"Enable efficient data integration and scalability with ArcGIS GeoAnalytics Engine and AWS Glue.","flexible_content":[{"acf_fc_layout":"content","content":"<p>When your organization\u2019s data grows and diversifies, geospatial analysis faces two key challenges: efficient data integration and scalability. <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-geoanalytics-engine\/overview\">ArcGIS GeoAnalytics Engine<\/a> \u2013 a cloud-native library with a comprehensive set of \u00a0spatial functions and tools \u2013 addresses these challenges by moving geoanalytics workflows directly to where your data resides &#8211; in data lakes, data warehouses, or ArcGIS. 
GeoAnalytics Engine is fully integrated with Apache Spark, so it can process and analyze spatial datasets at scale with advanced geospatial operations including space-time pattern mining, track data analysis, geocoding, reverse geocoding, network analysis, and spatial modeling.<\/p>\n<h2>Why Integrate ArcGIS GeoAnalytics Engine with AWS Glue?<\/h2>\n<p>Accessing and preparing large, diverse datasets from various sources can be painstaking and time-consuming \u2013 that is where AWS Glue comes in. <a href=\"https:\/\/aws.amazon.com\/glue\/\">AWS Glue<\/a> is a serverless data integration service designed to simplify the discovery, preparation, transfer, and integration of your datasets from various sources for analytics, machine learning, and application development. It automatically scales compute resources, supports Spark and PySpark jobs, and runs your extract, transform, and load (ETL) tasks, making it more efficient to clean, organize, and securely move datasets between sources like <a href=\"https:\/\/aws.amazon.com\/s3\/\">Amazon S3<\/a> and <a href=\"https:\/\/aws.amazon.com\/redshift\/\">Redshift<\/a>.<\/p>\n<p>GeoAnalytics Engine in AWS Glue offers your organization streamlined geospatial data integration with both data lakes and ArcGIS. With it, you are better equipped to orchestrate ETL pipelines that run highly scalable geospatial analysis both on demand and on a schedule.<\/p>\n<p>Note that while ArcGIS GeoAnalytics Engine is certified and tested with AWS EMR, it is not specifically tested with AWS Glue. 
At this time, network analysis and geocoding tools are not supported in Glue.<\/p>\n<h2>Getting Started in AWS Glue<\/h2>\n<p>Let\u2019s walk through how to get up and running with GeoAnalytics Engine in <a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/author-job-glue.html\">AWS Glue Studio<\/a> to run geoanalytics workflows <a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-intro-tutorial.html\">interactively or on a schedule<\/a>. First, follow the <a href=\"https:\/\/developers.arcgis.com\/geoanalytics\/install\/\">installation guide<\/a> to download the ArcGIS GeoAnalytics Engine distribution. Next, sign in to your <a href=\"https:\/\/aws.amazon.com\/\">AWS Management Console<\/a> and upload the distribution files to your S3 bucket. After that, navigate to AWS Glue Studio and start or upload a notebook with an appropriate <a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/create-notebook-job.html\">IAM role<\/a> so that it has permissions for data access and ETL jobs.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2789472,"id":2789472,"title":"Picture1","filename":"Picture1.png","filesize":26706,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\/picture1-120","alt":"","author":"323502","description":"","caption":"","name":"picture1-120","status":"inherit","uploaded_to":2789462,"date":"2025-05-10 21:17:04","modified":"2025-05-10 
21:17:04","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":637,"height":157,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1-213x157.png","thumbnail-width":213,"thumbnail-height":157,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","medium-width":464,"medium-height":114,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","medium_large-width":637,"medium_large-height":157,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","large-width":637,"large-height":157,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","1536x1536-width":637,"1536x1536-height":157,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","2048x2048-width":637,"2048x2048-height":157,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","card_image-width":637,"card_image-height":157,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture1.png","wide_image-width":637,"wide_image-height":157}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>IAM Role Configuration for Secure Access:<\/strong><\/p>\n<ul>\n<li>Create a specific IAM role for Glue jobs<\/li>\n<li>Apply least privilege principle<\/li>\n<li>Required permissions:\n<ul>\n<li>secretsmanager:GetSecretValue<\/li>\n<li>s3:GetObject for GeoAnalytics Engine files<\/li>\n<li>s3:PutObject for output 
locations<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2789482,"id":2789482,"title":"Picture2","filename":"Picture2.png","filesize":56376,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\/picture2-78","alt":"","author":"323502","description":"","caption":"","name":"picture2-78","status":"inherit","uploaded_to":2789462,"date":"2025-05-10 21:17:06","modified":"2025-05-10 21:17:06","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":637,"height":325,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","medium-width":464,"medium-height":237,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","medium_large-width":637,"medium_large-height":325,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","large-width":637,"large-height":325,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","1536x1536-width":637,"1536x1536-height":325,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","2048x2048-width":637,"2048x2048-height":325,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","card_image-width":637,"card_image-height":325,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture2.png","wide_image-width":637,"wide_image-height":325}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Configure 
Spark Environment<\/h2>\n<p>Let\u2019s start with an empty notebook. In the first notebook cell, configure the backend compute resources, including the Glue version, worker type, and number of workers, using the following commands:<\/p>\n<blockquote>\n<pre>%glue_version x.x\r\n%number_of_workers 5\r\n%worker_type G.1X<\/pre>\n<\/blockquote>\n<p>For more information about scaling your AWS Glue for Apache Spark jobs, check out this <a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/scale-your-aws-glue-for-apache-spark-jobs-with-new-larger-worker-types-g-4x-and-g-8x\/\">blog post<\/a>.<\/p>\n<h2>Import and Configure GeoAnalytics Engine Distribution<\/h2>\n<p>Since we have already uploaded the GeoAnalytics Engine installation files to S3, we next import the necessary JAR and Python wheel (.whl) files into the Spark environment using the following commands:<\/p>\n<blockquote>\n<pre>%extra_jars s3:\/\/ga-engine\/Install_GeoAnalytics\/geoanalytics_2.12-1.6.0.jar\r\n%extra_py_files s3:\/\/ga-engine\/Install_GeoAnalytics\/geoanalytics-1.6.0-py3-none-any.whl<\/pre>\n<\/blockquote>\n<p>Now set several Spark properties, as shown below, before a Spark context is created.<\/p>\n<blockquote>\n<pre>%%configure\r\n{\"--conf\": \"spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryo.registrator=com.esri.geoanalytics.KryoRegistrator --conf spark.plugins=com.esri.geoanalytics.Plugin\"}<\/pre>\n<\/blockquote>\n<p>Once these properties are set, you will be able to import geoanalytics in your Spark session and authorize the module using a username and password, a license file, or an API key. 
Next, initialize a Glue session:<\/p>\n<blockquote>\n<pre>import sys\r\nfrom awsglue.transforms import *\r\nfrom awsglue.utils import getResolvedOptions\r\nfrom awsglue.context import GlueContext\r\nfrom awsglue.job import Job\r\n\r\nfrom pyspark.context import SparkContext\r\nsc = SparkContext.getOrCreate()\r\nglueContext = GlueContext(sc)\r\nspark = glueContext.spark_session\r\njob = Job(glueContext)<\/pre>\n<\/blockquote>\n<h3>Import <em>geoanalytics<\/em> library and authenticate<\/h3>\n<p>We will retrieve a JSON credentials file stored in our S3 bucket and then use those credentials to authenticate the geoanalytics module.<\/p>\n<blockquote>\n<pre>import geoanalytics\r\nimport json\r\nimport boto3<\/pre>\n<pre><em># Initialize S3 client<\/em>\r\ns3_client = boto3.client('s3')\r\nbucket_name = 'your-s3-bucket-name'\r\nfile_key = 'gae_credentials.json'\r\n\r\nresponse = s3_client.get_object(Bucket=bucket_name, Key=file_key)\r\ncreds_json = json.loads(response['Body'].read().decode('utf-8'))\r\n\r\n<em># Authenticate<\/em>\r\ngeoanalytics.auth(username = creds_json['geoanalytics']['username'], \\\r\npassword = creds_json['geoanalytics']['password'])<\/pre>\n<\/blockquote>\n<p>Alternatively, if you want to authenticate the module using a license file stored in your S3 bucket:<\/p>\n<blockquote>\n<pre>geoanalytics.auth(license_file=\"s3:\/\/your-secure-bucket\/path\/to\/license.ecp\")<\/pre>\n<\/blockquote>\n<p>You can also use an API key for an active GeoAnalytics Engine subscription \u2013 see <a href=\"https:\/\/developers.arcgis.com\/geoanalytics\/install\/authorization\/#:~:text=Provide%20an%20API%20key%20for%20an%20active%20GeoAnalytics%20Engine%20subscription.\">here<\/a>.<\/p>\n<h2>Use Case: Analyzing Real-Time Flight Data<\/h2>\n<p>Let\u2019s take a look at how you can use AWS Glue and ArcGIS GeoAnalytics Engine together to tackle your analytics problems.\u00a0 We\u2019ll do this with an example exploring real-time streaming of flight 
data.<\/p>\n<p>Real-time streaming data needs to be processed instantly or on a schedule to support optimal decision-making, efficiency, cost savings, and safety and security. <a href=\"https:\/\/www.flightaware.com\/\">FlightAware<\/a> is a leading provider of real-time flight tracking data. Thanks to the integration of <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-velocity\/overview\">ArcGIS Velocity<\/a> and FlightAware, organizations can now bring live, global aircraft positions (per second) directly into <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-online\/overview\">ArcGIS Online<\/a> as a feature layer, as well as into big data stores such as S3 in formats such as Parquet.<\/p>\n<p>Let\u2019s read FlightAware\u2019s flight position data (over 200 million records) using data exported periodically from ArcGIS Velocity to our S3 bucket. We read this into a Spark DataFrame in a Glue notebook. Using GeoAnalytics Engine&#8217;s <a href=\"https:\/\/developers.arcgis.com\/geoanalytics\/sql-functions\/\">ST-like SQL functions<\/a>, we create point geometries and filter the data to include only flights inbound to Ronald Reagan Washington National Airport.<\/p>\n<blockquote>\n<pre><em># Import the GeoAnalytics Engine SQL functions used below as ST<\/em>\r\nfrom geoanalytics.sql import functions as ST\r\n\r\npath = \"s3:\/\/ga-engine\/Datasets\/FlightAware\/feed\/*.parquet\"\r\ndf = spark.read.format(\"parquet\").load(path)\r\ndf_dca = (df\r\n     .withColumn(\"point\", ST.transform(ST.point(\"lon\", \"lat\", 4326), 8857))\r\n     .filter(df.dest == \"KDCA\")\r\n     .filter(\"point IS NOT NULL\")\r\n     .select(\"id\", \"aircrafttype\", \"orig\", \"dest\", \"ident\", \"heading\", \"point\", \"clock\", \"alt\"))<\/pre>\n<\/blockquote>\n<p>Next, using the <a href=\"https:\/\/developers.arcgis.com\/geoanalytics\/tools\/reconstruct-tracks\/\">ReconstructTracks<\/a> tool, connect all time-sequential points for each flight into tracks and summarize records within each track.<\/p>\n<blockquote>\n<pre>from geoanalytics.tools import ReconstructTracks\r\n<em># Keep the result DataFrame in tracks so it can be exported later<\/em>\r\ntracks = 
ReconstructTracks() \\\r\n     .setTrackFields(\"id\") \\\r\n     .setDistanceMethod(distance_method=\"Planar\") \\\r\n     .addSummaryField(summary_field=\"alt\", statistic=\"Min\") \\\r\n     .addSummaryField(summary_field=\"alt\", statistic=\"Max\") \\\r\n     .addSummaryField(summary_field=\"alt\", statistic=\"Mean\", alias=\"avg_alt\") \\\r\n     .run(dataframe=df_dca)\r\ntracks.show(5)<\/pre>\n<\/blockquote>\n"},{"acf_fc_layout":"image","image":{"ID":2789502,"id":2789502,"title":"Picture4","filename":"Picture4.png","filesize":65266,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\/picture4-68","alt":"","author":"323502","description":"","caption":"","name":"picture4-68","status":"inherit","uploaded_to":2789462,"date":"2025-05-10 21:27:13","modified":"2025-05-10 21:27:13","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":624,"height":108,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4-213x108.png","thumbnail-width":213,"thumbnail-height":108,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","medium-width":464,"medium-height":80,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","medium_large-width":624,"medium_large-height":108,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","large-width":624,"large-height":108,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","1536x1536-width":624,"1536x1536-height":108,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","2048x2048-width":624,"2048x2048-height":108,"card_image":"https:\
/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","card_image-width":624,"card_image-height":108,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/Picture4.png","wide_image-width":624,"wide_image-height":108}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h3>Export flight tracks to ArcGIS Online and visualize<\/h3>\n<p>You can export the <em>tracks<\/em> DataFrame to ArcGIS Online as a hosted feature layer and create a flight track app to visually explore the duration of each flight (Figure 1). These near-real-time flight tracks can be further analyzed alongside weather data for hazard detection and alerting. In the event of an aircraft incident or near-miss, investigators can reconstruct flight paths to gain insights into the circumstances surrounding the event.<\/p>\n<blockquote>\n<pre>tracks.write.format(\"feature-service\") \\\r\n   .option(\"gis\", \"myGIS\") \\\r\n   .option(\"serviceName\", \"flightAware_DCA_Tracks\") \\\r\n   .option(\"layerName\", \"flightAware_DCA_Tracks\") \\\r\n   .save()<\/pre>\n<\/blockquote>\n"},{"acf_fc_layout":"image","image":{"ID":2789552,"id":2789552,"title":"flight tracks animation","filename":"flight-tracks-animation.gif","filesize":944339,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\/flight-tracks-animation","alt":"","author":"323502","description":"","caption":"","name":"flight-tracks-animation","status":"inherit","uploaded_to":2789462,"date":"2025-05-11 01:49:03","modified":"2025-05-11 
01:49:03","menu_order":0,"mime_type":"image\/gif","type":"image","subtype":"gif","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1392,"height":772,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation-213x200.gif","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","medium-width":464,"medium-height":257,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","medium_large-width":768,"medium_large-height":426,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","large-width":1392,"large-height":772,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","1536x1536-width":1392,"1536x1536-height":772,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","2048x2048-width":1392,"2048x2048-height":772,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation-826x458.gif","card_image-width":826,"card_image-height":458,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/05\/flight-tracks-animation.gif","wide_image-width":1392,"wide_image-height":772}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Figure 1. Daily tracks and duration of flights (Feb 6-9, 2025) to Ronald Reagan Washington National Airport (Code: DCA). 
Data source: <a href=\"https:\/\/www.flightaware.com\/\">FlightAware<\/a><\/p>\n<p>By analyzing real-time and historical flight tracking data, geo-enriched with contextual information such as weather events, companies can gain predictive insights and identify problem areas such as delays, flight incidents, route inefficiencies, or recurring issues with specific airports or airlines. These insights enable proactive decision-making, optimized operations, improved customer experiences, streamlined supply chain management, and more.<\/p>\n<h2>Automating the Workflow with AWS Glue Triggers<\/h2>\n<p>As your streaming and historical datasets continuously flow in, you can run this notebook workflow on demand or on a schedule using <a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/about-triggers.html\">AWS Glue triggers<\/a>. Triggers provide reliable time-based job scheduling (e.g., every hour, day, week, or month), which makes it easier to automate data workflows and reduces the need for manual oversight or external scheduling systems.<\/p>\n<h2>Conclusion: Unlocking Scalable Geospatial Intelligence<\/h2>\n<p>This blog post offers a step-by-step guide to setting up and running ArcGIS GeoAnalytics Engine workflows in AWS Glue, and illustrates the advantages of using both for big data ETL and spatial analytics jobs through an aviation industry use case based on the FlightAware Firehose\u2120 Flight Data Feed.<\/p>\n<p>By combining AWS Glue&#8217;s fully managed, serverless environment with the Apache Spark-based functionality of GeoAnalytics Engine, you can process and analyze large spatial datasets without managing infrastructure. 
Whether you\u2019re working with real-time streaming data, very large historical datasets spanning years to decades, or complex geospatial operations, this integrated solution streamlines your ETL processes and improves speed while providing robust tools for advanced spatial analysis.<\/p>\n<p>We hope this guide inspires you to explore how GeoAnalytics Engine and AWS Glue can address your geospatial data challenges. For more details, check out the official documentation on <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-geoanalytics-engine\/overview\">GeoAnalytics Engine<\/a> and <a href=\"https:\/\/aws.amazon.com\/glue\/\">AWS Glue<\/a>. We\u2019d love to hear how you&#8217;re using these tools in your own projects and workflows. Feel free to share your experiences with us!<\/p>\n<p><strong>WARNING:<\/strong> The example code in this blog post demonstrates concepts only. Production deployments should undergo comprehensive security reviews and follow your organization\u2019s security policies and compliance requirements.<\/p>\n"}],"related_articles":"","show_article_image":false,"card_image":false,"wide_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\"},\"author\":{\"name\":\"Arif Masrur\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2c8aee17d2bb73b7dcd3e3817a88b72a\"},\"headline\":\"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS 
Glue\",\"datePublished\":\"2025-05-20T15:03:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\"},\"wordCount\":12,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"articleSection\":[\"Analytics\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\",\"name\":\"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS 
Glue\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2025-05-20T15:03:29+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2c8aee17d2bb73b7dcd3e3817a88b72a\",\"name\":\"Arif Masrur\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/10\/g2513150-dev-2022-portraits-1042-scaled-e1759350705582-213x200.jpg\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/10\/g2513150-dev-2022-portraits-1042-scaled-e1759350705582-213x200.jpg\",\"caption\":\"Arif Masrur\"},\"description\":\"Arif Masrur is a Sr. Solutions Engineer at Esri with expertise in spatial data science and GeoAI. 
He has a PhD in Geography (GIScience and Data Analytics) from Penn State, and loves transforming space-time data into actionable intelligence.\",\"sameAs\":[\"https:\/\/x.com\/Arif_Masrur\"],\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/amasrur\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","og_locale":"en_US","og_type":"article","og_title":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","twitter_card":"summary_large_image","twitter_site":"@ESRI","twitter_misc":{"Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue"},"author":{"name":"Arif Masrur","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2c8aee17d2bb73b7dcd3e3817a88b72a"},"headline":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue","datePublished":"2025-05-20T15:03:29+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue"},"wordCount":12,"commentCount":0,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"articleSection":["Analytics"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","name":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS 
Glue","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2025-05-20T15:03:29+00:00","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Solving Big Data Geoanalytics Challenges with ArcGIS GeoAnalytics Engine in AWS Glue"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2c8aee17d2bb73b7dcd3e3817a88b72a","name":"Arif Masrur","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/10\/g2513150-dev-2022-portraits-1042-scaled-e1759350705582-213x200.jpg","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/10\/g2513150-dev-2022-portraits-1042-scaled-e1759350705582-213x200.jpg","caption":"Arif Masrur"},"description":"Arif Masrur is a Sr. Solutions Engineer at Esri with expertise in spatial data science and GeoAI. 
He has a PhD in Geography (GIScience and Data Analytics) from Penn State, and loves transforming space-time data into actionable intelligence.","sameAs":["https:\/\/x.com\/Arif_Masrur"],"url":"https:\/\/www.esri.com\/arcgis-blog\/author\/amasrur"}]}},"text_date":"May 20, 2025","author_name":"Multiple Authors","author_page":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-engine\/analytics\/solving-big-data-geoanalytics-challenges-with-arcgis-geoanalytics-engine-in-aws-glue","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Newsroom-Keyart-Wide-1920-x-1080.jpg","primary_product":"ArcGIS GeoAnalytics Engine","tag_data":[],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1331,"filter":"raw"}],"product_data":[{"term_id":765842,"name":"ArcGIS GeoAnalytics Engine","slug":"geoanalytics-engine","term_group":0,"term_taxonomy_id":765842,"taxonomy":"product","description":"","parent":36601,"count":23,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=geoanalytics-engine","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2789462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/323502"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2789462"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2789462\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2789462"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www
.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2789462"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2789462"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2789462"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2789462"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}