{"id":576482,"date":"2019-08-06T10:33:29","date_gmt":"2019-08-06T17:33:29","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=576482"},"modified":"2019-08-06T15:59:12","modified_gmt":"2019-08-06T22:59:12","slug":"extend-your-big-data-analysis-with-spark","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark","title":{"rendered":"Extend Your Big Data Analysis with GeoAnalytics Server and Spark"},"author":6831,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341,23851],"tags":[25351,35561,35661,24341,25631],"industry":[],"product":[36571,36941],"class_list":["post-576482","blog","type-blog","status-publish","format-standard","hentry","category-analytics","category-data-management","tag-big-data","tag-geoanalytics","tag-machine-learning","tag-python","tag-spatial-analysis","product-arcgis-enterprise","product-geoanalytics-server"],"acf":{"short_description":"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.","flexible_content":[{"acf_fc_layout":"content","content":"<p><a href=\"https:\/\/enterprise.arcgis.com\/en\/server\/latest\/get-started\/windows\/what-is-arcgis-geoanalytics-server-.htm\">ArcGIS GeoAnalytics Server<\/a> comes with 25 tools at 10.7, but what if you want to run distributed analysis and the tool you need isn&#8217;t available? 
With this release, we on the GeoAnalytics team are excited to announce a new way of managing and analyzing your large datasets using a tool called <a href=\"https:\/\/developers.arcgis.com\/rest\/services-reference\/run-python-script.htm\">Run Python Script<\/a>.<\/p>\n<p>As you might guess, Run Python Script executes code in a Python environment on your GeoAnalytics Server. So, what\u2019s the excitement all about? This Python environment gives you access to <a href=\"https:\/\/spark.apache.org\/\">Apache Spark<\/a>, the engine that distributes data and analysis across the cores of each machine in a GeoAnalytics Server site. With Spark, you can customize and extend your analysis by:<\/p>\n<ul>\n<li>Querying and summarizing your data using SQL<\/li>\n<li>Turning analysis workflows into pipelines of GeoAnalytics tools<\/li>\n<li>Classifying, clustering, or modeling non-spatial data with included machine learning libraries<\/li>\n<\/ul>\n<p>All of this uses the power of distributed computing. As with other GeoAnalytics tools, this means that you can find answers in your large datasets much faster than with non-distributed tools.<\/p>\n<p>The <a href=\"https:\/\/spark.apache.org\/docs\/latest\/api\/python\/index.html#\">pyspark API<\/a> provides an interface for working with Spark, and in this blog post we\u2019d like to show you how easy it is to get started with pyspark and begin taking advantage of all it has to offer.<\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Explore and manage ArcGIS Enterprise layers as DataFrames<\/h3>\n<p>When using the pyspark API, data is often represented as <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/sql-programming-guide.html#datasets-and-dataframes\">Spark DataFrames<\/a>. 
If you\u2019re familiar with pandas or R DataFrames, the Spark version is conceptually similar, but optimized for distributed data processing.<\/p>\n<p>When you perform an operation on a DataFrame (such as running an SQL query), the source data is distributed across the cores of your server site, meaning that you can work with large datasets much faster than with a non-distributed approach.<\/p>\n<p>Run Python Script includes built-in support for loading ArcGIS Enterprise layers into Spark DataFrames, which means you can create a DataFrame from a feature service or <a href=\"https:\/\/enterprise.arcgis.com\/en\/server\/latest\/get-started\/windows\/what-is-a-big-data-file-share.htm\">big data file share<\/a> with one line of code.<\/p>\n"},{"acf_fc_layout":"blockquote","content":"<p>df = spark.read.format(\"webgis\").load()<\/p>\n"},{"acf_fc_layout":"content","content":"<p>DataFrame operations can then be called to query the layer, update the schema, summarize columns, and more. Geometry and time information is preserved in fields called <em>$geometry<\/em> and <em>$time<\/em>, so you can use them like any other column.<\/p>\n<p>When you\u2019re ready to write a layer back to ArcGIS Enterprise, all it takes is:<\/p>\n"},{"acf_fc_layout":"blockquote","content":"<p>df.write.format(\"webgis\").save()<\/p>\n"},{"acf_fc_layout":"content","content":"<p>and the result layer will be available as a feature service or a big data file share in your portal. 
The pyspark API also supports writing to many types of locations external to ArcGIS Enterprise, so you can connect GeoAnalytics to other big data solutions.<\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Create analysis pipelines with GeoAnalytics tools<\/h3>\n<p>The Run Python Script Python environment comes with a <a href=\"https:\/\/developers.arcgis.com\/rest\/services-reference\/using-geoanalytics-tools-in-pyspark.htm\">geoanalytics module<\/a>, which exposes most GeoAnalytics tools as pyspark methods. These methods accept DataFrames as input layers and return results as DataFrames as well, but nothing is written to a data store until you explicitly write the DataFrame out using <em>write<\/em>.<\/p>\n<p>This means that you can chain multiple GeoAnalytics tools together into a <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/ml-pipeline.html\">pipeline<\/a>, which both reduces overall processing time and avoids creating unneeded intermediate layers in your data store. When working with large datasets, these intermediate results could amount to hundreds of gigabytes of data, but not with pyspark!<\/p>\n<p>Check out <a href=\"https:\/\/developers.arcgis.com\/rest\/services-reference\/run-python-script-examples.htm#ESRI_SECTION1_C30D73392D964D51A8B606128A8A6E8F\">this example<\/a> script that chains together several GeoAnalytics tools into a single analysis pipeline.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":581292,"id":581292,"title":"pipeline_diagram","filename":"pipeline_diagram-1.png","filesize":43389,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\/pipeline_diagram-2","alt":"Diagram showing that an intermediate layer is created when using standalone tools, but not when using an analysis 
pipeline.","author":"6831","description":"","caption":"","name":"pipeline_diagram-2","status":"inherit","uploaded_to":576482,"date":"2019-08-06 22:18:43","modified":"2019-08-06 22:18:59","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1806,"height":1028,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","medium-width":459,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","medium_large-width":768,"medium_large-height":437,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","large-width":1806,"large-height":1028,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","1536x1536-width":1536,"1536x1536-height":874,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","2048x2048-width":1806,"2048x2048-height":1028,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1-817x465.png","card_image-width":817,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/pipeline_diagram-1.png","wide_image-width":1806,"wide_image-height":1028}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h3>Leverage distributed machine learning tools with pyspark.mllib<\/h3>\n<p>While the geoanalytics module offers powerful spatial analysis tools, the <a href=\"https:\/\/spark.apache.org\/docs\/2.0.0\/api\/python\/pyspark.mllib.html\">pyspark.mllib<\/a> package includes dozens of non-spatial distributed tools for classification, prediction, 
clustering, and more.<\/p>\n<p>Now that the pyspark.mllib package is exposed, you can create a <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/ml-classification-regression.html#naive-bayes\">Na\u00efve Bayes classifier<\/a>, perform multivariate clustering with <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/ml-clustering.html#k-means\">k-means<\/a>, or build an <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/ml-classification-regression.html#isotonic-regression\">isotonic regression model<\/a>, all using the resources on your GeoAnalytics Server site.<\/p>\n<p>These tools take DataFrames as input and return DataFrames as output, which means you can chain them together with both GeoAnalytics tools and each other to create pipelines. While pyspark.mllib doesn\u2019t have native support for spatial data, you can use GeoAnalytics to calculate tabular representations of spatial data and use them with pyspark.mllib.<\/p>\n<p>For example, you could create a <a href=\"https:\/\/enterprise.arcgis.com\/en\/portal\/latest\/use\/geoanalytics-build-multi-variable-grid.htm\">multi-variable grid<\/a> with GeoAnalytics and use variables (like distance to nearest feature or attribute of nearest feature) as training data in a <a href=\"https:\/\/spark.apache.org\/docs\/2.3.0\/ml-classification-regression.html#linear-support-vector-machine\">support vector machine<\/a>, a method not available as a GeoAnalytics tool but exposed in the mllib package.<\/p>\n<p>Check out <a href=\"https:\/\/developers.arcgis.com\/rest\/services-reference\/run-python-script-examples.htm#ESRI_SECTION1_A3599B1F473D4951A13FFA8BBFBA1502\">this example<\/a> of how one might integrate GeoAnalytics and pyspark.mllib.<\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Summary<\/h3>\n<p>We&#8217;re excited to see what you do with this new way of interrogating and analyzing your large datasets with GeoAnalytics Server. 
In addition to the samples linked-to above, be sure to check out <a href=\"https:\/\/github.com\/noahslocum\/RunPythonScript-Samples\">this GitHub page<\/a> I made with more samples and a <a href=\"https:\/\/github.com\/noahslocum\/RunPythonScript-Samples\/blob\/master\/SubmitRPSJob.py\">utility for executing the Run Python Script tool<\/a> via command line or Python.<\/p>\n"}],"authors":[{"ID":6831,"user_firstname":"Noah","user_lastname":"Slocum","nickname":"noahmead","user_nicename":"noahmead","display_name":"Noah Slocum","user_email":"NSlocum@esri.com","user_url":"","user_registered":"2018-03-02 00:18:50","user_description":"I am a product engineer on the GeoAnalytics team at Esri in Redlands, CA","user_avatar":"<img alt='' src='https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=96&#038;d=blank&#038;r=g' srcset='https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=192&#038;d=blank&#038;r=g 2x' class='avatar avatar-96 photo' height='96' width='96' loading='lazy' decoding='async'\/>"}],"related_articles":[{"ID":462182,"post_author":"8022","post_date":"2019-03-22 09:25:18","post_date_gmt":"2019-03-22 16:25:18","post_content":"","post_title":"What's New in GeoAnalytics Server at ArcGIS Enterprise 10.7","post_excerpt":"","post_status":"publish","comment_status":"open","ping_status":"closed","post_password":"","post_name":"whats-new-in-arcgis-geoanalytics-server-at-arcgis-enterprise-10-7","to_ping":"","pinged":"","post_modified":"2020-02-21 12:20:47","post_modified_gmt":"2020-02-21 20:20:47","post_content_filtered":"","post_parent":0,"guid":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=462182","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":410232,"post_author":"8022","post_date":"2019-01-16 17:17:09","post_date_gmt":"2019-01-17 01:17:09","post_content":"","post_title":"Following the flow of data in 
GeoAnalytics Server","post_excerpt":"","post_status":"publish","comment_status":"open","ping_status":"closed","post_password":"","post_name":"following-the-flow-of-data-in-geoanalytics-server","to_ping":"","pinged":"","post_modified":"2020-02-21 12:22:08","post_modified_gmt":"2020-02-21 20:22:08","post_content_filtered":"","post_parent":0,"guid":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=410232","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":454742,"post_author":"6831","post_date":"2019-03-25 11:23:22","post_date_gmt":"2019-03-25 18:23:22","post_content":"","post_title":"Using GeoAnalytics Server to detect delays in public transit","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"geoanalytics-detect-delays-public-transit","to_ping":"","pinged":"","post_modified":"2019-03-25 12:04:17","post_modified_gmt":"2019-03-25 19:04:17","post_content_filtered":"","post_parent":0,"guid":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=454742","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":386402,"post_author":"8022","post_date":"2018-12-19 11:46:13","post_date_gmt":"2018-12-19 19:46:13","post_content":"","post_title":"GeoAnalytics Server Analysis Demo - Ozone Detection","post_excerpt":"","post_status":"publish","comment_status":"open","ping_status":"closed","post_password":"","post_name":"geoanalytics-server-analysis-demo-ozone-detection","to_ping":"","pinged":"","post_modified":"2020-02-21 12:22:14","post_modified_gmt":"2020-02-21 20:22:14","post_content_filtered":"","post_parent":0,"guid":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=386402","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":380092,"post_author":"8022","post_date":"2019-03-01 15:55:43","post_date_gmt":"2019-03-01 
23:55:43","post_content":"","post_title":"Detecting Water Utility Leaks with GeoAnalytics Server","post_excerpt":"","post_status":"publish","comment_status":"open","ping_status":"closed","post_password":"","post_name":"detecting-water-utility-leaks-with-geoanalytics-server","to_ping":"","pinged":"","post_modified":"2020-02-21 12:21:19","post_modified_gmt":"2020-02-21 20:21:19","post_content_filtered":"","post_parent":0,"guid":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=380092","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"},{"ID":564332,"post_author":"8132","post_date":"2019-07-16 08:21:25","post_date_gmt":"2019-07-16 15:21:25","post_content":"","post_title":"Spark-Powered Analysis with GeoAnalytics Desktop Tools vs. GeoAnalytics Server","post_excerpt":"","post_status":"publish","comment_status":"closed","ping_status":"closed","post_password":"","post_name":"spark-powered-analysis-with-geoanalytics-desktop-tools-vs-geoanalytics-server","to_ping":"","pinged":"","post_modified":"2019-07-16 09:41:36","post_modified_gmt":"2019-07-16 16:41:36","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=564332","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"}],"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/mika-baumeister-Wpnoqo2plFA-unsplash.jpg","wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/nasa-Q1p7bh3SHj8-unsplash.jpg"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Extend Your Big Data Analysis with GeoAnalytics Server and Spark<\/title>\n<meta name=\"description\" content=\"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.\" \/>\n<meta name=\"robots\" content=\"index, follow, 
max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Extend Your Big Data Analysis with GeoAnalytics Server and Spark\" \/>\n<meta property=\"og:description\" content=\"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2019-08-06T22:59:12+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\"},\"author\":{\"name\":\"Noah Slocum\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/500a5db71529a56874de29a69e65edb8\"},\"headline\":\"Extend Your Big Data Analysis with GeoAnalytics Server and 
Spark\",\"datePublished\":\"2019-08-06T17:33:29+00:00\",\"dateModified\":\"2019-08-06T22:59:12+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\"},\"wordCount\":10,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"Big Data\",\"GeoAnalytics\",\"machine learning\",\"python\",\"spatial analysis\"],\"articleSection\":[\"Analytics\",\"Data Management\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\",\"name\":\"Extend Your Big Data Analysis with GeoAnalytics Server and Spark\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2019-08-06T17:33:29+00:00\",\"dateModified\":\"2019-08-06T22:59:12+00:00\",\"description\":\"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Extend Your Big Data Analysis with GeoAnalytics Server and 
Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/500a5db71529a56874de29a69e65edb8\",\"name\":\"Noah Slocum\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=96&d=blank&r=g\",\"caption\":\"Noah Slocum\"},\"description\":\"I am a product engineer on the 
GeoAnalytics team at Esri in Redlands, CA\",\"url\":\"\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Extend Your Big Data Analysis with GeoAnalytics Server and Spark","description":"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark","og_locale":"en_US","og_type":"article","og_title":"Extend Your Big Data Analysis with GeoAnalytics Server and Spark","og_description":"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2019-08-06T22:59:12+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark"},"author":{"name":"Noah Slocum","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/500a5db71529a56874de29a69e65edb8"},"headline":"Extend Your Big Data Analysis with GeoAnalytics Server and 
Spark","datePublished":"2019-08-06T17:33:29+00:00","dateModified":"2019-08-06T22:59:12+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark"},"wordCount":10,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["Big Data","GeoAnalytics","machine learning","python","spatial analysis"],"articleSection":["Analytics","Data Management"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark","name":"Extend Your Big Data Analysis with GeoAnalytics Server and Spark","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2019-08-06T17:33:29+00:00","dateModified":"2019-08-06T22:59:12+00:00","description":"Perform custom distributed spatial analysis with the pyspark API and a new GeoAnalytics Server tool called Run Python Script.","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/analytics\/extend-your-big-data-analysis-with-spark#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Extend Your Big Data Analysis with GeoAnalytics Server and 
Spark"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/500a5db71529a56874de29a69e65edb8","name":"Noah Slocum","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/68b5806fd9e9bc28cde7937731930d4ccd3f5614bc35be696fed904ff1e6677f?s=96&d=blank&r=g","caption":"Noah Slocum"},"description":"I am a product engineer on the GeoAnalytics team at Esri in Redlands, CA","url":""}]}},"text_date":"August 6, 2019","author_name":"Noah 
Slocum","author_page":false,"custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2019\/07\/nasa-Q1p7bh3SHj8-unsplash.jpg","primary_product":"ArcGIS GeoAnalytics Server","tag_data":[{"term_id":25351,"name":"Big Data","slug":"big-data","term_group":0,"term_taxonomy_id":25351,"taxonomy":"post_tag","description":"","parent":0,"count":36,"filter":"raw"},{"term_id":35561,"name":"GeoAnalytics","slug":"geoanalytics","term_group":0,"term_taxonomy_id":35561,"taxonomy":"post_tag","description":"","parent":0,"count":19,"filter":"raw"},{"term_id":35661,"name":"machine learning","slug":"machine-learning","term_group":0,"term_taxonomy_id":35661,"taxonomy":"post_tag","description":"","parent":0,"count":41,"filter":"raw"},{"term_id":24341,"name":"python","slug":"python","term_group":0,"term_taxonomy_id":24341,"taxonomy":"post_tag","description":"","parent":0,"count":171,"filter":"raw"},{"term_id":25631,"name":"spatial analysis","slug":"spatial-analysis","term_group":0,"term_taxonomy_id":25631,"taxonomy":"post_tag","description":"","parent":0,"count":59,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1328,"filter":"raw"},{"term_id":23851,"name":"Data Management","slug":"data-management","term_group":0,"term_taxonomy_id":23851,"taxonomy":"category","description":"","parent":0,"count":920,"filter":"raw"}],"product_data":[{"term_id":36571,"name":"ArcGIS Enterprise","slug":"arcgis-enterprise","term_group":0,"term_taxonomy_id":36571,"taxonomy":"product","description":"","parent":0,"count":973,"filter":"raw"},{"term_id":36941,"name":"ArcGIS GeoAnalytics 
Server","slug":"geoanalytics-server","term_group":0,"term_taxonomy_id":36941,"taxonomy":"product","description":"","parent":36571,"count":21,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=geoanalytics-server","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/576482","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/6831"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=576482"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/576482\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=576482"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=576482"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=576482"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=576482"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=576482"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}