{"id":967941,"date":"2020-08-06T14:49:43","date_gmt":"2020-08-06T21:49:43","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=967941"},"modified":"2020-08-06T15:00:00","modified_gmt":"2020-08-06T22:00:00","slug":"business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","title":{"rendered":"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights"},"author":61811,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341,22771,25831],"tags":[40691,741071,231052,30341,741061],"industry":[],"product":[36801],"class_list":["post-967941","blog","type-blog","status-publish","format-standard","hentry","category-analytics","category-natural-resources","category-petroleum","tag-analytics","tag-databricks","tag-insights","tag-intelligence","tag-spark","product-insights"],"acf":{"short_description":"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.","flexible_content":[{"acf_fc_layout":"content","content":"<h2>Insight at Scale<\/h2>\n<p>Recent updates to <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-insights\/overview\">ArcGIS Insights<\/a> have opened up fascinating possibilities for improving business intelligence and data science workflows in your organization. Business intelligence applications can enable you to find things out about your data quickly and easily. 
ArcGIS Insights lets you perform business intelligence analysis and publish the results as interactive Workbooks that visualize and explain the data in an intuitive fashion without sacrificing geospatial detail.<\/p>\n<p>This capability is also an excellent tool for data scientists. Instead of providing raw code or developing custom web apps to be the frontend for your analysis, you can publish a Workbook in ArcGIS Online or ArcGIS Enterprise for others to view.<\/p>\n<p>However, the scale of data used in advanced analytics is often a barrier to single-node applications. That is, the size of our data can often exceed the amount of memory we have, so we look to big data patterns like distributed processing to compensate. Apache Spark is the de facto standard for out-of-memory distributed analytics and since ArcGIS Insights gives us <a href=\"https:\/\/doc.arcgis.com\/en\/insights\/latest\/analyze\/scripting-overview.htm\">access to python<\/a>, we also have access to Spark.<\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Apache Spark<\/h3>\n<p>There are numerous ways to leverage the power of Apache Spark in Insights. For some, the solution is <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-geoanalytics-server\/overview\">ArcGIS GeoAnalytics Server<\/a>, which can use the <a href=\"https:\/\/developers.arcgis.com\/python\/\">ArcGIS API for Python<\/a>\u2019s geoanalytics module to <a href=\"https:\/\/developers.arcgis.com\/python\/guide\/working-with-big-data\/\">process and work with big datasets<\/a> from <a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/geoanalytics-server\/data-management\/following-the-flow-of-data-in-geoanalytics-server\/\">numerous data sources<\/a>. Others might manage and deploy their own Spark clusters in the cloud and internally configure ways to access and distribute jobs. 
Our friends at Databricks have yet another solution.<\/p>\n<p><a href=\"https:\/\/databricks.com\/\">Databricks<\/a> provides an analytics platform (built on the <a href=\"https:\/\/databricks.com\/product\/databricks-runtime\">Databricks Apache Spark runtime<\/a>) which enables data scientists to easily create and leverage managed Spark clusters, create notebooks, and manage models and experiments. Using <a href=\"https:\/\/docs.databricks.com\/dev-tools\/databricks-connect.html\">Databricks Connect<\/a>, we can now access our remote Databricks clusters and datasets inside of ArcGIS Insights. <\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Requirements<\/h3>\n<p>To go through the walkthrough yourself, you\u2019ll need the following:<\/p>\n<p>1. ArcGIS Insights Desktop \u2013 this can be done with ArcGIS Insights for Enterprise, as well, but this demo will use Desktop which can be <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/arcgis-insights\/resources\/desktop-client-download\">downloaded here<\/a><br \/>\n2. Databricks Subscription \u2013 Databricks Community Edition doesn\u2019t support remote access tokens, so you must have a paid subscription \u2013 check out how to get started with a <a href=\"https:\/\/docs.databricks.com\/getting-started\/try-databricks.html\">free trial from Databricks here<\/a><br \/>\n3. Local Environment \u2013 a terminal, java 8, python, conda, and a scripting gateway for Insights<br \/>\n4. Some data to play with that\u2019s not too small \u2013 the dataset I\u2019ll be using is the <a href=\"https:\/\/hifld-geoplatform.opendata.arcgis.com\/datasets\/oil-and-natural-gas-wells\/data\">Oil and Natural Gas Wells dataset<\/a> provided by HIFLD Open Data. It\u2019s only 1.5 million records, but it\u2019ll do for the purpose of this exercise. 
I\u2019ll have this data stored in the Databricks File System (DBFS) attached to my Insights cluster.<\/p>\n<p>If you don\u2019t have Insights, but do have access to ArcGIS Pro, my colleague Mansour Raad has set up remote access to Databricks in Pro Notebooks, which you can read about <a href=\"\/\/www.linkedin.com\/posts\/mansour-raad-b552212_arcgis-pro-jupyter-notebook-and-databricks-activity-6696231972559212544-Or5J\">here.<\/a><\/p>\n<p>You can also find all the code from this article in <a href=\"https:\/\/github.com\/scook12\/databricks-insights\">this repository.<\/a> If you need to set up your local environment, follow the instructions and links from the README before continuing.<\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Setup<\/h3>\n<p>In a terminal, run the following:<\/p>\n<p><code>conda activate insights_gateway_env # where insights_gateway_env has the insights gateway kernel configured<\/code><br \/>\n<code>pip install -U databricks-connect==6.6 # replace 6.6 with your cluster version <\/code><\/p>\n<p>Make sure you have a Databricks cluster spun up that has the proper Spark configuration, including at least the following:<br \/>\n<code>spark.databricks.service.server.enabled true<\/code><br \/>\n<code>spark.databricks.service.port 8787 # 8787 is required on Azure; on AWS it can be another port<\/code><\/p>\n<p>Next, you&#8217;ll need to retrieve the following from Databricks:<br \/>\n1. Workspace URL<br \/>\n2. Access token<br \/>\n3. Cluster ID<br \/>\n4. Port \u2013 this is under cluster \/ ssh configs<\/p>\n<p>The Databricks Connect documentation can <a href=\"https:\/\/docs.databricks.com\/dev-tools\/databricks-connect.html#step-2-configure-connection-properties\">help you find these<\/a>.<\/p>\n<p>Next, run <code>databricks-connect configure<\/code> and enter the information you just retrieved when prompted. 
Once that succeeds, run <code>databricks-connect test<\/code> and ensure the connection is configured.<\/p>\n<p>If you\u2019ve passed all tests, you can launch the kernel gateway for Insights:<\/p>\n<p><code>jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 \\<\/code><br \/>\n<code>--KernelGatewayApp.port=9999 \\<\/code><br \/>\n<code>--KernelGatewayApp.allow_origin='*' \\<\/code><br \/>\n<code>--KernelGatewayApp.allow_credentials='*' \\<\/code><br \/>\n<code>--KernelGatewayApp.allow_headers='*' \\<\/code><br \/>\n<code>--KernelGatewayApp.allow_methods='*' \\<\/code><br \/>\n<code>--JupyterWebsocketPersonality.list_kernels=True<\/code><\/p>\n<p>With the kernel gateway running, open ArcGIS Insights for Desktop and launch the scripting window. This opens a pop-up with input options for the connection URL. Assuming you used the default setup described above, enter <code>http:\/\/0.0.0.0:9999<\/code> as the connection and the websocket URL will autofill.<\/p>\n<p>Click <code>connect<\/code> and Insights will launch a scripting interface. 
From here, we\u2019re writing code that is using our local python kernel, but distributing Spark jobs to the Databricks cluster, which makes it both very convenient and efficient.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":968161,"id":968161,"title":"scripting","filename":"scripting.png","filesize":238872,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\/scripting-2","alt":"Scripting console in Insights","author":"61811","description":"Scripting console in Insights","caption":"Scripting console in Insights","name":"scripting-2","status":"inherit","uploaded_to":967941,"date":"2020-08-06 15:15:15","modified":"2020-08-06 15:16:10","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":3360,"height":2100,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting.png","medium-width":418,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting.png","medium_large-width":768,"medium_large-height":480,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting.png","large-width":1728,"large-height":1080,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting-1536x960.png","1536x1536-width":1536,"1536x1536-height":960,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting-2048x1280.png","2048x2048-width":2048,"2048x2048-height":1280,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting-744x465.png","card_image-width":744,"card_image-height":465,"wide_ima
ge":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/scripting-1728x1080.png","wide_image-width":1728,"wide_image-height":1080}},"image_position":"center","orientation":"horizontal","hyperlink":"https:\/\/doc.arcgis.com\/en\/insights\/latest\/analyze\/scripting-overview.htm"},{"acf_fc_layout":"content","content":"<h3>Analysis<\/h3>\n<p>Now, we&#8217;ll dive into some analysis to show the value of connecting to Databricks via ArcGIS Insights.<\/p>\n<p>First, we&#8217;ll set up a Spark session and access our data:<br \/>\n<code>from pyspark.sql import SparkSession<\/code><br \/>\n<code>spark = SparkSession.builder.getOrCreate()<\/code><br \/>\n<code>                                  <\/code><br \/>\n<code># Your filepath may differ and this assumes it's on the DBFS<\/code><br \/>\n<code>df = spark.read.csv(\"\/FileStore\/tables\/oil_and_ng_wells_hifld_opendata.csv\",<\/code><br \/>\n<code>                header=\"true\", inferSchema=\"true\")<\/code><\/p>\n<p>To view the shape of our data in pyspark:<\/p>\n<p><code># looks like there's about 1.5m rows and 35 columns, so something like 50 million elements<\/code><br \/>\n<code>print(df.count(), len(df.columns))<\/code><\/p>\n<p>And to preview the data itself:<br \/>\n<code>df.show(n=10)<\/code><\/p>\n<p>We can proceed to build a feature for our clustering analysis:<br \/>\n<code>from pyspark.ml.feature import VectorAssembler<\/code><br \/>\n<code>cols = [\"X\", \"Y\"]<\/code><br \/>\n<code>assembler = VectorAssembler(inputCols=cols, outputCol='features')<\/code><br \/>\n<code>locations = assembler.transform(df)<\/code><\/p>\n<p>And now train a simple k-means model:<br \/>\n<code>from pyspark.ml.clustering import KMeans<\/code><br \/>\n<code># fit a k-means model with 50 clusters using the new \"features\" column<\/code><br \/>\n<code>km = KMeans(k=50)<\/code><br \/>\n<code>model = km.fit(locations.select(\"features\"))<\/code><\/p>\n<p>Performing inference, we get a DataFrame with a \u2018prediction\u2019 column 
that we can filter down to the data we\u2019d like to share, in this case, active wells:<br \/>\n<code>clusters = model.transform(locations)<\/code><br \/>\n<code>from pyspark.sql.functions import col<\/code><br \/>\n<code>active = clusters.select([\"prediction\", \"features\", \"API\", \"STATUS\"])\\<\/code><br \/>\n<code>                 .filter(col(\"STATUS\") != \"NON-ACTIVE WELL\")<\/code><br \/>\n<code>active.show(n=10)<\/code><\/p>\n"},{"acf_fc_layout":"content","content":"<h3>Visualization<\/h3>\n<p>Now, to visualize the results:<br \/>\n<code>import seaborn as sns<\/code><br \/>\n<code>import matplotlib.pyplot as plt<\/code><br \/>\n<code>sns.set(style=\"ticks\", color_codes=True, rc={'figure.figsize':(12.7,10.27)})<\/code><\/p>\n<p><code>pdf = active.groupBy(\"STATUS\").count().toPandas()<\/code><br \/>\n<code>pdf['status'] = [\"prod, na\", \"dev\", \"active\", \"unknown\", \"tx\", \"smo\", \"prod\", \"O&amp;G\"] # less verbose labels<\/code><br \/>\n<code>sns.catplot(x=\"status\", y=\"count\", hue=\"status\", kind=\"bar\", data=pdf)<\/code><\/p>\n<p>This is a really simple chart, just to provide an example, and you can get as creative as you want with your plotting. From here, in the top left of the scripting console, there\u2019s a plus button surrounded by a dashed square outline. Click on the seaborn plot and then click that button. 
The chart will be added to the Insights Workbook as a card that you can then annotate and share.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":968901,"id":968901,"title":"","filename":"chart-scaled.png","filesize":458585,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-scaled.png","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\/chart-13","alt":"Chart in Insights workbook","author":"61811","description":"","caption":"Chart in Insights Workbook","name":"chart-13","status":"inherit","uploaded_to":967941,"date":"2020-08-06 21:47:59","modified":"2020-08-06 21:48:29","menu_order":0,"mime_type":"image\/png","type":"image","subtype":"png","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":2560,"height":1600,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-213x200.png","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-scaled.png","medium-width":418,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-scaled.png","medium_large-width":768,"medium_large-height":480,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-scaled.png","large-width":1728,"large-height":1080,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-1536x960.png","1536x1536-width":1536,"1536x1536-height":960,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-2048x1280.png","2048x2048-width":2048,"2048x2048-height":1280,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-744x465.png","card_image-width":744,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/08\/chart-1728x1080.png","wide_image-width":1728,"wide_image-height":108
0}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Now, you can create new visualizations, perform more in-depth analysis, or publish that data to ArcGIS Online or Enterprise. Others can then use your results and incorporate them into their analysis, maps, and applications.<\/p>\n<h2>Conclusion<\/h2>\n<p>Scripting in ArcGIS Insights provides powerful new capabilities to integrate with and share the results of data science workflows with the simplicity of a configurable business intelligence application. You can access external data sources to enrich your dataset and compute clusters to power your analysis at scale. With your own Python kernels attached, there are few limits on what you can accomplish.<\/p>\n<p>What\u2019s your organization doing with Insights? Let us know in the comments.<\/p>\n"}],"authors":[{"ID":61811,"user_firstname":"Samuel","user_lastname":"Cook","nickname":"scook","user_nicename":"scook","display_name":"Samuel Cook","user_email":"scook@esri.com","user_url":"","user_registered":"2020-07-06 19:46:53","user_description":"","user_avatar":"<img alt='' src='https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=96&#038;d=blank&#038;r=g' srcset='https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=192&#038;d=blank&#038;r=g 2x' class='avatar avatar-96 photo' height='96' width='96' loading='lazy' decoding='async'\/>"}],"related_articles":"","card_image":false,"wide_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Geospatial BI at Scale: Leveraging Spark in ArcGIS Insights<\/title>\n<meta name=\"description\" content=\"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.\" \/>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights\" \/>\n<meta property=\"og:description\" content=\"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2020-08-06T22:00:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.esri.com\/arcgis-blog\/wp-content\/uploads\/2020\/02\/AB-99588132-arcgis-insights-in-monitor-operating-budget-1920-1080.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\"},\"author\":{\"name\":\"Samuel Cook\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/c100ae33fd593fa37dd9ed8da9cf6256\"},\"headline\":\"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights\",\"datePublished\":\"2020-08-06T21:49:43+00:00\",\"dateModified\":\"2020-08-06T22:00:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\"},\"wordCount\":10,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"analytics\",\"Databricks\",\"Insights\",\"Intelligence\",\"Spark\"],\"articleSection\":[\"Analytics\",\"Natural Resources\",\"Petroleum and Pipeline\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\",\"name\":\"Geospatial BI at Scale: Leveraging Spark in ArcGIS 
Insights\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2020-08-06T21:49:43+00:00\",\"dateModified\":\"2020-08-06T22:00:00+00:00\",\"description\":\"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/c100ae33fd593fa37dd9ed8da9cf6256\",\"name\":\"Samuel Cook\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=96&d=blank&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=96&d=blank&r=g\",\"caption\":\"Samuel Cook\"},\"url\":\"\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Geospatial BI at Scale: Leveraging Spark in ArcGIS Insights","description":"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","og_locale":"en_US","og_type":"article","og_title":"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights","og_description":"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and python.","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2020-08-06T22:00:00+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.esri.com\/arcgis-blog\/wp-content\/uploads\/2020\/02\/AB-99588132-arcgis-insights-in-monitor-operating-budget-1920-1080.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_site":"@ESRI","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights"},"author":{"name":"Samuel Cook","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/c100ae33fd593fa37dd9ed8da9cf6256"},"headline":"Business Intelligence at Scale: Leveraging 
Apache Spark within ArcGIS Insights","datePublished":"2020-08-06T21:49:43+00:00","dateModified":"2020-08-06T22:00:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights"},"wordCount":10,"commentCount":1,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["analytics","Databricks","Insights","Intelligence","Spark"],"articleSection":["Analytics","Natural Resources","Petroleum and Pipeline"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights","name":"Geospatial BI at Scale: Leveraging Spark in ArcGIS Insights","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2020-08-06T21:49:43+00:00","dateModified":"2020-08-06T22:00:00+00:00","description":"Leverage the power of Apache Spark to scale your BI analysis in ArcGIS Insights by using Databricks Connect and 
python.","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/insights\/analytics\/business-intelligence-at-scale-leveraging-apache-spark-within-arcgis-insights#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Business Intelligence at Scale: Leveraging Apache Spark within ArcGIS Insights"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/c100ae33fd593fa37dd9ed8da9cf6256","name":"Samuel Cook","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=96&d=blank&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/409d4ea4387a062f762e77385b50301d28e7e23be422c60b7d6335f5884e96df?s=96&d=blank&r=g","caption":"Samuel Cook"},"url":""}]}},"text_date":"August 6, 2020","author_name":"Samuel Cook","author_page":false,"custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Newsroom-Keyart-Wide-1920-x-1080.jpg","primary_product":"ArcGIS 
Insights","tag_data":[{"term_id":40691,"name":"analytics","slug":"analytics","term_group":0,"term_taxonomy_id":40691,"taxonomy":"post_tag","description":"","parent":0,"count":53,"filter":"raw"},{"term_id":741071,"name":"Databricks","slug":"databricks","term_group":0,"term_taxonomy_id":741071,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":231052,"name":"Insights","slug":"insights","term_group":0,"term_taxonomy_id":231052,"taxonomy":"post_tag","description":"","parent":0,"count":13,"filter":"raw"},{"term_id":30341,"name":"Intelligence","slug":"intelligence","term_group":0,"term_taxonomy_id":30341,"taxonomy":"post_tag","description":"","parent":0,"count":31,"filter":"raw"},{"term_id":741061,"name":"Spark","slug":"spark","term_group":0,"term_taxonomy_id":741061,"taxonomy":"post_tag","description":"","parent":0,"count":2,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1325,"filter":"raw"},{"term_id":22771,"name":"Natural Resources","slug":"natural-resources","term_group":0,"term_taxonomy_id":22771,"taxonomy":"category","description":"","parent":0,"count":263,"filter":"raw"},{"term_id":25831,"name":"Petroleum and Pipeline","slug":"petroleum","term_group":0,"term_taxonomy_id":25831,"taxonomy":"category","description":"","parent":0,"count":85,"filter":"raw"}],"product_data":[{"term_id":36801,"name":"ArcGIS 
Insights","slug":"insights","term_group":0,"term_taxonomy_id":36801,"taxonomy":"product","description":"","parent":36591,"count":119,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=insights","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/967941","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/61811"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=967941"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/967941\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=967941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=967941"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=967941"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=967941"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=967941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}