{"id":2361222,"date":"2024-06-11T07:39:16","date_gmt":"2024-06-11T14:39:16","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2361222"},"modified":"2025-01-13T07:09:44","modified_gmt":"2025-01-13T15:09:44","slug":"end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","title":{"rendered":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS"},"author":154341,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341],"tags":[760452,35661,24341,30241,759592],"industry":[],"product":[36841,36561],"class_list":["post-2361222","blog","type-blog","status-publish","format-standard","hentry","category-analytics","tag-data-engineering","tag-machine-learning","tag-python","tag-r","tag-spatial-data-science","product-api-python","product-arcgis-pro"],"acf":{"authors":[{"ID":154341,"user_firstname":"Nicholas","user_lastname":"Giner","nickname":"Nick Giner","user_nicename":"nginer","display_name":"Nicholas Giner","user_email":"NGiner@esri.com","user_url":"","user_registered":"2021-01-07 14:31:25","user_description":"Nick Giner is a Product Manager for Spatial Analysis and Data Science.  Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. 
In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"This is the fifth in a series of blogs that showcase an end-to-end spatial data science workflow for clustering US precipitation regions.","flexible_content":[{"acf_fc_layout":"content","content":"<h2>Introduction<\/h2>\n<p>At this point, we are finally ready to do some machine learning.\u00a0 Let this blog series be a reminder of just how much work may be required to prepare your data prior to getting to any of the fancy stuff!<\/p>\n<p>In the original <a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/24694452.2020.1828803\">paper<\/a>, the authors used two machine learning techniques back-to-back to create the final climate region map: Principal Components Analysis (PCA) and cluster analysis.\u00a0 PCA is a technique used to reduce a high-dimensional dataset to a smaller set of uncorrelated dimensions, while attempting to preserve as much of the variation (i.e., 
<em>information<\/em>) in the original data as possible.\u00a0 Cluster analysis is a technique used to group similar observations together.\u00a0 The objective is to take a dataset of individual data points and create subgroups (or clusters) of data points, where the data points within each cluster are more similar to each other than to the data points in other clusters.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"<p><strong>Note:<\/strong> It is common practice to use PCA prior to cluster analysis when your data is high-dimensional.<\/p>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<h2>Principal Components Analysis (PCA)<\/h2>\n<p>Recall that our final gridded precipitation dataset contains 16 precipitation variables (4 precipitation variables x 4 seasons), so we\u2019ll use PCA to reduce this dataset to a smaller number of components.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368222,"id":2368222,"title":"dimensions","filename":"dimensions.jpg","filesize":59409,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/dimensions-3","alt":"","author":"154341","description":"","caption":"","name":"dimensions-3","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 15:16:54","modified":"2024-06-07 
15:16:54","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":982,"height":379,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","medium-width":464,"medium-height":179,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","medium_large-width":768,"medium_large-height":296,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","large-width":982,"large-height":379,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","1536x1536-width":982,"1536x1536-height":379,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","2048x2048-width":982,"2048x2048-height":379,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions-826x319.jpg","card_image-width":826,"card_image-height":319,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dimensions.jpg","wide_image-width":982,"wide_image-height":379}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Back in the notebook, we\u2019ll create a variable for our gridded precipitation dataset, then use ArcPy to make sure we have all the necessary 
fields.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368232,"id":2368232,"title":"pca_input","filename":"pca_input.jpg","filesize":72124,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_input","alt":"","author":"154341","description":"","caption":"","name":"pca_input","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 15:21:42","modified":"2024-06-07 15:21:42","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1198,"height":250,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","medium-width":464,"medium-height":97,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","medium_large-width":768,"medium_large-height":160,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","large-width":1198,"large-height":250,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","1536x1536-width":1198,"1536x1536-height":250,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","2048x2048-width":1198,"2048x2048-height":250,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input-826x172.jpg","card_image-width":826,"card_image-height":172,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_input.jpg","wide_image-width":1198,"wide_image-height":250}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>In ArcGIS Pro, PCA can be 
performed using the <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/dimensionreduction.htm\">Dimension Reduction<\/a><\/strong> tool.\u00a0 We\u2019ll specify our input and output datasets and choose all 16 of the precipitation variables for the \u201cFields\u201d parameter.\u00a0 The \u201cScale\u201d parameter gives you the option to transform all the input variables to the same scale, which helps ensure that each variable contributes equally to the principal components.\u00a0 This is important in a situation like ours, where the precipitation variables are measured in different units, such as millimeters of precipitation, number of precipitation days, and Gini\/Lorenz Asymmetry Coefficients that range in value from 0 to just over 1.\u00a0 The tool also has the option to output eigenvalue and eigenvector tables, which can help you interpret the results of the PCA.\u00a0 For more information on how all this works behind the scenes, check out the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/dimensionreduction.htm\">documentation<\/a>.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368342,"id":2368342,"title":"pca_tool","filename":"pca_tool.jpg","filesize":88032,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_tool","alt":"","author":"154341","description":"","caption":"","name":"pca_tool","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 15:51:48","modified":"2024-06-07 
15:51:48","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1132,"height":370,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","medium-width":464,"medium-height":152,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","medium_large-width":768,"medium_large-height":251,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","large-width":1132,"large-height":370,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","1536x1536-width":1132,"1536x1536-height":370,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","2048x2048-width":1132,"2048x2048-height":370,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool-826x270.jpg","card_image-width":826,"card_image-height":270,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_tool.jpg","wide_image-width":1132,"wide_image-height":370}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>When you run a geoprocessing tool from within a notebook in ArcGIS Pro, you will see the tool result displayed as a rich representation directly within the notebook.\u00a0 This makes it easy to see all the information you need to understand and interpret the tool results, just as you would if you ran the tool from the Geoprocessing pane.<\/p>\n<p>Looking at the PCA results in the notebook, we can see that the first three principal components account for over 80% of the variance in the original 16 precipitation 
variables.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368452,"id":2368452,"title":"pca_result","filename":"pca_result.jpg","filesize":113533,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_result","alt":"","author":"154341","description":"","caption":"","name":"pca_result","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 16:05:43","modified":"2024-06-07 16:05:43","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1032,"height":750,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","medium-width":359,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","medium_large-width":768,"medium_large-height":558,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","large-width":1032,"large-height":750,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","1536x1536-width":1032,"1536x1536-height":750,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","2048x2048-width":1032,"2048x2048-height":750,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result-640x465.jpg","card_image-width":640,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_result.jpg","wide_image-width":1032,"wide_image-height":750}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>While 
there is no hard and fast rule for deciding how many principal components to retain (i.e., how many it takes to adequately represent the variance in the data), several methods and strategies can help you decide.\u00a0 One is to examine the scree plot, which is created automatically when you run the <strong>Dimension Reduction<\/strong> tool and specify the \u201cOutput Eigenvalues Table\u201d parameter.\u00a0 It plots each principal component (x-axis) against the percent of variance explained by that component (y-axis).<\/p>\n<p>You can use the scree plot to decide how many principal components to retain by locating the point at which the curve begins to flatten out, which in our case is around PC3.\u00a0 This aligns with the authors\u2019 decision in the original paper, where they chose to use the first three principal components as input to the subsequent cluster analyses.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368502,"id":2368502,"title":"scree_plot","filename":"scree_plot.jpg","filesize":56630,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/scree_plot","alt":"","author":"154341","description":"","caption":"Scree plot produced by the Dimension Reduction tool.","name":"scree_plot","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 16:20:33","modified":"2024-06-07 
16:21:13","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":938,"height":444,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","medium-width":464,"medium-height":220,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","medium_large-width":768,"medium_large-height":364,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","large-width":938,"large-height":444,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","1536x1536-width":938,"1536x1536-height":444,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","2048x2048-width":938,"2048x2048-height":444,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot-826x391.jpg","card_image-width":826,"card_image-height":391,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/scree_plot.jpg","wide_image-width":938,"wide_image-height":444}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>The &#8220;Output Eigenvectors Table&#8221; parameter may also be helpful for understanding the results of the PCA.\u00a0 It provides a\u00a0graphical representation of the loadings (i.e., weights) for each component, which indicates how much each of the original variables contributes to each component.<\/p>\n<h2>Cluster analysis<\/h2>\n<p>Broadly speaking, cluster analysis is a technique used to group individual objects, entities, or data points into meaningful subgroups (i.e., 
\u201cclusters\u201d).\u00a0 The end goal is for each data point within a cluster to be more similar to the other data points within that cluster than to the data points in any other cluster.\u00a0 In the context of this case study, the aim is to cluster the gridded precipitation dataset into climate regions, where each region has similar historical precipitation characteristics over the 30-year period from 1981 to 2010.<\/p>\n<p>In the field of data science, there are dozens of clustering techniques and algorithms to choose from.\u00a0 Some make specific assumptions about the statistical properties of the data points, while others are more data-driven and fall under the category of machine learning. The question common to <u>all<\/u> clustering techniques and algorithms is:\u00a0 <em>What does it mean for two data points to be \u201csimilar\u201d? (i.e., how do we define \u201csimilarity\u201d between two data points, and how does this inform the creation of clusters?)<\/em><\/p>\n<p>In the context of cluster analysis, similarity is most often measured using correlational measures, association measures, or distance-based measures, with distance-based measures being the most common.\u00a0 From a purely data science perspective, the term \u201cdistance\u201d refers to <em>statistical distance<\/em>, which is simply the distance between observations in multivariate attribute space (i.e., 
the difference in the data <span style=\"text-decoration: underline\">values<\/span>).\u00a0 To geographers and GIS professionals, however, distance refers to <em>geographical distance<\/em>, the physical distance between observations or data points on Earth\u2019s surface.\u00a0 As it turns out, there are clustering techniques and algorithms that use one or the other, or blend the two together.<\/p>\n<p>In ArcGIS, for example, you have access to <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/an-overview-of-the-mapping-clusters-toolset.htm\">clustering tools<\/a> that group spatial data based solely on the data attributes (e.g. <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/multivariate-clustering.htm\">Multivariate Clustering<\/a><\/strong>), or solely on their locations (e.g. <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/densitybasedclustering.htm\">Density-based Clustering<\/a><\/strong>).\u00a0 There are also several tools that perform clustering based on both the attributes <em>and <\/em>locations of the data.\u00a0 For example, <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/buildbalancedzones.htm\">Build Balanced Zones<\/a><\/strong> and <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/spatially-constrained-multivariate-clustering.htm\">Spatially Constrained Multivariate Clustering<\/a><\/strong> take a machine learning approach, while the different flavors of <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/hot-spot-analysis.htm\">Hot Spot Analysis<\/a><\/strong> and <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/cluster-and-outlier-analysis-anselin-local-moran-s.htm\">Cluster and Outlier 
Analysis<\/a><\/strong> take a statistical approach.\u00a0 Additionally, the <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/space-time-pattern-mining\/time-series-clustering.htm\">Time Series Clustering<\/a><\/strong> tool applies many of these concepts and more to time series data.<\/p>\n<p><strong>As with any other form of analysis, it\u2019s up to you as the analyst to choose the most appropriate clustering technique or algorithm based on the problem you want to solve and the properties of your data.\u00a0<\/strong><\/p>\n"},{"acf_fc_layout":"sidebar","content":"<p><strong>Note:<\/strong> In the paper that this blog series is based on, the authors chose a form of cluster analysis that creates clusters based on the attributes in multivariate data space. In the following sections, we\u2019ll use both open-source Python libraries and ArcGIS tools to replicate their results as closely as possible, but for simplicity and educational purposes, we have made small adjustments to their exact workflow.<\/p>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<h3>Cluster analysis using open-source Python libraries: Hierarchical clustering<\/h3>\n<p>Following the workflow in the paper, the first cluster analysis technique we\u2019ll use is <em>hierarchical clustering<\/em>.\u00a0 Fundamentally, hierarchical clustering methods operate by iteratively merging or dividing a dataset into a nested hierarchy of clusters.\u00a0 There are two main varieties of hierarchical clustering: agglomerative and divisive.<\/p>\n<p>Agglomerative hierarchical clustering (i.e., 
\u201cbottom-up\u201d) begins with each data point as a member of its own individual cluster, then repeatedly merges the two most similar clusters until all data points form one large cluster.\u00a0 Divisive hierarchical clustering (\u201ctop-down\u201d) initially treats the entire dataset as one large cluster, then splits this large cluster into successively smaller clusters based on the dissimilarity between them.<\/p>\n<p>In the paper, the authors experiment with several different agglomerative clustering algorithms, each of which uses a different \u201clinkage algorithm\u201d.\u00a0 In a nutshell, these linkage algorithms define how the distance (in multivariate attribute space) between clusters is calculated.\u00a0 You can learn more about the different linkage algorithms in the Hair, Jr. et al. (1998) and O\u2019Sullivan and Unwin (2003) textbooks referenced in <a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-1-clustering-us-precipitation-regions\/\">Part #1<\/a> of this blog series.\u00a0 You can also watch a great explanation by Luc Anselin in his 2021 Spatial Cluster Analysis <a href=\"https:\/\/www.youtube.com\/watch?v=GG9xlaU0Bxw&amp;list=PLzREt6r1Nenk3L0ndufhYuwdrrfZqdsIA&amp;index=17\">course<\/a> at the University of Chicago.<\/p>\n<p>We saw in the previous section that there are many clustering techniques available in ArcGIS, but hierarchical clustering is <em>not<\/em> one of them.\u00a0 Because I am working with Python in an ArcGIS Notebook, however, it\u2019s very easy to hook into the Python ecosystem and take advantage of its powerful open-source data science libraries.\u00a0 In this case, I\u2019m going to call a specific hierarchical clustering algorithm from one of the most popular of these libraries, <a href=\"https:\/\/scikit-learn.org\/stable\/\">scikit-learn<\/a>.<\/p>\n<p>Back in our notebook, we need to import a few more Python libraries.<\/p>\n<ul>\n<li><a 
href=\"https:\/\/matplotlib.org\/\">matplotlib<\/a> \u2013 data visualization (graphs, charts, and plots)<\/li>\n<li><a href=\"https:\/\/numpy.org\/\">numpy<\/a> \u2013 scientific and mathematical computing, working with arrays and matrices<\/li>\n<li><a href=\"https:\/\/scipy.org\/\">scipy<\/a> \u2013 scientific computing for mathematics, science, and engineering<\/li>\n<li><a href=\"https:\/\/scikit-learn.org\/stable\/\">sklearn<\/a> \u2013 the scikit-learn machine learning library<\/li>\n<\/ul>\n<p>You may notice that for several of these libraries, we are importing only certain subsets of functionality.\u00a0 In the SciPy import, for example, we import only the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/cluster.html#module-scipy.cluster\">cluster submodule<\/a> so we can use the functions for hierarchical and agglomerative clustering in the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/cluster.hierarchy.html#module-scipy.cluster.hierarchy\">hierarchy module<\/a>.\u00a0 From the scikit-learn <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/classes.html#module-sklearn.cluster\">cluster module<\/a>, we are importing only the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering\">AgglomerativeClustering class<\/a>.\u00a0 From Matplotlib, we import only the <a href=\"https:\/\/matplotlib.org\/stable\/api\/pyplot_summary.html\">pyplot<\/a> 
module.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2368922,"id":2368922,"title":"python_clustering_imports","filename":"python_clustering_imports.jpg","filesize":32216,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/python_clustering_imports","alt":"","author":"154341","description":"","caption":"","name":"python_clustering_imports","status":"inherit","uploaded_to":2361222,"date":"2024-06-07 17:12:19","modified":"2024-06-07 17:12:19","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":647,"height":165,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports-213x165.jpg","thumbnail-width":213,"thumbnail-height":165,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","medium-width":464,"medium-height":118,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","medium_large-width":647,"medium_large-height":165,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","large-width":647,"large-height":165,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","1536x1536-width":647,"1536x1536-height":165,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","2048x2048-width":647,"2048x2048-height":165,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_imports.jpg","card_image-width":647,"card_image-height":165,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/python_clustering_impor
ts.jpg","wide_image-width":647,"wide_image-height":165}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Prior to using SciPy and scikit-learn for cluster analysis, we need to ensure our GIS data is in the format they require, which in this case is the <a href=\"https:\/\/numpy.org\/doc\/stable\/user\/absolute_beginners.html#what-is-an-array\">NumPy array<\/a>.\u00a0 At this point, our gridded precipitation dataset (which now includes the principal component scores for each observation) is a geodatabase feature class, so we need to perform a few data conversion steps.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370232,"id":2370232,"title":"pca_attribute_table","filename":"pca_attribute_table.jpg","filesize":130497,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_attribute_table","alt":"","author":"154341","description":"","caption":"","name":"pca_attribute_table","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:12:26","modified":"2024-06-10 
14:12:26","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1084,"height":410,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","medium-width":464,"medium-height":175,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","medium_large-width":768,"medium_large-height":290,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","large-width":1084,"large-height":410,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","1536x1536-width":1084,"1536x1536-height":410,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","2048x2048-width":1084,"2048x2048-height":410,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table-826x312.jpg","card_image-width":826,"card_image-height":312,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_attribute_table.jpg","wide_image-width":1084,"wide_image-height":410}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>First, we\u2019ll use the ArcGIS API for Python to convert our gridded precipitation dataset into a <a href=\"https:\/\/developers.arcgis.com\/python\/guide\/part1-introduction-to-sedf\/\">Spatially Enabled DataFrame<\/a> (SeDF), which is essentially a pandas DataFrame that also stores the geometry of each row.\u00a0 We pass the feature class into the <a 
href=\"https:\/\/developers.arcgis.com\/python\/guide\/part2-data-io-reading-data\/#read-in-local-gis-data\"><strong><em>.from_featureclass<\/em><\/strong><\/a> method on the ArcGIS API for Python\u2019s <a href=\"https:\/\/developers.arcgis.com\/python\/api-reference\/arcgis.features.toc.html?arcgis.features.GeoAccessor.from_featureclass#arcgis.features.GeoAccessor.from_featureclass\">GeoAccessor<\/a> class to convert it to a SeDF.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370242,"id":2370242,"title":"pca_pandas_df","filename":"pca_pandas_df.jpg","filesize":115590,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_pandas_df","alt":"","author":"154341","description":"","caption":"","name":"pca_pandas_df","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:15:46","modified":"2024-06-10 
14:15:46","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":982,"height":720,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","medium-width":356,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","medium_large-width":768,"medium_large-height":563,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","large-width":982,"large-height":720,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","1536x1536-width":982,"1536x1536-height":720,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","2048x2048-width":982,"2048x2048-height":720,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df-634x465.jpg","card_image-width":634,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_pandas_df.jpg","wide_image-width":982,"wide_image-height":720}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Next, we\u2019ll print the DataFrame column names using the <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.columns.html#pandas.DataFrame.columns\"><strong><em>.columns<\/em><\/strong><\/a> attribute, then use the <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Index.get_loc.html\">.<strong><em>get_loc<\/em><\/strong><\/a> method on this <strong><em>.columns<\/em><\/strong> attribute to print the index position of the column we specify.\u00a0 Because we\u2019ll be using the first three principal components in our 
subsequent cluster analysis steps, we need to get the index positions of \u201cPCA1\u201d, \u201cPCA2\u201d, and \u201cPCA3\u201d, which are 19, 25, and 26, respectively.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370272,"id":2370272,"title":"pca_index_positions","filename":"pca_index_positions.jpg","filesize":89310,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/pca_index_positions","alt":"","author":"154341","description":"","caption":"","name":"pca_index_positions","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:19:14","modified":"2024-06-10 14:19:14","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":820,"height":480,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","medium-width":446,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","medium_large-width":768,"medium_large-height":450,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","large-width":820,"large-height":480,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","1536x1536-width":820,"1536x1536-height":480,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","2048x2048-width":820,"2048x2048-height":480,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions-794x465.jpg","card_image-width":794,"card_image-height":46
5,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/pca_index_positions.jpg","wide_image-width":820,"wide_image-height":480}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>The last steps before cluster analysis are to subset the DataFrame to include only the columns we need, then convert these columns to a NumPy array.\u00a0 For this subset\/selection step, we can use either the <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.loc.html\"><strong><em>.loc<\/em><\/strong><\/a> or <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.iloc.html\"><strong><em>.iloc<\/em><\/strong><\/a> indexers.\u00a0 You can think of these indexers as a more flexible version of an attribute query.\u00a0 The <strong><em>.loc<\/em><\/strong> indexer selects rows and\/or columns of a Pandas DataFrame using the row and column <em>labels<\/em>, while <strong><em>.iloc<\/em><\/strong> performs the selection based on the row and column <em>integer positions<\/em>.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370282,"id":2370282,"title":"loc_iloc","filename":"loc_iloc.jpg","filesize":70660,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/loc_iloc","alt":"","author":"154341","description":"","caption":"In both examples, we are only using the second argument of the .loc or .iloc indexers.  
The colon in the first argument indicates that all rows will be included in the subset.","name":"loc_iloc","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:22:16","modified":"2024-06-10 14:22:33","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":596,"height":580,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","medium-width":268,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","medium_large-width":596,"medium_large-height":580,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","large-width":596,"large-height":580,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","1536x1536-width":596,"1536x1536-height":580,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","2048x2048-width":596,"2048x2048-height":580,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc-478x465.jpg","card_image-width":478,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/loc_iloc.jpg","wide_image-width":596,"wide_image-height":580}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>In my case, I chose to use <strong><em>.iloc <\/em><\/strong>and the integer positions, but either would do the job.\u00a0 Last, I can add the <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.values.html\"><strong><em>.values<\/em><\/strong><\/a> attribute to this line of code, which creates a NumPy representation of the DataFrame subset. 
With the new variable \u201cdata\u201d pointing to a NumPy array containing the first three principal components, we are ready for cluster analysis.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370302,"id":2370302,"title":"numpy_array","filename":"numpy_array.jpg","filesize":81864,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/numpy_array","alt":"","author":"154341","description":"","caption":"","name":"numpy_array","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:24:17","modified":"2024-06-10 14:24:17","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":770,"height":384,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","medium-width":464,"medium-height":231,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","medium_large-width":768,"medium_large-height":383,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","large-width":770,"large-height":384,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","1536x1536-width":770,"1536x1536-height":384,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","2048x2048-width":770,"2048x2048-height":384,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","card_image-width":770,"card_image-height":384,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_array.jpg","wide_image-width":770,"wide_ima
ge-height":384}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>One of the defining characteristics of all hierarchical clustering methods is that they do not require you to choose the final number of clusters up front.\u00a0 In the paper, the authors calculated a few measures of cluster cohesiveness to help them with this choice, but most often this is a subjective decision.\u00a0 A common approach for deciding on the optimal number of clusters is to use a dendrogram.\u00a0 A dendrogram is a tree-like representation of the nested hierarchy of clusters, where you can visualize the steps in which data points are iteratively merged into larger clusters (agglomerative) or split off into smaller clusters (divisive).<\/p>\n<p>To create a dendrogram, we\u2019ll first use the <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage\"><strong><em>.linkage<\/em><\/strong><\/a> function from SciPy\u2019s <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/cluster.hierarchy.html#module-scipy.cluster.hierarchy\">cluster.hierarchy<\/a> submodule.\u00a0 This function builds a linkage matrix, which records which clusters combine at each step, the distance (i.e., dissimilarity) between them, and the number of original observations in each newly formed cluster.<\/p>\n<p>In the <strong><em>.linkage<\/em><\/strong> function, we pass in the NumPy array, a metric to quantify <a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.pdist.html\">distance<\/a> between individual data points, and a linkage algorithm, which determines how these distances are aggregated as data points get merged into clusters (i.e., 
how the distances between clusters are calculated).\u00a0 We follow the authors\u2019 steps from the paper, using \u201cEuclidean\u201d as the distance metric and \u201cWard\u2019s\u201d as the linkage algorithm.\u00a0 Unlike some of the other linkage algorithms, which are based on simple distances between clusters, Ward\u2019s Method merges, at each step, the pair of clusters that yields the smallest increase in total within-cluster variance, and it is often cited as one of the best-performing linkage algorithms (Hair et al. 1998).<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370312,"id":2370312,"title":"ward_array","filename":"ward_array.jpg","filesize":93970,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/ward_array","alt":"","author":"154341","description":"","caption":"","name":"ward_array","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:28:20","modified":"2024-06-10 
14:28:20","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":863,"height":449,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","medium-width":464,"medium-height":241,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","medium_large-width":768,"medium_large-height":400,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","large-width":863,"large-height":449,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","1536x1536-width":863,"1536x1536-height":449,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","2048x2048-width":863,"2048x2048-height":449,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array-826x430.jpg","card_image-width":826,"card_image-height":430,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/ward_array.jpg","wide_image-width":863,"wide_image-height":449}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We then pass this output NumPy array into the<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.cluster.hierarchy.dendrogram.html\"><strong><em> .dendrogram<\/em><\/strong><\/a> function, which allows us to visualize the cluster hierarchy as a dendrogram.\u00a0 Note that we are also utilizing several of the functions for plot formatting from within the Matplotlib <a href=\"https:\/\/matplotlib.org\/3.1.1\/api\/pyplot_summary.html\">pyplot<\/a> 
module.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370322,"id":2370322,"title":"dendrogram_cut","filename":"dendrogram_cut.jpg","filesize":77222,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/dendrogram_cut","alt":"","author":"154341","description":"","caption":"There are 13 vertical lines that intersect the horizontal red line that shows the dendrogram cut.  These 13 lines represent the 13 precipitation clusters.","name":"dendrogram_cut","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:30:23","modified":"2024-06-10 14:32:03","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1003,"height":775,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","medium-width":338,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","medium_large-width":768,"medium_large-height":593,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","large-width":1003,"large-height":775,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","1536x1536-width":1003,"1536x1536-height":775,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut.jpg","2048x2048-width":1003,"2048x2048-height":775,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/dendrogram_cut-602x465.jpg","card_image-width":602,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\
\/2024\/06\/dendrogram_cut.jpg","wide_image-width":1003,"wide_image-height":775}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>As mentioned above, choosing the final number of clusters in hierarchical clustering is subjective and is often driven by your data and the problem at hand.\u00a0 In the paper, the authors experimented with three different linkage methods, each retaining a different number of clusters (Complete Linkage: 15, Average Linkage: 13, Ward\u2019s Method: 16).\u00a0 We decided on our final number of clusters with these numbers in mind <em>and<\/em> based on our dendrogram.\u00a0 The graphic above shows a red line that \u201ccuts\u201d the dendrogram at 13 clusters.\u00a0 While there is no hard and fast rule for where to cut a dendrogram, and it tends to feel more like art than science, one common approach is to make the cut where the vertical lines become noticeably longer (i.e., where there is a large jump in merge distance, indicating more dissimilarity and separation between the clusters).\u00a0 Think of the dendrogram like a tree.\u00a0 The individual data points at the bottom are the leaves and twigs.\u00a0 As you move up the tree, each leaf and twig is associated with a larger branch, representing a cluster.\u00a0 Our tree has 13 \u201cbranches\u201d.<\/p>\n<p>With this decision in mind, we are ready to perform the clustering.\u00a0 Using scikit-learn\u2019s <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.AgglomerativeClustering.html\"><strong><em>AgglomerativeClustering<\/em><\/strong><\/a> class, we\u2019ll create our <a href=\"https:\/\/scikit-learn.org\/stable\/developers\/develop.html\">estimator<\/a> (i.e., 
\u201ccluster model\u201d) by passing in our desired number of clusters (13), the distance metric, and the linkage algorithm that we used in the previous step.\u00a0 The <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit_predict\"><strong><em>.fit_predict<\/em><\/strong><\/a> method is then used to fit this clustering model to the \u201cdata\u201d variable, which points to the NumPy array containing the first three principal components.\u00a0 In addition to fitting the model to the data, <strong><em>.fit_predict<\/em><\/strong> returns the cluster label assigned to each data point as another NumPy array.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370342,"id":2370342,"title":"agglom_clustering","filename":"agglom_clustering.jpg","filesize":47913,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/agglom_clustering","alt":"","author":"154341","description":"","caption":"","name":"agglom_clustering","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:35:58","modified":"2024-06-10 
14:35:58","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":961,"height":284,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","medium-width":464,"medium-height":137,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","medium_large-width":768,"medium_large-height":227,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","large-width":961,"large-height":284,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","1536x1536-width":961,"1536x1536-height":284,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","2048x2048-width":961,"2048x2048-height":284,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering-826x244.jpg","card_image-width":826,"card_image-height":244,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/agglom_clustering.jpg","wide_image-width":961,"wide_image-height":284}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We can explore these results further using some basic NumPy functionality.\u00a0 First, we\u2019ll use the <a href=\"https:\/\/numpy.org\/doc\/stable\/reference\/generated\/numpy.shape.html\"><strong><em>.shape<\/em><\/strong><\/a> attribute to check the dimensions of the array.\u00a0 In our case, the cluster analysis output is a 1-dimensional (1-D) array containing 30,665 elements that represent a cluster label for each row in the 16km by 16km gridded precipitation dataset.\u00a0 Next, we\u2019ll use the <a 
href=\"https:\/\/numpy.org\/doc\/stable\/reference\/generated\/numpy.unique.html\"><strong><em>.unique<\/em><\/strong><\/a> function to confirm that there are 13 unique cluster labels.\u00a0 Looks good!<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370352,"id":2370352,"title":"numpy_checks","filename":"numpy_checks.jpg","filesize":40073,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/numpy_checks","alt":"","author":"154341","description":"","caption":"","name":"numpy_checks","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:39:04","modified":"2024-06-10 14:39:04","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":753,"height":284,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","medium-width":464,"medium-height":175,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","medium_large-width":753,"medium_large-height":284,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","large-width":753,"large-height":284,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","1536x1536-width":753,"1536x1536-height":284,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","2048x2048-width":753,"2048x2048-height":284,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/numpy_checks.jpg","card_image-width":753,"card_image-height":284,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/up
loads\/2024\/06\/numpy_checks.jpg","wide_image-width":753,"wide_image-height":284}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"sidebar","content":"<p>Note: You\u2019ll notice that the 13 cluster labels in this array run from 0 to 12, rather than 1 to 13. In Python and many other programming languages, it is a common convention to begin indexing at 0 rather than 1 (i.e., \u201czero-based indexing\u201d), and scikit-learn numbers its cluster labels the same way.<\/p>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<p>At this point, we have a NumPy array that tells us which cluster each of our original gridded precipitation points belongs to.\u00a0 Now we need to attach this information to the Pandas DataFrame that contains all of the other attributes.\u00a0 This is a common step in other types of machine learning or predictive analysis as well (i.e., 
linking the output results, in the form of a NumPy array, back to the Pandas DataFrame that contained the original values that were input into the algorithm).<\/p>\n<p>We pass the NumPy array containing our cluster labels into the <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.Series.html\"><strong><em>.Series<\/em><\/strong><\/a> constructor to make it a 1-dimensional Pandas Series, and set the <em>index<\/em> argument to the index of the DataFrame so that each label lines up with the correct row.\u00a0 The resulting Series is assigned to a new column in the DataFrame called \u201cWard_ClusterID\u201d.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370372,"id":2370372,"title":"wards_sedf","filename":"wards_sedf.jpg","filesize":96761,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/wards_sedf","alt":"","author":"154341","description":"","caption":"","name":"wards_sedf","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:42:41","modified":"2024-06-10 
14:42:41","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":945,"height":392,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","medium-width":464,"medium-height":192,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","medium_large-width":768,"medium_large-height":319,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","large-width":945,"large-height":392,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","1536x1536-width":945,"1536x1536-height":392,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","2048x2048-width":945,"2048x2048-height":392,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf-826x343.jpg","card_image-width":826,"card_image-height":343,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_sedf.jpg","wide_image-width":945,"wide_image-height":392}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Last, we\u2019ll use the ArcGIS API for Python\u2019s <a href=\"https:\/\/developers.arcgis.com\/python\/guide\/part3-data-io-writing-data\/#write-to-a-local-file\"><strong><em>.to_featureclass()<\/em><\/strong><\/a> method to export the SeDF containing the Ward\u2019s Method cluster labels as a feature class in a geodatabase so we can use it for further analysis and mapping in ArcGIS 
Pro.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370392,"id":2370392,"title":"wards_export","filename":"wards_export.jpg","filesize":32070,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/wards_export","alt":"","author":"154341","description":"","caption":"","name":"wards_export","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:45:39","modified":"2024-06-10 14:45:39","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1122,"height":87,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export-213x87.jpg","thumbnail-width":213,"thumbnail-height":87,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","medium-width":464,"medium-height":36,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","medium_large-width":768,"medium_large-height":60,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","large-width":1122,"large-height":87,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","1536x1536-width":1122,"1536x1536-height":87,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","2048x2048-width":1122,"2048x2048-height":87,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export-826x64.jpg","card_image-width":826,"card_image-height":64,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_export.jpg","wide_image-width":1122,"wide_image-height":87}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2370
412,"id":2370412,"title":"wards_fc","filename":"wards_fc.jpg","filesize":164698,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/wards_fc","alt":"","author":"154341","description":"","caption":"Feature class attribute table containing the original input data with the Ward's Method cluster ID appended.","name":"wards_fc","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:46:02","modified":"2024-06-11 12:23:33","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1201,"height":450,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","medium-width":464,"medium-height":174,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","medium_large-width":768,"medium_large-height":288,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","large-width":1201,"large-height":450,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","1536x1536-width":1201,"1536x1536-height":450,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","2048x2048-width":1201,"2048x2048-height":450,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc-826x309.jpg","card_image-width":826,"card_image-height":309,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_fc.jpg","wide_image-width":1201,"wide_image-height":450}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","cont
ent":"<h3>Cluster analysis using ArcGIS Pro: K-means clustering<\/h3>\n<p>The second clustering technique used in the paper is K-means clustering, which is considered a form of <em>partitioning<\/em> clustering.\u00a0 Unlike Ward\u2019s Method discussed in the previous section, K-means is <em>nonhierarchical<\/em>, which means that it does not rely on the treelike hierarchy of clusters that is constructed through either agglomeration or division. Rather, K-means clustering typically begins with a set of randomly placed \u201cseed points\u201d that are iteratively refined into the final clusters.<\/p>\n<p>Initially, all observations in the dataset are assigned to the random seed point they are closest to (<em>closest<\/em> in multivariate attribute space).\u00a0 The mean center of each cluster is then calculated, and each observation is reassigned to whichever mean center is now closest.\u00a0 This process continues until individual data points are no longer being reassigned to different clusters (i.e., the clusters become stable).<\/p>\n<p>Unlike the hierarchical methods discussed above, one of the defining characteristics of partitioning methods is that the desired number of clusters is specified up front, via the seed points.\u00a0 For the purposes of this blog, we\u2019ll use the same number of clusters (13) in K-means as we used for Ward\u2019s Method above.<\/p>\n<p>In our notebook, we\u2019ll create a variable for our gridded precipitation dataset containing the PCA result columns, then use the ArcPy <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/arcpy\/functions\/listfields.htm\"><strong><em>ListFields()<\/em><\/strong><\/a> function to make sure we have all the necessary 
fields.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370442,"id":2370442,"title":"kmeans_input","filename":"kmeans_input.jpg","filesize":94479,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/kmeans_input","alt":"","author":"154341","description":"","caption":"","name":"kmeans_input","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:50:53","modified":"2024-06-10 14:50:53","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1284,"height":266,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","medium-width":464,"medium-height":96,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","medium_large-width":768,"medium_large-height":159,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","large-width":1284,"large-height":266,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","1536x1536-width":1284,"1536x1536-height":266,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","2048x2048-width":1284,"2048x2048-height":266,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input-826x171.jpg","card_image-width":826,"card_image-height":171,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_input.jpg","wide_image-width":1284,"wide_image-height":266}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","co
ntent":"<p>We\u2019ll then input this feature class into the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/multivariate-clustering.htm\"><strong>Multivariate Clustering<\/strong><\/a> tool, which implements the K-means clustering algorithm in ArcGIS Pro.\u00a0 Here is more detail on the most important parameters:<\/p>\n<ul>\n<li><em>analysis_fields<\/em> &#8211; the first three principal components, following the workflow in the paper and matching the inputs we used for Ward\u2019s Method<\/li>\n<li><em>clustering_method<\/em> &#8211; set to K-means; <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/spatial-statistics\/how-multivariate-clustering-works.htm#ESRI_SECTION1_5C0726507A4E49F9997BFB19219CE123\">K-medoids<\/a> is also an option<\/li>\n<li><em>initialization_method<\/em> \u2013 how the initial cluster seeds are created (random, optimized, or user-defined)<\/li>\n<li><em>number_of_clusters<\/em> \u2013 the desired number of clusters<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2370452,"id":2370452,"title":"kmeans_tool_2","filename":"kmeans_tool_2.jpg","filesize":124204,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/kmeans_tool_2","alt":"","author":"154341","description":"","caption":"","name":"kmeans_tool_2","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:53:23","modified":"2024-06-10 
14:53:23","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1134,"height":615,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","medium-width":464,"medium-height":252,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","medium_large-width":768,"medium_large-height":417,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","large-width":1134,"large-height":615,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","1536x1536-width":1134,"1536x1536-height":615,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","2048x2048-width":1134,"2048x2048-height":615,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2-826x448.jpg","card_image-width":826,"card_image-height":448,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_tool_2.jpg","wide_image-width":1134,"wide_image-height":615}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Post-processing<\/h2>\n<p>Before comparing and interpreting our final precipitation region maps, we\u2019ll perform a few post-processing steps to clean up our data.\u00a0 Both the Ward\u2019s and K-means clustering algorithms were run on vector (point) feature classes, so our first step is to convert their outputs to rasters.\u00a0 We\u2019ll use the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/conversion\/point-to-raster.htm\"><strong>Point to Raster<\/strong><\/a> tool to convert the 16km grid points to a raster, specifying a cell size of 0.166 (decimal degrees) to represent a 
16km by 16km raster cell.\u00a0 The\u00a0<em>Value field<\/em>\u00a0parameter specifies the attribute whose values are written to the output raster cells, which in this case is the cluster ID.\u00a0 The\u00a0<em>Cell assignment type<\/em>\u00a0parameter specifies how each output cell\u2019s value is derived from the grid points that fall within it.\u00a0 Unlike our previous use of this tool to spatially aggregate (i.e., calculate the spatial average of) a 4km grid dataset into a 16km grid, here we are not changing the resolution, so each raster cell simply takes the cluster ID of the grid point it contains.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370462,"id":2370462,"title":"point_to_raster_wards","filename":"point_to_raster_wards.jpg","filesize":47755,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/point_to_raster_wards","alt":"","author":"154341","description":"","caption":"","name":"point_to_raster_wards","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 14:59:59","modified":"2024-06-10 
14:59:59","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":486,"height":550,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","medium-width":231,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","medium_large-width":486,"medium_large-height":550,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","large-width":486,"large-height":550,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","1536x1536-width":486,"1536x1536-height":550,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","2048x2048-width":486,"2048x2048-height":550,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards-411x465.jpg","card_image-width":411,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/point_to_raster_wards.jpg","wide_image-width":486,"wide_image-height":550}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We apply the same steps to the K-means clustering output and symbolize both rasters with a color scheme appropriate for a categorical 
map.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2370482,"id":2370482,"title":"wards_noisy_legend","filename":"wards_noisy_legend.jpg","filesize":275278,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/wards_noisy_legend","alt":"","author":"154341","description":"","caption":"Ward\u2019s Method 13-cluster solution.","name":"wards_noisy_legend","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 15:06:01","modified":"2024-06-10 15:06:24","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1469,"height":1004,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","medium-width":382,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","medium_large-width":768,"medium_large-height":525,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","large-width":1469,"large-height":1004,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","1536x1536-width":1469,"1536x1536-height":1004,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","2048x2048-width":1469,"2048x2048-height":1004,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend-680x465.jpg","card_image-width":680,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_noisy_legend.jpg","wide_image-width":1469,"wide
_image-height":1004}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2370492,"id":2370492,"title":"kmeans_noisy_legend","filename":"kmeans_noisy_legend.jpg","filesize":228069,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/kmeans_noisy_legend","alt":"","author":"154341","description":"","caption":"K-means 13-cluster solution.","name":"kmeans_noisy_legend","status":"inherit","uploaded_to":2361222,"date":"2024-06-10 15:06:37","modified":"2024-06-10 15:06:53","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1469,"height":1003,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","medium-width":382,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","medium_large-width":768,"medium_large-height":524,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","large-width":1469,"large-height":1003,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","1536x1536-width":1469,"1536x1536-height":1003,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","2048x2048-width":1469,"2048x2048-height":1003,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_noisy_legend-681x465.jpg","card_image-width":681,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\
/app\/uploads\/2024\/06\/kmeans_noisy_legend.jpg","wide_image-width":1469,"wide_image-height":1003}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>In both maps, the precipitation regions look relatively homogeneous.\u00a0 However, several locations appear noisy and speckled due to the \u201csalt and pepper effect\u201d: isolated individual raster cells, or small groups of cells, belonging to one precipitation region (cluster) that sit within larger contiguous areas of another region.\u00a0 In the original paper, the authors addressed this issue by overlaying a 2\u00b0 by 2\u00b0 fishnet grid on top of the 16km gridded cluster maps and assigning each 2\u00b0 grid cell the majority cluster among the 16km cells that fall within it.\u00a0 For this blog, we\u2019ve taken a slightly different, more granular approach.<\/p>\n<p>We first convert the raster cluster maps to vector using the <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/conversion\/raster-to-polygon.htm\">Raster to Polygon<\/a><\/strong> tool, choosing the \u201cnon-simplified\u201d output so that the polygon edges follow the raster cell edges exactly.\u00a0 We then run the <strong><a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/data-management\/eliminate.htm\">Eliminate<\/a><\/strong> tool to clean up the noisy areas in our maps.\u00a0 This tool merges each selected small polygon into the neighboring polygon with which it shares the longest border or, alternatively, into the neighbor with the largest area.<\/p>\n<p>When using the <strong>Eliminate<\/strong> tool, the input polygon layer must have a selection applied that identifies the small polygons to be merged into their larger neighbors.\u00a0 To create this selection, I needed a threshold value (in this case, polygon area) below which polygons would be eliminated. 
\u00a0After some manual exploration of the cluster maps, I settled on threshold values of 9,200 km<sup>2<\/sup> for the Ward\u2019s cluster map and 10,300 km<sup>2<\/sup> for the K-means cluster map.\u00a0 These values were used in the <em>Expression<\/em> parameter of the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/data-management\/select-layer-by-attribute.htm\"><strong>Select Layer By Attribute<\/strong><\/a> tool to select the small polygons in each cluster map, and these layers were then passed to the <strong>Eliminate<\/strong> tool, which merged the selected polygons into their neighbors.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2372812,"id":2372812,"title":"select_layer_by_attribute_eliminate","filename":"select_layer_by_attribute_eliminate.jpg","filesize":38285,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/select_layer_by_attribute_eliminate","alt":"","author":"154341","description":"","caption":"","name":"select_layer_by_attribute_eliminate","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 11:34:39","modified":"2024-06-11 
11:34:39","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":420,"height":471,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","medium-width":233,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","medium_large-width":420,"medium_large-height":471,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","large-width":420,"large-height":471,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","1536x1536-width":420,"1536x1536-height":471,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","2048x2048-width":420,"2048x2048-height":471,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate-415x465.jpg","card_image-width":415,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/select_layer_by_attribute_eliminate.jpg","wide_image-width":420,"wide_image-height":471}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2372822,"id":2372822,"title":"side_by_side_eliminate","filename":"side_by_side_eliminate.jpg","filesize":282365,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/side_by_side_eliminate
","alt":"","author":"154341","description":"","caption":"Output cluster maps before (left) and after (right) the elimination of small polygons.","name":"side_by_side_eliminate","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 11:38:10","modified":"2024-06-11 11:38:46","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1888,"height":652,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","medium-width":464,"medium-height":160,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","medium_large-width":768,"medium_large-height":265,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","large-width":1888,"large-height":652,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate-1536x530.jpg","1536x1536-width":1536,"1536x1536-height":530,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","2048x2048-width":1888,"2048x2048-height":652,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate-826x285.jpg","card_image-width":826,"card_image-height":285,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/side_by_side_eliminate.jpg","wide_image-width":1888,"wide_image-height":652}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>The resulting dataset can then be input into the <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/tool-reference\/data-management\/dissolve.htm\"><strong>Dissolve<\/strong><\/a> tool to aggregate all 
polygons assigned to the same cluster into one polygon representing that cluster.<\/p>\n<h2>Results<\/h2>\n<p>Following these post-processing steps, the final output maps appear cleaner and much less noisy.\u00a0 Overall, both clustering approaches produce visually similar results, with some noticeable differences in the Northwest, Southeast, and Northeast.\u00a0 For example, portions of the Northwest are much less compact in the Ward\u2019s cluster map than in the K-means cluster map.<\/p>\n<p>Comparing either clustering solution to the original 9-region NCEI map shows that precipitation regions are much more compact and contiguous in the eastern US, while there is much more spatial heterogeneity and local variation west of the Rocky Mountains.\u00a0 These maps make it clear that a data-driven approach to delineating climate regions provides a much more robust picture of the climate geography of the US than simply grouping states together.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2372832,"id":2372832,"title":"wards_clean_legend","filename":"wards_clean_legend.jpg","filesize":241406,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/wards_clean_legend","alt":"","author":"154341","description":"","caption":"Ward\u2019s Method 13-cluster solution following post-processing.","name":"wards_clean_legend","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 11:47:00","modified":"2024-06-11 
11:47:11","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1486,"height":1005,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","medium-width":386,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","medium_large-width":768,"medium_large-height":519,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","large-width":1486,"large-height":1005,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","1536x1536-width":1486,"1536x1536-height":1005,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","2048x2048-width":1486,"2048x2048-height":1005,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend-688x465.jpg","card_image-width":688,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/wards_clean_legend.jpg","wide_image-width":1486,"wide_image-height":1005}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2372842,"id":2372842,"title":"kmeans_clean_legend","filename":"kmeans_clean_legend.jpg","filesize":229024,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/kmeans_clean_legend","alt":"","author":"154341","description":"","caption":"K-means 13-cluster solution following 
post-processing.","name":"kmeans_clean_legend","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 11:49:42","modified":"2024-06-11 11:49:51","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1485,"height":1004,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","medium-width":386,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","medium_large-width":768,"medium_large-height":519,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","large-width":1485,"large-height":1004,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","1536x1536-width":1485,"1536x1536-height":1004,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","2048x2048-width":1485,"2048x2048-height":1004,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend-688x465.jpg","card_image-width":688,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/kmeans_clean_legend.jpg","wide_image-width":1485,"wide_image-height":1004}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Other notable findings and observations:<\/p>\n<ul>\n<li>In general, the regions are spatially contiguous, even though both clustering algorithms operated on <em>only<\/em> the input attributes (i.e., no spatial constraints were enforced). 
\u00a0Clearly, the underlying geography matters!\u00a0 There are some exceptions, however: portions of the Southeast and South (cluster 2 &#8211; purple) belong to the same cluster in the Ward\u2019s map, while portions of the Southeast and Upper Midwest (cluster 9 \u2013 green) belong to the same cluster in the K-means map.<\/li>\n<li>Apart from the Pacific Northwest, average precipitation amount and frequency are highest in the Southeast and Northeast, and lowest in the southern West and Southwest.<\/li>\n<li>On average, Gini Coefficient values decrease from east to west across the US, while Lorenz Asymmetry Coefficient values increase slightly from east to west.\u00a0 This means that the East generally receives many small precipitation events, with more variability in the amount of precipitation from event to event, whereas the West receives fewer, larger precipitation events, each delivering a similar amount of precipitation.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2372852,"id":2372852,"title":"spring_precip_boxplot","filename":"spring_precip_boxplot.jpg","filesize":46569,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/spring_precip_boxplot","alt":"","author":"154341","description":"","caption":"","name":"spring_precip_boxplot","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 11:59:02","modified":"2024-06-11 
11:59:02","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1232,"height":381,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","medium-width":464,"medium-height":143,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","medium_large-width":768,"medium_large-height":238,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","large-width":1232,"large-height":381,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","1536x1536-width":1232,"1536x1536-height":381,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","2048x2048-width":1232,"2048x2048-height":381,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot-826x255.jpg","card_image-width":826,"card_image-height":255,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/spring_precip_boxplot.jpg","wide_image-width":1232,"wide_image-height":381}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2372862,"id":2372862,"title":"fall_gini_boxplot","filename":"fall_gini_boxplot.jpg","filesize":50991,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\/fall_gini_boxplot","alt":"","author":"154341","description":"","caption":"Boxplots showing the distribution of average spring precipitation (mm) 
and average fall Gini Coefficients for the 13 precipitation regions generated using the K-means clustering algorithm.","name":"fall_gini_boxplot","status":"inherit","uploaded_to":2361222,"date":"2024-06-11 12:01:09","modified":"2024-06-14 13:14:06","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1232,"height":380,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","medium-width":464,"medium-height":143,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","medium_large-width":768,"medium_large-height":237,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","large-width":1232,"large-height":380,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","1536x1536-width":1232,"1536x1536-height":380,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","2048x2048-width":1232,"2048x2048-height":380,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot-826x255.jpg","card_image-width":826,"card_image-height":255,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/fall_gini_boxplot.jpg","wide_image-width":1232,"wide_image-height":380}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Conclusion<\/h2>\n<p>This blog series detailed how you can combine Esri technology with open-source Python and R to perform an end-to-end spatial data science project.\u00a0 I hope that it not only taught you a few new techniques and workflows, but also inspired you to reproduce research and keep 
advancing the science of GIS!<\/p>\n"},{"acf_fc_layout":"sidebar","content":"<h2 style=\"text-align: left\">Spatial data science with R, Python, and ArcGIS<\/h2>\n<p>Here are the links to all the articles of the series:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-1-clustering-us-precipitation-regions\/\">Part 1<\/a>. Clustering US Precipitation Regions<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-2-data-preparation-and-data-engineering-using-r\/\">Part 2<\/a>. Data preparation and data engineering using R<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/\">Part 3<\/a>. Data preparation and data engineering using Python<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-4-data-preparation-using-spatial-analysis-and-automation-in-arcgis\/\" target=\"_blank\" rel=\"noopener\">Part 4<\/a>. Data preparation using spatial analysis and automation in ArcGIS<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\">Part 5<\/a>. 
Machine Learning: Cluster analysis using Python and ArcGIS<\/li>\n<\/ul>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false}],"related_articles":"","card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/cluster_map_resized.jpg","wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/AdobeStock_96810852_fixed-1.png","show_article_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-13T15:09:44+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"20 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\"},\"author\":{\"name\":\"Nicholas Giner\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2\"},\"headline\":\"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS\",\"datePublished\":\"2024-06-11T14:39:16+00:00\",\"dateModified\":\"2025-01-13T15:09:44+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\"},\"wordCount\":12,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"Data Engineering\",\"machine learning\",\"python\",\"r\",\"spatial data 
science\"],\"articleSection\":[\"Analytics\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\",\"name\":\"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2024-06-11T14:39:16+00:00\",\"dateModified\":\"2025-01-13T15:09:44+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and 
ArcGIS\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2\",\"name\":\"Nicholas Giner\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg\",\"caption\":\"Nicholas Giner\"},\"description\":\"Nick Giner is a Product Manager for Spatial Analysis and Data Science. 
Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.\",\"sameAs\":[\"www.linkedin.com\/in\/nicholas-giner-0282966b\",\"https:\/\/x.com\/NickGiner\"],\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","og_locale":"en_US","og_type":"article","og_title":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2025-01-13T15:09:44+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","twitter_misc":{"Est. 
reading time":"20 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis"},"author":{"name":"Nicholas Giner","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2"},"headline":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS","datePublished":"2024-06-11T14:39:16+00:00","dateModified":"2025-01-13T15:09:44+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis"},"wordCount":12,"commentCount":0,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["Data Engineering","machine learning","python","r","spatial data science"],"articleSection":["Analytics"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis","name":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and 
ArcGIS","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2024-06-11T14:39:16+00:00","dateModified":"2025-01-13T15:09:44+00:00","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/analytics\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"End-to-end spatial data science 5: Machine learning: Cluster analysis in Python and ArcGIS"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2","name":"Nicholas Giner","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg","caption":"Nicholas Giner"},"description":"Nick Giner is a Product Manager for Spatial Analysis and Data Science. Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. 
In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.","sameAs":["www.linkedin.com\/in\/nicholas-giner-0282966b","https:\/\/x.com\/NickGiner"],"url":"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer"}]}},"text_date":"June 11, 2024","author_name":"Nicholas Giner","author_page":"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2024\/06\/AdobeStock_96810852_fixed-1.png","primary_product":"ArcGIS API for Python","tag_data":[{"term_id":760452,"name":"Data Engineering","slug":"data-engineering","term_group":0,"term_taxonomy_id":760452,"taxonomy":"post_tag","description":"","parent":0,"count":34,"filter":"raw"},{"term_id":35661,"name":"machine learning","slug":"machine-learning","term_group":0,"term_taxonomy_id":35661,"taxonomy":"post_tag","description":"","parent":0,"count":41,"filter":"raw"},{"term_id":24341,"name":"python","slug":"python","term_group":0,"term_taxonomy_id":24341,"taxonomy":"post_tag","description":"","parent":0,"count":171,"filter":"raw"},{"term_id":30241,"name":"r","slug":"r","term_group":0,"term_taxonomy_id":30241,"taxonomy":"post_tag","description":"","parent":0,"count":19,"filter":"raw"},{"term_id":759592,"name":"spatial data science","slug":"spatial-data-science","term_group":0,"term_taxonomy_id":759592,"taxonomy":"post_tag","description":"","parent":0,"count":17,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1325,"filter":"raw"}],"product_data":[{"term_id":36841,"name":"ArcGIS API for Python","slug":"api-python","term_group":0,"term_taxonomy_id":36841,"taxonomy":"product","description":"","parent":36601,"count":151,"filter":"raw"},{"term_id":36561,"name":"ArcGIS 
Pro","slug":"arcgis-pro","term_group":0,"term_taxonomy_id":36561,"taxonomy":"product","description":"","parent":0,"count":2035,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=api-python","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2361222","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/154341"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2361222"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2361222\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2361222"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2361222"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2361222"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2361222"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2361222"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}