{"id":2191552,"date":"2023-12-14T10:00:55","date_gmt":"2023-12-14T18:00:55","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2191552"},"modified":"2024-06-11T06:37:02","modified_gmt":"2024-06-11T13:37:02","slug":"end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","title":{"rendered":"End-to-end spatial data science 3: Data preparation and data engineering using Python"},"author":154341,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341],"tags":[760452,35661,24341,30241,759592],"industry":[],"product":[36841,36561],"class_list":["post-2191552","blog","type-blog","status-publish","format-standard","hentry","category-analytics","tag-data-engineering","tag-machine-learning","tag-python","tag-r","tag-spatial-data-science","product-api-python","product-arcgis-pro"],"acf":{"authors":[{"ID":154341,"user_firstname":"Nicholas","user_lastname":"Giner","nickname":"Nick Giner","user_nicename":"nginer","display_name":"Nicholas Giner","user_email":"NGiner@esri.com","user_url":"","user_registered":"2021-01-07 14:31:25","user_description":"Nick Giner is a Product Manager for Spatial Analysis and Data Science.  Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. 
In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"This is the third in a series of blogs that showcase an end-to-end spatial data science workflow for clustering US precipitation regions.","flexible_content":[{"acf_fc_layout":"content","content":"<h2>Introduction<\/h2>\n<p>In the <a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-2-data-preparation-and-data-engineering-using-r\/\">previous blog article<\/a>, we used R to process a 30-year daily precipitation dataset (~11,000 rasters) into a collection of 120 CSV files, where each CSV file contains seasonal calculations of four precipitation variables at each location in a 4km by 4km gridded dataset of the US.\u00a0 For example, the x\/y location in the first row of this CSV file experienced precipitation in 14 of the 90 days in the Fall of 2010, totaling about 188 millimeters of precipitation.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2193972,"id":2193972,"title":"csv_example_3","filename":"csv_example_3.jpg","filesize":513909,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/csv_example_3","alt":"","author":"154341","description":"","caption":"Example of the summer\/fall 2010 CSV file with precipitation variable calculations.","name":"csv_example_3","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 15:59:16","modified":"2023-12-12 
15:59:23","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1787,"height":558,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","medium-width":464,"medium-height":145,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","medium_large-width":768,"medium_large-height":240,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","large-width":1787,"large-height":558,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3-1536x480.jpg","1536x1536-width":1536,"1536x1536-height":480,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","2048x2048-width":1787,"2048x2048-height":558,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3-826x258.jpg","card_image-width":826,"card_image-height":258,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_example_3.jpg","wide_image-width":1787,"wide_image-height":558}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>In this blog article, we\u2019ll use Python* to do some additional data engineering steps to further aggregate the data by calculating the long-term, 30-year averages of each of the four precipitation variables, for each season.\u00a0 Our final dataset will therefore contain 16 precipitation variables for each location in the 4km by 4km grid.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"<p><strong>Note:<\/strong> The appendix section of this blog article has steps to replicate the Python code in R, if that is your language of 
choice.<\/p>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"content","content":"<h2>Python libraries<\/h2>\n<p>Just as we did in the previous blog post, our first few lines of code will be for loading the Python libraries we require for our task.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194752,"id":2194752,"title":"python_libs","filename":"python_libs-1.jpg","filesize":29948,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/python_libs-2","alt":"","author":"154341","description":"","caption":"","name":"python_libs-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:29:38","modified":"2023-12-12 17:29:38","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":606,"height":158,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1-213x158.jpg","thumbnail-width":213,"thumbnail-height":158,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","medium-width":464,"medium-height":121,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","medium_large-width":606,"medium_large-height":158,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","large-width":606,"large-height":158,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","1536x1536-width":606,"1536x1536-height":158,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","2048x2048-width":606,"2048x2048-height":158,"card
_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","card_image-width":606,"card_image-height":158,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/python_libs-1.jpg","wide_image-width":606,"wide_image-height":158}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li><a href=\"https:\/\/pandas.pydata.org\/\">pandas<\/a> \u2013 working with and manipulating tabular data<\/li>\n<li><a href=\"https:\/\/developers.arcgis.com\/python\/\">arcgis<\/a> \u2013 the ArcGIS API for Python, used to convert ArcGIS data to Pandas DataFrames<\/li>\n<li><a href=\"https:\/\/docs.python.org\/3\/library\/glob.html\">glob<\/a> \u2013 working with file paths<\/li>\n<li><a href=\"https:\/\/docs.python.org\/3\/library\/os.html\">os<\/a> \u2013 working with operating system files and file directories<\/li>\n<\/ul>\n<h2>Combining multiple CSVs into one Pandas DataFrame<\/h2>\n<p>The first thing we need to do is have a look inside the folder of CSVs we created in the previous blog.\u00a0 We\u2019ll create a variable for the folder location where the CSVs are stored, then use the glob library to search for files in this folder location that match a certain pattern, in this case all files that end with a .csv file extension.\u00a0 The result is stored as a list with 120 elements representing the 120 seasonal CSV files.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194762,"id":2194762,"title":"csv_concat","filename":"csv_concat.jpg","filesize":150047,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/csv_concat","alt":"","author":"154341","description":"","caption":"","name":"csv_concat","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 
17:35:57","modified":"2023-12-12 17:35:57","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1811,"height":676,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","medium-width":464,"medium-height":173,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","medium_large-width":768,"medium_large-height":287,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","large-width":1811,"large-height":676,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat-1536x573.jpg","1536x1536-width":1536,"1536x1536-height":573,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","2048x2048-width":1811,"2048x2048-height":676,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat-826x308.jpg","card_image-width":826,"card_image-height":308,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_concat.jpg","wide_image-width":1811,"wide_image-height":676}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Next, we\u2019ll use the Pandas <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.read_csv.html\"><em>.read_csv<\/em><\/a><\/strong> function in a Python list comprehension to read each CSV file into a Pandas DataFrame, then use the Pandas <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.concat.html\"><em>.concat<\/em><\/a><\/strong>\u00a0function to combine all 120 DataFrames into one large DataFrame.<\/p>\n"},{"acf_fc_layout":"sidebar","content":"<p><strong>Note:<\/strong> Adding the 
%%time <a href=\"https:\/\/ipython.readthedocs.io\/en\/stable\/interactive\/magics.html\">magic command<\/a> at the top of a notebook cell calculates the runtime of the cell.\u00a0 This is useful when you have cells in your notebook that take longer than a few seconds to run.\u00a0 I\u2019ll use this in most of the cells in this notebook, as the dataset I\u2019ll be working with contains over 57 million records and some of the data engineering steps take a few minutes.<\/p>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false},{"acf_fc_layout":"image","image":{"ID":2194782,"id":2194782,"title":"csv_properties","filename":"csv_properties.jpg","filesize":38104,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/csv_properties","alt":"","author":"154341","description":"","caption":"","name":"csv_properties","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:39:47","modified":"2023-12-12 
17:39:47","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":670,"height":139,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties-213x139.jpg","thumbnail-width":213,"thumbnail-height":139,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","medium-width":464,"medium-height":96,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","medium_large-width":670,"medium_large-height":139,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","large-width":670,"large-height":139,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","1536x1536-width":670,"1536x1536-height":139,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","2048x2048-width":670,"2048x2048-height":139,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","card_image-width":670,"card_image-height":139,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/csv_properties.jpg","wide_image-width":670,"wide_image-height":139}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We can have a look at the dataset dimensions (rows and columns) using the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.shape.html\"><em>.shape<\/em><\/a><\/strong> property, then use the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.head.html\"><em>.head<\/em><\/a><\/strong>\u00a0function to look at the first few rows of the DataFrame.\u00a0 You should recognize the precipitation variables from the columns in the CSV files, then double check that the total of 57,795,720 rows 
corresponds to the 30-year seasonal time series at each location (481,631 locations * 120 seasons).<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194792,"id":2194792,"title":"merged_df","filename":"merged_df-2.jpg","filesize":152466,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/merged_df-3","alt":"","author":"154341","description":"","caption":"","name":"merged_df-3","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:40:48","modified":"2023-12-12 17:40:48","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1282,"height":405,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","medium-width":464,"medium-height":147,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","medium_large-width":768,"medium_large-height":243,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","large-width":1282,"large-height":405,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","1536x1536-width":1282,"1536x1536-height":405,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","2048x2048-width":1282,"2048x2048-height":405,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2-826x261.jpg","card_image-width":826,"card_image-height":261,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-2.jpg","wide_image-width":1282,"wide_image-height":405}},"image_position":"cente
r","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Last, let\u2019s check whether we have any missing data.\u00a0 For this, I\u2019ll use the <em><strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.isnull.html\">.isnull<\/a><\/strong><\/em> function to find missing values in the DataFrame, then the <a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.sum.html\"><em><strong>.sum<\/strong><\/em><\/a> function to add them up for each column.\u00a0 The axis parameter sets the direction of the sum: axis=0 adds down the rows to give one total per column, and axis=1 adds across the columns to give one total per row.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194812,"id":2194812,"title":"df_nulls","filename":"df_nulls.jpg","filesize":48464,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/df_nulls","alt":"","author":"154341","description":"","caption":"","name":"df_nulls","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:42:54","modified":"2023-12-12 
17:42:54","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":434,"height":391,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","medium-width":290,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","medium_large-width":434,"medium_large-height":391,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","large-width":434,"large-height":391,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","1536x1536-width":434,"1536x1536-height":391,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","2048x2048-width":434,"2048x2048-height":391,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","card_image-width":434,"card_image-height":391,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/df_nulls.jpg","wide_image-width":434,"wide_image-height":391}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Here we can see that four of the columns contain about 0.25% missing values, which by most standards is negligible and will not have an impact on the final results.\u00a0 In our case, we know that these missing values are expected because they are locations that <em>experienced zero precipitation<\/em> <em>throughout an entire season<\/em>, and therefore will not have measures of variability (Gini Coefficient) or inequity (Lorenz Asymmetry Coefficient) within a season.<\/p>\n<h2>Data engineering: Calculating the 30-year averages of the four seasonal precipitation variables<\/h2>\n<p>At this point, we have a 
120-season time series of each of the four precipitation variables at each location in this dataset, where the season is indicated by the \u201cseason_year\u201d column.\u00a0 We ultimately want to calculate a single average of each of the precipitation variables for each season, at each location, so the resulting dataset will contain four seasonal averages (winter, spring, summer, fall) of each precipitation variable, at each location.<\/p>\n<p>We\u2019ll first create a new \u201cseason\u201d column by stripping off the year from the \u201cseason_year\u201d column. \u00a0This is achieved using the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Series.str.split.html\"><em>.str.split<\/em><\/a><\/strong> function, which splits a text string based on a separator or delimiter (in this case, a space) and then returns the first index position, which here is only the season from the \u201cseason_year\u201d column.\u00a0 We can then use the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Series.unique.html\"><em>.unique<\/em><\/a><\/strong> function on the new column to verify that the four seasons are the only values available.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194832,"id":2194832,"title":"season_column","filename":"season_column.jpg","filesize":159552,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/season_column","alt":"","author":"154341","description":"","caption":"","name":"season_column","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:45:59","modified":"2023-12-12 
17:45:59","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1288,"height":474,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","medium-width":464,"medium-height":171,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","medium_large-width":768,"medium_large-height":283,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","large-width":1288,"large-height":474,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","1536x1536-width":1288,"1536x1536-height":474,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","2048x2048-width":1288,"2048x2048-height":474,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column-826x304.jpg","card_image-width":826,"card_image-height":304,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/season_column.jpg","wide_image-width":1288,"wide_image-height":474}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2194842,"id":2194842,"title":"unique","filename":"unique.jpg","filesize":29580,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/unique-2","alt":"","author":"154341","description":"","caption":"","name":"unique-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:47:06","modified":"2023-12-12 
17:47:06","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":594,"height":158,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique-213x158.jpg","thumbnail-width":213,"thumbnail-height":158,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","medium-width":464,"medium-height":123,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","medium_large-width":594,"medium_large-height":158,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","large-width":594,"large-height":158,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","1536x1536-width":594,"1536x1536-height":158,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","2048x2048-width":594,"2048x2048-height":158,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","card_image-width":594,"card_image-height":158,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/unique.jpg","wide_image-width":594,"wide_image-height":158}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We now have what we need to calculate one single average for each precipitation variable, for each season, at each location.\u00a0 We\u2019ll use <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.groupby.html\"><em>.groupby<\/em><\/a><\/strong> to create groupings of data, then <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.agg.html\"><em>.agg<\/em><\/a><\/strong> to apply an operation to the groupings.\u00a0 In our case, we\u2019re grouping the data by combination of location and season (e.g. 
four seasons at each location), then taking the averages of all four precipitation variables within each season.\u00a0 The <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.reset_index.html\"><em>.reset_index<\/em><\/a><\/strong> method at the end ensures that the DataFrame has a numeric index that starts at 0 and increases by 1 for each subsequent row.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194872,"id":2194872,"title":"agg_groupby","filename":"agg_groupby.jpg","filesize":237111,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/agg_groupby","alt":"","author":"154341","description":"","caption":"","name":"agg_groupby","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:51:03","modified":"2023-12-12 17:51:03","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1175,"height":669,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","medium-width":458,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","medium_large-width":768,"medium_large-height":437,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","large-width":1175,"large-height":669,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","1536x1536-width":1175,"1536x1536-height":669,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","2048x2048-width":1175,"2048x2048-height":669,"card_image":"http
s:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby-817x465.jpg","card_image-width":817,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/agg_groupby.jpg","wide_image-width":1175,"wide_image-height":669}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>We can see here that the DataFrame has been reduced from 57,795,720 rows (4 seasons x 30 years x 481,631 locations) to 1,926,524 rows containing the average of each of the four precipitation variables, for each season, in each location (4 seasonal averages x 481,631 locations).<\/p>\n<p>While this step has drastically reduced the number of rows in the table, it is still a <em>long table<\/em>, meaning that there is duplicate information in the table rows.\u00a0 In this case, there are four rows for each location (\u201ccoordinates\u201d column), with each of these rows representing the seasonal average of the four precipitation variables.\u00a0 We\u2019ll next use the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.pivot_table.html\"><em>.pivot_table<\/em><\/a><\/strong> function to flip the table from long to wide, such that each row represents an individual location and the columns contain the corresponding season\/precipitation variable information.\u00a0 The <em>index<\/em> parameter specifies which column becomes the row index of the resulting DataFrame, which in this case is the \u201ccoordinates\u201d column.\u00a0 We specify \u201cseason\u201d in the <em>columns<\/em> parameter, and the <em>values <\/em>parameter contains the four precipitation variables.\u00a0 The resulting DataFrame now contains one row for each location and a total of 16 precipitation variables (4 variables x 4 
seasons).<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194882,"id":2194882,"title":"reshape_pivot","filename":"reshape_pivot.jpg","filesize":72318,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/reshape_pivot","alt":"","author":"154341","description":"","caption":"","name":"reshape_pivot","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:53:32","modified":"2023-12-12 17:53:32","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1246,"height":181,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot-213x181.jpg","thumbnail-width":213,"thumbnail-height":181,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","medium-width":464,"medium-height":67,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","medium_large-width":768,"medium_large-height":112,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","large-width":1246,"large-height":181,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","1536x1536-width":1246,"1536x1536-height":181,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","2048x2048-width":1246,"2048x2048-height":181,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot-826x120.jpg","card_image-width":826,"card_image-height":120,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pivot.jpg","wide_image-width":1246,"wide_image-height":181}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"
image","image":{"ID":2194892,"id":2194892,"title":"reshape_pibot_results","filename":"reshape_pibot_results.jpg","filesize":319160,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/reshape_pibot_results","alt":"","author":"154341","description":"","caption":"","name":"reshape_pibot_results","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:54:31","modified":"2023-12-12 17:54:31","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1308,"height":692,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","medium-width":464,"medium-height":245,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","medium_large-width":768,"medium_large-height":406,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","large-width":1308,"large-height":692,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","1536x1536-width":1308,"1536x1536-height":692,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","2048x2048-width":1308,"2048x2048-height":692,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results-826x437.jpg","card_image-width":826,"card_image-height":437,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/reshape_pibot_results.jpg","wide_image-width":1308,"wide_image-height":692}},"image_position":"
center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>You might notice that the resulting DataFrame appears to have two header rows (the precipitation variable, with four seasons under it).\u00a0 The pivot table step actually results in something called a MultiIndex DataFrame (i.e. one with a hierarchical index), which means that it can have multiple levels.\u00a0 In this example, the two levels are the precipitation variable and the season.\u00a0 We had to use one more step to flatten the MultiIndex, which we achieved by converting the MultiIndex DataFrame to a NumPy record array, then back to a single-index DataFrame using the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.to_records.html\"><em>.to_records<\/em><\/a><\/strong> method.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194902,"id":2194902,"title":"flatten","filename":"flatten.jpg","filesize":191951,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/flatten","alt":"","author":"154341","description":"","caption":"","name":"flatten","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:55:57","modified":"2023-12-12 
17:55:57","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1296,"height":525,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","medium-width":464,"medium-height":188,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","medium_large-width":768,"medium_large-height":311,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","large-width":1296,"large-height":525,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","1536x1536-width":1296,"1536x1536-height":525,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","2048x2048-width":1296,"2048x2048-height":525,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten-826x335.jpg","card_image-width":826,"card_image-height":335,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten.jpg","wide_image-width":1296,"wide_image-height":525}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>As you can see, the column names contain parentheses and single quotes, and it is always a best practice to remove such special characters.\u00a0 We can print all the column names using the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.columns.html#pandas.DataFrame.columns\"><em>.columns<\/em><\/a><\/strong> attribute, then pass a dict into the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.DataFrame.rename.html\"><em>.rename<\/em><\/a><\/strong> function, where each key is the original column name mapped to each value as the new column 
name.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194912,"id":2194912,"title":"flatten_cols","filename":"flatten_cols.jpg","filesize":99370,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/flatten_cols","alt":"","author":"154341","description":"","caption":"","name":"flatten_cols","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:57:42","modified":"2023-12-12 17:57:42","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":730,"height":286,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","medium-width":464,"medium-height":182,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","medium_large-width":730,"medium_large-height":286,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","large-width":730,"large-height":286,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","1536x1536-width":730,"1536x1536-height":286,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","2048x2048-width":730,"2048x2048-height":286,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","card_image-width":730,"card_image-height":286,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/flatten_cols.jpg","wide_image-width":730,"wide_image-height":286}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2194922,
"id":2194922,"title":"rename_cols","filename":"rename_cols.jpg","filesize":311431,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/rename_cols","alt":"","author":"154341","description":"","caption":"","name":"rename_cols","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 17:58:56","modified":"2023-12-12 17:58:56","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1298,"height":708,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","medium-width":464,"medium-height":253,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","medium_large-width":768,"medium_large-height":419,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","large-width":1298,"large-height":708,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","1536x1536-width":1298,"1536x1536-height":708,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","2048x2048-width":1298,"2048x2048-height":708,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols-826x451.jpg","card_image-width":826,"card_image-height":451,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rename_cols.jpg","wide_image-width":1298,"wide_image-height":708}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>At this point, we have nearly everything we need to proceed with 
the next parts of our analysis.\u00a0 Each row in the DataFrame contains 16 columns representing 30-year seasonal averages of four different precipitation variables (4 seasons x 4 variables), along with a \u201ccoordinates\u201d column indicating one of the 481,631 locations in the dataset.\u00a0 Obviously, this is the column that we\u2019ll use to get this information on the map, but there are a few more simple yet crucial steps we have to do first.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194932,"id":2194932,"title":"coords_before_split","filename":"coords_before_split.jpg","filesize":180773,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/coords_before_split","alt":"","author":"154341","description":"","caption":"","name":"coords_before_split","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:00:54","modified":"2023-12-12 
18:00:54","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":539,"height":583,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","medium-width":241,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","medium_large-width":539,"medium_large-height":583,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","large-width":539,"large-height":583,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","1536x1536-width":539,"1536x1536-height":583,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","2048x2048-width":539,"2048x2048-height":583,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split-430x465.jpg","card_image-width":430,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/coords_before_split.jpg","wide_image-width":539,"wide_image-height":583}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>The first step is to create two new columns representing the x- and y-coordinates of each location.\u00a0 Recall earlier in this blog that we used the <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.Series.str.split.html\"><em>.str.split<\/em><\/a><\/strong> function to split a text string based on a separator or delimiter, and we\u2019ll do the same thing again here to split the values in the \u201ccoordinates\u201d column based on the space between them.\u00a0 This results in two new 
individual columns, &#8220;x_coord&#8221; and &#8220;y_coord&#8221;.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2194952,"id":2194952,"title":"split_coords","filename":"split_coords.jpg","filesize":126644,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/split_coords","alt":"","author":"154341","description":"","caption":"","name":"split_coords","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:03:14","modified":"2023-12-12 18:03:14","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1296,"height":357,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","medium-width":464,"medium-height":128,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","medium_large-width":768,"medium_large-height":212,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","large-width":1296,"large-height":357,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","1536x1536-width":1296,"1536x1536-height":357,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","2048x2048-width":1296,"2048x2048-height":357,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords-826x228.jpg","card_image-width":826,"card_image-height":228,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/split_coords.jpg","wide_image-width":1296,"wide_image-height":357}},"image_position":"center","orientation"
:"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>With our x- and y-coordinates now in their own columns, we can use the ArcGIS API for Python to convert the DataFrame into a <a href=\"https:\/\/developers.arcgis.com\/python\/guide\/part1-introduction-to-sedf\/\">Spatially Enabled DataFrame<\/a> (SeDF).\u00a0 A SeDF is essentially a Pandas DataFrame with an additional SHAPE column representing the geometry of each row.\u00a0 This means that it can be used non-spatially in traditional Pandas operations, but can also be displayed on a map and used in true spatial operations such as buffers, distance calculations, and more.<\/p>\n<p>Here, we pass the DataFrame into the <strong><a href=\"https:\/\/developers.arcgis.com\/python\/guide\/part2-data-io-reading-data\/#read-in-dataframe-with-latlong-information\"><em>.from_xy<\/em><\/a><\/strong> method on the ArcGIS API for Python\u2019s <a href=\"https:\/\/developers.arcgis.com\/python\/api-reference\/arcgis.features.toc.html#geoaccessor\">GeoAccessor<\/a> class, specifying \u201cx_coord\u201d and \u201cy_coord\u201d as the x- and y-columns, respectively, as well as the appropriate coordinate system.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2195002,"id":2195002,"title":"sedf_table","filename":"sedf_table.jpg","filesize":164066,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/sedf_table","alt":"","author":"154341","description":"","caption":"","name":"sedf_table","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:15:28","modified":"2023-12-12 
18:15:28","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1298,"height":498,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","medium-width":464,"medium-height":178,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","medium_large-width":768,"medium_large-height":295,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","large-width":1298,"large-height":498,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","1536x1536-width":1298,"1536x1536-height":498,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","2048x2048-width":1298,"2048x2048-height":498,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table-826x317.jpg","card_image-width":826,"card_image-height":317,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_table.jpg","wide_image-width":1298,"wide_image-height":498}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2195012,"id":2195012,"title":"sedf_info","filename":"sedf_info-1.jpg","filesize":187909,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/sedf_info-2","alt":"","author":"154341","description":"","caption":"Note the addition of the \"SHAPE\" field to the DataFrame, and its \"geometry\" data type.","name":"sedf_info-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 
18:16:23","modified":"2023-12-13 22:23:21","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":529,"height":586,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","medium-width":236,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","medium_large-width":529,"medium_large-height":586,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","large-width":529,"large-height":586,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","1536x1536-width":529,"1536x1536-height":586,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","2048x2048-width":529,"2048x2048-height":586,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1-420x465.jpg","card_image-width":420,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_info-1.jpg","wide_image-width":529,"wide_image-height":586}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Last, we\u2019ll use the ArcGIS API for Python\u2019s <strong><a href=\"https:\/\/developers.arcgis.com\/python\/guide\/part3-data-io-writing-data\/#write-to-a-local-file\"><em>.to_featureclass<\/em><\/a><\/strong> method to export the final, cleaned SeDF as a feature class in a geodatabase so we can use it in further analysis in ArcGIS 
Pro.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2195052,"id":2195052,"title":"sedf_to_fc","filename":"sedf_to_fc-1.jpg","filesize":34841,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/sedf_to_fc-2","alt":"","author":"154341","description":"","caption":"","name":"sedf_to_fc-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:20:49","modified":"2023-12-12 18:20:49","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1107,"height":65,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1-213x65.jpg","thumbnail-width":213,"thumbnail-height":65,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","medium-width":464,"medium-height":27,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","medium_large-width":768,"medium_large-height":45,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","large-width":1107,"large-height":65,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","1536x1536-width":1107,"1536x1536-height":65,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","2048x2048-width":1107,"2048x2048-height":65,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1-826x49.jpg","card_image-width":826,"card_image-height":49,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sedf_to_fc-1.jpg","wide_image-width":1107,"wide_image-height":65}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Final 
thoughts<\/h2>\n<p>At this point, we now have a feature class that represents a gridded point dataset covering the contiguous United States.\u00a0 Each of the 481,631 grid points corresponds to the raster cell centroid of the original 4km by 4km PRISM precipitation dataset.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2195072,"id":2195072,"title":"gridded_dataset","filename":"gridded_dataset-2.jpg","filesize":232983,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/gridded_dataset-3","alt":"","author":"154341","description":"","caption":"Example of the 4km by 4km gridded dataset overlaid on the average precipitation raster from 1981-2010.","name":"gridded_dataset-3","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:24:25","modified":"2023-12-12 18:24:31","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1166,"height":758,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","medium-width":401,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","medium_large-width":768,"medium_large-height":499,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","large-width":1166,"large-height":758,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","1536x1536-width":1166,"1536x1536-height":758,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","2048x2048-widt
h":1166,"2048x2048-height":758,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2-715x465.jpg","card_image-width":715,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/gridded_dataset-2.jpg","wide_image-width":1166,"wide_image-height":758}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p>Each of these points also contains 16 columns representing the 30-year seasonal averages of the four precipitation variables calculated in Blog #2 in the series.\u00a0 Using ArcGIS Pro\u2019s <a href=\"https:\/\/pro.arcgis.com\/en\/pro-app\/latest\/help\/analysis\/geoprocessing\/data-engineering\/what-is-data-engineering.htm\">Data Engineering view<\/a>, we can see and explore the summary statistics and distributions of each variable in the feature class.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2195082,"id":2195082,"title":"DE_view_1","filename":"DE_view_1.jpg","filesize":207542,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/de_view_1","alt":"","author":"154341","description":"","caption":"ArcGIS Pro Data Engineering view.  
The 4km by 4km dataset contains 481,631 locations.","name":"de_view_1","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:26:38","modified":"2023-12-12 18:26:44","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1133,"height":565,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","medium-width":464,"medium-height":231,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","medium_large-width":768,"medium_large-height":383,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","large-width":1133,"large-height":565,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","1536x1536-width":1133,"1536x1536-height":565,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","2048x2048-width":1133,"2048x2048-height":565,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1-826x412.jpg","card_image-width":826,"card_image-height":412,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/DE_view_1.jpg","wide_image-width":1133,"wide_image-height":565}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<h2>Appendix<\/h2>\n<p>Or if you prefer&#8230; do it in R!<\/p>\n<p><strong>1. 
R packages<\/strong><\/p>\n<ul>\n<li style=\"list-style-type: none\">\n<ul>\n<li>{<a href=\"https:\/\/cran.r-project.org\/web\/packages\/dplyr\/index.html\">dplyr<\/a>} \u2013 working with and manipulating data frames<\/li>\n<li>{<a href=\"https:\/\/cran.r-project.org\/web\/packages\/tidyr\/index.html\">tidyr<\/a>} \u2013 creating \u201ctidy\u201d data (cleaning, wrangling, manipulating data frames)<\/li>\n<li>{<a href=\"https:\/\/www.rdocumentation.org\/packages\/data.table\/versions\/1.8.0\/topics\/data.table\">data.table<\/a>} \u2013 working with data frames<\/li>\n<li>{<a href=\"https:\/\/www.tidyverse.org\/\">tidyverse<\/a>} \u2013 collection of R packages for data science<\/li>\n<li>{<a href=\"https:\/\/rstudio.github.io\/reticulate\/\">reticulate<\/a>} \u2013 provides interoperability between R and Python, allowing you to call Python functions from within R scripts.\u00a0 This allows you to call ArcPy geoprocessing tools from within R, for example.<\/li>\n<li>{<a href=\"https:\/\/github.com\/R-ArcGIS\/r-bridge\">arcgisbinding<\/a>} \u2013 the R-ArcGIS Bridge, which connects ArcGIS Pro and R<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195322,"id":2195322,"title":"r_packages2","filename":"r_packages2.jpg","filesize":53113,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/r_packages2","alt":"","author":"154341","description":"","caption":"","name":"r_packages2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:54:27","modified":"2023-12-12 
18:54:27","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":653,"height":205,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","medium-width":464,"medium-height":146,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","medium_large-width":653,"medium_large-height":205,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","large-width":653,"large-height":205,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","1536x1536-width":653,"1536x1536-height":205,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","2048x2048-width":653,"2048x2048-height":205,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","card_image-width":653,"card_image-height":205,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/r_packages2.jpg","wide_image-width":653,"wide_image-height":205}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>2. Combine all CSV files into one R data frame<\/strong><\/p>\n<ul>\n<li><strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.6.2\/topics\/list.files\"><em>list.files<\/em><\/a><\/strong>\u00a0\u2013 returns a list of all files or directories at a specified path<\/li>\n<li>%&gt;% is a pipe operator, which makes it easy and efficient to chain multiple functions together in R. 
The input of each successive function is the output of the previous function.\u00a0 Here, the list of 90 CSVs is piped into the <strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.6.2\/topics\/lapply\"><em>lapply<\/em><\/a><\/strong> function, which is used to apply a function over a List or Vector in R.\u00a0 In this case, the <strong><a href=\"https:\/\/readr.tidyverse.org\/reference\/read_delim.html\"><em>read_csv <\/em><\/a><\/strong>function is applied to each of the 90 CSV files in the list and converts it to a <a href=\"https:\/\/tibble.tidyverse.org\/\">tibble<\/a>, which is a modern version of the R data frame.\u00a0 The 90 tibbles are then piped into the <strong><a href=\"https:\/\/dplyr.tidyverse.org\/reference\/bind.html\"><em>bind_rows<\/em><\/a><\/strong> function, which is used to bind (combine) many data frames into one.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195432,"id":2195432,"title":"merged_df (1)","filename":"merged_df-1-1.jpg","filesize":86266,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/merged_df-1","alt":"","author":"154341","description":"","caption":"","name":"merged_df-1","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:03:33","modified":"2023-12-12 
19:03:33","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1800,"height":331,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","medium-width":464,"medium-height":85,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","medium_large-width":768,"medium_large-height":141,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","large-width":1800,"large-height":331,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1-1536x282.jpg","1536x1536-width":1536,"1536x1536-height":282,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","2048x2048-width":1800,"2048x2048-height":331,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1-826x152.jpg","card_image-width":826,"card_image-height":152,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df-1-1.jpg","wide_image-width":1800,"wide_image-height":331}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2195442,"id":2195442,"title":"merged_df_results","filename":"merged_df_results.jpg","filesize":184587,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/merged_df_results","alt":"","author":"154341","description":"","caption":"","name":"merged_df_results","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:04:29","modified":"2023-12-12 
19:04:29","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1369,"height":301,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","medium-width":464,"medium-height":102,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","medium_large-width":768,"medium_large-height":169,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","large-width":1369,"large-height":301,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","1536x1536-width":1369,"1536x1536-height":301,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","2048x2048-width":1369,"2048x2048-height":301,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results-826x182.jpg","card_image-width":826,"card_image-height":182,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/merged_df_results.jpg","wide_image-width":1369,"wide_image-height":301}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>The final combined data frame has the expected 57,795,720 rows (120 seasons x 481,631 locations) and 11 columns.<\/li>\n<\/ul>\n<p><strong>3. Data engineering and wrangling<\/strong><\/p>\n<ul>\n<li>Create a new \u201cseason\u201d column that includes only the season (not year) from the \u201cseason_year\u201d column. 
This requires use of the <strong><a href=\"https:\/\/www.programmingr.com\/tutorial\/sub-in-r\/\"><em>sub<\/em><\/a><\/strong> function, which is used to replace one string with another.\u00a0 In the code below, we\u2019re using a <a href=\"https:\/\/cran.r-project.org\/web\/packages\/stringr\/vignettes\/regular-expressions.html\">regular expression<\/a> within the <strong><em>sub<\/em><\/strong>\u00a0function to replace a text string pattern (e.g. \u201cwinter 1981\u201d) with only the first word in the text string pattern (e.g. \u201cwinter\u201d) and then populate a new \u201cseason\u201d column with this value.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195362,"id":2195362,"title":"sub_regex","filename":"sub_regex-1.jpg","filesize":27301,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/sub_regex-2","alt":"","author":"154341","description":"","caption":"","name":"sub_regex-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 18:57:15","modified":"2023-12-12 
18:57:15","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":742,"height":126,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1-213x126.jpg","thumbnail-width":213,"thumbnail-height":126,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","medium-width":464,"medium-height":79,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","medium_large-width":742,"medium_large-height":126,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","large-width":742,"large-height":126,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","1536x1536-width":742,"1536x1536-height":126,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","2048x2048-width":742,"2048x2048-height":126,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","card_image-width":742,"card_image-height":126,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex-1.jpg","wide_image-width":742,"wide_image-height":126}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2195462,"id":2195462,"title":"sub_regex_results","filename":"sub_regex_results-1.jpg","filesize":143764,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/sub_regex_results-2","alt":"","author":"154341","description":"","caption":"","name":"sub_regex_results-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:06:12","modified":"2023-12-12 
19:06:12","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1203,"height":204,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","medium-width":464,"medium-height":79,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","medium_large-width":768,"medium_large-height":130,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","large-width":1203,"large-height":204,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","1536x1536-width":1203,"1536x1536-height":204,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","2048x2048-width":1203,"2048x2048-height":204,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1-826x140.jpg","card_image-width":826,"card_image-height":140,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/sub_regex_results-1.jpg","wide_image-width":1203,"wide_image-height":204}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>Check for null values. 
We use <strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.6.2\/topics\/lapply\"><em>sapply<\/em><\/a><\/strong> to apply a function that identifies nulls (<a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.6.2\/topics\/NA\"><em><strong>is.na<\/strong><\/em><\/a>) and then totals (<strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.6.2\/topics\/sum\"><em>sum<\/em><\/a><\/strong>) the missing values for each column in the data frame.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195472,"id":2195472,"title":"nulls_R","filename":"nulls_R-1.jpg","filesize":10053,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/nulls_r-2","alt":"","author":"154341","description":"","caption":"","name":"nulls_r-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:07:27","modified":"2023-12-12 
19:07:27","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":455,"height":52,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1-213x52.jpg","thumbnail-width":213,"thumbnail-height":52,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","medium-width":455,"medium-height":52,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","medium_large-width":455,"medium_large-height":52,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","large-width":455,"large-height":52,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","1536x1536-width":455,"1536x1536-height":52,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","2048x2048-width":455,"2048x2048-height":52,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","card_image-width":455,"card_image-height":52,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R-1.jpg","wide_image-width":455,"wide_image-height":52}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2195482,"id":2195482,"title":"nulls_R_printout","filename":"nulls_R_printout.jpg","filesize":52495,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/nulls_r_printout","alt":"","author":"154341","description":"","caption":"","name":"nulls_r_printout","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:09:31","modified":"2023-12-12 
19:09:31","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1320,"height":96,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout-213x96.jpg","thumbnail-width":213,"thumbnail-height":96,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","medium-width":464,"medium-height":34,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","medium_large-width":768,"medium_large-height":56,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","large-width":1320,"large-height":96,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","1536x1536-width":1320,"1536x1536-height":96,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","2048x2048-width":1320,"2048x2048-height":96,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout-826x60.jpg","card_image-width":826,"card_image-height":60,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/nulls_R_printout.jpg","wide_image-width":1320,"wide_image-height":96}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>Calculate one single average for each precipitation variable, for each season, at each location. 
For this, we\u2019ll rely again on the <strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/stats\/versions\/3.6.2\/topics\/aggregate\"><em>aggregate<\/em><\/a><\/strong> function to help us calculate summary statistics on subsets of data.\u00a0 Here, we calculate the average precipitation (\u201cprecip\u201d), average number of precipitation days (\u201cfrequency\u201d), average Gini Coefficient (\u201cgini_coef\u201d) and average Lorenz Asymmetry Coefficient (\u201clorenz_coef\u201d) for each of the four seasons, at every location in the dataset.\u00a0 The FUN argument specifies the summary statistic that is applied to each subset (location\/season pair), which in this case is the mean.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195512,"id":2195512,"title":"R_aggregate","filename":"R_aggregate-1.jpg","filesize":80224,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/r_aggregate-2","alt":"","author":"154341","description":"","caption":"","name":"r_aggregate-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:11:09","modified":"2023-12-12 
19:11:09","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1204,"height":141,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1-213x141.jpg","thumbnail-width":213,"thumbnail-height":141,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","medium-width":464,"medium-height":54,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","medium_large-width":768,"medium_large-height":90,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","large-width":1204,"large-height":141,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","1536x1536-width":1204,"1536x1536-height":141,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","2048x2048-width":1204,"2048x2048-height":141,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1-826x97.jpg","card_image-width":826,"card_image-height":97,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_aggregate-1.jpg","wide_image-width":1204,"wide_image-height":141}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>Calculating the seasonal averages of each of the four precipitation variables reduces our data frame from 57,795,720 rows (481,631 locations x 4 seasons x 30 years) to 1,926,524 rows (481,631 locations x 4 seasonal averages at each 
location).<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195522,"id":2195522,"title":"R_seasonal_agg_results","filename":"R_seasonal_agg_results-1.jpg","filesize":235690,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/r_seasonal_agg_results-2","alt":"","author":"154341","description":"","caption":"","name":"r_seasonal_agg_results-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:13:13","modified":"2023-12-12 19:13:13","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":811,"height":453,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","medium-width":464,"medium-height":259,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","medium_large-width":768,"medium_large-height":429,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","large-width":811,"large-height":453,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","1536x1536-width":811,"1536x1536-height":453,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","2048x2048-width":811,"2048x2048-height":453,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.jpg","card_image-width":811,"card_image-height":453,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_seasonal_agg_results-1.j
pg","wide_image-width":811,"wide_image-height":453}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>Flip the table from long to wide, such that each row represents one location, and the four seasonal averages for each of the four precipitation variables become the columns. Here we use the <strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/maditr\/versions\/0.8.3\/topics\/dcast\"><em>dcast<\/em><\/a><\/strong> function to reshape the data frame, which is essentially the same as using <strong><a href=\"https:\/\/pandas.pydata.org\/docs\/reference\/api\/pandas.pivot_table.html\"><em>.pivot_table<\/em><\/a><\/strong> with Pandas in Python.\u00a0 In the <em>formula<\/em> parameter, the left-hand side specifies the variable to keep as the row identifier (e.g. \u201ccoordinates\u201d) and the right-hand side specifies the variable whose values are spread across the new columns (\u201cseason\u201d).\u00a0 In other words, instead of each location having four rows (one per season) and four columns (one per precipitation variable), the output data frame will have one row for each location and 16 columns, one for each season\/precipitation variable combination.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195552,"id":2195552,"title":"dcast","filename":"dcast.jpg","filesize":70478,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/dcast","alt":"","author":"154341","description":"","caption":"","name":"dcast","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:18:29","modified":"2023-12-12 
19:18:29","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1034,"height":88,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast-213x88.jpg","thumbnail-width":213,"thumbnail-height":88,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","medium-width":464,"medium-height":39,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","medium_large-width":768,"medium_large-height":65,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","large-width":1034,"large-height":88,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","1536x1536-width":1034,"1536x1536-height":88,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","2048x2048-width":1034,"2048x2048-height":88,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast-826x70.jpg","card_image-width":826,"card_image-height":70,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/dcast.jpg","wide_image-width":1034,"wide_image-height":88}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>As expected, the output data frame contains one row for each of the 481,631 locations, with 17 columns representing the location coordinates plus 16 precipitation 
variables.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195582,"id":2195582,"title":"R_dcast_results","filename":"R_dcast_results-1.jpg","filesize":233382,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/r_dcast_results-2","alt":"","author":"154341","description":"","caption":"","name":"r_dcast_results-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:21:40","modified":"2023-12-12 19:21:40","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":863,"height":401,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","medium-width":464,"medium-height":216,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","medium_large-width":768,"medium_large-height":357,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","large-width":863,"large-height":401,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","1536x1536-width":863,"1536x1536-height":401,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","2048x2048-width":863,"2048x2048-height":401,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1-826x384.jpg","card_image-width":826,"card_image-height":384,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_dcast_results-1.jpg","wide_image-width":863,"wide_image-height":401}},"image_position":"center","ori
entation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<ul>\n<li>As we did in the first blog in this series, we\u2019ll again use the <strong><a href=\"https:\/\/www.rdocumentation.org\/packages\/tidyr\/versions\/1.3.0\/topics\/separate\"><em>separate<\/em><\/a><\/strong> function to split the \u201ccoordinates\u201d column into two new columns representing the x- and y-coordinates for each location individually.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195592,"id":2195592,"title":"R_final_df","filename":"R_final_df-1.jpg","filesize":42715,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/r_final_df-2","alt":"","author":"154341","description":"","caption":"","name":"r_final_df-2","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:22:44","modified":"2023-12-12 
19:22:44","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":796,"height":143,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1-213x143.jpg","thumbnail-width":213,"thumbnail-height":143,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","medium-width":464,"medium-height":83,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","medium_large-width":768,"medium_large-height":138,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","large-width":796,"large-height":143,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","1536x1536-width":796,"1536x1536-height":143,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","2048x2048-width":796,"2048x2048-height":143,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","card_image-width":796,"card_image-height":143,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/R_final_df-1.jpg","wide_image-width":796,"wide_image-height":143}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"image","image":{"ID":2195612,"id":2195612,"title":"final_R_table","filename":"final_R_table.jpg","filesize":332163,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/final_r_table","alt":"","author":"154341","description":"","caption":"","name":"final_r_table","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:24:08","modified":"2023-12-12 
19:24:08","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1188,"height":496,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","medium-width":464,"medium-height":194,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","medium_large-width":768,"medium_large-height":321,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","large-width":1188,"large-height":496,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","1536x1536-width":1188,"1536x1536-height":496,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","2048x2048-width":1188,"2048x2048-height":496,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table-826x345.jpg","card_image-width":826,"card_image-height":345,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/final_R_table.jpg","wide_image-width":1188,"wide_image-height":496}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>4. Export data for further analysis in ArcGIS using the R-ArcGIS Bridge<\/strong><\/p>\n<ul>\n<li>Last, we\u2019ll use the <a href=\"https:\/\/www.esri.com\/en-us\/arcgis\/products\/r-arcgis-bridge\/overview\">R-ArcGIS Bridge<\/a> {arcgisbinding} package to convert our R data frame to a feature class within a file geodatabase. 
The <em><strong><a href=\"https:\/\/rdrr.io\/github\/R-ArcGIS\/r-bridge\/man\/arc.write.html\">arc.write<\/a><\/strong><\/em> function allows you to export several different R objects (data frames, spatial {sp} and {sf} objects, {raster} objects, etc.).\u00a0 In our case, we\u2019ll pass in the R data frame, specify the x- and y-coordinate columns in the <em>coords<\/em> parameter, and pass in \u201cPoint\u201d type and the appropriate spatial reference into the <em>shape_info<\/em> parameter.\u00a0 This last step creates a new file geodatabase feature class that we can use for further analysis in ArcGIS Pro.<\/li>\n<\/ul>\n"},{"acf_fc_layout":"image","image":{"ID":2195622,"id":2195622,"title":"rbridge_write","filename":"rbridge_write.jpg","filesize":59203,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/rbridge_write","alt":"","author":"154341","description":"","caption":"","name":"rbridge_write","status":"inherit","uploaded_to":2191552,"date":"2023-12-12 19:26:04","modified":"2023-12-12 
19:26:04","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":986,"height":76,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write-213x76.jpg","thumbnail-width":213,"thumbnail-height":76,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","medium-width":464,"medium-height":36,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","medium_large-width":768,"medium_large-height":59,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","large-width":986,"large-height":76,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","1536x1536-width":986,"1536x1536-height":76,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","2048x2048-width":986,"2048x2048-height":76,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write-826x64.jpg","card_image-width":826,"card_image-height":64,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/rbridge_write.jpg","wide_image-width":986,"wide_image-height":76}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"sidebar","content":"<h2 style=\"text-align: left\">Spatial data science with R, Python, and ArcGIS<\/h2>\n<p>Here are the links to all the articles of the series:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-1-clustering-us-precipitation-regions\/\">Part 1<\/a>. Clustering US Precipitation Regions<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-2-data-preparation-and-data-engineering-using-r\/\">Part 2<\/a>. 
Data preparation and data engineering using R<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\/\">Part 3<\/a>. Data preparation and data engineering using Python<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-4-data-preparation-using-spatial-analysis-and-automation-in-arcgis\/\" target=\"_blank\" rel=\"noopener\">Part 4<\/a>. Data preparation using spatial analysis and automation in ArcGIS<\/li>\n<li><a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-5-machine-learning-cluster-analysis-in-python-and-arcgis\">Part 5<\/a>. Machine Learning: Cluster analysis using Python and ArcGIS<\/li>\n<\/ul>\n","image_reference":false,"layout":"standard","image_reference_figure":"","snippet":"","spotlight_name":"","section_title":"","position":"Center","spotlight_image":false}],"related_articles":"","card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/cluster_map_resized.jpg","wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/AdobeStock_96810852_fixed-2.png"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>End-to-end spatial data science 3: Data preparation and data engineering using Python<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"End-to-end spatial 
data science 3: Data preparation and data engineering using Python\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-11T13:37:02+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"19 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\"},\"author\":{\"name\":\"Nicholas Giner\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2\"},\"headline\":\"End-to-end spatial data science 3: Data preparation and data engineering using Python\",\"datePublished\":\"2023-12-14T18:00:55+00:00\",\"dateModified\":\"2024-06-11T13:37:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\"},\"wordCount\":11,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"Data Engineering\",\"machine learning\",\"python\",\"r\",\"spatial data 
science\"],\"articleSection\":[\"Analytics\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\",\"name\":\"End-to-end spatial data science 3: Data preparation and data engineering using Python\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2023-12-14T18:00:55+00:00\",\"dateModified\":\"2024-06-11T13:37:02+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"End-to-end spatial data science 3: Data preparation and data engineering using 
Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2\",\"name\":\"Nicholas Giner\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg\",\"caption\":\"Nicholas Giner\"},\"description\":\"Nick Giner is a Product Manager for Spatial Analysis and Data Science. 
Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.\",\"sameAs\":[\"www.linkedin.com\/in\/nicholas-giner-0282966b\",\"https:\/\/x.com\/NickGiner\"],\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"End-to-end spatial data science 3: Data preparation and data engineering using Python","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","og_locale":"en_US","og_type":"article","og_title":"End-to-end spatial data science 3: Data preparation and data engineering using Python","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2024-06-11T13:37:02+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","twitter_misc":{"Est. 
reading time":"19 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python"},"author":{"name":"Nicholas Giner","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2"},"headline":"End-to-end spatial data science 3: Data preparation and data engineering using Python","datePublished":"2023-12-14T18:00:55+00:00","dateModified":"2024-06-11T13:37:02+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python"},"wordCount":11,"commentCount":0,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["Data Engineering","machine learning","python","r","spatial data science"],"articleSection":["Analytics"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python","name":"End-to-end spatial data science 3: Data preparation and data engineering using 
Python","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2023-12-14T18:00:55+00:00","dateModified":"2024-06-11T13:37:02+00:00","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/analytics\/end-to-end-spatial-data-science-3-data-preparation-and-data-engineering-using-python#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"End-to-end spatial data science 3: Data preparation and data engineering using Python"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/2dc4741deea59d3274cfa775e52501b2","name":"Nicholas Giner","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2021\/01\/headshot-e1610030307989-213x200.jpeg","caption":"Nicholas Giner"},"description":"Nick Giner is a Product Manager for Spatial Analysis and Data Science. Prior to joining Esri in 2014, he completed Bachelor\u2019s and PhD degrees in Geography from Penn State University and Clark University, respectively. 
In his spare time, he likes to play guitar, golf, cook, cut the grass, and read\/watch shows about history.","sameAs":["www.linkedin.com\/in\/nicholas-giner-0282966b","https:\/\/x.com\/NickGiner"],"url":"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer"}]}},"text_date":"December 14, 2023","author_name":"Nicholas Giner","author_page":"https:\/\/www.esri.com\/arcgis-blog\/author\/nginer","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2023\/12\/AdobeStock_96810852_fixed-2.png","primary_product":"ArcGIS Pro","tag_data":[{"term_id":760452,"name":"Data Engineering","slug":"data-engineering","term_group":0,"term_taxonomy_id":760452,"taxonomy":"post_tag","description":"","parent":0,"count":34,"filter":"raw"},{"term_id":35661,"name":"machine learning","slug":"machine-learning","term_group":0,"term_taxonomy_id":35661,"taxonomy":"post_tag","description":"","parent":0,"count":41,"filter":"raw"},{"term_id":24341,"name":"python","slug":"python","term_group":0,"term_taxonomy_id":24341,"taxonomy":"post_tag","description":"","parent":0,"count":171,"filter":"raw"},{"term_id":30241,"name":"r","slug":"r","term_group":0,"term_taxonomy_id":30241,"taxonomy":"post_tag","description":"","parent":0,"count":19,"filter":"raw"},{"term_id":759592,"name":"spatial data science","slug":"spatial-data-science","term_group":0,"term_taxonomy_id":759592,"taxonomy":"post_tag","description":"","parent":0,"count":17,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1329,"filter":"raw"}],"product_data":[{"term_id":36841,"name":"ArcGIS API for Python","slug":"api-python","term_group":0,"term_taxonomy_id":36841,"taxonomy":"product","description":"","parent":36601,"count":151,"filter":"raw"},{"term_id":36561,"name":"ArcGIS 
Pro","slug":"arcgis-pro","term_group":0,"term_taxonomy_id":36561,"taxonomy":"product","description":"","parent":0,"count":2036,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=arcgis-pro","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2191552","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/154341"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2191552"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2191552\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2191552"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2191552"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2191552"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2191552"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2191552"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}