{"id":185671,"date":"2014-07-28T23:41:32","date_gmt":"2014-07-29T06:41:32","guid":{"rendered":"http:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=185671"},"modified":"2018-12-18T09:58:03","modified_gmt":"2018-12-18T17:58:03","slug":"setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","title":{"rendered":"Setting up a small budget Hadoop Cluster for Big Data Analysis"},"author":3981,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[23341,23851],"tags":[25351,25371,25381,25671,25391],"industry":[],"product":[],"class_list":["post-185671","blog","type-blog","status-publish","format-standard","hentry","category-analytics","category-data-management","tag-big-data","tag-geodata","tag-geodatabase","tag-github","tag-hadoop"],"acf":{"short_description":"At the 2014 Esri User Conference, the Big Data team gave several presentations, including two technical workshops","flexible_content":[{"acf_fc_layout":"content","content":"<p>At the 2014 Esri User Conference, the Big Data team gave several presentations, including two technical workshops entitled:\u00a0<em>\u2018Big Data and Analytics: The Fundamentals\u2019<\/em>\u00a0and\u00a0<em>\u2018Big Data and Analytics with ArcGIS\u2019<\/em>. We presented our open source GIS Tools for Hadoop (shared on\u00a0<a href=\"https:\/\/github.com\/Esri\/gis-tools-for-hadoop\">GitHub<\/a>), as well as some research that we\u2019re currently pursuing (exciting things to come!). 
We gave demos using both our open source tools and the prototype tools we are currently researching.<\/p>\n<p>For the demos (the <a href=\"http:\/\/chriswhong.com\/\">source data<\/a>\u00a0consisted of more than 170 million data points representing all the taxi cab trips in New York City in 2013), we ran all of our analytics on a Hadoop cluster back in Redlands. A twenty-node cluster may seem like a big investment (and it can be), but it doesn\u2019t have to be. Enter the DREDD cluster\u2026<span id=\"more-40236\"><\/span><\/p>\n<p>A few months ago we built the DREDD cluster at Esri for R&amp;D, using old computers that were destined for the dump (yes, it\u2019s named after the one-and-only Judge DREDD). Hadoop is described as being able to run on clusters of commodity hardware. What better test is there than a set of 5-7 year old desktops? The DREDD cluster is composed of twenty of these computers (called nodes in Hadoop), all of them free (read: FREE). We did, however, upgrade each one with a new, fast network card, more RAM, and a large, fast hard drive. These upgrades are fairly inexpensive, especially compared with new computers. Buying twenty new computers would have cost around $75,000. We were able to get the DREDD cluster up and running for much less than 1\/10th of that!<\/p>\n<p>With old computers you can expect occasional failures, but luckily for us (and you), Hadoop is built to be fault tolerant \u2013 meaning that your data is replicated across computers to protect against failures. 
So old hardware isn\u2019t a problem with the Hadoop infrastructure.<\/p>\n<p>From there we were able to set up our Hadoop cluster using documentation readily found online,\u00a0<a href=\"https:\/\/ambari.apache.org\/1.2.2\/installing-hadoop-using-ambari\/content\/\" target=\"_blank\" rel=\"noopener\">such as this<\/a>.<\/p>\n<p>As always, we would love to hear what you are doing with big data, what functionality you want to see in the future, and any questions or comments you have about setting up your own cluster using commodity hardware.<\/p>\n<p>Technical specs of the DREDD cluster:<br \/>\nOS: Linux (CentOS-6.5 distribution)<br \/>\nHard Drive: 1TB<br \/>\nRAM: 16 GB<br \/>\nCPU: Intel Xeon ~3.07GHz quad-core<br \/>\nSystem Deployment: Clonezilla<br \/>\nHadoop Management: Ambari<\/p>\n<p><em>Thanks to Sarah Ambrose from the Big Data portion of the geodatabase team for this post.<\/em><\/p>\n"}],"authors":[{"ID":3981,"user_firstname":"Jonathan","user_lastname":"Murphy","nickname":"Jonathan Murphy","user_nicename":"jonmurphy","display_name":"Jonathan Murphy","user_email":"jonathan_murphy@esri.com","user_url":"","user_registered":"2018-03-02 00:15:37","user_description":"Product Owner, UX Designer and Content Strategist on the Geodatabase team at Esri. 
\r\nWriter, musician, cockatiel whisperer and prolific world traveler.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/04\/J_Mu-213x200.png' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"related_articles":"","card_image":false,"wide_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Setting up a small budget Hadoop Cluster for Big Data Analysis<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Setting up a small budget Hadoop Cluster for Big Data Analysis\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta property=\"article:modified_time\" content=\"2018-12-18T17:58:03+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\"},\"author\":{\"name\":\"Jonathan Murphy\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/dec789ad68db472c6018c1c9068998be\"},\"headline\":\"Setting up a small budget Hadoop Cluster for Big Data Analysis\",\"datePublished\":\"2014-07-29T06:41:32+00:00\",\"dateModified\":\"2018-12-18T17:58:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\"},\"wordCount\":11,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"Big Data\",\"geodata\",\"geodatabase\",\"GitHub\",\"Hadoop\"],\"articleSection\":[\"Analytics\",\"Data Management\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\",\"name\":\"Setting up a small budget Hadoop Cluster for Big Data 
Analysis\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2014-07-29T06:41:32+00:00\",\"dateModified\":\"2018-12-18T17:58:03+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Setting up a small budget Hadoop Cluster for Big Data Analysis\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/dec789ad68db472c6018c1c9068998be\",\"name\":\"Jonathan Murphy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/04\/J_Mu-213x200.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/04\/J_Mu-213x200.png\",\"caption\":\"Jonathan Murphy\"},\"description\":\"Product Owner, UX Designer and Content Strategist on the Geodatabase team at Esri. Writer, musician, cockatiel whisperer and prolific world traveler.\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/jonmurphy\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Setting up a small budget Hadoop Cluster for Big Data Analysis","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","og_locale":"en_US","og_type":"article","og_title":"Setting up a small budget Hadoop Cluster for Big Data Analysis","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","article_modified_time":"2018-12-18T17:58:03+00:00","twitter_card":"summary_large_image","twitter_site":"@ESRI","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis"},"author":{"name":"Jonathan Murphy","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/dec789ad68db472c6018c1c9068998be"},"headline":"Setting up a small budget Hadoop Cluster for Big Data Analysis","datePublished":"2014-07-29T06:41:32+00:00","dateModified":"2018-12-18T17:58:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis"},"wordCount":11,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["Big Data","geodata","geodatabase","GitHub","Hadoop"],"articleSection":["Analytics","Data 
Management"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis","name":"Setting up a small budget Hadoop Cluster for Big Data Analysis","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2014-07-29T06:41:32+00:00","dateModified":"2018-12-18T17:58:03+00:00","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/product\/analytics\/setting-up-a-small-budget-hadoop-cluster-for-big-data-analysis#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Setting up a small budget Hadoop Cluster for Big Data Analysis"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product 
teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/dec789ad68db472c6018c1c9068998be","name":"Jonathan Murphy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/04\/J_Mu-213x200.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2020\/04\/J_Mu-213x200.png","caption":"Jonathan Murphy"},"description":"Product Owner, UX Designer and Content Strategist on the Geodatabase team at Esri. 
Writer, musician, cockatiel whisperer and prolific world traveler.","url":"https:\/\/www.esri.com\/arcgis-blog\/author\/jonmurphy"}]}},"text_date":"July 28, 2014","author_name":"Jonathan Murphy","author_page":"https:\/\/www.esri.com\/arcgis-blog\/author\/jonmurphy","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Newsroom-Keyart-Wide-1920-x-1080.jpg","primary_product":false,"tag_data":[{"term_id":25351,"name":"Big Data","slug":"big-data","term_group":0,"term_taxonomy_id":25351,"taxonomy":"post_tag","description":"","parent":0,"count":36,"filter":"raw"},{"term_id":25371,"name":"geodata","slug":"geodata","term_group":0,"term_taxonomy_id":25371,"taxonomy":"post_tag","description":"","parent":0,"count":10,"filter":"raw"},{"term_id":25381,"name":"geodatabase","slug":"geodatabase","term_group":0,"term_taxonomy_id":25381,"taxonomy":"post_tag","description":"","parent":0,"count":48,"filter":"raw"},{"term_id":25671,"name":"GitHub","slug":"github","term_group":0,"term_taxonomy_id":25671,"taxonomy":"post_tag","description":"","parent":0,"count":4,"filter":"raw"},{"term_id":25391,"name":"Hadoop","slug":"hadoop","term_group":0,"term_taxonomy_id":25391,"taxonomy":"post_tag","description":"","parent":0,"count":3,"filter":"raw"}],"category_data":[{"term_id":23341,"name":"Analytics","slug":"analytics","term_group":0,"term_taxonomy_id":23341,"taxonomy":"category","description":"","parent":0,"count":1333,"filter":"raw"},{"term_id":23851,"name":"Data 
Management","slug":"data-management","term_group":0,"term_taxonomy_id":23851,"taxonomy":"category","description":"","parent":0,"count":927,"filter":"raw"}],"product_data":[],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/185671","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/3981"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=185671"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/185671\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=185671"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=185671"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=185671"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=185671"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=185671"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}