{"id":2928111,"date":"2025-08-14T00:15:19","date_gmt":"2025-08-14T07:15:19","guid":{"rendered":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2928111"},"modified":"2025-08-14T00:15:19","modified_gmt":"2025-08-14T07:15:19","slug":"vision-language-models-geospatial-analysis","status":"publish","type":"blog","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","title":{"rendered":"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis"},"author":8452,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_acf_changed":false,"_searchwp_excluded":""},"categories":[770712,22931],"tags":[25891,665211,780681,758872,780671],"industry":[],"product":[767992,36951,36551,36561],"class_list":["post-2928111","blog","type-blog","status-publish","format-standard","hentry","category-geoai","category-imagery","tag-arcgis","tag-geoai","tag-llms","tag-pretrained-models","tag-vision-language-models","product-arcgis-image-for-arcgis-online","product-image-server","product-arcgis-online","product-arcgis-pro"],"acf":{"authors":[{"ID":8452,"user_firstname":"Vinay","user_lastname":"Viswambharan","nickname":"Vinay Viswambharan","user_nicename":"vinayv","display_name":"Vinay Viswambharan","user_email":"vinayv@esri.com","user_url":"https:\/\/www.esri.com\/arcgis-blog\/author\/vinayv\/","user_registered":"2018-10-04 22:28:54","user_description":"Principal Product manager on the Imagery team at Esri, with a zeal for remote sensing, AI and everything imagery.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/10\/vin4.png' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"},{"ID":6911,"user_firstname":"Rohit","user_lastname":"Singh","nickname":"Rohit Singh","user_nicename":"rsinghesri-com","display_name":"Rohit 
Singh","user_email":"rsingh@esri.com","user_url":"","user_registered":"2018-03-02 00:19:00","user_description":"Rohit Singh is Director of Esri\u2019s R&amp;D Center in New Delhi, leading the design and development of Geospatial AI capabilities across the ArcGIS platform. He has played a key role in the development of ArcGIS API for Python, ArcGIS Java Engine API, and the Linux enablement of ArcGIS. An alumnus of IIT Kharagpur, Rohit holds an MS in Computer Science with specialization in AI from Georgia Tech.","user_avatar":"<img data-del=\"avatar\" src='https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/RohitSingh_AISummit2025-213x200.jpeg' class='avatar pp-user-avatar avatar-96 photo ' height='96' width='96'\/>"}],"short_description":"Unlock the power of vision\u2013language AI in ArcGIS to analyze imagery with simple prompts\u2014no training, labels, or coding required.","flexible_content":[{"acf_fc_layout":"content","content":"<p>In recent years, the surge in sensor data from drones, satellites, and aerial platforms has made automated feature extraction increasingly important. Artificial intelligence is now playing a key role in turning this raw geospatial data into actionable information, enabling faster processing and deeper insights.<\/p>\n<p>This is where pretrained AI models play a pivotal role. Through the ArcGIS Living Atlas, users can access over 100 ready-to-use deep learning models purpose-built for GIS workflows &#8211; whether it\u2019s extracting building footprints, detecting objects, or mapping land cover change. Models like Prithvi Weather &amp; Climate (W&amp;C) go even further, enabling advanced applications like regional weather forecasting.<\/p>\n<p>These pretrained models put the power of AI into everyone\u2019s hands. You don\u2019t need to be a data scientist or train models from scratch. 
Just plug them into your workflows using out-of-the-box tools in ArcGIS Pro and ArcGIS Online and get high-quality results at scale.<\/p>\n"},{"acf_fc_layout":"content","content":"<p><strong>Meet the Next Generation: Vision-Language Models<\/strong><\/p>\n<p>We\u2019re entering a new era of AI &#8211; one where vision\u2013language models can extract features directly from imagery using nothing but simple English prompts. This exciting new capability is making geospatial analysis more accessible and intuitive than ever.<\/p>\n<p>While task-specific models will continue to play an important role, we\u2019re introducing a new class of AI models to the ArcGIS ecosystem: <strong>vision\u2013language models<\/strong>. Unlike the task-specific models built for a single purpose &#8211; such as detecting trees or segmenting roads &#8211; these models are true multi-taskers. They can interpret both imagery and language, and respond intelligently to natural language instructions.<\/p>\n<p>Imagine uploading an aerial image and simply asking:<\/p>\n<ul>\n<li>\u201cWhat do you see?\u201d \u2192 Returns a descriptive caption.<\/li>\n<li>\u201cSegment the lake\u201d \u2192 Outlines the water body.<\/li>\n<li>\u201cClassify these images into forest, urban, and agriculture\u201d \u2192 Instantly categorizes them.<\/li>\n<\/ul>\n<p>No model training. No labeling. 
Just prompt &#8211; and you\u2019re ready to go!<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929271,"id":2929271,"title":"Vision language Models integrated with ArcGIS","filename":"Vision-language-Models.jpg","filesize":363073,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/vision-language-models","alt":"Vision Language Models integrated with ArcGIS","author":"8452","description":"Vision Language Models integrated with ArcGIS","caption":"Vision Language Models integrated with ArcGIS","name":"vision-language-models","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 22:03:03","modified":"2025-08-14 06:47:37","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":1865,"height":822,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","medium-width":464,"medium-height":205,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","medium_large-width":768,"medium_large-height":338,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","large-width":1865,"large-height":822,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models-1536x677.jpg","1536x1536-width":1536,"1536x1536-height":677,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","2048x2048-width":1865,"2048x2048-height":822,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models-826x364.jpg","card_imag
e-width":826,"card_image-height":364,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models.jpg","wide_image-width":1865,"wide_image-height":822}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>Real Examples in Action<\/strong><\/p>\n<p>We\u2019ve integrated several Vision-Language models directly into ArcGIS. Here\u2019s a glimpse at what\u2019s now possible:<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Image Interrogation<\/strong><\/p>\n<p>Ask, \u201cWhat do you see in this image?\u201d and get back a full description of visible features\u2014roads, rivers, buildings, clouds, vegetation, even man-made structures.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2928271,"id":2928271,"title":"Image Interrogation model describing what it sees in an image","filename":"ImageInterrogation.jpg","filesize":116480,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/imageinterrogation","alt":"","author":"8452","description":"Image Interrogation model describing what it sees in an image","caption":"Image Interrogation model describing what it sees in an image","name":"imageinterrogation","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 17:06:34","modified":"2025-08-14 
06:48:12","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":949,"height":454,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","medium-width":464,"medium-height":222,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","medium_large-width":768,"medium_large-height":367,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","large-width":949,"large-height":454,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","1536x1536-width":949,"1536x1536-height":454,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","2048x2048-width":949,"2048x2048-height":454,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation-826x395.jpg","card_image-width":826,"card_image-height":395,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ImageInterrogation.jpg","wide_image-width":949,"wide_image-height":454}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>Vision-Language Context Based Classification<\/strong><\/p>\n<p>Prompt the model with labels like \u201cdamaged building,\u201d \u201cintact building,\u201d \u201cdebris\u201d, and it can classify image features accordingly. 
This can be especially useful in post-disaster scenarios.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929061,"id":2929061,"title":"Vision Language Context-Based Classification model classifying whether a parcel has a swimming pool or not","filename":"Vision-Language-Context-Based-Classification.jpg","filesize":149626,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/vision-language-context-based-classification","alt":"Vision Language Context-Based Classification model classifying whether a parcel has a swimming pool or not","author":"8452","description":"Vision Language Context-Based Classification model classifying whether a parcel has a swimming pool or not","caption":"Vision Language Context-Based Classification model classifying whether a parcel has a swimming pool or not","name":"vision-language-context-based-classification","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 20:56:30","modified":"2025-08-14 
06:48:46","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":778,"height":472,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","medium-width":430,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","medium_large-width":768,"medium_large-height":466,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","large-width":778,"large-height":472,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","1536x1536-width":778,"1536x1536-height":472,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","2048x2048-width":778,"2048x2048-height":472,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification-766x465.jpg","card_image-width":766,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-Language-Context-Based-Classification.jpg","wide_image-width":778,"wide_image-height":472}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>Grounding DINO<\/strong><\/p>\n<p>Describe the features you\u2019d like the model to detect, such as \u201csolar panels\u201d or \u201cships\u201d, and the model returns spatially grounded detections in the form of GIS layers.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929071,"id":2929071,"title":"Grounding 
DINO model being used to detect airplanes","filename":"GroundingDino.jpg","filesize":119017,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/groundingdino","alt":"","author":"8452","description":"Grounding DINO model being used to detect airplanes","caption":"Grounding DINO model being used to detect airplanes","name":"groundingdino","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 20:58:49","modified":"2025-08-14 06:49:19","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":773,"height":474,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","medium-width":426,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","medium_large-width":768,"medium_large-height":471,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","large-width":773,"large-height":474,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","1536x1536-width":773,"1536x1536-height":474,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","2048x2048-width":773,"2048x2048-height":474,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino-758x465.jpg","card_image-width":758,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/GroundingDino.jpg","wide_image-width":773,"wide_image-height":474}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content",
"content":"<p><strong>Zero-Shot Classification<\/strong><\/p>\n<p>The zero-shot classification models classify an entire image into one of the provided text labels. They use the model\u2019s pre-trained knowledge of image and text relationships to classify images based on your provided class names, such as \u201cflood\u201d, \u201cfire\u201d or \u201clandslide\u201d.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929081,"id":2929081,"title":"Zero Shot Classification model being used to classify drone images","filename":"ZeroShotClassification.jpg","filesize":65058,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/zeroshotclassification","alt":"Zero Shot Classification model being used to classify drone images","author":"8452","description":"Zero Shot Classification model being used to classify drone images","caption":"Zero Shot Classification model being used to classify drone images","name":"zeroshotclassification","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 21:02:32","modified":"2025-08-14 
06:49:53","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":713,"height":491,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","medium-width":379,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","medium_large-width":713,"medium_large-height":491,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","large-width":713,"large-height":491,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","1536x1536-width":713,"1536x1536-height":491,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","2048x2048-width":713,"2048x2048-height":491,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification-675x465.jpg","card_image-width":675,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/ZeroShotClassification.jpg","wide_image-width":713,"wide_image-height":491}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>Prompt-based Segmentation<\/strong><\/p>\n<p>With this model, you can segment features like lakes, agriculture zones, or flooded areas by simply asking. 
This makes it perfect for exploratory analysis or rapid mapping.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929181,"id":2929181,"title":"Prompt Based Segmentation model being used to segment debris after a hurricane","filename":"PromptBasedSegmentation.jpg","filesize":103824,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/promptbasedsegmentation","alt":"Prompt Based Segmentation model being used to segment debris after a hurricane","author":"8452","description":"Prompt Based Segmentation model being used to segment debris after a hurricane","caption":"Prompt Based Segmentation model being used to segment debris after a hurricane","name":"promptbasedsegmentation","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 21:49:13","modified":"2025-08-14 06:50:33","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":713,"height":517,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","medium-width":360,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","medium_large-width":713,"medium_large-height":517,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","large-width":713,"large-height":517,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","1536x1536-width":713,"1536x1536-height":517,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","2048x2048-width":713
,"2048x2048-height":517,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation-641x465.jpg","card_image-width":641,"card_image-height":465,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/PromptBasedSegmentation.jpg","wide_image-width":713,"wide_image-height":517}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>TextSAM<\/strong><\/p>\n<p>This model is great at extracting objects with clear boundaries and distinct shapes, such as cars, trees, buildings, etc. Prompt it with natural language &#8211; \u201cround objects, oil tanks\u201d &#8211; and it responds with pixel-accurate segmentation masks of oil tanks in the imagery.<\/p>\n"},{"acf_fc_layout":"image","image":{"ID":2929201,"id":2929201,"title":"Text SAM model being used to segment airplanes","filename":"TextSAM.jpg","filesize":77899,"url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","link":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\/textsam-4","alt":"Text SAM model being used to segment airplanes","author":"8452","description":"Text SAM model being used to segment airplanes","caption":"Text SAM model being used to segment airplanes","name":"textsam-4","status":"inherit","uploaded_to":2928111,"date":"2025-08-13 21:51:52","modified":"2025-08-14 
06:50:55","menu_order":0,"mime_type":"image\/jpeg","type":"image","subtype":"jpeg","icon":"https:\/\/www.esri.com\/arcgis-blog\/wp-includes\/images\/media\/default.png","width":618,"height":462,"sizes":{"thumbnail":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM-213x200.jpg","thumbnail-width":213,"thumbnail-height":200,"medium":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","medium-width":349,"medium-height":261,"medium_large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","medium_large-width":618,"medium_large-height":462,"large":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","large-width":618,"large-height":462,"1536x1536":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","1536x1536-width":618,"1536x1536-height":462,"2048x2048":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","2048x2048-width":618,"2048x2048-height":462,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","card_image-width":618,"card_image-height":462,"wide_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/TextSAM.jpg","wide_image-width":618,"wide_image-height":462}},"image_position":"center","orientation":"horizontal","hyperlink":""},{"acf_fc_layout":"content","content":"<p><strong>Precision vs. 
Flexibility: Not Either-Or<\/strong><\/p>\n<p>You might be wondering: Should I use a task-specific model or a generalized one?<\/p>\n<p>The answer: <a href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis\/geoai\/task-and-generalized-pretrained-models\">Both have their place<\/a>.<\/p>\n<ul>\n<li>Task-specific models are precision tools &#8211; fast, accurate, and optimized for specific types of data (like multispectral or SAR imagery).<\/li>\n<li>Generalized vision-language models are more like Swiss Army knives &#8211; flexible, fast to deploy, and incredibly intuitive to use, though only with natural color imagery.<\/li>\n<\/ul>\n<p>The key is to use the right tool for the task. When you need scalable, high-accuracy building extraction over an entire city, a task-specific model wins. When you\u2019re quickly exploring imagery or asking ad-hoc questions, vision-language models shine.<\/p>\n<p>We\u2019re excited to bring this new class of AI models to the ArcGIS platform &#8211; and even more excited to see what you\u2019ll build with them.<\/p>\n<p>Curious to try them out? 
Explore the models in the ArcGIS Living Atlas or contact us to learn how to integrate generalized vision language models in your geospatial workflows.<\/p>\n"}],"related_articles":[{"ID":2789822,"post_author":"8452","post_date":"2025-05-12 11:31:52","post_date_gmt":"2025-05-12 18:31:52","post_content":"","post_title":"Pretrained Models in ArcGIS: Comparing Task-Specific and Generalized Vision-Language models","post_excerpt":"","post_status":"publish","comment_status":"open","ping_status":"closed","post_password":"","post_name":"task-and-generalized-pretrained-models","to_ping":"","pinged":"","post_modified":"2025-05-16 18:15:47","post_modified_gmt":"2025-05-17 01:15:47","post_content_filtered":"","post_parent":0,"guid":"https:\/\/www.esri.com\/arcgis-blog\/?post_type=blog&#038;p=2789822","menu_order":0,"post_type":"blog","post_mime_type":"","comment_count":"0","filter":"raw"}],"show_article_image":true,"card_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Vision-language-Models-in-ArcGIS-banner.jpg","wide_image":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Talk to Your Maps: Vision\u2013Language AI for Geospatial Insights<\/title>\n<meta name=\"description\" content=\"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding required.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis\" \/>\n<meta property=\"og:description\" 
content=\"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding required.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\" \/>\n<meta property=\"og:site_name\" content=\"ArcGIS Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/esrigis\/\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@ESRI\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":[\"Article\",\"BlogPosting\"],\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\"},\"author\":{\"name\":\"Vinay Viswambharan\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/fe6ac8773de48adda0651f5f144d2acf\"},\"headline\":\"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis\",\"datePublished\":\"2025-08-14T07:15:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\"},\"wordCount\":9,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"keywords\":[\"ArcGIS\",\"geoAI\",\"llms\",\"pretrained models\",\"Vision\u2013Language Models\"],\"articleSection\":[\"AI\",\"Imagery &amp; Remote 
Sensing\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\",\"name\":\"Talk to Your Maps: Vision\u2013Language AI for Geospatial Insights\",\"isPartOf\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\"},\"datePublished\":\"2025-08-14T07:15:19+00:00\",\"description\":\"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding required.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.esri.com\/arcgis-blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#website\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"name\":\"ArcGIS Blog\",\"description\":\"Get insider info from Esri product 
teams\",\"publisher\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#organization\",\"name\":\"Esri\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png\",\"width\":400,\"height\":400,\"caption\":\"Esri\"},\"image\":{\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/esrigis\/\",\"https:\/\/x.com\/ESRI\",\"https:\/\/www.linkedin.com\/company\/5311\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/fe6ac8773de48adda0651f5f144d2acf\",\"name\":\"Vinay Viswambharan\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/10\/vin4.png\",\"contentUrl\":\"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/10\/vin4.png\",\"caption\":\"Vinay Viswambharan\"},\"description\":\"Principal Product manager on the Imagery team at Esri, with a zeal for remote sensing, AI and everything imagery.\",\"sameAs\":[\"https:\/\/www.esri.com\/arcgis-blog\/author\/vinayv\/\",\"https:\/\/www.linkedin.com\/in\/vinayviswambharan\/\"],\"url\":\"https:\/\/www.esri.com\/arcgis-blog\/author\/vinayv\"}]}<\/script>\n<!-- \/ Yoast SEO Premium 
plugin. -->","yoast_head_json":{"title":"Talk to Your Maps: Vision\u2013Language AI for Geospatial Insights","description":"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding required.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","og_locale":"en_US","og_type":"article","og_title":"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis","og_description":"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding required.","og_url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","og_site_name":"ArcGIS Blog","article_publisher":"https:\/\/www.facebook.com\/esrigis\/","twitter_card":"summary_large_image","twitter_site":"@ESRI","twitter_misc":{"Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#article","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis"},"author":{"name":"Vinay Viswambharan","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/fe6ac8773de48adda0651f5f144d2acf"},"headline":"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis","datePublished":"2025-08-14T07:15:19+00:00","mainEntityOfPage":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis"},"wordCount":9,"commentCount":0,"publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"keywords":["ArcGIS","geoAI","llms","pretrained models","Vision\u2013Language Models"],"articleSection":["AI","Imagery &amp; Remote Sensing"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","url":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","name":"Talk to Your Maps: Vision\u2013Language AI for Geospatial Insights","isPartOf":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#website"},"datePublished":"2025-08-14T07:15:19+00:00","description":"Discover how vision\u2013language AI in ArcGIS transforms imagery into insights using simple prompts\u2014no training, labels, or coding 
required.","breadcrumb":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.esri.com\/arcgis-blog\/"},{"@type":"ListItem","position":2,"name":"Talk to Your Imagery: Vision-Language Models for Geospatial Analysis"}]},{"@type":"WebSite","@id":"https:\/\/www.esri.com\/arcgis-blog\/#website","url":"https:\/\/www.esri.com\/arcgis-blog\/","name":"ArcGIS Blog","description":"Get insider info from Esri product teams","publisher":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.esri.com\/arcgis-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.esri.com\/arcgis-blog\/#organization","name":"Esri","url":"https:\/\/www.esri.com\/arcgis-blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/04\/Esri.png","width":400,"height":400,"caption":"Esri"},"image":{"@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/esrigis\/","https:\/\/x.com\/ESRI","https:\/\/www.linkedin.com\/company\/5311\/"]},{"@type":"Person","@id":"https:\/\/www.esri.com\/arc
gis-blog\/#\/schema\/person\/fe6ac8773de48adda0651f5f144d2acf","name":"Vinay Viswambharan","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.esri.com\/arcgis-blog\/#\/schema\/person\/image\/","url":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/10\/vin4.png","contentUrl":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2018\/10\/vin4.png","caption":"Vinay Viswambharan"},"description":"Principal Product manager on the Imagery team at Esri, with a zeal for remote sensing, AI and everything imagery.","sameAs":["https:\/\/www.esri.com\/arcgis-blog\/author\/vinayv\/","https:\/\/www.linkedin.com\/in\/vinayviswambharan\/"],"url":"https:\/\/www.esri.com\/arcgis-blog\/author\/vinayv"}]}},"text_date":"August 14, 2025","author_name":"Multiple Authors","author_page":"https:\/\/www.esri.com\/arcgis-blog\/products\/arcgis-pro\/geoai\/vision-language-models-geospatial-analysis","custom_image":"https:\/\/www.esri.com\/arcgis-blog\/app\/uploads\/2025\/08\/Newsroom-Keyart-Wide-1920-x-1080.jpg","primary_product":"ArcGIS Pro","tag_data":[{"term_id":25891,"name":"ArcGIS","slug":"arcgis","term_group":0,"term_taxonomy_id":25891,"taxonomy":"post_tag","description":"","parent":0,"count":209,"filter":"raw"},{"term_id":665211,"name":"geoAI","slug":"geoai","term_group":0,"term_taxonomy_id":665211,"taxonomy":"post_tag","description":"","parent":0,"count":36,"filter":"raw"},{"term_id":780681,"name":"llms","slug":"llms","term_group":0,"term_taxonomy_id":780681,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"},{"term_id":758872,"name":"pretrained models","slug":"pretrained-models","term_group":0,"term_taxonomy_id":758872,"taxonomy":"post_tag","description":"","parent":0,"count":5,"filter":"raw"},{"term_id":780671,"name":"Vision\u2013Language 
Models","slug":"vision-language-models","term_group":0,"term_taxonomy_id":780671,"taxonomy":"post_tag","description":"","parent":0,"count":1,"filter":"raw"}],"category_data":[{"term_id":770712,"name":"AI","slug":"geoai","term_group":0,"term_taxonomy_id":770712,"taxonomy":"category","description":"","parent":0,"count":51,"filter":"raw"},{"term_id":22931,"name":"Imagery &amp; Remote Sensing","slug":"imagery","term_group":0,"term_taxonomy_id":22931,"taxonomy":"category","description":"","parent":0,"count":767,"filter":"raw"}],"product_data":[{"term_id":767992,"name":"ArcGIS Image for ArcGIS Online","slug":"arcgis-image-for-arcgis-online","term_group":0,"term_taxonomy_id":767992,"taxonomy":"product","description":"","parent":0,"count":43,"filter":"raw"},{"term_id":36951,"name":"ArcGIS Image Server","slug":"image-server","term_group":0,"term_taxonomy_id":36951,"taxonomy":"product","description":"","parent":36571,"count":69,"filter":"raw"},{"term_id":36551,"name":"ArcGIS Online","slug":"arcgis-online","term_group":0,"term_taxonomy_id":36551,"taxonomy":"product","description":"","parent":0,"count":2427,"filter":"raw"},{"term_id":36561,"name":"ArcGIS 
Pro","slug":"arcgis-pro","term_group":0,"term_taxonomy_id":36561,"taxonomy":"product","description":"","parent":0,"count":2037,"filter":"raw"}],"primary_product_link":"https:\/\/www.esri.com\/arcgis-blog\/?s=#&products=arcgis-pro","_links":{"self":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2928111","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/users\/8452"}],"replies":[{"embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/comments?post=2928111"}],"version-history":[{"count":0,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/blog\/2928111\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/media?parent=2928111"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/categories?post=2928111"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/tags?post=2928111"},{"taxonomy":"industry","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/industry?post=2928111"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/www.esri.com\/arcgis-blog\/wp-json\/wp\/v2\/product?post=2928111"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}