ArcUser Online


Language Map Server
By Östen Dahl and Ljuba Veselinova, Department of Linguistics, Stockholm University, Sweden

Editor's note: Although nearly 7,000 languages are spoken throughout the world, where these languages are spoken remains largely unknown except to specialists. Linguists are gradually adopting GIS but have not yet put its tools to effective use. The authors, members of the Department of Linguistics at Stockholm University, have created an ArcIMS-based Language Map Server that demonstrates the flexibility of GIS in mapping an area of great linguistic diversity and presenting languages in a spatial and physical context.

Current Language Mapping

Figure 1: This traditional map of the Caucasus was originally compiled by the Central Intelligence Agency and is reproduced here from the University of Texas Perry-Castañeda Library Map Collection.

The prevailing approach to language mapping, whether in printed atlases or on various kinds of electronic maps, is to use polygons to show the approximate boundaries of individual languages or language groups. This strategy is highly unsuitable for mapping most modern languages, for a number of reasons. The main ones concern the visibility of languages on the map, especially smaller ones; the accuracy of the indicated locations; and the ways human languages are modeled from a geographic perspective. A detailed discussion of these issues was presented in GISLI (GIS in Linguistics)—An Interactive Language Atlas, a research proposal submitted to the Swedish Research Council by one of the authors (Dahl) in 2005. A shortened version is included here.

Consider the following simple statistics from the latest edition of The Ethnologue: Languages of the World, an online reference database for the languages of the world.

  • There are 6,912 languages spoken around the globe.
  • Of the world's languages, 347—or approximately 5 percent—have more than one million speakers.
  • The remaining 95 percent of the world's languages are spoken by only 6 percent of the world's people.
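The proportions in the list above can be checked with a few lines of arithmetic. The following Python sketch uses only the two counts quoted from the Ethnologue; the function and variable names are illustrative and do not come from the original article.

```python
# Counts quoted above from the Ethnologue.
TOTAL_LANGUAGES = 6912
LARGE_LANGUAGES = 347  # languages with more than one million speakers


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


large_share = share_percent(LARGE_LANGUAGES, TOTAL_LANGUAGES)
small_count = TOTAL_LANGUAGES - LARGE_LANGUAGES
small_share = share_percent(small_count, TOTAL_LANGUAGES)

print(f"Large languages: {LARGE_LANGUAGES} ({large_share:.1f}%)")
print(f"Small languages: {small_count} ({small_share:.1f}%)")
```

The result agrees with the rounded figures in the list: about 5 percent of languages have more than one million speakers, and the rest make up about 95 percent.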

Thus, most of the world's languages can be described as "small" languages in terms of the number of people who speak them. Small languages tend to be represented inefficiently on traditional maps and, more often than not, are absent altogether. One of the better traditional maps (shown in Figure 1) illustrates this point. This map of the Caucasus was originally compiled by the Central Intelligence Agency and is reproduced here from the University of Texas Perry-Castañeda Library Map Collection.

This map successfully conveys the substantial variety of peoples and languages in the Caucasus. This region is rightly referred to as the Jigsaw Puzzle of Languages. However, the map is otherwise of limited use for the following reasons:

  • It can only be viewed as is; very little ancillary information is indicated on the map.
  • Language locations are only approximations.
  • No other information about the mapped languages can be added without making the display too crowded.
  • Apart from Rutul, Tabasaran, and Tsakhur, languages with 50,000 or fewer speakers are not indicated on the map. Because there are more than 20 such languages in this area, this is a substantial omission.
  • Information sources are not indicated on the map, so the locations it shows are difficult to verify. For example, the area indicated for the Rutul language on the map in Figure 1 (listed as item 11 in the map legend) is incomplete when compared with the information about this language found in Jazyki Narodov SSSR (The Languages of the Peoples of the Soviet Union) by Viktor Vinogradov. According to the map, Rutul is spoken in a continuous area in southern Russia located near the Russian-Azerbaijani border.

However, according to Vinogradov, there are also villages in Azerbaijan where Rutul is spoken. When those additional villages are mapped, the locations where Rutul is spoken do not form a continuous area. As pointed out in the previously cited paper by Dahl, languages with several thousand speakers or fewer ought to be mapped on the settlement level—something that GIS makes possible. Relating individual languages to specific populated places will hereafter be referred to as geocoding languages.
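Geocoding languages in this sense can be sketched as a simple lookup from a language to the point locations of the settlements where it is spoken. The Python example below is a minimal illustration, not the data model of the Language Map Server: the settlement names, coordinates, and table layout are all hypothetical placeholders. It shows how a settlement-level representation can hold locations on both sides of a border without requiring a continuous area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    """A populated place, stored as a point rather than a polygon."""
    name: str
    country: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


# Hypothetical placeholder records: names and coordinates are invented
# for illustration and do not identify real villages.
SETTLEMENTS = [
    Settlement("Village A", "Russia", 41.6, 47.4),
    Settlement("Village B", "Russia", 41.5, 47.3),
    Settlement("Village C", "Azerbaijan", 41.2, 47.2),
    Settlement("Village D", "Azerbaijan", 41.0, 47.9),
]

# Hypothetical geocoding table: language -> settlements where it is spoken.
SPOKEN_IN = {
    "Rutul": ["Village A", "Village B", "Village C"],
}


def geocode(language, spoken_in=SPOKEN_IN, settlements=SETTLEMENTS):
    """Return the settlement points recorded for a language (empty if none)."""
    by_name = {s.name: s for s in settlements}
    return [by_name[name] for name in spoken_in.get(language, [])]


def countries(points):
    """Return the sorted set of countries covered by a list of points."""
    return sorted({p.country for p in points})


rutul_points = geocode("Rutul")
print(len(rutul_points), countries(rutul_points))
```

Because each location is an independent point, adding a newly documented village means adding one record instead of redrawing a boundary.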

The Limitations of Polygons

In addition to issues of location accuracy and displaying information about smaller languages, using polygons to map languages has another serious drawback. The map in Figure 2 is a snapshot of the Caucasus region from the mapping service by Global Mapping International. This map uses the same approach as the previous map. It gives the erroneous impression that dialects or languages are discrete entities with clearly definable boundaries. In fact, most linguists now recognize that setting such clear boundaries is not possible.

Figure 2: The approach used for this map service gives the erroneous impression that dialects or languages are discrete entities with clearly definable boundaries. Most linguists recognize that setting such clear boundaries is not possible.

Another serious problem concerns the language-dialect distinction. Anyone who has taken a basic linguistics class knows that the distinction between language and dialect is one of the great unsolvables in language science. The definition of what constitutes a language typically involves political, cultural, and other factors. The distinction is rarely based only on linguistic features.

The level of detail offered by GIS tools makes it possible to design a language mapping system that is flexible enough to reflect not only different views of the language-dialect dichotomy for individual languages but also various levels of language groupings. Using point files instead of polygons for languages and dialects eliminates the need to set up unrealistic boundaries.
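One way to picture this flexibility is to attach several classification attributes to each point and choose the grouping level only when the map is drawn. The Python sketch below assumes a hypothetical record layout (settlement, variety, language, family) with placeholder values; it is not taken from the Language Map Server itself.

```python
from collections import defaultdict
from dataclasses import dataclass

LEVELS = ("variety", "language", "family")


@dataclass(frozen=True)
class VarietyPoint:
    """One settlement-level point with its classification at each level."""
    settlement: str
    variety: str   # local variety or dialect
    language: str  # language the variety is assigned to
    family: str    # higher-level group or family


# Placeholder data: every name here is invented for illustration.
POINTS = [
    VarietyPoint("Village A", "Variety X1", "Language X", "Family F"),
    VarietyPoint("Village B", "Variety X2", "Language X", "Family F"),
    VarietyPoint("Village C", "Variety Y1", "Language Y", "Family F"),
]


def group_by_level(points, level):
    """Group settlement names by the chosen classification level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    groups = defaultdict(list)
    for point in points:
        groups[getattr(point, level)].append(point.settlement)
    return dict(groups)


for level in LEVELS:
    print(level, group_by_level(POINTS, level))
```

A different view of the language-dialect distinction then amounts to changing the attribute values on the points, with no boundaries to redraw.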

In summary, present-day language geography makes insufficient use of GIS tools. Moreover, when GIS tools are used, they are used to implement the traditional approach to language mapping (i.e., using polygons to represent languages). The maps thus generated may be inaccurate as far as language location is concerned and erroneous in the sense that languages and their varieties are represented as discrete entities with clearly definable boundaries. In addition, these maps are not flexible enough to show different levels of language groupings, which range from local varieties to larger dialect groups and higher-level language groups or families. Finally, smaller languages, which make up the majority of the world's languages, are represented inefficiently or not mapped at all.

