How come we can't decipher the Indus script?


I just got a book on ancient civilizations. In the chapter dealing with written languages, they list Egyptian hieroglyphics, Mesopotamian pictographs, and Indus script as the three oldest known written languages. The book goes on to say Indus script has never been deciphered even though over 2,500 examples of it exist. Maybe I've watched too many sci-fi movies where a master linguist deciphers alien languages, but I really thought we had terrestrial languages mastered. What's the deal with Indus script? Is the art of linguistics still held hostage by our inability to decipher ancient languages without a "key" à la the Rosetta Stone?

bibliophage replies:

Too much science fiction? No such thing. Star Trek, for example, teaches us that a good communications officer can send a message that transcends mere language, especially if she has legs down to here and a hemline up to there. Mmmmm. Mm-HMMMmmmmm . . . er, sorry. Was I saying something?

Yes, I was. The Indus script, which was written in and around Pakistan over a period of several centuries centered around 2500 B.C., is the most famous undeciphered script, but there are many others. Other mystery writing systems include Linear A (Greece, 1800 B.C.), Zapotec (Mexico, 500 B.C.), Meroitic (Sudan, 300 B.C.), Isthmian (Central America, A.D. 200), Rongorongo (Easter Island, A.D. 1800) and Joycean (Ireland, A.D. 1900). Okay, maybe not that last one.

Why haven’t they been deciphered? It’s instructive to look at some deciphered scripts to see what makes the enigmatic writing of the Indus valley different. Script decipherment is not as easy as it’s made out to be in science fiction–and sometimes not as easy as it’s made out to be in history books. Chances are the impression you took away from school was that the Rosetta stone made it child’s play to decipher Egyptian hieroglyphics. Not so. How many schools teach that some of the best minds in the world pored over the Rosetta stone for a quarter century before it finally revealed its secrets?

One of the biggest obstacles was that the ancient Egyptians used a writing system unlike anything known when the Rosetta stone was discovered in 1799. Scholars knew about logographic systems like Chinese, where there are thousands of symbols, each normally representing a whole word or idea. They knew about alphabetic systems like Hebrew and English, where there are typically 20 to 30 symbols, each normally representing one consonant or vowel. Some scholars may have known about syllabaries, with several dozen symbols each representing one syllable, as in Japanese hiragana and katakana. But Egyptian hieroglyphics had too many distinct symbols to be an alphabet or syllabary, and too few to be logographic.

The decipherment published by Champollion in 1823 (building on work by many others, including Thomas Young) showed that Egyptian hieroglyphics were (neglecting some complications) a logo-phonetic system. In such a writing system, any given symbol can represent either an entire idea or word, or the sound (or initial sound) of that word. Some simple ideas can be expressed efficiently with a drawing of the object or an object it’s associated with. But to express an abstract idea that can’t be readily drawn, you can use a string of sounds. Suppose you want to express the English word “charitable” without an alphabet. You could draw a picture of a chair and a table (since “chair table” sounds sort of like “charitable”). This is the rebus principle. Today we may consider rebus puzzles to be nothing but a silly game, but to the ancients, they were a natural way to write a language. Other early scripts, like Mayan hieroglyphs and Mesopotamian cuneiform, are built on the same principle.

The rebus approach may seem an unwieldy way to write a language, but it’s a step up from non-linguistic pictograms. A picture of a chair and a table can only convey “chair and table,” or at best an idea associated with a chair and table, such as the act of sitting down at a table. An abstract concept such as “charitable” is difficult to get across using pictograms. Writing systems built on the rebus principle are a way of filling the void, but have the drawback (for us latter-day translators) that, unlike pictograms, they’ll only work in one language. For a speaker of Latin, for example, pictograms of a chair (in Latin, sella) and a table (mensa) would never suggest the word for charitable (benignus).

I go into such detail about logo-phonetic systems because the Indus script appears to have about the right number of distinct symbols (250 to 400, depending on who’s counting) to use this system. Knowing that, shouldn’t it be easier to decipher the Indus script? Not really–the decipherers of Egyptian hieroglyphics had the help of the Rosetta stone, a bilingual or bitext (parallel texts of the same message in the unknown script and a known script). No bitext for the Indus script has yet been found.
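For the numerically inclined, the sign-counting argument can be sketched in a few lines of Python. The cutoffs below are my own rough assumptions, loosely based on the ranges quoted above (20 to 30 signs for an alphabet, several dozen for a syllabary, thousands for a logographic system); real scripts don't respect tidy boundaries, and the function name is purely illustrative.

```python
def guess_script_type(distinct_signs: int) -> str:
    """Rough guess at the kind of writing system from its sign inventory.

    The thresholds are illustrative assumptions, not established limits.
    """
    if distinct_signs <= 40:
        return "alphabet"
    if distinct_signs <= 150:
        return "syllabary"
    if distinct_signs < 1000:
        return "logo-phonetic"
    return "logographic"


# Sign counts taken from the discussion above: the 26-letter English
# alphabet and the low and high estimates for the Indus script.
for label, count in [("English", 26), ("Indus, low count", 250), ("Indus, high count", 400)]:
    print(label, count, guess_script_type(count))
```

Both Indus estimates land in the same bucket under these assumed cutoffs, which is the point: the counting dispute doesn't change the guess about what kind of system it is.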

A bitext is no guarantee that decipherment will be easy. Take the case of Etruscan writing, found in Italy. At a superficial level the script is easily deciphered, since the letters are close in form to archaic Greek and Latin alphabets. But the language remains largely uninterpreted. What’s the difference? Given a piece of Etruscan writing, we have no difficulty pronouncing the words, but no idea what most of the words mean (think of a trained politician reading off a TelePrompTer). The trouble is that Etruscan is apparently unrelated to any language understood today. Champollion, the decipherer of Egyptian hieroglyphics, had the advantage of knowing Coptic, which he correctly suspected was the descendant of the ancient Egyptian language. Etruscan has left no descendants.

The dozens of Etruscan bitexts (with Latin, Greek, or Phoenician) aren’t very helpful. All they really tell you is that a given block of mysterious text means such-and-such. There’s no sure way to tell which Etruscan word corresponds to which word in the parallel text, since the order of ideas and number of words vary widely among the different languages. All is not lost, however. If, for example, a Latin word occurs several times in a text and a mystery word occurs the same number of times in the corresponding Etruscan text, you may be justified in supposing that they mean the same thing. But beware–often the two messages in a bilingual text are just paraphrases of each other, not word-for-word translations. Still, using methods like this, together with glosses (explicit translations of individual words in the documents), scholars have been able to determine–or at least make a reasonable guess at–the meanings of a couple hundred Etruscan words.
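The counting trick is simple enough to sketch in Python. This is a toy illustration under loud assumptions: the "known" text is schoolbook Latin I strung together, the "unknown" text is made-up gibberish (not real Etruscan), and the function name is hypothetical. It merely lists pairs of words that occur the same number of times in the two halves of a bitext.

```python
from collections import Counter


def candidate_pairs(known_text: str, unknown_text: str, min_count: int = 2):
    """Pair each repeated known word with unknown words of equal frequency."""
    known = Counter(known_text.split())
    unknown = Counter(unknown_text.split())
    pairs = []
    for k_word, k_count in known.items():
        if k_count < min_count:
            continue  # words seen only once match far too many candidates
        for u_word, u_count in unknown.items():
            if u_count == k_count:
                pairs.append((k_word, u_word, k_count))
    return sorted(pairs)


# Toy data: invented for illustration only.
KNOWN = "rex dedit templum et rex fecit aram et rex"
UNKNOWN = "zal mek tur zal pon vis zal"

print(candidate_pairs(KNOWN, UNKNOWN))
```

A match here is only a candidate. As noted above, the two halves of a bilingual are often paraphrases, so equal counts can be coincidence.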

If we understand the language or a close relative or descendant of the language, it ought to be pretty easy to decipher the script, right? Not so fast. The Rongorongo script used on Easter Island after European contact almost certainly represents Rapa Nui, the well known Polynesian language of the Easter Islanders. But no one now remembers how the script symbols are meant to be read. Steven Fischer recently claimed to have deciphered Rongorongo, but his critics say “Wrong-o, wrong-o.” I don’t know if Fischer is right or wrong, but undeciphered scripts do seem to invite harebrained analysis. Jacques Guy bluntly calls them “kook attractors,” but even serious scholars aren’t immune. Hrozný, who correctly deciphered Hittite, later went down many wrong paths with other scripts.

The real kooks are those like Goropius Becanus of the Netherlands, who in 1580 proved to his satisfaction that Egyptian hieroglyphics represented Dutch. A Jesuit priest named Heras is one of scores who have claimed to decipher Indus script. Here’s one of his translations: “There is no feast in the place outside the country of the Minas of the three fishes of the despised country of the woodpeckers.” Whatever you say, padre.

You mention the 2,500 examples of the Indus script. The number of available texts now exceeds 4,000, but quantity is no indication of ease of decipherment. Some scripts have been deciphered with far fewer texts. Take Palmyrene, the first ancient script ever deciphered. A handful of inscriptions were found on the walls of the ruins of the city of Palmyra in Syria. Scholars knew from ancient Greek writers that the language spoken there was closely related to Syriac, a well known Semitic language. The script was obviously derived from the known Aramaic alphabet but many letters weren’t immediately identifiable. Among the ruins were several bilingual inscriptions in Greek and Palmyrene. If you know the Aramaic alphabet, it’s a fairly simple matter to use the identifiable Aramaic letters and the similarity of proper names in Greek and Palmyrene to get a good start. Then you can use your knowledge of Greek and Syriac to fill in the blanks. Your Syriac is a little rusty, you say? Not to worry–a decent Syriac dictionary will serve just as well. Soon after the first decent reproductions of Palmyrene inscriptions were published in Europe in the 1750s, Barthélemy in France and Swinton in England independently deciphered them, each taking just a few hours to finish the job. It was perhaps a bit more challenging than the cryptogram puzzles you can find in your Sunday paper, but not by much. Most decipherments, needless to say, are a good deal tougher to crack than that.

Returning to the matter at hand, is the lack of a bitext for the Indus script an insurmountable obstacle? Not necessarily. Some scripts have been deciphered without them, although not without a good deal of cleverness. Ugaritic writings, like Palmyrene, were found in Syria (in 1929), suggesting that they too might be a Semitic language. About two dozen symbols were used, suggesting an alphabetic script. Several of the words were only a single letter long, suggesting Ugaritic used a consonantal alphabet written without vowels (as was the case with other early Semitic alphabets such as Hebrew). Applying letter frequency analysis to the problem, Hans Bauer tentatively assigned the values L and M to two Ugaritic letters. In Semitic languages, L is common as a single-letter word, but not so common in suffixes and prefixes; M is the only letter that is really common in Semitic suffixes, prefixes, and as single-letter words.
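Here is a minimal Python sketch of that kind of positional frequency count. It is not a reconstruction of Bauer's actual procedure, just the general idea: tally which signs stand alone as words, which begin longer words, and which end them, then see which signs are common in all three roles. The toy word list uses placeholder capital letters for unread signs and is entirely invented; the function names are hypothetical.

```python
from collections import Counter


def positional_counts(words):
    """Count each sign as a lone word, as a word-initial sign, and as a word-final sign."""
    alone, first, last = Counter(), Counter(), Counter()
    for word in words:
        if len(word) == 1:
            alone[word] += 1
        else:
            first[word[0]] += 1
            last[word[-1]] += 1
    return alone, first, last


def common_everywhere(words, top: int = 2):
    """Signs among the `top` most frequent in all three positions."""
    top_sets = [{sign for sign, _ in counts.most_common(top)}
                for counts in positional_counts(words)]
    return top_sets[0] & top_sets[1] & top_sets[2]


# Invented toy corpus: "A" behaves the way M is said to (common alone,
# as a prefix, and as a suffix); "B" behaves like L (common only alone).
TOY_WORDS = ["A", "B", "A", "ACD", "ADE", "CDA", "EDA", "B", "ADC", "DEA"]

print(common_everywhere(TOY_WORDS))
```

In this toy corpus only the sign playing the part of M survives all three cuts, while the L-like sign drops out because it never begins or ends a longer word.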

On the assumption that related languages use similar words for common concepts (much as European languages have father/vater/pater), Bauer then used the M and L assignments to search the texts for the expected Semitic word for “king” (M-L-K or similar) and “kings” (M-L-K-M or similar). Proceeding along these lines, he found the word for “son” and the name of the god Ba`al, and so eventually determined the values of several other letters. His real insight was to guess that the word for axe might occur in the text inscribed on several axes. He turned out to be right about that, but chose the wrong phonetic values (he guessed G-R-Z-N as in Hebrew; the actual Ugaritic form was the related but not identical H-R-S-N). Édouard Dhorme later corrected the reading and finished the decipherment. One of the axe inscriptions said, in a language related to biblical Hebrew, “Unto the high priest doth this axe belong, wherefore shouldst thou keep thy hands off it!” Or something like that. It strikes me that Bauer’s guess was pretty lucky–I have two axes in my garage but have yet to inscribe either with the word “axe.” But hey, when the high priest tells me, “Inscribe the word ‘axe’ on this axe, chop-chop,” I’m not about to wait around for him to axe me politely.
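The hunt for “king” boils down to a pattern search, which can be sketched roughly as follows. Assume two signs have been tentatively read; look for three-sign words beginning with those two, and note whether the same stem also turns up with one extra sign tacked on, as a plural would. The digits below are placeholder sign labels, the word list is invented, and the function name is hypothetical.

```python
def find_stem_candidates(words, first_sign, second_sign):
    """Find three-sign words starting with two known signs, plus any
    four-sign words that extend them by a single trailing sign."""
    vocab = set(words)
    hits = []
    for word in sorted(vocab):
        if len(word) == 3 and word[0] == first_sign and word[1] == second_sign:
            extended = sorted(v for v in vocab
                              if len(v) == 4 and v.startswith(word))
            hits.append((word, extended))
    return hits


# Invented toy vocabulary: "1" and "2" stand for the two signs already
# guessed; every other digit is an unread sign.
TOY_VOCAB = ["123", "1234", "456", "21", "789", "123"]

print(find_stem_candidates(TOY_VOCAB, "1", "2"))
```

A stem that appears both bare and with a one-sign ending is a decent candidate for a singular and its plural, and it hands you a provisional value for the third sign into the bargain.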

Ugaritic isn’t the only language to have been deciphered without a bilingual. Georg Friedrich Grotefend made considerable progress in deciphering Persian cuneiform by looking for and finding proper names of Persian emperors known from ancient Greek and Hebrew sources. (Henry Rawlinson finished the decipherment in the 1830s.) The point is that bilinguals aren’t necessary to decipher an unknown script. Still, in the case of Ugaritic and Persian, scholars had a pretty good handle on the language the script represented before they started work. In the case of Etruscan, where the language is largely unknown, complete decipherment thus far has eluded us.

What do we know about the language the Indus script wrote? We can say little for certain, but the best guess is that it’s a language of the Dravidian family, an idea that has been around since at least the 1920s. Today most Dravidian speakers live in Sri Lanka and southern India, 800 miles or more from the Indus valley where the bulk of the Indus inscriptions have been found. But about a hundred thousand speakers of one Dravidian language, Brahui, live in western Pakistan and neighboring parts of Iran and Afghanistan, not too far west of the Indus. Contrary to earlier speculation about recent migrations, linguistic and genetic analyses show that they have been separated from other Dravidian speakers for at least several thousand years. Further evidence that Dravidian or related languages were once spoken in the general area comes from Linear Elamite inscriptions, found in the ruins of the ancient city of Susa in southwestern Iran. The script has been deciphered from a phonetic standpoint because of its similarity to Mesopotamian cuneiform, but as with Etruscan, the language remains largely unknown. A significant percentage of words in Linear Elamite appear to be of Dravidian origin, which could mean it is descended from a hypothetical Elamo-Dravidian ancestor language, or just that it borrowed a lot of words from a Dravidian language spoken nearby. In either case, the Elamite connection makes it seem more likely that a Dravidian or related language was spoken in the Indus valley when the inscriptions were made.

Many Indian nationalists, and some serious scholars, believe the Indus script writes a language of the Indo-Iranian (Aryan) branch of the Indo-European family, which includes Farsi (modern Persian), Sanskrit and Hindi. All things considered, this seems unlikely. The inscriptions go back to about 3200 B.C., which according to mainstream archaeological thinking is before any Indo-Europeans had come that far southeast. Another problem is that Indo-European peoples kept domesticated horses and used chariots and had other cultural traits not shared with the ancient Indus civilization. Indeed, according to the mainstream thinking, the arrival of the Indo-Europeans in the Indus Valley around 1800 B.C. is more likely to have been the end of the Harappan culture than the beginning of it.

If the Indus script turns out to write a language that is neither Indo-European nor Dravidian (or Elamo-Dravidian), then the chances of deciphering it are slim. In the words of Alice Kober, who helped decipher Linear B, “an unknown language written in an unknown script cannot be deciphered, bilingual or no bilingual.” There are really no other decent candidates among known languages, so we would be left with an unknown language, and the prospects of complete decipherment would be as poor as with Etruscan.

But faint hope is better than none. Sumerian is a linguistic isolate, but the script has been phonetically deciphered, and the language partly deciphered. Most of the cuneiform scripts of Mesopotamia are direct descendants of the Sumerian script, though they’re used to write unrelated languages. Babylonian and Akkadian and some other languages written in these related scripts were amenable to decipherment in part because they were members of the well understood Semitic family. The similarity of the scripts, the many Sumerian loanwords in these Semitic languages, and the unusually large number of bilingual texts have allowed scholars to reconstruct the Sumerian language with considerable success despite its being unrelated to any known language. No such combination of circumstances exists for the Indus script, and no discoveries along these lines are seriously expected.

What will we get if the Indus script is finally deciphered–great historical works that reveal the local political situation 5,000 years ago? Classic works of literature like the Egyptian Book of the Dead or the Mesopotamian epic of Gilgamesh? Insight into ancient religious practices of the sort revealed by Ugaritic? No to all the above. The sad truth is that the longest known Indus inscription is only 17 symbols long. The bulk of the 4,000 or so Indus inscriptions are believed to be simple identifying marks. Most of the inscriptions are on seals or seal impressions, similar to signet rings or rubber stamps. So even if we decipher the script and the language, chances are we’ll discover they say nothing more fascinating than “government property” or “John Smith” or “tax paid.” As with the revelation that Linear B wrote an archaic form of Greek, if the Indus script is deciphered, the most interesting fact learned will be what language the ancient script wrote–that is, if it writes a language at all.

If it writes a language? They wouldn’t call it the “Indus script” if it weren’t a script, would they? Don’t be so sure. When the first inscriptions were discovered in the 1870s in and around the Indus valley of Pakistan, and when the early cities of Harappa and Mohenjo-Daro were excavated in the 1920s, archaeologists assumed that civilization and writing always went together–a complex urban culture couldn’t possibly develop without writing. The Indus sites were urban; ergo, the inscriptions were writing.

Today we recognize that civilization and writing don’t always go together. The Inca empire, for example, was urban but lacked true writing. Historian Steve Farmer now questions the assumption that the Indus script is true writing. In a recent paper, he and two linguists compare the Indus script with medieval European heraldry. Like heraldry, they say, the Indus script may consist of discrete conventional elements that serve as identification marks but don’t encode a spoken language.

This controversial idea has some points in its favor. Considering the corpus of texts as a whole, there’s a considerable amount of repetition among symbols, as would be expected if they wrote a spoken language. But there’s less repetition than expected within the texts, even considering their brevity. Further, several systems of pictograms from around the world–for example, the Vinca signs of southeastern Europe, written about 4000 B.C.–resemble the Indus script in their use of conventional symbols, but nobody believes they encode a spoken language.
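To make the distinction between the two kinds of repetition concrete, here is a small Python sketch. It is my own illustration, not the statistical method used in the paper; the sign numbers are invented and the function name is hypothetical. It reports what share of all sign occurrences belong to signs used more than once anywhere in the corpus, and what fraction of individual inscriptions repeat a sign internally.

```python
from collections import Counter


def repetition_stats(inscriptions):
    """Return (share of sign occurrences whose sign recurs in the corpus,
    fraction of inscriptions that repeat a sign internally)."""
    corpus = Counter(sign for text in inscriptions for sign in text)
    total = sum(corpus.values())
    reused_share = sum(n for n in corpus.values() if n > 1) / total
    with_internal_repeat = sum(
        1 for text in inscriptions if len(set(text)) < len(text)
    )
    return reused_share, with_internal_repeat / len(inscriptions)


# Invented toy corpus: each inscription is a short list of sign numbers.
TOY_INSCRIPTIONS = [[1, 2, 3], [1, 4, 2], [2, 5, 1, 6], [3, 1, 1]]

across, within = repetition_stats(TOY_INSCRIPTIONS)
print(f"reused across corpus: {across:.2f}, texts with internal repeats: {within:.2f}")
```

A corpus can score high on the first measure and low on the second, which is the pattern described above. Whether the real Indus numbers are low enough to rule out language is precisely what the two camps are arguing about.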

Traditionalists have some points in their favor too. The Indus script was linear, that is, usually written with symbols following one another in a line, rather than being placed randomly or in some other geometric pattern. Linearity is found in most writing, though not exclusively so. More to the point, the characters often crowd at the end of a line, as if the writer wanted to avoid breaking up a word. This is a distinctive feature of true writing. The comparison with heraldry may not hold water either. Hittite hieroglyphics were initially considered heraldry by serious linguists but were eventually found to be true writing and deciphered. Much the same has been said about many other undeciphered scripts likewise shown to be true writing.

Still, Farmer feels so strongly that the Indus script is not a real script that he has offered a $10,000 reward for proof that it is true writing. He will accept as proof an authenticated inscription more than 50 symbols long. Farmer thinks the extant texts are all so short because they don’t write a language. The pro-language side thinks the longer texts once produced in Harappa and other cities have been lost because they were written on perishable surfaces. Certainly a long text would be a great gift to modern science. I just wish they wouldn’t use the lame excuse that they couldn’t give it to us because they ran out of Harappan paper.

Further reading

Lost Languages: The Enigma of the World’s Undeciphered Scripts by Andrew Robinson, 2002

The Story of Decipherment: From Egyptian Hieroglyphs to Maya Script by Maurice Pope, revised edition, 1999

“The Collapse of the Indus-Script Thesis: The Myth of a Literate Harappan Civilization” by Steve Farmer, Richard Sproat, and Michael Witzel in Electronic Journal of Vedic Studies, Dec. 13, 2004. This and related items can be accessed from Steve Farmer’s download page at

