Thursday, April 05, 2007

English as a fourth branch of Germanic?

After the NYT Science Times piece a month ago about the position of English within Germanic, Mr. Verb did a post on the topic (with a link to the article), and Sally Thomason had a post on "Nutty journalists' (and others') language theories" at Language Log, followed by Mark Liberman's post there on Stephen Oppenheimer's book, The Origins of the British.

At Mr. V's suggestion, I started digging around a little into the linguistic angles (get the pun?) of this story, since I'm working on a somewhat related paper. All I want to do here is address Forster's approach to language and prehistory, drawing on two of his papers. The first, on genetics and prehistory, is from 2004 and appeared in Traces of Ancestry, a book in honor of Colin Renfrew:
"MtDNA Markers for Celtic and Germanic Language Areas in the British Isles",
by Peter Forster, Valentino Romano, Francesco Calì, Arne Röhl & Matthew Hurles
The second paper is from 2006, and directly linguistic:
"Evolution of English Basic Vocabulary within the Network of Germanic Languages", by Peter Forster, Tobias Polzin & Arne Röhl, in Phylogenetic Methods and the Prehistory of Languages, ed. by Renfrew & Forster.
Unlike Oppenheimer, Forster is a geneticist, and on reading the first paper I come away with a pretty different sense of what his work is about than you get from the NYT piece or Thomason's comments. The authors provide clear evidence for some patterns we would or could expect from what we know about the early history and prehistory of Europe. For example, they find a big presence of 'Celtic' mtDNA markers among Icelanders: a lot of women who ended up in Iceland came from what's now Scotland. But what's striking is the amount of uncertainty. For instance, the article focuses on Germanic- and Celtic-speaking areas, but the Low Countries and northern France haven't been investigated, and evidence from Jutland is "scant". Jutland in particular is crucial for this topic, as the graphic of early Germanic territory (above, from Wikipedia) underscores. Even where they have evidence, things don't look clear: they date the expansion of J/16231 (a "Germanic" type) in Europe to "5000 years ago, with a high standard error of 3000 years" (p. 107), which puts it anywhere from roughly 8,000 to 2,000 years ago even within a single standard error. And the conclusions they draw are hardly dramatic (p. 108):
One could argue with some justification that the genetic data are at present too imprecise to deliver reliable dates and geographic origins for fine-grained linguistic studies, and quite reasonably one could go even further and claim that female migrations are largely irrelevant to language spread.
I would make the first point myself, given what I see, but the second is too strongly stated (sometimes female migrations figure in a big way, sometimes they don't). They continue:
Nevertheless, on the basis of the current limited genetic evidence, a Neolithic timescale for the initial spread of Indo-European languages such as Germanic and Celtic within Europe (Renfrew 1987) appears at least as likely as the traditionally assumed shallower time depth.
For a chapter in a book honoring Renfrew, a nod toward his (controversial) views is hardly out of place, and the bottom line here is inconclusive.

But as we turn to language, allow me to hammer the obvious: DNA tracks genes and not language. Our little neighborhood here in Madison includes people who I can say with confidence carry genes from Europe, the Near East, Africa, the Americas (North and Central, at least) and Southeast Asia. Virtually all these folks, certainly the younger ones, are native speakers of American English, typically local Wisconsin English. And people got around plenty in the old days, even without international flights.

OK, then, to language. The critiques are right: These folks don't understand how historical and comparative linguistics works. They try, for sure, and the opening has some promise:
Within the Germanic languages, English basic vocabulary appears to be an anomaly. The English language is thought, by some, to be closely related to Frisian on the basis of morphological and phonological considerations … . However the postulated English-Frisian relationship is not reflected in shared lexical innovations.
We need to cut "by some", since the fundamental structures of early English are obviously and undeniably what we expect from West Germanic speakers from the North Sea area. Whether there was an 'Anglo-Frisian' subgroup is an old and thorny problem, but that's another game. And specialists are utterly unsurprised and unbothered by the lexical relations: English has borrowed massively from Norse and Romance in ways that dramatically skew any kind of lexicostatistical approach. (de Vries's etymological dictionary of Old Norse is incomplete, I'm pretty sure, but it has upwards of five pages of borrowings between Norse and English, in small type, five columns to the page.)

They then use phylogenetic software (Network 4.106) to produce unrooted networks of 19 languages and dialects, some dead and some living, based on 56 words from the Swadesh 100-word list. The words were coded etymologically, and 'variable' words were entered as if they were amino-acid sequences. Let me sketch two really basic problems:
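To make the coding step a bit more concrete, here is a minimal Python sketch, assuming a simple scheme in which each meaning gets a cognate-class label and each language becomes a string of those labels. The languages, meanings, and labels are toy data chosen for illustration (not Forster et al.'s actual coding), the function names are hypothetical, and a plain pairwise Hamming distance stands in for whatever Network 4.106 actually computes.

```python
from itertools import combinations

# Fixed order of meaning slots; position i in every sequence is meaning i.
MEANINGS = ["hand", "dog", "bird", "neck"]

# Toy cognate-class codings (illustrative only, NOT the paper's data).
# Per meaning, "A" marks the word family German and Swedish share
# (Hand/hand, Hund/hund, Vogel/fågel, Hals/hals); "B" marks an English
# word that is not cognate with the German/Swedish word chosen.
CODINGS = {
    "English": {"hand": "A", "dog": "B", "bird": "B", "neck": "B"},
    "German": {"hand": "A", "dog": "A", "bird": "A", "neck": "A"},
    "Swedish": {"hand": "A", "dog": "A", "bird": "A", "neck": "A"},
}


def to_sequence(coding: dict[str, str]) -> str:
    """Concatenate cognate-class labels in the fixed meaning order, so each
    language becomes a character sequence, like a short gene sequence."""
    return "".join(coding[m] for m in MEANINGS)


def hamming(a: str, b: str) -> int:
    """Count meaning slots where two languages use non-cognate words."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(codings: dict[str, dict[str, str]]) -> dict[tuple[str, str], int]:
    """Hamming distance for every unordered pair of languages."""
    seqs = {lang: to_sequence(c) for lang, c in codings.items()}
    return {
        (l1, l2): hamming(seqs[l1], seqs[l2])
        for l1, l2 in combinations(seqs, 2)
    }


if __name__ == "__main__":
    for pair, d in pairwise_distances(CODINGS).items():
        print(pair, d)
```

Run as a script, it reports a distance of 3 between English and each of the other two, and 0 between German and Swedish. Note that the coding records only the most general word per meaning, so the survival of hound and fowl in English is invisible to it.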

First, information about lexical relations is very valuable, but this is trying to saw a board with a ball peen hammer. They concede (p. 132) that "an evolutionary 'tree' of languages is an idealized concept that may not exist in reality." Well, we can get pretty good trees for genetic relationships often enough, but North Sea Germanic clearly evolved in a big messy soup of language contact, and that shines through in the vocabulary. That is, how the English lexicon relates to Germanic overall is much more a matter of horizontal than vertical relationships, contact rather than inheritance. The role of horizontal versus vertical relationships is a longstanding problem in genetic linguistics, and I gather it's now a real area of activity in biology as well.

Second, the rate of change differs tremendously between DNA (very stable, so that changes tell us something pretty big) and words (much less stable, due to borrowing, lexical replacement, and so on). They seem to uncritically import the notion of a 'mutation rate', but that doesn't fit lexical data in such a simple way. When you argue that "'hund' has changed to 'dog'", you've missed a big point: we still have hund in English, cf. hound, but it's not the most common or general word for canines. In fact, the words they identify as setting English apart are striking: you, small, know, dog, black, bird, neck. Most of these still have historical Germanic cognates around in English: thee/thou (archaic), clean, wit, hound, swarthy, fowl. As for the remaining term, neck, they discuss the problems of that semantic field in Germanic and (reasonably) chose German Hals instead of Nacken. It's hard to argue (though some do) that you can make much out of so few words anyway, given the very high level of noise in the signal. The real point is that these words haven't been 'lost' in the sense that, as I understand it, DNA markers are. Their status has changed, with narrowing or shift of meaning. That kind of low-level shuffling in the lexicon is pretty clearly promoted by language contact, although we don't really understand how it works yet.
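The difference between 'lost' and 'shifted' can be put in data terms. The following Python sketch is a hypothetical illustration, not anything from the papers: it takes the seven words listed above and contrasts a mutation-style coding, in which every replaced basic word simply counts as lost, with one that records whether an older cognate survives elsewhere in English. The function names and labels are made up for the example.

```python
from typing import Optional

# The seven English basic words singled out as innovations, each paired with
# the surviving older Germanic cognate in English listed in the post.
# "neck" is left as None: the post treats it separately (the coding there
# depends on choosing German Hals over Nacken).
SURVIVING_COGNATES: dict[str, Optional[str]] = {
    "you": "thee/thou",
    "small": "clean",
    "know": "wit",
    "dog": "hound",
    "black": "swarthy",
    "bird": "fowl",
    "neck": None,
}


def mutation_style(entries: dict[str, Optional[str]]) -> dict[str, str]:
    """Binary coding: any change in the basic word counts as a loss."""
    return {word: "lost" for word in entries}


def status_aware(entries: dict[str, Optional[str]]) -> dict[str, str]:
    """Mark a word 'shifted' if its older cognate survives with a narrowed or
    different meaning; words with no listed survivor stay 'unresolved'."""
    return {
        word: "shifted" if cognate is not None else "unresolved"
        for word, cognate in entries.items()
    }


def count(labels: dict[str, str], value: str) -> int:
    """Number of words carrying a given label."""
    return sum(1 for v in labels.values() if v == value)


if __name__ == "__main__":
    print("mutation-style, lost:", count(mutation_style(SURVIVING_COGNATES), "lost"))
    print("status-aware, shifted:", count(status_aware(SURVIVING_COGNATES), "shifted"))
```

Under these assumptions the first coding scores all seven words as losses, while the second shows six of the seven as shifts in status rather than disappearances, which is exactly the distinction a simple mutation-rate model has no slot for.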

But even if they are wrong in many ways, both papers do contain material that could help advance our understanding of language history and prehistory. In a forthcoming paper, Michele Loporcaro (Papers from the 17th International Conference on Historical Linguistics, now being printed) talks about the relationship between synchronic and diachronic phonology, concluding with this:
I can think of no better conclusion than the comment Larry Hyman made when I presented part of this research at GLOW 2005 in Geneva: “Well, in a way what you are implying is that historical linguistics should be done by historical linguists”. Yes, this was exactly my point.
I'd argue that it increasingly can and should be done by research groups including other specialists, as long as you have a good core of historical linguists involved. I would urge Forster to collaborate with historical linguists. For example, they rely on Brett Kessler's on-line version of the Hêliand as a lexical source, but don't cite his excellent book, The Significance of Word Lists. That book, written by somebody well known for his command of quantitative methods, could have saved them some problems in understanding how word lists work in practice. (This is a good place to note that Kessler will be the keynote speaker at the Association for Computational Linguistics session in Prague this summer on Computing and Historical Phonology.)

In his original post, Mr. Verb quoted the devastating critique that Ringe & Eska published on some of Forster's work with Alfred Toth (see more of the same, here). Had they brought a historical linguist onto their team, they could have avoided those problems and reached (very different!) conclusions that could have moved the wheel forward part of a turn.

(And thanks to various folks, especially my buddy the former biologist, for discussions on this topic; hope I didn't garble anything.)

Update, Friday, 6:27: As somebody who's not really a blogger, I've already tinkered with this post and will continue to, as things come into focus. Anyhow, I'd welcome comments.

1 comment:

Mr. Verb said...

Thanks for finally getting to this. The check's in the mail. Ha.