Sunday, July 29, 2007

McKean on Corpus: Somewhere, pigs must be flying

Yesterday, after posting on yet another depressing topic, I was hoping that the next piece of material I would stumble across might be something positive. The Sunday NYT can provide such every so often, though even thinking of that triggers thoughts of "On Language", where the odds of finding anything positive to talk about have been precisely nil to date.

Unfreakinbelievably, though, today's column provides tasty substance. Of all people, after the collection of clowns and goons they've had, Erin McKean is sitting in this week. She's part of the young superstar lexicography crew that includes Ben Zimmer, Jesse Sheidlower and others. A guest column by somebody who actually knows about words? I got dizzy. But it gets better: "Corpus: Exploring what words really mean" lays out neatly and cleanly how searchable electronic corpora ('corpuses' if you're young enough?) serve as 'microscopes' allowing us to see things about language you'd never catch with the naked eye.

Using the Oxford English Corpus, McKean starts from the surprising finding that the good old spork (see image above) is used in connection with violence a quarter of the time. Of course, it's almost always humorous violence.

I had just enough time this morning to run some numbers through my new prototype Secret Linguatext Overall Quality Evaluator. (Don't ask: it starts from a Hidden Markov Model and some techniques developed by Bill James. All tenure decisions in linguistics will soon be based on it.) Anyway, SLOQE (pronounced slow-key, patent pending) spits out the results shown on the right when asked to compare today's column with the highest quality "On Language" had ever reached before. In fact, this column compares well with top-shelf writing by people who know something about language.

I fear the NYT won't appreciate the value of transmitting reliable knowledge about language over publishing the spittle-caked rantings of a senile amateur, but these results are clear. Either way, this is a day I never really expected to see.

High-tech spork image from here.


Anonymous said...

Is this like the White House getting Colbert to host the Correspondents' Dinner? Like the Times didn't realize who they were inviting? They accidentally got somebody who knew the score and was willing and able to take care of business.

Oscar Madison said...

The "SLOQE" is nonpareil linguablogging.

Mr. Verb said...

Well, Oscar, it's mostly disturbing. Let's see how the beta version works ...

And Anon, I have no clue why the NYT did it, but I'm just glad they finally had a good column run under that title.