An Informal Introduction to WORDNET

A Newspaper Article about WORDNET

Wizard of the New Wordsmiths

His idea to link words rewrote the dictionary

Tuesday, January 22, 2002
BY KELLY HEYBOER
STAR-LEDGER STAFF

George Miller is still not sure what he was thinking when he set out to write his own dictionary. By his own definition he was mad (meaning "brainsick, crazy, demented, distracted, disturbed, sick, unbalanced, unhinged - affected with madness or insanity").

Nearly two decades ago, the Princeton University psychology professor needed a decent computerized dictionary to help devise experimental tests to determine how children's brains learn language. The major dictionary publishers, however, all wanted several thousand dollars in fees before they would turn over their software.

"I said, 'The hell with you. I'll make my own dictionary,'" recalled Miller, 81. So the Princeton professor got a small government grant and a stack of dictionaries and set out to type in all the nouns. His wife took the adjectives, while a colleague took the verbs. With that, WordNet and the next generation of dictionaries were born.


Instead of just listing the words and their definitions, Miller decided to show how every word is linked to another. Type in the word "tree" and the user gets not only the definition, synonyms and antonyms, but the hypernyms (a tree is a kind of what?), meronyms (what are the parts of a tree?) and more. The user can also find a list of thousands of trees, from yellowwood to the Tree of Knowledge, and even all words that contain the letters t-r-e-e.
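The linking idea described above can be sketched in a few lines of Python. This is only a toy illustration with a handful of hand-entered words, not real WordNet data or the actual WordNet software; the class `TinyWordNet`, its method names, and the sample entries are all hypothetical and chosen just to show how hypernym ("is a kind of") and meronym ("is a part of") links might be stored and followed.

```python
from collections import defaultdict


class TinyWordNet:
    """Hypothetical toy word graph; not the real WordNet database."""

    def __init__(self):
        # relation name -> word -> set of linked words
        self.links = defaultdict(lambda: defaultdict(set))
        self.words = set()

    def add(self, relation, source, target):
        """Record that `source` is linked to `target` by `relation`."""
        self.links[relation][source].add(target)
        self.words.update((source, target))

    def related(self, relation, word):
        """Words directly linked to `word` by `relation`, alphabetically."""
        return sorted(self.links[relation][word], key=str.lower)

    def hypernym_chain(self, word):
        """Follow 'hypernym' links upward until no new parent is found."""
        chain, seen, current = [], {word}, word
        while True:
            parents = self.related("hypernym", current)
            if not parents or parents[0] in seen:
                break
            current = parents[0]
            seen.add(current)
            chain.append(current)
        return chain

    def containing(self, letters):
        """All known entries whose spelling contains `letters`."""
        needle = letters.lower()
        return sorted((w for w in self.words if needle in w.lower()),
                      key=str.lower)


# Sample entries invented for this sketch.
net = TinyWordNet()
net.add("hypernym", "tree", "woody plant")   # a tree is a kind of woody plant
net.add("hypernym", "woody plant", "plant")
net.add("hypernym", "plant", "organism")
for part in ("trunk", "limb", "crown"):      # parts of a tree
    net.add("meronym", "tree", part)
for kind in ("yellowwood", "Tree of Knowledge"):  # kinds of tree
    net.add("hyponym", "tree", kind)

print(net.hypernym_chain("tree"))      # ['woody plant', 'plant', 'organism']
print(net.related("meronym", "tree"))  # ['crown', 'limb', 'trunk']
print(net.containing("tree"))          # ['tree', 'Tree of Knowledge']
```

Storing each relation as its own set of links means a new kind of link can be added without changing how the existing ones are looked up.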

At last count, WordNet had grown into an unprecedented web of 138,838 English words linked in hundreds of thousands of different ways. Linguists call Miller's project one of the biggest leaps for dictionaries since scholars sat down to write the epic Oxford English Dictionary in 1879.

Online dictionaries modeled after WordNet are being built at universities around the world for more than a dozen languages, from Basque to Bulgarian. This week, 100 of the world's top linguists are gathering in Mysore, India, for the first international conference of the Global WordNet Association. The group's ultimate goal is to develop WordNets for every language on earth and link them in one vast digital trellis of words that allows computers to provide instant, accurate translations.

"It's the golden fleece of natural language processing - a way for a machine to translate where humans can't," said Princeton University linguist Christiane Fellbaum, Miller's collaborator and an expert on verbs.

Fellbaum co-founded the global association (with a Dutch linguist) after she and Miller discovered, to their surprise, that WordNet had caught on around the world. Not only were foreign linguists building their own versions, but commercial companies were using the dictionary for their own purposes because it was available free via the World Wide Web. (WordNet is still available free. It can be downloaded at www.cogsci.princeton.edu. It also can be sampled, in a limited form, through the Web site.)

The idea that all human language can be mapped in one mind-boggling spider web of words has caught the attention of everyone from computer software developers to the CIA. Most of the $3 million in funding Miller has received over the years to develop WordNet has quietly come from various agencies in the federal government eager to develop computers that can accurately translate documents obtained in espionage activities.

Miller, who was awarded the National Medal of Science by President George H.W. Bush in 1991, remains humbled by the success he has found as grandfatherly guru to a new generation of digital wordsmiths. While linguists meet in India to talk about building a global WordNet, Miller will stay home in Princeton. He says he dislikes travel and needs the time to stay on top of his dictionary.

His cluttered university office, which looks out on a wall of the building next door, contains 45 dictionaries and dozens of reference books in precarious piles. Miller keeps them all, from "Cursing in America" to the "Dictionary of Gardening," within easy reach. His well-worn favorite, "The American Heritage Dictionary, 4th Edition," is open next to the keyboard.

As his 82nd birthday approaches, Miller says he can't escape the maelstrom of words still churning in his brain. He always carries a long piece of paper in his breast pocket on which he can scrawl new words he hears.

On this day, the list includes "nephology," "caspase," "troop movement," "bacillus anthracis," and "Intifada." The introduction of the euro means he has to change the definitions of "lira," "franc," "mark" and other European currencies that are becoming obsolete.

Also, Miller's daughter recently called from Alabama to ask why the Southern word "tump" (as in, "My clumsy husband tumped over the wheelbarrow") was not in WordNet.

"So I put in 'tump,'" Miller said with a shrug.

Recently, he received an e-mail from Oxford University in England requesting permission to use the term "WordNet." Oxford, publisher of the 20-volume Oxford English Dictionary - the gold standard among scholars - is considering putting together an online web of intersecting words modeled after Miller's Princeton project.

"Oxford is developing its own WordNet," Miller said with a grin. "That means you've really arrived, when the dictionary publishers start stealing your stuff."

Kelly Heyboer covers higher education. She can be reached at heyboer@starledger.com or (973) 392-5929.




