GWA Mailing list EuroWordNet Samples Web WordNet Interface Consulting Wordnet

LINGVA::WORDNET ~ ~

Wordnet is a lexical database of the English language organized according to current psycholinguistic theories of human lexical memory. Developed at Princeton's Cognitive Science Laboratory and first released in the early 1990s, Wordnet was possibly the first undertaking to produce a machine-interpretable collection of English on a large scale. And like many genuinely helpful software projects, it's Open Source and just waiting to be used and extended.

Concepts In Wordnet

The Wordnet package consists of several text database files, text indexes for those files, binaries for searching the files, and the source code for those binaries. A brief example illustrates the functionality of the system:

    % wn canary -n1 -hypen

    Synonyms/Hypernyms (Ordered by Frequency) of noun canary

    Sense 1
    fink, snitch, stoolpigeon, stoolie, sneak, canary
           => informer, betrayer, rat, squealer
               => informant, source
                   => communicator
                       => person, individual, someone, somebody, mortal, human, soul
                           => life form, organism, being, living thing
                               => entity, something
                           => causal agent, cause, causal agency
                               => entity, something

This example of the wn program searches for sense #1 (-n1) of the noun "canary" and displays its hypernyms (which I'll talk about shortly). Entries in the Wordnet databases are called synsets (sets of synonyms). In this entry, "fink", "snitch", "stoolpigeon", "stoolie", "sneak", and "canary" are all considered synonyms for this particular word sense, so they are displayed as members of the same synset. A synset can therefore be understood as the set of all words sharing the same essential meaning. Each sense of a word belongs to its own synset: the second sense (-n2) of "canary" in Wordnet is that of a "singer", the third refers to the color "canary" or "canary yellow", and the fourth is the bird.
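The hypernym display above is a walk up a graph of synsets. The following is a minimal Python sketch of that walk, not the wn program's own code. It assumes a hand-built dictionary (`HYPERNYMS`, hypothetical) copied from the `wn canary -n1 -hypen` output and keyed by the first word of each synset. The real database links synsets by byte offsets rather than by words. `hypernym_tree` reproduces the indented `=>` layout, and `hypernym_paths` lists every chain from the starting synset up to a root.

```python
from typing import Dict, List

# Hypothetical, hand-copied graph: each synset is keyed by its first word,
# and each value lists that synset's hypernyms as shown by `wn canary -n1 -hypen`.
HYPERNYMS: Dict[str, List[str]] = {
    "fink": ["informer"],
    "informer": ["informant"],
    "informant": ["communicator"],
    "communicator": ["person"],
    "person": ["life form", "causal agent"],  # two parents, so the tree branches here
    "life form": ["entity"],
    "causal agent": ["entity"],
    "entity": [],                              # root: no hypernym
}


def hypernym_tree(synset: str, depth: int = 0) -> List[str]:
    """Walk upward depth-first, indenting each level like wn's '=>' display."""
    lines: List[str] = []
    for parent in HYPERNYMS.get(synset, []):
        lines.append("    " * depth + "=> " + parent)
        lines.extend(hypernym_tree(parent, depth + 1))
    return lines


def hypernym_paths(synset: str) -> List[List[str]]:
    """Return every chain from `synset` up to a root synset."""
    parents = HYPERNYMS.get(synset, [])
    if not parents:
        return [[synset]]
    return [[synset] + path for parent in parents for path in hypernym_paths(parent)]


if __name__ == "__main__":
    print("fink")
    for line in hypernym_tree("fink"):
        print(line)
    for path in hypernym_paths("fink"):
        print(" -> ".join(path))
```

Because "person" has two hypernyms, `hypernym_paths("fink")` returns two chains. One ends in life form -> entity and the other in causal agent -> entity, which is why "entity, something" appears twice in the wn output.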
The collection of synsets for a given word corresponds to the senses you would typically find in a dictionary entry, but Wordnet adds a plethora of other data. This isn't your grandpa's dictionary. It's your grandson's.

A hypernym is the type to which something belongs: a boat is a type of transport, and a fish is a kind of animal. The hierarchy of hypernyms shown above proceeds from the most specific synset to the most general one: entity, something. You can read this hierarchy as "a fink/canary is a kind of informer, which is a kind of informant, which is a communicator, which is a person, which is a life form; a person is also a causal agent, and both life form and causal agent are kinds of entities."

This hierarchy is generated by following word relation pointers between synsets. First, let's look at the index entry for "canary":

    canary n 4 3 @ ~ #m 4 0 07263970 07137082 03881697 01055943

Mapping an Upper Ontology to WordNet ~ ~

Ontologies are becoming extremely useful tools for sophisticated software engineering. Designing applications, databases, and knowledge bases with reference to a common ontology can mean shorter development cycles, easier and faster integration with other software and content, and a more scalable product. Although ontologies are a very promising solution to some of the most pressing problems that confront software engineering, they also raise some issues and difficulties of their own. Consider, for example, the questions below:

· How can a formal ontology be used effectively by those who lack extensive training in logic and mathematics?
· How can an ontology be used automatically by applications (e.g. Information Retrieval and Natural Language Processing applications) that process free text?
· How can we know when an ontology is complete?

In this paper we will begin by describing the upper-level ontology SUMO (Suggested Upper Merged Ontology), which has been proposed as the initial version of an eventual Standard Upper Ontology (SUO).
We will then describe the popular, free, and structured WordNet lexical database. After this preliminary discussion, we will describe the methodology that we are using to align WordNet with the SUMO. Finally, we will close this paper by demonstrating how this alignment of WordNet with SUMO will provide answers to the questions posed above.

SUMO

The SUMO (Suggested Upper Merged Ontology) is an ontology that was created at Teknowledge Corporation with extensive input from the SUO mailing list, and it has been proposed as a starter document for the IEEE-sanctioned SUO Working Group. The SUMO was created by merging publicly available ontological content into a single, comprehensive, and cohesive structure (Niles & Pease, 2001). As of July 2001, the ontology contains 654 terms and 2351 assertions. The ontology can be browsed online (http://ontology.teknowledge.com:8080/rsigma/SKB.jsp), and source files for all versions of the ontology can be freely downloaded (http://ontology.teknowledge.com/cgi-bin/cvsweb.cgi/SUO/).

WordNet

WordNet is an extremely large, freely available online database. The database is divided by part of speech into nouns, verbs, adjectives, and adverbs. The nouns are organized as a hierarchy of nodes, where each node is a word meaning or, as it is termed in WordNet, a synset. A synset is simply a set of English words that express the same meaning in at least one context. For example, {accession, addition} is a synset which expresses the meaning "something added to what you already have". In version 1.6 of WordNet, there are 66,054 noun synsets, 17,944 adjective synsets, 3,604 adverb synsets, and 12,156 verb synsets. As an example of a record in the WordNet database, consider the following:
    00047131 04 n 02 accession 0 addition 0 001 @ 09536731 n 0000 | something added to what you have already; "the librarian shelved the new accessions"; "he was a new addition to the staff"

Wordnet Tables DMP3A Parsing and Mapping Instructions

Automated link generation: can we do better than term repetition? ~ ~

Stephen J. Green
Microsoft Research Institute, School of MPCE, Macquarie University
Sydney NSW 2109, Australia
sjgreen@mri.mq.edu.au

Abstract

Most current automatic hypertext generation systems rely on term repetition to calculate the relatedness of two documents. There are well-recognized problems with such approaches, most notably that they are vulnerable to the linguistic effects of synonymy (many words for the same concept) and polysemy (many concepts for the same word). I propose a novel method for automatic hypertext generation that is based on a technique called lexical chaining, a method for discovering sets of related words in a text. I will also present the results of an empirical study designed to test this method in the context of a question answering task from a database of newspaper articles.

Keywords

Thinking XML: Basic XML and RDF techniques for knowledge management ~ RDF ! ~

Contents:
Introducing WordNet
Setting up WordNet/RDF
A lexical layer for the issue tracker
All is not sweets and candles
Conclusion
Resources
About the author

Part 1: Generate RDF
Part 2: Combining files
Part 4: Issue tracker schema
Part 5: Defining RDF
Part 6: RDF Query
Part 7: Review and relevance

Knowledge from semantics
Uche Ogbuji (uche.ogbuji@fourthought.com)
Principal consultant, Fourthought, Inc.
November 2001

Elec Dictionaries
Elec Dictionaries - Ken Litkowski

GermaNet - Homepage

GermaNet is a lexical-semantic net that has been developed within the LSD Project at the Division of Computational Linguistics of the Linguistics Department at the University of Tübingen. It has been integrated into EuroWordNet (EWN), a multilingual lexical-semantic database.
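The WordNet excerpts earlier in this collection quote two raw database lines: the index entry for "canary" and the data record for {accession, addition}. The sketch below parses both and assumes the field layout documented for WordNet's database files (wndb).

An index line contains these fields, in order:
· lemma
· part of speech
· synset count
· pointer count
· that many pointer symbols
· sense count
· tagged-sense count
· one synset offset per sense

A data line contains these fields, in order:
· synset offset
· lexicographer file number
· synset type
· a two-digit hexadecimal word count
· word/lex_id pairs
· a three-digit decimal pointer count
· four-field pointers
· the gloss, after "|"

The class and function names are hypothetical, and the verb frame fields found in data.verb are not handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexEntry:
    lemma: str
    pos: str
    pointer_symbols: List[str]  # every pointer type used by this lemma's synsets
    sense_cnt: int
    tagsense_cnt: int
    offsets: List[str]          # byte offsets into the matching data file


@dataclass
class Pointer:
    symbol: str          # e.g. "@" for hypernym
    offset: str          # target synset's byte offset
    pos: str             # target synset's part of speech
    source_target: str   # four hex digits; "0000" means a synset-to-synset pointer


@dataclass
class DataRecord:
    offset: str
    lex_filenum: int
    ss_type: str
    words: List[Tuple[str, int]]  # (word, lex_id) pairs
    pointers: List[Pointer]
    gloss: str


def parse_index_line(line: str) -> IndexEntry:
    f = line.split()
    synset_cnt, p_cnt = int(f[2]), int(f[3])
    symbols = f[4:4 + p_cnt]
    i = 4 + p_cnt                       # sense_cnt follows the pointer symbols
    offsets = f[i + 2:i + 2 + synset_cnt]
    return IndexEntry(f[0], f[1], symbols, int(f[i]), int(f[i + 1]), offsets)


def parse_data_line(line: str) -> DataRecord:
    body, _, gloss = line.partition("|")  # the gloss follows the first "|"
    f = body.split()
    w_cnt = int(f[3], 16)               # word count is two-digit hexadecimal
    words = [(f[4 + 2 * k], int(f[5 + 2 * k], 16)) for k in range(w_cnt)]
    i = 4 + 2 * w_cnt
    p_cnt = int(f[i])                   # pointer count is three-digit decimal
    pointers = [Pointer(*f[i + 1 + 4 * k:i + 5 + 4 * k]) for k in range(p_cnt)]
    return DataRecord(f[0], int(f[1]), f[2], words, pointers, gloss.strip())


if __name__ == "__main__":
    print(parse_index_line("canary n 4 3 @ ~ #m 4 0 07263970 07137082 03881697 01055943"))
```

Each offset in an index entry is a byte offset into the corresponding data file (data.noun for "canary"). A reader can therefore seek to that offset, read one line, and hand it to `parse_data_line`. In the "canary" entry, "@" marks hypernym pointers, "~" hyponym pointers, and "#m" member holonym pointers.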