How to make a dictionary: WS 2008-9

Summary and Roundup

Dictionary evaluation matrix

Task

Evaluation Matrix

Dictionary/Criterion | Oxford Advanced Learner's Dictionary | Oxford English Dictionary | Duden Deutsche Rechtschreibung | Longman Pronunciation Dictionary | Collins German dictionary | en.wikipedia.org | en.wiktionary.org | www.beolingus.de | IATE (iate.europa.eu)
Required language/controlled vocabulary | English/3500 words | English/na | German/na | English | English or German | English, available in other languages in independent projects/na | English/na | English or German | One of 24 EU languages
Writing system/Pronunciation | Latin/IPA | Latin/na | Latin/na | Latin/IPA | Latin/IPA | Latin/na | Latin/na | Latin/idiosyncratic transcription system | standard script of the languages, most Latin/na
User Model | Some knowledge of English, used for definitions, syntactic use, orthography, some pronunciation | References for a word in literature | Principles of German orthography required to create hypothesis | Only for pronunciation, orthography has to be clear | Orthography of source language, disambiguation of homonyms requires knowledge of target language | Advanced knowledge of English | Advanced knowledge of English | Orthography of source language, disambiguation of homonyms requires knowledge of target language | Fluency in source and target language, some experience of subject fields or time to get into subject field
Authors | List of editors but no authorship for individual articles. | List of editors but no authorship for individual articles. | Reference to an editorial board, but no authorship for individual articles. | John C. Wells, qualification given | List of editors but no authorship for individual articles. | Every article has an editorial history: all contributors are listed in the order of participation. Some authors use pseudonyms; characterization of some authors and clear names available. | Every article has an editorial history: all contributors are listed in the order of participation. Real names available for most contributors; contributor characterization available. | List of contributors without profile | Technical translators of European institutions, not explicitly mentioned
Age of article | na (only publishing date of book) | na (only publishing date of book) | na (only publishing date of book) | na (only publishing date of book) | na (only publishing date of book) | Available with date and time in the editing history of each article | Available with date and time in the editing history of each article | na | na
Languages included | en | en | de | en | de/en | en | en | de/en, others available | official languages of the EU, currently 24 languages
Number of Articles as given by the source (online sources accessed: 2008-07-16 about 11 am, GMT): about 63,000; about 130,000; 280,000; 2,458,045; 859,654; about 8.4 million headwords
Amazon.com sales rank/as of July 16th 2008 (hardcover if available) | PB 131,921 | 189,588 | 6,535,371 | 298,303 | 1,608,142 | na | na | na | na

Lexicon Theory

Analysing a lexicon article

Lexicon theory
the study of the structure and composition of lexicons
Lexicography
the craft of writing/creating lexicons
Lexicographer
harmless drudge (according to Samuel Johnson)

Microstructure

Orthography | Transcription | POS | Definition 1 | Usage examples | ...
unit | /juːnɪt/ | n | a single thing, person or group that is complete in itself, although it can be part of sth larger | a family unit

unit, /juːnɪt/, n, a single thing, person or group that is complete in itself, although it can be part of sth larger, a family unit
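
The comma-separated form of the article shows a practical problem: the definition itself contains commas, so splitting the line at every comma yields more fields than the microstructure has. The following is a minimal sketch of one way around this, assuming Python's standard csv module. The Entry record and its field names are illustrative assumptions, not part of the course material.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class Entry:
    """One lexicon article; the field names are illustrative assumptions."""
    orthography: str
    transcription: str
    pos: str
    definition: str
    example: str


def to_csv_line(entry: Entry) -> str:
    """Serialize an entry; fields that contain commas are quoted by the writer."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(astuple(entry))
    return buffer.getvalue().rstrip("\r\n")


def from_csv_line(line: str) -> Entry:
    """Parse a line produced by to_csv_line back into an entry."""
    return Entry(*next(csv.reader([line])))


unit = Entry(
    "unit",
    "/juːnɪt/",
    "n",
    "a single thing, person or group that is complete in itself, "
    "although it can be part of sth larger",
    "a family unit",
)
line = to_csv_line(unit)
naive_fields = line.split(",")  # also splits inside the quoted definition
restored = from_csv_line(line)  # recovers the five fields of the article
```

Quoting the definition keeps the five categories of the microstructure apart even though the definition text contains the separator character.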

Meaning

Sign and Content

According to Saussure's 'dyadic' (two-part) model of the sign, each sign is composed of:

  1. a 'signifier' (signifiant): the form of a sign
  2. the 'signified' (signifié): the concept of a sign

Relation between signifier and signified

The relation between signifier and signified is arbitrary; nevertheless, a community must share some agreement about it in order to communicate. Where no such agreement exists, we are dealing with different languages.

Saussure's sign model

Figure: Saussure's sign model

About concepts

Concepts are abstract constructs. They refer to the mental image of something rather than to the concrete object "in the world". As the mental image may differ from person to person (see, for example, the Sapir-Whorf hypothesis), concepts are only used in discussion, and their concrete reference to the real world may be disputed.

A triadic model: the semiotic triangle

The semiotic triangle illustrates a further dimension: the dimension of perception.

Figure: Semiotic triangle

The semiotic triangle expresses the difference between the perception of a concept and the concept itself.

Subject to change: Semantics

Pejoration
A word takes on a negative meaning over time.
Example: German "Weib", once a neutral word for "Frau" (woman), has taken on a negative connotation.
Narrowing
The meaning becomes more specific.
Example: English "chant" used to mean "sing", but is now used only for singing in church or on a boat.
...
...

Types of Definitions

Common to all types: a term (the definiendum) is defined.

Rules for definitions

Problems with the rules of definition

Relation between words and their meaning

Semantic Relations

Synonymy:
Two words that have (more or less) the same meaning.
Example: grandmother | granny; pass away | die
Polysemy:
One word that has two (or more) closely related meanings.
Example: unit: family unit | entity
Homonymy:
Two words that have the same form (they look the same, are pronounced the same and belong to the same grammatical class) but whose meanings are not related at all.
Example: bank (finances) | bank (river)
Antonymy:
Two words that are closely related (they share the same word class and distribution and have the same hypernym) but whose meanings are "opposite" to each other are called antonyms. In terms of logic, a word A is an antonym of a word B iff B is not true whenever A is true.

Distinguishing polysemy and homonymy

The distinction between polysemes and homonyms is not always easy to draw. When in doubt, they are distinguished by their etymology: if they are derived from the same word they are polysemes; if they are derived from two different words they are homonyms.
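
The etymology test can be sketched as a small function. This is only an illustration under stated assumptions: the Sense record, its field names and the etymon labels are hypothetical placeholders, not real etymologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    form: str    # orthographic form shared by the candidates
    gloss: str   # short description of the meaning
    etymon: str  # placeholder label for the historical source word


def relation(a: Sense, b: Sense) -> str:
    """Apply the etymology test to two senses."""
    if a.form != b.form:
        return "different forms"
    return "polysemy" if a.etymon == b.etymon else "homonymy"


# The etymon labels below are placeholders chosen for the example only.
unit_family = Sense("unit", "family unit", "etymon-1")
unit_entity = Sense("unit", "entity", "etymon-1")
bank_money = Sense("bank", "finances", "etymon-2")
bank_river = Sense("bank", "river", "etymon-3")

unit_relation = relation(unit_family, unit_entity)
bank_relation = relation(bank_money, bank_river)
```

In a real lexicon the etymon field would have to come from an etymological source; the function only encodes the decision rule.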

IS-A relation

PART-OF relation

Ambiguity

Word Sense Disambiguation

Morphology

Basic concepts of morphology

Simple word:
  • Consists of only one morpheme
  • Example: boy, man, radio, book, paper, magnet, house, compute
Complex word:
  • Contains more than one morpheme (i.e. ≥2 morphemes)
  • Example: computer, boys, radio-recorder, bookshelf, magnetize, acid-free

Free morpheme:
  • can occur as a simple word.
  • Example: boy, man, radio,...
Bound morpheme:
  • can only occur in connection with other morphemes
  • Example: -s, -ion, un-, -ize, ...
Allomorph:
  • Variant form of a morpheme
  • Example: a -- an; plural -s: /s/ -- /ɪz/ -- /z/
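
Which allomorph of the plural -s is used depends on the last sound of the base. The following is a minimal sketch, assuming the base is described by its final phoneme in IPA. The two sound sets are a simplified inventory chosen for this example, and the function name is hypothetical.

```python
# Simplified sound classes for English; an assumption made for this sketch.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_phoneme: str) -> str:
    """Return the plural allomorph that follows the given final phoneme."""
    if final_phoneme in SIBILANTS:
        return "ɪz"  # as in "buses"
    if final_phoneme in VOICELESS:
        return "s"   # as in "cats"
    return "z"       # as in "dogs" and after vowels, as in "boys"


after_t = plural_allomorph("t")
after_s = plural_allomorph("s")
after_g = plural_allomorph("g")
```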

What about words?

Root:
  • Carries the meaning
  • Example: “believe” in “unbelievable”
Affixes:
  • Other parts [bound morphemes]
  • Examples: prefixes, suffixes, infixes, circumfixes
Prefix:
  • Affixes that attach before the root
  • Examples: anti-, de-, un-, (see below)
Suffix:
  • Affixes that attach after the root
  • Example: -ed, -s, -ment, (see below)
Base:
  • Form to which an affix is attached
  • Example: In “unbelievable”, “un-” is a prefix and “-able” a suffix. The affixes are attached one after the other: “believe” is the base to which “-able” is added, and “believable” is the base to which “un-” is added.
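
Splitting a word into prefix, root and suffix can be sketched with plain string operations. This is a deliberately naive illustration: the affix lists are tiny samples, the function name is hypothetical, and spelling changes are not undone (the root of "unbelievable" comes out as "believ"). Words that merely begin with the letters of a prefix, such as "decide", are split wrongly.

```python
# Small sample lists; a real affix inventory would be much larger.
PREFIXES = ("anti", "de", "un")
SUFFIXES = ("able", "ment", "ed", "s")


def segment(word: str) -> tuple[str, str, str]:
    """Split a word into (prefix, root, suffix) by naive string matching."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    root = rest[: len(rest) - len(suffix)]
    return prefix, root, suffix


unbelievable = segment("unbelievable")
unbreakable = segment("unbreakable")
boys = segment("boys")
```

The sketch only finds the pieces; it says nothing about the order in which the affixes attach to their bases.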

Derivation

Derivation is the process of adding a morpheme to a base, by which the meaning and/or word class of the base changes.

Consequences of derivation in English

Zero derivation

Base | Derived | Meaning
Xerox (the company) | to xerox | make a photocopy
thread | to thread | to put a thread through the eye of a needle; can also be used metaphorically
house | to house | to shelter someone or something
bottle | to bottle | to put something into a bottle
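
In a lexicon, zero derivation can be modelled as ordinary derivation with an empty affix: the form stays the same and only the word class changes. The following is a minimal sketch; the Word record, the derive function and the POS labels "n" and "v" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    pos: str  # word class, e.g. "n" for noun or "v" for verb


def derive(base: Word, new_pos: str, suffix: str = "") -> Word:
    """Derive a new word; an empty suffix models zero derivation."""
    return Word(base.form + suffix, new_pos)


bottle_noun = Word("bottle", "n")
bottle_verb = derive(bottle_noun, "v")                   # zero derivation
magnetize = derive(Word("magnet", "n"), "v", suffix="ize")  # overt suffix
```

Because the form alone no longer identifies the word, noun and verb need separate lexicon entries that differ in their POS field.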

Other types of bound morphemes

The following do not exist in English, and only arguably in German, but they do exist in other languages.

Infixes
a morpheme is inserted into another morpheme (in many Semitic languages)
Circumfix
material is added at the front and at the back of the base at the same time; some regard this as prefix + suffix. German example: ge-[root]-t as in ge-heiz-t

Inflectional morphology

Terms

Lexeme
A lexeme is an abstract concept denoting all possible forms of a word independent of inflection (and sometimes derivation)
Lemma
A lemma is a concrete representation of a lexeme, sometimes realized as a stem or base.
Stem
A stem is the part of a word after the removal of all affixes (see also root)
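
The relation between lexeme, lemma and inflected forms can be sketched as a lookup table. This is an illustration only: the two paradigms are hand-written samples, and the names PARADIGMS and lemma_of are hypothetical.

```python
from typing import Optional

# Each lexeme is represented by its lemma (the key) and lists its word forms.
PARADIGMS = {
    "boy": ["boy", "boys"],
    "compute": ["compute", "computes", "computed", "computing"],
}

# Reverse index: every inflected form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in PARADIGMS.items() for form in forms
}


def lemma_of(form: str) -> Optional[str]:
    """Return the lemma for a word form, or None if the form is unknown."""
    return FORM_TO_LEMMA.get(form.lower())


lemma_computed = lemma_of("computed")
lemma_boys = lemma_of("Boys")
```

A dictionary lists only the lemma as headword; the reverse index is what lets a reader or a program get from a form found in a text to that headword.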

Use of inflections in English

Derivation or inflection?

Compounding

Constituents of compounds

Compound or not?

Classes of compounds

Tatpurusa: endocentric
A+B is a kind of B: the second part (the head) is modified
Bahuvrihi: exocentric
A+B is attributed to C: Refers to something not part of the thing itself
Dvandva: copulative
A+B denotes the sum of both:

Other morphological processes

Social group identification by language

Lexicography and other linguistic fields

Figure: Relation of lexicons to other fields of linguistics

Creating lexicons: why?

Where does the lexicon data come from?

The bread and butter: IGT

Original | Then | I | must | be | thy | lady: | but | I | know
Gloss | then | I | will | be | your | wife | but | I | know
Gloss (German) | dann | ich | muss | sein | deine | Dame | aber | ich | weiß
POS | DP | PP | Aux | V | Possessive Pron | N | Conj | PP | V
"Translation" | If you say you are my husband, then it sounds logical to me that I am your wife.

Resulting lexicon

then
Modern: then
German: dann
POS: DP
I
Modern: I
German: ich
POS: PP
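
The step from the interlinear glossed text to the resulting lexicon can be sketched as follows. This is a minimal illustration under stated assumptions: the tiers are given as token lists of equal length, the function names are hypothetical, and the headword is formed by stripping punctuation and lowercasing everything except the pronoun "I".

```python
ORIGINAL = ["Then", "I", "must", "be", "thy", "lady:", "but", "I", "know"]
MODERN = ["then", "I", "will", "be", "your", "wife", "but", "I", "know"]
GERMAN = ["dann", "ich", "muss", "sein", "deine", "Dame", "aber", "ich", "weiß"]
POS = ["DP", "PP", "Aux", "V", "Possessive Pron", "N", "Conj", "PP", "V"]


def headword(token: str) -> str:
    """Strip punctuation and lowercase the token, keeping the pronoun 'I'."""
    word = token.strip(".,:;!?")
    return word if word == "I" else word.lower()


def build_lexicon(original, modern, german, pos):
    """Align the tiers token by token and collect one entry per headword."""
    if not (len(original) == len(modern) == len(german) == len(pos)):
        raise ValueError("all tiers must have the same number of tokens")
    lexicon = {}
    for token, m, g, p in zip(original, modern, german, pos):
        # The first occurrence of a headword wins; later ones are ignored.
        lexicon.setdefault(headword(token), {"Modern": m, "German": g, "POS": p})
    return lexicon


lexicon = build_lexicon(ORIGINAL, MODERN, GERMAN, POS)
```

Keeping only the first occurrence is a simplification: a word that is glossed differently in different contexts would need several senses in its entry.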

Creating a lexicon: It's magic

Examples for lexical resources from the research context

Finally: the exam

Class website
