Salama Dictionary is based on a corpus of 20 million Swahili words.
Salama Dictionary is not like traditional dictionaries.
When the user enters a word, the system first analyses the given word form and then searches the database for dictionary entries
that match the base form of that word. If the base form matches more than one entry, all matching entries are returned.
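The two-step lookup described above can be sketched roughly as follows. This is only an illustration: the names Entry, TOY_ANALYSES, analyse and lookup are hypothetical, the analyser is replaced by a tiny hand-written table, and the entries are toy data, not the actual Salama database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    base_form: str
    pos: str
    glosses: tuple


# Toy entries for illustration only (not taken from the real database).
ENTRIES = [
    Entry("kaa", "V", ("sit", "stay")),
    Entry("kaa", "N", ("crab",)),
    Entry("kaa", "N", ("charcoal",)),
    Entry("piga", "V", ("hit", "strike")),
]

# Stand-in for the morphological analyser: word form -> base form.
TOY_ANALYSES = {"anakaa": "kaa", "anapiga": "piga"}


def analyse(word_form):
    """Return the base form of a word form (assumed: unknown forms map to themselves)."""
    form = word_form.lower()
    return TOY_ANALYSES.get(form, form)


def lookup(word_form):
    """Analyse the word form first, then return every entry whose base form matches."""
    base = analyse(word_form)
    return [entry for entry in ENTRIES if entry.base_form == base]


for entry in lookup("anakaa"):
    print(entry.base_form, entry.pos, ", ".join(entry.glosses))
```

Because matching is done on the base form, the inflected form 'anakaa' in this sketch returns all three toy entries for 'kaa'.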
The result typically has three sections.
Dictionary entry
The dictionary entry normally has the following features: (1) base form, (2) part-of-speech, (3) gloss or glosses in English, (4) etymology, (5) frequency in the corpus.
Use examples
The system searches for examples of use in the corpus.
To avoid systematic errors in example selection, all example sentences were initially randomly shuffled.
Examples are then retrieved from this shuffled example corpus, and their number depends on how many examples the user wants to see.
By selecting option 1 the user gets just one example. Option 3 produces three examples.
If the user wants to find more examples, the option 5 EXAMPLES can be used.
This option shuffles the examples again and retrieves the first five examples from the list.
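The shuffle-and-take-first-N selection described above can be sketched as follows. The function name pick_examples and its parameters are assumptions made for illustration; how the actual system stores and shuffles its example corpus is not described here.

```python
import random


def pick_examples(examples, count, rng=None):
    """Shuffle a copy of the examples and return the first `count` of them."""
    if rng is None:
        rng = random.Random()
    shuffled = list(examples)  # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled[:count]


# Placeholder strings standing in for example sentences.
corpus_examples = [f"example sentence {i}" for i in range(1, 21)]

one = pick_examples(corpus_examples, 1)    # option 1
three = pick_examples(corpus_examples, 3)  # option 3
five = pick_examples(corpus_examples, 5)   # option 5 EXAMPLES
print(len(one), len(three), len(five))
```

Shuffling before taking the first items means that a fixed ordering of the corpus, for instance by source text, does not decide which examples are shown.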
The option MORE EXAMPLES uses a new implementation of the dictionary system. Below is more information on the new dictionary.
Each example sentence is translated into English. To save space, example sentences were cut on both sides of the hit.
This inevitably causes problems in translation.
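Cutting a sentence on both sides of the hit can be sketched as a simple token window. The function name cut_around_hit and the window size are assumptions; the cut-off length actually used is not stated here.

```python
def cut_around_hit(tokens, hit_index, window=5):
    """Keep at most `window` tokens on each side of the hit (window size is assumed)."""
    start = max(0, hit_index - window)
    end = min(len(tokens), hit_index + window + 1)
    return tokens[start:end]


tokens = "w0 w1 w2 w3 w4 HIT w6 w7 w8 w9".split()
print(" ".join(cut_around_hit(tokens, tokens.index("HIT"), window=2)))
```

A cut of this kind can remove the subject or object of a clause, which is one reason why the translation of a truncated example may be faulty.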
Analysis
The analysis of the word form entered in the text box is shown on the last line. For inflected verb forms, the result is quite reliable.
If the entered word does not contain enough morphological information, the analysis may differ from what the user intended.
Multiword expressions
The system also identifies various types of multiword expressions.
Multiword expressions are marked by the underscore '_' between words.
Such marks are in dictionary entries as well as in example sentences.
The dictionary user may search for such expressions as 'anapiga picha' and get the dictionary information for 'piga_picha',
together with the corresponding use examples.
Note that when you write a multiword expression, DO NOT put an underscore between words.
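The mapping from a query typed with spaces to an underscore-joined entry can be sketched as follows. The tables TOY_BASE_FORMS and KNOWN_MWES and the function to_mwe_key are hypothetical stand-ins; the real system uses its own analyser and lexicon.

```python
# Stand-in for the analyser: word form -> base form (toy data).
TOY_BASE_FORMS = {"anapiga": "piga", "picha": "picha"}

# Multiword expressions known to the toy lexicon, stored with underscores.
KNOWN_MWES = {"piga_picha"}


def to_mwe_key(query):
    """Reduce each word to its base form, join with '_', and check the lexicon."""
    bases = [TOY_BASE_FORMS.get(word, word) for word in query.lower().split()]
    key = "_".join(bases)
    return key if key in KNOWN_MWES else None


print(to_mwe_key("anapiga picha"))
```

The user types plain words separated by spaces; in this sketch the underscore is added internally, after each word has been reduced to its base form.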
Cross references
An entered word or multiword expression may also produce references to other words and multiword expressions.
Revised version of the dictionary
The option MORE EXAMPLES on the selection list opens a new version of the dictionary.
It is based on a more extensive and updated corpus. It is also more precise than the earlier version (options 1-4).
While the earlier version excluded some very common words, the new version is comprehensive.
In addition to all words in the corpus, it also contains a large number of words that do not occur in the corpus.
Such words have the frequency number '0'. The search for use examples is also more accurate,
because further search criteria have been added.
While the earlier version searched for examples on the basis of the base form only, the new version also uses part-of-speech
and noun class information.
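The difference between the two search strategies can be sketched as follows. The Token fields, the function names and the sample sentences are assumptions made for illustration; the noun class label '5/6' is used only as an example value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    surface: str
    base_form: str
    pos: str
    noun_class: Optional[str] = None


def matches(token, base_form, pos=None, noun_class=None):
    """A token matches if the base form and every criterion that was given agree."""
    if token.base_form != base_form:
        return False
    if pos is not None and token.pos != pos:
        return False
    if noun_class is not None and token.noun_class != noun_class:
        return False
    return True


def find_examples(sentences, base_form, pos=None, noun_class=None):
    """Return the sentences that contain at least one matching token."""
    return [
        sentence
        for sentence in sentences
        if any(matches(tok, base_form, pos, noun_class) for tok in sentence)
    ]


# Toy analysed sentences (one token list per sentence).
SAMPLE = [
    [Token("anakaa", "kaa", "V"), Token("hapa", "hapa", "ADV")],
    [Token("makaa", "kaa", "N", "5/6")],
]

print(len(find_examples(SAMPLE, "kaa")))            # base form only
print(len(find_examples(SAMPLE, "kaa", pos="N")))   # base form + part-of-speech
```

With the base form alone, both toy sentences are returned for 'kaa'; adding the part-of-speech or noun class narrows the result to examples of the intended entry.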
Multiword expressions are also displayed differently in the new version.
In the old version, the 'hit' in an example sentence is displayed so that part of the surface form comes before the hit
and part after it; in the new version, the hit is displayed after the multiword expression.
The new version also marks multiword expressions that were not searched for with the underscore '_' in example sentences.
English-Swahili dictionary
The option FIND ENGLISH on the selection list makes it possible to search for Swahili words
on the basis of English lexical words.
For example, if you enter 'photograph', you will get lexical information for 'picha N' and 'piga_picha V'.
You will also get use examples for both types of uses.
Note that when you are using the English-Swahili dictionary, you must enter the word in its base form.
Inflected forms are not allowed.
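A reverse lookup of this kind can be sketched as an index from English glosses to Swahili entries. The entry list, the glosses and the function names are toy assumptions; only the 'photograph' example comes from the text above.

```python
from collections import defaultdict

# Toy entries: (base form, part-of-speech, English glosses).
ENTRIES = [
    ("picha", "N", ["photograph", "picture"]),
    ("piga_picha", "V", ["photograph", "take a picture"]),
    ("piga", "V", ["hit", "strike"]),
]


def build_english_index(entries):
    """Map each English gloss to the Swahili entries that carry it."""
    index = defaultdict(list)
    for base_form, pos, glosses in entries:
        for gloss in glosses:
            index[gloss.lower()].append(f"{base_form} {pos}")
    return index


def find_english(index, word):
    """Exact match on the gloss; no analysis of English inflection is attempted."""
    return index.get(word.strip().lower(), [])


index = build_english_index(ENTRIES)
print(find_english(index, "photograph"))
print(find_english(index, "photographs"))
```

Since this sketch matches the entered word against the glosses exactly, an inflected form such as 'photographs' finds nothing, which illustrates why the base form is required.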