CSTlemma version 8.1.2

Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen, 2021, CSTlemma version 8.1.2, CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/45.
Date issued: 2021-05-21
Description
CSTlemma is a lemmatizer that treats prefixes, infixes and suffixes alike. The CST lemmatizer can be trained (and already has been trained) for tens of languages, including languages whose lemmatization rules change a word by adding or removing prefixes and/or infixes. In Dutch, for example, the word "afgemaakt" has the lemma "afmaken": the "ge" has to be removed, one "a" has to be removed from "aa", and the "t" ending must be replaced by "en".

Version 8 of CSTlemma adds the option to output the rule by which a given word is transformed into its lemma. It can also output just a unique identifier for that rule; in practice, this identifier is a pointer into the data structure that holds the rule set.

Rules for CSTlemma must be created with the affixtrain program (https://github.com/kuhumcst/affixtrain), but ready-made rules are available online. For example, the https://github.com/kuhumcst/texton-linguistic-resources repository contains rules for about 30 languages.

To build CSTlemma, you need not only the source code in https://github.com/kuhumcst/cstlemma but also some source files from https://github.com/kuhumcst/letterfunc and https://github.com/kuhumcst/parsesgml. The easiest way to proceed is to copy https://github.com/kuhumcst/cstlemma/blob/master/doc/makecstlemma.bash to a folder on a Linux machine (possibly also macOS) and run that script, which fetches all required repositories and builds cstlemma.
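To make the Dutch example concrete, here is a minimal Python sketch of a single rule that rewrites a prefix-side infix, an inner infix and a suffix in one step. The rule representation (the hypothetical `AffixRule` class: literal segments separated by wildcards, where each wildcard copies one or more letters unchanged into the lemma) is an assumption for illustration only; it is not CSTlemma's or affixtrain's actual rule format or matching algorithm.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AffixRule:
    """Hypothetical rule: literal segments separated by wildcards.

    The word must start with pattern[0], end with pattern[-1] and contain the
    middle segments in order. Each wildcard matches one or more letters that
    are copied unchanged into the lemma; each literal segment is replaced by
    the segment at the same position in `replacement`.
    """
    pattern: Tuple[str, ...]
    replacement: Tuple[str, ...]

    def __post_init__(self) -> None:
        if len(self.pattern) != len(self.replacement) or len(self.pattern) < 2:
            raise ValueError("pattern and replacement need the same number (>= 2) of segments")

    def apply(self, word: str) -> Optional[str]:
        # Build lit0(.+?)lit1(.+?)...litN; each group is one wildcard.
        regex = "(.+?)".join(re.escape(seg) for seg in self.pattern)
        match = re.fullmatch(regex, word)
        if match is None:
            return None  # the rule does not apply to this word
        # Interleave the replacement segments with the unchanged wildcard text.
        parts = [self.replacement[0]]
        for kept, new_seg in zip(match.groups(), self.replacement[1:]):
            parts.append(kept)
            parts.append(new_seg)
        return "".join(parts)


# Remove "ge", shorten "aa" to "a", and replace the final "t" by "en".
rule = AffixRule(pattern=("", "ge", "aa", "t"), replacement=("", "", "a", "en"))
print(rule.apply("afgemaakt"))  # afmaken
```

With the same hypothetical rule, "opgeraakt" would be lemmatized as "opraken", because the wildcards absorb "op", "r" and "k"; a word such as "lopen", which does not match the pattern, yields `None`.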
This item is publicly available
and licensed under:
Files in this item
Name: cstlemma-8.1.2.tar.gz
Size: 163.48 KB
Format: application/gzip
Description: Source code & Makefile
MD5: 627b300945873cdf284b8adece6e3555
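The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal Python sketch using only the standard `hashlib` module; the helper names `md5_of` and `matches_listed_md5` are hypothetical, and the file name you pass in (for example `cstlemma-8.1.2.tar.gz` in the current directory) is an assumption about where the archive was saved.

```python
import hashlib

# Checksum listed for cstlemma-8.1.2.tar.gz in the file listing.
EXPECTED_MD5 = "627b300945873cdf284b8adece6e3555"


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str) -> bool:
    """True if the file at `path` has the checksum listed for this item."""
    return md5_of(path) == EXPECTED_MD5
```

For example, `matches_listed_md5("cstlemma-8.1.2.tar.gz")` should return `True` for an unmodified download. MD5 only detects accidental corruption here; it is not a protection against deliberate tampering.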