Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2011,
DK-CLARIN Rapid Parallel Corpus 1993-2003 (da-en-de), CLARIN-DK-UCPH Centre Repository,
http://hdl.handle.net/20.500.12115/28.
| Field | Value |
|---|---|
| dc.creator | Hansen, Dorte Haltrup |
| dc.creator | Offersgaard, Lene |
| dc.date.accessioned | 2018-06-22T09:44:11Z |
| dc.date.available | 2018-06-22T09:44:11Z |
| dc.date.issued | 2011 |
| dc.description | The corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 (http://europa.eu/rapid/search.htm). Each of the 5330 press releases (files) exists in Danish, English and German, with approximately 3,000,000 words per language. All texts are in XML TEI P5 format (TEIP5DKCLARIN-format). The Danish and English texts are annotated with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and termhood; the German texts are annotated with tokenisation and sentence and paragraph segmentation. The annotations are placed in separate, text-external span groups. The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the University of Copenhagen, Centre for Language Technology. The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities, integrating written, spoken and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010. |
| dc.identifier.uri | http://hdl.handle.net/20.500.12115/28 |
| dc.language.iso | dan |
| dc.language.iso | eng |
| dc.language.iso | deu |
| dc.publisher | Centre for Language Technology, NorS, University of Copenhagen |
| dc.publisher | European Commission |
| dc.rights | CLARIN-ACA-NC |
| dc.rights.label | ACA |
| dc.rights.uri | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1 |
| dc.subject | press release |
| dc.subject | politics |
| dc.subject | EU |
| dc.title | DK-CLARIN Rapid Parallel Corpus 1993-2003 (da-en-de) |
| dc.type | corpus |
| local.annotationInfo.annotationType | tokenization |
| local.annotationInfo.annotationType | sentence and paragraph segmentation |
| local.annotationInfo.annotationType | POS-tagging |
| local.annotationInfo.annotationType | lemmatization |
| local.annotationInfo.annotationType | termhood scoring |
| local.branding | CLARIN-DK |
| local.contact.person | Administrator CLARIN-DK info@clarin.dk Centre for Language Technology, NorS, University of Copenhagen |
| local.files.count | 6 |
| local.files.size | 352451311 bytes |
| local.has.files | yes |
| local.language.name | Danish |
| local.language.name | English |
| local.language.name | German |
| local.size.info | 5330 files |
| local.size.info | 3000000 words |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
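Because each press release exists in Danish, English and German, a common first step is to line up the three language versions of every release. The sketch below assumes, without confirmation from this record, that da.zip, en.zip and de.zip store the versions of one release under the same base file name with an `.xml` extension; check the documentation (text-format.pdf) before relying on that. The function names `xml_members` and `pair_parallel_files` are illustrative only.

```python
import zipfile
from pathlib import PurePosixPath


def xml_members(zip_path):
    """Map each XML member's base file name to its full path inside the archive.

    If two members in different folders share a base name, the later one wins.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return {
            PurePosixPath(name).name: name
            for name in archive.namelist()
            if name.lower().endswith(".xml")
        }


def pair_parallel_files(da_zip, en_zip, de_zip):
    """Return (base name, da path, en path, de path) for names found in all three.

    Assumption (not stated in the record): the three archives use identical
    base file names for the language versions of one press release.
    """
    da, en, de = xml_members(da_zip), xml_members(en_zip), xml_members(de_zip)
    shared = sorted(da.keys() & en.keys() & de.keys())
    return [(name, da[name], en[name], de[name]) for name in shared]
```

Calling `pair_parallel_files("da.zip", "en.zip", "de.zip")` on the downloaded archives would return one tuple per release present in all three languages; an empty result suggests the naming assumption does not hold and the pairing key needs adjusting.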
Files in this item
| Name | Size | Format | Description | MD5 |
|---|---|---|---|---|
| teiHeader.xsd | 59.88 KB | text/xml | Schema | 9fc5374ad34319278f437b963454f972 |
| de.zip | 67.45 MB | application/zip | German corpus | a33313a5c1cd6760856bc68876096d34 |
| da.zip | 132.07 MB | application/zip | Danish corpus | 340ac9cd92f3dd4974f0d0ffcd391d78 |
| text-format.pdf | 111.77 KB | application/pdf | Documentation | c4c4b5f1cd83ff232c44bc7692621da7 |
| en.zip | 136.07 MB | application/zip | English corpus | 0406ef3fc8fb5eebe4f3d10fb952a994 |
| text-header.pdf | 375.79 KB | application/pdf | Documentation | 47825d0010a398bf10ce1564da2a15f0 |
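The MD5 values listed for each file make it possible to check that a download is complete and uncorrupted. A minimal Python sketch, assuming the files have been saved under their listed names in a single local directory; the names `md5_of` and `verify_downloads` are illustrative, not part of the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the downloadable files of this item.
EXPECTED_MD5 = {
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
    "de.zip": "a33313a5c1cd6760856bc68876096d34",
    "da.zip": "340ac9cd92f3dd4974f0d0ffcd391d78",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "en.zip": "0406ef3fc8fb5eebe4f3d10fb952a994",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Return {file name: status}, where status is 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Reading in chunks keeps memory use flat even for the larger archives (over 130 MB each), and a "mismatch" status is a signal to download that file again.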

