DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en)
Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2011,
DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en), CLARIN-DK-UCPH Centre Repository,
http://hdl.handle.net/20.500.12115/29.
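Authors
Centre for Language Technology, NorS, University of Copenhagen; European Commission
Item identifier
http://hdl.handle.net/20.500.12115/29
Date issued
2011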
Size
20000000 words,
8606 files,
8627 files
Description
The DK-CLARIN JRC-Acquis Parallel Corpus (da, en) is part of the JRC-Acquis multilingual parallel corpus. It contains documents from the Acquis Communautaire (AC), the total body of European Union (EU) law applicable in the EU Member States (see: https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis). The metadata of each document includes one or more Eurovoc class codes supplied by the European Commission.
Each language corpus (English and Danish) contains approximately 20 million words.
All texts are in XML TEI P5 format (TEIP5DKCLARIN-format), with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and, for Danish, termhood annotation. The annotations are stored outside the text itself, in separate text-external span groups.
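As a rough illustration of how text-external span groups can be read, the sketch below joins annotation layers back onto tokens. It is not the documented DK-CLARIN layout: the element and attribute names (`w`, `spanGrp`, `span`, `type`, `from`), the position of the span groups, and the sample document are all assumptions made for illustration. The actual structure is described in text-format.pdf and text-header.pdf.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified sample. It is NOT taken from the corpus and is not
# guaranteed to be schema-valid; element and attribute names are assumptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w xml:id="w1">Councils</w>
    <w xml:id="w2">decide</w>
  </s></p></body></text>
  <spanGrp type="lemma">
    <span from="#w1">council</span>
    <span from="#w2">decide</span>
  </spanGrp>
  <spanGrp type="pos">
    <span from="#w1">N</span>
    <span from="#w2">V</span>
  </spanGrp>
</TEI>"""


def read_standoff(xml_text):
    """Return {token_id: {"form": ..., <layer>: ...}} for one document."""
    root = ET.fromstring(xml_text)

    # Collect the tokens, keyed by their xml:id.
    tokens = {
        w.get(XML_ID): {"form": (w.text or "").strip()}
        for w in root.iter(f"{{{TEI_NS}}}w")
    }

    # Attach each annotation layer to the token its span points at.
    for group in root.iter(f"{{{TEI_NS}}}spanGrp"):
        layer = group.get("type")
        for span in group.iter(f"{{{TEI_NS}}}span"):
            target = (span.get("from") or "").lstrip("#")
            if target in tokens:
                tokens[target][layer] = (span.text or "").strip()
    return tokens


if __name__ == "__main__":
    for token_id, layers in read_standoff(SAMPLE).items():
        print(token_id, layers)
```

Keeping the annotations in separate layers means a reader of this kind can ignore layers it does not need, such as the termhood layer that exists only for Danish.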
The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen.
The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities integrating written, spoken, and visual records into a coherent and systematic digital repository.
The project ran from January 2008 until the end of 2010.
Collections
This item is Publicly Available
and licensed under:
Files in this item
- Name
- teiHeader.xsd
- Size
- 59.88 KB
- Format
- text/xml
- Description
- schema
- MD5
- 9fc5374ad34319278f437b963454f972

- Name
- eurovoc_in_skos_core_concepts.zip
- Size
- 7.31 MB
- Format
- application/zip
- Description
- Eurovoc Class codes
- MD5
- a73c868e716adf98446af1bd2f441bac

- Name
- text-format.pdf
- Size
- 111.77 KB
- Format
- application/pdf
- Description
- Documentation
- MD5
- c4c4b5f1cd83ff232c44bc7692621da7

- Name
- text-header.pdf
- Size
- 375.79 KB
- Format
- application/pdf
- Description
- Documentation
- MD5
- 47825d0010a398bf10ce1564da2a15f0

- Name
- da-2000-2003.zip
- Size
- 309.82 MB
- Format
- application/zip
- Description
- Danish Corpus, 2000-2003
- MD5
- b31822a44ed9b7502ee919adc8a28435

- Name
- en-2000-2003.zip
- Size
- 273.74 MB
- Format
- application/zip
- Description
- English Corpus, 2000-2003
- MD5
- 0e4eee0ca72f37db3adb1a8406bb8c09

- Name
- da-1958-1989.zip
- Size
- 166.09 MB
- Format
- application/zip
- Description
- Danish Corpus, 1958-1989
- MD5
- 06c58016b6b17a34d958903d68c0b0b5

- Name
- en-1990-1999.zip
- Size
- 319.67 MB
- Format
- application/zip
- Description
- English Corpus, 1990-1999
- MD5
- 79539e3de6bf4984fd8620f1e5fd92e3

- Name
- da-1990-1999.zip
- Size
- 357.4 MB
- Format
- application/zip
- Description
- Danish Corpus, 1990-1999
- MD5
- ac6da2a7872c77c7c3b0e9c51f0b0990

- Name
- en-1958-1989.zip
- Size
- 121.36 MB
- Format
- application/zip
- Description
- English Corpus, 1958-1989
- MD5
- ab323e7cacc7803d49be54f107722b53

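The MD5 values listed for each file can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming the files have been saved under their listed names in one local directory; the function names `md5_of` and `verify` are illustrative and not part of any repository tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list of this item.
EXPECTED_MD5 = {
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
    "eurovoc_in_skos_core_concepts.zip": "a73c868e716adf98446af1bd2f441bac",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "da-2000-2003.zip": "b31822a44ed9b7502ee919adc8a28435",
    "en-2000-2003.zip": "0e4eee0ca72f37db3adb1a8406bb8c09",
    "da-1958-1989.zip": "06c58016b6b17a34d958903d68c0b0b5",
    "en-1990-1999.zip": "79539e3de6bf4984fd8620f1e5fd92e3",
    "da-1990-1999.zip": "ac6da2a7872c77c7c3b0e9c51f0b0990",
    "en-1958-1989.zip": "ab323e7cacc7803d49be54f107722b53",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each listed file name to True if it is present and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks keeps memory use low for the larger archives, which are several hundred megabytes each.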

