DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en)

Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2011, DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en), CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/29.
Date issued
2011
Size
20000000 words,
8606 files,
8627 files
Language(s)
Description
The DK-CLARIN JRC-Acquis Parallel Corpus (da, en) is part of the JRC-Acquis multilingual parallel corpus. It contains documents from the Acquis Communautaire (AC), the total body of European Union (EU) law applicable in the EU Member States (see: https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis). The metadata of each document includes one or more Eurovoc class codes from the European Commission. Each language corpus (Danish and English) contains approximately 20 million words. All texts are in XML TEI P5 format (TEIP5DKCLARIN format) and are annotated with tokenisation, part-of-speech tagging, sentence and paragraph segmentation and lemmatisation; the Danish texts additionally carry termhood annotation. The annotations are stored in separate span groups external to the text (stand-off annotation).

The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen. The aim of the Danish CLARIN consortium was to build a Danish research infrastructure for the humanities that integrates written, spoken and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
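The stand-off layout described above can be illustrated with a minimal Python sketch using only the standard library. It assumes, for illustration only, that tokens are TEI `<w>` elements with `xml:id` values and that each annotation layer is a `<spanGrp>` whose `<span>` children point at a token through `from` and carry the annotation value in `ana`. These element and attribute choices, their placement in one file, the `read_layer` helper and the sample document are all hypothetical; the actual TEIP5DKCLARIN layout (which may keep the span groups in separate files) is described in the documentation files distributed with this item.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI}
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: one tokenised sentence plus two stand-off layers.
# The structure is an assumption, not the documented DK-CLARIN layout.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w xml:id="t1">Rådets</w> <w xml:id="t2">forordning</w>
  </p></body></text>
  <standOff>
    <spanGrp type="pos">
      <span from="#t1" ana="N"/>
      <span from="#t2" ana="N"/>
    </spanGrp>
    <spanGrp type="lemma">
      <span from="#t1" ana="råd"/>
      <span from="#t2" ana="forordning"/>
    </spanGrp>
  </standOff>
</TEI>"""


def read_layer(xml_text, layer_type):
    """Return (token text, annotation value) pairs for one span group type."""
    root = ET.fromstring(xml_text)
    # Index the token layer by xml:id so spans can be resolved.
    tokens = {w.get(XML_ID): w.text for w in root.iter(f"{{{TEI}}}w")}
    pairs = []
    for grp in root.iterfind(".//tei:spanGrp", TEI_NS):
        if grp.get("type") != layer_type:
            continue  # skip other annotation layers
        for span in grp.iterfind("tei:span", TEI_NS):
            token_id = span.get("from", "").lstrip("#")  # "#t1" -> "t1"
            pairs.append((tokens.get(token_id), span.get("ana")))
    return pairs
```

For example, `read_layer(SAMPLE, "lemma")` pairs each token with its lemma. Keeping annotations outside the running text in this way lets new layers be added without touching the text itself.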
Subject(s)
This item is Publicly Available
and licensed under:
 Files in this item
Name
teiHeader.xsd
Size
59.88 KB
Format
text/xml
Description
schema
MD5
9fc5374ad34319278f437b963454f972
Name
eurovoc_in_skos_core_concepts.zip
Size
7.31 MB
Format
application/zip
Description
Eurovoc Class codes
MD5
a73c868e716adf98446af1bd2f441bac
Name
text-format.pdf
Size
111.77 KB
Format
application/pdf
Description
Documentation
MD5
c4c4b5f1cd83ff232c44bc7692621da7
Name
text-header.pdf
Size
375.79 KB
Format
application/pdf
Description
Documentation
MD5
47825d0010a398bf10ce1564da2a15f0
Name
da-2000-2003.zip
Size
309.82 MB
Format
application/zip
Description
Danish Corpus, 2000-2003
MD5
b31822a44ed9b7502ee919adc8a28435
Name
en-2000-2003.zip
Size
273.74 MB
Format
application/zip
Description
English Corpus, 2000-2003
MD5
0e4eee0ca72f37db3adb1a8406bb8c09
Name
da-1958-1989.zip
Size
166.09 MB
Format
application/zip
Description
Danish Corpus, 1958-1989
MD5
06c58016b6b17a34d958903d68c0b0b5
Name
en-1990-1999.zip
Size
319.67 MB
Format
application/zip
Description
English Corpus, 1990-1999
MD5
79539e3de6bf4984fd8620f1e5fd92e3
Name
da-1990-1999.zip
Size
357.4 MB
Format
application/zip
Description
Danish Corpus, 1990-1999
MD5
ac6da2a7872c77c7c3b0e9c51f0b0990
Name
en-1958-1989.zip
Size
121.36 MB
Format
application/zip
Description
English Corpus, 1958-1989
MD5
ab323e7cacc7803d49be54f107722b53
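Each file in this item is listed with an MD5 checksum, so a download can be checked before use. The following minimal Python sketch, using only the standard library, recomputes the MD5 of each downloaded file in a directory and compares it with the value listed on this page. The helper names `md5_of` and `verify` are illustrative, and the sketch assumes the files keep their listed names. Files are read in 1 MiB chunks so the 300+ MB corpus archives need not fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
    "eurovoc_in_skos_core_concepts.zip": "a73c868e716adf98446af1bd2f441bac",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "da-2000-2003.zip": "b31822a44ed9b7502ee919adc8a28435",
    "en-2000-2003.zip": "0e4eee0ca72f37db3adb1a8406bb8c09",
    "da-1958-1989.zip": "06c58016b6b17a34d958903d68c0b0b5",
    "en-1990-1999.zip": "79539e3de6bf4984fd8620f1e5fd92e3",
    "da-1990-1999.zip": "ac6da2a7872c77c7c3b0e9c51f0b0990",
    "en-1958-1989.zip": "ab323e7cacc7803d49be54f107722b53",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Map each listed file name to "ok", "mismatch" or "missing"."""
    report = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            report[name] = "missing"
        elif md5_of(path) == expected:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Calling `verify("downloads")` on a download directory reports every listed file, so partially completed downloads show up as "missing" or "mismatch". MD5 detects accidental corruption in transfer; it is not a safeguard against deliberate tampering.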