Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2011, DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en), CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/29.
dc.creator: Hansen, Dorte Haltrup
dc.creator: Offersgaard, Lene
dc.date.accessioned: 2018-06-25T10:10:26Z
dc.date.available: 2018-06-25T10:10:26Z
dc.date.issued: 2011
dc.description: The DK-CLARIN JRC-Acquis Parallel Corpus (da, en) is part of the JRC-Acquis multilingual parallel corpus, which contains documents from the Acquis Communautaire (AC), the total body of European Union (EU) law applicable in the EU Member States (see: https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis). The documents come with one or more Eurovoc class codes, added to the metadata by the European Commission. Each language corpus (English and Danish) contains approximately 20 million words. All texts are in XML TEI P5 format (TEIP5DKCLARIN-format) and are annotated for tokenisation, POS tagging, sentence and paragraph segmentation, and lemmatisation; the Danish texts are additionally annotated for termhood. The annotations are placed in separate span groups external to the text (stand-off annotation). The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen. The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities, integrating written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
dc.identifier.uri: http://hdl.handle.net/20.500.12115/29
dc.language.iso: dan
dc.language.iso: eng
dc.publisher: Centre for Language Technology, NorS, University of Copenhagen
dc.publisher: European Commission
dc.rights: Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.label: PUB
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: legal
dc.subject: EU
dc.title: DK-CLARIN JRC-Acquis Parallel Corpus 1958-2003 (da-en)
dc.type: corpus
local.annotationInfo.annotationType: tokenization
local.annotationInfo.annotationType: sentence and paragraph segmentation
local.annotationInfo.annotationType: POS-tagging
local.annotationInfo.annotationType: lemmatization
local.annotationInfo.annotationType: termhood scoring
local.branding: CLARIN-DK
local.contact.person: Administrator CLARIN-DK, info@clarin.dk, Centre for Language Technology, NorS, University of Copenhagen
local.files.count: 10
local.files.size: 1631520040 (bytes)
local.has.files: yes
local.language.name: Danish
local.language.name: English
local.size.info: 20000000 words
local.size.info: 8606 files
local.size.info: 8627 files
metashare.ResourceInfo#ContentInfo.mediaType: text
This item is Publicly Available and licensed under Creative Commons - Attribution 4.0 International (CC BY 4.0).
Files in this item

Name | Size | Format | Description | MD5
teiHeader.xsd | 59.88 KB | text/xml | Schema | 9fc5374ad34319278f437b963454f972
eurovoc_in_skos_core_concepts.zip | 7.31 MB | application/zip | Eurovoc class codes | a73c868e716adf98446af1bd2f441bac
text-format.pdf | 111.77 KB | application/pdf | Documentation | c4c4b5f1cd83ff232c44bc7692621da7
text-header.pdf | 375.79 KB | application/pdf | Documentation | 47825d0010a398bf10ce1564da2a15f0
da-2000-2003.zip | 309.82 MB | application/zip | Danish Corpus, 2000-2003 | b31822a44ed9b7502ee919adc8a28435
en-2000-2003.zip | 273.74 MB | application/zip | English Corpus, 2000-2003 | 0e4eee0ca72f37db3adb1a8406bb8c09
da-1958-1989.zip | 166.09 MB | application/zip | Danish Corpus, 1958-1989 | 06c58016b6b17a34d958903d68c0b0b5
en-1990-1999.zip | 319.67 MB | application/zip | English Corpus, 1990-1999 | 79539e3de6bf4984fd8620f1e5fd92e3
da-1990-1999.zip | 357.4 MB | application/zip | Danish Corpus, 1990-1999 | ac6da2a7872c77c7c3b0e9c51f0b0990
en-1958-1989.zip | 121.36 MB | application/zip | English Corpus, 1958-1989 | ab323e7cacc7803d49be54f107722b53
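The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using only the Python standard library. It assumes the files have been downloaded into a single directory under the names listed for this item; the helper names `md5_of` and `verify` are hypothetical and are not part of any CLARIN-DK tooling.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# Expected MD5 checksums, copied from the file listing of this item.
EXPECTED_MD5 = {
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
    "eurovoc_in_skos_core_concepts.zip": "a73c868e716adf98446af1bd2f441bac",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "da-2000-2003.zip": "b31822a44ed9b7502ee919adc8a28435",
    "en-2000-2003.zip": "0e4eee0ca72f37db3adb1a8406bb8c09",
    "da-1958-1989.zip": "06c58016b6b17a34d958903d68c0b0b5",
    "en-1990-1999.zip": "79539e3de6bf4984fd8620f1e5fd92e3",
    "da-1990-1999.zip": "ac6da2a7872c77c7c3b0e9c51f0b0990",
    "en-1958-1989.zip": "ab323e7cacc7803d49be54f107722b53",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: MD5 hex digest of a file, read in chunks so large zips fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> Dict[str, Optional[bool]]:
    """Hypothetical helper: True if a file matches, False if it differs, None if it is missing."""
    results: Dict[str, Optional[bool]] = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Check the current directory and report one status line per expected file.
    for name, ok in verify(Path(".")).items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Reading in fixed-size chunks keeps memory use flat even for the archives of several hundred megabytes; a `MISMATCH` most likely indicates an interrupted or corrupted download.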