Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2012, DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de), CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/30.
dc.creator: Haltrup Hansen, Dorte
dc.creator: Offersgaard, Lene
dc.date.accessioned: 2018-06-25T13:41:09Z
dc.date.available: 2018-06-25T13:41:09Z
dc.date.issued: 2012
dc.description: The aligned corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The corpus comprises 5330 + 2200 press releases (files) for each of the languages Danish, English and German, with approx. 5,000,000 words per language and 260,000 - 270,000 aligned sentences for each of the language pairs Danish-English and Danish-German. All documents were processed with Uplug (https://bitbucket.org/tiedemann/uplug/wiki/Home) and aligned with HunAlign. Files with more than 10 % negative alignments have been removed, and so have all 0-alignments. The documents are in txt format for each language and in tmx format for the aligned language pairs (da-en and da-de).
dc.identifier.uri: http://hdl.handle.net/20.500.12115/30
dc.language.iso: dan
dc.language.iso: eng
dc.language.iso: deu
dc.publisher: Centre for Language Technology, NorS, University of Copenhagen
dc.publisher: European Commission
dc.rights: CLARIN-ACA-NC
dc.rights.label: ACA
dc.rights.uri: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1
dc.subject: MT
dc.subject: EU
dc.subject: press release
dc.subject: alignment
dc.subject: politics
dc.title: DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de)
dc.type: corpus
local.branding: CLARIN-DK
local.contact.person: Administrator CLARIN-DK, info@clarin.dk, Centre for Language Technology, NorS, University of Copenhagen
local.files.count: 3
local.files.size: 112987350
local.has.files: yes
local.language.name: Danish
local.language.name: English
local.language.name: German
local.size.info: 5000000 tokens
local.size.info: 270000 sentences
metashare.ResourceInfo#ContentInfo.mediaType: text
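The description above says the aligned language pairs are distributed in tmx format. As a minimal sketch of how such a file might be read, the Python below extracts sentence pairs from a TMX document using only the standard library. The exact layout of the TMX files inside the archives is not documented on this page, so the function name `read_tmx_pairs`, the assumption that each translation unit holds one `seg` per language, and the sample sentences are all illustrative assumptions rather than a description of the corpus itself.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml:lang attribute under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="da", tgt_lang="en"):
    """Yield (source_sentence, target_sentence) tuples from a TMX file.

    `source` is a file path or a binary file object. Language codes are
    compared on their first two letters, so "da-DK" matches "da".
    This is a hypothetical helper, not part of the corpus distribution.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang[:2]] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]
        elem.clear()  # free memory; useful for large corpus files


# Made-up sample data for illustration only (not taken from the corpus).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="da"/>
  <body>
    <tu>
      <tuv xml:lang="da"><seg>God morgen.</seg></tuv>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = list(read_tmx_pairs(io.BytesIO(SAMPLE)))
print(pairs)
```

For the Danish-German pair, the same function could be called with `tgt_lang="de"`, assuming the files label German segments that way.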
This item is Academic Use and licensed under: CLARIN-ACA-NC
 Files in this item
Name: Rapid-2004-2011.zip
Size: 39.2 MB
Format: application/zip
Description: Corpus 2004 - 2011
MD5: ce84f48a004e249fcbe511faf0856e77
Name: README.txt
Size: 1.01 KB
Format: text/plain
Description: Documentation
MD5: 8a7d86a2ef03a56751b93a15b60a4d63
Name: Rapid-1993-2003.zip
Size: 68.55 MB
Format: application/zip
Description: Corpus 1993 - 2003
MD5: d73a47ab17a22afeff024a360100e907
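The MD5 values listed for the three files can be used to check that a download is intact. The following is a minimal sketch, assuming the files have been saved locally under the names shown in the listing; the helper names `md5_of_file` and `matches_listing` are hypothetical and not part of any repository tooling.

```python
import hashlib
import os

# Checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "Rapid-2004-2011.zip": "ce84f48a004e249fcbe511faf0856e77",
    "README.txt": "8a7d86a2ef03a56751b93a15b60a4d63",
    "Rapid-1993-2003.zip": "d73a47ab17a22afeff024a360100e907",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Compare a local file against the checksum listed for its file name.

    Raises KeyError if the file name is not one of the three listed files.
    """
    name = os.path.basename(path)
    return md5_of_file(path) == EXPECTED_MD5[name]


if __name__ == "__main__":
    # Assumes the downloads sit in the current directory (an assumption).
    for name in EXPECTED_MD5:
        if os.path.exists(name):
            status = "OK" if matches_listing(name) else "MISMATCH"
            print(f"{name}: {status}")
        else:
            print(f"{name}: not found")
```

Reading in chunks keeps memory use low for the larger archives. MD5 is adequate for detecting accidental corruption during transfer, which is the purpose it serves in this listing.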