Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen and European Commission, 2012,
DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de), CLARIN-DK-UCPH Centre Repository,
http://hdl.handle.net/20.500.12115/30.
| dc.creator | Haltrup Hansen, Dorte |
| dc.creator | Offersgaard, Lene |
| dc.date.accessioned | 2018-06-25T13:41:09Z |
| dc.date.available | 2018-06-25T13:41:09Z |
| dc.date.issued | 2012 |
| dc.description | The aligned corpus consists of press releases from the European Commission Press Release Database (Rapid), harvested in 2009 and 2011 (http://europa.eu/rapid/search.htm). The corpus comprises 5330 + 2200 press releases (files) for each of the languages Danish, English and German, with approx. 5,000,000 words per language and 260,000 - 270,000 aligned sentences for the language pairs Danish - English and Danish - German. All documents were processed with Uplug (https://bitbucket.org/tiedemann/uplug/wiki/Home) and aligned with HunAlign. Files with more than 10 % negative alignments have been removed, as have all 0-alignments. The documents are in txt format for each language and in tmx format for the aligned language pairs (da-en and da-de). |
| dc.identifier.uri | http://hdl.handle.net/20.500.12115/30 |
| dc.language.iso | dan |
| dc.language.iso | eng |
| dc.language.iso | deu |
| dc.publisher | Centre for Language Technology, NorS, University of Copenhagen |
| dc.publisher | European Commission |
| dc.rights | CLARIN-ACA-NC |
| dc.rights.label | ACA |
| dc.rights.uri | https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&NORED=1 |
| dc.subject | MT |
| dc.subject | EU |
| dc.subject | press release |
| dc.subject | alignment |
| dc.subject | politics |
| dc.title | DK-CLARIN Rapid Aligned Corpus 1993-2011 (da-en, da-de) |
| dc.type | corpus |
| local.branding | CLARIN-DK |
| local.contact.person | Administrator CLARIN-DK info@clarin.dk Centre for Language Technology, NorS, University of Copenhagen |
| local.files.count | 3 |
| local.files.size | 112987350 |
| local.has.files | yes |
| local.language.name | Danish |
| local.language.name | English |
| local.language.name | German |
| local.size.info | 5000000 tokens |
| local.size.info | 270000 sentences |
| metashare.ResourceInfo#ContentInfo.mediaType | text |
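
The description states that the aligned language pairs are distributed in tmx format. As a minimal sketch of how such a file could be read, the Python function below collects sentence pairs from a TMX document using only the standard library. It assumes the standard TMX layout (`tu` elements containing `tuv` elements with an `xml:lang` attribute and a `seg` child) and that the language codes start with `da`, `en` or `de`; neither the exact codes nor the file names inside the archives are confirmed by this record, and `read_tmx_pairs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang="da", tgt_lang="en"):
    """Return (source, target) sentence pairs from a TMX file.

    `source` is a file name or a binary file object. The two-letter
    language codes are an assumption about how the corpus labels its
    segments; adjust them if the files use other codes.
    """
    pairs = []
    # Stream the file so that large corpora are not loaded all at once.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segments = {}
        for tuv in elem.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segments[lang[:2]] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
        elem.clear()  # free the memory held by the processed unit
    return pairs
```

Calling `read_tmx_pairs(path, "da", "de")` on a Danish - German file would return the corresponding pairs under the same assumptions.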
Files in this item
- Name
- Rapid-2004-2011.zip
- Size
- 39.2 MB
- Format
- application/zip
- Description
- Corpus 2004 - 2011
- MD5
- ce84f48a004e249fcbe511faf0856e77

- Name
- README.txt
- Size
- 1.01 KB
- Format
- text/plain
- Description
- Documentation
- MD5
- 8a7d86a2ef03a56751b93a15b60a4d63

- Name
- Rapid-1993-2003.zip
- Size
- 68.55 MB
- Format
- application/zip
- Description
- Corpus 1993 - 2003
- MD5
- d73a47ab17a22afeff024a360100e907
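
The MD5 values listed for the three files can be used to check that a download is complete. The following is a minimal sketch, assuming the files have been saved under their listed names in a single directory; the function names `md5_of` and `verify_downloads` are hypothetical helpers, not part of any repository tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file list of this item.
EXPECTED_MD5 = {
    "Rapid-2004-2011.zip": "ce84f48a004e249fcbe511faf0856e77",
    "README.txt": "8a7d86a2ef03a56751b93a15b60a4d63",
    "Rapid-1993-2003.zip": "d73a47ab17a22afeff024a360100e907",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A file reported as mismatched is most likely an incomplete download and should be fetched again.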


