DK-CLARIN Parallel Financial Corpus (da-en)
Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen, 2011,
DK-CLARIN Parallel Financial Corpus (da-en), CLARIN-DK-UCPH Centre Repository,
http://hdl.handle.net/20.500.12115/18.
Date issued
2011
Size
4343072 tokens (Danish),
4854172 tokens (English),
90 files
Description
The DK-CLARIN Parallel Financial Corpus comprises 4.3 million Danish and 4.8 million English tokens from translated (parallel) documents, mainly annual reports from the period 2002-2010, from 12 of the biggest Danish companies.
All texts are in TEI P5 XML (the TEIP5DKCLARIN format), with tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation, and termhood annotation stored in separate, text-external span groups.
The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen.
The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities integrating written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
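Since the annotation layers are described as living in separate span groups, a quick way to see how a given file is organised is to count the element names it contains. The following is a minimal sketch under stated assumptions: it assumes the corpus archives contain members ending in `.xml`, and it deliberately does not assume any particular TEIP5DKCLARIN element name. The function names `count_elements` and `survey_archive` are illustrative only.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_bytes):
    """Count how often each element name occurs in one XML document."""
    counts = Counter()
    for elem in ET.fromstring(xml_bytes).iter():
        counts[local_name(elem.tag)] += 1
    return counts


def survey_archive(zip_path, limit=1):
    """Count element names in the first `limit` XML members of a zip file.

    Assumption: corpus documents are stored as members ending in '.xml'.
    """
    surveyed = {}
    with zipfile.ZipFile(zip_path) as archive:
        xml_members = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        for name in xml_members[:limit]:
            surveyed[name] = count_elements(archive.read(name))
    return surveyed


if __name__ == "__main__":
    # Hypothetical local path to one of the downloaded archives.
    for member, counts in survey_archive("annual-reports-en.zip").items():
        print(member)
        for element, n in counts.most_common(15):
            print(f"  {element}: {n}")
```

The element names actually used for tokens, lemmas, and term annotations are documented in text-format.pdf and the two schema files, which should be consulted before writing a real parser.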
Acknowledgement
n/a
Project code: n/a
Project name: DK-CLARIN
Files in this item
- Name: annual-reports-en.zip
  Size: 87.23 MB
  Format: application/zip
  Description: Corpus - English
  MD5: e79821e3d1b912536f56254760b2e85e

- Name: text-header.pdf
  Size: 375.79 KB
  Format: application/pdf
  Description: Documentation
  MD5: 47825d0010a398bf10ce1564da2a15f0

- Name: text-format.pdf
  Size: 111.77 KB
  Format: application/pdf
  Description: Documentation
  MD5: c4c4b5f1cd83ff232c44bc7692621da7

- Name: textCorpusProfile.xsd
  Size: 142.26 KB
  Format: text/xml
  Description: Schema
  MD5: 7d6b452b88175041133ea8020e453cd8

- Name: annual-reports-da.zip
  Size: 93.86 MB
  Format: application/zip
  Description: Corpus - Danish
  MD5: 441f9b22e1f510d83a9e5dba1725b7a3

- Name: README_financial-reports.txt
  Size: 2.38 KB
  Format: text/plain
  Description: Documentation
  MD5: a8048d2626384dbaa8cb0d0b9dccbef7

- Name: teiHeader.xsd
  Size: 59.88 KB
  Format: text/xml
  Description: Schema
  MD5: 9fc5374ad34319278f437b963454f972
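The MD5 values listed for the files can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming the files have been saved under their listed names in a single local directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of any repository tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files in this item.
EXPECTED_MD5 = {
    "annual-reports-en.zip": "e79821e3d1b912536f56254760b2e85e",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "textCorpusProfile.xsd": "7d6b452b88175041133ea8020e453cd8",
    "annual-reports-da.zip": "441f9b22e1f510d83a9e5dba1725b7a3",
    "README_financial-reports.txt": "a8048d2626384dbaa8cb0d0b9dccbef7",
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True (match), False (mismatch), or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use low for the two corpus archives, which are each close to 90 MB. MD5 is adequate here for detecting transfer errors, though it is not a safeguard against deliberate tampering.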

