DK-CLARIN Parallel Financial Corpus (da-en)

Please use the following text to cite this item:
Centre for Language Technology, NorS, University of Copenhagen, 2011, DK-CLARIN Parallel Financial Corpus (da-en), CLARIN-DK-UCPH Centre Repository, http://hdl.handle.net/20.500.12115/18.
Date issued
2011
Size
4343072 tokens (Danish),
4854172 tokens (English),
90 files
Language(s)
Danish, English
Description
The DK-CLARIN Parallel Financial Corpus comprises 4.3 M Danish and 4.8 M English tokens from translated (parallel) documents, mainly annual reports, from the period 2002-2010, published by 12 of the largest Danish companies. All texts are in TEI P5 XML format (TEIP5DKCLARIN-format). Tokenisation, POS tagging, sentence and paragraph segmentation, lemmatisation and termhood annotation are stored in separate, text-external span groups. The corpus was collected and processed in work package 2.6 of the Danish CLARIN project (see http://dkclarin.ku.dk/english) by the Centre for Language Technology, University of Copenhagen. The aim of the Danish CLARIN consortium was to construct a Danish research infrastructure for the humanities that integrates written, spoken, and visual records into a coherent and systematic digital repository. The project ran from January 2008 until the end of 2010.
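Because the annotation layers are described as living in separate span groups, a quick first step after downloading is to see which XML elements actually occur in the files. The following is a minimal sketch for that kind of inspection, not part of the corpus documentation. It assumes the archives contain files ending in `.xml`, which this page does not confirm. The function name `count_elements` is hypothetical, and no TEIP5DKCLARIN element names are assumed: the code simply tallies whatever it finds.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_elements(zip_path, max_files=None):
    """Tally element names across the XML members of a zip archive.

    Assumption: corpus files inside the archive end in '.xml'.
    max_files limits how many members are read (None reads all).
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        xml_names = [n for n in archive.namelist() if n.lower().endswith(".xml")]
        for name in xml_names[:max_files]:
            with archive.open(name) as member:
                # Stream the file so large documents are not held in memory.
                for _, element in ET.iterparse(member, events=("end",)):
                    counts[local_name(element.tag)] += 1
                    element.clear()
    return counts
```

Calling `count_elements("annual-reports-da.zip", max_files=1).most_common(20)` would list the most frequent element names in one file, which can then be compared against text-format.pdf and textCorpusProfile.xsd.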
This item is Academic Use
and licensed under:
 Files in this item
Name
annual-reports-en.zip
Size
87.23 MB
Format
application/zip
Description
Corpus - English
MD5
e79821e3d1b912536f56254760b2e85e
Name
text-header.pdf
Size
375.79 KB
Format
application/pdf
Description
Documentation
MD5
47825d0010a398bf10ce1564da2a15f0
Name
text-format.pdf
Size
111.77 KB
Format
application/pdf
Description
Documentation
MD5
c4c4b5f1cd83ff232c44bc7692621da7
Name
textCorpusProfile.xsd
Size
142.26 KB
Format
text/xml
Description
Schema
MD5
7d6b452b88175041133ea8020e453cd8
Name
annual-reports-da.zip
Size
93.86 MB
Format
application/zip
Description
Corpus - Danish
MD5
441f9b22e1f510d83a9e5dba1725b7a3
Name
README_financial-reports.txt
Size
2.38 KB
Format
text/plain
Description
Documentation
MD5
a8048d2626384dbaa8cb0d0b9dccbef7
Name
teiHeader.xsd
Size
59.88 KB
Format
text/xml
Description
Schema
MD5
9fc5374ad34319278f437b963454f972
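The MD5 values listed for each file can be used to check that a download completed without corruption. The sketch below is one way to do that with Python's standard library. The checksums are copied from the listing above. The function names `md5_of_file` and `verify_downloads` are illustrative, and the code assumes all files were saved under their listed names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing on this page.
EXPECTED_MD5 = {
    "annual-reports-en.zip": "e79821e3d1b912536f56254760b2e85e",
    "text-header.pdf": "47825d0010a398bf10ce1564da2a15f0",
    "text-format.pdf": "c4c4b5f1cd83ff232c44bc7692621da7",
    "textCorpusProfile.xsd": "7d6b452b88175041133ea8020e453cd8",
    "annual-reports-da.zip": "441f9b22e1f510d83a9e5dba1725b7a3",
    "README_financial-reports.txt": "a8048d2626384dbaa8cb0d0b9dccbef7",
    "teiHeader.xsd": "9fc5374ad34319278f437b963454f972",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {filename: 'ok' | 'mismatch' | 'missing'} for each listed file."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

For example, `verify_downloads(".")` reports the status of every listed file in the current directory. A file reported as "mismatch" should be downloaded again.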