Listing English-Finnish Resources

Name Type Description Supported Languages
Seram MT System
Seram
Company: Sunda Systems Oy
Category: MT system
Languages: English to Finnish
Dictionary: 120,000 terms
Requirements: Windows 2000/XP; Internet Explorer, Mozilla Firefox
Input: Word, RTF, text, PDF, webpages
Note: also available as client-server system
Price: subscription (from ...)
English to Finnish
Teemapoint MT System
Teemapoint
Company: teemapoint.com
Category: MT system
Languages: English to Finnish
Requirements: 1 GHz; 256MB RAM; Java or JRE (1.3 or later)
Note: NLP workstation with integrated MT system
Price: licence from ...
English to Finnish
Acquis Communautaire Parallel Corpus
The Acquis Communautaire (AC) is the total body of European Union (EU) law applicable in the EU Member States. This collection of legislative text changes continuously and currently comprises selected texts written between the 1950s and now. At the beginning of 2007, the EU had 27 Member States and 23 official languages (see the Wikipedia entry). The Acquis Communautaire texts exist in these languages, although Irish translations are not currently available. The Acquis Communautaire is thus a collection of parallel texts in the following 22 languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish.
Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish
EuroParl release 3.0 Parallel Corpus
The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romance (French, Italian, Spanish, Portuguese), Germanic (English, Dutch, German, Danish, Swedish), Greek and Finnish.

For a detailed description of this corpus, please read:
Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005
French, Italian, Spanish, Portuguese, English, Dutch, German, Danish, Swedish, Greek, Finnish
OPUS: EUconst Parallel Corpus
21 languages, 210 bitexts
total number of files: 987
total number of tokens: 3099290
total number of sentence fragments: 224919
Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Irish, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Slovak, Slovenian, Swedish
OPUS: KDE Parallel Corpus
source: http://i18n.kde.org/
61 languages, 1830 bitexts
total number of files: 24586
total number of tokens: 20414063
total number of sentence fragments: 4777530
Afrikaans, Arabic, Azerbaijani, Belarusian, Bulgarian, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Esperanto, Spanish, Estonian, Basque, Finnish, French, Irish, Galician, Hebrew, Croatian, Hungarian, Indonesian, Icelandic, Italian, Japanese, Korean, Kurdish, Lithuanian, Latvian, Maori, Macedonian, Maltese, Norwegian Bokmål, Dutch, Norwegian Nynorsk, Occitan, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Tamil, Thai, Turkish, Ukrainian, ven, Vietnamese, Walloon, Xhosa, Zulu
OPUS: KDEdoc Parallel Corpus
source: http://i18n.kde.org/
24 languages, 226 bitexts
total number of files: 3736
total number of tokens: 3783411
total number of sentence fragments: 302634
Danish, German, English, Spanish, Estonian, French, Hungarian, Italian, Japanese, Dutch, Norwegian Nynorsk, Portuguese, Portuguese, Romanian, Russian, Slovak, Slovenian, Serbian, Swedish, Turkish, Ukrainian, Walloon, Xhosa, Chinese
OPUS: OpenOffice Parallel Corpus
6 languages, 15 bitexts
total number of files: 10983
total number of tokens: 2612156
total number of sentence fragments: 246760

The original documentation of the office package OpenOffice.org (http://www.openoffice.org/) contains 2014 English documents, which have been partly translated into 5 languages: French, Spanish, Swedish, German, and Japanese. The original English documentation comprises about 500,000 words, and the translations contain between 400,000 and 500,000 words per language. All documents have been tokenized and, except for the Spanish part, tagged with parts of speech. The English part of the corpus has also been marked with syntactic chunks.
German, English, Spanish, French, Japanese, Swedish
OPUS: PHP Parallel Corpus
22 languages, 231 bitexts
total number of files: 71518
total number of tokens: 3303964
total number of sentence fragments: 1381582

PHP manuals and translations have been downloaded from http://www.php.net/download-docs.php. The original documents are written in English and have been partly translated into 21 languages. The original manuals contain about 500,000 words. The amount of text actually translated varies by language, between 50,000 and 380,000 words. The corpus is rather noisy and may include parts of the English original in some of the translations. The corpus is tokenized, and each language pair has been sentence-aligned.
Czech, German, English, Spanish, Finnish, French, Hebrew, Hungarian, Italian, Japanese, Korean, Dutch, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Swedish, Turkish, Chinese
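
A note on the "languages, bitexts" figures quoted in the OPUS entries: if a bitext is taken to be one aligned, unordered language pair (an assumption made here, not something the listing states), then n languages allow at most n(n-1)/2 bitexts. The following minimal Python sketch compares the quoted figures against that upper bound. The function name max_bitexts and the dictionary layout are illustrative only.

```python
from math import comb

# (languages, bitexts) as quoted in the OPUS entries of this listing
quoted = {
    "EUconst": (21, 210),
    "KDE": (61, 1830),
    "KDEdoc": (24, 226),
    "OpenOffice": (6, 15),
    "PHP": (22, 231),
}

def max_bitexts(n_languages: int) -> int:
    """Upper bound on bitexts: the number of unordered language pairs."""
    return comb(n_languages, 2)

for name, (langs, bitexts) in quoted.items():
    upper = max_bitexts(langs)
    if bitexts == upper:
        status = "every pair present"
    else:
        status = f"{upper - bitexts} pairs absent"
    print(f"{name}: {bitexts} of {upper} possible bitexts ({status})")
```

Under that assumption, four of the five corpora cover every language pair, while the KDEdoc figure falls short of the bound, which would mean some pairs have no aligned documents.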