Free/open-source machine translation software

Article by Mikel L. Forcada, which appeared on Computing.dcu.ie

Here’s a non-exhaustive list of links to existing free/open-source machine translation systems, which I will try to complete as I find out about them. To the best of my knowledge, software listed here has:

Rule-based systems

  • Apertium, a free/open-source rule-based machine translation platform.
  • Matxin, a free/open-source rule-based machine translation system for Basque.
  • OpenLogos, a free/open-source version of the historical Logos machine translation system.
  • Anusaaraka, an English-Hindi machine translation system.

Statistical machine translation systems

Decoders

  • Moses, a statistical machine translation system.
  • Marie, an n-gram-based statistical machine translation decoder.
  • Joshua, an open-source decoder for statistical translation models based on synchronous context-free grammars.
  • Phramer, an open-source statistical phrase-based machine translation decoder.
  • GREAT, a decoder based on stochastic finite-state transducers, which includes a training toolkit.

Training translation models

  • Giza++ is a tool to train translation models for statistical machine translation (see also the related mkcls tool to train word classes).
  • Thot is a toolkit to train phrase-based models for statistical machine translation.

Language models

  • IRSTLM, a free/open-source language modelling tool that can be used with Moses instead of SRILM, which is not free.
  • RandLM, space-efficient n-gram-based language models built using randomized representations (Bloom filters, etc.).
  • Kenneth Heafield’s software for the fast filtering of ARPA format language models to multiple vocabularies.
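To give an idea of what filtering an ARPA-format language model to a vocabulary involves, here is a minimal Python sketch. It is not Kenneth Heafield’s implementation: the function name `filter_arpa` and the set of special tokens are assumptions made for illustration. Entries containing out-of-vocabulary words are simply dropped and the header counts are rewritten; no probabilities or back-off weights are recomputed.

```python
import re

# Assumption: these sentence-boundary and unknown-word tokens are always kept.
SPECIAL = {"<s>", "</s>", "<unk>"}

def filter_arpa(lines, vocab):
    """Keep only n-gram entries whose words are all in vocab or SPECIAL."""
    keep = set(vocab) | SPECIAL
    sections = {}   # n-gram order -> list of kept entry lines
    order = None    # order of the section currently being read
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.strip()
        header = re.fullmatch(r"\\(\d+)-grams:", stripped)
        if header:
            order = int(header.group(1))
            sections[order] = []
            continue
        if stripped == "\\end\\":
            break
        # Skip the \data\ header, the old counts and blank lines.
        if order is None or not stripped:
            continue
        fields = line.split()
        words = fields[1:1 + order]   # field 0 is the log-probability
        if all(w in keep for w in words):
            sections[order].append(line)
    # Rebuild the file with updated counts.
    out = ["\\data\\"]
    out += ["ngram %d=%d" % (n, len(sections[n])) for n in sorted(sections)]
    for n in sorted(sections):
        out += ["", "\\%d-grams:" % n] + sections[n]
    out += ["", "\\end\\"]
    return out
```

Calling `filter_arpa(open("model.arpa"), {"cat", "dog"})` (the file name is hypothetical) would return the lines of a smaller model that mentions only those words and the special tokens.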

Scoring

  • Kenneth Heafield’s scripts that make it easy to score machine translation output using NIST’s implementations of BLEU and NIST, as well as TER and METEOR.
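For readers unfamiliar with these metrics, the following Python sketch shows the idea behind BLEU for a single sentence and a single reference: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is a simplified illustration, not the NIST script nor any of the tools listed here; the function names are made up, tokenization is plain whitespace splitting, and there is no smoothing, so any n-gram order with zero matches gives a score of 0.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Unsmoothed, single-reference, sentence-level BLEU in [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipping: a hypothesis n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Real evaluations are normally done at corpus level with standardized tokenization, which is why established scripts are preferable to an ad hoc implementation like this one.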

Other software

  • RIA is a tool for automatic induction of transfer rules for Transfer-Based Statistical Machine Translation using dependency structures.
  • Chaski: Distributed phrase-based machine translation training tool based on Hadoop.

Example-based machine translation systems

Multi-engine machine translation / system combination

  • MANY: Open Source Machine Translation System Combination.
  • Kenneth Heafield’s multi-engine machine translation system.

Aligners and translation models

  • Giza++: training of statistical translation models.
  • Anymalign, a multilingual sub-sentential aligner.
  • Ventsislav Zhechev’s sub-tree aligner, which can be used for the automatic generation of parallel treebanks.

Web services around machine translation

  • Tradubi is an open-source Ajax-based web application for social translation built upon Apertium (may be tested online).

Distributed machine translation

  • ScaleMT (no release yet, browse at the Apertium Subversion repository) is a free/open-source framework for building scalable machine translation web services.

Other useful tools

… that may be used to build machine translation systems

  • Freeling, a free/open-source suite of language analyzers.
  • Bitextor, an automatic bitext harvester.
  • Foma, a finite-state machine toolkit and library.
  • HFST, Helsinki Finite State Technology for natural-language morphologies.
  • VISL CG-3, the constraint grammar parser at the Visual Interactive Syntax Learning project of Syddansk Universitet: browse Subversion repository, source snapshots.

 

About TermCoord

The Terminology Coordination Unit of the European Parliament in Schuman Building on Place de l'Europe, Luxembourg