
My Neural Machine Translation Project – Overview of Open-Source Toolkits – updated Dec 2017

Updated: December 2017

Before deciding on a toolkit, I needed an overview of the various open-source neural machine translation (NMT) toolkits available at the time of writing (September 2017). Below, I summarize the features of each toolkit from my point of view. Note that this summary does not include statistical machine translation toolkits such as Moses. The summary is mainly based on the impressions I gathered from lurking on the toolkits' support forums and discussion groups for a while.

The big kahuna – TensorFlow

Provided by: Google (TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.)
Language: Python (main API), with APIs also available for C, Java, and Go, although these seem to offer somewhat less functionality
Architecture: Since TensorFlow is a general-purpose framework, both recurrent and convolutional neural networks are available.
White paper: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, M. Abadi et al., Nov 9, 2015
Support: Stack Overflow for technical questions; a Google group (what else?) for higher-level discussions about features and the like, although some technical questions end up there as well; and a blog announcing new features and tutorials
Summary: TensorFlow is a large-scale, general-purpose open-source machine learning toolkit that is not specifically tailored to machine translation. It does, however, include tutorials on vector word representations, recurrent neural networks, and sequence-to-sequence models, which are the basic building blocks of a neural machine translation system. TensorFlow also provides various other neural network architectures and a vast number of features to experiment with for language processing and translation. Definitely not a plug-and-play system for beginners.

The more user-friendly one – OpenNMT

Provided by: Harvard University and Systran
Language: Lua, based on the Torch framework for machine learning; two “light” versions, using Python/PyTorch and C++, also exist
Update: As of December 2017, the main Lua version is accompanied by a full-fledged Python version, based on the PyTorch framework, and by a version based on the TensorFlow framework.
Architecture: Recurrent neural network
White paper: OpenNMT: Open-Source Toolkit for Neural Machine Translation, G. Klein et al., Jan 10, 2017
Support: a very active discussion forum (in which, among others, Systran’s CTO is closely involved)
Summary: More suited for machine learning beginners, although the choice of Lua, a programming language that is not that widely used, may be a bit of a hurdle. Update December 2017: Since there are now two other versions, based on Python and TensorFlow, this should no longer be an issue. End update. On the other hand, there are lots of tutorials and step-by-step instructions, and some of the questions asked in the forum are indeed quite elementary (and I’m far from an expert!). Thus, if one wants to play around with inputs (that is, well-chosen corpora!) and with various metrics and cost functions for the output, this is the toolkit to choose. In machine translation systems, input and output are just as critical as the architecture itself, if not more so, because for neural networks, and thus also for neural machine translation systems, the old adage “garbage in – garbage out” holds particularly true. Therefore, it may make more sense for linguists and translators to approach the machine translation problem from the angle of the input (corpora) and the output (translation “quality” metrics) instead of getting lost in the architecture and the code.
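To make the “garbage in – garbage out” point a little more concrete on the input side, here is a minimal Python sketch of the kind of parallel-corpus cleaning one might do before training with any of these toolkits. It is not taken from OpenNMT or any other toolkit: the function name `filter_parallel_corpus` is hypothetical, and the thresholds (at most 80 whitespace tokens per sentence, a source/target length ratio of at most 3) are illustrative assumptions rather than recommended values.

```python
from typing import Iterable, List, Tuple


def filter_parallel_corpus(
    pairs: Iterable[Tuple[str, str]],
    max_tokens: int = 80,    # assumed cap on sentence length, in whitespace tokens
    max_ratio: float = 3.0,  # assumed cap on the longer/shorter length ratio
) -> List[Tuple[str, str]]:
    """Hypothetical helper: drop sentence pairs that are likely noise."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # An empty side gives the model nothing to learn from.
        if src_len == 0 or tgt_len == 0:
            continue
        # Very long "sentences" are often misaligned paragraphs.
        if src_len > max_tokens or tgt_len > max_tokens:
            continue
        # A large length mismatch suggests the two sides are not translations.
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue
        # Exact duplicates add no new information.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


corpus = [
    ("The house is small.", "Das Haus ist klein."),
    ("The house is small.", "Das Haus ist klein."),  # duplicate
    ("Hello", ""),                                   # empty target
    ("Yes", "Ja, das ist eine sehr lange und unpassende Antwort."),  # 1 vs 9 tokens
]
print(filter_parallel_corpus(corpus))
# → [('The house is small.', 'Das Haus ist klein.')]
```

Length and length-ratio checks are cheap and catch many misaligned pairs, but the right thresholds depend on the language pair, and real pipelines typically add proper tokenization and language identification on top.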

The newer kid on the block – Nematus

Provided by: University of Edinburgh
Website: Not really a website, but the project, including documentation and tutorials, is here on GitHub.
Language: Python, based on the Theano framework for machine learning
Architecture: Recurrent neural network
White paper: Nematus: a Toolkit for Neural Machine Translation, R. Sennrich et al., Mar 13, 2017
Support: a Google group
Summary: This is the third kid on the block, not as active as the other two above. Like OpenNMT, it is a toolkit dedicated solely to machine translation, as opposed to the general-purpose TensorFlow framework. It uses the more widely known Python rather than Lua, which, at least for me, would be an advantage over OpenNMT. However, its user base does not seem quite as large or active as OpenNMT’s. Thus, at the time of writing, Nematus seems to be an option to keep in mind, but not necessarily the first choice.

The brand new kid on the block – Sockeye

Provided by: Amazon
Website: The main website is here. In addition, a tutorial on how to use Sockeye has been published on Amazon’s AWS (Amazon Web Services) AI blog.
Language: Python, built on the Apache MXNet framework for machine learning
Architecture: Recurrent neural network
White paper: SOCKEYE: A Toolkit for Neural Machine Translation, F. Hieber et al., Dec 15, 2017
Support: Aside from the website with documentation and a FAQ, there is the general AWS Discussion Forum.
Summary: The newest open-source NMT toolkit is geared towards advanced users who are already familiar with AWS and the MXNet setup. On the other hand, as with Google’s TensorFlow, there are many available architectures and advanced options, and therefore much more room for experimentation.

Another big one – Fairseq

Provided by: Facebook
Website: The GitHub repository is here. In addition, a tutorial on how to use Fairseq has been published on Facebook’s code blog.
Language: Lua, built on the Torch framework for machine learning
Architecture: Convolutional neural network
White paper: Convolutional Sequence to Sequence Learning, J. Gehring et al., May 8, 2017
Support: A Facebook group (what else?) and a Google group.
Summary: This is another open-source toolkit for advanced users: for one, because it is also based on Lua, a more esoteric language than Python; for another, because the intended user base seems to be advanced researchers rather than curious end users. It is also fairly new.