Slides for presentation at ATA58, ST-7, “An Introduction to Artificial Intelligence, Machine Learning, and Neural Networks”

You can download the slides for my presentation here.
(© All rights reserved, though I am happy to share a version with higher resolution or give specific permission for reuse of the slides upon request.)


From spam filters to stock trading bots, the applications of artificial intelligence are already omnipresent. This poses important questions such as: Will my autonomous vacuum cleaner go on a rampage and eat the hamster? Do neural networks think like brains? What are the chances of a robot uprising? The presentation will address these questions and give an introduction to artificial intelligence, which is impacting all our lives, perhaps more than most people are aware of. However, the talk will not discuss machine translation and related topics. No knowledge of computer science or advanced mathematics is required to attend.

My Neural Machine Translation Project – Overview over Open-Source Toolkits – updated Dec 2017

Updated: December 2017

Before deciding on a toolkit, I needed to get an overview of the various open-source neural machine translation (NMT) toolkits that were available at the time of writing (September 2017). In the following, I summarize the features of these toolkits from my point of view. Note that this summary does not include open-source machine translation (MT) toolkits such as Moses, which is based on a statistical approach. I mainly summarize the impressions I got after lurking on the various support discussion forums/groups for a while.

The big kahuna – TensorFlow

Provided by: Google (TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.)
Language: Python (main API), with APIs also available for C, Java, and Go, although the latter seem to have somewhat less functionality
Architecture: Since TensorFlow is a whole framework, both recurrent and convolutional neural networks are available.
White paper: Large-Scale Machine Learning on Heterogeneous Distributed Systems, M. Abadi et al., Nov. 9, 2015
Support: Stack Overflow for technical questions; a Google group (what else?) for higher-level discussions about features etc., although some technical questions are also discussed in the Google group; and a blog announcing new features and tutorials
Summary: TensorFlow is a large-scale, general-purpose open-source machine learning toolkit, not necessarily tailored for machine translation, but it does include tutorials on vector word representations, recurrent neural networks, and sequence-to-sequence models, which are the basic building blocks for a neural machine translation system. TensorFlow also provides various other neural network architectures and a vast number of features one could play around with for language learning and translation. Definitely not a plug-and-play system for beginners.

The more user-friendly one – OpenNMT

Provided by: Harvard University and Systran
Language: Lua, based on the Torch framework for machine learning; there are two “light” versions using Python/PyTorch and C++
Update: As of December 2017, the main Lua version is accompanied by a full-fledged Python version, based on the PyTorch framework, and a version based on the TensorFlow framework.
Architecture: Recurrent neural network
White paper: OpenNMT: Open-Source Toolkit for Neural Machine Translation, G. Klein et al., Jan 10, 2017
Support: a very active discussion forum (where, among other people, Systran’s CTO is very involved)
Summary: More suited for machine learning beginners, although the choice of the programming language Lua, which is not that widely used, may be a bit of a hurdle. Update December 2017: Since there are now two other versions, based on PyTorch and TensorFlow, this should no longer be an issue. End update. On the other hand, there are lots of tutorials and step-by-step instructions. Some of the questions that are asked in the forum are indeed quite elementary (and I’m far from an expert!). Thus, if one wants to play around with inputs (that is, well-chosen corpora!) and various metrics and cost functions for the output, this is the toolkit to choose. In machine translation systems, input and output are just as critical as the architecture itself, if not more so, because for neural networks, and thus also for neural machine translation systems, the old adage “garbage in – garbage out” is particularly true. Therefore, it may make more sense for linguists and translators to approach the machine translation problem from the angle of the input (corpora) and output (translation “quality” metrics), instead of getting lost in the architecture and the code.
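To make the idea of an output metric a little more concrete, here is a minimal sketch of clipped n-gram precision, the basic ingredient of the widely used BLEU score. It is an illustration only: it assumes whitespace tokenization and a single reference translation, and the function names and example sentences are made up for this purpose and are not part of OpenNMT or any other toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as often as it occurs in
    the reference, so repeating a correct word does not inflate the score.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical example sentences, chosen only to show the clipping effect.
reference = "the cat sits on the mat"
good = "the cat sits on a mat"
bad = "the the the the the the"

print(ngram_precision(good, reference, n=1))  # 5 of 6 words match
print(ngram_precision(bad, reference, n=1))   # only 2 of 6 "the" are credited
print(ngram_precision(good, reference, n=2))  # 3 of 5 word pairs match
```

The full BLEU score combines such precisions for several values of n and adds a penalty for translations that are too short, so this sketch should be read as a starting point for experiments, not as a replacement for an established implementation.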

The newer kid on the block – Nematus

Provided by: University of Edinburgh
Website: Not really a website, but the project, including documentation and tutorials, is here on GitHub.
Language: Python, based on the Theano framework for machine learning
Architecture: Recurrent neural network
White paper: Nematus: a Toolkit for Neural Machine Translation, R. Sennrich et al., Mar 13, 2017
Support: a Google group
Summary: This is the third kid on the block, not as active as the other two above. Like OpenNMT, it is a toolkit only for language translation, as opposed to the general-purpose TensorFlow framework. It uses the better-known Python as opposed to Lua, which would be an advantage, at least for me, over OpenNMT. However, the user base does not seem quite as extensive or active as OpenNMT’s. Thus, at the time of writing, Nematus seems to be an option to keep in mind, but not necessarily the first choice.

The brand new kid on the block – Sockeye

Provided by: Amazon
Website: The main website is here. In addition, a tutorial on how to use Sockeye has been published on Amazon’s AWS (Amazon Web Services) AI blog.
Language: Python, built on the Apache MXNet framework for machine learning
Architecture: Recurrent neural network
White paper: SOCKEYE: A Toolkit for Neural Machine Translation, F. Hieber et al., Dec 15, 2017
Support: Aside from the website with documentation and a FAQ, there is the general AWS Discussion Forum.
Summary: The newest open-source NMT toolkit is geared towards advanced users who are also familiar with the AWS and MXNet setup. On the other hand, as with Google’s TensorFlow, there are many available architectures and advanced options, and therefore many more options for experimentation.

Another big one – Fairseq

Provided by: Facebook
Website: The GitHub repository is here. In addition, a tutorial on how to use Fairseq has been published on Facebook’s code blog.
Language: Lua, built on the Torch framework for machine learning
Architecture: Convolutional neural network
White paper: Convolutional Sequence to Sequence Learning, J. Gehring et al., May 8, 2017
Support: A Facebook group (what else?), and a Google group.
Summary: This is another open-source toolkit for advanced users: for one, because it is also based on Lua, a more esoteric language than Python; for another, because the intended user base seems to be advanced researchers, not curious end users. It is also fairly new.

My Neural Machine Translation Project – Summary

I recently embarked on the ambitious project of setting up my own neural machine translation engine. This post serves as the overview page for the series, with posts being added over time.

My Neural Machine Translation Project – Step 0 – Hardware Selection

This is the second installment of a series of blog posts, where I want to describe my attempts to set up my own neural machine translation engine. In the previous episode, I introduced myself and the project.

For neural network applications, the choice of hardware is just as crucial as the software and algorithms, because the training of neural networks consists basically of a large number of matrix multiplications that are best done in parallel. This is why the proliferation of dedicated graphics processing units (GPUs), spurred by the invention of LCD monitors and the popularity of virtual reality games, made the current advances in artificial neural networks possible. The idea of artificial neural networks is not new. The concept has been around since the 1940s, except nobody could really accomplish any practical tasks with neural networks until sufficiently powerful GPUs came along. GPUs are constructed specifically for large scale parallel computing and matrix multiplications, while even the fastest CPUs are wired for serial computing, not parallel computing.
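To illustrate why this kind of work parallelizes so well, here is a minimal sketch of the forward pass of a single fully connected layer, written in plain Python without any GPU library. The function name `dense_layer` and the tiny numbers are assumptions made up for this illustration; real toolkits hand the same computation to optimized GPU routines.

```python
def dense_layer(inputs, weights, biases):
    """Forward pass of one fully connected layer: inputs x weights + biases.

    inputs:  list of rows, one row per training example (batch x n_in)
    weights: n_in x n_out matrix as a list of rows
    biases:  list of n_out numbers
    """
    n_out = len(biases)
    outputs = []
    for row in inputs:              # every training example ...
        out_row = []
        for j in range(n_out):      # ... and every output neuron
            # One dot product. It does not depend on the result for any
            # other (example, neuron) pair, so all of these dot products
            # could in principle be computed at the same time.
            total = sum(x * weights[i][j] for i, x in enumerate(row))
            out_row.append(total + biases[j])
        outputs.append(out_row)
    return outputs


# Hypothetical toy values: 2 examples, 2 inputs, 3 output neurons.
batch = [[1.0, 2.0],
         [3.0, 4.0]]
weights = [[0.5, -1.0, 0.0],
           [0.25, 1.0, 2.0]]
biases = [0.0, 0.5, -1.0]

result = dense_layer(batch, weights, biases)
print(result)  # [[1.0, 1.5, 3.0], [2.5, 1.5, 7.0]]
```

The two nested loops run 2 x 3 = 6 independent dot products here. In a realistic network both dimensions run into the hundreds or thousands, and this is the work a GPU spreads across its many cores, whereas the loops above process it one step at a time.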

But I digress. This post deals with setting up the right hardware for a neural machine translation network. According to the authors of various open source NMT toolkits, a powerful GPU with at least 6 GB of dedicated, on-board GPU memory is recommended. So it was really a no-brainer when my local nerd store, Fry’s, advertised an Asus gaming PC with an Nvidia GeForce GTX 1070 graphics card with 8 GB on-board memory. The 1070 is one level down from Nvidia’s current flagship GPU, the GTX 1080 Ti, but at roughly half the price of the 1080 Ti, it is definitely the most bang for the buck at this time. Aside from Nvidia, AMD also makes good GPUs with its Radeon line, but the PC package I bought was on sale as an open-box display item, so the price couldn’t be beat. Of course, the Asus BIOS came with its own headaches, so please keep reading if you are interested in the gory technical details of the lengthy setup process that ensued.

The PC came with Windows 10 preinstalled, which is essentially useless for serious computations. All open source neural net toolkits run standard on Linux. Although I have no intention of using the “gaming PC” for gaming and thus have no immediate use for Windows, I decided to keep Windows 10 on the machine, also because it came preinstalled without an installation medium. So I repartitioned the hard drive, which is surprisingly simple in Windows 10, and installed Ubuntu in a dual boot configuration. Instructions on how to do that can be found in abundance on the web. The process was quite straightforward with a USB stick and all seemed well, until I noticed that the Nvidia card wasn’t using the proprietary Nvidia driver for Ubuntu, but another driver. This defeats the purpose of having this high-end graphics card, because the standard driver is not using all the high-end features of the card. Here is where the headaches began.
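One way to check which kernel driver a graphics card is bound to is to look at the output of `lspci -k`, which prints a "Kernel driver in use:" line for each device. The following sketch parses such output. The sample text is made up, and it assumes the open-source nouveau driver as the fallback; the post does not say which driver was actually active on this machine.

```python
def kernel_drivers_in_use(lspci_output):
    """Map each graphics device in `lspci -k` output to its bound kernel driver."""
    drivers = {}
    current_device = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device entry.
            is_gpu = "VGA compatible controller" in line or "3D controller" in line
            current_device = line.strip() if is_gpu else None
        elif current_device and "Kernel driver in use:" in line:
            drivers[current_device] = line.split(":", 1)[1].strip()
    return drivers


# Made-up sample in the general shape of `lspci -k` output (assumption:
# the card is currently handled by the open-source nouveau driver).
SAMPLE = """\
00:1f.3 Audio device: Example Audio Controller
\tKernel driver in use: snd_hda_intel
01:00.0 VGA compatible controller: Example GPU Vendor Example Card
\tSubsystem: Example Board Vendor Example Card
\tKernel driver in use: nouveau
\tKernel modules: nouveau, nvidia
"""

for device, driver in kernel_drivers_in_use(SAMPLE).items():
    print(driver, "<-", device)
```

On a real system, the text could be obtained with `subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout` and passed to the same function.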

I began by installing the latest Nvidia driver for Ubuntu with the following commands (drop the sudo if you are logged in as root):

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-384

384 is the latest version at the time of writing. However, upon reboot to activate the driver, things started to go awry. Ubuntu would not let me log in and always returned to the login screen, no matter what I tried.

The culprit with my particular setup seemed to be the so-called Secure Boot settings in the Asus UEFI BIOS, which seem as useless as sliced bread to me. Secure Boot is supposed to prevent the system from loading boot software and kernel drivers that are not signed with a key the firmware trusts, even if that software comes from the manufacturer of one of the components. In other words, in my case Asus doesn’t seem to trust Nvidia. After rebooting, I entered the UEFI BIOS by pressing F2 and accessed the Secure Boot settings. I was unable to disable Secure Boot because “Enabled” was simply greyed out, so I chose the option “Other OS” instead of “Windows,” as you can see in the screenshots below. This fixed one problem.

Asus UEFI Bios boot settings

Asus UEFI Bios boot settings detail

However, upon another reboot, I got a screen filled with a never-ending stream of error messages about a PCIe bus error, pcieport, etc., along with syslog and kern.log files that filled up my entire TB hard disk and froze the system once the disk was full. Here, a simple additional kernel option in the grub configuration solved the problem:

  • I went to the command line with Ctrl + Alt + F1.
  • I emptied the syslog and kern.log files in /var/log, which had eaten up my entire hard disk, with the following commands:

    sudo truncate -s 0 /var/log/syslog
    sudo truncate -s 0 /var/log/kern.log

  • I backed up the grub configuration and then edited it as follows:

    sudo cp /etc/default/grub /etc/default/grub.bak
    sudo -H gedit /etc/default/grub

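    In gedit, I replaced the existing GRUB_CMDLINE_LINUX_DEFAULT line (on a stock Ubuntu installation it reads GRUB_CMDLINE_LINUX_DEFAULT="quiet splash") with the following line:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=nomsi"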

    MSI is short for Message Signaled Interrupts, which are supposed to be more stable against system freezes than other interrupt signals. However, MSI support is known to be unstable on specific combinations of hardware, where it tends to freeze the system instead of preventing such freezes. My current setup of an Asus motherboard with an Intel chipset and an Nvidia GeForce GPU card on Ubuntu 16.04 seems to be an example of this.

  • I saved the edited grub configuration file and exited gedit. Then I updated grub and restarted the system:

    sudo update-grub
    sudo reboot
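The manual edit of the grub configuration described above could also be scripted. Here is a minimal sketch that appends a kernel parameter to the GRUB_CMDLINE_LINUX_DEFAULT line of a configuration text, and does nothing if the parameter is already there. The function name `add_kernel_parameter` and the sample configuration are assumptions for illustration. The sketch only transforms a string; it does not touch /etc/default/grub, and `sudo update-grub` would still have to be run after writing the result back.

```python
import re


def add_kernel_parameter(grub_text, parameter):
    """Append `parameter` to GRUB_CMDLINE_LINUX_DEFAULT unless already present."""
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
    )

    def patch(match):
        options = match.group(2).split()
        if parameter not in options:
            options.append(parameter)
        return match.group(1) + " ".join(options) + match.group(3)

    return pattern.sub(patch, grub_text)


# Hypothetical excerpt of a grub configuration file with stock values.
SAMPLE = '''\
GRUB_DEFAULT=0
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
'''

patched = add_kernel_parameter(SAMPLE, "pci=nomsi")
print(patched)
```

Because the function checks for the parameter before appending it, running it a second time leaves the text unchanged, which makes it safe to repeat after a failed attempt. Keeping the backup copy of the original file, as in the steps above, is still a good idea.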

I have run a few GPU intensive computations (not neural net related), and everything seems well. No overflowing system log files, no strange PCIe-related errors, and no log-in or boot issues. Windows 10 also seems to work fine.

Next up: the choice of toolkit. OpenNMT or Google’s TensorFlow? Decisions, decisions…

My Neural Machine Translation Project – Prologue

Lately, when I introduce myself as a translator, or more specifically, as a patent translator, people invariably ask me whether I’m worried about being replaced by neural machine translation (NMT) in the next few years. Obviously, this being Silicon Valley with its ubiquitous self-driving cars, drones, and robot security guards, I can’t just reply no, point to the latest MT translation error meme that is making the rounds on social media and be done with it. Here, a deeper (pun intended) argument is needed. In addition, the European Patent Office (EPO) announced its new Unitary Patent, which is supposed to reduce translation costs for applicants significantly by replacing many currently mandatory patent translations with machine translations. The Unitary Patent was supposed to go into effect in January 2018; however, it currently looks as if this date will be pushed back.

Nevertheless, my inbox is also beginning to fill up more and more with offers for post-editing of machine translation output (MTPE). I am not the most efficient editor when I am editing translations by human colleagues, even when the text is excellent, because I tend to get sidetracked by matters of style. Thus I am utterly unsuited for MTPE, because I simply lack the patience to deal with nonsensical machine errors. However, in light of all of the above, a plan began to form in my head: I want to set up my own machine translation engine.

While this is certainly very ambitious, it’s not impossible. I have years/decades of background in advanced mathematics (theoretical physics) and computer programming. Furthermore, there are now several open source NMT toolkits on the market, complete with various libraries and discussion forums. Obviously, I could just download one of the toolkits, train the net with various open source corpora and be done with it. But that would be too easy! And not very productive. I want to get to a point where the net is trained well enough so that I can actually use the output in my daily work. I also want to achieve an expert level where I understand how NMT actually works, perhaps to work as an NMT consultant instead of an MTPE slave when the NMT apocalypse descends on the translation world (which is not likely to happen anytime soon). In addition, I will document my experiences on this blog. Since this is a side-project, I can’t promise to blog regularly, because my progress will be highly dependent on my daily workload. I certainly won’t be “live-blogging” due to the inevitable R-rated Austrian expletives that will accompany the programming stage.

I began the journey over a year ago by taking an introductory class by Andrew Ng on Machine Learning on Coursera. Andrew Ng is not only the co-founder of Coursera and a Stanford professor, he is also an excellent teacher. The course introduced all the necessary concepts with just the right amount of math (for me as a physicist) and programming (in the numerical computing language MATLAB). I highly recommend this course as an advanced introduction for anybody who is interested in the topic. Note, however, that Andrew Ng’s excellent course does not cover machine translation. I followed this up with several courses on Robotics (on Coursera) and on Artificial Intelligence (on EdX) at the introductory Master’s level. I even built an autonomously navigating robot, nicknamed Boticelli. While I am far from an expert now, I certainly know more than the average amateur about artificial intelligence and neural nets. I will summarize what I’ve learned so far in a presentation at the 58th Annual Conference of the American Translators Association this fall.

The next steps will be to buy the necessary computer hardware and pick one open source NMT toolkit. Neural nets require dedicated hardware, that is, very high-end graphics processing units (GPUs), because the training phase of neural nets basically consists of huge numbers of matrix multiplications. Dedicated GPUs are capable of performing large numbers of computations in parallel, in contrast to CPUs, which are best used for serial computations. Thus, to set up an NMT engine, a “gaming” PC with a VR-ready high-end graphics card is necessary, because ironically, the computations for virtual reality computer games and the computations for neural nets in serious applications such as translation are quite similar.

But more on that in the next post. Stay tuned!