A Conversation with ChatGPT

With all the hype about ChatGPT, I decided to try it out myself, asking it to explain neural machine translation as well as a more exotic concept from my particle physics research past. Feel free to compare its explanation of NMT to my earlier blog posts.

The upshot of this “test”: Caveat emptor! If you ask ChatGPT to write an article (or even your homework) for you, you should carefully check the content, or better yet, write it yourself. Because it turns out that ChatGPT is very good at inventing references that don’t exist.

Me: Explain how neural machine translation works

ChatGPT’s reply:

Neural machine translation (NMT) is a method of machine translation that uses deep learning techniques to improve the accuracy of the translation. It works by training a neural network on a large dataset of human-translated text pairs and using the learned features to translate new text.

The NMT system consists of two main components: an encoder and a decoder. The encoder processes the input text and converts it into a compact internal representation, known as a latent representation, that captures the meaning and context of the input. The decoder then processes this latent representation and generates the translated text.

During training, the NMT system is fed a large dataset of human-translated text pairs and learns to predict the correct translation given an input text. The system adjusts the weights and biases of the encoder and decoder neural networks through a process called backpropagation, which helps the system learn to produce more accurate translations.

Once the NMT system is trained, it can be used to translate new text by encoding the input text, generating a latent representation, and then decoding the latent representation to produce the translated output.
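Before moving on: the encoder–decoder pipeline ChatGPT describes can be made concrete with a toy sketch. Everything below (the tiny vocabularies, the random word vectors, the averaging "encoder", the similarity-ranking "decoder") is made up for illustration; a real NMT system learns all of these components, and its decoder generates word by word rather than ranking candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny made-up vocabularies and random word vectors (an untrained model).
src_vocab = {"das": 0, "haus": 1, "ist": 2, "klein": 3}
tgt_vocab = ["the", "house", "is", "small"]
dim = 8
src_embed = rng.normal(size=(len(src_vocab), dim))  # encoder word vectors
tgt_embed = rng.normal(size=(len(tgt_vocab), dim))  # decoder word vectors

def encode(sentence):
    """Encoder: squash the input into one fixed-size latent vector.
    (Here just the average of the word vectors; a real encoder is a
    learned recurrent or transformer network.)"""
    ids = [src_vocab[w] for w in sentence.split()]
    return src_embed[ids].mean(axis=0)

def decode(latent, n_words):
    """Decoder: turn the latent vector back into target words.
    (Here we simply rank target words by similarity to the latent;
    a real decoder generates one word at a time, feeding each output
    back in as context for the next.)"""
    scores = tgt_embed @ latent              # similarity to each target word
    best = np.argsort(scores)[::-1][:n_words]
    return [tgt_vocab[i] for i in best]

latent = encode("das haus ist klein")        # the latent representation
print(latent.shape, decode(latent, 4))
```

Since nothing here is trained, the "translation" comes out as nonsense; the point is only the shape of the pipeline: variable-length input in, fixed-size latent vector, output words out.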

Me: Can you explain the learning process in NMT?

ChatGPT’s reply:

In NMT, the learning process involves training the encoder and decoder neural networks to accurately translate text from one language to another. This is typically done using a large dataset of human-translated text pairs, where each pair consists of an input text in one language and its corresponding translation in the other language.

During training, the NMT system is fed the input text from each pair and attempts to produce the corresponding translation. The system then compares the predicted translation to the ground truth translation and calculates a loss value that represents how far off the prediction was.

The loss value is then used to update the weights and biases of the encoder and decoder neural networks through a process called backpropagation. This involves propagating the loss value back through the network and using an optimization algorithm, such as stochastic gradient descent, to adjust the weights and biases in a way that reduces the loss.

This process is repeated for each text pair in the training dataset, and the NMT system continues to learn and improve its translation accuracy over time. Once the system has been trained, it can be used to translate new text by encoding the input text and decoding the latent representation to produce the translated output.
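The loop ChatGPT just described (predict, compare to the ground truth, compute a loss, nudge the weights downhill) can be sketched in a few lines. The single linear layer below is a stand-in for the encoder and decoder networks, and the "texts" are random vectors, but the mechanics of gradient descent are the same.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))       # 50 "input texts" as 4-dim vectors
true_W = rng.normal(size=(4, 3))
Y = X @ true_W                     # their "ground-truth translations"

W = np.zeros((4, 3))               # the model's trainable weights
lr = 0.05                          # learning rate

losses = []
for step in range(300):
    pred = X @ W                   # forward pass: predict a translation
    err = pred - Y
    loss = (err ** 2).mean()       # how far off the prediction was
    grad = 2 * X.T @ err / err.size  # gradient of the loss w.r.t. W
    W -= lr * grad                 # adjust weights to reduce the loss
    losses.append(loss)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

The loss shrinks steadily toward zero as the weights converge; in a real NMT system the same update is propagated backward through many layers, which is where the name "backpropagation" comes from.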

Comment: This explanation is not bad, but if somebody is unfamiliar with the terminology, they won’t understand it. So I asked ChatGPT to use simple terms instead of jargon. Amazingly, it complied.

Me: Can you explain the learning process in NMT in very simple terms without jargon?

ChatGPT’s reply:

Sure, I can explain it in simple terms.

Neural machine translation (NMT) is a way for a computer to automatically translate text from one language to another. To do this, it uses a neural network, which is a type of computer program that is inspired by the way the brain works.

To teach the neural network how to translate, we give it many examples of translations that have been done by humans. The neural network looks at these examples and tries to find patterns that will help it translate new text.

When the neural network tries to translate new text, it makes a guess at what the translation should be. Then, it compares its guess to the correct translation and sees how close it was. Based on this comparison, it adjusts the way it makes its guesses so that it will be more accurate next time. This process is called “learning.”

The neural network continues to learn and improve its translations as it gets more examples to work with. Eventually, it should be able to produce translations that are very similar to those done by humans.

Me: Can you explain neural networks in more detail in simple terms?

ChatGPT’s reply:

A neural network is a type of computer program that is designed to recognize patterns. It is called a “neural” network because it is inspired by the way the brain works.

A neural network is made up of layers of interconnected “neurons,” which are simple calculations that process input data and pass it through the network. Each layer of neurons takes in input data, processes it, and passes it on to the next layer. The final layer produces the output of the neural network, which could be a prediction, a classification, or something else.

To train a neural network, we give it a large amount of data and tell it what the correct output should be for each piece of data. The neural network then uses this data to adjust the calculations that the neurons perform, so that the network can produce the correct output for new data.

For example, if we wanted to train a neural network to recognize cats in pictures, we would give it many pictures of cats and tell it that each picture contains a cat. The neural network would then adjust the calculations that the neurons perform, so that when it is shown a new picture, it can correctly predict whether or not there is a cat in the picture.
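The layered structure ChatGPT describes can be written down in a dozen lines: each layer is a weighted sum of its inputs followed by a simple nonlinearity, and layers are chained together. The weights below are random, i.e. this is an untrained network; training (as in the cat-picture example) would adjust them.

```python
import numpy as np

rng = np.random.default_rng(2)

def layer(x, W, b):
    """One layer of 'neurons': weighted sum of the inputs,
    then a simple nonlinearity (here a ReLU)."""
    return np.maximum(0.0, x @ W + b)

x = rng.normal(size=(1, 16))                     # input, e.g. 16 pixel values
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)   # first (hidden) layer
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # output layer

hidden = layer(x, W1, b1)          # first layer processes the input...
score = hidden @ W2 + b2           # ...and passes it to the next layer
prob = 1 / (1 + np.exp(-score))    # squashed into a "cat probability"
print(hidden.shape, round(prob[0, 0], 3))
```

With random weights the "cat probability" is meaningless, of course; the point is the data flow from layer to layer that ChatGPT's answer describes.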

Comment: Not bad at all, although I suspect ChatGPT has been trained on many texts explaining neural networks and neural machine translation, since nearly everybody interested in these topics has blogged or written about them at some point, myself included.

So I decided to ask it a question that is not so widely discussed, although fairly well known among experts in particle physics, my previous career. I asked ChatGPT about a term that I coined myself back in 2003 to characterize the flow of particles that emerges from a collision of sub-atomic particles. When you smash together sub-atomic particles at high energy, the particles and the collision energy are converted into new particles that fly out of the collision region. The paths of these new particles can give you information about the original particles that collided. Back in 2002-2003, my colleagues and I invented a new measure to characterize this flow, and I called this new measure “angularity.” The term caught on in the community, apparently well enough for ChatGPT to recognize it.

Me: What is angularity in high-energy physics?

ChatGPT’s reply:

In high-energy physics, angularity is a term used to describe the distribution of particles or energy over a range of angles. In experiments involving the collision of particles at high energies, the angular distribution of the resulting particles can provide important information about the physical processes taking place. For example, the angular distribution of particles produced in a collision may be used to study the properties of new particles or to test the predictions of theoretical models. Angularity is often used in conjunction with other types of measurements, such as the energy and momentum of the particles, to provide a more complete understanding of the physical processes involved.

Comment: I then asked it to cite its sources. This is the result:

[Screenshot: ChatGPT’s reply citing its “sources” on angularities]

It turns out (as I should know) that the paper does not exist. Neither does at least one of the authors, at least not in the relevant field of physics. Here is the relevant excerpt from the journal’s table of contents:
[Screenshot: excerpt from the Physical Review D table of contents]

For the record, “angularities” were first introduced by C. F. Berger (yours truly), T. Kucs, and G. Sterman in Event Shape/Energy Flow Correlations, Physical Review D 68, 014012 (2003). I believe the name was coined first in Scaling rule for nonperturbative radiation in a class of event shapes, JHEP 09, 058 (2003) by C. F. Berger (again, yours truly) and G. Sterman.
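For the curious, the measure itself can be written down compactly. Schematically (suppressing some technical fine print from the paper), the angularity of an event with total energy $Q$ is

```latex
% Angularity event shape, parametrized by a real number a < 2.
% The sum runs over the final-state particles, with energies E_i and
% angles \theta_i measured with respect to the thrust axis of the event.
\tau_a \;=\; \frac{1}{Q} \sum_{i \,\in\, \text{event}}
    E_i \, (\sin\theta_i)^{a} \, \bigl(1 - |\cos\theta_i|\bigr)^{1-a}
```

Different choices of the parameter $a$ interpolate between familiar event shapes, which is what made the family useful in the first place.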


While in my opinion a well-thought-out, researched, and illustrated article by a good writer is still better than ChatGPT’s output, the engine is already on par with the superficial fluff pieces and non-expert articles that are churned out by the dozen. As shown above, it has also clearly been trained on some not-so-standard texts in a fairly specialized field such as high-energy particle physics, although its answer was much less extensive than for my more mainstream queries on NMT. And when you press it for details, it gets wildly creative with its “sources” instead of admitting that it does not know. Caveat emptor — buyer beware!

ChatGPT is built upon a large, pretrained neural model (also called a large language model, LLM) named GPT-3.5, developed by OpenAI. GPT-3.5 is a somewhat improved version of GPT-3. According to Wikipedia, GPT-3 contains 175 billion (!!) parameters and was trained on approximately 500 billion tokens (for the layperson, a token is roughly equivalent to a word).

Is neural machine translation sexist?

So, now that I have your attention with that title, let’s delve into the real, less anthropomorphizing question: Is neural machine translation (NMT) biased? I was inspired to write this blog post after watching the documentary Coded Bias (available on Netflix) and a follow-up panel discussion entitled “Is AI racist?” (AI = artificial intelligence). Obviously, I borrowed the title for this blog post from that panel discussion. I highly recommend watching the movie, if you haven’t already.

As a white Western European female, I don’t have first-hand experience with racism and am therefore not really qualified to write a blog post about NMT and racist bias. However, as a female with two advanced STEM degrees (Science, Technology, Engineering and Mathematics), I do know a thing or two about gender bias. I was once told at the beginning of a physics lecture at university that

a woman’s place is in the kitchen.

This is, in fact, a verbatim quote, but I won’t mention names or other details to protect the guilty. Given that I am the world’s worst cook, I did not heed that “advice.” (How many other people do you know who have managed to explode an oven while trying to bake a cake?)

What is neural machine translation? How does NMT work? Will it replace human translators? Do machines think? Is the robot apocalypse near?

If you are in any way connected with the world of translation and interpretation, you have certainly asked yourself at least one of the above questions about neural machine translation (NMT). These questions are by no means easy to answer. If you ask n experts, you’ll likely get n+1 different answers. Let me quote a few experts:

Handouts for ATA59 – An Introduction to Neural Machine Translation

I will be giving another presentation at the upcoming ATA Annual Conference in New Orleans, ATA59, jointly in the SciTech and Language Technology tracks. The presentation will give an introduction to neural machine translation. My talk is preliminarily scheduled for the very last time slot on Saturday before the final keynote. I hope to see you there, despite the late hour!


“The end of the human translator,” “nearly indistinguishable from human translation” – these and similar headlines have been used to describe neural machine translation (NMT). Most language specialists have probably asked themselves: How much of that is hype? How far can this approach to machine translation really go? How does it work? The presentation will examine one of the available open source NMT toolkits as an illustrative example to explain the underlying concepts of NMT and sequence-to-sequence models. It will follow in the same spirit as last year’s general introduction to neural networks, which is summarized in the accompanying handouts.


I have just uploaded the handout for the presentation onto the ATA server. The material is a slightly updated version of my blog post on neural networks, which summarizes my presentation at ATA58. You can download the handout here.