With all the hype about ChatGPT, I decided to try it out myself, asking it to explain neural machine translation as well as a more exotic concept from my past research in particle physics. Feel free to compare its explanation of NMT to my earlier blog posts.
The upshot of this “test”: Caveat emptor! If you ask ChatGPT to write an article (or even your homework) for you, you should carefully check the content, or better yet, write it yourself. Because it turns out that ChatGPT is very good at inventing references that don’t exist.
Me: Explain how neural machine translation works
Neural machine translation (NMT) is a method of machine translation that uses deep learning techniques to improve the accuracy of the translation. It works by training a neural network on a large dataset of human-translated text pairs and using the learned features to translate new text.
The NMT system consists of two main components: an encoder and a decoder. The encoder processes the input text and converts it into a compact internal representation, known as a latent representation, that captures the meaning and context of the input. The decoder then processes this latent representation and generates the translated text.
During training, the NMT system is fed a large dataset of human-translated text pairs and learns to predict the correct translation given an input text. The system adjusts the weights and biases of the encoder and decoder neural networks through a process called backpropagation, which helps the system learn to produce more accurate translations.
Once the NMT system is trained, it can be used to translate new text by encoding the input text, generating a latent representation, and then decoding the latent representation to produce the translated output.
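The encode-then-decode pipeline ChatGPT describes can be sketched in a few lines of Python. This is only a toy illustration of the two-component structure: the "encoder" and "decoder" here are hand-written lookup tables standing in for what a real NMT system would learn with neural networks.

```python
# Toy sketch of the encoder-decoder pipeline (illustration only; a real
# NMT system learns these mappings with neural networks).

# Tiny hand-made tables standing in for what the networks would learn.
LATENT_TABLE = {"hello world": [0.9, 0.1], "good morning": [0.2, 0.8]}
DECODER_TABLE = {(0.9, 0.1): "hallo welt", (0.2, 0.8): "guten morgen"}

def encode(source_text):
    """Map the input text to a compact latent representation."""
    return LATENT_TABLE[source_text]

def decode(latent):
    """Generate target-language text from the latent representation."""
    return DECODER_TABLE[tuple(latent)]

def translate(source_text):
    latent = encode(source_text)   # encoder step
    return decode(latent)          # decoder step

print(translate("hello world"))  # -> hallo welt
```

The point of the sketch is only the division of labor: the encoder compresses the input into an internal representation, and the decoder turns that representation back into text in the target language.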
Me: Can you explain the learning process in NMT?
In NMT, the learning process involves training the encoder and decoder neural networks to accurately translate text from one language to another. This is typically done using a large dataset of human-translated text pairs, where each pair consists of an input text in one language and its corresponding translation in the other language.
During training, the NMT system is fed the input text from each pair and attempts to produce the corresponding translation. The system then compares the predicted translation to the ground truth translation and calculates a loss value that represents how far off the prediction was.
The loss value is then used to update the weights and biases of the encoder and decoder neural networks through a process called backpropagation. This involves propagating the loss value back through the network and using an optimization algorithm, such as stochastic gradient descent, to adjust the weights and biases in a way that reduces the loss.
This process is repeated for each text pair in the training dataset, and the NMT system continues to learn and improve its translation accuracy over time. Once the system has been trained, it can be used to translate new text by encoding the input text and decoding the latent representation to produce the translated output.
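The predict-compare-adjust loop described above can be shown on a deliberately tiny model: a single weight adjusted by gradient descent to reduce a squared loss. Real NMT systems train millions of weights via backpropagation, but the principle (predict, measure the loss, nudge the weights downhill) is the same.

```python
# Minimal sketch of the training loop: one weight w is adjusted by
# stochastic gradient descent to reduce a squared loss on toy "pairs".

pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # stand-in for text pairs
w = 0.0    # model weight (starts untrained)
lr = 0.05  # learning rate

for epoch in range(200):
    for x, y_true in pairs:
        y_pred = w * x                    # the model's prediction
        loss = (y_pred - y_true) ** 2     # how far off the prediction was
        grad = 2 * (y_pred - y_true) * x  # gradient of the loss w.r.t. w
        w -= lr * grad                    # adjust the weight to reduce the loss

print(round(w, 3))  # -> 2.0 (the data follows y = 2x)
```

Each pass through the data repeats the same cycle the text describes: compute a prediction, compare it to the ground truth, and use the resulting loss to update the weights.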
Comment: This explanation is not bad, but if somebody is unfamiliar with the terminology, they won’t understand it. So I asked ChatGPT to use simple terms instead of jargon. Amazingly, it complied.
Me: Can you explain the learning process in NMT in very simple terms without jargon?
Sure, I can explain it in simple terms.
Neural machine translation (NMT) is a way for a computer to automatically translate text from one language to another. To do this, it uses a neural network, which is a type of computer program that is inspired by the way the brain works.
To teach the neural network how to translate, we give it many examples of translations that have been done by humans. The neural network looks at these examples and tries to find patterns that will help it translate new text.
When the neural network tries to translate new text, it makes a guess at what the translation should be. Then, it compares its guess to the correct translation and sees how close it was. Based on this comparison, it adjusts the way it makes its guesses so that it will be more accurate next time. This process is called “learning.”
The neural network continues to learn and improve its translations as it gets more examples to work with. Eventually, it should be able to produce translations that are very similar to those done by humans.
Me: Can you explain neural networks in more detail in simple terms?
A neural network is a type of computer program that is designed to recognize patterns. It is called a “neural” network because it is inspired by the way the brain works.
A neural network is made up of layers of interconnected “neurons,” which are simple calculations that process input data and pass it through the network. Each layer of neurons takes in input data, processes it, and passes it on to the next layer. The final layer produces the output of the neural network, which could be a prediction, a classification, or something else.
To train a neural network, we give it a large amount of data and tell it what the correct output should be for each piece of data. The neural network then uses this data to adjust the calculations that the neurons perform, so that the network can produce the correct output for new data.
For example, if we wanted to train a neural network to recognize cats in pictures, we would give it many pictures of cats and tell it that each picture contains a cat. The neural network would then adjust the calculations that the neurons perform, so that when it is shown a new picture, it can correctly predict whether or not there is a cat in the picture.
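The "layers of interconnected neurons" picture can also be made concrete. The sketch below runs a forward pass through a two-layer network in plain Python; the weights are made up for illustration, not learned, and the final score would correspond to something like a "cat vs. not cat" prediction.

```python
import math

# Toy forward pass through a two-layer network: each neuron computes a
# weighted sum of its inputs plus a bias, then applies a nonlinearity.

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, squashed by a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # output lies in (0, 1)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

x = [0.5, -1.0]  # input data (e.g. two pixel features)
hidden = layer(x, [[1.0, -0.5], [0.3, 0.8]], [0.0, 0.1])  # hidden layer
output = layer(hidden, [[2.0, -1.0]], [-0.5])             # output layer
print(output)  # a single score in (0, 1)
```

Training, as explained above, would consist of adjusting those weight and bias numbers until the output score is right for the examples in the dataset.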
Comment: Not bad at all, although I suspect ChatGPT has been trained on many explanations of neural networks and neural machine translation, since nearly everybody interested in these topics has blogged or written about them at some point, myself included.

So I decided to ask it a question that is not so widely discussed, although fairly well known to experts in particle physics, my previous career. I asked ChatGPT about a term I coined back in 2003 to characterize the flow of particles that emerges from a collision of subatomic particles. When you smash subatomic particles together at high energy, the particles and the collision energy are converted into new particles that fly out of the collision region, and the paths of these new particles carry information about the original collision. Back in 2002-2003, my colleagues and I invented a new measure to characterize this flow, and I called it "angularity." The term caught on in the community, apparently well enough for ChatGPT to recognize it.
Me: What is angularity in high-energy physics?
In high-energy physics, angularity is a term used to describe the distribution of particles or energy over a range of angles. In experiments involving the collision of particles at high energies, the angular distribution of the resulting particles can provide important information about the physical processes taking place. For example, the angular distribution of particles produced in a collision may be used to study the properties of new particles or to test the predictions of theoretical models. Angularity is often used in conjunction with other types of measurements, such as the energy and momentum of the particles, to provide a more complete understanding of the physical processes involved.
Comment: I then asked it to cite its sources. This is the result:
It turns out (as I should know) that the paper does not exist. Neither does at least one of the authors, at any rate not in the relevant field of physics. Here is the relevant excerpt from the journal's table of contents:
For the record, “angularities” were first introduced by C. F. Berger (yours truly), T. Kucs, and G. Sterman in Event Shape/Energy Flow Correlations, Physical Review D 68, 014012 (2003). I believe the name was coined first in Scaling rule for nonperturbative radiation in a class of event shapes, JHEP 09, 058 (2003) by C. F. Berger (again, yours truly) and G. Sterman.
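For the curious, the definition from that first paper, to the best of my recollection: for massless final-state particles i in an event with total energy Q, measured with respect to the thrust axis,

```latex
\tau_a \;=\; \frac{1}{Q}\sum_{i} E_i \,\bigl(\sin\theta_i\bigr)^{a}\,\bigl(1-\lvert\cos\theta_i\rvert\bigr)^{1-a},
\qquad a < 2,
```

where θ_i is the angle of particle i to the thrust axis. The parameter a interpolates between familiar event shapes: a = 0 recovers 1 - T (thrust), and a = 1 gives a broadening-like shape.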
While in my opinion a well-thought-out, well-researched, and well-illustrated article by a good writer is still better than ChatGPT's output, the engine is already on par with the superficial fluff pieces and articles that non-experts churn out by the dozen. As shown above, it has also clearly been trained on some not-so-standard texts in a fairly specialized field such as high-energy particle physics, although its answers there are much less extensive than for my more mainstream queries on NMT. And when you press it for details, it gets wildly creative with its "sources" instead of admitting that it does not know. Caveat emptor: buyer beware!
ChatGPT is built upon a large pretrained neural model (also called a large language model, or LLM) named GPT-3.5, developed by OpenAI. GPT-3.5 is a somewhat improved version of GPT-3. According to Wikipedia, GPT-3 contains 175 billion (!!) parameters and was trained on approximately 500 billion tokens (for the layperson, a token is roughly equivalent to a word).