Let’s Build a Transformer in TensorFlow: Part 1

Abdulkader Helwan
4 min read · Mar 1, 2024

In this two-part series, we will implement a Transformer model from scratch using TensorFlow, walking through every step together. Keep reading.

Photo by Arseny Togulev on Unsplash

What is a Transformer?

The Transformer is a type of deep learning model designed specifically for understanding and processing language. It became popular in natural language processing (NLP) due to its unique approach: unlike older methods that process information step by step, Transformers use a technique called “attention” to focus on the most relevant parts of the input. This allows them to handle various NLP tasks, such as translation and text generation, more effectively.

Inside the Transformer

The Transformer is like a machine with two parts: an encoder and a decoder. They work together to understand and create sequences of text. The encoder acts like a “reader” — it takes in the text and gets a general idea of its meaning. Then, the decoder acts like a “writer” — it uses what the encoder learned to create a new piece of text, following the same rules as the original.
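To make the “reader” and “writer” hand-off concrete, here is a minimal sketch of how the two parts connect, using Keras' built-in `MultiHeadAttention` layer as a stand-in for full encoder and decoder blocks. The sizes (`d_model = 64`, 4 heads, the sequence lengths) are arbitrary choices for illustration, not values from this article.

```python
import tensorflow as tf

d_model = 64
enc_input = tf.random.normal((1, 10, d_model))  # sequence being "read": 10 tokens
dec_input = tf.random.normal((1, 7, d_model))   # sequence being "written": 7 tokens

# The encoder builds a contextual summary of the input sequence
encoder_self_attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
memory = encoder_self_attn(query=enc_input, value=enc_input, key=enc_input)

# The decoder attends to that summary while generating its own sequence
cross_attn = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=16)
dec_output = cross_attn(query=dec_input, value=memory, key=memory)
print(dec_output.shape)  # (1, 7, 64)
```

Note that the decoder's output keeps the decoder's sequence length (7) while drawing on the encoder's full summary, which is exactly the “writer consulting the reader” picture above.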

The key to how Transformers work is something called “attention.” Imagine you’re reading a document — with attention, the Transformer can focus on specific parts of the text that are most important for the task at hand, just like you might highlight important information while studying. This helps the Transformer understand the meaning of the text better and create more accurate outputs.
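The “highlighting” intuition can be sketched as scaled dot-product attention, the core computation inside a Transformer. This is a minimal illustration with made-up tensor sizes (1 batch, 3 tokens, 4-dimensional vectors); the full model wraps this in multi-head layers.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    """Return the attention-weighted sum of values and the weights."""
    # Similarity of each query to each key: (..., seq_len_q, seq_len_k)
    scores = tf.matmul(q, k, transpose_b=True)
    # Scale by sqrt of the key dimension to keep the softmax well-behaved
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = scores / tf.math.sqrt(dk)
    # Softmax turns scores into weights that sum to 1 for each query
    weights = tf.nn.softmax(scores, axis=-1)
    # Weighted sum of values: the "highlighted" information
    return tf.matmul(weights, v), weights

# Toy example: 1 batch, 3 tokens, 4-dimensional embeddings
q = tf.random.normal((1, 3, 4))
k = tf.random.normal((1, 3, 4))
v = tf.random.normal((1, 3, 4))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (1, 3, 4)
print(w.shape)    # (1, 3, 3)
```

Each row of `w` says how strongly one token “looks at” every other token, which is the highlighting the paragraph above describes.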

Transformer architecture

Imagine the Transformer as a powerful machine for understanding and creating text. It has two main parts: an encoder and a decoder.

The Encoder:

This acts like a “reader” for the machine. It takes in the text and breaks it down into smaller pieces called “tokens” (like individual words). Then, it uses a special ability called “self-attention” to pay close attention to different parts of the text, just like highlighting important information while studying. This helps the encoder understand the relationships and connections between the words.
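The first two steps the encoder performs, splitting text into tokens and turning each token into a vector, can be sketched with Keras' `TextVectorization` and `Embedding` layers. The toy vocabulary and the embedding size of 8 are assumptions for illustration only.

```python
import tensorflow as tf

# Build a tokenizer from a tiny example corpus (assumed for illustration)
vectorizer = tf.keras.layers.TextVectorization(output_mode="int")
vectorizer.adapt(["the cat sat on the mat"])

# Tokenize a sentence: one integer id per word
token_ids = vectorizer(["the cat sat"])

# Map each token id to a dense vector the encoder can attend over
embed = tf.keras.layers.Embedding(
    input_dim=len(vectorizer.get_vocabulary()),
    output_dim=8,
)
vectors = embed(token_ids)
print(vectors.shape)  # (1, 3, 8): 1 sentence, 3 tokens, 8-dim vectors
```

These token vectors are what self-attention then compares against one another to find the relationships between words.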