Revisiting Skip Connections in Transformers

Abdulkader Helwan
5 min read · Jan 4, 2024

This article is part of a series on skip connections and attention layers in Transformers. If you haven’t read the others, refer to the introductory article here. The next article in this series is here.

I recently had a chat with one of my best friends, who happens to be a great machine learning scientist working at a very big company (Spire, Luxembourg). Our friendship goes way back: we did our Master's degrees at the same university, and I have learned a lot from him. Long story short, we were talking about Transformers, and I realized that my friend believes…
