More powerful deep learning with transformers (Ep. 84)

27/10/2019 37 min Episode 80

Episode Synopsis

Some of the most powerful NLP models like BERT and GPT-2 have one thing in common: they all use the transformer architecture.
This architecture builds on another important concept already familiar to the community: self-attention.
In this episode I explain what these mechanisms are, how they work and why they are so powerful.
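As a rough illustration of the self-attention idea mentioned above, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not taken from the episode. It follows the general formulation in the "Attention is all you need" paper listed in the references, simplified to a single head with no masking. The function names (`self_attention`, `matmul`, `softmax`) and the toy inputs are hypothetical, and identity matrices stand in for the projection weights that a real model would learn.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    # Project every token into query, key and value vectors.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Compare every query with every key, scale by sqrt(d_k), then normalize each row.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output token is a weighted average of all value vectors.
    return matmul(weights, v), weights

# Toy input: three tokens with two features each (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumption: identity projections stand in for learned weight matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` sums to 1 and says how much one token attends to every token in the sequence, itself included. A full transformer runs several such heads in parallel and adds positional information, residual connections and feed-forward layers on top.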
Don't forget to subscribe to our newsletter or join the discussion on our Discord server.
 
References
Attention is all you need
https://arxiv.org/abs/1706.03762
The illustrated transformer
https://jalammar.github.io/illustrated-transformer
Self-attention for generative models
http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf