Motivation
To build a deep learning compiler with backward-pass optimization, it is important to have a clear understanding of the gradients of the various neural network operators, such as matmul, convolution, and many others.
In this small Python library, called autograd, I have implemented the gradients for several basic neural network operators.
I turned it into a tutorial and added further explanation of how the gradients are derived for matmul, convolution (x, w, b), and GELU.
The main goal of the formulas in the explanation part is to provide a concise yet accurate derivation, rather than merely building intuition.