Activation functions are applied element-wise to their input, so the output has the same dimensions as the input. Their goal is to introduce non-linearity into the network: without them, a stack of layers would collapse into a single linear (affine) mapping. The non-linearity provides a function space (in terms of the trainable weights) that can approximate a large body of mapping rules from inputs to outputs; a small sketch follows the theorem below. For the mathematical details see the following references:
- Funahashi, K. I. "On the Approximate Realization of Continuous Mappings by Neural Networks", Neural Networks, Vol. 2, No. 3, pp. 183-192, 1989.
- Leshno, M., Lin, V. Y., Pinkus, A., Schocken, S. "Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function", Neural Networks, Vol. 6, pp. 861-867, 1993.
The latter contains the following theorem:
Assume the neural network contains one hidden layer whose activation function is locally bounded and piecewise continuous. Then any continuous function f (mapping real vectors to real numbers) can be approximated with arbitrary accuracy on compact sets if and only if the applied activation function is not a polynomial.
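To illustrate the element-wise behaviour and the shape preservation described above, here is a minimal NumPy sketch (the `relu` helper and the toy arrays are illustrative only, not part of this library):

```python
import numpy as np

def relu(x):
    # Element-wise ReLU: max(0, x) applied to every entry independently
    return np.maximum(0.0, x)

# The input can have any shape; the output has the same shape,
# because the activation acts on each element separately.
x = np.array([[-2.0, -0.5, 0.0],
              [ 0.5,  1.0,  3.0]])
y = relu(x)
print(x.shape == y.shape)   # True
print(y)                    # [[0.  0.  0. ]
                            #  [0.5 1.  3. ]]

# Non-linearity check: relu(a + b) is in general not relu(a) + relu(b)
a, b = np.array([-1.0]), np.array([2.0])
print(relu(a + b), relu(a) + relu(b))   # [1.] vs. [2.]
```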
The following activation functions are available:
- ELu
- HardSigmoid
- ReLu
- Sigmoid
- SoftPlus
- Softmax (supports only 1-dimensional inputs)
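As a reference for the formulas behind the names above, the sketch below implements the conventional textbook definitions in NumPy. The parameter choices are assumptions (e.g. the ELU alpha of 1.0 and the HardSigmoid slope of 0.2 with offset 0.5); the library's own implementations may differ in such details.

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) otherwise (alpha = 1.0 assumed)
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid, clipped to [0, 1]
    # (slope 0.2 and offset 0.5 assumed; other libraries use other constants)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def relu(x):
    # max(0, x), element-wise
    return np.maximum(0.0, x)

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # Smooth approximation of ReLU: log(1 + exp(x))
    return np.log1p(np.exp(x))

def softmax(x):
    # Defined on a 1-dimensional vector: normalizes the exponentials so the
    # outputs sum to 1, which is why only 1-dimensional inputs are supported.
    x = np.asarray(x, dtype=float)
    z = np.exp(x - np.max(x))   # subtract the maximum for numerical stability
    return z / np.sum(z)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (elu, hard_sigmoid, relu, sigmoid, softplus):
    print(fn.__name__, fn(x))
print("softmax", softmax(x), softmax(x).sum())  # the softmax outputs sum to 1.0
```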