Fourier with Deep Learning in Sequence Translation

As deep learning architectures are a technique to write a learning system where gradients are the only necessary requirements. The network uses the Fourier transform to replace the Self-Attention of BERT [3].

The Fourier transform is a technique to embedding an existing function by one using the sinusoidal functions as a basis…