Augmenting images training data to add variability doesn't always end well since the augmentation changes the semantic content of the image. Here's how to solve it.
To overcome Transformers' squared complexity (w.r.t input length), the Perceiver article here offers a novel method to learn the QKV matrices. Check it out!
This very relevant article suggests an unusual method to improve neural network regularization abilities. I believe that this method has a great potential to enter the standard neural network training toolkit. I was also impressed by the detailed comparison to other methods.