Self-supervised learning keynote talk by Yann LeCun; Perceiver, a computationally efficient Transformer architecture; data visualization, also for summaries; and more news in this monthly update.
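To overcome the Transformer's quadratic complexity with respect to input length, the Perceiver article here proposes a novel attention scheme in which the queries come from a small learned latent array rather than from the input itself, so the attention cost grows only linearly with input length. Check it out!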
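To make the complexity argument concrete, here is a minimal sketch of the latent cross-attention idea, assuming a single attention head, plain NumPy, and random matrices standing in for learned weights. The function and variable names (`latent_cross_attention`, `w_q`, `w_k`, `w_v`) are illustrative and not taken from the paper's code. With M input elements and N latent vectors, the score matrix has shape (N, M) instead of (M, M).

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def latent_cross_attention(inputs, latents, w_q, w_k, w_v):
    """Single-head cross-attention: queries from latents, keys/values from inputs.

    inputs:  (M, c) array, M can be large
    latents: (N, d) array, N is small and fixed
    Returns the updated latents (N, d_v) and the attention weights (N, M).
    """
    q = latents @ w_q                          # (N, d_k)
    k = inputs @ w_k                           # (M, d_k)
    v = inputs @ w_v                           # (M, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M), not (M, M)
    weights = softmax(scores, axis=-1)         # each latent attends over all inputs
    return weights @ v, weights


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
M, N, c, d = 1024, 32, 16, 8

inputs = rng.normal(size=(M, c))
latents = rng.normal(size=(N, d))
w_q = rng.normal(size=(d, d))   # random stand-ins for learned projections
w_k = rng.normal(size=(c, d))
w_v = rng.normal(size=(c, d))

out, weights = latent_cross_attention(inputs, latents, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Doubling M here doubles the number of columns in `scores`, whereas full self-attention over the inputs would quadruple the size of its score matrix.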
This very relevant article suggests an unusual method for improving neural network regularization. I believe this method has great potential to become part of the standard neural network training toolkit. I was also impressed by its detailed comparison with other methods.