Publications
NeuroAI
-
Continual learning using dendritic modulations on view-invariant
feedforward weights
Figure: Training the whole network to solve a classification task leads to neural collapse, hindering continual learning (left). Instead, we suggest a separate self-supervised training objective to learn view-invariant features (right), upon which task-specific modulations might induce linearly separable representations.

The brain can learn continuously without forgetting past skills, unlike traditional machine learning models, which struggle with continual learning. We provide a 'fast and slow learning' paradigm towards solving this problem.
First, we observe that standard supervised learning of neural networks leads to a configuration where the network becomes invariant to task-irrelevant features, which also makes it invariant to features relevant to previous tasks ('neural collapse'). Instead, in a slow learning phase, we suggest first learning feedforward weights that extract general features via a view-invariance learning objective: the training signal is provided by smoothly moving visual stimuli, whose continuity suggests object identity. The machine learning equivalent is contrastive self-supervised learning, where the network is trained to be invariant to randomly generated distortions (e.g. random crops, flips, etc.).
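As an illustration of the machine learning analogue, below is a minimal sketch of a contrastive view-invariance objective (an NT-Xent-style loss), where the two 'views' stand in for two nearby frames of a smoothly moving stimulus. The encoder architecture, temperature, and dummy data are assumptions for illustration, not the exact setup used in the paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Contrastive loss: pull together embeddings of two 'views' of the same
    stimulus, push apart embeddings of different stimuli in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                       # (2N, d)
    sim = z @ z.t() / temperature                        # cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # the positive of sample i is its other view, offset by n
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# illustrative encoder and dummy "consecutive frames" of a moving stimulus
encoder = torch.nn.Sequential(torch.nn.Linear(784, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 64))
x_t, x_t1 = torch.randn(32, 784), torch.randn(32, 784)
loss = nt_xent_loss(encoder(x_t), encoder(x_t1))
loss.backward()
```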
Yet, a neural-collapse-like configuration is still required in order to read out the class via a linear classifier. Inspired by our previous work, we train task-specific modulations, which augment the feedforward computation, towards such a configuration (fast learning). We show that alternating between these two phases in a standard continual learning setup does not lead to catastrophic forgetting of task-relevant features, but requires keeping track of drifting class clusters (readout healing).
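The sketch below illustrates one way the fast-learning phase could look: the feedforward weights (the slow, view-invariant features) are frozen, and only a per-task multiplicative gain on the hidden units plus a linear readout are trained. The multiplicative-gain parameterization and all dimensions are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn.functional as F

class ModulatedLayer(torch.nn.Module):
    """Frozen feedforward weights shared across tasks; a per-task
    multiplicative 'dendritic' gain on the hidden units is the only
    fast-learned, task-specific parameter (an illustrative choice)."""
    def __init__(self, d_in, d_hidden, n_tasks):
        super().__init__()
        self.ff = torch.nn.Linear(d_in, d_hidden)
        self.ff.requires_grad_(False)            # slow weights: frozen here
        self.gain = torch.nn.Parameter(torch.ones(n_tasks, d_hidden))

    def forward(self, x, task_id):
        return torch.relu(self.ff(x)) * self.gain[task_id]

layer = ModulatedLayer(64, 128, n_tasks=5)
readout = torch.nn.Linear(128, 2)                # per-task linear classifier
x, y = torch.randn(16, 64), torch.randint(0, 2, (16,))
loss = F.cross_entropy(readout(layer(x, task_id=0)), y)
loss.backward()                                  # only gains and readout receive gradients
```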
-
NMDA-driven dendritic modulation enables multitask representation
learning in hierarchical sensory processing pathways
Figure: Task-Modulated Contrastive Learning (TMCL) is a layer-local learning algorithm that consolidates learned, dendrite-inspired task modulations into task-agnostic feedforward weights.

How does the brain adapt its computation to dynamically changing environments and tasks? We propose that dendritic modulation is a suitable candidate for these requirements. We show in biophysically realistic simulations that task-solving modulations, learned via a Hebbian learning rule that is modulated by a global error signal, can be used to solve multiple tasks. Towards more biologically plausible machine learning, we propose task-modulated contrastive learning (TMCL) as a layer-local, semi-supervised, multi-task learning algorithm.
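A minimal sketch of the kind of three-factor update meant here: a Hebbian term (presynaptic drive times postsynaptic activity) scaled by a global scalar error signal updates only the modulation, while the feedforward weights stay fixed. The specific error signal, gain parameterization, and dimensions are illustrative assumptions rather than the simulation details of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, n_classes = 20, 50, 3

W = rng.normal(scale=0.1, size=(d_hidden, d_in))     # fixed feedforward weights
m = np.ones(d_hidden)                                # task-specific modulation (gain)
readout = rng.normal(scale=0.1, size=(n_classes, d_hidden))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.05
x = rng.normal(size=d_in)                            # dummy stimulus
y = 1                                                # target class for the current task

pre = W @ x                                          # presynaptic (dendritic) drive
post = np.maximum(pre * m, 0.0)                      # modulated, rectified response
p = softmax(readout @ post)
error = 1.0 - p[y]                                   # global scalar error signal (one illustrative choice)

# three-factor update: presynaptic drive x postsynaptic activity x global error
m += lr * error * pre * post
```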
Natural Language Processing
-
Does Joint Training Really Help Cascaded Speech Translation?
EMNLP 2022
A simple approach to translating speech in one language into text in another language is to generate a transcript with an ASR (automatic speech recognition) model and then translate it with a separate MT (machine translation) model, i.e. cascaded speech translation. We discuss the potential benefits of training these two models jointly. Our investigations highlight that the benefits of such joint training suggested by previous work can be explained away by in-domain fine-tuning of both the ASR and MT models while keeping the traditional cascaded approach.
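For concreteness, the cascaded setup amounts to nothing more than chaining the two models; `asr_model.transcribe` and `mt_model.translate` below are hypothetical interfaces standing in for any concrete ASR and MT implementation.

```python
def cascaded_speech_translation(audio, asr_model, mt_model):
    """Cascaded speech translation: transcribe the source-language audio,
    then translate the transcript with a separate MT model."""
    transcript = asr_model.transcribe(audio)      # source-language text
    translation = mt_model.translate(transcript)  # target-language text
    return transcript, translation
```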
-
On Sampling-Based Training Criteria for Neural Language
Modeling
INTERSPEECH 2021
As the vocabulary size of language models increases, computing the cross-entropy loss over the entire vocabulary during training becomes computationally expensive. However, it is unclear why certain sampling-based training criteria, such as noise contrastive estimation (NCE), work well in practice compared to others. Here, starting from three fundamental criteria, namely mean squared error (MSE), binary cross-entropy (BCE), and cross-entropy (CE), we explicitly write out sampling-based versions such as importance sampling, NCE and standard Monte Carlo sampling, and derive a 'correction term' that, if applied during inference, makes the sampling-based training criteria perform similarly to NCE.
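As a rough illustration of sampling-based training, the sketch below computes a cross-entropy whose normalization runs over the target class plus k sampled negatives instead of the full vocabulary (a plain Monte Carlo variant; NCE and importance sampling differ in how negatives are drawn and weighted, and the inference-time correction term is omitted). The vocabulary size, uniform sampler, and output-embedding layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

vocab_size, d_model, k_samples = 10000, 128, 64
emb_out = torch.nn.Linear(d_model, vocab_size, bias=False)  # output embedding

def sampled_ce_loss(hidden, target, k=k_samples):
    """Cross-entropy where the normalization sum runs over the target class
    plus k uniformly sampled negative classes instead of the full vocabulary."""
    negatives = torch.randint(0, vocab_size, (hidden.shape[0], k))
    classes = torch.cat([target.unsqueeze(1), negatives], dim=1)   # (B, 1+k)
    W = emb_out.weight[classes]                                    # (B, 1+k, d)
    logits = torch.einsum('bd,bkd->bk', hidden, W)
    return F.cross_entropy(logits, torch.zeros_like(target))       # target sits in column 0

hidden = torch.randn(32, d_model)                # dummy hidden states
target = torch.randint(0, vocab_size, (32,))     # dummy next-word targets
loss = sampled_ce_loss(hidden, target)
loss.backward()
```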
-
Analysis of positional encodings for neural machine
translation
Figure: Relative positional encodings as proposed by Shaw et al., 2018.

We show in the context of machine translation that while relative positional encodings are not beneficial for performance on sequence lengths seen during training, they are crucial for generalization to longer sequences. Nowadays, this fact is widely acknowledged outside of machine translation as well (e.g. Csordas, Irie and Schmidhuber, 2021), and relative positional encodings are used in many state-of-the-art models.
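A minimal sketch of relative positional encodings in the spirit of Shaw et al.: a learned embedding per clipped relative distance is added to the content-based attention logits (key-side term only, no relative values). The dimensions, clipping distance, and single-head formulation are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def relative_attention(q, k, v, rel_emb, max_dist=8):
    """Self-attention where a learned embedding per clipped relative distance
    contributes an extra, position-dependent term to the attention logits."""
    B, T, d = q.shape
    scores = q @ k.transpose(1, 2) / d ** 0.5                       # content term, (B, T, T)
    pos = torch.arange(T)
    rel = (pos[None, :] - pos[:, None]).clamp(-max_dist, max_dist) + max_dist  # (T, T) indices
    rel_k = rel_emb[rel]                                            # (T, T, d)
    rel_scores = torch.einsum('btd,tsd->bts', q, rel_k) / d ** 0.5  # relative-position term
    attn = F.softmax(scores + rel_scores, dim=-1)
    return attn @ v

d, T, max_dist = 32, 10, 8
rel_emb = torch.nn.Parameter(torch.randn(2 * max_dist + 1, d) * 0.02)
q = k = v = torch.randn(2, T, d)
out = relative_attention(q, k, v, rel_emb)   # (2, T, d); the same rel_emb works for any length T
```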