Dendritic Learner
Viet Anh Khoa Tran 1,2
1Dendritic Learning Group (PGI-15), Forschungszentrum Jülich
2RWTH Aachen University

Hi there! I am interested in how the biological brain updates, represents, and reasons upon information, and in applying that knowledge to build artificial learning agents (NeuroAI). My goal is to engineer machine learning algorithms and architectures that learn efficiently on neuromorphic hardware and reason beyond the chains of thought hidden in massive language data. Currently, I am privileged to work towards that goal as a PhD student in the Dendritic Learning Group headed by Willem Wybo, part of Emre Neftci's Neuromorphic Computing lab at Forschungszentrum Jülich, Germany.

Apart from work, I enjoy playing classical piano, table tennis, volleyball, bouldering, and video games, as well as engaging with politics, philosophy, and languages.

Research

Contrastive Consolidation of Top-Down Modulations Achieves Sparsely Supervised Continual Learning
Viet Anh Khoa Tran, Emre O. Neftci, Willem A. M. Wybo

Using contrastive learning to integrate modulations into feedforward weights, continually.

Biological brains learn continually from a stream of unlabeled data, while integrating specialized information from sparsely labeled examples without compromising their ability to generalize. Meanwhile, machine learning methods are susceptible to catastrophic forgetting in this natural learning setting, as supervised specialist fine-tuning degrades performance on the original task.

We introduce task-modulated contrastive learning (TMCL), which takes inspiration from the biophysical machinery in the neocortex, using predictive coding principles to integrate top-down information continually and without supervision. We follow the idea that these principles build a view-invariant representation space, and that this can be implemented using a contrastive loss. Then, whenever labeled samples of a new class occur, new affine modulations are learned that improve separation of the new class from all others, without affecting feedforward weights. By co-opting the view-invariance learning mechanism, we then train feedforward weights to match the unmodulated representation of a data sample to its modulated counterparts. This introduces modulation invariance into the representation space and, by also using past modulations, stabilizes it.
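A minimal sketch of the core training step in PyTorch-style pseudocode; the FiLM-like affine modulation, the InfoNCE-style loss, and all names are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn.functional as F

def modulate(h, gain, shift):
    # Affine (FiLM-style) top-down modulation of a hidden representation;
    # the precise form of the modulation is an assumption of this sketch.
    return gain * h + shift

def tmcl_step(encoder, x_views, gain, shift, tau=0.1):
    """One illustrative TMCL-style update.

    x_views: two augmented views of the same samples (view invariance).
    gain, shift: class-specific modulations (new or past ones) sampled
                 for this step (modulation invariance).
    """
    h_plain = encoder(x_views[0])                        # unmodulated representation
    h_mod = modulate(encoder(x_views[1]), gain, shift)   # modulated counterpart

    # InfoNCE-style contrastive loss: match the unmodulated representation
    # of each sample to its modulated counterpart, against the rest of the batch.
    z1 = F.normalize(h_plain, dim=-1)
    z2 = F.normalize(h_mod, dim=-1)
    logits = z1 @ z2.T / tau
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)
```

In practice, the modulated branch draws on both newly learned and past class modulations, which is what stabilizes the representation space over time.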

Our experiments show improvements in both class-incremental and transfer learning over state-of-the-art unsupervised approaches, as well as over comparable supervised approaches, using as few as 1% of available labels. Taken together, our work suggests that top-down modulations play a crucial role in balancing stability and plasticity.

Cortical learning (left) is characterized by the interplay between top-down (orange) and feedforward (blue) processing, where top-down connections impart high-level information on the feedforward sensory processing pathway (top). The feedforward pathway, on the other hand, learns to predict neural representations of future inputs (predictive coding). Translating this view to a machine learning algorithm (middle), we (i) train modulations to implement high-level object identification tasks as the analogue of top-down inputs, while we (ii) train for view invariance over modulated representations and for modulation invariance as the analogue of predictive coding (top). As a consequence, high-level information continually permeates into the sensory processing pathway, which can be contrasted with the traditional machine learning (right) approach of unsupervised pretraining for view invariance (top) followed by supervised fine-tuning (bottom). In this case, it is unclear how high-level information can be incorporated into the sensory processing pathway to improve subsequent learning.

Continual learning using dendritic modulations on view-invariant feedforward weights

Viet Anh Khoa Tran, Emre O. Neftci, Willem A. M. Wybo COSYNE 2024

View-invariance + supervised top-down modulations for task-incremental learning

The brain can learn continuously without forgetting past skills, unlike traditional machine learning models that struggle with continual learning. We provide a 'fast and slow learning' paradigm towards solving this problem.

First, we observe that standard supervised learning drives neural networks into a configuration in which they become invariant to task-irrelevant features, and hence also to features relevant to previous tasks ('neural collapse'). Instead, in a slow learning phase, we suggest first learning feedforward weights that extract general features by proxy of a view-invariance learning objective: the training signal is provided by smoothly moving visual stimuli, which suggest object identity. The machine learning equivalent is contrastive self-supervised learning, where the network is trained to be invariant to randomly generated distortions (e.g. random crops, flips, etc.).

Yet, a neural-collapse-like configuration is still required to read out the class via a linear classifier. Inspired by our previous work, we train task-specific modulations, which augment the feedforward computation, towards such a configuration (fast learning). We show that alternating between these two phases in a standard continual learning setup does not lead to catastrophic forgetting of task-relevant features, but does require keeping track of drifting class clusters (readout healing).
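As a sketch of how such modulations could augment the feedforward computation (the per-task affine gains and shifts below are an illustrative assumption; only they would be updated during the fast learning phase):

```python
import torch
import torch.nn as nn

class TaskModulatedLayer(nn.Module):
    """A feedforward layer whose output is modulated by task-specific
    affine parameters; the feedforward weights are trained slowly via
    the view-invariance objective, the modulations quickly per task."""

    def __init__(self, d_in, d_out, n_tasks):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)                     # slow, view-invariant weights
        self.gain = nn.Parameter(torch.ones(n_tasks, d_out))     # fast, per-task gains
        self.shift = nn.Parameter(torch.zeros(n_tasks, d_out))   # fast, per-task shifts

    def forward(self, x, task_id=None):
        h = torch.relu(self.linear(x))
        if task_id is None:
            return h                                             # unmodulated pathway
        return self.gain[task_id] * h + self.shift[task_id]      # task-modulated pathway
```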

Training the whole network to solve a classification task leads to neural collapse, hindering continual learning (left). Instead, we suggest a separate self-supervised training objective to learn view-invariant features (right), upon which task-specific modulations might induce linearly separable representations.

NMDA-driven dendritic modulation enables multitask representation learning in hierarchical sensory processing pathways

Willem A. M. Wybo, Matthias C. Tsai, Viet Anh Khoa Tran, Bernd Illing, Jakob Jordan, Abigail Morrison, Walter Senn PNAS

Dendritic modulations allow hierarchical sensory processing pathways to flexibly adapt their computations to multiple tasks.

How does the brain adapt its computations to dynamically changing environments and tasks? We propose that dendritic modulation is a suitable candidate mechanism for meeting these requirements. In biophysically realistic simulations, we show that task-solving modulations, learned via a Hebbian learning rule gated by a global error signal, allow a single network to solve multiple tasks. Towards more biologically plausible machine learning, we further propose task-modulated contrastive learning (TMCL) as a layer-local, semi-supervised, multitask learning algorithm.
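Schematically, such an error-gated Hebbian update takes the generic three-factor form below; the notation is illustrative and not the exact plasticity rule of the paper:

```latex
\Delta w_{ij} \;\propto\;
\underbrace{\delta}_{\text{global error signal}} \,\cdot\,
\underbrace{r_j^{\text{pre}}}_{\text{presynaptic activity}} \,\cdot\,
\underbrace{\phi\!\left(v_i^{\text{post}}\right)}_{\text{postsynaptic (dendritic) factor}}
```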

Task-Modulated Contrastive Learning (TMCL) is a layer-local learning algorithm that trains weights to be invariant to task modulations.

Does Joint Training Really Help Cascaded Speech Translation?

Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney EMNLP 2022

In-domain fine-tuning is all you need.

A simple approach for translating speech in one language into text in another is to generate a transcript with an ASR (automatic speech recognition) model and then translate it with a separate MT (machine translation) model, i.e. cascaded speech translation. We discuss the potential benefits of training these two models jointly. Our investigations show that the benefits of joint training reported in previous work can be explained away by fine-tuning both the ASR and MT models on in-domain data while keeping the traditional cascaded approach.
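In code, the cascaded baseline simply composes two independently trained (and, in our setting, in-domain fine-tuned) models; the model objects and method names below are hypothetical placeholders:

```python
def cascaded_speech_translation(audio, asr_model, mt_model):
    """Cascaded speech translation: transcribe, then translate.

    asr_model and mt_model are placeholders for separately trained
    ASR and MT systems; no joint training is involved.
    """
    transcript = asr_model.transcribe(audio)       # source-language text
    translation = mt_model.translate(transcript)   # target-language text
    return translation
```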

On Sampling-Based Training Criteria for Neural Language Modeling

Yingbo Gao, David Thulke, Alexander Gerstenberger, Viet Anh Khoa Tran, Ralf Schlüter, Hermann Ney INTERSPEECH 2021

Different vocabulary sampling-based training criteria are all the same, except for a correction term.

As the vocabulary size of language models grows, computing the cross-entropy loss over the entire vocabulary becomes computationally expensive. However, it is unclear why certain sampling-based training criteria, such as noise contrastive estimation (NCE), work well in practice compared to others. Starting from three fundamental criteria, namely mean squared error (MSE), binary cross-entropy (BCE), and cross-entropy (CE), we explicitly write out their sampling-based versions, such as importance sampling, NCE, and standard Monte Carlo sampling, and derive a 'correction term' that, if applied during inference, makes the sampling-based training criteria perform similarly to NCE.
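For reference, the full-softmax cross-entropy and the NCE surrogate (written with the usual self-normalization assumption) have the following standard forms; the correction term itself is derived in the paper and not reproduced here:

```latex
% Full-softmax cross-entropy over the vocabulary V (expensive for large V):
\mathcal{L}_{\mathrm{CE}} = -\,s_\theta(w \mid h) + \log \sum_{v \in V} e^{\,s_\theta(v \mid h)}

% NCE: discriminate the target w from k noise samples \tilde{v}_i \sim q:
\mathcal{L}_{\mathrm{NCE}} = -\log \sigma\!\big(s_\theta(w \mid h) - \log k q(w)\big)
    - \sum_{i=1}^{k} \log \sigma\!\big(\log k q(\tilde{v}_i) - s_\theta(\tilde{v}_i \mid h)\big)
```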

Analysis of positional encodings for neural machine translation

Jan Rosendahl, Viet Anh Khoa Tran, Weiyue Wang, Hermann Ney IWSLT 2019

Relative positional encodings help generalize to longer sequences

We show in the context of machine translation that while relative positional encodings are not beneficial for performance on sequence lengths seen during training, they are crucial for generalization to longer sequences. Nowadays, this fact is widely acknowledged outside of machine translation (e.g. Csordas, Irie and Schmidhuber, 2021) and relative positional encodings are used in many state-of-the-art models.
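Concretely, Shaw et al. add learned relative-position embeddings a^K and a^V, with the relative distance clipped to a maximum of k, to the attention keys and values:

```latex
e_{ij} = \frac{\big(x_i W^Q\big)\big(x_j W^K + a^K_{\mathrm{clip}(j-i,\,k)}\big)^{\!\top}}{\sqrt{d_z}},
\qquad
z_i = \sum_j \mathrm{softmax}_j(e_{ij})\,\big(x_j W^V + a^V_{\mathrm{clip}(j-i,\,k)}\big)
```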

Relative positional encodings as proposed by Shaw et al., 2019

Side projects