Eklavya Sarkar

PhD Candidate in Machine Learning

Final-year PhD student at EPFL and Research Assistant at Idiap Research Institute,
working in the Speech and Audio Processing group, under Dr. Mathew Magimai Doss.

My research focuses on speech and audio methods based on self-supervised
representation learning for analyzing human and non-human vocal communication,
for the wider purpose of studying the evolution of language.


Previously, I worked on computer vision topics such as deepfakes and biometric
spoofing, as well as physics research and development at CERN in the CMS experiment.



Publications

Pre-training Domain, Fine-Tuning, Self-Supervised Learning, Bioacoustics
Comparing Self-Supervised Learning Models Pre-Trained on Human Speech and Animal Vocalizations for Bioacoustics Processing
Authors: Eklavya Sarkar, Mathew Magimai-Doss. Self-supervised learning (SSL) foundation models have emerged as powerful, domain-agnostic, general-purpose feature extractors applicable to a wide range of tasks. Such models pre-trained on human speech have demonstrated high transferability for bioacoustic processing. This paper investigates (i) whether SSL models pre-trained directly on animal vocalizations offer a significant advantage over those pre-trained on speech, and (ii) whether fine-tuning speech-pretrained models on automatic speech recognition (ASR) tasks can enhance bioacoustic classification. We conduct a comparative analysis using three diverse bioacoustic datasets and two different bioacoustic tasks. Results indicate that pre-training on bioacoustic data provides only marginal improvements over speech-pretrained models, with comparable performance in most scenarios. Fine-tuning on ASR tasks yields mixed outcomes, suggesting that the general-purpose representations learned during SSL pre-training are already well-suited for bioacoustic tasks. These findings highlight the robustness of speech-pretrained SSL models for bioacoustics and imply that extensive fine-tuning may not be necessary for optimal performance.
Accepted at ICASSP 2025 🇮🇳

Foundation Models, Pre-training Domain, Bandwidth, Bioacoustics
On the Utility of Speech and Audio Foundation Models for Marmoset Call Analysis
Authors: Eklavya Sarkar, Mathew Magimai-Doss. Marmoset monkeys encode vital information in their calls and serve as a surrogate model for neuro-biologists to understand the evolutionary origins of human vocal communication. Traditionally analyzed with signal processing-based features, recent approaches have utilized self-supervised models pre-trained on human speech for feature extraction, capitalizing on their ability to learn a signal's intrinsic structure independently of its acoustic domain. However, the utility of such foundation models remains unclear for marmoset call analysis in terms of multi-class classification, bandwidth, and pre-training domain. This study assesses feature representations derived from speech and general audio domains, across pre-training bandwidths of 4, 8, and 16 kHz for marmoset call-type and caller classification tasks. Results show that models with higher bandwidth improve performance, and pre-training on speech or general audio yields comparable results, improving over a spectral baseline.
Accepted at Interspeech 2024 🇬🇷

Feature Representation, Call-Type Classification, Bioacoustics
Feature Representations for Automatic Meerkat Vocalization Classification
Authors: Imen Ben Mahmoud, Eklavya Sarkar, Marta Manser, Mathew Magimai-Doss. Understanding evolution of vocal communication in social animals is an important research problem. In that context, beyond humans, there is an interest in analyzing vocalizations of other social animals such as meerkats, marmosets, and apes. While existing approaches address vocalizations of certain species, a reliable method tailored for meerkat calls is lacking. To that extent, this paper investigates feature representations for automatic meerkat vocalization analysis. Both traditional signal processing-based representations and data-driven representations facilitated by advances in deep learning are explored. Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.
Accepted at Interspeech 2024 🇬🇷

Self-Supervised Learning, Speaker ID, Speech
Can Self-Supervised Neural Networks Pre-Trained on Human Speech distinguish Animal Callers?
Authors: Eklavya Sarkar, Mathew Magimai-Doss. Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on Marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of Marmoset callers without fine-tuning. This demonstrates that representations pre-trained on human speech can be effectively applied to the bio-acoustics domain, providing valuable insights for future investigations in this field.
Accepted at Interspeech 2023 🇮🇪

Signal Processing, VAD, Speech
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
Authors: Eklavya Sarkar, RaviShankar Prasad, Mathew Magimai-Doss. Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora-2 database, across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparable to state-of-art VAD methods and are more invariant to added degradation and different channel characteristics.
Accepted at Interspeech 2022 🇰🇷

StyleGAN2, Face Recognition, Biometrics
Are GAN-based Morphs Threatening Face Recognition?
Authors: Eklavya Sarkar, Pavel Korshunov, Laurent Colbois, Sébastien Marcel. Morphing attacks are a threat to biometric systems where the biometric reference in an identity document can be altered. This form of attack presents an important issue in applications relying on identity documents such as border security or access control. Research in generation of face morphs and their detection is developing rapidly, however very few datasets with morphing attacks and open-source detection toolkits are publicly available. This paper bridges this gap by providing two datasets and the corresponding code for four types of morphing attacks: two that rely on facial landmarks based on OpenCV and FaceMorpher, and two that use StyleGAN 2 to generate synthetic morphs. We also conduct extensive experiments to assess the vulnerability of four state-of-the-art face recognition systems, including FaceNet, VGG-Face, ArcFace, and ISV. Surprisingly, the experiments demonstrate that, although visually more appealing, morphs based on StyleGAN 2 do not pose a significant threat to state-of-the-art face recognition systems, as these morphs were outmatched by the simple morphs that are based on facial landmarks.
Accepted at ICASSP 2022 🇸🇬

StyleGAN2, Face Recognition, Biometrics
Vulnerability Analysis of Face Morphing Attacks from Landmarks and Generative Adversarial Networks
Authors: Eklavya Sarkar, Pavel Korshunov, Laurent Colbois, Sébastien Marcel. Morphing attacks are a threat to biometric systems where the biometric reference in an identity document can be altered. This form of attack presents an important issue in applications relying on identity documents such as border security or access control. Research in face morphing attack detection is developing rapidly, however very few datasets with several forms of attacks are publicly available. This paper bridges this gap by providing a new dataset with four different types of morphing attacks, based on OpenCV, FaceMorpher, WebMorph and a generative adversarial network (StyleGAN), generated with original face images from three public face datasets. We also conduct extensive experiments to assess the vulnerability of the state-of-the-art face recognition systems, notably FaceNet, VGG-Face, and ArcFace. The experiments demonstrate that VGG-Face, while being a less accurate face recognition system compared to FaceNet, is also less vulnerable to morphing attacks. Also, we observed that naïve morphs generated with a StyleGAN do not pose a significant threat.
Idiap-RR-38-2020

Work Experience

Speech Processing

Research Assistant (PhD Candidate)

Idiap Research Institute

Supervisor: Dr. Mathew Magimai Doss, Speech and Audio Processing Group

  • Self-Supervised Speech Learning, Representation Learning
  • SSL, VAD, Diarization, ASWUs, Bioacoustics
  • Audio Segmentation Methods for Analyzing Vocal Communication: From Humans to Animals.
  • Low-resource speech and animal vocalization processing.
  • Working on EvoLang Project, TTF Tech ASR.
March 2021 - Present





Biometrics, ML, DL, GANs

Research Intern

Idiap Research Institute

Supervisor: Dr. Sébastien Marcel, HOD Biometrics Security and Privacy Group

  • Developed and released StyleGAN2 latent space editing code for morphing.
  • Implemented different techniques to generate traditional and StyleGAN2-based face morphs.
  • Investigated vulnerabilities of modern facial recognition systems against morphing attacks.
  • Currently researching detection techniques for such attacks to publish paper by November.

May 2020 - Feb 2021
(10 months)



SE, DS, R&D

Intern

CERN

Project Manager: Dr. Archana Sharma, Principal Scientist, CMS Experiment

  • Contributed to CERN's CMS-GEM-DAQ project's production code: PR1, PR2.
  • Refined efficiency of production code by implementing requested features in Python scripts.
  • Improved code used for testing the detector in a QC stand by adding a step-size feature.
  • Created method for configuring detector’s electrical state with custom values.
  • Published real time gas levels of a mixer by writing code to send data to a server via an API.

July 2017 - September 2017
(3 months)



Thesis

DL, ML, 160 Pages

Facial Information Extraction

MSc, Computer Vision, Convolutional Neural Networks
  • Attempted to use state-of-the-art deep learning techniques to build models which take an image as input.
  • Performed facial detection, recognition, and emotion classification on the individuals present in the images.
  • Achieved 95% test accuracy on facial recognition with convolutional neural networks and hyper-parameter tuning.
  • Built separate models for tasks such as emotion classification before combining them into an end-to-end model.
  • Optimised performance with DL best practices: data augmentation, batch-normalisation, cross-validation.
2018-19
Grade: Distinction

SE, ML, 200 Pages

Kohonen Self-Organising Maps

BSc, Computer Vision, Pattern Recognition
  • Implemented unsupervised machine learning neural network from scratch without using any specific ML library.
  • Trained back-end model on 3 different open-source datasets to test neural network’s efficiency and scalability.
  • Developed front-end GUI for interactive data visualisation before & after clustering and dimensionality reduction.
  • Wrote extensive thesis covering all aspects of project such as system design, algorithmic optimisation, scalability.
2017-18
Grade: 90%

DS, 50 Pages

Exoplanets: Discoveries and Prospects

Research, Data Analysis, Literature Review
  • 2019 Update: Didier Queloz has since won the Nobel Prize in Physics!
  • Conducted literature review on Exoplanets, with inputs from Didier Queloz, co-discoverer of the first exoplanet.
  • Showed correlations between possibly habitable planets and core laws of physics by analyzing open-source DB.
  • 50-page report selected among top 2013 student scientific projects in Geneva canton and Pays de Gex.
  • Invited to present project at a public ‘Science Sharing’ event at CERN's Universe of Particles museum.
2012-13
Grade: 6/6

Projects and Open-source Contributions

RL, DL, DQN, DDPG

Deep Reinforcement Learning: Flappy Bird

Deep Q-Learning Network, Deep Deterministic Policy Gradient, Experience Replay

Attempted to develop a model that learns to play Flappy Bird and surpasses human-level scores using reinforcement learning techniques. Specifically investigated Deep Q-Learning networks to develop an overview of the problem and a deeper understanding of reinforcement learning techniques. Also aimed to showcase how computer vision and deep neural networks, such as convolutional neural networks, can be used in the context of reinforcement learning.

2019

NLP, ML, DL

Kaggle Competition: Toxic Comment Classification

Multi-Label Classification Problem

Attempted a Kaggle competition in a group of three. Specifically strove for implementations beyond the existing classical ones, and attempted to develop a model well adapted and fine-tuned to the specific problem at hand. Implemented a Naive Bayes bag-of-words model, Random Forest, and Extra Trees, and compared their results with Logistic Regression, Convolutional Neural Network, and Long Short-Term Memory recurrent models.

2019

ML, Bayesian, Stats

Bayesian Machine Learning

Hamiltonian Monte Carlo Stochastic Methods, Automatic Relevance Determination

Used Bayesian modelling methods, specifically Hamiltonian Monte Carlo, to approximate Gaussian posterior distributions on a multivariate regression task, in order to derive a good predictor from the dataset and estimate which of the input variables are relevant for prediction.

2019

NLP, ML

Open Information Extraction

Part-of-Speech Tagging, Named Entity Recognition, Relation Extraction, Kitchen Sink

Attempted to summarise Jules Verne's '20,000 Leagues Under the Seas' by training a classifier that predicts the part-of-speech tag of each word. The approach was based on Identifying Relations for Open Information Extraction (Fader, Soderland & Etzioni). To this end, GloVe word vectors were used to implement a one-vs-all logistic kitchen-sink model, which was applied to part-of-speech tagging at the word and sentence levels, named entity recognition, and relation extraction.

2019

SE

Robotics I

Localisation, Pathfinding, Navigation, Calibration, Object Detection

Wrote a program using the Java LeJOS framework that enables a robot to explore an arena containing a small number of obstacles placed at random locations. The robot had to detect a single coloured sheet of paper with its colour sensor; the sheet signified the end location, to which the robot then had to navigate back optimally.

2017

SE

Robotics II

Scout, Doctor, Agents, Jason

Wrote a program using the Java LeJOS framework allowing a robot to determine its starting location in the arena, and optimally work its way to the pre-determined ending position using scout and doctor agents while avoiding any obstacles.

2017

Android, SE

Android Food App

Full stack development

Scran is a user-oriented application that aids in the decision-making process when choosing a restaurant, and more specifically a dish. Scran will maintain, search and track user and restaurant data to help its users to choose the dish they didn’t know they wanted.

2017

SE

Moving Average Filter

Generate, Filter and Display data

Wrote C++ in Xcode to generate random plot and noise values of a sinusoidal function, using signal characteristics as parameters. The values were then handled by custom event-driven panels and data structures in LabVIEW, and subsequently transferred to MATLAB to be displayed in both filtered and unfiltered states.

2014
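
A minimal sketch of the generate-and-filter idea behind this project, written in Python rather than the original C++/LabVIEW/MATLAB toolchain. The function names, parameter defaults, and the choice of a causal window are illustrative assumptions, not the project's actual code.

```python
import math
import random


def noisy_sine(n_samples, amplitude=1.0, frequency=1.0, sample_rate=100.0,
               noise_std=0.2, seed=0):
    """Generate samples of a sinusoid with additive Gaussian noise.

    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        clean = amplitude * math.sin(2.0 * math.pi * frequency * t)
        samples.append(clean + rng.gauss(0.0, noise_std))
    return samples


def moving_average(signal, window):
    """Causal moving average over the last `window` samples.

    The first window - 1 outputs average over the samples seen so far.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    filtered = []
    running_sum = 0.0
    for i, x in enumerate(signal):
        running_sum += x
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= signal[i - window]
        filtered.append(running_sum / min(i + 1, window))
    return filtered


if __name__ == "__main__":
    raw = noisy_sine(200)
    smooth = moving_average(raw, window=5)
    # Print a few unfiltered/filtered pairs side by side.
    for r, s in list(zip(raw, smooth))[:5]:
        print(f"{r:+.3f} -> {s:+.3f}")
```

A larger window smooths more noise but also attenuates and delays the underlying sinusoid.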

Talks

DL, Diarization, NCCR

Automatic Speech Segmentation

14 June 2021
HMM Tutorial Problems

Hidden Markov Models

8 Dec 2020
DL, GANs, VAEs

Generative Adversarial Networks

30 April 2020
DL, CNNs

Convolutional Neural Networks

30 April 2020

Competitions

3rd Prize, DL

International Create Challenge

Adversarial Attacks

  • Developed model to detect and combat adversarial attacks using the Foolbox toolkit.
  • Implemented website to evaluate the robustness of a given model to adversarial attacks using a specific metric.
  • Awarded 3rd place overall at ICC 2020.
2020

SE, iOS

Facebook Hackathon 2015

iOS Revision App

Developed iOS app with first-generation Swift in Xcode.

  • Goal was to give students a platform to revise and prepare for exams on the go.
  • Content specifically tailored to the common first-year Bachelor course.
  • Option of adding content for additional modules and courses by users.
  • Aimed to improve the student pass rate at EPFL by providing feedback and tips.

2015

News

CERN Intern

Featured on University of Liverpool Student News

FULL Student feature: My summer Internship at CERN

"Arriving wide-eyed at the main lab on the first day, I discovered that I was among twenty other excited students, from all over the world, ranging from Thailand, Brazil, France, US, India, Italy and many others, all of whom had arrived at different moments during the summer, meaning there was little time for individual introductions to the lab and explanations of the various hardware components and the software code base."

October 2017

Exoplanet Project Presentation

Featured on Colloque Transfrontalier TPE-TM

Selected to present my research project on exoplanets on stage at the Colloque Transfrontalier: La Science en Partage (a public ‘Science Sharing’ event) at CERN’s Universe of Particles museum.

October 2013

Education

Ecole Polytechnique Fédérale de Lausanne

PhD Machine Learning
- Deep Learning
- Graph Machine Learning
- Deep Learning for Natural Language Processing
- Science and Engineering Teaching and Learning

March 2021 - Present
Grade: 5.2/6


University of Bath

MSc Data Science
- Statistics
- Machine Learning I & II
- Neural Computing
- Bayesian Machine Learning
- Reinforcement Learning
- Applied Data Science
- Software Technologies for Data Science
October 2018 - September 2019
Grade: First Class


University of Liverpool

BSc Computer Science
- Efficient Sequential Algorithms, Complexity of Algorithms
- Robotics and Autonomous Systems, Multi-Agent Systems
- Biocomputation, Artificial Intelligence
- Complex Information and Social Networks
- Software Engineering, Group Software Project
- Automata Theory
September 2015 - June 2018
Grade: First Class

Skills

Languages and Libraries
Tools and Technologies

Extra-Curricular

Organizer

Perspectives on AI Symposium Series
  • Participated in the conception and organization of the series alongside the PIs.
  • Responsible for developing the vision, design, and implementation of the event website.
  • Helped in finding sponsors, budgeting, and overall event management.

2022-24

President

Student Residence Hall's Committee
  • Elected President of the Hall Committee by majority ballot vote to represent 270 students.
  • Enhanced residents’ experience by taking charge and managing events throughout the year.
  • Responsible for formulating outline and implementation of vision for Hall’s community and life.
  • Led a 10-member committee by developing a team vision and chairing weekly meetings.
  • Maintained professional relationship with residence staff, guild of students, accommodation office and the university.

2017-18

Interests