Research
My research focuses on advancing audio processing across speech, sounds, and music. I aim to tackle challenges such as developing data- and compute-efficient audio models, strengthening audio representation learning, and improving audio perception and reasoning in AI systems. In my early work, I explored resource-efficient deep learning, devising methods to train models when labeled data, unlabeled data, or compute is limited. These methods include synthetic data augmentation and self-supervised learning, among others, to enable effective downstream learning.
Currently, I am working on improving audio perception and reasoning in Large Language Models through better architectures, audio representations, and scalable synthetic data. My publications span diverse tasks within Speech, Language, and Audio Processing, including NLU, room impulse response (RIR) estimation, audio generation, compositional reasoning, Large Audio Language Models (LALMs) and audio captioning.
I am always open to collaborations, so please feel free to email me!
Google Scholar / Semantic Scholar
Pre-prints
-
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S Sakshi*, Utkarsh Tyagi*, Sonal Kumar*, Ashish Seth*, Ramaneswaran Selvakumar, Oriol Nieto, Ramani Duraiswami, Sreyan Ghosh*, Dinesh Manocha
Project Website
Pre-print -
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh, Sonal Kumar, Zhifeng Kong, Rafael Valle, Bryan Catanzaro, Dinesh Manocha
Code
Pre-print -
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Sreyan Ghosh, Mohammad Sadegh Rasooli, Michael Levit, Peidong Wang, Jian Xue, Dinesh Manocha, Jinyu Li
Pre-print -
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar*, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha
Code / Summary Tweet
Pre-print -
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
Ashish Seth, Ramaneswaran Selvakumar, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
Pre-print -
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar, Sonal Kumar, Hemant Kumar Giri, Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha
Pre-print -
Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh*, Ashish Seth*, Sandesh Katta*, S. Umesh
Code
Pre-print -
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Lodagala V S V Durga Prasad*, Ashish Seth*, Sreyan Ghosh*, S. Umesh
Pre-print
Audio and Spoken Language Processing (Chronological)
-
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Sreyan Ghosh*, Sonal Kumar*, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Project Website / Summary Tweet / Coverage 1 / Coverage 2
EMNLP 2024 (Oral) -
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Code
ICASSP 2025 -
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
Ashish Seth*, Ramaneswaran Selvakumar, S Sakshi, Sonal Kumar, Sreyan Ghosh*, Dinesh Manocha
GitHub
EMNLP 2024 (Oral) -
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh*, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha
Code
Interspeech 2024 (Oral) -
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha
Project Website / Poster
CVPR 2024 -
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh*, Ashish Seth*, Sonal Kumar*, Utkarsh Tyagi*, Chandra Kiran Reddy Evuru*, Ramaneswaran S, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Project Website / Slides / Poster
ICLR 2024 -
RECAP: Retrieval-Augmented Audio Captioning
Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha
Code / Slides
ICASSP 2024 (Oral) -
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
Ashish Seth*, Sreyan Ghosh*, S. Umesh, Dinesh Manocha
Code / Poster
ICASSP 2024 -
FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning
Ashish Seth*, Sreyan Ghosh*, S. Umesh, Dinesh Manocha
Code / Poster
ICASSP 2024 -
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury*, Sreyan Ghosh*, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha
Poster
ICCV 2023 -
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
Sreyan Ghosh, Utkarsh Tyagi, Ramaneswaran S, Harshvardhan Srivastava, Dinesh Manocha
Code / Slides
Interspeech 2023 (Oral) -
Decorrelating Feature Spaces for Learning General Purpose Audio Representations
Sreyan Ghosh*, Ashish Seth*, S. Umesh
Code / Poster
IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing
ICASSP 2023 -
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup
Lodagala V S V Durga Prasad*, Sreyan Ghosh*, S. Umesh
Code / Leaderboard
ICASSP 2023 (Oral) -
MAST: Multiscale Audio Spectrogram Transformers
Sreyan Ghosh*, Ashish Seth*, S. Umesh, Dinesh Manocha
Code / Poster
ICASSP 2023 -
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
Ashish Seth*, Sreyan Ghosh*, S. Umesh, Dinesh Manocha
Code / Poster
ICASSP 2023 -
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
Code
IEEE SLT 2022 -
CCC-WAV2VEC 2.0: Clustering aided cross contrastive self-supervised learning of speech representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
Code / Leaderboard
IEEE SLT 2022 -
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh
Code
Interspeech 2022 (Oral) -
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Sreyan Ghosh, Samden Lepcha, Sakshi, Rajiv Ratn Shah, S. Umesh
Code / Data
Interspeech 2022 -
End-to-end Named Entity Recognition from English Speech
Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah
Code / Data
Interspeech 2020
Natural Language Processing (Chronological)
-
ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
Sreyan Ghosh*, Utkarsh Tyagi*, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramaneswaran S, S. Sakshi, Dinesh Manocha
Code / Poster
ACL 2024 -
ASPIRE: Language-Guided Augmentation for Robust Image Classification
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar*, S. Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
ACL 2024 Findings -
A Closer Look at the Limitations of Instruction Tuning
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar*, Ramaneswaran S, Deepali Aneja, Zeyu Jin, Ramani Duraiswami, Dinesh Manocha
Summary Tweet / Poster / Video
ICML 2024 -
Do Vision-Language Models Understand Compound Nouns?
Sonal Kumar*, Sreyan Ghosh*, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
NAACL 2024 -
CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP
Chandra Kiran Reddy Evuru*, Sreyan Ghosh*, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
NAACL 2024 Findings -
DALE: Generative Data Augmentation for Low-Resource Legal NLP
Sreyan Ghosh*, Chandra Kiran Reddy Evuru*, Sonal Kumar, Ramaneswaran S, S Sakshi, Utkarsh Tyagi, Dinesh Manocha
Code / Poster
EMNLP 2023 -
CoSyn: Detecting Implicit Hate Speech in Online Conversations Using a Context Synergized Hyperbolic Network
Sreyan Ghosh*, Manan Suri*, Purva Chiniya*, Utkarsh Tyagi*, Sonal Kumar*, Dinesh Manocha
Code / Poster
EMNLP 2023 -
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
Sreyan Ghosh*, Utkarsh Tyagi*, Manan Suri, Sonal Kumar, Ramaneswaran S, Dinesh Manocha
Code / Poster
ACL 2023 -
BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER
Sreyan Ghosh*, Utkarsh Tyagi*, Sonal Kumar*, Dinesh Manocha
Code / Poster
SIGIR 2023
Workshop
-
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation
Ashish Seth*, Sreyan Ghosh*, S. Umesh, Dinesh Manocha
Code / Poster
ICASSP 2023 SASB Workshop -
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning
Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh
Code
SAS Workshop @ AAAI 2022 -
Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets
Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah
Leaderboard (Team Name: MIDAS@IIIT-D)
FIRE 2021 -
Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
Sreyan Ghosh, Sonal Kumar
Code
SemEval-2021 @ ACL 2021 -
Cisco at AAAI-CAD21 shared task: Predicting Emphasis in Presentation Slides using Contextualized Embeddings
Sreyan Ghosh, Sonal Kumar, Harsh Jalan, Hemant Yadav, Rajiv Ratn Shah
Code
CAD-21 @ AAAI 2021