My research focuses on advancing audio processing across speech, sounds, and music. I aim to tackle challenges such as building data- and compute-efficient audio models, improving audio representation learning, and strengthening audio perception and reasoning in AI systems. In my early work, I explored resource-efficient deep learning, devising methods to train models under constraints on labeled data, unlabeled data, or compute, including synthetic data augmentation and self-supervised learning to enable effective downstream learning.

Currently, I am working on improving audio perception and reasoning in Large Language Models through better architectures, audio representations, and scalable synthetic data. My publications span diverse tasks within Speech, Language, and Audio Processing, including natural language understanding (NLU), room impulse response (RIR) estimation, audio generation, compositional reasoning, Large Audio Language Models (LALMs), and audio captioning.

I am always open to collaborations, so please feel free to drop me an email!

Google Scholar | Semantic Scholar

Pre-prints

Audio and Spoken Language Processing (Chronological)

Natural Language Processing (Chronological)

Workshop Papers