I am broadly interested in Natural Language Processing (speech and text). In speech, I focus on making supervised and self-supervised learning for speech and audio amenable to resource-constrained scenarios (both data and compute). I am also currently working on achieving domain invariance in supervised and self-supervised learning for speech, and I am interested in low-resource domain adaptation as a research topic. In text, I like working on content moderation and information extraction. I am currently focused on building deep learning models that detect complex entities in text and help detect implicit hate speech in online conversations.

I am excited to see what multiple modalities (speech, text and graphs) can offer together.

Google Scholar

Papers

(names in italics indicate main contributor or equal contribution)

Under Review

MAST: Multiscale Audio Spectrogram Transformers
Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha
arXiv
Under review at ICASSP 2023

SLICER: Learning universal audio representations using low-resource self-supervised pre-training
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
arXiv
Under review at ICASSP 2023

M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations
Sreyan Ghosh, S Ramaneswaran, Utkarsh Tyagi, Harshvardhan Srivastava, Samden Lepcha, S Sakshi, Dinesh Manocha
arXiv Code
Under review at ICASSP 2023

Pre-print

Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh, Ashish Seth, Sandesh Katta, S. Umesh
arXiv Code
Pre-print

Speech Emotion Recognition using Multi-task learning and a multimodal dynamic fusion network
Sreyan Ghosh, Harshvardhan Srivastava, S. Umesh
arXiv Code
Pre-print

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Lodagala V S V Durga Prasad, Ashish Seth, Sreyan Ghosh, S. Umesh
arXiv Checkpoints Leader Board
Pre-print

Journal

Decorrelating Feature Spaces for Learning General Purpose Audio Representations
Sreyan Ghosh, Ashish Seth, S. Umesh
Paper Code
IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing

Conference

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
arXiv Code
IEEE SLT 2022

CCC-WAV2VEC 2.0: Clustering aided cross contrastive self-supervised learning of speech representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
arXiv Code
IEEE SLT 2022

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh
arXiv Code
Interspeech 2022 (Oral)

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Sreyan Ghosh, Sakshi, Samden Lepcha, Rajiv Ratn Shah, S. Umesh
arXiv Code Data
Interspeech 2022 (Poster)

End-to-end Named Entity Recognition from English Speech
Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah
arXiv Code Data
Interspeech 2020 (Poster)

Workshop

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning
Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh
arXiv Code
SAS Workshop @ AAAI 2022

Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets
Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah
arXiv Leader Board (Team Name: MIDAS@IIIT-D)
FIRE 2021

Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
Sreyan Ghosh, Sonal Kumar
arXiv Code
SemEval-2021 @ ACL 2021

Cisco at AAAI-CAD21 shared task: Predicting Emphasis in Presentation Slides using Contextualized Embeddings
Sreyan Ghosh, Sonal Kumar, Harsh Jalan, Hemant Yadav, Rajiv Ratn Shah
arXiv Code
CAD-21 @ AAAI 2021