Research
I am broadly interested in Natural Language Processing (speech and text). In speech, I focus on making supervised and self-supervised learning for speech and audio amenable to resource-constrained scenarios (both data and compute). I am currently also working on achieving domain invariance in supervised and self-supervised learning for speech, and I am interested in low-resource domain adaptation. In text, I like working on content moderation and information extraction. My current focus is on building deep learning models that detect complex entities in text and help detect implicit hate speech in online conversations.
I am also excited to see what multiple modalities (speech, text, and graphs) can offer together.
Papers
(names in italics indicate main contributor or equal contribution)
Under Review
MAST: Multiscale Audio Spectrogram Transformers
Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha
arXiv
Under review at ICASSP 2023
SLICER: Learning universal audio representations using low-resource self-supervised pre-training
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
arXiv
Under review at ICASSP 2023
M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations
Sreyan Ghosh, S Ramaneswaran, Utkarsh Tyagi, Harshvardhan Srivastava, Samden Lepcha, S Sakshi, Dinesh Manocha
arXiv Code
Under review at ICASSP 2023
Pre-print
Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh, Ashish Seth, Sandesh Katta, S. Umesh
arXiv Code
Pre-print
Speech Emotion Recognition using Multi-task learning and a multimodal dynamic fusion network
Sreyan Ghosh, Harshvardhan Srivastava, S. Umesh
arXiv Code
Pre-print
Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition
Lodagala V S V Durga Prasad, Ashish Seth, Sreyan Ghosh, S. Umesh
arXiv Checkpoints Leader Board
Pre-print
Journal
Decorrelating Feature Spaces for Learning General Purpose Audio Representations
Sreyan Ghosh, Ashish Seth, S. Umesh
Paper Code
IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing
Conference
PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
arXiv Code
IEEE SLT 2022
CCC-WAV2VEC 2.0: Clustering aided cross contrastive self-supervised learning of speech representations
Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
arXiv Code
IEEE SLT 2022
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh
arXiv Code
Interspeech 2022 (Oral)
DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Sreyan Ghosh, Sakshi, Samden Lepcha, Rajiv Ratn Shah, S. Umesh
arXiv Code Data
Interspeech 2022 (Poster)
End-to-end Named Entity Recognition from English Speech
Hemant Yadav, Sreyan Ghosh, Yi Yu, Rajiv Ratn Shah
arXiv Code Data
Interspeech 2020 (Poster)
Workshop
DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning
Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh
arXiv Code
SAS Workshop @ AAAI 2022
Leveraging Transformers for Hate Speech Detection in Conversational Code-Mixed Tweets
Zaki Mustafa Farooqi, Sreyan Ghosh, Rajiv Ratn Shah
arXiv Leader Board (Team Name: MIDAS@IIIT-D)
FIRE 2021
Cisco at SemEval-2021 Task 5: What’s Toxic?: Leveraging Transformers for Multiple Toxic Span Extraction from Online Comments
Sreyan Ghosh, Sonal Kumar
arXiv Code
SemEval-2021 @ ACL 2021
Cisco at AAAI-CAD21 shared task: Predicting Emphasis in Presentation Slides using Contextualized Embeddings
Sreyan Ghosh, Sonal Kumar, Harsh Jalan, Hemant Yadav, Rajiv Ratn Shah
arXiv Code
CAD-21 @ AAAI 2021