I am Sreyan Ghosh, a third-year Computer Science Ph.D. student at the University of Maryland, College Park (UMD). I conduct my research in the Gamma Lab under the mentorship of Prof. Dinesh Manocha. My work focuses on advancing audio processing across speech, sounds, and music. I aim to tackle challenges such as developing data- and compute-efficient audio models, improving audio representation learning, and enhancing audio perception and reasoning in AI systems. My research is proudly supported by the NVIDIA Graduate Fellowship.

Previously, I served as a Deep Learning Solutions Architect at NVIDIA, Bangalore, where I built and delivered deep-learning-based NLP solutions for NVIDIA’s customers and partners. Before that, I served as a Software Engineer II at Cisco Systems, Bangalore, where I built network assurance software systems for Cisco’s Service Provider customers.

I have been fortunate to work with Prof. S. Umesh at Speech Lab @ Indian Institute of Technology Madras on making self-supervised learning in speech and audio more amenable to resource-constrained scenarios (both data and compute). I have also worked with Prof. Rajiv Ratn Shah at MIDAS Labs @ IIIT Delhi on content moderation, complex named entity recognition, and speech recognition systems for low-resource Indian languages and Indian-accented English.

I graduated with a Bachelor’s in Computer Science and Engineering from Christ University in 2020. During my undergraduate studies, I co-founded and served as Vice President of Neuron, Christ University’s first AI group focused on research and hackathons, and I won over 20 national and international hackathons.

I maintain a list of my publications and research implementations under the Research tab. I also blog about my personal experiences and topics related to speech and text processing. I am always open to collaborations, so please feel free to email me!

CV / Resume: link
Email ID: gsreyan@gmail.com; sreyang@umd.edu

📣 We announce the first Call for Papers for the Workshop on Speech and Audio Language Models (SALMA), co-located with ICASSP 2025 in Hyderabad, India! 📣

Updates

Dec 2024: ReCLAP has been accepted to ICASSP 2025 (3 papers accepted in total)! More details under the Research section.
Dec 2024: We are hosting DCASE 2025 Task 5 in collaboration with NVIDIA! More details here.
Nov 2024: I was awarded the NVIDIA and Apple graduate fellowships! I have decided to accept the NVIDIA fellowship.
Sept 2024: We released MMAU, the most comprehensive audio understanding and reasoning benchmark yet!
Sept 2024: 2 papers accepted to EMNLP 2024 as oral presentations!
Aug 2024: Our workshop proposal, SALMA, has been accepted to ICASSP 2025!
June 2024: We released GAMA, an LLM with strong audio-understanding capabilities! Details under the Research section.
May 2024: 1 paper accepted to Interspeech 2024!
May 2024: Joined Microsoft in Redmond as a Research Scientist Intern!
May 2024: 2 papers accepted to ACL 2024!
May 2024: 1 paper accepted to ICML 2024!
March 2024: 2 papers accepted to NAACL 2024!
Feb 2024: 1 paper accepted to CVPR 2024!
Jan 2024: 1 paper accepted to ICLR 2024!
Dec 2023: Awarded the UMD graduate school's Outstanding RA Award!
Dec 2023: 3 papers accepted to ICASSP 2024! Details under the Research section.
Dec 2023: Attended EMNLP 2023 in person in Singapore!
Oct 2023: 2 papers accepted to EMNLP 2023! Details under the Research section.
Oct 2023: Attended ICCV 2023 in person in Paris!
Oct 2023: Attended Interspeech 2023 in person in Dublin!
May 2023: Our paper was accepted to ICCV 2023!
May 2023: Started as a Research Scientist Intern at Adobe Research!
May 2023: Our paper was accepted to Interspeech 2023!
Apr 2023: Our paper was accepted to ACL 2023!
Apr 2023: Our paper was accepted to SIGIR 2023!
Mar 2023: Serving as a reviewer for Interspeech 2023!
Feb 2023: I got admitted to the C.S. Ph.D. program at UMD! I will be starting in the Fall of 2023!
Feb 2023: 3 papers accepted to ICASSP 2023! Pre-prints under the Research section.
Feb 2023: Serving as a reviewer for ACL 2023!
Jan 2023: Submitted one paper to ACL 2023!
Jan 2023: Our team Shravan won the Best Demo Implementation award at the 2022 IEEE-SLT Code Hackathon! Links to the slides and a recording of the presentation will be posted soon under the Others tab.
Jan 2023: Served as a reviewer for the AAAI 2023 Muffin Workshop.
Dec 2022: Served as a reviewer for ICASSP 2023.
Nov 2022: Served as a reviewer for AAAI 2023.
Oct 2022: 4 papers submitted to IEEE ICASSP 2023! Pre-prints and code will be made available soon!
Sept 2022: 2 papers accepted to IEEE SLT 2022! Pre-prints and code now available!
Aug 2022: Paper on low-resource audio representation learning accepted to an IEEE JSTSP Special Issue! More details under the Research section!
Aug 2022: Moved to the beautiful city of College Park and started school at the University of Maryland!
July 2022: Started contributing to GSoC 2022 for the Keras Organization. More details about my project can be found in the Projects section!
July 2022: 2 papers accepted to Interspeech 2022! Pre-prints and code now available!
Dec 2021: Paper on low-resource audio representation learning accepted to the AAAI 2022 SAS Workshop! Pre-print now available under the Research section!