I am Sreyan Ghosh, a 3rd-year Computer Science Ph.D. student at the University of Maryland, College Park (UMD). I conduct my research in the Gamma Lab under the mentorship of Prof. Dinesh Manocha. My work focuses on advancing audio processing—spanning speech, sounds, and music. I aim to tackle challenges such as developing data- and compute-efficient audio models, improving audio representation learning, and enhancing audio perception and reasoning in AI systems. My research is proudly supported by the NVIDIA Graduate Fellowship.
Previously, I served as a Deep Learning Solutions Architect at NVIDIA, Bangalore, where I built and delivered deep-learning-based NLP solutions for NVIDIA’s customers and partners. Before that, I was a Software Engineer II at Cisco Systems, Bangalore, where I built network assurance software systems for Cisco’s Service Provider customers.
I have been fortunate to work with Prof. S. Umesh at Speech Lab @ Indian Institute of Technology Madras on making self-supervised learning in speech and audio more amenable to resource-constrained scenarios (both data and compute). I have also worked with Prof. Rajiv Ratn Shah at MIDAS Labs @ IIIT Delhi on content moderation, complex named entity recognition, and speech recognition systems for low-resource Indian languages and Indian-accented English.
I graduated with a Bachelor’s in Computer Science and Engineering from Christ University in 2020. During my undergraduate studies, I served as the Vice President and co-founder of Neuron, Christ University’s first AI group focused on research and hackathons, and I won over 20 national and international hackathons.
I maintain a list of my publications and research implementations under the Research tab. I also blog about my personal experiences and topics related to speech and text processing. I am always open to collaborations, so please feel free to drop me an email!
CV / Resume: link
Email ID: gsreyan@gmail.com ; sreyang@umd.edu
📣 We announce the first Call for Papers for the Workshop on Speech and Audio Language Models (SALMA), co-located with ICASSP 2025 in Hyderabad, India! 📣
Updates
Dec 2024: | 3 papers, including ReCLAP, have been accepted to ICASSP 2025! More details under the Research section. |
Dec 2024: | We are hosting the DCASE 2025 Task 5 in collaboration with NVIDIA! More details here. |
Nov 2024: | I was awarded the NVIDIA and Apple graduate fellowships! I have decided to accept the NVIDIA fellowship. |
Sept 2024: | We released MMAU, the most comprehensive audio understanding and reasoning benchmark yet! |
Sept 2024: | 2 papers accepted to EMNLP 2024 as oral presentations! |
Aug 2024: | Our workshop proposal, SALMA, has been accepted to ICASSP 2025! |
June 2024: | We released GAMA, an LLM with strong audio-understanding capabilities! Details under the Research section. |
May 2024: | 1 paper accepted to Interspeech 2024! |
May 2024: | Joined Microsoft in Redmond as a Research Scientist Intern! |
May 2024: | 2 papers accepted to ACL 2024! |
May 2024: | 1 paper accepted to ICML 2024! |
March 2024: | 2 papers accepted to NAACL 2024! |
Feb 2024: | 1 paper accepted to CVPR 2024! |
Jan 2024: | 1 paper accepted to ICLR 2024! |
Dec 2023: | Awarded the UMD graduate school's Outstanding RA Award! |
Dec 2023: | 3 papers accepted to ICASSP 2024! Details under the research section. |
Dec 2023: | Attended EMNLP 2023 in person in Singapore! |
Oct 2023: | 2 papers accepted to EMNLP 2023! Details under the research section. |
Oct 2023: | Attended ICCV 2023 in person in Paris! |
Oct 2023: | Attended Interspeech 2023 in person in Dublin! |
May 2023: | Our paper was accepted to ICCV 2023! |
May 2023: | Started as a Research Scientist Intern at Adobe Research! |
May 2023: | Our paper was accepted to Interspeech 2023! |
Apr 2023: | Our paper was accepted to ACL 2023! |
Apr 2023: | Our paper was accepted to SIGIR 2023! |
Mar 2023: | Serving as a reviewer for Interspeech 2023! |
Feb 2023: | I got admitted to the C.S. Ph.D. program at UMD! I will be starting in the Fall of 2023! |
Feb 2023: | 3 papers accepted to ICASSP 2023! Pre-prints under the research section. |
Feb 2023: | Serving as a reviewer for ACL 2023! |
Jan 2023: | Submitted one paper to ACL 2023! |
Jan 2023: | Our team Shravan won the Best Demo Implementation award at the 2022 IEEE-SLT Code Hackathon! Links to slides and recording of the presentation to be posted soon under the Others tab. |
Jan 2023: | Served as a reviewer for AAAI 2023 Muffin Workshop. |
Dec 2022: | Served as a reviewer for ICASSP 2023. |
Nov 2022: | Served as a reviewer for AAAI 2023. |
Oct 2022: | 4 papers submitted to IEEE ICASSP 2023! Pre-prints and code to be made available soon! |
Sept 2022: | 2 papers accepted to IEEE SLT 2022! Pre-print and code now available! |
Aug 2022: | Paper on low-resource audio representation learning accepted to IEEE JSTSP Special Issue! More details under the research section! |
Aug 2022: | Moved to the beautiful city of College Park and started school at the University of Maryland! |
July 2022: | Started contributing to GSoC 2022 for the Keras Organization. More details about my project can be found in the Projects section! |
July 2022: | 2 papers accepted to Interspeech 2022! Pre-prints and code now available! |
Dec 2021: | Paper on Low-Resource Audio Representation Learning accepted to AAAI 2022 SAS Workshop! Pre-print now available under research section! |