I am Sreyan Ghosh, a 4th-year Computer Science Ph.D. student at the University of Maryland, College Park (UMD). I conduct my research in the Gamma Lab under the mentorship of Prof. Dinesh Manocha and Prof. Ramani Duraiswami. My research focuses on advancing multimodal intelligence, with an emphasis on audio—spanning speech, sounds, and music. I work on challenges such as building data- and compute-efficient audio models, improving audio representation learning, generating synthetic data, and enhancing perception and reasoning in AI systems. My research is proudly supported by the NVIDIA Graduate Fellowship.

I maintain a list of my publications and research implementations under the Research tab. I am always open to collaborations, so please feel free to drop me an email!

Google Scholar | CV
Email: gsreyan@gmail.com; sreyang@umd.edu

Updates

Sep 2025:Audio Flamingo 3 accepted to NeurIPS 2025 as a spotlight!
Aug 2025:We release MMAU-Pro, a challenging and comprehensive benchmark for evaluating audio intelligence! More details under the Research section.
July 2025:We release Audio Flamingo 3, the most open, capable and powerful large audio-language model yet! More details under the Research section.
May 2025:Failing Forward accepted to ACL 2025 (Findings)!
May 2025:Audio Flamingo 2 accepted to ICML 2025!
Mar 2025:We release Audio Flamingo 2, a SOTA audio-language model outperforming most other frontier models on audio understanding and reasoning tasks. Check out the demo here!
Jan 2025:VDGD, MMAU (Spotlight) and Synthio have been accepted to ICLR 2025! More details under the Research section.
Jan 2025:PAT, RobustCLAP and ProSE have been accepted to NAACL 2025! More details under the Research section.
Dec 2024:ReCLAP has been accepted to ICASSP 2025 (3 papers accepted in total)! More details under the Research section.
Dec 2024:We are hosting the DCASE 2025 Task 5 in collaboration with NVIDIA! More details here.
Nov 2024:I was awarded the NVIDIA and Apple graduate fellowships! I have decided to accept the NVIDIA fellowship.
Sept 2024:We released MMAU, the most comprehensive audio understanding and reasoning benchmark yet!
Sept 2024:2 papers accepted to EMNLP 2024 as oral presentations!
Aug 2024:Our workshop proposal, SALMA, has been accepted to ICASSP 2025!
June 2024:We release GAMA, an LLM with strong audio-understanding capabilities! Details under the Research section.
May 2024:1 paper accepted to Interspeech 2024!
May 2024:Joined Microsoft in Redmond as a Research Scientist Intern!
May 2024:2 papers accepted to ACL 2024!
May 2024:1 paper accepted to ICML 2024!
March 2024:2 papers accepted to NAACL 2024!
Feb 2024:1 paper accepted to CVPR 2024!
Jan 2024:1 paper accepted to ICLR 2024!
Dec 2023:Awarded the UMD graduate school's Outstanding RA Award!
Dec 2023:3 papers accepted to ICASSP 2024! Details under the research section.
Dec 2023:Attended EMNLP 2023 in person in Singapore!
Oct 2023:2 papers accepted to EMNLP 2023! Details under the research section.
Oct 2023:Attended ICCV 2023 in person in Paris!
Oct 2023:Attended Interspeech 2023 in person in Dublin!
May 2023:Our paper was accepted to ICCV 2023!
May 2023:Started as a Research Scientist Intern at Adobe Research!
May 2023:Our paper was accepted to Interspeech 2023!
Apr 2023:Our paper was accepted to ACL 2023!
Apr 2023:Our paper was accepted to SIGIR 2023!
Mar 2023:Serving as a reviewer for Interspeech 2023!
Feb 2023:I got admitted to the C.S. Ph.D. program at UMD! I will be starting in the Fall of 2023!
Feb 2023:3 papers accepted to ICASSP 2023! Pre-prints under the research section.
Feb 2023:Serving as a reviewer for ACL 2023!
Jan 2023:Submitted one paper to ACL 2023!
Jan 2023:Our team Shravan won the Best Demo Implementation award at the 2022 IEEE-SLT Code Hackathon! Links to the slides and a recording of the presentation will be posted soon under the Others tab.
Jan 2023:Served as a reviewer for the AAAI 2023 Muffin Workshop.
Dec 2022:Served as a reviewer for ICASSP 2023.
Nov 2022:Served as a reviewer for AAAI 2023.
Oct 2022:4 papers submitted to IEEE ICASSP 2023! Pre-prints and code to be made available soon!
Sept 2022:2 papers accepted to IEEE SLT 2022! Pre-print and code now available!
Aug 2022:Paper on low-resource audio representation learning accepted to IEEE JSTSP Special Issue! More details under the research section!
Aug 2022:Moved to the beautiful city of College Park and started school at the University of Maryland!
July 2022:Started contributing to GSoC 2022 for the Keras Organization. More details about my project can be found in the Projects section!
July 2022:2 papers accepted to Interspeech 2022! Pre-prints and code now available!
Dec 2021:Paper on Low-Resource Audio Representation Learning accepted to AAAI 2022 SAS Workshop! Pre-print now available under the Research section!