Sreyan Ghosh

I am Sreyan Ghosh, a Research Scientist at Nvidia and a Computer Science Ph.D. student at the University of Maryland, College Park (UMD). At UMD, I conduct my research at Gamma Lab under the mentorship of Prof. Dinesh Manocha and Prof. Ramani Duraiswami. At Nvidia, I work with the ADLR and Cosmos World Model teams. My research focuses on advancing multimodal intelligence, with an emphasis on audio—spanning speech, sounds, and music. I work on challenges such as building data- and compute-efficient audio models, improving audio representation learning, generating synthetic data, and enhancing perception and reasoning in AI systems. My research is proudly supported by the NVIDIA Graduate Fellowship.

I maintain a list of my publications and research implementations under the Research tab. I am always open to collaborations, so please feel free to drop me an email!

Google Scholar | CV
Email: gsreyan@gmail.com ; sreyang@umd.edu

Updates

Mar 2026:	We release MMOU, a large-scale benchmark for evaluating omni-modal models on joint audio-visual understanding in long, complex real-world videos.
Feb 2026:	The second edition of SALMA, our flagship workshop on LALMs, is coming to EMNLP 2026, Budapest, Hungary!
Feb 2026:	1 paper accepted to CVPR 2026!
Jan 2026:	Music Flamingo, OmniVinci and UALM accepted to ICLR 2026! More details in the research section!
Jan 2026:	Multi-Domain Audio QA accepted to ICASSP 2026! More details in the research section!
Jan 2026:	Music Flamingo now plays the key role in a major collaboration between Universal Music Group and Nvidia.
Nov 2025:	We release Music Flamingo, an LALM with expert music understanding capabilities!
Nov 2025:	MMAU-Pro accepted to AAAI 2026!
Sep 2025:	Audio Flamingo 3 accepted to NeurIPS 2025 as a spotlight!
Aug 2025:	We release MMAU-Pro, a challenging and comprehensive benchmark for evaluating audio intelligence! More details under the Research section.
July 2025:	We release Audio Flamingo 3, the most open, capable and powerful large audio-language model yet! More details under the Research section.
May 2025:	Failing Forward accepted to ACL 2025 (Findings)!
May 2025:	Audio Flamingo 2 accepted to ICML 2025!
Mar 2025:	We release Audio Flamingo 2, a SOTA audio-language model outperforming most other frontier models on audio understanding and reasoning tasks. Check out the demo here!
Jan 2025:	VDGD, MMAU (Spotlight) and Synthio have been accepted to ICLR 2025! More details under the Research section.
Jan 2025:	PAT, RobustCLAP and ProSE have been accepted to NAACL 2025! More details under the Research section.
Dec 2024:	ReCLAP (and a total of 3 papers) have been accepted to ICASSP 2025! More details under the Research section.
Dec 2024:	We are hosting the DCASE 2025 Task 5 in collaboration with NVIDIA! More details here.
Nov 2024:	I was awarded the NVIDIA and Apple graduate fellowships! I have decided to accept the NVIDIA fellowship.
Sept 2024:	We released MMAU, the most comprehensive audio understanding and reasoning benchmark yet!
Sept 2024:	2 papers accepted to EMNLP 2024 as oral presentations!
Aug 2024:	Our workshop proposal, SALMA, has been accepted to ICASSP 2025!
June 2024:	We release GAMA, an LLM with strong audio-understanding capabilities! Details under the Research section.
May 2024:	1 paper accepted to InterSpeech 2024!
May 2024:	Joined Microsoft in Redmond as a Research Scientist Intern!
May 2024:	2 papers accepted to ACL 2024!
May 2024:	1 paper accepted to ICML 2024!
March 2024:	2 papers accepted to NAACL 2024!
Feb 2024:	1 paper accepted to CVPR 2024!
Jan 2024:	1 paper accepted to ICLR 2024!
Dec 2023:	Awarded the UMD graduate school's Outstanding RA Award!
Dec 2023:	3 papers accepted to ICASSP 2024! Details under the research section.
Dec 2023:	Attended EMNLP 2023 in-person in Singapore!
Oct 2023:	2 papers accepted to EMNLP 2023! Details under the research section.
Oct 2023:	Attended ICCV 2023 in-person in Paris!
Oct 2023:	Attended InterSpeech 2023 in-person in Dublin!
May 2023:	Our paper was accepted to ICCV 2023!
May 2023:	Started as a Research Scientist Intern at Adobe Research!
May 2023:	Our paper was accepted to Interspeech 2023!
Apr 2023:	Our paper was accepted to ACL 2023!
Apr 2023:	Our paper was accepted to SIGIR 2023!
Mar 2023:	Serving as a reviewer for Interspeech 2023!
Feb 2023:	I got admitted to the C.S. Ph.D. program at UMD! I will be starting in the Fall of 2023!
Feb 2023:	3 papers accepted to ICASSP 2023! Pre-prints under the research section.
Feb 2023:	Serving as a reviewer for ACL 2023!
Jan 2023:	Submitted one paper to ACL 2023!
Jan 2023:	Our team Shravan won the Best Demo Implementation award at the 2022 IEEE-SLT Code Hackathon! Links to the slides and the recording of the presentation will be posted soon under the Others tab.
Jan 2023:	Served as a reviewer for AAAI 2023 Muffin Workshop.
Dec 2022:	Served as a reviewer for ICASSP 2023.
Nov 2022:	Served as a reviewer for AAAI 2023.
Oct 2022:	4 papers submitted to IEEE ICASSP 2023! Pre-print and codes to be made available soon!
Sept 2022:	2 papers accepted to IEEE SLT 2022! Pre-print and code now available!
Aug 2022:	Paper on low-resource audio representation learning accepted to IEEE JSTSP Special Issue! More details under the research section!
Aug 2022:	Moved to the beautiful city of College Park and started school at the University of Maryland!
July 2022:	Started contributing to GSoC 2022 for the Keras Organization. More details about my project can be found in the Projects section!
July 2022:	2 papers accepted to Interspeech 2022! Pre-prints and code now available!
Dec 2021:	Paper on Low-Resource Audio Representation Learning accepted to AAAI 2022 SAS Workshop! Pre-print now available under research section!