
I am Sreyan Ghosh, a 4th-year Computer Science Ph.D. student at the University of Maryland, College Park (UMD) and a student researcher at NVIDIA. At UMD, I conduct my research at Gamma Lab under the mentorship of Prof. Dinesh Manocha and Prof. Ramani Duraiswami. At NVIDIA, I work with the ADLR and Cosmos World Model teams. My research focuses on advancing multimodal intelligence, with an emphasis on audio, spanning speech, sounds, and music. I work on challenges such as building data- and compute-efficient audio models, improving audio representation learning, generating synthetic data, and enhancing perception and reasoning in AI systems. My research is proudly supported by the NVIDIA Graduate Fellowship.
I maintain a list of my publications and research implementations under the Research tab. I am always open to collaborations, so please feel free to send me an email!
Google Scholar | CV
Email: gsreyan@gmail.com ; sreyang@umd.edu
Updates
| Nov 2025: | We release Music Flamingo, an LALM with expert music understanding capabilities! |
| Nov 2025: | MMAU-Pro accepted to AAAI 2026! |
| Sep 2025: | Audio Flamingo 3 accepted to NeurIPS 2025 as a spotlight! |
| Aug 2025: | We release MMAU-Pro, a challenging and comprehensive benchmark for evaluating audio intelligence! More details under the Research section. |
| July 2025: | We release Audio Flamingo 3, the most open, capable and powerful large audio-language model yet! More details under the Research section. |
| May 2025: | Failing Forward accepted to ACL 2025 (Findings)! |
| May 2025: | Audio Flamingo 2 accepted to ICML 2025! |
| Mar 2025: | We release Audio Flamingo 2, a SOTA audio-language model outperforming most other frontier models on audio understanding and reasoning tasks. Check out the demo here! |
| Jan 2025: | VDGD, MMAU (Spotlight) and Synthio have been accepted to ICLR 2025! More details under the Research section. |
| Jan 2025: | PAT, RobustCLAP and ProSE have been accepted to NAACL 2025! More details under the Research section. |
| Dec 2024: | A total of 3 papers, including ReCLAP, have been accepted to ICASSP 2025! More details under the Research section. |
| Dec 2024: | We are hosting the DCASE 2025 Task 5 in collaboration with NVIDIA! More details here. |
| Nov 2024: | I was awarded the NVIDIA and Apple graduate fellowships! I have decided to accept the NVIDIA fellowship. |
| Sept 2024: | We released MMAU, the most comprehensive audio understanding and reasoning benchmark yet! |
| Sept 2024: | 2 papers accepted to EMNLP 2024 as oral presentations! |
| Aug 2024: | Our workshop proposal, SALMA, has been accepted to ICASSP 2025! |
| June 2024: | We release GAMA, an LLM with strong audio-understanding capabilities! Details under the Research section. |
| May 2024: | 1 paper accepted to Interspeech 2024! |
| May 2024: | Joined Microsoft in Redmond as a Research Scientist Intern! |
| May 2024: | 2 papers accepted to ACL 2024! |
| May 2024: | 1 paper accepted to ICML 2024! |
| March 2024: | 2 papers accepted to NAACL 2024! |
| Feb 2024: | 1 paper accepted to CVPR 2024! |
| Jan 2024: | 1 paper accepted to ICLR 2024! |
| Dec 2023: | Awarded the UMD graduate school's Outstanding RA Award! |
| Dec 2023: | 3 papers accepted to ICASSP 2024! Details under the research section. |
| Dec 2023: | Attended EMNLP 2023 in person in Singapore! |
| Oct 2023: | 2 papers accepted to EMNLP 2023! Details under the research section. |
| Oct 2023: | Attended ICCV 2023 in person in Paris! |
| Oct 2023: | Attended Interspeech 2023 in person in Dublin! |
| May 2023: | Our paper was accepted to ICCV 2023! |
| May 2023: | Started as a Research Scientist Intern at Adobe Research! |
| May 2023: | Our paper was accepted to Interspeech 2023! |
| Apr 2023: | Our paper was accepted to ACL 2023! |
| Apr 2023: | Our paper was accepted to SIGIR 2023! |
| Mar 2023: | Serving as a reviewer for Interspeech 2023! |
| Feb 2023: | I got admitted to the C.S. Ph.D. program at UMD! I will be starting in the Fall of 2023! |
| Feb 2023: | 3 papers accepted to ICASSP 2023! Pre-prints under the research section. |
| Feb 2023: | Serving as a reviewer for ACL 2023! |
| Jan 2023: | Submitted one paper to ACL 2023! |
| Jan 2023: | Our team Shravan won the Best Demo Implementation award at the 2022 IEEE-SLT Code Hackathon! Links to slides and recording of the presentation to be posted soon under the Others tab. |
| Jan 2023: | Served as a reviewer for AAAI 2023 Muffin Workshop. |
| Dec 2022: | Served as a reviewer for ICASSP 2023. |
| Nov 2022: | Served as a reviewer for AAAI 2023. |
| Oct 2022: | 4 papers submitted to IEEE ICASSP 2023! Pre-prints and code to be made available soon! |
| Sept 2022: | 2 papers accepted to IEEE SLT 2022! Pre-print and code now available! |
| Aug 2022: | Paper on low-resource audio representation learning accepted to IEEE JSTSP Special Issue! More details under the research section! |
| Aug 2022: | Moved to the beautiful city of College Park and started school at the University of Maryland! |
| July 2022: | Started contributing to GSoC 2022 for the Keras Organization. More details about my project can be found in the Projects section! |
| July 2022: | 2 papers accepted to Interspeech 2022! Pre-prints and code now available! |
| Dec 2021: | Paper on Low-Resource Audio Representation Learning accepted to AAAI 2022 SAS Workshop! Pre-print now available under the research section! |