Senior Researcher · Tencent America
I am a Senior Researcher at Tencent America. My current research centers on multimodal understanding, with a particular emphasis on vision-language models. I am also actively exploring computer use agents. On the side, I have been experimenting with text-to-speech generation.
I received my PhD in Computer Science from UC Irvine, where I conducted research from Fall 2019 under the supervision of Prof. Stephan Mandt. My doctoral research primarily focused on developing neural approaches for image/video compression and generation.
Before joining UC Irvine, I earned my bachelor's degree in Computer Science from NYU Shanghai. During my undergraduate studies, I had the privilege of working as a research assistant with Prof. Gus Xia, where I investigated generative modeling applications in music. Additionally, I collaborated with Prof. Hanghui Chen on physics research, gaining valuable interdisciplinary experience.