Bio. Qin Jin received her Ph.D. degree in Language and Information Technologies from Carnegie Mellon University in 2007, and her B.Sc. and M.S. degrees in Computer Science and Technology from Tsinghua University, China. Dr. Jin worked as research faculty at the Language Technologies Institute of Carnegie Mellon University from 2007 to 2012, and as a Research Scientist at IBM China Research Laboratory from April 2012 to December 2012. She is currently an Associate Professor at the School of Information, Renmin University of China, where she leads the multimedia content analysis and understanding research group.

Multi-Level · Multi-Aspect · Multi-Modal

We live in a multi-modal world: we learn, think, and express ourselves through multiple modalities. AI systems should therefore be able to understand this multi-modal world as well. Our research on building such systems focuses on understanding at multiple levels, from multiple aspects, and across multiple modalities; a brief illustrative sketch follows the research areas listed below.

  • Vision & Language

    image/video captioning, multimodal translation, visual question answering (VQA), etc.

  • Affective Computing

    multimodal emotion recognition, memorability prediction, etc.

  • Natural Dialog

    emotion-eliciting dialogs, etc.
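
The common thread across these areas is combining complementary modalities. As a minimal, hypothetical sketch of that idea (not the group's actual models), the toy PyTorch classifier below encodes pre-extracted visual, audio, and text features separately and fuses them by simple concatenation for emotion recognition; every feature dimension and the four-class emotion label set are illustrative assumptions.

    import torch
    import torch.nn as nn

    class LateFusionEmotionClassifier(nn.Module):
        """Toy late-fusion model: one encoder per modality, then concatenate."""
        def __init__(self, visual_dim=2048, audio_dim=128, text_dim=768,
                     hidden_dim=256, num_emotions=4):  # illustrative sizes
            super().__init__()
            # One small encoder per modality projects its features to a shared size.
            self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.ReLU())
            self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden_dim), nn.ReLU())
            self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
            # The classifier sees the fused (concatenated) representation.
            self.classifier = nn.Linear(3 * hidden_dim, num_emotions)

        def forward(self, visual, audio, text):
            fused = torch.cat([self.visual_enc(visual),
                               self.audio_enc(audio),
                               self.text_enc(text)], dim=-1)
            return self.classifier(fused)

    # Random stand-in features for a batch of 2 video clips.
    model = LateFusionEmotionClassifier()
    logits = model(torch.randn(2, 2048), torch.randn(2, 128), torch.randn(2, 768))
    print(logits.shape)  # torch.Size([2, 4])

Late fusion by concatenation is only the simplest design choice; attention-based and early-fusion variants combine the same per-modality ingredients differently.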

Awards

  • The End-of-End-to-End: A Video Understanding Pentathlon @ CVPR 2020 (Rank 2nd)
  • Outstanding Method Award in VATEX Video Captioning Challenge @ ICCV 2019
  • 2019 Zhijiang Cup Global Artificial Intelligence Competition, Video Content Captioning Track (Rank 1st)
  • CVPR 2019 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • 2019 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
  • 2019 Audio-Visual Emotion Challenge @ ACM Multimedia 2019 (Winner)
  • CVPR 2018 ActivityNet Large Scale Activity Recognition Challenge (ANET) Temporal Captioning Task (Winner)
  • 2018 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
  • 2017 TRECVID (Video to Text Description) Grand Challenge (Rank 1st)
  • Best Grand Challenge Paper Award at ACM Multimedia 2017
  • 2017 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2017 Audio-Visual Emotion Challenge (Rank 1st)
  • 2016 ACM Multimedia (Video to Language) Grand Challenge (Rank 1st)
  • 2016 Audio-Visual Emotion Challenge (AVEC) (Rank 2nd)
  • 2016 MediaEval Movie Emotion Impact Challenge (Rank 1st)
  • 2016 Chinese Multimodal Emotion Challenge (MEC) (Rank 2nd)
  • 2016 NLPCC Chinese Weibo Stance Detection (Rank 1st)
  • "Spoken English Assistant" system won the second place price in IBM Bluemix computing contest.
  • 2015 ImageCLEF (Image Sentence Generation) Evaluation (Rank 1st)