Journal and Conference

  2020

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs
Shizhe Chen, Qin Jin, Peng Wang, Qi Wu
CVPR, 2020.

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning
Shizhe Chen, Yida Zhao, Qin Jin, Qi Wu
CVPR, 2020.

Better Captioning With Sequence-Level Exploration
Jia Chen, Qin Jin
CVPR, 2020.

Skeleton-based Interactive Graph Network for Human Object Interaction Detection
Sipeng Zheng, Shizhe Chen, Qin Jin
ICME, 2020.

Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
Jingjun Liang, Ruichen Li, Qin Jin
ACM Multimedia, 2020.

ICECAP: Information Concentrated Entity-aware Image Captioning
Anwen Hu, Shizhe Chen, Qin Jin
ACM Multimedia, 2020.

VideoIC: A Video Interactive Comments Dataset and Multimodal Multitask Learning for Comments Generation
Weiying Wang, Jieting Chen, Qin Jin
ACM Multimedia, 2020.

  2019

Unsupervised Bilingual Lexicon Induction from Mono-lingual Multimodal Data
Shizhe Chen, Qin Jin, Alexander Hauptmann
AAAI, 2019.

Cross-culture Multimodal Emotion Recognition with Adversarial Learning
Jingjun Liang, Shizhe Chen, Jinming Zhao, Qin Jin, Haibo Liu, Li Lu
ICASSP, 2019.

ActivityNet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Video
Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann
CVPR ActivityNet Large Scale Activity Recognition Challenge, 2019.

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots
Shizhe Chen, Qin Jin, Jianlong Fu
IJCAI, 2019.

Generating Video Descriptions With Latent Topic Guidance
Shizhe Chen, Qin Jin, Jia Chen, Alexander G. Hauptmann
IEEE Transactions on Multimedia (TMM), 2019, 21(9).

Speech Emotion Recognition in Dyadic Dialogues
Jinming Zhao, Shizhe Chen, Jingjun Liang, Qin Jin
INTERSPEECH, 2019.

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards
Yuqing Song, Shizhe Chen, Qin Jin
ACM Multimedia, 2019.

Visual Relation Detection with Multi-Level Attention
Sipeng Zheng, Shizhe Chen, Qin Jin
ACM Multimedia, 2019.

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences
Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou
ACM Multimedia, 2019.

Relation Understanding in Videos
Sipeng Zheng, Xiangyu Chen, Shizhe Chen, Qin Jin
ACM Multimedia, Grand Challenge: Relation Understanding in Videos, 2019.

Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions
Jinming Zhao, Ruichen Li, Jingjun Liang, Qin Jin
AVEC, 2019.

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu
ICCV, VATEX Video Captioning Challenge 2019.

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension
Weiying Wang, Yongcheng Wang, Shizhe Chen, Qin Jin
EMNLP, 2019.

RUC_AIM3 at TRECVID 2019: Video to Text
Yuqing Song, Yida Zhao, Shizhe Chen, Qin Jin
NIST TRECVID, 2019.

Semi-supervised Multimodal Emotion Recognition With Improved Wasserstein GANs
Jingjun Liang, Shizhe Chen, Qin Jin
APSIPA ASC, 2019.

  2018

RUC+CMU: System Report for Dense Captioning Events in Videos
Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Alexander Hauptmann
CVPR ActivityNet Large Scale Activity Recognition Challenge, 2018.

Class-aware Self-Attention for Audio Event Recognition
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
ACM International Conference on Multimedia Retrieval (ICMR), 2018. (Best Paper Runner-up)

Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions
Jinming Zhao, Shizhe Chen, Qin Jin
Pacific-Rim Conference on Multimedia (PCM), 2018.

iMakeup: Makeup Instructional Video Dataset for Fine-grained Dense Video Captioning
Xiaozhu Lin, Qin Jin, Shizhe Chen, Yuqing Song, Yida Zhao
Pacific-Rim Conference on Multimedia (PCM), 2018.

Multi-modal Multi-cultural Dimensional Continues Emotion Recognition in Dyadic Interactions
Jinming Zhao, Ruichen Li, Shizhe Chen, Qin Jin
ACM Multimedia Audio-Visual Emotion Challenge (AVEC) Workshop, 2018.

  2017

Video Captioning with Guidance of Multimodal Latent Topics
Shizhe Chen, Jia Chen, Qin Jin, Alexander Hauptmann
ACM Multimedia, 2017.

Knowing Yourself: Improving Video Caption via In-depth Recap
Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann
ACM Multimedia, 2017. (Best Grand Challenge Paper)

Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition
Shizhe Chen, Qin Jin, Jinming Zhao and Shuai Wang
ACM Multimedia Audio-Visual Emotion Challenge (AVEC) Workshop, 2017.

Generating Video Descriptions with Topic Guidance
Shizhe Chen, Jia Chen, Qin Jin
ACM International Conference on Multimedia Retrieval (ICMR), 2017.

Emotion Recognition with Multimodal Features and Temporal Models
Shuai Wang, Wenxuan Wang, Jinming Zhao, Shizhe Chen, Qin Jin, Shilei Zhang, Yong Qin
ACM International Conference on Multimodal Interaction (ICMI), 2017.

Facial Action Units Detection with Multi-Features and -AUs Fusion
Xinrui Li, Shizhe Chen, and Qin Jin
Automatic Face & Gesture Recognition (FGR), 2017.

  2016

Boosting Recommendation in Unexplored Categories by User Price Preference
Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
ACM Transactions on Information Systems (TOIS), 2016.

Video Emotion Recognition in the Wild Based on Fusion of Multimodal Features
Shizhe Chen, Xinrui Li, Qin Jin, Shilei Zhang, Yong Qin
International Conference on Multimodal Interaction (ICMI), 2016.

Describing Videos using Multi-modal Fusion
Qin Jin, Jia Chen, Shizhe Chen, Yifan Xiong
ACM Multimedia, 2016.

Semantic Image Profiling for Historic Events: Linking Images to Phrases
Jia Chen, Qin Jin, Yifan Xiong
ACM Multimedia, 2016.

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction
Shizhe Chen, Qin Jin
ACM Multimedia, 2016.

History Rhyme: Searching Historic Events by Multimedia Knowledge
Yifan Xiong, Jia Chen, Qin Jin, Chao Zhang
ACM Multimedia, 2016.

Detecting Violence in Video using Subclasses
Xirong Li, Yujia Huo, Qin Jin, Jieping Xu
ACM Multimedia, 2016.

Generating Natural Video Descriptions via Multimodal Processing
Qin Jin, Junwei Liang, Xiaozhu Lin
INTERSPEECH, 2016.

Improving Image Captioning by Concept-based Sentence Reranking
Xirong Li, Qin Jin
Pacific-Rim Conference on Multimedia (PCM), 2016. (Best Paper Runner-up)

Video Description Generation using Audio and Visual Cues
Qin Jin, Junwei Liang
International Conference on Multimedia Retrieval (ICMR), 2016.

  2015

Exploitation and Exploration Balanced Hierarchical Summary for Landmark Images
Jia Chen, Qin Jin, Shenghua Bao, Junfeng Ye, Zhong Su, Shimin Chen, Yong Yu
IEEE Transactions on Multimedia (TMM), 2015, 17(10): 1773-1786.

Lead Curve Detection in Drawings with Complex Cross-Points
Jia Chen, Min Li, Qin Jin, Yongzhe Zhang, Shenghua Bao, Zhong Su, Yong Yu
Neurocomputing, 2015, 168: 35-46.

Image Profiling for History Events on the Fly
Jia Chen, Qin Jin, Yong Yu, Alexander G. Hauptmann
ACM Multimedia, 2015 (MM’15).

Persistent B+-Trees in Non-Volatile Main Memory
Shimin Chen and Qin Jin
VLDB, Hawaii, USA, 2015 (VLDB’15).

Semantic Concept Annotation for User Generated Videos Using Soundtracks
Qin Jin, Junwei Liang, Xixi He, Gang Yang, Jieping Xu, Xirong Li
International Conference on Multimedia Retrieval, 2015 (ICMR’15).

Speech Emotion Recognition With Acoustic And Lexical Features
Qin Jin, Chengxin Li, Shizhe Chen, Huimin Wu
In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, 2015 (ICASSP’15).

Detecting Semantic Concepts In Consumer Videos Using Audio
Junwei Liang, Qin Jin, Xixi He, Gang Yang, Jieping Xu, Xirong Li
In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, 2015 (ICASSP’15).

  2014

Does Product Recommendation Meet its Waterloo in Unexplored Categories? No, Price Comes to Help
Jia Chen, Qin Jin, Shiwan Zhao, Shenghua Bao, Li Zhang, Zhong Su, Yong Yu
SIGIR 2014 (SIGIR’14).

Semantic Concept Annotation of Consumer Videos at Frame-level Using Audio
Junwei Liang, Qin Jin, Xixi He, Xirong Li, Gang Yang, Jieping Xu
Pacific-Rim Conference on Multimedia, 2014 (PCM’14).

Speech Emotion Classification using Acoustic Features
Shizhe Chen, Qin Jin, Xirong Li, Gang Yang, Jieping Xu
International Symposium on Chinese Spoken Language Processing, 2014 (ISCSLP’14).

  2013

Tell Me What Happened Here in History
Jia Chen, Qin Jin, Weipeng Zhang, Shenghua Bao, Zhong Su, Yong Yu
ACM International Conference on Multimedia, 2013 (MM’13).

  2012

Event-based Video Retrieval Using Audio
Qin Jin, Peter Schulam, Shourabh Rawat, Susanne Burger, Duo Ding and Florian Metze
Annual Conf. of the International Speech Communication Association, Portland, USA, 7th September, 2012 (INTERSPEECH’12).

  2011

Harmonic Structure Transform for Speaker Recognition
Kornel Laskowski and Qin Jin
Annual Conf. of the International Speech Communication Association, Florence, Italy, 28 August 2011 (INTERSPEECH’11).

Overview of Front-end Features for Robust Speaker Recognition
Qin Jin, Thomas Fang Zheng
The Third APSIPA Annual Summit and Conference (APSIPA ASC 2011).

Analysis of Dialectal Influence in Pan-Arabic ASR
Udhay Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaff, Tanja Schultz
Annual Conf. of the International Speech Communication Association, Florence, Italy, 28 August 2011 (INTERSPEECH’11).

Investigation of Cross-show Speaker Diarization
Qian Yang, Qin Jin, Tanja Schultz
Annual Conf. of the International Speech Communication Association, Florence, Italy, 28 August 2011 (INTERSPEECH’11).

  2010

Speaker Identification with Distant Microphone Speech
Qin Jin, Runxin Li, Qian Yang, Kornel Laskowski, Tanja Schultz
IEEE International Conference on Acoustics, Speech, and Signal Processing, Dallas, TX, USA, 14-19 April 2010 (ICASSP’10).

The 2010 CMU GALE Speech-to-Text System
Florian Metze, Roger Hsiao, Qin Jin, Udhay Nallasamy, Tanja Schultz
Annual Conference of the International Speech Communication Association, Makuhari, Japan, 26 September 2010 (INTERSPEECH’10).

  2009

Speaker De-identification via Voice Transformation
Qin Jin, Arthur Toth, Tanja Schultz, and Alan Black
In Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, Italy, 2009 (ASRU’09).

Speaker Identification using Warped MVDR Cepstral Features
Matthias Wolfel, Qian Yang, Qin Jin, Tanja Schultz
Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, 6 September 2009 (INTERSPEECH’09).

Improving Speaker Segmentation via Speaker Identification and Text Segmentation
Runxin Li, Qin Jin, Tanja Schultz
Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, 6 September 2009 (INTERSPEECH’09).

Voice Convergin: Speaker De-Identification by Voice Transformation
Qin Jin, Arthur R. Toth, Tanja Schultz, Alan W Black
In Proceedings of the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 19-24 April 2009, pp. 3909-3912 (ICASSP’09).

Modeling Instantaneous Intonation for Speaker Identification Using the Fundamental Frequency Variation Spectrum
Kornel Laskowski and Qin Jin
IEEE International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 19-24 April 2009, pp. 4541-4544 (ICASSP’09).

Detecting Bandlimited Audio in Broadcast Television Shows
Mark C. Fuhs, Qin Jin, and Tanja Schultz
In Proceedings of the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 19-24 April 2009, pp. 4589-4592 (ICASSP’09).

Robust Far-Field Speaker Identification Under Mismatched Conditions
Qin Jin, Tanja Schultz
In Proceedings of INTERSPEECH 2008, Brisbane, Australia, 22 September 2008 (INTERSPEECH’08).

  Before 2009

Is Voice Transformation a Threat to Speaker Identification?
Qin Jin, Arthur Toth, Alan Black, and Tanja Schultz
IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, April 2008 (ICASSP’08).

The CMU-InterACT 2008 Mandarin Transcription System
Roger Hsiao, Mark Fuhs, Yik-Cheung Tam, Qin Jin, Tanja Schultz
In Proceedings of INTERSPEECH 2008, Brisbane, Australia, 22 September 2008 (INTERSPEECH’08).

Far-field Speaker Recognition
Qin Jin, Tanja Schultz, and Alex Waibel
IEEE Transactions on Audio, Speech, and Language Processing (TASL), 2007, 15(7): 2023-2032.

Whispering Speaker Identification
Qin Jin, Szu-Chen Stan Jou, and Tanja Schultz
International Conference on Multimedia & Expo, Beijing, P.R.China, July 2007 (ICME’07).

Far-field Speaker Recognition
Qin Jin, Yue Pan, and Tanja Schultz
International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, 2006 (ICASSP’06).

Combining Cross-Stream and Time Dimensions in Phonetic Speaker Recognition
Qin Jin, Jiri Navratil, Douglas Reynolds, Joseph Campbell, Walter Andrews, and Joy Abramson
International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, 2003 (ICASSP’03).



Book Chapters and Workshops

Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks
Shizhe Chen, Qin Jin
ACM Multimedia Audio/Visual Emotion Challenge and Workshop 2015 (AVEC’15).

RUCMM at MediaEval 2015 Affective Impact of Movies Task: Fusion of Audio and Visual Cues
Qin Jin, Xirong Li, Haibing Cao, Yujia Huo, Shuai Liao, Gang Yang, Jieping Xu
MediaEval Workshop 2015, Wurzen, Germany.

RUC-Tencent at ImageCLEF 2015: Concept Detection, Localization and Sentence Generation
Xirong Li, Qin Jin, Shuai Liao, Junwei Liang, Xixi He, Yujia Huo, WeiYu Lan, Bin Xiao, Yanxiong Lu, Jieping Xu
CLEF working notes, 2015.

Towards the State of the Art in Automatic Mandarin Broadcast Speech Transcription
Stephen M. Chu, Hong-Kwang Kuo, Lidia Mangu, Qin Shi, Shilei Zhang, Yong Qin, Qin Jin, Ian Lane and Yik-Cheung Tam
Handbook of Natural Language Processing and Machine Translation, Chapter 3.5.2, pp. 487-495, Springer, ISBN 978-1-4419-7712-0, 2011.

CMU-InterACT Mandarin Transcription System for GALE
Roger Hsiao, Mark Fuhs, Yik-Cheung Tam, Qin Jin, Ian Lane, and Tanja Schultz
Handbook of Natural Language Processing and Machine Translation, Chapter 3.5.3, pp. 496-504, Springer, ISBN 978-1-4419-7712-0, 2011.

CMU-InterACT Arabic Speech Recognition System for GALE
Udhyakumar Nallasamy, Ian Lane, Mark Fuhs, Mohamed Noamany, Yik-Cheung Tam, Qin Jin and Tanja Schultz
Handbook of Natural Language Processing and Machine Translation, Chapter 3.6.4, pp. 535-540, Springer, ISBN 978-1-4419-7712-0, 2011.

Modeling Prosody for Speaker Recognition: Why Estimating Pitch May Be a Red Herring
Kornel Laskowski and Qin Jin
In Proc. of the 7th ISCA Speaker and Language Recognition Workshop (Odyssey 2010), Brno, Czech Republic, 28 June - 1 July 2010.

Compensation Approaches for Far-field Speaker Identification
Qin Jin, Kshitiz Kumar, Tanja Schultz, Richard M. Stern
NIST SRE Workshop 2008, Montreal, Canada, 17 June 2008.

ISL Person Identification Systems in the CLEAR Evaluations
Hazim K. Ekenel and Qin Jin
In Multimodal Technologies for Perception of Humans, Lecture Notes in Computer Science, Springer Berlin / Heidelberg, May 18, 2007.