d221: Visual Question Answering (Sources, Datasets, PDFs)

Visual Question Answering (VQA): Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.

Additional materials:

  • Virginia Tech / MSR [Web][PDF] [Paper]
    • Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question Answering, CVPR 2015 SUNw: Scene Understanding Workshop
      • “Given an image and a natural language question about the image, the task is to provide an accurate natural language answer”


  • MPI / Berkeley [Web][PDF] [Paper]
    • Mateusz Malinowski, Marcus Rohrbach, Mario Fritz, Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, arXiv:1505.01121
      • “Visual understanding by machines progresses rapidly, our primary focus lies on building machines that answer questions about images. Moreover, we also explore different ways of benchmarking the machines on this complex and ambiguous task”


  • Toronto [PDF][Dataset] [Paper]
    • Mengye Ren, Ryan Kiros, Richard Zemel, Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 / ICML 2015 deep learning workshop
      • “This work aims to address the problem of image-based question-answering (QA) with new models and datasets. We propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images”


  • Baidu / UCLA [PDF][Dataset] [Paper]
    • Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu, Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, arXiv:1505.05612
      • “mQA model is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word”


  • POSTECH [PDF] [Project Page] [Paper]
    • Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han, Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, arXiv:1511.05765
      • “Image question answering (ImageQA) by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions”


  • CMU / Microsoft Research [PDF] [Paper]
    • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, Stacked Attention Networks for Image Question Answering, arXiv:1511.02274
      • “Stacked attention networks (SANs) use semantic representation of a question as query to search for the regions in an image that are related to the answer”


  • MetaMind [PDF] [Paper]
    • Caiming Xiong, Stephen Merity, Richard Socher, Dynamic Memory Networks for Visual and Textual Question Answering, arXiv:1603.01417
      • “Dynamic memory network DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision”
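
Illustration (not taken from any of the papers above): the stacked attention entry describes using a question representation as a query to search for relevant image regions. The sketch below shows that idea in its simplest form, using only the Python standard library. It assumes plain dot-product scoring and toy hand-written vectors; the SAN paper learns its scoring function, so the function names `attend` and `stacked_attend` and the scoring rule here are hypothetical simplifications.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query_vec, region_vecs):
    """One attention hop: weight image regions by similarity to the query.

    Assumption: dot-product scoring stands in for the learned scoring
    network used in the actual paper.
    """
    scores = [sum(q * r for q, r in zip(query_vec, region))
              for region in region_vecs]
    weights = softmax(scores)
    dim = len(query_vec)
    # Weighted sum of region vectors: the "attended" visual summary.
    attended = [sum(w * region[i] for w, region in zip(weights, region_vecs))
                for i in range(dim)]
    # Refined query = previous query + attended visual vector.
    refined = [q + a for q, a in zip(query_vec, attended)]
    return weights, refined


def stacked_attend(question_vec, region_vecs, hops=2):
    """Repeat the attention hop, feeding each refined query to the next hop."""
    query = question_vec
    weights = []
    for _ in range(hops):
        weights, query = attend(query, region_vecs)
    return weights, query


# Toy example: three image regions, the first aligned with the question.
question = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
one_hop, _ = stacked_attend(question, regions, hops=1)
two_hop, _ = stacked_attend(question, regions, hops=2)
```

In this toy setup the second hop places more weight on the first region than the first hop does, which is the intuition behind stacking attention layers: each hop sharpens the focus on the regions most related to the question.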


Source: https://github.com/kjw0612/awesome-deep-vision