- Children’s Book Test (CBT), designed to measure directly how well language models can exploit wider linguistic context.
- Movie Dialog dataset (MDD), designed to measure how well models can perform at goal-oriented and non-goal-oriented dialog centered around the topic of movies (question answering, recommendation and discussion).
- The MovieQA dataset, designed to test the ability of models to answer questions by reading documents directly, and to compare this with using traditional KBs in the same setting.
- Dialog-based Language Learning dataset, designed to measure how well models can perform at learning as a student given a teacher’s textual responses to the student’s answer.
- SimpleQuestions, a dataset collected for research in automatic question answering with human-generated questions.