AudioSet Audio Events Dataset: https://research.googleblog.com/2017/03/announcing-audioset-dataset-for-audio.html
AudioSet, a large-scale dataset of manually annotated audio events: https://research.google.com/audioset/
Systems that can recognize sounds familiar to human listeners have a wide range of applications, from adding sound-effect information to automatic video captions to searching videos for specific audio events. Building deep learning systems to do this relies heavily both on large amounts of compute (often highly parallel GPUs) and, perhaps more importantly, on significant amounts of accurately labeled training data. However, research in environmental sound recognition has been limited by the public datasets currently available.
A sound vocabulary and audio dataset:
AudioSet consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. The ontology is specified as a hierarchical graph of event categories, covering a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.
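The hierarchical graph of event categories is distributed as a JSON file in the ontology repository. A minimal sketch of loading it and walking a subtree, assuming each node is a dict with at least `id`, `name`, and `child_ids` fields (the exact field names are an assumption here, not guaranteed by this document):

```python
import json

def load_ontology(path):
    """Load an AudioSet-style ontology.json, assumed to be a flat JSON
    list of nodes, each with 'id', 'name', and 'child_ids' fields.
    Returns a dict mapping node id -> node."""
    with open(path) as f:
        return {node["id"]: node for node in json.load(f)}

def descendants(nodes, node_id):
    """Yield the name of node_id and of every class below it,
    depth-first, following the child_ids links."""
    node = nodes[node_id]
    yield node["name"]
    for child_id in node.get("child_ids", []):
        yield from descendants(nodes, child_id)
```

Because the ontology is a graph rather than a strict tree, a class can appear under more than one parent; a production traversal might deduplicate visited ids.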
Based on 2,084,320 ten-second clips drawn from YouTube videos, labeled with 527 of the ontology's classes:
- Discussion: https://www.reddit.com/r/MachineLearning/comments/62g3xw/n_announcing_audioset_a_dataset_for_audio_event/
- AudioSet download: https://research.google.com/audioset/download.html
- Evaluation – eval_segments.csv – 20,383 segments from distinct videos, providing at least 59 examples for each of the 527 sound classes that are used. Because of label co-occurrence, many classes have more examples.
- Balanced train – balanced_train_segments.csv – 22,176 segments from distinct videos chosen with the same criteria: providing at least 59 examples per class with the fewest number of total segments.
- Unbalanced train – unbalanced_train_segments.csv – 2,042,985 segments from distinct videos, representing the remainder of the dataset.
- AudioSet Ontology: https://github.com/audioset/ontology | https://github.com/audioset/ontology/blob/master/ontology.json
- Related paper: “Audio Set: An ontology and human-labeled dataset for audio events” – https://research.google.com/pubs/pub45857.html:
Abstract: Audio event recognition, the human-like ability to identify and relate sounds from audio, is a nascent problem in machine perception. Comparable problems such as object detection in images have reaped enormous benefits from comprehensive datasets — principally ImageNet. This paper describes the creation of Audio Set, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research. Using a carefully structured hierarchical ontology of 635 audio classes guided by the literature and manual curation, we collect data from human labelers to probe the presence of specific audio classes in 10 second segments of YouTube videos. Segments are proposed for labeling using searches based on metadata, context (e.g., links), and content analysis. The result is a dataset of unprecedented breadth and size that will, we hope, substantially stimulate the development of high-performance audio event recognizers.
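The three segment CSVs listed above share one row format. A hedged parsing sketch, assuming the layout documented on the download page (a few `#` comment lines, then rows of `YTID, start_seconds, end_seconds, "label1,label2,..."` with the label list quoted because it contains commas):

```python
import csv

def load_segments(path):
    """Parse an AudioSet segments CSV (e.g. balanced_train_segments.csv).

    Assumed row format: YTID, start_seconds, end_seconds, positive_labels,
    where positive_labels is a quoted, comma-separated list of class ids.
    Lines beginning with '#' are treated as header comments and skipped.
    """
    segments = []
    with open(path, newline="") as f:
        # skipinitialspace lets csv recognize the quoted label field
        # even though each comma is followed by a space.
        for row in csv.reader(f, skipinitialspace=True):
            if not row or row[0].startswith("#"):
                continue
            ytid, start, end, labels = row
            segments.append({
                "ytid": ytid,
                "start": float(start),
                "end": float(end),
                "labels": labels.split(","),
            })
    return segments
```

The clips themselves are not distributed; each row identifies a YouTube video id plus a 10-second window, so fetching audio is a separate step.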