As you get familiar with machine learning and neural networks, you will want to use datasets that have been provided by academia, industry, government, and even other users of Caffe2. Many of these datasets have already been used to train models with Caffe and/or Caffe2, so you can jump right in and start using these pre-trained models. You can also fine-tune or even do “mashups” with pre-trained models by adding additional data, models, parameters, or combinations thereof to train a new custom model for your experiments. If you think you’ve found something great, then don’t hesitate to share! This is an open source project and we really hope to foster innovation and collaboration.
For further info on datasets and how to prepare them, take a look at the Models and Datasets tutorial. You can also check out a Caffe2 Python tutorial that downloads the MNIST handwriting dataset, unzips it, and calls a Caffe2-provided binary that extracts, transforms, and loads (ETL) the data into a database of key-value pairs (KVPs); in this case it uses LevelDB to store the images. The tutorial goes on to show how the dataset is used to train a neural network that can identify handwritten digits. This tutorial is also available as a Jupyter notebook.
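To make the ETL step more concrete, here is a minimal sketch in plain Python of what turning MNIST files into key-value pairs could look like. It is not the Caffe2-provided binary and does not reproduce its record format: the function names are hypothetical, a plain `dict` stands in for LevelDB, and the only thing taken from MNIST itself is the IDX file layout (a big-endian header followed by raw `uint8` data).

```python
import struct

IMAGE_MAGIC = 2051  # magic number at the start of an MNIST IDX image file
LABEL_MAGIC = 2049  # magic number at the start of an MNIST IDX label file


def parse_idx_images(buf):
    """Split an unzipped IDX image file into one bytes object per image."""
    magic, count, rows, cols = struct.unpack(">IIII", buf[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError("not an IDX image file (magic=%d)" % magic)
    size = rows * cols
    return [buf[16 + i * size:16 + (i + 1) * size] for i in range(count)]


def parse_idx_labels(buf):
    """Return the labels of an unzipped IDX label file as a list of ints."""
    magic, count = struct.unpack(">II", buf[:8])
    if magic != LABEL_MAGIC:
        raise ValueError("not an IDX label file (magic=%d)" % magic)
    return list(buf[8:8 + count])


def to_key_value_pairs(images, labels):
    """Build a key -> (label, pixels) mapping.

    A dict is used here as a stand-in for a real key-value database.
    Keys are zero-padded so that they sort in insertion order in a
    store that orders keys lexicographically.
    """
    if len(images) != len(labels):
        raise ValueError("image and label counts differ")
    return {"%08d" % i: (label, pixels)
            for i, (pixels, label) in enumerate(zip(images, labels))}
```

With the real dataset you would read the unzipped image and label files as bytes, pass them to the two parsers, and write each resulting pair to your database of choice instead of a `dict`.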
You may also want to check out the pre-trained models at Caffe2’s Model Zoo! There you might find models that were trained on these datasets, open source project code to draw from, and dataset-specific best practices for training models.
|AN4: 948 training and 130 test utterances|
|BSDS (300/500): 12k labeled segmentations|images, segmentations|
|Celeb-A: 200k+ celebrity images, 10k+ identities|
|CIFAR-10: 60k tiny (32x32) tagged images|
|COCO: A large image dataset designed for object detection, segmentation, and caption generation.|
|CompCars: 136k+ car images & 1716 car models|
|Oxford 102 Flowers: 102 flower categories|images, segmentations|
|ImageNet: 14,197,122 images, 21841 synsets indexed|
|ImageNet ILSVRC: Competition datasets|
|LSUN Scenes: millions of indoor/outdoor building scenes|
|LSUN Room Layout: 4000 indoor scenes|
|MNIST: 60k handwriting training images, 10k test images|
|Multi-Salient-Object (MSO): 1224 tagged salient object images|
|OUI-Adience Face Image: 26,580 age- and gender-labeled images|
|PASCAL VOC 2012: 11,530 images with 27,450 ROI-annotated objects and 6,929 segmentations|
|PCAP: network captures of regular internet traffic and attack scenario traffic|
|Penn Treebank (PTB): statistical language modeling|
|UCF11/YouTube Action: 11 action categories (basketball shooting, biking/cycling, diving, golf swinging, horse back riding, soccer juggling, swinging, tennis swinging, trampoline jumping, volleyball spiking, and walking with a dog)|
|US Census: demographic data|
|VGG-Face: millions of faces|
|LibriSpeech: 1000 hours of free speech recognition training data|