Caffe2 - C++ API
A deep learning, cross platform ML framework
torch::data::DataLoaderBase< Dataset, Batch, BatchRequest > Class Template Reference (abstract)

Data Structures

struct  Job
 A Job is either a BatchRequest (new indices to fetch data at) or a QuitWorker object, to indicate the worker should shut down. More...
 
struct  QuitWorker
 
struct  Result
 The finished result of a job. More...
 
struct  Sequenced
 Simple mix-in to give something a sequence number. More...
 

Public Types

using BatchType = Batch
 
using BatchRequestType = BatchRequest
 

Public Member Functions

 DataLoaderBase (DataLoaderOptions options, std::unique_ptr< Dataset > main_thread_dataset=nullptr)
 Constructs a new DataLoader from a dataset to sample from and options to configure the DataLoader with; the sampling strategy is supplied by the concrete subclass. More...
 
Iterator< Batch > begin ()
 Returns an iterator into the DataLoader. More...
 
Iterator< Batch > end ()
 Returns a special "sentinel" iterator that compares equal with a non-sentinel iterator once the DataLoader is exhausted. More...
 
void join ()
 Joins the DataLoader's worker threads and drains internal queues. More...
 
const FullDataLoaderOptions & options () const noexcept
 Returns the options with which the DataLoader was configured.
 

Protected Member Functions

virtual optional< BatchRequestType > get_batch_request ()=0
 Subclass hook for getting the next batch request. More...
 
virtual void reset ()
 Resets the internal state of the DataLoader, optionally pre-fetching new jobs. More...
 
void prefetch (size_t requested_jobs)
 Schedules requested_jobs many new batches to be fetched. More...
 
void prefetch ()
 Schedules the maximum number of jobs (based on the max_jobs option).
 
optional< BatchType > next ()
 Returns the next batch of data, or an empty optional if the DataLoader is exhausted. More...
 
void worker_thread (Dataset &dataset)
 The function that worker threads run.
 
template<typename T >
void push_job (T value)
 Convenience method that calls shuttle_.push_job() with the next sequence number. More...
 
optional< Result > pop_result ()
 Convenience method that gets the next result from the sequencer.
 
std::unique_ptr< detail::sequencers::Sequencer< Result > > new_sequencer ()
 Convenience method that creates a new sequencer based on the enforce_ordering option. More...
 

Protected Attributes

const FullDataLoaderOptions options_
 The options the DataLoader was configured with.
 
std::unique_ptr< Dataset > main_thread_dataset_
 The dataset for the main thread, only has a value if the number of worker threads was configured as zero, meaning the main thread has to do all the work (synchronously). More...
 
size_t sequence_number_ = 0
 The sequence number for the next batch to be retrieved from the dataset. More...
 
std::vector< std::thread > workers_
 The worker threads, running the worker_thread() method.
 
detail::DataShuttle< Job, Result > shuttle_
 The DataShuttle which takes care of the life cycle of a job.
 
std::unique_ptr< detail::sequencers::Sequencer< Result > > sequencer_
 The Sequencer, which handles optional ordering of batches.
 
bool joined_ = false
 True if the DataLoader has joined its worker threads.
 

Detailed Description

template<typename Dataset, typename Batch, typename BatchRequest>
class torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >

Definition at line 27 of file base.h.

Constructor & Destructor Documentation

template<typename Dataset, typename Batch, typename BatchRequest>
torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::DataLoaderBase ( DataLoaderOptions options, std::unique_ptr< Dataset > main_thread_dataset = nullptr )
inline

Constructs a new DataLoader from a dataset to sample from and options to configure the DataLoader with; the sampling strategy is supplied by the concrete subclass.

Definition at line 35 of file base.h.

Member Function Documentation

template<typename Dataset, typename Batch, typename BatchRequest>
Iterator<Batch> torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::begin ( )
inline

Returns an iterator into the DataLoader.

The lifetime of the iterator is bound to the DataLoader. In C++ standards language, the category of the iterator is InputIterator. See https://en.cppreference.com/w/cpp/named_req/InputIterator for what this means. In short: you may increment the iterator and dereference it, but cannot go back, or step forward more than one position at a time. When the DataLoader is exhausted, it will compare equal with the special "sentinel" iterator returned by DataLoader::end(). Most of the time, you should only use range-for loops to loop over the DataLoader, but standard algorithms like std::copy(dataloader.begin(), dataloader.end(), output_iterator) are supported too.

Definition at line 57 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
Iterator<Batch> torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::end ( )
inline

Returns a special "sentinel" iterator that compares equal with a non-sentinel iterator once the DataLoader is exhausted.

Definition at line 69 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
virtual optional<BatchRequestType> torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::get_batch_request ( )
protected, pure virtual

Subclass hook for getting the next batch request.

The stateless case will ask the sampler for a new batch request (e.g. a vector of indices), while the stateful one will simply return the batch size.

template<typename Dataset, typename Batch, typename BatchRequest>
void torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::join ( )
inline

Joins the DataLoader's worker threads and drains internal queues.

This function may only be invoked from the main thread (in which the DataLoader lives).

Definition at line 77 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
std::unique_ptr<detail::sequencers::Sequencer<Result> > torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::new_sequencer ( )
inline, protected

Convenience method that creates a new sequencer based on the enforce_ordering option.

Definition at line 212 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
optional<BatchType> torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::next ( )
inline, protected

Returns the next batch of data, or an empty optional if the DataLoader is exhausted.

This operation will block until a batch is available if one is still expected.

Definition at line 165 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
void torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::prefetch ( size_t  requested_jobs)
inline, protected

Schedules requested_jobs many new batches to be fetched.

The actual number of jobs scheduled may be less if the DataLoader exhausts.

Definition at line 147 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
template<typename T >
void torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::push_job ( T  value)
inline, protected

Convenience method that calls shuttle_.push_job() with the next sequence number.

Definition at line 200 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
virtual void torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::reset ( )
inline, protected, virtual

Resets the internal state of the DataLoader, optionally pre-fetching new jobs.

Definition at line 138 of file base.h.

Field Documentation

template<typename Dataset, typename Batch, typename BatchRequest>
std::unique_ptr<Dataset> torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::main_thread_dataset_
protected

The dataset for the main thread, only has a value if the number of worker threads was configured as zero, meaning the main thread has to do all the work (synchronously).

NOTE: We really want this to occupy no inline storage when empty, hence unique_ptr (heap-allocated when present) rather than optional.

Definition at line 227 of file base.h.

template<typename Dataset, typename Batch, typename BatchRequest>
size_t torch::data::DataLoaderBase< Dataset, Batch, BatchRequest >::sequence_number_ = 0
protected

The sequence number for the next batch to be retrieved from the dataset.

Definition at line 231 of file base.h.


The documentation for this class was generated from the following file:
base.h