Caffe2 - Python API
A deep learning, cross platform ML framework
caffe2.python.checkpoint.Job Class Reference
Inheritance diagram for caffe2.python.checkpoint.Job:

Public Member Functions

def __init__ (self, init_group=None, epoch_group=None, download_group=None, exit_group=None, stop_conditions=None, nodes_to_checkpoint=None)
 
def nodes_to_checkpoint (self)
 
def compile (self, session_class)
 
def __enter__ (self)
 
def __exit__ (self, args)
 
def add_stop_condition (self, output)
 

Public Attributes

 init_group
 
 epoch_group
 
 download_group
 
 exit_group
 
 stop_conditions
 

Detailed Description

A Job defines four TaskGroups: the `init_group`, the `epoch_group`, the
`download_group` and the `exit_group`, which will be run by a JobRunner.

The `init_group` will be run only once at startup. Its role is to
initialize globally persistent blobs such as model weights, accumulators
and data file lists.

The `epoch_group` will be run in a loop after the `init_group`. The loop
exits when any of the stop signals added with `add_stop_condition` is True
at the end of an epoch.

The `download_group` will be run only once, after all executions of the
`epoch_group` finish. Its role is to collect the scattered parameters
back after distributed training.

The `exit_group` will be run only once at the very end of the job; its
role is to save the results of the training.
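Taken together, a runner executes the four groups in a fixed order: init once, epoch in a loop until a stop condition fires, then download and exit once each. The following is a minimal plain-Python sketch of that order; the function names and callable-based interface are illustrative only (a real JobRunner executes TaskGroups, not plain callables):

```python
def run_job(init, epoch, download, exit_, stop_conditions):
    """Sketch of the order in which a Job's four groups are executed."""
    init()                                    # init_group: once, at startup
    while True:
        epoch()                               # epoch_group: run in a loop
        # Exit when ANY stop condition is True at the end of an epoch.
        if any(cond() for cond in stop_conditions):
            break
    download()                                # download_group: once, after training
    exit_()                                   # exit_group: once, at the very end


# Usage sketch: stop after three epochs and record the call order.
calls = []
counter = {'n': 0}

def one_epoch():
    calls.append('epoch')
    counter['n'] += 1

run_job(
    init=lambda: calls.append('init'),
    epoch=one_epoch,
    download=lambda: calls.append('download'),
    exit_=lambda: calls.append('exit'),
    stop_conditions=[lambda: counter['n'] >= 3],
)
# calls == ['init', 'epoch', 'epoch', 'epoch', 'download', 'exit']
```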

Jobs are context-driven, so that Tasks can be added to the active Job
without having to explicitly pass the job object around.

Example of usage:

def build_reader(partitions):
    with Job.current().init_group:
        reader = HiveReader(init_reader, ..., partitions)
        Task(step=init_reader)
    with Job.current().epoch_group:
        limited_reader = ReaderWithLimit(reader, num_iter=10000)
        data_queue = pipe(limited_reader, num_threads=8)
        Job.current().add_stop_condition(limited_reader.data_finished())
    return data_queue

def build_hogwild_trainer(reader, model):
    with Job.current().init_group:
        Task(step=model.param_init_net)
    with Job.current().epoch_group:
        pipe(reader, processor=model, num_threads=8)
    with Job.current().exit_group:
        Task(step=model.save_model_net)

with Job() as job:
    reader = build_reader(partitions)
    model = build_model(params)
    build_hogwild_trainer(reader, model)

Definition at line 27 of file checkpoint.py.


The documentation for this class was generated from the following file: checkpoint.py