Public Attributes

    init_group
    epoch_group
    download_group
    exit_group
    stop_conditions
A Job defines four TaskGroups: the `init_group`, the `epoch_group`, the `download_group` and the `exit_group`, which will be run by a JobRunner.

The `init_group` will be run only once, at startup. Its role is to initialize globally persistent blobs such as model weights, accumulators and data file lists.

The `epoch_group` will be run in a loop after `init_group`. The loop exits when any of the stop conditions added with `add_stop_condition` is True at the end of an epoch.

The `download_group` will be run only once, after all executions of `epoch_group` finish. Its role is to collect the scattered, distributed parameters back after training.

The `exit_group` will be run only once at the very end of the job. Its role is to save the results of training.

Jobs are context-driven, so Tasks can be added to the active Job without having to explicitly pass the job object around.

Example of usage:

    def build_reader(partitions):
        with Job.current().init_group:
            reader = HiveReader(init_reader, ..., partitions)
            Task(step=init_reader)
        with Job.current().epoch_group:
            limited_reader = ReaderWithLimit(reader, num_iter=10000)
            data_queue = pipe(limited_reader, num_threads=8)
            Job.current().add_stop_condition(limited_reader.data_finished())
        return data_queue

    def build_hogwild_trainer(reader, model):
        with Job.current().init_group:
            Task(step=model.param_init_net)
        with Job.current().epoch_group:
            pipe(reader, processor=model, num_threads=8)
        with Job.current().exit_group:
            Task(step=model.save_model_net)

    with Job() as job:
        reader = build_reader(partitions)
        model = build_model(params)
        build_hogwild_trainer(reader, model)
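To make the lifecycle of the four groups concrete, the sketch below shows the control flow a JobRunner drives. It is a minimal illustration of the semantics described above, not the actual JobRunner implementation: `run_job`, `session.run` and `fetch_blob` are hypothetical stand-ins for whatever execution and blob-fetching API the runner actually uses.

    # Sketch of the JobRunner control flow, assuming a hypothetical
    # session.run(task_group) that executes a TaskGroup to completion and a
    # hypothetical fetch_blob(blob) that reads a stop-condition blob's value.
    def run_job(session, job):
        # init_group runs once: initialize weights, accumulators, file lists.
        session.run(job.init_group)
        while True:
            # epoch_group runs repeatedly; each pass is one epoch.
            session.run(job.epoch_group)
            # Exit the loop when any condition registered via
            # add_stop_condition evaluates to True at the end of an epoch.
            if any(fetch_blob(cond) for cond in job.stop_conditions):
                break
        # download_group runs once: gather the scattered parameters back.
        session.run(job.download_group)
        # exit_group runs once: save the results of training.
        session.run(job.exit_group)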
Definition at line 27 of file checkpoint.py.