Public Member Functions
def | __init__ (self, db_prefix, node_name, db_type, metadata_handler=None) |
def | init (self, nodes=None, retrieve_from_epoch=None, path_prefix=None, path_type=None) |
def | blob_list (self) |
def | collect_checkpoint_stats (self, stats) |
def | load (self, epoch, path_prefix=None, path_type=None) |
def | load_blobs_from_checkpoint (self, blob_names, epoch) |
def | check_db_exists (self, epoch) |
def | report_checkpoint_stats (self, action_name) |
def | save (self, epoch) |
def | write_checkpoint_metadata (self, epoch) |
def | get_resume_from_epoch_id (self, user_epoch=None) |
def | set_params (self, nodes, path_prefix=None, path_type=None) |
def | cp_accessible (self, epoch=None) |
Static Public Attributes
string | BLOB_NAMES = "blob_names" |
Controls saving and loading of workspaces on every epoch boundary of a job. If a CheckpointManager instance is passed to JobRunner, then JobRunner will call `init`, `load` and `save` at different moments in between epoch runs.
Args:
  db_prefix: The prefix used to construct the full db name. Since `absolute_path` is set to True, this will be used as db_name in SaveOp.
  node_name: Name of the node where this checkpoint_manager is used.
  db_type: Type of database to use for storing the checkpoint.
  metadata_handler: An optional object capable of reading/writing checkpoint info in the storage of choice.
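The call order described above can be sketched in plain Python. This is a simplified model under stated assumptions, not JobRunner's actual code: `RecordingManager` and `run_job` are hypothetical names, Tasks are replaced by recorded tuples, and the assumption that a fresh job saves an epoch-0 checkpoint while a resumed job calls `load` instead is inferred from the `save` and `load` descriptions on this page.

```python
class RecordingManager:
    """Hypothetical stand-in that records calls instead of building Tasks."""

    def __init__(self):
        self.calls = []

    def init(self, nodes=None, retrieve_from_epoch=None):
        self.calls.append(("init", retrieve_from_epoch))

    def load(self, epoch):
        self.calls.append(("load", epoch))

    def save(self, epoch):
        self.calls.append(("save", epoch))


def run_job(manager, num_epochs, resume_from_epoch=None):
    """Simplified model of a runner driving the manager between epochs."""
    manager.init(retrieve_from_epoch=resume_from_epoch)
    if resume_from_epoch is None:
        # Fresh start: assumed checkpoint taken once after init_group.
        manager.save(0)
        first_epoch = 1
    else:
        # Resume: restore state from the given epoch, then continue after it.
        manager.load(resume_from_epoch)
        first_epoch = resume_from_epoch + 1
    for epoch in range(first_epoch, num_epochs + 1):
        # ... the epoch itself would run here ...
        manager.save(epoch)
    return manager.calls
```

In this model a job resumed from epoch 1 never re-saves epoch 1; it loads it and then saves epochs 2 onward.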
Definition at line 149 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.collect_checkpoint_stats (self, stats)
Add one checkpoint's stats to `stats`.
Args:
  stats: A dict of checkpoint stats that will be reported.
Definition at line 255 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.cp_accessible (self, epoch=None)
Returns True if the checkpoint data is accessible.
Args:
  epoch: An integer. The epoch of the checkpoint. If None, checks whether the checkpoint directory is accessible.
Returns:
  is_cp_accessible: A boolean. True if the checkpoint data is accessible.
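The two cases (a specific epoch versus the directory as a whole) might look like the sketch below. Everything here is an assumption for illustration: `LocalDirHandler` is a hypothetical metadata handler for checkpoints stored as local files, the `<node_name>.<epoch>` file naming is invented, and the rule that the manager defers to its handler and reports True when no handler is set is assumed rather than taken from this page.

```python
import os
import tempfile


class LocalDirHandler:
    """Hypothetical handler for checkpoints stored as local files."""

    def __init__(self, directory, node_name):
        self.directory = directory
        self.node_name = node_name

    def cp_accessible(self, epoch=None):
        if epoch is None:
            # No epoch given: only check the checkpoint directory itself.
            return os.path.isdir(self.directory)
        # Assumed file naming: "<node_name>.<epoch>".
        path = os.path.join(self.directory, "%s.%d" % (self.node_name, epoch))
        return os.path.isfile(path)


def cp_accessible(metadata_handler, epoch=None):
    """Assumed behavior: defer to the handler, else report True."""
    if metadata_handler is None:
        return True
    return metadata_handler.cp_accessible(epoch)
```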
Definition at line 416 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.get_resume_from_epoch_id (self, user_epoch=None)
Identify the epoch-id from which the Job must resume.
Args:
  user_epoch: An integer. Optional parameter that lets the user explicitly identify the epoch-id to load the checkpoint from.
Returns:
  epoch: The epoch-id to load checkpoints from, or None if no checkpoints were written.
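A minimal sketch of this selection rule, assuming the list of written epochs is available as a plain iterable. `resolve_resume_epoch` and `saved_epochs` are hypothetical names, and two behaviors are assumptions: an explicit `user_epoch` is returned as-is without checking that it was saved, and otherwise the highest saved epoch is chosen.

```python
def resolve_resume_epoch(saved_epochs, user_epoch=None):
    """Pick the epoch to resume from, or None if nothing was saved.

    saved_epochs: iterable of epoch-ids with written checkpoints
                  (hypothetical input, stands in for stored metadata).
    """
    if user_epoch is not None:
        # Assumption: an explicit user choice wins and is not validated.
        return user_epoch
    saved = list(saved_epochs)
    # Assumption: without a user choice, resume from the latest epoch.
    return max(saved) if saved else None
```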
Definition at line 379 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.init (self, nodes=None, retrieve_from_epoch=None, path_prefix=None, path_type=None)
Build a Task that will be run once after the job's `init_group` is run. This task will determine which blobs need to be checkpointed. If retrieve_from_epoch is not None, then the checkpoint metadata is retrieved from a previously saved checkpoint.
Definition at line 198 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.load (self, epoch, path_prefix=None, path_type=None)
Build a Task that will be run by JobRunner when the job is to be resumed from a given epoch. This task will run a Load op that loads and deserializes all relevant blobs from persistent storage.
Definition at line 271 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.load_blobs_from_checkpoint (self, blob_names, epoch)
Builds a Task that loads only the necessary blobs from a checkpoint of the given epoch. The necessary blobs are given in the blob_names argument.
Args:
  blob_names: A list of strings. Each string is the name of a blob.
  epoch: The checkpoint epoch to load from.
Returns:
  A Task which loads the specified blobs from the checkpoint of the given epoch.
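The idea of loading only the named blobs can be illustrated with a checkpoint modeled as a plain dict. This is a sketch, not the Task the method builds: `load_selected_blobs` is a hypothetical helper, and raising `KeyError` for names missing from the checkpoint is an assumption about error handling.

```python
def load_selected_blobs(checkpoint, blob_names):
    """Return only the requested blobs from a dict-shaped checkpoint."""
    missing = [name for name in blob_names if name not in checkpoint]
    if missing:
        # Assumption: a missing blob is treated as an error.
        raise KeyError("blobs not in checkpoint: %s" % ", ".join(missing))
    # Blobs not listed in blob_names are left out of the result.
    return {name: checkpoint[name] for name in blob_names}
```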
Definition at line 295 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.report_checkpoint_stats (self, action_name)
Report checkpoint operation stats for the current node.
Args:
  action_name: A string. The name of the checkpoint operation.
Definition at line 338 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.save (self, epoch)
Build a Task that is run once after `init_group` and after each epoch is run. This will execute Save ops to serialize and persist blobs present in the global workspace.
Definition at line 350 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.set_params (self, nodes, path_prefix=None, path_type=None)
Set parameters associated with the checkpoint manager.
Args:
  nodes: An array of nodes where this checkpoint manager is running.
  path_prefix: Used to construct the db name or path where checkpoint files are stored.
  path_type: Indicates the type of path where checkpoint files are stored.
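How `path_prefix` and `db_prefix` could combine into a per-epoch db name is sketched below. The naming scheme is an assumption made for illustration, not confirmed by this page: `checkpoint_db_name` is a hypothetical helper, the `<node_name>.<epoch>` filename is invented, and the rule that a non-empty `path_prefix` is prepended verbatim while `db_prefix` is joined as a directory is assumed.

```python
import os


def checkpoint_db_name(epoch, node_name, db_prefix, path_prefix=None):
    """Build a db name for one node and epoch (assumed naming scheme)."""
    filename = "%s.%d" % (node_name, epoch)
    if path_prefix:
        # Assumption: path_prefix overrides db_prefix and is used as-is,
        # so it must already end with any separator it needs.
        return path_prefix + filename
    # Assumption: db_prefix is a directory-like prefix.
    return os.path.join(db_prefix, filename)
```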
Definition at line 395 of file checkpoint.py.
def caffe2.python.checkpoint.CheckpointManager.write_checkpoint_metadata (self, epoch)
Write metadata for a checkpoint.
Args:
  epoch: An integer. The epoch-id for which checkpoint metadata is written.
Definition at line 368 of file checkpoint.py.