caffe2.python.checkpoint.MultiNodeCheckpointManager Class Reference
def __init__ (self, db_prefix, db_type, metadata_handler=None)
def init (self, nodes, retrieve_from_epoch=None, path_prefix=None, path_type=None)
def load (self, epoch, path_prefix=None, path_type=None)
def load_blobs_locally (self, nodes, blob_names, epoch, session)
def get_ckpt_db_name (self, node_name, epoch)
def report_checkpoint_stats (self, action_name)
def save (self, epoch)
def write_checkpoint_metadata (self, epoch)
def get_resume_from_epoch_id (self, user_epoch=None)
def set_params (self, nodes, path_prefix=None, path_type=None)
def cp_accessible (self, epoch=None)

Coordinates checkpointing across multiple nodes.
Each of `init`, `load` and `save` will build TaskGroups which will
trigger checkpointing on each of the nodes involved in a distributed job.

    db_prefix: The prefix used to construct full db name. Since `absolute_path`
        is set to True, this will be used as db_name in SaveOp.
    db_type: Type of database to use for storing checkpoint.
    metadata_handler: An optional object capable of reading/writing
        checkpoint info in storage of choice.
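
The per-node fan-out described above can be illustrated with a small pure-Python stand-in. This is only a sketch of the pattern and does not use Caffe2: `ToyTaskGroup` and `build_group` are hypothetical names, and the tasks run in-process instead of being dispatched to remote nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyTaskGroup:
    """Hypothetical stand-in for a TaskGroup: one callable per node."""
    tasks: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def add(self, node: str, task: Callable[[], str]) -> None:
        self.tasks[node] = task

    def run(self) -> Dict[str, str]:
        # A real distributed job would dispatch each task to its node;
        # here the tasks simply run in-process, one after another.
        return {node: task() for node, task in self.tasks.items()}


def build_group(action: str, nodes: List[str], epoch: int) -> ToyTaskGroup:
    """Build one task per node for the given action (e.g. 'load' or 'save')."""
    group = ToyTaskGroup()
    for node in nodes:
        # Bind node as a default argument so each closure keeps its own value.
        group.add(
            node,
            lambda node=node: "{} epoch {} on {}".format(action, epoch, node),
        )
    return group


results = build_group("save", ["trainer_0", "trainer_1"], epoch=3).run()
```

The point of the sketch is that one call on the manager yields one unit of work per node, so every node in the job checkpoints the same epoch.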

def caffe2.python.checkpoint.MultiNodeCheckpointManager.cp_accessible (self, epoch=None)
Returns True if Checkpoint data is accessible

    epoch: An integer. The epoch of the checkpoint. If None, only the
        accessibility of the checkpoint directory is checked.

    is_cp_accessible: A boolean. Returns True if Checkpoint data is accessible
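
A minimal sketch of how such a check might be delegated to a metadata handler. `ToyMetadataHandler` and `cp_accessible_sketch` are hypothetical names, and returning True when no handler is configured is an assumption made for illustration, not something this reference states.

```python
class ToyMetadataHandler:
    """Hypothetical handler that remembers which epochs were written."""

    def __init__(self, written_epochs, directory_ok=True):
        self.written_epochs = set(written_epochs)
        self.directory_ok = directory_ok

    def cp_accessible(self, epoch=None):
        if epoch is None:
            # No epoch given: only the checkpoint directory is checked.
            return self.directory_ok
        return self.directory_ok and epoch in self.written_epochs


def cp_accessible_sketch(metadata_handler, epoch=None):
    # Assumption: without a handler there is nothing to consult.
    if metadata_handler is None:
        return True
    return metadata_handler.cp_accessible(epoch)


handler = ToyMetadataHandler(written_epochs=[1, 2])
directory_check = cp_accessible_sketch(handler)
missing_epoch_check = cp_accessible_sketch(handler, epoch=5)
```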

def caffe2.python.checkpoint.MultiNodeCheckpointManager.get_ckpt_db_name (self, node_name, epoch)
Returns the DB name of the given node and the given epoch.

The DB name is effectively the checkpoint path of the given node and
the given epoch.

    node_name: A string. The node name of interest.
    epoch: An integer. The epoch of the checkpoint.

    checkpoint_db_name: A string. The checkpoint path of the given
node and the given epoch.
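
As an illustration of how a per-node, per-epoch checkpoint path could be composed, here is a small sketch. The layout `<db_prefix>/<node_name>.<epoch>` and the helper name `toy_ckpt_db_name` are assumptions for this example; the actual naming scheme should be checked against the Caffe2 source.

```python
import posixpath


def toy_ckpt_db_name(db_prefix, node_name, epoch):
    """Compose a checkpoint path from a prefix, a node name and an epoch."""
    # Assumed layout: "<db_prefix>/<node_name>.<epoch>"
    return posixpath.join(db_prefix, "{}.{}".format(node_name, epoch))


name = toy_ckpt_db_name("/tmp/ckpt", "trainer_0", 5)
```

Keeping the node name and the epoch in the path means checkpoints from different nodes or epochs never overwrite one another.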

def caffe2.python.checkpoint.MultiNodeCheckpointManager.get_resume_from_epoch_id (self, user_epoch=None)
Identify the epoch-id from which Job must resume

    user_epoch: An integer. Optional parameter for user to explicitly
identify the epoch-id to load checkpoint from
    epoch: the epoch-id to load checkpoints from
or None if no checkpoints were written
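
The selection rule can be sketched as follows. This is a guess at reasonable behaviour, not the library's implementation: it assumes an explicit `user_epoch` takes precedence and that otherwise the most recent written epoch is used. `resume_epoch` and `written_epochs` are hypothetical names.

```python
def resume_epoch(written_epochs, user_epoch=None):
    """Pick the epoch to resume from, or None if nothing was written."""
    if user_epoch is not None:
        # Assumption: an explicit request from the user wins.
        return user_epoch
    if not written_epochs:
        # No checkpoints were written, so there is nothing to resume from.
        return None
    return max(written_epochs)


latest = resume_epoch([1, 2, 3])
pinned = resume_epoch([1, 2, 3], user_epoch=2)
fresh_start = resume_epoch([])
```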

def caffe2.python.checkpoint.MultiNodeCheckpointManager.load_blobs_locally (self, nodes, blob_names, epoch, session)
Loads the necessary blobs from the checkpoints to the current node.

    blob_names: A list of strings. Each string is the name of a blob.
    epoch: An integer. The checkpoint epoch to load from.
    session: A Session object to execute the Load ops.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.report_checkpoint_stats (self, action_name)
Reports the checkpoint stats for all the nodes. The stats of all nodes are
aggregated so that it is clear which node's checkpoint operation
dominates.

    action_name: A string of the name of checkpoint operation.
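
A minimal sketch of the aggregation idea, assuming the per-node stats are durations in seconds keyed by node name. `dominant_node` and the sample numbers are hypothetical and only illustrate how the slowest node can be identified.

```python
def dominant_node(seconds_by_node):
    """Return (slowest node, its time, total time across all nodes)."""
    if not seconds_by_node:
        raise ValueError("no checkpoint stats to aggregate")
    slowest = max(seconds_by_node, key=seconds_by_node.get)
    total = sum(seconds_by_node.values())
    return slowest, seconds_by_node[slowest], total


# Made-up durations for a 'save' action on three nodes.
stats = {"trainer_0": 2.0, "trainer_1": 7.5, "trainer_2": 3.0}
slowest, slowest_seconds, total_seconds = dominant_node(stats)
```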

def caffe2.python.checkpoint.MultiNodeCheckpointManager.save (self, epoch)
Builds a Task that will execute Save ops to serialize and persist
blobs present in the global workspace.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.set_params (self, nodes, path_prefix=None, path_type=None)
Set parameters associated with CP manager

    nodes: An array of nodes where this checkpoint manager is running.
    path_prefix: Used to construct the db name or path where checkpoint files are
        stored.
    path_type: Indicate the type of path where checkpoint files are stored.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.write_checkpoint_metadata (self, epoch)
Write metadata for checkpoint

    epoch: An integer. The epoch-id for which checkpoint metadata is
        written.

