Caffe2 - Python API
A deep learning, cross platform ML framework
caffe2.python.checkpoint.MultiNodeCheckpointManager Class Reference

Public Member Functions

def __init__ (self, db_prefix, db_type, metadata_handler=None)
 
def init (self, nodes, retrieve_from_epoch=None, path_prefix=None, path_type=None)
 
def load (self, epoch, path_prefix=None, path_type=None)
 
def load_blobs_locally (self, nodes, blob_names, epoch, session)
 
def get_ckpt_db_name (self, node_name, epoch)
 
def report_checkpoint_stats (self, action_name)
 
def save (self, epoch)
 
def write_checkpoint_metadata (self, epoch)
 
def get_resume_from_epoch_id (self, user_epoch=None)
 
def set_params (self, nodes, path_prefix=None, path_type=None)
 
def cp_accessible (self, epoch=None)
 

Detailed Description

Coordinates checkpointing across multiple nodes.
Each of `init`, `load` and `save` will build TaskGroups which will
trigger checkpointing on each of the nodes involved in a distributed job.

Args:
    db_prefix: The prefix used to construct full db name. Since `absolute_path`
        is set to True, this will be used as db_name in SaveOp.
    db_type: Type of database to use for storing checkpoint.
    metadata_handler: An optional object capable of reading/writing
        checkpoint info in storage of choice.

Definition at line 432 of file checkpoint.py.

Member Function Documentation

def caffe2.python.checkpoint.MultiNodeCheckpointManager.cp_accessible (   self,
  epoch = None 
)
Returns True if Checkpoint data is accessible

Args:
    epoch: An integer. The epoch of the checkpoint. If None, only the
        accessibility of the checkpoint directory is checked.

Returns:
    is_cp_accessible: A boolean. Returns True if Checkpoint data is accessible

Definition at line 621 of file checkpoint.py.
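
The two modes described above (directory-only check when `epoch` is None, per-epoch check otherwise) can be illustrated with a filesystem-based stand-in. This is only a sketch under assumptions: the standalone function, the local directory layout, and the `<node>.<epoch>` file naming are hypothetical, and the actual method may instead delegate the check to the metadata handler.

```python
import os
import tempfile


def cp_accessible(checkpoint_dir, node_names, epoch=None):
    """Return True if checkpoint data appears readable (hypothetical helper).

    With epoch=None only the checkpoint directory is checked; otherwise
    every node's file for that epoch must exist (assumed "<node>.<epoch>").
    """
    if not os.path.isdir(checkpoint_dir):
        return False
    if epoch is None:
        return True
    return all(
        os.path.isfile(os.path.join(checkpoint_dir, "{}.{}".format(n, epoch)))
        for n in node_names
    )


# Usage: a temporary directory holding one node's checkpoint for epoch 1.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "trainer_0.1"), "w").close()
    dir_ok = cp_accessible(tmp, ["trainer_0"])
    epoch_ok = cp_accessible(tmp, ["trainer_0"], epoch=1)
    missing = cp_accessible(tmp, ["trainer_0"], epoch=2)
```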

def caffe2.python.checkpoint.MultiNodeCheckpointManager.get_ckpt_db_name (   self,
  node_name,
  epoch 
)
Returns the DB name of the given node and the given epoch.

The DB name is effectively the checkpoint path of the given node and
the given epoch.

Args:
    node_name: A string. The node name of interest.
    epoch: An integer. The epoch of the checkpoint.

Returns:
    checkpoint_db_name: A string. The checkpoint path of the given
        node and the given epoch.

Definition at line 531 of file checkpoint.py.
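
As a rough illustration of how a per-node, per-epoch checkpoint path might be assembled, the sketch below joins a prefix with a `<node_name>.<epoch>` file name. The function `ckpt_db_name`, the file-name pattern, and the handling of `path_prefix` are assumptions made for this example and are not confirmed by this page.

```python
import os


def ckpt_db_name(db_prefix, node_name, epoch, path_prefix=None):
    """Build a checkpoint path for one node and epoch (assumed scheme)."""
    filename = "{}.{}".format(node_name, epoch)
    if path_prefix:
        # Assumption: an explicit path_prefix is prepended verbatim.
        return path_prefix + filename
    # Assumption: otherwise the file lives under db_prefix.
    return os.path.join(db_prefix, filename)


name = ckpt_db_name("/tmp/ckpt", "trainer_0", 5)
```

Under this scheme each node writes to its own file, so checkpoints from different nodes and epochs never collide.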

def caffe2.python.checkpoint.MultiNodeCheckpointManager.get_resume_from_epoch_id (   self,
  user_epoch = None 
)
Identifies the epoch-id from which the job must resume.

Args:
    user_epoch: An integer. Optional parameter that lets the user
        explicitly specify the epoch-id to load the checkpoint from.

Returns:
    epoch: The epoch-id to load checkpoints from, or None if no
        checkpoints were written.

Definition at line 583 of file checkpoint.py.
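
The resolution rule described above (an explicit `user_epoch` wins, otherwise the last written epoch, otherwise None) can be sketched without Caffe2. `InMemoryMetadataHandler` and its `write` and `last_epoch` methods are hypothetical stand-ins for whatever object is passed as `metadata_handler`; the real handler interface is not documented on this page.

```python
class InMemoryMetadataHandler(object):
    """Hypothetical handler that remembers which epochs were written."""

    def __init__(self):
        self.written_epochs = []

    def write(self, epoch):
        self.written_epochs.append(epoch)

    def last_epoch(self, user_epoch=None):
        # An explicit user choice takes precedence over recorded epochs.
        if user_epoch is not None:
            return user_epoch
        return max(self.written_epochs) if self.written_epochs else None


def get_resume_from_epoch_id(metadata_handler, user_epoch=None):
    """Resolve the epoch to resume from, or None if nothing was written."""
    if metadata_handler is None:
        # Without a handler, only the user's choice is available.
        return user_epoch
    return metadata_handler.last_epoch(user_epoch=user_epoch)


handler = InMemoryMetadataHandler()
fresh = get_resume_from_epoch_id(handler)  # no checkpoints written yet
handler.write(1)
handler.write(2)
latest = get_resume_from_epoch_id(handler)
explicit = get_resume_from_epoch_id(handler, user_epoch=1)
```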

def caffe2.python.checkpoint.MultiNodeCheckpointManager.load_blobs_locally (   self,
  nodes,
  blob_names,
  epoch,
  session 
)
Loads the necessary blobs from the checkpoints to the current node.

Args:
    nodes: An array of nodes where this checkpoint manager is running.
    blob_names: A list of strings. Each string is the name of a
        blob.
    epoch: An integer. The checkpoint epoch to load from.
    session: A Session object to execute the Load ops.

Definition at line 497 of file checkpoint.py.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.report_checkpoint_stats (   self,
  action_name 
)
Reports the checkpoint stats for all the nodes. The stats of all nodes
are aggregated so that it is clear which node's checkpoint operation
dominates.

Args:
    action_name: A string of the name of checkpoint operation.

Definition at line 549 of file checkpoint.py.
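
To show why aggregation helps identify the dominating node, the sketch below reduces per-node timings to the slowest node for a given action. The function name, the `seconds_by_node` input shape, and the returned dictionary are assumptions for illustration only; the page does not specify how stats are collected or reported.

```python
def summarize_checkpoint_stats(action_name, seconds_by_node):
    """Find which node's checkpoint operation took longest (hypothetical)."""
    if not seconds_by_node:
        return {"action": action_name, "slowest_node": None, "max_seconds": 0.0}
    # The node with the largest elapsed time dominates the overall action.
    slowest = max(seconds_by_node, key=seconds_by_node.get)
    return {
        "action": action_name,
        "slowest_node": slowest,
        "max_seconds": seconds_by_node[slowest],
    }


summary = summarize_checkpoint_stats(
    "save", {"trainer_0": 1.5, "trainer_1": 4.0}
)
```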

def caffe2.python.checkpoint.MultiNodeCheckpointManager.save (   self,
  epoch 
)
Builds a Task that will execute Save ops to serialize and persist
blobs present in the global workspace.

Definition at line 565 of file checkpoint.py.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.set_params (   self,
  nodes,
  path_prefix = None,
  path_type = None 
)
Sets the parameters associated with the checkpoint manager.

Args:
    nodes: An array of nodes where this checkpoint manager is running.
    path_prefix: Used to construct db name or path where checkpoint files are
        stored.
    path_type: Indicate the type of path where checkpoint files are stored.

Definition at line 599 of file checkpoint.py.

def caffe2.python.checkpoint.MultiNodeCheckpointManager.write_checkpoint_metadata (   self,
  epoch 
)
Write metadata for checkpoint

Args:
    epoch: An integer. The epoch-id for which checkpoint metadata is
        written.

Definition at line 572 of file checkpoint.py.


The documentation for this class was generated from the following file: