Operators Catalog

caffe2/operators/accuracy_op.cc

Accuracy

Accuracy takes two inputs, predictions and labels, and returns a float accuracy value for the batch. Predictions are expected in the form of a 2-D tensor containing a batch of scores for various classes, and labels are expected in the form of a 1-D tensor containing the true label indices of the samples in the batch. If the score for the label index is the highest among all classes (or, with top_k > 1, among the top k scores), the prediction is considered correct.

Interface

Arguments
`top_k`	Count as correct by comparing the true label to the top k scoring classes (default 1: only compare to the top scoring class i.e. argmax)
Inputs
`predictions`	2-D tensor (Tensor) of size (batch_size x num_classes) containing scores
`labels`	1-D tensor (Tensor) of size (batch_size) having the indices of true labels
Outputs
`accuracy`	1-D tensor (Tensor) of size 1 containing accuracy
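
The accuracy computation can be sketched in plain Python. This is a minimal, hypothetical reference (the helper `accuracy` is not part of the Caffe2 API): a sample counts as correct when fewer than top_k classes score strictly higher than the true class. How the real kernel breaks ties is an assumption here.

```python
# Hypothetical pure-Python reference for Accuracy; not the Caffe2 kernel.
def accuracy(predictions, labels, top_k=1):
    """Return the fraction of samples whose true label is among the top_k scores.

    predictions: one list of class scores per sample (batch_size x num_classes).
    labels: one true class index per sample.
    """
    correct = 0
    for scores, label in zip(predictions, labels):
        # Count classes scoring strictly higher than the true class (tie rule assumed).
        higher = sum(1 for s in scores if s > scores[label])
        if higher < top_k:
            correct += 1
    return correct / len(labels)


preds = [[0.1, 0.7, 0.2],
         [0.5, 0.3, 0.2],
         [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(accuracy(preds, labels))           # top-1: only sample 0 is correct
print(accuracy(preds, labels, top_k=2))  # top-2: samples 0 and 1 are correct
```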

Code

Adagrad

Computes the AdaGrad update for an input gradient and accumulated history. Concretely, given inputs (param, moment, grad, learning_rate), computes

    new_moment = moment + square(grad)
    new_grad = learning_rate * grad / (sqrt(new_moment) + epsilon)
    new_param = param + new_grad

and returns (new_param, new_moment).

Interface

Arguments
`epsilon`	Default 1e-5
`decay`	Default 1. If it is in (0, 1), the gradient square sum is decayed by this factor.
Inputs
`param`	Parameters to be updated
`moment`	Moment history
`grad`	Gradient computed
`lr`	learning rate
Outputs
`output_param`	Updated parameters
`output_moment`	Updated moment
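
The update above can be sketched element-wise in plain Python. This is a minimal sketch, not the Caffe2 implementation: the helper name `adagrad_update` is hypothetical, tensors are flat lists, and `decay` is assumed to multiply the old moment before the new squared gradient is added. Because the formula adds new_grad to param, a descent step needs a negative lr.

```python
import math


# Hypothetical pure-Python reference for the Adagrad update; not the Caffe2 kernel.
def adagrad_update(param, moment, grad, lr, epsilon=1e-5, decay=1.0):
    """Apply one element-wise AdaGrad step; returns (new_param, new_moment)."""
    new_param, new_moment = [], []
    for p, m, g in zip(param, moment, grad):
        # Accumulate the squared gradient; decay < 1 shrinks the old history (assumed placement).
        nm = decay * m + g * g
        # The step is added to param, so a negative lr gives gradient descent.
        step = lr * g / (math.sqrt(nm) + epsilon)
        new_param.append(p + step)
        new_moment.append(nm)
    return new_param, new_moment


new_p, new_m = adagrad_update([1.0, 2.0], [0.0, 0.0], [0.5, -1.0], lr=-0.1)
print(new_p, new_m)  # the first step moves each parameter by about |lr| against its gradient
```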

Code

caffe2/sgd/adagrad_op.cc

Adam

Computes the Adam update (https://arxiv.org/abs/1412.6980) for an input gradient and momentum parameters. Concretely, given inputs (param, m1, m2, grad, lr, iters), computes

    t = iters + 1
    corrected_local_rate = lr * sqrt(1 - power(beta2, t)) /
      (1 - power(beta1, t))
    m1_o = (beta1 * m1) + (1 - beta1) * grad
    m2_o = (beta2 * m2) + (1 - beta2) * np.square(grad)
    grad_o = corrected_local_rate * m1_o / \
        (sqrt(m2_o) + epsilon)
    param_o = param + grad_o

and returns (param_o, m1_o, m2_o)

Interface

Arguments
`beta1`	Default 0.9
`beta2`	Default 0.999
`epsilon`	Default 1e-5
Inputs
`param`	Parameters to be updated
`moment_1`	First moment history
`moment_2`	Second moment history
`grad`	Gradient computed
`lr`	learning rate
`iter`	iteration number
Outputs
`output_param`	Updated parameters
`output_moment_1`	Updated first moment
`output_moment_2`	Updated second moment
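
A line-by-line translation of the pseudocode above into plain Python, applied element-wise to flat lists. The helper name `adam_update` is hypothetical and this is a reference sketch, not the Caffe2 kernel. As with Adagrad, the step is added to param, so lr is negative for descent; on the first iteration the parameter moves by roughly |lr|.

```python
import math


# Hypothetical pure-Python reference for the Adam update; not the Caffe2 kernel.
def adam_update(param, m1, m2, grad, lr, iters, beta1=0.9, beta2=0.999, epsilon=1e-5):
    """Apply one element-wise Adam step; returns (param_o, m1_o, m2_o)."""
    t = iters + 1
    # Bias correction for both moment estimates, folded into the learning rate.
    corrected_local_rate = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    param_o, m1_o, m2_o = [], [], []
    for p, a, b, g in zip(param, m1, m2, grad):
        a_new = beta1 * a + (1 - beta1) * g        # first moment
        b_new = beta2 * b + (1 - beta2) * g * g    # second moment
        grad_o = corrected_local_rate * a_new / (math.sqrt(b_new) + epsilon)
        param_o.append(p + grad_o)
        m1_o.append(a_new)
        m2_o.append(b_new)
    return param_o, m1_o, m2_o


p, m1, m2 = adam_update([1.0], [0.0], [0.0], [0.2], lr=-0.01, iters=0)
print(p, m1, m2)
```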

Code

caffe2/sgd/adam_op.cc

Add

Performs element-wise binary addition (with limited broadcast support). If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The start of the matching dimensions is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions and type as A
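
The limited broadcast rule can be sketched by viewing A as a (pre, n, post) block, where n is the size of B and the matched dimensions start at axis; element A[i, j, k] is then paired with B[j]. This is a hypothetical pure-Python helper (`broadcast_binary`) working on flat row-major lists, not the Caffe2 API. The same helper covers And, Div, and EQ by passing a different op.

```python
import operator
from math import prod


# Hypothetical reference for the legacy broadcast rule; not the Caffe2 kernel.
def broadcast_binary(a, a_shape, b, b_shape, op, axis=None):
    """Apply op(A_elem, B_elem) where B's shape matches a contiguous run of A's dims.

    a and b are flat row-major lists; a scalar B has shape ().
    """
    if axis is None:
        axis = len(a_shape) - len(b_shape)  # suffix matching
    if tuple(a_shape[axis:axis + len(b_shape)]) != tuple(b_shape):
        raise ValueError("B's shape must match a contiguous run of A's shape")
    n = prod(b_shape)                            # matched block size (1 for a scalar)
    post = prod(a_shape[axis + len(b_shape):])   # dims after the matched block
    # A is viewed as (pre, n, post); A[i, j, k] pairs with B[j].
    return [op(x, b[(idx // post) % n]) for idx, x in enumerate(a)]


a = [0, 1, 2, 3, 4, 5]  # shape (2, 3)
print(broadcast_binary(a, (2, 3), [10, 20, 30], (3,), operator.add))
# [10, 21, 32, 13, 24, 35]
print(broadcast_binary(a, (2, 3), [100, 200], (2,), operator.add, axis=0))
# [100, 101, 102, 203, 204, 205]
```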

Code

caffe2/operators/sequence_ops.cc

AddPadding

Given a partitioned tensor T<N, D1…, Dn>, where the partitions are defined as ranges on its outer-most (slowest varying) dimension N with given range lengths, returns a tensor T<N + 2*padding_width*R, D1…, Dn>, where R is the number of ranges, with paddings added to the start and end of each range. Optionally, different paddings can be provided for the beginning and end. Padding data must be a tensor T<D1…, Dn>. If no padding data is provided, zero padding is added. If no lengths vector is provided, padding is added only once, at the start and end of the data (R = 1).

Interface

Arguments
`padding_width`	Number of copies of padding to add around each range.
`end_padding_width`	(Optional) Specifies a different end-padding width.
Inputs
`data_in`	(T<N, D1…, Dn>) Input data
`lengths`	(i64) Number of elements in each range. sum(lengths) = N.
`start_padding`	T<D1…, Dn> Padding data for range start.
`end_padding`	T<D1…, Dn> (optional) Padding for range end. If not provided, start_padding is used as end_padding as well.
Outputs
`data_out`	(T<N + 2*padding_width*R, D1…, Dn>) Padded data, where R is the number of ranges.
`lengths_out`	(i64, optional) Lengths for each padded range.
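
A plain-Python sketch of the padding layout, treating the data as a list of rows. The helper `add_padding` and its keyword names mirror the arguments above but are hypothetical, not the Caffe2 API. Each range grows by padding_width + end_padding_width rows, so the output has N + (padding_width + end_padding_width) * R rows.

```python
# Hypothetical pure-Python reference for AddPadding; not the Caffe2 kernel.
def add_padding(data, lengths=None, padding_width=1, end_padding_width=None,
                start_padding=None, end_padding=None):
    """Pad every range of rows in `data`; returns (data_out, lengths_out).

    data: list of rows (each a list of D values); sum(lengths) == len(data).
    """
    if lengths is None:
        lengths = [len(data)]               # no lengths: the whole input is one range
    if end_padding_width is None:
        end_padding_width = padding_width
    if start_padding is None:
        start_padding = [0] * len(data[0])  # zero padding (assumes data is non-empty)
    if end_padding is None:
        end_padding = start_padding         # reuse start padding at the end
    data_out, lengths_out, offset = [], [], 0
    for n in lengths:
        data_out.extend(list(start_padding) for _ in range(padding_width))
        data_out.extend(data[offset:offset + n])
        data_out.extend(list(end_padding) for _ in range(end_padding_width))
        lengths_out.append(padding_width + n + end_padding_width)
        offset += n
    return data_out, lengths_out


out, out_lengths = add_padding([[1], [2], [3]], lengths=[2, 1], padding_width=1)
print(out)          # [[0], [1], [2], [0], [0], [3], [0]]
print(out_lengths)  # [4, 3]
```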

Code

Alias

Makes the output and the input share the same underlying storage. WARNING: in general, in caffe2’s operator interface different tensors should have different underlying storage, which is the assumption made by components such as the dependency engine and memory optimization. Thus, in normal situations you should not use the AliasOp, especially in a normal forward-backward pass. The Alias op is provided so one can achieve true asynchrony, such as Hogwild, in a graph. But make sure you understand all the implications, which are similar to those of multi-threaded computation, before you use it explicitly.

Interface

Inputs
`input`	Input tensor whose storage will be shared.
Outputs
`output`	Tensor of same shape as input, sharing its storage.

Code

Allgather

Does an allgather operation among the nodes.

Interface

Inputs
`comm_world`	The common world.
`X`	A tensor to be allgathered.
Outputs
`Y`	The allgathered tensor, same on all nodes.

Code

Allreduce

Does an allreduce operation among the nodes. Currently only Sum is supported.

Interface

Inputs
`comm_world`	The common world.
`X`	A tensor to be allreduced.
Outputs
`Y`	The allreduced tensor, same on all nodes.

Code

And

Performs element-wise logical AND (with limited broadcast support). Both input operands should be of type bool. If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The start of the matching dimensions is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions as A and type `bool`

Code

Append

Append input 2 to the end of input 1. Input 1 must be the same as the output, that is, the operation is required to be in-place. Input 1 may have to be re-allocated in order to accommodate the new size. Currently, an exponential growth ratio is used in order to ensure amortized constant time complexity. All dimensions except the outer-most one must be the same between input 1 and input 2.

Interface

Inputs
`dataset`	The tensor to be appended to.
`new_data`	Tensor to append to the end of dataset.
Outputs
`dataset`	Same as input 0, representing the mutated tensor.

Code

caffe2/operators/arg_ops.cc

ArgMax

Retrieves the argmax along the axis dimension. Given an input tensor of shape [a_0, a_1, …, a_{n-1}] and two arguments, axis (int) and keepdims (bool), returns one output: an index tensor containing the indices of the largest elements. It has the same dims as X.dims(), except that the dimension along axis is 1 when keepdims is true and is removed otherwise.

Interface

Arguments
`axis`	The axis to get argmax.
`keepdims`	Whether to keep the axis dim in the output.
Inputs
`X`	Tensor of shape [a_0, a_1, …, a_{n-1}].
Outputs
`Indices`	Tensor of indices for the largest values.
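
ArgMax and ArgMin can both be sketched as one reduction over a flat row-major buffer: view the input as (pre, n, post) around axis and pick the best index in each length-n column. The helper `arg_reduce` is hypothetical, not the Caffe2 API, and breaking ties toward the first occurrence is an assumption.

```python
from math import prod


# Hypothetical reference for ArgMax/ArgMin on a flat row-major buffer; not the Caffe2 kernel.
def arg_reduce(x, shape, axis, keepdims=True, pick=max):
    """Return (indices, out_shape); pick=max gives ArgMax, pick=min gives ArgMin."""
    pre = prod(shape[:axis])          # product of dims before axis
    n = shape[axis]                   # length of the reduced axis
    post = prod(shape[axis + 1:])     # product of dims after axis
    indices = []
    for i in range(pre):
        for k in range(post):
            column = [x[(i * n + j) * post + k] for j in range(n)]
            # list.index returns the first occurrence, so ties go to the lowest index.
            indices.append(column.index(pick(column)))
    out_shape = list(shape)
    if keepdims:
        out_shape[axis] = 1
    else:
        del out_shape[axis]
    return indices, out_shape


x = [3, 9, 2,
     7, 1, 8]  # shape (2, 3)
print(arg_reduce(x, (2, 3), axis=1))                  # ([1, 2], [2, 1])
print(arg_reduce(x, (2, 3), axis=0, keepdims=False))  # ([1, 0, 1], [3])
print(arg_reduce(x, (2, 3), axis=1, pick=min))        # ([2, 1], [2, 1])
```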

Code

ArgMin

Retrieves the argmin along the axis dimension. Given an input tensor of shape [a_0, a_1, …, a_{n-1}] and two arguments, axis (int) and keepdims (bool), returns one output: an index tensor containing the indices of the smallest elements. It has the same dims as X.dims(), except that the dimension along axis is 1 when keepdims is true and is removed otherwise.

Interface

Arguments
`axis`	The axis to get argmin.
`keepdims`	Whether to keep the axis dim in the output.
Inputs
`X`	Tensor of shape [a_0, a_1, …, a_{n-1}].
Outputs
`Indices`	Tensor of indices for the smallest values.

Code

caffe2/operators/arg_ops.cc

Assert

Assertion op. Takes in a tensor of bools, ints, longs, or long longs and checks if all values are true when coerced into a boolean. In other words, for non-bool types this asserts that all values in the tensor are non-zero.

Interface

Arguments
`error_msg`	An error message to print when the assert fails.

BBoxTransform

Transforms proposal bounding boxes to target bounding boxes using bounding box regression deltas.

Interface

Arguments
`weights`	vector weights [wx, wy, ww, wh] for the deltas
`apply_scale`	bool (default true), transform the boxes to the scaled image space after applying the bbox deltas. Set to false to match the detectron code, set to true for keypoint models and for backward compatibility
`correct_transform_coords`	bool (default false), correct bounding box transform coordinates, see bbox_transform() in boxes.py. Set to true to match the detectron code, set to false for backward compatibility
Inputs
`rois`	Bounding box proposals in pixel coordinates, size (M, 4), format [x1, y1, x2, y2], or size (M, 5), format [batch_index, x1, y1, x2, y2]. If proposals from multiple images in a batch are present, they should be grouped sequentially and in incremental order.
`deltas`	bounding box translations and scales, size (M, 4*K), format [dx, dy, dw, dh], K = # classes
`im_info`	Image dimensions, size (batch_size, 3), format [img_height, img_width, img_scale]
Outputs
`box_out`	Pixel coordinates of the transformed bounding boxes, size (M, 4*K), format [x1, y1, x2, y2]
`roi_batch_splits`	Tensor of shape (batch_size) with each element denoting the number of RoIs belonging to the corresponding image in batch

BatchBoxCox

Input data is an N x D matrix. The Box-Cox transform is applied to each column. lambda1 and lambda2 are tensors of size D that define the hyper-parameters of the transform for each column x of the input data:

    ln(x + lambda2), if lambda1 == 0
    ((x + lambda2)^lambda1 - 1)/lambda1, if lambda1 != 0

Interface

Inputs
`data`	input float or double N * D matrix
`lambda1`	tensor of size D with the same type as data
`lambda2`	tensor of size D with the same type as data
Outputs
`output`	output matrix that applied box-cox transform
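
The per-column transform can be sketched directly from the two formulas above. The helper `batch_box_cox` is hypothetical, not the Caffe2 kernel, and it assumes x + lambda2 > 0 for every element; any clamping the real kernel may apply is not modeled.

```python
import math


# Hypothetical pure-Python reference for BatchBoxCox; not the Caffe2 kernel.
def batch_box_cox(data, lambda1, lambda2):
    """Apply the Box-Cox transform column by column to an N x D list of rows."""
    out = []
    for row in data:
        new_row = []
        for x, l1, l2 in zip(row, lambda1, lambda2):
            if l1 == 0:
                new_row.append(math.log(x + l2))            # ln(x + lambda2)
            else:
                new_row.append(((x + l2) ** l1 - 1) / l1)  # ((x + lambda2)^lambda1 - 1) / lambda1
        out.append(new_row)
    return out


result = batch_box_cox([[1.0, 3.0], [math.e - 1, 8.0]], lambda1=[0.0, 0.5], lambda2=[1.0, 1.0])
print(result)  # approximately [[0.693, 2.0], [1.0, 4.0]]
```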

Code

caffe2/operators/batch_box_cox_op.cc

BatchBucketOneHot

Input is a matrix tensor whose first dimension is the batch size. Each column is bucketized based on its boundary values and then one-hot encoded. lengths specifies the number of boundary values for each column; the number of buckets for that column is this number plus 1, which is also its expanded feature size. boundaries holds all the boundary values, concatenated column by column. Note that each bucket is right-inclusive. That is, given boundary values [b1, b2, b3], the buckets are defined as (-inf, b1], (b1, b2], (b2, b3], (b3, inf). For example:

  If data = [[2, 3], [4, 1], [2, 5]], lengths = [2, 3],
  and boundaries = [0.1, 2.5, 1, 3.1, 4.5], then

  output = [[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1]]

Interface

Inputs
`data`	input tensor matrix
`lengths`	the size is the same as the width of the `data`
`boundaries`	bucket boundaries
Outputs
`output`	output matrix that expands each input column with one hot encoding based on the bucketization
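
The bucketize-then-encode procedure can be sketched with the standard `bisect` module. `bisect_left` counts the boundaries strictly below x, which gives exactly the right-inclusive buckets described above (x equal to a boundary lands in the bucket that ends at it). The helper name is hypothetical, and each column's boundaries are assumed to be sorted ascending.

```python
from bisect import bisect_left


# Hypothetical pure-Python reference for BatchBucketOneHot; not the Caffe2 kernel.
def batch_bucket_one_hot(data, lengths, boundaries):
    """Bucketize every column with right-inclusive buckets, then one-hot encode it."""
    # Split the flat boundaries list into one sorted list per column.
    per_col, offset = [], 0
    for n in lengths:
        per_col.append(boundaries[offset:offset + n])
        offset += n
    out = []
    for row in data:
        encoded = []
        for x, bounds in zip(row, per_col):
            one_hot = [0] * (len(bounds) + 1)   # len(bounds) + 1 buckets per column
            one_hot[bisect_left(bounds, x)] = 1
            encoded.extend(one_hot)
        out.append(encoded)
    return out


result = batch_bucket_one_hot([[2, 3], [4, 1], [2, 5]], [2, 3], [0.1, 2.5, 1, 3.1, 4.5])
print(result)  # matches the output in the example above
```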

Code

caffe2/operators/one_hot_ops.cc

BatchDenseToSparse

This op is the inverse of BatchSparseToDense. Given a lengths vector, an indices vector, and a dense matrix dense, it outputs a values vector that, together with the lengths and indices vectors, forms a sparse representation of the dense matrix. A sparse matrix is represented by a lengths vector, an indices vector, and a values vector. Each element of the lengths vector (lengths[ i ]) is the number of indices in batch i. Within each batch, indices should not contain duplicates. For example, with input:

  lengths = [2, 3, 1]
  indices = [0, 1, 2, 3, 4, 5]
  dense = [[6, 7, 0, 0, 0,  0],
           [0, 0, 8, 9, 10, 0],
           [0, 0, 0, 0, 0, 11]]

The output is:

  values = [6, 7, 8, 9, 10, 11]

after running this operator.

Interface

Inputs
`lengths`	Flattened lengths, used to break down indices into per-batch indices
`indices`	Flattened indices, tensor of total size = \sum lengths, containing the indices
`dense`	dense 2-D tensor, first dim = len(lengths), last dim > max(indices)
Outputs
`values`	Values, tensor of the same size as `indices` and same data type as dense tensor.
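
The gather step can be sketched in plain Python: walk the lengths vector, and for batch i read dense[i][idx] for each of its indices. The helper `batch_dense_to_sparse` is hypothetical, not the Caffe2 API.

```python
# Hypothetical pure-Python reference for BatchDenseToSparse; not the Caffe2 kernel.
def batch_dense_to_sparse(lengths, indices, dense):
    """Collect dense[i][idx] for every index listed under batch i, in order."""
    values, offset = [], 0
    for i, n in enumerate(lengths):
        for idx in indices[offset:offset + n]:   # this batch's slice of the flat indices
            values.append(dense[i][idx])
        offset += n
    return values


dense = [[6, 7, 0, 0, 0, 0],
         [0, 0, 8, 9, 10, 0],
         [0, 0, 0, 0, 0, 11]]
print(batch_dense_to_sparse([2, 3, 1], [0, 1, 2, 3, 4, 5], dense))  # [6, 7, 8, 9, 10, 11]
```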

Code

caffe2/operators/batch_sparse_to_dense_op.cc

BatchGather

Batch gather operation; the first dimension in DATA is the batch size. Given a DATA tensor of rank r >= 2 and an INDICES tensor of rank q >= 1, gathers, for each batch, the entries of the second dimension (axis 1) of DATA indexed by INDICES, and concatenates them in an output tensor of rank (q - 1) + (r - 1). Example:

  DATA  = [
      [1.0, 1.2, 2.4, 4.5],
      [2.3, 3.4, 3.6, 2.3],
      [4.5, 5.7, 1.2, 4.5],
  ]
  INDICES = [
      [0, 2],
  ]
  OUTPUT = [
      [1.0, 2.4],
      [2.3, 3.6],
      [4.5, 1.2],
  ]

Interface

Inputs
`DATA`	Tensor of rank r >= 2.
`INDICES`	Tensor of int32/int64 indices, of any rank q.
Outputs
`OUTPUT`	Tensor of rank (q - 1) + (r - 1).
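
For the 2-D case in the example, BatchGather amounts to picking the same columns from every batch row. The sketch below assumes DATA is a list of rows and takes a flat list of column indices (the example's INDICES flattened to [0, 2]); `batch_gather` is a hypothetical helper, not the Caffe2 API, and higher-rank DATA or INDICES are not handled.

```python
# Hypothetical pure-Python reference for the 2-D BatchGather case; not the Caffe2 kernel.
def batch_gather(data, indices):
    """For each batch row, gather the axis-1 entries named by a flat index list."""
    return [[row[j] for j in indices] for row in data]


DATA = [
    [1.0, 1.2, 2.4, 4.5],
    [2.3, 3.4, 3.6, 2.3],
    [4.5, 5.7, 1.2, 4.5],
]
print(batch_gather(DATA, [0, 2]))  # [[1.0, 2.4], [2.3, 3.6], [4.5, 1.2]]
```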

BatchOneHot

Input is a matrix tensor whose first dimension is the batch size. Each column is expanded using one-hot encoding. lengths specifies the size of each column after encoding, and values holds the one-hot dictionary values for each column, concatenated column by column. For example:

  If data = [[2, 3], [4, 1], [2, 5]], lengths = [2, 3],
  and values = [2, 4, 1, 3, 5], then

  output = [[1, 0, 0, 1, 0], [0, 1, 1, 0, 0], [1, 0, 0, 0, 1]]

Interface

Inputs
`data`	input tensor matrix
`lengths`	the size is the same as the width of the `data`
`values`	one hot encoding dictionary values
Outputs
`output`	output matrix that expands each input column with one hot encoding
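
A plain-Python sketch of the encoding: split values into one dictionary per column using lengths, then emit a 1 where the cell equals a dictionary entry. The helper name is hypothetical; a cell that matches no dictionary entry is assumed to encode as all zeros.

```python
# Hypothetical pure-Python reference for BatchOneHot; not the Caffe2 kernel.
def batch_one_hot(data, lengths, values):
    """One-hot encode each column against its own slice of the values dictionary."""
    per_col, offset = [], 0
    for n in lengths:
        per_col.append(values[offset:offset + n])
        offset += n
    out = []
    for row in data:
        encoded = []
        for x, vocab in zip(row, per_col):
            encoded.extend(1 if x == v else 0 for v in vocab)
        out.append(encoded)
    return out


result = batch_one_hot([[2, 3], [4, 1], [2, 5]], [2, 3], [2, 4, 1, 3, 5])
print(result)  # [[1, 0, 0, 1, 0], [0, 1, 1, 0, 0], [1, 0, 0, 0, 1]]
```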

Code

caffe2/operators/one_hot_ops.cc

BatchSparseToDense

Converts a sparse matrix representation into a dense matrix. A sparse matrix is represented by a lengths vector, an indices vector, and a values vector. Each element of the lengths vector (lengths[ i ]) is the number of indices in batch i. Within each batch, indices should not contain duplicates. For example, with input:

  lengths = [2, 3, 1]
  indices = [0, 1, 2, 3, 4, 5]
  values =  [6, 7, 8, 9, 10, 11]
  dense_dim = 6
  default_value = 0

The output is:

  output = [[6, 7, 0, 0, 0,  0],
            [0, 0, 8, 9, 10, 0],
            [0, 0, 0, 0, 0, 11]]

after running this operator.

Interface

Arguments
`dense_last_dim`	Optional, output dense last dimension. If both this argument and output_shape_inference are set, it should be consistent with output_shape_inference’s last dim
`default_value`	Optional, missing values are filled with this value; default_value = 0 when not set
Inputs
`lengths`	Flattened tensor, used to break down indices and values into per-batch indices and values.
`indices`	Flattened tensor of total size = \sum lengths, containing the indices
`values`	Data tensor, dimension has to match `indices`
`output_shape_inference`	Optional, a dense tensor whose shape defines the output shape
Outputs
`dense`	2-D dense tensor, with 1st dim = len(lengths), 2nd dim = dense_last_dim in the arg list; the tensor is of the same data type as `values`. Missing values are filled with default_value
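
The scatter step can be sketched in plain Python: start from a len(lengths) x dense_last_dim grid of default_value and write each batch's values at its indices. The helper `batch_sparse_to_dense` is hypothetical, not the Caffe2 API, and the output_shape_inference input is not modeled.

```python
# Hypothetical pure-Python reference for BatchSparseToDense; not the Caffe2 kernel.
def batch_sparse_to_dense(lengths, indices, values, dense_last_dim, default_value=0):
    """Scatter each batch's values into its row of a default-filled dense matrix."""
    dense = [[default_value] * dense_last_dim for _ in lengths]
    offset = 0
    for i, n in enumerate(lengths):
        for idx, v in zip(indices[offset:offset + n], values[offset:offset + n]):
            dense[i][idx] = v
        offset += n
    return dense


result = batch_sparse_to_dense([2, 3, 1], [0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11], 6)
print(result)  # matches the output in the example above
```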

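BooleanUnmask

Given a series of masks and values, reconstructs the values according to the masks: the values paired with each mask fill, in order, the positions where that mask is True. For example, given:

  mask1   = True, False, True, False, False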
  values1 = 1.0, 3.0
  mask2   = False, True, False, False, False
  values2 = 2.0
  mask3   = False, False, False, True, True
  values3 = 4.0, 5.0

Reconstruct by:

  output = net.BooleanUnmask([mask1, values1, mask2, values2, mask3, values3], ["output"])

We get:

  output = 1.0, 2.0, 3.0, 4.0, 5.0

Note that every position must be True in at least one mask. If a position is True in multiple masks, the value from the first such mask is used. For example:

Example 1:

  mask1   = True, False
  values1 = 1.0
  mask2   = False, False
  values2 =

This is not allowed:

  output = net.BooleanUnmask([mask1, values1, mask2, values2], ["output"])

Example 2:

  mask1   = True, False
  values1 = 1.0
  mask2   = True, True
  values2 = 2.0, 2.0

  output = net.BooleanUnmask([mask1, values1, mask2, values2], ["output"])

We get:

  output = 1.0, 2.0

Interface

Outputs
`unmasked_data`	The final reconstructed unmasked data
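
The reconstruction rule can be sketched in plain Python with (mask, values) pairs given as lists. The helper `boolean_unmask` is hypothetical, not the Caffe2 API. The first-mask-wins rule for overlapping True positions follows Example 2 above, and an uncovered position raises an error, as in Example 1.

```python
# Hypothetical pure-Python reference for BooleanUnmask; not the Caffe2 kernel.
def boolean_unmask(pairs):
    """pairs: list of (mask, values); values fill, in order, the True slots of their mask."""
    size = len(pairs[0][0])
    out = [None] * size
    for mask, values in pairs:
        it = iter(values)
        for pos, flag in enumerate(mask):
            if flag:
                v = next(it)
                if out[pos] is None:   # keep the value from the first mask covering pos
                    out[pos] = v
    if any(v is None for v in out):
        raise ValueError("every position must be True in at least one mask")
    return out


print(boolean_unmask([
    ([True, False, True, False, False], [1.0, 3.0]),
    ([False, True, False, False, False], [2.0]),
    ([False, False, False, True, True], [4.0, 5.0]),
]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```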

Code

caffe2/operators/boolean_unmask_ops.cc

BoxWithNMSLimit

Apply NMS to each class (except background) and limit the number of returned boxes.

Interface

Arguments
`score_thresh`	(float) TEST.SCORE_THRESH
`nms`	(float) TEST.NMS
`detections_per_im`	(int) TEST.DETECTIONS_PER_IM
`soft_nms_enabled`	(bool) TEST.SOFT_NMS.ENABLED
`soft_nms_method`	(string) TEST.SOFT_NMS.METHOD
`soft_nms_sigma`	(float) TEST.SOFT_NMS.SIGMA
`soft_nms_min_score_thres`	(float) Lower bound on updated scores to discard boxes
Inputs
`scores`	Scores, size (count, num_classes)
`boxes`	Bounding box for each class, size (count, num_classes * 4)
`batch_splits`	Tensor of shape (batch_size) with each element denoting the number of RoIs/boxes belonging to the corresponding image in batch. Sum should add up to total count of scores/boxes.
Outputs
`scores`	Filtered scores, size (n)
`boxes`	Filtered boxes, size (n, 4)
`classes`	Class id for each filtered score/box, size (n)
`batch_splits`	Output batch splits for scores/boxes after applying NMS
`keeps`	Optional filtered indices, size (n)
`keeps_size`	Optional number of filtered indices per class, size (num_classes)

Code

caffe2/operators/box_with_nms_limit_op.cc

Broadcast

Does a broadcast operation from the root node to every other node. The tensor on each node should have been pre-created with the same shape and data type.

Interface

Arguments
`root`	(int, default 0) the root to run broadcast from.
Inputs
`comm_world`	The common world.
`X`	A tensor to be broadcasted.
Outputs
`X`	In-place as input 1.

Code

caffe2/operators/cast_op.cc

Cast

The operator casts the elements of a given input tensor to a data type specified by the ‘to’ argument and returns an output tensor of the same size in the converted type. The ‘to’ argument must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message. If the ‘to’ argument is not provided or is not one of the enumerated types in DataType, Caffe2 throws an Enforce error. NOTE: Casting to and from strings is not supported yet.

Interface

Arguments
`to`	The data type to which the elements of the input tensor are cast. Must be one of the types from the DataType enum in TensorProto.
Inputs
`input`	Input tensor to be cast.
Outputs
`output`	Output tensor with the same shape as input with type specified by the ‘to’ argument

Code

Ceil

Ceil takes one input data (Tensor) and produces one output data (Tensor) where the ceil function, y = ceil(x), is applied to the tensor elementwise. Currently supports only float32.

Interface

Inputs
`X`	ND input tensor
Outputs
`Y`	ND output tensor

Code

caffe2/operators/ceil_op.cc

ChannelBackpropStats

Given an input tensor in NCHW format, the gradient for the output of SpatialBN, and the per-channel mean and inverse std var vectors for the input, computes the per-channel bias and scale gradient to be used during the backward pass for subsequent spatial batch normalization gradient calculation. Typically, the results of this op are subsequently reduced over multiple devices to obtain statistics over a larger batch size in cases where the batch size for a single model copy is too low to yield the full benefit of batch normalization. The resulting bias and scale can then be plugged back into SpatialBNGradient to get results over the larger batch size.

Interface

Inputs
`X`	The input 4-dimensional tensor of shape NCHW
`mean`	The mean saved from the forward pass as a 1-dimensional tensor of size C.
`inv_std`	The saved inverse standard deviation as a 1-dimensional tensor of size C.
`output_grad`	Gradient for the output layer of SpatialBN, here used as input because we are on the backward pass
Outputs
`scale_grad`	Gradient for the scale vector
`bias_grad`	Gradient for the bias vector

Code

caffe2/operators/clip_op.cc

ClipTensorByScaling

Clips the input tensor by scaling based on the input value and the threshold. The value is usually the (pre-computed) norm of the tensor. If the value is larger than the threshold, scaling is performed as follows:

    tensor *= (threshold / value)

An optional input called additional_threshold can be provided, which will scale the original threshold before it is used. That is, the final threshold will become threshold * additional_threshold. This op could be used for gradient clipping.

Interface

Arguments
`threshold`	Threshold to determine whether to scale down the tensor
Inputs
`input_tensor`	Tensor of floats to be clipped.
`val`	Value to be compared against the threshold
`additional_threshold`	An optional additional threshold to scale the original threshold
Outputs
`clipped`	Tensor of floats, which is the same size as the input tensor, representing the clipped tensor.
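
A plain-Python sketch of the clipping rule, used here for gradient clipping with the L2 norm as the value. The helper `clip_tensor_by_scaling` is hypothetical, not the Caffe2 API; computing the norm outside the op mirrors the "pre-computed" value described above.

```python
import math


# Hypothetical pure-Python reference for ClipTensorByScaling; not the Caffe2 kernel.
def clip_tensor_by_scaling(tensor, val, threshold, additional_threshold=None):
    """Scale tensor by threshold / val when val exceeds the final threshold."""
    if additional_threshold is not None:
        threshold = threshold * additional_threshold   # final threshold
    if val > threshold:
        return [x * (threshold / val) for x in tensor]
    return list(tensor)                                 # below threshold: unchanged


grad = [3.0, 4.0]
norm = math.sqrt(sum(g * g for g in grad))  # 5.0
clipped = clip_tensor_by_scaling(grad, norm, threshold=1.0)
print(clipped)  # approximately [0.6, 0.8], i.e. rescaled to norm 1.0
```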

Code

caffe2/sgd/clip_tensor_op.cc

CloneCommonWorld

Clones existing common world.

Interface

Inputs
`existing_comm_world`	Existing common world to clone.
Outputs
`comm_world`	A common world for collective operations.

Interface

Inputs
`input`	The input CPU tensor.
Outputs
`output`	either a TensorCUDA or a TensorCPU

Code

CopyOnDeviceLike

Copy the input tensor into the output, on the device of the dst tensor.

Interface

Inputs
`input`	The input tensor.
`dst`	Tensor on whose device the copy will be performed.
Outputs
`output`	Tensor that will contain a copy of the input.

Code

caffe2/operators/cos_op.cc

Cos

Calculates the cosine of the given input tensor, element-wise.

Interface

Inputs
`input`	Input tensor
Outputs
`output`	The cosine of the input tensor computed element-wise

Code

CosGradient

No documentation yet.

Code

caffe2/operators/cos_op.cc

CosineEmbeddingCriterion

CosineEmbeddingCriterion takes two inputs: the similarity value and the label, and computes the elementwise criterion output as

  output = 1 - s,               if y == 1
           max(0, s - margin),  if y == -1

Interface

Inputs
`S`	The cosine similarity as a 1-dim TensorCPU.
`Y`	The label as a 1-dim TensorCPU with int value of 1 or -1.
Outputs
`loss`	The output loss with the same dimensionality as S.
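
The criterion can be sketched element-wise in plain Python. The helper name is hypothetical, not the Caffe2 API, and because the margin argument is not documented in this interface, the default of 0.0 used here is an assumption.

```python
# Hypothetical pure-Python reference for CosineEmbeddingCriterion; not the Caffe2 kernel.
def cosine_embedding_criterion(s, y, margin=0.0):
    """Per element: 1 - s when y == 1, max(0, s - margin) when y == -1."""
    return [1 - si if yi == 1 else max(0.0, si - margin) for si, yi in zip(s, y)]


loss = cosine_embedding_criterion([0.9, 0.3, -0.2], [1, -1, -1], margin=0.1)
print(loss)  # approximately [0.1, 0.2, 0.0]
```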

Code

caffe2/operators/text_file_reader.cc

CreateTreeCursor

Creates a cursor to iterate through a list of tensors, where some of those tensors contain the lengths in a nested schema. The schema is determined by the fields argument. For example, to represent the following schema:

  Struct(
      a=Int(),
      b=List(List(Int)),
      c=List(
          Struct(
              c1=String,
              c2=List(Int),
          ),
      ),
  )

the field list will be:

  [
      "a",
      "b:lengths",
      "b:values:lengths",
      "b:values:values",
      "c:lengths",
      "c:c1",
      "c:c2:lengths",
      "c:c2:values",
  ]

And for the following instance of the struct:

  Struct(
      a=3,
      b=[[4, 5], [6, 7, 8], [], [9]],
      c=[
          Struct(c1='alex', c2=[10, 11]),
          Struct(c1='bob', c2=[12]),
      ],
  )

The values of the fields will be:

  {
      "a": [3],
      "b:lengths": [4],
      "b:values:lengths": [2, 3, 0, 1],
      "b:values:values": [4, 5, 6, 7, 8, 9],
      "c:lengths": [2],
      "c:c1": ["alex", "bob"],
      "c:c2:lengths": [2, 1],
      "c:c2:values": [10, 11, 12],
  }

In general, every field name in the format “{prefix}:lengths” defines a domain “{prefix}”, and every subsequent field in the format “{prefix}:{field}” belongs to that domain; the length of the domain is provided for each entry of the parent domain. In the example, “b:lengths” defines a domain of length 4, so every field under domain “b” will have 4 entries. The “lengths” field for a given domain must appear before any reference to that domain. Returns a pointer to an instance of the Cursor, which keeps the current offset on each of the domains defined by fields. Cursor also ensures thread-safety such that ReadNextBatch and ResetCursor can be used safely in parallel. A cursor does not contain data per se, so calls to ReadNextBatch actually need to pass a list of blobs containing the data to read for each one of the fields.

Interface

Arguments
`fields`	A list of strings each one representing a field of the dataset.
Outputs
`cursor`	A blob pointing to an instance of a new TreeCursor.

Code

caffe2/queue/queue_ops.cc

CrossEntropy

Operator computes the cross entropy between the input and the label set. In practice, it is most commonly used at the end of models, after the SoftMax operator and before the AveragedLoss operator. Note that CrossEntropy assumes that the soft labels provided are a 2D array of size N x D (batch size x number of classes). Each entry in the 2D label corresponds to the soft label for the input, where each element represents the correct probability of the class being selected. As such, each element must be between 0 and 1, and all elements in an entry must sum to 1. The formula used is:

                Y[i] = -sum_j (label[i][j] * log(X[i][j]))

where X[i][j] is the classifier’s predicted probability of class j for sample i, label[i][j] is the corresponding soft label, and i ranges over the batch. Each log has a lower limit for numerical stability.

Interface

Inputs
`X`	Input blob from the previous layer, which is almost always the result of a softmax operation; X is a 2D array of size N x D, where N is the batch size and D is the number of classes
`label`	Blob containing the labels used to compare the input
Outputs
`Y`	Output blob after the cross entropy computation
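
The per-sample loss can be sketched in plain Python. The leading minus sign makes the loss non-negative for probabilities in [0, 1]. The helper `cross_entropy` is hypothetical, not the Caffe2 API, and `log_floor=1e-20` is an assumed stand-in for the kernel's lower limit on the log argument.

```python
import math


# Hypothetical pure-Python reference for CrossEntropy; not the Caffe2 kernel.
def cross_entropy(x, label, log_floor=1e-20):
    """Per-sample Y[i] = -sum_j label[i][j] * log(max(X[i][j], log_floor))."""
    return [
        -sum(l * math.log(max(p, log_floor)) for p, l in zip(row, label_row))
        for row, label_row in zip(x, label)
    ]


probs = [[0.25, 0.75], [0.5, 0.5]]
soft_labels = [[0.0, 1.0], [1.0, 0.0]]
print(cross_entropy(probs, soft_labels))  # approximately [0.2877, 0.6931]
```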

DequeueBlobs

Dequeue the blobs from the queue.

Interface

Arguments
`timeout_secs`	Timeout in secs, default: no timeout
Inputs
`queue`	The shared pointer for the BlobsQueue
Outputs
`blob`	The blob to store the dequeued data

Code

DequeueRebatchingQueue

Dequeue Tensors from the Queue. If the Queue is closed, this might return fewer elements than requested. If num_elements > 1, the returned elements will be concatenated into one tensor per component.

Interface

Arguments
`num_elements`	Number of elements to dequeue. By default we dequeue one element.
Inputs
`rebatching_queue`	object representing the queue
Outputs
`tensor`	First dequeued tensor

Code

caffe2/queue/rebatching_queue_ops.cc

DestroyCommonWorld

Closes all connections managed by a common world.

Interface

Inputs
`common_world`	The common world to be destroyed.

Code

caffe2/operators/filler_op.cc

DiagonalFill

The operator fills the diagonal elements of the output tensor (>= 2D) with a constant value specified by the ‘value’ argument, and all other elements with 0. If the output tensor has more than 2 dimensions, all dimensions must be equal. The data type is specified by the ‘dtype’ argument, which must be one of the data types specified in the ‘DataType’ enum field in the TensorProto message. If the ‘dtype’ argument is not provided, the data type of ‘value’ is used. The output tensor shape is specified by the ‘shape’ argument. If there is one input, the shape will be identical to that of the input at run time, with optional additional dimensions appended at the end as specified by the ‘extra_shape’ argument; in that case the ‘shape’ argument should not be set. If input_as_shape is set to true, the input should be a 1D tensor containing the desired output shape (the dimensions specified in extra_shape will also be appended). NOTE: Currently, it supports the data types float, int32, int64, and bool.

Interface

Arguments
`value`	The value for the elements of the output tensor.
`dtype`	The data type for the elements of the output tensor. Must be one of the types from the DataType enum in TensorProto.
`shape`	The shape of the output tensor. Cannot set the shape argument and pass in an input at the same time.
`extra_shape`	The additional dimensions appended at the end of the shape indicated by the input blob. Cannot set the extra_shape argument when there is no input blob.
`input_as_shape`	If true, the input is a 1D tensor containing the desired output shape
Inputs
`input`	Input tensor (optional) to provide shape information.
Outputs
`output`	Output tensor, whose data type is specified by the ‘dtype’ argument
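
The diagonal layout can be sketched on a flat row-major buffer: stepping from (i, …, i) to (i+1, …, i+1) advances by the sum of all strides, and there are min(shape) diagonal elements. The helper `diagonal_fill` is hypothetical, not the Caffe2 API, and dtype handling is not modeled.

```python
from math import prod


# Hypothetical reference for DiagonalFill on a flat row-major buffer; not the Caffe2 kernel.
def diagonal_fill(shape, value):
    """Return a flat list with `value` at positions (i, i, ..., i) and 0 elsewhere."""
    if len(shape) < 2:
        raise ValueError("output must be at least 2-D")
    if len(shape) > 2 and len(set(shape)) != 1:
        raise ValueError("with more than 2 dims, all dims must be equal")
    out = [0] * prod(shape)
    # Sum of the row-major strides = offset between consecutive diagonal elements.
    step = sum(prod(shape[k + 1:]) for k in range(len(shape)))
    for i in range(min(shape)):
        out[i * step] = value
    return out


print(diagonal_fill((3, 4), 2))  # [2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0]
```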

Code

Div

Performs element-wise binary division (with limited broadcast support). If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The start of the matching dimensions is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions and type as A

EQ

Performs element-wise equality comparison == (with limited broadcast support). If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The start of the matching dimensions is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2,), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, with the same dimensions as A and type `bool`

Code

ElementwiseLinear

Given inputs X of size (N x D), w of size D and b of size D, the op computes Y of size (N X D) where Y_{nd} = X_{nd} * w_d + b_d
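For the default axis=1 case, the formula above is a per-column scale and shift applied to every row. Here is a minimal NumPy sketch of that computation. The function name is illustrative, not the operator's implementation.

```python
import numpy as np


def elementwise_linear(x, w, b):
    """Y[n, d] = X[n, d] * w[d] + b[d] for a 2-D X of shape (N, D)."""
    x = np.asarray(x, dtype=np.float32)
    w = np.asarray(w, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if x.ndim != 2 or w.shape != (x.shape[1],) or b.shape != (x.shape[1],):
        raise ValueError("expected X of shape (N, D) and w, b of shape (D,)")
    # w and b are broadcast across the N rows.
    return x * w + b


Y = elementwise_linear([[1, 2], [3, 4]], [10, 100], [0.5, -1])
```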

Interface

Arguments
`axis`	Defaults to 1. Describes the axis of the inputs; the default is 1 because the 0th axis most likely describes the batch size
Inputs
`X`	2D input tensor of size (N X D) data
`w`	1D scaling factors of size D
`b`	1D biases of size D
Outputs
`Y`	2D output tensor

Code

EnsureDense

This operator converts dense or sparse gradients to dense ones. Therefore, a sparse gradient can be back-propagated to operators that consume dense gradients only (e.g., FCGradient). The operator’s behavior:

  - In forward, it simply passes the input to the output in place, or copies it.
  - In backward, if the passed-in gradient is sparse, it is converted to a dense gradient in linear time; otherwise, the dense gradient is simply passed through.

Interface

Inputs
`input`	Input tensors.
Outputs
`output`	Output tensor. Same dimension as inputs.

Code

caffe2/operators/exp_op.cc

Exp

Calculates the exponential of the given input tensor, element-wise. This operation can be done in an in-place fashion too, by providing the same input and output blobs.

Interface

Inputs
`input`	Input tensor
Outputs
`output`	The exponential of the input tensor computed element-wise

Code

ExpandDims

Inserts single-dimensional entries into the shape of a tensor. Takes one required argument, dims, a list of dimensions that will be inserted. Dimension indices in dims are as seen in the output tensor. For example:

  Given a tensor such that tensor.Shape() = [3, 4, 5], then
  ExpandDims(tensor, dims=[0, 4]).Shape() == [1, 3, 4, 5, 1]

If the same blob is provided in input and output, the operation is copy-free.
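Because the indices in dims refer to positions in the output tensor, the output shape can be built by inserting a 1 at each requested position in increasing order. This is a small sketch of that shape computation. The helper name `expand_dims_shape` is hypothetical.

```python
import numpy as np


def expand_dims_shape(shape, dims):
    """Return the output shape of ExpandDims for an input of the given shape."""
    out = list(shape)
    # Insert in ascending order so each index refers to the final output layout.
    for d in sorted(dims):
        if d < 0 or d > len(out):
            raise ValueError(f"dimension {d} out of range")
        out.insert(d, 1)
    return out


tensor = np.zeros((3, 4, 5))
expanded = tensor.reshape(expand_dims_shape(tensor.shape, [0, 4]))
print(expanded.shape)  # (1, 3, 4, 5, 1)
```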

Interface

Inputs
`data`	Original tensor
Outputs
`expanded`	Reshaped tensor with same data as input.

Code

caffe2/operators/expand_squeeze_dims_op.cc

ExtendTensor

Extends input 0 if necessary, based on the max element in input 1. Input 0 must be the same blob as the output; that is, the operation is required to be in-place. Input 0 may have to be reallocated to accommodate the new size. Currently, an exponential growth ratio is used to ensure amortized constant time complexity. All dimensions except the outer-most one must be the same between input 0 and 1.

Interface

Inputs
`tensor`	The tensor to be extended.
`new_indices`	The size of tensor will be extended based on max element in new_indices.
Outputs
`extended_tensor`	Same as input 0, representing the mutated tensor.

Code

caffe2/operators/extend_tensor_op.cc

FC

Computes the result of passing an input tensor X into a fully connected layer with 2D weight matrix W and 1D bias vector b. That is, the layer computes Y = X * W^T + b, where X has size (M x K), W has size (N x K), b has size (N), and Y has size (M x N), where M is often the batch size. NOTE: X does not need to be explicitly 2D; rather, it will be coerced into a 2D matrix. For an arbitrary n-dimensional tensor X \in [a_0, a_1, …, a_{k-1}, a_k, …, a_{n-1}], where a_i \in N+ and k is the axis provided, X will be coerced into a 2-dimensional tensor with dimensions [a_0 * … * a_{k-1}, a_k * … * a_{n-1}]. For the default case where axis=1, this means X will be coerced into a 2D tensor of dimensions [a_0, a_1 * … * a_{n-1}], where a_0 is often the batch size. In this situation, we must have a_0 = M and a_1 * … * a_{n-1} = K. Lastly, even though b is a 1D vector of size N, it is implicitly broadcast to size (M x N), i.e. added to each row of the batch. Each of these dimensions must match correctly, or the operator will throw an error.
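The coercion and product described above can be sketched in NumPy as follows. This assumes W is stored as (N x K), consistent with Y = X * W^T + b, and applies the same flattening rule to W at `axis_w`. The function name is hypothetical; this is not the operator's kernel.

```python
import numpy as np
from math import prod


def fully_connected(x, w, b, axis=1, axis_w=1):
    """Y = X2 @ W2.T + b, where X2 and W2 are the 2-D coercions of X and W."""
    x, w, b = np.asarray(x), np.asarray(w), np.asarray(b)
    m, k = prod(x.shape[:axis]), prod(x.shape[axis:])
    n, k_w = prod(w.shape[:axis_w]), prod(w.shape[axis_w:])
    if k != k_w or b.shape != (n,):
        raise ValueError(f"incompatible shapes: X->({m}, {k}), W->({n}, {k_w}), b {b.shape}")
    x2 = x.reshape(m, k)  # (M x K)
    w2 = w.reshape(n, k)  # (N x K)
    return x2 @ w2.T + b  # b is broadcast to every row -> (M x N)


X = np.arange(24, dtype=np.float64).reshape(2, 3, 4)  # coerced to (2 x 12)
W = np.ones((5, 12))
b = np.arange(5, dtype=np.float64)
Y = fully_connected(X, W, b)
print(Y.shape)  # (2, 5)
```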

Interface

Arguments
`axis`	(int32_t) Defaults to 1. Describes the axis of the inputs; the default is 1 because the 0th axis most likely describes the batch size
`axis_w`	(int32_t) Defaults to 1. Describes the axis of the weight matrix W; the default is 1 because the 0th axis of W most likely describes the number of outputs N
`float16_compute`	Whether to use float-16 compute kernel
Inputs
`X`	input tensor that’s coerced into a 2D matrix of size (MxK) as described above
`W`	A tensor that is coerced into a 2D blob of size (N x K) containing the fully connected weight matrix
`b`	1D blob containing bias vector
Outputs
`Y`	2D output tensor

  elements, flatten indices from the input tensor).

These two outputs should be used together with the input K to determine which indices in X were picked. When two values are equal, this operator uses their indices along the last dimension as a tiebreaker: the element with the lower index appears first.
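This pure-Python sketch shows the selection and tie-breaking rule. It assumes X has already been viewed as rows of length r (one row per leading index) and that K gives one count per row. The flat index of element j in row i is taken to be i * r + j, which is one reading of "indices into the flatten input". The function name is hypothetical.

```python
def flexible_top_k(rows, k):
    """Return (flat_values, flat_indices) for the top k[i] entries of each row."""
    values, indices = [], []
    for i, (row, count) in enumerate(zip(rows, k)):
        r = len(row)
        # Sort by value descending; on ties, the lower index comes first.
        order = sorted(range(r), key=lambda j: (-row[j], j))[:count]
        for j in order:
            values.append(row[j])
            indices.append(i * r + j)  # index into the flattened input
    return values, indices


vals, idx = flexible_top_k([[3, 1, 3, 2], [0, 5, 4, 5]], [2, 3])
print(vals, idx)  # [3, 3, 5, 5, 4] [0, 2, 5, 7, 6]
```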

Interface

Inputs
`X`	Tensor of shape [a_1, a_2, …, a_n, r]
`K`	Tensor of shape [a_1, a_2, …, a_n, 1]
Outputs
`Flatten values`	Tensor of shape [ \sum_i K[i, 1] ] containing top K[…, 1] values from the input tensor
`Flatten indices`	Tensor of shape [ \sum_i K[i, 1] ] containing the indices into the flatten input

Code

caffe2/operators/flexible_top_k.cc

FlexibleTopKGradient

No documentation yet.

Code

caffe2/operators/flexible_top_k.cc

FloatToFused8BitRowwiseQuantized

Applies 8-bit row-wise quantization by determining the range (maximum - minimum) and offset (minimum value) of each row in the input matrix, and then scaling each element to an 8-bit number between 0 and 255. To later de-quantize values, the scale (range / 255) and offset (bias) are stored alongside the data. More precisely, each row of the output matrix stores one byte per quantized value, followed by the scale as a 32-bit float and then the bias as a 32-bit float in the last 8 bytes of the row.
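This is a NumPy sketch of the fused layout. It assumes, as the GatherFused8BitRowwise description states, that each row holds the quantized bytes first, followed by the float32 scale and bias in its last 8 bytes, stored in native byte order. Both function names are hypothetical, and the real kernel's rounding and edge-case handling may differ.

```python
import numpy as np


def float_to_fused_8bit_rowwise(x):
    """Quantize each row to uint8 and append its float32 scale and bias (8 bytes)."""
    x = np.asarray(x, dtype=np.float32)
    rows, cols = x.shape
    out = np.empty((rows, cols + 8), dtype=np.uint8)
    for i in range(rows):
        bias = float(x[i].min())
        scale = (float(x[i].max()) - bias) / 255.0
        if scale > 0:
            q = np.rint((x[i] - bias) / scale)
        else:
            q = np.zeros(cols)  # constant row: every value equals the bias
        out[i, :cols] = np.clip(q, 0, 255).astype(np.uint8)
        # Append scale and bias as raw float32 bytes (native byte order).
        out[i, cols:] = np.array([scale, bias], dtype=np.float32).view(np.uint8)
    return out


def fused_8bit_rowwise_to_float(data):
    """Inverse mapping: value = quantized * scale + bias, row by row."""
    data = np.asarray(data, dtype=np.uint8)
    q = data[:, :-8].astype(np.float32)
    scale_bias = np.ascontiguousarray(data[:, -8:]).view(np.float32)  # (rows, 2)
    return q * scale_bias[:, :1] + scale_bias[:, 1:]


x = np.array([[0.0, 1.0, 2.0, 3.0], [-1.0, -1.0, -1.0, -1.0]], dtype=np.float32)
packed = float_to_fused_8bit_rowwise(x)         # shape (2, 4 + 8)
restored = fused_8bit_rowwise_to_float(packed)  # approximately x
```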

Interface

Inputs
`input`	Float32 input data
Outputs
`output`	Fused scale, bias and quantized data

Code

caffe2/operators/fused_rowwise_8bit_conversion_ops.cc

FloatToRowwiseQuantized8Bits

This operator applies 8-bit row-wise quantization to the input tensor and returns the quantized tensor. Row-wise quantization works as follows. A tensor of size (m_1, m_2, …, m_n), n >= 2, is reshaped into a matrix of size (m_1, m_2 x … x m_n), and each row is quantized separately. For the i-th row r_i of the reshaped matrix, we compute

  scale_i = (max_i - min_i) / 255
  bias_i = min_i

where min_i and max_i are the minimum and maximum elements of the i-th row, and quantize each element r_{ij} as

  q_{ij} = round((r_{ij} - bias_i) / scale_i), with 0 <= q_{ij} < 256.

Instead of the input tensor, we obtain a uint8 tensor together with the auxiliary scale and bias information needed to restore the input tensor (lossily).
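Here is a NumPy sketch of the steps above. It assumes the quantized output keeps the input's shape and that scale_bias has one (s_i, b_i) row per reshaped row. Both function names are hypothetical, and constant rows (scale 0) are mapped to 0 as an assumption.

```python
import numpy as np


def float_to_rowwise_quantized_8bits(x):
    """Return (quantized uint8 tensor, scale_bias matrix with rows (s_i, b_i))."""
    x = np.asarray(x, dtype=np.float32)
    if x.ndim < 2:
        raise ValueError("input must have rank >= 2")
    m = x.reshape(x.shape[0], -1)           # (m_1, m_2 x ... x m_n)
    bias = m.min(axis=1)                    # bias_i = min_i
    scale = (m.max(axis=1) - bias) / 255.0  # scale_i = (max_i - min_i) / 255
    safe = np.where(scale > 0, scale, 1.0)  # avoid 0/0 on constant rows
    q = np.rint((m - bias[:, None]) / safe[:, None])
    quantized = np.clip(q, 0, 255).astype(np.uint8).reshape(x.shape)
    return quantized, np.stack([scale, bias], axis=1)


def restore(quantized, scale_bias):
    """Lossy reconstruction: r_ij ~= q_ij * scale_i + bias_i."""
    q = quantized.reshape(quantized.shape[0], -1).astype(np.float32)
    return (q * scale_bias[:, :1] + scale_bias[:, 1:]).reshape(quantized.shape)


x = np.linspace(-1.0, 1.0, 12, dtype=np.float32).reshape(2, 2, 3)
quantized, scale_bias = float_to_rowwise_quantized_8bits(x)
```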

Interface

Inputs
`input`	Float input tensor of rank n >= 2
Outputs
`quantized_input`	Quantized uint8 tensor
`scale_bias`	Matrix of floats, each row r_i of which stores a pair s_i, b_i

Code

caffe2/operators/lengths_reducer_rowwise_8bit_ops.cc

Floor

Floor takes one input data (Tensor) and produces one output data (Tensor) where the floor function, y = floor(x), is applied to the tensor elementwise. Currently supports only float32.

Interface

Inputs
`X`	ND input tensor
Outputs
`Y`	ND output tensor

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, with the same dimensions as A and type `bool`

Code

caffe2/operators/gru_unit_op.cc

GRUUnit

GRUUnit computes the activations of a standard GRU in a sequence-length-aware fashion. Concretely, given the (fused) inputs X (TxNxD), the previous hidden state (NxD), and the sequence lengths (N), it computes the GRU activations, skipping computation where the input is invalid (that is, at time steps t >= seqLengths[n]).

Interface

Arguments
`drop_states`	Bool that determines whether the hidden state is zeroed or passed along for timesteps past the given sequence_length.
`sequence_lengths`	When false, the sequence lengths input is left out, and all following inputs are shifted left by one.
Outputs
`hidden`	The new GRU hidden state calculated by this op.

Code

GRUUnitGradient

No documentation yet.

Interface

Arguments
`sequence_lengths`	When false, the sequence lengths input is left out, and all following inputs are shifted left by one.

Code

caffe2/operators/gru_unit_op.cc

GT

Performs element-wise greater than comparison > (with limited broadcast support). If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The starting position of the mutually equal shape is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the axis of A at which the shape of B starts to be matched; if not set, suffix matching is used.
Inputs
`A`	First operand; must have the same type as the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, with the same dimensions as A and type `bool`

Code

Gather

Given DATA tensor of rank r >= 1, and INDICES tensor of rank q, gather entries of the outer-most dimension of DATA indexed by INDICES, and concatenate them in an output tensor of rank q + (r - 1). Example:

  DATA  = [
      [1.0, 1.2],
      [2.3, 3.4],
      [4.5, 5.7],
  ]
  INDICES = [
      [0, 1],
      [1, 2],
  ]
  OUTPUT = [
      [
          [1.0, 1.2],
          [2.3, 3.4],
      ],
      [
          [2.3, 3.4],
          [4.5, 5.7],
      ],
  ]
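Gather's semantics on the outer-most dimension match NumPy's `take` along axis 0, which makes it a convenient way to check expected output shapes. This is an analogy for illustration, not the Caffe2 implementation.

```python
import numpy as np

DATA = np.array([[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]])
INDICES = np.array([[0, 1], [1, 2]])

# Gather along the outer-most dimension: OUTPUT has rank q + (r - 1).
OUTPUT = np.take(DATA, INDICES, axis=0)  # same result as DATA[INDICES]
print(OUTPUT.shape)  # (2, 2, 2)
```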

Interface

Inputs
`DATA`	Tensor of rank r >= 1.
`INDICES`	Tensor of int32/int64 indices, of any rank q.
Outputs
`OUTPUT`	Tensor of rank q + (r - 1).

Code

caffe2/operators/partition_ops.cc

GatherByKey

Inverse operation of Partition. Takes the original, full ‘keys’ tensor followed by sharded value tensors, and returns the full value tensor, combined using the same hash used in Partition.

Interface

Inputs
`keys`	The first input is the full keys tensor (same as the first input of Partition).
`sharded_values`	Subsequent inputs are sharded value tensors.
Outputs
`values`	Reconstructed values tensor.

Code

GatherFused8BitRowwise

Perform the same operation as Gather, but operating on 8-bit rowwise quantized matrices with fused storage (where each row stores quantized values, and then the scale and offset). DATA needs to have rank 2 and INDICES needs to have rank 1.

Interface

Inputs
`DATA`	uint8 tensor with rank 2 obtained with operator FloatToFused8BitRowwiseQuantized
`INDICES`	Integer vector containing indices of the first dimension of DATA for the rows that are being gathered
Outputs
`OUTPUT`	Gathered rows, in the same fused 8-bit rowwise format as DATA

Code

caffe2/operators/gather_fused_8bit_rowwise_op.cc

GatherPadding

Gather the sum of start and end paddings in a padded input sequence. Used in order to compute the gradients of AddPadding w.r.t the padding tensors.

Interface

Arguments
`padding_width`	Outer-size of padding present around each range.
`end_padding_width`	(Optional) Specifies a different end-padding width.
Inputs
`data_in`	T<N, D1…, Dn> Padded input data
`lengths`	(i64) Num of elements in each range. sum(lengths) = N. If not provided, considers all data as a single segment.
Outputs
`padding_sum`	Sum of all start paddings, or of all paddings if end_padding_sum is not provided.
`end_padding_sum`	T<D1…, Dn> Sum of all end paddings, if provided.

Code

caffe2/operators/sequence_ops.cc

GatherRanges

Given a DATA tensor of rank 1 and a RANGES tensor of rank 3, gathers the corresponding ranges into a 1-D tensor OUTPUT. RANGES dimensions:

  1: the list of examples within a batch
  2: the list of features
  3: two values, the start and length of a range (to be applied on DATA)

A second output, LENGTHS, gives the length of each example within OUTPUT. Example:

  DATA  = [1, 2, 3, 4, 5, 6]
  RANGES = [
    [
      [0, 1],
      [2, 2],
    ],
    [
      [4, 1],
      [5, 1],
    ]
  ]
  OUTPUT = [1, 3, 4, 5, 6]
  LENGTHS = [3, 2]
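The example above can be reproduced with a short pure-Python sketch that walks the three RANGES dimensions in order. The function name is hypothetical.

```python
def gather_ranges(data, ranges):
    """Concatenate data[start:start + length] for every range; also return per-example lengths."""
    output, lengths = [], []
    for example in ranges:             # dim 1: examples in the batch
        total = 0
        for start, length in example:  # dim 2: features; dim 3: (start, length)
            output.extend(data[start:start + length])
            total += length
        lengths.append(total)
    return output, lengths


DATA = [1, 2, 3, 4, 5, 6]
RANGES = [[[0, 1], [2, 2]], [[4, 1], [5, 1]]]
print(gather_ranges(DATA, RANGES))  # ([1, 3, 4, 5, 6], [3, 2])
```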

Interface

Inputs
`DATA`	Tensor of rank 1.
`RANGES`	Tensor of int32/int64 ranges, of dims (N, M, 2), where N is the number of examples and M is the number of ranges (features) per example. The last dimension represents a range in the format (start, length)
Outputs
`OUTPUT`	1-D tensor of size sum of range lengths
`LENGTHS`	1-D tensor of size N with lengths over gathered data for each row in a batch. sum(LENGTHS) == OUTPUT.size()

Code

caffe2/operators/h_softmax_op.cc

GatherRangesToDense

Given a DATA tensor of rank 1 and a RANGES tensor of rank 3, gathers the values corresponding to each range into a separate output tensor. If the optional input KEY tensor is also given, the output is sorted by KEY for each example. RANGES dimensions:

  1: the list of examples within a batch
  2: the list of features
  3: two values, the start and length of a range (to be applied on DATA)

Each feature has a fixed length, passed through the lengths argument, and a separate output tensor is produced for each feature, i.e. RANGES.dim(1) = len(lengths) = NumOutputs. Missing features (represented by empty ranges) are filled with default_value. Example 1:

  DATA  = [1, 2, 3, 4, 5, 6, 7, 8]
  RANGES = [
    [
      [2, 4],
      [0, 2],
    ],
    [
      [0, 0],
      [6, 2],
    ]
  ]
  lengths = [4, 2]
  OUTPUT[0] = [[3, 4, 5, 6], [0, 0, 0, 0]]
  OUTPUT[1] = [[1, 2], [7, 8]]

Example 2 (with KEY):

  DATA  = [1, 2, 3, 4, 5, 6, 7, 8]
  KEY   = [0, 1, 3, 2, 1, 0, 1, 0]
  RANGES = [
    [
      [2, 4],
      [0, 2],
    ],
    [
      [0, 0],
      [6, 2],
    ]
  ]
  lengths = [4, 2]
  OUTPUT[0] = [[6, 5, 4, 3], [0, 0, 0, 0]]
  OUTPUT[1] = [[1, 2], [8, 7]]

Contrast Example 2 with Example 1: for each data point per feature, the values are sorted by the corresponding KEY.
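Both examples can be reproduced with the following pure-Python sketch. It assumes default_value is 0 (the value used in the examples), that KEY ordering is ascending with ties kept in data order, and that a non-empty range whose length differs from the expected feature length is an error. The function name and the `keys`/`default_value` parameters are hypothetical.

```python
def gather_ranges_to_dense(data, ranges, lengths, keys=None, default_value=0):
    """Return one list of rows per feature; rows are sorted by keys when given."""
    outputs = [[] for _ in lengths]
    for example in ranges:
        for f, (start, length) in enumerate(example):
            if length == 0:
                # Missing feature: fill with default_value.
                outputs[f].append([default_value] * lengths[f])
                continue
            if length != lengths[f]:
                raise ValueError(f"feature {f}: expected length {lengths[f]}, got {length}")
            positions = list(range(start, start + length))
            if keys is not None:
                positions.sort(key=lambda p: keys[p])  # stable sort by KEY
            outputs[f].append([data[p] for p in positions])
    return outputs


DATA = [1, 2, 3, 4, 5, 6, 7, 8]
KEY = [0, 1, 3, 2, 1, 0, 1, 0]
RANGES = [[[2, 4], [0, 2]], [[0, 0], [6, 2]]]
print(gather_ranges_to_dense(DATA, RANGES, [4, 2]))
# [[[3, 4, 5, 6], [0, 0, 0, 0]], [[1, 2], [7, 8]]]
print(gather_ranges_to_dense(DATA, RANGES, [4, 2], keys=KEY))
# [[[6, 5, 4, 3], [0, 0, 0, 0]], [[1, 2], [8, 7]]]
```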

Interface

Arguments
`lengths`	Expected lengths for ranges
Inputs
`DATA`	Tensor of rank 1.
`RANGES`	Tensor of int32/int64 ranges, of dims (N, M, 2), where N is the number of examples and M is the number of ranges (features) per example. The last dimension represents a range in the format (start, length)
`KEY`	Tensor of rank 1 and type int64.
Outputs
`OUTPUT`	One tensor per feature; the j-th output has shape (N, lengths[j])

Interface

Arguments
`num_classes`	The number of classes used to build the hierarchy.
Inputs
`Labels`	The labels vector
Outputs
`Hierarch`	Huffman coding hierarchy of the labels

Code

If

The ‘If’ control operator. Its first input is a scalar boolean blob that stores the condition value. It accepts ‘then_net’ (required) and ‘else_net’ (optional) arguments for the ‘then’ and ‘else’ subnets, respectively. Subnets are executed in the same workspace as ‘If’.

Interface

Arguments
`then_net`	Net executed when condition is true
`else_net`	Net executed when condition is false (optional)
Inputs
`condition`	Scalar boolean condition

Code

caffe2/operators/if_op.cc

Im2Col

The Im2Col operator from Matlab.

Interface

Inputs
`X`	4-tensor in NCHW or NHWC.
Outputs
`Y`	4-tensor. For NCHW: N x (C x kH x kW) x outH x outW. For NHWC: N x outH x outW x (kH x kW x C).
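The NCHW output layout above can be illustrated with a NumPy sketch. The interface does not list kernel or stride arguments, so `kernel_h`, `kernel_w`, and `stride` here are hypothetical parameters. The sketch also assumes no padding or dilation, and orders the second output dimension channel-major (C, then kH, then kW) to match "C x kH x kW".

```python
import numpy as np


def im2col_nchw(x, kernel_h, kernel_w, stride=1):
    """Unfold NCHW patches into N x (C*kH*kW) x outH x outW (no padding, no dilation)."""
    x = np.asarray(x)
    n, c, h, w = x.shape
    out_h = (h - kernel_h) // stride + 1
    out_w = (w - kernel_w) // stride + 1
    y = np.empty((n, c * kernel_h * kernel_w, out_h, out_w), dtype=x.dtype)
    row = 0
    # Second output dimension: channel-major, then kernel row, then kernel column.
    for ci in range(c):
        for i in range(kernel_h):
            for j in range(kernel_w):
                y[:, row] = x[:, ci,
                              i:i + stride * out_h:stride,
                              j:j + stride * out_w:stride]
                row += 1
    return y


x = np.arange(16).reshape(1, 1, 4, 4)
cols = im2col_nchw(x, 2, 2)
print(cols.shape)  # (1, 4, 3, 3)
```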

Code

caffe2/operators/im2col_op.cc

ImageInput

Imports and processes images from a database. For each run of the operator, batch_size images will be processed. GPUs can optionally be used for part of the processing. The following transformations are applied to the image:

  - A bounding box is applied to the initial image (optional)
  - The image is rescaled either up or down (with the scale argument) or
    just up (with the minsize argument)
  - The image is randomly cropped (the crop size is passed as an argument, but
    the location of the crop is random unless is_test is passed, in which case
    the image is cropped at the center)
  - The image is normalized. Each of its color channels can have separate
    normalization values

The dimensions of the output image will always be crop x crop.

Interface

Arguments
`batch_size`	Number of images to output for each run of the operator. Must be 1 or greater
`color`	Number of color channels (1 or 3). Defaults to 1
`color_jitter`	Whether or not to do color jitter. Defaults to 0
`img_saturation`	Image saturation scale used in color jittering. Defaults to 0.4
`img_brightness`	Image brightness scale used in color jittering. Defaults to 0.4
`img_contrast`	Image contrast scale used in color jittering. Defaults to 0.4
`color_lighting`	Whether or not to do color lighting. Defaults to 0
`color_lighting_std`	Std of normal distribution where color lighting scaling factor is sampled. Defaults to 0.1
`scale_jitter_type`	Type 0: No scale jittering Type 1: Inception-style scale jittering
`label_type`	Type 0: single integer label for multi-class classification. Type 1: sparse active label indices for multi-label classification. Type 2: dense label embedding vector for label embedding regression
`scale`	Scale the size of the smallest dimension of the image to this. Scale and minsize are mutually exclusive. Must be larger than crop
`minsize`	Scale the size of the smallest dimension of the image to this only if the size is initially smaller. Scale and minsize are mutually exclusive. Must be larger than crop.
`warp`	If 1, both dimensions of the image will be set to minsize or scale; otherwise, the other dimension is proportionally scaled. Defaults to 0
`crop`	Size to crop the image to. Must be provided
`mirror`	Whether or not to mirror the image. Defaults to 0
`mean`	Mean by which to normalize color channels. Defaults to 0.
`mean_per_channel`	Vector of means per color channel (1 or 3 elements). Defaults to the mean argument. Channel order is BGR
`std`	Standard deviation by which to normalize color channels. Defaults to 1.
`std_per_channel`	Vector of standard dev. per color channel (1 or 3 elements). Defaults to std argument. Channel order is BGR
`bounding_ymin`	Bounding box coordinate. Defaults to -1 (none)
`bounding_xmin`	Bounding box coordinate. Defaults to -1 (none)
`bounding_height`	Bounding box coordinate. Defaults to -1 (none)
`bounding_width`	Bounding box coordinate. Defaults to -1 (none)
`is_test`	Set to 1 to do deterministic cropping. Defaults to 0
`use_caffe_datum`	1 if the input is in Caffe format. Defaults to 0
`use_gpu_transform`	1 if GPU acceleration should be used. Defaults to 0. Can only be 1 in a CUDAContext
`decode_threads`	Number of CPU decode/transform threads. Defaults to 4
`output_type`	If use_gpu_transform is set, can be set to FLOAT or FLOAT16.
`db`	Name of the database (if not passed as input)
`db_type`	Type of database (if not passed as input). Defaults to leveldb
`output_sizes`	The sizes of any outputs besides the data and label (should have a number of elements equal to the number of additional outputs)
`random_scale`	[min, max] range for the desired shortest side when resizing the image. Defaults to [-1, -1], meaning no random resize.
Inputs
`reader`	The input reader (a db::DBReader)
Outputs
`data`	Tensor containing the images
`label`	Tensor containing the labels
`additional outputs`	Any outputs after the first 2 will be Tensors read from the input TensorProtos

Code

caffe2/image/image_input_op.cc

IndexFreeze

Freezes the given index, disallowing creation of new index entries. Should not be called concurrently with IndexGet.

Interface

Inputs
`handle`	Pointer to an Index instance.
Outputs
`handle`	The input handle.

Code

IndexGet

Given an index handle and a tensor of keys, return an Int tensor of same shape containing the indices for each of the keys. If the index is frozen, unknown entries are given index 0. Otherwise, new entries are added into the index. If an insert is necessary but max_elements has been reached, fail.

Interface

Inputs
`handle`	Pointer to an Index instance.
`keys`	Tensor of keys to be looked up.
Outputs
`indices`	Indices for each of the keys.

Code

caffe2/operators/index_hash_ops.cc

IndexHash

This operator translates a list of indices into a list of hashed indices. A seed can be fed as an argument to change the behavior of the hash function. If a modulo is specified, all the hashed indices will be modulo the specified number. All input and output indices are enforced to be positive.

Interface

Arguments
`seed`	seed for the hash function
`modulo`	must be > 0, hashed ids will be modulo this number
Inputs
`Indices`	Input feature indices.
Outputs
`HashedIndices`	Hashed feature indices.

Code

IndexLoad

Loads the index from the given 1-D tensor. Elements in the tensor will be given consecutive indexes starting at 1. Fails if tensor contains repeated elements.

Interface

Arguments
`skip_first_entry`	If set, skips the first entry of the tensor. This allows loading tensors that are aligned with an embedding, where the first entry corresponds to the default 0 index entry.
Inputs
`handle`	Pointer to an Index instance.
`items`	1-D tensor with elements starting with index 1.
Outputs
`handle`	The input handle.

Code

IndexSize

Returns the number of entries currently present in the index.

Interface

Inputs
`handle`	Pointer to an Index instance.
Outputs
`items`	Scalar int64 tensor with number of entries.

Code

IndexStore

Stores the keys of this index in a 1-D tensor. Since index 0 is reserved for unknowns, the first element of the output tensor is the key with index 1.

Interface

Inputs
`handle`	Pointer to an Index instance.
Outputs
`items`	1-D tensor with elements starting with index 1.
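The IndexGet, IndexFreeze, IndexLoad, IndexStore, and IndexSize behaviors described above can be modeled with a small Python class. This is a toy sketch, not the Caffe2 Index type. The class and method names are hypothetical, thread safety is ignored, and two details are assumptions: the exact point at which max_elements triggers a failure, and that size() counts only stored keys (not the reserved 0 slot).

```python
class Index:
    """Toy model of the Index* operators: key -> id, with id 0 reserved for unknowns."""

    def __init__(self, max_elements):
        self.max_elements = max_elements
        self.frozen = False
        self._ids = {}  # key -> id (1-based)

    def get(self, keys):  # IndexGet
        result = []
        for key in keys:
            if key in self._ids:
                result.append(self._ids[key])
            elif self.frozen:
                result.append(0)  # unknown key in a frozen index
            else:
                if len(self._ids) >= self.max_elements:
                    raise RuntimeError("max_elements reached")
                self._ids[key] = len(self._ids) + 1
                result.append(self._ids[key])
        return result

    def freeze(self):  # IndexFreeze
        self.frozen = True

    def load(self, items, skip_first_entry=False):  # IndexLoad
        if skip_first_entry:
            items = items[1:]
        if len(set(items)) != len(items):
            raise ValueError("repeated elements")
        self._ids = {item: i + 1 for i, item in enumerate(items)}

    def store(self):  # IndexStore: keys ordered by their index, starting at 1
        return sorted(self._ids, key=self._ids.get)

    def size(self):  # IndexSize
        return len(self._ids)


index = Index(max_elements=10)
print(index.get(["a", "b", "a"]))  # [1, 2, 1]
index.freeze()
print(index.get(["c", "b"]))  # [0, 2]
```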

Code

caffe2/operators/map_ops.cc

InstanceNorm

Carries out instance normalization as described in the paper https://arxiv.org/abs/1607.08022. Depending on the mode in which it is run, there are multiple cases for the number of outputs, which we list below:

  * Output case #1: output
  * Output case #2: output, saved_mean
    - don't use, doesn't make sense but won't crash
  * Output case #3: output, saved_mean, saved_inv_stdev
    - Makes sense for training only

For training mode, output case #3 is faster: in the backward pass, it can reuse the saved mean and inv_stdev in the gradient computation.
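This is a NumPy sketch of the NCHW forward pass that produces all three outputs of case #3. It assumes epsilon is added to the variance before the square root and defaults to 1e-5. Neither detail is stated above, and the function name is hypothetical.

```python
import numpy as np


def instance_norm_nchw(x, scale, bias, epsilon=1e-5):
    """Normalize each (n, c) plane over H and W, then apply per-channel scale and bias."""
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=(2, 3))                               # saved_mean, shape (N, C)
    inv_stdev = 1.0 / np.sqrt(x.var(axis=(2, 3)) + epsilon)  # saved_inv_stdev, (N, C)
    normalized = (x - mean[:, :, None, None]) * inv_stdev[:, :, None, None]
    scale = np.asarray(scale)[None, :, None, None]
    bias = np.asarray(bias)[None, :, None, None]
    return normalized * scale + bias, mean, inv_stdev


rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(2, 3, 4, 4))
y, saved_mean, saved_inv_stdev = instance_norm_nchw(x, np.ones(3), np.zeros(3))
print(y.shape, saved_mean.shape)  # (2, 3, 4, 4) (2, 3)
```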

Interface

Arguments
`epsilon`	The epsilon value to use to avoid division by zero.
`order`	A StorageOrder string.
Inputs
`input`	The input 4-dimensional tensor of shape NCHW or NHWC depending on the order parameter.
`scale`	The input 1-dimensional scale tensor of size C.
`bias`	The input 1-dimensional bias tensor of size C.
Outputs
`output`	The output 4-dimensional tensor of the same shape as input.
`saved_mean`	Optional saved mean used during training to speed up gradient computation. Should not be used for testing.
`saved_inv_stdev`	Optional saved inverse stdev used during training to speed up gradient computation. Should not be used for testing.

Code

L1Distance

Given two input float tensors X and Y, produces one output float tensor containing the L1 distance between X and Y (per row for 2D inputs), computed as L1(x, y) = sum over |x - y|.

Interface

Inputs
`X`	1D or 2D input tensor
`Y`	1D or 2D input tensor (must have the same shape as X)
Outputs
`Z`	1D output tensor

Arguments
`forget_bias`	Bias term to add in while calculating forget gate
`sequence_lengths`	When false, the sequence lengths input is left out, and all following inputs are shifted left by one.

Code

caffe2/operators/lstm_unit_op.cc

LSTMUnitGradient

No documentation yet.

Interface

Arguments
`sequence_lengths`	When false, the sequence lengths input is left out, and all following inputs are shifted left by one.

Code

caffe2/operators/lstm_unit_op.cc

LT

Performs element-wise less than comparison < (with limited broadcast support). If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The starting position of the mutually equal shape is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the axis of A at which the shape of B starts to be matched; if not set, suffix matching is used.
Inputs
`A`	First operand; must have the same type as the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, with the same dimensions as A and type `bool`

Code

caffe2/operators/cross_entropy_op.cc

LabelCrossEntropy

This operator computes the cross entropy between the input and the label set. In practice, it is most commonly used at the end of models, after the SoftMax operator and before the AveragedLoss operator. Note that LabelCrossEntropy assumes that the label provided is either a 1D array of size N (batch size) or a 2D array of size N x 1 (batch size). Each entry in the label vector indicates which class is correct; as such, each entry must be between 0 and D - 1, inclusive, where D is the total number of classes. The formula used is:

                            Y[i] = -log(X[i][j])

where i indexes the examples in the batch and j = label[i] is the correct class for the i-th example, so X[i][j] is the classifier’s predicted probability of the correct class. The argument of each log is clamped to a lower limit for numerical stability.
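Here is a NumPy sketch of the formula for an (N x D) probability matrix. The clamp value `LOG_LOWER_LIMIT = 1e-20` is a hypothetical choice, since the actual lower limit is not stated here, and the function name is also hypothetical.

```python
import numpy as np

LOG_LOWER_LIMIT = 1e-20  # hypothetical clamp value; the actual constant is not given above


def label_cross_entropy(x, label):
    """Y[i] = -log(max(X[i][label[i]], LOG_LOWER_LIMIT)) for X of shape (N, D)."""
    x = np.asarray(x, dtype=np.float64)
    label = np.asarray(label).reshape(-1)  # accept shape (N,) or (N, 1)
    n, d = x.shape
    if label.shape != (n,) or label.min() < 0 or label.max() >= d:
        raise ValueError("labels must be N entries in [0, D - 1]")
    picked = x[np.arange(n), label]  # probability assigned to the correct class
    return -np.log(np.maximum(picked, LOG_LOWER_LIMIT))


probs = np.array([[0.1, 0.9], [0.5, 0.5]])
loss = label_cross_entropy(probs, [[1], [0]])  # -log(0.9), -log(0.5)
```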

Interface

Inputs
`X`	Input blob from the previous layer, which is almost always the result of a softmax operation; X is a 2D array of size N x D, where N is the batch size and D is the number of classes
`label`	Blob containing the labels used to compare the input
Outputs
`Y`	Output blob after the cross entropy computation

Code

LabelCrossEntropyGradient

No documentation yet.

Code

caffe2/operators/cross_entropy_op.cc

LambdaRankNdcg

Implements LambdaRank as described in Wu, Qiang, et al. “Adapting boosting for information retrieval measures.” Information Retrieval 13.3 (2010): 254-270. This method heuristically optimizes NDCG.

Code

caffe2/operators/listwise_l2r_op.cc

LambdaRankNdcgGradient

No documentation yet.

Code

caffe2/operators/listwise_l2r_op.cc

Lars

Implements Layer-wise Adaptive Rate Scaling (LARS) as described in https://arxiv.org/abs/1708.03888. Without weight decay, given a global learning rate lr, a parameter tensor X, and its gradient dX, the local learning rate for X is

    local_lr = lr * norm(X) / ( norm(dX) + offset * norm(X) )
             = lr / ( norm(dX) / norm(X) + offset ),

where offset is a preset hyper-parameter that avoids numerical issues. This implementation uses the L2 norm and outputs the rescaling factor

    1 / ( norm(dX) / norm(X) + offset ).
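The rescaling factor can be checked numerically with a short NumPy sketch. It uses the L2 norm of the flattened tensors and assumes norm(X) > 0 (the zero-norm case is not handled). The function name is hypothetical.

```python
import numpy as np


def lars_rescale(x, dx, offset):
    """Return 1 / (||dX||_2 / ||X||_2 + offset); assumes ||X||_2 > 0."""
    x_norm = np.linalg.norm(np.ravel(x))
    dx_norm = np.linalg.norm(np.ravel(dx))
    return 1.0 / (dx_norm / x_norm + offset)


lr = 0.1
x = np.array([3.0, 4.0])   # ||X|| = 5
dx = np.array([0.0, 1.0])  # ||dX|| = 1
local_lr = lr * lars_rescale(x, dx, offset=0.8)  # 0.1 / (0.2 + 0.8)
```

Both forms of local_lr above give the same value here: 0.1 * 5 / (1 + 0.8 * 5) = 0.1.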

Interface

Arguments
`offset`	rescaling offset parameter
Inputs
`X`	Parameter tensor
`dX`	Gradient tensor
Outputs
`lr_rescale`	Local learning rate rescaling factor

Code

caffe2/sgd/lars_op.cc

LastNWindowCollector

Collects the last N rows from the input data. The purpose is to keep track of data across batches. For example, suppose LastNWindowCollector is called successively with the following input data:

  [1, 2, 3, 4]
  [5, 6, 7]
  [8, 9, 10, 11]

If the number of items is set to 6, the output after the 3rd call will contain the following elements:

  [6, 7, 8, 9, 10, 11]

No guarantee is made on the ordering of elements in the output, so a valid output could also have been

  [11, 10, 9, 8, 7, 6]

Also, this method works for any order tensor, treating the first dimension as input rows and keeping the last N rows seen as input. So for instance:

  [[1, 2], [2, 3], [3, 4], [4, 5]]
  [[5, 6], [6, 7], [7, 8]]
  [[8, 9], [9, 10], [10, 11], [11, 12]]

A possible output would be

  [[6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12]]

This is not thread safe unless a mutex is given.
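A single-threaded Python sketch of this behavior, modeled as a ring buffer whose cursor marks the next slot to overwrite, consistent with the "next cursor" input. The class name and the exact replacement order are assumptions. The order explains why the collected rows need not come out sorted.

```python
class LastNWindow:
    """Keep the last N rows seen, overwriting the oldest slot in ring-buffer order."""

    def __init__(self, num_to_collect):
        self.n = num_to_collect
        self.buffer = []      # the last-N buffer
        self.cursor = 0       # next position to replace once the buffer is full
        self.num_visited = 0  # number of records seen so far

    def collect(self, rows):
        for row in rows:
            if len(self.buffer) < self.n:
                self.buffer.append(row)
            else:
                self.buffer[self.cursor] = row
            self.cursor = (self.cursor + 1) % self.n
            self.num_visited += 1
        return self.buffer


window = LastNWindow(6)
for batch in ([1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11]):
    window.collect(batch)
print(window.buffer)  # [7, 8, 9, 10, 11, 6]
```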

Interface

Arguments
`num_to_collect`	The number of rows (N) to keep in the last-N buffer
Inputs
`last-N buffer`	The buffer for last-N record. Should be initialized to empty tensor
`next cursor`	The cursor pointing to the next position that should be replaced. Should be initialized to 0.
`DATA`	tensor to collect from
`MUTEX`	(optional) mutex to use to make this thread-safe
`NUM_VISITED`	Number of records seen so far
Outputs
`last-N buffer`	The updated last-N buffer
`next cursor`	Updated input cursor
`NUM_VISITED`	number of records seen so far

Code

caffe2/operators/last_n_window_collector.cc

LayerNorm

Computes layer normalization as described in https://arxiv.org/pdf/1607.06450.pdf. Given an input tensor x \in [a_0, a_1, …, a_{k-1}, a_k, …, a_{n-1}], where k is the axis argument, this op treats dimensions a_k through a_{n-1} as feature vectors. For each feature vector, the op computes the mean and standard deviation, then returns the values normalized with respect to that feature vector. Note that this op does not include the scale and bias terms described in the paper; simply follow this op with an FC op to add those. Concretely, this op implements h = \frac{1}{\sigma}(a - \mu), where \mu = \frac{1}{H}\sum_{i=1}^{H} a_i and \sigma = \sqrt{\frac{1}{H}\sum_{i=1}^{H}(a_i - \mu)^2}, and H is the number of hidden units (i.e. the product of the dimensions from ‘axis’ to the end).
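Here is a NumPy sketch of the normalization above. It assumes epsilon is added to the variance before the square root (the argument description phrases it as being added to the stdev, so the exact placement is an assumption). It also assumes the mean and stddev outputs have the leading shape [a_0, …, a_{k-1}]. The function name is hypothetical.

```python
import numpy as np
from math import prod


def layer_norm(x, axis=1, epsilon=1e-3):
    """Normalize each feature vector (dims axis..end) to zero mean and unit stdev."""
    x = np.asarray(x, dtype=np.float64)
    outer_shape = x.shape[:axis]
    flat = x.reshape(prod(outer_shape), -1)  # one row per feature vector, H columns
    mean = flat.mean(axis=1, keepdims=True)
    stddev = np.sqrt(flat.var(axis=1, keepdims=True) + epsilon)
    output = ((flat - mean) / stddev).reshape(x.shape)
    return output, mean.reshape(outer_shape), stddev.reshape(outer_shape)


x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
out, mean, stddev = layer_norm(x)  # each of the 2 feature vectors has H = 12
```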

Interface

Arguments
`axis`	(int) default to 1; Describes axis of the inputs. Defaults to one because the 0th axis most likely describes the batch size
`epsilon`	(float) default to 0.001. Small value to be added to the stdev when dividing out by that value. This prevents division by zero.
Inputs
`input`	Input tensor which layer normalization will be applied to
Outputs
`output`	Normalized values
`mean`	Mean values for each feature vector
`stddev`	Standard deviations for each feature vector

LearningRate

Learning rate is a decreasing function of time. With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Learning rate is controlled by the following arguments: Required:

  `iterations`
  `base_lr`: base learning rate
  `policy`: this controls how the learning rate is applied, options are:
    `fixed`
    `step`: uses `stepsize`, `gamma`
    `exp`: uses `gamma`
    `inv`: uses `gamma`, `power`
    `linearWarmup`: uses `start_multiplier`, `num_iter`
    `constantWarmup`: uses `multiplier`, `num_iter`
    `alter`: uses `active_first`, `active_period`, `inactive_period`
    `hill`: uses those in both `linearWarmup` and `inv`, plus `end_multiplier`

Optional:

  `stepsize`: defaults to 0
  `gamma`: defaults to 0
  `power`: defaults to 0
  `num_iter`: defaults to 0
  `start_multiplier`: defaults to 0
  `multiplier`: defaults to 0.5

Usage:

  train_net.LearningRate(*iterations*, "*label*", base_lr=*float*,
                         policy="policy_name", stepsize=*int*, gamma=*float*)

Example usage:

  train_net.LearningRate(200, "LR", base_lr=-0.1,
                         policy="step", stepsize=20, gamma=0.9)
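The sketch below covers four of the policies. It assumes they follow the familiar Caffe-style formulas: `step` multiplies by gamma once every stepsize iterations, `exp` by gamma per iteration, and `inv` scales by (1 + gamma * iter)^(-power). That is an assumption, not something the argument list spells out. The warmup, `alter`, and `hill` policies are omitted, and the function is hypothetical, not the Caffe2 operator.

```python
def learning_rate(iteration, base_lr, policy, stepsize=0, gamma=0.0, power=0.0):
    """Return base_lr times a policy-dependent multiplier (sketch of four policies)."""
    if policy == "fixed":
        multiplier = 1.0
    elif policy == "step":
        # Decay by gamma once every `stepsize` iterations.
        multiplier = gamma ** (iteration // stepsize)
    elif policy == "exp":
        multiplier = gamma ** iteration
    elif policy == "inv":
        multiplier = (1.0 + gamma * iteration) ** (-power)
    else:
        raise ValueError(f"policy {policy!r} is not covered by this sketch")
    return base_lr * multiplier


# Mirrors the example call above at iteration 200: -0.1 * 0.9 ** 10
lr_at_200 = learning_rate(200, -0.1, "step", stepsize=20, gamma=0.9)
```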

Interface

Arguments
`base_lr`	(float, required) base learning rate
`policy`	(string, required) learning rate policy; one of the options listed above
`power`	(float, default 1.0) used only for inv policy type
`gamma`	(float, default 1.0) momentum of change
`stepsize`	(float, default 1.0) sampling rate on iterations
`active_first`	(boolean, default True) in alter policy
`active_period`	(int64_t, required) in alter policy
`inactive_period`	(int64_t, required) in alter policy
`max_iter`	(int, default -1) maximum iterations in this training run
`num_iter`	(int, default 0) number of iterations over which to warmup lr
`start_multiplier`	(float, default 0) starting multiplier for learning rate
`end_multiplier`	(float, default 0) end multiplier for learning rate
`multiplier`	(float, default 0.5) constant multiplier for learning rate
Inputs
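`input`	Iteration counter (the *iterations* input in the usage above)
Outputs
`output`	Learning rate for the current iteration (the "LR" blob in the example above)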

Code

caffe2/sgd/learning_rate_op.cc

LengthsGather

Gather items from a sparse tensor. The sparse tensor is described by items and lengths. This operator gathers the items of the segments selected by the given indices. It deliberately doesn’t return the lengths of OUTPUT, so that both lists and maps can be supported without special cases. If you need the lengths tensor for OUTPUT, use Gather. Example:

  ITEMS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  LENGTHS = [0, 2, 3, 1, 4]
  INDICES = [0, 2, 4]

  OUTPUT = [2, 3, 4, 6, 7, 8, 9]

Interface

Inputs
`ITEMS`	items tensor
`LENGTHS`	lengths tensor
`INDICES`	indices into LENGTHS where items should be gathered
Outputs
`OUTPUT`	1-D tensor containing gathered items
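The gathering rule can be sketched in plain Python. The sketch first computes each segment's offset from LENGTHS, then concatenates the selected segments. `lengths_gather` is a hypothetical helper that works on Python lists rather than tensors.

```python
def lengths_gather(items, lengths, indices):
    """Concatenate the segments of `items` selected by `indices`."""
    # Start offset of every segment in the flat `items` list.
    offsets, start = [], 0
    for n in lengths:
        offsets.append(start)
        start += n
    output = []
    for i in indices:
        output.extend(items[offsets[i]:offsets[i] + lengths[i]])
    return output


print(lengths_gather(list(range(10)), [0, 2, 3, 1, 4], [0, 2, 4]))
```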

LengthsTile

Given DATA tensor of rank r >= 1, and LENGTHS tensor of rank 1, duplicate each entry of the outer-most dimension of DATA according to LENGTHS, and concatenate them in an output tensor of rank r. Example:

  DATA  = [
      [1.0, 1.2],
      [2.3, 3.4],
      [4.5, 5.7],
      [6.8, 7.9],
  ]
  LENGTHS = [0, 1, 3, 2]
  OUTPUT = [
      [2.3, 3.4],
      [4.5, 5.7],
      [4.5, 5.7],
      [4.5, 5.7],
      [6.8, 7.9],
      [6.8, 7.9],
  ]

Interface

Inputs
`DATA`	Tensor of rank r >= 1. First dimension must be equal to the size of lengths
`LENGTHS`	Tensor of int32 lengths of rank 1
Outputs
`OUTPUT`	Tensor of rank r
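A minimal list-based sketch of the tiling rule shown in the example. Row i of DATA is emitted LENGTHS[i] times, in order. `lengths_tile` is a hypothetical name.

```python
def lengths_tile(data, lengths):
    """Repeat row i of `data` lengths[i] times, preserving order."""
    if len(data) != len(lengths):
        raise ValueError("first dimension of DATA must equal len(LENGTHS)")
    output = []
    for row, n in zip(data, lengths):
        output.extend(list(row) for _ in range(n))
    return output
```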

Code

caffe2/operators/lengths_tile_op.cc

LengthsToRanges

Given a vector of segment lengths, calculates the offset of each segment and packs it next to the length. For an input vector of length N, the output is an Nx2 matrix with (offset, length) packed for each segment. For example, [1, 3, 0, 2] transforms into [[0, 1], [1, 3], [4, 0], [4, 2]].

Interface

Inputs
`lengths`	1D tensor of int32 segment lengths.
Outputs
`ranges`	2D tensor of shape len(lengths) X 2 and the same type as `lengths`
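The offsets are a running (exclusive) prefix sum of the lengths. A short sketch over Python lists, with `lengths_to_ranges` as a hypothetical name:

```python
def lengths_to_ranges(lengths):
    """Return [offset, length] for each segment."""
    ranges, offset = [], 0
    for n in lengths:
        ranges.append([offset, n])
        offset += n
    return ranges
```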

Code

caffe2/operators/logit_op.cc

LengthsToSegmentIds

Given a vector of segment lengths, returns a zero-based, consecutive vector of segment_ids. For example, [1, 3, 0, 2] will produce [0, 1, 1, 1, 3, 3]. In general, the inverse operation is SegmentIdsToLengths. Notice though that trailing empty sequence lengths can’t be properly recovered from segment ids.

Interface

Inputs
`lengths`	1D tensor of int32 or int64 segment lengths.
Outputs
`segment_ids`	1D tensor of length `sum(lengths)`
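The sketch below shows both directions and why trailing empty segments are lost. `segment_ids_to_lengths` only illustrates what the inverse operation computes. Both function names are hypothetical, and the inverse assumes sorted segment ids.

```python
def lengths_to_segment_ids(lengths):
    ids = []
    for segment, n in enumerate(lengths):
        ids.extend([segment] * n)
    return ids


def segment_ids_to_lengths(segment_ids):
    # Only segments up to the last id that actually occurs can be recovered.
    if not segment_ids:
        return []
    lengths = [0] * (segment_ids[-1] + 1)
    for s in segment_ids:
        lengths[s] += 1
    return lengths


print(segment_ids_to_lengths(lengths_to_segment_ids([2, 0])))  # trailing 0 is gone
```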

Interface

Arguments
`eps`	small positive epsilon value, the default is 1e-6.
Inputs
`X`	input float tensor
`dY`	input float tensor
Outputs
`dX`	output float tensor

Code

LongIndexCreate

Creates a dictionary that maps int64 keys to consecutive integers from 1 to max_elements. Zero is reserved for unknown keys.

Interface

Arguments
`max_elements`	Max number of elements, including the zero entry.
Outputs
`handler`	Pointer to an Index instance.

Code

caffe2/operators/lpnorm_op.cc

LpNorm

Given one input float tensor X, produces one output float tensor holding the Lp norm of X, computed as $L_p(x) = \sum_i |x_i|^p$, where p is either 1 or 2 (currently only the L1 and L2 norms are supported), as determined by the argument p.

Interface

Arguments
`p`	Order of the norm in p-norm
`average`	Whether to compute the norm or the averaged norm. The averaged norm is defined as Lp_averaged_norm(x) = LpNorm(x) / size(x)
Inputs
`X`	1D input tensor
Outputs
`Z`	1D output tensor
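The following sketch follows the formula as documented above. It returns the plain sum of |x|^p with no p-th root, so for p = 2 it is the sum of squares. `lp_norm` is a hypothetical helper operating on a Python list.

```python
def lp_norm(x, p=2, average=False):
    """Sum of |x_i|^p, optionally divided by the number of elements."""
    if p not in (1, 2):
        raise ValueError("only p=1 and p=2 are supported")
    total = sum(abs(v) ** p for v in x)
    return total / len(x) if average else total
```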

Code

LpNormGradient

Given one input float tensor X and the derivative dout, produces one output float tensor dX. dX is the derivative of the Lp norm of X, computed as $dX = \frac{\partial}{\partial x} \sum_i |x_i|^p$, where p is either 1 or 2 (currently only the L1 and L2 norms are supported), as determined by the argument p.

Interface

Arguments
`p`	Order of the norm in p-norm
`average`	Whether to compute the gradient of the norm or of the averaged norm. The averaged version is defined as Lp_averaged_normgradient(x) = LpNormGradient(x) / size(x)
Inputs
`X`	1D input tensor
`dout`	1D input tensor
Outputs
`dx`	1D output tensor
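A sketch of the gradient of the documented norm, sum of |x|^p. It assumes `dout` is the single upstream gradient value of the scalar norm and that it scales every element. The subgradient at x == 0 for p = 1 is taken as 0 here, which is an assumption. `lp_norm_gradient` is hypothetical.

```python
def lp_norm_gradient(x, dout, p=2, average=False):
    """d/dx of sum(|x_i|^p), scaled by the scalar upstream gradient `dout`."""
    if p == 1:
        # d|x|/dx = sign(x); 0 is used at x == 0 in this sketch.
        grad = [(v > 0) - (v < 0) for v in x]
    elif p == 2:
        grad = [2 * v for v in x]
    else:
        raise ValueError("only p=1 and p=2 are supported")
    scale = dout / len(x) if average else dout
    return [g * scale for g in grad]
```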

Code

caffe2/operators/lpnorm_op.cc

LpPool

LpPool consumes an input blob X and applies L-p pooling across the blob according to kernel sizes, stride sizes, and pad lengths defined by the ConvPoolOpBase operator. L-p pooling consists of taking the L-p norm of a subset of the input tensor according to the kernel size and downsampling the data into the output blob Y for further processing.

Interface

Inputs
`X`	Input data tensor from the previous operator; dimensions depend on whether the NCHW or NHWC operators are being used. For example, in the former, the input has size (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. The corresponding permutation of dimensions is used in the latter case.
Outputs
`Y`	Output data tensor from L-p pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes.
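A minimal sketch of L-p pooling on a single H x W channel with a square kernel and no padding. Each output value is (sum of |v|^p over the window)^(1/p). The exponent `p` is assumed to be an operator argument, though it is not listed in the interface above. `lp_pool_2d` is hypothetical.

```python
def lp_pool_2d(x, kernel, stride, p=2.0):
    """L-p pooling of one H x W channel given as a list of rows."""
    h, w = len(x), len(x[0])
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    y = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [x[i * stride + di][j * stride + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(sum(abs(v) ** p for v in window) ** (1.0 / p))
        y.append(row)
    return y
```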

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions and type as A

Code

caffe2/operators/multi_class_accuracy_op.cc

MultiClassAccuracy

Computes the accuracy score for each class separately, given the predicted scores of each class for a batch of instances.

Interface

Inputs
`prediction`	2-D float tensor (N,D,) of predicted scores of each class for each instance. N is the number of instances, i.e., the batch size. D is the number of possible classes/labels.
`labels`	1-D int tensor (N,) of labels for each instance.
Outputs
`accuracies`	1-D float tensor (D,) of accuracy for each class. If a class has no instance in the batch, its accuracy score is set to zero.
`amounts`	1-D int tensor (D,) of number of instances for each class in the batch.
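A sketch of the per-class computation, assuming an instance counts as correct when the argmax of its scores equals its label. Ties resolve to the first maximum in this sketch. `multi_class_accuracy` is a hypothetical helper.

```python
def multi_class_accuracy(prediction, labels):
    num_classes = len(prediction[0])
    correct = [0] * num_classes
    amounts = [0] * num_classes
    for scores, label in zip(prediction, labels):
        predicted = max(range(num_classes), key=lambda c: scores[c])
        amounts[label] += 1
        if predicted == label:
            correct[label] += 1
    # Classes with no instance in the batch get accuracy 0.
    accuracies = [c / n if n > 0 else 0.0 for c, n in zip(correct, amounts)]
    return accuracies, amounts
```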

Code

NCHW2NHWC

The operator switches the order of data in a tensor from NCHW (sample index N, channels C, height H and width W) to the NHWC order.

Interface

Inputs
`data`	The input data (Tensor) in the NCHW order.
Outputs
`output`	The output tensor (Tensor) in the NHWC order.
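The reordering is a pure index permutation, y[n][h][w][c] = x[n][c][h][w]. A nested-list sketch (`nchw_to_nhwc` is a hypothetical name):

```python
def nchw_to_nhwc(x):
    """x is indexed x[n][c][h][w]; the result is indexed y[n][h][w][c]."""
    N, C = len(x), len(x[0])
    H, W = len(x[0][0]), len(x[0][0][0])
    return [[[[x[n][c][h][w] for c in range(C)] for w in range(W)]
             for h in range(H)]
            for n in range(N)]
```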

Code

caffe2/operators/onnx_while_op.cc

ONNXWhile

*** EXPERIMENTAL. This operator is a work-in-progress. No assumption should be made about the stability or correctness of this op. ***

Generic looping construct conforming to the ONNX Loop operator spec. This loop has multiple termination conditions:

  1. Trip count. Iteration count specified at runtime. Set by specifying the input M. Optional. Set to empty string to omit. Note that a static trip count (specified at graph construction time) can be specified by passing in a constant node for input M.
  2. Loop termination condition. This is an input to the op that determines whether to run the first iteration and also a loop-carried dependency for the body graph. The body graph must yield a value for the condition variable, whether this input is provided or not.

This table summarizes the operating modes of this operator with equivalent C-style code. Operator inputs are defined as (max_trip_count, condition_var). Omitted optional inputs are represented as an empty string. Concretely, in this caffe2 op an input is marked as omitted by setting its ‘has_{name}’ argument to False.

    input ("", ""):
        for (int i=0; ; ++i) {
          cond = ...; // Note this value is ignored, but is required in the body
        }

    input ("", cond) // Note this is analogous to a while loop
        bool cond = ...;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input ("", 1) // Note this is analogous to a do-while loop
        bool cond = true;
        for (int i=0; cond; ++i) {
          cond = ...;
        }

    input (trip_count, "") // Note this is analogous to a for loop
        int trip_count = ...;
        for (int i=0; i < trip_count; ++i) {
          cond = ...; // ignored
        }

    input (trip_count, cond)
        int trip_count = ...;
        bool cond = ...;
        for (int i=0; i < trip_count && cond; ++i) {
          cond = ...;
        }
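The table of modes can be mirrored in Python. In this sketch `None` plays the role of an omitted input (the empty string, or `has_{name}` set to False). `body` is a hypothetical callable standing in for `loop_net`: it receives the iteration index and the current condition and returns the next condition. The function returns the number of iterations executed.

```python
def onnx_style_loop(body, max_trip_count=None, cond=None):
    """Run body(i, cond) -> new_cond according to the modes in the table above."""
    i = 0
    while True:
        if max_trip_count is not None and i >= max_trip_count:
            break
        if cond is not None and not cond:
            break
        new_cond = body(i, cond)
        # The body always yields a condition, but it only matters when the
        # condition input was provided.
        if cond is not None:
            cond = new_cond
        i += 1
    return i


# Do-while style: the initial condition is true, the body decides afterwards.
print(onnx_style_loop(lambda i, c: i < 2, cond=True))
```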

Interface

Arguments
`loop_net`	Net executed on each iteration
Inputs
`condition`	Scalar boolean condition

Code

OneHot

Given a sequence of indices, one for each example in a batch, returns a matrix in which each row has length index_size, with 1.0 at the active index of the corresponding example and 0.0 everywhere else.

Interface

Inputs
`indices`	The active index for each example in the batch.
`index_size_tensor`	Scalar with the size of the index. Must be in CPU context
Outputs
`one_hots`	Matrix of size len(indices) x index_size
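A list-based sketch of the encoding (`one_hot` is a hypothetical name; out-of-range indices simply raise `IndexError` here):

```python
def one_hot(indices, index_size):
    rows = []
    for idx in indices:
        row = [0.0] * index_size
        row[idx] = 1.0
        rows.append(row)
    return rows
```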

Code

caffe2/operators/one_hot_ops.cc

Or

Performs the element-wise logical `or` operation (with limited broadcast support). Both input operands should be of type bool. If necessary, the right-hand-side argument will be broadcast to match the shape of the left-hand-side argument. When broadcasting is specified, the second tensor can either be of size 1 (a scalar value) or have a shape that is a contiguous subset of the first tensor’s shape. The start of the matching shape is specified by the argument “axis”; if it is not set, suffix matching is assumed. 1-dim expansion doesn’t work yet. For example, the following tensor shapes are supported (with broadcast=1):

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has the same dimensions as A and type `bool`
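The broadcasting rule can be sketched in pure Python. The sketch assumes both tensors are given as flat row-major lists of bool together with their shapes. B is repeated across the dimensions before `axis`, and each element of B covers all the dimensions after B's run. `broadcast_or` is a hypothetical helper, not a Caffe2 API.

```python
def broadcast_or(a, a_shape, b, b_shape, axis=None):
    """Element-wise OR of flat row-major bool lists with limited broadcasting.

    b_shape must equal a contiguous run of a_shape starting at `axis`
    (suffix matching when axis is None); a scalar B has b_shape == ().
    """
    if axis is None:
        axis = len(a_shape) - len(b_shape)
    if tuple(a_shape[axis:axis + len(b_shape)]) != tuple(b_shape):
        raise ValueError("shape of B is not a contiguous subset of shape of A")
    pre = 1   # number of times B is repeated
    for d in a_shape[:axis]:
        pre *= d
    post = 1  # number of A elements that share one element of B
    for d in a_shape[axis + len(b_shape):]:
        post *= d
    n = len(b)
    out = []
    for i in range(pre):
        for j in range(n):
            for k in range(post):
                out.append(bool(a[(i * n + j) * post + k]) or bool(b[j]))
    return out
```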

Code

caffe2/operators/prelu_op.cc

PRelu

PRelu takes input data (Tensor) and a slope tensor as input, and produces one output data (Tensor) where the function `f(x) = slope * x for x < 0`, `f(x) = x for x >= 0` is applied to the data tensor elementwise.

Interface

Inputs
`X`	1D input tensor
`Slope`	1D slope tensor. If `Slope` is of size 1, the value is shared across different channels
Outputs
`Y`	1D output tensor
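A simplified sketch of PRelu, assuming a 2-D (N, C) input where the channel is the second dimension and `slope` holds either C values or one shared value. `prelu` is a hypothetical helper.

```python
def prelu(x, slope):
    """x: rows of shape (N, C); slope: C values, or 1 value shared by all channels."""
    y = []
    for row in x:
        out_row = []
        for c, v in enumerate(row):
            s = slope[0] if len(slope) == 1 else slope[c]
            out_row.append(v if v >= 0 else s * v)
        y.append(out_row)
    return y
```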

Code

PReluGradient

PReluGradient takes both Y and dY and uses this to update dX and dW according to the chain rule and derivatives of the rectified linear function.

Code

caffe2/operators/prelu_op.cc

PackRNNSequence

Pack values based on the length blob. Each number in the length blob is the number of consecutive values that belong to the corresponding sequence. The size of each pack equals the maximum number in the length blob (shorter sequences are padded with zeros). The overall output dimension is T * N * D, where T is the maximum of lengths, N is the size of lengths, and D is the dimension of each feature value. The following example shows the input and output of this operator. Given:

  values = [v1, v2, v3, v4, v5, v6, v7, v8]
  lengths = [2, 3, 1, 2]

Output:

  output = [
    [v1, v3, v6, v7],
    [v2, v4, 0,  v8],
    [0,  v5, 0,  0 ],
  ]

One application for this operator is to transfer data into the format used by RNN models. Note that the gradient operator of PackRNNSequence is UnpackRNNSequence.

Interface

Inputs
`values`	Data tensor, contains a sequence of features
`lengths`	lengths with each number representing the pack size.
Outputs
`output`	Output tensor after packing
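A sketch of the packing that reproduces the example above. For brevity it treats each value as a scalar (D = 1). `pack_rnn_sequence` is a hypothetical name.

```python
def pack_rnn_sequence(values, lengths, pad=0):
    """Return a T x N list where column j holds sequence j, padded with `pad`."""
    max_len = max(lengths) if lengths else 0
    output = [[pad] * len(lengths) for _ in range(max_len)]
    offset = 0
    for j, n in enumerate(lengths):
        for t in range(n):
            output[t][j] = values[offset + t]
        offset += n
    return output
```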

Code

caffe2/operators/pack_rnn_sequence_op.cc

PackRecords

Given a dataset under a schema specified by the `fields` argument, packs all the input tensors into one, where each tensor element represents a row of data (a batch of size 1). This format allows easier use with the rest of Caffe2 operators.

Interface

Arguments
`fields`	List of strings representing the string names in the format specified in the doc for CreateTreeCursor.
Outputs
`tensor`	One dimensional tensor having a complex type of SharedTensorVectorPtr. In order to reverse it back to the original input it has to be inserted into UnPackRecordsOp.

Code

caffe2/operators/partition_ops.cc

PackSegments

Map N dim tensor to N+1 dim based on length blob. Sequences that are shorter than the longest sequence are padded with zeros.

Interface

Arguments
`pad_minf`	Padding number in the packed segments. Use true to pad -infinity, otherwise pad zeros
`return_presence_mask`	bool whether to return presence mask, false by default
Inputs
`lengths`	1-D int/long tensor containing the length of each segment in the output.
`tensor`	N dim Tensor.
Outputs
`packed_tensor`	N + 1 dim Tensor where dim(1) is the max length and dim(0) is the batch size.
`presence_mask`	2 dim boolean tensor, false where packed_tensor is padded, true otherwise.
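A sketch for a 1-D `tensor`, so the packed result is 2-D. It shows padding with zeros or -infinity, per `pad_minf`, and the matching presence mask. `pack_segments` is a hypothetical helper.

```python
def pack_segments(lengths, tensor, pad_minf=False):
    """Pack a 1-D `tensor` into len(lengths) rows of length max(lengths)."""
    pad_value = float("-inf") if pad_minf else 0.0
    max_len = max(lengths) if lengths else 0
    packed, presence_mask = [], []
    offset = 0
    for n in lengths:
        segment = list(tensor[offset:offset + n])
        offset += n
        packed.append(segment + [pad_value] * (max_len - n))
        presence_mask.append([True] * n + [False] * (max_len - n))
    return packed, presence_mask
```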

Code

Percentile

This operator is used to find percentile representations for raw values, given a sample set of raw values labeled with their corresponding percentiles from the same distribution. In particular, this operator takes as input a tensor of floats to find the percentile values for; a 2D tensor of floats, where the first column represents sampled values and the second column represents the percentile labels; and a tensor of integer lengths.

This lengths tensor is used because the operator works on multiple sets of raw values at the same time. For example, for an input:

  original_values=[[3, 5, 3],[5, 1, 6]], lengths = [2, 1, 1], value_to_pct = [[3, 0.2], [5, 0.5], [1, 0.3], [3, 0.6]]

The operator expects that each column i of the input tensor is sampled from distribution i. Lengths tells us that the first two elements in value_to_pct are sampled from distribution 1, the next is from distribution 2, and the last is from distribution 3. We expect the output of the operator to be [[0.2, 1.0, 0.6], [0.5, 0.3, 1.0]].

To calculate the percentile of an element, we check whether its value is already mapped to a percentile in value_to_pct. If so, we return that percentile. If not, we linearly interpolate between the two closest values in value_to_pct. If the value is larger than all values in value_to_pct, we return 1. If it is smaller than all the values, we return 0.

Interface

Inputs
`original_values`	Input 2D tensor of floats, representing the original, raw data to calculate percentiles for.
`value_to_pct`	Sorted 2D tensor, with 2 columns. Each element in the first column is a float representing the raw value of a sample. Its corresponding element in the next column represents the percentile it maps to.
`lengths`	1D tensor, representing the length of each distribution. We expect that the sum of elements of this tensor is equal to the total length of value_to_pct.
Outputs
`percentile_values`	1D tensor of floats, with the same dimensions as the flattened input tensor. Each element of this tensor, percentile_values[i], corresponds to the percentile calculated for original_values[i].
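The lookup rule described above can be sketched with the standard `bisect` module. This sketch returns nested rows so it can be compared directly with the worked example. The operator's output is described as the flattened version. `percentile` is a hypothetical helper, and every distribution is assumed to be non-empty.

```python
import bisect


def percentile(original_values, value_to_pct, lengths):
    """Look up column i of original_values in distribution i."""
    # Split value_to_pct into one sorted (values, pcts) table per distribution.
    tables, offset = [], 0
    for n in lengths:
        rows = value_to_pct[offset:offset + n]
        offset += n
        tables.append(([r[0] for r in rows], [r[1] for r in rows]))

    def lookup(x, values, pcts):
        if x < values[0]:
            return 0.0
        if x > values[-1]:
            return 1.0
        k = bisect.bisect_left(values, x)
        if values[k] == x:
            return pcts[k]
        # Linear interpolation between values[k - 1] < x < values[k].
        lo_v, hi_v, lo_p, hi_p = values[k - 1], values[k], pcts[k - 1], pcts[k]
        return lo_p + (hi_p - lo_p) * (x - lo_v) / (hi_v - lo_v)

    return [[lookup(x, *tables[i]) for i, x in enumerate(row)]
            for row in original_values]
```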

Code

caffe2/operators/percentile_op.cc

Perplexity

Perplexity calculates how well a probability distribution predicts a sample. Perplexity takes a 1-D tensor containing a batch of probabilities. Each value in the tensor belongs to a different sample and represents the probability of the model predicting the true label for that sample. The operator returns a single (float) perplexity value for the batch.

Interface

Inputs
`probabilities`	The input data as Tensor. It contains a batch of true label or target probabilities
Outputs
`output`	The output- a single (float) perplexity value for the batch
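A sketch assuming the standard definition of perplexity over a batch: the inverse geometric mean of the true-label probabilities, exp(-(1/N) * sum(log p_i)). The description above does not spell out the formula, so treat this as an assumption. `perplexity` is a hypothetical helper.

```python
import math


def perplexity(probabilities):
    """exp of the mean negative log-probability over the batch."""
    n = len(probabilities)
    return math.exp(-sum(math.log(p) for p in probabilities) / n)
```

A model that always assigns probability 0.5 to the true label has perplexity 2, and a perfect model has perplexity 1.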

Code

caffe2/operators/perplexity_op.cc

PiecewiseLinearTransform

PiecewiseLinearTransform takes as input `predictions`, a 2-D or 1-D tensor (Tensor) of size (batch_size x prediction_dimensions). The piecewise linear functions are stored in bounds, slopes and intercepts. The output tensor has the same shape as the input `predictions` and contains the predictions transformed by the piecewise linear functions. Each column of predictions has its own piecewise linear transformation functions. Therefore the size of the piecewise function parameters is pieces x prediction_dimensions, except for binary predictions, where only the positive prediction needs them. Note that in each piece, the low bound is excluded while the high bound is included. Also, the piecewise linear function must be continuous. Notes: if the input is binary predictions (Nx2 or Nx1 tensor), set the binary arg to true so that only one group of piecewise linear functions is needed (see details below).

The transform parameters (bounds, slopes, intercepts) can be passed either through args or through input blobs.
If we have multiple groups of piecewise linear functions, each group has the same number of pieces.
If a prediction is out of the bounds, it is capped to the smallest or largest bound.

Interface

Arguments
`bounds`	1-D vector of size (prediction_dimensions x (pieces+1)) containing the upper bounds of each piece of the linear function. One special case: the first bound is the lower bound of the whole piecewise function and is treated as part of the leftmost piece. (bounds, slopes, intercepts) can be passed through either arg or input blobs.
`slopes`	1-D vector of size (prediction_dimensions x pieces) containing the slopes of linear function
`intercepts`	1-D vector of size (prediction_dimensions x pieces) containing the intercepts of linear function
`binary`	If set true, we assume the input is a Nx1 or Nx2 tensor. If it is Nx1 tensor, it is positive predictions. If the input is Nx2 tensor, its first column is negative predictions and second column is positive and negative + positive = 1. We just need one group of piecewise linear functions for the positive predictions.
Inputs
`predictions`	2-D tensor (Tensor) of size (num_batches x num_classes) containing scores
`bounds (optional)`	See bounds in Arg. (bounds, slopes, intercepts) can be passed through either arg or input blobs.
`slopes (optional)`	See slopes in Arg. (bounds, slopes, intercepts) can be passed through either arg or input blobs.
`intercepts (optional)`	See intercepts in Arg. (bounds, slopes, intercepts) can be passed through either arg or input blobs.
Outputs
`transforms`	2-D tensor (Tensor) of size (num_batches x num_classes) containing transformed predictions
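A sketch of the non-binary case. Predictions are capped to the outer bounds. Piece k covers (bounds[k], bounds[k+1]], and the first bound belongs to piece 0. The per-column layout of the flat parameter vectors is an assumption inferred from the sizes given above: column j uses bounds[j*(pieces+1):(j+1)*(pieces+1)] and slopes/intercepts[j*pieces:(j+1)*pieces]. Both function names are hypothetical.

```python
import bisect


def piecewise_linear(x, bounds, slopes, intercepts):
    """Transform one value with one group of pieces (len(bounds) == len(slopes) + 1)."""
    x = min(max(x, bounds[0]), bounds[-1])  # cap to the smallest / largest bound
    # bisect_left finds the first bound >= x, so high bounds are included.
    k = max(bisect.bisect_left(bounds, x) - 1, 0)
    return slopes[k] * x + intercepts[k]


def piecewise_linear_transform(predictions, bounds, slopes, intercepts):
    d = len(predictions[0])
    pieces = len(slopes) // d
    return [[piecewise_linear(v,
                              bounds[j * (pieces + 1):(j + 1) * (pieces + 1)],
                              slopes[j * pieces:(j + 1) * pieces],
                              intercepts[j * pieces:(j + 1) * pieces])
             for j, v in enumerate(row)]
            for row in predictions]
```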

Code

caffe2/operators/piecewise_linear_transform_op.cc

Pow

Pow takes input data (Tensor) and an argument exponent, which can be a scalar or another tensor. It produces one output data (Tensor), where the function `f(x) = x^exponent` is applied to the data tensor elementwise.

Interface

Arguments
`exponent`	The exponent of the power function.
Inputs
`X`	Input tensor of any shape
`exponent`	The exponent of the power function.
Outputs
`Y`	Output tensor (same size as X)

Code

caffe2/operators/pow_op.cc

PrependDim

Reshape the tensor by prepending a dimension of fixed size and dividing the size of the next dimension by that amount.

Interface

Arguments
`dim_size`	Size of the dimension to prepend.
Inputs
`data`	An input tensor.
Outputs
`reshaped`	Reshaped tensor.
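The resulting shape can be computed as in the following sketch (`prepend_dim_shape` is a hypothetical helper; it requires the first dimension to be divisible by `dim_size`):

```python
def prepend_dim_shape(shape, dim_size):
    if not shape or shape[0] % dim_size != 0:
        raise ValueError("first dimension must be divisible by dim_size")
    return [dim_size, shape[0] // dim_size] + list(shape[1:])
```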

Code

caffe2/operators/prepend_dim_op.cc

Print

Logs shape and contents of input tensor to stderr or to a file.

Interface

Arguments
`to_file`	(bool) if 1, saves contents to the root folder of the current workspace, appending the tensor contents to a file named after the blob name. Otherwise, logs to stderr.
Inputs
`tensor`	The tensor to print.

QuantDecode

Decodes inputs using a codebook. This is a general lookup-table (LUT) operator that returns tensors with values from the codebook (input 0) based on the indices given in codes (inputs 1 to n). Example input:

  codebook = [1.5, 2.5, 3.5]
  codes_0 = [0, 1, 1, 2]
  codes_1 = [2, 0, 0]

Output:

  decoded_0 = [1.5, 2.5, 2.5, 3.5]
  decoded_1 = [3.5, 1.5, 1.5]

Interface

Inputs
`codebook`	Codebook in 1d tensor (float)
`codes_0`	Encoded codes 0 (uint8/uint16/int32)
`codes_1`	Encoded codes 1 if existed (uint8/uint16/int32)
`codes_n`	Encoded codes n if existed (uint8/uint16/int32)
Outputs
`decoded_0`	Decoded tensor for codes_0 (float)
`decoded_1`	Decoded tensor for codes_1 (float)
`decoded_n`	Decoded tensor for codes_n (float)
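Each decoded output is a plain table lookup into the codebook. A sketch reproducing the example above (`quant_decode` is a hypothetical name):

```python
def quant_decode(codebook, *codes):
    """Return one decoded list per codes input."""
    return [[codebook[c] for c in code] for code in codes]
```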

First, initialize the states from the input recurrent states. Then, for each timestep T, apply the links (which map offsets from input/output tensors into the inputs/outputs for the step network). Finally, alias the recurrent states to the specified output blobs. This is a fairly special-case meta-operator, and so the implementation is somewhat complex. It trades off generality (and frankly usability) against performance and control (compared to e.g. TF dynamic_rnn, Theano scan, etc.). See the usage examples for a flavor of how to use it.

Code

caffe2/operators/rnn/recurrent_network_op.cc

RecurrentNetworkBlobFetcher

Retrieves blobs from scratch workspaces (which contain intermediate recurrent network computation for each timestep) and puts them in the global workspace under CPUContext.

Interface

Arguments
`prefix`	Prefix string to prepend to the names of extracted blobs.
Inputs
`ScratchWorkspaceBlob`	Name of scratch workspace blob returned by recurrent network.
Outputs
`blob_names`	1D tensor of strings containing extracted blob names.

When `lengths` is given, the sum is computed only over the corresponding subset of elements.

Interface

Arguments
`num_reduce_dims`	Number of dimensions to reduce.
Inputs
`data_in`	(T<D1…, Dn>) Input data.
`lengths`	Num of elements in each sample, should have size D2 x D3 x … x Dn.

ReduceMean

Computes the mean of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

Interface

Arguments
`axes`	A list of integers, along which to reduce.
`keepdims`	Keep the reduced dimension(s) or not, default 1 keeps the reduced dimension(s).
Inputs
`data`	An input tensor.
Outputs
`reduced`	Reduced output tensor.
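ReduceMean and ReduceSum differ only in the final division. The pure-Python sketch below covers both. It assumes the tensor is given as a flat row-major list plus its shape, with non-negative axes. `reduce_tensor` is hypothetical.

```python
def reduce_tensor(data, shape, axes, keepdims=1, mean=False):
    """Sum (or average) a flat row-major list `data` of the given `shape` over `axes`."""
    axes = set(axes)
    kept_shape = [1 if i in axes else d for i, d in enumerate(shape)]
    out_size = 1
    for d in kept_shape:
        out_size *= d
    out = [0.0] * out_size
    for flat, value in enumerate(data):
        # Unravel the flat index into one coordinate per dimension.
        coords, rem = [], flat
        for d in reversed(shape):
            coords.append(rem % d)
            rem //= d
        coords.reverse()
        # Collapse reduced coordinates to 0 and re-ravel into the output index.
        out_flat = 0
        for i, d in enumerate(kept_shape):
            out_flat = out_flat * d + (0 if i in axes else coords[i])
        out[out_flat] += value
    if mean:
        count = 1
        for a in axes:
            count *= shape[a]
        out = [v / count for v in out]
    out_shape = kept_shape if keepdims else [d for i, d in enumerate(shape) if i not in axes]
    return out, out_shape
```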

Code

caffe2/operators/reduce_ops.cc

ReduceScatter

Does reduce-scatter operation among the nodes. Currently only Sum is supported.

Interface

Inputs
`comm_world`	The common world.
`X`	A tensor to be reduce-scattered.
Outputs
`Y`	The reduced tensor, scattered on all nodes.

Code

caffe2/operators/reduce_ops.cc

ReduceSum

Computes the sum of the input tensor's elements along the provided axes. The resulting tensor has the same rank as the input if keepdims equals 1. If keepdims equals 0, the resulting tensor has the reduced dimensions pruned.

Interface

Arguments
`axes`	A list of integers, along which to reduce.
`keepdims`	Keep the reduced dimension(s) or not, default 1 keeps the reduced dimension(s).
Inputs
`data`	An input tensor.
Outputs
`reduced`	Reduced output tensor.

Code

ReduceTailSum

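Reduces the trailing dimensions by summation: for each index of the first dimension, all remaining elements are summed into one output value.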

Interface

Inputs
`mat`	The matrix
Outputs
`output`	Output

Code

caffe2/operators/rowmul_op.cc

Relu

Relu takes one input data (Tensor) and produces one output data (Tensor) where the rectified linear function, y = max(0, x), is applied to the tensor elementwise.

Interface

Inputs
`X`	1D input tensor
Outputs
`Y`	1D output tensor

Code

caffe2/operators/relu_op.cc

ReluGradient

ReluGradient takes both Y and dY and uses this to update dX according to the chain rule and derivatives of the rectified linear function.
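Since Y = max(0, x), Y is positive exactly where the input was positive, so the chain rule passes dY through there and zeroes it elsewhere. A sketch on flat lists (`relu_gradient` is a hypothetical name):

```python
def relu_gradient(y, dy):
    """dX = dY where the forward output Y is positive, 0 elsewhere."""
    return [g if out > 0 else 0.0 for out, g in zip(y, dy)]
```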

Code

caffe2/operators/relu_op.cc

RemoveDataBlocks

Interface

Inputs
`data`	a N-D data tensor, N >= 1
`indices`	zero-based indices of blocks to be removed
Outputs
`shrunk data`	data after removing data blocks indexed by ‘indices’
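The operator has no description above. Assuming a "block" is one slice along the outermost dimension of `data`, its effect can be sketched as follows. `remove_data_blocks` is hypothetical, and duplicate indices are treated as one removal.

```python
def remove_data_blocks(data, indices):
    """Drop the outer-dimension blocks of `data` whose indices appear in `indices`."""
    to_remove = set(indices)
    return [block for i, block in enumerate(data) if i not in to_remove]
```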

Code

caffe2/operators/remove_data_blocks_op.cc

RemovePadding

Remove padding around the edges of each segment of the input data. This is the reverse operation of AddPadding and uses the same arguments and conventions for input and output data format.

Interface

Arguments
`padding_width`	Outer-size of padding to remove around each range.
`end_padding_width`	(Optional) Specifies a different end-padding width.
Inputs
`data_in`	T<N, D1…, Dn> Input data
`lengths`	(i64) Num of elements in each range. sum(lengths) = N. If not provided, considers all data as a single segment.
Outputs
`data_out`	(T<N - 2*padding_width, D1…, Dn>) Unpadded data.
`lengths_out`	(i64, optional) Lengths for each unpadded range.
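A sketch on a 1-D list. It strips `padding_width` elements from the start of every segment and `end_padding_width` (defaulting to `padding_width`) from its end. With no lengths, the whole input is treated as one segment. `remove_padding` is a hypothetical helper, and segments are assumed to be at least as long as the total padding.

```python
def remove_padding(data, padding_width, lengths=None, end_padding_width=None):
    if end_padding_width is None:
        end_padding_width = padding_width
    if lengths is None:
        lengths = [len(data)]
    out, lengths_out = [], []
    offset = 0
    for n in lengths:
        segment = data[offset:offset + n]
        offset += n
        kept = segment[padding_width:n - end_padding_width]
        out.extend(kept)
        lengths_out.append(len(kept))
    return out, lengths_out
```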

Code

caffe2/operators/sequence_ops.cc

ReplaceNaN

Replaces NaN (not a number) elements in the input tensor with the value given by the `value` argument.

Interface

Arguments
`value (optional)`	the value to replace NaN, the default is 0
Inputs
`input`	Input tensor
`output`	Output tensor
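A flat-list sketch of the replacement, using the documented default of 0 (`replace_nan` is a hypothetical name):

```python
import math


def replace_nan(values, value=0.0):
    return [value if math.isnan(v) else v for v in values]
```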

Code

caffe2/operators/replace_nan_op.cc

ReservoirSampling

Collect DATA tensor into RESERVOIR of size num_to_collect. DATA is assumed to be a batch. In cases where ‘objects’ may be repeated in data and you only want at most one instance of each ‘object’ in the reservoir, OBJECT_ID can be given for deduplication. If OBJECT_ID is given, then you also need to supply additional book-keeping tensors. See input blob documentation for details. This operator is thread-safe.

Interface

Arguments
`num_to_collect`	The size of the reservoir, i.e. the number of samples to collect
Inputs
`RESERVOIR`	The reservoir; should be initialized to empty tensor
`NUM_VISITED`	Number of examples seen so far; should be initialized to 0
`DATA`	Tensor to collect from. The first dimension is assumed to be batch size. If the object to be collected is represented by multiple tensors, use `PackRecords` to pack them into single tensor.
`MUTEX`	Mutex to prevent data race
`OBJECT_ID`	(Optional, int64) If provided, used for deduplicating object in the reservoir
`OBJECT_TO_POS_MAP_IN`	(Optional) Auxiliary bookkeeping map. This should be created from `CreateMap` with keys of type int64 and values of type int32
`POS_TO_OBJECT_IN`	(Optional) Tensor of type int64 used for bookkeeping in deduplication
Outputs
`RESERVOIR`	Same as the input
`NUM_VISITED`	Same as the input
`OBJECT_TO_POS_MAP`	(Optional) Same as the input
`POS_TO_OBJECT`	(Optional) Same as the input
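A minimal sketch of the sampling step, assuming classic reservoir sampling (Algorithm R) over the rows of one DATA batch. It ignores the optional OBJECT_ID deduplication and the MUTEX. `reservoir_sample` and its `rng` parameter are hypothetical, and the operator's exact random-number handling is not documented here.

```python
import random


def reservoir_sample(reservoir, num_visited, batch, num_to_collect, rng=random):
    """Update `reservoir` in place with the rows of `batch`; return the new NUM_VISITED."""
    for item in batch:
        if len(reservoir) < num_to_collect:
            reservoir.append(item)
        else:
            # Keep the new item with probability num_to_collect / (num_visited + 1).
            j = rng.randrange(num_visited + 1)
            if j < num_to_collect:
                reservoir[j] = item
        num_visited += 1
    return num_visited
```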

Code

caffe2/operators/reservoir_sampling.cc

ResetCounter

Resets a count-down counter with initial value specified by the ‘init_count’ argument.

Interface

Arguments
`init_count`	Resets counter to this value, must be >= 0.
Inputs
`counter`	A blob pointing to an instance of a new counter.
Outputs
`previous_value`	(optional) Previous value of the counter.

Code

caffe2/operators/counter_ops.cc

ResetCursor

Resets the offsets for the given TreeCursor. This operation is thread safe.

Interface

Inputs
`cursor`	A blob containing a pointer to the cursor.

Code

caffe2/operators/reshape_op.cc

Reshape

Reshape the input tensor similar to numpy.reshape. It takes a tensor as input and an optional tensor specifying the new shape. When the second input is absent, an extra argument shape must be specified. It outputs the reshaped tensor as well as the original shape. At most one dimension of the new shape can be -1. In this case, the value is inferred from the size of the tensor and the remaining dimensions. A dimension could also be 0, in which case the actual dimension value is going to be copied from the input tensor.

Interface

Arguments
`shape`	New shape
Inputs
`data`	An input tensor.
`new_shape`	New shape.
Outputs
`reshaped`	Reshaped data.
`old_shape`	Original shape.
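The shape-resolution rules above (0 copies the input dimension, a single -1 is inferred) can be sketched as follows. `infer_reshape` is a hypothetical helper.

```python
def infer_reshape(old_shape, new_shape):
    """Resolve 0 (copy input dim) and a single -1 (infer) in new_shape."""
    shape = [old_shape[i] if d == 0 else d for i, d in enumerate(new_shape)]
    total = 1
    for d in old_shape:
        total *= d
    if shape.count(-1) > 1:
        raise ValueError("at most one dimension can be -1")
    if -1 in shape:
        known = 1
        for d in shape:
            if d != -1:
                known *= d
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer the -1 dimension")
        shape[shape.index(-1)] = total // known
    size = 1
    for d in shape:
        size *= d
    if size != total:
        raise ValueError("new shape does not match the number of elements")
    return shape
```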

Code

ResizeLike

Produces tensor containing data of first input and shape of second input.

Interface

Inputs
`data`	Tensor whose data will be copied into the output.
`shape_tensor`	Tensor whose shape will be applied to output.
Outputs
`output`	Tensor with data of input 0 and shape of input 1.

Code

caffe2/operators/resize_op.cc

ResizeNearest

Resizes the spatial dimensions of the input using nearest neighbor interpolation. The width_scale and height_scale arguments control the size of the output, which is given by:

  output_width = floor(input_width * width_scale)
  output_height = floor(input_height * height_scale)

Interface

Arguments
`width_scale`	Scale along width dimension
`height_scale`	Scale along height dimension
Inputs
`X`	Input tensor
Outputs
`Y`	Output tensor
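A sketch for a single H x W channel. The output size follows the formulas above. The rule used to pick the source pixel (output index divided by the scale, truncated, and clamped to the last row/column) is an assumption, since the documentation only specifies the output size. `resize_nearest` is hypothetical.

```python
import math


def resize_nearest(x, width_scale, height_scale):
    """x: one H x W channel given as a list of rows."""
    in_h, in_w = len(x), len(x[0])
    out_h = math.floor(in_h * height_scale)
    out_w = math.floor(in_w * width_scale)
    return [[x[min(int(i / height_scale), in_h - 1)][min(int(j / width_scale), in_w - 1)]
             for j in range(out_w)]
            for i in range(out_h)]
```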

Code

ResizeNearestGradient

No documentation yet.

Interface

Arguments
`width_scale`	Scale along width dimension
`height_scale`	Scale along height dimension

Code

caffe2/operators/resize_op.cc

RetrieveCount

Retrieve the current value from the counter.

Interface

Inputs
`counter`	A blob pointing to an instance of a counter.
Outputs
`count`	current count value.

Code

caffe2/operators/counter_ops.cc

ReversePackedSegs

Reverse segments in a 3-D tensor (lengths, segments, embeddings,), leaving paddings unchanged. This operator is used to reverse input of a recurrent neural network to make it a BRNN.

Interface

Inputs
`data`	a 3-D (lengths, segments, embeddings,) tensor.
`lengths`	length of each segment.
Outputs
`reversed data`	a (lengths, segments, embeddings,) tensor with each segment reversed and paddings unchanged.
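A sketch of the reversal. data[t][j] is time step t of segment j (an embedding in the real operator). Only the first lengths[j] steps of each segment are reversed, so padding stays in place. `reverse_packed_segs` is a hypothetical name.

```python
def reverse_packed_segs(data, lengths):
    out = [list(row) for row in data]
    for j, n in enumerate(lengths):
        for t in range(n):
            out[t][j] = data[n - 1 - t][j]
    return out
```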

Code

caffe2/operators/reverse_packed_segs_op.cc

RmsProp

Computes the RMSProp update (http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf). Concretely, given inputs (grad, mean_squares, mom, lr), computes:

    mean_squares_o = mean_squares + (1 - decay) * (square(grad) - mean_squares)
    mom_o = momentum * mom + lr * grad / sqrt(epsilon + mean_squares_o)
    grad_o = mom_o

Returns (grad_o, mean_squares_o, mom_o).
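The three update formulas translate directly to element-wise list code. `decay`, `momentum` and `epsilon` appear in the formulas, but their defaults are not documented here, so the sketch takes them as required arguments. `lr` is a plain float instead of a tensor. `rmsprop_update` is a hypothetical helper.

```python
import math


def rmsprop_update(grad, mean_squares, mom, lr, decay, momentum, epsilon):
    ms_o = [ms + (1.0 - decay) * (g * g - ms) for g, ms in zip(grad, mean_squares)]
    mom_o = [momentum * m + lr * g / math.sqrt(epsilon + ms)
             for g, m, ms in zip(grad, mom, ms_o)]
    grad_o = list(mom_o)
    return grad_o, ms_o, mom_o
```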

Code

caffe2/sgd/rmsprop_op.cc

RoIAlign

Region of Interest (RoI) align operation as used in Mask R-CNN.

Interface

Arguments
`spatial_scale`	(float) default 1.0; Spatial scale of the input feature map X relative to the input image. E.g., 0.0625 if X has a stride of 16 w.r.t. the input image.
`pooled_h`	(int) default 1; Pooled output Y’s height.
`pooled_w`	(int) default 1; Pooled output Y’s width.
`sampling_ratio`	(int) default -1; number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, then exactly sampling_ratio x sampling_ratio grid points are used. If <= 0, then an adaptive number of grid points are used (computed as ceil(roi_width / pooled_w), and likewise for height).
Inputs
`X`	4D feature map input of shape (N, C, H, W).
`RoIs`	2D input of shape (R, 4 or 5) specifying R RoIs representing: batch index in [0, N - 1], x1, y1, x2, y2. The RoI coordinates are in the coordinate system of the input image. For inputs corresponding to a single image, batch index can be excluded to have just 4 columns.
Outputs
`Y`	4D output of shape (R, C, pooled_h, pooled_w). The r-th batch element is a pooled feature map corresponding to the r-th RoI.

Code

caffe2/operators/roi_align_op.cc

RoIAlignGradient

No documentation yet.

Interface

Inputs
`X`	See RoIPoolF.
`RoIs`	See RoIPoolF.
`dY`	Gradient of forward output 0 (Y)
Outputs
`dX`	Gradient of forward input 0 (X)

Code

caffe2/operators/roi_align_gradient_op.cc

RoIPool

Carries out ROI Pooling for Faster-RCNN. Depending on the mode, there are multiple output cases:

  Output case #1: Y, argmaxes (train mode)
  Output case #2: Y           (test mode)

Interface

Arguments
`is_test`	If set, run in test mode and skip computation of argmaxes (used for gradient computation). Only one output tensor is produced. (Default: false).
`order`	A StorageOrder string (Default: “NCHW”).
`pooled_h`	The pooled output height (Default: 1).
`pooled_w`	The pooled output width (Default: 1).
`spatial_scale`	Multiplicative spatial scale factor to translate ROI coords from their input scale to the scale used when pooling (Default: 1.0).
Inputs
`X`	The input 4-D tensor of data. Only NCHW order is currently supported.
`rois`	RoIs (Regions of Interest) to pool over. Should be a 2-D tensor of shape (num_rois, 5) given as [[batch_id, x1, y1, x2, y2], …].
Outputs
`Y`	RoI pooled output 4-D tensor of shape (num_rois, channels, pooled_h, pooled_w).
`argmaxes`	Argmaxes corresponding to indices in X used for gradient computation. Only produced if the argument “is_test” is false.

RowWiseArgMax

Given a 2D (N x D) input tensor, this operator returns a 2D (N x 1) output tensor with the index of the maximum value in each row. If there are duplicate max values in a row, the index of the first occurrence is returned.

Interface

Inputs
`X`	2D (N X D) input tensor
Outputs
`Z`	2D (N X 1) output tensor
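The tie-breaking rule above (first occurrence wins) can be illustrated with a short sketch on nested lists; `row_wise_arg_max` is a hypothetical name, not the Caffe2 API.

```python
def row_wise_arg_max(x):
    """Return an N x 1 list holding the first index of each row's maximum."""
    result = []
    for row in x:
        best = 0
        for j in range(1, len(row)):
            # Strict '>' keeps the earliest index when values tie.
            if row[j] > row[best]:
                best = j
        result.append([best])
    return result
```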

Code

caffe2/operators/arg_max_op.cc

RowWiseSparseAdagrad

Given inputs (param, moment, indices, grad, lr), runs a modified sparse Adagrad update on (param, grad, moment[indices], lr), and returns (new_param, new_moment), where moment is a 1D tensor with length equal to the number of rows in param: shape(moment) == shape(param)[0]. Each element of moment is applied to an entire row of param, and the new moment is calculated by adding the mean of the squared gradient entries across each row. Note that indices must also be a 1D tensor indexing into the rows of param.

Interface

Arguments
`epsilon`	Default 1e-5
Inputs
`param`	Parameters to be updated
`moment`	Moment history
`indices`	Sparse indices
`grad`	Gradient computed
`lr`	learning rate
Outputs
`output_param`	Updated parameters
`output_moment_1`	Updated moment
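A minimal sketch of the row-wise variant, assuming the parameter step mirrors the dense Adagrad formula at the top of this catalog (param + lr * grad / (sqrt(moment) + epsilon)) with one shared moment per row. Lists stand in for tensors and `lr` is a plain float here; the function name is hypothetical.

```python
import math


def row_wise_sparse_adagrad(param, moment, indices, grad, lr, epsilon=1e-5):
    """Update param (list of rows) and moment (one value per row) in place.

    grad[k] is the gradient row for param[indices[k]].
    """
    for k, row_idx in enumerate(indices):
        g = grad[k]
        # One moment per row: add the mean of the squared gradient entries.
        moment[row_idx] += sum(v * v for v in g) / len(g)
        # Every element of the row shares the same adaptive step size.
        step = lr / (math.sqrt(moment[row_idx]) + epsilon)
        param[row_idx] = [p + step * v for p, v in zip(param[row_idx], g)]
    return param, moment
```

Storing one moment per row instead of one per element shrinks the optimizer state by a factor of the row width, which matters for large embedding tables.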

Code

caffe2/sgd/adagrad_op.cc

RowWiseSparseAdam

Computes a modified Adam update for the sparse case. Given inputs (param, moment1, moment2, indices, grad, lr, iter), runs the Adam update on (param, moment1[indices], moment2[indices], lr, iter) and returns (new_param, new_moment1, new_moment2), where moment2 is a 1D tensor with length equal to the number of rows in param: shape(moment2) == shape(param)[0]. Each element of moment2 is applied to an entire row of param, and the new moment2 values are calculated by averaging across the row.

Interface

Arguments
`beta1`	Default 0.9
`beta2`	Default 0.999
`epsilon`	Default 1e-5
Inputs
`param`	Parameters to be updated
`moment_1`	First moment history
`moment_2`	Second moment history
`indices`	Sparse indices
`grad`	Gradient computed
`lr`	learning rate
`iter`	iteration number
Outputs
`output_param`	Updated parameters
`output_moment_1`	Updated first moment
`output_moment_2`	Updated second moment
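The following sketch assumes the same bias-corrected learning rate as the dense Adam formula earlier in this catalog (t = iter + 1, lr * sqrt(1 - beta2^t) / (1 - beta1^t)), with moment1 kept per element and moment2 kept as one averaged value per row. It is an illustration on lists with a hypothetical function name, not the Caffe2 kernel.

```python
import math


def row_wise_sparse_adam(param, m1, m2, indices, grad, lr, iteration,
                         beta1=0.9, beta2=0.999, epsilon=1e-5):
    """Update param and m1 (lists of rows) and m2 (one value per row) in place."""
    t = iteration + 1
    correction = math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    for k, r in enumerate(indices):
        g = grad[k]
        # First moment stays element-wise, as in dense Adam.
        m1[r] = [beta1 * m + (1.0 - beta1) * v for m, v in zip(m1[r], g)]
        # Second moment is one scalar per row: a running mean of squared gradients.
        m2[r] = beta2 * m2[r] + (1.0 - beta2) * sum(v * v for v in g) / len(g)
        denom = math.sqrt(m2[r]) + epsilon
        param[r] = [p + lr * correction * m / denom for p, m in zip(param[r], m1[r])]
    return param, m1, m2
```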

Code

caffe2/sgd/adam_op.cc

Rowwise8BitQuantizedToFloat

Given a uint8 tensor quantized using 8-bit row-wise quantization, plus auxiliary scales and biases, this operator restores the float tensor as follows. The input 8-bit tensor of size (m_1, m_2, …, m_n), n >= 2, is reshaped into a matrix of size (m_1, m_2 x … x m_n). Each element of the output matrix is computed as r_{ij} * s_i + b_i, where r_{ij} is the corresponding element of the reshaped input, and the output matrix is then reshaped into an output tensor of size (m_1, m_2, …, m_n).

Interface

Inputs
`quantized_input`	uint8 tensor produced by 8-bit row-wise quantization
`scale_bias`	Matrix of floats, each row r_i of which stores a pair s_i, b_i: the scale and bias for the i-th row
Outputs
`output`	Restored float tensor of size (m_1, m_2, …, m_n)
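The reshape-then-rescale procedure above can be sketched on a flat, row-major list plus its shape. The function name and the flat-list representation are assumptions for illustration; `math.prod` needs Python 3.8 or later.

```python
from math import prod


def rowwise_8bit_to_float(data, shape, scale_bias):
    """Dequantize a flat row-major uint8 list of the given shape.

    The tensor is viewed as an (m_1, m_2 x ... x m_n) matrix; row i uses
    scale_bias[i] = (s_i, b_i). The result keeps the original shape (flat).
    """
    rows = shape[0]
    cols = prod(shape[1:])  # m_2 x ... x m_n
    out = []
    for i in range(rows):
        s, b = scale_bias[i]
        for q in data[i * cols:(i + 1) * cols]:
            out.append(q * s + b)  # r_ij * s_i + b_i
    return out
```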

Code

caffe2/operators/lengths_reducer_rowwise_8bit_ops.cc

RowwiseMax

Compute row-wise max reduction of the input tensor.

Interface

Inputs
`X`	A tensor of dimensions batch_size x M x N to compute rowwise-max.
Outputs
`Y`	batch_size x M rowwise-max results matrix.
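The shape contract above (batch_size x M x N in, batch_size x M out) amounts to taking the maximum over the last axis, as in this nested-list sketch with a hypothetical function name:

```python
def rowwise_max(x):
    """x: batch_size x M x N nested list; returns batch_size x M row maxima."""
    return [[max(row) for row in matrix] for matrix in x]
```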

SequenceMask

Mask op designed for use in attention mechanisms for sequence modeling tasks. Supports batching: given batch_dim (the `batch` argument), collapses dims 0 through batch_dim into a single dimension; e.g. if tensor dims are [4,2,1,3,4] and batch_dim=2, first collapse the tensor to [4*2*1,3,4], then mask each batch [i,:,:].

Three current operating modes:

1) Sequence mask. Given a 2D input tensor and a 1D tensor of sequence lengths, for each row i in the input tensor, set elements in that row to -inf if their column index j >= sequence_lengths[i]. This mode takes two inputs and argument mode = ‘sequence’.

2) Triangular mask. Given row index i and column index j, set elements to -inf under the following conditions:

      mode='upper', x_ij = -inf if j < i
      mode='lower', x_ij = -inf if j > i
      mode='upperdiag', x_ij = -inf if j <= i
      mode='lowerdiag', x_ij = -inf if j >= i

This mode takes one input.

3) Window mask. Given a 2D input tensor and a 1D tensor of window centers, for each row i in the input tensor, set elements in that row to -inf if their column index j is outside [center - radius, center + radius]. This mode takes two inputs and argument mode = ‘window’. Argument ‘radius’ should be provided.

Interface

Arguments
`mode`	(string) Mode selection. Possible values: ‘sequence’, ‘upper’, ‘lower’, ‘upperdiag’, ‘lowerdiag’, ‘window’
`axis`	(int) Beginning axis of row elements. All dimensions to the left will be treated as row indices and those to the right (inclusive) will be treated as column indices in the 2D mask
`grad`	(bool) operate in gradient mode
`radius`	(int) radius of windows in window mode
`batch`	(int) batch dimension of tensor (optional)
`repeat_from_axis`	(int) used when mask should be repeated for one or more data dimensions (beginning at this axis). (currently only supported for sequence mode without batch argument)
Inputs
`input`	Tensor to apply masking to
`sequence_lengths`	1D Tensor of sequence lengths for mode #1
Outputs
`masked_tensor`	Input tensor with masking applied
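The masking rules can be checked with a 2D sketch that ignores the `axis`, `batch`, `grad`, and `repeat_from_axis` arguments. The window mode is keyed here as 'window', which is an assumption; `sequence_mask` is a hypothetical helper, not the Caffe2 API.

```python
NEG_INF = float("-inf")


def sequence_mask(x, mode, lengths=None, centers=None, radius=0):
    """Apply one masking mode to a 2D list, replacing masked entries with -inf."""
    out = []
    for i, row in enumerate(x):
        new_row = []
        for j, v in enumerate(row):
            if mode == "sequence":
                masked = j >= lengths[i]
            elif mode == "upper":
                masked = j < i
            elif mode == "lower":
                masked = j > i
            elif mode == "upperdiag":
                masked = j <= i
            elif mode == "lowerdiag":
                masked = j >= i
            elif mode == "window":
                # Keep only columns within radius of this row's center.
                masked = not (centers[i] - radius <= j <= centers[i] + radius)
            else:
                raise ValueError(f"unknown mode: {mode}")
            new_row.append(NEG_INF if masked else v)
        out.append(new_row)
    return out
```

In attention, setting scores to -inf before a softmax gives those positions zero weight, which is why -inf rather than 0 is used as the fill value.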

  data = [
      [1, 2, 3, 4],
      [5, 6, 7, 8],
  ]
  starts = [0, 1]
  ends = [-1, 3]

  result = [
      [2, 3],
      [6, 7],
  ]

Interface

Arguments
`starts`	List of starting indices
`ends`	List of ending indices
Inputs
`data`	Tensor of data to extract slices from.
`starts`	1D tensor: start-indices for each dimension of data.
`ends`	1D tensor: end-indices for each dimension of data.
Outputs
`output`	Sliced data tensor.
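The starts/ends example above can be reproduced with a 2D sketch. Reading a negative end e as dim + 1 + e (so -1 keeps everything through the last element) is an assumption consistent with the example, where ends[0] = -1 keeps both rows; non-negative ends are exclusive. `slice_2d` is a hypothetical name.

```python
def slice_2d(data, starts, ends):
    """Slice a 2D list with per-dimension start and end indices."""
    def resolve(end, dim):
        # Negative end e means dim + 1 + e, so -1 selects up to the end.
        return dim + 1 + end if end < 0 else end

    r0, r1 = starts[0], resolve(ends[0], len(data))
    c0, c1 = starts[1], resolve(ends[1], len(data[0]))
    return [row[c0:c1] for row in data[r0:r1]]
```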

Softsign

Calculates the softsign (x/(1+|x|)) of the given input tensor element-wise. This operation can be done in an in-place fashion too, by providing the same input and output blobs.

Interface

Inputs
`input`	1-D input tensor
Outputs
`output`	The softsign (x/(1+|x|)) values of the input tensor computed element-wise
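A one-line sketch of the element-wise formula (hypothetical helper name); note that the output always lies strictly between -1 and 1.

```python
def softsign(values):
    """Element-wise softsign: x / (1 + |x|)."""
    return [x / (1.0 + abs(x)) for x in values]
```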

Code

caffe2/operators/softsign_op.cc

SoftsignGradient

Calculates the softsign gradient (1/(1+|x|)^2) of the given input tensor element-wise.

Interface

Inputs
`input`	1-D input tensor
`input`	1-D input tensor
Outputs
`output`	The softsign gradient (1/(1+|x|)^2) values of the input tensor computed element-wise
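Differentiating x/(1+|x|) on either side of zero gives 1/(1+|x|)^2, with no sign factor. The sketch below chains that derivative with an upstream gradient and compares it against a central finite difference. Treating one of the two inputs as the upstream gradient dY is an assumption, since both inputs are listed simply as `input`; the function names are hypothetical.

```python
def softsign_gradient(x, dy):
    """dX = dY / (1 + |x|)^2, element-wise."""
    return [g / (1.0 + abs(v)) ** 2 for v, g in zip(x, dy)]


def numeric_softsign_derivative(x, h=1e-6):
    """Central finite-difference estimate of d/dx softsign, for comparison."""
    def f(v):
        return v / (1.0 + abs(v))
    return [(f(v + h) - f(v - h)) / (2 * h) for v in x]
```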

Code

caffe2/operators/softsign_op.cc

SortAndShuffle

Computes the sorted indices given a field index to sort by, breaks the sorted indices into chunks of shuffle_size * batch_size, shuffles each chunk, and finally shuffles between batches. If sort_by_field_idx is -1, the sort is skipped. For example, with data sorted as 1,2,3,4,5,6,7,8,9,10,11,12, batchSize = 2 and shuffleSize = 3, shuffling within chunks gives:

  [3,1,4,6,5,2] [12,10,11,8,9,7]

After this, batches of size 2 are shuffled among each other:

  [3,1],[4,6],[5,2],[12,10],[11,8],[9,7]

We may end up with something like:

  [9,7],[5,2],[12,10],[4,6],[3,1],[11,8]

Input(0) is a blob pointing to a TreeCursor, and [Input(1), …, Input(num_fields)] is a list of tensors containing the data for each field of the dataset. SortAndShuffle is thread safe.

Interface

Inputs
`cursor`	A blob containing a pointer to the cursor.
`dataset_field_0`	First dataset field
Outputs
`indices`	Tensor containing sorted indices.
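The three steps (sort, shuffle within chunks of shuffle_size * batch_size, shuffle whole batches) can be sketched with Python's `random.Random`. The cursor and dataset fields are not modeled: a `keys` list stands in for the sort field, and passing `keys=None` stands in for sort_by_field_idx == -1. Caffe2 uses its own random generator, so the exact permutation will differ; the function name is hypothetical.

```python
import random


def sort_and_shuffle(n, batch_size, shuffle_size, rng, keys=None):
    """Return a permutation of range(n) following the steps described above."""
    # Step 1: sort record indices by key (skipped when keys is None).
    order = list(range(n))
    if keys is not None:
        order.sort(key=lambda i: keys[i])
    # Step 2: shuffle within chunks of shuffle_size * batch_size.
    chunk = shuffle_size * batch_size
    for start in range(0, n, chunk):
        part = order[start:start + chunk]
        rng.shuffle(part)
        order[start:start + chunk] = part
    # Step 3: split into batches and shuffle the batch order.
    batches = [order[i:i + batch_size] for i in range(0, n, batch_size)]
    rng.shuffle(batches)
    return [i for batch in batches for i in batch]
```

Because each chunk is a multiple of the batch size, every batch still draws its records from a single sorted chunk, so records of similar key (e.g. similar length) stay together while the batch order is randomized.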

Code

caffe2/operators/segment_reduction_op.cc

SortedSegmentMean

Applies ‘Mean’ to each segment of the input tensor. Segments need to be sorted and contiguous. See also UnsortedSegmentMean that doesn’t have this requirement. SEGMENT_IDS is a vector that maps each of the first dimension slices of the DATA to a particular group (segment). Values belonging to the same segment are aggregated together. The first dimension of the output is equal to the number of input segments, i.e. SEGMENT_IDS[-1]+1. Other dimensions are inherited from the input tensor. Mean computes the element-wise mean of the input slices. Operation doesn’t change the shape of the individual blocks.

Interface

Inputs
`DATA`	Input tensor, slices of which are aggregated.
`SEGMENT_IDS`	Vector with the same length as the first dimension of DATA and values in the range 0..K-1 and in increasing order that maps each slice of DATA to one of the segments
Outputs
`OUTPUT`	Aggregated output tensor. Has the first dimension of K (the number of segments).
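A sketch of the aggregation on 2D lists, assuming every id in 0..K-1 appears at least once (the behavior for a missing id is not described here). The function name is hypothetical.

```python
def sorted_segment_mean(data, segment_ids):
    """Mean of the DATA rows that share each segment id (ids sorted, 0..K-1)."""
    num_segments = segment_ids[-1] + 1  # K = SEGMENT_IDS[-1] + 1
    width = len(data[0])
    sums = [[0.0] * width for _ in range(num_segments)]
    counts = [0] * num_segments
    for row, seg in zip(data, segment_ids):
        sums[seg] = [a + b for a, b in zip(sums[seg], row)]
        counts[seg] += 1
    # Assumes counts[k] > 0 for every k.
    return [[v / counts[k] for v in sums[k]] for k in range(num_segments)]
```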

SparseSortedSegmentWeightedSumGradient

No documentation yet.

Code

SparseToDense

Convert sparse representations to dense with given indices. Transforms a sparse representation of map<id, value> represented as indices vector and values tensor into a compacted tensor where the first dimension is determined by the first dimension of the 3rd input if it is given, or by the max index plus one otherwise. Missing values are filled with zeros. The op supports duplicated indices and performs summation over corresponding values. This behavior is useful for converting GradientSlices into a dense representation. After running this op:

  output[indices[i], :] += values[i]  # sum over all indices[i] equal to the index
  output[j, ...] = 0 if j not in indices

Interface

Inputs
`indices`	1-D int32/int64 tensor of concatenated ids of data
`values`	Data tensor, first dimension has to match `indices`, basic numeric types are supported
`data_to_infer_dim`	Optional: if provided, the first dimension of output is the first dimension of this tensor.
Outputs
`output`	Output tensor of the same type as `values`, of shape `[N] + shape(values)[1:]`, where N is the first dimension of `data_to_infer_dim` if it is provided, and max(`indices`) + 1 otherwise
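The two rules shown above (duplicates are summed, untouched rows stay zero) are easiest to see in a small sketch. Here `values` is a list of rows, `data_to_infer_dim` is any list whose length plays the role of its first dimension, `indices` is assumed non-empty when `data_to_infer_dim` is omitted, and the function name is hypothetical.

```python
def sparse_to_dense(indices, values, data_to_infer_dim=None):
    """Scatter-add value rows into a zero-initialized dense list of rows."""
    if data_to_infer_dim is not None:
        n = len(data_to_infer_dim)
    else:
        n = max(indices) + 1
    width = len(values[0])
    out = [[0] * width for _ in range(n)]
    for idx, row in zip(indices, values):
        # Duplicate indices accumulate rather than overwrite.
        out[idx] = [a + b for a, b in zip(out[idx], row)]
    return out
```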

Code

caffe2/operators/sparse_to_dense_op.cc

SparseToDenseMask

Convert sparse representations to dense with given indices. Transforms a sparse representation of map<id, value> represented as indices vector and values tensor into a compacted tensor where the first dimension corresponds to each id provided in mask argument. Missing values are filled with the value of default_value . After running this op:

  output[j, :] = values[i] # where mask[j] == indices[i]
  output[j, ...] = default_value # when mask[j] doesn't appear in indices

If lengths is provided and not empty, an extra “batch” dimension is prepended to the output. values and default_value can have additional matching dimensions; the operation is performed on the entire subtensor in this case. For example, if lengths is supplied, values is a 1-D vector of floats, and default_value is a float scalar, the output is going to be a float matrix of size len(lengths) X len(mask).

Interface

Arguments
`mask`	list(int) argument with desired ids on the ‘dense’ output dimension
`return_presence_mask`	bool whether to return presence mask, false by default
Inputs
`indices`	1-D int32/int64 tensor of concatenated ids of data
`values`	Data tensor, first dimension has to match `indices`
`default_value`	Default value for the output if the id is not present in `indices`. Must have the same type as `values` and the same shape, but without the first dimension
`lengths`	Optional lengths to represent a batch of `indices` and `values`.
Outputs
`output`	Output tensor of the same type as `values` of shape `[len(lengths), len(mask)] + shape(default_value)` (if `lengths` is not provided the first dimension is omitted)
`presence_mask`	Bool tensor of shape `[len(lengths), len(mask)]` (if `lengths` is not provided the first dimension is omitted). True when a value for given id was present, false otherwise.
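A sketch for scalar values that also returns the presence mask, with and without `lengths`. Skipping ids that do not appear in `mask` is an assumption (this entry only says that output columns correspond to mask ids), and the function name is hypothetical.

```python
def sparse_to_dense_mask(indices, values, default_value, mask, lengths=None):
    """Return (output, presence_mask) as lists, batched when lengths is given."""
    position = {id_: j for j, id_ in enumerate(mask)}

    def densify(ids, vals):
        row = [default_value] * len(mask)
        present = [False] * len(mask)
        for id_, v in zip(ids, vals):
            j = position.get(id_)
            if j is not None:  # assumed: ids not listed in mask are skipped
                row[j] = v
                present[j] = True
        return row, present

    if lengths is None:
        return densify(indices, values)
    outputs, presences, offset = [], [], 0
    for length in lengths:
        row, present = densify(indices[offset:offset + length],
                               values[offset:offset + length])
        outputs.append(row)
        presences.append(present)
        offset += length
    return outputs, presences
```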

SpatialBN

Carries out spatial batch normalization as described in the paper https://arxiv.org/abs/1502.03167 . Depending on the mode it is run in, there are multiple cases for the number of outputs:

  Output case #1: Y, mean, var, saved_mean, saved_var (training mode)
  Output case #2: Y                                    (test mode)

Interface

Arguments
`is_test`	If set to nonzero, run spatial batch normalization in test mode.
`epsilon`	The epsilon value to use to avoid division by zero.
`order`	A StorageOrder string.
`momentum`	Factor used in computing the running mean and variance, e.g., running_mean = running_mean * momentum + mean * (1 - momentum)
`num_batches`	(Optional) Specifies the number of batches to apply normalization on. Requires specifying the optional sums and sumsq inputs that provide statistics across multiple batches from which mean and variance can be determined.
Inputs
`X`	The input 4-dimensional tensor of shape NCHW or NHWC depending on the order parameter.
`scale`	The scale as a 1-dimensional tensor of size C to be applied to the output.
`bias`	The bias as a 1-dimensional tensor of size C to be applied to the output.
`mean`	The running mean (training) or the estimated mean (testing) as a 1-dimensional tensor of size C.
`var`	The running variance (training) or the estimated variance (testing) as a 1-dimensional tensor of size C.
`sums`	(optional) Per-channel sums of elements to be used to determine the mean and variance for this batch
`sumsq`	(optional) Per-channel sums of squared elements to be used to determine the variance for this batch
Outputs
`Y`	The output 4-dimensional tensor of the same shape as X.
`mean`	The running mean after the spatial BN operator. Must be in-place with the input mean. Should not be used for testing.
`var`	The running variance after the spatial BN operator. Must be in-place with the input var. Should not be used for testing.
`saved_mean`	Saved mean used during training to speed up gradient computation. Should not be used for testing.
`saved_var`	Saved variance used during training to speed up gradient computation. Should not be used for testing.
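A minimal sketch of the per-channel test-mode computation, Y = scale * (X - mean) / sqrt(var + epsilon) + bias, together with the running-statistics update given in the `momentum` description. The nested-list NCHW layout and the function names are assumptions for illustration.

```python
import math


def spatial_bn_test(x, scale, bias, mean, var, epsilon=1e-5):
    """Test-mode spatial BN on an NCHW tensor stored as nested lists [n][c][h][w]."""
    out = []
    for sample in x:
        channels = []
        for c, plane in enumerate(sample):
            inv_std = 1.0 / math.sqrt(var[c] + epsilon)
            channels.append([[scale[c] * (v - mean[c]) * inv_std + bias[c] for v in row]
                             for row in plane])
        out.append(channels)
    return out


def update_running(running, batch_stat, momentum=0.9):
    """running = running * momentum + batch_stat * (1 - momentum), per channel."""
    return [r * momentum + b * (1.0 - momentum) for r, b in zip(running, batch_stat)]
```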

  Data = [
    [2.0, 4.0],
    [9.0, 12.0]
  ]

  SCALE = [4, 9]

  OUTPUT = [
    [1.0, 2.0],
    [3.0, 4.0]
  ]
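The Data/SCALE/OUTPUT example above is consistent with dividing each row i of Data by sqrt(SCALE[i]) (2/2, 4/2, 9/3, 12/3). A sketch with a hypothetical helper name:

```python
import math


def square_root_divide(data, scale):
    """Divide every element of row i by sqrt(scale[i])."""
    return [[v / math.sqrt(s) for v in row] for row, s in zip(data, scale)]
```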

Code

caffe2/operators/square_root_divide_op.cc

SquaredL2Distance

Given two input float tensors X and Y, produces one output float tensor of the L2 difference between X and Y, computed as

(X - Y)^2 / 2

Interface

Inputs
`X`	1D or 2D input tensor
`Y`	1D or 2D input tensor (must have the same shape as X)
Outputs
`Z`	1D output tensor
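Because the output is 1D, the squared difference has to be reduced somewhere. The sketch below assumes it is summed over each row of a 2D input, giving one value per row; that reduction is an assumption, and the function name is hypothetical.

```python
def squared_l2_distance(x, y):
    """x, y: 2D lists of the same shape; returns 0.5 * sum((x - y)^2) per row."""
    return [0.5 * sum((a - b) ** 2 for a, b in zip(rx, ry)) for rx, ry in zip(x, y)]
```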

  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (5,)
  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 4, 5), shape(B) = (2), with axis=0

Argument broadcast=1 needs to be passed to enable broadcasting.

Interface

Arguments
`broadcast`	Pass 1 to enable broadcasting
`axis`	If set, defines the broadcast dimensions. See doc for details.
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions and type as A
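The shape pairs listed above follow one rule: B's shape must match a contiguous run of A's dimensions, starting at `axis`, or at the trailing dimensions when `axis` is not set (a scalar B always matches). This sketch checks that rule and assumes no size-1 stretching in the NumPy sense; the function name is hypothetical.

```python
def broadcast_compatible(shape_a, shape_b, axis=None):
    """True if shape_b matches shape_a[axis:axis + len(shape_b)]."""
    if len(shape_b) == 0:  # scalar B broadcasts to any A
        return True
    if axis is None:
        axis = len(shape_a) - len(shape_b)  # align with trailing dims
    if axis < 0 or axis + len(shape_b) > len(shape_a):
        return False
    return tuple(shape_a[axis:axis + len(shape_b)]) == tuple(shape_b)
```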

Code

caffe2/operators/elementwise_sum_op.cc

Sum

Element-wise sum of each of the input tensors. The first input tensor can be used in-place as the output tensor, in which case the sum will be done in place and results will be accumulated in input0. All inputs and outputs must have the same shape and data type.

Interface

Inputs
`data_0`	First of the input tensors. Can be in-place.
Outputs
`sum`	Output tensor. Same dimension as inputs.

Code

SumElements

Sums the elements of the input tensor.

Interface

Arguments
`average`	whether to average or not
Inputs
`X`	Tensor to sum up
Outputs
`sum`	Scalar sum
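The effect of the `average` argument can be sketched on a flat list (hypothetical helper name):

```python
def sum_elements(values, average=False):
    """Sum all elements; divide by the element count when average is True."""
    total = sum(values)
    return total / len(values) if average else total
```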

  shape(A) = (2, 3, 4, 5), shape(B) = (4, 5)
  shape(A) = (2, 3, 4, 5), shape(B) = (,), i.e. B is a scalar
  shape(A) = (2, 3, 4, 5), shape(B) = (3, 4), with axis=1
  shape(A) = (2, 3, 2, 5), shape(B) = (2), with axis=0

Interface

Arguments
`axis`	If set, defines the starting dimension for reduction. Args `axis` and `axis_str` cannot be used simultaneously.
`axis_str`	If set, it could only be N or C or H or W. `order` arg should also be provided. It defines the reduction dimensions on NCHW or NHWC. Args `axis` and `axis_str` cannot be used simultaneously.
`order`	Either NHWC or NCHW
Inputs
`A`	First operand, should share the type with the second operand.
`B`	Second operand. With broadcasting can be of smaller size than A. If broadcasting is disabled it should be of the same size.
Outputs
`C`	Result, has same dimensions and type as B
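Since the result has B's dimensions, A is summed over every axis outside the span that B occupies (the same span rules as the shape examples above). A flat, row-major sketch covering `axis` but not `axis_str`; the function name is hypothetical and `math.prod` needs Python 3.8 or later.

```python
from math import prod


def sum_reduce_to_b(a_flat, shape_a, shape_b, axis=None):
    """Sum row-major A down to B's shape, where B spans A's dims from axis."""
    if axis is None:
        axis = len(shape_a) - len(shape_b)
    pre = prod(shape_a[:axis])                    # leading dims, summed away
    n = prod(shape_b)                             # dims kept (B's shape)
    post = prod(shape_a[axis + len(shape_b):])    # trailing dims, summed away
    out = [0] * n
    for i in range(pre):
        for j in range(n):
            for k in range(post):
                out[j] += a_flat[(i * n + j) * post + k]
    return out
```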

Code

caffe2/operators/reduction_ops.cc

SumSqrElements

Sums the squares of the elements of the input tensor.

Interface

Arguments
`average`	whether to average or not
Inputs
`X`	Tensor to sum up
Outputs
`sum`	Scalar sum of squares

Code

Summarize

Summarize computes four statistics of the input tensor (Tensor): min, max, mean, and standard deviation. The output will be written to a 1-D tensor of size 4 if an output tensor is provided. Otherwise, if the argument 'to_file' is greater than 0, the values are written to a log file in the root folder.

Interface

Arguments
`to_file`	(int, default 0) flag to indicate if the summarized statistics have to be written to a log file.
Inputs
`data`	The input data as Tensor.
Outputs
`output`	1-D tensor (Tensor) of size 4 containing min, max, mean and standard deviation
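The four statistics in the order given above can be computed with the standard `statistics` module. Using the sample standard deviation (N - 1 denominator) is an assumption; the population form would be `statistics.pstdev`. The `to_file` path is not modeled, and the function name is hypothetical.

```python
import statistics


def summarize(data):
    """Return [min, max, mean, std] for a list of at least two numbers."""
    return [min(data), max(data), statistics.fmean(data), statistics.stdev(data)]
```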

Code

caffe2/operators/summarize_op.cc

Swish

Swish takes one input data (Tensor) and produces one output data (Tensor) where the swish function, y = x / (1 + exp(-x)), is applied to the tensor elementwise.

Interface

Inputs
`X`	1D input tensor
Outputs
`Y`	1D output tensor
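Swish is x times the logistic sigmoid of x. This sketch evaluates it element-wise with a numerically stable sigmoid so that very negative inputs do not overflow `math.exp`; the helper names are hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1.0 + z)


def swish(xs):
    """Element-wise swish: x / (1 + exp(-x)), computed as x * sigmoid(x)."""
    return [x * _sigmoid(x) for x in xs]
```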

Code

caffe2/operators/swish_op.cc

SwishGradient

SwishGradient takes X, Y and dY and uses this to update dX according to the chain rule and derivatives of the swish function.
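With y = x * s, where s = sigmoid(x), the derivative is s + x * s * (1 - s) = y + s * (1 - y). This is why the gradient can reuse the forward output Y. A sketch with hypothetical names, checked against a finite difference in the test:

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish_gradient(x, y, dy):
    """dX = dY * (Y + sigmoid(X) * (1 - Y)), element-wise."""
    return [g * (yv + sigmoid(xv) * (1.0 - yv)) for xv, yv, g in zip(x, y, dy)]
```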

Code

caffe2/operators/swish_op.cc

TT

The TT-layer serves as a low-rank decomposition of a fully connected layer. The inputs are the same as to a fully connected layer, but the number of parameters is greatly reduced, and forward computation time can be drastically reduced, especially for layers with large weight matrices. The multiplication is computed as a product of the input vector with each of the cores that make up the TT layer. Given the input sizes (inp_sizes), output sizes (out_sizes), and the ranks of each of the cores (tt_ranks), the ith core will have size:

    inp_sizes[i] * tt_ranks[i] * tt_ranks[i + 1] * out_sizes[i].

The complexity of the computation is dictated by inp_sizes, out_sizes, and tt_ranks, with a trade-off between the accuracy of the low-rank decomposition and the speed of the computation.

Interface

Arguments
`inp_sizes`	(int[]) Input sizes of cores. Indicates the input size of the individual cores; the size of the input vector X must match the product of the inp_sizes array.
`out_sizes`	(int[]) Output sizes of cores. Indicates the output size of the individual cores; the size of the output vector Y must match the product of the out_sizes array.
`tt_ranks`	(int[]) Ranks of cores. Indicates the ranks of the individual cores; a lower rank means larger compression and faster computation but reduced accuracy.
Inputs
`X`	Input tensor from previous layer with size (M x K), where M is the batch size and K is the input size.
`b`	1D blob containing the bias vector
`cores`	1D blob containing each individual cores with sizes specified above.
Outputs
`Y`	Output tensor with size (M x N), where M is the batch size and N is the output size.
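The core-size formula above makes the compression easy to quantify. This sketch assumes `tt_ranks` has one more entry than `inp_sizes` (typically with both end ranks equal to 1), which is the usual tensor-train convention rather than something stated here. The tuple ordering of each core shape is for illustration only, and the function names are hypothetical.

```python
from math import prod


def tt_core_shapes(inp_sizes, out_sizes, tt_ranks):
    """Core i holds inp_sizes[i] * tt_ranks[i] * tt_ranks[i + 1] * out_sizes[i] values."""
    if len(tt_ranks) != len(inp_sizes) + 1:
        raise ValueError("expected len(tt_ranks) == len(inp_sizes) + 1")
    return [(inp_sizes[i], tt_ranks[i], tt_ranks[i + 1], out_sizes[i])
            for i in range(len(inp_sizes))]


def tt_param_count(inp_sizes, out_sizes, tt_ranks):
    return sum(prod(s) for s in tt_core_shapes(inp_sizes, out_sizes, tt_ranks))


def dense_param_count(inp_sizes, out_sizes):
    """Weights of the equivalent K x N fully connected layer."""
    return prod(inp_sizes) * prod(out_sizes)
```

For example, a 64 x 64 layer factored with inp_sizes = out_sizes = [4, 4, 4] and tt_ranks = [1, 2, 2, 1] needs 128 core values instead of 4096 dense weights.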

  output = [v1, v2, v3, v4, v5, v6, v7, v8];

One application for this operator is to transfer data from the RNN format back to sequence values. Note that the gradient operator of UnpackRNNSequence is PackRNNSequence.

Interface

Inputs
`values`	Data tensor, contains the packed features
`lengths`	lengths with each number representing the pack size.
Outputs
`output`	Output tensor before packing

Code

caffe2/operators/pack_rnn_sequence_op.cc

UnpackSegments

Map N+1 dim tensor to N dim based on length blob

Interface

Inputs
`lengths`	1-d int/long tensor containing the length of each segment of the input.
`tensor`	N+1 dim Tensor.
Outputs
`packed_tensor`	N dim Tensor
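A sketch of the N+1 to N mapping, assuming the input is a padded batch of shape (len(lengths), max_length, ...) whose padding rows are dropped before the segments are concatenated. The function name is hypothetical.

```python
def unpack_segments(lengths, padded):
    """Keep the first lengths[i] rows of each padded segment and concatenate them."""
    out = []
    for length, segment in zip(lengths, padded):
        out.extend(segment[:length])
    return out
```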

Code

caffe2/operators/pack_segments.cc

UnsafeCoalesce

Coalesce the N inputs into N outputs and a single coalesced output blob. This allows operations that operate over multiple small kernels (e.g. biases in a deep CNN) to be coalesced into a single larger operation, amortizing the kernel launch overhead, synchronization costs for distributed computation, etc. The operator:

  - computes the total size of the coalesced blob by summing the input sizes
  - allocates the coalesced output blob with the total size
  - copies the input vectors into the coalesced blob, at the correct offset
  - aliases each Output(i) to point into the coalesced blob, at the corresponding offset for Input(i)

This is ‘unsafe’ as the output vectors are aliased, so use with caution.
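The aliasing that makes this operator "unsafe" can be demonstrated with Python's `array` and `memoryview`. Each view below shares memory with the coalesced buffer, so a write through either is visible through the other. This is an analogy on the CPU with a hypothetical function name, not the Caffe2 implementation.

```python
from array import array


def unsafe_coalesce(inputs):
    """Copy float lists into one buffer; return (views, coalesced).

    Each view is a memoryview slice that aliases the coalesced buffer.
    """
    total = sum(len(x) for x in inputs)    # sum of the input sizes
    coalesced = array("d", [0.0]) * total  # single allocation of that size
    offsets, offset = [], 0
    for x in inputs:
        offsets.append(offset)
        for k, v in enumerate(x):          # copy input i at its offset
            coalesced[offset + k] = v
        offset += len(x)
    buf = memoryview(coalesced)
    # Slicing a memoryview does not copy: each output points into `coalesced`.
    views = [buf[off:off + len(x)] for off, x in zip(offsets, inputs)]
    return views, coalesced
```

As a side effect of the aliasing, Python refuses to resize `coalesced` while the views exist, a small reminder of why the aliased outputs must be handled with care.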

Code