A templated class to allow one to wrap a CPU operator as a CUDA operator.
#include <operator_fallback_gpu.h>
Public Member Functions | |
USE_OPERATOR_FUNCTIONS (CUDAContext) | |
GPUFallbackOpEx (const OperatorDef &def, Workspace *ws) | |
bool | RunOnDevice () override |
Public Member Functions inherited from caffe2::Operator< CUDAContext > | |
Operator (const OperatorDef &operator_def, Workspace *ws) | |
Operator (const c10::FunctionSchema &fn_schema, std::vector< c10::IValue > inputs, std::vector< at::Tensor > outputs) | |
const Tensor & | Input (int idx, DeviceType type=CUDAContext::GetDeviceType()) |
Retrieve a non-owning reference to the input at position 'idx' for this operator. | |
Tensor | XOutput (int idx, at::IntArrayRef dims, at::TensorOptions options) |
XOutput is a modernized version of Output that returns a Tensor rather than a Tensor* (the raw pointer in the latter case is useless, as Tensor is already a pointer type). | |
Public Member Functions inherited from caffe2::OperatorBase | |
OperatorBase (const OperatorDef &operator_def, Workspace *ws) | |
OperatorBase (const c10::FunctionSchema &schema, std::vector< c10::IValue > inputs, std::vector< at::Tensor > outputs) | |
bool | isLegacyOperator () const |
Return true if the operator was instantiated with OperatorDef. New operators should be instantiated with FunctionSchema. | |
const c10::FunctionSchema & | getFunctionSchema () const |
bool | HasArgument (const string &name) const |
Checks if the operator has an argument of the given name. | |
template<typename T > | |
T | GetSingleArgument (const string &name, const T &default_value) const |
template<typename T > | |
bool | HasSingleArgumentOfType (const string &name) const |
template<typename T > | |
vector< T > | GetVectorFromIValueList (const c10::IValue &value) const |
template<typename T > | |
vector< T > | GetRepeatedArgument (const string &name, const vector< T > &default_value={}) const |
template<typename T > | |
const T & | Input (int idx) |
template<typename T > | |
const T & | Input (int idx, DeviceType type) |
template<typename T > | |
T * | Output (int idx) |
template<typename T > | |
T * | Output (int idx, DeviceType type) |
Tensor | XOutputTensor (int idx, at::IntArrayRef dims, at::TensorOptions options) |
void | SetOutputTensor (int idx, Tensor tensor) |
Tensor | OutputTensorOrUndefined (int idx) |
Tensor * | OutputTensor (int idx, at::IntArrayRef dims, at::TensorOptions options) |
Tensor * | OutputTensorCopyFrom (int idx, at::TensorOptions options, const Tensor &src, bool async=false) |
Tensor * | OutputTensorAlias (int idx, const Tensor &src) |
template<typename T > | |
T * | Output (int idx, T *allocated) |
const Blob & | InputBlob (int idx) |
Blob * | OutputBlob (int idx) |
bool | IsInputOutputAlias (int i, int j) |
template<typename T > | |
bool | InputIsType (int idx) |
bool | InputIsTensorType (int idx, DeviceType device_type) |
template<typename T > | |
bool | OutputIsType (int idx) |
bool | OutputIsTensorType (int idx, DeviceType type) |
int | InputSize () const |
int | OutputSize () const |
const vector< const Blob * > & | Inputs () const |
const vector< Blob * > & | Outputs () |
vector< TensorShape > | InputTensorShapes () const |
virtual void | WaitEvent (const Event &ev, int=-1) |
void | Wait (const OperatorBase &other, int stream_id=-1) |
virtual void | WaitEvents (const std::vector< const Event * > &events, int=-1) |
virtual void | Finish () |
virtual bool | Run (int=0) |
virtual bool | HasAsyncPart () const |
virtual bool | SupportsAsyncScheduling () const |
virtual bool | RunAsync (int stream_id=0) |
virtual void | AddRelatedBlobInfo (EnforceNotMet *err) |
const OperatorDef & | debug_def () const |
void | set_debug_def (const std::shared_ptr< const OperatorDef > &operator_def) |
bool | has_debug_def () const |
void | RecordLastFailedOpNetPosition () |
int | net_position () const |
void | set_net_position (int idx) |
const DeviceOption & | device_option () const |
const Event & | event () const |
Event & | event () |
void | ResetEvent () |
void | DisableEvent () |
bool | IsEventDisabled () const |
virtual void | SyncDeviceBarrierForObservers () |
virtual bool | IsStreamFree (int) const |
const std::string & | type () const |
void | annotate_engine (const std::string &engine) |
const std::string & | engine () const |
void | SetExecutorHelper (ExecutorHelper *helper) |
ExecutorHelper * | GetExecutorHelper () const |
std::vector< at::Tensor > | move_newstyle_outputs ()&& |
template<> | |
NetDef | GetSingleArgument (const std::string &name, const NetDef &default_value) const |
template<> | |
vector< int > | GetVectorFromIValueList (const c10::IValue &value) const |
template<> | |
vector< float > | GetVectorFromIValueList (const c10::IValue &value) const |
template<> | |
vector< string > | GetVectorFromIValueList (const c10::IValue &value) const |
Public Member Functions inherited from caffe2::Observable< OperatorBase > | |
Observable (Observable &&)=default | |
Observable & | operator= (Observable &&)=default |
C10_DISABLE_COPY_AND_ASSIGN (Observable) | |
const Observer * | AttachObserver (std::unique_ptr< Observer > observer) |
std::unique_ptr< Observer > | DetachObserver (const Observer *observer_ptr) |
Returns a unique_ptr to the removed observer. | |
virtual size_t | NumObservers () |
void | StartAllObservers () |
void | StopAllObservers () |
Protected Attributes | |
Workspace | local_ws_ |
vector< Blob * > | local_input_blobs_ |
vector< Blob * > | local_output_blobs_ |
unique_ptr< OperatorBase > | base_op_ |
Protected Attributes inherited from caffe2::OperatorBase | |
std::unique_ptr< Event > | event_ |
Protected Attributes inherited from caffe2::Observable< OperatorBase > | |
std::vector< std::unique_ptr< Observer > > | observers_list_ |
Additional Inherited Members | |
Public Types inherited from caffe2::Observable< OperatorBase > | |
using | Observer = ObserverBase< OperatorBase > |
Static Public Attributes inherited from caffe2::OperatorBase | |
static const int | kNoNetPositionSet = -1 |
Protected Member Functions inherited from caffe2::OperatorBase | |
virtual void | RecordEvent (const char *=nullptr) |
void | SetEventFinished (const char *err_msg=nullptr) |
void | SetEventFinishedWithException (const char *err_msg=nullptr) |
std::string | getErrorMsg () |
C10_DISABLE_COPY_AND_ASSIGN (OperatorBase) | |
A templated class to allow one to wrap a CPU operator as a CUDA operator.
This class can be used when the CUDA implementation of an operator is not ready yet. It wraps the CPU operator and automatically handles the data copies between the GPU and the CPU for you. These copies add considerable overhead and are not optimal, so use this operator mostly for quick prototyping.
All inputs and outputs of the original operator should be TensorCPU.
Example usage: if you have a CPU-based class MyMagicOp and you register the CPU side with
    REGISTER_CPU_OPERATOR(MyMagic, MyMagicOp);
you can create its corresponding GPU operator (with a performance hit, of course) via
    REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOp);
Note that both registrations must use the same operator name.
Advanced usage: if you want specific outputs never to be copied, use the SkipOutputCopy template argument. For example, if MyMagic produces two outputs and the first output always lives on the CPU, you can do
    REGISTER_CUDA_OPERATOR(MyMagic, GPUFallbackOpEx<SkipIndices<0>>);
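The copy-in, run, copy-out behavior described above, including the skipped outputs, can be illustrated with a minimal standalone sketch. It does not use Caffe2 or CUDA: SkipIndicesSketch, ToyDevice, ToyTensor, RunToyCpuOp and RunWithCpuFallback are all hypothetical names invented for illustration, and the "copies" only change a device tag. It is meant to show the idea under those assumptions, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time index set, loosely modeled on the SkipIndices<...>
// template argument mentioned above. This is NOT the Caffe2 definition.
template <int... Values>
struct SkipIndicesSketch;

template <>
struct SkipIndicesSketch<> {
  static bool Contains(int /*i*/) { return false; }  // empty set: skip nothing
};

template <int First, int... Rest>
struct SkipIndicesSketch<First, Rest...> {
  static bool Contains(int i) {
    return i == First || SkipIndicesSketch<Rest...>::Contains(i);
  }
};

// Toy stand-ins for device-tagged tensors (hypothetical types).
enum class ToyDevice { CPU, GPU };

struct ToyTensor {
  ToyDevice device;
  std::vector<float> data;
};

// Hypothetical CPU-only operator with two outputs:
// output 0 holds the element count, output 1 holds the doubled input.
std::vector<ToyTensor> RunToyCpuOp(const std::vector<ToyTensor>& inputs) {
  const ToyTensor& x = inputs.at(0);
  assert(x.device == ToyDevice::CPU);  // the wrapped op only ever sees CPU data
  ToyTensor count{ToyDevice::CPU, {static_cast<float>(x.data.size())}};
  ToyTensor doubled{ToyDevice::CPU, x.data};
  for (float& v : doubled.data) v *= 2.0f;
  return {count, doubled};
}

// Fallback wrapper sketch: stage inputs on the CPU, run the CPU operator,
// then move every output back to the GPU unless its index is in Skip.
template <typename Skip>
std::vector<ToyTensor> RunWithCpuFallback(
    const std::vector<ToyTensor>& gpu_inputs) {
  std::vector<ToyTensor> cpu_inputs;
  for (const ToyTensor& t : gpu_inputs) {
    cpu_inputs.push_back(ToyTensor{ToyDevice::CPU, t.data});  // "device to host"
  }
  std::vector<ToyTensor> outputs = RunToyCpuOp(cpu_inputs);
  for (std::size_t i = 0; i < outputs.size(); ++i) {
    if (!Skip::Contains(static_cast<int>(i))) {
      outputs[i].device = ToyDevice::GPU;  // "host to device"
    }
  }
  return outputs;
}
```

With this sketch, RunWithCpuFallback<SkipIndicesSketch<>> tags both outputs as GPU, while RunWithCpuFallback<SkipIndicesSketch<0>> leaves output 0 on the CPU and moves only output 1, mirroring the SkipIndices<0> registration above.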
Definition at line 42 of file operator_fallback_gpu.h.