CUDA.NET

CUFFT API function return values.

Represents an exception that occurred in the driver.

Gets the error code returned by CUDA driver that caused the exception.

Compute Modes.

Default compute mode (Multiple contexts allowed per device).

Compute-exclusive mode (Only one context can be present on this device at a time).

Compute-prohibited mode (No contexts can be created on this device at this time).

Device represents a single device that is recognized by CUDA in the system. It provides all details about the device that can be obtained from the CUDA driver.

Holds the ordinal value of the device.

Holds the full name of the device.

Holds the compute capability as a version of the device.

Holds a handle to the device.

Contains further details about the device capabilities.

Holds the total memory available on the device.

Gets the ordinal of the device as recognized by CUDA.

Gets the full name of the device.

Gets the compute capability of the device as a version.

Gets a handle to the device to be used through other CUDA functions.

Gets more advanced properties of the device.

Gets the total memory available on the device.

Array formats.

Unsigned 8-bit integers.

Unsigned 16-bit integers.

Unsigned 32-bit integers.

Signed 8-bit integers.

Signed 16-bit integers.

Signed 32-bit integers.

16-bit floating point.

32-bit floating point.

Legacy device properties.

Maximum number of threads per block.

Maximum size of each dimension of a block.

Maximum size of each dimension of a grid.

Shared memory available per block in bytes.

Constant memory available on device in bytes.

Warp size in threads.

Maximum pitch in bytes allowed by memory copies.

32-bit registers available per block.

Clock frequency in kilohertz.

Alignment requirement for textures.

CUDAExecution is a helper class that creates execution plans for CUDA and executes them, hiding most CUDA internals. It gives control over most aspects of CUDA at the necessary level of abstraction.

With this class it is possible to load modules using only their names, without specifying the extension or exact directory. For example, one can specify a module named "transpose" to be loaded, although the original file is named "transpose.cubin", much like using DLL files under Windows. The module will be searched for in the current directory if a full path is not specified. A .cubin extension is added automatically to the module name if it does not exist.
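The lookup rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the library's implementation: the function name `resolve_module_path`, the `search_dir` argument, and the case-insensitive extension check are assumptions made for the example.

```python
import os


def resolve_module_path(name, search_dir=None):
    """Map a module name to a file path, following the rule described above.

    Hypothetical helper: the name and the search_dir parameter are
    illustrative and are not part of CUDA.NET.
    """
    # Append ".cubin" only when the name does not already carry it
    # (case-insensitive comparison is an assumption).
    if not name.lower().endswith(".cubin"):
        name = name + ".cubin"
    # A full path is used as given; otherwise the file is looked up in the
    # current directory (or in search_dir, which exists only for testing).
    if os.path.isabs(name):
        return name
    base = search_dir if search_dir is not None else os.getcwd()
    return os.path.join(base, name)


print(resolve_module_path("transpose", "/work"))
```

With this sketch, "transpose" and "transpose.cubin" resolve to the same file, which is the behavior the description implies.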

Creates an execution plan using the first device and the provided parameters.

CUBIN module file to use. Function name to use.

Creates an execution plan using the specified device and the provided parameters.

Device ordinal to use. CUBIN module file to use. Function name to use.

Creates an execution plan using the specified CUDA class and provided parameters.

Previously created CUDA class to use for GPU operations. CUBIN module file to use. Function name to use.

Launches the requested function on the GPU using the given execution configuration.

Grid configuration (number of blocks in X,Y dimensions). Block configuration (number of threads in X,Y,Z dimensions). Total runtime in milliseconds.

Launches the requested function on the GPU using the given execution configuration.

Blocks in X dimension. Blocks in Y dimension. Threads in X dimension. Threads in Y dimension. Threads in Z dimension. Total runtime in milliseconds.
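When choosing the grid and block values for a launch, a common approach is to fix the block size and derive the number of blocks by ceiling division. The sketch below shows that arithmetic in Python; the helper names `blocks_needed` and `total_threads` are hypothetical and only illustrate the calculation, they are not CUDA.NET functions.

```python
def blocks_needed(n_items, threads_per_block):
    """Ceiling division: blocks required so that every item gets a thread."""
    if n_items < 0 or threads_per_block <= 0:
        raise ValueError("n_items must be >= 0 and threads_per_block > 0")
    return (n_items + threads_per_block - 1) // threads_per_block


def total_threads(grid_x, grid_y, block_x, block_y, block_z):
    """Number of threads launched by a 2D grid of 3D blocks."""
    return grid_x * grid_y * block_x * block_y * block_z


# 1000 items with 256 threads per block in the X dimension only.
grid_x = blocks_needed(1000, 256)
print(grid_x, total_threads(grid_x, 1, 256, 1, 1))
```

Because the last block is usually only partly used, the kernel itself normally needs a bounds check on the item index.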

Clears all resources used by this instance on the GPU (allocated memory for parameters etc.).

Adds a float scalar parameter to the function.

Name for the parameter to create. Float value to set. Index of the new parameter.

Adds an integer scalar parameter to the function.

Name for the parameter to create. Integer value to set. Index of the new parameter.

Adds an array parameter to the function.

Name for the parameter to create. Array data to set. Index of the new parameter. Default direction is as input.

Adds an array parameter to the function.

One of CUDA supported primitives or vector types. Name for the parameter to create. Array data to set. Direction for the buffer. Index of the new parameter.

Adds a vector parameter to the function.

One of CUDADriver supported vector types. Name for the parameter to create. Vector data to set. Index of the new parameter.

Adds a parameter to the function.

Custom parameter to add. Index of the new parameter.

Reads data from GPU memory for the specified parameter.

Type of expected data. Allocated array to contain the copied data. Index of parameter to copy data from.

Reads data from GPU memory for the specified parameter.

Type of expected data. Allocated array to contain the copied data. Name of parameter to read data from.

Reads data from GPU memory for the specified parameter.

Type of expected data. Allocated array to contain the copied data. Parameter to read data from.

Gets or sets the parameter in the specified index.

Zero based index for the parameter. Parameter in the specified index.

Gets or sets the parameter according to its name.

Name of the parameter. Parameter with the specified name.

Gets the time (in milliseconds) of the last execution.

Gets the name of the module to be used by this class.

Gets the CUDA object of the module to be used by this class.

Gets the name of the function to be called by this class.

Gets the CUDA object of the function to be called by this class.

Gets the CUDA instance used by this class.

CUFFT transform directions.

Forward FFT.

Inverse FFT.

Represents an exception that occurred in the BLAS driver.

Gets the error code returned by CUFFT driver that caused the exception.

Provides an object oriented model for accessing BLAS functionality of CUDA, using CUDADriver to communicate with CUDA.

Creates a new instance of CUBLAS class.

CUDA object to use for memory allocation and other operations.

Initializes the CUBLAS driver.

Shuts down and releases all resources used by CUBLAS driver.

Returns the last error or result returned by calling one of CUBLAS driver functions.

Last error or result returned by calling one of CUBLAS driver functions.

Allocates memory with the specified amount.

Number of elements to allocate memory for. Size of each element to allocate memory for. Pointer to device memory that can be used with other CUBLAS functions.

Allocates device memory for the specified one dimensional array.

Type of the array element, must be one of the supported CUDA primitives. Array object to allocate memory for. Pointer to device memory that can be used with other CUBLAS functions.

Frees a previously allocated device memory.

Pointer to device memory.

Sets the vector in device memory given by ptr to the values of the array.

Type of array and destination vector, must be one of CUDA supported primitives. Array to copy to device memory. Vector in device memory to set.

Sets the vector in device memory given by ptr to the values of the array.

Type of array and destination vector, must be one of CUDA supported primitives. Array to copy to device memory. Offset from the beginning of the array to start copy from. Vector in device memory to set. Offset from the beginning of the device vector to start copy to.

Copies data from the device vector into the specified array.

Type of array to copy data to, must be one of CUDA supported primitives. Vector in device memory. Array to copy data to.

Copies data from the device vector to the array.

Type of array and source vector, must be one of CUDA supported primitives. Vector in device memory to copy from. Offset from the beginning of the vector to start copy from. Array to copy device memory to. Offset from the beginning of the array to start copy to.

Sets the matrix in device memory to values of the specified array.

Type of array and destination matrix, must be one of CUDA supported primitives. Number of rows of the matrix to set. Number of columns of the matrix to set. Array containing values to copy to device. Leading dimension of source matrix. Matrix in device memory to copy data to. Leading dimension of destination matrix.

Copies matrix data stored in device memory to the specified array.

Type of array and source matrix, must be one of CUDA supported primitives. Number of rows of the matrix to copy. Number of columns of the matrix to copy. Matrix in device memory to copy data from. Leading dimension of source matrix. Array to copy data to. Leading dimension of destination matrix.
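The "leading dimension" parameters of the two matrix functions above are easier to follow with a small example. The sketch below assumes the column-major storage convention used by NVIDIA's CUBLAS, where element (row, col) lives at `row + col * ld`; whether CUDA.NET rearranges managed arrays before the copy is not stated in this reference. The helper names are hypothetical.

```python
def col_major_index(row, col, ld):
    """Linear index of (row, col) in a column-major buffer with leading dimension ld."""
    return row + col * ld


def pack_submatrix(src, rows, cols, lda, ldb):
    """Copy a rows x cols block from a buffer with leading dimension lda
    into a new buffer with leading dimension ldb."""
    if lda < rows or ldb < rows:
        raise ValueError("leading dimensions must be at least the number of rows")
    dst = [0.0] * (ldb * cols)
    for j in range(cols):
        for i in range(rows):
            dst[col_major_index(i, j, ldb)] = src[col_major_index(i, j, lda)]
    return dst


# A 3 x 2 matrix stored column-major with lda = 4 (one padding entry per column).
a = [1.0, 2.0, 3.0, -1.0,
     4.0, 5.0, 6.0, -1.0]
print(pack_submatrix(a, rows=3, cols=2, lda=4, ldb=3))
```

The leading dimension is therefore the stride between consecutive columns, which can be larger than the row count when the matrix is a view into a bigger buffer.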

Holds a value that indicates for the class whether to throw runtime exceptions when an error result is returned by calling any of the CUBLAS driver functions.

Default is true.

Holds the last result returned by calling one of the CUBLAS driver functions.

Holds a reference to a CUDA class to provide memory allocation capabilities.

Gets the last error/result returned by calling CUBLAS driver functions.

Gets or sets a value to indicate whether to use runtime exceptions when a CUBLAS driver function returns an error, or to ignore that error.

The default value is true.

Event creation flags.

Default event flag.

Event uses blocking synchronization.

Memory types.

Host memory.

Device memory.

Array memory.

CUBLAS status returns.

DeviceProperties holds advanced information for every device.

Holds the maximum number of threads that each block supports.

Holds an array that corresponds to the three dimensions in which threads can be specified: X, Y and Z. The number in every cell specifies the maximum number of threads supported by each block in the given dimension.

Holds an array that corresponds to three dimensions, as blocks can be specified: X, Y and Z. The number in every cell specifies the maximum number of blocks supported in the given dimension.

Holds a value that indicates the maximum amount of shared memory (as bytes) for every block.

Holds a value that indicates the maximum available constant memory in the device.

Holds the warp size, that is, the number of threads that are executed together.

Holds a value that indicates the supported memory pitch for the device.

Holds a value that indicates the maximum number of registers that can be used by a single block.

Holds a value that indicates the clock rate at which the device operates.

Holds a value that indicates the minimum alignment requirement for textures.

Gets the maximum number of threads supported per block.

Gets the maximum number of threads that can be specified in every dimension of a block (3D - X, Y and Z).

Gets the maximum number of blocks that can be specified in every dimension of a grid (3D - X, Y and Z).

Gets the total amount of shared memory per block.

Gets the total amount of constant memory accessible for the device.

Gets the warp size, that is, the number of threads that are executed together.

Gets a value that indicates the memory pitch supported by the device.

Gets the number of registers available per block.

Gets the clock rate at which the device operates.

Gets the minimum requirement for texture alignment in the device.
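The limits listed above are typically used to validate a launch configuration before calling the device. The following Python sketch shows one way to do that check. The `DeviceLimits` type, the `check_launch` function and the numeric limits are hypothetical stand-ins; in practice the values would be read from the device's properties.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DeviceLimits:
    """Hypothetical container mirroring a few of the properties above."""
    max_threads_per_block: int
    max_threads_dim: Tuple[int, ...]
    max_grid_size: Tuple[int, ...]


def check_launch(limits: DeviceLimits, grid: Sequence[int], block: Sequence[int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the launch fits."""
    problems = []
    threads = 1
    for size in block:
        threads *= size
    if threads > limits.max_threads_per_block:
        problems.append("block has %d threads, limit is %d"
                        % (threads, limits.max_threads_per_block))
    for axis, (size, limit) in enumerate(zip(block, limits.max_threads_dim)):
        if size > limit:
            problems.append("block dimension %d is %d, limit is %d" % (axis, size, limit))
    for axis, (size, limit) in enumerate(zip(grid, limits.max_grid_size)):
        if size > limit:
            problems.append("grid dimension %d is %d, limit is %d" % (axis, size, limit))
    return problems


# Example limits chosen for illustration only.
limits = DeviceLimits(512, (512, 512, 64), (65535, 65535, 1))
print(check_launch(limits, grid=(64, 64), block=(16, 16, 1)))
print(check_launch(limits, grid=(64, 64), block=(32, 32, 1)))
```

Note that a block can satisfy every per-dimension limit and still exceed the total thread limit, which is why both checks are made.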

Provides access to the CUFFT emulation driver API.

3D array descriptor.

Width of 3D array.

Height of 3D array.

Depth of 3D array.

Array format.

Channels per array element.

Flags.

Cubin matching fallback strategies.

Prefer to compile ptx.

Prefer to fall back to compatible binary code.

Array descriptor.

Width of array.

Height of array.

Array format.

Channels per array element.

CUDARuntime provides access to runtime API for CUDA.

Default page-locked allocation flag.

Pinned memory accessible by all CUDA contexts.

Map allocation into device space.

Write-combined memory.

Default event flag.

Event uses blocking synchronization.

Device flag - Automatic scheduling.

Device flag - Spin default scheduling.

Device flag - Yield default scheduling.

Device flag - Use blocking synchronization.

Device flag - Support mapped pinned allocations.

Device flags mask.

Flags to register a resource.

Flags to map or unmap a resource.

Flags to register a resource.

Flags to map or unmap a resource.

CUDA D3D9 Register Flags.

Default; Resource can be accessed through a void*.

Resource can be accessed through a CUarray*.

CUDA D3D9 Map Flags.

Default; Assume resource can be read/written.

CUDA kernels will not write to this resource.

CUDA kernels will only write to and will not read from this resource.

CUDA D3D10 Register Flags.

Default; Resource can be accessed through a void*.

Resource can be accessed through a CUarray*.

CUDA D3D10 Map Flags.

Default; Assume resource can be read/written.

CUDA kernels will not write to this resource.

CUDA kernels will only write to and will not read from this resource.

Online compilation targets.

Compute device class 1.0.

Compute device class 1.1.

Compute device class 1.2.

Compute device class 1.3.

Function properties.

The number of threads beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded.

The size in bytes of statically-allocated shared memory required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime.

The size in bytes of user-allocated constant memory required by this function.

The size in bytes of thread local memory used by this function.

The number of registers used by each thread of this function.

CUFFT supports the following transform types.

Real to Complex (interleaved).

Complex (interleaved) to Real.

Complex to Complex, interleaved.

Double to Double-Complex.

Double-Complex to Double.

Double-Complex to Double-Complex.

Error codes.

No errors.

Invalid value.

Out of memory.

Driver not initialized.

Driver deinitialized.

No CUDA-capable device available.

Invalid device.

Invalid kernel image.

Invalid context.

Context already current.

Map failed.

Unmap failed.

Array is mapped.

Already mapped.

No binary for GPU.

Already acquired.

Not mapped.

Invalid source.

File not found.

Invalid handle.

Not found.

CUDA not ready.

Launch failed.

Launch exceeded resources.

Launch exceeded timeout.

Launch with incompatible texturing.

Unknown error.

CUDA provides an object oriented approach to the CUDA driver API, thus simplifying access to CUDA functionality.

After every call to a driver function, an internal field is set to hold the error value returned by that function. This value can be read through the LastError property of the object.
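The error-handling pattern described here (store the result of every call, optionally raise an exception on failure) can be sketched as below. This is a Python model of the pattern only; `DriverWrapper`, `DriverError`, `fake_driver_call` and the result codes are hypothetical and do not reproduce CUDA.NET's types.

```python
class DriverError(Exception):
    """Carries the result code that caused the failure."""

    def __init__(self, code):
        super().__init__("driver call failed with code %d" % code)
        self.code = code


class DriverWrapper:
    SUCCESS = 0  # assumed "no error" code for this sketch

    def __init__(self, use_exceptions=True):
        self.use_exceptions = use_exceptions
        self.last_error = self.SUCCESS

    def call(self, func, *args):
        """Invoke a driver-style function and record its result code."""
        code = func(*args)
        self.last_error = code  # always recorded, success or failure
        if code != self.SUCCESS and self.use_exceptions:
            raise DriverError(code)
        return code


def fake_driver_call(code):
    """Stand-in for a native function that returns a result code."""
    return code


quiet = DriverWrapper(use_exceptions=False)
quiet.call(fake_driver_call, 2)
print(quiet.last_error)
```

With exceptions disabled the caller is responsible for inspecting the stored result after each call; with exceptions enabled the same code travels inside the raised exception.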

Creates a new instance of CUDA without initializing the driver.

Creates a new instance of CUDA without initializing the driver and selects a device to work with.

Device ID to select.

Creates a new instance of CUDA, allowing control over whether to initialize the driver, using the default flags (InitializationFlags.None).

true to initialize the driver, false otherwise.

Creates a new instance and binds to the selected device.

Device ID to select. true to initialize the driver, false otherwise.

Creates a new instance of CUDA, allowing control over whether to initialize the driver.

true to initialize the driver, false otherwise. Specifies which flags to pass to cuInit function that initializes the driver.

Destructor. Calls the Dispose function of the object.

Releases all resources used by the object while using CUDA.

Initializes the CUDA driver with default flags (InitializationFlags.None).

Initializes the CUDA driver with the specified flags.

Flags to pass to cuInit.

Gets a device with specified ordinal.

Ordinal of the device to get. The device with the specified ordinal.

Returns the number of devices identified by CUDA driver.

The number of devices identified by CUDA driver.

Returns the name of the specified device.

Device whose name to get. The name of the specified device.

Returns the name of the specified device.

Ordinal of the device whose name to get. The name of the specified device.

Returns the name of the current device.

The name of the current device.

Returns the attribute value for the specified device.

Attribute to get value for. Device to get attribute value for. The attribute value for the specified device.

Returns the attribute value for the current device.

Attribute to get value for. The attribute value for the current device.

Creates a new context, attached to the specified device ordinal.

Ordinal of the device to attach to. Context object to be used with other context related functions.

Creates a new context, attached to the specified device ordinal.

Ordinal of the device to attach to. Specific flags to pass to cuCtxCreate. Context object to be used with other context related functions.

Destroys the current context.

Destroys the provided context.

Context to destroy.

Attaches the driver to a previously created context.

Context to attach driver functions to.

Attaches the driver to a previously created context.

Context to attach driver functions to. Flags to pass to cuCtxAttach.

Detaches the current context from the driver.

Detaches the specified context from the driver.

Context to detach.

Pushes the current context on the driver context stack.

Pushes the specified context on the driver context stack.

Context to push.

Pops the context on the top of the driver context stack. Returns the popped context and makes it the current context for this class instance.

Popped context from driver context stack.
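The push and pop behavior described above can be modeled with an ordinary stack. The Python sketch below mirrors only what this reference states (push the current or a given context; pop returns the top context and makes it current for the instance). The `ContextStack` class is hypothetical and does not model what the native driver does internally.

```python
class ContextStack:
    """Toy model of the push/pop semantics described above."""

    def __init__(self):
        self._stack = []
        self.current = None  # context tracked by the wrapper instance

    def push(self, ctx=None):
        """Push the given context, or the current one when none is given."""
        ctx = self.current if ctx is None else ctx
        if ctx is None:
            raise ValueError("no context to push")
        self._stack.append(ctx)

    def pop(self):
        """Pop the top context, make it current, and return it."""
        if not self._stack:
            raise IndexError("context stack is empty")
        self.current = self._stack.pop()
        return self.current


stack = ContextStack()
stack.current = "ctx_a"
stack.push()          # pushes the current context
stack.push("ctx_b")   # pushes a specific context
print(stack.pop(), stack.current)
```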

Returns the device the current context is attached to.

The device the current context is attached to.

Synchronizes all operations performed in this context and waits for them to finish.

This function is especially useful when performing memory operations or launching functions on the device asynchronously.

Loads the specified module using the specified file path.

Filename to load. Module object to be used across other module functions. Use this function to load cubin files for executing functions on the device. Note that a full path should be specified to avoid problems with the driver.

Loads the specified module from a binary data.

Byte array containing a cubin file representation to load. Module object to be used across other module functions.

Used to load cubin files attached together. This method isn't supported by the CUDA driver.

Byte array containing several cubin files. Module object to be used across other module functions.

Unloads the current module from the driver.

Unloads the specified module from the driver.

Module to unload.

Returns the requested function from the current module.

Function name to load. Function object to be used across other function management functions. When specifying function names, note that the compiler uses C++ name mangling, so to use simple naming, add the extern "C" directive before the __global__ keyword.

Returns the requested function from the specified module.

Module to load the function from. Function name to load. Function object to be used across other function management functions.

Returns a pointer to a global resource in the device code of the current module.

Name of the global resource to get. Pointer to the data.

Returns a pointer to a global resource in the device code of the specified module.

Module to get the global from. Name of the global resource to get. Pointer to the data.

Returns the size in bytes of the global resource from the current module.

Name of the global whose size to get. Size in bytes of the global resource.

Returns the size in bytes of the global resource from the specified module.

Module to get the global size from. Name of the global whose size to get. Size in bytes of the global resource.

Returns a texture reference from the current module.

Name of texture to get. Texture reference.

Returns a texture reference from the specified module.

Module to get texture from. Name of texture to get. Texture reference.

Allocates host memory using cuMemAllocHost. Memory allocated by this function can be used for asynchronous memory operations.

Number of bytes to allocate. Pointer to native memory to use. Memory allocated by this function must be freed using FreeHost.

Allocates host memory using cuMemAllocHost. Memory allocated by this function can be used for asynchronous memory operations.

One of CUDA supported primitives or vector types. Array to allocate enough memory for. Pointer to native memory to use. Memory allocated by this function must be freed using FreeHost.

Allocate host memory that has device pointer attached (zero copy).

Size of buffer to allocate. Flags for buffer allocation. Pointer to host memory with device pointer attached.

Returns the device pointer attached to host buffer (zero copy).

Host pointer allocated with HostAllocate. Flags for buffer allocation. The device pointer attached to host buffer (zero copy).

Allocate device memory using the specified amount of bytes.

Bytes of device memory to allocate. Pointer to device memory.

Allocate device memory using the provided array to determine the size in bytes needed to host the array in device memory.

One of CUDADriver supported primitives. Array to allocate memory for. Pointer to device memory.

Free previously allocated device memory.

Pointer to allocated device memory.

Frees host memory previously allocated using AllocateHost or a similar driver function.

Allocated pointer to free.

Copies the given array to device memory, returning the allocated device memory pointer.

One of CUDADriver supported primitives. Array to copy to device memory. Pointer to device memory.

Copies the given array to device memory using a pre-allocated pointer.

One of CUDADriver supported primitives. Pointer to allocated device memory. Array to copy to device memory.

Copies the given buffer to device memory using a pre-allocated pointer.

Pointer to allocated device memory. Pointer of host memory to copy from.

Copies the given buffer to device memory, returning the allocated device memory pointer.

Pointer of host memory to copy from. Size of data to copy, in bytes. Pointer to device memory.

Copies memory from the device to the specified array.

One of CUDADriver supported primitives. Pointer to device memory containing the data to copy. Array to copy the data to.

Copies memory from the device to the specified buffer.

Source device pointer to copy from. Pointer to memory to copy to. Amount of bytes to copy.

Intra-device copy. Used to copy memory from one device region to another.

Pointer to device memory containing the data to copy from. Pointer to device memory to copy the data to. Number of bytes to copy.

Copies the given array to device memory and allocates the necessary memory.

One of CUDADriver supported primitives. Array to copy to device. Array object to use across device array functions.

Copies the given array to device memory starting from the specified index.

One of CUDADriver supported primitives. Array to copy to device memory. Array index to start copy from. Array object to use across device array functions.

Copies the given array to a pre-allocated device memory, starting from the provided index.

One of CUDADriver supported primitives. Pointer to device array memory object. Array to copy to the device. Array index to start copy from.

Copies device array data to the host.

One of CUDADriver supported primitives. Pointer to device array. Array to copy data to. Array index to start copy from.

Copy array data inside the device.

Pointer to array to copy from. Source array index to copy from. Pointer to array to copy to. Destination array index to copy to. Number of bytes to copy.

Performs a 2D copy by the CUDA driver.

Describes the 2D copy to perform.

Performs a 2D unaligned copy by the CUDA driver.

Performs a 3D copy by the CUDA driver.

Describes the 3D copy to perform.

Asynchronous host to device memory copy.

Buffer to copy data from. Size of data to copy. Stream to use for copy. Device pointer to allocated memory for transfer.

Asynchronous host to device memory copy.

Device pointer to copy data to. Buffer to copy data from. Size of data to copy. Stream to use for copy.

Asynchronous device to host memory copy.

Device pointer to copy data from. Buffer to copy data to. Size of data to copy. Stream to use for copy.

Asynchronous host to array memory copy.

Device array to copy data to. Buffer to copy data from. Size of data to copy. Stream to use for copy.

Asynchronous host to array memory copy.

Device array to copy data to. Index into array for copy to start from. Buffer to copy data from. Size of data to copy. Stream to use for copy.

Asynchronous array to host memory copy.

Device array to copy from. Buffer to copy to. Index into buffer array to start copy from. Size of data to copy. Stream to use for copy.

Performs an asynchronous 2D copy by the CUDA driver.

Describes the 2D copy to perform. Stream to use for asynchronous copy.

Performs an asynchronous 3D copy by the CUDA driver.

Describes the 3D copy to perform. Stream to use for asynchronous copy.

Sets block size for function execution.

Function to set block size for. X dimension size for block execution. Y dimension size for block execution. Z dimension size for block execution.

Set shared size for function execution.

Function to set shared size for. Shared memory size in bytes.

Returns an attribute value for the current function.

Attribute to get value for. An attribute value for the current function.

Returns an attribute value for the specified function.

Function to get attribute for. Attribute to get value for. An attribute value for the specified function.

Create an array in device memory according to the provided information.

Structure containing array description.

Creates an array in device memory with 1 channel.

Format of array element. Width of array. Height of array. Array object to be used across device array functions. When creating a 1D array, set height to 0.

Creates an array in device memory.

Format of array element. Width of array. Height of array. Number of channels in every element of the array. Array object to be used across device array functions. When creating a 1D array, set height to 0.
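To make the width, height and channel parameters concrete, the sketch below computes the logical payload size of an array from its description, treating a height of 0 as a 1D array as noted above. The element sizes follow the array formats listed earlier in this reference; the format keys and the function name are hypothetical, and the driver may use a different physical layout internally.

```python
# Bytes per component for each array format (keys are illustrative names).
ELEMENT_SIZE = {
    "uint8": 1, "uint16": 2, "uint32": 4,
    "int8": 1, "int16": 2, "int32": 4,
    "half": 2, "float": 4,
}


def array_size_in_bytes(fmt, width, height, channels=1):
    """Logical size of an array; height == 0 denotes a 1D array."""
    if width <= 0 or height < 0 or channels < 1:
        raise ValueError("invalid array description")
    rows = height if height > 0 else 1
    return width * rows * channels * ELEMENT_SIZE[fmt]


print(array_size_in_bytes("float", 256, 0))        # 1D array of 256 floats
print(array_size_in_bytes("uint8", 64, 32, 4))     # 64 x 32 array, 4 channels
```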

Creates array in device memory based on the properties of the provided array.

Array to allocate device memory for. Array object to be used across device array functions.

Creates a 3D array in device memory using the provided configuration.

Format of the array. Number of components per element of the array. Width of the array in elements. Height of the array in elements. Depth of the array in elements. 3D array in device memory.

Creates a 3D array in device memory using the provided configuration.

Array to use as metadata. Width of the array in elements. Height of the array in elements. Depth of the array in elements. 3D array in device memory.

Creates a 3D array in device memory using the provided configuration.

3D array descriptor. 3D array in device memory.

Releases device memory used by the given array.

Array to release memory for.

Returns the descriptor associated with the provided array.

Pointer to device array. Array descriptor information.

Returns the 3D descriptor associated with the provided array.

Pointer to device array. 3D Array descriptor information.

Creates a new texture reference.

New texture reference.

Destroys the provided texture and releases its associated resources.

Sets the given texture to be associated with the following array.

Texture to set array to. Array to bind to the texture. The function uses the CU_TRSA_OVERRIDE_FORMAT constant as flags.

Sets the given texture to be associated with the following array.

Texture to set the array to. Array to bind to the texture. Flags to use for the texture. The CU_TRSA_OVERRIDE_FORMAT constant must be used as flags.

Sets the given texture to be associated with the provided device pointer (linear memory).

Texture to set the address to. Pointer to device memory to bind to. Size of memory in dptr to bind to the texture. A value that must be applied to texture fetches due to hardware alignment requirements.

Sets the given texture to be associated with the provided device pointer (2D linear memory).

Texture to set the address to. Description of 2D memory to set address with. Pointer to device memory to bind to. Pitch of linear memory to apply. A value that must be applied to texture fetches due to hardware alignment requirements.

Sets the format the texture should use when fetching values.

Texture to set format to. Format to set. Number of components packed into every element of the texture.

Sets the addressing mode used for the given dimension of the texture.

Texture to set addressing mode to. Dimension to set addressing mode to. Addressing mode value to use.

Sets the filter mode to use with the texture.

Texture to set filter mode to. Filter mode to set.

Sets flags for the texture.

Values for flags parameter should be one or a combination of the following: CU_TRSF_READ_AS_INTEGER, CU_TRSF_NORMALIZED_COORDINATES.
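Since the flags can be combined, they behave as a bit mask. The sketch below shows combining and testing them in Python. The numeric values are assumed to match the definitions in NVIDIA's cuda.h header and should be verified against the installed headers; in real code the library's named constants should be used instead. The `has_flag` helper is hypothetical.

```python
# Assumed to match cuda.h; verify before relying on the raw numbers.
CU_TRSF_READ_AS_INTEGER = 0x01
CU_TRSF_NORMALIZED_COORDINATES = 0x02


def has_flag(flags, flag):
    """True when every bit of flag is set in flags."""
    return flags & flag == flag


flags = CU_TRSF_READ_AS_INTEGER | CU_TRSF_NORMALIZED_COORDINATES
print(has_flag(flags, CU_TRSF_NORMALIZED_COORDINATES))
```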

Returns the device pointer associated with the provided texture.

Texture to get device pointer to. Device pointer associated with the provided texture.

Returns the array associated with the provided texture.

Texture to get array to. Array associated with the provided texture.

Returns the address mode used for the specified dimension of the texture.

Texture to get address mode for. Specific dimension of the texture to get address mode for. Address mode used for the specified dimension of the texture.

Returns the filter mode used with the following texture.

Texture to get filter mode for. Filter mode used with the following texture.

Returns the format used with the following texture.

Texture to get format for. Format used with the following texture.

Returns the number of channels used with the following texture.

Texture to get number of channels for. Number of channels used with the following texture.

Returns flags for the following texture.

Texture to get flags for. Flags for the following texture.

Set total size for parameter information for the given function.

Function to set parameter size for. Number of bytes for parameters definition of the function.

Set a texture as a parameter for the function.

Function to set texture parameter for. Texture reference to bind.

Set a floating point (single precision) value as a parameter in the specified position.

Function to set parameter value for. Offset from parameters beginning. Float value to set.

Set an integer value as a parameter in the specified position.

Function to set parameter value for. Offset from parameters beginning. Integer value to set.

Set vector/array value as a parameter in the specified position.

One of CUDADriver supported primitives. Function to set parameter value for. Offset from parameters beginning. Array value to set.

Set vector/array value as a parameter in the specified position.

One of CUDADriver supported primitives. Function to set parameter value for. Offset from parameters beginning. Vector value to set.
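The offsets passed to the parameter-setting functions, and the total passed when setting the parameter size, have to be computed by the caller. A common approach is to place each parameter at the next offset that satisfies its alignment. The Python sketch below shows that bookkeeping under the assumption that each parameter is aligned to its natural alignment; the helper names, the parameter list, and the 8-byte pointer size are illustrative only.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout_parameters(params):
    """params: list of (name, size_in_bytes, alignment).

    Returns (offsets_by_name, total_size_in_bytes).
    """
    offsets = {}
    offset = 0
    for name, size, alignment in params:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets, offset


# Hypothetical kernel signature: (int n, T* data, float scale) with 8-byte pointers.
offsets, total = layout_parameters([
    ("n", 4, 4),
    ("data", 8, 8),
    ("scale", 4, 4),
])
print(offsets, total)
```

In this example the pointer cannot start at offset 4, so four bytes of padding are skipped before it; the total is what would be reported as the parameter size.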

Launch the given function in the device.

Function to launch in the device.

Creates an event using default flags (EventFlags.None).

Pointer to event object to be used across device event functions.

Creates an event using the specified flags.

Flags for event creation. Pointer to event object to be used across device event functions.

Records the current time in the event.

Event to record time for.

Records the event over the given stream.

Event to record time for. Stream to record the event for.

Synchronizes event information.

Event to synchronize.

Releases resources used by the device and the driver.

Event to release.

Measures elapsed time between the specified events.

Event representing the starting point. Event representing the end point. Elapsed time in milliseconds.

Creates a stream for asynchronous communication with the device using default flags (StreamFlags.None).

Pointer to stream object to be used across device stream functions.

Creates a stream for asynchronous communication with the device.

Flags for stream creation. Pointer to stream object to be used across device stream functions.

Synchronizes all operations over the stream.

Stream to synchronize.

Releases all device and driver resources consumed by the stream.

Stream to release.

Returns a value that indicates if the provided type is a normal .NET primitive.

Type to check. true if the type is a primitive; otherwise, false.

Returns a value that indicates if the provided type is a CUDA vector type.

Type to check. true if the type is a vector; otherwise, false.

Returns the format for every type.

Type to get format for. Format for the type.

Returns the number of components for every CUDA vector type.

Type to get number of components for. Number of components for the type.
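
The four type helpers above classify a type, map it to an array format, and report its component count. The sketch below shows one way such a classification could work, using CUDA C built-in type names such as `float4` and `uchar2` as input. The functions `parse_type` and `is_vector` and the `FORMATS` table are hypothetical and do not reflect how CUDA.NET implements these checks.

```python
import re

# Element kinds mapped to the array formats listed earlier in this reference.
FORMATS = {
    "uchar": "Unsigned 8-bit integers",
    "ushort": "Unsigned 16-bit integers",
    "uint": "Unsigned 32-bit integers",
    "char": "Signed 8-bit integers",
    "short": "Signed 16-bit integers",
    "int": "Signed 32-bit integers",
    "float": "32-bit floating point",
}

def parse_type(name):
    """Split a type name such as 'float4' into ('float', 4); scalars yield 1."""
    match = re.fullmatch(r"([a-z]+)([1-4])?", name)
    if match is None or match.group(1) not in FORMATS:
        raise ValueError("unsupported type: " + name)
    return match.group(1), int(match.group(2) or 1)

def is_vector(name):
    """True when the name carries a component suffix, e.g. 'int2'."""
    parse_type(name)  # validates the name
    return name[-1].isdigit()

kind, components = parse_type("float4")
```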

Gets the version of CUDA driver supported by this class.

Gets a collection of devices recognized by CUDA.

Gets the last error/result returned by calling a function of the CUDA driver.

Gets or sets a value indicating whether to raise exceptions when a CUDA driver function returns with a failure result code.

Gets the current device this class is using.

Gets the current context.

Gets the current loaded module.

Gets the current function.

Gets the amount of free memory available for use by the device.

Gets the total amount of memory available for use by the device.

cufftHandle is a handle type used to store and access CUFFT plans.

Texture reference filtering modes.

Point filter mode.

Linear filter mode.

2D memory copy parameters.

Source X in bytes.

Source Y.

Source memory type (host, device, array).

Source host pointer.

Source device pointer.

Source array reference.

Source pitch (ignored when src is array).

Destination X in bytes.

Destination Y.

Destination memory type (host, device, array).

Destination host pointer.

Destination device pointer.

Destination array reference.

Destination pitch (ignored when dst is array).

Width of 2D memory copy in bytes.

Height of 2D memory copy.
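
The 2D copy parameters describe a rectangle of `width` bytes by `height` rows inside two pitched buffers. As an illustration of how those fields combine, the plain-Python sketch below performs the same copy on host-side byte arrays. `copy_2d` is a hypothetical helper, and the addressing rule (row times pitch plus X in bytes) is the one the fields imply for linear memory.

```python
def copy_2d(src, src_pitch, src_x, src_y,
            dst, dst_pitch, dst_x, dst_y,
            width, height):
    """Copy a width-by-height byte rectangle between two pitched buffers."""
    for row in range(height):
        s = (src_y + row) * src_pitch + src_x  # start of the source row
        d = (dst_y + row) * dst_pitch + dst_x  # start of the destination row
        dst[d:d + width] = src[s:s + width]

src = bytearray(range(32))  # 4 rows with a pitch of 8 bytes
dst = bytearray(24)         # 4 rows with a pitch of 6 bytes

# Copy 3 bytes from each of 2 rows, starting at source (x=2, y=1).
copy_2d(src, 8, 2, 1, dst, 6, 0, 0, 3, 2)
```

The pitch is the distance in bytes between the starts of consecutive rows, so it can be larger than the copied width.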

Texture reference addressing modes.

Wrapping address mode.

Clamp to edge address mode.

Mirror address mode.
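
The three addressing modes differ in how an out-of-range coordinate is mapped back into the texture. The sketch below applies the common definitions to integer texel indices; it is a simplification written for illustration (real hardware applies the modes to texture coordinates), and the function names are hypothetical.

```python
def wrap(index, size):
    """Wrapping: the texture repeats every `size` texels."""
    return index % size

def clamp(index, size):
    """Clamp to edge: out-of-range indices stick to the nearest edge texel."""
    return min(max(index, 0), size - 1)

def mirror(index, size):
    """Mirror: the texture repeats, flipped on every other repetition."""
    m = index % (2 * size)
    return m if m < size else 2 * size - 1 - m

size = 4
indices = range(-2, 10)
wrapped = [wrap(i, size) for i in indices]
clamped = [clamp(i, size) for i in indices]
mirrored = [mirror(i, size) for i in indices]
```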

3D memory copy parameters.

Source X in bytes.

Source Y.

Source Z.

Source LOD.

Source memory type (host, device, array).

Source host pointer.

Source device pointer.

Source array reference.

Must be NULL.

Source pitch (ignored when src is array).

Source height (ignored when src is array; may be 0 if Depth==1).

Destination X in bytes.

Destination Y.

Destination Z.

Destination LOD.

Destination memory type (host, device, array).

Destination host pointer.

Destination device pointer.

Destination array reference.

Must be NULL.

Destination pitch (ignored when dst is array).

Destination height (ignored when dst is array; may be 0 if Depth==1).

Width of 3D memory copy in bytes.

Height of 3D memory copy.

Depth of 3D memory copy.
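
For linear (non-array) memory, the 3D copy fields locate the first byte of the copy by combining the X, Y, and Z offsets with the pitch and height. The sketch below shows that arithmetic in plain Python; `offset_3d` is a hypothetical helper, not a CUDA.NET function.

```python
def offset_3d(x_in_bytes, y, z, pitch, height):
    """Byte offset of position (x, y, z) in a pitched 3D buffer.

    pitch  : bytes between the starts of consecutive rows
    height : rows per 2D slice
    """
    return (z * height + y) * pitch + x_in_bytes

# Slice 3, row 2, byte 4 in a buffer with 16-byte rows and 4 rows per slice.
start = offset_3d(4, 2, 3, pitch=16, height=4)

# With a single slice (z == 0) the height does not affect the result,
# which is consistent with the note that it may be 0 when Depth == 1.
single_slice = offset_3d(4, 2, 0, pitch=16, height=0)
```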

CUBLASDriverEmulation provides access to cublasemu driver API.

CUDA error types.

No errors.

Missing configuration error.

Memory allocation error.

Initialization error.

Launch failure.

Prior launch failure.

Launch timeout error.

Launch out of resources error.

Invalid device function.

Invalid configuration.

Invalid device.

Invalid value.

Invalid pitch value.

Invalid symbol.

Map buffer object failed.

Unmap buffer object failed.

Invalid host pointer.

Invalid device pointer.

Invalid texture.

Invalid texture binding.

Invalid channel descriptor.

Invalid memcpy direction.

Address of constant error.

Texture fetch failed.

Texture not bound error.

Synchronization error.

Invalid filter setting.

Invalid norm setting.

Mixed device execution.

CUDA runtime unloading.

Unknown error condition.

Function not yet implemented.

Memory value too large.

Invalid resource handle.

Not ready error.

CUDA runtime is newer than driver.

Set on active process error.

No available CUDA device.

Startup failure.

API failure base.

Channel format kind.

Signed channel format.

Unsigned channel format.

Float channel format.

No channel format.

CUDA memory copy types.

Host -> Host.

Host -> Device.

Device -> Host.

Device -> Device.

CUDA device compute modes.

Default compute mode (Multiple threads can use cudaSetDevice() with this device).

Compute-exclusive mode (Only one thread will be able to use cudaSetDevice() with this device).

Compute-prohibited mode (No threads can use cudaSetDevice() with this device).

CUDA Channel format descriptor.

Channel format kind.

CUDA stream.

CUDA event.

CUDA array.

CUDA Pitched memory pointer.

Pointer to allocated memory.

Pitch of allocated memory in bytes.

Logical width of allocation in bytes.

Logical height of allocation in bytes.

CUDA extent.

Width in bytes.

Height in bytes.

Depth in bytes.

CUDA 3D position.

CUDA 3D memory copying parameters.

Source memory address.

Source position offset.

Pitched source memory address.

Destination memory address.

Destination position offset.

Pitched destination memory address.

Requested memory copy size.

Type of transfer.

CUDA function attributes.

Size of shared memory in bytes.

Size of constant memory in bytes.

Size of local memory in bytes.

Maximum number of threads per block.

Number of registers used.

CUDA device properties.

ASCII string identifying device.

Global memory available on device in bytes.

Shared memory available per block in bytes.

32-bit registers available per block.

Warp size in threads.

Maximum pitch in bytes allowed by memory copies.

Maximum number of threads per block.

Maximum size of each dimension of a block.

Maximum size of each dimension of a grid.

Clock frequency in kilohertz.

Constant memory available on device in bytes.

Major compute capability.

Minor compute capability.

Alignment requirement for textures.

Device can concurrently copy memory and execute a kernel.

Number of multiprocessors on device.

Specifies whether there is a run time limit on kernels.

Device is integrated as opposed to discrete.

Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer.

Compute mode (See cudaComputeMode).

Provides access to CUFFT driver API.

Online compiler options.

Max number of registers that a thread may use.

IN: Specifies the minimum number of threads per block to target compilation for. OUT: Returns the number of threads the compiler actually targeted. This restricts the resource utilization of the compiler (e.g. max registers) such that a block with the given number of threads should be able to launch based on register limitations. Note that this option does not currently take into account any other resource limitations, such as shared memory utilization.

Returns, in the option value, a float holding the wall clock time, in milliseconds, spent creating the cubin.

Pointer to a buffer in which to print any log messages from PTXAS that are informational in nature.

IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator). OUT: Amount of log buffer filled with messages.

Pointer to a buffer in which to print any log messages from PTXAS that reflect errors.

IN: Log buffer size in bytes. Log messages will be capped at this size (including null terminator). OUT: Amount of log buffer filled with messages.

Level of optimizations to apply to generated code (0 - 4), with 4 being the default and highest level of optimizations.

No option value required. Determines the target based on the current attached context (default).

Target is chosen based on supplied CUJITTargetEnum.

Specifies choice of fallback strategy if matching cubin is not found. Choice is based on supplied CUJITFallbackEnum.

Defines flags to supply to cuMemHostAlloc function.

If set, host memory is portable between CUDA contexts.

If set, host memory is mapped into CUDA address space and cuMemHostGetDevicePointer() may be called on the host pointer.

If set, host memory is allocated as write-combined - fast to write, faster to DMA, slow to read except via SSE4 streaming load instruction (MOVNTDQA).
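
The three host allocation flags are independent bits, so they can be combined with a bitwise OR and tested individually. A minimal sketch in plain Python follows; the numeric values mirror the `CU_MEMHOSTALLOC_*` constants of the CUDA driver API, while the `HostAllocFlags` type and its member names are hypothetical and may differ from the CUDA.NET enumeration.

```python
from enum import IntFlag

class HostAllocFlags(IntFlag):
    PORTABLE = 0x01        # memory is portable between CUDA contexts
    DEVICE_MAP = 0x02      # memory is mapped into the CUDA address space
    WRITE_COMBINED = 0x04  # memory is allocated as write-combined

# Request portable memory that is also mapped into the device address space.
flags = HostAllocFlags.PORTABLE | HostAllocFlags.DEVICE_MAP

is_mapped = HostAllocFlags.DEVICE_MAP in flags
is_write_combined = HostAllocFlags.WRITE_COMBINED in flags
```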

Represents an exception that occurred in the FFT driver.

Gets the error code returned by CUFFT driver that caused the exception.

CUDA GL Map Flags.

Default; Assume resource can be read/written.

CUDA kernels will not write to this resource.

CUDA kernels will only write to and will not read from this resource.

Provides access to driver API for CUDA.

Override the texref format with a format inferred from the array. Flag for cuTexRefSetArray().

Read the texture as integers rather than promoting the values to floats in the range [0,1]. Flag for cuTexRefSetFlags().

Use normalized texture coordinates in the range [0,1) instead of [0,dim). Flag for cuTexRefSetFlags().

For texture references loaded into the module, use default texunit from texture reference.

Context creation flags.

Automatic scheduling.

Set spin as default scheduling.

Set yield as default scheduling.

Use blocking synchronization.

Support mapped pinned allocations.

Represents a variable whose size depends on the platform: 4 bytes wide on 32-bit platforms (int, uint) and 8 bytes wide on 64-bit platforms (long, ulong). This class maps to the native C/C++ size_t data type.

Creates a new instance based on the given value.

Integer value to represent.

Creates a new instance based on the given value.

Integer value to represent.

Creates a new instance based on the given value.

Integer value to represent.

Creates a new instance based on the given value.

Integer value to represent.

Converts the object to int.

Object to convert. Integer value represented by the object.

Converts the object to uint.

Object to convert. Integer value represented by the object.

Converts the object to long.

Object to convert. Integer value represented by the object.

Converts the object to ulong.

Object to convert. Integer value represented by the object.

Converts the given integer to an object.

Integer value to convert. New object representing this value.

Converts the given integer to an object.

Integer value to convert. New object representing this value.

Converts the given integer to an object.

Integer value to convert. New object representing this value.

Converts the given integer to an object.

Integer value to convert. New object representing this value.

Compares two SizeT objects.

First value to compare. Second value to compare. true or false for the comparison result.

Compares two SizeT objects.

First value to compare. Second value to compare. true or false for the comparison result.

Returns a value indicating whether this instance is equal to a specified object.

An object to compare with this instance or null. true if obj is an instance of SizeT and equals the value of this instance; otherwise, false.

Converts the numeric value of the current object to its equivalent string representation.

The string representation of the value of this instance.

Returns the hash code for this instance.

A 32-bit signed integer hash code.
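
The platform dependence that SizeT models can be observed directly from the native `size_t` type. The sketch below uses Python's standard `ctypes` module purely as an illustration; it is unrelated to the CUDA.NET implementation.

```python
import ctypes

# 4 on typical 32-bit platforms, 8 on typical 64-bit platforms.
size_t_bytes = ctypes.sizeof(ctypes.c_size_t)

# ctypes does not range-check integers, so -1 wraps to the largest size_t.
max_size_t = ctypes.c_size_t(-1).value

# Values that fit in 32 bits round-trip unchanged on either platform.
round_trip = ctypes.c_size_t(1024).value
```

A wrapper type such as SizeT lets calling code pass int, uint, long, or ulong values without choosing the width itself.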

CUDA device.

CUDA device pointer.

CUDA context.

CUDA module.

CUDA function.

CUDA array.

CUDA texture reference.

CUDA event.

CUDA stream.

Device properties.

Maximum number of threads per block.

Maximum block dimension X.

Maximum block dimension Y.

Maximum block dimension Z.

Maximum grid dimension X.

Maximum grid dimension Y.

Maximum grid dimension Z.

Maximum shared memory available per block in bytes.

Deprecated, use MaxSharedMemoryPerBlock.

Memory available on device for __constant__ variables in a CUDA C kernel in bytes.

Warp size in threads.

Maximum pitch in bytes allowed by memory copies.

Maximum number of 32-bit registers available per block.

Deprecated, use MaxRegistersPerBlock.

Peak clock frequency in kilohertz.

Alignment requirement for textures.

Device can possibly copy memory and execute a kernel concurrently.

Number of multiprocessors on device.

Specifies whether there is a run time limit on kernels.

Device is integrated with host memory.

Device can map host memory into CUDA address space.

Compute mode (See CUComputeMode for details).

Parameter represents a parameter to be passed to a CUDA kernel. Kernel parameters can be any of the following: primitives, vectors, global memory buffers, textures and more.

Creates a new empty parameter with a name.

Name of the parameter to create.

Creates a new empty parameter with a name and type.

Name of the parameter to create. Type of the parameter.

Creates a new empty parameter with a name, type and direction.

Name of the parameter to create. Type of the parameter. Direction for the parameter. Buffers created with the Out direction are only allocated. With InOut or In, they are also copied to the device.

Creates a new parameter.

Name of the parameter to create. Type of the parameter. Direction for the parameter. For scalars or vectors, the value itself; for buffers, the CUdeviceptr. Buffers created with the Out direction are only allocated. With InOut or In, they are also copied to the device.

Gets the name of the parameter.

Gets or sets the direction of the parameter.

Gets or sets the type of the parameter.

Gets or sets the value of the parameter.
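
The direction rule stated for buffer parameters (Out buffers are only allocated, while In and InOut buffers are also copied to the device) can be sketched as follows. This is a host-only simulation in plain Python; the `Direction` enum and `prepare_buffer` are hypothetical and stand in for the real allocation and copy calls.

```python
from enum import Enum

class Direction(Enum):
    IN = "In"
    OUT = "Out"
    IN_OUT = "InOut"

def prepare_buffer(host_data, direction):
    """Return a simulated device buffer for host_data."""
    device = bytearray(len(host_data))  # every buffer is allocated
    if direction in (Direction.IN, Direction.IN_OUT):
        device[:] = host_data           # only In and InOut are copied over
    return device

uploaded = prepare_buffer(b"abc", Direction.IN)
allocated_only = prepare_buffer(b"abc", Direction.OUT)
```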

CUBLASDriver provides access to CUBLAS driver API.

Provides an object oriented model for accessing FFT functionality of CUDA, using CUDADriver to communicate with CUDA.

Creates a new instance of CUFFT class.

CUDA object to use for memory allocation and other operations.

Creates a new 1D FFT based on the provided parameters.

Transform size (e.g., 256 for 256 point FFT). Type of transformation to use. Number of transforms of size nx. Handle to be used by subsequent calls to CUFFT functions.

Creates a new 2D FFT based on the provided parameters.

Transform size (e.g., 256 for 256 point FFT) for x dimension. Transform size (e.g., 256 for 256 point FFT) for y dimension. Type of transformation to use. Handle to be used by subsequent calls to CUFFT functions.

Creates a new 3D FFT based on the provided parameters.

Transform size (e.g., 256 for 256 point FFT) for x dimension. Transform size (e.g., 256 for 256 point FFT) for y dimension. Transform size (e.g., 256 for 256 point FFT) for z dimension. Type of transformation to use. Handle to be used by subsequent calls to CUFFT functions.

Releases all resources used by the current FFT plan.

Releases all resources used by the provided FFT plan.

Plan to release resources for.

Executes a complex->complex FFT using the current plan.

Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results. Direction of the FFT to apply.

Executes a complex->complex FFT using the specified plan.

Specific plan to use for FFT. Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results. Direction of the FFT to apply.
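
The direction argument of a complex-to-complex transform selects the sign of the exponent. The sketch below uses a naive O(n^2) DFT in plain Python to show this, following the CUFFT sign convention (forward is -1, inverse is +1). It also shows that the transforms are unnormalized, so a forward transform followed by an inverse one returns the input scaled by the transform size. The function `dft` is a hypothetical stand-in, not a binding to CUFFT.

```python
import cmath

FORWARD, INVERSE = -1, 1  # same signs as CUFFT_FORWARD and CUFFT_INVERSE

def dft(values, direction):
    """Naive unnormalized discrete Fourier transform."""
    n = len(values)
    return [
        sum(x * cmath.exp(direction * 2j * cmath.pi * m * k / n)
            for m, x in enumerate(values))
        for k in range(n)
    ]

signal = [1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j]
spectrum = dft(signal, FORWARD)
round_trip = dft(spectrum, INVERSE)

# Dividing by the transform size recovers the original signal.
restored = [value / len(signal) for value in round_trip]
```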

Executes a real->complex FFT using the current plan.

Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results.

Executes a real->complex FFT using the specified plan.

Specific plan to use for FFT. Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results.

Executes a complex->real FFT using the current plan.

Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results.

Executes a complex->real FFT using the specified plan.

Specific plan to use for FFT. Pointer to device memory holding the data serving as input. Pointer to device memory to receive output results.

Executes a 1D real to complex FFT (implicitly forward).

Real values array serving as input to FFT. Complex values array serving as output to FFT. Transform size (e.g., 256 for 256 point FFT). Number of transforms of size nx.
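
A real-to-complex transform of size nx produces only nx/2 + 1 complex values, because the remaining outputs are complex conjugates of those already computed. That affects how large the complex output array needs to be. The sketch below computes that half spectrum with a naive DFT in plain Python; `real_to_complex` is a hypothetical illustration and does not call CUFFT.

```python
import cmath

def real_to_complex(samples):
    """Naive forward DFT of real input, keeping only the n // 2 + 1 unique outputs."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * m * k / n)
            for m, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]

# One period of a sine wave sampled at 4 points.
half_spectrum = real_to_complex([0.0, 1.0, 0.0, -1.0])
```

For this input, all of the energy lands in the second output bin and the other two bins are zero up to rounding.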

Executes a 1D complex to real FFT (implicitly inverse).

Complex values array serving as input to FFT. Real values array serving as output to FFT. Transform size (e.g., 256 for 256 point FFT). Number of transforms of size nx.

Executes a 1D complex to complex FFT (implicitly forward).

Complex values array serving as output to FFT. Complex values array serving as input to FFT. Transform size (e.g., 256 for 256 point FFT). Number of transforms of size nx.

Executes a 1D complex to complex FFT.

Complex values array serving as output to FFT. Complex values array serving as input to FFT. Transform size (e.g., 256 for 256 point FFT). Number of transforms of size nx. Direction for FFT.

Executes a 2D real to complex FFT (implicitly forward).

Real values array serving as input to FFT. Complex values array serving as output to FFT. X dimension transform size (e.g., 256 for 256 point FFT). Y dimension transform size (e.g., 256 for 256 point FFT).

Executes a 2D complex to real FFT (implicitly inverse).

Complex values array serving as input to FFT. Real values array serving as output to FFT. X dimension transform size (e.g., 256 for 256 point FFT). Y dimension transform size (e.g., 256 for 256 point FFT).

Executes a 2D complex to complex FFT (implicitly forward).

Complex values array serving as input to FFT. Complex values array serving as output to FFT. X dimension transform size (e.g., 256 for 256 point FFT). Y dimension transform size (e.g., 256 for 256 point FFT).

Executes a 2D complex to complex FFT.

Executes a 3D real to complex FFT (implicitly forward).

Executes a 3D complex to real FFT (implicitly inverse).

Executes a 3D complex to complex FFT (implicitly forward).

Executes a 3D complex to complex FFT.

Holds a value that indicates for the class whether to throw runtime exceptions when an error result is returned by calling any of the CUFFT driver functions.

Default is true.

Holds the last result returned by calling one of the CUFFT driver functions.

Holds a reference to a CUDA class to provide memory allocation capabilities.

Holds the handle created by the user.

Gets the last error/result returned by calling CUFFT driver functions.

Gets or sets a value to indicate whether to use runtime exceptions when a CUFFT driver function returns an error, or to ignore that error.

The default value is true.
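
The two error-reporting styles described above (raise an exception on failure, or store the result and let the caller inspect it) can be sketched as a small wrapper. This is a plain-Python illustration only; `DriverWrapper`, `DriverError`, `check`, and the result codes are hypothetical and do not reflect the CUDA.NET class layout.

```python
class DriverError(Exception):
    """Raised when a wrapped call fails and exceptions are enabled."""

    def __init__(self, code):
        super().__init__(f"driver call failed with result code {code}")
        self.code = code

class DriverWrapper:
    SUCCESS = 0

    def __init__(self, use_runtime_exceptions=True):
        self.use_runtime_exceptions = use_runtime_exceptions
        self.last_error = self.SUCCESS

    def check(self, result):
        """Remember the result; raise only if exceptions are enabled."""
        self.last_error = result
        if result != self.SUCCESS and self.use_runtime_exceptions:
            raise DriverError(result)
        return result

# With exceptions disabled, the caller inspects last_error after each call.
quiet = DriverWrapper(use_runtime_exceptions=False)
quiet.check(5)
failed = quiet.last_error != DriverWrapper.SUCCESS
```

In both styles the last result is stored before any exception is raised, so it remains available to an exception handler.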