coopVecOuterProductAccumulate

Description

Atomically accumulates the outer product of two cooperative vectors into a matrix. Given an M-element vector a and an N-element vector b, computes the outer product of a and b, forming an M-row by N-column matrix. The elements of this matrix are then atomically accumulated into the memory location represented by matrix.

Signature

/// Requires Capability Set 1:
void coopVecOuterProductAccumulate<T, int M, int N>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    RWByteAddressBuffer matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N, IgnoredBufferElementType>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    RWStructuredBuffer<IgnoredBufferElementType, DefaultDataLayout> matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N, U, int IgnoredBufferSize>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    U[IgnoredBufferSize] matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType
    where U : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    Ptr<void, Access.ReadWrite, AddressSpace.Device> matrixPtr,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;

Generic Parameters

T : __BuiltinArithmeticType

M : int

N : int

IgnoredBufferElementType

U : __BuiltinArithmeticType

IgnoredBufferSize : int

Parameters

a : CoopVec<T, M>

The first cooperative vector.

b : CoopVec<T, N>

The second cooperative vector.

matrix : RWByteAddressBuffer

The matrix buffer to accumulate the result into.

matrixOffset : int

Byte offset into the matrix buffer.

matrixStride : uint

The stride, in bytes, between consecutive rows (row-major layout) or consecutive columns (column-major layout) of the matrix.

memoryLayout : CoopVecMatrixLayout

Specifies the memory layout of the matrix, for example RowMajor, ColumnMajor, or TrainingOptimal.

matrixInterpretation : CoopVecComponentType

Specifies how to interpret the values in the matrix.

matrix : RWStructuredBuffer<IgnoredBufferElementType, DefaultDataLayout>

The matrix buffer to accumulate the result into.

matrix : U[IgnoredBufferSize]

The matrix buffer to accumulate the result into.

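matrixPtr : Ptr<void, Access.ReadWrite, AddressSpace.Device>

Pointer to the device memory holding the matrix to accumulate the result into. The overload that takes matrixPtr has no matrixOffset parameter.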

Remarks

On current hardware, memoryLayout must be TrainingOptimal.

For reference, when memoryLayout is RowMajor, this function behaves like the following pseudocode:

uint8_t* matrixPtr = matrix + matrixOffset;
for (int i = 0; i < M; i++)
{
   for (int j = 0; j < N; j++)
   {
       let elem = a[i] * b[j];
       atomicAdd(matrixPtr + i * matrixStride + j * sizeof(T), elem);
   }
}
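
The pseudocode above can be tried on the CPU with a small reference model. The following C++ sketch is illustrative only: outerProductAccumulateRef and readElem are hypothetical helper names, not part of Slang, and a std::vector of bytes stands in for the matrix buffer. It assumes a row-major layout whose element type is T, and it is single-threaded, so a plain read-modify-write stands in for the atomic add.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU reference for the RowMajor case; not part of Slang.
// Single-threaded, so a plain read-modify-write replaces atomicAdd.
template <typename T, int M, int N>
void outerProductAccumulateRef(const T (&a)[M], const T (&b)[N],
                               std::vector<std::uint8_t>& matrix,
                               int matrixOffset, unsigned matrixStride)
{
    std::uint8_t* base = matrix.data() + matrixOffset;
    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            // Row i starts at i * matrixStride bytes; column j is j elements in.
            std::uint8_t* p = base + i * matrixStride + j * sizeof(T);
            T cur;
            std::memcpy(&cur, p, sizeof(T));   // load the current element
            cur += a[i] * b[j];                // accumulate the outer-product term
            std::memcpy(p, &cur, sizeof(T));   // store it back
        }
    }
}

// Hypothetical helper: read element (i, j) back using the same addressing.
template <typename T>
T readElem(const std::vector<std::uint8_t>& matrix,
           int matrixOffset, unsigned matrixStride, int i, int j)
{
    T v;
    std::memcpy(&v, matrix.data() + matrixOffset + i * matrixStride + j * sizeof(T),
                sizeof(T));
    return v;
}
```

With a = {1, 2}, b = {3, 4, 5}, and a zero-initialized buffer, one call leaves the rows {3, 4, 5} and {6, 8, 10} in the matrix, and a second call doubles them. When matrixStride is larger than N * sizeof(T), the padding bytes at the end of each row are left untouched.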

Availability and Requirements

Capability Set 1

Defined for the following targets:

hlsl

Available in all stages.

glsl

Available in all stages.

cpp

Available in all stages.

cuda

Available in all stages.

Requires capability: optix_coopvec.

spirv

Available in all stages.

Requires capability: spvCooperativeVectorNV.

Capability Set 2

Defined for the following targets:

spirv

Available in all stages.

Requires capability: spvCooperativeVectorTrainingNV.