coopVecOuterProductAccumulate¶
Description¶
Atomically accumulates the outer product of two cooperative vectors into a matrix in memory. Given an M-element vector a and an N-element vector b, the function computes their outer product, an M-row by N-column matrix whose element (i, j) is a[i] * b[j]. Each element is then atomically added to the corresponding element of the matrix stored at the memory location identified by matrix.
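As a worked illustration of the arithmetic described above, the following is a minimal CPU-side sketch in C++, not Slang. The helper names `outerProductAccumulateRef` and `outerProductExample` are hypothetical and exist only for this example; the sketch uses plain (non-atomic) additions and a simple 2D array rather than a GPU buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical CPU helper (not part of Slang): adds the outer product of
// a (M elements) and b (N elements) to an M-row by N-column matrix.
template <typename T, std::size_t M, std::size_t N>
void outerProductAccumulateRef(const std::array<T, M>& a,
                               const std::array<T, N>& b,
                               std::array<std::array<T, N>, M>& matrix)
{
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            matrix[i][j] += a[i] * b[j]; // element (i, j) receives a[i] * b[j]
}

// Worked example: a = (1, 2), b = (3, 4, 5), accumulated twice into zeros.
inline std::array<std::array<float, 3>, 2> outerProductExample()
{
    std::array<float, 2> a{1.0f, 2.0f};
    std::array<float, 3> b{3.0f, 4.0f, 5.0f};
    std::array<std::array<float, 3>, 2> matrix{}; // value-initialised to zero
    assert(matrix[0][0] == 0.0f);
    outerProductAccumulateRef(a, b, matrix); // matrix now holds a * b^T
    outerProductAccumulateRef(a, b, matrix); // second call adds to the first
    return matrix;
}
```

Calling `outerProductExample()` yields a 2-row by 3-column matrix; because the function accumulates rather than overwrites, the second call doubles every element produced by the first.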
Signature¶
/// Requires Capability Set 1:
void coopVecOuterProductAccumulate<T, int M, int N>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    RWByteAddressBuffer matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N, IgnoredBufferElementType>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    RWStructuredBuffer<IgnoredBufferElementType, DefaultDataLayout> matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N, U, int IgnoredBufferSize>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    U[IgnoredBufferSize] matrix,
    int matrixOffset,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType
    where U : __BuiltinArithmeticType;

/// Requires Capability Set 2:
void coopVecOuterProductAccumulate<T, int M, int N>(
    CoopVec<T, M> a,
    CoopVec<T, N> b,
    Ptr<void> matrixPtr,
    uint matrixStride,
    CoopVecMatrixLayout memoryLayout,
    CoopVecComponentType matrixInterpretation)
    where T : __BuiltinArithmeticType;
Generic Parameters¶
T: __BuiltinArithmeticType¶
M : int¶
N : int¶
IgnoredBufferElementType¶
U: __BuiltinArithmeticType¶
IgnoredBufferSize : int¶
Parameters¶
a : CoopVec<T, M>¶
The first cooperative vector.
b : CoopVec<T, N>¶
The second cooperative vector.
matrix : RWByteAddressBuffer¶
The matrix buffer to accumulate the result into.
matrixOffset : int¶
Byte offset into the matrix buffer.
matrixStride : uint¶
The stride, in bytes, between consecutive rows (row-major layout) or consecutive columns (column-major layout) of the matrix.
memoryLayout : CoopVecMatrixLayout¶
Specifies the memory layout of the matrix: row-major, column-major, or an optimized layout such as TrainingOptimal (see Remarks).
matrixInterpretation : CoopVecComponentType¶
Specifies the component type used to interpret the values stored in the matrix.
matrix : RWStructuredBuffer<IgnoredBufferElementType, DefaultDataLayout>¶
The matrix buffer to accumulate the result into.
matrix : U [ IgnoredBufferSize ]¶
The matrix buffer to accumulate the result into.
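matrixPtr : Ptr<void>¶
Pointer to the matrix memory to accumulate the result into. This overload takes no matrixOffset parameter.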
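To make the roles of matrixOffset, matrixStride, and memoryLayout concrete, here is a small C++ sketch of the byte-address arithmetic for the two linear layouts. It assumes that the stride separates consecutive rows in a row-major matrix and consecutive columns in a column-major one; `LinearLayout` and `elementByteOffset` are hypothetical names introduced for illustration and are not part of the Slang API. Optimized layouts such as TrainingOptimal are not modeled here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two linear layouts; not the Slang enum.
enum class LinearLayout { RowMajor, ColumnMajor };

// Byte offset of element (row, col) inside the buffer.
// Assumption: matrixStride is the byte distance between consecutive rows
// (RowMajor) or consecutive columns (ColumnMajor).
inline std::size_t elementByteOffset(std::size_t matrixOffset,
                                     std::size_t matrixStride,
                                     std::size_t elemSize,
                                     LinearLayout layout,
                                     std::size_t row,
                                     std::size_t col)
{
    assert(elemSize > 0);
    if (layout == LinearLayout::RowMajor)
        return matrixOffset + row * matrixStride + col * elemSize;
    return matrixOffset + col * matrixStride + row * elemSize;
}
```

For instance, with a 64-byte matrixOffset, a 16-byte stride, and 2-byte elements, `elementByteOffset(64, 16, 2, LinearLayout::RowMajor, 1, 3)` evaluates to 64 + 1 * 16 + 3 * 2 = 86. A stride larger than the packed row size simply leaves padding bytes between rows.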
Remarks¶
On current hardware, memoryLayout must be TrainingOptimal.
Conceptually, when memoryLayout is RowMajor, this function behaves like the following pseudocode:
uint8_t* matrixPtr = matrix + matrixOffset;
for (int i = 0; i < M; i++)
{
    for (int j = 0; j < N; j++)
    {
        let elem = a[i] * b[j];
        atomicAdd(matrixPtr + i * matrixStride + j * sizeof(T), elem);
    }
}
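The pseudocode above relies on atomic additions because many invocations may accumulate into the same matrix at once. The following C++ sketch emulates that situation on the CPU, with one `std::thread` standing in for each invocation and a compare-and-swap loop standing in for `atomicAdd`. The names `atomicAddFloat` and `accumulateFromThreads` are hypothetical, the matrix is a tightly packed row-major array of floats, and none of this reflects how a GPU implementation actually performs the operation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic float addition built from a compare-and-swap loop.
inline void atomicAddFloat(std::atomic<float>& target, float value)
{
    float expected = target.load(std::memory_order_relaxed);
    // On failure, expected is refreshed with the current value and we retry.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
}

// Each of threadCount threads accumulates the outer product of a and b
// into one shared row-major matrix; returns the final matrix contents.
inline std::vector<float> accumulateFromThreads(const std::vector<float>& a,
                                                const std::vector<float>& b,
                                                int threadCount)
{
    assert(threadCount > 0);
    const std::size_t rows = a.size();
    const std::size_t cols = b.size();
    std::vector<std::atomic<float>> matrix(rows * cols);
    for (auto& element : matrix)
        element.store(0.0f);

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&] {
            for (std::size_t i = 0; i < rows; ++i)
                for (std::size_t j = 0; j < cols; ++j)
                    atomicAddFloat(matrix[i * cols + j], a[i] * b[j]);
        });
    }
    for (auto& worker : workers)
        worker.join();

    std::vector<float> result(rows * cols);
    for (std::size_t k = 0; k < result.size(); ++k)
        result[k] = matrix[k].load();
    return result;
}
```

With a = (1, 2), b = (3, 4, 5) and eight threads, every element ends up at eight times its outer-product value, regardless of how the threads interleave. Replacing the atomic update with a plain read-modify-write would allow concurrent contributions to be lost.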
Availability and Requirements¶
Capability Set 1¶
Defined for the following targets:
hlsl¶
Available in all stages.
Requires capability: hlsl_coopvec_poc.
glsl¶
Available in all stages.
cpp¶
Available in all stages.
cuda¶
Available in all stages.
Requires capability: optix_coopvec.
spirv¶
Available in all stages.
Requires capability: spvCooperativeVectorNV.
Capability Set 2¶
Defined for the following targets:
spirv¶
Available in all stages.
Requires capability: spvCooperativeVectorTrainingNV.