Skip to content

WasmGPU.compute.kernels.luFactorF32Batched

Summary

WasmGPU.compute.kernels.luFactorF32Batched performs in-place batched LU factorization with partial pivoting over row-major f32 matrices. It overwrites each input matrix with compact L and U data and writes pivot rows into a separate ipiv buffer. Use this when you want GPU-side factorization before one or more solves with the same batched systems.

Syntax

WasmGPU.compute.kernels.luFactorF32Batched(matrices: StorageBuffer, ipiv: StorageBuffer, batchCount: number, n: number, opts?: KernelDispatchOptions): void
wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n, opts);

Parameters

Name Type Required Description
matrices StorageBuffer Yes Row-major (batchCount, n, n) f32 matrix data. This buffer is overwritten in place.
ipiv StorageBuffer Yes Output pivot buffer for batchCount * n u32 indices.
batchCount number Yes Number of matrices in the batch.
n number Yes Matrix dimension for each square system.
opts KernelDispatchOptions No Optional label and workgroup-limit validation settings. opts.encoder is currently not supported by this method.

Returns

void - This method records and submits its own compute work when batchCount and n are both non-zero.

Type Details

type KernelDispatchOptions = {
    encoder?: GPUCommandEncoder;
    label?: string;
    validateLimits?: boolean;
};

matrices stores f32 values in row-major (batchCount, n, n) layout, so each matrix entry is 4 bytes and each matrix occupies n * n * 4 bytes.

On return, each matrix block contains compact LU data for the pivoted system. The strict lower triangle stores L, the upper triangle including the diagonal stores U, and the diagonal of L is implicit. ipiv[b * n + k] is a 0-based row index recording which row was swapped with row k at step k for batch item b.

Requirements: - matrices.byteLength >= batchCount * n * n * 4 - ipiv.byteLength >= batchCount * n * 4 - matrices and ipiv must be distinct StorageBuffer instances - If batchCount or n is 0, the method returns without dispatching work

Current limitation: - luFactorF32Batched() rejects opts.encoder even though other built-in kernels can encode into a caller-provided command encoder

Implementation note: - The current source uses a small-matrix path for n < 160 and a blocked factorization path for larger systems. Treat that threshold as an implementation detail, not a stable API contract.

Example

const canvas = document.querySelector("canvas");
const wgpu = await WasmGPU.create(canvas);

const batchCount = 1;
const n = 3;
const matrices = wgpu.compute.createStorageBuffer({
    data: new Float32Array([
        4, 1, 0,
        2, 3, 1,
        0, 1, 2
    ])
});
const ipiv = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4,
    copySrc: true
});

wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n);

console.log(Array.from(await wgpu.compute.readback.readU32(ipiv, 0, n)));

See Also