WasmGPU.compute.kernels.luFactorF32Batched¶

Summary¶

WasmGPU.compute.kernels.luFactorF32Batched performs in-place batched LU factorization with partial pivoting over row-major f32 matrices. It overwrites each input matrix with compact L and U data and writes pivot rows into a separate ipiv buffer. Use this when you want GPU-side factorization before one or more solves with the same batched systems.

Syntax¶

WasmGPU.compute.kernels.luFactorF32Batched(matrices: StorageBuffer, ipiv: StorageBuffer, batchCount: number, n: number, opts?: KernelDispatchOptions): void
wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n, opts);

Parameters¶

Name	Type	Required	Description
`matrices`	`StorageBuffer`	Yes	Row-major `(batchCount, n, n)` `f32` matrix data. This buffer is overwritten in place.
`ipiv`	`StorageBuffer`	Yes	Output pivot buffer for `batchCount * n` `u32` indices.
`batchCount`	`number`	Yes	Number of matrices in the batch.
`n`	`number`	Yes	Matrix dimension for each square system.
`opts`	`KernelDispatchOptions`	No	Optional label and workgroup-limit validation settings. `opts.encoder` is currently not supported by this method.

Returns¶

void - This method records and submits its own compute work when batchCount and n are both non-zero.

Type Details¶

type KernelDispatchOptions = {
    encoder?: GPUCommandEncoder;
    label?: string;
    validateLimits?: boolean;
};

matrices stores f32 values in row-major (batchCount, n, n) layout, so each matrix entry is 4 bytes and each matrix occupies n * n * 4 bytes.

On return, each matrix block contains compact LU data for the pivoted system. The strict lower triangle stores L, the upper triangle including the diagonal stores U, and the diagonal of L is implicit. ipiv[b * n + k] is a 0-based row index recording which row was swapped with row k at step k for batch item b.

Requirements: - matrices.byteLength >= batchCount * n * n * 4 - ipiv.byteLength >= batchCount * n * 4 - matrices and ipiv must be distinct StorageBuffer instances - If batchCount or n is 0, the method returns without dispatching work

Current limitation: - luFactorF32Batched() rejects opts.encoder even though other built-in kernels can encode into a caller-provided command encoder

Implementation note: - The current source uses a small-matrix path for n < 160 and a blocked factorization path for larger systems. Treat that threshold as an implementation detail, not a stable API contract.

Example¶

const canvas = document.querySelector("canvas");
const wgpu = await WasmGPU.create(canvas);

const batchCount = 1;
const n = 3;
const matrices = wgpu.compute.createStorageBuffer({
    data: new Float32Array([
        4, 1, 0,
        2, 3, 1,
        0, 1, 2
    ])
});
const ipiv = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4,
    copySrc: true
});

wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n);

console.log(Array.from(await wgpu.compute.readback.readU32(ipiv, 0, n)));