WasmGPU.compute.kernels.luSolveF32Batched

Summary

WasmGPU.compute.kernels.luSolveF32Batched solves batched linear systems A x = b from compact LU factors and pivot rows produced by WasmGPU.compute.kernels.luFactorF32Batched. It reads factored row-major f32 matrices plus contiguous f32 right-hand sides and writes one contiguous f32 solution vector per batch item. Use this when you want a practical GPU-side factor-then-solve path for many small or medium square systems.

Syntax

WasmGPU.compute.kernels.luSolveF32Batched(lu: StorageBuffer, ipiv: StorageBuffer, rhs: StorageBuffer, outX: StorageBuffer, batchCount: number, n: number, opts?: KernelDispatchOptions): void
wgpu.compute.kernels.luSolveF32Batched(lu, ipiv, rhs, outX, batchCount, n, opts);

Parameters

Name	Type	Required	Description
`lu`	`StorageBuffer`	Yes	Row-major `(batchCount, n, n)` compact LU data from `luFactorF32Batched()`.
`ipiv`	`StorageBuffer`	Yes	Pivot rows for the same batch and matrix size, stored as `(batchCount, n)` `u32`.
`rhs`	`StorageBuffer`	Yes	Contiguous `(batchCount, n)` `f32` right-hand-side vectors.
`outX`	`StorageBuffer`	Yes	Output buffer for contiguous `(batchCount, n)` `f32` solution vectors.
`batchCount`	`number`	Yes	Number of systems in the batch.
`n`	`number`	Yes	Matrix dimension for each square system.
`opts`	`KernelDispatchOptions`	No	Optional label and workgroup-limit validation settings. `opts.encoder` is currently not supported by this method.

Returns

void - This method records and submits its own compute work when batchCount and n are both non-zero.

Type Details

type KernelDispatchOptions = {
    encoder?: GPUCommandEncoder;
    label?: string;
    validateLimits?: boolean;
};

lu is compact row-major LU data in (batchCount, n, n) layout. rhs and outX are contiguous (batchCount, n) f32 arrays, so each vector entry is 4 bytes and each right-hand side or output vector consumes n * 4 bytes per batch item.
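
The layouts above translate into simple index arithmetic. The sketch below shows one way to compute it on the CPU side, for example when packing right-hand sides or slicing a readback. The helper names (`luElementIndex`, `vectorElementIndex`, `vectorByteOffset`) are hypothetical and not part of WasmGPU.

```javascript
// Hypothetical helpers (not part of WasmGPU) for the layouts described above.
const F32_BYTES = 4;

// Index, in f32 elements, of entry (row, col) of matrix `batch`
// in a row-major (batchCount, n, n) buffer.
function luElementIndex(batch, row, col, n) {
    return (batch * n + row) * n + col;
}

// Index, in f32 elements, of entry `i` of vector `batch`
// in a contiguous (batchCount, n) buffer such as rhs or outX.
function vectorElementIndex(batch, i, n) {
    return batch * n + i;
}

// Byte offset at which the vector for batch item `batch` starts.
function vectorByteOffset(batch, n) {
    return batch * n * F32_BYTES;
}
```

For instance, with `n = 4` the solution vector for batch item 3 starts at byte offset 48 in `outX`.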

Requirements:
- lu.byteLength >= batchCount * n * n * 4
- rhs.byteLength >= batchCount * n * 4
- outX.byteLength >= batchCount * n * 4
- ipiv.byteLength >= batchCount * n * 4
- lu, ipiv, rhs, and outX must be pairwise distinct StorageBuffer instances
- If batchCount or n is 0, the method returns without dispatching work
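
These requirements can be checked before calling the kernel. The following is a minimal sketch of such a pre-flight check. `checkLuSolveBuffers` is a hypothetical helper, not a WasmGPU function, and it assumes only that each buffer exposes the `byteLength` property used in the requirements above.

```javascript
// Hypothetical pre-flight check (not part of WasmGPU).
// Returns a list of human-readable problems; an empty list means the
// documented size and distinctness requirements are satisfied.
function checkLuSolveBuffers(buffers, batchCount, n) {
    const required = {
        lu: batchCount * n * n * 4,
        ipiv: batchCount * n * 4,
        rhs: batchCount * n * 4,
        outX: batchCount * n * 4
    };
    const names = Object.keys(required);
    const problems = [];

    // Each buffer must be at least as large as its documented minimum.
    for (const name of names) {
        if (buffers[name].byteLength < required[name]) {
            problems.push(`${name} needs at least ${required[name]} bytes`);
        }
    }

    // All four buffers must be pairwise distinct instances.
    for (let i = 0; i < names.length; i++) {
        for (let j = i + 1; j < names.length; j++) {
            if (buffers[names[i]] === buffers[names[j]]) {
                problems.push(`${names[i]} and ${names[j]} must be distinct`);
            }
        }
    }
    return problems;
}
```

A caller might run `checkLuSolveBuffers({ lu, ipiv, rhs, outX }, batchCount, n)` and skip the dispatch if the returned list is non-empty.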

Current limitation:
- luSolveF32Batched() rejects opts.encoder even though other built-in kernels can encode into a caller-provided command encoder
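
If an options object is shared across several kernel calls, one possible workaround is to drop the `encoder` field before passing it to this method. The helper below is a hypothetical sketch, not part of WasmGPU. Because the method records and submits its own work, any commands still pending in the caller's encoder that the solve depends on would need to be submitted first.

```javascript
// Hypothetical helper (not part of WasmGPU): copy the options without `encoder`,
// leaving `label`, `validateLimits`, and any other fields untouched.
function withoutEncoder(opts = {}) {
    const { encoder, ...rest } = opts;
    return rest;
}
```

Usage would look like `wgpu.compute.kernels.luSolveF32Batched(lu, ipiv, rhs, outX, batchCount, n, withoutEncoder(sharedOpts))`, where `sharedOpts` is the caller's own options object.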

Implementation note:
- The current source uses a shared-memory solve path for n <= 512 and a separate large-matrix path above that. Treat that threshold as an implementation detail, not a stable API contract.

Example

const canvas = document.querySelector("canvas");
const wgpu = await WasmGPU.create(canvas);

const batchCount = 1;
const n = 2;

const matrices = wgpu.compute.createStorageBuffer({
    data: new Float32Array([
        4, 1,
        2, 3
    ])
});
const ipiv = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4
});
const rhs = wgpu.compute.createStorageBuffer({
    data: new Float32Array([1, 2])
});
const outX = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4,
    copySrc: true
});

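// Factor first; the same buffer is then passed as `lu`, together with the pivot rows in `ipiv`.
wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n);
// Solve A x = b for every batch item and write the solutions to outX.
wgpu.compute.kernels.luSolveF32Batched(matrices, ipiv, rhs, outX, batchCount, n);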

const x = await wgpu.compute.readback.readF32(outX, 0, batchCount * n);
console.log(Array.from(x)); // approximately [0.1, 0.6] for this system, up to f32 rounding
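
To sanity-check a readback, the residual A x - b can be evaluated on the CPU. The sketch below is one way to do that. `maxResidual` is a hypothetical helper, not part of WasmGPU, and it expects a CPU-side copy of the original (unfactored) matrices, since the example above passes its matrix buffer through the factor step.

```javascript
// Hypothetical CPU-side check (not part of WasmGPU).
// a: original row-major (batchCount, n, n) matrices
// x: solutions in (batchCount, n) layout, e.g. the readback from outX
// b: right-hand sides in (batchCount, n) layout
// Returns the largest absolute entry of A x - b across the whole batch.
function maxResidual(a, x, b, batchCount, n) {
    let worst = 0;
    for (let k = 0; k < batchCount; k++) {
        for (let i = 0; i < n; i++) {
            let sum = 0;
            for (let j = 0; j < n; j++) {
                sum += a[(k * n + i) * n + j] * x[k * n + j];
            }
            worst = Math.max(worst, Math.abs(sum - b[k * n + i]));
        }
    }
    return worst;
}
```

For the example system, `maxResidual(new Float32Array([4, 1, 2, 3]), x, new Float32Array([1, 2]), 1, 2)` should come out small; the acceptable tolerance depends on the conditioning of the matrices and on f32 precision.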