Skip to content

WasmGPU.compute.kernels.luSolveF32Batched

Summary

WasmGPU.compute.kernels.luSolveF32Batched solves batched linear systems A x = b from compact LU factors and pivot rows produced by WasmGPU.compute.kernels.luFactorF32Batched. It reads factored row-major f32 matrices plus contiguous f32 right-hand sides and writes one contiguous f32 solution vector per batch item. Use this when you want a practical GPU-side factor-then-solve path for many small or medium square systems.

Syntax

WasmGPU.compute.kernels.luSolveF32Batched(lu: StorageBuffer, ipiv: StorageBuffer, rhs: StorageBuffer, outX: StorageBuffer, batchCount: number, n: number, opts?: KernelDispatchOptions): void
wgpu.compute.kernels.luSolveF32Batched(lu, ipiv, rhs, outX, batchCount, n, opts);

Parameters

Name Type Required Description
lu StorageBuffer Yes Row-major (batchCount, n, n) compact LU data from luFactorF32Batched().
ipiv StorageBuffer Yes Pivot rows for the same batch and matrix size, stored as (batchCount, n) u32.
rhs StorageBuffer Yes Contiguous (batchCount, n) f32 right-hand-side vectors.
outX StorageBuffer Yes Output buffer for contiguous (batchCount, n) f32 solution vectors.
batchCount number Yes Number of systems in the batch.
n number Yes Matrix dimension for each square system.
opts KernelDispatchOptions No Optional label and workgroup-limit validation settings. opts.encoder is currently not supported by this method.

Returns

void - This method records and submits its own compute work when batchCount and n are both non-zero.

Type Details

type KernelDispatchOptions = {
    encoder?: GPUCommandEncoder;
    label?: string;
    validateLimits?: boolean;
};

lu is compact row-major LU data in (batchCount, n, n) layout. rhs and outX are contiguous (batchCount, n) f32 arrays, so each vector entry is 4 bytes and each right-hand side or output vector consumes n * 4 bytes per batch item.

Requirements: - lu.byteLength >= batchCount * n * n * 4 - rhs.byteLength >= batchCount * n * 4 - outX.byteLength >= batchCount * n * 4 - ipiv.byteLength >= batchCount * n * 4 - lu, ipiv, rhs, and outX must be pairwise distinct StorageBuffer instances - If batchCount or n is 0, the method returns without dispatching work

Current limitation: - luSolveF32Batched() rejects opts.encoder even though other built-in kernels can encode into a caller-provided command encoder

Implementation note: - The current source uses a shared-memory solve path for n <= 512 and a separate large-matrix path above that. Treat that threshold as an implementation detail, not a stable API contract.

Example

const canvas = document.querySelector("canvas");
const wgpu = await WasmGPU.create(canvas);

const batchCount = 1;
const n = 2;

const matrices = wgpu.compute.createStorageBuffer({
    data: new Float32Array([
        4, 1,
        2, 3
    ])
});
const ipiv = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4
});
const rhs = wgpu.compute.createStorageBuffer({
    data: new Float32Array([1, 2])
});
const outX = wgpu.compute.createStorageBuffer({
    byteLength: batchCount * n * 4,
    copySrc: true
});

wgpu.compute.kernels.luFactorF32Batched(matrices, ipiv, batchCount, n);
wgpu.compute.kernels.luSolveF32Batched(matrices, ipiv, rhs, outX, batchCount, n);

const x = await wgpu.compute.readback.readF32(outX, 0, batchCount * n);
console.log(Array.from(x));

See Also