Training Pipeline
Export approved and starred audio to a HuggingFace dataset, then fine-tune Whisper on it. Training runs on RunPod.
1. Export Dataset
2. Train Whisper
Source filters
Library
jemedia (JEM Media — main archive)
satmar
training
Include
50-hour collection (is_selected_50hr = true) + approved + non-benchmark
My starred audio (user_favorites) + approved + non-benchmark
Stage
Both — Stage 1 (folders) + Stage 2 (parquet) + push to HF
Stage 1 only — staging folders, no parquet
Stage 2 only — re-pack existing folders
Behavior
Resume (skip already-exported)
Force (re-export everything)
Limit (smoke test)
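The three behavior flags compose in a fixed order: force overrides resume, and limit caps whatever survives the skip check. A minimal sketch of that selection logic (function and folder layout are hypothetical, not the actual exporter):

```python
from pathlib import Path

def select_items(source_ids, staging_dir, resume=True, force=False, limit=None):
    """Pick which audio items to export, honoring resume/force/limit.

    resume: skip items whose staging folder already exists.
    force:  re-export everything, overriding resume.
    limit:  cap the number of selected items (smoke test).
    """
    selected = []
    for item_id in source_ids:
        already_exported = (Path(staging_dir) / item_id).exists()
        if resume and not force and already_exported:
            continue  # resume: skip already-exported items
        selected.append(item_id)
        if limit is not None and len(selected) >= limit:
            break  # limit: stop early for a smoke run
    return selected
```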
Output
Format
Whisper (ivrit-ai 30s slices, parquet)
Gemini tuning (JSONL) — coming soon
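Whisper consumes fixed 30-second input windows, which is why the export slices each recording into at-most-30s chunks before packing. A sketch of that slicing on a raw mono sample array (the real ivrit-ai packing may additionally align slice boundaries to segment timestamps):

```python
def slice_30s(samples, sample_rate, window_s=30):
    """Split a mono sample array into consecutive windows of at most
    window_s seconds (Whisper's fixed input length)."""
    step = window_s * sample_rate
    return [samples[i:i + step] for i in range(0, len(samples), step)]
```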
HF dataset repo
JEM repo ref
The export runs on the JEM repo at the given git ref. Use a branch name to test a feature export before merging.
Start export
Inputs
Base config
jem_lora_v1 — LoRA (faster, cheaper)
jem_full_v1 — Full fine-tune
Base model (HF)
Dataset repo (HF)
Output model name
Run name (W&B)
Hyperparameters
Method
LoRA
QLoRA
Full fine-tune
Epochs
Learning rate
Per-device batch size
Gradient accumulation
Warmup steps
Weight decay
Eval steps
Save steps
Max steps (smoke test)
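Per-device batch size and gradient accumulation interact: the optimizer-step batch is their product (times the GPU count), so raising accumulation trades wall-clock time for memory without changing the learning-rate regime. A sketch with placeholder values (not the actual jem_lora_v1 defaults):

```python
# Placeholder hyperparameters — not the actual jem_lora_v1 defaults.
config = {
    "method": "lora",
    "epochs": 3,
    "learning_rate": 1e-4,
    "per_device_batch_size": 8,
    "gradient_accumulation": 4,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "eval_steps": 500,
    "save_steps": 500,
    "max_steps": None,  # set to a small number for a smoke run
}

def effective_batch_size(cfg, n_gpus=1):
    """Optimizer-step batch size: per-device batch x accumulation x GPUs."""
    return cfg["per_device_batch_size"] * cfg["gradient_accumulation"] * n_gpus
```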
Eval
Predict WER during eval
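Predicting WER during eval means running generation at each eval step and scoring word error rate: word-level edit distance divided by the number of reference words (in HF Trainer terms, predict_with_generate plus a WER metric). A minimal self-contained sketch of the metric itself:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance over the number
    of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)
```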
Start training
A100 80GB · ~$1.89/hr