The LLM training codebase
your agents wish you had.
Most LLM projects are duct tape — scattered scripts, no tests, configs agents can't follow. Fleaberry is typed, tested, and built for agents: fine-tuning recipes, GPU provisioning across clouds, model registry and storage, annotation UIs, vibe-check playgrounds, and deployment patterns. Go from idea to deployed inference in one codebase.
The pace of your project determines its outcome
ML projects are time-bound. Sequential execution means fewer experiments, fewer insights, fewer course corrections. Parallel execution means you learn faster and adapt before it's too late.
See it in action: a real project
Watch how an ML engineer and AI agents collaborate. The agent handles performance tuning, cloud selection, and experiment infrastructure — you focus on the ML.
// the_scenario
You're building a SaaS for sales pipeline management. Users ask analytical questions in natural language: "Show me deals closing this quarter over $50k" or "Which reps have the lowest conversion rate?" Your system needs to convert these to the right tool calls with correct parameters. You're using GPT-4.1 but it's only ~70% accurate on your domain-specific tools — users complain about failed queries. It's also costing $8k/month. Time to train a specialized model.
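Concretely, the model's job is to map a question like the first example onto a structured tool call. A minimal sketch of what that target output could look like — the `search_deals` tool and its parameter names are purely illustrative, not part of any real API:

```python
import json

# Hypothetical serializer for the tool-call envelope the model must emit.
# The tool name and parameters below are invented for illustration.
def build_tool_call(tool: str, **params) -> str:
    """Serialize a tool call as the inference layer might emit it."""
    return json.dumps({"tool": tool, "parameters": params}, sort_keys=True)

# "Show me deals closing this quarter over $50k" should become something like:
call = build_tool_call(
    "search_deals",
    close_date_range="current_quarter",
    min_amount=50_000,
)
```

The fine-tuned model only has to be reliable at producing this narrow, checkable format — which is exactly what makes a small specialized model competitive with a general-purpose one here.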
Everything included
Not a framework. Not a CLI. A complete production system you fork and own. Every checkbox here works on day one.
GPU · Mobile · ML · AI Agents · Backend · Build
Four platforms, one build
Train on cloud GPUs. Monitor from your phone. Deploy services globally. Run inference locally. All from the same codebase.
GPU Training // RunPod + Modal
Provision H100s, L40S, or A100s with one command. Automatic crash recovery, log streaming, and self-terminating pods to prevent runaway costs. Supports the GRPO, SFT, and DPO training paradigms.
iOS App // SwiftUI
Monitor training runs from anywhere. Real-time metrics charts, event timelines, log streaming, full-text search. One-command deployment to physical devices with passkey authentication.
Edge Services // Cloudflare
Auth, training metadata, annotation UIs, AI agent orchestration — all deployed as Workers with D1 databases and Durable Objects. Global edge, no servers, scales to zero.
Local Inference // MLX
Run models locally on Apple Silicon at 3,300+ tokens/sec for Qwen3-0.6B and ~1,000 tokens/sec for 4B models. Continuous batching, JSONL streaming, OpenAI-compatible chat format.
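Because the server speaks the OpenAI-compatible chat format, any stock HTTP client works against it. A stdlib sketch of building a request — the port and model id are assumptions, not the server's actual defaults:

```python
import json
import urllib.request

# Assumed local endpoint and model id -- adjust to your setup.
MLX_URL = "http://localhost:8080/v1/chat/completions"

def chat_request(prompt: str, model: str = "qwen3-0.6b") -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request for a local server."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        MLX_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Which reps have the lowest conversion rate?")
```

Swapping between local inference and a hosted provider is then just a URL change, since the request shape is identical.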
What you actually get
This isn't about the code — it's about what you can do with it.
Train LLMs today
Clone, configure, run. Real GRPO and SFT training on cloud GPUs in minutes, not weeks of setup.
Monitor anywhere
Check training progress from your phone. Get real-time metrics without opening a laptop.
Let agents build
AI coding agents can implement features autonomously via the ticket workflow system.
Own everything
No vendor lock-in. No platform fees. No surprise bills. It's your code forever.
Optimized recipes included
Not hello world tutorials. Production-tested experiments you can adapt.
GRPO Training
Reinforcement learning with policy optimization. Use case: training models to follow tool-calling formats, code generation with test verification.
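GRPO needs a verifiable reward signal, and for tool-calling that can be as simple as checking whether a completion parses into the expected call shape. A sketch of such a reward function — the `{"tool": ..., "parameters": ...}` envelope and the partial-credit scheme are illustrative choices, not the recipe's actual scoring:

```python
import json

def tool_call_reward(completion: str) -> float:
    """Score a completion: 1.0 for a well-formed tool call, partial credit
    for valid JSON with the wrong shape, 0.0 otherwise. Illustrative only --
    a real recipe would typically also verify parameter names and values."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    # Expect a {"tool": ..., "parameters": {...}} envelope (assumed format).
    if isinstance(parsed.get("tool"), str) and isinstance(parsed.get("parameters"), dict):
        return 1.0
    return 0.25  # valid JSON, but not a tool call

good = tool_call_reward('{"tool": "search_deals", "parameters": {"min_amount": 50000}}')
bad = tool_call_reward("Sure! Here are your deals...")
```

Because the reward is computed mechanically rather than by a judge model, every rollout gets a cheap, deterministic score — the property that makes format-following a good fit for RL.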
Intent Classification
Fine-tuning for multi-class classification. Use case: customer support routing, command parsing, query categorization.
Text-to-Speech
Production TTS inference on Modal GPUs. Use case: voice assistants, audiobook generation, accessibility features.
Why agents love it
Types everywhere
Pydantic in Python. Rust's type system. TypeScript for services. 80% of bugs caught before runtime. Agents make fewer mistakes when the compiler tells them what's wrong.
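The pattern is the same in every language: declare the shape once and let validation fail loudly before any GPU time is spent. A stdlib sketch of the idea using dataclasses — the repo's real configs use Pydantic, and these field names are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Typed experiment config: invalid values fail at construction time,
    not mid-run. Field names here are illustrative."""
    model_name: str
    learning_rate: float
    batch_size: int

    def __post_init__(self):
        if self.learning_rate <= 0:
            raise ValueError(f"learning_rate must be positive, got {self.learning_rate}")
        if self.batch_size < 1:
            raise ValueError(f"batch_size must be >= 1, got {self.batch_size}")

cfg = TrainConfig(model_name="qwen3-0.6b", learning_rate=2e-5, batch_size=8)
```

An agent that passes `learning_rate=0` gets a traceback immediately instead of a silently diverging run four hours later.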
No magic
No config files to guess at. No environment variables that change behavior. No implicit state. From the experiment entry point, you can trace every parameter that affects the run.
Fast feedback
Bazel caches test results. Only rerun what changed. An agent can iterate quickly because it doesn't wait 10 minutes to find out it broke something three files away.
One CLI, everything included
Everything you need to manage experiments, deploy services, and debug issues.
Run experiments
Start training runs, stream logs, check GPU availability, inspect metrics. Built for terminal power users.
Device deployment
One command to build, sign, and deploy the iOS app to physical devices. Handles certificates and provisioning.
Service deployment
Deploy Cloudflare Workers with D1 database management. Handles dependency ordering and post-deploy verification.
API queries
Direct GraphQL queries to agent orchestration. Debug workflows, inspect state, trigger actions.
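A GraphQL payload for this kind of inspection is just a query string plus variables. A sketch of what one might look like — the `workflow` field and its subfields are assumptions, not the orchestration API's real schema:

```python
# Hypothetical query against the agent orchestration API.
# Field names are invented for illustration.
WORKFLOW_QUERY = """
query WorkflowState($id: ID!) {
  workflow(id: $id) {
    status
    steps { name startedAt }
  }
}
"""

def build_payload(workflow_id: str) -> dict:
    """Assemble the standard GraphQL POST body: query + variables."""
    return {"query": WORKFLOW_QUERY, "variables": {"id": workflow_id}}

payload = build_payload("wf_123")
```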
Agent control
Manage agent integration. Create tickets, check agent status, view execution logs.
New experiments
Generate boilerplate for new training experiments. Proper structure, types, and integration tests from the start.