Train LLMs. Build iOS apps. Own everything.
Complete ML training infrastructure with GPU orchestration, real-time mobile monitoring, AI agent workflows, and local inference — all in one Bazel monorepo. Fork it once, never pay for a platform again.
Everything included
Not a framework. Not a CLI. A complete production system you fork and own. Every checkbox here works on day one.
Four platforms, one build
Train on cloud GPUs. Monitor from your phone. Deploy services globally. Run inference locally. All from the same codebase.
GPU Training // RunPod + Modal
Provision H100, L40S, or A100 GPUs with one command. Automatic crash recovery, log streaming, and self-terminating pods to prevent runaway costs. Supports the GRPO, SFT, and DPO training paradigms.
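A minimal sketch of what a Modal-backed training entry point can look like. The app name, image contents, and `train()` body are illustrative assumptions, not this repo's actual modules:

```python
# Sketch: provisioning a GPU for a training run on Modal.
# App name, image, and train() body are placeholders.
import modal

app = modal.App("llm-training-sketch")
image = modal.Image.debian_slim().pip_install("torch", "transformers")

@app.function(gpu="H100", image=image, timeout=60 * 60)
def train(learning_rate: float = 2e-5) -> None:
    # The real training loop goes here. When the function returns,
    # the container (and billing) stops with it.
    print(f"training with lr={learning_rate}")

@app.local_entrypoint()
def main() -> None:
    # `modal run this_file.py` provisions the GPU, runs train(), tears down.
    train.remote(learning_rate=5e-5)
```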
iOS App // SwiftUI
Monitor training runs from anywhere. Real-time metrics charts, event timelines, log streaming, full-text search. One-command deployment to physical devices with passkey authentication.
Edge Services // Cloudflare
Auth, training metadata, annotation UIs, AI agent orchestration — all deployed as Workers with D1 databases and Durable Objects. Global edge, no servers, scales to zero.
Local Inference // MLX
Run models locally on Apple Silicon at 3,300+ tok/s for Qwen3-0.6B and ~1,000 tok/s for 4B models. Continuous batching, JSONL streaming, OpenAI-compatible chat format.
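Because the server speaks the OpenAI chat format, any standard client works against it. A hedged sketch, assuming a local server on port 8080; the base URL, port, and model id are placeholders:

```python
# Sketch: streaming from a local OpenAI-compatible inference server.
# base_url, port, and model id are assumptions; use whatever your
# local server actually advertises.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen3-0.6b",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize GRPO in one sentence."}],
    stream=True,  # tokens stream back as they are generated
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```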
What you actually get
This isn't about the code — it's about what you can do with it.
Train LLMs today
Clone, configure, run. Real GRPO and SFT training on cloud GPUs in minutes, not weeks of setup.
Monitor anywhere
Check training progress from your phone. Get real-time metrics without opening a laptop.
Let agents build
AI coding agents can implement features autonomously via the ticket workflow system.
Own everything
No vendor lock-in. No platform fees. No surprise bills. It's your code forever.
AI agents that ship code
Built-in Cursor API integration for autonomous code generation. Agents take tickets, plan implementation, and submit PRs.
# Agent workflow: ticket to pull request

# 1. Create a ticket via GraphQL
mutation {
  createTicket(input: {
    title: "Add user settings page"
    description: "..."
  })
}

# 2. Agent plans, human approves
# 3. Agent executes autonomously
# 4. PR ready for review

# Real-time progress via WebSocket
# Full audit trail in SQLite
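A hedged sketch of driving that mutation from Python. The endpoint URL, auth header, input type name, and `id` selection are assumptions; only the mutation shape comes from the snippet above:

```python
# Sketch: firing the createTicket mutation over plain HTTP.
# Endpoint, auth, and the CreateTicketInput type name are placeholders.
import requests

MUTATION = """
mutation($input: CreateTicketInput!) {
  createTicket(input: $input) { id }
}
"""

resp = requests.post(
    "https://agents.example.com/graphql",  # placeholder endpoint
    json={
        "query": MUTATION,
        "variables": {"input": {
            "title": "Add user settings page",
            "description": "...",
        }},
    },
    headers={"Authorization": "Bearer <token>"},  # placeholder auth
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```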
Working demos included
Not hello world tutorials. Real experiments from production use.
GRPO Training
Complete reinforcement learning loop with policy optimization, instrumentation, and integration tests. CPU-compatible for local debugging. A sketch of the core advantage step follows these demos.
Intent Classification
Full fine-tuning reference for Banking77 dataset. Shows the complete experiment lifecycle from config to evaluation.
Text-to-Speech
Production TTS inference via Modal GPUs. Cloudflare Workers frontend, Qwen3-TTS model, audio streaming.
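On the GRPO demo: its core is group-relative advantage estimation. Sample several completions per prompt, score them, and normalize each reward against its own group. A minimal numpy sketch of that one step; the reward values are invented for illustration, and the real loop lives in the experiment:

```python
# Sketch of GRPO's group-relative advantage step: rewards for
# completions of the same prompt are normalized against each other.
# Reward numbers are invented for illustration.
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """rewards: shape (num_prompts, samples_per_prompt)."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# 2 prompts, 4 sampled completions each
rewards = np.array([
    [0.1, 0.9, 0.4, 0.4],
    [0.0, 0.0, 1.0, 0.5],
])
print(group_relative_advantages(rewards))
# Completions above their group mean get positive advantage and are
# reinforced by the policy gradient; below-mean completions are pushed down.
```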
A real ML project, start to finish
See how agents and humans work together to ship a production model. Most of your time goes to labeling and experiment design — not infrastructure.
// The scenario
You're building a SaaS for sales pipeline management. Users ask analytical questions in natural language: "Show me deals closing this quarter over $50k" or "Which reps have the lowest conversion rate?" Your system needs to convert these to the right tool calls with correct parameters. GPT-4 works but costs $0.03/query — at 100k queries/day, that's $90k/month. Time to train a smaller model.
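For concreteness, a labeled training example in this scenario might pair a question with a structured tool call. A hypothetical label format; the tool name and parameter schema are invented for illustration:

```python
# Hypothetical labeled example for the sales-pipeline scenario.
# The tool name and argument schema are invented; the real schema
# would come from your product's tool definitions.
from pydantic import BaseModel

class ToolCall(BaseModel):
    tool: str
    arguments: dict

example = {
    "question": "Show me deals closing this quarter over $50k",
    "label": ToolCall(
        tool="search_deals",
        arguments={"close_date": "this_quarter", "min_value": 50_000},
    ),
}
```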
// The steps
1. Build the annotation UI (agent builds, human labels)
2. Benchmark existing models (agent runs)
3. Try prompt optimization first (agent optimizes)
4. Build a reward model for GRPO (agent builds, human verifies)
5. Find cost-effective infrastructure (agent benchmarks)
6. Stabilize training dynamics (agent iterates)
7. Run full training (agent runs, human monitors)
// The outcome
From zero to production-ready model in 2 weeks of calendar time, with roughly 8 hours of focused human work — mostly labeling and experiment design. The agents handled infrastructure, benchmarking, hyperparameter search, and training orchestration. Your model runs locally at 150 tokens/sec or on a $0.20/hr GPU for batch inference.
Why this architecture
Agents write better code when the codebase helps them. Every decision here makes mistakes harder to introduce and faster to fix.
# No YAML. No JSON. Types catch mistakes.
from pydantic import BaseModel

class TrialConfig(BaseModel):
    learning_rate: float = 2e-5
    batch_size: int = 16
    model_name: str = "SmolLM2-360M"

# IDE autocomplete. Validation at parse time.
# Agent can't typo "leraning_rate".
TRIAL = TrialConfig(
    learning_rate=5e-5,
    batch_size=32,
)
Why agents love it
Types everywhere
Pydantic in Python. Rust's type system. TypeScript for services. 80% of bugs caught before runtime. Agents make fewer mistakes when the compiler tells them what's wrong.
No magic
No config files to guess at. No environment variables that change behavior. No implicit state. From the experiment entry point, you can trace every parameter that affects the run.
Fast feedback
Bazel caches test results. Only rerun what changed. An agent can iterate quickly because it doesn't wait 10 minutes to find out it broke something three files away.
CLI tools included
Everything you need to manage experiments, deploy services, and debug issues.
Run management
List runs, stream logs, check GPU availability, inspect metrics. 860 lines of Rust built for terminal power users.
Device deployment
One command to build, sign, and deploy the iOS app to physical devices. Handles certificates and provisioning.
Service deployment
Deploy Cloudflare Workers with D1 database management. Handles dependency ordering and post-deploy verification.
API queries
Direct GraphQL queries to agent orchestration. Debug workflows, inspect state, trigger actions.
Agent control
Manage Cursor API integration. Create tickets, check agent status, view execution logs.
New experiments
Generate boilerplate for new training experiments. Proper structure, types, and integration tests from the start.
An honest pitch
This isn't for everyone. Here's who should — and shouldn't — use Fleaberry.
Good fit
- Teams training LLMs who want to own their infrastructure
- Startups using AI agents to ship features faster
- Engineers tired of paying platform fees for training orchestration
- Projects that need iOS apps + ML training + cloud services together
- Anyone who wants reproducible, auditable ML experiments
- Teams running local inference on Apple Silicon
Not for you if
- You want a hosted platform, not code to maintain
- Your team only writes Python and wants to keep it that way
- You need production multitenancy and billing out of the box
- Bazel's learning curve isn't worth it for your project size
- You need extensive documentation and tutorials
- You're not comfortable with a polyglot codebase