VTKL Evals Dashboard

Live
⚡ Discovery-First Methodology
How we collapsed three sequential phases (elicitation → documentation → validation) into one parallel process that produces validated specs, working prototypes, and production schemas simultaneously.
Scrumerfall
1,440h
4-12 weeks, 30-50% rework
Traditional Scrum
48h
Per sprint, no client deliverable
Discovery-First
14h
Spec + Prototype + Schema
Efficiency Ratio
103:1
vs Scrumerfall worst case
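The headline ratio can be reproduced from the two hour figures on the cards above. A minimal check, with illustrative variable names:

```python
# Hour figures as shown on the cards above.
scrumerfall_hours = 1440      # Scrumerfall worst case
discovery_first_hours = 14    # Discovery-First

ratio = scrumerfall_hours / discovery_first_hours
print(f"{ratio:.1f}:1")       # 102.9:1
print(f"{round(ratio)}:1")    # 103:1
```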
Phase Collapse — The Key Innovation
Traditional (Sequential)
1. Elicitation: 2-4 weeks
2. Documentation: 4-8 weeks
3. Validation: 2-4 weeks
Total: 8-16 weeks
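The sequential total is simply the sum of the per-phase ranges listed above. A short sketch of that arithmetic (the dictionary layout is illustrative only):

```python
# (min_weeks, max_weeks) for each sequential phase, as listed above.
phases = {
    "Elicitation": (2, 4),
    "Documentation": (4, 8),
    "Validation": (2, 4),
}

# Sequential phases cannot overlap, so the bounds add up.
total_min = sum(low for low, _ in phases.values())
total_max = sum(high for _, high in phases.values())
print(f"Total: {total_min}-{total_max} weeks")  # Total: 8-16 weeks
```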
Discovery-First (Parallel)
All three phases: 4-6 hours
Spec: ✓ simultaneous
Prototype: ✓ simultaneous
Schema: ✓ simultaneous
Evidence: Mari's Garden Case Study
📖 Product Specification
Executive summary, 4 customer journeys, 7 BR sets with testable ACs. Client-presentable spec site.
maris-gardens-spec.pages.dev
🖥️ Interactive Prototype
20+ screens: admin dashboard, product CRUD, ordering portal, customer portal. Real images, working navigation.
maris-gardens-demo.pages.dev
🗄️ Production Schema
19 tables, 14 enums, 20+ RLS policies. Atomic stored procedures. Every deferred decision tagged.
PostgreSQL + Supabase RLS
Eval Integration
The discovery methodology is integrated into the evals system through a dedicated rubric (6 PASS criteria, 6 FAIL triggers, 4 process discipline checks) and 9 calibration corpus entries.
Rubric: discovery-methodology.yaml
Corpus entries: 9 (TJ-087→095)
Judge model: GLM-5.1
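The rubric file itself is not shown on this page, so the following is only a sketch of how the counts above could be represented and sanity-checked. The class, its field names, and the zero-padded `TJ-NNN` ID format are assumptions inferred from the figures shown here, not the actual contents of discovery-methodology.yaml.

```python
from dataclasses import dataclass


@dataclass
class RubricSummary:
    """Hypothetical summary of a rubric; field names are assumptions."""
    name: str
    pass_criteria: int
    fail_triggers: int
    process_checks: int

    def total_checks(self) -> int:
        return self.pass_criteria + self.fail_triggers + self.process_checks


def corpus_ids(prefix: str, first: int, last: int) -> list[str]:
    """Expand an inclusive ID range such as TJ-087→095 into individual IDs."""
    return [f"{prefix}-{n:03d}" for n in range(first, last + 1)]


discovery = RubricSummary("discovery-methodology", 6, 6, 4)
entries = corpus_ids("TJ", 87, 95)

print(discovery.total_checks())  # 16
print(len(entries))              # 9
print(entries[0], entries[-1])   # TJ-087 TJ-095
```

The inclusive range 087 to 095 yields exactly the 9 corpus entries reported above.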
🎧 Full Narration
Complete methodology walkthrough with audio narration and detailed visual comparison.
Pipeline Status
Phase 1 — Intelligence Intake
Operational
Drive intake
Slack monitoring: Active
Stakeholder files
Phase 2 — Shadow Review
Operational
Total runs
Judge model: GLM-5.1
Rubrics: 6
Phase 3 — Correlation Engine
Operational
Correlation runs
Decisions tracked
Intel items
Cron Schedule
Job                   Schedule           Description                                      Status
shadow-review         0 3 * * *          Nightly shadow review of agent outputs           Active
memory-consolidation  0 4 * * *          Consolidate daily memory into long-term storage  Active
drive-intake          */30 * * * *       Sync Google Drive shared files for analysis      Active
tony-task-capture     0 8,12,17 * * 1-5  Capture and triage Tony's DM task backlog        Active
bd-daily              0 9 * * 1-5        Generate and post BD daily briefing              Active
correlation-engine    0 5 * * 0          Weekly cross-domain correlation analysis         Active
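To read the schedule column, it can help to evaluate the five-field cron expressions against a concrete time. The sketch below is a simplified matcher written for illustration, not the scheduler this pipeline actually uses. It assumes standard field order (minute, hour, day of month, month, day of week, with 0 = Sunday), and it ignores two cron details that none of the jobs above rely on: 7 as an alias for Sunday, and the OR rule that applies when both day fields are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int, minimum: int = 0) -> bool:
    """Return True if one cron field ('*', '*/30', '8,12,17', '1-5') matches value."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            if (value - minimum) % step == 0:
                return True
        elif "-" in part:
            low, high = map(int, part.split("-"))
            if low <= value <= high and (value - low) % step == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check whether a five-field cron expression fires at the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day, minimum=1)
        and field_matches(month, when.month, minimum=1)
        and field_matches(dow, cron_dow)
    )


# Job names and expressions copied from the table above.
SCHEDULE = {
    "shadow-review": "0 3 * * *",
    "memory-consolidation": "0 4 * * *",
    "drive-intake": "*/30 * * * *",
    "tony-task-capture": "0 8,12,17 * * 1-5",
    "bd-daily": "0 9 * * 1-5",
    "correlation-engine": "0 5 * * 0",
}

monday_noon = datetime(2025, 1, 6, 12, 0)  # a Monday
due = [job for job, expr in SCHEDULE.items() if cron_matches(expr, monday_noon)]
print(due)  # ['drive-intake', 'tony-task-capture']
```

Read this way, the weekday-only jobs (`1-5`) never fire on weekends, and correlation-engine fires only on Sundays at 05:00.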
Memory Layer
Stakeholder Profiles
Individual intelligence files
Rubrics Registered
6
behavioral, discovery, effort, process, product, sales
MLflow Experiment
warren-evals
Experiment ID: 1 • MLflow 3.12.0