DeepSeek-v4-Fable: A Security-Focused AI Agent for CTFs

DeepSeek-v4-Fable is a distilled, security-specialized agent model built on DeepSeek-v4-Flash and adapted from Claude-5-Fable. It is maintained by Chunjiang-Intelligence for autonomous security research workflows. The model has 0.94B trainable parameters (0.33% of total parameters, via LoRA with rank 64) and a maximum sequence length of 96K tokens. It achieves a 63.8% solve rate on web security CTF challenges and 68.9% on cryptography challenges.

The model was fine-tuned on SecDojo-80K, a corpus of 80,000 verified CTF trajectories across 4,050 distinct challenges. On held-out, decontaminated challenges it reaches a 58.7% overall solve rate within 40 turns, with a mean of 13.4 turns to flag. Performance varies by category: binary exploitation reaches 44.5% (19.8 mean turns), reverse engineering 51.2% (16.4 mean turns), and cryptography 68.9% (7.2 mean turns).

Training proceeds in two phases, rejection-sampled SFT followed by GRPO with programmatic rewards, and optimizes for procedural reliability in sandbox environments. The training setup includes dense milestone rewards, a KL anchor (β=0.02), and strict penalties for malformed actions. Ablation studies show that removing the KL anchor causes policy collapse into "degenerate payload-spraying