Real-world satellite reconnaissance triage environment for RL agent evaluation. Deterministic grading. Multi-turn decision making. Zero LLM dependency.
Classify satellite reports as genuine threats or false alarms. Binary decision gate; cautious over-flagging earns partial credit.
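A minimal sketch of how such a gate could be graded deterministically. The function name `grade_gate` and the 0.3 partial-credit value are assumptions for illustration, not the environment's actual grader or weights.

```python
def grade_gate(predicted_threat: bool, actual_threat: bool,
               overflag_credit: float = 0.3) -> float:
    """Score a binary threat / false-alarm decision in [0, 1].

    overflag_credit is a hypothetical value; the real grader may differ.
    """
    if predicted_threat == actual_threat:
        return 1.0              # correct call
    if predicted_threat and not actual_threat:
        return overflag_credit  # cautious over-flag: partial credit
    return 0.0                  # missed a genuine threat: no credit
```

The asymmetry is the point: flagging a false alarm costs less than missing a real threat.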
Identify threat type and rate severity on a 1–10 scale. Related threats earn partial credit via a proximity matrix.
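One way proximity-based partial credit might look, as a sketch. The threat labels, the matrix values, the linear severity falloff, and the 50/50 weighting are all hypothetical and chosen only to show the mechanism.

```python
# Hypothetical proximity matrix: credit for confusing two related threat types.
# Pairs are stored once and looked up in both orders.
PROXIMITY = {
    ("missile_launcher", "artillery"): 0.5,
    ("armor_column", "troop_movement"): 0.6,
}


def type_credit(predicted: str, actual: str) -> float:
    """Full credit for an exact match, partial credit for a related type."""
    if predicted == actual:
        return 1.0
    return PROXIMITY.get((predicted, actual),
                         PROXIMITY.get((actual, predicted), 0.0))


def severity_credit(predicted: int, actual: int) -> float:
    """Linear falloff over the 1-10 scale (assumed, not the source's formula)."""
    if not (1 <= predicted <= 10 and 1 <= actual <= 10):
        raise ValueError("severity must be in 1..10")
    return 1.0 - abs(predicted - actual) / 9.0


def grade(pred_type: str, pred_sev: int, actual_type: str, actual_sev: int,
          type_weight: float = 0.5) -> float:
    """Weighted blend of type and severity credit; the weight is an assumption."""
    return (type_weight * type_credit(pred_type, actual_type)
            + (1.0 - type_weight) * severity_credit(pred_sev, actual_sev))
```

Because every lookup is a table read or simple arithmetic, the same answer always produces the same score.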
Analyze 3 sectors and deploy a surveillance drone. Multi-turn: investigate before committing. Reasoning quality scored.
Identify facilities using civilian cover stories to hide military activity. Classify deception type, identify the cover story, and provide strategic reasoning. Designed to challenge frontier models.
Typed Pydantic models with task-specific fields: reports, sectors, deception indicators, and multi-turn state.
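A sketch of what task-specific typed models could look like. The class and field names (`ThreatAssessment`, `DeceptionFinding`, `indicators`, and so on) are hypothetical and not taken from the project's schema; only `BaseModel`, `Field`, and `ValidationError` are real Pydantic names.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class ThreatAssessment(BaseModel):
    """Hypothetical action for the threat-type and severity task."""
    is_threat: bool
    threat_type: str
    severity: int = Field(ge=1, le=10)  # enforce the 1-10 scale


class DeceptionFinding(BaseModel):
    """Hypothetical action for the civilian-cover-story task."""
    deception_type: str
    cover_story: str
    reasoning: str
    indicators: List[str] = Field(default_factory=list)


def parse_assessment(payload: dict):
    """Return a validated model, or None if the payload is malformed."""
    try:
        return ThreatAssessment(**payload)
    except ValidationError:
        return None
```

Validating at the boundary means a grader never has to handle an out-of-range severity or a missing field.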
Standard OpenEnv step/reset/state API. Full episode management with investigation and verification sub-actions.
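The step/reset/state pattern with an investigation sub-action can be sketched with a toy class. This does not import or reproduce OpenEnv; `ToyTriageEnv`, its action dictionaries, the sector names, and the reward values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodeState:
    turn: int = 0
    investigated: List[str] = field(default_factory=list)
    done: bool = False


class ToyTriageEnv:
    """Toy multi-turn environment: investigate sectors, then deploy once."""

    SECTORS = {
        "alpha": "vehicle tracks near depot",
        "bravo": "no activity",
        "charlie": "cloud cover",
    }
    TARGET = "alpha"  # hypothetical ground truth for this episode

    def __init__(self) -> None:
        self._state = EpisodeState()

    def reset(self) -> dict:
        """Start a fresh episode and return the initial observation."""
        self._state = EpisodeState()
        return {"sectors": sorted(self.SECTORS), "findings": {}}

    def step(self, action: dict):
        """Apply one action; returns (observation, reward, done)."""
        if self._state.done:
            raise RuntimeError("episode finished; call reset()")
        kind, sector = action["kind"], action["sector"]
        if sector not in self.SECTORS:
            raise ValueError(f"unknown sector: {sector}")
        if kind not in ("investigate", "deploy"):
            raise ValueError(f"unknown action kind: {kind}")
        self._state.turn += 1
        if kind == "investigate":
            # Sub-action: reveals information, gives no reward, keeps going.
            self._state.investigated.append(sector)
            return {"findings": {sector: self.SECTORS[sector]}}, 0.0, False
        # "deploy" commits the drone and ends the episode.
        self._state.done = True
        reward = 1.0 if sector == self.TARGET else 0.0
        return {"findings": {}}, reward, True

    @property
    def state(self) -> EpisodeState:
        return self._state
```

A typical episode calls `reset()`, issues one or more `investigate` steps, and ends with a single `deploy` step.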
Zero LLM dependency. Proximity matrices, keyword heuristics with negation filtering, structural reasoning analysis.
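A minimal sketch of a keyword heuristic with negation filtering, assuming a simple look-back window. The negation list, the three-token window, and the function names are illustrative choices, not the project's actual rules.

```python
import re

NEGATIONS = {"no", "not", "never", "without"}


def keyword_hits(text: str, keywords: set, window: int = 3) -> set:
    """Return keywords that appear in text and are not negated.

    A keyword counts as negated when a negation word (or a contraction
    ending in "n't") occurs within `window` tokens before it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = set()
    for i, tok in enumerate(tokens):
        if tok not in keywords:
            continue
        preceding = tokens[max(0, i - window):i]
        negated = any(t in NEGATIONS or t.endswith("n't") for t in preceding)
        if not negated:
            hits.add(tok)
    return hits


def reasoning_score(text: str, keywords: set) -> float:
    """Fraction of expected keywords that are asserted rather than negated."""
    if not keywords:
        return 0.0
    return len(keyword_hits(text, keywords)) / len(keywords)
```

For example, in "Convoy observed, but no evidence of missiles" the word "missiles" is filtered out because "no" falls inside its window, while "convoy" still counts.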
Docker-containerized. HuggingFace Spaces deployment. FastAPI backend. Full Phase 1 + Phase 2 compliance.