META × HUGGINGFACE OPENENV 2026

AUTONOMOUS INTELLIGENCE TRIAGE ENGINE

Real-world satellite reconnaissance triage environment for RL agent evaluation. Deterministic grading. Multi-turn decision-making. Zero LLM dependency.


EASY

Task 1 — False Alarm Detection

Classify each satellite report as a genuine threat or a false alarm. The decision is binary, but cautious over-flagging (marking a false alarm as a threat) earns partial credit.

2 Actions · 30 Cases
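A minimal sketch of how the Task 1 partial-credit rule could be scored. The label names (`Verdict.THREAT`, `Verdict.FALSE_ALARM`), the function `grade_false_alarm`, and the 0.5 credit for over-flagging are hypothetical values chosen for illustration, not the environment's actual constants.

```python
from enum import Enum


class Verdict(str, Enum):
    # Hypothetical labels for the two Task 1 actions.
    THREAT = "threat"
    FALSE_ALARM = "false_alarm"


# Assumed partial credit for cautious over-flagging; the real value may differ.
OVER_FLAG_CREDIT = 0.5


def grade_false_alarm(predicted: Verdict, actual: Verdict) -> float:
    """Return a reward in [0, 1] for one Task 1 report."""
    if predicted == actual:
        return 1.0
    if predicted == Verdict.THREAT and actual == Verdict.FALSE_ALARM:
        # Over-flagging: wrong, but the safer mistake, so partial credit.
        return OVER_FLAG_CREDIT
    # Dismissing a genuine threat as a false alarm earns nothing.
    return 0.0


print(grade_false_alarm(Verdict.THREAT, Verdict.FALSE_ALARM))  # 0.5
```

The asymmetry presumably reflects the usual triage preference: a missed threat costs more than an unnecessary review.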

MEDIUM

Task 2 — Threat Classification

Identify the threat type and rate its severity on a 1–10 scale. Mistaking a threat for a closely related type still earns partial credit via a proximity matrix.

5 Actions · Severity Scoring
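One way to combine a proximity matrix with severity scoring, sketched under stated assumptions: the threat labels, the proximity values, the linear severity decay, and the 0.6/0.4 weighting are all hypothetical, and the function names are illustrative rather than the environment's own.

```python
# Hypothetical threat labels; the environment's actual categories may differ.
THREAT_TYPES = {"missile_site", "air_defense", "troop_buildup", "naval_activity", "cyber_facility"}

# Assumed proximity scores for related pairs; any pair not listed scores 0.
PROXIMITY = {
    frozenset({"missile_site", "air_defense"}): 0.5,
    frozenset({"troop_buildup", "naval_activity"}): 0.3,
}


def type_score(predicted: str, actual: str) -> float:
    """Full credit for an exact match, proximity credit for a related type."""
    if predicted not in THREAT_TYPES or actual not in THREAT_TYPES:
        raise ValueError("unknown threat type")
    if predicted == actual:
        return 1.0
    return PROXIMITY.get(frozenset({predicted, actual}), 0.0)


def severity_score(predicted: int, actual: int) -> float:
    """Linear decay on the 1-10 scale: a 9-point miss scores 0."""
    if not (1 <= predicted <= 10 and 1 <= actual <= 10):
        raise ValueError("severity must be between 1 and 10")
    return 1.0 - abs(predicted - actual) / 9


def grade_classification(pred_type: str, true_type: str,
                         pred_sev: int, true_sev: int,
                         type_weight: float = 0.6) -> float:
    """Weighted blend of type and severity accuracy (weights are assumed)."""
    return (type_weight * type_score(pred_type, true_type)
            + (1 - type_weight) * severity_score(pred_sev, true_sev))
```

For example, answering `"air_defense"` with severity 7 when the truth is `"missile_site"` at severity 8 scores about 0.66 under these assumed values. Using `frozenset` keys keeps the matrix symmetric without storing each pair twice.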

HARD

Task 3 — Drone Allocation

Analyze three sectors and deploy a surveillance drone. The task is multi-turn: the agent can investigate sectors before committing to a deployment, and the quality of its reasoning is also scored.

6 Actions · Multi-Turn
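The investigate-then-commit structure can be sketched as a toy episode. Everything here is hypothetical: the class `DroneAllocationSketch`, the sector names, the hidden threat levels, and the reward rule (chosen sector's threat divided by the highest threat). Reasoning-quality scoring is left out.

```python
from dataclasses import dataclass, field


@dataclass
class DroneAllocationSketch:
    """Toy multi-turn episode: investigate sectors, then commit one drone.

    Sector names, threat levels, and the reward rule are all hypothetical.
    """
    threat_levels: dict[str, int]              # hidden ground truth per sector
    revealed: dict[str, int] = field(default_factory=dict)
    done: bool = False

    def step(self, action: str, sector: str) -> tuple[dict[str, int], float, bool]:
        """Apply one action; return (revealed intel, reward, done)."""
        if self.done:
            raise RuntimeError("episode already finished")
        if sector not in self.threat_levels:
            raise KeyError(f"unknown sector: {sector}")
        if action == "investigate":
            # Investigating reveals a sector's threat level without ending the episode.
            self.revealed[sector] = self.threat_levels[sector]
            return dict(self.revealed), 0.0, False
        if action == "deploy":
            # Deploying commits the drone and ends the episode.
            self.done = True
            best = max(self.threat_levels.values())
            reward = self.threat_levels[sector] / best if best > 0 else 0.0
            return dict(self.revealed), reward, True
        raise ValueError(f"unknown action: {action}")


env = DroneAllocationSketch({"alpha": 3, "bravo": 9, "charlie": 6})
for name in ("alpha", "bravo", "charlie"):
    intel, _, _ = env.step("investigate", name)
target = max(intel, key=intel.get)  # pick the highest revealed threat
_, reward, done = env.step("deploy", target)
print(target, reward, done)  # bravo 1.0 True
```

An agent that deploys without investigating is effectively guessing, which is the behavior a multi-turn design is meant to expose.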

ULTRA

Task 4 — Covert Operation Detection

Identify facilities that use civilian cover stories to hide military activity. The agent must classify the deception type, identify the cover story, and provide strategic reasoning. Designed to challenge frontier models.

3 Actions · 5 Deception Types · Multi-Turn
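A sketch of how the three Task 4 outputs might be folded into one deterministic score. The weights, the Jaccard word overlap used for cover-story matching, the evidence-term count used for reasoning, and the label `"dual_use"` in the usage are all assumptions for illustration; the environment's real deception types and rubric are not specified here.

```python
import re

# Assumed component weights; the environment's real weighting is not documented here.
WEIGHTS = {"type": 0.4, "cover": 0.3, "reasoning": 0.3}


def _tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cover_story_score(predicted: str, reference: str) -> float:
    """Jaccard overlap of word sets, a simple stand-in for cover-story matching."""
    p, r = _tokens(predicted), _tokens(reference)
    if not p or not r:
        return 0.0
    return len(p & r) / len(p | r)


def reasoning_score(reasoning: str, evidence_terms: list[str]) -> float:
    """Fraction of expected single-word evidence terms the agent mentions."""
    if not evidence_terms:
        return 0.0
    words = _tokens(reasoning)
    return sum(term.lower() in words for term in evidence_terms) / len(evidence_terms)


def grade_covert_op(pred_type: str, true_type: str,
                    pred_cover: str, true_cover: str,
                    reasoning: str, evidence_terms: list[str]) -> float:
    """Weighted sum of deception-type, cover-story, and reasoning scores."""
    parts = {
        "type": 1.0 if pred_type == true_type else 0.0,
        "cover": cover_story_score(pred_cover, true_cover),
        "reasoning": reasoning_score(reasoning, evidence_terms),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())
```

Splitting the score into components lets an agent that spots the cover story but mislabels the deception type still earn partial credit.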

Architecture

🛰️

Observation

Typed Pydantic models with task-specific fields: reports, sectors, deception indicators, and multi-turn state.
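A dependency-free sketch of what such an observation might look like. It swaps in standard-library `dataclasses` for Pydantic, so it has no runtime validation; a Pydantic version would subclass `BaseModel` instead. The class and field names (`TriageObservation`, `Report`, `Sector`, `investigated`, and so on) are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Report:
    report_id: str
    text: str


@dataclass
class Sector:
    name: str
    summary: str


@dataclass
class TriageObservation:
    """Hypothetical observation shape; unused fields stay empty for a given task."""
    task_id: int
    reports: list[Report] = field(default_factory=list)
    sectors: list[Sector] = field(default_factory=list)
    deception_indicators: list[str] = field(default_factory=list)
    # Multi-turn state
    turn: int = 0
    investigated: list[str] = field(default_factory=list)
    done: bool = False
    reward: Optional[float] = None


obs = TriageObservation(task_id=1, reports=[Report("r-001", "Unusual heat signature near runway")])
print(asdict(obs)["reports"][0]["report_id"])  # r-001
```

Keeping one observation type with optional task-specific fields means an agent can use a single parser across all four tasks.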

🤖

Agent Loop

Standard OpenEnv step/reset/state API, with full episode management including investigation and verification sub-actions.
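A generic agent loop over a reset/step/state interface, sketched with hypothetical names. `TriageEnv`, `StepResult`, `run_episode`, and `ToyEnv` are illustrative stand-ins and not the OpenEnv client API; the toy environment treats every action other than `"submit"` as an investigation sub-action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol


@dataclass
class StepResult:
    observation: dict[str, Any]
    reward: float
    done: bool


class TriageEnv(Protocol):
    """Hypothetical interface mirroring the reset/step/state shape."""
    def reset(self) -> StepResult: ...
    def step(self, action: dict[str, Any]) -> StepResult: ...
    def state(self) -> dict[str, Any]: ...


def run_episode(env: TriageEnv,
                policy: Callable[[dict[str, Any]], dict[str, Any]],
                max_turns: int = 10) -> float:
    """Reset, then step until the env reports done or max_turns is hit."""
    result = env.reset()
    total = 0.0
    for _ in range(max_turns):
        if result.done:
            break
        result = env.step(policy(result.observation))
        total += result.reward
    return total


class ToyEnv:
    """Toy env: any action except 'submit' is an investigation sub-action."""
    def __init__(self) -> None:
        self.turn = 0

    def reset(self) -> StepResult:
        self.turn = 0
        return StepResult({"turn": 0}, 0.0, False)

    def step(self, action: dict[str, Any]) -> StepResult:
        self.turn += 1
        done = action["type"] == "submit"
        return StepResult({"turn": self.turn}, 1.0 if done else 0.0, done)

    def state(self) -> dict[str, Any]:
        return {"turn": self.turn}


def investigate_then_submit(obs: dict[str, Any]) -> dict[str, Any]:
    # Investigate on the first turn, then commit.
    return {"type": "investigate"} if obs["turn"] == 0 else {"type": "submit"}


env = ToyEnv()
print(run_episode(env, investigate_then_submit), env.state())  # 1.0 {'turn': 2}
```

The `max_turns` cap guards against a policy that never submits, which would otherwise loop forever.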

⚖️

Deterministic Grading

Zero LLM dependency. Scores come from proximity matrices, keyword heuristics with negation filtering, and structural analysis of the agent's reasoning.
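Negation filtering matters because a keyword heuristic would otherwise reward "no missile launchers" as if it mentioned missiles. A minimal sketch, assuming a fixed list of negation cues and a three-word look-back window; the cue list, `WINDOW`, and `keyword_hits` are hypothetical, and keywords are expected in lowercase.

```python
import re

# Assumed negation cues and look-back window; a production grader would likely need more.
NEGATIONS = {"no", "not", "never", "without", "lacks", "absent"}
WINDOW = 3


def keyword_hits(text: str, keywords: set[str]) -> set[str]:
    """Return keywords that occur at least once without a negation in the preceding WINDOW words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits: set[str] = set()
    for i, word in enumerate(words):
        if word in keywords:
            context = words[max(0, i - WINDOW):i]
            if NEGATIONS.isdisjoint(context):
                hits.add(word)
    return hits


print(keyword_hits("Radar emissions detected, but no missile launchers visible",
                   {"radar", "missile"}))  # {'radar'}
```

A fixed window is crude (it misses contractions such as "isn't" and long-range negation), but it is deterministic, which is the property this grader prioritizes.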

🐳

Deployment

Docker-containerized FastAPI backend, deployed on HuggingFace Spaces. Fully compliant with Phase 1 and Phase 2 requirements.
