Security model
Edictum is a security product.
It provides deterministic runtime rules that execute outside the LLM, before high-risk tool calls reach the tool.
Architecture
Outside the LLM.
User prompt (natural language) → LLM (decides tool call) → Tool call (function + args) → Edictum (runtime boundary) → Tool execution (or blocked)
Rules are evaluated deterministically. They cannot be overridden by prompt injection, jailbreaks, or model confusion.
The model never sees the enforcement layer. It can't negotiate, argue, or bypass.
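The flow above can be sketched in a few lines of Python. This is an illustrative sketch, not Edictum's actual API: `ToolCall`, `block_dangerous_shell`, and `guarded_execute` are hypothetical names, and the substring check stands in for whatever matching a real rule engine uses. The point it shows is that the decision is made by ordinary code that runs after the model has emitted a tool call, so nothing in the prompt can change the outcome.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    """A tool call as emitted by the model: function name plus arguments."""
    name: str
    args: dict[str, Any]


# A rule returns a reason string to block the call, or None to let it pass.
Rule = Callable[[ToolCall], Optional[str]]


def block_dangerous_shell(call: ToolCall) -> Optional[str]:
    """Hypothetical pre rule: block a few dangerous shell patterns."""
    if call.name != "shell":
        return None
    command = str(call.args.get("command", ""))
    for pattern in ("rm -rf", "sudo "):
        if pattern in command:
            return f"blocked shell pattern: {pattern.strip()}"
    return None


def guarded_execute(
    call: ToolCall,
    rules: list[Rule],
    tools: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Evaluate every rule before the tool runs; the model is not consulted."""
    for rule in rules:
        reason = rule(call)
        if reason is not None:
            return {"status": "blocked", "reason": reason}
    tool = tools.get(call.name)
    if tool is None:
        return {"status": "blocked", "reason": f"unknown tool: {call.name}"}
    return {"status": "executed", "result": tool(**call.args)}
```

Because the rules are plain functions of the call's name and arguments, the same call always produces the same decision.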
Failure modes
Policy failures block. Unmatched calls follow the ruleset default.
Edictum fails closed when policy cannot be loaded, validated, or evaluated. A valid ruleset can still allow unmatched tool calls when its configured default is allow.
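A minimal sketch of that behavior, assuming a hypothetical ruleset shape with a `default` key and a `rules` mapping from tool name to action (neither is taken from Edictum's real schema): any failure to load, validate, or evaluate the policy results in a block, while an unmatched call under a valid ruleset follows the configured default.

```python
from typing import Any, Optional

VALID_ACTIONS = ("allow", "block")


def decide(ruleset: Optional[dict[str, Any]], tool_name: str) -> str:
    """Return "allow" or "block" for a tool call, failing closed on any error."""
    try:
        if ruleset is None:
            raise ValueError("ruleset could not be loaded")
        default = ruleset["default"]  # a missing default is a validation failure
        if default not in VALID_ACTIONS:
            raise ValueError(f"invalid default: {default!r}")
        # Unmatched tool calls fall through to the ruleset default.
        action = ruleset.get("rules", {}).get(tool_name, default)
        if action not in VALID_ACTIONS:
            raise ValueError(f"invalid action: {action!r}")
        return action
    except Exception:
        # Load, validation, and evaluation failures all block.
        return "block"
```

With `default` set to `allow`, a tool that no rule mentions is allowed, which is why the default deserves as much review as the rules themselves.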
Threat model
What we defend against — and what we don't.
Edictum measures behavioral conformance to declared profiles: what an agent may read, write, execute, and approve. It composes with output-quality evals instead of replacing them.
Defends against
- Unauthorized tool execution (pre rules)
- Data exfiltration via output (output-rule redaction)
- Privilege escalation (role-based rules)
- Unauthorized sub-agent spawning
- Secret leakage: OpenAI keys, AWS credentials, JWTs, GitHub tokens, Slack tokens
- Rate abuse (session limits: per-tool, per-session, per-attempt)
- Ruleset tampering (immutable YAML with policy version hashing)
- Sensitive file access (.env, .ssh/, .aws/credentials, keys)
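As one illustration of the ruleset-tampering item, policy version hashing can be sketched with a content hash over the ruleset text. The function names below are hypothetical and SHA-256 is an assumption; the source does not say which hash Edictum uses.

```python
import hashlib
import hmac


def policy_version(ruleset_yaml: str) -> str:
    """Derive a version identifier from the exact bytes of the ruleset."""
    return hashlib.sha256(ruleset_yaml.encode("utf-8")).hexdigest()


def is_untampered(ruleset_yaml: str, pinned_version: str) -> bool:
    """Compare the current ruleset against the version pinned at review time."""
    current = policy_version(ruleset_yaml).encode("ascii")
    return hmac.compare_digest(current, pinned_version.encode("utf-8"))
```

Recording the same hash with each decision ties every allow or block back to the exact ruleset text that produced it.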
Does not defend against
- Write or irreversible side effects that have already completed (postconditions fall back to warn)
- Kernel-level sandboxing (use gVisor/Firecracker for that)
- Hallucinated text content (Edictum enforces rules on actions, not words)
- Output-quality scoring (use evals for accuracy, relevance, coherence, and answer quality)
- Network-level attacks (use network policies)
- Prompt injection that only alters text responses (rules apply at tool-call execution, not to generated text)
Standards alignment
OWASP and EU AI Act starter mappings.
These mappings show where runtime decisions, approvals, and audit evidence can support common agent-risk controls. They are implementation guidance, not a claim of full standards coverage or legal compliance.
Example OWASP agentic-risk mappings
- Pre rules block unauthorized tool calls regardless of prompt manipulation.
- Sandbox rules restrict file paths, commands, and domains to allowlists.
- Role-based rules enforce permissions on every tool call.
- Pre rules block dangerous shell patterns (rm -rf, sudo, curl|sh).
- Session limits cap tool calls, preventing unbounded context manipulation.
- Rules enforce boundaries on sub-agent spawning and cross-tool chaining.
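To make the session-limit mapping concrete, here is a small sketch of per-session and per-tool caps. The `SessionLimits` class is hypothetical, and it counts only allowed calls; whether blocked attempts also count toward a limit is an assumption left out of this sketch.

```python
from collections import Counter


class SessionLimits:
    """Cap total tool calls per session and, optionally, calls per tool."""

    def __init__(self, max_calls: int, per_tool: dict[str, int]):
        self.max_calls = max_calls
        self.per_tool = per_tool
        self._counts: Counter = Counter()

    def check(self, tool_name: str) -> bool:
        """Return True and record the call if it fits within the limits."""
        if sum(self._counts.values()) >= self.max_calls:
            return False  # session-wide cap reached
        limit = self.per_tool.get(tool_name)
        if limit is not None and self._counts[tool_name] >= limit:
            return False  # per-tool cap reached
        self._counts[tool_name] += 1
        return True
```

A limiter like this lives in the runtime, so the model cannot reset or raise the counters by asking.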
EU AI Act risk-management touchpoints
- Declarative rules document allowed and blocked actions
- Observe mode for non-blocking evaluation before enforcement
- Decision log with policy version hashing
- Session rules for operational limits
- Human-in-the-loop (HITL) — agents pause for human authorization
- Configurable approval scope via YAML rules
- Timeout with fail-safe — unanswered approvals block by default
- Every approval decision recorded with actor identity
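The approval items above can be sketched as a blocking wait with a fail-safe timeout. This is an assumption-laden sketch: the queue of `(actor, approved)` tuples and the returned record shape are hypothetical, chosen only to show an unanswered approval resolving to a block and each decision carrying the actor's identity.

```python
import queue
from typing import Any


def await_approval(responses: queue.Queue, timeout_s: float) -> dict[str, Any]:
    """Wait for a human decision; an unanswered request blocks by default."""
    try:
        actor, approved = responses.get(timeout=timeout_s)
    except queue.Empty:
        # Nobody answered in time: fail safe.
        return {"decision": "block", "actor": None, "reason": "approval timed out"}
    # Only an explicit True approves; anything else is treated as a denial.
    decision = "allow" if approved is True else "block"
    return {"decision": decision, "actor": actor, "reason": "human decision"}
```

The returned record is the kind of entry a decision log would store alongside the policy version hash.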
Docs
AI-consumable documentation — LLM agents can read and reason about Edictum's ruleset format, operators, and capabilities. Designed for both humans and machines.
Reference stack security
Security boundaries for the optional API/app stack.
The optional API/app reference stack is hardened with defense-in-depth across every layer.
- JWT authentication on all API endpoints
- Tenant isolation: agents see only their own data
- Webhook signature validation
- Bcrypt API key hashing: plaintext never stored
- CSRF protection on state-changing endpoints
- Rate limiting on all endpoints
- No sensitive data in client-side code
- Input validation with Pydantic schemas
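Webhook signature validation is commonly done with an HMAC over the raw request body. The sketch below assumes a shared secret and a hex-encoded HMAC-SHA256 signature; the reference stack's actual header names and signing scheme are not given in the source, so treat this as one plausible shape.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature of a raw webhook body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body).encode("ascii")
    return hmac.compare_digest(expected, signature_hex.encode("utf-8"))
```

Verifying against the raw bytes, before any JSON parsing, avoids mismatches caused by re-serialization.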
Red team
15 attack patterns. 1 real bypass.
Fixed in 6 minutes.
We attacked our own system before anyone else could. Here's what happened.
- Retry after block
- PII exfiltration via output
- Cross-tool chaining
- Role escalation
- Prompt injection against rule authoring
- Parameter manipulation
- Session counter bypass
- Tool name spoofing
- Wildcard abuse
- YAML injection in args
- Approval timeout race
- Sub-agent policy escape
- Environment variable leak
- Regex backtracking (ReDoS)
- read_file /etc/shadow
The one bypass
read_file /etc/shadow — the sandbox rule didn't cover absolute paths outside the workspace directory.
Fix: 2-line YAML addition
not_within:
- /etc
- /var
Hot-reloaded via SSE. No agent restart. Total fix time: 6 minutes from discovery.
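To show what a `not_within` check might do with that list, here is a lexical sketch in Python. The function name, the `/workspace` default, and the evaluation logic are assumptions, not Edictum's implementation; in particular this version normalizes `..` segments but does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath


def is_blocked(path: str, not_within: list[str], workspace: str = "/workspace") -> bool:
    """Return True if the requested path falls inside any denied directory."""
    # Relative paths resolve against the workspace; absolute paths replace it.
    joined = posixpath.join(workspace, path)
    # Collapse leading slashes so "//etc" is treated the same as "/etc",
    # then remove "." and ".." segments.
    normalized = PurePosixPath(posixpath.normpath("/" + joined.lstrip("/")))
    for root in not_within:
        root_path = PurePosixPath(root)
        if normalized == root_path or root_path in normalized.parents:
            return True
    return False
```

Comparing whole path components rather than string prefixes keeps a directory such as `/etcetera` from being mistaken for `/etc`.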
Research
“Mind the GAP”
The GAP research shows that text refusal does not reliably transfer to tool-call safety.
- 6 frontier LLMs tested
- 17,420 datapoints collected
- 4,536 evaluation runs
- 6 regulated domains
Vulnerability disclosure
Responsible disclosure.
We take every report seriously. No legal action against good-faith security researchers.
Safe harbor
We will not pursue legal action against researchers who discover and report vulnerabilities in good faith, follow responsible disclosure practices, and avoid data destruction or service disruption.
/.well-known/security.txt