Security model

Edictum is a security product.

Deterministic runtime rules that execute outside the LLM, before high-risk tool calls reach the tool.

Architecture

Outside the LLM.

User prompt (natural language) → LLM (decides tool call) → Tool call (function + args) → Edictum (runtime boundary) → Tool execution (or blocked)

Rules are evaluated deterministically. They cannot be overridden by prompt injection, jailbreaks, or model confusion.

The model never sees the enforcement layer. It can't negotiate, argue, or bypass.
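The flow above can be sketched as a small gate that sits between the model's proposed call and the tool. This is a minimal illustration, not Edictum's API: `ToolCall`, `Blocked`, `no_sudo`, and `guarded_execute` are hypothetical names, and rules are modeled as plain Python predicates.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolCall:
    """A tool call proposed by the model: function name plus arguments."""
    name: str
    args: dict


class Blocked(Exception):
    """Raised when a rule blocks a call before it reaches the tool."""


# Hypothetical rule shape: return a reason string to block, or None to pass.
Rule = Callable[[ToolCall], Optional[str]]


def no_sudo(call: ToolCall) -> Optional[str]:
    """Example pre rule: block shell commands that contain a sudo token."""
    command = str(call.args.get("command", ""))
    if call.name == "shell" and "sudo" in command.split():
        return "sudo is not allowed"
    return None


def guarded_execute(call: ToolCall, rules: list, tools: dict) -> Any:
    """Evaluate every rule in order, then run the tool only if none blocked."""
    for rule in rules:
        reason = rule(call)
        if reason is not None:
            raise Blocked(reason)
    return tools[call.name](**call.args)
```

The rules run as ordinary code on the structured call, so nothing in the prompt or the model's text output takes part in the decision.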

Failure modes

Policy failures block. Unmatched calls follow the ruleset default.

Edictum fails closed when policy cannot be loaded, validated, or evaluated. A valid ruleset can still allow unmatched tool calls when its configured default is allow.

Scenario: Outcome

Ruleset has syntax error: block
No rules match tool call: allow (default)
Rule evaluation timeout: block (planned)
Reference stack unreachable: local cache → block
Malformed ruleset YAML: reject load, keep previous
Unknown rule type: reject load
Session limit exceeded: block
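A few of these outcomes can be sketched in code. This is a sketch under stated assumptions: JSON stands in for YAML so the example needs only the standard library, and the ruleset shape (`default`, `rules`, `tool`, `action`) and the function names are hypothetical, not Edictum's schema.

```python
import json
from typing import Optional

ALLOW, BLOCK = "allow", "block"


def load_ruleset(text: str) -> dict:
    """Parse and validate a ruleset; raise ValueError on any problem."""
    try:
        ruleset = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed ruleset: {exc}") from exc
    if not isinstance(ruleset, dict) or ruleset.get("default") not in (ALLOW, BLOCK):
        raise ValueError("ruleset needs a default of 'allow' or 'block'")
    if not isinstance(ruleset.get("rules"), list):
        raise ValueError("ruleset needs a 'rules' list")
    return ruleset


def reload(current: Optional[dict], text: str) -> Optional[dict]:
    """Reject a bad load and keep the previously loaded ruleset."""
    try:
        return load_ruleset(text)
    except ValueError:
        return current


def decide(ruleset: Optional[dict], tool: str) -> str:
    """Fail closed on missing policy or evaluation errors; else apply rules."""
    if ruleset is None:
        return BLOCK  # no valid policy loaded
    try:
        for rule in ruleset["rules"]:
            if rule["tool"] == tool:
                action = rule["action"]
                return action if action in (ALLOW, BLOCK) else BLOCK
    except Exception:
        return BLOCK  # evaluation error
    return ruleset["default"]  # unmatched calls follow the ruleset default
```

Note that the allow outcome for unmatched calls comes only from a valid ruleset's configured default; every error path returns block.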

Threat model

What we defend against — and what we don't.

Edictum measures behavioral conformance to declared profiles: what an agent may read, write, execute, and approve. It composes with output-quality evals instead of replacing them.

Defends against

Unauthorized tool execution (pre rules)
Data exfiltration via output (output-rule redaction)
Privilege escalation (role-based rules)
Unauthorized sub-agent spawning
Secret leakage: OpenAI keys, AWS credentials, JWTs, GitHub tokens, Slack tokens
Rate abuse (session limits: per-tool, per-session, per-attempt)
Ruleset tampering (immutable YAML with policy version hashing)
Sensitive file access (.env, .ssh/, .aws/credentials, keys)
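As an illustration of output-rule redaction for secrets, the sketch below scans tool output and replaces matches before the text goes any further. The three patterns are simplified, commonly cited token shapes chosen for the example; they are assumptions, not Edictum's actual detectors, and `redact` is a hypothetical helper.

```python
import re

# Illustrative patterns only; real detectors are usually broader and stricter.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def redact(text: str):
    """Return (redacted_text, labels_found) for the given tool output."""
    found = []
    for label, pattern in SECRET_PATTERNS.items():
        # subn returns the rewritten text and the number of replacements.
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, found
```

Returning the matched labels alongside the text lets a caller record what was redacted without logging the secret itself.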

Does not defend against

Write or irreversible side effects that have already completed (postconditions fall back to warn)
Kernel-level sandboxing (use gVisor/Firecracker for that)
Hallucinated text content (Edictum enforces rules on actions, not words)
Output-quality scoring (use evals for accuracy, relevance, coherence, and answer quality)
Network-level attacks (use network policies)
Prompt injection that affects only text responses (enforcement applies at tool-call execution)

Standards alignment

OWASP and EU AI Act starter mappings.

These mappings show where runtime decisions, approvals, and audit evidence can support common agent-risk controls. They are implementation guidance, not a claim of full standards coverage or legal compliance.

Example OWASP agentic-risk mappings

ASI01: Goal Hijacking

Pre rules block unauthorized tool calls regardless of prompt manipulation.

ASI02: Tool Misuse

Sandbox rules restrict file paths, commands, and domains to allowlists.

ASI03: Privilege Abuse

Role-based rules enforce permissions on every tool call.

ASI05: Code Execution

Pre rules block dangerous shell patterns (rm -rf, sudo, curl|sh).

ASI06: Memory Poisoning

Session limits cap tool calls, preventing unbounded context manipulation.

ASI09: Trust Exploitation

Rules enforce boundaries on sub-agent spawning and cross-tool chaining.

EU AI Act risk-management touchpoints

Article 9: Risk Management

Declarative rules document allowed and blocked actions
Observe mode for non-blocking evaluation before enforcement
Decision log with policy version hashing
Session rules for operational limits
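One way to picture a decision log with policy version hashing is to fingerprint the exact ruleset bytes and attach that fingerprint to every decision. The sketch below assumes SHA-256 and a hypothetical record shape; the source does not specify either.

```python
import hashlib
from datetime import datetime, timezone


def policy_version(ruleset_bytes: bytes) -> str:
    """Fingerprint the ruleset; any change to the bytes changes the hash."""
    return hashlib.sha256(ruleset_bytes).hexdigest()


def decision_record(ruleset_bytes: bytes, tool: str, decision: str) -> dict:
    """Build one log entry tying a decision to the policy that produced it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "decision": decision,
        "policy_version": policy_version(ruleset_bytes),
    }
```

With the hash in each entry, an auditor can check which ruleset version was in force for a given decision and detect a ruleset that changed without a corresponding version change.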

Article 14: Human Oversight

Human-in-the-loop (HITL) — agents pause for human authorization
Configurable approval scope via YAML rules
Timeout with fail-safe — unanswered approvals block by default
Every approval decision recorded with actor identity
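The timeout fail-safe can be sketched with a queue that a human reviewer writes to. Everything here is hypothetical (the `await_approval` name, the `(approved, actor)` tuple, the record fields); the point is only that silence resolves to block.

```python
import queue


def await_approval(responses: queue.Queue, timeout_s: float) -> dict:
    """Wait for an (approved, actor) answer; block if none arrives in time."""
    try:
        approved, actor = responses.get(timeout=timeout_s)
    except queue.Empty:
        # Unanswered approval: fail safe by blocking the paused tool call.
        return {"decision": "block", "actor": None, "reason": "approval timed out"}
    return {
        "decision": "allow" if approved else "block",
        "actor": actor,  # recorded so every decision is attributable
        "reason": "human decision",
    }
```

A reviewer thread or request handler would call `responses.put((True, "alice"))` to approve; if nothing is put before the timeout, the call stays blocked.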

Docs

AI-consumable documentation — LLM agents can read and reason about Edictum's ruleset format, operators, and capabilities. Designed for both humans and machines.

Reference stack security

Security boundaries for the optional API/app stack.

The optional API/app reference stack applies the following controls as defense in depth.

JWT authentication on all API endpoints

Tenant isolation — agents see only their own data

Webhook signature validation

Bcrypt API key hashing — plaintext never stored

CSRF protection on state-changing endpoints

Rate limiting on all endpoints

No sensitive data in client-side code

Input validation with Pydantic schemas
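Webhook signature validation is commonly done with an HMAC over the raw request body. The sketch below assumes HMAC-SHA256 with a hex-encoded signature and a shared secret; the source does not state which scheme the reference stack uses, and `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature a sender would attach."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Verification should run on the exact bytes received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.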

Red team

15 attack patterns. 1 real bypass.
Fixed in 6 minutes.

We attacked our own system before anyone else could. Here's what happened.

#01 Retry after block
#02 PII exfiltration via output
#03 Cross-tool chaining
#04 Role escalation
#05 Prompt injection against rule authoring
#06 Parameter manipulation
#07 Session counter bypass
#08 Tool name spoofing
#09 Wildcard abuse
#10 YAML injection in args
#11 Approval timeout race
#12 Sub-agent policy escape
#13 Environment variable leak
#14 Regex backtracking (ReDoS)
#15 read_file /etc/shadow

The one bypass

read_file /etc/shadow — the sandbox rule didn't cover absolute paths outside the workspace directory.

Fix: 2-line YAML addition

not_within:
  - /etc
  - /var

Hot-reloaded via SSE. No agent restart. Total fix time: 6 minutes from discovery.
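To show what a `not_within` check has to get right, here is a minimal sketch that normalizes a requested path and tests it against denied roots by path component. The semantics are an assumption about what `not_within` means, `/workspace` is a hypothetical workspace directory, and the check is purely lexical: it does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

NOT_WITHIN = ("/etc", "/var")  # denied roots from the fix above


def is_blocked(path: str, workspace: str = "/workspace") -> bool:
    """Return True if the path resolves lexically inside a denied root."""
    # Relative paths are taken relative to the workspace; absolute paths win.
    normalized = posixpath.normpath(posixpath.join(workspace, path))
    # POSIX normpath keeps a leading "//", so collapse it explicitly.
    target = PurePosixPath("/" + normalized.lstrip("/"))
    for root in NOT_WITHIN:
        denied = PurePosixPath(root)
        # Compare whole components so "/etcetera" is not treated as "/etc".
        if target == denied or denied in target.parents:
            return True
    return False
```

Normalizing before comparing matters here: a plain string prefix check would miss `../etc/shadow` and `//etc/shadow`, and would wrongly block `/etcetera`.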

Research

“Mind the GAP”

The GAP research shows that text refusal does not reliably transfer to tool-call safety.

Frontier LLMs tested

17,420 datapoints collected

4,536 evaluation runs

Regulated domains

arXiv:2602.16943

HuggingFace Dataset (CC-BY-4.0)

Vulnerability disclosure

Responsible disclosure.

We take every report seriously. No legal action against good-faith security researchers.

Contact: security@edictum.ai

Acknowledgment: 48 hours

Triage: 7 days

Scope: Core library, reference stack, official adapters

Safe harbor

We will not pursue legal action against researchers who discover and report vulnerabilities in good faith, follow responsible disclosure practices, and avoid data destruction or service disruption.

/.well-known/security.txt