Security & Approval System
Runtime enforcement
This page is the source of truth for the behaviour of this subsystem. Governance runs on the live agent runtime behind the provider-present switch: the approval producer parks blocked actions, the boot ApprovalGate resumes them on a decision, the progressive-trust strategy narrows tool access at the invoker, an agent can call SynthOrg's own MCP tools under its trust level with the admin guardrails fail-closed, and the autonomy controller routes changes through the configured AutonomyChangeStrategy.
SynthOrg enforces a fail-closed security model: every agent action is evaluated by a rule engine (with an optional LLM fallback) before execution, every output is scanned for leaked secrets, and every credential flows through an isolated hands plane that never enters the model context. Four configurable autonomy levels (full, semi, supervised, locked) control which actions require human approval, and a pluggable trust system lets agents earn higher tool access over time.
Approval Workflow
    graph TD
        Task[Task/Action] --> SecOps[Security Ops Agent]
        SecOps --> Approve["APPROVE\n(auto)"]
        SecOps --> Deny["DENY\n+ reason"]
        Approve --> Execute[Execute]
        Deny --> HQ[Human Queue\nDashboard]
        HQ --> Override[Override Approve]
        HQ --> Alt[Alternative Suggested]
Autonomy Levels
The framework provides four built-in autonomy presets that control which actions agents can perform independently versus which require human approval. Most users only set the level.
    autonomy:
      level: "semi"  # full, semi, supervised, locked
      presets:
        full:
          description: "Agents work independently. Human notified of results only."
          auto_approve: ["all"]
          human_approval: []
        semi:
          description: "Most work is autonomous. Major decisions need approval."
          auto_approve: ["code", "test", "docs", "comms:internal"]
          human_approval: ["deploy", "comms:external", "budget:exceed", "org:hire"]
          security_agent: true
        supervised:
          description: "Human approves major steps. Agents handle details."
          auto_approve: ["code:write", "comms:internal"]
          human_approval: ["arch", "code:create", "deploy", "vcs:push"]
          security_agent: true
        locked:
          description: "Human must approve every action."
          auto_approve: []
          human_approval: ["all"]
          security_agent: true  # still runs for audit logging
Built-in templates set autonomy levels appropriate to their archetype (e.g. full for
Solo Builder, Research Lab, and Data Team, supervised for Agency, Enterprise Org, and
Consultancy). See the
Company Types table for per-template defaults.
Autonomy scope (Decision Log D6): Three-level
resolution chain: per-agent > per-department > company default. Seniority validation prevents
Juniors/Interns from being set to full.
Runtime changes (Decision Log D7): Human-only promotion via REST API (no agent, including CEO, can escalate privileges). Automatic downgrade on: high error rate (one level down), budget exhausted (supervised), security incident (locked). Recovery from auto-downgrade is human-only.
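The resolution chain and the automatic downgrade triggers above can be sketched in a few lines. This is a minimal illustration, not the SynthOrg implementation: the names `AutonomyLevel`, `resolve_level`, `auto_downgrade`, and the trigger strings are hypothetical, and the rule that a downgrade never raises an already lower level is an assumption.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Ordered from most to least restrictive, so "one level down" is value - 1.
    LOCKED = 0
    SUPERVISED = 1
    SEMI = 2
    FULL = 3


def resolve_level(agent, department, company_default):
    """Per-agent beats per-department, which beats the company default."""
    for candidate in (agent, department):
        if candidate is not None:
            return candidate
    return company_default


def auto_downgrade(current, trigger):
    """Map a downgrade trigger to the resulting level (hypothetical trigger names)."""
    if trigger == "high_error_rate":
        target = AutonomyLevel(max(current - 1, AutonomyLevel.LOCKED))
    elif trigger == "budget_exhausted":
        target = AutonomyLevel.SUPERVISED
    elif trigger == "security_incident":
        target = AutonomyLevel.LOCKED
    else:
        raise ValueError(f"unknown trigger: {trigger}")
    # Assumption: a downgrade never raises an already lower level.
    return min(current, target)
```

There is deliberately no promotion helper here: per D7, moving back up is a human-only operation.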
Autonomy change strategy plugin surface
The AutonomyChangeStrategy protocol (security/autonomy/protocol.py:
request_promotion / auto_downgrade / request_recovery) is a
pluggable subsystem following the risk-tier-classifier pattern: a
StrEnum discriminator + frozen config + safe default +
StrategyRegistry factory. The wrapping strategies delegate
downgrade, recovery, and the override store to a base
HumanOnlyPromotionStrategy (where the override store lives) and
override only the promotion decision.
| AutonomyStrategyType | Implementation | Behaviour |
|---|---|---|
| HUMAN_ONLY | HumanOnlyPromotionStrategy | Promotions + recovery always require human approval. Byte-identical with the pre-plugin default. |
| PERFORMANCE_GATED | PerformanceGatedPromotionStrategy | Auto-grants promotion when the agent's rolling success rate (injected PerformanceSignalProvider) is at/above promotion_success_threshold; None history defers. |
| BUDGET_AWARE | BudgetAwarePromotionStrategy | Denies promotion while risk-budget headroom (injected RiskBudgetSignalProvider) is below budget_warn_fraction; otherwise delegates the decision to the base. |
| ESCALATION_CHAIN | EscalationChainPromotionStrategy | Records the configured approver-role escalation_chain and returns pending (False); per-role approvals arrive out-of-band. |
Selection: AutonomyStrategyConfig (frozen, default
kind=HUMAN_ONLY) + AutonomyStrategyDeps (the base strategy and
signal providers that cannot live in frozen config).
change_strategy_factory.build_autonomy_change_strategy(config, deps)
dispatches via the StrEnum-keyed StrategyRegistry; a wrapping
strategy missing its required signal provider raises
AutonomyStrategyConfigError at construction. The strategy is built
at boot from config.autonomy.change_strategy and attached to
application state; the autonomy controller consults it on every
change request (the D6 seniority rule is enforced first, then the
request is enqueued as an approval, the queue being the apply
driver). With the HUMAN_ONLY default every promotion pends for
human review. The strategy verdict is enforced, not audit-only: a
strategy that returns True from request_promotion produces an
auto-decided approval item (status=APPROVED,
decided_by="strategy:<name>", decided_at set) and the registry
applies the level change immediately, so the queue remains the apply
driver and the audit trail stays intact while a non-HUMAN_ONLY
strategy actually takes effect. The performance / risk-budget signal
providers the PERFORMANCE_GATED and BUDGET_AWARE strategies
require are not wired by the boot seam: selecting one of those kinds
without supplying its provider fails fast at construction.
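The pattern described above (enum discriminator, frozen config, safe default, registry factory, fail-fast on a missing provider) might look roughly like the sketch below. Every name in it (`StrategyKind`, `StrategyConfig`, `build_strategy`, `ConfigError`) is a simplified stand-in rather than the real SynthOrg API, and treating "None history defers" as "fall back to the base strategy" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class StrategyKind(str, Enum):  # stand-in for the StrEnum discriminator
    HUMAN_ONLY = "human_only"
    PERFORMANCE_GATED = "performance_gated"


@dataclass(frozen=True)
class StrategyConfig:
    kind: StrategyKind = StrategyKind.HUMAN_ONLY  # safe default
    promotion_success_threshold: float = 0.9


class ConfigError(ValueError):
    """Raised at construction when a required collaborator is missing."""


class HumanOnly:
    def request_promotion(self, agent_id):
        return False  # always pends for human review


class PerformanceGated:
    def __init__(self, base, threshold, success_rate):
        if success_rate is None:
            raise ConfigError("performance_gated requires a signal provider")
        self._base = base
        self._threshold = threshold
        self._success_rate = success_rate

    def request_promotion(self, agent_id):
        rate = self._success_rate(agent_id)
        if rate is None:
            # Assumption: no history means deferring to the base strategy.
            return self._base.request_promotion(agent_id)
        return rate >= self._threshold


def build_strategy(config, success_rate=None):
    """Dispatch on the discriminator; wrappers share one base strategy."""
    base = HumanOnly()
    registry = {
        StrategyKind.HUMAN_ONLY: lambda: base,
        StrategyKind.PERFORMANCE_GATED: lambda: PerformanceGated(
            base, config.promotion_success_threshold, success_rate
        ),
    }
    return registry[config.kind]()
```

Keeping the signal provider out of the frozen config and passing it as a separate argument mirrors the config/deps split described above.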
Security Operations Agent
A special meta-agent that reviews all actions before execution:
- Evaluates safety of proposed actions
- Checks for data leaks, credential exposure, destructive operations
- Validates actions against company policies
- Maintains an audit log of all approvals/denials
- Escalates uncertain cases to human queue with explanation
- Cannot be overridden by other agents (only human can override)
Rule engine (Decision Log D4): Hybrid
approach. A rule engine handles known patterns (credentials, path traversal, destructive ops) plus
user-defined custom policy rules (custom_policies in security config); it evaluates in under a
millisecond and covers ~95% of cases. The LLM fallback is used only for uncertain cases (~5%).
In full autonomy mode only rules and audit logging run, with no LLM path. Hard safety rules
(credential exposure, data destruction) are never bypassed, regardless of autonomy level.
Integration point (Decision Log D5):
Pluggable SecurityInterceptionStrategy protocol. Initial strategy intercepts before every
tool invocation; slots into existing ToolInvoker between permission check and tool
execution. Post-tool-call scanning detects sensitive data in outputs.
Output Scan Response Policies
After the output scanner detects sensitive data, a pluggable OutputScanResponsePolicy
protocol decides how to handle the findings. Each policy sets a ScanOutcome enum on the
returned OutputScanResult so downstream consumers (primarily ToolInvoker) can
distinguish intentional policy decisions from scanner failures:
| Policy | Behaviour | ScanOutcome | Default for |
|---|---|---|---|
| Redact (default) | Return scanner's redacted content as-is | REDACTED | SEMI, SUPERVISED autonomy |
| Withhold | Clear redacted content; content withheld by policy | WITHHELD | LOCKED autonomy |
| Log-only | Discard findings (logs at WARNING), pass original output through | LOG_ONLY | FULL autonomy |
| Autonomy-tiered | Delegate to a sub-policy based on effective autonomy level | (set by delegate) | Composite policy |
The ScanOutcome enum (CLEAN, REDACTED, WITHHELD, LOG_ONLY) is set by the scanner
(initial REDACTED when findings are detected) and may be transformed by the policy (e.g.
WithholdPolicy changes REDACTED -> WITHHELD). The ToolInvoker._scan_output method
branches on ScanOutcome.WITHHELD first to return a dedicated error message ("content
withheld by security policy") with output_withheld metadata, distinct from the generic
fail-closed path used for scanner exceptions.
Policy selection is declarative via SecurityConfig.output_scan_policy_type
(OutputScanPolicyType enum). A factory function (build_output_scan_policy) resolves the
enum to a concrete policy instance. The policy is applied after audit recording, preserving
audit fidelity regardless of policy outcome.
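As an illustration of how policies transform the scanner's initial outcome, here is a minimal sketch. The `ScanResult` dataclass, the policy function names, and the string keys for autonomy levels are hypothetical simplifications of `OutputScanResult` and the policy classes; only the outcome transitions follow the table above.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class ScanOutcome(Enum):
    CLEAN = "clean"
    REDACTED = "redacted"
    WITHHELD = "withheld"
    LOG_ONLY = "log_only"


@dataclass(frozen=True)
class ScanResult:
    outcome: ScanOutcome
    redacted_content: Optional[str] = None


def redact_policy(result):
    return result  # keep the scanner's redacted content as-is


def withhold_policy(result):
    if result.outcome is ScanOutcome.REDACTED:
        return replace(result, outcome=ScanOutcome.WITHHELD, redacted_content=None)
    return result


def log_only_policy(result):
    if result.outcome is ScanOutcome.REDACTED:
        # Assumption: the caller passes the original output through on LOG_ONLY.
        return replace(result, outcome=ScanOutcome.LOG_ONLY, redacted_content=None)
    return result


POLICY_BY_AUTONOMY = {
    "full": log_only_policy,
    "semi": redact_policy,
    "supervised": redact_policy,
    "locked": withhold_policy,
}


def autonomy_tiered_policy(result, autonomy_level):
    """Composite policy: delegate to the sub-policy for the effective level."""
    return POLICY_BY_AUTONOMY[autonomy_level](result)
```

A consumer can then branch on `ScanOutcome.WITHHELD` before any generic failure handling, which is the distinction the paragraph above draws.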
Review Gate Invariants
Review gates enforce no-self-review as a structural invariant, not a convention. An agent must never act as reviewer on a task it executed. The invariant is enforced at three layers, each independently sufficient:
- Service-layer preflight: ReviewGateService.check_can_decide() runs before the approval row is persisted. A SelfReviewError at preflight raises 403 Forbidden with a generic message (the error's task_id and agent_id attributes are available for structured logs but never leaked in the HTTP body). The preflight-before-persist ordering ensures a rejected self-review attempt never leaves a decided approval row or a broadcast WebSocket event behind.
- Pydantic model validator: DecisionRecord._forbid_self_review rejects construction when executing_agent_id == reviewer_agent_id. Type-level invariants catch bugs in any caller that bypasses the service layer.
- SQL CHECK constraint: the decision_records table carries CHECK(reviewer_agent_id != executing_agent_id), providing a last-resort defence at the database boundary. If a direct SQL caller somehow bypasses both the service and the model, the DB rejects the write.
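The database-level layer can be demonstrated with an in-memory SQLite table. This is a sketch under stated assumptions: the three-column schema is a hypothetical cut-down of decision_records, SQLite stands in for whatever backend is actually deployed, and `try_insert` is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE decision_records (
        task_id            TEXT NOT NULL,
        executing_agent_id TEXT NOT NULL,
        reviewer_agent_id  TEXT NOT NULL,
        CHECK (reviewer_agent_id != executing_agent_id)
    )
    """
)


def try_insert(executor, reviewer):
    """Return True if the row was written, False if the CHECK rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO decision_records VALUES (?, ?, ?)",
                ("task-1", executor, reviewer),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Even a caller that skips the service preflight and the model validator cannot persist a self-review row, because the constraint is evaluated by the database itself.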
Auditable Decisions Drop-Box
Every completed review appends an immutable DecisionRecord to the drop-box
(DecisionRepository) capturing full context at decision time: executor,
reviewer, outcome (DecisionOutcome: APPROVED / REJECTED / AUTO_APPROVED
/ AUTO_REJECTED / ESCALATED), reason, acceptance-criteria snapshot, approval
ID cross-reference, and a server-assigned monotonic version per task.
- Append-only: the protocol exposes no update or delete operations; the SQL schema backs this up by enforcing a FOREIGN KEY ... ON DELETE RESTRICT on task_id, preventing cascade-deletes that would erase audit trails.
- Atomic versioning: append_with_next_version computes the next version inside a single INSERT ... (SELECT COALESCE(MAX(version), 0) + 1 ...) statement, eliminating the TOCTOU race that a read-then-write pattern would create under concurrent reviewers. The UNIQUE(task_id, version) constraint rejects any residual collision as DuplicateRecordError.
- Best-effort append after transition: a failed append is logged at ERROR (via logger.exception) for audit forensics but does not roll back the review transition itself. Only known transient persistence errors (QueryError, DuplicateRecordError) are treated as non-fatal; programming errors (ValidationError, TypeError, etc.) propagate loudly so schema drift surfaces in dev/CI instead of being masked as silent audit loss.
- Unassigned executor, no record: when a task reaches the review gate without an assigned executor (an anomalous operational state), the service logs an ERROR event and refuses to write a decision record rather than smuggling a sentinel string through the NotBlankStr executing_agent_id field and contaminating the audit trail.
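The single-statement versioning idea can be tried out against SQLite. The sketch below assumes a reduced, hypothetical schema (task_id, version, outcome); the real repository method, its columns, and its error mapping are not shown in the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE decision_records (
        task_id TEXT    NOT NULL,
        version INTEGER NOT NULL,
        outcome TEXT    NOT NULL,
        UNIQUE (task_id, version)
    )
    """
)


def append_with_next_version(task_id, outcome):
    """Insert a record whose version is computed inside the INSERT itself."""
    with conn:
        # The aggregate SELECT always yields one row, so the first record
        # for a task gets version COALESCE(NULL, 0) + 1 = 1.
        conn.execute(
            "INSERT INTO decision_records (task_id, version, outcome) "
            "SELECT ?, COALESCE(MAX(version), 0) + 1, ? "
            "FROM decision_records WHERE task_id = ?",
            (task_id, outcome, task_id),
        )
        row = conn.execute(
            "SELECT MAX(version) FROM decision_records WHERE task_id = ?",
            (task_id,),
        ).fetchone()
    return row[0]
```

Because the version is never read into application code before the write, there is no window in which two reviewers can both observe the same "next" value; the UNIQUE constraint remains as a backstop.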
Design Rationale: Append-Only vs Consolidation
The drop-box is deliberately append-only, not consolidated into org memory. Org-memory consolidation is lossy by design (it summarises, compresses, and discards detail for context-window efficiency), appropriate for conversational knowledge but unsuitable for compliance-grade audit data, where every decision must be reproducible and verifiable after the fact. Keeping the decision log as a dedicated append-only store avoids coupling audit integrity to memory consolidation heuristics and makes tamper-evident review trivial (any record ever written stays written, verbatim).
Credential Isolation Boundary
Credentials flow exclusively through the hands plane (tool execution) via the sandbox credential proxy (tools/sandbox/). They never enter the brain plane (AgentContext, turn records, conversation history) or the session plane (observability events, replay).
Two enforcement points maintain this boundary:
- Task metadata validator: engine/_validation.py::validate_task_metadata() runs at the engine input boundary before execution begins. It recursively scans all dict keys in Task.metadata (including nested dicts and dicts inside lists), rejecting any key matching credential patterns (token, secret, api_key, password, bearer) with an EXECUTION_CREDENTIAL_ISOLATION_VIOLATION error event (execution.credential_isolation.violation) and raising ExecutionStateError.
- Sandbox credential manager: tools/sandbox/credential_manager.py::SandboxCredentialManager strips 14 credential-like patterns from environment variable overrides before they enter sandbox containers. Stripped keys are logged via SANDBOX_CREDENTIAL_STRIPPED.
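A recursive key scan of the kind the metadata validator performs could look like this. It is a sketch only: case-insensitive substring matching is an assumption about how the patterns are applied, `find_credential_keys` is a hypothetical helper, and a plain `ValueError` stands in for `ExecutionStateError`.

```python
import re

# The five patterns named above; substring, case-insensitive matching is assumed.
CREDENTIAL_KEY = re.compile(r"token|secret|api_key|password|bearer", re.IGNORECASE)


def find_credential_keys(value, path="metadata"):
    """Yield the path of every dict key that looks like a credential."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if isinstance(key, str) and CREDENTIAL_KEY.search(key):
                yield child_path
            yield from find_credential_keys(child, child_path)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_credential_keys(item, f"{path}[{index}]")


def validate_task_metadata(metadata):
    """Raise before execution if any credential-like key is present."""
    offending = list(find_credential_keys(metadata))
    if offending:
        # Only key paths are reported, never the values themselves.
        raise ValueError(f"credential-like metadata keys: {offending}")
```

Reporting key paths rather than values keeps the violation message itself from becoming a leak.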
See also: Engine > Brain / Hands / Session.
Approval Timeout Policy
When an action requires human approval (per autonomy level), the agent must wait. The
framework provides configurable timeout policies that determine what happens when a human
does not respond. All policies implement a TimeoutPolicy protocol, configurable per autonomy
level and per action risk tier.
During any wait (regardless of policy) the agent parks the blocked task (saving its
full serialised AgentContext state: conversation, progress, accumulated cost, turn count)
and picks up other available tasks from its queue. When approval arrives, the agent resumes
the original context exactly where it left off. This mirrors real company behaviour: a developer
starts another task while waiting for a code review, then returns to the original work when
feedback arrives.
Approval parking is distinct from the checkpoint-based SUSPENDED state produced by
graceful shutdown: the former is an in-process, voluntary pause initiated by the agent
when a high-risk action needs human sign-off, the latter is an externally-driven save
of in-flight context across a process restart. See
Graceful Shutdown Protocol for the
shutdown-time mechanism.
Wait forever: the action stays in the human queue indefinitely, with no timeout and no auto-resolution. The agent works on other tasks in the meantime.
This is the safest option, with no risk of unauthorized actions, but it can stall tasks indefinitely if no human is available.
Deny on timeout: all unapproved actions auto-deny after a configurable timeout. The agent receives a denial reason and can retry with a different approach or escalate explicitly.
This is the industry consensus default ("fail closed"), but it may stall legitimate work if the human is consistently slow.
Tiered: timeout behaviour depends on the action's risk level. Low-risk actions auto-approve after a short wait. Medium-risk actions auto-deny. High-risk/security-critical actions wait forever.
    approval_timeout:
      policy: "tiered"
      tiers:
        low_risk:
          timeout_minutes: 60
          on_timeout: "approve"  # auto-approve low-risk after 1 hour
          actions: ["code:write", "comms:internal", "test"]
        medium_risk:
          timeout_minutes: 240
          on_timeout: "deny"  # auto-deny medium-risk after 4 hours
          actions: ["code:create", "vcs:push", "arch:decide"]
        high_risk:
          timeout_minutes: null  # wait forever
          on_timeout: "wait"
          actions: ["deploy", "db:admin", "comms:external", "org:hire"]
Pragmatic: low-risk tasks do not stall and critical actions stay safe. However, auto-approve on timeout carries risk, and tuning tier boundaries requires operational experience.
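To make the tiered behaviour concrete, the sketch below evaluates the example configuration for a given action and elapsed wait. `Tier`, `TIERS`, and `timeout_decision` are hypothetical names; treating unknown action types as high risk follows D19, while the exact comparison at the boundary (a timeout fires once the wait reaches `timeout_minutes`) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    timeout_minutes: Optional[int]  # None means wait forever
    on_timeout: str                 # "approve", "deny", or "wait"
    actions: Tuple[str, ...]


# Mirrors the example YAML for the tiered policy.
TIERS = {
    "low_risk": Tier(60, "approve", ("code:write", "comms:internal", "test")),
    "medium_risk": Tier(240, "deny", ("code:create", "vcs:push", "arch:decide")),
    "high_risk": Tier(None, "wait", ("deploy", "db:admin", "comms:external", "org:hire")),
}


def timeout_decision(action_type, waited_minutes):
    """Return "wait", "approve", or "deny" for a pending approval."""
    for tier in TIERS.values():
        if action_type in tier.actions:
            break
    else:
        tier = TIERS["high_risk"]  # unknown action types fail safe to HIGH
    if tier.timeout_minutes is None or waited_minutes < tier.timeout_minutes:
        return "wait"
    return tier.on_timeout
```

For example, `timeout_decision("vcs:push", 100)` keeps waiting, while the same action at 240 minutes is denied.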
Escalation chain: on timeout, the approval request escalates to the next human in a configured chain. If the entire chain times out, the action is denied.
    approval_timeout:
      policy: "escalation"
      chain:
        - role: "direct_manager"
          timeout_minutes: 120
        - role: "department_head"
          timeout_minutes: 240
        - role: "ceo"
          timeout_minutes: 480
      on_chain_exhausted: "deny"  # deny if entire chain times out
Mirrors real organisations: if one approver is unavailable, the next in line covers. Requires configuring an escalation chain.
Approval API Response Enrichment
The approval REST API enriches every ApprovalItem response with computed
urgency fields so the dashboard can display time-sensitive indicators without
client-side computation:
- seconds_remaining (float | null): seconds until expires_at, clamped to 0.0 for expired items; null when no TTL is set.
- urgency_level (enum): critical (< 1 hr), high (< 4 hrs), normal (>= 4 hrs), no_expiry (no TTL).

Applied to all list, detail, create, approve, and reject endpoints.
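The two computed fields can be derived from `expires_at` alone. The function below is a minimal sketch with a hypothetical name and signature (the real enrichment code is not shown in the source); it assumes an already expired item, clamped to 0.0 seconds, is reported as critical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def urgency_fields(
    expires_at: Optional[datetime], now: datetime
) -> Tuple[Optional[float], str]:
    """Return (seconds_remaining, urgency_level) for one approval item."""
    if expires_at is None:
        return None, "no_expiry"
    # Clamp to 0.0 so expired items never report a negative remainder.
    seconds = max((expires_at - now).total_seconds(), 0.0)
    if seconds < 3600:
        level = "critical"
    elif seconds < 4 * 3600:
        level = "high"
    else:
        level = "normal"
    return seconds, level


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(urgency_fields(now + timedelta(minutes=30), now))
```

Passing `now` explicitly keeps the computation deterministic, which makes the thresholds easy to test.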
Park/Resume Mechanism
The park/resume mechanism relies on AgentContext snapshots (frozen Pydantic models). When
a task is parked, the full context is persisted to the
PersistenceBackend. When approval arrives, the
framework loads the snapshot, restores the agent's conversation and state, and resumes
execution from the exact point of suspension. This works naturally with the
model_copy(update=...) immutability pattern.
Design decisions (Decision Log):
- D19: Risk Tier Classification. Pluggable RiskTierClassifier protocol. Configurable YAML mapping with sensible defaults. Unknown action types default to HIGH (fail-safe).
- D20: Context Serialisation. Pydantic JSON via persistence backend. ParkedContext model with metadata columns + context_json blob. Conversation stored verbatim; summarization is a context window management concern at resume time, not a persistence concern.
- D21: Resume Injection. Tool result injection. Approval requests modelled as tool calls (request_human_approval). Approval decision returned as ToolResult, semantically correct (approval IS the tool's return value).
Risk-tier classifier plugin surface
The RiskTierClassifier protocol (security/timeout/protocol.py,
classify(action_type) -> ApprovalRiskLevel) is a pluggable subsystem
following the security/trust/ pattern: a StrEnum discriminator +
frozen config + safe default + StrategyRegistry factory.
| RiskClassifierType | Implementation | Behaviour |
|---|---|---|
| DEFAULT | DefaultRiskTierClassifier | Static action-type -> tier map; unknown -> HIGH (D19). Byte-identical with the pre-plugin behaviour. |
| WORKLOAD_ADAPTIVE | WorkloadAdaptiveRiskClassifier | Wraps a base classifier; elevates one tier when an injected in-flight probe (Callable[[], int]) is at/above workload_threshold. CRITICAL is the ceiling. |
| OPERATOR_CONFIGURABLE | OperatorConfigurableRiskClassifier | Classifies from an operator-defined action_type -> tier map; unknown -> HIGH (D19 fail-safe). |
| TIME_BASED | TimeBasedRiskElevationClassifier | Wraps a base classifier; elevates one tier inside a configured off-hours window (wraps midnight) and/or weekends. Uses the Clock seam. |
Selection: RiskClassifierConfig (frozen, on TieredTimeoutConfig.risk_classifier,
default kind=DEFAULT) + RiskClassifierDeps (the in-flight probe and
Clock collaborators that cannot live in frozen config).
risk_classifier_factory.build_risk_tier_classifier(config, deps)
dispatches via the StrEnum-keyed StrategyRegistry; a non-default
kind missing its required dependency raises RiskClassifierConfigError
at construction (fail fast).
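The wrapping classifiers share one shape: classify with the base, then elevate by at most one tier. The sketch below shows the workload-adaptive case. `RiskLevel`, `StaticClassifier`, and `WorkloadAdaptive` are illustrative stand-ins for `ApprovalRiskLevel` and the real classes, and the LOW < MEDIUM < HIGH < CRITICAL ordering is an assumption.

```python
from enum import IntEnum


class RiskLevel(IntEnum):  # assumed ordering of the tiers
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


class StaticClassifier:
    """Static action-type -> tier map; unknown action types are HIGH."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)

    def classify(self, action_type):
        return self._mapping.get(action_type, RiskLevel.HIGH)


class WorkloadAdaptive:
    """Wrap a base classifier and elevate one tier under heavy load."""

    def __init__(self, base, in_flight, workload_threshold):
        self._base = base
        self._in_flight = in_flight  # zero-argument probe returning an int
        self._threshold = workload_threshold

    def classify(self, action_type):
        level = self._base.classify(action_type)
        if self._in_flight() >= self._threshold:
            # CRITICAL is the ceiling, so elevation saturates there.
            return RiskLevel(min(level + 1, RiskLevel.CRITICAL))
        return level
```

Because the probe is injected as a callable, the frozen config only needs to carry `workload_threshold`; the live collaborator arrives through the deps object.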
The factory is wired at the tiered-timeout-policy seam
(timeout/factory.py::create_timeout_policy). The two other
DefaultRiskTierClassifier() consumers -- SecOpsService.risk_classifier
and the request_human_approval tool wrapper in
engine/_security_factory.py -- remain on the hardcoded default for
now; moving them to the factory is the natural next step once a
SecurityConfig.risk_classifier field is designed (out of scope for
the plugin-surface deliverable, which is the timeout policy seam).
EvidencePackage (HITL Approval Payload)
ApprovalItem.evidence_package (optional EvidencePackage | None) carries a structured
approval payload for human review. See
Event Stream: EvidencePackage Schema for the
full model specification. Existing approval paths (hiring, promotion, pruning) can adopt
the package incrementally; the field defaults to None.
Runtime Policy Engine
A pluggable runtime pre-execution gate that evaluates structured action requests
(tool invocations, delegations, approval executions) against loaded policy
definitions before the action runs. This complements the existing
security/rules/ preventive rule engine, which already evaluates actions
before tool execution, by adding a structured policy-as-code decision layer.
Cedar adapter (primary): uses cedarpy for stateless embedded evaluation.
Policies are loaded from files at company boot. No external process needed.
Configuration (SecurityConfig.policy_engine):
| Field | Default | Description |
|---|---|---|
| engine | "none" | Backend: "cedar" or "none" |
| policy_files | () | Paths to Cedar policy files |
| evaluation_mode | "log_only" | "enforce" blocks; "log_only" logs only |
| fail_closed | False | Deny on evaluation errors if True |
Integration points (via R1 middleware):
- wrap_tool_call: PolicyGateMiddleware with action_type="tool_invoke"
- before_decompose: coordination middleware with action_type="delegation"
- ApprovalGate.park_context(): with action_type="approval_execute"
Safety defaults: engine defaults to "none" (disabled). When enabled,
evaluation_mode defaults to "log_only" so first adoption never breaks
existing flows. Operators graduate to "enforce" after observing decisions.
Module: src/synthorg/security/policy_engine/
Quantum-Safe Audit Trail
An observability sink that signs security events with ML-DSA-65 (FIPS 204)
via the Asqav library and chains them in an append-only hash chain for
tamper-evident audit. Wraps the existing observability/sinks.py logging
handler protocol; no changes to event producers.
Features:
- ML-DSA-65 post-quantum signatures per security event
- SHA-256 hash chain linking each entry to its predecessor
- RFC 3161 timestamping via public TSA with local-clock fallback (emits SECURITY_TIMESTAMP_FALLBACK on fallback)
- AuditChainVerifier for end-to-end chain integrity verification
- m-of-n threshold signing for high-risk EvidencePackage approvals
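The hash-chain part of this design can be illustrated with the standard library alone. The sketch below covers only SHA-256 chaining and verification; ML-DSA-65 signing and RFC 3161 timestamping are omitted, and the entry layout, the all-zero genesis value, and the function names are assumptions rather than the actual chain format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder predecessor for the first entry


def entry_hash(prev_hash, event):
    """Hash the predecessor's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain, event):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev_hash": prev, "hash": entry_hash(prev, event)})


def verify_chain(chain):
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers its predecessor, changing one historical event invalidates that entry and every entry after it, which is what makes the log tamper-evident.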
Configuration (AuditChainConfig, opt-in):
| Field | Default | Description |
|---|---|---|
| enabled | False | Opt-in activation |
| backend | "asqav" | Signing backend |
| tsa_url | None | RFC 3161 TSA endpoint (None = local clock) |
| signing_key_path | None | Path to signing key |
| chain_storage_path | None | Path for chain persistence |
Module: src/synthorg/observability/audit_chain/
OWASP Agentic Top 10 (ASI) Coverage Matrix
This matrix maps SynthOrg security mechanisms to the OWASP Top 10 for Agentic Applications (2026). Coverage is independently derived from codebase analysis and may not be fully aligned with OWASP ASI specifications. Operators should cross-reference with official OWASP documentation.
| ASI | Risk | Coverage | Primary Modules |
|---|---|---|---|
| ASI01 | Agent Goal Hijack | Partial | security/rules/ (credential/path detectors), engine/classification/ (semantic detectors), HTMLParseGuard (tool output sanitization), SemanticDriftDetector (middleware) |
| ASI02 | Tool Misuse and Exploitation | Covered | PolicyEngine (Cedar pre-exec gate), security/rules/ (preventive rule engine), tools/sandbox/ (Docker/subprocess isolation), ApprovalGate |
| ASI03 | Identity and Privilege Abuse | Covered | Progressive trust (security/trust/), 4 autonomy levels, AuthorityDeferenceGuard, ApprovalGate, delegation budget, ToolPermissionChecker |
| ASI04 | Agentic Supply Chain Vulnerabilities | Partial | ToolRegistryIntegrityCheck (boot-time hash verification), pip-audit/npm-audit/Trivy in CI, cosign signatures, SLSA provenance. Gap: no runtime plugin integrity verification beyond boot-time hash. |
| ASI05 | Unexpected Code Execution (RCE) | Covered | tools/sandbox/ (Docker with ephemeral containers, subprocess with env filtering), gVisor runtime for high-risk categories (code_execution, terminal), SandboxCredentialManager, workspace boundary enforcement |
| ASI06 | Memory and Context Poisoning | Partial | Procedural memory generation guards, MVCC SharedKnowledgeStore, SemanticDriftDetector. Gap: no automated RAG-store integrity verification. |
| ASI07 | Insecure Inter-Agent Communication | Partial | DelegationChainHashMiddleware (content hash on delegation chain), AuthorityDeferenceGuard (strips authority cues from transcripts). Gap: no message-level encryption (in-process agents, not needed currently). |
| ASI08 | Cascading Failures | Covered | S1 15-risk register mitigations, circuit breakers (BudgetEnforcer), StagnationDetector, CoordinationReplanHook with max_stall_count/max_reset_count hard caps, team-size bounds (3-4 per group, 8 per meeting) |
| ASI09 | Human-Agent Trust Exploitation | Partial | EvidencePackage (structured HITL artifacts with RecommendedAction options), AuditChainSink (tamper-evident decision trail), ApprovalGate with configurable timeout policies. Gap: no cognitive-bias-specific UI warnings. |
| ASI10 | Rogue Agents | Covered | 4 autonomy levels (full/semi/supervised/locked), PolicyEngine (pre-exec gate), tool permissions (ToolPermissionChecker), sandbox isolation, ToolRegistryIntegrityCheck, budget limits, AuthorityBreachDetector |
Summary: 5 covered, 5 partial, 0 uncovered. Partial gaps are documented above with specific module references.
A2A Security
Applies when the A2A External Gateway is
enabled (a2a.enabled: true). All A2A security controls are inactive when the gateway
is disabled (the default).
Authentication Schemes
The gateway supports multiple authentication schemes for both inbound and outbound A2A communication, configurable per direction:
| Scheme | Inbound (external -> SynthOrg) | Outbound (SynthOrg -> external) |
|---|---|---|
| apiKey | Validate API key in request header | Send API key with outbound requests |
| oauth2 | Validate OAuth2 bearer token | Obtain and send bearer token |
| bearer | Validate static bearer token | Send static bearer token |
| mTLS | Verify client certificate | Present client certificate |
| none | No authentication (development only) | No authentication |
Production Requirement
none authentication is intended for local development and testing only. Production
deployments must not use none for inbound requests. Configure any of the
authenticated schemes (apiKey, oauth2, bearer, or mTLS).
Inbound Request Validation
Every inbound A2A request passes through two validation layers before reaching internal agents:
- DelegationGuard: the same five loop prevention mechanisms that protect internal delegation also apply to external requests. External agents are treated as delegation sources with the gateway as the entry point into the delegation chain.
- External-specific checks:
    - Agent Card verification (see below)
    - Request signature validation (when configured)
    - Rate limiting scoped to external callers (separate from internal per-pair limits)
    - Payload size validation (configurable max request body size)
Agent Trust Establishment
External agent identity is verified through two independent layers, both configurable:
- Allowlist (default, always available): The a2a.allowed_agents list controls which external agents can interact with the organisation. Entries are matched against the Agent Card URL or agent ID. An empty allowlist with a2a.enabled: true rejects all inbound requests (fail-closed). The allowlist is operator-managed via the A2A configuration.
- Agent Card signature verification (opt-in): When a2a.agent_card_verification.require_signatures is enabled, inbound requests must include a JWS-signed Agent Card. The gateway verifies the signature against a set of trusted public keys or JWKS endpoints. This provides cryptographic proof of agent identity beyond the allowlist.
The two layers are independent: the allowlist gates access (who may connect), signatures verify identity (who is connecting). Both can be enabled simultaneously for defence in depth.
Push Notification Webhook Security
A2A push notifications allow external agents to receive task updates via webhooks.
SynthOrg will implement a generic WebhookReceiver that is reusable beyond A2A:
| Protection | Description |
|---|---|
| HMAC signature verification | Webhook payloads are signed with a shared secret using the configured algorithm (default: HMAC-SHA256). The receiver verifies the signature before processing |
| Timestamp validation | Requests include a timestamp header. The receiver rejects requests with timestamps outside the configured clock skew tolerance (default: 300 seconds) |
| Nonce/replay prevention | Each request includes a unique nonce. The receiver maintains a TTL-based dedup window (default: 60 seconds) to reject replayed requests |
The WebhookReceiver will be a standalone reusable component, not A2A-specific. It will
protect any endpoint that receives webhook callbacks from external systems.
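Since the WebhookReceiver is described as planned rather than implemented, the following is only a sketch of how the three protections could combine. The class name `WebhookVerifier`, the `timestamp.nonce.body` signing layout, and the in-memory nonce store are all assumptions; the defaults match the table above.

```python
import hashlib
import hmac
import time


class WebhookVerifier:
    def __init__(self, secret, clock_skew_seconds=300,
                 replay_window_seconds=60, clock=time.time):
        self._secret = secret
        self._skew = clock_skew_seconds
        self._window = replay_window_seconds
        self._clock = clock
        self._seen = {}  # nonce -> time first seen

    def sign(self, body, timestamp, nonce):
        # Assumed layout: the signature covers timestamp, nonce, and body.
        message = f"{timestamp}.{nonce}.".encode("utf-8") + body
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def verify(self, body, timestamp, nonce, signature):
        now = self._clock()
        expected = self.sign(body, timestamp, nonce)
        if not hmac.compare_digest(expected, signature):
            return False  # bad or missing signature
        if abs(now - timestamp) > self._skew:
            return False  # outside the clock-skew tolerance
        # Forget nonces that have aged out of the dedup window.
        self._seen = {n: t for n, t in self._seen.items() if now - t < self._window}
        if nonce in self._seen:
            return False  # replayed request
        self._seen[nonce] = now
        return True
```

The signature is checked first so that unauthenticated requests cannot fill the nonce store. Note that in this sketch a nonce is forgotten after the dedup window, so a dedup window shorter than the clock-skew tolerance leaves a gap in which a captured request could be replayed; the source does not say how the planned receiver handles this.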
SSRF Prevention
A2A push notification webhook URLs submitted by external agents must be validated
against SSRF attacks. The framework provides a consolidated SsrfValidator service
that unifies URL validation across all outbound connection points:
| Consumer | Current Implementation | After Consolidation |
|---|---|---|
| Notification adapters (ntfy, Slack) | _validate_outbound_url() | SsrfValidator |
| Git clone URLs | git_url_validator module | SsrfValidator |
| Provider discovery | ProviderDiscoveryPolicy allowlist | SsrfValidator + allowlist |
| A2A push notification webhooks | (new) | SsrfValidator |
For HTTP(S) consumers (webhooks, notifications, provider discovery), the SsrfValidator
rejects URLs targeting private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16),
loopback addresses, link-local addresses, and non-HTTP(S) schemes. Git clone URLs
continue to use the existing git_url_validator module, which supports SSH and SCP-like
syntax with its own validation rules. A configurable allowlist permits legitimate internal
endpoints (e.g., local providers, internal Git servers). DNS rebinding mitigation follows
the existing pattern from git_url_validator: resolved IPs are pinned and re-validated
before connection.
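The HTTP(S) checks described above map closely onto Python's `ipaddress` module. The function below is a hedged sketch, not the SsrfValidator itself: its name and signature are hypothetical, the allowlist is matched on hostname only, and the caller is assumed to have resolved the hostname already and to connect only to the IPs it passes in (the pinning step).

```python
import ipaddress
from urllib.parse import urlsplit


def is_forbidden_ip(ip_text):
    ip = ipaddress.ip_address(ip_text)
    # is_private covers 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.
    return ip.is_private or ip.is_loopback or ip.is_link_local


def validate_outbound_url(url, resolved_ips, allowlist=frozenset()):
    """Return True if the URL may be contacted at the given resolved IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if not host:
        return False
    if host in allowlist:
        return True  # operator-approved internal endpoint
    # Every resolved address must be acceptable, not just the first one.
    return bool(resolved_ips) and not any(is_forbidden_ip(ip) for ip in resolved_ips)
```

Validating the resolved addresses rather than the hostname string is what makes pinning meaningful: the connection must then be made to one of the validated IPs, not to a fresh DNS lookup.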
Quadratic Communication Enforcement
The existing MessageOverhead.is_quadratic detection (see
Microservices Anti-Patterns)
will be extended with a pluggable QuadraticEnforcementStrategy protocol. This is
particularly relevant for A2A federation where external agent connections can amplify
quadratic scaling. Currently, only detection exists. Enforcement strategies are
proposed below.
Four built-in strategies are planned:
| Strategy | Behaviour | Default |
|---|---|---|
| alert_only | Current behaviour; detect and notify via NotificationDispatcher | Yes |
| soft_throttle | Auto-tighten rate limiter for affected agent group by rate_reduction_factor | No |
| hard_block | Reject new connections when agent count exceeds max_agent_connections | No |
| disabled | No detection or enforcement | No |
The strategy will be pluggable via the QuadraticEnforcementStrategy protocol; custom
strategies can be registered without modifying built-in code.
A2AConfig
The gateway is configured under the a2a key in the company YAML:
Full A2A configuration
    a2a:
      enabled: false  # gateway disabled by default
      auth:
        inbound: apiKey  # apiKey, oauth2, bearer, mTLS, none
        outbound: bearer  # auth scheme for outbound requests
        api_key: "${A2A_API_KEY}"  # inbound API key (env var recommended)
        outbound_token: "${A2A_OUTBOUND_TOKEN}"  # outbound bearer token
      allowed_agents: []  # allowlist of external agent IDs/URLs
      agent_card_verification:
        enabled: false  # Agent Card verification
        require_signatures: false  # JWS signature verification (opt-in)
        trusted_jwks_urls: []
        trusted_public_keys: []
      push_notifications:
        enabled: false  # push notification support
        webhook_receiver:
          signature_algorithm: hmac-sha256
          clock_skew_seconds: 300  # timestamp tolerance
          replay_window_seconds: 60  # nonce dedup window
      rate_limiting:
        external_max_per_minute: 30  # per-external-agent rate limit
        external_burst_allowance: 5
      max_request_body_bytes: 1048576  # 1 MB payload limit
See A2A External Gateway for the architecture overview, Agent Card projection, and concept mapping tables.
Session Revalidation and the Revocation Window
Long-lived authenticated streams (WebSocket and SSE) do not trust the
access token for their full lifetime. Both re-load the user record on
a single shared cadence, AUTH_REVALIDATE_INTERVAL_SECONDS
(10 minutes), and tear the stream down when the user is deleted, the
role is demoted below read access, or the session JTI has been
revoked (an admin DELETE /sessions/{jti}).
Operationally this means revocation takes effect within at most one
revalidation interval (10 minutes), not instantly. An access token
or open stream remains usable until the next revalidation tick after
the revoking action. The refresh-rotation endpoint
(POST /auth/refresh) rejects immediately on a revoked session, but
an already-issued, not-yet-expired access token on an open WS/SSE
stream is only kicked at the next tick. Size the JWT lifetime and any
incident-response runbook around this 10-minute bound.
WS and SSE share one cadence constant and one sliding-window failure
model. Transient persistence-backend errors during a revalidation
tick are admitted into a per-connection sliding window
(api.auth_revalidate_window_seconds, default 60s;
api.auth_revalidate_max_failures, default 5). Failures age out of
the window instead of resetting on success, so a flaky backend that
interleaves one good response between failure clusters cannot hold a
stale-auth stream open indefinitely; once the window saturates the
stream closes (WS: close code 4011; SSE: a final revoked frame with
reason=backend_unavailable) and the client reconnects against a
healthy replica.
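The difference between a sliding window and a streak counter can be sketched as follows. The class name and method names are hypothetical, and treating "saturates" as "failure count reaches max_failures" is an assumption; the defaults mirror the two settings named above.

```python
from collections import deque


class RevalidationFailureWindow:
    """Per-connection sliding window of revalidation failures (sketch)."""

    def __init__(self, window_seconds: float = 60.0,
                 max_failures: int = 5) -> None:
        self._window = window_seconds
        self._max = max_failures
        self._failures: deque[float] = deque()  # failure timestamps, oldest first

    def _evict(self, now: float) -> None:
        # Failures leave the window only by ageing out.
        while self._failures and now - self._failures[0] >= self._window:
            self._failures.popleft()

    def record_failure(self, now: float) -> bool:
        """Record a failure; return True when the stream should close."""
        self._evict(now)
        self._failures.append(now)
        return len(self._failures) >= self._max

    def record_success(self, now: float) -> None:
        # A success does NOT clear earlier failures; it only ages them.
        self._evict(now)
```

With `max_failures=3`, failures at t=0 and t=10, a success at t=15, and another failure at t=20 still saturate the window, whereas a streak counter would have been reset by the success.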
Both failure-tolerance settings are resolved once at startup
(restart_required, read_only_post_init): changing them requires a
restart and does not retune already-open streams. This is a
deliberate change from the previous SSE-only behaviour, where
api.sse_revalidate_max_failures was a runtime-tunable streak
counter; the unified sliding-window model is strictly stronger
against flaky backends and is shared verbatim with WebSocket.
429 rate-limit Retry-After is per-policy (per-operation budgets,
account-lockout duration) and is intentionally not coupled to the
revalidation cadence; the unified cadence governs WS and SSE auth
revalidation only.
Adversarial Red-Team Gate¶
The red-team gate is an opt-in adversarial check that fires as the
LAST step before a deliverable transitions IN_REVIEW -> COMPLETED,
after the normal ReviewPipeline has returned PASS. It treats every
about-to-ship artefact as untrusted input and attacks it along four
locked surfaces:
- CORRECTNESS: does the deliverable do what was asked.
- SECURITY: input validation, secret handling, injection sinks, OWASP-style defects.
- REQUIREMENTS: brief / acceptance-criteria coverage vs. the deliverable's actual content.
- GROUNDING: traceability of every assertive factual claim (numbers, percentages, named entities) to a source.
Shape¶
- The red team is a built-in Role (name="Red Team", department quality_assurance, seniority senior) carried in BUILTIN_ROLES. The role is instantiated as a real AgentIdentity at boot via build_red_team_agent_identity and dispatched through AgentEngine.run like any other agent.
- The gate's only agent-side side effect is one submit_red_team_report tool call carrying a frozen RedTeamReport (execution_id, task_id, findings, summary). The tool is registered ONCE on the engine's tool registry; execution_id / task_id flow through tool arguments, NOT through constructor-bound state, so the tool is a singleton.
- The agent prompt wraps the deliverable in <untrusted-artifact> and the brief in <task-data> via wrap_untrusted (SEC-1). The system prompt explicitly forbids deference to seniority and authority cues in the deliverable, mitigating the authority-deference failure pattern (docs/design/communication-coordination.md).
Severity x autonomy routing¶
Mirrors AutonomyTieredPolicy in security/output_scan_policy.py:
| Severity | LOCKED | SUPERVISED | SEMI | FULL |
|---|---|---|---|---|
| CRITICAL | BLOCK | BLOCK | BLOCK | BLOCK |
| HIGH | BLOCK | BLOCK | BLOCK | BLOCK |
| MEDIUM | BLOCK | BLOCK | PASS+ | PASS+ |
| LOW / INFO | PASS+ | PASS+ | PASS+ | PASS+ |
PASS+ is RedTeamVerdict.PASS_WITH_FINDINGS: the deliverable
proceeds but findings attach to the audit trail. BLOCK returns the
task to IN_PROGRESS with the structured critique as the rework brief.
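The routing table can be read as a small pure function. The sketch below uses local stand-in enums rather than the project's own types; only PASS_WITH_FINDINGS is a name taken from this page, and the `route` function and the BLOCK member are illustrative assumptions.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"


class Autonomy(Enum):
    LOCKED = "locked"
    SUPERVISED = "supervised"
    SEMI = "semi"
    FULL = "full"


class RedTeamVerdict(Enum):
    BLOCK = "block"
    PASS_WITH_FINDINGS = "pass_with_findings"


def route(severity: Severity, autonomy: Autonomy) -> RedTeamVerdict:
    """Map one finding's severity and the autonomy level to a verdict."""
    # CRITICAL and HIGH block at every autonomy level.
    if severity in (Severity.CRITICAL, Severity.HIGH):
        return RedTeamVerdict.BLOCK
    # MEDIUM blocks only under the two stricter levels.
    if severity is Severity.MEDIUM and autonomy in (
        Autonomy.LOCKED, Autonomy.SUPERVISED
    ):
        return RedTeamVerdict.BLOCK
    # LOW / INFO (and MEDIUM under SEMI / FULL) pass with findings attached.
    return RedTeamVerdict.PASS_WITH_FINDINGS
```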
Grounding subsystem¶
A small GroundingChecker protocol lets the substrate-backed
implementation that EPIC E #1988 ships drop in without changing the
gate. The current HeuristicGroundingChecker is deterministic
regex-based: it flags assertive numeric / temporal claims with no
citation marker. Heuristic-source findings are capped at LOW severity
by HEURISTIC_GROUNDING_MAX_SEVERITY so the stub never blocks on its
own; only the LLM agent's own findings (or, post-#1988, the
substrate-backed checker) may escalate to HIGH/CRITICAL on the
GROUNDING surface.
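To illustrate the severity cap, here is a rough sketch of a protocol-plus-heuristic pairing. The regular expressions, the `Finding` type, the string severity scale, and the `check` signature are all invented for the example; the actual HeuristicGroundingChecker patterns and types may differ.

```python
import re
from dataclasses import dataclass
from typing import Protocol

SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]
HEURISTIC_GROUNDING_MAX_SEVERITY = "LOW"


@dataclass(frozen=True)
class Finding:
    claim: str
    severity: str


class GroundingChecker(Protocol):
    """Assumed protocol shape for any grounding implementation."""

    def check(self, text: str) -> list[Finding]: ...


# Hypothetical patterns: a bare number or percentage, and two citation styles.
_NUMERIC = re.compile(r"\d+(?:\.\d+)?%?")
_CITATION = re.compile(r"\[\d+\]|\(source:[^)]*\)", re.IGNORECASE)


def cap_severity(severity: str) -> str:
    """Clamp a heuristic finding to HEURISTIC_GROUNDING_MAX_SEVERITY."""
    cap = SEVERITY_ORDER.index(HEURISTIC_GROUNDING_MAX_SEVERITY)
    return SEVERITY_ORDER[min(SEVERITY_ORDER.index(severity), cap)]


class HeuristicGroundingChecker:
    """Flags sentences with a numeric claim and no citation marker."""

    def check(self, text: str) -> list[Finding]:
        findings = []
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            cited = _CITATION.search(sentence) is not None
            # Strip citation markers so "[1]" is not counted as a number.
            has_number = _NUMERIC.search(_CITATION.sub("", sentence))
            if has_number and not cited:
                findings.append(Finding(sentence, cap_severity("MEDIUM")))
        return findings
```

Even if the heuristic internally rates a claim MEDIUM, the cap reduces it to LOW, which the routing table passes at every autonomy level.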
Configuration¶
CompanyConfig.security.red_team.enabled is False by default. When
enabled, the boot path in workers/runtime_builder.py constructs the
full subsystem via security/redteam/builder.py::build_red_team_runtime,
which returns a RedTeamRuntime NamedTuple (gate, submit tool, repo,
runner). Operators flip the flag once the review-gate integration
point is wired in their deployment.
Failure modes¶
- AGENT FAULTS: agent never files a report, or the dispatch raises. The gate fails OPEN with a synthetic INFO-severity finding; completion is not blocked by an agent fault, but the audit record shows the degraded review.
- GROUNDING FAULTS: heuristic stub raises. The gate logs the failure and proceeds without heuristic findings.
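Both failure modes amount to catching the fault and continuing. One possible shape is sketched below, with hypothetical helper names (`run_agent_review`, `run_heuristic_grounding`) and a simplified `Finding` type; the real gate's types and messages are not shown on this page.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("red_team_gate_sketch")


@dataclass(frozen=True)
class Finding:
    severity: str
    message: str


def run_agent_review(
    dispatch: Callable[[], Optional[list[Finding]]],
) -> list[Finding]:
    """Fail OPEN: an agent fault becomes one synthetic INFO finding."""
    try:
        findings = dispatch()
    except Exception as exc:  # the dispatch raised
        return [Finding("INFO", f"red-team dispatch failed: {exc!r}")]
    if findings is None:  # the agent never filed a report
        return [Finding("INFO", "red-team agent filed no report")]
    return findings


def run_heuristic_grounding(
    check: Callable[[str], list[Finding]], text: str
) -> list[Finding]:
    """Log a heuristic fault and proceed without heuristic findings."""
    try:
        return check(text)
    except Exception:
        logger.exception("heuristic grounding check failed")
        return []
```

Because the synthetic finding is INFO severity, it never blocks completion but still lands in the audit trail as evidence of the degraded review.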
See Also¶
- Tools: tool categories, sandboxing, progressive trust
- Budget: risk budget, shadow mode enforcement
- Verification & Quality: verification stage and review pipeline (the red-team gate is the LAST adversarial layer AFTER the review pipeline passes)
- Design Overview: full index