Insight | Feb 6, 2026

Building Security and Observability Into AI Agent Architectures

Recent incidents reveal critical gaps in AI agent security and monitoring. Here's how to build proper guardrails, tracing, and safety controls into MCP and agent SDK implementations.

AI-generated


Recent production incidents highlight critical gaps in AI agent security and observability. The Moltbook redirect vulnerability and emerging tracing requirements in OpenAI's Agents SDK point to a broader pattern: we're building powerful autonomous systems without adequate safety controls.

The Current Problem

AI agents operate with significant autonomy, making decisions and executing actions with limited oversight. This creates several risk vectors:

  • Unvalidated redirects and URL handling
  • Insufficient action logging and audit trails
  • Lack of runtime safety bounds
  • Poor error handling and recovery

Security Layer Requirements

Input Validation and Sanitization

# URL validation example (uses the third-party `validators` package)
import validators
from urllib.parse import urlparse

def validate_url(url: str) -> bool:
    if not validators.url(url):
        return False

    parsed = urlparse(url)
    # Reject non-HTTP(S) schemes outright
    if parsed.scheme not in ('http', 'https'):
        return False

    # Block internal hosts (defense in depth; the allowlist below
    # already excludes them)
    if parsed.hostname in ['localhost', '127.0.0.1']:
        return False

    # Allowlist trusted domains
    allowed_domains = ['example.com', 'api.trusted-service.com']
    return parsed.hostname in allowed_domains

Action Boundaries

Implement explicit boundaries around agent capabilities:

  • Resource limits: CPU, memory, network timeouts
  • Permission scopes: Read-only vs. write operations
  • Rate limiting: Requests per minute/hour
  • Domain restrictions: Allowed endpoints and services
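These boundaries should be enforced in code rather than by convention. The sketch below wraps a tool call in a sliding-window rate limiter; the `fetch` function and its limits are illustrative, not from any particular SDK:

```python
import time
from functools import wraps

class RateLimitExceeded(Exception):
    pass

def rate_limited(max_calls: int, per_seconds: float):
    """Reject calls once max_calls have run within the last per_seconds."""
    calls = []

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have aged out of the sliding window
            calls[:] = [t for t in calls if now - t < per_seconds]
            if len(calls) >= max_calls:
                raise RateLimitExceeded(
                    f"{func.__name__}: {max_calls} calls per {per_seconds}s exceeded"
                )
            calls.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limited(max_calls=10, per_seconds=60)
def fetch(url: str) -> str:
    # Placeholder for a real network call, which needs its own timeout
    return f"fetched {url}"
```

The same decorator pattern extends naturally to permission scopes and domain restrictions: each boundary becomes a wrapper that can reject the call before the tool runs.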

Runtime Monitoring

Track agent behavior in real-time:

import time

class SecurityWarning(Exception):
    """Raised when the monitor detects a suspicious behavior pattern."""

class AgentMonitor:
    def __init__(self):
        self.action_log = []
        self.error_count = 0
        self.start_time = time.time()
    
    def log_action(self, action_type: str, params: dict, result: dict):
        entry = {
            'timestamp': time.time(),
            'action': action_type,
            'params': params,
            'result': result,
            'duration': result.get('duration', 0)
        }
        self.action_log.append(entry)
        
        # Check for suspicious patterns
        self._check_anomalies(entry)
    
    def _check_anomalies(self, entry):
        # Flag rapid-fire actions: more than 10 in the last 10 seconds
        recent_actions = [a for a in self.action_log 
                         if a['timestamp'] > time.time() - 10]
        if len(recent_actions) > 10:
            raise SecurityWarning("Rapid action execution detected")

Observability Implementation

Structured Logging

Implement comprehensive logging that captures:

  • Intent: What the agent is trying to accomplish
  • Context: Available information and constraints
  • Decisions: Why specific actions were chosen
  • Outcomes: Results and any side effects
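A minimal way to capture all four fields is one JSON object per agent step, emitted through the standard logging module. The field names below are illustrative, not a standard schema:

```python
import json
import logging
import time

logger = logging.getLogger("agent")

def log_agent_step(intent: str, context: dict, decision: str, outcome: dict) -> str:
    """Emit one structured JSON log line per agent step and return it."""
    entry = {
        "ts": time.time(),
        "intent": intent,
        "context": context,
        "decision": decision,
        "outcome": outcome,
    }
    line = json.dumps(entry, default=str)
    logger.info(line)
    return line

line = log_agent_step(
    intent="answer user question about release dates",
    context={"tools": ["web_search"], "budget_tokens": 4000},
    decision="call web_search because the answer needs current data",
    outcome={"status": "ok", "duration_ms": 420},
)
```

Keeping each step as a single machine-parseable line makes the audit trail queryable later, which plain free-text logs are not.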

Tracing Integration

For MCP implementations, add distributed tracing:

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class MCPHandler:
    def execute_tool(self, tool_name: str, params: dict):
        with tracer.start_as_current_span(f"mcp.tool.{tool_name}") as span:
            span.set_attribute("tool.name", tool_name)
            # Careful: params may contain sensitive data; redact before tracing
            span.set_attribute("tool.params", str(params))
            
            try:
                result = self._execute(tool_name, params)
                span.set_attribute("tool.success", True)
                return result
            except Exception as e:
                span.set_attribute("tool.success", False)
                span.set_attribute("tool.error", str(e))
                raise

Metrics Collection

Track key performance indicators:

  • Success rates by action type
  • Response times and resource usage
  • Error patterns and recovery times
  • Security events and policy violations
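As a starting point, these indicators can be accumulated in-process before wiring up a real metrics backend such as Prometheus or OpenTelemetry. The AgentMetrics class and its action names are assumptions for illustration, not an existing API:

```python
from collections import defaultdict

class AgentMetrics:
    """Minimal in-process counters; a production deployment would
    export these to a metrics backend instead of holding them in memory."""

    def __init__(self):
        self.success = defaultdict(int)
        self.failure = defaultdict(int)
        self.durations = defaultdict(list)

    def record(self, action: str, ok: bool, duration_ms: float):
        # Count outcomes per action type and keep raw latencies
        (self.success if ok else self.failure)[action] += 1
        self.durations[action].append(duration_ms)

    def success_rate(self, action: str) -> float:
        total = self.success[action] + self.failure[action]
        return self.success[action] / total if total else 0.0

metrics = AgentMetrics()
metrics.record("web_search", ok=True, duration_ms=120.0)
metrics.record("web_search", ok=False, duration_ms=900.0)
```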

MCP Safety Extensions

The Model Context Protocol (MCP) spec needs additional safety controls. The extensions below are proposals, not part of the current specification:

Resource Declarations

{
  "capabilities": {
    "tools": {
      "web_search": {
        "safety_level": "restricted",
        "rate_limit": "10/minute",
        "allowed_domains": ["wikipedia.org", "github.com"]
      }
    }
  }
}
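A declaration like the one above is only useful if the client enforces it. One possible enforcement sketch follows; the `parse_rate_limit` and `domain_allowed` helpers are hypothetical, not part of any MCP library:

```python
from urllib.parse import urlparse

def parse_rate_limit(spec: str) -> tuple[int, float]:
    """Turn a declaration like "10/minute" into (max_calls, window_seconds)."""
    count, _, unit = spec.partition("/")
    seconds = {"second": 1.0, "minute": 60.0, "hour": 3600.0}[unit]
    return int(count), seconds

def domain_allowed(url: str, allowed: list[str]) -> bool:
    """Accept only exact matches or subdomains of a declared domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)

declaration = {
    "safety_level": "restricted",
    "rate_limit": "10/minute",
    "allowed_domains": ["wikipedia.org", "github.com"],
}

max_calls, window = parse_rate_limit(declaration["rate_limit"])
```

The subdomain check matters: a naive substring test would accept `wikipedia.org.evil.com`, while the suffix-on-dot check here does not.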

Permission Model

Implement granular permissions:

  • Read permissions: What data can be accessed
  • Write permissions: What can be modified
  • Network permissions: Allowed external connections
  • Execution permissions: What commands can be run
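One lightweight way to model these scopes in Python is a Flag enum, so a tool's grant is a bitwise combination of permissions. The tool names and grants below are illustrative:

```python
from enum import Flag, auto

class Permission(Flag):
    READ = auto()
    WRITE = auto()
    NETWORK = auto()
    EXECUTE = auto()

# Hypothetical per-tool grants for illustration
GRANTS = {
    "web_search": Permission.READ | Permission.NETWORK,
    "file_editor": Permission.READ | Permission.WRITE,
}

def check_permission(tool: str, needed: Permission) -> bool:
    """True only if every requested permission bit is granted to the tool."""
    granted = GRANTS.get(tool, Permission(0))
    return (granted & needed) == needed
```

Because grants compose bitwise, a single check covers compound requests such as READ | NETWORK, and an unknown tool defaults to no permissions at all.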

Implementation Checklist

Before Deployment

  • Input validation on all external data
  • Rate limiting and resource bounds
  • Comprehensive logging and tracing
  • Error handling and graceful degradation
  • Security policy enforcement

Monitoring Setup

  • Real-time alerting on security events
  • Performance dashboards
  • Audit log retention and analysis
  • Anomaly detection rules
  • Incident response procedures

Testing Strategy

  • Security penetration testing
  • Chaos engineering for failure modes
  • Load testing under constraints
  • Privacy and data handling validation
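For the penetration-testing item, a handful of SSRF-style probes make a useful smoke test. The minimal allowlist validator here restates the earlier URL example so the snippet is self-contained:

```python
from urllib.parse import urlparse

ALLOWED = {"example.com", "api.trusted-service.com"}

def validate_url(url: str) -> bool:
    # Minimal restatement of the earlier allowlist check
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    return parsed.hostname in ALLOWED

# Classic SSRF probes that must all be rejected
ssrf_cases = [
    "http://localhost/admin",
    "http://127.0.0.1:8080/",
    "http://169.254.169.254/latest/meta-data/",  # cloud metadata endpoint
    "file:///etc/passwd",
    "http://example.com.evil.org/",
]

for url in ssrf_cases:
    assert not validate_url(url), url
assert validate_url("https://example.com/ok")
```

Running cases like these in CI turns the checklist item into a regression gate rather than a one-off audit.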

Next Steps

Start with basic logging and input validation, then gradually add more sophisticated monitoring and security controls. Focus on:

  1. Immediate fixes: Address known vulnerabilities
  2. Observability foundation: Implement structured logging
  3. Security boundaries: Define and enforce agent limits
  4. Continuous improvement: Regular security reviews and updates

Building secure, observable AI agents requires thinking beyond functionality to operational safety. The incidents we've seen are early warnings—address these gaps now before they become critical vulnerabilities.