Cello v1.0.1 -- Production-Ready Release

Release Date: February 24, 2026
License: MIT
Python: 3.12+
Codebase: 32,000+ lines of Rust, 6,000+ lines of Python
Tests: 394 passing


Overview

Cello v1.0.1 is the culmination of ten iterative pre-release versions, marking the framework's transition from beta to a production-ready, enterprise-grade web framework. With this release, Cello's entire public API is frozen under Semantic Versioning -- no breaking changes will be introduced until v2.0. Developers can depend on Cello for production workloads with full confidence in API stability.

At its core, Cello is built on a simple but powerful principle: Rust owns the hot path, Python owns the developer experience. Every TCP accept, HTTP parse, route lookup, middleware execution, JSON serialization, and response assembly happens in Rust. Python is responsible only for route registration, handler function pointers, and business logic. The result is C-level request throughput with the ergonomics of a modern Python web framework.

Benchmarks show Cello sustaining 170,000+ requests per second with 4 workers (5 processes), measured with wrk (12 threads, 400 connections, 10s): roughly 1.9x faster than BlackSheep+Granian, 3.1x faster than FastAPI+Granian, and 5.9x faster than Robyn. This performance comes not from a single optimization but from a holistic approach: a single-threaded Tokio runtime per worker (zero GIL contention), SIMD-accelerated JSON, zero-copy radix tree routing, arena allocators, handler metadata caching, lazy parsing, multi-process SO_REUSEPORT workers with os.fork(), an ultra-fast response bypass, and an aggressively optimized release build profile. Every step on the hot path has been scrutinized, and avoidable overhead has been removed wherever possible.


Performance

Benchmark Results

| Framework | Requests/sec | Avg Latency | p99 Latency | Relative |
|---|---|---|---|---|
| Cello v1.0.1 | 170,000+ | 2.8ms | 15ms | 1.0x (baseline) |
| BlackSheep + Granian | ~92,000 | 4.3ms | 13ms | ~1.9x slower |
| FastAPI + Granian | ~55,000 | 7.1ms | 17ms | ~3.1x slower |
| Robyn | ~29,000 | 14.2ms | 38ms | ~5.9x slower |

All benchmarks: JSON serialization (GET / returning {"message": "Hello, World!"}), 4 workers (5 processes in total), wrk (12 threads, 400 connections, 10s). Results vary by hardware and workload.

What Makes Cello Fast

SIMD JSON (simd-json 0.13): JSON parsing and serialization use CPU SIMD instructions (SSE4.2, AVX2) for hardware-accelerated processing, achieving up to 10x the throughput of Python's built-in json module. On architectures without these instruction sets (e.g., ARM), Cello automatically falls back to serde_json for compatible JSON handling.

Zero-Copy Radix Tree Routing (matchit 0.7): Route matching uses a radix tree whose lookup cost grows with the length of the request path rather than with the number of registered routes, and it performs zero heap allocations per request. Path parameters are extracted without copying strings.
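
To make the routing idea concrete, here is a minimal Python sketch of a segment-based trie router. It is an illustration only, not matchit's implementation: a real radix tree compresses shared prefixes and avoids copying strings, whereas this sketch splits the path and stores parameters in a dict. The names Node and Router are hypothetical.

```python
from typing import Callable, Optional


class Node:
    """One path segment in the routing trie (illustrative, not matchit)."""

    def __init__(self) -> None:
        self.static: dict[str, "Node"] = {}   # literal segment -> child node
        self.param: Optional["Node"] = None   # child used for a {name} segment
        self.param_name: str = ""             # name captured by that child
        self.handler: Optional[Callable] = None


class Router:
    def __init__(self) -> None:
        self.root = Node()

    def add(self, path: str, handler: Callable) -> None:
        node = self.root
        for seg in filter(None, path.split("/")):
            if seg.startswith("{") and seg.endswith("}"):
                # Parameter segment: one shared child per position.
                if node.param is None:
                    node.param = Node()
                    node.param_name = seg[1:-1]
                node = node.param
            else:
                node = node.static.setdefault(seg, Node())
        node.handler = handler

    def match(self, path: str):
        """Return (handler, params) or None; work is one step per segment."""
        node = self.root
        params: dict[str, str] = {}
        for seg in filter(None, path.split("/")):
            child = node.static.get(seg)      # literal segments win
            if child is None:
                if node.param is None:
                    return None
                params[node.param_name] = seg
                child = node.param
            node = child
        if node.handler is None:
            return None
        return node.handler, params
```

The lookup walks one node per path segment, so its cost does not depend on how many routes are registered.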

Arena Allocators (bumpalo 3): Per-request arena allocation groups all transient allocations into a single contiguous memory region that is freed in one operation, eliminating heap fragmentation and reducing allocator pressure.

Handler Metadata Caching: Whether a handler is async def or def is determined once at first invocation and cached. Dependency injection parameter introspection is similarly computed once and reused, eliminating per-request inspect.iscoroutine() and inspect.signature() calls.
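
The caching idea can be sketched in plain Python. This is an assumption-laden illustration, not Cello's Rust code: HandlerMeta, handler_meta, and call_handler are hypothetical names, and the sketch classifies handlers with inspect.iscoroutinefunction() rather than inspecting a returned coroutine.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class HandlerMeta:
    is_async: bool                 # True for "async def" handlers
    param_names: tuple[str, ...]   # parameter names, e.g. for DI lookups


# One entry per handler function; filled on first use, then reused.
_META_CACHE: dict[Callable[..., Any], HandlerMeta] = {}


def handler_meta(handler: Callable[..., Any]) -> HandlerMeta:
    meta = _META_CACHE.get(handler)
    if meta is None:
        # The introspection cost is paid once per handler, not once per request.
        meta = HandlerMeta(
            is_async=inspect.iscoroutinefunction(handler),
            param_names=tuple(inspect.signature(handler).parameters),
        )
        _META_CACHE[handler] = meta
    return meta


def call_handler(handler: Callable[..., Any], request: Any) -> Any:
    meta = handler_meta(handler)
    if meta.is_async:
        # A real server would await on its own event loop; asyncio.run
        # keeps this sketch self-contained.
        return asyncio.run(handler(request))
    return handler(request)
```

After the first call, every later request for the same handler is a single dictionary lookup.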

Lazy Parsing: Query string decoding is skipped when the query string is empty. Body reading is skipped entirely for GET, HEAD, OPTIONS, and DELETE requests. Headers are pre-allocated with HashMap::with_capacity().
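
As a rough Python analogue of these rules (Cello implements them in Rust, so the class below is purely illustrative), a request object can defer query decoding and body reading until they are needed. LazyRequest and its read_body callback are hypothetical names.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs

# Methods for which the body is never read, as listed above.
BODYLESS_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "DELETE"})


class LazyRequest:
    def __init__(self, method: str, raw_query: str,
                 read_body: Callable[[], bytes]) -> None:
        self.method = method.upper()
        self._raw_query = raw_query
        self._read_body = read_body          # called at most once
        self._query: Optional[dict] = None
        self._body: Optional[bytes] = None

    @property
    def query(self) -> dict:
        if self._query is None:
            # An empty query string skips the decoder entirely.
            self._query = parse_qs(self._raw_query) if self._raw_query else {}
        return self._query

    @property
    def body(self) -> bytes:
        if self._body is None:
            if self.method in BODYLESS_METHODS:
                self._body = b""             # never touch the socket
            else:
                self._body = self._read_body()
        return self._body
```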

Fast-Path Skipping: Empty middleware chains, empty guard lists, and empty lifecycle hook lists short-circuit immediately without any overhead -- no lock acquisition, no GIL interaction, no function calls.

Atomic Operations: The DI container uses AtomicBool instead of RwLock for the singleton existence check, eliminating lock contention on every request.

Single-Threaded Tokio Runtime (Zero GIL Contention): Each worker process runs a single-threaded Tokio event loop instead of a multi-threaded runtime. This eliminates GIL contention entirely -- no Tokio threads compete for Python::with_gil. Parallelism comes from multi-process os.fork() + SO_REUSEPORT, not from multi-threading. This is the single largest performance optimization in Cello.

Multi-Process Workers (SO_REUSEPORT): On Linux, Cello uses os.fork() to spawn N+1 worker processes (N children + parent), each binding to the same port via SO_REUSEPORT. The Linux kernel distributes connections across workers, bypassing the Python GIL bottleneck and achieving near-linear scaling with core count. On Windows, Cello uses subprocess re-execution with the CELLO_WORKER=1 environment variable, since os.fork() and SO_REUSEPORT are not available.
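
The pre-fork pattern described above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not Cello's worker code: it targets Linux, the function names are hypothetical, and the "server" replies with a fixed response instead of parsing HTTP.

```python
import os
import socket


def reuseport_listener(host: str, port: int) -> socket.socket:
    """Create a listening socket that shares its port with sibling processes."""
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def serve_forever(sock: socket.socket) -> None:
    """Answer every connection with the PID of the process that accepted it."""
    while True:
        conn, _ = sock.accept()
        with conn:
            conn.recv(65536)  # read and ignore the request
            body = f"pid {os.getpid()}\n".encode()
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n"
                b"Connection: close\r\n\r\n%s" % (len(body), body)
            )


def run(host: str = "127.0.0.1", port: int = 8000, workers: int = 4) -> None:
    """Fork `workers` children; the parent also serves (workers + 1 processes)."""
    for _ in range(workers):
        if os.fork() == 0:
            # Child: open its own socket on the shared port.
            serve_forever(reuseport_listener(host, port))
            os._exit(0)
    serve_forever(reuseport_listener(host, port))
```

Calling run() starts five processes on one port; repeated requests should then report different PIDs as the kernel spreads connections across the listeners.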

Ultra-Fast Response Bypass: For the most common case (handler returns a dict, no middleware), Cello bypasses the intermediate Response struct entirely and builds the hyper response directly -- eliminating HashMap allocation, header copying, and extra body copies.

Direct Python-to-JSON Bytes Serialization: Handler return values (dicts, lists, primitives) are serialized directly to JSON bytes in a single pass, skipping the intermediate serde_json::Value tree allocation entirely.

Network Optimizations: TCP_NODELAY disables Nagle's algorithm for lower latency. HTTP/1.1 keep-alive and pipelining are enabled by default.

Metrics Optimization: Latency tracking uses sampled VecDeque recording (every 64th request) to avoid write lock contention under high load.
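
A sampled ring buffer of this kind might look as follows in Python, where collections.deque plays the role of Rust's VecDeque. The class name, default capacity, and percentile method are assumptions for illustration, and locking is omitted for brevity.

```python
from collections import deque


class SampledLatency:
    """Keep a bounded window of latency samples, recording every Nth request."""

    def __init__(self, capacity: int = 1024, sample_every: int = 64) -> None:
        # deque(maxlen=...) drops the oldest sample in O(1) when full,
        # unlike removing index 0 from a list, which shifts every element.
        self._samples: deque[float] = deque(maxlen=capacity)
        self._sample_every = sample_every
        self._seen = 0

    def observe(self, latency_ms: float) -> None:
        self._seen += 1
        if self._seen % self._sample_every == 0:
            self._samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        """Nearest-rank style percentile over the current window."""
        if not self._samples:
            return 0.0
        ordered = sorted(self._samples)
        index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
        return ordered[index]
```

With sample_every=64, only one request in 64 touches the shared buffer, which is what keeps write contention low under load.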

Build Profile: The release binary is compiled with full link-time optimization (lto = "fat"), single codegen unit, panic = "abort" (no unwinding overhead), stripped debug symbols, and disabled integer overflow checks.


What's New Since v0.10.0

v1.0.1 is primarily a performance, stability, and polish release on top of the feature-complete v0.10.0. The following changes were introduced since v0.10.0:

Added

  • Handler metadata caching (async detection and DI parameter introspection cached per handler)
  • Lazy query parsing and body reading for bodyless HTTP methods
  • Pre-allocated headers HashMap with known capacity
  • Fast-path skip for empty middleware chains, guards, and lifecycle hooks
  • Atomic has_py_singletons check (replaces RwLock with AtomicBool)
  • TCP_NODELAY on accepted connections
  • HTTP/1.1 keep-alive and pipeline flush
  • VecDeque ring buffer for O(1) latency tracking
  • Zero-copy response body building
  • Thread-local cached regex in OpenAPI generation
  • Optimized release profile: lto = "fat", panic = "abort", strip = true, overflow-checks = false
  • Semantic versioning commitment: all public APIs are frozen

Fixed

  • Handler introspection overhead (per-request inspect module import eliminated)
  • O(n) latency tracking with Vec::remove(0) (replaced with VecDeque)
  • Async middleware chain cloning entire middleware vector per request
  • DI container acquiring RwLock on every request for singleton existence check
  • Lifecycle hooks acquiring GIL even when no hooks are registered
  • println! on circuit breaker state transitions (replaced with tracing::warn!)
  • OpenAPI regex recompilation on every path parameter extraction

Cross-Platform Fixes (v1.0.1)

  • Windows multi-worker: Replaced broken multiprocessing.Process with subprocess re-execution (CELLO_WORKER=1 environment variable) for reliable multi-worker operation on Windows
  • Windows signal handling: Wrapped SIGTERM registration in try/except with platform validation, since SIGTERM is not available on Windows
  • Windows static files: Fixed UNC path normalization to correctly resolve static file paths on Windows filesystems
  • Linux-only CPU affinity: Gated os.sched_setaffinity() calls behind platform detection with a warning on non-Linux platforms where CPU affinity is not supported
  • ARM JSON fallback: Added serde_json fallback for JSON serialization on non-SIMD architectures (e.g., ARM) where simd-json is not available
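
The platform gating described in this list can be sketched in Python as feature detection plus a guarded call. The helper names pin_to_cpu and register_sigterm are hypothetical and are not Cello's internal functions.

```python
import os
import signal
import warnings


def pin_to_cpu(cpu: int) -> bool:
    """Pin the current process to one CPU where the platform supports it."""
    if not hasattr(os, "sched_setaffinity"):
        # os.sched_setaffinity only exists on some Unix platforms (e.g. Linux).
        warnings.warn("CPU affinity is not supported on this platform; skipping")
        return False
    if cpu not in os.sched_getaffinity(0):
        return False  # this CPU is not available to the process
    os.sched_setaffinity(0, {cpu})
    return True


def register_sigterm(handler) -> bool:
    """Install a SIGTERM handler, reporting failure instead of raising."""
    sigterm = getattr(signal, "SIGTERM", None)
    if sigterm is None:
        return False
    try:
        signal.signal(sigterm, handler)
    except (ValueError, OSError):
        # Raised off the main thread or when the platform rejects the signal.
        return False
    return True
```

Returning a boolean lets the caller log a warning and continue rather than crash at startup on an unsupported platform.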

Compatibility Fixes (v1.0.1)

  • Async handler validation: wrap_handler_with_validation now correctly detects and supports async def handlers in addition to synchronous handlers
  • Async guard wrappers: _apply_guards creates async or sync wrappers based on the handler type, preventing coroutine-related errors when guards are applied to async handlers
  • Async cache decorator: The cache() decorator now supports async def handlers, properly awaiting the wrapped function
  • Blueprint validation and guards: Blueprint route decorators (@bp.get, @bp.post, etc.) now accept a guards parameter and support DTO validation, matching the App route decorator API
  • Guards exported in __all__: RoleGuard, PermissionGuard, Authenticated, And, Or, Not, GuardError, ForbiddenError, and UnauthorizedError are now included in cello.__all__ for proper public API exposure
  • Database exports in __all__: Database, Redis, and Transaction are now included in cello.__all__
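
The async-aware wrapping mentioned for guards and the cache decorator follows a common Python pattern: choose an async or sync wrapper based on the handler's type. The sketch below shows that pattern in isolation; apply_guards and Forbidden are hypothetical stand-ins, not Cello's _apply_guards or ForbiddenError.

```python
import asyncio
import functools
import inspect


class Forbidden(Exception):
    """Raised when a guard rejects a request (hypothetical name)."""


def apply_guards(handler, guards):
    """Wrap handler so every guard runs first, preserving sync/async-ness."""

    def check(request):
        for guard in guards:
            if not guard(request):
                raise Forbidden(getattr(guard, "__name__", "guard"))

    if inspect.iscoroutinefunction(handler):
        @functools.wraps(handler)
        async def async_wrapper(request):
            check(request)
            # Await here; returning the bare coroutine from a sync wrapper
            # is the kind of bug this pattern avoids.
            return await handler(request)
        return async_wrapper

    @functools.wraps(handler)
    def sync_wrapper(request):
        check(request)
        return handler(request)
    return sync_wrapper
```

Because the wrapper of an async handler is itself a coroutine function, code that later inspects the handler still classifies it correctly.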

Security Improvements

Cello v1.0.1 includes comprehensive security hardening across the framework:

| Area | Protection | Implementation |
|---|---|---|
| Static files | Path traversal prevention | Validates all paths against ../ and encoded variants |
| HTTP responses | CRLF header injection protection | Response header values are validated before writing |
| CORS | Specification compliance | Strict preflight handling, wildcard restrictions, origin validation |
| Token validation | Timing attack prevention | Constant-time comparison via subtle crate for JWT, API key, and session tokens |
| CSRF | Cryptographic tokens | HMAC-SHA256 token generation and verification |
| Sessions | Secure cookie defaults | HttpOnly, Secure, SameSite=Lax enabled by default |
| Headers | Security header suite | CSP, HSTS, X-Frame-Options, X-Content-Type-Options, X-XSS-Protection |
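
Two of the techniques listed above, HMAC-SHA256 CSRF tokens and constant-time comparison, can be illustrated with Python's standard library. This sketch shows the general approach only; the token layout (nonce.mac) and the function names are assumptions, and Cello's Rust implementation uses the subtle crate rather than hmac.compare_digest.

```python
import hashlib
import hmac
import secrets


def _sign(secret: bytes, session_id: str, nonce: str) -> str:
    message = f"{session_id}:{nonce}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_csrf_token(secret: bytes, session_id: str) -> str:
    """Return 'nonce.mac', where mac binds the nonce to the session."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_sign(secret, session_id, nonce)}"


def verify_csrf_token(secret: bytes, session_id: str, token: str) -> bool:
    nonce, sep, mac = token.partition(".")
    if not sep:
        return False  # malformed token
    expected = _sign(secret, session_id, nonce)
    # compare_digest takes time independent of where the inputs differ,
    # which is what prevents timing attacks on the comparison.
    return hmac.compare_digest(mac.encode(), expected.encode())
```

A token minted for one session fails verification for any other session, and a tampered token fails without revealing how many leading characters matched.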

Breaking Changes

None. All existing v0.10.0 code works without modification on v1.0.1. This release is fully backwards compatible.


Migration Guide

From v0.10.x

  1. Update your dependency:

    pip install --upgrade cello-framework
    

  2. No code changes required. All existing APIs work as before.

  3. Verify your application:

    import cello
    assert cello.__version__ == "1.0.1"
    

  4. (Optional) Review the performance improvements section to understand what changed in the Rust runtime. These improvements are automatic and require no code changes.

From v0.9.x or Earlier

If you are upgrading from v0.9.x or earlier, first upgrade to v0.10.0 and verify your application works, then upgrade to v1.0.1. See the Migration Guide for version-specific upgrade instructions.

From Other Frameworks

From FastAPI

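# FastAPI
# get_db and Item stand for your application's own session dependency and ORM model.
# Session is assumed here to be SQLAlchemy's session class.
from fastapi import FastAPI, Depends
from sqlalchemy.orm import Session
app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int, db: Session = Depends(get_db)):
    return db.query(Item).filter(Item.id == item_id).first()

# Cello (nearly identical)
from cello import App, Depends
app = App()

@app.get("/items/{item_id}")
async def read_item(request, db=Depends(get_db)):
    # Read the path parameter from the request and convert it explicitly.
    item_id = int(request.params["item_id"])
    return db.query(Item).filter(Item.id == item_id).first()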

From Django

# Django
from django.http import JsonResponse
def my_view(request, pk):
    obj = MyModel.objects.get(pk=pk)
    return JsonResponse({"id": obj.id, "name": obj.name})

# Cello (app is a cello.App() instance, created as in the FastAPI example)
@app.get("/items/{pk}")
def my_view(request):
    pk = request.params["pk"]
    obj = MyModel.objects.get(pk=pk)
    return {"id": obj.id, "name": obj.name}

Full Feature List

Cello v1.0.1 ships with every feature developed across the entire pre-release series:

Core (v0.1.0 -- v0.3.0)

  • HTTP server built on Tokio + Hyper (Rust async runtime)
  • PyO3 Python bindings with abi3 for cross-version compatibility
  • Decorator-based routing: @app.get, @app.post, @app.put, @app.delete, @app.patch, @app.head, @app.options
  • Path parameters and query parameters
  • JSON, Text, HTML, Redirect, Binary, Streaming, XML, and NoContent response types
  • SIMD-accelerated JSON parsing and serialization
  • Blueprint route grouping with shared prefixes and middleware
  • WebSocket support (full-duplex, message handling)
  • Server-Sent Events (SSE) for real-time streaming
  • Multipart form handling with streaming file uploads
  • Async handler support

Authentication & Security (v0.4.0)

  • JWT authentication with configurable algorithms
  • Basic authentication and API key authentication
  • Token bucket and sliding window rate limiting
  • Secure cookie-based session management
  • Security headers: CSP, HSTS, X-Frame-Options, X-Content-Type-Options
  • CSRF protection with cryptographic tokens
  • ETag caching and conditional requests
  • Request body size limits
  • Static file serving with path traversal prevention
  • Request ID tracing (UUID)
  • Cluster mode with pre-fork process management
  • TLS via rustls (no OpenSSL dependency)
  • HTTP/2 support
  • HTTP/3 (QUIC) support

Architecture & DX (v0.5.0)

  • Dependency injection (Depends()) with singleton support
  • RBAC guards: RoleGuard, PermissionGuard, And/Or combinators
  • Prometheus metrics endpoint
  • OpenAPI/Swagger auto-generation
  • Background tasks
  • Jinja2 template engine integration
  • RFC 7807 Problem Details error responses

Intelligent Middleware (v0.6.0)

  • Smart caching with TTL and tag-based invalidation (@cache decorator)
  • Adaptive rate limiting (adjusts limits based on CPU, memory, and latency)
  • DTO validation with Pydantic integration
  • Circuit breaker for fault tolerance
  • Lifecycle hooks (@app.on_event("startup"), @app.on_event("shutdown"))

Observability (v0.7.0)

  • OpenTelemetry distributed tracing with OTLP export
  • Kubernetes-compatible health checks (liveness, readiness, startup)
  • Structured JSON logging with trace context injection
  • Dependency health monitoring

Data Layer (v0.8.0)

  • Async PostgreSQL connection pooling
  • Redis client with Pub/Sub, clustering, and Sentinel
  • Database transactions with automatic rollback and nested savepoints

API Protocols (v0.9.0)

  • GraphQL engine with queries, mutations, subscriptions, and DataLoader
  • gRPC support with bidirectional streaming, gRPC-Web, and reflection
  • Kafka consumer/producer with consumer group management
  • RabbitMQ integration
  • AWS SQS adapter with LocalStack support

Enterprise Patterns (v0.10.0)

  • Event Sourcing with aggregate roots, event store, replay, and snapshots
  • CQRS with Command/Query buses and event-driven synchronization
  • Saga Pattern for distributed transaction coordination with compensation logic

Performance & Stability (v1.0.1)

  • Handler metadata caching
  • Lazy parsing and zero-copy responses
  • Atomic fast-path optimizations
  • Optimized release build profile
  • API frozen under Semantic Versioning
  • Cross-platform support: Windows multi-worker (subprocess), signal handling, UNC static file paths
  • ARM architecture support: serde_json fallback for non-SIMD platforms
  • Linux-only CPU affinity gated with platform detection
  • Async handler support in validation, guards, cache decorator, and blueprints
  • Guards and database classes exported in cello.__all__

Stability Guarantees

Starting with v1.0.1, Cello follows Semantic Versioning:

  • Patch releases (1.0.x): Bug fixes and security patches only. No API changes.
  • Minor releases (1.x.0): New features added in a backwards-compatible manner. Existing code continues to work.
  • Major releases (2.0.0+): Reserved for breaking changes, with a migration guide and deprecation period.

The following are considered part of the stable public API:

  • All classes and functions exported in cello.__all__
  • Route decorator signatures (@app.get, @app.post, etc.)
  • Request and Response object APIs
  • Middleware configuration classes
  • Blueprint API
  • Dependency injection (Depends)
  • Guard system (RoleGuard, PermissionGuard)
  • Lifecycle hooks
  • All configuration dataclasses (JwtConfig, RateLimitConfig, SessionConfig, etc.)

Installation

# Install the latest stable release
pip install cello-framework

# Or pin to v1.0.1
pip install cello-framework==1.0.1

# With optional features (quoted so shells such as zsh do not expand the brackets)
pip install "cello-framework[postgres]"   # PostgreSQL support
pip install "cello-framework[graphql]"    # GraphQL support
pip install "cello-framework[grpc]"       # gRPC support
pip install "cello-framework[full]"       # All optional features

Requirements

  • Python 3.12 or later
  • No Rust toolchain required at install time (wheels are pre-built)
  • For building from source: Rust stable toolchain + maturin

Quick Start

from cello import App

app = App()

@app.get("/")
def hello(request):
    return {"message": "Hello from Cello v1.0.1!"}

@app.get("/users/{id}")
def get_user(request):
    user_id = request.params["id"]
    return {"id": user_id, "name": "Alice"}

# Start the server only when this file is run as a script.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

Architecture

Request --> Rust HTTP Engine --> Python Handler --> Rust Response
                |                      |
                +-- SIMD JSON          +-- Return dict or Response
                +-- Radix routing      +-- Python business logic only
                +-- Middleware (Rust)

Rust owns (32,000+ lines):

  • TCP accept loop (Tokio)
  • HTTP parsing (Hyper)
  • Routing (matchit radix tree)
  • All 16 middleware modules
  • JSON serialization (simd-json)
  • Response building
  • WebSocket and SSE protocols
  • TLS, HTTP/2, HTTP/3

Python provides (6,000+ lines):

  • Route registration via decorators
  • Handler function definitions
  • Business logic
  • Configuration DSL


Technology Stack

| Component | Crate / Library | Version |
|---|---|---|
| Python Bindings | pyo3 | 0.20 (abi3-py312) |
| Async Runtime | tokio | 1.x (full features) |
| HTTP Server | hyper | 1.x |
| HTTP/2 | h2 | 0.4 |
| HTTP/3 (QUIC) | quinn | 0.10 |
| TLS | rustls | 0.22 |
| JSON | simd-json | 0.13 |
| Serialization | serde | 1.x |
| Routing | matchit | 0.7 |
| Concurrency | dashmap | 5.x |
| Memory | bumpalo | 3.x |
| JWT | jsonwebtoken | 9.x |
| Security | subtle | 2.x |
| Metrics | prometheus | 0.13 |
| WebSocket | tokio-tungstenite | 0.21 |
| Multipart | multer | 3.x |
| Tracing | opentelemetry | 0.21 |
| Compression | flate2 | 1.x |

Version History

| Version | Date | Highlights |
|---|---|---|
| v1.0.1 | 2026-02-24 | Cross-platform fixes (Windows, ARM), async compatibility fixes, export completeness |
| v1.0.0 | 2026-02-21 | Production-ready stable release, 170K+ req/s, 1.9x faster than BlackSheep + Granian, API stability |
| v0.10.0 | 2026-02 | Event Sourcing, CQRS, Saga Pattern |
| v0.9.0 | 2026-02 | GraphQL, gRPC, Kafka, RabbitMQ, SQS |
| v0.8.0 | 2026-02 | Database pooling, Redis, transactions |
| v0.7.0 | 2026-01 | OpenTelemetry, health checks, structured logging |
| v0.6.0 | 2025-12 | Smart caching, adaptive rate limiting, DTO validation, circuit breaker |
| v0.5.0 | 2025-10 | Dependency injection, RBAC guards, Prometheus, OpenAPI |
| v0.4.0 | 2025-08 | JWT auth, rate limiting, sessions, security headers, cluster mode |
| v0.3.0 | 2025-06 | WebSocket, SSE, multipart, blueprints |
| v0.2.0 | -- | Middleware system, CORS, logging, compression |
| v0.1.0 | -- | Initial release, core HTTP routing |

Contributors

Thanks to all contributors who helped bring Cello to its 1.0 milestone. Special recognition to:

  • Jagadeesh Katla -- project creator and lead maintainer
  • All community members who submitted issues, tested pre-release versions, and provided feedback

What's Next

Cello v1.0.1 is the foundation for future innovation. Planned features for the v1.1.0+ series include:

  • OAuth2/OIDC Provider: full OAuth2 server with PKCE, token introspection, and OpenID Connect
  • Service Mesh Integration: Istio/Envoy sidecar support with mTLS and service discovery
  • Admin Dashboard: real-time metrics visualization and request inspection UI
  • Multi-tenancy: tenant isolation, tenant-aware routing, and per-tenant configuration

See the Enterprise Roadmap for the full planned feature set.


Full Changelog

See the complete changelog for all changes across every release.