
Performance¶

Cello is built for speed. Its Rust core handles TCP, HTTP parsing, routing, JSON serialization, and middleware entirely outside of Python. This guide explains why Cello is fast and how to keep your application running at peak performance.


Architecture Advantages¶

Rust Owns the Hot Path¶

Request --> Rust HTTP Engine --> Python Handler --> Rust Response
                |                     |
                +- SIMD JSON          +- Return dict or Response
                +- Radix routing      +- Python business logic only
                +- Middleware (Rust)

Every request touches Python only for business logic. Everything else runs in compiled Rust:

| Component | Implementation | Benefit |
|---|---|---|
| TCP accept loop | Tokio (Rust) | Zero-copy, epoll/kqueue |
| HTTP parsing | Hyper (Rust) | Streaming, zero-alloc |
| Routing | matchit radix tree | Lookup cost scales with path length, not route count |
| JSON serialization | simd-json (Rust) | SIMD-accelerated, 5-10x faster |
| Middleware | Rust trait chain | No Python overhead per request |
| Response building | Rust | Direct byte assembly |

Key Numbers¶

Cello's benchmark figures on typical 8-core hardware:

  • 170,000+ requests/sec for simple JSON endpoints (4 workers, wrk 12t/400c)
  • 1.9x faster than BlackSheep+Granian, 3.1x faster than FastAPI+Granian
  • Sub-millisecond routing and JSON serialization
  • 50-70% lower memory than equivalent Python frameworks

Benchmarking¶

Use a tool like wrk, hey, or oha to measure throughput.

# Install wrk (Ubuntu)
sudo apt install wrk

# Benchmark a simple endpoint
wrk -t4 -c100 -d30s http://127.0.0.1:8000/

# With more detail
wrk -t4 -c100 -d30s --latency http://127.0.0.1:8000/

Benchmark Tips¶

  • Always run with --env production and --workers $(nproc).
  • Disable logging during benchmarks (--no-logs).
  • Run the benchmark tool on a separate machine to avoid resource contention.
  • Warm up the server with a few hundred requests before measuring.
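The warm-up step can be scripted with the standard library alone. The sketch below is an illustration, not part of Cello: the helper name `warm_up`, the default of 300 requests, and the example URL are assumptions chosen for this example.

```python
import urllib.request


def warm_up(url: str, requests: int = 300, timeout: float = 5.0) -> int:
    """Send sequential GET requests to `url` and return how many got a 2xx reply.

    Hypothetical helper: run it once before starting wrk, hey, or oha so that
    caches, connection pools, and lazy initialisation are out of the way.
    """
    succeeded = 0
    for _ in range(requests):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                response.read()  # drain the body so the exchange completes
                if 200 <= response.status < 300:
                    succeeded += 1
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses;
            # a failed warm-up request is simply not counted.
            pass
    return succeeded


# Example (assumes a server is listening on this address):
# warm_up("http://127.0.0.1:8000/")
```

If the returned count is well below the number of requests sent, the server is probably not ready and the benchmark numbers that follow would be misleading.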

Profiling¶

Python Profiling¶

To find bottlenecks in your handler code:

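import cProfile

@app.get("/debug/profile")
def profiled_endpoint(request):
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        # Replace with the code you want to measure
        result = your_business_logic()
    finally:
        # Stop profiling even if the business logic raises
        profiler.disable()

    # Print the report to stdout, slowest call paths (cumulative time) first
    profiler.print_stats(sort="cumtime")
    return result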
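Printing to stdout can be awkward when several workers share a terminal. As a possible variation, the sketch below captures the report as a string using `pstats` and `io.StringIO`, so it can be logged or returned. The helper name `profile_call` and its `top` parameter are assumptions for illustration, not Cello APIs.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top` most
    expensive entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Inside a handler, this would be called as `result, report = profile_call(your_business_logic)`, with `report` sent to whatever logger the application already uses.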

Rust-side Metrics¶

Enable Prometheus metrics to measure request latency at the framework level:

app.enable_prometheus(endpoint="/metrics")

Check the histogram cello_http_request_duration_seconds to see how request latency is distributed and to spot slow outliers.
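Prometheus histograms expose cumulative bucket counts, so a percentile has to be estimated from them. As a rough sketch of that estimate (the same linear-interpolation idea that PromQL's `histogram_quantile` uses), the function below works on made-up bucket bounds and counts. The bucket layout Cello actually exports is not stated here and may differ.

```python
import math

# Hypothetical sample: (upper bound in seconds, cumulative request count)
EXAMPLE_BUCKETS = [(0.005, 50), (0.01, 90), (0.025, 100), (math.inf, 100)]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` must be sorted by upper bound; the last entry is the +Inf bucket.
    Observations are assumed to be spread evenly inside each bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan

    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            in_bucket = cumulative - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # No finite upper edge (or nothing in the bucket) to interpolate
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound
```

With the sample data, `estimate_quantile(0.95, EXAMPLE_BUCKETS)` lands halfway through the 10 ms to 25 ms bucket, because 95 of 100 requests is halfway between the cumulative counts 90 and 100.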


Optimization Tips¶

1. Return Dicts Instead of Response Objects¶

Returning a plain dict lets Cello serialize JSON entirely in Rust using SIMD instructions. Creating a Response object adds a Python allocation.

# Fast -- Rust handles serialization
@app.get("/users")
def list_users(request):
    return {"users": get_users()}

# Slower -- Python creates the Response object first
@app.get("/users")
def list_users(request):
    return Response.json({"users": get_users()})

Only use Response.json() when you need a custom status code or additional headers.

2. Use Path Parameters Over Query Parameters¶

Path parameters are extracted during routing in the Rust radix tree, as part of a lookup that happens on every request anyway. Query parameters have to be parsed out of the URL's query string separately on each request.

# Faster -- resolved during routing
@app.get("/users/{id}")
def get_user(request):
    return find_user(request.params["id"])

# Slower -- parsed at request time
@app.get("/users")
def get_user(request):
    return find_user(request.query["id"])

3. Enable Compression¶

For responses larger than 1 KB, gzip compression reduces transfer size and improves perceived latency for clients.

app.enable_compression(min_size=1024)
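To see why a size threshold is useful, here is a minimal sketch of the decision in plain Python using the standard `gzip` module. It is not Cello's implementation (which runs in Rust); the function name `maybe_compress` and its return shape are assumptions for illustration.

```python
import gzip


def maybe_compress(body: bytes, min_size: int = 1024):
    """Return (payload, content_encoding) for a response body.

    Bodies below `min_size` are sent as-is: gzip adds a fixed header and
    trailer, so very small payloads gain little or even grow.
    """
    if len(body) < min_size:
        return body, None

    compressed = gzip.compress(body)
    if len(compressed) >= len(body):
        # Already-compressed or random data: keep the original bytes
        return body, None
    return compressed, "gzip"
```

Repetitive JSON (many objects sharing the same keys) typically shrinks substantially, while images and other pre-compressed formats usually do not benefit.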

4. Use Lazy Body Parsing¶

Cello parses request bodies lazily. If your handler does not call request.json() or request.body(), the body is never read from the socket. Design read-only endpoints to avoid parsing the body.
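The idea behind lazy parsing can be sketched in a few lines of Python. The `LazyRequest` class below is a hypothetical stand-in, not Cello's request type: it only shows that parsing is deferred until the first `json()` call and that the result is then reused.

```python
import json

_NOT_PARSED = object()  # sentinel: distinguishes "not parsed yet" from a parsed None


class LazyRequest:
    """Illustrative request object that parses its JSON body on demand."""

    def __init__(self, raw_body: bytes):
        self._raw_body = raw_body
        self._parsed = _NOT_PARSED
        self.parse_count = 0  # exposed only so the laziness is observable

    def json(self):
        if self._parsed is _NOT_PARSED:
            self.parse_count += 1
            self._parsed = json.loads(self._raw_body)
        return self._parsed


def health(request):
    # Read-only handler: never touches the body, so nothing is parsed
    return {"status": "ok"}
```

A handler like `health` leaves `parse_count` at zero no matter how large the incoming body is, which is the behaviour read-only endpoints should aim for.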

5. Cache Expensive Responses¶

Use the @cache decorator for endpoints that return data that changes infrequently.

from cello import cache

@app.get("/reports/summary")
@cache(ttl=300, tags=["reports"])
def summary(request):
    return compute_expensive_report()
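For readers unfamiliar with TTL caching, the following standard-library sketch shows what a `ttl` of 300 seconds means in practice. It is an illustration only: `ttl_cache` is a hypothetical decorator, it does not model tags or invalidation, and it says nothing about how Cello's `@cache` is implemented.

```python
import functools
import time


def ttl_cache(ttl, clock=time.monotonic):
    """Cache a function's results per argument tuple for `ttl` seconds.

    `clock` is injectable so the expiry logic can be exercised without sleeping.
    """
    def decorator(func):
        entries = {}  # args -> (stored_at, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and now - hit[0] < ttl:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            entries[args] = (now, value)
            return value

        return wrapper

    return decorator
```

With `ttl=300`, the wrapped function runs at most once every five minutes per distinct argument tuple; every other call is served from memory.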

6. Avoid Blocking Calls in Async Handlers¶

Never use synchronous I/O inside an async def handler. This blocks the Tokio runtime thread.

# Bad -- blocks the event loop
@app.get("/data")
async def get_data(request):
    import time
    time.sleep(1)  # DO NOT do this
    return {"data": "value"}

# Good -- use async I/O
@app.get("/data")
async def get_data(request):
    import asyncio
    await asyncio.sleep(1)
    return {"data": "value"}
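When only a synchronous library is available, one common workaround is to push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below assumes the handler runs on a standard asyncio event loop; `blocking_lookup` is a hypothetical stand-in for a sync database or HTTP client call.

```python
import asyncio
import time


def blocking_lookup(key):
    """Stand-in for a synchronous client call that cannot be awaited."""
    time.sleep(0.05)
    return {"key": key}


async def fetch_without_blocking(key):
    # The blocking call runs in the default thread pool; the event loop
    # stays free to serve other requests while it waits.
    return await asyncio.to_thread(blocking_lookup, key)


async def run_concurrently(keys):
    """Issue several lookups at once and return their results in order."""
    return await asyncio.gather(*(fetch_without_blocking(k) for k in keys))
```

In a handler, the pattern is `data = await asyncio.to_thread(sync_call, arg)`. A native async client is still preferable where one exists, since threads add their own overhead.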

Connection Pooling¶

For database-heavy applications, use connection pooling to avoid the overhead of creating a new connection per request.

from cello import App, DatabaseConfig

app = App()
app.enable_database(DatabaseConfig(
    url="postgresql://user:pass@localhost/mydb",
    pool_size=20,
    max_lifetime_secs=1800,
))

The pool is managed in Rust and shared across all worker threads.
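To make the benefit concrete, here is a minimal sketch of what a pool does, written with the standard library. It is not Cello's pool: `SimplePool` and the `factory` argument are hypothetical, and features such as `max_lifetime_secs` are deliberately left out.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool: connections are created once and then reused."""

    def __init__(self, factory, pool_size):
        self._idle = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._idle.put(factory())  # pay the connection cost up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) when every connection is in use
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next request
```

Used as `with pool.connection() as conn: ...`, each request borrows an existing connection instead of opening a new one, and `pool_size` caps how many the database ever sees.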


Cluster Mode¶

For multi-process scaling, enable cluster mode to fork multiple processes, each with its own set of worker threads.

from cello import ClusterConfig

app.run(
    host="0.0.0.0",
    port=8000,
    workers=4,
    cluster=ClusterConfig(processes=4),
)

This creates 4 processes x 4 threads = 16 concurrent execution contexts.


Performance Checklist¶

| Area | Action |
|---|---|
| Handlers | Return dict instead of Response when possible |
| Routing | Prefer path parameters over query parameters |
| Compression | Enable for responses > 1 KB |
| Caching | Use @cache for read-heavy endpoints |
| Async | Use async def for I/O-bound handlers |
| Blocking | Never use time.sleep() or sync HTTP in async handlers |
| Workers | Set to CPU count (--workers $(nproc)) |
| Logging | Disable request logging in production benchmarks |
| Connection pool | Use database connection pooling |
| Monitoring | Enable Prometheus metrics to detect regressions |

Next Steps¶