Running OR-Tools in Production: Complete Guide
Most optimization engineers love OR-Tools for modeling—it's free, powerful, and Google's CP-SAT solver dominates competition benchmarks. But when it's time to deploy OR-Tools in production, the infrastructure questions start: How do you handle memory leaks at scale? What's the right containerization strategy? Can you actually run optimization workloads serverless?
I've seen teams spend months figuring out OR-Tools production deployment the hard way. Some abandon OR-Tools entirely and switch to commercial solvers, not because of performance limitations, but because Gurobi comes with clearer deployment documentation. That's backwards—with proper infrastructure, OR-Tools production deployments can be more reliable and cost-effective than commercial alternatives.
Here's what actually works for running OR-Tools at scale.
Understanding OR-Tools Architecture for Production
OR-Tools isn't a single binary—it's a suite of optimization libraries with different memory profiles, performance characteristics, and scaling behaviors. Understanding the architecture is crucial for production deployment.
Core Components and Their Production Implications
CP-SAT Solver: The constraint programming solver that's been winning MiniZinc competitions. In production, CP-SAT tends to have predictable memory usage but can be CPU-intensive for large problems. It's the most production-ready component of OR-Tools.
Linear Programming Solvers: OR-Tools includes GLOP (Google's linear optimizer) and interfaces to commercial solvers. GLOP works well for medium-scale problems but may hit memory limits on enterprise-scale linear programs.
Routing Library: Excellent for VRP problems, but the local search algorithms can be memory-hungry during exploration. This is where most production memory issues occur.
Graph Algorithms: Min-cost flow, shortest path, and maximum flow solvers. Generally the most stable components for production use.
The key insight: different OR-Tools components have different scaling profiles. A production architecture should route problems to appropriate solvers based on problem characteristics, not treat OR-Tools as a black box.
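A thin dispatch layer in front of the components makes that routing explicit. A minimal sketch (the function name and thresholds are illustrative placeholders, not tuned values):

```python
def pick_solver(num_vars: int, has_integer_vars: bool, is_routing: bool) -> str:
    """Route a problem to an OR-Tools component based on its shape.

    The 500k-variable threshold is an illustrative placeholder; tune it
    against your own problem sizes and latency budgets.
    """
    if is_routing:
        return "routing"       # VRP/TSP -> routing library
    if has_integer_vars:
        return "cp-sat"        # integrality or logical constraints -> CP-SAT
    if num_vars < 500_000:
        return "glop"          # pure LP at medium scale -> GLOP
    return "external-lp"       # enterprise-scale LP -> dedicated LP capacity
```

A queue consumer can call this before deciding which container image or worker pool handles the job.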
Memory Management Reality
Here's what the documentation doesn't tell you: OR-Tools had memory leak issues in the .NET wrapper that were only fixed in recent versions. Python applications can hit garbage collection issues with large models if you don't explicitly manage solver instances.
Critical for production: release solver memory explicitly—call solver.Clear() on linear solvers, and drop solver references or use context managers when a solve finishes. Set explicit memory limits. Monitor resident memory, not just heap size—OR-Tools' C++ core can consume significant off-heap memory that Python's garbage collector doesn't track.
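A sketch of both habits: deterministic cleanup plus resident-memory tracking. The `make_solver` and `clear` callables are assumptions about your wrapper, and the RSS helper is Linux-only (it returns 0.0 elsewhere):

```python
import contextlib

@contextlib.contextmanager
def managed_solver(make_solver, clear=lambda s: s.Clear()):
    """Guarantee cleanup even when a solve raises.

    make_solver builds the solver (e.g. a pywraplp.Solver); clear releases
    its C++-side memory. Both callables are assumptions about your setup.
    """
    solver = make_solver()
    try:
        yield solver
    finally:
        clear(solver)  # free off-heap memory Python's GC can't see

def resident_memory_mb() -> float:
    """Resident set size in MB (reads Linux /proc; returns 0.0 elsewhere).

    RSS includes the C++ core's off-heap allocations, unlike Python heap
    profilers.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024.0
    except FileNotFoundError:
        pass
    return 0.0
```

Sampling `resident_memory_mb()` before and after each solve is a cheap way to spot a leaking worker before the kernel does.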
Containerization Best Practices
Generic Docker guides don't account for OR-Tools' specific requirements. Here's what actually matters:
Base Image Selection
```dockerfile
# Don't use alpine—OR-Tools needs glibc
FROM python:3.11-slim-bullseye

# Pin OR-Tools version for reproducible builds
# Note: pip installs pre-compiled wheels, no build tools needed
RUN pip install ortools==9.7.2996

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app/ /app/
WORKDIR /app

# Allocator settings that reduce fragmentation and resident memory
ENV PYTHONMALLOC=malloc
ENV MALLOC_ARENA_MAX=1

CMD ["python", "main.py"]
```
Resource Limits That Actually Work
Docker's memory limits are enforced by cgroups, but OR-Tools isn't aware of them: its C++ core keeps allocating until the kernel OOM-kills the container. You need to:
- Set the `--memory` flag at the container level
- Configure OR-Tools solver parameters to respect limits
- Use `ulimit -v` for virtual memory limits

Most production issues come from OR-Tools allocating straight through limits it can't see, causing OOM kills that are hard to debug.
Multi-Stage Build Pattern
OR-Tools uses pre-compiled wheels, so multi-stage builds primarily help when you have other packages with native extensions. Here's a complete example:
```dockerfile
# Build stage
FROM python:3.11-slim-bullseye as builder

# Install build tools only if you have packages that need compilation
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Create wheels directory
RUN mkdir /wheels

# Build wheels for all dependencies
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir=/wheels ortools==9.7.2996
RUN pip wheel --no-cache-dir --wheel-dir=/wheels -r requirements.txt

# Runtime stage
FROM python:3.11-slim-bullseye

# Copy only the pre-built wheels
COPY --from=builder /wheels /wheels

# Install from wheels (no compilation needed)
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels

COPY app/ /app/
WORKDIR /app

ENV PYTHONMALLOC=malloc
ENV MALLOC_ARENA_MAX=1

CMD ["python", "main.py"]
```
Scaling Strategies That Work
The conventional wisdom about scaling optimization workloads is wrong. You can't just throw more containers at the problem—CP-SAT can use multiple workers on one machine, but most other OR-Tools solvers are single-threaded, and a single solve never spans containers. Optimization problems don't parallelize like web requests.
Horizontal Scaling Patterns
Problem decomposition: Split large optimization problems into smaller subproblems. This works for routing (geographic regions), scheduling (time windows), and assignment problems (resource groups). Each container handles one subproblem.
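A minimal decomposition sketch (the stop records and the region function are assumptions about your data model):

```python
from collections import defaultdict

def decompose_by_region(stops, region_of):
    """Split one large VRP into independent per-region subproblems.

    `stops` is any iterable of stop records and `region_of` maps a stop to
    a region key (a geohash prefix, a depot id, ...); both are assumptions
    about your data model. Each returned group becomes one container's job.
    """
    groups = defaultdict(list)
    for stop in stops:
        groups[region_of(stop)].append(stop)
    return dict(groups)
```

The same shape works for scheduling (key by time window) and assignment (key by resource group).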
Solver specialization: Run different containers optimized for different problem types. Route small linear programs to lightweight containers running GLOP. Send complex constraint satisfaction problems to containers with more memory running CP-SAT.
Asynchronous processing: Use message queues (SQS, RabbitMQ) to queue optimization jobs. This prevents resource contention and allows you to prioritize urgent problems.
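A worker-loop sketch; the stdlib `queue.Queue` stands in for SQS/RabbitMQ here (swap in `PriorityQueue` to prioritize urgent problems), and `solve` is your solver entry point, an assumption:

```python
import queue
import threading

def worker(jobs: "queue.Queue", solve, results: list) -> None:
    """Drain optimization jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:        # sentinel: shut down cleanly
            jobs.task_done()
            return
        results.append(solve(job))
        jobs.task_done()

# Usage sketch: one worker thread, two queued jobs
jobs: "queue.Queue" = queue.Queue()
results: list = []
t = threading.Thread(target=worker, args=(jobs, lambda j: j * 2, results))
t.start()
for j in (1, 2):
    jobs.put(j)
jobs.put(None)
jobs.join()
t.join()
```

Because each worker pulls one job at a time, queue depth becomes your backpressure signal instead of pod CPU.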
Vertical Scaling Considerations
OR-Tools performance is memory-bound more than CPU-bound. A container with 8GB RAM and 2 vCPUs often outperforms 4GB RAM and 4 vCPUs. Profile your specific problems—don't assume more CPU cores help.
Memory sizing rule of thumb: Provision 3-5x your model's theoretical memory requirements. OR-Tools uses significant working memory during search, and memory pressure causes performance degradation before OOM errors.
Kubernetes Deployment Patterns
Standard Kubernetes patterns don't work well for optimization workloads. Here's what does:
Job-based deployment: Use Kubernetes Jobs instead of Deployments for long-running optimization problems. Jobs handle completion tracking and cleanup better than trying to make optimization workloads look like web services.
Resource quotas: Set both requests and limits. OR-Tools needs guaranteed memory allocation—without resource requests, Kubernetes may schedule too many optimization pods on the same node.
Node affinity: Consider dedicating node pools to optimization workloads. Mixing OR-Tools containers with latency-sensitive web services causes resource contention issues.
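The three patterns combine into one Job spec. A sketch expressed as the dict you would pass to the official kubernetes Python client; the names, image, node label, and sizes are placeholders:

```python
def optimization_job_manifest(name: str, image: str, memory: str = "8Gi") -> dict:
    """Kubernetes Job manifest for a one-shot optimization run.

    Requests equal limits so the pod gets Guaranteed QoS and a reserved
    memory slice; the nodeSelector label assumes a dedicated node pool.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,              # don't blindly retry a failed solve
            "activeDeadlineSeconds": 3600,  # hard wall-clock cap on the solve
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "nodeSelector": {"workload": "optimization"},
                    "containers": [{
                        "name": "solver",
                        "image": image,
                        "resources": {
                            "requests": {"memory": memory, "cpu": "2"},
                            "limits":   {"memory": memory, "cpu": "2"},
                        },
                    }],
                },
            },
        },
    }
```

A queue consumer can create one such Job per dequeued subproblem and watch its completion status instead of health-checking a long-lived pod.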
Monitoring and Observability
Generic application monitoring misses OR-Tools-specific failure modes. You need optimization-aware observability.
Metrics That Matter
Solver progress metrics: Track objective value improvement over time. A solver that's not improving may be stuck in local optima or hitting resource limits.
Memory utilization patterns: OR-Tools memory usage is bursty—it allocates heavily during search tree exploration. Monitor peak memory, not average memory.
Solution quality over time: Log objective values and constraint violations. Production optimization systems should detect when solution quality degrades due to infrastructure issues.
```python
import logging
import time

from ortools.sat.python import cp_model


class ProductionSolver:
    def __init__(self):
        self.solver = cp_model.CpSolver()

    def solve_with_monitoring(self, problem_data):
        # Create a new model for each problem
        model = cp_model.CpModel()

        # Build the model from problem_data
        # Example: simple constrained maximization
        x = model.NewIntVar(0, 10, 'x')
        y = model.NewIntVar(0, 10, 'y')
        model.Add(x + 2 * y <= problem_data.get('max_value', 20))
        model.Maximize(x + y)

        # Configure solver for production monitoring
        self.solver.parameters.log_search_progress = True
        self.solver.parameters.max_time_in_seconds = 300

        start_time = time.time()
        status = self.solver.Solve(model)
        solve_time = time.time() - start_time

        # Log production metrics
        logging.info(f"Solve time: {solve_time:.3f}s")

        # Check status before accessing the objective value
        if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
            logging.info(f"Objective value: {self.solver.ObjectiveValue()}")
        else:
            logging.info("No feasible solution found")
        logging.info(f"Status: {self.solver.StatusName(status)}")

        return status
```
Alerting Strategies
Solution degradation alerts: Alert when objective values deviate significantly from historical norms. This catches infrastructure issues before business impact.
Solver timeout patterns: Track timeout rates by problem type. Increasing timeouts often indicate memory pressure or CPU contention.
Memory pressure indicators: Alert on containers approaching memory limits before OOM kills occur.
Logging Best Practices
OR-Tools generates verbose logs that can overwhelm logging infrastructure. In production:
- Set `log_search_progress = False` except when debugging
- Log solver parameters and problem characteristics for post-mortem analysis
- Separate solver logs from application logs—they have different retention requirements
Serverless Deployment Considerations
Can you run OR-Tools serverless? Yes, but with important caveats that most guides ignore.
Cold Start Reality
OR-Tools has significant cold start overhead—importing the library and initializing solvers takes 2-3 seconds in AWS Lambda. This makes serverless unsuitable for real-time optimization but viable for batch processing.
Mitigation strategies:
- Use provisioned concurrency for predictable workloads
- Implement connection pooling patterns for solver initialization
- Consider Lambda container images instead of zip deployments
Memory and Timeout Limits
AWS Lambda's 15-minute timeout works for many optimization problems, but the 10 GB memory limit can be restrictive. Google Cloud Functions allows 60-minute timeouts but has similar memory constraints.
Sizing guidance: Problems that require more than 8GB RAM or 10 minutes solve time are poor fits for serverless. Consider containerized batch jobs instead.
Cost Analysis
Serverless optimization can be cost-effective for sporadic workloads but expensive for constant usage. CloudWatch logs alone can exceed $1,000/month for high-volume optimization workloads.
Break-even analysis: Serverless becomes cost-prohibitive when you're running optimization jobs more than 6-8 hours per day. At that point, dedicated containers are cheaper.
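The arithmetic behind that break-even point, as a sketch (the rates below are illustrative placeholders, not current pricing):

```python
def breakeven_hours_per_day(lambda_gb_second_price: float,
                            container_hourly_price: float,
                            memory_gb: float) -> float:
    """Daily solve-hours above which an always-on container beats Lambda.

    Compares Lambda's per-GB-second billing against a container billed
    around the clock. Prices are placeholders; plug in your region's rates.
    """
    lambda_cost_per_solve_hour = lambda_gb_second_price * memory_gb * 3600
    container_cost_per_day = container_hourly_price * 24
    return container_cost_per_day / lambda_cost_per_solve_hour

# Illustrative rates for an 8 GB function vs. a small dedicated instance
hours = breakeven_hours_per_day(0.0000166667, 0.17, 8)
```

With these placeholder rates the break-even lands in the 8-9 hours-per-day range, consistent with the rule of thumb above; rerun it with your own prices and memory sizing.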
FAQ
How do I handle OR-Tools memory leaks in long-running services?
Memory leaks in OR-Tools typically occur in wrapper layers, not the C++ core. Use process recycling—restart worker processes after solving N problems. In Kubernetes, implement pod restart policies. Release solver instances explicitly: call Clear() on linear solvers and use context managers in Python.
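An in-process sketch of the recycling pattern (real deployments usually recycle whole worker processes under a supervisor; the `make_solver` factory and threshold are assumptions about your setup):

```python
class RecyclingWorker:
    """Rebuild the solver after N solves so wrapper-layer leaks can't
    accumulate. make_solver returns a fresh callable solver instance."""

    def __init__(self, make_solver, max_solves: int = 100):
        self.make_solver = make_solver
        self.max_solves = max_solves
        self.solver = make_solver()
        self.solved = 0

    def solve(self, problem):
        if self.solved >= self.max_solves:
            self.solver = self.make_solver()  # drop old instance, freeing leaks
            self.solved = 0
        self.solved += 1
        return self.solver(problem)
```

The same counter logic drives full process recycling when `make_solver` is replaced by a worker restart.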
What's the best way to scale OR-Tools horizontally?
Don't scale individual solvers—scale by problem decomposition. Partition large problems geographically, temporally, or by resource type. Use message queues to distribute subproblems across containers. Each container should solve complete subproblems, not share solver state.
Can OR-Tools compete with commercial solvers in production environments?
For constraint programming problems, OR-Tools' CP-SAT solver consistently outperforms commercial alternatives in benchmarks. For linear programming, OR-Tools works well for small-to-medium problems but may hit scaling limits where commercial solvers excel. The decision often comes down to licensing costs versus infrastructure complexity.
How do I monitor OR-Tools performance in production?
Track solver-specific metrics: objective value progression, solution feasibility, solve times by problem type. Don't just monitor infrastructure metrics. Log problem characteristics (variable count, constraint count) alongside performance data for capacity planning. Set up alerts for solution quality degradation.
Is serverless viable for optimization workloads?
Serverless works for batch optimization with predictable resource requirements and solve times under 10 minutes. It's not suitable for real-time optimization due to cold start overhead. Consider serverless for irregular optimization workflows and dedicated containers for continuous processing.
If this infrastructure complexity is getting in the way of actually solving optimization problems, Ceris provides serverless OR-Tools deployment without the operational overhead. But whether you build it yourself or use a service, the principles above will help you run optimization workloads reliably in production.