Model Deployment

Once you've trained and evaluated your models, Factory makes it easy to deploy them for real-world use. Deployments in Factory provide an OpenAI-compatible API that can be used with existing tools and applications while offering advanced monitoring and drift detection capabilities.

Why Factory Deployments?

Factory's deployment system offers several key advantages:

OpenAI-Compatible API - Seamless integration with existing tools and workflows
Real-Time Monitoring - Track performance metrics like throughput and latency
Data Drift Detection - Automatically detect when production traffic differs from training data
Multi-Adapter Support - Deploy multiple adapters in a single service
Optimized Inference - Benefit from quantization and performance optimizations

Getting Started

Deploy your trained adapter with just a few lines of code:

from factory_sdk import FactoryClient, DeploymentArgs

# Initialize the Factory client
factory = FactoryClient(
    tenant="your_tenant_name",
    project="your_project_name",
    token="your_api_key",
)

# Deploy your adapter
# (`adapter` is the trained adapter object from your training workflow)
deployment = factory.deployment \
    .with_name("sentiment-api") \
    .for_adapter(adapter) \
    .with_config(DeploymentArgs(
        port=8000,
        dtype="fp16"
    )) \
    .run(daemon=True)

print("Deployment is running on http://localhost:8000")
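
Because the example starts the deployment with daemon=True, it can be useful to confirm that the port accepts connections before sending requests. The following is a minimal sketch using only the Python standard library. The helper wait_for_port is hypothetical and not part of factory_sdk, and it assumes the service listens on localhost:8000 as configured above. Whether run(daemon=True) returns before the server is ready is not stated here, so treat this as a precaution rather than a requirement.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of factory_sdk.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and try again.
            time.sleep(interval)
    return False


# Example usage against the deployment started above:
# if wait_for_port("localhost", 8000):
#     print("Deployment is accepting connections")
```

A successful TCP connection only shows that the port is open. It does not guarantee that the model has finished loading.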

Key Features

OpenAI-Compatible Interface

Access your deployed model using the standard OpenAI client:

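from openai import OpenAI

# Point the standard OpenAI client at the local deployment
client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1"
)

# The model name matches the deployment name used above
response = client.chat.completions.create(
    model="sentiment-api",
    messages=[{"role": "user", "content": "Analyze this text"}]
)

# Print the generated reply
print(response.choices[0].message.content)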
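
For tools that do not use the OpenAI client, the same call can be expressed as a plain HTTP request. The sketch below builds, but does not send, the request using the Python standard library. It assumes the deployment exposes the standard OpenAI chat completions route under the base URL shown above. The helper build_chat_request is a hypothetical name introduced for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text, api_key="EMPTY"):
    """Build (but do not send) a POST request for the chat completions route.

    Hypothetical helper for illustration; assumes an OpenAI-compatible server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000/v1", "sentiment-api", "Analyze this text"
)

# To send it against a running deployment:
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```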

Automatic Data Drift Detection

Factory continuously monitors your production traffic and compares it to your training data distribution:

Uses the same recipe from training to process incoming requests
Embeds production data in the same space as training data
Applies statistical tests to detect distribution shifts
Visualizes drift in the Factory Hub
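
To give a feel for the statistical-test step, here is a minimal sketch of one common approach: a two-sample Kolmogorov-Smirnov test on a single numeric feature, such as one coordinate of an embedding. This is an assumption chosen for illustration. The tests Factory actually applies are not specified here, and the function names and the 0.05 significance level are hypothetical.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, production):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    prod = sorted(production)
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_prod = bisect_right(prod, x) / len(prod)
        d = max(d, abs(cdf_ref - cdf_prod))
    return d


def drift_detected(reference, production, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the large-sample critical value.

    c_alpha=1.358 corresponds to a significance level of about 0.05.
    """
    n, m = len(reference), len(production)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, production) > threshold


# Synthetic stand-ins for one embedding coordinate (illustrative data only).
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(500)]

print(drift_detected(training, shifted))
```

In practice embeddings are high-dimensional, so a per-coordinate test like this would need a multiple-comparison correction or a multivariate alternative.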

Deployment Options

Customize your deployment with various configuration options:

Memory optimization with quantization
Precision control (FP16, BF16, FP32)
Sequence length configuration
GPU memory utilization
CPU swap space allocation
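
To see why precision and quantization matter for memory, the back-of-the-envelope calculation below estimates the memory needed for model weights alone at different precisions. It is a rough sketch: the 7-billion-parameter model size is a hypothetical example, the int8 and int4 entries are assumed quantization formats, and activations and caches add to the total.

```python
# Bytes needed to store one parameter at each precision.
# int8 and int4 are assumed quantization formats for illustration.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Hypothetical 7-billion-parameter model.
for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
# fp32: 28.0 GB
# fp16: 14.0 GB
# int8: 7.0 GB
# int4: 3.5 GB
```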
