Model Deployment

Once you've trained and evaluated your models, you can deploy them with Factory for real-world use. A Factory deployment exposes an OpenAI-compatible API, so existing tools and applications can call it without changes, and it adds monitoring and data drift detection on top.

Why Factory Deployments?

Factory's deployment system offers several key advantages:

  • OpenAI-Compatible API - Seamless integration with existing tools and workflows
  • Real-Time Monitoring - Track performance metrics like throughput and latency
  • Data Drift Detection - Automatically detect when production traffic differs from training data
  • Multi-Adapter Support - Deploy multiple adapters in a single service
  • Optimized Inference - Benefit from quantization and performance optimizations

Getting Started

Deploy your trained adapter with just a few lines of code:

from factory_sdk import FactoryClient, DeploymentArgs

# Initialize the Factory client
factory = FactoryClient(
    tenant="your_tenant_name",
    project="your_project_name",
    token="your_api_key",
)

# `adapter` is the trained adapter produced by your training step.
# Deploy it on port 8000 with FP16 precision.
deployment = (
    factory.deployment
    .with_name("sentiment-api")
    .for_adapter(adapter)
    .with_config(DeploymentArgs(port=8000, dtype="fp16"))
    .run(daemon=True)
)

print("Deployment is running on http://localhost:8000")

Key Features

OpenAI-Compatible Interface

Access your deployed model using the standard OpenAI client:

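from openai import OpenAI

# Point the standard OpenAI client at the local deployment
client = OpenAI(
    api_key="EMPTY",  # placeholder key
    base_url="http://localhost:8000/v1",
)

# The model name matches the deployment name
response = client.chat.completions.create(
    model="sentiment-api",
    messages=[{"role": "user", "content": "Analyze this text"}],
)

# Print the text of the first completion
print(response.choices[0].message.content)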
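
To make "OpenAI-compatible" concrete, the sketch below builds the same request by hand using only the Python standard library. It assumes the deployment accepts the chat completions route that the OpenAI client calls under the `/v1` base URL shown above. The helper name `build_chat_request` is hypothetical and is not part of Factory or the OpenAI SDK.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_text, api_key="EMPTY"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8000/v1", "sentiment-api", "Analyze this text"
)
print(request.full_url)

# To send it against a running deployment (assumed to be listening locally):
# with urllib.request.urlopen(request) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Because the request shape is the standard one, any HTTP client or tool that already speaks the OpenAI chat completions format should only need its base URL changed.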

Automatic Data Drift Detection

Factory continuously monitors your production traffic and compares it to your training data distribution:

  • Uses the same recipe from training to process incoming requests
  • Embeds production data in the same space as training data
  • Applies statistical tests to detect distribution shifts
  • Visualizes drift in the Factory Hub
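
The page does not say which statistical test Factory applies, so the following is only an illustrative sketch of the general idea: embed both datasets, then test whether the two sets of embeddings plausibly come from the same distribution. It uses a permutation test on the distance between mean embeddings, with synthetic Gaussian vectors standing in for real embeddings. All function names here are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_distance(a, b):
    """Euclidean distance between the mean vectors of two samples."""
    return math.dist(mean_vector(a), mean_vector(b))


def drift_p_value(train_emb, prod_emb, n_permutations=200, seed=0):
    """Permutation test: how often does a random split of the pooled data
    produce a mean distance at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = mean_distance(train_emb, prod_emb)
    pooled = list(train_emb) + list(prod_emb)
    n_train = len(train_emb)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_distance(pooled[:n_train], pooled[n_train:]) >= observed:
            at_least_as_large += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_large + 1) / (n_permutations + 1)


def fake_embeddings(rng, n, dim, center):
    """Stand-in for real embeddings: Gaussian points around `center`."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
train = fake_embeddings(rng, 100, 8, center=0.0)
shifted = fake_embeddings(rng, 100, 8, center=1.0)  # simulated drifted traffic

p_shifted = drift_p_value(train, shifted)
print(f"p-value for shifted traffic: {p_shifted:.4f}")
```

A small p-value suggests the production embeddings have moved away from the training embeddings. A production system would likely use a more sensitive test and real embeddings produced by the training recipe.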

Deployment Options

Customize your deployment with various configuration options:

  • Memory optimization with quantization
  • Precision control (FP16, BF16, FP32)
  • Sequence length configuration
  • GPU memory utilization
  • CPU swap space allocation
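
To see why precision and quantization matter for memory, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It is a sketch under stated assumptions: the 7-billion-parameter model size is hypothetical, the `int8` and `int4` entries stand in for quantized formats in general (the page does not say which ones Factory supports), and the estimate covers weights only, not activations or the KV cache.

```python
# Bytes needed to store one weight at each precision.
# int8 and int4 are illustrative quantized formats, not confirmed Factory options.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter base model

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Halving the bytes per weight halves the weight footprint, which is why FP16 or BF16 needs about half the memory of FP32 for the same model.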
