Model Deployment
Once you've trained and evaluated your models, Factory makes it easy to deploy them for real-world use. Deployments in Factory provide an OpenAI-compatible API that can be used with existing tools and applications while offering advanced monitoring and drift detection capabilities.
Why Factory Deployments?
Factory's deployment system offers several key advantages:
- OpenAI-Compatible API - Seamless integration with existing tools and workflows
- Real-Time Monitoring - Track performance metrics like throughput and latency
- Data Drift Detection - Automatically detect when production traffic differs from training data
- Multi-Adapter Support - Deploy multiple adapters in a single service
- Optimized Inference - Benefit from quantization and performance optimizations
Getting Started
Deploy your trained adapter with just a few lines of code:
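A minimal sketch of what this might look like; the `factory` package name, the `FactoryClient` class, and the `deployments.create` call are assumptions used for illustration, not confirmed Factory API:

```python
# Illustrative sketch only -- the package name, client class, and method
# signatures below are assumptions, not confirmed Factory API.
from factory import FactoryClient  # hypothetical SDK entry point

client = FactoryClient(api_key="YOUR_FACTORY_API_KEY")

# Deploy a trained adapter behind an OpenAI-compatible endpoint.
deployment = client.deployments.create(
    adapter="my-trained-adapter",   # hypothetical adapter name
    name="my-first-deployment",
)
print(deployment.url)  # base URL of the OpenAI-compatible API
```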
Key Features
OpenAI-Compatible Interface
Access your deployed model using the standard OpenAI client:
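For example, with the official `openai` Python client; the base URL, model name, and API key below are placeholders you would replace with your deployment's values:

```python
from openai import OpenAI

# Point the standard OpenAI client at your Factory deployment.
client = OpenAI(
    base_url="https://<your-deployment-url>/v1",  # placeholder deployment URL
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="my-trained-adapter",  # hypothetical adapter name
    messages=[{"role": "user", "content": "Summarize today's metrics."}],
)
print(response.choices[0].message.content)
```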
Automatic Data Drift Detection
Factory continuously monitors your production traffic and compares it to your training data distribution:
- Uses the same recipe from training to process incoming requests
- Embeds production data in the same space as training data
- Applies statistical tests to detect distribution shifts (a simplified version of this check is sketched after this list)
- Visualizes drift in the Factory Hub
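Factory performs this check automatically; the snippet below is only a simplified illustration of the underlying idea, using a two-sample Kolmogorov-Smirnov test over embedding dimensions (the embedding source, test choice, and threshold are assumptions, not Factory's actual implementation):

```python
import numpy as np
from scipy.stats import ks_2samp

def embeddings_have_drifted(train_emb: np.ndarray,
                            prod_emb: np.ndarray,
                            alpha: float = 0.01) -> bool:
    """Flag drift if any embedding dimension fails a two-sample KS test.

    Both arrays have shape (num_samples, embedding_dim) and are produced by
    the same recipe/embedding model used to process the training data.
    """
    dims = train_emb.shape[1]
    p_values = [ks_2samp(train_emb[:, d], prod_emb[:, d]).pvalue for d in range(dims)]
    # Bonferroni correction across dimensions to limit false alarms
    return min(p_values) < alpha / dims
```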
Deployment Options
Customize your deployment with a range of configuration options, illustrated in the sketch after this list:
- Memory optimization with quantization
- Precision control (FP16, BF16, FP32)
- Sequence length configuration
- GPU memory utilization
- CPU swap space allocation
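A sketch of what a tuned deployment could look like; the keyword names below (`quantization`, `dtype`, `max_seq_len`, `gpu_memory_utilization`, `swap_space`) mirror common inference-server options and are assumptions, not confirmed Factory parameters:

```python
from factory import FactoryClient  # hypothetical SDK entry point, as in the Getting Started sketch

client = FactoryClient(api_key="YOUR_FACTORY_API_KEY")

# All keyword names below are illustrative assumptions.
deployment = client.deployments.create(
    adapter="my-trained-adapter",   # hypothetical adapter name
    quantization="int8",            # memory optimization via quantization
    dtype="bfloat16",               # precision control (FP16 / BF16 / FP32)
    max_seq_len=4096,               # maximum sequence length
    gpu_memory_utilization=0.90,    # fraction of GPU memory to reserve
    swap_space=8,                   # CPU swap space in GiB
)
```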