Architecture

Overview

The Factory Platform's architecture is built around a fundamental design principle: separating compute from storage to enable maximum flexibility and efficiency in machine learning development and deployment. This separation is crucial for modern ML workflows, where teams need to work across different environments while maintaining consistent access to large model assets.

Why Separate Compute and Storage?

In traditional ML systems, compute and storage are often tightly coupled, forcing teams to:

  • Transfer large model files through a central server (creating bottlenecks)
  • Maintain separate copies of assets across environments (leading to inconsistencies)
  • Lock themselves into specific cloud providers or environments

Factory's architecture solves these challenges by:

  1. Direct S3 Access: The SDK communicates directly with S3 storage using signed URLs provided by the Factory Hub. This eliminates the need to route large files through the Hub, significantly improving performance and reducing bandwidth costs (a sketch of this flow follows this list).
  2. Environment Flexibility: Teams can seamlessly switch between development environments (notebooks, Colab, local machines) and deployment environments (Docker, cloud services, edge devices) without worrying about asset synchronization.
  3. Centralized Management: While storage is distributed, management remains centralized through the Factory Hub, ensuring consistent access control, versioning, and metadata tracking.
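
To make the direct-access pattern concrete, the sketch below shows the general flow: a client asks the Hub for a short-lived signed URL and then streams the asset straight from S3. The Hub endpoint, route, and response shape are assumptions for illustration, not the actual Factory Hub API.

```python
import requests

HUB_URL = "https://hub.example-factory.dev"  # hypothetical Hub endpoint
TOKEN = "YOUR_FACTORY_TOKEN"                 # Hub access token

# 1. Ask the Hub for a short-lived signed URL (hypothetical route and payload).
resp = requests.get(
    f"{HUB_URL}/api/assets/my-model/versions/latest/download-url",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
signed_url = resp.json()["url"]

# 2. Stream the large artifact directly from S3 -- the bytes never pass
#    through the Hub itself.
with requests.get(signed_url, stream=True, timeout=300) as download:
    download.raise_for_status()
    with open("model.safetensors", "wb") as f:
        for chunk in download.iter_content(chunk_size=8 * 1024 * 1024):
            f.write(chunk)
```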

Key Design Decisions

Benefits of This Architecture

  1. Performance Optimization

    • Large model files (often several gigabytes) transfer directly between compute environments and S3
    • No unnecessary hops through central servers
    • Reduced latency and bandwidth costs
    • Better scalability for distributed training and deployment
  2. Development Flexibility

    • Teams can use their preferred development environments
    • Easy switching between local development, cloud resources, and edge devices
    • Support for both interactive development (notebooks) and production deployment
    • No environment lock-in
  3. Resource Efficiency

    • Compute resources can be optimized for specific tasks
    • Storage is centralized but accessed efficiently
    • No redundant copies of large model files
    • Better resource utilization across environments
  4. Security and Control

    • Factory Hub maintains control over access and permissions
    • Signed URLs provide secure, temporary access to assets
    • Centralized audit trail of asset usage
    • Fine-grained access control across environments

This architecture enables teams to focus on their ML development and deployment without worrying about infrastructure complexity, while maintaining the security and control needed for enterprise ML operations.

Core Components

Storage Layer (Factory Hub)

The Factory Hub serves as the central management layer for all assets, providing:

  • Persistent storage with S3 backend
  • Version control and asset fingerprinting (see the fingerprinting sketch after this list)
  • Access control and permissions management
  • Asset synchronization across environments
  • Support for various asset types (models, datasets, adapters, recipes)
  • Automatic dependency tracking
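
As a rough illustration of asset fingerprinting, the snippet below computes a content hash that a client could compare against the fingerprint the Hub records for a registered version, detecting stale or modified local copies. The Hub's actual fingerprinting scheme may differ.

```python
import hashlib
from pathlib import Path

def fingerprint(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Content hash of a local asset file (the Hub's real scheme may differ)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

local = fingerprint(Path("checkpoints/model.safetensors"))
remote = "..."  # fingerprint reported by the Hub for the registered version
if local != remote:
    print("Local copy differs from the registered version; re-sync with the Hub.")
```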

Compute Layer (SDK)

The Factory SDK provides a consistent interface for model development and deployment:

  • Environment-agnostic execution
  • Support for multiple model architectures (Qwen, Llama, Phi, Mistral, Gemma)
  • Integration with vLLM for efficient inference (see the example after this list)
  • Support for both text and vision models
  • Flexible deployment options with configurable parameters
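
For the vLLM integration, the example below serves a model from a local directory that has already been synced from the Hub. It uses the public vLLM API directly; the directory path and prompt are placeholders, and the SDK's own wrapper around vLLM may look different.

```python
# Illustrative only: serve a locally synced model with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="./assets/my-llama-model")  # local directory pulled from the Hub
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the Factory architecture in one sentence."], params)
print(outputs[0].outputs[0].text)
```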

Workflow Example

  1. Development Phase

    • Use SDK in preferred environment (notebook, Colab, etc.)
    • Develop and train models with supported architectures
    • Assets are automatically synced with Factory Hub
  2. Testing Phase

    • Test models in different environments
    • Access the same assets across all environments
    • Evaluate with multiple metrics
    • Ensure consistency in results
  3. Deployment Phase

    • Package models using SDK
    • Deploy to desired environment
    • Access the same assets in production (see the resolver sketch below)
    • Support for various inference endpoints
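
All three phases depend on resolving the same asset name and version to identical local files, regardless of environment. The sketch below shows one way such an environment-agnostic resolver could work; the cache layout and the download_signed_url helper are assumptions, not the SDK's actual implementation.

```python
import os
from pathlib import Path

CACHE_ROOT = Path(os.environ.get("FACTORY_CACHE", str(Path.home() / ".factory_cache")))

def resolve_asset(name: str, version: str, download_signed_url) -> Path:
    """Return a local path for the asset, downloading it only if missing."""
    target = CACHE_ROOT / name / version
    if not target.exists():
        target.mkdir(parents=True, exist_ok=True)
        download_signed_url(name, version, target)  # direct S3 transfer, as above
    return target

# A notebook, a CI job, and a production container all call the same resolver,
# so they see byte-identical model files for a given name and version.
```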

Technical Implementation

The platform implements a secure and efficient architecture:

  • Factory Hub manages asset storage and access control
  • S3 provides scalable backend storage
  • SDK provides consistent interface across environments
  • Signed URLs enable direct S3 access (see the sketch after this list)
  • Token-based authentication and SSL/TLS encryption
  • Support for private model repositories
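
On the Hub side, signed URLs can be minted with standard S3 tooling. The snippet below uses boto3's presigned-URL API as an illustration; the bucket name, key layout, and expiry are assumptions about how the Hub might be configured.

```python
import boto3

s3 = boto3.client("s3")

# Mint a short-lived download URL for a specific asset object.
signed_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "factory-assets", "Key": "models/my-model/v3/model.safetensors"},
    ExpiresIn=900,  # valid for 15 minutes
)
```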

Best Practices

When working with the Factory Platform:

  • Use SDK for all asset operations
  • Leverage version control for assets
  • Test in multiple environments
  • Package models with SDK for deployment
  • Implement proper access controls
  • Monitor resource utilization
  • Maintain clear asset naming conventions

Conclusion

The Factory Platform's architecture enables flexible development and deployment while maintaining consistent asset management. The separation of compute and storage allows users to work in their preferred environments while ensuring all assets remain synchronized and accessible. With its focus on efficiency, security, and flexibility, Factory provides a robust foundation for modern ML development and deployment.
