
Evaluate Models

Evaluation is a crucial step in the machine learning workflow that helps you measure model performance, compare different adapters, and make informed decisions about deployment. Factory provides a comprehensive evaluation system to help you assess your fine-tuned models with precision and flexibility.

Why Evaluation Matters

  • Validate Model Quality: Ensure your model meets performance requirements
  • Compare Alternatives: Determine which adapter performs best for your use case
  • Identify Weaknesses: Discover areas where your model needs improvement
  • Support Decisions: Make data-driven choices about which models to deploy

Evaluation Workflow

Train Adapter → Create Evaluation → Select Metrics → Run Evaluation → Analyze Results → Deploy or Improve

Getting Started

Factory makes it easy to evaluate your models with just a few lines of code:

from factory_sdk import FactoryClient, EvalArgs
from factory_sdk.metrics import ExactMatch, F1Score
 
# Initialize the Factory client
factory = FactoryClient(
    tenant="your_tenant_name",
    project="your_project_name",
    token="your_api_key",
)
 
# Run an evaluation on an adapter; `adapter` and `recipe` are the objects
# produced by your earlier fine-tuning and recipe setup steps
evaluation = factory.evaluation \
    .with_name("sentiment-eval") \
    .for_adapter(adapter) \
    .using_metric(ExactMatch) \
    .using_metric(F1Score) \
    .on_recipe(recipe) \
    .with_config(EvalArgs(
        max_samples=500,
        batch_size=8
    )) \
    .run()

Key Features

  • Multiple Metrics: Apply various metrics to get a comprehensive view of model performance
  • Adapter Comparison: Compare multiple adapters side-by-side with the same metrics (see the sketch after this list)
  • Custom Metrics: Define your own metrics for specialized evaluation needs
  • Efficient Processing: Evaluate models with optimized memory usage and parallel processing
  • Visualization: View evaluation results in the Factory Hub with intuitive visualizations
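
For adapter comparison, a minimal sketch is shown below. It reuses only the builder calls from the Getting Started snippet and the same client and imports; `adapter_a` and `adapter_b` are hypothetical placeholders for two previously trained adapters, and `recipe` is the shared evaluation recipe from earlier steps:

# Evaluate two adapters with identical metrics, recipe, and config so their
# results can be compared side-by-side in the Factory Hub.
# `adapter_a`, `adapter_b`, and `recipe` are placeholders from earlier steps.
for name, adapter in [("sentiment-eval-a", adapter_a), ("sentiment-eval-b", adapter_b)]:
    factory.evaluation \
        .with_name(name) \
        .for_adapter(adapter) \
        .using_metric(ExactMatch) \
        .using_metric(F1Score) \
        .on_recipe(recipe) \
        .with_config(EvalArgs(max_samples=500, batch_size=8)) \
        .run()

Because both runs share the same metrics and configuration, their scores are directly comparable when viewed in the Factory Hub.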
