
Evaluate Models

Evaluation is a crucial step in the machine learning workflow that helps you measure model performance, compare different adapters, and make informed decisions about deployment. Factory provides a comprehensive evaluation system to help you assess your fine-tuned models with precision and flexibility.

Why Evaluation Matters

  • Validate Model Quality: Ensure your model meets performance requirements
  • Compare Alternatives: Determine which adapter performs best for your use case
  • Identify Weaknesses: Discover areas where your model needs improvement
  • Support Decisions: Make data-driven choices about which models to deploy

Evaluation Workflow

Train Adapter → Create Evaluation → Select Metrics → Run Evaluation → Analyze Results → Deploy or Improve

Getting Started

Factory makes it easy to evaluate your models with just a few lines of code:

from factory_sdk import FactoryClient, EvalArgs
from factory_sdk.metrics import ExactMatch, F1Score
 
# Initialize the Factory client
factory = FactoryClient(
    tenant="your_tenant_name",
    project="your_project_name",
    token="your_api_key",
)
 
# Run an evaluation on an adapter; `adapter` and `recipe` are the objects
# produced by your earlier fine-tuning and recipe setup steps
evaluation = factory.evaluation \
    .with_name("sentiment-eval") \
    .for_adapter(adapter) \
    .using_metric(ExactMatch) \
    .using_metric(F1Score) \
    .on_recipe(recipe) \
    .with_config(EvalArgs(
        max_samples=500,
        batch_size=8
    )) \
    .run()

Key Features

  • Multiple Metrics: Apply various metrics to get a comprehensive view of model performance
  • Adapter Comparison: Compare multiple adapters side-by-side with the same metrics (see the sketch after this list)
  • Custom Metrics: Define your own metrics for specialized evaluation needs
  • Efficient Processing: Evaluate models with optimized memory usage and parallel processing
  • Visualization: View evaluation results in the Factory Hub with intuitive visualizations
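
For adapter comparison, a minimal sketch is shown below. It reuses only the builder calls from the Getting Started snippet and the same client and imports; `adapter_a` and `adapter_b` are hypothetical placeholders for two previously trained adapters, and `recipe` is the shared evaluation recipe from earlier steps:

# Evaluate two adapters with identical metrics, recipe, and config so their
# results can be compared side-by-side in the Factory Hub.
# `adapter_a`, `adapter_b`, and `recipe` are placeholders from earlier steps.
for name, adapter in [("sentiment-eval-a", adapter_a), ("sentiment-eval-b", adapter_b)]:
    factory.evaluation \
        .with_name(name) \
        .for_adapter(adapter) \
        .using_metric(ExactMatch) \
        .using_metric(F1Score) \
        .on_recipe(recipe) \
        .with_config(EvalArgs(max_samples=500, batch_size=8)) \
        .run()

Because both runs share the same metrics and configuration, their scores are directly comparable when viewed in the Factory Hub.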
