
🚀 Fine-Tuning FastTrack

Build a financial sentiment classifier in minutes using Factory and Qwen 2.5

This tutorial demonstrates how to fine-tune a small language model (Qwen 2.5 0.5B) for financial sentiment analysis using the Factory SDK. We'll use the Financial PhraseBank dataset to train a model that can classify financial statements as positive, negative, or neutral.

What You'll Learn

Setup & Data Prep

  • Factory SDK configuration
  • Loading small foundation models
  • Preparing financial datasets

Training & Evaluation

  • Creating recipes for training
  • Parameter-efficient fine-tuning
  • Multi-metric model evaluation

Deployment & Testing

  • Creating OpenAI-compatible APIs
  • Optimized model deployment
  • Performance testing & monitoring

Prerequisites

  • Basic Python knowledge
  • Google Colab or a local environment with GPU support
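
If you're using Colab, you can quickly confirm that a GPU is attached before you start. A minimal check, assuming PyTorch is available (it is preinstalled in standard Colab runtimes):

import torch
 
# Verify that a CUDA-capable GPU is visible before training
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - switch the runtime to a GPU instance")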

ℹī¸ Create Your Factory Account

To use this tutorial, you'll need to create a free Factory account at factory.manufactai.com.

For detailed instructions on setting up your account, please visit our account creation guide.

Once you have an account, you'll be able to access your tenant name and API key needed for the FactoryClient setup in this tutorial.

Step 1: Installation and Setup

Let's start by installing the necessary packages:

!pip install --upgrade factory-sdk
!pip uninstall -y pynvml  # Remove Colab's preinstalled pynvml so it doesn't conflict with the correct NVIDIA bindings
!pip install flash-attn --no-build-isolation

Now initialize the Factory client:

from factory_sdk import FactoryClient, TrainArgs, AdapterArgs, InitArgs, EvalArgs, DeploymentArgs, ModelChatInput, Role, Message
 
factory = FactoryClient(
    tenant="your_tenant_name",
    project="your_project_name",
    token="your_api_key",
)

This connects your local compute environment to the Factory Hub, integrating your development setup with Factory's services for versioning, tracking, and deployment.
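
If you'd rather not hard-code credentials in a notebook, you can read them from environment variables instead. A small sketch, assuming you set the (illustrative, not SDK-mandated) variables FACTORY_TENANT, FACTORY_PROJECT, and FACTORY_TOKEN beforehand:

import os
from factory_sdk import FactoryClient
 
# Read credentials from environment variables (variable names are illustrative)
factory = FactoryClient(
    tenant=os.environ["FACTORY_TENANT"],
    project=os.environ["FACTORY_PROJECT"],
    token=os.environ["FACTORY_TOKEN"],
)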

Step 2: Load the Base Model

We'll use Qwen 2.5 0.5B Instruct, a small but powerful foundation model:

model = factory.base_model.with_name("qwen_small") \
    .from_open_weights("Qwen/Qwen2.5-0.5B-Instruct") \
    .save_or_fetch()

After this step, a revision of the model is securely stored in Factory for further use and deployment. The model is only 0.5B parameters in size, making it efficient to run on modest hardware while still providing good performance for our task.

Step 3: Prepare the Dataset

We'll use the Financial PhraseBank dataset, which contains financial statements labeled with sentiment:

from datasets import load_dataset
 
data = load_dataset("takala/financial_phrasebank", "sentences_allagree")
data = data["train"].train_test_split(test_size=0.1, seed=42)
 
dataset = factory.dataset.with_name("financial-phrases") \
    .from_local(data) \
    .save_or_fetch()

The dataset contains financial statements labeled as:

  • 0: negative
  • 1: neutral
  • 2: positive
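
Before wrapping the data in a recipe, it's worth glancing at a raw sample and the label distribution. A quick check using the Hugging Face dataset object loaded above:

from collections import Counter
 
# Inspect one raw example and the class balance of the training split
print(data["train"][0])                 # {'sentence': '...', 'label': 0, 1, or 2}
print(Counter(data["train"]["label"]))  # how many negative / neutral / positive samples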

Step 4: Create a Recipe

We need to format our data into a chat format the model can understand:

def processor(x):
    return ModelChatInput(
        messages=[
            Message(content=x["sentence"], role=Role.USER),
            Message(content="The answer is: " + str(x["label"]), role=Role.ASSISTANT)]
    )
 
recipe = factory.recipe \
    .with_name("financial-phrases") \
    .using_dataset(dataset) \
    .with_preprocessor(processor) \
    .save_or_fetch()

During this process, Factory automatically:

  • Analyzes data for IID (Independent and Identically Distributed) characteristics
  • Detects potential data shifts between training and test sets
  • Provides statistical analysis of these results in the Factory Hub

Our recipe creates a chat format where:

  • The financial statement is the user's message
  • The sentiment label is the assistant's response
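
To sanity-check the formatting, you can run the preprocessor on a single raw sample; a minimal sketch using the processor and dataset defined above (the exact printed representation may vary by SDK version):

# Apply the preprocessor to one example and inspect the resulting chat input
sample = processor(data["train"][0])
print(sample)
# Expected: a user message containing the financial sentence,
# followed by an assistant message such as "The answer is: 1"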

Step 5: Fine-tune the Model

Now we can fine-tune our model using parameter-efficient techniques:

adapter = factory.adapter \
    .with_name("financial-phrases") \
    .based_on_recipe(recipe) \
    .using_model(model) \
    .with_hyperparameters(
        TrainArgs(
            train_batch_size=8,
            eval_batch_size=8,
            gradient_accumulation_steps=2,
            num_train_epochs=2,
            eval_every_n_minutes=2,
            max_eval_samples=100
        ),
        AdapterArgs(layer_selection_percentage=.5),
        InitArgs(n_test_samples=200)
    ) \
    .run()

Factory automatically:

  • Measures the response of layers in the network to the training data
  • Selects the best possible layers for fine-tuning
  • Determines and sets the optimal LoRA parameters for fine-tuning
  • Stores all metrics and measurement results in the Factory Hub

We're using several techniques to make training efficient:

  • Small batch sizes with gradient accumulation (effective batch size of 16)
  • Only tuning 50% of the model layers (parameter-efficient fine-tuning)
  • Regular evaluation during training to monitor progress

Step 6: Evaluate the Model

After training, we evaluate our model's performance:

from factory_sdk.metrics import ExactMatch, LevenshteinDistance, PrecisionOneVsRest, Recall, RecallOneVsRest
 
evaluation = factory.evaluation \
    .with_name("eval1") \
    .for_adapter(adapter) \
    .using_metric(ExactMatch) \
    .using_metric(LevenshteinDistance, lower_is_better=True) \
    .using_metric(PrecisionOneVsRest) \
    .using_metric(RecallOneVsRest) \
    .on_recipe(recipe) \
    .with_config(EvalArgs(
        max_samples=500, batch_size=8
    )) \
    .run()

We're using multiple metrics to get a comprehensive view of performance:

  • Exact Match: Measures if predictions match the expected labels exactly
  • Levenshtein Distance: Measures character-level differences between predictions and expected labels
  • Precision and Recall: Classification metrics for each class (positive, neutral, negative)

The evaluation results are stored in the Factory Hub for comparison and analysis.
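
To build intuition for how these metrics treat our label format, here is a purely illustrative sketch in plain Python (not the SDK's implementation): exact match requires identical strings, while Levenshtein distance counts character edits, so a prediction with a single wrong digit fails exact match but has an edit distance of only 1.

# Illustrative only: exact match vs. Levenshtein distance on a prediction
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
 
target = "The answer is: 2"
prediction = "The answer is: 1"
print(prediction == target)             # exact match -> False
print(levenshtein(prediction, target))  # edit distance -> 1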

Step 7: Deploy the Model

Now we can deploy our fine-tuned model as an API:

deployment = factory.deployment \
    .with_name("deployment1") \
    .for_adapter(adapter) \
    .with_config(DeploymentArgs(
        dtype="fp16",
        port=9777,
        max_memory_utilization=.8,
        swap_space=0
    )) \
    .run(daemon=True)

This deployment:

  • Uses FP16 precision for efficiency
  • Runs on port 9777
  • Uses 80% of available GPU memory
  • Uses no CPU swap space (ideal for environments with limited memory)

The deployment automatically connects to the Factory Hub and reports data and metrics. Incoming queries are embedded and compared against the training data using statistical tests in the latent space, giving you continuous monitoring for potential data shifts.
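
Because the deployment runs in the background (daemon=True), you may want to wait until the server is reachable before sending traffic. A small sketch that polls the standard OpenAI-compatible /v1/models route (assumed here because the deployment exposes an OpenAI-compatible API; adjust the port if you changed DeploymentArgs):

import time
import requests
 
# Poll the deployment until it responds (or give up after ~5 minutes)
for _ in range(60):
    try:
        r = requests.get("http://localhost:9777/v1/models", timeout=2)
        if r.ok:
            print("Deployment is ready:", r.json())
            break
    except requests.exceptions.RequestException:
        pass
    time.sleep(5)
else:
    print("Deployment did not become ready in time")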

Step 8: Use the Deployed Model

To communicate with our deployed model, we'll use the OpenAI client, which works with our OpenAI-compatible API out of the box:

!pip install openai
 
from openai import OpenAI
 
# Configure the OpenAI client to use our Factory deployment
client = OpenAI(
    api_key="EMPTY",  # Not required for Factory deployments
    base_url="http://localhost:9777/v1"
)

A Factory deployment exposes the same interface as the OpenAI API, so you can use the standard openai package as-is and switch existing applications over simply by pointing them at the new base URL.
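
Before running a full stress test, it's worth sending a single request to verify the deployment end to end. A minimal example using the client configured above, with an input in the style of the dataset (the exact response wording depends on your trained adapter):

# Classify one financial statement with a single chat completion
completion = client.chat.completions.create(
    model="financial-phrases",
    messages=[{"role": "user", "content": "Operating profit rose to EUR 13.1 mn from EUR 8.7 mn."}],
    temperature=0,
)
print(completion.choices[0].message.content)  # e.g. "The answer is: 2"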

Step 9: Test the Model

Let's stress test our API with multiple concurrent requests:

import concurrent.futures
import time
import json
from openai import OpenAI
from tqdm import tqdm
 
def fire_requests_in_threads(client, model_name, test_data, num_requests=1000, max_workers=32, temperature=0.1):
    """
    Fire multiple requests to an OpenAI-compatible API using threading.
    """
    results = []
    errors = []
 
    # Function to make a single request
    def make_request(idx):
        try:
            # Use test data in round-robin fashion
            data_idx = idx % len(test_data)
            prompt = test_data[data_idx]["sentence"]
 
            start_time = time.time()
            completion = client.chat.completions.create(
                model=model_name,
                messages=[{
                    "content": prompt,
                    "role": "user"
                }],
                temperature=temperature
            )
            end_time = time.time()
 
            return {
                "request_id": idx,
                "prompt": prompt,
                "response": completion.choices[0].message.content,
                "response_time": end_time - start_time,
                "status": "success"
            }
        except Exception as e:
            print(f"Error in request {idx}: {e}")
            return {
                "request_id": idx,
                "prompt": prompt if 'prompt' in locals() else None,
                "error": str(e),
                "status": "error"
            }
 
    # Use ThreadPoolExecutor to limit concurrent requests
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_idx = {executor.submit(make_request, i): i for i in range(num_requests)}
 
        # Process results as they complete with a progress bar
        with tqdm(total=num_requests, desc="Processing requests") as pbar:
            for future in concurrent.futures.as_completed(future_to_idx):
                result = future.result()
                if result["status"] == "success":
                    results.append(result)
                else:
                    errors.append(result)
                pbar.update(1)
 
    print(f"Completed {len(results)} successful requests with {len(errors)} errors")
 
    # Calculate statistics
    if results:
        response_times = [r["response_time"] for r in results]
        avg_response_time = sum(response_times) / len(response_times)
        max_response_time = max(response_times)
        min_response_time = min(response_times)
 
        print(f"Average response time: {avg_response_time:.4f}s")
        print(f"Minimum response time: {min_response_time:.4f}s")
        print(f"Maximum response time: {max_response_time:.4f}s")
 
    return {
        "successful_requests": results,
        "failed_requests": errors
    }
 
# Run the stress test
results = fire_requests_in_threads(
    client=client,
    model_name="financial-phrases",
    test_data=data["test"],
    num_requests=500,
    max_workers=32,
    temperature=0
)

This test:

  • Sends 500 requests to our API
  • Uses up to 32 concurrent workers
  • Measures response times and success rates
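
Beyond latency, the collected responses can also be used to estimate classification accuracy. A rough sketch that assumes the model answers in the training format "The answer is: <label>" and relies on the same round-robin mapping between request IDs and test samples used above:

import re
 
# Compare each response's predicted label with the expected test label
correct = 0
for r in results["successful_requests"]:
    expected = data["test"][r["request_id"] % len(data["test"])]["label"]
    match = re.search(r"\d", r["response"] or "")
    if match and int(match.group()) == expected:
        correct += 1
 
total = len(results["successful_requests"])
if total:
    print(f"Accuracy over {total} successful requests: {correct / total:.2%}")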

The results of comparing these test queries against the training data are stored in the Factory Hub, so you can check whether any data shift has occurred. This gives you continuous monitoring of your model's performance in production.

Next Steps

Now that you've built and deployed a financial sentiment analysis model, you could:

  1. Improve the model - Experiment with different base models or hyperparameters
  2. Expand the dataset - Add more financial statements or different types of financial data
  3. Enhance the recipe - Modify the preprocessing to improve model performance
  4. Integrate with applications - Connect your model to financial analysis tools or dashboards

Complete Code

The complete code for this tutorial is available in this Colab notebook.


Ready to build your own AI models?

Factory makes it easy to fine-tune, evaluate, and deploy production-ready models in minutes.