Configuration

This guide explains the configuration options available when using Kura's procedural API (v1). This API is best suited to flexible pipelines where you need fine-grained control over individual steps, want to skip or reorder steps, want to A/B test different models, or simply prefer a functional programming style.

Checkpoint Files

Kura saves several checkpoint files during processing:

Checkpoint File        Description
conversations.json     Raw conversation data
summaries.jsonl        Summarized conversations
clusters.jsonl         Base cluster data
meta_clusters.jsonl    Hierarchical cluster data
dimensionality.jsonl   Projected data for visualization

Checkpoint filenames are now defined as properties in their respective model classes. When using the procedural API, checkpoint management is handled via the CheckpointManager.
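
With checkpointing enabled, the checkpoint directory therefore contains one file per pipeline stage. Using the directory from the examples below, the layout looks roughly like this (filenames from the table above):

my_checkpoints/
    conversations.json
    summaries.jsonl
    clusters.jsonl
    meta_clusters.jsonl
    dimensionality.jsonl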

CLI Configuration

When using the CLI, you can configure the checkpoint directory:

# Start the web server with a custom checkpoint directory
kura --dir ./my_checkpoints

Procedural API Configuration

The procedural API provides flexibility by breaking the pipeline into composable functions:

from kura import (
    summarise_conversations,
    generate_base_clusters_from_conversation_summaries,
    reduce_clusters_from_base_clusters,
    reduce_dimensionality_from_clusters,
    CheckpointManager
)
from kura.summarisation import SummaryModel
from kura.cluster import ClusterModel
from kura.meta_cluster import MetaClusterModel
from kura.dimensionality import HDBUMAP
# Optional: import the Conversation type if you construct conversations manually
# from kura.types import Conversation

# Load your conversations here (replace with your actual data loading)
# conversations = [Conversation(...), ...]

# Configure models independently
summary_model = SummaryModel()
cluster_model = ClusterModel()
meta_cluster_model = MetaClusterModel(max_clusters=10)
dimensionality_model = HDBUMAP()

# Optional checkpoint management
checkpoint_manager = CheckpointManager("./my_checkpoints", enabled=True)

# Run pipeline with keyword arguments
# Run the pipeline step by step, passing each stage's output to the next
async def analyze(conversations):
    summaries = await summarise_conversations(
        conversations,
        model=summary_model,
        checkpoint_manager=checkpoint_manager
    )

    clusters = await generate_base_clusters_from_conversation_summaries(
        summaries,
        model=cluster_model,
        checkpoint_manager=checkpoint_manager
    )

    reduced = await reduce_clusters_from_base_clusters(
        clusters,
        model=meta_cluster_model,
        checkpoint_manager=checkpoint_manager
    )

    projected = await reduce_dimensionality_from_clusters(
        reduced,
        model=dimensionality_model,
        checkpoint_manager=checkpoint_manager
    )

    return projected
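
Because each step is an async function, you run the pipeline by executing the coroutine with asyncio. A minimal usage sketch, assuming conversations has already been loaded as above:

import asyncio

# Execute the full pipeline and collect the projected clusters
projected_clusters = asyncio.run(analyze(conversations))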

The procedural API also makes it easy to swap different model implementations for the same task. The backend class names in the example below are illustrative; adjust the imports to whichever summary model implementations your installation provides:

# Use different backends for the same task.
from kura import summarise_conversations, CheckpointManager

# NOTE: these backend class names are illustrative; import the summary model
# implementations that actually exist in your environment.
from kura.summarisation import (
    OpenAISummaryModel,
    VLLMSummaryModel,
    HuggingFaceSummaryModel,
)


async def compare_summary_backends(conversations):
    # A single checkpoint manager is reused here; if your checkpoints are keyed
    # only by stage, consider a separate directory per backend so cached
    # results are not shared between runs.
    checkpoint_mgr = CheckpointManager("./my_checkpoints")

    # OpenAI backend (constructor arguments are illustrative)
    openai_summaries = await summarise_conversations(
        conversations,
        model=OpenAISummaryModel(api_key="sk-..."),
        checkpoint_manager=checkpoint_mgr,
    )

    # Local vLLM backend (constructor arguments are illustrative)
    vllm_summaries = await summarise_conversations(
        conversations,
        model=VLLMSummaryModel(model_path="/models/llama"),
        checkpoint_manager=checkpoint_mgr,
    )

    # Hugging Face backend (constructor arguments are illustrative)
    hf_summaries = await summarise_conversations(
        conversations,
        model=HuggingFaceSummaryModel("facebook/bart-large-cnn"),
        checkpoint_manager=checkpoint_mgr,
    )

    return {
        "openai": openai_summaries,
        "vllm": vllm_summaries,
        "huggingface": hf_summaries,
    }
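
To try the backend comparison end to end, execute the coroutine the same way. A minimal usage sketch; compare_summary_backends and the loaded conversations come from the example above:

import asyncio

# Report how many summaries each backend produced
backend_results = asyncio.run(compare_summary_backends(conversations))
for backend, summaries in backend_results.items():
    print(f"{backend}: {len(summaries)} summaries")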

Next Steps

Now that you understand how to configure Kura using the procedural API, you can: