Jury Selection¶
The Jury Selection page allows you to configure the AI models that will evaluate your query. You can set up multiple AI providers, adjust evaluation parameters, and fine-tune how the jury reaches its decision.
Overview¶
This page displays your query's possible outcomes and lets you configure:

- Number of iterations: how many times the evaluation process runs
- Jury composition: which AI models participate and their roles
- Model parameters: runs per model and relative weights
Number of Iterations¶
What are Iterations?¶
An iteration is a complete evaluation cycle where all jury members assess your query. Multiple iterations can provide more robust results by averaging across several independent evaluations.
Setting Iterations¶
- Use the + and - buttons to adjust the count
- Minimum: 1 iteration
- Higher iterations increase evaluation cost but may improve reliability
- For most queries, 1-3 iterations provide good results
When to Use Multiple Iterations:

- Complex queries with subjective elements
- High-stakes decisions requiring extra confidence
- Queries where AI models might have variable responses
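To make the averaging concrete, here is a minimal sketch of how scores from multiple iterations might be combined; the 0-1 score representation and function name are illustrative assumptions, not the application's actual internals.

```python
# Illustrative sketch: averaging outcome scores across iterations.
# Assumes each iteration yields a score per outcome in [0, 1];
# this data model is an assumption made for the example.

def average_iterations(iteration_scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each outcome's score over all iterations."""
    outcomes = iteration_scores[0].keys()
    n = len(iteration_scores)
    return {o: sum(it[o] for it in iteration_scores) / n for o in outcomes}

# Three iterations of a yes/no query: averaging smooths out the
# variability of any single evaluation cycle.
iterations = [
    {"yes": 0.70, "no": 0.30},
    {"yes": 0.62, "no": 0.38},
    {"yes": 0.74, "no": 0.26},
]
print(average_iterations(iterations))  # {'yes': 0.687, 'no': 0.313} (rounded)
```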
AI Jury Configuration¶
Jury Members¶
Each jury member represents an AI model that will independently evaluate your query.
Provider Selection¶
Choose from available AI providers:
- OpenAI: GPT models known for strong reasoning capabilities
    - `gpt-3.5-turbo`: Fast, cost-effective for simpler queries
    - `gpt-4`: More advanced reasoning for complex evaluations
    - `gpt-4o`: Latest model with optimized performance
- Anthropic: Claude models with strong analytical capabilities
    - `claude-2.1`: Balanced performance for general use
    - `claude-3-sonnet-20240229`: Enhanced reasoning capabilities
    - `claude-3-5-sonnet-20241022`: Latest with improved analysis
- Open-source: Community models for specialized use cases
    - `llava`: Vision-language model for image analysis
    - `llama-3.1`: Strong general-purpose reasoning
    - `llama-3.2`: Improved efficiency and capability
    - `phi3`: Efficient model for lightweight evaluations
Model Configuration¶
Runs: Number of times each model evaluates the query.

- More runs can reduce variability in model responses
- Typical range: 1-5 runs per model
- Consider cost vs. confidence trade-offs
Weight: Relative importance of this model's opinion (0.0-1.0).

- 1.0 = full weight (default)
- 0.5 = half weight compared to other models
- Use different weights to emphasize certain models' expertise
- Total weights are automatically normalized (see the sketch below)
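The normalization mentioned above can be made concrete with a small sketch: each model's runs are averaged first, then the weights are divided by their sum so they total 1.0. The structure and names below are assumptions for illustration, not the application's actual implementation.

```python
# Illustrative sketch: combining per-model run averages with
# normalized weights. Field names are assumptions for the example.

def jury_verdict(members: list[dict]) -> float:
    """Weighted average of each model's mean score across its runs."""
    total_weight = sum(m["weight"] for m in members)
    verdict = 0.0
    for m in members:
        run_mean = sum(m["run_scores"]) / len(m["run_scores"])
        verdict += (m["weight"] / total_weight) * run_mean  # normalized weight
    return verdict

members = [
    {"model": "gpt-4", "weight": 0.6, "run_scores": [0.80, 0.76]},
    {"model": "claude-3-sonnet-20240229", "weight": 0.4, "run_scores": [0.70]},
]
# Weights already sum to 1.0 here, so normalization leaves them unchanged;
# with weights of, say, 0.6 and 0.6, each would normalize to 0.5.
print(jury_verdict(members))  # 0.748
```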
Managing Jury Members¶
Adding Models¶
- Click "Add Another AI Model"
- Configure the new model's provider, model, runs, and weight
- Each model gets a unique configuration
Removing Models¶
- Click the × button next to any jury member
- At least one jury member must remain
- Removing models reduces evaluation cost
Editing Configuration¶
- Change any parameter by clicking on dropdown menus or input fields
- Provider changes automatically update available models
- All changes are saved automatically
Best Practices¶
Jury Composition Strategies¶
Single Model (Simple Queries)¶
- Use one high-quality model (e.g., GPT-4o)
- Appropriate for straightforward factual questions
- Lowest cost, fastest execution
Multi-Model Ensemble (Complex Queries)¶
- Combine 2-4 different models
- Mix providers for diverse perspectives
- Use equal weights unless you have specific expertise requirements
Specialized Configurations¶
- Technical queries: Weight models known for technical accuracy higher
- Creative evaluations: Include diverse models with varying approaches
- Fact-checking: Use models with strong factual knowledge
Parameter Optimization¶
Runs Configuration¶
- 1 run: Standard for most queries
- 3-5 runs: When model consistency is important
- Higher runs: Diminishing returns, increased cost
Weight Distribution¶
- Equal weights: Default approach for balanced evaluation
- Expert weighting: Higher weights for models with relevant specialization
- Confidence weighting: Higher weights for more reliable models
Cost Considerations¶
Each jury member incurs evaluation costs based on:

- Model pricing: different providers have different rates
- Number of runs: cost scales linearly with runs (see the estimator sketch below)
- Query complexity: longer queries with more supporting data cost more
Cost Optimization Tips:

- Start with single models for testing
- Use iterations sparingly until you understand your needs
- Consider open-source models for cost-sensitive applications
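As a rough planning aid, the sketch below estimates a configuration's total cost under the linear-scaling assumption stated above (total calls = iterations x sum of per-model runs). The per-call prices are hypothetical placeholders; real pricing depends on the provider, model, and query length.

```python
# Rough cost estimator. Prices are hypothetical placeholders, not
# actual provider rates.

def estimate_cost(iterations: int, members: list[dict]) -> float:
    """Cost = iterations x sum over members of (runs x price per call)."""
    return iterations * sum(m["runs"] * m["price_per_call"] for m in members)

members = [
    {"model": "gpt-4", "runs": 2, "price_per_call": 0.03},    # placeholder rate
    {"model": "llama-3.1", "runs": 1, "price_per_call": 0.0}, # assumed self-hosted
]
# 2 iterations x (2 + 1) runs = 6 model calls in total.
print(estimate_cost(2, members))  # 0.12
```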
Example Configurations¶
Financial Analysis Query¶
- Model 1: OpenAI GPT-4 (Weight: 0.6, Runs: 2)
- Model 2: Anthropic Claude-3-Sonnet (Weight: 0.4, Runs: 1)
- Iterations: 2
Creative Content Evaluation¶
- Model 1: OpenAI GPT-4o (Weight: 0.4, Runs: 1)
- Model 2: Anthropic Claude-3.5-Sonnet (Weight: 0.4, Runs: 1)
- Model 3: Open-source Llama-3.1 (Weight: 0.2, Runs: 1)
- Iterations: 1
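Taken together, a jury like the "Creative Content Evaluation" example above can be thought of as plain data. The layout below is an illustrative sketch with assumed field names, not the application's actual export format.

```python
# Hypothetical data representation of the configuration above;
# field names are assumptions for illustration.
creative_eval_jury = {
    "iterations": 1,
    "members": [
        {"provider": "OpenAI", "model": "gpt-4o", "weight": 0.4, "runs": 1},
        {"provider": "Anthropic", "model": "claude-3-5-sonnet-20241022", "weight": 0.4, "runs": 1},
        {"provider": "Open-source", "model": "llama-3.1", "weight": 0.2, "runs": 1},
    ],
}
```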
Quick Fact Check¶
- Model 1: OpenAI GPT-4o (Weight: 1.0, Runs: 1)
- Iterations: 1
Navigation¶
- Back: Return to Query Definition to modify your query or supporting data
- Next (Run Query): Proceed to execution; enabled only when at least one jury member is configured
Tooltips and Help¶
Hover over the ⓘ icons next to each field for detailed explanations:

- Provider: Information about AI service providers
- Model: Details about specific AI models
- Runs: How multiple runs affect evaluation
- Weight: How model weights influence final results
- Iterations: How iterations improve result reliability
Common Configurations¶
For Beginners¶
Start with the default configuration:

- Single OpenAI GPT-4o model
- 1 run, weight 1.0
- 1 iteration
For Advanced Users¶
Experiment with:

- Multiple providers for diverse perspectives
- Different weight distributions based on query type
- Higher iterations for critical decisions
- Specialized models for domain-specific queries