# stanfordnlp/dspy
tokens: 49847/50000 57/391 files (page 2/17)
This is page 2 of 17. Use http://codebase.md/stanfordnlp/dspy?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── .internal_dspyai
│   │   ├── internals
│   │   │   ├── build-and-release.md
│   │   │   └── release-checklist.md
│   │   └── pyproject.toml
│   ├── .tmp
│   │   └── .generated-actions
│   │       └── run-pypi-publish-in-docker-container
│   │           └── action.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.yml
│   │   └── feature_request.yml
│   ├── PULL_REQUEST_TEMPLATE
│   │   └── pull_request_template.md
│   ├── workflow_scripts
│   │   └── install_testpypi_pkg.sh
│   └── workflows
│       ├── build_and_release.yml
│       ├── build_utils
│       │   └── test_version.py
│       ├── docs-push.yml
│       ├── precommits_check.yml
│       └── run_tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── docs
│   ├── .gitignore
│   ├── docs
│   │   ├── api
│   │   │   ├── adapters
│   │   │   │   ├── Adapter.md
│   │   │   │   ├── ChatAdapter.md
│   │   │   │   ├── JSONAdapter.md
│   │   │   │   └── TwoStepAdapter.md
│   │   │   ├── evaluation
│   │   │   │   ├── answer_exact_match.md
│   │   │   │   ├── answer_passage_match.md
│   │   │   │   ├── CompleteAndGrounded.md
│   │   │   │   ├── Evaluate.md
│   │   │   │   ├── EvaluationResult.md
│   │   │   │   └── SemanticF1.md
│   │   │   ├── experimental
│   │   │   │   ├── Citations.md
│   │   │   │   └── Document.md
│   │   │   ├── index.md
│   │   │   ├── models
│   │   │   │   ├── Embedder.md
│   │   │   │   └── LM.md
│   │   │   ├── modules
│   │   │   │   ├── BestOfN.md
│   │   │   │   ├── ChainOfThought.md
│   │   │   │   ├── CodeAct.md
│   │   │   │   ├── Module.md
│   │   │   │   ├── MultiChainComparison.md
│   │   │   │   ├── Parallel.md
│   │   │   │   ├── Predict.md
│   │   │   │   ├── ProgramOfThought.md
│   │   │   │   ├── ReAct.md
│   │   │   │   └── Refine.md
│   │   │   ├── optimizers
│   │   │   │   ├── BetterTogether.md
│   │   │   │   ├── BootstrapFewShot.md
│   │   │   │   ├── BootstrapFewShotWithRandomSearch.md
│   │   │   │   ├── BootstrapFinetune.md
│   │   │   │   ├── BootstrapRS.md
│   │   │   │   ├── COPRO.md
│   │   │   │   ├── Ensemble.md
│   │   │   │   ├── GEPA
│   │   │   │   │   ├── GEPA_Advanced.md
│   │   │   │   │   └── overview.md
│   │   │   │   ├── InferRules.md
│   │   │   │   ├── KNN.md
│   │   │   │   ├── KNNFewShot.md
│   │   │   │   ├── LabeledFewShot.md
│   │   │   │   ├── MIPROv2.md
│   │   │   │   └── SIMBA.md
│   │   │   ├── primitives
│   │   │   │   ├── Audio.md
│   │   │   │   ├── Code.md
│   │   │   │   ├── Example.md
│   │   │   │   ├── History.md
│   │   │   │   ├── Image.md
│   │   │   │   ├── Prediction.md
│   │   │   │   ├── Tool.md
│   │   │   │   └── ToolCalls.md
│   │   │   ├── signatures
│   │   │   │   ├── InputField.md
│   │   │   │   ├── OutputField.md
│   │   │   │   └── Signature.md
│   │   │   ├── tools
│   │   │   │   ├── ColBERTv2.md
│   │   │   │   ├── Embeddings.md
│   │   │   │   └── PythonInterpreter.md
│   │   │   └── utils
│   │   │       ├── asyncify.md
│   │   │       ├── configure_cache.md
│   │   │       ├── disable_litellm_logging.md
│   │   │       ├── disable_logging.md
│   │   │       ├── enable_litellm_logging.md
│   │   │       ├── enable_logging.md
│   │   │       ├── inspect_history.md
│   │   │       ├── load.md
│   │   │       ├── StatusMessage.md
│   │   │       ├── StatusMessageProvider.md
│   │   │       ├── streamify.md
│   │   │       └── StreamListener.md
│   │   ├── cheatsheet.md
│   │   ├── community
│   │   │   ├── community-resources.md
│   │   │   ├── how-to-contribute.md
│   │   │   └── use-cases.md
│   │   ├── deep-dive
│   │   │   └── data-handling
│   │   │       ├── built-in-datasets.md
│   │   │       ├── examples.md
│   │   │       ├── img
│   │   │       │   └── data-loading.png
│   │   │       └── loading-custom-data.md
│   │   ├── faqs.md
│   │   ├── index.md
│   │   ├── js
│   │   │   └── runllm-widget.js
│   │   ├── learn
│   │   │   ├── evaluation
│   │   │   │   ├── data.md
│   │   │   │   ├── metrics.md
│   │   │   │   └── overview.md
│   │   │   ├── figures
│   │   │   │   ├── native_tool_call.png
│   │   │   │   └── teleprompter-classes.png
│   │   │   ├── index.md
│   │   │   ├── optimization
│   │   │   │   ├── optimizers.md
│   │   │   │   └── overview.md
│   │   │   └── programming
│   │   │       ├── 7-assertions.md
│   │   │       ├── adapters.md
│   │   │       ├── language_models.md
│   │   │       ├── mcp.md
│   │   │       ├── modules.md
│   │   │       ├── overview.md
│   │   │       ├── signatures.md
│   │   │       └── tools.md
│   │   ├── production
│   │   │   └── index.md
│   │   ├── roadmap.md
│   │   ├── static
│   │   │   ├── .nojekyll
│   │   │   └── img
│   │   │       ├── dspy_logo.png
│   │   │       ├── logo.png
│   │   │       ├── mlflow-tracing-rag.png
│   │   │       ├── modular.png
│   │   │       ├── optimize.png
│   │   │       ├── undraw_docusaurus_mountain.svg
│   │   │       ├── undraw_docusaurus_react.svg
│   │   │       ├── undraw_docusaurus_tree.svg
│   │   │       └── universal_compatibility.png
│   │   ├── stylesheets
│   │   │   └── extra.css
│   │   └── tutorials
│   │       ├── agents
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-agent.png
│   │       ├── ai_text_game
│   │       │   └── index.md
│   │       ├── async
│   │       │   └── index.md
│   │       ├── audio
│   │       │   └── index.ipynb
│   │       ├── build_ai_program
│   │       │   └── index.md
│   │       ├── cache
│   │       │   └── index.md
│   │       ├── classification
│   │       │   └── index.md
│   │       ├── classification_finetuning
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-classification.png
│   │       ├── conversation_history
│   │       │   └── index.md
│   │       ├── core_development
│   │       │   └── index.md
│   │       ├── custom_module
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-custom-module.png
│   │       ├── customer_service_agent
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-customer-service-agent.png
│   │       ├── deployment
│   │       │   ├── dspy_mlflow_ui.png
│   │       │   └── index.md
│   │       ├── email_extraction
│   │       │   ├── index.md
│   │       │   └── mlflow-tracing-email-extraction.png
│   │       ├── entity_extraction
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-entity-extraction.png
│   │       ├── games
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-agent.png
│   │       ├── gepa_ai_program
│   │       │   └── index.md
│   │       ├── gepa_aime
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-aime.png
│   │       │   └── mlflow-tracking-gepa-aime-optimization.png
│   │       ├── gepa_facilitysupportanalyzer
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-support.png
│   │       │   └── mlflow-tracking-gepa-support-optimization.png
│   │       ├── gepa_papillon
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-papilon.png
│   │       │   └── mlflow-tracking-gepa-papilon-optimization.png
│   │       ├── image_generation_prompting
│   │       │   └── index.ipynb
│   │       ├── index.md
│   │       ├── llms_txt_generation
│   │       │   └── index.md
│   │       ├── math
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-math.png
│   │       ├── mcp
│   │       │   └── index.md
│   │       ├── mem0_react_agent
│   │       │   └── index.md
│   │       ├── multihop_search
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-multi-hop.png
│   │       ├── observability
│   │       │   ├── index.md
│   │       │   ├── mlflow_trace_ui_navigation.gif
│   │       │   ├── mlflow_trace_ui.png
│   │       │   └── mlflow_trace_view.png
│   │       ├── optimize_ai_program
│   │       │   └── index.md
│   │       ├── optimizer_tracking
│   │       │   ├── child_run.png
│   │       │   ├── experiment.png
│   │       │   ├── index.md
│   │       │   └── parent_run.png
│   │       ├── output_refinement
│   │       │   └── best-of-n-and-refine.md
│   │       ├── papillon
│   │       │   └── index.md
│   │       ├── program_of_thought
│   │       │   └── index.ipynb
│   │       ├── rag
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-rag.png
│   │       ├── real_world_examples
│   │       │   └── index.md
│   │       ├── rl_ai_program
│   │       │   └── index.md
│   │       ├── rl_multihop
│   │       │   └── index.ipynb
│   │       ├── rl_papillon
│   │       │   └── index.ipynb
│   │       ├── sample_code_generation
│   │       │   └── index.md
│   │       ├── saving
│   │       │   └── index.md
│   │       ├── streaming
│   │       │   └── index.md
│   │       ├── tool_use
│   │       │   └── index.ipynb
│   │       └── yahoo_finance_react
│   │           └── index.md
│   ├── mkdocs.yml
│   ├── overrides
│   │   ├── home.html
│   │   ├── main.html
│   │   └── partials
│   │       └── tabs.html
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── requirements.txt
│   ├── scripts
│   │   ├── generate_api_docs.py
│   │   └── generate_api_summary.py
│   └── vercel.json
├── dspy
│   ├── __init__.py
│   ├── __metadata__.py
│   ├── adapters
│   │   ├── __init__.py
│   │   ├── baml_adapter.py
│   │   ├── base.py
│   │   ├── chat_adapter.py
│   │   ├── json_adapter.py
│   │   ├── two_step_adapter.py
│   │   ├── types
│   │   │   ├── __init__.py
│   │   │   ├── audio.py
│   │   │   ├── base_type.py
│   │   │   ├── citation.py
│   │   │   ├── code.py
│   │   │   ├── document.py
│   │   │   ├── history.py
│   │   │   ├── image.py
│   │   │   └── tool.py
│   │   ├── utils.py
│   │   └── xml_adapter.py
│   ├── clients
│   │   ├── __init__.py
│   │   ├── base_lm.py
│   │   ├── cache.py
│   │   ├── databricks.py
│   │   ├── embedding.py
│   │   ├── lm_local_arbor.py
│   │   ├── lm_local.py
│   │   ├── lm.py
│   │   ├── openai.py
│   │   ├── provider.py
│   │   └── utils_finetune.py
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── alfworld
│   │   │   ├── __init__.py
│   │   │   ├── alfworld.py
│   │   │   └── base_config.yml
│   │   ├── colors.py
│   │   ├── dataloader.py
│   │   ├── dataset.py
│   │   ├── gsm8k.py
│   │   ├── hotpotqa.py
│   │   └── math.py
│   ├── dsp
│   │   ├── __init__.py
│   │   ├── colbertv2.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── dpr.py
│   │       ├── settings.py
│   │       └── utils.py
│   ├── evaluate
│   │   ├── __init__.py
│   │   ├── auto_evaluation.py
│   │   ├── evaluate.py
│   │   └── metrics.py
│   ├── experimental
│   │   └── __init__.py
│   ├── predict
│   │   ├── __init__.py
│   │   ├── aggregation.py
│   │   ├── avatar
│   │   │   ├── __init__.py
│   │   │   ├── avatar.py
│   │   │   ├── models.py
│   │   │   └── signatures.py
│   │   ├── best_of_n.py
│   │   ├── chain_of_thought.py
│   │   ├── code_act.py
│   │   ├── knn.py
│   │   ├── multi_chain_comparison.py
│   │   ├── parallel.py
│   │   ├── parameter.py
│   │   ├── predict.py
│   │   ├── program_of_thought.py
│   │   ├── react.py
│   │   ├── refine.py
│   │   └── retry.py
│   ├── primitives
│   │   ├── __init__.py
│   │   ├── base_module.py
│   │   ├── example.py
│   │   ├── module.py
│   │   ├── prediction.py
│   │   ├── python_interpreter.py
│   │   └── runner.js
│   ├── propose
│   │   ├── __init__.py
│   │   ├── dataset_summary_generator.py
│   │   ├── grounded_proposer.py
│   │   ├── propose_base.py
│   │   └── utils.py
│   ├── retrievers
│   │   ├── __init__.py
│   │   ├── databricks_rm.py
│   │   ├── embeddings.py
│   │   ├── retrieve.py
│   │   └── weaviate_rm.py
│   ├── signatures
│   │   ├── __init__.py
│   │   ├── field.py
│   │   ├── signature.py
│   │   └── utils.py
│   ├── streaming
│   │   ├── __init__.py
│   │   ├── messages.py
│   │   ├── streamify.py
│   │   └── streaming_listener.py
│   ├── teleprompt
│   │   ├── __init__.py
│   │   ├── avatar_optimizer.py
│   │   ├── bettertogether.py
│   │   ├── bootstrap_finetune.py
│   │   ├── bootstrap_trace.py
│   │   ├── bootstrap.py
│   │   ├── copro_optimizer.py
│   │   ├── ensemble.py
│   │   ├── gepa
│   │   │   ├── __init__.py
│   │   │   ├── gepa_utils.py
│   │   │   ├── gepa.py
│   │   │   └── instruction_proposal.py
│   │   ├── grpo.py
│   │   ├── infer_rules.py
│   │   ├── knn_fewshot.py
│   │   ├── mipro_optimizer_v2.py
│   │   ├── random_search.py
│   │   ├── signature_opt.py
│   │   ├── simba_utils.py
│   │   ├── simba.py
│   │   ├── teleprompt_optuna.py
│   │   ├── teleprompt.py
│   │   ├── utils.py
│   │   └── vanilla.py
│   └── utils
│       ├── __init__.py
│       ├── annotation.py
│       ├── asyncify.py
│       ├── caching.py
│       ├── callback.py
│       ├── dummies.py
│       ├── exceptions.py
│       ├── hasher.py
│       ├── inspect_history.py
│       ├── langchain_tool.py
│       ├── logging_utils.py
│       ├── mcp.py
│       ├── parallelizer.py
│       ├── saving.py
│       ├── syncify.py
│       ├── unbatchify.py
│       └── usage_tracker.py
├── LICENSE
├── pyproject.toml
├── README.md
├── tests
│   ├── __init__.py
│   ├── adapters
│   │   ├── test_adapter_utils.py
│   │   ├── test_baml_adapter.py
│   │   ├── test_base_type.py
│   │   ├── test_chat_adapter.py
│   │   ├── test_citation.py
│   │   ├── test_code.py
│   │   ├── test_document.py
│   │   ├── test_json_adapter.py
│   │   ├── test_tool.py
│   │   ├── test_two_step_adapter.py
│   │   └── test_xml_adapter.py
│   ├── callback
│   │   └── test_callback.py
│   ├── clients
│   │   ├── test_cache.py
│   │   ├── test_databricks.py
│   │   ├── test_embedding.py
│   │   ├── test_inspect_global_history.py
│   │   └── test_lm.py
│   ├── conftest.py
│   ├── datasets
│   │   └── test_dataset.py
│   ├── docs
│   │   └── test_mkdocs_links.py
│   ├── evaluate
│   │   ├── test_evaluate.py
│   │   └── test_metrics.py
│   ├── examples
│   │   └── test_baleen.py
│   ├── metadata
│   │   └── test_metadata.py
│   ├── predict
│   │   ├── test_aggregation.py
│   │   ├── test_best_of_n.py
│   │   ├── test_chain_of_thought.py
│   │   ├── test_code_act.py
│   │   ├── test_knn.py
│   │   ├── test_multi_chain_comparison.py
│   │   ├── test_parallel.py
│   │   ├── test_predict.py
│   │   ├── test_program_of_thought.py
│   │   ├── test_react.py
│   │   ├── test_refine.py
│   │   └── test_retry.py
│   ├── primitives
│   │   ├── resources
│   │   │   └── saved_program.json
│   │   ├── test_base_module.py
│   │   ├── test_example.py
│   │   ├── test_module.py
│   │   └── test_python_interpreter.py
│   ├── propose
│   │   └── test_grounded_proposer.py
│   ├── README.md
│   ├── reliability
│   │   ├── __init__.py
│   │   ├── complex_types
│   │   │   └── generated
│   │   │       ├── test_many_types_1
│   │   │       │   ├── inputs
│   │   │       │   │   ├── input1.json
│   │   │       │   │   └── input2.json
│   │   │       │   ├── program.py
│   │   │       │   └── schema.json
│   │   │       ├── test_nesting_1
│   │   │       │   ├── inputs
│   │   │       │   │   ├── input1.json
│   │   │       │   │   └── input2.json
│   │   │       │   ├── program.py
│   │   │       │   └── schema.json
│   │   │       └── test_nesting_2
│   │   │           ├── inputs
│   │   │           │   └── input1.json
│   │   │           ├── program.py
│   │   │           └── schema.json
│   │   ├── conftest.py
│   │   ├── generate
│   │   │   ├── __init__.py
│   │   │   ├── __main__.py
│   │   │   └── utils.py
│   │   ├── input_formats
│   │   │   └── generated
│   │   │       └── test_markdown_1
│   │   │           ├── inputs
│   │   │           │   ├── input1.json
│   │   │           │   └── input2.json
│   │   │           ├── program.py
│   │   │           └── schema.json
│   │   ├── README.md
│   │   ├── reliability_conf.yaml
│   │   ├── test_generated.py
│   │   ├── test_pydantic_models.py
│   │   └── utils.py
│   ├── retrievers
│   │   └── test_embeddings.py
│   ├── signatures
│   │   ├── test_adapter_image.py
│   │   ├── test_custom_types.py
│   │   └── test_signature.py
│   ├── streaming
│   │   └── test_streaming.py
│   ├── teleprompt
│   │   ├── gepa_dummy_lm_custom_component_selector_custom_instruction_proposer.json
│   │   ├── gepa_dummy_lm.json
│   │   ├── test_bootstrap_finetune.py
│   │   ├── test_bootstrap_trace.py
│   │   ├── test_bootstrap.py
│   │   ├── test_copro_optimizer.py
│   │   ├── test_ensemble.py
│   │   ├── test_finetune.py
│   │   ├── test_gepa_instruction_proposer.py
│   │   ├── test_gepa.py
│   │   ├── test_grpo.py
│   │   ├── test_knn_fewshot.py
│   │   ├── test_random_search.py
│   │   ├── test_teleprompt.py
│   │   └── test_utils.py
│   ├── test_utils
│   │   ├── __init__.py
│   │   └── server
│   │       ├── __init__.py
│   │       ├── litellm_server_config.yaml
│   │       └── litellm_server.py
│   └── utils
│       ├── __init__.py
│       ├── resources
│       │   └── mcp_server.py
│       ├── test_annotation.py
│       ├── test_asyncify.py
│       ├── test_exceptions.py
│       ├── test_langchain_tool.py
│       ├── test_mcp.py
│       ├── test_parallelizer.py
│       ├── test_saving.py
│       ├── test_settings.py
│       ├── test_syncify.py
│       ├── test_unbatchify.py
│       └── test_usage_tracker.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/tests/utils/test_syncify.py:
--------------------------------------------------------------------------------

```python
 1 | import asyncio
 2 | 
 3 | import dspy
 4 | 
 5 | 
 6 | def test_syncify_in_place():
 7 |     class MyProgram(dspy.Module):
 8 |         async def aforward(self, x: int) -> int:
 9 |             await asyncio.sleep(0.01)
10 |             return x + 1
11 | 
12 |     sync_program = dspy.syncify(MyProgram())
13 |     assert sync_program(1) == 2
14 |     assert sync_program(2) == 3
15 | 
16 | 
17 | def test_syncify_with_wrapper():
18 |     class MyProgram(dspy.Module):
19 |         async def aforward(self, x: int) -> int:
20 |             await asyncio.sleep(0.01)
21 |             return x + 1
22 | 
23 |     sync_program = dspy.syncify(MyProgram(), in_place=False)
24 |     assert sync_program(1) == 2
25 |     assert sync_program(2) == 3
26 | 
27 | 
28 | def test_syncify_works_with_optimizers():
29 |     class MyProgram(dspy.Module):
30 |         def __init__(self):
31 |             self.predict = dspy.Predict("question->answer")
32 | 
33 |         async def aforward(self, question: str):
34 |             return await self.predict.acall(question=question)
35 | 
36 |     async_program = MyProgram()
37 | 
38 |     def dummy_metric(gold, pred, traces=None):
39 |         return True
40 | 
41 |     # We only test the optimizer completes without errors, so the LM response doesn't matter.
42 |     lm = dspy.utils.DummyLM([{"answer": "dummy"} for _ in range(100)])
43 |     dspy.configure(lm=lm)
44 | 
45 |     dataset = [dspy.Example(question="question", answer="answer").with_inputs("question") for _ in range(10)]
46 | 
47 |     optimizer = dspy.BootstrapFewShot(metric=dummy_metric, max_bootstrapped_demos=2, max_labeled_demos=0)
48 | 
49 |     # Test syncify in place
50 |     sync_program = dspy.syncify(async_program, in_place=True)
51 |     optimized_program = optimizer.compile(sync_program, trainset=dataset)
52 |     assert len(optimized_program.predictors()[0].demos) == 2
53 | 
54 |     # Test syncify with wrapper
55 |     sync_program = dspy.syncify(async_program, in_place=False)
56 |     optimized_program = optimizer.compile(sync_program, trainset=dataset)
57 |     assert len(optimized_program.predictors()[0].demos) == 2
58 | 
```

--------------------------------------------------------------------------------
/dspy/predict/multi_chain_comparison.py:
--------------------------------------------------------------------------------

```python
 1 | from dspy.predict.predict import Predict
 2 | from dspy.primitives.module import Module
 3 | from dspy.signatures import InputField, OutputField
 4 | from dspy.signatures.signature import ensure_signature
 5 | 
 6 | 
 7 | class MultiChainComparison(Module):
 8 |     def __init__(self, signature, M=3, temperature=0.7, **config):  # noqa: N803
 9 |         super().__init__()
10 | 
11 |         self.M = M
12 |         signature = ensure_signature(signature)
13 | 
14 |         *_, self.last_key = signature.output_fields.keys()
15 | 
16 |         for idx in range(M):
17 |             signature = signature.append(
18 |                 f"reasoning_attempt_{idx+1}",
19 |                 InputField(
20 |                     prefix=f"Student Attempt #{idx+1}:",
21 |                     desc="${reasoning attempt}",
22 |                 ),
23 |             )
24 | 
25 |         signature = signature.prepend(
26 |             "rationale",
27 |             OutputField(
28 |                 prefix="Accurate Reasoning: Thank you everyone. Let's now holistically",
29 |                 desc="${corrected reasoning}",
30 |             ),
31 |         )
32 | 
33 |         self.predict = Predict(signature, temperature=temperature, **config)
34 | 
35 |     def forward(self, completions, **kwargs):
36 |         attempts = []
37 | 
38 |         for c in completions:
39 |             rationale = c.get("rationale", c.get("reasoning")).strip().split("\n")[0].strip()
40 |             answer = str(c[self.last_key]).strip().split("\n")[0].strip()
41 |             attempts.append(
42 |                 f"«I'm trying to {rationale} I'm not sure but my prediction is {answer}»",
43 |             )
44 | 
45 |         assert (
46 |             len(attempts) == self.M
47 |         ), f"The number of attempts ({len(attempts)}) doesn't match the expected number M ({self.M}). Please set the correct value for M when initializing MultiChainComparison."
48 | 
49 |         kwargs = {
50 |             **{f"reasoning_attempt_{idx+1}": attempt for idx, attempt in enumerate(attempts)},
51 |             **kwargs,
52 |         }
53 |         return self.predict(**kwargs)
54 | 
```
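
A minimal usage sketch (not part of the repository): `MultiChainComparison` expects `M` completion objects that expose a `rationale` (or `reasoning`) field plus the signature's final output field. The completions below are built by hand for illustration; in practice they typically come from running a `ChainOfThought` module several times.

```python
import dspy

# Requires an LM to be configured first, e.g.:
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name is illustrative

compare = dspy.MultiChainComparison("question -> answer", M=3)

# Hand-built reasoning attempts; each carries a rationale and the final answer field.
completions = [
    dspy.Prediction(rationale="add 40 and 2 to get 42.", answer="42"),
    dspy.Prediction(rationale="multiply 6 by 7, which gives 42.", answer="42"),
    dspy.Prediction(rationale="guess 41 without checking.", answer="41"),
]

final = compare(completions, question="What is 6 * 7?")
print(final.answer)
```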

--------------------------------------------------------------------------------
/dspy/datasets/math.py:
--------------------------------------------------------------------------------

```python
 1 | import random
 2 | import re
 3 | 
 4 | 
 5 | class MATH:
 6 |     def __init__(self, subset):
 7 |         from datasets import load_dataset
 8 | 
 9 |         import dspy
10 | 
11 |         ds = load_dataset("DigitalLearningGmbH/MATH-lighteval", subset)
12 | 
13 |         # NOTE: Defaults to sub-splitting MATH's 'test' split into train/dev/test, presuming that current
14 |         # LMs are trained on MATH's train. Makes no difference for gpt-4o-mini, but might for other models.
15 | 
16 |         dataset = [
17 |             dspy.Example(
18 |                 question=example["problem"], reasoning=example["solution"], answer=extract_answer(example["solution"])
19 |             ).with_inputs("question")
20 |             for example in ds["test"]
21 |         ]
22 | 
23 |         size = min(350, len(dataset) // 3)
24 |         random.Random(0).shuffle(dataset)
25 |         self.train, self.dev, self.test = dataset[:size], dataset[size : 2 * size], dataset[2 * size :]
26 | 
27 |     def metric(self, example, pred, trace=None):
28 |         try:
29 |             import math_equivalence
30 |         except ImportError:
31 |             raise ImportError("MATH's metric requires `pip install git+https://github.com/hendrycks/math.git`")
32 | 
33 |         return math_equivalence.is_equiv(example.answer, pred.answer)
34 | 
35 | 
36 | def extract_answer(s):
37 |     start = s.find("\\boxed{")
38 |     if start == -1:
39 |         return None
40 | 
41 |     idx = start + len("\\boxed{")
42 |     brace_level = 1
43 | 
44 |     answer = ""
45 |     while idx < len(s) and brace_level > 0:
46 |         c = s[idx]
47 |         if c == "{":
48 |             brace_level += 1
49 |         elif c == "}":
50 |             brace_level -= 1
51 |             if brace_level == 0:
52 |                 break
53 |         answer += c
54 |         idx += 1
55 | 
56 |     answer = re.sub(r"\\text\{[^}]*\}", "", answer)
57 |     answer = re.sub(r"\\!", "", answer)
58 |     return answer.strip()
59 | 
60 | 
61 | """
62 | NOTE: MATH's official math_equivalence.is_equiv does not seem to have perfect recall.
63 | Consider its behavior on reference values like `left[\frac{1}{2}, \frac{4}{3}\right]`.
64 | """
65 | 
```
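
A usage sketch, assuming the `datasets` package is installed and that `"algebra"` is one of the MATH-lighteval subset names:

```python
from dspy.datasets.math import MATH

math_ds = MATH(subset="algebra")
print(len(math_ds.train), len(math_ds.dev), len(math_ds.test))

example = math_ds.train[0]
print(example.question)
print(example.answer)  # extracted from the \boxed{...} span of the reference solution

# math_ds.metric(example, prediction) additionally requires:
# pip install git+https://github.com/hendrycks/math.git
```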

--------------------------------------------------------------------------------
/tests/teleprompt/test_ensemble.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.teleprompt import Ensemble
 5 | 
 6 | 
 7 | class MockProgram(dspy.Module):
 8 |     def __init__(self, output):
 9 |         super().__init__()
10 |         self.output = output
11 | 
12 |     def forward(self, *args, **kwargs):
13 |         return self.output
14 | 
15 | 
16 | # Simple reduction function to test with
17 | def mock_reduce_fn(outputs):
18 |     return sum(outputs) / len(outputs)
19 | 
20 | 
21 | def test_ensemble_without_reduction():
22 |     """Test that Ensemble correctly combines outputs without applying a reduce_fn."""
23 |     programs = [MockProgram(i) for i in range(5)]
24 |     ensemble = Ensemble()
25 |     ensembled_program = ensemble.compile(programs)
26 | 
27 |     outputs = ensembled_program()
28 |     assert len(outputs) == 5, "Ensemble did not combine the correct number of outputs"
29 | 
30 | 
31 | def test_ensemble_with_reduction():
32 |     """Test that Ensemble correctly applies a reduce_fn to combine outputs."""
33 |     programs = [MockProgram(i) for i in range(5)]
34 |     ensemble = Ensemble(reduce_fn=mock_reduce_fn)
35 |     ensembled_program = ensemble.compile(programs)
36 | 
37 |     output = ensembled_program()
38 |     expected_output = sum(range(5)) / 5
39 |     assert output == expected_output, "Ensemble did not correctly apply the reduce_fn"
40 | 
41 | 
42 | def test_ensemble_with_size_limitation():
43 |     """Test that specifying a size limits the number of programs used in the ensemble."""
44 |     programs = [MockProgram(i) for i in range(10)]
45 |     ensemble_size = 3
46 |     ensemble = Ensemble(size=ensemble_size)
47 |     ensembled_program = ensemble.compile(programs)
48 | 
49 |     outputs = ensembled_program()
50 |     assert len(outputs) == ensemble_size, "Ensemble did not respect the specified size limitation"
51 | 
52 | 
53 | def test_ensemble_deterministic_behavior():
54 |     """Verify that the Ensemble class raises an assertion for deterministic behavior."""
55 |     with pytest.raises(
56 |         AssertionError,
57 |         match="TODO: Implement example hashing for deterministic ensemble.",
58 |     ):
59 |         Ensemble(deterministic=True)
60 | 
```

--------------------------------------------------------------------------------
/tests/teleprompt/test_grpo.py:
--------------------------------------------------------------------------------

```python
 1 | from dspy.teleprompt.grpo import GRPO
 2 | 
 3 | 
 4 | def test_grpo_dataset_shuffler():
 5 |     dataset = [1, 2, 3]
 6 |     grpo = GRPO(
 7 |         num_dspy_examples_per_grpo_step=3,
 8 |         exclude_demos=True,
 9 |     )
10 | 
11 |     trainset_instances = []
12 |     for i in range(4):
13 |         trainset_instances.append(grpo.select_training_sample_and_update_shuffled_trainset(dataset, i))
14 |         assert len(trainset_instances[-1]) == 3
15 |         assert set(trainset_instances[-1]) == set(dataset)
16 | 
17 | 
18 | def test_grpo_dataset_shuffler_with_num_ex_per_step_less_dataset():
19 |     dataset = [1, 2, 3]
20 |     grpo = GRPO(
21 |         num_dspy_examples_per_grpo_step=2,
22 |         exclude_demos=True,
23 |     )
24 | 
25 |     trainset_instances = []
26 |     for i in range(15):
27 |         trainset_instances.append(grpo.select_training_sample_and_update_shuffled_trainset(dataset, i))
28 |         assert len(trainset_instances[-1]) == 2
29 | 
30 |     from collections import Counter
31 | 
32 |     counter = Counter()
33 |     for instance in trainset_instances:
34 |         counter.update(instance)
35 | 
36 |     assert len(counter) == 3
37 |     for i in counter:
38 |         assert counter[i] == 10
39 | 
40 | 
41 | def test_grpo_dataset_shuffler_with_num_ex_per_step_greater_dataset():
42 |     dataset = [1, 2, 3]
43 |     grpo = GRPO(
44 |         num_dspy_examples_per_grpo_step=5,
45 |         exclude_demos=True,
46 |     )
47 | 
48 |     trainset_instances = []
49 |     for i in range(6):
50 |         trainset_instances.append(grpo.select_training_sample_and_update_shuffled_trainset(dataset, i))
51 |         assert len(trainset_instances[-1]) == 5
52 | 
53 |     from collections import Counter
54 | 
55 |     counter = Counter()
56 |     for instance in trainset_instances:
57 |         counter.update(instance)
58 | 
59 |     assert len(counter) == 3
60 |     for i in counter:
61 |         assert counter[i] == 10
62 | 
63 | 
64 | if __name__ == "__main__":
65 |     test_grpo_dataset_shuffler()
66 |     test_grpo_dataset_shuffler_with_num_ex_per_step_less_dataset()
67 |     test_grpo_dataset_shuffler_with_num_ex_per_step_greater_dataset()
68 |     print("All tests passed!")
69 | 
```

--------------------------------------------------------------------------------
/dspy/predict/knn.py:
--------------------------------------------------------------------------------

```python
 1 | import numpy as np
 2 | 
 3 | from dspy.clients import Embedder
 4 | from dspy.primitives import Example
 5 | 
 6 | 
 7 | class KNN:
 8 |     def __init__(self, k: int, trainset: list[Example], vectorizer: Embedder):
 9 |         """
10 |         A k-nearest neighbors retriever that finds similar examples from a training set.
11 | 
12 |         Args:
13 |             k: Number of nearest neighbors to retrieve
14 |             trainset: List of training examples to search through
15 |             vectorizer: The `Embedder` to use for vectorization
16 | 
17 |         Example:
18 |             ```python
19 |             import dspy
20 |             from sentence_transformers import SentenceTransformer
21 | 
22 |             # Create a training dataset with examples
23 |             trainset = [
24 |                 dspy.Example(input="hello", output="world"),
25 |                 # ... more examples ...
26 |             ]
27 | 
28 |             # Initialize KNN with a sentence transformer model
29 |             knn = KNN(
30 |                 k=3,
31 |                 trainset=trainset,
32 |                 vectorizer=dspy.Embedder(SentenceTransformer("all-MiniLM-L6-v2").encode)
33 |             )
34 | 
35 |             # Find similar examples
36 |             similar_examples = knn(input="hello")
37 |             ```
38 |         """
39 |         self.k = k
40 |         self.trainset = trainset
41 |         self.embedding = vectorizer
42 |         trainset_casted_to_vectorize = [
43 |             " | ".join([f"{key}: {value}" for key, value in example.items() if key in example._input_keys])
44 |             for example in self.trainset
45 |         ]
46 |         self.trainset_vectors = self.embedding(trainset_casted_to_vectorize).astype(np.float32)
47 | 
48 |     def __call__(self, **kwargs) -> list:
49 |         input_example_vector = self.embedding([" | ".join([f"{key}: {val}" for key, val in kwargs.items()])])
50 |         scores = np.dot(self.trainset_vectors, input_example_vector.T).squeeze()
51 |         nearest_samples_idxs = scores.argsort()[-self.k :][::-1]
52 |         return [self.trainset[cur_idx] for cur_idx in nearest_samples_idxs]
53 | 
```

--------------------------------------------------------------------------------
/dspy/utils/syncify.py:
--------------------------------------------------------------------------------

```python
 1 | import asyncio
 2 | from types import MethodType
 3 | from typing import TYPE_CHECKING
 4 | 
 5 | if TYPE_CHECKING:
 6 |     from dspy.primitives.module import Module
 7 | 
 8 | 
 9 | def run_async(coro):
10 |     """Run an async coroutine from a synchronous context."""
11 |     try:
12 |         loop = asyncio.get_running_loop()
13 |     except RuntimeError:
14 |         loop = None
15 | 
16 |     if loop and loop.is_running():
17 |         # If we're in a running event loop (e.g., Jupyter), use asyncio.create_task and run until done
18 |         import nest_asyncio
19 | 
20 |         nest_asyncio.apply()
21 |         return asyncio.get_event_loop().run_until_complete(coro)
22 |     else:
23 |         return asyncio.run(coro)
24 | 
25 | 
26 | def syncify(program: "Module", in_place: bool = True) -> "Module":
27 |     """Convert an async DSPy module to a sync program.
28 | 
29 |     There are two modes of this function:
30 | 
31 |     - `in_place=True` (recommended): Modify the module in place. But this may not work if you already have a `forward`
32 |         method which does different things from `aforward`.
33 |     - `in_place=False`: Return a wrapper module. This changes the module's architecture, but it's more robust.
34 | 
35 |     Args:
36 |         program: The async program to convert, must have an `aforward` method implemented.
37 |         in_place: If True, modify the module in place. Otherwise, return a wrapper module.
38 | 
39 |     Returns:
40 |         The sync program, which has a `forward` method that can be called from a synchronous context.
41 |     """
42 |     if in_place:
43 | 
44 |         def forward(self, *args, **kwargs):
45 |             return run_async(self.aforward(*args, **kwargs))
46 | 
47 |         # Create the `forward` method in place.
48 |         program.forward = MethodType(forward, program)
49 |         return program
50 |     else:
51 |         from dspy.primitives.module import Module
52 | 
53 |         class SyncWrapper(Module):
54 |             def __init__(self, program: "Module"):
55 |                 self.program = program
56 | 
57 |             def forward(self, *args, **kwargs):
58 |                 return run_async(self.program.aforward(*args, **kwargs))
59 | 
60 |         return SyncWrapper(program)
61 | 
```

--------------------------------------------------------------------------------
/tests/utils/test_asyncify.py:
--------------------------------------------------------------------------------

```python
 1 | import asyncio
 2 | import math
 3 | from time import sleep, time
 4 | 
 5 | import pytest
 6 | 
 7 | import dspy
 8 | from dspy.utils.asyncify import get_limiter
 9 | 
10 | 
11 | @pytest.mark.anyio
12 | async def test_async_limiter():
13 |     limiter = get_limiter()
14 |     assert limiter.total_tokens == 8, "Default async capacity should be 8"
15 |     assert get_limiter() == limiter, "AsyncLimiter should be a singleton"
16 | 
17 |     with dspy.context(async_max_workers=16):
18 |         assert get_limiter() == limiter, "AsyncLimiter should be a singleton"
19 |         assert get_limiter().total_tokens == 16, "Async capacity should be 16"
20 |         assert get_limiter() == get_limiter(), "AsyncLimiter should be a singleton"
21 | 
22 | 
23 | @pytest.mark.anyio
24 | async def test_asyncify():
25 |     def the_answer_to_life_the_universe_and_everything(wait: float):
26 |         sleep(wait)
27 |         return 42
28 | 
29 |     ask_the_question = dspy.asyncify(the_answer_to_life_the_universe_and_everything)
30 | 
31 |     async def run_n_tasks(n: int, wait: float):
32 |         await asyncio.gather(*[ask_the_question(wait) for _ in range(n)])
33 | 
34 |     async def verify_asyncify(capacity: int, number_of_tasks: int, wait: float = 0.5):
35 |         with dspy.context(async_max_workers=capacity):
36 |             start = time()
37 |             await run_n_tasks(number_of_tasks, wait)
38 |             end = time()
39 |             total_time = end - start
40 | 
41 |         # If asyncify is working correctly, the total time should be less than the total number of loops
42 |         # `(number_of_tasks / capacity)` times wait time, plus the computational overhead. The lower bound should
43 |         # be `math.floor(number_of_tasks * 1.0 / capacity) * wait` because there are more than
44 |         # `math.floor(number_of_tasks * 1.0 / capacity)` loops.
45 |         lower_bound = math.floor(number_of_tasks * 1.0 / capacity) * wait
46 |         upper_bound = math.ceil(number_of_tasks * 1.0 / capacity) * wait + 2 * wait  # 2*wait for buffer
47 | 
48 |         assert lower_bound < total_time < upper_bound
49 | 
50 |     await verify_asyncify(4, 10)
51 |     await verify_asyncify(8, 15)
52 |     await verify_asyncify(8, 30)
53 | 
```

--------------------------------------------------------------------------------
/tests/reliability/input_formats/generated/test_markdown_1/inputs/input1.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "assertions": [
 3 |     "Each top-level heading (indicated by `#`) should appear as a top-level entry in the TOC.",
 4 |     "Each second-level heading (indicated by `##`) should be nested under the appropriate top-level heading in the TOC.",
 5 |     "Each third-level heading (indicated by `###`) should be nested under the appropriate second-level heading in the TOC.",
 6 |     "Each entry in the TOC should be linked to the corresponding section in the document, using markdown link syntax."
 7 |   ],
 8 |   "input": {
 9 |     "markdown_content": "# The American Space Program\n\nThe American space program has a rich history of exploration and discovery.\n\n## Early Beginnings\n\nThe journey began in the late 1950s with the launch of the first artificial satellite.\n\n### The Space Race\n\nThe competition between the United States and the Soviet Union led to rapid advancements in space technology.\n\n## Moon Landing\n\nIn 1969, NASA successfully landed the first humans on the moon.\n\n### Apollo Missions\n\nThe Apollo missions were a series of spaceflights that landed humans on the moon and brought them back safely.\n\n## Space Shuttle Era\n\nThe development of the Space Shuttle program marked a new era in space exploration.\n\n### Reusable Spacecraft\n\nThe Space Shuttle was the first reusable spacecraft, capable of multiple missions.\n\n## International Space Station\n\nThe International Space Station (ISS) is a collaborative effort between multiple countries.\n\n### Living in Space\n\nAstronauts live and work on the ISS for extended periods, conducting scientific research.\n\n## Future Missions\n\nNASA continues to plan for future missions to Mars and beyond.\n\n### Mars Exploration\n\nExploration of Mars is a key objective for NASA's future missions.\n\n### Beyond Mars\n\nThe ultimate goal is to explore beyond Mars and into the outer reaches of the solar system.\n\n## Conclusion\n\nThe American space program has achieved many milestones and continues to push the boundaries of space exploration."
10 |   }
11 | }
12 | 
```

--------------------------------------------------------------------------------
/tests/predict/test_knn.py:
--------------------------------------------------------------------------------

```python
 1 | import numpy as np
 2 | import pytest
 3 | 
 4 | import dspy
 5 | from dspy.predict import KNN
 6 | from dspy.utils import DummyVectorizer
 7 | 
 8 | 
 9 | def mock_example(question: str, answer: str) -> dspy.Example:
10 |     """Creates a mock DSP example with specified question and answer."""
11 |     return dspy.Example(question=question, answer=answer).with_inputs("question")
12 | 
13 | 
14 | @pytest.fixture
15 | def setup_knn() -> KNN:
16 |     """Sets up a KNN instance with a mocked vectorizer for testing."""
17 |     trainset = [
18 |         mock_example("What is the capital of France?", "Paris"),
19 |         mock_example("What is the largest ocean?", "Pacific"),
20 |         mock_example("What is 2+2?", "4"),
21 |     ]
22 |     return KNN(k=2, trainset=trainset, vectorizer=dspy.Embedder(DummyVectorizer()))
23 | 
24 | 
25 | def test_knn_initialization(setup_knn):
26 |     """Tests the KNN initialization and checks if the trainset vectors are correctly created."""
27 |     knn = setup_knn
28 |     assert knn.k == 2, "Incorrect k value"
29 |     assert len(knn.trainset_vectors) == 3, "Incorrect size of trainset vectors"
30 |     assert isinstance(knn.trainset_vectors, np.ndarray), "Trainset vectors should be a NumPy array"
31 | 
32 | 
33 | def test_knn_query(setup_knn):
34 |     """Tests the KNN query functionality for retrieving the nearest neighbors."""
35 |     knn = setup_knn
36 |     query = {"question": "What is 3+3?"}  # A query close to "What is 2+2?"
37 |     nearest_samples = knn(**query)
38 |     assert len(nearest_samples) == 2, "Incorrect number of nearest samples returned"
39 |     assert nearest_samples[0].answer == "4", "Incorrect nearest sample returned"
40 | 
41 | 
42 | def test_knn_query_specificity(setup_knn):
43 |     """Tests the KNN query functionality for specificity of returned examples."""
44 |     knn = setup_knn
45 |     query = {"question": "What is the capital of Germany?"}  # A query close to "What is the capital of France?"
46 |     nearest_samples = knn(**query)
47 |     assert len(nearest_samples) == 2, "Incorrect number of nearest samples returned"
48 |     assert "Paris" in [sample.answer for sample in nearest_samples], "Expected Paris to be a nearest sample answer"
49 | 
```

--------------------------------------------------------------------------------
/dspy/retrievers/retrieve.py:
--------------------------------------------------------------------------------

```python
 1 | import random
 2 | 
 3 | from dspy.predict.parameter import Parameter
 4 | from dspy.primitives.prediction import Prediction
 5 | from dspy.utils.callback import with_callbacks
 6 | 
 7 | 
 8 | def single_query_passage(passages):
 9 |     passages_dict = {key: [] for key in list(passages[0].keys())}
10 |     for docs in passages:
11 |         for key, value in docs.items():
12 |             passages_dict[key].append(value)
13 |     if "long_text" in passages_dict:
14 |         passages_dict["passages"] = passages_dict.pop("long_text")
15 |     return Prediction(**passages_dict)
16 | 
17 | 
18 | class Retrieve(Parameter):
19 |     name = "Search"
20 |     input_variable = "query"
21 |     desc = "takes a search query and returns one or more potentially relevant passages from a corpus"
22 | 
23 |     def __init__(self, k=3, callbacks=None):
24 |         self.stage = random.randbytes(8).hex()
25 |         self.k = k
26 |         self.callbacks = callbacks or []
27 | 
28 |     def reset(self):
29 |         pass
30 | 
31 |     def dump_state(self):
32 |         state_keys = ["k"]
33 |         return {k: getattr(self, k) for k in state_keys}
34 | 
35 |     def load_state(self, state):
36 |         for name, value in state.items():
37 |             setattr(self, name, value)
38 | 
39 |     @with_callbacks
40 |     def __call__(self, *args, **kwargs):
41 |         return self.forward(*args, **kwargs)
42 | 
43 |     def forward(
44 |         self,
45 |         query: str,
46 |         k: int | None = None,
47 |         **kwargs,
48 |     ) -> list[str] | Prediction | list[Prediction]:
49 |         k = k if k is not None else self.k
50 | 
51 |         import dspy
52 | 
53 |         if not dspy.settings.rm:
54 |             raise AssertionError("No RM is loaded.")
55 | 
56 |         passages = dspy.settings.rm(query, k=k, **kwargs)
57 | 
58 |         from collections.abc import Iterable
59 |         if not isinstance(passages, Iterable):
60 |             # it's not an iterable yet; make it one.
61 |             # TODO: we should unify the type signatures of dspy.Retriever
62 |             passages = [passages]
63 |         passages = [psg.long_text for psg in passages]
64 | 
65 |         return Prediction(passages=passages)
66 | 
67 | # TODO: Consider doing Prediction.from_completions with the individual sets of passages (per query) too.
68 | 
```
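
A usage sketch: `Retrieve` delegates to whatever retrieval model is configured as `dspy.settings.rm`. The ColBERTv2 endpoint URL below is illustrative only.

```python
import dspy
from dspy.retrievers.retrieve import Retrieve

# Point retrieval at a ColBERTv2 HTTP endpoint (URL is illustrative).
dspy.settings.configure(rm=dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts"))

retrieve = Retrieve(k=3)
result = retrieve(query="When was the first FIFA World Cup held?")

for passage in result.passages:
    print(passage)
```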

--------------------------------------------------------------------------------
/tests/propose/test_grounded_proposer.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.predict import Predict
 5 | from dspy.propose.grounded_proposer import GroundedProposer
 6 | from dspy.utils.dummies import DummyLM
 7 | 
 8 | 
 9 | @pytest.mark.parametrize(
10 |     "demo_candidates",
11 |     [
12 |         None,
13 |         [[[dspy.Example(question="What is the capital of France?", answer="Paris")]]],
14 |     ],
15 | )
16 | def test_propose_instructions_for_program(demo_candidates):
17 |     # Set large number here so that lm always returns the same response
18 |     prompt_model = DummyLM([{"proposed_instruction": "instruction"}] * 10)
19 |     program = Predict("question -> answer")
20 |     trainset = []
21 | 
22 |     proposer = GroundedProposer(prompt_model=prompt_model, program=program, trainset=trainset, verbose=False)
23 |     result = proposer.propose_instructions_for_program(
24 |         trainset=trainset, program=program, demo_candidates=demo_candidates, trial_logs={}, N=1
25 |     )
26 |     assert isinstance(result, dict)
27 |     assert len(result) == len(program.predictors())
28 |     for pred_instructions in result.values():
29 |         assert pred_instructions == ["instruction"]
30 | 
31 | 
32 | @pytest.mark.parametrize(
33 |     "demo_candidates",
34 |     [
35 |         None,
36 |         [[[dspy.Example(question="What is the capital of France?", answer="Paris")]]],
37 |     ],
38 | )
39 | def test_propose_instruction_for_predictor(demo_candidates):
40 |     class TrackingDummyLM(DummyLM):
41 |         def copy(self, **kwargs):
42 |             self.last_copy_kwargs = kwargs
43 |             return super().copy(**kwargs)
44 | 
45 |     prompt_model = TrackingDummyLM([{"proposed_instruction": "instruction"}] * 10)
46 |     program = Predict("question -> answer")
47 | 
48 |     proposer = GroundedProposer(
49 |         prompt_model=prompt_model,
50 |         program=program,
51 |         trainset=[],
52 |         verbose=False,
53 |         init_temperature=0.7,
54 |     )
55 |     result = proposer.propose_instruction_for_predictor(
56 |         program=program,
57 |         predictor=None,
58 |         pred_i=0,
59 |         demo_candidates=demo_candidates,
60 |         demo_set_i=0,
61 |         trial_logs={},
62 |         tip=None,
63 |     )
64 |     assert result == "instruction"
65 |     assert prompt_model.last_copy_kwargs["temperature"] == 0.7
66 | 
```

--------------------------------------------------------------------------------
/tests/utils/test_unbatchify.py:
--------------------------------------------------------------------------------

```python
 1 | import time
 2 | from concurrent.futures import Future
 3 | from unittest.mock import MagicMock
 4 | 
 5 | from dspy.utils.unbatchify import Unbatchify
 6 | 
 7 | 
 8 | def simple_batch_processor(batch):
 9 |     """A simple batch function that adds 1 to each item."""
10 |     return [item + 1 for item in batch]
11 | 
12 | 
13 | def submit(self, input_item: any) -> Future:
14 |     """Submits an item for processing and returns a Future."""
15 |     future = Future()
16 |     self.input_queue.put((input_item, future))
17 |     return future
18 | 
19 | 
20 | Unbatchify.submit = submit
21 | 
22 | 
23 | def test_unbatchify_batch_size_trigger():
24 |     """Test that the batch processes exactly when max_batch_size is reached."""
25 |     batch_fn_mock = MagicMock(wraps=simple_batch_processor)
26 |     unbatcher = Unbatchify(batch_fn=batch_fn_mock, max_batch_size=2, max_wait_time=5.0)
27 | 
28 |     futures = []
29 |     futures.append(unbatcher.submit(10))
30 |     time.sleep(0.02)
31 |     assert batch_fn_mock.call_count == 0
32 | 
33 |     futures.append(unbatcher.submit(20))
34 | 
35 |     results_1_2 = [f.result() for f in futures]
36 |     assert batch_fn_mock.call_count == 1
37 |     batch_fn_mock.assert_called_once_with([10, 20])
38 |     assert results_1_2 == [11, 21]
39 | 
40 |     futures_3_4 = []
41 |     futures_3_4.append(unbatcher.submit(30))
42 |     futures_3_4.append(unbatcher.submit(40))
43 | 
44 |     results_3_4 = [f.result() for f in futures_3_4]
45 |     time.sleep(0.01)
46 |     assert batch_fn_mock.call_count == 2
47 |     assert batch_fn_mock.call_args_list[1].args[0] == [30, 40]
48 |     assert results_3_4 == [31, 41]
49 | 
50 |     unbatcher.close()
51 | 
52 | 
53 | def test_unbatchify_timeout_trigger():
54 |     """Test that the batch processes after max_wait_time."""
55 |     batch_fn_mock = MagicMock(wraps=simple_batch_processor)
56 |     wait_time = 0.15
57 |     unbatcher = Unbatchify(batch_fn=batch_fn_mock, max_batch_size=5, max_wait_time=wait_time)
58 | 
59 |     futures = []
60 |     futures.append(unbatcher.submit(100))
61 |     futures.append(unbatcher.submit(200))
62 | 
63 |     time.sleep(wait_time / 2)
64 |     assert batch_fn_mock.call_count == 0
65 | 
66 |     results = [f.result() for f in futures]
67 | 
68 |     assert batch_fn_mock.call_count == 1
69 |     batch_fn_mock.assert_called_once_with([100, 200])
70 |     assert results == [101, 201]
71 | 
72 |     unbatcher.close()
73 | 
```

--------------------------------------------------------------------------------
/dspy/utils/logging_utils.py:
--------------------------------------------------------------------------------

```python
 1 | import logging
 2 | import logging.config
 3 | import sys
 4 | 
 5 | LOGGING_LINE_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"
 6 | LOGGING_DATETIME_FORMAT = "%Y/%m/%d %H:%M:%S"
 7 | 
 8 | 
 9 | class DSPyLoggingStream:
10 |     """
11 |     A Python stream for use with event logging APIs throughout DSPy (`eprint()`,
12 |     `logger.info()`, etc.). This stream wraps `sys.stderr`, forwarding `write()` and
13 |     `flush()` calls to the stream referred to by `sys.stderr` at the time of the call.
14 |     It also provides capabilities for disabling the stream to silence event logs.
15 |     """
16 | 
17 |     def __init__(self):
18 |         self._enabled = True
19 | 
20 |     def write(self, text):
21 |         if self._enabled:
22 |             sys.stderr.write(text)
23 | 
24 |     def flush(self):
25 |         if self._enabled:
26 |             sys.stderr.flush()
27 | 
28 |     @property
29 |     def enabled(self):
30 |         return self._enabled
31 | 
32 |     @enabled.setter
33 |     def enabled(self, value):
34 |         self._enabled = value
35 | 
36 | 
37 | DSPY_LOGGING_STREAM = DSPyLoggingStream()
38 | 
39 | 
40 | def disable_logging():
41 |     """
42 |     Disables the `DSPyLoggingStream` used by event logging APIs throughout DSPy
43 |     (`eprint()`, `logger.info()`, etc), silencing all subsequent event logs.
44 |     """
45 |     DSPY_LOGGING_STREAM.enabled = False
46 | 
47 | 
48 | def enable_logging():
49 |     """
50 |     Enables the `DSPyLoggingStream` used by event logging APIs throughout DSPy
51 |     (`eprint()`, `logger.info()`, etc), emitting all subsequent event logs. This
52 |     reverses the effects of `disable_logging()`.
53 |     """
54 |     DSPY_LOGGING_STREAM.enabled = True
55 | 
56 | 
57 | def configure_dspy_loggers(root_module_name):
58 |     formatter = logging.Formatter(fmt=LOGGING_LINE_FORMAT, datefmt=LOGGING_DATETIME_FORMAT)
59 | 
60 |     dspy_handler_name = "dspy_handler"
61 |     handler = logging.StreamHandler(stream=DSPY_LOGGING_STREAM)
62 |     handler.setFormatter(formatter)
63 |     handler.set_name(dspy_handler_name)
64 | 
65 |     logger = logging.getLogger(root_module_name)
66 |     logger.setLevel(logging.INFO)
67 |     logger.propagate = False
68 | 
69 |     for existing_handler in logger.handlers[:]:
70 |         if getattr(existing_handler, "name", None) == dspy_handler_name:
71 |             logger.removeHandler(existing_handler)
72 | 
73 |     logger.addHandler(handler)
74 | 
```
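
A usage sketch: these helpers are exposed in the public API (see `docs/docs/api/utils/disable_logging.md` and `enable_logging.md` in the tree above), so they can be called to silence and restore DSPy's event logs around noisy code.

```python
import dspy

dspy.disable_logging()   # flips DSPY_LOGGING_STREAM.enabled to False
# ... run code whose INFO-level event logs should be suppressed ...
dspy.enable_logging()    # restore log output
```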

--------------------------------------------------------------------------------
/dspy/adapters/types/history.py:
--------------------------------------------------------------------------------

```python
 1 | from typing import Any
 2 | 
 3 | import pydantic
 4 | 
 5 | 
 6 | class History(pydantic.BaseModel):
 7 |     """Class representing the conversation history.
 8 | 
 9 |     The conversation history is a list of messages, each message entity should have keys from the associated signature.
10 |     For example, if you have the following signature:
11 | 
12 |     ```
13 |     class MySignature(dspy.Signature):
14 |         question: str = dspy.InputField()
15 |         history: dspy.History = dspy.InputField()
16 |         answer: str = dspy.OutputField()
17 |     ```
18 | 
19 |     Then the history should be a list of dictionaries with keys "question" and "answer".
20 | 
21 |     Example:
22 |         ```
23 |         import dspy
24 | 
25 |         dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
26 | 
27 |         class MySignature(dspy.Signature):
28 |             question: str = dspy.InputField()
29 |             history: dspy.History = dspy.InputField()
30 |             answer: str = dspy.OutputField()
31 | 
32 |         history = dspy.History(
33 |             messages=[
34 |                 {"question": "What is the capital of France?", "answer": "Paris"},
35 |                 {"question": "What is the capital of Germany?", "answer": "Berlin"},
36 |             ]
37 |         )
38 | 
39 |         predict = dspy.Predict(MySignature)
40 |         outputs = predict(question="What is the capital of France?", history=history)
41 |         ```
42 | 
43 |     Example of capturing the conversation history:
44 |         ```
45 |         import dspy
46 | 
47 |         dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
48 | 
49 |         class MySignature(dspy.Signature):
50 |             question: str = dspy.InputField()
51 |             history: dspy.History = dspy.InputField()
52 |             answer: str = dspy.OutputField()
53 | 
54 |         predict = dspy.Predict(MySignature)
55 |         outputs = predict(question="What is the capital of France?")
56 |         history = dspy.History(messages=[{"question": "What is the capital of France?", **outputs}])
57 |         outputs_with_history = predict(question="Are you sure?", history=history)
58 |         ```
59 |     """
60 | 
61 |     messages: list[dict[str, Any]]
62 | 
63 |     model_config = pydantic.ConfigDict(
64 |         frozen=True,
65 |         str_strip_whitespace=True,
66 |         validate_assignment=True,
67 |         extra="forbid",
68 |     )
69 | 
```

--------------------------------------------------------------------------------
/dspy/utils/asyncify.py:
--------------------------------------------------------------------------------

```python
 1 | from typing import TYPE_CHECKING, Any, Awaitable, Callable
 2 | 
 3 | import asyncer
 4 | from anyio import CapacityLimiter
 5 | 
 6 | if TYPE_CHECKING:
 7 |     from dspy.primitives.module import Module
 8 | 
 9 | _limiter = None
10 | 
11 | 
12 | def get_async_max_workers():
13 |     import dspy
14 | 
15 |     return dspy.settings.async_max_workers
16 | 
17 | 
18 | def get_limiter():
19 |     async_max_workers = get_async_max_workers()
20 | 
21 |     global _limiter
22 |     if _limiter is None:
23 |         _limiter = CapacityLimiter(async_max_workers)
24 |     elif _limiter.total_tokens != async_max_workers:
25 |         _limiter.total_tokens = async_max_workers
26 | 
27 |     return _limiter
28 | 
29 | 
30 | def asyncify(program: "Module") -> Callable[[Any, Any], Awaitable[Any]]:
31 |     """
32 |     Wraps a DSPy program so that it can be called asynchronously. This is useful for running a
33 |     program in parallel with another task (e.g., another DSPy program).
34 | 
35 |     This implementation propagates the current thread's configuration context to the worker thread.
36 | 
37 |     Args:
38 |         program: The DSPy program to be wrapped for asynchronous execution.
39 | 
40 |     Returns:
41 |         An async function: An async function that, when awaited, runs the program in a worker thread.
42 |             The current thread's configuration context is inherited for each call.
43 |     """
44 | 
45 |     async def async_program(*args, **kwargs) -> Any:
46 |         # Capture the current overrides at call-time.
47 |         from dspy.dsp.utils.settings import thread_local_overrides
48 | 
49 |         parent_overrides = thread_local_overrides.get().copy()
50 | 
51 |         def wrapped_program(*a, **kw):
52 |             from dspy.dsp.utils.settings import thread_local_overrides
53 | 
54 |             original_overrides = thread_local_overrides.get()
55 |             token = thread_local_overrides.set({**original_overrides, **parent_overrides.copy()})
56 |             try:
57 |                 return program(*a, **kw)
58 |             finally:
59 |                 thread_local_overrides.reset(token)
60 | 
61 |         # Create a fresh asyncified callable each time, ensuring the latest context is used.
62 |         call_async = asyncer.asyncify(wrapped_program, abandon_on_cancel=True, limiter=get_limiter())
63 |         return await call_async(*args, **kwargs)
64 | 
65 |     return async_program
66 | 
```
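
A minimal usage sketch of `dspy.asyncify`: each awaited call runs the wrapped program in a worker thread bounded by `dspy.settings.async_max_workers`. The model name and questions are illustrative:

```python
import asyncio

import dspy

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name is illustrative

predict = dspy.Predict("question -> answer")
async_predict = dspy.asyncify(predict)

async def main():
    # Concurrent calls share the CapacityLimiter created by get_limiter() above.
    results = await asyncio.gather(
        async_predict(question="What is the capital of France?"),
        async_predict(question="What is the capital of Germany?"),
    )
    for result in results:
        print(result.answer)

asyncio.run(main())
```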

--------------------------------------------------------------------------------
/docs/docs/tutorials/real_world_examples/index.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Real-World Examples
 2 | 
 3 | This section demonstrates practical applications of DSPy across different domains and use cases. Each tutorial shows how to build production-ready AI systems using DSPy's modular programming approach.
 4 | 
 5 | ## Featured Examples
 6 | 
 7 | ### 📄 [Generating llms.txt](../llms_txt_generation/index.md)
 8 | Learn how to create AI-powered documentation generators that analyze codebases and produce structured, LLM-friendly documentation following the llms.txt standard.
 9 | 
10 | **Key Concepts:** Repository analysis, meta-programming, documentation generation
11 | 
12 | ### 📧 [Email Information Extraction](../email_extraction/index.md)
13 | Build intelligent email processing systems that classify messages, extract entities, and identify action items using DSPy's structured prediction capabilities.
14 | 
15 | **Key Concepts:** Information extraction, classification, text processing
16 | 
17 | ### 🧠 [Memory-Enabled ReAct Agents with Mem0](../mem0_react_agent/index.md)
18 | Create conversational agents with persistent memory using DSPy ReAct and Mem0 integration for context-aware interactions across sessions.
19 | 
20 | **Key Concepts:** Memory systems, conversational AI, agent persistence
21 | 
22 | ### 💰 [Financial Analysis with Yahoo Finance](../yahoo_finance_react/index.md)
23 | Develop financial analysis agents that fetch real-time market data, analyze news sentiment, and provide investment insights using LangChain tool integration.
24 | 
25 | **Key Concepts:** Tool integration, financial data, real-time analysis
26 | 
27 | ### 🔄 [Automated Code Generation from Documentation](../sample_code_generation/index.md)
28 | Build a system that automatically fetches documentation from URLs and generates working code examples for any library using DSPy's intelligent analysis.
29 | 
30 | **Key Concepts:** Web scraping, documentation parsing, automated learning, code generation
31 | 
32 | ### 🎮 [Building a Creative Text-Based AI Game](../ai_text_game/index.md)
33 | Create an interactive text-based adventure game with dynamic storytelling, AI-powered NPCs, and adaptive gameplay using DSPy's modular programming approach.
34 | 
35 | **Key Concepts:** Interactive storytelling, game state management, character progression, AI-driven narratives
36 | 
```

--------------------------------------------------------------------------------
/docs/docs/learn/evaluation/overview.md:
--------------------------------------------------------------------------------

```markdown
 1 | ---
 2 | sidebar_position: 1
 3 | ---
 4 | 
 5 | # Evaluation in DSPy
 6 | 
 7 | Once you have an initial system, it's time to **collect an initial development set** so you can refine it more systematically. Even 20 input examples of your task can be useful, though 200 goes a long way. Depending on your _metric_, you either just need inputs and no labels at all, or you need inputs and the _final_ outputs of your system. (You almost never need labels for the intermediate steps in your program in DSPy.) You can probably find datasets that are adjacent to your task on, say, HuggingFace datasets or in a naturally occurring source like StackExchange. If there's data whose licenses are permissive enough, we suggest you use them. Otherwise, you can label a few examples by hand or start deploying a demo of your system and collect initial data that way.
 8 | 
 9 | Next, you should **define your DSPy metric**. What makes outputs from your system good or bad? Invest in defining metrics and improving them incrementally over time; it's hard to consistently improve what you aren't able to define. A metric is a function that takes examples from your data and the outputs of your system, and returns a score. For simple tasks, this could be just "accuracy", e.g., for simple classification or short-form QA tasks. For most applications, your system will produce long-form outputs, so your metric will be a smaller DSPy program that checks multiple properties of the output. Getting this right on the first try is unlikely: start with something simple and iterate.
10 | 
11 | Now that you have some data and a metric, run development evaluations on your pipeline designs to understand their tradeoffs. Look at the outputs and the metric scores. This will probably allow you to spot any major issues, and it will define a baseline for your next steps.
12 | 
13 | 
14 | ??? "If your metric is itself a DSPy program..."
15 |     If your metric is itself a DSPy program, a powerful way to iterate is to optimize your metric itself. That's usually easy because the metric's output is typically a simple value (e.g., a score out of 5), so the metric's metric is easy to define and optimize by collecting a few examples.
16 | 
17 | 
```
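
A minimal sketch of the metric-plus-devset workflow described above; the `question`/`answer` fields and the two hand-written examples are illustrative:

```python
import dspy

def exact_match(example, prediction, trace=None):
    # Return a score: 1.0 if the predicted answer matches the labeled answer.
    return float(example.answer.strip().lower() == prediction.answer.strip().lower())

devset = [
    dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    dspy.Example(question="What is the capital of Germany?", answer="Berlin").with_inputs("question"),
]

program = dspy.Predict("question -> answer")
evaluator = dspy.Evaluate(devset=devset, metric=exact_match, display_progress=True)
evaluator(program)
```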

--------------------------------------------------------------------------------
/.github/.internal_dspyai/internals/release-checklist.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Release Checklist
 2 | 
 3 | * [ ] On `main`, create a git tag with the pattern X.Y.Z, where X, Y, and Z follow the [semver pattern](https://semver.org/). Then push the tag to the origin git repo (GitHub).
 4 |     * ```bash
 5 |       git tag X.Y.Z
 6 |       git push origin --tags
 7 |       ```
 8 |     * This will trigger the GitHub Action to build and release the package.
 9 | * [ ] Confirm the tests pass and the package has been published to PyPI.
10 |     * If the tests fail, you can remove the tag from your local and GitHub repos using:
11 |     ```bash
12 |     git push origin --delete X.Y.Z # Delete on GitHub
13 |     git tag -d X.Y.Z # Delete locally
14 |     ```
15 |     * Fix the errors and then repeat the steps above to recreate the tag locally and push to GitHub to restart the process.
16 |     * Note that the GitHub Action automatically increments the release version on test-pypi by adding a pre-release identifier when the tests fail and you need to delete and push the same tag again.
17 | * [ ] [Create a release](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository) 
18 | * [ ] Add release notes. You can make use of [automatically generated release notes](https://docs.github.com/en/repositories/releasing-projects-on-github/automatically-generated-release-notes)
19 | * If creating a new release for a major or minor version:
20 |     * [ ] Create a new release branch from the last commit and name it `release/X.Y`
21 |     * [ ] [Update the default branch](https://docs.github.com/en/organizations/managing-organization-settings/managing-the-default-branch-name-for-repositories-in-your-organization) on the GitHub repo to the new release branch.
22 | 
23 | ### Prerequisites
24 | 
25 | The automation requires a [trusted publisher](https://docs.pypi.org/trusted-publishers/) to be set up on both the pypi and test-pypi packages. If the package is migrated to a new project, please follow the [steps](https://docs.pypi.org/trusted-publishers/adding-a-publisher/) to create a trusted publisher. If you have no releases on the new project, you may have to create a [pending trusted publisher](https://docs.pypi.org/trusted-publishers/creating-a-project-through-oidc/) to allow the first automated deployment. 
```

--------------------------------------------------------------------------------
/tests/clients/test_inspect_global_history.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.clients.base_lm import GLOBAL_HISTORY
 5 | from dspy.utils.dummies import DummyLM
 6 | 
 7 | 
 8 | @pytest.fixture(autouse=True)
 9 | def clear_history():
10 |     GLOBAL_HISTORY.clear()
11 |     yield
12 | 
13 | 
14 | def test_inspect_history_basic(capsys):
15 |     # Configure a DummyLM with some predefined responses
16 |     lm = DummyLM([{"response": "Hello"}, {"response": "How are you?"}])
17 |     dspy.settings.configure(lm=lm)
18 | 
19 |     # Make some calls to generate history
20 |     predictor = dspy.Predict("query: str -> response: str")
21 |     predictor(query="Hi")
22 |     predictor(query="What's up?")
23 | 
24 |     # Test inspecting all history
25 |     history = GLOBAL_HISTORY
26 |     print(capsys)
27 |     assert len(history) > 0
28 |     assert isinstance(history, list)
29 |     assert all(isinstance(entry, dict) for entry in history)
30 |     assert all("messages" in entry for entry in history)
31 | 
32 | 
33 | def test_inspect_history_with_n(capsys):
34 |     """Test that inspect_history works with n
35 |     Random failures in this test most likely mean you are printing messages somewhere
36 |     """
37 |     lm = DummyLM([{"response": "One"}, {"response": "Two"}, {"response": "Three"}])
38 |     dspy.settings.configure(lm=lm)
39 | 
40 |     # Generate some history
41 |     predictor = dspy.Predict("query: str -> response: str")
42 |     predictor(query="First")
43 |     predictor(query="Second")
44 |     predictor(query="Third")
45 | 
46 |     dspy.inspect_history(n=2)
47 |     # Test getting last 2 entries
48 |     out, err = capsys.readouterr()
49 |     assert "First" not in out
50 |     assert "Second" in out
51 |     assert "Third" in out
52 | 
53 | 
54 | def test_inspect_empty_history(capsys):
55 |     # Configure fresh DummyLM
56 |     lm = DummyLM([])
57 |     dspy.settings.configure(lm=lm)
58 | 
59 |     # Test inspecting empty history
60 |     dspy.inspect_history()
61 |     history = GLOBAL_HISTORY
62 |     assert len(history) == 0
63 |     assert isinstance(history, list)
64 | 
65 | 
66 | def test_inspect_history_n_larger_than_history(capsys):
67 |     lm = DummyLM([{"response": "First"}, {"response": "Second"}])
68 |     dspy.settings.configure(lm=lm)
69 | 
70 |     predictor = dspy.Predict("query: str -> response: str")
71 |     predictor(query="Query 1")
72 |     predictor(query="Query 2")
73 | 
74 |     # Request more entries than exist
75 |     dspy.inspect_history(n=5)
76 |     history = GLOBAL_HISTORY
77 |     assert len(history) == 2  # Should return all available entries
78 | 
```

--------------------------------------------------------------------------------
/docs/overrides/partials/tabs.html:
--------------------------------------------------------------------------------

```html
 1 | <!--
 2 |   Copyright (c) 2016-2023 Martin Donath <[email protected]>
 3 | 
 4 |   Permission is hereby granted, free of charge, to any person obtaining a copy
 5 |   of this software and associated documentation files (the "Software"), to
 6 |   deal in the Software without restriction, including without limitation the
 7 |   rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 8 |   sell copies of the Software, and to permit persons to whom the Software is
 9 |   furnished to do so, subject to the following conditions:
10 | 
11 |   The above copyright notice and this permission notice shall be included in
12 |   all copies or substantial portions of the Software.
13 | 
14 |   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15 |   IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16 |   FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE
17 |   AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18 |   LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
19 |   FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
20 |   IN THE SOFTWARE.
21 | -->
22 | 
23 | {% import "partials/tabs-item.html" as item with context %}
24 | 
25 | <!-- Navigation tabs -->
26 | <nav
27 |   class="md-tabs"
28 |   aria-label="{{ lang.t('tabs') }}"
29 |   data-md-component="tabs"
30 | >
31 |   <div class="md-tabs__inner md-grid">
32 | 
33 | 	<!-- Adds tab on right side of header -->
34 | 	{% if "FAQ" %}
35 |         <ul class="md-tabs__list" style="float: right;">
36 |             <li class="md-tabs__item">
37 |                 <a href="/production/" class="md-tabs__link">
38 |                     DSPy in Production
39 |                 </a>
40 |             </li>
41 | 	    <li class="md-tabs__item">
42 |                 <a href="/community/community-resources/" class="md-tabs__link">
43 |                     Community
44 |                 </a>
45 |             </li>
46 |             <li class="md-tabs__item">
47 |                 <a href="/faqs/" class="md-tabs__link">
48 |                     FAQ
49 |                 </a>
50 |             </li>
51 |         </ul>
52 | 	{% endif %}
53 | 
54 | 	<!-- Original tabbed sections -->
55 | 	<ul class="md-tabs__list">
56 | 	  {% for nav_item in nav %}
57 | 		{% if nav_item.title not in ["FAQ", "Community", "DSPy in Production"] %}
58 |             {{ item.render(nav_item) }}
59 | 		{% endif %}
60 | 	  {% endfor %}
61 | 	</ul>
62 |   </div>
63 | </nav>
64 | 
```

--------------------------------------------------------------------------------
/.github/workflows/build_utils/test_version.py:
--------------------------------------------------------------------------------

```python
 1 | import sys
 2 | from datetime import datetime
 3 | 
 4 | import requests
 5 | import semver
 6 | from packaging.version import Version as PyPIVersion
 7 | 
 8 | 
 9 | def get_latest_version(package_name, tag_version):  
10 |     # Returns latest version, and T/F as to whether it needs to be incremented
11 |     response = requests.get(f"https://test.pypi.org/pypi/{package_name}/json")  
12 |     if response.status_code == 200:  
13 |         data = response.json()  
14 |         # Flatten the list of files for all releases and get the latest upload  
15 |         all_uploads = [  
16 |             (release['upload_time'], release['filename'], version)  
17 |             for version, releases in data['releases'].items()  
18 |             for release in releases  
19 |         ] 
20 |         # If a release with tag_version does not exist, that is the latest version
21 |         # Then increment is False, as no need to increment the version
22 |         tag_release_exists = any(upload for upload in all_uploads if upload[2] == tag_version)
23 |         if not(tag_release_exists):
24 |             return tag_version, False  
25 |         # Else, get the latest release version, and set increment to True
26 |         else:
27 |             # Sort all uploads by upload time in descending order
28 |             latest_upload = max(all_uploads, key=lambda x: datetime.fromisoformat(x[0].rstrip('Z')))  
29 |             return latest_upload[2], True  
30 |     
31 |     elif response.status_code == 404:
32 |         # If no existing releases can get a 404
33 |         return tag_version, False
34 |     return None, None  
35 |     
36 | def increment_version(curr_version):
37 |     pypi_v = PyPIVersion(curr_version)
38 |     if pypi_v.pre:
39 |         pre = "".join([str(i) for i in pypi_v.pre])
40 |         parsed_v = semver.Version(*pypi_v.release, pre)
41 |     else:
42 |         parsed_v = semver.Version(*pypi_v.release)
43 |     new_v = str(parsed_v.bump_prerelease())
44 |     return new_v
45 |   
46 | if __name__ == "__main__":  
47 |     if len(sys.argv) != 3:  
48 |         raise ValueError("Usage: python get_latest_testpypi_version.py <package_name> <tag_version>")  
49 |       
50 |     package_name = sys.argv[1]
51 |     tag_v = sys.argv[2]
52 | 
53 |     latest_version, increment = get_latest_version(package_name, tag_v)  
54 |     if increment:
55 |         new_version = increment_version(latest_version)
56 |     else: 
57 |         new_version = latest_version
58 | 
59 |     # Output new version
60 |     print(new_version)  
61 | 
```

--------------------------------------------------------------------------------
/tests/utils/test_exceptions.py:
--------------------------------------------------------------------------------

```python
 1 | import dspy
 2 | from dspy.utils.exceptions import AdapterParseError
 3 | 
 4 | 
 5 | def test_adapter_parse_error_basic():
 6 |     adapter_name = "ChatAdapter"
 7 |     signature = dspy.make_signature("question->answer1, answer2")
 8 |     lm_response = "[[ ## answer1 ## ]]\nanswer1"
 9 | 
10 |     error = AdapterParseError(adapter_name=adapter_name, signature=signature, lm_response=lm_response)
11 | 
12 |     assert error.adapter_name == adapter_name
13 |     assert error.signature == signature
14 |     assert error.lm_response == lm_response
15 | 
16 |     error_message = str(error)
17 |     assert error_message == (
18 |         "Adapter ChatAdapter failed to parse the LM response. \n\n"
19 |         "LM Response: [[ ## answer1 ## ]]\nanswer1 \n\n"
20 |         "Expected to find output fields in the LM response: [answer1, answer2] \n\n"
21 |     )
22 | 
23 | 
24 | def test_adapter_parse_error_with_message():
25 |     adapter_name = "ChatAdapter"
26 |     signature = dspy.make_signature("question->answer1, answer2")
27 |     lm_response = "[[ ## answer1 ## ]]\nanswer1"
28 |     message = "Critical error, please fix!"
29 | 
30 |     error = AdapterParseError(adapter_name=adapter_name, signature=signature, lm_response=lm_response, message=message)
31 | 
32 |     assert error.adapter_name == adapter_name
33 |     assert error.signature == signature
34 |     assert error.lm_response == lm_response
35 | 
36 |     error_message = str(error)
37 |     assert error_message == (
38 |         "Critical error, please fix!\n\n"
39 |         "Adapter ChatAdapter failed to parse the LM response. \n\n"
40 |         "LM Response: [[ ## answer1 ## ]]\nanswer1 \n\n"
41 |         "Expected to find output fields in the LM response: [answer1, answer2] \n\n"
42 |     )
43 | 
44 | 
45 | def test_adapter_parse_error_with_parsed_result():
46 |     adapter_name = "ChatAdapter"
47 |     signature = dspy.make_signature("question->answer1, answer2")
48 |     lm_response = "[[ ## answer1 ## ]]\nanswer1"
49 |     parsed_result = {"answer1": "value1"}
50 | 
51 |     error = AdapterParseError(
52 |         adapter_name=adapter_name, signature=signature, lm_response=lm_response, parsed_result=parsed_result
53 |     )
54 | 
55 |     error_message = str(error)
56 |     assert error_message == (
57 |         "Adapter ChatAdapter failed to parse the LM response. \n\n"
58 |         "LM Response: [[ ## answer1 ## ]]\nanswer1 \n\n"
59 |         "Expected to find output fields in the LM response: [answer1, answer2] \n\n"
60 |         "Actual output fields parsed from the LM response: [answer1] \n\n"
61 |     )
62 | 
```

--------------------------------------------------------------------------------
/tests/reliability/complex_types/generated/test_nesting_1/program.py:
--------------------------------------------------------------------------------

```python
 1 | ### Input models ###
 2 | 
 3 | 
 4 | from pydantic import BaseModel, Field
 5 | 
 6 | 
 7 | class Level5(BaseModel):
 8 |     field1: str = Field(..., description="A string field at the deepest level")
 9 |     field2: float = Field(..., description="A numerical field at the deepest level")
10 | 
11 | 
12 | class Level4(BaseModel):
13 |     level5: Level5
14 | 
15 | 
16 | class Level3(BaseModel):
17 |     level4: Level4
18 | 
19 | 
20 | class Level2(BaseModel):
21 |     level3: Level3
22 | 
23 | 
24 | class Level1(BaseModel):
25 |     level2: Level2
26 | 
27 | 
28 | class ProgramInputs(BaseModel):
29 |     level1: Level1
30 | 
31 | 
32 | ### Output models ###
33 | 
34 | 
35 | from typing import List
36 | 
37 | from pydantic import BaseModel, Field
38 | 
39 | 
40 | class ResultLevel5(BaseModel):
41 |     outputField1: bool = Field(..., description="A boolean field indicating success or failure")
42 |     outputField2: list[str] = Field(..., description="An array of strings representing messages")
43 | 
44 | 
45 | class ResultLevel4(BaseModel):
46 |     resultLevel5: ResultLevel5
47 | 
48 | 
49 | class ResultLevel3(BaseModel):
50 |     resultLevel4: ResultLevel4
51 | 
52 | 
53 | class ResultLevel2(BaseModel):
54 |     resultLevel3: ResultLevel3
55 | 
56 | 
57 | class ResultLevel1(BaseModel):
58 |     resultLevel2: ResultLevel2
59 | 
60 | 
61 | class ProgramOutputs(BaseModel):
62 |     resultLevel1: ResultLevel1
63 | 
64 | 
65 | ### Program definition ###
66 | 
67 | import dspy
68 | 
69 | 
70 | class BaseSignature(dspy.Signature):
71 |     """
72 |     The AI program is designed to process hierarchical data structures with multiple levels of nesting. The program will take a deeply nested input structure representing a complex dataset, perform specific transformations, validations, and computations, and then produce an equally complex nested output structure. The program is suitable for applications that require detailed data processing, such as multi-level data aggregation, hierarchical data validation, and nested data transformation.
73 |     """
74 | 
75 | 
76 | program_signature = BaseSignature
77 | for input_field_name, input_field in ProgramInputs.model_fields.items():
78 |     program_signature = program_signature.append(
79 |         name=input_field_name,
80 |         field=dspy.InputField(description=input_field.description),
81 |         type_=input_field.annotation,
82 |     )
83 | for output_field_name, output_field in ProgramOutputs.model_fields.items():
84 |     program_signature = program_signature.append(
85 |         name=output_field_name,
86 |         field=dspy.OutputField(description=output_field.description),
87 |         type_=output_field.annotation,
88 |     )
89 | 
90 | program = dspy.Predict(program_signature)
91 | 
```

--------------------------------------------------------------------------------
/docs/docs/learn/optimization/overview.md:
--------------------------------------------------------------------------------

```markdown
 1 | ---
 2 | sidebar_position: 1
 3 | ---
 4 | 
 5 | 
 6 | # Optimization in DSPy
 7 | 
 8 | Once you have a system and a way to evaluate it, you can use DSPy optimizers to tune the prompts or weights in your program. Now it's useful to expand your data collection effort into building a training set and a held-out test set, in addition to the development set you've been using for exploration. For the training set (and its subset, the validation set), you can often get substantial value out of 30 examples, but aim for at least 300 examples. Some optimizers accept a `trainset` only. Others ask for a `trainset` and a `valset`. When splitting data for most prompt optimizers, we recommend an unusual split compared to deep neural networks: 20% for training, 80% for validation. This reverse allocation emphasizes stable validation, since prompt-based optimizers often overfit to small training sets. In contrast, the [dspy.GEPA](https://dspy.ai/tutorials/gepa_ai_program/) optimizer follows the more standard ML convention: maximize the training set size, while keeping the validation set just large enough to reflect the distribution of the downstream tasks (test set).
 9 | 
10 | After your first few optimization runs, you are either very happy with everything or you've made a lot of progress but you don't like something about the final program or the metric. At this point, go back to step 1 (Programming in DSPy) and revisit the major questions. Did you define your task well? Do you need to collect (or find online) more data for your problem? Do you want to update your metric? And do you want to use a more sophisticated optimizer? Do you need to consider advanced features like DSPy Assertions? Or, perhaps most importantly, do you want to add some more complexity or steps in your DSPy program itself? Do you want to use multiple optimizers in a sequence?
11 | 
12 | Iterative development is key. DSPy gives you the pieces to do that incrementally: iterating on your data, your program structure, your metric, and your optimization steps. Optimizing complex LM programs is an entirely new paradigm that only exists in DSPy at the time of writing (update: there are now numerous DSPy extension frameworks, so this part is no longer true :-), so naturally the norms around what to do are still emerging. If you need help, we recently created a [Discord server](https://discord.gg/XCGy2WDCQB) for the community.
13 | 
14 | 
```
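
A hedged sketch of the 20%/80% train/validation split recommended above, wired into an optimizer that accepts both a `trainset` and a `valset`; `examples`, `my_metric`, and `program` are assumed to be defined elsewhere:

```python
import random

import dspy

# examples: list[dspy.Example], assumed prepared elsewhere
random.Random(0).shuffle(examples)
split = int(0.2 * len(examples))
trainset, valset = examples[:split], examples[split:]

# MIPROv2 accepts both sets; the metric and program are placeholders here.
optimizer = dspy.MIPROv2(metric=my_metric, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset, valset=valset)
```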

--------------------------------------------------------------------------------
/tests/utils/test_parallelizer.py:
--------------------------------------------------------------------------------

```python
 1 | import time
 2 | 
 3 | import pytest
 4 | 
 5 | from dspy.utils.parallelizer import ParallelExecutor
 6 | 
 7 | 
 8 | def test_worker_threads_independence():
 9 |     def task(item):
10 |         # Each thread maintains its own state by appending to a thread-local list
11 |         return item * 2
12 | 
13 |     data = [1, 2, 3, 4, 5]
14 |     executor = ParallelExecutor(num_threads=3)
15 |     results = executor.execute(task, data)
16 | 
17 |     assert results == [2, 4, 6, 8, 10]
18 | 
19 | 
20 | def test_parallel_execution_speed():
21 |     def task(item):
22 |         time.sleep(0.1)  # Simulate a time-consuming task
23 |         return item
24 | 
25 |     data = [1, 2, 3, 4, 5]
26 |     executor = ParallelExecutor(num_threads=5)
27 | 
28 |     start_time = time.time()
29 |     executor.execute(task, data)
30 |     end_time = time.time()
31 | 
32 |     assert end_time - start_time < len(data)
33 | 
34 | 
35 | def test_max_errors_handling():
36 |     def task(item):
37 |         if item == 3:
38 |             raise ValueError("Intentional error")
39 |         return item
40 | 
41 |     data = [1, 2, 3, 4, 5]
42 |     executor = ParallelExecutor(num_threads=3, max_errors=1)
43 | 
44 |     with pytest.raises(Exception, match="Execution cancelled due to errors or interruption."):
45 |         executor.execute(task, data)
46 | 
47 | 
48 | def test_max_errors_not_met():
49 |     def task(item):
50 |         if item == 3:
51 |             raise ValueError("Intentional error")
52 |         return item
53 | 
54 |     data = [1, 2, 3, 4, 5]
55 |     executor = ParallelExecutor(num_threads=3, max_errors=2)
56 | 
57 |     # Ensure that the execution completes without crashing when max_errors is not met
58 |     results = executor.execute(task, data)
59 | 
60 |     # Verify that the results exclude the failed task
61 |     assert results == [1, 2, None, 4, 5]
62 | 
63 | 
64 | def test_parallel_executor_tracks_failed_indices_and_exceptions():
65 |     def task(item):
66 |         if item == 3:
67 |             raise ValueError("test error for 3")
68 |         if item == 5:
69 |             raise RuntimeError("test error for 5")
70 |         return item
71 | 
72 |     data = [1, 2, 3, 4, 5]
73 |     executor = ParallelExecutor(num_threads=3, max_errors=3)
74 | 
75 |     results = executor.execute(task, data)
76 | 
77 |     assert results == [1, 2, None, 4, None]
78 | 
79 |     assert sorted(executor.failed_indices) == [2, 4]
80 | 
81 |     assert len(executor.exceptions_map) == 2
82 |     assert isinstance(executor.exceptions_map[2], ValueError)
83 |     assert str(executor.exceptions_map[2]) == "test error for 3"
84 |     assert isinstance(executor.exceptions_map[4], RuntimeError)
85 |     assert str(executor.exceptions_map[4]) == "test error for 5"
86 | 
```

--------------------------------------------------------------------------------
/dspy/datasets/gsm8k.py:
--------------------------------------------------------------------------------

```python
 1 | import random
 2 | 
 3 | import tqdm
 4 | 
 5 | 
 6 | class GSM8K:
 7 |     def __init__(self):
 8 |         self.do_shuffle = False
 9 | 
10 |         from datasets import load_dataset
11 | 
12 |         dataset = load_dataset("gsm8k", "main")
13 | 
14 |         hf_official_train = dataset["train"]
15 |         hf_official_test = dataset["test"]
16 |         official_train = []
17 |         official_test = []
18 | 
19 |         for example in tqdm.tqdm(hf_official_train):
20 |             question = example["question"]
21 | 
22 |             answer = example["answer"].strip().split()
23 |             assert answer[-2] == "####"
24 | 
25 |             gold_reasoning = " ".join(answer[:-2])
26 |             answer = str(int(answer[-1].replace(",", "")))
27 | 
28 |             official_train.append({"question": question, "gold_reasoning": gold_reasoning, "answer": answer})
29 | 
30 |         for example in tqdm.tqdm(hf_official_test):
31 |             question = example["question"]
32 | 
33 |             answer = example["answer"].strip().split()
34 |             assert answer[-2] == "####"
35 | 
36 |             gold_reasoning = " ".join(answer[:-2])
37 |             answer = str(int(answer[-1].replace(",", "")))
38 | 
39 |             official_test.append({"question": question, "gold_reasoning": gold_reasoning, "answer": answer})
40 | 
41 |         rng = random.Random(0)
42 |         rng.shuffle(official_train)
43 | 
44 |         rng = random.Random(0)
45 |         rng.shuffle(official_test)
46 | 
47 |         trainset = official_train[:200]
48 |         devset = official_train[200:500]
49 |         testset = official_test[:]
50 | 
51 |         import dspy
52 | 
53 |         trainset = [dspy.Example(**x).with_inputs("question") for x in trainset]
54 |         devset = [dspy.Example(**x).with_inputs("question") for x in devset]
55 |         testset = [dspy.Example(**x).with_inputs("question") for x in testset]
56 | 
57 |         self.train = trainset
58 |         self.dev = devset
59 |         self.test = testset
60 | 
61 | 
62 | def parse_integer_answer(answer, only_first_line=True):
63 |     try:
64 |         if only_first_line:
65 |             answer = answer.strip().split("\n")[0]
66 | 
67 |         # find the last token that has a number in it
68 |         answer = [token for token in answer.split() if any(c.isdigit() for c in token)][-1]
69 |         answer = answer.split(".")[0]
70 |         answer = "".join([c for c in answer if c.isdigit()])
71 |         answer = int(answer)
72 | 
73 |     except (ValueError, IndexError):
74 |         answer = 0
75 | 
76 |     return answer
77 | 
78 | 
79 | def gsm8k_metric(gold, pred, trace=None):
80 |     return int(parse_integer_answer(str(gold.answer))) == int(parse_integer_answer(str(pred.answer)))
81 | 
```
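
A minimal sketch of using the loader and `gsm8k_metric` above with `dspy.Evaluate`; the model name and slice size are illustrative:

```python
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name is illustrative

gsm8k = GSM8K()
program = dspy.ChainOfThought("question -> answer")

# Score the program on a slice of the dev split using the exact-integer-match metric above.
evaluator = dspy.Evaluate(devset=gsm8k.dev[:50], metric=gsm8k_metric, num_threads=8, display_progress=True)
evaluator(program)
```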

--------------------------------------------------------------------------------
/docs/docs/tutorials/core_development/index.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Tools, Development, and Deployment
 2 | 
 3 | This section covers essential DSPy features and best practices for professional AI development. Learn how to implement key functionalities like streaming, caching, deployment, and monitoring in your DSPy applications. These tutorials focus on the practical aspects of building production-ready systems.
 4 | 
 5 | ## Integration and Tooling
 6 | 
 7 | ### [Use MCP in DSPy](../mcp/index.md)
 8 | Learn to integrate Model Context Protocol (MCP) with DSPy applications. This tutorial shows how to leverage MCP for enhanced context management and more sophisticated AI interactions.
 9 | 
10 | ### [Output Refinement](../output_refinement/best-of-n-and-refine.md)
11 | Master techniques for improving output quality through refinement strategies. Learn how to implement best-of-N sampling and iterative refinement to get higher-quality results from your DSPy programs.
12 | 
13 | ## Data Management and Persistence
14 | 
15 | ### [Saving and Loading](../saving/index.md)
16 | Understand how to persist and restore DSPy programs and their optimized states. Learn best practices for model versioning, checkpoint management, and program serialization.
17 | 
18 | ### [Cache](../cache/index.md)
19 | Implement efficient caching strategies to improve performance and reduce API costs. Learn how to configure and use DSPy's caching mechanisms effectively in different scenarios.
20 | 
21 | ## Production Deployment
22 | 
23 | ### [Deployment](../deployment/index.md)
24 | Learn to deploy DSPy applications in production environments. This tutorial covers multiple deployment strategies such as FastAPI and MLflow.
25 | 
26 | ### [Streaming](../streaming/index.md)
27 | Implement real-time streaming capabilities in your DSPy applications. Learn how to handle streaming responses for better user experience in interactive applications.
28 | 
29 | ### [Async](../async/index.md)
30 | Build asynchronous DSPy applications for improved performance and scalability. Learn async/await patterns and concurrent execution strategies for high-throughput systems.
31 | 
32 | ## Monitoring and Optimization
33 | 
34 | ### [Debugging & Observability](../observability/index.md)
35 | Master debugging and monitoring techniques for DSPy applications. Learn to use comprehensive logging, tracing, and error handling for production systems.
36 | 
37 | ### [Tracking DSPy Optimizers](../optimizer_tracking/index.md)
38 | Learn to track and analyze optimizer performance and behavior. Understand how to monitor optimization processes and enhance the reproducibility of the optimization.
39 | 
```

--------------------------------------------------------------------------------
/docs/docs/learn/programming/overview.md:
--------------------------------------------------------------------------------

```markdown
 1 | ---
 2 | sidebar_position: 1
 3 | ---
 4 | 
 5 | # Programming in DSPy
 6 | 
 7 | DSPy is a bet on _writing code instead of strings_. In other words, building the right control flow is crucial. Start by **defining your task**. What are the inputs to your system and what should your system produce as output? Is it a chatbot over your data or perhaps a code assistant? Or maybe a system for translation, for highlighting snippets from search results, or for generating reports with citations?
 8 | 
 9 | Next, **define your initial pipeline**. Can your DSPy program just be a single module or do you need to break it down into a few steps? Do you need retrieval or other tools, like a calculator or a calendar API? Is there a typical workflow for solving your problem in multiple well-scoped steps, or do you want more open-ended tool use with agents for your task? Think about these but start simple, perhaps with just a single `dspy.ChainOfThought` module, then add complexity incrementally based on observations.
10 | 
11 | As you do this, **craft and try a handful of examples** of the inputs to your program. Consider using a powerful LM at this point, or a couple of different LMs, just to understand what's possible. Record interesting (both easy and hard) examples you try. This will be useful when you are doing evaluation and optimization later.
12 | 
13 | 
14 | ??? "Beyond encouraging good design patterns, how does DSPy help here?"
15 | 
16 |     Conventional prompts couple your fundamental system architecture with incidental choices not portable to new LMs, objectives, or pipelines. A conventional prompt asks the LM to take some inputs and produce some outputs of certain types (a _signature_), formats the inputs in certain ways and requests outputs in a form it can parse accurately (an _adapter_), asks the LM to apply certain strategies like "thinking step by step" or using tools (a _module_'s logic), and relies on substantial trial-and-error to discover the right way to ask each LM to do this (a form of manual _optimization_).
17 |     
18 |     DSPy separates these concerns and automates the lower-level ones until you need to consider them. This allows you to write much shorter code, with much higher portability. For example, if you write a program using DSPy modules, you can swap the LM or its adapter without changing the rest of your logic. Or you can exchange one _module_, like `dspy.ChainOfThought`, for another, like `dspy.ProgramOfThought`, without modifying your signatures. When you're ready to use optimizers, the same program can have its prompts optimized or its LM weights fine-tuned.
19 | 
```
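
A minimal sketch of that separation: the same signature drives different module strategies, and neither needs to change when the other is swapped. The model name is illustrative:

```python
import dspy

class QA(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name is illustrative

cot = dspy.ChainOfThought(QA)      # one strategy...
pot = dspy.ProgramOfThought(QA)    # ...swapped for another, same signature

print(cot(question="What is 17 * 24?").answer)
```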

--------------------------------------------------------------------------------
/dspy/utils/usage_tracker.py:
--------------------------------------------------------------------------------

```python
 1 | """Usage tracking utilities for DSPy."""
 2 | 
 3 | from collections import defaultdict
 4 | from contextlib import contextmanager
 5 | from typing import Any, Generator
 6 | 
 7 | from dspy.dsp.utils.settings import settings
 8 | 
 9 | 
10 | class UsageTracker:
11 |     """Tracks LM usage data within a context."""
12 | 
13 |     def __init__(self):
14 |         # Map of LM name to list of usage entries. For example:
15 |         # {
16 |         #     "openai/gpt-4o-mini": [
17 |         #         {"prompt_tokens": 100, "completion_tokens": 200},
18 |         #         {"prompt_tokens": 300, "completion_tokens": 400},
19 |         #     ],
20 |         # }
21 |         self.usage_data = defaultdict(list)
22 | 
23 |     def _flatten_usage_entry(self, usage_entry: dict[str, Any]) -> dict[str, Any]:
24 |         result = dict(usage_entry)
25 | 
26 |         if completion_tokens_details := result.get("completion_tokens_details"):
27 |             result["completion_tokens_details"] = dict(completion_tokens_details)
28 |         if prompt_tokens_details := result.get("prompt_tokens_details"):
29 |             result["prompt_tokens_details"] = dict(prompt_tokens_details)
30 |         return result
31 | 
32 |     def _merge_usage_entries(self, usage_entry1: dict[str, Any] | None, usage_entry2: dict[str, Any] | None) -> dict[str, Any]:
33 |         if usage_entry1 is None or len(usage_entry1) == 0:
34 |             return dict(usage_entry2)
35 |         if usage_entry2 is None or len(usage_entry2) == 0:
36 |             return dict(usage_entry1)
37 | 
38 |         result = dict(usage_entry2)
39 |         for k, v in usage_entry1.items():
40 |             current_v = result.get(k)
41 |             if isinstance(v, dict) or isinstance(current_v, dict):
42 |                 result[k] = self._merge_usage_entries(current_v, v)
43 |             else:
44 |                 result[k] = (current_v or 0) + (v or 0)
45 |         return result
46 | 
47 |     def add_usage(self, lm: str, usage_entry: dict[str, Any]) -> None:
48 |         """Add a usage entry to the tracker."""
49 |         if len(usage_entry) > 0:
50 |             self.usage_data[lm].append(self._flatten_usage_entry(usage_entry))
51 | 
52 |     def get_total_tokens(self) -> dict[str, dict[str, Any]]:
53 |         """Calculate total tokens from all tracked usage."""
54 |         total_usage_by_lm = {}
55 |         for lm, usage_entries in self.usage_data.items():
56 |             total_usage = {}
57 |             for usage_entry in usage_entries:
58 |                 total_usage = self._merge_usage_entries(total_usage, usage_entry)
59 |             total_usage_by_lm[lm] = total_usage
60 |         return total_usage_by_lm
61 | 
62 | 
63 | @contextmanager
64 | def track_usage() -> Generator[UsageTracker, None, None]:
65 |     """Context manager for tracking LM usage."""
66 |     tracker = UsageTracker()
67 | 
68 |     with settings.context(usage_tracker=tracker):
69 |         yield tracker
70 | 
```
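
A minimal usage sketch of the `track_usage` context manager above; the model name is illustrative, and totals depend on the provider returning token usage:

```python
import dspy
from dspy.utils.usage_tracker import track_usage

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # model name is illustrative
program = dspy.Predict("question -> answer")

with track_usage() as tracker:
    program(question="What is the capital of France?")
    program(question="What is the capital of Germany?")

# Totals are aggregated per LM name, e.g. {"openai/gpt-4o-mini": {"prompt_tokens": ..., ...}}
print(tracker.get_total_tokens())
```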

--------------------------------------------------------------------------------
/dspy/teleprompt/signature_opt.py:
--------------------------------------------------------------------------------

```python
 1 | from .copro_optimizer import COPRO
 2 | 
 3 | """
 4 | ===============================================================
 5 | DEPRECATED!!!
 6 | PLEASE USE COPRO INSTEAD.
 7 | ===============================================================
 8 | 
 9 | USAGE SUGGESTIONS:
10 | 
11 | The following code can be used to compile an optimized signature teleprompter and evaluate it on an end task:
12 | 
13 | teleprompter = SignatureOptimizer(prompt_model=prompt_model, metric=metric, breadth=BREADTH, depth=DEPTH, init_temperature=INIT_TEMPERATURE)
14 | kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)
15 | compiled_prompt_opt = teleprompter.compile(program.deepcopy(), devset=devset[:DEV_NUM], eval_kwargs=kwargs)
16 | eval_score = evaluate(compiled_prompt_opt, devset=evalset[:EVAL_NUM], **kwargs)
17 | 
18 | Note that this teleprompter takes in the following parameters:
19 | 
20 | * prompt_model: The model used for prompt generation. When unspecified, defaults to the model set in settings (i.e., dspy.settings.configure(lm=task_model)).
21 | * metric: The task metric used for optimization.
22 | * breadth: The number of new prompts to generate at each iteration. Default=10.
23 | * depth: The number of times we should ask our prompt model to generate new prompts, with the history of the past prompts as input. Default=3.
24 | * init_temperature: The temperature used to generate new prompts. Higher roughly equals more creative. Default=1.4.
25 | * verbose: Tells the method whether or not to print intermediate steps.
26 | * track_stats: Tells the method whether or not to track statistics about the optimization process.
27 |                 If True, the method will track the following statistics:
28 |                     * results_best: The min,max,avg,stddev of top 10 scores for each predictor at each depth.
29 |                     * results_latest: The min,max,avg,stddev of newest prompt scores for each predictor at each depth.
30 |                     * total_calls: The total number of calls to the task metric.
31 |                 These statistics will be returned as attributes of the best program.
32 | """
33 | 
34 | 
35 | class SignatureOptimizer(COPRO):
36 |     def __init__(
37 |         self,
38 |         prompt_model=None,
39 |         metric=None,
40 |         breadth=10,
41 |         depth=3,
42 |         init_temperature=1.4,
43 |         verbose=False,
44 |         track_stats=False,
45 |     ):
46 |         print(
47 |             "\u001b[31m[WARNING] SignatureOptimizer has been deprecated and replaced with COPRO.  SignatureOptimizer will be removed in a future release. \u001b[31m",
48 |         )
49 |         super().__init__(prompt_model, metric, breadth, depth, init_temperature, verbose, track_stats)
50 | 
51 |     def compile(self, student, *, devset, eval_kwargs):
52 |         return super().compile(student, trainset=devset, eval_kwargs=eval_kwargs)
53 | 
```

--------------------------------------------------------------------------------
/docs/docs/tutorials/index.md:
--------------------------------------------------------------------------------

```markdown
 1 | Welcome to DSPy tutorials! We've organized our tutorials into three main categories to help you get started:
 2 | 
 3 | - **Build AI Programs with DSPy**: These hands-on tutorials guide you through building production-ready AI
 4 |   applications. From implementing RAG systems to creating intelligent agents, each tutorial demonstrates
 5 |   practical use cases. You'll also learn how to leverage DSPy optimizers to enhance your program's performance.
 6 | 
 7 | - **Optimize AI Programs with DSPy Optimizers**: These tutorials deep dive into DSPy's optimization capabilities. While
 8 |   lighter on programming concepts, they focus on how to systematically improve your AI programs using DSPy
 9 |   optimizers, and showcase how DSPy optimizers help improve the quality automatically.
10 | 
11 | - **DSPy Core Development**: These tutorials cover essential DSPy features and best practices. Learn how to implement
12 |   key functionalities like streaming, caching, deployment, and monitoring in your DSPy applications.
13 | 
14 | 
15 | - Build AI Programs with DSPy
16 |     - [Managing Conversation History](conversation_history/index.md)
17 |     - [Building AI Agents with DSPy](customer_service_agent/index.ipynb)
18 |     - [Building AI Applications by Customizing DSPy Modules](custom_module/index.ipynb)
19 |     - [Retrieval-Augmented Generation (RAG)](rag/index.ipynb)
20 |     - [Building RAG as Agent](agents/index.ipynb)
21 |     - [Entity Extraction](entity_extraction/index.ipynb)
22 |     - [Classification](classification/index.md)
23 |     - [Multi-Hop RAG](multihop_search/index.ipynb)
24 |     - [Privacy-Conscious Delegation](papillon/index.md)
25 |     - [Program Of Thought](program_of_thought/index.ipynb)
26 |     - [Image Generation Prompt iteration](image_generation_prompting/index.ipynb)
27 |     - [Audio](audio/index.ipynb)
28 | 
29 | 
30 | - Optimize AI Programs with DSPy
31 |     - [Math Reasoning](math/index.ipynb)
32 |     - [Classification Finetuning](classification_finetuning/index.ipynb)
33 |     - [Advanced Tool Use](tool_use/index.ipynb)
34 |     - [Finetuning Agents](games/index.ipynb)
35 | 
36 | 
37 | - Reflective Prompt Evolution with dspy.GEPA:
38 |     - [Overview](gepa_ai_program/index.md)
39 |     - [GEPA for AIME](gepa_aime/index.ipynb)
40 |     - [GEPA for PAPILLON](gepa_papillon/index.ipynb)
41 |     - [GEPA for Enterprise classification task](gepa_facilitysupportanalyzer/index.ipynb)
42 | 
43 | 
44 | - Tools, Development, and Deployment
45 |     - [Use MCP in DSPy](mcp/index.md)
46 |     - [Output Refinement](output_refinement/best-of-n-and-refine.md)
47 |     - [Saving and Loading](saving/index.md)
48 |     - [Cache](cache/index.md)
49 |     - [Deployment](deployment/index.md)
50 |     - [Debugging & Observability](observability/index.md)
51 |     - [Tracking DSPy Optimizers](optimizer_tracking/index.md)
52 |     - [Streaming](streaming/index.md)
53 |     - [Async](async/index.md)
54 | 
55 | 
56 | 
```

--------------------------------------------------------------------------------
/tests/test_utils/server/__init__.py:
--------------------------------------------------------------------------------

```python
 1 | import json
 2 | import os
 3 | import socket
 4 | import subprocess
 5 | import tempfile
 6 | import time
 7 | from typing import Any
 8 | 
 9 | import pytest
10 | 
11 | from tests.test_utils.server.litellm_server import LITELLM_TEST_SERVER_LOG_FILE_PATH_ENV_VAR
12 | 
13 | 
14 | @pytest.fixture()
15 | def litellm_test_server() -> tuple[str, str]:
16 |     """
17 |     Start a LiteLLM test server for a DSPy integration test case, and tear down the
18 |     server when the test case completes.
19 |     """
20 |     with tempfile.TemporaryDirectory() as server_log_dir_path:
21 |         # Create a server log file used to store request logs
22 |         server_log_file_path = os.path.join(server_log_dir_path, "request_logs.jsonl")
23 |         open(server_log_file_path, "a").close()
24 | 
25 |         port = _get_random_port()
26 |         host = "127.0.0.1"
27 |         print(f"Starting LiteLLM proxy server on port {port}")
28 | 
29 |         process = subprocess.Popen(
30 |             ["litellm", "--host", host, "--port", str(port), "--config", _get_litellm_config_path()],
31 |             env={LITELLM_TEST_SERVER_LOG_FILE_PATH_ENV_VAR: server_log_file_path, **os.environ.copy()},
32 |             text=True,
33 |         )
34 | 
35 |         try:
36 |             _wait_for_port(host=host, port=port)
37 |         except TimeoutError as e:
38 |             process.terminate()
39 |             raise e
40 | 
41 |         server_url = f"http://{host}:{port}"
42 |         yield server_url, server_log_file_path
43 | 
44 |         process.kill()
45 |         process.wait()
46 | 
47 | 
48 | def read_litellm_test_server_request_logs(server_log_file_path: str) -> list[dict[str, Any]]:
49 |     """
50 |     Read request logs from a LiteLLM server used during DSPy integration tests.
51 | 
52 |     Args:
53 |         server_log_file_path: The filesystem path to the LiteLLM server request logs jsonlines file.
54 |     Return:
55 |         A list of log entries, where each entry corresponds to one request handled by the server.
56 |     """
57 |     data = []
58 |     with open(server_log_file_path) as f:
59 |         for line in f:
60 |             data.append(json.loads(line))
61 | 
62 |     return data
63 | 
64 | 
65 | def _get_litellm_config_path():
66 |     module_dir = os.path.dirname(os.path.abspath(__file__))
67 |     return os.path.join(module_dir, "litellm_server_config.yaml")
68 | 
69 | 
70 | def _get_random_port():
71 |     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
72 |         s.bind(("", 0))
73 |         return s.getsockname()[1]
74 | 
75 | 
76 | def _wait_for_port(host, port, timeout=10):
77 |     start_time = time.time()
78 |     while time.time() - start_time < timeout:
79 |         with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
80 |             try:
81 |                 sock.connect((host, port))
82 |                 return True
83 |             except ConnectionRefusedError:
84 |                 time.sleep(0.5)  # Wait briefly before trying again
85 |     raise TimeoutError(f"Server on port {port} did not become ready within {timeout} seconds.")
86 | 
```
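
A hedged sketch of consuming the `litellm_test_server` fixture above in a test: the fixture yields the proxy URL and the request-log path, and `read_litellm_test_server_request_logs` reads the log afterwards. The model alias is an assumption about the proxy's config file:

```python
import dspy
from tests.test_utils.server import read_litellm_test_server_request_logs

def test_lm_against_local_proxy(litellm_test_server):
    server_url, server_log_file_path = litellm_test_server

    # The model alias "openai/dspy-test-model" is assumed to be defined in the proxy config.
    lm = dspy.LM(model="openai/dspy-test-model", api_base=server_url, api_key="fake-key")
    lm("Say hello")

    # Each handled request is appended to the JSONL log by the proxy callback.
    logs = read_litellm_test_server_request_logs(server_log_file_path)
    assert len(logs) >= 1
```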

--------------------------------------------------------------------------------
/tests/reliability/reliability_conf.yaml:
--------------------------------------------------------------------------------

```yaml
 1 | adapter: chat
 2 | model_list:
 3 |   # The model to use for judging the correctness of program
 4 |   # outputs throughout reliability test suites. We recommend using
 5 |   # a high quality model as the judge, such as OpenAI GPT-4o
 6 |   - model_name: "judge"
 7 |     litellm_params:
 8 |       # model: "<litellm_provider>/<litellm_model_name>"
 9 |       # api_key: "api key"
10 |       # api_base: "<api_base>"
11 |   - model_name: "gpt-4o"
12 |     litellm_params:
13 |       # model: "<litellm_provider>/<litellm_model_name>"
14 |       # api_key: "api key"
15 |       # api_base: "<api_base>"
16 |   - model_name: "gpt-4o-mini"
17 |     litellm_params:
18 |       # model: "<litellm_provider>/<litellm_model_name>"
19 |       # api_key: "api key"
20 |       # api_base: "<api_base>"
21 |   - model_name: "gpt-4-turbo"
22 |     litellm_params:
23 |       # model: "<litellm_provider>/<litellm_model_name>"
24 |       # api_key: "api key"
25 |       # api_base: "<api_base>"
26 |   - model_name: "gpt-o1"
27 |     litellm_params:
28 |       # model: "<litellm_provider>/<litellm_model_name>"
29 |       # api_key: "api key"
30 |       # api_base: "<api_base>"
31 |   - model_name: "gpt-o1-mini"
32 |     litellm_params:
33 |       # model: "<litellm_provider>/<litellm_model_name>"
34 |       # api_key: "api key"
35 |       # api_base: "<api_base>"
36 |   - model_name: "claude-3.5-sonnet"
37 |     litellm_params:
38 |       # model: "<litellm_provider>/<litellm_model_name>"
39 |       # api_key: "api key"
40 |       # api_base: "<api_base>"
41 |   - model_name: "claude-3.5-haiku"
42 |     litellm_params:
43 |       # model: "<litellm_provider>/<litellm_model_name>"
44 |       # api_key: "api key"
45 |       # api_base: "<api_base>"
46 |   - model_name: "gemini-1.5-pro"
47 |     litellm_params:
48 |       # model: "<litellm_provider>/<litellm_model_name>"
49 |       # api_key: "api key"
50 |       # api_base: "<api_base>"
51 |   - model_name: "gemini-1.5-flash"
52 |     litellm_params:
53 |       # model: "<litellm_provider>/<litellm_model_name>"
54 |       # api_key: "api key"
55 |       # api_base: "<api_base>"
56 |   - model_name: "llama-3.1-405b-instruct"
57 |     litellm_params:
58 |       # model: "<litellm_provider>/<litellm_model_name>"
59 |       # api_key: "api key"
60 |       # api_base: "<api_base>"
61 |   - model_name: "llama-3.1-70b-instruct"
62 |     litellm_params:
63 |       # model: "<litellm_provider>/<litellm_model_name>"
64 |       # api_key: "api key"
65 |       # api_base: "<api_base>"
66 |   - model_name: "llama-3.1-8b-instruct"
67 |     litellm_params:
68 |       # model: "<litellm_provider>/<litellm_model_name>"
69 |       # api_key: "api key"
70 |       # api_base: "<api_base>"
71 |   - model_name: "llama-3.2-3b-instruct"
72 |     litellm_params:
73 |       # model: "<litellm_provider>/<litellm_model_name>"
74 |       # api_key: "api key"
75 |       # api_base: "<api_base>"
76 |   - model_name: "deepseek-r1"
77 |     litellm_params:
78 |       # model: "<litellm_provider>/<litellm_model_name>"
79 |       # api_key: "api key"
80 |       # max_tokens: 10000
81 | 
82 | 
```

--------------------------------------------------------------------------------
/tests/reliability/conftest.py:
--------------------------------------------------------------------------------

```python
 1 | import os
 2 | 
 3 | import pytest
 4 | 
 5 | import dspy
 6 | from ..conftest import clear_settings
 7 | from ..reliability.utils import get_adapter, parse_reliability_conf_yaml
 8 | 
 9 | # Standard list of models that should be used for periodic DSPy reliability testing
10 | MODEL_LIST = [
11 |     "gpt-4o",
12 |     "gpt-4o-mini",
13 |     "gpt-4-turbo",
14 |     "gpt-o1-preview",
15 |     "gpt-o1-mini",
16 |     "claude-3.5-sonnet",
17 |     "claude-3.5-haiku",
18 |     "gemini-1.5-pro",
19 |     "gemini-1.5-flash",
20 |     "llama-3.1-405b-instruct",
21 |     "llama-3.1-70b-instruct",
22 |     "llama-3.1-8b-instruct",
23 |     "llama-3.2-3b-instruct",
24 |     "deepseek-r1",
25 | ]
26 | 
27 | 
28 | def pytest_generate_tests(metafunc):
29 |     """
30 |     Hook to parameterize reliability test cases with each model defined in the
31 |     reliability tests YAML configuration
32 |     """
33 |     known_failing_models = getattr(metafunc.function, "_known_failing_models", [])
34 | 
35 |     if "configure_model" in metafunc.fixturenames:
36 |         params = [(model, model in known_failing_models) for model in MODEL_LIST]
37 |         ids = [f"{model}" for model, _ in params]  # Custom IDs for display
38 |         metafunc.parametrize("configure_model", params, indirect=True, ids=ids)
39 | 
40 | 
41 | @pytest.fixture(autouse=True)
42 | def configure_model(request):
43 |     """
44 |     Fixture to configure the DSPy library with a particular configured model and adapter
45 |     before executing a test case.
46 |     """
47 |     module_dir = os.path.dirname(os.path.abspath(__file__))
48 |     conf_path = os.path.join(module_dir, "reliability_conf.yaml")
49 |     reliability_conf = parse_reliability_conf_yaml(conf_path)
50 |     adapter = get_adapter(reliability_conf)
51 | 
52 |     model_name, should_ignore_failure = request.param
53 |     model_params = reliability_conf.models.get(model_name)
54 |     if model_params:
55 |         lm = dspy.LM(**model_params)
56 |         dspy.configure(lm=lm, adapter=adapter)
57 |     else:
58 |         pytest.skip(
59 |             f"Skipping test because no reliability testing YAML configuration was found"
60 |             f" for model {model_name}, or the YAML configuration is missing LiteLLM parameters"
61 |             f" for this model ('litellm_params' section of conf file is missing)."
62 |         )
63 | 
64 |     # Store `should_ignore_failure` flag on the request node for use in post-test handling
65 |     request.node.should_ignore_failure = should_ignore_failure
66 |     request.node.model_name = model_name
67 | 
68 | 
69 | @pytest.hookimpl(tryfirst=True, hookwrapper=True)
70 | def pytest_runtest_makereport(item, call):
71 |     """
72 |     Hook to conditionally ignore failures in a given test case for known failing models.
73 |     """
74 |     outcome = yield
75 |     rep = outcome.get_result()
76 | 
77 |     should_ignore_failure = getattr(item, "should_ignore_failure", False)
78 | 
79 |     if should_ignore_failure and rep.failed:
80 |         rep.outcome = "passed"
81 |         rep.wasxfail = "Ignoring failure for known failing model"
82 | 
```

--------------------------------------------------------------------------------
/dspy/teleprompt/knn_fewshot.py:
--------------------------------------------------------------------------------

```python
 1 | import types
 2 | from typing import Any
 3 | 
 4 | from dspy.clients import Embedder
 5 | from dspy.predict.knn import KNN
 6 | from dspy.primitives import Example
 7 | from dspy.teleprompt import BootstrapFewShot
 8 | from dspy.teleprompt.teleprompt import Teleprompter
 9 | 
10 | 
11 | class KNNFewShot(Teleprompter):
12 |     def __init__(self, k: int, trainset: list[Example], vectorizer: Embedder, **few_shot_bootstrap_args: dict[str, Any]):
13 |         """
14 |         KNNFewShot is an optimizer that uses an in-memory KNN retriever to find the k nearest neighbors
15 |         in a trainset at test time. For each input example in a forward call, it identifies the k most
16 |         similar examples from the trainset and attaches them as demonstrations to the student module.
17 | 
18 |         Args:
19 |             k: The number of nearest neighbors to attach to the student model.
20 |             trainset: The training set to use for few-shot prompting.
21 |             vectorizer: The `Embedder` to use for vectorization
22 |             **few_shot_bootstrap_args: Additional arguments for the `BootstrapFewShot` optimizer.
23 | 
24 |         Example:
25 |             ```python
26 |             import dspy
27 |             from sentence_transformers import SentenceTransformer
28 | 
29 |             # Define a QA module with chain of thought
30 |             qa = dspy.ChainOfThought("question -> answer")
31 | 
32 |             # Create a training dataset with examples
33 |             trainset = [
34 |                 dspy.Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
35 |                 # ... more examples ...
36 |             ]
37 | 
38 |             # Initialize KNNFewShot with a sentence transformer model
39 |             knn_few_shot = KNNFewShot(
40 |                 k=3,
41 |                 trainset=trainset,
42 |                 vectorizer=dspy.Embedder(SentenceTransformer("all-MiniLM-L6-v2").encode)
43 |             )
44 | 
45 |             # Compile the QA module with few-shot learning
46 |             compiled_qa = knn_few_shot.compile(qa)
47 | 
48 |             # Use the compiled module
49 |             result = compiled_qa("What is the capital of Belgium?")
50 |             ```
51 |         """
52 |         self.KNN = KNN(k, trainset, vectorizer=vectorizer)
53 |         self.few_shot_bootstrap_args = few_shot_bootstrap_args
54 | 
55 |     def compile(self, student, *, teacher=None):
56 |         student_copy = student.reset_copy()
57 | 
58 |         def forward_pass(_, **kwargs):
59 |             knn_trainset = self.KNN(**kwargs)
60 |             few_shot_bootstrap = BootstrapFewShot(**self.few_shot_bootstrap_args)
61 |             compiled_program = few_shot_bootstrap.compile(
62 |                 student,
63 |                 teacher=teacher,
64 |                 trainset=knn_trainset,
65 |             )
66 |             return compiled_program(**kwargs)
67 | 
68 |         student_copy.forward = types.MethodType(forward_pass, student_copy)
69 |         return student_copy
70 | 
```
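
The noteworthy detail in `compile` is that no demos are baked in at compile time: `forward` is rebound on a reset copy of the student with `types.MethodType`, so every call retrieves the k nearest trainset examples and bootstraps demos from just those neighbors. A minimal, DSPy-free sketch of that rebinding pattern:

```python
import types


class Student:
    def forward(self, **kwargs):
        return {"demos": [], **kwargs}


student_copy = Student()


def forward_pass(self, **kwargs):
    # In KNNFewShot, these demos would come from the KNN retriever plus a
    # per-query BootstrapFewShot compile over the retrieved neighbors.
    demos = ["neighbor example 1", "neighbor example 2"]
    return {"demos": demos, **kwargs}


# Rebind forward on the instance only; the class definition is untouched.
student_copy.forward = types.MethodType(forward_pass, student_copy)
print(student_copy.forward(question="What is the capital of Belgium?"))
```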

--------------------------------------------------------------------------------
/tests/teleprompt/test_knn_fewshot.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.teleprompt.knn_fewshot import KNNFewShot
 5 | from dspy.utils.dummies import DummyLM, DummyVectorizer
 6 | 
 7 | 
 8 | def mock_example(question: str, answer: str) -> dspy.Example:
 9 |     """Creates a mock DSPy example with the specified question and answer."""
10 |     return dspy.Example(question=question, answer=answer).with_inputs("question")
11 | 
12 | 
13 | @pytest.fixture
14 | def setup_knn_few_shot() -> KNNFewShot:
15 |     """Sets up a KNNFewShot instance for testing."""
16 |     trainset = [
17 |         mock_example("What is the capital of France?", "Paris"),
18 |         mock_example("What is the largest ocean?", "Pacific"),
19 |         mock_example("What is 2+2?", "4"),
20 |     ]
21 |     return KNNFewShot(k=2, trainset=trainset, vectorizer=dspy.Embedder(DummyVectorizer()))
22 | 
23 | 
24 | def test_knn_few_shot_initialization(setup_knn_few_shot):
25 |     """Tests the KNNFewShot initialization."""
26 |     knn_few_shot = setup_knn_few_shot
27 |     assert knn_few_shot.KNN.k == 2, "Incorrect k value for KNN"
28 |     assert len(knn_few_shot.KNN.trainset) == 3, "Incorrect trainset size for KNN"
29 | 
30 | 
31 | class SimpleModule(dspy.Module):
32 |     def __init__(self, signature):
33 |         super().__init__()
34 |         self.predictor = dspy.Predict(signature)
35 | 
36 |     def forward(self, *args, **kwargs):
37 |         return self.predictor(**kwargs)
38 | 
39 |     def reset_copy(self):
40 |         # Creates a new instance of SimpleModule with the same predictor
41 |         return SimpleModule(self.predictor.signature)
42 | 
43 | 
44 | # TODO: Test not working yet
45 | def _test_knn_few_shot_compile(setup_knn_few_shot):
46 |     """Tests the compile method of KNNFewShot with SimpleModule as student."""
47 |     student = SimpleModule("input -> output")
48 |     teacher = SimpleModule("input -> output")  # Assuming teacher uses the same module type
49 | 
50 |     # Setup DummyLM with a response for a query similar to one of the training examples
51 |     lm = DummyLM(["Madrid", "10"])
52 |     dspy.settings.configure(lm=lm)  # Responses for the capital of Spain and the result of 5+5
53 | 
54 |     knn_few_shot = setup_knn_few_shot
55 |     trainset = knn_few_shot.KNN.trainset
56 |     compiled_student = knn_few_shot.compile(student, teacher=teacher, trainset=trainset, valset=None)
57 | 
58 |     assert len(compiled_student.predictor.demos) == 1
59 |     assert compiled_student.predictor.demos[0].input == trainset[0].input
60 |     assert compiled_student.predictor.demos[0].output == trainset[0].output
61 |     # Simulate a query that is similar to one of the training examples
62 |     output = compiled_student.forward(input="What is the capital of Spain?").output
63 | 
64 |     # Validate that the output corresponds to one of the expected DummyLM responses
65 |     # This assumes the compiled_student's forward method will execute the predictor with the given query
66 |     assert output in ["Madrid", "10"], "The compiled student did not return the correct output based on the query"
67 | 
```

--------------------------------------------------------------------------------
/dspy/utils/annotation.py:
--------------------------------------------------------------------------------

```python
 1 | import inspect
 2 | import re
 3 | import types
 4 | from typing import Callable, ParamSpec, TypeVar, overload
 5 | 
 6 | P = ParamSpec("P")
 7 | R = TypeVar("R")
 8 | 
 9 | @overload
10 | def experimental(f: Callable[P, R], version: str | None = None) -> Callable[P, R]: ...
11 | 
12 | @overload
13 | def experimental(f: None = None, version: str | None = None) -> Callable[[Callable[P, R]], Callable[P, R]]: ...
14 | 
15 | 
16 | def experimental(
17 |     f: Callable[P, R] | None = None,
18 |     version: str | None = None,
19 | ) -> Callable[[Callable[P, R]], Callable[P, R]]:
20 |     """Decorator / decorator creator for marking APIs experimental in the docstring.
21 | 
22 |     Args:
23 |         f: The function to be decorated.
24 |         version: The version in which the API was introduced as experimental.
25 |             The version is used to determine whether the API should be considered
26 |             as stable or not when releasing a new version of DSPy.
27 | 
28 |     Returns:
29 |         A decorator that adds a note to the docstring of the decorated API.
30 |     """
31 |     if f:
32 |         return _experimental(f, version)
33 |     else:
34 |         def decorator(f: Callable[P, R]) -> Callable[P, R]:
35 |             return _experimental(f, version)
36 |         return decorator
37 | 
38 | 
39 | def _experimental(api: Callable[P, R], version: str | None = None) -> Callable[P, R]:
40 |     """Add experimental notice to the API's docstring."""
41 |     if inspect.isclass(api):
42 |         api_type = "class"
43 |     elif inspect.isfunction(api):
44 |         api_type = "function"
45 |     elif isinstance(api, property):
46 |         api_type = "property"
47 |     elif isinstance(api, types.MethodType):
48 |         api_type = "method"
49 |     else:
50 |         api_type = str(type(api))
51 | 
52 |     indent = _get_min_indent_of_docstring(api.__doc__) if api.__doc__ else ""
53 | 
54 |     version_text = f" (introduced in v{version})" if version else ""
55 |     notice = (
56 |         indent + f"Experimental: This {api_type} may change or "
57 |         f"be removed in a future release without warning{version_text}."
58 |     )
59 | 
60 |     if api_type == "property":
61 |         api.__doc__ = api.__doc__ + "\n\n" + notice if api.__doc__ else notice
62 |     else:
63 |         if api.__doc__:
64 |             api.__doc__ = notice + "\n\n" + api.__doc__
65 |         else:
66 |             api.__doc__ = notice
67 |     return api
68 | 
69 | 
70 | def _get_min_indent_of_docstring(docstring_str: str) -> str:
71 |     """
72 |     Get the minimum indentation string of a docstring, determined from the
73 |     indentation of its final line. This relies on ruff rule D209, which requires
74 |     the closing triple quote of a multiline docstring to be placed on its own
75 |     new line.
76 | 
77 |     Args:
78 |         docstring_str: string with docstring
79 | 
80 |     Returns:
81 |         Whitespace corresponding to the indent of a docstring.
82 |     """
83 | 
84 |     if not docstring_str or "\n" not in docstring_str:
85 |         return ""
86 | 
87 |     match = re.match(r"^\s*", docstring_str.rsplit("\n", 1)[-1])
88 |     return match.group() if match else ""
89 | 
```
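
Usage follows the two overloads above, either bare or with a `version`; the notice is prepended to any existing docstring:

```python
from dspy.utils.annotation import experimental


@experimental(version="3.0.0")
def summarize(text: str) -> str:
    """Summarize a passage of text."""
    return text[:100]


# Prints the experimental notice followed by the original docstring, e.g.
# "Experimental: This function may change or be removed in a future release
# without warning (introduced in v3.0.0)."
print(summarize.__doc__)
```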

--------------------------------------------------------------------------------
/tests/teleprompt/test_bootstrap_finetune.py:
--------------------------------------------------------------------------------

```python
 1 | from unittest.mock import patch
 2 | 
 3 | import dspy
 4 | from dspy import Example
 5 | from dspy.predict import Predict
 6 | from dspy.teleprompt import BootstrapFinetune
 7 | from dspy.utils.dummies import DummyLM
 8 | 
 9 | 
10 | # Define a simple metric function for testing
11 | def simple_metric(example, prediction, trace=None):
12 |     return example.output == prediction.output
13 | 
14 | 
15 | examples = [
16 |     Example(input="What is the color of the sky?", output="blue").with_inputs("input"),
17 |     Example(input="What does the fox say?", output="Ring-ding-ding-ding-dingeringeding!").with_inputs("input"),
18 | ]
19 | trainset = [examples[0]]
20 | 
21 | 
22 | def test_bootstrap_finetune_initialization():
23 |     """Test BootstrapFinetune initialization with various parameters."""
24 |     bootstrap = BootstrapFinetune(metric=simple_metric)
25 |     assert bootstrap.metric == simple_metric, "Metric not correctly initialized"
26 |     assert bootstrap.multitask == True, "Multitask should default to True"
27 | 
28 | 
29 | class SimpleModule(dspy.Module):
30 |     def __init__(self, signature):
31 |         super().__init__()
32 |         self.predictor = Predict(signature)
33 | 
34 |     def forward(self, **kwargs):
35 |         return self.predictor(**kwargs)
36 | 
37 | 
38 | def test_compile_with_predict_instances():
39 |     """Test BootstrapFinetune compilation with Predict instances."""
40 |     # Create SimpleModule instances for student and teacher
41 |     student = SimpleModule("input -> output")
42 |     teacher = SimpleModule("input -> output")
43 | 
44 |     lm = DummyLM([{"output": "blue"}, {"output": "Ring-ding-ding-ding-dingeringeding!"}])
45 |     dspy.settings.configure(lm=lm)
46 | 
47 |     # Set LM for both student and teacher
48 |     student.set_lm(lm)
49 |     teacher.set_lm(lm)
50 | 
51 |     bootstrap = BootstrapFinetune(metric=simple_metric)
52 | 
53 |     # Mock the fine-tuning process since DummyLM doesn't support it
54 |     with patch.object(bootstrap, "finetune_lms") as mock_finetune:
55 |         mock_finetune.return_value = {(lm, None): lm}
56 |         compiled_student = bootstrap.compile(student, teacher=teacher, trainset=trainset)
57 | 
58 |         assert compiled_student is not None, "Failed to compile student"
59 |         assert hasattr(compiled_student, "_compiled") and compiled_student._compiled, "Student compilation flag not set"
60 | 
61 |         mock_finetune.assert_called_once()
62 | 
63 | 
64 | def test_error_handling_missing_lm():
65 |     """Test error handling when predictor doesn't have an LM assigned."""
66 | 
67 |     lm = DummyLM([{"output": "test"}])
68 |     dspy.settings.configure(lm=lm)
69 | 
70 |     student = SimpleModule("input -> output")
71 |     # Intentionally NOT setting LM for the student module
72 | 
73 |     bootstrap = BootstrapFinetune(metric=simple_metric)
74 | 
75 |     # This should raise ValueError about missing LM and hint to use set_lm
76 |     try:
77 |         bootstrap.compile(student, trainset=trainset)
78 |         assert False, "Should have raised ValueError for missing LM"
79 |     except ValueError as e:
80 |         assert "does not have an LM assigned" in str(e)
81 |         assert "set_lm" in str(e)
82 | 
```

--------------------------------------------------------------------------------
/dspy/utils/inspect_history.py:
--------------------------------------------------------------------------------

```python
 1 | def _green(text: str, end: str = "\n"):
 2 |     return "\x1b[32m" + str(text).lstrip() + "\x1b[0m" + end
 3 | 
 4 | 
 5 | def _red(text: str, end: str = "\n"):
 6 |     return "\x1b[31m" + str(text) + "\x1b[0m" + end
 7 | 
 8 | 
 9 | def _blue(text: str, end: str = "\n"):
10 |     return "\x1b[34m" + str(text) + "\x1b[0m" + end
11 | 
12 | 
13 | def pretty_print_history(history, n: int = 1):
14 |     """Prints the last n prompts and their completions."""
15 | 
16 |     for item in history[-n:]:
17 |         messages = item["messages"] or [{"role": "user", "content": item["prompt"]}]
18 |         outputs = item["outputs"]
19 |         timestamp = item.get("timestamp", "Unknown time")
20 | 
21 |         print("\n\n\n")
22 |         print("\x1b[34m" + f"[{timestamp}]" + "\x1b[0m" + "\n")
23 | 
24 |         for msg in messages:
25 |             print(_red(f"{msg['role'].capitalize()} message:"))
26 |             if isinstance(msg["content"], str):
27 |                 print(msg["content"].strip())
28 |             else:
29 |                 if isinstance(msg["content"], list):
30 |                     for c in msg["content"]:
31 |                         if c["type"] == "text":
32 |                             print(c["text"].strip())
33 |                         elif c["type"] == "image_url":
34 |                             image_str = ""
35 |                             if "base64" in c["image_url"].get("url", ""):
36 |                                 len_base64 = len(c["image_url"]["url"].split("base64,")[1])
37 |                                 image_str = (
38 |                                     f"<{c['image_url']['url'].split('base64,')[0]}base64,"
39 |                                     f"<IMAGE BASE 64 ENCODED({len_base64!s})>"
40 |                                 )
41 |                             else:
42 |                                 image_str = f"<image_url: {c['image_url']['url']}>"
43 |                             print(_blue(image_str.strip()))
44 |                         elif c["type"] == "input_audio":
45 |                             audio_format = c["input_audio"]["format"]
46 |                             len_audio = len(c["input_audio"]["data"])
47 |                             audio_str = f"<audio format='{audio_format}' base64-encoded, length={len_audio}>"
48 |                             print(_blue(audio_str.strip()))
49 |             print("\n")
50 | 
51 |         if isinstance(outputs[0], dict):
52 |             if outputs[0]["text"]:
53 |                 print(_red("Response:"))
54 |                 print(_green(outputs[0]["text"].strip()))
55 | 
56 |             if outputs[0].get("tool_calls"):
57 |                 print(_red("Tool calls:"))
58 |                 for tool_call in outputs[0]["tool_calls"]:
59 |                     print(_green(f"{tool_call['function']['name']}: {tool_call['function']['arguments']}"))
60 |         else:
61 |             print(_red("Response:"))
62 |             print(_green(outputs[0].strip()))
63 | 
64 |         if len(outputs) > 1:
65 |             choices_text = f" \t (and {len(outputs) - 1} other completions)"
66 |             print(_red(choices_text, end=""))
67 | 
68 |     print("\n\n\n")
69 | 
```
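
A quick sketch of invoking `pretty_print_history` with a hand-built entry; the keys (`messages`, `outputs`, `timestamp`, `prompt`) mirror the fields read above, while in normal use the entries come from an LM's recorded call history rather than being constructed by hand:

```python
from dspy.utils.inspect_history import pretty_print_history

# Hand-built entry mimicking the structure the function expects; real entries
# are recorded by the LM client as calls are made.
history = [
    {
        "prompt": None,
        "messages": [{"role": "user", "content": "What is the capital of Belgium?"}],
        "outputs": ["Brussels"],
        "timestamp": "2024-01-01T00:00:00",
    }
]

pretty_print_history(history, n=1)
```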

--------------------------------------------------------------------------------
/tests/predict/test_refine.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.predict.predict import Predict
 5 | from dspy.predict.refine import Refine
 6 | from dspy.primitives.prediction import Prediction
 7 | from dspy.utils.dummies import DummyLM
 8 | 
 9 | 
10 | class DummyModule(dspy.Module):
11 |     def __init__(self, signature, forward_fn):
12 |         super().__init__()
13 |         self.predictor = Predict(signature)
14 |         self.forward_fn = forward_fn
15 | 
16 |     def forward(self, **kwargs) -> Prediction:
17 |         return self.forward_fn(self, **kwargs)
18 | 
19 | 
20 | def test_refine_forward_success_first_attempt():
21 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
22 |     dspy.settings.configure(lm=lm)
23 |     module_call_count = [0]
24 | 
25 |     def count_calls(self, **kwargs):
26 |         module_call_count[0] += 1
27 |         return self.predictor(**kwargs)
28 | 
29 |     reward_call_count = [0]
30 | 
31 |     def reward_fn(kwargs, pred: Prediction) -> float:
32 |         reward_call_count[0] += 1
33 |         # Reward 1.0 only for a single-character answer, so the threshold is never met.
34 |         return 1.0 if len(pred.answer) == 1 else 0.0
35 | 
36 |     predict = DummyModule("question -> answer", count_calls)
37 | 
38 |     refine = Refine(module=predict, N=3, reward_fn=reward_fn, threshold=1.0)
39 |     result = refine(question="What is the capital of Belgium?")
40 | 
41 |     assert result.answer == "Brussels", "Result should be `Brussels`"
42 |     assert reward_call_count[0] > 0, "Reward function should have been called"
43 |     assert module_call_count[0] == 3, (
44 |         "Module should have been called exactly 3 times, but was called %d times" % module_call_count[0]
45 |     )
46 | 
47 | 
48 | def test_refine_module_default_fail_count():
49 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
50 |     dspy.settings.configure(lm=lm)
51 | 
52 |     def always_raise(self, **kwargs):
53 |         raise ValueError("Deliberately failing")
54 | 
55 |     predict = DummyModule("question -> answer", always_raise)
56 | 
57 |     refine = Refine(module=predict, N=3, reward_fn=lambda _, __: 1.0, threshold=0.0)
58 |     with pytest.raises(ValueError):
59 |         refine(question="What is the capital of Belgium?")
60 | 
61 | 
62 | def test_refine_module_custom_fail_count():
63 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
64 |     dspy.settings.configure(lm=lm)
65 |     module_call_count = [0]
66 | 
67 |     def raise_on_second_call(self, **kwargs):
68 |         if module_call_count[0] < 2:
69 |             module_call_count[0] += 1
70 |             raise ValueError("Deliberately failing")
71 |         return self.predictor(**kwargs)
72 | 
73 |     predict = DummyModule("question -> answer", raise_on_second_call)
74 | 
75 |     refine = Refine(module=predict, N=3, reward_fn=lambda _, __: 1.0, threshold=0.0, fail_count=1)
76 |     with pytest.raises(ValueError):
77 |         refine(question="What is the capital of Belgium?")
78 |     assert module_call_count[0] == 2, (
79 |         "Module should have been called exactly 2 times, but was called %d times" % module_call_count[0]
80 |     )
81 | 
```

--------------------------------------------------------------------------------
/tests/predict/test_best_of_n.py:
--------------------------------------------------------------------------------

```python
 1 | import pytest
 2 | 
 3 | import dspy
 4 | from dspy.predict.best_of_n import BestOfN
 5 | from dspy.predict.predict import Predict
 6 | from dspy.primitives.prediction import Prediction
 7 | from dspy.utils.dummies import DummyLM
 8 | 
 9 | 
10 | class DummyModule(dspy.Module):
11 |     def __init__(self, signature, forward_fn):
12 |         super().__init__()
13 |         self.predictor = Predict(signature)
14 |         self.forward_fn = forward_fn
15 | 
16 |     def forward(self, **kwargs) -> Prediction:
17 |         return self.forward_fn(self, **kwargs)
18 | 
19 | 
20 | def test_refine_forward_success_first_attempt():
21 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
22 |     dspy.settings.configure(lm=lm)
23 |     module_call_count = [0]
24 | 
25 |     def count_calls(self, **kwargs):
26 |         module_call_count[0] += 1
27 |         return self.predictor(**kwargs)
28 | 
29 |     reward_call_count = [0]
30 | 
31 |     def reward_fn(kwargs, pred: Prediction) -> float:
32 |         reward_call_count[0] += 1
33 |         # Reward 1.0 only for a single-character answer, so the threshold is never met.
34 |         return 1.0 if len(pred.answer) == 1 else 0.0
35 | 
36 |     predict = DummyModule("question -> answer", count_calls)
37 | 
38 |     best_of_n = BestOfN(module=predict, N=3, reward_fn=reward_fn, threshold=1.0)
39 |     result = best_of_n(question="What is the capital of Belgium?")
40 | 
41 |     assert result.answer == "Brussels", "Result should be `Brussels`"
42 |     assert reward_call_count[0] > 0, "Reward function should have been called"
43 |     assert module_call_count[0] == 3, (
44 |         "Module should have been called exactly 3 times, but was called %d times" % module_call_count[0]
45 |     )
46 | 
47 | 
48 | def test_refine_module_default_fail_count():
49 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
50 |     dspy.settings.configure(lm=lm)
51 | 
52 |     def always_raise(self, **kwargs):
53 |         raise ValueError("Deliberately failing")
54 | 
55 |     predict = DummyModule("question -> answer", always_raise)
56 | 
57 |     best_of_n = BestOfN(module=predict, N=3, reward_fn=lambda _, __: 1.0, threshold=0.0)
58 |     with pytest.raises(ValueError):
59 |         best_of_n(question="What is the capital of Belgium?")
60 | 
61 | 
62 | def test_refine_module_custom_fail_count():
63 |     lm = DummyLM([{"answer": "Brussels"}, {"answer": "City of Brussels"}, {"answer": "Brussels"}])
64 |     dspy.settings.configure(lm=lm)
65 |     module_call_count = [0]
66 | 
67 |     def raise_on_second_call(self, **kwargs):
68 |         if module_call_count[0] < 2:
69 |             module_call_count[0] += 1
70 |             raise ValueError("Deliberately failing")
71 |         return self.predictor(**kwargs)
72 | 
73 |     predict = DummyModule("question -> answer", raise_on_second_call)
74 | 
75 |     best_of_n = BestOfN(module=predict, N=3, reward_fn=lambda _, __: 1.0, threshold=0.0, fail_count=1)
76 |     with pytest.raises(ValueError):
77 |         best_of_n(question="What is the capital of Belgium?")
78 |     assert module_call_count[0] == 2, (
79 |         "Module should have been called exactly 2 times, but was called %d times" % module_call_count[0]
80 |     )
81 | 
```

--------------------------------------------------------------------------------
/tests/reliability/complex_types/generated/test_many_types_1/program.py:
--------------------------------------------------------------------------------

```python
  1 | ### Input models ###
  2 | 
  3 | 
  4 | from datetime import datetime
  5 | from enum import Enum
  6 | from typing import List, Tuple
  7 | 
  8 | from pydantic import BaseModel, Field
  9 | 
 10 | 
 11 | class EnumField(Enum):
 12 |     option1 = "option1"
 13 |     option2 = "option2"
 14 |     option3 = "option3"
 15 | 
 16 | 
 17 | class LiteralField(Enum):
 18 |     literalValue = "literalValue"
 19 | 
 20 | 
 21 | class ObjectField(BaseModel):
 22 |     subField1: str
 23 |     subField2: float
 24 | 
 25 | 
 26 | class NestedObjectField(BaseModel):
 27 |     tupleField: Tuple[str, float]
 28 |     enumField: EnumField
 29 |     datetimeField: datetime
 30 |     literalField: LiteralField
 31 | 
 32 | 
 33 | class ProgramInputs(BaseModel):
 34 |     tupleField: Tuple[str, float]
 35 |     enumField: EnumField
 36 |     datetimeField: datetime
 37 |     literalField: LiteralField
 38 |     objectField: ObjectField
 39 |     nestedObjectField: NestedObjectField
 40 | 
 41 | 
 42 | ### Output models ###
 43 | 
 44 | 
 45 | from datetime import datetime
 46 | from enum import Enum
 47 | from typing import List, Tuple, Union
 48 | 
 49 | from pydantic import BaseModel, Field
 50 | 
 51 | 
 52 | class ProcessedEnumField(Enum):
 53 |     option1 = "option1"
 54 |     option2 = "option2"
 55 |     option3 = "option3"
 56 | 
 57 | 
 58 | class ProcessedLiteralField(Enum):
 59 |     literalValue = "literalValue"
 60 | 
 61 | 
 62 | class ProcessedObjectField(BaseModel):
 63 |     subField1: str
 64 |     subField2: float
 65 |     additionalField: bool
 66 | 
 67 | 
 68 | class EnumField(Enum):
 69 |     option1 = "option1"
 70 |     option2 = "option2"
 71 |     option3 = "option3"
 72 | 
 73 | 
 74 | class LiteralField(Enum):
 75 |     literalValue = "literalValue"
 76 | 
 77 | 
 78 | class ProcessedNestedObjectField(BaseModel):
 79 |     tupleField: Tuple[str, float]
 80 |     enumField: EnumField
 81 |     datetimeField: datetime
 82 |     literalField: LiteralField
 83 |     additionalField: bool
 84 | 
 85 | 
 86 | class ProgramOutputs(BaseModel):
 87 |     processedTupleField: Tuple[str, float]
 88 |     processedEnumField: ProcessedEnumField
 89 |     processedDatetimeField: datetime
 90 |     processedLiteralField: ProcessedLiteralField
 91 |     processedObjectField: ProcessedObjectField
 92 |     processedNestedObjectField: ProcessedNestedObjectField
 93 | 
 94 | 
 95 | ### Program definition ###
 96 | 
 97 | import dspy
 98 | 
 99 | 
100 | class BaseSignature(dspy.Signature):
101 |     """
102 |     The program is designed to process various data types including tuples, enums, datetime values, literals, objects, and nested objects containing these types. The program will accept inputs of these types, perform specified operations on them, and return the results. The operations could include validation, transformation, and extraction of information from these inputs.
103 |     """
104 | 
105 | 
106 | program_signature = BaseSignature
107 | for input_field_name, input_field in ProgramInputs.model_fields.items():
108 |     program_signature = program_signature.append(
109 |         name=input_field_name,
110 |         field=dspy.InputField(description=input_field.description),
111 |         type_=input_field.annotation,
112 |     )
113 | for output_field_name, output_field in ProgramOutputs.model_fields.items():
114 |     program_signature = program_signature.append(
115 |         name=output_field_name,
116 |         field=dspy.OutputField(description=output_field.description),
117 |         type_=output_field.annotation,
118 |     )
119 | 
120 | program = dspy.Predict(program_signature)
121 | 
```

--------------------------------------------------------------------------------
/tests/clients/test_databricks.py:
--------------------------------------------------------------------------------

```python
 1 | """Test the Databricks finetuning and deployment.
 2 | 
 3 | This test requires valid Databricks credentials, so it is skipped on GitHub Actions. For now it is only used for
 4 | manual testing.
 5 | """
 6 | 
 7 | import pytest
 8 | 
 9 | import dspy
10 | from dspy.clients.databricks import (
11 |     DatabricksProvider,
12 |     TrainingJobDatabricks,
13 |     _create_directory_in_databricks_unity_catalog,
14 | )
15 | 
16 | try:
17 |     from databricks.sdk import WorkspaceClient
18 | 
19 |     WorkspaceClient()
20 | except (ImportError, Exception):
21 |     # Skip the test if the Databricks SDK is not configured or credentials are not available.
22 |     pytestmark = pytest.mark.skip(reason="Databricks SDK not configured or credentials not available")
23 | 
24 | 
25 | def test_create_directory_in_databricks_unity_catalog():
26 |     from databricks.sdk import WorkspaceClient
27 | 
28 |     w = WorkspaceClient()
29 | 
30 |     with pytest.raises(
31 |         ValueError,
32 |         match=(
33 |             "Databricks Unity Catalog path must be in the format '/Volumes/<catalog>/<schema>/<volume>/...', "
34 |             "but received: /badstring/whatever"
35 |         ),
36 |     ):
37 |         _create_directory_in_databricks_unity_catalog(w, "/badstring/whatever")
38 | 
39 |     _create_directory_in_databricks_unity_catalog(w, "/Volumes/main/chenmoney/testing/dspy_testing")
40 |     # Check that the directory was created successfully, otherwise `get_directory_metadata` will raise an exception.
41 |     w.files.get_directory_metadata("/Volumes/main/chenmoney/testing/dspy_testing")
42 | 
43 | 
44 | def test_create_finetuning_job():
45 |     fake_training_data = [
46 |         {
47 |             "messages": [
48 |                 {"role": "user", "content": "Hello, how are you?"},
49 |                 {"role": "assistant", "content": "I'm doing great, thank you!"},
50 |             ]
51 |         },
52 |         {
53 |             "messages": [
54 |                 {"role": "user", "content": "What is the capital of France?"},
55 |                 {"role": "assistant", "content": "Paris!"},
56 |             ]
57 |         },
58 |         {
59 |             "messages": [
60 |                 {"role": "user", "content": "What is the capital of Germany?"},
61 |                 {"role": "assistant", "content": "Berlin!"},
62 |             ]
63 |         },
64 |     ]
65 |     dspy.settings.experimental = True
66 | 
67 |     job = TrainingJobDatabricks()
68 | 
69 |     DatabricksProvider.finetune(
70 |         job=job,
71 |         model="meta-llama/Llama-3.2-1B",
72 |         train_data=fake_training_data,
73 |         data_format="chat",
74 |         train_kwargs={
75 |             "train_data_path": "/Volumes/main/chenmoney/testing/dspy_testing",
76 |             "register_to": "main.chenmoney.finetuned_model",
77 |             "task_type": "CHAT_COMPLETION",
78 |             "skip_deploy": True,
79 |         },
80 |     )
81 |     assert job.finetuning_run.status.display_name is not None
82 | 
83 | 
84 | def test_deploy_finetuned_model():
85 |     dspy.settings.experimental = True
86 |     model_to_deploy = "main.chenmoney.finetuned_model"
87 | 
88 |     DatabricksProvider.deploy_finetuned_model(
89 |         model=model_to_deploy,
90 |         data_format="chat",
91 |     )
92 | 
93 |     lm = dspy.LM(model="databricks/main_chenmoney_finetuned_model")
94 |     lm("what is 2 + 2?")
95 | 
```

--------------------------------------------------------------------------------
/dspy/predict/retry.py:
--------------------------------------------------------------------------------

```python
 1 | # import copy
 2 | 
 3 | # import dspy
 4 | 
 5 | # from .predict import Predict
 6 | 
 7 | 
 8 | # class Retry(Predict):
 9 | #     def __init__(self, module):
10 | #         super().__init__(module.signature)
11 | #         self.module = module
12 | #         self.original_signature = module.signature
13 | #         self.original_forward = module.forward
14 | #         self.new_signature = self._create_new_signature(self.original_signature)
15 | 
16 | #     def _create_new_signature(self, signature):
17 | #         # Add "Past" input fields for each output field
18 | #         for key, value in signature.output_fields.items():
19 | #             actual_prefix = value.json_schema_extra["prefix"].split(":")[0] + ":"
20 | #             signature = signature.append(f"past_{key}", dspy.InputField(
21 | #                 prefix="Previous " + actual_prefix,
22 | #                 desc=f"past {actual_prefix[:-1]} with errors",
23 | #                 format=value.json_schema_extra.get("format"),
24 | #             ))
25 | 
26 | #         signature = signature.append("feedback", dspy.InputField(
27 | #             prefix="Instructions:",
28 | #             desc="Some instructions you must satisfy",
29 | #             format=str,
30 | #         ))
31 | 
32 | #         return signature
33 | 
34 | #     def forward(self, *, past_outputs, **kwargs):
35 | #         # Take into account the possible new signature, as in TypedPredictor
36 | #         new_signature = kwargs.pop("new_signature", None)
37 | #         if new_signature:
38 | #             self.original_signature = new_signature
39 | #             self.new_signature = self._create_new_signature(self.original_signature)
40 | 
41 | #         # Convert the dict past_outputs={"answer": ...} to kwargs
42 | #         # {past_answer=..., ...}
43 | #         for key, value in past_outputs.items():
44 | #             past_key = f"past_{key}"
45 | #             if past_key in self.new_signature.input_fields:
46 | #                 kwargs[past_key] = value
47 | #         # Tell the wrapped module to use the new signature.
48 | #         # Note: This only works if the wrapped module is a Predict or ChainOfThought.
49 | #         kwargs["new_signature"] = self.new_signature
50 | #         return self.original_forward(**kwargs)
51 | 
52 | #     def __call__(self, **kwargs):
53 | #         copy.deepcopy(kwargs)
54 | #         kwargs["_trace"] = False
55 | #         kwargs.setdefault("demos", self.demos if self.demos is not None else [])
56 | 
57 | #         # perform backtracking
58 | #         if dspy.settings.backtrack_to == self:
59 | #             for key, value in dspy.settings.backtrack_to_args.items():
60 | #                 kwargs.setdefault(key, value)
61 | #             pred = self.forward(**kwargs)
62 | #         else:
63 | #             pred = self.module(**kwargs)
64 | 
65 | #         # now pop multiple reserved keys
66 | #         # NOTE(shangyin) past_outputs seems not useful to include in demos,
67 | #         # therefore dropped
68 | #         for key in ["_trace", "demos", "signature", "new_signature", "config", "lm", "past_outputs"]:
69 | #             kwargs.pop(key, None)
70 | 
71 | #         if dspy.settings.trace is not None:
72 | #             trace = dspy.settings.trace
73 | #             trace.append((self, {**kwargs}, pred))
74 | #         return pred
75 | 
```

--------------------------------------------------------------------------------
/tests/primitives/test_example.py:
--------------------------------------------------------------------------------

```python
  1 | import pytest
  2 | 
  3 | import dspy
  4 | from dspy import Example
  5 | 
  6 | 
  7 | def test_example_initialization():
  8 |     example = Example(a=1, b=2)
  9 |     assert example.a == 1
 10 |     assert example.b == 2
 11 | 
 12 | 
 13 | def test_example_initialization_from_base():
 14 |     base = Example(a=1, b=2)
 15 |     example = Example(base=base, c=3)
 16 |     assert example.a == 1
 17 |     assert example.b == 2
 18 |     assert example.c == 3
 19 | 
 20 | 
 21 | def test_example_initialization_from_dict():
 22 |     base_dict = {"a": 1, "b": 2}
 23 |     example = Example(base=base_dict, c=3)
 24 |     assert example.a == 1
 25 |     assert example.b == 2
 26 |     assert example.c == 3
 27 | 
 28 | 
 29 | def test_example_set_get_item():
 30 |     example = Example()
 31 |     example["a"] = 1
 32 |     assert example["a"] == 1
 33 | 
 34 | 
 35 | def test_example_attribute_access():
 36 |     example = Example(a=1)
 37 |     assert example.a == 1
 38 |     example.a = 2
 39 |     assert example.a == 2
 40 | 
 41 | 
 42 | def test_example_deletion():
 43 |     example = Example(a=1, b=2)
 44 |     del example["a"]
 45 |     with pytest.raises(AttributeError):
 46 |         _ = example.a
 47 | 
 48 | 
 49 | def test_example_len():
 50 |     example = Example(a=1, b=2, dspy_hidden=3)
 51 |     assert len(example) == 2
 52 | 
 53 | 
 54 | def test_example_repr_str_img():
 55 |     example = Example(
 56 |         img=dspy.Image(url="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
 57 |     )
 58 |     assert (
 59 |         repr(example)
 60 |         == "Example({'img': Image(url=data:image/gif;base64,<IMAGE_BASE_64_ENCODED(56)>)}) (input_keys=None)"
 61 |     )
 62 |     assert (
 63 |         str(example)
 64 |         == "Example({'img': Image(url=data:image/gif;base64,<IMAGE_BASE_64_ENCODED(56)>)}) (input_keys=None)"
 65 |     )
 66 | 
 67 | 
 68 | def test_example_repr_str():
 69 |     example = Example(a=1)
 70 |     assert repr(example) == "Example({'a': 1}) (input_keys=None)"
 71 |     assert str(example) == "Example({'a': 1}) (input_keys=None)"
 72 | 
 73 | 
 74 | def test_example_eq():
 75 |     example1 = Example(a=1, b=2)
 76 |     example2 = Example(a=1, b=2)
 77 |     assert example1 == example2
 78 |     assert example1 != ""
 79 | 
 80 | 
 81 | def test_example_hash():
 82 |     example1 = Example(a=1, b=2)
 83 |     example2 = Example(a=1, b=2)
 84 |     assert hash(example1) == hash(example2)
 85 | 
 86 | 
 87 | def test_example_keys_values_items():
 88 |     example = Example(a=1, b=2, dspy_hidden=3)
 89 |     assert set(example.keys()) == {"a", "b"}
 90 |     assert 1 in example.values()
 91 |     assert ("b", 2) in example.items()
 92 | 
 93 | 
 94 | def test_example_get():
 95 |     example = Example(a=1, b=2)
 96 |     assert example.get("a") == 1
 97 |     assert example.get("c", "default") == "default"
 98 | 
 99 | 
100 | def test_example_with_inputs():
101 |     example = Example(a=1, b=2).with_inputs("a")
102 |     assert example._input_keys == {"a"}
103 | 
104 | 
105 | def test_example_inputs_labels():
106 |     example = Example(a=1, b=2).with_inputs("a")
107 |     inputs = example.inputs()
108 |     assert inputs.toDict() == {"a": 1}
109 |     labels = example.labels()
110 |     assert labels.toDict() == {"b": 2}
111 | 
112 | 
113 | def test_example_copy_without():
114 |     example = Example(a=1, b=2)
115 |     copied = example.copy(c=3)
116 |     assert copied.a == 1
117 |     assert copied.c == 3
118 |     without_a = copied.without("a")
119 |     with pytest.raises(AttributeError):
120 |         _ = without_a.a
121 | 
122 | 
123 | def test_example_to_dict():
124 |     example = Example(a=1, b=2)
125 |     assert example.toDict() == {"a": 1, "b": 2}
126 | 
```

--------------------------------------------------------------------------------
/docs/docs/tutorials/build_ai_program/index.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Build AI Programs with DSPy
 2 | 
 3 | This section contains hands-on tutorials that guide you through building production-ready AI applications using DSPy. Each tutorial demonstrates practical use cases and shows you how to leverage DSPy's modular programming approach to create robust, maintainable AI systems.
 4 | 
 5 | ## Core Applications
 6 | 
 7 | ### [Managing Conversation History](../conversation_history/index.md)
 8 | Learn how to manage conversation history in DSPy applications.
 9 | 
10 | ### [Building AI Agents with DSPy](../customer_service_agent/index.ipynb)
11 | Learn to create intelligent agents that can handle complex customer service scenarios. This tutorial shows how to build agents that can understand context, maintain conversation state, and provide helpful responses.
12 | 
13 | ### [Building AI Applications by Customizing DSPy Modules](../custom_module/index.ipynb)
14 | Discover how to create custom DSPy modules tailored to your specific needs. Learn the patterns for building reusable, composable components that can be shared across different applications.
15 | 
16 | ## Retrieval-Augmented Generation (RAG)
17 | 
18 | ### [Retrieval-Augmented Generation (RAG)](../rag/index.ipynb)
19 | Master the fundamentals of RAG systems with DSPy. Learn how to combine retrieval mechanisms with language models to build systems that can answer questions using external knowledge sources.
20 | 
21 | ### [Building RAG as Agent](../agents/index.ipynb)
22 | Take RAG to the next level by building `ReAct` agent-based systems that can reason about when and how to retrieve information, making your RAG systems more intelligent and adaptive.
23 | 
24 | ### [Multi-Hop RAG](../multihop_search/index.ipynb)
25 | Build sophisticated RAG systems that can perform multi-step reasoning across multiple information sources, perfect for complex research and analysis tasks.
26 | 
27 | ## Specialized Use Cases
28 | 
29 | ### [Entity Extraction](../entity_extraction/index.ipynb)
30 | Learn to build systems that can identify and extract specific entities from text, essential for information processing and data analysis applications.
31 | 
32 | ### [Classification](../classification/index.md)
33 | Build robust text classification systems using DSPy's modular approach with a topic classification example.
34 | 
35 | ### [Privacy-Conscious Delegation](../papillon/index.md)
36 | Explore advanced techniques for building AI systems that respect privacy constraints while maintaining high performance by combining a small local model and an advanced external model.
37 | 
38 | ## Advanced Reasoning
39 | 
40 | ### [Program Of Thought](../program_of_thought/index.ipynb)
41 | Learn to build systems that can generate and execute code to solve complex problems, combining the power of language models with programmatic reasoning.
42 | 
43 | ## Multimodal Applications
44 | 
45 | ### [Image Generation Prompt Iteration](../image_generation_prompting/index.ipynb)
46 | Discover how to use DSPy to iteratively improve image generation prompts, creating better visual content through systematic optimization.
47 | 
48 | ### [Audio](../audio/index.ipynb)
49 | Explore audio processing applications with DSPy, learning to build systems that can understand, process, and generate audio content.
50 | 
```

--------------------------------------------------------------------------------
/tests/predict/test_retry.py:
--------------------------------------------------------------------------------

```python
 1 | # import functools
 2 | 
 3 | # import pydantic
 4 | 
 5 | # import dspy
 6 | # from dspy.primitives.assertions import assert_transform_module, backtrack_handler
 7 | # from dspy.utils import DummyLM
 8 | 
 9 | 
10 | # def test_retry_simple():
11 | #     predict = dspy.Predict("question -> answer")
12 | #     retry_module = dspy.Retry(predict)
13 | 
14 | #     # Test Retry has created the correct new signature
15 | #     for field in predict.signature.output_fields:
16 | #         assert f"past_{field}" in retry_module.new_signature.input_fields
17 | #     assert "feedback" in retry_module.new_signature.input_fields
18 | 
19 | #     lm = DummyLM([{"answer": "blue"}])
20 | #     dspy.settings.configure(lm=lm)
21 | #     result = retry_module.forward(
22 | #         question="What color is the sky?",
23 | #         past_outputs={"answer": "red"},
24 | #         feedback="Try harder",
25 | #     )
26 | #     assert result.answer == "blue"
27 | 
28 | 
29 | # def test_retry_forward_with_feedback():
30 | #     # First we make a mistake, then we fix it
31 | #     lm = DummyLM([{"answer": "red"}, {"answer": "blue"}])
32 | #     dspy.settings.configure(lm=lm, trace=[])
33 | 
34 | #     class SimpleModule(dspy.Module):
35 | #         def __init__(self):
36 | #             super().__init__()
37 | #             self.predictor = dspy.Predict("question -> answer")
38 | 
39 | #         def forward(self, **kwargs):
40 | #             result = self.predictor(**kwargs)
41 | #             print(f"SimpleModule got {result.answer=}")
42 | #             dspy.Suggest(result.answer == "blue", "Please think harder")
43 | #             return result
44 | 
45 | #     program = SimpleModule()
46 | #     program = assert_transform_module(
47 | #         program.map_named_predictors(dspy.Retry),
48 | #         functools.partial(backtrack_handler, max_backtracks=1),
49 | #     )
50 | 
51 | #     result = program(question="What color is the sky?")
52 | 
53 | #     assert result.answer == "blue"
54 | 
55 | 
56 | # # def test_retry_forward_with_typed_predictor():
57 | # #     # First we make a mistake, then we fix it
58 | # #     lm = DummyLM([{"output": '{"answer":"red"}'}, {"output": '{"answer":"blue"}'}])
59 | # #     dspy.settings.configure(lm=lm, trace=[])
60 | 
61 | # #     class AnswerQuestion(dspy.Signature):
62 | # #         """Answer questions with succinct responses."""
63 | 
64 | # #         class Input(pydantic.BaseModel):
65 | # #             question: str
66 | 
67 | # #         class Output(pydantic.BaseModel):
68 | # #             answer: str
69 | 
70 | # #         input: Input = dspy.InputField()
71 | # #         output: Output = dspy.OutputField()
72 | 
73 | # #     class QuestionAnswerer(dspy.Module):
74 | # #         def __init__(self):
75 | # #             super().__init__()
76 | # #             self.answer_question = dspy.TypedPredictor(AnswerQuestion)
77 | 
78 | # #         def forward(self, **kwargs):
79 | # #             result = self.answer_question(input=AnswerQuestion.Input(**kwargs)).output
80 | # #             dspy.Suggest(result.answer == "blue", "Please think harder")
81 | # #             return result
82 | 
83 | # #     program = QuestionAnswerer()
84 | # #     program = assert_transform_module(
85 | # #         program.map_named_predictors(dspy.Retry),
86 | # #         functools.partial(backtrack_handler, max_backtracks=1),
87 | # #     )
88 | 
89 | # #     result = program(question="What color is the sky?")
90 | 
91 | # #     assert result.answer == "blue"
92 | 
```

--------------------------------------------------------------------------------
/tests/utils/test_annotation.py:
--------------------------------------------------------------------------------

```python
 1 | from dspy.utils.annotation import experimental
 2 | 
 3 | 
 4 | def test_experimental_decorator_on_function():
 5 |     @experimental
 6 |     def test_function():
 7 |         """A test function."""
 8 |         return "test"
 9 | 
10 |     assert "Experimental: This function may change or be removed in a future release without warning." in test_function.__doc__
11 |     assert "A test function." in test_function.__doc__
12 |     assert test_function() == "test"
13 | 
14 | 
15 | def test_experimental_decorator_on_function_with_version():
16 |     @experimental(version="3.1.0")
17 |     def test_function():
18 |         """A test function with version."""
19 |         return "versioned"
20 | 
21 |     assert "introduced in v3.1.0" in test_function.__doc__
22 |     assert "Experimental: This function may change or be removed in a future release without warning (introduced in v3.1.0)." in test_function.__doc__
23 |     assert "A test function with version." in test_function.__doc__
24 |     assert test_function() == "versioned"
25 | 
26 | 
27 | def test_experimental_decorator_on_class():
28 |     @experimental
29 |     class TestClass:
30 |         """A test class."""
31 | 
32 |         def method(self):
33 |             return "method"
34 | 
35 |     assert "Experimental: This class may change or be removed in a future release without warning." in TestClass.__doc__
36 |     assert "A test class." in TestClass.__doc__
37 | 
38 |     instance = TestClass()
39 |     assert instance.method() == "method"
40 | 
41 | 
42 | def test_experimental_decorator_on_class_with_version():
43 |     @experimental(version="2.5.0")
44 |     class TestClass:
45 |         """A test class with version."""
46 |         pass
47 | 
48 |     assert "introduced in v2.5.0" in TestClass.__doc__
49 |     assert "Experimental: This class may change or be removed in a future release without warning (introduced in v2.5.0)." in TestClass.__doc__
50 |     assert "A test class with version." in TestClass.__doc__
51 | 
52 | 
53 | def test_experimental_decorator_without_docstring():
54 |     @experimental
55 |     def test_function():
56 |         return "no_doc"
57 | 
58 |     assert test_function.__doc__ == "Experimental: This function may change or be removed in a future release without warning."
59 |     assert test_function() == "no_doc"
60 | 
61 | 
62 | def test_experimental_decorator_without_docstring_with_version():
63 |     @experimental(version="1.0.0")
64 |     def test_function():
65 |         return "no_doc_version"
66 | 
67 |     assert test_function.__doc__ == "Experimental: This function may change or be removed in a future release without warning (introduced in v1.0.0)."
68 |     assert test_function() == "no_doc_version"
69 | 
70 | 
71 | def test_experimental_decorator_with_callable_syntax():
72 |     def test_function():
73 |         """A test function."""
74 |         return "callable"
75 | 
76 |     decorated = experimental(test_function)
77 | 
78 |     assert "Experimental:" in decorated.__doc__
79 |     assert "A test function." in decorated.__doc__
80 |     assert decorated() == "callable"
81 | 
82 | 
83 | def test_experimental_decorator_with_version_callable_syntax():
84 |     def test_function():
85 |         """A test function."""
86 |         return "callable_version"
87 | 
88 |     decorated = experimental(test_function, version="4.0.0")
89 | 
90 |     assert "introduced in v4.0.0" in decorated.__doc__
91 |     assert "Experimental:" in decorated.__doc__
92 |     assert decorated() == "callable_version"
93 | 
```

--------------------------------------------------------------------------------
/tests/reliability/complex_types/generated/test_nesting_1/schema.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "description": "The AI program is designed to process hierarchical data structures with multiple levels of nesting. The program will take a deeply nested input structure representing a complex dataset, perform specific transformations, validations, and computations, and then produce an equally complex nested output structure. The program is suitable for applications that require detailed data processing, such as multi-level data aggregation, hierarchical data validation, and nested data transformation.",
 3 |   "properties": {
 4 |     "level1": {
 5 |       "properties": {
 6 |         "level2": {
 7 |           "properties": {
 8 |             "level3": {
 9 |               "properties": {
10 |                 "level4": {
11 |                   "properties": {
12 |                     "level5": {
13 |                       "properties": {
14 |                         "field1": {
15 |                           "description": "A string field at the deepest level",
16 |                           "type": "string"
17 |                         },
18 |                         "field2": {
19 |                           "description": "A numerical field at the deepest level",
20 |                           "type": "number"
21 |                         }
22 |                       },
23 |                       "required": ["field1", "field2"],
24 |                       "type": "object"
25 |                     }
26 |                   },
27 |                   "required": ["level5"],
28 |                   "type": "object"
29 |                 }
30 |               },
31 |               "required": ["level4"],
32 |               "type": "object"
33 |             }
34 |           },
35 |           "required": ["level3"],
36 |           "type": "object"
37 |         }
38 |       },
39 |       "required": ["level2"],
40 |       "type": "object"
41 |     },
42 |     "resultLevel1": {
43 |       "properties": {
44 |         "resultLevel2": {
45 |           "properties": {
46 |             "resultLevel3": {
47 |               "properties": {
48 |                 "resultLevel4": {
49 |                   "properties": {
50 |                     "resultLevel5": {
51 |                       "properties": {
52 |                         "outputField1": {
53 |                           "description": "A boolean field indicating success or failure",
54 |                           "type": "boolean"
55 |                         },
56 |                         "outputField2": {
57 |                           "description": "An array of strings representing messages",
58 |                           "items": {
59 |                             "type": "string"
60 |                           },
61 |                           "type": "array"
62 |                         }
63 |                       },
64 |                       "required": ["outputField1", "outputField2"],
65 |                       "type": "object"
66 |                     }
67 |                   },
68 |                   "required": ["resultLevel5"],
69 |                   "type": "object"
70 |                 }
71 |               },
72 |               "required": ["resultLevel4"],
73 |               "type": "object"
74 |             }
75 |           },
76 |           "required": ["resultLevel3"],
77 |           "type": "object"
78 |         }
79 |       },
80 |       "required": ["resultLevel2"],
81 |       "type": "object"
82 |     }
83 |   },
84 |   "required": ["level1", "resultLevel1"],
85 |   "type": "object"
86 | }
87 | 
```

--------------------------------------------------------------------------------
/dspy/predict/parallel.py:
--------------------------------------------------------------------------------

```python
 1 | import threading
 2 | from typing import Any
 3 | 
 4 | from dspy.dsp.utils.settings import settings
 5 | from dspy.primitives.example import Example
 6 | from dspy.utils.parallelizer import ParallelExecutor
 7 | 
 8 | 
 9 | class Parallel:
10 |     def __init__(
11 |         self,
12 |         num_threads: int | None = None,
13 |         max_errors: int | None = None,
14 |         access_examples: bool = True,
15 |         return_failed_examples: bool = False,
16 |         provide_traceback: bool | None = None,
17 |         disable_progress_bar: bool = False,
18 |     ):
19 |         super().__init__()
20 |         self.num_threads = num_threads or settings.num_threads
21 |         self.max_errors = settings.max_errors if max_errors is None else max_errors
22 |         self.access_examples = access_examples
23 |         self.return_failed_examples = return_failed_examples
24 |         self.provide_traceback = provide_traceback
25 |         self.disable_progress_bar = disable_progress_bar
26 | 
27 |         self.error_count = 0
28 |         self.error_lock = threading.Lock()
29 |         self.cancel_jobs = threading.Event()
30 |         self.failed_examples = []
31 |         self.exceptions = []
32 | 
33 |     def forward(self, exec_pairs: list[tuple[Any, Example]], num_threads: int | None = None) -> list[Any]:
34 |         num_threads = num_threads if num_threads is not None else self.num_threads
35 | 
36 |         executor = ParallelExecutor(
37 |             num_threads=num_threads,
38 |             max_errors=self.max_errors,
39 |             provide_traceback=self.provide_traceback,
40 |             disable_progress_bar=self.disable_progress_bar,
41 |         )
42 | 
43 |         def process_pair(pair):
44 |             result = None
45 |             module, example = pair
46 | 
47 |             if isinstance(example, Example):
48 |                 if self.access_examples:
49 |                     result = module(**example.inputs())
50 |                 else:
51 |                     result = module(example)
52 |             elif isinstance(example, dict):
53 |                 result = module(**example)
54 |             elif isinstance(example, list) and module.__class__.__name__ == "Parallel":
55 |                 result = module(example)
56 |             elif isinstance(example, tuple):
57 |                 result = module(*example)
58 |             else:
59 |                 raise ValueError(
60 |                     f"Invalid example type: {type(example)}, only supported types are Example, dict, list and tuple"
61 |                 )
62 |             return result
63 | 
64 |         # Execute the processing function over the execution pairs
65 |         results = executor.execute(process_pair, exec_pairs)
66 | 
67 |         # Populate failed examples and exceptions from the executor
68 |         if self.return_failed_examples:
69 |             for failed_idx in executor.failed_indices:
70 |                 if failed_idx < len(exec_pairs):
71 |                     _, original_example = exec_pairs[failed_idx]
72 |                     self.failed_examples.append(original_example)
73 |                     if exception := executor.exceptions_map.get(failed_idx):
74 |                         self.exceptions.append(exception)
75 | 
76 |             return results, self.failed_examples, self.exceptions
77 |         else:
78 |             return results
79 | 
80 |     def __call__(self, *args: Any, **kwargs: Any) -> Any:
81 |         return self.forward(*args, **kwargs)
82 | 
```
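
A minimal usage sketch of `Parallel` (assuming it is exposed as `dspy.Parallel` and that an LM is configured; the model name below is illustrative): each execution pair is `(module, inputs)`, and dict inputs are expanded as keyword arguments.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative model name

qa = dspy.Predict("question -> answer")
parallel = dspy.Parallel(num_threads=4)

# Each pair is (module, example); dict examples are splatted into the module call.
pairs = [
    (qa, {"question": "What is the capital of France?"}),
    (qa, {"question": "Who wrote Hamlet?"}),
]

results = parallel(pairs)  # one result per pair, in input order
print(results[0].answer)
```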

--------------------------------------------------------------------------------
/tests/reliability/complex_types/generated/test_nesting_2/program.py:
--------------------------------------------------------------------------------

```python
 1 | ### Input models ###
 2 | 
 3 | 
 4 | from datetime import datetime
 5 | 
 6 | from pydantic import BaseModel, Field
 7 | 
 8 | 
 9 | class Details(BaseModel):
10 |     value: str = Field(..., description="Customer's value category")
11 |     age: int = Field(..., description="Customer's age")
12 | 
13 | 
14 | class Customer(BaseModel):
15 |     customer_id: str = Field(..., description="Unique identifier for the customer")
16 |     customer_type: bool = Field(..., description="Indicates if the customer is a premium member")
17 |     details: Details
18 | 
19 | 
20 | class Details1(BaseModel):
21 |     value: float = Field(..., description="Monetary value of the transaction")
22 |     timestamp: datetime = Field(..., description="Timestamp of the transaction")
23 | 
24 | 
25 | class Transaction(BaseModel):
26 |     transaction_id: str = Field(..., description="Unique identifier for the transaction")
27 |     amount: float = Field(..., description="Transaction amount")
28 |     details: Details1
29 | 
30 | 
31 | class ProgramInputs(BaseModel):
32 |     customer: Customer
33 |     transaction: Transaction
34 | 
35 | 
36 | ### Output models ###
37 | 
38 | 
39 | from datetime import datetime
40 | 
41 | from pydantic import BaseModel, Field
42 | 
43 | 
44 | class CustomerType(BaseModel):
45 |     is_premium: bool = Field(..., description="Indicates if the customer is a premium member")
46 |     category: str = Field(..., description="Customer's membership category")
47 | 
48 | 
49 | class CustomerSummary(BaseModel):
50 |     customer_id: str = Field(..., description="Unique identifier for the customer")
51 |     customer_type: CustomerType
52 |     value: str = Field(..., description="Customer's value category")
53 | 
54 | 
55 | class Details(BaseModel):
56 |     value: float = Field(..., description="Monetary value of the transaction")
57 |     timestamp: datetime = Field(..., description="Timestamp of the transaction")
58 | 
59 | 
60 | class TransactionSummary(BaseModel):
61 |     transaction_id: str = Field(..., description="Unique identifier for the transaction")
62 |     total_amount: float = Field(..., description="Total transaction amount")
63 |     details: Details
64 | 
65 | 
66 | class ProgramOutputs(BaseModel):
67 |     customer_summary: CustomerSummary
68 |     transaction_summary: TransactionSummary
69 | 
70 | 
71 | ### Program definition ###
72 | 
73 | import dspy
74 | 
75 | 
76 | class BaseSignature(dspy.Signature):
77 |     """
78 |     This AI program is designed to process complex datasets with multiple nested input fields and produce structured output fields. It can handle cases where nested fields have the same name but different types, ensuring that the data is accurately processed and transformed. The program is particularly useful for applications that require detailed data analysis, integration of multiple data sources, and handling of heterogeneous data types.
79 |     """
80 | 
81 | 
82 | program_signature = BaseSignature
83 | for input_field_name, input_field in ProgramInputs.model_fields.items():
84 |     program_signature = program_signature.append(
85 |         name=input_field_name,
86 |         field=dspy.InputField(description=input_field.description),
87 |         type_=input_field.annotation,
88 |     )
89 | for output_field_name, output_field in ProgramOutputs.model_fields.items():
90 |     program_signature = program_signature.append(
91 |         name=output_field_name,
92 |         field=dspy.OutputField(description=output_field.description),
93 |         type_=output_field.annotation,
94 |     )
95 | 
96 | program = dspy.ChainOfThought(program_signature)
97 | 
```
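
A hedged sketch of invoking the generated program above (assuming an LM is configured; field values are illustrative, and nested sub-models are passed as plain dicts because the output-side `Details` class shadows the input-side one at module level):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative model name

prediction = program(
    customer=Customer(
        customer_id="C-001",
        customer_type=True,
        details={"value": "gold", "age": 42},  # coerced into the nested model by pydantic
    ),
    transaction=Transaction(
        transaction_id="T-123",
        amount=99.5,
        details={"value": 99.5, "timestamp": "2024-01-01T12:00:00Z"},
    ),
)

print(prediction.customer_summary)
print(prediction.transaction_summary)
```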

--------------------------------------------------------------------------------
/dspy/teleprompt/teleprompt_optuna.py:
--------------------------------------------------------------------------------

```python
 1 | from dspy.evaluate.evaluate import Evaluate
 2 | from dspy.teleprompt.teleprompt import Teleprompter
 3 | 
 4 | from .bootstrap import BootstrapFewShot
 5 | 
 6 | 
 7 | class BootstrapFewShotWithOptuna(Teleprompter):
 8 |     def __init__(
 9 |         self,
10 |         metric,
11 |         teacher_settings=None,
12 |         max_bootstrapped_demos=4,
13 |         max_labeled_demos=16,
14 |         max_rounds=1,
15 |         num_candidate_programs=16,
16 |         num_threads=None,
17 |     ):
18 |         self.metric = metric
19 |         self.teacher_settings = teacher_settings or {}
20 |         self.max_rounds = max_rounds
21 |         self.num_threads = num_threads
22 |         self.min_num_samples = 1
23 |         self.max_num_samples = max_bootstrapped_demos
24 |         self.num_candidate_sets = num_candidate_programs
25 |         # self.max_num_traces = 1 + int(max_bootstrapped_demos / 2.0 * self.num_candidate_sets)
26 | 
27 |         # Semi-hacky way to get the parent class's _bootstrap function to stop early.
28 |         # self.max_bootstrapped_demos = self.max_num_traces
29 |         self.max_labeled_demos = max_labeled_demos
30 | 
31 |         print("Going to sample between", self.min_num_samples, "and", self.max_num_samples, "traces per predictor.")
32 |         # print("Going to sample", self.max_num_traces, "traces in total.")
33 |         print("Will attempt to train", self.num_candidate_sets, "candidate sets.")
34 | 
35 |     def objective(self, trial):
36 |         program2 = self.student.reset_copy()
37 |         for (name, compiled_predictor), (_, program2_predictor) in zip(
38 |             self.compiled_teleprompter.named_predictors(), program2.named_predictors(), strict=False,
39 |         ):
40 |             all_demos = compiled_predictor.demos
41 |             demo_index = trial.suggest_int(f"demo_index_for_{name}", 0, len(all_demos) - 1)
42 |             selected_demo = dict(all_demos[demo_index])
43 |             program2_predictor.demos = [selected_demo]
44 |         evaluate = Evaluate(
45 |             devset=self.valset,
46 |             metric=self.metric,
47 |             num_threads=self.num_threads,
48 |             display_table=False,
49 |             display_progress=True,
50 |         )
51 |         result = evaluate(program2)
52 |         trial.set_user_attr("program", program2)
53 |         return result.score
54 | 
55 |     def compile(self, student, *, teacher=None, max_demos, trainset, valset=None):
56 |         import optuna
57 |         self.trainset = trainset
58 |         self.valset = valset or trainset
59 |         self.student = student.reset_copy()
60 |         self.teacher = teacher.deepcopy() if teacher is not None else student.reset_copy()
61 |         teleprompter_optimize = BootstrapFewShot(
62 |             metric=self.metric,
63 |             max_bootstrapped_demos=max_demos,
64 |             max_labeled_demos=self.max_labeled_demos,
65 |             teacher_settings=self.teacher_settings,
66 |             max_rounds=self.max_rounds,
67 |         )
68 |         self.compiled_teleprompter = teleprompter_optimize.compile(
69 |             self.student, teacher=self.teacher, trainset=self.trainset,
70 |         )
71 |         study = optuna.create_study(direction="maximize")
72 |         study.optimize(self.objective, n_trials=self.num_candidate_sets)
73 |         best_program = study.trials[study.best_trial.number].user_attrs["program"]
74 |         print("Best score:", study.best_value)
75 |         print("Best program:", best_program)
76 |         return best_program
77 | 
```
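
A hedged usage sketch for this optimizer (assuming `optuna` is installed, an LM is configured, and `trainset`/`devset` are existing lists of `dspy.Example` objects with `question` and `answer` fields; the metric below is illustrative):

```python
import dspy
from dspy.teleprompt.teleprompt_optuna import BootstrapFewShotWithOptuna

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative model name

def exact_match(example, prediction, trace=None):
    return example.answer == prediction.answer

program = dspy.ChainOfThought("question -> answer")
optimizer = BootstrapFewShotWithOptuna(metric=exact_match, num_candidate_programs=8)

# `max_demos` is a required keyword argument of compile(); trainset/devset are assumed to exist.
optimized = optimizer.compile(program, max_demos=4, trainset=trainset, valset=devset)
```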

--------------------------------------------------------------------------------
/tests/reliability/complex_types/generated/test_many_types_1/inputs/input2.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |   "assertions": [
 3 |     "The 'processedTupleField' should be an tuple with exactly two elements: the first element being a string and the second element being a number.",
 4 |     "The 'processedEnumField' should be one of the predefined options: 'option1', 'option2', or 'option3'.",
 5 |     "The 'processedDatetimeField' should be a date-time",
 6 |     "The 'processedLiteralField' should be the enum 'literalValue'.",
 7 |     "The 'processedObjectField' should be an object containing 'subField1' as a string, 'subField2' as a number, and an 'additionalField' as a boolean.",
 8 |     "The 'processedNestedObjectField' should be an object containing 'tupleField' as a tuple with a string and float, 'enumField' as one of the predefined options (option1, option2, or option3), 'datetimeField' as a 'date-time' object, 'literalField' as the string 'literalValue', and an 'additionalField' as a boolean."
 9 |   ],
10 |   "input": {
11 |     "datetimeField": "2023-10-01T12:00:00Z",
12 |     "enumField": "option1",
13 |     "literalField": "literalValue",
14 |     "nestedObjectField": {
15 |       "datetimeField": "2023-11-01T12:00:00Z",
16 |       "enumField": "option2",
17 |       "literalField": "literalValue",
18 |       "tupleField": ["nestedString", 789]
19 |     },
20 |     "objectField": {
21 |       "subField1": "Patriotism is a feeling of love, devotion, and sense of attachment to one's country. This attachment can be a combination of many different feelings relating to one's homeland, including ethnic, cultural, political or historical aspects. It encompasses a set of concepts closely related to those of nationalism. In the context of patriotism, people may express their feelings in a variety of ways, including supporting their country's interests and policies, celebrating national holidays, and participating in civic activities. Patriotism often involves a sense of pride in one's country and a willingness to defend it against any threats. It can also include a commitment to improving the country and making it a better place for future generations. The concept of patriotism is often linked with the idea of national identity, which is the sense of a nation as a cohesive whole, as represented by distinctive traditions, culture, language, and politics. Patriots may feel a strong sense of loyalty and duty to their country, and they may take actions to support and protect it. However, it is important to note that patriotism can also be a complex and sometimes controversial concept. While it can inspire positive actions and a sense of community, it can also lead to exclusionary or aggressive behaviors if taken to an extreme. In some cases, excessive patriotism can result in nationalism, which can lead to conflicts with other nations or groups. Despite these potential issues, many people view patriotism as a positive force that can unite people and inspire them to work together for the common good. It can foster a sense of belonging and purpose, and it can motivate individuals to contribute to the well-being of their country. Overall, patriotism is a multifaceted and deeply personal sentiment that can manifest in many different ways, depending on an individual's experiences, beliefs, and values.",
22 |       "subField2": 456
23 |     },
24 |     "tupleField": ["exampleString", 123]
25 |   }
26 | }
27 | 
```

--------------------------------------------------------------------------------
/tests/test_utils/server/litellm_server.py:
--------------------------------------------------------------------------------

```python
 1 | import json
 2 | import os
 3 | from typing import AsyncIterator, Iterator
 4 | 
 5 | import litellm
 6 | from litellm import CustomLLM
 7 | from litellm.types.utils import GenericStreamingChunk
 8 | 
 9 | LITELLM_TEST_SERVER_LOG_FILE_PATH_ENV_VAR = "LITELLM_TEST_SERVER_LOG_FILE_PATH"
10 | 
11 | 
12 | class DSPyTestModel(CustomLLM):
13 |     def completion(self, *args, **kwargs) -> litellm.ModelResponse:
14 |         _append_request_to_log_file(kwargs)
15 |         return _get_mock_llm_response(kwargs)
16 | 
17 |     async def acompletion(self, *args, **kwargs) -> litellm.ModelResponse:
18 |         _append_request_to_log_file(kwargs)
19 |         return _get_mock_llm_response(kwargs)
20 | 
21 |     def streaming(self, *args, **kwargs) -> Iterator[GenericStreamingChunk]:
22 |         generic_streaming_chunk: GenericStreamingChunk = {
23 |             "finish_reason": "stop",
24 |             "index": 0,
25 |             "is_finished": True,
26 |             "text": '{"output_text": "Hello!"}',
27 |             "tool_use": None,
28 |             "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
29 |         }
30 |         return generic_streaming_chunk  # type: ignore
31 | 
32 |     async def astreaming(self, *args, **kwargs) -> AsyncIterator[GenericStreamingChunk]:
33 |         generic_streaming_chunk: GenericStreamingChunk = {
34 |             "finish_reason": "stop",
35 |             "index": 0,
36 |             "is_finished": True,
37 |             "text": '{"output_text": "Hello!"}',
38 |             "tool_use": None,
39 |             "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
40 |         }
41 |         yield generic_streaming_chunk
42 | 
43 | 
44 | def _get_mock_llm_response(request_kwargs):
45 |     _throw_exception_based_on_content_if_applicable(request_kwargs)
46 |     return litellm.completion(
47 |         model="gpt-3.5-turbo",
48 |         messages=[{"role": "user", "content": "Hello world"}],
49 |         usage={"prompt_tokens": 10, "completion_tokens": 10, "total_tokens": 20},
50 |         mock_response="Hi!",
51 |     )
52 | 
53 | 
54 | def _throw_exception_based_on_content_if_applicable(request_kwargs):
55 |     """
56 |     Throws an exception, for testing purposes, based on the content of the request message.
57 |     """
58 |     model = request_kwargs["model"]
59 |     content = request_kwargs["messages"][0]["content"]
60 |     if "429" in content:
61 |         raise litellm.RateLimitError(message="Rate limit exceeded", llm_provider=None, model=model)
62 |     elif "504" in content:
63 |         raise litellm.Timeout("Request timed out!", llm_provider=None, model=model)
64 |     elif "400" in content:
65 |         raise litellm.BadRequestError(message="Bad request", llm_provider=None, model=model)
66 |     elif "401" in content:
67 |         raise litellm.AuthenticationError(message="Authentication error", llm_provider=None, model=model)
68 | 
69 | 
70 | def _append_request_to_log_file(completion_kwargs):
71 |     log_file_path = os.environ.get(LITELLM_TEST_SERVER_LOG_FILE_PATH_ENV_VAR)
72 |     if log_file_path is None:
73 |         raise ValueError(
74 |             "Server logs file path is not defined! Please set the path using the"
75 |             + f" {LITELLM_TEST_SERVER_LOG_FILE_PATH_ENV_VAR} environment variable."
76 |         )
77 | 
78 |     with open(log_file_path, "a") as f:
79 |         log_blob = (
80 |             {
81 |                 "model": completion_kwargs["model"],
82 |                 "messages": completion_kwargs["messages"],
83 |             },
84 |         )
85 |         json.dump(log_blob, f)
86 |         f.write("\n")
87 | 
88 | 
89 | dspy_test_model = DSPyTestModel()
90 | 
```
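
A hedged sketch of how a test might consume the request log that `_append_request_to_log_file` writes (the log path is illustrative; one JSON-encoded entry is appended per line):

```python
import json
import os

# Point the test server at a log file before exercising it (path is illustrative).
os.environ["LITELLM_TEST_SERVER_LOG_FILE_PATH"] = "/tmp/litellm_requests.jsonl"

# ... run code that routes completion requests through DSPyTestModel ...

with open(os.environ["LITELLM_TEST_SERVER_LOG_FILE_PATH"]) as f:
    logged = [json.loads(line) for line in f if line.strip()]

for entry in logged:
    print(entry)  # each entry carries the request's "model" and "messages"
```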

--------------------------------------------------------------------------------
/docs/docs/api/modules/CodeAct.md:
--------------------------------------------------------------------------------

```markdown
  1 | # dspy.CodeAct
  2 | 
  3 | <!-- START_API_REF -->
  4 | ::: dspy.CodeAct
  5 |     handler: python
  6 |     options:
  7 |         members:
  8 |             - __call__
  9 |             - batch
 10 |             - deepcopy
 11 |             - dump_state
 12 |             - get_lm
 13 |             - inspect_history
 14 |             - load
 15 |             - load_state
 16 |             - map_named_predictors
 17 |             - named_parameters
 18 |             - named_predictors
 19 |             - named_sub_modules
 20 |             - parameters
 21 |             - predictors
 22 |             - reset_copy
 23 |             - save
 24 |             - set_lm
 25 |         show_source: true
 26 |         show_root_heading: true
 27 |         heading_level: 2
 28 |         docstring_style: google
 29 |         show_root_full_path: true
 30 |         show_object_full_path: false
 31 |         separate_signature: false
 32 |         inherited_members: true
 33 | <!-- END_API_REF -->
 34 | 
 35 | # CodeAct
 36 | 
 37 | CodeAct is a DSPy module that combines code generation with tool execution to solve problems. It generates Python code snippets that use provided tools and the Python standard library to accomplish tasks.
 38 | 
 39 | ## Basic Usage
 40 | 
 41 | Here's a simple example of using CodeAct:
 42 | 
 43 | ```python
 44 | import dspy
 45 | from dspy.predict import CodeAct
 46 | 
 47 | # Define a simple tool function
 48 | def factorial(n: int) -> int:
 49 |     """Calculate the factorial of a number."""
 50 |     if n == 1:
 51 |         return 1
 52 |     return n * factorial(n-1)
 53 | 
 54 | # Create a CodeAct instance
 55 | act = CodeAct("n->factorial_result", tools=[factorial])
 56 | 
 57 | # Use the CodeAct instance
 58 | result = act(n=5)
 59 | print(result) # Will calculate factorial(5) = 120
 60 | ```
 61 | 
 62 | ## How It Works
 63 | 
 64 | CodeAct operates in an iterative manner:
 65 | 
 66 | 1. Takes input parameters and available tools
 67 | 2. Generates Python code snippets that use these tools
 68 | 3. Executes the code using a Python sandbox
 69 | 4. Collects the output and determines if the task is complete
 70 | 5. Answers the original question based on the collected information
 71 | 
 72 | ## ⚠️ Limitations
 73 | 
 74 | ### Only accepts pure functions as tools (no callable objects)
 75 | 
 76 | The following example does not work due to the usage of a callable object.
 77 | 
 78 | ```python
 79 | # ❌ NG
 80 | class Add():
 81 |     def __call__(self, a: int, b: int):
 82 |         return a + b
 83 | 
 84 | dspy.CodeAct("question -> answer", tools=[Add()])
 85 | ```
 86 | 
 87 | ### External libraries cannot be used
 88 | 
 89 | The following example does not work due to the usage of the external library `numpy`.
 90 | 
 91 | ```python
 92 | # ❌ NG
 93 | import numpy as np
 94 | 
 95 | def exp(i: int):
 96 |     return np.exp(i)
 97 | 
 98 | dspy.CodeAct("question -> answer", tools=[exp])
 99 | ```
100 | 
101 | ### All dependent functions need to be passed to `CodeAct`
102 | 
103 | Functions that depend on other functions or classes not passed to `CodeAct` cannot be used. The following example does not work because the tool functions depend on other functions or classes that are not passed to `CodeAct`, such as `Profile` or `parent_function`.
104 | 
105 | ```python
106 | # ❌ NG
107 | from pydantic import BaseModel
108 | 
109 | class Profile(BaseModel):
110 |     name: str
111 |     age: int
112 |     
113 | def age(profile: Profile):
114 |     return profile.age
115 | 
116 | def parent_function():
117 |     print("Hi!")
118 | 
119 | def child_function():
120 |     parent_function()
121 | 
122 | dspy.CodeAct("question -> answer", tools=[age, child_function])
123 | ```
124 | 
125 | Instead, the following example works since all necessary tool functions are passed to `CodeAct`:
126 | 
127 | ```python
128 | # ✅ OK
129 | 
130 | def parent_function():
131 |     print("Hi!")
132 | 
133 | def child_function():
134 |     parent_function()
135 | 
136 | dspy.CodeAct("question -> answer", tools=[parent_function, child_function])
137 | ```
138 | 
```
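
One possible workaround for the callable-object limitation above, sketched here (not part of the documented API): re-express the object's logic as a self-contained pure function and pass that function as the tool, keeping in mind that the function body must not reference names that are not also passed to `CodeAct`.

```python
# ✅ A sketch of a workaround: rewrite the callable's logic as a plain,
# self-contained function (no references to outside objects) and pass it in.
import dspy

def add(a: int, b: int) -> int:
    """Add two integers (same logic as the Add callable above)."""
    return a + b

dspy.CodeAct("question -> answer", tools=[add])
```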

--------------------------------------------------------------------------------
/dspy/datasets/hotpotqa.py:
--------------------------------------------------------------------------------

```python
 1 | import random
 2 | 
 3 | from dspy.datasets.dataset import Dataset
 4 | 
 5 | 
 6 | class HotPotQA(Dataset):
 7 |     def __init__(
 8 |         self,
 9 |         *args,
10 |         only_hard_examples=True,
11 |         keep_details="dev_titles",
12 |         unofficial_dev=True,
13 |         **kwargs,
14 |     ) -> None:
15 |         super().__init__(*args, **kwargs)
16 |         assert only_hard_examples, (
17 |             "Care must be taken when adding support for easy examples."
18 |             "Dev must be all hard to match official dev, but training can be flexible."
19 |         )
20 | 
21 |         from datasets import load_dataset
22 | 
23 |         hf_official_train = load_dataset("hotpot_qa", "fullwiki", split="train")
24 |         hf_official_dev = load_dataset("hotpot_qa", "fullwiki", split="validation")
25 | 
26 |         official_train = []
27 |         for raw_example in hf_official_train:
28 |             if raw_example["level"] == "hard":
29 |                 if keep_details is True:
30 |                     keys = ["id", "question", "answer", "type", "supporting_facts", "context"]
31 |                 elif keep_details == "dev_titles":
32 |                     keys = ["question", "answer", "supporting_facts"]
33 |                 else:
34 |                     keys = ["question", "answer"]
35 | 
36 |                 example = {k: raw_example[k] for k in keys}
37 | 
38 |                 if "supporting_facts" in example:
39 |                     example["gold_titles"] = set(example["supporting_facts"]["title"])
40 |                     del example["supporting_facts"]
41 | 
42 |                 official_train.append(example)
43 | 
44 |         rng = random.Random(0)
45 |         rng.shuffle(official_train)
46 | 
47 |         self._train = official_train[: len(official_train) * 75 // 100]
48 | 
49 |         if unofficial_dev:
50 |             self._dev = official_train[len(official_train) * 75 // 100 :]
51 |         else:
52 |             self._dev = None
53 | 
54 |         for example in self._train:
55 |             if keep_details == "dev_titles":
56 |                 del example["gold_titles"]
57 | 
58 |         test = []
59 |         for raw_example in hf_official_dev:
60 |             assert raw_example["level"] == "hard"
61 |             example = {k: raw_example[k] for k in ["id", "question", "answer", "type", "supporting_facts"]}
62 |             if "supporting_facts" in example:
63 |                 example["gold_titles"] = set(example["supporting_facts"]["title"])
64 |                 del example["supporting_facts"]
65 |             test.append(example)
66 | 
67 |         self._test = test
68 | 
69 | 
70 | if __name__ == "__main__":
71 |     from dspy.dsp.utils import dotdict
72 | 
73 |     data_args = dotdict(train_seed=1, train_size=16, eval_seed=2023, dev_size=200 * 5, test_size=0)
74 |     dataset = HotPotQA(**data_args)
75 | 
76 |     print(dataset)
77 |     print(dataset.train[0].question)
78 |     print(dataset.train[15].question)
79 | 
80 |     print(len(dataset.train), len(dataset.dev), len(dataset.test))
81 | 
82 |     print(dataset.dev[0].question)
83 |     print(dataset.dev[340].question)
84 |     print(dataset.dev[937].question)
85 | 
86 | """
87 | What was the population of the city where Woodward Avenue ends in 2010?
88 | Where did the star , who is also an executive producer, of the Mick begin her carrer?
89 | 16 1000 0
90 | Both London and German have seen attacks during war, there was one specific type of attack that Germany called the blitz, what did London call a similar attack?
91 | Pre-Madonna was a collection of demos by the singer who was a leading presence during the emergence of what network?
92 | Alan Mills composed the classic folk song that tells the story of what?
93 | """
94 | 
```
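
A hedged sketch of typical downstream use of this dataset class (assuming the `datasets` package is installed and the Hugging Face dataset can be downloaded; split sizes are illustrative):

```python
from dspy.datasets import HotPotQA

dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Mark `question` as the input field so DSPy modules and optimizers know which
# fields to feed to the program and which to treat as labels.
trainset = [example.with_inputs("question") for example in dataset.train]
devset = [example.with_inputs("question") for example in dataset.dev]

print(trainset[0].question, "->", trainset[0].answer)
```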
Page 2/17