tokens: 49964/50000 18/391 files (page 6/17)
This is page 6 of 17. Use http://codebase.md/stanfordnlp/dspy?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── .internal_dspyai
│   │   ├── internals
│   │   │   ├── build-and-release.md
│   │   │   └── release-checklist.md
│   │   └── pyproject.toml
│   ├── .tmp
│   │   └── .generated-actions
│   │       └── run-pypi-publish-in-docker-container
│   │           └── action.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.yml
│   │   └── feature_request.yml
│   ├── PULL_REQUEST_TEMPLATE
│   │   └── pull_request_template.md
│   ├── workflow_scripts
│   │   └── install_testpypi_pkg.sh
│   └── workflows
│       ├── build_and_release.yml
│       ├── build_utils
│       │   └── test_version.py
│       ├── docs-push.yml
│       ├── precommits_check.yml
│       └── run_tests.yml
├── .gitignore
├── .pre-commit-config.yaml
├── CONTRIBUTING.md
├── docs
│   ├── .gitignore
│   ├── docs
│   │   ├── api
│   │   │   ├── adapters
│   │   │   │   ├── Adapter.md
│   │   │   │   ├── ChatAdapter.md
│   │   │   │   ├── JSONAdapter.md
│   │   │   │   └── TwoStepAdapter.md
│   │   │   ├── evaluation
│   │   │   │   ├── answer_exact_match.md
│   │   │   │   ├── answer_passage_match.md
│   │   │   │   ├── CompleteAndGrounded.md
│   │   │   │   ├── Evaluate.md
│   │   │   │   ├── EvaluationResult.md
│   │   │   │   └── SemanticF1.md
│   │   │   ├── experimental
│   │   │   │   ├── Citations.md
│   │   │   │   └── Document.md
│   │   │   ├── index.md
│   │   │   ├── models
│   │   │   │   ├── Embedder.md
│   │   │   │   └── LM.md
│   │   │   ├── modules
│   │   │   │   ├── BestOfN.md
│   │   │   │   ├── ChainOfThought.md
│   │   │   │   ├── CodeAct.md
│   │   │   │   ├── Module.md
│   │   │   │   ├── MultiChainComparison.md
│   │   │   │   ├── Parallel.md
│   │   │   │   ├── Predict.md
│   │   │   │   ├── ProgramOfThought.md
│   │   │   │   ├── ReAct.md
│   │   │   │   └── Refine.md
│   │   │   ├── optimizers
│   │   │   │   ├── BetterTogether.md
│   │   │   │   ├── BootstrapFewShot.md
│   │   │   │   ├── BootstrapFewShotWithRandomSearch.md
│   │   │   │   ├── BootstrapFinetune.md
│   │   │   │   ├── BootstrapRS.md
│   │   │   │   ├── COPRO.md
│   │   │   │   ├── Ensemble.md
│   │   │   │   ├── GEPA
│   │   │   │   │   ├── GEPA_Advanced.md
│   │   │   │   │   └── overview.md
│   │   │   │   ├── InferRules.md
│   │   │   │   ├── KNN.md
│   │   │   │   ├── KNNFewShot.md
│   │   │   │   ├── LabeledFewShot.md
│   │   │   │   ├── MIPROv2.md
│   │   │   │   └── SIMBA.md
│   │   │   ├── primitives
│   │   │   │   ├── Audio.md
│   │   │   │   ├── Code.md
│   │   │   │   ├── Example.md
│   │   │   │   ├── History.md
│   │   │   │   ├── Image.md
│   │   │   │   ├── Prediction.md
│   │   │   │   ├── Tool.md
│   │   │   │   └── ToolCalls.md
│   │   │   ├── signatures
│   │   │   │   ├── InputField.md
│   │   │   │   ├── OutputField.md
│   │   │   │   └── Signature.md
│   │   │   ├── tools
│   │   │   │   ├── ColBERTv2.md
│   │   │   │   ├── Embeddings.md
│   │   │   │   └── PythonInterpreter.md
│   │   │   └── utils
│   │   │       ├── asyncify.md
│   │   │       ├── configure_cache.md
│   │   │       ├── disable_litellm_logging.md
│   │   │       ├── disable_logging.md
│   │   │       ├── enable_litellm_logging.md
│   │   │       ├── enable_logging.md
│   │   │       ├── inspect_history.md
│   │   │       ├── load.md
│   │   │       ├── StatusMessage.md
│   │   │       ├── StatusMessageProvider.md
│   │   │       ├── streamify.md
│   │   │       └── StreamListener.md
│   │   ├── cheatsheet.md
│   │   ├── community
│   │   │   ├── community-resources.md
│   │   │   ├── how-to-contribute.md
│   │   │   └── use-cases.md
│   │   ├── deep-dive
│   │   │   └── data-handling
│   │   │       ├── built-in-datasets.md
│   │   │       ├── examples.md
│   │   │       ├── img
│   │   │       │   └── data-loading.png
│   │   │       └── loading-custom-data.md
│   │   ├── faqs.md
│   │   ├── index.md
│   │   ├── js
│   │   │   └── runllm-widget.js
│   │   ├── learn
│   │   │   ├── evaluation
│   │   │   │   ├── data.md
│   │   │   │   ├── metrics.md
│   │   │   │   └── overview.md
│   │   │   ├── figures
│   │   │   │   ├── native_tool_call.png
│   │   │   │   └── teleprompter-classes.png
│   │   │   ├── index.md
│   │   │   ├── optimization
│   │   │   │   ├── optimizers.md
│   │   │   │   └── overview.md
│   │   │   └── programming
│   │   │       ├── 7-assertions.md
│   │   │       ├── adapters.md
│   │   │       ├── language_models.md
│   │   │       ├── mcp.md
│   │   │       ├── modules.md
│   │   │       ├── overview.md
│   │   │       ├── signatures.md
│   │   │       └── tools.md
│   │   ├── production
│   │   │   └── index.md
│   │   ├── roadmap.md
│   │   ├── static
│   │   │   ├── .nojekyll
│   │   │   └── img
│   │   │       ├── dspy_logo.png
│   │   │       ├── logo.png
│   │   │       ├── mlflow-tracing-rag.png
│   │   │       ├── modular.png
│   │   │       ├── optimize.png
│   │   │       ├── undraw_docusaurus_mountain.svg
│   │   │       ├── undraw_docusaurus_react.svg
│   │   │       ├── undraw_docusaurus_tree.svg
│   │   │       └── universal_compatibility.png
│   │   ├── stylesheets
│   │   │   └── extra.css
│   │   └── tutorials
│   │       ├── agents
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-agent.png
│   │       ├── ai_text_game
│   │       │   └── index.md
│   │       ├── async
│   │       │   └── index.md
│   │       ├── audio
│   │       │   └── index.ipynb
│   │       ├── build_ai_program
│   │       │   └── index.md
│   │       ├── cache
│   │       │   └── index.md
│   │       ├── classification
│   │       │   └── index.md
│   │       ├── classification_finetuning
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-classification.png
│   │       ├── conversation_history
│   │       │   └── index.md
│   │       ├── core_development
│   │       │   └── index.md
│   │       ├── custom_module
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-custom-module.png
│   │       ├── customer_service_agent
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-customer-service-agent.png
│   │       ├── deployment
│   │       │   ├── dspy_mlflow_ui.png
│   │       │   └── index.md
│   │       ├── email_extraction
│   │       │   ├── index.md
│   │       │   └── mlflow-tracing-email-extraction.png
│   │       ├── entity_extraction
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-entity-extraction.png
│   │       ├── games
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-agent.png
│   │       ├── gepa_ai_program
│   │       │   └── index.md
│   │       ├── gepa_aime
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-aime.png
│   │       │   └── mlflow-tracking-gepa-aime-optimization.png
│   │       ├── gepa_facilitysupportanalyzer
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-support.png
│   │       │   └── mlflow-tracking-gepa-support-optimization.png
│   │       ├── gepa_papillon
│   │       │   ├── index.ipynb
│   │       │   ├── mlflow-tracing-gepa-papilon.png
│   │       │   └── mlflow-tracking-gepa-papilon-optimization.png
│   │       ├── image_generation_prompting
│   │       │   └── index.ipynb
│   │       ├── index.md
│   │       ├── llms_txt_generation
│   │       │   └── index.md
│   │       ├── math
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-math.png
│   │       ├── mcp
│   │       │   └── index.md
│   │       ├── mem0_react_agent
│   │       │   └── index.md
│   │       ├── multihop_search
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-multi-hop.png
│   │       ├── observability
│   │       │   ├── index.md
│   │       │   ├── mlflow_trace_ui_navigation.gif
│   │       │   ├── mlflow_trace_ui.png
│   │       │   └── mlflow_trace_view.png
│   │       ├── optimize_ai_program
│   │       │   └── index.md
│   │       ├── optimizer_tracking
│   │       │   ├── child_run.png
│   │       │   ├── experiment.png
│   │       │   ├── index.md
│   │       │   └── parent_run.png
│   │       ├── output_refinement
│   │       │   └── best-of-n-and-refine.md
│   │       ├── papillon
│   │       │   └── index.md
│   │       ├── program_of_thought
│   │       │   └── index.ipynb
│   │       ├── rag
│   │       │   ├── index.ipynb
│   │       │   └── mlflow-tracing-rag.png
│   │       ├── real_world_examples
│   │       │   └── index.md
│   │       ├── rl_ai_program
│   │       │   └── index.md
│   │       ├── rl_multihop
│   │       │   └── index.ipynb
│   │       ├── rl_papillon
│   │       │   └── index.ipynb
│   │       ├── sample_code_generation
│   │       │   └── index.md
│   │       ├── saving
│   │       │   └── index.md
│   │       ├── streaming
│   │       │   └── index.md
│   │       ├── tool_use
│   │       │   └── index.ipynb
│   │       └── yahoo_finance_react
│   │           └── index.md
│   ├── mkdocs.yml
│   ├── overrides
│   │   ├── home.html
│   │   ├── main.html
│   │   └── partials
│   │       └── tabs.html
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── requirements.txt
│   ├── scripts
│   │   ├── generate_api_docs.py
│   │   └── generate_api_summary.py
│   └── vercel.json
├── dspy
│   ├── __init__.py
│   ├── __metadata__.py
│   ├── adapters
│   │   ├── __init__.py
│   │   ├── baml_adapter.py
│   │   ├── base.py
│   │   ├── chat_adapter.py
│   │   ├── json_adapter.py
│   │   ├── two_step_adapter.py
│   │   ├── types
│   │   │   ├── __init__.py
│   │   │   ├── audio.py
│   │   │   ├── base_type.py
│   │   │   ├── citation.py
│   │   │   ├── code.py
│   │   │   ├── document.py
│   │   │   ├── history.py
│   │   │   ├── image.py
│   │   │   └── tool.py
│   │   ├── utils.py
│   │   └── xml_adapter.py
│   ├── clients
│   │   ├── __init__.py
│   │   ├── base_lm.py
│   │   ├── cache.py
│   │   ├── databricks.py
│   │   ├── embedding.py
│   │   ├── lm_local_arbor.py
│   │   ├── lm_local.py
│   │   ├── lm.py
│   │   ├── openai.py
│   │   ├── provider.py
│   │   └── utils_finetune.py
│   ├── datasets
│   │   ├── __init__.py
│   │   ├── alfworld
│   │   │   ├── __init__.py
│   │   │   ├── alfworld.py
│   │   │   └── base_config.yml
│   │   ├── colors.py
│   │   ├── dataloader.py
│   │   ├── dataset.py
│   │   ├── gsm8k.py
│   │   ├── hotpotqa.py
│   │   └── math.py
│   ├── dsp
│   │   ├── __init__.py
│   │   ├── colbertv2.py
│   │   └── utils
│   │       ├── __init__.py
│   │       ├── dpr.py
│   │       ├── settings.py
│   │       └── utils.py
│   ├── evaluate
│   │   ├── __init__.py
│   │   ├── auto_evaluation.py
│   │   ├── evaluate.py
│   │   └── metrics.py
│   ├── experimental
│   │   └── __init__.py
│   ├── predict
│   │   ├── __init__.py
│   │   ├── aggregation.py
│   │   ├── avatar
│   │   │   ├── __init__.py
│   │   │   ├── avatar.py
│   │   │   ├── models.py
│   │   │   └── signatures.py
│   │   ├── best_of_n.py
│   │   ├── chain_of_thought.py
│   │   ├── code_act.py
│   │   ├── knn.py
│   │   ├── multi_chain_comparison.py
│   │   ├── parallel.py
│   │   ├── parameter.py
│   │   ├── predict.py
│   │   ├── program_of_thought.py
│   │   ├── react.py
│   │   ├── refine.py
│   │   └── retry.py
│   ├── primitives
│   │   ├── __init__.py
│   │   ├── base_module.py
│   │   ├── example.py
│   │   ├── module.py
│   │   ├── prediction.py
│   │   ├── python_interpreter.py
│   │   └── runner.js
│   ├── propose
│   │   ├── __init__.py
│   │   ├── dataset_summary_generator.py
│   │   ├── grounded_proposer.py
│   │   ├── propose_base.py
│   │   └── utils.py
│   ├── retrievers
│   │   ├── __init__.py
│   │   ├── databricks_rm.py
│   │   ├── embeddings.py
│   │   ├── retrieve.py
│   │   └── weaviate_rm.py
│   ├── signatures
│   │   ├── __init__.py
│   │   ├── field.py
│   │   ├── signature.py
│   │   └── utils.py
│   ├── streaming
│   │   ├── __init__.py
│   │   ├── messages.py
│   │   ├── streamify.py
│   │   └── streaming_listener.py
│   ├── teleprompt
│   │   ├── __init__.py
│   │   ├── avatar_optimizer.py
│   │   ├── bettertogether.py
│   │   ├── bootstrap_finetune.py
│   │   ├── bootstrap_trace.py
│   │   ├── bootstrap.py
│   │   ├── copro_optimizer.py
│   │   ├── ensemble.py
│   │   ├── gepa
│   │   │   ├── __init__.py
│   │   │   ├── gepa_utils.py
│   │   │   ├── gepa.py
│   │   │   └── instruction_proposal.py
│   │   ├── grpo.py
│   │   ├── infer_rules.py
│   │   ├── knn_fewshot.py
│   │   ├── mipro_optimizer_v2.py
│   │   ├── random_search.py
│   │   ├── signature_opt.py
│   │   ├── simba_utils.py
│   │   ├── simba.py
│   │   ├── teleprompt_optuna.py
│   │   ├── teleprompt.py
│   │   ├── utils.py
│   │   └── vanilla.py
│   └── utils
│       ├── __init__.py
│       ├── annotation.py
│       ├── asyncify.py
│       ├── caching.py
│       ├── callback.py
│       ├── dummies.py
│       ├── exceptions.py
│       ├── hasher.py
│       ├── inspect_history.py
│       ├── langchain_tool.py
│       ├── logging_utils.py
│       ├── mcp.py
│       ├── parallelizer.py
│       ├── saving.py
│       ├── syncify.py
│       ├── unbatchify.py
│       └── usage_tracker.py
├── LICENSE
├── pyproject.toml
├── README.md
├── tests
│   ├── __init__.py
│   ├── adapters
│   │   ├── test_adapter_utils.py
│   │   ├── test_baml_adapter.py
│   │   ├── test_base_type.py
│   │   ├── test_chat_adapter.py
│   │   ├── test_citation.py
│   │   ├── test_code.py
│   │   ├── test_document.py
│   │   ├── test_json_adapter.py
│   │   ├── test_tool.py
│   │   ├── test_two_step_adapter.py
│   │   └── test_xml_adapter.py
│   ├── callback
│   │   └── test_callback.py
│   ├── clients
│   │   ├── test_cache.py
│   │   ├── test_databricks.py
│   │   ├── test_embedding.py
│   │   ├── test_inspect_global_history.py
│   │   └── test_lm.py
│   ├── conftest.py
│   ├── datasets
│   │   └── test_dataset.py
│   ├── docs
│   │   └── test_mkdocs_links.py
│   ├── evaluate
│   │   ├── test_evaluate.py
│   │   └── test_metrics.py
│   ├── examples
│   │   └── test_baleen.py
│   ├── metadata
│   │   └── test_metadata.py
│   ├── predict
│   │   ├── test_aggregation.py
│   │   ├── test_best_of_n.py
│   │   ├── test_chain_of_thought.py
│   │   ├── test_code_act.py
│   │   ├── test_knn.py
│   │   ├── test_multi_chain_comparison.py
│   │   ├── test_parallel.py
│   │   ├── test_predict.py
│   │   ├── test_program_of_thought.py
│   │   ├── test_react.py
│   │   ├── test_refine.py
│   │   └── test_retry.py
│   ├── primitives
│   │   ├── resources
│   │   │   └── saved_program.json
│   │   ├── test_base_module.py
│   │   ├── test_example.py
│   │   ├── test_module.py
│   │   └── test_python_interpreter.py
│   ├── propose
│   │   └── test_grounded_proposer.py
│   ├── README.md
│   ├── reliability
│   │   ├── __init__.py
│   │   ├── complex_types
│   │   │   └── generated
│   │   │       ├── test_many_types_1
│   │   │       │   ├── inputs
│   │   │       │   │   ├── input1.json
│   │   │       │   │   └── input2.json
│   │   │       │   ├── program.py
│   │   │       │   └── schema.json
│   │   │       ├── test_nesting_1
│   │   │       │   ├── inputs
│   │   │       │   │   ├── input1.json
│   │   │       │   │   └── input2.json
│   │   │       │   ├── program.py
│   │   │       │   └── schema.json
│   │   │       └── test_nesting_2
│   │   │           ├── inputs
│   │   │           │   └── input1.json
│   │   │           ├── program.py
│   │   │           └── schema.json
│   │   ├── conftest.py
│   │   ├── generate
│   │   │   ├── __init__.py
│   │   │   ├── __main__.py
│   │   │   └── utils.py
│   │   ├── input_formats
│   │   │   └── generated
│   │   │       └── test_markdown_1
│   │   │           ├── inputs
│   │   │           │   ├── input1.json
│   │   │           │   └── input2.json
│   │   │           ├── program.py
│   │   │           └── schema.json
│   │   ├── README.md
│   │   ├── reliability_conf.yaml
│   │   ├── test_generated.py
│   │   ├── test_pydantic_models.py
│   │   └── utils.py
│   ├── retrievers
│   │   └── test_embeddings.py
│   ├── signatures
│   │   ├── test_adapter_image.py
│   │   ├── test_custom_types.py
│   │   └── test_signature.py
│   ├── streaming
│   │   └── test_streaming.py
│   ├── teleprompt
│   │   ├── gepa_dummy_lm_custom_component_selector_custom_instruction_proposer.json
│   │   ├── gepa_dummy_lm.json
│   │   ├── test_bootstrap_finetune.py
│   │   ├── test_bootstrap_trace.py
│   │   ├── test_bootstrap.py
│   │   ├── test_copro_optimizer.py
│   │   ├── test_ensemble.py
│   │   ├── test_finetune.py
│   │   ├── test_gepa_instruction_proposer.py
│   │   ├── test_gepa.py
│   │   ├── test_grpo.py
│   │   ├── test_knn_fewshot.py
│   │   ├── test_random_search.py
│   │   ├── test_teleprompt.py
│   │   └── test_utils.py
│   ├── test_utils
│   │   ├── __init__.py
│   │   └── server
│   │       ├── __init__.py
│   │       ├── litellm_server_config.yaml
│   │       └── litellm_server.py
│   └── utils
│       ├── __init__.py
│       ├── resources
│       │   └── mcp_server.py
│       ├── test_annotation.py
│       ├── test_asyncify.py
│       ├── test_exceptions.py
│       ├── test_langchain_tool.py
│       ├── test_mcp.py
│       ├── test_parallelizer.py
│       ├── test_saving.py
│       ├── test_settings.py
│       ├── test_syncify.py
│       ├── test_unbatchify.py
│       └── test_usage_tracker.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/dspy/retrievers/embeddings.py:
--------------------------------------------------------------------------------

```python
  1 | import json
  2 | import os
  3 | from typing import Any
  4 | 
  5 | import numpy as np
  6 | 
  7 | from dspy.utils.unbatchify import Unbatchify
  8 | 
  9 | 
 10 | class Embeddings:
 11 |     def __init__(
 12 |         self,
 13 |         corpus: list[str],
 14 |         embedder,
 15 |         k: int = 5,
 16 |         callbacks: list[Any] | None = None,
 17 |         cache: bool = False,
 18 |         brute_force_threshold: int = 20_000,
 19 |         normalize: bool = True,
 20 |     ):
 21 |         assert cache is False, "Caching is not supported for embeddings-based retrievers"
 22 | 
 23 |         self.embedder = embedder
 24 |         self.k = k
 25 |         self.corpus = corpus
 26 |         self.normalize = normalize
 27 | 
 28 |         self.corpus_embeddings = self.embedder(self.corpus)
 29 |         self.corpus_embeddings = self._normalize(self.corpus_embeddings) if self.normalize else self.corpus_embeddings
 30 | 
 31 |         self.index = self._build_faiss() if len(corpus) >= brute_force_threshold else None
 32 |         self.search_fn = Unbatchify(self._batch_forward)
 33 | 
 34 |     def __call__(self, query: str):
 35 |         return self.forward(query)
 36 | 
 37 |     def forward(self, query: str):
 38 |         import dspy
 39 | 
 40 |         passages, indices = self.search_fn(query)
 41 |         return dspy.Prediction(passages=passages, indices=indices)
 42 | 
 43 |     def _batch_forward(self, queries: list[str]):
 44 |         q_embeds = self.embedder(queries)
 45 |         q_embeds = self._normalize(q_embeds) if self.normalize else q_embeds
 46 | 
 47 |         pids = self._faiss_search(q_embeds, self.k * 10) if self.index else None
 48 |         pids = np.tile(np.arange(len(self.corpus)), (len(queries), 1)) if pids is None else pids
 49 | 
 50 |         return self._rerank_and_predict(q_embeds, pids)
 51 | 
 52 |     def _build_faiss(self):
 53 |         nbytes = 32
 54 |         partitions = int(2 * np.sqrt(len(self.corpus)))
 55 |         dim = self.corpus_embeddings.shape[1]
 56 | 
 57 |         try:
 58 |             import faiss
 59 |         except ImportError:
 60 |             raise ImportError("Please `pip install faiss-cpu` or increase `brute_force_threshold` to avoid FAISS.")
 61 | 
 62 |         quantizer = faiss.IndexFlatL2(dim)
 63 |         index = faiss.IndexIVFPQ(quantizer, dim, partitions, nbytes, 8)
 64 | 
 65 |         print(
 66 |             f"Training a {nbytes}-byte FAISS index with {partitions} partitions, based on "
 67 |             f"{len(self.corpus)} x {dim}-dim embeddings"
 68 |         )
 69 |         index.train(self.corpus_embeddings)
 70 |         index.add(self.corpus_embeddings)
 71 |         index.nprobe = min(16, partitions)
 72 | 
 73 |         return index
 74 | 
 75 |     def _faiss_search(self, query_embeddings: np.ndarray, num_candidates: int):
 76 |         return self.index.search(query_embeddings, num_candidates)[1]
 77 | 
 78 |     def _rerank_and_predict(self, q_embeds: np.ndarray, candidate_indices: np.ndarray):
 79 |         candidate_embeddings = self.corpus_embeddings[candidate_indices]
 80 |         scores = np.einsum("qd,qkd->qk", q_embeds, candidate_embeddings)
 81 | 
 82 |         top_k_indices = np.argsort(-scores, axis=1)[:, : self.k]
 83 |         top_indices = candidate_indices[np.arange(len(q_embeds))[:, None], top_k_indices]
 84 | 
 85 |         return [([self.corpus[idx] for idx in indices], [idx for idx in indices]) for indices in top_indices]  # noqa: C416
 86 | 
 87 |     def _normalize(self, embeddings: np.ndarray):
 88 |         norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
 89 |         return embeddings / np.maximum(norms, 1e-10)
 90 | 
 91 |     def save(self, path: str):
 92 |         """
 93 |         Save the embeddings index to disk.
 94 | 
 95 |         This saves the corpus, embeddings, FAISS index (if present), and configuration
 96 |         to allow for fast loading without recomputing embeddings.
 97 | 
 98 |         Args:
 99 |             path: Directory path where the embeddings will be saved
100 |         """
101 |         os.makedirs(path, exist_ok=True)
102 | 
103 |         # Save configuration and corpus
104 |         config = {
105 |             "k": self.k,
106 |             "normalize": self.normalize,
107 |             "corpus": self.corpus,
108 |             "has_faiss_index": self.index is not None,
109 |         }
110 | 
111 |         with open(os.path.join(path, "config.json"), "w") as f:
112 |             json.dump(config, f, indent=2)
113 | 
114 |         # Save embeddings
115 |         np.save(os.path.join(path, "corpus_embeddings.npy"), self.corpus_embeddings)
116 | 
117 |         # Save FAISS index if it exists
118 |         if self.index is not None:
119 |             try:
120 |                 import faiss
121 |                 faiss.write_index(self.index, os.path.join(path, "faiss_index.bin"))
122 |             except ImportError:
123 |                 # If FAISS is not available, we can't save the index
124 |                 # but we can still save the embeddings for brute force search
125 |                 pass
126 | 
127 |     def load(self, path: str, embedder):
128 |         """
129 |         Load the embeddings index from disk into the current instance.
130 | 
131 |         Args:
132 |             path: Directory path where the embeddings were saved
133 |             embedder: The embedder function to use for new queries
134 | 
135 |         Returns:
136 |             self: Returns self for method chaining
137 | 
138 |         Raises:
139 |             FileNotFoundError: If the save directory or required files don't exist
140 |             ValueError: If the saved config is invalid or incompatible
141 |         """
142 |         if not os.path.exists(path):
143 |             raise FileNotFoundError(f"Save directory not found: {path}")
144 | 
145 |         config_path = os.path.join(path, "config.json")
146 |         embeddings_path = os.path.join(path, "corpus_embeddings.npy")
147 | 
148 |         if not os.path.exists(config_path):
149 |             raise FileNotFoundError(f"Config file not found: {config_path}")
150 |         if not os.path.exists(embeddings_path):
151 |             raise FileNotFoundError(f"Embeddings file not found: {embeddings_path}")
152 | 
153 |         # Load configuration and corpus
154 |         with open(config_path) as f:
155 |             config = json.load(f)
156 | 
157 |         # Validate required config fields
158 |         required_fields = ["k", "normalize", "corpus", "has_faiss_index"]
159 |         for field in required_fields:
160 |             if field not in config:
161 |                 raise ValueError(f"Invalid config: missing required field '{field}'")
162 | 
163 |         # Restore configuration
164 |         self.k = config["k"]
165 |         self.normalize = config["normalize"]
166 |         self.corpus = config["corpus"]
167 |         self.embedder = embedder
168 | 
169 |         # Load embeddings
170 |         self.corpus_embeddings = np.load(embeddings_path)
171 | 
172 |         # Load FAISS index if it was saved and FAISS is available
173 |         faiss_index_path = os.path.join(path, "faiss_index.bin")
174 |         if config["has_faiss_index"] and os.path.exists(faiss_index_path):
175 |             try:
176 |                 import faiss
177 |                 self.index = faiss.read_index(faiss_index_path)
178 |             except ImportError:
179 |                 # If FAISS is not available, fall back to brute force
180 |                 self.index = None
181 |         else:
182 |             self.index = None
183 | 
184 |         return self
185 | 
186 |     @classmethod
187 |     def from_saved(cls, path: str, embedder):
188 |         """
189 |         Create an Embeddings instance from a saved index.
190 | 
191 |         This is the recommended way to load saved embeddings as it creates a new
192 |         instance without unnecessarily computing embeddings.
193 | 
194 |         Args:
195 |             path: Directory path where the embeddings were saved
196 |             embedder: The embedder function to use for new queries
197 | 
198 |         Returns:
199 |             Embeddings instance loaded from disk
200 | 
201 |         Example:
202 |             ```python
203 |             # Save embeddings
204 |             embeddings = Embeddings(corpus, embedder)
205 |             embeddings.save("./saved_embeddings")
206 | 
207 |             # Load embeddings later
208 |             loaded_embeddings = Embeddings.from_saved("./saved_embeddings", embedder)
209 |             ```
210 |         """
211 |         # Create a minimal instance without triggering embedding computation
212 |         instance = cls.__new__(cls)
213 |         # Initialize the search function (required since we bypassed __init__)
214 |         instance.search_fn = Unbatchify(instance._batch_forward)
215 |         instance.load(path, embedder)
216 |         return instance
217 | 
```
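
A minimal usage sketch for the retriever above (an editorial illustration, not part of the repository file): it assumes a toy embedder callable that returns a 2-D NumPy array, where in practice you would pass a real `dspy.Embedder` or any equivalent callable.

```python
import numpy as np

from dspy.retrievers.embeddings import Embeddings


def toy_embedder(texts: list[str]) -> np.ndarray:
    # Deterministic pseudo-embeddings keyed on the text hash; the similarity scores
    # they produce are not semantically meaningful, this only exercises the API shape.
    return np.stack(
        [np.random.default_rng(abs(hash(t)) % 2**32).random(64) for t in texts]
    ).astype(np.float32)


corpus = ["The Earth orbits the Sun.", "Water boils at 100 °C."]
retriever = Embeddings(corpus, toy_embedder, k=1)  # tiny corpus -> brute-force search, no FAISS index

result = retriever("At what temperature does water boil?")
print(result.passages, result.indices)

retriever.save("./saved_embeddings")  # writes config.json and corpus_embeddings.npy
reloaded = Embeddings.from_saved("./saved_embeddings", toy_embedder)  # reloads without re-embedding the corpus
```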

--------------------------------------------------------------------------------
/dspy/datasets/alfworld/base_config.yml:
--------------------------------------------------------------------------------

```yaml
  1 | dataset:
  2 |   data_path: '$ALFWORLD_DATA/json_2.1.1/train'
  3 |   eval_id_data_path: '$ALFWORLD_DATA/json_2.1.1/valid_seen'    # null/None to disable
  4 |   eval_ood_data_path: '$ALFWORLD_DATA/json_2.1.1/valid_unseen' # null/None to disable
  5 |   num_train_games: -1                                          # max training games (<=0 indicates full dataset)
  6 |   num_eval_games: -1                                           # max evaluation games (<=0 indicates full dataset)
  7 | 
  8 | logic:
  9 |   domain: '$ALFWORLD_DATA/logic/alfred.pddl'                   # PDDL domain file that defines the world dynamics
 10 |   grammar: '$ALFWORLD_DATA/logic/alfred.twl2'                  # Grammar file that defines the text feedbacks
 11 | 
 12 | env:
 13 |   type: 'AlfredTWEnv'                                          # 'AlfredTWEnv' or 'AlfredThorEnv' or 'AlfredHybrid'
 14 |   regen_game_files: False                                      # check if game is solvable by expert and save to game.tw-pddl file
 15 |   domain_randomization: False                                  # shuffle Textworld print order and object id nums
 16 |   task_types: [1, 2, 3, 4, 5, 6]                               # task-type ids: 1 - Pick & Place, 2 - Examine in Light, 3 - Clean & Place, 4 - Heat & Place, 5 - Cool & Place, 6 - Pick Two & Place
 17 |   expert_timeout_steps: 150                                    # max steps before timeout for expert to solve the task
 18 |   expert_type: "handcoded"                                     # 'handcoded' or 'downward'. Note: the downward planner is very slow for real-time use
 19 |   goal_desc_human_anns_prob: 0.0                               # prob of using human-annotated goal language instead of templated goals (1.0 indicates all human annotations from ALFRED)
 20 | 
 21 |   hybrid:
 22 |     start_eps: 100000                                          # starting episode of hybrid training, tw-only training up to this point
 23 |     thor_prob: 0.5                                             # prob of AlfredThorEnv during hybrid training
 24 |     eval_mode: "tw"                                            # 'tw' or 'thor' - env used for evaluation during hybrid training
 25 | 
 26 |   thor:
 27 |     screen_width: 300                                          # width of THOR window
 28 |     screen_height: 300                                         # height of THOR window
 29 |     smooth_nav: False                                          # smooth rotations, looks, and translations during navigation (very slow)
 30 |     save_frames_to_disk: False                                 # save frame PNGs to disk (useful for making videos)
 31 |     save_frames_path: './videos/'                              # path to save frame PNGs
 32 | 
 33 | controller:
 34 |   type: 'oracle'                                               # 'oracle' or 'oracle_astar' or 'mrcnn' or 'mrcnn_astar' (aka BUTLER)
 35 |   debug: False
 36 |   load_receps: True                                            # load receptacle locations from precomputed dict (if available)
 37 | 
 38 | mask_rcnn:
 39 |   pretrained_model_path: '$ALFWORLD_DATA/detectors/mrcnn.pth'
 40 | 
 41 | general:
 42 |   random_seed: 42
 43 |   use_cuda: True                                               # disable this when running on machine without cuda
 44 |   visdom: False                                                # plot training/eval curves, run with visdom server
 45 |   task: 'alfred'
 46 |   training_method: 'dagger'                                    # 'dqn' or 'dagger'
 47 |   save_path: './training/'                                     # path to save pytorch models
 48 |   observation_pool_capacity: 3                                 # k-size queue, 0 indicates no observation
 49 |   hide_init_receptacles: False                                 # remove initial observation containing navigable receptacles
 50 | 
 51 |   training:
 52 |     batch_size: 10
 53 |     max_episode: 50000
 54 |     smoothing_eps: 0.1
 55 |     optimizer:
 56 |       learning_rate: 0.001
 57 |       clip_grad_norm: 5
 58 | 
 59 |   evaluate:
 60 |     run_eval: True
 61 |     batch_size: 10
 62 |     env:
 63 |       type: "AlfredTWEnv"
 64 | 
 65 |   checkpoint:
 66 |     report_frequency: 1000                                    # report every N episode
 67 |     experiment_tag: 'test'                                    # name of experiment
 68 |     load_pretrained: False                                    # during test, enable this so that the agent loads your pretrained model
 69 |     load_from_tag: 'not loading anything'                     # name of pre-trained model to load in save_path
 70 | 
 71 |   model:
 72 |     encoder_layers: 1
 73 |     decoder_layers: 1
 74 |     encoder_conv_num: 5
 75 |     block_hidden_dim: 64
 76 |     n_heads: 1
 77 |     dropout: 0.1
 78 |     block_dropout: 0.1
 79 |     recurrent: True
 80 | 
 81 | rl:
 82 |   action_space: "admissible"                                  # 'admissible' (candidates from text engine) or 'generation' (seq2seq-style generation) or 'beam_search_choice' or 'exhaustive' (not working)
 83 |   max_target_length: 20                                       # max token length for seq2seq generation
 84 |   beam_width: 10                                              # 1 means greedy
 85 |   generate_top_k: 3
 86 | 
 87 |   training:
 88 |     max_nb_steps_per_episode: 50                              # terminate after this many steps
 89 |     learn_start_from_this_episode: 0                          # delay updates until this episode
 90 |     target_net_update_frequency: 500                          # sync target net with online net per this many epochs
 91 | 
 92 |   replay:
 93 |     accumulate_reward_from_final: True
 94 |     count_reward_lambda: 0.0                                  # 0 to disable
 95 |     novel_object_reward_lambda: 0.0                           # 0 to disable
 96 |     discount_gamma_game_reward: 0.9
 97 |     discount_gamma_count_reward: 0.5
 98 |     discount_gamma_novel_object_reward: 0.5
 99 |     replay_memory_capacity: 500000                            # adjust this depending on your RAM size
100 |     replay_memory_priority_fraction: 0.5
101 |     update_per_k_game_steps: 5
102 |     replay_batch_size: 64
103 |     multi_step: 3
104 |     replay_sample_history_length: 4
105 |     replay_sample_update_from: 2
106 | 
107 |   epsilon_greedy:
108 |     noisy_net: False                                          # if this is true, then epsilon greedy is disabled
109 |     epsilon_anneal_episodes: 1000                             # -1 if not annealing
110 |     epsilon_anneal_from: 0.3
111 |     epsilon_anneal_to: 0.1
112 | 
113 | dagger:
114 |   action_space: "generation"                                  # 'admissible' (candidates from text engine) or 'generation' (seq2seq-style generation) or 'exhaustive' (not working)
115 |   max_target_length: 20                                       # max token length for seq2seq generation
116 |   beam_width: 10                                              # 1 means greedy
117 |   generate_top_k: 5
118 |   unstick_by_beam_search: False                               # use beam-search for failed actions, set True during evaluation
119 | 
120 |   training:
121 |     max_nb_steps_per_episode: 50                              # terminate after this many steps
122 | 
123 |   fraction_assist:
124 |     fraction_assist_anneal_episodes: 50000
125 |     fraction_assist_anneal_from: 1.0
126 |     fraction_assist_anneal_to: 0.01
127 | 
128 |   fraction_random:
129 |     fraction_random_anneal_episodes: 0
130 |     fraction_random_anneal_from: 0.0
131 |     fraction_random_anneal_to: 0.0
132 | 
133 |   replay:
134 |     replay_memory_capacity: 500000
135 |     update_per_k_game_steps: 5
136 |     replay_batch_size: 64
137 |     replay_sample_history_length: 4
138 |     replay_sample_update_from: 2
139 | 
140 | vision_dagger:
141 |   model_type: "resnet"                                        # 'resnet' (whole image features) or 'maskrcnn_whole' (whole image MaskRCNN feats) or 'maskrcnn' (top k MaskRCNN detection feats) or 'no_vision' (zero vision input)
142 |   resnet_fc_dim: 64
143 |   maskrcnn_top_k_boxes: 10                                    # top k box features
144 |   use_exploration_frame_feats: False                          # append feats from initial exploration (memory intensive!)
145 |   sequence_aggregation_method: "average"                      # 'sum' or 'average' or 'rnn'
```

--------------------------------------------------------------------------------
/dspy/adapters/types/citation.py:
--------------------------------------------------------------------------------

```python
  1 | from typing import Any, Optional
  2 | 
  3 | import pydantic
  4 | 
  5 | from dspy.adapters.types.base_type import Type
  6 | from dspy.utils.annotation import experimental
  7 | 
  8 | 
  9 | @experimental(version="3.0.4")
 10 | class Citations(Type):
 11 |     """Citations extracted from an LM response with source references.
 12 | 
 13 |     This type represents citations returned by language models that support
 14 |     citation extraction, particularly Anthropic's Citations API through LiteLLM.
 15 |     Citations include the quoted text and source information.
 16 | 
 17 |     Example:
 18 |         ```python
 19 |         import os
 20 |         import dspy
 21 |         from dspy.signatures import Signature
 22 |         from dspy.experimental import Citations, Document
 23 |         os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY"
 24 | 
 25 |         class AnswerWithSources(Signature):
 26 |             '''Answer questions using provided documents with citations.'''
 27 |             documents: list[Document] = dspy.InputField()
 28 |             question: str = dspy.InputField()
 29 |             answer: str = dspy.OutputField()
 30 |             citations: Citations = dspy.OutputField()
 31 | 
 32 |         # Create documents to provide as sources
 33 |         docs = [
 34 |             Document(
 35 |                 data="The Earth orbits the Sun in an elliptical path.",
 36 |                 title="Basic Astronomy Facts"
 37 |             ),
 38 |             Document(
 39 |                 data="Water boils at 100°C at standard atmospheric pressure.",
 40 |                 title="Physics Fundamentals",
 41 |                 metadata={"author": "Dr. Smith", "year": 2023}
 42 |             )
 43 |         ]
 44 | 
 45 |         # Use with a model that supports citations like Claude
 46 |         lm = dspy.LM("anthropic/claude-opus-4-1-20250805")
 47 |         predictor = dspy.Predict(AnswerWithSources)
 48 |         result = predictor(documents=docs, question="What temperature does water boil?", lm=lm)
 49 | 
 50 |         for citation in result.citations.citations:
 51 |             print(citation.format())
 52 |         ```
 53 |     """
 54 | 
 55 |     class Citation(Type):
 56 |         """Individual citation with character location information."""
 57 |         type: str = "char_location"
 58 |         cited_text: str
 59 |         document_index: int
 60 |         document_title: str | None = None
 61 |         start_char_index: int
 62 |         end_char_index: int
 63 |         supported_text: str | None = None
 64 | 
 65 |         def format(self) -> dict[str, Any]:
 66 |             """Format citation as dictionary for LM consumption.
 67 | 
 68 |             Returns:
 69 |                 A dictionary in the format expected by citation APIs.
 70 |             """
 71 |             citation_dict = {
 72 |                 "type": self.type,
 73 |                 "cited_text": self.cited_text,
 74 |                 "document_index": self.document_index,
 75 |                 "start_char_index": self.start_char_index,
 76 |                 "end_char_index": self.end_char_index
 77 |             }
 78 | 
 79 |             if self.document_title:
 80 |                 citation_dict["document_title"] = self.document_title
 81 | 
 82 |             if self.supported_text:
 83 |                 citation_dict["supported_text"] = self.supported_text
 84 | 
 85 |             return citation_dict
 86 | 
 87 |     citations: list[Citation]
 88 | 
 89 |     @classmethod
 90 |     def from_dict_list(cls, citations_dicts: list[dict[str, Any]]) -> "Citations":
 91 |         """Convert a list of dictionaries to a Citations instance.
 92 | 
 93 |         Args:
 94 |             citations_dicts: A list of dictionaries, where each dictionary should have 'cited_text' key
 95 |                 and 'document_index', 'start_char_index', 'end_char_index' keys.
 96 | 
 97 |         Returns:
 98 |             A Citations instance.
 99 | 
100 |         Example:
101 |             ```python
102 |             citations_dict = [
103 |                 {
104 |                     "cited_text": "The sky is blue",
105 |                     "document_index": 0,
106 |                     "document_title": "Weather Guide",
107 |                     "start_char_index": 0,
108 |                     "end_char_index": 15,
109 |                     "supported_text": "The sky was blue yesterday."
110 |                 }
111 |             ]
112 |             citations = Citations.from_dict_list(citations_dict)
113 |             ```
114 |         """
115 |         citations = [cls.Citation(**item) for item in citations_dicts]
116 |         return cls(citations=citations)
117 | 
118 |     @classmethod
119 |     def description(cls) -> str:
120 |         """Description of the citations type for use in prompts."""
121 |         return (
122 |             "Citations with quoted text and source references. "
123 |             "Include the exact text being cited and information about its source."
124 |         )
125 | 
126 |     def format(self) -> list[dict[str, Any]]:
127 |         """Format citations as a list of dictionaries."""
128 |         return [citation.format() for citation in self.citations]
129 | 
130 |     @pydantic.model_validator(mode="before")
131 |     @classmethod
132 |     def validate_input(cls, data: Any):
133 |         if isinstance(data, cls):
134 |             return data
135 | 
136 |         # Handle case where data is a list of dicts with citation info
137 |         if isinstance(data, list) and all(
138 |             isinstance(item, dict) and "cited_text" in item for item in data
139 |         ):
140 |             return {"citations": [cls.Citation(**item) for item in data]}
141 | 
142 |         # Handle case where data is a dict
143 |         elif isinstance(data, dict):
144 |             if "citations" in data:
145 |                 # Handle case where data is a dict with "citations" key
146 |                 citations_data = data["citations"]
147 |                 if isinstance(citations_data, list):
148 |                     return {
149 |                         "citations": [
150 |                             cls.Citation(**item) if isinstance(item, dict) else item
151 |                             for item in citations_data
152 |                         ]
153 |                     }
154 |             elif "cited_text" in data:
155 |                 # Handle case where data is a single citation dict
156 |                 return {"citations": [cls.Citation(**data)]}
157 | 
158 |         raise ValueError(f"Received invalid value for `Citations`: {data}")
159 | 
160 |     def __iter__(self):
161 |         """Allow iteration over citations."""
162 |         return iter(self.citations)
163 | 
164 |     def __len__(self):
165 |         """Return the number of citations."""
166 |         return len(self.citations)
167 | 
168 |     def __getitem__(self, index):
169 |         """Allow indexing into citations."""
170 |         return self.citations[index]
171 | 
172 |     @classmethod
173 |     def is_streamable(cls) -> bool:
174 |         """Whether the Citations type is streamable."""
175 |         return True
176 | 
177 |     @classmethod
178 |     def parse_stream_chunk(cls, chunk) -> Optional["Citations"]:
179 |         """
180 |         Parse a stream chunk into Citations.
181 | 
182 |         Args:
183 |             chunk: A stream chunk from the LM.
184 | 
185 |         Returns:
186 |             A Citations object if the chunk contains citation data, None otherwise.
187 |         """
188 |         try:
189 |             # Check if the chunk has citation data in provider_specific_fields
190 |             if hasattr(chunk, "choices") and chunk.choices:
191 |                 delta = chunk.choices[0].delta
192 |                 if hasattr(delta, "provider_specific_fields") and delta.provider_specific_fields:
193 |                     citation_data = delta.provider_specific_fields.get("citation")
194 |                     if citation_data:
195 |                         return cls.from_dict_list([citation_data])
196 |         except Exception:
197 |             pass
198 |         return None
199 | 
200 | 
201 |     @classmethod
202 |     def parse_lm_response(cls, response: str | dict[str, Any]) -> Optional["Citations"]:
203 |         """Parse a LM response into Citations.
204 | 
205 |         Args:
206 |             response: A LM response that may contain citation data.
207 | 
208 |         Returns:
209 |             A Citations object if citation data is found, None otherwise.
210 |         """
211 |         if isinstance(response, dict):
212 |             # Check if the response contains citations in the expected format
213 |             if "citations" in response:
214 |                 citations_data = response["citations"]
215 |                 if isinstance(citations_data, list):
216 |                     return cls.from_dict_list(citations_data)
217 | 
218 |         return None
219 | 
```
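
A brief illustrative sketch (editorial, not part of the source file) of how a dict-shaped response flows through `parse_lm_response`, iteration, and `format()`, reusing the values from the `from_dict_list` docstring example above:

```python
from dspy.experimental import Citations  # import path taken from the class docstring

response = {
    "citations": [
        {
            "cited_text": "The sky is blue",
            "document_index": 0,
            "document_title": "Weather Guide",
            "start_char_index": 0,
            "end_char_index": 15,
        }
    ]
}

citations = Citations.parse_lm_response(response)  # returns a Citations object or None
if citations is not None:
    print(len(citations))           # __len__ -> 1
    for citation in citations:      # __iter__ yields Citation objects
        print(citation.format())    # dict in the citation-API shape
```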

--------------------------------------------------------------------------------
/docs/docs/tutorials/conversation_history/index.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Managing Conversation History
  2 | 
  3 | Maintaining conversation history is a fundamental requirement when building AI applications such as chatbots. While DSPy does not provide automatic conversation history management within `dspy.Module`, it offers the `dspy.History` utility to help you manage conversation history effectively.
  4 | 
  5 | ## Using `dspy.History` to Manage Conversation History
  6 | 
  7 | The `dspy.History` class can be used as an input field type, containing a `messages: list[dict[str, Any]]` attribute that stores the conversation history. Each entry in this list is a dictionary with keys corresponding to the fields defined in your signature. See the example below:
  8 | 
  9 | ```python
 10 | import dspy
 11 | import os
 12 | 
 13 | os.environ["OPENAI_API_KEY"] = "{your_openai_api_key}"
 14 | 
 15 | dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
 16 | 
 17 | class QA(dspy.Signature):
 18 |     question: str = dspy.InputField()
 19 |     history: dspy.History = dspy.InputField()
 20 |     answer: str = dspy.OutputField()
 21 | 
 22 | predict = dspy.Predict(QA)
 23 | history = dspy.History(messages=[])
 24 | 
 25 | while True:
 26 |     question = input("Type your question, end conversation by typing 'finish': ")
 27 |     if question == "finish":
 28 |         break
 29 |     outputs = predict(question=question, history=history)
 30 |     print(f"\n{outputs.answer}\n")
 31 |     history.messages.append({"question": question, **outputs})
 32 | 
 33 | dspy.inspect_history()
 34 | ```
 35 | 
 36 | There are two key steps when using the conversation history:
 37 | 
 38 | - **Include a field of type `dspy.History` in your Signature.**
 39 | - **Maintain a history instance at runtime, appending new conversation turns to it.** Each entry should include all relevant input and output field information.
 40 | 
 41 | A sample run might look like this:
 42 | 
 43 | ```
 44 | Type your question, end conversation by typing 'finish': do you know the competition between pytorch and tensorflow?
 45 | 
 46 | Yes, there is a notable competition between PyTorch and TensorFlow, which are two of the most popular deep learning frameworks. PyTorch, developed by Facebook, is known for its dynamic computation graph, which allows for more flexibility and ease of use, especially in research settings. TensorFlow, developed by Google, initially used a static computation graph but has since introduced eager execution to improve usability. TensorFlow is often favored in production environments due to its scalability and deployment capabilities. Both frameworks have strong communities and extensive libraries, and the choice between them often depends on specific project requirements and personal preference.
 47 | 
 48 | Type your question, end conversation by typing 'finish': which one won the battle? just tell me the result, don't include any reasoning, thanks!
 49 | 
 50 | There is no definitive winner; both PyTorch and TensorFlow are widely used and have their own strengths.
 51 | Type your question, end conversation by typing 'finish': finish
 52 | 
 53 | 
 54 | 
 55 | 
 56 | [2025-07-11T16:35:57.592762]
 57 | 
 58 | System message:
 59 | 
 60 | Your input fields are:
 61 | 1. `question` (str): 
 62 | 2. `history` (History):
 63 | Your output fields are:
 64 | 1. `answer` (str):
 65 | All interactions will be structured in the following way, with the appropriate values filled in.
 66 | 
 67 | [[ ## question ## ]]
 68 | {question}
 69 | 
 70 | [[ ## history ## ]]
 71 | {history}
 72 | 
 73 | [[ ## answer ## ]]
 74 | {answer}
 75 | 
 76 | [[ ## completed ## ]]
 77 | In adhering to this structure, your objective is: 
 78 |         Given the fields `question`, `history`, produce the fields `answer`.
 79 | 
 80 | 
 81 | User message:
 82 | 
 83 | [[ ## question ## ]]
 84 | do you know the competition between pytorch and tensorflow?
 85 | 
 86 | 
 87 | Assistant message:
 88 | 
 89 | [[ ## answer ## ]]
 90 | Yes, there is a notable competition between PyTorch and TensorFlow, which are two of the most popular deep learning frameworks. PyTorch, developed by Facebook, is known for its dynamic computation graph, which allows for more flexibility and ease of use, especially in research settings. TensorFlow, developed by Google, initially used a static computation graph but has since introduced eager execution to improve usability. TensorFlow is often favored in production environments due to its scalability and deployment capabilities. Both frameworks have strong communities and extensive libraries, and the choice between them often depends on specific project requirements and personal preference.
 91 | 
 92 | [[ ## completed ## ]]
 93 | 
 94 | 
 95 | User message:
 96 | 
 97 | [[ ## question ## ]]
 98 | which one won the battle? just tell me the result, don't include any reasoning, thanks!
 99 | 
100 | Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
101 | 
102 | 
103 | Response:
104 | 
105 | [[ ## answer ## ]]
106 | There is no definitive winner; both PyTorch and TensorFlow are widely used and have their own strengths.
107 | 
108 | [[ ## completed ## ]]
109 | ```
110 | 
111 | Notice how each user input and assistant response is appended to the history, allowing the model to maintain context across turns.
112 | 
113 | The actual prompt sent to the language model is a multi-turn conversation, as shown by the output of `dspy.inspect_history`. Each conversation turn is represented as a user message followed by an assistant message.
114 | 
115 | ## History in Few-shot Examples
116 | 
117 | You may notice that the user messages in the prompt above do not contain a `[[ ## history ## ]]` section, even though `history` is listed as an input field (e.g., "2. `history` (History):" in the system message): the live history is expanded into separate conversation turns instead. Few-shot examples are handled differently. When formatting few-shot examples that include conversation history, DSPy does not expand the history into multiple turns; to remain compatible with the standard OpenAI message format, each few-shot example is represented as a single turn.
118 | 
119 | For example:
120 | 
121 | ```
122 | import dspy
123 | 
124 | dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
125 | 
126 | 
127 | class QA(dspy.Signature):
128 |     question: str = dspy.InputField()
129 |     history: dspy.History = dspy.InputField()
130 |     answer: str = dspy.OutputField()
131 | 
132 | 
133 | predict = dspy.Predict(QA)
134 | history = dspy.History(messages=[])
135 | 
136 | predict.demos.append(
137 |     dspy.Example(
138 |         question="What is the capital of France?",
139 |         history=dspy.History(
140 |             messages=[{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]
141 |         ),
142 |         answer="The capital of France is Paris.",
143 |     )
144 | )
145 | 
146 | predict(question="What is the capital of America?", history=dspy.History(messages=[]))
147 | dspy.inspect_history()
148 | ```
149 | 
150 | The resulting history will look like this:
151 | 
152 | ```
153 | [2025-07-11T16:53:10.994111]
154 | 
155 | System message:
156 | 
157 | Your input fields are:
158 | 1. `question` (str): 
159 | 2. `history` (History):
160 | Your output fields are:
161 | 1. `answer` (str):
162 | All interactions will be structured in the following way, with the appropriate values filled in.
163 | 
164 | [[ ## question ## ]]
165 | {question}
166 | 
167 | [[ ## history ## ]]
168 | {history}
169 | 
170 | [[ ## answer ## ]]
171 | {answer}
172 | 
173 | [[ ## completed ## ]]
174 | In adhering to this structure, your objective is: 
175 |         Given the fields `question`, `history`, produce the fields `answer`.
176 | 
177 | 
178 | User message:
179 | 
180 | [[ ## question ## ]]
181 | What is the capital of France?
182 | 
183 | [[ ## history ## ]]
184 | {"messages": [{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]}
185 | 
186 | 
187 | Assistant message:
188 | 
189 | [[ ## answer ## ]]
190 | The capital of France is Paris.
191 | 
192 | [[ ## completed ## ]]
193 | 
194 | 
195 | User message:
196 | 
197 | [[ ## question ## ]]
198 | What is the capital of Germany?
199 | 
200 | Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.
201 | 
202 | 
203 | Response:
204 | 
205 | [[ ## answer ## ]]
206 | The capital of Germany is Berlin.
207 | 
208 | [[ ## completed ## ]]
209 | ```
210 | 
211 | As you can see, the few-shot example does not expand the conversation history into multiple turns. Instead, it represents the history as JSON data within its section:
212 | 
213 | ```
214 | [[ ## history ## ]]
215 | {"messages": [{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]}
216 | ```
217 | 
218 | This approach ensures compatibility with standard prompt formats while still providing the model with relevant conversational context.
219 | 
220 | 
```
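
To make the turn expansion described in the tutorial above concrete, here is a rough, non-literal sketch (an editorial illustration, not DSPy internals) of the message list built from `history.messages` plus the current question in the first example:

```python
# One user/assistant pair per entry in history.messages, followed by the current turn.
messages = [
    {"role": "system", "content": "<the signature instructions shown in the inspect_history output>"},
    # from history.messages[0]
    {"role": "user", "content": "[[ ## question ## ]]\ndo you know the competition between pytorch and tensorflow?"},
    {"role": "assistant", "content": "[[ ## answer ## ]]\nYes, there is a notable competition ...\n\n[[ ## completed ## ]]"},
    # the current question being asked
    {"role": "user", "content": "[[ ## question ## ]]\nwhich one won the battle? just tell me the result, don't include any reasoning, thanks!"},
]
```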

--------------------------------------------------------------------------------
/dspy/predict/program_of_thought.py:
--------------------------------------------------------------------------------

```python
  1 | import json
  2 | import logging
  3 | import re
  4 | 
  5 | import dspy
  6 | from dspy.primitives.module import Module
  7 | from dspy.primitives.python_interpreter import PythonInterpreter
  8 | from dspy.signatures.signature import Signature, ensure_signature
  9 | 
 10 | logger = logging.getLogger(__name__)
 11 | 
 12 | 
 13 | class ProgramOfThought(Module):
 14 |     """
 15 |     A DSPy module that runs Python programs to solve a problem.
 16 |     This module requires deno to be installed. Please install deno following https://docs.deno.com/runtime/getting_started/installation/
 17 | 
 18 |     Example:
 19 |     ```
 20 |     import dspy
 21 | 
 22 |     lm = dspy.LM('openai/gpt-4o-mini')
 23 |     dspy.configure(lm=lm)
 24 |     pot = dspy.ProgramOfThought("question -> answer")
 25 |     pot(question="what is 1+1?")
 26 |     ```
 27 |     """
 28 | 
 29 |     def __init__(self, signature: str | type[Signature], max_iters: int = 3, interpreter: PythonInterpreter | None = None):
 30 |         """
 31 |         Args:
 32 |             signature: The signature of the module.
 33 |             max_iters: The maximum number of iterations to retry code generation and execution.
 34 |             interpreter: PythonInterpreter instance to use. If None, a new one is instantiated.
 35 |         """
 36 |         super().__init__()
 37 |         self.signature = signature = ensure_signature(signature)
 38 |         self.max_iters = max_iters
 39 | 
 40 |         self.input_fields = signature.input_fields
 41 |         self.output_fields = signature.output_fields
 42 | 
 43 |         self.code_generate = dspy.ChainOfThought(
 44 |             dspy.Signature(
 45 |                 self._generate_signature("generate").fields,
 46 |                 self._generate_instruction("generate"),
 47 |             ),
 48 |         )
 49 |         self.code_regenerate = dspy.ChainOfThought(
 50 |             dspy.Signature(
 51 |                 self._generate_signature("regenerate").fields,
 52 |                 self._generate_instruction("regenerate"),
 53 |             ),
 54 |         )
 55 |         self.generate_answer = dspy.ChainOfThought(
 56 |             dspy.Signature(
 57 |                 self._generate_signature("answer").fields,
 58 |                 self._generate_instruction("answer"),
 59 |             ),
 60 |         )
 61 |         # Instantiating PythonInterpreter raises an exception here if dspy cannot find an available Deno instance.
 62 |         self.interpreter = interpreter or PythonInterpreter()
 63 | 
 64 |     def _generate_signature(self, mode):
 65 |         signature_dict = dict(self.input_fields)
 66 |         fields_for_mode = {
 67 |             "generate": {
 68 |                 "generated_code": dspy.OutputField(
 69 |                     prefix="Code:",
 70 |                     desc="python code that answers the question",
 71 |                     format=str,
 72 |                 ),
 73 |             },
 74 |             "regenerate": {
 75 |                 "previous_code": dspy.InputField(
 76 |                     prefix="Previous Code:",
 77 |                     desc="previously-generated python code that errored",
 78 |                     format=str,
 79 |                 ),
 80 |                 "error": dspy.InputField(
 81 |                     prefix="Error:",
 82 |                     desc="error message from previously-generated python code",
 83 |                 ),
 84 |                 "generated_code": dspy.OutputField(
 85 |                     prefix="Code:",
 86 |                     desc="python code that answers the question",
 87 |                     format=str,
 88 |                 ),
 89 |             },
 90 |             "answer": {
 91 |                 "final_generated_code": dspy.InputField(
 92 |                     prefix="Code:",
 93 |                     desc="python code that answers the question",
 94 |                     format=str,
 95 |                 ),
 96 |                 "code_output": dspy.InputField(
 97 |                     prefix="Code Output:",
 98 |                     desc="output of previously-generated python code",
 99 |                 ),
100 |             }
101 |             | self.signature.output_fields,
102 |         }
103 |         signature_dict.update(fields_for_mode[mode])
104 |         return dspy.Signature(signature_dict)
105 | 
106 |     def _generate_instruction(self, mode):
107 |         mode_inputs = ", ".join(
108 |             [f"`{field_name}`" for field_name in self._generate_signature(mode).input_fields],
109 |         )
110 |         mode_outputs = ", ".join(
111 |             [f"`{field_name}`" for field_name in self._generate_signature(mode).output_fields],
112 |         )
113 |         final_outputs = ", ".join(
114 |             [f"`{field_name}`" for field_name in self.output_fields],
115 |         )
116 |         if mode == "generate":
117 |             instr = [
118 |                 f"You will be given {mode_inputs} and you will respond with {mode_outputs}.",
119 |                 f"Generate executable Python code that programmatically computes the correct {mode_outputs}.",
120 |                 "After you're done with the computation and think you have the answer, make sure to provide your answer by calling the preloaded function `final_answer()`.",
121 |                 f'You should structure your answer as a dict object, like {{"field_a": answer_a, ...}}, that evaluates to the correct value mapping for {final_outputs}.',
122 |             ]
123 |         elif mode == "regenerate":
124 |             instr = [
125 |                 f"You are given {mode_inputs} due to an error in previous code.",
126 |                 "Your task is to correct the error and provide the new `generated_code`.",
127 |             ]
128 |         else:  # mode == 'answer'
129 |             instr = [
130 |                 f"Given the final code {mode_inputs}, provide the final {mode_outputs}.",
131 |             ]
132 | 
133 |         return "\n".join(instr)
134 | 
135 |     def _parse_code(self, code_data):
136 |         code = code_data.get("generated_code", "").split("---", 1)[0].split("\n\n\n", 1)[0]
137 |         code_match = re.search(r"```python[ \n](.*?)[ \n]```?", code, re.DOTALL)
138 |         code_block = (code_match.group(1) if code_match else code).replace("\\n", "\n")
139 |         if not code_block:
140 |             return code, "Error: Empty code after parsing."
141 |         if "\n" not in code_block and code_block.count("=") > 1:
142 |             return code, "Error: Code format is not correct."
143 |         lines = code_block.split("\n")
144 |         last_line_match = re.match(r"^(\w+)\s*=", lines[-1].strip())
145 |         if last_line_match and len(lines) > 1:
146 |             code_block += "\n" + last_line_match.group(1)
147 |         else:
148 |             code_block = re.sub(
149 |                 r"([a-zA-Z_]\w* *=.*?)(?=[a-zA-Z_]\w* *=)",
150 |                 r"\1\n",
151 |                 code_block,
152 |             )
153 |             code_block = re.sub(
154 |                 r"([a-zA-Z_]\w* *=.*?)([a-zA-Z_]\w*)$",
155 |                 r"\1\n\2",
156 |                 code_block,
157 |             )
158 |         return code_block, None
159 | 
160 |     def _execute_code(self, code):
161 |         """
162 |         Execute the code using PythonInterpreter and return the output or error.
163 |         """
164 |         if not code:
165 |             return None, "Error: Empty code before execution."
166 | 
167 |         try:
168 |             # The output can be a more complex structure now, so serialize it all to JSON.
169 |             output = json.dumps(self.interpreter.execute(code))
170 |             return output, None
171 |         except Exception as e:
172 |             return None, str(e)
173 | 
174 |     def forward(self, **kwargs):
175 |         input_kwargs = {field_name: kwargs[field_name] for field_name in self.input_fields}
176 |         code_data = self.code_generate(**input_kwargs)
177 |         output = None
178 |         code, error = self._parse_code(code_data)
179 |         if not error:
180 |             output, error = self._execute_code(code)
181 |         hop = 1
182 |         # Retry code generation and execution until there is no error or max_iters is reached
183 |         while error is not None:
184 |             logger.error(f"Error in code execution: {error}")
185 |             if hop == self.max_iters:
186 |                 self.interpreter.shutdown()
187 |                 raise RuntimeError(f"Max hops reached. Failed to run ProgramOfThought: {error}")
188 |             input_kwargs.update({"previous_code": code, "error": error})
189 |             code_data = self.code_regenerate(**input_kwargs)
190 |             code, error = self._parse_code(code_data)
191 |             if not error:
192 |                 output, error = self._execute_code(code)
193 |             hop += 1
194 |         input_kwargs.update({"final_generated_code": code, "code_output": output})
195 |         answer_gen_result = self.generate_answer(**input_kwargs)
196 |         self.interpreter.shutdown()
197 |         return answer_gen_result
198 | 
```
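
Building on the class docstring above, here is a small usage sketch showing the optional `max_iters` and `interpreter` arguments (the model name is illustrative, and Deno must be installed as noted):

```python
import dspy
from dspy.primitives.python_interpreter import PythonInterpreter

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Allow up to 5 generate/execute repair attempts and supply an explicit
# interpreter instance instead of letting the module create one.
pot = dspy.ProgramOfThought(
    "question -> answer",
    max_iters=5,
    interpreter=PythonInterpreter(),
)
result = pot(question="What is the sum of the integers from 1 to 10?")
print(result.answer)
```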

--------------------------------------------------------------------------------
/dspy/adapters/types/image.py:
--------------------------------------------------------------------------------

```python
  1 | import base64
  2 | import io
  3 | import mimetypes
  4 | import os
  5 | import warnings
  6 | from functools import lru_cache
  7 | from typing import Any, Union
  8 | from urllib.parse import urlparse
  9 | 
 10 | import pydantic
 11 | import requests
 12 | 
 13 | from dspy.adapters.types.base_type import Type
 14 | 
 15 | try:
 16 |     from PIL import Image as PILImage
 17 | 
 18 |     PIL_AVAILABLE = True
 19 | except ImportError:
 20 |     PIL_AVAILABLE = False
 21 | 
 22 | 
 23 | class Image(Type):
 24 |     url: str
 25 | 
 26 |     model_config = pydantic.ConfigDict(
 27 |         frozen=True,
 28 |         str_strip_whitespace=True,
 29 |         validate_assignment=True,
 30 |         extra="forbid",
 31 |     )
 32 | 
 33 |     def __init__(self, url: Any = None, *, download: bool = False, **data):
 34 |         """Create an Image.
 35 | 
 36 |         Parameters
 37 |         ----------
 38 |         url:
 39 |             The image source. Supported values include
 40 | 
 41 |             - ``str``: HTTP(S)/GS URL or local file path
 42 |             - ``bytes``: raw image bytes
 43 |             - ``PIL.Image.Image``: a PIL image instance
 44 |             - ``dict`` with a single ``{"url": value}`` entry (legacy form)
 45 |             - already encoded data URI
 46 | 
 47 |         download:
 48 |             Whether remote URLs should be downloaded to infer their MIME type.
 49 | 
 50 |         Any additional keyword arguments are passed to :class:`pydantic.BaseModel`.
 51 |         """
 52 | 
 53 |         if url is not None and "url" not in data:
 54 |             # Support a positional argument while allowing ``url=`` in **data.
 55 |             if isinstance(url, dict) and set(url.keys()) == {"url"}:
 56 |                 # Legacy dict form from previous model validator.
 57 |                 data["url"] = url["url"]
 58 |             else:
 59 |                 # ``url`` may be a string, bytes, or a PIL image.
 60 |                 data["url"] = url
 61 | 
 62 |         if "url" in data:
 63 |             # Normalize any accepted input into a base64 data URI or plain URL.
 64 |             data["url"] = encode_image(data["url"], download_images=download)
 65 | 
 66 |         # Delegate the rest of initialization to pydantic's BaseModel.
 67 |         super().__init__(**data)
 68 | 
 69 |     @lru_cache(maxsize=32)
 70 |     def format(self) -> list[dict[str, Any]] | str:
 71 |         try:
 72 |             image_url = encode_image(self.url)
 73 |         except Exception as e:
 74 |             raise ValueError(f"Failed to format image for DSPy: {e}")
 75 |         return [{"type": "image_url", "image_url": {"url": image_url}}]
 76 | 
 77 |     @classmethod
 78 |     def from_url(cls, url: str, download: bool = False):
 79 |         warnings.warn(
 80 |             "Image.from_url is deprecated; use Image(url) instead.",
 81 |             DeprecationWarning,
 82 |             stacklevel=2,
 83 |         )
 84 |         return cls(url, download=download)
 85 | 
 86 |     @classmethod
 87 |     def from_file(cls, file_path: str):
 88 |         warnings.warn(
 89 |             "Image.from_file is deprecated; use Image(file_path) instead.",
 90 |             DeprecationWarning,
 91 |             stacklevel=2,
 92 |         )
 93 |         return cls(file_path)
 94 | 
 95 |     @classmethod
 96 |     def from_PIL(cls, pil_image):  # noqa: N802
 97 |         warnings.warn(
 98 |             "Image.from_PIL is deprecated; use Image(pil_image) instead.",
 99 |             DeprecationWarning,
100 |             stacklevel=2,
101 |         )
102 |         return cls(pil_image)
103 | 
104 |     def __str__(self):
105 |         return self.serialize_model()
106 | 
107 |     def __repr__(self):
108 |         if "base64" in self.url:
109 |             len_base64 = len(self.url.split("base64,")[1])
110 |             image_type = self.url.split(";")[0].split("/")[-1]
111 |             return f"Image(url=data:image/{image_type};base64,<IMAGE_BASE_64_ENCODED({len_base64!s})>)"
112 |         return f"Image(url='{self.url}')"
113 | 
114 | 
115 | def is_url(string: str) -> bool:
116 |     """Check if a string is a valid URL."""
117 |     try:
118 |         result = urlparse(string)
119 |         return all([result.scheme in ("http", "https", "gs"), result.netloc])
120 |     except ValueError:
121 |         return False
122 | 
123 | 
124 | def encode_image(image: Union[str, bytes, "PILImage.Image", dict], download_images: bool = False) -> str:
125 |     """
126 |     Encode an image or file to a base64 data URI.
127 | 
128 |     Args:
129 |         image: The image or file to encode. Can be a PIL Image, file path, URL, or data URI.
130 |         download_images: Whether to download images from URLs.
131 | 
132 |     Returns:
133 |         str: The data URI of the file or the URL if download_images is False.
134 | 
135 |     Raises:
136 |         ValueError: If the file type is not supported.
137 |     """
138 |     if isinstance(image, dict) and "url" in image:
139 |         # NOTE: Not doing other validation for now
140 |         return image["url"]
141 |     elif isinstance(image, str):
142 |         if image.startswith("data:"):
143 |             # Already a data URI
144 |             return image
145 |         elif os.path.isfile(image):
146 |             # File path
147 |             return _encode_image_from_file(image)
148 |         elif is_url(image):
149 |             # URL
150 |             if download_images:
151 |                 return _encode_image_from_url(image)
152 |             else:
153 |                 # Return the URL as is
154 |                 return image
155 |         else:
156 |             # Unsupported string format
157 |             raise ValueError(f"Unrecognized file string: {image}; If this file type should be supported, please open an issue.")
158 |     elif PIL_AVAILABLE and isinstance(image, PILImage.Image):
159 |         # PIL Image
160 |         return _encode_pil_image(image)
161 |     elif isinstance(image, bytes):
162 |         # Raw bytes
163 |         if not PIL_AVAILABLE:
164 |             raise ImportError("Pillow is required to process image bytes.")
165 |         img = PILImage.open(io.BytesIO(image))
166 |         return _encode_pil_image(img)
167 |     elif isinstance(image, Image):
168 |         return image.url
169 |     else:
170 |         print(f"Unsupported image type: {type(image)}")
171 |         raise ValueError(f"Unsupported image type: {type(image)}")
172 | 
173 | 
174 | def _encode_image_from_file(file_path: str) -> str:
175 |     """Encode a file from a file path to a base64 data URI."""
176 |     with open(file_path, "rb") as file:
177 |         file_data = file.read()
178 | 
179 |     # Use mimetypes to guess directly from the file path
180 |     mime_type, _ = mimetypes.guess_type(file_path)
181 |     if mime_type is None:
182 |         raise ValueError(f"Could not determine MIME type for file: {file_path}")
183 | 
184 |     encoded_data = base64.b64encode(file_data).decode("utf-8")
185 |     return f"data:{mime_type};base64,{encoded_data}"
186 | 
187 | 
188 | def _encode_image_from_url(image_url: str) -> str:
189 |     """Encode a file from a URL to a base64 data URI."""
190 |     response = requests.get(image_url)
191 |     response.raise_for_status()
192 |     content_type = response.headers.get("Content-Type", "")
193 | 
194 |     # Use the content type from the response headers if available
195 |     if content_type:
196 |         mime_type = content_type
197 |     else:
198 |         # Try to guess MIME type from URL
199 |         mime_type, _ = mimetypes.guess_type(image_url)
200 |         if mime_type is None:
201 |             raise ValueError(f"Could not determine MIME type for URL: {image_url}")
202 | 
203 |     encoded_data = base64.b64encode(response.content).decode("utf-8")
204 |     return f"data:{mime_type};base64,{encoded_data}"
205 | 
206 | 
207 | def _encode_pil_image(image: "PILImage.Image") -> str:
208 |     """Encode a PIL Image object to a base64 data URI."""
209 |     buffered = io.BytesIO()
210 |     file_format = image.format or "PNG"
211 |     image.save(buffered, format=file_format)
212 | 
213 |     # Get the correct MIME type using the image format
214 |     file_extension = file_format.lower()
215 |     mime_type, _ = mimetypes.guess_type(f"file.{file_extension}")
216 |     if mime_type is None:
217 |         raise ValueError(f"Could not determine MIME type for image format: {file_format}")
218 | 
219 |     encoded_data = base64.b64encode(buffered.getvalue()).decode("utf-8")
220 |     return f"data:{mime_type};base64,{encoded_data}"
221 | 
222 | 
223 | def _get_file_extension(path_or_url: str) -> str:
224 |     """Extract the file extension from a file path or URL."""
225 |     extension = os.path.splitext(urlparse(path_or_url).path)[1].lstrip(".").lower()
226 |     return extension or "png"  # Default to 'png' if no extension found
227 | 
228 | 
229 | def is_image(obj) -> bool:
230 |     """Check if the object is an image or a valid media file reference."""
231 |     if PIL_AVAILABLE and isinstance(obj, PILImage.Image):
232 |         return True
233 |     if isinstance(obj, str):
234 |         if obj.startswith("data:"):
235 |             return True
236 |         elif os.path.isfile(obj):
237 |             return True
238 |         elif is_url(obj):
239 |             return True
240 |     return False
241 | 
```
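
A brief sketch of the constructor paths described in the docstring above (the URL and file paths are placeholders):

```python
import dspy

# Remote URL: stored as-is by default; pass download=True to fetch the
# bytes and embed them as a base64 data URI instead.
remote = dspy.Image("https://example.com/cat.png")

# Local file path: read and encoded to a data URI based on its MIME type.
local = dspy.Image("photos/cat.jpg")

# A PIL image or raw bytes also work when Pillow is installed, e.g.:
#   from PIL import Image as PILImage
#   pil = dspy.Image(PILImage.open("photos/cat.jpg"))

print(repr(remote))  # Image(url='https://example.com/cat.png')
print(repr(local))   # base64 payload is abbreviated in the repr
```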

--------------------------------------------------------------------------------
/dspy/primitives/module.py:
--------------------------------------------------------------------------------

```python
  1 | import inspect
  2 | import logging
  3 | from typing import Any
  4 | 
  5 | import magicattr
  6 | 
  7 | from dspy.dsp.utils.settings import settings, thread_local_overrides
  8 | from dspy.predict.parallel import Parallel
  9 | from dspy.primitives.base_module import BaseModule
 10 | from dspy.primitives.example import Example
 11 | from dspy.primitives.prediction import Prediction
 12 | from dspy.utils.callback import with_callbacks
 13 | from dspy.utils.inspect_history import pretty_print_history
 14 | from dspy.utils.usage_tracker import track_usage
 15 | 
 16 | logger = logging.getLogger(__name__)
 17 | 
 18 | 
 19 | class ProgramMeta(type):
 20 |     """Metaclass ensuring every ``dspy.Module`` instance is properly initialised."""
 21 | 
 22 |     def __call__(cls, *args, **kwargs):
 23 |         # Create the instance without invoking ``__init__`` so we can inject
 24 |         # the base initialization beforehand.
 25 |         obj = cls.__new__(cls, *args, **kwargs)
 26 |         if isinstance(obj, cls):
 27 |             # ``_base_init`` sets attributes that should exist on all modules
 28 |             # even when a subclass forgets to call ``super().__init__``.
 29 |             Module._base_init(obj)
 30 |             cls.__init__(obj, *args, **kwargs)
 31 | 
 32 |             # Guarantee existence of critical attributes if ``__init__`` didn't
 33 |             # create them.
 34 |             if not hasattr(obj, "callbacks"):
 35 |                 obj.callbacks = []
 36 |             if not hasattr(obj, "history"):
 37 |                 obj.history = []
 38 |         return obj
 39 | 
 40 | 
 41 | class Module(BaseModule, metaclass=ProgramMeta):
 42 |     def _base_init(self):
 43 |         self._compiled = False
 44 |         self.callbacks = []
 45 |         self.history = []
 46 | 
 47 |     def __init__(self, callbacks=None):
 48 |         self.callbacks = callbacks or []
 49 |         self._compiled = False
 50 |         # LM calling history of the module.
 51 |         self.history = []
 52 | 
 53 |     def __getstate__(self):
 54 |         state = self.__dict__.copy()
 55 |         state.pop("history", None)
 56 |         state.pop("callbacks", None)
 57 |         return state
 58 | 
 59 |     def __setstate__(self, state):
 60 |         self.__dict__.update(state)
 61 |         if not hasattr(self, "history"):
 62 |             self.history = []
 63 |         if not hasattr(self, "callbacks"):
 64 |             self.callbacks = []
 65 | 
 66 |     @with_callbacks
 67 |     def __call__(self, *args, **kwargs) -> Prediction:
 68 |         caller_modules = settings.caller_modules or []
 69 |         caller_modules = list(caller_modules)
 70 |         caller_modules.append(self)
 71 | 
 72 |         with settings.context(caller_modules=caller_modules):
 73 |             if settings.track_usage and thread_local_overrides.get().get("usage_tracker") is None:
 74 |                 with track_usage() as usage_tracker:
 75 |                     output = self.forward(*args, **kwargs)
 76 |                 tokens = usage_tracker.get_total_tokens()
 77 |                 self._set_lm_usage(tokens, output)
 78 | 
 79 |                 return output
 80 | 
 81 |             return self.forward(*args, **kwargs)
 82 | 
 83 |     @with_callbacks
 84 |     async def acall(self, *args, **kwargs) -> Prediction:
 85 |         caller_modules = settings.caller_modules or []
 86 |         caller_modules = list(caller_modules)
 87 |         caller_modules.append(self)
 88 | 
 89 |         with settings.context(caller_modules=caller_modules):
 90 |             if settings.track_usage and thread_local_overrides.get().get("usage_tracker") is None:
 91 |                 with track_usage() as usage_tracker:
 92 |                     output = await self.aforward(*args, **kwargs)
 93 |                     tokens = usage_tracker.get_total_tokens()
 94 |                     self._set_lm_usage(tokens, output)
 95 | 
 96 |                     return output
 97 | 
 98 |             return await self.aforward(*args, **kwargs)
 99 | 
100 |     def named_predictors(self):
101 |         from dspy.predict.predict import Predict
102 | 
103 |         return [(name, param) for name, param in self.named_parameters() if isinstance(param, Predict)]
104 | 
105 |     def predictors(self):
106 |         return [param for _, param in self.named_predictors()]
107 | 
108 |     def set_lm(self, lm):
109 |         for _, param in self.named_predictors():
110 |             param.lm = lm
111 | 
112 |     def get_lm(self):
113 |         all_used_lms = [param.lm for _, param in self.named_predictors()]
114 | 
115 |         if len(set(all_used_lms)) == 1:
116 |             return all_used_lms[0]
117 | 
118 |         raise ValueError("Multiple LMs are being used in the module. There's no unique LM to return.")
119 | 
120 |     def __repr__(self):
121 |         s = []
122 | 
123 |         for name, param in self.named_predictors():
124 |             s.append(f"{name} = {param}")
125 | 
126 |         return "\n".join(s)
127 | 
128 |     def map_named_predictors(self, func):
129 |         """Applies a function to all named predictors."""
130 |         for name, predictor in self.named_predictors():
131 |             set_attribute_by_name(self, name, func(predictor))
132 |         return self
133 | 
134 |     def inspect_history(self, n: int = 1):
135 |         return pretty_print_history(self.history, n)
136 | 
137 |     def batch(
138 |         self,
139 |         examples: list[Example],
140 |         num_threads: int | None = None,
141 |         max_errors: int | None = None,
142 |         return_failed_examples: bool = False,
143 |         provide_traceback: bool | None = None,
144 |         disable_progress_bar: bool = False,
145 |     ) -> list[Example] | tuple[list[Example], list[Example], list[Exception]]:
146 |         """
147 |         Processes a list of dspy.Example instances in parallel using the Parallel module.
148 | 
149 |         Args:
150 |             examples: List of dspy.Example instances to process.
151 |             num_threads: Number of threads to use for parallel processing.
152 |             max_errors: Maximum number of errors allowed before stopping execution.
153 |                 If ``None``, inherits from ``dspy.settings.max_errors``.
154 |             return_failed_examples: Whether to return failed examples and exceptions.
155 |             provide_traceback: Whether to include traceback information in error logs.
156 |             disable_progress_bar: Whether to display the progress bar.
157 | 
158 |         Returns:
159 |             List of results, and optionally failed examples and exceptions.
160 |         """
161 |         # Create a list of execution pairs (self, example)
162 |         exec_pairs = [(self, example.inputs()) for example in examples]
163 | 
164 |         # Create an instance of Parallel
165 |         parallel_executor = Parallel(
166 |             num_threads=num_threads,
167 |             max_errors=max_errors,
168 |             return_failed_examples=return_failed_examples,
169 |             provide_traceback=provide_traceback,
170 |             disable_progress_bar=disable_progress_bar,
171 |         )
172 | 
173 |         # Execute the forward method of Parallel
174 |         if return_failed_examples:
175 |             results, failed_examples, exceptions = parallel_executor.forward(exec_pairs)
176 |             return results, failed_examples, exceptions
177 |         else:
178 |             results = parallel_executor.forward(exec_pairs)
179 |             return results
180 | 
181 |     def _set_lm_usage(self, tokens: dict[str, Any], output: Any):
182 |         # Some optimizers (e.g., GEPA bootstrap tracing) temporarily patch
183 |         # module.forward to return a tuple: (prediction, trace).
184 |         # When usage tracking is enabled, ensure we attach usage to the
185 |         # prediction object if present.
186 |         prediction_in_output = None
187 |         if isinstance(output, Prediction):
188 |             prediction_in_output = output
189 |         elif isinstance(output, tuple) and len(output) > 0 and isinstance(output[0], Prediction):
190 |             prediction_in_output = output[0]
191 |         if prediction_in_output:
192 |             prediction_in_output.set_lm_usage(tokens)
193 |         else:
194 |             logger.warning("Failed to set LM usage. Please return `dspy.Prediction` object from dspy.Module to enable usage tracking.")
195 | 
196 | 
197 |     def __getattribute__(self, name):
198 |         attr = super().__getattribute__(name)
199 | 
200 |         if name == "forward" and callable(attr):
201 |             # Check if forward is called through __call__ or directly
202 |             stack = inspect.stack()
203 |             forward_called_directly = len(stack) <= 1 or stack[1].function != "__call__"
204 | 
205 |             if forward_called_directly:
206 |                 logger.warning(
207 |                     f"Calling module.forward(...) on {self.__class__.__name__} directly is discouraged. "
208 |                     f"Please use module(...) instead."
209 |                 )
210 | 
211 |         return attr
212 | 
213 | 
214 | def set_attribute_by_name(obj, name, value):
215 |     magicattr.set(obj, name, value)
216 | 
```
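
As a quick illustration of the `batch` API above, a sketch under the assumption that an LM has already been configured (the model name and questions are placeholders):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # assumed LM

qa = dspy.Predict("question -> answer")
examples = [
    dspy.Example(question="What is 1+1?").with_inputs("question"),
    dspy.Example(question="What is 2+2?").with_inputs("question"),
]

# Run the module over all examples in parallel threads; optionally also
# return the examples that failed together with their exceptions.
results, failed, exceptions = qa.batch(
    examples,
    num_threads=4,
    return_failed_examples=True,
)
```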

--------------------------------------------------------------------------------
/dspy/dsp/colbertv2.py:
--------------------------------------------------------------------------------

```python
  1 | from typing import Any
  2 | 
  3 | import requests
  4 | 
  5 | from dspy.clients.cache import request_cache
  6 | from dspy.dsp.utils import dotdict
  7 | 
  8 | # TODO: Ideally, this takes the name of the index and looks up its port.
  9 | 
 10 | 
 11 | class ColBERTv2:
 12 |     """Wrapper for the ColBERTv2 Retrieval."""
 13 | 
 14 |     def __init__(
 15 |         self,
 16 |         url: str = "http://0.0.0.0",
 17 |         port: str | int | None = None,
 18 |         post_requests: bool = False,
 19 |     ):
 20 |         self.post_requests = post_requests
 21 |         self.url = f"{url}:{port}" if port else url
 22 | 
 23 |     def __call__(
 24 |         self,
 25 |         query: str,
 26 |         k: int = 10,
 27 |         simplify: bool = False,
 28 |     ) -> list[str] | list[dotdict]:
 29 |         if self.post_requests:
 30 |             topk: list[dict[str, Any]] = colbertv2_post_request(self.url, query, k)
 31 |         else:
 32 |             topk: list[dict[str, Any]] = colbertv2_get_request(self.url, query, k)
 33 | 
 34 |         if simplify:
 35 |             return [psg["long_text"] for psg in topk]
 36 | 
 37 |         return [dotdict(psg) for psg in topk]
 38 | 
 39 | 
 40 | @request_cache()
 41 | def colbertv2_get_request_v2(url: str, query: str, k: int):
 42 |     assert k <= 100, "Only k <= 100 is supported for the hosted ColBERTv2 server at the moment."
 43 | 
 44 |     payload = {"query": query, "k": k}
 45 |     res = requests.get(url, params=payload, timeout=10)
 46 | 
 47 |     topk = res.json()["topk"][:k]
 48 |     topk = [{**d, "long_text": d["text"]} for d in topk]
 49 |     return topk[:k]
 50 | 
 51 | 
 52 | @request_cache()
 53 | def colbertv2_get_request_v2_wrapped(*args, **kwargs):
 54 |     return colbertv2_get_request_v2(*args, **kwargs)
 55 | 
 56 | 
 57 | colbertv2_get_request = colbertv2_get_request_v2_wrapped
 58 | 
 59 | 
 60 | @request_cache()
 61 | def colbertv2_post_request_v2(url: str, query: str, k: int):
 62 |     headers = {"Content-Type": "application/json; charset=utf-8"}
 63 |     payload = {"query": query, "k": k}
 64 |     res = requests.post(url, json=payload, headers=headers, timeout=10)
 65 | 
 66 |     return res.json()["topk"][:k]
 67 | 
 68 | 
 69 | @request_cache()
 70 | def colbertv2_post_request_v2_wrapped(*args, **kwargs):
 71 |     return colbertv2_post_request_v2(*args, **kwargs)
 72 | 
 73 | 
 74 | colbertv2_post_request = colbertv2_post_request_v2_wrapped
 75 | 
 76 | 
 77 | class ColBERTv2RetrieverLocal:
 78 |     def __init__(self, passages: list[str], colbert_config=None, load_only: bool = False):
 79 |         """Colbertv2 retriever module
 80 | 
 81 |         Args:
 82 |             passages (list[str]): list of passages
 83 |             colbert_config (ColBERTConfig, optional): colbert config for building and searching. Defaults to None.
 84 |             load_only (bool, optional): whether to load the index or build and then load. Defaults to False.
 85 |         """
 86 |         assert (
 87 |             colbert_config is not None
 88 |         ), "Please pass a valid colbert_config, which you can import from colbert.infra.config import ColBERTConfig and modify it"
 89 |         self.colbert_config = colbert_config
 90 | 
 91 |         assert (
 92 |             self.colbert_config.checkpoint is not None
 93 |         ), "Please pass a valid checkpoint like colbert-ir/colbertv2.0, which you can modify in the ColBERTConfig with attribute name checkpoint"
 94 |         self.passages = passages
 95 | 
 96 |         assert (
 97 |             self.colbert_config.index_name is not None
 98 |         ), "Please pass a valid index_name, which you can modify in the ColBERTConfig with attribute name index_name"
 99 |         self.passages = passages
100 | 
101 |         if not load_only:
102 |             print(
103 |                 f"Building the index for experiment {self.colbert_config.experiment} with index name "
104 |                 f"{self.colbert_config.index_name}"
105 |             )
106 |             self.build_index()
107 | 
108 |         print(
109 |             f"Loading the index for experiment {self.colbert_config.experiment} with index name "
110 |             f"{self.colbert_config.index_name}"
111 |         )
112 |         self.searcher = self.get_index()
113 | 
114 |     def build_index(self):
115 |         try:
116 |             import colbert  # noqa: F401
117 |         except ImportError:
118 |             print(
119 |                 "Colbert not found. Please check your installation or install the module using pip install "
120 |                 "colbert-ai[faiss-gpu,torch]."
121 |             )
122 | 
123 |         from colbert import Indexer
124 |         from colbert.infra import Run, RunConfig
125 | 
126 |         with Run().context(RunConfig(nranks=self.colbert_config.nranks, experiment=self.colbert_config.experiment)):
127 |             indexer = Indexer(checkpoint=self.colbert_config.checkpoint, config=self.colbert_config)
128 |             indexer.index(name=self.colbert_config.index_name, collection=self.passages, overwrite=True)
129 | 
130 |     def get_index(self):
131 |         try:
132 |             import colbert  # noqa: F401
133 |         except ImportError:
134 |             print(
135 |                 "Colbert not found. Please check your installation or install the module using pip install "
136 |                 "colbert-ai[faiss-gpu,torch]."
137 |             )
138 | 
139 |         from colbert import Searcher
140 |         from colbert.infra import Run, RunConfig
141 | 
142 |         with Run().context(RunConfig(experiment=self.colbert_config.experiment)):
143 |             searcher = Searcher(index=self.colbert_config.index_name, collection=self.passages)
144 |         return searcher
145 | 
146 |     def __call__(self, *args: Any, **kwargs: Any) -> Any:
147 |         return self.forward(*args, **kwargs)
148 | 
149 |     def forward(self, query: str, k: int = 7, **kwargs):
150 |         import torch
151 | 
152 |         if kwargs.get("filtered_pids"):
153 |             filtered_pids = kwargs.get("filtered_pids")
154 |             assert isinstance(filtered_pids, list) and all(isinstance(pid, int) for pid in filtered_pids), "The filtered pids should be a list of integers"
155 |             device = "cuda" if torch.cuda.is_available() else "cpu"
156 |             searcher_results = self.searcher.search(
157 |                 query,
158 |                 # Number of passages to retrieve
159 |                 k=k,
160 |                 # Filter function that keeps only the allowed passage ids
161 |                 filter_fn=lambda pids: torch.tensor(
162 |                     [pid for pid in pids if pid in filtered_pids], dtype=torch.int32
163 |                 ).to(device),
164 |             )
165 |         else:
166 |             searcher_results = self.searcher.search(query, k=k)
167 |         results = []
168 |         for pid, rank, score in zip(*searcher_results, strict=False):  # noqa: B007
169 |             results.append(dotdict({"long_text": self.searcher.collection[pid], "score": score, "pid": pid}))
170 |         return results
171 | 
172 | 
173 | class ColBERTv2RerankerLocal:
174 |     def __init__(self, colbert_config=None, checkpoint: str = "bert-base-uncased"):
175 |         """ColBERTv2 reranker module.
176 | 
177 |         Args:
178 |             colbert_config (ColBERTConfig, optional): ColBERT config. Defaults to None.
179 |             checkpoint (str, optional): checkpoint for embeddings. Defaults to 'bert-base-uncased'.
180 |         """
181 |         try:
182 |             import colbert  # noqa: F401
183 |         except ImportError:
184 |             print(
185 |                 "Colbert not found. Please check your installation or install the module using pip install "
186 |                 "colbert-ai[faiss-gpu,torch]."
187 |             )
188 |         self.colbert_config = colbert_config
189 |         self.checkpoint = checkpoint
190 |         self.colbert_config.checkpoint = checkpoint
191 | 
192 |     def __call__(self, *args: Any, **kwargs: Any) -> Any:
193 |         return self.forward(*args, **kwargs)
194 | 
195 |     def forward(self, query: str, passages: list[str] | None = None):
196 |         assert len(passages) > 0, "Passages should not be empty"
197 | 
198 |         import numpy as np
199 |         from colbert.modeling.colbert import ColBERT
200 |         from colbert.modeling.tokenization.doc_tokenization import DocTokenizer
201 |         from colbert.modeling.tokenization.query_tokenization import QueryTokenizer
202 | 
203 |         passages = passages or []
204 |         self.colbert_config.nway = len(passages)
205 |         query_tokenizer = QueryTokenizer(self.colbert_config, verbose=1)
206 |         doc_tokenizer = DocTokenizer(self.colbert_config)
207 |         query_ids, query_masks = query_tokenizer.tensorize([query])
208 |         doc_ids, doc_masks = doc_tokenizer.tensorize(passages)
209 | 
210 |         col = ColBERT(self.checkpoint, self.colbert_config)
211 |         q = col.query(query_ids, query_masks)
212 |         doc_ids, doc_masks = col.doc(doc_ids, doc_masks, keep_dims="return_mask")
213 |         q_duplicated = q.repeat_interleave(len(passages), dim=0).contiguous()
214 |         tensor_scores = col.score(q_duplicated, doc_ids, doc_masks)
215 |         passage_score_arr = np.array([score.cpu().detach().numpy().tolist() for score in tensor_scores])
216 |         return passage_score_arr
217 | 
```
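
A short usage sketch of the hosted-server wrapper above (the URL and port are placeholders for your own ColBERTv2 endpoint):

```python
import dspy

# Point this at a running ColBERTv2 server; use post_requests=True if the
# server expects POST instead of GET.
retriever = dspy.ColBERTv2(url="http://0.0.0.0", port=8893)

# simplify=True returns plain passage strings; otherwise each result is a
# dotdict with fields such as `long_text` from the server response.
for passage in retriever("Who wrote Hamlet?", k=3, simplify=True):
    print(passage)
```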

--------------------------------------------------------------------------------
/tests/primitives/resources/saved_program.json:
--------------------------------------------------------------------------------

```json
 1 | {
 2 |     "react": {
 3 |       "traces": [],
 4 |       "train": [],
 5 |       "demos": [
 6 |         {
 7 |           "augmented": true,
 8 |           "question": "That Darn Cat! and Never a Dull Moment were both produced by what studio?",
 9 |           "trajectory": "[[ ## thought_0 ## ]]\nI need to find out which studio produced both \"That Darn Cat!\" and \"Never a Dull Moment.\" This information is likely available on Wikipedia, so I will search for it there.\n\n[[ ## tool_name_0 ## ]]\nsearch_wikipedia\n\n[[ ## tool_args_0 ## ]]\n{\"query\": \"That Darn Cat! and Never a Dull Moment studio production\"}\n\n[[ ## observation_0 ## ]]\n[1] «That Darn Cat! | That Darn Cat! is a 1965 American Walt Disney Productions thriller comedy film starring Hayley Mills (in her last of the six films she made for the Walt Disney Studios) and Dean Jones (starring in his first film for Disney) in a story about bank robbers, a kidnapping and a mischievous cat. The film was based on the 1963 novel \"Undercover Cat\" by Gordon and Mildred Gordon and was directed by Robert Stevenson. The title song was written by the Sherman Brothers and sung by Bobby Darin. The 1997 remake includes a cameo appearance by Dean Jones.»\n[2] «Never a Dull Moment (1968 film) | Never a Dull Moment is a 1968 American comedy crime film from Walt Disney Productions starring Dick Van Dyke and Edward G. Robinson and directed by Jerry Paris. The script by AJ Carothers was based on a novel by John Godey. The supporting cast features Dorothy Provine, Henry Silva, Slim Pickens and Jack Elam.»\n[3] «That Darn Cat (1997 film) | That Darn Cat is a 1997 American mystery comedy film starring Christina Ricci and Doug E. Doug. It is a remake of the 1965 film \"That Darn Cat! \", which in turn was based on the book \"Undercover Cat\" by Gordon and Mildred Gordon. It is directed by British TV veteran Bob Spiers (most famous for \"Fawlty Towers\", as well as \"Spice World\") and written by Scott Alexander and Larry Karaszewski, best known for \"Ed Wood\" and the first two \"Problem Child\" films.»",
10 |           "next_thought": "Both \"That Darn Cat!\" and \"Never a Dull Moment\" were produced by Walt Disney Productions. I have the information I need to answer the question.",
11 |           "next_tool_name": "finish",
12 |           "next_tool_args": {}
13 |         },
14 |         {
15 |           "question": "Was Yakov Protazanov or Marcel Duchamp born in 1881",
16 |           "answer": "Yakov Alexandrovich Protazanov (Russian: Я́ков Алекса́ндрович Протаза́нов ; January 23 (O.S. February 4), 1881",
17 |           "dspy_uuid": "15118a69-274d-4eea-b3eb-204d05e24012",
18 |           "dspy_split": "train"
19 |         }
20 |       ],
21 |       "signature": {
22 |         "instructions": "Imagine you are a detective racing against time to solve a high-profile case involving two famous films: \"That Darn Cat!\" and \"Never a Dull Moment.\" Your mission is to uncover which studio produced these films before the press conference starts in one hour. You have access to a powerful tool: a Wikipedia search. \n\nGiven the fields `question`, produce the fields `answer`.\n\nYou are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far. Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.\n\nTo do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task. After each tool call, you receive a resulting observation, which gets appended to your trajectory.\n\nWhen writing next_thought, you may reason about the current situation and plan for future steps. When selecting the next_tool_name and its next_tool_args, the tool must be one of:\n\n(1) search_wikipedia. It takes arguments {'query': {'type': 'string'}}.\n(2) finish, whose description is <desc>Marks the task as complete. That is, signals that all information for producing the outputs, i.e. `answer`, are now available to be extracted.<\/desc>. It takes arguments {}.\nWhen providing `next_tool_args`, the value inside the field must be in JSON format.",
23 |         "fields": [
24 |           {
25 |             "prefix": "Question:",
26 |             "description": "${question}"
27 |           },
28 |           {
29 |             "prefix": "Trajectory:",
30 |             "description": "${trajectory}"
31 |           },
32 |           {
33 |             "prefix": "Next Thought:",
34 |             "description": "${next_thought}"
35 |           },
36 |           {
37 |             "prefix": "Next Tool Name:",
38 |             "description": "${next_tool_name}"
39 |           },
40 |           {
41 |             "prefix": "Next Tool Args:",
42 |             "description": "${next_tool_args}"
43 |           }
44 |         ]
45 |       },
46 |       "lm": null
47 |     },
48 |     "extract.predict": {
49 |       "traces": [],
50 |       "train": [],
51 |       "demos": [
52 |         {
53 |           "augmented": true,
54 |           "question": "That Darn Cat! and Never a Dull Moment were both produced by what studio?",
55 |           "trajectory": "[[ ## thought_0 ## ]]\nI need to find out which studio produced both \"That Darn Cat!\" and \"Never a Dull Moment.\" This information is likely available on Wikipedia, so I will search for it there.\n\n[[ ## tool_name_0 ## ]]\nsearch_wikipedia\n\n[[ ## tool_args_0 ## ]]\n{\"query\": \"That Darn Cat! and Never a Dull Moment studio production\"}\n\n[[ ## observation_0 ## ]]\n[1] «That Darn Cat! | That Darn Cat! is a 1965 American Walt Disney Productions thriller comedy film starring Hayley Mills (in her last of the six films she made for the Walt Disney Studios) and Dean Jones (starring in his first film for Disney) in a story about bank robbers, a kidnapping and a mischievous cat. The film was based on the 1963 novel \"Undercover Cat\" by Gordon and Mildred Gordon and was directed by Robert Stevenson. The title song was written by the Sherman Brothers and sung by Bobby Darin. The 1997 remake includes a cameo appearance by Dean Jones.»\n[2] «Never a Dull Moment (1968 film) | Never a Dull Moment is a 1968 American comedy crime film from Walt Disney Productions starring Dick Van Dyke and Edward G. Robinson and directed by Jerry Paris. The script by AJ Carothers was based on a novel by John Godey. The supporting cast features Dorothy Provine, Henry Silva, Slim Pickens and Jack Elam.»\n[3] «That Darn Cat (1997 film) | That Darn Cat is a 1997 American mystery comedy film starring Christina Ricci and Doug E. Doug. It is a remake of the 1965 film \"That Darn Cat! \", which in turn was based on the book \"Undercover Cat\" by Gordon and Mildred Gordon. It is directed by British TV veteran Bob Spiers (most famous for \"Fawlty Towers\", as well as \"Spice World\") and written by Scott Alexander and Larry Karaszewski, best known for \"Ed Wood\" and the first two \"Problem Child\" films.»\n\n[[ ## thought_1 ## ]]\nBoth \"That Darn Cat!\" and \"Never a Dull Moment\" were produced by Walt Disney Productions. I have the information I need to answer the question.\n\n[[ ## tool_name_1 ## ]]\nfinish\n\n[[ ## tool_args_1 ## ]]\n{}\n\n[[ ## observation_1 ## ]]\nCompleted.",
56 |           "reasoning": "Both \"That Darn Cat!\" and \"Never a Dull Moment\" were produced by Walt Disney Productions, as confirmed by the information retrieved from Wikipedia.",
57 |           "answer": "Walt Disney Productions"
58 |         },
59 |         {
60 |           "question": "Are Smyrnium and Nymania both types of plant?",
61 |           "answer": "yes",
62 |           "dspy_uuid": "b57b5933-95c7-472a-801b-3cc9bc0a3b99",
63 |           "dspy_split": "train"
64 |         }
65 |       ],
66 |       "signature": {
67 |         "instructions": "Given the very verbose fields `question`, produce the fields `answer`.",
68 |         "fields": [
69 |           {
70 |             "prefix": "Question:",
71 |             "description": "${question}"
72 |           },
73 |           {
74 |             "prefix": "Trajectory:",
75 |             "description": "${trajectory}"
76 |           },
77 |           {
78 |             "prefix": "Reasoning: Let's think step by step in order to",
79 |             "description": "${reasoning}"
80 |           },
81 |           {
82 |             "prefix": "Answer:",
83 |             "description": "${answer}"
84 |           }
85 |         ]
86 |       },
87 |       "lm": null
88 |     },
89 |     "metadata": {
90 |       "dependency_versions": {
91 |         "python": "3.13",
92 |         "dspy": "3.0.0",
93 |         "cloudpickle": "3.1"
94 |       }
95 |     }
96 |   }
```
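
For context, a hedged sketch of how a saved state like this is typically restored: rebuild a program whose predictors match the saved keys (`react` and `extract.predict`), then call `load` with the JSON path (the tool below is a stand-in):

```python
import dspy

def search_wikipedia(query: str) -> list[str]:
    # Stand-in for the Wikipedia search tool referenced in the saved demos.
    return []

# dspy.ReAct exposes a `react` predictor and an `extract.predict` predictor,
# matching the top-level keys in the JSON above.
program = dspy.ReAct("question -> answer", tools=[search_wikipedia])
program.load("tests/primitives/resources/saved_program.json")
```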

--------------------------------------------------------------------------------
/tests/utils/test_usage_tracker.py:
--------------------------------------------------------------------------------

```python
  1 | import dspy
  2 | from dspy.utils.usage_tracker import UsageTracker, track_usage
  3 | 
  4 | 
  5 | def test_add_usage_entry():
  6 |     """Test adding usage entries to the tracker."""
  7 |     tracker = UsageTracker()
  8 | 
  9 |     # Test with a single usage entry
 10 |     usage_entry = {
 11 |         "prompt_tokens": 1117,
 12 |         "completion_tokens": 46,
 13 |         "total_tokens": 1163,
 14 |         "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0},
 15 |         "completion_tokens_details": {
 16 |             "reasoning_tokens": 0,
 17 |             "audio_tokens": 0,
 18 |             "accepted_prediction_tokens": 0,
 19 |             "rejected_prediction_tokens": 0,
 20 |         },
 21 |     }
 22 | 
 23 |     tracker.add_usage("gpt-4o-mini", usage_entry)
 24 |     assert len(tracker.usage_data["gpt-4o-mini"]) == 1
 25 |     assert tracker.usage_data["gpt-4o-mini"][0] == usage_entry
 26 | 
 27 | 
 28 | def test_get_total_tokens():
 29 |     """Test calculating total tokens from usage entries."""
 30 |     tracker = UsageTracker()
 31 | 
 32 |     # Add multiple usage entries for the same model
 33 |     usage_entries = [
 34 |         {
 35 |             "prompt_tokens": 1117,
 36 |             "completion_tokens": 46,
 37 |             "total_tokens": 1163,
 38 |             "prompt_tokens_details": {"cached_tokens": 200, "audio_tokens": 50},
 39 |             "completion_tokens_details": {
 40 |                 "reasoning_tokens": 20,
 41 |                 "audio_tokens": 10,
 42 |                 "accepted_prediction_tokens": 16,
 43 |                 "rejected_prediction_tokens": 0,
 44 |             },
 45 |         },
 46 |         {
 47 |             "prompt_tokens": 800,
 48 |             "completion_tokens": 100,
 49 |             "total_tokens": 900,
 50 |             "prompt_tokens_details": {"cached_tokens": 300, "audio_tokens": 0},
 51 |             "completion_tokens_details": {
 52 |                 "reasoning_tokens": 50,
 53 |                 "audio_tokens": 0,
 54 |                 "accepted_prediction_tokens": 40,
 55 |                 "rejected_prediction_tokens": 10,
 56 |             },
 57 |         },
 58 |         {
 59 |             "prompt_tokens": 500,
 60 |             "completion_tokens": 80,
 61 |             "total_tokens": 580,
 62 |             "prompt_tokens_details": {"cached_tokens": 100, "audio_tokens": 25},
 63 |             "completion_tokens_details": {
 64 |                 "reasoning_tokens": 30,
 65 |                 "audio_tokens": 15,
 66 |                 "accepted_prediction_tokens": 25,
 67 |                 "rejected_prediction_tokens": 10,
 68 |             },
 69 |         },
 70 |     ]
 71 | 
 72 |     for entry in usage_entries:
 73 |         tracker.add_usage("gpt-4o-mini", entry)
 74 | 
 75 |     total_usage = tracker.get_total_tokens()
 76 |     assert "gpt-4o-mini" in total_usage
 77 |     assert total_usage["gpt-4o-mini"]["prompt_tokens"] == 2417  # 1117 + 800 + 500
 78 |     assert total_usage["gpt-4o-mini"]["completion_tokens"] == 226  # 46 + 100 + 80
 79 |     assert total_usage["gpt-4o-mini"]["total_tokens"] == 2643  # 1163 + 900 + 580
 80 |     assert total_usage["gpt-4o-mini"]["prompt_tokens_details"]["cached_tokens"] == 600  # 200 + 300 + 100
 81 |     assert total_usage["gpt-4o-mini"]["prompt_tokens_details"]["audio_tokens"] == 75  # 50 + 0 + 25
 82 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["reasoning_tokens"] == 100  # 20 + 50 + 30
 83 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["audio_tokens"] == 25  # 10 + 0 + 15
 84 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["accepted_prediction_tokens"] == 81  # 16 + 40 + 25
 85 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["rejected_prediction_tokens"] == 20  # 0 + 10 + 10
 86 | 
 87 | 
 88 | def test_track_usage_with_multiple_models():
 89 |     """Test tracking usage across multiple models."""
 90 |     tracker = UsageTracker()
 91 | 
 92 |     # Add usage entries for different models
 93 |     usage_entries = [
 94 |         {
 95 |             "model": "gpt-4o-mini",
 96 |             "usage": {
 97 |                 "prompt_tokens": 1117,
 98 |                 "completion_tokens": 46,
 99 |                 "total_tokens": 1163,
100 |                 "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0},
101 |                 "completion_tokens_details": {
102 |                     "reasoning_tokens": 0,
103 |                     "audio_tokens": 0,
104 |                     "accepted_prediction_tokens": 0,
105 |                     "rejected_prediction_tokens": 0,
106 |                 },
107 |             },
108 |         },
109 |         {
110 |             "model": "gpt-3.5-turbo",
111 |             "usage": {
112 |                 "prompt_tokens": 800,
113 |                 "completion_tokens": 100,
114 |                 "total_tokens": 900,
115 |                 "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0},
116 |                 "completion_tokens_details": {
117 |                     "reasoning_tokens": 0,
118 |                     "audio_tokens": 0,
119 |                     "accepted_prediction_tokens": 0,
120 |                     "rejected_prediction_tokens": 0,
121 |                 },
122 |             },
123 |         },
124 |     ]
125 | 
126 |     for entry in usage_entries:
127 |         tracker.add_usage(entry["model"], entry["usage"])
128 | 
129 |     total_usage = tracker.get_total_tokens()
130 |     assert "gpt-4o-mini" in total_usage
131 |     assert "gpt-3.5-turbo" in total_usage
132 |     assert total_usage["gpt-4o-mini"]["total_tokens"] == 1163
133 |     assert total_usage["gpt-3.5-turbo"]["total_tokens"] == 900
134 | 
135 | 
136 | def test_track_usage_context_manager(lm_for_test):
137 |     lm = dspy.LM(lm_for_test, cache=False)
138 |     dspy.settings.configure(lm=lm)
139 | 
140 |     predict = dspy.ChainOfThought("question -> answer")
141 |     with track_usage() as tracker:
142 |         predict(question="What is the capital of France?")
143 |         predict(question="What is the capital of Italy?")
144 | 
145 |     assert len(tracker.usage_data) > 0
146 |     assert len(tracker.usage_data[lm_for_test]) == 2
147 | 
148 |     total_usage = tracker.get_total_tokens()
149 |     assert lm_for_test in total_usage
150 |     assert len(total_usage.keys()) == 1
151 |     assert isinstance(total_usage[lm_for_test], dict)
152 | 
153 | 
154 | def test_merge_usage_entries_with_new_keys():
155 |     """Ensure merging usage entries preserves unseen keys."""
156 |     tracker = UsageTracker()
157 | 
158 |     tracker.add_usage("model-x", {"prompt_tokens": 5})
159 |     tracker.add_usage("model-x", {"completion_tokens": 2})
160 | 
161 |     total_usage = tracker.get_total_tokens()
162 | 
163 |     assert total_usage["model-x"]["prompt_tokens"] == 5
164 |     assert total_usage["model-x"]["completion_tokens"] == 2
165 | 
166 | 
167 | def test_merge_usage_entries_with_none_values():
168 |     """Test tracking usage across multiple models."""
169 |     tracker = UsageTracker()
170 | 
171 |     # Add usage entries for different models
172 |     usage_entries = [
173 |         {
174 |             "model": "gpt-4o-mini",
175 |             "usage": {
176 |                 "prompt_tokens": 1117,
177 |                 "completion_tokens": 46,
178 |                 "total_tokens": 1163,
179 |                 "prompt_tokens_details": None,
180 |                 "completion_tokens_details": {},
181 |             },
182 |         },
183 |         {
184 |             "model": "gpt-4o-mini",
185 |             "usage": {
186 |                 "prompt_tokens": 800,
187 |                 "completion_tokens": 100,
188 |                 "total_tokens": 900,
189 |                 "prompt_tokens_details": {"cached_tokens": 50, "audio_tokens": 50},
190 |                 "completion_tokens_details": None,
191 |             },
192 |         },
193 |         {
194 |             "model": "gpt-4o-mini",
195 |             "usage": {
196 |                 "prompt_tokens": 800,
197 |                 "completion_tokens": 100,
198 |                 "total_tokens": 900,
199 |                 "prompt_tokens_details": None,
200 |                 "completion_tokens_details": {
201 |                     "reasoning_tokens": 1,
202 |                     "audio_tokens": 1,
203 |                     "accepted_prediction_tokens": 1,
204 |                     "rejected_prediction_tokens": 1,
205 |                 },
206 |             },
207 |         },
208 |     ]
209 | 
210 |     for entry in usage_entries:
211 |         tracker.add_usage(entry["model"], entry["usage"])
212 | 
213 |     total_usage = tracker.get_total_tokens()
214 | 
215 |     assert total_usage["gpt-4o-mini"]["prompt_tokens"] == 2717
216 |     assert total_usage["gpt-4o-mini"]["completion_tokens"] == 246
217 |     assert total_usage["gpt-4o-mini"]["total_tokens"] == 2963
218 |     assert total_usage["gpt-4o-mini"]["prompt_tokens_details"]["cached_tokens"] == 50
219 |     assert total_usage["gpt-4o-mini"]["prompt_tokens_details"]["audio_tokens"] == 50
220 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["reasoning_tokens"] == 1
221 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["audio_tokens"] == 1
222 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["accepted_prediction_tokens"] == 1
223 |     assert total_usage["gpt-4o-mini"]["completion_tokens_details"]["rejected_prediction_tokens"] == 1
224 | 
```

--------------------------------------------------------------------------------
/tests/evaluate/test_evaluate.py:
--------------------------------------------------------------------------------

```python
  1 | import signal
  2 | import threading
  3 | from unittest.mock import patch
  4 | 
  5 | import pytest
  6 | 
  7 | import dspy
  8 | from dspy.evaluate.evaluate import Evaluate, EvaluationResult
  9 | from dspy.evaluate.metrics import answer_exact_match
 10 | from dspy.predict import Predict
 11 | from dspy.utils.callback import BaseCallback
 12 | from dspy.utils.dummies import DummyLM
 13 | 
 14 | 
 15 | def new_example(question, answer):
 16 |     """Helper function to create a new example."""
 17 |     return dspy.Example(
 18 |         question=question,
 19 |         answer=answer,
 20 |     ).with_inputs("question")
 21 | 
 22 | 
 23 | def test_evaluate_initialization():
 24 |     devset = [new_example("What is 1+1?", "2")]
 25 |     ev = Evaluate(
 26 |         devset=devset,
 27 |         metric=answer_exact_match,
 28 |         display_progress=False,
 29 |     )
 30 |     assert ev.devset == devset
 31 |     assert ev.metric == answer_exact_match
 32 |     assert ev.num_threads is None
 33 |     assert not ev.display_progress
 34 | 
 35 | 
 36 | def test_evaluate_call():
 37 |     dspy.settings.configure(
 38 |         lm=DummyLM(
 39 |             {
 40 |                 "What is 1+1?": {"answer": "2"},
 41 |                 "What is 2+2?": {"answer": "4"},
 42 |             }
 43 |         )
 44 |     )
 45 |     devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
 46 |     program = Predict("question -> answer")
 47 |     assert program(question="What is 1+1?").answer == "2"
 48 |     ev = Evaluate(
 49 |         devset=devset,
 50 |         metric=answer_exact_match,
 51 |         display_progress=False,
 52 |     )
 53 |     score = ev(program)
 54 |     assert score.score == 100.0
 55 | 
 56 | 
 57 | @pytest.mark.extra
 58 | def test_construct_result_df():
 59 |     import pandas as pd
 60 |     devset = [
 61 |         new_example("What is 1+1?", "2"),
 62 |         new_example("What is 2+2?", "4"),
 63 |         new_example("What is 3+3?", "-1"),
 64 |     ]
 65 |     ev = Evaluate(
 66 |         devset=devset,
 67 |         metric=answer_exact_match,
 68 |     )
 69 |     results = [
 70 |         (devset[0], {"answer": "2"}, 100.0),
 71 |         (devset[1], {"answer": "4"}, 100.0),
 72 |         (devset[2], {"answer": "-1"}, 0.0),
 73 |     ]
 74 |     result_df = ev._construct_result_table(results, answer_exact_match.__name__)
 75 |     pd.testing.assert_frame_equal(
 76 |         result_df,
 77 |         pd.DataFrame(
 78 |             {
 79 |                 "question": ["What is 1+1?", "What is 2+2?", "What is 3+3?"],
 80 |                 "example_answer": ["2", "4", "-1"],
 81 |                 "pred_answer": ["2", "4", "-1"],
 82 |                 "answer_exact_match": [100.0, 100.0, 0.0],
 83 |             }
 84 |         ),
 85 |     )
 86 | 
 87 | 
 88 | def test_multithread_evaluate_call():
 89 |     dspy.settings.configure(lm=DummyLM({"What is 1+1?": {"answer": "2"}, "What is 2+2?": {"answer": "4"}}))
 90 |     devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
 91 |     program = Predict("question -> answer")
 92 |     assert program(question="What is 1+1?").answer == "2"
 93 |     ev = Evaluate(
 94 |         devset=devset,
 95 |         metric=answer_exact_match,
 96 |         display_progress=False,
 97 |         num_threads=2,
 98 |     )
 99 |     result = ev(program)
100 |     assert result.score == 100.0
101 | 
102 | 
103 | def test_multi_thread_evaluate_call_cancelled(monkeypatch):
104 |     # slow LM that sleeps for 1 second before returning the answer
105 |     class SlowLM(DummyLM):
106 |         def __call__(self, *args, **kwargs):
107 |             import time
108 | 
109 |             time.sleep(1)
110 |             return super().__call__(*args, **kwargs)
111 | 
112 |     dspy.settings.configure(lm=SlowLM({"What is 1+1?": {"answer": "2"}, "What is 2+2?": {"answer": "4"}}))
113 | 
114 |     devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
115 |     program = Predict("question -> answer")
116 |     assert program(question="What is 1+1?").answer == "2"
117 | 
118 |     # spawn a thread that will sleep for .1 seconds then send a KeyboardInterrupt
119 |     def sleep_then_interrupt():
120 |         import time
121 | 
122 |         time.sleep(0.1)
123 |         import os
124 | 
125 |         os.kill(os.getpid(), signal.SIGINT)
126 | 
127 |     input_thread = threading.Thread(target=sleep_then_interrupt)
128 |     input_thread.start()
129 | 
130 |     with pytest.raises(KeyboardInterrupt):
131 |         ev = Evaluate(
132 |             devset=devset,
133 |             metric=answer_exact_match,
134 |             display_progress=False,
135 |             num_threads=2,
136 |         )
137 |         ev(program)
138 | 
139 | 
140 | def test_evaluate_call_wrong_answer():
141 |     dspy.settings.configure(lm=DummyLM({"What is 1+1?": {"answer": "0"}, "What is 2+2?": {"answer": "0"}}))
142 |     devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
143 |     program = Predict("question -> answer")
144 |     ev = Evaluate(
145 |         devset=devset,
146 |         metric=answer_exact_match,
147 |         display_progress=False,
148 |     )
149 |     result = ev(program)
150 |     assert result.score == 0.0
151 | 
152 | 
153 | @pytest.mark.extra
154 | @pytest.mark.parametrize(
155 |     "program_with_example",
156 |     [
157 |         (Predict("question -> answer"), new_example("What is 1+1?", "2")),
158 |         # Create programs that do not return dictionary-like objects because Evaluate()
159 |         # has failed for such cases in the past
160 |         (
161 |             lambda text: Predict("text: str -> entities: list[str]")(text=text).entities,
162 |             dspy.Example(text="United States", entities=["United States"]).with_inputs("text"),
163 |         ),
164 |         (
165 |             lambda text: Predict("text: str -> entities: list[dict[str, str]]")(text=text).entities,
166 |             dspy.Example(text="United States", entities=[{"name": "United States", "type": "location"}]).with_inputs(
167 |                 "text"
168 |             ),
169 |         ),
170 |         (
171 |             lambda text: Predict("text: str -> first_word: Tuple[str, int]")(text=text).words,
172 |             dspy.Example(text="United States", first_word=("United", 6)).with_inputs("text"),
173 |         ),
174 |     ],
175 | )
176 | @pytest.mark.parametrize("display_table", [True, False, 1])
177 | @pytest.mark.parametrize("is_in_ipython_notebook_environment", [True, False])
178 | def test_evaluate_display_table(program_with_example, display_table, is_in_ipython_notebook_environment, capfd):
179 |     program, example = program_with_example
180 |     example_input = next(iter(example.inputs().values()))
181 |     example_output = {key: value for key, value in example.toDict().items() if key not in example.inputs()}
182 | 
183 |     dspy.settings.configure(
184 |         lm=DummyLM(
185 |             {
186 |                 example_input: example_output,
187 |             }
188 |         )
189 |     )
190 | 
191 |     ev = Evaluate(
192 |         devset=[example],
193 |         metric=lambda example, pred, **kwargs: example == pred,
194 |         display_table=display_table,
195 |     )
196 |     assert ev.display_table == display_table
197 | 
198 |     with patch(
199 |         "dspy.evaluate.evaluate.is_in_ipython_notebook_environment", return_value=is_in_ipython_notebook_environment
200 |     ):
201 |         ev(program)
202 |         out, _ = capfd.readouterr()
203 |         if not is_in_ipython_notebook_environment and display_table:
204 |             # In console environments where IPython is not available, the table should be printed
205 |             # to the console
206 |             example_input = next(iter(example.inputs().values()))
207 |             assert example_input in out
208 | 
209 | 
210 | def test_evaluate_callback():
211 |     class TestCallback(BaseCallback):
212 |         def __init__(self):
213 |             self.start_call_inputs = None
214 |             self.start_call_count = 0
215 |             self.end_call_outputs = None
216 |             self.end_call_count = 0
217 | 
218 |         def on_evaluate_start(
219 |             self,
220 |             call_id: str,
221 |             instance,
222 |             inputs,
223 |         ):
224 |             self.start_call_inputs = inputs
225 |             self.start_call_count += 1
226 | 
227 |         def on_evaluate_end(
228 |             self,
229 |             call_id: str,
230 |             outputs,
231 |             exception=None,
232 |         ):
233 |             self.end_call_outputs = outputs
234 |             self.end_call_count += 1
235 | 
236 |     callback = TestCallback()
237 |     dspy.settings.configure(
238 |         lm=DummyLM(
239 |             {
240 |                 "What is 1+1?": {"answer": "2"},
241 |                 "What is 2+2?": {"answer": "4"},
242 |             }
243 |         ),
244 |         callbacks=[callback],
245 |     )
246 |     devset = [new_example("What is 1+1?", "2"), new_example("What is 2+2?", "4")]
247 |     program = Predict("question -> answer")
248 |     assert program(question="What is 1+1?").answer == "2"
249 |     ev = Evaluate(
250 |         devset=devset,
251 |         metric=answer_exact_match,
252 |         display_progress=False,
253 |     )
254 |     result = ev(program)
255 |     assert result.score == 100.0
256 |     assert callback.start_call_inputs["program"] == program
257 |     assert callback.start_call_count == 1
258 |     assert callback.end_call_outputs.score == 100.0
259 |     assert callback.end_call_count == 1
260 | 
261 | def test_evaluation_result_repr():
262 |     result = EvaluationResult(score=100.0, results=[(new_example("What is 1+1?", "2"), {"answer": "2"}, 100.0)])
263 |     assert repr(result) == "EvaluationResult(score=100.0, results=<list of 1 results>)"
264 | 
```
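
The tests above exercise the core `Evaluate` workflow: build a devset of `dspy.Example` objects, pick a metric, and call the evaluator on a program. A minimal usage sketch, assuming a placeholder LM configuration:

```python
import dspy
from dspy.evaluate.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match

# Placeholder model name; any LM supported by dspy.LM works here.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

devset = [
    dspy.Example(question="What is 1+1?", answer="2").with_inputs("question"),
    dspy.Example(question="What is 2+2?", answer="4").with_inputs("question"),
]

program = dspy.Predict("question -> answer")
evaluator = Evaluate(
    devset=devset,
    metric=answer_exact_match,
    num_threads=2,
    display_progress=True,
)

result = evaluator(program)
print(result.score)  # aggregate score, e.g. 100.0 when every answer matches
```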

--------------------------------------------------------------------------------
/dspy/clients/openai.py:
--------------------------------------------------------------------------------

```python
  1 | import time
  2 | from datetime import datetime
  3 | from typing import Any
  4 | 
  5 | import openai
  6 | 
  7 | from dspy.clients.provider import Provider, TrainingJob
  8 | from dspy.clients.utils_finetune import TrainDataFormat, TrainingStatus, save_data
  9 | 
 10 | 
 11 | class TrainingJobOpenAI(TrainingJob):
 12 |     def __init__(self, *args, **kwargs):
 13 |         super().__init__(*args, **kwargs)
 14 |         self.provider_file_id = None
 15 |         self.provider_job_id = None
 16 | 
 17 |     def cancel(self):
 18 |         # Cancel the provider job
 19 |         if OpenAIProvider.does_job_exist(self.provider_job_id):
 20 |             status = self.status()
 21 |             if OpenAIProvider.is_terminal_training_status(status):
 22 |                 err_msg = "Jobs that are complete cannot be canceled."
 23 |                 err_msg += f" Job with ID {self.provider_job_id} is done."
 24 |                 raise Exception(err_msg)
 25 |             openai.fine_tuning.jobs.cancel(self.provider_job_id)
 26 |             self.provider_job_id = None
 27 | 
 28 |         # Delete the provider file
 29 |         if self.provider_file_id is not None:
 30 |             if OpenAIProvider.does_file_exist(self.provider_file_id):
 31 |                 openai.files.delete(self.provider_file_id)
 32 |             self.provider_file_id = None
 33 | 
 34 |         # Call the super's cancel method after the custom cancellation logic
 35 |         super().cancel()
 36 | 
 37 |     def status(self) -> TrainingStatus:
 38 |         status = OpenAIProvider.get_training_status(self.provider_job_id)
 39 |         return status
 40 | 
 41 | 
 42 | class OpenAIProvider(Provider):
 43 |     def __init__(self):
 44 |         super().__init__()
 45 |         self.finetunable = True
 46 |         self.TrainingJob = TrainingJobOpenAI
 47 | 
 48 |     @staticmethod
 49 |     def is_provider_model(model: str) -> bool:
 50 |         if model.startswith("openai/") or model.startswith("ft:"):
  51 |             # Although it looks strange, `ft:` is a unique identifier for OpenAI finetuned models in the litellm context:
 52 |             # https://github.com/BerriAI/litellm/blob/cd893134b7974d9f21477049a373b469fff747a5/litellm/utils.py#L4495
 53 |             return True
 54 | 
 55 |         return False
 56 | 
 57 |     @staticmethod
 58 |     def _remove_provider_prefix(model: str) -> str:
 59 |         provider_prefix = "openai/"
 60 |         return model.replace(provider_prefix, "")
 61 | 
 62 |     @staticmethod
 63 |     def finetune(
 64 |         job: TrainingJobOpenAI,
 65 |         model: str,
 66 |         train_data: list[dict[str, Any]],
 67 |         train_data_format: TrainDataFormat | None,
 68 |         train_kwargs: dict[str, Any] | None = None,
 69 |     ) -> str:
 70 |         model = OpenAIProvider._remove_provider_prefix(model)
 71 | 
 72 |         print("[OpenAI Provider] Validating the data format")
 73 |         OpenAIProvider.validate_data_format(train_data_format)
 74 | 
 75 |         print("[OpenAI Provider] Saving the data to a file")
 76 |         data_path = save_data(train_data)
 77 |         print(f"[OpenAI Provider] Data saved to {data_path}")
 78 | 
 79 |         print("[OpenAI Provider] Uploading the data to the provider")
 80 |         provider_file_id = OpenAIProvider.upload_data(data_path)
 81 |         job.provider_file_id = provider_file_id
 82 | 
 83 |         print("[OpenAI Provider] Starting remote training")
 84 |         provider_job_id = OpenAIProvider._start_remote_training(
 85 |             train_file_id=job.provider_file_id,
 86 |             model=model,
 87 |             train_kwargs=train_kwargs,
 88 |         )
 89 |         job.provider_job_id = provider_job_id
 90 |         print(f"[OpenAI Provider] Job started with the OpenAI Job ID {provider_job_id}")
 91 | 
 92 |         print("[OpenAI Provider] Waiting for training to complete")
 93 |         # TODO(feature): Could we stream OAI logs?
 94 |         OpenAIProvider.wait_for_job(job)
 95 | 
 96 |         print("[OpenAI Provider] Attempting to retrieve the trained model")
 97 |         model = OpenAIProvider.get_trained_model(job)
 98 |         print(f"[OpenAI Provider] Model retrieved: {model}")
 99 | 
100 |         return model
101 | 
102 |     @staticmethod
103 |     def does_job_exist(job_id: str) -> bool:
104 |         try:
105 |             # TODO(nit): This call may fail for other reasons. We should check
106 |             # the error message to ensure that the job does not exist.
107 |             openai.fine_tuning.jobs.retrieve(job_id)
108 |             return True
109 |         except Exception:
110 |             return False
111 | 
112 |     @staticmethod
113 |     def does_file_exist(file_id: str) -> bool:
114 |         try:
115 |             # TODO(nit): This call may fail for other reasons. We should check
116 |             # the error message to ensure that the file does not exist.
117 |             openai.files.retrieve(file_id)
118 |             return True
119 |         except Exception:
120 |             return False
121 | 
122 |     @staticmethod
123 |     def is_terminal_training_status(status: TrainingStatus) -> bool:
124 |         return status in [
125 |             TrainingStatus.succeeded,
126 |             TrainingStatus.failed,
127 |             TrainingStatus.cancelled,
128 |         ]
129 | 
130 |     @staticmethod
131 |     def get_training_status(job_id: str) -> TrainingStatus:
132 |         provider_status_to_training_status = {
133 |             "validating_files": TrainingStatus.pending,
134 |             "queued": TrainingStatus.pending,
135 |             "running": TrainingStatus.running,
136 |             "succeeded": TrainingStatus.succeeded,
137 |             "failed": TrainingStatus.failed,
138 |             "cancelled": TrainingStatus.cancelled,
139 |         }
140 | 
141 |         # Check if there is an active job
142 |         if job_id is None:
143 |             print("There is no active job.")
144 |             return TrainingStatus.not_started
145 | 
146 |         err_msg = f"Job with ID {job_id} does not exist."
147 |         assert OpenAIProvider.does_job_exist(job_id), err_msg
148 | 
149 |         # Retrieve the provider's job and report the status
150 |         provider_job = openai.fine_tuning.jobs.retrieve(job_id)
151 |         provider_status = provider_job.status
152 |         status = provider_status_to_training_status[provider_status]
153 | 
154 |         return status
155 | 
156 |     @staticmethod
157 |     def validate_data_format(data_format: TrainDataFormat):
158 |         supported_data_formats = [
159 |             TrainDataFormat.CHAT,
160 |             TrainDataFormat.COMPLETION,
161 |         ]
162 |         if data_format not in supported_data_formats:
163 |             err_msg = f"OpenAI does not support the data format {data_format}."
164 |             raise ValueError(err_msg)
165 | 
166 |     @staticmethod
167 |     def upload_data(data_path: str) -> str:
168 |         # Upload the data to the provider
169 |         provider_file = openai.files.create(
170 |             file=open(data_path, "rb"),
171 |             purpose="fine-tune",
172 |         )
173 |         return provider_file.id
174 | 
175 |     @staticmethod
176 |     def _start_remote_training(train_file_id: str, model: str, train_kwargs: dict[str, Any] | None = None) -> str:
177 |         train_kwargs = train_kwargs or {}
178 |         provider_job = openai.fine_tuning.jobs.create(
179 |             model=model,
180 |             training_file=train_file_id,
181 |             hyperparameters=train_kwargs,
182 |         )
183 |         return provider_job.id
184 | 
185 |     @staticmethod
186 |     def wait_for_job(
187 |         job: TrainingJobOpenAI,
188 |         poll_frequency: int = 20,
189 |     ):
190 |         # Poll for the job until it is done
191 |         done = False
192 |         cur_event_id = None
193 |         reported_estimated_time = False
194 |         while not done:
195 |             # Report estimated time if not already reported
196 |             if not reported_estimated_time:
197 |                 remote_job = openai.fine_tuning.jobs.retrieve(job.provider_job_id)
198 |                 timestamp = remote_job.estimated_finish
199 |                 if timestamp:
200 |                     estimated_finish_dt = datetime.fromtimestamp(timestamp)
201 |                     delta_dt = estimated_finish_dt - datetime.now()
202 |                     print(f"[OpenAI Provider] The OpenAI estimated time remaining is: {delta_dt}")
203 |                     reported_estimated_time = True
204 | 
205 |             # Get new events
206 |             page = openai.fine_tuning.jobs.list_events(fine_tuning_job_id=job.provider_job_id, limit=1)
207 |             new_event = page.data[0] if page.data else None
208 |             if new_event and new_event.id != cur_event_id:
209 |                 dt = datetime.fromtimestamp(new_event.created_at)
210 |                 print(f"[OpenAI Provider] {dt} {new_event.message}")
211 |                 cur_event_id = new_event.id
212 | 
213 |             # Sleep and update the flag
214 |             time.sleep(poll_frequency)
215 |             done = OpenAIProvider.is_terminal_training_status(job.status())
216 | 
217 |     @staticmethod
218 |     def get_trained_model(job):
219 |         status = job.status()
220 |         if status != TrainingStatus.succeeded:
221 |             err_msg = f"Job status is {status}."
222 |             err_msg += f" Must be {TrainingStatus.succeeded} to retrieve model."
223 |             raise Exception(err_msg)
224 | 
225 |         provider_job = openai.fine_tuning.jobs.retrieve(job.provider_job_id)
226 |         finetuned_model = provider_job.fine_tuned_model
227 |         return finetuned_model
228 | 
```
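
In normal use these helpers are driven by DSPy's finetuning path (the `finetune` staticmethod above), but the low-level steps can be sketched directly. A minimal sketch, assuming OpenAI credentials are configured in the environment; the training example, base model name, and `n_epochs` value are placeholders:

```python
from dspy.clients.openai import OpenAIProvider
from dspy.clients.utils_finetune import TrainDataFormat, save_data

# Placeholder chat-format training example (OpenAI fine-tuning "messages" format).
train_data = [
    {
        "messages": [
            {"role": "user", "content": "What is 1+1?"},
            {"role": "assistant", "content": "2"},
        ]
    }
]

OpenAIProvider.validate_data_format(TrainDataFormat.CHAT)
data_path = save_data(train_data)                # write examples to a local JSONL file
file_id = OpenAIProvider.upload_data(data_path)  # upload with purpose="fine-tune"
job_id = OpenAIProvider._start_remote_training(
    train_file_id=file_id,
    model="gpt-4o-mini-2024-07-18",              # placeholder base model
    train_kwargs={"n_epochs": 1},
)
print(OpenAIProvider.get_training_status(job_id))  # e.g. TrainingStatus.pending
```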

--------------------------------------------------------------------------------
/docs/docs/learn/programming/tools.md:
--------------------------------------------------------------------------------

```markdown
  1 | ---
  2 | sidebar_position: 2
  3 | ---
  4 | 
  5 | # Tools
  6 | 
  7 | DSPy provides powerful support for **tool-using agents** that can interact with external functions, APIs, and services. Tools enable language models to go beyond text generation by performing actions, retrieving information, and processing data dynamically.
  8 | 
  9 | There are two main approaches to using tools in DSPy:
 10 | 
 11 | 1. **`dspy.ReAct`** - A fully managed tool agent that handles reasoning and tool calls automatically
 12 | 2. **Manual tool handling** - Direct control over tool calls using `dspy.Tool`, `dspy.ToolCalls`, and custom signatures
 13 | 
 14 | ## Approach 1: Using `dspy.ReAct` (Fully Managed)
 15 | 
 16 | The `dspy.ReAct` module implements the Reasoning and Acting (ReAct) pattern, where the language model iteratively reasons about the current situation and decides which tools to call.
 17 | 
 18 | ### Basic Example
 19 | 
 20 | ```python
 21 | import dspy
 22 | 
 23 | # Define your tools as functions
 24 | def get_weather(city: str) -> str:
 25 |     """Get the current weather for a city."""
 26 |     # In a real implementation, this would call a weather API
 27 |     return f"The weather in {city} is sunny and 75°F"
 28 | 
 29 | def search_web(query: str) -> str:
 30 |     """Search the web for information."""
 31 |     # In a real implementation, this would call a search API
 32 |     return f"Search results for '{query}': [relevant information...]"
 33 | 
 34 | # Create a ReAct agent
 35 | react_agent = dspy.ReAct(
 36 |     signature="question -> answer",
 37 |     tools=[get_weather, search_web],
 38 |     max_iters=5
 39 | )
 40 | 
 41 | # Use the agent
 42 | result = react_agent(question="What's the weather like in Tokyo?")
 43 | print(result.answer)
 44 | print("Tool calls made:", result.trajectory)
 45 | ```
 46 | 
 47 | ### ReAct Features
 48 | 
 49 | - **Automatic reasoning**: The model thinks through the problem step by step
 50 | - **Tool selection**: Automatically chooses which tool to use based on the situation
 51 | - **Iterative execution**: Can make multiple tool calls to gather information
 52 | - **Error handling**: Built-in error recovery for failed tool calls
 53 | - **Trajectory tracking**: Complete history of reasoning and tool calls
 54 | 
 55 | ### ReAct Parameters
 56 | 
 57 | ```python
 58 | react_agent = dspy.ReAct(
 59 |     signature="question -> answer",  # Input/output specification
 60 |     tools=[tool1, tool2, tool3],     # List of available tools
 61 |     max_iters=10                     # Maximum number of tool call iterations
 62 | )
 63 | ```
 64 | 
 65 | ## Approach 2: Manual Tool Handling
 66 | 
 67 | For more control over the tool calling process, you can manually handle tools using DSPy's tool types.
 68 | 
 69 | ### Basic Setup
 70 | 
 71 | ```python
 72 | import dspy
 73 | 
 74 | class ToolSignature(dspy.Signature):
 75 |     """Signature for manual tool handling."""
 76 |     question: str = dspy.InputField()
 77 |     tools: list[dspy.Tool] = dspy.InputField()
 78 |     outputs: dspy.ToolCalls = dspy.OutputField()
 79 | 
 80 | def weather(city: str) -> str:
 81 |     """Get weather information for a city."""
 82 |     return f"The weather in {city} is sunny"
 83 | 
 84 | def calculator(expression: str) -> str:
 85 |     """Evaluate a mathematical expression."""
 86 |     try:
 87 |         result = eval(expression)  # Note: Use safely in production
 88 |         return f"The result is {result}"
 89 |     except Exception:
 90 |         return "Invalid expression"
 91 | 
 92 | # Create tool instances
 93 | tools = {
 94 |     "weather": dspy.Tool(weather),
 95 |     "calculator": dspy.Tool(calculator)
 96 | }
 97 | 
 98 | # Create predictor
 99 | predictor = dspy.Predict(ToolSignature)
100 | 
101 | # Make a prediction
102 | response = predictor(
103 |     question="What's the weather in New York?",
104 |     tools=list(tools.values())
105 | )
106 | 
107 | # Execute the tool calls
108 | for call in response.outputs.tool_calls:
109 |     # Execute the tool call
110 |     result = call.execute()
111 |     print(f"Tool: {call.name}")
112 |     print(f"Args: {call.args}")
113 |     print(f"Result: {result}")
114 | ```
115 | 
116 | ### Understanding `dspy.Tool`
117 | 
118 | The `dspy.Tool` class wraps regular Python functions to make them compatible with DSPy's tool system:
119 | 
120 | ```python
121 | def my_function(param1: str, param2: int = 5) -> str:
122 |     """A sample function with parameters."""
123 |     return f"Processed {param1} with value {param2}"
124 | 
125 | # Create a tool
126 | tool = dspy.Tool(my_function)
127 | 
128 | # Tool properties
129 | print(tool.name)        # "my_function"
130 | print(tool.desc)        # The function's docstring
131 | print(tool.args)        # Parameter schema
132 | print(str(tool))        # Full tool description
133 | ```
134 | 
135 | ### Understanding `dspy.ToolCalls`
136 | 
137 | The `dspy.ToolCalls` type represents the output from a model that can make tool calls. Each individual tool call can be executed using the `execute` method:
138 | 
139 | ```python
140 | # After getting a response with tool calls
141 | for call in response.outputs.tool_calls:
142 |     print(f"Tool name: {call.name}")
143 |     print(f"Arguments: {call.args}")
144 |     
145 |     # Execute individual tool calls with different options:
146 |     
147 |     # Option 1: Automatic discovery (finds functions in locals/globals)
148 |     result = call.execute()  # Automatically finds functions by name
149 | 
150 |     # Option 2: Pass tools as a dict (most explicit)
151 |     result = call.execute(functions={"weather": weather, "calculator": calculator})
152 |     
153 |     # Option 3: Pass Tool objects as a list
154 |     result = call.execute(functions=[dspy.Tool(weather), dspy.Tool(calculator)])
155 |     
156 |     print(f"Result: {result}")
157 | ```
158 | 
159 | ## Using Native Tool Calling
160 | 
161 | DSPy adapters support **native function calling**, which leverages the underlying language model's built-in tool calling capabilities rather
162 | than relying on text-based parsing. This approach can provide more reliable tool execution and better integration with models that support
163 | native function calling.
164 | 
165 | !!! warning "Native tool calling doesn't guarantee better quality"
166 | 
167 |     It's possible that native tool calling produces lower quality than custom tool calling.
168 | 
169 | ### Adapter Behavior
170 | 
171 | Different DSPy adapters have different defaults for native function calling:
172 | 
173 | - **`ChatAdapter`** - Uses `use_native_function_calling=False` by default (relies on text parsing)
174 | - **`JSONAdapter`** - Uses `use_native_function_calling=True` by default (uses native function calling)
175 | 
176 | You can override these defaults by explicitly setting the `use_native_function_calling` parameter when creating an adapter.
177 | 
178 | ### Configuration
179 | 
180 | ```python
181 | import dspy
182 | 
183 | # ChatAdapter with native function calling enabled
184 | chat_adapter_native = dspy.ChatAdapter(use_native_function_calling=True)
185 | 
186 | # JSONAdapter with native function calling disabled
187 | json_adapter_manual = dspy.JSONAdapter(use_native_function_calling=False)
188 | 
189 | # Configure DSPy to use the adapter
190 | dspy.configure(lm=dspy.LM(model="openai/gpt-4o"), adapter=chat_adapter_native)
191 | ```
192 | 
193 | You can enable [MLflow tracing](https://dspy.ai/tutorials/observability/) to check how native tool
194 | calling is being used. If you run the code snippet from [the section above](tools.md#basic-setup) with `JSONAdapter` or
195 | `ChatAdapter` with native function calling enabled, you should see that the native function calling argument `tools` is
196 | set, as in the screenshot below:
197 | 
198 | ![native tool calling](../figures/native_tool_call.png)
199 | 
200 | 
201 | ### Model Compatibility
202 | 
203 | Native function calling automatically detects model support using `litellm.supports_function_calling()`. If the model doesn't support native function calling, DSPy will fall back to manual text-based parsing even when `use_native_function_calling=True` is set.
204 | 
205 | ## Best Practices
206 | 
207 | ### 1. Tool Function Design
208 | 
209 | - **Clear docstrings**: Tools work better with descriptive documentation
210 | - **Type hints**: Provide clear parameter and return types
211 | - **Simple parameters**: Use basic types (str, int, bool, dict, list) or Pydantic models
212 | 
213 | ```python
214 | def good_tool(city: str, units: str = "celsius") -> str:
215 |     """
216 |     Get weather information for a specific city.
217 |     
218 |     Args:
219 |         city: The name of the city to get weather for
220 |         units: Temperature units, either 'celsius' or 'fahrenheit'
221 |     
222 |     Returns:
223 |         A string describing the current weather conditions
224 |     """
225 |     # Implementation with proper error handling
226 |     if not city.strip():
227 |         return "Error: City name cannot be empty"
228 |     
229 |     # Weather logic here...
230 |     return f"Weather in {city}: 25°{units[0].upper()}, sunny"
231 | ```
232 | 
233 | ### 2. Choosing Between ReAct and Manual Handling
234 | 
235 | **Use `dspy.ReAct` when:**
236 | 
237 | - You want automatic reasoning and tool selection
238 | - The task requires multiple tool calls
239 | - You need built-in error recovery
240 | - You want to focus on tool implementation rather than orchestration
241 | 
242 | **Use manual tool handling when:**
243 | 
244 | - You need precise control over tool execution
245 | - You want custom error handling logic
246 | - You want to minimize the latency
247 | - Your tool returns nothing (void function)
248 | 
249 | 
250 | Tools in DSPy provide a powerful way to extend language model capabilities beyond text generation. Whether using the fully automated ReAct approach or manual tool handling, you can build sophisticated agents that interact with the world through code.
251 | 
```
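
The "Model Compatibility" paragraph above describes a capability check via `litellm.supports_function_calling()`. A small sketch of performing the same check manually when picking an adapter, assuming a placeholder model name:

```python
import litellm
import dspy

model = "openai/gpt-4o"  # placeholder model name

if litellm.supports_function_calling(model=model):
    # Model supports native tool calling; JSONAdapter uses it by default.
    adapter = dspy.JSONAdapter()
else:
    # Fall back to text-based parsing of tool calls.
    adapter = dspy.ChatAdapter(use_native_function_calling=False)

dspy.configure(lm=dspy.LM(model), adapter=adapter)
```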

--------------------------------------------------------------------------------
/dspy/adapters/two_step_adapter.py:
--------------------------------------------------------------------------------

```python
  1 | from typing import Any
  2 | 
  3 | import json_repair
  4 | 
  5 | from dspy.adapters.base import Adapter
  6 | from dspy.adapters.chat_adapter import ChatAdapter
  7 | from dspy.adapters.types import ToolCalls
  8 | from dspy.adapters.utils import get_field_description_string
  9 | from dspy.clients import LM
 10 | from dspy.signatures.field import InputField
 11 | from dspy.signatures.signature import Signature, make_signature
 12 | 
 13 | """
 14 | NOTE/TODO/FIXME:
 15 | 
 16 | The main issue below is that the second step's signature is entirely created on the fly and is invoked with a chat
 17 | adapter explicitly constructed with no demonstrations. This means that it cannot "learn" or get optimized.
 18 | """
 19 | 
 20 | 
 21 | class TwoStepAdapter(Adapter):
 22 |     """
 23 |     A two-stage adapter that:
 24 |         1. Uses a simpler, more natural prompt for the main LM
 25 |         2. Uses a smaller LM with chat adapter to extract structured data from the response of main LM
 26 |     This adapter uses a common __call__ logic defined in base Adapter class.
 27 |     This class is particularly useful when interacting with reasoning models as the main LM since reasoning models
 28 |     are known to struggle with structured outputs.
 29 | 
 30 |     Example:
 31 |     ```
 32 |     import dspy
 33 |     lm = dspy.LM(model="openai/o3-mini", max_tokens=16000, temperature = 1.0)
 34 |     adapter = dspy.TwoStepAdapter(dspy.LM("openai/gpt-4o-mini"))
 35 |     dspy.configure(lm=lm, adapter=adapter)
 36 |     program = dspy.ChainOfThought("question->answer")
 37 |     result = program("What is the capital of France?")
 38 |     print(result)
 39 |     ```
 40 |     """
 41 | 
 42 |     def __init__(self, extraction_model: LM, **kwargs):
 43 |         super().__init__(**kwargs)
 44 |         if not isinstance(extraction_model, LM):
 45 |             raise ValueError("extraction_model must be an instance of LM")
 46 |         self.extraction_model = extraction_model
 47 | 
 48 |     def format(
 49 |         self, signature: type[Signature], demos: list[dict[str, Any]], inputs: dict[str, Any]
 50 |     ) -> list[dict[str, Any]]:
 51 |         """
 52 |         Format a prompt for the first stage with the main LM.
  53 |         Since no specific structure is required for the main LM, we customize the format method
 54 |         instead of format_field_description or format_field_structure.
 55 | 
 56 |         Args:
 57 |             signature: The signature of the original task
 58 |             demos: A list of demo examples
 59 |             inputs: The current input
 60 | 
 61 |         Returns:
 62 |             A list of messages to be passed to the main LM.
 63 |         """
 64 |         messages = []
 65 | 
 66 |         # Create a task description for the main LM
 67 |         task_description = self.format_task_description(signature)
 68 |         messages.append({"role": "system", "content": task_description})
 69 | 
 70 |         messages.extend(self.format_demos(signature, demos))
 71 | 
 72 |         # Format the current input
 73 |         messages.append({"role": "user", "content": self.format_user_message_content(signature, inputs)})
 74 | 
 75 |         return messages
 76 | 
 77 |     def parse(self, signature: Signature, completion: str) -> dict[str, Any]:
 78 |         """
 79 |         Use a smaller LM (extraction_model) with chat adapter to extract structured data
 80 |         from the raw completion text of the main LM.
 81 | 
 82 |         Args:
 83 |             signature: The signature of the original task
 84 |             completion: The completion from the main LM
 85 | 
 86 |         Returns:
 87 |             A dictionary containing the extracted structured data.
 88 |         """
 89 |         # The signature is supposed to be "text -> {original output fields}"
 90 |         extractor_signature = self._create_extractor_signature(signature)
 91 | 
 92 |         try:
 93 |             # Call the smaller LM to extract structured data from the raw completion text with ChatAdapter
 94 |             parsed_result = ChatAdapter()(
 95 |                 lm=self.extraction_model,
 96 |                 lm_kwargs={},
 97 |                 signature=extractor_signature,
 98 |                 demos=[],
 99 |                 inputs={"text": completion},
100 |             )
101 |             return parsed_result[0]
102 | 
103 |         except Exception as e:
104 |             raise ValueError(f"Failed to parse response from the original completion: {completion}") from e
105 | 
106 |     async def acall(
107 |         self,
108 |         lm: "LM",
109 |         lm_kwargs: dict[str, Any],
110 |         signature: type[Signature],
111 |         demos: list[dict[str, Any]],
112 |         inputs: dict[str, Any],
113 |     ) -> list[dict[str, Any]]:
114 |         inputs = self.format(signature, demos, inputs)
115 | 
116 |         outputs = await lm.acall(messages=inputs, **lm_kwargs)
117 |         # The signature is supposed to be "text -> {original output fields}"
118 |         extractor_signature = self._create_extractor_signature(signature)
119 | 
120 |         values = []
121 | 
122 |         tool_call_output_field_name = self._get_tool_call_output_field_name(signature)
123 |         for output in outputs:
124 |             output_logprobs = None
125 |             tool_calls = None
126 |             text = output
127 | 
128 |             if isinstance(output, dict):
129 |                 text = output["text"]
130 |                 output_logprobs = output.get("logprobs")
131 |                 tool_calls = output.get("tool_calls")
132 | 
133 |             try:
134 |                 # Call the smaller LM to extract structured data from the raw completion text with ChatAdapter
135 |                 value = await ChatAdapter().acall(
136 |                     lm=self.extraction_model,
137 |                     lm_kwargs={},
138 |                     signature=extractor_signature,
139 |                     demos=[],
140 |                     inputs={"text": text},
141 |                 )
142 |                 value = value[0]
143 | 
144 |             except Exception as e:
145 |                 raise ValueError(f"Failed to parse response from the original completion: {output}") from e
146 | 
147 |             if tool_calls and tool_call_output_field_name:
148 |                 tool_calls = [
149 |                     {
150 |                         "name": v["function"]["name"],
151 |                         "args": json_repair.loads(v["function"]["arguments"]),
152 |                     }
153 |                     for v in tool_calls
154 |                 ]
155 |                 value[tool_call_output_field_name] = ToolCalls.from_dict_list(tool_calls)
156 | 
157 |             if output_logprobs is not None:
158 |                 value["logprobs"] = output_logprobs
159 | 
160 |             values.append(value)
161 |         return values
162 | 
163 |     def format_task_description(self, signature: Signature) -> str:
164 |         """Create a description of the task based on the signature"""
165 |         parts = []
166 | 
167 |         parts.append("You are a helpful assistant that can solve tasks based on user input.")
168 |         parts.append("As input, you will be provided with:\n" + get_field_description_string(signature.input_fields))
169 |         parts.append("Your outputs must contain:\n" + get_field_description_string(signature.output_fields))
170 |         parts.append("You should lay out your outputs in detail so that your answer can be understood by another agent")
171 | 
172 |         if signature.instructions:
173 |             parts.append(f"Specific instructions: {signature.instructions}")
174 | 
175 |         return "\n".join(parts)
176 | 
177 |     def format_user_message_content(
178 |         self,
179 |         signature: type[Signature],
180 |         inputs: dict[str, Any],
181 |         prefix: str = "",
182 |         suffix: str = "",
183 |     ) -> str:
184 |         parts = [prefix]
185 | 
186 |         for name in signature.input_fields.keys():
187 |             if name in inputs:
188 |                 parts.append(f"{name}: {inputs.get(name, '')}")
189 | 
190 |         parts.append(suffix)
191 |         return "\n\n".join(parts).strip()
192 | 
193 |     def format_assistant_message_content(
194 |         self,
195 |         signature: type[Signature],
196 |         outputs: dict[str, Any],
197 |         missing_field_message: str | None = None,
198 |     ) -> str:
199 |         parts = []
200 | 
201 |         for name in signature.output_fields.keys():
202 |             if name in outputs:
203 |                 parts.append(f"{name}: {outputs.get(name, missing_field_message)}")
204 | 
205 |         return "\n\n".join(parts).strip()
206 | 
207 |     def _create_extractor_signature(
208 |         self,
209 |         original_signature: type[Signature],
210 |     ) -> type[Signature]:
211 |         """Create a new signature containing a new 'text' input field and all output fields.
212 | 
213 |         Args:
214 |             original_signature: The original signature to extract output fields from
215 | 
216 |         Returns:
217 |             A new Signature type with a text input field and all output fields
218 |         """
219 |         # Create new fields dict with 'text' input field and all output fields
220 |         new_fields = {
221 |             "text": (str, InputField()),
222 |             **{name: (field.annotation, field) for name, field in original_signature.output_fields.items()},
223 |         }
224 | 
225 |         outputs_str = ", ".join([f"`{field}`" for field in original_signature.output_fields])
226 |         instructions = f"The input is a text that should contain all the necessary information to produce the fields {outputs_str}. \
227 |             Your job is to extract the fields from the text verbatim. Extract precisely the appropriate value (content) for each field."
228 | 
229 |         return make_signature(new_fields, instructions)
230 | 
```

--------------------------------------------------------------------------------
/dspy/utils/parallelizer.py:
--------------------------------------------------------------------------------

```python
  1 | import contextlib
  2 | import copy
  3 | import logging
  4 | import signal
  5 | import sys
  6 | import threading
  7 | import time
  8 | import traceback
  9 | from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
 10 | 
 11 | import tqdm
 12 | 
 13 | logger = logging.getLogger(__name__)
 14 | 
 15 | 
 16 | class ParallelExecutor:
 17 |     def __init__(
 18 |         self,
 19 |         num_threads=None,
 20 |         max_errors=None,
 21 |         disable_progress_bar=False,
 22 |         provide_traceback=None,
 23 |         compare_results=False,
 24 |         timeout=120,
 25 |         straggler_limit=3,
 26 |     ):
 27 |         """
 28 |         Offers isolation between the tasks (dspy.settings) irrespective of whether num_threads == 1 or > 1.
  29 |         Also handles straggler timeouts.
 30 |         """
 31 |         from dspy.dsp.utils.settings import settings
 32 | 
 33 |         self.num_threads = num_threads or settings.num_threads
 34 |         self.max_errors = settings.max_errors if max_errors is None else max_errors
 35 |         self.disable_progress_bar = disable_progress_bar
 36 |         self.provide_traceback = provide_traceback if provide_traceback is not None else settings.provide_traceback
 37 |         self.compare_results = compare_results
 38 |         self.timeout = timeout
 39 |         self.straggler_limit = straggler_limit
 40 | 
 41 |         self.error_count = 0
 42 |         self.error_lock = threading.Lock()
 43 |         self.cancel_jobs = threading.Event()
 44 |         self.failed_indices = []
 45 |         self.exceptions_map = {}
 46 | 
 47 |     def execute(self, function, data):
 48 |         tqdm.tqdm._instances.clear()
 49 |         wrapped = self._wrap_function(function)
 50 |         return self._execute_parallel(wrapped, data)
 51 | 
 52 |     def _wrap_function(self, user_function):
 53 |         def safe_func(item):
 54 |             if self.cancel_jobs.is_set():
 55 |                 return None
 56 |             try:
 57 |                 return user_function(item)
 58 |             except Exception as e:
 59 |                 with self.error_lock:
 60 |                     self.error_count += 1
 61 |                     if self.error_count >= self.max_errors:
 62 |                         self.cancel_jobs.set()
 63 |                 if self.provide_traceback:
 64 |                     logger.error(f"Error for {item}: {e}\n{traceback.format_exc()}")
 65 |                 else:
 66 |                     logger.error(f"Error for {item}: {e}. Set `provide_traceback=True` for traceback.")
 67 |                 return e
 68 | 
 69 |         return safe_func
 70 | 
 71 |     def _execute_parallel(self, function, data):
 72 |         results = [None] * len(data)
 73 |         job_cancelled = "cancelled"
 74 | 
 75 |         # We resubmit at most once per item.
 76 |         start_time_map = {}
 77 |         start_time_lock = threading.Lock()
 78 |         resubmitted = set()
 79 | 
 80 |         # This is the worker function each thread will run.
 81 |         def worker(parent_overrides, submission_id, index, item):
 82 |             if self.cancel_jobs.is_set():
 83 |                 return index, job_cancelled
 84 |             # Record actual start time
 85 |             with start_time_lock:
 86 |                 start_time_map[submission_id] = time.time()
 87 | 
 88 |             # Apply parent's thread-local overrides
 89 |             from dspy.dsp.utils.settings import thread_local_overrides
 90 | 
 91 |             original = thread_local_overrides.get()
 92 |             token = thread_local_overrides.set({**original, **parent_overrides.copy()})
 93 |             if parent_overrides.get("usage_tracker"):
 94 |                 # Usage tracker needs to be deep copied across threads so that each thread tracks its own usage
 95 |                 thread_local_overrides.overrides["usage_tracker"] = copy.deepcopy(parent_overrides["usage_tracker"])
 96 | 
 97 |             try:
 98 |                 return index, function(item)
 99 |             finally:
100 |                 thread_local_overrides.reset(token)
101 | 
102 |         # Handle Ctrl-C in the main thread
103 |         @contextlib.contextmanager
104 |         def interrupt_manager():
105 |             if threading.current_thread() is threading.main_thread():
106 |                 orig_handler = signal.getsignal(signal.SIGINT)
107 | 
108 |                 def handler(sig, frame):
109 |                     self.cancel_jobs.set()
110 |                     logger.warning("SIGINT received. Cancelling.")
111 |                     orig_handler(sig, frame)
112 | 
113 |                 signal.signal(signal.SIGINT, handler)
114 |                 try:
115 |                     yield
116 |                 finally:
117 |                     signal.signal(signal.SIGINT, orig_handler)
118 |             else:
119 |                 yield
120 | 
121 |         executor = ThreadPoolExecutor(max_workers=self.num_threads)
122 |         try:
123 |             with interrupt_manager():
124 |                 from dspy.dsp.utils.settings import thread_local_overrides
125 | 
126 |                 parent_overrides = thread_local_overrides.get().copy()
127 | 
128 |                 futures_map = {}
129 |                 futures_set = set()
130 |                 submission_counter = 0
131 | 
132 |                 for idx, item in enumerate(data):
133 |                     f = executor.submit(worker, parent_overrides, submission_counter, idx, item)
134 |                     futures_map[f] = (submission_counter, idx, item)
135 |                     futures_set.add(f)
136 |                     submission_counter += 1
137 | 
138 |                 pbar = tqdm.tqdm(
139 |                     total=len(data),
140 |                     dynamic_ncols=True,
141 |                     disable=self.disable_progress_bar,
142 |                     file=sys.stdout,
143 |                 )
144 | 
145 |                 def all_done():
146 |                     return all(r is not None for r in results)
147 | 
148 |                 while futures_set and not self.cancel_jobs.is_set():
149 |                     if all_done():
150 |                         break
151 |                     done, not_done = wait(futures_set, timeout=1, return_when=FIRST_COMPLETED)
152 |                     for f in done:
153 |                         futures_set.remove(f)
154 |                         try:
155 |                             index, outcome = f.result()
156 |                         except Exception:
157 |                             pass
158 |                         else:
159 |                             if outcome != job_cancelled and results[index] is None:
160 |                                 # Check if this is an exception
161 |                                 if isinstance(outcome, Exception):
162 |                                     with self.error_lock:
163 |                                         self.failed_indices.append(index)
164 |                                         self.exceptions_map[index] = outcome
165 |                                     results[index] = None  # Keep None for failed examples
166 |                                 else:
167 |                                     results[index] = outcome
168 | 
169 |                             # Update progress
170 |                             if self.compare_results:
171 |                                 vals = [r[-1] for r in results if r is not None]
172 |                                 self._update_progress(pbar, sum(vals), len(vals))
173 |                             else:
174 |                                 self._update_progress(
175 |                                     pbar,
176 |                                     len([r for r in results if r is not None]),
177 |                                     len(data),
178 |                                 )
179 | 
180 |                     if all_done():
181 |                         break
182 | 
183 |                     # Check stragglers if few remain
184 |                     if 0 < self.timeout and len(not_done) <= self.straggler_limit:
185 |                         now = time.time()
186 |                         for f in list(not_done):
187 |                             if f not in resubmitted:
188 |                                 sid, idx, item = futures_map[f]
189 |                                 with start_time_lock:
190 |                                     st = start_time_map.get(sid, None)
191 |                                 if st and (now - st) >= self.timeout:
192 |                                     resubmitted.add(f)
193 |                                     nf = executor.submit(
194 |                                         worker,
195 |                                         parent_overrides,
196 |                                         submission_counter,
197 |                                         idx,
198 |                                         item,
199 |                                     )
200 |                                     futures_map[nf] = (submission_counter, idx, item)
201 |                                     futures_set.add(nf)
202 |                                     submission_counter += 1
203 | 
204 |                 pbar.close()
205 | 
206 |         finally:
207 |             # Avoid waiting on leftover tasks that no longer matter
208 |             executor.shutdown(wait=False)
209 | 
210 |         if self.cancel_jobs.is_set():
211 |             logger.warning("Execution cancelled due to errors or interruption.")
212 |             raise Exception("Execution cancelled due to errors or interruption.")
213 | 
214 |         return results
215 | 
216 |     def _update_progress(self, pbar, nresults, ntotal):
217 |         if self.compare_results:
218 |             pct = round(100 * nresults / ntotal, 1) if ntotal else 0
219 |             pbar.set_description(f"Average Metric: {nresults:.2f} / {ntotal} ({pct}%)")
220 |         else:
221 |             pbar.set_description(f"Processed {nresults} / {ntotal} examples")
222 |         pbar.update()
223 | 
```
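
A minimal usage sketch for the `ParallelExecutor` defined above; the `square` worker is a stand-in for the per-item callable that `Evaluate` and the optimizers pass in:

```python
from dspy.utils.parallelizer import ParallelExecutor

def square(x):
    # Stand-in for a per-item task (e.g. running a program on one example).
    return x * x

executor = ParallelExecutor(
    num_threads=4,
    disable_progress_bar=True,
    provide_traceback=True,  # log full tracebacks for failed items
)

# Results come back in input order; failed items are left as None and
# recorded in executor.failed_indices / executor.exceptions_map.
results = executor.execute(square, [1, 2, 3, 4])
print(results)  # [1, 4, 9, 16]
```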

--------------------------------------------------------------------------------
/tests/callback/test_callback.py:
--------------------------------------------------------------------------------

```python
  1 | import time
  2 | 
  3 | import pytest
  4 | 
  5 | import dspy
  6 | from dspy.utils.callback import ACTIVE_CALL_ID, BaseCallback, with_callbacks
  7 | from dspy.utils.dummies import DummyLM
  8 | 
  9 | 
 10 | @pytest.fixture(autouse=True)
 11 | def reset_settings():
 12 |     # Make sure the settings are reset after each test
 13 |     original_settings = dspy.settings.copy()
 14 | 
 15 |     yield
 16 | 
 17 |     dspy.settings.configure(**original_settings)
 18 | 
 19 | 
 20 | class MyCallback(BaseCallback):
 21 |     """A simple callback that records the calls."""
 22 | 
 23 |     def __init__(self):
 24 |         self.calls = []
 25 | 
 26 |     def on_module_start(self, call_id, instance, inputs):
 27 |         self.calls.append({"handler": "on_module_start", "instance": instance, "inputs": inputs})
 28 | 
 29 |     def on_module_end(self, call_id, outputs, exception):
 30 |         self.calls.append({"handler": "on_module_end", "outputs": outputs, "exception": exception})
 31 | 
 32 |     def on_lm_start(self, call_id, instance, inputs):
 33 |         self.calls.append({"handler": "on_lm_start", "instance": instance, "inputs": inputs})
 34 | 
 35 |     def on_lm_end(self, call_id, outputs, exception):
 36 |         self.calls.append({"handler": "on_lm_end", "outputs": outputs, "exception": exception})
 37 | 
 38 |     def on_adapter_format_start(self, call_id, instance, inputs):
 39 |         self.calls.append({"handler": "on_adapter_format_start", "instance": instance, "inputs": inputs})
 40 | 
 41 |     def on_adapter_format_end(self, call_id, outputs, exception):
 42 |         self.calls.append({"handler": "on_adapter_format_end", "outputs": outputs, "exception": exception})
 43 | 
 44 |     def on_adapter_parse_start(self, call_id, instance, inputs):
 45 |         self.calls.append({"handler": "on_adapter_parse_start", "instance": instance, "inputs": inputs})
 46 | 
 47 |     def on_adapter_parse_end(self, call_id, outputs, exception):
 48 |         self.calls.append({"handler": "on_adapter_parse_end", "outputs": outputs, "exception": exception})
 49 | 
 50 |     def on_tool_start(self, call_id, instance, inputs):
 51 |         self.calls.append({"handler": "on_tool_start", "instance": instance, "inputs": inputs})
 52 | 
 53 |     def on_tool_end(self, call_id, outputs, exception):
 54 |         self.calls.append({"handler": "on_tool_end", "outputs": outputs, "exception": exception})
 55 | 
 56 | 
 57 | @pytest.mark.parametrize(
 58 |     ("args", "kwargs"),
 59 |     [
 60 |         ([1, "2", 3.0], {}),
 61 |         ([1, "2"], {"z": 3.0}),
 62 |         ([1], {"y": "2", "z": 3.0}),
 63 |         ([], {"x": 1, "y": "2", "z": 3.0}),
 64 |     ],
 65 | )
 66 | def test_callback_injection(args, kwargs):
 67 |     class Target(dspy.Module):
 68 |         @with_callbacks
 69 |         def forward(self, x: int, y: str, z: float) -> int:
 70 |             time.sleep(0.1)
 71 |             return x + int(y) + int(z)
 72 | 
 73 |     callback = MyCallback()
 74 |     dspy.settings.configure(callbacks=[callback])
 75 | 
 76 |     target = Target()
 77 |     result = target.forward(*args, **kwargs)
 78 | 
 79 |     assert result == 6
 80 | 
 81 |     assert len(callback.calls) == 2
 82 |     assert callback.calls[0]["handler"] == "on_module_start"
 83 |     assert callback.calls[0]["inputs"] == {"x": 1, "y": "2", "z": 3.0}
 84 |     assert callback.calls[1]["handler"] == "on_module_end"
 85 |     assert callback.calls[1]["outputs"] == 6
 86 | 
 87 | 
 88 | def test_callback_injection_local():
 89 |     class Target(dspy.Module):
 90 |         @with_callbacks
 91 |         def forward(self, x: int, y: str, z: float) -> int:
 92 |             time.sleep(0.1)
 93 |             return x + int(y) + int(z)
 94 | 
 95 |     callback = MyCallback()
 96 | 
 97 |     target_1 = Target(callbacks=[callback])
 98 |     result = target_1.forward(1, "2", 3.0)
 99 | 
100 |     assert result == 6
101 | 
102 |     assert len(callback.calls) == 2
103 |     assert callback.calls[0]["handler"] == "on_module_start"
104 |     assert callback.calls[0]["inputs"] == {"x": 1, "y": "2", "z": 3.0}
105 |     assert callback.calls[1]["handler"] == "on_module_end"
106 |     assert callback.calls[1]["outputs"] == 6
107 | 
108 |     callback.calls = []
109 | 
110 |     target_2 = Target()
111 |     result = target_2.forward(1, "2", 3.0)
112 | 
113 |     # Other instance should not trigger the callback
114 |     assert not callback.calls
115 | 
116 | 
117 | def test_callback_error_handling():
118 |     class Target(dspy.Module):
119 |         @with_callbacks
120 |         def forward(self, x: int, y: str, z: float) -> int:
121 |             time.sleep(0.1)
122 |             raise ValueError("Error")
123 | 
124 |     callback = MyCallback()
125 |     dspy.settings.configure(callbacks=[callback])
126 | 
127 |     target = Target()
128 | 
129 |     with pytest.raises(ValueError, match="Error"):
130 |         target.forward(1, "2", 3.0)
131 | 
132 |     assert len(callback.calls) == 2
133 |     assert callback.calls[0]["handler"] == "on_module_start"
134 |     assert callback.calls[1]["handler"] == "on_module_end"
135 |     assert isinstance(callback.calls[1]["exception"], ValueError)
136 | 
137 | 
138 | def test_multiple_callbacks():
139 |     class Target(dspy.Module):
140 |         @with_callbacks
141 |         def forward(self, x: int, y: str, z: float) -> int:
142 |             time.sleep(0.1)
143 |             return x + int(y) + int(z)
144 | 
145 |     callback_1 = MyCallback()
146 |     callback_2 = MyCallback()
147 |     dspy.settings.configure(callbacks=[callback_1, callback_2])
148 | 
149 |     target = Target()
150 |     result = target.forward(1, "2", 3.0)
151 | 
152 |     assert result == 6
153 | 
154 |     assert len(callback_1.calls) == 2
155 |     assert len(callback_2.calls) == 2
156 | 
157 | 
158 | def test_callback_complex_module():
159 |     callback = MyCallback()
160 |     dspy.settings.configure(
161 |         lm=DummyLM({"How are you?": {"answer": "test output", "reasoning": "No more responses"}}),
162 |         callbacks=[callback],
163 |     )
164 | 
165 |     cot = dspy.ChainOfThought("question -> answer", n=3)
166 |     result = cot(question="How are you?")
167 |     assert result["answer"] == "test output"
168 |     assert result["reasoning"] == "No more responses"
169 | 
170 |     assert len(callback.calls) == 14
171 |     assert [call["handler"] for call in callback.calls] == [
172 |         "on_module_start",
173 |         "on_module_start",
174 |         "on_adapter_format_start",
175 |         "on_adapter_format_end",
176 |         "on_lm_start",
177 |         "on_lm_end",
178 |         # Parsing will run per output (n=3)
179 |         "on_adapter_parse_start",
180 |         "on_adapter_parse_end",
181 |         "on_adapter_parse_start",
182 |         "on_adapter_parse_end",
183 |         "on_adapter_parse_start",
184 |         "on_adapter_parse_end",
185 |         "on_module_end",
186 |         "on_module_end",
187 |     ]
188 | 
189 | @pytest.mark.asyncio
190 | async def test_callback_async_module():
191 |     callback = MyCallback()
192 |     with dspy.context(
193 |         lm=DummyLM({"How are you?": {"answer": "test output", "reasoning": "No more responses"}}),
194 |         callbacks=[callback],
195 |     ):
196 |         cot = dspy.ChainOfThought("question -> answer", n=3)
197 |         result = await cot.acall(question="How are you?")
198 |     assert result["answer"] == "test output"
199 |     assert result["reasoning"] == "No more responses"
200 | 
201 |     assert len(callback.calls) == 14
202 |     assert [call["handler"] for call in callback.calls] == [
203 |         "on_module_start",
204 |         "on_module_start",
205 |         "on_adapter_format_start",
206 |         "on_adapter_format_end",
207 |         "on_lm_start",
208 |         "on_lm_end",
209 |         # Parsing will run per output (n=3)
210 |         "on_adapter_parse_start",
211 |         "on_adapter_parse_end",
212 |         "on_adapter_parse_start",
213 |         "on_adapter_parse_end",
214 |         "on_adapter_parse_start",
215 |         "on_adapter_parse_end",
216 |         "on_module_end",
217 |         "on_module_end",
218 |     ]
219 | 
220 | 
221 | def test_tool_calls():
222 |     callback = MyCallback()
223 |     dspy.settings.configure(callbacks=[callback])
224 | 
225 |     def tool_1(query: str) -> str:
226 |         """A dummy tool function."""
227 |         return "result 1"
228 | 
229 |     def tool_2(query: str) -> str:
230 |         """Another dummy tool function."""
231 |         return "result 2"
232 | 
233 |     class MyModule(dspy.Module):
234 |         def __init__(self):
235 |             self.tools = [dspy.Tool(tool_1), dspy.Tool(tool_2)]
236 | 
237 |         def forward(self, query: str) -> str:
238 |             query = self.tools[0](query=query)
239 |             return self.tools[1](query=query)
240 | 
241 |     module = MyModule()
242 |     result = module("query")
243 | 
244 |     assert result == "result 2"
245 |     assert len(callback.calls) == 6
246 |     assert [call["handler"] for call in callback.calls] == [
247 |         "on_module_start",
248 |         "on_tool_start",
249 |         "on_tool_end",
250 |         "on_tool_start",
251 |         "on_tool_end",
252 |         "on_module_end",
253 |     ]
254 | 
255 | 
256 | def test_active_id():
257 |     # Test the call ID is generated and handled properly
258 |     class CustomCallback(BaseCallback):
259 |         def __init__(self):
260 |             self.parent_call_ids = []
261 |             self.call_ids = []
262 | 
263 |         def on_module_start(self, call_id, instance, inputs):
264 |             parent_call_id = ACTIVE_CALL_ID.get()
265 |             self.parent_call_ids.append(parent_call_id)
266 |             self.call_ids.append(call_id)
267 | 
268 |     class Parent(dspy.Module):
269 |         def __init__(self):
270 |             self.child_1 = Child()
271 |             self.child_2 = Child()
272 | 
273 |         def forward(self):
274 |             self.child_1()
275 |             self.child_2()
276 | 
277 |     class Child(dspy.Module):
278 |         def forward(self):
279 |             pass
280 | 
281 |     callback = CustomCallback()
282 |     dspy.settings.configure(callbacks=[callback])
283 | 
284 |     parent = Parent()
285 |     parent()
286 | 
287 |     assert len(callback.call_ids) == 3
288 |     # All three calls should have different call ids
289 |     assert len(set(callback.call_ids)) == 3
290 |     parent_call_id = callback.call_ids[0]
291 |     assert callback.parent_call_ids == [None, parent_call_id, parent_call_id]
292 | 
```

--------------------------------------------------------------------------------
/dspy/teleprompt/avatar_optimizer.py:
--------------------------------------------------------------------------------

```python
  1 | from concurrent.futures import ThreadPoolExecutor
  2 | from copy import deepcopy
  3 | from random import sample
  4 | from typing import Callable
  5 | 
  6 | from pydantic import BaseModel
  7 | from tqdm import tqdm
  8 | 
  9 | import dspy
 10 | from dspy.predict.avatar import ActionOutput
 11 | from dspy.teleprompt.teleprompt import Teleprompter
 12 | 
 13 | DEFAULT_MAX_EXAMPLES = 10
 14 | 
 15 | 
 16 | class EvalResult(BaseModel):
 17 |     example: dict
 18 |     score: float
 19 |     actions: list[ActionOutput] | None = None
 20 | 
 21 | 
 22 | class Comparator(dspy.Signature):
 23 |     """After executing the given actions on user inputs using the given instruction, some inputs have yielded good results, while others have not. I'll provide you with the inputs along with their corresponding evaluation metrics:
 24 | 
 25 | Task:
 26 | (1) Firstly, identify and contrast the patterns of inputs that have achieved good results with those that have not.
 27 | (2) Then, review the computational logic for any inconsistencies in the previous actions.
 28 | (3) Lastly, specify the modification in tools used that can lead to improved performance on the negative inputs."""
 29 | 
 30 |     instruction: str = dspy.InputField(
 31 |         prefix="Instruction: ",
 32 |         desc="Instruction for the actor to execute the task",
 33 |     )
 34 |     actions: list[str] = dspy.InputField(
 35 |         prefix="Actions: ",
 36 |         desc="Actions actor can take to complete the task",
 37 |     )
 38 |     pos_input_with_metrics: list[EvalResult] = dspy.InputField(
 39 |         prefix="Positive Inputs: ",
 40 |         desc="Positive inputs along with their score on an evaluation metric and actions taken",
 41 |     )
 42 |     neg_input_with_metrics: list[EvalResult] = dspy.InputField(
 43 |         prefix="Negative Inputs: ",
 44 |         desc="Negative inputs along with their score on an evaluation metric and actions taken",
 45 |     )
 46 |     feedback: str = dspy.OutputField(
 47 |         prefix="Feedback: ",
 48 |         desc="Feedback for the actor to improve the performance of negative inputs",
 49 |     )
 50 | 
 51 | 
 52 | class FeedbackBasedInstruction(dspy.Signature):
 53 |     """There is a task that needs to be completed for which one can use multiple tools to achieve the desired outcome. A group's performance was evaluated on a dataset of inputs: the inputs that did well are positive inputs, and the inputs that did not do well are negative inputs.
 54 | 
 55 | You received feedback on how the group can better use the tools to improve performance on the negative inputs. You have been provided with the previous instruction that they followed to use the tools to complete the task, and the feedback on their performance.
 56 | 
 57 | Your task is to incorporate the feedback and generate a detailed instruction for the group to follow to improve their performance on the task.
 58 | 
 59 | Make sure that the new instruction talks about how to use the tools effectively and should be no more than 3 paragraphs long. The previous instruction contains general guidelines that you must retain in the new instruction."""
 60 | 
 61 |     previous_instruction: str = dspy.InputField(
 62 |         prefix="Previous Instruction: ",
 63 |         desc="Previous instruction for the actor to execute the task",
 64 |     )
 65 |     feedback: str = dspy.InputField(
 66 |         prefix="Feedback: ",
 67 |         desc="Feedback for the actor to improve the performance of negative inputs",
 68 |     )
 69 |     new_instruction: str = dspy.OutputField(
 70 |         prefix="New Instruction: ",
 71 |         desc="New instruction for the actor to execute the task",
 72 |     )
 73 | 
 74 | 
 75 | class AvatarOptimizer(Teleprompter):
 76 |     def __init__(
 77 |         self,
 78 |         metric: Callable,
 79 |         max_iters: int = 10,
 80 |         lower_bound: int = 0,
 81 |         upper_bound: int = 1,
 82 |         max_positive_inputs: int | None = None,
 83 |         max_negative_inputs: int | None = None,
 84 |         optimize_for: str = "max",
 85 |     ):
 86 |         assert metric is not None, "`metric` argument cannot be None. Please provide a metric function."
 87 |         self.metric = metric
 88 |         self.optimize_for = optimize_for
 89 | 
 90 |         self.max_iters = max_iters
 91 | 
 92 |         self.lower_bound = lower_bound
 93 |         self.upper_bound = upper_bound
 94 | 
 95 |         self.max_positive_inputs = max_positive_inputs or DEFAULT_MAX_EXAMPLES
 96 |         self.max_negative_inputs = max_negative_inputs or DEFAULT_MAX_EXAMPLES
 97 | 
 98 |         self.comparator = dspy.TypedPredictor(Comparator)
 99 |         self.feedback_instruction = dspy.Predict(FeedbackBasedInstruction)
100 | 
101 |     def process_example(self, actor, example, return_outputs):
102 |         actor = deepcopy(actor)
103 | 
104 |         try:
105 |             prediction = actor(**example.inputs().toDict())
106 |             score = self.metric(example, prediction)
107 | 
108 |             if return_outputs:
109 |                 return example, prediction, score
110 |             else:
111 |                 return score
112 | 
113 |         except Exception as e:
114 |             print(e)
115 | 
116 |             if return_outputs:
117 |                 return example, None, 0
118 |             else:
119 |                 return 0
120 | 
121 | 
122 |     def thread_safe_evaluator(self, devset, actor, return_outputs=False, num_threads=None):
123 |         total_score = 0
124 |         total_examples = len(devset)
125 |         results = []
126 |         num_threads = num_threads or dspy.settings.num_threads
127 | 
128 |         with ThreadPoolExecutor(max_workers=num_threads) as executor:
129 |             futures = [executor.submit(self.process_example, actor, example, return_outputs) for example in devset]
130 | 
131 |             for future in tqdm(futures, total=total_examples, desc="Processing examples"):
132 |                 result = future.result()
133 |                 if return_outputs:
134 |                     example, prediction, score = result
135 |                     total_score += score
136 |                     results.append((example, prediction, score))
137 |                 else:
138 |                     total_score += result
139 | 
140 |         avg_metric = total_score / total_examples
141 | 
142 |         if return_outputs:
143 |             return avg_metric, results
144 |         else:
145 |             return avg_metric
146 | 
147 | 
148 |     def _get_pos_neg_results(
149 |         self,
150 |         actor: dspy.Module,
151 |         trainset: list[dspy.Example]
152 |     ) -> tuple[float, list[EvalResult], list[EvalResult]]:
153 |         pos_inputs = []
154 |         neg_inputs = []
155 | 
156 |         avg_score, results = self.thread_safe_evaluator(trainset, actor, return_outputs=True)
157 |         print(f"Average Score: {avg_score}")
158 | 
159 |         for example, prediction, score in results:
160 |             if score >= self.upper_bound:
161 |                 pos_inputs.append(
162 |                     EvalResult(
163 |                         example=example.inputs().toDict(),
164 |                         score=score,
165 |                         actions=prediction.actions if prediction else None
166 |                     )
167 |                 )
168 |             elif score <= self.lower_bound:
169 |                 neg_inputs.append(
170 |                     EvalResult(
171 |                         example=example.inputs().toDict(),
172 |                         score=score,
173 |                         actions=prediction.actions if prediction else None
174 |                     )
175 |                 )
176 | 
177 |         if len(pos_inputs) == 0:
178 |             raise ValueError("No positive examples found, try lowering the upper_bound or providing more training data")
179 |         if len(neg_inputs) == 0:
180 |             raise ValueError("No negative examples found, try raising the lower_bound or providing more training data")
181 | 
182 |         return (avg_score, pos_inputs, neg_inputs)
183 | 
184 | 
185 |     def compile(self, student, *, trainset):
186 |         best_actor = deepcopy(student)
187 |         best_score = -999 if self.optimize_for == "max" else 999
188 | 
189 |         for i in range(self.max_iters):
190 |             print(20*"=")
191 |             print(f"Iteration {i+1}/{self.max_iters}")
192 | 
193 |             score, pos_inputs, neg_inputs = self._get_pos_neg_results(best_actor, trainset)
194 |             print(f"Positive examples: {len(pos_inputs)}")
195 |             print(f"Negative examples: {len(neg_inputs)}")
196 |             print(f"Sampling {self.max_positive_inputs} positive examples and {self.max_negative_inputs} negative examples")
197 | 
198 |             if self.max_positive_inputs and len(pos_inputs) > self.max_positive_inputs:
199 |                 pos_inputs = sample(pos_inputs, self.max_positive_inputs)
200 | 
201 |             if self.max_negative_inputs and len(neg_inputs) > self.max_negative_inputs:
202 |                 neg_inputs = sample(neg_inputs, self.max_negative_inputs)
203 | 
204 |             feedback = self.comparator(
205 |                 instruction=best_actor.actor.signature.instructions,
206 |                 actions=[str(tool) for tool in best_actor.tools],
207 |                 pos_input_with_metrics=pos_inputs,
208 |                 neg_input_with_metrics=neg_inputs
209 |             ).feedback
210 | 
211 |             new_instruction = self.feedback_instruction(
212 |                 previous_instruction=best_actor.actor.signature.instructions,
213 |                 feedback=feedback
214 |             ).new_instruction
215 | 
216 |             print(f"Generated new instruction: {new_instruction}")
217 | 
218 |             if (self.optimize_for == "max" and best_score < score) or (self.optimize_for == "min" and best_score > score):
219 |                 best_actor.actor.signature = best_actor.actor.signature.with_instructions(new_instruction)
220 |                 best_actor.actor_clone = deepcopy(best_actor.actor)
221 |                 best_score = score
222 | 
223 |         print(f"Best Actor: {best_actor}")
224 | 
225 |         return best_actor
226 | 
```

--------------------------------------------------------------------------------
/docs/docs/tutorials/deployment/index.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Tutorial: Deploying your DSPy program
  2 | 
  3 | This guide demonstrates two potential ways to deploy your DSPy program in production: FastAPI for lightweight deployments and MLflow for more production-grade deployments with program versioning and management.
  4 | 
  5 | Below, we'll assume you have the following simple DSPy program that you want to deploy. You can replace this with something more sophisticated.
  6 | 
  7 | ```python
  8 | import dspy
  9 | 
 10 | dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
 11 | dspy_program = dspy.ChainOfThought("question -> answer")
 12 | ```
 13 | 
 14 | ## Deploying with FastAPI
 15 | 
 16 | FastAPI offers a straightforward way to serve your DSPy program as a REST API. This is ideal when you have direct access to your program code and need a lightweight deployment solution.
 17 | 
 18 | ```bash
 19 | > pip install fastapi uvicorn
 20 | > export OPENAI_API_KEY="your-openai-api-key"
 21 | ```
 22 | 
 23 | Let's create a FastAPI application to serve your `dspy_program` defined above.
 24 | 
 25 | ```python
 26 | from fastapi import FastAPI, HTTPException
 27 | from pydantic import BaseModel
 28 | 
 29 | import dspy
 30 | 
 31 | app = FastAPI(
 32 |     title="DSPy Program API",
 33 |     description="A simple API serving a DSPy Chain of Thought program",
 34 |     version="1.0.0"
 35 | )
 36 | 
 37 | # Define request model for better documentation and validation
 38 | class Question(BaseModel):
 39 |     text: str
 40 | 
 41 | # Configure your language model and 'asyncify' your DSPy program.
 42 | lm = dspy.LM("openai/gpt-4o-mini")
 43 | dspy.settings.configure(lm=lm, async_max_workers=4) # default is 8
 44 | dspy_program = dspy.ChainOfThought("question -> answer")
 45 | dspy_program = dspy.asyncify(dspy_program)
 46 | 
 47 | @app.post("/predict")
 48 | async def predict(question: Question):
 49 |     try:
 50 |         result = await dspy_program(question=question.text)
 51 |         return {
 52 |             "status": "success",
 53 |             "data": result.toDict()
 54 |         }
 55 |     except Exception as e:
 56 |         raise HTTPException(status_code=500, detail=str(e))
 57 | ```
 58 | 
 59 | In the code above, we call `dspy.asyncify` to convert the dspy program to run in async mode for high-throughput FastAPI
 60 | deployments. Currently, this runs the dspy program in a separate thread and awaits its result.
 61 | 
 62 | By default, the limit of spawned threads is 8. Think of this like a worker pool.
 63 | If you have 8 in-flight programs and call it once more, the 9th call will wait until one of the 8 returns.
 64 | You can configure the async capacity using the new `async_max_workers` setting.
 65 | 
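To get a feel for the worker-pool behavior, here is a minimal sketch (not part of the FastAPI app; it assumes the asyncified `dspy_program` defined above and an OpenAI API key in your environment) that fires several calls concurrently:

```python
import asyncio

async def run_many():
    # With async_max_workers=4 configured above, at most 4 of these
    # calls execute at once; the rest wait for a free worker.
    questions = [f"What is {i} + {i}?" for i in range(10)]
    results = await asyncio.gather(*(dspy_program(question=q) for q in questions))
    print([r.answer for r in results])

asyncio.run(run_many())
```
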
 66 | ??? "Streaming, in DSPy 2.6.0+"
 67 | 
 68 |     Streaming is also supported in DSPy 2.6.0+, which can be installed via `pip install -U dspy`.
 69 | 
 70 |     We can use `dspy.streamify` to convert the dspy program to a streaming mode. This is useful when you want to stream
 71 |     the intermediate outputs (e.g., o1-style reasoning) to the client before the final prediction is ready. This uses
 72 |     asyncify under the hood and inherits its execution semantics.
 73 | 
 74 |     ```python
 75 |     dspy_program = dspy.asyncify(dspy.ChainOfThought("question -> answer"))
 76 |     streaming_dspy_program = dspy.streamify(dspy_program)
 77 | 
 78 |     @app.post("/predict/stream")
 79 |     async def stream(question: Question):
 80 |         async def generate():
 81 |             async for value in streaming_dspy_program(question=question.text):
 82 |                 if isinstance(value, dspy.Prediction):
 83 |                     data = {"prediction": value.labels().toDict()}
 84 |                 elif isinstance(value, litellm.ModelResponse):
 85 |                     data = {"chunk": value.json()}
 86 |                 yield f"data: {ujson.dumps(data)}\n\n"
 87 |             yield "data: [DONE]\n\n"
 88 | 
 89 |         return StreamingResponse(generate(), media_type="text/event-stream")
 90 | 
 91 |     # Since you're often going to want to stream the result of a DSPy program as server-sent events,
 92 |     # we've included a helper function for that, which is equivalent to the code above.
 93 | 
 94 |     from dspy.utils.streaming import streaming_response
 95 | 
 96 |     @app.post("/predict/stream")
 97 |     async def stream(question: Question):
 98 |         stream = streaming_dspy_program(question=question.text)
 99 |         return StreamingResponse(streaming_response(stream), media_type="text/event-stream")
100 |     ```
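
    The snippet above also relies on a few imports that aren't shown; a minimal set (in addition to the FastAPI setup from earlier in this guide) is:

    ```python
    import litellm   # for litellm.ModelResponse
    import ujson     # for serializing the streamed chunks
    from fastapi.responses import StreamingResponse
    ```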
101 | 
102 | Write your code to a file, e.g., `fastapi_dspy.py`. Then you can serve the app with:
103 | 
104 | ```bash
105 | > uvicorn fastapi_dspy:app --reload
106 | ```
107 | 
108 | It will start a local server at `http://127.0.0.1:8000/`. You can test it with the python code below:
109 | 
110 | ```python
111 | import requests
112 | 
113 | response = requests.post(
114 |     "http://127.0.0.1:8000/predict",
115 |     json={"text": "What is the capital of France?"}
116 | )
117 | print(response.json())
118 | ```
119 | 
120 | You should see a response like the following:
121 | 
122 | ```json
123 | {
124 |   "status": "success",
125 |   "data": {
126 |     "reasoning": "The capital of France is a well-known fact, commonly taught in geography classes and referenced in various contexts. Paris is recognized globally as the capital city, serving as the political, cultural, and economic center of the country.",
127 |     "answer": "The capital of France is Paris."
128 |   }
129 | }
130 | ```
131 | 
132 | ## Deploying with MLflow
133 | 
134 | We recommend deploying with MLflow if you are looking to package your DSPy program and deploy it in an isolated environment.
135 | MLflow is a popular platform for managing machine learning workflows, including versioning, tracking, and deployment.
136 | 
137 | ```bash
138 | > pip install mlflow>=2.18.0
139 | ```
140 | 
141 | Let's spin up the MLflow tracking server, where we will store our DSPy program. The command below will start a local server at
142 | `http://127.0.0.1:5000/`.
143 | 
144 | ```bash
145 | > mlflow ui
146 | ```
147 | 
148 | Then we can define the DSPy program and log it to the MLflow server. "log" is an overloaded term in MLflow; here it means
149 | storing the program information, along with its environment requirements, on the MLflow server. This is done via the `mlflow.dspy.log_model()`
150 | function; see the code below:
151 | 
152 | > [!NOTE]
153 | > As of MLflow 2.22.0, there is a caveat that you must wrap your DSPy program in a custom DSPy Module class when deploying with MLflow.
154 | > This is because MLflow requires positional arguments while DSPy pre-built modules disallow positional arguments, e.g., `dspy.Predict`
155 | > or `dspy.ChainOfThought`. To work around this, create a wrapper class that inherits from `dspy.Module` and implement your program's
156 | > logic in the `forward()` method, as shown in the example below.
157 | 
158 | ```python
159 | import dspy
160 | import mlflow
161 | 
162 | mlflow.set_tracking_uri("http://127.0.0.1:5000/")
163 | mlflow.set_experiment("deploy_dspy_program")
164 | 
165 | lm = dspy.LM("openai/gpt-4o-mini")
166 | dspy.settings.configure(lm=lm)
167 | 
168 | class MyProgram(dspy.Module):
169 |     def __init__(self):
170 |         super().__init__()
171 |         self.cot = dspy.ChainOfThought("question -> answer")
172 | 
173 |     def forward(self, messages):
174 |         return self.cot(question=messages[0]["content"])
175 | 
176 | dspy_program = MyProgram()
177 | 
178 | with mlflow.start_run():
179 |     mlflow.dspy.log_model(
180 |         dspy_program,
181 |         "dspy_program",
182 |         input_example={"messages": [{"role": "user", "content": "What is LLM agent?"}]},
183 |         task="llm/v1/chat",
184 |     )
185 | ```
186 | 
187 | We recommend setting `task="llm/v1/chat"` so that the deployed program automatically accepts input and generates output in
188 | the same format as the OpenAI chat API, which is a common interface for LM applications. Write the code above into
189 | a file, e.g. `mlflow_dspy.py`, and run it.
190 | 
191 | After you have logged the program, you can view the saved information in the MLflow UI. Open `http://127.0.0.1:5000/`, select
192 | the `deploy_dspy_program` experiment, and then select the run you just created. Under the `Artifacts` tab, you should see the
193 | logged program information, similar to the following screenshot:
194 | 
195 | ![MLflow UI](./dspy_mlflow_ui.png)
196 | 
197 | Grab your run id from the UI (or from the console output when you execute `mlflow_dspy.py`); you can then deploy the logged program
198 | with the following command:
199 | 
200 | ```bash
201 | > mlflow models serve -m runs:/{run_id}/model -p 6000
202 | ```
203 | 
204 | After the program is deployed, you can test it with the following command:
205 | 
206 | ```bash
207 | > curl http://127.0.0.1:6000/invocations -H "Content-Type:application/json"  --data '{"messages": [{"content": "what is 2 + 2?", "role": "user"}]}'
208 | ```
209 | 
210 | You should see a response like the following:
211 | 
212 | ```json
213 | {
214 |   "choices": [
215 |     {
216 |       "index": 0,
217 |       "message": {
218 |         "role": "assistant",
219 |         "content": "{\"reasoning\": \"The question asks for the sum of 2 and 2. To find the answer, we simply add the two numbers together: 2 + 2 = 4.\", \"answer\": \"4\"}"
220 |       },
221 |       "finish_reason": "stop"
222 |     }
223 |   ]
224 | }
225 | ```
226 | 
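Since the assistant message `content` is itself a JSON string, a small client-side sketch (assuming the server from the previous step is still running on port 6000) can unpack the final answer:

```python
import json

import requests

response = requests.post(
    "http://127.0.0.1:6000/invocations",
    json={"messages": [{"role": "user", "content": "what is 2 + 2?"}]},
)
content = response.json()["choices"][0]["message"]["content"]
print(json.loads(content)["answer"])  # "4" in the example response above
```
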
227 | For a complete guide on deploying a DSPy program with MLflow and customizing the deployment, refer to the
228 | [MLflow documentation](https://mlflow.org/docs/latest/llms/dspy/index.html).
229 | 
230 | ### Best Practices for MLflow Deployment
231 | 
232 | 1. **Environment Management**: Always specify your Python dependencies in a `conda.yaml` or `requirements.txt` file (see the sketch after this list).
233 | 2. **Versioning**: Use meaningful tags and descriptions for your model versions.
234 | 3. **Input Validation**: Define clear input schemas and examples.
235 | 4. **Monitoring**: Set up proper logging and monitoring for production deployments.
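
As a sketch of the first point: most MLflow model flavors accept the standard `pip_requirements` argument at logging time, so assuming `mlflow.dspy.log_model()` follows the same convention, you can pin the environment explicitly instead of relying on auto-inference:

```python
with mlflow.start_run():
    mlflow.dspy.log_model(
        dspy_program,
        "dspy_program",
        input_example={"messages": [{"role": "user", "content": "What is LLM agent?"}]},
        task="llm/v1/chat",
        # Hypothetical pins -- adjust to the versions you actually run.
        pip_requirements=["dspy", "mlflow>=2.18.0"],
    )
```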
236 | 
237 | For production deployments, consider using MLflow with containerization:
238 | 
239 | ```bash
240 | > mlflow models build-docker -m "runs:/{run_id}/model" -n "dspy-program"
241 | > docker run -p 6000:8080 dspy-program
242 | ```
243 | 
244 | For a complete guide on production deployment options and best practices, refer to the
245 | [MLflow documentation](https://mlflow.org/docs/latest/llms/dspy/index.html).
246 | 
```
Page 6/17