genomoncology/biomcp # codebase.md

This is page 8 of 19. Use http://codebase.md/genomoncology/biomcp?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── actions
│   │   └── setup-python-env
│   │       └── action.yml
│   ├── dependabot.yml
│   └── workflows
│       ├── ci.yml
│       ├── deploy-docs.yml
│       ├── main.yml.disabled
│       ├── on-release-main.yml
│       └── validate-codecov-config.yml
├── .gitignore
├── .pre-commit-config.yaml
├── BIOMCP_DATA_FLOW.md
├── CHANGELOG.md
├── CNAME
├── codecov.yaml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── apis
│   │   ├── error-codes.md
│   │   ├── overview.md
│   │   └── python-sdk.md
│   ├── assets
│   │   ├── biomcp-cursor-locations.png
│   │   ├── favicon.ico
│   │   ├── icon.png
│   │   ├── logo.png
│   │   ├── mcp_architecture.txt
│   │   └── remote-connection
│   │       ├── 00_connectors.png
│   │       ├── 01_add_custom_connector.png
│   │       ├── 02_connector_enabled.png
│   │       ├── 03_connect_to_biomcp.png
│   │       ├── 04_select_google_oauth.png
│   │       └── 05_success_connect.png
│   ├── backend-services-reference
│   │   ├── 01-overview.md
│   │   ├── 02-biothings-suite.md
│   │   ├── 03-cbioportal.md
│   │   ├── 04-clinicaltrials-gov.md
│   │   ├── 05-nci-cts-api.md
│   │   ├── 06-pubtator3.md
│   │   └── 07-alphagenome.md
│   ├── blog
│   │   ├── ai-assisted-clinical-trial-search-analysis.md
│   │   ├── images
│   │   │   ├── deep-researcher-video.png
│   │   │   ├── researcher-announce.png
│   │   │   ├── researcher-drop-down.png
│   │   │   ├── researcher-prompt.png
│   │   │   ├── trial-search-assistant.png
│   │   │   └── what_is_biomcp_thumbnail.png
│   │   └── researcher-persona-resource.md
│   ├── changelog.md
│   ├── CNAME
│   ├── concepts
│   │   ├── 01-what-is-biomcp.md
│   │   ├── 02-the-deep-researcher-persona.md
│   │   └── 03-sequential-thinking-with-the-think-tool.md
│   ├── developer-guides
│   │   ├── 01-server-deployment.md
│   │   ├── 02-contributing-and-testing.md
│   │   ├── 03-third-party-endpoints.md
│   │   ├── 04-transport-protocol.md
│   │   ├── 05-error-handling.md
│   │   ├── 06-http-client-and-caching.md
│   │   ├── 07-performance-optimizations.md
│   │   └── generate_endpoints.py
│   ├── faq-condensed.md
│   ├── FDA_SECURITY.md
│   ├── genomoncology.md
│   ├── getting-started
│   │   ├── 01-quickstart-cli.md
│   │   ├── 02-claude-desktop-integration.md
│   │   └── 03-authentication-and-api-keys.md
│   ├── how-to-guides
│   │   ├── 01-find-articles-and-cbioportal-data.md
│   │   ├── 02-find-trials-with-nci-and-biothings.md
│   │   ├── 03-get-comprehensive-variant-annotations.md
│   │   ├── 04-predict-variant-effects-with-alphagenome.md
│   │   ├── 05-logging-and-monitoring-with-bigquery.md
│   │   └── 06-search-nci-organizations-and-interventions.md
│   ├── index.md
│   ├── policies.md
│   ├── reference
│   │   ├── architecture-diagrams.md
│   │   ├── quick-architecture.md
│   │   ├── quick-reference.md
│   │   └── visual-architecture.md
│   ├── robots.txt
│   ├── stylesheets
│   │   ├── announcement.css
│   │   └── extra.css
│   ├── troubleshooting.md
│   ├── tutorials
│   │   ├── biothings-prompts.md
│   │   ├── claude-code-biomcp-alphagenome.md
│   │   ├── nci-prompts.md
│   │   ├── openfda-integration.md
│   │   ├── openfda-prompts.md
│   │   ├── pydantic-ai-integration.md
│   │   └── remote-connection.md
│   ├── user-guides
│   │   ├── 01-command-line-interface.md
│   │   ├── 02-mcp-tools-reference.md
│   │   └── 03-integrating-with-ides-and-clients.md
│   └── workflows
│       └── all-workflows.md
├── example_scripts
│   ├── mcp_integration.py
│   └── python_sdk.py
├── glama.json
├── LICENSE
├── lzyank.toml
├── Makefile
├── mkdocs.yml
├── package-lock.json
├── package.json
├── pyproject.toml
├── README.md
├── scripts
│   ├── check_docs_in_mkdocs.py
│   ├── check_http_imports.py
│   └── generate_endpoints_doc.py
├── smithery.yaml
├── src
│   └── biomcp
│       ├── __init__.py
│       ├── __main__.py
│       ├── articles
│       │   ├── __init__.py
│       │   ├── autocomplete.py
│       │   ├── fetch.py
│       │   ├── preprints.py
│       │   ├── search_optimized.py
│       │   ├── search.py
│       │   └── unified.py
│       ├── biomarkers
│       │   ├── __init__.py
│       │   └── search.py
│       ├── cbioportal_helper.py
│       ├── circuit_breaker.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── articles.py
│       │   ├── biomarkers.py
│       │   ├── diseases.py
│       │   ├── health.py
│       │   ├── interventions.py
│       │   ├── main.py
│       │   ├── openfda.py
│       │   ├── organizations.py
│       │   ├── server.py
│       │   ├── trials.py
│       │   └── variants.py
│       ├── connection_pool.py
│       ├── constants.py
│       ├── core.py
│       ├── diseases
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── domain_handlers.py
│       ├── drugs
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── exceptions.py
│       ├── genes
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── http_client_simple.py
│       ├── http_client.py
│       ├── individual_tools.py
│       ├── integrations
│       │   ├── __init__.py
│       │   ├── biothings_client.py
│       │   └── cts_api.py
│       ├── interventions
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── logging_filter.py
│       ├── metrics_handler.py
│       ├── metrics.py
│       ├── openfda
│       │   ├── __init__.py
│       │   ├── adverse_events_helpers.py
│       │   ├── adverse_events.py
│       │   ├── cache.py
│       │   ├── constants.py
│       │   ├── device_events_helpers.py
│       │   ├── device_events.py
│       │   ├── drug_approvals.py
│       │   ├── drug_labels_helpers.py
│       │   ├── drug_labels.py
│       │   ├── drug_recalls_helpers.py
│       │   ├── drug_recalls.py
│       │   ├── drug_shortages_detail_helpers.py
│       │   ├── drug_shortages_helpers.py
│       │   ├── drug_shortages.py
│       │   ├── exceptions.py
│       │   ├── input_validation.py
│       │   ├── rate_limiter.py
│       │   ├── utils.py
│       │   └── validation.py
│       ├── organizations
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── parameter_parser.py
│       ├── prefetch.py
│       ├── query_parser.py
│       ├── query_router.py
│       ├── rate_limiter.py
│       ├── render.py
│       ├── request_batcher.py
│       ├── resources
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── instructions.md
│       │   └── researcher.md
│       ├── retry.py
│       ├── router_handlers.py
│       ├── router.py
│       ├── shared_context.py
│       ├── thinking
│       │   ├── __init__.py
│       │   ├── sequential.py
│       │   └── session.py
│       ├── thinking_tool.py
│       ├── thinking_tracker.py
│       ├── trials
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── nci_getter.py
│       │   ├── nci_search.py
│       │   └── search.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── cancer_types_api.py
│       │   ├── cbio_http_adapter.py
│       │   ├── endpoint_registry.py
│       │   ├── gene_validator.py
│       │   ├── metrics.py
│       │   ├── mutation_filter.py
│       │   ├── query_utils.py
│       │   ├── rate_limiter.py
│       │   └── request_cache.py
│       ├── variants
│       │   ├── __init__.py
│       │   ├── alphagenome.py
│       │   ├── cancer_types.py
│       │   ├── cbio_external_client.py
│       │   ├── cbioportal_mutations.py
│       │   ├── cbioportal_search_helpers.py
│       │   ├── cbioportal_search.py
│       │   ├── constants.py
│       │   ├── external.py
│       │   ├── filters.py
│       │   ├── getter.py
│       │   ├── links.py
│       │   └── search.py
│       └── workers
│           ├── __init__.py
│           ├── worker_entry_stytch.js
│           ├── worker_entry.js
│           └── worker.py
├── tests
│   ├── bdd
│   │   ├── cli_help
│   │   │   ├── help.feature
│   │   │   └── test_help.py
│   │   ├── conftest.py
│   │   ├── features
│   │   │   └── alphagenome_integration.feature
│   │   ├── fetch_articles
│   │   │   ├── fetch.feature
│   │   │   └── test_fetch.py
│   │   ├── get_trials
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── get_variants
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── search_articles
│   │   │   ├── autocomplete.feature
│   │   │   ├── search.feature
│   │   │   ├── test_autocomplete.py
│   │   │   └── test_search.py
│   │   ├── search_trials
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   ├── search_variants
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   └── steps
│   │       └── test_alphagenome_steps.py
│   ├── config
│   │   └── test_smithery_config.py
│   ├── conftest.py
│   ├── data
│   │   ├── ct_gov
│   │   │   ├── clinical_trials_api_v2.yaml
│   │   │   ├── trials_NCT04280705.json
│   │   │   └── trials_NCT04280705.txt
│   │   ├── myvariant
│   │   │   ├── myvariant_api.yaml
│   │   │   ├── myvariant_field_descriptions.csv
│   │   │   ├── variants_full_braf_v600e.json
│   │   │   ├── variants_full_braf_v600e.txt
│   │   │   └── variants_part_braf_v600_multiple.json
│   │   ├── openfda
│   │   │   ├── drugsfda_detail.json
│   │   │   ├── drugsfda_search.json
│   │   │   ├── enforcement_detail.json
│   │   │   └── enforcement_search.json
│   │   └── pubtator
│   │       ├── pubtator_autocomplete.json
│   │       └── pubtator3_paper.txt
│   ├── integration
│   │   ├── test_openfda_integration.py
│   │   ├── test_preprints_integration.py
│   │   ├── test_simple.py
│   │   └── test_variants_integration.py
│   ├── tdd
│   │   ├── articles
│   │   │   ├── test_autocomplete.py
│   │   │   ├── test_cbioportal_integration.py
│   │   │   ├── test_fetch.py
│   │   │   ├── test_preprints.py
│   │   │   ├── test_search.py
│   │   │   └── test_unified.py
│   │   ├── conftest.py
│   │   ├── drugs
│   │   │   ├── __init__.py
│   │   │   └── test_drug_getter.py
│   │   ├── openfda
│   │   │   ├── __init__.py
│   │   │   ├── test_adverse_events.py
│   │   │   ├── test_device_events.py
│   │   │   ├── test_drug_approvals.py
│   │   │   ├── test_drug_labels.py
│   │   │   ├── test_drug_recalls.py
│   │   │   ├── test_drug_shortages.py
│   │   │   └── test_security.py
│   │   ├── test_biothings_integration_real.py
│   │   ├── test_biothings_integration.py
│   │   ├── test_circuit_breaker.py
│   │   ├── test_concurrent_requests.py
│   │   ├── test_connection_pool.py
│   │   ├── test_domain_handlers.py
│   │   ├── test_drug_approvals.py
│   │   ├── test_drug_recalls.py
│   │   ├── test_drug_shortages.py
│   │   ├── test_endpoint_documentation.py
│   │   ├── test_error_scenarios.py
│   │   ├── test_europe_pmc_fetch.py
│   │   ├── test_mcp_integration.py
│   │   ├── test_mcp_tools.py
│   │   ├── test_metrics.py
│   │   ├── test_nci_integration.py
│   │   ├── test_nci_mcp_tools.py
│   │   ├── test_network_policies.py
│   │   ├── test_offline_mode.py
│   │   ├── test_openfda_unified.py
│   │   ├── test_pten_r173_search.py
│   │   ├── test_render.py
│   │   ├── test_request_batcher.py.disabled
│   │   ├── test_retry.py
│   │   ├── test_router.py
│   │   ├── test_shared_context.py.disabled
│   │   ├── test_unified_biothings.py
│   │   ├── thinking
│   │   │   ├── __init__.py
│   │   │   └── test_sequential.py
│   │   ├── trials
│   │   │   ├── test_backward_compatibility.py
│   │   │   ├── test_getter.py
│   │   │   └── test_search.py
│   │   ├── utils
│   │   │   ├── test_gene_validator.py
│   │   │   ├── test_mutation_filter.py
│   │   │   ├── test_rate_limiter.py
│   │   │   └── test_request_cache.py
│   │   ├── variants
│   │   │   ├── constants.py
│   │   │   ├── test_alphagenome_api_key.py
│   │   │   ├── test_alphagenome_comprehensive.py
│   │   │   ├── test_alphagenome.py
│   │   │   ├── test_cbioportal_mutations.py
│   │   │   ├── test_cbioportal_search.py
│   │   │   ├── test_external_integration.py
│   │   │   ├── test_external.py
│   │   │   ├── test_extract_gene_aa_change.py
│   │   │   ├── test_filters.py
│   │   │   ├── test_getter.py
│   │   │   ├── test_links.py
│   │   │   └── test_search.py
│   │   └── workers
│   │       └── test_worker_sanitization.js
│   └── test_pydantic_ai_integration.py
├── THIRD_PARTY_ENDPOINTS.md
├── tox.ini
├── uv.lock
└── wrangler.toml
```

# Files

--------------------------------------------------------------------------------
/docs/tutorials/nci-prompts.md:
--------------------------------------------------------------------------------

```markdown
  1 | # NCI Tools Example Prompts
  2 | 
  3 | This guide provides example prompts for AI assistants to effectively use the NCI (National Cancer Institute) Clinical Trials Search API tools in BioMCP.
  4 | 
  5 | ## Overview of NCI Tools
  6 | 
  7 | BioMCP integrates with the NCI Clinical Trials Search API to provide the following tools, alongside the biomarker and disease search tools covered later in this guide:
  8 | 
  9 | - **Organization Search & Lookup** - Find cancer research centers, hospitals, and trial sponsors
 10 | - **Intervention Search & Lookup** - Search for drugs, devices, procedures, and other interventions
 11 | 
 12 | All NCI tools require an API key, available from https://clinicaltrialsapi.cancer.gov/
 13 | 
 14 | ## Best Practices
 15 | 
 16 | ### API Key Required
 17 | 
 18 | The example prompts in this guide end with your NCI API key. Unless the `NCI_API_KEY` environment variable is set, add this to the end of each prompt:
 19 | 
 20 | ```
 21 | "... my NCI API key is YOUR_API_KEY"
 22 | ```
 23 | 
 24 | ### Location Searches
 25 | 
 26 | **ALWAYS use city AND state together** when searching organizations by location. Broad searches, such as a city or a state alone, can exceed the NCI API's Elasticsearch limits and fail with an error.
 27 | 
 28 | ✅ **Good**: `nci_organization_searcher(city="Cleveland", state="OH")`
 29 | ❌ **Bad**: `nci_organization_searcher(city="Cleveland")` or `nci_organization_searcher(state="OH")`
 30 | 
 31 | ### API Parameter Notes
 32 | 
 33 | - The NCI APIs do not support offset-based pagination (`from` parameter)
 34 | - Organization location parameters use `org_` prefix (e.g., `org_city`, `org_state_or_province`)
 35 | - When using `size` parameter, the API may not return a `total` count
 36 | 
 37 | ### Avoiding API Errors
 38 | 
 39 | - Use specific organization names when possible
 40 | - Combine multiple filters (name + type, city + state)
 41 | - Start with more specific searches, then broaden if needed
 42 | 
 43 | ## Organization Tools
 44 | 
 45 | ### Organization Search
 46 | 
 47 | #### Basic Organization Search
 48 | 
 49 | ```
 50 | "Find cancer centers in California, my NCI API key is YOUR_API_KEY"
 51 | "Search for MD Anderson Cancer Center, my NCI API key is YOUR_API_KEY"
 52 | "List academic cancer research centers in New York, my NCI API key is YOUR_API_KEY"
 53 | "Find all NCI-designated cancer centers, my NCI API key is YOUR_API_KEY"
 54 | ```
 55 | 
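 56 | **Expected tool usage**: `nci_organization_searcher(city="Los Angeles", state="CA", organization_type="Academic")`. For state-wide requests like these, ask the user for a city rather than searching by state alone.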
 57 | 
 58 | #### Organization by Location
 59 | 
 60 | **IMPORTANT**: Always use city AND state together to avoid API errors!
 61 | 
 62 | ```
 63 | "Show me cancer treatment centers in Boston, MA, my NCI API key is YOUR_API_KEY"
 64 | "Find clinical trial sites in Houston, Texas, my NCI API key is YOUR_API_KEY"
 65 | "List all cancer research organizations in Cleveland, OH, my NCI API key is YOUR_API_KEY"
 66 | "Search for industry sponsors in San Francisco, CA, my NCI API key is YOUR_API_KEY"
 67 | ```
 68 | 
 69 | **Expected tool usage**: `nci_organization_searcher(city="Boston", state="MA")` ✓
 70 | **Never use**: `nci_organization_searcher(city="Boston")` ✗ or `nci_organization_searcher(state="MA")` ✗
 71 | 
 72 | #### Organization by Type
 73 | 
 74 | ```
 75 | "Find all government cancer research facilities, my NCI API key is YOUR_API_KEY"
 76 | "List pharmaceutical companies running cancer trials, my NCI API key is YOUR_API_KEY"
 77 | "Show me academic medical centers conducting trials, my NCI API key is YOUR_API_KEY"
 78 | "Find community hospitals participating in cancer research, my NCI API key is YOUR_API_KEY"
 79 | ```
 80 | 
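 81 | **Expected tool usage**: `nci_organization_searcher(organization_type="Industry", city="San Francisco", state="CA")`. A type-only search can be too broad, so pair the type with a city and state.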
 82 | 
 83 | ### Organization Details
 84 | 
 85 | ```
 86 | "Get details about organization NCI-2011-03337, my NCI API key is YOUR_API_KEY"
 87 | "Show me contact information for this cancer center, my NCI API key is YOUR_API_KEY"
 88 | "What trials is this organization conducting? My NCI API key is YOUR_API_KEY"
 89 | "Give me the full profile of this research institution, my NCI API key is YOUR_API_KEY"
 90 | ```
 91 | 
 92 | **Expected tool usage**: `organization_getter(organization_id="NCI-2011-03337")`
 93 | 
 94 | ## Intervention Tools
 95 | 
 96 | ### Intervention Search
 97 | 
 98 | #### Drug Search
 99 | 
100 | ```
101 | "Find all trials using pembrolizumab, my NCI API key is YOUR_API_KEY"
102 | "Search for PD-1 inhibitor drugs in trials, my NCI API key is YOUR_API_KEY"
103 | "List all immunotherapy drugs being tested, my NCI API key is YOUR_API_KEY"
104 | "Find trials using Keytruda or similar drugs, my NCI API key is YOUR_API_KEY"
105 | ```
106 | 
107 | **Expected tool usage**: `nci_intervention_searcher(name="pembrolizumab", intervention_type="Drug")`
108 | 
109 | #### Device Search
110 | 
111 | ```
112 | "Search for medical devices in cancer trials, my NCI API key is YOUR_API_KEY"
113 | "Find trials using surgical robots, my NCI API key is YOUR_API_KEY"
114 | "List radiation therapy devices being tested, my NCI API key is YOUR_API_KEY"
115 | "Show me trials with diagnostic devices, my NCI API key is YOUR_API_KEY"
116 | ```
117 | 
118 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Device")`
119 | 
120 | #### Procedure Search
121 | 
122 | ```
123 | "Find surgical procedures in cancer trials, my NCI API key is YOUR_API_KEY"
124 | "Search for minimally invasive surgery trials, my NCI API key is YOUR_API_KEY"
125 | "List trials with radiation therapy procedures, my NCI API key is YOUR_API_KEY"
126 | "Show me trials testing new biopsy techniques, my NCI API key is YOUR_API_KEY"
127 | ```
128 | 
129 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Procedure")`
130 | 
131 | #### Other Interventions
132 | 
133 | ```
134 | "Find behavioral interventions for cancer patients, my NCI API key is YOUR_API_KEY"
135 | "Search for dietary interventions in trials, my NCI API key is YOUR_API_KEY"
136 | "List genetic therapy trials, my NCI API key is YOUR_API_KEY"
137 | "Show me trials with exercise interventions, my NCI API key is YOUR_API_KEY"
138 | ```
139 | 
140 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Behavioral")`
141 | 
142 | ### Intervention Details
143 | 
144 | ```
145 | "Get full details about intervention INT123456, my NCI API key is YOUR_API_KEY"
146 | "Show me the mechanism of action for this drug, my NCI API key is YOUR_API_KEY"
147 | "Is this intervention FDA approved? My NCI API key is YOUR_API_KEY"
148 | "What trials are using this intervention? My NCI API key is YOUR_API_KEY"
149 | ```
150 | 
151 | **Expected tool usage**: `intervention_getter(intervention_id="INT123456")`
152 | 
153 | ## Biomarker Tools
154 | 
155 | ### Biomarker Search
156 | 
157 | #### Basic Biomarker Search
158 | 
159 | ```
160 | "Find PD-L1 expression biomarkers, my NCI API key is YOUR_API_KEY"
161 | "Search for EGFR mutations used in trials, my NCI API key is YOUR_API_KEY"
162 | "List biomarkers tested by IHC, my NCI API key is YOUR_API_KEY"
163 | "Find HER2 positive biomarkers, my NCI API key is YOUR_API_KEY"
164 | ```
165 | 
166 | **Expected tool usage**: `nci_biomarker_searcher(name="PD-L1")`
167 | 
168 | #### Biomarker by Type
169 | 
170 | ```
171 | "Show me all reference gene biomarkers, my NCI API key is YOUR_API_KEY"
172 | "Find branch biomarkers, my NCI API key is YOUR_API_KEY"
173 | "List all biomarkers of type reference_gene, my NCI API key is YOUR_API_KEY"
174 | ```
175 | 
176 | **Expected tool usage**: `nci_biomarker_searcher(biomarker_type="reference_gene")`
177 | 
178 | #### Important Note on Biomarker Types
179 | 
180 | The NCI API only supports two biomarker types:
181 | 
182 | - `reference_gene`: Gene-based biomarkers
183 | - `branch`: Branch/pathway biomarkers
184 | 
185 | Note: The API does NOT support searching by gene symbol or assay type directly.
186 | 
187 | ## NCI Disease Tools
188 | 
189 | ### Disease Search
190 | 
191 | #### Basic Disease Search
192 | 
193 | ```
194 | "Find melanoma in NCI vocabulary, my NCI API key is YOUR_API_KEY"
195 | "Search for lung cancer types, my NCI API key is YOUR_API_KEY"
196 | "List breast cancer subtypes, my NCI API key is YOUR_API_KEY"
197 | "Find the official name for GIST, my NCI API key is YOUR_API_KEY"
198 | ```
199 | 
200 | **Expected tool usage**: `nci_disease_searcher(name="melanoma")`
201 | 
202 | #### Disease with Synonyms
203 | 
204 | ```
205 | "Find all names for gastrointestinal stromal tumor, my NCI API key is YOUR_API_KEY"
206 | "Search for NSCLC and all its synonyms, my NCI API key is YOUR_API_KEY"
207 | "List all terms for triple-negative breast cancer, my NCI API key is YOUR_API_KEY"
208 | "Find alternative names for melanoma, my NCI API key is YOUR_API_KEY"
209 | ```
210 | 
211 | **Expected tool usage**: `nci_disease_searcher(name="GIST", include_synonyms=True)`
212 | 
213 | ## Combined Workflows
214 | 
215 | ### Finding Trials at Specific Centers
216 | 
217 | ```
218 | "First find cancer centers in Los Angeles, CA, then show me their trials, my NCI API key is YOUR_API_KEY"
219 | ```
220 | 
221 | **Expected workflow**:
222 | 
223 | 1. `nci_organization_searcher(city="Los Angeles", state="CA")`
224 | 2. For each organization, search trials with that sponsor
225 | 
226 | ### Drug Development Pipeline
227 | 
228 | ```
229 | "Search for CAR-T cell therapies and show me which organizations are developing them, my NCI API key is YOUR_API_KEY"
230 | ```
231 | 
232 | **Expected workflow**:
233 | 
234 | 1. `nci_intervention_searcher(name="CAR-T", intervention_type="Biological")`
235 | 2. For each intervention, get details to see associated trials
236 | 3. Extract organization information from trial data
237 | 
238 | ### Regional Cancer Research
239 | 
240 | ```
241 | "What cancer drugs are being tested in Boston area hospitals? My NCI API key is YOUR_API_KEY"
242 | ```
243 | 
244 | **Expected workflow**:
245 | 
246 | 1. `nci_organization_searcher(city="Boston", state="MA")`
247 | 2. `trial_searcher(location="Boston, MA", source="nci")` with organization filters
248 | 3. Extract intervention information from trials
249 | 
250 | ## Important Notes
251 | 
252 | ### API Key Handling
253 | 
254 | All NCI tools require an API key, which they accept from:
255 | 
256 | 1. The `api_key` argument passed to the tool
257 | 2. The `NCI_API_KEY` environment variable, used when no `api_key` argument is given
258 | 3. A key the user includes in their message (e.g., "my NCI API key is..."), which the assistant passes to the tool as `api_key`
259 | 
260 | ### Synonym Support
261 | 
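262 | The intervention searcher still accepts a `synonyms` parameter (default: True) for backward compatibility, but the search code ignores it. Searches are therefore **not** expanded to the items below, so search by the generic drug name or, where OR queries are supported, list the alternatives (e.g., "pembrolizumab OR Keytruda"):
263 | 
264 | - Drug trade names (e.g., "Keytruda" does not automatically find "pembrolizumab")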
265 | - Alternative spellings
266 | - Related terms
267 | 
268 | ### Pagination
269 | 
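270 | The search tools accept two pagination parameters, but the NCI API does not support offset-based pagination, so `page` may not advance through results (the intervention searcher, for example, forwards only `page_size`, as `size`):
271 | 
272 | - `page`: Page number (1-based)
273 | - `page_size`: Results per page (max 100); when set, the API may not return a `total` count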
274 | 
275 | ### Organization Types
276 | 
277 | Valid organization types include:
278 | 
279 | - Academic
280 | - Industry
281 | - Government
282 | - Community
283 | - Network
284 | - Other
285 | 
286 | ### Intervention Types
287 | 
288 | Valid intervention types include:
289 | 
290 | - Drug
291 | - Device
292 | - Biological
293 | - Procedure
294 | - Radiation
295 | - Behavioral
296 | - Genetic
297 | - Dietary
298 | - Diagnostic Test
299 | - Other
300 | 
301 | ## Error Handling
302 | 
303 | Common errors and solutions:
304 | 
305 | 1. **"NCI API key required"**: Provide an API key in the prompt or set the `NCI_API_KEY` environment variable
306 | 2. **"No results found"**: Try broader search terms or remove filters
307 | 3. **"Invalid organization/intervention ID"**: Verify the ID format
308 | 4. **Rate limiting**: The API enforces rate limits; wait before retrying
309 | 5. **"Search Too Broad" (Elasticsearch error)**: The search matches too many records
310 |    - This happens when searching with broad criteria
311 |    - **Prevention**: Always use city AND state together for location searches
312 |    - Add an organization name (even a partial one) to narrow results
313 |    - Avoid searching by state alone or organization type alone
314 | 
```
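
The key-handling and location rules in the NCI prompts guide can be checked before a tool call is made. The following is a minimal sketch, not part of BioMCP: `resolve_api_key`, `check_location_filters`, `prepare_org_search`, and `NCIRequestError` are hypothetical names, and the sketch assumes the lookup order described under "API Key Handling" (explicit argument first, then the `NCI_API_KEY` environment variable).

```python
from __future__ import annotations

import os

# Hypothetical helpers (not BioMCP APIs) that apply two rules from the guide
# before an NCI organization search is sent.


class NCIRequestError(ValueError):
    """Raised when a request would break one of the guide's usage rules."""


def resolve_api_key(api_key: str | None = None) -> str:
    """Prefer the explicit argument, then fall back to NCI_API_KEY."""
    key = api_key or os.environ.get("NCI_API_KEY")
    if not key:
        raise NCIRequestError("NCI API key required")
    return key


def check_location_filters(city: str | None, state: str | None) -> None:
    """Reject searches that give a city without a state, or vice versa."""
    if bool(city) != bool(state):
        raise NCIRequestError(
            "Use city AND state together; one location field alone is too broad"
        )


def prepare_org_search(
    city: str | None = None,
    state: str | None = None,
    api_key: str | None = None,
) -> dict[str, str]:
    """Validate inputs and build keyword arguments for a search call."""
    check_location_filters(city, state)
    kwargs = {"api_key": resolve_api_key(api_key)}
    if city and state:
        kwargs["city"] = city
        kwargs["state"] = state
    return kwargs
```

With `NCI_API_KEY` set, `prepare_org_search(city="Boston", state="MA")` returns the key plus both location fields, while `prepare_org_search(state="MA")` raises `NCIRequestError` instead of sending the state-only search the guide warns against.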

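Because the `synonyms` flag is ignored, alternative drug names have to be spelled out, for example as an OR query. The intervention search module listed next includes `search_interventions_with_or`, whose docstring says it makes one API call per OR term and combines the results. The sketch here illustrates that fan-out under stated assumptions: `split_or_query` and `search_with_or` are hypothetical stand-ins (the module's real `parse_or_query` helper and merge rules are not shown on this page), records are de-duplicated by `id`, and `search` is any async function returning a dict with an `interventions` list.

```python
from __future__ import annotations

import asyncio
import re
from collections.abc import Awaitable, Callable
from typing import Any

# Hypothetical sketch: BioMCP's search_interventions_with_or may split and
# merge differently. `search` stands in for a single-term intervention search.
SearchFn = Callable[[str], Awaitable[dict[str, Any]]]


def split_or_query(query: str) -> list[str]:
    """Split on the uppercase OR keyword, dropping blanks and repeats."""
    seen: set[str] = set()
    terms: list[str] = []
    for raw in re.split(r"\s+OR\s+", query):
        term = raw.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms


def _record_key(record: dict[str, Any]) -> str:
    # Same ID fallbacks as format_intervention_results, plus the name.
    return str(
        record.get("id") or record.get("intervention_id") or record.get("name")
    )


async def search_with_or(query: str, search: SearchFn) -> dict[str, Any]:
    """Run one search per OR term concurrently and merge results by ID."""
    terms = split_or_query(query)
    responses = await asyncio.gather(*(search(term) for term in terms))
    merged: dict[str, dict[str, Any]] = {}
    for response in responses:
        for record in response.get("interventions", []):
            # Keep the first copy of each record; later duplicates are skipped.
            merged.setdefault(_record_key(record), record)
    interventions = list(merged.values())
    return {
        "interventions": interventions,
        "total": len(interventions),
        "search_terms": terms,
    }
```
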
--------------------------------------------------------------------------------
/src/biomcp/interventions/search.py:
--------------------------------------------------------------------------------

```python
  1 | """Search functionality for interventions via NCI CTS API."""
  2 | 
  3 | import logging
  4 | from typing import Any
  5 | 
  6 | from ..constants import NCI_INTERVENTIONS_URL
  7 | from ..integrations.cts_api import CTSAPIError, make_cts_request
  8 | from ..utils import parse_or_query
  9 | 
 10 | logger = logging.getLogger(__name__)
 11 | 
 12 | 
 13 | # Intervention types based on ClinicalTrials.gov categories
 14 | INTERVENTION_TYPES = [
 15 |     "Drug",
 16 |     "Device",
 17 |     "Biological",
 18 |     "Procedure",
 19 |     "Radiation",
 20 |     "Behavioral",
 21 |     "Genetic",
 22 |     "Dietary",
 23 |     "Diagnostic Test",
 24 |     "Other",
 25 | ]
 26 | 
 27 | 
 28 | def _build_intervention_params(
 29 |     name: str | None,
 30 |     intervention_type: str | None,
 31 |     category: str | None,
 32 |     codes: list[str] | None,
 33 |     include: list[str] | None,
 34 |     sort: str | None,
 35 |     order: str | None,
 36 |     page_size: int | None,
 37 | ) -> dict[str, Any]:
 38 |     """Build query parameters for intervention search."""
 39 |     params: dict[str, Any] = {}
 40 | 
 41 |     if name:
 42 |         params["name"] = name
 43 | 
 44 |     if intervention_type:
 45 |         params["type"] = intervention_type.lower()
 46 | 
 47 |     if category:
 48 |         params["category"] = category
 49 | 
 50 |     if codes:
 51 |         params["codes"] = ",".join(codes) if isinstance(codes, list) else codes
 52 | 
 53 |     if include:
 54 |         params["include"] = (
 55 |             ",".join(include) if isinstance(include, list) else include
 56 |         )
 57 | 
 58 |     if sort:
 59 |         params["sort"] = sort
 60 |         if order:
 61 |             params["order"] = order.lower()
 62 | 
 63 |     # Only add size if explicitly requested and > 0
 64 |     if page_size and page_size > 0:
 65 |         params["size"] = page_size
 66 | 
 67 |     return params
 68 | 
 69 | 
 70 | def _process_intervention_response(
 71 |     response: Any,
 72 |     page: int,
 73 |     page_size: int | None,
 74 | ) -> dict[str, Any]:
 75 |     """Process intervention search response."""
 76 |     if isinstance(response, dict):
 77 |         # Standard response format from the API
 78 |         interventions = response.get("data", [])
 79 |         # When size parameter is used, API doesn't return 'total'
 80 |         total = response.get("total", len(interventions))
 81 |     elif isinstance(response, list):
 82 |         # Direct list of interventions
 83 |         interventions = response
 84 |         total = len(interventions)
 85 |     else:
 86 |         # Unexpected response format
 87 |         logger.warning(f"Unexpected response type: {type(response)}")
 88 |         interventions = []
 89 |         total = 0
 90 | 
 91 |     return {
 92 |         "interventions": interventions,
 93 |         "total": total,
 94 |         "page": page,
 95 |         "page_size": page_size,
 96 |     }
 97 | 
 98 | 
 99 | async def search_interventions(
100 |     name: str | None = None,
101 |     intervention_type: str | None = None,
102 |     category: str | None = None,
103 |     codes: list[str] | None = None,
104 |     include: list[str] | None = None,
105 |     sort: str | None = None,
106 |     order: str | None = None,
107 |     synonyms: bool = True,  # Kept for backward compatibility but ignored
108 |     page_size: int | None = None,
109 |     page: int = 1,
110 |     api_key: str | None = None,
111 | ) -> dict[str, Any]:
112 |     """
113 |     Search for interventions in the NCI CTS database.
114 | 
115 |     Args:
116 |         name: Intervention name to search for (partial match)
117 |         intervention_type: Type of intervention (Drug, Device, Procedure, etc.)
118 |         category: Category filter (agent, agent category, other)
119 |         codes: List of intervention codes to search for (e.g., ["C82416", "C171257"])
120 |         include: Fields to include in response (all fields, name, category, codes, etc.)
121 |         sort: Sort field (default: 'name', also supports 'count')
122 |         order: Sort order ('asc' or 'desc', required when using sort)
123 |         synonyms: [Deprecated] Kept for backward compatibility but ignored
124 |         page_size: Number of results per page (when used, the API may omit 'total')
125 |         page: Page number, returned as-is (not sent; the API lacks offset pagination)
126 |         api_key: Optional API key (if not provided, uses NCI_API_KEY env var)
127 | 
128 |     Returns:
129 |         Dictionary with search results containing:
130 |         - interventions: List of intervention records
131 |         - total: API-reported total, or len(interventions) when the API omits it
132 |         - page: Current page
133 |         - page_size: Results per page
134 | 
135 |     Raises:
136 |         CTSAPIError: If the API request fails
137 |     """
138 |     # Build query parameters
139 |     params = _build_intervention_params(
140 |         name,
141 |         intervention_type,
142 |         category,
143 |         codes,
144 |         include,
145 |         sort,
146 |         order,
147 |         page_size,
148 |     )
149 | 
150 |     logger.info(
151 |         f"Searching interventions at {NCI_INTERVENTIONS_URL} with params: {params}"
152 |     )
153 | 
154 |     try:
155 |         # Make API request
156 |         response = await make_cts_request(
157 |             url=NCI_INTERVENTIONS_URL,
158 |             params=params,
159 |             api_key=api_key,
160 |         )
161 | 
162 |         # Log response info
163 |         logger.debug(f"Response type: {type(response)}")
164 | 
165 |         # Process response
166 |         return _process_intervention_response(response, page, page_size)
167 | 
168 |     except CTSAPIError:
169 |         raise
170 |     except Exception as e:
171 |         logger.error(f"Failed to search interventions: {e}")
172 |         raise CTSAPIError(f"Intervention search failed: {e!s}") from e
173 | 
174 | 
175 | def format_intervention_results(results: dict[str, Any]) -> str:
176 |     """
177 |     Format intervention search results as markdown.
178 | 
179 |     Args:
180 |         results: Search results dictionary
181 | 
182 |     Returns:
183 |         Formatted markdown string
184 |     """
185 |     interventions = results.get("interventions", [])
186 |     total = results.get("total", 0)
187 | 
188 |     if not interventions:
189 |         return "No interventions found matching the search criteria."
190 | 
191 |     # Build markdown output
192 |     actual_count = len(interventions)
193 |     if actual_count < total:
194 |         lines = [
195 |             f"## Intervention Search Results (showing {actual_count} of {total} found)",
196 |             "",
197 |         ]
198 |     else:
199 |         lines = [
200 |             f"## Intervention Search Results ({total} found)",
201 |             "",
202 |         ]
203 | 
204 |     for intervention in interventions:
205 |         int_id = intervention.get(
206 |             "id", intervention.get("intervention_id", "Unknown")
207 |         )
208 |         name = intervention.get("name", "Unknown Intervention")
209 |         int_type = intervention.get(
210 |             "type", intervention.get("category", "Unknown")
211 |         )
212 | 
213 |         lines.append(f"### {name}")
214 |         lines.append(f"- **ID**: {int_id}")
215 |         lines.append(f"- **Type**: {int_type}")
216 | 
217 |         # Add synonyms if available
218 |         synonyms = intervention.get("synonyms", [])
219 |         if synonyms:
220 |             if isinstance(synonyms, list):
221 |                 lines.append(f"- **Synonyms**: {', '.join(synonyms[:5])}")
222 |                 if len(synonyms) > 5:
223 |                     lines.append(f"  *(and {len(synonyms) - 5} more)*")
224 |             elif isinstance(synonyms, str):
225 |                 lines.append(f"- **Synonyms**: {synonyms}")
226 | 
227 |         # Add description if available
228 |         if intervention.get("description"):
229 |             desc = intervention["description"]
230 |             if len(desc) > 200:
231 |                 desc = desc[:197] + "..."
232 |             lines.append(f"- **Description**: {desc}")
233 | 
234 |         lines.append("")
235 | 
236 |     return "\n".join(lines)
237 | 
238 | 
239 | async def search_interventions_with_or(
240 |     name_query: str,
241 |     intervention_type: str | None = None,
242 |     category: str | None = None,
243 |     codes: list[str] | None = None,
244 |     include: list[str] | None = None,
245 |     sort: str | None = None,
246 |     order: str | None = None,
247 |     synonyms: bool = True,
248 |     page_size: int | None = None,
249 |     page: int = 1,
250 |     api_key: str | None = None,
251 | ) -> dict[str, Any]:
252 |     """
253 |     Search for interventions with OR query support.
254 | 
255 |     This function handles OR queries by making multiple API calls and combining results.
256 |     For example: "pembrolizumab OR nivolumab" will search for each term.
257 | 
258 |     Args:
259 |         name_query: Name query that may contain OR operators
260 |         Other args same as search_interventions
261 | 
262 |     Returns:
263 |         Combined results from all searches with duplicates removed
264 |     """
265 |     # Check if this is an OR query
266 |     if " OR " in name_query or " or " in name_query:
267 |         search_terms = parse_or_query(name_query)
268 |         logger.info(f"Parsed OR query into terms: {search_terms}")
269 |     else:
270 |         # Single term search
271 |         search_terms = [name_query]
272 | 
273 |     # Collect all unique interventions
274 |     all_interventions = {}
275 |     total_found = 0
276 | 
277 |     # Search each term from page 1, fetching enough rows to paginate the merged list once
278 |     for term in search_terms:
279 |         logger.info(f"Searching interventions for term: {term}")
280 |         try:
281 |             results = await search_interventions(
282 |                 name=term,
283 |                 intervention_type=intervention_type,
284 |                 category=category,
285 |                 codes=codes,
286 |                 include=include,
287 |                 sort=sort,
288 |                 order=order,
289 |                 synonyms=synonyms,
290 |                 page_size=page * page_size if page_size else None,
291 |                 page=1 if page_size else page,
292 |                 api_key=api_key,
293 |             )
294 | 
295 |             # Add unique interventions (deduplicate by ID)
296 |             for intervention in results.get("interventions", []):
297 |                 int_id = intervention.get(
298 |                     "id", intervention.get("intervention_id")
299 |                 )
300 |                 if int_id and int_id not in all_interventions:
301 |                     all_interventions[int_id] = intervention
302 | 
303 |             total_found += results.get("total", 0)
304 | 
305 |         except Exception as e:
306 |             logger.warning(f"Failed to search for term '{term}': {e}")
307 |             # Continue with other terms
308 | 
309 |     # Convert back to list and apply pagination
310 |     unique_interventions = list(all_interventions.values())
311 | 
312 |     # Sort by name for consistent results
313 |     unique_interventions.sort(key=lambda x: x.get("name", "").lower())
314 | 
315 |     # Apply pagination to combined results
316 |     if page_size:
317 |         start_idx = (page - 1) * page_size
318 |         end_idx = start_idx + page_size
319 |         paginated_interventions = unique_interventions[start_idx:end_idx]
320 |     else:
321 |         paginated_interventions = unique_interventions
322 | 
323 |     return {
324 |         "interventions": paginated_interventions,
325 |         "total": len(unique_interventions),
326 |         "page": page,
327 |         "page_size": page_size,
328 |         "search_terms": search_terms,  # Include what we searched for
329 |         "total_found_across_terms": total_found,  # Total before deduplication
330 |     }
331 | 
```

--------------------------------------------------------------------------------
/docs/developer-guides/01-server-deployment.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Server Deployment Guide
  2 | 
  3 | This guide covers various deployment options for BioMCP, from local development to production cloud deployments with authentication.
  4 | 
  5 | ## Deployment Options Overview
  6 | 
  7 | | Mode                  | Use Case      | Transport       | Authentication | Scalability |
  8 | | --------------------- | ------------- | --------------- | -------------- | ----------- |
  9 | | **Local STDIO**       | Development   | STDIO           | None           | Single user |
 10 | | **HTTP Server**       | Small teams   | Streamable HTTP | Optional       | Moderate    |
 11 | | **Docker**            | Containerized | Streamable HTTP | Optional       | Moderate    |
 12 | | **Cloudflare Worker** | Production    | SSE/HTTP        | OAuth optional | High        |
 13 | 
 14 | ## Local Development (STDIO)
 15 | 
 16 | The simplest deployment for development and testing.
 17 | 
 18 | ### Setup
 19 | 
 20 | ```bash
 21 | # Install BioMCP
 22 | uv tool install biomcp-python
 23 | 
 24 | # Run in STDIO mode (default)
 25 | biomcp run
 26 | ```
 27 | 
 28 | ### Configuration
 29 | 
 30 | For Claude Desktop integration:
 31 | 
 32 | ```json
 33 | {
 34 |   "mcpServers": {
 35 |     "biomcp": {
 36 |       "command": "biomcp",
 37 |       "args": ["run"]
 38 |     }
 39 |   }
 40 | }
 41 | ```
 42 | 
 43 | ### Use Cases
 44 | 
 45 | - Local development
 46 | - Single-user research
 47 | - Testing new features
 48 | 
 49 | ## HTTP Server Deployment
 50 | 
 51 | Modern deployment using Streamable HTTP transport.
 52 | 
 53 | ### Basic Setup
 54 | 
 55 | ```bash
 56 | # Run HTTP server
 57 | biomcp run --mode http --host 0.0.0.0 --port 8000
 58 | ```
 59 | 
 60 | ### With Environment Variables
 61 | 
 62 | ```bash
 63 | # Create .env file
 64 | cat > .env << EOF
 65 | BIOMCP_HOST=0.0.0.0
 66 | BIOMCP_PORT=8000
 67 | NCI_API_KEY=your-key
 68 | ALPHAGENOME_API_KEY=your-key
 69 | EOF
 70 | 
 71 | # Run with env file
 72 | biomcp run --mode http
 73 | ```
 74 | 
 75 | ### Systemd Service (Linux)
 76 | 
 77 | Create `/etc/systemd/system/biomcp.service`:
 78 | 
 79 | ```ini
 80 | [Unit]
 81 | Description=BioMCP Server
 82 | After=network.target
 83 | 
 84 | [Service]
 85 | Type=simple
 86 | User=biomcp
 87 | WorkingDirectory=/opt/biomcp
 88 | Environment="PATH=/usr/local/bin:/usr/bin"
 89 | EnvironmentFile=/opt/biomcp/.env
 90 | ExecStart=/usr/local/bin/biomcp run --mode http
 91 | Restart=always
 92 | RestartSec=10
 93 | 
 94 | [Install]
 95 | WantedBy=multi-user.target
 96 | ```
 97 | 
 98 | Reload systemd, then enable and start the service:
 99 | 
100 | ```bash
101 | sudo systemctl daemon-reload
102 | sudo systemctl enable --now biomcp
103 | ```
104 | 
105 | ### Nginx Reverse Proxy
106 | 
107 | ```nginx
108 | server {
109 |     listen 443 ssl;
110 |     server_name biomcp.example.com;
111 | 
112 |     ssl_certificate /etc/ssl/certs/biomcp.crt;
113 |     ssl_certificate_key /etc/ssl/private/biomcp.key;
114 | 
115 |     location /mcp {
116 |         proxy_pass http://localhost:8000;
117 |         proxy_http_version 1.1;
118 |         proxy_set_header Upgrade $http_upgrade;
119 |         proxy_set_header Connection "upgrade";
120 |         proxy_set_header Host $host;
121 |         proxy_set_header X-Real-IP $remote_addr;
122 |         proxy_buffering off;
123 |     }
124 | }
125 | ```
126 | 
127 | ## Docker Deployment
128 | 
129 | Containerized deployment for consistency and portability.
130 | 
131 | ### Basic Dockerfile
132 | 
133 | ```dockerfile
134 | FROM python:3.11-slim
135 | 
136 | # Install BioMCP
137 | RUN pip install biomcp-python
138 | 
139 | # Add API keys (use secrets in production!)
140 | ENV NCI_API_KEY=""
141 | ENV ALPHAGENOME_API_KEY=""
142 | 
143 | # Expose port
144 | EXPOSE 8000
145 | 
146 | # Run server
147 | CMD ["biomcp", "run", "--mode", "http", "--host", "0.0.0.0"]
148 | ```
149 | 
150 | ### With AlphaGenome Support
151 | 
152 | ```dockerfile
153 | FROM python:3.11-slim
154 | 
155 | # Install system dependencies
156 | RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*
157 | 
158 | # Install BioMCP
159 | RUN pip install biomcp-python
160 | 
161 | # Install AlphaGenome
162 | RUN git clone https://github.com/google-deepmind/alphagenome.git && \
163 |     cd alphagenome && \
164 |     pip install .
165 | 
166 | # Configure
167 | ENV MCP_MODE=http
168 | ENV BIOMCP_HOST=0.0.0.0
169 | ENV BIOMCP_PORT=8000
170 | 
171 | EXPOSE 8000
172 | 
173 | CMD ["biomcp", "run"]
174 | ```
175 | 
176 | ### Docker Compose
177 | 
178 | ```yaml
179 | version: "3.8"
180 | 
181 | services:
182 |   biomcp:
183 |     build: .
184 |     ports:
185 |       - "8000:8000"
186 |     environment:
187 |       - MCP_MODE=http
188 |       - NCI_API_KEY=${NCI_API_KEY}
189 |       - ALPHAGENOME_API_KEY=${ALPHAGENOME_API_KEY}
190 |     volumes:
191 |       - ./logs:/app/logs
192 |     restart: unless-stopped
193 |     healthcheck:
194 |       test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
195 |       interval: 30s
196 |       timeout: 10s
197 |       retries: 3
198 | ```
199 | 
200 | ### Running
201 | 
202 | ```bash
203 | # Build and run
204 | docker-compose up -d
205 | 
206 | # View logs
207 | docker-compose logs -f
208 | 
209 | # Scale horizontally (first remove the fixed "8000:8000" host port and put a load balancer in front)
210 | docker-compose up -d --scale biomcp=3
211 | ```
212 | 
213 | ## Cloudflare Worker Deployment
214 | 
215 | Production deployment that places a Cloudflare Worker at the edge in front of a remote BioMCP server.
216 | 
217 | ### Prerequisites
218 | 
219 | 1. Cloudflare account
220 | 2. Wrangler CLI installed
221 | 3. Remote BioMCP server running
222 | 
223 | ### Architecture
224 | 
225 | ```
226 | Claude Desktop → Cloudflare Worker (Edge) → BioMCP Server (Origin)
227 | ```
228 | 
229 | ### Setup Worker
230 | 
231 | 1. **Install dependencies:**
232 | 
233 | ```bash
234 | npm install @modelcontextprotocol/sdk itty-router
235 | ```
236 | 
237 | 2. **Create `wrangler.toml`:**
238 | 
239 | ```toml
240 | name = "biomcp-worker"
241 | main = "src/index.js"
242 | compatibility_date = "2024-01-01"
243 | 
244 | [vars]
245 | REMOTE_MCP_SERVER_URL = "https://your-biomcp-server.com/mcp"
246 | # MCP_SERVER_API_KEY: set with `wrangler secret put MCP_SERVER_API_KEY` rather than in [vars]
247 | 
248 | [[kv_namespaces]]
249 | binding = "AUTH_TOKENS"
250 | id = "your-kv-namespace-id"
251 | ```
252 | 
253 | 3. **Deploy:**
254 | 
255 | ```bash
256 | wrangler deploy
257 | ```
258 | 
259 | ### With OAuth Authentication (Stytch)
260 | 
261 | 1. **Configure Stytch:**
262 | 
263 | ```toml
264 | [vars]
265 | STYTCH_PROJECT_ID = "project-test-..."
266 | # STYTCH_SECRET: set with `wrangler secret put STYTCH_SECRET`
267 | STYTCH_PUBLIC_TOKEN = "public-token-test-..."
268 | # JWT_SECRET: set with `wrangler secret put JWT_SECRET`
269 | ```
270 | 
271 | 2. **OAuth Endpoints:**
272 |    The worker automatically provides:
273 | 
274 | - `/.well-known/oauth-authorization-server`
275 | - `/authorize`
276 | - `/callback`
277 | - `/token`
278 | 
279 | 3. **Client Configuration:**
280 | 
281 | ```json
282 | {
283 |   "mcpServers": {
284 |     "biomcp": {
285 |       "transport": {
286 |         "type": "sse",
287 |         "url": "https://your-worker.workers.dev"
288 |       },
289 |       "auth": {
290 |         "type": "oauth",
291 |         "client_id": "mcp-client",
292 |         "authorization_endpoint": "https://your-worker.workers.dev/authorize",
293 |         "token_endpoint": "https://your-worker.workers.dev/token",
294 |         "scope": "mcp:access"
295 |       }
296 |     }
297 |   }
298 | }
299 | ```
300 | 
301 | ## Production Considerations
302 | 
303 | ### Security
304 | 
305 | 1. **API Key Management:**
306 | 
307 | ```bash
308 | # Use environment variables
309 | export NCI_API_KEY="$(vault kv get -field=key secret/biomcp/nci)"
310 | 
311 | # Or use Docker Swarm secrets (mounted at /run/secrets/biomcp_keys)
312 | docker service create --secret biomcp_keys biomcp:latest
313 | ```
314 | 
315 | 2. **Network Security:**
316 | 
317 | - Use HTTPS everywhere
318 | - Implement rate limiting
319 | - Set up CORS properly
320 | - Use authentication for public endpoints
321 | 
322 | 3. **Access Control:**
323 | 
324 | ```python
325 | from starlette.responses import JSONResponse
326 | async def auth_middleware(request, call_next):  # validate_token is app-specific
327 |     token = request.headers.get("Authorization")
328 |     if not validate_token(token):
329 |         return JSONResponse({"error": "Unauthorized"}, status_code=401)
330 |     return await call_next(request)
331 | ```
332 | 
333 | ### Monitoring
334 | 
335 | 1. **Health Checks:**
336 | 
337 | ```python
338 | # Built-in health endpoint: GET /health
339 | from datetime import datetime, timezone
340 | 
341 | # Custom health check
342 | @app.get("/health/detailed")
343 | async def health_detailed():
344 |     return {
345 |         "status": "healthy",
346 |         "version": __version__,
347 |         "apis": check_api_status(),
348 |         "timestamp": datetime.now(timezone.utc).isoformat()
349 |     }
350 | ```
351 | 
352 | 2. **Metrics:**
353 | 
354 | ```python
355 | # Prometheus metrics
356 | from prometheus_client import Counter, Histogram
357 | 
358 | request_count = Counter('biomcp_requests_total', 'Total requests')
359 | request_duration = Histogram('biomcp_request_duration_seconds', 'Request duration')
360 | ```
361 | 
362 | 3. **Logging:**
363 | 
364 | ```python
365 | # Structured logging
366 | import structlog
367 | 
368 | logger = structlog.get_logger()
369 | logger.info("request_processed",
370 |     tool="article_searcher",
371 |     duration=0.234,
372 |     user_id="user123"
373 | )
374 | ```
375 | 
376 | ### Scaling
377 | 
378 | 1. **Horizontal Scaling:**
379 | 
380 | ```yaml
381 | # Kubernetes deployment
382 | apiVersion: apps/v1
383 | kind: Deployment
384 | metadata:
385 |   name: biomcp
386 | spec:
387 |   replicas: 3
388 |   selector:
389 |     matchLabels:
390 |       app: biomcp
391 |   template:
392 |     metadata:
393 |       labels:
394 |         app: biomcp
395 |     spec:
396 |       containers:
397 |         - name: biomcp
398 |           image: biomcp:latest
399 |           ports:
400 |             - containerPort: 8000
401 |           resources:
402 |             requests:
403 |               memory: "512Mi"
404 |               cpu: "500m"
405 |             limits:
406 |               memory: "1Gi"
407 |               cpu: "1000m"
408 | ```
409 | 
410 | 2. **Caching:**
411 | 
412 | ```python
413 | # Redis caching with the asyncio client (redis-py >= 4.2)
414 | import json
415 | from functools import wraps
416 | import redis.asyncio as redis
417 | redis_client = redis.Redis()
418 | 
419 | def cache_result(ttl=3600):
420 |     def decorator(func):
421 |         @wraps(func)
422 |         async def wrapper(*args, **kwargs):
423 |             key = f"{func.__name__}:{str(args)}:{str(kwargs)}"
424 |             cached = await redis_client.get(key)
425 |             if cached is not None:
426 |                 return json.loads(cached)
427 |             result = await func(*args, **kwargs)
428 |             await redis_client.setex(key, ttl, json.dumps(result))
429 |             return result
430 |         return wrapper
431 |     return decorator
432 | ```
433 | 
434 | ### Performance Optimization
435 | 
436 | 1. **Connection Pooling:**
437 | 
438 | ```python
439 | # Reuse HTTP connections
440 | import httpx
441 | 
442 | client = httpx.AsyncClient(
443 |     limits=httpx.Limits(max_keepalive_connections=20),
444 |     timeout=httpx.Timeout(30.0)
445 | )
446 | ```
447 | 
448 | 2. **Async Processing:**
449 | 
450 | ```python
451 | import asyncio  # process requests concurrently with asyncio.gather
452 | async def handle_batch(requests):
453 |     tasks = [process_request(req) for req in requests]
454 |     return await asyncio.gather(*tasks)
455 | ```
456 | 
457 | 3. **Response Compression:**
458 | 
459 | ```python
460 | # Enable gzip compression
461 | from fastapi.middleware.gzip import GZipMiddleware
462 | 
463 | app.add_middleware(GZipMiddleware, minimum_size=1000)
464 | ```
465 | 
466 | ## Migration Path
467 | 
468 | ### From STDIO to HTTP
469 | 
470 | 1. Update server startup:
471 | 
472 | ```bash
473 | # Old
474 | biomcp run
475 | 
476 | # New
477 | biomcp run --mode http
478 | ```
479 | 
480 | 2. Update client configuration:
481 | 
482 | ```json
483 | {
484 |   "mcpServers": {
485 |     "biomcp": {
486 |       "url": "http://localhost:8000/mcp"
487 |     }
488 |   }
489 | }
490 | ```
491 | 
492 | ### From SSE to Streamable HTTP
493 | 
494 | 1. Update worker code to use `/mcp` endpoint
495 | 2. Update client to use new transport:
496 | 
497 | ```json
498 | {
499 |   "transport": {
500 |     "type": "http",
501 |     "url": "https://biomcp.example.com/mcp"
502 |   }
503 | }
504 | ```
505 | 
506 | ## Troubleshooting
507 | 
508 | ### Common Issues
509 | 
510 | 1. **Port Already in Use:**
511 | 
512 | ```bash
513 | # Find process using port
514 | lsof -i :8000
515 | 
516 | # Kill process
517 | kill <PID>  # escalate to kill -9 <PID> only if it does not exit
518 | ```
519 | 
520 | 2. **API Key Errors:**
521 | 
522 | ```bash
523 | # Verify environment variables
524 | env | grep -E "(NCI|ALPHAGENOME|CBIO)"
525 | 
526 | # Test API key
527 | curl -H "X-API-KEY: $NCI_API_KEY" https://clinicaltrialsapi.cancer.gov/api/v2/trials
528 | ```
529 | 
530 | 3. **Connection Timeouts:**
531 | 
532 | - Increase timeout values
533 | - Check firewall rules
534 | - Verify network connectivity
535 | 
536 | ### Debug Mode
537 | 
538 | ```bash
539 | # Enable debug logging
540 | BIOMCP_LOG_LEVEL=DEBUG biomcp run --mode http
541 | 
542 | # Or in Docker
543 | docker run -e BIOMCP_LOG_LEVEL=DEBUG biomcp:latest
544 | ```
545 | 
546 | ## Next Steps
547 | 
548 | - Set up [monitoring](../how-to-guides/05-logging-and-monitoring-with-bigquery.md)
549 | - Configure [authentication](../getting-started/03-authentication-and-api-keys.md)
550 | - Review [security policies](../policies.md)
551 | - Implement [CI/CD pipeline](02-contributing-and-testing.md)
552 | 
```
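
The Compose healthcheck in the guide above needs an HTTP client inside the container. The `python:3.11-slim` base image does not include `curl`, but it does include Python. The following is a minimal sketch of a standard-library probe. It assumes the server answers `GET /health` with a 2xx status, as the guide describes. `check_health` is a hypothetical name.

```python
import urllib.error
import urllib.request


def check_health(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Return True if GET <url> answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTPError
        return False
```

Suppose the file is copied into the image as `healthcheck.py`. A healthcheck could then run `python -c "import sys, healthcheck; sys.exit(0 if healthcheck.check_health() else 1)"`, which exits 0 when healthy and 1 otherwise.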

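The Redis `cache_result` decorator in the guide above builds its cache key from the function name and arguments. You can try the same pattern without a Redis server by using an in-process dictionary. This is a minimal sketch, and `ttl_cache` is a hypothetical name. Entries live in one process only, so replicas do not share them. Expired entries are overwritten on the next call for the same key but are never purged.

```python
import asyncio
import functools
import json
import time
from collections.abc import Awaitable, Callable
from typing import Any


def ttl_cache(ttl: float = 3600.0) -> Callable:
    """Cache an async function's JSON-serializable results in memory for `ttl` seconds."""

    def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        store: dict[str, tuple[float, str]] = {}

        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            # sort_keys makes the key independent of keyword-argument order
            key = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=str)
            hit = store.get(key)
            if hit is not None and hit[0] > time.monotonic():
                return json.loads(hit[1])
            result = await func(*args, **kwargs)
            store[key] = (time.monotonic() + ttl, json.dumps(result))
            return result

        return wrapper

    return decorator
```

Cached values come back as decoded JSON, just as with the Redis version. A tuple result, for example, is returned as a list on a cache hit.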
--------------------------------------------------------------------------------
/src/biomcp/openfda/utils.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Utility functions for OpenFDA API integration.
  3 | """
  4 | 
  5 | import asyncio
  6 | import logging
  7 | import os
  8 | from typing import Any
  9 | 
 10 | from ..http_client import request_api
 11 | from .cache import (
 12 |     get_cached_response,
 13 |     is_cacheable_request,
 14 |     set_cached_response,
 15 | )
 16 | from .exceptions import (
 17 |     OpenFDAConnectionError,
 18 |     OpenFDARateLimitError,
 19 |     OpenFDATimeoutError,
 20 |     OpenFDAValidationError,
 21 | )
 22 | from .input_validation import build_safe_query
 23 | from .rate_limiter import FDA_CIRCUIT_BREAKER, FDA_RATE_LIMITER, FDA_SEMAPHORE
 24 | from .validation import sanitize_response, validate_fda_response
 25 | 
 26 | logger = logging.getLogger(__name__)
 27 | 
 28 | 
 29 | def get_api_key() -> str | None:
 30 |     """Get OpenFDA API key from environment variable."""
 31 |     api_key = os.environ.get("OPENFDA_API_KEY")
 32 |     if not api_key:
 33 |         logger.debug("No OPENFDA_API_KEY found in environment")
 34 |     return api_key
 35 | 
 36 | 
 37 | async def make_openfda_request(  # noqa: C901
 38 |     endpoint: str,
 39 |     params: dict[str, Any],
 40 |     domain: str = "openfda",
 41 |     api_key: str | None = None,
 42 |     max_retries: int = 3,
 43 |     initial_delay: float = 1.0,
 44 | ) -> tuple[dict[str, Any] | None, str | None]:
 45 |     """
 46 |     Make a request to the OpenFDA API with retry logic and caching.
 47 | 
 48 |     Args:
 49 |         endpoint: Full URL to the OpenFDA endpoint
 50 |         params: Query parameters
 51 |         domain: Domain name for metrics tracking
 52 |         api_key: Optional API key (overrides environment variable)
 53 |         max_retries: Maximum number of retry attempts (default 3)
 54 |         initial_delay: Initial delay in seconds for exponential backoff (default 1.0)
 55 | 
 56 |     Returns:
 57 |         Tuple of (response_data, error_message)
 58 |     """
 59 |     # Validate and sanitize input parameters
 60 |     safe_params = build_safe_query(params)
 61 | 
 62 |     # Check cache first (with safe params)
 63 |     if is_cacheable_request(endpoint, safe_params):
 64 |         cached_response = get_cached_response(endpoint, safe_params)
 65 |         if cached_response:
 66 |             return cached_response, None
 67 | 
 68 |     # Use provided API key or get from environment; keep it out of cache keys
 69 |     api_key = api_key or get_api_key()
 70 |     request_params = dict(safe_params)
 71 |     if api_key:
 72 |         request_params["api_key"] = api_key
 73 | 
 74 |     last_error = None
 75 |     delay = initial_delay
 76 | 
 77 |     for attempt in range(max_retries + 1):
 78 |         try:
 79 |             # Apply rate limiting and circuit breaker
 80 |             async with FDA_SEMAPHORE:
 81 |                 await FDA_RATE_LIMITER.acquire()
 82 | 
 83 |                 # Check circuit breaker state
 84 |                 if FDA_CIRCUIT_BREAKER.is_open:
 85 |                     state = FDA_CIRCUIT_BREAKER.get_state()
 86 |                     return None, f"FDA API circuit breaker is open: {state}"
 87 | 
 88 |                 response, error = await request_api(
 89 |                     url=endpoint,
 90 |                     request=request_params,
 91 |                     method="GET",
 92 |                     domain=domain,
 93 |                 )
 94 | 
 95 |             if error:
 96 |                 error_msg = (
 97 |                     error.message if hasattr(error, "message") else str(error)
 98 |                 )
 99 | 
100 |                 # Check for specific error types
101 |                 if "429" in error_msg or "rate limit" in error_msg.lower():
102 |                     if attempt < max_retries:
103 |                         logger.warning(
104 |                             f"Rate limit hit (attempt {attempt + 1}/{max_retries + 1}). "
105 |                             f"Retrying in {delay:.1f} seconds..."
106 |                         )
107 |                         await asyncio.sleep(delay)
108 |                         delay *= 2  # Exponential backoff
109 |                         continue
110 |                     else:
111 |                         raise OpenFDARateLimitError(error_msg)
112 | 
113 |                 # Check if error is retryable
114 |                 if _is_retryable_error(error_msg) and attempt < max_retries:
115 |                     logger.warning(
116 |                         f"OpenFDA API error (attempt {attempt + 1}/{max_retries + 1}): {error_msg}. "
117 |                         f"Retrying in {delay:.1f} seconds..."
118 |                     )
119 |                     await asyncio.sleep(delay)
120 |                     delay *= 2  # Exponential backoff
121 |                     continue
122 | 
123 |                 logger.error(f"OpenFDA API error: {error_msg}")
124 |                 return None, error_msg
125 | 
126 |             # Validate and sanitize response
127 |             if response:
128 |                 try:
129 |                     validate_fda_response(response, response_type="search")
130 |                     response = sanitize_response(response)
131 |                 except OpenFDAValidationError as e:
132 |                     logger.error(f"Invalid FDA response: {e}")
133 |                     return None, str(e)
134 | 
135 |                 # Cache successful response
136 |                 if is_cacheable_request(endpoint, safe_params):
137 |                     set_cached_response(endpoint, safe_params, response)
138 | 
139 |             return response, None
140 | 
141 |         except asyncio.TimeoutError:
142 |             last_error = "Request timeout"
143 |             if attempt < max_retries:
144 |                 logger.warning(
145 |                     f"OpenFDA request timeout (attempt {attempt + 1}/{max_retries + 1}). "
146 |                     f"Retrying in {delay:.1f} seconds..."
147 |                 )
148 |                 await asyncio.sleep(delay)
149 |                 delay *= 2
150 |                 continue
151 |             logger.error(
152 |                 f"OpenFDA request failed after {max_retries + 1} attempts: {last_error}"
153 |             )
154 |             raise OpenFDATimeoutError(last_error) from None
155 | 
156 |         except ConnectionError as e:
157 |             last_error = f"Connection error: {e}"
158 |             if attempt < max_retries:
159 |                 logger.warning(
160 |                     f"OpenFDA connection error (attempt {attempt + 1}/{max_retries + 1}): {e}. "
161 |                     f"Retrying in {delay:.1f} seconds..."
162 |                 )
163 |                 await asyncio.sleep(delay)
164 |                 delay *= 2
165 |                 continue
166 |             logger.error(
167 |                 f"OpenFDA request failed after {max_retries + 1} attempts: {last_error}"
168 |             )
169 |             raise OpenFDAConnectionError(last_error) from None
170 | 
171 |         except (
172 |             OpenFDARateLimitError,
173 |             OpenFDATimeoutError,
174 |             OpenFDAConnectionError,
175 |         ):
176 |             # Re-raise our custom exceptions
177 |             raise
178 |         except Exception as e:
179 |             # Handle unexpected errors gracefully
180 |             logger.error(f"Unexpected OpenFDA request error: {e}")
181 |             return None, str(e)
182 | 
183 |     return None, last_error
184 | 
185 | 
186 | def _is_retryable_error(error_msg: str) -> bool:
187 |     """
188 |     Check if an error is retryable.
189 | 
190 |     Args:
191 |         error_msg: Error message string
192 | 
193 |     Returns:
194 |         True if the error is retryable
195 |     """
196 |     retryable_patterns = [
197 |         "rate limit",
198 |         "timeout",
199 |         "connection",
200 |         "503",  # Service unavailable
201 |         "502",  # Bad gateway
202 |         "504",  # Gateway timeout
203 |         "429",  # Too many requests
204 |         "temporary",
205 |         "try again",
206 |     ]
207 | 
208 |     error_lower = error_msg.lower()
209 |     return any(pattern in error_lower for pattern in retryable_patterns)
210 | 
211 | 
212 | def format_count(count: int, label: str) -> str:
213 |     """Format a count with appropriate singular/plural label."""
214 |     if count == 1:
215 |         return f"1 {label}"
216 |     return f"{count:,} {label}s"
217 | 
218 | 
219 | def truncate_text(text: str, max_length: int = 500) -> str:
220 |     """Truncate text to a maximum length with ellipsis."""
221 |     if len(text) <= max_length:
222 |         return text
223 |     return text[: max_length - 3] + "..."
224 | 
225 | 
226 | def clean_text(text: str | None) -> str:
227 |     """Clean and normalize text from FDA data."""
228 |     if not text:
229 |         return ""
230 | 
231 |     # Replace literal escape sequences (backslash + n/r/t) left in FDA text
232 |     text = text.replace("\\n", " ")
233 |     text = text.replace("\\r", " ")
234 |     text = text.replace("\\t", " ")
235 | 
236 |     # Then collapse runs of whitespace and newlines into single spaces
237 |     text = " ".join(text.split())
238 | 
239 |     return text.strip()
240 | 
241 | 
242 | def build_search_query(
243 |     field_map: dict[str, str], operator: str = "AND"
244 | ) -> str:
245 |     """
246 |     Build an OpenFDA search query from field mappings.
247 | 
248 |     Args:
249 |         field_map: Dictionary mapping field names to search values
250 |         operator: Logical operator (AND/OR) to combine fields
251 | 
252 |     Returns:
253 |         Formatted search query string
254 |     """
255 |     query_parts = []
256 | 
257 |     for field, value in field_map.items():
258 |         if value:
259 |             # Escape special characters
260 |             escaped_value = value.replace('"', '\\"')
261 |             # Add quotes for multi-word values
262 |             if " " in escaped_value:
263 |                 escaped_value = f'"{escaped_value}"'
264 |             query_parts.append(f"{field}:{escaped_value}")
265 | 
266 |     return f" {operator} ".join(query_parts)
267 | 
268 | 
269 | def extract_drug_names(result: dict[str, Any]) -> list[str]:
270 |     """Extract drug names from an OpenFDA result."""
271 |     drug_names = set()
272 | 
273 |     # Check patient drug info (for adverse events)
274 |     if "patient" in result:
275 |         drugs = result.get("patient", {}).get("drug", [])
276 |         for drug in drugs:
277 |             if "medicinalproduct" in drug:
278 |                 drug_names.add(drug["medicinalproduct"])
279 |             # Check OpenFDA fields
280 |             openfda = drug.get("openfda", {})
281 |             if "brand_name" in openfda:
282 |                 drug_names.update(openfda["brand_name"])
283 |             if "generic_name" in openfda:
284 |                 drug_names.update(openfda["generic_name"])
285 | 
286 |     # Check direct OpenFDA fields (for labels)
287 |     if "openfda" in result:
288 |         openfda = result["openfda"]
289 |         if "brand_name" in openfda:
290 |             drug_names.update(openfda["brand_name"])
291 |         if "generic_name" in openfda:
292 |             drug_names.update(openfda["generic_name"])
293 | 
294 |     return sorted(drug_names)
295 | 
296 | 
297 | def extract_reactions(result: dict[str, Any]) -> list[str]:
298 |     """Extract reaction terms from an adverse event result."""
299 |     reactions = []
300 | 
301 |     patient = result.get("patient", {})
302 |     reaction_list = patient.get("reaction", [])
303 | 
304 |     for reaction in reaction_list:
305 |         if "reactionmeddrapt" in reaction:
306 |             reactions.append(reaction["reactionmeddrapt"])
307 | 
308 |     return reactions
309 | 
310 | 
311 | def format_drug_list(drugs: list[str], max_items: int = 5) -> str:
312 |     """Format a list of drug names for display."""
313 |     if not drugs:
314 |         return "None specified"
315 | 
316 |     if len(drugs) <= max_items:
317 |         return ", ".join(drugs)
318 | 
319 |     shown = drugs[:max_items]
320 |     remaining = len(drugs) - max_items
321 |     return f"{', '.join(shown)} (+{remaining} more)"
322 | 
```
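
`make_openfda_request` makes up to `max_retries + 1` attempts and doubles the delay after each retryable failure, so the defaults wait 1, 2 and 4 seconds. The loop can be sketched as a generic wrapper. This version assumes failures surface as exceptions, whereas the original also handles errors returned as values. `retry_with_backoff` and `is_retryable` are hypothetical names. The patterns mirror `_is_retryable_error`, and `sleep` is injectable so the schedule can be tested without waiting.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")

# Same substrings as _is_retryable_error, matched case-insensitively
RETRYABLE_PATTERNS = (
    "rate limit", "timeout", "connection", "502", "503", "504", "429",
    "temporary", "try again",
)


def is_retryable(error_msg: str) -> bool:
    lowered = error_msg.lower()
    return any(pattern in lowered for pattern in RETRYABLE_PATTERNS)


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    initial_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `call` up to max_retries + 1 times, doubling the delay after each retryable failure."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception as exc:
            # Give up on the last attempt or on errors that look permanent
            if attempt == max_retries or not is_retryable(str(exc)):
                raise
            await sleep(delay)
            delay *= 2
    raise AssertionError("max_retries must be >= 0")
```

The original does not use jitter. A common refinement is to add a small random amount to each delay so that concurrent clients do not retry in lockstep.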

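The sketch below shows what a `build_search_query` expression looks like once it is placed in a URL. It reproduces the function's quoting rules and URL-encodes the result with the standard library. The field names are openFDA adverse-event fields used for illustration, and `encode_openfda_url` is a hypothetical helper. How BioMCP's own request layer encodes the query is not shown in this file.

```python
from urllib.parse import urlencode


def build_search_query(field_map: dict[str, str], operator: str = "AND") -> str:
    """Join field:value pairs, escaping double quotes and quoting multi-word values."""
    parts = []
    for field, value in field_map.items():
        if value:  # empty values are skipped
            escaped = value.replace('"', '\\"')
            if " " in escaped:
                escaped = f'"{escaped}"'
            parts.append(f"{field}:{escaped}")
    return f" {operator} ".join(parts)


def encode_openfda_url(base_url: str, search: str, limit: int = 10) -> str:
    """URL-encode the search expression; spaces become '+' and ':' becomes %3A."""
    return f"{base_url}?{urlencode({'search': search, 'limit': limit})}"
```

Multi-word values are quoted as phrases, so a value such as "nausea and vomiting" stays a single term instead of being split around the inner `and`.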
--------------------------------------------------------------------------------
/src/biomcp/openfda/drug_recalls.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | OpenFDA drug recalls (Enforcement) integration.
  3 | """
  4 | 
  5 | import logging
  6 | from typing import Any
  7 | 
  8 | from .constants import (
  9 |     OPENFDA_DEFAULT_LIMIT,
 10 |     OPENFDA_DISCLAIMER,
 11 |     OPENFDA_DRUG_ENFORCEMENT_URL,
 12 | )
 13 | from .drug_recalls_helpers import (
 14 |     build_recall_search_params,
 15 | )
 16 | from .utils import (
 17 |     clean_text,
 18 |     format_count,
 19 |     make_openfda_request,
 20 |     truncate_text,
 21 | )
 22 | 
 23 | logger = logging.getLogger(__name__)
 24 | 
 25 | 
 26 | async def search_drug_recalls(
 27 |     drug: str | None = None,
 28 |     recall_class: str | None = None,
 29 |     status: str | None = None,
 30 |     reason: str | None = None,
 31 |     since_date: str | None = None,
 32 |     limit: int = OPENFDA_DEFAULT_LIMIT,
 33 |     skip: int = 0,
 34 |     api_key: str | None = None,
 35 | ) -> str:
 36 |     """
 37 |     Search FDA drug recall records from Enforcement database.
 38 | 
 39 |     Args:
 40 |         drug: Drug name (brand or generic) to search for
 41 |         recall_class: Classification (1, 2, or 3)
 42 |         status: Recall status (ongoing, completed, terminated)
 43 |         reason: Search text in recall reason
 44 |         since_date: Only show recalls after this date (YYYYMMDD format)
 45 |         limit: Maximum number of results to return
 46 |         skip: Number of results to skip (for pagination)
 47 | 
 48 |         api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)
 49 | 
 50 |     Returns:
 51 |         Formatted string with drug recall information
 52 |     """
 53 |     # Build search parameters
 54 |     search_params = build_recall_search_params(
 55 |         drug, recall_class, status, reason, since_date, limit, skip
 56 |     )
 57 | 
 58 |     # Make the request
 59 |     response, error = await make_openfda_request(
 60 |         OPENFDA_DRUG_ENFORCEMENT_URL, search_params, "openfda_recalls", api_key
 61 |     )
 62 | 
 63 |     if error:
 64 |         return f"⚠️ Error searching drug recalls: {error}"
 65 | 
 66 |     if not response or not response.get("results"):
 67 |         return "No drug recall records found matching your criteria."
 68 | 
 69 |     # Format the results
 70 |     results = response["results"]
 71 |     total = (
 72 |         response.get("meta", {}).get("results", {}).get("total", len(results))
 73 |     )
 74 | 
 75 |     output = ["## FDA Drug Recall Records\n"]
 76 | 
 77 |     if drug:
 78 |         output.append(f"**Drug**: {drug}")
 79 |     if recall_class:
 80 |         output.append(f"**Classification**: Class {recall_class}")
 81 |     if status:
 82 |         output.append(f"**Status**: {status}")
 83 |     if since_date:
 84 |         output.append(f"**Since**: {since_date}")
 85 | 
 86 |     output.append(
 87 |         f"**Total Recalls Found**: {format_count(total, 'recall')}\n"
 88 |     )
 89 | 
 90 |     # Summary of recall classes if multiple results
 91 |     if len(results) > 1:
 92 |         output.extend(_format_recall_class_summary(results))
 93 | 
 94 |     # Show results
 95 |     output.append(f"### Recalls (showing {len(results)} of {total}):\n")
 96 | 
 97 |     for i, recall in enumerate(results, 1):
 98 |         output.extend(_format_recall_summary(recall, i))
 99 | 
100 |     output.append(f"\n{OPENFDA_DISCLAIMER}")
101 | 
102 |     return "\n".join(output)
103 | 
104 | 
105 | async def get_drug_recall(
106 |     recall_number: str,
107 |     api_key: str | None = None,
108 | ) -> str:
109 |     """
110 |     Get detailed drug recall information for a specific recall.
111 | 
112 |     Args:
113 |         recall_number: FDA recall number
114 | 
115 |         api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)
116 | 
117 |     Returns:
118 |         Formatted string with detailed recall information
119 |     """
120 |     # Search for the specific recall
121 |     search_params = {"search": f'recall_number:"{recall_number}"', "limit": 1}
122 | 
123 |     response, error = await make_openfda_request(
124 |         OPENFDA_DRUG_ENFORCEMENT_URL, search_params, "openfda_recalls", api_key
125 |     )
126 | 
127 |     if error:
128 |         return f"⚠️ Error retrieving drug recall: {error}"
129 | 
130 |     if not response or not response.get("results"):
131 |         return f"No recall record found for {recall_number}"
132 | 
133 |     recall = response["results"][0]
134 | 
135 |     # Format detailed recall information
136 |     output = [f"## Drug Recall Details: {recall_number}\n"]
137 | 
138 |     # Basic information
139 |     output.extend(_format_recall_header(recall))
140 | 
141 |     # Reason and details
142 |     output.extend(_format_recall_details(recall))
143 | 
144 |     # Distribution information
145 |     output.extend(_format_distribution_info(recall))
146 | 
147 |     # OpenFDA metadata
148 |     if openfda := recall.get("openfda"):
149 |         output.extend(_format_recall_openfda(openfda))
150 | 
151 |     output.append(f"\n{OPENFDA_DISCLAIMER}")
152 | 
153 |     return "\n".join(output)
154 | 
155 | 
156 | def _format_recall_class_summary(results: list[dict[str, Any]]) -> list[str]:
157 |     """Format summary of recall classifications."""
158 |     output = []
159 | 
160 |     # Count by classification
161 |     class_counts = {"Class I": 0, "Class II": 0, "Class III": 0}
162 |     for recall in results:
163 |         classification = recall.get("classification", "")
164 |         if classification in class_counts:
165 |             class_counts[classification] += 1
166 | 
167 |     if any(class_counts.values()):
168 |         output.append("### Classification Summary:")
169 |         if class_counts["Class I"]:
170 |             output.append(
171 |                 f"- **Class I** (most serious): {class_counts['Class I']} recalls"
172 |             )
173 |         if class_counts["Class II"]:
174 |             output.append(
175 |                 f"- **Class II** (moderate): {class_counts['Class II']} recalls"
176 |             )
177 |         if class_counts["Class III"]:
178 |             output.append(
179 |                 f"- **Class III** (least serious): {class_counts['Class III']} recalls"
180 |             )
181 |         output.append("")
182 | 
183 |     return output
184 | 
185 | 
186 | def _format_recall_summary(recall: dict[str, Any], num: int) -> list[str]:
187 |     """Format a single recall summary."""
188 |     output = [f"#### {num}. Recall {recall.get('recall_number', 'Unknown')}"]
189 | 
190 |     # Classification and status
191 |     classification = recall.get("classification", "Unknown")
192 |     status = recall.get("status", "Unknown")
193 | 
194 |     # Add severity indicator
195 |     severity_emoji = {
196 |         "Class I": "🔴",  # Most serious
197 |         "Class II": "🟡",  # Moderate
198 |         "Class III": "🟢",  # Least serious
199 |     }.get(classification, "⚪")
200 | 
201 |     output.append(f"{severity_emoji} **{classification}** - {status}")
202 | 
203 |     # Date
204 |     if init_date := recall.get("recall_initiation_date"):
205 |         formatted_date = f"{init_date[:4]}-{init_date[4:6]}-{init_date[6:]}"
206 |         output.append(f"**Initiated**: {formatted_date}")
207 | 
208 |     # Product description
209 |     if product_desc := recall.get("product_description"):
210 |         cleaned = truncate_text(clean_text(product_desc), 200)
211 |         output.append(f"**Product**: {cleaned}")
212 | 
213 |     # OpenFDA names
214 |     openfda = recall.get("openfda", {})
215 |     if brand_names := openfda.get("brand_name"):
216 |         output.append(f"**Brand**: {', '.join(brand_names[:3])}")
217 | 
218 |     # Reason for recall
219 |     if reason := recall.get("reason_for_recall"):
220 |         cleaned_reason = truncate_text(clean_text(reason), 300)
221 |         output.append(f"\n**Reason**: {cleaned_reason}")
222 | 
223 |     # Firm name
224 |     if firm := recall.get("recalling_firm"):
225 |         output.append(f"\n**Recalling Firm**: {firm}")
226 | 
227 |     output.append("")
228 |     return output
229 | 
230 | 
231 | def _format_recall_header(recall: dict[str, Any]) -> list[str]:
232 |     """Format the header section of detailed recall."""
233 |     output = ["### Recall Information"]
234 | 
235 |     output.append(
236 |         f"**Recall Number**: {recall.get('recall_number', 'Unknown')}"
237 |     )
238 |     output.append(
239 |         f"**Classification**: {recall.get('classification', 'Unknown')}"
240 |     )
241 |     output.append(f"**Status**: {recall.get('status', 'Unknown')}")
242 | 
243 |     if event_id := recall.get("event_id"):
244 |         output.append(f"**Event ID**: {event_id}")
245 | 
246 |     # Dates
247 |     if init_date := recall.get("recall_initiation_date"):
248 |         formatted = f"{init_date[:4]}-{init_date[4:6]}-{init_date[6:]}"
249 |         output.append(f"**Initiation Date**: {formatted}")
250 | 
251 |     if report_date := recall.get("report_date"):
252 |         formatted = f"{report_date[:4]}-{report_date[4:6]}-{report_date[6:]}"
253 |         output.append(f"**Report Date**: {formatted}")
254 | 
255 |     if term_date := recall.get("termination_date"):
256 |         formatted = f"{term_date[:4]}-{term_date[4:6]}-{term_date[6:]}"
257 |         output.append(f"**Termination Date**: {formatted}")
258 | 
259 |     output.append("")
260 |     return output
261 | 
262 | 
263 | def _format_recall_details(recall: dict[str, Any]) -> list[str]:
264 |     """Format recall details and reason."""
265 |     output = ["### Product and Reason"]
266 | 
267 |     if product_desc := recall.get("product_description"):
268 |         output.append(f"**Product Description**:\n{clean_text(product_desc)}")
269 | 
270 |     if reason := recall.get("reason_for_recall"):
271 |         output.append(f"\n**Reason for Recall**:\n{clean_text(reason)}")
272 | 
273 |     if quantity := recall.get("product_quantity"):
274 |         output.append(f"\n**Product Quantity**: {quantity}")
275 | 
276 |     if code_info := recall.get("code_info"):
277 |         output.append(f"\n**Code Information**:\n{clean_text(code_info)}")
278 | 
279 |     output.append("")
280 |     return output
281 | 
282 | 
283 | def _format_distribution_info(recall: dict[str, Any]) -> list[str]:
284 |     """Format distribution information."""
285 |     output = ["### Distribution Information"]
286 | 
287 |     if firm := recall.get("recalling_firm"):
288 |         output.append(f"**Recalling Firm**: {firm}")
289 | 
290 |     if city := recall.get("city"):
291 |         state = recall.get("state", "")
292 |         country = recall.get("country", "")
293 |         location = city
294 |         if state:
295 |             location += f", {state}"
296 |         if country:
297 |             location += f", {country}"
298 |         output.append(f"**Location**: {location}")
299 | 
300 |     if dist_pattern := recall.get("distribution_pattern"):
301 |         output.append(
302 |             f"\n**Distribution Pattern**:\n{clean_text(dist_pattern)}"
303 |         )
304 | 
305 |     if action := recall.get("voluntary_mandated"):
306 |         output.append(f"\n**Action Type**: {action}")
307 | 
308 |     output.append("")
309 |     return output
310 | 
311 | 
312 | def _format_recall_openfda(openfda: dict[str, Any]) -> list[str]:
313 |     """Format OpenFDA metadata for recall."""
314 |     output = ["### Drug Information"]
315 | 
316 |     if brand_names := openfda.get("brand_name"):
317 |         output.append(f"**Brand Names**: {', '.join(brand_names)}")
318 | 
319 |     if generic_names := openfda.get("generic_name"):
320 |         output.append(f"**Generic Names**: {', '.join(generic_names)}")
321 | 
322 |     if manufacturers := openfda.get("manufacturer_name"):
323 |         output.append(f"**Manufacturers**: {', '.join(manufacturers[:3])}")
324 | 
325 |     if ndas := openfda.get("application_number"):
326 |         output.append(f"**Application Numbers**: {', '.join(ndas[:5])}")
327 | 
328 |     if routes := openfda.get("route"):
329 |         output.append(f"**Routes**: {', '.join(routes)}")
330 | 
331 |     if pharm_class := openfda.get("pharm_class_epc"):
332 |         output.append(f"**Pharmacologic Class**: {', '.join(pharm_class[:3])}")
333 | 
334 |     output.append("")
335 |     return output
336 | 
```
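`_format_recall_summary` and `_format_recall_header` convert openFDA `YYYYMMDD` dates by slicing strings by hand. A value that is not exactly eight digits comes out garbled: `"2023"` becomes `"2023--"`. Below is a minimal sketch of a shared helper that validates the date first. `format_openfda_date` is a hypothetical name and does not exist in the module.

```python
from datetime import datetime


def format_openfda_date(raw: str) -> str:
    """Convert an openFDA YYYYMMDD string to ISO YYYY-MM-DD.

    Hypothetical helper: anything that does not parse as YYYYMMDD is
    returned unchanged rather than sliced into a garbled string.
    """
    try:
        return datetime.strptime(raw, "%Y%m%d").date().isoformat()
    except (TypeError, ValueError):
        return raw


print(format_openfda_date("20230915"))  # 2023-09-15
print(format_openfda_date("2023"))  # 2023 (unchanged)
```

One helper like this could replace the four slicing expressions. Falling back to the raw value keeps the output readable when the API returns an unexpected format.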

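`get_drug_recall` looks up a single record by sending an openFDA `search` expression (`recall_number:"..."`) together with `limit=1`. The sketch below builds the equivalent query string so you can inspect it. It assumes `OPENFDA_DRUG_ENFORCEMENT_URL` holds openFDA's public drug enforcement endpoint, because the constant's value is not shown in this listing. The recall number is a made-up example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the value of OPENFDA_DRUG_ENFORCEMENT_URL (not shown above)
ENFORCEMENT_URL = "https://api.fda.gov/drug/enforcement.json"


def recall_lookup_url(recall_number: str) -> str:
    """Build the same search/limit query that get_drug_recall sends."""
    params = {"search": f'recall_number:"{recall_number}"', "limit": 1}
    return f"{ENFORCEMENT_URL}?{urlencode(params)}"


url = recall_lookup_url("D-0001-2024")  # hypothetical recall number
print(url)
# Decoding the query shows the quoted exact-match search expression
print(parse_qs(urlparse(url).query))
```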
--------------------------------------------------------------------------------
/docs/workflows/all-workflows.md:
--------------------------------------------------------------------------------

```markdown
  1 | # BioMCP Research Workflows
  2 | 
  3 | Quick, practical workflows for common biomedical research tasks.
  4 | 
  5 | ## 1. Literature Review Workflow
  6 | 
  7 | ### Quick Start
  8 | 
  9 | ```bash
 10 | # Find key papers on BRAF V600E melanoma therapy
 11 | biomcp article search --gene BRAF --disease melanoma \
 12 |   --keyword "V600E|therapy|treatment" --limit 50 \
 13 |   --format json > braf_papers.json
 14 | ```
 15 | 
 16 | ### Full Workflow Script
 17 | 
 18 | ```python
 19 | import asyncio
 20 | from biomcp import BioMCPClient
 21 | 
 22 | async def literature_review(gene, disease, focus_terms):
 23 |     async with BioMCPClient() as client:
 24 |         # 1. Get gene context
 25 |         gene_info = await client.genes.get(gene)
 26 | 
 27 |         # 2. Search by topic
 28 |         results = {}
 29 |         for term in focus_terms:
 30 |             articles = await client.articles.search(
 31 |                 genes=[gene],
 32 |                 diseases=[disease],
 33 |                 keywords=[term],
 34 |                 limit=30
 35 |             )
 36 |             results[term] = articles.articles
 37 | 
 38 |         # 3. Generate summary
 39 |         print(f"\n{gene} in {disease}: Found {sum(len(v) for v in results.values())} articles")
 40 |         for topic, articles in results.items():
 41 |             print(f"\n{topic}: {len(articles)} articles")
 42 |             for a in articles[:3]:
 43 |                 print(f"  - {a.title[:80]}... ({a.year})")
 44 | 
 45 |         return results
 46 | 
 47 | # Run it
 48 | asyncio.run(literature_review(
 49 |     "BRAF",
 50 |     "melanoma",
 51 |     ["resistance", "combination therapy", "immunotherapy"]
 52 | ))
 53 | ```
 54 | 
 55 | ### Key Points
 56 | 
 57 | - Start broad, then narrow by topic
 58 | - Use OR syntax for variant notations
 59 | - Export results for citation management
 60 | - Set up weekly searches for updates
 61 | 
 62 | ---
 63 | 
 64 | ## 2. Clinical Trial Matching Workflow
 65 | 
 66 | ### Quick Start
 67 | 
 68 | ```bash
 69 | # Find trials for EGFR-mutant lung cancer near Boston
 70 | biomcp trial search --condition "lung cancer" \
 71 |   --term "EGFR mutation" --status RECRUITING \
 72 |   --latitude 42.3601 --longitude -71.0589 --distance 100
 73 | ```
 74 | 
 75 | ### Patient Matching Script
 76 | 
 77 | ```python
 78 | async def match_patient_to_trials(patient_profile):
 79 |     async with BioMCPClient() as client:
 80 |         # 1. Search trials with location
 81 |         trials = await client.trials.search(
 82 |             conditions=[patient_profile['diagnosis']],
 83 |             other_terms=patient_profile['mutations'],
 84 |             lat=patient_profile['lat'],
 85 |             long=patient_profile['long'],
 86 |             distance=patient_profile['max_distance'],
 87 |             status="RECRUITING"
 88 |         )
 89 | 
 90 |         # 2. Score trials
 91 |         scored = []
 92 |         for trial in trials.trials[:20]:
 93 |             score = 0
 94 | 
 95 |             # Location score
 96 |             if trial.distance < 50:
 97 |                 score += 25
 98 | 
 99 |             # Phase score
100 |             if trial.phase == "PHASE3":
101 |                 score += 20
102 |             elif trial.phase == "PHASE2":
103 |                 score += 15
104 | 
105 |             # Mutation match
106 |             if any(mut in str(trial.eligibility) for mut in patient_profile['mutations']):
107 |                 score += 30
108 | 
109 |             scored.append((score, trial))
110 | 
111 |         # 3. Return top matches
112 |         scored.sort(reverse=True, key=lambda x: x[0])
113 |         return scored[:5]
114 | 
115 | # Example patient
116 | patient = {
117 |     'diagnosis': 'non-small cell lung cancer',
118 |     'mutations': ['EGFR L858R'],
119 |     'lat': 42.3601,
120 |     'long': -71.0589,
121 |     'max_distance': 100
122 | }
123 | 
124 | matches = asyncio.run(match_patient_to_trials(patient))
125 | ```
126 | 
127 | ### Key Points
128 | 
129 | - Always use coordinates for location search
130 | - Check both ClinicalTrials.gov and NCI sources
131 | - Contact trial sites directly for pre-screening
132 | - Consider travel burden in recommendations
133 | 
134 | ---
135 | 
136 | ## 3. Variant Interpretation Workflow
137 | 
138 | ### Quick Start
139 | 
140 | ```bash
141 | # Get variant annotations
142 | biomcp variant get rs121913529  # By rsID
143 | biomcp variant get "NM_007294.4:c.5266dupC"  # By HGVS
144 | 
145 | # Search pathogenic variants
146 | biomcp variant search --gene BRCA1 --significance pathogenic
147 | ```
148 | 
149 | ### Variant Analysis Script
150 | 
151 | ```python
152 | async def interpret_variant(gene, variant_notation, cancer_type):
153 |     async with BioMCPClient() as client:
154 |         # 1. Get variant details
155 |         try:
156 |             variant = await client.variants.get(variant_notation)
157 |             significance = variant.clinical_significance
158 |             frequency = variant.frequencies.gnomad if hasattr(variant, 'frequencies') else None
159 |         except Exception:  # e.g. variant not found
160 |             significance = "Not found"
161 |             frequency = None
162 | 
163 |         # 2. Search literature
164 |         articles = await client.articles.search(
165 |             genes=[gene],
166 |             variants=[variant_notation],
167 |             diseases=[cancer_type],
168 |             limit=10
169 |         )
170 | 
171 |         # 3. Find trials
172 |         trials = await client.trials.search(
173 |             conditions=[cancer_type],
174 |             other_terms=[f"{gene} mutation"],
175 |             status="RECRUITING",
176 |             limit=5
177 |         )
178 | 
179 |         # 4. Generate interpretation
180 |         print(f"\nVariant: {gene} {variant_notation}")
181 |         print(f"Significance: {significance}")
182 |         print(f"Population Frequency: {frequency or 'Unknown'}")
183 |         print(f"Literature: {len(articles.articles)} relevant papers")
184 |         print(f"Clinical Trials: {len(trials.trials)} active trials")
185 | 
186 |         # Actionability assessment
187 |         if significance in ["Pathogenic", "Likely pathogenic"]:
188 |             if trials.trials:
189 |                 print("✓ ACTIONABLE - Clinical trials available")
190 |             else:
191 |                 print("⚠ Pathogenic but no targeted trials")
192 | 
193 |         return {
194 |             'significance': significance,
195 |             'frequency': frequency,
196 |             'articles': len(articles.articles),
197 |             'trials': len(trials.trials)
198 |         }
199 | 
200 | # Run it
201 | asyncio.run(interpret_variant("BRAF", "p.V600E", "melanoma"))
202 | ```
203 | 
204 | ### Key Points
205 | 
206 | - Check multiple databases (MyVariant, ClinVar via articles)
207 | - Consider cancer type for interpretation
208 | - Look for FDA-approved therapies
209 | - Document tier classification
210 | 
211 | ---
212 | 
213 | ## 4. Quick Integration Patterns
214 | 
215 | ### Batch Processing
216 | 
217 | ```python
218 | # Process multiple queries efficiently
219 | async def batch_analysis(items):
220 |     async with BioMCPClient() as client:
221 |         tasks = []
222 |         for item in items:
223 |             if item['type'] == 'gene':
224 |                 tasks.append(client.genes.get(item['id']))
225 |             elif item['type'] == 'variant':
226 |                 tasks.append(client.variants.get(item['id']))
227 | 
228 |         results = await asyncio.gather(*tasks, return_exceptions=True)
229 |         return results
230 | ```
231 | 
232 | ### Error Handling
233 | 
234 | ```python
235 | import asyncio
236 | from biomcp.exceptions import NotFoundError, RateLimitError
237 | 
238 | async def robust_search(search_func, **params):
239 |     retries = 3
240 |     for attempt in range(retries):
241 |         try:
242 |             return await search_func(**params)
243 |         except RateLimitError:
244 |             if attempt < retries - 1:
245 |                 await asyncio.sleep(2 ** attempt)  # Exponential backoff without blocking the event loop
246 |             else:
247 |                 raise
248 |         except NotFoundError:
249 |             return None
250 | ```
251 | 
252 | ### Caching Results
253 | 
254 | ```python
255 | import functools
256 | import json
257 | 
258 | # Simple file-based cache: one JSON file per set of positional arguments
259 | def cache_results(prefix):
260 |     def decorator(func):
261 |         @functools.wraps(func)
262 |         async def wrapper(*args, **kwargs):
263 |             filename = f"{prefix}_{'_'.join(map(str, args))}.json"
264 |             try:  # Return the cached result if present
265 |                 with open(filename) as f:
266 |                     return json.load(f)
267 |             except FileNotFoundError:
268 |                 pass
269 |             # Fetch and cache (result must be JSON-serializable)
270 |             result = await func(*args, **kwargs)
271 |             with open(filename, 'w') as f:
272 |                 json.dump(result, f)
273 |             return result
274 |         return wrapper
275 |     return decorator
276 | 
277 | @cache_results('gene_cache')
278 | async def get_gene_info(gene):
279 |     async with BioMCPClient() as client:
280 |         return await client.genes.get(gene)
281 | ```
282 | 
283 | ---
284 | 
285 | ## Complete Example: Precision Medicine Report
286 | 
287 | ```python
288 | async def generate_precision_medicine_report(patient):
289 |     """Generate comprehensive report for molecular tumor board."""
290 |     from datetime import datetime  # used for the report timestamp
291 |     async with BioMCPClient() as client:
292 |         report = {
293 |             'patient_id': patient['id'],
294 |             'date': datetime.now().isoformat(),
295 |             'variants': [],
296 |             'trials': [],
297 |             'therapies': []
298 |         }
299 | 
300 |         # Analyze each variant
301 |         for variant in patient['variants']:
302 |             # Get annotations
303 |             var_info = await robust_search(
304 |                 client.variants.search,
305 |                 gene=variant['gene'],
306 |                 hgvs=variant['hgvs']
307 |             )
308 | 
309 |             # Search literature
310 |             articles = await client.articles.search(
311 |                 genes=[variant['gene']],
312 |                 diseases=[patient['cancer_type']],
313 |                 keywords=['therapy', 'treatment'],
314 |                 limit=5
315 |             )
316 | 
317 |             # Find trials
318 |             trials = await client.trials.search(
319 |                 conditions=[patient['cancer_type']],
320 |                 other_terms=[f"{variant['gene']} mutation"],
321 |                 status="RECRUITING",
322 |                 limit=3
323 |             )
324 | 
325 |             report['variants'].append({
326 |                 'variant': variant,
327 |                 'annotation': var_info,
328 |                 'relevant_articles': len(articles.articles),
329 |                 'available_trials': len(trials.trials)
330 |             })
331 | 
332 |             report['trials'].extend(trials.trials)
333 | 
334 |         # Generate summary
335 |         print(f"\nPrecision Medicine Report - {patient['id']}")
336 |         print(f"Cancer Type: {patient['cancer_type']}")
337 |         print(f"Variants Analyzed: {len(report['variants'])}")
338 |         print(f"Clinical Trials Found: {len(report['trials'])}")
339 | 
340 |         # Prioritize actionable findings
341 |         actionable = [v for v in report['variants']
342 |                      if v['available_trials'] > 0]
343 | 
344 |         if actionable:
345 |             print(f"\n✓ {len(actionable)} ACTIONABLE variants with trial options")
346 | 
347 |         return report
348 | 
349 | # Example usage
350 | patient = {
351 |     'id': 'PT001',
352 |     'cancer_type': 'lung adenocarcinoma',
353 |     'variants': [
354 |         {'gene': 'EGFR', 'hgvs': 'p.L858R'},
355 |         {'gene': 'TP53', 'hgvs': 'p.R273H'}
356 |     ]
357 | }
358 | 
359 | report = asyncio.run(generate_precision_medicine_report(patient))
360 | ```
361 | 
362 | ---
363 | 
364 | ## Tips for All Workflows
365 | 
366 | 1. **Always start with the think tool** (for AI assistants)
367 | 2. **Use official gene symbols** - check genenames.org
368 | 3. **Batch API calls** when possible
369 | 4. **Handle errors gracefully** - APIs can be unavailable
370 | 5. **Cache frequently accessed data** - respect rate limits
371 | 6. **Document your process** - for reproducibility
372 | 
373 | ## Next Steps
374 | 
375 | - [Command Reference](../reference/quick-reference.md)
376 | - [API Documentation](../apis/python-sdk.md)
377 | - [Troubleshooting](../troubleshooting.md)
378 | 
```
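`robust_search` in the workflows page runs inside a coroutine, so its backoff wait has to be `await asyncio.sleep(...)`. A blocking `time.sleep` would stall every other task on the event loop. The runnable sketch below shows the pattern without the BioMCP client. `RateLimitError` is a local stand-in for the exception the page imports from `biomcp.exceptions`, and `flaky_lookup` is a hypothetical call that is rate-limited twice before it succeeds.

```python
import asyncio


class RateLimitError(Exception):
    """Local stand-in for biomcp.exceptions.RateLimitError (assumed)."""


async def with_backoff(func, *args, retries=3, base_delay=1.0, **kwargs):
    """Await func(*args, **kwargs), retrying on RateLimitError.

    Waits base_delay * 2**attempt seconds between attempts with
    asyncio.sleep, so other tasks keep running during the wait.
    """
    for attempt in range(retries):
        try:
            return await func(*args, **kwargs)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            await asyncio.sleep(base_delay * 2**attempt)


# Hypothetical flaky call: rate-limited twice, then succeeds
calls = {"n": 0}


async def flaky_lookup(gene):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"symbol": gene}


result = asyncio.run(with_backoff(flaky_lookup, "BRAF", base_delay=0.01))
print(result, calls["n"])  # {'symbol': 'BRAF'} 3
```

A small `base_delay` keeps the demo fast. Against a real rate-limited API, one second or more is a more reasonable starting point.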

--------------------------------------------------------------------------------
/src/biomcp/trials/nci_search.py:
--------------------------------------------------------------------------------

```python
  1 | """NCI Clinical Trials Search API integration for trial searches."""
  2 | 
  3 | import logging
  4 | from typing import Any
  5 | 
  6 | from ..constants import NCI_TRIALS_URL
  7 | from ..diseases.search import search_diseases
  8 | from ..integrations.cts_api import CTSAPIError, make_cts_request
  9 | from ..interventions.search import search_interventions
 10 | from .search import TrialQuery
 11 | 
 12 | logger = logging.getLogger(__name__)
 13 | 
 14 | 
 15 | async def _expand_disease_terms(
 16 |     conditions: list[str],
 17 |     expand_synonyms: bool,
 18 | ) -> list[str]:
 19 |     """Expand disease terms with synonyms if requested."""
 20 |     if not expand_synonyms:
 21 |         return conditions
 22 | 
 23 |     disease_terms = []
 24 |     for condition in conditions:
 25 |         try:
 26 |             results = await search_diseases(
 27 |                 name=condition,
 28 |                 include_synonyms=True,
 29 |                 page_size=5,
 30 |             )
 31 |             # Add the original term plus the names of the top three matches
 32 |             disease_terms.append(condition)
 33 |             for disease in results.get("diseases", [])[:3]:
 34 |                 if disease.get("name"):
 35 |                     disease_terms.append(disease["name"])
 36 |                 # Add top synonyms
 37 |                 synonyms = disease.get("synonyms", [])
 38 |                 if isinstance(synonyms, list):
 39 |                     disease_terms.extend(synonyms[:2])
 40 |         except Exception as e:
 41 |             logger.warning(f"Failed to expand disease term {condition}: {e}")
 42 |             disease_terms.append(condition)
 43 | 
 44 |     # Remove duplicates while preserving order
 45 |     seen = set()
 46 |     unique_diseases = []
 47 |     for term in disease_terms:
 48 |         if term.lower() not in seen:
 49 |             seen.add(term.lower())
 50 |             unique_diseases.append(term)
 51 | 
 52 |     return unique_diseases
 53 | 
 54 | 
 55 | async def _normalize_interventions(interventions: list[str]) -> list[str]:
 56 |     """Normalize intervention names to IDs where possible."""
 57 |     intervention_ids = []
 58 |     for intervention in interventions:
 59 |         try:
 60 |             results = await search_interventions(
 61 |                 name=intervention,
 62 |                 page_size=1,
 63 |             )
 64 |             interventions_data = results.get("interventions", [])
 65 |             if interventions_data:
 66 |                 # Use the ID if available, otherwise the name
 67 |                 int_id = interventions_data[0].get("id", intervention)
 68 |                 intervention_ids.append(int_id)
 69 |             else:
 70 |                 intervention_ids.append(intervention)
 71 |         except Exception:
 72 |             intervention_ids.append(intervention)
 73 | 
 74 |     return intervention_ids
 75 | 
 76 | 
 77 | def _map_phase_to_nci(phase: Any) -> str | None:
 78 |     """Map TrialPhase enum to NCI phase values."""
 79 |     if not phase:
 80 |         return None
 81 | 
 82 |     phase_map = {
 83 |         "EARLY_PHASE1": "I",
 84 |         "PHASE1": "I",
 85 |         "PHASE2": "II",
 86 |         "PHASE3": "III",
 87 |         "PHASE4": "IV",
 88 |         "NOT_APPLICABLE": "NA",
 89 |     }
 90 |     return phase_map.get(phase.value, phase.value)
 91 | 
 92 | 
 93 | def _map_status_to_nci(recruiting_status: Any) -> list[str] | None:
 94 |     """Map RecruitingStatus enum to NCI status values."""
 95 |     if not recruiting_status:
 96 |         return None
 97 | 
 98 |     status_map = {
 99 |         "OPEN": ["recruiting", "enrolling_by_invitation"],
100 |         "CLOSED": ["active_not_recruiting", "completed", "terminated"],
101 |         "ANY": None,
102 |     }
103 |     return status_map.get(recruiting_status.value)
104 | 
105 | 
106 | def _map_sort_to_nci(sort: Any) -> str | None:
107 |     """Map SortOrder enum to NCI sort values."""
108 |     if not sort:
109 |         return None
110 | 
111 |     sort_map = {
112 |         "RELEVANCE": "relevance",
113 |         "LAST_UPDATE": "last_update_date",
114 |         "START_DATE": "start_date",
115 |         "COMPLETION_DATE": "completion_date",
116 |     }
117 |     return sort_map.get(sort.value)
118 | 
119 | 
120 | def _add_location_params(params: dict[str, Any], query: TrialQuery) -> None:
121 |     """Add location parameters if present."""
122 |     if query.lat is not None and query.long is not None:
123 |         params["latitude"] = query.lat
124 |         params["longitude"] = query.long
125 |         params["distance"] = query.distance or 50
126 | 
127 | 
128 | def _add_eligibility_params(params: dict[str, Any], query: TrialQuery) -> None:
129 |     """Add advanced eligibility criteria parameters."""
130 |     if query.prior_therapies:
131 |         params["prior_therapy"] = query.prior_therapies
132 | 
133 |     if query.required_mutations:
134 |         params["biomarkers"] = query.required_mutations
135 | 
136 |     if query.allow_brain_mets is not None:
137 |         params["accepts_brain_mets"] = query.allow_brain_mets
138 | 
139 | 
140 | async def convert_query_to_nci(query: TrialQuery) -> dict[str, Any]:
141 |     """
142 |     Convert a TrialQuery object to NCI CTS API parameters.
143 | 
144 |     Maps BioMCP's TrialQuery fields to NCI's parameter structure.
145 |     """
146 |     params: dict[str, Any] = {}
147 | 
148 |     # Basic search terms
149 |     if query.terms:
150 |         params["_fulltext"] = " ".join(query.terms)
151 | 
152 |     # Conditions/diseases with synonym expansion
153 |     if query.conditions:
154 |         disease_terms = await _expand_disease_terms(
155 |             query.conditions,
156 |             query.expand_synonyms,
157 |         )
158 |         if disease_terms:
159 |             params["diseases"] = disease_terms
160 | 
161 |     # Interventions
162 |     if query.interventions:
163 |         params["interventions"] = await _normalize_interventions(
164 |             query.interventions
165 |         )
166 | 
167 |     # NCT IDs
168 |     if query.nct_ids:
169 |         params["nct_ids"] = query.nct_ids
170 | 
171 |     # Phase and status mappings
172 |     nci_phase = _map_phase_to_nci(query.phase)
173 |     if nci_phase:
174 |         params["phase"] = nci_phase
175 | 
176 |     statuses = _map_status_to_nci(query.recruiting_status)
177 |     if statuses:
178 |         params["recruitment_status"] = statuses
179 | 
180 |     # Location and eligibility
181 |     _add_location_params(params, query)
182 |     _add_eligibility_params(params, query)
183 | 
184 |     # Pagination
185 |     params["size"] = query.page_size if query.page_size else 20
186 | 
187 |     # Sort order
188 |     sort_value = _map_sort_to_nci(query.sort)
189 |     if sort_value:
190 |         params["sort"] = sort_value
191 | 
192 |     return params
193 | 
194 | 
195 | async def search_trials_nci(
196 |     query: TrialQuery,
197 |     api_key: str | None = None,
198 | ) -> dict[str, Any]:
199 |     """
200 |     Search for clinical trials using NCI CTS API.
201 | 
202 |     Returns:
203 |         Dictionary with:
204 |         - trials: List of trial records
205 |         - total: Total number of results
206 |         - next_page: Token for next page (if available)
207 |         - source: "nci" to indicate data source
208 |     """
209 |     try:
210 |         # Convert query to NCI parameters
211 |         params = await convert_query_to_nci(query)
212 | 
213 |         # Make API request
214 |         response = await make_cts_request(
215 |             url=NCI_TRIALS_URL,
216 |             params=params,
217 |             api_key=api_key,
218 |         )
219 | 
220 |         # Process response
221 |         trials = response.get("data", response.get("trials", []))
222 |         total = response.get("total", len(trials))
223 |         next_page = response.get("next_page_token")
224 | 
225 |         return {
226 |             "trials": trials,
227 |             "total": total,
228 |             "next_page": next_page,
229 |             "source": "nci",
230 |         }
231 | 
232 |     except CTSAPIError:
233 |         raise
234 |     except Exception as e:
235 |         logger.error(f"NCI trial search failed: {e}")
236 |         raise CTSAPIError(f"Trial search failed: {e!s}") from e
237 | 
238 | 
239 | def _format_trial_header(trial: dict[str, Any]) -> list[str]:
240 |     """Format trial header with basic info."""
241 |     nct_id = trial.get("nct_id", trial.get("protocol_id", "Unknown"))
242 |     title = trial.get("title", trial.get("brief_title", "Untitled"))
243 |     phase = trial.get("phase", "Not specified")
244 |     status = trial.get("overall_status", trial.get("status", "Unknown"))
245 | 
246 |     return [
247 |         f"### [{nct_id}] {title}",
248 |         f"- **Phase**: {phase}",
249 |         f"- **Status**: {status}",
250 |     ]
251 | 
252 | 
253 | def _format_trial_summary_text(trial: dict[str, Any]) -> list[str]:
254 |     """Format trial summary text if available."""
255 |     summary = trial.get("brief_summary", trial.get("description", ""))
256 |     if not summary:
257 |         return []
258 | 
259 |     if len(summary) > 200:
260 |         summary = summary[:197] + "..."
261 |     return [f"- **Summary**: {summary}"]
262 | 
263 | 
264 | def _format_trial_conditions(trial: dict[str, Any]) -> list[str]:
265 |     """Format trial conditions/diseases."""
266 |     conditions = trial.get("diseases", trial.get("conditions", []))
267 |     if not conditions:
268 |         return []
269 | 
270 |     lines = []
271 |     if isinstance(conditions, list):
272 |         lines.append(f"- **Conditions**: {', '.join(conditions[:3])}")
273 |         if len(conditions) > 3:
274 |             lines.append(f"  *(and {len(conditions) - 3} more)*")
275 |     else:
276 |         lines.append(f"- **Conditions**: {conditions}")
277 | 
278 |     return lines
279 | 
280 | 
281 | def _format_trial_interventions(trial: dict[str, Any]) -> list[str]:
282 |     """Format trial interventions."""
283 |     interventions = trial.get("interventions", [])
284 |     if not interventions:
285 |         return []
286 | 
287 |     int_names = []
288 |     for intervention in interventions[:3]:
289 |         if isinstance(intervention, dict):
290 |             int_names.append(intervention.get("name", "Unknown"))
291 |         else:
292 |             int_names.append(str(intervention))
293 | 
294 |     if not int_names:
295 |         return []
296 | 
297 |     lines = [f"- **Interventions**: {', '.join(int_names)}"]
298 |     if len(interventions) > 3:
299 |         lines.append(f"  *(and {len(interventions) - 3} more)*")
300 | 
301 |     return lines
302 | 
303 | 
304 | def _format_trial_metadata(trial: dict[str, Any]) -> list[str]:
305 |     """Format trial metadata (sponsor, eligibility notes)."""
306 |     lines = []
307 | 
308 |     lead_org = trial.get("lead_org", trial.get("sponsor", ""))
309 |     if lead_org:
310 |         lines.append(f"- **Lead Organization**: {lead_org}")
311 | 
312 |     if trial.get("accepts_brain_mets"):
313 |         lines.append("- **Note**: Accepts patients with brain metastases")
314 | 
315 |     return lines
316 | 
317 | 
318 | def _format_trial_summary(trial: dict[str, Any]) -> list[str]:
319 |     """Format a single trial summary."""
320 |     lines = []
321 | 
322 |     # Add header info
323 |     lines.extend(_format_trial_header(trial))
324 | 
325 |     # Add summary text
326 |     lines.extend(_format_trial_summary_text(trial))
327 | 
328 |     # Add conditions
329 |     lines.extend(_format_trial_conditions(trial))
330 | 
331 |     # Add interventions
332 |     lines.extend(_format_trial_interventions(trial))
333 | 
334 |     # Add metadata
335 |     lines.extend(_format_trial_metadata(trial))
336 | 
337 |     lines.append("")
338 |     return lines
339 | 
340 | 
341 | def format_nci_trial_results(results: dict[str, Any]) -> str:
342 |     """
343 |     Format NCI trial search results as markdown.
344 |     """
345 |     trials = results.get("trials", [])
346 |     total = results.get("total", 0)
347 | 
348 |     if not trials:
349 |         return "No trials found matching the search criteria in NCI database."
350 | 
351 |     lines = [
352 |         f"## NCI Clinical Trials Search Results ({total} found)",
353 |         "",
354 |         "*Source: NCI Clinical Trials Search API*",
355 |         "",
356 |     ]
357 | 
358 |     for trial in trials:
359 |         lines.extend(_format_trial_summary(trial))
360 | 
361 |     return "\n".join(lines)
362 | 
```
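The response handling in `search_trials_nci` and the "first three, then *(and N more)*" list formatting in `_format_trial_conditions` can be exercised without a network call. Below is a minimal standalone sketch that mirrors that logic for illustration. It is not the module's own code. `normalize_nci_response` and `format_list_field` are hypothetical names. Like the code above, it assumes the CTS payload keeps its records under `data`, falling back to `trials`. It also assumes the condition list is already a list of strings; the original additionally handles a non-list value.

```python
from typing import Any


def normalize_nci_response(response: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper mirroring the envelope logic of search_trials_nci."""
    # Prefer "data", fall back to "trials", else an empty list
    trials = response.get("data", response.get("trials", []))
    return {
        "trials": trials,
        # When the API omits "total", count the records actually returned
        "total": response.get("total", len(trials)),
        "next_page": response.get("next_page_token"),
        "source": "nci",
    }


def format_list_field(label: str, values: list[str], limit: int = 3) -> list[str]:
    """Hypothetical helper: show the first `limit` values, then an overflow note."""
    if not values:
        return []
    lines = [f"- **{label}**: {', '.join(values[:limit])}"]
    if len(values) > limit:
        lines.append(f"  *(and {len(values) - limit} more)*")
    return lines


# Placeholder payload, not real API output
result = normalize_nci_response({"data": [{"nct_id": "NCT-EXAMPLE"}]})
print(result["total"], result["source"])  # 1 nci
print("\n".join(format_list_field("Conditions", ["A", "B", "C", "D", "E"])))
# - **Conditions**: A, B, C
#   *(and 2 more)*
```

Falling back to `len(trials)` means that a response without a `total` field reports only the size of the current page, not the full result count.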

--------------------------------------------------------------------------------
/src/biomcp/variants/alphagenome.py:
--------------------------------------------------------------------------------

```python
  1 | """AlphaGenome integration for variant effect prediction."""
  2 | 
  3 | import logging
  4 | import os
  5 | import re
  6 | from typing import Any, TypedDict
  7 | 
  8 | from ..utils.request_cache import request_cache
  9 | 
 10 | logger = logging.getLogger(__name__)
 11 | 
 12 | # Default threshold for significant changes
 13 | DEFAULT_SIGNIFICANCE_THRESHOLD = 0.5
 14 | 
 15 | # Chromosome pattern for validation
 16 | CHROMOSOME_PATTERN = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")
 17 | 
 18 | # Valid nucleotide characters
 19 | VALID_NUCLEOTIDES = set("ACGT")
 20 | 
 21 | 
 22 | class VariantPrediction(TypedDict):
 23 |     """Type definition for variant prediction results."""
 24 | 
 25 |     gene_expression: dict[str, float]
 26 |     chromatin_accessibility: dict[str, float]
 27 |     splicing_effects: list[str]
 28 |     summary_stats: dict[str, int]
 29 | 
 30 | 
 31 | @request_cache(ttl=1800)  # Cache for 30 minutes
 32 | async def predict_variant_effects(
 33 |     chromosome: str,
 34 |     position: int,
 35 |     reference: str,
 36 |     alternate: str,
 37 |     interval_size: int = 131_072,
 38 |     tissue_types: list[str] | None = None,
 39 |     significance_threshold: float = DEFAULT_SIGNIFICANCE_THRESHOLD,
 40 |     api_key: str | None = None,
 41 | ) -> str:
 42 |     """
 43 |     Predict variant effects using AlphaGenome.
 44 | 
 45 |     Args:
 46 |         chromosome: Chromosome (e.g., 'chr7')
 47 |         position: 1-based genomic position
 48 |         reference: Reference allele(s)
 49 |         alternate: Alternate allele(s)
 50 |         interval_size: Size of genomic context window, rounded up to a supported size (max 1,048,576)
 51 |         tissue_types: Optional UBERON ontology terms (currently only echoed in error messages, not passed to scoring)
 52 |         significance_threshold: Threshold for significant changes (default 0.5)
 53 |         api_key: Optional API key (if not provided, uses ALPHAGENOME_API_KEY env var)
 54 | 
 55 |     Returns:
 56 |         Formatted markdown string with predictions
 57 | 
 58 |     Raises:
 59 |         ValueError: If input parameters are invalid
 60 |     """
 61 |     # Validate inputs
 62 |     _validate_inputs(chromosome, position, reference, alternate)
 63 | 
 64 |     # Check for API key (prefer parameter over environment variable)
 65 |     if not api_key:
 66 |         api_key = os.getenv("ALPHAGENOME_API_KEY")
 67 | 
 68 |     if not api_key:
 69 |         return (
 70 |             "❌ **AlphaGenome API key required**\n\n"
 71 |             "I need an API key to use AlphaGenome. Please provide it by either:\n\n"
 72 |             "**Option 1: Include your key in your request**\n"
 73 |             'Say: "My AlphaGenome API key is YOUR_KEY_HERE" and I\'ll use it for this prediction.\n\n'
 74 |             "**Option 2: Set it as an environment variable (for persistent use)**\n"
 75 |             "```bash\n"
 76 |             "export ALPHAGENOME_API_KEY='your-key'\n"
 77 |             "```\n\n"
 78 |             "Get a free API key at: https://deepmind.google.com/science/alphagenome\n\n"
 79 |             "**ACTION REQUIRED**: Please provide your API key using Option 1 above to continue."
 80 |         )
 81 | 
 82 |     # Try to import AlphaGenome
 83 |     try:
 84 |         # Suppress protobuf version warnings
 85 |         import warnings
 86 | 
 87 |         warnings.filterwarnings(
 88 |             "ignore",
 89 |             category=UserWarning,
 90 |             module="google.protobuf.runtime_version",
 91 |         )
 92 | 
 93 |         from alphagenome.data import genome
 94 |         from alphagenome.models import dna_client, variant_scorers
 95 |     except ImportError:
 96 |         return (
 97 |             "❌ **AlphaGenome not installed**\n\n"
 98 |             "To install:\n"
 99 |             "```bash\n"
100 |             "git clone https://github.com/google-deepmind/alphagenome.git\n"
101 |             "cd alphagenome && pip install .\n"
102 |             "```\n\n"
103 |             "Standard variant annotations are still available via `variant_searcher`."
104 |         )
105 | 
106 |     try:
107 |         # Create client
108 |         model = dna_client.create(api_key)
109 | 
110 |         # Calculate interval boundaries (ensure within supported sizes)
111 |         # Supported sizes: 2048, 16384, 131072, 524288, 1048576
112 |         supported_sizes = [2048, 16384, 131072, 524288, 1048576]
113 | 
114 |         # Find smallest supported size that's >= requested size
115 |         valid_sizes = [s for s in supported_sizes if s >= interval_size]
116 |         if not valid_sizes:
117 |             # If requested size is larger than max, use max
118 |             interval_size = supported_sizes[-1]
119 |         else:
120 |             interval_size = min(valid_sizes)
121 | 
122 |         half_size = interval_size // 2
123 |         interval_start = max(0, position - half_size - 1)  # Convert to 0-based
124 |         interval_end = interval_start + interval_size
125 | 
126 |         # Create interval and variant objects
127 |         interval = genome.Interval(
128 |             chromosome=chromosome, start=interval_start, end=interval_end
129 |         )
130 | 
131 |         variant = genome.Variant(
132 |             chromosome=chromosome,
133 |             position=position,
134 |             reference_bases=reference,
135 |             alternate_bases=alternate,
136 |         )
137 | 
138 |         # Get recommended scorers for human
139 |         scorers = variant_scorers.get_recommended_scorers(organism="human")
140 | 
141 |         # Make prediction
142 |         scores = model.score_variant(
143 |             interval=interval, variant=variant, variant_scorers=scorers
144 |         )
145 | 
146 |         # Format results
147 |         return _format_predictions(
148 |             variant, scores, interval_size, significance_threshold
149 |         )
150 | 
151 |     except Exception as e:
152 |         logger.error(f"AlphaGenome prediction failed: {e}", exc_info=True)
153 |         error_context = (
154 |             f"❌ **AlphaGenome prediction failed**\n\n"
155 |             f"Error: {e!s}\n\n"
156 |             f"**Context:**\n"
157 |             f"- Variant: {chromosome}:{position} {reference}>{alternate}\n"
158 |             f"- Interval size: {interval_size:,} bp\n"
159 |             f"- Tissue types: {tissue_types or 'None specified'}"
160 |         )
161 |         return error_context
162 | 
163 | 
164 | def _format_predictions(
165 |     variant: Any,
166 |     scores: list[Any],
167 |     interval_size: int,
168 |     significance_threshold: float = DEFAULT_SIGNIFICANCE_THRESHOLD,
169 | ) -> str:
170 |     """Format AlphaGenome predictions into markdown.
171 | 
172 |     Args:
173 |         variant: The variant object from AlphaGenome
174 |         scores: List of prediction scores
175 |         interval_size: Size of the genomic context window
176 |         significance_threshold: Threshold for significant changes
177 | 
178 |     Returns:
179 |         Formatted markdown string
180 |     """
181 |     try:
182 |         from alphagenome.models import variant_scorers
183 | 
184 |         # Convert scores to DataFrame
185 |         scores_df = variant_scorers.tidy_scores(scores)
186 | 
187 |         # Start building the output
188 |         lines = [
189 |             "## AlphaGenome Variant Effect Predictions\n",
190 |             f"**Variant**: {variant.chromosome}:{variant.position} {variant.reference_bases}>{variant.alternate_bases}",
191 |             f"**Analysis window**: {interval_size:,} bp\n",
192 |         ]
193 | 
194 |         # Group scores by output type
195 |         if not scores_df.empty:
196 |             # Gene expression effects
197 |             expr_scores = scores_df[
198 |                 scores_df["output_type"].str.contains("RNA_SEQ", na=False)
199 |             ]
200 |             if not expr_scores.empty:
201 |                 top_expr = expr_scores.loc[
202 |                     expr_scores["raw_score"].abs().idxmax()
203 |                 ]
204 |                 gene = top_expr.get("gene_name", "Unknown")
205 |                 score = top_expr["raw_score"]
206 |                 direction = "↓ decreases" if score < 0 else "↑ increases"
207 |                 lines.append("\n### Gene Expression")
208 |                 lines.append(
209 |                     f"- **{gene}**: {score:+.2f} log₂ fold change ({direction} expression)"
210 |                 )
211 | 
212 |             # Chromatin accessibility
213 |             chrom_scores = scores_df[
214 |                 scores_df["output_type"].str.contains("ATAC|DNASE", na=False)
215 |             ]
216 |             if not chrom_scores.empty:
217 |                 top_chrom = chrom_scores.loc[
218 |                     chrom_scores["raw_score"].abs().idxmax()
219 |                 ]
220 |                 score = top_chrom["raw_score"]
221 |                 track = top_chrom.get("track_name", "tissue")
222 |                 direction = "↓ decreases" if score < 0 else "↑ increases"
223 |                 lines.append("\n### Chromatin Accessibility")
224 |                 lines.append(
225 |                     f"- **{track}**: {score:+.2f} log₂ change ({direction} accessibility)"
226 |                 )
227 | 
228 |             # Splicing effects
229 |             splice_scores = scores_df[
230 |                 scores_df["output_type"].str.contains("SPLICE", na=False)
231 |             ]
232 |             if not splice_scores.empty:
233 |                 lines.append("\n### Splicing")
234 |                 lines.append("- Potential splicing alterations detected")
235 | 
236 |             # Summary statistics
237 |             total_tracks = len(scores_df)
238 |             significant = len(
239 |                 scores_df[
240 |                     scores_df["raw_score"].abs() > significance_threshold
241 |                 ]
242 |             )
243 |             lines.append("\n### Summary")
244 |             lines.append(f"- Analyzed {total_tracks} regulatory tracks")
245 |             lines.append(
246 |                 f"- {significant} tracks show substantial changes (|log₂| > {significance_threshold})"
247 |             )
248 |         else:
249 |             lines.append("\n*No significant regulatory effects predicted*")
250 | 
251 |         return "\n".join(lines)
252 | 
253 |     except Exception as e:
254 |         logger.error(f"Failed to format predictions: {e}")
255 |         return f"## AlphaGenome Results\n\nPrediction completed but formatting failed: {e!s}"
256 | 
257 | 
258 | def _validate_inputs(
259 |     chromosome: str, position: int, reference: str, alternate: str
260 | ) -> None:
261 |     """Validate input parameters for variant prediction.
262 | 
263 |     Args:
264 |         chromosome: Chromosome identifier
265 |         position: Genomic position
266 |         reference: Reference allele(s)
267 |         alternate: Alternate allele(s)
268 | 
269 |     Raises:
270 |         ValueError: If any input is invalid
271 |     """
272 |     # Validate chromosome format
273 |     if not CHROMOSOME_PATTERN.match(chromosome):
274 |         raise ValueError(
275 |             f"Invalid chromosome format: {chromosome}. "
276 |             "Expected format: chr1-22, chrX, chrY, chrM, or chrMT"
277 |         )
278 | 
279 |     # Validate position
280 |     if position < 1:
281 |         raise ValueError(f"Position must be >= 1, got {position}")
282 | 
283 |     # Validate nucleotides
284 |     ref_upper = reference.upper()
285 |     alt_upper = alternate.upper()
286 | 
287 |     if not ref_upper:
288 |         raise ValueError("Reference allele cannot be empty")
289 | 
290 |     if not alt_upper:
291 |         raise ValueError("Alternate allele cannot be empty")
292 | 
293 |     invalid_ref = set(ref_upper) - VALID_NUCLEOTIDES
294 |     if invalid_ref:
295 |         raise ValueError(
296 |             f"Invalid nucleotides in reference allele: {invalid_ref}. "
297 |             f"Only A, C, G, T are allowed"
298 |         )
299 | 
300 |     invalid_alt = set(alt_upper) - VALID_NUCLEOTIDES
301 |     if invalid_alt:
302 |         raise ValueError(
303 |             f"Invalid nucleotides in alternate allele: {invalid_alt}. "
304 |             f"Only A, C, G, T are allowed"
305 |         )
306 | 
```
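The window arithmetic in `predict_variant_effects` works in three steps. It rounds the requested size up to the smallest supported size, capping at the largest. It then converts the 1-based position to 0-based. Finally, it centres a window of that size on the variant. The standalone sketch below reproduces those steps so they can be checked in isolation. `context_window` and `SUPPORTED_SIZES` are hypothetical names. The size list is copied from the code comment and has not been re-checked against AlphaGenome's documentation.

```python
# Copied from the inline list in predict_variant_effects (assumed current)
SUPPORTED_SIZES = [2048, 16384, 131072, 524288, 1048576]


def context_window(position: int, requested_size: int) -> tuple[int, int, int]:
    """Return (start, end, size) for a 1-based variant position.

    Rounds requested_size up to the smallest supported size (or uses the
    largest if none is big enough), then places the variant at offset
    size // 2 inside [start, start + size).
    """
    candidates = [s for s in SUPPORTED_SIZES if s >= requested_size]
    size = min(candidates) if candidates else SUPPORTED_SIZES[-1]
    half = size // 2
    # position - 1 converts to 0-based; clamp so the window never starts below 0
    start = max(0, position - half - 1)
    return start, start + size, size


print(context_window(140_453_136, 100_000))  # (140387599, 140518671, 131072)
```

Near the start of a chromosome, the clamp shifts the window right, so the variant is no longer centred. Neither the sketch nor the original clamps `end` against the chromosome length.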

--------------------------------------------------------------------------------
/docs/backend-services-reference/02-biothings-suite.md:
--------------------------------------------------------------------------------

```markdown
  1 | # BioThings Suite API Reference
  2 | 
  3 | The BioThings Suite provides unified access to biomedical annotations across genes, variants, diseases, and drugs through a consistent API.
  4 | 
  5 | ## Usage Examples
  6 | 
  7 | For practical examples using the BioThings APIs, see:
  8 | 
  9 | - [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#biothings-integration-for-enhanced-search)
 10 | - [Get Comprehensive Variant Annotations](../how-to-guides/03-get-comprehensive-variant-annotations.md#integration-with-other-biomcp-tools)
 11 | 
 12 | ## Overview
 13 | 
 14 | BioMCP integrates with four BioThings APIs:
 15 | 
 16 | - **MyGene.info**: Gene annotations and functional information
 17 | - **MyVariant.info**: Genetic variant annotations and clinical significance
 18 | - **MyDisease.info**: Disease ontology and terminology mappings
 19 | - **MyChem.info**: Drug/chemical properties and mechanisms
 20 | 
 21 | All APIs share:
 22 | 
 23 | - RESTful JSON interface
 24 | - No authentication required
 25 | - Elasticsearch-based queries
 26 | - Comprehensive data aggregation
 27 | 
 28 | ## MyGene.info
 29 | 
 30 | ### Base URL
 31 | 
 32 | `https://mygene.info/v1/`
 33 | 
 34 | ### Key Endpoints
 35 | 
 36 | #### Gene Query
 37 | 
 38 | ```
 39 | GET /query?q={query}
 40 | ```
 41 | 
 42 | **Parameters:**
 43 | 
 44 | - `q`: Query string (gene symbol, name, or ID)
 45 | - `fields`: Specific fields to return
 46 | - `species`: Limit to species (default: human, mouse, rat)
 47 | - `size`: Number of results (default: 10)
 48 | 
 49 | **Example:**
 50 | 
 51 | ```bash
 52 | curl "https://mygene.info/v1/query?q=BRAF&fields=symbol,name,summary,type_of_gene"
 53 | ```
 54 | 
 55 | #### Gene Annotation
 56 | 
 57 | ```
 58 | GET /gene/{geneid}
 59 | ```
 60 | 
 61 | **Gene ID formats:**
 62 | 
 63 | - Entrez Gene ID: `673`
 64 | - Ensembl ID: `ENSG00000157764`
 65 | - Gene Symbol: `BRAF`
 66 | 
 67 | **Example:**
 68 | 
 69 | ```bash
 70 | curl "https://mygene.info/v1/gene/673?fields=symbol,name,summary,genomic_pos,pathway,go"
 71 | ```
 72 | 
 73 | ### Important Fields
 74 | 
 75 | | Field         | Description            | Example                                 |
 76 | | ------------- | ---------------------- | --------------------------------------- |
 77 | | `symbol`      | Official gene symbol   | "BRAF"                                  |
 78 | | `name`        | Full gene name         | "B-Raf proto-oncogene"                  |
 79 | | `entrezgene`  | NCBI Entrez ID         | 673                                     |
 80 | | `summary`     | Functional description | "This gene encodes..."                  |
 81 | | `genomic_pos` | Chromosomal location   | {"chr": "7", "start": 140433812}        |
 82 | | `pathway`     | Pathway memberships    | {"kegg": [...], "reactome": [...]}      |
 83 | | `go`          | Gene Ontology terms    | {"BP": [...], "MF": [...], "CC": [...]} |
 84 | 
 85 | ## MyVariant.info
 86 | 
 87 | ### Base URL
 88 | 
 89 | `https://myvariant.info/v1/`
 90 | 
 91 | ### Key Endpoints
 92 | 
 93 | #### Variant Query
 94 | 
 95 | ```
 96 | GET /query?q={query}
 97 | ```
 98 | 
 99 | **Query syntax:**
100 | 
101 | - Gene + variant: `dbnsfp.genename:BRAF AND dbnsfp.hgvsp:p.V600E`
102 | - rsID: `dbsnp.rsid:rs121913529`
103 | - Genomic: `_id:chr7:g.140453136A>T`
104 | 
105 | **Example:**
106 | 
107 | ```bash
108 | curl "https://myvariant.info/v1/query?q=dbnsfp.genename:TP53&fields=_id,clinvar,gnomad_exome"
109 | ```
110 | 
111 | #### Variant Annotation
112 | 
113 | ```
114 | GET /variant/{variant_id}
115 | ```
116 | 
117 | **ID formats:**
118 | 
119 | - HGVS genomic: `chr7:g.140453136A>T`
120 | - dbSNP: `rs121913529`
121 | 
122 | ### Important Fields
123 | 
124 | | Field          | Description            | Example                                 |
125 | | -------------- | ---------------------- | --------------------------------------- |
126 | | `clinvar`      | Clinical significance  | {"clinical_significance": "Pathogenic"} |
127 | | `dbsnp`        | dbSNP annotations      | {"rsid": "rs121913529"}                 |
128 | | `cadd`         | CADD scores            | {"phred": 35}                           |
129 | | `gnomad_exome` | Population frequency   | {"af": {"af": 0.00001}}                 |
130 | | `dbnsfp`       | Functional predictions | {"polyphen2": "probably_damaging"}      |
131 | 
132 | ### Query Filters
133 | 
134 | ```python
135 | # Clinical significance
136 | q = "clinvar.clinical_significance:pathogenic"
137 | 
138 | # Frequency filters
139 | q = "gnomad_exome.af.af:<0.01"  # Rare variants
140 | 
141 | # Gene-specific
142 | q = "dbnsfp.genename:BRCA1 AND cadd.phred:>20"
143 | ```
144 | 
145 | ## MyDisease.info
146 | 
147 | ### Base URL
148 | 
149 | `https://mydisease.info/v1/`
150 | 
151 | ### Key Endpoints
152 | 
153 | #### Disease Query
154 | 
155 | ```
156 | GET /query?q={query}
157 | ```
158 | 
159 | **Example:**
160 | 
161 | ```bash
162 | curl "https://mydisease.info/v1/query?q=melanoma&fields=mondo,disease_ontology,synonyms"
163 | ```
164 | 
165 | #### Disease Annotation
166 | 
167 | ```
168 | GET /disease/{disease_id}
169 | ```
170 | 
171 | **ID formats:**
172 | 
173 | - MONDO: `MONDO:0007254`
174 | - DOID: `DOID:1909`
175 | - OMIM: `OMIM:155600`
176 | 
177 | ### Important Fields
178 | 
179 | | Field              | Description       | Example                                      |
180 | | ------------------ | ----------------- | -------------------------------------------- |
181 | | `mondo`            | MONDO ontology    | {"id": "MONDO:0005105", "label": "melanoma"} |
182 | | `disease_ontology` | Disease Ontology  | {"id": "DOID:1909"}                          |
183 | | `synonyms`         | Alternative names | ["malignant melanoma", "MM"]                 |
184 | | `xrefs`            | Cross-references  | {"omim": ["155600"], "mesh": ["D008545"]}    |
185 | | `phenotypes`       | HPO terms         | [{"hpo_id": "HP:0002861"}]                   |
186 | 
187 | ## MyChem.info
188 | 
189 | ### Base URL
190 | 
191 | `https://mychem.info/v1/`
192 | 
193 | ### Key Endpoints
194 | 
195 | #### Drug Query
196 | 
197 | ```
198 | GET /query?q={query}
199 | ```
200 | 
201 | **Example:**
202 | 
203 | ```bash
204 | curl "https://mychem.info/v1/query?q=imatinib&fields=drugbank,chembl,chebi"
205 | ```
206 | 
207 | #### Drug Annotation
208 | 
209 | ```
210 | GET /drug/{drug_id}
211 | ```
212 | 
213 | **ID formats:**
214 | 
215 | - DrugBank: `DB00619`
216 | - ChEMBL: `CHEMBL941`
217 | - Name: `imatinib`
218 | 
219 | ### Important Fields
220 | 
221 | | Field          | Description    | Example                                      |
222 | | -------------- | -------------- | -------------------------------------------- |
223 | | `drugbank`     | DrugBank data  | {"id": "DB00619", "name": "Imatinib"}        |
224 | | `chembl`       | ChEMBL data    | {"molecule_chembl_id": "CHEMBL941"}          |
225 | | `chebi`        | ChEBI ontology | {"id": "CHEBI:45783"}                        |
226 | | `drugcentral`  | Indications    | {"indications": [...]}                       |
227 | | `pharmacology` | Mechanism      | {"mechanism_of_action": "BCR-ABL inhibitor"} |
228 | 
229 | ## Common Query Patterns
230 | 
231 | ### 1. Gene to Variant Pipeline
232 | 
233 | ```python
234 | # Step 1: Get gene info
235 | gene_response = requests.get(
236 |     "https://mygene.info/v1/gene/BRAF",
237 |     params={"fields": "symbol,genomic_pos"}
238 | )
239 | 
240 | # Step 2: Find variants in gene
241 | variant_response = requests.get(
242 |     "https://myvariant.info/v1/query",
243 |     params={
244 |         "q": "dbnsfp.genename:BRAF",
245 |         "fields": "clinvar.clinical_significance,gnomad_exome.af",
246 |         "size": 100
247 |     }
248 | )
249 | ```
250 | 
251 | ### 2. Disease Synonym Expansion
252 | 
253 | ```python
254 | # Get all synonyms for a disease
255 | disease_response = requests.get(
256 |     "https://mydisease.info/v1/query",
257 |     params={
258 |         "q": "melanoma",
259 |         "fields": "mondo,synonyms,xrefs"
260 |     }
261 | )
262 | 
263 | # Extract all names
264 | all_names = ["melanoma"]
265 | for hit in disease_response.json()["hits"]:
266 |     if "synonyms" in hit:
267 |         all_names.extend(hit["synonyms"])
268 | ```
269 | 
270 | ### 3. Drug Target Lookup
271 | 
272 | ```python
273 | # Find drugs targeting a gene
274 | drug_response = requests.get(
275 |     "https://mychem.info/v1/query",
276 |     params={
277 |         "q": "drugcentral.targets.gene_symbol:BRAF",
278 |         "fields": "drugbank.name,chembl.pref_name",
279 |         "size": 50
280 |     }
281 | )
282 | ```
283 | 
284 | ## Rate Limits and Best Practices
285 | 
286 | ### Rate Limits
287 | 
288 | - **Default**: 1,000 requests/hour per IP
289 | - **Batch queries**: Up to 1,000 IDs per request
290 | - **No authentication**: Public access
291 | 
292 | ### Best Practices
293 | 
294 | #### 1. Use Field Filtering
295 | 
296 | ```python
297 | # Good - only request needed fields
298 | params = {"fields": "symbol,name,summary"}
299 | 
300 | # Bad - returns all fields
301 | params = {}
302 | ```
303 | 
304 | #### 2. Batch Requests
305 | 
306 | ```python
307 | # Good - single batch query for multiple gene symbols
308 | response = requests.post(
309 |     "https://mygene.info/v1/query",
310 |     data={"q": "BRAF,KRAS,EGFR", "scopes": "symbol", "species": "human"}
311 | )
312 | 
313 | # Bad - multiple individual requests
314 | for gene in ["BRAF", "KRAS", "EGFR"]:
315 |     requests.get(f"https://mygene.info/v1/gene/{gene}")
316 | ```
317 | 
318 | #### 3. Handle Missing Data
319 | 
320 | ```python
321 | # Check for field existence
322 | if "clinvar" in variant and "clinical_significance" in variant["clinvar"]:
323 |     significance = variant["clinvar"]["clinical_significance"]
324 | else:
325 |     significance = "Not available"
326 | ```
327 | 
328 | ## Error Handling
329 | 
330 | ### Common Errors
331 | 
332 | #### 404 Not Found
333 | 
334 | ```json
335 | {
336 |   "success": false,
337 |   "error": "ID not found"
338 | }
339 | ```
340 | 
341 | #### 400 Bad Request
342 | 
343 | ```json
344 | {
345 |   "success": false,
346 |   "error": "Invalid query syntax"
347 | }
348 | ```
349 | 
350 | #### 429 Rate Limited
351 | 
352 | ```json
353 | {
354 |   "success": false,
355 |   "error": "Rate limit exceeded"
356 | }
357 | ```
358 | 
359 | ### Error Handling Code
360 | 
361 | ```python
362 | def query_biothings(api_url, query_params):
363 |     try:
364 |         response = requests.get(api_url, params=query_params, timeout=30)
365 |         response.raise_for_status()
366 |         return response.json()
367 |     except requests.exceptions.HTTPError as e:
368 |         if e.response.status_code == 404:
369 |             return {"error": "Not found", "query": query_params}
370 |         elif e.response.status_code == 429:
371 |             # Simplified: fixed 60 s wait, unbounded retries (prefer capped exponential backoff)
372 |             time.sleep(60)
373 |             return query_biothings(api_url, query_params)
374 |         else:
375 |             raise
376 | ```
377 | 
378 | ## Data Sources
379 | 
380 | Each BioThings API aggregates data from multiple sources:
381 | 
382 | ### MyGene.info Sources
383 | 
384 | - NCBI Entrez Gene
385 | - Ensembl
386 | - UniProt
387 | - KEGG, Reactome, WikiPathways
388 | - Gene Ontology
389 | 
390 | ### MyVariant.info Sources
391 | 
392 | - dbSNP
393 | - ClinVar
394 | - gnomAD
395 | - CADD
396 | - PolyPhen-2, SIFT
397 | - COSMIC
398 | 
399 | ### MyDisease.info Sources
400 | 
401 | - MONDO
402 | - Disease Ontology
403 | - OMIM
404 | - MeSH
405 | - HPO
406 | 
407 | ### MyChem.info Sources
408 | 
409 | - DrugBank
410 | - ChEMBL
411 | - ChEBI
412 | - PubChem
413 | - DrugCentral
414 | 
415 | ## Advanced Features
416 | 
417 | ### Full-Text Search
418 | 
419 | ```python
420 | # Search across all fields
421 | params = {
422 |     "q": "lung cancer EGFR",  # Searches all text fields
423 |     "fields": "symbol,name,summary"
424 | }
425 | ```
426 | 
427 | ### Faceted Search
428 | 
429 | ```python
430 | # Get aggregations
431 | params = {
432 |     "q": "clinvar.clinical_significance:pathogenic",
433 |     "facets": "dbnsfp.genename",
434 |     "size": 0  # Only return facets
435 | }
436 | ```
437 | 
438 | ### Scrolling Large Results
439 | 
440 | ```python
441 | # For results > 10,000
442 | params = {
443 |     "q": "dbnsfp.genename:TP53",
444 |     "fetch_all": True,
445 |     "fields": "_id"
446 | }
447 | ```
448 | 
449 | ## Integration Tips
450 | 
451 | ### 1. Caching Strategy
452 | 
453 | - Cache gene/drug/disease lookups (stable)
454 | - Don't cache variant queries (frequently updated)
455 | - Use ETags for conditional requests
456 | 
457 | ### 2. Parallel Requests
458 | 
459 | ```python
460 | import asyncio
461 | import aiohttp
462 | 
463 | async def fetch_all(session, urls):
464 |     async def fetch_json(url):
465 |         async with session.get(url) as resp:
466 |             return await resp.json()
467 |     return await asyncio.gather(*(fetch_json(url) for url in urls))
468 | ```
469 | 
470 | ### 3. Data Normalization
471 | 
472 | ```python
473 | def normalize_gene_symbol(symbol):
474 |     # Query MyGene to get official symbol
475 |     response = requests.get(
476 |         "https://mygene.info/v1/query", params={"q": symbol, "species": "human"}
477 |     )
478 |     if hits := response.json().get("hits"):
479 |         return hits[0].get("symbol", symbol)
480 |     return symbol
481 | ```
482 | 
```
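
The scrolling snippet in the MyVariant.info reference above shows only the first request. Below is a minimal sketch of draining a `fetch_all` scroll. It assumes the paging behavior described in the MyVariant.info query documentation: each page carries a `_scroll_id`, the next page is requested by passing that value as `scroll_id`, and an exhausted scroll returns a response without `hits`. `iter_scroll` and its injectable `fetch` callable are hypothetical helpers, not part of BioMCP.

```python
from collections.abc import Callable, Iterator
from typing import Any

# Hypothetical: `fetch` takes query parameters and returns the decoded JSON
# body of one request to the MyVariant.info query endpoint.
Fetch = Callable[[dict[str, str]], dict[str, Any]]


def iter_scroll(
    fetch: Fetch, query: str, fields: str = "_id"
) -> Iterator[dict[str, Any]]:
    """Yield every hit of a fetch_all query, following _scroll_id until exhausted."""
    params = {"q": query, "fields": fields, "fetch_all": "true"}
    while True:
        page = fetch(params)
        hits = page.get("hits", [])
        if not hits:
            return  # assumed: an exhausted scroll returns no hits
        yield from hits
        scroll_id = page.get("_scroll_id")
        if not scroll_id:
            return
        # Follow-up requests carry only the scroll cursor
        params = {"scroll_id": scroll_id}


# Usage (needs network access and the third-party `requests` package):
#   import requests
#
#   def fetch(params):
#       url = "https://myvariant.info/v1/query"
#       return requests.get(url, params=params, timeout=30).json()
#
#   variant_ids = [hit["_id"] for hit in iter_scroll(fetch, "dbnsfp.genename:TP53")]
```

Keeping `fetch` injectable lets the paging logic be exercised with canned pages. A production caller would also need to handle a scroll cursor that expires between requests, which this sketch does not attempt.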

--------------------------------------------------------------------------------
/tests/tdd/test_biothings_integration.py:
--------------------------------------------------------------------------------

```python
  1 | """Unit tests for BioThings API integration."""
  2 | 
  3 | from unittest.mock import AsyncMock, patch
  4 | 
  5 | import pytest
  6 | 
  7 | from biomcp.integrations import BioThingsClient, DiseaseInfo, GeneInfo
  8 | 
  9 | 
 10 | @pytest.fixture
 11 | def mock_http_client():
 12 |     """Mock the http_client.request_api function."""
 13 |     with patch("biomcp.integrations.biothings_client.http_client") as mock:
 14 |         yield mock
 15 | 
 16 | 
 17 | @pytest.fixture
 18 | def biothings_client():
 19 |     """Create a BioThings client instance."""
 20 |     return BioThingsClient()
 21 | 
 22 | 
 23 | class TestGeneInfo:
 24 |     """Test gene information retrieval."""
 25 | 
 26 |     @pytest.mark.asyncio
 27 |     async def test_get_gene_by_symbol(
 28 |         self, biothings_client, mock_http_client
 29 |     ):
 30 |         """Test getting gene info by symbol."""
 31 |         # Mock query response
 32 |         mock_http_client.request_api = AsyncMock(
 33 |             side_effect=[
 34 |                 (
 35 |                     {
 36 |                         "hits": [
 37 |                             {
 38 |                                 "_id": "7157",
 39 |                                 "symbol": "TP53",
 40 |                                 "name": "tumor protein p53",
 41 |                                 "taxid": 9606,
 42 |                             }
 43 |                         ]
 44 |                     },
 45 |                     None,
 46 |                 ),
 47 |                 # Mock get response
 48 |                 (
 49 |                     {
 50 |                         "_id": "7157",
 51 |                         "symbol": "TP53",
 52 |                         "name": "tumor protein p53",
 53 |                         "summary": "This gene encodes a tumor suppressor protein...",
 54 |                         "alias": ["p53", "LFS1"],
 55 |                         "type_of_gene": "protein-coding",
 56 |                         "entrezgene": 7157,
 57 |                     },
 58 |                     None,
 59 |                 ),
 60 |             ]
 61 |         )
 62 | 
 63 |         result = await biothings_client.get_gene_info("TP53")
 64 | 
 65 |         assert result is not None
 66 |         assert isinstance(result, GeneInfo)
 67 |         assert result.symbol == "TP53"
 68 |         assert result.name == "tumor protein p53"
 69 |         assert result.gene_id == "7157"
 70 |         assert "p53" in result.alias
 71 | 
 72 |     @pytest.mark.asyncio
 73 |     async def test_get_gene_by_id(self, biothings_client, mock_http_client):
 74 |         """Test getting gene info by Entrez ID."""
 75 |         # Mock direct get response
 76 |         mock_http_client.request_api = AsyncMock(
 77 |             return_value=(
 78 |                 {
 79 |                     "_id": "7157",
 80 |                     "symbol": "TP53",
 81 |                     "name": "tumor protein p53",
 82 |                     "summary": "This gene encodes a tumor suppressor protein...",
 83 |                 },
 84 |                 None,
 85 |             )
 86 |         )
 87 | 
 88 |         result = await biothings_client.get_gene_info("7157")
 89 | 
 90 |         assert result is not None
 91 |         assert result.symbol == "TP53"
 92 |         assert result.gene_id == "7157"
 93 | 
 94 |     @pytest.mark.asyncio
 95 |     async def test_gene_not_found(self, biothings_client, mock_http_client):
 96 |         """Test handling of gene not found."""
 97 |         mock_http_client.request_api = AsyncMock(
 98 |             return_value=({"hits": []}, None)
 99 |         )
100 | 
101 |         result = await biothings_client.get_gene_info("INVALID_GENE")
102 |         assert result is None
103 | 
104 |     @pytest.mark.asyncio
105 |     async def test_batch_get_genes(self, biothings_client, mock_http_client):
106 |         """Test batch gene retrieval."""
107 |         mock_http_client.request_api = AsyncMock(
108 |             return_value=(
109 |                 [
110 |                     {
111 |                         "_id": "7157",
112 |                         "symbol": "TP53",
113 |                         "name": "tumor protein p53",
114 |                     },
115 |                     {
116 |                         "_id": "673",
117 |                         "symbol": "BRAF",
118 |                         "name": "B-Raf proto-oncogene",
119 |                     },
120 |                 ],
121 |                 None,
122 |             )
123 |         )
124 | 
125 |         results = await biothings_client.batch_get_genes(["TP53", "BRAF"])
126 | 
127 |         assert len(results) == 2
128 |         assert results[0].symbol == "TP53"
129 |         assert results[1].symbol == "BRAF"
130 | 
131 | 
132 | class TestDiseaseInfo:
133 |     """Test disease information retrieval."""
134 | 
135 |     @pytest.mark.asyncio
136 |     async def test_get_disease_by_name(
137 |         self, biothings_client, mock_http_client
138 |     ):
139 |         """Test getting disease info by name."""
140 |         # Mock query response
141 |         mock_http_client.request_api = AsyncMock(
142 |             side_effect=[
143 |                 (
144 |                     {
145 |                         "hits": [
146 |                             {
147 |                                 "_id": "MONDO:0007959",
148 |                                 "name": "melanoma",
149 |                                 "mondo": {"mondo": "MONDO:0007959"},
150 |                             }
151 |                         ]
152 |                     },
153 |                     None,
154 |                 ),
155 |                 # Mock get response
156 |                 (
157 |                     {
158 |                         "_id": "MONDO:0007959",
159 |                         "name": "melanoma",
160 |                         "mondo": {
161 |                             "definition": "A malignant neoplasm composed of melanocytes.",
162 |                             "synonym": {
163 |                                 "exact": [
164 |                                     "malignant melanoma",
165 |                                     "naevocarcinoma",
166 |                                 ]
167 |                             },
168 |                         },
169 |                     },
170 |                     None,
171 |                 ),
172 |             ]
173 |         )
174 | 
175 |         result = await biothings_client.get_disease_info("melanoma")
176 | 
177 |         assert result is not None
178 |         assert isinstance(result, DiseaseInfo)
179 |         assert result.name == "melanoma"
180 |         assert result.disease_id == "MONDO:0007959"
181 |         assert "malignant melanoma" in result.synonyms
182 | 
183 |     @pytest.mark.asyncio
184 |     async def test_get_disease_by_id(self, biothings_client, mock_http_client):
185 |         """Test getting disease info by MONDO ID."""
186 |         mock_http_client.request_api = AsyncMock(
187 |             return_value=(
188 |                 {
189 |                     "_id": "MONDO:0016575",
190 |                     "name": "GIST",
191 |                     "mondo": {
192 |                         "definition": "Gastrointestinal stromal tumor...",
193 |                     },
194 |                 },
195 |                 None,
196 |             )
197 |         )
198 | 
199 |         result = await biothings_client.get_disease_info("MONDO:0016575")
200 | 
201 |         assert result is not None
202 |         assert result.name == "GIST"
203 |         assert result.disease_id == "MONDO:0016575"
204 | 
205 |     @pytest.mark.asyncio
206 |     async def test_get_disease_synonyms(
207 |         self, biothings_client, mock_http_client
208 |     ):
209 |         """Test getting disease synonyms for query expansion."""
210 |         mock_http_client.request_api = AsyncMock(
211 |             side_effect=[
212 |                 (
213 |                     {
214 |                         "hits": [
215 |                             {
216 |                                 "_id": "MONDO:0018076",
217 |                                 "name": "GIST",
218 |                             }
219 |                         ]
220 |                     },
221 |                     None,
222 |                 ),
223 |                 (
224 |                     {
225 |                         "_id": "MONDO:0018076",
226 |                         "name": "gastrointestinal stromal tumor",
227 |                         "mondo": {
228 |                             "synonym": {
229 |                                 "exact": [
230 |                                     "GIST",
231 |                                     "gastrointestinal stromal tumour",
232 |                                     "GI stromal tumor",
233 |                                 ]
234 |                             }
235 |                         },
236 |                     },
237 |                     None,
238 |                 ),
239 |             ]
240 |         )
241 | 
242 |         synonyms = await biothings_client.get_disease_synonyms("GIST")
243 | 
244 |         assert "GIST" in synonyms
245 |         assert "gastrointestinal stromal tumor" in synonyms
246 |         assert len(synonyms) <= 5  # Limited to 5
247 | 
248 | 
249 | class TestTrialSynonymExpansion:
250 |     """Test disease synonym expansion in trial searches."""
251 | 
252 |     @pytest.mark.asyncio
253 |     async def test_trial_search_with_synonym_expansion(self):
254 |         """Test that trial search expands disease synonyms."""
255 |         from biomcp.trials.search import TrialQuery, convert_query
256 | 
257 |         with patch("biomcp.trials.search.BioThingsClient") as mock_client:
258 |             # Mock synonym expansion
259 |             mock_instance = mock_client.return_value
260 |             mock_instance.get_disease_synonyms = AsyncMock(
261 |                 return_value=[
262 |                     "GIST",
263 |                     "gastrointestinal stromal tumor",
264 |                     "GI stromal tumor",
265 |                 ]
266 |             )
267 | 
268 |             query = TrialQuery(
269 |                 conditions=["GIST"],
270 |                 expand_synonyms=True,
271 |             )
272 | 
273 |             params = await convert_query(query)
274 | 
275 |             # Check that conditions were expanded
276 |             assert "query.cond" in params
277 |             cond_value = params["query.cond"][0]
278 |             assert "GIST" in cond_value
279 |             assert "gastrointestinal stromal tumor" in cond_value
280 | 
281 |     @pytest.mark.asyncio
282 |     async def test_trial_search_without_synonym_expansion(self):
283 |         """Test that trial search works without synonym expansion."""
284 |         from biomcp.trials.search import TrialQuery, convert_query
285 | 
286 |         query = TrialQuery(
287 |             conditions=["GIST"],
288 |             expand_synonyms=False,
289 |         )
290 | 
291 |         params = await convert_query(query)
292 | 
293 |         # Check that conditions were not expanded
294 |         assert "query.cond" in params
295 |         assert params["query.cond"] == ["GIST"]
296 | 
297 | 
298 | class TestErrorHandling:
299 |     """Test error handling in BioThings integration."""
300 | 
301 |     @pytest.mark.asyncio
302 |     async def test_api_error_handling(
303 |         self, biothings_client, mock_http_client
304 |     ):
305 |         """Test handling of API errors."""
306 |         from biomcp.http_client import RequestError
307 | 
308 |         mock_http_client.request_api = AsyncMock(
309 |             return_value=(
310 |                 None,
311 |                 RequestError(code=500, message="Internal server error"),
312 |             )
313 |         )
314 | 
315 |         result = await biothings_client.get_gene_info("TP53")
316 |         assert result is None
317 | 
318 |     @pytest.mark.asyncio
319 |     async def test_invalid_response_format(
320 |         self, biothings_client, mock_http_client
321 |     ):
322 |         """Test handling of invalid API responses."""
323 |         mock_http_client.request_api = AsyncMock(
324 |             return_value=({"invalid": "response"}, None)
325 |         )
326 | 
327 |         result = await biothings_client.get_gene_info("TP53")
328 |         assert result is None
329 | 
```

--------------------------------------------------------------------------------
/src/biomcp/http_client.py:
--------------------------------------------------------------------------------

```python
  1 | import csv
  2 | import json
  3 | import os
  4 | import ssl
  5 | from io import StringIO
  6 | from ssl import PROTOCOL_TLS_CLIENT, SSLContext, TLSVersion
  7 | from typing import Literal, TypeVar
  8 | 
  9 | import certifi
 10 | from diskcache import Cache
 11 | from platformdirs import user_cache_dir
 12 | from pydantic import BaseModel
 13 | 
 14 | from .circuit_breaker import CircuitBreakerConfig, circuit_breaker
 15 | from .constants import (
 16 |     AGGRESSIVE_INITIAL_RETRY_DELAY,
 17 |     AGGRESSIVE_MAX_RETRY_ATTEMPTS,
 18 |     AGGRESSIVE_MAX_RETRY_DELAY,
 19 |     DEFAULT_CACHE_TIMEOUT,
 20 |     DEFAULT_FAILURE_THRESHOLD,
 21 |     DEFAULT_RECOVERY_TIMEOUT,
 22 |     DEFAULT_SUCCESS_THRESHOLD,
 23 | )
 24 | from .http_client_simple import execute_http_request
 25 | from .metrics import Timer
 26 | from .rate_limiter import domain_limiter
 27 | from .retry import (
 28 |     RetryableHTTPError,
 29 |     RetryConfig,
 30 |     is_retryable_status,
 31 |     with_retry,
 32 | )
 33 | from .utils.endpoint_registry import get_registry
 34 | 
 35 | T = TypeVar("T", bound=BaseModel)
 36 | 
 37 | 
 38 | class RequestError(BaseModel):
 39 |     code: int
 40 |     message: str
 41 | 
 42 | 
 43 | _cache: Cache | None = None
 44 | 
 45 | 
 46 | def get_cache() -> Cache:
 47 |     global _cache
 48 |     if _cache is None:
 49 |         cache_path = os.path.join(user_cache_dir("biomcp"), "http_cache")
 50 |         _cache = Cache(cache_path)
 51 |     return _cache
 52 | 
 53 | 
 54 | def generate_cache_key(method: str, url: str, params: dict) -> str:
 55 |     """Generate a deterministic cache key for a request."""
 56 |     import hashlib
 57 | 
 58 |     # Handle simple cases without params
 59 |     if not params:
 60 |         return f"{method.upper()}:{url}"
 61 | 
 62 |     params_str = json.dumps(params, sort_keys=True, separators=(",", ":"))
 63 |     key_source = f"{method.upper()}:{url}:{params_str}"
 64 | 
 65 |     # Built-in hash() is salted per process (PYTHONHASHSEED), so its values
 66 |     # would not match the on-disk cache after a restart; use a stable digest
 67 |     digest = hashlib.blake2b(key_source.encode("utf-8"), digest_size=8)
 68 |     return digest.hexdigest()
 69 | 
 70 | 
 71 | def cache_response(cache_key: str, content: str, ttl: int):
 72 |     expire = None if ttl == -1 else ttl
 73 |     cache = get_cache()
 74 |     cache.set(cache_key, content, expire=expire)
 75 | 
 76 | 
 77 | def get_cached_response(cache_key: str) -> str | None:
 78 |     cache = get_cache()
 79 |     return cache.get(cache_key)
 80 | 
 81 | 
 82 | def get_ssl_context(tls_version: TLSVersion) -> SSLContext:
 83 |     """Create an SSLContext with the specified TLS version."""
 84 |     context = SSLContext(PROTOCOL_TLS_CLIENT)
 85 |     context.minimum_version = tls_version
 86 |     context.maximum_version = tls_version
 87 |     context.load_verify_locations(cafile=certifi.where())
 88 |     return context
 89 | 
 90 | 
 91 | async def call_http(
 92 |     method: str,
 93 |     url: str,
 94 |     params: dict,
 95 |     verify: ssl.SSLContext | str | bool = True,
 96 |     retry_config: RetryConfig | None = None,
 97 |     headers: dict[str, str] | None = None,
 98 | ) -> tuple[int, str]:
 99 |     """Make HTTP request with optional retry logic.
100 | 
101 |     Args:
102 |         method: HTTP method (GET or POST)
103 |         url: Target URL
104 |         params: Request parameters
105 |         verify: SSL verification settings
106 |         retry_config: Retry configuration (if None, no retry)
107 | 
108 |     Returns:
109 |         Tuple of (status_code, response_text)
110 |     """
111 | 
112 |     async def _make_request() -> tuple[int, str]:
113 |         # Extract domain from URL for metrics tagging
114 |         from urllib.parse import urlparse
115 | 
116 |         parsed = urlparse(url)
117 |         host = parsed.hostname or "unknown"
118 | 
119 |         # Apply circuit breaker for the host
120 |         breaker_config = CircuitBreakerConfig(
121 |             failure_threshold=DEFAULT_FAILURE_THRESHOLD,
122 |             recovery_timeout=DEFAULT_RECOVERY_TIMEOUT,
123 |             success_threshold=DEFAULT_SUCCESS_THRESHOLD,
124 |             expected_exception=(ConnectionError, TimeoutError),
125 |         )
126 | 
127 |         @circuit_breaker(f"http_{host}", breaker_config)
128 |         async def _execute_with_breaker():
129 |             async with Timer(
130 |                 "http_request", tags={"method": method, "host": host}
131 |             ):
132 |                 return await execute_http_request(
133 |                     method, url, params, verify, headers
134 |                 )
135 | 
136 |         status, text = await _execute_with_breaker()
137 | 
138 |         # Check if status code should trigger retry
139 |         if retry_config and is_retryable_status(status, retry_config):
140 |             raise RetryableHTTPError(status, text)
141 | 
142 |         return status, text
143 | 
144 |     # Apply retry logic if configured
145 |     if retry_config:
146 |         wrapped_func = with_retry(retry_config)(_make_request)
147 |         try:
148 |             return await wrapped_func()
149 |         except RetryableHTTPError as exc:
150 |             # Convert retryable HTTP errors back to status/text
151 |             return exc.status_code, exc.message
152 |         except Exception:
153 |             # Let other exceptions bubble up
154 |             raise
155 |     else:
156 |         return await _make_request()
157 | 
158 | 
159 | def _handle_offline_mode(
160 |     url: str,
161 |     method: str,
162 |     request: BaseModel | dict,
163 |     cache_ttl: int,
164 |     response_model_type: type[T] | None,
165 | ) -> tuple[T | None, RequestError | None] | None:
166 |     """Handle offline mode logic. Returns None if not in offline mode."""
167 |     if os.getenv("BIOMCP_OFFLINE", "").lower() not in ("true", "1", "yes"):
168 |         return None
169 | 
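170 |     # In offline mode, only return cached responses. Any non-zero TTL means
171 |     # caching is enabled (-1 means "never expire", see cache_response).
172 |     if cache_ttl != 0:
173 |         # Derive the key exactly as request_api does (after "_headers" is
174 |         # stripped) so offline lookups match entries written while online.
175 |         params, _ = _prepare_request_params(request)
176 |         cache_key = generate_cache_key(method, url, params)
177 |         cached_content = get_cached_response(cache_key)
178 |         if cached_content:
179 |             return parse_response(200, cached_content, response_model_type)
180 | 
181 |     # Nothing is cached and network access is disabled: report the
182 |     # endpoint as unavailable.
183 |     return None, RequestError(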
184 |         code=503,
185 |         message=f"Offline mode enabled (BIOMCP_OFFLINE=true). Cannot fetch from {url}",
186 |     )
187 | 
188 | 
189 | def _validate_endpoint(endpoint_key: str | None) -> None:
190 |     """Validate endpoint key if provided."""
191 |     if endpoint_key:
192 |         registry = get_registry()
193 |         if endpoint_key not in registry.get_all_endpoints():
194 |             raise ValueError(
195 |                 f"Unknown endpoint key: {endpoint_key}. Please register in endpoint_registry.py"
196 |             )
197 | 
198 | 
199 | def _prepare_request_params(
200 |     request: BaseModel | dict,
201 | ) -> tuple[dict, dict | None]:
202 |     """Convert request to params dict and extract headers."""
203 |     if isinstance(request, BaseModel):
204 |         params = request.model_dump(exclude_none=True, by_alias=True)
205 |     else:
206 |         params = request.copy() if isinstance(request, dict) else request
207 | 
208 |     # Extract headers if present
209 |     headers = None
210 |     if isinstance(params, dict) and "_headers" in params:
211 |         try:
212 |             import json
213 | 
214 |             headers = json.loads(params.pop("_headers"))
215 |         except (json.JSONDecodeError, TypeError):
216 |             pass  # Ignore invalid headers
217 | 
218 |     return params, headers
219 | 
220 | 
221 | def _get_retry_config(
222 |     enable_retry: bool, domain: str | None
223 | ) -> RetryConfig | None:
224 |     """Get retry configuration based on settings."""
225 |     if not enable_retry:
226 |         return None
227 | 
228 |     # Use more aggressive retry for certain domains
229 |     if domain in ["clinicaltrials", "pubmed", "myvariant"]:
230 |         return RetryConfig(
231 |             max_attempts=AGGRESSIVE_MAX_RETRY_ATTEMPTS,
232 |             initial_delay=AGGRESSIVE_INITIAL_RETRY_DELAY,
233 |             max_delay=AGGRESSIVE_MAX_RETRY_DELAY,
234 |         )
235 |     return RetryConfig()  # Default settings
236 | 
237 | 
238 | async def request_api(
239 |     url: str,
240 |     request: BaseModel | dict,
241 |     response_model_type: type[T] | None = None,
242 |     method: Literal["GET", "POST"] = "GET",
243 |     cache_ttl: int = DEFAULT_CACHE_TIMEOUT,
244 |     tls_version: TLSVersion | None = None,
245 |     domain: str | None = None,
246 |     enable_retry: bool = True,
247 |     endpoint_key: str | None = None,
248 | ) -> tuple[T | None, RequestError | None]:
249 |     # Handle offline mode
250 |     offline_result = _handle_offline_mode(
251 |         url, method, request, cache_ttl, response_model_type
252 |     )
253 |     if offline_result is not None:
254 |         return offline_result
255 | 
256 |     # Validate endpoint
257 |     _validate_endpoint(endpoint_key)
258 | 
259 |     # Apply rate limiting if domain is specified
260 |     if domain:
261 |         async with domain_limiter.limit(domain):
262 |             pass  # Rate limit acquired
263 | 
264 |     # Prepare request
265 |     verify = get_ssl_context(tls_version) if tls_version else True
266 |     params, headers = _prepare_request_params(request)
267 |     retry_config = _get_retry_config(enable_retry, domain)
268 | 
269 |     # Short-circuit if caching disabled
270 |     if cache_ttl == 0:
271 |         status, content = await call_http(
272 |             method,
273 |             url,
274 |             params,
275 |             verify=verify,
276 |             retry_config=retry_config,
277 |             headers=headers,
278 |         )
279 |         return parse_response(status, content, response_model_type)
280 | 
281 |     # Handle caching
282 |     cache_key = generate_cache_key(method, url, params)
283 |     cached_content = get_cached_response(cache_key)
284 | 
285 |     if cached_content:
286 |         return parse_response(200, cached_content, response_model_type)
287 | 
288 |     # Make HTTP request if not cached
289 |     status, content = await call_http(
290 |         method,
291 |         url,
292 |         params,
293 |         verify=verify,
294 |         retry_config=retry_config,
295 |         headers=headers,
296 |     )
297 |     parsed_response = parse_response(status, content, response_model_type)
298 | 
299 |     # Cache if successful response
300 |     if status == 200:
301 |         cache_response(cache_key, content, cache_ttl)
302 | 
303 |     return parsed_response
304 | 
305 | 
306 | def parse_response(
307 |     status_code: int,
308 |     content: str,
309 |     response_model_type: type[T] | None = None,
310 | ) -> tuple[T | None, RequestError | None]:
311 |     if status_code != 200:
312 |         return None, RequestError(code=status_code, message=content)
313 | 
314 |     # Handle empty content
315 |     if not content or content.strip() == "":
316 |         return None, RequestError(
317 |             code=500,
318 |             message="Empty response received from API",
319 |         )
320 | 
321 |     try:
322 |         if response_model_type is None:
323 |             # Try to parse as JSON first
324 |             if content.lstrip().startswith(("{", "[")):
325 |                 response_dict = json.loads(content)
326 |             elif "," in content:
327 |                 io = StringIO(content)
328 |                 response_dict = list(csv.DictReader(io))
329 |             else:
330 |                 response_dict = {"text": content}
331 |             return response_dict, None
332 | 
333 |         parsed: T = response_model_type.model_validate_json(content)
334 |         return parsed, None
335 | 
336 |     except json.JSONDecodeError as exc:
337 |         # Provide more detailed error message for JSON parsing issues
338 |         return None, RequestError(
339 |             code=500,
340 |             message=f"Invalid JSON response: {exc}. Content preview: {content[:100]}...",
341 |         )
342 |     except Exception as exc:
343 |         return None, RequestError(
344 |             code=500,
345 |             message=f"Failed to parse response: {exc}",
346 |         )
347 | 
```
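
`call_http` wraps the circuit-breaker-protected request in retries. A retryable status is raised as `RetryableHTTPError` so the `with_retry` decorator can back off and try again. Once attempts run out, the exception is converted back into a plain `(status, text)` pair. The real `RetryConfig`, `is_retryable_status`, and `with_retry` live in `biomcp.retry`, which is not shown on this page. The following is therefore a self-contained sketch of the same pattern, using capped exponential backoff with jitter. `with_backoff`, `call_with_retry`, the local `RetryableHTTPError` stand-in, and the `RETRYABLE_STATUSES` set are all hypothetical.

```python
import asyncio
import functools
import random
from collections.abc import Awaitable, Callable

# Hypothetical set of transient statuses; biomcp's is_retryable_status may differ
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


class RetryableHTTPError(Exception):
    """Local stand-in carrying the status and body of a retryable response."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"retryable HTTP status {status_code}")
        self.status_code = status_code
        self.message = message


def with_backoff(
    max_attempts: int = 3, initial_delay: float = 1.0, max_delay: float = 10.0
):
    """Retry an async function on RetryableHTTPError with capped exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except RetryableHTTPError:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the caller decide
                    # Jitter in [delay/2, delay] spreads out concurrent retries
                    await asyncio.sleep(random.uniform(delay / 2, delay))
                    delay = min(delay * 2, max_delay)

        return wrapper

    return decorator


async def call_with_retry(
    send: Callable[[], Awaitable[tuple[int, str]]],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
) -> tuple[int, str]:
    """Mirror call_http: retry transient statuses, then return (status, text)."""

    @with_backoff(max_attempts=max_attempts, initial_delay=initial_delay)
    async def attempt() -> tuple[int, str]:
        status, text = await send()
        if status in RETRYABLE_STATUSES:
            raise RetryableHTTPError(status, text)
        return status, text

    try:
        return await attempt()
    except RetryableHTTPError as exc:
        # Surface the last transient failure as data, not an exception
        return exc.status_code, exc.message
```

Converting the final failure back into `(status, text)` lets callers such as `parse_response` treat an exhausted retry like any other non-200 response instead of handling a separate exception path.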

--------------------------------------------------------------------------------
/src/biomcp/diseases/search.py:
--------------------------------------------------------------------------------

```python
  1 | """Search functionality for diseases via NCI CTS API."""
  2 | 
  3 | import logging
  4 | from typing import Any
  5 | 
  6 | from ..constants import NCI_DISEASES_URL
  7 | from ..integrations.cts_api import CTSAPIError, make_cts_request
  8 | from ..utils import parse_or_query
  9 | 
 10 | logger = logging.getLogger(__name__)
 11 | 
 12 | 
 13 | def _build_disease_params(
 14 |     name: str | None,
 15 |     disease_type: str | None,
 16 |     category: str | None,
 17 |     codes: list[str] | None,
 18 |     parent_ids: list[str] | None,
 19 |     ancestor_ids: list[str] | None,
 20 |     include: list[str] | None,
 21 |     sort: str | None,
 22 |     order: str | None,
 23 |     page_size: int,
 24 | ) -> dict[str, Any]:
 25 |     """Build query parameters for disease search."""
 26 |     params: dict[str, Any] = {"size": page_size}
 27 | 
 28 |     if name:
 29 |         params["name"] = name
 30 | 
 31 |     # Use 'type' parameter instead of 'category'
 32 |     if disease_type:
 33 |         params["type"] = disease_type
 34 |     elif category:  # Backward compatibility
 35 |         params["type"] = category
 36 | 
 37 |     if codes:
 38 |         params["codes"] = ",".join(codes) if isinstance(codes, list) else codes
 39 | 
 40 |     if parent_ids:
 41 |         params["parent_ids"] = (
 42 |             ",".join(parent_ids)
 43 |             if isinstance(parent_ids, list)
 44 |             else parent_ids
 45 |         )
 46 | 
 47 |     if ancestor_ids:
 48 |         params["ancestor_ids"] = (
 49 |             ",".join(ancestor_ids)
 50 |             if isinstance(ancestor_ids, list)
 51 |             else ancestor_ids
 52 |         )
 53 | 
 54 |     if include:
 55 |         params["include"] = (
 56 |             ",".join(include) if isinstance(include, list) else include
 57 |         )
 58 | 
 59 |     if sort:
 60 |         params["sort"] = sort
 61 |         if order:
 62 |             params["order"] = order.lower()
 63 | 
 64 |     return params
 65 | 
 66 | 
 67 | async def search_diseases(
 68 |     name: str | None = None,
 69 |     include_synonyms: bool = True,  # Deprecated - kept for backward compatibility
 70 |     category: str | None = None,
 71 |     disease_type: str | None = None,
 72 |     codes: list[str] | None = None,
 73 |     parent_ids: list[str] | None = None,
 74 |     ancestor_ids: list[str] | None = None,
 75 |     include: list[str] | None = None,
 76 |     sort: str | None = None,
 77 |     order: str | None = None,
 78 |     page_size: int = 20,
 79 |     page: int = 1,
 80 |     api_key: str | None = None,
 81 | ) -> dict[str, Any]:
 82 |     """
 83 |     Search for diseases in the NCI CTS database.
 84 | 
 85 |     This provides access to NCI's controlled vocabulary of cancer conditions
 86 |     used in clinical trials, with official terms and synonyms.
 87 | 
 88 |     Args:
 89 |         name: Disease name to search for (partial match, searches synonyms automatically)
 90 |         include_synonyms: [Deprecated] This parameter is ignored - API always searches synonyms
 91 |         category: Disease category/type filter (deprecated - use disease_type)
 92 |         disease_type: Type of disease (e.g., 'maintype', 'subtype', 'stage')
 93 |         codes: List of disease codes (e.g., ['C3868', 'C5806'])
 94 |         parent_ids: List of parent disease IDs
 95 |         ancestor_ids: List of ancestor disease IDs
 96 |         include: Fields to include in response
 97 |         sort: Sort field
 98 |         order: Sort order ('asc' or 'desc')
 99 |         page_size: Number of results per page
100 |         page: Page number
101 |         api_key: Optional API key (if not provided, uses NCI_API_KEY env var)
102 | 
103 |     Returns:
104 |         Dictionary with search results containing:
105 |         - diseases: List of disease records with names and synonyms
106 |         - total: Total number of results
107 |         - page: Current page
108 |         - page_size: Results per page
109 | 
110 |     Raises:
111 |         CTSAPIError: If the API request fails
112 |     """
113 |     # Build query parameters
114 |     params = _build_disease_params(
115 |         name,
116 |         disease_type,
117 |         category,
118 |         codes,
119 |         parent_ids,
120 |         ancestor_ids,
121 |         include,
122 |         sort,
123 |         order,
124 |         page_size,
125 |     )
126 | 
127 |     try:
128 |         # Make API request
129 |         response = await make_cts_request(
130 |             url=NCI_DISEASES_URL,
131 |             params=params,
132 |             api_key=api_key,
133 |         )
134 | 
135 |         # Process response
136 |         diseases = response.get("data", response.get("diseases", []))
137 |         total = response.get("total", len(diseases))
138 | 
139 |         return {
140 |             "diseases": diseases,
141 |             "total": total,
142 |             "page": page,
143 |             "page_size": page_size,
144 |         }
145 | 
146 |     except CTSAPIError:
147 |         raise
148 |     except Exception as e:
149 |         logger.error(f"Failed to search diseases: {e}")
150 |         raise CTSAPIError(f"Disease search failed: {e!s}") from e
151 | 
152 | 
153 | async def get_disease_by_id(
154 |     disease_id: str,
155 |     api_key: str | None = None,
156 | ) -> dict[str, Any]:
157 |     """
158 |     Get detailed information about a specific disease by ID.
159 | 
160 |     Args:
161 |         disease_id: Disease ID from NCI CTS
162 |         api_key: Optional API key (if not provided, uses NCI_API_KEY env var)
163 | 
164 |     Returns:
165 |         Dictionary with disease details including synonyms
166 | 
167 |     Raises:
168 |         CTSAPIError: If the API request fails
169 |     """
170 |     try:
171 |         # Make API request
172 |         url = f"{NCI_DISEASES_URL}/{disease_id}"
173 |         response = await make_cts_request(
174 |             url=url,
175 |             api_key=api_key,
176 |         )
177 | 
178 |         # Return the disease data
179 |         if "data" in response:
180 |             return response["data"]
181 |         elif "disease" in response:
182 |             return response["disease"]
183 |         else:
184 |             return response
185 | 
186 |     except CTSAPIError:
187 |         raise
188 |     except Exception as e:
189 |         logger.error(f"Failed to get disease {disease_id}: {e}")
190 |         raise CTSAPIError(f"Failed to retrieve disease: {e!s}") from e
191 | 
192 | 
193 | def _format_disease_synonyms(synonyms: Any) -> list[str]:
194 |     """Format disease synonyms section."""
195 |     lines: list[str] = []
196 |     if not synonyms:
197 |         return lines
198 | 
199 |     if isinstance(synonyms, list) and synonyms:
200 |         lines.append("- **Synonyms**:")
201 |         for syn in synonyms[:5]:  # Show up to 5 synonyms
202 |             lines.append(f"  - {syn}")
203 |         if len(synonyms) > 5:
204 |             lines.append(f"  *(and {len(synonyms) - 5} more)*")
205 |     elif isinstance(synonyms, str):
206 |         lines.append(f"- **Synonyms**: {synonyms}")
207 | 
208 |     return lines
209 | 
210 | 
211 | def _format_disease_codes(codes: Any) -> list[str]:
212 |     """Format disease code mappings."""
213 |     if not codes or not isinstance(codes, dict):
214 |         return []
215 | 
216 |     code_items = []
217 |     for system, code in codes.items():
218 |         code_items.append(f"{system}: {code}")
219 | 
220 |     if code_items:
221 |         return [f"- **Codes**: {', '.join(code_items)}"]
222 |     return []
223 | 
224 | 
225 | def _format_single_disease(disease: dict[str, Any]) -> list[str]:
226 |     """Format a single disease record."""
227 |     disease_id = disease.get("id", disease.get("disease_id", "Unknown"))
228 |     name = disease.get(
229 |         "name", disease.get("preferred_name", "Unknown Disease")
230 |     )
231 |     category = disease.get("category", disease.get("type", ""))
232 | 
233 |     lines = [
234 |         f"### {name}",
235 |         f"- **ID**: {disease_id}",
236 |     ]
237 | 
238 |     if category:
239 |         lines.append(f"- **Category**: {category}")
240 | 
241 |     # Add synonyms
242 |     lines.extend(_format_disease_synonyms(disease.get("synonyms", [])))
243 | 
244 |     # Add code mappings
245 |     lines.extend(_format_disease_codes(disease.get("codes")))
246 | 
247 |     lines.append("")
248 |     return lines
249 | 
250 | 
251 | def format_disease_results(results: dict[str, Any]) -> str:
252 |     """
253 |     Format disease search results as markdown.
254 | 
255 |     Args:
256 |         results: Search results dictionary
257 | 
258 |     Returns:
259 |         Formatted markdown string
260 |     """
261 |     diseases = results.get("diseases", [])
262 |     total = results.get("total", 0)
263 | 
264 |     if not diseases:
265 |         return "No diseases found matching the search criteria."
266 | 
267 |     # Build markdown output
268 |     lines = [
269 |         f"## Disease Search Results ({total} found)",
270 |         "",
271 |     ]
272 | 
273 |     for disease in diseases:
274 |         lines.extend(_format_single_disease(disease))
275 | 
276 |     return "\n".join(lines)
277 | 
278 | 
279 | async def search_diseases_with_or(
280 |     name_query: str,
281 |     include_synonyms: bool = True,
282 |     category: str | None = None,
283 |     disease_type: str | None = None,
284 |     codes: list[str] | None = None,
285 |     parent_ids: list[str] | None = None,
286 |     ancestor_ids: list[str] | None = None,
287 |     include: list[str] | None = None,
288 |     sort: str | None = None,
289 |     order: str | None = None,
290 |     page_size: int = 20,
291 |     page: int = 1,
292 |     api_key: str | None = None,
293 | ) -> dict[str, Any]:
294 |     """
295 |     Search for diseases with OR query support.
296 | 
297 |     This function handles OR queries by making multiple API calls and combining results.
298 |     For example: "melanoma OR lung cancer" will search for each term.
299 | 
300 |     Args:
301 |         name_query: Name query that may contain OR operators
302 |         Other args same as search_diseases
303 | 
304 |     Returns:
305 |         Combined results from all searches with duplicates removed
306 |     """
307 |     # Check if this is an OR query
308 |     if " OR " in name_query or " or " in name_query:
309 |         search_terms = parse_or_query(name_query)
310 |         logger.info(f"Parsed OR query into terms: {search_terms}")
311 |     else:
312 |         # Single term search
313 |         search_terms = [name_query]
314 | 
315 |     # Collect all unique diseases
316 |     all_diseases = {}
317 |     total_found = 0
318 | 
319 |     # Search for each term
320 |     for term in search_terms:
321 |         logger.info(f"Searching diseases for term: {term}")
322 |         try:
323 |             results = await search_diseases(
324 |                 name=term,
325 |                 include_synonyms=include_synonyms,
326 |                 category=category,
327 |                 disease_type=disease_type,
328 |                 codes=codes,
329 |                 parent_ids=parent_ids,
330 |                 ancestor_ids=ancestor_ids,
331 |                 include=include,
332 |                 sort=sort,
333 |                 order=order,
334 |                 page_size=page * page_size,  # cover the requested page
335 |                 page=1,  # merged list is paginated locally below
336 |                 api_key=api_key,
337 |             )
338 | 
339 |             # Add unique diseases (deduplicate by ID)
340 |             for disease in results.get("diseases", []):
341 |                 disease_id = disease.get("id", disease.get("disease_id"))
342 |                 if disease_id and disease_id not in all_diseases:
343 |                     all_diseases[disease_id] = disease
344 | 
345 |             total_found += results.get("total", 0)
346 | 
347 |         except Exception as e:
348 |             logger.warning(f"Failed to search for term '{term}': {e}")
349 |             # Continue with other terms
350 | 
351 |     # Convert back to list and apply pagination
352 |     unique_diseases = list(all_diseases.values())
353 | 
354 |     # Sort by name for consistent results
355 |     unique_diseases.sort(
356 |         key=lambda x: x.get("name", x.get("preferred_name", "")).lower()
357 |     )
358 | 
359 |     # Apply pagination to combined results
360 |     start_idx = (page - 1) * page_size
361 |     end_idx = start_idx + page_size
362 |     paginated_diseases = unique_diseases[start_idx:end_idx]
363 | 
364 |     return {
365 |         "diseases": paginated_diseases,
366 |         "total": len(unique_diseases),
367 |         "page": page,
368 |         "page_size": page_size,
369 |         "search_terms": search_terms,  # Include what we searched for
370 |         "total_found_across_terms": total_found,  # Total before deduplication
371 |     }
372 | 
```
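
`search_diseases_with_or` above paginates only after it has merged, deduplicated, and sorted the per-term results, so each term has to be fetched from its first page. The following is a minimal, self-contained sketch of that pattern, not BioMCP's code. `fake_search`, `search_or`, and the `CATALOG` records (IDs `D1` to `D3`) are hypothetical stand-ins for the NCI CTS API. The case-insensitive regex split replaces `parse_or_query`, whose exact behavior is not shown here.

```python
import asyncio
import re
from typing import Any

# Hypothetical in-memory stand-in for the NCI CTS disease endpoint.
# IDs and names are made up for illustration.
CATALOG: dict[str, list[dict[str, Any]]] = {
    "melanoma": [
        {"id": "D1", "name": "Melanoma"},
        {"id": "D2", "name": "Metastatic Melanoma"},
    ],
    "lung cancer": [
        {"id": "D3", "name": "Lung Carcinoma"},
        {"id": "D2", "name": "Metastatic Melanoma"},  # overlaps the first term
    ],
}


async def fake_search(term: str, page: int, page_size: int) -> dict[str, Any]:
    """Return one page of results for a single term (hypothetical API)."""
    rows = CATALOG.get(term.lower(), [])
    start = (page - 1) * page_size
    return {"diseases": rows[start : start + page_size], "total": len(rows)}


async def search_or(
    query: str, page: int = 1, page_size: int = 20
) -> dict[str, Any]:
    """Fan out one search per OR term, dedupe by ID, then paginate."""
    # Simple case-insensitive split; the real code delegates to parse_or_query.
    terms = [
        t.strip()
        for t in re.split(r"\s+or\s+", query, flags=re.IGNORECASE)
        if t.strip()
    ]
    merged: dict[str, dict[str, Any]] = {}
    for term in terms:
        # Always start at page 1 and fetch enough rows to cover the requested
        # page, because pagination is applied to the merged list below.
        res = await fake_search(term, page=1, page_size=page * page_size)
        for disease in res["diseases"]:
            merged.setdefault(disease["id"], disease)  # first occurrence wins

    ordered = sorted(merged.values(), key=lambda d: d["name"].lower())
    start = (page - 1) * page_size
    return {
        "diseases": ordered[start : start + page_size],
        "total": len(ordered),
        "search_terms": terms,
    }


if __name__ == "__main__":
    result = asyncio.run(search_or("melanoma OR lung cancer", page_size=2))
    print([d["id"] for d in result["diseases"]], result["total"])
    # → ['D3', 'D1'] 3
```

With `page=2` and the same `page_size`, the sketch returns only `D2`. If each term were instead fetched at its own page 2, both per-term pages would already be empty for this data, and so would the merged result.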

--------------------------------------------------------------------------------
/docs/tutorials/openfda-integration.md:
--------------------------------------------------------------------------------

```markdown
  1 | # OpenFDA Integration Guide
  2 | 
  3 | ## Overview
  4 | 
  5 | BioMCP now integrates with the FDA's openFDA API to provide access to critical drug safety and regulatory information. This integration adds three major data sources to BioMCP's capabilities:
  6 | 
  7 | 1. **Drug Adverse Events (FAERS)** - FDA Adverse Event Reporting System data
  8 | 2. **Drug Labels (SPL)** - Official FDA drug product labeling
  9 | 3. **Device Events (MAUDE)** - Medical device adverse event reports
 10 | 
 11 | This guide covers how to use these new tools effectively for precision oncology research.
 12 | 
 13 | ## Quick Start
 14 | 
 15 | ### Installation & Setup
 16 | 
 17 | The OpenFDA integration is included in the standard BioMCP installation:
 18 | 
 19 | ```bash
 20 | # Install BioMCP
 21 | pip install biomcp-python
 22 | 
 23 | # Optional: Set API key for higher rate limits
 24 | export OPENFDA_API_KEY="your-api-key-here"
 25 | ```
 26 | 
 27 | > **Note**: An API key is optional but recommended. Without one, openFDA allows 240 requests/minute and 1,000 requests/day per IP address; with a key, the limits are 240 requests/minute and 120,000 requests/day per key. [Get a free API key here](https://open.fda.gov/apis/authentication/).
 28 | 
 29 | ### Basic Usage Examples
 30 | 
 31 | #### Search for drug adverse events
 32 | 
 33 | ```bash
 34 | # Find adverse events for a specific drug
 35 | biomcp openfda adverse search --drug imatinib
 36 | 
 37 | # Search for specific reactions
 38 | biomcp openfda adverse search --reaction nausea --serious
 39 | 
 40 | # Get detailed report
 41 | biomcp openfda adverse get REPORT123456
 42 | ```
 43 | 
 44 | #### Search drug labels
 45 | 
 46 | ```bash
 47 | # Find drugs for specific indications
 48 | biomcp openfda label search --indication melanoma
 49 | 
 50 | # Search for drugs with boxed warnings
 51 | biomcp openfda label search --boxed-warning
 52 | 
 53 | # Get complete label
 54 | biomcp openfda label get SET_ID_HERE
 55 | ```
 56 | 
 57 | #### Search device events
 58 | 
 59 | ```bash
 60 | # Search for genomic test device issues
 61 | biomcp openfda device search --device "FoundationOne"
 62 | 
 63 | # Search by manufacturer
 64 | biomcp openfda device search --manufacturer Illumina
 65 | 
 66 | # Get detailed device event
 67 | biomcp openfda device get MDR123456
 68 | ```
 69 | 
 70 | ## MCP Tool Usage
 71 | 
 72 | ### For AI Agents
 73 | 
 74 | The OpenFDA tools are available as MCP tools for AI agents. Each tool includes built-in reminders to use the `think` tool first for complex queries.
 75 | 
 76 | #### Available Tools
 77 | 
 78 | - `openfda_adverse_searcher` - Search drug adverse events
 79 | - `openfda_adverse_getter` - Get specific adverse event report
 80 | - `openfda_label_searcher` - Search drug labels
 81 | - `openfda_label_getter` - Get complete drug label
 82 | - `openfda_device_searcher` - Search device adverse events
 83 | - `openfda_device_getter` - Get specific device event report
 84 | 
 85 | #### Example Tool Usage
 86 | 
 87 | ```python
 88 | # Search for adverse events
 89 | result = await openfda_adverse_searcher(
 90 |     drug="pembrolizumab",
 91 |     serious=True,
 92 |     limit=25
 93 | )
 94 | 
 95 | # Get drug label
 96 | label = await openfda_label_getter(
 97 |     set_id="abc-123-def",
 98 |     sections=["indications_and_usage", "warnings_and_precautions"]
 99 | )
100 | 
101 | # Search genomic devices
102 | devices = await openfda_device_searcher(
103 |     device="sequencer",
104 |     genomics_only=True,  # Filter to genomic/diagnostic devices
105 |     problem="false positive"
106 | )
107 | ```
108 | 
109 | ## Data Sources Explained
110 | 
111 | ### Drug Adverse Events (FAERS)
112 | 
113 | The FDA Adverse Event Reporting System contains reports of adverse events and medication errors submitted to FDA. Key features:
114 | 
115 | - **Voluntary reporting**: Reports come from healthcare professionals, patients, and manufacturers
116 | - **No causation proof**: Reports don't establish that a drug caused the event
117 | - **Rich detail**: Includes patient demographics, drug information, reactions, and outcomes
118 | - **Real-world data**: Captures post-market safety signals
119 | 
120 | **Best for**: Understanding potential side effects, safety signals, drug interactions
121 | 
122 | ### Drug Labels (SPL)
123 | 
124 | Structured Product Labeling contains the official FDA-approved prescribing information. Includes:
125 | 
126 | - **Indications and usage**: FDA-approved uses
127 | - **Dosage and administration**: How to prescribe
128 | - **Contraindications**: When not to use
129 | - **Warnings and precautions**: Safety information
130 | - **Drug interactions**: Known interactions
131 | - **Clinical studies**: Trial data supporting approval
132 | 
133 | **Best for**: Official prescribing guidelines, approved indications, contraindications
134 | 
135 | ### Device Events (MAUDE)
136 | 
137 | The Manufacturer and User Facility Device Experience (MAUDE) database contains medical device adverse event reports. For BioMCP, we focus on genomic/diagnostic devices:
138 | 
139 | - **Genomic test devices**: Issues with sequencing platforms, diagnostic panels
140 | - **In vitro diagnostics**: Problems with biomarker tests
141 | - **Device malfunctions**: Technical failures affecting test results
142 | - **Patient impact**: How device issues affected patient care
143 | 
144 | **Best for**: Understanding reliability of genomic tests, device-related diagnostic issues
145 | 
146 | ## Advanced Features
147 | 
148 | ### Genomic Device Filtering
149 | 
150 | By default, device searches filter to genomic/diagnostic devices relevant to precision oncology:
151 | 
152 | ```bash
153 | # Search only genomic devices (default)
154 | biomcp openfda device search --device test
155 | 
156 | # Search ALL medical devices
157 | biomcp openfda device search --device test --all-devices
158 | ```
159 | 
160 | The genomic filter includes FDA product codes for:
161 | 
162 | - Next-generation sequencing (NGS) panels
163 | - Gene mutation detection systems
164 | - Tumor profiling tests
165 | - Hereditary variant detection systems
166 | 
167 | ### Pagination Support
168 | 
169 | All search tools support pagination for large result sets:
170 | 
171 | ```bash
172 | # Get second page of results
173 | biomcp openfda adverse search --drug aspirin --page 2 --limit 50
174 | ```
175 | 
176 | ### Section-Specific Label Retrieval
177 | 
178 | When retrieving drug labels, you can specify which sections to include:
179 | 
180 | ```bash
181 | # Get only specific sections
182 | biomcp openfda label get SET_ID --sections "indications_and_usage,adverse_reactions"
183 | ```
184 | 
185 | ## Integration with Other BioMCP Tools
186 | 
187 | ### Complementary Data Sources
188 | 
189 | OpenFDA data complements existing BioMCP tools:
190 | 
191 | | Tool                       | Data Source        | Best For                          |
192 | | -------------------------- | ------------------ | --------------------------------- |
193 | | `drug_getter`              | MyChem.info        | Chemical properties, mechanisms   |
194 | | `openfda_label_searcher`   | FDA Labels         | Official indications, prescribing |
195 | | `openfda_adverse_searcher` | FAERS              | Safety signals, side effects      |
196 | | `trial_searcher`           | ClinicalTrials.gov | Active trials, eligibility        |
197 | 
198 | ### Workflow Examples
199 | 
200 | #### Complete Drug Profile
201 | 
202 | ```python
203 | # 1. Get drug chemical info
204 | drug_info = await drug_getter("imatinib")
205 | 
206 | # 2. Get FDA label
207 | label = await openfda_label_searcher(name="imatinib")
208 | 
209 | # 3. Check adverse events
210 | safety = await openfda_adverse_searcher(drug="imatinib", serious=True)
211 | 
212 | # 4. Find current trials
213 | trials = await trial_searcher(interventions=["imatinib"])
214 | ```
215 | 
216 | #### Device Reliability Check
217 | 
218 | ```python
219 | # 1. Search for device issues
220 | events = await openfda_device_searcher(
221 |     device="FoundationOne CDx",
222 |     problem="false"
223 | )
224 | 
225 | # 2. Get specific event details
226 | if events:
227 |     details = await openfda_device_getter("MDR_KEY_HERE")
228 | ```
229 | 
230 | ## Important Considerations
231 | 
232 | ### Data Limitations
233 | 
234 | 1. **Adverse Events**:
235 | 
236 |    - Reports don't prove causation
237 |    - Reporting is voluntary, so not all events are captured
238 |    - Duplicate reports may exist
239 |    - Include appropriate disclaimers when presenting data
240 | 
241 | 2. **Drug Labels**:
242 | 
243 |    - May not reflect the most recent changes
244 |    - Off-label uses not included
245 |    - Generic drugs may have different inactive ingredients
246 | 
247 | 3. **Device Events**:
248 |    - Not all device problems are reported
249 |    - User error vs device malfunction can be unclear
250 |    - Reports may lack complete information
251 | 
252 | ### Rate Limits
253 | 
254 | - **Without API key**: 240 requests/minute and 1,000 requests/day per IP
255 | - **With API key**: 240 requests/minute and 120,000 requests/day per key
256 | - **Burst limit**: 4 requests/second
257 | 
258 | ### Best Practices
259 | 
260 | 1. **Always use disclaimers**: Include the FDA's disclaimer that adverse event reports do not prove causation
261 | 2. **Check multiple sources**: Combine OpenFDA data with other BioMCP tools
262 | 3. **Filter appropriately**: Use genomic device filtering for relevant results
263 | 4. **Handle no results gracefully**: Many specific queries may return no results
264 | 5. **Respect rate limits**: Use API key for production use
265 | 
266 | ## Troubleshooting
267 | 
268 | ### Common Issues
269 | 
270 | **No results found**
271 | 
272 | - Try broader search terms
273 | - Check spelling of drug/device names
274 | - Remove filters to expand search
275 | 
276 | **Rate limit errors**
277 | 
278 | - Add API key to environment
279 | - Reduce request frequency
280 | - Batch queries when possible
281 | 
282 | **Timeout errors**
283 | 
284 | - The openFDA API may be slow or temporarily unavailable
285 | - Retry after a brief wait
286 | - Consider caching frequent queries
287 | 
288 | ### Getting Help
289 | 
290 | - OpenFDA documentation: https://open.fda.gov/apis/
291 | - OpenFDA status: https://api.fda.gov/status
292 | - BioMCP issues: https://github.com/genomoncology/biomcp/issues
293 | 
294 | ## API Reference
295 | 
296 | ### Environment Variables
297 | 
298 | - `OPENFDA_API_KEY`: Your openFDA API key (optional but recommended)
299 | 
300 | ### CLI Commands
301 | 
302 | ```bash
303 | # Adverse Events
304 | biomcp openfda adverse search [OPTIONS]
305 |   --drug TEXT           Drug name to search
306 |   --reaction TEXT       Reaction to search
307 |   --serious/--all       Filter serious events
308 |   --limit INT           Results per page (max 100)
309 |   --page INT            Page number
310 | 
311 | biomcp openfda adverse get REPORT_ID
312 | 
313 | # Drug Labels
314 | biomcp openfda label search [OPTIONS]
315 |   --name TEXT           Drug name
316 |   --indication TEXT     Indication to search
317 |   --boxed-warning       Only labels with a boxed warning
318 |   --section TEXT        Label section
319 |   --limit INT           Results per page
320 |   --page INT            Page number
321 | 
322 | biomcp openfda label get SET_ID [OPTIONS]
323 |   --sections TEXT       Comma-separated sections
324 | 
325 | # Device Events
326 | biomcp openfda device search [OPTIONS]
327 |   --device TEXT         Device name
328 |   --manufacturer TEXT   Manufacturer name
329 |   --problem TEXT        Problem description
330 |   --product-code TEXT   FDA product code
331 |   --genomics-only/--all-devices  Genomic/diagnostic devices only (default) or all devices
332 |   --limit INT           Results per page
333 |   --page INT            Page number
334 | 
335 | biomcp openfda device get MDR_KEY
336 | ```
337 | 
338 | ## Example Outputs
339 | 
340 | ### Adverse Event Search
341 | 
342 | ```markdown
343 | ## FDA Adverse Event Reports
344 | 
345 | **Drug**: imatinib | **Serious Events**: Yes
346 | **Total Reports Found**: 1,234 reports
347 | 
348 | ### Top Reported Reactions:
349 | 
350 | - **NAUSEA**: 234 reports (19.0%)
351 | - **FATIGUE**: 189 reports (15.3%)
352 | - **RASH**: 156 reports (12.6%)
353 | 
354 | ### Sample Reports (showing 3 of 1,234):
355 | 
356 | ...
357 | ```
358 | 
359 | ### Drug Label Search
360 | 
361 | ```markdown
362 | ## FDA Drug Labels
363 | 
364 | **Drug**: pembrolizumab
365 | **Total Labels Found**: 5 labels
366 | 
367 | ### Results (showing 5 of 5):
368 | 
369 | #### 1. KEYTRUDA
370 | 
371 | **Also known as**: pembrolizumab
372 | **FDA Application**: BLA125514
373 | **Manufacturer**: Merck Sharp & Dohme
374 | **Route**: INTRAVENOUS
375 | 
376 | ⚠️ **BOXED WARNING**: Immune-mediated adverse reactions...
377 | 
378 | **Indications**: KEYTRUDA is indicated for the treatment of...
379 | ```
380 | 
381 | ### Device Event Search
382 | 
383 | ```markdown
384 | ## FDA Device Adverse Event Reports
385 | 
386 | **Device**: FoundationOne | **Type**: Genomic/Diagnostic Devices
387 | **Total Reports Found**: 12 reports
388 | 
389 | ### Top Reported Problems:
390 | 
391 | - **False negative result**: 5 reports (41.7%)
392 | - **Software malfunction**: 3 reports (25.0%)
393 | 
394 | ### Sample Reports (showing 3 of 12):
395 | 
396 | ...
397 | ```
398 | 
```