# Directory Structure ``` ├── .github │ ├── actions │ │ └── setup-python-env │ │ └── action.yml │ ├── dependabot.yml │ └── workflows │ ├── ci.yml │ ├── deploy-docs.yml │ ├── main.yml.disabled │ ├── on-release-main.yml │ └── validate-codecov-config.yml ├── .gitignore ├── .pre-commit-config.yaml ├── BIOMCP_DATA_FLOW.md ├── CHANGELOG.md ├── CNAME ├── codecov.yaml ├── docker-compose.yml ├── Dockerfile ├── docs │ ├── apis │ │ ├── error-codes.md │ │ ├── overview.md │ │ └── python-sdk.md │ ├── assets │ │ ├── biomcp-cursor-locations.png │ │ ├── favicon.ico │ │ ├── icon.png │ │ ├── logo.png │ │ ├── mcp_architecture.txt │ │ └── remote-connection │ │ ├── 00_connectors.png │ │ ├── 01_add_custom_connector.png │ │ ├── 02_connector_enabled.png │ │ ├── 03_connect_to_biomcp.png │ │ ├── 04_select_google_oauth.png │ │ └── 05_success_connect.png │ ├── backend-services-reference │ │ ├── 01-overview.md │ │ ├── 02-biothings-suite.md │ │ ├── 03-cbioportal.md │ │ ├── 04-clinicaltrials-gov.md │ │ ├── 05-nci-cts-api.md │ │ ├── 06-pubtator3.md │ │ └── 07-alphagenome.md │ ├── blog │ │ ├── ai-assisted-clinical-trial-search-analysis.md │ │ ├── images │ │ │ ├── deep-researcher-video.png │ │ │ ├── researcher-announce.png │ │ │ ├── researcher-drop-down.png │ │ │ ├── researcher-prompt.png │ │ │ ├── trial-search-assistant.png │ │ │ └── what_is_biomcp_thumbnail.png │ │ └── researcher-persona-resource.md │ ├── changelog.md │ ├── CNAME │ ├── concepts │ │ ├── 01-what-is-biomcp.md │ │ ├── 02-the-deep-researcher-persona.md │ │ └── 03-sequential-thinking-with-the-think-tool.md │ ├── developer-guides │ │ ├── 01-server-deployment.md │ │ ├── 02-contributing-and-testing.md │ │ ├── 03-third-party-endpoints.md │ │ ├── 04-transport-protocol.md │ │ ├── 05-error-handling.md │ │ ├── 06-http-client-and-caching.md │ │ ├── 07-performance-optimizations.md │ │ └── generate_endpoints.py │ ├── faq-condensed.md │
├── FDA_SECURITY.md │ ├── genomoncology.md │ ├── getting-started │ │ ├── 01-quickstart-cli.md │ │ ├── 02-claude-desktop-integration.md │ │ └── 03-authentication-and-api-keys.md │ ├── how-to-guides │ │ ├── 01-find-articles-and-cbioportal-data.md │ │ ├── 02-find-trials-with-nci-and-biothings.md │ │ ├── 03-get-comprehensive-variant-annotations.md │ │ ├── 04-predict-variant-effects-with-alphagenome.md │ │ ├── 05-logging-and-monitoring-with-bigquery.md │ │ └── 06-search-nci-organizations-and-interventions.md │ ├── index.md │ ├── policies.md │ ├── reference │ │ ├── architecture-diagrams.md │ │ ├── quick-architecture.md │ │ ├── quick-reference.md │ │ └── visual-architecture.md │ ├── robots.txt │ ├── stylesheets │ │ ├── announcement.css │ │ └── extra.css │ ├── troubleshooting.md │ ├── tutorials │ │ ├── biothings-prompts.md │ │ ├── claude-code-biomcp-alphagenome.md │ │ ├── nci-prompts.md │ │ ├── openfda-integration.md │ │ ├── openfda-prompts.md │ │ ├── pydantic-ai-integration.md │ │ └── remote-connection.md │ ├── user-guides │ │ ├── 01-command-line-interface.md │ │ ├── 02-mcp-tools-reference.md │ │ └── 03-integrating-with-ides-and-clients.md │ └── workflows │ └── all-workflows.md ├── example_scripts │ ├── mcp_integration.py │ └── python_sdk.py ├── glama.json ├── LICENSE ├── lzyank.toml ├── Makefile ├── mkdocs.yml ├── package-lock.json ├── package.json ├── pyproject.toml ├── README.md ├── scripts │ ├── check_docs_in_mkdocs.py │ ├── check_http_imports.py │ └── generate_endpoints_doc.py ├── smithery.yaml ├── src │ └── biomcp │ ├── __init__.py │ ├── __main__.py │ ├── articles │ │ ├── __init__.py │ │ ├── autocomplete.py │ │ ├── fetch.py │ │ ├── preprints.py │ │ ├── search_optimized.py │ │ ├── search.py │ │ └── unified.py │ ├── biomarkers │ │ ├── __init__.py │ │ └── search.py │ ├── cbioportal_helper.py │ ├── circuit_breaker.py │ ├── cli │ │ ├── __init__.py │ │ ├── articles.py │ │ ├── biomarkers.py │ │ ├── diseases.py │ │ ├── health.py │ │ ├── interventions.py │ │ ├── main.py │ │ 
├── openfda.py │ │ ├── organizations.py │ │ ├── server.py │ │ ├── trials.py │ │ └── variants.py │ ├── connection_pool.py │ ├── constants.py │ ├── core.py │ ├── diseases │ │ ├── __init__.py │ │ ├── getter.py │ │ └── search.py │ ├── domain_handlers.py │ ├── drugs │ │ ├── __init__.py │ │ └── getter.py │ ├── exceptions.py │ ├── genes │ │ ├── __init__.py │ │ └── getter.py │ ├── http_client_simple.py │ ├── http_client.py │ ├── individual_tools.py │ ├── integrations │ │ ├── __init__.py │ │ ├── biothings_client.py │ │ └── cts_api.py │ ├── interventions │ │ ├── __init__.py │ │ ├── getter.py │ │ └── search.py │ ├── logging_filter.py │ ├── metrics_handler.py │ ├── metrics.py │ ├── openfda │ │ ├── __init__.py │ │ ├── adverse_events_helpers.py │ │ ├── adverse_events.py │ │ ├── cache.py │ │ ├── constants.py │ │ ├── device_events_helpers.py │ │ ├── device_events.py │ │ ├── drug_approvals.py │ │ ├── drug_labels_helpers.py │ │ ├── drug_labels.py │ │ ├── drug_recalls_helpers.py │ │ ├── drug_recalls.py │ │ ├── drug_shortages_detail_helpers.py │ │ ├── drug_shortages_helpers.py │ │ ├── drug_shortages.py │ │ ├── exceptions.py │ │ ├── input_validation.py │ │ ├── rate_limiter.py │ │ ├── utils.py │ │ └── validation.py │ ├── organizations │ │ ├── __init__.py │ │ ├── getter.py │ │ └── search.py │ ├── parameter_parser.py │ ├── prefetch.py │ ├── query_parser.py │ ├── query_router.py │ ├── rate_limiter.py │ ├── render.py │ ├── request_batcher.py │ ├── resources │ │ ├── __init__.py │ │ ├── getter.py │ │ ├── instructions.md │ │ └── researcher.md │ ├── retry.py │ ├── router_handlers.py │ ├── router.py │ ├── shared_context.py │ ├── thinking │ │ ├── __init__.py │ │ ├── sequential.py │ │ └── session.py │ ├── thinking_tool.py │ ├── thinking_tracker.py │ ├── trials │ │ ├── __init__.py │ │ ├── getter.py │ │ ├── nci_getter.py │ │ ├── nci_search.py │ │ └── search.py │ ├── utils │ │ ├── __init__.py │ │ ├── cancer_types_api.py │ │ ├── cbio_http_adapter.py │ │ ├── endpoint_registry.py │ │ ├── 
gene_validator.py │ │ ├── metrics.py │ │ ├── mutation_filter.py │ │ ├── query_utils.py │ │ ├── rate_limiter.py │ │ └── request_cache.py │ ├── variants │ │ ├── __init__.py │ │ ├── alphagenome.py │ │ ├── cancer_types.py │ │ ├── cbio_external_client.py │ │ ├── cbioportal_mutations.py │ │ ├── cbioportal_search_helpers.py │ │ ├── cbioportal_search.py │ │ ├── constants.py │ │ ├── external.py │ │ ├── filters.py │ │ ├── getter.py │ │ ├── links.py │ │ └── search.py │ └── workers │ ├── __init__.py │ ├── worker_entry_stytch.js │ ├── worker_entry.js │ └── worker.py ├── tests │ ├── bdd │ │ ├── cli_help │ │ │ ├── help.feature │ │ │ └── test_help.py │ │ ├── conftest.py │ │ ├── features │ │ │ └── alphagenome_integration.feature │ │ ├── fetch_articles │ │ │ ├── fetch.feature │ │ │ └── test_fetch.py │ │ ├── get_trials │ │ │ ├── get.feature │ │ │ └── test_get.py │ │ ├── get_variants │ │ │ ├── get.feature │ │ │ └── test_get.py │ │ ├── search_articles │ │ │ ├── autocomplete.feature │ │ │ ├── search.feature │ │ │ ├── test_autocomplete.py │ │ │ └── test_search.py │ │ ├── search_trials │ │ │ ├── search.feature │ │ │ └── test_search.py │ │ ├── search_variants │ │ │ ├── search.feature │ │ │ └── test_search.py │ │ └── steps │ │ └── test_alphagenome_steps.py │ ├── config │ │ └── test_smithery_config.py │ ├── conftest.py │ ├── data │ │ ├── ct_gov │ │ │ ├── clinical_trials_api_v2.yaml │ │ │ ├── trials_NCT04280705.json │ │ │ └── trials_NCT04280705.txt │ │ ├── myvariant │ │ │ ├── myvariant_api.yaml │ │ │ ├── myvariant_field_descriptions.csv │ │ │ ├── variants_full_braf_v600e.json │ │ │ ├── variants_full_braf_v600e.txt │ │ │ └── variants_part_braf_v600_multiple.json │ │ ├── openfda │ │ │ ├── drugsfda_detail.json │ │ │ ├── drugsfda_search.json │ │ │ ├── enforcement_detail.json │ │ │ └── enforcement_search.json │ │ └── pubtator │ │ ├── pubtator_autocomplete.json │ │ └── pubtator3_paper.txt │ ├── integration │ │ ├── test_openfda_integration.py │ │ ├── test_preprints_integration.py │ │ ├── 
test_simple.py │ │ └── test_variants_integration.py │ ├── tdd │ │ ├── articles │ │ │ ├── test_autocomplete.py │ │ │ ├── test_cbioportal_integration.py │ │ │ ├── test_fetch.py │ │ │ ├── test_preprints.py │ │ │ ├── test_search.py │ │ │ └── test_unified.py │ │ ├── conftest.py │ │ ├── drugs │ │ │ ├── __init__.py │ │ │ └── test_drug_getter.py │ │ ├── openfda │ │ │ ├── __init__.py │ │ │ ├── test_adverse_events.py │ │ │ ├── test_device_events.py │ │ │ ├── test_drug_approvals.py │ │ │ ├── test_drug_labels.py │ │ │ ├── test_drug_recalls.py │ │ │ ├── test_drug_shortages.py │ │ │ └── test_security.py │ │ ├── test_biothings_integration_real.py │ │ ├── test_biothings_integration.py │ │ ├── test_circuit_breaker.py │ │ ├── test_concurrent_requests.py │ │ ├── test_connection_pool.py │ │ ├── test_domain_handlers.py │ │ ├── test_drug_approvals.py │ │ ├── test_drug_recalls.py │ │ ├── test_drug_shortages.py │ │ ├── test_endpoint_documentation.py │ │ ├── test_error_scenarios.py │ │ ├── test_europe_pmc_fetch.py │ │ ├── test_mcp_integration.py │ │ ├── test_mcp_tools.py │ │ ├── test_metrics.py │ │ ├── test_nci_integration.py │ │ ├── test_nci_mcp_tools.py │ │ ├── test_network_policies.py │ │ ├── test_offline_mode.py │ │ ├── test_openfda_unified.py │ │ ├── test_pten_r173_search.py │ │ ├── test_render.py │ │ ├── test_request_batcher.py.disabled │ │ ├── test_retry.py │ │ ├── test_router.py │ │ ├── test_shared_context.py.disabled │ │ ├── test_unified_biothings.py │ │ ├── thinking │ │ │ ├── __init__.py │ │ │ └── test_sequential.py │ │ ├── trials │ │ │ ├── test_backward_compatibility.py │ │ │ ├── test_getter.py │ │ │ └── test_search.py │ │ ├── utils │ │ │ ├── test_gene_validator.py │ │ │ ├── test_mutation_filter.py │ │ │ ├── test_rate_limiter.py │ │ │ └── test_request_cache.py │ │ ├── variants │ │ │ ├── constants.py │ │ │ ├── test_alphagenome_api_key.py │ │ │ ├── test_alphagenome_comprehensive.py │ │ │ ├── test_alphagenome.py │ │ │ ├── test_cbioportal_mutations.py │ │ │ ├── 
test_cbioportal_search.py │ │ │ ├── test_external_integration.py │ │ │ ├── test_external.py │ │ │ ├── test_extract_gene_aa_change.py │ │ │ ├── test_filters.py │ │ │ ├── test_getter.py │ │ │ ├── test_links.py │ │ │ └── test_search.py │ │ └── workers │ │ └── test_worker_sanitization.js │ └── test_pydantic_ai_integration.py ├── THIRD_PARTY_ENDPOINTS.md ├── tox.ini ├── uv.lock └── wrangler.toml ``` # Files -------------------------------------------------------------------------------- /docs/tutorials/nci-prompts.md: -------------------------------------------------------------------------------- ```markdown 1 | # NCI Tools Example Prompts 2 | 3 | This guide provides example prompts for AI assistants to effectively use the NCI (National Cancer Institute) Clinical Trials Search API tools in BioMCP. 4 | 5 | ## Overview of NCI Tools 6 | 7 | BioMCP integrates with the NCI Clinical Trials Search API to provide: 8 | 9 | - **Organization Search & Lookup** - Find cancer research centers, hospitals, and trial sponsors 10 | - **Intervention Search & Lookup** - Search for drugs, devices, procedures, and other interventions 11 | 12 | These tools require an NCI API key from: https://clinicaltrialsapi.cancer.gov/ 13 | 14 | ## Best Practices 15 | 16 | ### API Key Required 17 | 18 | All example prompts in this guide should include your NCI API key. Add this to the end of each prompt: 19 | 20 | ``` 21 | "... my NCI API key is YOUR_API_KEY" 22 | ``` 23 | 24 | ### Location Searches 25 | 26 | **ALWAYS use city AND state together** when searching organizations by location. The NCI API has Elasticsearch limitations that cause errors with broad searches. 
27 | 28 | ✅ **Good**: `nci_organization_searcher(city="Cleveland", state="OH")` 29 | ❌ **Bad**: `nci_organization_searcher(city="Cleveland")` or `nci_organization_searcher(state="OH")` 30 | 31 | ### API Parameter Notes 32 | 33 | - The NCI APIs do not support offset-based pagination (`from` parameter) 34 | - Organization location parameters use `org_` prefix (e.g., `org_city`, `org_state_or_province`) 35 | - When using `size` parameter, the API may not return a `total` count 36 | 37 | ### Avoiding API Errors 38 | 39 | - Use specific organization names when possible 40 | - Combine multiple filters (name + type, city + state) 41 | - Start with more specific searches, then broaden if needed 42 | 43 | ## Organization Tools 44 | 45 | ### Organization Search 46 | 47 | #### Basic Organization Search 48 | 49 | ``` 50 | "Find cancer centers in California, my NCI API key is YOUR_API_KEY" 51 | "Search for MD Anderson Cancer Center, my NCI API key is YOUR_API_KEY" 52 | "List academic cancer research centers in New York, my NCI API key is YOUR_API_KEY" 53 | "Find all NCI-designated cancer centers, my NCI API key is YOUR_API_KEY" 54 | ``` 55 | 56 | **Expected tool usage**: `nci_organization_searcher(state="CA", organization_type="Academic")` 57 | 58 | #### Organization by Location 59 | 60 | **IMPORTANT**: Always use city AND state together to avoid API errors! 
61 | 62 | ``` 63 | "Show me cancer treatment centers in Boston, MA, my NCI API key is YOUR_API_KEY" 64 | "Find clinical trial sites in Houston, Texas, my NCI API key is YOUR_API_KEY" 65 | "List all cancer research organizations in Cleveland, OH, my NCI API key is YOUR_API_KEY" 66 | "Search for industry sponsors in San Francisco, CA, my NCI API key is YOUR_API_KEY" 67 | ``` 68 | 69 | **Expected tool usage**: `nci_organization_searcher(city="Boston", state="MA")` ✓ 70 | **Never use**: `nci_organization_searcher(city="Boston")` ✗ or `nci_organization_searcher(state="MA")` ✗ 71 | 72 | #### Organization by Type 73 | 74 | ``` 75 | "Find all government cancer research facilities, my NCI API key is YOUR_API_KEY" 76 | "List pharmaceutical companies running cancer trials, my NCI API key is YOUR_API_KEY" 77 | "Show me academic medical centers conducting trials, my NCI API key is YOUR_API_KEY" 78 | "Find community hospitals participating in cancer research, my NCI API key is YOUR_API_KEY" 79 | ``` 80 | 81 | **Expected tool usage**: `nci_organization_searcher(organization_type="Industry")` 82 | 83 | ### Organization Details 84 | 85 | ``` 86 | "Get details about organization NCI-2011-03337, my NCI API key is YOUR_API_KEY" 87 | "Show me contact information for this cancer center, my NCI API key is YOUR_API_KEY" 88 | "What trials is this organization conducting? 
My NCI API key is YOUR_API_KEY" 89 | "Give me the full profile of this research institution, my NCI API key is YOUR_API_KEY" 90 | ``` 91 | 92 | **Expected tool usage**: `organization_getter(organization_id="NCI-2011-03337")` 93 | 94 | ## Intervention Tools 95 | 96 | ### Intervention Search 97 | 98 | #### Drug Search 99 | 100 | ``` 101 | "Find all trials using pembrolizumab, my NCI API key is YOUR_API_KEY" 102 | "Search for PD-1 inhibitor drugs in trials, my NCI API key is YOUR_API_KEY" 103 | "List all immunotherapy drugs being tested, my NCI API key is YOUR_API_KEY" 104 | "Find trials using Keytruda or similar drugs, my NCI API key is YOUR_API_KEY" 105 | ``` 106 | 107 | **Expected tool usage**: `nci_intervention_searcher(name="pembrolizumab", intervention_type="Drug")` 108 | 109 | #### Device Search 110 | 111 | ``` 112 | "Search for medical devices in cancer trials, my NCI API key is YOUR_API_KEY" 113 | "Find trials using surgical robots, my NCI API key is YOUR_API_KEY" 114 | "List radiation therapy devices being tested, my NCI API key is YOUR_API_KEY" 115 | "Show me trials with diagnostic devices, my NCI API key is YOUR_API_KEY" 116 | ``` 117 | 118 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Device")` 119 | 120 | #### Procedure Search 121 | 122 | ``` 123 | "Find surgical procedures in cancer trials, my NCI API key is YOUR_API_KEY" 124 | "Search for minimally invasive surgery trials, my NCI API key is YOUR_API_KEY" 125 | "List trials with radiation therapy procedures, my NCI API key is YOUR_API_KEY" 126 | "Show me trials testing new biopsy techniques, my NCI API key is YOUR_API_KEY" 127 | ``` 128 | 129 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Procedure")` 130 | 131 | #### Other Interventions 132 | 133 | ``` 134 | "Find behavioral interventions for cancer patients, my NCI API key is YOUR_API_KEY" 135 | "Search for dietary interventions in trials, my NCI API key is YOUR_API_KEY" 136 | "List genetic therapy 
trials, my NCI API key is YOUR_API_KEY" 137 | "Show me trials with exercise interventions, my NCI API key is YOUR_API_KEY" 138 | ``` 139 | 140 | **Expected tool usage**: `nci_intervention_searcher(intervention_type="Behavioral")` 141 | 142 | ### Intervention Details 143 | 144 | ``` 145 | "Get full details about intervention INT123456, my NCI API key is YOUR_API_KEY" 146 | "Show me the mechanism of action for this drug, my NCI API key is YOUR_API_KEY" 147 | "Is this intervention FDA approved? My NCI API key is YOUR_API_KEY" 148 | "What trials are using this intervention? My NCI API key is YOUR_API_KEY" 149 | ``` 150 | 151 | **Expected tool usage**: `intervention_getter(intervention_id="INT123456")` 152 | 153 | ## Biomarker Tools 154 | 155 | ### Biomarker Search 156 | 157 | #### Basic Biomarker Search 158 | 159 | ``` 160 | "Find PD-L1 expression biomarkers, my NCI API key is YOUR_API_KEY" 161 | "Search for EGFR mutations used in trials, my NCI API key is YOUR_API_KEY" 162 | "List biomarkers tested by IHC, my NCI API key is YOUR_API_KEY" 163 | "Find HER2 positive biomarkers, my NCI API key is YOUR_API_KEY" 164 | ``` 165 | 166 | **Expected tool usage**: `nci_biomarker_searcher(name="PD-L1")` 167 | 168 | #### Biomarker by Type 169 | 170 | ``` 171 | "Show me all reference gene biomarkers, my NCI API key is YOUR_API_KEY" 172 | "Find branch biomarkers, my NCI API key is YOUR_API_KEY" 173 | "List all biomarkers of type reference_gene, my NCI API key is YOUR_API_KEY" 174 | ``` 175 | 176 | **Expected tool usage**: `nci_biomarker_searcher(biomarker_type="reference_gene")` 177 | 178 | #### Important Note on Biomarker Types 179 | 180 | The NCI API only supports two biomarker types: 181 | 182 | - `reference_gene`: Gene-based biomarkers 183 | - `branch`: Branch/pathway biomarkers 184 | 185 | Note: The API does NOT support searching by gene symbol or assay type directly. 
186 | 187 | ## NCI Disease Tools 188 | 189 | ### Disease Search 190 | 191 | #### Basic Disease Search 192 | 193 | ``` 194 | "Find melanoma in NCI vocabulary, my NCI API key is YOUR_API_KEY" 195 | "Search for lung cancer types, my NCI API key is YOUR_API_KEY" 196 | "List breast cancer subtypes, my NCI API key is YOUR_API_KEY" 197 | "Find official name for GIST, my NCI API key is YOUR_API_KEY" 198 | ``` 199 | 200 | **Expected tool usage**: `nci_disease_searcher(name="melanoma")` 201 | 202 | #### Disease with Synonyms 203 | 204 | ``` 205 | "Find all names for gastrointestinal stromal tumor, my NCI API key is YOUR_API_KEY" 206 | "Search for NSCLC and all its synonyms, my NCI API key is YOUR_API_KEY" 207 | "List all terms for triple-negative breast cancer, my NCI API key is YOUR_API_KEY" 208 | "Find alternative names for melanoma, my NCI API key is YOUR_API_KEY" 209 | ``` 210 | 211 | **Expected tool usage**: `nci_disease_searcher(name="GIST", include_synonyms=True)` 212 | 213 | ## Combined Workflows 214 | 215 | ### Finding Trials at Specific Centers 216 | 217 | ``` 218 | "First find cancer centers in California, then show me their trials, my NCI API key is YOUR_API_KEY" 219 | ``` 220 | 221 | **Expected workflow**: 222 | 223 | 1. `nci_organization_searcher(state="CA")` 224 | 2. For each organization, search trials with that sponsor 225 | 226 | ### Drug Development Pipeline 227 | 228 | ``` 229 | "Search for CAR-T cell therapies and show me which organizations are developing them, my NCI API key is YOUR_API_KEY" 230 | ``` 231 | 232 | **Expected workflow**: 233 | 234 | 1. `nci_intervention_searcher(name="CAR-T", intervention_type="Biological")` 235 | 2. For each intervention, get details to see associated trials 236 | 3. Extract organization information from trial data 237 | 238 | ### Regional Cancer Research 239 | 240 | ``` 241 | "What cancer drugs are being tested in Boston area hospitals? 
My NCI API key is YOUR_API_KEY" 242 | ``` 243 | 244 | **Expected workflow**: 245 | 246 | 1. `nci_organization_searcher(city="Boston", state="MA")` 247 | 2. `trial_searcher(location="Boston, MA", source="nci")` with organization filters 248 | 3. Extract intervention information from trials 249 | 250 | ## Important Notes 251 | 252 | ### API Key Handling 253 | 254 | All NCI tools require an API key. The tools will check for: 255 | 256 | 1. API key provided in the function call 257 | 2. `NCI_API_KEY` environment variable 258 | 3. User-provided key in their message (e.g., "my NCI API key is...") 259 | 260 | ### Synonym Support 261 | 262 | The intervention searcher accepts a `synonyms` parameter (default: True) meant to match the terms below; note that `search_interventions` keeps it only for backward compatibility and currently ignores it: 263 | 264 | - Drug trade names (e.g., "Keytruda" finds "pembrolizumab") 265 | - Alternative spellings 266 | - Related terms 267 | 268 | ### Pagination 269 | 270 | The search tools accept pagination parameters, although the NCI API itself does not support offset-based pagination: 271 | 272 | - `page`: Page number (1-based) 273 | - `page_size`: Results per page (max 100) 274 | 275 | ### Organization Types 276 | 277 | Valid organization types include: 278 | 279 | - Academic 280 | - Industry 281 | - Government 282 | - Community 283 | - Network 284 | - Other 285 | 286 | ### Intervention Types 287 | 288 | Valid intervention types include: 289 | 290 | - Drug 291 | - Device 292 | - Biological 293 | - Procedure 294 | - Radiation 295 | - Behavioral 296 | - Genetic 297 | - Dietary 298 | - Other 299 | 300 | ## Error Handling 301 | 302 | Common errors and solutions: 303 | 304 | 1. **"NCI API key required"**: User needs to provide an API key 305 | 2. **"No results found"**: Try broader search terms or remove filters 306 | 3. **"Invalid organization/intervention ID"**: Verify the ID format 307 | 4. **Rate limiting**: The API has rate limits; wait before retrying 308 | 5. 
**"Search Too Broad" (Elasticsearch error)**: The search returns too many results 309 | - This happens when searching with broad criteria 310 | - **Prevention**: Always use city AND state together for location searches 311 | - Add organization name (even partial) to narrow results 312 | - Avoid searching by state alone or organization type alone 313 | ``` -------------------------------------------------------------------------------- /src/biomcp/interventions/search.py: --------------------------------------------------------------------------------
```python
"""Search functionality for interventions via NCI CTS API."""

import logging
from typing import Any

from ..constants import NCI_INTERVENTIONS_URL
from ..integrations.cts_api import CTSAPIError, make_cts_request
from ..utils import parse_or_query

logger = logging.getLogger(__name__)


# Intervention types based on ClinicalTrials.gov categories
INTERVENTION_TYPES = [
    "Drug",
    "Device",
    "Biological",
    "Procedure",
    "Radiation",
    "Behavioral",
    "Genetic",
    "Dietary",
    "Diagnostic Test",
    "Other",
]


def _build_intervention_params(
    name: str | None,
    intervention_type: str | None,
    category: str | None,
    codes: list[str] | None,
    include: list[str] | None,
    sort: str | None,
    order: str | None,
    page_size: int | None,
) -> dict[str, Any]:
    """Build query parameters for intervention search."""
    params: dict[str, Any] = {}

    if name:
        params["name"] = name

    if intervention_type:
        params["type"] = intervention_type.lower()

    if category:
        params["category"] = category

    if codes:
        params["codes"] = ",".join(codes) if isinstance(codes, list) else codes

    if include:
        params["include"] = (
            ",".join(include) if isinstance(include, list) else include
        )

    if sort:
        params["sort"] = sort
        if order:
            params["order"] = order.lower()

    # Only add size if explicitly requested and > 0
    if page_size and page_size > 0:
        params["size"] = page_size

    return params


def _process_intervention_response(
    response: Any,
    page: int,
    page_size: int | None,
) -> dict[str, Any]:
    """Process intervention search response."""
    if isinstance(response, dict):
        # Standard response format from the API
        interventions = response.get("data", [])
        # When size parameter is used, API doesn't return 'total'
        total = response.get("total", len(interventions))
    elif isinstance(response, list):
        # Direct list of interventions
        interventions = response
        total = len(interventions)
    else:
        # Unexpected response format
        logger.warning(f"Unexpected response type: {type(response)}")
        interventions = []
        total = 0

    return {
        "interventions": interventions,
        "total": total,
        "page": page,
        "page_size": page_size,
    }


async def search_interventions(
    name: str | None = None,
    intervention_type: str | None = None,
    category: str | None = None,
    codes: list[str] | None = None,
    include: list[str] | None = None,
    sort: str | None = None,
    order: str | None = None,
    synonyms: bool = True,  # Kept for backward compatibility but ignored
    page_size: int | None = None,
    page: int = 1,
    api_key: str | None = None,
) -> dict[str, Any]:
    """
    Search for interventions in the NCI CTS database.

    Args:
        name: Intervention name to search for (partial match)
        intervention_type: Type of intervention (Drug, Device, Procedure, etc.)
        category: Category filter (agent, agent category, other)
        codes: List of intervention codes to search for (e.g., ["C82416", "C171257"])
        include: Fields to include in response (all fields, name, category, codes, etc.)
        sort: Sort field (default: 'name', also supports 'count')
        order: Sort order ('asc' or 'desc', required when using sort)
        synonyms: [Deprecated] Kept for backward compatibility but ignored
        page_size: Number of results per page (when used, 'total' field not returned)
        page: Page number (Note: API doesn't support offset pagination)
        api_key: Optional API key (if not provided, uses NCI_API_KEY env var)

    Returns:
        Dictionary with search results containing:
        - interventions: List of intervention records
        - total: Total number of results (only when size not specified)
        - page: Current page
        - page_size: Results per page

    Raises:
        CTSAPIError: If the API request fails
    """
    # Build query parameters
    params = _build_intervention_params(
        name,
        intervention_type,
        category,
        codes,
        include,
        sort,
        order,
        page_size,
    )

    logger.info(
        f"Searching interventions at {NCI_INTERVENTIONS_URL} with params: {params}"
    )

    try:
        # Make API request
        response = await make_cts_request(
            url=NCI_INTERVENTIONS_URL,
            params=params,
            api_key=api_key,
        )

        # Log response info
        logger.debug(f"Response type: {type(response)}")

        # Process response
        return _process_intervention_response(response, page, page_size)

    except CTSAPIError:
        raise
    except Exception as e:
        logger.error(f"Failed to search interventions: {e}")
        raise CTSAPIError(f"Intervention search failed: {e!s}") from e


def format_intervention_results(results: dict[str, Any]) -> str:
    """
    Format intervention search results as markdown.

    Args:
        results: Search results dictionary

    Returns:
        Formatted markdown string
    """
    interventions = results.get("interventions", [])
    total = results.get("total", 0)

    if not interventions:
        return "No interventions found matching the search criteria."

    # Build markdown output
    actual_count = len(interventions)
    if actual_count < total:
        lines = [
            f"## Intervention Search Results (showing {actual_count} of {total} found)",
            "",
        ]
    else:
        lines = [
            f"## Intervention Search Results ({total} found)",
            "",
        ]

    for intervention in interventions:
        int_id = intervention.get(
            "id", intervention.get("intervention_id", "Unknown")
        )
        name = intervention.get("name", "Unknown Intervention")
        int_type = intervention.get(
            "type", intervention.get("category", "Unknown")
        )

        lines.append(f"### {name}")
        lines.append(f"- **ID**: {int_id}")
        lines.append(f"- **Type**: {int_type}")

        # Add synonyms if available
        synonyms = intervention.get("synonyms", [])
        if synonyms:
            if isinstance(synonyms, list):
                lines.append(f"- **Synonyms**: {', '.join(synonyms[:5])}")
                if len(synonyms) > 5:
                    lines.append(f" *(and {len(synonyms) - 5} more)*")
            elif isinstance(synonyms, str):
                lines.append(f"- **Synonyms**: {synonyms}")

        # Add description if available
        if intervention.get("description"):
            desc = intervention["description"]
            if len(desc) > 200:
                desc = desc[:197] + "..."
            lines.append(f"- **Description**: {desc}")

        lines.append("")

    return "\n".join(lines)


async def search_interventions_with_or(
    name_query: str,
    intervention_type: str | None = None,
    category: str | None = None,
    codes: list[str] | None = None,
    include: list[str] | None = None,
    sort: str | None = None,
    order: str | None = None,
    synonyms: bool = True,
    page_size: int | None = None,
    page: int = 1,
    api_key: str | None = None,
) -> dict[str, Any]:
    """
    Search for interventions with OR query support.

    This function handles OR queries by making multiple API calls and combining results.
    For example: "pembrolizumab OR nivolumab" will search for each term.

    Args:
        name_query: Name query that may contain OR operators
        Other args same as search_interventions

    Returns:
        Combined results from all searches with duplicates removed
    """
    # Check if this is an OR query
    if " OR " in name_query or " or " in name_query:
        search_terms = parse_or_query(name_query)
        logger.info(f"Parsed OR query into terms: {search_terms}")
    else:
        # Single term search
        search_terms = [name_query]

    # Collect all unique interventions
    all_interventions = {}
    total_found = 0

    # Search for each term
    for term in search_terms:
        logger.info(f"Searching interventions for term: {term}")
        try:
            results = await search_interventions(
                name=term,
                intervention_type=intervention_type,
                category=category,
                codes=codes,
                include=include,
                sort=sort,
                order=order,
                synonyms=synonyms,
                page_size=page_size,
                page=page,
                api_key=api_key,
            )

            # Add unique interventions (deduplicate by ID)
            for intervention in results.get("interventions", []):
                int_id = intervention.get(
                    "id", intervention.get("intervention_id")
                )
                if int_id and int_id not in all_interventions:
                    all_interventions[int_id] = intervention

            total_found += results.get("total", 0)

        except Exception as e:
            logger.warning(f"Failed to search for term '{term}': {e}")
            # Continue with other terms

    # Convert back to list and apply pagination
    unique_interventions = list(all_interventions.values())

    # Sort by name for consistent results
    unique_interventions.sort(key=lambda x: x.get("name", "").lower())

    # Apply pagination to combined results
    if page_size:
        start_idx = (page - 1) * page_size
        end_idx = start_idx + page_size
        paginated_interventions = unique_interventions[start_idx:end_idx]
    else:
        paginated_interventions = unique_interventions

    return {
        "interventions": paginated_interventions,
        "total": len(unique_interventions),
        "page": page,
        "page_size": page_size,
        "search_terms": search_terms,  # Include what we searched for
        "total_found_across_terms": total_found,  # Total before deduplication
    }
```
-------------------------------------------------------------------------------- /docs/developer-guides/01-server-deployment.md: -------------------------------------------------------------------------------- ```markdown 1 | # Server Deployment Guide 2 | 3 | This guide covers various deployment options for BioMCP, from local development to production cloud deployments with authentication. 

## Deployment Options Overview

| Mode                  | Use Case      | Transport       | Authentication | Scalability |
| --------------------- | ------------- | --------------- | -------------- | ----------- |
| **Local STDIO**       | Development   | STDIO           | None           | Single user |
| **HTTP Server**       | Small teams   | Streamable HTTP | Optional       | Moderate    |
| **Docker**            | Containerized | Streamable HTTP | Optional       | Moderate    |
| **Cloudflare Worker** | Production    | SSE/HTTP        | OAuth optional | High        |

## Local Development (STDIO)

The simplest deployment for development and testing.

### Setup

```bash
# Install BioMCP
uv tool install biomcp

# Run in STDIO mode (default)
biomcp run
```

### Configuration

For Claude Desktop integration:

```json
{
  "mcpServers": {
    "biomcp": {
      "command": "biomcp",
      "args": ["run"]
    }
  }
}
```

### Use Cases

- Local development
- Single-user research
- Testing new features

## HTTP Server Deployment

Modern deployment using Streamable HTTP transport.
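
To make the transport more concrete, here is a minimal sketch of the first message a Streamable HTTP client would send: a JSON-RPC `initialize` request POSTed to the `/mcp` endpoint. It only builds the request and does not send it. The helper name `build_initialize_request`, the client name, and the protocol version string are assumptions for illustration, not part of BioMCP; check the MCP specification for the values your client should use.

```python
import json
import urllib.request


def build_initialize_request(
    url: str = "http://localhost:8000/mcp",  # assumed local server URL
    protocol_version: str = "2025-03-26",  # assumed MCP protocol revision
) -> urllib.request.Request:
    """Build (but do not send) an MCP `initialize` POST request."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},
            # Hypothetical client identity, for illustration only
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The server may reply with plain JSON or with an SSE stream
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


request = build_initialize_request()
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server. Keeping construction separate from sending makes the request shape easy to inspect or test without network access.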

### Basic Setup

```bash
# Run HTTP server
biomcp run --mode http --host 0.0.0.0 --port 8000
```

### With Environment Variables

```bash
# Create .env file
cat > .env << EOF
BIOMCP_HOST=0.0.0.0
BIOMCP_PORT=8000
NCI_API_KEY=your-key
ALPHAGENOME_API_KEY=your-key
EOF

# Run with env file
biomcp run --mode http
```

### Systemd Service (Linux)

Create `/etc/systemd/system/biomcp.service`:

```ini
[Unit]
Description=BioMCP Server
After=network.target

[Service]
Type=simple
User=biomcp
WorkingDirectory=/opt/biomcp
Environment="PATH=/usr/local/bin:/usr/bin"
EnvironmentFile=/opt/biomcp/.env
ExecStart=/usr/local/bin/biomcp run --mode http
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable and start:

```bash
sudo systemctl enable biomcp
sudo systemctl start biomcp
```

### Nginx Reverse Proxy

```nginx
server {
    listen 443 ssl;
    server_name biomcp.example.com;

    ssl_certificate /etc/ssl/certs/biomcp.crt;
    ssl_certificate_key /etc/ssl/private/biomcp.key;

    location /mcp {
        proxy_pass http://localhost:8000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_buffering off;
    }
}
```

## Docker Deployment

Containerized deployment for consistency and portability.

### Basic Dockerfile

```dockerfile
FROM python:3.11-slim

# Install BioMCP
RUN pip install biomcp-python

# Add API keys (use secrets in production!)
ENV NCI_API_KEY=""
ENV ALPHAGENOME_API_KEY=""

# Expose port
EXPOSE 8000

# Run server
CMD ["biomcp", "run", "--mode", "http", "--host", "0.0.0.0"]
```

### With AlphaGenome Support

```dockerfile
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y git

# Install BioMCP
RUN pip install biomcp-python

# Install AlphaGenome
RUN git clone https://github.com/google-deepmind/alphagenome.git && \
    cd alphagenome && \
    pip install .

# Configure
ENV MCP_MODE=http
ENV BIOMCP_HOST=0.0.0.0
ENV BIOMCP_PORT=8000

EXPOSE 8000

CMD ["biomcp", "run"]
```

### Docker Compose

```yaml
version: "3.8"

services:
  biomcp:
    build: .
    ports:
      - "8000:8000"
    environment:
      - MCP_MODE=http
      - NCI_API_KEY=${NCI_API_KEY}
      - ALPHAGENOME_API_KEY=${ALPHAGENOME_API_KEY}
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

### Running

```bash
# Build and run
docker-compose up -d

# View logs
docker-compose logs -f

# Scale horizontally (the fixed "8000:8000" host port mapping above must be
# removed or replaced by a load balancer first, or the replicas will conflict)
docker-compose up -d --scale biomcp=3
```

## Cloudflare Worker Deployment

Enterprise-grade deployment with global edge distribution.

### Prerequisites

1. Cloudflare account
2. Wrangler CLI installed
3. Remote BioMCP server running

### Architecture

```
Claude Desktop → Cloudflare Worker (Edge) → BioMCP Server (Origin)
```

### Setup Worker

1. **Install dependencies:**

   ```bash
   npm install @modelcontextprotocol/sdk itty-router
   ```

2. **Create `wrangler.toml`:**

   ```toml
   name = "biomcp-worker"
   main = "src/index.js"
   compatibility_date = "2024-01-01"

   [vars]
   REMOTE_MCP_SERVER_URL = "https://your-biomcp-server.com/mcp"
   MCP_SERVER_API_KEY = "your-secret-key"

   [[kv_namespaces]]
   binding = "AUTH_TOKENS"
   id = "your-kv-namespace-id"
   ```

3. **Deploy:**

   ```bash
   wrangler deploy
   ```

### With OAuth Authentication (Stytch)

1. **Configure Stytch:**

   ```toml
   [vars]
   STYTCH_PROJECT_ID = "project-test-..."
   STYTCH_SECRET = "secret-test-..."
   STYTCH_PUBLIC_TOKEN = "public-token-test-..."
   JWT_SECRET = "your-jwt-secret"
   ```

2. **OAuth Endpoints:**
   The worker automatically provides:

   - `/.well-known/oauth-authorization-server`
   - `/authorize`
   - `/callback`
   - `/token`

3. **Client Configuration:**

   ```json
   {
     "mcpServers": {
       "biomcp": {
         "transport": {
           "type": "sse",
           "url": "https://your-worker.workers.dev"
         },
         "auth": {
           "type": "oauth",
           "client_id": "mcp-client",
           "authorization_endpoint": "https://your-worker.workers.dev/authorize",
           "token_endpoint": "https://your-worker.workers.dev/token",
           "scope": "mcp:access"
         }
       }
     }
   }
   ```

## Production Considerations

### Security

1. **API Key Management:**

   ```bash
   # Use environment variables
   export NCI_API_KEY="$(vault kv get -field=key secret/biomcp/nci)"

   # Or use Docker secrets (available to Swarm services; plain `docker run`
   # has no --secret flag)
   docker service create --secret biomcp_keys biomcp:latest
   ```

2. **Network Security:**

   - Use HTTPS everywhere
   - Implement rate limiting
   - Set up CORS properly
   - Use authentication for public endpoints

3. **Access Control:**

   ```python
   # Example middleware (validate_token is your own token check)
   from fastapi.responses import JSONResponse

   async def auth_middleware(request, call_next):
       token = request.headers.get("Authorization")
       if not validate_token(token):
           return JSONResponse({"error": "Unauthorized"}, status_code=401)
       return await call_next(request)
   ```

### Monitoring

1. **Health Checks:**

   ```python
   # Built-in health endpoint: GET /health

   # Custom health check
   from datetime import datetime, timezone

   @app.get("/health/detailed")
   async def health_detailed():
       return {
           "status": "healthy",
           "version": __version__,
           "apis": check_api_status(),
           "timestamp": datetime.now(timezone.utc).isoformat(),
       }
   ```

2. **Metrics:**

   ```python
   # Prometheus metrics
   from prometheus_client import Counter, Histogram

   request_count = Counter('biomcp_requests_total', 'Total requests')
   request_duration = Histogram('biomcp_request_duration_seconds', 'Request duration')
   ```

3. **Logging:**

   ```python
   # Structured logging
   import structlog

   logger = structlog.get_logger()
   logger.info("request_processed",
       tool="article_searcher",
       duration=0.234,
       user_id="user123"
   )
   ```

### Scaling

1. **Horizontal Scaling:**

   ```yaml
   # Kubernetes deployment
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: biomcp
   spec:
     replicas: 3
     selector:
       matchLabels:
         app: biomcp
     template:
       metadata:
         labels:
           app: biomcp
       spec:
         containers:
           - name: biomcp
             image: biomcp:latest
             ports:
               - containerPort: 8000
             resources:
               requests:
                 memory: "512Mi"
                 cpu: "500m"
               limits:
                 memory: "1Gi"
                 cpu: "1000m"
   ```

2. **Caching:**

   ```python
   # Redis caching (async client, so cache calls do not block the event loop)
   import json
   from functools import wraps

   import redis.asyncio as redis

   redis_client = redis.Redis()

   def cache_result(ttl=3600):
       def decorator(func):
           @wraps(func)
           async def wrapper(*args, **kwargs):
               key = f"{func.__name__}:{str(args)}:{str(kwargs)}"
               cached = await redis_client.get(key)
               if cached:
                   return json.loads(cached)
               result = await func(*args, **kwargs)
               await redis_client.setex(key, ttl, json.dumps(result))
               return result
           return wrapper
       return decorator
   ```

### Performance Optimization

1. **Connection Pooling:**

   ```python
   # Reuse HTTP connections
   import httpx

   client = httpx.AsyncClient(
       limits=httpx.Limits(max_keepalive_connections=20),
       timeout=httpx.Timeout(30.0)
   )
   ```

2. **Async Processing:**

   ```python
   # Process requests concurrently
   import asyncio

   async def handle_batch(requests):
       tasks = [process_request(req) for req in requests]
       return await asyncio.gather(*tasks)
   ```

3. **Response Compression:**

   ```python
   # Enable gzip compression
   from fastapi.middleware.gzip import GZipMiddleware

   app.add_middleware(GZipMiddleware, minimum_size=1000)
   ```

## Migration Path

### From STDIO to HTTP

1. Update server startup:

   ```bash
   # Old
   biomcp run

   # New
   biomcp run --mode http
   ```

2. Update client configuration:

   ```json
   {
     "mcpServers": {
       "biomcp": {
         "url": "http://localhost:8000/mcp"
       }
     }
   }
   ```

### From SSE to Streamable HTTP

1. Update worker code to use `/mcp` endpoint
2. Update client to use new transport:

   ```json
   {
     "transport": {
       "type": "http",
       "url": "https://biomcp.example.com/mcp"
     }
   }
   ```

## Troubleshooting

### Common Issues

1. **Port Already in Use:**

   ```bash
   # Find process using port
   lsof -i :8000

   # Kill process
   kill -9 <PID>
   ```

2. **API Key Errors:**

   ```bash
   # Verify environment variables
   env | grep -E "(NCI|ALPHAGENOME|CBIO)"

   # Test API key
   curl -H "X-API-KEY: $NCI_API_KEY" https://api.cancer.gov/v2/trials
   ```

3. **Connection Timeouts:**

   - Increase timeout values
   - Check firewall rules
   - Verify network connectivity

### Debug Mode

```bash
# Enable debug logging
BIOMCP_LOG_LEVEL=DEBUG biomcp run --mode http

# Or in Docker
docker run -e BIOMCP_LOG_LEVEL=DEBUG biomcp:latest
```

## Next Steps

- Set up [monitoring](../how-to-guides/05-logging-and-monitoring-with-bigquery.md)
- Configure [authentication](../getting-started/03-authentication-and-api-keys.md)
- Review [security policies](../policies.md)
- Implement [CI/CD pipeline](02-contributing-and-testing.md)
```

--------------------------------------------------------------------------------
/src/biomcp/openfda/utils.py:
--------------------------------------------------------------------------------

```python
"""
Utility functions for OpenFDA API integration.
"""

import asyncio
import logging
import os
from typing import Any

from ..http_client import request_api
from .cache import (
    get_cached_response,
    is_cacheable_request,
    set_cached_response,
)
from .exceptions import (
    OpenFDAConnectionError,
    OpenFDARateLimitError,
    OpenFDATimeoutError,
    OpenFDAValidationError,
)
from .input_validation import build_safe_query
from .rate_limiter import FDA_CIRCUIT_BREAKER, FDA_RATE_LIMITER, FDA_SEMAPHORE
from .validation import sanitize_response, validate_fda_response

logger = logging.getLogger(__name__)


def get_api_key() -> str | None:
    """Get OpenFDA API key from environment variable."""
    api_key = os.environ.get("OPENFDA_API_KEY")
    if not api_key:
        logger.debug("No OPENFDA_API_KEY found in environment")
    return api_key


async def make_openfda_request(  # noqa: C901
    endpoint: str,
    params: dict[str, Any],
    domain: str = "openfda",
    api_key: str | None = None,
    max_retries: int = 3,
    initial_delay: float = 1.0,
) -> tuple[dict[str, Any] | None, str | None]:
    """
    Make a request to the OpenFDA API with retry logic and caching.

    Args:
        endpoint: Full URL to the OpenFDA endpoint
        params: Query parameters
        domain: Domain name for metrics tracking
        api_key: Optional API key (overrides environment variable)
        max_retries: Maximum number of retry attempts (default 3)
        initial_delay: Initial delay in seconds for exponential backoff (default 1.0)

    Returns:
        Tuple of (response_data, error_message)
    """
    # Validate and sanitize input parameters
    safe_params = build_safe_query(params)

    # Check cache first (with safe params)
    if is_cacheable_request(endpoint, safe_params):
        cached_response = get_cached_response(endpoint, safe_params)
        if cached_response:
            return cached_response, None

    # Use provided API key or get from environment
    if not api_key:
        api_key = get_api_key()
    if api_key:
        safe_params["api_key"] = api_key

    last_error = None
    delay = initial_delay

    for attempt in range(max_retries + 1):
        try:
            # Apply rate limiting and circuit breaker
            async with FDA_SEMAPHORE:
                await FDA_RATE_LIMITER.acquire()

                # Check circuit breaker state
                if FDA_CIRCUIT_BREAKER.is_open:
                    state = FDA_CIRCUIT_BREAKER.get_state()
                    return None, f"FDA API circuit breaker is open: {state}"

                response, error = await request_api(
                    url=endpoint,
                    request=safe_params,
                    method="GET",
                    domain=domain,
                )

            if error:
                error_msg = (
                    error.message if hasattr(error, "message") else str(error)
                )

                # Check for specific error types
                if "429" in error_msg or "rate limit" in error_msg.lower():
                    if attempt < max_retries:
                        logger.warning(
                            f"Rate limit hit (attempt {attempt + 1}/{max_retries + 1}). "
                            f"Retrying in {delay:.1f} seconds..."
                        )
                        await asyncio.sleep(delay)
                        delay *= 2  # Exponential backoff
                        continue
                    else:
                        raise OpenFDARateLimitError(error_msg)

                # Check if error is retryable
                if _is_retryable_error(error_msg) and attempt < max_retries:
                    logger.warning(
                        f"OpenFDA API error (attempt {attempt + 1}/{max_retries + 1}): {error_msg}. "
                        f"Retrying in {delay:.1f} seconds..."
                    )
                    await asyncio.sleep(delay)
                    delay *= 2  # Exponential backoff
                    continue

                logger.error(f"OpenFDA API error: {error_msg}")
                return None, error_msg

            # Validate and sanitize response
            if response:
                try:
                    validate_fda_response(response, response_type="search")
                    response = sanitize_response(response)
                except OpenFDAValidationError as e:
                    logger.error(f"Invalid FDA response: {e}")
                    return None, str(e)

                # Cache successful response
                if is_cacheable_request(endpoint, safe_params):
                    set_cached_response(endpoint, safe_params, response)

            return response, None

        except asyncio.TimeoutError:
            last_error = "Request timeout"
            if attempt < max_retries:
                logger.warning(
                    f"OpenFDA request timeout (attempt {attempt + 1}/{max_retries + 1}). "
                    f"Retrying in {delay:.1f} seconds..."
                )
                await asyncio.sleep(delay)
                delay *= 2
                continue
            logger.error(
                f"OpenFDA request failed after {max_retries + 1} attempts: {last_error}"
            )
            raise OpenFDATimeoutError(last_error) from None

        except ConnectionError as e:
            last_error = f"Connection error: {e}"
            if attempt < max_retries:
                logger.warning(
                    f"OpenFDA connection error (attempt {attempt + 1}/{max_retries + 1}): {e}. "
                    f"Retrying in {delay:.1f} seconds..."
                )
                await asyncio.sleep(delay)
                delay *= 2
                continue
            logger.error(
                f"OpenFDA request failed after {max_retries + 1} attempts: {last_error}"
            )
            raise OpenFDAConnectionError(last_error) from None

        except (
            OpenFDARateLimitError,
            OpenFDATimeoutError,
            OpenFDAConnectionError,
        ):
            # Re-raise our custom exceptions
            raise
        except Exception as e:
            # Handle unexpected errors gracefully
            logger.error(f"Unexpected OpenFDA request error: {e}")
            return None, str(e)

    return None, last_error


def _is_retryable_error(error_msg: str) -> bool:
    """
    Check if an error is retryable.

    Args:
        error_msg: Error message string

    Returns:
        True if the error is retryable
    """
    retryable_patterns = [
        "rate limit",
        "timeout",
        "connection",
        "503",  # Service unavailable
        "502",  # Bad gateway
        "504",  # Gateway timeout
        "429",  # Too many requests
        "temporary",
        "try again",
    ]

    error_lower = error_msg.lower()
    return any(pattern in error_lower for pattern in retryable_patterns)


def format_count(count: int, label: str) -> str:
    """Format a count with appropriate singular/plural label."""
    if count == 1:
        return f"1 {label}"
    return f"{count:,} {label}s"


def truncate_text(text: str, max_length: int = 500) -> str:
    """Truncate text to a maximum length with ellipsis."""
    if len(text) <= max_length:
        return text
    return text[: max_length - 3] + "..."


def clean_text(text: str | None) -> str:
    """Clean and normalize text from FDA data."""
    if not text:
        return ""

    # Remove extra whitespace and newlines
    text = " ".join(text.split())

    # Remove common FDA formatting artifacts
    text = text.replace("\\n", " ")
    text = text.replace("\\r", " ")
    text = text.replace("\\t", " ")

    return text.strip()


def build_search_query(
    field_map: dict[str, str], operator: str = "AND"
) -> str:
    """
    Build an OpenFDA search query from field mappings.

    Args:
        field_map: Dictionary mapping field names to search values
        operator: Logical operator (AND/OR) to combine fields

    Returns:
        Formatted search query string
    """
    query_parts = []

    for field, value in field_map.items():
        if value:
            # Escape special characters
            escaped_value = value.replace('"', '\\"')
            # Add quotes for multi-word values
            if " " in escaped_value:
                escaped_value = f'"{escaped_value}"'
            query_parts.append(f"{field}:{escaped_value}")

    return f" {operator} ".join(query_parts)


def extract_drug_names(result: dict[str, Any]) -> list[str]:
    """Extract drug names from an OpenFDA result."""
    drug_names = set()

    # Check patient drug info (for adverse events)
    if "patient" in result:
        drugs = result.get("patient", {}).get("drug", [])
        for drug in drugs:
            if "medicinalproduct" in drug:
                drug_names.add(drug["medicinalproduct"])
            # Check OpenFDA fields
            openfda = drug.get("openfda", {})
            if "brand_name" in openfda:
                drug_names.update(openfda["brand_name"])
            if "generic_name" in openfda:
                drug_names.update(openfda["generic_name"])

    # Check direct OpenFDA fields (for labels)
    if "openfda" in result:
        openfda = result["openfda"]
        if "brand_name" in openfda:
            drug_names.update(openfda["brand_name"])
        if "generic_name" in openfda:
            drug_names.update(openfda["generic_name"])

    return sorted(drug_names)


def extract_reactions(result: dict[str, Any]) -> list[str]:
    """Extract reaction terms from an adverse event result."""
    reactions = []

    patient = result.get("patient", {})
    reaction_list = patient.get("reaction", [])

    for reaction in reaction_list:
        if "reactionmeddrapt" in reaction:
            reactions.append(reaction["reactionmeddrapt"])

    return reactions


def format_drug_list(drugs: list[str], max_items: int = 5) -> str:
    """Format a list of drug names for display."""
    if not drugs:
        return "None specified"

    if len(drugs) <= max_items:
        return ", ".join(drugs)

    shown = drugs[:max_items]
    remaining = len(drugs) - max_items
    return f"{', '.join(shown)} (+{remaining} more)"
```

--------------------------------------------------------------------------------
/src/biomcp/openfda/drug_recalls.py:
--------------------------------------------------------------------------------

```python
"""
OpenFDA drug recalls (Enforcement) integration.
"""

import logging
from typing import Any

from .constants import (
    OPENFDA_DEFAULT_LIMIT,
    OPENFDA_DISCLAIMER,
    OPENFDA_DRUG_ENFORCEMENT_URL,
)
from .drug_recalls_helpers import (
    build_recall_search_params,
)
from .utils import (
    clean_text,
    format_count,
    make_openfda_request,
    truncate_text,
)

logger = logging.getLogger(__name__)


async def search_drug_recalls(
    drug: str | None = None,
    recall_class: str | None = None,
    status: str | None = None,
    reason: str | None = None,
    since_date: str | None = None,
    limit: int = OPENFDA_DEFAULT_LIMIT,
    skip: int = 0,
    api_key: str | None = None,
) -> str:
    """
    Search FDA drug recall records from Enforcement database.

    Args:
        drug: Drug name (brand or generic) to search for
        recall_class: Classification (1, 2, or 3)
        status: Recall status (ongoing, completed, terminated)
        reason: Search text in recall reason
        since_date: Only show recalls after this date (YYYYMMDD format)
        limit: Maximum number of results to return
        skip: Number of results to skip (for pagination)

        api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)

    Returns:
        Formatted string with drug recall information
    """
    # Build search parameters
    search_params = build_recall_search_params(
        drug, recall_class, status, reason, since_date, limit, skip
    )

    # Make the request
    response, error = await make_openfda_request(
        OPENFDA_DRUG_ENFORCEMENT_URL, search_params, "openfda_recalls", api_key
    )

    if error:
        return f"⚠️ Error searching drug recalls: {error}"

    if not response or not response.get("results"):
        return "No drug recall records found matching your criteria."

    # Format the results
    results = response["results"]
    total = (
        response.get("meta", {}).get("results", {}).get("total", len(results))
    )

    output = ["## FDA Drug Recall Records\n"]

    if drug:
        output.append(f"**Drug**: {drug}")
    if recall_class:
        output.append(f"**Classification**: Class {recall_class}")
    if status:
        output.append(f"**Status**: {status}")
    if since_date:
        output.append(f"**Since**: {since_date}")

    output.append(
        f"**Total Recalls Found**: {format_count(total, 'recall')}\n"
    )

    # Summary of recall classes if multiple results
    if len(results) > 1:
        output.extend(_format_recall_class_summary(results))

    # Show results
    output.append(f"### Recalls (showing {len(results)} of {total}):\n")

    for i, recall in enumerate(results, 1):
        output.extend(_format_recall_summary(recall, i))

    output.append(f"\n{OPENFDA_DISCLAIMER}")

    return "\n".join(output)


async def get_drug_recall(
    recall_number: str,
    api_key: str | None = None,
) -> str:
    """
    Get detailed drug recall information for a specific recall.

    Args:
        recall_number: FDA recall number

        api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)

    Returns:
        Formatted string with detailed recall information
    """
    # Search for the specific recall
    search_params = {"search": f'recall_number:"{recall_number}"', "limit": 1}

    response, error = await make_openfda_request(
        OPENFDA_DRUG_ENFORCEMENT_URL, search_params, "openfda_recalls", api_key
    )

    if error:
        return f"⚠️ Error retrieving drug recall: {error}"

    if not response or not response.get("results"):
        return f"No recall record found for {recall_number}"

    recall = response["results"][0]

    # Format detailed recall information
    output = [f"## Drug Recall Details: {recall_number}\n"]

    # Basic information
    output.extend(_format_recall_header(recall))

    # Reason and details
    output.extend(_format_recall_details(recall))

    # Distribution information
    output.extend(_format_distribution_info(recall))

    # OpenFDA metadata
    if openfda := recall.get("openfda"):
        output.extend(_format_recall_openfda(openfda))

    output.append(f"\n{OPENFDA_DISCLAIMER}")

    return "\n".join(output)


def _format_recall_class_summary(results: list[dict[str, Any]]) -> list[str]:
    """Format summary of recall classifications."""
    output = []

    # Count by classification
    class_counts = {"Class I": 0, "Class II": 0, "Class III": 0}
    for recall in results:
        classification = recall.get("classification", "")
        if classification in class_counts:
            class_counts[classification] += 1

    if any(class_counts.values()):
        output.append("### Classification Summary:")
        if class_counts["Class I"]:
            output.append(
                f"- **Class I** (most serious): {class_counts['Class I']} recalls"
            )
        if class_counts["Class II"]:
            output.append(
                f"- **Class II** (moderate): {class_counts['Class II']} recalls"
            )
        if class_counts["Class III"]:
            output.append(
                f"- **Class III** (least serious): {class_counts['Class III']} recalls"
            )
        output.append("")

    return output


def _format_recall_summary(recall: dict[str, Any], num: int) -> list[str]:
    """Format a single recall summary."""
    output = [f"#### {num}. Recall {recall.get('recall_number', 'Unknown')}"]

    # Classification and status
    classification = recall.get("classification", "Unknown")
    status = recall.get("status", "Unknown")

    # Add severity indicator
    severity_emoji = {
        "Class I": "🔴",  # Most serious
        "Class II": "🟡",  # Moderate
        "Class III": "🟢",  # Least serious
    }.get(classification, "⚪")

    output.append(f"{severity_emoji} **{classification}** - {status}")

    # Date
    if init_date := recall.get("recall_initiation_date"):
        formatted_date = f"{init_date[:4]}-{init_date[4:6]}-{init_date[6:]}"
        output.append(f"**Initiated**: {formatted_date}")

    # Product description
    if product_desc := recall.get("product_description"):
        cleaned = truncate_text(clean_text(product_desc), 200)
        output.append(f"**Product**: {cleaned}")

    # OpenFDA names
    openfda = recall.get("openfda", {})
    if brand_names := openfda.get("brand_name"):
        output.append(f"**Brand**: {', '.join(brand_names[:3])}")

    # Reason for recall
    if reason := recall.get("reason_for_recall"):
        cleaned_reason = truncate_text(clean_text(reason), 300)
        output.append(f"\n**Reason**: {cleaned_reason}")

    # Firm name
    if firm := recall.get("recalling_firm"):
        output.append(f"\n**Recalling Firm**: {firm}")

    output.append("")
    return output


def _format_recall_header(recall: dict[str, Any]) -> list[str]:
    """Format the header section of detailed recall."""
    output = ["### Recall Information"]

    output.append(
        f"**Recall Number**: {recall.get('recall_number', 'Unknown')}"
    )
    output.append(
        f"**Classification**: {recall.get('classification', 'Unknown')}"
    )
    output.append(f"**Status**: {recall.get('status', 'Unknown')}")

    if event_id := recall.get("event_id"):
        output.append(f"**Event ID**: {event_id}")

    # Dates
    if init_date := recall.get("recall_initiation_date"):
        formatted = f"{init_date[:4]}-{init_date[4:6]}-{init_date[6:]}"
        output.append(f"**Initiation Date**: {formatted}")

    if report_date := recall.get("report_date"):
        formatted = f"{report_date[:4]}-{report_date[4:6]}-{report_date[6:]}"
        output.append(f"**Report Date**: {formatted}")

    if term_date := recall.get("termination_date"):
        formatted = f"{term_date[:4]}-{term_date[4:6]}-{term_date[6:]}"
        output.append(f"**Termination Date**: {formatted}")

    output.append("")
    return output


def _format_recall_details(recall: dict[str, Any]) -> list[str]:
    """Format recall details and reason."""
    output = ["### Product and Reason"]

    if product_desc := recall.get("product_description"):
        output.append(f"**Product Description**:\n{clean_text(product_desc)}")

    if reason := recall.get("reason_for_recall"):
        output.append(f"\n**Reason for Recall**:\n{clean_text(reason)}")

    if quantity := recall.get("product_quantity"):
        output.append(f"\n**Product Quantity**: {quantity}")

    if code_info := recall.get("code_info"):
        output.append(f"\n**Code Information**:\n{clean_text(code_info)}")

    output.append("")
    return output


def _format_distribution_info(recall: dict[str, Any]) -> list[str]:
    """Format distribution information."""
    output = ["### Distribution Information"]

    if firm := recall.get("recalling_firm"):
        output.append(f"**Recalling Firm**: {firm}")

    if city := recall.get("city"):
        state = recall.get("state", "")
        country = recall.get("country", "")
        location = city
        if state:
            location += f", {state}"
        if country:
            location += f", {country}"
        output.append(f"**Location**: {location}")

    if dist_pattern := recall.get("distribution_pattern"):
        output.append(
            f"\n**Distribution Pattern**:\n{clean_text(dist_pattern)}"
        )

    if action := recall.get("voluntary_mandated"):
        output.append(f"\n**Action Type**: {action}")

    output.append("")
    return output


def _format_recall_openfda(openfda: dict[str, Any]) -> list[str]:
    """Format OpenFDA metadata for recall."""
    output = ["### Drug Information"]

    if brand_names := openfda.get("brand_name"):
        output.append(f"**Brand Names**: {', '.join(brand_names)}")

    if generic_names := openfda.get("generic_name"):
        output.append(f"**Generic Names**: {', '.join(generic_names)}")

    if manufacturers := openfda.get("manufacturer_name"):
        output.append(f"**Manufacturers**: {', '.join(manufacturers[:3])}")

    if ndas := openfda.get("application_number"):
        output.append(f"**Application Numbers**: {', '.join(ndas[:5])}")

    if routes := openfda.get("route"):
        output.append(f"**Routes**: {', '.join(routes)}")

    if pharm_class := openfda.get("pharm_class_epc"):
        output.append(f"**Pharmacologic Class**: {', '.join(pharm_class[:3])}")

    output.append("")
    return output
```

--------------------------------------------------------------------------------
/docs/workflows/all-workflows.md:
--------------------------------------------------------------------------------

```markdown
# BioMCP Research Workflows

Quick, practical workflows for common biomedical research tasks.

## 1. Literature Review Workflow

### Quick Start

```bash
# Find key papers on BRAF V600E melanoma therapy
biomcp article search --gene BRAF --disease melanoma \
  --keyword "V600E|therapy|treatment" --limit 50 \
  --format json > braf_papers.json
```

### Full Workflow Script

```python
import asyncio
from biomcp import BioMCPClient

async def literature_review(gene, disease, focus_terms):
    async with BioMCPClient() as client:
        # 1. Get gene context
        gene_info = await client.genes.get(gene)

        # 2. Search by topic
        results = {}
        for term in focus_terms:
            articles = await client.articles.search(
                genes=[gene],
                diseases=[disease],
                keywords=[term],
                limit=30
            )
            results[term] = articles.articles

        # 3. Generate summary
        print(f"\n{gene} in {disease}: Found {sum(len(v) for v in results.values())} articles")
        for topic, articles in results.items():
            print(f"\n{topic}: {len(articles)} articles")
            for a in articles[:3]:
                print(f"  - {a.title[:80]}... ({a.year})")

    return results

# Run it
asyncio.run(literature_review(
    "BRAF",
    "melanoma",
    ["resistance", "combination therapy", "immunotherapy"]
))
```

### Key Points

- Start broad, then narrow by topic
- Use OR syntax for variant notations
- Export results for citation management
- Set up weekly searches for updates

---

## 2. Clinical Trial Matching Workflow

### Quick Start

```bash
# Find trials for EGFR-mutant lung cancer near Boston
biomcp trial search --condition "lung cancer" \
  --term "EGFR mutation" --status RECRUITING \
  --latitude 42.3601 --longitude -71.0589 --distance 100
```

### Patient Matching Script

```python
import asyncio
from biomcp import BioMCPClient

async def match_patient_to_trials(patient_profile):
    async with BioMCPClient() as client:
        # 1. Search trials with location
        trials = await client.trials.search(
            conditions=[patient_profile['diagnosis']],
            other_terms=patient_profile['mutations'],
            lat=patient_profile['lat'],
            long=patient_profile['long'],
            distance=patient_profile['max_distance'],
            status="RECRUITING"
        )

        # 2. Score trials
        scored = []
        for trial in trials.trials[:20]:
            score = 0

            # Location score
            if trial.distance < 50:
                score += 25

            # Phase score
            if trial.phase == "PHASE3":
                score += 20
            elif trial.phase == "PHASE2":
                score += 15

            # Mutation match
            if any(mut in str(trial.eligibility) for mut in patient_profile['mutations']):
                score += 30

            scored.append((score, trial))

        # 3. Return top matches
        scored.sort(reverse=True, key=lambda x: x[0])
        return [(s, t) for s, t in scored[:5]]

# Example patient
patient = {
    'diagnosis': 'non-small cell lung cancer',
    'mutations': ['EGFR L858R'],
    'lat': 42.3601,
    'long': -71.0589,
    'max_distance': 100
}

matches = asyncio.run(match_patient_to_trials(patient))
```

### Key Points

- Always use coordinates for location search
- Check both ClinicalTrials.gov and NCI sources
- Contact trial sites directly for pre-screening
- Consider travel burden in recommendations

---

## 3.
Variant Interpretation Workflow 137 | 138 | ### Quick Start 139 | 140 | ```bash 141 | # Get variant annotations 142 | biomcp variant get rs121913529 # By rsID 143 | biomcp variant get "NM_007294.4:c.5266dupC" # By HGVS 144 | 145 | # Search pathogenic variants 146 | biomcp variant search --gene BRCA1 --significance pathogenic 147 | ``` 148 | 149 | ### Variant Analysis Script 150 | 151 | ```python 152 | async def interpret_variant(gene, variant_notation, cancer_type): 153 | async with BioMCPClient() as client: 154 | # 1. Get variant details 155 | try: 156 | variant = await client.variants.get(variant_notation) 157 | significance = variant.clinical_significance 158 | frequency = variant.frequencies.gnomad if hasattr(variant, 'frequencies') else None 159 | except Exception: # a bare except would also swallow KeyboardInterrupt and SystemExit 160 | significance = "Not found" 161 | frequency = None 162 | 163 | # 2. Search literature 164 | articles = await client.articles.search( 165 | genes=[gene], 166 | variants=[variant_notation], 167 | diseases=[cancer_type], 168 | limit=10 169 | ) 170 | 171 | # 3. Find trials 172 | trials = await client.trials.search( 173 | conditions=[cancer_type], 174 | other_terms=[f"{gene} mutation"], 175 | status="RECRUITING", 176 | limit=5 177 | ) 178 | 179 | # 4. 
Generate interpretation 180 | print(f"\nVariant: {gene} {variant_notation}") 181 | print(f"Significance: {significance}") 182 | print(f"Population Frequency: {frequency or 'Unknown'}") 183 | print(f"Literature: {len(articles.articles)} relevant papers") 184 | print(f"Clinical Trials: {len(trials.trials)} active trials") 185 | 186 | # Actionability assessment 187 | if significance in ["Pathogenic", "Likely pathogenic"]: 188 | if trials.trials: 189 | print("✓ ACTIONABLE - Clinical trials available") 190 | else: 191 | print("⚠ Pathogenic but no targeted trials") 192 | 193 | return { 194 | 'significance': significance, 195 | 'frequency': frequency, 196 | 'articles': len(articles.articles), 197 | 'trials': len(trials.trials) 198 | } 199 | 200 | # Run it 201 | asyncio.run(interpret_variant("BRAF", "p.V600E", "melanoma")) 202 | ``` 203 | 204 | ### Key Points 205 | 206 | - Check multiple databases (MyVariant, ClinVar via articles) 207 | - Consider cancer type for interpretation 208 | - Look for FDA-approved therapies 209 | - Document tier classification 210 | 211 | --- 212 | 213 | ## 4. 
Quick Integration Patterns 214 | 215 | ### Batch Processing 216 | 217 | ```python 218 | # Process multiple queries efficiently 219 | async def batch_analysis(items): 220 | async with BioMCPClient() as client: 221 | tasks = [] 222 | for item in items: 223 | if item['type'] == 'gene': 224 | tasks.append(client.genes.get(item['id'])) 225 | elif item['type'] == 'variant': 226 | tasks.append(client.variants.get(item['id'])) 227 | 228 | results = await asyncio.gather(*tasks, return_exceptions=True) 229 | return results 230 | ``` 231 | 232 | ### Error Handling 233 | 234 | ```python 235 | import asyncio 236 | from biomcp.exceptions import NotFoundError, RateLimitError 237 | 238 | async def robust_search(search_func, **params): 239 | retries = 3 240 | for attempt in range(retries): 241 | try: 242 | return await search_func(**params) 243 | except RateLimitError: 244 | if attempt < retries - 1: 245 | await asyncio.sleep(2 ** attempt) # Exponential backoff without blocking the event loop 246 | else: 247 | raise 248 | except NotFoundError: 249 | return None 250 | ``` 251 | 252 | ### Caching Results 253 | 254 | ```python 255 | import json 256 | 257 | 258 | # Simple file-based cache 259 | def cache_results(filename): 260 | def decorator(func): 261 | async def wrapper(*args, **kwargs): 262 | # Check cache 263 | try: 264 | with open(filename, 'r') as f: 265 | return json.load(f) 266 | except FileNotFoundError: 267 | pass 268 | 269 | # Fetch and cache 270 | result = await func(*args, **kwargs) 271 | with open(filename, 'w') as f: 272 | json.dump(result, f) 273 | return result 274 | return wrapper 275 | return decorator 276 | 277 | @cache_results('gene_cache.json') 278 | async def get_gene_info(gene): 279 | async with BioMCPClient() as client: 280 | return await client.genes.get(gene) 281 | ``` 282 | 283 | --- 284 | 285 | ## Complete Example: Precision Medicine Report 286 | 287 | ```python 288 | async def generate_precision_medicine_report(patient): 289 | """Generate comprehensive report 
for molecular tumor board.""" 290 | from datetime import datetime # local import so datetime.now() below resolves 291 | async with BioMCPClient() as client: 292 | report = { 293 | 'patient_id': patient['id'], 294 | 'date': datetime.now().isoformat(), 295 | 'variants': [], 296 | 'trials': [], 297 | 'therapies': [] 298 | } 299 | 300 | # Analyze each variant 301 | for variant in patient['variants']: 302 | # Get annotations 303 | var_info = await robust_search( 304 | client.variants.search, 305 | gene=variant['gene'], 306 | hgvs=variant['hgvs'] 307 | ) 308 | 309 | # Search literature 310 | articles = await client.articles.search( 311 | genes=[variant['gene']], 312 | diseases=[patient['cancer_type']], 313 | keywords=['therapy', 'treatment'], 314 | limit=5 315 | ) 316 | 317 | # Find trials 318 | trials = await client.trials.search( 319 | conditions=[patient['cancer_type']], 320 | other_terms=[f"{variant['gene']} mutation"], 321 | status="RECRUITING", 322 | limit=3 323 | ) 324 | 325 | report['variants'].append({ 326 | 'variant': variant, 327 | 'annotation': var_info, 328 | 'relevant_articles': len(articles.articles), 329 | 'available_trials': len(trials.trials) 330 | }) 331 | 332 | report['trials'].extend(trials.trials) 333 | 334 | # Generate summary 335 | print(f"\nPrecision Medicine Report - {patient['id']}") 336 | print(f"Cancer Type: {patient['cancer_type']}") 337 | print(f"Variants Analyzed: {len(report['variants'])}") 338 | print(f"Clinical Trials Found: {len(report['trials'])}") 339 | 340 | # Prioritize actionable findings 341 | actionable = [v for v in report['variants'] 342 | if v['available_trials'] > 0] 343 | 344 | if actionable: 345 | print(f"\n✓ {len(actionable)} ACTIONABLE variants with trial options") 346 | 347 | return report 348 | 349 | # Example usage 350 | patient = { 351 | 'id': 'PT001', 352 | 'cancer_type': 'lung adenocarcinoma', 353 | 'variants': [ 354 | {'gene': 'EGFR', 'hgvs': 'p.L858R'}, 355 | {'gene': 'TP53', 'hgvs': 'p.R273H'} 356 | ] 357 | } 358 | 359 | report = asyncio.run(generate_precision_medicine_report(patient)) 
360 | ``` 361 | 362 | --- 363 | 364 | ## Tips for All Workflows 365 | 366 | 1. **Always start with the think tool** (for AI assistants) 367 | 2. **Use official gene symbols** - check genenames.org 368 | 3. **Batch API calls** when possible 369 | 4. **Handle errors gracefully** - APIs can be unavailable 370 | 5. **Cache frequently accessed data** - respect rate limits 371 | 6. **Document your process** - for reproducibility 372 | 373 | ## Next Steps 374 | 375 | - [Command Reference](../reference/quick-reference.md) 376 | - [API Documentation](../apis/python-sdk.md) 377 | - [Troubleshooting](../troubleshooting.md) 378 | ``` -------------------------------------------------------------------------------- /src/biomcp/trials/nci_search.py: -------------------------------------------------------------------------------- ```python 1 | """NCI Clinical Trials Search API integration for trial searches.""" 2 | 3 | import logging 4 | from typing import Any 5 | 6 | from ..constants import NCI_TRIALS_URL 7 | from ..diseases.search import search_diseases 8 | from ..integrations.cts_api import CTSAPIError, make_cts_request 9 | from ..interventions.search import search_interventions 10 | from .search import TrialQuery 11 | 12 | logger = logging.getLogger(__name__) 13 | 14 | 15 | async def _expand_disease_terms( 16 | conditions: list[str], 17 | expand_synonyms: bool, 18 | ) -> list[str]: 19 | """Expand disease terms with synonyms if requested.""" 20 | if not expand_synonyms: 21 | return conditions 22 | 23 | disease_terms = [] 24 | for condition in conditions: 25 | try: 26 | results = await search_diseases( 27 | name=condition, 28 | include_synonyms=True, 29 | page_size=5, 30 | ) 31 | # Add the original term plus any exact matches 32 | disease_terms.append(condition) 33 | for disease in results.get("diseases", [])[:3]: 34 | if disease.get("name"): 35 | disease_terms.append(disease["name"]) 36 | # Add top synonyms 37 | synonyms = disease.get("synonyms", []) 38 | if 
isinstance(synonyms, list): 39 | disease_terms.extend(synonyms[:2]) 40 | except Exception as e: 41 | logger.warning(f"Failed to expand disease term {condition}: {e}") 42 | disease_terms.append(condition) 43 | 44 | # Remove duplicates while preserving order 45 | seen = set() 46 | unique_diseases = [] 47 | for term in disease_terms: 48 | if term.lower() not in seen: 49 | seen.add(term.lower()) 50 | unique_diseases.append(term) 51 | 52 | return unique_diseases 53 | 54 | 55 | async def _normalize_interventions(interventions: list[str]) -> list[str]: 56 | """Normalize intervention names to IDs where possible.""" 57 | intervention_ids = [] 58 | for intervention in interventions: 59 | try: 60 | results = await search_interventions( 61 | name=intervention, 62 | page_size=1, 63 | ) 64 | interventions_data = results.get("interventions", []) 65 | if interventions_data: 66 | # Use the ID if available, otherwise the name 67 | int_id = interventions_data[0].get("id", intervention) 68 | intervention_ids.append(int_id) 69 | else: 70 | intervention_ids.append(intervention) 71 | except Exception: 72 | intervention_ids.append(intervention) 73 | 74 | return intervention_ids 75 | 76 | 77 | def _map_phase_to_nci(phase: Any) -> str | None: 78 | """Map TrialPhase enum to NCI phase values.""" 79 | if not phase: 80 | return None 81 | 82 | phase_map = { 83 | "EARLY_PHASE1": "I", 84 | "PHASE1": "I", 85 | "PHASE2": "II", 86 | "PHASE3": "III", 87 | "PHASE4": "IV", 88 | "NOT_APPLICABLE": "NA", 89 | } 90 | return phase_map.get(phase.value, phase.value) 91 | 92 | 93 | def _map_status_to_nci(recruiting_status: Any) -> list[str] | None: 94 | """Map RecruitingStatus enum to NCI status values.""" 95 | if not recruiting_status: 96 | return None 97 | 98 | status_map = { 99 | "OPEN": ["recruiting", "enrolling_by_invitation"], 100 | "CLOSED": ["active_not_recruiting", "completed", "terminated"], 101 | "ANY": None, 102 | } 103 | return status_map.get(recruiting_status.value) 104 | 105 | 106 | def 
_map_sort_to_nci(sort: Any) -> str | None: 107 | """Map SortOrder enum to NCI sort values.""" 108 | if not sort: 109 | return None 110 | 111 | sort_map = { 112 | "RELEVANCE": "relevance", 113 | "LAST_UPDATE": "last_update_date", 114 | "START_DATE": "start_date", 115 | "COMPLETION_DATE": "completion_date", 116 | } 117 | return sort_map.get(sort.value) 118 | 119 | 120 | def _add_location_params(params: dict[str, Any], query: TrialQuery) -> None: 121 | """Add location parameters if present.""" 122 | if query.lat is not None and query.long is not None: 123 | params["latitude"] = query.lat 124 | params["longitude"] = query.long 125 | params["distance"] = query.distance or 50 126 | 127 | 128 | def _add_eligibility_params(params: dict[str, Any], query: TrialQuery) -> None: 129 | """Add advanced eligibility criteria parameters.""" 130 | if query.prior_therapies: 131 | params["prior_therapy"] = query.prior_therapies 132 | 133 | if query.required_mutations: 134 | params["biomarkers"] = query.required_mutations 135 | 136 | if query.allow_brain_mets is not None: 137 | params["accepts_brain_mets"] = query.allow_brain_mets 138 | 139 | 140 | async def convert_query_to_nci(query: TrialQuery) -> dict[str, Any]: 141 | """ 142 | Convert a TrialQuery object to NCI CTS API parameters. 143 | 144 | Maps BioMCP's TrialQuery fields to NCI's parameter structure. 
145 | """ 146 | params: dict[str, Any] = {} 147 | 148 | # Basic search terms 149 | if query.terms: 150 | params["_fulltext"] = " ".join(query.terms) 151 | 152 | # Conditions/diseases with synonym expansion 153 | if query.conditions: 154 | disease_terms = await _expand_disease_terms( 155 | query.conditions, 156 | query.expand_synonyms, 157 | ) 158 | if disease_terms: 159 | params["diseases"] = disease_terms 160 | 161 | # Interventions 162 | if query.interventions: 163 | params["interventions"] = await _normalize_interventions( 164 | query.interventions 165 | ) 166 | 167 | # NCT IDs 168 | if query.nct_ids: 169 | params["nct_ids"] = query.nct_ids 170 | 171 | # Phase and status mappings 172 | nci_phase = _map_phase_to_nci(query.phase) 173 | if nci_phase: 174 | params["phase"] = nci_phase 175 | 176 | statuses = _map_status_to_nci(query.recruiting_status) 177 | if statuses: 178 | params["recruitment_status"] = statuses 179 | 180 | # Location and eligibility 181 | _add_location_params(params, query) 182 | _add_eligibility_params(params, query) 183 | 184 | # Pagination 185 | params["size"] = query.page_size if query.page_size else 20 186 | 187 | # Sort order 188 | sort_value = _map_sort_to_nci(query.sort) 189 | if sort_value: 190 | params["sort"] = sort_value 191 | 192 | return params 193 | 194 | 195 | async def search_trials_nci( 196 | query: TrialQuery, 197 | api_key: str | None = None, 198 | ) -> dict[str, Any]: 199 | """ 200 | Search for clinical trials using NCI CTS API. 
201 | 202 | Returns: 203 | Dictionary with: 204 | - trials: List of trial records 205 | - total: Total number of results 206 | - next_page: Token for next page (if available) 207 | - source: "nci" to indicate data source 208 | """ 209 | try: 210 | # Convert query to NCI parameters 211 | params = await convert_query_to_nci(query) 212 | 213 | # Make API request 214 | response = await make_cts_request( 215 | url=NCI_TRIALS_URL, 216 | params=params, 217 | api_key=api_key, 218 | ) 219 | 220 | # Process response 221 | trials = response.get("data", response.get("trials", [])) 222 | total = response.get("total", len(trials)) 223 | next_page = response.get("next_page_token") 224 | 225 | return { 226 | "trials": trials, 227 | "total": total, 228 | "next_page": next_page, 229 | "source": "nci", 230 | } 231 | 232 | except CTSAPIError: 233 | raise 234 | except Exception as e: 235 | logger.error(f"NCI trial search failed: {e}") 236 | raise CTSAPIError(f"Trial search failed: {e!s}") from e 237 | 238 | 239 | def _format_trial_header(trial: dict[str, Any]) -> list[str]: 240 | """Format trial header with basic info.""" 241 | nct_id = trial.get("nct_id", trial.get("protocol_id", "Unknown")) 242 | title = trial.get("title", trial.get("brief_title", "Untitled")) 243 | phase = trial.get("phase", "Not specified") 244 | status = trial.get("overall_status", trial.get("status", "Unknown")) 245 | 246 | return [ 247 | f"### [{nct_id}] {title}", 248 | f"- **Phase**: {phase}", 249 | f"- **Status**: {status}", 250 | ] 251 | 252 | 253 | def _format_trial_summary_text(trial: dict[str, Any]) -> list[str]: 254 | """Format trial summary text if available.""" 255 | summary = trial.get("brief_summary", trial.get("description", "")) 256 | if not summary: 257 | return [] 258 | 259 | if len(summary) > 200: 260 | summary = summary[:197] + "..." 
261 | return [f"- **Summary**: {summary}"] 262 | 263 | 264 | def _format_trial_conditions(trial: dict[str, Any]) -> list[str]: 265 | """Format trial conditions/diseases.""" 266 | conditions = trial.get("diseases", trial.get("conditions", [])) 267 | if not conditions: 268 | return [] 269 | 270 | lines = [] 271 | if isinstance(conditions, list): 272 | lines.append(f"- **Conditions**: {', '.join(conditions[:3])}") 273 | if len(conditions) > 3: 274 | lines.append(f" *(and {len(conditions) - 3} more)*") 275 | else: 276 | lines.append(f"- **Conditions**: {conditions}") 277 | 278 | return lines 279 | 280 | 281 | def _format_trial_interventions(trial: dict[str, Any]) -> list[str]: 282 | """Format trial interventions.""" 283 | interventions = trial.get("interventions", []) 284 | if not interventions: 285 | return [] 286 | 287 | int_names = [] 288 | for intervention in interventions[:3]: 289 | if isinstance(intervention, dict): 290 | int_names.append(intervention.get("name", "Unknown")) 291 | else: 292 | int_names.append(str(intervention)) 293 | 294 | if not int_names: 295 | return [] 296 | 297 | lines = [f"- **Interventions**: {', '.join(int_names)}"] 298 | if len(interventions) > 3: 299 | lines.append(f" *(and {len(interventions) - 3} more)*") 300 | 301 | return lines 302 | 303 | 304 | def _format_trial_metadata(trial: dict[str, Any]) -> list[str]: 305 | """Format trial metadata (sponsor, eligibility notes).""" 306 | lines = [] 307 | 308 | lead_org = trial.get("lead_org", trial.get("sponsor", "")) 309 | if lead_org: 310 | lines.append(f"- **Lead Organization**: {lead_org}") 311 | 312 | if trial.get("accepts_brain_mets"): 313 | lines.append("- **Note**: Accepts patients with brain metastases") 314 | 315 | return lines 316 | 317 | 318 | def _format_trial_summary(trial: dict[str, Any]) -> list[str]: 319 | """Format a single trial summary.""" 320 | lines = [] 321 | 322 | # Add header info 323 | lines.extend(_format_trial_header(trial)) 324 | 325 | # Add summary text 326 | 
lines.extend(_format_trial_summary_text(trial)) 327 | 328 | # Add conditions 329 | lines.extend(_format_trial_conditions(trial)) 330 | 331 | # Add interventions 332 | lines.extend(_format_trial_interventions(trial)) 333 | 334 | # Add metadata 335 | lines.extend(_format_trial_metadata(trial)) 336 | 337 | lines.append("") 338 | return lines 339 | 340 | 341 | def format_nci_trial_results(results: dict[str, Any]) -> str: 342 | """ 343 | Format NCI trial search results as markdown. 344 | """ 345 | trials = results.get("trials", []) 346 | total = results.get("total", 0) 347 | 348 | if not trials: 349 | return "No trials found matching the search criteria in NCI database." 350 | 351 | lines = [ 352 | f"## NCI Clinical Trials Search Results ({total} found)", 353 | "", 354 | "*Source: NCI Clinical Trials Search API*", 355 | "", 356 | ] 357 | 358 | for trial in trials: 359 | lines.extend(_format_trial_summary(trial)) 360 | 361 | return "\n".join(lines) 362 | ``` -------------------------------------------------------------------------------- /src/biomcp/variants/alphagenome.py: -------------------------------------------------------------------------------- ```python 1 | """AlphaGenome integration for variant effect prediction.""" 2 | 3 | import logging 4 | import os 5 | import re 6 | from typing import Any, TypedDict 7 | 8 | from ..utils.request_cache import request_cache 9 | 10 | logger = logging.getLogger(__name__) 11 | 12 | # Default threshold for significant changes 13 | DEFAULT_SIGNIFICANCE_THRESHOLD = 0.5 14 | 15 | # Chromosome pattern for validation 16 | CHROMOSOME_PATTERN = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$") 17 | 18 | # Valid nucleotide characters 19 | VALID_NUCLEOTIDES = set("ACGT") 20 | 21 | 22 | class VariantPrediction(TypedDict): 23 | """Type definition for variant prediction results.""" 24 | 25 | gene_expression: dict[str, float] 26 | chromatin_accessibility: dict[str, float] 27 | splicing_effects: list[str] 28 | summary_stats: dict[str, int] 29 
| 30 | 31 | @request_cache(ttl=1800) # Cache for 30 minutes 32 | async def predict_variant_effects( 33 | chromosome: str, 34 | position: int, 35 | reference: str, 36 | alternate: str, 37 | interval_size: int = 131_072, 38 | tissue_types: list[str] | None = None, 39 | significance_threshold: float = DEFAULT_SIGNIFICANCE_THRESHOLD, 40 | api_key: str | None = None, 41 | ) -> str: 42 | """ 43 | Predict variant effects using AlphaGenome. 44 | 45 | Args: 46 | chromosome: Chromosome (e.g., 'chr7') 47 | position: 1-based genomic position 48 | reference: Reference allele(s) 49 | alternate: Alternate allele(s) 50 | interval_size: Size of genomic context window (rounded up to the nearest supported size; max 1,048,576) 51 | tissue_types: Optional UBERON ontology terms for tissue-specific predictions 52 | significance_threshold: Threshold for significant changes (default 0.5) 53 | api_key: Optional API key (if not provided, uses ALPHAGENOME_API_KEY env var) 54 | 55 | Returns: 56 | Formatted markdown string with predictions 57 | 58 | Raises: 59 | ValueError: If input parameters are invalid 60 | """ 61 | # Validate inputs 62 | _validate_inputs(chromosome, position, reference, alternate) 63 | 64 | # Check for API key (prefer parameter over environment variable) 65 | if not api_key: 66 | api_key = os.getenv("ALPHAGENOME_API_KEY") 67 | 68 | if not api_key: 69 | return ( 70 | "❌ **AlphaGenome API key required**\n\n" 71 | "I need an API key to use AlphaGenome. Please provide it by either:\n\n" 72 | "**Option 1: Include your key in your request**\n" 73 | 'Say: "My AlphaGenome API key is YOUR_KEY_HERE" and I\'ll use it for this prediction.\n\n' 74 | "**Option 2: Set it as an environment variable (for persistent use)**\n" 75 | "```bash\n" 76 | "export ALPHAGENOME_API_KEY='your-key'\n" 77 | "```\n\n" 78 | "Get a free API key at: https://deepmind.google.com/science/alphagenome\n\n" 79 | "**ACTION REQUIRED**: Please provide your API key using Option 1 above to continue." 
80 | ) 81 | 82 | # Try to import AlphaGenome 83 | try: 84 | # Suppress protobuf version warnings 85 | import warnings 86 | 87 | warnings.filterwarnings( 88 | "ignore", 89 | category=UserWarning, 90 | module="google.protobuf.runtime_version", 91 | ) 92 | 93 | from alphagenome.data import genome 94 | from alphagenome.models import dna_client, variant_scorers 95 | except ImportError: 96 | return ( 97 | "❌ **AlphaGenome not installed**\n\n" 98 | "To install:\n" 99 | "```bash\n" 100 | "git clone https://github.com/google-deepmind/alphagenome.git\n" 101 | "cd alphagenome && pip install .\n" 102 | "```\n\n" 103 | "Standard variant annotations are still available via `variant_searcher`." 104 | ) 105 | 106 | try: 107 | # Create client 108 | model = dna_client.create(api_key) 109 | 110 | # Calculate interval boundaries (ensure within supported sizes) 111 | # Supported sizes: 2048, 16384, 131072, 524288, 1048576 112 | supported_sizes = [2048, 16384, 131072, 524288, 1048576] 113 | 114 | # Find smallest supported size that's >= requested size 115 | valid_sizes = [s for s in supported_sizes if s >= interval_size] 116 | if not valid_sizes: 117 | # If requested size is larger than max, use max 118 | interval_size = supported_sizes[-1] 119 | else: 120 | interval_size = min(valid_sizes) 121 | 122 | half_size = interval_size // 2 123 | interval_start = max(0, position - half_size - 1) # Convert to 0-based 124 | interval_end = interval_start + interval_size 125 | 126 | # Create interval and variant objects 127 | interval = genome.Interval( 128 | chromosome=chromosome, start=interval_start, end=interval_end 129 | ) 130 | 131 | variant = genome.Variant( 132 | chromosome=chromosome, 133 | position=position, 134 | reference_bases=reference, 135 | alternate_bases=alternate, 136 | ) 137 | 138 | # Get recommended scorers for human 139 | scorers = variant_scorers.get_recommended_scorers(organism="human") 140 | 141 | # Make prediction 142 | scores = model.score_variant( 143 | 
interval=interval, variant=variant, variant_scorers=scorers 144 | ) 145 | 146 | # Format results 147 | return _format_predictions( 148 | variant, scores, interval_size, significance_threshold 149 | ) 150 | 151 | except Exception as e: 152 | logger.error(f"AlphaGenome prediction failed: {e}", exc_info=True) 153 | error_context = ( 154 | f"❌ **AlphaGenome prediction failed**\n\n" 155 | f"Error: {e!s}\n\n" 156 | f"**Context:**\n" 157 | f"- Variant: {chromosome}:{position} {reference}>{alternate}\n" 158 | f"- Interval size: {interval_size:,} bp\n" 159 | f"- Tissue types: {tissue_types or 'None specified'}" 160 | ) 161 | return error_context 162 | 163 | 164 | def _format_predictions( 165 | variant: Any, 166 | scores: list[Any], 167 | interval_size: int, 168 | significance_threshold: float = DEFAULT_SIGNIFICANCE_THRESHOLD, 169 | ) -> str: 170 | """Format AlphaGenome predictions into markdown. 171 | 172 | Args: 173 | variant: The variant object from AlphaGenome 174 | scores: List of prediction scores 175 | interval_size: Size of the genomic context window 176 | significance_threshold: Threshold for significant changes 177 | 178 | Returns: 179 | Formatted markdown string 180 | """ 181 | try: 182 | from alphagenome.models import variant_scorers 183 | 184 | # Convert scores to DataFrame 185 | scores_df = variant_scorers.tidy_scores(scores) 186 | 187 | # Start building the output 188 | lines = [ 189 | "## AlphaGenome Variant Effect Predictions\n", 190 | f"**Variant**: {variant.chromosome}:{variant.position} {variant.reference_bases}>{variant.alternate_bases}", 191 | f"**Analysis window**: {interval_size:,} bp\n", 192 | ] 193 | 194 | # Group scores by output type 195 | if not scores_df.empty: 196 | # Gene expression effects 197 | expr_scores = scores_df[ 198 | scores_df["output_type"].str.contains("RNA_SEQ", na=False) 199 | ] 200 | if not expr_scores.empty: 201 | top_expr = expr_scores.loc[ 202 | expr_scores["raw_score"].abs().idxmax() 203 | ] 204 | gene = 
top_expr.get("gene_name", "Unknown") 205 | score = top_expr["raw_score"] 206 | direction = "↓ decreases" if score < 0 else "↑ increases" 207 | lines.append("\n### Gene Expression") 208 | lines.append( 209 | f"- **{gene}**: {score:+.2f} log₂ fold change ({direction} expression)" 210 | ) 211 | 212 | # Chromatin accessibility 213 | chrom_scores = scores_df[ 214 | scores_df["output_type"].str.contains("ATAC|DNASE", na=False) 215 | ] 216 | if not chrom_scores.empty: 217 | top_chrom = chrom_scores.loc[ 218 | chrom_scores["raw_score"].abs().idxmax() 219 | ] 220 | score = top_chrom["raw_score"] 221 | track = top_chrom.get("track_name", "tissue") 222 | direction = "↓ decreases" if score < 0 else "↑ increases" 223 | lines.append("\n### Chromatin Accessibility") 224 | lines.append( 225 | f"- **{track}**: {score:+.2f} log₂ change ({direction} accessibility)" 226 | ) 227 | 228 | # Splicing effects 229 | splice_scores = scores_df[ 230 | scores_df["output_type"].str.contains("SPLICE", na=False) 231 | ] 232 | if not splice_scores.empty: 233 | lines.append("\n### Splicing") 234 | lines.append("- Potential splicing alterations detected") 235 | 236 | # Summary statistics 237 | total_tracks = len(scores_df) 238 | significant = len( 239 | scores_df[ 240 | scores_df["raw_score"].abs() > significance_threshold 241 | ] 242 | ) 243 | lines.append("\n### Summary") 244 | lines.append(f"- Analyzed {total_tracks} regulatory tracks") 245 | lines.append( 246 | f"- {significant} tracks show substantial changes (|log₂| > {significance_threshold})" 247 | ) 248 | else: 249 | lines.append("\n*No significant regulatory effects predicted*") 250 | 251 | return "\n".join(lines) 252 | 253 | except Exception as e: 254 | logger.error(f"Failed to format predictions: {e}") 255 | return f"## AlphaGenome Results\n\nPrediction completed but formatting failed: {e!s}" 256 | 257 | 258 | def _validate_inputs( 259 | chromosome: str, position: int, reference: str, alternate: str 260 | ) -> None: 261 | """Validate 
input parameters for variant prediction. 262 | 263 | Args: 264 | chromosome: Chromosome identifier 265 | position: Genomic position 266 | reference: Reference allele(s) 267 | alternate: Alternate allele(s) 268 | 269 | Raises: 270 | ValueError: If any input is invalid 271 | """ 272 | # Validate chromosome format 273 | if not CHROMOSOME_PATTERN.match(chromosome): 274 | raise ValueError( 275 | f"Invalid chromosome format: {chromosome}. " 276 | "Expected format: chr1-22, chrX, chrY, chrM, or chrMT" 277 | ) 278 | 279 | # Validate position 280 | if position < 1: 281 | raise ValueError(f"Position must be >= 1, got {position}") 282 | 283 | # Validate nucleotides 284 | ref_upper = reference.upper() 285 | alt_upper = alternate.upper() 286 | 287 | if not ref_upper: 288 | raise ValueError("Reference allele cannot be empty") 289 | 290 | if not alt_upper: 291 | raise ValueError("Alternate allele cannot be empty") 292 | 293 | invalid_ref = set(ref_upper) - VALID_NUCLEOTIDES 294 | if invalid_ref: 295 | raise ValueError( 296 | f"Invalid nucleotides in reference allele: {invalid_ref}. " 297 | f"Only A, C, G, T are allowed" 298 | ) 299 | 300 | invalid_alt = set(alt_upper) - VALID_NUCLEOTIDES 301 | if invalid_alt: 302 | raise ValueError( 303 | f"Invalid nucleotides in alternate allele: {invalid_alt}. " 304 | f"Only A, C, G, T are allowed" 305 | ) 306 | ``` -------------------------------------------------------------------------------- /docs/backend-services-reference/02-biothings-suite.md: -------------------------------------------------------------------------------- ```markdown 1 | # BioThings Suite API Reference 2 | 3 | The BioThings Suite provides unified access to biomedical annotations across genes, variants, diseases, and drugs through a consistent API interface. 
4 | 5 | ## Usage Examples 6 | 7 | For practical examples using the BioThings APIs, see: 8 | 9 | - [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#biothings-integration-for-enhanced-search) 10 | - [Get Comprehensive Variant Annotations](../how-to-guides/03-get-comprehensive-variant-annotations.md#integration-with-other-biomcp-tools) 11 | 12 | ## Overview 13 | 14 | BioMCP integrates with four BioThings APIs: 15 | 16 | - **MyGene.info**: Gene annotations and functional information 17 | - **MyVariant.info**: Genetic variant annotations and clinical significance 18 | - **MyDisease.info**: Disease ontology and terminology mappings 19 | - **MyChem.info**: Drug/chemical properties and mechanisms 20 | 21 | All APIs share: 22 | 23 | - RESTful JSON interface 24 | - No authentication required 25 | - Elasticsearch-based queries 26 | - Comprehensive data aggregation 27 | 28 | ## MyGene.info 29 | 30 | ### Base URL 31 | 32 | `https://mygene.info/v1/` 33 | 34 | ### Key Endpoints 35 | 36 | #### Gene Query 37 | 38 | ``` 39 | GET /query?q={query} 40 | ``` 41 | 42 | **Parameters:** 43 | 44 | - `q`: Query string (gene symbol, name, or ID) 45 | - `fields`: Specific fields to return 46 | - `species`: Limit to species (default: human, mouse, rat) 47 | - `size`: Number of results (default: 10) 48 | 49 | **Example:** 50 | 51 | ```bash 52 | curl "https://mygene.info/v1/query?q=BRAF&fields=symbol,name,summary,type_of_gene" 53 | ``` 54 | 55 | #### Gene Annotation 56 | 57 | ``` 58 | GET /gene/{geneid} 59 | ``` 60 | 61 | **Gene ID formats:** 62 | 63 | - Entrez Gene ID: `673` 64 | - Ensembl ID: `ENSG00000157764` 65 | - Gene Symbol: `BRAF` 66 | 67 | **Example:** 68 | 69 | ```bash 70 | curl "https://mygene.info/v1/gene/673?fields=symbol,name,summary,genomic_pos,pathway,go" 71 | ``` 72 | 73 | ### Important Fields 74 | 75 | | Field | Description | Example | 76 | | ------------- | ---------------------- | --------------------------------------- | 77 
| | `symbol` | Official gene symbol | "BRAF" | 78 | | `name` | Full gene name | "B-Raf proto-oncogene" | 79 | | `entrezgene` | NCBI Entrez ID | 673 | 80 | | `summary` | Functional description | "This gene encodes..." | 81 | | `genomic_pos` | Chromosomal location | {"chr": "7", "start": 140433812} | 82 | | `pathway` | Pathway memberships | {"kegg": [...], "reactome": [...]} | 83 | | `go` | Gene Ontology terms | {"BP": [...], "MF": [...], "CC": [...]} | 84 | 85 | ## MyVariant.info 86 | 87 | ### Base URL 88 | 89 | `https://myvariant.info/v1/` 90 | 91 | ### Key Endpoints 92 | 93 | #### Variant Query 94 | 95 | ``` 96 | GET /query?q={query} 97 | ``` 98 | 99 | **Query syntax:** 100 | 101 | - Gene + variant: `dbnsfp.genename:BRAF AND dbnsfp.hgvsp:p.V600E` 102 | - rsID: `dbsnp.rsid:rs121913529` 103 | - Genomic: `_id:chr7:g.140453136A>T` 104 | 105 | **Example:** 106 | 107 | ```bash 108 | curl "https://myvariant.info/v1/query?q=dbnsfp.genename:TP53&fields=_id,clinvar,gnomad_exome" 109 | ``` 110 | 111 | #### Variant Annotation 112 | 113 | ``` 114 | GET /variant/{variant_id} 115 | ``` 116 | 117 | **ID formats:** 118 | 119 | - HGVS genomic: `chr7:g.140453136A>T` 120 | - dbSNP: `rs121913529` 121 | 122 | ### Important Fields 123 | 124 | | Field | Description | Example | 125 | | -------------- | ---------------------- | --------------------------------------- | 126 | | `clinvar` | Clinical significance | {"clinical_significance": "Pathogenic"} | 127 | | `dbsnp` | dbSNP annotations | {"rsid": "rs121913529"} | 128 | | `cadd` | CADD scores | {"phred": 35} | 129 | | `gnomad_exome` | Population frequency | {"af": {"af": 0.00001}} | 130 | | `dbnsfp` | Functional predictions | {"polyphen2": "probably_damaging"} | 131 | 132 | ### Query Filters 133 | 134 | ```python 135 | # Clinical significance 136 | q = "clinvar.clinical_significance:pathogenic" 137 | 138 | # Frequency filters 139 | q = "gnomad_exome.af.af:<0.01" # Rare variants 140 | 141 | # Gene-specific 142 | q = "dbnsfp.genename:BRCA1 
AND cadd.phred:>20" 143 | ``` 144 | 145 | ## MyDisease.info 146 | 147 | ### Base URL 148 | 149 | `https://mydisease.info/v1/` 150 | 151 | ### Key Endpoints 152 | 153 | #### Disease Query 154 | 155 | ``` 156 | GET /query?q={query} 157 | ``` 158 | 159 | **Example:** 160 | 161 | ```bash 162 | curl "https://mydisease.info/v1/query?q=melanoma&fields=mondo,disease_ontology,synonyms" 163 | ``` 164 | 165 | #### Disease Annotation 166 | 167 | ``` 168 | GET /disease/{disease_id} 169 | ``` 170 | 171 | **ID formats:** 172 | 173 | - MONDO: `MONDO:0007254` 174 | - DOID: `DOID:1909` 175 | - OMIM: `OMIM:155600` 176 | 177 | ### Important Fields 178 | 179 | | Field | Description | Example | 180 | | ------------------ | ----------------- | -------------------------------------------- | 181 | | `mondo` | MONDO ontology | {"id": "MONDO:0007254", "label": "melanoma"} | 182 | | `disease_ontology` | Disease Ontology | {"id": "DOID:1909"} | 183 | | `synonyms` | Alternative names | ["malignant melanoma", "MM"] | 184 | | `xrefs` | Cross-references | {"omim": ["155600"], "mesh": ["D008545"]} | 185 | | `phenotypes` | HPO terms | [{"hpo_id": "HP:0002861"}] | 186 | 187 | ## MyChem.info 188 | 189 | ### Base URL 190 | 191 | `https://mychem.info/v1/` 192 | 193 | ### Key Endpoints 194 | 195 | #### Drug Query 196 | 197 | ``` 198 | GET /query?q={query} 199 | ``` 200 | 201 | **Example:** 202 | 203 | ```bash 204 | curl "https://mychem.info/v1/query?q=imatinib&fields=drugbank,chembl,chebi" 205 | ``` 206 | 207 | #### Drug Annotation 208 | 209 | ``` 210 | GET /drug/{drug_id} 211 | ``` 212 | 213 | **ID formats:** 214 | 215 | - DrugBank: `DB00619` 216 | - ChEMBL: `CHEMBL941` 217 | - Name: `imatinib` 218 | 219 | ### Important Fields 220 | 221 | | Field | Description | Example | 222 | | -------------- | -------------- | -------------------------------------------- | 223 | | `drugbank` | DrugBank data | {"id": "DB00619", "name": "Imatinib"} | 224 | | `chembl` | ChEMBL data | {"molecule_chembl_id": "CHEMBL941"} 
| 225 | | `chebi` | ChEBI ontology | {"id": "CHEBI:45783"} | 226 | | `drugcentral` | Indications | {"indications": [...]} | 227 | | `pharmacology` | Mechanism | {"mechanism_of_action": "BCR-ABL inhibitor"} | 228 | 229 | ## Common Query Patterns 230 | 231 | ### 1. Gene to Variant Pipeline 232 | 233 | ```python 234 | # Step 1: Get gene info 235 | gene_response = requests.get( 236 | "https://mygene.info/v1/gene/BRAF", 237 | params={"fields": "symbol,genomic_pos"} 238 | ) 239 | 240 | # Step 2: Find variants in gene 241 | variant_response = requests.get( 242 | "https://myvariant.info/v1/query", 243 | params={ 244 | "q": "dbnsfp.genename:BRAF", 245 | "fields": "clinvar.clinical_significance,gnomad_exome.af", 246 | "size": 100 247 | } 248 | ) 249 | ``` 250 | 251 | ### 2. Disease Synonym Expansion 252 | 253 | ```python 254 | # Get all synonyms for a disease 255 | disease_response = requests.get( 256 | "https://mydisease.info/v1/query", 257 | params={ 258 | "q": "melanoma", 259 | "fields": "mondo,synonyms,xrefs" 260 | } 261 | ) 262 | 263 | # Extract all names 264 | all_names = ["melanoma"] 265 | for hit in disease_response.json()["hits"]: 266 | if "synonyms" in hit: 267 | all_names.extend(hit["synonyms"]) 268 | ``` 269 | 270 | ### 3. Drug Target Lookup 271 | 272 | ```python 273 | # Find drugs targeting a gene 274 | drug_response = requests.get( 275 | "https://mychem.info/v1/query", 276 | params={ 277 | "q": "drugcentral.targets.gene_symbol:BRAF", 278 | "fields": "drugbank.name,chembl.pref_name", 279 | "size": 50 280 | } 281 | ) 282 | ``` 283 | 284 | ## Rate Limits and Best Practices 285 | 286 | ### Rate Limits 287 | 288 | - **Default**: 1,000 requests/hour per IP 289 | - **Batch queries**: Up to 1,000 IDs per request 290 | - **No authentication**: Public access 291 | 292 | ### Best Practices 293 | 294 | #### 1. 
Use Field Filtering 295 | 296 | ```python 297 | # Good - only request needed fields 298 | params = {"fields": "symbol,name,summary"} 299 | 300 | # Bad - returns all fields 301 | params = {} 302 | ``` 303 | 304 | #### 2. Batch Requests 305 | 306 | ```python 307 | # Good - single request for multiple genes 308 | response = requests.post( 309 | "https://mygene.info/v1/gene", 310 | json={"ids": ["BRAF", "KRAS", "EGFR"]} 311 | ) 312 | 313 | # Bad - multiple individual requests 314 | for gene in ["BRAF", "KRAS", "EGFR"]: 315 | requests.get(f"https://mygene.info/v1/gene/{gene}") 316 | ``` 317 | 318 | #### 3. Handle Missing Data 319 | 320 | ```python 321 | # Check for field existence 322 | if "clinvar" in variant and "clinical_significance" in variant["clinvar"]: 323 | significance = variant["clinvar"]["clinical_significance"] 324 | else: 325 | significance = "Not available" 326 | ``` 327 | 328 | ## Error Handling 329 | 330 | ### Common Errors 331 | 332 | #### 404 Not Found 333 | 334 | ```json 335 | { 336 | "success": false, 337 | "error": "ID not found" 338 | } 339 | ``` 340 | 341 | #### 400 Bad Request 342 | 343 | ```json 344 | { 345 | "success": false, 346 | "error": "Invalid query syntax" 347 | } 348 | ``` 349 | 350 | #### 429 Rate Limited 351 | 352 | ```json 353 | { 354 | "success": false, 355 | "error": "Rate limit exceeded" 356 | } 357 | ``` 358 | 359 | ### Error Handling Code 360 | 361 | ```python 362 | def query_biothings(api_url, query_params, retries=3, delay=1.0): 363 | try: 364 | response = requests.get(api_url, params=query_params, timeout=30) 365 | response.raise_for_status() 366 | return response.json() 367 | except requests.exceptions.HTTPError as e: 368 | if e.response.status_code == 404: 369 | return {"error": "Not found", "query": query_params} 370 | elif e.response.status_code == 429 and retries > 0: 371 | # Exponential backoff: double the delay on each retry, up to `retries` attempts 372 | time.sleep(delay) 373 | return query_biothings(api_url, query_params, retries - 1, delay * 2) 374 | else: 375 | raise 376 | ``` 377 | 378 | ## Data Sources 379 | 380 | Each BioThings API
aggregates data from multiple sources: 381 | 382 | ### MyGene.info Sources 383 | 384 | - NCBI Entrez Gene 385 | - Ensembl 386 | - UniProt 387 | - KEGG, Reactome, WikiPathways 388 | - Gene Ontology 389 | 390 | ### MyVariant.info Sources 391 | 392 | - dbSNP 393 | - ClinVar 394 | - gnomAD 395 | - CADD 396 | - PolyPhen-2, SIFT 397 | - COSMIC 398 | 399 | ### MyDisease.info Sources 400 | 401 | - MONDO 402 | - Disease Ontology 403 | - OMIM 404 | - MeSH 405 | - HPO 406 | 407 | ### MyChem.info Sources 408 | 409 | - DrugBank 410 | - ChEMBL 411 | - ChEBI 412 | - PubChem 413 | - DrugCentral 414 | 415 | ## Advanced Features 416 | 417 | ### Full-Text Search 418 | 419 | ```python 420 | # Search across all fields 421 | params = { 422 | "q": "lung cancer EGFR", # Searches all text fields 423 | "fields": "symbol,name,summary" 424 | } 425 | ``` 426 | 427 | ### Faceted Search 428 | 429 | ```python 430 | # Get aggregations 431 | params = { 432 | "q": "clinvar.clinical_significance:pathogenic", 433 | "facets": "dbnsfp.genename", 434 | "size": 0 # Only return facets 435 | } 436 | ``` 437 | 438 | ### Scrolling Large Results 439 | 440 | ```python 441 | # For results > 10,000 442 | params = { 443 | "q": "dbnsfp.genename:TP53", 444 | "fetch_all": True, 445 | "fields": "_id" 446 | } 447 | ``` 448 | 449 | ## Integration Tips 450 | 451 | ### 1. Caching Strategy 452 | 453 | - Cache gene/drug/disease lookups (stable) 454 | - Don't cache variant queries (frequently updated) 455 | - Use ETags for conditional requests 456 | 457 | ### 2. Parallel Requests 458 | 459 | ```python 460 | import asyncio 461 | import aiohttp 462 | 463 | async def fetch_all(session, urls): 464 | tasks = [] 465 | for url in urls: 466 | tasks.append(session.get(url)) 467 | return await asyncio.gather(*tasks) 468 | ``` 469 | 470 | ### 3. 
Data Normalization 471 | 472 | ```python 473 | def normalize_gene_symbol(symbol): 474 | # Query MyGene to get official symbol 475 | response = requests.get( 476 | f"https://mygene.info/v1/query?q={symbol}" 477 | ) 478 | if response.json()["hits"]: 479 | return response.json()["hits"][0]["symbol"] 480 | return symbol 481 | ``` 482 | ``` -------------------------------------------------------------------------------- /tests/tdd/test_biothings_integration.py: -------------------------------------------------------------------------------- ```python 1 | """Unit tests for BioThings API integration.""" 2 | 3 | from unittest.mock import AsyncMock, patch 4 | 5 | import pytest 6 | 7 | from biomcp.integrations import BioThingsClient, DiseaseInfo, GeneInfo 8 | 9 | 10 | @pytest.fixture 11 | def mock_http_client(): 12 | """Mock the http_client.request_api function.""" 13 | with patch("biomcp.integrations.biothings_client.http_client") as mock: 14 | yield mock 15 | 16 | 17 | @pytest.fixture 18 | def biothings_client(): 19 | """Create a BioThings client instance.""" 20 | return BioThingsClient() 21 | 22 | 23 | class TestGeneInfo: 24 | """Test gene information retrieval.""" 25 | 26 | @pytest.mark.asyncio 27 | async def test_get_gene_by_symbol( 28 | self, biothings_client, mock_http_client 29 | ): 30 | """Test getting gene info by symbol.""" 31 | # Mock query response 32 | mock_http_client.request_api = AsyncMock( 33 | side_effect=[ 34 | ( 35 | { 36 | "hits": [ 37 | { 38 | "_id": "7157", 39 | "symbol": "TP53", 40 | "name": "tumor protein p53", 41 | "taxid": 9606, 42 | } 43 | ] 44 | }, 45 | None, 46 | ), 47 | # Mock get response 48 | ( 49 | { 50 | "_id": "7157", 51 | "symbol": "TP53", 52 | "name": "tumor protein p53", 53 | "summary": "This gene encodes a tumor suppressor protein...", 54 | "alias": ["p53", "LFS1"], 55 | "type_of_gene": "protein-coding", 56 | "entrezgene": 7157, 57 | }, 58 | None, 59 | ), 60 | ] 61 | ) 62 | 63 | result = await biothings_client.get_gene_info("TP53") 64 
| 65 | assert result is not None 66 | assert isinstance(result, GeneInfo) 67 | assert result.symbol == "TP53" 68 | assert result.name == "tumor protein p53" 69 | assert result.gene_id == "7157" 70 | assert "p53" in result.alias 71 | 72 | @pytest.mark.asyncio 73 | async def test_get_gene_by_id(self, biothings_client, mock_http_client): 74 | """Test getting gene info by Entrez ID.""" 75 | # Mock direct get response 76 | mock_http_client.request_api = AsyncMock( 77 | return_value=( 78 | { 79 | "_id": "7157", 80 | "symbol": "TP53", 81 | "name": "tumor protein p53", 82 | "summary": "This gene encodes a tumor suppressor protein...", 83 | }, 84 | None, 85 | ) 86 | ) 87 | 88 | result = await biothings_client.get_gene_info("7157") 89 | 90 | assert result is not None 91 | assert result.symbol == "TP53" 92 | assert result.gene_id == "7157" 93 | 94 | @pytest.mark.asyncio 95 | async def test_gene_not_found(self, biothings_client, mock_http_client): 96 | """Test handling of gene not found.""" 97 | mock_http_client.request_api = AsyncMock( 98 | return_value=({"hits": []}, None) 99 | ) 100 | 101 | result = await biothings_client.get_gene_info("INVALID_GENE") 102 | assert result is None 103 | 104 | @pytest.mark.asyncio 105 | async def test_batch_get_genes(self, biothings_client, mock_http_client): 106 | """Test batch gene retrieval.""" 107 | mock_http_client.request_api = AsyncMock( 108 | return_value=( 109 | [ 110 | { 111 | "_id": "7157", 112 | "symbol": "TP53", 113 | "name": "tumor protein p53", 114 | }, 115 | { 116 | "_id": "673", 117 | "symbol": "BRAF", 118 | "name": "B-Raf proto-oncogene", 119 | }, 120 | ], 121 | None, 122 | ) 123 | ) 124 | 125 | results = await biothings_client.batch_get_genes(["TP53", "BRAF"]) 126 | 127 | assert len(results) == 2 128 | assert results[0].symbol == "TP53" 129 | assert results[1].symbol == "BRAF" 130 | 131 | 132 | class TestDiseaseInfo: 133 | """Test disease information retrieval.""" 134 | 135 | @pytest.mark.asyncio 136 | async def 
test_get_disease_by_name( 137 | self, biothings_client, mock_http_client 138 | ): 139 | """Test getting disease info by name.""" 140 | # Mock query response 141 | mock_http_client.request_api = AsyncMock( 142 | side_effect=[ 143 | ( 144 | { 145 | "hits": [ 146 | { 147 | "_id": "MONDO:0007959", 148 | "name": "melanoma", 149 | "mondo": {"mondo": "MONDO:0007959"}, 150 | } 151 | ] 152 | }, 153 | None, 154 | ), 155 | # Mock get response 156 | ( 157 | { 158 | "_id": "MONDO:0007959", 159 | "name": "melanoma", 160 | "mondo": { 161 | "definition": "A malignant neoplasm composed of melanocytes.", 162 | "synonym": { 163 | "exact": [ 164 | "malignant melanoma", 165 | "naevocarcinoma", 166 | ] 167 | }, 168 | }, 169 | }, 170 | None, 171 | ), 172 | ] 173 | ) 174 | 175 | result = await biothings_client.get_disease_info("melanoma") 176 | 177 | assert result is not None 178 | assert isinstance(result, DiseaseInfo) 179 | assert result.name == "melanoma" 180 | assert result.disease_id == "MONDO:0007959" 181 | assert "malignant melanoma" in result.synonyms 182 | 183 | @pytest.mark.asyncio 184 | async def test_get_disease_by_id(self, biothings_client, mock_http_client): 185 | """Test getting disease info by MONDO ID.""" 186 | mock_http_client.request_api = AsyncMock( 187 | return_value=( 188 | { 189 | "_id": "MONDO:0016575", 190 | "name": "GIST", 191 | "mondo": { 192 | "definition": "Gastrointestinal stromal tumor...", 193 | }, 194 | }, 195 | None, 196 | ) 197 | ) 198 | 199 | result = await biothings_client.get_disease_info("MONDO:0016575") 200 | 201 | assert result is not None 202 | assert result.name == "GIST" 203 | assert result.disease_id == "MONDO:0016575" 204 | 205 | @pytest.mark.asyncio 206 | async def test_get_disease_synonyms( 207 | self, biothings_client, mock_http_client 208 | ): 209 | """Test getting disease synonyms for query expansion.""" 210 | mock_http_client.request_api = AsyncMock( 211 | side_effect=[ 212 | ( 213 | { 214 | "hits": [ 215 | { 216 | "_id": 
"MONDO:0018076", 217 | "name": "GIST", 218 | } 219 | ] 220 | }, 221 | None, 222 | ), 223 | ( 224 | { 225 | "_id": "MONDO:0018076", 226 | "name": "gastrointestinal stromal tumor", 227 | "mondo": { 228 | "synonym": { 229 | "exact": [ 230 | "GIST", 231 | "gastrointestinal stromal tumour", 232 | "GI stromal tumor", 233 | ] 234 | } 235 | }, 236 | }, 237 | None, 238 | ), 239 | ] 240 | ) 241 | 242 | synonyms = await biothings_client.get_disease_synonyms("GIST") 243 | 244 | assert "GIST" in synonyms 245 | assert "gastrointestinal stromal tumor" in synonyms 246 | assert len(synonyms) <= 5 # Limited to 5 247 | 248 | 249 | class TestTrialSynonymExpansion: 250 | """Test disease synonym expansion in trial searches.""" 251 | 252 | @pytest.mark.asyncio 253 | async def test_trial_search_with_synonym_expansion(self): 254 | """Test that trial search expands disease synonyms.""" 255 | from biomcp.trials.search import TrialQuery, convert_query 256 | 257 | with patch("biomcp.trials.search.BioThingsClient") as mock_client: 258 | # Mock synonym expansion 259 | mock_instance = mock_client.return_value 260 | mock_instance.get_disease_synonyms = AsyncMock( 261 | return_value=[ 262 | "GIST", 263 | "gastrointestinal stromal tumor", 264 | "GI stromal tumor", 265 | ] 266 | ) 267 | 268 | query = TrialQuery( 269 | conditions=["GIST"], 270 | expand_synonyms=True, 271 | ) 272 | 273 | params = await convert_query(query) 274 | 275 | # Check that conditions were expanded 276 | assert "query.cond" in params 277 | cond_value = params["query.cond"][0] 278 | assert "GIST" in cond_value 279 | assert "gastrointestinal stromal tumor" in cond_value 280 | 281 | @pytest.mark.asyncio 282 | async def test_trial_search_without_synonym_expansion(self): 283 | """Test that trial search works without synonym expansion.""" 284 | from biomcp.trials.search import TrialQuery, convert_query 285 | 286 | query = TrialQuery( 287 | conditions=["GIST"], 288 | expand_synonyms=False, 289 | ) 290 | 291 | params = await 
convert_query(query) 292 | 293 | # Check that conditions were not expanded 294 | assert "query.cond" in params 295 | assert params["query.cond"] == ["GIST"] 296 | 297 | 298 | class TestErrorHandling: 299 | """Test error handling in BioThings integration.""" 300 | 301 | @pytest.mark.asyncio 302 | async def test_api_error_handling( 303 | self, biothings_client, mock_http_client 304 | ): 305 | """Test handling of API errors.""" 306 | from biomcp.http_client import RequestError 307 | 308 | mock_http_client.request_api = AsyncMock( 309 | return_value=( 310 | None, 311 | RequestError(code=500, message="Internal server error"), 312 | ) 313 | ) 314 | 315 | result = await biothings_client.get_gene_info("TP53") 316 | assert result is None 317 | 318 | @pytest.mark.asyncio 319 | async def test_invalid_response_format( 320 | self, biothings_client, mock_http_client 321 | ): 322 | """Test handling of invalid API responses.""" 323 | mock_http_client.request_api = AsyncMock( 324 | return_value=({"invalid": "response"}, None) 325 | ) 326 | 327 | result = await biothings_client.get_gene_info("TP53") 328 | assert result is None 329 | ``` -------------------------------------------------------------------------------- /src/biomcp/http_client.py: -------------------------------------------------------------------------------- ```python 1 | import csv 2 | import json 3 | import os 4 | import ssl 5 | from io import StringIO 6 | from ssl import PROTOCOL_TLS_CLIENT, SSLContext, TLSVersion 7 | from typing import Literal, TypeVar 8 | 9 | import certifi 10 | from diskcache import Cache 11 | from platformdirs import user_cache_dir 12 | from pydantic import BaseModel 13 | 14 | from .circuit_breaker import CircuitBreakerConfig, circuit_breaker 15 | from .constants import ( 16 | AGGRESSIVE_INITIAL_RETRY_DELAY, 17 | AGGRESSIVE_MAX_RETRY_ATTEMPTS, 18 | AGGRESSIVE_MAX_RETRY_DELAY, 19 | DEFAULT_CACHE_TIMEOUT, 20 | DEFAULT_FAILURE_THRESHOLD, 21 | DEFAULT_RECOVERY_TIMEOUT, 22 | 
DEFAULT_SUCCESS_THRESHOLD, 23 | ) 24 | from .http_client_simple import execute_http_request 25 | from .metrics import Timer 26 | from .rate_limiter import domain_limiter 27 | from .retry import ( 28 | RetryableHTTPError, 29 | RetryConfig, 30 | is_retryable_status, 31 | with_retry, 32 | ) 33 | from .utils.endpoint_registry import get_registry 34 | 35 | T = TypeVar("T", bound=BaseModel) 36 | 37 | 38 | class RequestError(BaseModel): 39 | code: int 40 | message: str 41 | 42 | 43 | _cache: Cache | None = None 44 | 45 | 46 | def get_cache() -> Cache: 47 | global _cache 48 | if _cache is None: 49 | cache_path = os.path.join(user_cache_dir("biomcp"), "http_cache") 50 | _cache = Cache(cache_path) 51 | return _cache 52 | 53 | 54 | def generate_cache_key(method: str, url: str, params: dict) -> str: 55 | """Generate a cache key that is stable across processes.""" 56 | # Handle simple cases without params 57 | if not params: 58 | return f"{method.upper()}:{url}" 59 | 60 | # Serialize params canonically so that equal requests produce equal keys 61 | import hashlib # local import keeps this function self-contained 62 | params_str = json.dumps(params, sort_keys=True, separators=(",", ":")) 63 | key_source = f"{method.upper()}:{url}:{params_str}" 64 | 65 | # Built-in hash() is salted per process for str (see PYTHONHASHSEED), so it 66 | # cannot key an on-disk cache; use a short stable digest (16 hex chars) instead 67 | digest = hashlib.blake2b(key_source.encode("utf-8"), digest_size=8) 68 | return digest.hexdigest() 69 | 70 | 71 | def cache_response(cache_key: str, content: str, ttl: int): 72 | expire = None if ttl == -1 else ttl 73 | cache = get_cache() 74 | cache.set(cache_key, content, expire=expire) 75 | 76 | 77 | def get_cached_response(cache_key: str) -> str | None: 78 | cache = get_cache() 79 | return cache.get(cache_key) 80 | 81 | 82 | def get_ssl_context(tls_version: TLSVersion) -> SSLContext: 83 | """Create an SSLContext with the specified TLS version.""" 84 | context = SSLContext(PROTOCOL_TLS_CLIENT) 85 | context.minimum_version
= tls_version 86 | context.maximum_version = tls_version 87 | context.load_verify_locations(cafile=certifi.where()) 88 | return context 89 | 90 | 91 | async def call_http( 92 | method: str, 93 | url: str, 94 | params: dict, 95 | verify: ssl.SSLContext | str | bool = True, 96 | retry_config: RetryConfig | None = None, 97 | headers: dict[str, str] | None = None, 98 | ) -> tuple[int, str]: 99 | """Make HTTP request with optional retry logic. 100 | 101 | Args: 102 | method: HTTP method (GET or POST) 103 | url: Target URL 104 | params: Request parameters 105 | verify: SSL verification settings 106 | retry_config: Retry configuration (if None, no retry) 107 | 108 | Returns: 109 | Tuple of (status_code, response_text) 110 | """ 111 | 112 | async def _make_request() -> tuple[int, str]: 113 | # Extract domain from URL for metrics tagging 114 | from urllib.parse import urlparse 115 | 116 | parsed = urlparse(url) 117 | host = parsed.hostname or "unknown" 118 | 119 | # Apply circuit breaker for the host 120 | breaker_config = CircuitBreakerConfig( 121 | failure_threshold=DEFAULT_FAILURE_THRESHOLD, 122 | recovery_timeout=DEFAULT_RECOVERY_TIMEOUT, 123 | success_threshold=DEFAULT_SUCCESS_THRESHOLD, 124 | expected_exception=(ConnectionError, TimeoutError), 125 | ) 126 | 127 | @circuit_breaker(f"http_{host}", breaker_config) 128 | async def _execute_with_breaker(): 129 | async with Timer( 130 | "http_request", tags={"method": method, "host": host} 131 | ): 132 | return await execute_http_request( 133 | method, url, params, verify, headers 134 | ) 135 | 136 | status, text = await _execute_with_breaker() 137 | 138 | # Check if status code should trigger retry 139 | if retry_config and is_retryable_status(status, retry_config): 140 | raise RetryableHTTPError(status, text) 141 | 142 | return status, text 143 | 144 | # Apply retry logic if configured 145 | if retry_config: 146 | wrapped_func = with_retry(retry_config)(_make_request) 147 | try: 148 | return await wrapped_func() 149 | 
except RetryableHTTPError as exc: 150 | # Convert retryable HTTP errors back to status/text 151 | return exc.status_code, exc.message 152 | except Exception: 153 | # Let other exceptions bubble up 154 | raise 155 | else: 156 | return await _make_request() 157 | 158 | 159 | def _handle_offline_mode( 160 | url: str, 161 | method: str, 162 | request: BaseModel | dict, 163 | cache_ttl: int, 164 | response_model_type: type[T] | None, 165 | ) -> tuple[T | None, RequestError | None] | None: 166 | """Handle offline mode logic. Returns None if not in offline mode.""" 167 | if os.getenv("BIOMCP_OFFLINE", "").lower() not in ("true", "1", "yes"): 168 | return None 169 | 170 | # In offline mode, only return cached responses 171 | if cache_ttl > 0: 172 | cache_key = generate_cache_key( 173 | method, 174 | url, 175 | request 176 | if isinstance(request, dict) 177 | else request.model_dump(exclude_none=True, by_alias=True), 178 | ) 179 | cached_content = get_cached_response(cache_key) 180 | if cached_content: 181 | return parse_response(200, cached_content, response_model_type) 182 | 183 | return None, RequestError( 184 | code=503, 185 | message=f"Offline mode enabled (BIOMCP_OFFLINE=true). Cannot fetch from {url}", 186 | ) 187 | 188 | 189 | def _validate_endpoint(endpoint_key: str | None) -> None: 190 | """Validate endpoint key if provided.""" 191 | if endpoint_key: 192 | registry = get_registry() 193 | if endpoint_key not in registry.get_all_endpoints(): 194 | raise ValueError( 195 | f"Unknown endpoint key: {endpoint_key}. 
Please register in endpoint_registry.py" 196 | ) 197 | 198 | 199 | def _prepare_request_params( 200 | request: BaseModel | dict, 201 | ) -> tuple[dict, dict | None]: 202 | """Convert request to params dict and extract headers.""" 203 | if isinstance(request, BaseModel): 204 | params = request.model_dump(exclude_none=True, by_alias=True) 205 | else: 206 | params = request.copy() if isinstance(request, dict) else request 207 | 208 | # Extract headers if present 209 | headers = None 210 | if isinstance(params, dict) and "_headers" in params: 211 | try: 212 | import json 213 | 214 | headers = json.loads(params.pop("_headers")) 215 | except (json.JSONDecodeError, TypeError): 216 | pass # Ignore invalid headers 217 | 218 | return params, headers 219 | 220 | 221 | def _get_retry_config( 222 | enable_retry: bool, domain: str | None 223 | ) -> RetryConfig | None: 224 | """Get retry configuration based on settings.""" 225 | if not enable_retry: 226 | return None 227 | 228 | # Use more aggressive retry for certain domains 229 | if domain in ["clinicaltrials", "pubmed", "myvariant"]: 230 | return RetryConfig( 231 | max_attempts=AGGRESSIVE_MAX_RETRY_ATTEMPTS, 232 | initial_delay=AGGRESSIVE_INITIAL_RETRY_DELAY, 233 | max_delay=AGGRESSIVE_MAX_RETRY_DELAY, 234 | ) 235 | return RetryConfig() # Default settings 236 | 237 | 238 | async def request_api( 239 | url: str, 240 | request: BaseModel | dict, 241 | response_model_type: type[T] | None = None, 242 | method: Literal["GET", "POST"] = "GET", 243 | cache_ttl: int = DEFAULT_CACHE_TIMEOUT, 244 | tls_version: TLSVersion | None = None, 245 | domain: str | None = None, 246 | enable_retry: bool = True, 247 | endpoint_key: str | None = None, 248 | ) -> tuple[T | None, RequestError | None]: 249 | # Handle offline mode 250 | offline_result = _handle_offline_mode( 251 | url, method, request, cache_ttl, response_model_type 252 | ) 253 | if offline_result is not None: 254 | return offline_result 255 | 256 | # Validate endpoint 257 | 
_validate_endpoint(endpoint_key) 258 | 259 | # Apply rate limiting if domain is specified 260 | if domain: 261 | async with domain_limiter.limit(domain): 262 | pass # Rate limit acquired 263 | 264 | # Prepare request 265 | verify = get_ssl_context(tls_version) if tls_version else True 266 | params, headers = _prepare_request_params(request) 267 | retry_config = _get_retry_config(enable_retry, domain) 268 | 269 | # Short-circuit if caching disabled 270 | if cache_ttl == 0: 271 | status, content = await call_http( 272 | method, 273 | url, 274 | params, 275 | verify=verify, 276 | retry_config=retry_config, 277 | headers=headers, 278 | ) 279 | return parse_response(status, content, response_model_type) 280 | 281 | # Handle caching 282 | cache_key = generate_cache_key(method, url, params) 283 | cached_content = get_cached_response(cache_key) 284 | 285 | if cached_content: 286 | return parse_response(200, cached_content, response_model_type) 287 | 288 | # Make HTTP request if not cached 289 | status, content = await call_http( 290 | method, 291 | url, 292 | params, 293 | verify=verify, 294 | retry_config=retry_config, 295 | headers=headers, 296 | ) 297 | parsed_response = parse_response(status, content, response_model_type) 298 | 299 | # Cache if successful response 300 | if status == 200: 301 | cache_response(cache_key, content, cache_ttl) 302 | 303 | return parsed_response 304 | 305 | 306 | def parse_response( 307 | status_code: int, 308 | content: str, 309 | response_model_type: type[T] | None = None, 310 | ) -> tuple[T | None, RequestError | None]: 311 | if status_code != 200: 312 | return None, RequestError(code=status_code, message=content) 313 | 314 | # Handle empty content 315 | if not content or content.strip() == "": 316 | return None, RequestError( 317 | code=500, 318 | message="Empty response received from API", 319 | ) 320 | 321 | try: 322 | if response_model_type is None: 323 | # Try to parse as JSON first 324 | if content.startswith("{") or 
content.startswith("["): 325 | response_dict = json.loads(content) 326 | elif "," in content: 327 | io = StringIO(content) 328 | response_dict = list(csv.DictReader(io)) 329 | else: 330 | response_dict = {"text": content} 331 | return response_dict, None 332 | 333 | parsed: T = response_model_type.model_validate_json(content) 334 | return parsed, None 335 | 336 | except json.JSONDecodeError as exc: 337 | # Provide more detailed error message for JSON parsing issues 338 | return None, RequestError( 339 | code=500, 340 | message=f"Invalid JSON response: {exc}. Content preview: {content[:100]}...", 341 | ) 342 | except Exception as exc: 343 | return None, RequestError( 344 | code=500, 345 | message=f"Failed to parse response: {exc}", 346 | ) 347 | ``` -------------------------------------------------------------------------------- /src/biomcp/diseases/search.py: -------------------------------------------------------------------------------- ```python 1 | """Search functionality for diseases via NCI CTS API.""" 2 | 3 | import logging 4 | from typing import Any 5 | 6 | from ..constants import NCI_DISEASES_URL 7 | from ..integrations.cts_api import CTSAPIError, make_cts_request 8 | from ..utils import parse_or_query 9 | 10 | logger = logging.getLogger(__name__) 11 | 12 | 13 | def _build_disease_params( 14 | name: str | None, 15 | disease_type: str | None, 16 | category: str | None, 17 | codes: list[str] | None, 18 | parent_ids: list[str] | None, 19 | ancestor_ids: list[str] | None, 20 | include: list[str] | None, 21 | sort: str | None, 22 | order: str | None, 23 | page_size: int, 24 | ) -> dict[str, Any]: 25 | """Build query parameters for disease search.""" 26 | params: dict[str, Any] = {"size": page_size} 27 | 28 | if name: 29 | params["name"] = name 30 | 31 | # Use 'type' parameter instead of 'category' 32 | if disease_type: 33 | params["type"] = disease_type 34 | elif category: # Backward compatibility 35 | params["type"] = category 36 | 37 | if codes: 38 | 
        params["codes"] = ",".join(codes) if isinstance(codes, list) else codes

    if parent_ids:
        params["parent_ids"] = (
            ",".join(parent_ids)
            if isinstance(parent_ids, list)
            else parent_ids
        )

    if ancestor_ids:
        params["ancestor_ids"] = (
            ",".join(ancestor_ids)
            if isinstance(ancestor_ids, list)
            else ancestor_ids
        )

    if include:
        params["include"] = (
            ",".join(include) if isinstance(include, list) else include
        )

    if sort:
        params["sort"] = sort
    if order:
        params["order"] = order.lower()

    return params


async def search_diseases(
    name: str | None = None,
    include_synonyms: bool = True,  # Deprecated - kept for backward compatibility
    category: str | None = None,
    disease_type: str | None = None,
    codes: list[str] | None = None,
    parent_ids: list[str] | None = None,
    ancestor_ids: list[str] | None = None,
    include: list[str] | None = None,
    sort: str | None = None,
    order: str | None = None,
    page_size: int = 20,
    page: int = 1,
    api_key: str | None = None,
) -> dict[str, Any]:
    """
    Search for diseases in the NCI CTS database.

    This provides access to NCI's controlled vocabulary of cancer conditions
    used in clinical trials, with official terms and synonyms.

    Args:
        name: Disease name to search for (partial match, searches synonyms automatically)
        include_synonyms: [Deprecated] This parameter is ignored - API always searches synonyms
        category: Disease category/type filter (deprecated - use disease_type)
        disease_type: Type of disease (e.g., 'maintype', 'subtype', 'stage')
        codes: List of disease codes (e.g., ['C3868', 'C5806'])
        parent_ids: List of parent disease IDs
        ancestor_ids: List of ancestor disease IDs
        include: Fields to include in response
        sort: Sort field
        order: Sort order ('asc' or 'desc')
        page_size: Number of results per page
        page: Page number
        api_key: Optional API key (if not provided, uses NCI_API_KEY env var)

    Returns:
        Dictionary with search results containing:
        - diseases: List of disease records with names and synonyms
        - total: Total number of results
        - page: Current page
        - page_size: Results per page

    Raises:
        CTSAPIError: If the API request fails
    """
    # Build query parameters
    params = _build_disease_params(
        name,
        disease_type,
        category,
        codes,
        parent_ids,
        ancestor_ids,
        include,
        sort,
        order,
        page_size,
    )

    try:
        # Make API request
        response = await make_cts_request(
            url=NCI_DISEASES_URL,
            params=params,
            api_key=api_key,
        )

        # Process response
        diseases = response.get("data", response.get("diseases", []))
        total = response.get("total", len(diseases))

        return {
            "diseases": diseases,
            "total": total,
            "page": page,
            "page_size": page_size,
        }

    except CTSAPIError:
        raise
    except Exception as e:
        logger.error(f"Failed to search diseases: {e}")
        raise CTSAPIError(f"Disease search failed: {e!s}") from e


async def get_disease_by_id(
    disease_id: str,
    api_key: str | None = None,
) -> dict[str, Any]:
    """
    Get detailed information about a specific disease by ID.

    Args:
        disease_id: Disease ID from NCI CTS
        api_key: Optional API key (if not provided, uses NCI_API_KEY env var)

    Returns:
        Dictionary with disease details including synonyms

    Raises:
        CTSAPIError: If the API request fails
    """
    try:
        # Make API request
        url = f"{NCI_DISEASES_URL}/{disease_id}"
        response = await make_cts_request(
            url=url,
            api_key=api_key,
        )

        # Return the disease data
        if "data" in response:
            return response["data"]
        elif "disease" in response:
            return response["disease"]
        else:
            return response

    except CTSAPIError:
        raise
    except Exception as e:
        logger.error(f"Failed to get disease {disease_id}: {e}")
        raise CTSAPIError(f"Failed to retrieve disease: {e!s}") from e


def _format_disease_synonyms(synonyms: Any) -> list[str]:
    """Format disease synonyms section."""
    lines: list[str] = []
    if not synonyms:
        return lines

    if isinstance(synonyms, list) and synonyms:
        lines.append("- **Synonyms**:")
        for syn in synonyms[:5]:  # Show up to 5 synonyms
            lines.append(f" - {syn}")
        if len(synonyms) > 5:
            lines.append(f" *(and {len(synonyms) - 5} more)*")
    elif isinstance(synonyms, str):
        lines.append(f"- **Synonyms**: {synonyms}")

    return lines


def _format_disease_codes(codes: Any) -> list[str]:
    """Format disease code mappings."""
    if not codes or not isinstance(codes, dict):
        return []

    code_items = []
    for system, code in codes.items():
        code_items.append(f"{system}: {code}")

    if code_items:
        return [f"- **Codes**: {', '.join(code_items)}"]
    return []


def _format_single_disease(disease: dict[str, Any]) -> list[str]:
    """Format a single disease record."""
    disease_id = disease.get("id", disease.get("disease_id", "Unknown"))
    name = disease.get(
        "name", disease.get("preferred_name", "Unknown Disease")
    )
    category = disease.get("category", disease.get("type", ""))

    lines = [
        f"### {name}",
        f"- **ID**: {disease_id}",
    ]

    if category:
        lines.append(f"- **Category**: {category}")

    # Add synonyms
    lines.extend(_format_disease_synonyms(disease.get("synonyms", [])))

    # Add code mappings
    lines.extend(_format_disease_codes(disease.get("codes")))

    lines.append("")
    return lines


def format_disease_results(results: dict[str, Any]) -> str:
    """
    Format disease search results as markdown.

    Args:
        results: Search results dictionary

    Returns:
        Formatted markdown string
    """
    diseases = results.get("diseases", [])
    total = results.get("total", 0)

    if not diseases:
        return "No diseases found matching the search criteria."

    # Build markdown output
    lines = [
        f"## Disease Search Results ({total} found)",
        "",
    ]

    for disease in diseases:
        lines.extend(_format_single_disease(disease))

    return "\n".join(lines)


async def search_diseases_with_or(
    name_query: str,
    include_synonyms: bool = True,
    category: str | None = None,
    disease_type: str | None = None,
    codes: list[str] | None = None,
    parent_ids: list[str] | None = None,
    ancestor_ids: list[str] | None = None,
    include: list[str] | None = None,
    sort: str | None = None,
    order: str | None = None,
    page_size: int = 20,
    page: int = 1,
    api_key: str | None = None,
) -> dict[str, Any]:
    """
    Search for diseases with OR query support.

    This function handles OR queries by making multiple API calls and combining results.
    For example: "melanoma OR lung cancer" will search for each term.

    Args:
        name_query: Name query that may contain OR operators
        Other args same as search_diseases

    Returns:
        Combined results from all searches with duplicates removed
    """
    # Check if this is an OR query
    if " OR " in name_query or " or " in name_query:
        search_terms = parse_or_query(name_query)
        logger.info(f"Parsed OR query into terms: {search_terms}")
    else:
        # Single term search
        search_terms = [name_query]

    # Collect all unique diseases
    all_diseases = {}
    total_found = 0

    # Search for each term
    for term in search_terms:
        logger.info(f"Searching diseases for term: {term}")
        try:
            results = await search_diseases(
                name=term,
                include_synonyms=include_synonyms,
                category=category,
                disease_type=disease_type,
                codes=codes,
                parent_ids=parent_ids,
                ancestor_ids=ancestor_ids,
                include=include,
                sort=sort,
                order=order,
                page_size=page_size,
                page=page,
                api_key=api_key,
            )

            # Add unique diseases (deduplicate by ID)
            for disease in results.get("diseases", []):
                disease_id = disease.get("id", disease.get("disease_id"))
                if disease_id and disease_id not in all_diseases:
                    all_diseases[disease_id] = disease

            total_found += results.get("total", 0)

        except Exception as e:
            logger.warning(f"Failed to search for term '{term}': {e}")
            # Continue with other terms

    # Convert back to list and apply pagination
    unique_diseases = list(all_diseases.values())

    # Sort by name for consistent results
    unique_diseases.sort(
        key=lambda x: x.get("name", x.get("preferred_name", "")).lower()
    )

    # Apply pagination to combined results
    start_idx = (page - 1) * page_size
    end_idx = start_idx + page_size
    paginated_diseases = unique_diseases[start_idx:end_idx]

    return {
        "diseases": paginated_diseases,
        "total": len(unique_diseases),
        "page": page,
        "page_size": page_size,
        "search_terms": search_terms,  # Include what we searched for
        "total_found_across_terms": total_found,  # Total before deduplication
    }
```

--------------------------------------------------------------------------------
/docs/tutorials/openfda-integration.md:
--------------------------------------------------------------------------------

```markdown
# OpenFDA Integration Guide

## Overview

BioMCP now integrates with the FDA's openFDA API to provide access to critical drug safety and regulatory information. This integration adds three major data sources to BioMCP's capabilities:

1. **Drug Adverse Events (FAERS)** - FDA Adverse Event Reporting System data
2. **Drug Labels (SPL)** - Official FDA drug product labeling
3. **Device Events (MAUDE)** - Medical device adverse event reports

This guide covers how to use these new tools effectively for precision oncology research.

## Quick Start

### Installation & Setup

The OpenFDA integration is included in the standard BioMCP installation:

```bash
# Install BioMCP
pip install biomcp-python

# Optional: Set API key for higher rate limits
export OPENFDA_API_KEY="your-api-key-here"
```

> **Note**: An API key is optional but recommended. Without one, you're limited to 40 requests/minute. With a key, you get 240 requests/minute. [Get a free API key here](https://open.fda.gov/apis/authentication/).

### Basic Usage Examples

#### Search for drug adverse events

```bash
# Find adverse events for a specific drug
biomcp openfda adverse search --drug imatinib

# Search for specific reactions
biomcp openfda adverse search --reaction nausea --serious

# Get detailed report
biomcp openfda adverse get REPORT123456
```

#### Search drug labels

```bash
# Find drugs for specific indications
biomcp openfda label search --indication melanoma

# Search for drugs with boxed warnings
biomcp openfda label search --boxed-warning

# Get complete label
biomcp openfda label get SET_ID_HERE
```

#### Search device events

```bash
# Search for genomic test device issues
biomcp openfda device search --device "FoundationOne"

# Search by manufacturer
biomcp openfda device search --manufacturer Illumina

# Get detailed device event
biomcp openfda device get MDR123456
```

## MCP Tool Usage

### For AI Agents

The OpenFDA tools are available as MCP tools for AI agents. Each tool includes built-in reminders to use the `think` tool first for complex queries.

#### Available Tools

- `openfda_adverse_searcher` - Search drug adverse events
- `openfda_adverse_getter` - Get specific adverse event report
- `openfda_label_searcher` - Search drug labels
- `openfda_label_getter` - Get complete drug label
- `openfda_device_searcher` - Search device adverse events
- `openfda_device_getter` - Get specific device event report

#### Example Tool Usage

```python
# Search for adverse events
result = await openfda_adverse_searcher(
    drug="pembrolizumab",
    serious=True,
    limit=25
)

# Get drug label
label = await openfda_label_getter(
    set_id="abc-123-def",
    sections=["indications_and_usage", "warnings_and_precautions"]
)

# Search genomic devices
devices = await openfda_device_searcher(
    device="sequencer",
    genomics_only=True,  # Filter to genomic/diagnostic devices
    problem="false positive"
)
```

## Data Sources Explained

### Drug Adverse Events (FAERS)

The FDA Adverse Event Reporting System contains reports of adverse events and medication errors submitted to FDA. Key features:

- **Voluntary reporting**: Reports come from healthcare professionals, patients, and manufacturers
- **No causation proof**: Reports don't establish that a drug caused the event
- **Rich detail**: Includes patient demographics, drug information, reactions, and outcomes
- **Real-world data**: Captures post-market safety signals

**Best for**: Understanding potential side effects, safety signals, drug interactions

### Drug Labels (SPL)

Structured Product Labeling contains the official FDA-approved prescribing information. Includes:

- **Indications and usage**: FDA-approved uses
- **Dosage and administration**: How to prescribe
- **Contraindications**: When not to use
- **Warnings and precautions**: Safety information
- **Drug interactions**: Known interactions
- **Clinical studies**: Trial data supporting approval

**Best for**: Official prescribing guidelines, approved indications, contraindications

### Device Events (MAUDE)

Manufacturer and User Facility Device Experience database contains medical device adverse events. For BioMCP, we focus on genomic/diagnostic devices:

- **Genomic test devices**: Issues with sequencing platforms, diagnostic panels
- **In vitro diagnostics**: Problems with biomarker tests
- **Device malfunctions**: Technical failures affecting test results
- **Patient impact**: How device issues affected patient care

**Best for**: Understanding reliability of genomic tests, device-related diagnostic issues

## Advanced Features

### Genomic Device Filtering

By default, device searches filter to genomic/diagnostic devices relevant to precision oncology:

```bash
# Search only genomic devices (default)
biomcp openfda device search --device test

# Search ALL medical devices
biomcp openfda device search --device test --all-devices
```

The genomic filter includes FDA product codes for:

- Next Generation Sequencing panels
- Gene mutation detection systems
- Tumor profiling tests
- Hereditary variant detection systems

### Pagination Support

All search tools support pagination for large result sets:

```bash
# Get second page of results
biomcp openfda adverse search --drug aspirin --page 2 --limit 50
```

### Section-Specific Label Retrieval

When retrieving drug labels, you can specify which sections to include:

```bash
# Get only specific sections
biomcp openfda label get SET_ID --sections "indications_and_usage,adverse_reactions"
```

## Integration with Other BioMCP Tools

### Complementary Data Sources

OpenFDA data complements existing BioMCP tools:

| Tool                       | Data Source        | Best For                          |
| -------------------------- | ------------------ | --------------------------------- |
| `drug_getter`              | MyChem.info        | Chemical properties, mechanisms   |
| `openfda_label_searcher`   | FDA Labels         | Official indications, prescribing |
| `openfda_adverse_searcher` | FAERS              | Safety signals, side effects      |
| `trial_searcher`           | ClinicalTrials.gov | Active trials, eligibility        |

### Workflow Examples

#### Complete Drug Profile

```python
# 1. Get drug chemical info
drug_info = await drug_getter("imatinib")

# 2. Get FDA label
label = await openfda_label_searcher(name="imatinib")

# 3. Check adverse events
safety = await openfda_adverse_searcher(drug="imatinib", serious=True)

# 4. Find current trials
trials = await trial_searcher(interventions=["imatinib"])
```

#### Device Reliability Check

```python
# 1. Search for device issues
events = await openfda_device_searcher(
    device="FoundationOne CDx",
    problem="false"
)

# 2. Get specific event details
if events:
    details = await openfda_device_getter("MDR_KEY_HERE")
```

## Important Considerations

### Data Limitations

1. **Adverse Events**:

   - Reports don't prove causation
   - Reporting is voluntary, so not all events are captured
   - Duplicate reports may exist
   - Include appropriate disclaimers when presenting data

2. **Drug Labels**:

   - May not reflect the most recent changes
   - Off-label uses not included
   - Generic drugs may have different inactive ingredients

3. **Device Events**:
   - Not all device problems are reported
   - User error vs device malfunction can be unclear
   - Reports may lack complete information

### Rate Limits

- **Without API key**: 40 requests/minute per IP
- **With API key**: 240 requests/minute per key
- **Burst limit**: 4 requests/second

### Best Practices

1. **Always use disclaimers**: Include FDA's disclaimer about adverse events not proving causation
2. **Check multiple sources**: Combine OpenFDA data with other BioMCP tools
3. **Filter appropriately**: Use genomic device filtering for relevant results
4. **Handle no results gracefully**: Many specific queries may return no results
5. **Respect rate limits**: Use API key for production use

## Troubleshooting

### Common Issues

**No results found**

- Try broader search terms
- Check spelling of drug/device names
- Remove filters to expand search

**Rate limit errors**

- Add API key to environment
- Reduce request frequency
- Batch queries when possible

**Timeout errors**

- OpenFDA API may be slow/down
- Retry after a brief wait
- Consider caching frequent queries

### Getting Help

- OpenFDA documentation: https://open.fda.gov/apis/
- OpenFDA status: https://api.fda.gov/status
- BioMCP issues: https://github.com/genomoncology/biomcp/issues

## API Reference

### Environment Variables

- `OPENFDA_API_KEY`: Your openFDA API key (optional but recommended)

### CLI Commands

```bash
# Adverse Events
biomcp openfda adverse search [OPTIONS]
  --drug TEXT           Drug name to search
  --reaction TEXT       Reaction to search
  --serious/--all       Filter serious events
  --limit INT           Results per page (max 100)
  --page INT            Page number

biomcp openfda adverse get REPORT_ID

# Drug Labels
biomcp openfda label search [OPTIONS]
  --name TEXT           Drug name
  --indication TEXT     Indication to search
  --boxed-warning       Has boxed warning
  --section TEXT        Label section
  --limit INT           Results per page
  --page INT            Page number

biomcp openfda label get SET_ID [OPTIONS]
  --sections TEXT       Comma-separated sections

# Device Events
biomcp openfda device search [OPTIONS]
  --device TEXT         Device name
  --manufacturer TEXT   Manufacturer name
  --problem TEXT        Problem description
  --product-code TEXT   FDA product code
  --genomics-only/--all-devices
  --limit INT           Results per page
  --page INT            Page number

biomcp openfda device get MDR_KEY
```

## Example Outputs

### Adverse Event Search

```markdown
## FDA Adverse Event Reports

**Drug**: imatinib | **Serious Events**: Yes
**Total Reports Found**: 1,234 reports

### Top Reported Reactions:

- **NAUSEA**: 234 reports (19.0%)
- **FATIGUE**: 189 reports (15.3%)
- **RASH**: 156 reports (12.6%)

### Sample Reports (showing 3 of 1,234):

...
```

### Drug Label Search

```markdown
## FDA Drug Labels

**Drug**: pembrolizumab
**Total Labels Found**: 5 labels

### Results (showing 5 of 5):

#### 1. KEYTRUDA

**Also known as**: pembrolizumab
**FDA Application**: BLA125514
**Manufacturer**: Merck Sharp & Dohme
**Route**: INTRAVENOUS

⚠️ **BOXED WARNING**: Immune-mediated adverse reactions...

**Indications**: KEYTRUDA is indicated for the treatment of...
```

### Device Event Search

```markdown
## FDA Device Adverse Event Reports

**Device**: FoundationOne | **Type**: Genomic/Diagnostic Devices
**Total Reports Found**: 12 reports

### Top Reported Problems:

- **False negative result**: 5 reports (41.7%)
- **Software malfunction**: 3 reports (25.0%)

### Sample Reports (showing 3 of 12):

...
```

```
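
The rate-limit and pagination notes in the OpenFDA guide above come down to simple arithmetic, sketched here as a minimal illustration that is not part of BioMCP. `min_interval_seconds` and `page_to_skip` are hypothetical helper names. The per-minute figures are the ones quoted in the guide, and openFDA's currently published limits may differ. The offset calculation assumes the same `(page - 1) * page_size` convention that `search_diseases_with_or` applies to its combined results.

```python
def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest average gap between requests that stays within a per-minute quota."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


def page_to_skip(page: int, limit: int) -> int:
    """Translate a 1-based page number into a 0-based record offset (assumed convention)."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be >= 1")
    return (page - 1) * limit


# Figures quoted in the guide: 40 requests/minute without a key, 240 with one.
print(min_interval_seconds(40))        # 1.5
print(min_interval_seconds(240))       # 0.25
print(page_to_skip(page=2, limit=50))  # 50
```

At 240 requests/minute the average gap is 0.25 s, which matches the 4 requests/second burst figure listed under Rate Limits.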