# Directory Structure

```
├── .github
│   ├── actions
│   │   └── setup-python-env
│   │       └── action.yml
│   ├── dependabot.yml
│   └── workflows
│       ├── ci.yml
│       ├── deploy-docs.yml
│       ├── main.yml.disabled
│       ├── on-release-main.yml
│       └── validate-codecov-config.yml
├── .gitignore
├── .pre-commit-config.yaml
├── BIOMCP_DATA_FLOW.md
├── CHANGELOG.md
├── CNAME
├── codecov.yaml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── apis
│   │   ├── error-codes.md
│   │   ├── overview.md
│   │   └── python-sdk.md
│   ├── assets
│   │   ├── biomcp-cursor-locations.png
│   │   ├── favicon.ico
│   │   ├── icon.png
│   │   ├── logo.png
│   │   ├── mcp_architecture.txt
│   │   └── remote-connection
│   │       ├── 00_connectors.png
│   │       ├── 01_add_custom_connector.png
│   │       ├── 02_connector_enabled.png
│   │       ├── 03_connect_to_biomcp.png
│   │       ├── 04_select_google_oauth.png
│   │       └── 05_success_connect.png
│   ├── backend-services-reference
│   │   ├── 01-overview.md
│   │   ├── 02-biothings-suite.md
│   │   ├── 03-cbioportal.md
│   │   ├── 04-clinicaltrials-gov.md
│   │   ├── 05-nci-cts-api.md
│   │   ├── 06-pubtator3.md
│   │   └── 07-alphagenome.md
│   ├── blog
│   │   ├── ai-assisted-clinical-trial-search-analysis.md
│   │   ├── images
│   │   │   ├── deep-researcher-video.png
│   │   │   ├── researcher-announce.png
│   │   │   ├── researcher-drop-down.png
│   │   │   ├── researcher-prompt.png
│   │   │   ├── trial-search-assistant.png
│   │   │   └── what_is_biomcp_thumbnail.png
│   │   └── researcher-persona-resource.md
│   ├── changelog.md
│   ├── CNAME
│   ├── concepts
│   │   ├── 01-what-is-biomcp.md
│   │   ├── 02-the-deep-researcher-persona.md
│   │   └── 03-sequential-thinking-with-the-think-tool.md
│   ├── developer-guides
│   │   ├── 01-server-deployment.md
│   │   ├── 02-contributing-and-testing.md
│   │   ├── 03-third-party-endpoints.md
│   │   ├── 04-transport-protocol.md
│   │   ├── 05-error-handling.md
│   │   ├── 06-http-client-and-caching.md
│   │   ├── 07-performance-optimizations.md
│   │   └── generate_endpoints.py
│   ├── faq-condensed.md
│   ├── FDA_SECURITY.md
│   ├── genomoncology.md
│   ├── getting-started
│   │   ├── 01-quickstart-cli.md
│   │   ├── 02-claude-desktop-integration.md
│   │   └── 03-authentication-and-api-keys.md
│   ├── how-to-guides
│   │   ├── 01-find-articles-and-cbioportal-data.md
│   │   ├── 02-find-trials-with-nci-and-biothings.md
│   │   ├── 03-get-comprehensive-variant-annotations.md
│   │   ├── 04-predict-variant-effects-with-alphagenome.md
│   │   ├── 05-logging-and-monitoring-with-bigquery.md
│   │   └── 06-search-nci-organizations-and-interventions.md
│   ├── index.md
│   ├── policies.md
│   ├── reference
│   │   ├── architecture-diagrams.md
│   │   ├── quick-architecture.md
│   │   ├── quick-reference.md
│   │   └── visual-architecture.md
│   ├── robots.txt
│   ├── stylesheets
│   │   ├── announcement.css
│   │   └── extra.css
│   ├── troubleshooting.md
│   ├── tutorials
│   │   ├── biothings-prompts.md
│   │   ├── claude-code-biomcp-alphagenome.md
│   │   ├── nci-prompts.md
│   │   ├── openfda-integration.md
│   │   ├── openfda-prompts.md
│   │   ├── pydantic-ai-integration.md
│   │   └── remote-connection.md
│   ├── user-guides
│   │   ├── 01-command-line-interface.md
│   │   ├── 02-mcp-tools-reference.md
│   │   └── 03-integrating-with-ides-and-clients.md
│   └── workflows
│       └── all-workflows.md
├── example_scripts
│   ├── mcp_integration.py
│   └── python_sdk.py
├── glama.json
├── LICENSE
├── lzyank.toml
├── Makefile
├── mkdocs.yml
├── package-lock.json
├── package.json
├── pyproject.toml
├── README.md
├── scripts
│   ├── check_docs_in_mkdocs.py
│   ├── check_http_imports.py
│   └── generate_endpoints_doc.py
├── smithery.yaml
├── src
│   └── biomcp
│       ├── __init__.py
│       ├── __main__.py
│       ├── articles
│       │   ├── __init__.py
│       │   ├── autocomplete.py
│       │   ├── fetch.py
│       │   ├── preprints.py
│       │   ├── search_optimized.py
│       │   ├── search.py
│       │   └── unified.py
│       ├── biomarkers
│       │   ├── __init__.py
│       │   └── search.py
│       ├── cbioportal_helper.py
│       ├── circuit_breaker.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── articles.py
│       │   ├── biomarkers.py
│       │   ├── diseases.py
│       │   ├── health.py
│       │   ├── interventions.py
│       │   ├── main.py
│       │   ├── openfda.py
│       │   ├── organizations.py
│       │   ├── server.py
│       │   ├── trials.py
│       │   └── variants.py
│       ├── connection_pool.py
│       ├── constants.py
│       ├── core.py
│       ├── diseases
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── domain_handlers.py
│       ├── drugs
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── exceptions.py
│       ├── genes
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── http_client_simple.py
│       ├── http_client.py
│       ├── individual_tools.py
│       ├── integrations
│       │   ├── __init__.py
│       │   ├── biothings_client.py
│       │   └── cts_api.py
│       ├── interventions
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── logging_filter.py
│       ├── metrics_handler.py
│       ├── metrics.py
│       ├── openfda
│       │   ├── __init__.py
│       │   ├── adverse_events_helpers.py
│       │   ├── adverse_events.py
│       │   ├── cache.py
│       │   ├── constants.py
│       │   ├── device_events_helpers.py
│       │   ├── device_events.py
│       │   ├── drug_approvals.py
│       │   ├── drug_labels_helpers.py
│       │   ├── drug_labels.py
│       │   ├── drug_recalls_helpers.py
│       │   ├── drug_recalls.py
│       │   ├── drug_shortages_detail_helpers.py
│       │   ├── drug_shortages_helpers.py
│       │   ├── drug_shortages.py
│       │   ├── exceptions.py
│       │   ├── input_validation.py
│       │   ├── rate_limiter.py
│       │   ├── utils.py
│       │   └── validation.py
│       ├── organizations
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── parameter_parser.py
│       ├── prefetch.py
│       ├── query_parser.py
│       ├── query_router.py
│       ├── rate_limiter.py
│       ├── render.py
│       ├── request_batcher.py
│       ├── resources
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── instructions.md
│       │   └── researcher.md
│       ├── retry.py
│       ├── router_handlers.py
│       ├── router.py
│       ├── shared_context.py
│       ├── thinking
│       │   ├── __init__.py
│       │   ├── sequential.py
│       │   └── session.py
│       ├── thinking_tool.py
│       ├── thinking_tracker.py
│       ├── trials
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── nci_getter.py
│       │   ├── nci_search.py
│       │   └── search.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── cancer_types_api.py
│       │   ├── cbio_http_adapter.py
│       │   ├── endpoint_registry.py
│       │   ├── gene_validator.py
│       │   ├── metrics.py
│       │   ├── mutation_filter.py
│       │   ├── query_utils.py
│       │   ├── rate_limiter.py
│       │   └── request_cache.py
│       ├── variants
│       │   ├── __init__.py
│       │   ├── alphagenome.py
│       │   ├── cancer_types.py
│       │   ├── cbio_external_client.py
│       │   ├── cbioportal_mutations.py
│       │   ├── cbioportal_search_helpers.py
│       │   ├── cbioportal_search.py
│       │   ├── constants.py
│       │   ├── external.py
│       │   ├── filters.py
│       │   ├── getter.py
│       │   ├── links.py
│       │   └── search.py
│       └── workers
│           ├── __init__.py
│           ├── worker_entry_stytch.js
│           ├── worker_entry.js
│           └── worker.py
├── tests
│   ├── bdd
│   │   ├── cli_help
│   │   │   ├── help.feature
│   │   │   └── test_help.py
│   │   ├── conftest.py
│   │   ├── features
│   │   │   └── alphagenome_integration.feature
│   │   ├── fetch_articles
│   │   │   ├── fetch.feature
│   │   │   └── test_fetch.py
│   │   ├── get_trials
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── get_variants
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── search_articles
│   │   │   ├── autocomplete.feature
│   │   │   ├── search.feature
│   │   │   ├── test_autocomplete.py
│   │   │   └── test_search.py
│   │   ├── search_trials
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   ├── search_variants
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   └── steps
│   │       └── test_alphagenome_steps.py
│   ├── config
│   │   └── test_smithery_config.py
│   ├── conftest.py
│   ├── data
│   │   ├── ct_gov
│   │   │   ├── clinical_trials_api_v2.yaml
│   │   │   ├── trials_NCT04280705.json
│   │   │   └── trials_NCT04280705.txt
│   │   ├── myvariant
│   │   │   ├── myvariant_api.yaml
│   │   │   ├── myvariant_field_descriptions.csv
│   │   │   ├── variants_full_braf_v600e.json
│   │   │   ├── variants_full_braf_v600e.txt
│   │   │   └── variants_part_braf_v600_multiple.json
│   │   ├── openfda
│   │   │   ├── drugsfda_detail.json
│   │   │   ├── drugsfda_search.json
│   │   │   ├── enforcement_detail.json
│   │   │   └── enforcement_search.json
│   │   └── pubtator
│   │       ├── pubtator_autocomplete.json
│   │       └── pubtator3_paper.txt
│   ├── integration
│   │   ├── test_openfda_integration.py
│   │   ├── test_preprints_integration.py
│   │   ├── test_simple.py
│   │   └── test_variants_integration.py
│   ├── tdd
│   │   ├── articles
│   │   │   ├── test_autocomplete.py
│   │   │   ├── test_cbioportal_integration.py
│   │   │   ├── test_fetch.py
│   │   │   ├── test_preprints.py
│   │   │   ├── test_search.py
│   │   │   └── test_unified.py
│   │   ├── conftest.py
│   │   ├── drugs
│   │   │   ├── __init__.py
│   │   │   └── test_drug_getter.py
│   │   ├── openfda
│   │   │   ├── __init__.py
│   │   │   ├── test_adverse_events.py
│   │   │   ├── test_device_events.py
│   │   │   ├── test_drug_approvals.py
│   │   │   ├── test_drug_labels.py
│   │   │   ├── test_drug_recalls.py
│   │   │   ├── test_drug_shortages.py
│   │   │   └── test_security.py
│   │   ├── test_biothings_integration_real.py
│   │   ├── test_biothings_integration.py
│   │   ├── test_circuit_breaker.py
│   │   ├── test_concurrent_requests.py
│   │   ├── test_connection_pool.py
│   │   ├── test_domain_handlers.py
│   │   ├── test_drug_approvals.py
│   │   ├── test_drug_recalls.py
│   │   ├── test_drug_shortages.py
│   │   ├── test_endpoint_documentation.py
│   │   ├── test_error_scenarios.py
│   │   ├── test_europe_pmc_fetch.py
│   │   ├── test_mcp_integration.py
│   │   ├── test_mcp_tools.py
│   │   ├── test_metrics.py
│   │   ├── test_nci_integration.py
│   │   ├── test_nci_mcp_tools.py
│   │   ├── test_network_policies.py
│   │   ├── test_offline_mode.py
│   │   ├── test_openfda_unified.py
│   │   ├── test_pten_r173_search.py
│   │   ├── test_render.py
│   │   ├── test_request_batcher.py.disabled
│   │   ├── test_retry.py
│   │   ├── test_router.py
│   │   ├── test_shared_context.py.disabled
│   │   ├── test_unified_biothings.py
│   │   ├── thinking
│   │   │   ├── __init__.py
│   │   │   └── test_sequential.py
│   │   ├── trials
│   │   │   ├── test_backward_compatibility.py
│   │   │   ├── test_getter.py
│   │   │   └── test_search.py
│   │   ├── utils
│   │   │   ├── test_gene_validator.py
│   │   │   ├── test_mutation_filter.py
│   │   │   ├── test_rate_limiter.py
│   │   │   └── test_request_cache.py
│   │   ├── variants
│   │   │   ├── constants.py
│   │   │   ├── test_alphagenome_api_key.py
│   │   │   ├── test_alphagenome_comprehensive.py
│   │   │   ├── test_alphagenome.py
│   │   │   ├── test_cbioportal_mutations.py
│   │   │   ├── test_cbioportal_search.py
│   │   │   ├── test_external_integration.py
│   │   │   ├── test_external.py
│   │   │   ├── test_extract_gene_aa_change.py
│   │   │   ├── test_filters.py
│   │   │   ├── test_getter.py
│   │   │   ├── test_links.py
│   │   │   └── test_search.py
│   │   └── workers
│   │       └── test_worker_sanitization.js
│   └── test_pydantic_ai_integration.py
├── THIRD_PARTY_ENDPOINTS.md
├── tox.ini
├── uv.lock
└── wrangler.toml
```

# Files

--------------------------------------------------------------------------------
/tests/tdd/test_drug_approvals.py:
--------------------------------------------------------------------------------

```python
"""Tests for FDA drug approvals module."""

import json
from pathlib import Path
from unittest.mock import AsyncMock, patch

import pytest

from biomcp.openfda.drug_approvals import (
    get_drug_approval,
    search_drug_approvals,
)

# Load mock data
MOCK_DIR = Path(__file__).parent.parent / "data" / "openfda"
MOCK_APPROVALS_SEARCH = json.loads(
    (MOCK_DIR / "drugsfda_search.json").read_text()
)
MOCK_APPROVAL_DETAIL = json.loads(
    (MOCK_DIR / "drugsfda_detail.json").read_text()
)


class TestDrugApprovals:
    """Test drug approvals functionality."""

    @pytest.mark.asyncio
    async def test_search_drug_approvals_success(self):
        """Test successful drug approval search."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (MOCK_APPROVALS_SEARCH, None)

            result = await search_drug_approvals(
                drug="pembrolizumab",
                limit=10,
            )

            assert "FDA Drug Approval Records" in result
            assert "pembrolizumab" in result.lower()
            assert "Application" in result
            assert "BLA125514" in result
            mock_request.assert_called_once()

    @pytest.mark.asyncio
    async def test_search_drug_approvals_with_filters(self):
        """Test drug approval search with multiple filters."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (MOCK_APPROVALS_SEARCH, None)

            result = await search_drug_approvals(
                drug="keytruda",
                application_number="BLA125514",
                approval_year="2014",
                limit=5,
                api_key="test-key",
            )

            assert "FDA Drug Approval Records" in result
            # Verify the API key was passed as the 4th positional argument
            call_args = mock_request.call_args
            assert call_args[0][3] == "test-key"

    @pytest.mark.asyncio
    async def test_search_drug_approvals_no_results(self):
        """Test drug approval search with no results."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = ({"results": []}, None)

            result = await search_drug_approvals(drug="nonexistent-drug")

            assert "No drug approval records found" in result

    @pytest.mark.asyncio
    async def test_search_drug_approvals_api_error(self):
        """Test drug approval search with an API error."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (None, "API rate limit exceeded")

            result = await search_drug_approvals(drug="test")

            assert "Error searching drug approvals" in result
            assert "API rate limit exceeded" in result

    @pytest.mark.asyncio
    async def test_get_drug_approval_success(self):
        """Test getting specific drug approval details."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (MOCK_APPROVAL_DETAIL, None)

            result = await get_drug_approval("BLA125514")

            # Should have detailed approval info
            assert "BLA125514" in result
            assert "Products" in result
            assert "Submission" in result

    @pytest.mark.asyncio
    async def test_get_drug_approval_not_found(self):
        """Test getting a drug approval that doesn't exist."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = ({"results": []}, None)

            result = await get_drug_approval("INVALID123")

            assert "No approval record found" in result
            assert "INVALID123" in result

    @pytest.mark.asyncio
    async def test_get_drug_approval_with_api_key(self):
        """Test getting a drug approval with an API key."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (MOCK_APPROVAL_DETAIL, None)

            result = await get_drug_approval(
                "BLA125514",
                api_key="test-api-key",
            )

            # Should have detailed approval info
            assert "BLA125514" in result or "Drug Approval Details" in result
            # Verify the API key was passed as the 4th positional argument
            call_args = mock_request.call_args
            assert call_args[0][3] == "test-api-key"

    @pytest.mark.asyncio
    async def test_search_drug_approvals_pagination(self):
        """Test drug approval search pagination."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_response = {
                "meta": {"results": {"total": 100}},
                "results": MOCK_APPROVALS_SEARCH["results"],
            }
            mock_request.return_value = (mock_response, None)

            result = await search_drug_approvals(
                drug="cancer",
                limit=10,
                skip=20,
            )

            # Only the total count is checked; the output layout may vary
            assert "100" in result
            # Verify skip was passed in the params dict (2nd positional
            # arg) and serialized as a string
            call_args = mock_request.call_args
            assert call_args[0][1]["skip"] == "20"

    @pytest.mark.asyncio
    async def test_approval_year_validation(self):
        """Test that the approval year becomes a full-year date range."""
        with patch(
            "biomcp.openfda.drug_approvals.make_openfda_request",
            new_callable=AsyncMock,
        ) as mock_request:
            mock_request.return_value = (MOCK_APPROVALS_SEARCH, None)

            await search_drug_approvals(
                approval_year="2023",
            )

            # Check that the year was expanded into a full-year date range
            call_args = mock_request.call_args
            params = call_args[0][1]  # params is the 2nd positional arg
            assert "marketing_status_date" in params["search"]
            assert "[2023-01-01 TO 2023-12-31]" in params["search"]

```

--------------------------------------------------------------------------------
/src/biomcp/articles/fetch.py:
--------------------------------------------------------------------------------

```python
import json
import re
from ssl import TLSVersion
from typing import Annotated, Any

from pydantic import BaseModel, Field, computed_field

from .. import http_client, render
from ..constants import PUBTATOR3_FULLTEXT_URL
from ..http_client import RequestError


class PassageInfo(BaseModel):
    section_type: str | None = Field(
        None,
        description="Type of the section.",
    )
    passage_type: str | None = Field(
        None,
        alias="type",
        description="Type of the passage.",
    )


class Passage(BaseModel):
    info: PassageInfo | None = Field(
        None,
        alias="infons",
    )
    text: str | None = None

    @property
    def section_type(self) -> str:
        section_type = None
        if self.info is not None:
            section_type = self.info.section_type or self.info.passage_type
        section_type = section_type or "UNKNOWN"
        return section_type.upper()

    @property
    def is_title(self) -> bool:
        return self.section_type == "TITLE"

    @property
    def is_abstract(self) -> bool:
        return self.section_type == "ABSTRACT"

    @property
    def is_text(self) -> bool:
        return self.section_type in {
            "INTRO",
            "RESULTS",
            "METHODS",
            "DISCUSS",
            "CONCL",
            "FIG",
            "TABLE",
        }


class Article(BaseModel):
    pmid: int | None = Field(
        None,
        description="PubMed ID of the reference article.",
    )
    pmcid: str | None = Field(
        None,
        description="PubMed Central ID of the reference article.",
    )
    date: str | None = Field(
        None,
        description="Date of the reference article's publication.",
    )
    journal: str | None = Field(
        None,
        description="Journal name.",
    )
    authors: list[str] | None = Field(
        None,
        description="List of authors.",
    )
    passages: list[Passage] = Field(
        ...,
        alias="passages",
        description="List of passages in the reference article.",
        exclude=True,
    )

    @computed_field
    def title(self) -> str:
        lines = []
        for passage in filter(lambda p: p.is_title, self.passages):
            if passage.text:
                lines.append(passage.text)
        return " ... ".join(lines) or f"Article: {self.pmid}"

    @computed_field
    def abstract(self) -> str:
        lines = []
        for passage in filter(lambda p: p.is_abstract, self.passages):
            if passage.text:
                lines.append(passage.text)
        return "\n\n".join(lines) or f"Article: {self.pmid}"

    @computed_field
    def full_text(self) -> str:
        lines = []
        for passage in filter(lambda p: p.is_text, self.passages):
            if passage.text:
                lines.append(passage.text)
        return "\n\n".join(lines) or ""

    @computed_field
    def pubmed_url(self) -> str | None:
        url = None
        if self.pmid:
            url = f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/"
        return url

    @computed_field
    def pmc_url(self) -> str | None:
        """Generates the PMC URL if PMCID exists."""
        url = None
        if self.pmcid:
            url = f"https://www.ncbi.nlm.nih.gov/pmc/articles/{self.pmcid}/"
        return url


class FetchArticlesResponse(BaseModel):
    articles: list[Article] = Field(
        ...,
        alias="PubTator3",
        description="List of full-text articles retrieved from PubTator3.",
    )

    def get_abstract(self, pmid: int | None) -> str | None:
        for article in self.articles:
            if pmid and article.pmid == pmid:
                return str(article.abstract)
        return None


async def call_pubtator_api(
    pmids: list[int],
    full: bool,
) -> tuple[FetchArticlesResponse | None, RequestError | None]:
    """Fetch the text of a list of PubMed IDs."""

    request = {
        "pmids": ",".join(str(pmid) for pmid in pmids),
        "full": str(full).lower(),
    }

    response, error = await http_client.request_api(
        url=PUBTATOR3_FULLTEXT_URL,
        request=request,
        response_model_type=FetchArticlesResponse,
        tls_version=TLSVersion.TLSv1_2,
        domain="pubmed",
    )
    return response, error


async def fetch_articles(
    pmids: list[int],
    full: bool,
    output_json: bool = False,
) -> str:
    """Fetch articles for a list of PubMed IDs as Markdown or JSON."""

    response, error = await call_pubtator_api(pmids, full)

    # PubTator API returns full text even when full=False, so exclude it
    exclude_fields = {"full_text"} if not full else set()

    # noinspection DuplicatedCode
    if error:
        data: list[dict[str, Any]] = [
            {"error": f"Error {error.code}: {error.message}"}
        ]
    else:
        data = [
            article.model_dump(
                mode="json",
                exclude_none=True,
                exclude=exclude_fields,
            )
            for article in (response.articles if response else [])
        ]

    if data and not output_json:
        return render.to_markdown(data)
    else:
        return json.dumps(data, indent=2)


def is_doi(identifier: str) -> bool:
    """Check if the identifier is a DOI."""
    # DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix
    doi_pattern = r"^10\.\d{4,9}/[\-._;()/:\w]+$"
    return bool(re.match(doi_pattern, str(identifier)))


def is_pmid(identifier: str) -> bool:
    """Check if the identifier is a PubMed ID."""
    # PMID is a numeric string
    return str(identifier).isdigit()


async def _article_details(
    call_benefit: Annotated[
        str,
        "Define and summarize why this function is being called and the intended benefit",
    ],
    pmid,
) -> str:
    """
    Retrieves details for a single article given its identifier.

    Parameters:
    - call_benefit: Define and summarize why this function is being called and the intended benefit
    - pmid: An article identifier - either a PubMed ID (e.g., 34397683) or DOI (e.g., 10.1101/2024.01.20.23288905)

    Process:
    - For PMIDs: Calls the PubTator3 API to fetch the article's title, abstract, and full text (if available)
    - For DOIs: Calls the Europe PMC API to fetch preprint details

    Output: A JSON formatted string containing the retrieved article content.
    """
    identifier = str(pmid)

    # Check if it's a DOI (Europe PMC preprint)
    if is_doi(identifier):
        from .preprints import fetch_europe_pmc_article

        return await fetch_europe_pmc_article(identifier, output_json=True)
    # Check if it's a PMID (PubMed article)
    elif is_pmid(identifier):
        return await fetch_articles(
            [int(identifier)], full=True, output_json=True
        )
    else:
        # Unknown identifier format
        return json.dumps(
            [
                {
                    "error": f"Invalid identifier format: {identifier}. Expected either a PMID (numeric) or DOI (10.xxxx/xxxx format)."
                }
            ],
            indent=2,
        )

```

--------------------------------------------------------------------------------
/docs/concepts/02-the-deep-researcher-persona.md:
--------------------------------------------------------------------------------

````markdown
# The Deep Researcher Persona

## Overview

The Deep Researcher Persona is a core philosophy of BioMCP that transforms AI assistants into systematic biomedical research partners. This persona embodies the methodical approach of a dedicated biomedical researcher, enabling AI agents to conduct thorough literature reviews, analyze complex datasets, and synthesize findings into actionable insights.

## Why the Deep Researcher Persona?

Traditional AI interactions often result in surface-level responses. The Deep Researcher Persona addresses this by:

- **Enforcing Systematic Thinking**: Requiring the use of the `think` tool before any research operation
- **Preventing Premature Conclusions**: Breaking complex queries into manageable research steps
- **Ensuring Comprehensive Analysis**: Following a structured 10-step methodology
- **Maintaining Research Rigor**: Documenting thought processes and decision rationale

## Core Traits and Personality

The Deep Researcher embodies these characteristics:

- **Curious and Methodical**: Always seeking deeper understanding through systematic investigation
- **Evidence-Based**: Grounding all conclusions in concrete data from multiple sources
- **Professional Voice**: Clear, concise scientific communication
- **Collaborative**: Working as a research partner, not just an information retriever
- **Objective**: Presenting balanced findings, including contradictory evidence

## The 10-Step Sequential Thinking Process

This methodology ensures comprehensive research coverage:

### 1. Problem Definition and Scope

- Parse the research question to identify key concepts
- Define clear objectives and expected deliverables
- Establish research boundaries and constraints

### 2. Initial Knowledge Assessment

- Evaluate existing knowledge on the topic
- Identify knowledge gaps requiring investigation
- Form initial hypotheses to guide research

### 3. Search Strategy Development

- Design comprehensive search queries
- Select appropriate databases and tools
- Plan iterative search refinements

### 4. Data Collection and Retrieval

- Execute searches across multiple sources (PubTator3, ClinicalTrials.gov, variant databases)
- Collect relevant articles, trials, and annotations
- Document search parameters and results

### 5. Quality Assessment and Filtering

- Evaluate source credibility and relevance
- Apply inclusion/exclusion criteria
- Prioritize high-impact findings

### 6. Information Extraction

- Extract key findings, methodologies, and conclusions
- Identify patterns and relationships
- Note contradictions and uncertainties

### 7. Synthesis and Integration

- Combine findings from multiple sources
- Resolve contradictions when possible
- Build a coherent narrative from the evidence

### 8. Critical Analysis

- Evaluate the strength of evidence
- Identify limitations and biases
- Consider alternative interpretations

### 9. Knowledge Synthesis

- Create a structured summary of findings
- Highlight key insights and implications
- Prepare actionable recommendations

### 10. Communication and Reporting

- Format findings for the target audience
- Include proper citations and references
- Provide clear next steps

## Mandatory Think Tool Usage

**CRITICAL**: The `think` tool must ALWAYS be used first before any BioMCP operation. This is not optional.

```python
# Correct pattern - ALWAYS start with think
think(thought="Breaking down the research question...", thoughtNumber=1)
# Then proceed with searches
article_searcher(genes=["BRAF"], diseases=["melanoma"])

# INCORRECT - Never skip the think step
article_searcher(genes=["BRAF"])  # ❌ Will produce suboptimal results
```

## Implementation in Practice

### Example Research Flow

1. **User Query**: "What are the treatment options for BRAF V600E melanoma?"

2. **Think Step 1**: Problem decomposition

   ```
   think(thought="Breaking down query: Need to find 1) BRAF V600E mutation significance, 2) current treatments, 3) clinical trials", thoughtNumber=1)
   ```

3. **Think Step 2**: Search strategy

   ```
   think(thought="Will search articles for BRAF inhibitors, then trials for V600E-specific treatments", thoughtNumber=2)
   ```

4. **Execute Searches**: Following the planned strategy
5. **Synthesize**: Combine findings into a comprehensive brief

### Research Brief Format

Every research session concludes with a structured brief:

```markdown
## Research Brief: [Topic]

### Executive Summary

- 3-5 bullet points of key findings
- Clear, actionable insights

### Detailed Findings

1. **Literature Review** (X papers analyzed)

   - Key discoveries
   - Consensus findings
   - Contradictions noted

2. **Clinical Evidence** (Y trials reviewed)

   - Current treatment landscape
   - Emerging therapies
   - Trial enrollment opportunities

3. **Molecular Insights**
   - Variant annotations
   - Pathway implications
   - Biomarker relevance

### Recommendations

- Evidence-based suggestions
- Areas for further investigation
- Clinical considerations

### References

- Full citations for all sources
- Direct links to primary data
```

## Tool Inventory and Usage

The Deep Researcher has access to 24 specialized tools, including:

### Core Research Tools

- **think**: Sequential reasoning and planning
- **article_searcher**: PubMed/PubTator3 literature search
- **trial_searcher**: Clinical trials discovery
- **variant_searcher**: Genetic variant annotations

### Specialized Analysis Tools

- **gene_getter**: Gene function and pathway data
- **drug_getter**: Medication information
- **disease_getter**: Disease ontology and synonyms
- **alphagenome_predictor**: Variant effect prediction

### Integration Features

- **Automatic cBioPortal Integration**: Cancer genomics context for all gene searches
- **BioThings Suite Access**: Real-time biomedical annotations
- **NCI Database Integration**: Comprehensive cancer trial data

## Best Practices

1. **Always Think First**: Never skip the sequential thinking process
2. **Use Multiple Sources**: Cross-reference findings across databases
3. **Document Reasoning**: Explain why certain searches or filters were chosen
4. **Consider Context**: Account for disease stage, prior treatments, and patient factors
5. **Stay Current**: Leverage preprint integration for the latest findings

## Community Impact

The Deep Researcher Persona has transformed how researchers interact with biomedical data:

- **Reduced Research Time**: From days to minutes for comprehensive reviews
- **Improved Accuracy**: Systematic approach reduces missed connections
- **Enhanced Collaboration**: Consistent methodology enables team research
- **Democratized Access**: Complex research capabilities available to all

## Getting Started

To use the Deep Researcher Persona:

1. Ensure BioMCP is installed and configured
2. Load the persona resource when starting your AI session
3. Always begin research queries with the think tool
4. Follow the 10-step methodology for comprehensive results

Remember: The Deep Researcher Persona is not just a tool configuration; it is a systematic approach to biomedical research that ensures thorough, evidence-based insights every time.

````

--------------------------------------------------------------------------------
/src/biomcp/render.py:
--------------------------------------------------------------------------------

```python
import json
import re
import textwrap
from typing import Any

MAX_WIDTH = 72

# Collapses any run of whitespace into a single space
REMOVE_MULTI_LINES = re.compile(r"\s+")


def dedupe_list_keep_order(lst: list[Any]) -> list[Any]:
    """
    Remove duplicates from a list while preserving order.
    Elements are compared by their string form, so unhashable
    elements such as dicts are supported.
    """
    seen = set()
    data = []
    for x in lst:
        if str(x) not in seen:
            data.append(x)
            seen.add(str(x))
    return data


def to_markdown(data: str | list | dict) -> str:
    """Convert a JSON string or already-parsed data (dict or list) into
    a simple Markdown representation.

    :param data: The input data, either as a JSON string, or a parsed list/dict.
    :return: A string containing the generated Markdown output.
    """
    if isinstance(data, str):
        data = json.loads(data)

    if isinstance(data, list):
        new_data = []
        for index, item in enumerate(data, start=1):
            new_data.append({f"Record {index}": item})
        data = new_data

    lines: list[str] = []
    process_any(data, [], lines)
    return ("\n".join(lines)).strip() + "\n"


def wrap_preserve_newlines(text: str, width: int) -> list[str]:
    """For each line in the text (split by newlines), wrap it to 'width' columns.
    Blank lines are preserved. Returns a list of wrapped lines without
    inserting extra blank lines.

    :param text: The multiline string to wrap.
    :param width: Maximum line width for wrapping.
    :return: A list of lines after wrapping.
    """
    wrapped_lines: list[str] = []
    for line in text.splitlines(keepends=False):
        if not line.strip():
            wrapped_lines.append("")
            continue
        # remove excessive spaces (pmid=38296628)
        line = REMOVE_MULTI_LINES.sub(" ", line)
        pieces = textwrap.wrap(line, width=width)
        wrapped_lines.extend(pieces)
    return wrapped_lines


def append_line(lines: list[str], line: str) -> None:
    """Append a line to 'lines' after stripping trailing whitespace.

    :param lines: The running list of lines to which we add.
    :param line: The line to append.
    """
    line = line.rstrip()
    lines.append(line)


def process_any(
    value: Any,
    path_keys: list[str],
    lines: list[str],
) -> None:
    """Dispatch function to handle dict, list, or scalar (str/int/float/bool).

    :param value: The current JSON data node.
    :param path_keys: The list of keys leading to this node (for headings).
    :param lines: The running list of output Markdown lines.
    """
    if isinstance(value, dict):
        process_dict(value, path_keys, lines)
    elif isinstance(value, list):
        process_list(value, path_keys, lines)
    elif value is not None:
        render_key_value(lines, path_keys[-1], value)


def process_dict(dct: dict, path_keys: list[str], lines: list[str]) -> None:
    """Handle a dictionary by printing a heading for the current path (if any),
    then processing key/value pairs in order: scalars first, then nested dicts, then lists.

    :param dct: The dictionary to process.
    :param path_keys: The list of keys leading to this dict (for heading).
    :param lines: The running list of output Markdown lines.
    """
    if path_keys:
        level = min(len(path_keys), 5)
        heading_hash = "#" * level
        heading_text = transform_key(path_keys[-1])
        # Blank line, then heading
        append_line(lines, "")
        append_line(lines, f"{heading_hash} {heading_text}")

    # Group keys by value type
    scalar_keys = []
    dict_keys = []
    list_keys = []

    for key, val in dct.items():
        if isinstance(val, str | int | float | bool) or val is None:
            scalar_keys.append(key)
        elif isinstance(val, dict):
            dict_keys.append(key)
        elif isinstance(val, list):
            list_keys.append(key)

    # Process scalars first
    for key in scalar_keys:
        next_path = path_keys + [key]
        process_any(dct[key], next_path, lines)

    # Process dicts second
    for key in dict_keys:
        next_path = path_keys + [key]
        process_any(dct[key], next_path, lines)

    # Process lists last
    for key in list_keys:
        next_path = path_keys + [key]
        process_any(dct[key], next_path, lines)


def process_list(lst: list, path_keys: list[str], lines: list[str]) -> None:
    """If every item is a scalar, render the items on one line when they fit,
    or as bullet points otherwise. Lists containing non-scalar items are
    processed recursively, item by item.

    :param lst: The list of items to process.
    :param path_keys: The keys leading to this list.
    :param lines: The running list of Markdown lines.
    """
    all_scalars = all(isinstance(i, str | int | float | bool) for i in lst)
    lst = dedupe_list_keep_order(lst)
    if path_keys and all_scalars:
        key = path_keys[-1]
        process_scalar_list(key, lines, lst)
    else:
        for item in lst:
            process_any(item, path_keys, lines)


def process_scalar_list(key: str, lines: list[str], lst: list) -> None:
    """Print a list of scalars either on one line as "Key: item1, item2, ..."
    if it fits within MAX_WIDTH, otherwise print a bullet list.

    :param key: The key name for this list of scalars.
    :param lines: The running list of Markdown lines.
    :param lst: The actual list of scalar items.
    """
    label = transform_key(key)
    items_str = ", ".join(str(item) for item in lst)
    single_line = f"{label}: {items_str}"
    if len(single_line) <= MAX_WIDTH:
        append_line(lines, single_line)
    else:
        # bullet list
        append_line(lines, f"{label}:")
        for item in lst:
            bullet = f"- {item}"
            append_line(lines, bullet)


def render_key_value(lines: list[str], key: str, value: Any) -> None:
    """Render a single "key: value" pair. If the value is a long string,
    we do multiline wrapping with an indentation for clarity. Otherwise,
    it appears on the same line.

    :param lines: The running list of Markdown lines.
    :param key: The raw key name (untransformed).
    :param value: The value associated with this key.
    """
    label = transform_key(key)
    val_str = str(value)

    # If the value is a fairly long string, do multiline
    if isinstance(value, str) and len(value) > MAX_WIDTH:
        append_line(lines, f"{label}:")
        for wrapped in wrap_preserve_newlines(val_str, MAX_WIDTH):
            append_line(lines, "  " + wrapped)
    else:
        append_line(lines, f"{label}: {val_str}")


def transform_key(s: str) -> str:
    """Turn a raw key such as "publication_state" or "geneSymbol" into a
    title-cased label ("Publication State", "Gene Symbol")."""
    # Replace underscores with spaces.
    s = s.replace("_", " ")
    # Split an acronym from a following capitalized word ("HTTPServer" -> "HTTP Server").
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", s)
    # Insert a space between a lowercase letter or digit and an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", s)

    words = s.split()
    transformed_words = []
    for word in words:
        transformed_words.append(word.capitalize())
    return " ".join(transformed_words)
```

--------------------------------------------------------------------------------
/docs/getting-started/02-claude-desktop-integration.md:
--------------------------------------------------------------------------------

```markdown
# Claude Desktop Integration

This guide covers how to integrate BioMCP with Claude Desktop, enabling AI-powered biomedical research directly in your Claude conversations.

## Prerequisites

- [Claude Desktop](https://claude.ai/download) application
- One of the following:
  - **Option A**: Python 3.10+ and [uv](https://docs.astral.sh/uv/) (recommended)
  - **Option B**: [Docker](https://www.docker.com/products/docker-desktop/)

## Installation Methods

### Option A: Using uv (Recommended)

This method is the fastest and easiest for most users.

#### 1. Install uv

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

#### 2. Configure Claude Desktop

Add BioMCP to your Claude Desktop configuration file:

- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "biomcp": {
      "command": "uv",
      "args": ["run", "--with", "biomcp-python", "biomcp", "run"],
      "env": {
        "NCI_API_KEY": "your-nci-api-key-here",
        "ALPHAGENOME_API_KEY": "your-alphagenome-key-here",
        "CBIO_TOKEN": "your-cbioportal-token-here"
      }
    }
  }
}
```

### Option B: Using Docker

This method provides better isolation and consistency across systems.

#### 1. Create a Dockerfile

Create a file named `Dockerfile`:

```dockerfile
FROM python:3.11-slim

# Install BioMCP
RUN pip install biomcp-python

# Set the entrypoint
ENTRYPOINT ["biomcp", "run"]
```

#### 2. Build the Docker Image

```bash
docker build -t biomcp:latest .
```

#### 3. Configure Claude Desktop

Add BioMCP to your configuration file. Docker does not copy the `env` block into the container by itself, so each variable is forwarded with a `-e NAME` flag in `args`:

```json
{
  "mcpServers": {
    "biomcp": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "NCI_API_KEY",
        "-e", "ALPHAGENOME_API_KEY",
        "-e", "CBIO_TOKEN",
        "biomcp:latest"
      ],
      "env": {
        "NCI_API_KEY": "your-nci-api-key-here",
        "ALPHAGENOME_API_KEY": "your-alphagenome-key-here",
        "CBIO_TOKEN": "your-cbioportal-token-here"
      }
    }
  }
}
```

## Verification

1. Restart Claude Desktop after updating the configuration
2. Start a new conversation
3. Look for the 🔌 icon indicating MCP is connected
4. Test with: "Can you search for articles about BRAF mutations in melanoma?"
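
If the connection does not appear, confirm that the configuration file parses and contains a `biomcp` entry before restarting again. Invalid JSON, such as a trailing comma, prevents the server from loading. The sketch below is not part of BioMCP. `check_biomcp_config` and `default_config_path` are hypothetical helper names. The checks (a string `command`, an `args` array, string `env` values) are assumptions drawn from the examples in this guide, not from Claude Desktop's full schema.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Return the config path documented above for macOS or Windows."""
    if sys.platform == "darwin":
        return (
            Path.home()
            / "Library"
            / "Application Support"
            / "Claude"
            / "claude_desktop_config.json"
        )
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("This guide only documents macOS and Windows paths")


def check_biomcp_config(text: str, server: str = "biomcp") -> list[str]:
    """Return a list of problems found in the config text (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Trailing commas and unquoted keys are reported here.
        return [f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    # Look up mcpServers -> <server> defensively in case either level is missing.
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    entry = servers.get(server) if isinstance(servers, dict) else None
    if not isinstance(entry, dict):
        return [f'No "{server}" entry under "mcpServers"']

    problems = []
    if not isinstance(entry.get("command"), str):
        problems.append('"command" must be a string such as "uv" or "docker"')
    if not isinstance(entry.get("args"), list):
        problems.append('"args" must be a JSON array')
    env = entry.get("env", {})
    if not isinstance(env, dict) or not all(isinstance(v, str) for v in env.values()):
        problems.append('"env" must map variable names to string values')
    return problems
```

For example, `check_biomcp_config(default_config_path().read_text(encoding="utf-8"))` should return an empty list for the uv configuration shown above. A trailing comma produces a single "Invalid JSON" message that gives its line and column.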

## Setting Up API Keys

While BioMCP works without API keys, some features require them for full functionality:

### NCI API Key (Optional)

Enables access to NCI's clinical trials database with advanced filters:

- Get your key from [NCI API Portal](https://api.cancer.gov)
- Add to configuration as `NCI_API_KEY`

### AlphaGenome API Key (Optional)

Enables variant effect predictions using Google DeepMind's AlphaGenome:

- Register at [AlphaGenome Portal](https://alphagenome.google.com)
- Add to configuration as `ALPHAGENOME_API_KEY`

### cBioPortal Token (Optional)

Enables enhanced cancer genomics queries:

- Get token from [cBioPortal](https://www.cbioportal.org/webAPI)
- Add to configuration as `CBIO_TOKEN`

## Usage Examples

Once configured, you can ask Claude to perform various biomedical research tasks:

### Literature Search

```
"Find recent articles about CAR-T therapy for B-cell lymphomas"
```

### Clinical Trials

```
"Search for actively recruiting trials for EGFR-mutant lung cancer"
```

### Variant Analysis

```
"What is known about the pathogenicity of BRCA1 c.5266dupC?"
```

### Drug Information

```
"Tell me about the mechanism of action and indications for pembrolizumab"
```

### Complex Research

```
"I need a comprehensive analysis of treatment options for a patient with
BRAF V600E melanoma who has progressed on dabrafenib/trametinib"
```

## The Deep Researcher Persona

BioMCP includes a specialized "Deep Researcher" persona that enhances Claude's biomedical research capabilities:

- **Sequential Thinking**: Automatically uses the `think` tool for systematic analysis
- **Comprehensive Coverage**: Searches multiple databases and synthesizes findings
- **Evidence-Based**: Provides citations and links to primary sources
- **Clinical Focus**: Understands medical context and terminology

To activate, simply ask biomedical questions naturally. The persona automatically engages for research tasks.

## Troubleshooting

### "MCP Connection Failed"

1. Verify the configuration file path is correct
2. Check JSON syntax (no trailing commas)
3. Ensure Claude Desktop has been restarted
4. Check that uv or Docker is properly installed

### "Command Not Found"

**For uv**:

```bash
# Verify uv installation
uv --version

# Ensure PATH includes uv
echo $PATH | grep -q "\.local/bin" || echo "PATH needs updating"
```

**For Docker**:

```bash
# Verify Docker is running
docker ps

# Test BioMCP container
docker run -it --rm biomcp:latest --help
```

### "No Results Found"

- Check your internet connection
- Verify API keys are correctly set (if using optional features)
- Try simpler queries first
- Use official gene symbols (e.g., "TP53" not "p53")

### Performance Issues

**For uv**:

- First run may be slow due to package downloads
- Subsequent runs use cached environments

**For Docker**:

- Ensure Docker has sufficient memory allocated
- Consider building with `--platform` flag for Apple Silicon

## Advanced Configuration

### Custom Environment Variables

Add any additional environment variables your research requires:

```json
{
  "mcpServers": {
    "biomcp": {
      "command": "uv",
      "args": ["run", "--with", "biomcp-python", "biomcp", "run"],
      "env": {
        "BIOMCP_LOG_LEVEL": "DEBUG",
        "BIOMCP_CACHE_DIR": "/path/to/cache",
        "HTTP_PROXY": "http://your-proxy:8080"
      }
    }
  }
}
```

### Multiple Configurations

You can run multiple BioMCP instances with different settings:

```json
{
  "mcpServers": {
    "biomcp-prod": {
      "command": "uv",
      "args": ["run", "--with", "biomcp-python", "biomcp", "run"],
      "env": {
        "BIOMCP_ENV": "production"
      }
    },
    "biomcp-dev": {
      "command": "uv",
      "args": ["run", "--with", "biomcp-python@latest", "biomcp", "run"],
      "env": {
        "BIOMCP_ENV": "development",
        "BIOMCP_LOG_LEVEL": "DEBUG"
      }
    }
  }
}
```

## Best Practices

1. **Start Simple**: Test with basic queries before complex research tasks
2. **Be Specific**: Use official gene symbols and disease names
3. **Iterate**: Refine queries based on initial results
4. **Verify Sources**: Always check the provided citations
5. **Save Important Findings**: Export the conversation or copy key results

## Getting Help

- **Documentation**: [BioMCP Docs](https://github.com/genomoncology/biomcp)
- **Issues**: [GitHub Issues](https://github.com/genomoncology/biomcp/issues)
- **Community**: [Discussions](https://github.com/genomoncology/biomcp/discussions)

## Next Steps

Now that BioMCP is integrated with Claude Desktop:

1. Try the [example queries](#usage-examples) above
2. Explore [How-to Guides](../how-to-guides/01-find-articles-and-cbioportal-data.md) for specific research workflows
3. Learn about [Sequential Thinking](../concepts/03-sequential-thinking-with-the-think-tool.md) for complex analyses
4. Set up [additional API keys](03-authentication-and-api-keys.md) for enhanced features
```

--------------------------------------------------------------------------------
/src/biomcp/articles/unified.py:
--------------------------------------------------------------------------------

```python
"""Unified article search combining PubMed and preprint sources."""

import asyncio
import json
import logging
from collections.abc import Coroutine
from typing import Any

from .. import render
from .preprints import search_preprints
from .search import PubmedRequest, search_articles

logger = logging.getLogger(__name__)


def _deduplicate_articles(articles: list[dict]) -> list[dict]:
    """Remove duplicate articles based on DOI."""
    seen_dois = set()
    unique_articles = []
    for article in articles:
        doi = article.get("doi")
        if doi and doi in seen_dois:
            continue
        if doi:
            seen_dois.add(doi)
        unique_articles.append(article)
    return unique_articles


def _parse_search_results(results: list) -> list[dict]:
    """Parse search results from JSON strings."""
    all_articles = []
    for result in results:
        if isinstance(result, str):
            try:
                articles = json.loads(result)
                if isinstance(articles, list):
                    all_articles.extend(articles)
            except json.JSONDecodeError:
                continue
    return all_articles


async def _extract_mutation_pattern(
    keywords: list[str],
) -> tuple[str | None, str | None]:
    """Extract mutation pattern from keywords asynchronously."""
    if not keywords:
        return None, None

    # Use asyncio.to_thread for CPU-bound regex operations
    import re

    def _extract_sync():
        for keyword in keywords:
            # Check for specific mutations (e.g., F57Y, V600E)
            if re.match(r"^[A-Z]\d+[A-Z*]$", keyword):
                if keyword.endswith("*"):
                    return keyword, None  # mutation_pattern
                else:
                    return None, keyword  # specific_mutation
        return None, None

    # Run CPU-bound operation in thread pool
    return await asyncio.to_thread(_extract_sync)


async def _get_mutation_summary(
    gene: str, mutation: str | None, pattern: str | None
) -> str | None:
    """Get mutation-specific cBioPortal summary."""
    from ..variants.cbioportal_mutations import (
        CBioPortalMutationClient,
        format_mutation_search_result,
    )

    mutation_client = CBioPortalMutationClient()

    if mutation:
        logger.info(f"Searching for specific mutation {gene} {mutation}")
        result = await mutation_client.search_specific_mutation(
            gene=gene, mutation=mutation, max_studies=20
        )
    else:
        logger.info(f"Searching for mutation pattern {gene} {pattern}")
        result = await mutation_client.search_specific_mutation(
            gene=gene, pattern=pattern, max_studies=20
        )

    return format_mutation_search_result(result) if result else None


async def _get_gene_summary(gene: str) -> str | None:
    """Get regular gene cBioPortal summary."""
    from ..variants.cbioportal_search import (
        CBioPortalSearchClient,
        format_cbioportal_search_summary,
    )

    client = CBioPortalSearchClient()
    summary = await client.get_gene_search_summary(gene, max_studies=5)
    return format_cbioportal_search_summary(summary) if summary else None


async def _get_cbioportal_summary(request: PubmedRequest) -> str | None:
    """Get cBioPortal summary for the search request."""
    if not request.genes:
        return None

    try:
        gene = request.genes[0]
        mutation_pattern, specific_mutation = await _extract_mutation_pattern(
            request.keywords
        )

        if specific_mutation or mutation_pattern:
            return await _get_mutation_summary(
                gene, specific_mutation, mutation_pattern
            )
        else:
            return await _get_gene_summary(gene)

    except Exception as e:
        logger.warning(
            f"Failed to get cBioPortal summary for gene search: {e}"
        )
        return None


async def search_articles_unified(  # noqa: C901
    request: PubmedRequest,
    include_pubmed: bool = True,
    include_preprints: bool = False,
    include_cbioportal: bool = True,
    output_json: bool = False,
) -> str:
    """Search for articles across PubMed and preprint sources."""
    # Import here to avoid circular imports
    from ..shared_context import SearchContextManager

    # Use shared context to avoid redundant validations
    with SearchContextManager() as context:
        # Pre-validate genes once
        if request.genes:
            valid_genes = []
            for gene in request.genes:
                if await context.validate_gene(gene):
                    valid_genes.append(gene)
            request.genes = valid_genes

        tasks: list[Coroutine[Any, Any, Any]] = []
        task_labels = []

        if include_pubmed:
            tasks.append(search_articles(request, output_json=True))
            task_labels.append("pubmed")

        if include_preprints:
            tasks.append(search_preprints(request, output_json=True))
            task_labels.append("preprints")

        # Add cBioPortal to parallel execution
        if include_cbioportal and request.genes:
            tasks.append(_get_cbioportal_summary(request))
            task_labels.append("cbioportal")

        if not tasks:
            return json.dumps([]) if output_json else render.to_markdown([])

        # Run all operations in parallel
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Create result map for easier processing
        result_map = dict(zip(task_labels, results, strict=False))

        # Extract cBioPortal summary if it was included
        cbioportal_summary: str | None = None
        if "cbioportal" in result_map:
            result = result_map["cbioportal"]
            if not isinstance(result, Exception) and isinstance(result, str):
                cbioportal_summary = result

        # Parse article search results
        article_results = []
        for label, result in result_map.items():
            if label != "cbioportal" and not isinstance(result, Exception):
                article_results.append(result)

        # Parse and deduplicate results
        all_articles = _parse_search_results(article_results)
        unique_articles = _deduplicate_articles(all_articles)

        # Sort peer-reviewed articles first, then newest first by date.
        # With reverse=True the larger key sorts first, so peer-reviewed maps to 1.
        unique_articles.sort(
            key=lambda x: (
                1
                if x.get("publication_state", "peer_reviewed")
                == "peer_reviewed"
                else 0,
                x.get("date") or "0000-00-00",
            ),
            reverse=True,
        )

        if unique_articles and not output_json:
            result = render.to_markdown(unique_articles)
            if cbioportal_summary and isinstance(cbioportal_summary, str):
                # Add cBioPortal summary at the beginning
                result = cbioportal_summary + "\n\n---\n\n" + result
            return result
        else:
            if cbioportal_summary:
                return json.dumps(
                    {
                        "cbioportal_summary": cbioportal_summary,
                        "articles": unique_articles,
                    },
                    indent=2,
                )
            return json.dumps(unique_articles, indent=2)
```

--------------------------------------------------------------------------------
/src/biomcp/openfda/adverse_events.py:
--------------------------------------------------------------------------------

```python
"""
OpenFDA Drug Adverse Events (FAERS) integration.
"""

import logging

from .adverse_events_helpers import (
    format_drug_details,
    format_reaction_details,
    format_report_metadata,
    format_report_summary,
    format_search_summary,
    format_top_reactions,
)
from .constants import (
    OPENFDA_DEFAULT_LIMIT,
    OPENFDA_DISCLAIMER,
    OPENFDA_DRUG_EVENTS_URL,
    OPENFDA_MAX_LIMIT,
)
from .exceptions import (
    OpenFDAConnectionError,
    OpenFDARateLimitError,
    OpenFDATimeoutError,
)
from .input_validation import sanitize_input
from .utils import clean_text, make_openfda_request

logger = logging.getLogger(__name__)


def _build_search_query(
    drug: str | None, reaction: str | None, serious: bool | None
) -> str:
    """Build the search query for adverse events."""
    search_parts = []

    if drug:
        # Sanitize drug input to prevent injection
        drug = sanitize_input(drug, max_length=100)
        if drug:
            drug_query = (
                f'(patient.drug.medicinalproduct:"{drug}" OR '
                f'patient.drug.openfda.brand_name:"{drug}" OR '
                f'patient.drug.openfda.generic_name:"{drug}")'
            )
            search_parts.append(drug_query)

    if reaction:
        # Sanitize reaction input
        reaction = sanitize_input(reaction, max_length=200)
        if reaction:
            search_parts.append(
                f'patient.reaction.reactionmeddrapt:"{reaction}"'
            )

    if serious is not None:
        serious_value = "1" if serious else "2"
        search_parts.append(f"serious:{serious_value}")

    return " AND ".join(search_parts)


async def search_adverse_events(  # noqa: C901
    drug: str | None = None,
    reaction: str | None = None,
    serious: bool | None = None,
    limit: int = OPENFDA_DEFAULT_LIMIT,
    skip: int = 0,
    api_key: str | None = None,
) -> str:
    """
    Search FDA adverse event reports (FAERS).

    Args:
        drug: Drug name to search for
        reaction: Adverse reaction term to search for
        serious: If True, only serious events; if False, only non-serious
            events; if None, no seriousness filter
        limit: Maximum number of results
        skip: Number of results to skip
        api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)

    Returns:
        Formatted string with adverse event information
    """
    if not drug and not reaction:
        return (
            "⚠️ Please specify either a drug name or reaction term to search "
            "adverse events.\n\n"
            "Examples:\n"
            "- Search by drug: --drug 'imatinib'\n"
            "- Search by reaction: --reaction 'nausea'\n"
            "- Both: --drug 'imatinib' --reaction 'nausea'"
        )

    # Build and execute search
    search_query = _build_search_query(drug, reaction, serious)
    params = {
        "search": search_query,
        "limit": min(limit, OPENFDA_MAX_LIMIT),
        "skip": skip,
    }

    try:
        response, error = await make_openfda_request(
            OPENFDA_DRUG_EVENTS_URL, params, "openfda_adverse_events", api_key
        )
    except OpenFDARateLimitError:
        return (
            "⚠️ **FDA API Rate Limit Exceeded**\n\n"
            "You've exceeded the FDA's rate limit. Options:\n"
            "• Wait a moment and try again\n"
            "• Provide an FDA API key for higher daily limits "
            "(120,000/day vs 1,000/day)\n"
            "• Get a free key at: https://open.fda.gov/apis/authentication/"
        )
    except OpenFDATimeoutError:
        return (
            "⏱️ **Request Timeout**\n\n"
            "The FDA API is taking too long to respond. This may be due to:\n"
            "• High server load\n"
            "• Complex query\n"
            "• Network issues\n\n"
            "Please try again in a moment."
        )
    except OpenFDAConnectionError as e:
        return (
            "🔌 **Connection Error**\n\n"
            f"Unable to connect to FDA API: {e}\n\n"
            "Please check your internet connection and try again."
        )

    if error:
        return f"⚠️ Error searching adverse events: {error}"

    if not response or not response.get("results"):
        search_desc = []
        if drug:
            search_desc.append(f"drug '{drug}'")
        if reaction:
            search_desc.append(f"reaction '{reaction}'")
        return (
            f"No adverse event reports found for {' and '.join(search_desc)}."
        )

    results = response["results"]
    total = (
        response.get("meta", {}).get("results", {}).get("total", len(results))
    )

    # Build output
    output = ["## FDA Adverse Event Reports\n"]
    output.extend(format_search_summary(drug, reaction, serious, total))

    # Add top reactions if searching by drug
    if drug and not reaction:
        output.extend(format_top_reactions(results))

    # Add sample reports
    output.append(
        f"### Sample Reports (showing {min(len(results), 3)} of {total}):\n"
    )
    for i, result in enumerate(results[:3], 1):
        output.extend(format_report_summary(result, i))

    output.append(f"\n{OPENFDA_DISCLAIMER}")
    return "\n".join(output)


async def get_adverse_event(report_id: str, api_key: str | None = None) -> str:
    """
    Get detailed information for a specific adverse event report.

    Args:
        report_id: Safety report ID
        api_key: Optional OpenFDA API key (overrides OPENFDA_API_KEY env var)

    Returns:
        Formatted string with detailed report information
    """
    params = {
        "search": f'safetyreportid:"{report_id}"',
        "limit": 1,
    }

    response, error = await make_openfda_request(
        OPENFDA_DRUG_EVENTS_URL,
        params,
        "openfda_adverse_event_detail",
        api_key,
    )

    if error:
        return f"⚠️ Error retrieving adverse event report: {error}"

    if not response or not response.get("results"):
        return f"Adverse event report '{report_id}' not found."

    result = response["results"][0]
    patient = result.get("patient", {})

    # Build detailed output
    output = [f"## Adverse Event Report: {report_id}\n"]

    # Patient Information
    output.extend(_format_patient_info(patient))

    # Drug Information
    if drugs := patient.get("drug", []):
        output.extend(format_drug_details(drugs))

    # Reactions
    if reactions := patient.get("reaction", []):
        output.extend(format_reaction_details(reactions))

    # Event Summary
    if summary := patient.get("summary", {}).get("narrativeincludeclinical"):
        output.append("### Event Narrative")
        output.append(clean_text(summary))
        output.append("")

    # Report metadata
    output.extend(format_report_metadata(result))

    output.append(f"\n{OPENFDA_DISCLAIMER}")
    return "\n".join(output)


def _format_patient_info(patient: dict) -> list[str]:
    """Format patient information section."""
    output = ["### Patient Information"]

    if age := patient.get("patientonsetage"):
        output.append(f"- **Age**: {age} years")

    # openFDA returns coded fields such as patientsex as strings ("1"),
    # so normalize the code to str before looking up its label.
    sex_map = {"0": "Unknown", "1": "Male", "2": "Female"}
    sex_code = patient.get("patientsex")
    sex = (
        sex_map.get(str(sex_code), "Unknown")
        if sex_code is not None
        else "Unknown"
    )
    output.append(f"- **Sex**: {sex}")

    if weight := patient.get("patientweight"):
        output.append(f"- **Weight**: {weight} kg")

    output.append("")
    return output
```

--------------------------------------------------------------------------------
/docs/how-to-guides/01-find-articles-and-cbioportal-data.md:
--------------------------------------------------------------------------------

```markdown
# How to Find Articles and cBioPortal Data

This guide walks you through searching biomedical literature with automatic cancer genomics integration from cBioPortal.

## Overview

When searching for articles about genes, BioMCP automatically enriches your results with:

- **cBioPortal Summary**: Mutation frequencies, hotspots, and cancer type distribution ([API Reference](../backend-services-reference/03-cbioportal.md))
- **PubMed Articles**: Peer-reviewed research with entity annotations ([PubTator3 Reference](../backend-services-reference/06-pubtator3.md))
- **Preprints**: Latest findings from bioRxiv and medRxiv

## Basic Article Search

### Search by Gene

Find articles about a specific gene:

```bash
# CLI
biomcp article search --gene BRAF --limit 5

# Python
articles = await client.articles.search(genes=["BRAF"], limit=5)

# MCP Tool
article_searcher(genes=["BRAF"], limit=5)
```

This automatically includes:

1. cBioPortal summary showing BRAF mutation frequency across cancers
2. Top mutation hotspots (e.g., V600E)
3. Recent articles mentioning BRAF

### Search by Disease

Find articles about a specific disease:

```bash
# CLI
biomcp article search --disease melanoma --limit 10

# Python
articles = await client.articles.search(diseases=["melanoma"])

# MCP Tool
article_searcher(diseases=["melanoma"])
```

## Advanced Search Techniques

### Combining Multiple Filters

Search for articles at the intersection of genes, diseases, and chemicals:

```bash
# CLI - EGFR mutations in lung cancer treated with erlotinib
biomcp article search \
  --gene EGFR \
  --disease "lung cancer" \
  --chemical erlotinib \
  --limit 20

# Python
articles = await client.articles.search(
    genes=["EGFR"],
    diseases=["lung cancer"],
    chemicals=["erlotinib"]
)
```

### Using OR Logic in Keywords

Find articles mentioning different notations of the same variant:

```bash
# CLI - Find any notation of BRAF V600E
biomcp article search \
  --gene BRAF \
  --keyword "V600E|p.V600E|c.1799T>A"

# Python - Different names for same concept
articles = await client.articles.search(
    diseases=["NSCLC|non-small cell lung cancer"],
    chemicals=["pembrolizumab|Keytruda|anti-PD-1"]
)
```

### Excluding Preprints

For peer-reviewed articles only:

```bash
# CLI
biomcp article search --gene TP53 --no-preprints

# Python
articles = await client.articles.search(
    genes=["TP53"],
    include_preprints=False
)
```

## Understanding cBioPortal Integration

### What cBioPortal Provides

When you search for a gene, the first result includes:

```markdown
### cBioPortal Summary for BRAF

- **Mutation Frequency**: 76.7% (368 mutations in 480 samples)
- **Studies**: 1 of 5 studies have mutations

**Top Hotspots:**

1. V600E: 310 mutations (84.2%)
2. V600K: 23 mutations (6.3%)
3. V600M: 12 mutations (3.3%)

**Cancer Type Distribution:**

- Skin Cancer, Non-Melanoma: 156 mutations
- Melanoma: 91 mutations
- Thyroid Cancer: 87 mutations
```

### Mutation-Specific Searches

Search for articles about specific mutations:

```python
# Search for BRAF V600E specifically
articles = await client.articles.search(
    genes=["BRAF"],
    keywords=["V600E"],
    include_cbioportal=True  # Default
)
```

The cBioPortal summary will highlight the specific mutation if found.

### Disabling cBioPortal

If you don't need cancer genomics data:

```bash
# CLI
biomcp article search --gene BRCA1 --no-cbioportal

# Python
articles = await client.articles.search(
    genes=["BRCA1"],
    include_cbioportal=False
)
```

## Practical Examples

### Example 1: Resistance Mechanism Research

Find articles about EGFR T790M resistance:

```python
# Using think tool first (for MCP)
think(
    thought="Researching EGFR T790M resistance mechanisms in lung cancer",
    thoughtNumber=1
)

# Search with multiple relevant terms
articles = await article_searcher(
    genes=["EGFR"],
    diseases=["lung cancer|NSCLC"],
    keywords=["T790M|p.T790M|resistance|resistant"],
    chemicals=["osimertinib|gefitinib|erlotinib"]
)
```

### Example 2: Combination Therapy Research

Research BRAF/MEK combination therapy:

```bash
# CLI approach (MAP2K1/MAP2K2 are the official symbols for MEK1/MEK2)
biomcp article search \
  --gene BRAF --gene MAP2K1 --gene MAP2K2 \
  --disease melanoma \
  --chemical dabrafenib --chemical trametinib \
  --keyword "combination therapy|combined treatment"
```

### Example 3: Biomarker Discovery

Find articles about potential biomarkers:

```python
# Search for PD-L1 as a biomarker
articles = await client.articles.search(
    genes=["CD274"],  # PD-L1 gene symbol
    keywords=["biomarker|predictive|prognostic"],
    diseases=["cancer"],
    limit=50
)

# Filter results programmatically
biomarker_articles = [
    a for a in articles
    if "biomarker" in a.title.lower() or "predictive" in a.abstract.lower()
]
```

## Working with Results

### Extracting Key Information

```python
# Process article results
for article in articles:
    print(f"Title: {article.title}")
    print(f"PMID: {article.pmid}")
    print(f"URL: {article.url}")

    # Extract annotated entities
    genes = article.metadata.get("genes", [])
    diseases = article.metadata.get("diseases", [])
    chemicals = article.metadata.get("chemicals", [])

    print(f"Genes mentioned: {', '.join(genes)}")
    print(f"Diseases: {', '.join(diseases)}")
    print(f"Chemicals: {', '.join(chemicals)}")
```

### Fetching Full Article Details

Get complete article information:

```python
# Get article by PMID
full_article = await client.articles.get("38768446")

# Access full abstract
print(full_article.abstract)

# Check for full text availability
if full_article.full_text_url:
    print(f"Full text: {full_article.full_text_url}")
```

## Tips for Effective Searches

### 1. Use Official Gene Symbols

```python
# ✅ Correct - Official HGNC symbol
articles = await search(genes=["ERBB2"])

# ❌ Avoid - Common name
articles = await search(genes=["HER2"])  # May miss results
```

### 2.
Include Synonyms for Diseases 265 | 266 | ```python 267 | # Cover all variations 268 | articles = await search( 269 | diseases=["GIST|gastrointestinal stromal tumor|gastrointestinal stromal tumour"] 270 | ) 271 | ``` 272 | 273 | ### 3. Leverage PubTator Annotations 274 | 275 | PubTator automatically annotates articles with: 276 | 277 | - Gene mentions (normalized to official symbols) 278 | - Disease concepts (mapped to MeSH terms) 279 | - Chemical/drug entities 280 | - Genetic variants 281 | - Species 282 | 283 | ### 4. Combine with Other Tools 284 | 285 | ```python 286 | # 1. Find articles about a gene 287 | articles = await article_searcher(genes=["ALK"]) 288 | 289 | # 2. Get gene details for context 290 | gene_info = await gene_getter("ALK") 291 | 292 | # 3. Find relevant trials 293 | trials = await trial_searcher( 294 | other_terms=["ALK positive", "ALK rearrangement"] 295 | ) 296 | ``` 297 | 298 | ## Troubleshooting 299 | 300 | ### No Results Found 301 | 302 | 1. **Check gene symbols**: Use [genenames.org](https://www.genenames.org) 303 | 2. **Broaden search**: Remove filters one by one 304 | 3. 
**Try synonyms**: Especially for diseases and drugs 305 | 306 | ### cBioPortal Data Missing 307 | 308 | - Some genes may not have cancer genomics data 309 | - Try searching for cancer-related genes 310 | - Check if gene symbol is correct 311 | 312 | ### Preprint Issues 313 | 314 | - Europe PMC may have delays in indexing 315 | - Some preprints may not have DOIs 316 | - Try searching by title keywords instead 317 | 318 | ## Next Steps 319 | 320 | - Learn to [find trials with NCI and BioThings](02-find-trials-with-nci-and-biothings.md) 321 | - Explore [variant annotations](03-get-comprehensive-variant-annotations.md) 322 | - Set up [API keys](../getting-started/03-authentication-and-api-keys.md) for enhanced features 323 | ``` -------------------------------------------------------------------------------- /tests/tdd/test_network_policies.py: -------------------------------------------------------------------------------- ```python 1 | """Comprehensive tests for network policies and HTTP centralization.""" 2 | 3 | from pathlib import Path 4 | from unittest.mock import patch 5 | 6 | import pytest 7 | 8 | from biomcp.http_client import request_api 9 | from biomcp.utils.endpoint_registry import ( 10 | DataType, 11 | EndpointCategory, 12 | EndpointInfo, 13 | EndpointRegistry, 14 | get_registry, 15 | ) 16 | 17 | 18 | class TestEndpointRegistry: 19 | """Test the endpoint registry functionality.""" 20 | 21 | def test_registry_initialization(self): 22 | """Test that registry initializes with known endpoints.""" 23 | registry = EndpointRegistry() 24 | endpoints = registry.get_all_endpoints() 25 | 26 | # Check we have endpoints registered 27 | assert len(endpoints) > 0 28 | 29 | # Check specific endpoints exist 30 | assert "pubtator3_search" in endpoints 31 | assert "clinicaltrials_search" in endpoints 32 | assert "myvariant_query" in endpoints 33 | assert "cbioportal_api" in endpoints 34 | 35 | def test_get_endpoints_by_category(self): 36 | """Test filtering endpoints by 
category.""" 37 | registry = EndpointRegistry() 38 | 39 | # Get biomedical literature endpoints 40 | lit_endpoints = registry.get_endpoints_by_category( 41 | EndpointCategory.BIOMEDICAL_LITERATURE 42 | ) 43 | assert len(lit_endpoints) > 0 44 | assert all( 45 | e.category == EndpointCategory.BIOMEDICAL_LITERATURE 46 | for e in lit_endpoints.values() 47 | ) 48 | 49 | # Get clinical trials endpoints 50 | trial_endpoints = registry.get_endpoints_by_category( 51 | EndpointCategory.CLINICAL_TRIALS 52 | ) 53 | assert len(trial_endpoints) > 0 54 | assert all( 55 | e.category == EndpointCategory.CLINICAL_TRIALS 56 | for e in trial_endpoints.values() 57 | ) 58 | 59 | def test_get_unique_domains(self): 60 | """Test getting unique domains.""" 61 | registry = EndpointRegistry() 62 | domains = registry.get_unique_domains() 63 | 64 | assert len(domains) > 0 65 | assert "www.ncbi.nlm.nih.gov" in domains 66 | assert "clinicaltrials.gov" in domains 67 | assert "myvariant.info" in domains 68 | assert "www.cbioportal.org" in domains 69 | 70 | def test_endpoint_info_properties(self): 71 | """Test EndpointInfo dataclass properties.""" 72 | endpoint = EndpointInfo( 73 | url="https://api.example.com/test", 74 | category=EndpointCategory.BIOMEDICAL_LITERATURE, 75 | data_types=[DataType.RESEARCH_ARTICLES], 76 | description="Test endpoint", 77 | compliance_notes="Test compliance", 78 | rate_limit="10 requests/second", 79 | authentication="API key required", 80 | ) 81 | 82 | assert endpoint.domain == "api.example.com" 83 | assert endpoint.category == EndpointCategory.BIOMEDICAL_LITERATURE 84 | assert DataType.RESEARCH_ARTICLES in endpoint.data_types 85 | 86 | def test_markdown_report_generation(self): 87 | """Test markdown report generation.""" 88 | registry = EndpointRegistry() 89 | report = registry.generate_markdown_report() 90 | 91 | # Check report contains expected sections 92 | assert "# Third-Party Endpoints Used by BioMCP" in report 93 | assert "## Overview" in report 94 | assert "## 
Endpoints by Category" in report 95 | assert "## Domain Summary" in report 96 | assert "## Compliance and Privacy" in report 97 | assert "## Network Control" in report 98 | 99 | # Check it mentions offline mode 100 | assert "BIOMCP_OFFLINE" in report 101 | 102 | # Check it contains actual endpoints 103 | assert "pubtator3" in report 104 | assert "clinicaltrials.gov" in report 105 | assert "myvariant.info" in report 106 | 107 | def test_save_markdown_report(self, tmp_path): 108 | """Test saving markdown report to file.""" 109 | registry = EndpointRegistry() 110 | output_path = tmp_path / "test_endpoints.md" 111 | 112 | saved_path = registry.save_markdown_report(output_path) 113 | 114 | assert saved_path == output_path 115 | assert output_path.exists() 116 | 117 | # Read and verify content 118 | content = output_path.read_text() 119 | assert "Third-Party Endpoints Used by BioMCP" in content 120 | 121 | 122 | class TestEndpointTracking: 123 | """Test endpoint tracking in HTTP client.""" 124 | 125 | @pytest.mark.asyncio 126 | async def test_valid_endpoint_key(self): 127 | """Test that valid endpoint keys are accepted.""" 128 | with patch("biomcp.http_client.call_http") as mock_call: 129 | mock_call.return_value = (200, '{"data": "test"}') 130 | 131 | # Should not raise an error 132 | result, error = await request_api( 133 | url="https://www.ncbi.nlm.nih.gov/research/pubtator3-api/search/", 134 | request={"text": "BRAF"}, 135 | endpoint_key="pubtator3_search", 136 | cache_ttl=0, 137 | ) 138 | 139 | assert result == {"data": "test"} 140 | assert error is None 141 | 142 | @pytest.mark.asyncio 143 | async def test_invalid_endpoint_key_raises_error(self): 144 | """Test that invalid endpoint keys raise an error.""" 145 | with pytest.raises(ValueError, match="Unknown endpoint key"): 146 | await request_api( 147 | url="https://api.example.com/test", 148 | request={"test": "data"}, 149 | endpoint_key="invalid_endpoint_key", 150 | cache_ttl=0, 151 | ) 152 | 153 | 
@pytest.mark.asyncio 154 | async def test_no_endpoint_key_allowed(self): 155 | """Test that requests without endpoint keys are allowed.""" 156 | with patch("biomcp.http_client.call_http") as mock_call: 157 | mock_call.return_value = (200, '{"data": "test"}') 158 | 159 | # Should not raise an error 160 | result, error = await request_api( 161 | url="https://api.example.com/test", 162 | request={"test": "data"}, 163 | cache_ttl=0, 164 | ) 165 | 166 | assert result == {"data": "test"} 167 | assert error is None 168 | 169 | 170 | class TestHTTPImportChecks: 171 | """Test the HTTP import checking script.""" 172 | 173 | def test_check_script_exists(self): 174 | """Test that the check script exists.""" 175 | script_path = ( 176 | Path(__file__).parent.parent.parent 177 | / "scripts" 178 | / "check_http_imports.py" 179 | ) 180 | assert script_path.exists() 181 | 182 | def test_allowed_files_configured(self): 183 | """Test that allowed files are properly configured.""" 184 | # Import the script module 185 | import sys 186 | 187 | script_path = Path(__file__).parent.parent.parent / "scripts" 188 | sys.path.insert(0, str(script_path)) 189 | 190 | try: 191 | from check_http_imports import ALLOWED_FILES, HTTP_LIBRARIES 192 | 193 | # Check essential files are allowed 194 | assert "http_client.py" in ALLOWED_FILES 195 | assert "http_client_simple.py" in ALLOWED_FILES 196 | 197 | # Check we're checking for the right libraries 198 | assert "httpx" in HTTP_LIBRARIES 199 | assert "aiohttp" in HTTP_LIBRARIES 200 | assert "requests" in HTTP_LIBRARIES 201 | finally: 202 | sys.path.pop(0) 203 | 204 | 205 | class TestGlobalRegistry: 206 | """Test the global registry instance.""" 207 | 208 | def test_get_registry_returns_same_instance(self): 209 | """Test that get_registry returns the same instance.""" 210 | registry1 = get_registry() 211 | registry2 = get_registry() 212 | 213 | assert registry1 is registry2 214 | 215 | def test_global_registry_has_endpoints(self): 216 | """Test that the 
global registry has endpoints.""" 217 | registry = get_registry() 218 | endpoints = registry.get_all_endpoints() 219 | 220 | assert len(endpoints) > 0 221 | ``` -------------------------------------------------------------------------------- /docs/index.md: -------------------------------------------------------------------------------- ```markdown 1 | # BioMCP: AI-Powered Biomedical Research 2 | 3 | [](https://github.com/genomoncology/biomcp/tags) 4 | [](https://github.com/genomoncology/biomcp/actions/workflows/main.yml?query=branch%3Amain) 5 | [](https://img.shields.io/github/license/genomoncology/biomcp) 6 | 7 | **Transform how you search and analyze biomedical data** with BioMCP - a powerful tool that connects AI assistants and researchers to critical biomedical databases through natural language. 8 | 9 | ### Built and Maintained by <a href="https://www.genomoncology.com"><img src="./assets/logo.png" width=200 valign="middle" /></a> 10 | 11 | <div class="announcement-banner"> 12 | <div class="announcement-content"> 13 | <h2> 14 | <span class="badge-new">NEW</span> 15 | Remote BioMCP Now Available! 16 | </h2> 17 | <p>Connect to BioMCP instantly through Claude - no installation required!</p> 18 | 19 | <div class="announcement-features"> 20 | <div class="feature-item"> 21 | <strong>🚀 Instant Access</strong> 22 | <span>Start using BioMCP in under 2 minutes</span> 23 | </div> 24 | <div class="feature-item"> 25 | <strong>☁️ Cloud-Powered</strong> 26 | <span>Always up-to-date with latest features</span> 27 | </div> 28 | <div class="feature-item"> 29 | <strong>🔒 Secure Auth</strong> 30 | <span>Google OAuth authentication</span> 31 | </div> 32 | <div class="feature-item"> 33 | <strong>🛠️ 23+ Tools</strong> 34 | <span>Full suite of biomedical research tools</span> 35 | </div> 36 | </div> 37 | 38 | <a href="tutorials/remote-connection/" class="cta-button"> 39 | Connect to Remote BioMCP Now 40 | </a> 41 | 42 | </div> 43 | </div> 44 | 45 | ## What Can You Do with BioMCP? 
46 | 47 | ### Search Research Literature 48 | 49 | Find articles about genes, variants, diseases, and drugs with automatic cancer genomics data from cBioPortal 50 | 51 | ```bash 52 | biomcp article search --gene BRAF --disease melanoma 53 | ``` 54 | 55 | ### Discover Clinical Trials 56 | 57 | Search active trials by condition, location, phase, and eligibility criteria including genetic biomarkers 58 | 59 | ```bash 60 | biomcp trial search --condition "lung cancer" --status RECRUITING 61 | ``` 62 | 63 | ### Analyze Genetic Variants 64 | 65 | Query variant databases, predict effects, and understand clinical significance 66 | 67 | ```bash 68 | biomcp variant search --gene TP53 --significance pathogenic 69 | ``` 70 | 71 | ### AI-Powered Analysis 72 | 73 | Use with Claude Desktop for conversational biomedical research with sequential thinking 74 | 75 | ```python 76 | # Claude automatically uses BioMCP tools 77 | "What BRAF mutations are found in melanoma?" 78 | ``` 79 | 80 | ## 5-Minute Quick Start 81 | 82 | ### Choose Your Interface 83 | 84 | === "Claude Desktop (Recommended)" 85 | 86 | **Best for**: Conversational research, complex queries, AI-assisted analysis 87 | 88 | 1. **Install Claude Desktop** from [claude.ai/desktop](https://claude.ai/desktop) 89 | 90 | 2. **Configure BioMCP**: 91 | ```json 92 | { 93 | "mcpServers": { 94 | "biomcp": { 95 | "command": "uv", 96 | "args": [ 97 | "run", "--with", "biomcp-python", 98 | "biomcp", "run" 99 | ] 100 | } 101 | } 102 | } 103 | ``` 104 | 105 | 3. **Start researching**: Ask Claude about any biomedical topic! 106 | 107 | [Full Claude Desktop Guide →](getting-started/02-claude-desktop-integration.md) 108 | 109 | === "Command Line" 110 | 111 | **Best for**: Direct queries, scripting, automation 112 | 113 | 1. **Install BioMCP**: 114 | ```bash 115 | # Using uv (recommended) 116 | uv tool install biomcp 117 | 118 | # Or using pip 119 | pip install biomcp-python 120 | ``` 121 | 122 | 2. 
**Run your first search**: 123 | ```bash 124 | biomcp article search \ 125 | --gene BRAF --disease melanoma \ 126 | --limit 5 127 | ``` 128 | 129 | [CLI Reference →](user-guides/01-command-line-interface.md) 130 | 131 | === "Python SDK" 132 | 133 | **Best for**: Integration, custom applications, bulk operations 134 | 135 | 1. **Install the package**: 136 | ```bash 137 | pip install biomcp-python 138 | ``` 139 | 140 | 2. **Use in your code**: 141 | ```python 142 | from biomcp import BioMCPClient 143 | 144 | async with BioMCPClient() as client: 145 | articles = await client.articles.search( 146 | genes=["BRAF"], 147 | diseases=["melanoma"] 148 | ) 149 | ``` 150 | 151 | [Python SDK Docs →](apis/python-sdk.md) 152 | 153 | ## Key Features 154 | 155 | ### Unified Search Across Databases 156 | 157 | - **PubMed/PubTator3**: 30M+ research articles with entity recognition 158 | - **ClinicalTrials.gov**: 400K+ clinical trials worldwide 159 | - **MyVariant.info**: Comprehensive variant annotations 160 | - **cBioPortal**: Automatic cancer genomics integration 161 | 162 | ### Intelligent Query Processing 163 | 164 | - Natural language to structured queries 165 | - Automatic synonym expansion 166 | - OR logic support for flexible matching 167 | - Cross-domain relationship discovery 168 | 169 | ### Built for AI Integration 170 | 171 | - 24 specialized MCP tools 172 | - Sequential thinking for complex analysis 173 | - Streaming responses for real-time updates 174 | - Context preservation across queries 175 | 176 | [Explore All Features →](concepts/01-what-is-biomcp.md) 177 | 178 | ## Learn by Example 179 | 180 | ### Find Articles About a Specific Mutation 181 | 182 | ```bash 183 | # Search for BRAF V600E mutations 184 | biomcp article search --gene BRAF \ 185 | --keyword "V600E|p.V600E|c.1799T>A" 186 | ``` 187 | 188 | ### Discover Trials Near You 189 | 190 | ```bash 191 | # Find cancer trials in Boston area 192 | biomcp trial search --condition cancer \ 193 | --latitude 42.3601 
--longitude -71.0589 \ 194 | --distance 50 195 | ``` 196 | 197 | ### Get Gene Information 198 | 199 | ```bash 200 | # Get comprehensive gene data 201 | biomcp gene get TP53 202 | ``` 203 | 204 | [More Examples →](tutorials/biothings-prompts.md) 205 | 206 | ## Popular Workflows 207 | 208 | ### Literature Review 209 | 210 | Systematic search across papers, preprints, and clinical trials 211 | [Workflow Guide →](workflows/all-workflows.md#1-literature-review-workflow) 212 | 213 | ### Variant Interpretation 214 | 215 | From variant ID to clinical significance and treatment implications 216 | [Workflow Guide →](workflows/all-workflows.md#3-variant-interpretation-workflow) 217 | 218 | ### Trial Matching 219 | 220 | Find eligible trials based on patient criteria and biomarkers 221 | [Workflow Guide →](workflows/all-workflows.md#2-clinical-trial-matching-workflow) 222 | 223 | ### Drug Research 224 | 225 | Connect drugs to targets, trials, and research literature 226 | [Workflow Guide →](workflows/all-workflows.md) 227 | 228 | ## Advanced Features 229 | 230 | - **[NCI Integration](getting-started/03-authentication-and-api-keys.md#nci-clinical-trials-api)**: Enhanced cancer trial search with biomarker filtering 231 | - **[AlphaGenome](how-to-guides/04-predict-variant-effects-with-alphagenome.md)**: Predict variant effects on gene regulation 232 | - **[BigQuery Logging](how-to-guides/05-logging-and-monitoring-with-bigquery.md)**: Monitor usage and performance 233 | - **[HTTP Server Mode](developer-guides/01-server-deployment.md)**: Deploy as a service 234 | 235 | ## Documentation 236 | 237 | - **[Getting Started](getting-started/01-quickstart-cli.md)** - Installation and first steps 238 | - **[User Guides](user-guides/01-command-line-interface.md)** - Detailed usage instructions 239 | - **[API Reference](apis/overview.md)** - Technical documentation 240 | - **[FAQ](faq-condensed.md)** - Quick answers to common questions 241 | 242 | ## Community & Support 243 | 244 | - 
**GitHub**: [github.com/genomoncology/biomcp](https://github.com/genomoncology/biomcp) 245 | - **Issues**: [Report bugs or request features](https://github.com/genomoncology/biomcp/issues) 246 | - **Discussions**: [Ask questions and share tips](https://github.com/genomoncology/biomcp/discussions) 247 | 248 | ## License 249 | 250 | BioMCP is licensed under the MIT License. See [LICENSE](https://github.com/genomoncology/biomcp/blob/main/LICENSE) for details. 251 | ``` -------------------------------------------------------------------------------- /docs/tutorials/claude-code-biomcp-alphagenome.md: -------------------------------------------------------------------------------- ```markdown 1 | # Using Claude Code with BioMCP for AlphaGenome Variant Analysis 2 | 3 | This tutorial demonstrates how to use Claude Code with BioMCP to analyze genetic variants using Google DeepMind's AlphaGenome. We'll explore both the MCP server integration and CLI approaches, showing how Claude Code can seamlessly work with both interfaces. 4 | 5 | ## Prerequisites 6 | 7 | - **Claude Code**: Latest version with MCP support 8 | - **Python 3.11+**: Required for BioMCP and AlphaGenome 9 | - **uv**: Modern Python package manager ([installation guide](https://docs.astral.sh/uv/getting-started/installation/)) 10 | - **AlphaGenome API Key**: Get free access at [Google DeepMind AlphaGenome](https://deepmind.google.com/science/alphagenome) 11 | 12 | ## Setup Overview 13 | 14 | BioMCP offers two interfaces that work perfectly with Claude Code: 15 | 16 | 1. **MCP Server**: Integrated directly into Claude Code for seamless workflows 17 | 2. **CLI**: Command-line interface for direct terminal access 18 | 19 | Both produce identical results, giving you flexibility in how you work. 20 | 21 | ## Part 1: MCP Server Setup 22 | 23 | ### Step 1: Install BioMCP CLI 24 | 25 | ```bash 26 | # Install BioMCP CLI globally (note: biomcp-python, not biomcp!) 
27 | uv tool install -q biomcp-python 28 | 29 | # Verify installation 30 | biomcp --version 31 | ``` 32 | 33 | ### Step 2: Configure MCP Server 34 | 35 | Add BioMCP to your Claude Code MCP configuration: 36 | 37 | ```bash 38 | # Basic setup (requires ALPHAGENOME_API_KEY environment variable) 39 | claude mcp add biomcp -- uv run --with biomcp-python biomcp run 40 | 41 | # Or with API key in configuration 42 | claude mcp add biomcp -e ALPHAGENOME_API_KEY=your-api-key-here -- uv run --with biomcp-python biomcp run 43 | ``` 44 | 45 | Verify the setup: 46 | 47 | ```bash 48 | claude mcp list 49 | claude mcp get biomcp 50 | ``` 51 | 52 | ### Step 3: Set Environment Variable 53 | 54 | ```bash 55 | # Add to your shell profile (~/.zshrc or ~/.bashrc) 56 | export ALPHAGENOME_API_KEY='your-api-key-here' 57 | 58 | # Or set per-session 59 | export ALPHAGENOME_API_KEY='your-api-key-here' 60 | ``` 61 | 62 | ### Step 4: Install AlphaGenome 63 | 64 | ```bash 65 | # Clone and install AlphaGenome 66 | git clone https://github.com/google-deepmind/alphagenome.git 67 | cd alphagenome && uv pip install . 68 | ``` 69 | 70 | ## Part 2: Testing with Claude Code 71 | 72 | ### Example: DLG1 Exon Skipping Variant 73 | 74 | Let's analyze the variant `chr3:197081044:TACTC>T` from the AlphaGenome paper, which demonstrates exon skipping in the DLG1 gene. 
75 | 76 | #### Using MCP Server (Recommended) 77 | 78 | ```python 79 | # Claude Code automatically uses MCP when available 80 | mcp__biomcp__alphagenome_predictor( 81 | chromosome="chr3", 82 | position=197081044, 83 | reference="TACTC", 84 | alternate="T" 85 | ) 86 | ``` 87 | 88 | **Result:** 89 | 90 | ```markdown 91 | ## AlphaGenome Variant Effect Predictions 92 | 93 | **Variant**: chr3:197081044 TACTC>T 94 | **Analysis window**: 131,072 bp 95 | 96 | ### Gene Expression 97 | 98 | - **MELTF**: +2.57 log₂ fold change (↑ increases expression) 99 | 100 | ### Chromatin Accessibility 101 | 102 | - **EFO:0005719 DNase-seq**: +17.27 log₂ change (↑ increases accessibility) 103 | 104 | ### Splicing 105 | 106 | - Potential splicing alterations detected 107 | 108 | ### Summary 109 | 110 | - Analyzed 11796 regulatory tracks 111 | - 6045 tracks show substantial changes (|log₂| > 0.5) 112 | ``` 113 | 114 | #### Using CLI Interface 115 | 116 | ```bash 117 | # Same analysis via CLI 118 | export ALPHAGENOME_API_KEY='your-api-key-here' 119 | uv run biomcp variant predict chr3 197081044 TACTC T 120 | ``` 121 | 122 | **Result:** Identical output to MCP server. 123 | 124 | ## Part 3: Why Both Interfaces Matter 125 | 126 | ### MCP Server Advantages 🔌 127 | 128 | - **Persistent State**: No need to re-export environment variables 129 | - **Workflow Integration**: Seamless chaining with other biomedical tools 130 | - **Structured Data**: Direct programmatic access to results 131 | - **Auto-Documentation**: Built-in parameter validation 132 | 133 | ### CLI Advantages 💻 134 | 135 | - **Immediate Access**: No server setup required 136 | - **Debugging**: Direct command-line testing 137 | - **Scripting**: Easy integration into bash scripts 138 | - **Standalone Use**: Works without Claude Code 139 | 140 | ### Claude Code Perspective 141 | 142 | As Claude Code, both interfaces work equally well. 
The **MCP approach provides slight benefits**: 143 | 144 | - Results persist across conversation turns 145 | - Built-in error handling and validation 146 | - Automatic integration with thinking and search workflows 147 | - No need to manage environment variables per session 148 | 149 | **Trade-off**: MCP requires initial setup, while CLI is immediately available. 150 | 151 | ## Part 4: Advanced Usage Examples 152 | 153 | ### Multi-Variant Analysis 154 | 155 | ```python 156 | # Analyze multiple variants from AlphaGenome paper 157 | variants = [ 158 | ("chr3", 197081044, "TACTC", "T"), # DLG1 exon skipping 159 | ("chr21", 46126238, "G", "C"), # COL6A2 splice junction 160 | ("chr16", 173694, "A", "G") # HBA2 polyadenylation 161 | ] 162 | 163 | for chr, pos, ref, alt in variants: 164 | result = mcp__biomcp__alphagenome_predictor( 165 | chromosome=chr, 166 | position=pos, 167 | reference=ref, 168 | alternate=alt 169 | ) 170 | print(f"Most affected gene: {result}") 171 | ``` 172 | 173 | ### Tissue-Specific Analysis 174 | 175 | ```python 176 | # Analyze with tissue context 177 | mcp__biomcp__alphagenome_predictor( 178 | chromosome="chr7", 179 | position=140753336, 180 | reference="A", 181 | alternate="T", 182 | tissue_types=["UBERON:0000310"] # breast tissue 183 | ) 184 | ``` 185 | 186 | ### Combined BioMCP Workflow 187 | 188 | ```python 189 | # 1. First, search for known annotations 190 | variant_data = mcp__biomcp__variant_searcher(gene="BRAF") 191 | 192 | # 2. Then predict regulatory effects 193 | regulatory_effects = mcp__biomcp__alphagenome_predictor( 194 | chromosome="chr7", 195 | position=140753336, 196 | reference="A", 197 | alternate="T" 198 | ) 199 | 200 | # 3. Search literature for context 201 | literature = mcp__biomcp__article_searcher( 202 | genes=["BRAF"], 203 | variants=["V600E"] 204 | ) 205 | ``` 206 | 207 | ## Part 5: Validation and Quality Assurance 208 | 209 | ### How We Validated the Integration 210 | 211 | 1. 
**Raw API Testing**: Directly tested Google's AlphaGenome API 212 | 2. **Source Code Analysis**: Verified BioMCP uses correct API methods (`score_variant` + `get_recommended_scorers`) 213 | 3. **Cross-Validation**: Confirmed identical results across all three approaches: 214 | - Raw Python API: MELTF +2.57 log₂ 215 | - BioMCP CLI: MELTF +2.57 log₂ 216 | - BioMCP MCP: MELTF +2.57 log₂ 217 | 218 | ### Key Scientific Finding 219 | 220 | The variant `chr3:197081044:TACTC>T` most strongly affects **MELTF** (+2.57 log₂ fold change), not DLG1 as initially expected. This demonstrates that AlphaGenome considers the full regulatory landscape, not just the nearest gene. 221 | 222 | ## Part 6: Best Practices 223 | 224 | ### For MCP Usage 225 | 226 | - Use structured thinking with `mcp__biomcp__think` for complex analyses 227 | - Leverage `call_benefit` parameter to improve result quality 228 | - Chain multiple tools for comprehensive variant characterization 229 | 230 | ### For CLI Usage 231 | 232 | - Set `ALPHAGENOME_API_KEY` in your shell profile 233 | - Use `--help` to explore all available parameters 234 | - Combine with other CLI tools via pipes and scripts 235 | 236 | ### General Tips 237 | 238 | - Start with default 131kb analysis window 239 | - Use tissue-specific analysis when relevant 240 | - Validate surprising results with literature search 241 | - Consider both gene expression and chromatin accessibility effects 242 | 243 | ## Conclusion 244 | 245 | BioMCP's dual interface approach (MCP + CLI) provides robust variant analysis capabilities. Claude Code works seamlessly with both, offering flexibility for different workflows. The MCP integration provides slight advantages for interactive analysis, while the CLI excels for scripting and debugging. 246 | 247 | The combination of AlphaGenome's predictive power with BioMCP's comprehensive biomedical data access creates a powerful platform for genetic variant analysis and interpretation. 
248 | 249 | ## Resources 250 | 251 | - [BioMCP Documentation](https://biomcp.org) 252 | - [AlphaGenome Paper](https://deepmind.google/science/alphagenome) 253 | - [Claude Code MCP Guide](https://docs.anthropic.com/claude/docs/model-context-protocol) 254 | - [uv Documentation](https://docs.astral.sh/uv/) 255 | ``` -------------------------------------------------------------------------------- /tests/tdd/articles/test_search.py: -------------------------------------------------------------------------------- ```python 1 | import json 2 | from unittest.mock import patch 3 | 4 | import pytest 5 | 6 | from biomcp.articles.search import ( 7 | PubmedRequest, 8 | ResultItem, 9 | SearchResponse, 10 | convert_request, 11 | search_articles, 12 | ) 13 | 14 | 15 | async def test_convert_search_query(anyio_backend): 16 | pubmed_request = PubmedRequest( 17 | chemicals=["Caffeine"], 18 | diseases=["non-small cell lung cancer"], 19 | genes=["BRAF"], 20 | variants=["BRAF V600E"], 21 | keywords=["therapy"], 22 | ) 23 | pubtator_request = await convert_request(request=pubmed_request) 24 | 25 | # The API may or may not return prefixed entity IDs, so we check for both possibilities 26 | query_text = pubtator_request.text 27 | 28 | # Keywords should always be first 29 | assert query_text.startswith("therapy AND ") 30 | 31 | # Check that all terms are present (with or without prefixes) 32 | assert "Caffeine" in query_text or "@CHEMICAL_Caffeine" in query_text 33 | assert ( 34 | "non-small cell lung cancer" in query_text.lower() 35 | or "carcinoma" in query_text.lower() 36 | or "@DISEASE_" in query_text 37 | ) 38 | assert "BRAF" in query_text or "@GENE_BRAF" in query_text 39 | assert ( 40 | "V600E" in query_text 41 | or "p.V600E" in query_text 42 | or "@VARIANT_" in query_text 43 | ) 44 | 45 | # All terms should be joined with AND 46 | assert ( 47 | query_text.count(" AND ") >= 4 48 | ) # At least 4 AND operators for 5 terms 49 | 50 | # default page request (changed to 10 for token 
efficiency) 51 | assert pubtator_request.size == 10 52 | 53 | 54 | async def test_convert_search_query_with_or_logic(anyio_backend): 55 | """Test that keywords with pipe separators are converted to OR queries.""" 56 | pubmed_request = PubmedRequest( 57 | genes=["PTEN"], 58 | keywords=["R173|Arg173|p.R173", "mutation"], 59 | ) 60 | pubtator_request = await convert_request(request=pubmed_request) 61 | 62 | query_text = pubtator_request.text 63 | 64 | # Check that OR logic is properly formatted 65 | assert "(R173 OR Arg173 OR p.R173)" in query_text 66 | assert "mutation" in query_text 67 | assert "PTEN" in query_text or "@GENE_PTEN" in query_text 68 | 69 | # Check overall structure 70 | assert ( 71 | query_text.count(" AND ") >= 2 72 | ) # At least 2 AND operators for 3 terms 73 | 74 | 75 | async def test_search(anyio_backend): 76 | """Test search with real API call - may be flaky due to network dependency. 77 | 78 | This test makes real API calls to PubTator3 and can fail due to: 79 | - Network connectivity issues (Error 599) 80 | - API rate limiting 81 | - Changes in search results over time 82 | 83 | Consider using test_search_mocked for more reliable testing. 84 | """ 85 | query = { 86 | "genes": ["BRAF"], 87 | "diseases": ["NSCLC", "Non - Small Cell Lung Cancer"], 88 | "keywords": ["BRAF mutations NSCLC"], 89 | "variants": ["mutation", "mutations"], 90 | } 91 | 92 | query = PubmedRequest(**query) 93 | output = await search_articles(query, output_json=True) 94 | data = json.loads(output) 95 | assert isinstance(data, list) 96 | 97 | # Handle potential errors - if the first item has an 'error' key, it's an error response 98 | if data and isinstance(data[0], dict) and "error" in data[0]: 99 | import pytest 100 | 101 | pytest.skip(f"API returned error: {data[0]['error']}") 102 | 103 | assert len(data) == 10 # Changed from 40 to 10 for token efficiency 104 | result = ResultItem.model_validate(data[0]) 105 | # todo: this might be flaky. 
106 | assert ( 107 | result.title 108 | == "[Expert consensus on the diagnosis and treatment in advanced " 109 | "non-small cell lung cancer with BRAF mutation in China]." 110 | ) 111 | 112 | 113 | @pytest.mark.asyncio 114 | async def test_search_mocked(anyio_backend): 115 | """Test search with mocked API response to avoid network dependency.""" 116 | query = { 117 | "genes": ["BRAF"], 118 | "diseases": ["NSCLC", "Non - Small Cell Lung Cancer"], 119 | "keywords": ["BRAF mutations NSCLC"], 120 | "variants": ["mutation", "mutations"], 121 | } 122 | 123 | # Create mock response - don't include abstract here as it will be added by add_abstracts 124 | mock_response = SearchResponse( 125 | results=[ 126 | ResultItem( 127 | pmid=37495419, 128 | title="[Expert consensus on the diagnosis and treatment in advanced " 129 | "non-small cell lung cancer with BRAF mutation in China].", 130 | journal="Zhonghua Zhong Liu Za Zhi", 131 | authors=["Zhang", "Li", "Wang"], 132 | date="2023-07-23", 133 | doi="10.3760/cma.j.cn112152-20230314-00115", 134 | ) 135 | for _ in range(10) # Create 10 identical results (matches page_size) 136 | ], 137 | page_size=10, 138 | current=1, 139 | count=10, 140 | total_pages=1, 141 | ) 142 | 143 | with patch("biomcp.http_client.request_api") as mock_request: 144 | mock_request.return_value = (mock_response, None) 145 | 146 | # Mock the autocomplete calls 147 | with patch("biomcp.articles.search.autocomplete") as mock_autocomplete: 148 | mock_autocomplete.return_value = ( 149 | None # Simplified - no entity mapping 150 | ) 151 | 152 | # Mock the call_pubtator_api function 153 | with patch( 154 | "biomcp.articles.search.call_pubtator_api" 155 | ) as mock_pubtator: 156 | from biomcp.articles.fetch import ( 157 | Article, 158 | FetchArticlesResponse, 159 | Passage, 160 | PassageInfo, 161 | ) 162 | 163 | # Create a mock response with abstracts 164 | mock_fetch_response = FetchArticlesResponse( 165 | PubTator3=[ 166 | Article( 167 | pmid=37495419, 168 | passages=[ 169 | Passage( 170 |
text="This is a test abstract about BRAF mutations in NSCLC.", 171 | infons=PassageInfo( 172 | section_type="ABSTRACT" 173 | ), 174 | ) 175 | ], 176 | ) 177 | ] 178 | ) 179 | mock_pubtator.return_value = (mock_fetch_response, None) 180 | 181 | query_obj = PubmedRequest(**query) 182 | output = await search_articles(query_obj, output_json=True) 183 | data = json.loads(output) 184 | 185 | assert isinstance(data, list) 186 | assert ( 187 | len(data) == 10 188 | ) # Changed from 40 to 10 for token efficiency 189 | result = ResultItem.model_validate(data[0]) 190 | assert ( 191 | result.title 192 | == "[Expert consensus on the diagnosis and treatment in advanced " 193 | "non-small cell lung cancer with BRAF mutation in China]." 194 | ) 195 | assert ( 196 | result.abstract 197 | == "This is a test abstract about BRAF mutations in NSCLC." 198 | ) 199 | 200 | 201 | @pytest.mark.asyncio 202 | async def test_search_network_error(anyio_backend): 203 | """Test search handles network errors gracefully.""" 204 | query = PubmedRequest(genes=["BRAF"]) 205 | 206 | with patch("biomcp.http_client.request_api") as mock_request: 207 | from biomcp.http_client import RequestError 208 | 209 | mock_request.return_value = ( 210 | None, 211 | RequestError(code=599, message="Network connectivity error"), 212 | ) 213 | 214 | output = await search_articles(query, output_json=True) 215 | data = json.loads(output) 216 | 217 | assert isinstance(data, list) 218 | assert len(data) == 1 219 | assert "error" in data[0] 220 | assert "Error 599: Network connectivity error" in data[0]["error"] 221 | ``` -------------------------------------------------------------------------------- /BIOMCP_DATA_FLOW.md: -------------------------------------------------------------------------------- ```markdown 1 | # BioMCP Data Flow Diagram 2 | 3 | This document illustrates how BioMCP (Biomedical Model Context Protocol) works, showing the interaction between AI clients, the MCP server, domains, and external data sources. 
4 | 5 | ## High-Level Architecture 6 | 7 | ```mermaid 8 | graph TB 9 | subgraph "AI Client Layer" 10 | AI[AI Assistant<br/>e.g., Claude, GPT] 11 | end 12 | 13 | subgraph "MCP Server Layer" 14 | MCP[MCP Server<br/>router.py] 15 | SEARCH[search tool] 16 | FETCH[fetch tool] 17 | end 18 | 19 | subgraph "Domain Routing Layer" 20 | ROUTER[Query Router] 21 | PARSER[Query Parser] 22 | UNIFIED[Unified Query<br/>Language] 23 | end 24 | 25 | subgraph "Domain Handlers" 26 | ARTICLES[Articles Domain<br/>Handler] 27 | TRIALS[Trials Domain<br/>Handler] 28 | VARIANTS[Variants Domain<br/>Handler] 29 | THINKING[Thinking Domain<br/>Handler] 30 | end 31 | 32 | subgraph "External APIs" 33 | subgraph "Article Sources" 34 | PUBMED[PubTator3/<br/>PubMed] 35 | BIORXIV[bioRxiv/<br/>medRxiv] 36 | EUROPEPMC[Europe PMC] 37 | end 38 | 39 | subgraph "Clinical Data" 40 | CLINICALTRIALS[ClinicalTrials.gov] 41 | end 42 | 43 | subgraph "Variant Sources" 44 | MYVARIANT[MyVariant.info] 45 | TCGA[TCGA] 46 | KG[1000 Genomes] 47 | CBIO[cBioPortal] 48 | end 49 | end 50 | 51 | %% Connections 52 | AI -->|MCP Protocol| MCP 53 | MCP --> SEARCH 54 | MCP --> FETCH 55 | 56 | SEARCH --> ROUTER 57 | ROUTER --> PARSER 58 | PARSER --> UNIFIED 59 | 60 | ROUTER --> ARTICLES 61 | ROUTER --> TRIALS 62 | ROUTER --> VARIANTS 63 | ROUTER --> THINKING 64 | 65 | ARTICLES --> PUBMED 66 | ARTICLES --> BIORXIV 67 | ARTICLES --> EUROPEPMC 68 | ARTICLES -.->|Gene enrichment| CBIO 69 | 70 | TRIALS --> CLINICALTRIALS 71 | 72 | VARIANTS --> MYVARIANT 73 | MYVARIANT --> TCGA 74 | MYVARIANT --> KG 75 | VARIANTS --> CBIO 76 | 77 | THINKING -->|Internal| THINKING 78 | 79 | classDef clientClass fill:#e1f5fe,stroke:#01579b,stroke-width:2px 80 | classDef serverClass fill:#f3e5f5,stroke:#4a148c,stroke-width:2px 81 | classDef domainClass fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px 82 | classDef apiClass fill:#fff3e0,stroke:#e65100,stroke-width:2px 83 | 84 | class AI clientClass 85 | class MCP,SEARCH,FETCH serverClass 86 | class 
ARTICLES,TRIALS,VARIANTS,THINKING domainClass 87 | class PUBMED,BIORXIV,EUROPEPMC,CLINICALTRIALS,MYVARIANT,TCGA,KG,CBIO apiClass 88 | ``` 89 | 90 | ## Detailed Search Flow 91 | 92 | ```mermaid 93 | sequenceDiagram 94 | participant AI as AI Client 95 | participant MCP as MCP Server 96 | participant Router as Query Router 97 | participant Domain as Domain Handler 98 | participant API as External API 99 | 100 | AI->>MCP: search(query="gene:BRAF AND disease:melanoma") 101 | MCP->>Router: Parse & route query 102 | 103 | alt Unified Query 104 | Router->>Router: Parse field syntax 105 | Router->>Router: Create routing plan 106 | 107 | par Search Articles 108 | Router->>Domain: Search articles (BRAF, melanoma) 109 | Domain->>API: PubTator3 API call 110 | API-->>Domain: Article results 111 | Domain->>API: cBioPortal enrichment 112 | API-->>Domain: Mutation data 113 | and Search Trials 114 | Router->>Domain: Search trials (melanoma) 115 | Domain->>API: ClinicalTrials.gov API 116 | API-->>Domain: Trial results 117 | and Search Variants 118 | Router->>Domain: Search variants (BRAF) 119 | Domain->>API: MyVariant.info API 120 | API-->>Domain: Variant results 121 | end 122 | else Domain-specific 123 | Router->>Domain: Direct domain search 124 | Domain->>API: Single API call 125 | API-->>Domain: Domain results 126 | else Sequential Thinking 127 | Router->>Domain: Process thought 128 | Domain->>Domain: Update session state 129 | Domain-->>Router: Thought response 130 | end 131 | 132 | Domain-->>Router: Formatted results 133 | Router-->>MCP: Aggregated results 134 | MCP-->>AI: Standardized response 135 | ``` 136 | 137 | ## Search Tool Parameters 138 | 139 | ```mermaid 140 | graph LR 141 | subgraph "Search Tool Input" 142 | PARAMS[Parameters] 143 | QUERY[query: string] 144 | DOMAIN[domain: article/trial/variant/thinking] 145 | GENES[genes: list] 146 | DISEASES[diseases: list] 147 | CONDITIONS[conditions: list] 148 | LAT[lat/long: coordinates] 149 | THOUGHT[thought parameters] 150 | 
end 151 | 152 | subgraph "Search Modes" 153 | MODE1[Unified Query Mode<br/>Uses 'query' param] 154 | MODE2[Domain-Specific Mode<br/>Uses domain + params] 155 | MODE3[Thinking Mode<br/>Uses thought params] 156 | end 157 | 158 | PARAMS --> MODE1 159 | PARAMS --> MODE2 160 | PARAMS --> MODE3 161 | ``` 162 | 163 | ## Domain-Specific Data Sources 164 | 165 | ```mermaid 166 | graph TD 167 | subgraph "Articles Domain" 168 | A1[PubTator3/PubMed<br/>- Published articles<br/>- Annotations] 169 | A2[bioRxiv/medRxiv<br/>- Preprints<br/>- Early research] 170 | A3[Europe PMC<br/>- Open access<br/>- Full text] 171 | A4[cBioPortal Integration<br/>- Auto-enrichment when genes specified<br/>- Mutation summaries & hotspots] 172 | end 173 | 174 | subgraph "Trials Domain" 175 | T1[ClinicalTrials.gov<br/>- Active trials<br/>- Trial details<br/>- Location search] 176 | end 177 | 178 | subgraph "Variants Domain" 179 | V1[MyVariant.info<br/>- Variant annotations<br/>- Clinical significance] 180 | V2[TCGA<br/>- Cancer variants<br/>- Somatic mutations] 181 | V3[1000 Genomes<br/>- Population frequency<br/>- Allele data] 182 | V4[cBioPortal<br/>- Cancer mutations<br/>- Hotspots] 183 | end 184 | 185 | A1 -.->|When genes present| A4 186 | A2 -.->|When genes present| A4 187 | A3 -.->|When genes present| A4 188 | ``` 189 | 190 | ## Unified Query Language 191 | 192 | ```mermaid 193 | graph TD 194 | QUERY[Unified Query<br/>"gene:BRAF AND disease:melanoma"] 195 | 196 | QUERY --> PARSE[Query Parser] 197 | 198 | PARSE --> F1[Field: gene<br/>Value: BRAF] 199 | PARSE --> F2[Field: disease<br/>Value: melanoma] 200 | 201 | F1 --> D1[Articles Domain] 202 | F1 --> D2[Variants Domain] 203 | F2 --> D1 204 | F2 --> D3[Trials Domain] 205 | 206 | D1 --> R1[PubMed Results] 207 | D2 --> R2[Variant Results] 208 | D3 --> R3[Trial Results] 209 | 210 | R1 --> AGG[Aggregated Results] 211 | R2 --> AGG 212 | R3 --> AGG 213 | ``` 214 | 215 | ## Example: Location-Based Trial Search 216 | 217 | ```mermaid 218 | 
sequenceDiagram 219 | participant User as User 220 | participant AI as AI Client 221 | participant MCP as BioMCP 222 | participant GEO as Geocoding Service 223 | participant CT as ClinicalTrials.gov 224 | 225 | User->>AI: Find active trials in Cleveland for NSCLC 226 | AI->>AI: Recognize location needs geocoding 227 | AI->>GEO: Geocode "Cleveland" 228 | GEO-->>AI: lat: 41.4993, long: -81.6944 229 | 230 | AI->>MCP: search(domain="trial",<br/>diseases=["NSCLC"],<br/>lat=41.4993,<br/>long=-81.6944,<br/>distance=50) 231 | 232 | MCP->>CT: API call with geo filter 233 | CT-->>MCP: Trials near Cleveland 234 | MCP-->>AI: Formatted trial results 235 | AI-->>User: Here are X active NSCLC trials in Cleveland area 236 | ``` 237 | 238 | ## Key Features 239 | 240 | 1. **Parallel Execution**: Multiple domains are searched simultaneously for unified queries 241 | 2. **Smart Enrichment**: Article searches automatically include cBioPortal mutation summaries when genes are specified, providing clinical context alongside literature results 242 | 3. **Location Awareness**: Trial searches support geographic filtering with lat/long coordinates 243 | 4. **Sequential Thinking**: Built-in reasoning system for complex biomedical questions 244 | 5. **Standardized Output**: Results follow the OpenAI MCP `search`/`fetch` tool format, so every domain returns the same `id`, `title`, `text`, and `url` fields 245 | 246 | ## Response Format 247 | 248 | All search results follow this standardized structure: 249 | 250 | ```json 251 | { 252 | "results": [ 253 | { 254 | "id": "PMID12345678", 255 | "title": "BRAF V600E mutation in melanoma", 256 | "text": "This study investigates BRAF mutations...", 257 | "url": "https://pubmed.ncbi.nlm.nih.gov/12345678" 258 | } 259 | ] 260 | } 261 | ``` 262 | 263 | Fetch results include additional domain-specific metadata in the response.
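Key features 1 and 5 (parallel execution and a standardized result shape) can be illustrated with a small `asyncio` sketch. The domain handlers below are hypothetical stubs that return canned data with no network calls, and `NCT00000000` is a placeholder ID; only the `id`/`title`/`text`/`url` keys and the `{"results": [...]}` envelope come from the Response Format section above. `asyncio.gather(..., return_exceptions=True)` lets one failing domain drop out without cancelling the others.

```python
import asyncio
from collections.abc import Awaitable
from typing import Any

Result = dict[str, str]


async def search_articles_stub(gene: str) -> list[Result]:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return [{
        "id": "PMID12345678",
        "title": f"{gene} V600E mutation in melanoma",
        "text": f"This study investigates {gene} mutations...",
        "url": "https://pubmed.ncbi.nlm.nih.gov/12345678",
    }]


async def search_trials_stub(disease: str) -> list[Result]:
    await asyncio.sleep(0.01)
    return [{
        "id": "NCT00000000",  # placeholder trial ID
        "title": f"Example {disease} trial",
        "text": "Placeholder trial summary...",
        "url": "https://clinicaltrials.gov/study/NCT00000000",
    }]


async def failing_variants_stub(gene: str) -> list[Result]:
    raise RuntimeError("upstream API unavailable")


async def unified_search(tasks: list[Awaitable[list[Result]]]) -> dict[str, Any]:
    # Run every domain search concurrently; results keep the input order
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results: list[Result] = []
    for outcome in outcomes:
        if isinstance(outcome, BaseException):
            continue  # a real server would log or surface this error
        results.extend(outcome)
    return {"results": results}


response = asyncio.run(unified_search([
    search_articles_stub("BRAF"),
    search_trials_stub("melanoma"),
    failing_variants_stub("BRAF"),
]))
print(len(response["results"]))  # 2
```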
264 | ``` -------------------------------------------------------------------------------- /src/biomcp/openfda/drug_labels_helpers.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | Helper functions for OpenFDA drug labels to reduce complexity. 3 | """ 4 | 5 | from typing import Any 6 | 7 | from .input_validation import sanitize_input 8 | from .utils import clean_text, extract_drug_names, truncate_text 9 | 10 | 11 | def build_label_search_query( 12 | name: str | None, 13 | indication: str | None, 14 | boxed_warning: bool, 15 | section: str | None, 16 | ) -> str: 17 | """Build the search query for drug labels.""" 18 | search_parts = [] 19 | 20 | if name: 21 | # Sanitize input to prevent injection 22 | name = sanitize_input(name, max_length=100) 23 | 24 | if name: 25 | name_query = ( 26 | f'(openfda.brand_name:"{name}" OR ' 27 | f'openfda.generic_name:"{name}" OR ' 28 | f'openfda.substance_name:"{name}")' 29 | ) 30 | search_parts.append(name_query) 31 | 32 | if indication: 33 | # Sanitize indication input 34 | indication = sanitize_input(indication, max_length=200) 35 | if indication: 36 | search_parts.append(f'indications_and_usage:"{indication}"') 37 | 38 | if boxed_warning: 39 | search_parts.append("_exists_:boxed_warning") 40 | 41 | if section: 42 | # Map common section names to FDA fields 43 | section_map = { 44 | "indications": "indications_and_usage", 45 | "dosage": "dosage_and_administration", 46 | "contraindications": "contraindications", 47 | "warnings": "warnings_and_precautions", 48 | "adverse": "adverse_reactions", 49 | "interactions": "drug_interactions", 50 | "pregnancy": "pregnancy", 51 | "pediatric": "pediatric_use", 52 | "geriatric": "geriatric_use", 53 | "overdose": "overdosage", 54 | } 55 | field_name = section_map.get(section.lower(), section) 56 | search_parts.append(f"_exists_:{field_name}") 57 | 58 | return " AND ".join(search_parts) 59 | 60 | 61 | def format_label_summary(result: 
dict[str, Any], index: int) -> list[str]: 62 | """Format a single drug label summary.""" 63 | output = [] 64 | 65 | # Extract drug names 66 | drug_names = extract_drug_names(result) 67 | primary_name = drug_names[0] if drug_names else "Unknown Drug" 68 | 69 | output.append(f"#### {index}. {primary_name}") 70 | 71 | # Get OpenFDA data 72 | openfda = result.get("openfda", {}) 73 | 74 | # Show all names if multiple 75 | if len(drug_names) > 1: 76 | output.append(f"**Also known as**: {', '.join(drug_names[1:])}") 77 | 78 | # Basic info 79 | output.extend(_format_label_basic_info(openfda)) 80 | 81 | # Boxed warning 82 | if "boxed_warning" in result: 83 | warning_text = clean_text(" ".join(result["boxed_warning"])) 84 | output.append( 85 | f"\n⚠️ **BOXED WARNING**: {truncate_text(warning_text, 200)}" 86 | ) 87 | 88 | # Key sections 89 | output.extend(_format_label_key_sections(result)) 90 | 91 | # Set ID for retrieval 92 | if "set_id" in result: 93 | output.append(f"\n*Label ID: {result['set_id']}*") 94 | 95 | output.append("") 96 | return output 97 | 98 | 99 | def _format_label_basic_info(openfda: dict) -> list[str]: 100 | """Format basic label information from OpenFDA data.""" 101 | output = [] 102 | 103 | # Application number 104 | if app_numbers := openfda.get("application_number", []): 105 | output.append(f"**FDA Application**: {app_numbers[0]}") 106 | 107 | # Manufacturer 108 | if manufacturers := openfda.get("manufacturer_name", []): 109 | output.append(f"**Manufacturer**: {manufacturers[0]}") 110 | 111 | # Route 112 | if routes := openfda.get("route", []): 113 | output.append(f"**Route**: {', '.join(routes)}") 114 | 115 | return output 116 | 117 | 118 | def _format_label_key_sections(result: dict) -> list[str]: 119 | """Format key label sections.""" 120 | output = [] 121 | 122 | # Indications 123 | if "indications_and_usage" in result: 124 | indications_text = clean_text( 125 | " ".join(result["indications_and_usage"]) 126 | ) 127 | output.append( 128 | 
f"\n**Indications**: {truncate_text(indications_text, 300)}" 129 | ) 130 | 131 | # Contraindications 132 | if "contraindications" in result: 133 | contra_text = clean_text(" ".join(result["contraindications"])) 134 | output.append( 135 | f"\n**Contraindications**: {truncate_text(contra_text, 200)}" 136 | ) 137 | 138 | return output 139 | 140 | 141 | def format_label_header(result: dict[str, Any], set_id: str) -> list[str]: 142 | """Format the header for detailed drug label.""" 143 | output = [] 144 | 145 | drug_names = extract_drug_names(result) 146 | primary_name = drug_names[0] if drug_names else "Unknown Drug" 147 | 148 | output.append(f"## FDA Drug Label: {primary_name}\n") 149 | 150 | # Basic information 151 | openfda = result.get("openfda", {}) 152 | 153 | if len(drug_names) > 1: 154 | output.append(f"**Other Names**: {', '.join(drug_names[1:])}") 155 | 156 | output.extend(_format_detailed_metadata(openfda)) 157 | output.append(f"**Label ID**: {set_id}\n") 158 | 159 | return output 160 | 161 | 162 | def _format_detailed_metadata(openfda: dict) -> list[str]: 163 | """Format detailed metadata from OpenFDA.""" 164 | output = [] 165 | 166 | # FDA application numbers 167 | if app_numbers := openfda.get("application_number", []): 168 | output.append(f"**FDA Application**: {', '.join(app_numbers)}") 169 | 170 | # Manufacturers 171 | if manufacturers := openfda.get("manufacturer_name", []): 172 | output.append(f"**Manufacturer**: {', '.join(manufacturers)}") 173 | 174 | # Routes of administration 175 | if routes := openfda.get("route", []): 176 | output.append(f"**Route of Administration**: {', '.join(routes)}") 177 | 178 | # Pharmacologic class 179 | if pharm_classes := openfda.get("pharm_class_epc", []): 180 | output.append(f"**Pharmacologic Class**: {', '.join(pharm_classes)}") 181 | 182 | return output 183 | 184 | 185 | def format_label_section( 186 | result: dict[str, Any], section: str, section_titles: dict[str, str] 187 | ) -> list[str]: 188 | """Format a 
single label section.""" 189 | output: list[str] = [] 190 | 191 | if section not in result: 192 | return output 193 | 194 | title = section_titles.get(section, section.upper().replace("_", " ")) 195 | output.append(f"### {title}\n") 196 | 197 | section_text = result[section] 198 | if isinstance(section_text, list): 199 | section_text = " ".join(section_text) 200 | 201 | cleaned_text = clean_text(section_text) 202 | 203 | # For very long sections, provide a truncated version 204 | if len(cleaned_text) > 3000: 205 | output.append(truncate_text(cleaned_text, 3000)) 206 | output.append("\n*[Section truncated for brevity]*") 207 | else: 208 | output.append(cleaned_text) 209 | 210 | output.append("") 211 | return output 212 | 213 | 214 | def get_default_sections() -> list[str]: 215 | """Get the default sections to display.""" 216 | return [ 217 | "indications_and_usage", 218 | "dosage_and_administration", 219 | "contraindications", 220 | "warnings_and_precautions", 221 | "adverse_reactions", 222 | "drug_interactions", 223 | "use_in_specific_populations", 224 | "clinical_pharmacology", 225 | "clinical_studies", 226 | ] 227 | 228 | 229 | def get_section_titles() -> dict[str, str]: 230 | """Get the mapping of section names to display titles.""" 231 | return { 232 | "indications_and_usage": "INDICATIONS AND USAGE", 233 | "dosage_and_administration": "DOSAGE AND ADMINISTRATION", 234 | "contraindications": "CONTRAINDICATIONS", 235 | "warnings_and_precautions": "WARNINGS AND PRECAUTIONS", 236 | "adverse_reactions": "ADVERSE REACTIONS", 237 | "drug_interactions": "DRUG INTERACTIONS", 238 | "use_in_specific_populations": "USE IN SPECIFIC POPULATIONS", 239 | "clinical_pharmacology": "CLINICAL PHARMACOLOGY", 240 | "clinical_studies": "CLINICAL STUDIES", 241 | "how_supplied": "HOW SUPPLIED", 242 | "storage_and_handling": "STORAGE AND HANDLING", 243 | "patient_counseling_information": "PATIENT COUNSELING INFORMATION", 244 | "pregnancy": "PREGNANCY", 245 | "nursing_mothers": "NURSING 
MOTHERS", 246 | "pediatric_use": "PEDIATRIC USE", 247 | "geriatric_use": "GERIATRIC USE", 248 | "overdosage": "OVERDOSAGE", 249 | } 250 | ``` -------------------------------------------------------------------------------- /tests/tdd/test_drug_shortages.py: -------------------------------------------------------------------------------- ```python 1 | """Tests for FDA drug shortages module.""" 2 | 3 | from datetime import datetime 4 | from unittest.mock import AsyncMock, patch 5 | 6 | import pytest 7 | 8 | from biomcp.openfda.drug_shortages import ( 9 | get_drug_shortage, 10 | search_drug_shortages, 11 | ) 12 | 13 | 14 | class TestDrugShortages: 15 | """Test drug shortages functionality.""" 16 | 17 | @pytest.mark.asyncio 18 | async def test_search_drug_shortages_no_data_available(self): 19 | """Test drug shortage search when FDA data is unavailable.""" 20 | with patch( 21 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 22 | new_callable=AsyncMock, 23 | ) as mock_get_data: 24 | mock_get_data.return_value = None 25 | 26 | result = await search_drug_shortages(drug="cisplatin") 27 | 28 | assert "Drug Shortage Data Temporarily Unavailable" in result 29 | assert "FDA drug shortage database cannot be accessed" in result 30 | assert ( 31 | "https://www.accessdata.fda.gov/scripts/drugshortages/" 32 | in result 33 | ) 34 | assert ( 35 | "https://www.ashp.org/drug-shortages/current-shortages" 36 | in result 37 | ) 38 | 39 | @pytest.mark.asyncio 40 | async def test_get_drug_shortage_no_data_available(self): 41 | """Test getting specific drug shortage when FDA data is unavailable.""" 42 | with patch( 43 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 44 | new_callable=AsyncMock, 45 | ) as mock_get_data: 46 | mock_get_data.return_value = None 47 | 48 | result = await get_drug_shortage("cisplatin") 49 | 50 | assert "Drug Shortage Data Temporarily Unavailable" in result 51 | assert "FDA drug shortage database cannot be accessed" in result 52 | assert 
"Alternative Options:" in result 53 | 54 | @pytest.mark.asyncio 55 | async def test_mock_data_not_used_in_production(self): 56 | """Test that mock data is never returned in production scenarios.""" 57 | with patch( 58 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 59 | new_callable=AsyncMock, 60 | ) as mock_get_data: 61 | # Simulate no data available (cache miss and fetch failure) 62 | mock_get_data.return_value = None 63 | 64 | result = await search_drug_shortages(drug="test") 65 | 66 | assert "Drug Shortage Data Temporarily Unavailable" in result 67 | # Ensure mock data is not present 68 | assert "Cisplatin Injection" not in result 69 | assert "Methotrexate" not in result 70 | 71 | # Cache functionality test removed - was testing private implementation details 72 | # The public API is tested through search_drug_shortages and get_drug_shortage 73 | 74 | # Cache expiry test removed - was testing private implementation details 75 | # The caching behavior is an implementation detail not part of the public API 76 | 77 | @pytest.mark.asyncio 78 | async def test_search_with_filters(self): 79 | """Test drug shortage search with various filters.""" 80 | mock_data = { 81 | "_fetched_at": datetime.now().isoformat(), 82 | "shortages": [ 83 | { 84 | "generic_name": "Drug A", 85 | "brand_names": ["Brand A"], 86 | "status": "Current Shortage", 87 | "therapeutic_category": "Oncology", 88 | }, 89 | { 90 | "generic_name": "Drug B", 91 | "brand_names": ["Brand B"], 92 | "status": "Resolved", 93 | "therapeutic_category": "Cardiology", 94 | }, 95 | { 96 | "generic_name": "Drug C", 97 | "brand_names": ["Brand C"], 98 | "status": "Current Shortage", 99 | "therapeutic_category": "Oncology", 100 | }, 101 | ], 102 | } 103 | 104 | with patch( 105 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 106 | new_callable=AsyncMock, 107 | ) as mock_get_data: 108 | mock_get_data.return_value = mock_data 109 | 110 | # Test status filter 111 | result = await 
search_drug_shortages(status="current") 112 | assert "Drug A" in result 113 | assert "Drug C" in result 114 | assert "Drug B" not in result 115 | 116 | # Test therapeutic category filter 117 | result = await search_drug_shortages( 118 | therapeutic_category="Oncology" 119 | ) 120 | assert "Drug A" in result 121 | assert "Drug C" in result 122 | assert "Drug B" not in result 123 | 124 | # Test drug name filter 125 | result = await search_drug_shortages(drug="Drug B") 126 | assert "Drug B" in result 127 | assert "Drug A" not in result 128 | 129 | @pytest.mark.asyncio 130 | async def test_get_specific_drug_shortage(self): 131 | """Test getting details for a specific drug shortage.""" 132 | mock_data = { 133 | "_fetched_at": datetime.now().isoformat(), 134 | "shortages": [ 135 | { 136 | "generic_name": "Cisplatin Injection", 137 | "brand_names": ["Platinol"], 138 | "status": "Current Shortage", 139 | "shortage_start_date": "2023-02-10", 140 | "estimated_resolution": "Q2 2024", 141 | "reason": "Manufacturing delays", 142 | "therapeutic_category": "Oncology", 143 | "notes": "Limited supplies available", 144 | }, 145 | ], 146 | } 147 | 148 | with patch( 149 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 150 | new_callable=AsyncMock, 151 | ) as mock_get_data: 152 | mock_get_data.return_value = mock_data 153 | 154 | result = await get_drug_shortage("cisplatin") 155 | 156 | assert "Cisplatin Injection" in result 157 | assert "Current Shortage" in result 158 | assert "Manufacturing delays" in result 159 | assert "Oncology" in result 160 | assert "Limited supplies available" in result 161 | 162 | @pytest.mark.asyncio 163 | async def test_get_drug_shortage_not_found(self): 164 | """Test getting drug shortage for non-existent drug.""" 165 | mock_data = { 166 | "_fetched_at": datetime.now().isoformat(), 167 | "shortages": [ 168 | { 169 | "generic_name": "Drug A", 170 | "status": "Current Shortage", 171 | }, 172 | ], 173 | } 174 | 175 | with patch( 176 | 
"biomcp.openfda.drug_shortages._get_cached_shortage_data", 177 | new_callable=AsyncMock, 178 | ) as mock_get_data: 179 | mock_get_data.return_value = mock_data 180 | 181 | result = await get_drug_shortage("nonexistent-drug") 182 | 183 | assert "No shortage information found" in result 184 | assert "nonexistent-drug" in result 185 | 186 | @pytest.mark.asyncio 187 | async def test_api_key_parameter_ignored(self): 188 | """Test that API key parameter is accepted but not used (FDA limitation).""" 189 | mock_data = { 190 | "_fetched_at": datetime.now().isoformat(), 191 | "shortages": [ 192 | { 193 | "generic_name": "Test Drug", 194 | "status": "Current Shortage", 195 | "therapeutic_category": "Test Category", 196 | } 197 | ], 198 | } 199 | 200 | with patch( 201 | "biomcp.openfda.drug_shortages._get_cached_shortage_data", 202 | new_callable=AsyncMock, 203 | ) as mock_get_data: 204 | mock_get_data.return_value = mock_data 205 | 206 | # API key should be accepted but not affect functionality 207 | result = await search_drug_shortages( 208 | drug="test", 209 | api_key="test-key", 210 | ) 211 | 212 | # When there's data, it should format results 213 | assert "FDA Drug Shortage Information" in result 214 | assert "Test Drug" in result 215 | 216 | # Mock data function has been removed - no longer needed 217 | ``` -------------------------------------------------------------------------------- /tests/tdd/thinking/test_sequential.py: -------------------------------------------------------------------------------- ```python 1 | """Tests for sequential thinking functionality.""" 2 | 3 | from datetime import datetime 4 | 5 | import pytest 6 | 7 | from biomcp.thinking import sequential 8 | from biomcp.thinking.session import ThoughtEntry, _session_manager 9 | 10 | 11 | @pytest.fixture(autouse=True) 12 | def clear_thinking_state(): 13 | """Clear thinking state before each test.""" 14 | _session_manager.clear_all_sessions() 15 | yield 16 | _session_manager.clear_all_sessions() 17 | 18 
| 19 | class TestSequentialThinking: 20 | """Test the sequential thinking MCP tool.""" 21 | 22 | @pytest.mark.anyio 23 | async def test_basic_sequential_thinking(self): 24 | """Test basic sequential thinking flow.""" 25 | result = await sequential._sequential_thinking( 26 | thought="First step: analyze the problem", 27 | nextThoughtNeeded=True, 28 | thoughtNumber=1, 29 | totalThoughts=3, 30 | ) 31 | 32 | assert "Added thought 1 to main sequence" in result 33 | assert "Progress: 1/3 thoughts" in result 34 | assert "Next thought needed" in result 35 | 36 | # Get current session 37 | session = _session_manager.get_session() 38 | assert session is not None 39 | assert len(session.thought_history) == 1 40 | 41 | # Verify thought structure 42 | thought = session.thought_history[0] 43 | assert thought.thought == "First step: analyze the problem" 44 | assert thought.thought_number == 1 45 | assert thought.total_thoughts == 3 46 | assert thought.next_thought_needed is True 47 | assert thought.is_revision is False 48 | 49 | @pytest.mark.anyio 50 | async def test_multiple_sequential_thoughts(self): 51 | """Test adding multiple thoughts in sequence.""" 52 | # Add first thought 53 | await sequential._sequential_thinking( 54 | thought="First step", 55 | nextThoughtNeeded=True, 56 | thoughtNumber=1, 57 | totalThoughts=3, 58 | ) 59 | 60 | # Add second thought 61 | await sequential._sequential_thinking( 62 | thought="Second step", 63 | nextThoughtNeeded=True, 64 | thoughtNumber=2, 65 | totalThoughts=3, 66 | ) 67 | 68 | # Add final thought 69 | result = await sequential._sequential_thinking( 70 | thought="Final step", 71 | nextThoughtNeeded=False, 72 | thoughtNumber=3, 73 | totalThoughts=3, 74 | ) 75 | 76 | assert "Added thought 3 to main sequence" in result 77 | assert "Thinking sequence complete" in result 78 | session = _session_manager.get_session() 79 | assert len(session.thought_history) == 3 80 | 81 | @pytest.mark.anyio 82 | async def test_thought_revision(self): 83 | """Test 
revising a previous thought.""" 84 | # Add initial thought 85 | await sequential._sequential_thinking( 86 | thought="Initial analysis", 87 | nextThoughtNeeded=True, 88 | thoughtNumber=1, 89 | totalThoughts=2, 90 | ) 91 | 92 | # Revise the thought 93 | result = await sequential._sequential_thinking( 94 | thought="Better analysis", 95 | nextThoughtNeeded=True, 96 | thoughtNumber=1, 97 | totalThoughts=2, 98 | isRevision=True, 99 | revisesThought=1, 100 | ) 101 | 102 | assert "Revised thought 1" in result 103 | session = _session_manager.get_session() 104 | assert len(session.thought_history) == 1 105 | assert session.thought_history[0].thought == "Better analysis" 106 | assert session.thought_history[0].is_revision is True 107 | 108 | @pytest.mark.anyio 109 | async def test_branching_logic(self): 110 | """Test creating thought branches.""" 111 | # Add main sequence thoughts 112 | await sequential._sequential_thinking( 113 | thought="Main thought 1", 114 | nextThoughtNeeded=True, 115 | thoughtNumber=1, 116 | totalThoughts=3, 117 | ) 118 | 119 | await sequential._sequential_thinking( 120 | thought="Main thought 2", 121 | nextThoughtNeeded=True, 122 | thoughtNumber=2, 123 | totalThoughts=3, 124 | ) 125 | 126 | # Create a branch 127 | result = await sequential._sequential_thinking( 128 | thought="Alternative approach", 129 | nextThoughtNeeded=True, 130 | thoughtNumber=3, 131 | totalThoughts=3, 132 | branchFromThought=2, 133 | ) 134 | 135 | assert "Added thought 3 to branch 'branch_2'" in result 136 | session = _session_manager.get_session() 137 | assert len(session.thought_history) == 2 138 | assert len(session.thought_branches) == 1 139 | assert "branch_2" in session.thought_branches 140 | assert len(session.thought_branches["branch_2"]) == 1 141 | 142 | @pytest.mark.anyio 143 | async def test_validation_errors(self): 144 | """Test input validation errors.""" 145 | # Test invalid thought number 146 | result = await sequential._sequential_thinking( 147 | thought="Test", 
148 | nextThoughtNeeded=False, 149 | thoughtNumber=0, 150 | totalThoughts=1, 151 | ) 152 | assert "thoughtNumber must be >= 1" in result 153 | 154 | # Test invalid total thoughts 155 | result = await sequential._sequential_thinking( 156 | thought="Test", 157 | nextThoughtNeeded=False, 158 | thoughtNumber=1, 159 | totalThoughts=0, 160 | ) 161 | assert "totalThoughts must be >= 1" in result 162 | 163 | # Test revision without specifying which thought 164 | result = await sequential._sequential_thinking( 165 | thought="Test", 166 | nextThoughtNeeded=False, 167 | thoughtNumber=1, 168 | totalThoughts=1, 169 | isRevision=True, 170 | ) 171 | assert ( 172 | "revisesThought must be specified when isRevision=True" in result 173 | ) 174 | 175 | @pytest.mark.anyio 176 | async def test_needs_more_thoughts(self): 177 | """Test the needsMoreThoughts parameter.""" 178 | result = await sequential._sequential_thinking( 179 | thought="This problem is more complex than expected", 180 | nextThoughtNeeded=True, 181 | thoughtNumber=3, 182 | totalThoughts=3, 183 | needsMoreThoughts=True, 184 | ) 185 | 186 | assert "Added thought 3 to main sequence" in result 187 | session = _session_manager.get_session() 188 | assert len(session.thought_history) == 1 189 | assert ( 190 | session.thought_history[0].metadata.get("needsMoreThoughts") 191 | is True 192 | ) 193 | 194 | 195 | class TestUtilityFunctions: 196 | """Test utility functions.""" 197 | 198 | def test_get_current_timestamp(self): 199 | """Test timestamp generation.""" 200 | timestamp = sequential.get_current_timestamp() 201 | assert isinstance(timestamp, str) 202 | # Should be able to parse as ISO format 203 | parsed = datetime.fromisoformat( 204 | timestamp.replace("Z", "+00:00").replace("T", " ").split(".")[0] 205 | ) 206 | assert isinstance(parsed, datetime) 207 | 208 | def test_session_management(self): 209 | """Test session management functionality.""" 210 | # Clear any existing sessions 211 | _session_manager.clear_all_sessions() 
212 | 213 | # Create a new session 214 | session = _session_manager.create_session() 215 | assert session is not None 216 | assert session.session_id is not None 217 | 218 | # Add a thought entry 219 | entry = ThoughtEntry( 220 | thought="Test thought", 221 | thought_number=1, 222 | total_thoughts=1, 223 | next_thought_needed=False, 224 | ) 225 | session.add_thought(entry) 226 | assert len(session.thought_history) == 1 227 | assert session.thought_history[0].thought == "Test thought" 228 | 229 | # Test branch creation 230 | branch_entry = ThoughtEntry( 231 | thought="Branch thought", 232 | thought_number=2, 233 | total_thoughts=2, 234 | next_thought_needed=False, 235 | branch_id="test-branch", 236 | branch_from_thought=1, 237 | ) 238 | session.add_thought(branch_entry) 239 | assert len(session.thought_branches) == 1 240 | assert "test-branch" in session.thought_branches 241 | assert len(session.thought_branches["test-branch"]) == 1 242 | ``` -------------------------------------------------------------------------------- /tests/tdd/openfda/test_drug_labels.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | Unit tests for OpenFDA drug labels integration. 3 | """ 4 | 5 | from unittest.mock import patch 6 | 7 | import pytest 8 | 9 | from biomcp.openfda.drug_labels import get_drug_label, search_drug_labels 10 | 11 | 12 | @pytest.mark.asyncio 13 | async def test_search_drug_labels_by_name(): 14 | """Test searching drug labels by name.""" 15 | mock_response = { 16 | "meta": {"results": {"total": 5}}, 17 | "results": [ 18 | { 19 | "set_id": "abc123", 20 | "openfda": { 21 | "brand_name": ["KEYTRUDA"], 22 | "generic_name": ["PEMBROLIZUMAB"], 23 | "application_number": ["BLA125514"], 24 | "manufacturer_name": ["MERCK"], 25 | "route": ["INTRAVENOUS"], 26 | }, 27 | "indications_and_usage": [ 28 | "KEYTRUDA is indicated for the treatment of patients with unresectable or metastatic melanoma." 
29 | ], 30 | "boxed_warning": [ 31 | "Immune-mediated adverse reactions can occur." 32 | ], 33 | } 34 | ], 35 | } 36 | 37 | with patch( 38 | "biomcp.openfda.drug_labels.make_openfda_request" 39 | ) as mock_request: 40 | mock_request.return_value = (mock_response, None) 41 | 42 | result = await search_drug_labels(name="pembrolizumab", limit=10) 43 | 44 | # Verify request 45 | mock_request.assert_called_once() 46 | call_args = mock_request.call_args 47 | assert "pembrolizumab" in call_args[0][1]["search"].lower() 48 | 49 | # Check output 50 | assert "FDA Drug Labels" in result 51 | assert "KEYTRUDA" in result 52 | assert "PEMBROLIZUMAB" in result 53 | assert "melanoma" in result 54 | assert "BOXED WARNING" in result 55 | assert "Immune-mediated" in result 56 | assert "abc123" in result 57 | 58 | 59 | @pytest.mark.asyncio 60 | async def test_search_drug_labels_by_indication(): 61 | """Test searching drug labels by indication.""" 62 | mock_response = { 63 | "meta": {"results": {"total": 10}}, 64 | "results": [ 65 | { 66 | "set_id": "xyz789", 67 | "openfda": { 68 | "brand_name": ["DRUG X"], 69 | "generic_name": ["GENERIC X"], 70 | }, 71 | "indications_and_usage": [ 72 | "Indicated for breast cancer treatment" 73 | ], 74 | } 75 | ], 76 | } 77 | 78 | with patch( 79 | "biomcp.openfda.drug_labels.make_openfda_request" 80 | ) as mock_request: 81 | mock_request.return_value = (mock_response, None) 82 | 83 | result = await search_drug_labels(indication="breast cancer") 84 | 85 | # Verify request 86 | call_args = mock_request.call_args 87 | assert "breast cancer" in call_args[0][1]["search"].lower() 88 | 89 | # Check output 90 | assert "breast cancer" in result 91 | assert "10 labels" in result 92 | 93 | 94 | @pytest.mark.asyncio 95 | async def test_search_drug_labels_no_params(): 96 | """Test that searching without parameters returns helpful message.""" 97 | result = await search_drug_labels() 98 | 99 | assert "Please specify" in result 100 | assert "drug name, indication, or 
label section" in result 101 | assert "Examples:" in result 102 | 103 | 104 | @pytest.mark.asyncio 105 | async def test_search_drug_labels_boxed_warning_filter(): 106 | """Test filtering for drugs with boxed warnings.""" 107 | mock_response = { 108 | "meta": {"results": {"total": 3}}, 109 | "results": [ 110 | { 111 | "set_id": "warn123", 112 | "openfda": {"brand_name": ["WARNING DRUG"]}, 113 | "boxed_warning": ["Serious warning text"], 114 | } 115 | ], 116 | } 117 | 118 | with patch( 119 | "biomcp.openfda.drug_labels.make_openfda_request" 120 | ) as mock_request: 121 | mock_request.return_value = (mock_response, None) 122 | 123 | result = await search_drug_labels(boxed_warning=True) 124 | 125 | # Verify boxed warning filter in search 126 | call_args = mock_request.call_args 127 | assert "_exists_:boxed_warning" in call_args[0][1]["search"] 128 | 129 | # Check output 130 | assert "WARNING DRUG" in result 131 | assert "Serious warning" in result 132 | 133 | 134 | @pytest.mark.asyncio 135 | async def test_get_drug_label_detail(): 136 | """Test getting detailed drug label.""" 137 | mock_response = { 138 | "results": [ 139 | { 140 | "set_id": "detail123", 141 | "openfda": { 142 | "brand_name": ["DETAILED DRUG"], 143 | "generic_name": ["GENERIC DETAILED"], 144 | "application_number": ["NDA123456"], 145 | "manufacturer_name": ["PHARMA CORP"], 146 | "route": ["ORAL"], 147 | "pharm_class_epc": ["KINASE INHIBITOR"], 148 | }, 149 | "boxed_warning": ["Serious boxed warning"], 150 | "indications_and_usage": ["Indicated for cancer"], 151 | "dosage_and_administration": ["Take once daily"], 152 | "contraindications": ["Do not use if allergic"], 153 | "warnings_and_precautions": ["Monitor liver function"], 154 | "adverse_reactions": ["Common: nausea, fatigue"], 155 | "drug_interactions": ["Avoid with CYP3A4 inhibitors"], 156 | "clinical_pharmacology": ["Mechanism of action details"], 157 | "clinical_studies": ["Phase 3 trial results"], 158 | } 159 | ] 160 | } 161 | 162 | with 
patch( 163 | "biomcp.openfda.drug_labels.make_openfda_request" 164 | ) as mock_request: 165 | mock_request.return_value = (mock_response, None) 166 | 167 | result = await get_drug_label("detail123") 168 | 169 | # Verify request 170 | mock_request.assert_called_once() 171 | call_args = mock_request.call_args 172 | assert "detail123" in call_args[0][1]["search"] 173 | 174 | # Check detailed output 175 | assert "DETAILED DRUG" in result 176 | assert "GENERIC DETAILED" in result 177 | assert "NDA123456" in result 178 | assert "PHARMA CORP" in result 179 | assert "ORAL" in result 180 | assert "KINASE INHIBITOR" in result 181 | assert "BOXED WARNING" in result 182 | assert "Serious boxed warning" in result 183 | assert "INDICATIONS AND USAGE" in result 184 | assert "Indicated for cancer" in result 185 | assert "DOSAGE AND ADMINISTRATION" in result 186 | assert "Take once daily" in result 187 | assert "CONTRAINDICATIONS" in result 188 | assert "WARNINGS AND PRECAUTIONS" in result 189 | assert "ADVERSE REACTIONS" in result 190 | assert "DRUG INTERACTIONS" in result 191 | 192 | 193 | @pytest.mark.asyncio 194 | async def test_get_drug_label_specific_sections(): 195 | """Test getting specific sections of drug label.""" 196 | mock_response = { 197 | "results": [ 198 | { 199 | "set_id": "section123", 200 | "openfda": {"brand_name": ["SECTION DRUG"]}, 201 | "indications_and_usage": ["Cancer indication"], 202 | "adverse_reactions": ["Side effects list"], 203 | "clinical_studies": ["Study data"], 204 | } 205 | ] 206 | } 207 | 208 | with patch( 209 | "biomcp.openfda.drug_labels.make_openfda_request" 210 | ) as mock_request: 211 | mock_request.return_value = (mock_response, None) 212 | 213 | sections = ["indications_and_usage", "adverse_reactions"] 214 | result = await get_drug_label("section123", sections) 215 | 216 | # Check that requested sections are included 217 | assert "INDICATIONS AND USAGE" in result 218 | assert "Cancer indication" in result 219 | assert "ADVERSE 
REACTIONS" in result 220 | assert "Side effects list" in result 221 | # Clinical studies should not be in output since not requested 222 | assert "CLINICAL STUDIES" not in result 223 | 224 | 225 | @pytest.mark.asyncio 226 | async def test_get_drug_label_not_found(): 227 | """Test handling when drug label is not found.""" 228 | with patch( 229 | "biomcp.openfda.drug_labels.make_openfda_request" 230 | ) as mock_request: 231 | mock_request.return_value = ({"results": []}, None) 232 | 233 | result = await get_drug_label("NOTFOUND456") 234 | 235 | assert "NOTFOUND456" in result 236 | assert "not found" in result 237 | ``` -------------------------------------------------------------------------------- /docs/getting-started/03-authentication-and-api-keys.md: -------------------------------------------------------------------------------- ```markdown 1 | # Authentication and API Keys 2 | 3 | BioMCP integrates with multiple biomedical databases. While many features work without authentication, some advanced capabilities require API keys for enhanced functionality. 4 | 5 | ## Overview of API Keys 6 | 7 | | Service | Required? 
| Features Enabled | Get Key | 8 | | --------------- | ---------- | ------------------------------------------------- | ---------------------------------------------------------------------- | 9 | | **NCI API** | Optional | Advanced clinical trial filters, biomarker search | [api.cancer.gov](https://api.cancer.gov) | 10 | | **AlphaGenome** | Required\* | Variant effect predictions | [deepmind.google.com](https://deepmind.google.com/science/alphagenome) | 11 | | **cBioPortal** | Optional | Enhanced cancer genomics queries | [cbioportal.org](https://www.cbioportal.org/webAPI) | 12 | 13 | \*Required only when using AlphaGenome features 14 | 15 | ## Setting Up API Keys 16 | 17 | ### Method 1: Environment Variables (Recommended for Personal Use) 18 | 19 | Set environment variables in your shell configuration: 20 | 21 | ```bash 22 | # Add to ~/.bashrc, ~/.zshrc, or equivalent 23 | export NCI_API_KEY="your-nci-api-key" 24 | export ALPHAGENOME_API_KEY="your-alphagenome-key" 25 | export CBIO_TOKEN="your-cbioportal-token" 26 | ``` 27 | 28 | ### Method 2: Configuration Files 29 | 30 | #### For Claude Desktop 31 | 32 | Add keys to your Claude Desktop configuration: 33 | 34 | ```json 35 | { 36 | "mcpServers": { 37 | "biomcp": { 38 | "command": "uv", 39 | "args": ["run", "--with", "biomcp-python", "biomcp", "run"], 40 | "env": { 41 | "NCI_API_KEY": "your-nci-api-key", 42 | "ALPHAGENOME_API_KEY": "your-alphagenome-key", 43 | "CBIO_TOKEN": "your-cbioportal-token" 44 | } 45 | } 46 | } 47 | } 48 | ``` 49 | 50 | #### For Docker Deployments 51 | 52 | Include in your Docker run command: 53 | 54 | ```bash 55 | docker run -e NCI_API_KEY="your-key" \ 56 | -e ALPHAGENOME_API_KEY="your-key" \ 57 | -e CBIO_TOKEN="your-token" \ 58 | biomcp:latest 59 | ``` 60 | 61 | ### Method 3: Per-Request Keys (For Hosted Environments) 62 | 63 | When using BioMCP through AI assistants or hosted services, provide keys in your request: 64 | 65 | ``` 66 | "Predict effects of BRAF V600E mutation. 
My AlphaGenome API key is YOUR_KEY_HERE" 67 | ``` 68 | 69 | The AI will recognize patterns like "My [service] API key is..." and use the key for that request only. 70 | 71 | ## Individual Service Setup 72 | 73 | ### NCI Clinical Trials API 74 | 75 | The National Cancer Institute API provides advanced clinical trial search capabilities. 76 | 77 | #### Getting Your Key 78 | 79 | 1. Visit [api.cancer.gov](https://api.cancer.gov) 80 | 2. Click "Get API Key" 81 | 3. Complete registration 82 | 4. Key is emailed immediately 83 | 84 | #### Features Enabled 85 | 86 | - Advanced biomarker-based trial search 87 | - Organization and investigator lookups 88 | - Intervention and disease vocabularies 89 | - Higher rate limits (1000 requests/day vs 100) 90 | 91 | #### Usage Example 92 | 93 | ```bash 94 | # With API key set 95 | export NCI_API_KEY="your-key" 96 | 97 | # Search trials with biomarker criteria 98 | biomcp trial search --condition melanoma --source nci \ 99 | --required-mutations "BRAF V600E" --allow-brain-mets true 100 | ``` 101 | 102 | ### AlphaGenome 103 | 104 | Google DeepMind's AlphaGenome predicts variant effects on gene expression and chromatin accessibility. 105 | 106 | #### Getting Your Key 107 | 108 | 1. Visit [AlphaGenome Portal](https://deepmind.google.com/science/alphagenome) 109 | 2. Register for non-commercial use 110 | 3. Receive API key via email 111 | 4. 
Accept terms of service 112 | 113 | #### Features Enabled 114 | 115 | - Gene expression predictions 116 | - Chromatin accessibility analysis 117 | - Splicing effect predictions 118 | - Tissue-specific analyses 119 | 120 | #### Usage Examples 121 | 122 | **CLI with environment variable:** 123 | 124 | ```bash 125 | export ALPHAGENOME_API_KEY="your-key" 126 | biomcp variant predict chr7 140753336 A T 127 | ``` 128 | 129 | **CLI with per-request key:** 130 | 131 | ```bash 132 | biomcp variant predict chr7 140753336 A T --api-key YOUR_KEY 133 | ``` 134 | 135 | **Through AI assistant:** 136 | 137 | ``` 138 | "Predict regulatory effects of BRAF V600E (chr7:140753336 A>T). 139 | My AlphaGenome API key is YOUR_KEY_HERE" 140 | ``` 141 | 142 | ### cBioPortal 143 | 144 | The cBioPortal token enables enhanced cancer genomics queries. 145 | 146 | #### Getting Your Token 147 | 148 | 1. Create account at [cbioportal.org](https://www.cbioportal.org) 149 | 2. Navigate to "Web API" section 150 | 3. Generate a personal access token 151 | 4. Copy the token (shown only once) 152 | 153 | #### Features Enabled 154 | 155 | - Higher API rate limits 156 | - Access to private studies (if authorized) 157 | - Batch query capabilities 158 | - Extended timeout limits 159 | 160 | #### Usage 161 | 162 | cBioPortal integration is automatic when searching for genes. 
To use the token, export it before running gene searches: 163 | 164 | ```bash 165 | # Enhanced gene search with cancer genomics 166 | export CBIO_TOKEN="your-token" 167 | biomcp article search --gene BRAF --disease melanoma 168 | ``` 169 | 170 | ## Security Best Practices 171 | 172 | ### DO: 173 | 174 | - Store keys in environment variables or secure config files 175 | - Use per-request keys in shared/hosted environments 176 | - Rotate keys periodically 177 | - Use separate keys for development/production 178 | 179 | ### DON'T: 180 | 181 | - Commit keys to version control 182 | - Share keys with others 183 | - Include keys in code or documentation 184 | - Store keys in plain text files 185 | 186 | ### Git Security 187 | 188 | Add to `.gitignore`: 189 | 190 | ``` 191 | .env 192 | .env.local 193 | *.key 194 | config/secrets/ 195 | ``` 196 | 197 | Use git-secrets to prevent accidental commits: 198 | 199 | ```bash 200 | # Install git-secrets 201 | brew install git-secrets # macOS 202 | # or follow instructions at github.com/awslabs/git-secrets 203 | 204 | # Set up in your repo 205 | git secrets --install 206 | git secrets --register-aws # Registers AWS key patterns only; add others with git secrets --add 207 | ``` 208 | 209 | ## Troubleshooting 210 | 211 | ### "API Key Required" Errors 212 | 213 | **For AlphaGenome:** 214 | 215 | - This service always requires a key 216 | - Provide it via environment variable or per-request 217 | - Check key spelling and format 218 | 219 | **For NCI:** 220 | 221 | - Basic search works without a key 222 | - Advanced features require authentication 223 | - Verify the key is active at api.cancer.gov 224 | 225 | ### "Invalid API Key" Errors 226 | 227 | 1. Check for extra spaces or quotes 228 | 2. Ensure key hasn't expired 229 | 3. Verify you're using the correct service's key 230 | 4.
Test key directly with the service's API 231 | 232 | ### Rate Limit Errors 233 | 234 | **Without API keys:** 235 | 236 | - Public limits are restrictive (e.g., 100 requests/day) 237 | - Add delays between requests 238 | - Consider getting API keys 239 | 240 | **With API keys:** 241 | 242 | - Limits are much higher but still exist 243 | - Implement exponential backoff 244 | - Cache results when possible 245 | 246 | ## Testing Your Setup 247 | 248 | ### Check Environment Variables 249 | 250 | ```bash 251 | # List all BioMCP-related environment variables 252 | env | grep -E "(NCI_API_KEY|ALPHAGENOME_API_KEY|CBIO_TOKEN)" 253 | ``` 254 | 255 | ### Test Each Service 256 | 257 | ```bash 258 | # Test NCI API 259 | biomcp trial search --condition cancer --source nci --limit 1 260 | 261 | # Test AlphaGenome (requires key) 262 | biomcp variant predict chr7 140753336 A T --limit 1 263 | 264 | # Test cBioPortal integration 265 | biomcp article search --gene TP53 --limit 1 266 | ``` 267 | 268 | ## API Key Management Tools 269 | 270 | For managing multiple API keys securely: 271 | 272 | ### 1. direnv (Recommended) 273 | 274 | ```bash 275 | # Install direnv 276 | brew install direnv # macOS 277 | # Add to shell: eval "$(direnv hook zsh)" 278 | 279 | # Create .envrc in project 280 | echo 'export NCI_API_KEY="your-key"' > .envrc 281 | direnv allow 282 | ``` 283 | 284 | ### 2. 1Password CLI 285 | 286 | ```bash 287 | # Store in 1Password 288 | op item create --category=password \ 289 | --title="BioMCP API Keys" \ 290 | --vault="Development" \ 291 | NCI_API_KEY="your-key" 292 | 293 | # Load in shell 294 | export NCI_API_KEY=$(op read "op://Development/BioMCP API Keys/NCI_API_KEY") 295 | ``` 296 | 297 | ### 3. 
AWS Secrets Manager 298 | 299 | ```bash 300 | # Store secret 301 | aws secretsmanager create-secret \ 302 | --name biomcp/api-keys \ 303 | --secret-string '{"NCI_API_KEY":"your-key"}' 304 | 305 | # Retrieve in script 306 | export NCI_API_KEY=$(aws secretsmanager get-secret-value \ 307 | --secret-id biomcp/api-keys \ 308 | --query SecretString \ 309 | --output text | jq -r .NCI_API_KEY) 310 | ``` 311 | 312 | ## Next Steps 313 | 314 | Now that you have API keys configured: 315 | 316 | 1. Test each service to ensure keys work 317 | 2. Explore [How-to Guides](../how-to-guides/01-find-articles-and-cbioportal-data.md) for advanced features 318 | 3. Set up [logging and monitoring](../how-to-guides/05-logging-and-monitoring-with-bigquery.md) 319 | 4. Review [security policies](../policies.md) for your organization 320 | ``` -------------------------------------------------------------------------------- /docs/concepts/03-sequential-thinking-with-the-think-tool.md: -------------------------------------------------------------------------------- ```markdown 1 | # Sequential Thinking with the Think Tool 2 | 3 | ## CRITICAL: The Think Tool is MANDATORY 4 | 5 | **The 'think' tool must be your FIRST action when using BioMCP. This is not optional.** 6 | 7 | For detailed technical documentation on the think tool parameters and usage, see the [MCP Tools Reference - Think Tool](../user-guides/02-mcp-tools-reference.md#3-think). 8 | 9 | ## Why Sequential Thinking? 10 | 11 | Biomedical research is inherently complex, requiring systematic analysis of interconnected data from multiple sources. 
The think tool enforces a structured approach that: 12 | 13 | - **Prevents Information Overload**: Breaks complex queries into manageable steps 14 | - **Ensures Comprehensive Coverage**: Systematic thinking catches details that might be missed 15 | - **Documents Reasoning**: Creates an audit trail of research decisions 16 | - **Improves Accuracy**: Thoughtful planning leads to better search strategies 17 | 18 | ## Mandatory Usage Requirements 19 | 20 | 🚨 **REQUIRED USAGE:** 21 | 22 | - You MUST call 'think' BEFORE any search or fetch operations 23 | - EVERY biomedical research query requires thinking first 24 | - ALL multi-step analyses must begin with the think tool 25 | - ANY task using BioMCP tools requires prior planning with think 26 | 27 | ⚠️ **WARNING - Skipping the think tool will result in:** 28 | 29 | - Incomplete analysis 30 | - Poor search strategies 31 | - Missing critical connections 32 | - Suboptimal results 33 | - Frustrated users 34 | 35 | ## How to Use the Think Tool 36 | 37 | The think tool accepts these parameters: 38 | 39 | ```python 40 | think( 41 | thought="Your reasoning about the current step", 42 | thoughtNumber=1, # Sequential number starting from 1 43 | totalThoughts=5, # Optional: estimated total thoughts needed 44 | nextThoughtNeeded=True # Set to False only when analysis is complete 45 | ) 46 | ``` 47 | 48 | ## Sequential Thinking Patterns 49 | 50 | ### Pattern 1: Initial Query Decomposition 51 | 52 | Always start by breaking down the user's query: 53 | 54 | ```python 55 | # User asks: "What are the treatment options for BRAF V600E melanoma?" 
56 | 57 | think( 58 | thought="Breaking down query: Need to find 1) BRAF V600E mutation significance in melanoma, 2) approved treatments for BRAF-mutant melanoma, 3) clinical trials for new therapies, 4) resistance mechanisms and combination strategies", 59 | thoughtNumber=1, 60 | nextThoughtNeeded=True 61 | ) 62 | ``` 63 | 64 | ### Pattern 2: Search Strategy Planning 65 | 66 | Plan your data collection approach: 67 | 68 | ```python 69 | think( 70 | thought="Search strategy: First use gene_getter for BRAF context, then article_searcher for BRAF V600E melanoma treatments focusing on FDA-approved drugs, followed by trial_searcher for ongoing studies with BRAF inhibitors", 71 | thoughtNumber=2, 72 | nextThoughtNeeded=True 73 | ) 74 | ``` 75 | 76 | ### Pattern 3: Progressive Refinement 77 | 78 | Document findings and adjust strategy: 79 | 80 | ```python 81 | think( 82 | thought="Found 3 FDA-approved BRAF inhibitors (vemurafenib, dabrafenib, encorafenib). Need to search for combination therapies with MEK inhibitors based on resistance patterns identified in literature", 83 | thoughtNumber=3, 84 | nextThoughtNeeded=True 85 | ) 86 | ``` 87 | 88 | ### Pattern 4: Synthesis Planning 89 | 90 | Before creating final output: 91 | 92 | ```python 93 | think( 94 | thought="Ready to synthesize: Will organize findings into 1) First-line treatments (BRAF+MEK combos), 2) Second-line options (immunotherapy), 3) Emerging therapies from trials, 4) Resistance mechanisms to consider", 95 | thoughtNumber=4, 96 | nextThoughtNeeded=False # Analysis complete 97 | ) 98 | ``` 99 | 100 | ## Common Think Tool Workflows 101 | 102 | ### Literature Review Workflow 103 | 104 | ```python 105 | # Step 1: Problem definition 106 | think(thought="User wants comprehensive review of CDK4/6 inhibitors in breast cancer...", thoughtNumber=1) 107 | 108 | # Step 2: Search parameters 109 | think(thought="Will search for palbociclib, ribociclib, abemaciclib in HR+/HER2- breast cancer...", thoughtNumber=2) 110 | 111 
| # Step 3: Quality filtering 112 | think(thought="Found 47 articles, filtering for Phase III trials and meta-analyses...", thoughtNumber=3) 113 | 114 | # Step 4: Evidence synthesis 115 | think(thought="Identified consistent PFS benefit across trials, now analyzing OS data...", thoughtNumber=4) 116 | ``` 117 | 118 | ### Clinical Trial Analysis Workflow 119 | 120 | ```python 121 | # Step 1: Criteria identification 122 | think(thought="Patient has EGFR L858R lung cancer, progressed on osimertinib...", thoughtNumber=1) 123 | 124 | # Step 2: Trial search strategy 125 | think(thought="Searching for trials accepting EGFR-mutant NSCLC after TKI resistance...", thoughtNumber=2) 126 | 127 | # Step 3: Eligibility assessment 128 | think(thought="Found 12 trials, checking for brain metastases eligibility...", thoughtNumber=3) 129 | 130 | # Step 4: Prioritization 131 | think(thought="Ranking trials by proximity, novel mechanisms, and enrollment status...", thoughtNumber=4) 132 | ``` 133 | 134 | ### Variant Interpretation Workflow 135 | 136 | ```python 137 | # Step 1: Variant identification 138 | think(thought="Analyzing TP53 R248Q mutation found in patient's tumor...", thoughtNumber=1) 139 | 140 | # Step 2: Database queries 141 | think(thought="Will check MyVariant for population frequency, cBioPortal for cancer prevalence...", thoughtNumber=2) 142 | 143 | # Step 3: Functional assessment 144 | think(thought="Variant is pathogenic, affects DNA binding domain, common in multiple cancers...", thoughtNumber=3) 145 | 146 | # Step 4: Clinical implications 147 | think(thought="Synthesizing prognostic impact and potential therapeutic vulnerabilities...", thoughtNumber=4) 148 | ``` 149 | 150 | ## Think Tool Best Practices 151 | 152 | ### DO: 153 | 154 | - Start EVERY BioMCP session with think 155 | - Use sequential numbering (1, 2, 3...) 
156 | - Document key findings in each thought 157 | - Adjust strategy based on intermediate results 158 | - Use think to track progress through complex analyses 159 | 160 | ### DON'T: 161 | 162 | - Skip think and jump to searches 163 | - Use think only at the beginning 164 | - Set nextThoughtNeeded=False prematurely 165 | - Use generic thoughts without specific content 166 | - Forget to document decision rationale 167 | 168 | ## Integration with Other Tools 169 | 170 | Interleave think calls with your other tool calls: 171 | 172 | ```python 173 | # CORRECT PATTERN 174 | think(thought="Planning BRAF melanoma research...", thoughtNumber=1) 175 | gene_info = gene_getter("BRAF") 176 | 177 | think(thought="BRAF is a serine/threonine kinase, V600E creates constitutive activation. Searching for targeted therapies...", thoughtNumber=2) 178 | articles = article_searcher(genes=["BRAF"], diseases=["melanoma"], keywords=["vemurafenib", "dabrafenib"]) 179 | 180 | think(thought="Found key trials showing BRAF+MEK combination superiority. Checking for active trials...", thoughtNumber=3) 181 | trials = trial_searcher(conditions=["melanoma"], interventions=["BRAF inhibitor"]) 182 | 183 | # INCORRECT PATTERN - NO THINKING 184 | gene_info = gene_getter("BRAF") # ❌ Started without thinking 185 | articles = article_searcher(...) # ❌ No strategy planning 186 | ``` 187 | 188 | ## Reminder System 189 | 190 | BioMCP includes automatic reminders if you forget to use think: 191 | 192 | - Search results will include a reminder message 193 | - The reminder appears as a system message 194 | - It prompts you to use think for better results 195 | - This encourages a consistent methodology 196 | 197 | ## Advanced Sequential Thinking 198 | 199 | ### Branching Logic 200 | 201 | Use think to handle conditional paths: 202 | 203 | ```python 204 | think( 205 | thought="No direct trials found for this rare mutation.
Pivoting to search for basket trials and mutation-agnostic approaches...", 206 | thoughtNumber=5, 207 | nextThoughtNeeded=True 208 | ) 209 | ``` 210 | 211 | ### Error Recovery 212 | 213 | Document and adjust when searches fail: 214 | 215 | ```python 216 | think( 217 | thought="MyVariant query failed for this structural variant. Will use article search to find functional studies instead...", 218 | thoughtNumber=6, 219 | nextThoughtNeeded=True 220 | ) 221 | ``` 222 | 223 | ### Complex Integration 224 | 225 | Coordinate multiple data sources: 226 | 227 | ```python 228 | think( 229 | thought="Integrating findings: cBioPortal shows 15% frequency in lung adenocarcinoma, articles describe resistance mechanisms, trials testing combination strategies...", 230 | thoughtNumber=7, 231 | nextThoughtNeeded=True 232 | ) 233 | ``` 234 | 235 | ## Conclusion 236 | 237 | The think tool is not just a requirement—it's your research companion that ensures systematic, thorough, and reproducible biomedical research. By following sequential thinking patterns, you'll deliver comprehensive insights that address all aspects of complex biomedical queries. 238 | 239 | Remember: **Always think first, then search. Document your reasoning. Only mark thinking complete when your analysis is truly finished.** 240 | ```
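The session behaviour asserted in the `TestSequentialThinking` tests earlier on this page (thoughts appended to a main sequence, a revision replacing the earlier entry in place, branches stored under `branch_<n>`, and input validation for `thoughtNumber`, `totalThoughts`, and `isRevision`/`revisesThought`) can be summarized in a small model. This is a hypothetical sketch inferred only from those test assertions, not BioMCP's `sequential` module: the `Thought` and `ThinkingSession` names, the snake_case arguments, and the exact message format are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thought:
    """One recorded step (hypothetical stand-in for BioMCP's ThoughtEntry)."""

    text: str
    number: int
    total: int
    next_needed: bool
    is_revision: bool = False


@dataclass
class ThinkingSession:
    """Hypothetical session store mirroring what the tests assert."""

    history: list[Thought] = field(default_factory=list)
    branches: dict[str, list[Thought]] = field(default_factory=dict)

    def add(
        self,
        text: str,
        number: int,
        total: int,
        next_needed: bool,
        is_revision: bool = False,
        revises: int | None = None,
        branch_from: int | None = None,
    ) -> str:
        # Validation messages mirror the strings checked in test_validation_errors.
        if number < 1:
            return "Error: thoughtNumber must be >= 1"
        if total < 1:
            return "Error: totalThoughts must be >= 1"
        if is_revision and revises is None:
            return "Error: revisesThought must be specified when isRevision=True"

        entry = Thought(text, number, total, next_needed, is_revision)
        if is_revision:
            # Replace the revised thought in place, so the history length is unchanged.
            for i, old in enumerate(self.history):
                if old.number == revises:
                    self.history[i] = entry
                    break
            else:
                # Nothing to revise: record it as a new main-sequence entry.
                self.history.append(entry)
            message = f"Revised thought {revises}"
        elif branch_from is not None:
            # Branches live apart from the main sequence, keyed by their origin.
            branch_id = f"branch_{branch_from}"
            self.branches.setdefault(branch_id, []).append(entry)
            message = f"Added thought {number} to branch '{branch_id}'"
        else:
            self.history.append(entry)
            message = f"Added thought {number} to main sequence"

        status = "Next thought needed" if next_needed else "Thinking sequence complete"
        return f"{message}. Progress: {number}/{total} thoughts. {status}."


# Usage: a main thought, a revision of it, then a branch from it.
session = ThinkingSession()
print(session.add("Initial analysis", 1, 2, True))
print(session.add("Better analysis", 1, 2, True, is_revision=True, revises=1))
print(session.add("Alternative approach", 2, 2, True, branch_from=1))
```

Keeping branches in a separate dictionary is what lets the main `history` stay a clean linear record, which matches the tests expecting the main sequence length to be unaffected by a branch.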