This is page 12 of 19. Use http://codebase.md/genomoncology/biomcp?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── actions
│   │   └── setup-python-env
│   │       └── action.yml
│   ├── dependabot.yml
│   └── workflows
│       ├── ci.yml
│       ├── deploy-docs.yml
│       ├── main.yml.disabled
│       ├── on-release-main.yml
│       └── validate-codecov-config.yml
├── .gitignore
├── .pre-commit-config.yaml
├── BIOMCP_DATA_FLOW.md
├── CHANGELOG.md
├── CNAME
├── codecov.yaml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── apis
│   │   ├── error-codes.md
│   │   ├── overview.md
│   │   └── python-sdk.md
│   ├── assets
│   │   ├── biomcp-cursor-locations.png
│   │   ├── favicon.ico
│   │   ├── icon.png
│   │   ├── logo.png
│   │   ├── mcp_architecture.txt
│   │   └── remote-connection
│   │       ├── 00_connectors.png
│   │       ├── 01_add_custom_connector.png
│   │       ├── 02_connector_enabled.png
│   │       ├── 03_connect_to_biomcp.png
│   │       ├── 04_select_google_oauth.png
│   │       └── 05_success_connect.png
│   ├── backend-services-reference
│   │   ├── 01-overview.md
│   │   ├── 02-biothings-suite.md
│   │   ├── 03-cbioportal.md
│   │   ├── 04-clinicaltrials-gov.md
│   │   ├── 05-nci-cts-api.md
│   │   ├── 06-pubtator3.md
│   │   └── 07-alphagenome.md
│   ├── blog
│   │   ├── ai-assisted-clinical-trial-search-analysis.md
│   │   ├── images
│   │   │   ├── deep-researcher-video.png
│   │   │   ├── researcher-announce.png
│   │   │   ├── researcher-drop-down.png
│   │   │   ├── researcher-prompt.png
│   │   │   ├── trial-search-assistant.png
│   │   │   └── what_is_biomcp_thumbnail.png
│   │   └── researcher-persona-resource.md
│   ├── changelog.md
│   ├── CNAME
│   ├── concepts
│   │   ├── 01-what-is-biomcp.md
│   │   ├── 02-the-deep-researcher-persona.md
│   │   └── 03-sequential-thinking-with-the-think-tool.md
│   ├── developer-guides
│   │   ├── 01-server-deployment.md
│   │   ├── 02-contributing-and-testing.md
│   │   ├── 03-third-party-endpoints.md
│   │   ├── 04-transport-protocol.md
│   │   ├── 05-error-handling.md
│   │   ├── 06-http-client-and-caching.md
│   │   ├── 07-performance-optimizations.md
│   │   └── generate_endpoints.py
│   ├── faq-condensed.md
│   ├── FDA_SECURITY.md
│   ├── genomoncology.md
│   ├── getting-started
│   │   ├── 01-quickstart-cli.md
│   │   ├── 02-claude-desktop-integration.md
│   │   └── 03-authentication-and-api-keys.md
│   ├── how-to-guides
│   │   ├── 01-find-articles-and-cbioportal-data.md
│   │   ├── 02-find-trials-with-nci-and-biothings.md
│   │   ├── 03-get-comprehensive-variant-annotations.md
│   │   ├── 04-predict-variant-effects-with-alphagenome.md
│   │   ├── 05-logging-and-monitoring-with-bigquery.md
│   │   └── 06-search-nci-organizations-and-interventions.md
│   ├── index.md
│   ├── policies.md
│   ├── reference
│   │   ├── architecture-diagrams.md
│   │   ├── quick-architecture.md
│   │   ├── quick-reference.md
│   │   └── visual-architecture.md
│   ├── robots.txt
│   ├── stylesheets
│   │   ├── announcement.css
│   │   └── extra.css
│   ├── troubleshooting.md
│   ├── tutorials
│   │   ├── biothings-prompts.md
│   │   ├── claude-code-biomcp-alphagenome.md
│   │   ├── nci-prompts.md
│   │   ├── openfda-integration.md
│   │   ├── openfda-prompts.md
│   │   ├── pydantic-ai-integration.md
│   │   └── remote-connection.md
│   ├── user-guides
│   │   ├── 01-command-line-interface.md
│   │   ├── 02-mcp-tools-reference.md
│   │   └── 03-integrating-with-ides-and-clients.md
│   └── workflows
│       └── all-workflows.md
├── example_scripts
│   ├── mcp_integration.py
│   └── python_sdk.py
├── glama.json
├── LICENSE
├── lzyank.toml
├── Makefile
├── mkdocs.yml
├── package-lock.json
├── package.json
├── pyproject.toml
├── README.md
├── scripts
│   ├── check_docs_in_mkdocs.py
│   ├── check_http_imports.py
│   └── generate_endpoints_doc.py
├── smithery.yaml
├── src
│   └── biomcp
│       ├── __init__.py
│       ├── __main__.py
│       ├── articles
│       │   ├── __init__.py
│       │   ├── autocomplete.py
│       │   ├── fetch.py
│       │   ├── preprints.py
│       │   ├── search_optimized.py
│       │   ├── search.py
│       │   └── unified.py
│       ├── biomarkers
│       │   ├── __init__.py
│       │   └── search.py
│       ├── cbioportal_helper.py
│       ├── circuit_breaker.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── articles.py
│       │   ├── biomarkers.py
│       │   ├── diseases.py
│       │   ├── health.py
│       │   ├── interventions.py
│       │   ├── main.py
│       │   ├── openfda.py
│       │   ├── organizations.py
│       │   ├── server.py
│       │   ├── trials.py
│       │   └── variants.py
│       ├── connection_pool.py
│       ├── constants.py
│       ├── core.py
│       ├── diseases
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── domain_handlers.py
│       ├── drugs
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── exceptions.py
│       ├── genes
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── http_client_simple.py
│       ├── http_client.py
│       ├── individual_tools.py
│       ├── integrations
│       │   ├── __init__.py
│       │   ├── biothings_client.py
│       │   └── cts_api.py
│       ├── interventions
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── logging_filter.py
│       ├── metrics_handler.py
│       ├── metrics.py
│       ├── openfda
│       │   ├── __init__.py
│       │   ├── adverse_events_helpers.py
│       │   ├── adverse_events.py
│       │   ├── cache.py
│       │   ├── constants.py
│       │   ├── device_events_helpers.py
│       │   ├── device_events.py
│       │   ├── drug_approvals.py
│       │   ├── drug_labels_helpers.py
│       │   ├── drug_labels.py
│       │   ├── drug_recalls_helpers.py
│       │   ├── drug_recalls.py
│       │   ├── drug_shortages_detail_helpers.py
│       │   ├── drug_shortages_helpers.py
│       │   ├── drug_shortages.py
│       │   ├── exceptions.py
│       │   ├── input_validation.py
│       │   ├── rate_limiter.py
│       │   ├── utils.py
│       │   └── validation.py
│       ├── organizations
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── parameter_parser.py
│       ├── prefetch.py
│       ├── query_parser.py
│       ├── query_router.py
│       ├── rate_limiter.py
│       ├── render.py
│       ├── request_batcher.py
│       ├── resources
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── instructions.md
│       │   └── researcher.md
│       ├── retry.py
│       ├── router_handlers.py
│       ├── router.py
│       ├── shared_context.py
│       ├── thinking
│       │   ├── __init__.py
│       │   ├── sequential.py
│       │   └── session.py
│       ├── thinking_tool.py
│       ├── thinking_tracker.py
│       ├── trials
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── nci_getter.py
│       │   ├── nci_search.py
│       │   └── search.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── cancer_types_api.py
│       │   ├── cbio_http_adapter.py
│       │   ├── endpoint_registry.py
│       │   ├── gene_validator.py
│       │   ├── metrics.py
│       │   ├── mutation_filter.py
│       │   ├── query_utils.py
│       │   ├── rate_limiter.py
│       │   └── request_cache.py
│       ├── variants
│       │   ├── __init__.py
│       │   ├── alphagenome.py
│       │   ├── cancer_types.py
│       │   ├── cbio_external_client.py
│       │   ├── cbioportal_mutations.py
│       │   ├── cbioportal_search_helpers.py
│       │   ├── cbioportal_search.py
│       │   ├── constants.py
│       │   ├── external.py
│       │   ├── filters.py
│       │   ├── getter.py
│       │   ├── links.py
│       │   └── search.py
│       └── workers
│           ├── __init__.py
│           ├── worker_entry_stytch.js
│           ├── worker_entry.js
│           └── worker.py
├── tests
│   ├── bdd
│   │   ├── cli_help
│   │   │   ├── help.feature
│   │   │   └── test_help.py
│   │   ├── conftest.py
│   │   ├── features
│   │   │   └── alphagenome_integration.feature
│   │   ├── fetch_articles
│   │   │   ├── fetch.feature
│   │   │   └── test_fetch.py
│   │   ├── get_trials
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── get_variants
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── search_articles
│   │   │   ├── autocomplete.feature
│   │   │   ├── search.feature
│   │   │   ├── test_autocomplete.py
│   │   │   └── test_search.py
│   │   ├── search_trials
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   ├── search_variants
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   └── steps
│   │       └── test_alphagenome_steps.py
│   ├── config
│   │   └── test_smithery_config.py
│   ├── conftest.py
│   ├── data
│   │   ├── ct_gov
│   │   │   ├── clinical_trials_api_v2.yaml
│   │   │   ├── trials_NCT04280705.json
│   │   │   └── trials_NCT04280705.txt
│   │   ├── myvariant
│   │   │   ├── myvariant_api.yaml
│   │   │   ├── myvariant_field_descriptions.csv
│   │   │   ├── variants_full_braf_v600e.json
│   │   │   ├── variants_full_braf_v600e.txt
│   │   │   └── variants_part_braf_v600_multiple.json
│   │   ├── openfda
│   │   │   ├── drugsfda_detail.json
│   │   │   ├── drugsfda_search.json
│   │   │   ├── enforcement_detail.json
│   │   │   └── enforcement_search.json
│   │   └── pubtator
│   │       ├── pubtator_autocomplete.json
│   │       └── pubtator3_paper.txt
│   ├── integration
│   │   ├── test_openfda_integration.py
│   │   ├── test_preprints_integration.py
│   │   ├── test_simple.py
│   │   └── test_variants_integration.py
│   ├── tdd
│   │   ├── articles
│   │   │   ├── test_autocomplete.py
│   │   │   ├── test_cbioportal_integration.py
│   │   │   ├── test_fetch.py
│   │   │   ├── test_preprints.py
│   │   │   ├── test_search.py
│   │   │   └── test_unified.py
│   │   ├── conftest.py
│   │   ├── drugs
│   │   │   ├── __init__.py
│   │   │   └── test_drug_getter.py
│   │   ├── openfda
│   │   │   ├── __init__.py
│   │   │   ├── test_adverse_events.py
│   │   │   ├── test_device_events.py
│   │   │   ├── test_drug_approvals.py
│   │   │   ├── test_drug_labels.py
│   │   │   ├── test_drug_recalls.py
│   │   │   ├── test_drug_shortages.py
│   │   │   └── test_security.py
│   │   ├── test_biothings_integration_real.py
│   │   ├── test_biothings_integration.py
│   │   ├── test_circuit_breaker.py
│   │   ├── test_concurrent_requests.py
│   │   ├── test_connection_pool.py
│   │   ├── test_domain_handlers.py
│   │   ├── test_drug_approvals.py
│   │   ├── test_drug_recalls.py
│   │   ├── test_drug_shortages.py
│   │   ├── test_endpoint_documentation.py
│   │   ├── test_error_scenarios.py
│   │   ├── test_europe_pmc_fetch.py
│   │   ├── test_mcp_integration.py
│   │   ├── test_mcp_tools.py
│   │   ├── test_metrics.py
│   │   ├── test_nci_integration.py
│   │   ├── test_nci_mcp_tools.py
│   │   ├── test_network_policies.py
│   │   ├── test_offline_mode.py
│   │   ├── test_openfda_unified.py
│   │   ├── test_pten_r173_search.py
│   │   ├── test_render.py
│   │   ├── test_request_batcher.py.disabled
│   │   ├── test_retry.py
│   │   ├── test_router.py
│   │   ├── test_shared_context.py.disabled
│   │   ├── test_unified_biothings.py
│   │   ├── thinking
│   │   │   ├── __init__.py
│   │   │   └── test_sequential.py
│   │   ├── trials
│   │   │   ├── test_backward_compatibility.py
│   │   │   ├── test_getter.py
│   │   │   └── test_search.py
│   │   ├── utils
│   │   │   ├── test_gene_validator.py
│   │   │   ├── test_mutation_filter.py
│   │   │   ├── test_rate_limiter.py
│   │   │   └── test_request_cache.py
│   │   ├── variants
│   │   │   ├── constants.py
│   │   │   ├── test_alphagenome_api_key.py
│   │   │   ├── test_alphagenome_comprehensive.py
│   │   │   ├── test_alphagenome.py
│   │   │   ├── test_cbioportal_mutations.py
│   │   │   ├── test_cbioportal_search.py
│   │   │   ├── test_external_integration.py
│   │   │   ├── test_external.py
│   │   │   ├── test_extract_gene_aa_change.py
│   │   │   ├── test_filters.py
│   │   │   ├── test_getter.py
│   │   │   ├── test_links.py
│   │   │   └── test_search.py
│   │   └── workers
│   │       └── test_worker_sanitization.js
│   └── test_pydantic_ai_integration.py
├── THIRD_PARTY_ENDPOINTS.md
├── tox.ini
├── uv.lock
└── wrangler.toml
```

# Files

--------------------------------------------------------------------------------
/src/biomcp/query_router.py:
--------------------------------------------------------------------------------

```python
"""Query router for unified search in BioMCP."""

import asyncio
from dataclasses import dataclass
from typing import Any

from biomcp.articles.search import PubmedRequest
from biomcp.articles.unified import search_articles_unified
from biomcp.query_parser import ParsedQuery
from biomcp.trials.search import TrialQuery, search_trials
from biomcp.variants.search import VariantQuery, search_variants


@dataclass
class RoutingPlan:
    """Plan for routing a query to appropriate tools."""

    tools_to_call: list[str]
    field_mappings: dict[str, dict[str, Any]]
    coordination_strategy: str = "parallel"


class QueryRouter:
    """Routes unified queries to appropriate domain-specific tools."""

    def route(self, parsed_query: ParsedQuery) -> RoutingPlan:
        """Determine which tools to call based on query fields."""
        tools_to_call = []
        field_mappings = {}

        # Check which domains are referenced
        domains_referenced = self._get_referenced_domains(parsed_query)

        # Build field mappings for each domain
        domain_mappers = {
            "articles": ("article_searcher", self._map_article_fields),
            "trials": ("trial_searcher", self._map_trial_fields),
            "variants": ("variant_searcher", self._map_variant_fields),
            "genes": ("gene_searcher", self._map_gene_fields),
            "drugs": ("drug_searcher", self._map_drug_fields),
            "diseases": ("disease_searcher", self._map_disease_fields),
        }

        for domain, (tool_name, mapper_func) in domain_mappers.items():
            if domain in domains_referenced:
                tools_to_call.append(tool_name)
                field_mappings[tool_name] = mapper_func(parsed_query)

        return RoutingPlan(
            tools_to_call=tools_to_call,
            field_mappings=field_mappings,
            coordination_strategy="parallel",
        )

    def _get_referenced_domains(self, parsed_query: ParsedQuery) -> set[str]:
        """Get all domains referenced in the query."""
        domains_referenced = set()

        # Check domain-specific fields
        for domain, fields in parsed_query.domain_specific_fields.items():
            if fields:
                domains_referenced.add(domain)

        # Check cross-domain fields (these trigger multiple searches)
        if parsed_query.cross_domain_fields:
            cross_domain_mappings = {
                "gene": ["articles", "variants", "genes", "trials"],
                "disease": ["articles", "trials", "diseases"],
                "variant": ["articles", "variants"],
                "chemical": ["articles", "trials", "drugs"],
                "drug": ["articles", "trials", "drugs"],
            }

            for field, domains in cross_domain_mappings.items():
                if field in parsed_query.cross_domain_fields:
                    domains_referenced.update(domains)

        return domains_referenced

    def _map_article_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to article searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["genes"] = [parsed_query.cross_domain_fields["gene"]]
        if "disease" in parsed_query.cross_domain_fields:
            mapping["diseases"] = [parsed_query.cross_domain_fields["disease"]]
        if "variant" in parsed_query.cross_domain_fields:
            mapping["variants"] = [parsed_query.cross_domain_fields["variant"]]

        # Map article-specific fields
        article_fields = parsed_query.domain_specific_fields.get(
            "articles", {}
        )
        if "title" in article_fields:
            mapping["keywords"] = [article_fields["title"]]
        if "author" in article_fields:
            mapping["keywords"] = mapping.get("keywords", []) + [
                article_fields["author"]
            ]
        if "journal" in article_fields:
            mapping["keywords"] = mapping.get("keywords", []) + [
                article_fields["journal"]
            ]

        # Extract mutation patterns from raw query
        import re

        raw_query = parsed_query.raw_query
        # Look for mutation patterns like F57Y, F57*, V600E
        mutation_patterns = re.findall(r"\b[A-Z]\d+[A-Z*]\b", raw_query)
        if mutation_patterns:
            if "keywords" not in mapping:
                mapping["keywords"] = []
            mapping["keywords"].extend(mutation_patterns)

        return mapping

    def _map_trial_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to trial searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "disease" in parsed_query.cross_domain_fields:
            mapping["conditions"] = [
                parsed_query.cross_domain_fields["disease"]
            ]

        # Gene searches in trials might look for targeted therapies
        if "gene" in parsed_query.cross_domain_fields:
            gene = parsed_query.cross_domain_fields["gene"]
            # Search for gene-targeted interventions
            mapping["keywords"] = [gene]

        # Map trial-specific fields
        trial_fields = parsed_query.domain_specific_fields.get("trials", {})
        if "condition" in trial_fields:
            mapping["conditions"] = [trial_fields["condition"]]
        if "intervention" in trial_fields:
            mapping["interventions"] = [trial_fields["intervention"]]
        if "phase" in trial_fields:
            mapping["phase"] = f"PHASE{trial_fields['phase']}"
        if "status" in trial_fields:
            mapping["recruiting_status"] = trial_fields["status"].upper()

        return mapping

    def _map_variant_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to variant searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["gene"] = parsed_query.cross_domain_fields["gene"]
        if "variant" in parsed_query.cross_domain_fields:
            variant = parsed_query.cross_domain_fields["variant"]
            # Check if it's an rsID or protein change
            if variant.startswith("rs"):
                mapping["rsid"] = variant
            else:
                mapping["hgvsp"] = variant

        # Map variant-specific fields
        variant_fields = parsed_query.domain_specific_fields.get(
            "variants", {}
        )
        if "rsid" in variant_fields:
            mapping["rsid"] = variant_fields["rsid"]
        if "gene" in variant_fields:
            mapping["gene"] = variant_fields["gene"]
        if "significance" in variant_fields:
            mapping["significance"] = variant_fields["significance"]
        if "frequency" in variant_fields:
            # Parse frequency operators
            freq = variant_fields["frequency"]
            if freq.startswith("<"):
                mapping["max_frequency"] = float(freq[1:])
            elif freq.startswith(">"):
                mapping["min_frequency"] = float(freq[1:])

        return mapping

    def _map_gene_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to gene searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["gene"]

        # Map gene-specific fields
        gene_fields = parsed_query.domain_specific_fields.get("genes", {})
        if "symbol" in gene_fields:
            mapping["query"] = gene_fields["symbol"]
        elif "name" in gene_fields:
            mapping["query"] = gene_fields["name"]
        elif "type" in gene_fields:
            mapping["type_of_gene"] = gene_fields["type"]

        return mapping

    def _map_drug_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to drug searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "chemical" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["chemical"]
        elif "drug" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["drug"]

        # Map drug-specific fields
        drug_fields = parsed_query.domain_specific_fields.get("drugs", {})
        if "name" in drug_fields:
            mapping["query"] = drug_fields["name"]
        elif "tradename" in drug_fields:
            mapping["query"] = drug_fields["tradename"]
        elif "indication" in drug_fields:
            mapping["indication"] = drug_fields["indication"]

        return mapping

    def _map_disease_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to disease searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "disease" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["disease"]

        # Map disease-specific fields
        disease_fields = parsed_query.domain_specific_fields.get(
            "diseases", {}
        )
        if "name" in disease_fields:
            mapping["query"] = disease_fields["name"]
        elif "mondo" in disease_fields:
            mapping["query"] = disease_fields["mondo"]
        elif "synonym" in disease_fields:
            mapping["query"] = disease_fields["synonym"]

        return mapping


async def execute_routing_plan(
    plan: RoutingPlan, output_json: bool = True
) -> dict[str, Any]:
    """Execute a routing plan by calling the appropriate tools."""
    tasks = []
    task_names = []

    for tool_name in plan.tools_to_call:
        params = plan.field_mappings[tool_name]

        if tool_name == "article_searcher":
            request = PubmedRequest(**params)
            tasks.append(
                search_articles_unified(
                    request,
                    include_pubmed=True,
                    include_preprints=False,
                    output_json=output_json,
                )
            )
            task_names.append("articles")

        elif tool_name == "trial_searcher":
            query = TrialQuery(**params)
            tasks.append(search_trials(query, output_json=output_json))
            task_names.append("trials")

        elif tool_name == "variant_searcher":
            variant_query = VariantQuery(**params)
            tasks.append(
                search_variants(variant_query, output_json=output_json)
            )
            task_names.append("variants")

        elif tool_name == "gene_searcher":
            # For gene search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_genes(client, query_str, output_json))
            task_names.append("genes")

        elif tool_name == "drug_searcher":
            # For drug search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_drugs(client, query_str, output_json))
            task_names.append("drugs")

        elif tool_name == "disease_searcher":
            # For disease search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_diseases(client, query_str, output_json))
            task_names.append("diseases")

    # Execute all searches in parallel
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Package results
    output: dict[str, Any] = {}
    for name, result in zip(task_names, results, strict=False):
        if isinstance(result, Exception):
            output[name] = {"error": str(result)}
        else:
            output[name] = result

    return output


async def _search_genes(client, query: str, output_json: bool) -> Any:
    """Search for genes using BioThingsClient."""
    results = await client._query_gene(query)
    if not results:
        return [] if output_json else "No genes found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        gene_id = result.get("_id")
        if gene_id:
            full_gene = await client._get_gene_by_id(gene_id)
            if full_gene:
                detailed_results.append(full_gene.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results


async def _search_drugs(client, query: str, output_json: bool) -> Any:
    """Search for drugs using BioThingsClient."""
    results = await client._query_drug(query)
    if not results:
        return [] if output_json else "No drugs found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        drug_id = result.get("_id")
        if drug_id:
            full_drug = await client._get_drug_by_id(drug_id)
            if full_drug:
                detailed_results.append(full_drug.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results


async def _search_diseases(client, query: str, output_json: bool) -> Any:
    """Search for diseases using BioThingsClient."""
    results = await client._query_disease(query)
    if not results:
        return [] if output_json else "No diseases found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        disease_id = result.get("_id")
        if disease_id:
            full_disease = await client._get_disease_by_id(disease_id)
            if full_disease:
                detailed_results.append(full_disease.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results
```

--------------------------------------------------------------------------------
/docs/user-guides/03-integrating-with-ides-and-clients.md:
--------------------------------------------------------------------------------

```markdown
# Integrating with IDEs and Clients

BioMCP can be integrated into your development workflow through multiple approaches. This guide covers integration with IDEs, Python applications, and MCP-compatible clients.

## Integration Methods Overview

| Method         | Best For                  | Installation | Usage Pattern            |
| -------------- | ------------------------- | ------------ | ------------------------ |
| **Cursor IDE** | Interactive development   | Smithery CLI | Natural language queries |
| **Python SDK** | Application development   | pip/uv       | Direct function calls    |
| **MCP Client** | AI assistants & protocols | Subprocess   | Tool-based communication |

## Cursor IDE Integration

Cursor IDE integration lets you query biomedical data in natural language without leaving your editor, which suits interactive research during development.

### Installation

1. **Prerequisites:**

   - [Cursor IDE](https://cursor.sh/) installed
   - [Smithery](https://smithery.ai/) account and token

2. **Install BioMCP:**

   ```bash
   npx -y @smithery/cli@latest install @genomoncology/biomcp --client cursor
   ```

3. **Configuration:**
   - The Smithery CLI automatically configures Cursor
   - No manual configuration needed

### Usage in Cursor

Once installed, you can query biomedical data using natural language:

#### Clinical Trials

```
"Find Phase 3 clinical trials for lung cancer with immunotherapy"
```

#### Research Articles

```
"Summarize recent research on EGFR mutations in lung cancer"
```

#### Genetic Variants

```
"What's the clinical significance of the BRAF V600E mutation?"
```

#### Complex Queries

```
"Compare treatment outcomes for ALK-positive vs EGFR-mutant NSCLC"
```

### Cursor Tips

1. **Be Specific**: Include gene names, disease types, and treatment modalities
2. **Iterate**: Refine queries based on initial results
3. **Cross-Reference**: Ask for both articles and trials on the same topic
4. **Export Results**: Copy formatted results for documentation

## Python SDK Integration

The Python SDK provides programmatic access to BioMCP for building applications.
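
For context on how domains combine: BioMCP's unified search routes each query field to one or more domain searchers. As a rough illustration, assuming the cross-domain mapping in `src/biomcp/query_router.py` (`QueryRouter._get_referenced_domains`), a `gene` field fans out to article, variant, gene, and trial searches, while a domain-specific field such as a trial phase selects only its own domain. The standalone sketch below mirrors that rule; `referenced_domains` and `CROSS_DOMAIN_FANOUT` are hypothetical names for illustration, not part of the BioMCP API.

```python
# Hypothetical standalone sketch of the cross-domain fan-out rule in
# biomcp/query_router.py (QueryRouter._get_referenced_domains).
# Not part of the BioMCP API; names here are illustrative only.
CROSS_DOMAIN_FANOUT: dict[str, list[str]] = {
    "gene": ["articles", "variants", "genes", "trials"],
    "disease": ["articles", "trials", "diseases"],
    "variant": ["articles", "variants"],
    "chemical": ["articles", "trials", "drugs"],
    "drug": ["articles", "trials", "drugs"],
}


def referenced_domains(
    cross_domain_fields: dict[str, str],
    domain_specific_fields: dict[str, dict[str, str]],
) -> set[str]:
    """Return every domain a parsed query would be routed to."""
    domains: set[str] = set()
    # A non-empty group of domain-specific fields (e.g. trials.phase)
    # selects only its own domain
    for domain, fields in domain_specific_fields.items():
        if fields:
            domains.add(domain)
    # A cross-domain field (e.g. gene) fans out to several domains
    for field, targets in CROSS_DOMAIN_FANOUT.items():
        if field in cross_domain_fields:
            domains.update(targets)
    return domains


print(sorted(referenced_domains({"gene": "BRAF"}, {"trials": {"phase": "3"}})))
```

Running the sketch prints `['articles', 'genes', 'trials', 'variants']`: the trial phase selects `trials`, and the `gene` field fans out to the remaining three domains.
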

### Installation

```bash
# Using pip
pip install biomcp-python

# Using uv
uv add biomcp-python

# For scripts
uv pip install biomcp-python
```

### Basic Usage

```python
import asyncio
from biomcp import BioMCP

async def main():
    # Initialize client
    client = BioMCP()

    # Search for articles
    articles = await client.articles.search(
        genes=["BRAF"],
        diseases=["melanoma"],
        limit=5
    )

    # Search for trials
    trials = await client.trials.search(
        conditions=["breast cancer"],
        interventions=["CDK4/6 inhibitor"],
        recruiting_status="RECRUITING"
    )

    # Get variant details
    variant = await client.variants.get("rs121913529")

    return articles, trials, variant

# Run the async function
results = asyncio.run(main())
```

### Advanced Features

#### Domain-Specific Modules

```python
from biomcp import BioMCP
from biomcp.variants import search_variants, get_variant
from biomcp.trials import search_trials, get_trial
from biomcp.articles import search_articles, fetch_articles

# Direct module usage
async def variant_analysis():
    # Search pathogenic TP53 variants
    results = await search_variants(
        gene="TP53",
        significance="pathogenic",
        frequency_max=0.01,
        limit=20
    )

    # Get detailed annotations
    for variant in results:
        details = await get_variant(variant.id)
        print(f"{variant.id}: {details.clinical_significance}")
```

#### Output Formats

```python
# JSON for programmatic use
articles_json = await client.articles.search(
    genes=["KRAS"],
    format="json"
)

# Markdown for display
articles_md = await client.articles.search(
    genes=["KRAS"],
    format="markdown"
)
```

#### Error Handling

```python
from biomcp.exceptions import BioMCPError, APIError, ValidationError

try:
    results = await client.articles.search(genes=["INVALID_GENE"])
except ValidationError as e:
    print(f"Invalid input: {e}")
except APIError as e:
    print(f"API error: {e}")
except BioMCPError as e:
    print(f"General error: {e}")
```

### Example: Building a Variant Report

```python
import asyncio
from biomcp import BioMCP

async def generate_variant_report(gene: str, mutation: str):
    client = BioMCP()

    # 1. Get gene information
    gene_info = await client.genes.get(gene)

    # 2. Search for the specific variant
    variants = await client.variants.search(
        gene=gene,
        keywords=[mutation]
    )

    # 3. Find relevant articles
    articles = await client.articles.search(
        genes=[gene],
        keywords=[mutation],
        limit=10
    )

    # 4. Look for clinical trials
    trials = await client.trials.search(
        conditions=["cancer"],
        other_terms=[f"{gene} {mutation}"],
        recruiting_status="RECRUITING"
    )

    # 5. Generate report
    report = f"""
    # Variant Report: {gene} {mutation}

    ## Gene Information
    - **Official Name**: {gene_info.name}
    - **Summary**: {gene_info.summary}

    ## Variant Details
    Found {len(variants)} matching variants

    ## Literature ({len(articles)} articles)
    Recent publications discussing this variant...

    ## Clinical Trials ({len(trials)} active trials)
    Currently recruiting studies...
    """

    return report

# Generate report
report = asyncio.run(generate_variant_report("BRAF", "V600E"))
print(report)
```

## MCP Client Integration

The Model Context Protocol (MCP) provides a standardized way to integrate BioMCP with AI assistants and other tools.

### Understanding MCP

MCP is a protocol for communication between:

- **Clients**: AI assistants, IDEs, or custom applications
- **Servers**: Tool providers like BioMCP

### Critical Requirement: Think Tool

**IMPORTANT**: When using MCP, you MUST call the `think` tool before any search or fetch operation. This ensures systematic analysis and optimal results.

### Basic MCP Integration

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_biomcp_query():
    # Start BioMCP server
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "--with", "biomcp-python", "biomcp", "run"],
        env={"PYTHONUNBUFFERED": "1"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize and discover tools
            await session.initialize()
            tools = await session.list_tools()

            # CRITICAL: Always think first!
271 |             await session.call_tool(
272 |                 "think",
273 |                 arguments={
274 |                     "thought": "Analyzing BRAF V600E in melanoma...",
275 |                     "thoughtNumber": 1,
276 |                     "nextThoughtNeeded": True
277 |                 }
278 |             )
279 |
280 |             # Now search for articles
281 |             result = await session.call_tool(
282 |                 "article_searcher",
283 |                 arguments={
284 |                     "genes": ["BRAF"],
285 |                     "diseases": ["melanoma"],
286 |                     "keywords": ["V600E"]
287 |                 }
288 |             )
289 |
290 |             return result
291 |
292 | # Run the query
293 | result = asyncio.run(run_biomcp_query())
294 | ```
295 |
296 | ### Available MCP Tools
297 |
298 | BioMCP provides 24 tools through MCP:
299 |
300 | #### Core Tools (Always Use First)
301 |
302 | - `think` - Sequential reasoning (MANDATORY first step)
303 | - `search` - Unified search across domains
304 | - `fetch` - Retrieve specific records
305 |
306 | #### Domain-Specific Tools
307 |
308 | - **Articles**: `article_searcher`, `article_getter`
309 | - **Trials**: `trial_searcher`, `trial_getter`, plus detail getters
310 | - **Variants**: `variant_searcher`, `variant_getter`, `alphagenome_predictor`
311 | - **BioThings**: `gene_getter`, `disease_getter`, `drug_getter`
312 | - **NCI**: Organization, intervention, biomarker, disease tools
313 |
314 | ### MCP Integration Patterns
315 |
316 | #### Pattern 1: AI Assistant Integration
317 |
318 | ```python
319 | # Example for integrating with an AI assistant
320 | class BioMCPAssistant:
321 |     def __init__(self):
322 |         self.session = None
323 |
324 |     async def connect(self):
325 |         # Initialize MCP connection
326 |         server_params = StdioServerParameters(
327 |             command="biomcp",
328 |             args=["run"]
329 |         )
330 |         # ... connection setup ...
331 |
332 |     async def process_query(self, user_query: str):
333 |         # 1. Always think first
334 |         await self.think_about_query(user_query)
335 |
336 |         # 2. Determine appropriate tools
337 |         tools_needed = self.analyze_query(user_query)
338 |
339 |         # 3. Execute tool calls
340 |         results = []
341 |         for tool in tools_needed:
342 |             result = await self.session.call_tool(tool.name, tool.args)
343 |             results.append(result)
344 |
345 |         # 4. Synthesize results
346 |         return self.format_response(results)
347 | ```
348 |
349 | #### Pattern 2: Custom Client Implementation
350 |
351 | ```python
352 | import json
353 | from typing import Any, Dict
354 |
355 | class BioMCPClient:
356 |     """Custom client for specific biomedical workflows"""
357 |
358 |     async def variant_to_trials_pipeline(self, variant_id: str):
359 |         """Find trials for patients with specific variants"""
360 |
361 |         # Step 1: Think and plan
362 |         await self.think(
363 |             "Planning variant-to-trials search pipeline...",
364 |             thoughtNumber=1
365 |         )
366 |
367 |         # Step 2: Get variant details
368 |         variant = await self.call_tool("variant_getter", {
369 |             "variant_id": variant_id
370 |         })
371 |
372 |         # Step 3: Extract gene and disease associations
373 |         gene = variant.get("gene", {}).get("symbol")
374 |         diseases = self.extract_diseases(variant)
375 |
376 |         # Step 4: Search for relevant trials
377 |         trials = await self.call_tool("trial_searcher", {
378 |             "conditions": diseases,
379 |             "other_terms": [f"{gene} mutation"],
380 |             "recruiting_status": "RECRUITING"
381 |         })
382 |
383 |         return {
384 |             "variant": variant,
385 |             "associated_trials": trials
386 |         }
387 | ```
388 |
389 | ### MCP Best Practices
390 |
391 | 1. **Always Think First**
392 |
393 |    ```python
394 |    # ✅ Correct
395 |    await think(thought="Planning research...", thoughtNumber=1)
396 |    await search(...)
397 |
398 |    # ❌ Wrong - skips thinking
399 |    await search(...)  # Will produce poor results
400 |    ```
401 |
402 | 2. **Use Appropriate Tools**
403 |
404 |    ```python
405 |    # For broad searches across domains
406 |    await call_tool("search", {"query": "gene:BRAF AND melanoma"})
407 |
408 |    # For specific domain searches
409 |    await call_tool("article_searcher", {"genes": ["BRAF"]})
410 |    ```
411 |
412 | 3. **Handle Tool Responses**
413 |    ```python
414 |    try:
415 |        result = await session.call_tool("variant_getter", {
416 |            "variant_id": "rs121913529"
417 |        })
418 |        # Process structured result
419 |        if result.get("error"):
420 |            handle_error(result["error"])
421 |        else:
422 |            process_variant(result["data"])
423 |    except Exception as e:
424 |        logger.error(f"Tool call failed: {e}")
425 |    ```
426 |
427 | ## Choosing the Right Integration
428 |
429 | ### Use Cursor IDE When:
430 |
431 | - Doing interactive research during development
432 | - Exploring biomedical data for new projects
433 | - Need quick answers without writing code
434 | - Want natural language queries
435 |
436 | ### Use Python SDK When:
437 |
438 | - Building production applications
439 | - Need type-safe interfaces
440 | - Want direct function calls
441 | - Require custom error handling
442 |
443 | ### Use MCP Client When:
444 |
445 | - Integrating with AI assistants
446 | - Building protocol-compliant tools
447 | - Need standardized tool interfaces
448 | - Want language-agnostic integration
449 |
450 | ## Integration Examples
451 |
452 | ### Example 1: Research Dashboard (Python SDK)
453 |
454 | ```python
455 | from biomcp import BioMCP
456 | import streamlit as st
457 |
458 | async def create_dashboard():
459 |     client = BioMCP()
460 |
461 |     st.title("Biomedical Research Dashboard")
462 |
463 |     # Gene input
464 |     gene = st.text_input("Enter gene symbol:", "BRAF")
465 |
466 |     if st.button("Search"):
467 |         # Fetch comprehensive data
468 |         col1, col2 = st.columns(2)
469 |
470 |         with col1:
471 |             st.subheader("Recent Articles")
472 |             articles = await client.articles.search(genes=[gene], limit=5)
473 |             for article in articles:
474 |                 st.write(f"- [{article.title}]({article.url})")
475 |
476 |         with col2:
477 |             st.subheader("Active Trials")
478 |             trials = await client.trials.search(
479 |                 other_terms=[gene],
480 |                 recruiting_status="RECRUITING",
481 |                 limit=5
482 |             )
483 |             for trial in trials:
484 |                 st.write(f"- [{trial.nct_id}]({trial.url})")
485 | ```
486 |
487 | ### Example 2: Variant Analysis Pipeline (MCP)
488 |
489 | ```python
490 | async def comprehensive_variant_analysis(session, hgvs: str):
491 |     """Complete variant analysis workflow using MCP"""
492 |
493 |     # Think about the analysis
494 |     await session.call_tool("think", {
495 |         "thought": f"Planning comprehensive analysis for {hgvs}",
496 |         "thoughtNumber": 1
497 |     })
498 |
499 |     # Get variant details
500 |     variant = await session.call_tool("variant_getter", {
501 |         "variant_id": hgvs
502 |     })
503 |
504 |     # Search related articles
505 |     articles = await session.call_tool("article_searcher", {
506 |         "variants": [hgvs],
507 |         "limit": 10
508 |     })
509 |
510 |     # Find applicable trials
511 |     gene = variant.get("gene", {}).get("symbol")
512 |     trials = await session.call_tool("trial_searcher", {
513 |         "other_terms": [f"{gene} mutation"],
514 |         "recruiting_status": "RECRUITING"
515 |     })
516 |
517 |     prediction = None  # Predict functional effects only if genomic coordinates are available
518 |     if variant.get("chrom") and variant.get("pos"):
519 |         prediction = await session.call_tool("alphagenome_predictor", {
520 |             "chromosome": f"chr{variant['chrom']}",
521 |             "position": variant["pos"],
522 |             "reference": variant["ref"],
523 |             "alternate": variant["alt"]
524 |         })
525 |
526 |     return {
527 |         "variant": variant,
528 |         "articles": articles,
529 |         "trials": trials,
530 |         "prediction": prediction
531 |     }
532 | ```
533 |
534 | ## Troubleshooting
535 |
536 | ### Common Issues
537 |
538 | 1. **"Think tool not called" errors**
539 |
540 |    - Always call think before other operations
541 |    - Include thoughtNumber parameter
542 |
543 | 2. **API rate limits**
544 |
545 |    - Add delays between requests
546 |    - Use API keys for higher limits
547 |
548 | 3. **Connection failures**
549 |
550 |    - Check network connectivity
551 |    - Verify server is running
552 |    - Ensure correct installation
553 |
554 | 4. **Invalid gene symbols**
555 |    - Use official HGNC symbols
556 |    - Check [genenames.org](https://www.genenames.org)
557 |
558 | ### Debug Mode
559 |
560 | Enable debug logging:
561 |
562 | ```python
563 | # Python SDK
564 | import logging
565 | logging.basicConfig(level=logging.DEBUG)
566 |
567 | # MCP Client
568 | server_params = StdioServerParameters(
569 |     command="biomcp",
570 |     args=["run", "--log-level", "DEBUG"]
571 | )
572 | ```
573 |
574 | ## Next Steps
575 |
576 | - Explore [tool-specific documentation](02-mcp-tools-reference.md)
577 | - Review [API authentication](../getting-started/03-authentication-and-api-keys.md)
578 | - Check [example workflows](../how-to-guides/01-find-articles-and-cbioportal-data.md) for your use case
579 |
```

--------------------------------------------------------------------------------
/docs/user-guides/01-command-line-interface.md:
--------------------------------------------------------------------------------

```markdown
1 | # Command Line Interface Reference
2 |
3 | BioMCP provides a comprehensive command-line interface for biomedical data retrieval and analysis. This guide covers all available commands, options, and usage patterns.
4 |
5 | ## Installation
6 |
7 | ```bash
8 | # Using uv (recommended)
9 | uv tool install biomcp
10 |
11 | # Using pip
12 | pip install biomcp-python
13 | ```
14 |
15 | ## Global Options
16 |
17 | These options work with all commands:
18 |
19 | ```bash
20 | biomcp [OPTIONS] COMMAND [ARGS]...
21 |
22 | Options:
23 |   --version  Show the version and exit
24 |   --help     Show help message and exit
25 | ```
26 |
27 | ## Commands Overview
28 |
29 | | Domain           | Commands             | Purpose                                         |
30 | | ---------------- | -------------------- | ----------------------------------------------- |
31 | | **article**      | search, get          | Search and retrieve biomedical literature       |
32 | | **trial**        | search, get          | Find and fetch clinical trial information       |
33 | | **variant**      | search, get, predict | Analyze genetic variants and predict effects    |
34 | | **gene**         | get                  | Retrieve gene information and annotations       |
35 | | **drug**         | get                  | Look up drug/chemical information               |
36 | | **disease**      | get                  | Get disease definitions and synonyms            |
37 | | **organization** | search               | Search NCI organization database                |
38 | | **intervention** | search               | Find interventions (drugs, devices, procedures) |
39 | | **biomarker**    | search               | Search biomarkers used in trials                |
40 | | **health**       | check                | Monitor API status and system health            |
41 |
42 | ## Article Commands
43 |
44 | For practical examples and workflows, see [How to Find Articles and cBioPortal Data](../how-to-guides/01-find-articles-and-cbioportal-data.md).
45 |
46 | ### article search
47 |
48 | Search PubMed/PubTator3 for biomedical literature with automatic cBioPortal integration.
49 |
50 | ```bash
51 | biomcp article search [OPTIONS]
52 | ```
53 |
54 | **Options:**
55 |
56 | - `--gene, -g TEXT`: Gene symbol(s) to search for
57 | - `--variant, -v TEXT`: Genetic variant(s) to search for
58 | - `--disease, -d TEXT`: Disease/condition(s) to search for
59 | - `--chemical, -c TEXT`: Chemical/drug name(s) to search for
60 | - `--keyword, -k TEXT`: Keyword(s) to search for (supports OR with `|`)
61 | - `--pmid TEXT`: Specific PubMed ID(s) to retrieve
62 | - `--limit INTEGER`: Maximum results to return (default: 10)
63 | - `--no-preprints`: Exclude preprints from results
64 | - `--no-cbioportal`: Disable automatic cBioPortal integration
65 | - `--format [json|markdown]`: Output format (default: markdown)
66 |
67 | **Examples:**
68 |
69 | ```bash
70 | # Basic gene search with automatic cBioPortal data
71 | biomcp article search --gene BRAF --disease melanoma
72 |
73 | # Multiple filters
74 | biomcp article search --gene EGFR --disease "lung cancer" --chemical erlotinib
75 |
76 | # OR logic in keywords (find different variant notations)
77 | biomcp article search --gene PTEN --keyword "R173|Arg173|p.R173"
78 |
79 | # Exclude preprints
80 | biomcp article search --gene TP53 --no-preprints --limit 20
81 |
82 | # JSON output for programmatic use
83 | biomcp article search --gene KRAS --format json > results.json
84 | ```
85 |
86 | ### article get
87 |
88 | Retrieve a specific article by PubMed ID or DOI.
89 |
90 | ```bash
91 | biomcp article get IDENTIFIER
92 | ```
93 |
94 | **Arguments:**
95 |
96 | - `IDENTIFIER`: PubMed ID (e.g., "38768446") or DOI (e.g., "10.1101/2024.01.20.23288905")
97 |
98 | **Examples:**
99 |
100 | ```bash
101 | # Get article by PubMed ID
102 | biomcp article get 38768446
103 |
104 | # Get preprint by DOI
105 | biomcp article get "10.1101/2024.01.20.23288905"
106 | ```
107 |
108 | ## Trial Commands
109 |
110 | For practical examples and workflows, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md).
111 |
112 | ### trial search
113 |
114 | Search ClinicalTrials.gov or NCI CTS API for clinical trials.
115 |
116 | ```bash
117 | biomcp trial search [OPTIONS]
118 | ```
119 |
120 | **Basic Options:**
121 |
122 | - `--condition TEXT`: Disease/condition to search
123 | - `--intervention TEXT`: Treatment/intervention to search
124 | - `--term TEXT`: General search terms
125 | - `--nct-id TEXT`: Specific NCT ID(s)
126 | - `--limit INTEGER`: Maximum results (default: 10)
127 | - `--source [ctgov|nci]`: Data source (default: ctgov)
128 | - `--api-key TEXT`: API key for NCI source
129 |
130 | **Study Characteristics:**
131 |
132 | - `--status TEXT`: Trial status (RECRUITING, ACTIVE_NOT_RECRUITING, etc.)
133 | - `--study-type TEXT`: Type of study (INTERVENTIONAL, OBSERVATIONAL)
134 | - `--phase TEXT`: Trial phase (EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4)
135 | - `--study-purpose TEXT`: Primary purpose (TREATMENT, PREVENTION, etc.)
136 | - `--age-group TEXT`: Target age group (CHILD, ADULT, OLDER_ADULT)
137 |
138 | **Location Options:**
139 |
140 | - `--country TEXT`: Country name
141 | - `--state TEXT`: State/province
142 | - `--city TEXT`: City name
143 | - `--latitude FLOAT`: Geographic latitude
144 | - `--longitude FLOAT`: Geographic longitude
145 | - `--distance INTEGER`: Search radius in miles
146 |
147 | **Advanced Filters:**
148 |
149 | - `--start-date TEXT`: Trial start date (YYYY-MM-DD)
150 | - `--end-date TEXT`: Trial end date (YYYY-MM-DD)
151 | - `--intervention-type TEXT`: Type of intervention
152 | - `--sponsor-type TEXT`: Type of sponsor
153 | - `--is-fda-regulated`: FDA-regulated trials only
154 | - `--expanded-access`: Trials offering expanded access
155 |
156 | **Examples:**
157 |
158 | ```bash
159 | # Find recruiting melanoma trials
160 | biomcp trial search --condition melanoma --status RECRUITING
161 |
162 | # Search by location (requires coordinates)
163 | biomcp trial search --condition "lung cancer" \
164 |   --latitude 41.4993 --longitude -81.6944 --distance 50
165 |
166 | # Use NCI source with advanced filters
167 | biomcp trial search --condition melanoma --source nci \
168 |   --required-mutations "BRAF V600E" --allow-brain-mets true \
169 |   --api-key YOUR_KEY
170 |
171 | # Multiple filters
172 | biomcp trial search --condition "breast cancer" \
173 |   --intervention "CDK4/6 inhibitor" --phase PHASE3 \
174 |   --status RECRUITING --country "United States"
175 | ```
176 |
177 | ### trial get
178 |
179 | Retrieve detailed information about a specific clinical trial.
180 |
181 | ```bash
182 | biomcp trial get NCT_ID [OPTIONS]
183 | ```
184 |
185 | **Arguments:**
186 |
187 | - `NCT_ID`: Clinical trial identifier (e.g., NCT03006926)
188 |
189 | **Options:**
190 |
191 | - `--include TEXT`: Specific sections to include (Protocol, Locations, References, Outcomes)
192 | - `--source [ctgov|nci]`: Data source (default: ctgov)
193 | - `--api-key TEXT`: API key for NCI source
194 |
195 | **Examples:**
196 |
197 | ```bash
198 | # Get basic trial information
199 | biomcp trial get NCT03006926
200 |
201 | # Get specific sections
202 | biomcp trial get NCT03006926 --include Protocol --include Locations
203 |
204 | # Use NCI source
205 | biomcp trial get NCT04280705 --source nci --api-key YOUR_KEY
206 | ```
207 |
208 | ## Variant Commands
209 |
210 | For practical examples and workflows, see:
211 |
212 | - [Get Comprehensive Variant Annotations](../how-to-guides/03-get-comprehensive-variant-annotations.md)
213 | - [Predict Variant Effects with AlphaGenome](../how-to-guides/04-predict-variant-effects-with-alphagenome.md)
214 |
215 | ### variant search
216 |
217 | Search MyVariant.info for genetic variant annotations.
218 |
219 | ```bash
220 | biomcp variant search [OPTIONS]
221 | ```
222 |
223 | **Options:**
224 |
225 | - `--gene TEXT`: Gene symbol
226 | - `--hgvs TEXT`: HGVS notation
227 | - `--rsid TEXT`: dbSNP rsID
228 | - `--chromosome TEXT`: Chromosome
229 | - `--start INTEGER`: Genomic start position
230 | - `--end INTEGER`: Genomic end position
231 | - `--assembly [hg19|hg38]`: Genome assembly (default: hg38)
232 | - `--significance TEXT`: Clinical significance
233 | - `--min-frequency FLOAT`: Minimum allele frequency
234 | - `--max-frequency FLOAT`: Maximum allele frequency
235 | - `--min-cadd FLOAT`: Minimum CADD score
236 | - `--polyphen TEXT`: PolyPhen prediction
237 | - `--sift TEXT`: SIFT prediction
238 | - `--sources TEXT`: Data sources to include
239 | - `--limit INTEGER`: Maximum results (default: 10)
240 | - `--no-cbioportal`: Disable cBioPortal integration
241 |
242 | **Examples:**
243 |
244 | ```bash
245 | # Search pathogenic BRCA1 variants
246 | biomcp variant search --gene BRCA1 --significance pathogenic
247 |
248 | # Search by HGVS notation
249 | biomcp variant search --hgvs "NM_007294.4:c.5266dupC"
250 |
251 | # Filter by frequency and prediction scores
252 | biomcp variant search --gene TP53 --max-frequency 0.01 \
253 |   --min-cadd 20 --polyphen possibly_damaging
254 |
255 | # Search genomic region
256 | biomcp variant search --chromosome 7 --start 140753336 --end 140753337
257 | ```
258 |
259 | ### variant get
260 |
261 | Retrieve detailed information about a specific variant.
262 |
263 | ```bash
264 | biomcp variant get VARIANT_ID [OPTIONS]
265 | ```
266 |
267 | **Arguments:**
268 |
269 | - `VARIANT_ID`: Variant identifier (HGVS, rsID, or genomic)
270 |
271 | **Options:**
272 |
273 | - `--json, -j`: Output in JSON format
274 | - `--include-external / --no-external`: Include/exclude external annotations (default: include)
275 | - `--assembly TEXT`: Genome assembly (hg19 or hg38, default: hg19)
276 |
277 | **Examples:**
278 |
279 | ```bash
280 | # Get variant by HGVS (defaults to hg19)
281 | biomcp variant get "NM_007294.4:c.5266dupC"
282 |
283 | # Get variant by rsID
284 | biomcp variant get rs121913529
285 |
286 | # Specify hg38 assembly
287 | biomcp variant get rs113488022 --assembly hg38
288 |
289 | # JSON output with hg38
290 | biomcp variant get rs113488022 --json --assembly hg38
291 |
292 | # Without external annotations
293 | biomcp variant get rs113488022 --no-external
294 |
295 | # Get variant by genomic coordinates
296 | biomcp variant get "chr17:g.43082434G>A"
297 | ```
298 |
299 | ### variant predict
300 |
301 | Predict variant effects using Google DeepMind's AlphaGenome (requires API key).
302 |
303 | ```bash
304 | biomcp variant predict CHROMOSOME POSITION REFERENCE ALTERNATE [OPTIONS]
305 | ```
306 |
307 | **Arguments:**
308 |
309 | - `CHROMOSOME`: Chromosome (e.g., chr7)
310 | - `POSITION`: Genomic position
311 | - `REFERENCE`: Reference allele
312 | - `ALTERNATE`: Alternate allele
313 |
314 | **Options:**
315 |
316 | - `--tissue TEXT`: Tissue type(s) using UBERON ontology
317 | - `--interval INTEGER`: Analysis window size (default: 20000)
318 | - `--api-key TEXT`: AlphaGenome API key
319 |
320 | **Examples:**
321 |
322 | ```bash
323 | # Basic prediction (requires ALPHAGENOME_API_KEY env var)
324 | biomcp variant predict chr7 140753336 A T
325 |
326 | # Tissue-specific prediction
327 | biomcp variant predict chr7 140753336 A T \
328 |   --tissue UBERON:0002367  # prostate gland
329 |
330 | # With per-request API key
331 | biomcp variant predict chr7 140753336 A T --api-key YOUR_KEY
332 | ```
333 |
334 | ## Gene/Drug/Disease Commands
335 |
336 | For practical examples using BioThings integration, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#biothings-integration-for-enhanced-search).
337 |
338 | ### gene get
339 |
340 | Retrieve gene information from MyGene.info.
341 |
342 | ```bash
343 | biomcp gene get GENE_NAME
344 | ```
345 |
346 | **Examples:**
347 |
348 | ```bash
349 | # Get gene information
350 | biomcp gene get TP53
351 | biomcp gene get BRAF
352 | ```
353 |
354 | ### drug get
355 |
356 | Retrieve drug/chemical information from MyChem.info.
357 |
358 | ```bash
359 | biomcp drug get DRUG_NAME
360 | ```
361 |
362 | **Examples:**
363 |
364 | ```bash
365 | # Get drug information
366 | biomcp drug get imatinib
367 | biomcp drug get pembrolizumab
368 | ```
369 |
370 | ### disease get
371 |
372 | Retrieve disease information from MyDisease.info.
373 |
374 | ```bash
375 | biomcp disease get DISEASE_NAME
376 | ```
377 |
378 | **Examples:**
379 |
380 | ```bash
381 | # Get disease information
382 | biomcp disease get melanoma
383 | biomcp disease get "non-small cell lung cancer"
384 | ```
385 |
386 | ## NCI-Specific Commands
387 |
388 | These commands require an NCI API key. For setup instructions and usage examples, see:
389 |
390 | - [Authentication and API Keys](../getting-started/03-authentication-and-api-keys.md#nci-clinical-trials-api)
391 | - [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#using-nci-api-advanced-features)
392 |
393 | ### organization search
394 |
395 | Search NCI's organization database.
396 |
397 | ```bash
398 | biomcp organization search [OPTIONS]
399 | ```
400 |
401 | **Options:**
402 |
403 | - `--name TEXT`: Organization name
404 | - `--city TEXT`: City location
405 | - `--state TEXT`: State/province
406 | - `--country TEXT`: Country
407 | - `--org-type TEXT`: Organization type
408 | - `--api-key TEXT`: NCI API key
409 |
410 | **Example:**
411 |
412 | ```bash
413 | biomcp organization search --name "MD Anderson" \
414 |   --city Houston --state TX --api-key YOUR_KEY
415 | ```
416 |
417 | ### intervention search
418 |
419 | Search NCI's intervention database.
420 |
421 | ```bash
422 | biomcp intervention search [OPTIONS]
423 | ```
424 |
425 | **Options:**
426 |
427 | - `--name TEXT`: Intervention name
428 | - `--intervention-type TEXT`: Type (Drug, Device, Procedure, etc.)
429 | - `--api-key TEXT`: NCI API key
430 |
431 | **Example:**
432 |
433 | ```bash
434 | biomcp intervention search --name pembrolizumab \
435 |   --intervention-type Drug --api-key YOUR_KEY
436 | ```
437 |
438 | ### biomarker search
439 |
440 | Search biomarkers used in clinical trials.
441 |
442 | ```bash
443 | biomcp biomarker search [OPTIONS]
444 | ```
445 |
446 | **Options:**
447 |
448 | - `--gene TEXT`: Gene symbol
449 | - `--biomarker-type TEXT`: Type of biomarker
450 | - `--api-key TEXT`: NCI API key
451 |
452 | **Example:**
453 |
454 | ```bash
455 | biomcp biomarker search --gene EGFR \
456 |   --biomarker-type mutation --api-key YOUR_KEY
457 | ```
458 |
459 | ## Health Command
460 |
461 | For monitoring API status before bulk operations, see the [Performance Optimizations Guide](../developer-guides/07-performance-optimizations.md).
462 |
463 | ### health check
464 |
465 | Monitor API endpoints and system health.
466 |
467 | ```bash
468 | biomcp health check [OPTIONS]
469 | ```
470 |
471 | **Options:**
472 |
473 | - `--apis-only`: Check only API endpoints
474 | - `--system-only`: Check only system resources
475 | - `--verbose, -v`: Show detailed information
476 |
477 | **Examples:**
478 |
479 | ```bash
480 | # Full health check
481 | biomcp health check
482 |
483 | # Check APIs only
484 | biomcp health check --apis-only
485 |
486 | # Detailed system check
487 | biomcp health check --system-only --verbose
488 | ```
489 |
490 | ## Output Formats
491 |
492 | Most commands support both human-readable markdown and machine-readable JSON output:
493 |
494 | ```bash
495 | # Default markdown output
496 | biomcp article search --gene BRAF
497 |
498 | # JSON for programmatic use
499 | biomcp article search --gene BRAF --format json
500 |
501 | # Save to file
502 | biomcp trial search --condition melanoma --format json > trials.json
503 | ```
504 |
505 | ## Environment Variables
506 |
507 | Configure default behavior with environment variables:
508 |
509 | ```bash
510 | # API Keys
511 | export NCI_API_KEY="your-nci-key"
512 | export ALPHAGENOME_API_KEY="your-alphagenome-key"
513 | export CBIO_TOKEN="your-cbioportal-token"
514 |
515 | # Logging
516 | export BIOMCP_LOG_LEVEL="DEBUG"
517 | export BIOMCP_CACHE_DIR="/path/to/cache"
518 | ```
519 |
520 | ## Getting Help
521 |
522 | Every command has a built-in help flag:
523 |
524 | ```bash
525 | # General help
526 | biomcp --help
527 |
528 | # Command-specific help
529 | biomcp article search --help
530 | biomcp trial get --help
531 | biomcp variant predict --help
532 | ```
533 |
534 | ## Tips and Best Practices
535 |
536 | 1. **Use Official Gene Symbols**: Always use HGNC-approved gene symbols (e.g., "TP53" not "p53")
537 |
538 | 2. **Combine Filters**: Most commands support multiple filters for precise results:
539 |
540 |    ```bash
541 |    biomcp article search --gene EGFR --disease "lung cancer" \
542 |      --chemical erlotinib --keyword "resistance"
543 |    ```
544 |
545 | 3. **Handle Large Results**: Use `--limit` and `--format json` for processing:
546 |
547 |    ```bash
548 |    biomcp article search --gene BRCA1 --limit 100 --format json | \
549 |      jq '.results[] | {pmid: .pmid, title: .title}'
550 |    ```
551 |
552 | 4. **Location Searches**: Always provide both latitude and longitude:
553 |
554 |    ```bash
555 |    # Find trials near Boston
556 |    biomcp trial search --condition cancer \
557 |      --latitude 42.3601 --longitude -71.0589 --distance 25
558 |    ```
559 |
560 | 5. **Use OR Logic**: The pipe character enables flexible searches:
561 |
562 |    ```bash
563 |    # Find articles mentioning any form of a variant
564 |    biomcp article search --gene BRAF --keyword "V600E|p.V600E|c.1799T>A"
565 |    ```
566 |
567 | 6. **Check API Health**: Before bulk operations, verify API status:
568 |    ```bash
569 |    biomcp health check --apis-only
570 |    ```
571 |
572 | ## Next Steps
573 |
574 | - Set up [API keys](../getting-started/03-authentication-and-api-keys.md) for enhanced features
575 | - Explore [MCP tools](02-mcp-tools-reference.md) for AI integration
576 | - Read [how-to guides](../how-to-guides/01-find-articles-and-cbioportal-data.md) for complex workflows
577 |
```

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------

```markdown
1 | # Changelog
2 |
3 | All notable changes to the BioMCP project will be documented in this file.
4 |
5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7 |
8 | ## [0.6.7] - 2025-08-13
9 |
10 | ### Fixed
11 |
12 | - **MCP Resource Encoding** - Fixed character encoding error when loading resources on Windows (Issue #63):
13 |   - Added explicit UTF-8 encoding for reading `instructions.md` and `researcher.md` resource files
14 |   - Resolves "'charmap' codec can't decode byte 0x8f" error on Windows systems
15 |   - Ensures cross-platform compatibility for resource loading
16 |
17 | ### Changed
18 |
19 | - **Documentation** - Clarified sequential thinking integration:
20 |   - Updated `researcher-persona-resource.md` to remove references to external `sequential-thinking` MCP server
21 |   - Clarified that the `think` tool is built into BioMCP (no external dependencies needed)
22 |   - Updated configuration examples to show only BioMCP server is required
23 |
24 | ## [0.6.6] - 2025-08-08
25 |
26 | ### Fixed
27 |
28 | - **Windows Compatibility** - Fixed fcntl module import error on Windows (Issue #57):
29 |   - Added conditional import with try/except for fcntl module
30 |   - File locking now only applies on Unix systems
31 |   - Windows users get full functionality without file locking
32 |   - Refactored cache functions to reduce code complexity
33 |
34 | ### Changed
35 |
36 | - **Documentation** - Updated Docker instructions in README (Issue #58):
37 |   - Added `docker build -t biomcp:latest .` command before `docker run`
38 |   - Clarified that biomcp:latest is a local build, not pulled from Docker Hub
39 |
40 | ## [0.6.5] - 2025-08-07
41 |
42 | ### Added
43 |
44 | - **OpenFDA Integration** - Comprehensive FDA regulatory data access:
45 |   - **12 New MCP Tools** for adverse events, drug labels, device events, drug approvals, recalls, and shortages
46 |   - Each domain includes searcher and getter tools for flexible data retrieval
47 |   - Unified search support with `domain="fda_*"` parameters
48 |   - Enhanced CLI commands for all OpenFDA endpoints
49 |   - Smart caching and rate limiting for API efficiency
50 |   - Comprehensive error handling and data validation
51 |
52 | ### Changed
53 |
54 | - Improved API key support across all OpenFDA tools
55 | - Enhanced documentation for FDA data integration
56 |
57 | ## [0.6.4] - 2025-08-06
58 |
59 | ### Changed
60 |
61 | - **Documentation Restructure** - Major documentation improvements:
62 |   - Simplified navigation structure for better user experience
63 |   - Fixed code block formatting and layout issues
64 |   - Removed unnecessary sections and redundant content
65 |   - Improved overall documentation readability and organization
66 |   - Enhanced mobile responsiveness
67 |
68 | ## [0.6.3] - 2025-08-05
69 |
70 | ### Added
71 |
72 | - **NCI Clinical Trials Search API Integration** - Enhanced cancer trial search capabilities:
73 |   - Dual source support for trial search/getter tools (ClinicalTrials.gov + NCI)
74 |   - NCI API key handling via `NCI_API_KEY` environment variable or parameter
75 |   - Advanced trial filters: biomarkers, prior therapy, brain metastases acceptance
76 |   - **6 New MCP Tools** for NCI-specific searches:
77 |     - `nci_organization_searcher` / `nci_organization_getter`: Cancer centers, hospitals, research institutions
78 |     - `nci_intervention_searcher` / `nci_intervention_getter`: Drugs, devices, procedures, biologicals
79 |     - `nci_biomarker_searcher`: Trial eligibility biomarkers (reference genes, branches)
80 |     - `nci_disease_searcher`: NCI's controlled vocabulary of cancer conditions
81 |   - **OR Query Support**: All NCI endpoints support OR queries (e.g., "PD-L1 OR CD274")
82 |   - Real-time access to NCI's curated cancer trials database
83 |   - Automatic cBioPortal integration for gene searches
84 |   - Proper NCI parameter mapping (org_city, org_state_or_province, etc.)
85 |   - Comprehensive error handling for Elasticsearch limits
86 |
87 | ### Changed
88 |
89 | - Enhanced unified search router to properly handle NCI domains
90 | - Trial search/getter tools now accept `source` parameter ("clinicaltrials" or "nci")
91 | - Improved domain-specific search logic for query+domain combinations
92 |
93 | ## [0.6.2] - 2025-08-05
94 |
95 | Note: Initial NCI integration release - see v0.6.3 for the full implementation.

## [0.6.1] - 2025-08-03

### Fixed

- **Dependency Management** - Fixed alphagenome dependency to enable PyPI publishing
  - Made alphagenome an optional dependency
  - Resolved packaging conflicts for distribution

## [0.6.0] - 2025-08-02

### Added

- **Streamable HTTP Transport Protocol** - Modern MCP transport implementation:
  - Single `/mcp` endpoint for all communication
  - Session management with persistent session IDs
  - Event resumption support for reliability
  - On-demand streaming for long operations
  - Configurable HTTP server modes (STDIO, HTTP, Worker)
  - Better scalability for cloud deployments
  - Full MCP specification compliance (2025-03-26)

### Changed

- Improved Cloudflare Worker integration
- Enhanced transport layer with comprehensive testing
- Updated deployment configurations for HTTP mode

## [0.5.0] - 2025-07-31

### Added

- **BioThings API Integration** - Real-time biomedical data access:
  - **MyGene.info**: Gene annotations, summaries, aliases, and database links
  - **MyChem.info**: Drug/chemical information, identifiers, mechanisms of action
  - **MyDisease.info**: Disease definitions, synonyms, MONDO/DOID mappings
  - **3 New MCP Tools**: `gene_getter`, `drug_getter`, `disease_getter`
  - Automatic synonym expansion for enhanced trial searches
  - Batch optimization for multiple gene lookups
  - Live data fetching ensures current information

### Changed

- Enhanced unified search capabilities with BioThings data
- Expanded query language support for gene, drug, and disease queries
- Improved trial searches with automatic disease synonym expansion

## [0.4.7] - 2025-07-30

### Added

- **BioThings Integration** for real-time biomedical data access:
  - **New MCP Tools** (3 tools added, total now 17):
    - `gene_getter`: Query MyGene.info for gene information (symbols, names, summaries)
    - `drug_getter`: Query MyChem.info for drug/chemical data (formulas, indications, mechanisms)
    - `disease_getter`: Query MyDisease.info for disease information (definitions, synonyms, ontologies)
  - **Unified Search/Fetch Enhancement**:
    - Added `gene`, `drug`, `disease` as new searchable domains alongside article, trial, variant
    - Integrated into unified search syntax: `search(domain="gene", keywords=["BRAF"])`
    - Query language support: `gene:BRAF`, `drug:pembrolizumab`, `disease:melanoma`
    - Full fetch support: `fetch(domain="drug", id="DB00945")`
  - **Clinical Trial Enhancement**:
    - Automatic disease synonym expansion for trial searches
    - Real-time synonym lookup from MyDisease.info
    - Example: searching for "GIST" automatically includes "gastrointestinal stromal tumor"
  - **Smart Caching & Performance**:
    - Batch operations for multiple gene/drug lookups
    - Intelligent caching with TTL (gene: 24h, drug: 48h, disease: 72h)
    - Rate limiting to respect API guidelines

### Changed

- Trial search now expands disease terms by default (disable with `expand_synonyms=False`)
- Enhanced error handling for BioThings API responses
- Improved network reliability with automatic retries

## [0.4.6] - 2025-07-09

### Added

- MkDocs documentation deployment

## [0.4.5] - 2025-07-09

### Added

- Unified search and fetch tools following OpenAI MCP guidelines
- Additional variant sources (TCGA/GDC, 1000 Genomes) enabled by default in fetch operations
- Additional article sources (bioRxiv, medRxiv, Europe PMC) enabled by default in search operations

### Changed

- Consolidated 10 separate MCP tools into 2 unified tools (search and fetch)
- Updated response formats to comply with OpenAI MCP specifications

### Fixed

- OpenAI MCP compliance issues to enable integration

## [0.4.4] - 2025-07-08

### Added

- **Performance Optimizations**:
  - Connection pooling with event loop lifecycle management (30% latency reduction)
  - Parallel test execution with pytest-xdist (5x faster test runs)
  - Request batching for cBioPortal API calls (80% fewer API calls)
  - Smart caching with LRU eviction and fast hash keys (10x faster cache operations)
- Major performance improvements achieving ~3x faster test execution (120s → 42s)

### Fixed

- Non-critical ASGI errors suppressed
- Performance issues in article_searcher

## [0.4.3] - 2025-07-08

### Added

- Complete HTTP centralization and improved code quality
- Comprehensive constants module for better maintainability
- Domain-specific handlers for result formatting
- Parameter parser for robust input validation
- Custom exception hierarchy for better error handling

### Changed

- Refactored domain handlers to use static methods for better performance
- Enhanced type safety throughout the codebase
- Refactored complex functions to meet code quality standards

### Fixed

- Type errors in router.py for full mypy compliance
- Complex functions exceeding cyclomatic complexity thresholds

## [0.4.2] - 2025-07-07

### Added

- Europe PMC DOI support for article fetching
- Pagination support for Europe PMC searches
- OR logic support for variant notation searches (e.g., R173 vs Arg173 vs p.R173)

### Changed

- Enhanced variant notation search capabilities

## [0.4.1] - 2025-07-03

### Added

- AlphaGenome as an optional dependency to predict variant effects on gene regulation
- Per-request API key support for AlphaGenome integration
- AI predictions to complement existing database lookups

### Security

- Comprehensive sanitization in Cloudflare Worker to prevent sensitive data logging
- Secure usage in hosted environments where users provide their own keys

## [0.4.0] - 2025-06-27

### Added

- **cBioPortal Integration** for article searches:
  - Automatic gene-level mutation summaries when searching with gene parameters
  - Mutation-specific search capabilities (e.g., BRAF V600E, SRSF2 F57\*)
  - Dynamic cancer type resolution using cBioPortal API
  - Smart caching and rate limiting for optimal performance

## [0.3.3] - 2025-06-20

### Changed

- Release workflow updates

## [0.3.2] - 2025-06-20

### Changed

- Release workflow updates

## [0.3.1] - 2025-06-20

### Fixed

- Build and release process improvements

## [0.3.0] - 2025-06-20

### Added

- Expanded search capabilities
- Integration tests for MCP server functionality
- Utility modules for gene validation, mutation filtering, and request caching

## [0.2.1] - 2025-06-19

### Added

- Remote MCP policies

## [0.2.0] - 2025-06-17

### Added

- Sequential thinking tool for systematic problem-solving
- Session-based thinking to replace global state
- Extracted router handlers to reduce complexity

### Changed

- Replaced global state in thinking module with session management

### Removed

- Global state from sequential thinking module

### Fixed

- Race conditions in sequential thinking with concurrent usage

## [0.1.11] - 2025-06-12

### Added

323 | - Advanced eligibility criteria filters to clinical trial search 324 | 325 | ## [0.1.10] - 2025-05-21 326 | 327 | ### Added 328 | 329 | - OAuth support on the Cloudflare worker via Stytch 330 | 331 | ## [0.1.9] - 2025-05-17 332 | 333 | ### Fixed 334 | 335 | - Refactor: Bump minimum Python version to 3.10 336 | 337 | ## [0.1.8] - 2025-05-14 338 | 339 | ### Fixed 340 | 341 | - Article searcher fixes 342 | 343 | ## [0.1.7] - 2025-05-07 344 | 345 | ### Added 346 | 347 | - Remote OAuth support 348 | 349 | ## [0.1.6] - 2025-05-05 350 | 351 | ### Added 352 | 353 | - Updates to handle cursor integration 354 | 355 | ## [0.1.5] - 2025-05-01 356 | 357 | ### Added 358 | 359 | - Updates to smithery yaml to account for object types needed for remote calls 360 | - Documentation and Lzyank updates 361 | 362 | ## [0.1.3] - 2025-05-01 363 | 364 | ### Added 365 | 366 | - Health check functionality to assist with API call issues 367 | - System resources and network & environment information gathering 368 | - Remote MCP capability via Cloudflare using SSE 369 | 370 | ## [0.1.2] - 2025-04-18 371 | 372 | ### Added 373 | 374 | - Researcher persona and BioMCP v0.1.2 release 375 | - Deep Researcher Persona blog post 376 | - Researcher persona video demo 377 | 378 | ## [0.1.1] - 2025-04-14 379 | 380 | ### Added 381 | 382 | - Claude Desktop and MCP Inspector tutorials 383 | - Improved Claude Desktop Tutorial for BioMCP 384 | - Troubleshooting guide and blog post 385 | 386 | ### Fixed 387 | 388 | - Log tool names as comma separated string 389 | - Server hanging issues 390 | - Error responses in variant count check 391 | 392 | ## [0.1.0] - 2025-04-08 393 | 394 | ### Added 395 | 396 | - Initial release of BioMCP 397 | - PubMed/PubTator3 article search integration 398 | - ClinicalTrials.gov trial search integration 399 | - MyVariant.info variant search integration 400 | - CLI interface for direct usage 401 | - MCP server for AI assistant integration 402 | - Cloudflare Worker support for 
remote deployment 403 | - Comprehensive test suite with pytest-bdd 404 | - GenomOncology introduction 405 | - Blog post on AI-assisted clinical trial search 406 | - MacOS troubleshooting guide 407 | 408 | ### Security 409 | 410 | - API keys properly externalized 411 | - Input validation using Pydantic models 412 | - Safe string handling in all API calls 413 | 414 | [Unreleased]: https://github.com/genomoncology/biomcp/compare/v0.6.6...HEAD 415 | [0.6.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.6 416 | [0.6.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.5 417 | [0.6.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.4 418 | [0.6.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.3 419 | [0.6.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.2 420 | [0.6.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.1 421 | [0.6.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.0 422 | [0.5.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.5.0 423 | [0.4.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.7 424 | [0.4.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.6 425 | [0.4.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.5 426 | [0.4.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.4 427 | [0.4.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.3 428 | [0.4.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.2 429 | [0.4.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.1 430 | [0.4.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.0 431 | [0.3.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.3 432 | [0.3.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.2 433 | [0.3.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.1 434 | [0.3.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.0 435 | [0.2.1]: 
https://github.com/genomoncology/biomcp/releases/tag/v0.2.1 436 | [0.2.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.2.0 437 | [0.1.11]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.11 438 | [0.1.10]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.10 439 | [0.1.9]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.9 440 | [0.1.8]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.8 441 | [0.1.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.7 442 | [0.1.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.6 443 | [0.1.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.5 444 | [0.1.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.3 445 | [0.1.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.2 446 | [0.1.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.1 447 | [0.1.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.0 448 | ``` -------------------------------------------------------------------------------- /docs/developer-guides/02-contributing-and-testing.md: -------------------------------------------------------------------------------- ```markdown 1 | # Contributing and Testing Guide 2 | 3 | This guide covers how to contribute to BioMCP and run the comprehensive test suite. 4 | 5 | ## Getting Started 6 | 7 | ### Prerequisites 8 | 9 | - Python 3.10 or higher 10 | - [uv](https://docs.astral.sh/uv/) package manager 11 | - Git 12 | - Node.js (for MCP Inspector) 13 | 14 | ### Initial Setup 15 | 16 | 1. **Fork and clone the repository:** 17 | 18 | ```bash 19 | git clone https://github.com/YOUR_USERNAME/biomcp.git 20 | cd biomcp 21 | ``` 22 | 23 | 2. **Install dependencies and setup:** 24 | 25 | ```bash 26 | # Recommended: Use make for complete setup 27 | make install 28 | 29 | # Alternative: Manual setup 30 | uv sync --all-extras 31 | uv run pre-commit install 32 | ``` 33 | 34 | 3. 
**Verify installation:** 35 | 36 | ```bash 37 | # Run server 38 | biomcp run 39 | 40 | # Run tests 41 | make test-offline 42 | ``` 43 | 44 | ## Development Workflow 45 | 46 | ### 1. Create Feature Branch 47 | 48 | ```bash 49 | git checkout -b feature/your-feature-name 50 | ``` 51 | 52 | ### 2. Make Changes 53 | 54 | Follow these principles: 55 | 56 | - **Keep changes minimal and focused** 57 | - **Follow existing code patterns** 58 | - **Add tests for new functionality** 59 | - **Update documentation as needed** 60 | 61 | ### 3. Quality Checks 62 | 63 | **MANDATORY: Run these before considering work complete:** 64 | 65 | ```bash 66 | # Step 1: Code quality checks 67 | make check 68 | 69 | # This runs: 70 | # - ruff check (linting) 71 | # - ruff format (code formatting) 72 | # - mypy (type checking) 73 | # - pre-commit hooks 74 | # - deptry (dependency analysis) 75 | ``` 76 | 77 | ### 4. Run Tests 78 | 79 | ```bash 80 | # Step 2: Run appropriate test suite 81 | make test # Full suite (requires network) 82 | # OR 83 | make test-offline # Unit tests only (no network) 84 | ``` 85 | 86 | **Both quality checks and tests MUST pass before submitting changes.** 87 | 88 | ## Testing Strategy 89 | 90 | ### Test Categories 91 | 92 | #### Unit Tests 93 | 94 | - Fast, reliable tests without external dependencies 95 | - Mock all external API calls 96 | - Always run in CI/CD 97 | 98 | ```python 99 | # Example unit test 100 | @patch('httpx.AsyncClient.get') 101 | async def test_article_search(mock_get): 102 | mock_get.return_value.json.return_value = {"results": [...]} 103 | result = await article_searcher(genes=["BRAF"]) 104 | assert len(result) > 0 105 | ``` 106 | 107 | #### Integration Tests 108 | 109 | - Test real API interactions 110 | - May fail due to network/API issues 111 | - Run separately in CI with `continue-on-error` 112 | 113 | ```python 114 | # Example integration test 115 | @pytest.mark.integration 116 | async def test_real_pubmed_search(): 117 | result = await 
article_searcher(genes=["TP53"], limit=5) 118 | assert len(result) == 5 119 | assert all("TP53" in r.text for r in result) 120 | ``` 121 | 122 | ### Running Tests 123 | 124 | #### Command Options 125 | 126 | ```bash 127 | # Run all tests 128 | make test 129 | uv run python -m pytest 130 | 131 | # Run only unit tests (fast, offline) 132 | make test-offline 133 | uv run python -m pytest -m "not integration" 134 | 135 | # Run only integration tests 136 | uv run python -m pytest -m "integration" 137 | 138 | # Run specific test file 139 | uv run python -m pytest tests/tdd/test_article_search.py 140 | 141 | # Run with coverage 142 | make cov 143 | uv run python -m pytest --cov --cov-report=html 144 | 145 | # Run tests verbosely 146 | uv run python -m pytest -v 147 | 148 | # Run tests and stop on first failure 149 | uv run python -m pytest -x 150 | ``` 151 | 152 | #### Test Discovery 153 | 154 | Tests are organized in: 155 | 156 | - `tests/tdd/` - Unit and integration tests 157 | - `tests/bdd/` - Behavior-driven development tests 158 | - `tests/data/` - Test fixtures and sample data 159 | 160 | ### Writing Tests 161 | 162 | #### Test Structure 163 | 164 | ```python 165 | import pytest 166 | from unittest.mock import patch, AsyncMock 167 | from biomcp.articles import article_searcher 168 | 169 | class TestArticleSearch: 170 | """Test article search functionality""" 171 | 172 | @pytest.fixture 173 | def mock_response(self): 174 | """Sample API response""" 175 | return { 176 | "results": [ 177 | {"pmid": "12345", "title": "BRAF in melanoma"} 178 | ] 179 | } 180 | 181 | @patch('httpx.AsyncClient.get') 182 | async def test_basic_search(self, mock_get, mock_response): 183 | """Test basic article search""" 184 | # Setup 185 | mock_get.return_value = AsyncMock() 186 | mock_get.return_value.json.return_value = mock_response 187 | 188 | # Execute 189 | result = await article_searcher(genes=["BRAF"]) 190 | 191 | # Assert 192 | assert len(result) == 1 193 | assert "BRAF" in 
result[0].title 194 | ``` 195 | 196 | #### Async Testing 197 | 198 | ```python 199 | import pytest 200 | import asyncio 201 | 202 | @pytest.mark.asyncio 203 | async def test_async_function(): 204 | """Test async functionality""" 205 | result = await some_async_function() 206 | assert result is not None 207 | 208 | # Or use pytest-asyncio fixtures 209 | @pytest.fixture 210 | async def async_client(): 211 | async with AsyncClient() as client: 212 | yield client 213 | ``` 214 | 215 | #### Mocking External APIs 216 | 217 | ```python 218 | from unittest.mock import patch, MagicMock 219 | 220 | @patch('biomcp.integrations.pubmed.search') 221 | def test_with_mock(mock_search): 222 | # Configure mock 223 | mock_search.return_value = [{ 224 | "pmid": "12345", 225 | "title": "Test Article" 226 | }] 227 | 228 | # Test code that uses the mocked function 229 | result = search_articles("BRAF") 230 | 231 | # Verify mock was called correctly 232 | mock_search.assert_called_once_with("BRAF") 233 | ``` 234 | 235 | ## MCP Inspector Testing 236 | 237 | The MCP Inspector provides an interactive way to test MCP tools. 238 | 239 | ### Setup 240 | 241 | ```bash 242 | # Install inspector 243 | npm install -g @modelcontextprotocol/inspector 244 | 245 | # Run BioMCP with inspector 246 | make inspector 247 | # OR 248 | npx @modelcontextprotocol/inspector uv run --with biomcp-python biomcp run 249 | ``` 250 | 251 | ### Testing Tools 252 | 253 | 1. **Connect to server** in the inspector UI 254 | 2. **View available tools** in the tools panel 255 | 3. 
**Test individual tools** with sample inputs 256 | 257 | #### Example Tool Tests 258 | 259 | ```javascript 260 | // Test article search 261 | { 262 | "tool": "article_searcher", 263 | "arguments": { 264 | "genes": ["BRAF"], 265 | "diseases": ["melanoma"], 266 | "limit": 5 267 | } 268 | } 269 | 270 | // Test trial search 271 | { 272 | "tool": "trial_searcher", 273 | "arguments": { 274 | "conditions": ["lung cancer"], 275 | "recruiting_status": "OPEN", 276 | "limit": 10 277 | } 278 | } 279 | 280 | // Test think tool (ALWAYS first!) 281 | { 282 | "tool": "think", 283 | "arguments": { 284 | "thought": "Planning to search for BRAF mutations", 285 | "thoughtNumber": 1, 286 | "nextThoughtNeeded": true 287 | } 288 | } 289 | ``` 290 | 291 | ### Debugging with Inspector 292 | 293 | 1. **Check request/response**: View raw MCP messages 294 | 2. **Verify parameters**: Ensure correct argument format 295 | 3. **Test error handling**: Try invalid inputs 296 | 4. **Monitor performance**: Check response times 297 | 298 | ## Code Style and Standards 299 | 300 | ### Python Style 301 | 302 | - **Formatter**: ruff (line length: 79) 303 | - **Type hints**: Required for all functions 304 | - **Docstrings**: Google style for all public functions 305 | 306 | ```python 307 | def search_articles( 308 | genes: list[str], 309 | limit: int = 10 310 | ) -> list[Article]: 311 | """Search for articles by gene names. 312 | 313 | Args: 314 | genes: List of gene symbols to search 315 | limit: Maximum number of results 316 | 317 | Returns: 318 | List of Article objects 319 | 320 | Raises: 321 | ValueError: If genes list is empty 322 | """ 323 | if not genes: 324 | raise ValueError("Genes list cannot be empty") 325 | # Implementation... 
326 | ``` 327 | 328 | ### Pre-commit Hooks 329 | 330 | Automatically run on commit: 331 | 332 | - ruff formatting 333 | - ruff linting 334 | - mypy type checking 335 | - File checks (YAML, TOML, merge conflicts) 336 | 337 | Manual run: 338 | 339 | ```bash 340 | uv run pre-commit run --all-files 341 | ``` 342 | 343 | ## Continuous Integration 344 | 345 | ### GitHub Actions Workflow 346 | 347 | The CI pipeline runs: 348 | 349 | 1. **Linting and Formatting** 350 | 2. **Type Checking** 351 | 3. **Unit Tests** (required to pass) 352 | 4. **Integration Tests** (allowed to fail) 353 | 5. **Coverage Report** 354 | 355 | ### CI Configuration 356 | 357 | ```yaml 358 | # .github/workflows/test.yml structure 359 | jobs: 360 | test: 361 | strategy: 362 | matrix: 363 | python-version: ["3.10", "3.11", "3.12"] 364 | steps: 365 | - uses: actions/checkout@v4 366 | - uses: astral-sh/setup-uv@v2 367 | - run: make check 368 | - run: make test-offline 369 | ``` 370 | 371 | ## Debugging and Troubleshooting 372 | 373 | ### Common Issues 374 | 375 | #### Test Failures 376 | 377 | ```bash 378 | # Run failed test with more details 379 | uv run python -m pytest -vvs tests/path/to/test.py::test_name 380 | 381 | # Debug with print statements 382 | uv run python -m pytest -s # Don't capture stdout 383 | 384 | # Use debugger 385 | uv run python -m pytest --pdb # Drop to debugger on failure 386 | ``` 387 | 388 | #### Integration Test Issues 389 | 390 | Common causes: 391 | 392 | - **Rate limiting**: Add delays or use mocks 393 | - **API changes**: Update test expectations 394 | - **Network issues**: Check connectivity 395 | - **API keys**: Ensure valid keys for NCI tests 396 | 397 | ## Integration Testing 398 | 399 | ### Overview 400 | 401 | BioMCP includes integration tests that make real API calls to external services. These tests verify that our integrations work correctly with live data but can be affected by API availability, rate limits, and data changes. 
402 | 403 | ### Running Integration Tests 404 | 405 | ```bash 406 | # Run all tests including integration 407 | make test 408 | 409 | # Run only integration tests 410 | pytest -m integration 411 | 412 | # Skip integration tests 413 | pytest -m "not integration" 414 | ``` 415 | 416 | ### Handling Flaky Tests 417 | 418 | Integration tests may fail or skip for various reasons: 419 | 420 | 1. **API Unavailability** 421 | 422 | - **Symptom**: Tests skip with "API returned no data" message 423 | - **Cause**: The external service is down or experiencing issues 424 | - **Action**: Re-run tests later or check service status 425 | 426 | 2. **Rate Limiting** 427 | 428 | - **Symptom**: Multiple test failures after initial successes 429 | - **Cause**: Too many requests in a short time 430 | - **Action**: Run tests with delays between them or use API tokens 431 | 432 | 3. **Data Changes** 433 | - **Symptom**: Assertions about specific data fail 434 | - **Cause**: The external data has changed (e.g., new mutations discovered) 435 | - **Action**: Update tests to use more flexible assertions 436 | 437 | ### Integration Test Design Principles 438 | 439 | #### 1. Graceful Skipping 440 | 441 | Tests should skip rather than fail when: 442 | 443 | - API returns no data 444 | - Service is unavailable 445 | - Rate limits are hit 446 | 447 | ```python 448 | if not data or data.total_count == 0: 449 | pytest.skip("API returned no data - possible service issue") 450 | ``` 451 | 452 | #### 2. Flexible Assertions 453 | 454 | Avoid assertions on specific data values that might change: 455 | 456 | ❌ **Bad**: Expecting exact mutation counts 457 | 458 | ```python 459 | assert summary.total_mutations == 1234 460 | ``` 461 | 462 | ✅ **Good**: Checking data exists and has reasonable structure 463 | 464 | ```python 465 | assert summary.total_mutations > 0 466 | assert hasattr(summary, 'hotspots') 467 | ``` 468 | 469 | #### 3. 
Retry Logic 470 | 471 | For critical tests, implement retry with delay: 472 | 473 | ```python 474 | async def fetch_with_retry(client, resource, max_attempts=2, delay=1.0): 475 | for attempt in range(max_attempts): 476 | result = await client.get(resource) 477 | if result and result.data: 478 | return result 479 | if attempt < max_attempts - 1: 480 | await asyncio.sleep(delay) 481 | return None 482 | ``` 483 | 484 | #### 4. Cache Management 485 | 486 | Clear caches before tests to ensure fresh data: 487 | 488 | ```python 489 | from biomcp.utils.request_cache import clear_cache 490 | await clear_cache() 491 | ``` 492 | 493 | ### Common Integration Test Patterns 494 | 495 | #### Testing Search Functionality 496 | 497 | ```python 498 | @pytest.mark.integration 499 | async def test_gene_search(self): 500 | client = SearchClient() 501 | results = await client.search("BRAF") 502 | 503 | # Flexible assertions 504 | assert results is not None 505 | if results.count > 0: 506 | assert results.items[0].gene_symbol == "BRAF" 507 | else: 508 | pytest.skip("No results returned - API may be unavailable") 509 | ``` 510 | 511 | #### Testing Data Retrieval 512 | 513 | ```python 514 | @pytest.mark.integration 515 | async def test_variant_details(self): 516 | client = VariantClient() 517 | variant = await client.get_variant("rs121913529") 518 | 519 | if not variant: 520 | pytest.skip("Variant not found - may have been removed from database") 521 | 522 | # Check structure, not specific values 523 | assert hasattr(variant, 'chromosome') 524 | assert hasattr(variant, 'position') 525 | ``` 526 | 527 | ### Debugging Failed Integration Tests 528 | 529 | 1. **Enable Debug Logging** 530 | 531 | ```bash 532 | BIOMCP_LOG_LEVEL=DEBUG pytest tests/integration/test_failing.py -v 533 | ``` 534 | 535 | 2. 
**Check API Status** 536 | 537 | - PubMed: https://www.ncbi.nlm.nih.gov/home/about/website-updates/ 538 | - ClinicalTrials.gov: https://clinicaltrials.gov/about/announcements 539 | - cBioPortal: https://www.cbioportal.org/ 540 | 541 | 3. **Inspect Response Data** 542 | ```python 543 | if not expected_data: 544 | print(f"Unexpected response: {response}") 545 | pytest.skip("Data structure changed") 546 | ``` 547 | 548 | ### Environment Variables for Testing 549 | 550 | #### API Tokens 551 | 552 | Some services provide higher rate limits with authentication: 553 | 554 | ```bash 555 | export CBIO_TOKEN="your-token-here" 556 | export PUBMED_API_KEY="your-key-here" 557 | ``` 558 | 559 | #### Offline Mode 560 | 561 | Test offline behavior: 562 | 563 | ```bash 564 | export BIOMCP_OFFLINE=true 565 | pytest tests/ 566 | ``` 567 | 568 | #### Custom Timeouts 569 | 570 | Adjust timeouts for slow connections: 571 | 572 | ```bash 573 | export BIOMCP_REQUEST_TIMEOUT=60 574 | pytest tests/integration/ 575 | ``` 576 | 577 | ### CI/CD Considerations 578 | 579 | 1. **Separate Test Runs** 580 | 581 | ```yaml 582 | - name: Unit Tests 583 | run: pytest -m "not integration" 584 | 585 | - name: Integration Tests 586 | run: pytest -m integration 587 | continue-on-error: true 588 | ``` 589 | 590 | 2. **Scheduled Runs** 591 | 592 | ```yaml 593 | on: 594 | schedule: 595 | - cron: "0 6 * * *" # Daily at 6 AM 596 | ``` 597 | 598 | 3. **Result Monitoring**: Track integration test success rates over time to identify patterns. 599 | 600 | ### Integration Testing Best Practices 601 | 602 | 1. **Keep integration tests focused** - Test integration points, not business logic 603 | 2. **Use reasonable timeouts** - Don't wait forever for slow APIs 604 | 3. **Document expected failures** - Add comments explaining why tests might skip 605 | 4. **Monitor external changes** - Subscribe to API change notifications 606 | 5. 
**Provide escape hatches** - Allow skipping integration tests when needed 607 | 608 | #### Type Checking Errors 609 | 610 | ```bash 611 | # Check specific file 612 | uv run mypy src/biomcp/specific_file.py 613 | 614 | # Ignore specific error 615 | # type: ignore[error-code] 616 | 617 | # Show error codes 618 | uv run mypy --show-error-codes 619 | ``` 620 | 621 | ### Performance Testing 622 | 623 | ```python 624 | import time 625 | import pytest 626 | 627 | @pytest.mark.performance 628 | def test_search_performance(): 629 | """Ensure search completes within time limit""" 630 | start = time.time() 631 | result = search_articles("TP53", limit=100) 632 | duration = time.time() - start 633 | 634 | assert duration < 5.0 # Should complete in 5 seconds 635 | assert len(result) == 100 636 | ``` 637 | 638 | ## Submitting Changes 639 | 640 | ### Pull Request Process 641 | 642 | 1. **Ensure all checks pass:** 643 | 644 | ```bash 645 | make check && make test 646 | ``` 647 | 648 | 2. **Update documentation** if needed 649 | 650 | 3. **Commit with clear message:** 651 | 652 | ```bash 653 | git add . 654 | git commit -m "feat: add support for variant batch queries 655 | 656 | - Add batch_variant_search function 657 | - Update tests for batch functionality 658 | - Document batch size limits" 659 | ``` 660 | 661 | 4. **Push to your fork:** 662 | 663 | ```bash 664 | git push origin feature/your-feature-name 665 | ``` 666 | 667 | 5. 
**Create Pull Request** with: 668 | - Clear description of changes 669 | - Link to related issues 670 | - Test results summary 671 | 672 | ### Code Review Guidelines 673 | 674 | Your PR will be reviewed for: 675 | 676 | - **Code quality** and style consistency 677 | - **Test coverage** for new features 678 | - **Documentation** updates 679 | - **Performance** impact 680 | - **Security** considerations 681 | 682 | ## Best Practices 683 | 684 | ### DO: 685 | 686 | - Write tests for new functionality 687 | - Follow existing patterns 688 | - Keep PRs focused and small 689 | - Update documentation 690 | - Run full test suite locally 691 | 692 | ### DON'T: 693 | 694 | - Skip tests to "save time" 695 | - Mix unrelated changes in one PR 696 | - Ignore linting warnings 697 | - Commit sensitive data 698 | - Break existing functionality 699 | 700 | ## Additional Resources 701 | 702 | - [MCP Documentation](https://modelcontextprotocol.org) 703 | - [pytest Documentation](https://docs.pytest.org) 704 | - [Type Hints Guide](https://mypy.readthedocs.io) 705 | - [Ruff Documentation](https://docs.astral.sh/ruff) 706 | 707 | ## Getting Help 708 | 709 | - **GitHub Issues**: Report bugs or request features 710 | - **Issues**: Ask questions or share ideas 711 | - **Pull Requests**: Submit contributions 712 | - **Documentation**: Check existing docs first 713 | 714 | Remember: Quality over speed. Take time to write good tests and clean code! 715 | ``` -------------------------------------------------------------------------------- /src/biomcp/cli/openfda.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | OpenFDA CLI commands for BioMCP. 
3 | """ 4 | 5 | import asyncio 6 | from typing import Annotated 7 | 8 | import typer 9 | from rich.console import Console 10 | 11 | from ..openfda import ( 12 | get_adverse_event, 13 | get_device_event, 14 | get_drug_approval, 15 | get_drug_label, 16 | get_drug_recall, 17 | get_drug_shortage, 18 | search_adverse_events, 19 | search_device_events, 20 | search_drug_approvals, 21 | search_drug_labels, 22 | search_drug_recalls, 23 | search_drug_shortages, 24 | ) 25 | 26 | console = Console() 27 | 28 | # Create separate Typer apps for each subdomain 29 | adverse_app = typer.Typer( 30 | no_args_is_help=True, 31 | help="Search and retrieve FDA drug adverse event reports (FAERS)", 32 | ) 33 | 34 | label_app = typer.Typer( 35 | no_args_is_help=True, 36 | help="Search and retrieve FDA drug product labels (SPL)", 37 | ) 38 | 39 | device_app = typer.Typer( 40 | no_args_is_help=True, 41 | help="Search and retrieve FDA device adverse event reports (MAUDE)", 42 | ) 43 | 44 | approval_app = typer.Typer( 45 | no_args_is_help=True, 46 | help="Search and retrieve FDA drug approval records (Drugs@FDA)", 47 | ) 48 | 49 | recall_app = typer.Typer( 50 | no_args_is_help=True, 51 | help="Search and retrieve FDA drug recall records (Enforcement)", 52 | ) 53 | 54 | shortage_app = typer.Typer( 55 | no_args_is_help=True, 56 | help="Search and retrieve FDA drug shortage information", 57 | ) 58 | 59 | 60 | # Adverse Events Commands 61 | @adverse_app.command("search") 62 | def search_adverse_events_cli( 63 | drug: Annotated[ 64 | str | None, 65 | typer.Option("--drug", "-d", help="Drug name to search for"), 66 | ] = None, 67 | reaction: Annotated[ 68 | str | None, 69 | typer.Option( 70 | "--reaction", "-r", help="Adverse reaction to search for" 71 | ), 72 | ] = None, 73 | serious: Annotated[ 74 | bool | None, 75 | typer.Option("--serious/--all", help="Filter for serious events only"), 76 | ] = None, 77 | limit: Annotated[ 78 | int, typer.Option("--limit", "-l", help="Maximum number of results") 
79 | ] = 25, 80 | page: Annotated[ 81 | int, typer.Option("--page", "-p", help="Page number (1-based)") 82 | ] = 1, 83 | api_key: Annotated[ 84 | str | None, 85 | typer.Option( 86 | "--api-key", 87 | help="OpenFDA API key (overrides OPENFDA_API_KEY env var)", 88 | ), 89 | ] = None, 90 | ): 91 | """Search FDA adverse event reports for drugs.""" 92 | skip = (page - 1) * limit 93 | 94 | try: 95 | results = asyncio.run( 96 | search_adverse_events( 97 | drug=drug, 98 | reaction=reaction, 99 | serious=serious, 100 | limit=limit, 101 | skip=skip, 102 | api_key=api_key, 103 | ) 104 | ) 105 | console.print(results) 106 | except Exception as e: 107 | console.print(f"[red]Error: {e}[/red]") 108 | raise typer.Exit(1) from e 109 | 110 | 111 | @adverse_app.command("get") 112 | def get_adverse_event_cli( 113 | report_id: Annotated[str, typer.Argument(help="Safety report ID")], 114 | api_key: Annotated[ 115 | str | None, 116 | typer.Option( 117 | "--api-key", 118 | help="OpenFDA API key (overrides OPENFDA_API_KEY env var)", 119 | ), 120 | ] = None, 121 | ): 122 | """Get detailed information for a specific adverse event report.""" 123 | try: 124 | result = asyncio.run(get_adverse_event(report_id, api_key=api_key)) 125 | console.print(result) 126 | except Exception as e: 127 | console.print(f"[red]Error: {e}[/red]") 128 | raise typer.Exit(1) from e 129 | 130 | 131 | # Drug Label Commands 132 | @label_app.command("search") 133 | def search_drug_labels_cli( 134 | name: Annotated[ 135 | str | None, 136 | typer.Option("--name", "-n", help="Drug name to search for"), 137 | ] = None, 138 | indication: Annotated[ 139 | str | None, 140 | typer.Option( 141 | "--indication", 142 | "-i", 143 | help="Search for drugs indicated for this condition", 144 | ), 145 | ] = None, 146 | boxed_warning: Annotated[ 147 | bool, 148 | typer.Option( 149 | "--boxed-warning", help="Filter for drugs with boxed warnings" 150 | ), 151 | ] = False, 152 | section: Annotated[ 153 | str | None, 154 | typer.Option( 155 
            "--section", "-s", help="Specific label section to search"
        ),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug product labels."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_labels(
                name=name,
                indication=indication,
                boxed_warning=boxed_warning,
                section=section,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@label_app.command("get")
def get_drug_label_cli(
    set_id: Annotated[str, typer.Argument(help="Label set ID")],
    sections: Annotated[
        str | None,
        typer.Option(
            "--sections", help="Comma-separated list of sections to retrieve"
        ),
    ] = None,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug label information."""
    section_list = None
    if sections:
        section_list = [s.strip() for s in sections.split(",")]

    try:
        result = asyncio.run(
            get_drug_label(set_id, section_list, api_key=api_key)
        )
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Device Event Commands
@device_app.command("search")
def search_device_events_cli(
    device: Annotated[
        str | None,
        typer.Option("--device", "-d", help="Device name to search for"),
    ] = None,
    manufacturer: Annotated[
        str | None,
        typer.Option("--manufacturer", "-m", help="Manufacturer name"),
    ] = None,
    problem: Annotated[
        str | None,
        typer.Option("--problem", "-p", help="Device problem description"),
    ] = None,
    product_code: Annotated[
        str | None, typer.Option("--product-code", help="FDA product code")
    ] = None,
    genomics_only: Annotated[
        bool,
        typer.Option(
            "--genomics-only/--all-devices",
            help="Filter to genomic/diagnostic devices",
        ),
    ] = True,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA device adverse event reports."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_device_events(
                device=device,
                manufacturer=manufacturer,
                problem=problem,
                product_code=product_code,
                genomics_only=genomics_only,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@device_app.command("get")
def get_device_event_cli(
    mdr_report_key: Annotated[str, typer.Argument(help="MDR report key")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed information for a specific device event report."""
    try:
        result = asyncio.run(get_device_event(mdr_report_key, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Approval Commands
@approval_app.command("search")
def search_drug_approvals_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    application: Annotated[
        str | None,
        typer.Option(
            "--application", "-a", help="NDA or BLA application number"
        ),
    ] = None,
    year: Annotated[
        str | None,
        typer.Option("--year", "-y", help="Approval year (YYYY format)"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug approval records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_approvals(
                drug=drug,
                application_number=application,
                approval_year=year,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@approval_app.command("get")
def get_drug_approval_cli(
    application: Annotated[
        str, typer.Argument(help="NDA or BLA application number")
    ],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug approval information."""
    try:
        result = asyncio.run(get_drug_approval(application, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Recall Commands
@recall_app.command("search")
def search_drug_recalls_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    recall_class: Annotated[
        str | None,
        typer.Option(
            "--class", "-c", help="Recall classification (1, 2, or 3)"
        ),
    ] = None,
    status: Annotated[
        str | None,
        typer.Option(
            "--status", "-s", help="Recall status (ongoing, completed)"
        ),
    ] = None,
    reason: Annotated[
        str | None,
        typer.Option("--reason", "-r", help="Search in recall reason"),
    ] = None,
    since: Annotated[
        str | None,
        typer.Option("--since", help="Show recalls after date (YYYYMMDD)"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug recall records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_recalls(
                drug=drug,
                recall_class=recall_class,
                status=status,
                reason=reason,
                since_date=since,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@recall_app.command("get")
def get_drug_recall_cli(
    recall_number: Annotated[str, typer.Argument(help="FDA recall number")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug recall information."""
    try:
        result = asyncio.run(get_drug_recall(recall_number, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Shortage Commands
@shortage_app.command("search")
def search_drug_shortages_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    status: Annotated[
        str | None,
        typer.Option(
            "--status", "-s", help="Shortage status (current, resolved)"
        ),
    ] = None,
    category: Annotated[
        str | None,
        typer.Option("--category", "-c", help="Therapeutic category"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug shortage records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_shortages(
                drug=drug,
                status=status,
                therapeutic_category=category,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@shortage_app.command("get")
def get_drug_shortage_cli(
    drug: Annotated[str, typer.Argument(help="Drug name")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug shortage information."""
    try:
        result = asyncio.run(get_drug_shortage(drug, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Main OpenFDA app that combines all subcommands
openfda_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve data from FDA's openFDA API",
)

# Add subcommands
openfda_app.add_typer(
    adverse_app, name="adverse", help="Drug adverse events (FAERS)"
)
openfda_app.add_typer(
    label_app, name="label", help="Drug product labels (SPL)"
)
openfda_app.add_typer(
    device_app, name="device", help="Device adverse events (MAUDE)"
)
openfda_app.add_typer(
    approval_app, name="approval", help="Drug approvals (Drugs@FDA)"
)
openfda_app.add_typer(
    recall_app, name="recall", help="Drug recalls (Enforcement)"
)
openfda_app.add_typer(shortage_app, name="shortage", help="Drug shortages")
```

--------------------------------------------------------------------------------
/src/biomcp/articles/preprints.py:
--------------------------------------------------------------------------------

```python
"""Preprint search functionality for bioRxiv/medRxiv and Europe PMC."""

import asyncio
import json
import logging
from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field

from .. import http_client, render
from ..constants import (
    BIORXIV_BASE_URL,
    BIORXIV_DEFAULT_DAYS_BACK,
    BIORXIV_MAX_PAGES,
    BIORXIV_RESULTS_PER_PAGE,
    EUROPE_PMC_BASE_URL,
    EUROPE_PMC_PAGE_SIZE,
    MEDRXIV_BASE_URL,
    SYSTEM_PAGE_SIZE,
)
from ..core import PublicationState
from .search import PubmedRequest, ResultItem, SearchResponse

logger = logging.getLogger(__name__)


class BiorxivRequest(BaseModel):
    """Request parameters for bioRxiv/medRxiv API."""

    query: str
    interval: str = Field(
        default="", description="Date interval in YYYY-MM-DD/YYYY-MM-DD format"
    )
    cursor: int = Field(default=0, description="Starting position")


class BiorxivResult(BaseModel):
    """Individual result from bioRxiv/medRxiv."""

    doi: str | None = None
    title: str | None = None
    authors: str | None = None
    author_corresponding: str | None = None
    author_corresponding_institution: str | None = None
    date: str | None = None
    version: int | None = None
    type: str | None = None
    license: str | None = None
    category: str | None = None
    jatsxml: str | None = None
    abstract: str | None = None
    published: str | None = None
    server: str | None = None

    def to_result_item(self) -> ResultItem:
        """Convert to standard ResultItem format."""
        authors_list = []
        if self.authors:
            authors_list = [
                author.strip() for author in self.authors.split(";")
            ]

        return ResultItem(
            pmid=None,
            pmcid=None,
            title=self.title,
            journal=f"{self.server or 'bioRxiv'} (preprint)",
            authors=authors_list,
            date=self.date,
            doi=self.doi,
            abstract=self.abstract,
            publication_state=PublicationState.PREPRINT,
            source=self.server or "bioRxiv",
        )


class BiorxivResponse(BaseModel):
    """Response from bioRxiv/medRxiv API."""

    collection: list[BiorxivResult] = Field(default_factory=list)
    messages: list[dict[str, Any]] = Field(default_factory=list)
    total: int = Field(default=0, alias="total")


class EuropePMCRequest(BaseModel):
    """Request parameters for Europe PMC API."""

    query: str
    format: str = "json"
    pageSize: int = Field(default=25, le=1000)
    cursorMark: str = Field(default="*")
    src: str = Field(default="PPR", description="Source: PPR for preprints")


class EuropePMCResult(BaseModel):
    """Individual result from Europe PMC."""

    id: str | None = None
    source: str | None = None
    pmid: str | None = None
    pmcid: str | None = None
    doi: str | None = None
    title: str | None = None
    authorString: str | None = None
    journalTitle: str | None = None
    pubYear: str | None = None
    firstPublicationDate: str | None = None
    abstractText: str | None = None

    def to_result_item(self) -> ResultItem:
        """Convert to standard ResultItem format."""
        authors_list = []
        if self.authorString:
            authors_list = [
                author.strip() for author in self.authorString.split(",")
            ]

        return ResultItem(
            pmid=int(self.pmid) if self.pmid and self.pmid.isdigit() else None,
            pmcid=self.pmcid,
            title=self.title,
            journal=f"{self.journalTitle or 'Preprint Server'} (preprint)",
            authors=authors_list,
            date=self.firstPublicationDate or self.pubYear,
            doi=self.doi,
            abstract=self.abstractText,
            publication_state=PublicationState.PREPRINT,
            source="Europe PMC",
        )


class EuropePMCResponse(BaseModel):
    """Response from Europe PMC API."""

    hitCount: int = Field(default=0)
    nextCursorMark: str | None = None
    resultList: dict[str, Any] = Field(default_factory=dict)

    @property
    def results(self) -> list[EuropePMCResult]:
        result_data = self.resultList.get("result", [])
        return [EuropePMCResult(**r) for r in result_data]


class PreprintSearcher:
    """Handles searching across multiple preprint sources."""

    def __init__(self):
        self.biorxiv_client = BiorxivClient()
        self.europe_pmc_client = EuropePMCClient()

    async def search(
        self,
        request: PubmedRequest,
        include_biorxiv: bool = True,
        include_europe_pmc: bool = True,
    ) -> SearchResponse:
        """Search across preprint sources and merge results."""
        query = self._build_query(request)

        tasks = []
        if include_biorxiv:
            tasks.append(self.biorxiv_client.search(query))
        if include_europe_pmc:
            tasks.append(self.europe_pmc_client.search(query))

        results_lists = await asyncio.gather(*tasks, return_exceptions=True)

        all_results = []
        for results in results_lists:
            if isinstance(results, list):
                all_results.extend(results)

        # Remove duplicates based on DOI
        seen_dois = set()
        unique_results = []
        for result in all_results:
            if result.doi and result.doi in seen_dois:
                continue
            if result.doi:
                seen_dois.add(result.doi)
            unique_results.append(result)

        # Sort by date (newest first)
        unique_results.sort(key=lambda x: x.date or "0000-00-00", reverse=True)

        # Limit results
        limited_results = unique_results[:SYSTEM_PAGE_SIZE]

        return SearchResponse(
            results=limited_results,
            page_size=len(limited_results),
            current=0,
            count=len(limited_results),
            total_pages=1,
        )

    def _build_query(self, request: PubmedRequest) -> str:
        """Build query string from structured request.

        Note: Preprint servers use plain text search, not PubMed syntax.
        """
        query_parts = []

        if request.keywords:
            query_parts.extend(request.keywords)
        if request.genes:
            query_parts.extend(request.genes)
        if request.diseases:
            query_parts.extend(request.diseases)
        if request.chemicals:
            query_parts.extend(request.chemicals)
        if request.variants:
            query_parts.extend(request.variants)

        return " ".join(query_parts) if query_parts else ""


class BiorxivClient:
    """Client for bioRxiv/medRxiv API.

    IMPORTANT LIMITATION: bioRxiv/medRxiv APIs do not provide a search endpoint.
    This implementation works around this limitation by:
    1. Fetching articles from a date range (last 365 days by default)
    2. Filtering results client-side based on query match in title/abstract

    This approach has limitations but is optimized for performance:
    - Searches up to 1 year of preprints by default (configurable)
    - Uses pagination to avoid fetching all results at once
    - May still miss older preprints beyond the date range

    Consider using Europe PMC for more comprehensive preprint search capabilities,
    as it has proper search functionality without date limitations.
    """

    async def search(  # noqa: C901
        self,
        query: str,
        server: str = "biorxiv",
        days_back: int = BIORXIV_DEFAULT_DAYS_BACK,
    ) -> list[ResultItem]:
        """Search bioRxiv or medRxiv for articles.

        Note: Due to API limitations, this performs client-side filtering on
        recent articles only. See class docstring for details.
        """
        if not query:
            return []

        base_url = (
            BIORXIV_BASE_URL if server == "biorxiv" else MEDRXIV_BASE_URL
        )

        # Optimize by only fetching recent articles (the last days_back days)
        from datetime import timedelta

        today = datetime.now()
        start_date = today - timedelta(days=days_back)
        interval = f"{start_date.year}-{start_date.month:02d}-{start_date.day:02d}/{today.year}-{today.month:02d}-{today.day:02d}"

        # Prepare query terms for better matching
        query_terms = query.lower().split()

        filtered_results = []
        cursor = 0
        max_pages = (
            BIORXIV_MAX_PAGES  # Limit pagination to avoid excessive API calls
        )

        for page in range(max_pages):
            request = BiorxivRequest(
                query=query, interval=interval, cursor=cursor
            )
            url = f"{base_url}/{request.interval}/{request.cursor}"

            response, error = await http_client.request_api(
                url=url,
                method="GET",
                request={},
                response_model_type=BiorxivResponse,
                domain="biorxiv",
                cache_ttl=300,  # Cache for 5 minutes
            )

            if error or not response:
                logger.warning(
                    f"Failed to fetch {server} articles page {page} for query '{query}': {error if error else 'No response'}"
                )
                break

            # Filter results based on query
            page_filtered = 0
            for result in response.collection:
                # Create searchable text from title and abstract
                searchable_text = ""
                if result.title:
                    searchable_text += result.title.lower() + " "
                if result.abstract:
                    searchable_text += result.abstract.lower()

                # Check if all query terms are present (AND logic)
                if all(term in searchable_text for term in query_terms):
                    filtered_results.append(result.to_result_item())
                    page_filtered += 1

                    # Stop if we have enough results
                    if len(filtered_results) >= SYSTEM_PAGE_SIZE:
                        return filtered_results[:SYSTEM_PAGE_SIZE]

            # If this page had no matches and we have some results, stop pagination
            if page_filtered == 0 and filtered_results:
                break

            # Move to next page
            cursor += len(response.collection)

            # Stop if we've processed all available results
            if (
                len(response.collection) < BIORXIV_RESULTS_PER_PAGE
            ):  # bioRxiv typically returns this many per page
                break

        return filtered_results[:SYSTEM_PAGE_SIZE]


class EuropePMCClient:
    """Client for Europe PMC API."""

    async def search(
        self, query: str, max_results: int = SYSTEM_PAGE_SIZE
    ) -> list[ResultItem]:
        """Search Europe PMC for preprints with pagination support."""
        results: list[ResultItem] = []
        cursor_mark = "*"
        page_size = min(
            EUROPE_PMC_PAGE_SIZE, max_results
        )  # Europe PMC optimal page size

        while len(results) < max_results:
            request = EuropePMCRequest(
                query=f"(SRC:PPR) AND ({query})" if query else "SRC:PPR",
                pageSize=page_size,
                cursorMark=cursor_mark,
            )

            params = request.model_dump(exclude_none=True)

            response, error = await http_client.request_api(
                url=EUROPE_PMC_BASE_URL,
                method="GET",
                request=params,
                response_model_type=EuropePMCResponse,
                domain="europepmc",
                cache_ttl=300,  # Cache for 5 minutes
            )

            if error or not response:
                logger.warning(
                    f"Failed to fetch Europe PMC preprints for query '{query}': {error if error else 'No response'}"
                )
                break

            # Add results
            page_results = [
                result.to_result_item() for result in response.results
            ]
            results.extend(page_results)

            # Check if we have more pages
            if (
                not response.nextCursorMark
                or response.nextCursorMark == cursor_mark
            ):
                break

            # Check if we got fewer results than requested (last page)
            if len(page_results) < page_size:
                break

            cursor_mark = response.nextCursorMark

            # Adjust page size for last request if needed
            remaining = max_results - len(results)
            if remaining < page_size:
                page_size = remaining

        return results[:max_results]


async def fetch_europe_pmc_article(
    doi: str,
    output_json: bool = False,
) -> str:
    """Fetch a single article from Europe PMC by DOI."""
    # Europe PMC search API can fetch article details by DOI
    request = EuropePMCRequest(
        query=f'DOI:"{doi}"',
        pageSize=1,
        src="PPR",  # Preprints source
    )

    params = request.model_dump(exclude_none=True)

    response, error = await http_client.request_api(
        url=EUROPE_PMC_BASE_URL,
        method="GET",
        request=params,
        response_model_type=EuropePMCResponse,
        domain="europepmc",
    )

    if error:
        data: list[dict[str, Any]] = [
            {"error": f"Error {error.code}: {error.message}"}
        ]
    elif response and response.results:
        # Convert Europe PMC result to Article format for consistency
        europe_pmc_result = response.results[0]
        article_data = {
            "pmid": None,  # Europe PMC preprints don't have PMIDs
            "pmcid": europe_pmc_result.pmcid,
            "doi": europe_pmc_result.doi,
            "title": europe_pmc_result.title,
            "journal": f"{europe_pmc_result.journalTitle or 'Preprint Server'} (preprint)",
            "date": europe_pmc_result.firstPublicationDate
            or europe_pmc_result.pubYear,
            "authors": [
                author.strip()
                for author in (europe_pmc_result.authorString or "").split(",")
            ],
            "abstract": europe_pmc_result.abstractText,
            "full_text": "",  # Europe PMC API doesn't provide full text for preprints
            "pubmed_url": None,
            "pmc_url": f"https://europepmc.org/article/PPR/{doi}"
            if doi
            else None,
            "source": "Europe PMC",
        }
        data = [article_data]
    else:
        data = [{"error": "Article not found in Europe PMC"}]

    if data and not output_json:
        return render.to_markdown(data)
    else:
        return json.dumps(data, indent=2)


async def search_preprints(
    request: PubmedRequest,
    include_biorxiv: bool = True,
    include_europe_pmc: bool = True,
    output_json: bool = False,
) -> str:
    """Search for preprints across multiple sources."""
    searcher = PreprintSearcher()
    response = await searcher.search(
        request,
        include_biorxiv=include_biorxiv,
        include_europe_pmc=include_europe_pmc,
    )

    if response and response.results:
        data = [
            result.model_dump(mode="json", exclude_none=True)
            for result in response.results
        ]
    else:
        data = []

    if data and not output_json:
        return render.to_markdown(data)
    else:
        return json.dumps(data, indent=2)
```

--------------------------------------------------------------------------------
/src/biomcp/query_parser.py:
--------------------------------------------------------------------------------

```python
"""Query parser for unified search language in BioMCP."""

from dataclasses import dataclass
from enum import Enum
from typing import Any


class Operator(str, Enum):
    """Query operators."""

    EQ = ":"
    GT = ">"
    LT = "<"
    GTE = ">="
    LTE = "<="
    RANGE = ".."
    AND = "AND"
    OR = "OR"
    NOT = "NOT"


class FieldType(str, Enum):
    """Field data types."""

    STRING = "string"
    NUMBER = "number"
    DATE = "date"
    ENUM = "enum"
    BOOLEAN = "boolean"


@dataclass
class FieldDefinition:
    """Definition of a searchable field."""

    name: str
    domain: str  # "trials", "articles", "variants", "cross"
    type: FieldType
    operators: list[str]
    example_values: list[str]
    description: str
    underlying_api_field: str
    aliases: list[str] | None = None


@dataclass
class QueryTerm:
    """Parsed query term."""

    field: str
    operator: Operator
    value: Any
    domain: str | None = None
    is_negated: bool = False


@dataclass
class ParsedQuery:
    """Parsed query structure."""

    terms: list[QueryTerm]
    cross_domain_fields: dict[str, Any]
    domain_specific_fields: dict[str, dict[str, Any]]
    raw_query: str


class QueryParser:
    """Parser for unified search queries."""

    def __init__(self):
        self.field_registry = self._build_field_registry()

    def _build_field_registry(self) -> dict[str, FieldDefinition]:
        """Build the field registry with all searchable fields."""
        registry = {}

        # Cross-domain fields
        cross_domain_fields = [
            FieldDefinition(
                name="gene",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53", "EGFR"],
                description="Gene symbol",
                underlying_api_field="gene",
            ),
            FieldDefinition(
                name="variant",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["V600E", "L858R", "rs113488022"],
                description="Variant notation or rsID",
                underlying_api_field="variant",
            ),
            FieldDefinition(
                name="disease",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "lung cancer", "diabetes"],
                description="Disease or condition",
                underlying_api_field="disease",
            ),
        ]

        # Trial-specific fields
        trial_fields = [
            FieldDefinition(
                name="trials.condition",
                domain="trials",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "lung cancer"],
                description="Clinical trial condition",
                underlying_api_field="conditions",
            ),
            FieldDefinition(
                name="trials.intervention",
                domain="trials",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["osimertinib", "pembrolizumab"],
                description="Trial intervention",
                underlying_api_field="interventions",
            ),
            FieldDefinition(
                name="trials.phase",
                domain="trials",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["1", "2", "3", "4"],
                description="Trial phase",
                underlying_api_field="phase",
            ),
            FieldDefinition(
                name="trials.status",
                domain="trials",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["recruiting", "active", "completed"],
                description="Trial recruitment status",
                underlying_api_field="recruiting_status",
            ),
        ]

        # Article-specific fields
        article_fields = [
            FieldDefinition(
                name="articles.title",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["EGFR mutations", "cancer therapy"],
                description="Article title",
                underlying_api_field="title",
            ),
            FieldDefinition(
                name="articles.author",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Smith J", "Johnson A"],
                description="Article author",
                underlying_api_field="author",
            ),
            FieldDefinition(
                name="articles.journal",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Nature", "Science", "Cell"],
                description="Journal name",
                underlying_api_field="journal",
            ),
            FieldDefinition(
                name="articles.date",
                domain="articles",
                type=FieldType.DATE,
                operators=[Operator.GT, Operator.LT, Operator.RANGE],
                example_values=[">2023-01-01", "2023-01-01..2024-01-01"],
                description="Publication date",
                underlying_api_field="date",
            ),
        ]

        # Variant-specific fields
        variant_fields = [
            FieldDefinition(
                name="variants.rsid",
                domain="variants",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["rs113488022", "rs121913529"],
                description="dbSNP rsID",
                underlying_api_field="rsid",
            ),
            FieldDefinition(
                name="variants.gene",
                domain="variants",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53"],
                description="Gene containing variant",
                underlying_api_field="gene",
            ),
            FieldDefinition(
                name="variants.significance",
                domain="variants",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["pathogenic", "benign", "uncertain"],
                description="Clinical significance",
                underlying_api_field="significance",
            ),
            FieldDefinition(
                name="variants.frequency",
                domain="variants",
                type=FieldType.NUMBER,
                operators=[Operator.LT, Operator.GT],
                example_values=["<0.01", ">0.05"],
                description="Population allele frequency",
                underlying_api_field="frequency",
            ),
        ]

        # Gene-specific fields
        gene_fields = [
            FieldDefinition(
                name="genes.symbol",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53", "EGFR"],
                description="Gene symbol",
                underlying_api_field="symbol",
            ),
            FieldDefinition(
                name="genes.name",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=[
                    "tumor protein p53",
                    "epidermal growth factor receptor",
                ],
                description="Gene name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="genes.type",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["protein-coding", "pseudo", "ncRNA"],
                description="Gene type",
                underlying_api_field="type_of_gene",
            ),
        ]

        # Drug-specific fields
        drug_fields = [
            FieldDefinition(
                name="drugs.name",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["imatinib", "aspirin", "metformin"],
                description="Drug name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="drugs.tradename",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Gleevec", "Tylenol", "Lipitor"],
                description="Drug trade name",
                underlying_api_field="tradename",
            ),
            FieldDefinition(
                name="drugs.indication",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["leukemia", "hypertension", "diabetes"],
                description="Drug indication",
                underlying_api_field="indication",
            ),
        ]

        # Disease-specific fields
        disease_fields = [
            FieldDefinition(
                name="diseases.name",
                domain="diseases",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "breast cancer", "diabetes"],
                description="Disease name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="diseases.mondo",
                domain="diseases",
type=FieldType.STRING, 308 | operators=[Operator.EQ], 309 | example_values=["MONDO:0005105", "MONDO:0007254"], 310 | description="MONDO disease ID", 311 | underlying_api_field="mondo_id", 312 | ), 313 | FieldDefinition( 314 | name="diseases.synonym", 315 | domain="diseases", 316 | type=FieldType.STRING, 317 | operators=[Operator.EQ], 318 | example_values=["cancer", "tumor", "neoplasm"], 319 | description="Disease synonym", 320 | underlying_api_field="synonyms", 321 | ), 322 | ] 323 | 324 | # Build registry 325 | for field_list in [ 326 | cross_domain_fields, 327 | trial_fields, 328 | article_fields, 329 | variant_fields, 330 | gene_fields, 331 | drug_fields, 332 | disease_fields, 333 | ]: 334 | for field in field_list: 335 | registry[field.name] = field 336 | 337 | return registry 338 | 339 | def parse(self, query: str) -> ParsedQuery: 340 | """Parse a unified search query.""" 341 | # Simple tokenization - in production, use a proper parser 342 | terms = self._tokenize(query) 343 | parsed_terms = [] 344 | 345 | cross_domain = {} 346 | domain_specific: dict[str, dict[str, Any]] = { 347 | "trials": {}, 348 | "articles": {}, 349 | "variants": {}, 350 | "genes": {}, 351 | "drugs": {}, 352 | "diseases": {}, 353 | } 354 | 355 | for term in terms: 356 | if ":" in term: 357 | field, value = term.split(":", 1) 358 | 359 | # Check if it's a known field 360 | if field in self.field_registry: 361 | field_def = self.field_registry[field] 362 | parsed_term = QueryTerm( 363 | field=field, 364 | operator=Operator.EQ, 365 | value=value.strip('"'), 366 | domain=field_def.domain, 367 | ) 368 | parsed_terms.append(parsed_term) 369 | 370 | # Categorize the term 371 | if field_def.domain == "cross": 372 | cross_domain[field] = value.strip('"') 373 | else: 374 | domain = ( 375 | field.split(".")[0] 376 | if "." in field 377 | else field_def.domain 378 | ) 379 | if domain not in domain_specific: 380 | domain_specific[domain] = {} 381 | field_name = ( 382 | field.split(".")[-1] if "." 
in field else field 383 | ) 384 | domain_specific[domain][field_name] = value.strip('"') 385 | 386 | return ParsedQuery( 387 | terms=parsed_terms, 388 | cross_domain_fields=cross_domain, 389 | domain_specific_fields=domain_specific, 390 | raw_query=query, 391 | ) 392 | 393 | def _tokenize(self, query: str) -> list[str]: 394 | """Simple tokenizer for query strings.""" 395 | # This is a simplified tokenizer - in production, use a proper lexer 396 | # For now, split on AND/OR/NOT while preserving field:value pairs 397 | tokens = [] 398 | current_token = "" 399 | in_quotes = False 400 | 401 | for char in query: 402 | if char == '"': 403 | in_quotes = not in_quotes 404 | current_token += char 405 | elif char == " " and not in_quotes: 406 | if current_token: 407 | tokens.append(current_token) 408 | current_token = "" 409 | else: 410 | current_token += char 411 | 412 | if current_token: 413 | tokens.append(current_token) 414 | 415 | # Filter out boolean operators for now 416 | return [t for t in tokens if t not in ["AND", "OR", "NOT"]] 417 | 418 | def get_schema(self) -> dict[str, Any]: 419 | """Get the complete field schema for discovery.""" 420 | schema: dict[str, Any] = { 421 | "domains": [ 422 | "trials", 423 | "articles", 424 | "variants", 425 | "genes", 426 | "drugs", 427 | "diseases", 428 | ], 429 | "cross_domain_fields": {}, 430 | "domain_fields": { 431 | "trials": {}, 432 | "articles": {}, 433 | "variants": {}, 434 | "genes": {}, 435 | "drugs": {}, 436 | "diseases": {}, 437 | }, 438 | "operators": [op.value for op in Operator], 439 | "examples": [ 440 | "gene:BRAF AND trials.condition:melanoma", 441 | "articles.date:>2023 AND disease:cancer", 442 | "variants.significance:pathogenic AND gene:TP53", 443 | "genes.symbol:BRAF AND genes.type:protein-coding", 444 | "drugs.tradename:gleevec", 445 | "diseases.name:melanoma", 446 | ], 447 | } 448 | 449 | for field_name, field_def in self.field_registry.items(): 450 | field_info = { 451 | "type": field_def.type.value, 452 
| "operators": field_def.operators, 453 | "examples": field_def.example_values, 454 | "description": field_def.description, 455 | } 456 | 457 | if field_def.domain == "cross": 458 | schema["cross_domain_fields"][field_name] = field_info 459 | else: 460 | domain = field_name.split(".")[0] 461 | field_short_name = field_name.split(".")[-1] 462 | schema["domain_fields"][domain][field_short_name] = field_info 463 | 464 | return schema 465 | ``` -------------------------------------------------------------------------------- /src/biomcp/resources/instructions.md: -------------------------------------------------------------------------------- ```markdown 1 | # BioMCP Instructions for the Biomedical Assistant 2 | 3 | Welcome to **BioMCP** – your unified interface to access key biomedical data 4 | sources. This document serves as an internal instruction set for the biomedical 5 | assistant (LLM) to ensure a clear, well-reasoned, and accurate response to user 6 | queries. 7 | 8 | --- 9 | 10 | ## CRITICAL: Always Use the 'think' Tool FIRST 11 | 12 | **The 'think' tool is MANDATORY and must be your FIRST action when using BioMCP.** 13 | 14 | 🚨 **REQUIRED USAGE:** 15 | 16 | - You MUST call 'think' BEFORE any search or fetch operations 17 | - EVERY biomedical research query requires thinking first 18 | - ALL multi-step analyses must begin with the think tool 19 | - ANY task using BioMCP tools requires prior planning with think 20 | 21 | ⚠️ **WARNING:** Skipping the 'think' tool will result in: 22 | 23 | - Incomplete analysis 24 | - Poor search strategies 25 | - Missing critical connections 26 | - Suboptimal results 27 | 28 | Start EVERY BioMCP interaction with the 'think' tool. Use it throughout your analysis to track progress. Only set nextThoughtNeeded=false when your analysis is complete. 29 | 30 | --- 31 | 32 | ## 1. Purpose of BioMCP 33 | 34 | BioMCP (Biomedical Model Context Protocol) standardizes access to multiple 35 | biomedical data sources. 
It transforms complex, filter-intensive queries into
natural language interactions. The assistant should leverage this capability
to:

- Integrate clinical trial data, literature, variant annotations, and
  comprehensive biomedical information from multiple resources.
- Synthesize the results into a coherent, accurate, and concise answer.
- Enhance user trust by providing key snippets and citations (with clickable
  URLs) from the original materials, unless the user opts to omit them.

---

## 2. Available Data Sources

BioMCP provides access to the following biomedical databases:

### Literature & Clinical Sources

- **PubMed/PubTator3**: Peer-reviewed biomedical literature with entity annotations
- **bioRxiv/medRxiv**: Preprint servers (included by default in article searches)
- **Europe PMC**: Additional literature including preprints
- **ClinicalTrials.gov**: Clinical trial registry with comprehensive trial data

### BioThings Suite APIs

- **MyVariant.info**: Genetic variant annotations and population frequencies
- **MyGene.info**: Real-time gene information, aliases, and summaries
- **MyDisease.info**: Disease ontology, definitions, and synonym expansion
- **MyChem.info**: Drug/chemical properties, mechanisms, and identifiers

### Cancer & Genomic Resources

- **cBioPortal**: Cancer genomics data (automatically integrated with gene searches)
- **TCGA/GDC**: The Cancer Genome Atlas data for variants
- **1000 Genomes**: Population frequency data via Ensembl

---

## 3. Internal Workflow for Query Handling

When a user query is received (for example, "Please investigate ALK
rearrangements in advanced NSCLC..."), the assistant should follow these steps:

### A. ALWAYS Start with the 'think' Tool

- **Use 'think' immediately:** For ANY biomedical research query, you MUST begin by invoking the 'think' tool to break down the problem systematically.
- **Initial thought should:** Parse the user's natural language query and extract relevant details such as gene variants (e.g., ALK rearrangements), disease type (advanced NSCLC), and treatment focus (combinations of ALK inhibitors with immunotherapy).
- **Continue thinking:** Use additional 'think' calls to plan your approach, identify data sources needed, and track your analysis progress.

### B. Plan and Explain the Tool Sequence (via the 'think' Tool)

- **Use 'think' to plan:** Continue using the 'think' tool to outline your reasoning and planned tool sequence:
  - **Step 1:** Use gene_getter to understand ALK gene function and context.
  - **Step 2:** Use disease_getter to get comprehensive information about NSCLC,
    including synonyms for better search coverage.
  - **Step 3:** Use ClinicalTrials.gov to retrieve clinical trial data
    related to the query (disease synonyms are automatically expanded).
  - **Step 4:** Use PubMed (via PubTator3) to fetch relevant literature
    discussing outcomes or synergy. Note: Preprints from bioRxiv/medRxiv
    are included by default, and cBioPortal cancer genomics data is
    automatically integrated for gene-based searches.
  - **Step 5:** Query MyVariant.info for variant annotations (noting
    limitations for gene fusions if applicable).
  - **Step 6:** If specific drugs are mentioned, use drug_getter for
    mechanism of action and properties.
- **Transparency:** Clearly indicate which tool is being called for which part
  of the query.

#### Search Syntax Enhancement: OR Logic for Keywords

When searching articles, the keywords parameter now supports OR logic using the pipe (|) separator:

**Syntax**: `keyword1|keyword2|keyword3`

**Examples**:

- `"R173|Arg173|p.R173"`: Finds articles mentioning any of these variant notations
- `"V600E|p.V600E|c.1799T>A"`: Handles different mutation nomenclatures
- `"immunotherapy|checkpoint inhibitor|PD-1"`: Searches for related treatment terms
- `"NSCLC|non-small cell lung cancer"`: Covers abbreviations and full terms

**Important Notes**:

- OR logic only applies within a single keyword parameter
- Multiple keywords are still combined with AND logic
- Example: keywords=["BRAF|B-RAF", "therapy|treatment"] means:
  - (BRAF OR B-RAF) AND (therapy OR treatment)

This feature is particularly useful for:

- Handling different nomenclatures for the same concept
- Searching for synonyms or related terms
- Dealing with abbreviations and full names
- Finding articles that use different notations for variants

### C. Execute and Synthesize Results

- **Combine Data:** After retrieving results from each tool, synthesize the
  information into a final answer.
- **Include Citations with URLs:** Always include clickable URLs from the
  original sources in your citations. Extract URLs (Pubmed_Url, Doi_Url,
  Study_Url, etc.) from function results and incorporate these into your
  response when referencing specific findings or papers.
- **Follow-up Opportunity:** If the response leaves any ambiguity or if
  additional information might be helpful, prompt the user for follow-up
  questions.

---

## 4. Best Practices for the Biomedical Assistant

- **Understanding the Query:** Focus on accurately interpreting the user's
  query, rather than instructing the user on query formulation.
- **Reasoning Transparency:** Briefly explain your thought process and the
  sequence of tool calls before presenting the final answer.
- **Conciseness and Clarity:** Ensure your final response is succinct and
  well-organized, using bullet points or sections as needed.
- **Citation Inclusion Mandatory:** Provide key snippets and links to the
  original materials (e.g., clinical trial records, PubMed articles, ClinVar
  entries, COSMIC database) to support the answer. ALWAYS include clickable
  URLs to these resources when referencing specific findings or data.
- **Clarifying Questions Before Starting:** If anything is unclear in the
  user's query or if more details would improve the answer, politely request
  additional clarification.
- **Audience Awareness:** Structure your response with both depth for
  specialists and clarity for general audiences. Begin with accessible
  explanations before delving into scientific details.
- **Organization and Clarity:** Ensure your final response is well-structured,
  accessible, and easy to navigate by:
  - Using descriptive section headings and subheadings to organize
    information logically
  - Employing consistent formatting with bulleted or numbered lists to break
    down complex information
  - Starting each major section with a plain-language summary before
    exploring technical details
  - Creating clear visual separation between different topics
  - Using concise sentence structures while maintaining informational depth
  - Explicitly differentiating between established practices and experimental
    approaches
  - Including brief transition sentences between major sections
  - Presenting clinical trial data in consistent formats
  - Using strategic white space to improve readability
  - Summarizing key takeaways at the end of major sections when appropriate

---

## 5. Visual Organization and Formatting

- **Comparison Tables:** When comparing two or more entities (like mutation
  classes, treatment approaches, or clinical trials), create a comparison table
  to highlight key differences at a glance. Tables should have clear headers,
  consistent formatting, and focus on the most important distinguishing
  features.
- **Format Optimization:** Utilize formatting elements strategically: tables
  for comparisons, bullet points for lists, headings for section organization,
  and whitespace for readability.
- **Visual Hierarchy:** For complex biomedical topics, create a visual
  hierarchy that helps readers quickly identify key information.
- **Balance Between Comprehensiveness and Clarity:** While providing
  comprehensive information, prioritize clarity and accessibility. Organize
  content from most important/general to more specialized details.
- **Section Summaries:** Conclude sections with key takeaways that highlight
  the practical implications of the scientific information.

---

## 6. Example Scenarios

### Example 1: ALK Rearrangements in Advanced NSCLC

For a query such as:

```
Please investigate ALK rearrangements in advanced NSCLC, particularly any
clinical trials exploring combinations of ALK inhibitors and immunotherapy.
```

The assistant should:

1. **Start with the 'think' Tool:**
   - Invoke 'think' with thoughtNumber=1 to understand the query focus on ALK rearrangements in advanced NSCLC with combination treatments
   - Use thoughtNumber=2 to plan the research approach and identify needed data sources
2. **Execute Tool Calls (tracking with 'think'):**
   - **First:** Use gene_getter("ALK") to understand the gene's function and role in cancer (document findings in thoughtNumber=3)
   - **Second:** Use disease_getter("NSCLC") to get disease information and synonyms like "non-small cell lung cancer" (document in thoughtNumber=4)
   - **Third:** Query ClinicalTrials.gov for ALK+ NSCLC trials that combine ALK inhibitors with immunotherapy (document findings in thoughtNumber=5)
   - **Fourth:** Query PubMed to retrieve key articles discussing treatment outcomes or synergy (document in thoughtNumber=6)
   - **Fifth:** Check MyVariant.info for any annotations on ALK fusions or rearrangements (document in thoughtNumber=7)
   - **Sixth:** If specific ALK inhibitors are mentioned, use drug_getter to understand their mechanisms (document in thoughtNumber=8)
3. **Synthesize and Report (via 'think'):** Use final thoughts to synthesize findings before producing the answer that includes:
   - A concise summary of clinical trials with comparison tables like:

     | **Trial** | **Combination** | **Patient Population** | **Results** | **Safety Profile** | **Reference** |
     | ---------------- | ---------------------- | ------------------------------ | ----------- | ----------------------------------------------- | ---------------------------------------------------------------- |
     | CheckMate 370 | Crizotinib + Nivolumab | 13 treatment-naive ALK+ NSCLC | 38% ORR | 5/13 with grade ≥3 hepatic toxicities; 2 deaths | [Schenk et al., 2023](https://pubmed.ncbi.nlm.nih.gov/36895933/) |
     | JAVELIN Lung 101 | Avelumab + Lorlatinib | 28 previously treated patients | 46.4% ORR | No DLTs; milder toxicity | [NCT02584634](https://clinicaltrials.gov/study/NCT02584634) |

   - Key literature findings with proper citations:
     "A review by Schenk concluded that combining ALK inhibitors with checkpoint inhibitors resulted in 'significant toxicities without clear improvement in patient outcomes' [https://pubmed.ncbi.nlm.nih.gov/36895933/](https://pubmed.ncbi.nlm.nih.gov/36895933/)."

   - Tables comparing response rates:

     | **Study** | **Patient Population** | **Immunotherapy Agent** | **Response Rate** | **Reference** |
     | --------------------- | ---------------------- | ----------------------------- | ----------------- | ------------------------------------------------------------- |
     | ATLANTIC Trial | 11 ALK+ NSCLC | Durvalumab | 0% | [Link to study](https://pubmed.ncbi.nlm.nih.gov/36895933/) |
     | IMMUNOTARGET Registry | 19 ALK+ NSCLC | Various PD-1/PD-L1 inhibitors | 0% | [Link to registry](https://pubmed.ncbi.nlm.nih.gov/36895933/) |

   - Variant information with proper attribution.

4. **Offer Follow-up:** Conclude by asking if further details are needed or if
   any part of the answer should be clarified.

### Example 2: BRAF Mutation Classes in Cancer Therapeutics

For a query such as:

```
Please investigate the differences in BRAF Class I (e.g., V600E) and Class III
(e.g., D594G) mutations that lead to different therapeutic strategies in cancers
like melanoma or colorectal carcinoma.
```

The assistant should:

1. **Understand and Clarify:** Identify that the query focuses on comparing two
   specific BRAF mutation classes (Class I/V600E vs. Class III/D594G) and their
   therapeutic implications in melanoma and colorectal cancer.

2. **Plan Tool Calls:**

   - **First:** Search PubMed literature to understand the molecular
     differences between BRAF Class I and Class III mutations.
   - **Second:** Explore specific variant details using the variant search
     tool to understand the characteristics of these mutations.
   - **Third:** Look for clinical trials involving these mutation types to
     identify therapeutic strategies.

3. **Synthesize and Report:** Create a comprehensive comparison that includes:
   - Comparison tables highlighting key differences between mutation classes:

     | Feature | Class I (e.g., V600E) | Class III (e.g., D594G) |
     | ---------------------------- | ------------------------------ | ------------------------------------------ |
     | **Signaling Mechanism** | Constitutively active monomers | Kinase-impaired heterodimers |
     | **RAS Dependency** | RAS-independent | RAS-dependent |
     | **Dimerization Requirement** | Function as monomers | Require heterodimerization with CRAF |
     | **Therapeutic Response** | Responsive to BRAF inhibitors | Paradoxically activated by BRAF inhibitors |

   - Specific therapeutic strategies with clickable citation links:
     - For Class I: BRAF inhibitors as demonstrated
       in [Davies et al.](https://pubmed.ncbi.nlm.nih.gov/35869122/)
     - For Class III: Alternative approaches such as MEK inhibitors shown
       in [Śmiech et al.](https://pubmed.ncbi.nlm.nih.gov/33198372/)

   - Cancer-specific implications with relevant clinical evidence:
     - Melanoma treatment differences including clinical trial data
       from [NCT05767879](https://clinicaltrials.gov/study/NCT05767879)
     - Colorectal cancer approaches citing research
       from [Liu et al.](https://pubmed.ncbi.nlm.nih.gov/37760573/)

4. **Offer Follow-up:** Conclude by asking if the user would like more detailed
   information on specific aspects, such as resistance mechanisms, emerging
   therapies, or mutation detection methods.
```
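The `_tokenize` and `parse` methods of the query parser above split a query on unquoted spaces, drop bare `AND`/`OR`/`NOT` tokens, ignore words without a colon, and route each known `field:value` pair either to the cross-domain bucket or to its domain bucket. The standalone sketch below mirrors that flow so it can be run in isolation. It is a simplification under stated assumptions: `FIELD_DOMAINS` is a hypothetical stand-in for the real `FieldDefinition` registry (it assumes `gene` is registered as a cross-domain field, as the `gene:BRAF` schema examples suggest), it skips building `QueryTerm`/`ParsedQuery` objects, and it only creates domain buckets that actually receive a value.

```python
from typing import Any

# Hypothetical stand-in for the FieldDefinition registry: field -> domain.
# "cross" marks a cross-domain field, as in the parser above.
FIELD_DOMAINS = {
    "gene": "cross",
    "trials.condition": "trials",
    "articles.journal": "articles",
    "articles.date": "articles",
}

BOOLEAN_WORDS = {"AND", "OR", "NOT"}


def tokenize(query: str) -> list[str]:
    """Split on spaces outside double quotes, then drop AND/OR/NOT."""
    tokens: list[str] = []
    current = ""
    in_quotes = False
    for char in query:
        if char == '"':
            in_quotes = not in_quotes
            current += char  # quotes are kept here and stripped later
        elif char == " " and not in_quotes:
            if current:
                tokens.append(current)
            current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return [t for t in tokens if t not in BOOLEAN_WORDS]


def categorize(
    query: str,
) -> tuple[dict[str, str], dict[str, dict[str, Any]]]:
    """Route known field:value tokens to cross-domain or per-domain dicts."""
    cross: dict[str, str] = {}
    per_domain: dict[str, dict[str, Any]] = {}
    for term in tokenize(query):
        if ":" not in term:
            continue  # bare words are ignored, as in parse()
        field, value = term.split(":", 1)
        domain = FIELD_DOMAINS.get(field)
        if domain is None:
            continue  # unknown fields are skipped, as in parse()
        value = value.strip('"')
        if domain == "cross":
            cross[field] = value
        else:
            short_name = field.split(".")[-1]
            per_domain.setdefault(domain, {})[short_name] = value
    return cross, per_domain


cross, per_domain = categorize('gene:BRAF AND trials.condition:"lung cancer"')
print(cross)  # {'gene': 'BRAF'}
print(per_domain)  # {'trials': {'condition': 'lung cancer'}}
```

Because `parse` assigns `Operator.EQ` to every term, a comparison such as `articles.date:>2023` ends up stored as the literal string `>2023`; the sketch behaves the same way, so any range handling would have to happen downstream.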
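The "Search Syntax Enhancement" section of `instructions.md` states that `|` inside a single keyword means OR, while separate keywords are combined with AND. The sketch below renders a keyword list into that boolean form so the precedence is explicit. `build_keyword_query` is a hypothetical helper written for illustration; it is not the function BioMCP uses to forward keywords to PubTator3.

```python
def build_keyword_query(keywords: list[str]) -> str:
    """Render keywords as a boolean string.

    Hypothetical helper: "|" inside one keyword means OR, and separate
    keywords are joined with AND.
    """
    groups: list[str] = []
    for keyword in keywords:
        # Split on the pipe and drop empty alternatives such as "a||b".
        alternatives = [
            alt.strip() for alt in keyword.split("|") if alt.strip()
        ]
        if not alternatives:
            continue
        if len(alternatives) == 1:
            groups.append(alternatives[0])
        else:
            # Parenthesize so the OR group binds tighter than the AND.
            groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


print(build_keyword_query(["BRAF|B-RAF", "therapy|treatment"]))
# (BRAF OR B-RAF) AND (therapy OR treatment)
```

Parenthesizing each OR group keeps the grouping unambiguous regardless of how a downstream search engine orders AND and OR.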