genomoncology/biomcp # codebase.md

This is page 12 of 19. Use http://codebase.md/genomoncology/biomcp?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── actions
│   │   └── setup-python-env
│   │       └── action.yml
│   ├── dependabot.yml
│   └── workflows
│       ├── ci.yml
│       ├── deploy-docs.yml
│       ├── main.yml.disabled
│       ├── on-release-main.yml
│       └── validate-codecov-config.yml
├── .gitignore
├── .pre-commit-config.yaml
├── BIOMCP_DATA_FLOW.md
├── CHANGELOG.md
├── CNAME
├── codecov.yaml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── apis
│   │   ├── error-codes.md
│   │   ├── overview.md
│   │   └── python-sdk.md
│   ├── assets
│   │   ├── biomcp-cursor-locations.png
│   │   ├── favicon.ico
│   │   ├── icon.png
│   │   ├── logo.png
│   │   ├── mcp_architecture.txt
│   │   └── remote-connection
│   │       ├── 00_connectors.png
│   │       ├── 01_add_custom_connector.png
│   │       ├── 02_connector_enabled.png
│   │       ├── 03_connect_to_biomcp.png
│   │       ├── 04_select_google_oauth.png
│   │       └── 05_success_connect.png
│   ├── backend-services-reference
│   │   ├── 01-overview.md
│   │   ├── 02-biothings-suite.md
│   │   ├── 03-cbioportal.md
│   │   ├── 04-clinicaltrials-gov.md
│   │   ├── 05-nci-cts-api.md
│   │   ├── 06-pubtator3.md
│   │   └── 07-alphagenome.md
│   ├── blog
│   │   ├── ai-assisted-clinical-trial-search-analysis.md
│   │   ├── images
│   │   │   ├── deep-researcher-video.png
│   │   │   ├── researcher-announce.png
│   │   │   ├── researcher-drop-down.png
│   │   │   ├── researcher-prompt.png
│   │   │   ├── trial-search-assistant.png
│   │   │   └── what_is_biomcp_thumbnail.png
│   │   └── researcher-persona-resource.md
│   ├── changelog.md
│   ├── CNAME
│   ├── concepts
│   │   ├── 01-what-is-biomcp.md
│   │   ├── 02-the-deep-researcher-persona.md
│   │   └── 03-sequential-thinking-with-the-think-tool.md
│   ├── developer-guides
│   │   ├── 01-server-deployment.md
│   │   ├── 02-contributing-and-testing.md
│   │   ├── 03-third-party-endpoints.md
│   │   ├── 04-transport-protocol.md
│   │   ├── 05-error-handling.md
│   │   ├── 06-http-client-and-caching.md
│   │   ├── 07-performance-optimizations.md
│   │   └── generate_endpoints.py
│   ├── faq-condensed.md
│   ├── FDA_SECURITY.md
│   ├── genomoncology.md
│   ├── getting-started
│   │   ├── 01-quickstart-cli.md
│   │   ├── 02-claude-desktop-integration.md
│   │   └── 03-authentication-and-api-keys.md
│   ├── how-to-guides
│   │   ├── 01-find-articles-and-cbioportal-data.md
│   │   ├── 02-find-trials-with-nci-and-biothings.md
│   │   ├── 03-get-comprehensive-variant-annotations.md
│   │   ├── 04-predict-variant-effects-with-alphagenome.md
│   │   ├── 05-logging-and-monitoring-with-bigquery.md
│   │   └── 06-search-nci-organizations-and-interventions.md
│   ├── index.md
│   ├── policies.md
│   ├── reference
│   │   ├── architecture-diagrams.md
│   │   ├── quick-architecture.md
│   │   ├── quick-reference.md
│   │   └── visual-architecture.md
│   ├── robots.txt
│   ├── stylesheets
│   │   ├── announcement.css
│   │   └── extra.css
│   ├── troubleshooting.md
│   ├── tutorials
│   │   ├── biothings-prompts.md
│   │   ├── claude-code-biomcp-alphagenome.md
│   │   ├── nci-prompts.md
│   │   ├── openfda-integration.md
│   │   ├── openfda-prompts.md
│   │   ├── pydantic-ai-integration.md
│   │   └── remote-connection.md
│   ├── user-guides
│   │   ├── 01-command-line-interface.md
│   │   ├── 02-mcp-tools-reference.md
│   │   └── 03-integrating-with-ides-and-clients.md
│   └── workflows
│       └── all-workflows.md
├── example_scripts
│   ├── mcp_integration.py
│   └── python_sdk.py
├── glama.json
├── LICENSE
├── lzyank.toml
├── Makefile
├── mkdocs.yml
├── package-lock.json
├── package.json
├── pyproject.toml
├── README.md
├── scripts
│   ├── check_docs_in_mkdocs.py
│   ├── check_http_imports.py
│   └── generate_endpoints_doc.py
├── smithery.yaml
├── src
│   └── biomcp
│       ├── __init__.py
│       ├── __main__.py
│       ├── articles
│       │   ├── __init__.py
│       │   ├── autocomplete.py
│       │   ├── fetch.py
│       │   ├── preprints.py
│       │   ├── search_optimized.py
│       │   ├── search.py
│       │   └── unified.py
│       ├── biomarkers
│       │   ├── __init__.py
│       │   └── search.py
│       ├── cbioportal_helper.py
│       ├── circuit_breaker.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── articles.py
│       │   ├── biomarkers.py
│       │   ├── diseases.py
│       │   ├── health.py
│       │   ├── interventions.py
│       │   ├── main.py
│       │   ├── openfda.py
│       │   ├── organizations.py
│       │   ├── server.py
│       │   ├── trials.py
│       │   └── variants.py
│       ├── connection_pool.py
│       ├── constants.py
│       ├── core.py
│       ├── diseases
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── domain_handlers.py
│       ├── drugs
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── exceptions.py
│       ├── genes
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── http_client_simple.py
│       ├── http_client.py
│       ├── individual_tools.py
│       ├── integrations
│       │   ├── __init__.py
│       │   ├── biothings_client.py
│       │   └── cts_api.py
│       ├── interventions
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── logging_filter.py
│       ├── metrics_handler.py
│       ├── metrics.py
│       ├── openfda
│       │   ├── __init__.py
│       │   ├── adverse_events_helpers.py
│       │   ├── adverse_events.py
│       │   ├── cache.py
│       │   ├── constants.py
│       │   ├── device_events_helpers.py
│       │   ├── device_events.py
│       │   ├── drug_approvals.py
│       │   ├── drug_labels_helpers.py
│       │   ├── drug_labels.py
│       │   ├── drug_recalls_helpers.py
│       │   ├── drug_recalls.py
│       │   ├── drug_shortages_detail_helpers.py
│       │   ├── drug_shortages_helpers.py
│       │   ├── drug_shortages.py
│       │   ├── exceptions.py
│       │   ├── input_validation.py
│       │   ├── rate_limiter.py
│       │   ├── utils.py
│       │   └── validation.py
│       ├── organizations
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── parameter_parser.py
│       ├── prefetch.py
│       ├── query_parser.py
│       ├── query_router.py
│       ├── rate_limiter.py
│       ├── render.py
│       ├── request_batcher.py
│       ├── resources
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── instructions.md
│       │   └── researcher.md
│       ├── retry.py
│       ├── router_handlers.py
│       ├── router.py
│       ├── shared_context.py
│       ├── thinking
│       │   ├── __init__.py
│       │   ├── sequential.py
│       │   └── session.py
│       ├── thinking_tool.py
│       ├── thinking_tracker.py
│       ├── trials
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── nci_getter.py
│       │   ├── nci_search.py
│       │   └── search.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── cancer_types_api.py
│       │   ├── cbio_http_adapter.py
│       │   ├── endpoint_registry.py
│       │   ├── gene_validator.py
│       │   ├── metrics.py
│       │   ├── mutation_filter.py
│       │   ├── query_utils.py
│       │   ├── rate_limiter.py
│       │   └── request_cache.py
│       ├── variants
│       │   ├── __init__.py
│       │   ├── alphagenome.py
│       │   ├── cancer_types.py
│       │   ├── cbio_external_client.py
│       │   ├── cbioportal_mutations.py
│       │   ├── cbioportal_search_helpers.py
│       │   ├── cbioportal_search.py
│       │   ├── constants.py
│       │   ├── external.py
│       │   ├── filters.py
│       │   ├── getter.py
│       │   ├── links.py
│       │   └── search.py
│       └── workers
│           ├── __init__.py
│           ├── worker_entry_stytch.js
│           ├── worker_entry.js
│           └── worker.py
├── tests
│   ├── bdd
│   │   ├── cli_help
│   │   │   ├── help.feature
│   │   │   └── test_help.py
│   │   ├── conftest.py
│   │   ├── features
│   │   │   └── alphagenome_integration.feature
│   │   ├── fetch_articles
│   │   │   ├── fetch.feature
│   │   │   └── test_fetch.py
│   │   ├── get_trials
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── get_variants
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── search_articles
│   │   │   ├── autocomplete.feature
│   │   │   ├── search.feature
│   │   │   ├── test_autocomplete.py
│   │   │   └── test_search.py
│   │   ├── search_trials
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   ├── search_variants
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   └── steps
│   │       └── test_alphagenome_steps.py
│   ├── config
│   │   └── test_smithery_config.py
│   ├── conftest.py
│   ├── data
│   │   ├── ct_gov
│   │   │   ├── clinical_trials_api_v2.yaml
│   │   │   ├── trials_NCT04280705.json
│   │   │   └── trials_NCT04280705.txt
│   │   ├── myvariant
│   │   │   ├── myvariant_api.yaml
│   │   │   ├── myvariant_field_descriptions.csv
│   │   │   ├── variants_full_braf_v600e.json
│   │   │   ├── variants_full_braf_v600e.txt
│   │   │   └── variants_part_braf_v600_multiple.json
│   │   ├── openfda
│   │   │   ├── drugsfda_detail.json
│   │   │   ├── drugsfda_search.json
│   │   │   ├── enforcement_detail.json
│   │   │   └── enforcement_search.json
│   │   └── pubtator
│   │       ├── pubtator_autocomplete.json
│   │       └── pubtator3_paper.txt
│   ├── integration
│   │   ├── test_openfda_integration.py
│   │   ├── test_preprints_integration.py
│   │   ├── test_simple.py
│   │   └── test_variants_integration.py
│   ├── tdd
│   │   ├── articles
│   │   │   ├── test_autocomplete.py
│   │   │   ├── test_cbioportal_integration.py
│   │   │   ├── test_fetch.py
│   │   │   ├── test_preprints.py
│   │   │   ├── test_search.py
│   │   │   └── test_unified.py
│   │   ├── conftest.py
│   │   ├── drugs
│   │   │   ├── __init__.py
│   │   │   └── test_drug_getter.py
│   │   ├── openfda
│   │   │   ├── __init__.py
│   │   │   ├── test_adverse_events.py
│   │   │   ├── test_device_events.py
│   │   │   ├── test_drug_approvals.py
│   │   │   ├── test_drug_labels.py
│   │   │   ├── test_drug_recalls.py
│   │   │   ├── test_drug_shortages.py
│   │   │   └── test_security.py
│   │   ├── test_biothings_integration_real.py
│   │   ├── test_biothings_integration.py
│   │   ├── test_circuit_breaker.py
│   │   ├── test_concurrent_requests.py
│   │   ├── test_connection_pool.py
│   │   ├── test_domain_handlers.py
│   │   ├── test_drug_approvals.py
│   │   ├── test_drug_recalls.py
│   │   ├── test_drug_shortages.py
│   │   ├── test_endpoint_documentation.py
│   │   ├── test_error_scenarios.py
│   │   ├── test_europe_pmc_fetch.py
│   │   ├── test_mcp_integration.py
│   │   ├── test_mcp_tools.py
│   │   ├── test_metrics.py
│   │   ├── test_nci_integration.py
│   │   ├── test_nci_mcp_tools.py
│   │   ├── test_network_policies.py
│   │   ├── test_offline_mode.py
│   │   ├── test_openfda_unified.py
│   │   ├── test_pten_r173_search.py
│   │   ├── test_render.py
│   │   ├── test_request_batcher.py.disabled
│   │   ├── test_retry.py
│   │   ├── test_router.py
│   │   ├── test_shared_context.py.disabled
│   │   ├── test_unified_biothings.py
│   │   ├── thinking
│   │   │   ├── __init__.py
│   │   │   └── test_sequential.py
│   │   ├── trials
│   │   │   ├── test_backward_compatibility.py
│   │   │   ├── test_getter.py
│   │   │   └── test_search.py
│   │   ├── utils
│   │   │   ├── test_gene_validator.py
│   │   │   ├── test_mutation_filter.py
│   │   │   ├── test_rate_limiter.py
│   │   │   └── test_request_cache.py
│   │   ├── variants
│   │   │   ├── constants.py
│   │   │   ├── test_alphagenome_api_key.py
│   │   │   ├── test_alphagenome_comprehensive.py
│   │   │   ├── test_alphagenome.py
│   │   │   ├── test_cbioportal_mutations.py
│   │   │   ├── test_cbioportal_search.py
│   │   │   ├── test_external_integration.py
│   │   │   ├── test_external.py
│   │   │   ├── test_extract_gene_aa_change.py
│   │   │   ├── test_filters.py
│   │   │   ├── test_getter.py
│   │   │   ├── test_links.py
│   │   │   └── test_search.py
│   │   └── workers
│   │       └── test_worker_sanitization.js
│   └── test_pydantic_ai_integration.py
├── THIRD_PARTY_ENDPOINTS.md
├── tox.ini
├── uv.lock
└── wrangler.toml
```

# Files

--------------------------------------------------------------------------------
/src/biomcp/query_router.py:
--------------------------------------------------------------------------------

```python
  1 | """Query router for unified search in BioMCP."""
  2 | 
  3 | import asyncio
  4 | from dataclasses import dataclass
  5 | from typing import Any
  6 | 
  7 | from biomcp.articles.search import PubmedRequest
  8 | from biomcp.articles.unified import search_articles_unified
  9 | from biomcp.query_parser import ParsedQuery
 10 | from biomcp.trials.search import TrialQuery, search_trials
 11 | from biomcp.variants.search import VariantQuery, search_variants
 12 | 
 13 | 
 14 | @dataclass
 15 | class RoutingPlan:
 16 |     """Plan for routing a query to appropriate tools."""
 17 | 
 18 |     tools_to_call: list[str]
 19 |     field_mappings: dict[str, dict[str, Any]]
 20 |     coordination_strategy: str = "parallel"
 21 | 
 22 | 
 23 | class QueryRouter:
 24 |     """Routes unified queries to appropriate domain-specific tools."""
 25 | 
 26 |     def route(self, parsed_query: ParsedQuery) -> RoutingPlan:
 27 |         """Determine which tools to call based on query fields."""
 28 |         tools_to_call = []
 29 |         field_mappings = {}
 30 | 
 31 |         # Check which domains are referenced
 32 |         domains_referenced = self._get_referenced_domains(parsed_query)
 33 | 
 34 |         # Build field mappings for each domain
 35 |         domain_mappers = {
 36 |             "articles": ("article_searcher", self._map_article_fields),
 37 |             "trials": ("trial_searcher", self._map_trial_fields),
 38 |             "variants": ("variant_searcher", self._map_variant_fields),
 39 |             "genes": ("gene_searcher", self._map_gene_fields),
 40 |             "drugs": ("drug_searcher", self._map_drug_fields),
 41 |             "diseases": ("disease_searcher", self._map_disease_fields),
 42 |         }
 43 | 
 44 |         for domain, (tool_name, mapper_func) in domain_mappers.items():
 45 |             if domain in domains_referenced:
 46 |                 tools_to_call.append(tool_name)
 47 |                 field_mappings[tool_name] = mapper_func(parsed_query)
 48 | 
 49 |         return RoutingPlan(
 50 |             tools_to_call=tools_to_call,
 51 |             field_mappings=field_mappings,
 52 |             coordination_strategy="parallel",
 53 |         )
 54 | 
 55 |     def _get_referenced_domains(self, parsed_query: ParsedQuery) -> set[str]:
 56 |         """Get all domains referenced in the query."""
 57 |         domains_referenced = set()
 58 | 
 59 |         # Check domain-specific fields
 60 |         for domain, fields in parsed_query.domain_specific_fields.items():
 61 |             if fields:
 62 |                 domains_referenced.add(domain)
 63 | 
 64 |         # Check cross-domain fields (these trigger multiple searches)
 65 |         if parsed_query.cross_domain_fields:
 66 |             cross_domain_mappings = {
 67 |                 "gene": ["articles", "variants", "genes", "trials"],
 68 |                 "disease": ["articles", "trials", "diseases"],
 69 |                 "variant": ["articles", "variants"],
 70 |                 "chemical": ["articles", "trials", "drugs"],
 71 |                 "drug": ["articles", "trials", "drugs"],
 72 |             }
 73 | 
 74 |             for field, domains in cross_domain_mappings.items():
 75 |                 if field in parsed_query.cross_domain_fields:
 76 |                     domains_referenced.update(domains)
 77 | 
 78 |         return domains_referenced
 79 | 
 80 |     def _map_article_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
 81 |         """Map query fields to article searcher parameters."""
 82 |         mapping: dict[str, Any] = {}
 83 | 
 84 |         # Map cross-domain fields
 85 |         if "gene" in parsed_query.cross_domain_fields:
 86 |             mapping["genes"] = [parsed_query.cross_domain_fields["gene"]]
 87 |         if "disease" in parsed_query.cross_domain_fields:
 88 |             mapping["diseases"] = [parsed_query.cross_domain_fields["disease"]]
 89 |         if "variant" in parsed_query.cross_domain_fields:
 90 |             mapping["variants"] = [parsed_query.cross_domain_fields["variant"]]
 91 | 
 92 |         # Map article-specific fields
 93 |         article_fields = parsed_query.domain_specific_fields.get(
 94 |             "articles", {}
 95 |         )
 96 |         if "title" in article_fields:
 97 |             mapping["keywords"] = [article_fields["title"]]
 98 |         if "author" in article_fields:
 99 |             mapping["keywords"] = mapping.get("keywords", []) + [
100 |                 article_fields["author"]
101 |             ]
102 |         if "journal" in article_fields:
103 |             mapping["keywords"] = mapping.get("keywords", []) + [
104 |                 article_fields["journal"]
105 |             ]
106 | 
107 |         # Extract mutation patterns from raw query
108 |         import re
109 | 
110 |         raw_query = parsed_query.raw_query
111 |         # Look for mutation patterns like F57Y, F57*, V600E
112 |         mutation_patterns = re.findall(r"\b[A-Z]\d+[A-Z*]\b", raw_query)
113 |         if mutation_patterns:
114 |             if "keywords" not in mapping:
115 |                 mapping["keywords"] = []
116 |             mapping["keywords"].extend(mutation_patterns)
117 | 
118 |         return mapping
119 | 
120 |     def _map_trial_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
121 |         """Map query fields to trial searcher parameters."""
122 |         mapping: dict[str, Any] = {}
123 | 
124 |         # Map cross-domain fields
125 |         if "disease" in parsed_query.cross_domain_fields:
126 |             mapping["conditions"] = [
127 |                 parsed_query.cross_domain_fields["disease"]
128 |             ]
129 | 
130 |         # Gene searches in trials might look for targeted therapies
131 |         if "gene" in parsed_query.cross_domain_fields:
132 |             gene = parsed_query.cross_domain_fields["gene"]
133 |             # Search for gene-targeted interventions
134 |             mapping["keywords"] = [gene]
135 | 
136 |         # Map trial-specific fields
137 |         trial_fields = parsed_query.domain_specific_fields.get("trials", {})
138 |         if "condition" in trial_fields:
139 |             mapping["conditions"] = [trial_fields["condition"]]
140 |         if "intervention" in trial_fields:
141 |             mapping["interventions"] = [trial_fields["intervention"]]
142 |         if "phase" in trial_fields:
143 |             mapping["phase"] = f"PHASE{trial_fields['phase']}"
144 |         if "status" in trial_fields:
145 |             mapping["recruiting_status"] = trial_fields["status"].upper()
146 | 
147 |         return mapping
148 | 
149 |     def _map_variant_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
150 |         """Map query fields to variant searcher parameters."""
151 |         mapping: dict[str, Any] = {}
152 | 
153 |         # Map cross-domain fields
154 |         if "gene" in parsed_query.cross_domain_fields:
155 |             mapping["gene"] = parsed_query.cross_domain_fields["gene"]
156 |         if "variant" in parsed_query.cross_domain_fields:
157 |             variant = parsed_query.cross_domain_fields["variant"]
158 |             # Check if it's an rsID or protein change
159 |             if variant.startswith("rs"):
160 |                 mapping["rsid"] = variant
161 |             else:
162 |                 mapping["hgvsp"] = variant
163 | 
164 |         # Map variant-specific fields
165 |         variant_fields = parsed_query.domain_specific_fields.get(
166 |             "variants", {}
167 |         )
168 |         if "rsid" in variant_fields:
169 |             mapping["rsid"] = variant_fields["rsid"]
170 |         if "gene" in variant_fields:
171 |             mapping["gene"] = variant_fields["gene"]
172 |         if "significance" in variant_fields:
173 |             mapping["significance"] = variant_fields["significance"]
174 |         if "frequency" in variant_fields:
175 |             # Parse frequency operators
176 |             freq = variant_fields["frequency"]
177 |             if freq.startswith("<"):
178 |                 mapping["max_frequency"] = float(freq[1:])
179 |             elif freq.startswith(">"):
180 |                 mapping["min_frequency"] = float(freq[1:])
181 | 
182 |         return mapping
183 | 
184 |     def _map_gene_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
185 |         """Map query fields to gene searcher parameters."""
186 |         mapping: dict[str, Any] = {}
187 | 
188 |         # Map cross-domain fields
189 |         if "gene" in parsed_query.cross_domain_fields:
190 |             mapping["query"] = parsed_query.cross_domain_fields["gene"]
191 | 
192 |         # Map gene-specific fields
193 |         gene_fields = parsed_query.domain_specific_fields.get("genes", {})
194 |         if "symbol" in gene_fields:
195 |             mapping["query"] = gene_fields["symbol"]
196 |         elif "name" in gene_fields:
197 |             mapping["query"] = gene_fields["name"]
198 |         elif "type" in gene_fields:
199 |             mapping["type_of_gene"] = gene_fields["type"]
200 | 
201 |         return mapping
202 | 
203 |     def _map_drug_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
204 |         """Map query fields to drug searcher parameters."""
205 |         mapping: dict[str, Any] = {}
206 | 
207 |         # Map cross-domain fields
208 |         if "chemical" in parsed_query.cross_domain_fields:
209 |             mapping["query"] = parsed_query.cross_domain_fields["chemical"]
210 |         elif "drug" in parsed_query.cross_domain_fields:
211 |             mapping["query"] = parsed_query.cross_domain_fields["drug"]
212 | 
213 |         # Map drug-specific fields
214 |         drug_fields = parsed_query.domain_specific_fields.get("drugs", {})
215 |         if "name" in drug_fields:
216 |             mapping["query"] = drug_fields["name"]
217 |         elif "tradename" in drug_fields:
218 |             mapping["query"] = drug_fields["tradename"]
219 |         elif "indication" in drug_fields:
220 |             mapping["indication"] = drug_fields["indication"]
221 | 
222 |         return mapping
223 | 
224 |     def _map_disease_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
225 |         """Map query fields to disease searcher parameters."""
226 |         mapping: dict[str, Any] = {}
227 | 
228 |         # Map cross-domain fields
229 |         if "disease" in parsed_query.cross_domain_fields:
230 |             mapping["query"] = parsed_query.cross_domain_fields["disease"]
231 | 
232 |         # Map disease-specific fields
233 |         disease_fields = parsed_query.domain_specific_fields.get(
234 |             "diseases", {}
235 |         )
236 |         if "name" in disease_fields:
237 |             mapping["query"] = disease_fields["name"]
238 |         elif "mondo" in disease_fields:
239 |             mapping["query"] = disease_fields["mondo"]
240 |         elif "synonym" in disease_fields:
241 |             mapping["query"] = disease_fields["synonym"]
242 | 
243 |         return mapping
244 | 
245 | 
246 | async def execute_routing_plan(
247 |     plan: RoutingPlan, output_json: bool = True
248 | ) -> dict[str, Any]:
249 |     """Execute a routing plan by calling the appropriate tools."""
250 |     tasks = []
251 |     task_names = []
252 | 
253 |     for tool_name in plan.tools_to_call:
254 |         params = plan.field_mappings[tool_name]
255 | 
256 |         if tool_name == "article_searcher":
257 |             request = PubmedRequest(**params)
258 |             tasks.append(
259 |                 search_articles_unified(
260 |                     request,
261 |                     include_pubmed=True,
262 |                     include_preprints=False,
263 |                     output_json=output_json,
264 |                 )
265 |             )
266 |             task_names.append("articles")
267 | 
268 |         elif tool_name == "trial_searcher":
269 |             query = TrialQuery(**params)
270 |             tasks.append(search_trials(query, output_json=output_json))
271 |             task_names.append("trials")
272 | 
273 |         elif tool_name == "variant_searcher":
274 |             variant_query = VariantQuery(**params)
275 |             tasks.append(
276 |                 search_variants(variant_query, output_json=output_json)
277 |             )
278 |             task_names.append("variants")
279 | 
280 |         elif tool_name == "gene_searcher":
281 |             # For gene search, we'll use the BioThingsClient directly
282 |             from biomcp.integrations.biothings_client import BioThingsClient
283 | 
284 |             client = BioThingsClient()
285 |             query_str = params.get("query", "")
286 |             tasks.append(_search_genes(client, query_str, output_json))
287 |             task_names.append("genes")
288 | 
289 |         elif tool_name == "drug_searcher":
290 |             # For drug search, we'll use the BioThingsClient directly
291 |             from biomcp.integrations.biothings_client import BioThingsClient
292 | 
293 |             client = BioThingsClient()
294 |             query_str = params.get("query", "")
295 |             tasks.append(_search_drugs(client, query_str, output_json))
296 |             task_names.append("drugs")
297 | 
298 |         elif tool_name == "disease_searcher":
299 |             # For disease search, we'll use the BioThingsClient directly
300 |             from biomcp.integrations.biothings_client import BioThingsClient
301 | 
302 |             client = BioThingsClient()
303 |             query_str = params.get("query", "")
304 |             tasks.append(_search_diseases(client, query_str, output_json))
305 |             task_names.append("diseases")
306 | 
307 |     # Execute all searches in parallel
308 |     results = await asyncio.gather(*tasks, return_exceptions=True)
309 | 
310 |     # Package results
311 |     output: dict[str, Any] = {}
312 |     for name, result in zip(task_names, results, strict=False):
313 |         if isinstance(result, Exception):
314 |             output[name] = {"error": str(result)}
315 |         else:
316 |             output[name] = result
317 | 
318 |     return output
319 | 
320 | 
321 | async def _search_genes(client, query: str, output_json: bool) -> Any:
322 |     """Search for genes using BioThingsClient."""
323 |     results = await client._query_gene(query)
324 |     if not results:
325 |         return "[]" if output_json else "No genes found"
326 | 
327 |     # Fetch full details for each result
328 |     detailed_results = []
329 |     for result in results[:10]:  # Limit to 10 results
330 |         gene_id = result.get("_id")
331 |         if gene_id:
332 |             full_gene = await client._get_gene_by_id(gene_id)
333 |             if full_gene:
334 |                 detailed_results.append(full_gene.model_dump(by_alias=True))
335 | 
336 |     if output_json:
337 |         import json
338 | 
339 |         return json.dumps(detailed_results)
340 |     else:
341 |         return detailed_results
342 | 
343 | 
344 | async def _search_drugs(client, query: str, output_json: bool) -> Any:
345 |     """Search for drugs using BioThingsClient."""
346 |     results = await client._query_drug(query)
347 |     if not results:
348 |         return "[]" if output_json else "No drugs found"
349 | 
350 |     # Fetch full details for each result
351 |     detailed_results = []
352 |     for result in results[:10]:  # Limit to 10 results
353 |         drug_id = result.get("_id")
354 |         if drug_id:
355 |             full_drug = await client._get_drug_by_id(drug_id)
356 |             if full_drug:
357 |                 detailed_results.append(full_drug.model_dump(by_alias=True))
358 | 
359 |     if output_json:
360 |         import json
361 | 
362 |         return json.dumps(detailed_results)
363 |     else:
364 |         return detailed_results
365 | 
366 | 
367 | async def _search_diseases(client, query: str, output_json: bool) -> Any:
368 |     """Search for diseases using BioThingsClient."""
369 |     results = await client._query_disease(query)
370 |     if not results:
371 |         return "[]" if output_json else "No diseases found"
372 | 
373 |     # Fetch full details for each result
374 |     detailed_results = []
375 |     for result in results[:10]:  # Limit to 10 results
376 |         disease_id = result.get("_id")
377 |         if disease_id:
378 |             full_disease = await client._get_disease_by_id(disease_id)
379 |             if full_disease:
380 |                 detailed_results.append(full_disease.model_dump(by_alias=True))
381 | 
382 |     if output_json:
383 |         import json
384 | 
385 |         return json.dumps(detailed_results)
386 |     else:
387 |         return detailed_results
388 | 
```
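
The routing logic in `query_router.py` above boils down to two ideas: a cross-domain field such as `gene` fans out to several search domains, and `asyncio.gather(..., return_exceptions=True)` lets one failing search surface as an `{"error": ...}` entry instead of aborting the rest. The sketch below reproduces both without importing `biomcp`. The mapping table is copied from `_get_referenced_domains`; `FakeParsedQuery`, `referenced_domains`, `gather_named`, `_ok`, and `_boom` are hypothetical names introduced only for illustration and are not part of the package.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any

# Copied from QueryRouter._get_referenced_domains in the listing above.
CROSS_DOMAIN_MAPPINGS = {
    "gene": ["articles", "variants", "genes", "trials"],
    "disease": ["articles", "trials", "diseases"],
    "variant": ["articles", "variants"],
    "chemical": ["articles", "trials", "drugs"],
    "drug": ["articles", "trials", "drugs"],
}


@dataclass
class FakeParsedQuery:
    """Hypothetical stand-in exposing only the attributes the router reads."""

    cross_domain_fields: dict[str, Any] = field(default_factory=dict)
    domain_specific_fields: dict[str, dict[str, Any]] = field(default_factory=dict)


def referenced_domains(query: FakeParsedQuery) -> set[str]:
    """Mirror the router's decision: non-empty domain fields plus cross-domain fan-out."""
    domains = {d for d, fields in query.domain_specific_fields.items() if fields}
    for name, targets in CROSS_DOMAIN_MAPPINGS.items():
        if name in query.cross_domain_fields:
            domains.update(targets)
    return domains


async def gather_named(named_coros: dict[str, Any]) -> dict[str, Any]:
    """Run coroutines concurrently; a failure becomes {"error": ...} for that key only."""
    results = await asyncio.gather(*named_coros.values(), return_exceptions=True)
    return {
        name: {"error": str(res)} if isinstance(res, Exception) else res
        for name, res in zip(named_coros, results)
    }


async def _ok() -> str:
    # Stands in for a search that succeeds with an empty JSON list.
    return "[]"


async def _boom() -> str:
    # Stands in for a search whose upstream API fails.
    raise RuntimeError("upstream timeout")


gene_only = referenced_domains(FakeParsedQuery(cross_domain_fields={"gene": "BRAF"}))
packaged = asyncio.run(gather_named({"articles": _ok(), "trials": _boom()}))
```

Here `gene_only` contains the four domains listed for `gene`, and `packaged` keeps the successful `articles` result alongside an error entry for `trials`, which is the same shape `execute_routing_plan` returns.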

--------------------------------------------------------------------------------
/docs/user-guides/03-integrating-with-ides-and-clients.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Integrating with IDEs and Clients
  2 | 
  3 | BioMCP can be integrated into your development workflow through multiple approaches. This guide covers integration with IDEs, Python applications, and MCP-compatible clients.
  4 | 
  5 | ## Integration Methods Overview
  6 | 
  7 | | Method         | Best For                  | Installation | Usage Pattern            |
  8 | | -------------- | ------------------------- | ------------ | ------------------------ |
  9 | | **Cursor IDE** | Interactive development   | Smithery CLI | Natural language queries |
 10 | | **Python SDK** | Application development   | pip/uv       | Direct function calls    |
 11 | | **MCP Client** | AI assistants & protocols | Subprocess   | Tool-based communication |
 12 | 
 13 | ## Cursor IDE Integration
 14 | 
 15 | Cursor integration lets you query biomedical data in natural language without leaving the editor, which suits interactive research during development.
 16 | 
 17 | ### Installation
 18 | 
 19 | 1. **Prerequisites:**
 20 | 
 21 |    - [Cursor IDE](https://cursor.sh/) installed
 22 |    - [Smithery](https://smithery.ai/) account and token
 23 | 
 24 | 2. **Install BioMCP:**
 25 | 
 26 |    ```bash
 27 |    npx -y @smithery/cli@latest install @genomoncology/biomcp --client cursor
 28 |    ```
 29 | 
 30 | 3. **Configuration:**
 31 |    - The Smithery CLI automatically configures Cursor
 32 |    - No manual configuration needed
 33 | 
 34 | ### Usage in Cursor
 35 | 
 36 | Once installed, you can query biomedical data using natural language:
 37 | 
 38 | #### Clinical Trials
 39 | 
 40 | ```
 41 | "Find Phase 3 clinical trials for lung cancer with immunotherapy"
 42 | ```
 43 | 
 44 | #### Research Articles
 45 | 
 46 | ```
 47 | "Summarize recent research on EGFR mutations in lung cancer"
 48 | ```
 49 | 
 50 | #### Genetic Variants
 51 | 
 52 | ```
 53 | "What's the clinical significance of the BRAF V600E mutation?"
 54 | ```
 55 | 
 56 | #### Complex Queries
 57 | 
 58 | ```
 59 | "Compare treatment outcomes for ALK-positive vs EGFR-mutant NSCLC"
 60 | ```
 61 | 
 62 | ### Cursor Tips
 63 | 
 64 | 1. **Be Specific**: Include gene names, disease types, and treatment modalities
 65 | 2. **Iterate**: Refine queries based on initial results
 66 | 3. **Cross-Reference**: Ask for both articles and trials on the same topic
 67 | 4. **Export Results**: Copy formatted results for documentation
 68 | 
 69 | ## Python SDK Integration
 70 | 
 71 | The Python SDK provides programmatic access to BioMCP for building applications.
 72 | 
 73 | ### Installation
 74 | 
 75 | ```bash
 76 | # Using pip
 77 | pip install biomcp-python
 78 | 
 79 | # Using uv
 80 | uv add biomcp-python
 81 | 
 82 | # For scripts
 83 | uv pip install biomcp-python
 84 | ```
 85 | 
 86 | ### Basic Usage
 87 | 
 88 | ```python
 89 | import asyncio
 90 | from biomcp import BioMCP
 91 | 
 92 | async def main():
 93 |     # Initialize client
 94 |     client = BioMCP()
 95 | 
 96 |     # Search for articles
 97 |     articles = await client.articles.search(
 98 |         genes=["BRAF"],
 99 |         diseases=["melanoma"],
100 |         limit=5
101 |     )
102 | 
103 |     # Search for trials
104 |     trials = await client.trials.search(
105 |         conditions=["breast cancer"],
106 |         interventions=["CDK4/6 inhibitor"],
107 |         recruiting_status="RECRUITING"
108 |     )
109 | 
110 |     # Get variant details
111 |     variant = await client.variants.get("rs121913529")
112 | 
113 |     return articles, trials, variant
114 | 
115 | # Run the async function
116 | results = asyncio.run(main())
117 | ```
118 | 
119 | ### Advanced Features
120 | 
121 | #### Domain-Specific Modules
122 | 
123 | ```python
124 | from biomcp import BioMCP
125 | from biomcp.variants import search_variants, get_variant
126 | from biomcp.trials import search_trials, get_trial
127 | from biomcp.articles import search_articles, fetch_articles
128 | 
129 | # Direct module usage
130 | async def variant_analysis():
131 |     # Search pathogenic TP53 variants
132 |     results = await search_variants(
133 |         gene="TP53",
134 |         significance="pathogenic",
135 |         frequency_max=0.01,
136 |         limit=20
137 |     )
138 | 
139 |     # Get detailed annotations
140 |     for variant in results:
141 |         details = await get_variant(variant.id)
142 |         print(f"{variant.id}: {details.clinical_significance}")
143 | ```
144 | 
145 | #### Output Formats
146 | 
147 | ```python
148 | # JSON for programmatic use
149 | articles_json = await client.articles.search(
150 |     genes=["KRAS"],
151 |     format="json"
152 | )
153 | 
154 | # Markdown for display
155 | articles_md = await client.articles.search(
156 |     genes=["KRAS"],
157 |     format="markdown"
158 | )
159 | ```
160 | 
161 | #### Error Handling
162 | 
163 | ```python
164 | from biomcp.exceptions import BioMCPError, APIError, ValidationError
165 | 
166 | try:
167 |     results = await client.articles.search(genes=["INVALID_GENE"])
168 | except ValidationError as e:
169 |     print(f"Invalid input: {e}")
170 | except APIError as e:
171 |     print(f"API error: {e}")
172 | except BioMCPError as e:
173 |     print(f"General error: {e}")
174 | ```
175 | 
176 | ### Example: Building a Variant Report
177 | 
178 | ```python
179 | import asyncio
180 | from biomcp import BioMCP
181 | 
182 | async def generate_variant_report(gene: str, mutation: str):
183 |     client = BioMCP()
184 | 
185 |     # 1. Get gene information
186 |     gene_info = await client.genes.get(gene)
187 | 
188 |     # 2. Search for the specific variant
189 |     variants = await client.variants.search(
190 |         gene=gene,
191 |         keywords=[mutation]
192 |     )
193 | 
194 |     # 3. Find relevant articles
195 |     articles = await client.articles.search(
196 |         genes=[gene],
197 |         keywords=[mutation],
198 |         limit=10
199 |     )
200 | 
201 |     # 4. Look for clinical trials
202 |     trials = await client.trials.search(
203 |         conditions=["cancer"],
204 |         other_terms=[f"{gene} {mutation}"],
205 |         recruiting_status="RECRUITING"
206 |     )
207 | 
208 |     # 5. Generate report
209 |     report = f"""
210 | # Variant Report: {gene} {mutation}
211 | 
212 | ## Gene Information
213 | - **Official Name**: {gene_info.name}
214 | - **Summary**: {gene_info.summary}
215 | 
216 | ## Variant Details
217 | Found {len(variants)} matching variants
218 | 
219 | ## Literature ({len(articles)} articles)
220 | Recent publications discussing this variant...
221 | 
222 | ## Clinical Trials ({len(trials)} active trials)
223 | Currently recruiting studies...
224 | """
225 | 
226 |     return report
227 | 
228 | # Generate report
229 | report = asyncio.run(generate_variant_report("BRAF", "V600E"))
230 | print(report)
231 | ```
232 | 
233 | ## MCP Client Integration
234 | 
235 | The Model Context Protocol (MCP) provides a standardized way to integrate BioMCP with AI assistants and other tools.
236 | 
237 | ### Understanding MCP
238 | 
239 | MCP is a protocol for communication between:
240 | 
241 | - **Clients**: AI assistants, IDEs, or custom applications
242 | - **Servers**: Tool providers like BioMCP
243 | 
244 | ### Critical Requirement: Think Tool
245 | 
246 | **IMPORTANT**: When using MCP, you MUST call the `think` tool before any search or fetch operation. This ensures systematic analysis and optimal results.
247 | 
248 | ### Basic MCP Integration
249 | 
250 | ```python
251 | import asyncio
252 | import subprocess
253 | from mcp import ClientSession, StdioServerParameters
254 | from mcp.client.stdio import stdio_client
255 | 
256 | async def run_biomcp_query():
257 |     # Start BioMCP server
258 |     server_params = StdioServerParameters(
259 |         command="uv",
260 |         args=["run", "--with", "biomcp-python", "biomcp", "run"],
261 |         env={"PYTHONUNBUFFERED": "1"}
262 |     )
263 | 
264 |     async with stdio_client(server_params) as (read, write):
265 |         async with ClientSession(read, write) as session:
266 |             # Initialize and discover tools
267 |             await session.initialize()
268 |             tools = await session.list_tools()
269 | 
270 |             # CRITICAL: Always think first!
271 |             await session.call_tool(
272 |                 "think",
273 |                 arguments={
274 |                     "thought": "Analyzing BRAF V600E in melanoma...",
275 |                     "thoughtNumber": 1,
276 |                     "nextThoughtNeeded": True
277 |                 }
278 |             )
279 | 
280 |             # Now search for articles
281 |             result = await session.call_tool(
282 |                 "article_searcher",
283 |                 arguments={
284 |                     "genes": ["BRAF"],
285 |                     "diseases": ["melanoma"],
286 |                     "keywords": ["V600E"]
287 |                 }
288 |             )
289 | 
290 |             return result
291 | 
292 | # Run the query
293 | result = asyncio.run(run_biomcp_query())
294 | ```
295 | 
296 | ### Available MCP Tools
297 | 
298 | BioMCP provides 24 tools through MCP:
299 | 
300 | #### Core Tools (Always Use First)
301 | 
302 | - `think` - Sequential reasoning (MANDATORY first step)
303 | - `search` - Unified search across domains
304 | - `fetch` - Retrieve specific records
305 | 
306 | #### Domain-Specific Tools
307 | 
308 | - **Articles**: `article_searcher`, `article_getter`
309 | - **Trials**: `trial_searcher`, `trial_getter`, plus detail getters
310 | - **Variants**: `variant_searcher`, `variant_getter`, `alphagenome_predictor`
311 | - **BioThings**: `gene_getter`, `disease_getter`, `drug_getter`
312 | - **NCI**: Organization, intervention, biomarker, disease tools
313 | 
314 | ### MCP Integration Patterns
315 | 
316 | #### Pattern 1: AI Assistant Integration
317 | 
318 | ```python
319 | # Example for integrating with an AI assistant
320 | class BioMCPAssistant:
321 |     def __init__(self):
322 |         self.session = None
323 | 
324 |     async def connect(self):
325 |         # Initialize MCP connection
326 |         server_params = StdioServerParameters(
327 |             command="biomcp",
328 |             args=["run"]
329 |         )
330 |         # ... connection setup ...
331 | 
332 |     async def process_query(self, user_query: str):
333 |         # 1. Always think first
334 |         await self.think_about_query(user_query)
335 | 
336 |         # 2. Determine appropriate tools
337 |         tools_needed = self.analyze_query(user_query)
338 | 
339 |         # 3. Execute tool calls
340 |         results = []
341 |         for tool in tools_needed:
342 |             result = await self.session.call_tool(tool.name, tool.args)
343 |             results.append(result)
344 | 
345 |         # 4. Synthesize results
346 |         return self.format_response(results)
347 | ```
348 | 
349 | #### Pattern 2: Custom Client Implementation
350 | 
351 | ```python
352 | import json
353 | from typing import Any, Dict
354 | 
355 | class BioMCPClient:
356 |     """Custom client for specific biomedical workflows"""
357 | 
358 |     async def variant_to_trials_pipeline(self, variant_id: str):
359 |         """Find trials for patients with specific variants"""
360 | 
361 |         # Step 1: Think and plan
362 |         await self.think(
363 |             "Planning variant-to-trials search pipeline...",
364 |             thoughtNumber=1
365 |         )
366 | 
367 |         # Step 2: Get variant details
368 |         variant = await self.call_tool("variant_getter", {
369 |             "variant_id": variant_id
370 |         })
371 | 
372 |         # Step 3: Extract gene and disease associations
373 |         gene = variant.get("gene", {}).get("symbol")
374 |         diseases = self.extract_diseases(variant)
375 | 
376 |         # Step 4: Search for relevant trials
377 |         trials = await self.call_tool("trial_searcher", {
378 |             "conditions": diseases,
379 |             "other_terms": [f"{gene} mutation"],
380 |             "recruiting_status": "RECRUITING"
381 |         })
382 | 
383 |         return {
384 |             "variant": variant,
385 |             "associated_trials": trials
386 |         }
387 | ```
388 | 
389 | ### MCP Best Practices
390 | 
391 | 1. **Always Think First**
392 | 
393 |    ```python
394 |    # ✅ Correct
395 |    await think(thought="Planning research...", thoughtNumber=1)
396 |    await search(...)
397 | 
398 |    # ❌ Wrong - skips thinking
399 |    await search(...)  # Will produce poor results
400 |    ```
401 | 
402 | 2. **Use Appropriate Tools**
403 | 
404 |    ```python
405 |    # For broad searches across domains
406 |    await call_tool("search", {"query": "gene:BRAF AND melanoma"})
407 | 
408 |    # For specific domain searches
409 |    await call_tool("article_searcher", {"genes": ["BRAF"]})
410 |    ```
411 | 
412 | 3. **Handle Tool Responses**
413 |    ```python
414 |    try:
415 |        result = await session.call_tool("variant_getter", {
416 |            "variant_id": "rs121913529"
417 |        })
418 |        # call_tool returns a CallToolResult, not a dict
419 |        if result.isError:
420 |            handle_error(result.content)
421 |        else:
422 |            process_variant(result.content)
423 |    except Exception as e:
424 |        logger.error(f"Tool call failed: {e}")
425 |    ```
426 | 
427 | ## Choosing the Right Integration
428 | 
429 | ### Use Cursor IDE When:
430 | 
431 | - You are doing interactive research during development
432 | - You are exploring biomedical data for new projects
433 | - You need quick answers without writing code
434 | - You want natural language queries
435 | 
436 | ### Use Python SDK When:
437 | 
438 | - You are building production applications
439 | - You need type-safe interfaces
440 | - You want direct function calls
441 | - You require custom error handling
442 | 
443 | ### Use MCP Client When:
444 | 
445 | - You are integrating with AI assistants
446 | - You are building protocol-compliant tools
447 | - You need standardized tool interfaces
448 | - You want language-agnostic integration
449 | 
450 | ## Integration Examples
451 | 
452 | ### Example 1: Research Dashboard (Python SDK)
453 | 
454 | ```python
455 | from biomcp import BioMCP
456 | import streamlit as st
457 | 
458 | async def create_dashboard():
459 |     client = BioMCP()
460 | 
461 |     st.title("Biomedical Research Dashboard")
462 | 
463 |     # Gene input
464 |     gene = st.text_input("Enter gene symbol:", "BRAF")
465 | 
466 |     if st.button("Search"):
467 |         # Fetch comprehensive data
468 |         col1, col2 = st.columns(2)
469 | 
470 |         with col1:
471 |             st.subheader("Recent Articles")
472 |             articles = await client.articles.search(genes=[gene], limit=5)
473 |             for article in articles:
474 |                 st.write(f"- [{article.title}]({article.url})")
475 | 
476 |         with col2:
477 |             st.subheader("Active Trials")
478 |             trials = await client.trials.search(
479 |                 other_terms=[gene],
480 |                 recruiting_status="RECRUITING",
481 |                 limit=5
482 |             )
483 |             for trial in trials:
484 |                 st.write(f"- [{trial.nct_id}]({trial.url})")
485 | ```
486 | 
487 | ### Example 2: Variant Analysis Pipeline (MCP)
488 | 
489 | ```python
490 | async def comprehensive_variant_analysis(session, hgvs: str):
491 |     """Complete variant analysis workflow using MCP"""
492 | 
493 |     # Think about the analysis
494 |     await session.call_tool("think", {
495 |         "thought": f"Planning comprehensive analysis for {hgvs}",
496 |         "thoughtNumber": 1
497 |     })
498 | 
499 |     # Get variant details
500 |     variant = await session.call_tool("variant_getter", {
501 |         "variant_id": hgvs
502 |     })
503 | 
504 |     # Search related articles
505 |     articles = await session.call_tool("article_searcher", {
506 |         "variants": [hgvs],
507 |         "limit": 10
508 |     })
509 | 
510 |     # Find applicable trials
511 |     gene = variant.get("gene", {}).get("symbol")
512 |     trials = await session.call_tool("trial_searcher", {
513 |         "other_terms": [f"{gene} mutation"],
514 |         "recruiting_status": "RECRUITING"
515 |     })
516 |     # Predict functional effects if genomic coordinates are available
517 |     prediction = None  # stays None when coordinates are missing
518 |     if variant.get("chrom") and variant.get("pos"):
519 |         prediction = await session.call_tool("alphagenome_predictor", {
520 |             "chromosome": f"chr{variant['chrom']}",
521 |             "position": variant["pos"],
522 |             "reference": variant["ref"],
523 |             "alternate": variant["alt"]
524 |         })
525 | 
526 |     return {
527 |         "variant": variant,
528 |         "articles": articles,
529 |         "trials": trials,
530 |         "prediction": prediction
531 |     }
532 | ```
533 | 
534 | ## Troubleshooting
535 | 
536 | ### Common Issues
537 | 
538 | 1. **"Think tool not called" errors**
539 | 
540 |    - Always call `think` before other operations
541 |    - Include the `thoughtNumber` parameter
542 | 
543 | 2. **API rate limits**
544 | 
545 |    - Add delays between requests
546 |    - Use API keys for higher limits
547 | 
548 | 3. **Connection failures**
549 | 
550 |    - Check network connectivity
551 |    - Verify server is running
552 |    - Ensure correct installation
553 | 
554 | 4. **Invalid gene symbols**
555 |    - Use official HGNC symbols
556 |    - Check [genenames.org](https://www.genenames.org)
557 | 
558 | ### Debug Mode
559 | 
560 | Enable debug logging:
561 | 
562 | ```python
563 | # Python SDK
564 | import logging
565 | logging.basicConfig(level=logging.DEBUG)
566 | 
567 | # MCP Client
568 | server_params = StdioServerParameters(
569 |     command="biomcp",
570 |     args=["run", "--log-level", "DEBUG"]
571 | )
572 | ```
573 | 
574 | ## Next Steps
575 | 
576 | - Explore [tool-specific documentation](02-mcp-tools-reference.md)
577 | - Review [API authentication](../getting-started/03-authentication-and-api-keys.md)
578 | - Check [example workflows](../how-to-guides/01-find-articles-and-cbioportal-data.md) for your use case
579 | 
```
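
The integration guide above requires a `think` call, with an incrementing `thoughtNumber`, before any search or fetch. A minimal sketch of a helper that builds those argument dicts follows. The `ThoughtSequence` class and its method name are hypothetical and not part of BioMCP; only the three argument keys are taken from the guide's own example.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ThoughtSequence:
    """Hypothetical helper that numbers successive `think` calls."""

    count: int = 0

    def next_arguments(self, thought: str, more_needed: bool = True) -> Dict[str, Any]:
        """Return the arguments dict for the next `think` tool call."""
        self.count += 1
        return {
            "thought": thought,
            "thoughtNumber": self.count,
            "nextThoughtNeeded": more_needed,
        }


thoughts = ThoughtSequence()
first = thoughts.next_arguments("Analyzing BRAF V600E in melanoma...")
last = thoughts.next_arguments("Plan complete; proceed to search.", more_needed=False)
print(first["thoughtNumber"], last["thoughtNumber"], last["nextThoughtNeeded"])
```

Each returned dict could then be passed as the `arguments` of a `think` tool call before the first search.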

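The troubleshooting advice in the integration guide above (add delays between requests when you hit API rate limits) can be sketched as a retry wrapper with exponential backoff. `call_with_backoff` and `flaky_search` are illustrative names, not BioMCP APIs, and catching every `Exception` is a simplification; real code would likely retry only on rate-limit or transient network errors.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def call_with_backoff(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await make_call(), doubling the delay after each failed attempt."""
    attempt = 0
    while True:
        try:
            return await make_call()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(base_delay * (2 ** attempt))
            attempt += 1


# Simulated request that fails twice before succeeding (stand-in for a real call).
attempts: List[int] = []


async def flaky_search() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"


result = asyncio.run(call_with_backoff(flaky_search, retries=3, base_delay=0.01))
print(result, len(attempts))
```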
--------------------------------------------------------------------------------
/docs/user-guides/01-command-line-interface.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Command Line Interface Reference
  2 | 
  3 | BioMCP provides a comprehensive command-line interface for biomedical data retrieval and analysis. This guide covers all available commands, options, and usage patterns.
  4 | 
  5 | ## Installation
  6 | 
  7 | ```bash
  8 | # Using uv (recommended)
  9 | uv tool install biomcp
 10 | 
 11 | # Using pip
 12 | pip install biomcp-python
 13 | ```
 14 | 
 15 | ## Global Options
 16 | 
 17 | These options work with all commands:
 18 | 
 19 | ```bash
 20 | biomcp [OPTIONS] COMMAND [ARGS]...
 21 | 
 22 | Options:
 23 |   --version  Show the version and exit
 24 |   --help     Show help message and exit
 25 | ```
 26 | 
 27 | ## Commands Overview
 28 | 
 29 | | Domain           | Commands             | Purpose                                         |
 30 | | ---------------- | -------------------- | ----------------------------------------------- |
 31 | | **article**      | search, get          | Search and retrieve biomedical literature       |
 32 | | **trial**        | search, get          | Find and fetch clinical trial information       |
 33 | | **variant**      | search, get, predict | Analyze genetic variants and predict effects    |
 34 | | **gene**         | get                  | Retrieve gene information and annotations       |
 35 | | **drug**         | get                  | Look up drug/chemical information               |
 36 | | **disease**      | get                  | Get disease definitions and synonyms            |
 37 | | **organization** | search               | Search NCI organization database                |
 38 | | **intervention** | search               | Find interventions (drugs, devices, procedures) |
 39 | | **biomarker**    | search               | Search biomarkers used in trials                |
 40 | | **health**       | check                | Monitor API status and system health            |
 41 | 
 42 | ## Article Commands
 43 | 
 44 | For practical examples and workflows, see [How to Find Articles and cBioPortal Data](../how-to-guides/01-find-articles-and-cbioportal-data.md).
 45 | 
 46 | ### article search
 47 | 
 48 | Search PubMed/PubTator3 for biomedical literature with automatic cBioPortal integration.
 49 | 
 50 | ```bash
 51 | biomcp article search [OPTIONS]
 52 | ```
 53 | 
 54 | **Options:**
 55 | 
 56 | - `--gene, -g TEXT`: Gene symbol(s) to search for
 57 | - `--variant, -v TEXT`: Genetic variant(s) to search for
 58 | - `--disease, -d TEXT`: Disease/condition(s) to search for
 59 | - `--chemical, -c TEXT`: Chemical/drug name(s) to search for
 60 | - `--keyword, -k TEXT`: Keyword(s) to search for (supports OR with `|`)
 61 | - `--pmid TEXT`: Specific PubMed ID(s) to retrieve
 62 | - `--limit INTEGER`: Maximum results to return (default: 10)
 63 | - `--no-preprints`: Exclude preprints from results
 64 | - `--no-cbioportal`: Disable automatic cBioPortal integration
 65 | - `--format [json|markdown]`: Output format (default: markdown)
 66 | 
 67 | **Examples:**
 68 | 
 69 | ```bash
 70 | # Basic gene search with automatic cBioPortal data
 71 | biomcp article search --gene BRAF --disease melanoma
 72 | 
 73 | # Multiple filters
 74 | biomcp article search --gene EGFR --disease "lung cancer" --chemical erlotinib
 75 | 
 76 | # OR logic in keywords (find different variant notations)
 77 | biomcp article search --gene PTEN --keyword "R173|Arg173|p.R173"
 78 | 
 79 | # Exclude preprints
 80 | biomcp article search --gene TP53 --no-preprints --limit 20
 81 | 
 82 | # JSON output for programmatic use
 83 | biomcp article search --gene KRAS --format json > results.json
 84 | ```
 85 | 
 86 | ### article get
 87 | 
 88 | Retrieve a specific article by PubMed ID or DOI.
 89 | 
 90 | ```bash
 91 | biomcp article get IDENTIFIER
 92 | ```
 93 | 
 94 | **Arguments:**
 95 | 
 96 | - `IDENTIFIER`: PubMed ID (e.g., "38768446") or DOI (e.g., "10.1101/2024.01.20.23288905")
 97 | 
 98 | **Examples:**
 99 | 
100 | ```bash
101 | # Get article by PubMed ID
102 | biomcp article get 38768446
103 | 
104 | # Get preprint by DOI
105 | biomcp article get "10.1101/2024.01.20.23288905"
106 | ```
107 | 
108 | ## Trial Commands
109 | 
110 | For practical examples and workflows, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md).
111 | 
112 | ### trial search
113 | 
114 | Search ClinicalTrials.gov or NCI CTS API for clinical trials.
115 | 
116 | ```bash
117 | biomcp trial search [OPTIONS]
118 | ```
119 | 
120 | **Basic Options:**
121 | 
122 | - `--condition TEXT`: Disease/condition to search
123 | - `--intervention TEXT`: Treatment/intervention to search
124 | - `--term TEXT`: General search terms
125 | - `--nct-id TEXT`: Specific NCT ID(s)
126 | - `--limit INTEGER`: Maximum results (default: 10)
127 | - `--source [ctgov|nci]`: Data source (default: ctgov)
128 | - `--api-key TEXT`: API key for NCI source
129 | 
130 | **Study Characteristics:**
131 | 
132 | - `--status TEXT`: Trial status (RECRUITING, ACTIVE_NOT_RECRUITING, etc.)
133 | - `--study-type TEXT`: Type of study (INTERVENTIONAL, OBSERVATIONAL)
134 | - `--phase TEXT`: Trial phase (EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4)
135 | - `--study-purpose TEXT`: Primary purpose (TREATMENT, PREVENTION, etc.)
136 | - `--age-group TEXT`: Target age group (CHILD, ADULT, OLDER_ADULT)
137 | 
138 | **Location Options:**
139 | 
140 | - `--country TEXT`: Country name
141 | - `--state TEXT`: State/province
142 | - `--city TEXT`: City name
143 | - `--latitude FLOAT`: Geographic latitude
144 | - `--longitude FLOAT`: Geographic longitude
145 | - `--distance INTEGER`: Search radius in miles
146 | 
147 | **Advanced Filters:**
148 | 
149 | - `--start-date TEXT`: Trial start date (YYYY-MM-DD)
150 | - `--end-date TEXT`: Trial end date (YYYY-MM-DD)
151 | - `--intervention-type TEXT`: Type of intervention
152 | - `--sponsor-type TEXT`: Type of sponsor
153 | - `--is-fda-regulated`: FDA-regulated trials only
154 | - `--expanded-access`: Trials offering expanded access
155 | 
156 | **Examples:**
157 | 
158 | ```bash
159 | # Find recruiting melanoma trials
160 | biomcp trial search --condition melanoma --status RECRUITING
161 | 
162 | # Search by location (requires coordinates)
163 | biomcp trial search --condition "lung cancer" \
164 |   --latitude 41.4993 --longitude -81.6944 --distance 50
165 | 
166 | # Use NCI source with advanced filters
167 | biomcp trial search --condition melanoma --source nci \
168 |   --required-mutations "BRAF V600E" --allow-brain-mets true \
169 |   --api-key YOUR_KEY
170 | 
171 | # Multiple filters
172 | biomcp trial search --condition "breast cancer" \
173 |   --intervention "CDK4/6 inhibitor" --phase PHASE3 \
174 |   --status RECRUITING --country "United States"
175 | ```
176 | 
177 | ### trial get
178 | 
179 | Retrieve detailed information about a specific clinical trial.
180 | 
181 | ```bash
182 | biomcp trial get NCT_ID [OPTIONS]
183 | ```
184 | 
185 | **Arguments:**
186 | 
187 | - `NCT_ID`: Clinical trial identifier (e.g., NCT03006926)
188 | 
189 | **Options:**
190 | 
191 | - `--include TEXT`: Specific sections to include (Protocol, Locations, References, Outcomes)
192 | - `--source [ctgov|nci]`: Data source (default: ctgov)
193 | - `--api-key TEXT`: API key for NCI source
194 | 
195 | **Examples:**
196 | 
197 | ```bash
198 | # Get basic trial information
199 | biomcp trial get NCT03006926
200 | 
201 | # Get specific sections
202 | biomcp trial get NCT03006926 --include Protocol --include Locations
203 | 
204 | # Use NCI source
205 | biomcp trial get NCT04280705 --source nci --api-key YOUR_KEY
206 | ```
207 | 
208 | ## Variant Commands
209 | 
210 | For practical examples and workflows, see:
211 | 
212 | - [Get Comprehensive Variant Annotations](../how-to-guides/03-get-comprehensive-variant-annotations.md)
213 | - [Predict Variant Effects with AlphaGenome](../how-to-guides/04-predict-variant-effects-with-alphagenome.md)
214 | 
215 | ### variant search
216 | 
217 | Search MyVariant.info for genetic variant annotations.
218 | 
219 | ```bash
220 | biomcp variant search [OPTIONS]
221 | ```
222 | 
223 | **Options:**
224 | 
225 | - `--gene TEXT`: Gene symbol
226 | - `--hgvs TEXT`: HGVS notation
227 | - `--rsid TEXT`: dbSNP rsID
228 | - `--chromosome TEXT`: Chromosome
229 | - `--start INTEGER`: Genomic start position
230 | - `--end INTEGER`: Genomic end position
231 | - `--assembly [hg19|hg38]`: Genome assembly (default: hg38)
232 | - `--significance TEXT`: Clinical significance
233 | - `--min-frequency FLOAT`: Minimum allele frequency
234 | - `--max-frequency FLOAT`: Maximum allele frequency
235 | - `--min-cadd FLOAT`: Minimum CADD score
236 | - `--polyphen TEXT`: PolyPhen prediction
237 | - `--sift TEXT`: SIFT prediction
238 | - `--sources TEXT`: Data sources to include
239 | - `--limit INTEGER`: Maximum results (default: 10)
240 | - `--no-cbioportal`: Disable cBioPortal integration
241 | 
242 | **Examples:**
243 | 
244 | ```bash
245 | # Search pathogenic BRCA1 variants
246 | biomcp variant search --gene BRCA1 --significance pathogenic
247 | 
248 | # Search by HGVS notation
249 | biomcp variant search --hgvs "NM_007294.4:c.5266dupC"
250 | 
251 | # Filter by frequency and prediction scores
252 | biomcp variant search --gene TP53 --max-frequency 0.01 \
253 |   --min-cadd 20 --polyphen possibly_damaging
254 | 
255 | # Search genomic region
256 | biomcp variant search --chromosome 7 --start 140753336 --end 140753337
257 | ```
258 | 
259 | ### variant get
260 | 
261 | Retrieve detailed information about a specific variant.
262 | 
263 | ```bash
264 | biomcp variant get VARIANT_ID [OPTIONS]
265 | ```
266 | 
267 | **Arguments:**
268 | 
269 | - `VARIANT_ID`: Variant identifier (HGVS, rsID, or genomic)
270 | 
271 | **Options:**
272 | 
273 | - `--json, -j`: Output in JSON format
274 | - `--include-external / --no-external`: Include/exclude external annotations (default: include)
275 | - `--assembly TEXT`: Genome assembly (hg19 or hg38, default: hg19)
276 | 
277 | **Examples:**
278 | 
279 | ```bash
280 | # Get variant by HGVS (defaults to hg19)
281 | biomcp variant get "NM_007294.4:c.5266dupC"
282 | 
283 | # Get variant by rsID
284 | biomcp variant get rs121913529
285 | 
286 | # Specify hg38 assembly
287 | biomcp variant get rs113488022 --assembly hg38
288 | 
289 | # JSON output with hg38
290 | biomcp variant get rs113488022 --json --assembly hg38
291 | 
292 | # Without external annotations
293 | biomcp variant get rs113488022 --no-external
294 | 
295 | # Get variant by genomic coordinates
296 | biomcp variant get "chr17:g.43082434G>A"
297 | ```
298 | 
299 | ### variant predict
300 | 
301 | Predict variant effects using Google DeepMind's AlphaGenome (requires API key).
302 | 
303 | ```bash
304 | biomcp variant predict CHROMOSOME POSITION REFERENCE ALTERNATE [OPTIONS]
305 | ```
306 | 
307 | **Arguments:**
308 | 
309 | - `CHROMOSOME`: Chromosome (e.g., chr7)
310 | - `POSITION`: Genomic position
311 | - `REFERENCE`: Reference allele
312 | - `ALTERNATE`: Alternate allele
313 | 
314 | **Options:**
315 | 
316 | - `--tissue TEXT`: Tissue type(s) using UBERON ontology
317 | - `--interval INTEGER`: Analysis window size (default: 20000)
318 | - `--api-key TEXT`: AlphaGenome API key
319 | 
320 | **Examples:**
321 | 
322 | ```bash
323 | # Basic prediction (requires ALPHAGENOME_API_KEY env var)
324 | biomcp variant predict chr7 140753336 A T
325 | 
326 | # Tissue-specific prediction
327 | biomcp variant predict chr7 140753336 A T \
328 |   --tissue UBERON:0002367  # prostate gland
329 | 
330 | # With per-request API key
331 | biomcp variant predict chr7 140753336 A T --api-key YOUR_KEY
332 | ```
333 | 
334 | ## Gene/Drug/Disease Commands
335 | 
336 | For practical examples using BioThings integration, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#biothings-integration-for-enhanced-search).
337 | 
338 | ### gene get
339 | 
340 | Retrieve gene information from MyGene.info.
341 | 
342 | ```bash
343 | biomcp gene get GENE_NAME
344 | ```
345 | 
346 | **Examples:**
347 | 
348 | ```bash
349 | # Get gene information
350 | biomcp gene get TP53
351 | biomcp gene get BRAF
352 | ```
353 | 
354 | ### drug get
355 | 
356 | Retrieve drug/chemical information from MyChem.info.
357 | 
358 | ```bash
359 | biomcp drug get DRUG_NAME
360 | ```
361 | 
362 | **Examples:**
363 | 
364 | ```bash
365 | # Get drug information
366 | biomcp drug get imatinib
367 | biomcp drug get pembrolizumab
368 | ```
369 | 
370 | ### disease get
371 | 
372 | Retrieve disease information from MyDisease.info.
373 | 
374 | ```bash
375 | biomcp disease get DISEASE_NAME
376 | ```
377 | 
378 | **Examples:**
379 | 
380 | ```bash
381 | # Get disease information
382 | biomcp disease get melanoma
383 | biomcp disease get "non-small cell lung cancer"
384 | ```
385 | 
386 | ## NCI-Specific Commands
387 | 
388 | These commands require an NCI API key. For setup instructions and usage examples, see:
389 | 
390 | - [Authentication and API Keys](../getting-started/03-authentication-and-api-keys.md#nci-clinical-trials-api)
391 | - [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#using-nci-api-advanced-features)
392 | 
393 | ### organization search
394 | 
395 | Search NCI's organization database.
396 | 
397 | ```bash
398 | biomcp organization search [OPTIONS]
399 | ```
400 | 
401 | **Options:**
402 | 
403 | - `--name TEXT`: Organization name
404 | - `--city TEXT`: City location
405 | - `--state TEXT`: State/province
406 | - `--country TEXT`: Country
407 | - `--org-type TEXT`: Organization type
408 | - `--api-key TEXT`: NCI API key
409 | 
410 | **Example:**
411 | 
412 | ```bash
413 | biomcp organization search --name "MD Anderson" \
414 |   --city Houston --state TX --api-key YOUR_KEY
415 | ```
416 | 
417 | ### intervention search
418 | 
419 | Search NCI's intervention database.
420 | 
421 | ```bash
422 | biomcp intervention search [OPTIONS]
423 | ```
424 | 
425 | **Options:**
426 | 
427 | - `--name TEXT`: Intervention name
428 | - `--intervention-type TEXT`: Type (Drug, Device, Procedure, etc.)
429 | - `--api-key TEXT`: NCI API key
430 | 
431 | **Example:**
432 | 
433 | ```bash
434 | biomcp intervention search --name pembrolizumab \
435 |   --intervention-type Drug --api-key YOUR_KEY
436 | ```
437 | 
438 | ### biomarker search
439 | 
440 | Search biomarkers used in clinical trials.
441 | 
442 | ```bash
443 | biomcp biomarker search [OPTIONS]
444 | ```
445 | 
446 | **Options:**
447 | 
448 | - `--gene TEXT`: Gene symbol
449 | - `--biomarker-type TEXT`: Type of biomarker
450 | - `--api-key TEXT`: NCI API key
451 | 
452 | **Example:**
453 | 
454 | ```bash
455 | biomcp biomarker search --gene EGFR \
456 |   --biomarker-type mutation --api-key YOUR_KEY
457 | ```
458 | 
459 | ## Health Command
460 | 
461 | For monitoring API status before bulk operations, see the [Performance Optimizations Guide](../developer-guides/07-performance-optimizations.md).
462 | 
463 | ### health check
464 | 
465 | Monitor API endpoints and system health.
466 | 
467 | ```bash
468 | biomcp health check [OPTIONS]
469 | ```
470 | 
471 | **Options:**
472 | 
473 | - `--apis-only`: Check only API endpoints
474 | - `--system-only`: Check only system resources
475 | - `--verbose, -v`: Show detailed information
476 | 
477 | **Examples:**
478 | 
479 | ```bash
480 | # Full health check
481 | biomcp health check
482 | 
483 | # Check APIs only
484 | biomcp health check --apis-only
485 | 
486 | # Detailed system check
487 | biomcp health check --system-only --verbose
488 | ```
489 | 
490 | ## Output Formats
491 | 
492 | Most commands support both human-readable markdown and machine-readable JSON output:
493 | 
494 | ```bash
495 | # Default markdown output
496 | biomcp article search --gene BRAF
497 | 
498 | # JSON for programmatic use
499 | biomcp article search --gene BRAF --format json
500 | 
501 | # Save to file
502 | biomcp trial search --condition melanoma --format json > trials.json
503 | ```
504 | 
505 | ## Environment Variables
506 | 
507 | Configure default behavior with environment variables:
508 | 
509 | ```bash
510 | # API Keys
511 | export NCI_API_KEY="your-nci-key"
512 | export ALPHAGENOME_API_KEY="your-alphagenome-key"
513 | export CBIO_TOKEN="your-cbioportal-token"
514 | 
515 | # Logging and caching
516 | export BIOMCP_LOG_LEVEL="DEBUG"
517 | export BIOMCP_CACHE_DIR="/path/to/cache"
518 | ```
519 | 
520 | ## Getting Help
521 | 
522 | Every command has a built-in help flag:
523 | 
524 | ```bash
525 | # General help
526 | biomcp --help
527 | 
528 | # Command-specific help
529 | biomcp article search --help
530 | biomcp trial get --help
531 | biomcp variant predict --help
532 | ```
533 | 
534 | ## Tips and Best Practices
535 | 
536 | 1. **Use Official Gene Symbols**: Always use HGNC-approved gene symbols (e.g., "TP53" not "p53")
537 | 
538 | 2. **Combine Filters**: Most commands support multiple filters for precise results:
539 | 
540 |    ```bash
541 |    biomcp article search --gene EGFR --disease "lung cancer" \
542 |      --chemical erlotinib --keyword "resistance"
543 |    ```
544 | 
545 | 3. **Handle Large Results**: Use `--limit` and `--format json` for processing:
546 | 
547 |    ```bash
548 |    biomcp article search --gene BRCA1 --limit 100 --format json | \
549 |      jq '.results[] | {pmid: .pmid, title: .title}'
550 |    ```
551 | 
552 | 4. **Location Searches**: Always provide both latitude and longitude:
553 | 
554 |    ```bash
555 |    # Find trials near Boston
556 |    biomcp trial search --condition cancer \
557 |      --latitude 42.3601 --longitude -71.0589 --distance 25
558 |    ```
559 | 
560 | 5. **Use OR Logic**: The pipe character enables flexible searches:
561 | 
562 |    ```bash
563 |    # Find articles mentioning any form of a variant
564 |    biomcp article search --gene BRAF --keyword "V600E|p.V600E|c.1799T>A"
565 |    ```
566 | 
567 | 6. **Check API Health**: Before bulk operations, verify API status:
568 |    ```bash
569 |    biomcp health check --apis-only
570 |    ```
571 | 
572 | ## Next Steps
573 | 
574 | - Set up [API keys](../getting-started/03-authentication-and-api-keys.md) for enhanced features
575 | - Explore [MCP tools](02-mcp-tools-reference.md) for AI integration
576 | - Read [how-to guides](../how-to-guides/01-find-articles-and-cbioportal-data.md) for complex workflows
577 | 
```
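
The CLI reference above notes that `--format json` is meant for programmatic use and pipes the output through `jq '.results[] | {pmid: .pmid, title: .title}'`. The same extraction can be sketched in Python. This is a minimal sketch, not part of BioMCP: the `results` / `pmid` / `title` shape is assumed from that `jq` filter, the sample payload is invented for illustration, and `summarize_articles` is a hypothetical helper name.

```python
import json

# Hypothetical sample standing in for the output of
# `biomcp article search --gene BRAF --format json`.
# The field names are an assumption taken from the jq filter in the CLI docs.
SAMPLE_OUTPUT = """
{"results": [
  {"pmid": "12345", "title": "BRAF in melanoma"},
  {"pmid": "67890", "title": "BRAF V600E resistance"}
]}
"""


def summarize_articles(raw_json: str) -> list[dict[str, str]]:
    """Reduce each result to its pmid and title, tolerating missing keys."""
    payload = json.loads(raw_json)
    return [
        {"pmid": item.get("pmid", ""), "title": item.get("title", "")}
        for item in payload.get("results", [])
    ]


if __name__ == "__main__":
    # Print one tab-separated line per article.
    for row in summarize_articles(SAMPLE_OUTPUT):
        print(f"{row['pmid']}\t{row['title']}")
```

In practice the input string would come from a saved file (for example one written with `--format json > trials.json`) rather than an inline sample; the real field names should be checked against actual CLI output before relying on them.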

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Changelog
  2 | 
  3 | All notable changes to the BioMCP project will be documented in this file.
  4 | 
  5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
  7 | 
  8 | ## [0.6.7] - 2025-08-13
  9 | 
 10 | ### Fixed
 11 | 
 12 | - **MCP Resource Encoding** - Fixed character encoding error when loading resources on Windows (Issue #63):
 13 |   - Added explicit UTF-8 encoding for reading `instructions.md` and `researcher.md` resource files
 14 |   - Resolves "'charmap' codec can't decode byte 0x8f" error on Windows systems
 15 |   - Ensures cross-platform compatibility for resource loading
 16 | 
 17 | ### Changed
 18 | 
 19 | - **Documentation** - Clarified sequential thinking integration:
 20 |   - Updated `researcher-persona-resource.md` to remove references to external `sequential-thinking` MCP server
 21 |   - Clarified that the `think` tool is built into BioMCP (no external dependencies needed)
 22 |   - Updated configuration examples to show only BioMCP server is required
 23 | 
 24 | ## [0.6.6] - 2025-08-08
 25 | 
 26 | ### Fixed
 27 | 
 28 | - **Windows Compatibility** - Fixed fcntl module import error on Windows (Issue #57):
 29 |   - Added conditional import with try/except for fcntl module
 30 |   - File locking now only applies on Unix systems
 31 |   - Windows users get full functionality without file locking
 32 |   - Refactored cache functions to reduce code complexity
 33 | 
 34 | ### Changed
 35 | 
 36 | - **Documentation** - Updated Docker instructions in README (Issue #58):
 37 |   - Added `docker build -t biomcp:latest .` command before `docker run`
 38 |   - Clarified that biomcp:latest is a local build, not pulled from Docker Hub
 39 | 
 40 | ## [0.6.5] - 2025-08-07
 41 | 
 42 | ### Added
 43 | 
 44 | - **OpenFDA Integration** - Comprehensive FDA regulatory data access:
 45 |   - **12 New MCP Tools** for adverse events, drug labels, device events, drug approvals, recalls, and shortages
 46 |   - Each domain includes searcher and getter tools for flexible data retrieval
 47 |   - Unified search support with `domain="fda_*"` parameters
 48 |   - Enhanced CLI commands for all OpenFDA endpoints
 49 |   - Smart caching and rate limiting for API efficiency
 50 |   - Comprehensive error handling and data validation
 51 | 
 52 | ### Changed
 53 | 
 54 | - Improved API key support across all OpenFDA tools
 55 | - Enhanced documentation for FDA data integration
 56 | 
 57 | ## [0.6.4] - 2025-08-06
 58 | 
 59 | ### Changed
 60 | 
 61 | - **Documentation Restructure** - Major documentation improvements:
 62 |   - Simplified navigation structure for better user experience
 63 |   - Fixed code block formatting and layout issues
 64 |   - Removed unnecessary sections and redundant content
 65 |   - Improved overall documentation readability and organization
 66 |   - Enhanced mobile responsiveness
 67 | 
 68 | ## [0.6.3] - 2025-08-05
 69 | 
 70 | ### Added
 71 | 
 72 | - **NCI Clinical Trials Search API Integration** - Enhanced cancer trial search capabilities:
 73 |   - Dual source support for trial search/getter tools (ClinicalTrials.gov + NCI)
 74 |   - NCI API key handling via `NCI_API_KEY` environment variable or parameter
 75 |   - Advanced trial filters: biomarkers, prior therapy, brain metastases acceptance
 76 |   - **6 New MCP Tools** for NCI-specific searches:
 77 |     - `nci_organization_searcher` / `nci_organization_getter`: Cancer centers, hospitals, research institutions
 78 |     - `nci_intervention_searcher` / `nci_intervention_getter`: Drugs, devices, procedures, biologicals
 79 |     - `nci_biomarker_searcher`: Trial eligibility biomarkers (reference genes, branches)
 80 |     - `nci_disease_searcher`: NCI's controlled vocabulary of cancer conditions
 81 |   - **OR Query Support**: All NCI endpoints support OR queries (e.g., "PD-L1 OR CD274")
 82 |   - Real-time access to NCI's curated cancer trials database
 83 |   - Automatic cBioPortal integration for gene searches
 84 |   - Proper NCI parameter mapping (org_city, org_state_or_province, etc.)
 85 |   - Comprehensive error handling for Elasticsearch limits
 86 | 
 87 | ### Changed
 88 | 
 89 | - Enhanced unified search router to properly handle NCI domains
 90 | - Trial search/getter tools now accept `source` parameter ("clinicaltrials" or "nci")
 91 | - Improved domain-specific search logic for query+domain combinations
 92 | 
 93 | ## [0.6.2] - 2025-08-05
 94 | 
 95 | Note: Initial NCI integration release - see v0.6.3 for the full implementation.
 96 | 
 97 | ## [0.6.1] - 2025-08-03
 98 | 
 99 | ### Fixed
100 | 
101 | - **Dependency Management** - Fixed alphagenome dependency to enable PyPI publishing
102 |   - Made alphagenome an optional dependency
103 |   - Resolved packaging conflicts for distribution
104 | 
105 | ## [0.6.0] - 2025-08-02
106 | 
107 | ### Added
108 | 
109 | - **Streamable HTTP Transport Protocol** - Modern MCP transport implementation:
110 |   - Single `/mcp` endpoint for all communication
111 |   - Session management with persistent session IDs
112 |   - Event resumption support for reliability
113 |   - On-demand streaming for long operations
114 |   - Configurable HTTP server modes (STDIO, HTTP, Worker)
115 |   - Better scalability for cloud deployments
116 |   - Full MCP specification compliance (2025-03-26)
117 | 
118 | ### Changed
119 | 
120 | - Improved Cloudflare Worker integration
121 | - Enhanced transport layer with comprehensive testing
122 | - Updated deployment configurations for HTTP mode
123 | 
124 | ## [0.5.0] - 2025-07-31
125 | 
126 | ### Added
127 | 
128 | - **BioThings API Integration** - Real-time biomedical data access:
129 |   - **MyGene.info**: Gene annotations, summaries, aliases, and database links
130 |   - **MyChem.info**: Drug/chemical information, identifiers, mechanisms of action
131 |   - **MyDisease.info**: Disease definitions, synonyms, MONDO/DOID mappings
132 |   - **3 New MCP Tools**: `gene_getter`, `drug_getter`, `disease_getter`
133 |   - Automatic synonym expansion for enhanced trial searches
134 |   - Batch optimization for multiple gene lookups
135 |   - Live data fetching ensures current information
136 | 
137 | ### Changed
138 | 
139 | - Enhanced unified search capabilities with BioThings data
140 | - Expanded query language support for gene, drug, and disease queries
141 | - Improved trial searches with automatic disease synonym expansion
142 | 
143 | ## [0.4.7] - 2025-07-30
144 | 
145 | ### Added
146 | 
147 | - **BioThings Integration** for real-time biomedical data access:
148 |   - **New MCP Tools** (3 tools added, total now 17):
149 |     - `gene_getter`: Query MyGene.info for gene information (symbols, names, summaries)
150 |     - `drug_getter`: Query MyChem.info for drug/chemical data (formulas, indications, mechanisms)
151 |     - `disease_getter`: Query MyDisease.info for disease information (definitions, synonyms, ontologies)
152 |   - **Unified Search/Fetch Enhancement**:
153 |     - Added `gene`, `drug`, `disease` as new searchable domains alongside article, trial, variant
154 |     - Integrated into unified search syntax: `search(domain="gene", keywords=["BRAF"])`
155 |     - Query language support: `gene:BRAF`, `drug:pembrolizumab`, `disease:melanoma`
156 |     - Full fetch support: `fetch(domain="drug", id="DB00945")`
157 |   - **Clinical Trial Enhancement**:
158 |     - Automatic disease synonym expansion for trial searches
159 |     - Real-time synonym lookup from MyDisease.info
160 |     - Example: searching for "GIST" automatically includes "gastrointestinal stromal tumor"
161 |   - **Smart Caching & Performance**:
162 |     - Batch operations for multiple gene/drug lookups
163 |     - Intelligent caching with TTL (gene: 24h, drug: 48h, disease: 72h)
164 |     - Rate limiting to respect API guidelines
165 | 
166 | ### Changed
167 | 
168 | - Trial search now expands disease terms by default (disable with `expand_synonyms=False`)
169 | - Enhanced error handling for BioThings API responses
170 | - Improved network reliability with automatic retries
171 | 
172 | ## [0.4.6] - 2025-07-09
173 | 
174 | ### Added
175 | 
176 | - MkDocs documentation deployment
177 | 
178 | ## [0.4.5] - 2025-07-09
179 | 
180 | ### Added
181 | 
182 | - Unified search and fetch tools following OpenAI MCP guidelines
183 | - Additional variant sources (TCGA/GDC, 1000 Genomes) enabled by default in fetch operations
184 | - Additional article sources (bioRxiv, medRxiv, Europe PMC) enabled by default in search operations
185 | 
186 | ### Changed
187 | 
188 | - Consolidated 10 separate MCP tools into 2 unified tools (search and fetch)
189 | - Updated response formats to comply with OpenAI MCP specifications
190 | 
191 | ### Fixed
192 | 
193 | - OpenAI MCP compliance issues to enable integration
194 | 
195 | ## [0.4.4] - 2025-07-08
196 | 
197 | ### Added
198 | 
199 | - **Performance Optimizations**:
200 |   - Connection pooling with event loop lifecycle management (30% latency reduction)
201 |   - Parallel test execution with pytest-xdist (5x faster test runs)
202 |   - Request batching for cBioPortal API calls (80% fewer API calls)
203 |   - Smart caching with LRU eviction and fast hash keys (10x faster cache operations)
204 |   - Major performance improvements achieving ~3x faster test execution (120s → 42s)
205 | 
206 | ### Fixed
207 | 
208 | - Non-critical ASGI errors suppressed
209 | - Performance issues in article_searcher
210 | 
211 | ## [0.4.3] - 2025-07-08
212 | 
213 | ### Added
214 | 
215 | - Complete HTTP centralization and improved code quality
216 | - Comprehensive constants module for better maintainability
217 | - Domain-specific handlers for result formatting
218 | - Parameter parser for robust input validation
219 | - Custom exception hierarchy for better error handling
220 | 
221 | ### Changed
222 | 
223 | - Refactored domain handlers to use static methods for better performance
224 | - Enhanced type safety throughout the codebase
225 | - Refactored complex functions to meet code quality standards
226 | 
227 | ### Fixed
228 | 
229 | - Type errors in router.py for full mypy compliance
230 | - Complex functions exceeding cyclomatic complexity thresholds
231 | 
232 | ## [0.4.2] - 2025-07-07
233 | 
234 | ### Added
235 | 
236 | - Europe PMC DOI support for article fetching
237 | - Pagination support for Europe PMC searches
238 | - OR logic support for variant notation searches (e.g., R173 vs Arg173 vs p.R173)
239 | 
240 | ### Changed
241 | 
242 | - Enhanced variant notation search capabilities
243 | 
244 | ## [0.4.1] - 2025-07-03
245 | 
246 | ### Added
247 | 
248 | - AlphaGenome as an optional dependency to predict variant effects on gene regulation
249 | - Per-request API key support for AlphaGenome integration
250 | - AI predictions to complement existing database lookups
251 | 
252 | ### Security
253 | 
254 | - Comprehensive sanitization in Cloudflare Worker to prevent sensitive data logging
255 | - Secure usage in hosted environments where users provide their own keys
256 | 
257 | ## [0.4.0] - 2025-06-27
258 | 
259 | ### Added
260 | 
261 | - **cBioPortal Integration** for article searches:
262 |   - Automatic gene-level mutation summaries when searching with gene parameters
263 |   - Mutation-specific search capabilities (e.g., BRAF V600E, SRSF2 F57\*)
264 |   - Dynamic cancer type resolution using cBioPortal API
265 |   - Smart caching and rate limiting for optimal performance
266 | 
267 | ## [0.3.3] - 2025-06-20
268 | 
269 | ### Changed
270 | 
271 | - Release workflow updates
272 | 
273 | ## [0.3.2] - 2025-06-20
274 | 
275 | ### Changed
276 | 
277 | - Release workflow updates
278 | 
279 | ## [0.3.1] - 2025-06-20
280 | 
281 | ### Fixed
282 | 
283 | - Build and release process improvements
284 | 
285 | ## [0.3.0] - 2025-06-20
286 | 
287 | ### Added
288 | 
289 | - Expanded search capabilities
290 | - Integration tests for MCP server functionality
291 | - Utility modules for gene validation, mutation filtering, and request caching
292 | 
293 | ## [0.2.1] - 2025-06-19
294 | 
295 | ### Added
296 | 
297 | - Remote MCP policies
298 | 
299 | ## [0.2.0] - 2025-06-17
300 | 
301 | ### Added
302 | 
303 | - Sequential thinking tool for systematic problem-solving
304 | - Session-based thinking to replace global state
305 | - Extracted router handlers to reduce complexity
306 | 
307 | ### Changed
308 | 
309 | - Replaced global state in thinking module with session management
310 | 
311 | ### Removed
312 | 
313 | - Global state from sequential thinking module
314 | 
315 | ### Fixed
316 | 
317 | - Race conditions in sequential thinking with concurrent usage
318 | 
319 | ## [0.1.11] - 2025-06-12
320 | 
321 | ### Added
322 | 
323 | - Advanced eligibility criteria filters to clinical trial search
324 | 
325 | ## [0.1.10] - 2025-05-21
326 | 
327 | ### Added
328 | 
329 | - OAuth support on the Cloudflare worker via Stytch
330 | 
331 | ## [0.1.9] - 2025-05-17
332 | 
333 | ### Fixed
334 | 
335 | - Refactor: Bump minimum Python version to 3.10
336 | 
337 | ## [0.1.8] - 2025-05-14
338 | 
339 | ### Fixed
340 | 
341 | - Article searcher fixes
342 | 
343 | ## [0.1.7] - 2025-05-07
344 | 
345 | ### Added
346 | 
347 | - Remote OAuth support
348 | 
349 | ## [0.1.6] - 2025-05-05
350 | 
351 | ### Added
352 | 
353 | - Updates to handle cursor integration
354 | 
355 | ## [0.1.5] - 2025-05-01
356 | 
357 | ### Added
358 | 
359 | - Updates to smithery yaml to account for object types needed for remote calls
360 | - Documentation and Lzyank updates
361 | 
362 | ## [0.1.3] - 2025-05-01
363 | 
364 | ### Added
365 | 
366 | - Health check functionality to assist with API call issues
367 | - System resources and network & environment information gathering
368 | - Remote MCP capability via Cloudflare using SSE
369 | 
370 | ## [0.1.2] - 2025-04-18
371 | 
372 | ### Added
373 | 
374 | - Researcher persona and BioMCP v0.1.2 release
375 | - Deep Researcher Persona blog post
376 | - Researcher persona video demo
377 | 
378 | ## [0.1.1] - 2025-04-14
379 | 
380 | ### Added
381 | 
382 | - Claude Desktop and MCP Inspector tutorials
383 | - Improved Claude Desktop Tutorial for BioMCP
384 | - Troubleshooting guide and blog post
385 | 
386 | ### Fixed
387 | 
388 | - Log tool names as comma separated string
389 | - Server hanging issues
390 | - Error responses in variant count check
391 | 
392 | ## [0.1.0] - 2025-04-08
393 | 
394 | ### Added
395 | 
396 | - Initial release of BioMCP
397 | - PubMed/PubTator3 article search integration
398 | - ClinicalTrials.gov trial search integration
399 | - MyVariant.info variant search integration
400 | - CLI interface for direct usage
401 | - MCP server for AI assistant integration
402 | - Cloudflare Worker support for remote deployment
403 | - Comprehensive test suite with pytest-bdd
404 | - GenomOncology introduction
405 | - Blog post on AI-assisted clinical trial search
406 | - MacOS troubleshooting guide
407 | 
408 | ### Security
409 | 
410 | - API keys properly externalized
411 | - Input validation using Pydantic models
412 | - Safe string handling in all API calls
413 | 
414 | [Unreleased]: https://github.com/genomoncology/biomcp/compare/v0.6.7...HEAD
415 | [0.6.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.6
416 | [0.6.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.5
417 | [0.6.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.4
418 | [0.6.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.3
419 | [0.6.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.2
420 | [0.6.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.1
421 | [0.6.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.0
422 | [0.5.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.5.0
423 | [0.4.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.7
424 | [0.4.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.6
425 | [0.4.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.5
426 | [0.4.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.4
427 | [0.4.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.3
428 | [0.4.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.2
429 | [0.4.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.1
430 | [0.4.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.0
431 | [0.3.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.3
432 | [0.3.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.2
433 | [0.3.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.1
434 | [0.3.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.0
435 | [0.2.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.2.1
436 | [0.2.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.2.0
437 | [0.1.11]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.11
438 | [0.1.10]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.10
439 | [0.1.9]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.9
440 | [0.1.8]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.8
441 | [0.1.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.7
442 | [0.1.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.6
443 | [0.1.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.5
444 | [0.1.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.3
445 | [0.1.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.2
446 | [0.1.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.1
447 | [0.1.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.0
448 | 
```
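
The 0.6.6 and 0.6.7 entries above describe two portability fixes: importing `fcntl` conditionally so that file locking applies only on Unix, and reading files with explicit UTF-8 encoding. The pattern can be sketched roughly as follows. This is an illustration under assumptions, not BioMCP's actual cache code: `locked_file` is a hypothetical helper, and the lock is a plain advisory `flock`.

```python
import os
import tempfile
from contextlib import contextmanager

try:
    import fcntl  # available on Unix only
except ImportError:  # e.g. on Windows
    fcntl = None


@contextmanager
def locked_file(path, mode="a+"):
    """Open `path` as UTF-8 text and hold an exclusive lock where supported.

    On platforms without `fcntl` the file is opened without any locking.
    """
    with open(path, mode, encoding="utf-8") as handle:
        if fcntl is not None:
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            yield handle
        finally:
            if fcntl is not None:
                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical cache file in a throwaway directory.
    path = os.path.join(tempfile.mkdtemp(), "cache.txt")
    with locked_file(path) as handle:
        handle.write("cached\n")
    with open(path, encoding="utf-8") as handle:
        print(handle.read().strip())
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which is the kind of mismatch behind the `'charmap' codec` error mentioned in the 0.6.7 entry.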

--------------------------------------------------------------------------------
/docs/developer-guides/02-contributing-and-testing.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Contributing and Testing Guide
  2 | 
  3 | This guide covers how to contribute to BioMCP and run the comprehensive test suite.
  4 | 
  5 | ## Getting Started
  6 | 
  7 | ### Prerequisites
  8 | 
  9 | - Python 3.10 or higher
 10 | - [uv](https://docs.astral.sh/uv/) package manager
 11 | - Git
 12 | - Node.js (for MCP Inspector)
 13 | 
 14 | ### Initial Setup
 15 | 
 16 | 1. **Fork and clone the repository:**
 17 | 
 18 | ```bash
 19 | git clone https://github.com/YOUR_USERNAME/biomcp.git
 20 | cd biomcp
 21 | ```
 22 | 
 23 | 2. **Install dependencies and set up the environment:**
 24 | 
 25 | ```bash
 26 | # Recommended: Use make for complete setup
 27 | make install
 28 | 
 29 | # Alternative: Manual setup
 30 | uv sync --all-extras
 31 | uv run pre-commit install
 32 | ```
 33 | 
 34 | 3. **Verify installation:**
 35 | 
 36 | ```bash
 37 | # Run server
 38 | biomcp run
 39 | 
 40 | # Run tests
 41 | make test-offline
 42 | ```
 43 | 
 44 | ## Development Workflow
 45 | 
 46 | ### 1. Create Feature Branch
 47 | 
 48 | ```bash
 49 | git checkout -b feature/your-feature-name
 50 | ```
 51 | 
 52 | ### 2. Make Changes
 53 | 
 54 | Follow these principles:
 55 | 
 56 | - **Keep changes minimal and focused**
 57 | - **Follow existing code patterns**
 58 | - **Add tests for new functionality**
 59 | - **Update documentation as needed**
 60 | 
 61 | ### 3. Quality Checks
 62 | 
 63 | **MANDATORY: Run these before considering work complete:**
 64 | 
 65 | ```bash
 66 | # Step 1: Code quality checks
 67 | make check
 68 | 
 69 | # This runs:
 70 | # - ruff check (linting)
 71 | # - ruff format (code formatting)
 72 | # - mypy (type checking)
 73 | # - pre-commit hooks
 74 | # - deptry (dependency analysis)
 75 | ```
 76 | 
 77 | ### 4. Run Tests
 78 | 
 79 | ```bash
 80 | # Step 2: Run appropriate test suite
 81 | make test          # Full suite (requires network)
 82 | # OR
 83 | make test-offline  # Unit tests only (no network)
 84 | ```
 85 | 
 86 | **Both quality checks and tests MUST pass before submitting changes.**
 87 | 
 88 | ## Testing Strategy
 89 | 
 90 | ### Test Categories
 91 | 
 92 | #### Unit Tests
 93 | 
 94 | - Fast, reliable tests without external dependencies
 95 | - Mock all external API calls
 96 | - Always run in CI/CD
 97 | 
 98 | ```python
 99 | # Example unit test
100 | @patch('httpx.AsyncClient.get')
101 | async def test_article_search(mock_get):
102 |     mock_get.return_value.json.return_value = {"results": [...]}
103 |     result = await article_searcher(genes=["BRAF"])
104 |     assert len(result) > 0
105 | ```
106 | 
107 | #### Integration Tests
108 | 
109 | - Test real API interactions
110 | - May fail due to network/API issues
111 | - Run separately in CI with `continue-on-error`
112 | 
113 | ```python
114 | # Example integration test
115 | @pytest.mark.integration
116 | async def test_real_pubmed_search():
117 |     result = await article_searcher(genes=["TP53"], limit=5)
118 |     assert 0 < len(result) <= 5  # live result counts can vary
119 |     assert all("TP53" in r.text for r in result)
120 | ```
121 | 
122 | ### Running Tests
123 | 
124 | #### Command Options
125 | 
126 | ```bash
127 | # Run all tests
128 | make test
129 | uv run python -m pytest
130 | 
131 | # Run only unit tests (fast, offline)
132 | make test-offline
133 | uv run python -m pytest -m "not integration"
134 | 
135 | # Run only integration tests
136 | uv run python -m pytest -m "integration"
137 | 
138 | # Run specific test file
139 | uv run python -m pytest tests/tdd/test_article_search.py
140 | 
141 | # Run with coverage
142 | make cov
143 | uv run python -m pytest --cov --cov-report=html
144 | 
145 | # Run tests verbosely
146 | uv run python -m pytest -v
147 | 
148 | # Run tests and stop on first failure
149 | uv run python -m pytest -x
150 | ```
151 | 
152 | #### Test Discovery
153 | 
154 | Tests are organized in:
155 | 
156 | - `tests/tdd/` - Unit and integration tests
157 | - `tests/bdd/` - Behavior-driven development tests
158 | - `tests/data/` - Test fixtures and sample data
159 | 
160 | ### Writing Tests
161 | 
162 | #### Test Structure
163 | 
164 | ```python
165 | import pytest
166 | from unittest.mock import MagicMock, patch
167 | from biomcp.articles import article_searcher
168 | 
169 | class TestArticleSearch:
170 |     """Test article search functionality"""
171 | 
172 |     @pytest.fixture
173 |     def mock_response(self):
174 |         """Sample API response"""
175 |         return {
176 |             "results": [
177 |                 {"pmid": "12345", "title": "BRAF in melanoma"}
178 |             ]
179 |         }
180 | 
181 |     @patch('httpx.AsyncClient.get')
182 |     async def test_basic_search(self, mock_get, mock_response):
183 |         """Test basic article search"""
184 |         # Setup: the awaited response is a plain MagicMock so .json() stays synchronous
185 |         mock_get.return_value = MagicMock()
186 |         mock_get.return_value.json.return_value = mock_response
187 | 
188 |         # Execute
189 |         result = await article_searcher(genes=["BRAF"])
190 | 
191 |         # Assert
192 |         assert len(result) == 1
193 |         assert "BRAF" in result[0].title
194 | ```
195 | 
196 | #### Async Testing
197 | 
198 | ```python
199 | import pytest
200 | from httpx import AsyncClient
201 | 
202 | @pytest.mark.asyncio
203 | async def test_async_function():
204 |     """Test async functionality"""
205 |     result = await some_async_function()
206 |     assert result is not None
207 | 
208 | # Or use pytest-asyncio fixtures
209 | @pytest.fixture
210 | async def async_client():
211 |     async with AsyncClient() as client:
212 |         yield client
213 | ```
214 | 
215 | #### Mocking External APIs
216 | 
217 | ```python
218 | from unittest.mock import patch, MagicMock
219 | 
220 | @patch('biomcp.integrations.pubmed.search')
221 | def test_with_mock(mock_search):
222 |     # Configure mock
223 |     mock_search.return_value = [{
224 |         "pmid": "12345",
225 |         "title": "Test Article"
226 |     }]
227 | 
228 |     # Test code that uses the mocked function
229 |     result = search_articles("BRAF")
230 | 
231 |     # Verify mock was called correctly
232 |     mock_search.assert_called_once_with("BRAF")
233 | ```
234 | 
235 | ## MCP Inspector Testing
236 | 
237 | The MCP Inspector provides an interactive way to test MCP tools.
238 | 
239 | ### Setup
240 | 
241 | ```bash
242 | # Install inspector
243 | npm install -g @modelcontextprotocol/inspector
244 | 
245 | # Run BioMCP with inspector
246 | make inspector
247 | # OR
248 | npx @modelcontextprotocol/inspector uv run --with biomcp-python biomcp run
249 | ```
250 | 
251 | ### Testing Tools
252 | 
253 | 1. **Connect to server** in the inspector UI
254 | 2. **View available tools** in the tools panel
255 | 3. **Test individual tools** with sample inputs
256 | 
257 | #### Example Tool Tests
258 | 
259 | ```javascript
260 | // Test article search
261 | {
262 |   "tool": "article_searcher",
263 |   "arguments": {
264 |     "genes": ["BRAF"],
265 |     "diseases": ["melanoma"],
266 |     "limit": 5
267 |   }
268 | }
269 | 
270 | // Test trial search
271 | {
272 |   "tool": "trial_searcher",
273 |   "arguments": {
274 |     "conditions": ["lung cancer"],
275 |     "recruiting_status": "OPEN",
276 |     "limit": 10
277 |   }
278 | }
279 | 
280 | // Test think tool (ALWAYS first!)
281 | {
282 |   "tool": "think",
283 |   "arguments": {
284 |     "thought": "Planning to search for BRAF mutations",
285 |     "thoughtNumber": 1,
286 |     "nextThoughtNeeded": true
287 |   }
288 | }
289 | ```
290 | 
291 | ### Debugging with Inspector
292 | 
293 | 1. **Check request/response**: View raw MCP messages
294 | 2. **Verify parameters**: Ensure correct argument format
295 | 3. **Test error handling**: Try invalid inputs
296 | 4. **Monitor performance**: Check response times
297 | 
298 | ## Code Style and Standards
299 | 
300 | ### Python Style
301 | 
302 | - **Formatter**: ruff (line length: 79)
303 | - **Type hints**: Required for all functions
304 | - **Docstrings**: Google style for all public functions
305 | 
306 | ```python
307 | def search_articles(
308 |     genes: list[str],
309 |     limit: int = 10
310 | ) -> list[Article]:
311 |     """Search for articles by gene names.
312 | 
313 |     Args:
314 |         genes: List of gene symbols to search
315 |         limit: Maximum number of results
316 | 
317 |     Returns:
318 |         List of Article objects
319 | 
320 |     Raises:
321 |         ValueError: If genes list is empty
322 |     """
323 |     if not genes:
324 |         raise ValueError("Genes list cannot be empty")
325 |     # Implementation...
326 | ```
327 | 
328 | ### Pre-commit Hooks
329 | 
330 | Automatically run on commit:
331 | 
332 | - ruff formatting
333 | - ruff linting
334 | - mypy type checking
335 | - File checks (YAML, TOML, merge conflicts)
336 | 
337 | Manual run:
338 | 
339 | ```bash
340 | uv run pre-commit run --all-files
341 | ```
342 | 
343 | ## Continuous Integration
344 | 
345 | ### GitHub Actions Workflow
346 | 
347 | The CI pipeline runs:
348 | 
349 | 1. **Linting and Formatting**
350 | 2. **Type Checking**
351 | 3. **Unit Tests** (required to pass)
352 | 4. **Integration Tests** (allowed to fail)
353 | 5. **Coverage Report**
354 | 
355 | ### CI Configuration
356 | 
357 | ```yaml
358 | # .github/workflows/test.yml structure
359 | jobs:
360 |   test:
361 |     strategy:
362 |       matrix:
363 |         python-version: ["3.10", "3.11", "3.12"]
364 |     steps:
365 |       - uses: actions/checkout@v4
366 |       - uses: astral-sh/setup-uv@v2
367 |       - run: make check
368 |       - run: make test-offline
369 | ```
370 | 
371 | ## Debugging and Troubleshooting
372 | 
373 | ### Common Issues
374 | 
375 | #### Test Failures
376 | 
377 | ```bash
378 | # Run failed test with more details
379 | uv run python -m pytest -vvs tests/path/to/test.py::test_name
380 | 
381 | # Debug with print statements
382 | uv run python -m pytest -s  # Don't capture stdout
383 | 
384 | # Use debugger
385 | uv run python -m pytest --pdb  # Drop to debugger on failure
386 | ```
387 | 
388 | #### Integration Test Issues
389 | 
390 | Common causes:
391 | 
392 | - **Rate limiting**: Add delays or use mocks
393 | - **API changes**: Update test expectations
394 | - **Network issues**: Check connectivity
395 | - **API keys**: Ensure valid keys for NCI tests
396 | 
397 | ## Integration Testing
398 | 
399 | ### Overview
400 | 
401 | BioMCP includes integration tests that make real API calls to external services. These tests verify that our integrations work correctly with live data but can be affected by API availability, rate limits, and data changes.
402 | 
403 | ### Running Integration Tests
404 | 
405 | ```bash
406 | # Run all tests including integration
407 | make test
408 | 
409 | # Run only integration tests
410 | uv run python -m pytest -m integration
411 | 
412 | # Skip integration tests
413 | uv run python -m pytest -m "not integration"
414 | ```
415 | 
416 | ### Handling Flaky Tests
417 | 
418 | Integration tests may fail or skip for various reasons:
419 | 
420 | 1. **API Unavailability**
421 | 
422 |    - **Symptom**: Tests skip with "API returned no data" message
423 |    - **Cause**: The external service is down or experiencing issues
424 |    - **Action**: Re-run tests later or check service status
425 | 
426 | 2. **Rate Limiting**
427 | 
428 |    - **Symptom**: Multiple test failures after initial successes
429 |    - **Cause**: Too many requests in a short time
430 |    - **Action**: Run tests with delays between them or use API tokens
431 | 
432 | 3. **Data Changes**
433 |    - **Symptom**: Assertions about specific data fail
434 |    - **Cause**: The external data has changed (e.g., new mutations discovered)
435 |    - **Action**: Update tests to use more flexible assertions
436 | 
437 | ### Integration Test Design Principles
438 | 
439 | #### 1. Graceful Skipping
440 | 
441 | Tests should skip rather than fail when:
442 | 
443 | - API returns no data
444 | - Service is unavailable
445 | - Rate limits are hit
446 | 
447 | ```python
448 | if not data or data.total_count == 0:
449 |     pytest.skip("API returned no data - possible service issue")
450 | ```
451 | 
452 | #### 2. Flexible Assertions
453 | 
454 | Avoid assertions on specific data values that might change:
455 | 
456 | ❌ **Bad**: Expecting exact mutation counts
457 | 
458 | ```python
459 | assert summary.total_mutations == 1234
460 | ```
461 | 
462 | ✅ **Good**: Checking data exists and has reasonable structure
463 | 
464 | ```python
465 | assert summary.total_mutations > 0
466 | assert hasattr(summary, 'hotspots')
467 | ```
468 | 
469 | #### 3. Retry Logic
470 | 
471 | For critical tests, implement retry with delay:
472 | 
473 | ```python
474 | async def fetch_with_retry(client, resource, max_attempts=2, delay=1.0):
475 |     for attempt in range(max_attempts):
476 |         result = await client.get(resource)
477 |         if result and result.data:
478 |             return result
479 |         if attempt < max_attempts - 1:
480 |             await asyncio.sleep(delay)
481 |     return None
482 | ```
483 | 
484 | #### 4. Cache Management
485 | 
486 | Clear caches before tests to ensure fresh data:
487 | 
488 | ```python
489 | from biomcp.utils.request_cache import clear_cache
490 | await clear_cache()
491 | ```
492 | 
493 | ### Common Integration Test Patterns
494 | 
495 | #### Testing Search Functionality
496 | 
497 | ```python
498 | @pytest.mark.integration
499 | async def test_gene_search(self):
500 |     client = SearchClient()
501 |     results = await client.search("BRAF")
502 | 
503 |     # Flexible assertions
504 |     assert results is not None
505 |     if results.count > 0:
506 |         assert results.items[0].gene_symbol == "BRAF"
507 |     else:
508 |         pytest.skip("No results returned - API may be unavailable")
509 | ```
510 | 
511 | #### Testing Data Retrieval
512 | 
513 | ```python
514 | @pytest.mark.integration
515 | async def test_variant_details(self):
516 |     client = VariantClient()
517 |     variant = await client.get_variant("rs121913529")
518 | 
519 |     if not variant:
520 |         pytest.skip("Variant not found - may have been removed from database")
521 | 
522 |     # Check structure, not specific values
523 |     assert hasattr(variant, 'chromosome')
524 |     assert hasattr(variant, 'position')
525 | ```
526 | 
527 | ### Debugging Failed Integration Tests
528 | 
529 | 1. **Enable Debug Logging**
530 | 
531 |    ```bash
532 |    BIOMCP_LOG_LEVEL=DEBUG pytest tests/integration/test_failing.py -v
533 |    ```
534 | 
535 | 2. **Check API Status**
536 | 
537 |    - PubMed: https://www.ncbi.nlm.nih.gov/home/about/website-updates/
538 |    - ClinicalTrials.gov: https://clinicaltrials.gov/about/announcements
539 |    - cBioPortal: https://www.cbioportal.org/
540 | 
541 | 3. **Inspect Response Data**
542 |    ```python
543 |    if not expected_data:
544 |        print(f"Unexpected response: {response}")
545 |        pytest.skip("Data structure changed")
546 |    ```
547 | 
548 | ### Environment Variables for Testing
549 | 
550 | #### API Tokens
551 | 
552 | Some services provide higher rate limits with authentication:
553 | 
554 | ```bash
555 | export CBIO_TOKEN="your-token-here"
556 | export PUBMED_API_KEY="your-key-here"
557 | ```
558 | 
559 | #### Offline Mode
560 | 
561 | Test offline behavior:
562 | 
563 | ```bash
564 | export BIOMCP_OFFLINE=true
565 | pytest tests/
566 | ```
567 | 
568 | #### Custom Timeouts
569 | 
570 | Adjust timeouts for slow connections:
571 | 
572 | ```bash
573 | export BIOMCP_REQUEST_TIMEOUT=60
574 | pytest tests/integration/
575 | ```
576 | 
577 | ### CI/CD Considerations
578 | 
579 | 1. **Separate Test Runs**
580 | 
581 |    ```yaml
582 |    - name: Unit Tests
583 |      run: pytest -m "not integration"
584 | 
585 |    - name: Integration Tests
586 |      run: pytest -m integration
587 |      continue-on-error: true
588 |    ```
589 | 
590 | 2. **Scheduled Runs**
591 | 
592 |    ```yaml
593 |    on:
594 |      schedule:
595 |        - cron: "0 6 * * *" # Daily at 06:00 UTC
596 |    ```
597 | 
598 | 3. **Result Monitoring**: Track integration test success rates over time to identify patterns.
599 | 
600 | ### Integration Testing Best Practices
601 | 
602 | 1. **Keep integration tests focused** - Test integration points, not business logic
603 | 2. **Use reasonable timeouts** - Don't wait forever for slow APIs
604 | 3. **Document expected failures** - Add comments explaining why tests might skip
605 | 4. **Monitor external changes** - Subscribe to API change notifications
606 | 5. **Provide escape hatches** - Allow skipping integration tests when needed
607 | 
608 | #### Type Checking Errors
609 | 
610 | ```bash
611 | # Check specific file
612 | uv run mypy src/biomcp/specific_file.py
613 | 
614 | # Ignore a specific error by adding this inline comment to the offending Python line:
615 | # type: ignore[error-code]
616 | 
617 | # Show error codes
618 | uv run mypy --show-error-codes
619 | ```
620 | 
621 | ### Performance Testing
622 | 
623 | ```python
624 | import time
625 | import pytest
626 | 
627 | @pytest.mark.performance
628 | def test_search_performance():
629 |     """Ensure search completes within time limit"""
630 |     start = time.perf_counter()
631 |     result = search_articles("TP53", limit=100)
632 |     duration = time.perf_counter() - start
633 | 
634 |     assert duration < 5.0  # Should complete within 5 seconds
635 |     assert len(result) <= 100  # limit is an upper bound, not a guaranteed count
636 | ```
637 | 
638 | ## Submitting Changes
639 | 
640 | ### Pull Request Process
641 | 
642 | 1. **Ensure all checks pass:**
643 | 
644 | ```bash
645 | make check && make test
646 | ```
647 | 
648 | 2. **Update documentation** if needed
649 | 
650 | 3. **Commit with clear message:**
651 | 
652 | ```bash
653 | git add .
654 | git commit -m "feat: add support for variant batch queries
655 | 
656 | - Add batch_variant_search function
657 | - Update tests for batch functionality
658 | - Document batch size limits"
659 | ```
660 | 
661 | 4. **Push to your fork:**
662 | 
663 | ```bash
664 | git push origin feature/your-feature-name
665 | ```
666 | 
667 | 5. **Create Pull Request** with:
668 |    - Clear description of changes
669 |    - Link to related issues
670 |    - Test results summary
671 | 
672 | ### Code Review Guidelines
673 | 
674 | Your PR will be reviewed for:
675 | 
676 | - **Code quality** and style consistency
677 | - **Test coverage** for new features
678 | - **Documentation** updates
679 | - **Performance** impact
680 | - **Security** considerations
681 | 
682 | ## Best Practices
683 | 
684 | ### DO:
685 | 
686 | - Write tests for new functionality
687 | - Follow existing patterns
688 | - Keep PRs focused and small
689 | - Update documentation
690 | - Run full test suite locally
691 | 
692 | ### DON'T:
693 | 
694 | - Skip tests to "save time"
695 | - Mix unrelated changes in one PR
696 | - Ignore linting warnings
697 | - Commit sensitive data
698 | - Break existing functionality
699 | 
700 | ## Additional Resources
701 | 
702 | - [MCP Documentation](https://modelcontextprotocol.io)
703 | - [pytest Documentation](https://docs.pytest.org)
704 | - [Type Hints Guide](https://mypy.readthedocs.io)
705 | - [Ruff Documentation](https://docs.astral.sh/ruff)
706 | 
707 | ## Getting Help
708 | 
709 | - **GitHub Issues**: Report bugs or request features
710 | - **Issues**: Ask questions or share ideas
711 | - **Pull Requests**: Submit contributions
712 | - **Documentation**: Check existing docs first
713 | 
714 | Remember: Quality over speed. Take time to write good tests and clean code!
715 | 
```
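The retry pattern described in the testing guide above can be exercised without any network access. The following is a minimal sketch, not code from the BioMCP test suite: `FakeResult`, `FlakyClient`, and `run_demo` are hypothetical stand-ins introduced only for illustration, and `fetch_with_retry` repeats the guide's logic with the `asyncio` import it needs.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for an API response; empty `data` means no usable payload."""

    data: list = field(default_factory=list)


class FlakyClient:
    """Hypothetical client that returns empty data for the first N calls."""

    def __init__(self, failures_before_success: int):
        self.failures_left = failures_before_success
        self.calls = 0

    async def get(self, resource: str) -> FakeResult:
        self.calls += 1
        if self.failures_left > 0:
            self.failures_left -= 1
            return FakeResult()  # simulate an empty response
        return FakeResult(data=[resource])


async def fetch_with_retry(client, resource, max_attempts=2, delay=1.0):
    """Retry until the client returns data, sleeping between attempts."""
    for attempt in range(max_attempts):
        result = await client.get(resource)
        if result and result.data:
            return result
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)  # no sleep after the final attempt
    return None


def run_demo():
    """Run one recovering client and one that never recovers within the budget."""
    ok_client = FlakyClient(failures_before_success=1)
    ok = asyncio.run(fetch_with_retry(ok_client, "BRAF", delay=0))

    bad_client = FlakyClient(failures_before_success=5)
    bad = asyncio.run(fetch_with_retry(bad_client, "BRAF", delay=0))
    return ok, ok_client.calls, bad, bad_client.calls
```

In a real integration test, a `None` return would typically lead to `pytest.skip(...)` rather than a failure, in line with the graceful-skipping principle in the guide.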

--------------------------------------------------------------------------------
/src/biomcp/cli/openfda.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | OpenFDA CLI commands for BioMCP.
  3 | """
  4 | 
  5 | import asyncio
  6 | from typing import Annotated
  7 | 
  8 | import typer
  9 | from rich.console import Console
 10 | 
 11 | from ..openfda import (
 12 |     get_adverse_event,
 13 |     get_device_event,
 14 |     get_drug_approval,
 15 |     get_drug_label,
 16 |     get_drug_recall,
 17 |     get_drug_shortage,
 18 |     search_adverse_events,
 19 |     search_device_events,
 20 |     search_drug_approvals,
 21 |     search_drug_labels,
 22 |     search_drug_recalls,
 23 |     search_drug_shortages,
 24 | )
 25 | 
 26 | console = Console()
 27 | 
 28 | # Create separate Typer apps for each subdomain
 29 | adverse_app = typer.Typer(
 30 |     no_args_is_help=True,
 31 |     help="Search and retrieve FDA drug adverse event reports (FAERS)",
 32 | )
 33 | 
 34 | label_app = typer.Typer(
 35 |     no_args_is_help=True,
 36 |     help="Search and retrieve FDA drug product labels (SPL)",
 37 | )
 38 | 
 39 | device_app = typer.Typer(
 40 |     no_args_is_help=True,
 41 |     help="Search and retrieve FDA device adverse event reports (MAUDE)",
 42 | )
 43 | 
 44 | approval_app = typer.Typer(
 45 |     no_args_is_help=True,
 46 |     help="Search and retrieve FDA drug approval records (Drugs@FDA)",
 47 | )
 48 | 
 49 | recall_app = typer.Typer(
 50 |     no_args_is_help=True,
 51 |     help="Search and retrieve FDA drug recall records (Enforcement)",
 52 | )
 53 | 
 54 | shortage_app = typer.Typer(
 55 |     no_args_is_help=True,
 56 |     help="Search and retrieve FDA drug shortage information",
 57 | )
 58 | 
 59 | 
 60 | # Adverse Events Commands
 61 | @adverse_app.command("search")
 62 | def search_adverse_events_cli(
 63 |     drug: Annotated[
 64 |         str | None,
 65 |         typer.Option("--drug", "-d", help="Drug name to search for"),
 66 |     ] = None,
 67 |     reaction: Annotated[
 68 |         str | None,
 69 |         typer.Option(
 70 |             "--reaction", "-r", help="Adverse reaction to search for"
 71 |         ),
 72 |     ] = None,
 73 |     serious: Annotated[
 74 |         bool | None,
 75 |         typer.Option("--serious/--all", help="Filter for serious events only"),
 76 |     ] = None,
 77 |     limit: Annotated[
 78 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
 79 |     ] = 25,
 80 |     page: Annotated[
 81 |         int, typer.Option("--page", "-p", help="Page number (1-based)")
 82 |     ] = 1,
 83 |     api_key: Annotated[
 84 |         str | None,
 85 |         typer.Option(
 86 |             "--api-key",
 87 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
 88 |         ),
 89 |     ] = None,
 90 | ):
 91 |     """Search FDA adverse event reports for drugs."""
 92 |     skip = (page - 1) * limit
 93 | 
 94 |     try:
 95 |         results = asyncio.run(
 96 |             search_adverse_events(
 97 |                 drug=drug,
 98 |                 reaction=reaction,
 99 |                 serious=serious,
100 |                 limit=limit,
101 |                 skip=skip,
102 |                 api_key=api_key,
103 |             )
104 |         )
105 |         console.print(results)
106 |     except Exception as e:
107 |         console.print(f"[red]Error: {e}[/red]")
108 |         raise typer.Exit(1) from e
109 | 
110 | 
111 | @adverse_app.command("get")
112 | def get_adverse_event_cli(
113 |     report_id: Annotated[str, typer.Argument(help="Safety report ID")],
114 |     api_key: Annotated[
115 |         str | None,
116 |         typer.Option(
117 |             "--api-key",
118 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
119 |         ),
120 |     ] = None,
121 | ):
122 |     """Get detailed information for a specific adverse event report."""
123 |     try:
124 |         result = asyncio.run(get_adverse_event(report_id, api_key=api_key))
125 |         console.print(result)
126 |     except Exception as e:
127 |         console.print(f"[red]Error: {e}[/red]")
128 |         raise typer.Exit(1) from e
129 | 
130 | 
131 | # Drug Label Commands
132 | @label_app.command("search")
133 | def search_drug_labels_cli(
134 |     name: Annotated[
135 |         str | None,
136 |         typer.Option("--name", "-n", help="Drug name to search for"),
137 |     ] = None,
138 |     indication: Annotated[
139 |         str | None,
140 |         typer.Option(
141 |             "--indication",
142 |             "-i",
143 |             help="Search for drugs indicated for this condition",
144 |         ),
145 |     ] = None,
146 |     boxed_warning: Annotated[
147 |         bool,
148 |         typer.Option(
149 |             "--boxed-warning", help="Filter for drugs with boxed warnings"
150 |         ),
151 |     ] = False,
152 |     section: Annotated[
153 |         str | None,
154 |         typer.Option(
155 |             "--section", "-s", help="Specific label section to search"
156 |         ),
157 |     ] = None,
158 |     limit: Annotated[
159 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
160 |     ] = 25,
161 |     page: Annotated[
162 |         int, typer.Option("--page", "-p", help="Page number (1-based)")
163 |     ] = 1,
164 |     api_key: Annotated[
165 |         str | None,
166 |         typer.Option(
167 |             "--api-key",
168 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
169 |         ),
170 |     ] = None,
171 | ):
172 |     """Search FDA drug product labels."""
173 |     skip = (page - 1) * limit
174 | 
175 |     try:
176 |         results = asyncio.run(
177 |             search_drug_labels(
178 |                 name=name,
179 |                 indication=indication,
180 |                 boxed_warning=boxed_warning,
181 |                 section=section,
182 |                 limit=limit,
183 |                 skip=skip,
184 |                 api_key=api_key,
185 |             )
186 |         )
187 |         console.print(results)
188 |     except Exception as e:
189 |         console.print(f"[red]Error: {e}[/red]")
190 |         raise typer.Exit(1) from e
191 | 
192 | 
193 | @label_app.command("get")
194 | def get_drug_label_cli(
195 |     set_id: Annotated[str, typer.Argument(help="Label set ID")],
196 |     sections: Annotated[
197 |         str | None,
198 |         typer.Option(
199 |             "--sections", help="Comma-separated list of sections to retrieve"
200 |         ),
201 |     ] = None,
202 |     api_key: Annotated[
203 |         str | None,
204 |         typer.Option(
205 |             "--api-key",
206 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
207 |         ),
208 |     ] = None,
209 | ):
210 |     """Get detailed drug label information."""
211 |     section_list = None
212 |     if sections:
213 |         section_list = [s.strip() for s in sections.split(",")]
214 | 
215 |     try:
216 |         result = asyncio.run(
217 |             get_drug_label(set_id, section_list, api_key=api_key)
218 |         )
219 |         console.print(result)
220 |     except Exception as e:
221 |         console.print(f"[red]Error: {e}[/red]")
222 |         raise typer.Exit(1) from e
223 | 
224 | 
225 | # Device Event Commands
226 | @device_app.command("search")
227 | def search_device_events_cli(
228 |     device: Annotated[
229 |         str | None,
230 |         typer.Option("--device", "-d", help="Device name to search for"),
231 |     ] = None,
232 |     manufacturer: Annotated[
233 |         str | None,
234 |         typer.Option("--manufacturer", "-m", help="Manufacturer name"),
235 |     ] = None,
236 |     problem: Annotated[
237 |         str | None,
238 |         typer.Option("--problem", "-p", help="Device problem description"),
239 |     ] = None,
240 |     product_code: Annotated[
241 |         str | None, typer.Option("--product-code", help="FDA product code")
242 |     ] = None,
243 |     genomics_only: Annotated[
244 |         bool,
245 |         typer.Option(
246 |             "--genomics-only/--all-devices",
247 |             help="Filter to genomic/diagnostic devices",
248 |         ),
249 |     ] = True,
250 |     limit: Annotated[
251 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
252 |     ] = 25,
253 |     page: Annotated[
254 |         int, typer.Option("--page", help="Page number (1-based)")
255 |     ] = 1,
256 |     api_key: Annotated[
257 |         str | None,
258 |         typer.Option(
259 |             "--api-key",
260 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
261 |         ),
262 |     ] = None,
263 | ):
264 |     """Search FDA device adverse event reports."""
265 |     skip = (page - 1) * limit
266 | 
267 |     try:
268 |         results = asyncio.run(
269 |             search_device_events(
270 |                 device=device,
271 |                 manufacturer=manufacturer,
272 |                 problem=problem,
273 |                 product_code=product_code,
274 |                 genomics_only=genomics_only,
275 |                 limit=limit,
276 |                 skip=skip,
277 |                 api_key=api_key,
278 |             )
279 |         )
280 |         console.print(results)
281 |     except Exception as e:
282 |         console.print(f"[red]Error: {e}[/red]")
283 |         raise typer.Exit(1) from e
284 | 
285 | 
286 | @device_app.command("get")
287 | def get_device_event_cli(
288 |     mdr_report_key: Annotated[str, typer.Argument(help="MDR report key")],
289 |     api_key: Annotated[
290 |         str | None,
291 |         typer.Option(
292 |             "--api-key",
293 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
294 |         ),
295 |     ] = None,
296 | ):
297 |     """Get detailed information for a specific device event report."""
298 |     try:
299 |         result = asyncio.run(get_device_event(mdr_report_key, api_key=api_key))
300 |         console.print(result)
301 |     except Exception as e:
302 |         console.print(f"[red]Error: {e}[/red]")
303 |         raise typer.Exit(1) from e
304 | 
305 | 
306 | # Drug Approval Commands
307 | @approval_app.command("search")
308 | def search_drug_approvals_cli(
309 |     drug: Annotated[
310 |         str | None,
311 |         typer.Option("--drug", "-d", help="Drug name to search for"),
312 |     ] = None,
313 |     application: Annotated[
314 |         str | None,
315 |         typer.Option(
316 |             "--application", "-a", help="NDA or BLA application number"
317 |         ),
318 |     ] = None,
319 |     year: Annotated[
320 |         str | None,
321 |         typer.Option("--year", "-y", help="Approval year (YYYY format)"),
322 |     ] = None,
323 |     limit: Annotated[
324 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
325 |     ] = 25,
326 |     page: Annotated[
327 |         int, typer.Option("--page", "-p", help="Page number (1-based)")
328 |     ] = 1,
329 |     api_key: Annotated[
330 |         str | None,
331 |         typer.Option(
332 |             "--api-key",
333 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
334 |         ),
335 |     ] = None,
336 | ):
337 |     """Search FDA drug approval records."""
338 |     skip = (page - 1) * limit
339 | 
340 |     try:
341 |         results = asyncio.run(
342 |             search_drug_approvals(
343 |                 drug=drug,
344 |                 application_number=application,
345 |                 approval_year=year,
346 |                 limit=limit,
347 |                 skip=skip,
348 |                 api_key=api_key,
349 |             )
350 |         )
351 |         console.print(results)
352 |     except Exception as e:
353 |         console.print(f"[red]Error: {e}[/red]")
354 |         raise typer.Exit(1) from e
355 | 
356 | 
357 | @approval_app.command("get")
358 | def get_drug_approval_cli(
359 |     application: Annotated[
360 |         str, typer.Argument(help="NDA or BLA application number")
361 |     ],
362 |     api_key: Annotated[
363 |         str | None,
364 |         typer.Option(
365 |             "--api-key",
366 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
367 |         ),
368 |     ] = None,
369 | ):
370 |     """Get detailed drug approval information."""
371 |     try:
372 |         result = asyncio.run(get_drug_approval(application, api_key=api_key))
373 |         console.print(result)
374 |     except Exception as e:
375 |         console.print(f"[red]Error: {e}[/red]")
376 |         raise typer.Exit(1) from e
377 | 
378 | 
379 | # Drug Recall Commands
380 | @recall_app.command("search")
381 | def search_drug_recalls_cli(
382 |     drug: Annotated[
383 |         str | None,
384 |         typer.Option("--drug", "-d", help="Drug name to search for"),
385 |     ] = None,
386 |     recall_class: Annotated[
387 |         str | None,
388 |         typer.Option(
389 |             "--class", "-c", help="Recall classification (1, 2, or 3)"
390 |         ),
391 |     ] = None,
392 |     status: Annotated[
393 |         str | None,
394 |         typer.Option(
395 |             "--status", "-s", help="Recall status (ongoing, completed)"
396 |         ),
397 |     ] = None,
398 |     reason: Annotated[
399 |         str | None,
400 |         typer.Option("--reason", "-r", help="Search in recall reason"),
401 |     ] = None,
402 |     since: Annotated[
403 |         str | None,
404 |         typer.Option("--since", help="Show recalls after date (YYYYMMDD)"),
405 |     ] = None,
406 |     limit: Annotated[
407 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
408 |     ] = 25,
409 |     page: Annotated[
410 |         int, typer.Option("--page", "-p", help="Page number (1-based)")
411 |     ] = 1,
412 |     api_key: Annotated[
413 |         str | None,
414 |         typer.Option(
415 |             "--api-key",
416 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
417 |         ),
418 |     ] = None,
419 | ):
420 |     """Search FDA drug recall records."""
421 |     skip = (page - 1) * limit
422 | 
423 |     try:
424 |         results = asyncio.run(
425 |             search_drug_recalls(
426 |                 drug=drug,
427 |                 recall_class=recall_class,
428 |                 status=status,
429 |                 reason=reason,
430 |                 since_date=since,
431 |                 limit=limit,
432 |                 skip=skip,
433 |                 api_key=api_key,
434 |             )
435 |         )
436 |         console.print(results)
437 |     except Exception as e:
438 |         console.print(f"[red]Error: {e}[/red]")
439 |         raise typer.Exit(1) from e
440 | 
441 | 
442 | @recall_app.command("get")
443 | def get_drug_recall_cli(
444 |     recall_number: Annotated[str, typer.Argument(help="FDA recall number")],
445 |     api_key: Annotated[
446 |         str | None,
447 |         typer.Option(
448 |             "--api-key",
449 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
450 |         ),
451 |     ] = None,
452 | ):
453 |     """Get detailed drug recall information."""
454 |     try:
455 |         result = asyncio.run(get_drug_recall(recall_number, api_key=api_key))
456 |         console.print(result)
457 |     except Exception as e:
458 |         console.print(f"[red]Error: {e}[/red]")
459 |         raise typer.Exit(1) from e
460 | 
461 | 
462 | # Drug Shortage Commands
463 | @shortage_app.command("search")
464 | def search_drug_shortages_cli(
465 |     drug: Annotated[
466 |         str | None,
467 |         typer.Option("--drug", "-d", help="Drug name to search for"),
468 |     ] = None,
469 |     status: Annotated[
470 |         str | None,
471 |         typer.Option(
472 |             "--status", "-s", help="Shortage status (current, resolved)"
473 |         ),
474 |     ] = None,
475 |     category: Annotated[
476 |         str | None,
477 |         typer.Option("--category", "-c", help="Therapeutic category"),
478 |     ] = None,
479 |     limit: Annotated[
480 |         int, typer.Option("--limit", "-l", help="Maximum number of results")
481 |     ] = 25,
482 |     page: Annotated[
483 |         int, typer.Option("--page", "-p", help="Page number (1-based)")
484 |     ] = 1,
485 |     api_key: Annotated[
486 |         str | None,
487 |         typer.Option(
488 |             "--api-key",
489 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
490 |         ),
491 |     ] = None,
492 | ):
493 |     """Search FDA drug shortage records."""
494 |     skip = (page - 1) * limit
495 | 
496 |     try:
497 |         results = asyncio.run(
498 |             search_drug_shortages(
499 |                 drug=drug,
500 |                 status=status,
501 |                 therapeutic_category=category,
502 |                 limit=limit,
503 |                 skip=skip,
504 |                 api_key=api_key,
505 |             )
506 |         )
507 |         console.print(results)
508 |     except Exception as e:
509 |         console.print(f"[red]Error: {e}[/red]")
510 |         raise typer.Exit(1) from e
511 | 
512 | 
513 | @shortage_app.command("get")
514 | def get_drug_shortage_cli(
515 |     drug: Annotated[str, typer.Argument(help="Drug name")],
516 |     api_key: Annotated[
517 |         str | None,
518 |         typer.Option(
519 |             "--api-key",
520 |             help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
521 |         ),
522 |     ] = None,
523 | ):
524 |     """Get detailed drug shortage information."""
525 |     try:
526 |         result = asyncio.run(get_drug_shortage(drug, api_key=api_key))
527 |         console.print(result)
528 |     except Exception as e:
529 |         console.print(f"[red]Error: {e}[/red]")
530 |         raise typer.Exit(1) from e
531 | 
532 | 
533 | # Main OpenFDA app that combines all subcommands
534 | openfda_app = typer.Typer(
535 |     no_args_is_help=True,
536 |     help="Search and retrieve data from FDA's openFDA API",
537 | )
538 | 
539 | # Add subcommands
540 | openfda_app.add_typer(
541 |     adverse_app, name="adverse", help="Drug adverse events (FAERS)"
542 | )
543 | openfda_app.add_typer(
544 |     label_app, name="label", help="Drug product labels (SPL)"
545 | )
546 | openfda_app.add_typer(
547 |     device_app, name="device", help="Device adverse events (MAUDE)"
548 | )
549 | openfda_app.add_typer(
550 |     approval_app, name="approval", help="Drug approvals (Drugs@FDA)"
551 | )
552 | openfda_app.add_typer(
553 |     recall_app, name="recall", help="Drug recalls (Enforcement)"
554 | )
555 | openfda_app.add_typer(shortage_app, name="shortage", help="Drug shortages")
556 | 
```
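Each `search` command above converts a 1-based `--page` value into an offset with `skip = (page - 1) * limit` and does not validate the inputs, so `--page 0` would produce a negative offset. The sketch below shows one way that conversion could be guarded. `page_to_skip` is a hypothetical helper, not part of BioMCP, and how the openFDA API treats out-of-range offsets is not assumed here.

```python
def page_to_skip(page: int, limit: int) -> int:
    """Convert a 1-based page number into a zero-based record offset.

    Hypothetical helper for illustration; it applies the same arithmetic as
    the CLI commands but rejects values that would give a nonsensical offset.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    if limit < 1:
        raise ValueError("limit must be >= 1")
    return (page - 1) * limit
```

With the CLI default of `limit=25`, page 1 maps to offset 0 and page 3 maps to offset 50.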

--------------------------------------------------------------------------------
/src/biomcp/articles/preprints.py:
--------------------------------------------------------------------------------

```python
  1 | """Preprint search functionality for bioRxiv/medRxiv and Europe PMC."""
  2 | 
  3 | import asyncio
  4 | import json
  5 | import logging
  6 | from datetime import datetime
  7 | from typing import Any
  8 | 
  9 | from pydantic import BaseModel, Field
 10 | 
 11 | from .. import http_client, render
 12 | from ..constants import (
 13 |     BIORXIV_BASE_URL,
 14 |     BIORXIV_DEFAULT_DAYS_BACK,
 15 |     BIORXIV_MAX_PAGES,
 16 |     BIORXIV_RESULTS_PER_PAGE,
 17 |     EUROPE_PMC_BASE_URL,
 18 |     EUROPE_PMC_PAGE_SIZE,
 19 |     MEDRXIV_BASE_URL,
 20 |     SYSTEM_PAGE_SIZE,
 21 | )
 22 | from ..core import PublicationState
 23 | from .search import PubmedRequest, ResultItem, SearchResponse
 24 | 
 25 | logger = logging.getLogger(__name__)
 26 | 
 27 | 
 28 | class BiorxivRequest(BaseModel):
 29 |     """Request parameters for bioRxiv/medRxiv API."""
 30 | 
 31 |     query: str
 32 |     interval: str = Field(
 33 |         default="", description="Date interval in YYYY-MM-DD/YYYY-MM-DD format"
 34 |     )
 35 |     cursor: int = Field(default=0, description="Starting position")
 36 | 
 37 | 
 38 | class BiorxivResult(BaseModel):
 39 |     """Individual result from bioRxiv/medRxiv."""
 40 | 
 41 |     doi: str | None = None
 42 |     title: str | None = None
 43 |     authors: str | None = None
 44 |     author_corresponding: str | None = None
 45 |     author_corresponding_institution: str | None = None
 46 |     date: str | None = None
 47 |     version: int | None = None
 48 |     type: str | None = None
 49 |     license: str | None = None
 50 |     category: str | None = None
 51 |     jatsxml: str | None = None
 52 |     abstract: str | None = None
 53 |     published: str | None = None
 54 |     server: str | None = None
 55 | 
 56 |     def to_result_item(self) -> ResultItem:
 57 |         """Convert to standard ResultItem format."""
 58 |         authors_list = []
 59 |         if self.authors:
 60 |             authors_list = [
 61 |                 author.strip() for author in self.authors.split(";")
 62 |             ]
 63 | 
 64 |         return ResultItem(
 65 |             pmid=None,
 66 |             pmcid=None,
 67 |             title=self.title,
 68 |             journal=f"{self.server or 'bioRxiv'} (preprint)",
 69 |             authors=authors_list,
 70 |             date=self.date,
 71 |             doi=self.doi,
 72 |             abstract=self.abstract,
 73 |             publication_state=PublicationState.PREPRINT,
 74 |             source=self.server or "bioRxiv",
 75 |         )
 76 | 
 77 | 
 78 | class BiorxivResponse(BaseModel):
 79 |     """Response from bioRxiv/medRxiv API."""
 80 | 
 81 |     collection: list[BiorxivResult] = Field(default_factory=list)
 82 |     messages: list[dict[str, Any]] = Field(default_factory=list)
 83 |     total: int = Field(default=0, alias="total")
 84 | 
 85 | 
 86 | class EuropePMCRequest(BaseModel):
 87 |     """Request parameters for Europe PMC API."""
 88 | 
 89 |     query: str
 90 |     format: str = "json"
 91 |     pageSize: int = Field(default=25, le=1000)
 92 |     cursorMark: str = Field(default="*")
 93 |     src: str = Field(default="PPR", description="Source: PPR for preprints")
 94 | 
 95 | 
 96 | class EuropePMCResult(BaseModel):
 97 |     """Individual result from Europe PMC."""
 98 | 
 99 |     id: str | None = None
100 |     source: str | None = None
101 |     pmid: str | None = None
102 |     pmcid: str | None = None
103 |     doi: str | None = None
104 |     title: str | None = None
105 |     authorString: str | None = None
106 |     journalTitle: str | None = None
107 |     pubYear: str | None = None
108 |     firstPublicationDate: str | None = None
109 |     abstractText: str | None = None
110 | 
111 |     def to_result_item(self) -> ResultItem:
112 |         """Convert to standard ResultItem format."""
113 |         authors_list = []
114 |         if self.authorString:
115 |             authors_list = [
116 |                 author.strip() for author in self.authorString.split(",")
117 |             ]
118 | 
119 |         return ResultItem(
120 |             pmid=int(self.pmid) if self.pmid and self.pmid.isdigit() else None,
121 |             pmcid=self.pmcid,
122 |             title=self.title,
123 |             journal=f"{self.journalTitle or 'Preprint Server'} (preprint)",
124 |             authors=authors_list,
125 |             date=self.firstPublicationDate or self.pubYear,
126 |             doi=self.doi,
127 |             abstract=self.abstractText,
128 |             publication_state=PublicationState.PREPRINT,
129 |             source="Europe PMC",
130 |         )
131 | 
132 | 
133 | class EuropePMCResponse(BaseModel):
134 |     """Response from Europe PMC API."""
135 | 
136 |     hitCount: int = Field(default=0)
137 |     nextCursorMark: str | None = None
138 |     resultList: dict[str, Any] = Field(default_factory=dict)
139 | 
140 |     @property
141 |     def results(self) -> list[EuropePMCResult]:
142 |         result_data = self.resultList.get("result", [])
143 |         return [EuropePMCResult(**r) for r in result_data]
144 | 
145 | 
146 | class PreprintSearcher:
147 |     """Handles searching across multiple preprint sources."""
148 | 
149 |     def __init__(self):
150 |         self.biorxiv_client = BiorxivClient()
151 |         self.europe_pmc_client = EuropePMCClient()
152 | 
153 |     async def search(
154 |         self,
155 |         request: PubmedRequest,
156 |         include_biorxiv: bool = True,
157 |         include_europe_pmc: bool = True,
158 |     ) -> SearchResponse:
159 |         """Search across preprint sources and merge results."""
160 |         query = self._build_query(request)
161 | 
162 |         tasks = []
163 |         if include_biorxiv:
164 |             tasks.append(self.biorxiv_client.search(query))
165 |         if include_europe_pmc:
166 |             tasks.append(self.europe_pmc_client.search(query))
167 | 
168 |         results_lists = await asyncio.gather(*tasks, return_exceptions=True)
169 | 
170 |         all_results = []
171 |         for results in results_lists:
172 |             if isinstance(results, list):
173 |                 all_results.extend(results)
174 | 
175 |         # Remove duplicates based on DOI
176 |         seen_dois = set()
177 |         unique_results = []
178 |         for result in all_results:
179 |             if result.doi and result.doi in seen_dois:
180 |                 continue
181 |             if result.doi:
182 |                 seen_dois.add(result.doi)
183 |             unique_results.append(result)
184 | 
185 |         # Sort by date (newest first)
186 |         unique_results.sort(key=lambda x: x.date or "0000-00-00", reverse=True)
187 | 
188 |         # Limit results
189 |         limited_results = unique_results[:SYSTEM_PAGE_SIZE]
190 | 
191 |         return SearchResponse(
192 |             results=limited_results,
193 |             page_size=len(limited_results),
194 |             current=0,
195 |             count=len(limited_results),
196 |             total_pages=1,
197 |         )
198 | 
199 |     def _build_query(self, request: PubmedRequest) -> str:
200 |         """Build query string from structured request.
201 | 
202 |         Note: Preprint servers use plain text search, not PubMed syntax.
203 |         """
204 |         query_parts = []
205 | 
206 |         if request.keywords:
207 |             query_parts.extend(request.keywords)
208 |         if request.genes:
209 |             query_parts.extend(request.genes)
210 |         if request.diseases:
211 |             query_parts.extend(request.diseases)
212 |         if request.chemicals:
213 |             query_parts.extend(request.chemicals)
214 |         if request.variants:
215 |             query_parts.extend(request.variants)
216 | 
217 |         return " ".join(query_parts) if query_parts else ""
218 | 
219 | 
220 | class BiorxivClient:
221 |     """Client for bioRxiv/medRxiv API.
222 | 
223 |     IMPORTANT LIMITATION: bioRxiv/medRxiv APIs do not provide a search endpoint.
224 |     This implementation works around this limitation by:
225 |     1. Fetching articles from a date range (last 365 days by default)
226 |     2. Filtering results client-side based on query match in title/abstract
227 | 
228 |     This approach has limitations but is optimized for performance:
229 |     - Searches up to 1 year of preprints by default (configurable)
230 |     - Uses pagination to avoid fetching all results at once
231 |     - May still miss older preprints beyond the date range
232 | 
233 |     Consider using Europe PMC for more comprehensive preprint search capabilities,
234 |     as it has proper search functionality without date limitations.
235 |     """
236 | 
237 |     async def search(  # noqa: C901
238 |         self,
239 |         query: str,
240 |         server: str = "biorxiv",
241 |         days_back: int = BIORXIV_DEFAULT_DAYS_BACK,
242 |     ) -> list[ResultItem]:
243 |         """Search bioRxiv or medRxiv for articles.
244 | 
245 |         Note: Due to API limitations, this performs client-side filtering on
246 |         recent articles only. See class docstring for details.
247 |         """
248 |         if not query:
249 |             return []
250 | 
251 |         base_url = (
252 |             BIORXIV_BASE_URL if server == "biorxiv" else MEDRXIV_BASE_URL
253 |         )
254 | 
 255 |         # Only fetch recent articles: the window is the last `days_back` days
256 |         from datetime import timedelta
257 | 
258 |         today = datetime.now()
259 |         start_date = today - timedelta(days=days_back)
260 |         interval = f"{start_date.year}-{start_date.month:02d}-{start_date.day:02d}/{today.year}-{today.month:02d}-{today.day:02d}"
261 | 
262 |         # Prepare query terms for better matching
263 |         query_terms = query.lower().split()
264 | 
265 |         filtered_results = []
266 |         cursor = 0
267 |         max_pages = (
268 |             BIORXIV_MAX_PAGES  # Limit pagination to avoid excessive API calls
269 |         )
270 | 
271 |         for page in range(max_pages):
272 |             request = BiorxivRequest(
273 |                 query=query, interval=interval, cursor=cursor
274 |             )
275 |             url = f"{base_url}/{request.interval}/{request.cursor}"
276 | 
277 |             response, error = await http_client.request_api(
278 |                 url=url,
279 |                 method="GET",
280 |                 request={},
281 |                 response_model_type=BiorxivResponse,
282 |                 domain="biorxiv",
283 |                 cache_ttl=300,  # Cache for 5 minutes
284 |             )
285 | 
286 |             if error or not response:
287 |                 logger.warning(
288 |                     f"Failed to fetch {server} articles page {page} for query '{query}': {error if error else 'No response'}"
289 |                 )
290 |                 break
291 | 
292 |             # Filter results based on query
293 |             page_filtered = 0
294 |             for result in response.collection:
295 |                 # Create searchable text from title and abstract
296 |                 searchable_text = ""
297 |                 if result.title:
298 |                     searchable_text += result.title.lower() + " "
299 |                 if result.abstract:
300 |                     searchable_text += result.abstract.lower()
301 | 
302 |                 # Check if all query terms are present (AND logic)
303 |                 if all(term in searchable_text for term in query_terms):
304 |                     filtered_results.append(result.to_result_item())
305 |                     page_filtered += 1
306 | 
307 |                     # Stop if we have enough results
308 |                     if len(filtered_results) >= SYSTEM_PAGE_SIZE:
309 |                         return filtered_results[:SYSTEM_PAGE_SIZE]
310 | 
311 |             # If this page had no matches and we have some results, stop pagination
312 |             if page_filtered == 0 and filtered_results:
313 |                 break
314 | 
315 |             # Move to next page
316 |             cursor += len(response.collection)
317 | 
318 |             # Stop if we've processed all available results
319 |             if (
320 |                 len(response.collection) < BIORXIV_RESULTS_PER_PAGE
321 |             ):  # bioRxiv typically returns this many per page
322 |                 break
323 | 
324 |         return filtered_results[:SYSTEM_PAGE_SIZE]
325 | 
326 | 
327 | class EuropePMCClient:
328 |     """Client for Europe PMC API."""
329 | 
330 |     async def search(
331 |         self, query: str, max_results: int = SYSTEM_PAGE_SIZE
332 |     ) -> list[ResultItem]:
333 |         """Search Europe PMC for preprints with pagination support."""
334 |         results: list[ResultItem] = []
335 |         cursor_mark = "*"
336 |         page_size = min(
337 |             EUROPE_PMC_PAGE_SIZE, max_results
338 |         )  # Europe PMC optimal page size
339 | 
340 |         while len(results) < max_results:
341 |             request = EuropePMCRequest(
342 |                 query=f"(SRC:PPR) AND ({query})" if query else "SRC:PPR",
343 |                 pageSize=page_size,
344 |                 cursorMark=cursor_mark,
345 |             )
346 | 
347 |             params = request.model_dump(exclude_none=True)
348 | 
349 |             response, error = await http_client.request_api(
350 |                 url=EUROPE_PMC_BASE_URL,
351 |                 method="GET",
352 |                 request=params,
353 |                 response_model_type=EuropePMCResponse,
354 |                 domain="europepmc",
355 |                 cache_ttl=300,  # Cache for 5 minutes
356 |             )
357 | 
358 |             if error or not response:
359 |                 logger.warning(
360 |                     f"Failed to fetch Europe PMC preprints for query '{query}': {error if error else 'No response'}"
361 |                 )
362 |                 break
363 | 
364 |             # Add results
365 |             page_results = [
366 |                 result.to_result_item() for result in response.results
367 |             ]
368 |             results.extend(page_results)
369 | 
370 |             # Check if we have more pages
371 |             if (
372 |                 not response.nextCursorMark
373 |                 or response.nextCursorMark == cursor_mark
374 |             ):
375 |                 break
376 | 
377 |             # Check if we got fewer results than requested (last page)
378 |             if len(page_results) < page_size:
379 |                 break
380 | 
381 |             cursor_mark = response.nextCursorMark
382 | 
383 |             # Adjust page size for last request if needed
384 |             remaining = max_results - len(results)
385 |             if remaining < page_size:
386 |                 page_size = remaining
387 | 
388 |         return results[:max_results]
389 | 
390 | 
391 | async def fetch_europe_pmc_article(
392 |     doi: str,
393 |     output_json: bool = False,
394 | ) -> str:
395 |     """Fetch a single article from Europe PMC by DOI."""
396 |     # Europe PMC search API can fetch article details by DOI
397 |     request = EuropePMCRequest(
398 |         query=f'DOI:"{doi}"',
399 |         pageSize=1,
400 |         src="PPR",  # Preprints source
401 |     )
402 | 
403 |     params = request.model_dump(exclude_none=True)
404 | 
405 |     response, error = await http_client.request_api(
406 |         url=EUROPE_PMC_BASE_URL,
407 |         method="GET",
408 |         request=params,
409 |         response_model_type=EuropePMCResponse,
410 |         domain="europepmc",
411 |     )
412 | 
413 |     if error:
414 |         data: list[dict[str, Any]] = [
415 |             {"error": f"Error {error.code}: {error.message}"}
416 |         ]
417 |     elif response and response.results:
418 |         # Convert Europe PMC result to Article format for consistency
419 |         europe_pmc_result = response.results[0]
420 |         article_data = {
421 |             "pmid": None,  # Europe PMC preprints don't have PMIDs
422 |             "pmcid": europe_pmc_result.pmcid,
423 |             "doi": europe_pmc_result.doi,
424 |             "title": europe_pmc_result.title,
425 |             "journal": f"{europe_pmc_result.journalTitle or 'Preprint Server'} (preprint)",
426 |             "date": europe_pmc_result.firstPublicationDate
427 |             or europe_pmc_result.pubYear,
428 |             "authors": [
429 |                 author.strip()
 430 |                 for author in (europe_pmc_result.authorString or "").split(",") if author.strip()
431 |             ],
432 |             "abstract": europe_pmc_result.abstractText,
433 |             "full_text": "",  # Europe PMC API doesn't provide full text for preprints
434 |             "pubmed_url": None,
435 |             "pmc_url": f"https://europepmc.org/article/PPR/{doi}"
436 |             if doi
437 |             else None,
438 |             "source": "Europe PMC",
439 |         }
440 |         data = [article_data]
441 |     else:
442 |         data = [{"error": "Article not found in Europe PMC"}]
443 | 
444 |     if data and not output_json:
445 |         return render.to_markdown(data)
446 |     else:
447 |         return json.dumps(data, indent=2)
448 | 
449 | 
450 | async def search_preprints(
451 |     request: PubmedRequest,
452 |     include_biorxiv: bool = True,
453 |     include_europe_pmc: bool = True,
454 |     output_json: bool = False,
455 | ) -> str:
456 |     """Search for preprints across multiple sources."""
457 |     searcher = PreprintSearcher()
458 |     response = await searcher.search(
459 |         request,
460 |         include_biorxiv=include_biorxiv,
461 |         include_europe_pmc=include_europe_pmc,
462 |     )
463 | 
464 |     if response and response.results:
465 |         data = [
466 |             result.model_dump(mode="json", exclude_none=True)
467 |             for result in response.results
468 |         ]
469 |     else:
470 |         data = []
471 | 
472 |     if data and not output_json:
473 |         return render.to_markdown(data)
474 |     else:
475 |         return json.dumps(data, indent=2)
476 | 
```
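
The merge step in `PreprintSearcher.search` and the client-side filter in `BiorxivClient.search` can be tried in isolation. The following is a minimal sketch, not the project's code: `Item`, `merge_results`, and `matches_all_terms` are hypothetical names introduced here, and `Item` stands in for `ResultItem` with only the fields this logic reads.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for ResultItem (only the fields used below)."""

    title: str
    doi: str | None = None
    date: str | None = None


def merge_results(batches: list[list[Item]], limit: int = 10) -> list[Item]:
    """De-duplicate by DOI, sort newest first, then cap the list."""
    seen: set[str] = set()
    unique: list[Item] = []
    for batch in batches:
        for item in batch:
            if item.doi and item.doi in seen:
                continue  # already returned by an earlier source
            if item.doi:
                seen.add(item.doi)
            unique.append(item)  # items without a DOI are always kept
    # ISO-style date strings sort correctly as plain strings
    unique.sort(key=lambda i: i.date or "0000-00-00", reverse=True)
    return unique[:limit]


def matches_all_terms(query: str, title: str | None, abstract: str | None) -> bool:
    """AND filter: every query term must occur as a substring of the text."""
    text = f"{(title or '').lower()} {(abstract or '').lower()}"
    return all(term in text for term in query.lower().split())


biorxiv = [
    Item("A", doi="10.1101/1", date="2024-03-01"),
    Item("B", date="2023-01-01"),
]
europe_pmc = [
    Item("A (dup)", doi="10.1101/1", date="2024-03-01"),
    Item("C", doi="10.1101/2", date="2024-05-01"),
]
merged = merge_results([biorxiv, europe_pmc])
print([i.title for i in merged])  # ['C', 'A', 'B']
```

Two properties are worth noting. The first source to return a DOI wins, so the order in which the tasks are gathered decides which copy of a duplicate survives. The filter matches substrings rather than whole words, so a term such as `ras` also matches `KRAS`.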

--------------------------------------------------------------------------------
/src/biomcp/query_parser.py:
--------------------------------------------------------------------------------

```python
  1 | """Query parser for unified search language in BioMCP."""
  2 | 
  3 | from dataclasses import dataclass
  4 | from enum import Enum
  5 | from typing import Any
  6 | 
  7 | 
  8 | class Operator(str, Enum):
  9 |     """Query operators."""
 10 | 
 11 |     EQ = ":"
 12 |     GT = ">"
 13 |     LT = "<"
 14 |     GTE = ">="
 15 |     LTE = "<="
 16 |     RANGE = ".."
 17 |     AND = "AND"
 18 |     OR = "OR"
 19 |     NOT = "NOT"
 20 | 
 21 | 
 22 | class FieldType(str, Enum):
 23 |     """Field data types."""
 24 | 
 25 |     STRING = "string"
 26 |     NUMBER = "number"
 27 |     DATE = "date"
 28 |     ENUM = "enum"
 29 |     BOOLEAN = "boolean"
 30 | 
 31 | 
 32 | @dataclass
 33 | class FieldDefinition:
 34 |     """Definition of a searchable field."""
 35 | 
 36 |     name: str
 37 |     domain: str  # "trials", "articles", "variants", "cross"
 38 |     type: FieldType
 39 |     operators: list[str]
 40 |     example_values: list[str]
 41 |     description: str
 42 |     underlying_api_field: str
 43 |     aliases: list[str] | None = None
 44 | 
 45 | 
 46 | @dataclass
 47 | class QueryTerm:
 48 |     """Parsed query term."""
 49 | 
 50 |     field: str
 51 |     operator: Operator
 52 |     value: Any
 53 |     domain: str | None = None
 54 |     is_negated: bool = False
 55 | 
 56 | 
 57 | @dataclass
 58 | class ParsedQuery:
 59 |     """Parsed query structure."""
 60 | 
 61 |     terms: list[QueryTerm]
 62 |     cross_domain_fields: dict[str, Any]
 63 |     domain_specific_fields: dict[str, dict[str, Any]]
 64 |     raw_query: str
 65 | 
 66 | 
 67 | class QueryParser:
 68 |     """Parser for unified search queries."""
 69 | 
 70 |     def __init__(self):
 71 |         self.field_registry = self._build_field_registry()
 72 | 
 73 |     def _build_field_registry(self) -> dict[str, FieldDefinition]:
 74 |         """Build the field registry with all searchable fields."""
 75 |         registry = {}
 76 | 
 77 |         # Cross-domain fields
 78 |         cross_domain_fields = [
 79 |             FieldDefinition(
 80 |                 name="gene",
 81 |                 domain="cross",
 82 |                 type=FieldType.STRING,
 83 |                 operators=[Operator.EQ],
 84 |                 example_values=["BRAF", "TP53", "EGFR"],
 85 |                 description="Gene symbol",
 86 |                 underlying_api_field="gene",
 87 |             ),
 88 |             FieldDefinition(
 89 |                 name="variant",
 90 |                 domain="cross",
 91 |                 type=FieldType.STRING,
 92 |                 operators=[Operator.EQ],
 93 |                 example_values=["V600E", "L858R", "rs113488022"],
 94 |                 description="Variant notation or rsID",
 95 |                 underlying_api_field="variant",
 96 |             ),
 97 |             FieldDefinition(
 98 |                 name="disease",
 99 |                 domain="cross",
100 |                 type=FieldType.STRING,
101 |                 operators=[Operator.EQ],
102 |                 example_values=["melanoma", "lung cancer", "diabetes"],
103 |                 description="Disease or condition",
104 |                 underlying_api_field="disease",
105 |             ),
106 |         ]
107 | 
108 |         # Trial-specific fields
109 |         trial_fields = [
110 |             FieldDefinition(
111 |                 name="trials.condition",
112 |                 domain="trials",
113 |                 type=FieldType.STRING,
114 |                 operators=[Operator.EQ],
115 |                 example_values=["melanoma", "lung cancer"],
116 |                 description="Clinical trial condition",
117 |                 underlying_api_field="conditions",
118 |             ),
119 |             FieldDefinition(
120 |                 name="trials.intervention",
121 |                 domain="trials",
122 |                 type=FieldType.STRING,
123 |                 operators=[Operator.EQ],
124 |                 example_values=["osimertinib", "pembrolizumab"],
125 |                 description="Trial intervention",
126 |                 underlying_api_field="interventions",
127 |             ),
128 |             FieldDefinition(
129 |                 name="trials.phase",
130 |                 domain="trials",
131 |                 type=FieldType.ENUM,
132 |                 operators=[Operator.EQ],
133 |                 example_values=["1", "2", "3", "4"],
134 |                 description="Trial phase",
135 |                 underlying_api_field="phase",
136 |             ),
137 |             FieldDefinition(
138 |                 name="trials.status",
139 |                 domain="trials",
140 |                 type=FieldType.ENUM,
141 |                 operators=[Operator.EQ],
142 |                 example_values=["recruiting", "active", "completed"],
143 |                 description="Trial recruitment status",
144 |                 underlying_api_field="recruiting_status",
145 |             ),
146 |         ]
147 | 
148 |         # Article-specific fields
149 |         article_fields = [
150 |             FieldDefinition(
151 |                 name="articles.title",
152 |                 domain="articles",
153 |                 type=FieldType.STRING,
154 |                 operators=[Operator.EQ],
155 |                 example_values=["EGFR mutations", "cancer therapy"],
156 |                 description="Article title",
157 |                 underlying_api_field="title",
158 |             ),
159 |             FieldDefinition(
160 |                 name="articles.author",
161 |                 domain="articles",
162 |                 type=FieldType.STRING,
163 |                 operators=[Operator.EQ],
164 |                 example_values=["Smith J", "Johnson A"],
165 |                 description="Article author",
166 |                 underlying_api_field="author",
167 |             ),
168 |             FieldDefinition(
169 |                 name="articles.journal",
170 |                 domain="articles",
171 |                 type=FieldType.STRING,
172 |                 operators=[Operator.EQ],
173 |                 example_values=["Nature", "Science", "Cell"],
174 |                 description="Journal name",
175 |                 underlying_api_field="journal",
176 |             ),
177 |             FieldDefinition(
178 |                 name="articles.date",
179 |                 domain="articles",
180 |                 type=FieldType.DATE,
181 |                 operators=[Operator.GT, Operator.LT, Operator.RANGE],
182 |                 example_values=[">2023-01-01", "2023-01-01..2024-01-01"],
183 |                 description="Publication date",
184 |                 underlying_api_field="date",
185 |             ),
186 |         ]
187 | 
188 |         # Variant-specific fields
189 |         variant_fields = [
190 |             FieldDefinition(
191 |                 name="variants.rsid",
192 |                 domain="variants",
193 |                 type=FieldType.STRING,
194 |                 operators=[Operator.EQ],
195 |                 example_values=["rs113488022", "rs121913529"],
196 |                 description="dbSNP rsID",
197 |                 underlying_api_field="rsid",
198 |             ),
199 |             FieldDefinition(
200 |                 name="variants.gene",
201 |                 domain="variants",
202 |                 type=FieldType.STRING,
203 |                 operators=[Operator.EQ],
204 |                 example_values=["BRAF", "TP53"],
205 |                 description="Gene containing variant",
206 |                 underlying_api_field="gene",
207 |             ),
208 |             FieldDefinition(
209 |                 name="variants.significance",
210 |                 domain="variants",
211 |                 type=FieldType.ENUM,
212 |                 operators=[Operator.EQ],
213 |                 example_values=["pathogenic", "benign", "uncertain"],
214 |                 description="Clinical significance",
215 |                 underlying_api_field="significance",
216 |             ),
217 |             FieldDefinition(
218 |                 name="variants.frequency",
219 |                 domain="variants",
220 |                 type=FieldType.NUMBER,
221 |                 operators=[Operator.LT, Operator.GT],
222 |                 example_values=["<0.01", ">0.05"],
223 |                 description="Population allele frequency",
224 |                 underlying_api_field="frequency",
225 |             ),
226 |         ]
227 | 
228 |         # Gene-specific fields
229 |         gene_fields = [
230 |             FieldDefinition(
231 |                 name="genes.symbol",
232 |                 domain="genes",
233 |                 type=FieldType.STRING,
234 |                 operators=[Operator.EQ],
235 |                 example_values=["BRAF", "TP53", "EGFR"],
236 |                 description="Gene symbol",
237 |                 underlying_api_field="symbol",
238 |             ),
239 |             FieldDefinition(
240 |                 name="genes.name",
241 |                 domain="genes",
242 |                 type=FieldType.STRING,
243 |                 operators=[Operator.EQ],
244 |                 example_values=[
245 |                     "tumor protein p53",
246 |                     "epidermal growth factor receptor",
247 |                 ],
248 |                 description="Gene name",
249 |                 underlying_api_field="name",
250 |             ),
251 |             FieldDefinition(
252 |                 name="genes.type",
253 |                 domain="genes",
254 |                 type=FieldType.STRING,
255 |                 operators=[Operator.EQ],
256 |                 example_values=["protein-coding", "pseudo", "ncRNA"],
257 |                 description="Gene type",
258 |                 underlying_api_field="type_of_gene",
259 |             ),
260 |         ]
261 | 
262 |         # Drug-specific fields
263 |         drug_fields = [
264 |             FieldDefinition(
265 |                 name="drugs.name",
266 |                 domain="drugs",
267 |                 type=FieldType.STRING,
268 |                 operators=[Operator.EQ],
269 |                 example_values=["imatinib", "aspirin", "metformin"],
270 |                 description="Drug name",
271 |                 underlying_api_field="name",
272 |             ),
273 |             FieldDefinition(
274 |                 name="drugs.tradename",
275 |                 domain="drugs",
276 |                 type=FieldType.STRING,
277 |                 operators=[Operator.EQ],
278 |                 example_values=["Gleevec", "Tylenol", "Lipitor"],
279 |                 description="Drug trade name",
280 |                 underlying_api_field="tradename",
281 |             ),
282 |             FieldDefinition(
283 |                 name="drugs.indication",
284 |                 domain="drugs",
285 |                 type=FieldType.STRING,
286 |                 operators=[Operator.EQ],
287 |                 example_values=["leukemia", "hypertension", "diabetes"],
288 |                 description="Drug indication",
289 |                 underlying_api_field="indication",
290 |             ),
291 |         ]
292 | 
293 |         # Disease-specific fields
294 |         disease_fields = [
295 |             FieldDefinition(
296 |                 name="diseases.name",
297 |                 domain="diseases",
298 |                 type=FieldType.STRING,
299 |                 operators=[Operator.EQ],
300 |                 example_values=["melanoma", "breast cancer", "diabetes"],
301 |                 description="Disease name",
302 |                 underlying_api_field="name",
303 |             ),
304 |             FieldDefinition(
305 |                 name="diseases.mondo",
306 |                 domain="diseases",
307 |                 type=FieldType.STRING,
308 |                 operators=[Operator.EQ],
309 |                 example_values=["MONDO:0005105", "MONDO:0007254"],
310 |                 description="MONDO disease ID",
311 |                 underlying_api_field="mondo_id",
312 |             ),
313 |             FieldDefinition(
314 |                 name="diseases.synonym",
315 |                 domain="diseases",
316 |                 type=FieldType.STRING,
317 |                 operators=[Operator.EQ],
318 |                 example_values=["cancer", "tumor", "neoplasm"],
319 |                 description="Disease synonym",
320 |                 underlying_api_field="synonyms",
321 |             ),
322 |         ]
323 | 
324 |         # Build registry
325 |         for field_list in [
326 |             cross_domain_fields,
327 |             trial_fields,
328 |             article_fields,
329 |             variant_fields,
330 |             gene_fields,
331 |             drug_fields,
332 |             disease_fields,
333 |         ]:
334 |             for field in field_list:
335 |                 registry[field.name] = field
336 | 
337 |         return registry
338 | 
339 |     def parse(self, query: str) -> ParsedQuery:
340 |         """Parse a unified search query."""
341 |         # Simple tokenization - in production, use a proper parser
342 |         terms = self._tokenize(query)
343 |         parsed_terms = []
344 | 
345 |         cross_domain = {}
346 |         domain_specific: dict[str, dict[str, Any]] = {
347 |             "trials": {},
348 |             "articles": {},
349 |             "variants": {},
350 |             "genes": {},
351 |             "drugs": {},
352 |             "diseases": {},
353 |         }
354 | 
355 |         for term in terms:
356 |             if ":" in term:
357 |                 field, value = term.split(":", 1)
358 | 
359 |                 # Check if it's a known field
360 |                 if field in self.field_registry:
361 |                     field_def = self.field_registry[field]
362 |                     parsed_term = QueryTerm(
363 |                         field=field,
 364 |                         operator=Operator.EQ,  # always EQ; ">", "<" and ".." stay inside value
365 |                         value=value.strip('"'),
366 |                         domain=field_def.domain,
367 |                     )
368 |                     parsed_terms.append(parsed_term)
369 | 
370 |                     # Categorize the term
371 |                     if field_def.domain == "cross":
372 |                         cross_domain[field] = value.strip('"')
373 |                     else:
374 |                         domain = (
375 |                             field.split(".")[0]
376 |                             if "." in field
377 |                             else field_def.domain
378 |                         )
379 |                         if domain not in domain_specific:
380 |                             domain_specific[domain] = {}
381 |                         field_name = (
382 |                             field.split(".")[-1] if "." in field else field
383 |                         )
384 |                         domain_specific[domain][field_name] = value.strip('"')
385 | 
386 |         return ParsedQuery(
387 |             terms=parsed_terms,
388 |             cross_domain_fields=cross_domain,
389 |             domain_specific_fields=domain_specific,
390 |             raw_query=query,
391 |         )
392 | 
393 |     def _tokenize(self, query: str) -> list[str]:
394 |         """Simple tokenizer for query strings."""
395 |         # This is a simplified tokenizer - in production, use a proper lexer
 396 |         # For now, split on spaces outside double quotes; AND/OR/NOT are dropped below
397 |         tokens = []
398 |         current_token = ""
399 |         in_quotes = False
400 | 
401 |         for char in query:
402 |             if char == '"':
403 |                 in_quotes = not in_quotes
404 |                 current_token += char
405 |             elif char == " " and not in_quotes:
406 |                 if current_token:
407 |                     tokens.append(current_token)
408 |                     current_token = ""
409 |             else:
410 |                 current_token += char
411 | 
412 |         if current_token:
413 |             tokens.append(current_token)
414 | 
415 |         # Filter out boolean operators for now
416 |         return [t for t in tokens if t not in ["AND", "OR", "NOT"]]
417 | 
418 |     def get_schema(self) -> dict[str, Any]:
419 |         """Get the complete field schema for discovery."""
420 |         schema: dict[str, Any] = {
421 |             "domains": [
422 |                 "trials",
423 |                 "articles",
424 |                 "variants",
425 |                 "genes",
426 |                 "drugs",
427 |                 "diseases",
428 |             ],
429 |             "cross_domain_fields": {},
430 |             "domain_fields": {
431 |                 "trials": {},
432 |                 "articles": {},
433 |                 "variants": {},
434 |                 "genes": {},
435 |                 "drugs": {},
436 |                 "diseases": {},
437 |             },
438 |             "operators": [op.value for op in Operator],
439 |             "examples": [
440 |                 "gene:BRAF AND trials.condition:melanoma",
441 |                 "articles.date:>2023 AND disease:cancer",
442 |                 "variants.significance:pathogenic AND gene:TP53",
443 |                 "genes.symbol:BRAF AND genes.type:protein-coding",
444 |                 "drugs.tradename:gleevec",
445 |                 "diseases.name:melanoma",
446 |             ],
447 |         }
448 | 
449 |         for field_name, field_def in self.field_registry.items():
450 |             field_info = {
451 |                 "type": field_def.type.value,
452 |                 "operators": field_def.operators,
453 |                 "examples": field_def.example_values,
454 |                 "description": field_def.description,
455 |             }
456 | 
457 |             if field_def.domain == "cross":
458 |                 schema["cross_domain_fields"][field_name] = field_info
459 |             else:
460 |                 domain = field_name.split(".")[0]
461 |                 field_short_name = field_name.split(".")[-1]
462 |                 schema["domain_fields"][domain][field_short_name] = field_info
463 | 
464 |         return schema
465 | 
```
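
The grouping step at the end of `get_schema` can be tried in isolation. The sketch below is a minimal standalone approximation, not BioMCP's actual code: `FieldDef` and `group_fields` are hypothetical stand-ins, and the two registry entries are invented for illustration. Unlike the method above, it creates domain buckets on demand with `setdefault`, so a field from an unlisted domain would not raise a `KeyError`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FieldDef:
    """Hypothetical stand-in for a registry entry (not BioMCP's real class)."""

    domain: str  # "cross" or a domain name such as "trials"
    type: str
    operators: list[str]
    example_values: list[str] = field(default_factory=list)
    description: str = ""


def group_fields(registry: dict[str, FieldDef]) -> dict[str, Any]:
    """Split registry entries into cross-domain and per-domain buckets."""
    schema: dict[str, Any] = {"cross_domain_fields": {}, "domain_fields": {}}
    for field_name, field_def in registry.items():
        field_info = {
            "type": field_def.type,
            "operators": field_def.operators,
            "examples": field_def.example_values,
            "description": field_def.description,
        }
        if field_def.domain == "cross":
            schema["cross_domain_fields"][field_name] = field_info
        else:
            # "trials.condition" -> domain "trials", short name "condition"
            domain = field_name.split(".")[0]
            short_name = field_name.split(".")[-1]
            schema["domain_fields"].setdefault(domain, {})[short_name] = field_info
    return schema


# Illustrative registry; the field names mirror the query examples above.
registry = {
    "gene": FieldDef("cross", "string", ["eq"], ["BRAF"], "Gene symbol"),
    "trials.condition": FieldDef(
        "trials", "string", ["eq"], ["melanoma"], "Trial condition"
    ),
}
schema = group_fields(registry)
```

Cross-domain fields keep their full name as the key, while domain fields are keyed by their short name under the owning domain.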

--------------------------------------------------------------------------------
/src/biomcp/resources/instructions.md:
--------------------------------------------------------------------------------

```markdown
  1 | # BioMCP Instructions for the Biomedical Assistant
  2 | 
  3 | Welcome to **BioMCP** – your unified interface to access key biomedical data
  4 | sources. This document serves as an internal instruction set for the biomedical
  5 | assistant (LLM) to ensure a clear, well-reasoned, and accurate response to user
  6 | queries.
  7 | 
  8 | ---
  9 | 
 10 | ## CRITICAL: Always Use the 'think' Tool FIRST
 11 | 
 12 | **The 'think' tool is MANDATORY and must be your FIRST action when using BioMCP.**
 13 | 
 14 | 🚨 **REQUIRED USAGE:**
 15 | 
 16 | - You MUST call 'think' BEFORE any search or fetch operations
 17 | - EVERY biomedical research query requires thinking first
 18 | - ALL multi-step analyses must begin with the think tool
 19 | - ANY task using BioMCP tools requires prior planning with think
 20 | 
 21 | ⚠️ **WARNING:** Skipping the 'think' tool will result in:
 22 | 
 23 | - Incomplete analysis
 24 | - Poor search strategies
 25 | - Missing critical connections
 26 | - Suboptimal results
 27 | 
 28 | Start EVERY BioMCP interaction with the 'think' tool. Use it throughout your analysis to track progress. Only set nextThoughtNeeded=false when your analysis is complete.
 29 | 
 30 | ---
 31 | 
 32 | ## 1. Purpose of BioMCP
 33 | 
 34 | BioMCP (Biomedical Model Context Protocol) standardizes access to multiple
 35 | biomedical data sources. It transforms complex, filter-intensive queries into
 36 | natural language interactions. The assistant should leverage this capability
 37 | to:
 38 | 
 39 | - Integrate clinical trial data, literature, variant annotations, and
 40 |   comprehensive biomedical information from multiple resources.
 41 | - Synthesize the results into a coherent, accurate, and concise answer.
 42 | - Enhance user trust by providing key snippets and citations (with clickable
 43 |   URLs) from the original materials, unless the user opts to omit them.
 44 | 
 45 | ---
 46 | 
 47 | ## 2. Available Data Sources
 48 | 
 49 | BioMCP provides access to the following biomedical databases:
 50 | 
 51 | ### Literature & Clinical Sources
 52 | 
 53 | - **PubMed/PubTator3**: Peer-reviewed biomedical literature with entity annotations
 54 | - **bioRxiv/medRxiv**: Preprint servers (included by default in article searches)
 55 | - **Europe PMC**: Additional literature including preprints
 56 | - **ClinicalTrials.gov**: Clinical trial registry with comprehensive trial data
 57 | 
 58 | ### BioThings Suite APIs
 59 | 
 60 | - **MyVariant.info**: Genetic variant annotations and population frequencies
 61 | - **MyGene.info**: Real-time gene information, aliases, and summaries
 62 | - **MyDisease.info**: Disease ontology, definitions, and synonym expansion
 63 | - **MyChem.info**: Drug/chemical properties, mechanisms, and identifiers
 64 | 
 65 | ### Cancer & Genomic Resources
 66 | 
 67 | - **cBioPortal**: Cancer genomics data (automatically integrated with gene searches)
 68 | - **TCGA/GDC**: The Cancer Genome Atlas data for variants
 69 | - **1000 Genomes**: Population frequency data via Ensembl
 70 | 
 71 | ---
 72 | 
 73 | ## 3. Internal Workflow for Query Handling
 74 | 
 75 | When a user query is received (for example, "Please investigate ALK
 76 | rearrangements in advanced NSCLC..."), the assistant should follow these steps:
 77 | 
 78 | ### A. ALWAYS Start with the 'think' Tool
 79 | 
 80 | - **Use 'think' immediately:** For ANY biomedical research query, you MUST begin by invoking the 'think' tool to break down the problem systematically.
 81 | - **Initial thought should:** Parse the user's natural language query and extract relevant details such as gene variants (e.g., ALK rearrangements), disease type (advanced NSCLC), and treatment focus (combinations of ALK inhibitors with immunotherapy).
 82 | - **Continue thinking:** Use additional 'think' calls to plan your approach, identify data sources needed, and track your analysis progress.
 83 | 
 84 | ### B. Plan and Explain the Tool Sequence (via the 'think' Tool)
 85 | 
 86 | - **Use 'think' to plan:** Continue using the 'think' tool to outline your reasoning and planned tool sequence:
 87 |   - **Step 1:** Use gene_getter to understand ALK gene function and context.
 88 |   - **Step 2:** Use disease_getter to get comprehensive information about NSCLC,
 89 |     including synonyms for better search coverage.
 90 |   - **Step 3:** Use ClinicalTrials.gov to retrieve clinical trial data
 91 |     related to the query (disease synonyms are automatically expanded).
 92 |   - **Step 4:** Use PubMed (via PubTator3) to fetch relevant literature
 93 |     discussing outcomes or synergy. Note: Preprints from bioRxiv/medRxiv
 94 |     are included by default, and cBioPortal cancer genomics data is
 95 |     automatically integrated for gene-based searches.
 96 |   - **Step 5:** Query MyVariant.info for variant annotations (noting
 97 |     limitations for gene fusions if applicable).
 98 |   - **Step 6:** If specific drugs are mentioned, use drug_getter for
 99 |     mechanism of action and properties.
100 | - **Transparency:** Clearly indicate which tool is being called for which part
101 |   of the query.
102 | 
103 | #### Search Syntax Enhancement: OR Logic for Keywords
104 | 
105 | When searching articles, the keywords parameter now supports OR logic using the pipe (|) separator:
106 | 
107 | **Syntax**: `keyword1|keyword2|keyword3`
108 | 
109 | **Examples**:
110 | 
111 | - `"R173|Arg173|p.R173"` - Finds articles mentioning any of these variant notations
112 | - `"V600E|p.V600E|c.1799T>A"` - Handles different mutation nomenclatures
113 | - `"immunotherapy|checkpoint inhibitor|PD-1"` - Searches for related treatment terms
114 | - `"NSCLC|non-small cell lung cancer"` - Covers abbreviations and full terms
115 | 
116 | **Important Notes**:
117 | 
118 | - OR logic applies only within a single keyword string (one element of the keywords list)
119 | - Separate keywords are still combined with AND logic
120 | - Example: keywords=["BRAF|B-RAF", "therapy|treatment"] means:
121 |   - (BRAF OR B-RAF) AND (therapy OR treatment)
122 | 
123 | This feature is particularly useful for:
124 | 
125 | - Handling different nomenclatures for the same concept
126 | - Searching for synonyms or related terms
127 | - Dealing with abbreviations and full names
128 | - Finding articles that use different notations for variants
129 | 
130 | ### C. Execute and Synthesize Results
131 | 
132 | - **Combine Data:** After retrieving results from each tool, synthesize the
133 |   information into a final answer.
134 | - **Include Citations with URLs:** Always include clickable URLs from the
135 |   original sources in your citations. Extract URLs (Pubmed_Url, Doi_Url,
136 |   Study_Url, etc.) from function results and incorporate these into your
137 |   response when referencing specific findings or papers.
138 | - **Follow-up Opportunity:** If the response leaves any ambiguity or if
139 |   additional information might be helpful, prompt the user for follow-up
140 |   questions.
141 | 
142 | ---
143 | 
144 | ## 4. Best Practices for the Biomedical Assistant
145 | 
146 | - **Understanding the Query:** Focus on accurately interpreting the user's
147 |   query, rather than instructing the user on query formulation.
148 | - **Reasoning Transparency:** Briefly explain your thought process and the
149 |   sequence of tool calls before presenting the final answer.
150 | - **Conciseness and Clarity:** Ensure your final response is succinct and
151 |   well-organized, using bullet points or sections as needed.
152 | - **Mandatory Citations:** Provide key snippets and links to the
153 |   original materials (e.g., clinical trial records, PubMed articles, ClinVar
154 |   entries, COSMIC database) to support the answer. ALWAYS include clickable
155 |   URLs to these resources when referencing specific findings or data.
156 | - **Clarifying Questions Before Starting:** If anything is unclear in the
157 |   user's query or if more details would improve the answer, politely request
158 |   additional clarification.
159 | - **Audience Awareness:** Structure your response with both depth for
160 |   specialists and clarity for general audiences. Begin with accessible
161 |   explanations before delving into scientific details.
162 | - **Organization and Clarity:** Ensure your final response is well-structured,
163 |   accessible, and easy to navigate by:
164 |   - Using descriptive section headings and subheadings to organize
165 |     information logically
166 |   - Employing consistent formatting with bulleted or numbered lists to break
167 |     down complex information
168 |   - Starting each major section with a plain-language summary before
169 |     exploring technical details
170 |   - Creating clear visual separation between different topics
171 |   - Using concise sentence structures while maintaining informational depth
172 |   - Explicitly differentiating between established practices and experimental
173 |     approaches
174 |   - Including brief transition sentences between major sections
175 |   - Presenting clinical trial data in consistent formats
176 |   - Using strategic white space to improve readability
177 |   - Summarizing key takeaways at the end of major sections when appropriate
178 | 
179 | ---
180 | 
181 | ## 5. Visual Organization and Formatting
182 | 
183 | - **Comparison Tables:** When comparing two or more entities (like mutation
184 |   classes, treatment approaches, or clinical trials), create a comparison table
185 |   to highlight key differences at a glance. Tables should have clear headers,
186 |   consistent formatting, and focus on the most important distinguishing
187 |   features.
188 | - **Format Optimization:** Utilize formatting elements strategically - tables
189 |   for comparisons, bullet points for lists, headings for section organization,
190 |   and whitespace for readability.
191 | - **Visual Hierarchy:** For complex biomedical topics, create a visual
192 |   hierarchy that helps readers quickly identify key information.
193 | - **Balance Between Comprehensiveness and Clarity:** While providing
194 |   comprehensive information, prioritize clarity and accessibility. Organize
195 |   content from most important/general to more specialized details.
196 | - **Section Summaries:** Conclude sections with key takeaways that highlight
197 |   the practical implications of the scientific information.
198 | 
199 | ---
200 | 
201 | ## 6. Example Scenarios
202 | 
203 | ### Example 1: ALK Rearrangements in Advanced NSCLC
204 | 
205 | For a query such as:
206 | 
207 | ```
208 | Please investigate ALK rearrangements in advanced NSCLC, particularly any
209 | clinical trials exploring combinations of ALK inhibitors and immunotherapy.
210 | ```
211 | 
212 | The assistant should:
213 | 
214 | 1. **Start with the 'think' Tool:**
215 |    - Invoke 'think' with thoughtNumber=1 to understand the query focus on ALK rearrangements in advanced NSCLC with combination treatments
216 |    - Use thoughtNumber=2 to plan the research approach and identify needed data sources
217 | 2. **Execute Tool Calls (tracking with 'think'):**
218 |    - **First:** Use gene_getter("ALK") to understand the gene's function and role in cancer (document findings in thoughtNumber=3)
219 |    - **Second:** Use disease_getter("NSCLC") to get disease information and synonyms like "non-small cell lung cancer" (document in thoughtNumber=4)
220 |    - **Third:** Query ClinicalTrials.gov for ALK+ NSCLC trials that combine ALK inhibitors with immunotherapy (document findings in thoughtNumber=5)
221 |    - **Fourth:** Query PubMed to retrieve key articles discussing treatment outcomes or synergy (document in thoughtNumber=6)
222 |    - **Fifth:** Check MyVariant.info for any annotations on ALK fusions or rearrangements (document in thoughtNumber=7)
223 |    - **Sixth:** If specific ALK inhibitors are mentioned, use drug_getter to understand their mechanisms (document in thoughtNumber=8)
224 | 3. **Synthesize and Report (via 'think'):** Use final thoughts to synthesize findings before producing the answer that includes:
225 |    - A concise summary of clinical trials with comparison tables like:
226 | 
227 | | **Trial**        | **Combination**        | **Patient Population**         | **Results** | **Safety Profile**                              | **Reference**                                                    |
228 | | ---------------- | ---------------------- | ------------------------------ | ----------- | ----------------------------------------------- | ---------------------------------------------------------------- |
229 | | CheckMate 370    | Crizotinib + Nivolumab | 13 treatment-naive ALK+ NSCLC  | 38% ORR     | 5/13 with grade ≥3 hepatic toxicities; 2 deaths | [Schenk et al., 2023](https://pubmed.ncbi.nlm.nih.gov/36895933/) |
230 | | JAVELIN Lung 101 | Avelumab + Lorlatinib  | 28 previously treated patients | 46.4% ORR   | No DLTs; milder toxicity                        | [NCT02584634](https://clinicaltrials.gov/study/NCT02584634)      |
231 | 
232 |     - Key literature findings with proper citations:
233 |       "A review by Schenk concluded that combining ALK inhibitors with checkpoint inhibitors resulted in 'significant toxicities without clear improvement in patient outcomes' [https://pubmed.ncbi.nlm.nih.gov/36895933/](https://pubmed.ncbi.nlm.nih.gov/36895933/)."
234 | 
235 |     - Tables comparing response rates:
236 | 
237 | | **Study**             | **Patient Population** | **Immunotherapy Agent**       | **Response Rate** | **Reference**                                                 |
238 | | --------------------- | ---------------------- | ----------------------------- | ----------------- | ------------------------------------------------------------- |
239 | | ATLANTIC Trial        | 11 ALK+ NSCLC          | Durvalumab                    | 0%                | [Link to study](https://pubmed.ncbi.nlm.nih.gov/36895933/)    |
240 | | IMMUNOTARGET Registry | 19 ALK+ NSCLC          | Various PD-1/PD-L1 inhibitors | 0%                | [Link to registry](https://pubmed.ncbi.nlm.nih.gov/36895933/) |
241 | 
242 |     - Variant information with proper attribution.
243 | 
244 | 4. **Offer Follow-up:** Conclude by asking if further details are needed or if
245 |    any part of the answer should be clarified.
246 | 
247 | ### Example 2: BRAF Mutation Classes in Cancer Therapeutics
248 | 
249 | For a query such as:
250 | 
251 | ```
252 | Please investigate the differences in BRAF Class I (e.g., V600E) and Class III
253 | (e.g., D594G) mutations that lead to different therapeutic strategies in cancers
254 | like melanoma or colorectal carcinoma.
255 | ```
256 | 
257 | The assistant should:
258 | 
259 | 1. **Start with the 'think' Tool:** Use 'think' to identify that the query compares
260 |    two specific BRAF mutation classes (Class I/V600E vs. Class III/D594G) and their
261 |    therapeutic implications in melanoma and colorectal cancer.
262 | 
263 | 2. **Plan Tool Calls:**
264 | 
265 |    - **First:** Search PubMed literature to understand the molecular
266 |      differences between BRAF Class I and Class III mutations.
267 |    - **Second:** Explore specific variant details using the variant search
268 |      tool to understand the characteristics of these mutations.
269 |    - **Third:** Look for clinical trials involving these mutation types to
270 |      identify therapeutic strategies.
271 | 
272 | 3. **Synthesize and Report:** Create a comprehensive comparison that includes:
273 |    - Comparison tables highlighting key differences between mutation classes:
274 | 
275 | | Feature                      | Class I (e.g., V600E)          | Class III (e.g., D594G)                    |
276 | | ---------------------------- | ------------------------------ | ------------------------------------------ |
277 | | **Signaling Mechanism**      | Constitutively active monomers | Kinase-impaired heterodimers               |
278 | | **RAS Dependency**           | RAS-independent                | RAS-dependent                              |
279 | | **Dimerization Requirement** | Function as monomers           | Require heterodimerization with CRAF       |
280 | | **Therapeutic Response**     | Responsive to BRAF inhibitors  | Paradoxically activated by BRAF inhibitors |
281 | 
282 |     - Specific therapeutic strategies with clickable citation links:
283 |         - For Class I: BRAF inhibitors as demonstrated
284 |           in [Davies et al.](https://pubmed.ncbi.nlm.nih.gov/35869122/)
285 |         - For Class III: Alternative approaches such as MEK inhibitors shown
286 |           in [Śmiech et al.](https://pubmed.ncbi.nlm.nih.gov/33198372/)
287 | 
288 |     - Cancer-specific implications with relevant clinical evidence:
289 |         - Melanoma treatment differences including clinical trial data
290 |           from [NCT05767879](https://clinicaltrials.gov/study/NCT05767879)
291 |         - Colorectal cancer approaches citing research
292 |           from [Liu et al.](https://pubmed.ncbi.nlm.nih.gov/37760573/)
293 | 
294 | 4. **Offer Follow-up:** Conclude by asking if the user would like more detailed
295 |    information on specific aspects, such as resistance mechanisms, emerging
296 |    therapies, or mutation detection methods.
297 | 
```
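
The pipe-separated OR syntax described in the instructions above (OR within one keyword string, AND across separate keywords) can be sketched in a few lines. This is a minimal illustration of the stated semantics only: `parse_keywords`, `to_boolean_expression`, and `matches` are hypothetical helper names, and the case-insensitive substring matching is an assumption made for the example, not a description of how BioMCP or its backends evaluate queries.

```python
def parse_keywords(keywords: list[str]) -> list[list[str]]:
    """Turn pipe-separated keyword strings into AND-of-OR groups."""
    groups: list[list[str]] = []
    for keyword in keywords:
        # Each "|" separates alternatives within one keyword (OR logic).
        alternatives = [alt.strip() for alt in keyword.split("|") if alt.strip()]
        if alternatives:
            groups.append(alternatives)
    return groups


def to_boolean_expression(keywords: list[str]) -> str:
    """Render the groups as a readable boolean expression."""
    return " AND ".join(
        "(" + " OR ".join(group) + ")" for group in parse_keywords(keywords)
    )


def matches(text: str, keywords: list[str]) -> bool:
    """Illustrative matcher: every group needs at least one alternative present."""
    lowered = text.lower()
    return all(
        any(alt.lower() in lowered for alt in group)
        for group in parse_keywords(keywords)
    )


example = ["BRAF|B-RAF", "therapy|treatment"]
print(to_boolean_expression(example))  # (BRAF OR B-RAF) AND (therapy OR treatment)
```

With these helpers, `matches("B-RAF inhibitor treatment outcomes", example)` is true, while `matches("BRAF biology", example)` is false because no alternative from the second group appears.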