This is page 9 of 15. Use http://codebase.md/genomoncology/biomcp?page={x} to view the full context.

# Directory Structure

```
├── .github
│   ├── actions
│   │   └── setup-python-env
│   │       └── action.yml
│   ├── dependabot.yml
│   └── workflows
│       ├── ci.yml
│       ├── deploy-docs.yml
│       ├── main.yml.disabled
│       ├── on-release-main.yml
│       └── validate-codecov-config.yml
├── .gitignore
├── .pre-commit-config.yaml
├── BIOMCP_DATA_FLOW.md
├── CHANGELOG.md
├── CNAME
├── codecov.yaml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── apis
│   │   ├── error-codes.md
│   │   ├── overview.md
│   │   └── python-sdk.md
│   ├── assets
│   │   ├── biomcp-cursor-locations.png
│   │   ├── favicon.ico
│   │   ├── icon.png
│   │   ├── logo.png
│   │   ├── mcp_architecture.txt
│   │   └── remote-connection
│   │       ├── 00_connectors.png
│   │       ├── 01_add_custom_connector.png
│   │       ├── 02_connector_enabled.png
│   │       ├── 03_connect_to_biomcp.png
│   │       ├── 04_select_google_oauth.png
│   │       └── 05_success_connect.png
│   ├── backend-services-reference
│   │   ├── 01-overview.md
│   │   ├── 02-biothings-suite.md
│   │   ├── 03-cbioportal.md
│   │   ├── 04-clinicaltrials-gov.md
│   │   ├── 05-nci-cts-api.md
│   │   ├── 06-pubtator3.md
│   │   └── 07-alphagenome.md
│   ├── blog
│   │   ├── ai-assisted-clinical-trial-search-analysis.md
│   │   ├── images
│   │   │   ├── deep-researcher-video.png
│   │   │   ├── researcher-announce.png
│   │   │   ├── researcher-drop-down.png
│   │   │   ├── researcher-prompt.png
│   │   │   ├── trial-search-assistant.png
│   │   │   └── what_is_biomcp_thumbnail.png
│   │   └── researcher-persona-resource.md
│   ├── changelog.md
│   ├── CNAME
│   ├── concepts
│   │   ├── 01-what-is-biomcp.md
│   │   ├── 02-the-deep-researcher-persona.md
│   │   └── 03-sequential-thinking-with-the-think-tool.md
│   ├── developer-guides
│   │   ├── 01-server-deployment.md
│   │   ├── 02-contributing-and-testing.md
│   │   ├── 03-third-party-endpoints.md
│   │   ├── 04-transport-protocol.md
│   │   ├── 05-error-handling.md
│   │   ├── 06-http-client-and-caching.md
│   │   ├── 07-performance-optimizations.md
│   │   └── generate_endpoints.py
│   ├── faq-condensed.md
│   ├── FDA_SECURITY.md
│   ├── genomoncology.md
│   ├── getting-started
│   │   ├── 01-quickstart-cli.md
│   │   ├── 02-claude-desktop-integration.md
│   │   └── 03-authentication-and-api-keys.md
│   ├── how-to-guides
│   │   ├── 01-find-articles-and-cbioportal-data.md
│   │   ├── 02-find-trials-with-nci-and-biothings.md
│   │   ├── 03-get-comprehensive-variant-annotations.md
│   │   ├── 04-predict-variant-effects-with-alphagenome.md
│   │   ├── 05-logging-and-monitoring-with-bigquery.md
│   │   └── 06-search-nci-organizations-and-interventions.md
│   ├── index.md
│   ├── policies.md
│   ├── reference
│   │   ├── architecture-diagrams.md
│   │   ├── quick-architecture.md
│   │   ├── quick-reference.md
│   │   └── visual-architecture.md
│   ├── robots.txt
│   ├── stylesheets
│   │   ├── announcement.css
│   │   └── extra.css
│   ├── troubleshooting.md
│   ├── tutorials
│   │   ├── biothings-prompts.md
│   │   ├── claude-code-biomcp-alphagenome.md
│   │   ├── nci-prompts.md
│   │   ├── openfda-integration.md
│   │   ├── openfda-prompts.md
│   │   ├── pydantic-ai-integration.md
│   │   └── remote-connection.md
│   ├── user-guides
│   │   ├── 01-command-line-interface.md
│   │   ├── 02-mcp-tools-reference.md
│   │   └── 03-integrating-with-ides-and-clients.md
│   └── workflows
│       └── all-workflows.md
├── example_scripts
│   ├── mcp_integration.py
│   └── python_sdk.py
├── glama.json
├── LICENSE
├── lzyank.toml
├── Makefile
├── mkdocs.yml
├── package-lock.json
├── package.json
├── pyproject.toml
├── README.md
├── scripts
│   ├── check_docs_in_mkdocs.py
│   ├── check_http_imports.py
│   └── generate_endpoints_doc.py
├── smithery.yaml
├── src
│   └── biomcp
│       ├── __init__.py
│       ├── __main__.py
│       ├── articles
│       │   ├── __init__.py
│       │   ├── autocomplete.py
│       │   ├── fetch.py
│       │   ├── preprints.py
│       │   ├── search_optimized.py
│       │   ├── search.py
│       │   └── unified.py
│       ├── biomarkers
│       │   ├── __init__.py
│       │   └── search.py
│       ├── cbioportal_helper.py
│       ├── circuit_breaker.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── articles.py
│       │   ├── biomarkers.py
│       │   ├── diseases.py
│       │   ├── health.py
│       │   ├── interventions.py
│       │   ├── main.py
│       │   ├── openfda.py
│       │   ├── organizations.py
│       │   ├── server.py
│       │   ├── trials.py
│       │   └── variants.py
│       ├── connection_pool.py
│       ├── constants.py
│       ├── core.py
│       ├── diseases
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── domain_handlers.py
│       ├── drugs
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── exceptions.py
│       ├── genes
│       │   ├── __init__.py
│       │   └── getter.py
│       ├── http_client_simple.py
│       ├── http_client.py
│       ├── individual_tools.py
│       ├── integrations
│       │   ├── __init__.py
│       │   ├── biothings_client.py
│       │   └── cts_api.py
│       ├── interventions
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── logging_filter.py
│       ├── metrics_handler.py
│       ├── metrics.py
│       ├── openfda
│       │   ├── __init__.py
│       │   ├── adverse_events_helpers.py
│       │   ├── adverse_events.py
│       │   ├── cache.py
│       │   ├── constants.py
│       │   ├── device_events_helpers.py
│       │   ├── device_events.py
│       │   ├── drug_approvals.py
│       │   ├── drug_labels_helpers.py
│       │   ├── drug_labels.py
│       │   ├── drug_recalls_helpers.py
│       │   ├── drug_recalls.py
│       │   ├── drug_shortages_detail_helpers.py
│       │   ├── drug_shortages_helpers.py
│       │   ├── drug_shortages.py
│       │   ├── exceptions.py
│       │   ├── input_validation.py
│       │   ├── rate_limiter.py
│       │   ├── utils.py
│       │   └── validation.py
│       ├── organizations
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   └── search.py
│       ├── parameter_parser.py
│       ├── prefetch.py
│       ├── query_parser.py
│       ├── query_router.py
│       ├── rate_limiter.py
│       ├── render.py
│       ├── request_batcher.py
│       ├── resources
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── instructions.md
│       │   └── researcher.md
│       ├── retry.py
│       ├── router_handlers.py
│       ├── router.py
│       ├── shared_context.py
│       ├── thinking
│       │   ├── __init__.py
│       │   ├── sequential.py
│       │   └── session.py
│       ├── thinking_tool.py
│       ├── thinking_tracker.py
│       ├── trials
│       │   ├── __init__.py
│       │   ├── getter.py
│       │   ├── nci_getter.py
│       │   ├── nci_search.py
│       │   └── search.py
│       ├── utils
│       │   ├── __init__.py
│       │   ├── cancer_types_api.py
│       │   ├── cbio_http_adapter.py
│       │   ├── endpoint_registry.py
│       │   ├── gene_validator.py
│       │   ├── metrics.py
│       │   ├── mutation_filter.py
│       │   ├── query_utils.py
│       │   ├── rate_limiter.py
│       │   └── request_cache.py
│       ├── variants
│       │   ├── __init__.py
│       │   ├── alphagenome.py
│       │   ├── cancer_types.py
│       │   ├── cbio_external_client.py
│       │   ├── cbioportal_mutations.py
│       │   ├── cbioportal_search_helpers.py
│       │   ├── cbioportal_search.py
│       │   ├── constants.py
│       │   ├── external.py
│       │   ├── filters.py
│       │   ├── getter.py
│       │   ├── links.py
│       │   └── search.py
│       └── workers
│           ├── __init__.py
│           ├── worker_entry_stytch.js
│           ├── worker_entry.js
│           └── worker.py
├── tests
│   ├── bdd
│   │   ├── cli_help
│   │   │   ├── help.feature
│   │   │   └── test_help.py
│   │   ├── conftest.py
│   │   ├── features
│   │   │   └── alphagenome_integration.feature
│   │   ├── fetch_articles
│   │   │   ├── fetch.feature
│   │   │   └── test_fetch.py
│   │   ├── get_trials
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── get_variants
│   │   │   ├── get.feature
│   │   │   └── test_get.py
│   │   ├── search_articles
│   │   │   ├── autocomplete.feature
│   │   │   ├── search.feature
│   │   │   ├── test_autocomplete.py
│   │   │   └── test_search.py
│   │   ├── search_trials
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   ├── search_variants
│   │   │   ├── search.feature
│   │   │   └── test_search.py
│   │   └── steps
│   │       └── test_alphagenome_steps.py
│   ├── config
│   │   └── test_smithery_config.py
│   ├── conftest.py
│   ├── data
│   │   ├── ct_gov
│   │   │   ├── clinical_trials_api_v2.yaml
│   │   │   ├── trials_NCT04280705.json
│   │   │   └── trials_NCT04280705.txt
│   │   ├── myvariant
│   │   │   ├── myvariant_api.yaml
│   │   │   ├── myvariant_field_descriptions.csv
│   │   │   ├── variants_full_braf_v600e.json
│   │   │   ├── variants_full_braf_v600e.txt
│   │   │   └── variants_part_braf_v600_multiple.json
│   │   ├── openfda
│   │   │   ├── drugsfda_detail.json
│   │   │   ├── drugsfda_search.json
│   │   │   ├── enforcement_detail.json
│   │   │   └── enforcement_search.json
│   │   └── pubtator
│   │       ├── pubtator_autocomplete.json
│   │       └── pubtator3_paper.txt
│   ├── integration
│   │   ├── test_openfda_integration.py
│   │   ├── test_preprints_integration.py
│   │   ├── test_simple.py
│   │   └── test_variants_integration.py
│   ├── tdd
│   │   ├── articles
│   │   │   ├── test_autocomplete.py
│   │   │   ├── test_cbioportal_integration.py
│   │   │   ├── test_fetch.py
│   │   │   ├── test_preprints.py
│   │   │   ├── test_search.py
│   │   │   └── test_unified.py
│   │   ├── conftest.py
│   │   ├── drugs
│   │   │   ├── __init__.py
│   │   │   └── test_drug_getter.py
│   │   ├── openfda
│   │   │   ├── __init__.py
│   │   │   ├── test_adverse_events.py
│   │   │   ├── test_device_events.py
│   │   │   ├── test_drug_approvals.py
│   │   │   ├── test_drug_labels.py
│   │   │   ├── test_drug_recalls.py
│   │   │   ├── test_drug_shortages.py
│   │   │   └── test_security.py
│   │   ├── test_biothings_integration_real.py
│   │   ├── test_biothings_integration.py
│   │   ├── test_circuit_breaker.py
│   │   ├── test_concurrent_requests.py
│   │   ├── test_connection_pool.py
│   │   ├── test_domain_handlers.py
│   │   ├── test_drug_approvals.py
│   │   ├── test_drug_recalls.py
│   │   ├── test_drug_shortages.py
│   │   ├── test_endpoint_documentation.py
│   │   ├── test_error_scenarios.py
│   │   ├── test_europe_pmc_fetch.py
│   │   ├── test_mcp_integration.py
│   │   ├── test_mcp_tools.py
│   │   ├── test_metrics.py
│   │   ├── test_nci_integration.py
│   │   ├── test_nci_mcp_tools.py
│   │   ├── test_network_policies.py
│   │   ├── test_offline_mode.py
│   │   ├── test_openfda_unified.py
│   │   ├── test_pten_r173_search.py
│   │   ├── test_render.py
│   │   ├── test_request_batcher.py.disabled
│   │   ├── test_retry.py
│   │   ├── test_router.py
│   │   ├── test_shared_context.py.disabled
│   │   ├── test_unified_biothings.py
│   │   ├── thinking
│   │   │   ├── __init__.py
│   │   │   └── test_sequential.py
│   │   ├── trials
│   │   │   ├── test_backward_compatibility.py
│   │   │   ├── test_getter.py
│   │   │   └── test_search.py
│   │   ├── utils
│   │   │   ├── test_gene_validator.py
│   │   │   ├── test_mutation_filter.py
│   │   │   ├── test_rate_limiter.py
│   │   │   └── test_request_cache.py
│   │   ├── variants
│   │   │   ├── constants.py
│   │   │   ├── test_alphagenome_api_key.py
│   │   │   ├── test_alphagenome_comprehensive.py
│   │   │   ├── test_alphagenome.py
│   │   │   ├── test_cbioportal_mutations.py
│   │   │   ├── test_cbioportal_search.py
│   │   │   ├── test_external_integration.py
│   │   │   ├── test_external.py
│   │   │   ├── test_extract_gene_aa_change.py
│   │   │   ├── test_filters.py
│   │   │   ├── test_getter.py
│   │   │   ├── test_links.py
│   │   │   └── test_search.py
│   │   └── workers
│   │       └── test_worker_sanitization.js
│   └── test_pydantic_ai_integration.py
├── THIRD_PARTY_ENDPOINTS.md
├── tox.ini
├── uv.lock
└── wrangler.toml
```

# Files

--------------------------------------------------------------------------------
/src/biomcp/variants/cbioportal_search.py:
--------------------------------------------------------------------------------

```python
"""cBioPortal search enhancements for variant queries."""

import asyncio
import logging
from typing import Any

from pydantic import BaseModel, Field

from ..utils.cbio_http_adapter import CBioHTTPAdapter
from ..utils.gene_validator import is_valid_gene_symbol, sanitize_gene_symbol
from ..utils.request_cache import request_cache
from .cancer_types import get_cancer_keywords

logger = logging.getLogger(__name__)

# Cache for frequently accessed data
_cancer_type_cache: dict[str, dict[str, Any]] = {}
_gene_panel_cache: dict[str, list[str]] = {}


class GeneHotspot(BaseModel):
    """Hotspot mutation information."""

    position: int
    amino_acid_change: str
    count: int
    frequency: float
    cancer_types: list[str] = Field(default_factory=list)


class CBioPortalSearchSummary(BaseModel):
    """Summary data from cBioPortal for a gene search."""

    gene: str
    total_mutations: int = 0
    total_samples_tested: int = 0
    mutation_frequency: float = 0.0
    hotspots: list[GeneHotspot] = Field(default_factory=list)
    cancer_distribution: dict[str, int] = Field(default_factory=dict)
    study_coverage: dict[str, Any] = Field(default_factory=dict)
    top_studies: list[str] = Field(default_factory=list)


class CBioPortalSearchClient:
    """Client for cBioPortal search operations."""

    def __init__(self):
        self.http_adapter = CBioHTTPAdapter()

    @request_cache(ttl=900)  # Cache for 15 minutes
    async def get_gene_search_summary(
        self, gene: str, max_studies: int = 10
    ) -> CBioPortalSearchSummary | None:
        """Get summary statistics for a gene across cBioPortal.

        Args:
            gene: Gene symbol (e.g., "BRAF")
            max_studies: Maximum number of studies to query

        Returns:
            Summary statistics or None if gene not found
        """
        # Validate and sanitize gene symbol
        if not is_valid_gene_symbol(gene):
            logger.warning(f"Invalid gene symbol: {gene}")
            return None

        gene = sanitize_gene_symbol(gene)

        try:
            # Get gene info first
            gene_data, error = await self.http_adapter.get(
                f"/genes/{gene}", endpoint_key="cbioportal_genes"
            )
            if error or not gene_data:
                logger.warning(f"Gene {gene} not found in cBioPortal")
                return None

            gene_id = gene_data.get("entrezGeneId")

            if not gene_id:
                return None

            # Get cancer type keywords for this gene
            cancer_keywords = get_cancer_keywords(gene)

            # Get relevant molecular profiles in parallel with cancer types
            profiles_task = self._get_relevant_profiles(gene, cancer_keywords)
            cancer_types_task = self._get_cancer_types()

            profiles, cancer_types = await asyncio.gather(
                profiles_task, cancer_types_task
            )

            if not profiles:
                logger.info(f"No relevant profiles found for {gene}")
                return None

            # Query mutations from top studies
            selected_profiles = profiles[:max_studies]
            mutation_summary = await self._get_mutation_summary(
                gene_id, selected_profiles, cancer_types
            )

            # Build summary
            summary = CBioPortalSearchSummary(
                gene=gene,
                total_mutations=mutation_summary.get("total_mutations", 0),
                total_samples_tested=mutation_summary.get("total_samples", 0),
                mutation_frequency=mutation_summary.get("frequency", 0.0),
                hotspots=mutation_summary.get("hotspots", []),
                cancer_distribution=mutation_summary.get(
                    "cancer_distribution", {}
                ),
                study_coverage={
                    "total_studies": len(profiles),
                    "queried_studies": len(selected_profiles),
                    "studies_with_data": mutation_summary.get(
                        "studies_with_data", 0
                    ),
                },
                top_studies=[
                    p.get("studyId", "")
                    for p in selected_profiles
                    if p.get("studyId")
                ][:5],
            )

            return summary

        except TimeoutError:
            logger.error(
                f"cBioPortal API timeout for gene {gene}. "
                "The API may be slow or unavailable. Try again later."
            )
            return None
        except ConnectionError as e:
            logger.error(
                f"Network error accessing cBioPortal for gene {gene}: {e}. "
                "Check your internet connection."
            )
            return None
        except Exception as e:
            logger.error(
                f"Unexpected error getting cBioPortal summary for {gene}: "
                f"{type(e).__name__}: {e}. "
                "This may be a temporary issue. If it persists, please report it."
            )
            return None

    async def _get_cancer_types(self) -> dict[str, dict[str, Any]]:
        """Get cancer type hierarchy (cached)."""
        if _cancer_type_cache:
            return _cancer_type_cache

        try:
            cancer_types, error = await self.http_adapter.get(
                "/cancer-types",
                endpoint_key="cbioportal_cancer_types",
                cache_ttl=86400,  # Cache for 24 hours
            )
            if not error and cancer_types:
                # Build lookup by ID
                for ct in cancer_types:
                    ct_id = ct.get("cancerTypeId")
                    if ct_id:
                        _cancer_type_cache[ct_id] = ct
                return _cancer_type_cache
        except Exception as e:
            logger.warning(f"Failed to get cancer types: {e}")

        return {}

    async def _get_relevant_profiles(
        self,
        gene: str,
        cancer_keywords: list[str],
    ) -> list[dict[str, Any]]:
        """Get molecular profiles relevant to the gene."""
        try:
            # Get all mutation profiles
            all_profiles, error = await self.http_adapter.get(
                "/molecular-profiles",
                params={"molecularAlterationType": "MUTATION_EXTENDED"},
                endpoint_key="cbioportal_molecular_profiles",
                cache_ttl=3600,  # Cache for 1 hour
            )

            if error or not all_profiles:
                return []

            # Filter by cancer keywords
            relevant_profiles = []
            for profile in all_profiles:
                study_id = profile.get("studyId", "").lower()
                if any(keyword in study_id for keyword in cancer_keywords):
                    relevant_profiles.append(profile)

            # Ideally sort by sample count (larger studies first), but that
            # would require fetching study details for actual sample counts.
            # For now, prioritize known large studies by study ID substring.
            priority_studies = [
                "msk_impact",
                "tcga",
                "genie",
                "metabric",
                "broad",
            ]

            def study_priority(profile):
                study_id = profile.get("studyId", "").lower()
                for i, priority in enumerate(priority_studies):
                    if priority in study_id:
                        return i
                return len(priority_studies)

            relevant_profiles.sort(key=study_priority)

            return relevant_profiles

        except Exception as e:
            logger.warning(f"Failed to get profiles: {e}")
            return []

    async def _get_mutation_summary(
        self,
        gene_id: int,
        profiles: list[dict[str, Any]],
        cancer_types: dict[str, dict[str, Any]],
    ) -> dict[str, Any]:
        """Get mutation summary across selected profiles."""
        # Batch mutation queries for better performance.
        # Process 5 profiles at a time to avoid overwhelming the API.
        BATCH_SIZE = 5

        mutation_results = []
        study_ids = []

        for i in range(0, len(profiles), BATCH_SIZE):
            batch = profiles[i : i + BATCH_SIZE]
            batch_tasks = []
            batch_study_ids = []

            for profile in batch:
                profile_id = profile.get("molecularProfileId")
                study_id = profile.get("studyId")
                if profile_id and study_id:
                    task = self._get_profile_mutations(
                        gene_id, profile_id, study_id
                    )
                    batch_tasks.append(task)
                    batch_study_ids.append(study_id)

            if batch_tasks:
                # Execute batch in parallel
                batch_results = await asyncio.gather(
                    *batch_tasks, return_exceptions=True
                )
                mutation_results.extend(batch_results)
                study_ids.extend(batch_study_ids)

                # Small delay between batches to avoid rate limiting
                if i + BATCH_SIZE < len(profiles):
                    await asyncio.sleep(0.05)  # 50ms delay

        results = mutation_results

        # Process results using helper function
        from .cbioportal_search_helpers import (
            format_hotspots,
            process_mutation_results,
        )

        mutation_data = await process_mutation_results(
            list(zip(results, study_ids, strict=False)),
            cancer_types,
            self,
        )

        # Calculate frequency
        frequency = (
            mutation_data["total_mutations"] / mutation_data["total_samples"]
            if mutation_data["total_samples"] > 0
            else 0.0
        )

        # Format hotspots
        hotspots = format_hotspots(
            mutation_data["hotspot_counts"], mutation_data["total_mutations"]
        )

        return {
            "total_mutations": mutation_data["total_mutations"],
            "total_samples": mutation_data["total_samples"],
            "frequency": frequency,
            "hotspots": hotspots,
            "cancer_distribution": mutation_data["cancer_distribution"],
            "studies_with_data": mutation_data["studies_with_data"],
        }

    async def _get_profile_mutations(
        self,
        gene_id: int,
        profile_id: str,
        study_id: str,
    ) -> dict[str, Any] | None:
        """Get mutations for a gene in a specific profile."""
        try:
            # Get sample count for the study
            samples, samples_error = await self.http_adapter.get(
                f"/studies/{study_id}/samples",
                params={"projection": "SUMMARY"},
                endpoint_key="cbioportal_studies",
                cache_ttl=3600,  # Cache for 1 hour
            )

            sample_count = len(samples) if samples and not samples_error else 0

            # Get mutations
            mutations, mut_error = await self.http_adapter.get(
                f"/molecular-profiles/{profile_id}/mutations",
                params={
                    "sampleListId": f"{study_id}_all",
                    "geneIdType": "ENTREZ_GENE_ID",
                    "geneIds": str(gene_id),
                    "projection": "SUMMARY",
                },
                endpoint_key="cbioportal_mutations",
                cache_ttl=900,  # Cache for 15 minutes
            )

            if not mut_error and mutations:
                return {"mutations": mutations, "sample_count": sample_count}

        except Exception as e:
            logger.debug(
                f"Failed to get mutations for {profile_id}: {type(e).__name__}"
            )

        return None

    async def _get_study_cancer_type(
        self,
        study_id: str,
        cancer_types: dict[str, dict[str, Any]],
    ) -> str:
        """Get cancer type name for a study."""
        try:
            study, error = await self.http_adapter.get(
                f"/studies/{study_id}",
                endpoint_key="cbioportal_studies",
                cache_ttl=3600,  # Cache for 1 hour
            )
            if not error and study:
                cancer_type_id = study.get("cancerTypeId")
                if cancer_type_id and cancer_type_id in cancer_types:
                    return cancer_types[cancer_type_id].get("name", "Unknown")
                elif cancer_type := study.get("cancerType"):
                    return cancer_type.get("name", "Unknown")
        except Exception:
            logger.debug(f"Failed to get cancer type for study {study_id}")

        # Fallback: infer from study ID
        study_lower = study_id.lower()
        if "brca" in study_lower or "breast" in study_lower:
            return "Breast Cancer"
        elif "lung" in study_lower or "nsclc" in study_lower:
            return "Lung Cancer"
        elif "coad" in study_lower or "colorectal" in study_lower:
            return "Colorectal Cancer"
        elif "skcm" in study_lower or "melanoma" in study_lower:
            return "Melanoma"
        elif "prad" in study_lower or "prostate" in study_lower:
            return "Prostate Cancer"

        return "Unknown"


def format_cbioportal_search_summary(
    summary: CBioPortalSearchSummary | None,
) -> str:
    """Format cBioPortal search summary for display."""
    if not summary:
        return ""

    lines = [
        f"\n### cBioPortal Summary for {summary.gene}",
        f"- **Mutation Frequency**: {summary.mutation_frequency:.1%} ({summary.total_mutations:,} mutations in {summary.total_samples_tested:,} samples)",
        f"- **Studies**: {summary.study_coverage.get('studies_with_data', 0)} of {summary.study_coverage.get('queried_studies', 0)} studies have mutations",
    ]

    if summary.hotspots:
        lines.append("\n**Top Hotspots:**")
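        for hs in summary.hotspots[:3]:
            # cancer_types defaults to an empty list; avoid a dangling "in "
            cancers = ", ".join(hs.cancer_types[:3]) or "unspecified cancer types"
            lines.append(
                f"- {hs.amino_acid_change}: {hs.count} cases ({hs.frequency:.1%}) in {cancers}"
            )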

    if summary.cancer_distribution:
        lines.append("\n**Cancer Type Distribution:**")
        for cancer_type, count in sorted(
            summary.cancer_distribution.items(),
            key=lambda x: x[1],
            reverse=True,
        )[:5]:
            lines.append(f"- {cancer_type}: {count} mutations")

    return "\n".join(lines)

```
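
The batching in `_get_mutation_summary` (fixed-size batches, `asyncio.gather` with `return_exceptions=True`, a short pause between batches) can be sketched in isolation. This is a minimal illustration, not the module's code: `gather_in_batches`, `fake_fetch`, `run_demo`, and the study IDs are hypothetical names standing in for the real per-profile API calls.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def gather_in_batches(
    items: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    batch_size: int = 5,
    delay: float = 0.05,
) -> list[tuple[str, Any]]:
    """Call fetch(item) for every item, batch_size items at a time."""
    paired: list[tuple[str, Any]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        # With return_exceptions=True a failed call is returned as an
        # exception object instead of being raised out of gather().
        results = await asyncio.gather(
            *(fetch(item) for item in batch), return_exceptions=True
        )
        # gather() preserves argument order, so zip pairs each item
        # with its own result.
        paired.extend(zip(batch, results))
        # Pause between batches only, not after the last one.
        if start + batch_size < len(items):
            await asyncio.sleep(delay)
    return paired


async def fake_fetch(study_id: str) -> dict[str, Any]:
    """Hypothetical stand-in for a per-study API call."""
    if study_id == "bad_study":
        raise ValueError("simulated failure")
    return {"study_id": study_id, "mutations": len(study_id)}


def run_demo() -> list[tuple[str, Any]]:
    studies = ["s1", "s2", "bad_study", "s4", "s5", "s6", "s7"]
    return asyncio.run(
        gather_in_batches(studies, fake_fetch, batch_size=3, delay=0)
    )
```

Pairing each result with its input inside the loop keeps the two lists aligned by construction, which is what the original relies on when it zips `results` with `study_ids`. Callers still need to check each result with `isinstance(result, Exception)` before using it.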

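The study ordering in `_get_relevant_profiles` is a substring-based rank rather than a true sample-count sort. A small standalone sketch of that ranking follows, assuming the same priority list as the source; `rank_studies` and the example study IDs are illustrative only.

```python
PRIORITY_STUDIES = ["msk_impact", "tcga", "genie", "metabric", "broad"]


def study_priority(study_id: str) -> int:
    """Return the index of the first priority substring found in study_id.

    Studies matching none of the substrings get len(PRIORITY_STUDIES),
    which places them after every prioritized study.
    """
    lowered = study_id.lower()
    for rank, name in enumerate(PRIORITY_STUDIES):
        if name in lowered:
            return rank
    return len(PRIORITY_STUDIES)


def rank_studies(study_ids: list[str]) -> list[str]:
    # sorted() is stable, so studies with equal rank keep their input order.
    return sorted(study_ids, key=study_priority)


# Illustrative study IDs (assumed for the example, not fetched from the API).
example = ["skcm_broad", "mel_ucla_2016", "skcm_tcga", "msk_impact_2017"]
ranked = rank_studies(example)
# ranked == ["msk_impact_2017", "skcm_tcga", "skcm_broad", "mel_ucla_2016"]
```

Because the rank depends only on the study ID, a small study whose ID happens to contain `tcga` outranks a large study with an unlisted name. That trade-off is the one the source comment acknowledges.
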
--------------------------------------------------------------------------------
/src/biomcp/query_router.py:
--------------------------------------------------------------------------------

```python
"""Query router for unified search in BioMCP."""

import asyncio
from dataclasses import dataclass
from typing import Any

from biomcp.articles.search import PubmedRequest
from biomcp.articles.unified import search_articles_unified
from biomcp.query_parser import ParsedQuery
from biomcp.trials.search import TrialQuery, search_trials
from biomcp.variants.search import VariantQuery, search_variants


@dataclass
class RoutingPlan:
    """Plan for routing a query to appropriate tools."""

    tools_to_call: list[str]
    field_mappings: dict[str, dict[str, Any]]
    coordination_strategy: str = "parallel"


class QueryRouter:
    """Routes unified queries to appropriate domain-specific tools."""

    def route(self, parsed_query: ParsedQuery) -> RoutingPlan:
        """Determine which tools to call based on query fields."""
        tools_to_call = []
        field_mappings = {}

        # Check which domains are referenced
        domains_referenced = self._get_referenced_domains(parsed_query)

        # Build field mappings for each domain
        domain_mappers = {
            "articles": ("article_searcher", self._map_article_fields),
            "trials": ("trial_searcher", self._map_trial_fields),
            "variants": ("variant_searcher", self._map_variant_fields),
            "genes": ("gene_searcher", self._map_gene_fields),
            "drugs": ("drug_searcher", self._map_drug_fields),
            "diseases": ("disease_searcher", self._map_disease_fields),
        }

        for domain, (tool_name, mapper_func) in domain_mappers.items():
            if domain in domains_referenced:
                tools_to_call.append(tool_name)
                field_mappings[tool_name] = mapper_func(parsed_query)

        return RoutingPlan(
            tools_to_call=tools_to_call,
            field_mappings=field_mappings,
            coordination_strategy="parallel",
        )

    def _get_referenced_domains(self, parsed_query: ParsedQuery) -> set[str]:
        """Get all domains referenced in the query."""
        domains_referenced = set()

        # Check domain-specific fields
        for domain, fields in parsed_query.domain_specific_fields.items():
            if fields:
                domains_referenced.add(domain)

        # Check cross-domain fields (these trigger multiple searches)
        if parsed_query.cross_domain_fields:
            cross_domain_mappings = {
                "gene": ["articles", "variants", "genes", "trials"],
                "disease": ["articles", "trials", "diseases"],
                "variant": ["articles", "variants"],
                "chemical": ["articles", "trials", "drugs"],
                "drug": ["articles", "trials", "drugs"],
            }

            for field, domains in cross_domain_mappings.items():
                if field in parsed_query.cross_domain_fields:
                    domains_referenced.update(domains)

        return domains_referenced

    def _map_article_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to article searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["genes"] = [parsed_query.cross_domain_fields["gene"]]
        if "disease" in parsed_query.cross_domain_fields:
            mapping["diseases"] = [parsed_query.cross_domain_fields["disease"]]
        if "variant" in parsed_query.cross_domain_fields:
            mapping["variants"] = [parsed_query.cross_domain_fields["variant"]]

        # Map article-specific fields
        article_fields = parsed_query.domain_specific_fields.get(
            "articles", {}
        )
        if "title" in article_fields:
            mapping["keywords"] = [article_fields["title"]]
        if "author" in article_fields:
            mapping["keywords"] = mapping.get("keywords", []) + [
                article_fields["author"]
            ]
        if "journal" in article_fields:
            mapping["keywords"] = mapping.get("keywords", []) + [
                article_fields["journal"]
            ]

        # Extract mutation patterns from raw query
        import re

        raw_query = parsed_query.raw_query
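        # Look for mutation patterns like F57Y, F57*, V600E.
        # "*" is a non-word character, so a trailing \b after it never
        # matches at a space or end of string; handle it as its own branch.
        mutation_patterns = re.findall(
            r"\b[A-Z]\d+(?:[A-Z]\b|\*)", raw_query
        )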
        if mutation_patterns:
            if "keywords" not in mapping:
                mapping["keywords"] = []
            mapping["keywords"].extend(mutation_patterns)

        return mapping

    def _map_trial_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to trial searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "disease" in parsed_query.cross_domain_fields:
            mapping["conditions"] = [
                parsed_query.cross_domain_fields["disease"]
            ]

        # Gene searches in trials might look for targeted therapies
        if "gene" in parsed_query.cross_domain_fields:
            gene = parsed_query.cross_domain_fields["gene"]
            # Search for gene-targeted interventions
            mapping["keywords"] = [gene]

        # Map trial-specific fields
        trial_fields = parsed_query.domain_specific_fields.get("trials", {})
        if "condition" in trial_fields:
            mapping["conditions"] = [trial_fields["condition"]]
        if "intervention" in trial_fields:
            mapping["interventions"] = [trial_fields["intervention"]]
        if "phase" in trial_fields:
            mapping["phase"] = f"PHASE{trial_fields['phase']}"
        if "status" in trial_fields:
            mapping["recruiting_status"] = trial_fields["status"].upper()

        return mapping

    def _map_variant_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to variant searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["gene"] = parsed_query.cross_domain_fields["gene"]
        if "variant" in parsed_query.cross_domain_fields:
            variant = parsed_query.cross_domain_fields["variant"]
            # Check if it's an rsID or protein change
            if variant.startswith("rs"):
                mapping["rsid"] = variant
            else:
                mapping["hgvsp"] = variant

        # Map variant-specific fields
        variant_fields = parsed_query.domain_specific_fields.get(
            "variants", {}
        )
        if "rsid" in variant_fields:
            mapping["rsid"] = variant_fields["rsid"]
        if "gene" in variant_fields:
            mapping["gene"] = variant_fields["gene"]
        if "significance" in variant_fields:
            mapping["significance"] = variant_fields["significance"]
        if "frequency" in variant_fields:
            # Parse frequency operators
            freq = variant_fields["frequency"]
            if freq.startswith("<"):
                mapping["max_frequency"] = float(freq[1:])
            elif freq.startswith(">"):
                mapping["min_frequency"] = float(freq[1:])

        return mapping

    def _map_gene_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to gene searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "gene" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["gene"]

        # Map gene-specific fields
        gene_fields = parsed_query.domain_specific_fields.get("genes", {})
        if "symbol" in gene_fields:
            mapping["query"] = gene_fields["symbol"]
        elif "name" in gene_fields:
            mapping["query"] = gene_fields["name"]
        elif "type" in gene_fields:
            mapping["type_of_gene"] = gene_fields["type"]

        return mapping

    def _map_drug_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to drug searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "chemical" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["chemical"]
        elif "drug" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["drug"]

        # Map drug-specific fields
        drug_fields = parsed_query.domain_specific_fields.get("drugs", {})
        if "name" in drug_fields:
            mapping["query"] = drug_fields["name"]
        elif "tradename" in drug_fields:
            mapping["query"] = drug_fields["tradename"]
        elif "indication" in drug_fields:
            mapping["indication"] = drug_fields["indication"]

        return mapping

    def _map_disease_fields(self, parsed_query: ParsedQuery) -> dict[str, Any]:
        """Map query fields to disease searcher parameters."""
        mapping: dict[str, Any] = {}

        # Map cross-domain fields
        if "disease" in parsed_query.cross_domain_fields:
            mapping["query"] = parsed_query.cross_domain_fields["disease"]

        # Map disease-specific fields
        disease_fields = parsed_query.domain_specific_fields.get(
            "diseases", {}
        )
        if "name" in disease_fields:
            mapping["query"] = disease_fields["name"]
        elif "mondo" in disease_fields:
            mapping["query"] = disease_fields["mondo"]
        elif "synonym" in disease_fields:
            mapping["query"] = disease_fields["synonym"]

        return mapping


async def execute_routing_plan(
    plan: RoutingPlan, output_json: bool = True
) -> dict[str, Any]:
    """Execute a routing plan by calling the appropriate tools."""
    tasks = []
    task_names = []

    for tool_name in plan.tools_to_call:
        params = plan.field_mappings[tool_name]

        if tool_name == "article_searcher":
            request = PubmedRequest(**params)
            tasks.append(
                search_articles_unified(
                    request,
                    include_pubmed=True,
                    include_preprints=False,
                    output_json=output_json,
                )
            )
            task_names.append("articles")

        elif tool_name == "trial_searcher":
            query = TrialQuery(**params)
            tasks.append(search_trials(query, output_json=output_json))
            task_names.append("trials")

        elif tool_name == "variant_searcher":
            variant_query = VariantQuery(**params)
            tasks.append(
                search_variants(variant_query, output_json=output_json)
            )
            task_names.append("variants")

        elif tool_name == "gene_searcher":
            # For gene search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_genes(client, query_str, output_json))
            task_names.append("genes")

        elif tool_name == "drug_searcher":
            # For drug search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_drugs(client, query_str, output_json))
            task_names.append("drugs")

        elif tool_name == "disease_searcher":
            # For disease search, we'll use the BioThingsClient directly
            from biomcp.integrations.biothings_client import BioThingsClient

            client = BioThingsClient()
            query_str = params.get("query", "")
            tasks.append(_search_diseases(client, query_str, output_json))
            task_names.append("diseases")

    # Execute all searches in parallel
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Package results
    output: dict[str, Any] = {}
    for name, result in zip(task_names, results, strict=False):
        if isinstance(result, Exception):
            output[name] = {"error": str(result)}
        else:
            output[name] = result

    return output


async def _search_genes(client, query: str, output_json: bool) -> Any:
    """Search for genes using BioThingsClient."""
    results = await client._query_gene(query)
    if not results:
        return [] if output_json else "No genes found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        gene_id = result.get("_id")
        if gene_id:
            full_gene = await client._get_gene_by_id(gene_id)
            if full_gene:
                detailed_results.append(full_gene.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results


async def _search_drugs(client, query: str, output_json: bool) -> Any:
    """Search for drugs using BioThingsClient."""
    results = await client._query_drug(query)
    if not results:
        return [] if output_json else "No drugs found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        drug_id = result.get("_id")
        if drug_id:
            full_drug = await client._get_drug_by_id(drug_id)
            if full_drug:
                detailed_results.append(full_drug.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results


async def _search_diseases(client, query: str, output_json: bool) -> Any:
    """Search for diseases using BioThingsClient."""
    results = await client._query_disease(query)
    if not results:
        return [] if output_json else "No diseases found"

    # Fetch full details for each result
    detailed_results = []
    for result in results[:10]:  # Limit to 10 results
        disease_id = result.get("_id")
        if disease_id:
            full_disease = await client._get_disease_by_id(disease_id)
            if full_disease:
                detailed_results.append(full_disease.model_dump(by_alias=True))

    if output_json:
        import json

        return json.dumps(detailed_results)
    else:
        return detailed_results

```
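
The `execute_routing_plan` function above relies on `asyncio.gather(..., return_exceptions=True)` so that one failing searcher is reported as an `{"error": ...}` entry instead of aborting the others. The following is a minimal, self-contained sketch of that pattern. `ok_search`, `failing_search`, and `run_named_tasks` are hypothetical stand-ins for illustration and are not part of BioMCP.

```python
import asyncio
from typing import Any


async def ok_search(name: str) -> list[str]:
    """Hypothetical stand-in for a searcher that succeeds."""
    await asyncio.sleep(0)
    return [f"{name}-result"]


async def failing_search(name: str) -> list[str]:
    """Hypothetical stand-in for a searcher that raises."""
    await asyncio.sleep(0)
    raise RuntimeError(f"{name} backend unavailable")


async def run_named_tasks(named: dict[str, Any]) -> dict[str, Any]:
    """Await all coroutines concurrently and key each outcome by name."""
    names = list(named)
    # return_exceptions=True turns a raised exception into a returned value,
    # so one failing search does not hide the results of the others.
    results = await asyncio.gather(*named.values(), return_exceptions=True)
    output: dict[str, Any] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            output[name] = {"error": str(result)}
        else:
            output[name] = result
    return output


demo_output = asyncio.run(
    run_named_tasks(
        {
            "articles": ok_search("articles"),
            "trials": failing_search("trials"),
        }
    )
)
print(demo_output)
```

Callers of a function shaped like this need to check each entry for an `"error"` key, because a failure no longer surfaces as a raised exception.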

--------------------------------------------------------------------------------
/docs/user-guides/03-integrating-with-ides-and-clients.md:
--------------------------------------------------------------------------------

```markdown
# Integrating with IDEs and Clients

BioMCP can be integrated into your development workflow through multiple approaches. This guide covers integration with IDEs, Python applications, and MCP-compatible clients.

## Integration Methods Overview

| Method         | Best For                  | Installation | Usage Pattern            |
| -------------- | ------------------------- | ------------ | ------------------------ |
| **Cursor IDE** | Interactive development   | Smithery CLI | Natural language queries |
| **Python SDK** | Application development   | pip/uv       | Direct function calls    |
| **MCP Client** | AI assistants & protocols | Subprocess   | Tool-based communication |

## Cursor IDE Integration

Cursor IDE provides the most seamless integration for interactive biomedical research during development.

### Installation

1. **Prerequisites:**

   - [Cursor IDE](https://cursor.sh/) installed
   - [Smithery](https://smithery.ai/) account and token

2. **Install BioMCP:**

   ```bash
   npx -y @smithery/cli@latest install @genomoncology/biomcp --client cursor
   ```

3. **Configuration:**
   - The Smithery CLI automatically configures Cursor
   - No manual configuration needed

### Usage in Cursor

Once installed, you can query biomedical data using natural language:

#### Clinical Trials

```
"Find Phase 3 clinical trials for lung cancer with immunotherapy"
```

#### Research Articles

```
"Summarize recent research on EGFR mutations in lung cancer"
```

#### Genetic Variants

```
"What's the clinical significance of the BRAF V600E mutation?"
```

#### Complex Queries

```
"Compare treatment outcomes for ALK-positive vs EGFR-mutant NSCLC"
```

### Cursor Tips

1. **Be Specific**: Include gene names, disease types, and treatment modalities
2. **Iterate**: Refine queries based on initial results
3. **Cross-Reference**: Ask for both articles and trials on the same topic
4. **Export Results**: Copy formatted results for documentation

## Python SDK Integration

The Python SDK provides programmatic access to BioMCP for building applications.

### Installation

```bash
# Using pip
pip install biomcp-python

# Using uv
uv add biomcp-python

# For scripts
uv pip install biomcp-python
```

### Basic Usage

```python
import asyncio
from biomcp import BioMCP

async def main():
    # Initialize client
    client = BioMCP()

    # Search for articles
    articles = await client.articles.search(
        genes=["BRAF"],
        diseases=["melanoma"],
        limit=5
    )

    # Search for trials
    trials = await client.trials.search(
        conditions=["breast cancer"],
        interventions=["CDK4/6 inhibitor"],
        recruiting_status="RECRUITING"
    )

    # Get variant details
    variant = await client.variants.get("rs121913529")

    return articles, trials, variant

# Run the async function
results = asyncio.run(main())
```

### Advanced Features

#### Domain-Specific Modules

```python
from biomcp import BioMCP
from biomcp.variants import search_variants, get_variant
from biomcp.trials import search_trials, get_trial
from biomcp.articles import search_articles, fetch_articles

# Direct module usage
async def variant_analysis():
    # Search pathogenic TP53 variants
    results = await search_variants(
        gene="TP53",
        significance="pathogenic",
        frequency_max=0.01,
        limit=20
    )

    # Get detailed annotations
    for variant in results:
        details = await get_variant(variant.id)
        print(f"{variant.id}: {details.clinical_significance}")
```

#### Output Formats

```python
# Run inside an async function, with `client = BioMCP()` created as in Basic Usage
# JSON for programmatic use
articles_json = await client.articles.search(
    genes=["KRAS"],
    format="json"
)

# Markdown for display
articles_md = await client.articles.search(
    genes=["KRAS"],
    format="markdown"
)
```

#### Error Handling

```python
from biomcp.exceptions import BioMCPError, APIError, ValidationError

try:
    results = await client.articles.search(genes=["INVALID_GENE"])
except ValidationError as e:
    print(f"Invalid input: {e}")
except APIError as e:
    print(f"API error: {e}")
except BioMCPError as e:
    print(f"General error: {e}")
```

### Example: Building a Variant Report

```python
import asyncio
from biomcp import BioMCP

async def generate_variant_report(gene: str, mutation: str):
    client = BioMCP()

    # 1. Get gene information
    gene_info = await client.genes.get(gene)

    # 2. Search for the specific variant
    variants = await client.variants.search(
        gene=gene,
        keywords=[mutation]
    )

    # 3. Find relevant articles
    articles = await client.articles.search(
        genes=[gene],
        keywords=[mutation],
        limit=10
    )

    # 4. Look for clinical trials
    trials = await client.trials.search(
        conditions=["cancer"],
        other_terms=[f"{gene} {mutation}"],
        recruiting_status="RECRUITING"
    )

    # 5. Generate report
    report = f"""
# Variant Report: {gene} {mutation}

## Gene Information
- **Official Name**: {gene_info.name}
- **Summary**: {gene_info.summary}

## Variant Details
Found {len(variants)} matching variants

## Literature ({len(articles)} articles)
Recent publications discussing this variant...

## Clinical Trials ({len(trials)} active trials)
Currently recruiting studies...
"""

    return report

# Generate report
report = asyncio.run(generate_variant_report("BRAF", "V600E"))
print(report)
```

## MCP Client Integration

The Model Context Protocol (MCP) provides a standardized way to integrate BioMCP with AI assistants and other tools.

### Understanding MCP

MCP is a protocol for communication between:

- **Clients**: AI assistants, IDEs, or custom applications
- **Servers**: Tool providers like BioMCP

### Critical Requirement: Think Tool

**IMPORTANT**: When using MCP, you MUST call the `think` tool before any search or fetch operation. Planning first keeps the analysis systematic and improves result quality.
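
One way to enforce this ordering in client code is a thin wrapper that rejects any tool call made before `think`. The sketch below is only an illustration: `ThinkFirstGuard` and `FakeSession` are hypothetical names, not part of BioMCP or the MCP SDK, and the guard assumes nothing more than a session-like object exposing an async `call_tool(name, arguments)` method.

```python
import asyncio
from typing import Any


class ThinkFirstGuard:
    """Wrap a session-like object and refuse tool calls made before `think`."""

    def __init__(self, session: Any) -> None:
        self._session = session
        self._has_thought = False

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        if name == "think":
            self._has_thought = True
        elif not self._has_thought:
            raise RuntimeError(f"call 'think' before calling '{name}'")
        return await self._session.call_tool(name, arguments)


class FakeSession:
    """Stand-in session that records calls instead of contacting a server."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> str:
        self.calls.append(name)
        return f"{name} ok"


async def demo() -> tuple[str, list[str]]:
    fake = FakeSession()
    guard = ThinkFirstGuard(fake)
    try:
        # No think call yet, so the guard should block this search.
        await guard.call_tool("article_searcher", {"genes": ["BRAF"]})
        outcome = "allowed"
    except RuntimeError:
        outcome = "blocked"
    await guard.call_tool(
        "think",
        {"thought": "Plan the search", "thoughtNumber": 1, "nextThoughtNeeded": True},
    )
    await guard.call_tool("article_searcher", {"genes": ["BRAF"]})
    return outcome, fake.calls


first_outcome, recorded_calls = asyncio.run(demo())
print(first_outcome, recorded_calls)
```

With a real connection, the `ClientSession` object would take the place of `FakeSession`.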

### Basic MCP Integration

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def run_biomcp_query():
    # Start BioMCP server
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "--with", "biomcp-python", "biomcp", "run"],
        env={"PYTHONUNBUFFERED": "1"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize and discover tools
            await session.initialize()
            tools = await session.list_tools()

            # CRITICAL: Always think first!
            await session.call_tool(
                "think",
                arguments={
                    "thought": "Analyzing BRAF V600E in melanoma...",
                    "thoughtNumber": 1,
                    "nextThoughtNeeded": True
                }
            )

            # Now search for articles
            result = await session.call_tool(
                "article_searcher",
                arguments={
                    "genes": ["BRAF"],
                    "diseases": ["melanoma"],
                    "keywords": ["V600E"]
                }
            )

            return result

# Run the query
result = asyncio.run(run_biomcp_query())
```

### Available MCP Tools

BioMCP provides 24 tools through MCP:

#### Core Tools (Always Use First)

- `think` - Sequential reasoning (MANDATORY first step)
- `search` - Unified search across domains
- `fetch` - Retrieve specific records

#### Domain-Specific Tools

- **Articles**: `article_searcher`, `article_getter`
- **Trials**: `trial_searcher`, `trial_getter`, plus detail getters
- **Variants**: `variant_searcher`, `variant_getter`, `alphagenome_predictor`
- **BioThings**: `gene_getter`, `disease_getter`, `drug_getter`
- **NCI**: Organization, intervention, biomarker, disease tools

### MCP Integration Patterns

#### Pattern 1: AI Assistant Integration

```python
# Example for integrating with an AI assistant
from mcp import StdioServerParameters


class BioMCPAssistant:
    def __init__(self):
        self.session = None

    async def connect(self):
        # Initialize MCP connection
        server_params = StdioServerParameters(
            command="biomcp",
            args=["run"]
        )
        # ... connection setup ...

    async def process_query(self, user_query: str):
        # 1. Always think first
        await self.think_about_query(user_query)

        # 2. Determine appropriate tools
        tools_needed = self.analyze_query(user_query)

        # 3. Execute tool calls
        results = []
        for tool in tools_needed:
            result = await self.session.call_tool(tool.name, tool.args)
            results.append(result)

        # 4. Synthesize results
        return self.format_response(results)
```

#### Pattern 2: Custom Client Implementation

```python
import json
from typing import Any, Dict

class BioMCPClient:
    """Custom client for specific biomedical workflows"""

    async def variant_to_trials_pipeline(self, variant_id: str):
        """Find trials for patients with specific variants"""

        # Step 1: Think and plan
        await self.think(
            "Planning variant-to-trials search pipeline...",
            thoughtNumber=1
        )

        # Step 2: Get variant details
        variant = await self.call_tool("variant_getter", {
            "variant_id": variant_id
        })

        # Step 3: Extract gene and disease associations
        gene = variant.get("gene", {}).get("symbol")
        diseases = self.extract_diseases(variant)

        # Step 4: Search for relevant trials
        trials = await self.call_tool("trial_searcher", {
            "conditions": diseases,
            "other_terms": [f"{gene} mutation"],
            "recruiting_status": "RECRUITING"
        })

        return {
            "variant": variant,
            "associated_trials": trials
        }
```

### MCP Best Practices

1. **Always Think First**

   ```python
   # ✅ Correct
   await think(thought="Planning research...", thoughtNumber=1)
   await search(...)

   # ❌ Wrong - skips thinking
   await search(...)  # Will produce poor results
   ```

2. **Use Appropriate Tools**

   ```python
   # For broad searches across domains
   await call_tool("search", {"query": "gene:BRAF AND melanoma"})

   # For specific domain searches
   await call_tool("article_searcher", {"genes": ["BRAF"]})
   ```

3. **Handle Tool Responses**
   ```python
   try:
       result = await session.call_tool("variant_getter", {
           "variant_id": "rs121913529"
       })
       # call_tool returns a result object rather than a dict;
       # its isError flag marks failures and content holds the payload
       if result.isError:
           handle_error(result.content)
       else:
           process_variant(result.content)
   except Exception as e:
       logger.error(f"Tool call failed: {e}")
   ```
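
To turn a tool response into data a pipeline can use, the text content usually has to be extracted and decoded first. The helper below is a sketch under two assumptions: the result object exposes an `isError` flag and a `content` list whose text items carry a `text` attribute (the shape used by the MCP Python SDK), and the tool returned either JSON or plain text. `parse_tool_result` is a hypothetical helper name, and the `SimpleNamespace` objects only imitate a real result.

```python
import json
from types import SimpleNamespace
from typing import Any


def parse_tool_result(result: Any) -> Any:
    """Return decoded JSON (or raw text) from a tool result, raising on errors."""
    # Collect the text of every content item that carries a `text` attribute.
    texts = [item.text for item in result.content if hasattr(item, "text")]
    joined = "\n".join(texts)
    if getattr(result, "isError", False):
        raise RuntimeError(f"tool call failed: {joined}")
    try:
        return json.loads(joined)
    except json.JSONDecodeError:
        # Not JSON (for example Markdown output): hand back the text unchanged.
        return joined


# Stand-in objects shaped like a tool result; no server is contacted here.
ok_result = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text='{"gene": {"symbol": "BRAF"}}')],
)
markdown_result = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text="# Record 1")],
)

parsed = parse_tool_result(ok_result)
print(parsed["gene"]["symbol"])
print(parse_tool_result(markdown_result))
```

Decoding once at the boundary means later steps, such as reading `gene.symbol`, can work with ordinary dictionaries.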

## Choosing the Right Integration

### Use Cursor IDE When:

- Doing interactive research during development
- Exploring biomedical data for new projects
- Need quick answers without writing code
- Want natural language queries

### Use Python SDK When:

- Building production applications
- Need type-safe interfaces
- Want direct function calls
- Require custom error handling

### Use MCP Client When:

- Integrating with AI assistants
- Building protocol-compliant tools
- Need standardized tool interfaces
- Want language-agnostic integration

## Integration Examples

### Example 1: Research Dashboard (Python SDK)

```python
from biomcp import BioMCP
import streamlit as st

async def create_dashboard():
    client = BioMCP()

    st.title("Biomedical Research Dashboard")

    # Gene input
    gene = st.text_input("Enter gene symbol:", "BRAF")

    if st.button("Search"):
        # Fetch comprehensive data
        col1, col2 = st.columns(2)

        with col1:
            st.subheader("Recent Articles")
            articles = await client.articles.search(genes=[gene], limit=5)
            for article in articles:
                st.write(f"- [{article.title}]({article.url})")

        with col2:
            st.subheader("Active Trials")
            trials = await client.trials.search(
                other_terms=[gene],
                recruiting_status="RECRUITING",
                limit=5
            )
            for trial in trials:
                st.write(f"- [{trial.nct_id}]({trial.url})")
```

### Example 2: Variant Analysis Pipeline (MCP)

```python
async def comprehensive_variant_analysis(session, hgvs: str):
    """Complete variant analysis workflow using MCP"""

    # Think about the analysis
    await session.call_tool("think", {
        "thought": f"Planning comprehensive analysis for {hgvs}",
        "thoughtNumber": 1
    })

    # Get variant details
    variant = await session.call_tool("variant_getter", {
        "variant_id": hgvs
    })

    # Search related articles
    articles = await session.call_tool("article_searcher", {
        "variants": [hgvs],
        "limit": 10
    })

    # Find applicable trials
    gene = variant.get("gene", {}).get("symbol")
    trials = await session.call_tool("trial_searcher", {
        "other_terms": [f"{gene} mutation"],
        "recruiting_status": "RECRUITING"
    })

    # Predict functional effects if genomic coordinates available
    prediction = None  # stays None when coordinates are missing
    if variant.get("chrom") and variant.get("pos"):
        prediction = await session.call_tool("alphagenome_predictor", {
            "chromosome": f"chr{variant['chrom']}",
            "position": variant["pos"],
            "reference": variant["ref"],
            "alternate": variant["alt"]
        })

    return {
        "variant": variant,
        "articles": articles,
        "trials": trials,
        "prediction": prediction
    }
```

## Troubleshooting

### Common Issues

1. **"Think tool not called" errors**

   - Always call think before other operations
   - Include thoughtNumber parameter

2. **API rate limits**

   - Add delays between requests
   - Use API keys for higher limits

3. **Connection failures**

   - Check network connectivity
   - Verify server is running
   - Ensure correct installation

4. **Invalid gene symbols**
   - Use official HGNC symbols
   - Check [genenames.org](https://www.genenames.org)
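
The advice to add delays between requests can be sketched as a small retry helper with exponential backoff. This is an illustration only: `call_with_backoff` and `flaky_call` are hypothetical names, and catching every `Exception` is a simplification. Real code would more likely retry only on the rate-limit or network errors it expects.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def call_with_backoff(
    make_call: Callable[[], Awaitable[Any]],
    retries: int = 3,
    base_delay: float = 0.5,
) -> Any:
    """Await make_call(), retrying with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == retries:
                raise  # Out of retries: surface the last error.
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            await asyncio.sleep(base_delay * (2 ** attempt))


attempts = 0


async def flaky_call() -> str:
    """Hypothetical call that fails twice, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise ConnectionError("rate limited")
    return "ok"


outcome = asyncio.run(call_with_backoff(flaky_call, retries=3, base_delay=0.01))
print(outcome, attempts)
```

Passing a zero-argument callable rather than a coroutine object matters here, because a coroutine can be awaited only once and each retry needs a fresh one.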

### Debug Mode

Enable debug logging:

```python
# Python SDK
import logging
logging.basicConfig(level=logging.DEBUG)

# MCP Client
server_params = StdioServerParameters(
    command="biomcp",
    args=["run", "--log-level", "DEBUG"]
)
```

## Next Steps

- Explore [tool-specific documentation](02-mcp-tools-reference.md)
- Review [API authentication](../getting-started/03-authentication-and-api-keys.md)
- Check [example workflows](../how-to-guides/01-find-articles-and-cbioportal-data.md) for your use case

```

--------------------------------------------------------------------------------
/docs/user-guides/01-command-line-interface.md:
--------------------------------------------------------------------------------

```markdown
# Command Line Interface Reference

BioMCP provides a comprehensive command-line interface for biomedical data retrieval and analysis. This guide covers all available commands, options, and usage patterns.

## Installation

```bash
# Using uv (recommended)
uv tool install biomcp-python

# Using pip
pip install biomcp-python
```

## Global Options

These options work with all commands:

```bash
biomcp [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit
  --help     Show help message and exit
```

## Commands Overview

| Domain           | Commands             | Purpose                                         |
| ---------------- | -------------------- | ----------------------------------------------- |
| **article**      | search, get          | Search and retrieve biomedical literature       |
| **trial**        | search, get          | Find and fetch clinical trial information       |
| **variant**      | search, get, predict | Analyze genetic variants and predict effects    |
| **gene**         | get                  | Retrieve gene information and annotations       |
| **drug**         | get                  | Look up drug/chemical information               |
| **disease**      | get                  | Get disease definitions and synonyms            |
| **organization** | search               | Search NCI organization database                |
| **intervention** | search               | Find interventions (drugs, devices, procedures) |
| **biomarker**    | search               | Search biomarkers used in trials                |
| **health**       | check                | Monitor API status and system health            |

## Article Commands

For practical examples and workflows, see [How to Find Articles and cBioPortal Data](../how-to-guides/01-find-articles-and-cbioportal-data.md).

### article search

Search PubMed/PubTator3 for biomedical literature with automatic cBioPortal integration.

```bash
biomcp article search [OPTIONS]
```

**Options:**

- `--gene, -g TEXT`: Gene symbol(s) to search for
- `--variant, -v TEXT`: Genetic variant(s) to search for
- `--disease, -d TEXT`: Disease/condition(s) to search for
- `--chemical, -c TEXT`: Chemical/drug name(s) to search for
- `--keyword, -k TEXT`: Keyword(s) to search for (supports OR with `|`)
- `--pmid TEXT`: Specific PubMed ID(s) to retrieve
- `--limit INTEGER`: Maximum results to return (default: 10)
- `--no-preprints`: Exclude preprints from results
- `--no-cbioportal`: Disable automatic cBioPortal integration
- `--format [json|markdown]`: Output format (default: markdown)

**Examples:**

```bash
# Basic gene search with automatic cBioPortal data
biomcp article search --gene BRAF --disease melanoma

# Multiple filters
biomcp article search --gene EGFR --disease "lung cancer" --chemical erlotinib

# OR logic in keywords (find different variant notations)
biomcp article search --gene PTEN --keyword "R173|Arg173|p.R173"

# Exclude preprints
biomcp article search --gene TP53 --no-preprints --limit 20

# JSON output for programmatic use
biomcp article search --gene KRAS --format json > results.json
```

### article get

Retrieve a specific article by PubMed ID or DOI.

```bash
biomcp article get IDENTIFIER
```

**Arguments:**

- `IDENTIFIER`: PubMed ID (e.g., "38768446") or DOI (e.g., "10.1101/2024.01.20.23288905")

**Examples:**

```bash
# Get article by PubMed ID
biomcp article get 38768446

# Get preprint by DOI
biomcp article get "10.1101/2024.01.20.23288905"
```

## Trial Commands

For practical examples and workflows, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md).

### trial search

Search ClinicalTrials.gov or NCI CTS API for clinical trials.

```bash
biomcp trial search [OPTIONS]
```

**Basic Options:**

- `--condition TEXT`: Disease/condition to search
- `--intervention TEXT`: Treatment/intervention to search
- `--term TEXT`: General search terms
- `--nct-id TEXT`: Specific NCT ID(s)
- `--limit INTEGER`: Maximum results (default: 10)
- `--source [ctgov|nci]`: Data source (default: ctgov)
- `--api-key TEXT`: API key for NCI source

**Study Characteristics:**

- `--status TEXT`: Trial status (RECRUITING, ACTIVE_NOT_RECRUITING, etc.)
- `--study-type TEXT`: Type of study (INTERVENTIONAL, OBSERVATIONAL)
- `--phase TEXT`: Trial phase (EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4)
- `--study-purpose TEXT`: Primary purpose (TREATMENT, PREVENTION, etc.)
- `--age-group TEXT`: Target age group (CHILD, ADULT, OLDER_ADULT)

**Location Options:**

- `--country TEXT`: Country name
- `--state TEXT`: State/province
- `--city TEXT`: City name
- `--latitude FLOAT`: Geographic latitude
- `--longitude FLOAT`: Geographic longitude
- `--distance INTEGER`: Search radius in miles

**Advanced Filters:**

- `--start-date TEXT`: Trial start date (YYYY-MM-DD)
- `--end-date TEXT`: Trial end date (YYYY-MM-DD)
- `--intervention-type TEXT`: Type of intervention
- `--sponsor-type TEXT`: Type of sponsor
- `--is-fda-regulated`: FDA-regulated trials only
- `--expanded-access`: Trials offering expanded access

**Examples:**

```bash
# Find recruiting melanoma trials
biomcp trial search --condition melanoma --status RECRUITING

# Search by location (requires coordinates)
biomcp trial search --condition "lung cancer" \
  --latitude 41.4993 --longitude -81.6944 --distance 50

# Use NCI source with advanced filters
biomcp trial search --condition melanoma --source nci \
  --required-mutations "BRAF V600E" --allow-brain-mets true \
  --api-key YOUR_KEY

# Multiple filters
biomcp trial search --condition "breast cancer" \
  --intervention "CDK4/6 inhibitor" --phase PHASE3 \
  --status RECRUITING --country "United States"
```

### trial get

Retrieve detailed information about a specific clinical trial.

```bash
biomcp trial get NCT_ID [OPTIONS]
```

**Arguments:**

- `NCT_ID`: Clinical trial identifier (e.g., NCT03006926)

**Options:**

- `--include TEXT`: Specific sections to include (Protocol, Locations, References, Outcomes)
- `--source [ctgov|nci]`: Data source (default: ctgov)
- `--api-key TEXT`: API key for NCI source

**Examples:**

```bash
# Get basic trial information
biomcp trial get NCT03006926

# Get specific sections
biomcp trial get NCT03006926 --include Protocol --include Locations

# Use NCI source
biomcp trial get NCT04280705 --source nci --api-key YOUR_KEY
```

## Variant Commands

For practical examples and workflows, see:

- [Get Comprehensive Variant Annotations](../how-to-guides/03-get-comprehensive-variant-annotations.md)
- [Predict Variant Effects with AlphaGenome](../how-to-guides/04-predict-variant-effects-with-alphagenome.md)

### variant search

Search MyVariant.info for genetic variant annotations.

```bash
biomcp variant search [OPTIONS]
```

**Options:**

- `--gene TEXT`: Gene symbol
- `--hgvs TEXT`: HGVS notation
- `--rsid TEXT`: dbSNP rsID
- `--chromosome TEXT`: Chromosome
- `--start INTEGER`: Genomic start position
- `--end INTEGER`: Genomic end position
- `--assembly [hg19|hg38]`: Genome assembly (default: hg38)
- `--significance TEXT`: Clinical significance
- `--min-frequency FLOAT`: Minimum allele frequency
- `--max-frequency FLOAT`: Maximum allele frequency
- `--min-cadd FLOAT`: Minimum CADD score
- `--polyphen TEXT`: PolyPhen prediction
- `--sift TEXT`: SIFT prediction
- `--sources TEXT`: Data sources to include
- `--limit INTEGER`: Maximum results (default: 10)
- `--no-cbioportal`: Disable cBioPortal integration

**Examples:**

```bash
# Search pathogenic BRCA1 variants
biomcp variant search --gene BRCA1 --significance pathogenic

# Search by HGVS notation
biomcp variant search --hgvs "NM_007294.4:c.5266dupC"

# Filter by frequency and prediction scores
biomcp variant search --gene TP53 --max-frequency 0.01 \
  --min-cadd 20 --polyphen possibly_damaging

# Search genomic region
biomcp variant search --chromosome 7 --start 140753336 --end 140753337
```

### variant get

Retrieve detailed information about a specific variant.

```bash
biomcp variant get VARIANT_ID [OPTIONS]
```

**Arguments:**

- `VARIANT_ID`: Variant identifier (HGVS, rsID, or genomic)

**Options:**

- `--json, -j`: Output in JSON format
- `--include-external / --no-external`: Include/exclude external annotations (default: include)
- `--assembly TEXT`: Genome assembly (hg19 or hg38, default: hg19)

**Examples:**

```bash
# Get variant by HGVS (defaults to hg19)
biomcp variant get "NM_007294.4:c.5266dupC"

# Get variant by rsID
biomcp variant get rs121913529

# Specify hg38 assembly
biomcp variant get rs113488022 --assembly hg38

# JSON output with hg38
biomcp variant get rs113488022 --json --assembly hg38

# Without external annotations
biomcp variant get rs113488022 --no-external

# Get variant by genomic coordinates
biomcp variant get "chr17:g.43082434G>A"
```
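
The three identifier styles accepted by `variant get` can be told apart with simple patterns. The sketch below is illustrative only: `classify_variant_id` and its regular expressions are assumptions made for this example, not BioMCP's own parsing, which may accept more forms.

```python
import re

# Hypothetical patterns for illustration; BioMCP's own parsing may differ.
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
GENOMIC = re.compile(r"^chr[0-9XYM]{1,2}:g\.", re.IGNORECASE)
TRANSCRIPT = re.compile(r"^N[MRC]_\d+(\.\d+)?:[cgn]\.")


def classify_variant_id(variant_id: str) -> str:
    """Return 'rsid', 'genomic', 'hgvs', or 'unknown' for an identifier."""
    text = variant_id.strip()
    if RSID.match(text):
        return "rsid"
    if GENOMIC.match(text):
        return "genomic"
    if TRANSCRIPT.match(text):
        return "hgvs"
    return "unknown"


# Classify the identifiers used in the examples above
for example in ("rs121913529", "chr17:g.43082434G>A", "NM_007294.4:c.5266dupC"):
    print(example, "->", classify_variant_id(example))
```

A check like this can catch malformed identifiers in a batch script before any request is sent.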

### variant predict

Predict variant effects using Google DeepMind's AlphaGenome (requires API key).

```bash
biomcp variant predict CHROMOSOME POSITION REFERENCE ALTERNATE [OPTIONS]
```

**Arguments:**

- `CHROMOSOME`: Chromosome (e.g., chr7)
- `POSITION`: Genomic position
- `REFERENCE`: Reference allele
- `ALTERNATE`: Alternate allele

**Options:**

- `--tissue TEXT`: Tissue type(s) using UBERON ontology
- `--interval INTEGER`: Analysis window size (default: 20000)
- `--api-key TEXT`: AlphaGenome API key

**Examples:**

```bash
# Basic prediction (requires ALPHAGENOME_API_KEY env var)
biomcp variant predict chr7 140753336 A T

# Tissue-specific prediction
biomcp variant predict chr7 140753336 A T \
  --tissue UBERON:0002367  # prostate gland

# With per-request API key
biomcp variant predict chr7 140753336 A T --api-key YOUR_KEY
```

## Gene/Drug/Disease Commands

For practical examples using BioThings integration, see [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#biothings-integration-for-enhanced-search).

### gene get

Retrieve gene information from MyGene.info.

```bash
biomcp gene get GENE_NAME
```

**Examples:**

```bash
# Get gene information
biomcp gene get TP53
biomcp gene get BRAF
```

### drug get

Retrieve drug/chemical information from MyChem.info.

```bash
biomcp drug get DRUG_NAME
```

**Examples:**

```bash
# Get drug information
biomcp drug get imatinib
biomcp drug get pembrolizumab
```

### disease get

Retrieve disease information from MyDisease.info.

```bash
biomcp disease get DISEASE_NAME
```

**Examples:**

```bash
# Get disease information
biomcp disease get melanoma
biomcp disease get "non-small cell lung cancer"
```

## NCI-Specific Commands

These commands require an NCI API key. For setup instructions and usage examples, see:

- [Authentication and API Keys](../getting-started/03-authentication-and-api-keys.md#nci-clinical-trials-api)
- [How to Find Trials with NCI and BioThings](../how-to-guides/02-find-trials-with-nci-and-biothings.md#using-nci-api-advanced-features)

### organization search

Search NCI's organization database.

```bash
biomcp organization search [OPTIONS]
```

**Options:**

- `--name TEXT`: Organization name
- `--city TEXT`: City location
- `--state TEXT`: State/province
- `--country TEXT`: Country
- `--org-type TEXT`: Organization type
- `--api-key TEXT`: NCI API key

**Example:**

```bash
biomcp organization search --name "MD Anderson" \
  --city Houston --state TX --api-key YOUR_KEY
```

### intervention search

Search NCI's intervention database.

```bash
biomcp intervention search [OPTIONS]
```

**Options:**

- `--name TEXT`: Intervention name
- `--intervention-type TEXT`: Type (Drug, Device, Procedure, etc.)
- `--api-key TEXT`: NCI API key

**Example:**

```bash
biomcp intervention search --name pembrolizumab \
  --intervention-type Drug --api-key YOUR_KEY
```

### biomarker search

Search biomarkers used in clinical trials.

```bash
biomcp biomarker search [OPTIONS]
```

**Options:**

- `--gene TEXT`: Gene symbol
- `--biomarker-type TEXT`: Type of biomarker
- `--api-key TEXT`: NCI API key

**Example:**

```bash
biomcp biomarker search --gene EGFR \
  --biomarker-type mutation --api-key YOUR_KEY
```

## Health Command

For monitoring API status before bulk operations, see the [Performance Optimizations Guide](../developer-guides/07-performance-optimizations.md).

### health check

Monitor API endpoints and system health.

```bash
biomcp health check [OPTIONS]
```

**Options:**

- `--apis-only`: Check only API endpoints
- `--system-only`: Check only system resources
- `--verbose, -v`: Show detailed information

**Examples:**

```bash
# Full health check
biomcp health check

# Check APIs only
biomcp health check --apis-only

# Detailed system check
biomcp health check --system-only --verbose
```

## Output Formats

Most commands support both human-readable markdown and machine-readable JSON output:

```bash
# Default markdown output
biomcp article search --gene BRAF

# JSON for programmatic use
biomcp article search --gene BRAF --format json

# Save to file
biomcp trial search --condition melanoma --format json > trials.json
```
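
For scripted use, the same JSON output can be consumed from Python. The following is a minimal sketch, assuming `biomcp` is on `PATH` and that every option takes a value. `build_command` and `run_json` are hypothetical helper names, not part of BioMCP.

```python
import json
import subprocess


def build_command(domain, action, **options):
    """Build an argv list for a biomcp command that requests JSON output.

    Keyword names are turned into flags (min_cadd -> --min-cadd). Boolean
    switches such as --no-cbioportal are not handled by this sketch.
    """
    argv = ["biomcp", domain, action]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    argv += ["--format", "json"]
    return argv


def run_json(domain, action, **options):
    """Run the CLI and decode its stdout as JSON (needs biomcp installed)."""
    completed = subprocess.run(
        build_command(domain, action, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Building the command has no side effects, so it is safe to inspect
command = build_command("trial", "search", condition="melanoma")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with values such as `"lung cancer"`.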

## Environment Variables

Configure default behavior with environment variables:

```bash
# API Keys
export NCI_API_KEY="your-nci-key"
export ALPHAGENOME_API_KEY="your-alphagenome-key"
export CBIO_TOKEN="your-cbioportal-token"

# Logging
export BIOMCP_LOG_LEVEL="DEBUG"
export BIOMCP_CACHE_DIR="/path/to/cache"
```
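
A wrapper script can mirror the two ways of supplying a key, the `--api-key` option and the environment variable. The sketch below assumes that an explicitly passed key should win over the environment. That precedence is an assumption of this example, and `resolve_api_key` is a hypothetical helper.

```python
import os


def resolve_api_key(explicit_key=None, env_var="NCI_API_KEY", environ=None):
    """Return the key to use, preferring an explicit value over the environment."""
    env = os.environ if environ is None else environ
    key = explicit_key or env.get(env_var)
    if not key:
        raise RuntimeError(f"No API key: pass one explicitly or set {env_var}")
    return key


# Example with an in-memory environment instead of the real one
fake_env = {"NCI_API_KEY": "key-from-env"}
from_env = resolve_api_key(environ=fake_env)
from_flag = resolve_api_key("key-from-flag", environ=fake_env)
print(from_env, from_flag)
```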

## Getting Help

Every command has a built-in help flag:

```bash
# General help
biomcp --help

# Command-specific help
biomcp article search --help
biomcp trial get --help
biomcp variant predict --help
```

## Tips and Best Practices

1. **Use Official Gene Symbols**: Always use HGNC-approved gene symbols (e.g., "TP53" not "p53")

2. **Combine Filters**: Most commands support multiple filters for precise results:

   ```bash
   biomcp article search --gene EGFR --disease "lung cancer" \
     --chemical erlotinib --keyword "resistance"
   ```

3. **Handle Large Results**: Use `--limit` and `--format json` for processing:

   ```bash
   biomcp article search --gene BRCA1 --limit 100 --format json | \
     jq '.results[] | {pmid: .pmid, title: .title}'
   ```

4. **Location Searches**: Always provide both latitude and longitude:

   ```bash
   # Find trials near Boston
   biomcp trial search --condition cancer \
     --latitude 42.3601 --longitude -71.0589 --distance 25
   ```

5. **Use OR Logic**: The pipe character enables flexible searches:

   ```bash
   # Find articles mentioning any form of a variant
   biomcp article search --gene BRAF --keyword "V600E|p.V600E|c.1799T>A"
   ```

6. **Check API Health**: Before bulk operations, verify API status:
   ```bash
   biomcp health check --apis-only
   ```
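
Tips 4 and 5 are easy to enforce in a wrapper script. This is a small sketch with two hypothetical helpers, `or_keyword` and `location_args`, which only prepare argument values and do not call BioMCP.

```python
def or_keyword(terms):
    """Join alternative spellings with '|', dropping blanks and duplicates."""
    unique = []
    for term in terms:
        term = term.strip()
        if term and term not in unique:
            unique.append(term)
    return "|".join(unique)


def location_args(latitude=None, longitude=None, distance=None):
    """Return CLI arguments for a location search, or [] when none is requested."""
    if (latitude is None) != (longitude is None):
        raise ValueError("Provide both latitude and longitude, or neither")
    if latitude is None:
        return []
    args = ["--latitude", str(latitude), "--longitude", str(longitude)]
    if distance is not None:
        args += ["--distance", str(distance)]
    return args


keyword = or_keyword(["V600E", "p.V600E", "V600E", " c.1799T>A "])
boston = location_args(42.3601, -71.0589, 25)
print(keyword)
print(boston)
```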

## Next Steps

- Set up [API keys](../getting-started/03-authentication-and-api-keys.md) for enhanced features
- Explore [MCP tools](02-mcp-tools-reference.md) for AI integration
- Read [how-to guides](../how-to-guides/01-find-articles-and-cbioportal-data.md) for complex workflows

```

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------

```markdown
# Changelog

All notable changes to the BioMCP project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.6.7] - 2025-08-13

### Fixed

- **MCP Resource Encoding** - Fixed character encoding error when loading resources on Windows (Issue #63):
  - Added explicit UTF-8 encoding for reading `instructions.md` and `researcher.md` resource files
  - Resolves "'charmap' codec can't decode byte 0x8f" error on Windows systems
  - Ensures cross-platform compatibility for resource loading

### Changed

- **Documentation** - Clarified sequential thinking integration:
  - Updated `researcher-persona-resource.md` to remove references to external `sequential-thinking` MCP server
  - Clarified that the `think` tool is built into BioMCP (no external dependencies needed)
  - Updated configuration examples to show only BioMCP server is required

## [0.6.6] - 2025-08-08

### Fixed

- **Windows Compatibility** - Fixed fcntl module import error on Windows (Issue #57):
  - Added conditional import with try/except for fcntl module
  - File locking now only applies on Unix systems
  - Windows users get full functionality without file locking
  - Refactored cache functions to reduce code complexity

### Changed

- **Documentation** - Updated Docker instructions in README (Issue #58):
  - Added `docker build -t biomcp:latest .` command before `docker run`
  - Clarified that biomcp:latest is a local build, not pulled from Docker Hub

## [0.6.5] - 2025-08-07

### Added

- **OpenFDA Integration** - Comprehensive FDA regulatory data access:
  - **12 New MCP Tools** for adverse events, drug labels, device events, drug approvals, recalls, and shortages
  - Each domain includes searcher and getter tools for flexible data retrieval
  - Unified search support with `domain="fda_*"` parameters
  - Enhanced CLI commands for all OpenFDA endpoints
  - Smart caching and rate limiting for API efficiency
  - Comprehensive error handling and data validation

### Changed

- Improved API key support across all OpenFDA tools
- Enhanced documentation for FDA data integration

## [0.6.4] - 2025-08-06

### Changed

- **Documentation Restructure** - Major documentation improvements:
  - Simplified navigation structure for better user experience
  - Fixed code block formatting and layout issues
  - Removed unnecessary sections and redundant content
  - Improved overall documentation readability and organization
  - Enhanced mobile responsiveness

## [0.6.3] - 2025-08-05

### Added

- **NCI Clinical Trials Search API Integration** - Enhanced cancer trial search capabilities:
  - Dual source support for trial search/getter tools (ClinicalTrials.gov + NCI)
  - NCI API key handling via `NCI_API_KEY` environment variable or parameter
  - Advanced trial filters: biomarkers, prior therapy, brain metastases acceptance
  - **6 New MCP Tools** for NCI-specific searches:
    - `nci_organization_searcher` / `nci_organization_getter`: Cancer centers, hospitals, research institutions
    - `nci_intervention_searcher` / `nci_intervention_getter`: Drugs, devices, procedures, biologicals
    - `nci_biomarker_searcher`: Trial eligibility biomarkers (reference genes, branches)
    - `nci_disease_searcher`: NCI's controlled vocabulary of cancer conditions
  - **OR Query Support**: All NCI endpoints support OR queries (e.g., "PD-L1 OR CD274")
  - Real-time access to NCI's curated cancer trials database
  - Automatic cBioPortal integration for gene searches
  - Proper NCI parameter mapping (org_city, org_state_or_province, etc.)
  - Comprehensive error handling for Elasticsearch limits

### Changed

- Enhanced unified search router to properly handle NCI domains
- Trial search/getter tools now accept `source` parameter ("clinicaltrials" or "nci")
- Improved domain-specific search logic for query+domain combinations

## [0.6.2] - 2025-08-05

Note: Initial NCI integration release - see v0.6.3 for the full implementation.

## [0.6.1] - 2025-08-03

### Fixed

- **Dependency Management** - Fixed alphagenome dependency to enable PyPI publishing
  - Made alphagenome an optional dependency
  - Resolved packaging conflicts for distribution

## [0.6.0] - 2025-08-02

### Added

- **Streamable HTTP Transport Protocol** - Modern MCP transport implementation:
  - Single `/mcp` endpoint for all communication
  - Session management with persistent session IDs
  - Event resumption support for reliability
  - On-demand streaming for long operations
  - Configurable HTTP server modes (STDIO, HTTP, Worker)
  - Better scalability for cloud deployments
  - Full MCP specification compliance (2025-03-26)

### Changed

- Improved Cloudflare Worker integration
- Enhanced transport layer with comprehensive testing
- Updated deployment configurations for HTTP mode

## [0.5.0] - 2025-07-31

### Added

- **BioThings API Integration** - Real-time biomedical data access:
  - **MyGene.info**: Gene annotations, summaries, aliases, and database links
  - **MyChem.info**: Drug/chemical information, identifiers, mechanisms of action
  - **MyDisease.info**: Disease definitions, synonyms, MONDO/DOID mappings
  - **3 New MCP Tools**: `gene_getter`, `drug_getter`, `disease_getter`
  - Automatic synonym expansion for enhanced trial searches
  - Batch optimization for multiple gene lookups
  - Live data fetching ensures current information

### Changed

- Enhanced unified search capabilities with BioThings data
- Expanded query language support for gene, drug, and disease queries
- Improved trial searches with automatic disease synonym expansion

## [0.4.7] - 2025-07-30

### Added

- **BioThings Integration** for real-time biomedical data access:
  - **New MCP Tools** (3 tools added, total now 17):
    - `gene_getter`: Query MyGene.info for gene information (symbols, names, summaries)
    - `drug_getter`: Query MyChem.info for drug/chemical data (formulas, indications, mechanisms)
    - `disease_getter`: Query MyDisease.info for disease information (definitions, synonyms, ontologies)
  - **Unified Search/Fetch Enhancement**:
    - Added `gene`, `drug`, `disease` as new searchable domains alongside article, trial, variant
    - Integrated into unified search syntax: `search(domain="gene", keywords=["BRAF"])`
    - Query language support: `gene:BRAF`, `drug:pembrolizumab`, `disease:melanoma`
    - Full fetch support: `fetch(domain="drug", id="DB00945")`
  - **Clinical Trial Enhancement**:
    - Automatic disease synonym expansion for trial searches
    - Real-time synonym lookup from MyDisease.info
    - Example: searching for "GIST" automatically includes "gastrointestinal stromal tumor"
  - **Smart Caching & Performance**:
    - Batch operations for multiple gene/drug lookups
    - Intelligent caching with TTL (gene: 24h, drug: 48h, disease: 72h)
    - Rate limiting to respect API guidelines

### Changed

- Trial search now expands disease terms by default (disable with `expand_synonyms=False`)
- Enhanced error handling for BioThings API responses
- Improved network reliability with automatic retries

## [0.4.6] - 2025-07-09

### Added

- MkDocs documentation deployment

## [0.4.5] - 2025-07-09

### Added

- Unified search and fetch tools following OpenAI MCP guidelines
- Additional variant sources (TCGA/GDC, 1000 Genomes) enabled by default in fetch operations
- Additional article sources (bioRxiv, medRxiv, Europe PMC) enabled by default in search operations

### Changed

- Consolidated 10 separate MCP tools into 2 unified tools (search and fetch)
- Updated response formats to comply with OpenAI MCP specifications

### Fixed

- OpenAI MCP compliance issues to enable integration

## [0.4.4] - 2025-07-08

### Added

- **Performance Optimizations**:
  - Connection pooling with event loop lifecycle management (30% latency reduction)
  - Parallel test execution with pytest-xdist (5x faster test runs)
  - Request batching for cBioPortal API calls (80% fewer API calls)
  - Smart caching with LRU eviction and fast hash keys (10x faster cache operations)
  - Major performance improvements achieving ~3x faster test execution (120s → 42s)

### Fixed

- Non-critical ASGI errors suppressed
- Performance issues in article_searcher

## [0.4.3] - 2025-07-08

### Added

- Complete HTTP centralization and improved code quality
- Comprehensive constants module for better maintainability
- Domain-specific handlers for result formatting
- Parameter parser for robust input validation
- Custom exception hierarchy for better error handling

### Changed

- Refactored domain handlers to use static methods for better performance
- Enhanced type safety throughout the codebase
- Refactored complex functions to meet code quality standards

### Fixed

- Type errors in router.py for full mypy compliance
- Complex functions exceeding cyclomatic complexity thresholds

## [0.4.2] - 2025-07-07

### Added

- Europe PMC DOI support for article fetching
- Pagination support for Europe PMC searches
- OR logic support for variant notation searches (e.g., R173 vs Arg173 vs p.R173)

### Changed

- Enhanced variant notation search capabilities

## [0.4.1] - 2025-07-03

### Added

- AlphaGenome as an optional dependency to predict variant effects on gene regulation
- Per-request API key support for AlphaGenome integration
- AI predictions to complement existing database lookups

### Security

- Comprehensive sanitization in Cloudflare Worker to prevent sensitive data logging
- Secure usage in hosted environments where users provide their own keys

## [0.4.0] - 2025-06-27

### Added

- **cBioPortal Integration** for article searches:
  - Automatic gene-level mutation summaries when searching with gene parameters
  - Mutation-specific search capabilities (e.g., BRAF V600E, SRSF2 F57\*)
  - Dynamic cancer type resolution using cBioPortal API
  - Smart caching and rate limiting for optimal performance

## [0.3.3] - 2025-06-20

### Changed

- Release workflow updates

## [0.3.2] - 2025-06-20

### Changed

- Release workflow updates

## [0.3.1] - 2025-06-20

### Fixed

- Build and release process improvements

## [0.3.0] - 2025-06-20

### Added

- Expanded search capabilities
- Integration tests for MCP server functionality
- Utility modules for gene validation, mutation filtering, and request caching

## [0.2.1] - 2025-06-19

### Added

- Remote MCP policies

## [0.2.0] - 2025-06-17

### Added

- Sequential thinking tool for systematic problem-solving
- Session-based thinking to replace global state
- Extracted router handlers to reduce complexity

### Changed

- Replaced global state in thinking module with session management

### Removed

- Global state from sequential thinking module

### Fixed

- Race conditions in sequential thinking with concurrent usage

## [0.1.11] - 2025-06-12

### Added

- Advanced eligibility criteria filters to clinical trial search

## [0.1.10] - 2025-05-21

### Added

- OAuth support on the Cloudflare worker via Stytch

## [0.1.9] - 2025-05-17

### Fixed

- Refactor: Bump minimum Python version to 3.10

## [0.1.8] - 2025-05-14

### Fixed

- Article searcher fixes

## [0.1.7] - 2025-05-07

### Added

- Remote OAuth support

## [0.1.6] - 2025-05-05

### Added

- Updates to handle cursor integration

## [0.1.5] - 2025-05-01

### Added

- Updates to smithery yaml to account for object types needed for remote calls
- Documentation and Lzyank updates

## [0.1.3] - 2025-05-01

### Added

- Health check functionality to assist with API call issues
- System resources and network & environment information gathering
- Remote MCP capability via Cloudflare using SSE

## [0.1.2] - 2025-04-18

### Added

- Researcher persona and BioMCP v0.1.2 release
- Deep Researcher Persona blog post
- Researcher persona video demo

## [0.1.1] - 2025-04-14

### Added

- Claude Desktop and MCP Inspector tutorials
- Improved Claude Desktop Tutorial for BioMCP
- Troubleshooting guide and blog post

### Fixed

- Log tool names as comma separated string
- Server hanging issues
- Error responses in variant count check

## [0.1.0] - 2025-04-08

### Added

- Initial release of BioMCP
- PubMed/PubTator3 article search integration
- ClinicalTrials.gov trial search integration
- MyVariant.info variant search integration
- CLI interface for direct usage
- MCP server for AI assistant integration
- Cloudflare Worker support for remote deployment
- Comprehensive test suite with pytest-bdd
- GenomOncology introduction
- Blog post on AI-assisted clinical trial search
- MacOS troubleshooting guide

### Security

- API keys properly externalized
- Input validation using Pydantic models
- Safe string handling in all API calls

[Unreleased]: https://github.com/genomoncology/biomcp/compare/v0.6.7...HEAD
[0.6.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.7
[0.6.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.6
[0.6.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.5
[0.6.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.4
[0.6.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.3
[0.6.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.2
[0.6.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.1
[0.6.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.6.0
[0.5.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.5.0
[0.4.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.7
[0.4.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.6
[0.4.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.5
[0.4.4]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.4
[0.4.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.3
[0.4.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.2
[0.4.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.1
[0.4.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.4.0
[0.3.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.3
[0.3.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.2
[0.3.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.1
[0.3.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.3.0
[0.2.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.2.1
[0.2.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.2.0
[0.1.11]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.11
[0.1.10]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.10
[0.1.9]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.9
[0.1.8]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.8
[0.1.7]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.7
[0.1.6]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.6
[0.1.5]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.5
[0.1.3]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.3
[0.1.2]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.2
[0.1.1]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.1
[0.1.0]: https://github.com/genomoncology/biomcp/releases/tag/v0.1.0

```

--------------------------------------------------------------------------------
/docs/developer-guides/02-contributing-and-testing.md:
--------------------------------------------------------------------------------

```markdown
# Contributing and Testing Guide

This guide covers how to contribute to BioMCP and run the comprehensive test suite.

## Getting Started

### Prerequisites

- Python 3.10 or higher
- [uv](https://docs.astral.sh/uv/) package manager
- Git
- Node.js (for MCP Inspector)

### Initial Setup

1. **Fork and clone the repository:**

```bash
git clone https://github.com/YOUR_USERNAME/biomcp.git
cd biomcp
```

2. **Install dependencies and setup:**

```bash
# Recommended: Use make for complete setup
make install

# Alternative: Manual setup
uv sync --all-extras
uv run pre-commit install
```

3. **Verify installation:**

```bash
# Run server
biomcp run

# Run tests
make test-offline
```

## Development Workflow

### 1. Create Feature Branch

```bash
git checkout -b feature/your-feature-name
```

### 2. Make Changes

Follow these principles:

- **Keep changes minimal and focused**
- **Follow existing code patterns**
- **Add tests for new functionality**
- **Update documentation as needed**

### 3. Quality Checks

**MANDATORY: Run these before considering work complete:**

```bash
# Step 1: Code quality checks
make check

# This runs:
# - ruff check (linting)
# - ruff format (code formatting)
# - mypy (type checking)
# - pre-commit hooks
# - deptry (dependency analysis)
```

### 4. Run Tests

```bash
# Step 2: Run appropriate test suite
make test          # Full suite (requires network)
# OR
make test-offline  # Unit tests only (no network)
```

**Both quality checks and tests MUST pass before submitting changes.**

## Testing Strategy

### Test Categories

#### Unit Tests

- Fast, reliable tests without external dependencies
- Mock all external API calls
- Always run in CI/CD

```python
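# Example unit test
from unittest.mock import MagicMock, patch

@patch('httpx.AsyncClient.get')
async def test_article_search(mock_get):
    # Response.json() is synchronous in httpx, so the response is a plain MagicMock
    mock_get.return_value = MagicMock()
    mock_get.return_value.json.return_value = {"results": [...]}
    result = await article_searcher(genes=["BRAF"])
    assert len(result) > 0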
```

#### Integration Tests

- Test real API interactions
- May fail due to network/API issues
- Run separately in CI with `continue-on-error`

```python
# Example integration test
@pytest.mark.integration
async def test_real_pubmed_search():
    result = await article_searcher(genes=["TP53"], limit=5)
    assert len(result) == 5
    assert all("TP53" in r.text for r in result)
```

### Running Tests

#### Command Options

```bash
# Run all tests
make test
uv run python -m pytest

# Run only unit tests (fast, offline)
make test-offline
uv run python -m pytest -m "not integration"

# Run only integration tests
uv run python -m pytest -m "integration"

# Run specific test file
uv run python -m pytest tests/tdd/test_article_search.py

# Run with coverage
make cov
uv run python -m pytest --cov --cov-report=html

# Run tests verbosely
uv run python -m pytest -v

# Run tests and stop on first failure
uv run python -m pytest -x
```

#### Test Discovery

Tests are organized in:

- `tests/tdd/` - Unit and integration tests
- `tests/bdd/` - Behavior-driven development tests
- `tests/data/` - Test fixtures and sample data

### Writing Tests

#### Test Structure

```python
import pytest
from unittest.mock import patch, MagicMock
from biomcp.articles import article_searcher

class TestArticleSearch:
    """Test article search functionality"""

    @pytest.fixture
    def mock_response(self):
        """Sample API response"""
        return {
            "results": [
                {"pmid": "12345", "title": "BRAF in melanoma"}
            ]
        }

    @patch('httpx.AsyncClient.get')
    async def test_basic_search(self, mock_get, mock_response):
        """Test basic article search"""
        # Setup: the awaited get() yields a response whose json() is synchronous,
        # so the response must be a MagicMock rather than an AsyncMock
        mock_get.return_value = MagicMock()
        mock_get.return_value.json.return_value = mock_response

        # Execute
        result = await article_searcher(genes=["BRAF"])

        # Assert
        assert len(result) == 1
        assert "BRAF" in result[0].title
```

#### Async Testing

```python
import pytest
import pytest_asyncio
from httpx import AsyncClient

@pytest.mark.asyncio
async def test_async_function():
    """Test async functionality"""
    result = await some_async_function()  # placeholder for the coroutine under test
    assert result is not None

# Or use pytest-asyncio fixtures (async fixtures need pytest_asyncio.fixture in strict mode)
@pytest_asyncio.fixture
async def async_client():
    async with AsyncClient() as client:
        yield client
```

#### Mocking External APIs

```python
from unittest.mock import patch, MagicMock

@patch('biomcp.integrations.pubmed.search')
def test_with_mock(mock_search):
    # Configure mock
    mock_search.return_value = [{
        "pmid": "12345",
        "title": "Test Article"
    }]

    # Test code that uses the mocked function
    result = search_articles("BRAF")

    # Verify mock was called correctly
    mock_search.assert_called_once_with("BRAF")
```
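
When the code under test awaits an HTTP client, the mock has to be awaitable where the real call is and synchronous where the real call is. The following self-contained sketch uses only the standard library. `fetch_titles` and `make_client` are hypothetical stand-ins for illustration, not BioMCP functions.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_titles(client, url):
    """Hypothetical code under test: GET a URL and return the article titles."""
    response = await client.get(url)
    payload = response.json()  # synchronous, as with an httpx response
    return [item["title"] for item in payload["results"]]


def make_client(payload):
    """Build a fake client whose awaited get() returns a canned response."""
    response = MagicMock()
    response.json.return_value = payload
    client = MagicMock()
    client.get = AsyncMock(return_value=response)
    return client


client = make_client({"results": [{"pmid": "12345", "title": "BRAF in melanoma"}]})
titles = asyncio.run(fetch_titles(client, "https://example.org/search"))

# AsyncMock records awaits, so the call can be verified like any other mock
client.get.assert_awaited_once_with("https://example.org/search")
print(titles)
```

Using `AsyncMock` for the response as well would make `response.json()` return a coroutine instead of the payload, which is a common source of confusing test failures.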

## MCP Inspector Testing

The MCP Inspector provides an interactive way to test MCP tools.

### Setup

```bash
# Install inspector
npm install -g @modelcontextprotocol/inspector

# Run BioMCP with inspector
make inspector
# OR
npx @modelcontextprotocol/inspector uv run --with biomcp-python biomcp run
```

### Testing Tools

1. **Connect to server** in the inspector UI
2. **View available tools** in the tools panel
3. **Test individual tools** with sample inputs

#### Example Tool Tests

```javascript
// Test article search
{
  "tool": "article_searcher",
  "arguments": {
    "genes": ["BRAF"],
    "diseases": ["melanoma"],
    "limit": 5
  }
}

// Test trial search
{
  "tool": "trial_searcher",
  "arguments": {
    "conditions": ["lung cancer"],
    "recruiting_status": "OPEN",
    "limit": 10
  }
}

// Test think tool (ALWAYS first!)
{
  "tool": "think",
  "arguments": {
    "thought": "Planning to search for BRAF mutations",
    "thoughtNumber": 1,
    "nextThoughtNeeded": true
  }
}
```

### Debugging with Inspector

1. **Check request/response**: View raw MCP messages
2. **Verify parameters**: Ensure correct argument format
3. **Test error handling**: Try invalid inputs
4. **Monitor performance**: Check response times

## Code Style and Standards

### Python Style

- **Formatter**: ruff (line length: 79)
- **Type hints**: Required for all functions
- **Docstrings**: Google style for all public functions

```python
def search_articles(
    genes: list[str],
    limit: int = 10
) -> list[Article]:
    """Search for articles by gene names.

    Args:
        genes: List of gene symbols to search
        limit: Maximum number of results

    Returns:
        List of Article objects

    Raises:
        ValueError: If genes list is empty
    """
    if not genes:
        raise ValueError("Genes list cannot be empty")
    # Implementation...
```

### Pre-commit Hooks

Automatically run on commit:

- ruff formatting
- ruff linting
- mypy type checking
- File checks (YAML, TOML, merge conflicts)

Manual run:

```bash
uv run pre-commit run --all-files
```

## Continuous Integration

### GitHub Actions Workflow

The CI pipeline runs:

1. **Linting and Formatting**
2. **Type Checking**
3. **Unit Tests** (required to pass)
4. **Integration Tests** (allowed to fail)
5. **Coverage Report**

### CI Configuration

```yaml
# .github/workflows/ci.yml structure
jobs:
  test:
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
      - run: make check
      - run: make test-offline
```

## Debugging and Troubleshooting

### Common Issues

#### Test Failures

```bash
# Run failed test with more details
uv run python -m pytest -vvs tests/path/to/test.py::test_name

# Debug with print statements
uv run python -m pytest -s  # Don't capture stdout

# Use debugger
uv run python -m pytest --pdb  # Drop to debugger on failure
```

#### Integration Test Issues

Common causes:

- **Rate limiting**: Add delays or use mocks
- **API changes**: Update test expectations
- **Network issues**: Check connectivity
- **API keys**: Ensure valid keys for NCI tests

## Integration Testing

### Overview

BioMCP includes integration tests that make real API calls to external services. These tests verify that our integrations work correctly with live data but can be affected by API availability, rate limits, and data changes.

### Running Integration Tests

```bash
# Run all tests including integration
make test

# Run only integration tests
pytest -m integration

# Skip integration tests
pytest -m "not integration"
```
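
The `-m integration` selections above only work when tests carry an `integration` marker. The following is a minimal sketch of registering such a marker in a `conftest.py`, assuming the project does not already declare it elsewhere (for example in `pyproject.toml`). The marker description text is illustrative.

```python
# conftest.py (illustrative sketch; the real project may register markers elsewhere)


def pytest_configure(config):
    """Register the custom marker so `pytest -m integration` runs without warnings."""
    config.addinivalue_line(
        "markers",
        "integration: test makes real calls to external APIs",
    )
```

pytest calls `pytest_configure` once at startup, so the marker is known before any test is collected.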

### Handling Flaky Tests

Integration tests may fail or skip for various reasons:

1. **API Unavailability**

   - **Symptom**: Tests skip with "API returned no data" message
   - **Cause**: The external service is down or experiencing issues
   - **Action**: Re-run tests later or check service status

2. **Rate Limiting**

   - **Symptom**: Multiple test failures after initial successes
   - **Cause**: Too many requests in a short time
   - **Action**: Run tests with delays between them or use API tokens

3. **Data Changes**
   - **Symptom**: Assertions about specific data fail
   - **Cause**: The external data has changed (e.g., new mutations discovered)
   - **Action**: Update tests to use more flexible assertions
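
For the rate-limiting case, "delays between tests" can be as simple as a throttle that enforces a minimum gap between calls. The sketch below is a hypothetical helper, not part of BioMCP; the class name and the injectable `clock` and `sleep` parameters are assumptions made so the logic can be exercised without real waiting.

```python
import time


class MinIntervalThrottle:
    """Block until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough to honor the minimum interval; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept
```

One way to wire this in would be an autouse pytest fixture that calls `throttle.wait()` before each integration test, with a single module-level `MinIntervalThrottle` instance shared across tests.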

### Integration Test Design Principles

#### 1. Graceful Skipping

Tests should skip rather than fail when:

- API returns no data
- Service is unavailable
- Rate limits are hit

```python
if not data or data.total_count == 0:
    pytest.skip("API returned no data - possible service issue")
```

#### 2. Flexible Assertions

Avoid assertions on specific data values that might change:

❌ **Bad**: Expecting exact mutation counts

```python
assert summary.total_mutations == 1234
```

✅ **Good**: Checking data exists and has reasonable structure

```python
assert summary.total_mutations > 0
assert hasattr(summary, 'hotspots')
```
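
When several tests check the same response type, the structural checks can be collected into one helper. This is a sketch only: the `total_mutations` and `hotspots` names come from the snippets above, while the assumption that `hotspots` is a list of non-empty strings is hypothetical and should be adapted to the real model.

```python
from types import SimpleNamespace


def check_summary_shape(summary):
    """Assert structural properties that should survive upstream data updates."""
    assert isinstance(summary.total_mutations, int)
    assert summary.total_mutations > 0
    # Hotspots should be a collection, but its exact contents may change.
    assert isinstance(summary.hotspots, (list, tuple))
    for hotspot in summary.hotspots:
        assert isinstance(hotspot, str) and hotspot


# Stand-in object for illustration; a real test would pass the API response.
example = SimpleNamespace(total_mutations=57, hotspots=["V600E", "V600K"])
check_summary_shape(example)
```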

#### 3. Retry Logic

For critical tests, implement retry with delay:

```python
import asyncio


async def fetch_with_retry(client, resource, max_attempts=2, delay=1.0):
    for attempt in range(max_attempts):
        result = await client.get(resource)
        if result and result.data:
            return result
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)
    return None
```
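
To see the retry helper in action without touching a real API, it can be run against a stand-in client. `FlakyClient` below is hypothetical and exists only for this illustration; the helper is repeated so the example runs on its own.

```python
import asyncio
from types import SimpleNamespace


async def fetch_with_retry(client, resource, max_attempts=2, delay=1.0):
    # Same helper as above, repeated so this example is self-contained.
    for attempt in range(max_attempts):
        result = await client.get(resource)
        if result and result.data:
            return result
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay)
    return None


class FlakyClient:
    """Hypothetical client that returns empty data for the first `failures` calls."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def get(self, resource):
        self.calls += 1
        if self.calls <= self.failures:
            return SimpleNamespace(data=None)
        return SimpleNamespace(data={"resource": resource})


async def main():
    client = FlakyClient(failures=1)
    result = await fetch_with_retry(client, "BRAF", max_attempts=3, delay=0.01)
    return result, client.calls


result, calls = asyncio.run(main())
print(result.data, calls)
```

Because the helper returns `None` once attempts are exhausted, a test can follow it with `pytest.skip(...)` instead of failing.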

#### 4. Cache Management

Clear caches before tests to ensure fresh data:

```python
from biomcp.utils.request_cache import clear_cache
await clear_cache()
```

### Common Integration Test Patterns

#### Testing Search Functionality

```python
@pytest.mark.integration
async def test_gene_search(self):
    client = SearchClient()
    results = await client.search("BRAF")

    # Flexible assertions
    assert results is not None
    if results.count > 0:
        assert results.items[0].gene_symbol == "BRAF"
    else:
        pytest.skip("No results returned - API may be unavailable")
```

#### Testing Data Retrieval

```python
@pytest.mark.integration
async def test_variant_details(self):
    client = VariantClient()
    variant = await client.get_variant("rs121913529")

    if not variant:
        pytest.skip("Variant not found - may have been removed from database")

    # Check structure, not specific values
    assert hasattr(variant, 'chromosome')
    assert hasattr(variant, 'position')
```

### Debugging Failed Integration Tests

1. **Enable Debug Logging**

   ```bash
   BIOMCP_LOG_LEVEL=DEBUG pytest tests/integration/test_failing.py -v
   ```

2. **Check API Status**

   - PubMed: https://www.ncbi.nlm.nih.gov/home/about/website-updates/
   - ClinicalTrials.gov: https://clinicaltrials.gov/about/announcements
   - cBioPortal: https://www.cbioportal.org/

3. **Inspect Response Data**
   ```python
   if not expected_data:
       print(f"Unexpected response: {response}")
       pytest.skip("Data structure changed")
   ```

### Environment Variables for Testing

#### API Tokens

Some services provide higher rate limits with authentication:

```bash
export CBIO_TOKEN="your-token-here"
export PUBMED_API_KEY="your-key-here"
```

#### Offline Mode

Test offline behavior:

```bash
export BIOMCP_OFFLINE=true
pytest tests/
```

#### Custom Timeouts

Adjust timeouts for slow connections:

```bash
export BIOMCP_REQUEST_TIMEOUT=60
pytest tests/integration/
```
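
If test helpers need to react to these variables, a small reader function keeps the parsing in one place. This is a sketch under stated assumptions: the variable names come from this guide, but the accepted truthy values, the 30-second default, and the function name are hypothetical and may differ from how BioMCP itself interprets them.

```python
import os


def read_test_settings(env=None):
    """Read test-related settings from environment variables.

    The defaults and parsing rules here are assumptions for illustration,
    not BioMCP's actual behavior.
    """
    env = os.environ if env is None else env
    offline = env.get("BIOMCP_OFFLINE", "").strip().lower() in {"1", "true", "yes"}
    timeout = float(env.get("BIOMCP_REQUEST_TIMEOUT", "30"))
    if timeout <= 0:
        raise ValueError("BIOMCP_REQUEST_TIMEOUT must be positive")
    return {"offline": offline, "timeout": timeout}
```

Passing a plain dictionary as `env` makes the function easy to unit test without modifying the real environment.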

### CI/CD Considerations

1. **Separate Test Runs**

   ```yaml
   - name: Unit Tests
     run: pytest -m "not integration"

   - name: Integration Tests
     run: pytest -m integration
     continue-on-error: true
   ```

2. **Scheduled Runs**

   ```yaml
   on:
     schedule:
       - cron: "0 6 * * *" # Daily at 06:00 UTC
   ```

3. **Result Monitoring**: Track integration test success rates over time to identify patterns.
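
One possible way to track success rates is to have pytest write a JUnit-style report (for example `pytest -m integration --junitxml=integration.xml`, where the file name is illustrative) and summarize it per run. The function below is a sketch, not an existing BioMCP utility; it excludes skipped tests from the rate so that API outages count as neither passes nor failures.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Return counts and pass rate from a JUnit-style XML report."""
    root = ET.fromstring(xml_text)
    # The root may be a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    executed = totals["tests"] - totals["skipped"]
    passed = executed - totals["failures"] - totals["errors"]
    # pass_rate is None when every test was skipped (nothing executed).
    totals["pass_rate"] = passed / executed if executed else None
    return totals


# Hand-written sample report for illustration only.
report = """<testsuites>
  <testsuite name="pytest" tests="10" failures="1" errors="0" skipped="2"/>
</testsuites>"""
print(summarize_junit(report))
```

Storing the returned dictionary alongside the run date would be enough to spot a service that fails or skips more often over time.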

### Integration Testing Best Practices

1. **Keep integration tests focused** - Test integration points, not business logic
2. **Use reasonable timeouts** - Don't wait forever for slow APIs
3. **Document expected failures** - Add comments explaining why tests might skip
4. **Monitor external changes** - Subscribe to API change notifications
5. **Provide escape hatches** - Allow skipping integration tests when needed

#### Type Checking Errors

```bash
# Check specific file
uv run mypy src/biomcp/specific_file.py

# Ignore a specific error: add this inline comment to the offending Python line
# type: ignore[error-code]

# Show error codes
uv run mypy --show-error-codes
```

### Performance Testing

```python
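import time

import pytest

# NOTE: assumes search_articles is imported from the module under test.


@pytest.mark.performance
def test_search_performance():
    """Ensure search completes within the time limit."""
    start = time.perf_counter()  # monotonic clock, better suited to timing
    result = search_articles("TP53", limit=100)
    duration = time.perf_counter() - start

    assert duration < 5.0  # Should complete within 5 seconds
    assert len(result) <= 100  # limit is an upper bound, not a guaranteed count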
```

## Submitting Changes

### Pull Request Process

1. **Ensure all checks pass:**

```bash
make check && make test
```

2. **Update documentation** if needed

3. **Commit with clear message:**

```bash
git add .
git commit -m "feat: add support for variant batch queries

- Add batch_variant_search function
- Update tests for batch functionality
- Document batch size limits"
```

4. **Push to your fork:**

```bash
git push origin feature/your-feature-name
```

5. **Create Pull Request** with:
   - Clear description of changes
   - Link to related issues
   - Test results summary

### Code Review Guidelines

Your PR will be reviewed for:

- **Code quality** and style consistency
- **Test coverage** for new features
- **Documentation** updates
- **Performance** impact
- **Security** considerations

## Best Practices

### DO:

- Write tests for new functionality
- Follow existing patterns
- Keep PRs focused and small
- Update documentation
- Run full test suite locally

### DON'T:

- Skip tests to "save time"
- Mix unrelated changes in one PR
- Ignore linting warnings
- Commit sensitive data
- Break existing functionality

## Additional Resources

- [MCP Documentation](https://modelcontextprotocol.io)
- [pytest Documentation](https://docs.pytest.org)
- [Type Hints Guide](https://mypy.readthedocs.io)
- [Ruff Documentation](https://docs.astral.sh/ruff)

## Getting Help

- **GitHub Issues**: Report bugs, request features, ask questions, or share ideas
- **Pull Requests**: Submit contributions
- **Documentation**: Check existing docs first

Remember: Quality over speed. Take time to write good tests and clean code!

```

--------------------------------------------------------------------------------
/src/biomcp/cli/openfda.py:
--------------------------------------------------------------------------------

```python
"""
OpenFDA CLI commands for BioMCP.
"""

import asyncio
from typing import Annotated

import typer
from rich.console import Console

from ..openfda import (
    get_adverse_event,
    get_device_event,
    get_drug_approval,
    get_drug_label,
    get_drug_recall,
    get_drug_shortage,
    search_adverse_events,
    search_device_events,
    search_drug_approvals,
    search_drug_labels,
    search_drug_recalls,
    search_drug_shortages,
)

console = Console()

# Create separate Typer apps for each subdomain
adverse_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA drug adverse event reports (FAERS)",
)

label_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA drug product labels (SPL)",
)

device_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA device adverse event reports (MAUDE)",
)

approval_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA drug approval records (Drugs@FDA)",
)

recall_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA drug recall records (Enforcement)",
)

shortage_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve FDA drug shortage information",
)


# Adverse Events Commands
@adverse_app.command("search")
def search_adverse_events_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    reaction: Annotated[
        str | None,
        typer.Option(
            "--reaction", "-r", help="Adverse reaction to search for"
        ),
    ] = None,
    serious: Annotated[
        bool | None,
        typer.Option("--serious/--all", help="Filter for serious events only"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA adverse event reports for drugs."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_adverse_events(
                drug=drug,
                reaction=reaction,
                serious=serious,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@adverse_app.command("get")
def get_adverse_event_cli(
    report_id: Annotated[str, typer.Argument(help="Safety report ID")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed information for a specific adverse event report."""
    try:
        result = asyncio.run(get_adverse_event(report_id, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Label Commands
@label_app.command("search")
def search_drug_labels_cli(
    name: Annotated[
        str | None,
        typer.Option("--name", "-n", help="Drug name to search for"),
    ] = None,
    indication: Annotated[
        str | None,
        typer.Option(
            "--indication",
            "-i",
            help="Search for drugs indicated for this condition",
        ),
    ] = None,
    boxed_warning: Annotated[
        bool,
        typer.Option(
            "--boxed-warning", help="Filter for drugs with boxed warnings"
        ),
    ] = False,
    section: Annotated[
        str | None,
        typer.Option(
            "--section", "-s", help="Specific label section to search"
        ),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug product labels."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_labels(
                name=name,
                indication=indication,
                boxed_warning=boxed_warning,
                section=section,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@label_app.command("get")
def get_drug_label_cli(
    set_id: Annotated[str, typer.Argument(help="Label set ID")],
    sections: Annotated[
        str | None,
        typer.Option(
            "--sections", help="Comma-separated list of sections to retrieve"
        ),
    ] = None,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug label information."""
    section_list = None
    if sections:
        section_list = [s.strip() for s in sections.split(",")]

    try:
        result = asyncio.run(
            get_drug_label(set_id, section_list, api_key=api_key)
        )
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Device Event Commands
@device_app.command("search")
def search_device_events_cli(
    device: Annotated[
        str | None,
        typer.Option("--device", "-d", help="Device name to search for"),
    ] = None,
    manufacturer: Annotated[
        str | None,
        typer.Option("--manufacturer", "-m", help="Manufacturer name"),
    ] = None,
    problem: Annotated[
        str | None,
        typer.Option("--problem", "-p", help="Device problem description"),
    ] = None,
    product_code: Annotated[
        str | None, typer.Option("--product-code", help="FDA product code")
    ] = None,
    genomics_only: Annotated[
        bool,
        typer.Option(
            "--genomics-only/--all-devices",
            help="Filter to genomic/diagnostic devices",
        ),
    ] = True,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA device adverse event reports."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_device_events(
                device=device,
                manufacturer=manufacturer,
                problem=problem,
                product_code=product_code,
                genomics_only=genomics_only,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@device_app.command("get")
def get_device_event_cli(
    mdr_report_key: Annotated[str, typer.Argument(help="MDR report key")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed information for a specific device event report."""
    try:
        result = asyncio.run(get_device_event(mdr_report_key, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Approval Commands
@approval_app.command("search")
def search_drug_approvals_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    application: Annotated[
        str | None,
        typer.Option(
            "--application", "-a", help="NDA or BLA application number"
        ),
    ] = None,
    year: Annotated[
        str | None,
        typer.Option("--year", "-y", help="Approval year (YYYY format)"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug approval records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_approvals(
                drug=drug,
                application_number=application,
                approval_year=year,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@approval_app.command("get")
def get_drug_approval_cli(
    application: Annotated[
        str, typer.Argument(help="NDA or BLA application number")
    ],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug approval information."""
    try:
        result = asyncio.run(get_drug_approval(application, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Recall Commands
@recall_app.command("search")
def search_drug_recalls_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    recall_class: Annotated[
        str | None,
        typer.Option(
            "--class", "-c", help="Recall classification (1, 2, or 3)"
        ),
    ] = None,
    status: Annotated[
        str | None,
        typer.Option(
            "--status", "-s", help="Recall status (ongoing, completed)"
        ),
    ] = None,
    reason: Annotated[
        str | None,
        typer.Option("--reason", "-r", help="Search in recall reason"),
    ] = None,
    since: Annotated[
        str | None,
        typer.Option("--since", help="Show recalls after date (YYYYMMDD)"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug recall records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_recalls(
                drug=drug,
                recall_class=recall_class,
                status=status,
                reason=reason,
                since_date=since,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@recall_app.command("get")
def get_drug_recall_cli(
    recall_number: Annotated[str, typer.Argument(help="FDA recall number")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug recall information."""
    try:
        result = asyncio.run(get_drug_recall(recall_number, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Drug Shortage Commands
@shortage_app.command("search")
def search_drug_shortages_cli(
    drug: Annotated[
        str | None,
        typer.Option("--drug", "-d", help="Drug name to search for"),
    ] = None,
    status: Annotated[
        str | None,
        typer.Option(
            "--status", "-s", help="Shortage status (current, resolved)"
        ),
    ] = None,
    category: Annotated[
        str | None,
        typer.Option("--category", "-c", help="Therapeutic category"),
    ] = None,
    limit: Annotated[
        int, typer.Option("--limit", "-l", help="Maximum number of results")
    ] = 25,
    page: Annotated[
        int, typer.Option("--page", "-p", help="Page number (1-based)")
    ] = 1,
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Search FDA drug shortage records."""
    skip = (page - 1) * limit

    try:
        results = asyncio.run(
            search_drug_shortages(
                drug=drug,
                status=status,
                therapeutic_category=category,
                limit=limit,
                skip=skip,
                api_key=api_key,
            )
        )
        console.print(results)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


@shortage_app.command("get")
def get_drug_shortage_cli(
    drug: Annotated[str, typer.Argument(help="Drug name")],
    api_key: Annotated[
        str | None,
        typer.Option(
            "--api-key",
            help="OpenFDA API key (overrides OPENFDA_API_KEY env var)",
        ),
    ] = None,
):
    """Get detailed drug shortage information."""
    try:
        result = asyncio.run(get_drug_shortage(drug, api_key=api_key))
        console.print(result)
    except Exception as e:
        console.print(f"[red]Error: {e}[/red]")
        raise typer.Exit(1) from e


# Main OpenFDA app that combines all subcommands
openfda_app = typer.Typer(
    no_args_is_help=True,
    help="Search and retrieve data from FDA's openFDA API",
)

# Add subcommands
openfda_app.add_typer(
    adverse_app, name="adverse", help="Drug adverse events (FAERS)"
)
openfda_app.add_typer(
    label_app, name="label", help="Drug product labels (SPL)"
)
openfda_app.add_typer(
    device_app, name="device", help="Device adverse events (MAUDE)"
)
openfda_app.add_typer(
    approval_app, name="approval", help="Drug approvals (Drugs@FDA)"
)
openfda_app.add_typer(
    recall_app, name="recall", help="Drug recalls (Enforcement)"
)
openfda_app.add_typer(shortage_app, name="shortage", help="Drug shortages")

```

--------------------------------------------------------------------------------
/src/biomcp/articles/preprints.py:
--------------------------------------------------------------------------------

```python
"""Preprint search functionality for bioRxiv/medRxiv and Europe PMC."""

import asyncio
import json
import logging
from datetime import datetime
from typing import Any

from pydantic import BaseModel, Field

from .. import http_client, render
from ..constants import (
    BIORXIV_BASE_URL,
    BIORXIV_DEFAULT_DAYS_BACK,
    BIORXIV_MAX_PAGES,
    BIORXIV_RESULTS_PER_PAGE,
    EUROPE_PMC_BASE_URL,
    EUROPE_PMC_PAGE_SIZE,
    MEDRXIV_BASE_URL,
    SYSTEM_PAGE_SIZE,
)
from ..core import PublicationState
from .search import PubmedRequest, ResultItem, SearchResponse

logger = logging.getLogger(__name__)


class BiorxivRequest(BaseModel):
    """Request parameters for bioRxiv/medRxiv API."""

    query: str
    interval: str = Field(
        default="", description="Date interval in YYYY-MM-DD/YYYY-MM-DD format"
    )
    cursor: int = Field(default=0, description="Starting position")


class BiorxivResult(BaseModel):
    """Individual result from bioRxiv/medRxiv."""

    doi: str | None = None
    title: str | None = None
    authors: str | None = None
    author_corresponding: str | None = None
    author_corresponding_institution: str | None = None
    date: str | None = None
    version: int | None = None
    type: str | None = None
    license: str | None = None
    category: str | None = None
    jatsxml: str | None = None
    abstract: str | None = None
    published: str | None = None
    server: str | None = None

    def to_result_item(self) -> ResultItem:
        """Convert to standard ResultItem format."""
        authors_list = []
        if self.authors:
            authors_list = [
                author.strip() for author in self.authors.split(";")
            ]

        return ResultItem(
            pmid=None,
            pmcid=None,
            title=self.title,
            journal=f"{self.server or 'bioRxiv'} (preprint)",
            authors=authors_list,
            date=self.date,
            doi=self.doi,
            abstract=self.abstract,
            publication_state=PublicationState.PREPRINT,
            source=self.server or "bioRxiv",
        )


class BiorxivResponse(BaseModel):
    """Response from bioRxiv/medRxiv API."""

    collection: list[BiorxivResult] = Field(default_factory=list)
    messages: list[dict[str, Any]] = Field(default_factory=list)
    total: int = Field(default=0, alias="total")


class EuropePMCRequest(BaseModel):
    """Request parameters for Europe PMC API."""

    query: str
    format: str = "json"
    pageSize: int = Field(default=25, le=1000)
    cursorMark: str = Field(default="*")
    src: str = Field(default="PPR", description="Source: PPR for preprints")


class EuropePMCResult(BaseModel):
    """Individual result from Europe PMC."""

    id: str | None = None
    source: str | None = None
    pmid: str | None = None
    pmcid: str | None = None
    doi: str | None = None
    title: str | None = None
    authorString: str | None = None
    journalTitle: str | None = None
    pubYear: str | None = None
    firstPublicationDate: str | None = None
    abstractText: str | None = None

    def to_result_item(self) -> ResultItem:
        """Convert to standard ResultItem format."""
        authors_list = []
        if self.authorString:
            authors_list = [
                author.strip() for author in self.authorString.split(",")
            ]

        return ResultItem(
            pmid=int(self.pmid) if self.pmid and self.pmid.isdigit() else None,
            pmcid=self.pmcid,
            title=self.title,
            journal=f"{self.journalTitle or 'Preprint Server'} (preprint)",
            authors=authors_list,
            date=self.firstPublicationDate or self.pubYear,
            doi=self.doi,
            abstract=self.abstractText,
            publication_state=PublicationState.PREPRINT,
            source="Europe PMC",
        )


class EuropePMCResponse(BaseModel):
    """Response from Europe PMC API."""

    hitCount: int = Field(default=0)
    nextCursorMark: str | None = None
    resultList: dict[str, Any] = Field(default_factory=dict)

    @property
    def results(self) -> list[EuropePMCResult]:
        result_data = self.resultList.get("result", [])
        return [EuropePMCResult(**r) for r in result_data]


class PreprintSearcher:
    """Handles searching across multiple preprint sources."""

    def __init__(self):
        self.biorxiv_client = BiorxivClient()
        self.europe_pmc_client = EuropePMCClient()

    async def search(
        self,
        request: PubmedRequest,
        include_biorxiv: bool = True,
        include_europe_pmc: bool = True,
    ) -> SearchResponse:
        """Search across preprint sources and merge results."""
        query = self._build_query(request)

        tasks = []
        if include_biorxiv:
            tasks.append(self.biorxiv_client.search(query))
        if include_europe_pmc:
            tasks.append(self.europe_pmc_client.search(query))

        results_lists = await asyncio.gather(*tasks, return_exceptions=True)

        all_results = []
        for results in results_lists:
            if isinstance(results, list):
                all_results.extend(results)

        # Remove duplicates based on DOI
        seen_dois = set()
        unique_results = []
        for result in all_results:
            if result.doi and result.doi in seen_dois:
                continue
            if result.doi:
                seen_dois.add(result.doi)
            unique_results.append(result)

        # Sort by date (newest first)
        unique_results.sort(key=lambda x: x.date or "0000-00-00", reverse=True)

        # Limit results
        limited_results = unique_results[:SYSTEM_PAGE_SIZE]

        return SearchResponse(
            results=limited_results,
            page_size=len(limited_results),
            current=0,
            count=len(limited_results),
            total_pages=1,
        )

    def _build_query(self, request: PubmedRequest) -> str:
        """Build query string from structured request.

        Note: Preprint servers use plain text search, not PubMed syntax.
        """
        query_parts = []

        if request.keywords:
            query_parts.extend(request.keywords)
        if request.genes:
            query_parts.extend(request.genes)
        if request.diseases:
            query_parts.extend(request.diseases)
        if request.chemicals:
            query_parts.extend(request.chemicals)
        if request.variants:
            query_parts.extend(request.variants)

        return " ".join(query_parts) if query_parts else ""


class BiorxivClient:
    """Client for bioRxiv/medRxiv API.

    IMPORTANT LIMITATION: bioRxiv/medRxiv APIs do not provide a search endpoint.
    This implementation works around this limitation by:
    1. Fetching articles from a date range (last 365 days by default)
    2. Filtering results client-side based on query match in title/abstract

    This approach has limitations but is optimized for performance:
    - Searches up to 1 year of preprints by default (configurable)
    - Uses pagination to avoid fetching all results at once
    - May still miss older preprints beyond the date range

    Consider using Europe PMC for more comprehensive preprint search capabilities,
    as it has proper search functionality without date limitations.
    """

    async def search(  # noqa: C901
        self,
        query: str,
        server: str = "biorxiv",
        days_back: int = BIORXIV_DEFAULT_DAYS_BACK,
    ) -> list[ResultItem]:
        """Search bioRxiv or medRxiv for articles.

        Note: Due to API limitations, this performs client-side filtering on
        recent articles only. See class docstring for details.
        """
        if not query:
            return []

        base_url = (
            BIORXIV_BASE_URL if server == "biorxiv" else MEDRXIV_BASE_URL
        )

        # Only fetch recent articles (the last `days_back` days) to limit API calls
        from datetime import timedelta

        today = datetime.now()
        start_date = today - timedelta(days=days_back)
        interval = f"{start_date:%Y-%m-%d}/{today:%Y-%m-%d}"

        # Prepare query terms for better matching
        query_terms = query.lower().split()

        filtered_results = []
        cursor = 0
        max_pages = (
            BIORXIV_MAX_PAGES  # Limit pagination to avoid excessive API calls
        )

        for page in range(max_pages):
            request = BiorxivRequest(
                query=query, interval=interval, cursor=cursor
            )
            url = f"{base_url}/{request.interval}/{request.cursor}"

            response, error = await http_client.request_api(
                url=url,
                method="GET",
                request={},
                response_model_type=BiorxivResponse,
                domain="biorxiv",
                cache_ttl=300,  # Cache for 5 minutes
            )

            if error or not response:
                logger.warning(
                    f"Failed to fetch {server} articles page {page} for query '{query}': {error if error else 'No response'}"
                )
                break

            # Filter results based on query
            page_filtered = 0
            for result in response.collection:
                # Create searchable text from title and abstract
                searchable_text = ""
                if result.title:
                    searchable_text += result.title.lower() + " "
                if result.abstract:
                    searchable_text += result.abstract.lower()

                # Check if all query terms are present (AND logic)
                if all(term in searchable_text for term in query_terms):
                    filtered_results.append(result.to_result_item())
                    page_filtered += 1

                    # Stop if we have enough results
                    if len(filtered_results) >= SYSTEM_PAGE_SIZE:
                        return filtered_results[:SYSTEM_PAGE_SIZE]

            # If this page had no matches and we have some results, stop pagination
            if page_filtered == 0 and filtered_results:
                break

            # Move to next page
            cursor += len(response.collection)

            # Stop on a short page: bioRxiv typically returns
            # BIORXIV_RESULTS_PER_PAGE items, so fewer means no more results
            if len(response.collection) < BIORXIV_RESULTS_PER_PAGE:
                break

        return filtered_results[:SYSTEM_PAGE_SIZE]


class EuropePMCClient:
    """Client for Europe PMC API."""

    async def search(
        self, query: str, max_results: int = SYSTEM_PAGE_SIZE
    ) -> list[ResultItem]:
        """Search Europe PMC for preprints with pagination support."""
        results: list[ResultItem] = []
        cursor_mark = "*"
        page_size = min(
            EUROPE_PMC_PAGE_SIZE, max_results
        )  # Europe PMC optimal page size

        while len(results) < max_results:
            request = EuropePMCRequest(
                query=f"(SRC:PPR) AND ({query})" if query else "SRC:PPR",
                pageSize=page_size,
                cursorMark=cursor_mark,
            )

            params = request.model_dump(exclude_none=True)

            response, error = await http_client.request_api(
                url=EUROPE_PMC_BASE_URL,
                method="GET",
                request=params,
                response_model_type=EuropePMCResponse,
                domain="europepmc",
                cache_ttl=300,  # Cache for 5 minutes
            )

            if error or not response:
                logger.warning(
                    f"Failed to fetch Europe PMC preprints for query '{query}': {error if error else 'No response'}"
                )
                break

            # Add results
            page_results = [
                result.to_result_item() for result in response.results
            ]
            results.extend(page_results)

            # Check if we have more pages
            if (
                not response.nextCursorMark
                or response.nextCursorMark == cursor_mark
            ):
                break

            # Check if we got fewer results than requested (last page)
            if len(page_results) < page_size:
                break

            cursor_mark = response.nextCursorMark

            # Adjust page size for last request if needed
            remaining = max_results - len(results)
            if remaining < page_size:
                page_size = remaining

        return results[:max_results]


async def fetch_europe_pmc_article(
    doi: str,
    output_json: bool = False,
) -> str:
    """Fetch a single article from Europe PMC by DOI."""
    # Europe PMC search API can fetch article details by DOI
    request = EuropePMCRequest(
        query=f'DOI:"{doi}"',
        pageSize=1,
        src="PPR",  # Preprints source
    )

    params = request.model_dump(exclude_none=True)

    response, error = await http_client.request_api(
        url=EUROPE_PMC_BASE_URL,
        method="GET",
        request=params,
        response_model_type=EuropePMCResponse,
        domain="europepmc",
    )

    if error:
        data: list[dict[str, Any]] = [
            {"error": f"Error {error.code}: {error.message}"}
        ]
    elif response and response.results:
        # Convert Europe PMC result to Article format for consistency
        europe_pmc_result = response.results[0]
        article_data = {
            "pmid": None,  # Europe PMC preprints don't have PMIDs
            "pmcid": europe_pmc_result.pmcid,
            "doi": europe_pmc_result.doi,
            "title": europe_pmc_result.title,
            "journal": f"{europe_pmc_result.journalTitle or 'Preprint Server'} (preprint)",
            "date": europe_pmc_result.firstPublicationDate
            or europe_pmc_result.pubYear,
            "authors": [
                author.strip()
                for author in (europe_pmc_result.authorString or "").split(",")
                if author.strip()  # skip empty entries (e.g. no authorString)
            ],
            "abstract": europe_pmc_result.abstractText,
            "full_text": "",  # Europe PMC API doesn't provide full text for preprints
            "pubmed_url": None,
            "pmc_url": f"https://europepmc.org/article/PPR/{doi}"
            if doi
            else None,
            "source": "Europe PMC",
        }
        data = [article_data]
    else:
        data = [{"error": "Article not found in Europe PMC"}]

    if data and not output_json:
        return render.to_markdown(data)
    else:
        return json.dumps(data, indent=2)


async def search_preprints(
    request: PubmedRequest,
    include_biorxiv: bool = True,
    include_europe_pmc: bool = True,
    output_json: bool = False,
) -> str:
    """Search for preprints across multiple sources."""
    searcher = PreprintSearcher()
    response = await searcher.search(
        request,
        include_biorxiv=include_biorxiv,
        include_europe_pmc=include_europe_pmc,
    )

    if response and response.results:
        data = [
            result.model_dump(mode="json", exclude_none=True)
            for result in response.results
        ]
    else:
        data = []

    if data and not output_json:
        return render.to_markdown(data)
    else:
        return json.dumps(data, indent=2)

```
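
The client-side filtering in `BiorxivClient.search` above can be tried in isolation. What follows is a minimal sketch, not the module's code: `Preprint` and `matches_all_terms` are hypothetical names introduced for illustration. It assumes only what the loop shows: the query is lowercased, split on whitespace, and every term must occur as a substring of the title or abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Preprint:
    """Hypothetical stand-in for one bioRxiv API record (illustration only)."""

    title: Optional[str]
    abstract: Optional[str]


def matches_all_terms(item: Preprint, query: str) -> bool:
    """Return True when every whitespace-separated query term occurs as a
    case-insensitive substring of the title or abstract (AND logic)."""
    terms = query.lower().split()
    if not terms:
        return False  # an empty query matches nothing, like the early return
    searchable = " ".join(
        part.lower() for part in (item.title, item.abstract) if part
    )
    return all(term in searchable for term in terms)


papers = [
    Preprint("BRAF V600E in melanoma", "Response to targeted therapy."),
    Preprint("EGFR signalling", "Lung cancer cohort; no BRAF data."),
    Preprint(None, None),
]

hits = [p.title for p in papers if matches_all_terms(p, "braf melanoma")]
print(hits)  # ['BRAF V600E in melanoma']
```

Because the check is a plain substring test, a short term such as `ras` also matches `KRAS`, which is worth keeping in mind when choosing query terms.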

--------------------------------------------------------------------------------
/src/biomcp/query_parser.py:
--------------------------------------------------------------------------------

```python
"""Query parser for unified search language in BioMCP."""

from dataclasses import dataclass
from enum import Enum
from typing import Any


class Operator(str, Enum):
    """Query operators."""

    EQ = ":"
    GT = ">"
    LT = "<"
    GTE = ">="
    LTE = "<="
    RANGE = ".."
    AND = "AND"
    OR = "OR"
    NOT = "NOT"


class FieldType(str, Enum):
    """Field data types."""

    STRING = "string"
    NUMBER = "number"
    DATE = "date"
    ENUM = "enum"
    BOOLEAN = "boolean"


@dataclass
class FieldDefinition:
    """Definition of a searchable field."""

    name: str
    domain: str  # "trials", "articles", "variants", "cross"
    type: FieldType
    operators: list[str]
    example_values: list[str]
    description: str
    underlying_api_field: str
    aliases: list[str] | None = None


@dataclass
class QueryTerm:
    """Parsed query term."""

    field: str
    operator: Operator
    value: Any
    domain: str | None = None
    is_negated: bool = False


@dataclass
class ParsedQuery:
    """Parsed query structure."""

    terms: list[QueryTerm]
    cross_domain_fields: dict[str, Any]
    domain_specific_fields: dict[str, dict[str, Any]]
    raw_query: str


class QueryParser:
    """Parser for unified search queries."""

    def __init__(self):
        self.field_registry = self._build_field_registry()

    def _build_field_registry(self) -> dict[str, FieldDefinition]:
        """Build the field registry with all searchable fields."""
        registry = {}

        # Cross-domain fields
        cross_domain_fields = [
            FieldDefinition(
                name="gene",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53", "EGFR"],
                description="Gene symbol",
                underlying_api_field="gene",
            ),
            FieldDefinition(
                name="variant",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["V600E", "L858R", "rs113488022"],
                description="Variant notation or rsID",
                underlying_api_field="variant",
            ),
            FieldDefinition(
                name="disease",
                domain="cross",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "lung cancer", "diabetes"],
                description="Disease or condition",
                underlying_api_field="disease",
            ),
        ]

        # Trial-specific fields
        trial_fields = [
            FieldDefinition(
                name="trials.condition",
                domain="trials",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "lung cancer"],
                description="Clinical trial condition",
                underlying_api_field="conditions",
            ),
            FieldDefinition(
                name="trials.intervention",
                domain="trials",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["osimertinib", "pembrolizumab"],
                description="Trial intervention",
                underlying_api_field="interventions",
            ),
            FieldDefinition(
                name="trials.phase",
                domain="trials",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["1", "2", "3", "4"],
                description="Trial phase",
                underlying_api_field="phase",
            ),
            FieldDefinition(
                name="trials.status",
                domain="trials",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["recruiting", "active", "completed"],
                description="Trial recruitment status",
                underlying_api_field="recruiting_status",
            ),
        ]

        # Article-specific fields
        article_fields = [
            FieldDefinition(
                name="articles.title",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["EGFR mutations", "cancer therapy"],
                description="Article title",
                underlying_api_field="title",
            ),
            FieldDefinition(
                name="articles.author",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Smith J", "Johnson A"],
                description="Article author",
                underlying_api_field="author",
            ),
            FieldDefinition(
                name="articles.journal",
                domain="articles",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Nature", "Science", "Cell"],
                description="Journal name",
                underlying_api_field="journal",
            ),
            FieldDefinition(
                name="articles.date",
                domain="articles",
                type=FieldType.DATE,
                operators=[Operator.GT, Operator.LT, Operator.RANGE],
                example_values=[">2023-01-01", "2023-01-01..2024-01-01"],
                description="Publication date",
                underlying_api_field="date",
            ),
        ]

        # Variant-specific fields
        variant_fields = [
            FieldDefinition(
                name="variants.rsid",
                domain="variants",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["rs113488022", "rs121913529"],
                description="dbSNP rsID",
                underlying_api_field="rsid",
            ),
            FieldDefinition(
                name="variants.gene",
                domain="variants",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53"],
                description="Gene containing variant",
                underlying_api_field="gene",
            ),
            FieldDefinition(
                name="variants.significance",
                domain="variants",
                type=FieldType.ENUM,
                operators=[Operator.EQ],
                example_values=["pathogenic", "benign", "uncertain"],
                description="Clinical significance",
                underlying_api_field="significance",
            ),
            FieldDefinition(
                name="variants.frequency",
                domain="variants",
                type=FieldType.NUMBER,
                operators=[Operator.LT, Operator.GT],
                example_values=["<0.01", ">0.05"],
                description="Population allele frequency",
                underlying_api_field="frequency",
            ),
        ]

        # Gene-specific fields
        gene_fields = [
            FieldDefinition(
                name="genes.symbol",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["BRAF", "TP53", "EGFR"],
                description="Gene symbol",
                underlying_api_field="symbol",
            ),
            FieldDefinition(
                name="genes.name",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=[
                    "tumor protein p53",
                    "epidermal growth factor receptor",
                ],
                description="Gene name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="genes.type",
                domain="genes",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["protein-coding", "pseudo", "ncRNA"],
                description="Gene type",
                underlying_api_field="type_of_gene",
            ),
        ]

        # Drug-specific fields
        drug_fields = [
            FieldDefinition(
                name="drugs.name",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["imatinib", "aspirin", "metformin"],
                description="Drug name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="drugs.tradename",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["Gleevec", "Tylenol", "Lipitor"],
                description="Drug trade name",
                underlying_api_field="tradename",
            ),
            FieldDefinition(
                name="drugs.indication",
                domain="drugs",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["leukemia", "hypertension", "diabetes"],
                description="Drug indication",
                underlying_api_field="indication",
            ),
        ]

        # Disease-specific fields
        disease_fields = [
            FieldDefinition(
                name="diseases.name",
                domain="diseases",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["melanoma", "breast cancer", "diabetes"],
                description="Disease name",
                underlying_api_field="name",
            ),
            FieldDefinition(
                name="diseases.mondo",
                domain="diseases",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["MONDO:0005105", "MONDO:0007254"],
                description="MONDO disease ID",
                underlying_api_field="mondo_id",
            ),
            FieldDefinition(
                name="diseases.synonym",
                domain="diseases",
                type=FieldType.STRING,
                operators=[Operator.EQ],
                example_values=["cancer", "tumor", "neoplasm"],
                description="Disease synonym",
                underlying_api_field="synonyms",
            ),
        ]

        # Build registry
        for field_list in [
            cross_domain_fields,
            trial_fields,
            article_fields,
            variant_fields,
            gene_fields,
            drug_fields,
            disease_fields,
        ]:
            for field in field_list:
                registry[field.name] = field

        return registry

    def parse(self, query: str) -> ParsedQuery:
        """Parse a unified search query."""
        # Simple tokenization - in production, use a proper parser
        terms = self._tokenize(query)
        parsed_terms = []

        cross_domain = {}
        domain_specific: dict[str, dict[str, Any]] = {
            "trials": {},
            "articles": {},
            "variants": {},
            "genes": {},
            "drugs": {},
            "diseases": {},
        }

        for term in terms:
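            if ":" in term:
                # Split on the first colon only. Comparison prefixes such as
                # ">" stay in the value, and the operator is recorded as EQ.
                field, value = term.split(":", 1)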

                # Check if it's a known field
                if field in self.field_registry:
                    field_def = self.field_registry[field]
                    parsed_term = QueryTerm(
                        field=field,
                        operator=Operator.EQ,
                        value=value.strip('"'),
                        domain=field_def.domain,
                    )
                    parsed_terms.append(parsed_term)

                    # Categorize the term
                    if field_def.domain == "cross":
                        cross_domain[field] = value.strip('"')
                    else:
                        domain = (
                            field.split(".")[0]
                            if "." in field
                            else field_def.domain
                        )
                        if domain not in domain_specific:
                            domain_specific[domain] = {}
                        field_name = (
                            field.split(".")[-1] if "." in field else field
                        )
                        domain_specific[domain][field_name] = value.strip('"')

        return ParsedQuery(
            terms=parsed_terms,
            cross_domain_fields=cross_domain,
            domain_specific_fields=domain_specific,
            raw_query=query,
        )

    def _tokenize(self, query: str) -> list[str]:
        """Simple tokenizer for query strings."""
        # This is a simplified tokenizer - in production, use a proper lexer
        # For now, split on spaces outside double quotes so that quoted
        # field:value pairs stay intact; AND/OR/NOT are dropped afterwards
        tokens = []
        current_token = ""
        in_quotes = False

        for char in query:
            if char == '"':
                in_quotes = not in_quotes
                current_token += char
            elif char == " " and not in_quotes:
                if current_token:
                    tokens.append(current_token)
                    current_token = ""
            else:
                current_token += char

        if current_token:
            tokens.append(current_token)

        # Filter out boolean operators for now
        return [t for t in tokens if t not in ["AND", "OR", "NOT"]]

    def get_schema(self) -> dict[str, Any]:
        """Get the complete field schema for discovery."""
        schema: dict[str, Any] = {
            "domains": [
                "trials",
                "articles",
                "variants",
                "genes",
                "drugs",
                "diseases",
            ],
            "cross_domain_fields": {},
            "domain_fields": {
                "trials": {},
                "articles": {},
                "variants": {},
                "genes": {},
                "drugs": {},
                "diseases": {},
            },
            "operators": [op.value for op in Operator],
            "examples": [
                "gene:BRAF AND trials.condition:melanoma",
                "articles.date:>2023 AND disease:cancer",
                "variants.significance:pathogenic AND gene:TP53",
                "genes.symbol:BRAF AND genes.type:protein-coding",
                "drugs.tradename:gleevec",
                "diseases.name:melanoma",
            ],
        }

        for field_name, field_def in self.field_registry.items():
            field_info = {
                "type": field_def.type.value,
                "operators": field_def.operators,
                "examples": field_def.example_values,
                "description": field_def.description,
            }

            if field_def.domain == "cross":
                schema["cross_domain_fields"][field_name] = field_info
            else:
                domain = field_name.split(".")[0]
                field_short_name = field_name.split(".")[-1]
                schema["domain_fields"][domain][field_short_name] = field_info

        return schema

```
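
To see what the tokenizer and the colon split in `QueryParser` do to a concrete query, the sketch below restates that logic as two standalone functions. The names `tokenize` and `split_terms` are illustrative, not part of the module, and the field-registry lookup is deliberately left out so the example runs without BioMCP installed.

```python
def tokenize(query: str) -> list[str]:
    """Split on spaces outside double quotes, then drop AND/OR/NOT."""
    tokens: list[str] = []
    current = ""
    in_quotes = False
    for char in query:
        if char == '"':
            in_quotes = not in_quotes
            current += char
        elif char == " " and not in_quotes:
            if current:
                tokens.append(current)
                current = ""
        else:
            current += char
    if current:
        tokens.append(current)
    return [t for t in tokens if t not in ("AND", "OR", "NOT")]


def split_terms(query: str) -> dict[str, str]:
    """Map each field to its value, splitting on the first colon only."""
    pairs: dict[str, str] = {}
    for token in tokenize(query):
        if ":" in token:
            field, value = token.split(":", 1)
            pairs[field] = value.strip('"')
    return pairs


query = 'gene:BRAF AND trials.condition:"lung cancer" NOT articles.date:>2023'
print(tokenize(query))
# ['gene:BRAF', 'trials.condition:"lung cancer"', 'articles.date:>2023']
print(split_terms(query))
# {'gene': 'BRAF', 'trials.condition': 'lung cancer', 'articles.date': '>2023'}
```

Two consequences are visible here: the `NOT` in front of `articles.date` is discarded rather than negating the term, and the `>` stays inside the value instead of selecting a comparison operator.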

--------------------------------------------------------------------------------
/src/biomcp/resources/instructions.md:
--------------------------------------------------------------------------------

```markdown
# BioMCP Instructions for the Biomedical Assistant

Welcome to **BioMCP** – your unified interface to access key biomedical data
sources. This document serves as an internal instruction set for the biomedical
assistant (LLM) to ensure a clear, well-reasoned, and accurate response to user
queries.

---

## CRITICAL: Always Use the 'think' Tool FIRST

**The 'think' tool is MANDATORY and must be your FIRST action when using BioMCP.**

🚨 **REQUIRED USAGE:**

- You MUST call 'think' BEFORE any search or fetch operations
- EVERY biomedical research query requires thinking first
- ALL multi-step analyses must begin with the think tool
- ANY task using BioMCP tools requires prior planning with think

⚠️ **WARNING:** Skipping the 'think' tool will result in:

- Incomplete analysis
- Poor search strategies
- Missing critical connections
- Suboptimal results

Start EVERY BioMCP interaction with the 'think' tool. Use it throughout your analysis to track progress. Only set nextThoughtNeeded=false when your analysis is complete.

---

## 1. Purpose of BioMCP

BioMCP (Biomedical Model Context Protocol) standardizes access to multiple
biomedical data sources. It transforms complex, filter-intensive queries into
natural language interactions. The assistant should leverage this capability
to:

- Integrate clinical trial data, literature, variant annotations, and
  comprehensive biomedical information from multiple resources.
- Synthesize the results into a coherent, accurate, and concise answer.
- Enhance user trust by providing key snippets and citations (with clickable
  URLs) from the original materials, unless the user opts to omit them.

---

## 2. Available Data Sources

BioMCP provides access to the following biomedical databases:

### Literature & Clinical Sources

- **PubMed/PubTator3**: Peer-reviewed biomedical literature with entity annotations
- **bioRxiv/medRxiv**: Preprint servers (included by default in article searches)
- **Europe PMC**: Additional literature including preprints
- **ClinicalTrials.gov**: Clinical trial registry with comprehensive trial data

### BioThings Suite APIs

- **MyVariant.info**: Genetic variant annotations and population frequencies
- **MyGene.info**: Real-time gene information, aliases, and summaries
- **MyDisease.info**: Disease ontology, definitions, and synonym expansion
- **MyChem.info**: Drug/chemical properties, mechanisms, and identifiers

### Cancer & Genomic Resources

- **cBioPortal**: Cancer genomics data (automatically integrated with gene searches)
- **TCGA/GDC**: The Cancer Genome Atlas data for variants
- **1000 Genomes**: Population frequency data via Ensembl

---

## 3. Internal Workflow for Query Handling

When a user query is received (for example, "Please investigate ALK
rearrangements in advanced NSCLC..."), the assistant should follow these steps:

### A. ALWAYS Start with the 'think' Tool

- **Use 'think' immediately:** For ANY biomedical research query, you MUST begin by invoking the 'think' tool to break down the problem systematically.
- **Initial thought should:** Parse the user's natural language query and extract relevant details such as gene variants (e.g., ALK rearrangements), disease type (advanced NSCLC), and treatment focus (combinations of ALK inhibitors with immunotherapy).
- **Continue thinking:** Use additional 'think' calls to plan your approach, identify data sources needed, and track your analysis progress.

### B. Plan and Explain the Tool Sequence (via the 'think' Tool)

- **Use 'think' to plan:** Continue using the 'think' tool to outline your reasoning and planned tool sequence:
  - **Step 1:** Use gene_getter to understand ALK gene function and context.
  - **Step 2:** Use disease_getter to get comprehensive information about NSCLC,
    including synonyms for better search coverage.
  - **Step 3:** Use ClinicalTrials.gov to retrieve clinical trial data
    related to the query (disease synonyms are automatically expanded).
  - **Step 4:** Use PubMed (via PubTator3) to fetch relevant literature
    discussing outcomes or synergy. Note: Preprints from bioRxiv/medRxiv
    are included by default, and cBioPortal cancer genomics data is
    automatically integrated for gene-based searches.
  - **Step 5:** Query MyVariant.info for variant annotations (noting
    limitations for gene fusions if applicable).
  - **Step 6:** If specific drugs are mentioned, use drug_getter for
    mechanism of action and properties.
- **Transparency:** Clearly indicate which tool is being called for which part
  of the query.

#### Search Syntax Enhancement: OR Logic for Keywords

When searching articles, the keywords parameter now supports OR logic using the pipe (|) separator:

**Syntax**: `keyword1|keyword2|keyword3`

**Examples**:

- `"R173|Arg173|p.R173"` - Finds articles mentioning any of these variant notations
- `"V600E|p.V600E|c.1799T>A"` - Handles different mutation nomenclatures
- `"immunotherapy|checkpoint inhibitor|PD-1"` - Searches for related treatment terms
- `"NSCLC|non-small cell lung cancer"` - Covers abbreviations and full terms

**Important Notes**:

- OR logic only applies within a single keyword parameter
- Multiple keywords are still combined with AND logic
- Example: keywords=["BRAF|B-RAF", "therapy|treatment"] means:
  - (BRAF OR B-RAF) AND (therapy OR treatment)

This feature is particularly useful for:

- Handling different nomenclatures for the same concept
- Searching for synonyms or related terms
- Dealing with abbreviations and full names
- Finding articles that use different notations for variants
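
The semantics described above can be sketched in a few lines of Python. This is an illustration of the AND-of-ORs rule only, not BioMCP's implementation: `keyword_groups` and `text_matches` are hypothetical helpers, and the case-insensitive substring matching is an assumption made for the example.

```python
def keyword_groups(keywords: list[str]) -> list[list[str]]:
    """Split each keyword on '|' into its OR-alternatives."""
    return [
        [alt.strip() for alt in keyword.split("|") if alt.strip()]
        for keyword in keywords
    ]


def text_matches(text: str, keywords: list[str]) -> bool:
    """AND across keywords, OR within a keyword (case-insensitive substring)."""
    lowered = text.lower()
    return all(
        any(alt.lower() in lowered for alt in group)
        for group in keyword_groups(keywords)
    )


keywords = ["BRAF|B-RAF", "therapy|treatment"]
print(keyword_groups(keywords))
# [['BRAF', 'B-RAF'], ['therapy', 'treatment']]
print(text_matches("B-RAF inhibitor treatment outcomes", keywords))  # True
print(text_matches("BRAF mutation prevalence", keywords))  # False
```

The second text fails because it satisfies the first group (BRAF) but neither alternative of the second group.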

### C. Execute and Synthesize Results

- **Combine Data:** After retrieving results from each tool, synthesize the
  information into a final answer.
- **Include Citations with URLs:** Always include clickable URLs from the
  original sources in your citations. Extract URLs (Pubmed_Url, Doi_Url,
  Study_Url, etc.) from function results and incorporate these into your
  response when referencing specific findings or papers.
- **Follow-up Opportunity:** If the response leaves any ambiguity or if
  additional information might be helpful, prompt the user for follow-up
  questions.

---

## 3. Best Practices for the Biomedical Assistant

- **Understanding the Query:** Focus on accurately interpreting the user's
  query, rather than instructing the user on query formulation.
- **Reasoning Transparency:** Briefly explain your thought process and the
  sequence of tool calls before presenting the final answer.
- **Conciseness and Clarity:** Ensure your final response is succinct and
  well-organized, using bullet points or sections as needed.
- **Citation Inclusion Mandatory:** Provide key snippets and links to the
  original materials (e.g., clinical trial records, PubMed articles, ClinVar
  entries, COSMIC database) to support the answer. ALWAYS include clickable
  URLs to these resources when referencing specific findings or data.
- **User Follow-up Questions Before Startup:** If anything is unclear in the
  user's query or if more details would improve the answer, politely request
  additional clarification.
- **Audience Awareness:** Structure your response with both depth for
  specialists and clarity for general audiences. Begin with accessible
  explanations before delving into scientific details.
- **Organization and Clarity:** Ensure your final response is well-structured,
  accessible, and easy to navigate by:
  - Using descriptive section headings and subheadings to organize
    information logically
  - Employing consistent formatting with bulleted or numbered lists to break
    down complex information
  - Starting each major section with a plain-language summary before
    exploring technical details
  - Creating clear visual separation between different topics
  - Using concise sentence structures while maintaining informational depth
  - Explicitly differentiating between established practices and experimental
    approaches
  - Including brief transition sentences between major sections
  - Presenting clinical trial data in consistent formats
  - Using strategic white space to improve readability
  - Summarizing key takeaways at the end of major sections when appropriate

---

## 4. Visual Organization and Formatting

- **Comparison Tables:** When comparing two or more entities (like mutation
  classes, treatment approaches, or clinical trials), create a comparison table
  to highlight key differences at a glance. Tables should have clear headers,
  consistent formatting, and focus on the most important distinguishing
  features.
- **Format Optimization:** Utilize formatting elements strategically - tables
  for comparisons, bullet points for lists, headings for section organization,
  and whitespace for readability.
- **Visual Hierarchy:** For complex biomedical topics, create a visual
  hierarchy that helps readers quickly identify key information.
- **Balance Between Comprehensiveness and Clarity:** While providing
  comprehensive information, prioritize clarity and accessibility. Organize
  content from most important/general to more specialized details.
- **Section Summaries:** Conclude sections with key takeaways that highlight
  the practical implications of the scientific information.

---

## 5. Example Scenario: ALK Rearrangements in Advanced NSCLC

### Example 1: ALK Rearrangements in Advanced NSCLC

For a query such as:

```
Please investigate ALK rearrangements in advanced NSCLC, particularly any
clinical trials exploring combinations of ALK inhibitors and immunotherapy.
```

The assistant should:

1. **Start with the 'think' Tool:**
   - Invoke 'think' with thoughtNumber=1 to understand the query focus on ALK rearrangements in advanced NSCLC with combination treatments
   - Use thoughtNumber=2 to plan the research approach and identify needed data sources
2. **Execute Tool Calls (tracking with 'think'):**
   - **First:** Use gene_getter("ALK") to understand the gene's function and role in cancer (document findings in thoughtNumber=3)
   - **Second:** Use disease_getter("NSCLC") to get disease information and synonyms like "non-small cell lung cancer" (document in thoughtNumber=4)
   - **Third:** Query ClinicalTrials.gov for ALK+ NSCLC trials that combine ALK inhibitors with immunotherapy (document findings in thoughtNumber=5)
   - **Fourth:** Query PubMed to retrieve key articles discussing treatment outcomes or synergy (document in thoughtNumber=6)
   - **Fifth:** Check MyVariant.info for any annotations on ALK fusions or rearrangements (document in thoughtNumber=7)
   - **Sixth:** If specific ALK inhibitors are mentioned, use drug_getter to understand their mechanisms (document in thoughtNumber=8)
3. **Synthesize and Report (via 'think'):** Use final thoughts to synthesize findings before producing the answer that includes:
   - A concise summary of clinical trials with comparison tables like:

| **Trial**        | **Combination**        | **Patient Population**         | **Results** | **Safety Profile**                              | **Reference**                                                    |
| ---------------- | ---------------------- | ------------------------------ | ----------- | ----------------------------------------------- | ---------------------------------------------------------------- |
| CheckMate 370    | Crizotinib + Nivolumab | 13 treatment-naive ALK+ NSCLC  | 38% ORR     | 5/13 with grade ≥3 hepatic toxicities; 2 deaths | [Schenk et al., 2023](https://pubmed.ncbi.nlm.nih.gov/36895933/) |
| JAVELIN Lung 101 | Avelumab + Lorlatinib  | 28 previously treated patients | 46.4% ORR   | No DLTs; milder toxicity                        | [NCT02584634](https://clinicaltrials.gov/study/NCT02584634)      |

    - Key literature findings with proper citations:
      "A review by Schenk concluded that combining ALK inhibitors with checkpoint inhibitors resulted in 'significant toxicities without clear improvement in patient outcomes' ([Schenk et al., 2023](https://pubmed.ncbi.nlm.nih.gov/36895933/))."

    - Tables comparing response rates:

| **Study**             | **Patient Population** | **Immunotherapy Agent**       | **Response Rate** | **Reference**                                                 |
| --------------------- | ---------------------- | ----------------------------- | ----------------- | ------------------------------------------------------------- |
| ATLANTIC Trial        | 11 ALK+ NSCLC          | Durvalumab                    | 0%                | [Link to study](https://pubmed.ncbi.nlm.nih.gov/36895933/)    |
| IMMUNOTARGET Registry | 19 ALK+ NSCLC          | Various PD-1/PD-L1 inhibitors | 0%                | [Link to registry](https://pubmed.ncbi.nlm.nih.gov/36895933/) |

    - Variant information with proper attribution.

4. **Offer Follow-up:** Conclude by asking if further details are needed or if
   any part of the answer should be clarified.

### Example 2: BRAF Mutation Classes in Cancer Therapeutics

For a query such as:

```
Please investigate the differences in BRAF Class I (e.g., V600E) and Class III
(e.g., D594G) mutations that lead to different therapeutic strategies in cancers
like melanoma or colorectal carcinoma.
```

The assistant should:

1. **Understand and Clarify:** Identify that the query focuses on comparing two
   specific BRAF mutation classes (Class I/V600E vs. Class III/D594G) and their
   therapeutic implications in melanoma and colorectal cancer.

2. **Plan Tool Calls:**

   - **First:** Search PubMed literature to understand the molecular
     differences between BRAF Class I and Class III mutations.
   - **Second:** Explore specific variant details using the variant search
     tool to understand the characteristics of these mutations.
   - **Third:** Look for clinical trials involving these mutation types to
     identify therapeutic strategies.

3. **Synthesize and Report:** Create a comprehensive comparison that includes:
   - Comparison tables highlighting key differences between mutation classes:

| Feature                      | Class I (e.g., V600E)          | Class III (e.g., D594G)                    |
| ---------------------------- | ------------------------------ | ------------------------------------------ |
| **Signaling Mechanism**      | Constitutively active monomers | Kinase-impaired heterodimers               |
| **RAS Dependency**           | RAS-independent                | RAS-dependent                              |
| **Dimerization Requirement** | Function as monomers           | Require heterodimerization with CRAF       |
| **Therapeutic Response**     | Responsive to BRAF inhibitors  | Paradoxically activated by BRAF inhibitors |

    - Specific therapeutic strategies with clickable citation links:
        - For Class I: BRAF inhibitors as demonstrated
          in [Davies et al.](https://pubmed.ncbi.nlm.nih.gov/35869122/)
        - For Class III: Alternative approaches such as MEK inhibitors shown
          in [Śmiech et al.](https://pubmed.ncbi.nlm.nih.gov/33198372/)

    - Cancer-specific implications with relevant clinical evidence:
        - Melanoma treatment differences including clinical trial data
          from [NCT05767879](https://clinicaltrials.gov/study/NCT05767879)
        - Colorectal cancer approaches citing research
          from [Liu et al.](https://pubmed.ncbi.nlm.nih.gov/37760573/)

4. **Offer Follow-up:** Conclude by asking if the user would like more detailed
   information on specific aspects, such as resistance mechanisms, emerging
   therapies, or mutation detection methods.

```

--------------------------------------------------------------------------------
/docs/tutorials/openfda-prompts.md:
--------------------------------------------------------------------------------

```markdown
# OpenFDA Example Prompts for AI Agents

This document provides example prompts that demonstrate effective use of BioMCP's OpenFDA integration for various precision oncology use cases.

## Drug Safety Assessment

### Basic Safety Profile

```
What are the most common adverse events reported for pembrolizumab?
Include both serious and non-serious events.
```

**Expected BioMCP Usage:**

1. `think` - Plan safety assessment approach
2. `openfda_adverse_searcher(drug="pembrolizumab", limit=50)`
3. Analyze and summarize top reactions
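
Step 3 can be sketched in a few lines. This is a minimal illustration, assuming the search results have already been reduced to one list of reaction terms per report. The helper `top_reactions` and the placeholder terms are hypothetical and are not part of BioMCP.

```python
from collections import Counter


def top_reactions(reports, n=5):
    """Return the n reaction terms mentioned in the most reports.

    `reports` is assumed to be a list of reports, each given as a list of
    reaction terms (a hypothetical shape, not BioMCP's actual output).
    """
    counts = Counter()
    for reactions in reports:
        # Count each term once per report so repeats inside one report
        # do not inflate its frequency.
        counts.update(set(reactions))
    return counts.most_common(n)


# Placeholder terms, not real FDA data
sample = [["term_a", "term_b"], ["term_a"], ["term_c", "term_a", "term_b"]]
print(top_reactions(sample, n=2))
```

Counting per report rather than per mention keeps one verbose report from dominating the summary.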

### Comparative Safety Analysis

```
Compare the adverse event profiles of imatinib and dasatinib for CML treatment.
Focus on serious events and their frequencies.
```

**Expected BioMCP Usage:**

1. `think` - Plan comparative analysis
2. `openfda_adverse_searcher(drug="imatinib", serious=True)`
3. `openfda_adverse_searcher(drug="dasatinib", serious=True)`
4. Compare and contrast findings

### Drug Interaction Investigation

```
A patient on warfarin needs to start erlotinib for NSCLC. What drug interactions
and adverse events should we monitor based on FDA data?
```

**Expected BioMCP Usage:**

1. `think` - Consider interaction risks
2. `openfda_label_searcher(name="erlotinib")` - Check drug interactions section
3. `openfda_adverse_searcher(drug="erlotinib", reaction="bleeding")`
4. `openfda_adverse_searcher(drug="erlotinib", reaction="INR")`

## Treatment Planning

### Indication Verification

```
Is trastuzumab deruxtecan FDA-approved for HER2-low breast cancer?
What are the specific approved indications?
```

**Expected BioMCP Usage:**

1. `think` - Plan indication search
2. `openfda_label_searcher(name="trastuzumab deruxtecan")`
3. `openfda_label_getter(set_id="...")` - Get full indications section
4. Extract and summarize approved uses

### Contraindication Screening

```
Patient has severe hepatic impairment. Which targeted therapy drugs for
melanoma have contraindications or warnings for liver dysfunction?
```

**Expected BioMCP Usage:**

1. `think` - Identify melanoma drugs to check
2. `openfda_label_searcher(indication="melanoma")`
3. For each drug: `openfda_label_getter(set_id="...", sections=["contraindications", "warnings_and_precautions"])`
4. Summarize liver-related contraindications

### Dosing Guidelines

```
What is the FDA-recommended dosing for osimertinib in EGFR-mutated NSCLC,
including dose modifications for adverse events?
```

**Expected BioMCP Usage:**

1. `think` - Plan dosing information retrieval
2. `openfda_label_searcher(name="osimertinib")`
3. `openfda_label_getter(set_id="...", sections=["dosage_and_administration", "dose_modifications"])`
4. Extract dosing guidelines

## Device Reliability Assessment

### Genomic Test Reliability

```
What adverse events have been reported for NGS-based cancer diagnostic devices?
Show me any false positive or accuracy issues.
```

**Expected BioMCP Usage:**

1. `think` - Consider test reliability factors
2. `openfda_device_searcher(genomics_only=True, limit=25)` - Get all genomic device events
3. `openfda_device_searcher(problem="false positive", genomics_only=True)`
4. `openfda_device_searcher(problem="accuracy", genomics_only=True)`
5. For significant events: `openfda_device_getter(mdr_report_key="...")`

**Note:** The FDA database uses abbreviated names (e.g., "F1CDX" instead of "FoundationOne CDx").
For specific devices, try: `openfda_device_searcher(device="F1CDX")` or search by key terms.

### Laboratory Equipment Issues

```
Our lab uses Illumina sequencers. What device malfunctions have been
reported that could impact our genomic testing workflow?
```

**Expected BioMCP Usage:**

1. `think` - Assess potential workflow impacts
2. `openfda_device_searcher(manufacturer="Illumina", genomics_only=True)`
3. Analyze problem patterns
4. `openfda_device_getter(mdr_report_key="...")` for critical issues

## Comprehensive Drug Evaluation

### New Drug Assessment

```
Provide a comprehensive safety and efficacy profile for sotorasib (Lumakras)
including FDA approval, indications, major warnings, and post-market adverse events.
```

**Expected BioMCP Usage:**

1. `think` - Plan comprehensive assessment
2. `drug_getter("sotorasib")` - Basic drug info
3. `openfda_label_searcher(name="sotorasib")`
4. `openfda_label_getter(set_id="...")` - Full label
5. `openfda_adverse_searcher(drug="sotorasib", serious=True)`
6. `trial_searcher(interventions=["sotorasib"])` - Ongoing trials

### Risk-Benefit Analysis

```
For a 75-year-old patient with metastatic melanoma, analyze the risk-benefit
profile of nivolumab plus ipilimumab combination therapy based on FDA data.
```

**Expected BioMCP Usage:**

1. `think` - Structure risk-benefit analysis
2. `openfda_label_searcher(name="nivolumab")`
3. `openfda_label_searcher(name="ipilimumab")`
4. `openfda_label_getter(set_id="...", sections=["geriatric_use", "warnings_and_precautions"])`
5. `openfda_adverse_searcher(drug="nivolumab", serious=True)`
6. `openfda_adverse_searcher(drug="ipilimumab", serious=True)`

## Special Populations

### Pregnancy Considerations

```
Which FDA-approved lung cancer treatments have pregnancy category data
or specific warnings for pregnant patients?
```

**Expected BioMCP Usage:**

1. `think` - Plan pregnancy safety search
2. `openfda_label_searcher(indication="lung cancer")`
3. For each drug: `openfda_label_getter(set_id="...", sections=["pregnancy", "use_in_specific_populations"])`
4. Compile pregnancy categories and warnings

### Pediatric Oncology

```
What FDA-approved indications and safety data exist for using
checkpoint inhibitors in pediatric cancer patients?
```

**Expected BioMCP Usage:**

1. `think` - Identify checkpoint inhibitors
2. `openfda_label_searcher(name="pembrolizumab")`
3. `openfda_label_getter(set_id="...", sections=["pediatric_use"])`
4. `openfda_adverse_searcher(drug="pembrolizumab")` - Filter for pediatric if possible
5. Repeat for other checkpoint inhibitors

## Complex Queries

### Multi-Drug Regimen Safety

```
Analyze potential safety concerns for the FOLFOX chemotherapy regimen
(5-FU, leucovorin, oxaliplatin) based on FDA adverse event data.
```

**Expected BioMCP Usage:**

1. `think` - Plan multi-drug analysis
2. `openfda_adverse_searcher(drug="fluorouracil")`
3. `openfda_adverse_searcher(drug="leucovorin")`
4. `openfda_adverse_searcher(drug="oxaliplatin")`
5. Identify overlapping toxicities
6. `openfda_label_searcher(name="oxaliplatin")` - Check for combination warnings
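
The "identify overlapping toxicities" step amounts to a set intersection across the drugs in the regimen. The sketch below assumes the reaction terms for each drug have already been collected into a dictionary. `overlapping_reactions` and the placeholder names are hypothetical, not BioMCP functions or real FDA data.

```python
def overlapping_reactions(reactions_by_drug):
    """Return the reaction terms reported for every drug, sorted alphabetically.

    `reactions_by_drug` maps a drug name to an iterable of reaction terms
    (an assumed shape for this illustration).
    """
    term_sets = [set(terms) for terms in reactions_by_drug.values()]
    if not term_sets:
        return []
    # Keep only the terms that appear in every drug's set
    return sorted(set.intersection(*term_sets))


# Placeholder drugs and terms
regimen = {
    "drug_1": ["term_a", "term_b"],
    "drug_2": ["term_b", "term_c"],
    "drug_3": ["term_b"],
}
print(overlapping_reactions(regimen))
```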

### Biomarker-Driven Treatment Selection

```
For a patient with BRAF V600E mutant melanoma with brain metastases,
what FDA-approved treatments are available and what are their CNS-specific
efficacy and safety considerations?
```

**Expected BioMCP Usage:**

1. `think` - Structure biomarker-driven search
2. `article_searcher(genes=["BRAF"], variants=["V600E"], diseases=["melanoma"])`
3. `openfda_label_searcher(indication="melanoma")`
4. For BRAF inhibitors: `openfda_label_getter(set_id="...", sections=["clinical_studies", "warnings_and_precautions"])`
5. `openfda_adverse_searcher(drug="dabrafenib", reaction="seizure")`
6. `openfda_adverse_searcher(drug="vemurafenib", reaction="brain")`

### Treatment Failure Analysis

```
A patient's lung adenocarcinoma progressed on osimertinib. Based on FDA data,
what are the documented resistance mechanisms and alternative approved treatments?
```

**Expected BioMCP Usage:**

1. `think` - Analyze resistance and alternatives
2. `openfda_label_getter(set_id="...", sections=["clinical_studies"])` for osimertinib
3. `article_searcher(genes=["EGFR"], keywords=["resistance", "osimertinib"])`
4. `openfda_label_searcher(indication="non-small cell lung cancer")`
5. `trial_searcher(conditions=["NSCLC"], keywords=["osimertinib resistant"])`

## Safety Monitoring

### Post-Market Surveillance

```
Have there been any new safety signals for CDK4/6 inhibitors
(palbociclib, ribociclib, abemaciclib) in the past year?
```

**Expected BioMCP Usage:**

1. `think` - Plan safety signal detection
2. `openfda_adverse_searcher(drug="palbociclib", limit=100)`
3. `openfda_adverse_searcher(drug="ribociclib", limit=100)`
4. `openfda_adverse_searcher(drug="abemaciclib", limit=100)`
5. Analyze for unusual patterns or frequencies

### Rare Adverse Event Investigation

```
Investigate reports of pneumonitis associated with immune checkpoint inhibitors.
Which drugs have the highest frequency and what are the typical outcomes?
```

**Expected BioMCP Usage:**

1. `think` - Structure pneumonitis investigation
2. `openfda_adverse_searcher(drug="pembrolizumab", reaction="pneumonitis")`
3. `openfda_adverse_searcher(drug="nivolumab", reaction="pneumonitis")`
4. `openfda_adverse_searcher(drug="atezolizumab", reaction="pneumonitis")`
5. Compare frequencies and outcomes
6. `openfda_adverse_getter(report_id="...")` for severe cases

## Quality Assurance

### Diagnostic Test Validation

```
What quality issues have been reported for liquid biopsy ctDNA tests
that could affect treatment decisions?
```

**Expected BioMCP Usage:**

1. `think` - Identify quality factors
2. `openfda_device_searcher(device="liquid biopsy", genomics_only=True)`
3. `openfda_device_searcher(device="ctDNA", genomics_only=True)`
4. `openfda_device_searcher(device="circulating tumor", genomics_only=True)`
5. Analyze failure modes

## Tips for Effective Prompts

1. **Be specific about the data needed**: Specify if you want adverse events, labels, or device data
2. **Include relevant filters**: Mention if focusing on serious events, specific populations, or genomic devices
3. **Request appropriate analysis**: Ask for comparisons, trends, or specific data points
4. **Consider multiple data sources**: Combine OpenFDA with literature and trial data for comprehensive answers
5. **Include time frames when relevant**: Though OpenFDA doesn't support date filtering in queries, you can ask for analysis of recent reports
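
For tip 5, "recent" has to be applied after retrieval. The following is a rough sketch of such a client-side filter, assuming each report is a dictionary carrying a `receivedate` string in `YYYYMMDD` form. That field name and shape are assumptions about the data, and `recent_reports` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta


def recent_reports(reports, days=365, today=None):
    """Keep reports whose `receivedate` falls within the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    kept = []
    for report in reports:
        raw = report.get("receivedate")
        if not raw:
            continue  # no date available: skip rather than guess
        try:
            received = datetime.strptime(raw, "%Y%m%d").date()
        except ValueError:
            continue  # malformed date: skip
        if cutoff <= received <= today:
            kept.append(report)
    return kept
```

Passing `today` explicitly makes the filter reproducible when the same analysis is rerun later.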

## Integration Examples

### Combining with Literature Search

```
Find FDA adverse events for venetoclax in CLL, then search for published
case reports that provide more clinical context for the most serious events.
```

### Combining with Clinical Trials

```
What adverse events are reported for FDA-approved CAR-T therapies, and how
do these compare to adverse events being monitored in current clinical trials?
```

### Combining with Variant Data

```
For patients with RET fusion-positive cancers, what FDA-approved targeted
therapies are available and what are their mutation-specific response rates?
```

## Using Your OpenFDA API Key

The OpenFDA API has rate limits: 40 requests/minute without a key, or 240 requests/minute with a key. You can get a free API key at https://open.fda.gov/apis/authentication/
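
To make those limits concrete, the sketch below converts them into a minimum delay between requests and paces calls accordingly. It is an illustrative client-side helper under the limits stated above, not part of BioMCP. `RequestPacer` and `min_interval_seconds` are hypothetical names.

```python
import time

# Per-minute limits as stated above: without a key / with a key
REQUESTS_PER_MINUTE = {False: 40, True: 240}


def min_interval_seconds(has_api_key):
    """Smallest delay between requests that stays within the per-minute limit."""
    return 60.0 / REQUESTS_PER_MINUTE[bool(has_api_key)]


class RequestPacer:
    """Sleep just long enough before each request to respect the interval."""

    def __init__(self, has_api_key, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(has_api_key)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `pacer.wait()` before each request spaces calls 1.5 seconds apart without a key and 0.25 seconds apart with one.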

### Method 1: Include API Key in Your Prompt

You can provide your API key directly in your conversation with the AI:

```
My OpenFDA API key is: YOUR_API_KEY_HERE

Search for all serious adverse events reported for pembrolizumab in the last year.
Include both death and hospitalization events.
```

```
Using my OpenFDA API key YOUR_API_KEY_HERE, compare the safety profiles of
all FDA-approved BRAF inhibitors for melanoma treatment.
```

```
I have an OpenFDA API key: YOUR_API_KEY_HERE

Analyze device malfunction reports for all NGS-based cancer diagnostic tests.
Focus on false positive and false negative results that could impact treatment decisions.
```

### Method 2: Set Environment Variable (Recommended for Regular Use)

If you're using BioMCP regularly, set the environment variable once:

```bash
export OPENFDA_API_KEY="YOUR_API_KEY_HERE"
```

Then your prompts don't need to include the key:

```
Retrieve the complete FDA label for trastuzumab deruxtecan, including all
warnings and contraindications for HER2-low breast cancer patients.
```

### Method 3: CLI Usage with API Key

When using the BioMCP CLI directly:

```bash
# Pass API key as parameter
biomcp openfda adverse search --drug pembrolizumab --serious --api-key YOUR_API_KEY_HERE

# Or set environment variable first
export OPENFDA_API_KEY="YOUR_API_KEY_HERE"
biomcp openfda device search --manufacturer Illumina --genomics-only
```

### Example Prompts with API Key

#### Comprehensive Drug Safety Analysis

```
My OpenFDA API key is: YOUR_API_KEY_HERE

Perform a comprehensive safety analysis of sotorasib (Lumakras) including:
1. All serious adverse events from post-market surveillance
2. Complete FDA label with all sections
3. Any device issues reported for its companion diagnostic, if it has one
4. Compare its safety profile to other KRAS G12C inhibitors if available

This is for a clinical review, so I need detailed data from all available FDA sources.
```

#### Large-Scale Adverse Event Analysis

```
Using my OpenFDA API key YOUR_API_KEY_HERE, analyze adverse events for all
FDA-approved checkpoint inhibitors (pembrolizumab, nivolumab, atezolizumab,
durvalumab, avelumab, cemiplimab).

For each drug:
- Get the top 20 most frequent adverse events
- Identify immune-related adverse events
- Check for any black box warnings in their labels
- Note any fatal events

This requires many API calls, so please use my API key for higher rate limits.
```

#### Multi-Device Comparison

```
I have an OpenFDA API key: YOUR_API_KEY_HERE

Compare all FDA adverse event reports for NGS-based companion diagnostic devices
from major manufacturers (Foundation Medicine, Guardant Health, Tempus, etc.).
Focus on:
- Test failure rates
- Sample quality issues
- False positive/negative reports
- Software-related problems

This analysis requires querying multiple device records, so the API key will help
avoid rate limiting.
```

#### Batch Label Retrieval

```
My OpenFDA API key is YOUR_API_KEY_HERE.

Retrieve the complete FDA labels for all CDK4/6 inhibitors (palbociclib,
ribociclib, abemaciclib) and extract:
- Approved indications
- Dose modifications for adverse events
- Drug-drug interactions
- Special population considerations

Then create a comparison table of their safety profiles and dosing guidelines.
```

### When to Provide an API Key

You should provide your API key when:

1. **Performing large-scale analyses** requiring many API calls
2. **Conducting comprehensive safety reviews** across multiple drugs/devices
3. **Running batch operations** like comparing multiple products
4. **Doing rapid iterative searches** that might hit rate limits
5. **Performing systematic reviews** requiring extensive data retrieval

### API Key Security Notes

- Never share your actual API key in public forums or repositories
- The AI will use your key only for the current session
- Keys passed as parameters override environment variables
- The FDA API key is free and can be regenerated if compromised
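
The precedence rule above (an explicit parameter wins over the environment variable) can be pictured as follows. This is a sketch of the rule only, not BioMCP's actual implementation. `resolve_api_key` and `mask_key` are hypothetical helpers.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else OPENFDA_API_KEY from the environment, else None."""
    env = os.environ if env is None else env
    if explicit_key:
        return explicit_key
    return env.get("OPENFDA_API_KEY") or None


def mask_key(key):
    """Hide all but the last four characters so a key can appear in logs safely."""
    if not key:
        return "<none>"
    # Very short keys are masked completely
    visible = key[-4:] if len(key) > 8 else ""
    return "*" * (len(key) - len(visible)) + visible
```

Logging `mask_key(resolve_api_key(...))` instead of the raw value is one way to confirm which key is in use without exposing it.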

## Important Notes

- Always expect the AI to use the `think` tool first for complex queries
- The AI should include appropriate disclaimers about adverse events not proving causation
- Results are limited by FDA's data availability and reporting patterns
- The AI should suggest when additional data sources might provide complementary information
- With an API key, you can make 240 requests/minute vs 40 without

## Known Limitations

### Drug Shortage Data

**Important:** The FDA does not currently provide a machine-readable API for drug shortage data. The shortage search tools will return an informative message directing users to the FDA's web-based shortage database. This is a limitation of FDA's current data infrastructure, not a bug in BioMCP.

Alternative resources for drug shortage information:

- FDA Drug Shortages Database: https://www.accessdata.fda.gov/scripts/drugshortages/
- ASHP Drug Shortages: https://www.ashp.org/drug-shortages/current-shortages

### Other Limitations

- Device adverse event reports use abbreviated device names (e.g., "F1CDX" instead of "FoundationOne CDx")
- Adverse event reports represent voluntary submissions and may not reflect true incidence rates
- Recall information may have a delay of 24-48 hours from initial FDA announcement

```

--------------------------------------------------------------------------------
/docs/tutorials/pydantic-ai-integration.md:
--------------------------------------------------------------------------------

```markdown
# Pydantic AI Integration Guide

This guide explains how to integrate BioMCP with Pydantic AI for building biomedical AI agents.

## Server Modes and Endpoints

BioMCP supports two primary transport modes for Pydantic AI integration:

### Available Transport Modes

| Mode              | Endpoints                  | Pydantic AI Client        | Use Case                        |
| ----------------- | -------------------------- | ------------------------- | ------------------------------- |
| `stdio`           | N/A (subprocess)           | `MCPServerStdio`          | Local development, testing      |
| `streamable_http` | `POST /mcp`, `GET /health` | `MCPServerStreamableHTTP` | Production HTTP deployments     |
| `worker`          | `POST /mcp`, `GET /health` | `MCPServerStreamableHTTP` | HTTP mode using streamable HTTP |

Both `streamable_http` and `worker` modes now use FastMCP's native streamable HTTP implementation for full MCP protocol compliance. The SSE-based transport has been deprecated.

## Working Examples for Pydantic AI

Here are the recommended configurations for connecting Pydantic AI to BioMCP:

### 1. STDIO Mode (Recommended for Local Development)

This mode runs BioMCP as a subprocess without needing an HTTP server:

```python
import asyncio
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

async def main():
    # Run BioMCP as a subprocess
    server = MCPServerStdio(
        "python",
        args=["-m", "biomcp", "run", "--mode", "stdio"]
    )

    # Use a real LLM model (requires API key)
    model = "openai:gpt-4o-mini"  # Set OPENAI_API_KEY environment variable

    agent = Agent(model, toolsets=[server])

    async with agent:
        # Example query that returns real results
        result = await agent.run(
            "Find articles about BRAF V600E mutations in melanoma"
        )
        print(result.output)

if __name__ == "__main__":
    asyncio.run(main())
```

### 2. Streamable HTTP Mode (Recommended for Production)

For production deployments with proper MCP compliance (requires pydantic-ai>=0.6.9):

```python
import asyncio
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStreamableHTTP

async def main():
    # Connect to the /mcp endpoint
    server = MCPServerStreamableHTTP("http://localhost:8000/mcp")

    # Use a real LLM model (requires API key)
    # Options: openai:gpt-4o-mini, anthropic:claude-3-haiku-20240307, groq:llama-3.1-70b-versatile
    model = "openai:gpt-4o-mini"  # Set OPENAI_API_KEY environment variable

    agent = Agent(model, toolsets=[server])

    async with agent:
        # Example queries that return real results
        result = await agent.run(
            "Find recent articles about BRAF V600E in melanoma"
        )
        print(result.output)

if __name__ == "__main__":
    asyncio.run(main())
```

To run the server for this mode:

```bash
# Using streamable_http mode (recommended)
biomcp run --mode streamable_http --host 0.0.0.0 --port 8000

# Or using worker mode (also uses streamable HTTP)
biomcp run --mode worker --host 0.0.0.0 --port 8000

# Or using Docker
docker run -p 8000:8000 genomoncology/biomcp:latest biomcp run --mode streamable_http
```

### 3. Direct JSON-RPC Mode (Alternative HTTP)

You can also use the JSON-RPC endpoint at the root path:

```python
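import asyncio

import httpx


async def call_biomcp_jsonrpc(method, params=None):
    """Send a single JSON-RPC 2.0 request to BioMCP and return the parsed response."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:8000/",
            json={
                "jsonrpc": "2.0",
                "id": 1,
                "method": method,
                "params": params or {},
            },
        )
        # Surface HTTP-level failures instead of parsing an error page
        response.raise_for_status()
        return response.json()


async def main():
    # Example usage: list the tools the server exposes
    result = await call_biomcp_jsonrpc("tools/list")
    print("Available tools:", result)


if __name__ == "__main__":
    asyncio.run(main())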
```
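
A JSON-RPC 2.0 response carries either a `result` member or an `error` member. The sketch below separates the two cases for a parsed response dictionary such as the one returned above. `unwrap_jsonrpc` and `JsonRpcError` are hypothetical helpers, not part of BioMCP or httpx.

```python
class JsonRpcError(Exception):
    """Raised when a JSON-RPC 2.0 response contains an `error` member."""

    def __init__(self, code, message, data=None):
        super().__init__(f"JSON-RPC error {code}: {message}")
        self.code = code
        self.message = message
        self.data = data


def unwrap_jsonrpc(response):
    """Return the `result` member of a JSON-RPC 2.0 response, or raise on `error`."""
    if "error" in response:
        err = response["error"]
        raise JsonRpcError(err.get("code"), err.get("message"), err.get("data"))
    if "result" not in response:
        raise ValueError("response has neither 'result' nor 'error'")
    return response["result"]
```

Raising on `error` keeps callers from silently treating a failed call as an empty result.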

## Troubleshooting Common Issues

### Issue: TestModel returns empty results

**Cause**: TestModel is a mock model for testing - it doesn't execute real searches.

**Solution**: This is expected behavior. TestModel returns `{"search":{"results":[]}}` by design. To get real results:

- Use a real LLM model with API key: `Agent("openai:gpt-4o-mini", toolsets=[server])`
- Use Groq for free tier: Sign up at console.groq.com, get API key, use `Agent("groq:llama-3.1-70b-versatile", toolsets=[server])`
- Or use BioMCP CLI directly (no API key needed): `biomcp article search --gene BRAF`

### Issue: Connection refused

**Solution**: Ensure the server is running with the correct host binding:

```bash
biomcp run --mode worker --host 0.0.0.0 --port 8000
```

### Issue: CORS errors in browser

**Solution**: The server includes CORS headers by default. If you still have issues, check if a proxy or firewall is blocking the headers.

### Issue: Health endpoint returns 404

**Solution**: The health endpoint is available at `GET /health` in both worker and streamable_http modes. Ensure you're using the latest version:

```bash
pip install --upgrade biomcp-python
```
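
To check the endpoint from Python rather than a browser, a standard-library probe along these lines may help. It assumes the server base URL has no path component (for example `http://localhost:8000`). `health_url` and `is_healthy` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def health_url(base_url):
    """Build the health endpoint URL from a base URL such as http://localhost:8000."""
    return urljoin(base_url.rstrip("/") + "/", "health")


def is_healthy(base_url, timeout=5.0):
    """Return True if GET /health answers with HTTP 200, False on HTTP or connection errors."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers 404 and other HTTP errors, refused connections, and timeouts
        return False
```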

### Issue: SSE endpoint not found

**Solution**: The SSE transport has been deprecated. Use streamable HTTP mode instead:

```python
# Old (deprecated)
# from pydantic_ai.mcp import MCPServerSSE
# server = MCPServerSSE("http://localhost:8000/sse")

# New (recommended)
from pydantic_ai.mcp import MCPServerStreamableHTTP
server = MCPServerStreamableHTTP("http://localhost:8000/mcp")
```

## Testing Your Connection

Here are test scripts to verify your setup for different modes:

### Testing STDIO Mode (Local Development)

```python
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
from pydantic_ai.mcp import MCPServerStdio

async def test_stdio_connection():
    # Use TestModel to verify connection (won't return real data)
    server = MCPServerStdio(
        "python",
        args=["-m", "biomcp", "run", "--mode", "stdio"]
    )

    agent = Agent(
        model=TestModel(call_tools=["search"]),
        toolsets=[server]
    )

    async with agent:
        print("✅ STDIO Connection successful!")

        # Test a simple search (returns mock data)
        result = await agent.run("Test search for BRAF")
        print("✅ Tool execution successful!")
        print(f"Note: TestModel returns mock data: {result.output}")

if __name__ == "__main__":
    asyncio.run(test_stdio_connection())
```

### Testing Streamable HTTP Mode (Production)

First, ensure the server is running:

```bash
# Start the server in a separate terminal
biomcp run --mode streamable_http --port 8000
```

Then test the connection:

```python
import asyncio
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
from pydantic_ai.mcp import MCPServerStreamableHTTP

async def test_streamable_http_connection():
    # Connect to the running server's /mcp endpoint
    server = MCPServerStreamableHTTP("http://localhost:8000/mcp")

    # Create agent with TestModel (no API keys needed)
    agent = Agent(
        model=TestModel(call_tools=["search"]),
        toolsets=[server]
    )

    async with agent:
        print("✅ Streamable HTTP Connection successful!")

        # Test a query
        result = await agent.run("Find articles about BRAF")
        print("✅ Tool execution successful!")
        if result.output:
            print(f"📄 Received {len(result.output)} characters of output")

if __name__ == "__main__":
    asyncio.run(test_streamable_http_connection())
```

### Important: Understanding TestModel vs Real Results

**TestModel is a MOCK model** - it doesn't execute real searches:

- TestModel simulates tool calls but returns empty results: `{"search":{"results":[]}}`
- This is by design - TestModel is for testing the connection flow, not getting real data
- To get actual search results, you need to use a real LLM model

**To get real results:**

1. **Use a real LLM model** (requires API key):

```python
# Replace TestModel with a real model
agent = Agent(
    "openai:gpt-4o-mini",  # or "anthropic:claude-3-haiku-20240307"
    toolsets=[server]
)
```

2. **Use BioMCP CLI directly** (no API key needed):

```bash
# Get real search results via CLI
biomcp article search --gene BRAF --disease melanoma --json
```

3. **For integration testing** without API keys:

```python
import subprocess
import json

# Use CLI to get real results
result = subprocess.run(
    ["biomcp", "article", "search", "--gene", "BRAF", "--json"],
    capture_output=True,
    text=True
)
data = json.loads(result.stdout)
print(f"Found {len(data['articles'])} real articles")
```

**Note**: The Streamable HTTP tests in our test suite verify this functionality works correctly. If you encounter connection issues, ensure:

1. The server is fully started before connecting
2. You're using pydantic-ai >= 0.6.9
3. The port is not blocked by a firewall
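
The three checks above can be automated before the agent connects. The sketch below polls the port until it accepts TCP connections and compares a version string against the stated minimum. The helper names are hypothetical, and the version comparison is deliberately simple: it looks only at the first three numeric components, so a pre-release such as `0.6.9rc1` is treated like `0.6.9`.

```python
import re
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Poll until a TCP connection to host:port succeeds; return False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def version_tuple(version):
    """Turn '0.6.9' into (0, 6, 9), keeping at most three numeric components."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def meets_minimum(installed, minimum="0.6.9"):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)
```

The installed version can be read with `importlib.metadata.version("pydantic-ai")` and passed to `meets_minimum`.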

### Complete Working Example with Real Results

Here's a complete example that connects to BioMCP via Streamable HTTP and retrieves real biomedical data:

```python
#!/usr/bin/env python3
"""
Working example of Pydantic AI + BioMCP with Streamable HTTP.
This will get real search results from your BioMCP server.

Requires one of:
- export OPENAI_API_KEY='your-key'
- export ANTHROPIC_API_KEY='your-key'
- export GROQ_API_KEY='your-key'  (free tier at console.groq.com)
"""

import asyncio
import os
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStreamableHTTP


async def main():
    # Server configuration
    SERVER_URL = "http://localhost:8000/mcp"  # Adjust port as needed

    # Detect which API key is available
    if os.getenv("OPENAI_API_KEY"):
        model = "openai:gpt-4o-mini"
        print("Using OpenAI GPT-4o-mini")
    elif os.getenv("ANTHROPIC_API_KEY"):
        model = "anthropic:claude-3-haiku-20240307"
        print("Using Claude 3 Haiku")
    elif os.getenv("GROQ_API_KEY"):
        model = "groq:llama-3.1-70b-versatile"  # Free tier available
        print("Using Groq Llama 3.1")
    else:
        print("No API key found! Please set OPENAI_API_KEY, ANTHROPIC_API_KEY, or GROQ_API_KEY")
        return

    # Connect to BioMCP server
    server = MCPServerStreamableHTTP(SERVER_URL)
    agent = Agent(model, toolsets=[server])

    async with agent:
        print("Connected to BioMCP!\n")

        # Search for articles (includes cBioPortal data for genes)
        result = await agent.run(
            "Search for 2 recent articles about BRAF V600E mutations in melanoma. "
            "List the title and first author for each."
        )
        print("Article Search Results:")
        print(result.output)
        print("\n" + "="*60 + "\n")

        # Search for clinical trials
        result2 = await agent.run(
            "Find 2 clinical trials for melanoma with BRAF mutations "
            "that are currently recruiting. Show NCT ID and title."
        )
        print("Clinical Trial Results:")
        print(result2.output)
        print("\n" + "="*60 + "\n")

        # Search for variant information
        result3 = await agent.run(
            "Search for pathogenic TP53 variants. Show 2 examples."
        )
        print("Variant Search Results:")
        print(result3.output)


if __name__ == "__main__":
    # Start your BioMCP server first:
    # biomcp run --mode streamable_http --port 8000

    asyncio.run(main())
```

**Running this example:**

1. Start the BioMCP server:

```bash
biomcp run --mode streamable_http --port 8000
```

2. Set your API key (choose one):

```bash
export OPENAI_API_KEY='your-key'        # OpenAI
export ANTHROPIC_API_KEY='your-key'     # Anthropic
export GROQ_API_KEY='your-key'          # Groq (free tier available)
```

3. Run the script:

```bash
python biomcp_example.py
```

The script prints real biomedical data drawn from PubMed, ClinicalTrials.gov, and variant databases.

## Using BioMCP Tools with Pydantic AI

Once connected, you can use BioMCP's biomedical research tools:

```python
import asyncio
import os
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

async def biomedical_research_example():
    # Launch BioMCP as a subprocess and communicate over STDIO
    server = MCPServerStdio(
        "python",
        args=["-m", "biomcp", "run", "--mode", "stdio"]
    )

    # Choose model based on available API key
    if os.getenv("OPENAI_API_KEY"):
        model = "openai:gpt-4o-mini"
    elif os.getenv("GROQ_API_KEY"):
        model = "groq:llama-3.1-70b-versatile"  # Free tier available
    else:
        raise ValueError("Please set OPENAI_API_KEY or GROQ_API_KEY")

    agent = Agent(model, toolsets=[server])

    async with agent:
        # Important: Always use the think tool first for complex queries
        result = await agent.run("""
            First use the think tool to plan your approach, then:
            1. Search for articles about immunotherapy resistance in melanoma
            2. Find clinical trials testing combination therapies
            3. Look up genetic markers associated with treatment response
        """)

        print(result.output)


if __name__ == "__main__":
    asyncio.run(biomedical_research_example())
```

## Production Deployment Considerations

For production deployments:

1. **Use STDIO mode** for local development or when running in containerized environments where the agent and BioMCP can run in the same container
2. **Use Streamable HTTP mode** when you need HTTP-based communication between separate services (recommended for production)
3. **Both `worker` and `streamable_http` modes** now use the same underlying streamable HTTP transport
4. **Use a real LLM** - `TestModel` returns only mock data, so it is unsuitable for production
5. **Consider API costs** - Use cheaper models like `gpt-4o-mini` or Groq's free tier for testing
6. **Implement proper error handling** and retry logic for network failures
7. **Set appropriate timeouts** for long-running biomedical searches
8. **Cache frequently accessed data** to reduce API calls to backend services
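
Items 6 and 7 can be combined in one small wrapper. The sketch below retries an async call with exponential backoff and bounds each attempt with `asyncio.wait_for`. The helper name `run_with_retry`, its defaults, and the set of exceptions treated as retryable are assumptions; the exceptions actually raised on network failure depend on the model provider and transport in use.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def run_with_retry(
    operation: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    timeout: float = 120.0,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError, ConnectionError),
) -> T:
    """Run `operation` with a per-attempt timeout and exponential backoff.

    Hypothetical helper: `operation` is a zero-argument callable that
    returns a fresh awaitable on every call, so each retry starts clean.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            # Bound each attempt so a stalled search cannot hang forever.
            return await asyncio.wait_for(operation(), timeout=timeout)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Back off 1x, 2x, 4x ... base_delay between attempts.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"operation failed after {attempts} attempts") from last_error
```

Inside `async with agent:`, a call might look like `await run_with_retry(lambda: agent.run(prompt), timeout=180)`. Passing a callable rather than a coroutine matters here, because a coroutine object cannot be awaited a second time.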

### Important Notes

- **Real LLM required for results**: TestModel is only for testing connections - use a real LLM (OpenAI, Anthropic, Groq) to get actual biomedical data
- **SSE transport has been removed**: The old SSE-based transport (`/sse` endpoint) is no longer served; use streamable HTTP instead
- **Worker mode now uses streamable HTTP**: The `worker` mode has been updated to use streamable HTTP transport internally
- **Health endpoint**: The `/health` endpoint is available in both HTTP modes for monitoring
- **Free tier option**: Groq offers a free API tier at console.groq.com for testing without costs

## Migration Guide from SSE to Streamable HTTP

If you're upgrading from an older version that used SSE transport:

### Code Changes

```python
# Old code (deprecated)
from pydantic_ai.mcp import MCPServerSSE
server = MCPServerSSE("http://localhost:8000/sse")

# New code (recommended)
from pydantic_ai.mcp import MCPServerStreamableHTTP
server = MCPServerStreamableHTTP("http://localhost:8000/mcp")
```

### Server Command Changes

```bash
# Old: SSE endpoints were at /sse
# biomcp run --mode worker  # Used to expose /sse endpoint

# New: Both modes now use /mcp endpoint with streamable HTTP
biomcp run --mode worker         # Now uses /mcp with streamable HTTP
biomcp run --mode streamable_http # Also uses /mcp with streamable HTTP
```

### Key Differences

1. **Endpoint Change**: `/sse` → `/mcp`
2. **Protocol**: Server-Sent Events → Streamable HTTP (responses may be plain JSON or an SSE stream)
3. **Client Library**: `MCPServerSSE` → `MCPServerStreamableHTTP`
4. **Compatibility**: Requires pydantic-ai >= 0.6.9 for `MCPServerStreamableHTTP`
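
If server URLs live in configuration files, the endpoint change can be applied mechanically. The following is a minimal sketch using only the standard library; `migrate_endpoint` is a hypothetical helper name, and it assumes the old URL path ends in `/sse`, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_endpoint(url: str) -> str:
    """Rewrite a legacy .../sse URL to the .../mcp endpoint.

    URLs whose path does not end in /sse are returned unchanged.
    """
    parts = urlsplit(url)
    stripped = parts.path.rstrip("/")
    if not stripped.endswith("/sse"):
        return url
    # Swap only the final path segment; keep scheme, host, query, fragment.
    new_path = stripped[: -len("/sse")] + "/mcp"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))
```

The result can be passed straight to `MCPServerStreamableHTTP`, for example `MCPServerStreamableHTTP(migrate_endpoint(old_url))`.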

## Next Steps

- Review the [MCP Tools Reference](../user-guides/02-mcp-tools-reference.md) for available biomedical research tools
- See [CLI Guide](../user-guides/01-command-line-interface.md) for more server configuration options
- Check [Transport Protocol Guide](../developer-guides/04-transport-protocol.md) for detailed protocol information

## Support

If you continue to experience issues:

1. Verify your BioMCP version: `biomcp --version`
2. Check server logs for error messages
3. Open an issue on [GitHub](https://github.com/genomoncology/biomcp/issues) with:
   - Your BioMCP version
   - Server startup command
   - Complete error messages
   - Minimal reproduction code
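
Some of the details for an issue report can be gathered with a short script. This is a sketch under stated assumptions: `collect_diagnostics` is a hypothetical helper, it relies only on the `biomcp --version` command mentioned above, and the extra fields (Python version, platform) are suggestions rather than a required format.

```python
import platform
import subprocess
import sys
from typing import Sequence


def collect_diagnostics(version_cmd: Sequence[str] = ("biomcp", "--version")) -> dict:
    """Gather environment details worth pasting into a bug report."""
    try:
        completed = subprocess.run(
            list(version_cmd), capture_output=True, text=True, timeout=30
        )
        # Some tools print their version to stderr, so fall back to it.
        version = (completed.stdout or completed.stderr).strip()
    except FileNotFoundError:
        version = "not found on PATH"
    except subprocess.TimeoutExpired:
        version = "timed out"
    return {
        "biomcp_version": version,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

Printing the returned dictionary, alongside the server startup command and the full error output, covers most of the checklist above.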

```