# Directory Structure

```
├── .gitignore
├── .python-version
├── biorxiv_server.py
├── biorxiv_web_search.py
├── pyproject.toml
├── README.md
└── requirements.txt
```

# Files

--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------

```
3.10
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# bioRxiv MCP Server

🔍 Enable AI assistants to search and access bioRxiv papers through a simple MCP interface.

The bioRxiv MCP Server provides a bridge between AI assistants and bioRxiv's preprint repository through the Model Context Protocol (MCP). It allows AI models to search for biology preprints and access their metadata programmatically.

🤝 Contribute • 📝 Report Bug

## ✨ Core Features

(✅ available · 📝 planned)

- 🔎 Paper Search: Query bioRxiv papers with keywords or advanced search ✅
- 🚀 Efficient Retrieval: Fast access to paper metadata ✅
- 📊 Metadata Access: Retrieve detailed metadata for specific papers ✅
- 📊 Research Support: Facilitate biological sciences research and analysis ✅
- 📄 Paper Access: Download and read paper content 📝
- 📋 Paper Listing: View all downloaded papers 📝
- 🗃️ Local Storage: Papers are saved locally for faster access 📝
- 📝 Research Prompts: A set of specialized prompts for paper analysis 📝

## 🚀 Quick Start

### Prerequisites

- Python 3.10+
- FastMCP library

### Installation

1. Clone the repository:

```
git clone https://github.com/JackKuo666/bioRxiv-MCP-Server.git
cd bioRxiv-MCP-Server
```
2. Install the required dependencies:

```
pip install -r requirements.txt
```

### Installing via Smithery

To install the bioRxiv Server for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@JackKuo666/biorxiv-mcp-server):

#### Claude Desktop

```bash
npx -y @smithery/cli@latest install @JackKuo666/biorxiv-mcp-server --client claude --config "{}"
```

#### Cursor

Paste the following into Settings → Cursor Settings → MCP → Add new server:

Mac/Linux:

```bash
npx -y @smithery/cli@latest run @JackKuo666/biorxiv-mcp-server --client cursor --config "{}"
```

#### Windsurf

```bash
npx -y @smithery/cli@latest install @JackKuo666/biorxiv-mcp-server --client windsurf --config "{}"
```

#### Cline

```bash
npx -y @smithery/cli@latest install @JackKuo666/biorxiv-mcp-server --client cline --config "{}"
```

#### Usage with Claude Desktop

Add this configuration to your `claude_desktop_config.json`, pointing the `args` entry at your local copy of `biorxiv_server.py`:

(macOS)

```json
{
  "mcpServers": {
    "biorxiv": {
      "command": "python",
      "args": ["/YOUR/PATH/mcp-server-bioRxiv/biorxiv_server.py"]
    }
  }
}
```

(Windows)

```json
{
  "mcpServers": {
    "biorxiv": {
      "command": "C:\\Users\\YOUR_USERNAME\\AppData\\Local\\Programs\\Python\\Python311\\python.exe",
      "args": ["C:\\YOUR\\PATH\\mcp-server-bioRxiv\\biorxiv_server.py"]
    }
  }
}
```

#### Usage with Cline

```json
{
  "mcpServers": {
    "biorxiv": {
      "command": "bash",
      "args": [
        "-c",
        "source /home/YOUR/PATH/mcp-server-bioRxiv/.venv/bin/activate && python /home/YOUR/PATH/mcp-server-bioRxiv/biorxiv_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

## 📊 Usage

Start the MCP server:

```bash
python biorxiv_server.py
```

## 🛠 MCP Tools

The bioRxiv MCP Server provides the following tools:

1. `search_biorxiv_key_words`: Search for articles on bioRxiv using keywords.
2. `search_biorxiv_advanced`: Perform an advanced search for articles on bioRxiv with multiple parameters.
3. `get_biorxiv_metadata`: Fetch metadata for a bioRxiv article using its DOI.

A minimal sketch of calling these tools from a standalone Python script follows.
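This sketch uses the stdio client from the official `mcp` Python SDK; it assumes you run it from the repository root so that `biorxiv_server.py` resolves:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the server over stdio, the same way an MCP host would.
    params = StdioServerParameters(command="python", args=["biorxiv_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_biorxiv_key_words",
                {"key_words": "genomics", "num_results": 5},
            )
            print(result.content)


asyncio.run(main())
```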
### Searching Papers

You can ask the AI assistant to search for papers using queries like:

```
Can you search bioRxiv for recent papers about genomics?
```

### Getting Paper Details

Once you have a DOI, you can ask for more details:

```
Can you show me the metadata for the paper with DOI 10.1101/123456?
```

## 📁 Project Structure

- `biorxiv_server.py`: The main MCP server implementation using FastMCP
- `biorxiv_web_search.py`: Contains the web scraping logic for searching bioRxiv

## 🔧 Dependencies

- Python 3.10+
- mcp (provides FastMCP)
- requests
- beautifulsoup4

(`asyncio` and `logging` are part of the Python standard library.)

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is licensed under the MIT License.

## ⚠️ Disclaimer

This tool is for research purposes only. Please respect bioRxiv's terms of service and use this tool responsibly.
```

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------

```
requests
beautifulsoup4
mcp
```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[project]
name = "mcp-server-bioRxiv"
version = "0.1.0"
description = "An MCP server for searching and retrieving articles from bioRxiv"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "mcp[cli]>=1.4.1",
    "requests>=2.25.1",
    "beautifulsoup4>=4.9.3",
]
```

--------------------------------------------------------------------------------
/biorxiv_server.py:
--------------------------------------------------------------------------------

```python
from typing import Any, List, Dict, Optional
import asyncio
import logging

from mcp.server.fastmcp import FastMCP

from biorxiv_web_search import search_key_words, search_advanced, doi_get_biorxiv_metadata

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Initialize FastMCP server
mcp = FastMCP("biorxiv")


@mcp.tool()
async def search_biorxiv_key_words(key_words: str, num_results: int = 10) -> List[Dict[str, Any]]:
    """
    Search for articles on bioRxiv using key words.

    Args:
        key_words: Search query string
        num_results: Number of results to return (default: 10)

    Returns:
        List of dictionaries containing article information
    """
    logging.info(f"Searching for articles with key words: {key_words}, num_results: {num_results}")
    try:
        results = await asyncio.to_thread(search_key_words, key_words, num_results)
        return results
    except Exception as e:
        return [{"error": f"An error occurred while searching: {str(e)}"}]
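
# Note: search_key_words and the other helpers imported above are
# synchronous, requests-based functions. Wrapping each call in
# asyncio.to_thread runs it in a worker thread, so the server's event
# loop stays free to handle other MCP requests while bioRxiv responds.
# The same pattern in isolation (blocking_lookup is hypothetical):
#
#     async def handler(query: str) -> str:
#         return await asyncio.to_thread(blocking_lookup, query)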

@mcp.tool()
async def search_biorxiv_advanced(
    term: Optional[str] = None,
    title: Optional[str] = None,
    author1: Optional[str] = None,
    author2: Optional[str] = None,
    abstract_title: Optional[str] = None,
    text_abstract_title: Optional[str] = None,
    section: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    num_results: int = 10
) -> List[Dict[str, Any]]:
    """
    Perform an advanced search for articles on bioRxiv.

    Args:
        term: General search term
        title: Search in title
        author1: First author
        author2: Second author
        abstract_title: Search in abstract and title
        text_abstract_title: Search in full text, abstract, and title
        section: Section of bioRxiv
        start_date: Start date for search range (format: YYYY-MM-DD)
        end_date: End date for search range (format: YYYY-MM-DD)
        num_results: Number of results to return (default: 10)

    Returns:
        List of dictionaries containing article information
    """
    logging.info(f"Performing advanced search with parameters: {locals()}")
    try:
        results = await asyncio.to_thread(
            search_advanced,
            term, title, author1, author2, abstract_title,
            text_abstract_title, section, start_date, end_date, num_results
        )
        return results
    except Exception as e:
        return [{"error": f"An error occurred while performing advanced search: {str(e)}"}]


@mcp.tool()
async def get_biorxiv_metadata(doi: str) -> Dict[str, Any]:
    """
    Fetch metadata for a bioRxiv article using its DOI.

    Args:
        doi: DOI of the article

    Returns:
        Dictionary containing article metadata
    """
    logging.info(f"Fetching metadata for DOI: {doi}")
    try:
        metadata = await asyncio.to_thread(doi_get_biorxiv_metadata, doi)
        return metadata if metadata else {"error": f"No metadata found for DOI: {doi}"}
    except Exception as e:
        return {"error": f"An error occurred while fetching metadata: {str(e)}"}


if __name__ == "__main__":
    logging.info("Starting bioRxiv MCP server")
    # Initialize and run the server
    mcp.run(transport='stdio')
```

--------------------------------------------------------------------------------
/biorxiv_web_search.py:
--------------------------------------------------------------------------------

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import quote


def generate_biorxiv_search_url(term=None, title=None, author1=None, author2=None,
                                abstract_title=None, text_abstract_title=None,
                                journal_code="biorxiv", section=None,
                                start_date=None, end_date=None,
                                num_results=10, sort="relevance-rank"):
    """Build a bioRxiv search URL from the user-supplied fields."""
    base_url = "https://www.biorxiv.org/search/"
    query_parts = []

    if term:
        query_parts.append(f"{quote(term)}")
    if title:
        query_parts.append(f"title%3A{quote(title)} title_flags%3Amatch-all")
    if author1:
        query_parts.append(f"author1%3A{quote(author1)}")
    if author2:
        query_parts.append(f"author2%3A{quote(author2)}")
    if abstract_title:
        query_parts.append(f"abstract_title%3A{quote(abstract_title)} abstract_title_flags%3Amatch-all")
    if text_abstract_title:
        query_parts.append(f"text_abstract_title%3A{quote(text_abstract_title)} text_abstract_title_flags%3Amatch-all")
    if journal_code:
        query_parts.append(f"jcode%3A{quote(journal_code)}")
    if section:
        query_parts.append(f"toc_section%3A{quote(section)}")
    if start_date and end_date:
        query_parts.append(f"limit_from%3A{start_date} limit_to%3A{end_date}")

    query_parts.append(f"numresults%3A{num_results}")
    query_parts.append(f"sort%3A{quote(sort)} format_result%3Astandard")

    return base_url + "%20".join(query_parts)
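
# Example output (illustrative, derived from the string-building above;
# the remaining literal spaces are percent-encoded by requests on fetch):
#
#     generate_biorxiv_search_url(term="CRISPR", num_results=5)
#     -> "https://www.biorxiv.org/search/CRISPR%20jcode%3Abiorxiv%20numresults%3A5%20sort%3Arelevance-rank format_result%3Astandard"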

def scrape_biorxiv_results(search_url):
    """Parse article information, including DOIs, from a bioRxiv search results page."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
    }
    response = requests.get(search_url, headers=headers)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        articles = soup.find_all('li', class_='search-result')
        results = []

        for article in articles:
            title_tag = article.find('span', class_='highwire-cite-title')
            title = title_tag.text.strip() if title_tag else "No title"

            authors_tag = article.find('span', class_='highwire-citation-authors')
            authors = authors_tag.text.strip() if authors_tag else "No authors"

            abstract_tag = article.find('div', class_='highwire-cite-snippet')
            abstract = abstract_tag.text.strip() if abstract_tag else "No abstract"

            link_tag = article.find('a', class_='highwire-cite-linked-title')
            link = "https://www.biorxiv.org" + link_tag['href'] if link_tag else "No link"

            doi_tag = article.find('span', class_='highwire-cite-metadata-doi')
            doi_link = doi_tag.text.strip().replace("doi:", "").strip() if doi_tag else "No DOI"

            result = {
                "Title": title,
                "Authors": authors,
                # Snippet from the results page; replaced by the full
                # abstract when API metadata is available below.
                "Abstract": abstract,
                "DOI_link": doi_link,
                "Link": link
            }

            if doi_link != "No DOI":
                metadata = doi_get_biorxiv_metadata(doi_link.replace("https://doi.org/", ""))
                if metadata:
                    result.update(metadata)

            results.append(result)

        return results
    else:
        print(f"Error: Unable to fetch data (status code: {response.status_code})")
        return None


def doi_get_biorxiv_metadata(doi, server="biorxiv"):
    """Fetch detailed article metadata from the bioRxiv API for a given DOI."""
    url = f"https://api.biorxiv.org/details/{server}/{doi}/na/json"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36'
    }
    response = requests.get(url, headers=headers)

    if response.status_code == 200:
        data = response.json()
        if 'collection' in data and len(data['collection']) > 0:
            article = data['collection'][0]
            return {
                "DOI": article.get("doi", "No DOI"),
                "Title": article.get("title", "No title"),
                "Authors": article.get("authors", "No authors"),
                "Corresponding Author": article.get("author_corresponding", "No corresponding author"),
                "Corresponding Institution": article.get("author_corresponding_institution", "No institution"),
                "Date": article.get("date", "No date"),
                "Version": article.get("version", "No version"),
                "Category": article.get("category", "No category"),
                "JATS XML Path": article.get("jatsxml", "No XML path"),  # the API field name is "jatsxml"
                "Abstract": article.get("abstract", "No abstract")
            }
        else:
            print("No data found for DOI:", doi)
            return None
    else:
        print(f"Error: Unable to fetch metadata (status code: {response.status_code})")
        return None


def search_key_words(key_words, num_results=10):
    # Build the search URL
    search_url = generate_biorxiv_search_url(term=key_words, num_results=num_results)
    print("Generated URL:", search_url)

    # Fetch and parse the search results
    articles = scrape_biorxiv_results(search_url)
    return articles


def search_advanced(term, title, author1, author2, abstract_title, text_abstract_title,
                    section, start_date, end_date, num_results):
    # Build the search URL
    search_url = generate_biorxiv_search_url(term, title=title, author1=author1, author2=author2,
                                             abstract_title=abstract_title,
                                             text_abstract_title=text_abstract_title,
                                             section=section, start_date=start_date,
                                             end_date=end_date, num_results=num_results)
    print("Generated URL:", search_url)

    # Fetch and parse the search results
    articles = scrape_biorxiv_results(search_url)
    return articles


if __name__ == "__main__":
    # 1. search_key_words
    key_words = "COVID-19"
    articles = search_key_words(key_words, num_results=5)
    print(articles)

    # 2. search_advanced
    # Example: user-supplied search parameters
    term = "CRISPR"
    title = "CRISPR"
    author1 = "Doudna"
    author2 = None
    abstract_title = "genome"
    text_abstract_title = None
    section = "New Results"
    start_date = "2025-02-27"
    end_date = "2025-03-18"
    num_results = 5
    articles = search_advanced(term, title, author1, author2, abstract_title, text_abstract_title,
                               section, start_date, end_date, num_results)
    print(articles)

    # 3. doi_get_biorxiv_metadata
    doi = "10.1101/2024.06.25.600517"
    metadata = doi_get_biorxiv_metadata(doi)
    print(metadata)
```
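For reference, `doi_get_biorxiv_metadata` expects the bioRxiv details endpoint to return a JSON object whose `collection` array holds one or more entries for the preprint; the parser above reads `collection[0]`. You can inspect a live response (using the DOI from the demo block above) with:

```bash
curl "https://api.biorxiv.org/details/biorxiv/10.1101/2024.06.25.600517/na/json"
```

The sketch below shows only the fields the parser reads; all values are illustrative placeholders:

```json
{
  "collection": [
    {
      "doi": "10.1101/2024.06.25.600517",
      "title": "…",
      "authors": "…",
      "author_corresponding": "…",
      "author_corresponding_institution": "…",
      "date": "…",
      "version": "…",
      "category": "…",
      "jatsxml": "…",
      "abstract": "…"
    }
  ]
}
```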