afrise/academic-search-mcp-server # codebase.md

# Directory Structure

```
├── .gitignore
├── .python-version
├── .vscode
│   └── launch.json
├── Dockerfile
├── launch.json
├── LICENSE
├── pyproject.toml
├── README.md
├── server.py
├── smithery.yaml
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------

```
3.10

```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

logs/
docs/ 
# Virtual environments
.venv


```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
# Academic Paper Search MCP Server

[![smithery badge](https://smithery.ai/badge/@afrise/academic-search-mcp-server)](https://smithery.ai/server/@afrise/academic-search-mcp-server)

A [Model Context Protocol (MCP)](https://www.anthropic.com/news/model-context-protocol) server that enables searching and retrieving academic paper information from multiple sources.

The server provides LLMs with:
- Real-time academic paper search functionality  
- Access to paper metadata and abstracts
- Ability to retrieve full-text content when available
- Structured data responses following the MCP specification

While primarily designed for integration with Anthropic's Claude Desktop client, the MCP specification allows for potential compatibility with other AI models and clients that support tool/function calling capabilities (e.g. OpenAI's API).

**Note**: This software is under active development. Features and functionality are subject to change.

<a href="https://glama.ai/mcp/servers/kzsu1zzz9j"><img width="380" height="200" src="https://glama.ai/mcp/servers/kzsu1zzz9j/badge" alt="Academic Paper Search Server MCP server" /></a>

## Features

This server exposes the following tools:
- `search_papers`: Search for academic papers across multiple sources
  - Parameters:
    - `query` (str): Search query text
    - `limit` (int, optional): Maximum number of results to return (default: 10)
  - Returns: Formatted string containing paper details
  
- `fetch_paper_details`: Retrieve detailed information for a specific paper
  - Parameters:
    - `paper_id` (str): Paper identifier (DOI or Semantic Scholar ID)
    - `source` (str, optional): Data source ("semantic_scholar" or "crossref", default: "semantic_scholar")
  - Returns: Formatted string with comprehensive paper metadata including:
    - Title, authors, year, DOI
    - Venue, open access status, PDF URL (Semantic Scholar only)
    - Abstract and TL;DR summary (when available)

- `search_by_topic`: Search for papers by topic with optional date range filter
  - Parameters:
    - `topic` (str): Search query text (limited to 300 characters)
    - `year_start` (int, optional): Start year for date range 
    - `year_end` (int, optional): End year for date range
    - `limit` (int, optional): Maximum number of results to return (default: 10)
  - Returns: Formatted string containing search results including:
    - Paper titles, authors, and years
    - Abstracts and TL;DR summaries when available
    - Venue and open access information

## Setup


### Installing via Smithery

To install Academic Paper Search Server for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@afrise/academic-search-mcp-server):

```bash
npx -y @smithery/cli install @afrise/academic-search-mcp-server --client claude
```

**Note**: This method is largely untested because the Smithery server has been unreliable. Until that is resolved, follow the manual install instructions below.

### Installing via uv (manual install)

1. Install dependencies:
```sh
uv add "mcp[cli]" httpx
```

2. (Optional) Set API keys in your environment or `.env` file:
```sh
# Note: API key support is not implemented yet; server.py currently ignores these values
SEMANTIC_SCHOLAR_API_KEY=your_key_here
CROSSREF_API_KEY=your_key_here
```

3. Run the server:
```sh
uv run server.py
```

## Usage with Claude Desktop

1. Add the server to your Claude Desktop configuration (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "academic-search": {
      "command": "uv",
      "args": ["run", "/path/to/server/server.py"],
      "env": {
        "SEMANTIC_SCHOLAR_API_KEY": "your_key_here",
        "CROSSREF_API_KEY": "your_key_here"
      }
    }
  }
}
```

2. Restart Claude Desktop


## Development

This server is built using:
- Python MCP SDK
- FastMCP for simplified server implementation
- httpx for API requests

## API Sources

- Semantic Scholar API
- Crossref API

## License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This license ensures that:

- You can freely use, modify, and distribute this software
- Any modifications must be open-sourced under the same license
- Anyone providing network services using this software must make the source code available
- Commercial use is allowed, but the software and any derivatives must remain free and open source

See the [LICENSE](LICENSE) file for the full license text.

## Contributing

Contributions are welcome! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

Please note:
- Follow the existing code style and conventions
- Add tests for any new functionality
- Update documentation as needed
- Ensure your changes respect the AGPL-3.0 license terms

By contributing to this project, you agree that your contributions will be licensed under the AGPL-3.0 license.

```
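
The Claude Desktop entry shown in the README can also be generated programmatically. The sketch below is a minimal illustration using only the standard library. `build_server_entry` is a hypothetical helper that is not part of this repository, and the server path is a placeholder.

```python
from __future__ import annotations

import json


def build_server_entry(server_path: str, env: dict[str, str] | None = None) -> dict:
    """Return a Claude Desktop config fragment for this server (hypothetical helper)."""
    entry = {
        "command": "uv",
        # Each argument is its own list item, with no stray whitespace
        "args": ["run", server_path],
    }
    if env:
        # Copy so later changes to the caller's dict do not leak into the config
        entry["env"] = dict(env)
    return {"mcpServers": {"academic-search": entry}}


if __name__ == "__main__":
    config = build_server_entry("/path/to/server/server.py")
    print(json.dumps(config, indent=2))
```

The printed JSON can be merged into `claude_desktop_config.json` by hand. The `env` block is left out unless keys are supplied.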

--------------------------------------------------------------------------------
/launch.json:
--------------------------------------------------------------------------------

```json

```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[project]
name = "academic-search"
version = "0.1.0"
description = "MCP server for searching and retrieving academic paper information"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "httpx>=0.28.1",
    "mcp[cli]>=1.2.1",
]

```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
# Generated by https://smithery.ai. See: https://smithery.ai/docs/config#dockerfile
# Use the official Python image with version 3.10
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy the project files into the container
COPY . /app

# Install dependencies from pyproject.toml using uv
# We will install uv first to use it for dependency management
RUN pip install uv

# Install the project's dependencies using the lockfile
RUN uv sync --frozen --no-install-project --no-dev --no-editable

# Set environment variables for the API keys
ENV SEMANTIC_SCHOLAR_API_KEY=your_key_here
ENV CROSSREF_API_KEY=your_key_here

# The server communicates over stdio (see server.py), so this port is not used by the current code
EXPOSE 8000

# Command to run the server
CMD ["uv", "run", "server.py"]

```

--------------------------------------------------------------------------------
/.vscode/launch.json:
--------------------------------------------------------------------------------

```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [

        {
            "name": "MCP Inspector",
            "type": "node",
            "request": "launch",
            "runtimeExecutable": "npx",
            "args": [
                "@modelcontextprotocol/inspector",
                "uv",
                "run",
                "${workspaceFolder}/server.py"],
            "env": {
                "PYTHONIOENCODING": "utf-8",
                "LANG": "en_US.UTF-8"
            },    
            "console": "integratedTerminal"
        }
    ]
}
```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
# Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    properties:
      semanticScholarApiKey:
        type: string
        description: The API key for Semantic Scholar (optional).
      crossrefApiKey:
        type: string
        description: The API key for Crossref (optional).
  commandFunction:
    # A function that produces the CLI command to start the MCP on stdio.
    |-
    (config) => ({ command: 'uv', args: ['run', 'server.py'], env: { SEMANTIC_SCHOLAR_API_KEY: config.semanticScholarApiKey || '', CROSSREF_API_KEY: config.crossrefApiKey || '' } })

```

--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------

```python
import logging
import sys
import unicodedata
from typing import Any

import httpx
from mcp.server.fastmcp import FastMCP

# Ensure stdout uses UTF-8 so non-ASCII paper titles are emitted correctly
if (sys.stdout.encoding or '').lower() != 'utf-8':
    sys.stdout.reconfigure(encoding='utf-8')

# Initialize FastMCP server
mcp = FastMCP("scientific_literature")

# Constants
SEMANTIC_SCHOLAR_API = "https://api.semanticscholar.org/graph/v1"
CROSSREF_API = "https://api.crossref.org/works"
USER_AGENT = "scientific-literature-app/1.0"


async def make_api_request(url: str, headers: dict | None = None, params: dict | None = None) -> dict[str, Any] | None:
    """Make a GET request to the API; return the parsed JSON, or None on any failure."""
    if headers is None:
        headers = {"User-Agent": USER_AGENT}
    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(url, headers=headers, params=params, timeout=30.0)
            response.raise_for_status()
            return response.json()
        except Exception as e:
            # Logging defaults to stderr, so stdout stays reserved for the stdio transport
            logging.warning("API request to %s failed: %s", url, e)
            return None

def format_paper_data(data: dict, source: str) -> str:
    """Format paper data from different sources into a consistent string format."""
    if not data:
        return "No paper data available"
        
    try:
        if source == "semantic_scholar":
            title = unicodedata.normalize('NFKD', str(data.get('title', 'No title available')))
            authors = ', '.join([author.get('name', 'Unknown Author') for author in data.get('authors', [])])
            year = data.get('year') or 'Year unknown'
            external_ids = data.get('externalIds', {}) or {}
            doi = external_ids.get('DOI', 'No DOI available')
            venue = data.get('venue') or 'Venue unknown'
            abstract = data.get('abstract') or 'No abstract available'
            tldr = (data.get('tldr') or {}).get('text', '')
            is_open = "Yes" if data.get('isOpenAccess') else "No"
            pdf_data = data.get('openAccessPdf', {}) or {}
            pdf_url = pdf_data.get('url', 'Not available')

        elif source == "crossref":
            title = (data.get('title') or ['No title available'])[0]
            authors = ', '.join([
                f"{author.get('given', '')} {author.get('family', '')}".strip() or 'Unknown Author'
                for author in data.get('author', [])
            ])
            year = (data.get('published-print', {}).get('date-parts', [['']])[0][0]) or 'Year unknown'
            doi = data.get('DOI') or 'No DOI available'
            
        result = [
            f"Title: {title}",
            f"Authors: {authors}",
            f"Year: {year}",
            f"DOI: {doi}"
        ]
        
        if source == "semantic_scholar":
            result.extend([
                f"Venue: {venue}",
                f"Open Access: {is_open}",
                f"PDF URL: {pdf_url}",
                f"Abstract: {abstract}"
            ])
            if tldr:
                result.append(f"TL;DR: {tldr}")
                
        return "\n".join(result) + "\t\t\n"
        
    except Exception as e:
        return f"Error formatting paper data: {str(e)}"

@mcp.tool()
async def search_papers(query: str, limit: int = 10) -> str:
    """Search for papers across multiple sources.

    Args:
        query: the search query
        limit: the maximum number of results to return (default 10)
    """

    if not query.strip():
        return "Please provide a search query."

    # Truncate long queries
    MAX_QUERY_LENGTH = 300
    if len(query) > MAX_QUERY_LENGTH:
        query = query[:MAX_QUERY_LENGTH]

    try:
        # Search Semantic Scholar; pass the query via params so httpx URL-encodes it
        semantic_url = f"{SEMANTIC_SCHOLAR_API}/paper/search"
        semantic_params = {
            "query": query,
            "limit": limit,
            # Without an explicit 'fields' list the API returns only paperId and title
            "fields": "title,authors,year,externalIds,abstract,venue,isOpenAccess,openAccessPdf",
        }
        semantic_data = await make_api_request(semantic_url, params=semantic_params)

        # Search Crossref
        crossref_data = await make_api_request(CROSSREF_API, params={"query": query, "rows": limit})

        results = []

        # Semantic Scholar returns matches under the 'data' key
        if semantic_data and 'data' in semantic_data:
            results.append("=== Semantic Scholar Results ===")
            for paper in semantic_data['data']:
                results.append(format_paper_data(paper, "semantic_scholar"))

        if crossref_data and 'items' in crossref_data.get('message', {}):
            results.append("\n=== Crossref Results ===")
            for paper in crossref_data['message']['items']:
                results.append(format_paper_data(paper, "crossref"))

        if not results:
            return "No results found or error occurred while fetching papers."

        return "\n".join(results)
    except Exception:
        return "Error searching papers."

@mcp.tool()
async def fetch_paper_details(paper_id: str, source: str = "semantic_scholar") -> str:
    """Get detailed information about a specific paper.

    Args:
        paper_id: Paper identifier (DOI for Crossref, paper ID for Semantic Scholar)
        source: Source database ("semantic_scholar" or "crossref")
    """
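    params = None
    if source == "semantic_scholar":
        url = f"{SEMANTIC_SCHOLAR_API}/paper/{paper_id}"
        # Without an explicit 'fields' list the API returns only paperId and title
        params = {"fields": "title,authors,year,externalIds,abstract,venue,isOpenAccess,openAccessPdf,tldr"}
    elif source == "crossref":
        url = f"{CROSSREF_API}/{paper_id}"
    else:
        return "Unsupported source. Please use 'semantic_scholar' or 'crossref'."

    data = await make_api_request(url, params=params)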
    
    if not data:
        return f"Unable to fetch paper details from {source}."

    if source == "crossref":
        data = data.get('message', {})

    return format_paper_data(data, source)


@mcp.tool()
async def search_by_topic(topic: str, year_start: int | None = None, year_end: int | None = None, limit: int = 10) -> str:
    """Search for papers by topic with optional date range. 
    
    Note: Query length is limited to 300 characters. Longer queries will be automatically truncated.
    
    Args:
        topic (str): Search query (max 300 chars)
        year_start (int, optional): Start year for date range
        year_end (int, optional): End year for date range  
        limit (int, optional): Maximum number of results to return (default 10)
        
    Returns:
        str: Formatted search results or error message
    """
    
    try:
        # Truncate long queries to prevent API errors
        MAX_QUERY_LENGTH = 300
        if len(topic) > MAX_QUERY_LENGTH:
            topic = topic[:MAX_QUERY_LENGTH]

        # Try Semantic Scholar API first
        semantic_url = f"{SEMANTIC_SCHOLAR_API}/paper/search"
        params = {
            "query": topic,
            "limit": limit,
            "fields": "title,authors,year,paperId,externalIds,abstract,venue,isOpenAccess,openAccessPdf,tldr"
        }
        # The year filter is applied only when both bounds are given
        if year_start and year_end:
            params["year"] = f"{year_start}-{year_end}"

        headers = {
            "User-Agent": USER_AGENT,
            "Accept": "application/json"
        }
        data = await make_api_request(semantic_url, headers=headers, params=params)

        if data and 'data' in data:
            results = ["=== Search Results ==="]
            for paper in data['data']:
                results.append(format_paper_data(paper, "semantic_scholar"))
            return "\n".join(results)

        # Fall back to the combined Semantic Scholar + Crossref search if this request fails
        return await search_papers(topic, limit)

    except Exception:
        return "Error searching papers!"


if __name__ == "__main__":
    # Initialize and run the server
    mcp.run(transport='stdio')

```
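
The search tools above build their Semantic Scholar query parameters inline. As an illustration only, the sketch below factors that step into a standalone function. `build_search_params` is a hypothetical name that does not appear in `server.py`, and its handling of open-ended year ranges (`2019-`, `-2020`) is an assumption based on the range syntax described for the Semantic Scholar `year` filter, so it should be checked against the current API documentation before use.

```python
from __future__ import annotations

from urllib.parse import urlencode

MAX_QUERY_LENGTH = 300  # same limit server.py applies to queries


def build_search_params(topic: str, year_start: int | None = None,
                        year_end: int | None = None, limit: int = 10) -> dict:
    """Build query parameters for a paper search (hypothetical helper)."""
    params = {"query": topic[:MAX_QUERY_LENGTH], "limit": limit}
    if year_start is not None or year_end is not None:
        # A missing bound becomes an empty string, giving "2019-" or "-2020"
        start = "" if year_start is None else str(year_start)
        end = "" if year_end is None else str(year_end)
        params["year"] = f"{start}-{end}"
    return params


if __name__ == "__main__":
    params = build_search_params("graph neural networks & chemistry", year_start=2019)
    # urlencode percent-encodes reserved characters such as '&' inside the query value
    print(urlencode(params))
```

Passing a dictionary like this to `httpx` through `params=` avoids interpolating raw user text into the URL, so a query containing `&` or `=` cannot be mistaken for extra parameters.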