# Directory Structure
```
├── .dockerignore
├── .gitignore
├── .python-version
├── Dockerfile
├── README.md
├── requirements.txt
├── search.py
├── server.py
└── smithery.yaml
```
# Files
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
```
3.10
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
venv/
.claude/
```
--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
```
__pycache__
*.pyc
*.pyo
*.pyd
.git
.gitignore
README.md
.env
.venv
venv/
.pytest_cache
.coverage
htmlcov/
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
```markdown
# 📚 Semantic Scholar MCP Server
> **A comprehensive Model Context Protocol (MCP) server for seamless integration with Semantic Scholar's academic database**
[Smithery](https://smithery.ai/server/@alperenkocyigit/semantic-scholar-graph-api)


**Maintainer:** [@alperenkocyigit](https://github.com/alperenkocyigit)
This powerful MCP server bridges the gap between AI assistants and academic research by providing direct access to Semantic Scholar's comprehensive database. Whether you're conducting literature reviews, exploring citation networks, or seeking academic insights, this server offers a streamlined interface to millions of research papers.
## 🌟 What Can You Do?
### 🔍 **Advanced Paper Discovery**
- **Smart Search**: Find papers using natural language queries
- **Bulk Operations**: Process multiple papers simultaneously
- **Autocomplete**: Get intelligent title suggestions as you type
- **Precise Matching**: Find exact papers using title-based search
### 🎯 **AI-Powered Recommendations**
- **Smart Paper Recommendations**: Get personalized paper suggestions based on your interests
- **Multi-Example Learning**: Use multiple positive and negative examples to fine-tune recommendations
- **Single Paper Similarity**: Find papers similar to a specific research work
- **Relevance Scoring**: AI-powered relevance scores for better paper discovery
### 👥 **Author Research**
- **Author Profiles**: Comprehensive author information and metrics
- **Bulk Author Data**: Fetch multiple author profiles at once
- **Author Search**: Discover researchers by name or affiliation
### 📊 **Citation Analysis**
- **Citation Networks**: Explore forward and backward citations
- **Reference Mapping**: Understand paper relationships
- **Impact Metrics**: Access citation counts and paper influence
### 💡 **Content Discovery**
- **Text Snippets**: Search within paper content
- **Contextual Results**: Find relevant passages and quotes
- **Full-Text Access**: When available through Semantic Scholar
---
## 🛠️ Quick Setup
### System Requirements
- **Python**: 3.10 or higher
- **Dependencies**: `requests`, `mcp`, `bs4`, `pydantic`, `uvicorn`, `httpx`, `anyio`
- **Network**: Stable internet connection for API access
### 🆕 **NEW: MCP Streamable HTTP Transport**
This server now implements the **MCP Streamable HTTP** transport protocol, providing:
- **Higher Concurrency**: Handle significantly more simultaneous requests than the stdio transport
- **Lower Latency**: Direct HTTP communication for faster response times
- **Better Resource Efficiency**: More efficient resource utilization
- **Future-Proofing**: HTTP is the recommended transport in MCP specifications
The server uses FastMCP for seamless MCP protocol compliance and optimal performance.
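Under the hood, Streamable HTTP carries ordinary JSON-RPC 2.0 messages over a single HTTP endpoint. As a sketch, this is the shape of a `tools/call` request a client would POST (the envelope follows the MCP specification; the tool name and arguments below are illustrative):

```python
import json

# JSON-RPC 2.0 envelope used by MCP over Streamable HTTP.
# "tools/call" and its params shape follow the MCP specification;
# the query values here are purely illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_semantic_scholar_papers",
        "arguments": {"query": "machine learning", "num_results": 5},
    },
}
body = json.dumps(request)
print(json.loads(body)["method"])  # tools/call
```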
## 🚀 Installation Options
### ⚡ One-Click Install with Smithery
**For Claude Desktop:**
```bash
npx -y @smithery/cli@latest install @alperenkocyigit/semantic-scholar-graph-api --client claude --config "{}"
```
**For Cursor IDE:**
Navigate to `Settings → Cursor Settings → MCP → Add new server` and paste:
```bash
npx -y @smithery/cli@latest run @alperenkocyigit/semantic-scholar-graph-api --client cursor --config "{}"
```
**For Windsurf:**
```bash
npx -y @smithery/cli@latest install @alperenkocyigit/semantic-scholar-graph-api --client windsurf --config "{}"
```
**For Cline:**
```bash
npx -y @smithery/cli@latest install @alperenkocyigit/semantic-scholar-graph-api --client cline --config "{}"
```
### 🔧 Manual Installation
1. **Clone the repository:**
```bash
git clone https://github.com/alperenkocyigit/semantic-scholar-graph-api.git
cd semantic-scholar-graph-api
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Run the MCP Streamable HTTP server:**
```bash
python server.py
```
---
## 🔧 Configuration Guide
### Local Setups
#### Claude Desktop Setup
**macOS/Linux Configuration:**
Add to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "semanticscholar": {
      "command": "python",
      "args": ["/path/to/your/server.py"]
    }
  }
}
```
**Windows Configuration:**
```json
{
  "mcpServers": {
    "semanticscholar": {
      "command": "C:\\Users\\YOUR_USERNAME\\miniconda3\\envs\\mcp_server\\python.exe",
      "args": ["D:\\path\\to\\your\\server.py"],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
```
#### Cline Integration
```json
{
  "mcpServers": {
    "semanticscholar": {
      "command": "bash",
      "args": [
        "-c",
        "source /path/to/your/.venv/bin/activate && python /path/to/your/server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
```
### Remote Setups
#### Auto Configuration
```bash
npx -y @smithery/cli@latest install @alperenkocyigit/semantic-scholar-graph-api --client <valid-client-name> --key <your-smithery-api-key>
```
**Valid client names:** `claude`, `cursor`, `vscode`, `boltai`
#### Json Configuration
**macOS/Linux Configuration:**
```json
{
  "mcpServers": {
    "semantic-scholar-graph-api": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "@alperenkocyigit/semantic-scholar-graph-api",
        "--key",
        "your-smithery-api-key"
      ]
    }
  }
}
```
**Windows Configuration:**
```json
{
  "mcpServers": {
    "semantic-scholar-graph-api": {
      "command": "cmd",
      "args": [
        "/c",
        "npx",
        "-y",
        "@smithery/cli@latest",
        "run",
        "@alperenkocyigit/semantic-scholar-graph-api",
        "--key",
        "your-smithery-api-key"
      ]
    }
  }
}
```
**WSL Configuration:**
```json
{
  "mcpServers": {
    "semantic-scholar-graph-api": {
      "command": "wsl",
      "args": [
        "npx",
        "-y",
        "@smithery/cli@latest",
        "run",
        "@alperenkocyigit/semantic-scholar-graph-api",
        "--key",
        "your-smithery-api-key"
      ]
    }
  }
}
```
---
## 🎯 Available Tools
| Tool | Description | Use Case |
|------|-------------|----------|
| `search_semantic_scholar_papers` | Search papers by query | Literature discovery |
| `search_semantic_scholar_authors` | Find authors by name | Researcher identification |
| `get_semantic_scholar_paper_details` | Get comprehensive paper info | Detailed analysis |
| `get_semantic_scholar_author_details` | Get author profiles | Author research |
| `get_semantic_scholar_citations_and_references` | Fetch citation network | Impact analysis |
| `get_semantic_scholar_paper_match` | Find exact paper matches | Precise searching |
| `get_semantic_scholar_paper_autocomplete` | Get title suggestions | Smart completion |
| `get_semantic_scholar_papers_batch` | Bulk paper retrieval | Batch processing |
| `get_semantic_scholar_authors_batch` | Bulk author data | Mass analysis |
| `search_semantic_scholar_snippets` | Search text content | Content discovery |
| `get_semantic_scholar_paper_recommendations_from_lists` | Get recommendations from positive/negative examples | AI-powered discovery |
| `get_semantic_scholar_paper_recommendations` | Get recommendations from single paper | Similar paper finding |
---
## 💡 Usage Examples
### Basic Paper Search
```python
# Search for papers on machine learning
results = await search_semantic_scholar_papers("machine learning", num_results=5)
```
### Author Research
```python
# Find authors working on natural language processing
authors = await search_semantic_scholar_authors("natural language processing")
```
### Citation Analysis
```python
# Get citation network for a specific paper
citations = await get_semantic_scholar_citations_and_references("paper_id_here")
```
### 🆕 AI-Powered Paper Recommendations
#### Multi-Example Recommendations
```python
# Get recommendations based on multiple positive and negative examples
positive_papers = ["paper_id_1", "paper_id_2", "paper_id_3"]
negative_papers = ["bad_paper_id_1", "bad_paper_id_2"]
recommendations = await get_semantic_scholar_paper_recommendations_from_lists(
    positive_paper_ids=positive_papers,
    negative_paper_ids=negative_papers,
    limit=20
)
```
#### Single Paper Similarity
```python
# Find papers similar to a specific research work
similar_papers = await get_semantic_scholar_paper_recommendations(
    paper_id="target_paper_id",
    limit=15
)
```
#### Content Discovery
```python
# Search for specific text content within papers
snippets = await search_semantic_scholar_snippets(
    query="neural network optimization",
    limit=10
)
```
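The paper objects returned by these tools mirror the fields selected in `search.py`. Note that `tldr` is `None` when Semantic Scholar has no AI-generated summary, so guard before dereferencing it. A small sketch (the sample record below is illustrative, not real API output):

```python
# Shape of a paper object as built by search.py's search_papers();
# the values in this sample dict are illustrative.
paper = {
    "paperId": "abc123",
    "title": "Example Paper",
    "year": 2021,
    "citationCount": 42,
    "tldr": None,  # None when Semantic Scholar has no TL;DR for the paper
}

# Guard optional fields before use.
summary = paper["tldr"]["text"] if paper.get("tldr") else "(no TL;DR available)"
print(summary)  # (no TL;DR available)
```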
---
## 📂 Project Architecture
```
semantic-scholar-graph-api/
├── 📄 README.md          # Project documentation
├── 📋 requirements.txt   # Python dependencies
├── 🔍 search.py          # Core API interaction module
├── 🖥️ server.py          # MCP server implementation
├── 🐳 Dockerfile         # Container build for deployment
└── ⚙️ smithery.yaml      # Smithery deployment configuration
```
### Core Components
- **`search.py`**: Handles all interactions with the Semantic Scholar API, including rate limiting, error handling, and data processing
- **`server.py`**: Implements the MCP server protocol and exposes tools for AI assistant integration
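The rate-limit handling in `search.py` follows a standard exponential-backoff loop: on a 429 response, wait `base_delay * 2 ** attempt` seconds and retry, giving up after `max_retries` attempts. A minimal, self-contained sketch of that pattern (the `do_request` callable and simulated responses are illustrative):

```python
import time

def request_with_retry(do_request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry loop in the style of search.py: exponential backoff on a
    429-style rate-limit signal (delay = base_delay * 2 ** attempt)."""
    for attempt in range(max_retries + 1):
        status, body = do_request()
        if status == 200:
            return body
        if status == 429 and attempt < max_retries:
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
            continue
        raise RuntimeError(f"request failed with status {status}")

# Simulated endpoint: rate-limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"ok": True})])
result = request_with_retry(lambda: next(responses), sleep=lambda _: None)
print(result)  # {'ok': True}
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.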
---
## 🤝 Contributing
We welcome contributions from the community! Here's how you can help:
### Ways to Contribute
- 🐛 **Bug Reports**: Found an issue? Let us know!
- 💡 **Feature Requests**: Have ideas for improvements?
- 🔧 **Code Contributions**: Submit pull requests
- 📖 **Documentation**: Help improve our docs
### Development Setup
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes and test thoroughly
4. Commit your changes: `git commit -m 'Add amazing feature'`
5. Push to the branch: `git push origin feature/amazing-feature`
6. Open a Pull Request
---
## 📄 License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
---
## 🙏 Acknowledgments
- **Semantic Scholar Team** for providing the excellent API
- **Model Context Protocol** community for the framework
- **Contributors** who help improve this project
---
## 📞 Support
- **Issues**: [GitHub Issues](https://github.com/alperenkocyigit/semantic-scholar-graph-api/issues)
- **Discussions**: [GitHub Discussions](https://github.com/alperenkocyigit/semantic-scholar-graph-api/discussions)
- **Maintainer**: [@alperenkocyigit](https://github.com/alperenkocyigit)
---
<div align="center">
<strong>Made with ❤️ for the research community</strong>
<br>
<sub>Empowering AI agents with academic knowledge</sub>
</div>
```
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
```
requests
bs4
mcp
uvicorn
httpx
pydantic
anyio
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port
EXPOSE 3000
# Set environment variables
ENV HOST=0.0.0.0
ENV PORT=3000
# Health check for MCP server (python:3.11-slim ships without curl, so use the stdlib)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:3000/')" || exit 1
# Run the HTTP server
CMD ["python", "server.py"]
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
name: semantic-scholar-graph-api
description: "A comprehensive Model Context Protocol (MCP) server for accessing Semantic Scholar's academic database"
version: "1.0.0"
transport: streamable-http
runtime: python
entrypoint: server.py
port: 8000
capabilities:
  tools: true
  resources: false
  prompts: false
environment:
  HOST: "0.0.0.0"
  PORT: "8000"
dependencies:
  - requests
  - bs4
  - mcp
  - uvicorn
  - httpx
  - pydantic
  - anyio
python:
  version: "3.11"
  requirements: requirements.txt
health_check:
  path: "/"
  timeout: 30
  interval: 30
resources:
  memory: 256
  cpu: 0.5
tags:
  - academic
  - research
  - papers
  - citations
  - semantic-scholar
  - ai
  - machine-learning
```
--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
```python
#!/usr/bin/env python3
"""
Semantic Scholar MCP Server - Streamable HTTP Transport
A Model Context Protocol (MCP) server for accessing Semantic Scholar's academic database.
Implements the MCP Streamable HTTP transport protocol.
"""
import asyncio
import logging
import os
from typing import Any, Dict, List, Optional

from mcp.server import FastMCP

from search import (
    search_papers, get_paper_details, get_author_details, get_citations_and_references,
    search_authors, search_paper_match, get_paper_autocomplete, get_papers_batch,
    get_authors_batch, search_snippets, get_paper_recommendations_from_lists,
    get_paper_recommendations
)

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize the FastMCP server
app = FastMCP("Semantic Scholar MCP Server")
# Tool implementations
@app.tool()
async def search_semantic_scholar_papers(
    query: str,
    num_results: int = 10
) -> List[Dict[str, Any]]:
    """
    Search for papers on Semantic Scholar using a query string.

    Args:
        query: Search query for papers
        num_results: Number of results to return (max 100)

    Returns:
        List of paper objects with details like title, authors, year, abstract, etc.
    """
    logger.info(f"Searching for papers with query: {query}, num_results: {num_results}")
    try:
        results = await asyncio.to_thread(search_papers, query, num_results)
        return results
    except Exception as e:
        logger.error(f"Error searching papers: {e}")
        raise Exception(f"An error occurred while searching: {str(e)}")

@app.tool()
async def get_semantic_scholar_paper_details(
    paper_id: str
) -> Dict[str, Any]:
    """
    Get details of a specific paper on Semantic Scholar.

    Args:
        paper_id: Paper ID (e.g., Semantic Scholar paper ID or DOI)

    Returns:
        Paper object with comprehensive details
    """
    logger.info(f"Fetching paper details for paper ID: {paper_id}")
    try:
        paper = await asyncio.to_thread(get_paper_details, paper_id)
        return paper
    except Exception as e:
        logger.error(f"Error fetching paper details: {e}")
        raise Exception(f"An error occurred while fetching paper details: {str(e)}")

@app.tool()
async def get_semantic_scholar_author_details(
    author_id: str
) -> Dict[str, Any]:
    """
    Get details of a specific author on Semantic Scholar.

    Args:
        author_id: Author ID (Semantic Scholar author ID)

    Returns:
        Author object with comprehensive details including publications, h-index, etc.
    """
    logger.info(f"Fetching author details for author ID: {author_id}")
    try:
        author = await asyncio.to_thread(get_author_details, author_id)
        return author
    except Exception as e:
        logger.error(f"Error fetching author details: {e}")
        raise Exception(f"An error occurred while fetching author details: {str(e)}")

@app.tool()
async def get_semantic_scholar_citations_and_references(
    paper_id: str
) -> Dict[str, Any]:
    """
    Get citations and references for a specific paper on Semantic Scholar.

    Args:
        paper_id: Paper ID to get citations and references for

    Returns:
        Object containing citations and references lists
    """
    logger.info(f"Fetching citations and references for paper ID: {paper_id}")
    try:
        citations_refs = await asyncio.to_thread(get_citations_and_references, paper_id)
        return citations_refs
    except Exception as e:
        logger.error(f"Error fetching citations and references: {e}")
        raise Exception(f"An error occurred while fetching citations and references: {str(e)}")

@app.tool()
async def search_semantic_scholar_authors(
    query: str,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Search for authors on Semantic Scholar using a query string.

    Args:
        query: Search query for authors
        limit: Maximum number of authors to return

    Returns:
        List of author objects with details
    """
    logger.info(f"Searching for authors with query: {query}, limit: {limit}")
    try:
        results = await asyncio.to_thread(search_authors, query, limit)
        return results
    except Exception as e:
        logger.error(f"Error searching authors: {e}")
        raise Exception(f"An error occurred while searching authors: {str(e)}")

@app.tool()
async def get_semantic_scholar_paper_match(
    query: str
) -> Dict[str, Any]:
    """
    Find the best matching paper on Semantic Scholar using title-based search.

    Args:
        query: Paper title or description to match

    Returns:
        Best matching paper object
    """
    logger.info(f"Finding paper match for query: {query}")
    try:
        result = await asyncio.to_thread(search_paper_match, query)
        return result
    except Exception as e:
        logger.error(f"Error finding paper match: {e}")
        raise Exception(f"An error occurred while finding paper match: {str(e)}")

@app.tool()
async def get_semantic_scholar_paper_autocomplete(
    query: str
) -> List[Dict[str, Any]]:
    """
    Get paper title autocompletion suggestions for a partial query.

    Args:
        query: Partial paper title for autocomplete suggestions

    Returns:
        List of suggestion objects (id, title, authorsYear), as returned
        by search.get_paper_autocomplete
    """
    logger.info(f"Getting paper autocomplete for query: {query}")
    try:
        results = await asyncio.to_thread(get_paper_autocomplete, query)
        return results
    except Exception as e:
        logger.error(f"Error getting autocomplete suggestions: {e}")
        raise Exception(f"An error occurred while getting autocomplete suggestions: {str(e)}")

@app.tool()
async def get_semantic_scholar_papers_batch(
    paper_ids: List[str]
) -> List[Dict[str, Any]]:
    """
    Get details for multiple papers at once using batch API.

    Args:
        paper_ids: List of paper IDs to fetch

    Returns:
        List of paper objects
    """
    logger.info(f"Fetching batch paper details for {len(paper_ids)} papers")
    try:
        results = await asyncio.to_thread(get_papers_batch, paper_ids)
        return results
    except Exception as e:
        logger.error(f"Error fetching batch paper details: {e}")
        raise Exception(f"An error occurred while fetching batch paper details: {str(e)}")

@app.tool()
async def get_semantic_scholar_authors_batch(
    author_ids: List[str]
) -> List[Dict[str, Any]]:
    """
    Get details for multiple authors at once using batch API.

    Args:
        author_ids: List of author IDs to fetch

    Returns:
        List of author objects
    """
    logger.info(f"Fetching batch author details for {len(author_ids)} authors")
    try:
        results = await asyncio.to_thread(get_authors_batch, author_ids)
        return results
    except Exception as e:
        logger.error(f"Error fetching batch author details: {e}")
        raise Exception(f"An error occurred while fetching batch author details: {str(e)}")

@app.tool()
async def search_semantic_scholar_snippets(
    query: str,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Search for text snippets from papers that match the query.

    Args:
        query: Search query for text snippets within papers
        limit: Maximum number of snippets to return

    Returns:
        List of snippet objects with context and source paper information
    """
    logger.info(f"Searching for text snippets with query: {query}, limit: {limit}")
    try:
        results = await asyncio.to_thread(search_snippets, query, limit)
        return results
    except Exception as e:
        logger.error(f"Error searching snippets: {e}")
        raise Exception(f"An error occurred while searching snippets: {str(e)}")

@app.tool()
async def get_semantic_scholar_paper_recommendations_from_lists(
    positive_paper_ids: List[str],
    negative_paper_ids: Optional[List[str]] = None,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Get recommended papers based on lists of positive and negative example papers.

    Args:
        positive_paper_ids: List of positive example paper IDs
        negative_paper_ids: Optional list of negative example paper IDs
        limit: Maximum number of recommendations to return

    Returns:
        List of recommended paper objects with relevance scores
    """
    # Avoid a mutable default argument; treat None as "no negative examples".
    negative_paper_ids = negative_paper_ids or []
    logger.info(f"Getting paper recommendations from lists: {len(positive_paper_ids)} positive, {len(negative_paper_ids)} negative, limit: {limit}")
    try:
        results = await asyncio.to_thread(get_paper_recommendations_from_lists, positive_paper_ids, negative_paper_ids, limit)
        return results
    except Exception as e:
        logger.error(f"Error getting paper recommendations from lists: {e}")
        raise Exception(f"An error occurred while getting paper recommendations from lists: {str(e)}")

@app.tool()
async def get_semantic_scholar_paper_recommendations(
    paper_id: str,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Get recommended papers for a single positive example paper.

    Args:
        paper_id: Paper ID to get recommendations for
        limit: Maximum number of recommendations to return

    Returns:
        List of recommended paper objects with relevance scores
    """
    logger.info(f"Getting paper recommendations for single paper: {paper_id}, limit: {limit}")
    try:
        results = await asyncio.to_thread(get_paper_recommendations, paper_id, limit)
        return results
    except Exception as e:
        logger.error(f"Error getting paper recommendations for single paper: {e}")
        raise Exception(f"An error occurred while getting paper recommendations for single paper: {str(e)}")

if __name__ == "__main__":
    # Get configuration from environment variables
    port = int(os.getenv('PORT', 3000))
    host = os.getenv('HOST', '0.0.0.0')
    # FastMCP binds using its settings object, so apply the env-derived
    # values before starting the transport.
    app.settings.host = host
    app.settings.port = port
    logger.info(f"Starting Semantic Scholar MCP HTTP Server on {host}:{port}")
    # Run the FastMCP server with streamable HTTP transport
    app.run(transport="streamable-http")
```
--------------------------------------------------------------------------------
/search.py:
--------------------------------------------------------------------------------
```python
import requests
import time
import logging
from typing import List, Dict, Any, Optional
# Base URL for the Semantic Scholar API
BASE_URL = "https://api.semanticscholar.org/graph/v1"
BASE_RECOMMENDATION_URL = "https://api.semanticscholar.org/recommendations/v1"
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def make_request_with_retry(url: str, params: Optional[Dict] = None, json_data: Optional[Dict] = None,
                            method: str = "GET", max_retries: int = 5, base_delay: float = 1.0) -> Dict[str, Any]:
    """
    Make HTTP request with retry logic for 429 rate limit errors.

    Args:
        url: The URL to make the request to
        params: Query parameters for GET requests
        json_data: JSON data for POST requests
        method: HTTP method (GET or POST)
        max_retries: Maximum number of retry attempts
        base_delay: Base delay in seconds, will be exponentially increased

    Returns:
        JSON response as dictionary

    Raises:
        Exception: If all retries are exhausted or other errors occur
    """
    for attempt in range(max_retries + 1):
        try:
            if method.upper() == "GET":
                response = requests.get(url, params=params, timeout=30)
            elif method.upper() == "POST":
                response = requests.post(url, params=params, json=json_data, timeout=30)
            else:
                raise ValueError(f"Unsupported HTTP method: {method}")

            # Check if request was successful
            if response.status_code == 200:
                return response.json()
            # Handle rate limiting (429 Too Many Requests)
            elif response.status_code == 429:
                if attempt < max_retries:
                    # Exponential backoff
                    delay = base_delay * (2 ** attempt)
                    logger.warning(f"Rate limit hit (429). Retrying in {delay} seconds... (attempt {attempt + 1}/{max_retries + 1})")
                    time.sleep(delay)
                    continue
                else:
                    raise Exception(f"Rate limit exceeded. Max retries ({max_retries}) exhausted.")
            # Handle other HTTP errors
            else:
                response.raise_for_status()
        except requests.exceptions.Timeout:
            if attempt < max_retries:
                delay = base_delay * (2 ** attempt)
                logger.warning(f"Request timeout. Retrying in {delay} seconds... (attempt {attempt + 1}/{max_retries + 1})")
                time.sleep(delay)
                continue
            else:
                raise Exception("Request timeout. Max retries exhausted.")
        except requests.exceptions.RequestException as e:
            if attempt < max_retries:
                delay = base_delay * (2 ** attempt)
                logger.warning(f"Request failed: {e}. Retrying in {delay} seconds... (attempt {attempt + 1}/{max_retries + 1})")
                time.sleep(delay)
                continue
            else:
                raise Exception(f"Request failed after {max_retries} retries: {e}")
    raise Exception("Unexpected error in request retry logic")

def search_papers(query: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Search for papers using a query string."""
    url = f"{BASE_URL}/paper/search"
    params = {
        "query": query,
        "limit": min(limit, 100),  # API limit is 100
        "fields": "paperId,title,abstract,year,authors,url,venue,publicationTypes,citationCount,tldr"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        papers = response_data.get("data", [])
        return [
            {
                "paperId": paper.get("paperId"),
                "title": paper.get("title"),
                "abstract": paper.get("abstract"),
                "year": paper.get("year"),
                "authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
                            for author in paper.get("authors", [])],
                "url": paper.get("url"),
                "venue": paper.get("venue"),
                "publicationTypes": paper.get("publicationTypes"),
                "citationCount": paper.get("citationCount"),
                "tldr": {
                    "model": paper.get("tldr", {}).get("model", ""),
                    "text": paper.get("tldr", {}).get("text", "")
                } if paper.get("tldr") else None
            } for paper in papers
        ]
    except Exception as e:
        logger.error(f"Error searching papers: {e}")
        return []

def get_paper_details(paper_id: str) -> Dict[str, Any]:
    """Get details of a specific paper."""
    url = f"{BASE_URL}/paper/{paper_id}"
    params = {
        "fields": "paperId,title,abstract,year,authors,url,venue,publicationTypes,citationCount,referenceCount,influentialCitationCount,fieldsOfStudy,publicationDate,tldr"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        return {
            "paperId": response_data.get("paperId"),
            "title": response_data.get("title"),
            "abstract": response_data.get("abstract"),
            "year": response_data.get("year"),
            "authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
                        for author in response_data.get("authors", [])],
            "url": response_data.get("url"),
            "venue": response_data.get("venue"),
            "publicationTypes": response_data.get("publicationTypes"),
            "citationCount": response_data.get("citationCount"),
            "referenceCount": response_data.get("referenceCount"),
            "influentialCitationCount": response_data.get("influentialCitationCount"),
            "fieldsOfStudy": response_data.get("fieldsOfStudy"),
            "publicationDate": response_data.get("publicationDate"),
            "tldr": {
                "model": response_data.get("tldr", {}).get("model", ""),
                "text": response_data.get("tldr", {}).get("text", "")
            } if response_data.get("tldr") else None
        }
    except Exception as e:
        logger.error(f"Error getting paper details for {paper_id}: {e}")
        return {"error": f"Failed to get paper details: {e}"}

def get_author_details(author_id: str) -> Dict[str, Any]:
    """Get details of a specific author."""
    url = f"{BASE_URL}/author/{author_id}"
    params = {
        "fields": "authorId,name,url,affiliations,paperCount,citationCount,hIndex"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        return {
            "authorId": response_data.get("authorId"),
            "name": response_data.get("name"),
            "url": response_data.get("url"),
            "affiliations": response_data.get("affiliations"),
            "paperCount": response_data.get("paperCount"),
            "citationCount": response_data.get("citationCount"),
            "hIndex": response_data.get("hIndex")
        }
    except Exception as e:
        logger.error(f"Error getting author details for {author_id}: {e}")
        return {"error": f"Failed to get author details: {e}"}

def get_paper_citations(paper_id: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Get citations for a specific paper."""
    url = f"{BASE_URL}/paper/{paper_id}/citations"
    params = {
        "limit": min(limit, 100),  # API limit is 100
        "fields": "contexts,isInfluential,title,authors,year,venue"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        citations = response_data.get("data", [])
        return [
            {
                "contexts": citation.get("contexts", []),
                "isInfluential": citation.get("isInfluential"),
                "citingPaper": {
                    "paperId": citation.get("citingPaper", {}).get("paperId"),
                    "title": citation.get("citingPaper", {}).get("title"),
                    "authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
                                for author in citation.get("citingPaper", {}).get("authors", [])],
                    "year": citation.get("citingPaper", {}).get("year"),
                    "venue": citation.get("citingPaper", {}).get("venue")
                }
            } for citation in citations
        ]
    except Exception as e:
        logger.error(f"Error getting citations for {paper_id}: {e}")
        return []

def get_paper_references(paper_id: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Get references for a specific paper."""
    url = f"{BASE_URL}/paper/{paper_id}/references"
    params = {
        "limit": min(limit, 100),  # API limit is 100
        "fields": "contexts,isInfluential,title,authors,year,venue"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        references = response_data.get("data", [])
        return [
            {
                "contexts": reference.get("contexts", []),
                "isInfluential": reference.get("isInfluential"),
                "citedPaper": {
                    "paperId": reference.get("citedPaper", {}).get("paperId"),
                    "title": reference.get("citedPaper", {}).get("title"),
                    "authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
                                for author in reference.get("citedPaper", {}).get("authors", [])],
                    "year": reference.get("citedPaper", {}).get("year"),
                    "venue": reference.get("citedPaper", {}).get("venue")
                }
            } for reference in references
        ]
    except Exception as e:
        logger.error(f"Error getting references for {paper_id}: {e}")
        return []

def get_citations_and_references(paper_id: str) -> Dict[str, List[Dict[str, Any]]]:
    """Get citations and references for a paper using paper ID."""
    citations = get_paper_citations(paper_id)
    references = get_paper_references(paper_id)
    return {
        "citations": citations,
        "references": references
    }

def search_authors(query: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Search for authors using a query string."""
    url = f"{BASE_URL}/author/search"
    params = {
        "query": query,
        "limit": min(limit, 100),  # API limit is 100
        "fields": "authorId,name,url,affiliations,paperCount,citationCount,hIndex"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        authors = response_data.get("data", [])
        return [
            {
                "authorId": author.get("authorId"),
                "name": author.get("name"),
                "url": author.get("url"),
                "affiliations": author.get("affiliations"),
                "paperCount": author.get("paperCount"),
                "citationCount": author.get("citationCount"),
                "hIndex": author.get("hIndex")
            } for author in authors
        ]
    except Exception as e:
        logger.error(f"Error searching authors: {e}")
        return []

def search_paper_match(query: str) -> Dict[str, Any]:
    """Find the best matching paper using title-based search."""
    url = f"{BASE_URL}/paper/search/match"
    params = {
        "query": query,
        "fields": "paperId,title,abstract,year,authors,url,venue,publicationTypes,citationCount,tldr"
    }
    try:
        response_data = make_request_with_retry(url, params=params)
        if response_data.get("data"):
            paper = response_data["data"][0]  # Endpoint returns the single best match
            return {
                "matchScore": paper.get("matchScore"),
                "paperId": paper.get("paperId"),
                "title": paper.get("title"),
                "abstract": paper.get("abstract"),
                "year": paper.get("year"),
                "authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
                            for author in paper.get("authors", [])],
                "url": paper.get("url"),
                "venue": paper.get("venue"),
                "publicationTypes": paper.get("publicationTypes"),
                "citationCount": paper.get("citationCount"),
                "tldr": {
                    "model": paper.get("tldr", {}).get("model", ""),
                    "text": paper.get("tldr", {}).get("text", "")
                } if paper.get("tldr") else None
            }
        else:
            return {"error": "No matching paper found"}
    except Exception as e:
        logger.error(f"Error finding paper match: {e}")
        return {"error": f"Failed to find paper match: {e}"}


def get_paper_autocomplete(query: str) -> List[Dict[str, Any]]:
"""Get paper title autocompletion suggestions."""
url = f"{BASE_URL}/paper/autocomplete"
params = {
"query": query[:100] # API truncates to 100 characters
}
try:
response_data = make_request_with_retry(url, params=params)
matches = response_data.get("matches", [])
return [
{
"id": match.get("id"),
"title": match.get("title"),
"authorsYear": match.get("authorsYear")
} for match in matches
]
except Exception as e:
logger.error(f"Error getting autocomplete: {e}")
return []


def get_papers_batch(paper_ids: List[str]) -> List[Dict[str, Any]]:
"""Get details for multiple papers using batch API."""
url = f"{BASE_URL}/paper/batch"
# API limit is 500 papers at a time
if len(paper_ids) > 500:
paper_ids = paper_ids[:500]
        logger.warning("Paper IDs list truncated to 500 items (API limit)")
params = {
"fields": "paperId,title,abstract,year,authors,url,venue,publicationTypes,citationCount,referenceCount,influentialCitationCount,fieldsOfStudy,publicationDate,tldr"
}
json_data = {"ids": paper_ids}
try:
response_data = make_request_with_retry(url, params=params, json_data=json_data, method="POST")
if isinstance(response_data, list):
return [
{
"paperId": paper.get("paperId"),
"title": paper.get("title"),
"abstract": paper.get("abstract"),
"year": paper.get("year"),
"authors": [{"name": author.get("name"), "authorId": author.get("authorId")}
for author in paper.get("authors", [])],
"url": paper.get("url"),
"venue": paper.get("venue"),
"publicationTypes": paper.get("publicationTypes"),
"citationCount": paper.get("citationCount"),
"referenceCount": paper.get("referenceCount"),
"influentialCitationCount": paper.get("influentialCitationCount"),
"fieldsOfStudy": paper.get("fieldsOfStudy"),
"publicationDate": paper.get("publicationDate"),
"tldr": {
"model": paper.get("tldr", {}).get("model", ""),
"text": paper.get("tldr", {}).get("text", "")
} if paper.get("tldr") else None
} for paper in response_data if paper # Filter out None entries
]
else:
return []
except Exception as e:
logger.error(f"Error getting papers batch: {e}")
return []


def get_authors_batch(author_ids: List[str]) -> List[Dict[str, Any]]:
"""Get details for multiple authors using batch API."""
url = f"{BASE_URL}/author/batch"
# API limit is 1000 authors at a time
if len(author_ids) > 1000:
author_ids = author_ids[:1000]
        logger.warning("Author IDs list truncated to 1000 items (API limit)")
params = {
"fields": "authorId,name,url,affiliations,paperCount,citationCount,hIndex"
}
json_data = {"ids": author_ids}
try:
response_data = make_request_with_retry(url, params=params, json_data=json_data, method="POST")
if isinstance(response_data, list):
return [
{
"authorId": author.get("authorId"),
"name": author.get("name"),
"url": author.get("url"),
"affiliations": author.get("affiliations"),
"paperCount": author.get("paperCount"),
"citationCount": author.get("citationCount"),
"hIndex": author.get("hIndex")
} for author in response_data if author # Filter out None entries
]
else:
return []
except Exception as e:
logger.error(f"Error getting authors batch: {e}")
return []


def search_snippets(query: str, limit: int = 10) -> List[Dict[str, Any]]:
"""Search for text snippets from papers."""
url = f"{BASE_URL}/snippet/search"
params = {
"query": query,
"limit": min(limit, 1000), # API limit is 1000
"fields": "snippet.text,snippet.snippetKind,snippet.section,snippet.snippetOffset"
}
try:
response_data = make_request_with_retry(url, params=params)
data = response_data.get("data", [])
return [
{
"score": item.get("score"),
"snippet": {
"text": item.get("snippet", {}).get("text"),
"snippetKind": item.get("snippet", {}).get("snippetKind"),
"section": item.get("snippet", {}).get("section"),
"snippetOffset": item.get("snippet", {}).get("snippetOffset")
},
"paper": {
"corpusId": item.get("paper", {}).get("corpusId"),
"title": item.get("paper", {}).get("title"),
"authors": item.get("paper", {}).get("authors", [])
}
} for item in data
]
except Exception as e:
logger.error(f"Error searching snippets: {e}")
return []


def get_paper_recommendations_from_lists(positive_paper_ids: List[str], negative_paper_ids: List[str] | None = None, limit: int = 10) -> List[Dict[str, Any]]:
"""Get recommended papers based on lists of positive and negative example papers."""
url = f"{BASE_RECOMMENDATION_URL}/papers"
# Prepare the request payload
payload = {
"positivePaperIds": positive_paper_ids
}
if negative_paper_ids:
payload["negativePaperIds"] = negative_paper_ids
params = {
"limit": min(limit, 500),
"fields": "paperId,corpusId,externalIds,url,title,abstract,venue,publicationVenue,year,referenceCount,citationCount,influentialCitationCount,isOpenAccess,openAccessPdf,fieldsOfStudy,s2FieldsOfStudy,publicationTypes,publicationDate,journal,citationStyles,authors"
}
try:
response_data = make_request_with_retry(url, params=params, json_data=payload, method="POST")
# Handle response structure with recommendedPapers wrapper
papers = response_data.get("recommendedPapers", [])
return [
{
"paperId": paper.get("paperId"),
"corpusId": paper.get("corpusId"),
"externalIds": paper.get("externalIds"),
"url": paper.get("url"),
"title": paper.get("title"),
"abstract": paper.get("abstract"),
"venue": paper.get("venue"),
"publicationVenue": paper.get("publicationVenue"),
"year": paper.get("year"),
"referenceCount": paper.get("referenceCount"),
"citationCount": paper.get("citationCount"),
"influentialCitationCount": paper.get("influentialCitationCount"),
"isOpenAccess": paper.get("isOpenAccess"),
"openAccessPdf": paper.get("openAccessPdf"),
"fieldsOfStudy": paper.get("fieldsOfStudy"),
"s2FieldsOfStudy": paper.get("s2FieldsOfStudy"),
"publicationTypes": paper.get("publicationTypes"),
"publicationDate": paper.get("publicationDate"),
"journal": paper.get("journal"),
"citationStyles": paper.get("citationStyles"),
"authors": [
{
"authorId": author.get("authorId"),
"name": author.get("name")
} for author in paper.get("authors", [])
]
} for paper in papers
]
except Exception as e:
logger.error(f"Error getting paper recommendations from lists: {e}")
return []


def get_paper_recommendations(paper_id: str, limit: int = 10) -> List[Dict[str, Any]]:
"""Get recommended papers for a single positive example paper."""
url = f"{BASE_RECOMMENDATION_URL}/papers/forpaper/{paper_id}"
params = {
"limit": min(limit, 500), # API typical limit
"fields": "paperId,corpusId,externalIds,url,title,abstract,venue,publicationVenue,year,referenceCount,citationCount,influentialCitationCount,isOpenAccess,openAccessPdf,fieldsOfStudy,s2FieldsOfStudy,publicationTypes,publicationDate,journal,citationStyles,authors"
}
try:
response_data = make_request_with_retry(url, params=params)
# Handle response structure with recommendedPapers wrapper
papers = response_data.get("recommendedPapers", [])
return [
{
"paperId": paper.get("paperId"),
"corpusId": paper.get("corpusId"),
"externalIds": paper.get("externalIds"),
"url": paper.get("url"),
"title": paper.get("title"),
"abstract": paper.get("abstract"),
"venue": paper.get("venue"),
"publicationVenue": paper.get("publicationVenue"),
"year": paper.get("year"),
"referenceCount": paper.get("referenceCount"),
"citationCount": paper.get("citationCount"),
"influentialCitationCount": paper.get("influentialCitationCount"),
"isOpenAccess": paper.get("isOpenAccess"),
"openAccessPdf": paper.get("openAccessPdf"),
"fieldsOfStudy": paper.get("fieldsOfStudy"),
"s2FieldsOfStudy": paper.get("s2FieldsOfStudy"),
"publicationTypes": paper.get("publicationTypes"),
"publicationDate": paper.get("publicationDate"),
"journal": paper.get("journal"),
"citationStyles": paper.get("citationStyles"),
"authors": [
{
"authorId": author.get("authorId"),
"name": author.get("name")
} for author in paper.get("authors", [])
]
} for paper in papers
]
except Exception as e:
logger.error(f"Error getting paper recommendations for {paper_id}: {e}")
return []


def main():
"""Test function for the API client."""
try:
# Search for papers
search_results = search_papers("machine learning", limit=2)
print(f"Search results: {search_results}")
# Get paper details
if search_results:
paper_id = search_results[0]['paperId']
if paper_id:
paper_details = get_paper_details(paper_id)
print(f"Paper details: {paper_details}")
# Get citations and references
citations_refs = get_citations_and_references(paper_id)
print(f"Citations count: {len(citations_refs['citations'])}")
print(f"References count: {len(citations_refs['references'])}")
# Get author details
author_id = "1741101" # Example author ID
author_details = get_author_details(author_id)
print(f"Author details: {author_details}")
# Search for authors
author_search_results = search_authors("john", limit=2)
print(f"Author search results: {author_search_results}")
# Find paper match
if search_results:
paper_title = search_results[0]['title']
paper_match = search_paper_match(paper_title)
print(f"Paper match: {paper_match}")
# Get paper autocomplete
if search_results:
paper_query = search_results[0]['title'][:10] # First 10 characters
autocomplete_results = get_paper_autocomplete(paper_query)
print(f"Autocomplete results: {autocomplete_results}")
# Get papers batch
if search_results:
paper_ids = [paper['paperId'] for paper in search_results]
papers_batch = get_papers_batch(paper_ids)
print(f"Papers batch: {papers_batch}")
# Get authors batch
if author_search_results:
author_ids = [author['authorId'] for author in author_search_results]
authors_batch = get_authors_batch(author_ids)
print(f"Authors batch: {authors_batch}")
# Search snippets
if search_results:
snippet_query = search_results[0]['title']
snippets = search_snippets(snippet_query, limit=2)
print(f"Snippets: {snippets}")
# Get paper recommendations from lists
if search_results:
positive_paper_ids = [search_results[0]['paperId']]
negative_paper_ids = [search_results[1]['paperId']] # Just for testing, may not be relevant
recommendations = get_paper_recommendations_from_lists(positive_paper_ids, negative_paper_ids, limit=2)
print(f"Recommendations from lists: {recommendations}")
# Get paper recommendations single
if search_results:
paper_id = search_results[0]['paperId']
single_recommendations = get_paper_recommendations(paper_id, limit=2)
print(f"Single paper recommendations: {single_recommendations}")
except Exception as e:
print(f"An error occurred: {e}")


if __name__ == "__main__":
main()
```
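
Note that `get_papers_batch` and `get_authors_batch` above simply truncate oversized ID lists to the API's per-request caps (500 papers, 1000 authors). A caller that needs more than one request's worth can instead chunk the IDs client-side and issue several batch calls. A minimal sketch — the `chunked` helper is illustrative and not part of `search.py`:

```python
from typing import Iterator, List

def chunked(ids: List[str], size: int = 500) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` IDs (500 matches the paper batch cap)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# 1200 IDs split into request-sized chunks of 500, 500, and 200.
chunk_sizes = [len(chunk) for chunk in chunked([f"id{i}" for i in range(1200)])]
print(chunk_sizes)  # [500, 500, 200]
```

Each chunk could then be passed to `get_papers_batch` and the per-chunk results concatenated, instead of silently dropping everything past the cap.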