zzaebok/mcp-wikidata # codebase.md

# Directory Structure

```
├── .gitignore
├── .python-version
├── Dockerfile
├── LICENSE
├── pyproject.toml
├── README.md
├── smithery.yaml
├── src
│   ├── client.py
│   └── server.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------

```
3.11

```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
## Wikidata MCP Server

[![smithery badge](https://smithery.ai/badge/@zzaebok/mcp-wikidata)](https://smithery.ai/server/@zzaebok/mcp-wikidata)

A Model Context Protocol (MCP) server for the Wikidata API.
It provides tools for interacting with Wikidata: **searching identifiers** (entities and properties), **extracting metadata** (labels and descriptions), and **executing SPARQL queries**.

---

### Installation

#### Installing via Smithery

To install Wikidata MCP Server for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@zzaebok/mcp-wikidata):

```bash
npx -y @smithery/cli install @zzaebok/mcp-wikidata --client claude
```

#### Installing Manually
Install `uv` if it is not installed yet.

```bash
$ curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then, install dependencies.

```bash
$ git clone https://github.com/zzaebok/mcp-wikidata.git
$ cd mcp-wikidata
$ uv sync
# optionally, also install the dependencies for the client example
$ uv sync --extra example
```

---

### Run

Run the server with:

```bash
$ uv run src/server.py
```

To try the server with a simple example client (built on `langchain-mcp-adapters`), run the command below. The client launches its own copy of the server over stdio, so the server does not need to be started separately:

```bash
$ uv run src/client.py
```

In this example, the LLM looks up valid entity and property identifiers, executes a SPARQL query, and finally recommends a movie directed by Bong Joon-ho.

<details>
  <summary><i>See the execution output</i></summary>

```python
{
  "messages": [
      HumanMessage(
          content="Can you recommend me a movie directed by Bong Joonho?",
      ),
      AIMessage(
          tool_calls=[
              {
                  "name": "search_entity",
                  "args": {"query": "Bong Joon-ho"},
              }
          ],
      ),
      ToolMessage(
          content="Q495980",
          name="search_entity",
      ),
      AIMessage(
          tool_calls=[
              {
                  "name": "get_properties",
                  "args": {"entity_id": "Q495980"},
              }
          ],
      ),
      ToolMessage(
          content='["P345", "P244", "P214", "P227", ...]',
          name="get_properties",
      ),
      AIMessage(
          tool_calls=[
              {
                  "name": "search_property",
                  "args": {"query": "director"},
              }
          ],
      ),
      ToolMessage(
          content="P57",
          name="search_property",
      ),
      AIMessage(
          tool_calls=[
              {
                  "name": "execute_sparql",
                  "args": {
                      "sparql_query": 'SELECT ?film ?filmLabel WHERE {\n  ?film wdt:P57 wd:Q495980.\n  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n} LIMIT 1'
                  },
              }
          ],
      ),
      ToolMessage(
          content='[{"film": {"type": "uri", "value": "http://www.wikidata.org/entity/Q483761"}, "filmLabel": {"xml:lang": "en", "type": "literal", "value": "Mother"}}]',
          name="execute_sparql",
      ),
      AIMessage(
          content='I recommend the movie "Mother," which was directed by Bong Joon-ho.',
      ),
  ]
}
```

</details>
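
The final `execute_sparql` call in the trace only combines the two identifiers returned by the search tools (`P57` for director, `Q495980` for Bong Joon-ho). As a rough sketch, the same query pattern can be assembled in Python; `build_works_query` is a hypothetical helper for illustration and is not part of `src/server.py`:

```python
import re


def build_works_query(property_id: str, entity_id: str, language: str = "en", limit: int = 1) -> str:
    """Return a SPARQL query for items whose `property_id` statement points to `entity_id`.

    Hypothetical helper, not part of src/server.py; it mirrors the query in the trace above.
    """
    # Accept only well-formed IDs so that arbitrary text cannot be spliced into the query
    if not re.fullmatch(r"P\d+", property_id) or not re.fullmatch(r"Q\d+", entity_id):
        raise ValueError(f"unexpected IDs: {property_id!r}, {entity_id!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:{property_id} wd:{entity_id}.\n"
        # The label service fills ?itemLabel in the requested language
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        f"}} LIMIT {limit}"
    )


print(build_works_query("P57", "Q495980"))
```

Validating the IDs before interpolating them keeps the query well-formed even when a search tool returns its "No results found" message instead of an ID.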

---

### Wikidata MCP Tools

The following tools are implemented in the server:

| Tool                                                 | Description                                                                                                |
| ---------------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| `search_entity(query: str)`                          | Search for the Wikidata entity ID (e.g. `Q495980`) that best matches the query.                            |
| `search_property(query: str)`                        | Search for the Wikidata property ID (e.g. `P57`) that best matches the query.                              |
| `get_properties(entity_id: str)`                     | List the IDs of the properties used in the statements of the given Wikidata entity.                        |
| `execute_sparql(sparql_query: str)`                  | Execute a SPARQL query against the Wikidata Query Service and return the result bindings as JSON.          |
| `get_metadata(entity_id: str, language: str = "en")` | Retrieve the label and description of the given Wikidata entity in the requested language (default: English). |
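
`execute_sparql` returns the raw `bindings` list from the query service as a JSON string, in which each variable maps to an object with `type` and `value` keys (plus `xml:lang` for labels). A client that only needs the plain values could flatten it along these lines; `flatten_bindings` is a hypothetical helper, and the sample input is the `execute_sparql` output from the example trace above:

```python
import json


def flatten_bindings(result_json: str) -> list[dict[str, str]]:
    """Reduce SPARQL JSON bindings to {variable: value} dicts (hypothetical helper)."""
    rows = json.loads(result_json)
    # Each binding maps a variable name to {"type": ..., "value": ...}; keep only the value
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]


sample = '[{"film": {"type": "uri", "value": "http://www.wikidata.org/entity/Q483761"}, "filmLabel": {"xml:lang": "en", "type": "literal", "value": "Mother"}}]'
for row in flatten_bindings(sample):
    print(row["filmLabel"], row["film"])
```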

---

### License

MIT License

```

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------

```toml
[project]
name = "mcp-wikidata"
version = "0.1.0"
description = "MCP Wikidata Server"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "mcp[cli]>=1.4.1",
    "httpx>=0.28.1",
]
authors = [{ name = "Jaebok Lee" }]

[project.optional-dependencies]
example = [
    "black>=25.1.0",
    "langchain-openai>=0.3.11",
    "langgraph>=0.3.21",
    "langchain-core>=0.3.49",
    "langchain-mcp-adapters>=0.0.6",
]

```

--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------

```yaml
# Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    properties: {}
    default: {}
  commandFunction:
    # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
    |-
    (config) => ({
      command: 'uv',
      args: ['run', 'src/server.py'],
      env: {}
    })
  exampleConfig: {}

```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
# Generated by https://smithery.ai. See: https://smithery.ai/docs/config#dockerfile
FROM python:3.11-slim

# Install curl for installation of uv
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Install uv - see https://astral.sh/uv
RUN curl -LsSf https://astral.sh/uv/install.sh | sh

# Add local binary directory to PATH
ENV PATH="/root/.local/bin:$PATH"

# Set working directory
WORKDIR /app

# Copy requirements and project files
COPY pyproject.toml ./
COPY README.md ./
COPY src/ ./src/
COPY uv.lock ./

# Install the locked project dependencies into /app/.venv with uv
RUN uv sync --frozen

# Command to run the MCP server
CMD ["uv", "run", "src/server.py"]

```

--------------------------------------------------------------------------------
/src/client.py:
--------------------------------------------------------------------------------

```python
import os
import sys

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

# Use the key from the environment if it is already set; otherwise fall back to a placeholder
os.environ.setdefault("OPENAI_API_KEY", "your-api-key")

model = ChatOpenAI(model="gpt-4o")

server_py = os.path.join(os.path.dirname(os.path.abspath(__file__)), "server.py")
# Launch the server with the same interpreter (and virtual environment) as this client
server_params = StdioServerParameters(
    command=sys.executable,
    args=[server_py],
)


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the connection
            await session.initialize()

            # Get tools
            tools = await load_mcp_tools(session)

            # Create and run the agent
            agent = create_react_agent(
                model,
                tools,
                prompt="You are a helpful assistant. Answer the user's questions based on Wikidata.",
            )
            agent_response = await agent.ainvoke(
                {
                    "messages": "Can you recommend me a movie directed by Bong Joonho?",
                }
            )
            print(agent_response)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

```

--------------------------------------------------------------------------------
/src/server.py:
--------------------------------------------------------------------------------

```python
# reference: https://github.com/langchain-ai/langchain/blob/master/cookbook/wikibase_agent.ipynb
import httpx
import json
from mcp.server.fastmcp import FastMCP
from typing import List, Dict

server = FastMCP("Wikidata MCP Server")

WIKIDATA_URL = "https://www.wikidata.org/w/api.php"
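# Wikimedia's User-Agent policy asks API clients to send a descriptive User-Agent with contact information
HEADER = {
    "Accept": "application/json",
    "User-Agent": "mcp-wikidata/0.1.0 (https://github.com/zzaebok/mcp-wikidata)",
}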


async def search_wikidata(query: str, is_entity: bool = True) -> str:
    """
    Search for a Wikidata item or property ID by its query.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 0 if is_entity else 120,
        "srlimit": 1,  # TODO: add a parameter to limit the number of results?
        "srqiprofile": "classic_noboostlinks" if is_entity else "classic",
        "srwhat": "text",
        "format": "json",
    }
    async with httpx.AsyncClient() as client:
        response = await client.get(WIKIDATA_URL, headers=HEADER, params=params)
    response.raise_for_status()
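    try:
        title = response.json()["query"]["search"][0]["title"]
        # Property titles carry a namespace prefix ("Property:P57"); item titles are bare IDs
        title = title.split(":")[-1]
        return title
    except (KeyError, IndexError):
        # IndexError: the search succeeded but returned no hits
        return "No results found. Consider changing the search term."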


@server.tool()
async def search_entity(query: str) -> str:
    """
    Search for a Wikidata entity ID by its query.

    Args:
        query (str): The query to search for. The query should be unambiguous enough to uniquely identify the entity.

    Returns:
        str: The Wikidata entity ID corresponding to the given query.
    """
    return await search_wikidata(query, is_entity=True)


@server.tool()
async def search_property(query: str) -> str:
    """
    Search for a Wikidata property ID by its query.

    Args:
        query (str): The query to search for. The query should be unambiguous enough to uniquely identify the property.

    Returns:
        str: The Wikidata property ID corresponding to the given query.
    """
    return await search_wikidata(query, is_entity=False)


@server.tool()
async def get_properties(entity_id: str) -> List[str]:
    """
    Get the properties associated with a given Wikidata entity ID.

    Args:
        entity_id (str): The entity ID to retrieve properties for. This should be a valid Wikidata entity ID.

    Returns:
        list: A list of property IDs associated with the given entity ID. If no properties are found, an empty list is returned.
    """
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "claims",
        "format": "json",
    }
    async with httpx.AsyncClient() as client:
        response = await client.get(WIKIDATA_URL, headers=HEADER, params=params)
    response.raise_for_status()
    data = response.json()
    return list(data.get("entities", {}).get(entity_id, {}).get("claims", {}).keys())


@server.tool()
async def execute_sparql(sparql_query: str) -> str:
    """
    Execute a SPARQL query on Wikidata.

    You may assume the following prefixes:
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX p: <http://www.wikidata.org/prop/>
    PREFIX ps: <http://www.wikidata.org/prop/statement/>

    Args:
        sparql_query (str): The SPARQL query to execute.

    Returns:
        str: The JSON-formatted list of result bindings. If there are no results, an empty JSON array ("[]") is returned.
    """
    url = "https://query.wikidata.org/sparql"
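    async with httpx.AsyncClient() as client:
        response = await client.get(
            url,
            params={"query": sparql_query, "format": "json"},
            # The query service also expects a descriptive User-Agent
            headers={"User-Agent": HEADER["User-Agent"]},
        )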
    response.raise_for_status()
    result = response.json()["results"]["bindings"]
    return json.dumps(result)


@server.tool()
async def get_metadata(entity_id: str, language: str = "en") -> Dict[str, str]:
    """
    Retrieve the label and description for a given Wikidata entity ID in the requested language.

    Args:
        entity_id (str): The entity ID to retrieve metadata for.
        language (str): The language code for the label and description (default is "en"). Use ISO 639-1 codes.

    Returns:
        dict: A dictionary containing the label and description of the entity, if available.
    """
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels|descriptions",
        "languages": language,  # specify the desired language
        "format": "json",
    }
    async with httpx.AsyncClient() as client:
        response = await client.get(WIKIDATA_URL, headers=HEADER, params=params)
    response.raise_for_status()
    data = response.json()
    entity_data = data.get("entities", {}).get(entity_id, {})
    label = (
        entity_data.get("labels", {}).get(language, {}).get("value", "No label found")
    )
    descriptions = (
        entity_data.get("descriptions", {})
        .get(language, {})
        .get("value", "No label found")
    )
    return {"Label": label, "Descriptions": descriptions}


if __name__ == "__main__":
    server.run()

```
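
`search_wikidata` reads only the first hit of the MediaWiki `list=search` response and keeps the part of its title after the last colon, so a property hit titled `Property:P57` becomes `P57`, while an item hit such as `Q495980` passes through unchanged. The offline sketch below isolates that parsing step and returns `None` when the hit list is empty, the case in which indexing `[0]` raises `IndexError` rather than `KeyError`. `parse_search_title` is a hypothetical name, and the payloads are hand-built to match the shape the server indexes into, not captured API responses:

```python
from typing import Any, Optional


def parse_search_title(payload: dict[str, Any]) -> Optional[str]:
    """Return the ID from the first search hit, or None if there is none (hypothetical helper)."""
    hits = payload.get("query", {}).get("search", [])
    if not hits:
        return None
    # Property pages (namespace 120) are titled "Property:P57";
    # item pages (namespace 0) are titled with the bare ID, e.g. "Q495980"
    return hits[0]["title"].split(":")[-1]


print(parse_search_title({"query": {"search": [{"title": "Property:P57"}]}}))
print(parse_search_title({"query": {"search": []}}))
```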