allenday/solr-mcp # codebase.md

This is page 3 of 3. Use http://codebase.md/allenday/solr-mcp?page={x} to view the full context.

# Directory Structure

```
├── .flake8
├── .gitignore
├── CHANGELOG.md
├── CLAUDE.md
├── CONTRIBUTING.md
├── data
│   ├── bitcoin-whitepaper.json
│   ├── bitcoin-whitepaper.md
│   └── README.md
├── docker-compose.yml
├── LICENSE
├── poetry.lock
├── pyproject.toml
├── QUICKSTART.md
├── README.md
├── scripts
│   ├── check_solr.py
│   ├── create_test_collection.py
│   ├── create_unified_collection.py
│   ├── demo_hybrid_search.py
│   ├── demo_search.py
│   ├── diagnose_search.py
│   ├── direct_mcp_test.py
│   ├── format.py
│   ├── index_documents.py
│   ├── lint.py
│   ├── prepare_data.py
│   ├── process_markdown.py
│   ├── README.md
│   ├── setup.sh
│   ├── simple_index.py
│   ├── simple_mcp_test.py
│   ├── simple_search.py
│   ├── unified_index.py
│   ├── unified_search.py
│   ├── vector_index_simple.py
│   ├── vector_index.py
│   └── vector_search.py
├── solr_config
│   └── unified
│       └── conf
│           ├── schema.xml
│           ├── solrconfig.xml
│           ├── stopwords.txt
│           └── synonyms.txt
├── solr_mcp
│   ├── __init__.py
│   ├── server.py
│   ├── solr
│   │   ├── __init__.py
│   │   ├── client.py
│   │   ├── collections.py
│   │   ├── config.py
│   │   ├── constants.py
│   │   ├── exceptions.py
│   │   ├── interfaces.py
│   │   ├── query
│   │   │   ├── __init__.py
│   │   │   ├── builder.py
│   │   │   ├── executor.py
│   │   │   ├── parser.py
│   │   │   └── validator.py
│   │   ├── response.py
│   │   ├── schema
│   │   │   ├── __init__.py
│   │   │   ├── cache.py
│   │   │   └── fields.py
│   │   ├── utils
│   │   │   ├── __init__.py
│   │   │   └── formatting.py
│   │   ├── vector
│   │   │   ├── __init__.py
│   │   │   ├── manager.py
│   │   │   └── results.py
│   │   └── zookeeper.py
│   ├── tools
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── solr_default_vectorizer.py
│   │   ├── solr_list_collections.py
│   │   ├── solr_list_fields.py
│   │   ├── solr_select.py
│   │   ├── solr_semantic_select.py
│   │   ├── solr_vector_select.py
│   │   └── tool_decorator.py
│   ├── utils.py
│   └── vector_provider
│       ├── __init__.py
│       ├── clients
│       │   ├── __init__.py
│       │   └── ollama.py
│       ├── constants.py
│       ├── exceptions.py
│       └── interfaces.py
├── solr.Dockerfile
└── tests
    ├── __init__.py
    ├── integration
    │   ├── __init__.py
    │   └── test_direct_solr.py
    └── unit
        ├── __init__.py
        ├── conftest.py
        ├── fixtures
        │   ├── __init__.py
        │   ├── common.py
        │   ├── config_fixtures.py
        │   ├── http_fixtures.py
        │   ├── server_fixtures.py
        │   ├── solr_fixtures.py
        │   ├── time_fixtures.py
        │   ├── vector_fixtures.py
        │   └── zookeeper_fixtures.py
        ├── solr
        │   ├── schema
        │   │   └── test_fields.py
        │   ├── test_client.py
        │   ├── test_config.py
        │   ├── utils
        │   │   └── test_formatting.py
        │   └── vector
        │       └── test_results.py
        ├── test_cache.py
        ├── test_client.py
        ├── test_config.py
        ├── test_formatting.py
        ├── test_interfaces.py
        ├── test_parser.py
        ├── test_query.py
        ├── test_schema.py
        ├── test_utils.py
        ├── test_validator.py
        ├── test_vector.py
        ├── test_zookeeper.py
        ├── tools
        │   ├── test_base.py
        │   ├── test_init.py
        │   ├── test_solr_default_vectorizer.py
        │   ├── test_solr_list_collections.py
        │   ├── test_solr_list_fields.py
        │   ├── test_tool_decorator.py
        │   └── test_tools.py
        └── vector_provider
            ├── test_constants.py
            ├── test_exceptions.py
            ├── test_interfaces.py
            └── test_ollama.py
```

# Files

--------------------------------------------------------------------------------
/tests/unit/test_schema.py:
--------------------------------------------------------------------------------

```python
"""Unit tests for FieldManager."""

from unittest.mock import MagicMock, Mock, patch

import pytest

from solr_mcp.solr.constants import FIELD_TYPE_MAPPING, SYNTHETIC_SORT_FIELDS
from solr_mcp.solr.exceptions import SchemaError
from solr_mcp.solr.schema import FieldCache, FieldManager


@pytest.fixture
def mock_schema_requests():
    """Mock requests module for schema tests."""
    with patch("solr_mcp.solr.schema.fields.requests") as mock_requests:
        mock_response = MagicMock()
        mock_response.status_code = 200
        mock_response.json.return_value = {
            "schema": {
                "fieldTypes": [
                    {
                        "name": "string",
                        "class": "solr.StrField",
                        "sortMissingLast": True,
                    },
                    {
                        "name": "text_general",
                        "class": "solr.TextField",
                        "positionIncrementGap": "100",
                    },
                    {
                        "name": "knn_vector",
                        "class": "solr.DenseVectorField",
                        "vectorDimension": 768,
                    },
                ],
                "fields": [
                    {
                        "name": "id",
                        "type": "string",
                        "required": True,
                        "multiValued": False,
                    },
                    {"name": "title", "type": "text_general", "multiValued": False},
                    {"name": "content", "type": "text_general", "multiValued": False},
                    {"name": "vector", "type": "knn_vector", "multiValued": False},
                ],
            }
        }
        mock_requests.get.return_value = mock_response
        yield mock_requests


@pytest.fixture
def field_manager():
    """Create FieldManager instance."""
    return FieldManager("http://localhost:8983/solr")


class TestFieldManager:
    """Test cases for FieldManager."""

    def test_init(self, field_manager):
        """Test FieldManager initialization."""
        assert field_manager.solr_base_url == "http://localhost:8983/solr"

    def test_get_schema_success(self, field_manager, mock_schema_requests):
        """Test successful schema retrieval."""
        schema = field_manager.get_schema("test_collection")
        assert "fieldTypes" in schema
        assert "fields" in schema

    def test_get_schema_error(self, field_manager, mock_schema_requests):
        """Test schema retrieval error handling."""
        mock_schema_requests.get.return_value.status_code = 500
        mock_schema_requests.get.return_value.text = "Internal Server Error"
        mock_schema_requests.get.return_value.raise_for_status.side_effect = Exception(
            "Server error"
        )

        with pytest.raises(SchemaError):
            field_manager.get_schema("test_collection")

    def test_get_field_types_success(self, field_manager, mock_schema_requests):
        """Test successful field types retrieval."""
        field_types = field_manager.get_field_types("test_collection")
        assert isinstance(field_types, dict)
        assert "id" in field_types
        assert field_types["id"] == "string"

    def test_get_field_types_cache(self, field_manager, mock_schema_requests):
        """Test field types caching."""
        # First call should make HTTP request
        field_manager.get_field_types("test_collection")
        initial_call_count = mock_schema_requests.get.call_count

        # Second call should use cache
        field_manager.get_field_types("test_collection")
        assert mock_schema_requests.get.call_count == initial_call_count

    def test_get_field_type_success(self, field_manager, mock_schema_requests):
        """Test getting single field type."""
        field_type = field_manager.get_field_type("test_collection", "id")
        assert field_type == "string"

    def test_get_field_type_nonexistent(self, field_manager, mock_schema_requests):
        """Test getting nonexistent field type."""
        with pytest.raises(SchemaError) as exc_info:
            field_manager.get_field_type("test_collection", "nonexistent")
        assert "not found" in str(exc_info.value)

    def test_validate_field_exists_success(self, field_manager, mock_schema_requests):
        """Test field existence validation."""
        # Should not raise exception
        field_manager.validate_field_exists("id", "test_collection")

    def test_validate_field_exists_error(self, field_manager, mock_schema_requests):
        """Test field existence validation error."""
        with pytest.raises(SchemaError):
            field_manager.validate_field_exists("nonexistent", "test_collection")

    def test_get_searchable_fields_success(self, field_manager, mock_schema_requests):
        """Test getting searchable fields from schema API."""
        searchable_fields = field_manager._get_searchable_fields("test_collection")
        assert isinstance(searchable_fields, list)
        assert "title" in searchable_fields
        assert "content" in searchable_fields
        assert "_text_" in searchable_fields

    def test_get_searchable_fields_fallback(self, field_manager, mock_schema_requests):
        """Test getting searchable fields with fallback."""
        # Configure mock to fail first call (schema API) but succeed second call (direct URL)
        mock_schema_requests.get.side_effect = [
            Exception("Schema API error"),
            Mock(json=lambda: {"responseHeader": {"params": {"fl": "title,content"}}}),
        ]

        searchable_fields = field_manager._get_searchable_fields("test_collection")
        assert isinstance(searchable_fields, list)
        assert set(searchable_fields) == set(["title", "content", "_text_"])

    def test_get_sortable_fields_success(self, field_manager, mock_schema_requests):
        """Test getting sortable fields."""
        sortable_fields = field_manager._get_sortable_fields("test_collection")
        assert isinstance(sortable_fields, dict)
        assert "_docid_" in sortable_fields
        assert "score" in sortable_fields
        assert sortable_fields["_docid_"]["type"] == "numeric"
        assert sortable_fields["score"] == SYNTHETIC_SORT_FIELDS["score"]

    def test_get_sortable_fields_error(self, field_manager, mock_schema_requests):
        """Test getting sortable fields with error."""
        mock_schema_requests.get.side_effect = Exception("API error")

        sortable_fields = field_manager._get_sortable_fields("test_collection")
        assert isinstance(sortable_fields, dict)
        assert len(sortable_fields) == 1
        assert "score" in sortable_fields
        assert sortable_fields["score"] == SYNTHETIC_SORT_FIELDS["score"]

    def test_validate_fields_success(self, field_manager, mock_field_manager_methods):
        """Test validating fields."""
        with mock_field_manager_methods["patch_get_collection_fields"](field_manager):
            # Should not raise exception
            field_manager.validate_fields("test_collection", ["title", "id"])

    def test_validate_fields_error(self, field_manager, mock_field_manager_methods):
        """Test validating fields with invalid fields."""
        with mock_field_manager_methods["patch_get_collection_fields"](field_manager):
            with pytest.raises(SchemaError) as exc_info:
                field_manager.validate_fields("test_collection", ["nonexistent"])
            assert "Invalid fields" in str(exc_info.value)

    def test_validate_sort_fields_success(
        self, field_manager, mock_field_manager_methods
    ):
        """Test validating sort fields."""
        with mock_field_manager_methods["patch_get_collection_fields"](field_manager):
            # Should not raise exception
            field_manager.validate_sort_fields("test_collection", ["id", "score"])

    def test_validate_sort_fields_error(
        self, field_manager, mock_field_manager_methods
    ):
        """Test validating sort fields with invalid fields."""
        with mock_field_manager_methods["patch_get_collection_fields"](field_manager):
            with pytest.raises(SchemaError) as exc_info:
                field_manager.validate_sort_fields("test_collection", ["title"])
            assert "Fields not sortable" in str(exc_info.value)

    def test_get_field_info_success(self, field_manager, mock_schema_requests):
        """Test getting field information."""
        field_info = field_manager.get_field_info("test_collection")
        assert "searchable_fields" in field_info
        assert "sortable_fields" in field_info
        assert "id" in field_info["searchable_fields"]
        assert "title" in field_info["searchable_fields"]
        assert "content" in field_info["searchable_fields"]
        assert "_docid_" in field_info["sortable_fields"]
        assert "score" in field_info["sortable_fields"]

    def test_clear_cache_specific(self, field_manager, mock_schema_requests):
        """Test clearing cache for specific collection."""
        # Cache some data
        field_manager.get_schema("test_collection")
        field_manager.get_field_types("test_collection")
        initial_call_count = mock_schema_requests.get.call_count

        # Clear cache for test_collection
        field_manager.clear_cache("test_collection")

        # Should make new request
        field_manager.get_schema("test_collection")
        assert mock_schema_requests.get.call_count > initial_call_count

    def test_clear_cache_all(self, field_manager, mock_schema_requests):
        """Test clearing all cache."""
        # Cache some data
        field_manager.get_schema("test_collection")
        field_manager.get_schema("test_collection2")
        initial_call_count = mock_schema_requests.get.call_count

        # Clear all cache
        field_manager.clear_cache()

        # Should make new requests
        field_manager.get_schema("test_collection")
        field_manager.get_schema("test_collection2")
        assert mock_schema_requests.get.call_count > initial_call_count + 1

    def test_get_collection_fields_cached(self, field_manager):
        """Test getting collection fields from cache."""
        field_manager.cache = FieldCache()
        expected_info = {
            "searchable_fields": ["title", "content"],
            "sortable_fields": {"id": {}, "score": {}},
        }
        field_manager.cache.set("test_collection", expected_info)

        field_info = field_manager._get_collection_fields("test_collection")

        # Remove last_updated for comparison since it's dynamic
        field_info.pop("last_updated", None)
        assert field_info == expected_info

    def test_get_collection_fields_error_with_cache(
        self, field_manager, mock_field_manager_methods
    ):
        """Test getting collection fields with error and cache fallback."""
        field_manager.cache = FieldCache()
        with mock_field_manager_methods["patch_get_searchable_fields"](field_manager):
            field_info = field_manager._get_collection_fields("test_collection")
            assert "searchable_fields" in field_info
            assert "_text_" in field_info["searchable_fields"]
            assert "score" in field_info["sortable_fields"]

    def test_get_searchable_fields_direct_url_error(
        self, field_manager, mock_schema_requests
    ):
        """Test getting searchable fields with both API and direct URL failing."""
        mock_schema_requests.get.side_effect = [
            Exception("Schema API error"),
            Exception("Direct URL error"),
        ]

        searchable_fields = field_manager._get_searchable_fields("test_collection")
        assert set(searchable_fields) == set(["content", "title", "_text_"])

    def test_get_sortable_fields_empty_response(
        self, field_manager, mock_schema_requests
    ):
        """Test getting sortable fields with empty response."""
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = {"fields": []}
        mock_schema_requests.get.return_value = mock_response

        sortable_fields = field_manager._get_sortable_fields("test_collection")
        assert "score" in sortable_fields
        assert sortable_fields["score"] == SYNTHETIC_SORT_FIELDS["score"]

    def test_get_collection_fields_error_no_cache(
        self, field_manager, mock_field_manager_methods
    ):
        """Test getting collection fields with error and no cache."""
        field_manager.cache = FieldCache()
        with mock_field_manager_methods["patch_get_searchable_fields"](field_manager):
            field_info = field_manager._get_collection_fields("test_collection")
            assert "searchable_fields" in field_info
            assert "_text_" in field_info["searchable_fields"]
            assert "score" in field_info["sortable_fields"]

    def test_get_searchable_fields_schema_error(
        self, field_manager, mock_schema_requests
    ):
        """Test getting searchable fields with schema error."""
        mock_response = Mock()
        mock_response.status_code = 404
        mock_response.text = "Schema not found"
        mock_schema_requests.get.return_value = mock_response

        searchable_fields = field_manager._get_searchable_fields("test_collection")
        assert set(searchable_fields) == set(["content", "title", "_text_"])

    def test_get_searchable_fields_empty_response(
        self, field_manager, mock_schema_requests
    ):
        """Test getting searchable fields with empty response."""
        mock_response = Mock()
        mock_response.status_code = 200
        mock_response.json.return_value = {"fields": []}
        mock_schema_requests.get.return_value = mock_response

        searchable_fields = field_manager._get_searchable_fields("test_collection")
        assert set(searchable_fields) == set(["content", "title", "_text_"])

    def test_get_collection_fields_error_with_cache_fallback(
        self, field_manager, mock_schema_requests
    ):
        """Test getting collection fields with error and cache fallback."""
        field_manager.cache = FieldCache()
        expected_info = {
            "searchable_fields": ["title", "content"],
            "sortable_fields": {"id": {}, "score": {}},
        }
        field_manager.cache.set("test_collection", expected_info)

        mock_schema_requests.get.side_effect = Exception("API error")

        field_info = field_manager._get_collection_fields("test_collection")
        field_info.pop("last_updated", None)
        assert field_info == expected_info

```

--------------------------------------------------------------------------------
/tests/unit/solr/schema/test_fields.py:
--------------------------------------------------------------------------------

```python
"""Tests for solr_mcp.solr.schema.fields module."""

import json
from typing import Any, Dict
from unittest.mock import Mock, patch

import pytest
import requests

from solr_mcp.solr.exceptions import SchemaError
from solr_mcp.solr.schema.fields import FieldManager


@pytest.fixture
def field_manager():
    """Create a FieldManager instance."""
    return FieldManager("http://localhost:8983")


@pytest.fixture
def mock_schema_response() -> Dict[str, Any]:
    """Create a mock schema response."""
    return {
        "schema": {
            "name": "test",
            "version": 1.6,
            "uniqueKey": "id",
            "fieldTypes": [
                {
                    "name": "text_general",
                    "class": "solr.TextField",
                    "positionIncrementGap": "100",
                },
                {"name": "string", "class": "solr.StrField", "sortMissingLast": True},
            ],
            "fields": [
                {
                    "name": "id",
                    "type": "string",
                    "indexed": True,
                    "stored": True,
                    "required": True,
                    "docValues": True,
                },
                {
                    "name": "title",
                    "type": "text_general",
                    "indexed": True,
                    "stored": True,
                    "docValues": False,
                },
                {
                    "name": "sort_field",
                    "type": "string",
                    "indexed": True,
                    "stored": True,
                    "docValues": True,
                },
            ],
            "copyFields": [{"source": "title", "dest": "_text_"}],
        }
    }


@pytest.fixture
def mock_direct_response():
    """Create a mock direct response."""
    return {"responseHeader": {"params": {"fl": "title,content,_text_"}}}


def test_get_field_info_success(field_manager, mock_schema_response):
    """Test getting field info."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        field_info = field_manager.get_field_info("test_collection")
        assert "searchable_fields" in field_info
        assert "sortable_fields" in field_info
        assert set(field_info["searchable_fields"]) == {"id", "title", "sort_field"}
        assert set(field_info["sortable_fields"].keys()) >= {
            "id",
            "sort_field",
            "_docid_",
            "score",
        }


def test_get_searchable_fields_schema_api(field_manager, mock_schema_response):
    """Test getting searchable fields using schema API."""
    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = mock_schema_response

        fields = field_manager._get_searchable_fields("test_collection")

        assert "title" in fields
        assert "content" in fields
        assert "_text_" in fields


def test_get_searchable_fields_direct_url(field_manager, mock_direct_response):
    """Test getting searchable fields using direct URL."""
    with patch("requests.get") as mock_get:
        # First call fails to trigger fallback
        mock_get.side_effect = [
            requests.exceptions.RequestException("Schema API error"),
            Mock(status_code=200, json=lambda: mock_direct_response),
        ]

        fields = field_manager._get_searchable_fields("test_collection")

        assert "title" in fields
        assert "content" in fields
        assert "_text_" in fields


def test_get_searchable_fields_fallback(field_manager):
    """Test getting searchable fields with both methods failing."""
    with patch("requests.get") as mock_get:
        mock_get.side_effect = [
            requests.exceptions.RequestException("Schema API error"),
            requests.exceptions.RequestException("Direct URL error"),
        ]

        fields = field_manager._get_searchable_fields("test_collection")

        # Should return default fields
        assert fields == ["content", "title", "_text_"]


def test_get_searchable_fields_skip_special(field_manager):
    """Test skipping special fields except _text_."""
    schema_response = {
        "fields": [
            {"name": "_version_", "type": "long"},
            {"name": "_text_", "type": "text_general"},
            {"name": "_root_", "type": "string"},
            {"name": "title", "type": "text_general"},
        ]
    }

    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = schema_response

        fields = field_manager._get_searchable_fields("test_collection")

        assert "_text_" in fields
        assert "title" in fields
        assert "_version_" not in fields
        assert "_root_" not in fields


def test_get_searchable_fields_text_types(field_manager):
    """Test identifying text type fields."""
    schema_response = {
        "fields": [
            {"name": "text_field", "type": "text_general"},
            {"name": "string_field", "type": "string"},
            {"name": "custom_text", "type": "custom_text_type"},
            {"name": "numeric_field", "type": "long"},
        ]
    }

    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = schema_response

        fields = field_manager._get_searchable_fields("test_collection")

        assert "text_field" in fields
        assert "string_field" in fields
        assert "custom_text" in fields
        assert "numeric_field" not in fields


def test_get_field_info_specific_field(field_manager, mock_schema_response):
    """Test getting field info for specific field."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        field_info = field_manager.get_field_info("test_collection", "title")
        assert field_info["type"] == "text_general"
        assert field_info["searchable"] is True


def test_get_field_info_nonexistent_field(field_manager, mock_schema_response):
    """Test getting field info for non-existent field."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        with pytest.raises(
            SchemaError,
            match="Field nonexistent not found in collection test_collection",
        ):
            field_manager.get_field_info("test_collection", "nonexistent")


def test_get_schema_cached(field_manager, mock_schema_response):
    """Test schema caching."""
    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = mock_schema_response

        # First call should make the request
        schema1 = field_manager.get_schema("test_collection")
        assert mock_get.call_count == 1

        # Second call should use cache
        schema2 = field_manager.get_schema("test_collection")
        assert mock_get.call_count == 1
        assert schema1 == schema2


def test_get_schema_invalid_response(field_manager):
    """Test handling of invalid schema response."""
    with patch("requests.get") as mock_get:
        mock_get.return_value.status_code = 200
        mock_get.return_value.json.return_value = {"invalid": "response"}

        with pytest.raises(SchemaError, match="Invalid schema response"):
            field_manager.get_schema("test_collection")


def test_get_field_info_no_fields(field_manager):
    """Test getting field info with no fields in schema."""
    with patch.object(field_manager, "get_schema", return_value={"schema": {}}):
        field_info = field_manager.get_field_info("test_collection")
        assert field_info["searchable_fields"] == []
        assert field_info["sortable_fields"] == {
            "_docid_": {
                "type": "numeric",
                "searchable": False,
                "directions": ["asc", "desc"],
                "default_direction": "asc",
            },
            "score": {
                "type": "numeric",
                "searchable": True,
                "directions": ["asc", "desc"],
                "default_direction": "desc",
            },
        }


def test_get_field_info_invalid_field_def(field_manager):
    """Test getting field info with invalid field definition."""
    schema = {"schema": {"fields": [{"invalid": "field"}]}}
    with patch.object(field_manager, "get_schema", return_value=schema):
        field_info = field_manager.get_field_info("test_collection")
        assert field_info["searchable_fields"] == []
        assert set(field_info["sortable_fields"].keys()) == {"_docid_", "score"}


def test_get_field_info_with_copy_fields(field_manager, mock_schema_response):
    """Test getting field info with copy fields."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        field_info = field_manager.get_field_info("test_collection")
        assert "title" in field_info["searchable_fields"]
        assert "_text_" not in field_info["searchable_fields"]


def test_get_field_types(field_manager, mock_schema_response):
    """Test getting field types."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        field_types = field_manager.get_field_types("test_collection")
        assert field_types["text_general"] == "text_general"
        assert field_types["string"] == "string"
        assert field_types["title"] == "text_general"
        assert field_types["id"] == "string"


def test_get_field_type(field_manager, mock_schema_response):
    """Test getting field type for a specific field."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        field_type = field_manager.get_field_type("test_collection", "title")
        assert field_type == "text_general"


def test_get_field_type_not_found(field_manager, mock_schema_response):
    """Test getting field type for a non-existent field."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ):
        with pytest.raises(SchemaError, match="Field not found: nonexistent"):
            field_manager.get_field_type("test_collection", "nonexistent")


def test_validate_field_exists_success(field_manager):
    """Test validating existing field."""
    with patch.object(field_manager, "get_field_info") as mock_get_info:
        mock_get_info.return_value = {"searchable_fields": ["title", "id"]}
        assert field_manager.validate_field_exists("title", "test_collection") is True


def test_validate_field_exists_wildcard(field_manager):
    """Test validating wildcard field."""
    with patch.object(field_manager, "get_field_info") as mock_get_info:
        mock_get_info.return_value = {"searchable_fields": ["title", "id"]}
        assert field_manager.validate_field_exists("*", "test_collection") is True


def test_validate_field_exists_not_found(field_manager):
    """Test validating non-existent field."""
    with patch.object(field_manager, "get_field_info") as mock_get_info:
        mock_get_info.return_value = {"searchable_fields": ["title", "id"]}
        with pytest.raises(
            SchemaError,
            match="Field nonexistent not found in collection test_collection",
        ):
            field_manager.validate_field_exists("nonexistent", "test_collection")


def test_validate_field_exists_error(field_manager):
    """Test field validation with error."""
    with patch.object(
        field_manager, "get_field_info", side_effect=Exception("Test error")
    ):
        with pytest.raises(
            SchemaError, match="Error validating field test: Test error"
        ):
            field_manager.validate_field_exists("test", "test_collection")


def test_validate_sort_field_success(field_manager):
    """Test validating sortable field."""
    with patch.object(field_manager, "get_field_info") as mock_get_info:
        mock_get_info.return_value = {"sortable_fields": {"sort_field": {}, "id": {}}}
        assert (
            field_manager.validate_sort_field("sort_field", "test_collection") is True
        )


def test_validate_sort_field_not_found(field_manager):
    """Test validating non-sortable field."""
    with patch.object(field_manager, "get_field_info") as mock_get_info:
        mock_get_info.return_value = {"sortable_fields": {"sort_field": {}, "id": {}}}
        with pytest.raises(
            SchemaError,
            match="Field title is not sortable in collection test_collection",
        ):
            field_manager.validate_sort_field("title", "test_collection")


def test_validate_sort_field_error(field_manager):
    """Test sort field validation with error."""
    with patch.object(
        field_manager, "get_field_info", side_effect=Exception("Test error")
    ):
        with pytest.raises(
            SchemaError, match="Error validating sort field test: Test error"
        ):
            field_manager.validate_sort_field("test", "test_collection")


def test_get_field_types_cached(field_manager, mock_schema_response):
    """Test field types caching."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ) as mock_get_schema:
        # First call should hit the API
        field_types1 = field_manager.get_field_types("test_collection")
        # Second call should use cache
        field_types2 = field_manager.get_field_types("test_collection")

        assert field_types1 == field_types2
        mock_get_schema.assert_called_once()


def test_clear_cache_specific_collection(field_manager, mock_schema_response):
    """Test clearing cache for specific collection."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ) as mock_get_schema:
        # Populate cache
        field_manager.get_field_types("test_collection")
        field_manager.get_field_types("other_collection")

        # Clear specific collection
        field_manager.clear_cache("test_collection")

        # Should hit API again for cleared collection
        field_manager.get_field_types("test_collection")
        # Should use cache for other collection
        field_manager.get_field_types("other_collection")

        assert mock_get_schema.call_count == 3


def test_clear_cache_all(field_manager, mock_schema_response):
    """Test clearing entire cache."""
    with patch.object(
        field_manager, "get_schema", return_value=mock_schema_response["schema"]
    ) as mock_get_schema:
        # Populate cache
        field_manager.get_field_types("test_collection")
        field_manager.get_field_types("other_collection")

        # Clear all cache
        field_manager.clear_cache()

        # Should hit API again for both collections
        field_manager.get_field_types("test_collection")
        field_manager.get_field_types("other_collection")

        assert mock_get_schema.call_count == 4


def test_validate_collection_exists_success(field_manager, mock_schema_response):
    """Test validating existing collection."""
    with patch.object(field_manager, "get_schema") as mock_get_schema:
        mock_get_schema.return_value = mock_schema_response["schema"]
        assert field_manager.validate_collection_exists("test_collection") is True


def test_validate_collection_exists_error(field_manager):
    """Test collection validation with error."""
    with patch.object(field_manager, "get_schema", side_effect=Exception("Test error")):
        with pytest.raises(
            SchemaError, match="Error validating collection: Test error"
        ):
            field_manager.validate_collection_exists("test_collection")


def test_validate_fields_success(field_manager):
    """Test validating multiple fields."""
    with patch.object(field_manager, "_get_collection_fields") as mock_get_fields:
        mock_get_fields.return_value = {
            "searchable_fields": ["title", "id"],
            "sortable_fields": {"id": {}, "sort_field": {}},
        }
        field_manager.validate_fields("test_collection", ["title", "id"])
        mock_get_fields.assert_called_once_with("test_collection")


def test_validate_fields_error(field_manager):
    """Test validating multiple fields with error."""
    with patch.object(field_manager, "_get_collection_fields") as mock_get_fields:
        mock_get_fields.return_value = {
            "searchable_fields": ["title"],
            "sortable_fields": {"sort_field": {}},
        }
        with pytest.raises(
            SchemaError,
            match="Invalid fields for collection test_collection: nonexistent",
        ):
            field_manager.validate_fields("test_collection", ["title", "nonexistent"])


def test_validate_sort_fields_success(field_manager):
    """Test validating multiple sort fields."""
    with patch.object(field_manager, "_get_collection_fields") as mock_get_fields:
        mock_get_fields.return_value = {
            "searchable_fields": ["title", "id"],
            "sortable_fields": {"id": {}, "sort_field": {}},
        }
        field_manager.validate_sort_fields("test_collection", ["sort_field", "id"])
        mock_get_fields.assert_called_once_with("test_collection")


def test_validate_sort_fields_error(field_manager):
    """Test validating multiple sort fields with error."""
    with patch.object(field_manager, "_get_collection_fields") as mock_get_fields:
        mock_get_fields.return_value = {
            "searchable_fields": ["title"],
            "sortable_fields": {"sort_field": {}},
        }
        with pytest.raises(
            SchemaError,
            match="Fields not sortable in collection test_collection: title",
        ):
            field_manager.validate_sort_fields(
                "test_collection", ["sort_field", "title"]
            )

```

--------------------------------------------------------------------------------
/solr_mcp/solr/schema/fields.py:
--------------------------------------------------------------------------------

```python
"""Schema and field management for SolrCloud client."""

import json
import logging
from typing import Any, Dict, List, Optional

import aiohttp
import requests
from loguru import logger
from requests.exceptions import HTTPError, RequestException

from solr_mcp.solr.constants import FIELD_TYPE_MAPPING, SYNTHETIC_SORT_FIELDS
from solr_mcp.solr.exceptions import SchemaError, SolrError
from solr_mcp.solr.schema.cache import FieldCache

logger = logging.getLogger(__name__)


class FieldManager:
    """Manages Solr schema fields and field types."""

    def __init__(self, solr_base_url: str):
        """Initialize the field manager.

        Args:
            solr_base_url: Base URL for Solr instance
        """
        self.solr_base_url = (
            solr_base_url.rstrip("/")
            if isinstance(solr_base_url, str)
            else solr_base_url.config.solr_base_url.rstrip("/")
        )
        self._schema_cache = {}
        self._field_types_cache = {}
        self._vector_field_cache = {}
        self.cache = FieldCache()

    def get_schema(self, collection: str) -> Dict:
        """Get schema for a collection.

        Args:
            collection: Collection name

        Returns:
            Schema information

        Raises:
            SchemaError: If schema cannot be retrieved
        """
        if collection in self._schema_cache:
            return self._schema_cache[collection]

        try:
            # Try schema API first
            url = f"{self.solr_base_url}/{collection}/schema"
            response = requests.get(url)
            response.raise_for_status()
            schema = response.json()

            if "schema" not in schema:
                raise SchemaError("Invalid schema response")

            self._schema_cache[collection] = schema["schema"]
            return schema["schema"]

        except HTTPError as e:
            if getattr(e.response, "status_code", None) == 404:
                raise SchemaError(f"Collection not found: {collection}")
            raise SchemaError(f"Failed to get schema: {str(e)}")

        except Exception as e:
            logger.error(f"Error getting schema: {str(e)}")
            raise SchemaError(f"Failed to get schema: {str(e)}")

    def get_field_types(self, collection: str) -> Dict[str, str]:
        """Get field types for a collection."""
        if collection in self._field_types_cache:
            return self._field_types_cache[collection]

        schema = self.get_schema(collection)
        field_types = {}

        # First map field type names to their definitions
        for field_type in schema.get("fieldTypes", []):
            field_types[field_type["name"]] = field_type["name"]

        # Then map fields to their types
        for field in schema.get("fields", []):
            if "name" in field and "type" in field:
                field_types[field["name"]] = field["type"]

        self._field_types_cache[collection] = field_types
        return field_types

    def get_field_type(self, collection: str, field_name: str) -> str:
        """Get field type for a specific field."""
        field_types = self.get_field_types(collection)
        if field_name not in field_types:
            raise SchemaError(f"Field not found: {field_name}")
        return field_types[field_name]

    def validate_field_exists(self, field: str, collection: str) -> bool:
        """Validate that a field exists in a collection.

        Args:
            field: Field name to validate
            collection: Collection name

        Returns:
            True if field exists

        Raises:
            SchemaError: If field does not exist
        """
        try:
            # Handle wildcard field
            if field == "*":
                return True

            field_info = self.get_field_info(collection)
            if field not in field_info["searchable_fields"]:
                raise SchemaError(f"Field {field} not found in collection {collection}")

            return True

        except SchemaError:
            raise
        except Exception as e:
            logger.error(f"Error validating field {field}: {str(e)}")
            raise SchemaError(f"Error validating field {field}: {str(e)}")

    def validate_sort_field(self, field: str, collection: str) -> bool:
        """Validate that a field can be used for sorting.

        Args:
            field: Field name to validate
            collection: Collection name

        Returns:
            True if field is sortable

        Raises:
            SchemaError: If field is not sortable
        """
        try:
            field_info = self.get_field_info(collection)
            if field not in field_info["sortable_fields"]:
                raise SchemaError(
                    f"Field {field} is not sortable in collection {collection}"
                )

            return True

        except SchemaError:
            raise
        except Exception as e:
            logger.error(f"Error validating sort field {field}: {str(e)}")
            raise SchemaError(f"Error validating sort field {field}: {str(e)}")

    def get_field_info(
        self, collection: str, field: Optional[str] = None
    ) -> Dict[str, Any]:
        """Get field information for a collection.

        Args:
            collection: Collection name
            field: Optional field name to get specific info for

        Returns:
            Field information including searchable and sortable fields

        Raises:
            SchemaError: If field info cannot be retrieved
        """
        try:
            schema = self.get_schema(collection)

            # Get all fields
            fields = schema.get("fields", [])

            # Build field info
            searchable_fields = []
            sortable_fields = {}

            for field_def in fields:
                name = field_def.get("name")
                if not name:
                    continue

                # Check if field is searchable
                if field_def.get("indexed", True):
                    searchable_fields.append(name)

                # Check if field is sortable
                if field_def.get("docValues", False) or field_def.get("stored", False):
                    sortable_fields[name] = {
                        "type": field_def.get("type", "string"),
                        "searchable": field_def.get("indexed", True),
                        "directions": ["asc", "desc"],
                        "default_direction": "asc",
                    }

            # Add special fields
            sortable_fields["_docid_"] = {
                "type": "numeric",
                "searchable": False,
                "directions": ["asc", "desc"],
                "default_direction": "asc",
            }
            sortable_fields["score"] = {
                "type": "numeric",
                "searchable": True,
                "directions": ["asc", "desc"],
                "default_direction": "desc",
            }

            field_info = {
                "searchable_fields": searchable_fields,
                "sortable_fields": sortable_fields,
            }

            if field:
                if field in sortable_fields:
                    return sortable_fields[field]
                raise SchemaError(f"Field {field} not found in collection {collection}")

            return field_info

        except SchemaError:
            raise
        except Exception as e:
            logger.error(f"Error getting field info: {str(e)}")
            raise SchemaError(f"Failed to get field info: {str(e)}")

    def validate_collection(self, collection: str) -> bool:
        """Validate that a collection exists.

        Args:
            collection: Collection name to validate

        Returns:
            True if collection exists

        Raises:
            SchemaError: If collection does not exist
        """
        try:
            self.get_schema(collection)
            return True

        except Exception as e:
            logger.error(f"Error validating collection {collection}: {str(e)}")
            raise SchemaError(f"Collection {collection} does not exist: {str(e)}")

    def clear_cache(self, collection: Optional[str] = None):
        """Clear schema cache.

        Args:
            collection: Optional collection name to clear cache for. If None, clears all cache.
        """
        if collection:
            self._schema_cache.pop(collection, None)
            self._field_types_cache.pop(collection, None)
        else:
            self._schema_cache = {}
            self._field_types_cache = {}

    def _get_collection_fields(self, collection: str) -> Dict[str, Any]:
        """Get or load field information for a collection.

        Args:
            collection: Collection name

        Returns:
            Dict containing searchable and sortable fields for the collection
        """
        # Check cache first
        if not self.cache.is_stale(collection):
            return self.cache.get(collection)

        try:
            searchable_fields = self._get_searchable_fields(collection)
            sortable_fields = self._get_sortable_fields(collection)

            field_info = {
                "searchable_fields": searchable_fields,
                "sortable_fields": sortable_fields,
            }

            # Update cache
            self.cache.set(collection, field_info)

            logger.info(f"Loaded field information for collection {collection}")
            logger.debug(f"Searchable fields: {searchable_fields}")
            logger.debug(f"Sortable fields: {sortable_fields}")

            return field_info

        except Exception as e:
            logger.error(
                f"Error loading field information for collection {collection}: {e}"
            )
            # Use cached defaults
            return self.cache.get_or_default(collection)

    def _get_searchable_fields(self, collection: str) -> List[str]:
        """Get list of searchable fields for a collection.

        Args:
            collection: Collection name

        Returns:
            List of field names that can be searched
        """
        try:
            # Try schema API first
            schema_url = f"{collection}/schema/fields?wt=json"
            logger.debug(f"Getting searchable fields from schema URL: {schema_url}")
            full_url = f"{self.solr_base_url}/{schema_url}"
            logger.debug(f"Full URL: {full_url}")

            response = requests.get(full_url)
            fields_data = response.json()

            searchable_fields = []
            for field in fields_data.get("fields", []):
                field_name = field.get("name")
                field_type = field.get("type")

                # Skip special fields
                if field_name.startswith("_") and field_name not in ["_text_"]:
                    continue

                # Add text and string fields
                if field_type in ["text_general", "string"] or "text" in field_type:
                    logger.debug(
                        f"Found searchable field: {field_name}, type: {field_type}"
                    )
                    searchable_fields.append(field_name)

            # Add known content fields
            content_fields = ["content", "title", "_text_"]
            for field in content_fields:
                if field not in searchable_fields:
                    searchable_fields.append(field)

            logger.info(
                f"Using searchable fields for collection {collection}: {searchable_fields}"
            )
            return searchable_fields

        except Exception as e:
            logger.warning(f"Error getting schema fields: {str(e)}")
            logger.info(
                "Fallback: trying direct URL with query that returns field info"
            )

            try:
                direct_url = (
                    f"{self.solr_base_url}/{collection}/select?q=*:*&rows=0&wt=json"
                )
                logger.debug(f"Trying direct URL: {direct_url}")

                response = requests.get(direct_url)
                response_data = response.json()

                # Extract fields from response header
                fields = []
                if "responseHeader" in response_data:
                    header = response_data["responseHeader"]
                    if "params" in header and "fl" in header["params"]:
                        fields = header["params"]["fl"].split(",")

                # Add known searchable fields
                fields.extend(["content", "title", "_text_"])
                searchable_fields = list(set(fields))  # Remove duplicates

            except Exception as e2:
                logger.error(f"Error getting searchable fields: {str(e2)}")
                logger.info(
                    "Using fallback searchable fields: ['content', 'title', '_text_']"
                )
                searchable_fields = ["content", "title", "_text_"]

            logger.info(
                f"Using searchable fields for collection {collection}: {searchable_fields}"
            )
            return searchable_fields

    def _get_sortable_fields(self, collection: str) -> Dict[str, Dict[str, Any]]:
        """Get list of sortable fields and their properties for a collection.

        Args:
            collection: Collection name

        Returns:
            Dict mapping field names to their properties
        """
        try:
            # Try schema API first
            schema_url = f"{collection}/schema/fields?wt=json"
            logger.debug(f"Getting sortable fields from schema URL: {schema_url}")
            full_url = f"{self.solr_base_url}/{schema_url}"
            logger.debug(f"Full URL: {full_url}")

            response = requests.get(full_url)
            fields_data = response.json()

            sortable_fields = {}

            # Process schema fields
            for field in fields_data.get("fields", []):
                field_name = field.get("name")
                field_type = field.get("type")
                multi_valued = field.get("multiValued", False)
                doc_values = field.get("docValues", False)

                # Skip special fields, multi-valued fields, and fields without a recognized type
                if (
                    (
                        field_name.startswith("_")
                        and field_name not in SYNTHETIC_SORT_FIELDS
                    )
                    or multi_valued
                    or field_type not in FIELD_TYPE_MAPPING
                ):
                    continue

                # Add field to sortable fields
                sortable_fields[field_name] = {
                    "type": FIELD_TYPE_MAPPING[field_type],
                    "directions": ["asc", "desc"],
                    "default_direction": (
                        "asc"
                        if FIELD_TYPE_MAPPING[field_type]
                        in ["string", "numeric", "date"]
                        else "desc"
                    ),
                    "searchable": True,  # Regular schema fields are searchable
                }

            # Add synthetic fields
            sortable_fields.update(SYNTHETIC_SORT_FIELDS)

            return sortable_fields

        except Exception as e:
            logger.error(f"Error getting sortable fields: {e}")
            # Return only the guaranteed score field
            return {"score": SYNTHETIC_SORT_FIELDS["score"]}

    def validate_fields(self, collection: str, fields: List[str]) -> None:
        """Validate that the requested fields exist in the collection.

        Args:
            collection: Collection name
            fields: List of field names to validate

        Raises:
            SchemaError: If any field is not valid for the collection
        """
        collection_info = self._get_collection_fields(collection)
        searchable_fields = collection_info["searchable_fields"]
        sortable_fields = collection_info["sortable_fields"]

        # Combine all valid fields
        valid_fields = set(searchable_fields) | set(sortable_fields.keys())

        # Check each requested field
        invalid_fields = [f for f in fields if f not in valid_fields]
        if invalid_fields:
            raise SchemaError(
                f"Invalid fields for collection {collection}: {', '.join(invalid_fields)}"
            )

    def validate_sort_fields(self, collection: str, sort_fields: List[str]) -> None:
        """Validate that the requested sort fields are sortable in the collection.

        Args:
            collection: Collection name
            sort_fields: List of field names to validate for sorting

        Raises:
            SchemaError: If any field is not sortable in the collection
        """
        collection_info = self._get_collection_fields(collection)
        sortable_fields = collection_info["sortable_fields"]

        # Check each sort field
        invalid_fields = [f for f in sort_fields if f not in sortable_fields]
        if invalid_fields:
            raise SchemaError(
                f"Fields not sortable in collection {collection}: {', '.join(invalid_fields)}"
            )

    def validate_collection_exists(self, collection: str) -> bool:
        """Validate that a collection exists.

        Args:
            collection: Collection name

        Returns:
            True if collection exists

        Raises:
            SchemaError: If collection does not exist
        """
        try:
            self.get_schema(collection)
            return True

        except SchemaError as e:
            if "Collection not found" in str(e):
                raise
            logger.error(f"Error validating collection: {str(e)}")
            raise SchemaError(f"Error validating collection: {str(e)}")

        except Exception as e:
            logger.error(f"Error validating collection: {str(e)}")
            raise SchemaError(f"Error validating collection: {str(e)}")

    async def list_fields(self, collection: str) -> List[Dict[str, Any]]:
        """List all fields in a collection with their properties.

        Args:
            collection: Collection name

        Returns:
            List of field dictionaries with their properties

        Raises:
            SchemaError: If fields cannot be retrieved
        """
        try:
            # Verify collection exists
            schema = self.get_schema(collection)

            # Get schema fields and copyFields
            fields = schema.get("fields", [])
            copy_fields = schema.get("copyFields", [])

            # Build map of destination fields to their source fields
            copies_from = {}
            for copy_field in copy_fields:
                dest = copy_field.get("dest")
                source = copy_field.get("source")
                if not dest or not source:
                    continue
                if dest not in copies_from:
                    copies_from[dest] = []
                copies_from[dest].append(source)

            # Add copyField information to field properties
            for field in fields:
                if field.get("name") in copies_from:
                    field["copies_from"] = copies_from[field["name"]]

            return fields

        except SchemaError:
            raise
        except Exception as e:
            raise SchemaError(
                f"Failed to list fields for collection '{collection}': {str(e)}"
            )

    async def find_vector_field(self, collection: str) -> str:
        """Find the first vector field in a collection.

        Args:
            collection: Collection name

        Returns:
            Name of the first vector field found

        Raises:
            SchemaError: If no vector fields found
        """
        try:
            fields = await self.list_fields(collection)

            # Look for vector fields
            vector_fields = [
                f
                for f in fields
                if f.get("type") in ["dense_vector", "knn_vector"]
                or f.get("class") == "solr.DenseVectorField"
            ]

            if not vector_fields:
                raise SchemaError(
                    f"No vector fields found in collection '{collection}'"
                )

            field = vector_fields[0]["name"]
            logger.info(f"Using auto-detected vector field: {field}")
            return field

        except SchemaError:
            raise
        except Exception as e:
            raise SchemaError(
                f"Failed to find vector field in collection '{collection}': {str(e)}"
            )

    async def validate_vector_field_dimension(
        self,
        collection: str,
        field: str,
        vector_provider_model: Optional[str] = None,
        model_dimensions: Optional[Dict[str, int]] = None,
    ) -> Dict[str, Any]:
        """Validate that the vector field exists and its dimension matches the vectorizer.

        Args:
            collection: Collection name
            field: Field name to validate
            vector_provider_model: Optional vectorizer model name
            model_dimensions: Dictionary mapping model names to dimensions

        Returns:
            Field information dictionary

        Raises:
            SchemaError: If validation fails
        """
        # Check cache first
        cache_key = f"{collection}:{field}"
        if cache_key in self._vector_field_cache:
            field_info = self._vector_field_cache[cache_key]
            logger.debug(f"Using cached field info for {cache_key}")
            return field_info

        try:
            # Get collection fields
            fields = await self.list_fields(collection)

            # Find the specified field
            field_info = next((f for f in fields if f.get("name") == field), None)
            if not field_info:
                raise SchemaError(
                    f"Field '{field}' does not exist in collection '{collection}'"
                )

            # Check if field is a vector type (supporting both dense_vector and knn_vector)
            field_type = field_info.get("type")
            field_class = field_info.get("class")
            if (
                field_type not in ["dense_vector", "knn_vector"]
                and field_class != "solr.DenseVectorField"
            ):
                raise SchemaError(
                    f"Field '{field}' is not a vector field (type: {field_type}, class: {field_class})"
                )

            # Get field dimension
            vector_dimension = None

            # First check if dimension is directly in field info
            if "vectorDimension" in field_info:
                vector_dimension = field_info["vectorDimension"]
            else:
                # Look up the field type definition
                field_type_name = field_info.get("type")

                # Get all field types
                schema_url = f"{self.solr_base_url}/{collection}/schema"
                try:
                    schema_response = requests.get(schema_url)
                    schema_data = schema_response.json()
                    field_types = schema_data.get("schema", {}).get("fieldTypes", [])

                    # Find matching field type
                    matching_type = next(
                        (ft for ft in field_types if ft.get("name") == field_type_name),
                        None,
                    )

                    if matching_type and "vectorDimension" in matching_type:
                        vector_dimension = matching_type["vectorDimension"]
                    elif (
                        matching_type
                        and matching_type.get("class") == "solr.DenseVectorField"
                    ):
                        # For solr.DenseVectorField, dimension should be specified in the field type
                        vector_dimension = matching_type.get("vectorDimension")
                except Exception as e:
                    logger.warning(
                        f"Error fetching schema to determine vector dimension: {str(e)}"
                    )

            # If still not found, attempt to get from fields
            if not vector_dimension:
                # Look for field types in the fields list that match this type
                field_types = [
                    f
                    for f in fields
                    if f.get("class") == "solr.DenseVectorField"
                    or (f.get("name") == field_type and "vectorDimension" in f)
                ]
                if field_types and "vectorDimension" in field_types[0]:
                    vector_dimension = field_types[0]["vectorDimension"]

            # No need to use hardcoded defaults - this should be explicitly defined in the schema

            if not vector_dimension:
                raise SchemaError(
                    f"Could not determine vector dimension for field '{field}' (type: {field_type})"
                )

            # If vector provider model and dimensions are provided, check compatibility
            if vector_provider_model and model_dimensions:
                model_dimension = model_dimensions.get(vector_provider_model)
                if model_dimension:
                    # Validate dimensions match
                    if int(vector_dimension) != model_dimension:
                        raise SchemaError(
                            f"Vector dimension mismatch: field '{field}' has dimension {vector_dimension}, "
                            f"but model '{vector_provider_model}' produces vectors with dimension {model_dimension}"
                        )

            # Cache the result
            self._vector_field_cache[cache_key] = field_info
            return field_info

        except SchemaError:
            raise
        except Exception as e:
            raise SchemaError(f"Error validating vector field dimension: {str(e)}")

```

--------------------------------------------------------------------------------
/data/bitcoin-whitepaper.md:
--------------------------------------------------------------------------------

```markdown
# Bitcoin: A Peer-to-Peer Electronic Cash System

Satoshi Nakamoto  
[[email protected]](mailto:[email protected])  
www.bitcoin.org

**Abstract.** A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.

## 1. Introduction

Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model. Completely non-reversible transactions are not really possible, since financial institutions cannot avoid mediating disputes. The cost of mediation increases transaction costs, limiting the minimum practical transaction size and cutting off the possibility for small casual transactions, and there is a broader cost in the loss of ability to make non-reversible payments for non-reversible services. With the possibility of reversal, the need for trust spreads. Merchants must be wary of their customers, hassling them for more information than they would otherwise need. A certain percentage of fraud is accepted as unavoidable. These costs and payment uncertainties can be avoided in person by using physical currency, but no mechanism exists to make payments over a communications channel without a trusted party.

What is needed is an electronic payment system based on cryptographic proof instead of trust, allowing any two willing parties to transact directly with each other without the need for a trusted third party. Transactions that are computationally impractical to reverse would protect sellers from fraud, and routine escrow mechanisms could easily be implemented to protect buyers. In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions. The system is secure as long as honest nodes collectively control more CPU power than any cooperating group of attacker nodes.

## 2. Transactions

We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership.

```
      ┌─────────────────────┐               ┌─────────────────────┐              ┌─────────────────────┐
      │                     │               │                     │              │                     │
      │    Transaction      │               │    Transaction      │              │    Transaction      │
      │                     │               │                     │              │                     │
      │   ┌─────────────┐   │               │   ┌─────────────┐   │              │   ┌─────────────┐   │
      │   │ Owner 1's   │   │               │   │ Owner 2's   │   │              │   │ Owner 3's   │   │
      │   │ Public Key  │   │               │   │ Public Key  │   │              │   │ Public Key  │   │
      │   └───────┬─────┘   │               │   └───────┬─────┘   │              │   └───────┬─────┘   │
      │           │    .    │               │           │    .    │              │           │         │
──────┼─────────┐ │    .    ├───────────────┼─────────┐ │    .    ├──────────────┼─────────┐ │         │
      │         │ │    .    │               │         │ │    .    │              │         │ │         │
      │      ┌──▼─▼──┐ .    │               │      ┌──▼─▼──┐ .    │              │      ┌──▼─▼──┐      │
      │      │ Hash  │ .    │               │      │ Hash  │ .    │              │      │ Hash  │      │
      │      └───┬───┘ .    │    Verify     │      └───┬───┘ .    │    Verify    │      └───┬───┘      │
      │          │     ............................    │     ...........................    │          │
      │          │          │               │     │    │          │              │     │    │          │
      │   ┌──────▼──────┐   │               │   ┌─▼────▼──────┐   │              │   ┌─▼────▼──────┐   │
      │   │ Owner 0's   │   │      Sign     │   │ Owner 1's   │   │      Sign    │   │ Owner 2's   │   │
      │   │ Signature   │   │      ...........─►│ Signature   │   │     ...........─►│ Signature   │   │
      │   └─────────────┘   │      .        │   └─────────────┘   │     .        │   └─────────────┘   │
      │                     │      .        │                     │     .        │                     │
      └─────────────────────┘      .        └─────────────────────┘     .        └─────────────────────┘
                                   .                                    .
          ┌─────────────┐          .            ┌─────────────┐         .            ┌─────────────┐
          │ Owner 1's   │...........            │ Owner 2's   │..........            │ Owner 3's   │
          │ Private Key │                       │ Private Key │                      │ Private Key │
          └─────────────┘                       └─────────────┘                      └─────────────┘
```

The problem of course is the payee can't verify that one of the owners did not double-spend the coin. A common solution is to introduce a trusted central authority, or mint, that checks every transaction for double spending. After each transaction, the coin must be returned to the mint to issue a new coin, and only coins issued directly from the mint are trusted not to be double-spent. The problem with this solution is that the fate of the entire money system depends on the company running the mint, with every transaction having to go through them, just like a bank.

We need a way for the payee to know that the previous owners did not sign any earlier transactions. For our purposes, the earliest transaction is the one that counts, so we don't care about later attempts to double-spend. The only way to confirm the absence of a transaction is to be aware of all transactions. In the mint based model, the mint was aware of all transactions and decided which arrived first. To accomplish this without a trusted party, transactions must be publicly announced [^1], and we need a system for participants to agree on a single history of the order in which they were received. The payee needs proof that at the time of each transaction, the majority of nodes agreed it was the first received.

## 3. Timestamp Server

The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [^2] [^3] [^4] [^5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it.

```
             ┌──────┐                        ┌──────┐
────────────►│      ├───────────────────────►│      ├───────────────────►
             │ Hash │                        │ Hash │
        ┌───►│      │                   ┌───►│      │
        │    └──────┘                   │    └──────┘
        │                               │
       ┌┴──────────────────────────┐   ┌┴──────────────────────────┐
       │ Block                     │   │ Block                     │
       │ ┌─────┐ ┌─────┐ ┌─────┐   │   │ ┌─────┐ ┌─────┐ ┌─────┐   │
       │ │Item │ │Item │ │...  │   │   │ │Item │ │Item │ │...  │   │
       │ └─────┘ └─────┘ └─────┘   │   │ └─────┘ └─────┘ └─────┘   │
       │                           │   │                           │
       └───────────────────────────┘   └───────────────────────────┘
```

## 4. Proof-of-Work

To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof-of-work system similar to Adam Back's Hashcash [^6], rather than newspaper or Usenet posts. The proof-of-work involves scanning for a value that when hashed, such as with SHA-256, the hash begins with a number of zero bits. The average work required is exponential in the number of zero bits required and can be verified by executing a single hash.

For our timestamp network, we implement the proof-of-work by incrementing a nonce in the block until a value is found that gives the block's hash the required zero bits. Once the CPU effort has been expended to make it satisfy the proof-of-work, the block cannot be changed without redoing the work. As later blocks are chained after it, the work to change the block would include redoing all the blocks after it.

```
       ┌────────────────────────────────────────┐      ┌────────────────────────────────────────┐
       │  Block                                 │      │  Block                                 │
       │  ┌──────────────────┐ ┌──────────────┐ │      │  ┌──────────────────┐ ┌──────────────┐ │
───────┼─►│ Prev Hash        │ │ Nonce        │ ├──────┼─►│ Prev Hash        │ │ Nonce        │ │
       │  └──────────────────┘ └──────────────┘ │      │  └──────────────────┘ └──────────────┘ │
       │                                        │      │                                        │
       │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │      │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
       │ │ Tx       │ │ Tx       │ │ ...      │ │      │ │ Tx       │ │ Tx       │ │ ...      │ │
       │ └──────────┘ └──────────┘ └──────────┘ │      │ └──────────┘ └──────────┘ └──────────┘ │
       │                                        │      │                                        │
       └────────────────────────────────────────┘      └────────────────────────────────────────┘
```

The proof-of-work also solves the problem of determining representation in majority decision making. If the majority were based on one-IP-address-one-vote, it could be subverted by anyone able to allocate many IPs. Proof-of-work is essentially one-CPU-one-vote. The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. If a majority of CPU power is controlled by honest nodes, the honest chain will grow the fastest and outpace any competing chains. To modify a past block, an attacker would have to redo the proof-of-work of the block and all blocks after it and then catch up with and surpass the work of the honest nodes. We will show later that the probability of a slower attacker catching up diminishes exponentially as subsequent blocks are added.

To compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. If they're generated too fast, the difficulty increases.

## 5. Network

The steps to run the network are as follows:

1. New transactions are broadcast to all nodes.
2. Each node collects new transactions into a block.
3. Each node works on finding a difficult proof-of-work for its block.
4. When a node finds a proof-of-work, it broadcasts the block to all nodes.
5. Nodes accept the block only if all transactions in it are valid and not already spent.
6. Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash.

Nodes always consider the longest chain to be the correct one and will keep working on extending it. If two nodes broadcast different versions of the next block simultaneously, some nodes may receive one or the other first. In that case, they work on the first one they received, but save the other branch in case it becomes longer. The tie will be broken when the next proof-of-work is found and one branch becomes longer; the nodes that were working on the other branch will then switch to the longer one.

New transaction broadcasts do not necessarily need to reach all nodes. As long as they reach many nodes, they will get into a block before long. Block broadcasts are also tolerant of dropped messages. If a node does not receive a block, it will request it when it receives the next block and realizes it missed one.

## 6. Incentive

By convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This adds an incentive for nodes to support the network, and provides a way to initially distribute coins into circulation, since there is no central authority to issue them. The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that is expended.

The incentive can also be funded with transaction fees. If the output value of a transaction is less than its input value, the difference is a transaction fee that is added to the incentive value of the block containing the transaction. Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free.

The incentive may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth.

## 7. Reclaiming Disk Space

Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. To facilitate this without breaking the block's hash, transactions are hashed in a Merkle Tree [^7] [^2] [^5], with only the root included in the block's hash. Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored.

```
┌──────────────────────────────────────────┐    ┌──────────────────────────────────────────┐
│                                          │    │                                          │
│ Block ┌─────────────────────────────┐    │    │ Block ┌─────────────────────────────┐    │
│       │  Block Header (Block Hash)  │    │    │       │  Block Header (Block Hash)  │    │
│       │ ┌────────────┐ ┌─────────┐  │    │    │       │ ┌────────────┐ ┌─────────┐  │    │
│       │ │ Prev Hash  │ │ Nonce   │  │    │    │       │ │ Prev Hash  │ │ Nonce   │  │    │
│       │ └────────────┘ └─────────┘  │    │    │       │ └────────────┘ └─────────┘  │    │
│       │                             │    │    │       │                             │    │
│       │     ┌─────────────┐         │    │    │       │     ┌─────────────┐         │    │
│       │     │  Root Hash  │         │    │    │       │     │  Root Hash  │         │    │
│       │     └─────▲─▲─────┘         │    │    │       │     └─────▲─▲─────┘         │    │
│       │           │ │               │    │    │       │           │ │               │    │
│       │           │ │               │    │    │       │           │ │               │    │
│       └───────────┼─┼───────────────┘    │    │       └───────────┼─┼───────────────┘    │
│                   │ │                    │    │                   │ │                    │
│     ..........    │ │     ..........     │    │     ┌────────┐    │ │     ..........     │
│     .        ─────┘ └─────.        .     │    │     │        ├────┘ └─────.        .     │
│     . Hash01 .            . Hash23 .     │    │     │ Hash01 │            . Hash23 .     │
│     .▲.....▲..            .▲.....▲..     │    │     │        │            .▲.....▲..     │
│      │     │               │     │       │    │     └────────┘             │     │       │
│      │     │               │     │       │    │                            │     │       │
│      │     │               │     │       │    │                            │     │       │
│ .....│.. ..│.....     .....│.. ..│.....  │    │                       ┌────┴─┐ ..│.....  │
│ .      . .      .     .      . .      .  │    │                       │      │ .      .  │
│ .Hash0 . .Hash1 .     .Hash2 . .Hash3 .  │    │                       │Hash2 │ .Hash3 .  │
│ ...▲.... ...▲....     ...▲.... ...▲....  │    │                       │      │ .      .  │
│    │        │            │        │      │    │                       └──────┘ ...▲....  │
│    │        │            │        │      │    │                                   │      │
│    │        │            │        │      │    │                                   │      │
│ ┌──┴───┐ ┌──┴───┐     ┌──┴───┐ ┌──┴───┐  │    │                                ┌──┴───┐  │
│ │ Tx0  │ │ Tx1  │     │ Tx2  │ │ Tx3  │  │    │                                │ Tx3  │  │
│ └──────┘ └──────┘     └──────┘ └──────┘  │    │                                └──────┘  │
│                                          │    │                                          │
└──────────────────────────────────────────┘    └──────────────────────────────────────────┘
     Transactions Hashed in a Merkle Tree              After Pruning Tx0-2 from the Block
```

A block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year. With computer systems typically selling with 2GB of RAM as of 2008, and Moore's Law predicting current growth of 1.2GB per year, storage should not be a problem even if the block headers must be kept in memory.

## 8. Simplified Payment Verification

It is possible to verify payments without running a full network node. A user only needs to keep a copy of the block headers of the longest proof-of-work chain, which he can get by querying network nodes until he's convinced he has the longest chain, and obtain the Merkle branch linking the transaction to the block it's timestamped in. He can't check the transaction for himself, but by linking it to a place in the chain, he can see that a network node has accepted it, and blocks added after it further confirm the network has accepted it.

```
     Longest Proof-of-Work Chain
        ┌────────────────────────────────────────┐      ┌────────────────────────────────────────┐       ┌────────────────────────────────────────┐
        │   Block Header                         │      │   Block Header                         │       │   Block Header                         │
        │  ┌──────────────────┐ ┌──────────────┐ │      │  ┌──────────────────┐ ┌──────────────┐ │       │  ┌──────────────────┐ ┌──────────────┐ │
 ───────┼─►│ Prev Hash        │ │ Nonce        │ ├──────┼─►│ Prev Hash        │ │ Nonce        │ ├───────┼─►│ Prev Hash        │ │ Nonce        │ ├────────►
        │  └──────────────────┘ └──────────────┘ │      │  └──────────────────┘ └──────────────┘ │       │  └──────────────────┘ └──────────────┘ │
        │                                        │      │                                        │       │                                        │
        │     ┌───────────────────┐              │      │    ┌────────────────────┐              │       │     ┌───────────────────┐              │
        │     │   Merkle Root     │              │      │    │   Merkle Root      │              │       │     │   Merkle Root     │              │
        │     └───────────────────┘              │      │    └────────▲─▲─────────┘              │       │     └───────────────────┘              │
        │                                        │      │             │ │                        │       │                                        │
        └────────────────────────────────────────┘      └─────────────┼─┼────────────────────────┘       └────────────────────────────────────────┘
                                                                      │ │
                                                                      │ │
                                                        ┌────────┐    │ │     ..........
                                                        │        ├────┘ └─────.        .
                                                        │ Hash01 │            . Hash23 .
                                                        │        │            .▲.....▲..
                                                        └────────┘             │     │
                                                                               │     │
                                                                               │     │   Merkle Branch for Tx3
                                                                               │     │
                                                                         ┌─────┴─┐ ..│.....
                                                                         │       │ .      .
                                                                         │ Hash2 │ .Hash3 .
                                                                         │       │ .      .
                                                                         └───────┘ ...▲....
                                                                                      │
                                                                                      │
                                                                                  ┌───┴───┐
                                                                                  │  Tx3  │
                                                                                  └───────┘
```

As such, the verification is reliable as long as honest nodes control the network, but is more vulnerable if the network is overpowered by an attacker. While network nodes can verify transactions for themselves, the simplified method can be fooled by an attacker's fabricated transactions for as long as the attacker can continue to overpower the network. One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency. Businesses that receive frequent payments will probably still want to run their own nodes for more independent security and quicker verification.

## 9. Combining and Splitting Value

Although it would be possible to handle coins individually, it would be unwieldy to make a separate transaction for every cent in a transfer. To allow value to be split and combined, transactions contain multiple inputs and outputs. Normally there will be either a single input from a larger previous transaction or multiple inputs combining smaller amounts, and at most two outputs: one for the payment, and one returning the change, if any, back to the sender.

```
     ┌──────────────────────┐
     │ Transaction          │
     │                      │
     │   ┌─────┐  ┌─────┐   │
─────┼──►│ in  │  │ out │ ──┼─────►
     │   └─────┘  └─────┘   │
     │                      │
     │                      │
     │   ┌─────┐  ┌─────┐   │
─────┼──►│ in  │  │ ... │ ──┼─────►
     │   └─────┘  └─────┘   │
     │                      │
     │                      │
     │   ┌─────┐            │
─────┼──►│...  │            │
     │   └─────┘            │
     │                      │
     └──────────────────────┘
```
It should be noted that fan-out, where a transaction depends on several transactions, and those transactions depend on many more, is not a problem here. There is never the need to extract a complete standalone copy of a transaction's history.

## 10. Privacy

The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party. The necessity to announce all transactions publicly precludes this method, but privacy can still be maintained by breaking the flow of information in another place: by keeping public keys anonymous. The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone. This is similar to the level of information released by stock exchanges, where the time and size of individual trades, the "tape", is made public, but without telling who the parties were.

```
Traditional Privacy Models                                                │
                                      ┌─────────────┐   ┌──────────────┐  │  ┌────────┐
┌──────────────┐  ┌──────────────┐    │  Trusted    │   │              │  │  │        │
│  Identities  ├──┤ Transactions ├───►│ Third Party ├──►│ Counterparty │  │  │ Public │
└──────────────┘  └──────────────┘    │             │   │              │  │  │        │
                                      └─────────────┘   └──────────────┘  │  └────────┘
                                                                          │

New Privacy Model
                                       ┌────────┐
┌──────────────┐ │ ┌──────────────┐    │        │
│  Identities  │ │ │ Transactions ├───►│ Public │
└──────────────┘ │ └──────────────┘    │        │
                                       └────────┘
```
As an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner. Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.

## 11. Calculations
We consider the scenario of an attacker trying to generate an alternate chain faster than the honest chain. Even if this is accomplished, it does not throw the system open to arbitrary changes, such as creating value out of thin air or taking money that never belonged to the attacker. Nodes are not going to accept an invalid transaction as payment, and honest nodes will never accept a block containing them. An attacker can only try to change one of his own transactions to take back money he recently spent.

The race between the honest chain and an attacker chain can be characterized as a Binomial Random Walk. The success event is the honest chain being extended by one block, increasing its lead by +1, and the failure event is the attacker's chain being extended by one block, reducing the gap by -1.

The probability of an attacker catching up from a given deficit is analogous to a Gambler's Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially an infinite number of trials to try to reach breakeven. We can calculate the probability he ever reaches breakeven, or that an attacker ever catches up with the honest chain, as follows [^8]:

```plaintext
p = probability an honest node finds the next block<
q = probability the attacker finds the next block
q = probability the attacker will ever catch up from z blocks behind
``````
     
$$
qz = 
\begin{cases} 
1 & \text{if } p \leq q \\
\left(\frac{q}{p}\right) z & \text{if } p > q 
\end{cases}
$$

Given our assumption that p > q, the probability drops exponentially as the number of blocks the attacker has to catch up with increases. With the odds against him, if he doesn't make a lucky lunge forward early on, his chances become vanishingly small as he falls further behind. 

We now consider how long the recipient of a new transaction needs to wait before being sufficiently certain the sender can't change the transaction. We assume the sender is an attacker who wants to make the recipient believe he paid him for a while, then switch it to pay back to himself after some time has passed. The receiver will be alerted when that happens, but the sender hopes it will be too late.

The receiver generates a new key pair and gives the public key to the sender shortly before signing. This prevents the sender from preparing a chain of blocks ahead of time by working on it continuously until he is lucky enough to get far enough ahead, then executing the transaction at that moment. Once the transaction is sent, the dishonest sender starts working in secret on a parallel chain containing an alternate version of his transaction.

The recipient waits until the transaction has been added to a block and z blocks have been linked after it. He doesn't know the exact amount of progress the attacker has made, but assuming the honest blocks took the average expected time per block, the attacker's potential progress will be a Poisson distribution with expected value:

$$
\lambda = z\frac{q}{p}
$$

To get the probability the attacker could still catch up now, we multiply the Poisson density for each amount of progress he could have made by the probability he could catch up from that point:

$$
\sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot \left\{ 
\begin{array}{cl} 
\left(\frac{q}{p}\right)^{(z-k)} & \text{if } k \leq z \\
1 & \text{if } k > z 
\end{array}
\right.
$$

Rearranging to avoid summing the infinite tail of the distribution...

$$
1 - \sum_{k=0}^{z} \frac{\lambda^k e^{-\lambda}}{k!} \left(1-\left(\frac{q}{p}\right)^{(z-k)}\right)
$$

Converting to C code...

```c
#include <math.h>

double AttackerSuccessProbability(double q, int z)
{
    double p = 1.0 - q;
    double lambda = z * (q / p);
    double sum = 1.0;
    int i, k;
    for (k = 0; k <= z; k++)
    {
        double poisson = exp(-lambda);
        for (i = 1; i <= k; i++)
            poisson *= lambda / i;
        sum -= poisson * (1 - pow(q / p, z - k));
    }
    return sum;
}
```
Running some results, we can see the probability drop off exponentially with z.

```plaintext
q=0.1
z=0 P=1.0000000
z=1 P=0.2045873
z=2 P=0.0509779
z=3 P=0.0131722
z=4 P=0.0034552
z=5 P=0.0009137
z=6 P=0.0002428
z=7 P=0.0000647
z=8 P=0.0000173
z=9 P=0.0000046
z=10 P=0.0000012

q=0.3
z=0 P=1.0000000
z=5 P=0.1773523
z=10 P=0.0416605
z=15 P=0.0101008
z=20 P=0.0024804
z=25 P=0.0006132
z=30 P=0.0001522
z=35 P=0.0000379
z=40 P=0.0000095
z=45 P=0.0000024
z=50 P=0.0000006
```
Solving for P less than 0.1%...
```plaintext
P < 0.001
q=0.10 z=5
q=0.15 z=8
q=0.20 z=11
q=0.25 z=15
q=0.30 z=24
q=0.35 z=41
q=0.40 z=89
q=0.45 z=340
```
## 12. Conclusion
We have proposed a system for electronic transactions without relying on trust. We started with the usual framework of coins made from digital signatures, which provides strong control of ownership, but is incomplete without a way to prevent double-spending. To solve this, we proposed a peer-to-peer network using proof-of-work to record a public history of transactions that quickly becomes computationally impractical for an attacker to change if honest nodes control a majority of CPU power. The network is robust in its unstructured simplicity. Nodes work all at once with little coordination. They do not need to be identified, since messages are not routed to any particular place and only need to be delivered on a best effort basis. Nodes can leave and rejoin the network at will, accepting the proof-of-work chain as proof of what happened while they were gone. They vote with their CPU power, expressing their acceptance of valid blocks by working on extending them and rejecting invalid blocks by refusing to work on them. Any needed rules and incentives can be enforced with this consensus mechanism.
<br>

### References
---
[^1]: W. Dai, "b-money," http://www.weidai.com/bmoney.txt, 1998.
[^2]: H. Massias, X.S. Avila, and J.-J. Quisquater, "Design of a secure timestamping service with minimal
trust requirements," In 20th Symposium on Information Theory in the Benelux, May 1999.
[^3]: S. Haber, W.S. Stornetta, "How to time-stamp a digital document," In Journal of Cryptology, vol 3, no
2, pages 99-111, 1991.
[^4]: D. Bayer, S. Haber, W.S. Stornetta, "Improving the efficiency and reliability of digital time-stamping,"
In Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993.
[^5]: S. Haber, W.S. Stornetta, "Secure names for bit-strings," In Proceedings of the 4th ACM Conference
on Computer and Communications Security, pages 28-35, April 1997.
[^6]: A. Back, "Hashcash - a denial of service counter-measure,"
http://www.hashcash.org/papers/hashcash.pdf, 2002.
[^7]: R.C. Merkle, "Protocols for public key cryptosystems," In Proc. 1980 Symposium on Security and
Privacy, IEEE Computer Society, pages 122-133, April 1980.
[^8]: W. Feller, "An introduction to probability theory and its applications," 1957.

```

--------------------------------------------------------------------------------
/data/bitcoin-whitepaper.json:
--------------------------------------------------------------------------------

```json
[
  {
    "id": "bitcoin-whitepaper.md_section_0",
    "title": "Bitcoin: A Peer-to-Peer Electronic Cash System",
    "text": "Satoshi Nakamoto  \n[[email protected]](mailto:[email protected])  \nwww.bitcoin.org\n\n**Abstract.** A purely peer-to-peer version of electronic cash would allow online payments to be sent directly from one party to another without going through a financial institution. Digital signatures provide part of the solution, but the main benefits are lost if a trusted third party is still required to prevent double-spending. We propose a solution to the double-spending problem using a peer-to-peer network. The network timestamps transactions by hashing them into an ongoing chain of hash-based proof-of-work, forming a record that cannot be changed without redoing the proof-of-work. The longest chain not only serves as proof of the sequence of events witnessed, but proof that it came from the largest pool of CPU power. As long as a majority of CPU power is controlled by nodes that are not cooperating to attack the network, they'll generate the longest chain and outpace attackers. The network itself requires minimal structure. Messages are broadcast on a best effort basis, and nodes can leave and rejoin the network at will, accepting the longest proof-of-work chain as proof of what happened while they were gone.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 0,
    "date_indexed": "2025-03-20T15:28:42.816825",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_1",
    "title": "1. Introduction",
    "text": "Commerce on the Internet has come to rely almost exclusively on financial institutions serving as trusted third parties to process electronic payments. While the system works well enough for most transactions, it still suffers from the inherent weaknesses of the trust based model. Completely non-reversible transactions are not really possible, since financial institutions cannot avoid mediating disputes. The cost of mediation increases transaction costs, limiting the minimum practical transaction size and cutting off the possibility for small casual transactions, and there is a broader cost in the loss of ability to make non-reversible payments for non-reversible services. With the possibility of reversal, the need for trust spreads. Merchants must be wary of their customers, hassling them for more information than they would otherwise need. A certain percentage of fraud is accepted as unavoidable. These costs and payment uncertainties can be avoided in person by using physical currency, but no mechanism exists to make payments over a communications channel without a trusted party.\n\nWhat is needed is an electronic payment system based on cryptographic proof instead of trust, allowing any two willing parties to transact directly with each other without the need for a trusted third party. Transactions that are computationally impractical to reverse would protect sellers from fraud, and routine escrow mechanisms could easily be implemented to protect buyers. In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions. The system is secure as long as honest nodes collectively control more CPU power than any cooperating group of attacker nodes.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 1,
    "date_indexed": "2025-03-20T15:28:42.816838",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_2",
    "title": "2. Transactions",
    "text": "We define an electronic coin as a chain of digital signatures. Each owner transfers the coin to the next by digitally signing a hash of the previous transaction and the public key of the next owner and adding these to the end of the coin. A payee can verify the signatures to verify the chain of ownership.\n\n```\n      \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510               \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510              \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n      \u2502                     \u2502               \u2502                     \u2502              \u2502                     \u2502\n      \u2502    Transaction      \u2502               \u2502    Transaction      \u2502              \u2502    Transaction      \u2502\n      \u2502                     \u2502               \u2502                     \u2502              \u2502                     \u2502\n      \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502               \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502              \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502\n      \u2502   \u2502 Owner 1's   \u2502   \u2502               \u2502   \u2502 Owner 2's   \u2502   \u2502              \u2502   \u2502 Owner 3's   \u2502   \u2502\n      \u2502   \u2502 Public Key  \u2502   \u2502               \u2502   \u2502 Public Key  \u2502   \u2502              \u2502   \u2502 Public Key  \u2502   \u2502\n      \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2518   \u2502               \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2518   \u2502              \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2518   \u2502\n      \u2502           \u2502    .    \u2502               \u2502           \u2502    .    \u2502              \u2502           \u2502         \u2502\n\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502    .    \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502    .    \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502         \u2502\n      \u2502         \u2502 \u2502    .    \u2502               \u2502         \u2502 \u2502    .    \u2502              \u2502         \u2502 \u2502         \u2502\n      \u2502      \u250c\u2500\u2500\u25bc\u2500\u25bc\u2500\u2500\u2510 .    \u2502               \u2502      \u250c\u2500\u2500\u25bc\u2500\u25bc\u2500\u2500\u2510 .    \u2502              \u2502      \u250c\u2500\u2500\u25bc\u2500\u25bc\u2500\u2500\u2510      \u2502\n      \u2502      \u2502 Hash  \u2502 .    \u2502               \u2502      \u2502 Hash  \u2502 .    \u2502              \u2502      \u2502 Hash  \u2502      \u2502\n      \u2502      \u2514\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2518 .    \u2502    Verify     \u2502      \u2514\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2518 .    \u2502    Verify    \u2502      \u2514\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2518      \u2502\n      \u2502          \u2502     ............................    \u2502     ...........................    \u2502          \u2502\n      \u2502          \u2502          \u2502               \u2502     \u2502    \u2502          \u2502              \u2502     \u2502    \u2502          \u2502\n      \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502               \u2502   \u250c\u2500\u25bc\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502              \u2502   \u250c\u2500\u25bc\u2500\u2500\u2500\u2500\u25bc\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u2502\n      \u2502   \u2502 Owner 0's   \u2502   \u2502      Sign     \u2502   \u2502 Owner 1's   \u2502   \u2502      Sign    \u2502   \u2502 Owner 2's   \u2502   \u2502\n      \u2502   \u2502 Signature   \u2502   \u2502      ...........\u2500\u25ba\u2502 Signature   \u2502   \u2502     ...........\u2500\u25ba\u2502 Signature   \u2502   \u2502\n      \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2502      .        \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2502     .        \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2502\n      \u2502                     \u2502      .        \u2502                     \u2502     .        \u2502                     \u2502\n      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518      .        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518     .        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                   .                                    .\n          \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510          .            \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510         .            \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n          \u2502 Owner 1's   \u2502...........            \u2502 Owner 2's   \u2502..........            \u2502 Owner 3's   \u2502\n          \u2502 Private Key \u2502                       \u2502 Private Key \u2502                      \u2502 Private Key \u2502\n          \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518                       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518                      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\nThe problem of course is the payee can't verify that one of the owners did not double-spend the coin. A common solution is to introduce a trusted central authority, or mint, that checks every transaction for double spending. After each transaction, the coin must be returned to the mint to issue a new coin, and only coins issued directly from the mint are trusted not to be double-spent. The problem with this solution is that the fate of the entire money system depends on the company running the mint, with every transaction having to go through them, just like a bank.\n\nWe need a way for the payee to know that the previous owners did not sign any earlier transactions. For our purposes, the earliest transaction is the one that counts, so we don't care about later attempts to double-spend. The only way to confirm the absence of a transaction is to be aware of all transactions. In the mint based model, the mint was aware of all transactions and decided which arrived first. To accomplish this without a trusted party, transactions must be publicly announced [^1], and we need a system for participants to agree on a single history of the order in which they were received. The payee needs proof that at the time of each transaction, the majority of nodes agreed it was the first received.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 2,
    "date_indexed": "2025-03-20T15:28:42.816840",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_3",
    "title": "3. Timestamp Server",
    "text": "The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash, such as in a newspaper or Usenet post [^2] [^3] [^4] [^5]. The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it.\n\n```\n             \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2510                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba\u2502      \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba\u2502      \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba\n             \u2502 Hash \u2502                        \u2502 Hash \u2502\n        \u250c\u2500\u2500\u2500\u25ba\u2502      \u2502                   \u250c\u2500\u2500\u2500\u25ba\u2502      \u2502\n        \u2502    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518                   \u2502    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n        \u2502                               \u2502\n       \u250c\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u250c\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n       \u2502 Block                     \u2502   \u2502 Block                     \u2502\n       \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2510   \u2502   \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2510   \u2502\n       \u2502 \u2502Item \u2502 \u2502Item \u2502 \u2502...  \u2502   \u2502   \u2502 \u2502Item \u2502 \u2502Item \u2502 \u2502...  \u2502   \u2502\n       \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2518   \u2502   \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2518   \u2502\n       \u2502                           \u2502   \u2502                           \u2502\n       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 3,
    "date_indexed": "2025-03-20T15:28:42.816841",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_4",
    "title": "4. Proof-of-Work",
    "text": "To implement a distributed timestamp server on a peer-to-peer basis, we will need to use a proof-of-work system similar to Adam Back's Hashcash [^6], rather than newspaper or Usenet posts. The proof-of-work involves scanning for a value that when hashed, such as with SHA-256, the hash begins with a number of zero bits. The average work required is exponential in the number of zero bits required and can be verified by executing a single hash.\n\nFor our timestamp network, we implement the proof-of-work by incrementing a nonce in the block until a value is found that gives the block's hash the required zero bits. Once the CPU effort has been expended to make it satisfy the proof-of-work, the block cannot be changed without redoing the work. As later blocks are chained after it, the work to change the block would include redoing all the blocks after it.\n\n```\n       \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510      \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n       \u2502  Block                                 \u2502      \u2502  Block                                 \u2502\n       \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502      \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u25ba\u2502 Prev Hash        \u2502 \u2502 Nonce        \u2502 \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u25ba\u2502 Prev Hash        \u2502 \u2502 Nonce        \u2502 \u2502\n       \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502      \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n       \u2502                                        \u2502      \u2502                                        \u2502\n       \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502      \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n       \u2502 \u2502 Tx       \u2502 \u2502 Tx       \u2502 \u2502 ...      \u2502 \u2502      \u2502 \u2502 Tx       \u2502 \u2502 Tx       \u2502 \u2502 ...      \u2502 \u2502\n       \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502      \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n       \u2502                                        \u2502      \u2502                                        \u2502\n       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\nThe proof-of-work also solves the problem of determining representation in majority decision making. If the majority were based on one-IP-address-one-vote, it could be subverted by anyone able to allocate many IPs. Proof-of-work is essentially one-CPU-one-vote. The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. If a majority of CPU power is controlled by honest nodes, the honest chain will grow the fastest and outpace any competing chains. To modify a past block, an attacker would have to redo the proof-of-work of the block and all blocks after it and then catch up with and surpass the work of the honest nodes. We will show later that the probability of a slower attacker catching up diminishes exponentially as subsequent blocks are added.\n\nTo compensate for increasing hardware speed and varying interest in running nodes over time, the proof-of-work difficulty is determined by a moving average targeting an average number of blocks per hour. If they're generated too fast, the difficulty increases.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 4,
    "date_indexed": "2025-03-20T15:28:42.816842",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_5",
    "title": "5. Network",
    "text": "The steps to run the network are as follows:\n\n1. New transactions are broadcast to all nodes.\n2. Each node collects new transactions into a block.\n3. Each node works on finding a difficult proof-of-work for its block.\n4. When a node finds a proof-of-work, it broadcasts the block to all nodes.\n5. Nodes accept the block only if all transactions in it are valid and not already spent.\n6. Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash.\n\nNodes always consider the longest chain to be the correct one and will keep working on extending it. If two nodes broadcast different versions of the next block simultaneously, some nodes may receive one or the other first. In that case, they work on the first one they received, but save the other branch in case it becomes longer. The tie will be broken when the next proof-of-work is found and one branch becomes longer; the nodes that were working on the other branch will then switch to the longer one.\n\nNew transaction broadcasts do not necessarily need to reach all nodes. As long as they reach many nodes, they will get into a block before long. Block broadcasts are also tolerant of dropped messages. If a node does not receive a block, it will request it when it receives the next block and realizes it missed one.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 5,
    "date_indexed": "2025-03-20T15:28:42.816844",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_6",
    "title": "6. Incentive",
    "text": "By convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This adds an incentive for nodes to support the network, and provides a way to initially distribute coins into circulation, since there is no central authority to issue them. The steady addition of a constant of amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that is expended.\n\nThe incentive can also be funded with transaction fees. If the output value of a transaction is less than its input value, the difference is a transaction fee that is added to the incentive value of the block containing the transaction. Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free.\n\nThe incentive may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, such rules that favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 6,
    "date_indexed": "2025-03-20T15:28:42.816845",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_7",
    "title": "7. Reclaiming Disk Space",
    "text": "Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space. To facilitate this without breaking the block's hash, transactions are hashed in a Merkle Tree [^7] [^2] [^5], with only the root included in the block's hash. Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored.\n\n```\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502                                          \u2502    \u2502                                          \u2502\n\u2502 Block \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502    \u2502 Block \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502\n\u2502       \u2502  Block Header (Block Hash)  \u2502    \u2502    \u2502       \u2502  Block Header (Block Hash)  \u2502    \u2502\n\u2502       \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502    \u2502    \u2502       \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502    \u2502\n\u2502       \u2502 \u2502 Prev Hash  \u2502 \u2502 Nonce   \u2502  \u2502    \u2502    \u2502       \u2502 \u2502 Prev Hash  \u2502 \u2502 Nonce   \u2502  \u2502    \u2502\n\u2502       \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502    \u2502    \u2502       \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502    \u2502\n\u2502       \u2502                             \u2502    \u2502    \u2502       \u2502                             \u2502    \u2502\n\u2502       \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510         \u2502    \u2502    \u2502       \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510         \u2502    \u2502\n\u2502       \u2502     \u2502  Root Hash  \u2502         \u2502    \u2502    \u2502       \u2502     \u2502  Root Hash  \u2502         \u2502    \u2502\n\u2502       \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u25b2\u2500\u25b2\u2500\u2500\u2500\u2500\u2500\u2518         \u2502    \u2502    \u2502       \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u25b2\u2500\u25b2\u2500\u2500\u2500\u2500\u2500\u2518         \u2502    \u2502\n\u2502       \u2502           \u2502 \u2502               \u2502    \u2502    \u2502       \u2502           \u2502 \u2502               \u2502    \u2502\n\u2502       \u2502           \u2502 \u2502               \u2502    \u2502    \u2502       \u2502           \u2502 \u2502               \u2502    \u2502\n\u2502       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2502    \u2502       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2502\n\u2502                   \u2502 \u2502                    \u2502    \u2502                   \u2502 \u2502                    \u2502\n\u2502     ..........    \u2502 \u2502     ..........     \u2502    \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502 \u2502     ..........     \u2502\n\u2502     .        \u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500.        .     \u2502    \u2502     \u2502        \u251c\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500.        .     \u2502\n\u2502     . Hash01 .            . Hash23 .     \u2502    \u2502     \u2502 Hash01 \u2502            . Hash23 .     \u2502\n\u2502     .\u25b2.....\u25b2..            .\u25b2.....\u25b2..     \u2502    \u2502     \u2502        \u2502            .\u25b2.....\u25b2..     \u2502\n\u2502      \u2502     \u2502               \u2502     \u2502       \u2502    \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518             \u2502     \u2502       \u2502\n\u2502      \u2502     \u2502               \u2502     \u2502       \u2502    \u2502                            \u2502     \u2502       \u2502\n\u2502      \u2502     \u2502               \u2502     \u2502       \u2502    \u2502                            \u2502     \u2502       \u2502\n\u2502 .....\u2502.. ..\u2502.....     .....\u2502.. ..\u2502.....  \u2502    \u2502                       \u250c\u2500\u2500\u2500\u2500\u2534\u2500\u2510 ..\u2502.....  \u2502\n\u2502 .      . .      .     .      . .      .  \u2502    \u2502                       \u2502      \u2502 .      .  \u2502\n\u2502 .Hash0 . .Hash1 .     .Hash2 . .Hash3 .  \u2502    \u2502                       \u2502Hash2 \u2502 .Hash3 .  \u2502\n\u2502 ...\u25b2.... ...\u25b2....     ...\u25b2.... ...\u25b2....  \u2502    \u2502                       \u2502      \u2502 .      .  \u2502\n\u2502    \u2502        \u2502            \u2502        \u2502      \u2502    \u2502                       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518 ...\u25b2....  \u2502\n\u2502    \u2502        \u2502            \u2502        \u2502      \u2502    \u2502                                   \u2502      \u2502\n\u2502    \u2502        \u2502            \u2502        \u2502      \u2502    \u2502                                   \u2502      \u2502\n\u2502 \u250c\u2500\u2500\u2534\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2534\u2500\u2500\u2500\u2510     \u250c\u2500\u2500\u2534\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2534\u2500\u2500\u2500\u2510  \u2502    \u2502                                \u250c\u2500\u2500\u2534\u2500\u2500\u2500\u2510  \u2502\n\u2502 \u2502 Tx0  \u2502 \u2502 Tx1  \u2502     \u2502 Tx2  \u2502 \u2502 Tx3  \u2502  \u2502    \u2502                                \u2502 Tx3  \u2502  \u2502\n\u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502    \u2502                                \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502\n\u2502                                          \u2502    \u2502                                          \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n     Transactions Hashed in a Merkle Tree              After Pruning Tx0-2 from the Block\n```\n\nA block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year. With computer systems typically selling with 2GB of RAM as of 2008, and Moore's Law predicting current growth of 1.2GB per year, storage should not be a problem even if the block headers must be kept in memory.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 7,
    "date_indexed": "2025-03-20T15:28:42.816846",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_8",
    "title": "8. Simplified Payment Verification",
    "text": "It is possible to verify payments without running a full network node. A user only needs to keep a copy of the block headers of the longest proof-of-work chain, which he can get by querying network nodes until he's convinced he has the longest chain, and obtain the Merkle branch linking the transaction to the block it's timestamped in. He can't check the transaction for himself, but by linking it to a place in the chain, he can see that a network node has accepted it, and blocks added after it further confirm the network has accepted it.\n\n```\n     Longest Proof-of-Work Chain\n        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510      \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510       \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n        \u2502   Block Header                         \u2502      \u2502   Block Header                         \u2502       \u2502   Block Header                         \u2502\n        \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502      \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502       \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502\n \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u25ba\u2502 Prev Hash        \u2502 \u2502 Nonce        \u2502 \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u25ba\u2502 Prev Hash        \u2502 \u2502 Nonce        \u2502 \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u25ba\u2502 Prev Hash        \u2502 \u2502 Nonce        \u2502 \u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25ba\n        \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502      \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502       \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502\n        \u2502                                        \u2502      \u2502                                        \u2502       \u2502                                        \u2502\n        \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510              \u2502      \u2502    \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510              \u2502       \u2502     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510              \u2502\n        \u2502     \u2502   Merkle Root     \u2502              \u2502      \u2502    \u2502   Merkle Root      \u2502              \u2502       \u2502     \u2502   Merkle Root     \u2502              \u2502\n        \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518              \u2502      \u2502    \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u25b2\u2500\u25b2\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518              \u2502       \u2502     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518              \u2502\n        \u2502                                        \u2502      \u2502             \u2502 \u2502                        \u2502       \u2502                                        \u2502\n        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                                                      \u2502 \u2502\n                                                                      \u2502 \u2502\n                                                        \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502 \u2502     ..........\n                                                        \u2502        \u251c\u2500\u2500\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500.        .\n                                                        \u2502 Hash01 \u2502            . Hash23 .\n                                                        \u2502        \u2502            .\u25b2.....\u25b2..\n                                                        \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518             \u2502     \u2502\n                                                                               \u2502     \u2502\n                                                                               \u2502     \u2502   Merkle Branch for Tx3\n                                                                               \u2502     \u2502\n                                                                         \u250c\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2510 ..\u2502.....\n                                                                         \u2502       \u2502 .      .\n                                                                         \u2502 Hash2 \u2502 .Hash3 .\n                                                                         \u2502       \u2502 .      .\n                                                                         \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 ...\u25b2....\n                                                                                      \u2502\n                                                                                      \u2502\n                                                                                  \u250c\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2510\n                                                                                  \u2502  Tx3  \u2502\n                                                                                  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\nAs such, the verification is reliable as long as honest nodes control the network, but is more vulnerable if the network is overpowered by an attacker. While network nodes can verify transactions for themselves, the simplified method can be fooled by an attacker's fabricated transactions for as long as the attacker can continue to overpower the network. One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency. Businesses that receive frequent payments will probably still want to run their own nodes for more independent security and quicker verification.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 8,
    "date_indexed": "2025-03-20T15:28:42.816847",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_9",
    "title": "9. Combining and Splitting Value",
    "text": "Although it would be possible to handle coins individually, it would be unwieldy to make a separate transaction for every cent in a transfer. To allow value to be split and combined, transactions contain multiple inputs and outputs. Normally there will be either a single input from a larger previous transaction or multiple inputs combining smaller amounts, and at most two outputs: one for the payment, and one returning the change, if any, back to the sender.\n\n```\n     \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n     \u2502 Transaction          \u2502\n     \u2502                      \u2502\n     \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2510  \u250c\u2500\u2500\u2500\u2500\u2500\u2510   \u2502\n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u25ba\u2502 in  \u2502  \u2502 out \u2502 \u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u25ba\n     \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2518  \u2514\u2500\u2500\u2500\u2500\u2500\u2518   \u2502\n     \u2502                      \u2502\n     \u2502                      \u2502\n     \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2510  \u250c\u2500\u2500\u2500\u2500\u2500\u2510   \u2502\n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u25ba\u2502 in  \u2502  \u2502 ... \u2502 \u2500\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u25ba\n     \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2518  \u2514\u2500\u2500\u2500\u2500\u2500\u2518   \u2502\n     \u2502                      \u2502\n     \u2502                      \u2502\n     \u2502   \u250c\u2500\u2500\u2500\u2500\u2500\u2510            \u2502\n\u2500\u2500\u2500\u2500\u2500\u253c\u2500\u2500\u25ba\u2502...  \u2502            \u2502\n     \u2502   \u2514\u2500\u2500\u2500\u2500\u2500\u2518            \u2502\n     \u2502                      \u2502\n     \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\nIt should be noted that fan-out, where a transaction depends on several transactions, and those transactions depend on many more, is not a problem here. There is never the need to extract a complete standalone copy of a transaction's history.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 9,
    "date_indexed": "2025-03-20T15:28:42.816849",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_10",
    "title": "10. Privacy",
    "text": "The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party. The necessity to announce all transactions publicly precludes this method, but privacy can still be maintained by breaking the flow of information in another place: by keeping public keys anonymous. The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone. This is similar to the level of information released by stock exchanges, where the time and size of individual trades, the \"tape\", is made public, but without telling who the parties were.\n\n```\nTraditional Privacy Models                                                \u2502\n                                      \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510   \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u2502  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510  \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502  Trusted    \u2502   \u2502              \u2502  \u2502  \u2502        \u2502\n\u2502  Identities  \u251c\u2500\u2500\u2524 Transactions \u251c\u2500\u2500\u2500\u25ba\u2502 Third Party \u251c\u2500\u2500\u25ba\u2502 Counterparty \u2502  \u2502  \u2502 Public \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2502             \u2502   \u2502              \u2502  \u2502  \u2502        \u2502\n                                      \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518   \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518  \u2502  \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n                                                                          \u2502\n\nNew Privacy Model\n                                       \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510 \u2502 \u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510    \u2502        \u2502\n\u2502  Identities  \u2502 \u2502 \u2502 Transactions \u251c\u2500\u2500\u2500\u25ba\u2502 Public \u2502\n\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518 \u2502 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518    \u2502        \u2502\n                                       \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\nAs an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner. Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 10,
    "date_indexed": "2025-03-20T15:28:42.816850",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_11",
    "title": "11. Calculations",
    "text": "We consider the scenario of an attacker trying to generate an alternate chain faster than the honest chain. Even if this is accomplished, it does not throw the system open to arbitrary changes, such as creating value out of thin air or taking money that never belonged to the attacker. Nodes are not going to accept an invalid transaction as payment, and honest nodes will never accept a block containing them. An attacker can only try to change one of his own transactions to take back money he recently spent.\n\nThe race between the honest chain and an attacker chain can be characterized as a Binomial Random Walk. The success event is the honest chain being extended by one block, increasing its lead by +1, and the failure event is the attacker's chain being extended by one block, reducing the gap by -1.\n\nThe probability of an attacker catching up from a given deficit is analogous to a Gambler's Ruin problem. Suppose a gambler with unlimited credit starts at a deficit and plays potentially an infinite number of trials to try to reach breakeven. We can calculate the probability he ever reaches breakeven, or that an attacker ever catches up with the honest chain, as follows [^8]:\n\n```plaintext\np = probability an honest node finds the next block<\nq = probability the attacker finds the next block\nq = probability the attacker will ever catch up from z blocks behind\n``````\n     \n$$\nqz = \n\\begin{cases} \n1 & \\text{if } p \\leq q \\\\\n\\left(\\frac{q}{p}\\right) z & \\text{if } p > q \n\\end{cases}\n$$\n\nGiven our assumption that p > q, the probability drops exponentially as the number of blocks the attacker has to catch up with increases. With the odds against him, if he doesn't make a lucky lunge forward early on, his chances become vanishingly small as he falls further behind. \n\nWe now consider how long the recipient of a new transaction needs to wait before being sufficiently certain the sender can't change the transaction. We assume the sender is an attacker who wants to make the recipient believe he paid him for a while, then switch it to pay back to himself after some time has passed. The receiver will be alerted when that happens, but the sender hopes it will be too late.\n\nThe receiver generates a new key pair and gives the public key to the sender shortly before signing. This prevents the sender from preparing a chain of blocks ahead of time by working on it continuously until he is lucky enough to get far enough ahead, then executing the transaction at that moment. Once the transaction is sent, the dishonest sender starts working in secret on a parallel chain containing an alternate version of his transaction.\n\nThe recipient waits until the transaction has been added to a block and z blocks have been linked after it. He doesn't know the exact amount of progress the attacker has made, but assuming the honest blocks took the average expected time per block, the attacker's potential progress will be a Poisson distribution with expected value:\n\n$$\n\\lambda = z\\frac{q}{p}\n$$\n\nTo get the probability the attacker could still catch up now, we multiply the Poisson density for each amount of progress he could have made by the probability he could catch up from that point:\n\n$$\n\\sum_{k=0}^{\\infty} \\frac{\\lambda^k e^{-\\lambda}}{k!} \\cdot \\left\\{ \n\\begin{array}{cl} \n\\left(\\frac{q}{p}\\right)^{(z-k)} & \\text{if } k \\leq z \\\\\n1 & \\text{if } k > z \n\\end{array}\n\\right.\n$$\n\nRearranging to avoid summing the infinite tail of the distribution...\n\n$$\n1 - \\sum_{k=0}^{z} \\frac{\\lambda^k e^{-\\lambda}}{k!} \\left(1-\\left(\\frac{q}{p}\\right)^{(z-k)}\\right)\n$$\n\nConverting to C code...\n\n```c\n#include <math.h>\n\ndouble AttackerSuccessProbability(double q, int z)\n{\n    double p = 1.0 - q;\n    double lambda = z * (q / p);\n    double sum = 1.0;\n    int i, k;\n    for (k = 0; k <= z; k++)\n    {\n        double poisson = exp(-lambda);\n        for (i = 1; i <= k; i++)\n            poisson *= lambda / i;\n        sum -= poisson * (1 - pow(q / p, z - k));\n    }\n    return sum;\n}\n```\nRunning some results, we can see the probability drop off exponentially with z.\n\n```plaintext\nq=0.1\nz=0 P=1.0000000\nz=1 P=0.2045873\nz=2 P=0.0509779\nz=3 P=0.0131722\nz=4 P=0.0034552\nz=5 P=0.0009137\nz=6 P=0.0002428\nz=7 P=0.0000647\nz=8 P=0.0000173\nz=9 P=0.0000046\nz=10 P=0.0000012\n\nq=0.3\nz=0 P=1.0000000\nz=5 P=0.1773523\nz=10 P=0.0416605\nz=15 P=0.0101008\nz=20 P=0.0024804\nz=25 P=0.0006132\nz=30 P=0.0001522\nz=35 P=0.0000379\nz=40 P=0.0000095\nz=45 P=0.0000024\nz=50 P=0.0000006\n```\nSolving for P less than 0.1%...\n```plaintext\nP < 0.001\nq=0.10 z=5\nq=0.15 z=8\nq=0.20 z=11\nq=0.25 z=15\nq=0.30 z=24\nq=0.35 z=41\nq=0.40 z=89\nq=0.45 z=340\n```",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 11,
    "date_indexed": "2025-03-20T15:28:42.816851",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_12",
    "title": "12. Conclusion",
    "text": "We have proposed a system for electronic transactions without relying on trust. We started with the usual framework of coins made from digital signatures, which provides strong control of ownership, but is incomplete without a way to prevent double-spending. To solve this, we proposed a peer-to-peer network using proof-of-work to record a public history of transactions that quickly becomes computationally impractical for an attacker to change if honest nodes control a majority of CPU power. The network is robust in its unstructured simplicity. Nodes work all at once with little coordination. They do not need to be identified, since messages are not routed to any particular place and only need to be delivered on a best effort basis. Nodes can leave and rejoin the network at will, accepting the proof-of-work chain as proof of what happened while they were gone. They vote with their CPU power, expressing their acceptance of valid blocks by working on extending them and rejecting invalid blocks by refusing to work on them. Any needed rules and incentives can be enforced with this consensus mechanism.\n<br>",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 12,
    "date_indexed": "2025-03-20T15:28:42.816852",
    "tags": [],
    "category": []
  },
  {
    "id": "bitcoin-whitepaper.md_section_13",
    "title": "References",
    "text": "---\n[^1]: W. Dai, \"b-money,\" http://www.weidai.com/bmoney.txt, 1998.\n[^2]: H. Massias, X.S. Avila, and J.-J. Quisquater, \"Design of a secure timestamping service with minimal\ntrust requirements,\" In 20th Symposium on Information Theory in the Benelux, May 1999.\n[^3]: S. Haber, W.S. Stornetta, \"How to time-stamp a digital document,\" In Journal of Cryptology, vol 3, no\n2, pages 99-111, 1991.\n[^4]: D. Bayer, S. Haber, W.S. Stornetta, \"Improving the efficiency and reliability of digital time-stamping,\"\nIn Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993.\n[^5]: S. Haber, W.S. Stornetta, \"Secure names for bit-strings,\" In Proceedings of the 4th ACM Conference\non Computer and Communications Security, pages 28-35, April 1997.\n[^6]: A. Back, \"Hashcash - a denial of service counter-measure,\"\nhttp://www.hashcash.org/papers/hashcash.pdf, 2002.\n[^7]: R.C. Merkle, \"Protocols for public key cryptosystems,\" In Proc. 1980 Symposium on Security and\nPrivacy, IEEE Computer Society, pages 122-133, April 1980.\n[^8]: W. Feller, \"An introduction to probability theory and its applications,\" 1957.",
    "source": "data/bitcoin-whitepaper.md",
    "section_number": 13,
    "date_indexed": "2025-03-20T15:28:42.816854",
    "tags": [],
    "category": []
  }
]
```