This is page 19 of 23. Use http://codebase.md/basicmachines-co/basic-memory?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .claude
│   ├── agents
│   │   ├── python-developer.md
│   │   └── system-architect.md
│   └── commands
│       ├── release
│       │   ├── beta.md
│       │   ├── changelog.md
│       │   ├── release-check.md
│       │   └── release.md
│       ├── spec.md
│       └── test-live.md
├── .dockerignore
├── .github
│   ├── dependabot.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   ├── documentation.md
│   │   └── feature_request.md
│   └── workflows
│       ├── claude-code-review.yml
│       ├── claude-issue-triage.yml
│       ├── claude.yml
│       ├── dev-release.yml
│       ├── docker.yml
│       ├── pr-title.yml
│       ├── release.yml
│       └── test.yml
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CITATION.cff
├── CLA.md
├── CLAUDE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── ai-assistant-guide-extended.md
│   ├── character-handling.md
│   ├── cloud-cli.md
│   └── Docker.md
├── justfile
├── LICENSE
├── llms-install.md
├── pyproject.toml
├── README.md
├── SECURITY.md
├── smithery.yaml
├── specs
│   ├── SPEC-1 Specification-Driven Development Process.md
│   ├── SPEC-10 Unified Deployment Workflow and Event Tracking.md
│   ├── SPEC-11 Basic Memory API Performance Optimization.md
│   ├── SPEC-12 OpenTelemetry Observability.md
│   ├── SPEC-13 CLI Authentication with Subscription Validation.md
│   ├── SPEC-14 Cloud Git Versioning & GitHub Backup.md
│   ├── SPEC-14- Cloud Git Versioning & GitHub Backup.md
│   ├── SPEC-15 Configuration Persistence via Tigris for Cloud Tenants.md
│   ├── SPEC-16 MCP Cloud Service Consolidation.md
│   ├── SPEC-17 Semantic Search with ChromaDB.md
│   ├── SPEC-18 AI Memory Management Tool.md
│   ├── SPEC-19 Sync Performance and Memory Optimization.md
│   ├── SPEC-2 Slash Commands Reference.md
│   ├── SPEC-20 Simplified Project-Scoped Rclone Sync.md
│   ├── SPEC-3 Agent Definitions.md
│   ├── SPEC-4 Notes Web UI Component Architecture.md
│   ├── SPEC-5 CLI Cloud Upload via WebDAV.md
│   ├── SPEC-6 Explicit Project Parameter Architecture.md
│   ├── SPEC-7 POC to spike Tigris Turso for local access to cloud data.md
│   ├── SPEC-8 TigrisFS Integration.md
│   ├── SPEC-9 Multi-Project Bidirectional Sync Architecture.md
│   ├── SPEC-9 Signed Header Tenant Information.md
│   └── SPEC-9-1 Follow-Ups- Conflict, Sync, and Observability.md
├── src
│   └── basic_memory
│       ├── __init__.py
│       ├── alembic
│       │   ├── alembic.ini
│       │   ├── env.py
│       │   ├── migrations.py
│       │   ├── script.py.mako
│       │   └── versions
│       │       ├── 3dae7c7b1564_initial_schema.py
│       │       ├── 502b60eaa905_remove_required_from_entity_permalink.py
│       │       ├── 5fe1ab1ccebe_add_projects_table.py
│       │       ├── 647e7a75e2cd_project_constraint_fix.py
│       │       ├── 9d9c1cb7d8f5_add_mtime_and_size_columns_to_entity_.py
│       │       ├── a1b2c3d4e5f6_fix_project_foreign_keys.py
│       │       ├── b3c3938bacdb_relation_to_name_unique_index.py
│       │       ├── cc7172b46608_update_search_index_schema.py
│       │       └── e7e1f4367280_add_scan_watermark_tracking_to_project.py
│       ├── api
│       │   ├── __init__.py
│       │   ├── app.py
│       │   ├── routers
│       │   │   ├── __init__.py
│       │   │   ├── directory_router.py
│       │   │   ├── importer_router.py
│       │   │   ├── knowledge_router.py
│       │   │   ├── management_router.py
│       │   │   ├── memory_router.py
│       │   │   ├── project_router.py
│       │   │   ├── prompt_router.py
│       │   │   ├── resource_router.py
│       │   │   ├── search_router.py
│       │   │   └── utils.py
│       │   └── template_loader.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── app.py
│       │   ├── auth.py
│       │   ├── commands
│       │   │   ├── __init__.py
│       │   │   ├── cloud
│       │   │   │   ├── __init__.py
│       │   │   │   ├── api_client.py
│       │   │   │   ├── bisync_commands.py
│       │   │   │   ├── cloud_utils.py
│       │   │   │   ├── core_commands.py
│       │   │   │   ├── rclone_commands.py
│       │   │   │   ├── rclone_config.py
│       │   │   │   ├── rclone_installer.py
│       │   │   │   ├── upload_command.py
│       │   │   │   └── upload.py
│       │   │   ├── command_utils.py
│       │   │   ├── db.py
│       │   │   ├── import_chatgpt.py
│       │   │   ├── import_claude_conversations.py
│       │   │   ├── import_claude_projects.py
│       │   │   ├── import_memory_json.py
│       │   │   ├── mcp.py
│       │   │   ├── project.py
│       │   │   ├── status.py
│       │   │   └── tool.py
│       │   └── main.py
│       ├── config.py
│       ├── db.py
│       ├── deps.py
│       ├── file_utils.py
│       ├── ignore_utils.py
│       ├── importers
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── chatgpt_importer.py
│       │   ├── claude_conversations_importer.py
│       │   ├── claude_projects_importer.py
│       │   ├── memory_json_importer.py
│       │   └── utils.py
│       ├── markdown
│       │   ├── __init__.py
│       │   ├── entity_parser.py
│       │   ├── markdown_processor.py
│       │   ├── plugins.py
│       │   ├── schemas.py
│       │   └── utils.py
│       ├── mcp
│       │   ├── __init__.py
│       │   ├── async_client.py
│       │   ├── project_context.py
│       │   ├── prompts
│       │   │   ├── __init__.py
│       │   │   ├── ai_assistant_guide.py
│       │   │   ├── continue_conversation.py
│       │   │   ├── recent_activity.py
│       │   │   ├── search.py
│       │   │   └── utils.py
│       │   ├── resources
│       │   │   ├── ai_assistant_guide.md
│       │   │   └── project_info.py
│       │   ├── server.py
│       │   └── tools
│       │       ├── __init__.py
│       │       ├── build_context.py
│       │       ├── canvas.py
│       │       ├── chatgpt_tools.py
│       │       ├── delete_note.py
│       │       ├── edit_note.py
│       │       ├── list_directory.py
│       │       ├── move_note.py
│       │       ├── project_management.py
│       │       ├── read_content.py
│       │       ├── read_note.py
│       │       ├── recent_activity.py
│       │       ├── search.py
│       │       ├── utils.py
│       │       ├── view_note.py
│       │       └── write_note.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── knowledge.py
│       │   ├── project.py
│       │   └── search.py
│       ├── repository
│       │   ├── __init__.py
│       │   ├── entity_repository.py
│       │   ├── observation_repository.py
│       │   ├── project_info_repository.py
│       │   ├── project_repository.py
│       │   ├── relation_repository.py
│       │   ├── repository.py
│       │   └── search_repository.py
│       ├── schemas
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── cloud.py
│       │   ├── delete.py
│       │   ├── directory.py
│       │   ├── importer.py
│       │   ├── memory.py
│       │   ├── project_info.py
│       │   ├── prompt.py
│       │   ├── request.py
│       │   ├── response.py
│       │   ├── search.py
│       │   └── sync_report.py
│       ├── services
│       │   ├── __init__.py
│       │   ├── context_service.py
│       │   ├── directory_service.py
│       │   ├── entity_service.py
│       │   ├── exceptions.py
│       │   ├── file_service.py
│       │   ├── initialization.py
│       │   ├── link_resolver.py
│       │   ├── project_service.py
│       │   ├── search_service.py
│       │   └── service.py
│       ├── sync
│       │   ├── __init__.py
│       │   ├── background_sync.py
│       │   ├── sync_service.py
│       │   └── watch_service.py
│       ├── templates
│       │   └── prompts
│       │       ├── continue_conversation.hbs
│       │       └── search.hbs
│       └── utils.py
├── test-int
│   ├── BENCHMARKS.md
│   ├── cli
│   │   ├── test_project_commands_integration.py
│   │   └── test_version_integration.py
│   ├── conftest.py
│   ├── mcp
│   │   ├── test_build_context_underscore.py
│   │   ├── test_build_context_validation.py
│   │   ├── test_chatgpt_tools_integration.py
│   │   ├── test_default_project_mode_integration.py
│   │   ├── test_delete_note_integration.py
│   │   ├── test_edit_note_integration.py
│   │   ├── test_list_directory_integration.py
│   │   ├── test_move_note_integration.py
│   │   ├── test_project_management_integration.py
│   │   ├── test_project_state_sync_integration.py
│   │   ├── test_read_content_integration.py
│   │   ├── test_read_note_integration.py
│   │   ├── test_search_integration.py
│   │   ├── test_single_project_mcp_integration.py
│   │   └── test_write_note_integration.py
│   ├── test_db_wal_mode.py
│   ├── test_disable_permalinks_integration.py
│   └── test_sync_performance_benchmark.py
├── tests
│   ├── __init__.py
│   ├── api
│   │   ├── conftest.py
│   │   ├── test_async_client.py
│   │   ├── test_continue_conversation_template.py
│   │   ├── test_directory_router.py
│   │   ├── test_importer_router.py
│   │   ├── test_knowledge_router.py
│   │   ├── test_management_router.py
│   │   ├── test_memory_router.py
│   │   ├── test_project_router_operations.py
│   │   ├── test_project_router.py
│   │   ├── test_prompt_router.py
│   │   ├── test_relation_background_resolution.py
│   │   ├── test_resource_router.py
│   │   ├── test_search_router.py
│   │   ├── test_search_template.py
│   │   ├── test_template_loader_helpers.py
│   │   └── test_template_loader.py
│   ├── cli
│   │   ├── conftest.py
│   │   ├── test_cli_tools.py
│   │   ├── test_cloud_authentication.py
│   │   ├── test_ignore_utils.py
│   │   ├── test_import_chatgpt.py
│   │   ├── test_import_claude_conversations.py
│   │   ├── test_import_claude_projects.py
│   │   ├── test_import_memory_json.py
│   │   ├── test_project_add_with_local_path.py
│   │   └── test_upload.py
│   ├── conftest.py
│   ├── db
│   │   └── test_issue_254_foreign_key_constraints.py
│   ├── importers
│   │   ├── test_importer_base.py
│   │   └── test_importer_utils.py
│   ├── markdown
│   │   ├── __init__.py
│   │   ├── test_date_frontmatter_parsing.py
│   │   ├── test_entity_parser_error_handling.py
│   │   ├── test_entity_parser.py
│   │   ├── test_markdown_plugins.py
│   │   ├── test_markdown_processor.py
│   │   ├── test_observation_edge_cases.py
│   │   ├── test_parser_edge_cases.py
│   │   ├── test_relation_edge_cases.py
│   │   └── test_task_detection.py
│   ├── mcp
│   │   ├── conftest.py
│   │   ├── test_obsidian_yaml_formatting.py
│   │   ├── test_permalink_collision_file_overwrite.py
│   │   ├── test_prompts.py
│   │   ├── test_resources.py
│   │   ├── test_tool_build_context.py
│   │   ├── test_tool_canvas.py
│   │   ├── test_tool_delete_note.py
│   │   ├── test_tool_edit_note.py
│   │   ├── test_tool_list_directory.py
│   │   ├── test_tool_move_note.py
│   │   ├── test_tool_read_content.py
│   │   ├── test_tool_read_note.py
│   │   ├── test_tool_recent_activity.py
│   │   ├── test_tool_resource.py
│   │   ├── test_tool_search.py
│   │   ├── test_tool_utils.py
│   │   ├── test_tool_view_note.py
│   │   ├── test_tool_write_note.py
│   │   └── tools
│   │       └── test_chatgpt_tools.py
│   ├── Non-MarkdownFileSupport.pdf
│   ├── repository
│   │   ├── test_entity_repository_upsert.py
│   │   ├── test_entity_repository.py
│   │   ├── test_entity_upsert_issue_187.py
│   │   ├── test_observation_repository.py
│   │   ├── test_project_info_repository.py
│   │   ├── test_project_repository.py
│   │   ├── test_relation_repository.py
│   │   ├── test_repository.py
│   │   ├── test_search_repository_edit_bug_fix.py
│   │   └── test_search_repository.py
│   ├── schemas
│   │   ├── test_base_timeframe_minimum.py
│   │   ├── test_memory_serialization.py
│   │   ├── test_memory_url_validation.py
│   │   ├── test_memory_url.py
│   │   ├── test_schemas.py
│   │   └── test_search.py
│   ├── Screenshot.png
│   ├── services
│   │   ├── test_context_service.py
│   │   ├── test_directory_service.py
│   │   ├── test_entity_service_disable_permalinks.py
│   │   ├── test_entity_service.py
│   │   ├── test_file_service.py
│   │   ├── test_initialization.py
│   │   ├── test_link_resolver.py
│   │   ├── test_project_removal_bug.py
│   │   ├── test_project_service_operations.py
│   │   ├── test_project_service.py
│   │   └── test_search_service.py
│   ├── sync
│   │   ├── test_character_conflicts.py
│   │   ├── test_sync_service_incremental.py
│   │   ├── test_sync_service.py
│   │   ├── test_sync_wikilink_issue.py
│   │   ├── test_tmp_files.py
│   │   ├── test_watch_service_edge_cases.py
│   │   ├── test_watch_service_reload.py
│   │   └── test_watch_service.py
│   ├── test_config.py
│   ├── test_db_migration_deduplication.py
│   ├── test_deps.py
│   ├── test_production_cascade_delete.py
│   ├── test_rclone_commands.py
│   └── utils
│       ├── test_file_utils.py
│       ├── test_frontmatter_obsidian_compatible.py
│       ├── test_parse_tags.py
│       ├── test_permalink_formatting.py
│       ├── test_utf8_handling.py
│       └── test_validate_project_path.py
├── uv.lock
├── v0.15.0-RELEASE-DOCS.md
└── v15-docs
    ├── api-performance.md
    ├── background-relations.md
    ├── basic-memory-home.md
    ├── bug-fixes.md
    ├── chatgpt-integration.md
    ├── cloud-authentication.md
    ├── cloud-bisync.md
    ├── cloud-mode-usage.md
    ├── cloud-mount.md
    ├── default-project-mode.md
    ├── env-file-removal.md
    ├── env-var-overrides.md
    ├── explicit-project-parameter.md
    ├── gitignore-integration.md
    ├── project-root-env-var.md
    ├── README.md
    └── sqlite-performance.md
```

# Files

--------------------------------------------------------------------------------
/specs/SPEC-9 Multi-Project Bidirectional Sync Architecture.md:
--------------------------------------------------------------------------------

```markdown
   1 | ---
   2 | title: 'SPEC-9: Multi-Project Bidirectional Sync Architecture'
   3 | type: spec
   4 | permalink: specs/spec-9-multi-project-bisync
   5 | tags:
   6 | - cloud
   7 | - bisync
   8 | - architecture
   9 | - multi-project
  10 | ---
  11 | 
  12 | # SPEC-9: Multi-Project Bidirectional Sync Architecture
  13 | 
  14 | ## Status: ✅ Implementation Complete
  15 | 
  16 | **Completed Phases:**
  17 | - ✅ Phase 1: Cloud Mode Toggle & Config
  18 | - ✅ Phase 2: Bisync Updates (Multi-Project)
  19 | - ✅ Phase 3: Sync Command Dual Mode
  20 | - ✅ Phase 4: Remove Duplicate Commands & Cloud Mode Auth
  21 | - ✅ Phase 5: Mount Updates
  22 | - ✅ Phase 6: Safety & Validation
  23 | - ⏸️ Phase 7: Cloud-Side Implementation (Deferred to cloud repo)
  24 | - ✅ Phase 8.1: Testing (All test scenarios validated)
  25 | - ✅ Phase 8.2: Documentation (Core docs complete, demos pending)
  26 | 
  27 | **Key Achievements:**
  28 | - Unified CLI: `bm sync`, `bm project`, `bm tool` work transparently in both local and cloud modes
  29 | - Multi-project sync: Single `bm sync` operation handles all projects bidirectionally
  30 | - Cloud mode toggle: `bm cloud login` / `bm cloud logout` switches modes seamlessly
  31 | - Integrity checking: `bm cloud check` verifies file matching without data transfer
  32 | - Directory isolation: Mount and bisync use separate directories with conflict prevention
  33 | - Clean UX: No RCLONE_TEST files, clear error messages, transparent implementation
  34 | 
  35 | ## Why
  36 | 
  37 | **Current State:**
   38 | SPEC-8 implemented rclone bisync for cloud file synchronization, but the implementation has several architectural limitations:
  39 | 1. Syncs only a single project subdirectory (`bucket:/basic-memory`)
  40 | 2. Requires separate `bm cloud` command namespace, duplicating existing CLI commands
  41 | 3. Users must learn different commands for local vs cloud operations
  42 | 4. RCLONE_TEST marker files clutter user directories
  43 | 
  44 | **Problems:**
   45 | 1. **Duplicate Commands**: `bm project` duplicates `bm cloud project`, while `bm tool` has no cloud equivalent at all
  46 | 2. **Inconsistent UX**: Same operations require different command syntax depending on mode
  47 | 3. **Single Project Sync**: Users can only sync one project at a time
  48 | 4. **Manual Coordination**: Creating new projects requires manual coordination between local and cloud
  49 | 5. **Confusing Artifacts**: RCLONE_TEST marker files confuse users
  50 | 
  51 | **Goals:**
  52 | - **Unified CLI**: All existing `bm` commands work in both local and cloud mode via toggle
  53 | - **Multi-Project Sync**: Single sync operation handles all projects bidirectionally
  54 | - **Simple Mode Switch**: `bm cloud login` enables cloud mode, `logout` returns to local
  55 | - **Automatic Registration**: Projects auto-register on both local and cloud sides
  56 | - **Clean UX**: Remove unnecessary safety checks and confusing artifacts
  57 | 
  58 | ## Cloud Access Paradigm: The Dropbox Model
  59 | 
  60 | **Mental Model Shift:**
  61 | 
  62 | Basic Memory cloud access follows the **Dropbox/iCloud paradigm** - not a per-project cloud connection model.
  63 | 
  64 | **What This Means:**
  65 | 
  66 | ```
  67 | Traditional Project-Based Model (❌ Not This):
  68 |   bm cloud mount --project work      # Mount individual project
  69 |   bm cloud mount --project personal  # Mount another project
  70 |   bm cloud sync --project research   # Sync specific project
  71 |   → Multiple connections, multiple credentials, complex management
  72 | 
  73 | Dropbox Model (✅ This):
  74 |   bm cloud mount                     # One mount, all projects
  75 |   bm sync                            # One sync, all projects
  76 |   ~/basic-memory-cloud/              # One folder, all content
  77 |   → Single connection, organized by folders (projects)
  78 | ```
  79 | 
  80 | **Key Principles:**
  81 | 
  82 | 1. **Mount/Bisync = Access Methods, Not Project Tools**
  83 |    - Mount: Read-through cache to cloud (like Dropbox folder)
  84 |    - Bisync: Bidirectional sync with cloud (like Dropbox sync)
  85 |    - Both operate at **bucket level** (all projects)
  86 | 
  87 | 2. **Projects = Organization Within Cloud Space**
  88 |    - Projects are folders within your cloud storage
  89 |    - Creating a folder creates a project (auto-discovered)
  90 |    - Projects are managed via `bm project` commands
  91 | 
  92 | 3. **One Cloud Space Per Machine**
  93 |    - One set of IAM credentials per tenant
  94 |    - One mount point: `~/basic-memory-cloud/`
  95 |    - One bisync directory: `~/basic-memory-cloud-sync/` (default)
  96 |    - All projects accessible through this single entry point
  97 | 
  98 | 4. **Why This Works Better**
  99 |    - **Credential Management**: One credential set, not N sets per project
 100 |    - **Resource Efficiency**: One rclone process, not N processes
 101 |    - **Familiar Pattern**: Users already understand Dropbox/iCloud
 102 |    - **Operational Simplicity**: `mount` once, `unmount` once
 103 |    - **Scales Naturally**: Add projects by creating folders, not reconfiguring cloud access
 104 | 
 105 | **User Journey:**
 106 | 
 107 | ```bash
  108 | # Set up cloud access (once)
 109 | bm cloud login
 110 | bm cloud mount  # or: bm cloud setup for bisync
 111 | 
 112 | # Work with projects (create folders as needed)
 113 | cd ~/basic-memory-cloud/
 114 | mkdir my-new-project
 115 | echo "# Notes" > my-new-project/readme.md
 116 | 
 117 | # Cloud auto-discovers and registers project
 118 | # No additional cloud configuration needed
 119 | ```
 120 | 
 121 | This paradigm shift means **mount and bisync are infrastructure concerns**, while **projects are content organization**. Users think about their knowledge, not about cloud plumbing.
 122 | 
 123 | ## What
 124 | 
 125 | This spec affects:
 126 | 
 127 | 1. **Cloud Mode Toggle** (`config.py`, `async_client.py`):
 128 |    - Add `cloud_mode` flag to `~/.basic-memory/config.json`
 129 |    - Set/unset `BASIC_MEMORY_PROXY_URL` based on cloud mode
 130 |    - `bm cloud login` enables cloud mode, `logout` disables it
 131 |    - All CLI commands respect cloud mode via existing async_client
 132 | 
 133 | 2. **Unified CLI Commands**:
 134 |    - **Remove**: `bm cloud project` commands (duplicate of `bm project`)
 135 |    - **Enhance**: `bm sync` co-opted for bisync in cloud mode
 136 |    - **Keep**: `bm cloud login/logout/status/setup` for mode management
 137 |    - **Result**: `bm project`, `bm tool`, `bm sync` work in both modes
 138 | 
 139 | 3. **Bisync Integration** (`bisync_commands.py`):
 140 |    - Remove `--check-access` (no RCLONE_TEST files)
 141 |    - Sync bucket root (all projects), not single subdirectory
 142 |    - Project auto-registration before sync
 143 |    - `bm sync` triggers bisync in cloud mode
 144 |    - `bm sync --watch` for continuous sync
 145 | 
 146 | 4. **Config Structure**:
 147 |    ```json
 148 |    {
 149 |      "cloud_mode": true,
 150 |      "cloud_host": "https://cloud.basicmemory.com",
 151 |      "auth_tokens": {...},
 152 |      "bisync_config": {
 153 |        "profile": "balanced",
 154 |        "sync_dir": "~/basic-memory-cloud-sync"
 155 |      }
 156 |    }
 157 |    ```
 158 | 
 159 | 5. **User Workflows**:
 160 |    - **Enable cloud**: `bm cloud login` → all commands work remotely
 161 |    - **Create projects**: `bm project add "name"` creates on cloud
 162 |    - **Sync files**: `bm sync` runs bisync (all projects)
 163 |    - **Use tools**: `bm tool write-note` creates notes on cloud
 164 |    - **Disable cloud**: `bm cloud logout` → back to local mode
 165 | 
 166 | ## Implementation Tasks
 167 | 
 168 | ### Phase 1: Cloud Mode Toggle & Config (Foundation) ✅
 169 | 
 170 | **1.1 Update Config Schema**
 171 | - [x] Add `cloud_mode: bool = False` to Config model
 172 | - [x] Add `bisync_config: dict` with `profile` and `sync_dir` fields
 173 | - [x] Ensure `cloud_host` field exists
 174 | - [x] Add config migration for existing users (defaults handle this)
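
The "defaults handle this" migration idea can be sketched with plain dataclass defaults: because every new field has a default, a `config.json` written before this spec still deserializes, with cloud features simply off. (The real `BasicMemoryConfig` is a Pydantic settings model per Phase 4.3; this is only an illustration, and field names follow the spec.)

```python
from dataclasses import dataclass, field

@dataclass
class Config:
    # New fields carry defaults, so pre-existing config files load
    # unchanged -- no explicit migration step is required.
    cloud_mode: bool = False
    cloud_host: str = "https://cloud.basicmemory.com"
    bisync_config: dict = field(default_factory=lambda: {
        "profile": "balanced",
        "sync_dir": "~/basic-memory-cloud-sync",
    })

old_style = {}              # a config written before this spec
cfg = Config(**old_style)   # still loads; cloud mode stays disabled
```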
 175 | 
 176 | **1.2 Update async_client.py**
 177 | - [x] Read `cloud_mode` from config (not just environment)
 178 | - [x] Set `BASIC_MEMORY_PROXY_URL` from config when `cloud_mode=true`
 179 | - [x] Priority: env var > config.cloud_host (if cloud_mode) > None (local ASGI)
 180 | - [ ] Test both local and cloud mode routing
 181 | 
 182 | **1.3 Update Login/Logout Commands**
 183 | - [x] `bm cloud login`: Set `cloud_mode=true` and save config
 184 | - [x] `bm cloud login`: Set `BASIC_MEMORY_PROXY_URL` environment variable
 185 | - [x] `bm cloud logout`: Set `cloud_mode=false` and save config
 186 | - [x] `bm cloud logout`: Clear `BASIC_MEMORY_PROXY_URL` environment variable
 187 | - [x] `bm cloud status`: Show current mode (local/cloud), connection status
 188 | 
 189 | **1.4 Skip Initialization in Cloud Mode** ✅
 190 | - [x] Update `ensure_initialization()` to check `cloud_mode` and return early
 191 | - [x] Document that `config.projects` is only used in local mode
 192 | - [x] Cloud manages its own projects via API, no local reconciliation needed
 193 | 
 194 | ### Phase 2: Bisync Updates (Multi-Project)
 195 | 
 196 | **2.1 Remove RCLONE_TEST Files** ✅
 197 | - [x] Update all bisync profiles: `check_access=False`
 198 | - [x] Remove RCLONE_TEST creation from `setup_cloud_bisync()`
 199 | - [x] Remove RCLONE_TEST upload logic
 200 | - [ ] Update documentation
 201 | 
 202 | **2.2 Sync Bucket Root (All Projects)** ✅
 203 | - [x] Change remote path from `bucket:/basic-memory` to `bucket:/` in `build_bisync_command()`
 204 | - [x] Update `setup_cloud_bisync()` to use bucket root
 205 | - [ ] Test with multiple projects
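
Combining the bucket-root change with the state isolation from Phase 6, the assembled command might look like the sketch below. Remote naming follows the `basic-memory-{tenant_id}:{bucket_name}` convention from Phase 5; the exact flag set is an assumption, beyond `--workdir` and the absence of `--check-access`, which the spec states.

```python
def build_bisync_command(tenant_id: str, bucket_name: str,
                         sync_dir: str, state_dir: str) -> list[str]:
    """Bisync the entire bucket root (all projects) against the local
    sync dir, keeping rclone state out of user directories via --workdir.
    No --check-access, so no RCLONE_TEST marker files are needed."""
    return [
        "rclone", "bisync",
        sync_dir,
        f"basic-memory-{tenant_id}:{bucket_name}",  # bucket root, not /basic-memory
        "--workdir", state_dir,
    ]
```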
 206 | 
 207 | **2.3 Project Auto-Registration (Bisync)** ✅
 208 | - [x] Add `fetch_cloud_projects()` function (GET /proxy/projects/projects)
 209 | - [x] Add `scan_local_directories()` function
 210 | - [x] Add `create_cloud_project()` function (POST /proxy/projects/projects)
 211 | - [x] Integrate into `run_bisync()`: fetch → scan → create missing → sync
 212 | - [x] Wait for API 201 response before syncing
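
The fetch → scan → create-missing step that runs before sync can be sketched as follows. Function names mirror the checklist; the callable stands in for the `POST /proxy/projects/projects` request, and the spec's requirement is to block on its 201 response before syncing.

```python
def register_missing_projects(cloud_projects: list[dict],
                              local_dirs: list[str],
                              create_cloud_project) -> list[str]:
    """Create a cloud project for each local top-level directory the
    cloud does not know about yet, before bisync runs."""
    known = {p["name"] for p in cloud_projects}
    created = []
    for name in sorted(local_dirs):
        if name not in known:
            create_cloud_project(name)  # block until the API returns 201
            created.append(name)
    return created
```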
 213 | 
 214 | **2.4 Bisync Directory Configuration** ✅
 215 | - [x] Add `--dir` parameter to `bm cloud bisync-setup`
 216 | - [x] Store bisync directory in config
 217 | - [x] Default to `~/basic-memory-cloud-sync/`
 218 | - [x] Add `validate_bisync_directory()` safety check
 219 | - [x] Update `get_default_mount_path()` to return fixed `~/basic-memory-cloud/`
 220 | 
 221 | **2.5 Sync/Status API Infrastructure** ✅ (commit d48b1dc)
 222 | - [x] Create `POST /{project}/project/sync` endpoint for background sync
 223 | - [x] Create `POST /{project}/project/status` endpoint for scan-only status
 224 | - [x] Create `SyncReportResponse` Pydantic schema
 225 | - [x] Refactor CLI `sync` command to use API endpoint
 226 | - [x] Refactor CLI `status` command to use API endpoint
 227 | - [x] Create `command_utils.py` with shared `run_sync()` function
 228 | - [x] Update `notify_container_sync()` to call `run_sync()` for each project
 229 | - [x] Update all tests to match new API-based implementation
 230 | 
 231 | ### Phase 3: Sync Command Dual Mode ✅
 232 | 
 233 | **3.1 Update `bm sync` Command** ✅
 234 | - [x] Check `config.cloud_mode` at start
 235 | - [x] If `cloud_mode=false`: Run existing local sync
 236 | - [x] If `cloud_mode=true`: Run bisync
 237 | - [x] Add `--watch` parameter for continuous sync
 238 | - [x] Add `--interval` parameter (default 60 seconds)
 239 | - [x] Error if `--watch` used in local mode with helpful message
 240 | 
 241 | **3.2 Watch Mode for Bisync** ✅
 242 | - [x] Implement `run_bisync_watch()` with interval loop
 243 | - [x] Add `--interval` parameter (default 60 seconds)
 244 | - [x] Handle errors gracefully, continue on failure
 245 | - [x] Show sync progress and status
 246 | 
 247 | **3.3 Integrity Check Command** ✅
 248 | - [x] Implement `bm cloud check` command using `rclone check`
 249 | - [x] Read-only operation that verifies file matching
 250 | - [x] Error with helpful messages if rclone/bisync not set up
 251 | - [x] Support `--one-way` flag for faster checks
 252 | - [x] Transparent about rclone implementation
 253 | - [x] Suggest `bm sync` to resolve differences
 254 | 
 255 | **Implementation Notes:**
 256 | - `bm sync` adapts to cloud mode automatically - users don't need separate commands
 257 | - `bm cloud bisync` kept for power users with full options (--dry-run, --resync, --profile, --verbose)
 258 | - `bm cloud check` provides integrity verification without transferring data
 259 | - Design philosophy: Simplicity for everyday use, transparency about implementation
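
Being "transparent about rclone implementation" suggests `bm cloud check` is a thin wrapper over `rclone check`. A sketch of the command assembly — remote naming as in Phase 5; `--one-way` is named by the checklist, everything else is an assumption:

```python
def build_check_command(tenant_id: str, bucket_name: str,
                        sync_dir: str, one_way: bool = False) -> list[str]:
    """Read-only comparison of the local sync dir against the bucket --
    verifies file matching without transferring any data."""
    cmd = ["rclone", "check", sync_dir, f"basic-memory-{tenant_id}:{bucket_name}"]
    if one_way:
        cmd.append("--one-way")  # faster: only checks source files exist on remote
    return cmd
```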
 260 | 
 261 | ### Phase 4: Remove Duplicate Commands & Cloud Mode Auth ✅
 262 | 
 263 | **4.0 Cloud Mode Authentication** ✅
 264 | - [x] Update `async_client.py` to support dual auth sources
 265 | - [x] FastMCP context auth (cloud service mode) via `inject_auth_header()`
 266 | - [x] JWT token file auth (CLI cloud mode) via `CLIAuth.get_valid_token()`
 267 | - [x] Automatic token refresh for CLI cloud mode
 268 | - [x] Remove `BASIC_MEMORY_PROXY_URL` environment variable dependency
 269 | - [x] Simplify to use only `config.cloud_mode` + `config.cloud_host`
 270 | 
 271 | **4.1 Delete `bm cloud project` Commands** ✅
 272 | - [x] Remove `bm cloud project list` (use `bm project list`)
 273 | - [x] Remove `bm cloud project add` (use `bm project add`)
 274 | - [x] Update `core_commands.py` to remove project_app subcommands
 275 | - [x] Keep only: `login`, `logout`, `status`, `setup`, `mount`, `unmount`, bisync commands
 276 | - [x] Remove unused imports (Table, generate_permalink, os)
 277 | - [x] Clean up environment variable references in login/logout
 278 | 
 279 | **4.2 CLI Command Cloud Mode Integration** ✅
 280 | - [x] Add runtime `cloud_mode_enabled` checks to all CLI commands
 281 | - [x] Update `list_projects()` to conditionally authenticate based on cloud mode
 282 | - [x] Update `remove_project()` to conditionally authenticate based on cloud mode
 283 | - [x] Update `run_sync()` to conditionally authenticate based on cloud mode
 284 | - [x] Update `get_project_info()` to conditionally authenticate based on cloud mode
 285 | - [x] Update `run_status()` to conditionally authenticate based on cloud mode
 286 | - [x] Remove auth from `set_default_project()` (local-only command, no cloud version)
 287 | - [x] Create CLI integration tests (`test-int/cli/`) to validate both local and cloud modes
 288 | - [x] Replace mock-heavy CLI tests with integration tests (deleted 5 mock test files)
 289 | 
 290 | **4.3 OAuth Authentication Fixes** ✅
 291 | - [x] Restore missing `SettingsConfigDict` in `BasicMemoryConfig`
 292 | - [x] Fix environment variable reading with `BASIC_MEMORY_` prefix
 293 | - [x] Fix `.env` file loading
 294 | - [x] Fix extra field handling for config files
 295 | - [x] Resolve `bm cloud login` OAuth failure ("Something went wrong" error)
 296 | - [x] Implement PKCE (Proof Key for Code Exchange) for device flow
 297 | - [x] Generate code verifier and SHA256 challenge for device authorization
 298 | - [x] Send code_verifier with token polling requests
 299 | - [x] Support both PKCE-required and PKCE-optional OAuth clients
 300 | - [x] Verify authentication flow works end-to-end with staging and production
 301 | - [x] Document WorkOS requirement: redirect URI must be configured even for device flow
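
The PKCE pieces in 4.3 are standard RFC 7636 S256: a random code verifier, plus a challenge that is the base64url-encoded (unpadded) SHA-256 of it. A sketch of the pair generation:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) for the S256 method.
    The challenge goes in the device-authorization request; the
    verifier is sent with the token polling requests."""
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```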
 302 | 
 303 | **4.4 Update Documentation**
 304 | - [ ] Update `cloud-cli.md` with cloud mode toggle workflow
 305 | - [ ] Document `bm cloud login` → use normal commands
 306 | - [ ] Add examples of cloud mode usage
 307 | - [ ] Document mount vs bisync directory isolation
 308 | - [ ] Add troubleshooting section
 309 | 
 310 | ### Phase 5: Mount Updates ✅
 311 | 
 312 | **5.1 Fixed Mount Directory** ✅
 313 | - [x] Change mount path to `~/basic-memory-cloud/` (fixed, no tenant ID)
 314 | - [x] Update `get_default_mount_path()` function
 315 | - [x] Remove configurability (fixed location)
 316 | - [x] Update mount commands to use new path
 317 | 
 318 | **5.2 Mount at Bucket Root** ✅
 319 | - [x] Ensure mount uses bucket root (not subdirectory)
 320 | - [x] Test with multiple projects
 321 | - [x] Verify all projects visible in mount
 322 | 
 323 | **Implementation:** Mount uses fixed `~/basic-memory-cloud/` directory and syncs entire bucket root `basic-memory-{tenant_id}:{bucket_name}` for all projects.
 324 | 
 325 | ### Phase 6: Safety & Validation ✅
 326 | 
 327 | **6.1 Directory Conflict Prevention** ✅
 328 | - [x] Implement `validate_bisync_directory()` check
 329 | - [x] Detect if bisync dir == mount dir
 330 | - [x] Detect if bisync dir is currently mounted
 331 | - [x] Show clear error messages with solutions
 332 | 
 333 | **6.2 State Management** ✅
 334 | - [x] Use `--workdir` for bisync state
 335 | - [x] Store state in `~/.basic-memory/bisync-state/{tenant-id}/`
 336 | - [x] Ensure state directory created before bisync
 337 | 
 338 | **Implementation:** `validate_bisync_directory()` prevents conflicts by checking directory equality and mount status. State managed in isolated `~/.basic-memory/bisync-state/{tenant-id}/` directory using `--workdir` flag.
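
The 6.1 checks amount to two comparisons on normalized paths; a sketch (the `is_mounted` callable is a stand-in for however the real code inspects active mounts):

```python
from pathlib import Path

def validate_bisync_directory(bisync_dir: str, mount_dir: str,
                              is_mounted) -> None:
    """Refuse to run bisync over the mount point or any mounted path --
    letting both access methods touch the same files would corrupt state."""
    b = Path(bisync_dir).expanduser().resolve()
    m = Path(mount_dir).expanduser().resolve()
    if b == m:
        raise ValueError(
            f"{b} is the cloud mount directory; pick a separate --dir for bisync"
        )
    if is_mounted(b):
        raise ValueError(f"{b} is currently mounted; unmount it first")
```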
 339 | 
 340 | ### Phase 7: Cloud-Side Implementation (Deferred to Cloud Repo)
 341 | 
 342 | **7.1 Project Discovery Service (Cloud)** - Deferred
 343 | - [ ] Create `ProjectDiscoveryService` background job
 344 | - [ ] Scan `/app/data/` every 2 minutes
 345 | - [ ] Auto-register new directories as projects
 346 | - [ ] Log discovery events
 347 | - [ ] Handle errors gracefully
 348 | 
 349 | **7.2 Project API Updates (Cloud)** - Deferred
 350 | - [ ] Ensure `POST /proxy/projects/projects` creates directory synchronously
 351 | - [ ] Return 201 with project details
 352 | - [ ] Ensure directory ready immediately after creation
 353 | 
 354 | **Note:** Phase 7 is cloud-side work that belongs in the basic-memory-cloud repository. The CLI-side implementation (Phase 2.3 auto-registration) is complete and working - it calls the existing cloud API endpoints.
 355 | 
 356 | ### Phase 8: Testing & Documentation
 357 | 
 358 | **8.1 Test Scenarios**
 359 | - [x] Test: Cloud mode toggle (login/logout)
 360 | - [x] Test: Local-first project creation (bisync)
 361 | - [x] Test: Cloud-first project creation (API)
 362 | - [x] Test: Multi-project bidirectional sync
 363 | - [x] Test: MCP tools in cloud mode
 364 | - [x] Test: Watch mode continuous sync
 365 | - [x] Test: Safety profile protection (max_delete implemented)
 366 | - [x] Test: No RCLONE_TEST files (check_access=False in all profiles)
 367 | - [x] Test: Mount/bisync directory isolation (validate_bisync_directory)
 368 | - [x] Test: Integrity check command (bm cloud check)
 369 | 
 370 | **8.2 Documentation**
 371 | - [x] Update cloud-cli.md with cloud mode instructions
 372 | - [x] Document Dropbox model paradigm
 373 | - [x] Update command reference with new commands
 374 | - [x] Document `bm sync` dual mode behavior
 375 | - [x] Document `bm cloud check` command
 376 | - [x] Document directory structure and fixed paths
 377 | - [ ] Update README with quick start
 378 | - [ ] Create migration guide for existing users
 379 | - [ ] Create video/GIF demos
 380 | 
 381 | ### Success Criteria Checklist
 382 | 
 383 | - [x] `bm cloud login` enables cloud mode for all commands
 384 | - [x] `bm cloud logout` reverts to local mode
 385 | - [x] `bm project`, `bm tool`, `bm sync` work transparently in both modes
 386 | - [x] `bm sync` runs bisync in cloud mode, local sync in local mode
 387 | - [x] Single sync operation handles all projects bidirectionally
 388 | - [x] Local directories auto-create cloud projects via API
 389 | - [x] Cloud projects auto-sync to local directories
 390 | - [x] No RCLONE_TEST files in user directories
 391 | - [x] Bisync profiles provide safety via `max_delete` limits
 392 | - [x] `bm sync --watch` enables continuous sync
 393 | - [x] No duplicate `bm cloud project` commands (removed)
 394 | - [x] `bm cloud check` command for integrity verification
 395 | - [ ] Documentation covers cloud mode toggle and workflows
 396 | - [ ] Edge cases handled gracefully with clear errors
 397 | 
 398 | ## How (High Level)
 399 | 
 400 | ### Architecture Overview
 401 | 
 402 | **Cloud Mode Toggle:**
 403 | ```
 404 | ┌─────────────────────────────────────┐
 405 | │  bm cloud login                     │
 406 | │  ├─ Authenticate via OAuth          │
 407 | │  ├─ Set cloud_mode: true in config  │
 408 | │  └─ Set BASIC_MEMORY_PROXY_URL      │
 409 | └─────────────────────────────────────┘
 410 |            ↓
 411 | ┌─────────────────────────────────────┐
 412 | │  All CLI commands use async_client  │
 413 | │  ├─ async_client checks proxy URL   │
 414 | │  ├─ If set: HTTP to cloud           │
 415 | │  └─ If not: Local ASGI              │
 416 | └─────────────────────────────────────┘
 417 |            ↓
 418 | ┌─────────────────────────────────────┐
 419 | │  bm project add "work"              │
 420 | │  bm tool write-note ...             │
 421 | │  bm sync (triggers bisync)          │
 422 | │  → All work against cloud           │
 423 | └─────────────────────────────────────┘
 424 | ```
 425 | 
 426 | **Storage Hierarchy:**
 427 | ```
 428 | Cloud Container:                   Bucket:                      Local Sync Dir:
 429 | /app/data/ (mounted) ←→ production-tenant-{id}/ ←→ ~/basic-memory-cloud-sync/
 430 | ├── basic-memory/               ├── basic-memory/               ├── basic-memory/
 431 | │   ├── notes/                  │   ├── notes/                  │   ├── notes/
 432 | │   └── concepts/               │   └── concepts/               │   └── concepts/
 433 | ├── work-project/               ├── work-project/               ├── work-project/
 434 | │   └── tasks/                  │   └── tasks/                  │   └── tasks/
 435 | └── personal/                   └── personal/                   └── personal/
 436 |     └── journal/                    └── journal/                    └── journal/
 437 | 
 438 | Bidirectional sync via rclone bisync
 439 | ```
 440 | 
 441 | ### Sync Flow
 442 | 
 443 | **`bm sync` execution (in cloud mode):**
 444 | 
 445 | 1. **Check cloud mode**
 446 |    ```python
 447 |    if not config.cloud_mode:
 448 |        # Run normal local file sync
 449 |        run_local_sync()
 450 |        return
 451 | 
 452 |    # Cloud mode: Run bisync
 453 |    ```
 454 | 
 455 | 2. **Fetch cloud projects**
 456 |    ```python
 457 |    # GET /proxy/projects/projects (via async_client)
 458 |    cloud_projects = fetch_cloud_projects()
 459 |    cloud_project_names = {p["name"] for p in cloud_projects["projects"]}
 460 |    ```
 461 | 
 462 | 3. **Scan local sync directory**
 463 |    ```python
 464 |    sync_dir = config.bisync_config["sync_dir"]  # ~/basic-memory-cloud-sync
 465 |    local_dirs = [d.name for d in sync_dir.iterdir()
 466 |                  if d.is_dir() and not d.name.startswith('.')]
 467 |    ```
 468 | 
 469 | 4. **Create missing cloud projects**
 470 |    ```python
 471 |    for dir_name in local_dirs:
 472 |        if dir_name not in cloud_project_names:
 473 |            # POST /proxy/projects/projects (via async_client)
 474 |            create_cloud_project(name=dir_name)
 475 |            # Blocks until 201 response
 476 |    ```
 477 | 
 478 | 5. **Run bisync on bucket root**
 479 |    ```bash
 480 |    rclone bisync \
 481 |      ~/basic-memory-cloud-sync \
 482 |      basic-memory-{tenant}:{bucket} \
 483 |      --filters-file ~/.basic-memory/.bmignore.rclone \
 484 |      --conflict-resolve=newer \
 485 |      --max-delete=25
 486 |    # Syncs ALL project subdirectories bidirectionally
 487 |    ```
 488 | 
 489 | 6. **Notify cloud to refresh** (commit d48b1dc)
 490 |    ```python
 491 |    # After rclone bisync completes, sync each project's database
 492 |    for project in cloud_projects["projects"]:
 493 |        # POST /{project}/project/sync (via async_client)
 494 |        # Triggers background sync for this project
 495 |        await run_sync(project=project["name"])
 496 |    ```
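
The project auto-registration in steps 3-4 boils down to a set difference plus one blocking POST per new directory. A minimal sketch under that reading (function names are illustrative, and the HTTP call is injected rather than naming a real client API):

```python
from pathlib import Path
from typing import Callable, Iterable
import tempfile

def dirs_needing_projects(sync_dir: Path, cloud_names: Iterable[str]) -> list[str]:
    """Local top-level directories not yet registered as cloud projects."""
    existing = set(cloud_names)
    return sorted(
        d.name
        for d in sync_dir.iterdir()
        if d.is_dir() and not d.name.startswith(".") and d.name not in existing
    )

def register_missing(sync_dir: Path, cloud_names: Iterable[str],
                     post_project: Callable[[str], None]) -> list[str]:
    """POST /proxy/projects/projects once per unregistered local directory."""
    created = []
    for name in dirs_needing_projects(sync_dir, cloud_names):
        post_project(name)  # blocks until the cloud returns 201
        created.append(name)
    return created

# Demo: one registered project, one new directory, one hidden dir to skip
base = Path(tempfile.mkdtemp())
for name in ("basic-memory", "my-research", ".git"):
    (base / name).mkdir()

calls: list[str] = []
print(register_missing(base, {"basic-memory"}, calls.append))
# → ['my-research']
```

Injecting `post_project` keeps the decision logic testable without a network; the real CLI would pass a closure over `async_client`.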
 497 | 
 498 | ### Key Changes
 499 | 
 500 | **1. Cloud Mode via Config**
 501 | 
 502 | **Config changes:**
 503 | ```python
 504 | class Config:
 505 |     cloud_mode: bool = False
 506 |     cloud_host: str = "https://cloud.basicmemory.com"
 507 |     bisync_config: dict = {
 508 |         "profile": "balanced",
 509 |         "sync_dir": "~/basic-memory-cloud-sync"
 510 |     }
 511 | ```
 512 | 
 513 | **async_client.py behavior:**
 514 | ```python
 515 | def create_client() -> AsyncClient:
 516 |     # Check config first, then environment
 517 |     config = ConfigManager().config
 518 |     proxy_url = os.getenv("BASIC_MEMORY_PROXY_URL") or \
 519 |                 (config.cloud_host if config.cloud_mode else None)
 520 | 
 521 |     if proxy_url:
 522 |         return AsyncClient(base_url=proxy_url)  # HTTP to cloud
 523 |     else:
 524 |         return AsyncClient(transport=ASGITransport(...))  # Local ASGI
 525 | ```
 526 | 
 527 | **2. Login/Logout Sets Cloud Mode**
 528 | 
 529 | ```python
 530 | # bm cloud login
 531 | async def login():
 532 |     # Existing OAuth flow...
 533 |     success = await auth.login()
 534 |     if success:
 535 |         config.cloud_mode = True
 536 |         config.save()
 537 |         os.environ["BASIC_MEMORY_PROXY_URL"] = config.cloud_host
 538 | ```
 539 | 
 540 | ```python
 541 | # bm cloud logout
 542 | def logout():
 543 |     config.cloud_mode = False
 544 |     config.save()
 545 |     os.environ.pop("BASIC_MEMORY_PROXY_URL", None)
 546 | ```
 547 | 
 548 | **3. Remove Duplicate Commands**
 549 | 
 550 | **Delete:**
 551 | - `bm cloud project list` → use `bm project list`
 552 | - `bm cloud project add` → use `bm project add`
 553 | 
 554 | **Keep:**
 555 | - `bm cloud login` - Enable cloud mode
 556 | - `bm cloud logout` - Disable cloud mode
 557 | - `bm cloud status` - Show current mode & connection
 558 | - `bm cloud setup` - Initial bisync setup
 559 | - `bm cloud bisync` - Power-user command with full options
 560 | - `bm cloud check` - Verify file integrity between local and cloud
 561 | 
 562 | **4. Sync Command Dual Mode**
 563 | 
 564 | ```python
 565 | # bm sync
 566 | def sync_command(watch: bool = False, profile: str = "balanced"):
 567 |     config = ConfigManager().config
 568 | 
 569 |     if config.cloud_mode:
 570 |         # Run bisync for cloud sync
 571 |         run_bisync(profile=profile, watch=watch)
 572 |     else:
 573 |         # Run local file sync
 574 |         run_local_sync()
 575 | ```
 576 | 
 577 | **5. Remove RCLONE_TEST Files**
 578 | 
 579 | ```python
 580 | # All profiles: check_access=False
 581 | BISYNC_PROFILES = {
 582 |     "safe": RcloneBisyncProfile(check_access=False, max_delete=10),
 583 |     "balanced": RcloneBisyncProfile(check_access=False, max_delete=25),
 584 |     "fast": RcloneBisyncProfile(check_access=False, max_delete=50),
 585 | }
 586 | ```
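
For illustration, a profile like the ones above might translate into rclone flags as follows. This is a sketch: `RcloneBisyncProfile` is modeled as a dataclass, and the `conflict_resolve` field is an assumption beyond the `check_access`/`max_delete` fields shown in this spec (the `--conflict-resolve=newer` flag itself appears in the sync flow earlier):

```python
from dataclasses import dataclass

@dataclass
class RcloneBisyncProfile:
    """Safety profile for rclone bisync (fields beyond max_delete/check_access are illustrative)."""
    check_access: bool
    max_delete: int
    conflict_resolve: str = "newer"

def bisync_flags(profile: RcloneBisyncProfile) -> list[str]:
    """Translate a profile into rclone bisync command-line flags."""
    flags = [
        f"--max-delete={profile.max_delete}",
        f"--conflict-resolve={profile.conflict_resolve}",
    ]
    # check_access=False: omit --check-access entirely, so rclone never
    # requires RCLONE_TEST marker files in user directories
    if profile.check_access:
        flags.append("--check-access")
    return flags

print(bisync_flags(RcloneBisyncProfile(check_access=False, max_delete=25)))
# → ['--max-delete=25', '--conflict-resolve=newer']
```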
 587 | 
 588 | **6. Sync Bucket Root (All Projects)**
 589 | 
 590 | ```python
 591 | # Sync entire bucket, not subdirectory
 592 | rclone_remote = f"basic-memory-{tenant_id}:{bucket_name}"
 593 | ```
 594 | 
 595 | ## How to Evaluate
 596 | 
 597 | ### Test Scenarios
 598 | 
 599 | **1. Cloud Mode Toggle**
 600 | ```bash
 601 | # Start in local mode
 602 | bm project list
 603 | # → Shows local projects
 604 | 
 605 | # Enable cloud mode
 606 | bm cloud login
 607 | # → Authenticates, sets cloud_mode=true
 608 | 
 609 | bm project list
 610 | # → Now shows cloud projects (same command!)
 611 | 
 612 | # Disable cloud mode
 613 | bm cloud logout
 614 | 
 615 | bm project list
 616 | # → Back to local projects
 617 | ```
 618 | 
 619 | **Expected:** ✅ Single command works in both modes
 620 | 
 621 | **2. Local-First Project Creation (Cloud Mode)**
 622 | ```bash
 623 | # Enable cloud mode
 624 | bm cloud login
 625 | 
 626 | # Create new project locally in sync dir
 627 | mkdir ~/basic-memory-cloud-sync/my-research
 628 | echo "# Research Notes" > ~/basic-memory-cloud-sync/my-research/index.md
 629 | 
 630 | # Run sync (triggers bisync in cloud mode)
 631 | bm sync
 632 | 
 633 | # Verify:
 634 | # - Cloud project created automatically via API
 635 | # - Files synced to bucket:/my-research/
 636 | # - Cloud database updated
 637 | # - `bm project list` shows new project
 638 | ```
 639 | 
 640 | **Expected:** ✅ Project visible in cloud project list
 641 | 
 642 | **3. Cloud-First Project Creation**
 643 | ```bash
 644 | # In cloud mode
 645 | bm project add "work-notes"
 646 | # → Creates project on cloud (via async_client HTTP)
 647 | 
 648 | # Run sync
 649 | bm sync
 650 | 
 651 | # Verify:
 652 | # - Local directory ~/basic-memory-cloud-sync/work-notes/ created
 653 | # - Files sync bidirectionally
 654 | # - Can use `bm tool write-note` to add content remotely
 655 | ```
 656 | 
 657 | **Expected:** ✅ Project accessible via all CLI commands
 658 | 
 659 | **4. Multi-Project Bidirectional Sync**
 660 | ```bash
 661 | # Setup: 3 projects in cloud mode
 662 | # Modify files in all 3 locally and remotely
 663 | 
 664 | bm sync
 665 | 
 666 | # Verify:
 667 | # - All 3 projects sync simultaneously
 668 | # - Changes propagate correctly
 669 | # - No cross-project interference
 670 | ```
 671 | 
 672 | **Expected:** ✅ All projects in sync state
 673 | 
 674 | **5. MCP Tools Work in Cloud Mode**
 675 | ```bash
 676 | # In cloud mode
 677 | bm tool write-note \
 678 |   --title "Meeting Notes" \
 679 |   --folder "work-notes" \
 680 |   --content "Discussion points..."
 681 | 
 682 | # Verify:
 683 | # - Note created on cloud (via async_client HTTP)
 684 | # - Next `bm sync` pulls note to local
 685 | # - Note appears in ~/basic-memory-cloud-sync/work-notes/
 686 | ```
 687 | 
 688 | **Expected:** ✅ Tools work transparently in cloud mode
 689 | 
 690 | **6. Watch Mode Continuous Sync**
 691 | ```bash
 692 | # In cloud mode
 693 | bm sync --watch
 694 | 
 695 | # While running:
 696 | # - Create local folder → auto-creates cloud project
 697 | # - Edit files locally → syncs to cloud
 698 | # - Edit files remotely → syncs to local
 699 | # - Create project via API → appears locally
 700 | 
 701 | # Verify:
 702 | # - Continuous bidirectional sync
 703 | # - New projects handled automatically
 704 | # - No manual intervention needed
 705 | ```
 706 | 
 707 | **Expected:** ✅ Seamless continuous sync
 708 | 
 709 | **7. Safety Profile Protection**
 710 | ```bash
 711 | # Create project with 15 files locally
 712 | # Delete project from cloud (simulate error)
 713 | 
 714 | bm sync --profile safe
 715 | 
 716 | # Verify:
 717 | # - Bisync detects 15 pending deletions
 718 | # - Exceeds max_delete=10 limit
 719 | # - Aborts with clear error
 720 | # - No files deleted locally
 721 | ```
 722 | 
 723 | **Expected:** ✅ Safety limit prevents data loss
 724 | 
 725 | **8. No RCLONE_TEST Files**
 726 | ```bash
 727 | # After setup and multiple syncs
 728 | ls -la ~/basic-memory-cloud-sync/
 729 | 
 730 | # Verify:
 731 | # - No RCLONE_TEST files
 732 | # - No .rclone state files (in ~/.basic-memory/bisync-state/)
 733 | # - Clean directory structure
 734 | ```
 735 | 
 736 | **Expected:** ✅ User directory stays clean
 737 | 
 738 | ### Success Criteria
 739 | 
 740 | - [x] `bm cloud login` enables cloud mode for all commands
 741 | - [x] `bm cloud logout` reverts to local mode
 742 | - [x] `bm project`, `bm tool`, `bm sync` work in both modes transparently
 743 | - [x] `bm sync` runs bisync in cloud mode, local sync in local mode
 744 | - [x] Single sync operation handles all projects bidirectionally
 745 | - [x] Local directories auto-create cloud projects via API
 746 | - [x] Cloud projects auto-sync to local directories
 747 | - [x] No RCLONE_TEST files in user directories
 748 | - [x] Bisync profiles provide safety via `max_delete` limits
 749 | - [x] `bm sync --watch` enables continuous sync
 750 | - [x] No duplicate `bm cloud project` commands (removed)
 751 | - [x] `bm cloud check` command for integrity verification
 752 | - [ ] Documentation covers cloud mode toggle and workflows
 753 | - [ ] Edge cases handled gracefully with clear errors
 754 | 
 755 | ## Notes
 756 | 
 757 | ### API Contract
 758 | 
 759 | **Cloud must provide:**
 760 | 
 761 | 1. **Project Management APIs:**
 762 |    - `GET /proxy/projects/projects` - List all projects
 763 |    - `POST /proxy/projects/projects` - Create project synchronously
 764 |    - `POST /proxy/sync` - Trigger cache refresh
 765 | 
 766 | 2. **Project Discovery Service (Background):**
 767 |    - **Purpose**: Auto-register projects created via mount, direct bucket uploads, or any non-API method
 768 |    - **Interval**: Every 2 minutes
 769 |    - **Behavior**:
 770 |      - Scan `/app/data/` for directories
 771 |      - Register any directory not already in project database
 772 |      - Log discovery events
 773 |    - **Implementation**:
 774 |      ```python
 775 |      class ProjectDiscoveryService:
 776 |          """Background service to auto-discover projects from filesystem."""
 777 | 
 778 |          async def run(self):
 779 |              """Scan /app/data/ and register new project directories."""
 780 |              data_path = Path("/app/data")
 781 | 
 782 |              for dir_path in data_path.iterdir():
 783 |                  # Skip hidden and special directories
 784 |                  if not dir_path.is_dir() or dir_path.name.startswith('.'):
 785 |                      continue
 786 | 
 787 |                  project_name = dir_path.name
 788 | 
 789 |                  # Check if project already registered
 790 |                  project = await self.project_repo.get_by_name(project_name)
 791 |                  if not project:
 792 |                      # Auto-register new project
 793 |                      await self.project_repo.create(
 794 |                          name=project_name,
 795 |                          path=str(dir_path)
 796 |                      )
 797 |                      logger.info(f"Auto-discovered project: {project_name}")
 798 |      ```
 799 | 
 800 | **Project Creation (API-based):**
 801 | - API creates `/app/data/{project-name}/` directory
 802 | - Registers project in database
 803 | - Returns 201 with project details
 804 | - Directory ready for bisync immediately
 805 | 
 806 | **Project Creation (Discovery-based):**
 807 | - User creates folder via mount: `~/basic-memory-cloud/new-project/`
 808 | - Files appear in `/app/data/new-project/` (mounted bucket)
 809 | - Discovery service finds directory on next scan (within 2 minutes)
 810 | - Auto-registers as project
 811 | - User sees project in `bm project list` after discovery
 812 | 
 813 | **Why Both Methods:**
 814 | - **API**: Immediate registration when using bisync (client-side scan + API call)
 815 | - **Discovery**: Delayed registration when using mount (no API call hook)
 816 | - **Result**: Projects created ANY way (API, mount, bisync, WebDAV) eventually registered
 817 | - **Trade-off**: 2-minute delay for mount-created projects is acceptable
 818 | 
 819 | ### Mount vs Bisync Directory Isolation
 820 | 
 821 | **Critical Safety Requirement**: Mount and bisync MUST use different directories to prevent conflicts.
 822 | 
 823 | **The Dropbox Model Applied:**
 824 | 
 825 | Both mount and bisync operate at **bucket level** (all projects), following the Dropbox/iCloud paradigm:
 826 | 
 827 | ```
 828 | ~/basic-memory-cloud/          # Mount: Read-through cache (like Dropbox folder)
 829 | ├── work-notes/
 830 | ├── personal/
 831 | └── research/
 832 | 
 833 | ~/basic-memory-cloud-sync/     # Bisync: Bidirectional sync (like Dropbox sync folder)

 834 | ├── work-notes/
 835 | ├── personal/
 836 | └── research/
 837 | ```
 838 | 
 839 | **Mount Directory (Fixed):**
 840 | ```bash
 841 | # Fixed location, not configurable
 842 | ~/basic-memory-cloud/
 843 | ```
 844 | - **Scope**: Entire bucket (all projects)
 845 | - **Method**: NFS mount via `rclone nfsmount`
 846 | - **Behavior**: Read-through cache to cloud bucket
 847 | - **Credentials**: One IAM credential set per tenant
 848 | - **Process**: One rclone mount process
 849 | - **Use Case**: Quick access, browsing, light editing
 850 | - **Known Issue**: Obsidian compatibility problems with NFS
 851 | - **Not Configurable**: Fixed location prevents user error
 852 | 
 853 | **Why Fixed Location:**
 854 | - One mount point per machine (like `/Users/you/Dropbox`)
 855 | - Prevents credential proliferation (one credential set, not N)
 856 | - Prevents multiple mount processes (resource efficiency)
 857 | - Familiar pattern users already understand
 858 | - Simple operations: `mount` once, `unmount` once
 859 | 
 860 | **Bisync Directory (User Configurable):**
 861 | ```bash
 862 | # Default location
 863 | ~/basic-memory-cloud-sync/
 864 | 
 865 | # User can override
 866 | bm cloud setup --dir ~/my-knowledge-base
 867 | ```
 868 | - **Scope**: Entire bucket (all projects)
 869 | - **Method**: Bidirectional sync via `rclone bisync`
 870 | - **Behavior**: Full local copy with periodic sync
 871 | - **Credentials**: Same IAM credential set as mount
 872 | - **Use Case**: Full offline access, reliable editing, Obsidian support
 873 | - **Configurable**: Users may want specific locations (external drive, existing folder structure)
 874 | 
 875 | **Why User Configurable:**
 876 | - Users have preferences for where local copies live
 877 | - May want sync folder on external drive
 878 | - May want to integrate with existing folder structure
 879 | - Default works for most, option available for power users
 880 | 
 881 | **Conflict Prevention:**
 882 | ```python
 883 | def validate_bisync_directory(bisync_dir: Path):
 884 |     """Ensure bisync directory doesn't conflict with mount."""
 885 |     mount_dir = Path.home() / "basic-memory-cloud"
 886 | 
 887 |     if bisync_dir.resolve() == mount_dir.resolve():
 888 |         raise BisyncError(
 889 |             f"Cannot use {bisync_dir} for bisync - it's the mount directory!\n"
 890 |             f"Mount and bisync must use different directories.\n\n"
 891 |             f"Options:\n"
 892 |             f"  1. Use default: ~/basic-memory-cloud-sync/\n"
 893 |             f"  2. Specify different directory: --dir ~/my-sync-folder"
 894 |         )
 895 | 
 896 |     # Check if mount is active at this location
 897 |     result = subprocess.run(["mount"], capture_output=True, text=True)
 898 |     if str(bisync_dir) in result.stdout and "rclone" in result.stdout:
 899 |         raise BisyncError(
 900 |             f"{bisync_dir} is currently mounted via 'bm cloud mount'\n"
 901 |             f"Cannot use mounted directory for bisync.\n\n"
 902 |             f"Either:\n"
 903 |             f"  1. Unmount first: bm cloud unmount\n"
 904 |             f"  2. Use different directory for bisync"
 905 |         )
 906 | ```
 907 | 
 908 | **Why This Matters:**
 909 | - Mounting and syncing the SAME directory would create infinite loops
 910 | - rclone mount → bisync detects changes → syncs to bucket → mount sees changes → triggers bisync → ∞
 911 | - Separate directories = clean separation of concerns
 912 | - Mount is read-heavy caching layer, bisync is write-heavy bidirectional sync
 913 | 
 914 | ### Future Enhancements
 915 | 
 916 | **Phase 2 (Not in this spec):**
 917 | - **Near Real-Time Sync**: Integrate `watch_service.py` with cloud mode
 918 |   - Watch service detects local changes (already battle-tested)
 919 |   - Queue changes in memory
 920 |   - Use `rclone copy` for individual file sync (near instant)
 921 |   - Example: `rclone copyto ~/sync/project/file.md tenant:{bucket}/project/file.md`
 922 |   - Fallback to full `rclone bisync` every N seconds for bidirectional changes
 923 |   - Provides near real-time sync without polling overhead
 924 | - Per-project bisync profiles (different safety levels per project)
 925 | - Selective project sync (exclude specific projects from sync)
 926 | - Project deletion workflow (cascade to cloud/local)
 927 | - Conflict resolution UI/CLI
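
The per-file fast path sketched above could be as small as building one `rclone copyto` invocation per changed file. A sketch with illustrative names (`PurePosixPath` is used here only to keep the example platform-independent; real code would use `Path`):

```python
from pathlib import PurePosixPath

def copyto_command(sync_dir: PurePosixPath, remote: str,
                   changed: PurePosixPath) -> list[str]:
    """rclone copyto invocation for one changed file (near real-time path).

    `remote` is the bucket-root remote, e.g. "basic-memory-{tenant}:{bucket}".
    """
    rel = changed.relative_to(sync_dir)  # project-relative path, e.g. project/file.md
    return ["rclone", "copyto", str(changed), f"{remote}/{rel}"]

cmd = copyto_command(
    PurePosixPath("/home/u/basic-memory-cloud-sync"),
    "basic-memory-tenant:bucket",
    PurePosixPath("/home/u/basic-memory-cloud-sync/project/file.md"),
)
print(cmd)
# → ['rclone', 'copyto', '/home/u/basic-memory-cloud-sync/project/file.md', 'basic-memory-tenant:bucket/project/file.md']
```

Because `copyto` is one-way, the periodic full `bisync` fallback remains necessary to pick up remote-side changes.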
 928 | 
 929 | **Phase 3:**
 930 | - Project sharing between tenants
 931 | - Incremental backup/restore
 932 | - Sync statistics and bandwidth monitoring
 933 | - Mobile app integration with cloud mode
 934 | 
 935 | ### Related Specs
 936 | 
 937 | - **SPEC-8**: TigrisFS Integration - Original bisync implementation
 938 | - **SPEC-6**: Explicit Project Parameter Architecture - Multi-project foundations
 939 | - **SPEC-5**: CLI Cloud Upload via WebDAV - Cloud file operations
 940 | 
 941 | ### Implementation Notes
 942 | 
 943 | **Architectural Simplifications:**
 944 | - **Unified CLI**: Eliminated duplicate commands by using mode toggle
 945 | - **Single Entry Point**: All commands route through `async_client` which handles mode
 946 | - **Config-Driven**: Cloud mode stored in persistent config, not just environment
 947 | - **Transparent Routing**: Existing commands work without modification in cloud mode
 948 | 
 949 | **Complexity Trade-offs:**
 950 | - Removed: Separate `bm cloud project` command namespace
 951 | - Removed: Complex state detection for new projects
 952 | - Removed: RCLONE_TEST marker file management
 953 | - Added: Simple cloud_mode flag and config integration
 954 | - Added: Simple project list comparison before sync
 955 | - Relied on: Existing bisync profile safety mechanisms
 956 | - Result: Significantly simpler, more maintainable code
 957 | 
 958 | **User Experience:**
 959 | - **Mental Model**: "Toggle cloud mode, use normal commands"
 960 | - **No Learning Curve**: Same commands work locally and in cloud
 961 | - **Minimal Config**: Just login/logout to switch modes
 962 | - **Safety**: Profile system gives users control over safety/speed trade-offs
 963 | - **"Just Works"**: Create folders anywhere, they sync automatically
 964 | 
 965 | **Migration Path:**
 966 | - Existing `bm cloud project` users: Use `bm project` instead
 967 | - Existing `bm cloud bisync` becomes `bm sync` in cloud mode
 968 | - Config automatically migrates on first `bm cloud login`
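
The migration could amount to back-filling the new fields with their defaults on first login; a hypothetical sketch (key names follow the Config sketch earlier in this spec, but the actual migration logic is not specified here):

```python
def migrate_config(old: dict) -> dict:
    """Add cloud-mode fields to a pre-cloud-mode config, preserving existing keys."""
    new = dict(old)
    new.setdefault("cloud_mode", False)  # login flips this to True afterwards
    new.setdefault("cloud_host", "https://cloud.basicmemory.com")
    new.setdefault("bisync_config", {"profile": "balanced",
                                     "sync_dir": "~/basic-memory-cloud-sync"})
    return new

print(migrate_config({"projects": ["main"]})["cloud_mode"])
# → False
```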
 969 | 
 970 | 
 971 | ## Testing
 972 | 
 973 | ### Initial Setup (One Time)
 974 | 
 975 | **1. Login to cloud and enable cloud mode:**
 976 | ```bash
 977 | bm cloud login
 978 | # → Authenticates via OAuth
 979 | # → Sets cloud_mode=true in config
 980 | # → Sets BASIC_MEMORY_PROXY_URL environment variable
 981 | # → All CLI commands now route to cloud
 982 | ```
 983 | 
 984 | **2. Check cloud mode status:**
 985 | ```bash
 986 | bm cloud status
 987 | # → Shows: Mode: Cloud (enabled)
 988 | # → Shows: Host: https://cloud.basicmemory.com
 989 | # → Checks cloud health
 990 | ```
 991 | 
 992 | **3. Set up bidirectional sync:**
 993 | ```bash
 994 | bm cloud bisync-setup
 995 | # Or with custom directory:
 996 | bm cloud bisync-setup --dir ~/my-sync-folder
 997 | 
 998 | # This will:
 999 | # → Install rclone (if not already installed)
1000 | # → Get tenant info (tenant_id, bucket_name)
1001 | # → Generate scoped IAM credentials
1002 | # → Configure rclone with credentials
1003 | # → Create sync directory (default: ~/basic-memory-cloud-sync/)
1004 | # → Validate no conflict with mount directory
1005 | # → Run initial --resync to establish baseline
1006 | ```
1007 | 
1008 | ### Normal Usage
1009 | 
1010 | **4. Create local project and sync:**
1011 | ```bash
1012 | # Create a local project directory
1013 | mkdir ~/basic-memory-cloud-sync/my-research
1014 | echo "# Research Notes" > ~/basic-memory-cloud-sync/my-research/readme.md
1015 | 
1016 | # Run sync
1017 | bm cloud bisync
1018 | 
1019 | # Auto-magic happens:
1020 | # → Checks for new local directories
1021 | # → Finds "my-research" not in cloud
1022 | # → Creates project on cloud via POST /proxy/projects/projects
1023 | # → Runs bidirectional sync (all projects)
1024 | # → Syncs to bucket root (all projects synced together)
1025 | ```
1026 | 
1027 | **5. Watch mode for continuous sync:**
1028 | ```bash
1029 | bm cloud bisync --watch
1030 | # Or with custom interval:
1031 | bm cloud bisync --watch --interval 30
1032 | 
1033 | # → Syncs every 60 seconds (or custom interval)
1034 | # → Auto-registers new projects on each run
1035 | # → Press Ctrl+C to stop
1036 | ```
1037 | 
1038 | **6. Check bisync status:**
1039 | ```bash
1040 | bm cloud bisync-status
1041 | # → Shows tenant ID
1042 | # → Shows sync directory path
1043 | # → Shows initialization status
1044 | # → Shows last sync time
1045 | # → Lists available profiles (safe/balanced/fast)
1046 | ```
1047 | 
1048 | **7. Manual sync with different profiles:**
1049 | ```bash
1050 | # Safe mode (max 10 deletes, preserves conflicts)
1051 | bm cloud bisync --profile safe
1052 | 
1053 | # Balanced mode (max 25 deletes, auto-resolve to newer) - default
1054 | bm cloud bisync --profile balanced
1055 | 
1056 | # Fast mode (max 50 deletes, skip verification)
1057 | bm cloud bisync --profile fast
1058 | ```
1059 | 
1060 | **8. Dry run to preview changes:**
1061 | ```bash
1062 | bm cloud bisync --dry-run
1063 | # → Shows what would be synced without making changes
1064 | ```
1065 | 
1066 | **9. Force resync (if needed):**
1067 | ```bash
1068 | bm cloud bisync --resync
1069 | # → Establishes new baseline
1070 | # → Use if sync state is corrupted
1071 | ```
1072 | 
1073 | **10. Check file integrity:**
1074 | ```bash
1075 | bm cloud check
1076 | # → Verifies all files match between local and cloud
1077 | # → Read-only operation (no data transfer)
1078 | # → Shows differences if any found
1079 | 
1080 | # Faster one-way check
1081 | bm cloud check --one-way
1082 | # → Only checks for missing files on destination
1083 | ```
1084 | 
1085 | ### Verify Cloud Mode Integration
1086 | 
1087 | **11. Test that all commands work in cloud mode:**
1088 | ```bash
1089 | # List cloud projects (not local)
1090 | bm project list
1091 | 
1092 | # Create project on cloud
1093 | bm project add "work-notes"
1094 | 
1095 | # Use MCP tools against cloud
1096 | bm tool write-note --title "Test" --folder "my-research" --content "Hello"
1097 | 
1098 | # All of these work against cloud because cloud_mode=true
1099 | ```
1100 | 
1101 | **12. Switch back to local mode:**
1102 | ```bash
1103 | bm cloud logout
1104 | # → Sets cloud_mode=false
1105 | # → Clears BASIC_MEMORY_PROXY_URL
1106 | # → All commands now work locally again
1107 | ```
1108 | 
1109 | ### Expected Directory Structure
1110 | 
1111 | ```
1112 | ~/basic-memory-cloud-sync/          # Your local sync directory
1113 | ├── my-research/                    # Auto-created cloud project
1114 | │   ├── readme.md
1115 | │   └── notes.md
1116 | ├── work-notes/                     # Another project
1117 | │   └── tasks.md
1118 | └── personal/                       # Another project
1119 |     └── journal.md
1120 | 
1121 | # All sync bidirectionally with:
1122 | bucket:/                            # Cloud bucket root
1123 | ├── my-research/
1124 | ├── work-notes/
1125 | └── personal/
1126 | ```
1127 | 
1128 | ### Key Points to Test
1129 | 
1130 | 1. ✅ Cloud mode toggle works (login/logout)
1131 | 2. ✅ Bisync setup validates directory (no conflict with mount)
1132 | 3. ✅ Local directories auto-create cloud projects
1133 | 4. ✅ All projects sync together (bucket root)
1134 | 5. ✅ No RCLONE_TEST files created
1135 | 6. ✅ Changes sync bidirectionally
1136 | 7. ✅ Watch mode continuous sync works
1137 | 8. ✅ Profile safety limits work (max_delete)
1138 | 9. ✅ `bm sync` adapts to cloud mode automatically
1139 | 10. ✅ `bm cloud check` verifies file integrity without side effects
1140 | 
```

--------------------------------------------------------------------------------
/tests/mcp/test_tool_write_note.py:
--------------------------------------------------------------------------------

```python
   1 | """Tests for note tools that exercise the full stack with SQLite."""
   2 | 
   3 | from textwrap import dedent
   4 | import pytest
   5 | 
   6 | from basic_memory.mcp.tools import write_note, read_note, delete_note
   7 | from basic_memory.utils import normalize_newlines
   8 | 
   9 | 
  10 | @pytest.mark.asyncio
  11 | async def test_write_note(app, test_project):
  12 |     """Test creating a new note.
  13 | 
  14 |     Should:
  15 |     - Create entity with correct type and content
  16 |     - Save markdown content
  17 |     - Handle tags correctly
  18 |     - Return valid permalink
  19 |     """
  20 |     result = await write_note.fn(
  21 |         project=test_project.name,
  22 |         title="Test Note",
  23 |         folder="test",
  24 |         content="# Test\nThis is a test note",
  25 |         tags=["test", "documentation"],
  26 |     )
  27 | 
  28 |     assert result
  29 |     assert "# Created note" in result
  30 |     assert f"project: {test_project.name}" in result
  31 |     assert "file_path: test/Test Note.md" in result
  32 |     assert "permalink: test/test-note" in result
  33 |     assert "## Tags" in result
  34 |     assert "- test, documentation" in result
  35 |     assert f"[Session: Using project '{test_project.name}']" in result
  36 | 
  37 |     # Try reading it back via permalink
  38 |     content = await read_note.fn("test/test-note", project=test_project.name)
  39 |     assert (
  40 |         normalize_newlines(
  41 |             dedent("""
  42 |         ---
  43 |         title: Test Note
  44 |         type: note
  45 |         permalink: test/test-note
  46 |         tags:
  47 |         - test
  48 |         - documentation
  49 |         ---
  50 |         
  51 |         # Test
  52 |         This is a test note
  53 |         """).strip()
  54 |         )
  55 |         in content
  56 |     )
  57 | 
  58 | 
  59 | @pytest.mark.asyncio
  60 | async def test_write_note_no_tags(app, test_project):
  61 |     """Test creating a note without tags."""
  62 |     result = await write_note.fn(
  63 |         project=test_project.name, title="Simple Note", folder="test", content="Just some text"
  64 |     )
  65 | 
  66 |     assert result
  67 |     assert "# Created note" in result
  68 |     assert f"project: {test_project.name}" in result
  69 |     assert "file_path: test/Simple Note.md" in result
  70 |     assert "permalink: test/simple-note" in result
  71 |     assert f"[Session: Using project '{test_project.name}']" in result
  72 |     # Should be able to read it back
  73 |     content = await read_note.fn("test/simple-note", project=test_project.name)
  74 |     assert (
  75 |         normalize_newlines(
  76 |             dedent("""
  77 |         ---
  78 |         title: Simple Note
  79 |         type: note
  80 |         permalink: test/simple-note
  81 |         ---
  82 |         
  83 |         Just some text
  84 |         """).strip()
  85 |         )
  86 |         in content
  87 |     )
  88 | 
  89 | 
  90 | @pytest.mark.asyncio
  91 | async def test_write_note_update_existing(app, test_project):
  92 |     """Test updating an existing note.
  93 | 
  94 |     Should:
  95 |     - Create the note on the first write
  96 |     - Update it on a second write with the same title and folder
  97 |     - Report "Updated note" and keep the same permalink
  98 |     - Handle tags correctly
  99 |     """
 100 |     result = await write_note.fn(
 101 |         project=test_project.name,
 102 |         title="Test Note",
 103 |         folder="test",
 104 |         content="# Test\nThis is a test note",
 105 |         tags=["test", "documentation"],
 106 |     )
 107 | 
 108 |     assert result  # Got a valid permalink
 109 |     assert "# Created note" in result
 110 |     assert f"project: {test_project.name}" in result
 111 |     assert "file_path: test/Test Note.md" in result
 112 |     assert "permalink: test/test-note" in result
 113 |     assert "## Tags" in result
 114 |     assert "- test, documentation" in result
 115 |     assert f"[Session: Using project '{test_project.name}']" in result
 116 | 
 117 |     result = await write_note.fn(
 118 |         project=test_project.name,
 119 |         title="Test Note",
 120 |         folder="test",
 121 |         content="# Test\nThis is an updated note",
 122 |         tags=["test", "documentation"],
 123 |     )
 124 |     assert "# Updated note" in result
 125 |     assert f"project: {test_project.name}" in result
 126 |     assert "file_path: test/Test Note.md" in result
 127 |     assert "permalink: test/test-note" in result
 128 |     assert "## Tags" in result
 129 |     assert "- test, documentation" in result
 130 |     assert f"[Session: Using project '{test_project.name}']" in result
 131 | 
 132 |     # Try reading it back
 133 |     content = await read_note.fn("test/test-note", project=test_project.name)
 134 |     assert (
 135 |         normalize_newlines(
 136 |             dedent(
 137 |                 """
 138 |         ---
 139 |         title: Test Note
 140 |         type: note
 141 |         permalink: test/test-note
 142 |         tags:
 143 |         - test
 144 |         - documentation
 145 |         ---
 146 |         
 147 |         # Test
 148 |         This is an updated note
 149 |         """
 150 |             ).strip()
 151 |         )
 152 |         == content
 153 |     )
 154 | 
 155 | 
 156 | @pytest.mark.asyncio
 157 | async def test_issue_93_write_note_respects_custom_permalink_new_note(app, test_project):
 158 |     """Test that write_note respects custom permalinks in frontmatter for new notes (Issue #93)"""
 159 | 
 160 |     # Create a note with custom permalink in frontmatter
 161 |     content_with_custom_permalink = dedent("""
 162 |         ---
 163 |         permalink: custom/my-desired-permalink
 164 |         ---
 165 | 
 166 |         # My New Note
 167 | 
 168 |         This note has a custom permalink specified in frontmatter.
 169 | 
 170 |         - [note] Testing if custom permalink is respected
 171 |     """).strip()
 172 | 
 173 |     result = await write_note.fn(
 174 |         project=test_project.name,
 175 |         title="My New Note",
 176 |         folder="notes",
 177 |         content=content_with_custom_permalink,
 178 |     )
 179 | 
 180 |     # Verify the custom permalink is respected
 181 |     assert "# Created note" in result
 182 |     assert f"project: {test_project.name}" in result
 183 |     assert "file_path: notes/My New Note.md" in result
 184 |     assert "permalink: custom/my-desired-permalink" in result
 185 |     assert f"[Session: Using project '{test_project.name}']" in result
 186 | 
 187 | 
 188 | @pytest.mark.asyncio
 189 | async def test_issue_93_write_note_respects_custom_permalink_existing_note(app, test_project):
 190 |     """Test that write_note respects custom permalinks when updating existing notes (Issue #93)"""
 191 | 
 192 |     # Step 1: Create initial note (auto-generated permalink)
 193 |     result1 = await write_note.fn(
 194 |         project=test_project.name,
 195 |         title="Existing Note",
 196 |         folder="test",
 197 |         content="Initial content without custom permalink",
 198 |     )
 199 | 
 200 |     assert "# Created note" in result1
 201 |     assert f"project: {test_project.name}" in result1
 202 | 
 203 |     # Extract the auto-generated permalink
 204 |     initial_permalink = None
 205 |     for line in result1.split("\n"):
 206 |         if line.startswith("permalink:"):
 207 |             initial_permalink = line.split(":", 1)[1].strip()
 208 |             break
 209 | 
 210 |     assert initial_permalink is not None
 211 | 
 212 |     # Step 2: Update with content that includes custom permalink in frontmatter
 213 |     updated_content = dedent("""
 214 |         ---
 215 |         permalink: custom/new-permalink
 216 |         ---
 217 | 
 218 |         # Existing Note
 219 | 
 220 |         Updated content with custom permalink in frontmatter.
 221 | 
 222 |         - [note] Custom permalink should be respected on update
 223 |     """).strip()
 224 | 
 225 |     result2 = await write_note.fn(
 226 |         project=test_project.name,
 227 |         title="Existing Note",
 228 |         folder="test",
 229 |         content=updated_content,
 230 |     )
 231 | 
 232 |     # Verify the custom permalink is respected
 233 |     assert "# Updated note" in result2
 234 |     assert f"project: {test_project.name}" in result2
 235 |     assert "permalink: custom/new-permalink" in result2
 236 |     assert f"permalink: {initial_permalink}" not in result2
 237 |     assert f"[Session: Using project '{test_project.name}']" in result2
 238 | 
 239 | 
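The inline loop above that pulls the auto-generated permalink out of the result summary could be factored into a small helper; a hypothetical sketch (`extract_field` is not part of the codebase):

```python
def extract_field(result: str, field: str):
    # Return the value of the first "field: value" line in a
    # write_note result summary, or None if the field is absent.
    for line in result.splitlines():
        if line.startswith(f"{field}:"):
            return line.split(":", 1)[1].strip()
    return None
```

With this helper, `initial_permalink = extract_field(result1, "permalink")` replaces the manual loop.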
 240 | @pytest.mark.asyncio
 241 | async def test_delete_note_existing(app, test_project):
 242 |     """Test deleting an existing note.
 243 | 
 244 |     Should:
 245 |     - Create a note and get a valid permalink
 246 |     - Delete the note by permalink
 247 |     - Return True on successful deletion
 248 |     """
 249 |     result = await write_note.fn(
 250 |         project=test_project.name,
 251 |         title="Test Note",
 252 |         folder="test",
 253 |         content="# Test\nThis is a test note",
 254 |         tags=["test", "documentation"],
 255 |     )
 256 | 
 257 |     assert result
 258 |     assert f"project: {test_project.name}" in result
 259 | 
 260 |     deleted = await delete_note.fn("test/test-note", project=test_project.name)
 261 |     assert deleted is True
 262 | 
 263 | 
 264 | @pytest.mark.asyncio
 265 | async def test_delete_note_doesnt_exist(app, test_project):
 266 |     """Test deleting a note that does not exist.
 267 | 
 268 |     Should:
 269 |     - Attempt to delete a nonexistent permalink
 270 |     - Return False
 271 |     """
 272 |     deleted = await delete_note.fn("doesnt-exist", project=test_project.name)
 273 |     assert deleted is False
 274 | 
 275 | 
 276 | @pytest.mark.asyncio
 277 | async def test_write_note_with_tag_array_from_bug_report(app, test_project):
 278 |     """Test creating a note with a tag array as reported in issue #38.
 279 | 
 280 |     This reproduces the exact payload from the bug report where Cursor
 281 |     was passing an array of tags and getting a type mismatch error.
 282 |     """
 283 |     # This is the exact payload from the bug report
 284 |     bug_payload = {
 285 |         "project": test_project.name,
 286 |         "title": "Title",
 287 |         "folder": "folder",
 288 |         "content": "CONTENT",
 289 |         "tags": ["hipporag", "search", "fallback", "symfony", "error-handling"],
 290 |     }
 291 | 
 292 |     # Try to call the function with this data directly
 293 |     result = await write_note.fn(**bug_payload)
 294 | 
 295 |     assert result
 296 |     assert f"project: {test_project.name}" in result
 297 |     assert "permalink: folder/title" in result
 298 |     assert "Tags" in result
 299 |     assert "hipporag" in result
 300 |     assert f"[Session: Using project '{test_project.name}']" in result
 301 | 
 302 | 
 303 | @pytest.mark.asyncio
 304 | async def test_write_note_verbose(app, test_project):
 305 |     """Test creating a note with observations and relations.
 306 | 
 307 |     Should:
 308 |     - Create entity with correct type and content
 309 |     - Parse observations and relations from the markdown
 310 |     - Report their counts in the result summary
 311 |     - Handle tags correctly
 312 |     """
 313 |     result = await write_note.fn(
 314 |         project=test_project.name,
 315 |         title="Test Note",
 316 |         folder="test",
 317 |         content="""
 318 | # Test\nThis is a test note
 319 | 
 320 | - [note] First observation
 321 | - relates to [[Knowledge]]
 322 | 
 323 | """,
 324 |         tags=["test", "documentation"],
 325 |     )
 326 | 
 327 |     assert "# Created note" in result
 328 |     assert f"project: {test_project.name}" in result
 329 |     assert "file_path: test/Test Note.md" in result
 330 |     assert "permalink: test/test-note" in result
 331 |     assert "## Observations" in result
 332 |     assert "- note: 1" in result
 333 |     assert "## Relations" in result
 334 |     assert "## Tags" in result
 335 |     assert "- test, documentation" in result
 336 |     assert f"[Session: Using project '{test_project.name}']" in result
 337 | 
 338 | 
 339 | @pytest.mark.asyncio
 340 | async def test_write_note_preserves_custom_metadata(app, project_config, test_project):
 341 |     """Test that updating a note preserves custom metadata fields.
 342 | 
 343 |     Reproduces issue #36 where custom frontmatter fields like Status
 344 |     were being lost when updating notes with the write_note tool.
 345 | 
 346 |     Should:
 347 |     - Create a note with custom frontmatter
 348 |     - Update the note with new content
 349 |     - Verify custom frontmatter is preserved
 350 |     """
 351 |     # First, create a note with custom metadata using write_note
 352 |     await write_note.fn(
 353 |         project=test_project.name,
 354 |         title="Custom Metadata Note",
 355 |         folder="test",
 356 |         content="# Initial content",
 357 |         tags=["test"],
 358 |     )
 359 | 
 360 |     # Verify the note exists before editing the file directly
 361 |     await read_note.fn("test/custom-metadata-note", project=test_project.name)
 362 | 
 363 |     # Now directly update the file with custom frontmatter
 364 |     # We need to use a direct file update to add custom frontmatter
 365 |     import frontmatter
 366 | 
 367 |     file_path = project_config.home / "test" / "Custom Metadata Note.md"
 368 |     post = frontmatter.load(file_path)
 369 | 
 370 |     # Add custom frontmatter
 371 |     post["Status"] = "In Progress"
 372 |     post["Priority"] = "High"
 373 |     post["Version"] = "1.0"
 374 | 
 375 |     # Write the file back
 376 |     with open(file_path, "w") as f:
 377 |         f.write(frontmatter.dumps(post))
 378 | 
 379 |     # Now update the note using write_note
 380 |     result = await write_note.fn(
 381 |         project=test_project.name,
 382 |         title="Custom Metadata Note",
 383 |         folder="test",
 384 |         content="# Updated content",
 385 |         tags=["test", "updated"],
 386 |     )
 387 | 
 388 |     # Verify the update was successful
 389 |     assert (
 390 |         f"Updated note\nproject: {test_project.name}\nfile_path: test/Custom Metadata Note.md"
 391 |     ) in result
 392 |     assert f"project: {test_project.name}" in result
 393 | 
 394 |     # Read the note back and check if custom frontmatter is preserved
 395 |     content = await read_note.fn("test/custom-metadata-note", project=test_project.name)
 396 | 
 397 |     # Custom frontmatter should be preserved
 398 |     assert "Status: In Progress" in content
 399 |     assert "Priority: High" in content
 400 |     # Version might be quoted as '1.0' due to YAML serialization
 401 |     assert "Version:" in content  # Just check that the field exists
 402 |     assert "1.0" in content  # And that the value exists somewhere
 403 | 
 404 |     # And new content should be there
 405 |     assert "# Updated content" in content
 406 | 
 407 |     # And tags should be updated (without # prefix)
 408 |     assert "- test" in content
 409 |     assert "- updated" in content
 410 | 
 411 | 
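The preservation behavior asserted above hinges on parsing frontmatter separately from the body so unknown keys survive a rewrite. A stdlib-only sketch of that split, handling flat `key: value` lines only (the tests themselves use the full python-frontmatter library):

```python
def parse_simple_frontmatter(text: str):
    # Split a "---"-delimited frontmatter header from a markdown body.
    # Sketch only: flat "key: value" lines, no nested YAML.
    if not text.startswith("---\n"):
        return {}, text
    header, _, body = text[4:].partition("\n---\n")
    meta = {}
    for line in header.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body.lstrip("\n")
```

Custom keys like `Status` come back in the metadata dict, so a writer can merge them into updated frontmatter instead of dropping them.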
 412 | @pytest.mark.asyncio
 413 | async def test_write_note_preserves_content_frontmatter(app, test_project):
 414 |     """Test creating a new note."""
 415 |     await write_note.fn(
 416 |         project=test_project.name,
 417 |         title="Test Note",
 418 |         folder="test",
 419 |         content=dedent(
 420 |             """
 421 |             ---
 422 |             title: Test Note
 423 |             type: note
 424 |             version: 1.0
 425 |             author: name
 426 |             ---
 427 |             # Test
 428 | 
 429 |             This is a test note
 430 |             """
 431 |         ),
 432 |         tags=["test", "documentation"],
 433 |     )
 434 | 
 435 |     # Try reading it back via permalink
 436 |     content = await read_note.fn("test/test-note", project=test_project.name)
 437 |     assert (
 438 |         normalize_newlines(
 439 |             dedent(
 440 |                 """
 441 |             ---
 442 |             title: Test Note
 443 |             type: note
 444 |             permalink: test/test-note
 445 |             version: 1.0
 446 |             author: name
 447 |             tags:
 448 |             - test
 449 |             - documentation
 450 |             ---
 451 | 
 452 |             # Test
 453 | 
 454 |             This is a test note
 455 |             """
 456 |             ).strip()
 457 |         )
 458 |         in content
 459 |     )
 460 | 
 461 | 
 462 | @pytest.mark.asyncio
 463 | async def test_write_note_permalink_collision_fix_issue_139(app, test_project):
 464 |     """Test fix for GitHub Issue #139: UNIQUE constraint failed: entity.permalink.
 465 | 
 466 |     This reproduces the exact scenario described in the issue:
 467 |     1. Create a note with title "Note 1"
 468 |     2. Create another note with title "Note 2"
 469 |     3. Try to create/replace first note again with same title "Note 1"
 470 | 
 471 |     Before the fix, step 3 would fail with UNIQUE constraint error.
 472 |     After the fix, it should either update the existing note or create with unique permalink.
 473 |     """
 474 |     # Step 1: Create first note
 475 |     result1 = await write_note.fn(
 476 |         project=test_project.name,
 477 |         title="Note 1",
 478 |         folder="test",
 479 |         content="Original content for note 1",
 480 |     )
 481 |     assert "# Created note" in result1
 482 |     assert f"project: {test_project.name}" in result1
 483 |     assert "permalink: test/note-1" in result1
 484 | 
 485 |     # Step 2: Create second note with different title
 486 |     result2 = await write_note.fn(
 487 |         project=test_project.name, title="Note 2", folder="test", content="Content for note 2"
 488 |     )
 489 |     assert "# Created note" in result2
 490 |     assert f"project: {test_project.name}" in result2
 491 |     assert "permalink: test/note-2" in result2
 492 | 
 493 |     # Step 3: Try to create/replace first note again
 494 |     # This scenario would trigger the UNIQUE constraint failure before the fix
 495 |     result3 = await write_note.fn(
 496 |         project=test_project.name,
 497 |         title="Note 1",  # Same title as first note
 498 |         folder="test",  # Same folder as first note
 499 |         content="Replacement content for note 1",  # Different content
 500 |     )
 501 | 
 502 |     # This should not raise a UNIQUE constraint failure error
 503 |     # It should succeed and either:
 504 |     # 1. Update the existing note (preferred behavior)
 505 |     # 2. Create a new note with unique permalink (fallback behavior)
 506 | 
 507 |     assert result3 is not None
 508 |     assert f"project: {test_project.name}" in result3
 509 |     assert "Updated note" in result3 or "Created note" in result3
 510 | 
 511 |     # The result should contain either the original permalink or a unique one
 512 |     assert "permalink: test/note-1" in result3 or "permalink: test/note-1-1" in result3
 513 | 
 514 |     # Verify we can read back the content
 515 |     if "permalink: test/note-1" in result3:
 516 |         # Updated existing note case
 517 |         content = await read_note.fn("test/note-1", project=test_project.name)
 518 |         assert "Replacement content for note 1" in content
 519 |     else:
 520 |         # Created new note with unique permalink case
 521 |         content = await read_note.fn("test/note-1-1", project=test_project.name)
 522 |         assert "Replacement content for note 1" in content
 523 |         # Original note should still exist
 524 |         original_content = await read_note.fn("test/note-1", project=test_project.name)
 525 |         assert "Original content for note 1" in original_content
 526 | 
 527 | 
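The permalinks asserted in this test follow a slugification pattern with a numeric suffix on collision (`test/note-1` then `test/note-1-1`). An illustrative sketch of that behavior, not the actual implementation:

```python
import re

def slugify(title: str) -> str:
    # "Note 1" -> "note-1": lowercase, non-alphanumeric runs become hyphens
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def unique_permalink(folder: str, title: str, taken: set) -> str:
    # Fallback behavior from the test: on collision, append a numeric
    # suffix until the permalink is unique (note-1 -> note-1-1, ...).
    base = f"{folder}/{slugify(title)}"
    candidate, n = base, 1
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    return candidate
```

Either resolving to the existing permalink (update) or generating a suffixed one (create) avoids the UNIQUE constraint failure from issue #139.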
 528 | @pytest.mark.asyncio
 529 | async def test_write_note_with_custom_entity_type(app, test_project):
 530 |     """Test creating a note with custom entity_type parameter.
 531 | 
 532 |     This test verifies the fix for Issue #144 where entity_type parameter
 533 |     was hardcoded to "note" instead of allowing custom types.
 534 |     """
 535 |     result = await write_note.fn(
 536 |         project=test_project.name,
 537 |         title="Test Guide",
 538 |         folder="guides",
 539 |         content="# Guide Content\nThis is a guide",
 540 |         tags=["guide", "documentation"],
 541 |         entity_type="guide",
 542 |     )
 543 | 
 544 |     assert result
 545 |     assert "# Created note" in result
 546 |     assert f"project: {test_project.name}" in result
 547 |     assert "file_path: guides/Test Guide.md" in result
 548 |     assert "permalink: guides/test-guide" in result
 549 |     assert "## Tags" in result
 550 |     assert "- guide, documentation" in result
 551 |     assert f"[Session: Using project '{test_project.name}']" in result
 552 | 
 553 |     # Verify the entity type is correctly set in the frontmatter
 554 |     content = await read_note.fn("guides/test-guide", project=test_project.name)
 555 |     assert (
 556 |         normalize_newlines(
 557 |             dedent("""
 558 |         ---
 559 |         title: Test Guide
 560 |         type: guide
 561 |         permalink: guides/test-guide
 562 |         tags:
 563 |         - guide
 564 |         - documentation
 565 |         ---
 566 | 
 567 |         # Guide Content
 568 |         This is a guide
 569 |         """).strip()
 570 |         )
 571 |         in content
 572 |     )
 573 | 
 574 | 
 575 | @pytest.mark.asyncio
 576 | async def test_write_note_with_report_entity_type(app, test_project):
 577 |     """Test creating a note with entity_type="report"."""
 578 |     result = await write_note.fn(
 579 |         project=test_project.name,
 580 |         title="Monthly Report",
 581 |         folder="reports",
 582 |         content="# Monthly Report\nThis is a monthly report",
 583 |         tags=["report", "monthly"],
 584 |         entity_type="report",
 585 |     )
 586 | 
 587 |     assert result
 588 |     assert "# Created note" in result
 589 |     assert f"project: {test_project.name}" in result
 590 |     assert "file_path: reports/Monthly Report.md" in result
 591 |     assert "permalink: reports/monthly-report" in result
 592 |     assert f"[Session: Using project '{test_project.name}']" in result
 593 | 
 594 |     # Verify the entity type is correctly set in the frontmatter
 595 |     content = await read_note.fn("reports/monthly-report", project=test_project.name)
 596 |     assert "type: report" in content
 597 |     assert "# Monthly Report" in content
 598 | 
 599 | 
 600 | @pytest.mark.asyncio
 601 | async def test_write_note_with_config_entity_type(app, test_project):
 602 |     """Test creating a note with entity_type="config"."""
 603 |     result = await write_note.fn(
 604 |         project=test_project.name,
 605 |         title="System Config",
 606 |         folder="config",
 607 |         content="# System Configuration\nThis is a config file",
 608 |         entity_type="config",
 609 |     )
 610 | 
 611 |     assert result
 612 |     assert "# Created note" in result
 613 |     assert f"project: {test_project.name}" in result
 614 |     assert "file_path: config/System Config.md" in result
 615 |     assert "permalink: config/system-config" in result
 616 |     assert f"[Session: Using project '{test_project.name}']" in result
 617 | 
 618 |     # Verify the entity type is correctly set in the frontmatter
 619 |     content = await read_note.fn("config/system-config", project=test_project.name)
 620 |     assert "type: config" in content
 621 |     assert "# System Configuration" in content
 622 | 
 623 | 
 624 | @pytest.mark.asyncio
 625 | async def test_write_note_entity_type_default_behavior(app, test_project):
 626 |     """Test that the entity_type parameter defaults to "note" when not specified.
 627 | 
 628 |     This ensures backward compatibility - existing code that doesn't specify
 629 |     entity_type should continue to work as before.
 630 |     """
 631 |     result = await write_note.fn(
 632 |         project=test_project.name,
 633 |         title="Default Type Test",
 634 |         folder="test",
 635 |         content="# Default Type Test\nThis should be type 'note'",
 636 |         tags=["test"],
 637 |     )
 638 | 
 639 |     assert result
 640 |     assert "# Created note" in result
 641 |     assert f"project: {test_project.name}" in result
 642 |     assert "file_path: test/Default Type Test.md" in result
 643 |     assert "permalink: test/default-type-test" in result
 644 |     assert f"[Session: Using project '{test_project.name}']" in result
 645 | 
 646 |     # Verify the entity type defaults to "note"
 647 |     content = await read_note.fn("test/default-type-test", project=test_project.name)
 648 |     assert "type: note" in content
 649 |     assert "# Default Type Test" in content
 650 | 
 651 | 
 652 | @pytest.mark.asyncio
 653 | async def test_write_note_update_existing_with_different_entity_type(app, test_project):
 654 |     """Test updating an existing note with a different entity_type."""
 655 |     # Create initial note as "note" type
 656 |     result1 = await write_note.fn(
 657 |         project=test_project.name,
 658 |         title="Changeable Type",
 659 |         folder="test",
 660 |         content="# Initial Content\nThis starts as a note",
 661 |         tags=["test"],
 662 |         entity_type="note",
 663 |     )
 664 | 
 665 |     assert result1
 666 |     assert "# Created note" in result1
 667 |     assert f"project: {test_project.name}" in result1
 668 | 
 669 |     # Update the same note with a different entity_type
 670 |     result2 = await write_note.fn(
 671 |         project=test_project.name,
 672 |         title="Changeable Type",
 673 |         folder="test",
 674 |         content="# Updated Content\nThis is now a guide",
 675 |         tags=["guide"],
 676 |         entity_type="guide",
 677 |     )
 678 | 
 679 |     assert result2
 680 |     assert "# Updated note" in result2
 681 |     assert f"project: {test_project.name}" in result2
 682 | 
 683 |     # Verify the entity type was updated
 684 |     content = await read_note.fn("test/changeable-type", project=test_project.name)
 685 |     assert "type: guide" in content
 686 |     assert "# Updated Content" in content
 687 |     assert "- guide" in content
 688 | 
 689 | 
 690 | @pytest.mark.asyncio
 691 | async def test_write_note_respects_frontmatter_entity_type(app, test_project):
 692 |     """Test that entity_type in frontmatter is respected when parameter is not provided.
 693 | 
 694 |     This verifies that when write_note is called without entity_type parameter,
 695 |     but the content includes frontmatter with a 'type' field, that type is respected
 696 |     instead of defaulting to 'note'.
 697 |     """
 698 |     note = dedent("""
 699 |         ---
 700 |         title: Test Guide
 701 |         type: guide
 702 |         permalink: guides/test-guide
 703 |         tags:
 704 |         - guide
 705 |         - documentation
 706 |         ---
 707 | 
 708 |         # Guide Content
 709 |         This is a guide
 710 |         """).strip()
 711 | 
 712 |     # Call write_note without entity_type parameter - it should respect frontmatter type
 713 |     result = await write_note.fn(
 714 |         project=test_project.name, title="Test Guide", folder="guides", content=note
 715 |     )
 716 | 
 717 |     assert result
 718 |     assert "# Created note" in result
 719 |     assert f"project: {test_project.name}" in result
 720 |     assert "file_path: guides/Test Guide.md" in result
 721 |     assert "permalink: guides/test-guide" in result
 722 |     assert f"[Session: Using project '{test_project.name}']" in result
 723 | 
 724 |     # Verify the entity type from frontmatter is respected (should be "guide", not "note")
 725 |     content = await read_note.fn("guides/test-guide", project=test_project.name)
 726 |     assert "type: guide" in content
 727 |     assert "# Guide Content" in content
 728 |     assert "- guide" in content
 729 |     assert "- documentation" in content
 730 | 
 731 | 
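Taken together, the entity_type tests imply a resolution order for the note's type. A hypothetical one-liner consistent with these assertions (the real resolution logic may differ):

```python
def resolve_entity_type(param, frontmatter_type):
    # One plausible precedence, inferred from the tests:
    # explicit entity_type parameter, then the frontmatter "type"
    # field, then the backward-compatible "note" default.
    return param or frontmatter_type or "note"
```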
 732 | class TestWriteNoteSecurityValidation:
 733 |     """Test write_note security validation features."""
 734 | 
 735 |     @pytest.mark.asyncio
 736 |     async def test_write_note_blocks_path_traversal_unix(self, app, test_project):
 737 |         """Test that Unix-style path traversal attacks are blocked in folder parameter."""
 738 |         # Test various Unix-style path traversal patterns
 739 |         attack_folders = [
 740 |             "../",
 741 |             "../../",
 742 |             "../../../",
 743 |             "../secrets",
 744 |             "../../etc",
 745 |             "../../../etc/passwd_folder",
 746 |             "notes/../../../etc",
 747 |             "folder/../../outside",
 748 |             "../../../../malicious",
 749 |         ]
 750 | 
 751 |         for attack_folder in attack_folders:
 752 |             result = await write_note.fn(
 753 |                 project=test_project.name,
 754 |                 title="Test Note",
 755 |                 folder=attack_folder,
 756 |                 content="# Test Content\nThis should be blocked by security validation.",
 757 |             )
 758 | 
 759 |             assert isinstance(result, str)
 760 |             assert "# Error" in result
 761 |             assert "paths must stay within project boundaries" in result
 762 |             assert attack_folder in result
 763 | 
 764 |     @pytest.mark.asyncio
 765 |     async def test_write_note_blocks_path_traversal_windows(self, app, test_project):
 766 |         """Test that Windows-style path traversal attacks are blocked in folder parameter."""
 767 |         # Test various Windows-style path traversal patterns
 768 |         attack_folders = [
 769 |             "..\\",
 770 |             "..\\..\\",
 771 |             "..\\..\\..\\",
 772 |             "..\\secrets",
 773 |             "..\\..\\Windows",
 774 |             "..\\..\\..\\Windows\\System32",
 775 |             "notes\\..\\..\\..\\Windows",
 776 |             "\\\\server\\share",
 777 |             "\\\\..\\..\\Windows",
 778 |         ]
 779 | 
 780 |         for attack_folder in attack_folders:
 781 |             result = await write_note.fn(
 782 |                 project=test_project.name,
 783 |                 title="Test Note",
 784 |                 folder=attack_folder,
 785 |                 content="# Test Content\nThis should be blocked by security validation.",
 786 |             )
 787 | 
 788 |             assert isinstance(result, str)
 789 |             assert "# Error" in result
 790 |             assert "paths must stay within project boundaries" in result
 791 |             assert attack_folder in result
 792 | 
 793 |     @pytest.mark.asyncio
 794 |     async def test_write_note_blocks_absolute_paths(self, app, test_project):
 795 |         """Test that absolute paths are blocked in folder parameter."""
 796 |         # Test various absolute path patterns
 797 |         attack_folders = [
 798 |             "/etc",
 799 |             "/home/user",
 800 |             "/var/log",
 801 |             "/root",
 802 |             "C:\\Windows",
 803 |             "C:\\Users\\user",
 804 |             "D:\\secrets",
 805 |             "/tmp/malicious",
 806 |             "/usr/local/evil",
 807 |         ]
 808 | 
 809 |         for attack_folder in attack_folders:
 810 |             result = await write_note.fn(
 811 |                 project=test_project.name,
 812 |                 title="Test Note",
 813 |                 folder=attack_folder,
 814 |                 content="# Test Content\nThis should be blocked by security validation.",
 815 |             )
 816 | 
 817 |             assert isinstance(result, str)
 818 |             assert "# Error" in result
 819 |             assert "paths must stay within project boundaries" in result
 820 |             assert attack_folder in result
 821 | 
 822 |     @pytest.mark.asyncio
 823 |     async def test_write_note_blocks_home_directory_access(self, app, test_project):
 824 |         """Test that home directory access patterns are blocked in folder parameter."""
 825 |         # Test various home directory access patterns
 826 |         attack_folders = [
 827 |             "~",
 828 |             "~/",
 829 |             "~/secrets",
 830 |             "~/.ssh",
 831 |             "~/Documents",
 832 |             "~\\AppData",
 833 |             "~\\Desktop",
 834 |             "~/.env_folder",
 835 |         ]
 836 | 
 837 |         for attack_folder in attack_folders:
 838 |             result = await write_note.fn(
 839 |                 project=test_project.name,
 840 |                 title="Test Note",
 841 |                 folder=attack_folder,
 842 |                 content="# Test Content\nThis should be blocked by security validation.",
 843 |             )
 844 | 
 845 |             assert isinstance(result, str)
 846 |             assert "# Error" in result
 847 |             assert "paths must stay within project boundaries" in result
 848 |             assert attack_folder in result
 849 | 
 850 |     @pytest.mark.asyncio
 851 |     async def test_write_note_blocks_mixed_attack_patterns(self, app, test_project):
 852 |         """Test that mixed legitimate/attack patterns are blocked in folder parameter."""
 853 |         # Test mixed patterns that start legitimate but contain attacks
 854 |         attack_folders = [
 855 |             "notes/../../../etc",
 856 |             "docs/../../.env_folder",
 857 |             "legitimate/path/../../.ssh",
 858 |             "project/folder/../../../Windows",
 859 |             "valid/folder/../../home/user",
 860 |             "assets/../../../tmp/evil",
 861 |         ]
 862 | 
 863 |         for attack_folder in attack_folders:
 864 |             result = await write_note.fn(
 865 |                 project=test_project.name,
 866 |                 title="Test Note",
 867 |                 folder=attack_folder,
 868 |                 content="# Test Content\nThis should be blocked by security validation.",
 869 |             )
 870 | 
 871 |             assert isinstance(result, str)
 872 |             assert "# Error" in result
 873 |             assert "paths must stay within project boundaries" in result
 874 | 
 875 |     @pytest.mark.asyncio
 876 |     async def test_write_note_allows_safe_folder_paths(self, app, test_project):
 877 |         """Test that legitimate folder paths are still allowed."""
 878 |         # Test various safe folder patterns
 879 |         safe_folders = [
 880 |             "notes",
 881 |             "docs",
 882 |             "projects/2025",
 883 |             "archive/old-notes",
 884 |             "deep/nested/directory/structure",
 885 |             "folder/subfolder",
 886 |             "research/ml",
 887 |             "meeting-notes",
 888 |         ]
 889 | 
 890 |         for safe_folder in safe_folders:
 891 |             result = await write_note.fn(
 892 |                 project=test_project.name,
 893 |                 title=f"Test Note in {safe_folder.replace('/', '-')}",
 894 |                 folder=safe_folder,
 895 |                 content="# Test Content\nThis should work normally with security validation.",
 896 |                 tags=["test", "security"],
 897 |             )
 898 | 
 899 |             # Should succeed (not a security error)
 900 |             assert isinstance(result, str)
 901 |             assert "# Error" not in result
 902 |             assert "paths must stay within project boundaries" not in result
 903 |             # Should be normal successful creation/update
 904 |             assert ("# Created note" in result) or ("# Updated note" in result)
 905 |             assert safe_folder in result  # Should show in file_path
 906 | 
 907 |     @pytest.mark.asyncio
 908 |     async def test_write_note_empty_folder_security(self, app, test_project):
 909 |         """Test that empty folder parameter is handled securely."""
 910 |         # Empty folder should be allowed (creates in root)
 911 |         result = await write_note.fn(
 912 |             project=test_project.name,
 913 |             title="Root Note",
 914 |             folder="",
 915 |             content="# Root Note\nThis note should be created in the project root.",
 916 |         )
 917 | 
 918 |         assert isinstance(result, str)
 919 |         # Empty folder should not trigger security error
 920 |         assert "# Error" not in result
 921 |         assert "paths must stay within project boundaries" not in result
 922 |         # Should succeed normally
 923 |         assert ("# Created note" in result) or ("# Updated note" in result)
 924 | 
 925 |     @pytest.mark.asyncio
 926 |     async def test_write_note_none_folder_security(self, app, test_project):
 927 |         """Test that default folder behavior works securely when folder is omitted."""
 928 |         # The write_note function requires folder parameter, but we can test with empty string
 929 |         # which effectively creates in project root
 930 |         result = await write_note.fn(
 931 |             project=test_project.name,
 932 |             title="Root Folder Note",
 933 |             folder="",  # Empty string instead of None since folder is required
 934 |             content="# Root Folder Note\nThis note should be created in the project root.",
 935 |         )
 936 | 
 937 |         assert isinstance(result, str)
 938 |         # Empty folder should not trigger security error
 939 |         assert "# Error" not in result
 940 |         assert "paths must stay within project boundaries" not in result
 941 |         # Should succeed normally
 942 |         assert ("# Created note" in result) or ("# Updated note" in result)
 943 | 
 944 |     @pytest.mark.asyncio
 945 |     async def test_write_note_current_directory_references_security(self, app, test_project):
 946 |         """Test that current directory references are handled securely."""
 947 |         # Test current directory references (should be safe)
 948 |         safe_folders = [
 949 |             "./notes",
 950 |             "folder/./subfolder",
 951 |             "./folder/subfolder",
 952 |         ]
 953 | 
 954 |         for safe_folder in safe_folders:
 955 |             result = await write_note.fn(
 956 |                 project=test_project.name,
 957 |                 title=f"Current Dir Test {safe_folder.replace('/', '-').replace('.', 'dot')}",
 958 |                 folder=safe_folder,
 959 |                 content="# Current Directory Test\nThis should work with current directory references.",
 960 |             )
 961 | 
 962 |             assert isinstance(result, str)
 963 |             # Should NOT contain security error message
 964 |             assert "# Error" not in result
 965 |             assert "paths must stay within project boundaries" not in result
 966 |             # Should succeed normally
 967 |             assert ("# Created note" in result) or ("# Updated note" in result)
 968 | 
 969 |     @pytest.mark.asyncio
 970 |     async def test_write_note_security_with_all_parameters(self, app, test_project):
 971 |         """Test security validation works with all write_note parameters."""
 972 |         # Test that security validation is applied even when all other parameters are provided
 973 |         result = await write_note.fn(
 974 |             project=test_project.name,
 975 |             title="Security Test with All Params",
 976 |             folder="../../../etc/malicious",
 977 |             content="# Malicious Content\nThis should be blocked by security validation.",
 978 |             tags=["malicious", "test"],
 979 |             entity_type="guide",
 980 |         )
 981 | 
 982 |         assert isinstance(result, str)
 983 |         assert "# Error" in result
 984 |         assert "paths must stay within project boundaries" in result
 985 |         assert "../../../etc/malicious" in result
 986 | 
 987 |     @pytest.mark.asyncio
 988 |     async def test_write_note_security_logging(self, app, test_project, caplog):
 989 |         """Test that security violations are properly logged."""
 990 |         # Attempt path traversal attack
 991 |         result = await write_note.fn(
 992 |             project=test_project.name,
 993 |             title="Security Logging Test",
 994 |             folder="../../../etc/passwd_folder",
 995 |             content="# Test Content\nThis should trigger security logging.",
 996 |         )
 997 | 
 998 |         assert "# Error" in result
 999 |         assert "paths must stay within project boundaries" in result
1000 | 
1001 |         # Check that security violation was logged
1002 |         # Note: This test may need adjustment based on the actual logging setup
1003 |         # The security validation should generate a warning log entry
1004 | 
1005 |     @pytest.mark.asyncio
1006 |     async def test_write_note_preserves_functionality_with_security(self, app, test_project):
1007 |         """Test that security validation doesn't break normal note creation functionality."""
1008 |         # Create a note with all features to ensure security validation doesn't interfere
1009 |         result = await write_note.fn(
1010 |             project=test_project.name,
1011 |             title="Full Feature Security Test",
1012 |             folder="security-tests",
1013 |             content=dedent("""
1014 |                 # Full Feature Security Test
1015 | 
1016 |                 This note tests that security validation doesn't break normal functionality.
1017 | 
1018 |                 ## Observations
1019 |                 - [security] Path validation working correctly #security
1020 |                 - [feature] All features still functional #test
1021 | 
1022 |                 ## Relations
1023 |                 - relates_to [[Security Implementation]]
1024 |                 - depends_on [[Path Validation]]
1025 | 
1026 |                 Additional content with various formatting.
1027 |             """).strip(),
1028 |             tags=["security", "test", "full-feature"],
1029 |             entity_type="guide",
1030 |         )
1031 | 
1032 |         # Should succeed normally
1033 |         assert isinstance(result, str)
1034 |         assert "# Error" not in result
1035 |         assert "paths must stay within project boundaries" not in result
1036 |         assert "# Created note" in result
1037 |         assert "file_path: security-tests/Full Feature Security Test.md" in result
1038 |         assert "permalink: security-tests/full-feature-security-test" in result
1039 | 
1040 |         # Should process observations and relations
1041 |         assert "## Observations" in result
1042 |         assert "## Relations" in result
1043 |         assert "## Tags" in result
1044 | 
1045 |         # Should show proper counts
1046 |         assert "security: 1" in result
1047 |         assert "feature: 1" in result
1048 | 
1049 | 
1050 | class TestWriteNoteSecurityEdgeCases:
1051 |     """Test edge cases for write_note security validation."""
1052 | 
1053 |     @pytest.mark.asyncio
1054 |     async def test_write_note_unicode_folder_attacks(self, app, test_project):
1055 |         """Test that Unicode-based path traversal attempts are blocked."""
1056 |         # Test Unicode path traversal attempts
1057 |         unicode_attack_folders = [
1058 |             "notes/文档/../../../etc",  # Chinese characters
1059 |             "docs/café/../../secrets",  # Accented characters
1060 |             "files/αβγ/../../../malicious",  # Greek characters
1061 |         ]
1062 | 
1063 |         for attack_folder in unicode_attack_folders:
1064 |             result = await write_note.fn(
1065 |                 project=test_project.name,
1066 |                 title="Unicode Attack Test",
1067 |                 folder=attack_folder,
1068 |                 content="# Unicode Attack\nThis should be blocked.",
1069 |             )
1070 | 
1071 |             assert isinstance(result, str)
1072 |             assert "# Error" in result
1073 |             assert "paths must stay within project boundaries" in result
1074 | 
1075 |     @pytest.mark.asyncio
1076 |     async def test_write_note_very_long_attack_folder(self, app, test_project):
1077 |         """Test handling of very long attack folder paths."""
1078 |         # Create a very long path traversal attack
1079 |         long_attack_folder = "../" * 1000 + "etc/malicious"
1080 | 
1081 |         result = await write_note.fn(
1082 |             project=test_project.name,
1083 |             title="Long Attack Test",
1084 |             folder=long_attack_folder,
1085 |             content="# Long Attack\nThis should be blocked.",
1086 |         )
1087 | 
1088 |         assert isinstance(result, str)
1089 |         assert "# Error" in result
1090 |         assert "paths must stay within project boundaries" in result
1091 | 
1092 |     @pytest.mark.asyncio
1093 |     async def test_write_note_case_variations_attacks(self, app, test_project):
1094 |         """Test that case variations don't bypass security."""
1095 |         # Test case variations (though case sensitivity depends on filesystem)
1096 |         case_attack_folders = [
1097 |             "../ETC",
1098 |             "../Etc/SECRETS",
1099 |             "..\\WINDOWS",
1100 |             "~/SECRETS",
1101 |         ]
1102 | 
1103 |         for attack_folder in case_attack_folders:
1104 |             result = await write_note.fn(
1105 |                 project=test_project.name,
1106 |                 title="Case Variation Attack Test",
1107 |                 folder=attack_folder,
1108 |                 content="# Case Attack\nThis should be blocked.",
1109 |             )
1110 | 
1111 |             assert isinstance(result, str)
1112 |             assert "# Error" in result
1113 |             assert "paths must stay within project boundaries" in result
1114 | 
1115 |     @pytest.mark.asyncio
1116 |     async def test_write_note_whitespace_in_attack_folders(self, app, test_project):
1117 |         """Test that whitespace doesn't help bypass security."""
1118 |         # Test attack folders with various whitespace
1119 |         whitespace_attack_folders = [
1120 |             " ../../../etc ",
1121 |             "\t../../../secrets\t",
1122 |             " ..\\..\\Windows ",
1123 |             "notes/ ../../ malicious",
1124 |         ]
1125 | 
1126 |         for attack_folder in whitespace_attack_folders:
1127 |             result = await write_note.fn(
1128 |                 project=test_project.name,
1129 |                 title="Whitespace Attack Test",
1130 |                 folder=attack_folder,
1131 |                 content="# Whitespace Attack\nThis should be blocked.",
1132 |             )
1133 | 
1134 |             assert isinstance(result, str)
1135 |             # The attack should still be blocked even with whitespace
1136 |             if ".." in attack_folder.strip() or "~" in attack_folder.strip():
1137 |                 assert "# Error" in result
1138 |                 assert "paths must stay within project boundaries" in result
1139 | 
```

--------------------------------------------------------------------------------
/specs/SPEC-17 Semantic Search with ChromaDB.md:
--------------------------------------------------------------------------------

```markdown
   1 | ---
   2 | title: 'SPEC-17: Semantic Search with ChromaDB'
   3 | type: spec
   4 | permalink: specs/spec-17-semantic-search-chromadb
   5 | tags:
   6 | - search
   7 | - chromadb
   8 | - semantic-search
   9 | - vector-database
  10 | - postgres-migration
  11 | ---
  12 | 
  13 | # SPEC-17: Semantic Search with ChromaDB
  14 | 
  15 | ## Why ChromaDB for Knowledge Management
  16 | 
  17 | Your users aren't just searching for keywords - they're trying to:
  18 | - "Find notes related to this concept"
  19 | - "Show me similar ideas"
  20 | - "What else did I write about this topic?"
  21 | 
  22 | Example:
  23 |     # User searches: "AI ethics"
  24 | 
  25 |     # FTS5/MeiliSearch finds:
  26 |     - "AI ethics guidelines"     ✅
  27 |     - "ethical AI development"   ✅
  28 |     - "artificial intelligence"  ❌ No keyword match
  29 | 
  30 |     # ChromaDB finds:
  31 |     - "AI ethics guidelines"     ✅
  32 |     - "ethical AI development"   ✅
  33 |     - "artificial intelligence"  ✅ Semantic match!
  34 |     - "bias in ML models"        ✅ Related concept
  35 |     - "responsible technology"   ✅ Similar theme
  36 |     - "neural network fairness"  ✅ Connected idea
  37 | 
  38 | ### ChromaDB vs MeiliSearch vs Typesense
  39 | 
  40 | | Feature          | ChromaDB           | MeiliSearch        | Typesense          |
  41 | |------------------|--------------------|--------------------|--------------------|
  42 | | Semantic Search  | ✅ Excellent        | ❌ No               | ❌ No               |
  43 | | Keyword Search   | ⚠️ Via metadata    | ✅ Excellent        | ✅ Excellent        |
  44 | | Local Deployment | ✅ Embedded mode    | ⚠️ Server required | ⚠️ Server required |
  45 | | No Server Needed | ✅ YES!             | ❌ No               | ❌ No               |
  46 | | Embedding Cost   | Free (local model) or ~$0.13/1M tokens (OpenAI) | None | None |
  47 | | Search Speed     | 50-200ms           | 10-50ms            | 10-50ms            |
  48 | | Best For         | Semantic discovery | Exact terms        | Exact terms        |
  49 | 
  50 | ### The Killer Feature: Embedded Mode
  51 | 
  52 | ChromaDB has an embedded client that runs in-process - NO SERVER NEEDED!
  53 | ```python
  54 | # Local (FOSS) - ChromaDB embedded in Python process
  55 | import chromadb
  56 | 
  57 | client = chromadb.PersistentClient(path="/path/to/chroma_data")
  58 | collection = client.get_or_create_collection("knowledge_base")
  59 | 
  60 | # Add documents
  61 | collection.add(
  62 |   ids=["note1", "note2"],
  63 |   documents=["AI ethics", "Neural networks"],
  64 |   metadatas=[{"type": "note"}, {"type": "spec"}]
  65 | )
  66 | 
  67 | # Search - NO API calls, runs locally!
  68 | results = collection.query(
  69 |   query_texts=["machine learning"],
  70 |   n_results=10
  71 | )
  72 | ```
  73 | 
  74 | ## Why
  75 | 
  76 | ### Current Problem: Database Persistence in Cloud
  77 | In cloud deployments, `memory.db` (SQLite) doesn't persist across Docker container restarts. This means:
  78 | - Database must be rebuilt on every container restart
  79 | - Initial sync takes ~49 seconds for 500 files (after optimization in #352)
  80 | - Users experience delays on each deployment
  81 | 
  82 | ### Search Architecture Issues
  83 | The current SQLite FTS5 implementation creates a **dual-implementation problem** for the PostgreSQL migration:
  84 | - FTS5 (SQLite) uses `VIRTUAL TABLE` with `MATCH` queries
  85 | - PostgreSQL full-text search uses `TSVECTOR` with `@@` operator
  86 | - These are fundamentally incompatible architectures
  87 | - Would require **2x search code** and **2x tests** to support both
  88 | 
  89 | **Example of incompatibility:**
  90 | ```python
  91 | # SQLite FTS5
  92 | "content_stems MATCH :text"
  93 | 
  94 | # PostgreSQL
  95 | "content_vector @@ plainto_tsquery(:text)"
  96 | ```
  97 | 
  98 | ### Search Quality Limitations
  99 | Current keyword-based FTS5 has limitations:
 100 | - No semantic understanding (search "AI" doesn't find "machine learning")
 101 | - No word relationships (search "neural networks" doesn't find "deep learning")
 102 | - Limited typo tolerance
 103 | - No relevance ranking beyond keyword matching
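
These gaps all stem from keyword matching having no notion of meaning. As a minimal sketch (not Basic Memory code), semantic search ranks documents by cosine similarity between embedding vectors; the toy 3-dimensional vectors below are hand-picked for illustration, whereas a real model such as `sentence-transformers` produces hundreds of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings", hand-picked for illustration (not from a real model)
docs = {
    "AI ethics guidelines": [0.9, 0.8, 0.1],
    "artificial intelligence": [0.8, 0.7, 0.2],  # zero keyword overlap with query
    "grocery list": [0.1, 0.0, 0.9],
}
query = [0.85, 0.75, 0.15]  # stand-in embedding for "AI ethics"

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
# "artificial intelligence" ranks far above "grocery list" even though it
# shares no keywords with the query; FTS5 would never surface it
```

A real backend replaces the toy vectors with model-generated embeddings; ChromaDB does this automatically when documents are added and queried.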
 104 | 
 105 | ### Strategic Goal: PostgreSQL Migration
 106 | Moving to PostgreSQL (Neon) for cloud deployments would:
 107 | - ✅ Solve persistence issues (database survives restarts)
 108 | - ✅ Enable multi-tenant architecture
 109 | - ✅ Better performance for large datasets
 110 | - ✅ Support for cloud-native scaling
 111 | 
 112 | **But requires solving the search compatibility problem.**
 113 | 
 114 | ## What
 115 | 
 116 | Migrate from SQLite FTS5 to **ChromaDB** for semantic vector search across all deployments.
 117 | 
 118 | **Key insight:** ChromaDB is **database-agnostic** - it works with both SQLite and PostgreSQL, eliminating the dual-implementation problem.
 119 | 
 120 | ### Affected Areas
 121 | - Search implementation (`src/basic_memory/repository/search_repository.py`)
 122 | - Search service (`src/basic_memory/services/search_service.py`)
 123 | - Search models (`src/basic_memory/models/search.py`)
 124 | - Database initialization (`src/basic_memory/db.py`)
 125 | - MCP search tools (`src/basic_memory/mcp/tools/search.py`)
 126 | - Dependencies (`pyproject.toml` - add ChromaDB)
 127 | - Alembic migrations (FTS5 table removal)
 128 | - Documentation
 129 | 
 130 | ### What Changes
 131 | **Removed:**
 132 | - SQLite FTS5 virtual table
 133 | - `MATCH` query syntax
 134 | - FTS5-specific tokenization and prefix handling
 135 | - ~300 lines of FTS5 query preparation code
 136 | 
 137 | **Added:**
 138 | - ChromaDB persistent client (embedded mode)
 139 | - Vector embedding generation
 140 | - Semantic similarity search
 141 | - Local embedding model (`sentence-transformers`)
 142 | - Collection management for multi-project support
 143 | 
 144 | ### What Stays the Same
 145 | - Search API interface (MCP tools, REST endpoints)
 146 | - Entity/Observation/Relation indexing workflow
 147 | - Multi-project isolation
 148 | - Search filtering by type, date, metadata
 149 | - Pagination and result formatting
 150 | - **All SQL queries for exact lookups and metadata filtering**
 151 | 
 152 | ## Hybrid Architecture: SQL + ChromaDB
 153 | 
 154 | **Critical Design Decision:** ChromaDB **complements** SQL; it doesn't **replace** it.
 155 | 
 156 | ### Why Hybrid?
 157 | 
 158 | ChromaDB is excellent for semantic text search but terrible for exact lookups. SQL is perfect for exact lookups and structured queries. We use both:
 159 | 
 160 | ```
 161 | ┌─────────────────────────────────────────────────┐
 162 | │ Search Request                                   │
 163 | └─────────────────────────────────────────────────┘
 164 |                     ▼
 165 |        ┌────────────────────────┐
 166 |        │ SearchRepository       │
 167 |        │  (Smart Router)        │
 168 |        └────────────────────────┘
 169 |               ▼           ▼
 170 |   ┌───────────┐      ┌──────────────┐
 171 |   │ SQL       │      │ ChromaDB     │
 172 |   │ Queries   │      │ Semantic     │
 173 |   └───────────┘      └──────────────┘
 174 |        ▼                    ▼
 175 |   Exact lookups      Text search
 176 |   - Permalink        - Semantic similarity
 177 |   - Pattern match    - Related concepts
 178 |   - Title exact      - Typo tolerance
 179 |   - Metadata filter  - Fuzzy matching
 180 |   - Date ranges
 181 | ```
 182 | 
 183 | ### When to Use Each
 184 | 
 185 | #### Use SQL For (Fast & Exact)
 186 | 
 187 | **Exact Permalink Lookup:**
 188 | ```python
 189 | # Find by exact permalink - SQL wins
 190 | "SELECT * FROM entities WHERE permalink = 'specs/search-feature'"
 191 | # ~1ms, perfect for exact matches
 192 | 
 193 | # ChromaDB would be: ~50ms, wasteful
 194 | ```
 195 | 
 196 | **Pattern Matching:**
 197 | ```python
 198 | # Find all specs - SQL wins
 199 | "SELECT * FROM entities WHERE permalink GLOB 'specs/*'"
 200 | # ~5ms, perfect for wildcards
 201 | 
 202 | # ChromaDB doesn't support glob patterns
 203 | ```
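
For reference, the GLOB route above is plain SQLite and can be tried with the stdlib `sqlite3` module; the one-column table here is a simplified stand-in for the real `entities` schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (permalink TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?)",
    [("specs/search-feature",), ("specs/spec-17",), ("notes/daily",)],
)

# GLOB is case-sensitive shell-style matching, ideal for permalink prefixes
rows = conn.execute(
    "SELECT permalink FROM entities WHERE permalink GLOB 'specs/*'"
).fetchall()
# rows -> [('specs/search-feature',), ('specs/spec-17',)]
```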
 204 | 
 205 | **Pure Metadata Queries:**
 206 | ```python
 207 | # Find all meetings tagged "important" - SQL wins
 208 | "SELECT * FROM entities
 209 |  WHERE json_extract(entity_metadata, '$.entity_type') = 'meeting'
 210 |  AND json_extract(entity_metadata, '$.tags') LIKE '%important%'"
 211 | # ~5ms, structured query
 212 | 
 213 | # No text search needed, SQL is faster and simpler
 214 | ```
 215 | 
 216 | **Date Filtering:**
 217 | ```python
 218 | # Find recent specs - SQL wins
 219 | "SELECT * FROM entities
 220 |  WHERE entity_type = 'spec'
 221 |  AND created_at > '2024-01-01'
 222 |  ORDER BY created_at DESC"
 223 | # ~2ms, perfect for structured data
 224 | ```
 225 | 
 226 | #### Use ChromaDB For (Semantic & Fuzzy)
 227 | 
 228 | **Semantic Content Search:**
 229 | ```python
 230 | # Find notes about "neural networks" - ChromaDB wins
 231 | collection.query(query_texts=["neural networks"])
 232 | # Finds: "machine learning", "deep learning", "AI models"
 233 | # ~50-100ms, semantic understanding
 234 | 
 235 | # SQL FTS5 would only find exact keyword matches
 236 | ```
 237 | 
 238 | **Text Search + Metadata:**
 239 | ```python
 240 | # Find meeting notes about "project planning" tagged "important"
 241 | collection.query(
 242 |     query_texts=["project planning"],
 243 |     where={
 244 |         "entity_type": "meeting",
 245 |         "tags": {"$contains": "important"}
 246 |     }
 247 | )
 248 | # ~100ms, semantic search with filters
 249 | # Finds: "roadmap discussion", "sprint planning", etc.
 250 | ```
 251 | 
 252 | **Typo Tolerance:**
 253 | ```python
 254 | # User types "serch feature" (typo) - ChromaDB wins
 255 | collection.query(query_texts=["serch feature"])
 256 | # Still finds: "search feature" documents
 257 | # ~50-100ms, fuzzy matching
 258 | 
 259 | # SQL would find nothing
 260 | ```
 261 | 
 262 | ### Performance Comparison
 263 | 
 264 | | Query Type | SQL | ChromaDB | Winner |
 265 | |-----------|-----|----------|--------|
 266 | | Exact permalink | 1-2ms | 50ms | ✅ SQL |
 267 | | Pattern match (specs/*) | 5-10ms | N/A | ✅ SQL |
 268 | | Pure metadata filter | 5ms | 50ms | ✅ SQL |
 269 | | Semantic text search | ❌ Can't | 50-100ms | ✅ ChromaDB |
 270 | | Text + metadata | ❌ Keywords only | 100ms | ✅ ChromaDB |
 271 | | Typo tolerance | ❌ Can't | 50ms | ✅ ChromaDB |
 272 | 
 273 | ### Metadata/Frontmatter Handling
 274 | 
 275 | **Both systems support full frontmatter filtering!**
 276 | 
 277 | #### SQL Metadata Storage
 278 | 
 279 | ```sql
 280 | -- Entities table stores frontmatter as JSON
 281 | CREATE TABLE entities (
 282 |     id INTEGER PRIMARY KEY,
 283 |     title TEXT,
 284 |     permalink TEXT,
 285 |     file_path TEXT,
 286 |     entity_type TEXT,
 287 |     entity_metadata JSON,  -- All frontmatter here!
 288 |     created_at DATETIME,
 289 |     ...
 290 | )
 291 | 
 292 | -- Query frontmatter fields
 293 | SELECT * FROM entities
 294 | WHERE json_extract(entity_metadata, '$.entity_type') = 'meeting'
 295 |   AND json_extract(entity_metadata, '$.tags') LIKE '%important%'
 296 |   AND json_extract(entity_metadata, '$.status') = 'completed'
 297 | ```
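
As a runnable sketch (with a simplified two-column table, not the real schema), the same `json_extract` filters work through the stdlib `sqlite3` module, assuming a SQLite build with the JSON1 functions, which ships with modern Python:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (title TEXT, entity_metadata TEXT)")
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [
        ("Sprint review", json.dumps({"entity_type": "meeting", "tags": ["important"]})),
        ("Random note", json.dumps({"entity_type": "note", "tags": []})),
    ],
)

# Filter on frontmatter fields stored as JSON; no text search involved
rows = conn.execute(
    "SELECT title FROM entities"
    " WHERE json_extract(entity_metadata, '$.entity_type') = 'meeting'"
    " AND json_extract(entity_metadata, '$.tags') LIKE '%important%'"
).fetchall()
# rows -> [('Sprint review',)]
```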
 298 | 
 299 | #### ChromaDB Metadata Storage
 300 | 
 301 | ```python
 302 | # When indexing, store ALL frontmatter as metadata
 303 | class ChromaSearchBackend:
 304 |     async def index_entity(self, entity: Entity):
 305 |         """Index with complete frontmatter metadata."""
 306 | 
 307 |         # Extract ALL frontmatter fields
 308 |         metadata = {
 309 |             "entity_id": entity.id,
 310 |             "project_id": entity.project_id,
 311 |             "permalink": entity.permalink,
 312 |             "file_path": entity.file_path,
 313 |             "entity_type": entity.entity_type,
 314 |             "type": "entity",
 315 |             # ALL frontmatter tags
 316 |             "tags": entity.entity_metadata.get("tags", []),
 317 |             # Custom frontmatter fields
 318 |             "status": entity.entity_metadata.get("status"),
 319 |             "priority": entity.entity_metadata.get("priority"),
 320 |             # Spread any other custom fields
 321 |             **{k: v for k, v in entity.entity_metadata.items()
 322 |                if k not in ["tags", "entity_type"]}
 323 |         }
 324 | 
 325 |         self.collection.upsert(
 326 |             ids=[f"entity_{entity.id}_{entity.project_id}"],
 327 |             documents=[self._format_document(entity)],
 328 |             metadatas=[metadata]  # Full frontmatter!
 329 |         )
 330 | ```
 331 | 
 332 | #### ChromaDB Metadata Queries
 333 | 
 334 | ChromaDB supports rich filtering:
 335 | 
 336 | ```python
 337 | # Simple filter - single field
 338 | collection.query(
 339 |     query_texts=["project planning"],
 340 |     where={"entity_type": "meeting"}
 341 | )
 342 | 
 343 | # Multiple conditions (AND)
 344 | collection.query(
 345 |     query_texts=["architecture decisions"],
 346 |     where={
 347 |         "entity_type": "spec",
 348 |         "tags": {"$contains": "important"}
 349 |     }
 350 | )
 351 | 
 352 | # Complex filters with operators
 353 | collection.query(
 354 |     query_texts=["machine learning"],
 355 |     where={
 356 |         "$and": [
 357 |             {"entity_type": {"$in": ["note", "spec"]}},
 358 |             {"tags": {"$contains": "AI"}},
 359 |             {"created_at": {"$gt": "2024-01-01"}},
 360 |             {"status": "in-progress"}
 361 |         ]
 362 |     }
 363 | )
 364 | 
 365 | # Multiple tags (all must match)
 366 | collection.query(
 367 |     query_texts=["cloud architecture"],
 368 |     where={
 369 |         "$and": [
 370 |             {"tags": {"$contains": "architecture"}},
 371 |             {"tags": {"$contains": "cloud"}}
 372 |         ]
 373 |     }
 374 | )
 375 | ```
 376 | 
 377 | ### Smart Routing Implementation
 378 | 
 379 | ```python
 380 | class SearchRepository:
 381 |     def __init__(
 382 |         self,
 383 |         session_maker: async_sessionmaker[AsyncSession],
 384 |         project_id: int,
 385 |         chroma_backend: ChromaSearchBackend
 386 |     ):
 387 |         self.sql = session_maker  # Keep SQL!
 388 |         self.chroma = chroma_backend
 389 |         self.project_id = project_id
 390 | 
 391 |     async def search(
 392 |         self,
 393 |         search_text: Optional[str] = None,
 394 |         permalink: Optional[str] = None,
 395 |         permalink_match: Optional[str] = None,
 396 |         title: Optional[str] = None,
 397 |         types: Optional[List[str]] = None,
 398 |         tags: Optional[List[str]] = None,
 399 |         after_date: Optional[datetime] = None,
 400 |         custom_metadata: Optional[dict] = None,
 401 |         limit: int = 10,
 402 |         offset: int = 0,
 403 |     ) -> List[SearchIndexRow]:
 404 |         """Smart routing between SQL and ChromaDB."""
 405 | 
 406 |         # ==========================================
 407 |         # Route 1: Exact Lookups → SQL (1-5ms)
 408 |         # ==========================================
 409 | 
 410 |         if permalink:
 411 |             # Exact permalink: "specs/search-feature"
 412 |             return await self._sql_permalink_lookup(permalink)
 413 | 
 414 |         if permalink_match:
 415 |             # Pattern match: "specs/*"
 416 |             return await self._sql_pattern_match(permalink_match)
 417 | 
 418 |         if title and not search_text:
 419 |             # Exact title lookup (no semantic search needed)
 420 |             return await self._sql_title_match(title)
 421 | 
 422 |         # ==========================================
 423 |         # Route 2: Pure Metadata → SQL (5-10ms)
 424 |         # ==========================================
 425 | 
 426 |         # No text search, just filtering by metadata
 427 |         if not search_text and (types or tags or after_date or custom_metadata):
 428 |             return await self._sql_metadata_filter(
 429 |                 types=types,
 430 |                 tags=tags,
 431 |                 after_date=after_date,
 432 |                 custom_metadata=custom_metadata,
 433 |                 limit=limit,
 434 |                 offset=offset
 435 |             )
 436 | 
 437 |         # ==========================================
 438 |         # Route 3: Text Search → ChromaDB (50-100ms)
 439 |         # ==========================================
 440 | 
 441 |         if search_text:
 442 |             # Build ChromaDB metadata filters
 443 |             where_filters = self._build_chroma_filters(
 444 |                 types=types,
 445 |                 tags=tags,
 446 |                 after_date=after_date,
 447 |                 custom_metadata=custom_metadata
 448 |             )
 449 | 
 450 |             # Semantic search with metadata filtering
 451 |             return await self.chroma.search(
 452 |                 query_text=search_text,
 453 |                 project_id=self.project_id,
 454 |                 where=where_filters,
 455 |                 limit=limit
 456 |             )
 457 | 
 458 |         # ==========================================
 459 |         # Route 4: List All → SQL (2-5ms)
 460 |         # ==========================================
 461 | 
 462 |         return await self._sql_list_entities(
 463 |             limit=limit,
 464 |             offset=offset
 465 |         )
 466 | 
 467 |     def _build_chroma_filters(
 468 |         self,
 469 |         types: Optional[List[str]] = None,
 470 |         tags: Optional[List[str]] = None,
 471 |         after_date: Optional[datetime] = None,
 472 |         custom_metadata: Optional[dict] = None
 473 |     ) -> dict:
 474 |         """Build ChromaDB where clause from filters."""
 475 |         filters = {"project_id": self.project_id}
 476 | 
 477 |         # Type filtering
 478 |         if types:
 479 |             if len(types) == 1:
 480 |                 filters["entity_type"] = types[0]
 481 |             else:
 482 |                 filters["entity_type"] = {"$in": types}
 483 | 
 484 |         # Tag filtering (array contains)
 485 |         if tags:
 486 |             if len(tags) == 1:
 487 |                 filters["tags"] = {"$contains": tags[0]}
 488 |             else:
 489 |                 # Multiple tags - all must match
 490 |                 filters = {
 491 |                     "$and": [
 492 |                         filters,
 493 |                         *[{"tags": {"$contains": tag}} for tag in tags]
 494 |                     ]
 495 |                 }
 496 | 
 497 |         # Date filtering
 498 |         if after_date:
 499 |             filters["created_at"] = {"$gt": after_date.timestamp()}  # ChromaDB range operators are numeric, so compare epoch seconds
 500 | 
 501 |         # Custom frontmatter fields
 502 |         if custom_metadata:
 503 |             filters.update(custom_metadata)
 504 | 
 505 |         return filters
 506 | 
 507 |     async def _sql_metadata_filter(
 508 |         self,
 509 |         types: Optional[List[str]] = None,
 510 |         tags: Optional[List[str]] = None,
 511 |         after_date: Optional[datetime] = None,
 512 |         custom_metadata: Optional[dict] = None,
 513 |         limit: int = 10,
 514 |         offset: int = 0
 515 |     ) -> List[SearchIndexRow]:
 516 |         """Pure metadata queries using SQL."""
 517 |         conditions = ["project_id = :project_id"]
 518 |         params = {"project_id": self.project_id}
 519 | 
 520 |         if types:
 521 |             # Bind each type as a parameter instead of interpolating values
 522 |             placeholders = ", ".join(f":type_{i}" for i in range(len(types)))
 523 |             conditions.append(f"entity_type IN ({placeholders})")
 524 |             params.update({f"type_{i}": t for i, t in enumerate(types)})
 523 | 
 524 |         if tags:
 525 |             # Check each tag
 526 |             for i, tag in enumerate(tags):
 527 |                 param_name = f"tag_{i}"
 528 |                 conditions.append(
 529 |                     f"json_extract(entity_metadata, '$.tags') LIKE :{param_name}"
 530 |                 )
 531 |                 params[param_name] = f"%{tag}%"
 532 | 
 533 |         if after_date:
 534 |             conditions.append("created_at > :after_date")
 535 |             params["after_date"] = after_date
 536 | 
 537 |         if custom_metadata:  # NOTE: keys are interpolated into the JSON path; restrict to trusted frontmatter fields
 538 |             for key, value in custom_metadata.items():
 539 |                 param_name = f"meta_{key}"
 540 |                 conditions.append(
 541 |                     f"json_extract(entity_metadata, '$.{key}') = :{param_name}"
 542 |                 )
 543 |                 params[param_name] = value
 544 | 
 545 |         where = " AND ".join(conditions)
 546 |         sql = f"""
 547 |             SELECT * FROM entities
 548 |             WHERE {where}
 549 |             ORDER BY created_at DESC
 550 |             LIMIT :limit OFFSET :offset
 551 |         """
 552 |         params["limit"] = limit
 553 |         params["offset"] = offset
 554 | 
 555 |         async with db.scoped_session(self.session_maker) as session:
 556 |             result = await session.execute(text(sql), params)
 557 |             return self._format_sql_results(result)
 558 | ```
 559 | 
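The dispatch order above can be captured as a pure function, which makes the routing unit-testable without a database or ChromaDB. A standalone sketch (names are illustrative, not the actual repository API):

```python
from typing import Optional

def choose_route(
    search_text: Optional[str] = None,
    permalink: Optional[str] = None,
    permalink_match: Optional[str] = None,
    title: Optional[str] = None,
    has_metadata_filters: bool = False,
) -> str:
    """Mirror the four routes: exact lookups and pure-metadata
    queries stay in SQL; only free-text queries reach ChromaDB."""
    if permalink or permalink_match or title:
        return "sql_exact"        # Route 1 (~1-5ms)
    if search_text is None and has_metadata_filters:
        return "sql_metadata"     # Route 2 (~5-10ms)
    if search_text:
        return "chromadb"         # Route 3 (~50-100ms)
    return "sql_list"             # Route 4 (~2-5ms)
```

Keeping this decision in one small function also makes the latency trade-offs easy to audit in review.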
 560 | ### Real-World Examples
 561 | 
 562 | #### Example 1: Pure Metadata Query (No Text)
 563 | ```python
 564 | # "Find all meetings tagged 'important'"
 565 | results = await search_repo.search(
 566 |     types=["meeting"],
 567 |     tags=["important"]
 568 | )
 569 | 
 570 | # Routing: → SQL (~5ms)
 571 | # SQL: SELECT * FROM entities
 572 | #      WHERE entity_type = 'meeting'
 573 | #      AND json_extract(entity_metadata, '$.tags') LIKE '%important%'
 574 | ```
 575 | 
 576 | #### Example 2: Semantic Search (No Metadata)
 577 | ```python
 578 | # "Find notes about neural networks"
 579 | results = await search_repo.search(
 580 |     search_text="neural networks"
 581 | )
 582 | 
 583 | # Routing: → ChromaDB (~80ms)
 584 | # Finds: "machine learning", "deep learning", "AI models", etc.
 585 | ```
 586 | 
 587 | #### Example 3: Semantic + Metadata
 588 | ```python
 589 | # "Find meeting notes about 'project planning' tagged 'important'"
 590 | results = await search_repo.search(
 591 |     search_text="project planning",
 592 |     types=["meeting"],
 593 |     tags=["important"]
 594 | )
 595 | 
 596 | # Routing: → ChromaDB with filters (~100ms)
 597 | # ChromaDB: query_texts=["project planning"]
 598 | #           where={"entity_type": "meeting",
 599 | #                  "tags": {"$contains": "important"}}
 600 | # Finds: "roadmap discussion", "sprint planning", etc.
 601 | ```
 602 | 
 603 | #### Example 4: Complex Frontmatter Query
 604 | ```python
 605 | # "Find recent in-progress specs with multiple tags"
 606 | results = await search_repo.search(
 607 |     types=["spec"],
 608 |     tags=["architecture", "cloud"],
 609 |     after_date=datetime(2024, 1, 1),
 610 |     custom_metadata={"status": "in-progress"}
 611 | )
 612 | 
 613 | # Routing: → SQL (~10ms)
 614 | # No text search, pure structured query - SQL is faster
 615 | ```
 616 | 
 617 | #### Example 5: Semantic + Complex Metadata
 618 | ```python
 619 | # "Find notes about 'authentication' that are in-progress"
 620 | results = await search_repo.search(
 621 |     search_text="authentication",
 622 |     custom_metadata={"status": "in-progress", "priority": "high"}
 623 | )
 624 | 
 625 | # Routing: → ChromaDB with metadata filters (~100ms)
 626 | # Semantic search for "authentication" concept
 627 | # Filters by status and priority in metadata
 628 | ```
 629 | 
 630 | #### Example 6: Exact Permalink
 631 | ```python
 632 | # "Show me specs/search-feature"
 633 | results = await search_repo.search(
 634 |     permalink="specs/search-feature"
 635 | )
 636 | 
 637 | # Routing: → SQL (~1ms)
 638 | # SQL: SELECT * FROM entities WHERE permalink = 'specs/search-feature'
 639 | ```
 640 | 
 641 | #### Example 7: Pattern Match
 642 | ```python
 643 | # "Show me all specs"
 644 | results = await search_repo.search(
 645 |     permalink_match="specs/*"
 646 | )
 647 | 
 648 | # Routing: → SQL (~5ms)
 649 | # SQL: SELECT * FROM entities WHERE permalink GLOB 'specs/*'
 650 | ```
 651 | 
 652 | ### What We Remove vs Keep
 653 | 
 654 | **REMOVE (FTS5-specific):**
 655 | - ❌ `CREATE VIRTUAL TABLE search_index USING fts5(...)`
 656 | - ❌ `MATCH` operator queries
 657 | - ❌ FTS5 tokenization configuration
 658 | - ❌ ~300 lines of FTS5 query preparation code
 659 | - ❌ Trigram generation and prefix handling
 660 | 
 661 | **KEEP (Standard SQL):**
 662 | - ✅ `SELECT * FROM entities WHERE permalink = :permalink`
 663 | - ✅ `SELECT * FROM entities WHERE permalink GLOB :pattern`
 664 | - ✅ `SELECT * FROM entities WHERE title LIKE :title`
 665 | - ✅ `SELECT * FROM entities WHERE json_extract(entity_metadata, ...) = :value`
 666 | - ✅ All date filtering, pagination, sorting
 667 | - ✅ Entity table structure and indexes
 668 | 
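The kept queries run on stock SQLite (the bundled JSON1 functions provide `json_extract`); a quick sanity check against a throwaway in-memory table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE entities (permalink TEXT, title TEXT, entity_metadata TEXT)")
con.executemany(
    "INSERT INTO entities VALUES (?, ?, ?)",
    [
        ("specs/search-feature", "Search Feature", '{"tags": ["important"]}'),
        ("notes/misc", "Misc", '{"tags": []}'),
    ],
)

# GLOB pattern match (exact-lookup route)
specs = con.execute(
    "SELECT permalink FROM entities WHERE permalink GLOB 'specs/*'"
).fetchall()

# json_extract metadata filter (pure-metadata route)
tagged = con.execute(
    "SELECT permalink FROM entities "
    "WHERE json_extract(entity_metadata, '$.tags') LIKE '%important%'"
).fetchall()
```

Both queries return only `specs/search-feature`, with no FTS5 virtual table involved.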
 669 | **ADD (ChromaDB):**
 670 | - ✅ ChromaDB persistent client (embedded)
 671 | - ✅ Semantic vector search
 672 | - ✅ Metadata filtering in ChromaDB
 673 | - ✅ Smart routing logic
 674 | 
 675 | ## How (High Level)
 676 | 
 677 | ### Architecture Overview
 678 | 
 679 | ```
 680 | ┌─────────────────────────────────────────────────────────────┐
 681 | │ FOSS Deployment (Local)                                      │
 682 | ├─────────────────────────────────────────────────────────────┤
 683 | │ SQLite (data) + ChromaDB embedded (search)                   │
 684 | │ - No external services                                       │
 685 | │ - Local embedding model (sentence-transformers)              │
 686 | │ - Persists in ~/.basic-memory/chroma_data/                  │
 687 | └─────────────────────────────────────────────────────────────┘
 688 | 
 689 | ┌─────────────────────────────────────────────────────────────┐
 690 | │ Cloud Deployment (Multi-tenant)                              │
 691 | ├─────────────────────────────────────────────────────────────┤
 692 | │ PostgreSQL/Neon (data) + ChromaDB server (search)           │
 693 | │ - Neon serverless Postgres for persistence                   │
 694 | │ - ChromaDB server in Docker container                        │
 695 | │ - Optional: OpenAI embeddings for better quality             │
 696 | └─────────────────────────────────────────────────────────────┘
 697 | ```
 698 | 
 699 | ### Phase 1: ChromaDB Integration (2-3 days)
 700 | 
 701 | #### 1. Add ChromaDB Dependency
 702 | ```toml
 703 | # pyproject.toml
 704 | dependencies = [
 705 |     "chromadb>=0.4.0",
 706 |     "sentence-transformers>=2.2.0",  # Local embeddings
 707 | ]
 708 | ```
 709 | 
 710 | #### 2. Create ChromaSearchBackend
 711 | ```python
 712 | # src/basic_memory/search/chroma_backend.py
 713 | from pathlib import Path
 714 | from typing import List, Optional
 715 | 
 716 | from chromadb import PersistentClient
 717 | from chromadb.utils import embedding_functions
 715 | 
 716 | class ChromaSearchBackend:
 717 |     def __init__(
 718 |         self,
 719 |         persist_directory: Path,
 720 |         collection_name: str = "knowledge_base",
 721 |         embedding_model: str = "all-MiniLM-L6-v2"
 722 |     ):
 723 |         """Initialize ChromaDB with local embeddings."""
 724 |         self.client = PersistentClient(path=str(persist_directory))
 725 | 
 726 |         # Use local sentence-transformers model (no API costs)
 727 |         self.embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
 728 |             model_name=embedding_model
 729 |         )
 730 | 
 731 |         self.collection = self.client.get_or_create_collection(
 732 |             name=collection_name,
 733 |             embedding_function=self.embed_fn,
 734 |             metadata={"hnsw:space": "cosine"}  # Similarity metric
 735 |         )
 736 | 
 737 |     async def index_entity(self, entity: Entity):
 738 |         """Index entity with automatic embeddings."""
 739 |         # Combine title and content for semantic search
 740 |         document = self._format_document(entity)
 741 | 
 742 |         self.collection.upsert(
 743 |             ids=[f"entity_{entity.id}_{entity.project_id}"],
 744 |             documents=[document],
 745 |             metadatas=[{
 746 |                 "entity_id": entity.id,
 747 |                 "project_id": entity.project_id,
 748 |                 "permalink": entity.permalink,
 749 |                 "file_path": entity.file_path,
 750 |                 "entity_type": entity.entity_type,
 751 |                 "type": "entity",
 752 |             }]
 753 |         )
 754 | 
 755 |     async def search(
 756 |         self,
 757 |         query_text: str,
 758 |         project_id: int,
 759 |         limit: int = 10,
 760 |         filters: Optional[dict] = None
 761 |     ) -> List[SearchResult]:
 762 |         """Semantic search with metadata filtering."""
 763 |         where = {"project_id": project_id}
 764 |         if filters:
 765 |             where.update(filters)
 766 | 
 767 |         results = self.collection.query(
 768 |             query_texts=[query_text],
 769 |             n_results=limit,
 770 |             where=where
 771 |         )
 772 | 
 773 |         return self._format_results(results)
 774 | ```
 775 | 
 776 | #### 3. Update SearchRepository
 777 | ```python
 778 | # src/basic_memory/repository/search_repository.py
 779 | class SearchRepository:
 780 |     def __init__(
 781 |         self,
 782 |         session_maker: async_sessionmaker[AsyncSession],
 783 |         project_id: int,
 784 |         chroma_backend: ChromaSearchBackend
 785 |     ):
 786 |         self.session_maker = session_maker
 787 |         self.project_id = project_id
 788 |         self.chroma = chroma_backend
 789 | 
 790 |     async def search(
 791 |         self,
 792 |         search_text: Optional[str] = None,
 793 |         permalink: Optional[str] = None,
 794 |         # ... other filters
 795 |     ) -> List[SearchIndexRow]:
 796 |         """Search using ChromaDB for text, SQL for exact lookups."""
 797 | 
 798 |         # For exact permalink/pattern matches, use SQL
 799 |         if permalink or permalink_match:
 800 |             return await self._sql_exact_search(...)
 801 | 
 802 |         # For text search, use ChromaDB semantic search
 803 |         if search_text:
 804 |             results = await self.chroma.search(
 805 |                 query_text=search_text,
 806 |                 project_id=self.project_id,
 807 |                 limit=limit,
 808 |                 filters=self._build_filters(types, after_date, ...)
 809 |             )
 810 |             return results
 811 | 
 812 |         # Fallback to listing all
 813 |         return await self._list_entities(...)
 814 | ```
 815 | 
 816 | #### 4. Update SearchService
 817 | ```python
 818 | # src/basic_memory/services/search_service.py
 819 | class SearchService:
 820 |     def __init__(
 821 |         self,
 822 |         search_repository: SearchRepository,
 823 |         entity_repository: EntityRepository,
 824 |         file_service: FileService,
 825 |         chroma_backend: ChromaSearchBackend,
 826 |     ):
 827 |         self.repository = search_repository
 828 |         self.entity_repository = entity_repository
 829 |         self.file_service = file_service
 830 |         self.chroma = chroma_backend
 831 | 
 832 |     async def index_entity(self, entity: Entity):
 833 |         """Index entity in ChromaDB."""
 834 |         if entity.is_markdown:
 835 |             await self._index_entity_markdown(entity)
 836 |         else:
 837 |             await self._index_entity_file(entity)
 838 | 
 839 |     async def _index_entity_markdown(self, entity: Entity):
 840 |         """Index markdown entity with full content."""
 841 |         # Index entity
 842 |         await self.chroma.index_entity(entity)
 843 | 
 844 |         # Index observations (as separate documents)
 845 |         for obs in entity.observations:
 846 |             await self.chroma.index_observation(obs, entity)
 847 | 
 848 |         # Index relations (metadata only)
 849 |         for rel in entity.outgoing_relations:
 850 |             await self.chroma.index_relation(rel, entity)
 851 | ```
 852 | 
 853 | ### Phase 2: PostgreSQL Support (1 day)
 854 | 
 855 | #### 1. Add PostgreSQL Database Type
 856 | ```python
 857 | # src/basic_memory/db.py
 858 | class DatabaseType(Enum):
 859 |     MEMORY = auto()
 860 |     FILESYSTEM = auto()
 861 |     POSTGRESQL = auto()  # NEW
 862 | 
 863 |     @classmethod
 864 |     def get_db_url(cls, db_path_or_url: str, db_type: "DatabaseType") -> str:
 865 |         if db_type == cls.POSTGRESQL:
 866 |             return db_path_or_url  # Neon connection string
 867 |         elif db_type == cls.MEMORY:
 868 |             return "sqlite+aiosqlite://"
 869 |         return f"sqlite+aiosqlite:///{db_path_or_url}"
 870 | ```
 871 | 
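URL selection is pure string logic, so it can be unit-tested without creating an engine. A standalone sketch of the method above:

```python
from enum import Enum, auto

class DatabaseType(Enum):
    MEMORY = auto()
    FILESYSTEM = auto()
    POSTGRESQL = auto()

def get_db_url(db_path_or_url: str, db_type: DatabaseType) -> str:
    """Pass Postgres connection strings through; build SQLite URLs otherwise."""
    if db_type is DatabaseType.POSTGRESQL:
        return db_path_or_url  # e.g. a Neon connection string
    if db_type is DatabaseType.MEMORY:
        return "sqlite+aiosqlite://"
    return f"sqlite+aiosqlite:///{db_path_or_url}"
```

Covering all three branches takes three one-line assertions, which guards against regressions when more database types are added later.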
 872 | #### 2. Update Connection Handling
 873 | ```python
 874 | def _create_engine_and_session(...):
 875 |     db_url = DatabaseType.get_db_url(db_path_or_url, db_type)
 876 | 
 877 |     if db_type == DatabaseType.POSTGRESQL:
 878 |         # Use asyncpg driver for Postgres
 879 |         engine = create_async_engine(
 880 |             db_url,
 881 |             pool_size=10,
 882 |             max_overflow=20,
 883 |             pool_pre_ping=True,  # Health checks
 884 |         )
 885 |     else:
 886 |         # SQLite configuration
 887 |         engine = create_async_engine(db_url, connect_args=connect_args)
 888 | 
 889 |         # Only configure SQLite-specific settings for SQLite
 890 |         if db_type != DatabaseType.MEMORY:
 891 |             @event.listens_for(engine.sync_engine, "connect")
 892 |             def enable_wal_mode(dbapi_conn, connection_record):
 893 |                 _configure_sqlite_connection(dbapi_conn, enable_wal=True)
 894 | 
 895 |     return engine, async_sessionmaker(engine, expire_on_commit=False)
 896 | ```
 897 | 
 898 | #### 3. Remove SQLite-Specific Code
 899 | ```python
 900 | # Remove from scoped_session context manager:
 901 | # await session.execute(text("PRAGMA foreign_keys=ON"))  # DELETE
 902 | 
 903 | # PostgreSQL handles foreign keys by default
 904 | ```
 905 | 
 906 | ### Phase 3: Migration & Testing (1-2 days)
 907 | 
 908 | #### 1. Create Migration Script
 909 | ```python
 910 | # scripts/migrate_to_chromadb.py
 911 | async def migrate_fts5_to_chromadb():
 912 |     """One-time migration from FTS5 to ChromaDB."""
 913 |     # 1. Read all entities from database
 914 |     entities = await entity_repository.find_all()
 915 | 
 916 |     # 2. Index in ChromaDB
 917 |     for entity in entities:
 918 |         await search_service.index_entity(entity)
 919 | 
 920 |     # 3. Drop FTS5 table (Alembic migration)
 921 |     await session.execute(text("DROP TABLE IF EXISTS search_index"))
 922 | ```
 923 | 
 924 | #### 2. Update Tests
 925 | - Replace FTS5 test fixtures with ChromaDB fixtures
 926 | - Test semantic search quality
 927 | - Test multi-project isolation in ChromaDB
 928 | - Benchmark performance vs FTS5
 929 | 
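Multi-project isolation can be exercised without a live ChromaDB by substituting a stand-in collection and asserting that results never cross the `project_id` boundary. All names below are hypothetical test scaffolding, not the real backend:

```python
class FakeCollection:
    """Stand-in for a ChromaDB collection that applies simple where-filters."""
    def __init__(self):
        self.docs = {}  # id -> (document, metadata)

    def upsert(self, ids, documents, metadatas):
        for i, d, m in zip(ids, documents, metadatas):
            self.docs[i] = (d, m)

    def query(self, query_texts, n_results, where):
        # Return only documents whose metadata matches every where-clause key
        hits = [i for i, (_, m) in self.docs.items()
                if all(m.get(k) == v for k, v in where.items())]
        return {"ids": [hits[:n_results]]}

col = FakeCollection()
col.upsert(ids=["e1"], documents=["alpha"], metadatas=[{"project_id": 1}])
col.upsert(ids=["e2"], documents=["beta"], metadatas=[{"project_id": 2}])
result = col.query(query_texts=["alpha"], n_results=10, where={"project_id": 1})
```

A test like this pins down the contract (every query must carry `project_id`) even before the real ChromaDB fixtures exist.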
 930 | #### 3. Documentation Updates
 931 | - Update search documentation
 932 | - Add ChromaDB configuration guide
 933 | - Document embedding model options
 934 | - PostgreSQL deployment guide
 935 | 
 936 | ### Configuration
 937 | 
 938 | ```python
 939 | # config.py
 940 | class BasicMemoryConfig:
 941 |     # Database
 942 |     database_type: DatabaseType = DatabaseType.FILESYSTEM
 943 |     database_path: Path = Path.home() / ".basic-memory" / "memory.db"
 944 |     database_url: Optional[str] = None  # For Postgres: postgresql://...
 945 | 
 946 |     # Search
 947 |     chroma_persist_directory: Path = Path.home() / ".basic-memory" / "chroma_data"
 948 |     embedding_model: str = "all-MiniLM-L6-v2"  # Local model
 949 |     embedding_provider: str = "local"  # or "openai"
 950 |     openai_api_key: Optional[str] = None  # For cloud deployments
 951 | ```
 952 | 
 953 | ### Deployment Configurations
 954 | 
 955 | #### Local (FOSS)
 956 | ```yaml
 957 | # Default configuration
 958 | database_type: FILESYSTEM
 959 | database_path: ~/.basic-memory/memory.db
 960 | chroma_persist_directory: ~/.basic-memory/chroma_data
 961 | embedding_model: all-MiniLM-L6-v2
 962 | embedding_provider: local
 963 | ```
 964 | 
 965 | #### Cloud (Docker Compose)
 966 | ```yaml
 967 | services:
 968 |   postgres:
 969 |     image: postgres:15
 970 |     environment:
 971 |       POSTGRES_DB: basic_memory
 972 |       POSTGRES_PASSWORD: ${DB_PASSWORD}
 973 | 
 974 |   chromadb:
 975 |     image: chromadb/chroma:latest
 976 |     volumes:
 977 |       - chroma_data:/chroma/chroma
 978 |     environment:
 979 |       ALLOW_RESET: "true"
 980 | 
 981 |   app:
 982 |     environment:
 983 |       DATABASE_TYPE: POSTGRESQL
 984 |       DATABASE_URL: postgresql://postgres:${DB_PASSWORD}@postgres/basic_memory
 985 |       CHROMA_HOST: chromadb
 986 |       CHROMA_PORT: 8000
 987 |       EMBEDDING_PROVIDER: local  # or openai
 988 | ```
 989 | 
 990 | ## How to Evaluate
 991 | 
 992 | ### Success Criteria
 993 | 
 994 | #### Functional Requirements
 995 | - ✅ Semantic search finds related concepts (e.g., "AI" finds "machine learning")
 996 | - ✅ Exact permalink/pattern matches work (e.g., `specs/*`)
 997 | - ✅ Multi-project isolation maintained
 998 | - ✅ All existing search filters work (type, date, metadata)
 999 | - ✅ MCP tools continue to work without changes
1000 | - ✅ Works with both SQLite and PostgreSQL
1001 | 
1002 | #### Performance Requirements
1003 | - ✅ Search latency < 200ms for 1000 documents (local embedding)
1004 | - ✅ Indexing time comparable to FTS5 (~10 files/sec)
1005 | - ✅ Initial sync time not significantly worse than current
1006 | - ✅ Memory footprint < 1GB for local deployments
1007 | 
1008 | #### Quality Requirements
1009 | - ✅ Better search relevance than FTS5 keyword matching
1010 | - ✅ Handles typos and word variations
1011 | - ✅ Finds semantically similar content
1012 | 
1013 | #### Deployment Requirements
1014 | - ✅ FOSS: Works out-of-box with no external services
1015 | - ✅ Cloud: Integrates with PostgreSQL (Neon)
1016 | - ✅ No breaking changes to MCP API
1017 | - ✅ Migration script for existing users
1018 | 
1019 | ### Testing Procedure
1020 | 
1021 | #### 1. Unit Tests
1022 | ```bash
1023 | # Test ChromaDB backend
1024 | pytest tests/test_chroma_backend.py
1025 | 
1026 | # Test search repository with ChromaDB
1027 | pytest tests/test_search_repository.py
1028 | 
1029 | # Test search service
1030 | pytest tests/test_search_service.py
1031 | ```
1032 | 
1033 | #### 2. Integration Tests
1034 | ```bash
1035 | # Test full search workflow
1036 | pytest test-int/test_search_integration.py
1037 | 
1038 | # Test with PostgreSQL
1039 | DATABASE_TYPE=POSTGRESQL pytest test-int/
1040 | ```
1041 | 
1042 | #### 3. Semantic Search Quality Tests
1043 | ```python
1044 | # Semantic similarity expectations (evaluated manually):
1045 | # search("machine learning") should find:
1046 | #     "neural networks", "deep learning", "AI algorithms"
1047 | # search("software architecture") should find:
1048 | #     "system design", "design patterns", "microservices"
1054 | ```
1055 | 
1056 | #### 4. Performance Benchmarks
1057 | ```bash
1058 | # Run search benchmarks
1059 | pytest test-int/test_search_performance.py -v
1060 | 
1061 | # Measure:
1062 | - Search latency (should be < 200ms)
1063 | - Indexing throughput (should be ~10 files/sec)
1064 | - Memory usage (should be < 1GB)
1065 | ```
1066 | 
1067 | #### 5. Migration Testing
1068 | ```bash
1069 | # Test migration from FTS5 to ChromaDB
1070 | python scripts/migrate_to_chromadb.py
1071 | 
1072 | # Verify all entities indexed
1073 | # Verify search results quality
1074 | # Verify no data loss
1075 | ```
1076 | 
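The "no data loss" check reduces to a set difference between entity ids in the database and ids present in the new index. A minimal sketch (function name hypothetical):

```python
def verify_migration(entity_ids, indexed_ids) -> list:
    """Return entity ids missing from the search index (empty list == success)."""
    return sorted(set(entity_ids) - set(indexed_ids))

missing = verify_migration([1, 2, 3], [1, 3])  # entity 2 was not indexed
```

Running this after the migration script gives a concrete pass/fail signal instead of an eyeball check.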
1077 | ### Metrics
1078 | 
1079 | **Search Quality:**
1080 | - Semantic relevance score (manual evaluation)
1081 | - Precision/recall for common queries
1082 | - User satisfaction (qualitative)
1083 | 
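Precision/recall over a hand-labeled query set needs only a few lines; an illustrative helper:

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Standard definitions: precision = TP/|retrieved|, recall = TP/|relevant|."""
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of 3 results are relevant; 2 of 4 relevant docs were found
p, r = precision_recall({"a", "b", "x"}, {"a", "b", "c", "d"})
```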
1084 | **Performance:**
1085 | - Average search latency (ms)
1086 | - P95/P99 search latency
1087 | - Indexing throughput (files/sec)
1088 | - Memory usage (MB)
1089 | 
1090 | **Deployment:**
1091 | - Local deployment success rate
1092 | - Cloud deployment success rate
1093 | - Migration success rate
1094 | 
1095 | ## Implementation Checklist
1096 | 
1097 | ### Phase 1: ChromaDB Integration
1098 | - [ ] Add ChromaDB and sentence-transformers dependencies
1099 | - [ ] Create ChromaSearchBackend class
1100 | - [ ] Update SearchRepository to use ChromaDB
1101 | - [ ] Update SearchService indexing methods
1102 | - [ ] Remove FTS5 table creation code
1103 | - [ ] Update search query logic
1104 | - [ ] Add ChromaDB configuration to BasicMemoryConfig
1105 | 
1106 | ### Phase 2: PostgreSQL Support
1107 | - [ ] Add DatabaseType.POSTGRESQL enum
1108 | - [ ] Update get_db_url() for Postgres connection strings
1109 | - [ ] Add asyncpg dependency
1110 | - [ ] Update engine creation for Postgres
1111 | - [ ] Remove SQLite-specific PRAGMA statements
1112 | - [ ] Test with Neon database
1113 | 
1114 | ### Phase 3: Testing & Migration
1115 | - [ ] Write unit tests for ChromaSearchBackend
1116 | - [ ] Update search integration tests
1117 | - [ ] Add semantic search quality tests
1118 | - [ ] Create performance benchmarks
1119 | - [ ] Write migration script from FTS5
1120 | - [ ] Test migration with existing data
1121 | - [ ] Update documentation
1122 | 
1123 | ### Phase 4: Deployment
1124 | - [ ] Update docker-compose.yml for cloud
1125 | - [ ] Document local FOSS deployment
1126 | - [ ] Document cloud PostgreSQL deployment
1127 | - [ ] Create migration guide for users
1128 | - [ ] Update MCP tool documentation
1129 | 
1130 | ## Notes
1131 | 
1132 | ### Embedding Model Trade-offs
1133 | 
1134 | **Local Model: `all-MiniLM-L6-v2`**
1135 | - Size: 80MB download
1136 | - Speed: ~50ms embedding time
1137 | - Dimensions: 384
1138 | - Cost: $0
1139 | - Quality: Good for general knowledge
1140 | - Best for: FOSS deployments
1141 | 
1142 | **OpenAI: `text-embedding-3-small`**
1143 | - Speed: ~100-200ms (API call)
1144 | - Dimensions: 1536
1145 | - Cost: ~$0.02 per 1M tokens (~$0.01 per 1000 notes)
1146 | - Quality: Excellent
1147 | - Best for: Cloud deployments with budget
1148 | 
1149 | ### ChromaDB Storage
1150 | 
1151 | ChromaDB stores data in:
1152 | ```
1153 | ~/.basic-memory/chroma_data/
1154 |   ├── chroma.sqlite3        # Metadata
1155 |   ├── index/                # HNSW indexes
1156 |   └── collections/          # Vector data
1157 | ```
1158 | 
1159 | Typical sizes:
1160 | - 100 notes: ~5MB
1161 | - 1000 notes: ~50MB
1162 | - 10000 notes: ~500MB
1163 | 
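The raw vector payload behind those numbers is easy to estimate (384 float32 dimensions for `all-MiniLM-L6-v2`); the HNSW graph and stored document text account for the rest of the footprint:

```python
def vector_bytes(n_notes: int, dims: int = 384, bytes_per_float: int = 4) -> int:
    """Raw embedding storage only, excluding HNSW index and document text."""
    return n_notes * dims * bytes_per_float

ten_k_mb = vector_bytes(10_000) / 1_000_000  # ~15 MB of raw vectors for 10k notes
```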
1164 | ### Why Not Keep FTS5?
1165 | 
1166 | **Considered:** Hybrid approach (FTS5 for SQLite + tsvector for Postgres)
1167 | **Rejected because:**
1168 | - 2x the code to maintain
1169 | - 2x the tests to write
1170 | - 2x the bugs to fix
1171 | - Inconsistent search behavior between deployments
1172 | - ChromaDB provides better search quality anyway
1173 | 
1174 | **ChromaDB wins:**
1175 | - One implementation for both databases
1176 | - Better search quality (semantic!)
1177 | - Database-agnostic architecture
1178 | - Embedded mode for FOSS (no servers needed)
1179 | 
1180 | ## Implementation Notes
1181 | 
1182 |   Proposed Architecture
1183 | 
1184 |   Option 1: ChromaDB Only (Simplest)
1185 | 
1186 |   class ChromaSearchBackend:
1187 |       def __init__(self, path: str, embedding_model: str = "all-MiniLM-L6-v2"):
1188 |           # For local: embedded client (no server!)
1189 |           self.client = chromadb.PersistentClient(path=path)
1190 | 
1191 |           # Use local embedding model (no API costs!)
1192 |           from chromadb.utils import embedding_functions
1193 |           self.embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
1194 |               model_name=embedding_model
1195 |           )
1196 | 
1197 |           self.collection = self.client.get_or_create_collection(
1198 |               name="knowledge_base",
1199 |               embedding_function=self.embed_fn
1200 |           )
1201 | 
1202 |       async def index_entity(self, entity: Entity):
1203 |           # ChromaDB handles embeddings automatically!
1204 |           self.collection.upsert(
1205 |               ids=[str(entity.id)],
1206 |               documents=[f"{entity.title}\n{entity.content}"],
1207 |               metadatas=[{
1208 |                   "permalink": entity.permalink,
1209 |                   "type": entity.entity_type,
1210 |                   "file_path": entity.file_path
1211 |               }]
1212 |           )
1213 | 
1214 |       async def search(self, query: str, filters: dict = None):
1215 |           # Semantic search with optional metadata filters
1216 |           results = self.collection.query(
1217 |               query_texts=[query],
1218 |               n_results=10,
1219 |               where=filters  # e.g., {"type": "note"}
1220 |           )
1221 |           return results
1222 | 
1223 |   Deployment:
1224 |   - Local (FOSS): ChromaDB embedded, local embedding model, NO servers
1225 |   - Cloud: ChromaDB server OR still embedded (it's just a Python lib!)
1226 | 
1227 |   Option 2: Hybrid FTS + ChromaDB (Best UX)
1228 | 
1229 |   class HybridSearchBackend:
1230 |       def __init__(self):
1231 |           self.fts = SQLiteFTS5Backend()    # Fast keyword search
1232 |           self.chroma = ChromaSearchBackend()  # Semantic search
1233 | 
1234 |       async def search(self, query: str, search_type: str = "auto"):
1235 |           if search_type == "exact":
1236 |               # User wants exact match: "specs/search-feature"
1237 |               return await self.fts.search(query)
1238 | 
1239 |           elif search_type == "semantic":
1240 |               # User wants related concepts
1241 |               return await self.chroma.search(query)
1242 | 
1243 |           else:  # "auto"
1244 |               # Check if query looks like exact match
1245 |               if "/" in query or query.startswith('"'):
1246 |                   return await self.fts.search(query)
1247 | 
1248 |               # Otherwise use semantic search
1249 |               return await self.chroma.search(query)
1250 | 
1251 |   Embedding Options
1252 | 
1253 |   Option A: Local Model (FREE, FOSS-friendly)
1254 | 
1255 |   # Uses sentence-transformers (runs locally)
1256 |   # Model: ~100MB download
1257 |   # Speed: ~50-100ms for embedding
1258 |   # Cost: $0
1259 | 
1260 |   from chromadb.utils import embedding_functions
1261 |   embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
1262 |       model_name="all-MiniLM-L6-v2"  # Fast, accurate, free
1263 |   )
1264 | 
1265 |   Option B: OpenAI Embeddings (Cloud only)
1266 | 
1267 |   # For cloud users who want best quality
1268 |   # Model: text-embedding-3-small
1269 |   # Speed: ~100-200ms via API
1270 |   # Cost: ~$0.02 per 1M tokens (~$0.01 per 1000 notes)
1271 | 
1272 |   embed_fn = embedding_functions.OpenAIEmbeddingFunction(
1273 |       api_key="...",
1274 |       model_name="text-embedding-3-small"
1275 |   )
1276 | 
1277 |   Performance Comparison
1278 | 
1279 |   Local embedding model: all-MiniLM-L6-v2
1280 |   Embedding time: ~50ms per note
1281 |   Search time: ~100ms for 1000 notes
1282 |   Memory: ~500MB (model + ChromaDB)
1283 |   Cost: $0
1284 |   Quality: Good (384 dimensions)
1285 | 
1286 |   OpenAI embeddings: text-embedding-3-small
1287 |   Embedding time: ~100-200ms per note (API call)
1288 |   Search time: ~50ms for 1000 notes
1289 |   Cost: ~$0.01 per 1000 notes
1290 |   Quality: Excellent (1536 dimensions)
1291 | 
1292 |   Recommendation: ChromaDB with Local Embeddings
1293 | 
1294 |   Here's the plan:
1295 | 
1296 |   Phase 1: Local ChromaDB (1-2 days)
1297 | 
1298 |   # FOSS version
1299 |   - SQLite for data persistence
1300 |   - ChromaDB embedded for semantic search
1301 |   - Local embedding model (no API costs)
1302 |   - NO external services required
1303 | 
1304 |   Benefits:
1305 |   - ✅ Same deployment as current (just Python package)
1306 |   - ✅ Semantic search for better UX
1307 |   - ✅ Free embeddings with local model
1308 |   - ✅ No servers needed
1309 | 
1310 |   Phase 2: Postgres + ChromaDB Cloud (1-2 days)
1311 | 
1312 |   # Cloud version
1313 |   - Postgres for data persistence
1314 |   - ChromaDB server for semantic search
1315 |   - OpenAI embeddings (higher quality)
1316 |   - OR keep local embeddings (cheaper)
1317 | 
1318 |   Phase 3: Hybrid Search (optional, 1 day)
1319 | 
1320 |   # Add FTS for exact matches alongside ChromaDB
1321 |   - Quick keyword search when needed
1322 |   - Semantic search for exploration
1323 |   - Best of both worlds
1324 | 
1325 |   Code Estimate
1326 | 
1327 |   Just ChromaDB (replacing FTS5):
1328 |   - Remove FTS5 code: 2 hours
1329 |   - Add ChromaDB backend: 4 hours
1330 |   - Update search service: 2 hours
1331 |   - Testing: 4 hours
1332 |   - Total: 1.5 days
1333 | 
1334 |   ChromaDB + Postgres migration:
1335 |   - Add Postgres support: 4 hours
1336 |   - Test with Neon: 2 hours
1337 |   - Total: +0.75 days
1338 | 
1339 |   Grand total: 2-3 days for complete migration
1340 | 
1341 |   The Kicker
1342 | 
1343 |   ChromaDB solves BOTH problems:
1344 |   1. ✅ Works with SQLite AND Postgres (it's separate!)
1345 |   2. ✅ No server needed for local (embedded mode)
1346 |   3. ✅ Better search than FTS5 (semantic!)
1347 |   4. ✅ One implementation for both deployments
1348 | 
1349 |   A prototype should demonstrate:
1350 |   1. ChromaDB embedded with local embeddings
1351 |   2. Example searches showing semantic matching
1352 |   3. Performance benchmarks
1353 |   4. Migration from FTS5
1354 | 
1355 | 
1356 | ## Observations
1357 | 
1358 | - [problem] SQLite FTS5 and PostgreSQL tsvector are incompatible architectures requiring dual implementation #database-compatibility
1359 | - [problem] Cloud deployments lose database on container restart requiring full re-sync #persistence
1360 | - [solution] ChromaDB provides database-agnostic semantic search eliminating dual implementation #architecture
1361 | - [advantage] Semantic search finds related concepts beyond keyword matching improving UX #search-quality
1362 | - [deployment] Embedded ChromaDB requires no external services for FOSS #simplicity
1363 | - [migration] Moving to PostgreSQL solves cloud persistence issues #cloud-architecture
1364 | - [performance] Local embedding models provide good quality at zero cost #cost-optimization
1365 | - [trade-off] Embedding generation adds ~50ms latency vs instant FTS5 indexing #performance
1366 | - [benefit] Single search codebase reduces maintenance burden and test coverage needs #maintainability
1367 | 
1368 | ## Prior Art / References
1369 | 
1370 | ### Community Fork: manuelbliemel/basic-memory (feature/vector-search)
1371 | 
1372 | **Repository**: https://github.com/manuelbliemel/basic-memory/tree/feature/vector-search
1373 | 
1374 | **Key Implementation Details**:
1375 | 
1376 | **Vector Database**: ChromaDB (same as our approach!)
1377 | 
1378 | **Embedding Models**:
1379 | - Local: `all-MiniLM-L6-v2` (default, 384 dims) - same model we planned
1380 | - Also supports: `all-mpnet-base-v2`, `paraphrase-MiniLM-L6-v2`, `multi-qa-MiniLM-L6-cos-v1`
1381 | - OpenAI: `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large`
1382 | 
1383 | **Chunking Strategy** (interesting - we didn't consider this):
1384 | - Chunk Size: 500 characters
1385 | - Chunk Overlap: 50 characters
1386 | - Breaks documents into smaller pieces for better semantic search
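The 500/50 scheme above amounts to a sliding window that advances 450 characters at a time, so adjacent chunks share context. A minimal sketch (not the fork's actual code):

```python
# Sliding-window chunker: `size`-char windows, advancing by size - overlap,
# so consecutive chunks share `overlap` characters of context.
def chunk_document(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "".join(str(i % 10) for i in range(1000))
chunks = chunk_document(doc)
print([len(c) for c in chunks])  # → [500, 500, 100]
print(chunks[0][-50:] == chunks[1][:50])  # → True (50-char overlap)
```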

**Search Strategies**:
1. `fuzzy_only` (default) - FTS5 only
2. `vector_only` - ChromaDB only
3. `hybrid` (recommended) - Both FTS5 + ChromaDB
4. `fuzzy_primary` - FTS5 first, ChromaDB fallback
5. `vector_primary` - ChromaDB first, FTS5 fallback
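The five strategies reduce to a small dispatcher. A hypothetical sketch (the fork's real implementation will differ), where `fts` and `vector` are backend callables returning result lists:

```python
# Sketch of the fork's five named strategies as a dispatcher.
def search(query, strategy, fts, vector):
    if strategy == "fuzzy_only":
        return fts(query)
    if strategy == "vector_only":
        return vector(query)
    if strategy == "hybrid":
        fts_hits = fts(query)  # run both, merge, dedupe (FTS first)
        return fts_hits + [r for r in vector(query) if r not in fts_hits]
    if strategy == "fuzzy_primary":
        return fts(query) or vector(query)  # fall back when FTS is empty
    if strategy == "vector_primary":
        return vector(query) or fts(query)  # fall back when vector is empty
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `search(q, "fuzzy_primary", fts, vector)` only touches ChromaDB when FTS5 comes up empty.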

**Configuration**:
- Similarity Threshold: 0.1
- Max Results: 5
- Storage: `~/.basic-memory/chroma/`
- Config: `~/.basic-memory/config.json`
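Pieced together from the settings above, their config file might look something like the following — the exact key names are assumptions for illustration, not confirmed from their code:

```json
{
  "search_strategy": "fuzzy_only",
  "similarity_threshold": 0.1,
  "max_results": 5,
  "vector_store_path": "~/.basic-memory/chroma/"
}
```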

**Key Differences from Our Approach**:

| Aspect | Their Approach | Our Approach |
|--------|---------------|--------------|
| FTS5 | Keep FTS5 + add ChromaDB | Remove FTS5, use SQL for exact lookups |
| Search Strategy | 5 configurable strategies | Smart routing (automatic) |
| Document Processing | Chunk into 500-char pieces | Index full documents |
| Hybrid Mode | Run both, merge, dedupe | Route to best backend |
| Configuration | User-configurable strategy | Automatic based on query type |
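
"Smart routing" in the table is a heuristic rather than a published algorithm; one hypothetical version routes on query shape — structured lookups go to SQL, prose goes semantic:

```python
# Hypothetical router: decide the backend from the query's shape.
def route_query(query: str) -> str:
    if query.startswith('"') and query.endswith('"'):
        return "exact"      # quoted phrase → SQL/keyword lookup
    if "/" in query and " " not in query:
        return "exact"      # looks like a permalink
    return "semantic"       # natural language → ChromaDB

print(route_query('"exact phrase"'))          # → exact
print(route_query("specs/spec-17"))           # → exact
print(route_query("how do embeddings work"))  # → semantic
```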

**What We Can Learn**:

1. **Chunking**: Breaking documents into 500-character chunks with 50-char overlap may improve semantic search quality for long documents
   - Pro: Better granularity for semantic matching
   - Con: More vectors to store and search
   - Consider: Optional chunking for large documents (>2000 chars)

2. **Configurable Strategies**: Allowing users to choose a search strategy provides flexibility
   - Pro: Power users can tune behavior
   - Con: More complexity; most users won't configure it
   - Consider: Default to smart routing, allow override via config

3. **Similarity Threshold**: They use 0.1 as the default
   - Consider: Benchmark different thresholds for quality

4. **Storage Location**: `~/.basic-memory/chroma/` matches our planned `chroma_data/` approach

**Potential Collaboration**:
- Their implementation is nearly complete as a fork
- Could potentially merge their work or use it as a reference implementation
- Their chunking strategy could be a valuable addition to our approach

## Relations

- implements [[SPEC-11 Basic Memory API Performance Optimization]]
- relates_to [[Performance Optimizations Documentation]]
- enables [[PostgreSQL Migration]]
- improves_on [[SQLite FTS5 Search]]
- references [[manuelbliemel/basic-memory feature/vector-search fork]]
```