This is page 19 of 23. Use http://codebase.md/basicmachines-co/basic-memory?lines=true&page={x} to view the full context.
# Directory Structure
```
├── .claude
│ ├── agents
│ │ ├── python-developer.md
│ │ └── system-architect.md
│ └── commands
│ ├── release
│ │ ├── beta.md
│ │ ├── changelog.md
│ │ ├── release-check.md
│ │ └── release.md
│ ├── spec.md
│ └── test-live.md
├── .dockerignore
├── .github
│ ├── dependabot.yml
│ ├── ISSUE_TEMPLATE
│ │ ├── bug_report.md
│ │ ├── config.yml
│ │ ├── documentation.md
│ │ └── feature_request.md
│ └── workflows
│ ├── claude-code-review.yml
│ ├── claude-issue-triage.yml
│ ├── claude.yml
│ ├── dev-release.yml
│ ├── docker.yml
│ ├── pr-title.yml
│ ├── release.yml
│ └── test.yml
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CITATION.cff
├── CLA.md
├── CLAUDE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── docker-compose.yml
├── Dockerfile
├── docs
│ ├── ai-assistant-guide-extended.md
│ ├── character-handling.md
│ ├── cloud-cli.md
│ └── Docker.md
├── justfile
├── LICENSE
├── llms-install.md
├── pyproject.toml
├── README.md
├── SECURITY.md
├── smithery.yaml
├── specs
│ ├── SPEC-1 Specification-Driven Development Process.md
│ ├── SPEC-10 Unified Deployment Workflow and Event Tracking.md
│ ├── SPEC-11 Basic Memory API Performance Optimization.md
│ ├── SPEC-12 OpenTelemetry Observability.md
│ ├── SPEC-13 CLI Authentication with Subscription Validation.md
│ ├── SPEC-14 Cloud Git Versioning & GitHub Backup.md
│ ├── SPEC-14- Cloud Git Versioning & GitHub Backup.md
│ ├── SPEC-15 Configuration Persistence via Tigris for Cloud Tenants.md
│ ├── SPEC-16 MCP Cloud Service Consolidation.md
│ ├── SPEC-17 Semantic Search with ChromaDB.md
│ ├── SPEC-18 AI Memory Management Tool.md
│ ├── SPEC-19 Sync Performance and Memory Optimization.md
│ ├── SPEC-2 Slash Commands Reference.md
│ ├── SPEC-20 Simplified Project-Scoped Rclone Sync.md
│ ├── SPEC-3 Agent Definitions.md
│ ├── SPEC-4 Notes Web UI Component Architecture.md
│ ├── SPEC-5 CLI Cloud Upload via WebDAV.md
│ ├── SPEC-6 Explicit Project Parameter Architecture.md
│ ├── SPEC-7 POC to spike Tigris Turso for local access to cloud data.md
│ ├── SPEC-8 TigrisFS Integration.md
│ ├── SPEC-9 Multi-Project Bidirectional Sync Architecture.md
│ ├── SPEC-9 Signed Header Tenant Information.md
│ └── SPEC-9-1 Follow-Ups- Conflict, Sync, and Observability.md
├── src
│ └── basic_memory
│ ├── __init__.py
│ ├── alembic
│ │ ├── alembic.ini
│ │ ├── env.py
│ │ ├── migrations.py
│ │ ├── script.py.mako
│ │ └── versions
│ │ ├── 3dae7c7b1564_initial_schema.py
│ │ ├── 502b60eaa905_remove_required_from_entity_permalink.py
│ │ ├── 5fe1ab1ccebe_add_projects_table.py
│ │ ├── 647e7a75e2cd_project_constraint_fix.py
│ │ ├── 9d9c1cb7d8f5_add_mtime_and_size_columns_to_entity_.py
│ │ ├── a1b2c3d4e5f6_fix_project_foreign_keys.py
│ │ ├── b3c3938bacdb_relation_to_name_unique_index.py
│ │ ├── cc7172b46608_update_search_index_schema.py
│ │ └── e7e1f4367280_add_scan_watermark_tracking_to_project.py
│ ├── api
│ │ ├── __init__.py
│ │ ├── app.py
│ │ ├── routers
│ │ │ ├── __init__.py
│ │ │ ├── directory_router.py
│ │ │ ├── importer_router.py
│ │ │ ├── knowledge_router.py
│ │ │ ├── management_router.py
│ │ │ ├── memory_router.py
│ │ │ ├── project_router.py
│ │ │ ├── prompt_router.py
│ │ │ ├── resource_router.py
│ │ │ ├── search_router.py
│ │ │ └── utils.py
│ │ └── template_loader.py
│ ├── cli
│ │ ├── __init__.py
│ │ ├── app.py
│ │ ├── auth.py
│ │ ├── commands
│ │ │ ├── __init__.py
│ │ │ ├── cloud
│ │ │ │ ├── __init__.py
│ │ │ │ ├── api_client.py
│ │ │ │ ├── bisync_commands.py
│ │ │ │ ├── cloud_utils.py
│ │ │ │ ├── core_commands.py
│ │ │ │ ├── rclone_commands.py
│ │ │ │ ├── rclone_config.py
│ │ │ │ ├── rclone_installer.py
│ │ │ │ ├── upload_command.py
│ │ │ │ └── upload.py
│ │ │ ├── command_utils.py
│ │ │ ├── db.py
│ │ │ ├── import_chatgpt.py
│ │ │ ├── import_claude_conversations.py
│ │ │ ├── import_claude_projects.py
│ │ │ ├── import_memory_json.py
│ │ │ ├── mcp.py
│ │ │ ├── project.py
│ │ │ ├── status.py
│ │ │ └── tool.py
│ │ └── main.py
│ ├── config.py
│ ├── db.py
│ ├── deps.py
│ ├── file_utils.py
│ ├── ignore_utils.py
│ ├── importers
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── chatgpt_importer.py
│ │ ├── claude_conversations_importer.py
│ │ ├── claude_projects_importer.py
│ │ ├── memory_json_importer.py
│ │ └── utils.py
│ ├── markdown
│ │ ├── __init__.py
│ │ ├── entity_parser.py
│ │ ├── markdown_processor.py
│ │ ├── plugins.py
│ │ ├── schemas.py
│ │ └── utils.py
│ ├── mcp
│ │ ├── __init__.py
│ │ ├── async_client.py
│ │ ├── project_context.py
│ │ ├── prompts
│ │ │ ├── __init__.py
│ │ │ ├── ai_assistant_guide.py
│ │ │ ├── continue_conversation.py
│ │ │ ├── recent_activity.py
│ │ │ ├── search.py
│ │ │ └── utils.py
│ │ ├── resources
│ │ │ ├── ai_assistant_guide.md
│ │ │ └── project_info.py
│ │ ├── server.py
│ │ └── tools
│ │ ├── __init__.py
│ │ ├── build_context.py
│ │ ├── canvas.py
│ │ ├── chatgpt_tools.py
│ │ ├── delete_note.py
│ │ ├── edit_note.py
│ │ ├── list_directory.py
│ │ ├── move_note.py
│ │ ├── project_management.py
│ │ ├── read_content.py
│ │ ├── read_note.py
│ │ ├── recent_activity.py
│ │ ├── search.py
│ │ ├── utils.py
│ │ ├── view_note.py
│ │ └── write_note.py
│ ├── models
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── knowledge.py
│ │ ├── project.py
│ │ └── search.py
│ ├── repository
│ │ ├── __init__.py
│ │ ├── entity_repository.py
│ │ ├── observation_repository.py
│ │ ├── project_info_repository.py
│ │ ├── project_repository.py
│ │ ├── relation_repository.py
│ │ ├── repository.py
│ │ └── search_repository.py
│ ├── schemas
│ │ ├── __init__.py
│ │ ├── base.py
│ │ ├── cloud.py
│ │ ├── delete.py
│ │ ├── directory.py
│ │ ├── importer.py
│ │ ├── memory.py
│ │ ├── project_info.py
│ │ ├── prompt.py
│ │ ├── request.py
│ │ ├── response.py
│ │ ├── search.py
│ │ └── sync_report.py
│ ├── services
│ │ ├── __init__.py
│ │ ├── context_service.py
│ │ ├── directory_service.py
│ │ ├── entity_service.py
│ │ ├── exceptions.py
│ │ ├── file_service.py
│ │ ├── initialization.py
│ │ ├── link_resolver.py
│ │ ├── project_service.py
│ │ ├── search_service.py
│ │ └── service.py
│ ├── sync
│ │ ├── __init__.py
│ │ ├── background_sync.py
│ │ ├── sync_service.py
│ │ └── watch_service.py
│ ├── templates
│ │ └── prompts
│ │ ├── continue_conversation.hbs
│ │ └── search.hbs
│ └── utils.py
├── test-int
│ ├── BENCHMARKS.md
│ ├── cli
│ │ ├── test_project_commands_integration.py
│ │ └── test_version_integration.py
│ ├── conftest.py
│ ├── mcp
│ │ ├── test_build_context_underscore.py
│ │ ├── test_build_context_validation.py
│ │ ├── test_chatgpt_tools_integration.py
│ │ ├── test_default_project_mode_integration.py
│ │ ├── test_delete_note_integration.py
│ │ ├── test_edit_note_integration.py
│ │ ├── test_list_directory_integration.py
│ │ ├── test_move_note_integration.py
│ │ ├── test_project_management_integration.py
│ │ ├── test_project_state_sync_integration.py
│ │ ├── test_read_content_integration.py
│ │ ├── test_read_note_integration.py
│ │ ├── test_search_integration.py
│ │ ├── test_single_project_mcp_integration.py
│ │ └── test_write_note_integration.py
│ ├── test_db_wal_mode.py
│ ├── test_disable_permalinks_integration.py
│ └── test_sync_performance_benchmark.py
├── tests
│ ├── __init__.py
│ ├── api
│ │ ├── conftest.py
│ │ ├── test_async_client.py
│ │ ├── test_continue_conversation_template.py
│ │ ├── test_directory_router.py
│ │ ├── test_importer_router.py
│ │ ├── test_knowledge_router.py
│ │ ├── test_management_router.py
│ │ ├── test_memory_router.py
│ │ ├── test_project_router_operations.py
│ │ ├── test_project_router.py
│ │ ├── test_prompt_router.py
│ │ ├── test_relation_background_resolution.py
│ │ ├── test_resource_router.py
│ │ ├── test_search_router.py
│ │ ├── test_search_template.py
│ │ ├── test_template_loader_helpers.py
│ │ └── test_template_loader.py
│ ├── cli
│ │ ├── conftest.py
│ │ ├── test_cli_tools.py
│ │ ├── test_cloud_authentication.py
│ │ ├── test_ignore_utils.py
│ │ ├── test_import_chatgpt.py
│ │ ├── test_import_claude_conversations.py
│ │ ├── test_import_claude_projects.py
│ │ ├── test_import_memory_json.py
│ │ ├── test_project_add_with_local_path.py
│ │ └── test_upload.py
│ ├── conftest.py
│ ├── db
│ │ └── test_issue_254_foreign_key_constraints.py
│ ├── importers
│ │ ├── test_importer_base.py
│ │ └── test_importer_utils.py
│ ├── markdown
│ │ ├── __init__.py
│ │ ├── test_date_frontmatter_parsing.py
│ │ ├── test_entity_parser_error_handling.py
│ │ ├── test_entity_parser.py
│ │ ├── test_markdown_plugins.py
│ │ ├── test_markdown_processor.py
│ │ ├── test_observation_edge_cases.py
│ │ ├── test_parser_edge_cases.py
│ │ ├── test_relation_edge_cases.py
│ │ └── test_task_detection.py
│ ├── mcp
│ │ ├── conftest.py
│ │ ├── test_obsidian_yaml_formatting.py
│ │ ├── test_permalink_collision_file_overwrite.py
│ │ ├── test_prompts.py
│ │ ├── test_resources.py
│ │ ├── test_tool_build_context.py
│ │ ├── test_tool_canvas.py
│ │ ├── test_tool_delete_note.py
│ │ ├── test_tool_edit_note.py
│ │ ├── test_tool_list_directory.py
│ │ ├── test_tool_move_note.py
│ │ ├── test_tool_read_content.py
│ │ ├── test_tool_read_note.py
│ │ ├── test_tool_recent_activity.py
│ │ ├── test_tool_resource.py
│ │ ├── test_tool_search.py
│ │ ├── test_tool_utils.py
│ │ ├── test_tool_view_note.py
│ │ ├── test_tool_write_note.py
│ │ └── tools
│ │ └── test_chatgpt_tools.py
│ ├── Non-MarkdownFileSupport.pdf
│ ├── repository
│ │ ├── test_entity_repository_upsert.py
│ │ ├── test_entity_repository.py
│ │ ├── test_entity_upsert_issue_187.py
│ │ ├── test_observation_repository.py
│ │ ├── test_project_info_repository.py
│ │ ├── test_project_repository.py
│ │ ├── test_relation_repository.py
│ │ ├── test_repository.py
│ │ ├── test_search_repository_edit_bug_fix.py
│ │ └── test_search_repository.py
│ ├── schemas
│ │ ├── test_base_timeframe_minimum.py
│ │ ├── test_memory_serialization.py
│ │ ├── test_memory_url_validation.py
│ │ ├── test_memory_url.py
│ │ ├── test_schemas.py
│ │ └── test_search.py
│ ├── Screenshot.png
│ ├── services
│ │ ├── test_context_service.py
│ │ ├── test_directory_service.py
│ │ ├── test_entity_service_disable_permalinks.py
│ │ ├── test_entity_service.py
│ │ ├── test_file_service.py
│ │ ├── test_initialization.py
│ │ ├── test_link_resolver.py
│ │ ├── test_project_removal_bug.py
│ │ ├── test_project_service_operations.py
│ │ ├── test_project_service.py
│ │ └── test_search_service.py
│ ├── sync
│ │ ├── test_character_conflicts.py
│ │ ├── test_sync_service_incremental.py
│ │ ├── test_sync_service.py
│ │ ├── test_sync_wikilink_issue.py
│ │ ├── test_tmp_files.py
│ │ ├── test_watch_service_edge_cases.py
│ │ ├── test_watch_service_reload.py
│ │ └── test_watch_service.py
│ ├── test_config.py
│ ├── test_db_migration_deduplication.py
│ ├── test_deps.py
│ ├── test_production_cascade_delete.py
│ ├── test_rclone_commands.py
│ └── utils
│ ├── test_file_utils.py
│ ├── test_frontmatter_obsidian_compatible.py
│ ├── test_parse_tags.py
│ ├── test_permalink_formatting.py
│ ├── test_utf8_handling.py
│ └── test_validate_project_path.py
├── uv.lock
├── v0.15.0-RELEASE-DOCS.md
└── v15-docs
├── api-performance.md
├── background-relations.md
├── basic-memory-home.md
├── bug-fixes.md
├── chatgpt-integration.md
├── cloud-authentication.md
├── cloud-bisync.md
├── cloud-mode-usage.md
├── cloud-mount.md
├── default-project-mode.md
├── env-file-removal.md
├── env-var-overrides.md
├── explicit-project-parameter.md
├── gitignore-integration.md
├── project-root-env-var.md
├── README.md
└── sqlite-performance.md
```
# Files
--------------------------------------------------------------------------------
/specs/SPEC-9 Multi-Project Bidirectional Sync Architecture.md:
--------------------------------------------------------------------------------
```markdown
1 | ---
2 | title: 'SPEC-9: Multi-Project Bidirectional Sync Architecture'
3 | type: spec
4 | permalink: specs/spec-9-multi-project-bisync
5 | tags:
6 | - cloud
7 | - bisync
8 | - architecture
9 | - multi-project
10 | ---
11 |
12 | # SPEC-9: Multi-Project Bidirectional Sync Architecture
13 |
14 | ## Status: ✅ Implementation Complete
15 |
16 | **Completed Phases:**
17 | - ✅ Phase 1: Cloud Mode Toggle & Config
18 | - ✅ Phase 2: Bisync Updates (Multi-Project)
19 | - ✅ Phase 3: Sync Command Dual Mode
20 | - ✅ Phase 4: Remove Duplicate Commands & Cloud Mode Auth
21 | - ✅ Phase 5: Mount Updates
22 | - ✅ Phase 6: Safety & Validation
23 | - ⏸️ Phase 7: Cloud-Side Implementation (Deferred to cloud repo)
24 | - ✅ Phase 8.1: Testing (All test scenarios validated)
25 | - ✅ Phase 8.2: Documentation (Core docs complete, demos pending)
26 |
27 | **Key Achievements:**
28 | - Unified CLI: `bm sync`, `bm project`, `bm tool` work transparently in both local and cloud modes
29 | - Multi-project sync: Single `bm sync` operation handles all projects bidirectionally
30 | - Cloud mode toggle: `bm cloud login` / `bm cloud logout` switches modes seamlessly
31 | - Integrity checking: `bm cloud check` verifies file matching without data transfer
32 | - Directory isolation: Mount and bisync use separate directories with conflict prevention
33 | - Clean UX: No RCLONE_TEST files, clear error messages, transparent implementation
34 |
35 | ## Why
36 |
37 | **Current State:**
38 | SPEC-8 implemented rclone bisync for cloud file synchronization, but it has several architectural limitations:
39 | 1. Syncs only a single project subdirectory (`bucket:/basic-memory`)
40 | 2. Requires separate `bm cloud` command namespace, duplicating existing CLI commands
41 | 3. Users must learn different commands for local vs cloud operations
42 | 4. RCLONE_TEST marker files clutter user directories
43 |
44 | **Problems:**
45 | 1. **Duplicate Commands**: `bm project` vs `bm cloud project`; `bm tool` has no cloud equivalent at all
46 | 2. **Inconsistent UX**: Same operations require different command syntax depending on mode
47 | 3. **Single Project Sync**: Users can only sync one project at a time
48 | 4. **Manual Coordination**: Creating new projects requires manual coordination between local and cloud
49 | 5. **Confusing Artifacts**: RCLONE_TEST marker files confuse users
50 |
51 | **Goals:**
52 | - **Unified CLI**: All existing `bm` commands work in both local and cloud mode via toggle
53 | - **Multi-Project Sync**: Single sync operation handles all projects bidirectionally
54 | - **Simple Mode Switch**: `bm cloud login` enables cloud mode, `logout` returns to local
55 | - **Automatic Registration**: Projects auto-register on both local and cloud sides
56 | - **Clean UX**: Remove unnecessary safety checks and confusing artifacts
57 |
58 | ## Cloud Access Paradigm: The Dropbox Model
59 |
60 | **Mental Model Shift:**
61 |
62 | Basic Memory cloud access follows the **Dropbox/iCloud paradigm** - not a per-project cloud connection model.
63 |
64 | **What This Means:**
65 |
66 | ```
67 | Traditional Project-Based Model (❌ Not This):
68 | bm cloud mount --project work # Mount individual project
69 | bm cloud mount --project personal # Mount another project
70 | bm cloud sync --project research # Sync specific project
71 | → Multiple connections, multiple credentials, complex management
72 |
73 | Dropbox Model (✅ This):
74 | bm cloud mount # One mount, all projects
75 | bm sync # One sync, all projects
76 | ~/basic-memory-cloud/ # One folder, all content
77 | → Single connection, organized by folders (projects)
78 | ```
79 |
80 | **Key Principles:**
81 |
82 | 1. **Mount/Bisync = Access Methods, Not Project Tools**
83 | - Mount: Read-through cache to cloud (like Dropbox folder)
84 | - Bisync: Bidirectional sync with cloud (like Dropbox sync)
85 | - Both operate at **bucket level** (all projects)
86 |
87 | 2. **Projects = Organization Within Cloud Space**
88 | - Projects are folders within your cloud storage
89 | - Creating a folder creates a project (auto-discovered)
90 | - Projects are managed via `bm project` commands
91 |
92 | 3. **One Cloud Space Per Machine**
93 | - One set of IAM credentials per tenant
94 | - One mount point: `~/basic-memory-cloud/`
95 | - One bisync directory: `~/basic-memory-cloud-sync/` (default)
96 | - All projects accessible through this single entry point
97 |
98 | 4. **Why This Works Better**
99 | - **Credential Management**: One credential set, not N sets per project
100 | - **Resource Efficiency**: One rclone process, not N processes
101 | - **Familiar Pattern**: Users already understand Dropbox/iCloud
102 | - **Operational Simplicity**: `mount` once, `unmount` once
103 | - **Scales Naturally**: Add projects by creating folders, not reconfiguring cloud access
104 |
105 | **User Journey:**
106 |
107 | ```bash
108 | # Setup cloud access (once)
109 | bm cloud login
110 | bm cloud mount # or: bm cloud setup for bisync
111 |
112 | # Work with projects (create folders as needed)
113 | cd ~/basic-memory-cloud/
114 | mkdir my-new-project
115 | echo "# Notes" > my-new-project/readme.md
116 |
117 | # Cloud auto-discovers and registers project
118 | # No additional cloud configuration needed
119 | ```
120 |
121 | This paradigm shift means **mount and bisync are infrastructure concerns**, while **projects are content organization**. Users think about their knowledge, not about cloud plumbing.
122 |
123 | ## What
124 |
125 | This spec affects:
126 |
127 | 1. **Cloud Mode Toggle** (`config.py`, `async_client.py`):
128 | - Add `cloud_mode` flag to `~/.basic-memory/config.json`
129 | - Set/unset `BASIC_MEMORY_PROXY_URL` based on cloud mode
130 | - `bm cloud login` enables cloud mode, `logout` disables it
131 | - All CLI commands respect cloud mode via existing async_client
132 |
133 | 2. **Unified CLI Commands**:
134 | - **Remove**: `bm cloud project` commands (duplicate of `bm project`)
135 | - **Enhance**: `bm sync` co-opted for bisync in cloud mode
136 | - **Keep**: `bm cloud login/logout/status/setup` for mode management
137 | - **Result**: `bm project`, `bm tool`, `bm sync` work in both modes
138 |
139 | 3. **Bisync Integration** (`bisync_commands.py`):
140 | - Remove `--check-access` (no RCLONE_TEST files)
141 | - Sync bucket root (all projects), not single subdirectory
142 | - Project auto-registration before sync
143 | - `bm sync` triggers bisync in cloud mode
144 | - `bm sync --watch` for continuous sync
145 |
146 | 4. **Config Structure**:
147 | ```json
148 | {
149 | "cloud_mode": true,
150 | "cloud_host": "https://cloud.basicmemory.com",
151 | "auth_tokens": {...},
152 | "bisync_config": {
153 | "profile": "balanced",
154 | "sync_dir": "~/basic-memory-cloud-sync"
155 | }
156 | }
157 | ```
158 |
159 | 5. **User Workflows**:
160 | - **Enable cloud**: `bm cloud login` → all commands work remotely
161 | - **Create projects**: `bm project add "name"` creates on cloud
162 | - **Sync files**: `bm sync` runs bisync (all projects)
163 | - **Use tools**: `bm tool write-note` creates notes on cloud
164 | - **Disable cloud**: `bm cloud logout` → back to local mode
165 |
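The cloud mode toggle described in item 1 reduces to flipping one flag in `~/.basic-memory/config.json` and letting every subsequent command re-read it. A minimal sketch (the helper name and config handling here are illustrative, not the actual implementation):

```python
import json
from pathlib import Path

CONFIG_PATH = Path("~/.basic-memory/config.json").expanduser()

def set_cloud_mode(enabled: bool, config_path: Path = CONFIG_PATH) -> dict:
    """Flip cloud_mode and persist it; every later CLI invocation re-reads this file."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["cloud_mode"] = enabled
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

`bm cloud login` would call this with `True` after a successful OAuth flow; `logout` with `False`.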
166 | ## Implementation Tasks
167 |
168 | ### Phase 1: Cloud Mode Toggle & Config (Foundation) ✅
169 |
170 | **1.1 Update Config Schema**
171 | - [x] Add `cloud_mode: bool = False` to Config model
172 | - [x] Add `bisync_config: dict` with `profile` and `sync_dir` fields
173 | - [x] Ensure `cloud_host` field exists
174 | - [x] Add config migration for existing users (defaults handle this)
175 |
176 | **1.2 Update async_client.py**
177 | - [x] Read `cloud_mode` from config (not just environment)
178 | - [x] Set `BASIC_MEMORY_PROXY_URL` from config when `cloud_mode=true`
179 | - [x] Priority: env var > config.cloud_host (if cloud_mode) > None (local ASGI)
180 | - [ ] Test both local and cloud mode routing
181 |
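The routing priority in 1.2 can be pictured as a single resolver (a hypothetical helper; the real `async_client.py` wires this decision into its transport selection, and Phase 4.0 later drops the env var dependency entirely):

```python
import os
from typing import Optional

def resolve_proxy_url(config: dict, env=os.environ) -> Optional[str]:
    """Priority from 1.2: env var > config.cloud_host (if cloud_mode) > None (local ASGI)."""
    url = env.get("BASIC_MEMORY_PROXY_URL")
    if url:
        return url
    if config.get("cloud_mode") and config.get("cloud_host"):
        return config["cloud_host"]
    return None  # no proxy URL: async_client falls back to the local ASGI app
```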
182 | **1.3 Update Login/Logout Commands**
183 | - [x] `bm cloud login`: Set `cloud_mode=true` and save config
184 | - [x] `bm cloud login`: Set `BASIC_MEMORY_PROXY_URL` environment variable
185 | - [x] `bm cloud logout`: Set `cloud_mode=false` and save config
186 | - [x] `bm cloud logout`: Clear `BASIC_MEMORY_PROXY_URL` environment variable
187 | - [x] `bm cloud status`: Show current mode (local/cloud), connection status
188 |
189 | **1.4 Skip Initialization in Cloud Mode** ✅
190 | - [x] Update `ensure_initialization()` to check `cloud_mode` and return early
191 | - [x] Document that `config.projects` is only used in local mode
192 | - [x] Cloud manages its own projects via API, no local reconciliation needed
193 |
194 | ### Phase 2: Bisync Updates (Multi-Project)
195 |
196 | **2.1 Remove RCLONE_TEST Files** ✅
197 | - [x] Update all bisync profiles: `check_access=False`
198 | - [x] Remove RCLONE_TEST creation from `setup_cloud_bisync()`
199 | - [x] Remove RCLONE_TEST upload logic
200 | - [ ] Update documentation
201 |
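The profile change above can be sketched as follows. Profile names other than `balanced`, and `max_delete` values other than the 25 used later in this spec, are assumptions for illustration; the point is that `check_access` is `False` everywhere:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BisyncProfile:
    # Field names are assumed for illustration, not the real bisync_commands.py shape
    name: str
    conflict_resolve: str       # passed to rclone as --conflict-resolve
    max_delete: int             # rclone aborts if more deletes than this are pending
    check_access: bool = False  # False in every profile: no RCLONE_TEST marker files

PROFILES = {
    "safe": BisyncProfile("safe", "none", 10),
    "balanced": BisyncProfile("balanced", "newer", 25),  # default in the config example
    "fast": BisyncProfile("fast", "newer", 50),
}
```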
202 | **2.2 Sync Bucket Root (All Projects)** ✅
203 | - [x] Change remote path from `bucket:/basic-memory` to `bucket:/` in `build_bisync_command()`
204 | - [x] Update `setup_cloud_bisync()` to use bucket root
205 | - [ ] Test with multiple projects
206 |
207 | **2.3 Project Auto-Registration (Bisync)** ✅
208 | - [x] Add `fetch_cloud_projects()` function (GET /proxy/projects/projects)
209 | - [x] Add `scan_local_directories()` function
210 | - [x] Add `create_cloud_project()` function (POST /proxy/projects/projects)
211 | - [x] Integrate into `run_bisync()`: fetch → scan → create missing → sync
212 | - [x] Wait for API 201 response before syncing
213 |
214 | **2.4 Bisync Directory Configuration** ✅
215 | - [x] Add `--dir` parameter to `bm cloud bisync-setup`
216 | - [x] Store bisync directory in config
217 | - [x] Default to `~/basic-memory-cloud-sync/`
218 | - [x] Add `validate_bisync_directory()` safety check
219 | - [x] Update `get_default_mount_path()` to return fixed `~/basic-memory-cloud/`
220 |
221 | **2.5 Sync/Status API Infrastructure** ✅ (commit d48b1dc)
222 | - [x] Create `POST /{project}/project/sync` endpoint for background sync
223 | - [x] Create `POST /{project}/project/status` endpoint for scan-only status
224 | - [x] Create `SyncReportResponse` Pydantic schema
225 | - [x] Refactor CLI `sync` command to use API endpoint
226 | - [x] Refactor CLI `status` command to use API endpoint
227 | - [x] Create `command_utils.py` with shared `run_sync()` function
228 | - [x] Update `notify_container_sync()` to call `run_sync()` for each project
229 | - [x] Update all tests to match new API-based implementation
230 |
231 | ### Phase 3: Sync Command Dual Mode ✅
232 |
233 | **3.1 Update `bm sync` Command** ✅
234 | - [x] Check `config.cloud_mode` at start
235 | - [x] If `cloud_mode=false`: Run existing local sync
236 | - [x] If `cloud_mode=true`: Run bisync
237 | - [x] Add `--watch` parameter for continuous sync
238 | - [x] Add `--interval` parameter (default 60 seconds)
239 | - [x] Error if `--watch` used in local mode with helpful message
240 |
241 | **3.2 Watch Mode for Bisync** ✅
242 | - [x] Implement `run_bisync_watch()` with interval loop
243 | - [x] Add `--interval` parameter (default 60 seconds)
244 | - [x] Handle errors gracefully, continue on failure
245 | - [x] Show sync progress and status
246 |
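The watch mode in 3.2 is essentially a retry-forever timer around one sync. A sketch (the `max_iterations` hook is added here purely for testability; the real command loops until interrupted):

```python
import time
from typing import Callable, Optional

def run_bisync_watch(
    sync_once: Callable[[], None],
    interval: int = 60,
    max_iterations: Optional[int] = None,  # test hook, not in the real command
) -> None:
    """Re-run bisync every `interval` seconds, continuing past individual failures."""
    count = 0
    while max_iterations is None or count < max_iterations:
        try:
            sync_once()
        except Exception as exc:
            # Handle errors gracefully: log and keep watching
            print(f"sync failed, retrying in {interval}s: {exc}")
        count += 1
        if max_iterations is None or count < max_iterations:
            time.sleep(interval)
```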
247 | **3.3 Integrity Check Command** ✅
248 | - [x] Implement `bm cloud check` command using `rclone check`
249 | - [x] Read-only operation that verifies file matching
250 | - [x] Error with helpful messages if rclone/bisync not set up
251 | - [x] Support `--one-way` flag for faster checks
252 | - [x] Transparent about rclone implementation
253 | - [x] Suggest `bm sync` to resolve differences
254 |
255 | **Implementation Notes:**
256 | - `bm sync` adapts to cloud mode automatically - users don't need separate commands
257 | - `bm cloud bisync` kept for power users with full options (--dry-run, --resync, --profile, --verbose)
258 | - `bm cloud check` provides integrity verification without transferring data
259 | - Design philosophy: Simplicity for everyday use, transparency about implementation
260 |
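Under the hood, `bm cloud check` can be assembled from plain `rclone check`, which compares local and remote files without transferring data. A sketch of the command construction (paths and remote names are placeholders):

```python
def build_check_command(sync_dir: str, remote: str, one_way: bool = False) -> list:
    """rclone check verifies files match between source and destination, read-only."""
    cmd = ["rclone", "check", sync_dir, remote]
    if one_way:
        cmd.append("--one-way")  # faster: only checks for files missing on the destination
    return cmd
```

The resulting argv would be handed to `subprocess.run`, with a pointer to `bm sync` in the error path when differences are found.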
261 | ### Phase 4: Remove Duplicate Commands & Cloud Mode Auth ✅
262 |
263 | **4.0 Cloud Mode Authentication** ✅
264 | - [x] Update `async_client.py` to support dual auth sources
265 | - [x] FastMCP context auth (cloud service mode) via `inject_auth_header()`
266 | - [x] JWT token file auth (CLI cloud mode) via `CLIAuth.get_valid_token()`
267 | - [x] Automatic token refresh for CLI cloud mode
268 | - [x] Remove `BASIC_MEMORY_PROXY_URL` environment variable dependency
269 | - [x] Simplify to use only `config.cloud_mode` + `config.cloud_host`
270 |
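The dual auth sources in 4.0 boil down to an ordered fallback. A hedged sketch (names are illustrative; the real logic lives in `async_client.py` and `CLIAuth`, including token refresh, which is omitted here):

```python
from typing import Callable, Optional

def resolve_auth_header(
    context_token: Optional[str],
    load_cli_token: Callable[[], Optional[str]],
) -> Optional[dict]:
    """Prefer FastMCP context auth (cloud service mode); fall back to the CLI JWT file."""
    token = context_token if context_token else load_cli_token()
    return {"Authorization": f"Bearer {token}"} if token else None
```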
271 | **4.1 Delete `bm cloud project` Commands** ✅
272 | - [x] Remove `bm cloud project list` (use `bm project list`)
273 | - [x] Remove `bm cloud project add` (use `bm project add`)
274 | - [x] Update `core_commands.py` to remove project_app subcommands
275 | - [x] Keep only: `login`, `logout`, `status`, `setup`, `mount`, `unmount`, bisync commands
276 | - [x] Remove unused imports (Table, generate_permalink, os)
277 | - [x] Clean up environment variable references in login/logout
278 |
279 | **4.2 CLI Command Cloud Mode Integration** ✅
280 | - [x] Add runtime `cloud_mode_enabled` checks to all CLI commands
281 | - [x] Update `list_projects()` to conditionally authenticate based on cloud mode
282 | - [x] Update `remove_project()` to conditionally authenticate based on cloud mode
283 | - [x] Update `run_sync()` to conditionally authenticate based on cloud mode
284 | - [x] Update `get_project_info()` to conditionally authenticate based on cloud mode
285 | - [x] Update `run_status()` to conditionally authenticate based on cloud mode
286 | - [x] Remove auth from `set_default_project()` (local-only command, no cloud version)
287 | - [x] Create CLI integration tests (`test-int/cli/`) to validate both local and cloud modes
288 | - [x] Replace mock-heavy CLI tests with integration tests (deleted 5 mock test files)
289 |
290 | **4.3 OAuth Authentication Fixes** ✅
291 | - [x] Restore missing `SettingsConfigDict` in `BasicMemoryConfig`
292 | - [x] Fix environment variable reading with `BASIC_MEMORY_` prefix
293 | - [x] Fix `.env` file loading
294 | - [x] Fix extra field handling for config files
295 | - [x] Resolve `bm cloud login` OAuth failure ("Something went wrong" error)
296 | - [x] Implement PKCE (Proof Key for Code Exchange) for device flow
297 | - [x] Generate code verifier and SHA256 challenge for device authorization
298 | - [x] Send code_verifier with token polling requests
299 | - [x] Support both PKCE-required and PKCE-optional OAuth clients
300 | - [x] Verify authentication flow works end-to-end with staging and production
301 | - [x] Document WorkOS requirement: redirect URI must be configured even for device flow
302 |
303 | **4.4 Update Documentation**
304 | - [ ] Update `cloud-cli.md` with cloud mode toggle workflow
305 | - [ ] Document `bm cloud login` → use normal commands
306 | - [ ] Add examples of cloud mode usage
307 | - [ ] Document mount vs bisync directory isolation
308 | - [ ] Add troubleshooting section
309 |
310 | ### Phase 5: Mount Updates ✅
311 |
312 | **5.1 Fixed Mount Directory** ✅
313 | - [x] Change mount path to `~/basic-memory-cloud/` (fixed, no tenant ID)
314 | - [x] Update `get_default_mount_path()` function
315 | - [x] Remove configurability (fixed location)
316 | - [x] Update mount commands to use new path
317 |
318 | **5.2 Mount at Bucket Root** ✅
319 | - [x] Ensure mount uses bucket root (not subdirectory)
320 | - [x] Test with multiple projects
321 | - [x] Verify all projects visible in mount
322 |
323 | **Implementation:** Mount uses the fixed `~/basic-memory-cloud/` directory and exposes the entire bucket root `basic-memory-{tenant_id}:{bucket_name}`, so all projects are visible through one mount point.
324 |
325 | ### Phase 6: Safety & Validation ✅
326 |
327 | **6.1 Directory Conflict Prevention** ✅
328 | - [x] Implement `validate_bisync_directory()` check
329 | - [x] Detect if bisync dir == mount dir
330 | - [x] Detect if bisync dir is currently mounted
331 | - [x] Show clear error messages with solutions
332 |
333 | **6.2 State Management** ✅
334 | - [x] Use `--workdir` for bisync state
335 | - [x] Store state in `~/.basic-memory/bisync-state/{tenant-id}/`
336 | - [x] Ensure state directory created before bisync
337 |
338 | **Implementation:** `validate_bisync_directory()` prevents conflicts by checking directory equality and mount status. State managed in isolated `~/.basic-memory/bisync-state/{tenant-id}/` directory using `--workdir` flag.
339 |
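A sketch of the conflict check described above (error messages are illustrative, and `os.path.ismount` only approximates the "currently mounted" detection the real command performs):

```python
import os
from pathlib import Path

def validate_bisync_directory(bisync_dir: Path, mount_dir: Path) -> None:
    """Refuse configurations where bisync would fight with the mount."""
    bisync_dir, mount_dir = bisync_dir.resolve(), mount_dir.resolve()
    if bisync_dir == mount_dir:
        raise ValueError(
            f"{bisync_dir} is the mount directory; pass a different --dir "
            "(mount and bisync must use separate directories)")
    if os.path.ismount(bisync_dir):
        raise ValueError(
            f"{bisync_dir} is currently mounted; run 'bm cloud unmount' first")
```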
340 | ### Phase 7: Cloud-Side Implementation (Deferred to Cloud Repo)
341 |
342 | **7.1 Project Discovery Service (Cloud)** - Deferred
343 | - [ ] Create `ProjectDiscoveryService` background job
344 | - [ ] Scan `/app/data/` every 2 minutes
345 | - [ ] Auto-register new directories as projects
346 | - [ ] Log discovery events
347 | - [ ] Handle errors gracefully
348 |
349 | **7.2 Project API Updates (Cloud)** - Deferred
350 | - [ ] Ensure `POST /proxy/projects/projects` creates directory synchronously
351 | - [ ] Return 201 with project details
352 | - [ ] Ensure directory ready immediately after creation
353 |
354 | **Note:** Phase 7 is cloud-side work that belongs in the basic-memory-cloud repository. The CLI-side implementation (Phase 2.3 auto-registration) is complete and working - it calls the existing cloud API endpoints.
355 |
356 | ### Phase 8: Testing & Documentation
357 |
358 | **8.1 Test Scenarios**
359 | - [x] Test: Cloud mode toggle (login/logout)
360 | - [x] Test: Local-first project creation (bisync)
361 | - [x] Test: Cloud-first project creation (API)
362 | - [x] Test: Multi-project bidirectional sync
363 | - [x] Test: MCP tools in cloud mode
364 | - [x] Test: Watch mode continuous sync
365 | - [x] Test: Safety profile protection (max_delete implemented)
366 | - [x] Test: No RCLONE_TEST files (check_access=False in all profiles)
367 | - [x] Test: Mount/bisync directory isolation (validate_bisync_directory)
368 | - [x] Test: Integrity check command (bm cloud check)
369 |
370 | **8.2 Documentation**
371 | - [x] Update cloud-cli.md with cloud mode instructions
372 | - [x] Document Dropbox model paradigm
373 | - [x] Update command reference with new commands
374 | - [x] Document `bm sync` dual mode behavior
375 | - [x] Document `bm cloud check` command
376 | - [x] Document directory structure and fixed paths
377 | - [ ] Update README with quick start
378 | - [ ] Create migration guide for existing users
379 | - [ ] Create video/GIF demos
380 |
381 | ### Success Criteria Checklist
382 |
383 | - [x] `bm cloud login` enables cloud mode for all commands
384 | - [x] `bm cloud logout` reverts to local mode
385 | - [x] `bm project`, `bm tool`, `bm sync` work transparently in both modes
386 | - [x] `bm sync` runs bisync in cloud mode, local sync in local mode
387 | - [x] Single sync operation handles all projects bidirectionally
388 | - [x] Local directories auto-create cloud projects via API
389 | - [x] Cloud projects auto-sync to local directories
390 | - [x] No RCLONE_TEST files in user directories
391 | - [x] Bisync profiles provide safety via `max_delete` limits
392 | - [x] `bm sync --watch` enables continuous sync
393 | - [x] No duplicate `bm cloud project` commands (removed)
394 | - [x] `bm cloud check` command for integrity verification
395 | - [ ] Documentation covers cloud mode toggle and workflows
396 | - [ ] Edge cases handled gracefully with clear errors
397 |
398 | ## How (High Level)
399 |
400 | ### Architecture Overview
401 |
402 | **Cloud Mode Toggle:**
403 | ```
404 | ┌─────────────────────────────────────┐
405 | │ bm cloud login │
406 | │ ├─ Authenticate via OAuth │
407 | │ ├─ Set cloud_mode: true in config │
408 | │ └─ Set BASIC_MEMORY_PROXY_URL │
409 | └─────────────────────────────────────┘
410 | ↓
411 | ┌─────────────────────────────────────┐
412 | │ All CLI commands use async_client │
413 | │ ├─ async_client checks proxy URL │
414 | │ ├─ If set: HTTP to cloud │
415 | │ └─ If not: Local ASGI │
416 | └─────────────────────────────────────┘
417 | ↓
418 | ┌─────────────────────────────────────┐
419 | │ bm project add "work" │
420 | │ bm tool write-note ... │
421 | │ bm sync (triggers bisync) │
422 | │ → All work against cloud │
423 | └─────────────────────────────────────┘
424 | ```
425 |
426 | **Storage Hierarchy:**
427 | ```
428 | Cloud Container: Bucket: Local Sync Dir:
429 | /app/data/ (mounted) ←→ production-tenant-{id}/ ←→ ~/basic-memory-cloud-sync/
430 | ├── basic-memory/ ├── basic-memory/ ├── basic-memory/
431 | │ ├── notes/ │ ├── notes/ │ ├── notes/
432 | │ └── concepts/ │ └── concepts/ │ └── concepts/
433 | ├── work-project/ ├── work-project/ ├── work-project/
434 | │ └── tasks/ │ └── tasks/ │ └── tasks/
435 | └── personal/ └── personal/ └── personal/
436 | └── journal/ └── journal/ └── journal/
437 |
438 | Bidirectional sync via rclone bisync
439 | ```
440 |
441 | ### Sync Flow
442 |
443 | **`bm sync` execution (in cloud mode):**
444 |
445 | 1. **Check cloud mode**
446 | ```python
447 | if not config.cloud_mode:
448 | # Run normal local file sync
449 | run_local_sync()
450 | return
451 |
452 | # Cloud mode: Run bisync
453 | ```
454 |
455 | 2. **Fetch cloud projects**
456 | ```python
457 | # GET /proxy/projects/projects (via async_client)
458 | cloud_projects = fetch_cloud_projects()
459 | cloud_project_names = {p["name"] for p in cloud_projects["projects"]}
460 | ```
461 |
462 | 3. **Scan local sync directory**
463 | ```python
464 | sync_dir = config.bisync_config["sync_dir"] # ~/basic-memory-cloud-sync
465 | local_dirs = [d.name for d in sync_dir.iterdir()
466 | if d.is_dir() and not d.name.startswith('.')]
467 | ```
468 |
469 | 4. **Create missing cloud projects**
470 | ```python
471 | for dir_name in local_dirs:
472 | if dir_name not in cloud_project_names:
473 | # POST /proxy/projects/projects (via async_client)
474 | create_cloud_project(name=dir_name)
475 | # Blocks until 201 response
476 | ```
477 |
478 | 5. **Run bisync on bucket root**
479 | ```bash
480 | rclone bisync \
481 | ~/basic-memory-cloud-sync \
482 | basic-memory-{tenant}:{bucket} \
483 | --filters-file ~/.basic-memory/.bmignore.rclone \
484 | --conflict-resolve=newer \
485 | --max-delete=25
486 | # Syncs ALL project subdirectories bidirectionally
487 | ```
488 |
489 | 6. **Notify cloud to refresh** (commit d48b1dc)
490 | ```python
491 | # After rclone bisync completes, sync each project's database
492 |    for project in cloud_projects["projects"]:
493 |        # POST /{project}/project/sync (via async_client)
494 |        # Triggers background sync for this project
495 |        await run_sync(project=project["name"])
496 | ```
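
Steps 1-6 compose into a single orchestration routine. A minimal sketch of that control flow, with the HTTP and rclone collaborators injected as plain callables (`create_project`, `run_bisync`, `run_local_sync` are hypothetical stand-ins, not the real implementations):

```python
def sync_command(cloud_mode: bool, cloud_projects: dict, local_dirs: list[str],
                 create_project, run_bisync, run_local_sync) -> None:
    """Sketch of `bm sync`: local file sync in local mode, bisync in cloud mode."""
    if not cloud_mode:
        run_local_sync()          # step 1: local mode short-circuits
        return
    # steps 2-4: compare cloud project names against local sync-dir folders
    cloud_names = {p["name"] for p in cloud_projects["projects"]}
    for name in local_dirs:
        if name not in cloud_names:
            create_project(name)  # blocks until the 201 response
    run_bisync()                  # steps 5-6: one bisync run covers all projects

# Exercise the cloud-mode path with recording stubs
created, synced = [], []
sync_command(
    cloud_mode=True,
    cloud_projects={"projects": [{"name": "basic-memory"}, {"name": "personal"}]},
    local_dirs=["basic-memory", "personal", "my-research"],
    create_project=created.append,
    run_bisync=lambda: synced.append("bisync"),
    run_local_sync=lambda: None,
)
assert created == ["my-research"]   # only the missing project is created
assert synced == ["bisync"]
```

Injecting the collaborators keeps the mode-switch and project-diff logic testable without a network or an rclone binary.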
497 |
498 | ### Key Changes
499 |
500 | **1. Cloud Mode via Config**
501 |
502 | **Config changes:**
503 | ```python
504 | class Config:
505 | cloud_mode: bool = False
506 | cloud_host: str = "https://cloud.basicmemory.com"
507 | bisync_config: dict = {
508 | "profile": "balanced",
509 | "sync_dir": "~/basic-memory-cloud-sync"
510 | }
511 | ```
512 |
513 | **async_client.py behavior:**
514 | ```python
515 | def create_client() -> AsyncClient:
516 | # Check config first, then environment
517 | config = ConfigManager().config
518 | proxy_url = os.getenv("BASIC_MEMORY_PROXY_URL") or \
519 | (config.cloud_host if config.cloud_mode else None)
520 |
521 | if proxy_url:
522 | return AsyncClient(base_url=proxy_url) # HTTP to cloud
523 | else:
524 | return AsyncClient(transport=ASGITransport(...)) # Local ASGI
525 | ```
526 |
527 | **2. Login/Logout Sets Cloud Mode**
528 |
529 | ```python
530 | # bm cloud login
531 | async def login():
532 | # Existing OAuth flow...
533 | success = await auth.login()
534 | if success:
535 | config.cloud_mode = True
536 | config.save()
537 | os.environ["BASIC_MEMORY_PROXY_URL"] = config.cloud_host
538 | ```
539 |
540 | ```python
541 | # bm cloud logout
542 | def logout():
543 | config.cloud_mode = False
544 | config.save()
545 | os.environ.pop("BASIC_MEMORY_PROXY_URL", None)
546 | ```
547 |
548 | **3. Remove Duplicate Commands**
549 |
550 | **Delete:**
551 | - `bm cloud project list` → use `bm project list`
552 | - `bm cloud project add` → use `bm project add`
553 |
554 | **Keep:**
555 | - `bm cloud login` - Enable cloud mode
556 | - `bm cloud logout` - Disable cloud mode
557 | - `bm cloud status` - Show current mode & connection
558 | - `bm cloud setup` - Initial bisync setup
559 | - `bm cloud bisync` - Power-user command with full options
560 | - `bm cloud check` - Verify file integrity between local and cloud
561 |
562 | **4. Sync Command Dual Mode**
563 |
564 | ```python
565 | # bm sync
566 | def sync_command(watch: bool = False, profile: str = "balanced"):
567 | config = ConfigManager().config
568 |
569 | if config.cloud_mode:
570 | # Run bisync for cloud sync
571 | run_bisync(profile=profile, watch=watch)
572 | else:
573 | # Run local file sync
574 | run_local_sync()
575 | ```
576 |
577 | **5. Remove RCLONE_TEST Files**
578 |
579 | ```python
580 | # All profiles: check_access=False
581 | BISYNC_PROFILES = {
582 | "safe": RcloneBisyncProfile(check_access=False, max_delete=10),
583 | "balanced": RcloneBisyncProfile(check_access=False, max_delete=25),
584 | "fast": RcloneBisyncProfile(check_access=False, max_delete=50),
585 | }
586 | ```
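
A profile is just a bundle of rclone flags. A sketch of how the profile table above could translate into the `rclone bisync` invocation from step 5 (the `RcloneBisyncProfile` fields shown are inferred from this spec, not the actual class):

```python
from dataclasses import dataclass

@dataclass
class RcloneBisyncProfile:
    check_access: bool
    max_delete: int
    conflict_resolve: str = "newer"

def bisync_args(profile: RcloneBisyncProfile, sync_dir: str, remote: str,
                filters_file: str) -> list[str]:
    """Build the rclone bisync command line for a given safety profile."""
    args = [
        "rclone", "bisync", sync_dir, remote,
        "--filters-file", filters_file,
        f"--conflict-resolve={profile.conflict_resolve}",
        f"--max-delete={profile.max_delete}",
    ]
    if profile.check_access:
        # --check-access is what requires RCLONE_TEST markers; all profiles omit it
        args.append("--check-access")
    return args

balanced = RcloneBisyncProfile(check_access=False, max_delete=25)
args = bisync_args(balanced, "~/basic-memory-cloud-sync",
                   "basic-memory-tenant:bucket",
                   "~/.basic-memory/.bmignore.rclone")
assert "--max-delete=25" in args
assert "--check-access" not in args  # keeps user directories free of RCLONE_TEST
```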
587 |
588 | **6. Sync Bucket Root (All Projects)**
589 |
590 | ```python
591 | # Sync entire bucket, not subdirectory
592 | rclone_remote = f"basic-memory-{tenant_id}:{bucket_name}"
593 | ```
594 |
595 | ## How to Evaluate
596 |
597 | ### Test Scenarios
598 |
599 | **1. Cloud Mode Toggle**
600 | ```bash
601 | # Start in local mode
602 | bm project list
603 | # → Shows local projects
604 |
605 | # Enable cloud mode
606 | bm cloud login
607 | # → Authenticates, sets cloud_mode=true
608 |
609 | bm project list
610 | # → Now shows cloud projects (same command!)
611 |
612 | # Disable cloud mode
613 | bm cloud logout
614 |
615 | bm project list
616 | # → Back to local projects
617 | ```
618 |
619 | **Expected:** ✅ Single command works in both modes
620 |
621 | **2. Local-First Project Creation (Cloud Mode)**
622 | ```bash
623 | # Enable cloud mode
624 | bm cloud login
625 |
626 | # Create new project locally in sync dir
627 | mkdir ~/basic-memory-cloud-sync/my-research
628 | echo "# Research Notes" > ~/basic-memory-cloud-sync/my-research/index.md
629 |
630 | # Run sync (triggers bisync in cloud mode)
631 | bm sync
632 |
633 | # Verify:
634 | # - Cloud project created automatically via API
635 | # - Files synced to bucket:/my-research/
636 | # - Cloud database updated
637 | # - `bm project list` shows new project
638 | ```
639 |
640 | **Expected:** ✅ Project visible in cloud project list
641 |
642 | **3. Cloud-First Project Creation**
643 | ```bash
644 | # In cloud mode
645 | bm project add "work-notes"
646 | # → Creates project on cloud (via async_client HTTP)
647 |
648 | # Run sync
649 | bm sync
650 |
651 | # Verify:
652 | # - Local directory ~/basic-memory-cloud-sync/work-notes/ created
653 | # - Files sync bidirectionally
654 | # - Can use `bm tool write-note` to add content remotely
655 | ```
656 |
657 | **Expected:** ✅ Project accessible via all CLI commands
658 |
659 | **4. Multi-Project Bidirectional Sync**
660 | ```bash
661 | # Setup: 3 projects in cloud mode
662 | # Modify files in all 3 locally and remotely
663 |
664 | bm sync
665 |
666 | # Verify:
667 | # - All 3 projects sync simultaneously
668 | # - Changes propagate correctly
669 | # - No cross-project interference
670 | ```
671 |
672 | **Expected:** ✅ All projects in sync state
673 |
674 | **5. MCP Tools Work in Cloud Mode**
675 | ```bash
676 | # In cloud mode
677 | bm tool write-note \
678 | --title "Meeting Notes" \
679 | --folder "work-notes" \
680 | --content "Discussion points..."
681 |
682 | # Verify:
683 | # - Note created on cloud (via async_client HTTP)
684 | # - Next `bm sync` pulls note to local
685 | # - Note appears in ~/basic-memory-cloud-sync/work-notes/
686 | ```
687 |
688 | **Expected:** ✅ Tools work transparently in cloud mode
689 |
690 | **6. Watch Mode Continuous Sync**
691 | ```bash
692 | # In cloud mode
693 | bm sync --watch
694 |
695 | # While running:
696 | # - Create local folder → auto-creates cloud project
697 | # - Edit files locally → syncs to cloud
698 | # - Edit files remotely → syncs to local
699 | # - Create project via API → appears locally
700 |
701 | # Verify:
702 | # - Continuous bidirectional sync
703 | # - New projects handled automatically
704 | # - No manual intervention needed
705 | ```
706 |
707 | **Expected:** ✅ Seamless continuous sync
708 |
709 | **7. Safety Profile Protection**
710 | ```bash
711 | # Create project with 15 files locally
712 | # Delete project from cloud (simulate error)
713 |
714 | bm sync --profile safe
715 |
716 | # Verify:
717 | # - Bisync detects 15 pending deletions
718 | # - Exceeds max_delete=10 limit
719 | # - Aborts with clear error
720 | # - No files deleted locally
721 | ```
722 |
723 | **Expected:** ✅ Safety limit prevents data loss
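
The protection in this scenario comes from rclone's `--max-delete` limit; a toy model of the guard (exception name and message are illustrative, not rclone's actual output):

```python
class MaxDeleteExceeded(Exception):
    """Raised before any deletion happens when the limit would be exceeded."""

def enforce_max_delete(pending_deletes: int, max_delete: int) -> None:
    if pending_deletes > max_delete:
        raise MaxDeleteExceeded(
            f"{pending_deletes} deletions pending exceeds limit of {max_delete}; "
            "aborting with no files deleted"
        )

# Scenario 7: 15 pending deletions vs the 'safe' profile's max_delete=10
try:
    enforce_max_delete(15, 10)
    aborted = False
except MaxDeleteExceeded:
    aborted = True
assert aborted
enforce_max_delete(5, 10)  # under the limit: sync proceeds normally
```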
724 |
725 | **8. No RCLONE_TEST Files**
726 | ```bash
727 | # After setup and multiple syncs
728 | ls -la ~/basic-memory-cloud-sync/
729 |
730 | # Verify:
731 | # - No RCLONE_TEST files
732 | # - No rclone state files in the sync dir (state lives in ~/.basic-memory/bisync-state/)
733 | # - Clean directory structure
734 | ```
735 |
736 | **Expected:** ✅ User directory stays clean
737 |
738 | ### Success Criteria
739 |
740 | - [x] `bm cloud login` enables cloud mode for all commands
741 | - [x] `bm cloud logout` reverts to local mode
742 | - [x] `bm project`, `bm tool`, `bm sync` work in both modes transparently
743 | - [x] `bm sync` runs bisync in cloud mode, local sync in local mode
744 | - [x] Single sync operation handles all projects bidirectionally
745 | - [x] Local directories auto-create cloud projects via API
746 | - [x] Cloud projects auto-sync to local directories
747 | - [x] No RCLONE_TEST files in user directories
748 | - [x] Bisync profiles provide safety via `max_delete` limits
749 | - [x] `bm sync --watch` enables continuous sync
750 | - [x] No duplicate `bm cloud project` commands (removed)
751 | - [x] `bm cloud check` command for integrity verification
752 | - [ ] Documentation covers cloud mode toggle and workflows
753 | - [ ] Edge cases handled gracefully with clear errors
754 |
755 | ## Notes
756 |
757 | ### API Contract
758 |
759 | **Cloud must provide:**
760 |
761 | 1. **Project Management APIs:**
762 | - `GET /proxy/projects/projects` - List all projects
763 | - `POST /proxy/projects/projects` - Create project synchronously
764 | - `POST /proxy/sync` - Trigger cache refresh
765 |
766 | 2. **Project Discovery Service (Background):**
767 | - **Purpose**: Auto-register projects created via mount, direct bucket uploads, or any non-API method
768 | - **Interval**: Every 2 minutes
769 | - **Behavior**:
770 | - Scan `/app/data/` for directories
771 | - Register any directory not already in project database
772 | - Log discovery events
773 | - **Implementation**:
774 | ```python
775 | class ProjectDiscoveryService:
776 | """Background service to auto-discover projects from filesystem."""
777 |
778 | async def run(self):
779 | """Scan /app/data/ and register new project directories."""
780 | data_path = Path("/app/data")
781 |
782 | for dir_path in data_path.iterdir():
783 | # Skip hidden and special directories
784 | if not dir_path.is_dir() or dir_path.name.startswith('.'):
785 | continue
786 |
787 | project_name = dir_path.name
788 |
789 | # Check if project already registered
790 | project = await self.project_repo.get_by_name(project_name)
791 | if not project:
792 | # Auto-register new project
793 | await self.project_repo.create(
794 | name=project_name,
795 | path=str(dir_path)
796 | )
797 | logger.info(f"Auto-discovered project: {project_name}")
798 | ```
799 |
800 | **Project Creation (API-based):**
801 | - API creates `/app/data/{project-name}/` directory
802 | - Registers project in database
803 | - Returns 201 with project details
804 | - Directory ready for bisync immediately
805 |
806 | **Project Creation (Discovery-based):**
807 | - User creates folder via mount: `~/basic-memory-cloud/new-project/`
808 | - Files appear in `/app/data/new-project/` (mounted bucket)
809 | - Discovery service finds directory on next scan (within 2 minutes)
810 | - Auto-registers as project
811 | - User sees project in `bm project list` after discovery
812 |
813 | **Why Both Methods:**
814 | - **API**: Immediate registration when using bisync (client-side scan + API call)
815 | - **Discovery**: Delayed registration when using mount (no API call hook)
816 | - **Result**: Projects created ANY way (API, mount, bisync, WebDAV) eventually registered
817 | - **Trade-off**: 2-minute delay for mount-created projects is acceptable
818 |
819 | ### Mount vs Bisync Directory Isolation
820 |
821 | **Critical Safety Requirement**: Mount and bisync MUST use different directories to prevent conflicts.
822 |
823 | **The Dropbox Model Applied:**
824 |
825 | Both mount and bisync operate at **bucket level** (all projects), following the Dropbox/iCloud paradigm:
826 |
827 | ```
828 | ~/basic-memory-cloud/ # Mount: Read-through cache (like Dropbox folder)
829 | ├── work-notes/
830 | ├── personal/
831 | └── research/
832 |
833 | ~/basic-memory-cloud-sync/ # Bisync: Bidirectional sync (like Dropbox sync folder)
834 | ├── work-notes/
835 | ├── personal/
836 | └── research/
837 | ```
838 |
839 | **Mount Directory (Fixed):**
840 | ```bash
841 | # Fixed location, not configurable
842 | ~/basic-memory-cloud/
843 | ```
844 | - **Scope**: Entire bucket (all projects)
845 | - **Method**: NFS mount via `rclone nfsmount`
846 | - **Behavior**: Read-through cache to cloud bucket
847 | - **Credentials**: One IAM credential set per tenant
848 | - **Process**: One rclone mount process
849 | - **Use Case**: Quick access, browsing, light editing
850 | - **Known Issue**: Obsidian compatibility problems with NFS
851 | - **Not Configurable**: Fixed location prevents user error
852 |
853 | **Why Fixed Location:**
854 | - One mount point per machine (like `/Users/you/Dropbox`)
855 | - Prevents credential proliferation (one credential set, not N)
856 | - Prevents multiple mount processes (resource efficiency)
857 | - Familiar pattern users already understand
858 | - Simple operations: `mount` once, `unmount` once
859 |
860 | **Bisync Directory (User Configurable):**
861 | ```bash
862 | # Default location
863 | ~/basic-memory-cloud-sync/
864 |
865 | # User can override
866 | bm cloud setup --dir ~/my-knowledge-base
867 | ```
868 | - **Scope**: Entire bucket (all projects)
869 | - **Method**: Bidirectional sync via `rclone bisync`
870 | - **Behavior**: Full local copy with periodic sync
871 | - **Credentials**: Same IAM credential set as mount
872 | - **Use Case**: Full offline access, reliable editing, Obsidian support
873 | - **Configurable**: Users may want specific locations (external drive, existing folder structure)
874 |
875 | **Why User Configurable:**
876 | - Users have preferences for where local copies live
877 | - May want sync folder on external drive
878 | - May want to integrate with existing folder structure
879 | - Default works for most, option available for power users
880 |
881 | **Conflict Prevention:**
882 | ```python
883 | def validate_bisync_directory(bisync_dir: Path):
884 | """Ensure bisync directory doesn't conflict with mount."""
885 | mount_dir = Path.home() / "basic-memory-cloud"
886 |
887 | if bisync_dir.resolve() == mount_dir.resolve():
888 | raise BisyncError(
889 | f"Cannot use {bisync_dir} for bisync - it's the mount directory!\n"
890 | f"Mount and bisync must use different directories.\n\n"
891 | f"Options:\n"
892 | f" 1. Use default: ~/basic-memory-cloud-sync/\n"
893 | f" 2. Specify different directory: --dir ~/my-sync-folder"
894 | )
895 |
896 | # Check if mount is active at this location
897 | result = subprocess.run(["mount"], capture_output=True, text=True)
898 | if str(bisync_dir) in result.stdout and "rclone" in result.stdout:
899 | raise BisyncError(
900 | f"{bisync_dir} is currently mounted via 'bm cloud mount'\n"
901 | f"Cannot use mounted directory for bisync.\n\n"
902 | f"Either:\n"
903 | f" 1. Unmount first: bm cloud unmount\n"
904 | f" 2. Use different directory for bisync"
905 | )
906 | ```
907 |
908 | **Why This Matters:**
909 | - Mounting and syncing the SAME directory would create infinite loops
910 | - rclone mount → bisync detects changes → syncs to bucket → mount sees changes → triggers bisync → ∞
911 | - Separate directories = clean separation of concerns
912 | - Mount is read-heavy caching layer, bisync is write-heavy bidirectional sync
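
The first half of `validate_bisync_directory` reduces to a path comparison that can be checked without any mount in place; a minimal sketch:

```python
from pathlib import Path

def conflicts_with_mount(bisync_dir: Path,
                         mount_dir: Path = Path.home() / "basic-memory-cloud") -> bool:
    """True when the bisync directory is the fixed mount directory itself."""
    return bisync_dir.expanduser().resolve() == mount_dir.expanduser().resolve()

# Pointing bisync at the mount directory is rejected
assert conflicts_with_mount(Path.home() / "basic-memory-cloud")
# The default sync directory is fine
assert not conflicts_with_mount(Path.home() / "basic-memory-cloud-sync")
```

Resolving both paths first means aliases of the same location (symlinks, `~` spellings) are caught, not just literal string matches.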
913 |
914 | ### Future Enhancements
915 |
916 | **Phase 2 (Not in this spec):**
917 | - **Near Real-Time Sync**: Integrate `watch_service.py` with cloud mode
918 | - Watch service detects local changes (already battle-tested)
919 | - Queue changes in memory
920 | - Use `rclone copy` for individual file sync (near instant)
921 | - Example: `rclone copyto ~/sync/project/file.md tenant:{bucket}/project/file.md`
922 | - Fallback to full `rclone bisync` every N seconds for bidirectional changes
923 | - Provides near real-time sync without polling overhead
924 | - Per-project bisync profiles (different safety levels per project)
925 | - Selective project sync (exclude specific projects from sync)
926 | - Project deletion workflow (cascade to cloud/local)
927 | - Conflict resolution UI/CLI
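
The near real-time bullet above (watch service + per-file `rclone copy` + periodic bisync fallback) hinges on a debounced change queue. A sketch of that piece only, with the copier injected so no rclone call is made (`ChangeQueue` is hypothetical, not part of the current codebase):

```python
class ChangeQueue:
    """Debounce change events; flush each quiet path once via an injected copier."""

    def __init__(self, copier, debounce_seconds: float = 1.0):
        self.copier = copier                 # e.g. a one-file `rclone copyto` wrapper
        self.debounce = debounce_seconds
        self.pending: dict[str, float] = {}  # path -> last change timestamp

    def record(self, path: str, now: float) -> None:
        self.pending[path] = now             # repeated edits just push the timestamp

    def flush(self, now: float) -> list[str]:
        """Copy paths quiet for at least `debounce` seconds; return what flushed."""
        ready = sorted(p for p, t in self.pending.items() if now - t >= self.debounce)
        for path in ready:
            self.copier(path)
            del self.pending[path]
        return ready

copied: list[str] = []
q = ChangeQueue(copied.append, debounce_seconds=1.0)
q.record("project/file.md", now=0.0)
assert q.flush(now=0.5) == []                    # still inside the debounce window
assert q.flush(now=1.5) == ["project/file.md"]   # quiet long enough: copied once
assert copied == ["project/file.md"]
```

Debouncing collapses editor save bursts into a single copy, while the periodic full bisync still catches remote-side changes.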
928 |
929 | **Phase 3:**
930 | - Project sharing between tenants
931 | - Incremental backup/restore
932 | - Sync statistics and bandwidth monitoring
933 | - Mobile app integration with cloud mode
934 |
935 | ### Related Specs
936 |
937 | - **SPEC-8**: TigrisFS Integration - Original bisync implementation
938 | - **SPEC-6**: Explicit Project Parameter Architecture - Multi-project foundations
939 | - **SPEC-5**: CLI Cloud Upload via WebDAV - Cloud file operations
940 |
941 | ### Implementation Notes
942 |
943 | **Architectural Simplifications:**
944 | - **Unified CLI**: Eliminated duplicate commands by using mode toggle
945 | - **Single Entry Point**: All commands route through `async_client` which handles mode
946 | - **Config-Driven**: Cloud mode stored in persistent config, not just environment
947 | - **Transparent Routing**: Existing commands work without modification in cloud mode
948 |
949 | **Complexity Trade-offs:**
950 | - Removed: Separate `bm cloud project` command namespace
951 | - Removed: Complex state detection for new projects
952 | - Removed: RCLONE_TEST marker file management
953 | - Added: Simple cloud_mode flag and config integration
954 | - Added: Simple project list comparison before sync
955 | - Relied on: Existing bisync profile safety mechanisms
956 | - Result: Significantly simpler, more maintainable code
957 |
958 | **User Experience:**
959 | - **Mental Model**: "Toggle cloud mode, use normal commands"
960 | - **No Learning Curve**: Same commands work locally and in cloud
961 | - **Minimal Config**: Just login/logout to switch modes
962 | - **Safety**: Profile system gives users control over safety/speed trade-offs
963 | - **"Just Works"**: Create folders anywhere, they sync automatically
964 |
965 | **Migration Path:**
966 | - Existing `bm cloud project` users: Use `bm project` instead
967 | - Existing `bm cloud bisync` becomes `bm sync` in cloud mode
968 | - Config automatically migrates on first `bm cloud login`
969 |
970 |
971 | ## Testing
972 |
973 |
974 | ### Initial Setup (One Time)
975 |
976 | 1. Login to cloud and enable cloud mode:
977 | bm cloud login
978 | # → Authenticates via OAuth
979 | # → Sets cloud_mode=true in config
980 | # → Sets BASIC_MEMORY_PROXY_URL environment variable
981 | # → All CLI commands now route to cloud
982 |
983 | 2. Check cloud mode status:
984 | bm cloud status
985 | # → Shows: Mode: Cloud (enabled)
986 | # → Shows: Host: https://cloud.basicmemory.com
987 | # → Checks cloud health
988 |
989 | 3. Set up bidirectional sync:
990 | bm cloud bisync-setup
991 | # Or with custom directory:
992 | bm cloud bisync-setup --dir ~/my-sync-folder
993 |
994 | # This will:
995 | # → Install rclone (if not already installed)
996 | # → Get tenant info (tenant_id, bucket_name)
997 | # → Generate scoped IAM credentials
998 | # → Configure rclone with credentials
999 | # → Create sync directory (default: ~/basic-memory-cloud-sync/)
1000 | # → Validate no conflict with mount directory
1001 | # → Run initial --resync to establish baseline
1002 |
1003 | ### Normal Usage
1004 |
1005 | 4. Create local project and sync:
1006 | # Create a local project directory
1007 | mkdir ~/basic-memory-cloud-sync/my-research
1008 | echo "# Research Notes" > ~/basic-memory-cloud-sync/my-research/readme.md
1009 |
1010 | # Run sync
1011 | bm cloud bisync
1012 |
1013 | # Auto-magic happens:
1014 | # → Checks for new local directories
1015 | # → Finds "my-research" not in cloud
1016 | # → Creates project on cloud via POST /proxy/projects/projects
1017 | # → Runs bidirectional sync (all projects)
1018 | # → Syncs to bucket root (all projects synced together)
1019 |
1020 | 5. Watch mode for continuous sync:
1021 | bm cloud bisync --watch
1022 | # Or with custom interval:
1023 | bm cloud bisync --watch --interval 30
1024 |
1025 | # → Syncs every 60 seconds (or custom interval)
1026 | # → Auto-registers new projects on each run
1027 | # → Press Ctrl+C to stop
1028 |
1029 | 6. Check bisync status:
1030 | bm cloud bisync-status
1031 | # → Shows tenant ID
1032 | # → Shows sync directory path
1033 | # → Shows initialization status
1034 | # → Shows last sync time
1035 | # → Lists available profiles (safe/balanced/fast)
1036 |
1037 | 7. Manual sync with different profiles:
1038 | # Safe mode (max 10 deletes, preserves conflicts)
1039 | bm cloud bisync --profile safe
1040 |
1041 | # Balanced mode (max 25 deletes, auto-resolve to newer) - default
1042 | bm cloud bisync --profile balanced
1043 |
1044 | # Fast mode (max 50 deletes, skip verification)
1045 | bm cloud bisync --profile fast
1046 |
1047 | 8. Dry run to preview changes:
1048 | bm cloud bisync --dry-run
1049 | # → Shows what would be synced without making changes
1050 |
1051 | 9. Force resync (if needed):
1052 | bm cloud bisync --resync
1053 | # → Establishes new baseline
1054 | # → Use if sync state is corrupted
1055 |
1056 | 10. Check file integrity:
1057 | bm cloud check
1058 | # → Verifies all files match between local and cloud
1059 | # → Read-only operation (no data transfer)
1060 | # → Shows differences if any found
1061 |
1062 | # Faster one-way check
1063 | bm cloud check --one-way
1064 | # → Only checks for missing files on destination
1065 |
1066 | ### Verify Cloud Mode Integration
1067 |
1068 | 11. Test that all commands work in cloud mode:
1069 | # List cloud projects (not local)
1070 | bm project list
1071 |
1072 | # Create project on cloud
1073 | bm project add "work-notes"
1074 |
1075 | # Use MCP tools against cloud
1076 | bm tool write-note --title "Test" --folder "my-research" --content "Hello"
1077 |
1078 | # All of these work against cloud because cloud_mode=true
1079 |
1080 | 12. Switch back to local mode:
1081 | bm cloud logout
1082 | # → Sets cloud_mode=false
1083 | # → Clears BASIC_MEMORY_PROXY_URL
1084 | # → All commands now work locally again
1085 |
1086 | ### Expected Directory Structure
1087 |
1088 | ~/basic-memory-cloud-sync/ # Your local sync directory
1089 | ├── my-research/ # Auto-created cloud project
1090 | │ ├── readme.md
1091 | │ └── notes.md
1092 | ├── work-notes/ # Another project
1093 | │ └── tasks.md
1094 | └── personal/ # Another project
1095 | └── journal.md
1096 |
1097 | # All sync bidirectionally with:
1098 | bucket:/ # Cloud bucket root
1099 | ├── my-research/
1100 | ├── work-notes/
1101 | └── personal/
1102 |
1103 | ### Key Points to Test
1104 |
1105 | 1. ✅ Cloud mode toggle works (login/logout)
1106 | 2. ✅ Bisync setup validates directory (no conflict with mount)
1107 | 3. ✅ Local directories auto-create cloud projects
1108 | 4. ✅ All projects sync together (bucket root)
1109 | 5. ✅ No RCLONE_TEST files created
1110 | 6. ✅ Changes sync bidirectionally
1111 | 7. ✅ Watch mode continuous sync works
1112 | 8. ✅ Profile safety limits work (max_delete)
1113 | 9. ✅ `bm sync` adapts to cloud mode automatically
1114 | 10. ✅ `bm cloud check` verifies file integrity without side effects
1115 |
```
--------------------------------------------------------------------------------
/tests/mcp/test_tool_write_note.py:
--------------------------------------------------------------------------------
```python
1 | """Tests for note tools that exercise the full stack with SQLite."""
2 |
3 | from textwrap import dedent
4 | import pytest
5 |
6 | from basic_memory.mcp.tools import write_note, read_note, delete_note
7 | from basic_memory.utils import normalize_newlines
8 |
9 |
10 | @pytest.mark.asyncio
11 | async def test_write_note(app, test_project):
12 | """Test creating a new note.
13 |
14 | Should:
15 | - Create entity with correct type and content
16 | - Save markdown content
17 | - Handle tags correctly
18 | - Return valid permalink
19 | """
20 | result = await write_note.fn(
21 | project=test_project.name,
22 | title="Test Note",
23 | folder="test",
24 | content="# Test\nThis is a test note",
25 | tags=["test", "documentation"],
26 | )
27 |
28 | assert result
29 | assert "# Created note" in result
30 | assert f"project: {test_project.name}" in result
31 | assert "file_path: test/Test Note.md" in result
32 | assert "permalink: test/test-note" in result
33 | assert "## Tags" in result
34 | assert "- test, documentation" in result
35 | assert f"[Session: Using project '{test_project.name}']" in result
36 |
37 | # Try reading it back via permalink
38 | content = await read_note.fn("test/test-note", project=test_project.name)
39 | assert (
40 | normalize_newlines(
41 | dedent("""
42 | ---
43 | title: Test Note
44 | type: note
45 | permalink: test/test-note
46 | tags:
47 | - test
48 | - documentation
49 | ---
50 |
51 | # Test
52 | This is a test note
53 | """).strip()
54 | )
55 | in content
56 | )
57 |
58 |
59 | @pytest.mark.asyncio
60 | async def test_write_note_no_tags(app, test_project):
61 | """Test creating a note without tags."""
62 | result = await write_note.fn(
63 | project=test_project.name, title="Simple Note", folder="test", content="Just some text"
64 | )
65 |
66 | assert result
67 | assert "# Created note" in result
68 | assert f"project: {test_project.name}" in result
69 | assert "file_path: test/Simple Note.md" in result
70 | assert "permalink: test/simple-note" in result
71 | assert f"[Session: Using project '{test_project.name}']" in result
72 | # Should be able to read it back
73 | content = await read_note.fn("test/simple-note", project=test_project.name)
74 | assert (
75 | normalize_newlines(
76 | dedent("""
77 | ---
78 | title: Simple Note
79 | type: note
80 | permalink: test/simple-note
81 | ---
82 |
83 | Just some text
84 | """).strip()
85 | )
86 | in content
87 | )
88 |
89 |
90 | @pytest.mark.asyncio
91 | async def test_write_note_update_existing(app, test_project):
92 |     """Test updating an existing note.
93 | 
94 |     Should:
95 |     - Create the note on first write
96 |     - Report "Updated note" on the second write
97 |     - Preserve tags and permalink across the update
98 |     - Save the updated markdown content
99 |     """
100 | result = await write_note.fn(
101 | project=test_project.name,
102 | title="Test Note",
103 | folder="test",
104 | content="# Test\nThis is a test note",
105 | tags=["test", "documentation"],
106 | )
107 |
108 | assert result # Got a valid permalink
109 | assert "# Created note" in result
110 | assert f"project: {test_project.name}" in result
111 | assert "file_path: test/Test Note.md" in result
112 | assert "permalink: test/test-note" in result
113 | assert "## Tags" in result
114 | assert "- test, documentation" in result
115 | assert f"[Session: Using project '{test_project.name}']" in result
116 |
117 | result = await write_note.fn(
118 | project=test_project.name,
119 | title="Test Note",
120 | folder="test",
121 | content="# Test\nThis is an updated note",
122 | tags=["test", "documentation"],
123 | )
124 | assert "# Updated note" in result
125 | assert f"project: {test_project.name}" in result
126 | assert "file_path: test/Test Note.md" in result
127 | assert "permalink: test/test-note" in result
128 | assert "## Tags" in result
129 | assert "- test, documentation" in result
130 | assert f"[Session: Using project '{test_project.name}']" in result
131 |
132 | # Try reading it back
133 | content = await read_note.fn("test/test-note", project=test_project.name)
134 | assert (
135 | normalize_newlines(
136 | dedent(
137 | """
138 | ---
139 | title: Test Note
140 | type: note
141 | permalink: test/test-note
142 | tags:
143 | - test
144 | - documentation
145 | ---
146 |
147 | # Test
148 | This is an updated note
149 | """
150 | ).strip()
151 | )
152 | == content
153 | )
154 |
155 |
156 | @pytest.mark.asyncio
157 | async def test_issue_93_write_note_respects_custom_permalink_new_note(app, test_project):
158 | """Test that write_note respects custom permalinks in frontmatter for new notes (Issue #93)"""
159 |
160 | # Create a note with custom permalink in frontmatter
161 | content_with_custom_permalink = dedent("""
162 | ---
163 | permalink: custom/my-desired-permalink
164 | ---
165 |
166 | # My New Note
167 |
168 | This note has a custom permalink specified in frontmatter.
169 |
170 | - [note] Testing if custom permalink is respected
171 | """).strip()
172 |
173 | result = await write_note.fn(
174 | project=test_project.name,
175 | title="My New Note",
176 | folder="notes",
177 | content=content_with_custom_permalink,
178 | )
179 |
180 | # Verify the custom permalink is respected
181 | assert "# Created note" in result
182 | assert f"project: {test_project.name}" in result
183 | assert "file_path: notes/My New Note.md" in result
184 | assert "permalink: custom/my-desired-permalink" in result
185 | assert f"[Session: Using project '{test_project.name}']" in result
186 |
187 |
188 | @pytest.mark.asyncio
189 | async def test_issue_93_write_note_respects_custom_permalink_existing_note(app, test_project):
190 | """Test that write_note respects custom permalinks when updating existing notes (Issue #93)"""
191 |
192 | # Step 1: Create initial note (auto-generated permalink)
193 | result1 = await write_note.fn(
194 | project=test_project.name,
195 | title="Existing Note",
196 | folder="test",
197 | content="Initial content without custom permalink",
198 | )
199 |
200 | assert "# Created note" in result1
201 | assert f"project: {test_project.name}" in result1
202 |
203 | # Extract the auto-generated permalink
204 | initial_permalink = None
205 | for line in result1.split("\n"):
206 | if line.startswith("permalink:"):
207 | initial_permalink = line.split(":", 1)[1].strip()
208 | break
209 |
210 | assert initial_permalink is not None
211 |
212 | # Step 2: Update with content that includes custom permalink in frontmatter
213 | updated_content = dedent("""
214 | ---
215 | permalink: custom/new-permalink
216 | ---
217 |
218 | # Existing Note
219 |
220 | Updated content with custom permalink in frontmatter.
221 |
222 | - [note] Custom permalink should be respected on update
223 | """).strip()
224 |
225 | result2 = await write_note.fn(
226 | project=test_project.name,
227 | title="Existing Note",
228 | folder="test",
229 | content=updated_content,
230 | )
231 |
232 | # Verify the custom permalink is respected
233 | assert "# Updated note" in result2
234 | assert f"project: {test_project.name}" in result2
235 | assert "permalink: custom/new-permalink" in result2
236 | assert f"permalink: {initial_permalink}" not in result2
237 | assert f"[Session: Using project '{test_project.name}']" in result2
238 |
239 |
240 | @pytest.mark.asyncio
241 | async def test_delete_note_existing(app, test_project):
242 |     """Test deleting an existing note.
243 | 
244 |     Should:
245 |     - Create a note
246 |     - Delete it by permalink
247 |     - Verify the delete returns True
248 |     """
249 | result = await write_note.fn(
250 | project=test_project.name,
251 | title="Test Note",
252 | folder="test",
253 | content="# Test\nThis is a test note",
254 | tags=["test", "documentation"],
255 | )
256 |
257 | assert result
258 | assert f"project: {test_project.name}" in result
259 |
260 | deleted = await delete_note.fn("test/test-note", project=test_project.name)
261 | assert deleted is True
262 |
263 |
264 | @pytest.mark.asyncio
265 | async def test_delete_note_doesnt_exist(app, test_project):
266 |     """Test deleting a note that doesn't exist.
267 |
268 |     Should:
269 |     - Attempt to delete a nonexistent note
270 |     - Verify the call returns False
271 |     """
272 | deleted = await delete_note.fn("doesnt-exist", project=test_project.name)
273 | assert deleted is False
274 |
275 |
276 | @pytest.mark.asyncio
277 | async def test_write_note_with_tag_array_from_bug_report(app, test_project):
278 | """Test creating a note with a tag array as reported in issue #38.
279 |
280 | This reproduces the exact payload from the bug report where Cursor
281 | was passing an array of tags and getting a type mismatch error.
282 | """
283 | # This is the exact payload from the bug report
284 | bug_payload = {
285 | "project": test_project.name,
286 | "title": "Title",
287 | "folder": "folder",
288 | "content": "CONTENT",
289 | "tags": ["hipporag", "search", "fallback", "symfony", "error-handling"],
290 | }
291 |
292 | # Try to call the function with this data directly
293 | result = await write_note.fn(**bug_payload)
294 |
295 | assert result
296 | assert f"project: {test_project.name}" in result
297 | assert "permalink: folder/title" in result
298 | assert "Tags" in result
299 | assert "hipporag" in result
300 | assert f"[Session: Using project '{test_project.name}']" in result
301 |
302 |
303 | @pytest.mark.asyncio
304 | async def test_write_note_verbose(app, test_project):
305 |     """Test creating a new note and verify the verbose response summary.
306 |
307 | Should:
308 | - Create entity with correct type and content
309 | - Save markdown content
310 | - Handle tags correctly
311 | - Return valid permalink
312 | """
313 | result = await write_note.fn(
314 | project=test_project.name,
315 | title="Test Note",
316 | folder="test",
317 | content="""
318 | # Test\nThis is a test note
319 |
320 | - [note] First observation
321 | - relates to [[Knowledge]]
322 |
323 | """,
324 | tags=["test", "documentation"],
325 | )
326 |
327 | assert "# Created note" in result
328 | assert f"project: {test_project.name}" in result
329 | assert "file_path: test/Test Note.md" in result
330 | assert "permalink: test/test-note" in result
331 | assert "## Observations" in result
332 | assert "- note: 1" in result
333 | assert "## Relations" in result
334 | assert "## Tags" in result
335 | assert "- test, documentation" in result
336 | assert f"[Session: Using project '{test_project.name}']" in result
337 |
338 |
339 | @pytest.mark.asyncio
340 | async def test_write_note_preserves_custom_metadata(app, project_config, test_project):
341 | """Test that updating a note preserves custom metadata fields.
342 |
343 | Reproduces issue #36 where custom frontmatter fields like Status
344 | were being lost when updating notes with the write_note tool.
345 |
346 | Should:
347 | - Create a note with custom frontmatter
348 | - Update the note with new content
349 | - Verify custom frontmatter is preserved
350 | """
351 | # First, create a note with custom metadata using write_note
352 | await write_note.fn(
353 | project=test_project.name,
354 | title="Custom Metadata Note",
355 | folder="test",
356 | content="# Initial content",
357 | tags=["test"],
358 | )
359 |
360 | # Read the note to get its permalink
361 | content = await read_note.fn("test/custom-metadata-note", project=test_project.name)
362 |
363 | # Now directly update the file with custom frontmatter
364 | # We need to use a direct file update to add custom frontmatter
365 | import frontmatter
366 |
367 | file_path = project_config.home / "test" / "Custom Metadata Note.md"
368 | post = frontmatter.load(file_path)
369 |
370 | # Add custom frontmatter
371 | post["Status"] = "In Progress"
372 | post["Priority"] = "High"
373 | post["Version"] = "1.0"
374 |
375 | # Write the file back
376 | with open(file_path, "w") as f:
377 | f.write(frontmatter.dumps(post))
378 |
379 | # Now update the note using write_note
380 | result = await write_note.fn(
381 | project=test_project.name,
382 | title="Custom Metadata Note",
383 | folder="test",
384 | content="# Updated content",
385 | tags=["test", "updated"],
386 | )
387 |
388 | # Verify the update was successful
389 | assert (
390 |         f"Updated note\nproject: {test_project.name}\nfile_path: test/Custom Metadata Note.md"
391 | ) in result
392 | assert f"project: {test_project.name}" in result
393 |
394 | # Read the note back and check if custom frontmatter is preserved
395 | content = await read_note.fn("test/custom-metadata-note", project=test_project.name)
396 |
397 | # Custom frontmatter should be preserved
398 | assert "Status: In Progress" in content
399 | assert "Priority: High" in content
400 | # Version might be quoted as '1.0' due to YAML serialization
401 | assert "Version:" in content # Just check that the field exists
402 | assert "1.0" in content # And that the value exists somewhere
403 |
404 | # And new content should be there
405 | assert "# Updated content" in content
406 |
407 | # And tags should be updated (without # prefix)
408 | assert "- test" in content
409 | assert "- updated" in content
410 |
411 |
412 | @pytest.mark.asyncio
413 | async def test_write_note_preserves_content_frontmatter(app, test_project):
414 |     """Test that frontmatter provided in content is preserved when creating a note."""
415 | await write_note.fn(
416 | project=test_project.name,
417 | title="Test Note",
418 | folder="test",
419 | content=dedent(
420 | """
421 | ---
422 | title: Test Note
423 | type: note
424 | version: 1.0
425 | author: name
426 | ---
427 | # Test
428 |
429 | This is a test note
430 | """
431 | ),
432 | tags=["test", "documentation"],
433 | )
434 |
435 | # Try reading it back via permalink
436 | content = await read_note.fn("test/test-note", project=test_project.name)
437 | assert (
438 | normalize_newlines(
439 | dedent(
440 | """
441 | ---
442 | title: Test Note
443 | type: note
444 | permalink: test/test-note
445 | version: 1.0
446 | author: name
447 | tags:
448 | - test
449 | - documentation
450 | ---
451 |
452 | # Test
453 |
454 | This is a test note
455 | """
456 | ).strip()
457 | )
458 | in content
459 | )
460 |
461 |
462 | @pytest.mark.asyncio
463 | async def test_write_note_permalink_collision_fix_issue_139(app, test_project):
464 | """Test fix for GitHub Issue #139: UNIQUE constraint failed: entity.permalink.
465 |
466 | This reproduces the exact scenario described in the issue:
467 | 1. Create a note with title "Note 1"
468 | 2. Create another note with title "Note 2"
469 | 3. Try to create/replace first note again with same title "Note 1"
470 |
471 | Before the fix, step 3 would fail with UNIQUE constraint error.
472 | After the fix, it should either update the existing note or create with unique permalink.
473 | """
474 | # Step 1: Create first note
475 | result1 = await write_note.fn(
476 | project=test_project.name,
477 | title="Note 1",
478 | folder="test",
479 | content="Original content for note 1",
480 | )
481 | assert "# Created note" in result1
482 | assert f"project: {test_project.name}" in result1
483 | assert "permalink: test/note-1" in result1
484 |
485 | # Step 2: Create second note with different title
486 | result2 = await write_note.fn(
487 | project=test_project.name, title="Note 2", folder="test", content="Content for note 2"
488 | )
489 | assert "# Created note" in result2
490 | assert f"project: {test_project.name}" in result2
491 | assert "permalink: test/note-2" in result2
492 |
493 | # Step 3: Try to create/replace first note again
494 | # This scenario would trigger the UNIQUE constraint failure before the fix
495 | result3 = await write_note.fn(
496 | project=test_project.name,
497 | title="Note 1", # Same title as first note
498 | folder="test", # Same folder as first note
499 | content="Replacement content for note 1", # Different content
500 | )
501 |
502 | # This should not raise a UNIQUE constraint failure error
503 | # It should succeed and either:
504 | # 1. Update the existing note (preferred behavior)
505 | # 2. Create a new note with unique permalink (fallback behavior)
506 |
507 | assert result3 is not None
508 | assert f"project: {test_project.name}" in result3
509 | assert "Updated note" in result3 or "Created note" in result3
510 |
511 | # The result should contain either the original permalink or a unique one
512 | assert "permalink: test/note-1" in result3 or "permalink: test/note-1-1" in result3
513 |
514 | # Verify we can read back the content
515 | if "permalink: test/note-1" in result3:
516 | # Updated existing note case
517 | content = await read_note.fn("test/note-1", project=test_project.name)
518 | assert "Replacement content for note 1" in content
519 | else:
520 | # Created new note with unique permalink case
521 |         content = await read_note.fn("test/note-1-1", project=test_project.name)
522 | assert "Replacement content for note 1" in content
523 | # Original note should still exist
524 |         original_content = await read_note.fn("test/note-1", project=test_project.name)
525 | assert "Original content for note 1" in original_content
526 |
527 |
528 | @pytest.mark.asyncio
529 | async def test_write_note_with_custom_entity_type(app, test_project):
530 | """Test creating a note with custom entity_type parameter.
531 |
532 | This test verifies the fix for Issue #144 where entity_type parameter
533 | was hardcoded to "note" instead of allowing custom types.
534 | """
535 | result = await write_note.fn(
536 | project=test_project.name,
537 | title="Test Guide",
538 | folder="guides",
539 | content="# Guide Content\nThis is a guide",
540 | tags=["guide", "documentation"],
541 | entity_type="guide",
542 | )
543 |
544 | assert result
545 | assert "# Created note" in result
546 | assert f"project: {test_project.name}" in result
547 | assert "file_path: guides/Test Guide.md" in result
548 | assert "permalink: guides/test-guide" in result
549 | assert "## Tags" in result
550 | assert "- guide, documentation" in result
551 | assert f"[Session: Using project '{test_project.name}']" in result
552 |
553 | # Verify the entity type is correctly set in the frontmatter
554 | content = await read_note.fn("guides/test-guide", project=test_project.name)
555 | assert (
556 | normalize_newlines(
557 | dedent("""
558 | ---
559 | title: Test Guide
560 | type: guide
561 | permalink: guides/test-guide
562 | tags:
563 | - guide
564 | - documentation
565 | ---
566 |
567 | # Guide Content
568 | This is a guide
569 | """).strip()
570 | )
571 | in content
572 | )
573 |
574 |
575 | @pytest.mark.asyncio
576 | async def test_write_note_with_report_entity_type(app, test_project):
577 | """Test creating a note with entity_type="report"."""
578 | result = await write_note.fn(
579 | project=test_project.name,
580 | title="Monthly Report",
581 | folder="reports",
582 | content="# Monthly Report\nThis is a monthly report",
583 | tags=["report", "monthly"],
584 | entity_type="report",
585 | )
586 |
587 | assert result
588 | assert "# Created note" in result
589 | assert f"project: {test_project.name}" in result
590 | assert "file_path: reports/Monthly Report.md" in result
591 | assert "permalink: reports/monthly-report" in result
592 | assert f"[Session: Using project '{test_project.name}']" in result
593 |
594 | # Verify the entity type is correctly set in the frontmatter
595 | content = await read_note.fn("reports/monthly-report", project=test_project.name)
596 | assert "type: report" in content
597 | assert "# Monthly Report" in content
598 |
599 |
600 | @pytest.mark.asyncio
601 | async def test_write_note_with_config_entity_type(app, test_project):
602 | """Test creating a note with entity_type="config"."""
603 | result = await write_note.fn(
604 | project=test_project.name,
605 | title="System Config",
606 | folder="config",
607 | content="# System Configuration\nThis is a config file",
608 | entity_type="config",
609 | )
610 |
611 | assert result
612 | assert "# Created note" in result
613 | assert f"project: {test_project.name}" in result
614 | assert "file_path: config/System Config.md" in result
615 | assert "permalink: config/system-config" in result
616 | assert f"[Session: Using project '{test_project.name}']" in result
617 |
618 | # Verify the entity type is correctly set in the frontmatter
619 | content = await read_note.fn("config/system-config", project=test_project.name)
620 | assert "type: config" in content
621 | assert "# System Configuration" in content
622 |
623 |
624 | @pytest.mark.asyncio
625 | async def test_write_note_entity_type_default_behavior(app, test_project):
626 | """Test that the entity_type parameter defaults to "note" when not specified.
627 |
628 | This ensures backward compatibility - existing code that doesn't specify
629 | entity_type should continue to work as before.
630 | """
631 | result = await write_note.fn(
632 | project=test_project.name,
633 | title="Default Type Test",
634 | folder="test",
635 | content="# Default Type Test\nThis should be type 'note'",
636 | tags=["test"],
637 | )
638 |
639 | assert result
640 | assert "# Created note" in result
641 | assert f"project: {test_project.name}" in result
642 | assert "file_path: test/Default Type Test.md" in result
643 | assert "permalink: test/default-type-test" in result
644 | assert f"[Session: Using project '{test_project.name}']" in result
645 |
646 | # Verify the entity type defaults to "note"
647 | content = await read_note.fn("test/default-type-test", project=test_project.name)
648 | assert "type: note" in content
649 | assert "# Default Type Test" in content
650 |
651 |
652 | @pytest.mark.asyncio
653 | async def test_write_note_update_existing_with_different_entity_type(app, test_project):
654 | """Test updating an existing note with a different entity_type."""
655 | # Create initial note as "note" type
656 | result1 = await write_note.fn(
657 | project=test_project.name,
658 | title="Changeable Type",
659 | folder="test",
660 | content="# Initial Content\nThis starts as a note",
661 | tags=["test"],
662 | entity_type="note",
663 | )
664 |
665 | assert result1
666 | assert "# Created note" in result1
667 | assert f"project: {test_project.name}" in result1
668 |
669 | # Update the same note with a different entity_type
670 | result2 = await write_note.fn(
671 | project=test_project.name,
672 | title="Changeable Type",
673 | folder="test",
674 | content="# Updated Content\nThis is now a guide",
675 | tags=["guide"],
676 | entity_type="guide",
677 | )
678 |
679 | assert result2
680 | assert "# Updated note" in result2
681 | assert f"project: {test_project.name}" in result2
682 |
683 | # Verify the entity type was updated
684 | content = await read_note.fn("test/changeable-type", project=test_project.name)
685 | assert "type: guide" in content
686 | assert "# Updated Content" in content
687 | assert "- guide" in content
688 |
689 |
690 | @pytest.mark.asyncio
691 | async def test_write_note_respects_frontmatter_entity_type(app, test_project):
692 | """Test that entity_type in frontmatter is respected when parameter is not provided.
693 |
694 | This verifies that when write_note is called without entity_type parameter,
695 | but the content includes frontmatter with a 'type' field, that type is respected
696 | instead of defaulting to 'note'.
697 | """
698 | note = dedent("""
699 | ---
700 | title: Test Guide
701 | type: guide
702 | permalink: guides/test-guide
703 | tags:
704 | - guide
705 | - documentation
706 | ---
707 |
708 | # Guide Content
709 | This is a guide
710 | """).strip()
711 |
712 | # Call write_note without entity_type parameter - it should respect frontmatter type
713 | result = await write_note.fn(
714 | project=test_project.name, title="Test Guide", folder="guides", content=note
715 | )
716 |
717 | assert result
718 | assert "# Created note" in result
719 | assert f"project: {test_project.name}" in result
720 | assert "file_path: guides/Test Guide.md" in result
721 | assert "permalink: guides/test-guide" in result
722 | assert f"[Session: Using project '{test_project.name}']" in result
723 |
724 | # Verify the entity type from frontmatter is respected (should be "guide", not "note")
725 | content = await read_note.fn("guides/test-guide", project=test_project.name)
726 | assert "type: guide" in content
727 | assert "# Guide Content" in content
728 | assert "- guide" in content
729 | assert "- documentation" in content
730 |
731 |
732 | class TestWriteNoteSecurityValidation:
733 | """Test write_note security validation features."""
734 |
735 | @pytest.mark.asyncio
736 | async def test_write_note_blocks_path_traversal_unix(self, app, test_project):
737 | """Test that Unix-style path traversal attacks are blocked in folder parameter."""
738 | # Test various Unix-style path traversal patterns
739 | attack_folders = [
740 | "../",
741 | "../../",
742 | "../../../",
743 | "../secrets",
744 | "../../etc",
745 | "../../../etc/passwd_folder",
746 | "notes/../../../etc",
747 | "folder/../../outside",
748 | "../../../../malicious",
749 | ]
750 |
751 | for attack_folder in attack_folders:
752 | result = await write_note.fn(
753 | project=test_project.name,
754 | title="Test Note",
755 | folder=attack_folder,
756 | content="# Test Content\nThis should be blocked by security validation.",
757 | )
758 |
759 | assert isinstance(result, str)
760 | assert "# Error" in result
761 | assert "paths must stay within project boundaries" in result
762 | assert attack_folder in result
763 |
764 | @pytest.mark.asyncio
765 | async def test_write_note_blocks_path_traversal_windows(self, app, test_project):
766 | """Test that Windows-style path traversal attacks are blocked in folder parameter."""
767 | # Test various Windows-style path traversal patterns
768 | attack_folders = [
769 | "..\\",
770 | "..\\..\\",
771 | "..\\..\\..\\",
772 | "..\\secrets",
773 | "..\\..\\Windows",
774 | "..\\..\\..\\Windows\\System32",
775 | "notes\\..\\..\\..\\Windows",
776 | "\\\\server\\share",
777 | "\\\\..\\..\\Windows",
778 | ]
779 |
780 | for attack_folder in attack_folders:
781 | result = await write_note.fn(
782 | project=test_project.name,
783 | title="Test Note",
784 | folder=attack_folder,
785 | content="# Test Content\nThis should be blocked by security validation.",
786 | )
787 |
788 | assert isinstance(result, str)
789 | assert "# Error" in result
790 | assert "paths must stay within project boundaries" in result
791 | assert attack_folder in result
792 |
793 | @pytest.mark.asyncio
794 | async def test_write_note_blocks_absolute_paths(self, app, test_project):
795 | """Test that absolute paths are blocked in folder parameter."""
796 | # Test various absolute path patterns
797 | attack_folders = [
798 | "/etc",
799 | "/home/user",
800 | "/var/log",
801 | "/root",
802 | "C:\\Windows",
803 | "C:\\Users\\user",
804 | "D:\\secrets",
805 | "/tmp/malicious",
806 | "/usr/local/evil",
807 | ]
808 |
809 | for attack_folder in attack_folders:
810 | result = await write_note.fn(
811 | project=test_project.name,
812 | title="Test Note",
813 | folder=attack_folder,
814 | content="# Test Content\nThis should be blocked by security validation.",
815 | )
816 |
817 | assert isinstance(result, str)
818 | assert "# Error" in result
819 | assert "paths must stay within project boundaries" in result
820 | assert attack_folder in result
821 |
822 | @pytest.mark.asyncio
823 | async def test_write_note_blocks_home_directory_access(self, app, test_project):
824 | """Test that home directory access patterns are blocked in folder parameter."""
825 | # Test various home directory access patterns
826 | attack_folders = [
827 | "~",
828 | "~/",
829 | "~/secrets",
830 | "~/.ssh",
831 | "~/Documents",
832 | "~\\AppData",
833 | "~\\Desktop",
834 | "~/.env_folder",
835 | ]
836 |
837 | for attack_folder in attack_folders:
838 | result = await write_note.fn(
839 | project=test_project.name,
840 | title="Test Note",
841 | folder=attack_folder,
842 | content="# Test Content\nThis should be blocked by security validation.",
843 | )
844 |
845 | assert isinstance(result, str)
846 | assert "# Error" in result
847 | assert "paths must stay within project boundaries" in result
848 | assert attack_folder in result
849 |
850 | @pytest.mark.asyncio
851 | async def test_write_note_blocks_mixed_attack_patterns(self, app, test_project):
852 | """Test that mixed legitimate/attack patterns are blocked in folder parameter."""
853 | # Test mixed patterns that start legitimate but contain attacks
854 | attack_folders = [
855 | "notes/../../../etc",
856 | "docs/../../.env_folder",
857 | "legitimate/path/../../.ssh",
858 | "project/folder/../../../Windows",
859 | "valid/folder/../../home/user",
860 | "assets/../../../tmp/evil",
861 | ]
862 |
863 | for attack_folder in attack_folders:
864 | result = await write_note.fn(
865 | project=test_project.name,
866 | title="Test Note",
867 | folder=attack_folder,
868 | content="# Test Content\nThis should be blocked by security validation.",
869 | )
870 |
871 | assert isinstance(result, str)
872 | assert "# Error" in result
873 | assert "paths must stay within project boundaries" in result
874 |
875 | @pytest.mark.asyncio
876 | async def test_write_note_allows_safe_folder_paths(self, app, test_project):
877 | """Test that legitimate folder paths are still allowed."""
878 | # Test various safe folder patterns
879 | safe_folders = [
880 | "notes",
881 | "docs",
882 | "projects/2025",
883 | "archive/old-notes",
884 | "deep/nested/directory/structure",
885 | "folder/subfolder",
886 | "research/ml",
887 | "meeting-notes",
888 | ]
889 |
890 | for safe_folder in safe_folders:
891 | result = await write_note.fn(
892 | project=test_project.name,
893 | title=f"Test Note in {safe_folder.replace('/', '-')}",
894 | folder=safe_folder,
895 | content="# Test Content\nThis should work normally with security validation.",
896 | tags=["test", "security"],
897 | )
898 |
899 | # Should succeed (not a security error)
900 | assert isinstance(result, str)
901 | assert "# Error" not in result
902 | assert "paths must stay within project boundaries" not in result
903 | # Should be normal successful creation/update
904 | assert ("# Created note" in result) or ("# Updated note" in result)
905 | assert safe_folder in result # Should show in file_path
906 |
907 | @pytest.mark.asyncio
908 | async def test_write_note_empty_folder_security(self, app, test_project):
909 | """Test that empty folder parameter is handled securely."""
910 | # Empty folder should be allowed (creates in root)
911 | result = await write_note.fn(
912 | project=test_project.name,
913 | title="Root Note",
914 | folder="",
915 | content="# Root Note\nThis note should be created in the project root.",
916 | )
917 |
918 | assert isinstance(result, str)
919 | # Empty folder should not trigger security error
920 | assert "# Error" not in result
921 | assert "paths must stay within project boundaries" not in result
922 | # Should succeed normally
923 | assert ("# Created note" in result) or ("# Updated note" in result)
924 |
925 | @pytest.mark.asyncio
926 | async def test_write_note_none_folder_security(self, app, test_project):
927 | """Test that default folder behavior works securely when folder is omitted."""
928 | # The write_note function requires folder parameter, but we can test with empty string
929 | # which effectively creates in project root
930 | result = await write_note.fn(
931 | project=test_project.name,
932 | title="Root Folder Note",
933 | folder="", # Empty string instead of None since folder is required
934 | content="# Root Folder Note\nThis note should be created in the project root.",
935 | )
936 |
937 | assert isinstance(result, str)
938 | # Empty folder should not trigger security error
939 | assert "# Error" not in result
940 | assert "paths must stay within project boundaries" not in result
941 | # Should succeed normally
942 | assert ("# Created note" in result) or ("# Updated note" in result)
943 |
944 | @pytest.mark.asyncio
945 | async def test_write_note_current_directory_references_security(self, app, test_project):
946 | """Test that current directory references are handled securely."""
947 | # Test current directory references (should be safe)
948 | safe_folders = [
949 | "./notes",
950 | "folder/./subfolder",
951 | "./folder/subfolder",
952 | ]
953 |
954 | for safe_folder in safe_folders:
955 | result = await write_note.fn(
956 | project=test_project.name,
957 | title=f"Current Dir Test {safe_folder.replace('/', '-').replace('.', 'dot')}",
958 | folder=safe_folder,
959 | content="# Current Directory Test\nThis should work with current directory references.",
960 | )
961 |
962 | assert isinstance(result, str)
963 | # Should NOT contain security error message
964 | assert "# Error" not in result
965 | assert "paths must stay within project boundaries" not in result
966 | # Should succeed normally
967 | assert ("# Created note" in result) or ("# Updated note" in result)
968 |
969 | @pytest.mark.asyncio
970 | async def test_write_note_security_with_all_parameters(self, app, test_project):
971 | """Test security validation works with all write_note parameters."""
972 | # Test that security validation is applied even when all other parameters are provided
973 | result = await write_note.fn(
974 | project=test_project.name,
975 | title="Security Test with All Params",
976 | folder="../../../etc/malicious",
977 | content="# Malicious Content\nThis should be blocked by security validation.",
978 | tags=["malicious", "test"],
979 | entity_type="guide",
980 | )
981 |
982 | assert isinstance(result, str)
983 | assert "# Error" in result
984 | assert "paths must stay within project boundaries" in result
985 | assert "../../../etc/malicious" in result
986 |
987 | @pytest.mark.asyncio
988 | async def test_write_note_security_logging(self, app, test_project, caplog):
989 | """Test that security violations are properly logged."""
990 | # Attempt path traversal attack
991 | result = await write_note.fn(
992 | project=test_project.name,
993 | title="Security Logging Test",
994 | folder="../../../etc/passwd_folder",
995 | content="# Test Content\nThis should trigger security logging.",
996 | )
997 |
998 | assert "# Error" in result
999 | assert "paths must stay within project boundaries" in result
1000 |
1001 | # Check that security violation was logged
1002 | # Note: This test may need adjustment based on the actual logging setup
1003 | # The security validation should generate a warning log entry
1004 |
1005 | @pytest.mark.asyncio
1006 | async def test_write_note_preserves_functionality_with_security(self, app, test_project):
1007 | """Test that security validation doesn't break normal note creation functionality."""
1008 | # Create a note with all features to ensure security validation doesn't interfere
1009 | result = await write_note.fn(
1010 | project=test_project.name,
1011 | title="Full Feature Security Test",
1012 | folder="security-tests",
1013 | content=dedent("""
1014 | # Full Feature Security Test
1015 |
1016 | This note tests that security validation doesn't break normal functionality.
1017 |
1018 | ## Observations
1019 | - [security] Path validation working correctly #security
1020 | - [feature] All features still functional #test
1021 |
1022 | ## Relations
1023 | - relates_to [[Security Implementation]]
1024 | - depends_on [[Path Validation]]
1025 |
1026 | Additional content with various formatting.
1027 | """).strip(),
1028 | tags=["security", "test", "full-feature"],
1029 | entity_type="guide",
1030 | )
1031 |
1032 | # Should succeed normally
1033 | assert isinstance(result, str)
1034 | assert "# Error" not in result
1035 | assert "paths must stay within project boundaries" not in result
1036 | assert "# Created note" in result
1037 | assert "file_path: security-tests/Full Feature Security Test.md" in result
1038 | assert "permalink: security-tests/full-feature-security-test" in result
1039 |
1040 | # Should process observations and relations
1041 | assert "## Observations" in result
1042 | assert "## Relations" in result
1043 | assert "## Tags" in result
1044 |
1045 | # Should show proper counts
1046 | assert "security: 1" in result
1047 | assert "feature: 1" in result
1048 |
1049 |
1050 | class TestWriteNoteSecurityEdgeCases:
1051 | """Test edge cases for write_note security validation."""
1052 |
1053 | @pytest.mark.asyncio
1054 | async def test_write_note_unicode_folder_attacks(self, app, test_project):
1055 | """Test that Unicode-based path traversal attempts are blocked."""
1056 | # Test Unicode path traversal attempts
1057 | unicode_attack_folders = [
1058 | "notes/文档/../../../etc", # Chinese characters
1059 | "docs/café/../../secrets", # Accented characters
1060 | "files/αβγ/../../../malicious", # Greek characters
1061 | ]
1062 |
1063 | for attack_folder in unicode_attack_folders:
1064 | result = await write_note.fn(
1065 | project=test_project.name,
1066 | title="Unicode Attack Test",
1067 | folder=attack_folder,
1068 | content="# Unicode Attack\nThis should be blocked.",
1069 | )
1070 |
1071 | assert isinstance(result, str)
1072 | assert "# Error" in result
1073 | assert "paths must stay within project boundaries" in result
1074 |
1075 | @pytest.mark.asyncio
1076 | async def test_write_note_very_long_attack_folder(self, app, test_project):
1077 | """Test handling of very long attack folder paths."""
1078 | # Create a very long path traversal attack
1079 | long_attack_folder = "../" * 1000 + "etc/malicious"
1080 |
1081 | result = await write_note.fn(
1082 | project=test_project.name,
1083 | title="Long Attack Test",
1084 | folder=long_attack_folder,
1085 | content="# Long Attack\nThis should be blocked.",
1086 | )
1087 |
1088 | assert isinstance(result, str)
1089 | assert "# Error" in result
1090 | assert "paths must stay within project boundaries" in result
1091 |
1092 | @pytest.mark.asyncio
1093 | async def test_write_note_case_variations_attacks(self, app, test_project):
1094 | """Test that case variations don't bypass security."""
1095 | # Test case variations (though case sensitivity depends on filesystem)
1096 | case_attack_folders = [
1097 | "../ETC",
1098 | "../Etc/SECRETS",
1099 | "..\\WINDOWS",
1100 | "~/SECRETS",
1101 | ]
1102 |
1103 | for attack_folder in case_attack_folders:
1104 | result = await write_note.fn(
1105 | project=test_project.name,
1106 | title="Case Variation Attack Test",
1107 | folder=attack_folder,
1108 | content="# Case Attack\nThis should be blocked.",
1109 | )
1110 |
1111 | assert isinstance(result, str)
1112 | assert "# Error" in result
1113 | assert "paths must stay within project boundaries" in result
1114 |
1115 | @pytest.mark.asyncio
1116 | async def test_write_note_whitespace_in_attack_folders(self, app, test_project):
1117 | """Test that whitespace doesn't help bypass security."""
1118 | # Test attack folders with various whitespace
1119 | whitespace_attack_folders = [
1120 | " ../../../etc ",
1121 | "\t../../../secrets\t",
1122 | " ..\\..\\Windows ",
1123 | "notes/ ../../ malicious",
1124 | ]
1125 |
1126 | for attack_folder in whitespace_attack_folders:
1127 | result = await write_note.fn(
1128 | project=test_project.name,
1129 | title="Whitespace Attack Test",
1130 | folder=attack_folder,
1131 | content="# Whitespace Attack\nThis should be blocked.",
1132 | )
1133 |
1134 | assert isinstance(result, str)
1135 | # The attack should still be blocked even with whitespace
1136 | if ".." in attack_folder.strip() or "~" in attack_folder.strip():
1137 | assert "# Error" in result
1138 | assert "paths must stay within project boundaries" in result
1139 |
```
--------------------------------------------------------------------------------
/specs/SPEC-17 Semantic Search with ChromaDB.md:
--------------------------------------------------------------------------------
```markdown
1 | ---
2 | title: 'SPEC-17: Semantic Search with ChromaDB'
3 | type: spec
4 | permalink: specs/spec-17-semantic-search-chromadb
5 | tags:
6 | - search
7 | - chromadb
8 | - semantic-search
9 | - vector-database
10 | - postgres-migration
11 | ---
12 |
13 | # SPEC-17: Semantic Search with ChromaDB
14 |
15 | ## Why ChromaDB for Knowledge Management
16 |
17 | Your users aren't just searching for keywords - they're trying to:
18 | - "Find notes related to this concept"
19 | - "Show me similar ideas"
20 | - "What else did I write about this topic?"
21 |
22 | Example:
23 | User searches: **"AI ethics"**
24 | 
25 | FTS5/MeiliSearch finds:
26 | - "AI ethics guidelines" ✅
27 | - "ethical AI development" ✅
28 | - "artificial intelligence" ❌ No keyword match
29 | 
30 | ChromaDB finds:
31 | - "AI ethics guidelines" ✅
32 | - "ethical AI development" ✅
33 | - "artificial intelligence" ✅ Semantic match!
34 | - "bias in ML models" ✅ Related concept
35 | - "responsible technology" ✅ Similar theme
36 | - "neural network fairness" ✅ Connected idea
37 |
38 | ## ChromaDB vs MeiliSearch vs Typesense
39 |
40 | | Feature | ChromaDB | MeiliSearch | Typesense |
41 | |------------------|--------------------|--------------------|--------------------|
42 | | Semantic Search | ✅ Excellent | ❌ No | ❌ No |
43 | | Keyword Search | ⚠️ Via metadata | ✅ Excellent | ✅ Excellent |
44 | | Local Deployment | ✅ Embedded mode | ⚠️ Server required | ⚠️ Server required |
45 | | No Server Needed | ✅ YES! | ❌ No | ❌ No |
46 | | Embedding Cost | ~$0.13/1M tokens | None | None |
47 | | Search Speed | 50-200ms | 10-50ms | 10-50ms |
48 | | Best For | Semantic discovery | Exact terms | Exact terms |
49 |
50 | ## The Killer Feature: Embedded Mode
51 |
52 | ChromaDB has an embedded client that runs in-process - NO SERVER NEEDED!
53 | ```python
54 | # Local (FOSS) - ChromaDB embedded in Python process
55 | import chromadb
56 | 
57 | client = chromadb.PersistentClient(path="/path/to/chroma_data")
58 | collection = client.get_or_create_collection("knowledge_base")
59 | 
60 | # Add documents
61 | collection.add(
62 |     ids=["note1", "note2"],
63 |     documents=["AI ethics", "Neural networks"],
64 |     metadatas=[{"type": "note"}, {"type": "spec"}]
65 | )
66 | 
67 | # Search - NO API calls, runs locally!
68 | results = collection.query(
69 |     query_texts=["machine learning"],
70 |     n_results=10
71 | )
72 | ```
73 |
74 | ## Why
75 |
76 | ### Current Problem: Database Persistence in Cloud
77 | In cloud deployments, `memory.db` (SQLite) doesn't persist across Docker container restarts. This means:
78 | - Database must be rebuilt on every container restart
79 | - Initial sync takes ~49 seconds for 500 files (after optimization in #352)
80 | - Users experience delays on each deployment
81 |
82 | ### Search Architecture Issues
83 | Current SQLite FTS5 implementation creates a **dual-implementation problem** for PostgreSQL migration:
84 | - FTS5 (SQLite) uses `VIRTUAL TABLE` with `MATCH` queries
85 | - PostgreSQL full-text search uses `TSVECTOR` with `@@` operator
86 | - These are fundamentally incompatible architectures
87 | - Would require **2x search code** and **2x tests** to support both
88 |
89 | **Example of incompatibility:**
90 | ```python
91 | # SQLite FTS5
92 | "content_stems MATCH :text"
93 |
94 | # PostgreSQL
95 | "content_vector @@ plainto_tsquery(:text)"
96 | ```
97 |
98 | ### Search Quality Limitations
99 | Current keyword-based FTS5 has limitations:
100 | - No semantic understanding (search "AI" doesn't find "machine learning")
101 | - No word relationships (search "neural networks" doesn't find "deep learning")
102 | - Limited typo tolerance
103 | - No relevance ranking beyond keyword matching
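
The keyword gap is easy to see with a minimal sketch. The helper below is hypothetical, not Basic Memory code: a naive FTS-style matcher misses documents that share meaning with the query but no tokens.

```python
def keyword_match(query: str, document: str) -> bool:
    """Naive FTS-style match: every query token must appear in the document."""
    doc_tokens = set(document.lower().split())
    return all(token in doc_tokens for token in query.lower().split())


docs = ["AI ethics guidelines", "bias in machine learning models"]

# Keyword search for "AI" only hits the document containing the literal token;
# a semantic index would also surface the machine-learning document.
hits = [d for d in docs if keyword_match("AI", d)]
```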
104 |
105 | ### Strategic Goal: PostgreSQL Migration
106 | Moving to PostgreSQL (Neon) for cloud deployments would:
107 | - ✅ Solve persistence issues (database survives restarts)
108 | - ✅ Enable multi-tenant architecture
109 | - ✅ Better performance for large datasets
110 | - ✅ Support for cloud-native scaling
111 |
112 | **But requires solving the search compatibility problem.**
113 |
114 | ## What
115 |
116 | Migrate from SQLite FTS5 to **ChromaDB** for semantic vector search across all deployments.
117 |
118 | **Key insight:** ChromaDB is **database-agnostic** - it works with both SQLite and PostgreSQL, eliminating the dual-implementation problem.
119 |
120 | ### Affected Areas
121 | - Search implementation (`src/basic_memory/repository/search_repository.py`)
122 | - Search service (`src/basic_memory/services/search_service.py`)
123 | - Search models (`src/basic_memory/models/search.py`)
124 | - Database initialization (`src/basic_memory/db.py`)
125 | - MCP search tools (`src/basic_memory/mcp/tools/search.py`)
126 | - Dependencies (`pyproject.toml` - add ChromaDB)
127 | - Alembic migrations (FTS5 table removal)
128 | - Documentation
129 |
130 | ### What Changes
131 | **Removed:**
132 | - SQLite FTS5 virtual table
133 | - `MATCH` query syntax
134 | - FTS5-specific tokenization and prefix handling
135 | - ~300 lines of FTS5 query preparation code
136 |
137 | **Added:**
138 | - ChromaDB persistent client (embedded mode)
139 | - Vector embedding generation
140 | - Semantic similarity search
141 | - Local embedding model (`sentence-transformers`)
142 | - Collection management for multi-project support
143 |
144 | ### What Stays the Same
145 | - Search API interface (MCP tools, REST endpoints)
146 | - Entity/Observation/Relation indexing workflow
147 | - Multi-project isolation
148 | - Search filtering by type, date, metadata
149 | - Pagination and result formatting
150 | - **All SQL queries for exact lookups and metadata filtering**
151 |
152 | ## Hybrid Architecture: SQL + ChromaDB
153 |
154 | **Critical Design Decision:** ChromaDB **complements** SQL; it doesn't **replace** it.
155 |
156 | ### Why Hybrid?
157 |
158 | ChromaDB is excellent for semantic text search but terrible for exact lookups. SQL is perfect for exact lookups and structured queries. We use both:
159 |
160 | ```
161 | ┌─────────────────────────────────────────────────┐
162 | │ Search Request │
163 | └─────────────────────────────────────────────────┘
164 | ▼
165 | ┌────────────────────────┐
166 | │ SearchRepository │
167 | │ (Smart Router) │
168 | └────────────────────────┘
169 | ▼ ▼
170 | ┌───────────┐ ┌──────────────┐
171 | │ SQL │ │ ChromaDB │
172 | │ Queries │ │ Semantic │
173 | └───────────┘ └──────────────┘
174 | ▼ ▼
175 | Exact lookups Text search
176 | - Permalink - Semantic similarity
177 | - Pattern match - Related concepts
178 | - Title exact - Typo tolerance
179 | - Metadata filter - Fuzzy matching
180 | - Date ranges
181 | ```
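
The router in the diagram reduces to a small decision function. This is a sketch with hypothetical names (`route_query` and its parameters are illustrative; the real `SearchRepository` carries more state):

```python
from typing import Optional


def route_query(
    search_text: Optional[str] = None,
    permalink: Optional[str] = None,
    permalink_match: Optional[str] = None,
    has_metadata_filters: bool = False,
) -> str:
    """Pick a backend the way the diagram above describes."""
    if permalink or permalink_match:
        return "sql"     # exact or glob permalink lookup
    if search_text:
        return "chroma"  # semantic text search (with optional filters)
    if has_metadata_filters:
        return "sql"     # pure structured query, no text
    return "sql"         # list-all fallback
```

Text search always routes to ChromaDB, everything else to SQL; metadata filters ride along with whichever backend wins.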
182 |
183 | ### When to Use Each
184 |
185 | #### Use SQL For (Fast & Exact)
186 |
187 | **Exact Permalink Lookup:**
188 | ```python
189 | # Find by exact permalink - SQL wins
190 | "SELECT * FROM entities WHERE permalink = 'specs/search-feature'"
191 | # ~1ms, perfect for exact matches
192 |
193 | # ChromaDB would be: ~50ms, wasteful
194 | ```
195 |
196 | **Pattern Matching:**
197 | ```python
198 | # Find all specs - SQL wins
199 | "SELECT * FROM entities WHERE permalink GLOB 'specs/*'"
200 | # ~5ms, perfect for wildcards
201 |
202 | # ChromaDB doesn't support glob patterns
203 | ```
204 |
205 | **Pure Metadata Queries:**
206 | ```python
207 | # Find all meetings tagged "important" - SQL wins
208 | """SELECT * FROM entities
209 |    WHERE json_extract(entity_metadata, '$.entity_type') = 'meeting'
210 |    AND json_extract(entity_metadata, '$.tags') LIKE '%important%'"""
211 | # ~5ms, structured query
212 |
213 | # No text search needed, SQL is faster and simpler
214 | ```
215 |
216 | **Date Filtering:**
217 | ```python
218 | # Find recent specs - SQL wins
219 | """SELECT * FROM entities
220 |    WHERE entity_type = 'spec'
221 |    AND created_at > '2024-01-01'
222 |    ORDER BY created_at DESC"""
223 | # ~2ms, perfect for structured data
224 | ```
225 |
226 | #### Use ChromaDB For (Semantic & Fuzzy)
227 |
228 | **Semantic Content Search:**
229 | ```python
230 | # Find notes about "neural networks" - ChromaDB wins
231 | collection.query(query_texts=["neural networks"])
232 | # Finds: "machine learning", "deep learning", "AI models"
233 | # ~50-100ms, semantic understanding
234 |
235 | # SQL FTS5 would only find exact keyword matches
236 | ```
237 |
238 | **Text Search + Metadata:**
239 | ```python
240 | # Find meeting notes about "project planning" tagged "important"
241 | collection.query(
242 | query_texts=["project planning"],
243 | where={
244 | "entity_type": "meeting",
245 | "tags": {"$contains": "important"}
246 | }
247 | )
248 | # ~100ms, semantic search with filters
249 | # Finds: "roadmap discussion", "sprint planning", etc.
250 | ```
251 |
252 | **Typo Tolerance:**
253 | ```python
254 | # User types "serch feature" (typo) - ChromaDB wins
255 | collection.query(query_texts=["serch feature"])
256 | # Still finds: "search feature" documents
257 | # ~50-100ms, fuzzy matching
258 |
259 | # SQL would find nothing
260 | ```
261 |
262 | ### Performance Comparison
263 |
264 | | Query Type | SQL | ChromaDB | Winner |
265 | |-----------|-----|----------|--------|
266 | | Exact permalink | 1-2ms | 50ms | ✅ SQL |
267 | | Pattern match (specs/*) | 5-10ms | N/A | ✅ SQL |
268 | | Pure metadata filter | 5ms | 50ms | ✅ SQL |
269 | | Semantic text search | ❌ Can't | 50-100ms | ✅ ChromaDB |
270 | | Text + metadata | ❌ Keywords only | 100ms | ✅ ChromaDB |
271 | | Typo tolerance | ❌ Can't | 50ms | ✅ ChromaDB |
272 |
273 | ### Metadata/Frontmatter Handling
274 |
275 | **Both systems support full frontmatter filtering!**
276 |
277 | #### SQL Metadata Storage
278 |
279 | ```sql
280 | -- Entities table stores frontmatter as JSON
281 | CREATE TABLE entities (
282 |     id INTEGER PRIMARY KEY,
283 |     title TEXT,
284 |     permalink TEXT,
285 |     file_path TEXT,
286 |     entity_type TEXT,
287 |     entity_metadata JSON, -- All frontmatter here!
288 |     created_at DATETIME,
289 |     ...
290 | )
291 | 
292 | -- Query frontmatter fields
293 | SELECT * FROM entities
294 | WHERE json_extract(entity_metadata, '$.entity_type') = 'meeting'
295 | AND json_extract(entity_metadata, '$.tags') LIKE '%important%'
296 | AND json_extract(entity_metadata, '$.status') = 'completed'
297 | ```
298 |
299 | #### ChromaDB Metadata Storage
300 |
301 | ```python
302 | # When indexing, store ALL frontmatter as metadata
303 | class ChromaSearchBackend:
304 | async def index_entity(self, entity: Entity):
305 | """Index with complete frontmatter metadata."""
306 |
307 | # Extract ALL frontmatter fields
308 | metadata = {
309 | "entity_id": entity.id,
310 | "project_id": entity.project_id,
311 | "permalink": entity.permalink,
312 | "file_path": entity.file_path,
313 | "entity_type": entity.entity_type,
314 | "type": "entity",
315 | # ALL frontmatter tags
316 | "tags": entity.entity_metadata.get("tags", []),
317 | # Custom frontmatter fields
318 | "status": entity.entity_metadata.get("status"),
319 | "priority": entity.entity_metadata.get("priority"),
320 | # Spread any other custom fields
321 | **{k: v for k, v in entity.entity_metadata.items()
322 | if k not in ["tags", "entity_type"]}
323 | }
324 |
325 | self.collection.upsert(
326 | ids=[f"entity_{entity.id}_{entity.project_id}"],
327 | documents=[self._format_document(entity)],
328 | metadatas=[metadata] # Full frontmatter!
329 | )
330 | ```
331 |
332 | #### ChromaDB Metadata Queries
333 |
334 | ChromaDB supports rich filtering:
335 |
336 | ```python
337 | # Simple filter - single field
338 | collection.query(
339 | query_texts=["project planning"],
340 | where={"entity_type": "meeting"}
341 | )
342 |
343 | # Multiple conditions (AND)
344 | collection.query(
345 | query_texts=["architecture decisions"],
346 | where={
347 | "entity_type": "spec",
348 | "tags": {"$contains": "important"}
349 | }
350 | )
351 |
352 | # Complex filters with operators
353 | collection.query(
354 | query_texts=["machine learning"],
355 | where={
356 | "$and": [
357 | {"entity_type": {"$in": ["note", "spec"]}},
358 | {"tags": {"$contains": "AI"}},
359 | {"created_at": {"$gt": "2024-01-01"}},
360 | {"status": "in-progress"}
361 | ]
362 | }
363 | )
364 |
365 | # Multiple tags (all must match)
366 | collection.query(
367 | query_texts=["cloud architecture"],
368 | where={
369 | "$and": [
370 | {"tags": {"$contains": "architecture"}},
371 | {"tags": {"$contains": "cloud"}}
372 | ]
373 | }
374 | )
375 | ```
376 |
377 | ### Smart Routing Implementation
378 |
379 | ```python
380 | class SearchRepository:
381 | def __init__(
382 | self,
383 | session_maker: async_sessionmaker[AsyncSession],
384 | project_id: int,
385 | chroma_backend: ChromaSearchBackend
386 | ):
387 |         self.session_maker = session_maker  # Keep SQL!
388 | self.chroma = chroma_backend
389 | self.project_id = project_id
390 |
391 | async def search(
392 | self,
393 | search_text: Optional[str] = None,
394 | permalink: Optional[str] = None,
395 | permalink_match: Optional[str] = None,
396 | title: Optional[str] = None,
397 | types: Optional[List[str]] = None,
398 | tags: Optional[List[str]] = None,
399 | after_date: Optional[datetime] = None,
400 | custom_metadata: Optional[dict] = None,
401 | limit: int = 10,
402 | offset: int = 0,
403 | ) -> List[SearchIndexRow]:
404 | """Smart routing between SQL and ChromaDB."""
405 |
406 | # ==========================================
407 | # Route 1: Exact Lookups → SQL (1-5ms)
408 | # ==========================================
409 |
410 | if permalink:
411 | # Exact permalink: "specs/search-feature"
412 | return await self._sql_permalink_lookup(permalink)
413 |
414 | if permalink_match:
415 | # Pattern match: "specs/*"
416 | return await self._sql_pattern_match(permalink_match)
417 |
418 | if title and not search_text:
419 | # Exact title lookup (no semantic search needed)
420 | return await self._sql_title_match(title)
421 |
422 | # ==========================================
423 | # Route 2: Pure Metadata → SQL (5-10ms)
424 | # ==========================================
425 |
426 | # No text search, just filtering by metadata
427 | if not search_text and (types or tags or after_date or custom_metadata):
428 | return await self._sql_metadata_filter(
429 | types=types,
430 | tags=tags,
431 | after_date=after_date,
432 | custom_metadata=custom_metadata,
433 | limit=limit,
434 | offset=offset
435 | )
436 |
437 | # ==========================================
438 | # Route 3: Text Search → ChromaDB (50-100ms)
439 | # ==========================================
440 |
441 | if search_text:
442 | # Build ChromaDB metadata filters
443 | where_filters = self._build_chroma_filters(
444 | types=types,
445 | tags=tags,
446 | after_date=after_date,
447 | custom_metadata=custom_metadata
448 | )
449 |
450 | # Semantic search with metadata filtering
451 | return await self.chroma.search(
452 | query_text=search_text,
453 | project_id=self.project_id,
454 |                 filters=where_filters,
455 | limit=limit
456 | )
457 |
458 | # ==========================================
459 | # Route 4: List All → SQL (2-5ms)
460 | # ==========================================
461 |
462 | return await self._sql_list_entities(
463 | limit=limit,
464 | offset=offset
465 | )
466 |
467 | def _build_chroma_filters(
468 | self,
469 | types: Optional[List[str]] = None,
470 | tags: Optional[List[str]] = None,
471 | after_date: Optional[datetime] = None,
472 | custom_metadata: Optional[dict] = None
473 | ) -> dict:
474 |         """Build ChromaDB where clause from filters."""
475 |         filters = {"project_id": self.project_id}
476 | 
477 |         # Type filtering
478 |         if types:
479 |             if len(types) == 1:
480 |                 filters["entity_type"] = types[0]
481 |             else:
482 |                 filters["entity_type"] = {"$in": types}
483 | 
484 |         # Date filtering
485 |         if after_date:
486 |             filters["created_at"] = {"$gt": after_date.isoformat()}
487 | 
488 |         # Custom frontmatter fields
489 |         if custom_metadata:
490 |             filters.update(custom_metadata)
491 | 
492 |         # Tag filtering last, so the $and wrapper captures prior filters
493 |         if tags:
494 |             if len(tags) == 1:
495 |                 filters["tags"] = {"$contains": tags[0]}
496 |             else:
497 |                 # Multiple tags - all must match
498 |                 filters = {
499 |                     "$and": [
500 |                         filters,
501 |                         *[{"tags": {"$contains": tag}} for tag in tags]
502 |                     ]
503 |                 }
504 | 
505 |         return filters
506 |
507 | async def _sql_metadata_filter(
508 | self,
509 | types: Optional[List[str]] = None,
510 | tags: Optional[List[str]] = None,
511 | after_date: Optional[datetime] = None,
512 | custom_metadata: Optional[dict] = None,
513 | limit: int = 10,
514 | offset: int = 0
515 | ) -> List[SearchIndexRow]:
516 | """Pure metadata queries using SQL."""
517 | conditions = ["project_id = :project_id"]
518 | params = {"project_id": self.project_id}
519 |
520 | if types:
521 |             # Bind each type as a parameter rather than interpolating values
522 |             conditions.append("entity_type IN (" + ", ".join(f":type_{i}" for i in range(len(types))) + ")"); params.update({f"type_{i}": t for i, t in enumerate(types)})
523 |
524 | if tags:
525 | # Check each tag
526 | for i, tag in enumerate(tags):
527 | param_name = f"tag_{i}"
528 | conditions.append(
529 | f"json_extract(entity_metadata, '$.tags') LIKE :{param_name}"
530 | )
531 | params[param_name] = f"%{tag}%"
532 |
533 | if after_date:
534 | conditions.append("created_at > :after_date")
535 | params["after_date"] = after_date
536 |
537 | if custom_metadata:
538 | for key, value in custom_metadata.items():
539 | param_name = f"meta_{key}"
540 | conditions.append(
541 | f"json_extract(entity_metadata, '$.{key}') = :{param_name}"
542 | )
543 | params[param_name] = value
544 |
545 | where = " AND ".join(conditions)
546 | sql = f"""
547 | SELECT * FROM entities
548 | WHERE {where}
549 | ORDER BY created_at DESC
550 | LIMIT :limit OFFSET :offset
551 | """
552 | params["limit"] = limit
553 | params["offset"] = offset
554 |
555 | async with db.scoped_session(self.session_maker) as session:
556 | result = await session.execute(text(sql), params)
557 | return self._format_sql_results(result)
558 | ```
559 |
560 | ### Real-World Examples
561 |
562 | #### Example 1: Pure Metadata Query (No Text)
563 | ```python
564 | # "Find all meetings tagged 'important'"
565 | results = await search_repo.search(
566 | types=["meeting"],
567 | tags=["important"]
568 | )
569 |
570 | # Routing: → SQL (~5ms)
571 | # SQL: SELECT * FROM entities
572 | # WHERE entity_type = 'meeting'
573 | # AND json_extract(entity_metadata, '$.tags') LIKE '%important%'
574 | ```
575 |
576 | #### Example 2: Semantic Search (No Metadata)
577 | ```python
578 | # "Find notes about neural networks"
579 | results = await search_repo.search(
580 | search_text="neural networks"
581 | )
582 |
583 | # Routing: → ChromaDB (~80ms)
584 | # Finds: "machine learning", "deep learning", "AI models", etc.
585 | ```
586 |
587 | #### Example 3: Semantic + Metadata
588 | ```python
589 | # "Find meeting notes about 'project planning' tagged 'important'"
590 | results = await search_repo.search(
591 | search_text="project planning",
592 | types=["meeting"],
593 | tags=["important"]
594 | )
595 |
596 | # Routing: → ChromaDB with filters (~100ms)
597 | # ChromaDB: query_texts=["project planning"]
598 | # where={"entity_type": "meeting",
599 | # "tags": {"$contains": "important"}}
600 | # Finds: "roadmap discussion", "sprint planning", etc.
601 | ```
602 |
603 | #### Example 4: Complex Frontmatter Query
604 | ```python
605 | # "Find in-progress specs with multiple tags, recent"
606 | results = await search_repo.search(
607 | types=["spec"],
608 | tags=["architecture", "cloud"],
609 | after_date=datetime(2024, 1, 1),
610 | custom_metadata={"status": "in-progress"}
611 | )
612 |
613 | # Routing: → SQL (~10ms)
614 | # No text search, pure structured query - SQL is faster
615 | ```
616 |
617 | #### Example 5: Semantic + Complex Metadata
618 | ```python
619 | # "Find notes about 'authentication' that are in-progress"
620 | results = await search_repo.search(
621 | search_text="authentication",
622 | custom_metadata={"status": "in-progress", "priority": "high"}
623 | )
624 |
625 | # Routing: → ChromaDB with metadata filters (~100ms)
626 | # Semantic search for "authentication" concept
627 | # Filters by status and priority in metadata
628 | ```
629 |
630 | #### Example 6: Exact Permalink
631 | ```python
632 | # "Show me specs/search-feature"
633 | results = await search_repo.search(
634 | permalink="specs/search-feature"
635 | )
636 |
637 | # Routing: → SQL (~1ms)
638 | # SQL: SELECT * FROM entities WHERE permalink = 'specs/search-feature'
639 | ```
640 |
641 | #### Example 7: Pattern Match
642 | ```python
643 | # "Show me all specs"
644 | results = await search_repo.search(
645 | permalink_match="specs/*"
646 | )
647 |
648 | # Routing: → SQL (~5ms)
649 | # SQL: SELECT * FROM entities WHERE permalink GLOB 'specs/*'
650 | ```
651 |
652 | ### What We Remove vs Keep
653 |
654 | **REMOVE (FTS5-specific):**
655 | - ❌ `CREATE VIRTUAL TABLE search_index USING fts5(...)`
656 | - ❌ `MATCH` operator queries
657 | - ❌ FTS5 tokenization configuration
658 | - ❌ ~300 lines of FTS5 query preparation code
659 | - ❌ Trigram generation and prefix handling
660 |
661 | **KEEP (Standard SQL):**
662 | - ✅ `SELECT * FROM entities WHERE permalink = :permalink`
663 | - ✅ `SELECT * FROM entities WHERE permalink GLOB :pattern`
664 | - ✅ `SELECT * FROM entities WHERE title LIKE :title`
665 | - ✅ `SELECT * FROM entities WHERE json_extract(entity_metadata, ...) = :value`
666 | - ✅ All date filtering, pagination, sorting
667 | - ✅ Entity table structure and indexes
668 |
669 | **ADD (ChromaDB):**
670 | - ✅ ChromaDB persistent client (embedded)
671 | - ✅ Semantic vector search
672 | - ✅ Metadata filtering in ChromaDB
673 | - ✅ Smart routing logic
674 |
675 | ## How (High Level)
676 |
677 | ### Architecture Overview
678 |
679 | ```
680 | ┌─────────────────────────────────────────────────────────────┐
681 | │ FOSS Deployment (Local) │
682 | ├─────────────────────────────────────────────────────────────┤
683 | │ SQLite (data) + ChromaDB embedded (search) │
684 | │ - No external services │
685 | │ - Local embedding model (sentence-transformers) │
686 | │ - Persists in ~/.basic-memory/chroma_data/ │
687 | └─────────────────────────────────────────────────────────────┘
688 |
689 | ┌─────────────────────────────────────────────────────────────┐
690 | │ Cloud Deployment (Multi-tenant) │
691 | ├─────────────────────────────────────────────────────────────┤
692 | │ PostgreSQL/Neon (data) + ChromaDB server (search) │
693 | │ - Neon serverless Postgres for persistence │
694 | │ - ChromaDB server in Docker container │
695 | │ - Optional: OpenAI embeddings for better quality │
696 | └─────────────────────────────────────────────────────────────┘
697 | ```
698 |
699 | ### Phase 1: ChromaDB Integration (2-3 days)
700 |
701 | #### 1. Add ChromaDB Dependency
702 | ```toml
703 | # pyproject.toml
704 | dependencies = [
705 | "chromadb>=0.4.0",
706 | "sentence-transformers>=2.2.0", # Local embeddings
707 | ]
708 | ```
709 |
710 | #### 2. Create ChromaSearchBackend
711 | ```python
712 | # src/basic_memory/search/chroma_backend.py
713 | from chromadb import PersistentClient
714 | from chromadb.utils import embedding_functions
715 |
716 | class ChromaSearchBackend:
717 | def __init__(
718 | self,
719 | persist_directory: Path,
720 | collection_name: str = "knowledge_base",
721 | embedding_model: str = "all-MiniLM-L6-v2"
722 | ):
723 | """Initialize ChromaDB with local embeddings."""
724 | self.client = PersistentClient(path=str(persist_directory))
725 |
726 | # Use local sentence-transformers model (no API costs)
727 | self.embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
728 | model_name=embedding_model
729 | )
730 |
731 | self.collection = self.client.get_or_create_collection(
732 | name=collection_name,
733 | embedding_function=self.embed_fn,
734 | metadata={"hnsw:space": "cosine"} # Similarity metric
735 | )
736 |
737 | async def index_entity(self, entity: Entity):
738 | """Index entity with automatic embeddings."""
739 | # Combine title and content for semantic search
740 | document = self._format_document(entity)
741 |
742 | self.collection.upsert(
743 | ids=[f"entity_{entity.id}_{entity.project_id}"],
744 | documents=[document],
745 | metadatas=[{
746 | "entity_id": entity.id,
747 | "project_id": entity.project_id,
748 | "permalink": entity.permalink,
749 | "file_path": entity.file_path,
750 | "entity_type": entity.entity_type,
751 | "type": "entity",
752 | }]
753 | )
754 |
755 | async def search(
756 | self,
757 | query_text: str,
758 | project_id: int,
759 | limit: int = 10,
760 |         filters: Optional[dict] = None
761 | ) -> List[SearchResult]:
762 | """Semantic search with metadata filtering."""
763 | where = {"project_id": project_id}
764 | if filters:
765 | where.update(filters)
766 |
767 | results = self.collection.query(
768 | query_texts=[query_text],
769 | n_results=limit,
770 | where=where
771 | )
772 |
773 | return self._format_results(results)
774 | ```
775 |
776 | #### 3. Update SearchRepository
777 | ```python
778 | # src/basic_memory/repository/search_repository.py
779 | class SearchRepository:
780 | def __init__(
781 | self,
782 | session_maker: async_sessionmaker[AsyncSession],
783 | project_id: int,
784 | chroma_backend: ChromaSearchBackend
785 | ):
786 | self.session_maker = session_maker
787 | self.project_id = project_id
788 | self.chroma = chroma_backend
789 |
790 | async def search(
791 | self,
792 | search_text: Optional[str] = None,
793 | permalink: Optional[str] = None,
794 | # ... other filters
795 | ) -> List[SearchIndexRow]:
796 | """Search using ChromaDB for text, SQL for exact lookups."""
797 |
798 | # For exact permalink/pattern matches, use SQL
799 | if permalink or permalink_match:
800 | return await self._sql_exact_search(...)
801 |
802 | # For text search, use ChromaDB semantic search
803 | if search_text:
804 | results = await self.chroma.search(
805 | query_text=search_text,
806 | project_id=self.project_id,
807 | limit=limit,
808 | filters=self._build_filters(types, after_date, ...)
809 | )
810 | return results
811 |
812 | # Fallback to listing all
813 | return await self._list_entities(...)
814 | ```
815 |
816 | #### 4. Update SearchService
817 | ```python
818 | # src/basic_memory/services/search_service.py
819 | class SearchService:
820 | def __init__(
821 | self,
822 | search_repository: SearchRepository,
823 | entity_repository: EntityRepository,
824 | file_service: FileService,
825 | chroma_backend: ChromaSearchBackend,
826 | ):
827 | self.repository = search_repository
828 | self.entity_repository = entity_repository
829 | self.file_service = file_service
830 | self.chroma = chroma_backend
831 |
832 | async def index_entity(self, entity: Entity):
833 | """Index entity in ChromaDB."""
834 | if entity.is_markdown:
835 | await self._index_entity_markdown(entity)
836 | else:
837 | await self._index_entity_file(entity)
838 |
839 | async def _index_entity_markdown(self, entity: Entity):
840 | """Index markdown entity with full content."""
841 | # Index entity
842 | await self.chroma.index_entity(entity)
843 |
844 | # Index observations (as separate documents)
845 | for obs in entity.observations:
846 | await self.chroma.index_observation(obs, entity)
847 |
848 | # Index relations (metadata only)
849 | for rel in entity.outgoing_relations:
850 | await self.chroma.index_relation(rel, entity)
851 | ```
852 |
853 | ### Phase 2: PostgreSQL Support (1 day)
854 |
855 | #### 1. Add PostgreSQL Database Type
856 | ```python
857 | # src/basic_memory/db.py
858 | class DatabaseType(Enum):
859 | MEMORY = auto()
860 | FILESYSTEM = auto()
861 | POSTGRESQL = auto() # NEW
862 |
863 | @classmethod
864 | def get_db_url(cls, db_path_or_url: str, db_type: "DatabaseType") -> str:
865 | if db_type == cls.POSTGRESQL:
866 | return db_path_or_url # Neon connection string
867 | elif db_type == cls.MEMORY:
868 | return "sqlite+aiosqlite://"
869 | return f"sqlite+aiosqlite:///{db_path_or_url}"
870 | ```
871 |
872 | #### 2. Update Connection Handling
873 | ```python
874 | def _create_engine_and_session(...):
875 | db_url = DatabaseType.get_db_url(db_path_or_url, db_type)
876 |
877 | if db_type == DatabaseType.POSTGRESQL:
878 | # Use asyncpg driver for Postgres
879 | engine = create_async_engine(
880 | db_url,
881 | pool_size=10,
882 | max_overflow=20,
883 | pool_pre_ping=True, # Health checks
884 | )
885 | else:
886 | # SQLite configuration
887 | engine = create_async_engine(db_url, connect_args=connect_args)
888 |
889 |     # Only configure SQLite-specific settings for file-backed SQLite
890 |     if db_type == DatabaseType.FILESYSTEM:
891 | @event.listens_for(engine.sync_engine, "connect")
892 | def enable_wal_mode(dbapi_conn, connection_record):
893 | _configure_sqlite_connection(dbapi_conn, enable_wal=True)
894 |
895 | return engine, async_sessionmaker(engine, expire_on_commit=False)
896 | ```
897 |
898 | #### 3. Remove SQLite-Specific Code
899 | ```python
900 | # Remove from scoped_session context manager:
901 | # await session.execute(text("PRAGMA foreign_keys=ON")) # DELETE
902 |
903 | # PostgreSQL handles foreign keys by default
904 | ```
905 |
906 | ### Phase 3: Migration & Testing (1-2 days)
907 |
908 | #### 1. Create Migration Script
909 | ```python
910 | # scripts/migrate_to_chromadb.py
911 | async def migrate_fts5_to_chromadb():
912 | """One-time migration from FTS5 to ChromaDB."""
913 | # 1. Read all entities from database
914 | entities = await entity_repository.find_all()
915 |
916 | # 2. Index in ChromaDB
917 | for entity in entities:
918 | await search_service.index_entity(entity)
919 |
920 | # 3. Drop FTS5 table (Alembic migration)
921 | await session.execute(text("DROP TABLE IF EXISTS search_index"))
922 | ```
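
For larger knowledge bases, indexing one entity at a time is slow. A hedged refinement (the `chunked` helper below is hypothetical) batches entities so each group maps to a single `collection.upsert(...)` call, amortizing embedding and write overhead:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive batches of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Each batch would become one upsert of ids/documents/metadatas lists.
batches = list(chunked(range(10), 4))
```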
923 |
924 | #### 2. Update Tests
925 | - Replace FTS5 test fixtures with ChromaDB fixtures
926 | - Test semantic search quality
927 | - Test multi-project isolation in ChromaDB
928 | - Benchmark performance vs FTS5
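
Multi-project isolation can be exercised without a real vector store. A sketch with a stand-in index (hypothetical `FakeIndex`, keyed the way `index_entity` builds its metadata):

```python
class FakeIndex:
    """Minimal stand-in for a ChromaDB collection's metadata filtering."""

    def __init__(self):
        self.rows = []  # (document, metadata) pairs

    def upsert(self, document: str, metadata: dict) -> None:
        self.rows.append((document, metadata))

    def query(self, project_id: int) -> list:
        """Return only documents whose metadata matches the project."""
        return [doc for doc, meta in self.rows if meta["project_id"] == project_id]


index = FakeIndex()
index.upsert("AI ethics note", {"project_id": 1})
index.upsert("cloud spec", {"project_id": 2})

# A query scoped to project 1 must never see project 2 documents
project_one = index.query(project_id=1)
```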
929 |
930 | #### 3. Documentation Updates
931 | - Update search documentation
932 | - Add ChromaDB configuration guide
933 | - Document embedding model options
934 | - PostgreSQL deployment guide
935 |
936 | ### Configuration
937 |
938 | ```python
939 | # config.py
940 | class BasicMemoryConfig:
941 | # Database
942 | database_type: DatabaseType = DatabaseType.FILESYSTEM
943 | database_path: Path = Path.home() / ".basic-memory" / "memory.db"
944 | database_url: Optional[str] = None # For Postgres: postgresql://...
945 |
946 | # Search
947 | chroma_persist_directory: Path = Path.home() / ".basic-memory" / "chroma_data"
948 | embedding_model: str = "all-MiniLM-L6-v2" # Local model
949 | embedding_provider: str = "local" # or "openai"
950 | openai_api_key: Optional[str] = None # For cloud deployments
951 | ```
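
Provider selection from this config might look like the following sketch (hypothetical factory; the `openai` branch assumes an API key is configured, and the returned string stands in for constructing the real embedding function):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchConfig:
    embedding_provider: str = "local"  # "local" or "openai"
    embedding_model: str = "all-MiniLM-L6-v2"
    openai_api_key: Optional[str] = None


def resolve_embedding_backend(config: SearchConfig) -> str:
    """Validate the configured provider and describe which backend to build."""
    if config.embedding_provider == "openai":
        if not config.openai_api_key:
            raise ValueError("embedding_provider=openai requires openai_api_key")
        return "openai"
    return f"sentence-transformers:{config.embedding_model}"


backend = resolve_embedding_backend(SearchConfig())
```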
952 |
953 | ### Deployment Configurations
954 |
955 | #### Local (FOSS)
956 | ```yaml
957 | # Default configuration
958 | database_type: FILESYSTEM
959 | database_path: ~/.basic-memory/memory.db
960 | chroma_persist_directory: ~/.basic-memory/chroma_data
961 | embedding_model: all-MiniLM-L6-v2
962 | embedding_provider: local
963 | ```
964 |
965 | #### Cloud (Docker Compose)
966 | ```yaml
967 | services:
968 | postgres:
969 | image: postgres:15
970 | environment:
971 | POSTGRES_DB: basic_memory
972 | POSTGRES_PASSWORD: ${DB_PASSWORD}
973 |
974 | chromadb:
975 | image: chromadb/chroma:latest
976 | volumes:
977 | - chroma_data:/chroma/chroma
978 | environment:
979 |       ALLOW_RESET: "true"
980 |
981 | app:
982 | environment:
983 | DATABASE_TYPE: POSTGRESQL
984 | DATABASE_URL: postgresql://postgres:${DB_PASSWORD}@postgres/basic_memory
985 | CHROMA_HOST: chromadb
986 | CHROMA_PORT: 8000
987 | EMBEDDING_PROVIDER: local # or openai
988 | ```
989 |
990 | ## How to Evaluate
991 |
992 | ### Success Criteria
993 |
994 | #### Functional Requirements
995 | - ✅ Semantic search finds related concepts (e.g., "AI" finds "machine learning")
996 | - ✅ Exact permalink/pattern matches work (e.g., `specs/*`)
997 | - ✅ Multi-project isolation maintained
998 | - ✅ All existing search filters work (type, date, metadata)
999 | - ✅ MCP tools continue to work without changes
1000 | - ✅ Works with both SQLite and PostgreSQL
1001 |
1002 | #### Performance Requirements
1003 | - ✅ Search latency < 200ms for 1000 documents (local embedding)
1004 | - ✅ Indexing time comparable to FTS5 (~10 files/sec)
1005 | - ✅ Initial sync time not significantly worse than current
1006 | - ✅ Memory footprint < 1GB for local deployments
1007 |
1008 | #### Quality Requirements
1009 | - ✅ Better search relevance than FTS5 keyword matching
1010 | - ✅ Handles typos and word variations
1011 | - ✅ Finds semantically similar content
1012 |
1013 | #### Deployment Requirements
1014 | - ✅ FOSS: Works out-of-box with no external services
1015 | - ✅ Cloud: Integrates with PostgreSQL (Neon)
1016 | - ✅ No breaking changes to MCP API
1017 | - ✅ Migration script for existing users
1018 |
1019 | ### Testing Procedure
1020 |
1021 | #### 1. Unit Tests
1022 | ```bash
1023 | # Test ChromaDB backend
1024 | pytest tests/test_chroma_backend.py
1025 |
1026 | # Test search repository with ChromaDB
1027 | pytest tests/test_search_repository.py
1028 |
1029 | # Test search service
1030 | pytest tests/test_search_service.py
1031 | ```
1032 |
1033 | #### 2. Integration Tests
1034 | ```bash
1035 | # Test full search workflow
1036 | pytest test-int/test_search_integration.py
1037 |
1038 | # Test with PostgreSQL
1039 | DATABASE_TYPE=POSTGRESQL pytest test-int/
1040 | ```
1041 |
1042 | #### 3. Semantic Search Quality Tests
```python
# Semantic similarity quality checks (sketch; expected matches depend on the test corpus)
async def test_semantic_similarity(search_service):
    results = await search_service.search("machine learning")
    titles = {r.title for r in results}
    assert titles & {"neural networks", "deep learning", "AI algorithms"}

    results = await search_service.search("software architecture")
    titles = {r.title for r in results}
    assert titles & {"system design", "design patterns", "microservices"}
```
1055 |
1056 | #### 4. Performance Benchmarks
```bash
# Run search benchmarks
pytest test-int/test_search_performance.py -v

# Measure:
# - Search latency (should be < 200ms)
# - Indexing throughput (should be ~10 files/sec)
# - Memory usage (should be < 1GB)
```
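The latency targets above can be checked with a small timing helper. A hedged sketch; the `measure_latency` helper and its callable interface are illustrative, not part of the benchmark suite:

```python
import statistics
import time


def measure_latency(search_fn, queries, runs=5):
    """Collect wall-clock latency samples (in ms) for a search callable."""
    samples = []
    for q in queries:
        for _ in range(runs):
            start = time.perf_counter()
            search_fn(q)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        # Index of the 95th percentile sample in the sorted list
        "p95": samples[max(int(len(samples) * 0.95) - 1, 0)],
    }
```

Running this against a 1000-document corpus would verify the < 200ms target for both median and tail latency.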
1066 |
1067 | #### 5. Migration Testing
1068 | ```bash
1069 | # Test migration from FTS5 to ChromaDB
1070 | python scripts/migrate_to_chromadb.py
1071 |
1072 | # Verify all entities indexed
1073 | # Verify search results quality
1074 | # Verify no data loss
1075 | ```
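A migration script along these lines could read entities straight out of the existing SQLite database and upsert them into ChromaDB in batches. A standard-library sketch; the `entity` table and column names are assumptions about the schema, and the actual upsert call is left as a comment:

```python
import sqlite3


def load_entities(db_path: str) -> list:
    """Read all entities from the existing SQLite database (table/column names assumed)."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, title, permalink, entity_type, file_path FROM entity"
        ).fetchall()
    finally:
        conn.close()


def batched(rows: list, size: int = 100):
    """Yield fixed-size batches; ChromaDB upserts are cheaper in batches."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


# For each batch, call collection.upsert(ids=..., documents=..., metadatas=...)
# as in the ChromaSearchBackend sketch, then compare counts to verify no data loss.
```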
1076 |
1077 | ### Metrics
1078 |
1079 | **Search Quality:**
1080 | - Semantic relevance score (manual evaluation)
1081 | - Precision/recall for common queries
1082 | - User satisfaction (qualitative)
1083 |
1084 | **Performance:**
1085 | - Average search latency (ms)
1086 | - P95/P99 search latency
1087 | - Indexing throughput (files/sec)
1088 | - Memory usage (MB)
1089 |
1090 | **Deployment:**
1091 | - Local deployment success rate
1092 | - Cloud deployment success rate
1093 | - Migration success rate
1094 |
1095 | ## Implementation Checklist
1096 |
1097 | ### Phase 1: ChromaDB Integration
1098 | - [ ] Add ChromaDB and sentence-transformers dependencies
1099 | - [ ] Create ChromaSearchBackend class
1100 | - [ ] Update SearchRepository to use ChromaDB
1101 | - [ ] Update SearchService indexing methods
1102 | - [ ] Remove FTS5 table creation code
1103 | - [ ] Update search query logic
1104 | - [ ] Add ChromaDB configuration to BasicMemoryConfig
1105 |
1106 | ### Phase 2: PostgreSQL Support
1107 | - [ ] Add DatabaseType.POSTGRESQL enum
1108 | - [ ] Update get_db_url() for Postgres connection strings
1109 | - [ ] Add asyncpg dependency
1110 | - [ ] Update engine creation for Postgres
1111 | - [ ] Remove SQLite-specific PRAGMA statements
1112 | - [ ] Test with Neon database
1113 |
1114 | ### Phase 3: Testing & Migration
1115 | - [ ] Write unit tests for ChromaSearchBackend
1116 | - [ ] Update search integration tests
1117 | - [ ] Add semantic search quality tests
1118 | - [ ] Create performance benchmarks
1119 | - [ ] Write migration script from FTS5
1120 | - [ ] Test migration with existing data
1121 | - [ ] Update documentation
1122 |
1123 | ### Phase 4: Deployment
1124 | - [ ] Update docker-compose.yml for cloud
1125 | - [ ] Document local FOSS deployment
1126 | - [ ] Document cloud PostgreSQL deployment
1127 | - [ ] Create migration guide for users
1128 | - [ ] Update MCP tool documentation
1129 |
1130 | ## Notes
1131 |
1132 | ### Embedding Model Trade-offs
1133 |
1134 | **Local Model: `all-MiniLM-L6-v2`**
1135 | - Size: 80MB download
1136 | - Speed: ~50ms embedding time
1137 | - Dimensions: 384
1138 | - Cost: $0
1139 | - Quality: Good for general knowledge
1140 | - Best for: FOSS deployments
1141 |
1142 | **OpenAI: `text-embedding-3-small`**
1143 | - Speed: ~100-200ms (API call)
1144 | - Dimensions: 1536
1145 | - Cost: ~$0.13 per 1M tokens (~$0.01 per 1000 notes)
1146 | - Quality: Excellent
1147 | - Best for: Cloud deployments with budget
1148 |
1149 | ### ChromaDB Storage
1150 |
1151 | ChromaDB stores data in:
1152 | ```
1153 | ~/.basic-memory/chroma_data/
1154 | ├── chroma.sqlite3 # Metadata
1155 | ├── index/ # HNSW indexes
1156 | └── collections/ # Vector data
1157 | ```
1158 |
1159 | Typical sizes:
1160 | - 100 notes: ~5MB
1161 | - 1000 notes: ~50MB
1162 | - 10000 notes: ~500MB
1163 |
1164 | ### Why Not Keep FTS5?
1165 |
1166 | **Considered:** Hybrid approach (FTS5 for SQLite + tsvector for Postgres)
1167 | **Rejected because:**
1168 | - 2x the code to maintain
1169 | - 2x the tests to write
1170 | - 2x the bugs to fix
1171 | - Inconsistent search behavior between deployments
1172 | - ChromaDB provides better search quality anyway
1173 |
1174 | **ChromaDB wins:**
1175 | - One implementation for both databases
1176 | - Better search quality (semantic!)
1177 | - Database-agnostic architecture
1178 | - Embedded mode for FOSS (no servers needed)
1179 |
## Implementation

### Proposed Architecture

#### Option 1: ChromaDB Only (Simplest)
1185 |
```python
import chromadb


class ChromaSearchBackend:
    def __init__(self, path: str, embedding_model: str = "all-MiniLM-L6-v2"):
        # For local: embedded client (no server!)
        self.client = chromadb.PersistentClient(path=path)

        # Use local embedding model (no API costs!)
        from chromadb.utils import embedding_functions
        self.embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
            model_name=embedding_model
        )

        self.collection = self.client.get_or_create_collection(
            name="knowledge_base",
            embedding_function=self.embed_fn
        )

    async def index_entity(self, entity: Entity):
        # ChromaDB generates embeddings automatically on upsert
        self.collection.upsert(
            ids=[str(entity.id)],
            documents=[f"{entity.title}\n{entity.content}"],
            metadatas=[{
                "permalink": entity.permalink,
                "type": entity.entity_type,
                "file_path": entity.file_path
            }]
        )

    async def search(self, query: str, filters: dict = None):
        # Semantic search with optional metadata filters
        results = self.collection.query(
            query_texts=[query],
            n_results=10,
            where=filters  # e.g., {"type": "note"}
        )
        return results
```

Deployment:
- Local (FOSS): ChromaDB embedded, local embedding model, no servers
- Cloud: ChromaDB server or still embedded (it's just a Python library)
1226 |
#### Option 2: Hybrid FTS + ChromaDB (Best UX)
1228 |
```python
class HybridSearchBackend:
    def __init__(self):
        self.fts = SQLiteFTS5Backend()       # Fast keyword search
        self.chroma = ChromaSearchBackend()  # Semantic search

    async def search(self, query: str, search_type: str = "auto"):
        if search_type == "exact":
            # User wants an exact match, e.g. "specs/search-feature"
            return await self.fts.search(query)

        elif search_type == "semantic":
            # User wants related concepts
            return await self.chroma.search(query)

        else:  # "auto"
            # Check if the query looks like an exact match
            if "/" in query or query.startswith('"'):
                return await self.fts.search(query)

            # Otherwise use semantic search
            return await self.chroma.search(query)
```
1250 |
### Embedding Options

#### Option A: Local Model (FREE, FOSS-friendly)
1254 |
```python
# Uses sentence-transformers (runs locally)
# Model: ~100MB download
# Speed: ~50-100ms per embedding
# Cost: $0

from chromadb.utils import embedding_functions

embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # Fast, accurate, free
)
```
1264 |
#### Option B: OpenAI Embeddings (Cloud only)
1266 |
```python
# For cloud users who want the best quality
# Model: text-embedding-3-small
# Speed: ~100-200ms via API
# Cost: ~$0.13 per 1M tokens (~$0.01 per 1000 notes)

from chromadb.utils import embedding_functions

embed_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="...",
    model_name="text-embedding-3-small"
)
```
1276 |
### Performance Comparison

| Metric | Local (`all-MiniLM-L6-v2`) | OpenAI (`text-embedding-3-small`) |
|--------|---------------------------|-----------------------------------|
| Embedding time | ~50ms per note | ~100-200ms per note (API call) |
| Search time (1000 notes) | ~100ms | ~50ms |
| Memory | ~500MB (model + ChromaDB) | n/a (API) |
| Cost | $0 | ~$0.01 per 1000 notes |
| Quality | Good (384 dimensions) | Excellent (1536 dimensions) |
1291 |
### Recommendation: ChromaDB with Local Embeddings

#### Phase 1: Local ChromaDB (1-2 days)

FOSS version:
- SQLite for data persistence
- ChromaDB embedded for semantic search
- Local embedding model (no API costs)
- No external services required
1303 |
1304 | Benefits:
1305 | - ✅ Same deployment as current (just Python package)
1306 | - ✅ Semantic search for better UX
1307 | - ✅ Free embeddings with local model
1308 | - ✅ No servers needed
1309 |
#### Phase 2: Postgres + ChromaDB Cloud (1-2 days)

Cloud version:
- Postgres for data persistence
- ChromaDB server for semantic search
- OpenAI embeddings (higher quality), or keep local embeddings (cheaper)
1317 |
#### Phase 3: Hybrid Search (optional, 1 day)

Add FTS for exact matches alongside ChromaDB:
- Quick keyword search when needed
- Semantic search for exploration
- Best of both worlds
1324 |
### Code Estimate

Just ChromaDB (replacing FTS5):
- Remove FTS5 code: 2 hours
- Add ChromaDB backend: 4 hours
- Update search service: 2 hours
- Testing: 4 hours
- Total: 1.5 days

ChromaDB + Postgres migration:
- Add Postgres support: 4 hours
- Test with Neon: 2 hours
- Total: +0.75 days

Grand total: 2-3 days for the complete migration
1340 |
### Summary

ChromaDB solves both problems at once:
1. ✅ Works with SQLite AND Postgres (the vector store is separate)
2. ✅ No server needed for local (embedded mode)
3. ✅ Better search than FTS5 (semantic!)
4. ✅ One implementation for both deployments

A prototype should demonstrate:
1. ChromaDB embedded with local embeddings
2. Example searches showing semantic matching
3. Performance benchmarks
4. Migration from FTS5
1354 |
1355 |
1356 | ## Observations
1357 |
1358 | - [problem] SQLite FTS5 and PostgreSQL tsvector are incompatible architectures requiring dual implementation #database-compatibility
1359 | - [problem] Cloud deployments lose database on container restart requiring full re-sync #persistence
1360 | - [solution] ChromaDB provides database-agnostic semantic search eliminating dual implementation #architecture
1361 | - [advantage] Semantic search finds related concepts beyond keyword matching improving UX #search-quality
1362 | - [deployment] Embedded ChromaDB requires no external services for FOSS #simplicity
1363 | - [migration] Moving to PostgreSQL solves cloud persistence issues #cloud-architecture
1364 | - [performance] Local embedding models provide good quality at zero cost #cost-optimization
1365 | - [trade-off] Embedding generation adds ~50ms latency vs instant FTS5 indexing #performance
1366 | - [benefit] Single search codebase reduces maintenance burden and test coverage needs #maintainability
1367 |
1368 | ## Prior Art / References
1369 |
1370 | ### Community Fork: manuelbliemel/basic-memory (feature/vector-search)
1371 |
1372 | **Repository**: https://github.com/manuelbliemel/basic-memory/tree/feature/vector-search
1373 |
1374 | **Key Implementation Details**:
1375 |
1376 | **Vector Database**: ChromaDB (same as our approach!)
1377 |
1378 | **Embedding Models**:
1379 | - Local: `all-MiniLM-L6-v2` (default, 384 dims) - same model we planned
1380 | - Also supports: `all-mpnet-base-v2`, `paraphrase-MiniLM-L6-v2`, `multi-qa-MiniLM-L6-cos-v1`
1381 | - OpenAI: `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large`
1382 |
1383 | **Chunking Strategy** (interesting - we didn't consider this):
1384 | - Chunk Size: 500 characters
1385 | - Chunk Overlap: 50 characters
1386 | - Breaks documents into smaller pieces for better semantic search
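The fork's character-based chunking can be sketched in a few lines; `chunk_text` is a hypothetical illustration of the 500/50 settings, not the fork's actual code:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list:
    """Split text into overlapping chunks (fork defaults: 500 chars, 50 overlap)."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlap keeps sentences that straddle a chunk boundary searchable from both sides, at the cost of storing more vectors per document.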
1387 |
1388 | **Search Strategies**:
1389 | 1. `fuzzy_only` (default) - FTS5 only
1390 | 2. `vector_only` - ChromaDB only
1391 | 3. `hybrid` (recommended) - Both FTS5 + ChromaDB
1392 | 4. `fuzzy_primary` - FTS5 first, ChromaDB fallback
1393 | 5. `vector_primary` - ChromaDB first, FTS5 fallback
1394 |
1395 | **Configuration**:
1396 | - Similarity Threshold: 0.1
1397 | - Max Results: 5
1398 | - Storage: `~/.basic-memory/chroma/`
1399 | - Config: `~/.basic-memory/config.json`
1400 |
1401 | **Key Differences from Our Approach**:
1402 |
1403 | | Aspect | Their Approach | Our Approach |
1404 | |--------|---------------|--------------|
1405 | | FTS5 | Keep FTS5 + add ChromaDB | Remove FTS5, use SQL for exact lookups |
1406 | | Search Strategy | 5 configurable strategies | Smart routing (automatic) |
1407 | | Document Processing | Chunk into 500-char pieces | Index full documents |
1408 | | Hybrid Mode | Run both, merge, dedupe | Route to best backend |
1409 | | Configuration | User-configurable strategy | Automatic based on query type |
1410 |
1411 | **What We Can Learn**:
1412 |
1413 | 1. **Chunking**: Breaking documents into 500-character chunks with 50-char overlap may improve semantic search quality for long documents
1414 | - Pro: Better granularity for semantic matching
1415 | - Con: More vectors to store and search
1416 | - Consider: Optional chunking for large documents (>2000 chars)
1417 |
1418 | 2. **Configurable Strategies**: Allowing users to choose search strategy provides flexibility
1419 | - Pro: Power users can tune behavior
1420 | - Con: More complexity, most users won't configure
1421 | - Consider: Default to smart routing, allow override via config
1422 |
1423 | 3. **Similarity Threshold**: They use 0.1 as default
1424 | - Consider: Benchmark different thresholds for quality
1425 |
1426 | 4. **Storage Location**: `~/.basic-memory/chroma/` matches our planned `chroma_data/` approach
1427 |
1428 | **Potential Collaboration**:
1429 | - Their implementation is nearly complete as a fork
1430 | - Could potentially merge their work or use as reference implementation
1431 | - Their chunking strategy could be valuable addition to our approach
1432 |
1433 | ## Relations
1434 |
1435 | - implements [[SPEC-11 Basic Memory API Performance Optimization]]
1436 | - relates_to [[Performance Optimizations Documentation]]
1437 | - enables [[PostgreSQL Migration]]
1438 | - improves_on [[SQLite FTS5 Search]]
1439 | - references [[manuelbliemel/basic-memory feature/vector-search fork]]
1440 |
```