This is page 21 of 27. Use http://codebase.md/basicmachines-co/basic-memory?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .claude
│   ├── commands
│   │   ├── release
│   │   │   ├── beta.md
│   │   │   ├── changelog.md
│   │   │   ├── release-check.md
│   │   │   └── release.md
│   │   ├── spec.md
│   │   └── test-live.md
│   └── settings.json
├── .dockerignore
├── .env.example
├── .github
│   ├── dependabot.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   ├── documentation.md
│   │   └── feature_request.md
│   └── workflows
│       ├── claude-code-review.yml
│       ├── claude-issue-triage.yml
│       ├── claude.yml
│       ├── dev-release.yml
│       ├── docker.yml
│       ├── pr-title.yml
│       ├── release.yml
│       └── test.yml
├── .gitignore
├── .python-version
├── CHANGELOG.md
├── CITATION.cff
├── CLA.md
├── CLAUDE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── docker-compose-postgres.yml
├── docker-compose.yml
├── Dockerfile
├── docs
│   ├── ai-assistant-guide-extended.md
│   ├── ARCHITECTURE.md
│   ├── character-handling.md
│   ├── cloud-cli.md
│   ├── Docker.md
│   └── testing-coverage.md
├── justfile
├── LICENSE
├── llms-install.md
├── pyproject.toml
├── README.md
├── SECURITY.md
├── smithery.yaml
├── specs
│   ├── SPEC-1 Specification-Driven Development Process.md
│   ├── SPEC-10 Unified Deployment Workflow and Event Tracking.md
│   ├── SPEC-11 Basic Memory API Performance Optimization.md
│   ├── SPEC-12 OpenTelemetry Observability.md
│   ├── SPEC-13 CLI Authentication with Subscription Validation.md
│   ├── SPEC-14 Cloud Git Versioning & GitHub Backup.md
│   ├── SPEC-14- Cloud Git Versioning & GitHub Backup.md
│   ├── SPEC-15 Configuration Persistence via Tigris for Cloud Tenants.md
│   ├── SPEC-16 MCP Cloud Service Consolidation.md
│   ├── SPEC-17 Semantic Search with ChromaDB.md
│   ├── SPEC-18 AI Memory Management Tool.md
│   ├── SPEC-19 Sync Performance and Memory Optimization.md
│   ├── SPEC-2 Slash Commands Reference.md
│   ├── SPEC-20 Simplified Project-Scoped Rclone Sync.md
│   ├── SPEC-3 Agent Definitions.md
│   ├── SPEC-4 Notes Web UI Component Architecture.md
│   ├── SPEC-5 CLI Cloud Upload via WebDAV.md
│   ├── SPEC-6 Explicit Project Parameter Architecture.md
│   ├── SPEC-7 POC to spike Tigris Turso for local access to cloud data.md
│   ├── SPEC-8 TigrisFS Integration.md
│   ├── SPEC-9 Multi-Project Bidirectional Sync Architecture.md
│   ├── SPEC-9 Signed Header Tenant Information.md
│   └── SPEC-9-1 Follow-Ups- Conflict, Sync, and Observability.md
├── src
│   └── basic_memory
│       ├── __init__.py
│       ├── alembic
│       │   ├── alembic.ini
│       │   ├── env.py
│       │   ├── migrations.py
│       │   ├── script.py.mako
│       │   └── versions
│       │       ├── 314f1ea54dc4_add_postgres_full_text_search_support_.py
│       │       ├── 3dae7c7b1564_initial_schema.py
│       │       ├── 502b60eaa905_remove_required_from_entity_permalink.py
│       │       ├── 5fe1ab1ccebe_add_projects_table.py
│       │       ├── 647e7a75e2cd_project_constraint_fix.py
│       │       ├── 6830751f5fb6_merge_multiple_heads.py
│       │       ├── 9d9c1cb7d8f5_add_mtime_and_size_columns_to_entity_.py
│       │       ├── a1b2c3d4e5f6_fix_project_foreign_keys.py
│       │       ├── a2b3c4d5e6f7_add_search_index_entity_cascade.py
│       │       ├── b3c3938bacdb_relation_to_name_unique_index.py
│       │       ├── cc7172b46608_update_search_index_schema.py
│       │       ├── e7e1f4367280_add_scan_watermark_tracking_to_project.py
│       │       ├── f8a9b2c3d4e5_add_pg_trgm_for_fuzzy_link_resolution.py
│       │       └── g9a0b3c4d5e6_add_external_id_to_project_and_entity.py
│       ├── api
│       │   ├── __init__.py
│       │   ├── app.py
│       │   ├── container.py
│       │   ├── routers
│       │   │   ├── __init__.py
│       │   │   ├── directory_router.py
│       │   │   ├── importer_router.py
│       │   │   ├── knowledge_router.py
│       │   │   ├── management_router.py
│       │   │   ├── memory_router.py
│       │   │   ├── project_router.py
│       │   │   ├── prompt_router.py
│       │   │   ├── resource_router.py
│       │   │   ├── search_router.py
│       │   │   └── utils.py
│       │   ├── template_loader.py
│       │   └── v2
│       │       ├── __init__.py
│       │       └── routers
│       │           ├── __init__.py
│       │           ├── directory_router.py
│       │           ├── importer_router.py
│       │           ├── knowledge_router.py
│       │           ├── memory_router.py
│       │           ├── project_router.py
│       │           ├── prompt_router.py
│       │           ├── resource_router.py
│       │           └── search_router.py
│       ├── cli
│       │   ├── __init__.py
│       │   ├── app.py
│       │   ├── auth.py
│       │   ├── commands
│       │   │   ├── __init__.py
│       │   │   ├── cloud
│       │   │   │   ├── __init__.py
│       │   │   │   ├── api_client.py
│       │   │   │   ├── bisync_commands.py
│       │   │   │   ├── cloud_utils.py
│       │   │   │   ├── core_commands.py
│       │   │   │   ├── rclone_commands.py
│       │   │   │   ├── rclone_config.py
│       │   │   │   ├── rclone_installer.py
│       │   │   │   ├── upload_command.py
│       │   │   │   └── upload.py
│       │   │   ├── command_utils.py
│       │   │   ├── db.py
│       │   │   ├── format.py
│       │   │   ├── import_chatgpt.py
│       │   │   ├── import_claude_conversations.py
│       │   │   ├── import_claude_projects.py
│       │   │   ├── import_memory_json.py
│       │   │   ├── mcp.py
│       │   │   ├── project.py
│       │   │   ├── status.py
│       │   │   ├── telemetry.py
│       │   │   └── tool.py
│       │   ├── container.py
│       │   └── main.py
│       ├── config.py
│       ├── db.py
│       ├── deps
│       │   ├── __init__.py
│       │   ├── config.py
│       │   ├── db.py
│       │   ├── importers.py
│       │   ├── projects.py
│       │   ├── repositories.py
│       │   └── services.py
│       ├── deps.py
│       ├── file_utils.py
│       ├── ignore_utils.py
│       ├── importers
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── chatgpt_importer.py
│       │   ├── claude_conversations_importer.py
│       │   ├── claude_projects_importer.py
│       │   ├── memory_json_importer.py
│       │   └── utils.py
│       ├── markdown
│       │   ├── __init__.py
│       │   ├── entity_parser.py
│       │   ├── markdown_processor.py
│       │   ├── plugins.py
│       │   ├── schemas.py
│       │   └── utils.py
│       ├── mcp
│       │   ├── __init__.py
│       │   ├── async_client.py
│       │   ├── clients
│       │   │   ├── __init__.py
│       │   │   ├── directory.py
│       │   │   ├── knowledge.py
│       │   │   ├── memory.py
│       │   │   ├── project.py
│       │   │   ├── resource.py
│       │   │   └── search.py
│       │   ├── container.py
│       │   ├── project_context.py
│       │   ├── prompts
│       │   │   ├── __init__.py
│       │   │   ├── ai_assistant_guide.py
│       │   │   ├── continue_conversation.py
│       │   │   ├── recent_activity.py
│       │   │   ├── search.py
│       │   │   └── utils.py
│       │   ├── resources
│       │   │   ├── ai_assistant_guide.md
│       │   │   └── project_info.py
│       │   ├── server.py
│       │   └── tools
│       │       ├── __init__.py
│       │       ├── build_context.py
│       │       ├── canvas.py
│       │       ├── chatgpt_tools.py
│       │       ├── delete_note.py
│       │       ├── edit_note.py
│       │       ├── list_directory.py
│       │       ├── move_note.py
│       │       ├── project_management.py
│       │       ├── read_content.py
│       │       ├── read_note.py
│       │       ├── recent_activity.py
│       │       ├── search.py
│       │       ├── utils.py
│       │       ├── view_note.py
│       │       └── write_note.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── knowledge.py
│       │   ├── project.py
│       │   └── search.py
│       ├── project_resolver.py
│       ├── repository
│       │   ├── __init__.py
│       │   ├── entity_repository.py
│       │   ├── observation_repository.py
│       │   ├── postgres_search_repository.py
│       │   ├── project_info_repository.py
│       │   ├── project_repository.py
│       │   ├── relation_repository.py
│       │   ├── repository.py
│       │   ├── search_index_row.py
│       │   ├── search_repository_base.py
│       │   ├── search_repository.py
│       │   └── sqlite_search_repository.py
│       ├── runtime.py
│       ├── schemas
│       │   ├── __init__.py
│       │   ├── base.py
│       │   ├── cloud.py
│       │   ├── delete.py
│       │   ├── directory.py
│       │   ├── importer.py
│       │   ├── memory.py
│       │   ├── project_info.py
│       │   ├── prompt.py
│       │   ├── request.py
│       │   ├── response.py
│       │   ├── search.py
│       │   ├── sync_report.py
│       │   └── v2
│       │       ├── __init__.py
│       │       ├── entity.py
│       │       └── resource.py
│       ├── services
│       │   ├── __init__.py
│       │   ├── context_service.py
│       │   ├── directory_service.py
│       │   ├── entity_service.py
│       │   ├── exceptions.py
│       │   ├── file_service.py
│       │   ├── initialization.py
│       │   ├── link_resolver.py
│       │   ├── project_service.py
│       │   ├── search_service.py
│       │   └── service.py
│       ├── sync
│       │   ├── __init__.py
│       │   ├── background_sync.py
│       │   ├── coordinator.py
│       │   ├── sync_service.py
│       │   └── watch_service.py
│       ├── telemetry.py
│       ├── templates
│       │   └── prompts
│       │       ├── continue_conversation.hbs
│       │       └── search.hbs
│       └── utils.py
├── test-int
│   ├── BENCHMARKS.md
│   ├── cli
│   │   ├── test_project_commands_integration.py
│   │   └── test_version_integration.py
│   ├── conftest.py
│   ├── mcp
│   │   ├── test_build_context_underscore.py
│   │   ├── test_build_context_validation.py
│   │   ├── test_chatgpt_tools_integration.py
│   │   ├── test_default_project_mode_integration.py
│   │   ├── test_delete_note_integration.py
│   │   ├── test_edit_note_integration.py
│   │   ├── test_lifespan_shutdown_sync_task_cancellation_integration.py
│   │   ├── test_list_directory_integration.py
│   │   ├── test_move_note_integration.py
│   │   ├── test_project_management_integration.py
│   │   ├── test_project_state_sync_integration.py
│   │   ├── test_read_content_integration.py
│   │   ├── test_read_note_integration.py
│   │   ├── test_search_integration.py
│   │   ├── test_single_project_mcp_integration.py
│   │   └── test_write_note_integration.py
│   ├── test_db_wal_mode.py
│   └── test_disable_permalinks_integration.py
├── tests
│   ├── __init__.py
│   ├── api
│   │   ├── conftest.py
│   │   ├── test_api_container.py
│   │   ├── test_async_client.py
│   │   ├── test_continue_conversation_template.py
│   │   ├── test_directory_router.py
│   │   ├── test_importer_router.py
│   │   ├── test_knowledge_router.py
│   │   ├── test_management_router.py
│   │   ├── test_memory_router.py
│   │   ├── test_project_router_operations.py
│   │   ├── test_project_router.py
│   │   ├── test_prompt_router.py
│   │   ├── test_relation_background_resolution.py
│   │   ├── test_resource_router.py
│   │   ├── test_search_router.py
│   │   ├── test_search_template.py
│   │   ├── test_template_loader_helpers.py
│   │   ├── test_template_loader.py
│   │   └── v2
│   │       ├── __init__.py
│   │       ├── conftest.py
│   │       ├── test_directory_router.py
│   │       ├── test_importer_router.py
│   │       ├── test_knowledge_router.py
│   │       ├── test_memory_router.py
│   │       ├── test_project_router.py
│   │       ├── test_prompt_router.py
│   │       ├── test_resource_router.py
│   │       └── test_search_router.py
│   ├── cli
│   │   ├── cloud
│   │   │   ├── test_cloud_api_client_and_utils.py
│   │   │   ├── test_rclone_config_and_bmignore_filters.py
│   │   │   └── test_upload_path.py
│   │   ├── conftest.py
│   │   ├── test_auth_cli_auth.py
│   │   ├── test_cli_container.py
│   │   ├── test_cli_exit.py
│   │   ├── test_cli_tool_exit.py
│   │   ├── test_cli_tools.py
│   │   ├── test_cloud_authentication.py
│   │   ├── test_ignore_utils.py
│   │   ├── test_import_chatgpt.py
│   │   ├── test_import_claude_conversations.py
│   │   ├── test_import_claude_projects.py
│   │   ├── test_import_memory_json.py
│   │   ├── test_project_add_with_local_path.py
│   │   └── test_upload.py
│   ├── conftest.py
│   ├── db
│   │   └── test_issue_254_foreign_key_constraints.py
│   ├── importers
│   │   ├── test_conversation_indexing.py
│   │   ├── test_importer_base.py
│   │   └── test_importer_utils.py
│   ├── markdown
│   │   ├── __init__.py
│   │   ├── test_date_frontmatter_parsing.py
│   │   ├── test_entity_parser_error_handling.py
│   │   ├── test_entity_parser.py
│   │   ├── test_markdown_plugins.py
│   │   ├── test_markdown_processor.py
│   │   ├── test_observation_edge_cases.py
│   │   ├── test_parser_edge_cases.py
│   │   ├── test_relation_edge_cases.py
│   │   └── test_task_detection.py
│   ├── mcp
│   │   ├── clients
│   │   │   ├── __init__.py
│   │   │   └── test_clients.py
│   │   ├── conftest.py
│   │   ├── test_async_client_modes.py
│   │   ├── test_mcp_container.py
│   │   ├── test_obsidian_yaml_formatting.py
│   │   ├── test_permalink_collision_file_overwrite.py
│   │   ├── test_project_context.py
│   │   ├── test_prompts.py
│   │   ├── test_recent_activity_prompt_modes.py
│   │   ├── test_resources.py
│   │   ├── test_server_lifespan_branches.py
│   │   ├── test_tool_build_context.py
│   │   ├── test_tool_canvas.py
│   │   ├── test_tool_delete_note.py
│   │   ├── test_tool_edit_note.py
│   │   ├── test_tool_list_directory.py
│   │   ├── test_tool_move_note.py
│   │   ├── test_tool_project_management.py
│   │   ├── test_tool_read_content.py
│   │   ├── test_tool_read_note.py
│   │   ├── test_tool_recent_activity.py
│   │   ├── test_tool_resource.py
│   │   ├── test_tool_search.py
│   │   ├── test_tool_utils.py
│   │   ├── test_tool_view_note.py
│   │   ├── test_tool_write_note_kebab_filenames.py
│   │   ├── test_tool_write_note.py
│   │   └── tools
│   │       └── test_chatgpt_tools.py
│   ├── Non-MarkdownFileSupport.pdf
│   ├── README.md
│   ├── repository
│   │   ├── test_entity_repository_upsert.py
│   │   ├── test_entity_repository.py
│   │   ├── test_entity_upsert_issue_187.py
│   │   ├── test_observation_repository.py
│   │   ├── test_postgres_search_repository.py
│   │   ├── test_project_info_repository.py
│   │   ├── test_project_repository.py
│   │   ├── test_relation_repository.py
│   │   ├── test_repository.py
│   │   ├── test_search_repository_edit_bug_fix.py
│   │   └── test_search_repository.py
│   ├── schemas
│   │   ├── test_base_timeframe_minimum.py
│   │   ├── test_memory_serialization.py
│   │   ├── test_memory_url_validation.py
│   │   ├── test_memory_url.py
│   │   ├── test_relation_response_reference_resolution.py
│   │   ├── test_schemas.py
│   │   └── test_search.py
│   ├── Screenshot.png
│   ├── services
│   │   ├── test_context_service.py
│   │   ├── test_directory_service.py
│   │   ├── test_entity_service_disable_permalinks.py
│   │   ├── test_entity_service.py
│   │   ├── test_file_service.py
│   │   ├── test_initialization_cloud_mode_branches.py
│   │   ├── test_initialization.py
│   │   ├── test_link_resolver.py
│   │   ├── test_project_removal_bug.py
│   │   ├── test_project_service_operations.py
│   │   ├── test_project_service.py
│   │   └── test_search_service.py
│   ├── sync
│   │   ├── test_character_conflicts.py
│   │   ├── test_coordinator.py
│   │   ├── test_sync_service_incremental.py
│   │   ├── test_sync_service.py
│   │   ├── test_sync_wikilink_issue.py
│   │   ├── test_tmp_files.py
│   │   ├── test_watch_service_atomic_adds.py
│   │   ├── test_watch_service_edge_cases.py
│   │   ├── test_watch_service_reload.py
│   │   └── test_watch_service.py
│   ├── test_config.py
│   ├── test_deps.py
│   ├── test_production_cascade_delete.py
│   ├── test_project_resolver.py
│   ├── test_rclone_commands.py
│   ├── test_runtime.py
│   ├── test_telemetry.py
│   └── utils
│       ├── test_file_utils.py
│       ├── test_frontmatter_obsidian_compatible.py
│       ├── test_parse_tags.py
│       ├── test_permalink_formatting.py
│       ├── test_timezone_utils.py
│       ├── test_utf8_handling.py
│       └── test_validate_project_path.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/src/basic_memory/services/entity_service.py:
--------------------------------------------------------------------------------

```python
  1 | """Service for managing entities in the database."""
  2 | 
  3 | from pathlib import Path
  4 | from typing import List, Optional, Sequence, Tuple, Union
  5 | 
  6 | import frontmatter
  7 | import yaml
  8 | from loguru import logger
  9 | from sqlalchemy.exc import IntegrityError
 10 | 
 11 | 
 12 | from basic_memory.config import ProjectConfig, BasicMemoryConfig
 13 | from basic_memory.file_utils import (
 14 |     has_frontmatter,
 15 |     parse_frontmatter,
 16 |     remove_frontmatter,
 17 |     dump_frontmatter,
 18 | )
 19 | from basic_memory.markdown import EntityMarkdown
 20 | from basic_memory.markdown.entity_parser import EntityParser
 21 | from basic_memory.markdown.utils import entity_model_from_markdown, schema_to_markdown
 22 | from basic_memory.models import Entity as EntityModel
 23 | from basic_memory.models import Observation, Relation
 24 | from basic_memory.models.knowledge import Entity
 25 | from basic_memory.repository import ObservationRepository, RelationRepository
 26 | from basic_memory.repository.entity_repository import EntityRepository
 27 | from basic_memory.schemas import Entity as EntitySchema
 28 | from basic_memory.schemas.base import Permalink
 29 | from basic_memory.services import BaseService, FileService
 30 | from basic_memory.services.exceptions import EntityCreationError, EntityNotFoundError
 31 | from basic_memory.services.link_resolver import LinkResolver
 32 | from basic_memory.services.search_service import SearchService
 33 | from basic_memory.utils import generate_permalink
 34 | 
 35 | 
 36 | class EntityService(BaseService[EntityModel]):
 37 |     """Service for managing entities in the database."""
 38 | 
 39 |     def __init__(
 40 |         self,
 41 |         entity_parser: EntityParser,
 42 |         entity_repository: EntityRepository,
 43 |         observation_repository: ObservationRepository,
 44 |         relation_repository: RelationRepository,
 45 |         file_service: FileService,
 46 |         link_resolver: LinkResolver,
 47 |         search_service: Optional[SearchService] = None,
 48 |         app_config: Optional[BasicMemoryConfig] = None,
 49 |     ):
 50 |         super().__init__(entity_repository)
 51 |         self.observation_repository = observation_repository
 52 |         self.relation_repository = relation_repository
 53 |         self.entity_parser = entity_parser
 54 |         self.file_service = file_service
 55 |         self.link_resolver = link_resolver
 56 |         self.search_service = search_service
 57 |         self.app_config = app_config
 58 | 
 59 |     async def detect_file_path_conflicts(
 60 |         self, file_path: str, skip_check: bool = False
 61 |     ) -> List[Entity]:
 62 |         """Detect potential file path conflicts for a given file path.
 63 | 
 64 |         This checks for entities with similar file paths that might cause conflicts:
 65 |         - Case sensitivity differences (Finance/file.md vs finance/file.md)
 66 |         - Character encoding differences
 67 |         - Hyphen vs space differences
 68 |         - Unicode normalization differences
 69 | 
 70 |         Args:
 71 |             file_path: The file path to check for conflicts
 72 |             skip_check: If True, skip the check and return empty list (optimization for bulk operations)
 73 | 
 74 |         Returns:
 75 |             List of entities that might conflict with the given file path
 76 |         """
 77 |         if skip_check:
 78 |             return []
 79 | 
 80 |         from basic_memory.utils import detect_potential_file_conflicts
 81 | 
 82 |         conflicts = []
 83 | 
 84 |         # Get all existing file paths
 85 |         all_entities = await self.repository.find_all()
 86 |         existing_paths = [entity.file_path for entity in all_entities]
 87 | 
 88 |         # Use the enhanced conflict detection utility
 89 |         conflicting_paths = detect_potential_file_conflicts(file_path, existing_paths)
 90 | 
 91 |         # Find the entities corresponding to conflicting paths
 92 |         for entity in all_entities:
 93 |             if entity.file_path in conflicting_paths:
 94 |                 conflicts.append(entity)
 95 | 
 96 |         return conflicts
 97 | 
 98 |     async def resolve_permalink(
 99 |         self,
100 |         file_path: Permalink | Path,
101 |         markdown: Optional[EntityMarkdown] = None,
102 |         skip_conflict_check: bool = False,
103 |     ) -> str:
104 |         """Get or generate unique permalink for an entity.
105 | 
106 |         Priority:
107 |         1. If markdown has permalink and it's not used by another file -> use as is
108 |         2. If markdown has permalink but it's used by another file -> make unique
109 |         3. For existing files, keep current permalink from db
110 |         4. Generate new unique permalink from file path
111 | 
112 |         Enhanced to detect and handle character-related conflicts.
113 | 
114 |         Note: Uses lightweight repository methods that skip eager loading of
115 |         observations and relations for better performance during bulk operations.
116 |         """
117 |         file_path_str = Path(file_path).as_posix()
118 | 
119 |         # Check for potential file path conflicts before resolving permalink
120 |         conflicts = await self.detect_file_path_conflicts(
121 |             file_path_str, skip_check=skip_conflict_check
122 |         )
123 |         if conflicts:
124 |             logger.warning(
125 |                 f"Detected potential file path conflicts for '{file_path_str}': "
126 |                 f"{[entity.file_path for entity in conflicts]}"
127 |             )
128 | 
129 |         # If markdown has explicit permalink, try to validate it
130 |         if markdown and markdown.frontmatter.permalink:
131 |             desired_permalink = markdown.frontmatter.permalink
132 |             # Use lightweight method - we only need to check file_path
133 |             existing_file_path = await self.repository.get_file_path_for_permalink(
134 |                 desired_permalink
135 |             )
136 | 
137 |             # If no conflict or it's our own file, use as is
138 |             if not existing_file_path or existing_file_path == file_path_str:
139 |                 return desired_permalink
140 | 
141 |         # For existing files, try to find current permalink
142 |         # Use lightweight method - we only need the permalink
143 |         existing_permalink = await self.repository.get_permalink_for_file_path(file_path_str)
144 |         if existing_permalink:
145 |             return existing_permalink
146 | 
147 |         # New file - generate permalink
148 |         if markdown and markdown.frontmatter.permalink:
149 |             desired_permalink = markdown.frontmatter.permalink
150 |         else:
151 |             desired_permalink = generate_permalink(file_path_str)
152 | 
153 |         # Make unique if needed - enhanced to handle character conflicts
154 |         # Use lightweight existence check instead of loading full entity
155 |         permalink = desired_permalink
156 |         suffix = 1
157 |         while await self.repository.permalink_exists(permalink):
158 |             permalink = f"{desired_permalink}-{suffix}"
159 |             suffix += 1
160 |             logger.debug(f"creating unique permalink: {permalink}")
161 | 
162 |         return permalink
163 | 
164 |     async def create_or_update_entity(self, schema: EntitySchema) -> Tuple[EntityModel, bool]:
165 |         """Create new entity or update existing one.
166 |         Returns: (entity, is_new) where is_new is True if a new entity was created
167 |         """
168 |         logger.debug(
169 |             f"Creating or updating entity: {schema.file_path}, permalink: {schema.permalink}"
170 |         )
171 | 
172 |         # Try to find existing entity using strict resolution (no fuzzy search)
173 |         # This prevents incorrectly matching similar file paths like "Node A.md" and "Node C.md"
174 |         existing = await self.link_resolver.resolve_link(schema.file_path, strict=True)
175 |         if not existing and schema.permalink:
176 |             existing = await self.link_resolver.resolve_link(schema.permalink, strict=True)
177 | 
178 |         if existing:
179 |             logger.debug(f"Found existing entity: {existing.file_path}")
180 |             return await self.update_entity(existing, schema), False
181 |         else:
182 |             # Create new entity
183 |             return await self.create_entity(schema), True
184 | 
185 |     async def create_entity(self, schema: EntitySchema) -> EntityModel:
186 |         """Create a new entity and write to filesystem."""
187 |         logger.debug(f"Creating entity: {schema.title}")
188 | 
189 |         # Get file path and ensure it's a Path object
190 |         file_path = Path(schema.file_path)
191 | 
192 |         if await self.file_service.exists(file_path):
193 |             raise EntityCreationError(
194 |                 f"file for entity {schema.folder}/{schema.title} already exists: {file_path}"
195 |             )
196 | 
197 |         # Parse content frontmatter to check for user-specified permalink and entity_type
198 |         content_markdown = None
199 |         if schema.content and has_frontmatter(schema.content):
200 |             content_frontmatter = parse_frontmatter(schema.content)
201 | 
202 |             # If content has entity_type/type, use it to override the schema entity_type
203 |             if "type" in content_frontmatter:
204 |                 schema.entity_type = content_frontmatter["type"]
205 | 
206 |             if "permalink" in content_frontmatter:
207 |                 # Create a minimal EntityMarkdown object for permalink resolution
208 |                 from basic_memory.markdown.schemas import EntityFrontmatter
209 | 
210 |                 frontmatter_metadata = {
211 |                     "title": schema.title,
212 |                     "type": schema.entity_type,
213 |                     "permalink": content_frontmatter["permalink"],
214 |                 }
215 |                 frontmatter_obj = EntityFrontmatter(metadata=frontmatter_metadata)
216 |                 content_markdown = EntityMarkdown(
217 |                     frontmatter=frontmatter_obj,
218 |                     content="",  # content not needed for permalink resolution
219 |                     observations=[],
220 |                     relations=[],
221 |                 )
222 | 
223 |         # Get unique permalink (prioritizing content frontmatter) unless disabled
224 |         if self.app_config and self.app_config.disable_permalinks:
225 |             # Use empty string as sentinel to indicate permalinks are disabled
226 |             # The permalink property will return None when it sees empty string
227 |             schema._permalink = ""
228 |         else:
229 |             # Generate and set permalink
230 |             permalink = await self.resolve_permalink(file_path, content_markdown)
231 |             schema._permalink = permalink
232 | 
233 |         post = await schema_to_markdown(schema)
234 | 
235 |         # write file
236 |         final_content = dump_frontmatter(post)
237 |         checksum = await self.file_service.write_file(file_path, final_content)
238 | 
239 |         # parse entity from content we just wrote (avoids re-reading file for cloud compatibility)
240 |         entity_markdown = await self.entity_parser.parse_markdown_content(
241 |             file_path=file_path,
242 |             content=final_content,
243 |         )
244 | 
245 |         # create entity
246 |         created = await self.create_entity_from_markdown(file_path, entity_markdown)
247 | 
248 |         # add relations
249 |         entity = await self.update_entity_relations(created.file_path, entity_markdown)
250 | 
251 |         # Set final checksum to mark complete
252 |         return await self.repository.update(entity.id, {"checksum": checksum})
253 | 
254 |     async def update_entity(self, entity: EntityModel, schema: EntitySchema) -> EntityModel:
255 |         """Update an entity's content and metadata."""
256 |         logger.debug(
257 |             f"Updating entity with permalink: {entity.permalink} content-type: {schema.content_type}"
258 |         )
259 | 
260 |         # Convert file path string to Path
261 |         file_path = Path(entity.file_path)
262 | 
263 |         # Read existing content via file_service (for cloud compatibility)
264 |         existing_content = await self.file_service.read_file_content(file_path)
265 |         existing_markdown = await self.entity_parser.parse_markdown_content(
266 |             file_path=file_path,
267 |             content=existing_content,
268 |         )
269 | 
270 |         # Parse content frontmatter to check for user-specified permalink and entity_type
271 |         content_markdown = None
272 |         if schema.content and has_frontmatter(schema.content):
273 |             content_frontmatter = parse_frontmatter(schema.content)
274 | 
275 |             # If content has entity_type/type, use it to override the schema entity_type
276 |             if "type" in content_frontmatter:
277 |                 schema.entity_type = content_frontmatter["type"]
278 | 
279 |             if "permalink" in content_frontmatter:
280 |                 # Create a minimal EntityMarkdown object for permalink resolution
281 |                 from basic_memory.markdown.schemas import EntityFrontmatter
282 | 
283 |                 frontmatter_metadata = {
284 |                     "title": schema.title,
285 |                     "type": schema.entity_type,
286 |                     "permalink": content_frontmatter["permalink"],
287 |                 }
288 |                 frontmatter_obj = EntityFrontmatter(metadata=frontmatter_metadata)
289 |                 content_markdown = EntityMarkdown(
290 |                     frontmatter=frontmatter_obj,
291 |                     content="",  # content not needed for permalink resolution
292 |                     observations=[],
293 |                     relations=[],
294 |                 )
295 | 
296 |         # Check if we need to update the permalink based on content frontmatter (unless disabled)
297 |         new_permalink = entity.permalink  # Default to existing
298 |         if self.app_config and not self.app_config.disable_permalinks:
299 |             if content_markdown and content_markdown.frontmatter.permalink:
300 |                 # Resolve permalink with the new content frontmatter
301 |                 resolved_permalink = await self.resolve_permalink(file_path, content_markdown)
302 |                 if resolved_permalink != entity.permalink:
303 |                     new_permalink = resolved_permalink
304 |                     # Update the schema to use the new permalink
305 |                     schema._permalink = new_permalink
306 | 
307 |         # Create post with new content from schema
308 |         post = await schema_to_markdown(schema)
309 | 
310 |         # Merge new metadata with existing metadata
311 |         existing_markdown.frontmatter.metadata.update(post.metadata)
312 | 
313 |         # Ensure the permalink in the metadata is the resolved one
314 |         if new_permalink != entity.permalink:
315 |             existing_markdown.frontmatter.metadata["permalink"] = new_permalink
316 | 
317 |         # Create a new post with merged metadata
318 |         merged_post = frontmatter.Post(post.content, **existing_markdown.frontmatter.metadata)
319 | 
320 |         # write file
321 |         final_content = dump_frontmatter(merged_post)
322 |         checksum = await self.file_service.write_file(file_path, final_content)
323 | 
324 |         # parse entity from content we just wrote (avoids re-reading file for cloud compatibility)
325 |         entity_markdown = await self.entity_parser.parse_markdown_content(
326 |             file_path=file_path,
327 |             content=final_content,
328 |         )
329 | 
330 |         # update entity in db
331 |         entity = await self.update_entity_and_observations(file_path, entity_markdown)
332 | 
333 |         # add relations
334 |         await self.update_entity_relations(file_path.as_posix(), entity_markdown)
335 | 
336 |         # Set final checksum to match file
337 |         entity = await self.repository.update(entity.id, {"checksum": checksum})
338 | 
339 |         return entity
340 | 
341 |     async def delete_entity(self, permalink_or_id: str | int) -> bool:
342 |         """Delete entity and its file."""
343 |         logger.debug(f"Deleting entity: {permalink_or_id}")
344 | 
345 |         try:
346 |             # Get entity first for file deletion
347 |             if isinstance(permalink_or_id, str):
348 |                 entity = await self.get_by_permalink(permalink_or_id)
349 |             else:
350 |                 entities = await self.get_entities_by_id([permalink_or_id])
351 |                 if len(entities) != 1:  # pragma: no cover
352 |                     logger.error(
353 |                         "Entity lookup error", entity_id=permalink_or_id, found_count=len(entities)
354 |                     )
355 |                     raise ValueError(
356 |                         f"Expected 1 entity with ID {permalink_or_id}, got {len(entities)}"
357 |                     )
358 |                 entity = entities[0]
359 | 
360 |             # Delete from search index first (if search_service is available)
361 |             if self.search_service:
362 |                 await self.search_service.handle_delete(entity)
363 | 
364 |             # Delete file
365 |             await self.file_service.delete_entity_file(entity)
366 | 
367 |             # Delete from DB (this will cascade to observations/relations)
368 |             return await self.repository.delete(entity.id)
369 | 
370 |         except EntityNotFoundError:
371 |             logger.info(f"Entity not found: {permalink_or_id}")
372 |             return True  # Already deleted
373 | 
374 |     async def get_by_permalink(self, permalink: str) -> EntityModel:
375 |         """Get entity by permalink."""
376 |         logger.debug(f"Getting entity by permalink: {permalink}")
377 |         db_entity = await self.repository.get_by_permalink(permalink)
378 |         if not db_entity:
379 |             raise EntityNotFoundError(f"Entity not found: {permalink}")
380 |         return db_entity
381 | 
382 |     async def get_entities_by_id(self, ids: List[int]) -> Sequence[EntityModel]:
383 |         """Get specific entities and their relationships."""
384 |         logger.debug(f"Getting entities: {ids}")
385 |         return await self.repository.find_by_ids(ids)
386 | 
387 |     async def get_entities_by_permalinks(self, permalinks: List[str]) -> Sequence[EntityModel]:
388 |         """Get specific nodes and their relationships."""
389 |         logger.debug(f"Getting entities permalinks: {permalinks}")
390 |         return await self.repository.find_by_permalinks(permalinks)
391 | 
392 |     async def delete_entity_by_file_path(self, file_path: Union[str, Path]) -> None:
393 |         """Delete entity by file path."""
394 |         await self.repository.delete_by_file_path(str(file_path))
395 | 
396 |     async def create_entity_from_markdown(
397 |         self, file_path: Path, markdown: EntityMarkdown
398 |     ) -> EntityModel:
399 |         """Create entity and observations only.
400 | 
401 |         Creates the entity with null checksum to indicate sync not complete.
402 |         Relations will be added in second pass.
403 | 
404 |         Uses UPSERT approach to handle permalink/file_path conflicts cleanly.
405 |         """
406 |         logger.debug(f"Creating entity: {markdown.frontmatter.title} file_path: {file_path}")
407 |         model = entity_model_from_markdown(
408 |             file_path, markdown, project_id=self.repository.project_id
409 |         )
410 | 
411 |         # Mark as incomplete because we still need to add relations
412 |         model.checksum = None
413 | 
414 |         # Use UPSERT to handle conflicts cleanly
415 |         try:
416 |             return await self.repository.upsert_entity(model)
417 |         except Exception as e:
418 |             logger.error(f"Failed to upsert entity for {file_path}: {e}")
419 |             raise EntityCreationError(f"Failed to create entity: {str(e)}") from e
420 | 
421 |     async def update_entity_and_observations(
422 |         self, file_path: Path, markdown: EntityMarkdown
423 |     ) -> EntityModel:
424 |         """Update entity fields and observations.
425 | 
426 |         Updates everything except relations and sets null checksum
427 |         to indicate sync not complete.
428 |         """
429 |         logger.debug(f"Updating entity and observations: {file_path}")
430 | 
431 |         db_entity = await self.repository.get_by_file_path(file_path.as_posix())
432 | 
433 |         # Clear observations for entity
434 |         await self.observation_repository.delete_by_fields(entity_id=db_entity.id)
435 | 
436 |         # add new observations
437 |         observations = [
438 |             Observation(
439 |                 project_id=self.observation_repository.project_id,
440 |                 entity_id=db_entity.id,
441 |                 content=obs.content,
442 |                 category=obs.category,
443 |                 context=obs.context,
444 |                 tags=obs.tags,
445 |             )
446 |             for obs in markdown.observations
447 |         ]
448 |         await self.observation_repository.add_all(observations)
449 | 
450 |         # update values from markdown
451 |         db_entity = entity_model_from_markdown(file_path, markdown, db_entity)
452 | 
453 |         # checksum value is None == not finished with sync
454 |         db_entity.checksum = None
455 | 
456 |         # update entity
457 |         return await self.repository.update(
458 |             db_entity.id,
459 |             db_entity,
460 |         )
461 | 
462 |     async def update_entity_relations(
463 |         self,
464 |         path: str,
465 |         markdown: EntityMarkdown,
466 |     ) -> EntityModel:
467 |         """Update relations for entity"""
468 |         logger.debug(f"Updating relations for entity: {path}")
469 | 
470 |         db_entity = await self.repository.get_by_file_path(path)
471 | 
472 |         # Clear existing relations first
473 |         await self.relation_repository.delete_outgoing_relations_from_entity(db_entity.id)
474 | 
475 |         # Batch resolve all relation targets in parallel
476 |         if markdown.relations:
477 |             import asyncio
478 | 
479 |             # Create tasks for all relation lookups
480 |             # Use strict=True to disable fuzzy search - only exact matches should create resolved relations
481 |             # This ensures forward references (links to non-existent entities) remain unresolved (to_id=NULL)
482 |             lookup_tasks = [
483 |                 self.link_resolver.resolve_link(rel.target, strict=True)
484 |                 for rel in markdown.relations
485 |             ]
486 | 
487 |             # Execute all lookups in parallel
488 |             resolved_entities = await asyncio.gather(*lookup_tasks, return_exceptions=True)
489 | 
490 |             # Process results and create relation records
491 |             relations_to_add = []
492 |             for rel, resolved in zip(markdown.relations, resolved_entities):
493 |                 # Handle exceptions from gather and None results
494 |                 target_entity: Optional[Entity] = None
495 |                 if not isinstance(resolved, Exception):
496 |                     # Type narrowing: resolved is Optional[Entity] here, not Exception
497 |                     target_entity = resolved  # type: ignore
498 | 
499 |                 # if the target is found, store the id
500 |                 target_id = target_entity.id if target_entity else None
501 |                 # if the target is found, store the title, otherwise add the target for a "forward link"
502 |                 target_name = target_entity.title if target_entity else rel.target
503 | 
504 |                 # Create the relation
505 |                 relation = Relation(
506 |                     project_id=self.relation_repository.project_id,
507 |                     from_id=db_entity.id,
508 |                     to_id=target_id,
509 |                     to_name=target_name,
510 |                     relation_type=rel.type,
511 |                     context=rel.context,
512 |                 )
513 |                 relations_to_add.append(relation)
514 | 
515 |             # Batch insert all relations
516 |             if relations_to_add:
517 |                 try:
518 |                     await self.relation_repository.add_all(relations_to_add)
519 |                 except IntegrityError:
520 |                     # Some relations might be duplicates - fall back to individual inserts
521 |                     logger.debug("Batch relation insert failed, trying individual inserts")
522 |                     for relation in relations_to_add:
523 |                         try:
524 |                             await self.relation_repository.add(relation)
525 |                         except IntegrityError:
526 |                             # Unique constraint violation - relation already exists
527 |                             logger.debug(
528 |                                 f"Skipping duplicate relation {relation.relation_type} from {db_entity.permalink}"
529 |                             )
530 |                             continue
531 | 
532 |         return await self.repository.get_by_file_path(path)
533 | 
534 |     async def edit_entity(
535 |         self,
536 |         identifier: str,
537 |         operation: str,
538 |         content: str,
539 |         section: Optional[str] = None,
540 |         find_text: Optional[str] = None,
541 |         expected_replacements: int = 1,
542 |     ) -> EntityModel:
543 |         """Edit an existing entity's content using various operations.
544 | 
545 |         Args:
546 |             identifier: Entity identifier (permalink, title, etc.)
547 |             operation: The editing operation (append, prepend, find_replace, replace_section)
548 |             content: The content to add or use for replacement
549 |             section: For replace_section operation - the markdown header
550 |             find_text: For find_replace operation - the text to find and replace
551 |             expected_replacements: For find_replace operation - expected number of replacements (default: 1)
552 | 
553 |         Returns:
554 |             The updated entity model
555 | 
556 |         Raises:
557 |             EntityNotFoundError: If the entity cannot be found
558 |             ValueError: If required parameters are missing for the operation or replacement count doesn't match expected
559 |         """
560 |         logger.debug(f"Editing entity: {identifier}, operation: {operation}")
561 | 
562 |         # Find the entity using the link resolver with strict mode for destructive operations
563 |         entity = await self.link_resolver.resolve_link(identifier, strict=True)
564 |         if not entity:
565 |             raise EntityNotFoundError(f"Entity not found: {identifier}")
566 | 
567 |         # Read the current file content
568 |         file_path = Path(entity.file_path)
569 |         current_content, _ = await self.file_service.read_file(file_path)
570 | 
571 |         # Apply the edit operation
572 |         new_content = self.apply_edit_operation(
573 |             current_content, operation, content, section, find_text, expected_replacements
574 |         )
575 | 
576 |         # Write the updated content back to the file
577 |         checksum = await self.file_service.write_file(file_path, new_content)
578 | 
579 |         # Parse the content we just wrote (avoids re-reading file for cloud compatibility)
580 |         entity_markdown = await self.entity_parser.parse_markdown_content(
581 |             file_path=file_path,
582 |             content=new_content,
583 |         )
584 | 
585 |         # Update entity and its relationships
586 |         entity = await self.update_entity_and_observations(file_path, entity_markdown)
587 |         await self.update_entity_relations(file_path.as_posix(), entity_markdown)
588 | 
589 |         # Set final checksum to match file
590 |         entity = await self.repository.update(entity.id, {"checksum": checksum})
591 | 
592 |         return entity
593 | 
594 |     def apply_edit_operation(
595 |         self,
596 |         current_content: str,
597 |         operation: str,
598 |         content: str,
599 |         section: Optional[str] = None,
600 |         find_text: Optional[str] = None,
601 |         expected_replacements: int = 1,
602 |     ) -> str:
603 |         """Apply the specified edit operation to the current content."""
604 | 
605 |         if operation == "append":
606 |             # Ensure proper spacing
607 |             if current_content and not current_content.endswith("\n"):
608 |                 return current_content + "\n" + content
609 |             return current_content + content  # pragma: no cover
610 | 
611 |         elif operation == "prepend":
612 |             # Handle frontmatter-aware prepending
613 |             return self._prepend_after_frontmatter(current_content, content)
614 | 
615 |         elif operation == "find_replace":
616 |             if not find_text:
617 |                 raise ValueError("find_text is required for find_replace operation")
618 |             if not find_text.strip():
619 |                 raise ValueError("find_text cannot be empty or whitespace only")
620 | 
621 |             # Count actual occurrences
622 |             actual_count = current_content.count(find_text)
623 | 
624 |             # Validate count matches expected
625 |             if actual_count != expected_replacements:
626 |                 if actual_count == 0:
627 |                     raise ValueError(f"Text to replace not found: '{find_text}'")
628 |                 else:
629 |                     raise ValueError(
630 |                         f"Expected {expected_replacements} occurrences of '{find_text}', "
631 |                         f"but found {actual_count}"
632 |                     )
633 | 
634 |             return current_content.replace(find_text, content)
635 | 
636 |         elif operation == "replace_section":
637 |             if not section:
638 |                 raise ValueError("section is required for replace_section operation")
639 |             if not section.strip():
640 |                 raise ValueError("section cannot be empty or whitespace only")
641 |             return self.replace_section_content(current_content, section, content)
642 | 
643 |         else:
644 |             raise ValueError(f"Unsupported operation: {operation}")
645 | 
646 |     def replace_section_content(
647 |         self, current_content: str, section_header: str, new_content: str
648 |     ) -> str:
649 |         """Replace content under a specific markdown section header.
650 | 
651 |         This method uses a simple, safe approach: when replacing a section, it only
652 |         replaces the immediate content under that header until it encounters the next
653 |         header of ANY level. This means:
654 | 
655 |         - Replacing "# Header" replaces content until "## Subsection" (preserves subsections)
656 |         - Replacing "## Section" replaces content until "### Subsection" (preserves subsections)
657 |         - More predictable and safer than trying to consume entire hierarchies
658 | 
659 |         Args:
660 |             current_content: The current markdown content
661 |             section_header: The section header to find and replace (e.g., "## Section Name")
662 |             new_content: The new content to replace the section with (should not include the header itself)
663 | 
664 |         Returns:
665 |             The updated content with the section replaced
666 | 
667 |         Raises:
668 |             ValueError: If multiple sections with the same header are found
669 |         """
670 |         # Normalize the section header (ensure it starts with #)
671 |         if not section_header.startswith("#"):
672 |             section_header = "## " + section_header
673 | 
674 |         # Strip duplicate header from new_content if present (fix for issue #390)
675 |         # LLMs sometimes include the section header in their content, which would create duplicates
676 |         new_content_lines = new_content.lstrip().split("\n")
677 |         if new_content_lines and new_content_lines[0].strip() == section_header.strip():
678 |             # Remove the duplicate header line
679 |             new_content = "\n".join(new_content_lines[1:]).lstrip()
680 | 
681 |         # First pass: count matching sections to check for duplicates
682 |         lines = current_content.split("\n")
683 |         matching_sections = []
684 | 
685 |         for i, line in enumerate(lines):
686 |             if line.strip() == section_header.strip():
687 |                 matching_sections.append(i)
688 | 
689 |         # Handle multiple sections error
690 |         if len(matching_sections) > 1:
691 |             raise ValueError(
692 |                 f"Multiple sections found with header '{section_header}'. "
693 |                 f"Section replacement requires unique headers."
694 |             )
695 | 
696 |         # If no section found, append it
697 |         if len(matching_sections) == 0:
698 |             logger.info(f"Section '{section_header}' not found, appending to end of document")
699 |             separator = "\n\n" if current_content and not current_content.endswith("\n\n") else ""
700 |             return current_content + separator + section_header + "\n" + new_content
701 | 
702 |         # Replace the single matching section
703 |         result_lines = []
704 |         section_line_idx = matching_sections[0]
705 | 
706 |         i = 0
707 |         while i < len(lines):
708 |             line = lines[i]
709 | 
710 |             # Check if this is our target section header
711 |             if i == section_line_idx:
712 |                 # Add the section header and new content
713 |                 result_lines.append(line)
714 |                 result_lines.append(new_content)
715 |                 i += 1
716 | 
717 |                 # Skip the original section content until next header or end
718 |                 while i < len(lines):
719 |                     next_line = lines[i]
720 |                     # Stop consuming when we hit any header (preserve subsections)
721 |                     if next_line.startswith("#"):
722 |                         # We found another header - continue processing from here
723 |                         break
724 |                     i += 1
725 |                 # Continue processing from the next header (don't increment i again)
726 |                 continue
727 | 
728 |             # Add all other lines (including subsequent sections)
729 |             result_lines.append(line)
730 |             i += 1
731 | 
732 |         return "\n".join(result_lines)
733 | 
734 |     def _prepend_after_frontmatter(self, current_content: str, content: str) -> str:
735 |         """Prepend content after frontmatter, preserving frontmatter structure."""
736 | 
737 |         # Check if file has frontmatter
738 |         if has_frontmatter(current_content):
739 |             try:
740 |                 # Parse and separate frontmatter from body
741 |                 frontmatter_data = parse_frontmatter(current_content)
742 |                 body_content = remove_frontmatter(current_content)
743 | 
744 |                 # Prepend content to the body
745 |                 if content and not content.endswith("\n"):
746 |                     new_body = content + "\n" + body_content
747 |                 else:
748 |                     new_body = content + body_content
749 | 
750 |                 # Reconstruct file with frontmatter + prepended body
751 |                 yaml_fm = yaml.dump(frontmatter_data, sort_keys=False, allow_unicode=True)
752 |                 return f"---\n{yaml_fm}---\n\n{new_body.strip()}"
753 | 
754 |             except Exception as e:  # pragma: no cover
755 |                 logger.warning(
756 |                     f"Failed to parse frontmatter during prepend: {e}"
757 |                 )  # pragma: no cover
758 |                 # Fall back to simple prepend if frontmatter parsing fails  # pragma: no cover
759 | 
760 |         # No frontmatter or parsing failed - do simple prepend  # pragma: no cover
761 |         if content and not content.endswith("\n"):  # pragma: no cover
762 |             return content + "\n" + current_content  # pragma: no cover
763 |         return content + current_content  # pragma: no cover
764 | 
765 |     async def move_entity(
766 |         self,
767 |         identifier: str,
768 |         destination_path: str,
769 |         project_config: ProjectConfig,
770 |         app_config: BasicMemoryConfig,
771 |     ) -> EntityModel:
772 |         """Move entity to new location with database consistency.
773 | 
774 |         Args:
775 |             identifier: Entity identifier (title, permalink, or memory:// URL)
776 |             destination_path: New path relative to project root
777 |             project_config: Project configuration for file operations
778 |             app_config: App configuration for permalink update settings
779 | 
780 |         Returns:
781 |             Success message with move details
782 | 
783 |         Raises:
784 |             EntityNotFoundError: If the entity cannot be found
785 |             ValueError: If move operation fails due to validation or filesystem errors
786 |         """
787 |         logger.debug(f"Moving entity: {identifier} to {destination_path}")
788 | 
789 |         # 1. Resolve identifier to entity with strict mode for destructive operations
790 |         entity = await self.link_resolver.resolve_link(identifier, strict=True)
791 |         if not entity:
792 |             raise EntityNotFoundError(f"Entity not found: {identifier}")
793 | 
794 |         current_path = entity.file_path
795 |         old_permalink = entity.permalink
796 | 
797 |         # 2. Validate destination path format first
798 |         if not destination_path or destination_path.startswith("/") or not destination_path.strip():
799 |             raise ValueError(f"Invalid destination path: {destination_path}")
800 | 
801 |         # 3. Validate paths
802 |         # NOTE: In tenantless/cloud mode, we cannot rely on local filesystem paths.
803 |         # Use FileService for existence checks and moving.
804 |         if not await self.file_service.exists(current_path):
805 |             raise ValueError(f"Source file not found: {current_path}")
806 | 
807 |         if await self.file_service.exists(destination_path):
808 |             raise ValueError(f"Destination already exists: {destination_path}")
809 | 
810 |         try:
811 |             # 4. Ensure destination directory if needed (no-op for S3)
812 |             await self.file_service.ensure_directory(Path(destination_path).parent)
813 | 
814 |             # 5. Move physical file via FileService (filesystem rename or cloud move)
815 |             await self.file_service.move_file(current_path, destination_path)
816 |             logger.info(f"Moved file: {current_path} -> {destination_path}")
817 | 
818 |             # 6. Prepare database updates
819 |             updates = {"file_path": destination_path}
820 | 
821 |             # 7. Update permalink if configured or if entity has null permalink (unless disabled)
822 |             if not app_config.disable_permalinks and (
823 |                 app_config.update_permalinks_on_move or old_permalink is None
824 |             ):
825 |                 # Generate new permalink from destination path
826 |                 new_permalink = await self.resolve_permalink(destination_path)
827 | 
828 |                 # Update frontmatter with new permalink
829 |                 await self.file_service.update_frontmatter(
830 |                     destination_path, {"permalink": new_permalink}
831 |                 )
832 | 
833 |                 updates["permalink"] = new_permalink
834 |                 if old_permalink is None:
835 |                     logger.info(
836 |                         f"Generated permalink for entity with null permalink: {new_permalink}"
837 |                     )
838 |                 else:
839 |                     logger.info(f"Updated permalink: {old_permalink} -> {new_permalink}")
840 | 
841 |             # 8. Recalculate checksum
842 |             new_checksum = await self.file_service.compute_checksum(destination_path)
843 |             updates["checksum"] = new_checksum
844 | 
845 |             # 9. Update database
846 |             updated_entity = await self.repository.update(entity.id, updates)
847 |             if not updated_entity:
848 |                 raise ValueError(f"Failed to update entity in database: {entity.id}")
849 | 
850 |             return updated_entity
851 | 
852 |         except Exception as e:
853 |             # Rollback: try to restore original file location if move succeeded
854 |             try:
855 |                 if await self.file_service.exists(
856 |                     destination_path
857 |                 ) and not await self.file_service.exists(current_path):
858 |                     await self.file_service.move_file(destination_path, current_path)
859 |                     logger.info(f"Rolled back file move: {destination_path} -> {current_path}")
860 |             except Exception as rollback_error:  # pragma: no cover
861 |                 logger.error(f"Failed to rollback file move: {rollback_error}")
862 | 
863 |             # Re-raise the original error with context
864 |             raise ValueError(f"Move failed: {str(e)}") from e
865 | 
```

--------------------------------------------------------------------------------
/specs/SPEC-19 Sync Performance and Memory Optimization.md:
--------------------------------------------------------------------------------

```markdown
   1 | ---
   2 | title: 'SPEC-19: Sync Performance and Memory Optimization'
   3 | type: spec
   4 | permalink: specs/spec-17-sync-performance-optimization
   5 | tags:
   6 | - performance
   7 | - memory
   8 | - sync
   9 | - optimization
  10 | - core
  11 | status: draft
  12 | ---
  13 | 
  14 | # SPEC-19: Sync Performance and Memory Optimization
  15 | 
  16 | ## Why
  17 | 
  18 | ### Problem Statement
  19 | 
  20 | Current sync implementation causes Out-of-Memory (OOM) kills and poor performance on production systems:
  21 | 
  22 | **Evidence from Production**:
  23 | - **Tenant-6d2ff1a3**: OOM killed on 1GB machine
  24 |   - Files: 2,621 total (31 PDFs, 80MB binary data)
  25 |   - Memory: 1.5-1.7GB peak usage
  26 |   - Sync duration: 15+ minutes
  27 |   - Error: `Out of memory: Killed process 693 (python)`
  28 | 
  29 | **Root Causes**:
  30 | 
  31 | 1. **Checksum-based scanning loads ALL files into memory**
  32 |    - `scan_directory()` computes checksums for ALL 2,624 files upfront
  33 |    - Results stored in multiple dicts (`ScanResult.files`, `SyncReport.checksums`)
  34 |    - Even unchanged files are fully read and checksummed
  35 | 
  36 | 2. **Large files read entirely for checksums**
  37 |    - 16MB PDF → Full read into memory → Compute checksum
  38 |    - No streaming or chunked processing
  39 |    - TigrisFS caching compounds memory usage
  40 | 
  41 | 3. **Unbounded concurrency**
  42 |    - All 2,624 files processed simultaneously
  43 |    - Each file loads full content into memory
  44 |    - No semaphore limiting concurrent operations
  45 | 
  46 | 4. **Cloud-specific resource leaks**
  47 |    - aiohttp session leak in keepalive (not in context manager)
  48 |    - Circuit breaker resets every 30s sync cycle (ineffective)
  49 |    - Thundering herd: all tenants sync at :00 and :30
  50 | 
  51 | ### Impact
  52 | 
  53 | - **Production stability**: OOM kills are unacceptable
  54 | - **User experience**: 15+ minute syncs are too slow
  55 | - **Cost**: Forced upgrades from 1GB → 2GB machines ($5-10/mo per tenant)
  56 | - **Scalability**: Current approach won't scale to 100+ tenants
  57 | 
  58 | ### Architectural Decision
  59 | 
  60 | **Fix in basic-memory core first, NOT UberSync**
  61 | 
  62 | Rationale:
  63 | - Root causes are algorithmic, not architectural
  64 | - Benefits all users (CLI + Cloud)
  65 | - Lower risk than new centralized service
  66 | - Known solutions (rsync/rclone use same pattern)
  67 | - Can defer UberSync until metrics prove it necessary
  68 | 
  69 | ## What
  70 | 
  71 | ### Affected Components
  72 | 
  73 | **basic-memory (core)**:
  74 | - `src/basic_memory/sync/sync_service.py` - Core sync algorithm (~42KB)
  75 | - `src/basic_memory/models.py` - Entity model (add mtime/size columns)
  76 | - `src/basic_memory/file_utils.py` - Checksum computation functions
  77 | - `src/basic_memory/repository/entity_repository.py` - Database queries
  78 | - `alembic/versions/` - Database migration for schema changes
  79 | 
  80 | **basic-memory-cloud (wrapper)**:
  81 | - `apps/api/src/basic_memory_cloud_api/sync_worker.py` - Cloud sync wrapper
  82 | - Circuit breaker implementation
  83 | - Sync coordination logic
  84 | 
  85 | ### Database Schema Changes
  86 | 
  87 | Add to Entity model:
  88 | ```python
  89 | mtime: float  # File modification timestamp
  90 | size: int     # File size in bytes
  91 | ```
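
For illustration, the new columns might be declared roughly like this in the SQLAlchemy model (a sketch only; the real `Entity` model in `models.py` has many more fields):

```python
# Illustrative sketch, not the actual model: nullable columns so existing rows
# migrate cleanly and get backfilled on the first scan after upgrade.
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class Entity(Base):
    __tablename__ = "entity"
    id: Mapped[int] = mapped_column(primary_key=True)
    mtime: Mapped[float | None] = mapped_column(nullable=True)  # file modification timestamp
    size: Mapped[int | None] = mapped_column(nullable=True)     # file size in bytes
```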
  92 | 
  93 | ## How (High Level)
  94 | 
  95 | ### Phase 1: Core Algorithm Fixes (basic-memory)
  96 | 
  97 | **Priority: P0 - Critical**
  98 | 
  99 | #### 1.1 mtime-based Scanning (Issue #383)
 100 | 
 101 | Replace expensive checksum-based scanning with lightweight stat-based comparison:
 102 | 
 103 | ```python
 104 | async def scan_directory(self, directory: Path) -> ScanResult:
 105 |     """Scan using mtime/size instead of checksums"""
 106 |     result = ScanResult()
 107 | 
 108 |     for root, dirnames, filenames in os.walk(str(directory)):
 109 |         for filename in filenames:
 110 |             path = Path(root) / filename
 111 |             rel_path = path.relative_to(directory).as_posix()
 112 |             stat = path.stat()
 113 |             # Store lightweight metadata instead of checksum
 114 |             result.files[rel_path] = {
 115 |                 'mtime': stat.st_mtime,
 116 |                 'size': stat.st_size
 117 |             }
 118 | 
 119 |     return result
 120 | 
 121 | async def scan(self, directory: Path):
 122 |     """Compare mtime/size, only compute checksums for changed files"""
 123 |     db_state = await self.get_db_file_state()  # Include mtime/size
 124 |     scan_result = await self.scan_directory(directory)
 125 | 
 126 |     for file_path, metadata in scan_result.files.items():
 127 |         db_metadata = db_state.get(file_path)
 128 | 
 129 |         # Only compute expensive checksum if mtime/size changed
 130 |         if not db_metadata or metadata['mtime'] != db_metadata['mtime'] or metadata['size'] != db_metadata['size']:
 131 |             checksum = await self._compute_checksum_streaming(file_path)
 132 |             # Process immediately, don't accumulate in memory
 133 | ```
 134 | 
 135 | **Benefits**:
 136 | - No file reads during initial scan (just stat calls)
 137 | - ~90% reduction in memory usage
 138 | - ~10x faster scan phase
 139 | - Only checksum files that actually changed
 140 | 
 141 | #### 1.2 Streaming Checksum Computation (Issue #382)
 142 | 
 143 | For large files (>1MB), use chunked reading to avoid loading entire file:
 144 | 
 145 | ```python
 146 | async def _compute_checksum_streaming(self, path: Path, chunk_size: int = 65536) -> str:
 147 |     """Compute checksum using 64KB chunks for large files"""
 148 |     hasher = hashlib.sha256()
 149 | 
 150 |     loop = asyncio.get_event_loop()
 151 | 
 152 |     def read_chunks():
 153 |         with open(path, 'rb') as f:
 154 |             while chunk := f.read(chunk_size):
 155 |                 hasher.update(chunk)
 156 | 
 157 |     await loop.run_in_executor(None, read_chunks)
 158 |     return hasher.hexdigest()
 159 | 
 160 | async def _compute_checksum_async(self, file_path: Path) -> str:
 161 |     """Choose appropriate checksum method based on file size"""
 162 |     stat = file_path.stat()
 163 | 
 164 |     if stat.st_size > 1_048_576:  # 1MB threshold
 165 |         return await self._compute_checksum_streaming(file_path)
 166 |     else:
 167 |         # Small files: existing fast path
 168 |         content = await self._read_file_async(file_path)
 169 |         return compute_checksum(content)
 170 | ```
 171 | 
 172 | **Benefits**:
 173 | - Constant memory usage regardless of file size
 174 | - 16MB PDF uses 64KB memory (not 16MB)
 175 | - Works well with TigrisFS network I/O
 176 | 
 177 | #### 1.3 Bounded Concurrency (Issue #198)
 178 | 
 179 | Add a semaphore to limit concurrent file operations, or consider using aiofiles for async reads (see the sketch after the benefits list below).
 180 | 
 181 | ```python
 182 | class SyncService:
 183 |     def __init__(self, ...):
 184 |         # ... existing code ...
 185 |         self._file_semaphore = asyncio.Semaphore(10)  # Max 10 concurrent
 186 |         self._max_tracked_failures = 100  # LRU cache limit
 187 | 
 188 |     async def _read_file_async(self, file_path: Path) -> str:
 189 |         async with self._file_semaphore:
 190 |             loop = asyncio.get_event_loop()
 191 |             return await loop.run_in_executor(
 192 |                 self._thread_pool,
 193 |                 file_path.read_text,
 194 |                 "utf-8"
 195 |             )
 196 | 
 197 |     async def _record_failure(self, path: str, error: str):
 198 |         # ... existing code ...
 199 | 
 200 |         # Implement LRU eviction
 201 |         if len(self._file_failures) > self._max_tracked_failures:
 202 |             self._file_failures.popitem(last=False)  # Remove oldest
 203 | ```
 204 | 
 205 | **Benefits**:
 206 | - Maximum 10 files in memory at once (vs all 2,624)
 207 | - 90%+ reduction in peak memory usage
 208 | - Prevents unbounded memory growth on error-prone projects
 209 | 
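The aiofiles route mentioned above might look roughly like this (a sketch under that assumption, not the shipped code; the semaphore still bounds concurrency while the read itself becomes non-blocking):

```python
# Hypothetical aiofiles variant of the bounded read path (illustrative only).
import asyncio
import aiofiles

class SyncServiceSketch:
    def __init__(self) -> None:
        self._file_semaphore = asyncio.Semaphore(10)  # max 10 concurrent file reads

    async def _read_file_async(self, file_path) -> str:
        async with self._file_semaphore:
            async with aiofiles.open(file_path, mode="r", encoding="utf-8") as f:
                return await f.read()
```
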
 210 | ### Phase 2: Cloud-Specific Fixes (basic-memory-cloud)
 211 | 
 212 | **Priority: P1 - High**
 213 | 
 214 | #### 2.1 Fix Resource Leaks
 215 | 
 216 | ```python
 217 | # apps/api/src/basic_memory_cloud_api/sync_worker.py
 218 | 
 219 | async def send_keepalive():
 220 |     """Send keepalive pings using proper session management"""
 221 |     # Use context manager to ensure cleanup
 222 |     async with aiohttp.ClientSession(
 223 |         timeout=aiohttp.ClientTimeout(total=5)
 224 |     ) as session:
 225 |         while True:
 226 |             try:
 227 |                 await session.get(f"https://{fly_app_name}.fly.dev/health")
 228 |             except asyncio.CancelledError:
 229 |                 raise  # Exit cleanly
 230 |             except Exception as e:
 231 |                 logger.warning(f"Keepalive failed: {e}")
 232 |             await asyncio.sleep(10)  # Pause between pings, even after a failure
 233 | ```
 234 | 
 235 | #### 2.2 Improve Circuit Breaker
 236 | 
 237 | Track failures across sync cycles instead of resetting every 30s:
 238 | 
 239 | ```python
 240 | # Persistent failure tracking
 241 | class SyncWorker:
 242 |     def __init__(self):
 243 |         self._persistent_failures: Dict[str, int] = {}  # file -> failure_count
 244 |         self._failure_window_start = time.time()
 245 | 
 246 |     async def should_skip_file(self, file_path: str) -> bool:
 247 |         # Skip files that failed >3 times in last hour
 248 |         if self._persistent_failures.get(file_path, 0) > 3:
 249 |             if time.time() - self._failure_window_start < 3600:
 250 |                 return True
 251 |         return False
 252 | ```
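
The sketch above only shows the skip check; a companion for recording failures and rolling the one-hour window might look like this (method name and reset policy are assumptions, not the shipped API):

```python
# Illustrative companion to should_skip_file(): count failures and reset the
# window every hour so transient problems age out. Names are assumptions.
import time
from typing import Dict

class SyncWorkerSketch:
    def __init__(self) -> None:
        self._persistent_failures: Dict[str, int] = {}
        self._failure_window_start = time.time()

    def record_failure(self, file_path: str) -> None:
        if time.time() - self._failure_window_start >= 3600:
            self._persistent_failures.clear()
            self._failure_window_start = time.time()
        self._persistent_failures[file_path] = self._persistent_failures.get(file_path, 0) + 1
```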
 253 | 
 254 | ### Phase 3: Measurement & Decision
 255 | 
 256 | **Priority: P2 - Future**
 257 | 
 258 | After implementing Phases 1-2, collect metrics for 2 weeks:
 259 | - Memory usage per tenant sync
 260 | - Sync duration (scan + process)
 261 | - Concurrent sync load at peak times
 262 | - OOM incidents
 263 | - Resource costs
 264 | 
 265 | **UberSync Decision Criteria**:
 266 | 
 267 | Build centralized sync service ONLY if metrics show:
 268 | - ✅ Core fixes insufficient for >100 tenants
 269 | - ✅ Resource contention causing problems
 270 | - ✅ Need for tenant tier prioritization (paid > free)
 271 | - ✅ Cost savings justify complexity
 272 | 
 273 | Otherwise, defer UberSync as premature optimization.
 274 | 
 275 | ## How to Evaluate
 276 | 
 277 | ### Success Metrics (Phase 1)
 278 | 
 279 | **Memory Usage**:
 280 | - ✅ Peak memory <500MB for 2,000+ file projects (was 1.5-1.7GB)
 281 | - ✅ Memory usage linear with concurrent files (10 max), not total files
 282 | - ✅ Large file memory usage: 64KB chunks (not 16MB)
 283 | 
 284 | **Performance**:
 285 | - ✅ Initial scan <30 seconds (was 5+ minutes)
 286 | - ✅ Full sync <5 minutes for 2,000+ files (was 15+ minutes)
 287 | - ✅ Subsequent syncs <10 seconds (only changed files)
 288 | 
 289 | **Stability**:
 290 | - ✅ 2,000+ file projects run on 1GB machines
 291 | - ✅ Zero OOM kills in production
 292 | - ✅ No degradation with binary files (PDFs, images)
 293 | 
 294 | ### Success Metrics (Phase 2)
 295 | 
 296 | **Resource Management**:
 297 | - ✅ Zero aiohttp session leaks (verified via monitoring)
 298 | - ✅ Circuit breaker prevents repeated failures (>3 fails = skip for 1 hour)
 299 | - ✅ Tenant syncs distributed over 30s window (no thundering herd)
 300 | 
 301 | **Observability**:
 302 | - ✅ Logfire traces show memory usage per sync
 303 | - ✅ Clear logging of skipped files and reasons
 304 | - ✅ Metrics on sync duration, file counts, failure rates
 305 | 
 306 | ### Test Plan
 307 | 
 308 | **Unit Tests** (basic-memory):
 309 | - mtime comparison logic
 310 | - Streaming checksum correctness
 311 | - Semaphore limiting (mock 100 files, verify max 10 concurrent)
 312 | - LRU cache eviction
 313 | - Checksum computation: streaming vs non-streaming equivalence
 314 | 
 315 | **Integration Tests** (basic-memory):
 316 | - Large file handling (create 20MB test file)
 317 | - Mixed file types (text + binary)
 318 | - Changed file detection via mtime
 319 | - Sync with 1,000+ files
 320 | 
 321 | **Load Tests** (basic-memory-cloud):
 322 | - Test on tenant-6d2ff1a3 (2,621 files, 31 PDFs)
 323 | - Monitor memory during full sync with Logfire
 324 | - Measure scan and sync duration
 325 | - Run on 1GB machine (downgrade from 2GB to verify)
 326 | - Simulate 10 concurrent tenant syncs
 327 | 
 328 | **Regression Tests**:
 329 | - Verify existing sync scenarios still work
 330 | - CLI sync behavior unchanged
 331 | - File watcher integration unaffected
 332 | 
 333 | ### Performance Benchmarks
 334 | 
 335 | Establish baseline, then compare after each phase:
 336 | 
 337 | | Metric | Baseline | Phase 1 Target | Phase 2 Target |
 338 | |--------|----------|----------------|----------------|
 339 | | Peak Memory (2,600 files) | 1.5-1.7GB | <500MB | <450MB |
 340 | | Initial Scan Time | 5+ min | <30 sec | <30 sec |
 341 | | Full Sync Time | 15+ min | <5 min | <5 min |
 342 | | Subsequent Sync | 2+ min | <10 sec | <10 sec |
 343 | | OOM Incidents/Week | 2-3 | 0 | 0 |
 344 | | Min RAM Required | 2GB | 1GB | 1GB |
 345 | 
 346 | ## Implementation Phases
 347 | 
 348 | ### Phase 0.5: Database Schema & Streaming Foundation
 349 | 
 350 | **Priority: P0 - Required for Phase 1**
 351 | 
 352 | This phase establishes the foundation for streaming sync with mtime-based change detection.
 353 | 
 354 | **Database Schema Changes**:
 355 | - [x] Add `mtime` column to Entity model (REAL type for float timestamp)
 356 | - [x] Add `size` column to Entity model (INTEGER type for file size in bytes)
 357 | - [x] Create Alembic migration for new columns (nullable initially)
 358 | - [x] Add indexes on `(file_path, project_id)` for optimistic upsert performance
 359 | - [ ] Backfill existing entities with mtime/size from filesystem
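
A rough sketch of that backfill step, assuming a hypothetical `find_missing_mtime()` repository query and running lazily during the first scan after the migration:

```python
# Sketch only (assumed names): fill in mtime/size for entities whose columns
# are still NULL after the migration, using a stat() call per file.
from pathlib import Path

async def backfill_mtime_size(self, directory: Path) -> None:
    for entity in await self.entity_repository.find_missing_mtime():  # hypothetical query
        file_path = directory / entity.file_path
        if file_path.exists():
            stat = file_path.stat()
            await self.entity_repository.update(
                entity.id, {"mtime": stat.st_mtime, "size": stat.st_size}
            )
```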
 360 | 
 361 | **Streaming Architecture**:
 362 | - [x] Replace `os.walk()` with `os.scandir()` for cached stat info
 363 | - [ ] Eliminate `get_db_file_state()` - no upfront SELECT all entities
 364 | - [x] Implement streaming iterator `_scan_directory_streaming()`
 365 | - [x] Add `get_by_file_path()` optimized query (single file lookup)
 366 | - [x] Add `get_all_file_paths()` for deletion detection (paths only, no entities)
 367 | 
 368 | **Benefits**:
 369 | - **50% fewer network calls** on Tigris (scandir returns cached stat)
 370 | - **No large dicts in memory** (process files one at a time)
 371 | - **Indexed lookups** instead of full table scan
 372 | - **Foundation for mtime comparison** (Phase 1)
 373 | 
 374 | **Code Changes**:
 375 | 
 376 | ```python
 377 | # Before: Load all entities upfront
 378 | db_paths = await self.get_db_file_state()  # SELECT * FROM entity WHERE project_id = ?
 379 | scan_result = await self.scan_directory()  # os.walk() + stat() per file
 380 | 
 381 | # After: Stream and query incrementally
 382 | async for file_path, stat_info in self.scan_directory():  # scandir() with cached stat
 383 |     db_entity = await self.entity_repository.get_by_file_path(file_path)  # Indexed lookup
 384 |     # Process immediately, no accumulation
 385 | ```
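
The checklist above names `_scan_directory_streaming()` and `get_all_file_paths()`; a minimal sketch of how they could fit together (method names come from this spec, the exact signatures in `sync_service.py` and `entity_repository.py` may differ):

```python
# Illustrative sketch only: recursive scandir iterator plus a deletion pass.
import os
from pathlib import Path
from typing import AsyncIterator

async def _scan_directory_streaming(self, directory: Path) -> AsyncIterator[tuple[str, os.stat_result]]:
    """Yield (relative_path, stat) one file at a time, using scandir's cached stat."""
    stack = [directory]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(Path(entry.path))
                elif entry.is_file(follow_symlinks=False):
                    rel_path = Path(entry.path).relative_to(directory).as_posix()
                    yield rel_path, entry.stat()  # stat is served from scandir's cache

async def _detect_deletions(self, seen_paths: set[str]) -> list[str]:
    """Paths still in the database but no longer seen on disk."""
    db_paths = await self.entity_repository.get_all_file_paths()  # paths only, no entities
    return [p for p in db_paths if p not in seen_paths]
```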
 386 | 
 387 | **Files Modified**:
 388 | - `src/basic_memory/models.py` - Add mtime/size columns
 389 | - `alembic/versions/xxx_add_mtime_size.py` - Migration
 390 | - `src/basic_memory/sync/sync_service.py` - Streaming implementation
 391 | - `src/basic_memory/repository/entity_repository.py` - Add get_all_file_paths()
 392 | 
 393 | **Migration Strategy**:
 394 | ```sql
 395 | -- Migration: Add nullable columns
 396 | ALTER TABLE entity ADD COLUMN mtime REAL;
 397 | ALTER TABLE entity ADD COLUMN size INTEGER;
 398 | 
 399 | -- Backfill from filesystem during first sync after upgrade
 400 | -- (Handled in sync_service on first scan)
 401 | ```
 402 | 
 403 | ### Phase 1: Core Fixes
 404 | 
 405 | **mtime-based scanning**:
 406 | - [x] Add mtime/size columns to Entity model (completed in Phase 0.5)
 407 | - [x] Database migration (alembic) (completed in Phase 0.5)
 408 | - [x] Refactor `scan()` to use streaming architecture with mtime/size comparison
 409 | - [x] Update `sync_markdown_file()` and `sync_regular_file()` to store mtime/size in database
 410 | - [x] Only compute checksums for changed files (mtime/size differ)
 411 | - [x] Unit tests for streaming scan (6 tests passing)
 412 | - [ ] Integration test with 1,000 files (defer to benchmarks)
 413 | 
 414 | **Streaming checksums**:
 415 | - [x] Implement `_compute_checksum_streaming()` with chunked reading
 416 | - [x] Add file size threshold logic (1MB)
 417 | - [x] Test with large files (16MB PDF)
 418 | - [x] Verify memory usage stays constant
 419 | - [x] Test checksum equivalence (streaming vs non-streaming)
 420 | 
 421 | **Bounded concurrency**:
 422 | - [x] Add semaphore (10 concurrent) to `_read_file_async()` (already existed)
 423 | - [x] Add LRU cache for failures (100 max) (already existed)
 424 | - [ ] Review thread pool size configuration
 425 | - [ ] Load test with 2,000+ files
 426 | - [ ] Verify <500MB peak memory
 427 | 
 428 | **Cleanup & Optimization**:
 429 | - [x] Eliminate `get_db_file_state()` - no upfront SELECT all entities (streaming architecture complete)
 430 | - [x] Consolidate file operations in FileService (eliminate duplicate checksum logic)
 431 | - [x] Add aiofiles dependency (already present)
 432 | - [x] FileService streaming checksums for files >1MB
 433 | - [x] SyncService delegates all file operations to FileService
 434 | - [x] Complete true async I/O refactoring - all file operations use aiofiles
 435 |   - [x] Added `FileService.read_file_content()` using aiofiles
 436 |   - [x] Removed `SyncService._read_file_async()` wrapper method
 437 |   - [x] Removed `SyncService._compute_checksum_async()` wrapper method
 438 |   - [x] Inlined all 7 checksum calls to use `file_service.compute_checksum()` directly
 439 |   - [x] All file I/O operations now properly consolidated in FileService with non-blocking I/O
 440 | - [x] Removed sync_status_service completely (unnecessary complexity and state tracking)
 441 |   - [x] Removed `sync_status_service.py` and `sync_status` MCP tool
 442 |   - [x] Removed all `sync_status_tracker` calls from `sync_service.py`
 443 |   - [x] Removed migration status checks from MCP tools (`write_note`, `read_note`, `build_context`)
 444 |   - [x] Removed `check_migration_status()` and `wait_for_migration_or_return_status()` from `utils.py`
 445 |   - [x] Removed all related tests (4 test files deleted)
 446 |   - [x] All 1184 tests passing
 447 | 
 448 | **Phase 1 Implementation Summary:**
 449 | 
 450 | Phase 1 is now complete with all core fixes implemented and tested:
 451 | 
 452 | 1. **Streaming Architecture** (Phase 0.5 + Phase 1):
 453 |    - Replaced `os.walk()` with `os.scandir()` for cached stat info
 454 |    - Eliminated upfront `get_db_file_state()` SELECT query
 455 |    - Implemented `_scan_directory_streaming()` for incremental processing
 456 |    - Added indexed `get_by_file_path()` lookups
 457 |    - Result: 50% fewer network calls on TigrisFS, no large dicts in memory
 458 | 
 459 | 2. **mtime-based Change Detection**:
 460 |    - Added `mtime` and `size` columns to Entity model
 461 |    - Alembic migration completed and deployed
 462 |    - Only compute checksums when mtime/size differs from database
 463 |    - Result: ~90% reduction in checksum operations during typical syncs
 464 | 
 465 | 3. **True Async I/O with aiofiles**:
 466 |    - All file operations consolidated in FileService
 467 |    - `FileService.compute_checksum()`: 64KB chunked reading for constant memory (lines 261-296 of file_service.py)
 468 |    - `FileService.read_file_content()`: Non-blocking file reads with aiofiles (lines 160-193 of file_service.py)
 469 |    - Removed all wrapper methods from SyncService (`_read_file_async`, `_compute_checksum_async`)
 470 |    - Semaphore controls concurrency (max 10 concurrent file operations)
 471 |    - Result: Constant memory usage regardless of file size, true non-blocking I/O
 472 | 
 473 | 4. **Test Coverage**:
 474 |    - 41/43 sync tests passing (2 skipped as expected)
 475 |    - Circuit breaker tests updated for new architecture
 476 |    - Streaming checksum equivalence verified
 477 |    - All edge cases covered (large files, concurrent operations, failures)
 478 | 
 479 | **Key Files Modified**:
 480 | - `src/basic_memory/models.py` - Added mtime/size columns
 481 | - `alembic/versions/xxx_add_mtime_size.py` - Database migration
 482 | - `src/basic_memory/sync/sync_service.py` - Streaming implementation, removed wrapper methods
 483 | - `src/basic_memory/services/file_service.py` - Added `read_file_content()`, streaming checksums
 484 | - `src/basic_memory/repository/entity_repository.py` - Added `get_all_file_paths()`
 485 | - `tests/sync/test_sync_service.py` - Updated circuit breaker test mocks
 486 | 
 487 | **Performance Improvements Achieved**:
 488 | - Memory usage: Constant per file (64KB chunks) vs full file in memory
 489 | - Scan speed: Stat-only scan (no checksums for unchanged files)
 490 | - I/O efficiency: True async with aiofiles (no thread pool blocking)
 491 | - Network efficiency: 50% fewer calls on TigrisFS via scandir caching
 492 | - Architecture: Clean separation of concerns (FileService owns all file I/O)
 493 | - Reduced complexity: Removed unnecessary sync_status_service state tracking
 494 | 
 495 | **Observability**:
 496 | - [x] Added Logfire instrumentation to `sync_file()` and `sync_markdown_file()`
 497 | - [x] Logfire disabled by default via `ignore_no_config = true` in pyproject.toml
 498 | - [x] No telemetry in FOSS version unless explicitly configured
 499 | - [x] Cloud deployment can enable Logfire for performance monitoring
 500 | 
 501 | **Next Steps**: Phase 1.5 scan watermark optimization for large project performance.
 502 | 
 503 | ### Phase 1.5: Scan Watermark Optimization
 504 | 
 505 | **Priority: P0 - Critical for Large Projects**
 506 | 
 507 | This phase addresses Issue #388 where large projects (1,460+ files) take 7+ minutes for sync operations even when no files have changed.
 508 | 
 509 | **Problem Analysis**:
 510 | 
 511 | From production data (tenant-0a20eb58):
 512 | - Total sync time: 420-450 seconds (7+ minutes) with 0 changes
 513 | - Scan phase: 321 seconds (75% of total time)
 514 | - Per-file cost: 220ms × 1,460 files = 5+ minutes
 515 | - Root cause: Network I/O to TigrisFS for stat operations (even with mtime columns)
 516 | - 15 concurrent syncs every 30 seconds compounds the problem
 517 | 
 518 | **Current Behavior** (Phase 1):
 519 | ```python
 520 | async def scan(self, directory: Path):
 521 |     """Scan filesystem using mtime/size comparison"""
 522 |     # Still stats ALL 1,460 files every sync cycle
 523 |     async for file_path, stat_info in self._scan_directory_streaming():
 524 |         db_entity = await self.entity_repository.get_by_file_path(file_path)
 525 |         # Compare mtime/size, skip unchanged files
 526 |         # Only checksum if changed (✅ already optimized)
 527 | ```
 528 | 
 529 | **Problem**: Even with mtime optimization, we stat every file on every scan. On TigrisFS (network FUSE mount), this means 1,460 network calls taking 5+ minutes.
 530 | 
 531 | **Solution: Scan Watermark + File Count Detection**
 532 | 
 533 | Track when we last scanned and how many files existed. Use filesystem-level filtering to only examine files modified since last scan.
 534 | 
 535 | **Key Insight**: File count changes signal deletions
 536 | - Count same → incremental scan (95% of syncs)
 537 | - Count increased → new files found by incremental (4% of syncs)
 538 | - Count decreased → files deleted, need full scan (1% of syncs)
 539 | 
 540 | **Database Schema Changes**:
 541 | 
 542 | Add to Project model:
 543 | ```python
 544 | last_scan_timestamp: float | None  # Unix timestamp of last successful scan start
 545 | last_file_count: int | None        # Number of files found in last scan
 546 | ```
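
The corresponding Alembic migration could be as small as the following (illustrative; the real revision file and identifiers differ):

```python
# Sketch of the watermark migration: two nullable columns on project.
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    op.add_column("project", sa.Column("last_scan_timestamp", sa.Float(), nullable=True))
    op.add_column("project", sa.Column("last_file_count", sa.Integer(), nullable=True))

def downgrade() -> None:
    op.drop_column("project", "last_file_count")
    op.drop_column("project", "last_scan_timestamp")
```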
 547 | 
 548 | **Implementation Strategy**:
 549 | 
 550 | ```python
 551 | async def scan(self, directory: Path):
 552 |     """Smart scan using watermark and file count"""
 553 |     project = await self.project_repository.get_current()
 554 | 
 555 |     # Step 1: Quick file count (fast on TigrisFS: 1.4s for 1,460 files)
 556 |     current_count = await self._quick_count_files(directory)
 557 | 
 558 |     # Step 2: Determine scan strategy
 559 |     if project.last_file_count is None:
 560 |         # First sync ever → full scan
 561 |         file_paths = await self._scan_directory_full(directory)
 562 |         scan_type = "full_initial"
 563 | 
 564 |     elif current_count < project.last_file_count:
 565 |         # Files deleted → need full scan to detect which ones
 566 |         file_paths = await self._scan_directory_full(directory)
 567 |         scan_type = "full_deletions"
 568 |         logger.info(f"File count decreased ({project.last_file_count} → {current_count}), running full scan")
 569 | 
 570 |     elif project.last_scan_timestamp is not None:
 571 |         # Incremental scan: only files modified since last scan
 572 |         file_paths = await self._scan_directory_modified_since(
 573 |             directory,
 574 |             project.last_scan_timestamp
 575 |         )
 576 |         scan_type = "incremental"
 577 |         logger.info(f"Incremental scan since {project.last_scan_timestamp}, found {len(file_paths)} changed files")
 578 |     else:
 579 |         # Fallback to full scan
 580 |         file_paths = await self._scan_directory_full(directory)
 581 |         scan_type = "full_fallback"
 582 | 
 583 |     # Step 3: Process changed files (existing logic)
 584 |     for file_path in file_paths:
 585 |         await self._process_file(file_path)
 586 | 
 587 |     # Step 4: Update watermark AFTER successful scan
 588 |     await self.project_repository.update(
 589 |         project.id,
 590 |         last_scan_timestamp=time.time(),  # Start of THIS scan
 591 |         last_file_count=current_count
 592 |     )
 593 | 
 594 |     # Step 5: Record metrics
 595 |     logfire.metric_counter(f"sync.scan.{scan_type}").add(1)
 596 |     logfire.metric_histogram("sync.scan.files_scanned", unit="files").record(len(file_paths))
 597 | ```
 598 | 
 599 | **Helper Methods**:
 600 | 
 601 | ```python
 602 | async def _quick_count_files(self, directory: Path) -> int:
 603 |     """Fast file count using find command"""
 604 |     # TigrisFS: 1.4s for 1,460 files
 605 |     result = await asyncio.create_subprocess_shell(
 606 |         f'find "{directory}" -type f | wc -l',
 607 |         stdout=asyncio.subprocess.PIPE
 608 |     )
 609 |     stdout, _ = await result.communicate()
 610 |     return int(stdout.decode().strip())
 611 | 
 612 | async def _scan_directory_modified_since(
 613 |     self,
 614 |     directory: Path,
 615 |     since_timestamp: float
 616 | ) -> List[str]:
 617 |     """Use find -newermt for filesystem-level filtering"""
 618 |     # Convert timestamp to find-compatible format
 619 |     since_date = datetime.fromtimestamp(since_timestamp).strftime("%Y-%m-%d %H:%M:%S")
 620 | 
 621 |     # TigrisFS: 0.2s for 0 changed files (vs 5+ minutes for full scan)
 622 |     result = await asyncio.create_subprocess_shell(
 623 |         f'find "{directory}" -type f -newermt "{since_date}"',
 624 |         stdout=asyncio.subprocess.PIPE
 625 |     )
 626 |     stdout, _ = await result.communicate()
 627 | 
 628 |     # Convert absolute paths to relative
 629 |     file_paths = []
 630 |     for line in stdout.decode().splitlines():
 631 |         if line:
 632 |             rel_path = Path(line).relative_to(directory).as_posix()
 633 |             file_paths.append(rel_path)
 634 | 
 635 |     return file_paths
 636 | ```
 637 | 
 638 | **TigrisFS Testing Results** (SSH to production-basic-memory-tenant-0a20eb58):
 639 | 
 640 | ```bash
 641 | # Full file count
 642 | $ time find . -type f | wc -l
 643 | 1460
 644 | real    0m1.362s  # ✅ Acceptable
 645 | 
 646 | # Incremental scan (1 hour window)
 647 | $ time find . -type f -newermt "2025-01-20 10:00:00" | wc -l
 648 | 0
 649 | real    0m0.161s  # ✅ 8.5x faster!
 650 | 
 651 | # Incremental scan (24 hours)
 652 | $ time find . -type f -newermt "2025-01-19 11:00:00" | wc -l
 653 | 0
 654 | real    0m0.239s  # ✅ 5.7x faster!
 655 | ```
 656 | 
 657 | **Conclusion**: `find -newermt` works perfectly on TigrisFS and provides massive speedup.
 658 | 
 659 | **Expected Performance Improvements**:
 660 | 
 661 | | Scenario | Files Changed | Current Time | With Watermark | Speedup |
 662 | |----------|---------------|--------------|----------------|---------|
 663 | | No changes (common) | 0 | 420s | ~2s | 210x |
 664 | | Few changes | 5-10 | 420s | ~5s | 84x |
 665 | | Many changes | 100+ | 420s | ~30s | 14x |
 666 | | Deletions (rare) | N/A | 420s | 420s | 1x |
 667 | 
 668 | **Full sync breakdown** (1,460 files, 0 changes):
 669 | - File count: 1.4s
 670 | - Incremental scan: 0.2s
 671 | - Database updates: 0.4s
 672 | - **Total: ~2s (~210x faster)**
 673 | 
 674 | **Metrics to Track**:
 675 | 
 676 | ```python
 677 | # Scan type distribution
 678 | logfire.metric_counter("sync.scan.full_initial").add(1)
 679 | logfire.metric_counter("sync.scan.full_deletions").add(1)
 680 | logfire.metric_counter("sync.scan.incremental").add(1)
 681 | 
 682 | # Performance metrics
 683 | logfire.metric_histogram("sync.scan.duration", unit="ms").record(scan_ms)
 684 | logfire.metric_histogram("sync.scan.files_scanned", unit="files").record(file_count)
 685 | logfire.metric_histogram("sync.scan.files_changed", unit="files").record(changed_count)
 686 | 
 687 | # Watermark effectiveness
 688 | logfire.metric_histogram("sync.scan.watermark_age", unit="s").record(
 689 |     time.time() - project.last_scan_timestamp
 690 | )
 691 | ```
 692 | 
 693 | **Edge Cases Handled**:
 694 | 
 695 | 1. **First sync**: No watermark → full scan (expected)
 696 | 2. **Deletions**: File count decreased → full scan (rare but correct)
 697 | 3. **Clock skew**: Use scan start time, not end time (captures files created during scan)
 698 | 4. **Scan failure**: Don't update watermark on failure (retry will re-scan)
 699 | 5. **New files**: Count increased → incremental scan finds them (common, fast)
 700 | 
 701 | **Files to Modify**:
 702 | - `src/basic_memory/models.py` - Add last_scan_timestamp, last_file_count to Project
 703 | - `alembic/versions/xxx_add_scan_watermark.py` - Migration for new columns
 704 | - `src/basic_memory/sync/sync_service.py` - Implement watermark logic
 705 | - `src/basic_memory/repository/project_repository.py` - Update methods
 706 | - `tests/sync/test_sync_watermark.py` - Test watermark behavior
 707 | 
 708 | **Test Plan**:
 709 | - [x] SSH test on TigrisFS confirms `find -newermt` works (completed)
 710 | - [x] Unit tests for scan strategy selection (4 tests)
 711 | - [x] Unit tests for file count detection (integrated in strategy tests)
 712 | - [x] Integration test: verify incremental scan finds changed files (4 tests)
 713 | - [x] Integration test: verify deletion detection triggers full scan (2 tests)
 714 | - [ ] Load test on tenant-0a20eb58 (1,460 files) - pending production deployment
 715 | - [ ] Verify <3s for no-change sync - pending production deployment
 716 | 
 717 | **Implementation Status**: ✅ **COMPLETED**
 718 | 
 719 | **Code Changes** (Commit: `fb16055d`):
 720 | - ✅ Added `last_scan_timestamp` and `last_file_count` to Project model
 721 | - ✅ Created database migration `e7e1f4367280_add_scan_watermark_tracking_to_project.py`
 722 | - ✅ Implemented smart scan strategy selection in `sync_service.py`
 723 | - ✅ Added `_quick_count_files()` using `find | wc -l` (~1.4s for 1,460 files)
 724 | - ✅ Added `_scan_directory_modified_since()` using `find -newermt` (~0.2s)
 725 | - ✅ Added `_scan_directory_full()` wrapper for full scans
 726 | - ✅ Watermark update logic after successful sync (uses sync START time)
 727 | - ✅ Logfire metrics for scan types and performance tracking
 728 | 
 729 | **Test Coverage** (18 tests in `test_sync_service_incremental.py`):
 730 | - ✅ Scan strategy selection (4 tests)
 731 |   - First sync uses full scan
 732 |   - File count decreased triggers full scan
 733 |   - Same file count uses incremental scan
 734 |   - Increased file count uses incremental scan
 735 | - ✅ Incremental scan base cases (4 tests)
 736 |   - No changes scenario
 737 |   - Detects new files
 738 |   - Detects modified files
 739 |   - Detects multiple changes
 740 | - ✅ Deletion detection (2 tests)
 741 |   - Single file deletion
 742 |   - Multiple file deletions
 743 | - ✅ Move detection (2 tests)
 744 |   - Moves require full scan (renames don't update mtime)
 745 |   - Moves detected in full scan via checksum
 746 | - ✅ Watermark update (3 tests)
 747 |   - Watermark updated after successful sync
 748 |   - Watermark uses sync start time
 749 |   - File count accuracy
 750 | - ✅ Edge cases (3 tests)
 751 |   - Concurrent file changes
 752 |   - Empty directory handling
 753 |   - Respects .gitignore patterns
 754 | 
 755 | **Performance Expectations** (to be verified in production):
 756 | - No changes: 420s → ~2s (210x faster)
 757 | - Few changes (5-10): 420s → ~5s (84x faster)
 758 | - Many changes (100+): 420s → ~30s (14x faster)
 759 | - Deletions: 420s → 420s (full scan, rare case)
 760 | 
 761 | **Rollout Strategy**:
 762 | 1. ✅ Code complete and tested (18 new tests, all passing)
 763 | 2. ✅ Pushed to `phase-0.5-streaming-foundation` branch
 764 | 3. ⏳ Windows CI tests running
 765 | 4. 📊 Deploy to staging tenant with watermark optimization
 766 | 5. 📊 Monitor scan performance metrics via Logfire
 767 | 6. 📊 Verify no missed files (compare full vs incremental results)
 768 | 7. 📊 Deploy to production tenant-0a20eb58
 769 | 8. 📊 Measure actual improvement (expect 420s → 2-3s)
 770 | 
 771 | **Success Criteria**:
 772 | - ✅ Implementation complete with comprehensive tests
 773 | - [ ] No-change syncs complete in <3 seconds (was 420s) - pending production test
 774 | - [ ] Incremental scans (95% of cases) use watermark - pending production test
 775 | - [ ] Deletion detection works correctly (full scan when needed) - tested in unit tests ✅
 776 | - [ ] No files missed due to watermark logic - tested in unit tests ✅
 777 | - [ ] Metrics show scan type distribution matches expectations - pending production test
 778 | 
 779 | **Next Steps**:
 780 | 1. Production deployment to tenant-0a20eb58
 781 | 2. Measure actual performance improvements
 782 | 3. Monitor metrics for 1 week
 783 | 4. Phase 2 cloud-specific fixes
 784 | 5. Phase 3 production measurement and UberSync decision
 785 | 
 786 | ### Phase 2: Cloud Fixes 
 787 | 
 788 | **Resource leaks**:
 789 | - [ ] Fix aiohttp session context manager
 790 | - [ ] Implement persistent circuit breaker
 791 | - [ ] Add memory monitoring/alerts
 792 | - [ ] Test on production tenant
 793 | 
 794 | **Sync coordination**:
 795 | - [ ] Implement hash-based staggering (see the sketch below)
 796 | - [ ] Add jitter to sync intervals
 797 | - [ ] Load test with 10 concurrent tenants
 798 | - [ ] Verify no thundering herd
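
A hypothetical sketch of the staggering approach: derive a stable per-tenant offset from the tenant id and add a little random jitter each cycle (names and intervals are assumptions for illustration):

```python
# Stagger tenant syncs across the 30s window instead of firing at :00 and :30.
import asyncio
import hashlib
import random

SYNC_INTERVAL = 30  # seconds

def sync_offset(tenant_id: str, interval: int = SYNC_INTERVAL) -> int:
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % interval  # stable 0..interval-1 offset

async def sync_loop(tenant_id: str) -> None:
    await asyncio.sleep(sync_offset(tenant_id))  # initial stagger
    while True:
        # ... run one sync cycle for this tenant here ...
        await asyncio.sleep(SYNC_INTERVAL + random.uniform(0, 2))  # interval plus jitter
```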
 799 | 
 800 | ### Phase 3: Measurement
 801 | 
 802 | **Deploy to production**:
 803 | - [ ] Deploy Phase 1+2 changes
 804 | - [ ] Downgrade tenant-6d2ff1a3 to 1GB
 805 | - [ ] Monitor for OOM incidents
 806 | 
 807 | **Collect metrics**:
 808 | - [ ] Memory usage patterns
 809 | - [ ] Sync duration distributions
 810 | - [ ] Concurrent sync load
 811 | - [ ] Cost analysis
 812 | 
 813 | **UberSync decision**:
 814 | - [ ] Review metrics against decision criteria
 815 | - [ ] Document findings
 816 | - [ ] Create SPEC-18 for UberSync if needed
 817 | 
 818 | ## Related Issues
 819 | 
 820 | ### basic-memory (core)
 821 | - [#383](https://github.com/basicmachines-co/basic-memory/issues/383) - Refactor sync to use mtime-based scanning
 822 | - [#382](https://github.com/basicmachines-co/basic-memory/issues/382) - Optimize memory for large file syncs
 823 | - [#371](https://github.com/basicmachines-co/basic-memory/issues/371) - aiofiles for non-blocking I/O (future)
 824 | 
 825 | ### basic-memory-cloud
 826 | - [#198](https://github.com/basicmachines-co/basic-memory-cloud/issues/198) - Memory optimization for sync worker
 827 | - [#189](https://github.com/basicmachines-co/basic-memory-cloud/issues/189) - Circuit breaker for infinite retry loops
 828 | 
 829 | ## References
 830 | 
 831 | **Standard sync tools using mtime**:
 832 | - rsync: Uses mtime-based comparison by default, only checksums on `--checksum` flag
 833 | - rclone: Default is mtime/size, `--checksum` mode optional
 834 | - syncthing: Block-level sync with mtime tracking
 835 | 
 836 | **fsnotify polling** (future consideration):
 837 | - [fsnotify/fsnotify#9](https://github.com/fsnotify/fsnotify/issues/9) - Polling mode for network filesystems
 838 | 
 839 | ## Notes
 840 | 
 841 | ### Why Not UberSync Now?
 842 | 
 843 | **Premature Optimization**:
 844 | - Current problems are algorithmic, not architectural
 845 | - No evidence that multi-tenant coordination is the issue
 846 | - Single tenant OOM proves algorithm is the problem
 847 | 
 848 | **Benefits of Core-First Approach**:
 849 | - ✅ Helps all users (CLI + Cloud)
 850 | - ✅ Lower risk (no new service)
 851 | - ✅ Clear path (issues specify fixes)
 852 | - ✅ Can defer UberSync until proven necessary
 853 | 
 854 | **When UberSync Makes Sense**:
 855 | - >100 active tenants causing resource contention
 856 | - Need for tenant tier prioritization (paid > free)
 857 | - Centralized observability requirements
 858 | - Cost optimization at scale
 859 | 
 860 | ### Migration Strategy
 861 | 
 862 | **Backward Compatibility**:
 863 | - New mtime/size columns nullable initially
 864 | - Existing entities sync normally (compute mtime on first scan)
 865 | - No breaking changes to MCP API
 866 | - CLI behavior unchanged
 867 | 
 868 | **Rollout**:
 869 | 1. Deploy to staging with test tenant
 870 | 2. Validate memory/performance improvements
 871 | 3. Deploy to production (blue-green)
 872 | 4. Monitor for 1 week
 873 | 5. Downgrade tenant machines if successful
 874 | 
 875 | ## Further Considerations
 876 | 
 877 | ### Version Control System (VCS) Integration
 878 | 
 879 | **Context:** Users frequently request git versioning, and large projects with PDFs/images pose memory challenges.
 880 | 
 881 | #### Git-Based Sync
 882 | 
 883 | **Approach:** Use git for change detection instead of custom mtime comparison.
 884 | 
 885 | ```python
 886 | # Git automatically tracks changes
 887 | repo = git.Repo(project_path)
 888 | repo.git.add(A=True)
 889 | diff = repo.index.diff('HEAD')
 890 | 
 891 | for change in diff:
 892 |     if change.change_type == 'M':  # Modified
 893 |         await sync_file(change.b_path)
 894 | ```
 895 | 
 896 | **Pros:**
 897 | - ✅ Proven, battle-tested change detection
 898 | - ✅ Built-in rename/move detection (similarity index)
 899 | - ✅ Efficient for cloud sync (git protocol over HTTP)
 900 | - ✅ Could enable version history as bonus feature
 901 | - ✅ Users want git integration anyway
 902 | 
 903 | **Cons:**
 904 | - ❌ User confusion (`.git` folder in knowledge base)
 905 | - ❌ Conflicts with existing git repos (submodule complexity)
 906 | - ❌ Adds dependency (git binary or dulwich/pygit2)
 907 | - ❌ Less control over sync logic
 908 | - ❌ Doesn't solve large file problem (PDFs still checksummed)
 909 | - ❌ Git LFS adds complexity
 910 | 
 911 | #### Jujutsu (jj) Alternative
 912 | 
 913 | **Why jj is compelling:**
 914 | 
 915 | 1. **Working Copy as Source of Truth**
 916 |    - Git: Staging area is intermediate state
 917 |    - Jujutsu: Working copy IS a commit
 918 |    - Aligns with "files are source of truth" philosophy!
 919 | 
 920 | 2. **Automatic Change Tracking**
 921 |    - No manual staging required
 922 |    - Working copy changes tracked automatically
 923 |    - Better fit for sync operations vs git's commit-centric model
 924 | 
 925 | 3. **Conflict Handling**
 926 |    - User edits + sync changes both preserved
 927 |    - Operation log vs linear history
 928 |    - Built for operations, not just history
 929 | 
 930 | **Cons:**
 931 | - ❌ New/immature (2020 vs git's 2005)
 932 | - ❌ Not universally available
 933 | - ❌ Steeper learning curve for users
 934 | - ❌ No LFS equivalent yet
 935 | - ❌ Still doesn't solve large file checksumming
 936 | 
 937 | #### Git Index Format (Hybrid Approach)
 938 | 
 939 | **Best of both worlds:** Use git's index format without full git repo.
 940 | 
 941 | ```python
 942 | from dulwich.index import Index  # Pure Python
 943 | 
 944 | # Use git index format for tracking
 945 | idx = Index(project_path / '.basic-memory' / 'index')
 946 | 
 947 | for file in files:
 948 |     stat = file.stat()
 949 |     if idx.get(file) and idx[file].mtime == stat.st_mtime:
 950 |         continue  # Unchanged (git's proven logic)
 951 | 
 952 |     await sync_file(file)
 953 |     idx[file] = (stat.st_mtime, stat.st_size, sha)
 954 | ```
 955 | 
 956 | **Pros:**
 957 | - ✅ Git's proven change detection logic
 958 | - ✅ No user-visible `.git` folder
 959 | - ✅ No git dependency (pure Python)
 960 | - ✅ Full control over sync
 961 | 
 962 | **Cons:**
 963 | - ❌ Adds dependency (dulwich)
 964 | - ❌ Doesn't solve large files
 965 | - ❌ No built-in versioning
 966 | 
 967 | ### Large File Handling
 968 | 
 969 | **Problem:** PDFs/images cause memory issues regardless of VCS choice.
 970 | 
 971 | **Solutions (Phase 1+):**
 972 | 
 973 | **1. Skip Checksums for Large Files**
 974 | ```python
 975 | if stat.st_size > 10_000_000:  # 10MB threshold
 976 |     checksum = None  # Use mtime/size only
 977 |     logger.info(f"Skipping checksum for {file_path}")
 978 | ```
 979 | 
 980 | **2. Partial Hashing**
 981 | ```python
 982 | if file.suffix in ['.pdf', '.jpg', '.png']:
 983 |     # Hash first/last 64KB instead of entire file
 984 |     checksum = hash_partial(file, chunk_size=65536)
 985 | ```
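
`hash_partial` is not an existing helper; a minimal version that hashes the first and last chunk plus the file size might look like this:

```python
# Sketch of a hypothetical hash_partial(): hash the first and last chunk plus
# the size, so large binaries never need to be read in full.
import hashlib
from pathlib import Path

def hash_partial(file: Path, chunk_size: int = 65536) -> str:
    hasher = hashlib.sha256()
    size = file.stat().st_size
    hasher.update(str(size).encode())  # size guards against files with identical edges
    with open(file, "rb") as f:
        hasher.update(f.read(chunk_size))  # first chunk
        if size > chunk_size:
            f.seek(max(size - chunk_size, chunk_size))
            hasher.update(f.read(chunk_size))  # last chunk
    return hasher.hexdigest()
```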
 986 | 
 987 | **3. External Blob Storage**
 988 | ```python
 989 | if stat.st_size > 10_000_000:
 990 |     blob_id = await upload_to_tigris_blob(file)
 991 |     entity.blob_id = blob_id
 992 |     entity.file_path = None  # Not in main sync
 993 | ```
 994 | 
 995 | ### Recommendation & Timeline
 996 | 
 997 | **Phase 0.5-1 (Now):** Custom streaming + mtime
 998 | - ✅ Solves urgent memory issues
 999 | - ✅ No dependencies
1000 | - ✅ Full control
1001 | - ✅ Skip checksums for large files (>10MB)
1002 | - ✅ Proven pattern (rsync/rclone)
1003 | 
1004 | **Phase 2 (After metrics):** Git index format exploration
1005 | ```python
1006 | # Optional: Use git index for tracking if beneficial
1007 | from dulwich.index import Index
1008 | # No git repo, just index file format
1009 | ```
1010 | 
1011 | **Future (User feature):** User-facing versioning
1012 | ```python
1013 | # Let users opt into VCS:
1014 | basic-memory config set versioning git
1015 | basic-memory config set versioning jj
1016 | basic-memory config set versioning none  # Current behavior
1017 | 
1018 | # Integrate with their chosen workflow
1019 | # Not forced upon them
1020 | ```
1021 | 
1022 | **Rationale:**
1023 | 1. **Don't block on VCS decision** - Memory issues are P0
1024 | 2. **Learn from deployment** - See actual usage patterns
1025 | 3. **Keep options open** - Can add git/jj later
1026 | 4. **Files as source of truth** - Core philosophy preserved
1027 | 5. **Large files need attention regardless** - VCS won't solve that
1028 | 
1029 | **Decision Point:**
1030 | - If Phase 0.5/1 achieves memory targets → VCS integration deferred
1031 | - If users strongly request versioning → Add as opt-in feature
1032 | - If change detection becomes bottleneck → Explore git index format
1033 | 
1034 | ## Agent Assignment
1035 | 
1036 | **Phase 1 Implementation**: `python-developer` agent
1037 | - Expertise in FastAPI, async Python, database migrations
1038 | - Handles basic-memory core changes
1039 | 
1040 | **Phase 2 Implementation**: `python-developer` agent
1041 | - Same agent continues with cloud-specific fixes
1042 | - Maintains consistency across phases
1043 | 
1044 | **Phase 3 Review**: `system-architect` agent
1045 | - Analyzes metrics and makes UberSync decision
1046 | - Creates SPEC-18 if centralized service needed
1047 | 
```

--------------------------------------------------------------------------------
/specs/SPEC-20 Simplified Project-Scoped Rclone Sync.md:
--------------------------------------------------------------------------------

```markdown
   1 | ---
   2 | title: 'SPEC-20: Simplified Project-Scoped Rclone Sync'
   3 | date: 2025-01-27
   4 | updated: 2025-01-28
   5 | status: Implemented
   6 | priority: High
   7 | goal: Simplify cloud sync by making it project-scoped, safe by design, and closer to native rclone commands
   8 | parent: SPEC-8
   9 | ---
  10 | 
  11 | ## Executive Summary
  12 | 
  13 | The current rclone implementation (SPEC-8) has proven too complex with multiple footguns:
  14 | - Two workflows (mount vs bisync) with different directories causing confusion
  15 | - Multiple profiles (3 for mount, 3 for bisync) creating too much choice
  16 | - Directory conflicts (`~/basic-memory-cloud/` vs `~/basic-memory-cloud-sync/`)
  17 | - Auto-discovery of folders leading to errors
  18 | - Unclear what syncs and when
  19 | 
  20 | This spec proposes a **radical simplification**: project-scoped sync operations that are explicit, safe, and thin wrappers around rclone commands.
  21 | 
  22 | ## Why
  23 | 
  24 | ### Current Problems
  25 | 
  26 | **Complexity:**
  27 | - Users must choose between mount and bisync workflows
  28 | - Different directories for different workflows
  29 | - Profile selection (6 total profiles) overwhelms users
  30 | - Setup requires multiple steps with potential conflicts
  31 | 
  32 | **Footguns:**
  33 | - Renaming local folder breaks sync (no config tracking)
  34 | - Mount directory conflicts with bisync directory
  35 | - Auto-discovered folders create phantom projects
  36 | - Uninitialized bisync state causes failures
  37 | - Unclear which files sync (all files in root directory?)
  38 | 
  39 | **User Confusion:**
  40 | - "What does `bm sync` actually do?"
  41 | - "Is `~/basic-memory-cloud-sync/my-folder/` a project or just a folder?"
  42 | - "Why do I have two basic-memory directories?"
  43 | - "How do I sync just one project?"
  44 | 
  45 | ### Design Principles (Revised)
  46 | 
  47 | 1. **Projects are independent** - Each project manages its own sync state
  48 | 2. **Global cloud mode** - You're either local or cloud (no per-project flag)
  49 | 3. **Explicit operations** - No auto-discovery, no magic
  50 | 4. **Safe by design** - Config tracks state, not filesystem
  51 | 5. **Thin rclone wrappers** - Stay close to rclone commands
  52 | 6. **One good way** - Remove choices that don't matter
  53 | 
  54 | ## What
  55 | 
  56 | ### New Architecture
  57 | 
  58 | #### Core Concepts
  59 | 
  60 | 1. **Global Cloud Mode** (existing, keep as-is)
  61 |    - `cloud_mode_enabled` in config
  62 |    - `bm cloud login` enables it, `bm cloud logout` disables it
  63 |    - When enabled, ALL Basic Memory operations hit cloud API
  64 | 
  65 | 2. **Project-Scoped Sync** (new)
  66 |    - Each project optionally has a `local_path` for local working copy
  67 |    - Sync operations are explicit: `bm project sync --name research`
  68 |    - Projects can live anywhere on disk, not forced into sync directory
  69 | 
  70 | 3. **Simplified rclone Config** (new)
  71 |    - Single remote named `basic-memory-cloud` (not `basic-memory-{tenant_id}`)
  72 |    - One credential set per user
  73 |    - Config lives at `~/.config/rclone/rclone.conf`
  74 | 
  75 | #### Command Structure
  76 | 
  77 | ```bash
  78 | # Setup (once)
  79 | bm cloud login                          # Authenticate, enable cloud mode
  80 | bm cloud setup                          # Install rclone, generate credentials
  81 | 
  82 | # Project creation with optional sync
  83 | bm project add research                 # Create cloud project (no local sync)
  84 | bm project add research --local ~/docs  # Create with local sync configured
  85 | 
  86 | # Or configure sync later
  87 | bm project sync-setup research ~/docs   # Configure local sync for existing project
  88 | 
  89 | # Project-scoped rclone operations
  90 | bm project sync --name research         # One-way sync (local → cloud)
  91 | bm project bisync --name research       # Two-way sync (local ↔ cloud)
  92 | bm project check --name research        # Verify integrity
  93 | 
  94 | # Advanced file operations
  95 | bm project ls --name research [path]    # List remote files
  96 | bm project copy --name research src dst # Copy files
  97 | 
  98 | # Batch operations
  99 | bm project sync --all                   # Sync all projects with local_sync_path
 100 | bm project bisync --all                 # Two-way sync all projects
 101 | ```
 102 | 
 103 | #### Config Model
 104 | 
 105 | ```json
 106 | {
 107 |   "cloud_mode": true,
 108 |   "cloud_host": "https://cloud.basicmemory.com",
 109 | 
 110 |   "projects": {
 111 |     // Used in LOCAL mode only (simple name → path mapping)
 112 |     "main": "/Users/user/basic-memory"
 113 |   },
 114 | 
 115 |   "cloud_projects": {
 116 |     // Used in CLOUD mode for sync configuration
 117 |     "research": {
 118 |       "local_path": "~/Documents/research",
 119 |       "last_sync": "2025-01-27T10:00:00Z",
 120 |       "bisync_initialized": true
 121 |     },
 122 |     "work": {
 123 |       "local_path": "~/work",
 124 |       "last_sync": null,
 125 |       "bisync_initialized": false
 126 |     }
 127 |   }
 128 | }
 129 | ```
 130 | 
 131 | **Note:** In cloud mode, the actual project list comes from the API (`GET /projects/projects`). The `cloud_projects` dict only stores local sync configuration.
 132 | 
 133 | #### Rclone Config
 134 | 
 135 | ```ini
 136 | # ~/.config/rclone/rclone.conf
 137 | [basic-memory-cloud]
 138 | type = s3
 139 | provider = Other
 140 | access_key_id = {scoped_access_key}
 141 | secret_access_key = {scoped_secret_key}
 142 | endpoint = https://fly.storage.tigris.dev
 143 | region = auto
 144 | ```
 145 | 
 146 | ### What Gets Removed
 147 | 
 148 | - ❌ Mount commands (`bm cloud mount`, `bm cloud unmount`, `bm cloud mount-status`)
 149 | - ❌ Profile selection (both mount and bisync profiles)
 150 | - ❌ `~/basic-memory-cloud/` directory (mount point)
 151 | - ❌ `~/basic-memory-cloud-sync/` directory (forced sync location)
 152 | - ❌ Auto-discovery of folders
 153 | - ❌ Separate `bisync-setup` command
 154 | - ❌ `bisync_config` in config.json
 155 | - ❌ Convenience commands (`bm sync`, `bm bisync` without project name)
 156 | - ❌ Complex state management for global sync
 157 | 
 158 | ### What Gets Simplified
 159 | 
 160 | - ✅ One setup command: `bm cloud setup`
 161 | - ✅ One rclone remote: `basic-memory-cloud`
 162 | - ✅ One workflow: project-scoped bisync (remove mount)
 163 | - ✅ One set of defaults (balanced settings from SPEC-8)
 164 | - ✅ Clear project-to-path mapping in config
 165 | - ✅ Explicit sync operations only
 166 | 
 167 | ### What Gets Added
 168 | 
 169 | - ✅ `bm project sync --name <project>` (one-way: local → cloud)
 170 | - ✅ `bm project bisync --name <project>` (two-way: local ↔ cloud)
 171 | - ✅ `bm project check --name <project>` (integrity verification)
 172 | - ✅ `bm project sync-setup <project> <local_path>` (configure sync)
 173 | - ✅ `bm project ls --name <project> [path]` (list remote files)
 174 | - ✅ `bm project copy --name <project> <src> <dst>` (copy files)
 175 | - ✅ `cloud_projects` dict in config.json
 176 | 
 177 | ## How
 178 | 
 179 | ### Phase 1: Project Model Updates
 180 | 
 181 | **1.1 Update Config Schema**
 182 | 
 183 | ```python
 184 | # basic_memory/config.py
 185 | class Config(BaseModel):
 186 |     # ... existing fields ...
 187 |     cloud_mode: bool = False
 188 |     cloud_host: str = "https://cloud.basicmemory.com"
 189 | 
 190 |     # Local mode: simple name → path mapping
 191 |     projects: dict[str, str] = {}
 192 | 
 193 |     # Cloud mode: sync configuration per project
 194 |     cloud_projects: dict[str, CloudProjectConfig] = {}
 195 | 
 196 | 
 197 | class CloudProjectConfig(BaseModel):
 198 |     """Sync configuration for a cloud project."""
 199 |     local_path: str                        # Local working directory
 200 |     last_sync: Optional[datetime] = None   # Last successful sync
 201 |     bisync_initialized: bool = False       # Whether bisync baseline exists
 202 | ```
 203 | 
 204 | **No database changes needed** - sync config lives in `~/.basic-memory/config.json` only.
 205 | 
 206 | ### Phase 2: Simplified Rclone Configuration
 207 | 
 208 | **2.1 Simplify Remote Naming**
 209 | 
 210 | ```python
 211 | # basic_memory/cli/commands/cloud/rclone_config.py
 212 | 
 213 | def configure_rclone_remote(
 214 |     access_key: str,
 215 |     secret_key: str,
 216 |     endpoint: str = "https://fly.storage.tigris.dev",
 217 |     region: str = "auto",
 218 | ) -> None:
 219 |     """Configure single rclone remote named 'basic-memory-cloud'."""
 220 | 
 221 |     config = load_rclone_config()
 222 | 
 223 |     # Single remote name (not tenant-specific)
 224 |     REMOTE_NAME = "basic-memory-cloud"
 225 | 
 226 |     if not config.has_section(REMOTE_NAME):
 227 |         config.add_section(REMOTE_NAME)
 228 | 
 229 |     config.set(REMOTE_NAME, "type", "s3")
 230 |     config.set(REMOTE_NAME, "provider", "Other")
 231 |     config.set(REMOTE_NAME, "access_key_id", access_key)
 232 |     config.set(REMOTE_NAME, "secret_access_key", secret_key)
 233 |     config.set(REMOTE_NAME, "endpoint", endpoint)
 234 |     config.set(REMOTE_NAME, "region", region)
 235 | 
 236 |     save_rclone_config(config)
 237 | ```
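
The `load_rclone_config()` / `save_rclone_config()` helpers above are not defined in this spec; a minimal sketch, assuming rclone's default config location, could look like:

```python
# Sketch of the config helpers used above (assumed location of the rclone config file)
import configparser
from pathlib import Path

RCLONE_CONFIG_PATH = Path.home() / ".config" / "rclone" / "rclone.conf"


def load_rclone_config() -> configparser.ConfigParser:
    """Read the existing rclone config (empty parser if the file doesn't exist yet)."""
    config = configparser.ConfigParser()
    if RCLONE_CONFIG_PATH.exists():
        config.read(RCLONE_CONFIG_PATH)
    return config


def save_rclone_config(config: configparser.ConfigParser) -> None:
    """Write the config back, creating ~/.config/rclone/ if needed."""
    RCLONE_CONFIG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with RCLONE_CONFIG_PATH.open("w") as f:
        config.write(f)
```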
 238 | 
 239 | **2.2 Remove Profile Complexity**
 240 | 
 241 | Use a single set of balanced defaults (from SPEC-8 Phase 4 testing); see the flag sketch after this list:
 242 | - `conflict_resolve`: `newer` (auto-resolve to most recent)
 243 | - `max_delete`: `25` (safety limit)
 244 | - `check_access`: `false` (skip for performance)
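
As a sketch, these settings map onto rclone flags roughly as follows (the constant name is illustrative, not the shipped one):

```python
# Balanced bisync defaults expressed as rclone flags (illustrative only)
BISYNC_DEFAULT_FLAGS = [
    "--conflict-resolve=newer",  # conflict_resolve: newer
    "--max-delete=25",           # max_delete: 25
    # check_access: false -> simply omit rclone's --check-access flag
]
```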
 245 | 
 246 | ### Phase 3: Project-Scoped Rclone Commands
 247 | 
 248 | **3.1 Core Sync Operations**
 249 | 
 250 | ```python
 251 | # basic_memory/cli/commands/cloud/rclone_commands.py
 252 | 
 253 | def get_project_remote(project: Project, bucket_name: str) -> str:
 254 |     """Build rclone remote path for project.
 255 | 
 256 |     Returns: basic-memory-cloud:bucket-name/app/data/research
 257 |     """
 258 |     # Strip leading slash from cloud path
 259 |     cloud_path = project.path.lstrip("/")
 260 |     return f"basic-memory-cloud:{bucket_name}/{cloud_path}"
 261 | 
 262 | 
 263 | def project_sync(
 264 |     project: Project,
 265 |     bucket_name: str,
 266 |     dry_run: bool = False,
 267 |     verbose: bool = False,
 268 | ) -> bool:
 269 |     """One-way sync: local → cloud.
 270 | 
 271 |     Uses rclone sync to make cloud identical to local.
 272 |     """
 273 |     if not project.local_sync_path:
 274 |         raise RcloneError(f"Project {project.name} has no local_sync_path configured")
 275 | 
 276 |     local_path = Path(project.local_sync_path).expanduser()
 277 |     remote_path = get_project_remote(project, bucket_name)
 278 |     filter_path = get_bmignore_filter_path()
 279 | 
 280 |     cmd = [
 281 |         "rclone", "sync",
 282 |         str(local_path),
 283 |         remote_path,
 284 |         "--filters-file", str(filter_path),
 285 |     ]
 286 | 
 287 |     if verbose:
 288 |         cmd.append("--verbose")
 289 |     else:
 290 |         cmd.append("--progress")
 291 | 
 292 |     if dry_run:
 293 |         cmd.append("--dry-run")
 294 | 
 295 |     result = subprocess.run(cmd, text=True)
 296 |     return result.returncode == 0
 297 | 
 298 | 
 299 | def project_bisync(
 300 |     project: Project,
 301 |     bucket_name: str,
 302 |     dry_run: bool = False,
 303 |     resync: bool = False,
 304 |     verbose: bool = False,
 305 | ) -> bool:
 306 |     """Two-way sync: local ↔ cloud.
 307 | 
 308 |     Uses rclone bisync with balanced defaults.
 309 |     """
 310 |     if not project.local_sync_path:
 311 |         raise RcloneError(f"Project {project.name} has no local_sync_path configured")
 312 | 
 313 |     local_path = Path(project.local_sync_path).expanduser()
 314 |     remote_path = get_project_remote(project, bucket_name)
 315 |     filter_path = get_bmignore_filter_path()
 316 |     state_path = get_project_bisync_state(project.name)
 317 | 
 318 |     # Ensure state directory exists
 319 |     state_path.mkdir(parents=True, exist_ok=True)
 320 | 
 321 |     cmd = [
 322 |         "rclone", "bisync",
 323 |         str(local_path),
 324 |         remote_path,
 325 |         "--create-empty-src-dirs",
 326 |         "--resilient",
 327 |         "--conflict-resolve=newer",
 328 |         "--max-delete=25",
 329 |         "--filters-file", str(filter_path),
 330 |         "--workdir", str(state_path),
 331 |     ]
 332 | 
 333 |     if verbose:
 334 |         cmd.append("--verbose")
 335 |     else:
 336 |         cmd.append("--progress")
 337 | 
 338 |     if dry_run:
 339 |         cmd.append("--dry-run")
 340 | 
 341 |     if resync:
 342 |         cmd.append("--resync")
 343 | 
 344 |     # Check if first run requires resync
 345 |     if not resync and not bisync_initialized(project.name) and not dry_run:
 346 |         raise RcloneError(
 347 |             f"First bisync for {project.name} requires --resync to establish baseline.\n"
 348 |             f"Run: bm project bisync --name {project.name} --resync"
 349 |         )
 350 | 
 351 |     result = subprocess.run(cmd, text=True)
 352 |     return result.returncode == 0
 353 | 
 354 | 
 355 | def project_check(
 356 |     project: Project,
 357 |     bucket_name: str,
 358 |     one_way: bool = False,
 359 | ) -> bool:
 360 |     """Check integrity between local and cloud.
 361 | 
 362 |     Returns True if files match, False if differences found.
 363 |     """
 364 |     if not project.local_sync_path:
 365 |         raise RcloneError(f"Project {project.name} has no local_sync_path configured")
 366 | 
 367 |     local_path = Path(project.local_sync_path).expanduser()
 368 |     remote_path = get_project_remote(project, bucket_name)
 369 |     filter_path = get_bmignore_filter_path()
 370 | 
 371 |     cmd = [
 372 |         "rclone", "check",
 373 |         str(local_path),
 374 |         remote_path,
 375 |         "--filter-from", str(filter_path),
 376 |     ]
 377 | 
 378 |     if one_way:
 379 |         cmd.append("--one-way")
 380 | 
 381 |     result = subprocess.run(cmd, capture_output=True, text=True)
 382 |     return result.returncode == 0
 383 | ```
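
The helper functions referenced above (`get_project_bisync_state`, `bisync_initialized`, `get_bmignore_filter_path`) are not defined in this spec; a minimal sketch, using the state directory named in the Risks section and an assumed filter-file location:

```python
# Sketch only; names match the spec, exact locations/shapes are assumptions
from pathlib import Path


def get_project_bisync_state(project_name: str) -> Path:
    """Per-project rclone --workdir (see Risks: ~/.basic-memory/bisync-state/{project_name}/)."""
    return Path.home() / ".basic-memory" / "bisync-state" / project_name


def bisync_initialized(project_name: str) -> bool:
    """True once a --resync baseline has written rclone listing files into the workdir."""
    state_path = get_project_bisync_state(project_name)
    return state_path.exists() and any(state_path.iterdir())


def get_bmignore_filter_path() -> Path:
    """Rclone filter file generated from .bmignore patterns (assumed location)."""
    return Path.home() / ".basic-memory" / ".bmignore.rclone"
```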
 384 | 
 385 | **3.2 Advanced File Operations**
 386 | 
 387 | ```python
 388 | def project_ls(
 389 |     project: Project,
 390 |     bucket_name: str,
 391 |     path: Optional[str] = None,
 392 | ) -> list[str]:
 393 |     """List files in remote project."""
 394 |     remote_path = get_project_remote(project, bucket_name)
 395 |     if path:
 396 |         remote_path = f"{remote_path}/{path}"
 397 | 
 398 |     cmd = ["rclone", "ls", remote_path]
 399 |     result = subprocess.run(cmd, capture_output=True, text=True, check=True)
 400 |     return result.stdout.splitlines()
 401 | 
 402 | 
 403 | def project_copy(
 404 |     project: Project,
 405 |     bucket_name: str,
 406 |     src: str,
 407 |     dst: str,
 408 |     dry_run: bool = False,
 409 | ) -> bool:
 410 |     """Copy files within project scope."""
 411 |     # Implementation similar to sync
 412 |     pass
 413 | ```
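
One way to flesh out the `project_copy()` stub, assuming both paths are interpreted relative to the project remote (`rclone copyto` handles single-file copies):

```python
def project_copy(
    project: Project,
    bucket_name: str,
    src: str,
    dst: str,
    dry_run: bool = False,
) -> bool:
    """Copy a file within the project's remote scope (sketch)."""
    remote_path = get_project_remote(project, bucket_name)
    cmd = ["rclone", "copyto", f"{remote_path}/{src}", f"{remote_path}/{dst}"]
    if dry_run:
        cmd.append("--dry-run")
    result = subprocess.run(cmd, text=True)
    return result.returncode == 0
```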
 414 | 
 415 | ### Phase 4: CLI Integration
 416 | 
 417 | **4.1 Update Project Commands**
 418 | 
 419 | ```python
 420 | # basic_memory/cli/commands/project.py
 421 | 
 422 | @project_app.command("add")
 423 | def add_project(
 424 |     name: str = typer.Argument(..., help="Name of the project"),
 425 |     path: str = typer.Argument(None, help="Path (required for local mode)"),
 426 |     local: str = typer.Option(None, "--local", help="Local sync path for cloud mode"),
 427 |     set_default: bool = typer.Option(False, "--default", help="Set as default"),
 428 | ) -> None:
 429 |     """Add a new project.
 430 | 
 431 |     Cloud mode examples:
 432 |       bm project add research                    # No local sync
 433 |       bm project add research --local ~/docs     # With local sync
 434 | 
 435 |     Local mode example:
 436 |       bm project add research ~/Documents/research
 437 |     """
 438 |     config = ConfigManager().config
 439 | 
 440 |     if config.cloud_mode_enabled:
 441 |         # Cloud mode: auto-generate cloud path from name
 442 |         async def _add_project():
 443 |             async with get_client() as client:
 444 |                 data = {
 445 |                     "name": name,
 446 |                     "path": generate_permalink(name),
 447 |                     "local_sync_path": local,  # Optional
 448 |                     "set_default": set_default,
 449 |                 }
 450 |                 response = await call_post(client, "/projects/projects", json=data)
 451 |                 return ProjectStatusResponse.model_validate(response.json())
 452 |     else:
 453 |         # Local mode: path is required
 454 |         if path is None:
 455 |             console.print("[red]Error: path argument is required in local mode[/red]")
 456 |             raise typer.Exit(1)
 457 | 
 458 |         resolved_path = Path(os.path.abspath(os.path.expanduser(path))).as_posix()
 459 | 
 460 |         async def _add_project():
 461 |             async with get_client() as client:
 462 |                 data = {
 463 |                     "name": name,
 464 |                     "path": resolved_path,
 465 |                     "set_default": set_default,
 466 |                 }
 467 |                 response = await call_post(client, "/projects/projects", json=data)
 468 |                 return ProjectStatusResponse.model_validate(response.json())
 469 | 
 470 |     # Execute and display result
 471 |     result = asyncio.run(_add_project())
 472 |     console.print(f"[green]{result.message}[/green]")
 473 | 
 474 | 
 475 | @project_app.command("sync-setup")
 476 | def setup_project_sync(
 477 |     name: str = typer.Argument(..., help="Project name"),
 478 |     local_path: str = typer.Argument(..., help="Local sync directory"),
 479 | ) -> None:
 480 |     """Configure local sync for an existing cloud project."""
 481 | 
 482 |     config = ConfigManager().config
 483 | 
 484 |     if not config.cloud_mode_enabled:
 485 |         console.print("[red]Error: sync-setup only available in cloud mode[/red]")
 486 |         raise typer.Exit(1)
 487 | 
 488 |     resolved_path = Path(os.path.abspath(os.path.expanduser(local_path))).as_posix()
 489 | 
 490 |     async def _update_project():
 491 |         async with get_client() as client:
 492 |             data = {"local_sync_path": resolved_path}
 493 |             project_permalink = generate_permalink(name)
 494 |             response = await call_patch(
 495 |                 client,
 496 |                 f"/projects/{project_permalink}",
 497 |                 json=data,
 498 |             )
 499 |             return ProjectStatusResponse.model_validate(response.json())
 500 | 
 501 |     result = asyncio.run(_update_project())
 502 |     console.print(f"[green]{result.message}[/green]")
 503 |     console.print(f"\nLocal sync configured: {resolved_path}")
 504 |     console.print(f"\nTo sync: bm project bisync --name {name} --resync")
 505 | ```
 506 | 
 507 | **4.2 New Sync Commands**
 508 | 
 509 | ```python
 510 | # basic_memory/cli/commands/project.py
 511 | 
 512 | @project_app.command("sync")
 513 | def sync_project(
 514 |     name: str = typer.Option(..., "--name", help="Project name"),
 515 |     all_projects: bool = typer.Option(False, "--all", help="Sync all projects"),
 516 |     dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes"),
 517 |     verbose: bool = typer.Option(False, "--verbose", help="Show detailed output"),
 518 | ) -> None:
 519 |     """One-way sync: local → cloud (make cloud identical to local)."""
 520 | 
 521 |     config = ConfigManager().config
 522 |     if not config.cloud_mode_enabled:
 523 |         console.print("[red]Error: sync only available in cloud mode[/red]")
 524 |         raise typer.Exit(1)
 525 | 
 526 |     # Get projects to sync
 527 |     if all_projects:
 528 |         projects = get_all_sync_projects()
 529 |     else:
 530 |         projects = [get_project_by_name(name)]
 531 | 
 532 |     # Get bucket name
 533 |     tenant_info = asyncio.run(get_mount_info())
 534 |     bucket_name = tenant_info.bucket_name
 535 | 
 536 |     # Sync each project
 537 |     for project in projects:
 538 |         if not project.local_sync_path:
 539 |             console.print(f"[yellow]Skipping {project.name}: no local_sync_path[/yellow]")
 540 |             continue
 541 | 
 542 |         console.print(f"[blue]Syncing {project.name}...[/blue]")
 543 |         try:
 544 |             project_sync(project, bucket_name, dry_run=dry_run, verbose=verbose)
 545 |             console.print(f"[green]✓ {project.name} synced[/green]")
 546 |         except RcloneError as e:
 547 |             console.print(f"[red]✗ {project.name} failed: {e}[/red]")
 548 | 
 549 | 
 550 | @project_app.command("bisync")
 551 | def bisync_project(
 552 |     name: str = typer.Option(..., "--name", help="Project name"),
 553 |     all_projects: bool = typer.Option(False, "--all", help="Bisync all projects"),
 554 |     dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes"),
 555 |     resync: bool = typer.Option(False, "--resync", help="Force new baseline"),
 556 |     verbose: bool = typer.Option(False, "--verbose", help="Show detailed output"),
 557 | ) -> None:
 558 |     """Two-way sync: local ↔ cloud (bidirectional sync)."""
 559 | 
 560 |     # Similar to sync but calls project_bisync()
 561 |     pass
 562 | 
 563 | 
 564 | @project_app.command("check")
 565 | def check_project(
 566 |     name: str = typer.Option(..., "--name", help="Project name"),
 567 |     one_way: bool = typer.Option(False, "--one-way", help="Check one direction only"),
 568 | ) -> None:
 569 |     """Verify file integrity between local and cloud."""
 570 | 
 571 |     # Calls project_check()
 572 |     pass
 573 | 
 574 | 
 575 | @project_app.command("ls")
 576 | def list_project_files(
 577 |     name: str = typer.Option(..., "--name", help="Project name"),
 578 |     path: str = typer.Argument(None, help="Path within project"),
 579 | ) -> None:
 580 |     """List files in remote project."""
 581 | 
 582 |     # Calls project_ls()
 583 |     pass
 584 | ```
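
The stubs above mirror `sync_project`; a sketch of `bisync_project` under that assumption (same helpers, same error handling):

```python
@project_app.command("bisync")
def bisync_project(
    name: str = typer.Option(..., "--name", help="Project name"),
    all_projects: bool = typer.Option(False, "--all", help="Bisync all projects"),
    dry_run: bool = typer.Option(False, "--dry-run", help="Preview changes"),
    resync: bool = typer.Option(False, "--resync", help="Force new baseline"),
    verbose: bool = typer.Option(False, "--verbose", help="Show detailed output"),
) -> None:
    """Two-way sync: local ↔ cloud (bidirectional sync)."""
    config = ConfigManager().config
    if not config.cloud_mode_enabled:
        console.print("[red]Error: bisync only available in cloud mode[/red]")
        raise typer.Exit(1)

    projects = get_all_sync_projects() if all_projects else [get_project_by_name(name)]
    bucket_name = asyncio.run(get_mount_info()).bucket_name

    for project in projects:
        if not project.local_sync_path:
            console.print(f"[yellow]Skipping {project.name}: no local_sync_path[/yellow]")
            continue

        console.print(f"[blue]Bisyncing {project.name}...[/blue]")
        try:
            project_bisync(project, bucket_name, dry_run=dry_run, resync=resync, verbose=verbose)
            console.print(f"[green]✓ {project.name} synced[/green]")
        except RcloneError as e:
            console.print(f"[red]✗ {project.name} failed: {e}[/red]")
```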
 585 | 
 586 | **4.3 Update Cloud Setup**
 587 | 
 588 | ```python
 589 | # basic_memory/cli/commands/cloud/core_commands.py
 590 | 
 591 | @cloud_app.command("setup")
 592 | def cloud_setup() -> None:
 593 |     """Set up cloud sync (install rclone and configure credentials)."""
 594 | 
 595 |     console.print("[bold blue]Basic Memory Cloud Setup[/bold blue]")
 596 |     console.print("Installing rclone and configuring credentials...\n")
 597 | 
 598 |     try:
 599 |         # Step 1: Install rclone
 600 |         console.print("[blue]Step 1: Installing rclone...[/blue]")
 601 |         install_rclone()
 602 | 
 603 |         # Step 2: Get tenant info
 604 |         console.print("\n[blue]Step 2: Getting tenant information...[/blue]")
 605 |         tenant_info = asyncio.run(get_mount_info())
 606 |         tenant_id = tenant_info.tenant_id
 607 | 
 608 |         console.print(f"[green]✓ Tenant: {tenant_id}[/green]")
 609 | 
 610 |         # Step 3: Generate credentials
 611 |         console.print("\n[blue]Step 3: Generating sync credentials...[/blue]")
 612 |         creds = asyncio.run(generate_mount_credentials(tenant_id))
 613 | 
 614 |         console.print("[green]✓ Generated credentials[/green]")
 615 | 
 616 |         # Step 4: Configure rclone (single remote: basic-memory-cloud)
 617 |         console.print("\n[blue]Step 4: Configuring rclone...[/blue]")
 618 |         configure_rclone_remote(
 619 |             access_key=creds.access_key,
 620 |             secret_key=creds.secret_key,
 621 |         )
 622 | 
 623 |         console.print("\n[bold green]✓ Cloud setup completed![/bold green]")
 624 |         console.print("\nNext steps:")
 625 |         console.print("  1. Create projects with local sync:")
 626 |         console.print("     bm project add research --local ~/Documents/research")
 627 |         console.print("\n  2. Or configure sync for existing projects:")
 628 |         console.print("     bm project sync-setup research ~/Documents/research")
 629 |         console.print("\n  3. Start syncing:")
 630 |         console.print("     bm project bisync --name research --resync")
 631 | 
 632 |     except Exception as e:
 633 |         console.print(f"\n[red]Setup failed: {e}[/red]")
 634 |         raise typer.Exit(1)
 635 | ```
 636 | 
 637 | ### Phase 5: Cleanup
 638 | 
 639 | **5.1 Remove Deprecated Commands**
 640 | 
 641 | ```python
 642 | # Remove from cloud commands:
 643 | - cloud mount
 644 | - cloud unmount
 645 | - cloud mount-status
 646 | - bisync-setup
 647 | - Individual bisync command (moved to project bisync)
 648 | 
 649 | # Remove from root commands:
 650 | - bm sync (without project specification)
 651 | - bm bisync (without project specification)
 652 | ```
 653 | 
 654 | **5.2 Remove Deprecated Code**
 655 | 
 656 | ```python
 657 | # Files to remove:
 658 | - mount_commands.py (entire file)
 659 | 
 660 | # Functions to remove from rclone_config.py:
 661 | - MOUNT_PROFILES
 662 | - get_default_mount_path()
 663 | - build_mount_command()
 664 | - is_path_mounted()
 665 | - get_rclone_processes()
 666 | - kill_rclone_process()
 667 | - unmount_path()
 668 | - cleanup_orphaned_rclone_processes()
 669 | 
 670 | # Functions to remove from bisync_commands.py:
 671 | - BISYNC_PROFILES (use single default)
 672 | - setup_cloud_bisync() (replaced by cloud setup)
 673 | - run_bisync_watch() (can add back to project bisync if needed)
 674 | - show_bisync_status() (replaced by project list showing sync status)
 675 | ```
 676 | 
 677 | **5.3 Update Configuration Schema**
 678 | 
 679 | ```python
 680 | # Remove from config.json:
 681 | - bisync_config (no longer needed)
 682 | 
 683 | # The cloud_projects mapping in config.json is the source of truth for sync configuration
 684 | ```
 685 | 
 686 | ### Phase 6: Documentation Updates
 687 | 
 688 | **6.1 Update CLI Documentation**
 689 | 
 690 | ````markdown
 691 | # docs/cloud-cli.md
 692 | 
 693 | ## Project-Scoped Cloud Sync
 694 | 
 695 | Basic Memory cloud sync is project-scoped - each project can optionally be configured with a local working directory that syncs with the cloud.
 696 | 
 697 | ### Setup (One Time)
 698 | 
 699 | 1. Authenticate and enable cloud mode:
 700 |    ```bash
 701 |    bm cloud login
 702 |    ```
 703 | 
 704 | 2. Install rclone and configure credentials:
 705 |    ```bash
 706 |    bm cloud setup
 707 |    ```
 708 | 
 709 | ### Create Projects with Sync
 710 | 
 711 | Create a cloud project with optional local sync:
 712 | 
 713 | ```bash
 714 | # Create project without local sync
 715 | bm project add research
 716 | 
 717 | # Create project with local sync
 718 | bm project add research --local ~/Documents/research
 719 | ```
 720 | 
 721 | Or configure sync for an existing (remote) project:
 722 | 
 723 | ```bash
 724 | bm project sync-setup research ~/Documents/research
 725 | ```
 726 | 
 727 | ### Syncing Projects
 728 | 
 729 | **Two-way sync (recommended):**
 730 | ```bash
 731 | # First time - establish baseline
 732 | bm project bisync --name research --resync
 733 | 
 734 | # Subsequent syncs
 735 | bm project bisync --name research
 736 | 
 737 | # Sync all projects with local_sync_path configured
 738 | bm project bisync --all
 739 | ```
 740 | 
 741 | **One-way sync (local → cloud):**
 742 | ```bash
 743 | bm project sync --name research
 744 | ```
 745 | 
 746 | **Verify integrity:**
 747 | ```bash
 748 | bm project check --name research
 749 | ```
 750 | 
 751 | ### Advanced Operations
 752 | 
 753 | **List remote files:**
 754 | ```bash
 755 | bm project ls --name research
 756 | bm project ls --name research subfolder
 757 | ```
 758 | 
 759 | **Preview changes before syncing:**
 760 | ```bash
 761 | bm project bisync --name research --dry-run
 762 | ```
 763 | 
 764 | **Verbose output for debugging:**
 765 | ```bash
 766 | bm project bisync --name research --verbose
 767 | ```
 768 | 
 769 | ### Project Management
 770 | 
 771 | **List projects (shows sync status):**
 772 | ```bash
 773 | bm project list
 774 | ```
 775 | 
 776 | **Update sync path:**
 777 | ```bash
 778 | bm project sync-setup research ~/new/path
 779 | ```
 780 | 
 781 | **Remove project:**
 782 | ```bash
 783 | bm project remove research
 784 | ```
 785 | ````
 786 | 
 787 | **6.2 Update SPEC-8**
 788 | 
 789 | Add to SPEC-8's "Implementation Notes" section:
 790 | 
 791 | ```markdown
 792 | ## Superseded by SPEC-20
 793 | 
 794 | The initial implementation in SPEC-8 proved too complex with multiple footguns:
 795 | - Mount vs bisync workflow confusion
 796 | - Multiple profiles creating decision paralysis
 797 | - Directory conflicts and auto-discovery errors
 798 | 
 799 | SPEC-20 supersedes the sync implementation with a simplified project-scoped approach while keeping the core Tigris infrastructure from SPEC-8.
 800 | ```
 801 | 
 802 | ## How to Evaluate
 803 | 
 804 | ### Success Criteria
 805 | 
 806 | **1. Simplified Setup**
 807 | - [ ] `bm cloud setup` completes in one command
 808 | - [ ] Creates a single rclone remote named `basic-memory-cloud`
 809 | - [ ] No profile selection required
 810 | - [ ] Clear next steps printed after setup
 811 | 
 812 | **2. Clear Project Model**
 813 | - [ ] Projects can be created with or without local sync
 814 | - [ ] `bm project list` shows sync status for each project
 815 | - [ ] `local_sync_path` stored in project config
 816 | - [ ] Renaming local folder doesn't break sync (config is source of truth)
 817 | 
 818 | **3. Working Sync Operations**
 819 | - [ ] `bm project sync --name <project>` performs one-way sync
 820 | - [ ] `bm project bisync --name <project>` performs two-way sync
 821 | - [ ] `bm project check --name <project>` verifies integrity
 822 | - [ ] `--all` flag syncs all configured projects
 823 | - [ ] `--dry-run` shows changes without applying
 824 | - [ ] First bisync requires `--resync` with clear error message
 825 | 
 826 | **4. Safety**
 827 | - [ ] Cannot sync project without `local_sync_path` configured
 828 | - [ ] Bisync state is per-project (not global)
 829 | - [ ] `.bmignore` patterns respected
 830 | - [ ] Max delete safety (25 files) prevents accidents
 831 | - [ ] Clear error messages for all failure modes
 832 | 
 833 | **5. Clean Removal**
 834 | - [ ] Mount commands removed
 835 | - [ ] Profile selection removed
 836 | - [ ] Global sync directory removed (`~/basic-memory-cloud-sync/`)
 837 | - [ ] Auto-discovery removed
 838 | - [ ] Convenience commands (`bm sync`) removed
 839 | 
 840 | **6. Documentation**
 841 | - [ ] Updated cloud-cli.md with new workflow
 842 | - [ ] Clear examples for common operations
 843 | - [ ] Migration guide for existing users
 844 | - [ ] Troubleshooting section
 845 | 
 846 | ### Test Scenarios
 847 | 
 848 | **Scenario 1: New User Setup**
 849 | ```bash
 850 | # Start fresh
 851 | bm cloud login
 852 | bm cloud setup
 853 | bm project add research --local ~/docs/research
 854 | bm project bisync --name research --resync
 855 | # Edit files locally
 856 | bm project bisync --name research
 857 | # Verify changes synced
 858 | ```
 859 | 
 860 | **Scenario 2: Multiple Projects**
 861 | ```bash
 862 | bm project add work --local ~/work
 863 | bm project add personal --local ~/personal
 864 | bm project bisync --all --resync
 865 | # Edit files in both projects
 866 | bm project bisync --all
 867 | ```
 868 | 
 869 | **Scenario 3: Project Without Sync**
 870 | ```bash
 871 | bm project add temp-notes
 872 | # Try to sync (should fail gracefully)
 873 | bm project bisync --name temp-notes
 874 | # Should see: "Project temp-notes has no local_sync_path configured"
 875 | ```
 876 | 
 877 | **Scenario 4: Integrity Check**
 878 | ```bash
 879 | bm project bisync --name research
 880 | # Manually edit file in cloud UI
 881 | bm project check --name research
 882 | # Should report differences
 883 | bm project bisync --name research
 884 | # Should sync changes back to local
 885 | ```
 886 | 
 887 | **Scenario 5: Safety Features**
 888 | ```bash
 889 | # Delete 30 files locally
 890 | bm project sync --name research
 891 | # Should fail with max delete error
 892 | # User reviews and confirms
 893 | bm project sync --name research  # After confirming
 894 | ```
 895 | 
 896 | ### Performance Targets
 897 | 
 898 | - Setup completes in < 30 seconds
 899 | - Single project sync < 5 seconds for small changes
 900 | - Bisync initialization (--resync) < 10 seconds for a typical project
 901 | - Batch sync (--all) processes N projects in roughly N × 5 seconds (projects sync sequentially)
 902 | 
 903 | ### Breaking Changes
 904 | 
 905 | This is a **breaking change** from SPEC-8 implementation:
 906 | 
 907 | **Migration Required:**
 908 | - Users must run `bm cloud setup` again
 909 | - Existing `~/basic-memory-cloud-sync/` directory abandoned
 910 | - Projects must be configured with `local_sync_path`
 911 | - Mount users must switch to bisync workflow
 912 | 
 913 | **Migration Guide:**
 914 | ```bash
 915 | # 1. Note current project locations
 916 | bm project list
 917 | 
 918 | # 2. Re-run setup
 919 | bm cloud setup
 920 | 
 921 | # 3. Configure sync for each project
 922 | bm project sync-setup research ~/Documents/research
 923 | bm project sync-setup work ~/work
 924 | 
 925 | # 4. Establish baselines
 926 | bm project bisync --all --resync
 927 | 
 928 | # 5. Old directory can be deleted
 929 | rm -rf ~/basic-memory-cloud-sync/
 930 | ```
 931 | 
 932 | ## Dependencies
 933 | 
 934 | - **SPEC-8**: TigrisFS Integration (bucket provisioning, credentials)
 935 | - Python 3.12+
 936 | - rclone 1.64.0+
 937 | - Typer (CLI framework)
 938 | - Rich (console output)
 939 | 
 940 | ## Risks
 941 | 
 942 | **Risk 1: User Confusion from Breaking Changes**
 943 | - Mitigation: Clear migration guide, version bump (0.16.0)
 944 | - Mitigation: Detect old config and print migration instructions
 945 | 
 946 | **Risk 2: Per-Project Bisync State Complexity**
 947 | - Mitigation: Use rclone's `--workdir` to isolate state per project
 948 | - Mitigation: Store in `~/.basic-memory/bisync-state/{project_name}/`
 949 | 
 950 | **Risk 3: Batch Operations Performance**
 951 | - Mitigation: Run syncs sequentially with progress indicators
 952 | - Mitigation: Add `--parallel` flag in future if needed
 953 | 
 954 | **Risk 4: Lost Features (Mount)**
 955 | - Mitigation: Document mount as experimental/advanced feature
 956 | - Mitigation: Can restore if users request it
 957 | 
 958 | ## Open Questions
 959 | 
 960 | 1. **Should we keep mount as experimental command?**
 961 |    - Lean toward: Remove entirely, focus on bisync
 962 |    - Alternative: Keep as `bm project mount --name <project>` (advanced)
 963 | 
 964 |    - Answer: remove
 965 | 
 966 | 2. **Batch sync order?**
 967 |    - Alphabetical by project name?
 968 |    - By last modified time?
 969 |    - Let user specify order?
 970 |    - Answer: use the project order from the API (or config).
 971 | 
 972 | 3. **Credential refresh?**
 973 |    - Auto-detect expired credentials and re-run credential generation?
 974 |    - Or require manual `bm cloud setup` again?
 975 |    - Answer: manual `bm cloud setup` is fine.
 976 | 
 977 | 4. **Watch mode for projects?**
 978 |    - `bm project bisync --name research --watch`?
 979 |    - Or removed entirely (users can use OS tools)?
 980 |    - Answer: remove for now; we can add it back later if it proves useful.
 981 | 
 982 | 5. **Project path validation?**
 983 |    - Ensure `local_path` exists before allowing bisync?
 984 |    - Or let rclone error naturally?
 985 |    - Answer: create the directory if needed; an existing directory is fine.
 986 | 
 987 | ## Implementation Checklist
 988 | 
 989 | ### Phase 1: Config Schema (1-2 days) ✅
 990 | - [x] Add `CloudProjectConfig` model to `basic_memory/config.py`
 991 | - [x] Add `cloud_projects: dict[str, CloudProjectConfig]` to Config model
 992 | - [x] Test config loading/saving with new schema
 993 | - [x] Handle migration from old config format
 994 | 
 995 | ### Phase 2: Rclone Config Simplification ✅
 996 | - [x] Update `configure_rclone_remote()` to use `basic-memory-cloud` as remote name
 997 | - [x] Remove `add_tenant_to_rclone_config()` (replaced by configure_rclone_remote)
 998 | - [x] Remove tenant_id from remote naming
 999 | - [x] Test rclone config generation
1000 | - [x] Clean up deprecated import references in bisync_commands.py and core_commands.py
1001 | 
1002 | ### Phase 3: Project-Scoped Rclone Commands ✅
1003 | - [x] Create `src/basic_memory/cli/commands/cloud/rclone_commands.py`
1004 | - [x] Implement `get_project_remote(project, bucket_name)`
1005 | - [x] Implement `project_sync()` (one-way: local → cloud)
1006 | - [x] Implement `project_bisync()` (two-way: local ↔ cloud)
1007 | - [x] Implement `project_check()` (integrity verification)
1008 | - [x] Implement `project_ls()` (list remote files)
1009 | - [x] Add helper: `get_project_bisync_state(project_name)`
1010 | - [x] Add helper: `bisync_initialized(project_name)`
1011 | - [x] Add helper: `get_bmignore_filter_path()`
1012 | - [x] Add `SyncProject` dataclass for project representation
1013 | - [x] Write unit tests for rclone commands (22 tests, 99% coverage)
1014 | - [x] Temporarily disable mount commands in core_commands.py
1015 | 
1016 | ### Phase 4: CLI Integration ✅
1017 | - [x] Update `project.py`: Add `--local-path` flag to `project add` command
1018 | - [x] Update `project.py`: Create `project sync-setup` command
1019 | - [x] Update `project.py`: Add `project sync` command
1020 | - [x] Update `project.py`: Add `project bisync` command
1021 | - [x] Update `project.py`: Add `project check` command
1022 | - [x] Update `project.py`: Add `project ls` command
1023 | - [x] Update `project.py`: Add `project bisync-reset` command
1024 | - [x] Import rclone_commands module and get_mount_info helper
1025 | - [x] Update `project list` to show local sync paths in cloud mode
1026 | - [x] Update `project list` to conditionally show columns based on config
1027 | - [x] Update `project remove` to clean up local directories and bisync state
1028 | - [x] Add automatic database sync trigger after file sync operations
1029 | - [x] Add path normalization to prevent S3 mount point leakage
1030 | - [x] Update `cloud/core_commands.py`: Simplified `cloud setup` command
1031 | - [x] Write unit tests for `project add --local-path` (4 tests passing)
1032 | 
1033 | ### Phase 5: Cleanup ✅
1034 | - [x] Remove `mount_commands.py` (entire file)
1035 | - [x] Remove mount-related functions from `rclone_config.py`:
1036 |   - [x] `MOUNT_PROFILES`
1037 |   - [x] `get_default_mount_path()`
1038 |   - [x] `build_mount_command()`
1039 |   - [x] `is_path_mounted()`
1040 |   - [x] `get_rclone_processes()`
1041 |   - [x] `kill_rclone_process()`
1042 |   - [x] `unmount_path()`
1043 |   - [x] `cleanup_orphaned_rclone_processes()`
1044 | - [x] Remove from `bisync_commands.py`:
1045 |   - [x] `BISYNC_PROFILES` (use single default)
1046 |   - [x] `setup_cloud_bisync()`
1047 |   - [x] `run_bisync_watch()`
1048 |   - [x] `show_bisync_status()`
1049 |   - [x] `run_bisync()`
1050 |   - [x] `run_check()`
1051 | - [x] Remove `bisync_config` from config schema
1052 | - [x] Remove deprecated cloud commands:
1053 |   - [x] `cloud mount`
1054 |   - [x] `cloud unmount`
1055 |   - [x] Simplified `cloud setup` to just install rclone and configure credentials
1056 | - [x] Remove convenience commands:
1057 |   - [x] Root-level `bm sync` (removed - confusing in cloud mode, automatic in local mode)
1058 | - [x] Update tests to remove references to deprecated functionality
1059 | - [x] All typecheck errors resolved
1060 | 
1061 | ### Phase 6: Documentation ✅
1062 | - [x] Update `docs/cloud-cli.md` with new workflow
1063 | - [x] Add troubleshooting section for empty directory issues
1064 | - [x] Add troubleshooting section for bisync state corruption
1065 | - [x] Document `bisync-reset` command usage
1066 | - [x] Update command reference with all new commands
1067 | - [x] Add examples for common workflows
1068 | - [ ] Add migration guide for existing users (deferred - no users on old system yet)
1069 | - [ ] Update SPEC-8 with "Superseded by SPEC-20" note (deferred)
1070 | 
1071 | ### Testing & Validation ✅
1072 | - [x] Test Scenario 1: New user setup (manual testing complete)
1073 | - [x] Test Scenario 2: Multiple projects (manual testing complete)
1074 | - [x] Test Scenario 3: Project without sync (manual testing complete)
1075 | - [x] Test Scenario 4: Integrity check (manual testing complete)
1076 | - [x] Test Scenario 5: bisync-reset command (manual testing complete)
1077 | - [x] Test cleanup on remove (manual testing complete)
1078 | - [x] Verify all commands work end-to-end
1079 | - [x] Document known issues (empty directory bisync limitation)
1080 | - [ ] Automated integration tests (deferred)
1081 | - [ ] Test migration from SPEC-8 implementation (N/A - no users yet)
1082 | 
1083 | ## Implementation Notes
1084 | 
1085 | ### Key Improvements Added During Implementation
1086 | 
1087 | **1. Path Normalization (Critical Bug Fix)**
1088 | 
1089 | **Problem:** Files were syncing to `/app/data/app/data/project/` instead of `/app/data/project/`
1090 | 
1091 | **Root cause:**
1092 | - S3 bucket contains projects directly (e.g., `basic-memory-llc/`)
1093 | - Fly machine mounts bucket at `/app/data/`
1094 | - API returns paths like `/app/data/basic-memory-llc` (mount point + project)
1095 | - Rclone was using this full path, causing path doubling
1096 | 
1097 | **Solution (three layers):**
1098 | - API side: Added `normalize_project_path()` in `project_router.py` to strip `/app/data/` prefix
1099 | - CLI side: Added defensive normalization in `project.py` commands
1100 | - Rclone side: Updated `get_project_remote()` to strip prefix before building remote path
1101 | 
1102 | **Files modified:**
1103 | - `src/basic_memory/api/routers/project_router.py` - API normalization
1104 | - `src/basic_memory/cli/commands/project.py` - CLI normalization
1105 | - `src/basic_memory/cli/commands/cloud/rclone_commands.py` - Rclone remote path construction
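
A minimal sketch of the normalization described above (the shipped helper in `project_router.py` may differ in detail):

```python
def normalize_project_path(path: str, mount_prefix: str = "/app/data/") -> str:
    """Strip the Fly machine mount prefix so project paths are bucket-relative.

    "/app/data/basic-memory-llc" -> "/basic-memory-llc"
    """
    if path.startswith(mount_prefix):
        return "/" + path[len(mount_prefix):]
    return path
```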
1106 | 
1107 | **2. Automatic Database Sync After File Operations**
1108 | 
1109 | **Enhancement:** After successful file sync or bisync, automatically trigger database sync via API
1110 | 
1111 | **Implementation:**
1112 | - After `project sync`: POST to `/{project}/project/sync`
1113 | - After `project bisync`: POST to `/{project}/project/sync` + update config timestamps
1114 | - Skip trigger on `--dry-run`
1115 | - Graceful error handling with warnings
1116 | 
1117 | **Benefit:** Files and database stay in sync automatically without manual intervention
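
A sketch of the trigger, assuming the same async client helpers (`get_client`, `call_post`, `generate_permalink`) used by the other CLI commands:

```python
async def trigger_db_sync(project_name: str) -> None:
    """Ask the API to re-index a project after its files changed (sketch)."""
    async with get_client() as client:
        permalink = generate_permalink(project_name)
        await call_post(client, f"/{permalink}/project/sync")
```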
1118 | 
1119 | **3. Enhanced Project Removal with Cleanup**
1120 | 
1121 | **Enhancement:** `bm project remove` now properly cleans up local artifacts
1122 | 
1123 | **Behavior with `--delete-notes`:**
1124 | - ✓ Removes project from cloud API
1125 | - ✓ Deletes cloud files
1126 | - ✓ Removes local sync directory
1127 | - ✓ Removes bisync state directory
1128 | - ✓ Removes `cloud_projects` config entry
1129 | 
1130 | **Behavior without `--delete-notes`:**
1131 | - ✓ Removes project from cloud API
1132 | - ✗ Keeps local files (shows path in message)
1133 | - ✓ Removes bisync state directory (cleanup)
1134 | - ✓ Removes `cloud_projects` config entry
1135 | 
1136 | **Files modified:**
1137 | - `src/basic_memory/cli/commands/project.py` - Enhanced `remove_project()` function
1138 | 
1139 | **4. Bisync State Reset Command**
1140 | 
1141 | **New command:** `bm project bisync-reset <project>`
1142 | 
1143 | **Purpose:** Clear bisync state when it becomes corrupted (e.g., after mixing dry-run and actual runs)
1144 | 
1145 | **What it does:**
1146 | - Removes all bisync metadata from `~/.basic-memory/bisync-state/{project}/`
1147 | - Forces fresh baseline on next `--resync`
1148 | - Safe operation (doesn't touch files)
1149 | - Also runs automatically on project removal
1150 | 
1151 | **Files created:**
1152 | - Added `bisync-reset` command to `src/basic_memory/cli/commands/project.py`
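
Conceptually the reset removes the per-project workdir; a minimal sketch (the shipped command may add confirmation and console output):

```python
import shutil
from pathlib import Path


def reset_bisync_state(project_name: str) -> None:
    """Delete rclone bisync metadata so the next --resync starts from a clean baseline."""
    state_dir = Path.home() / ".basic-memory" / "bisync-state" / project_name
    if state_dir.exists():
        shutil.rmtree(state_dir)  # metadata only; never touches note files
```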
1153 | 
1154 | **5. Improved UI for Project List**
1155 | 
1156 | **Enhancements:**
1157 | - Shows "Local Path" column in cloud mode for projects with sync configured
1158 | - Conditionally shows/hides columns based on config:
1159 |   - Local Path: only in cloud mode
1160 |   - Default: only when `default_project_mode` is True
1161 | - Uses `no_wrap=True, overflow="fold"` to prevent path truncation
1162 | - Applies path normalization to prevent showing mount point details
1163 | 
1164 | **Files modified:**
1165 | - `src/basic_memory/cli/commands/project.py` - Enhanced `list_projects()` function
1166 | 
1167 | **6. Documentation of Known Issues**
1168 | 
1169 | **Issue documented:** Rclone bisync limitation with empty directories
1170 | 
1171 | **Problem:** "Empty prior Path1 listing. Cannot sync to an empty directory"
1172 | 
1173 | **Explanation:** Bisync creates listing files that track state. When both directories are completely empty, these listing files are considered invalid.
1174 | 
1175 | **Solution documented:** Add at least one file (like README.md) before running `--resync`
1176 | 
1177 | **Files updated:**
1178 | - `docs/cloud-cli.md` - Added troubleshooting sections for:
1179 |   - Empty directory issues
1180 |   - Bisync state corruption
1181 |   - Usage of `bisync-reset` command
1182 | 
1183 | ### Rclone Flag Fix
1184 | 
1185 | **Bug fix:** Incorrect rclone flag causing sync failures
1186 | 
1187 | **Error:** `unknown flag: --filters-file`
1188 | 
1189 | **Fix:** Changed `--filters-file` to the correct flag `--filter-from` in both the `project_sync()` and `project_bisync()` functions
1190 | 
1191 | **Files modified:**
1192 | - `src/basic_memory/cli/commands/cloud/rclone_commands.py`
1193 | 
1194 | ### Test Coverage
1195 | 
1196 | **Unit tests added:**
1197 | - `tests/cli/test_project_add_with_local_path.py` - 4 tests for `--local-path` functionality
1198 |   - Test with local path saves to config
1199 |   - Test without local path doesn't save to config
1200 |   - Test tilde expansion in paths
1201 |   - Test nested directory creation
1202 | 
1203 | **Manual testing completed:**
1204 | - All 10 project commands tested end-to-end
1205 | - Path normalization verified
1206 | - Database sync trigger verified
1207 | - Cleanup on remove verified
1208 | - Bisync state reset verified
1209 | 
1210 | ## Future Enhancements (Out of Scope)
1211 | 
1212 | - **Per-project rclone profiles**: Allow advanced users to override defaults
1213 | - **Conflict resolution UI**: Interactive conflict resolution for bisync
1214 | - **Sync scheduling**: Automatic periodic sync without watch mode
1215 | - **Sync analytics**: Track sync frequency, data transferred, etc.
1216 | - **Multi-machine coordination**: Detect and warn about concurrent edits from different machines
1217 | 
```