ruanodendaal/bear-mcp-server # codebase.md

# Directory Structure

```
├── .dockerignore
├── .DS_Store
├── .gitignore
├── Dockerfile
├── LICENSE
├── package-lock.json
├── package.json
├── readme.md
└── src
    ├── bear-mcp-server.js
    ├── create-index.js
    ├── lib
    │   └── explore-database.js
    └── utils.js
```

# Files

--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------

```
node_modules
npm-debug.log
.DS_Store

```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
node_modules
.DS_Store

# Vector index files
src/note_vectors.index
src/note_vectors.json

```

--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------

```markdown
# Bear Notes MCP Server with RAG

Looking to supercharge your Bear Notes experience with AI assistants? This little gem connects your personal knowledge base to AI systems using semantic search and RAG (Retrieval-Augmented Generation).

I built this because I wanted my AI assistants to actually understand what's in my notes, not just perform simple text matching. The result is rather sweet, if I do say so myself.

## Getting Started

Setting up is straightforward:

```bash
git clone [your-repo-url]
cd bear-mcp-server
npm install
```

Make the scripts executable (because permissions matter):

```bash
chmod +x src/bear-mcp-server.js
chmod +x src/create-index.js
```

## First Things First: Index Your Notes

Before diving in, you'll need to create vector embeddings of your notes:

```bash
npm run index
```

Fair warning: this might take a few minutes if you're a prolific note-taker like me. It's converting all your notes into mathematical vectors that capture their meaning. Clever stuff 😉.

## Configuration

Update your MCP configuration file:

```json
{
  "mcpServers": {
    "bear-notes": {
      "command": "node",
      "args": [
        "/absolute/path/to/bear-mcp-server/src/bear-mcp-server.js"
      ],
      "env": {
        "BEAR_DATABASE_PATH": "/Users/yourusername/Library/Group Containers/9K33E3U3T4.net.shinyfrog.bear/Application Data/database.sqlite"
      }
    }
  }
}
```

> 🚨 _Remember to replace the path with your actual installation location. No prizes for using the example path verbatim, I'm afraid._ 

## What Makes This Special?

- **Semantic Search**: Find notes based on meaning, not just keywords. Ask about "productivity systems" and it'll find your notes on GTD and Pomodoro, even if they don't contain those exact words.

- **RAG Support**: Your AI assistants can now pull in relevant context from your notes, even when you haven't explicitly mentioned them.

- **All Local Processing**: Everything runs on your machine. No data leaves your computer, no API keys needed, no internet dependency (after initial setup).

- **Graceful Fallbacks**: If semantic search isn't available for whatever reason, it'll quietly fall back to traditional search. Belt and braces.

## How It Works

### The Clever Bits

This server uses the Xenova implementation of transformers.js with the all-MiniLM-L6-v2 model:

- It creates 384-dimensional vectors that capture the semantic essence of your notes
- All processing happens locally on your machine
- The first startup might be a tad slow while the model loads, but it's zippy after that
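
To give a feel for what "mean pooling" and "normalising" do to the model's output, here's a rough sketch. It uses made-up 3-dimensional token vectors rather than real 384-dimensional ones, and the `meanPool` and `l2Normalize` helpers are hypothetical illustrations, not the transformers.js internals:

```javascript
// Hypothetical helper: average per-token vectors into a single vector.
function meanPool(tokenVectors) {
  const dim = tokenVectors[0].length;
  const pooled = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) {
      pooled[i] += vec[i] / tokenVectors.length;
    }
  }
  return pooled;
}

// Hypothetical helper: scale a vector to unit length (L2 normalisation).
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map(x => x / norm);
}

// Two toy "token" vectors standing in for the model's per-token output.
const tokens = [[1, 0, 0], [0, 1, 0]];
const embedding = l2Normalize(meanPool(tokens));
console.log(embedding);
```

Because every embedding ends up with length 1, comparing two notes comes down to the angle between their vectors.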

### The Flow

1. Your query gets converted into a vector using the transformer model
2. This vector is compared to the pre-indexed vectors of your notes
3. Notes with similar meanings are returned, regardless of exact keyword matches
4. AI assistants use these relevant notes as context for their responses
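
Steps 2 and 3 boil down to a nearest-neighbour search. Here's a minimal sketch, assuming a plain array of vectors instead of the FAISS index the server actually uses. The `squaredL2` and `nearest` helpers and the toy data are made up for illustration:

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredL2(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Return the k closest entries as { id, distance }, nearest first.
function nearest(queryVec, entries, k) {
  return entries
    .map(({ id, vector }) => ({ id, distance: squaredL2(queryVec, vector) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// Toy 2-dimensional unit vectors standing in for note embeddings.
const notes = [
  { id: 'note-a', vector: [1, 0] },
  { id: 'note-b', vector: [0, 1] },
  { id: 'note-c', vector: [0.6, 0.8] },
];

const hits = nearest([0.8, 0.6], notes, 2);
console.log(hits.map(h => h.id)); // note-c first, then note-a
```

For unit-length vectors, squared L2 distance equals `2 - 2 * cosine similarity`, so the smallest distance is also the closest match in meaning.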

## Project Structure

Nothing too complex here:

```
bear-mcp-server/
├── package.json
├── readme.md
└── src/
    ├── bear-mcp-server.js     # Main MCP server
    ├── create-index.js        # Script to index notes
    ├── utils.js               # Utility functions
    ├── lib/                   # Additional utilities and diagnostic scripts
    │   └── explore-database.js # Database exploration and diagnostic tool
    ├── note_vectors.index     # Generated vector index (after indexing)
    └── note_vectors.json      # Note ID mapping (after indexing)
```

## Available Tools for AI Assistants

AI assistants connecting to this server can use these tools:

1. **search_notes**: Find notes that match a query
   - Parameters: `query` (required), `limit` (optional, default: 10), `semantic` (optional, default: true)

2. **get_note**: Fetch a specific note by its ID
   - Parameters: `id` (required)

3. **get_tags**: List all tags used in your Bear Notes

4. **retrieve_for_rag**: Get notes semantically similar to a query, specifically formatted for RAG
   - Parameters: `query` (required), `limit` (optional, default: 5)
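
For the curious: an MCP client invokes these tools by sending a JSON-RPC `tools/call` request over the transport. Your client normally builds this for you, so treat the following as a sketch of the message shape only. The `buildToolCall` helper is hypothetical:

```javascript
// Hypothetical helper: build the JSON-RPC message for an MCP tool call.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, 'search_notes', {
  query: 'productivity systems',
  limit: 5,
  semantic: true,
});
console.log(JSON.stringify(request, null, 2));
```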

## Requirements

- Node.js version 16 or higher
- Bear Notes for macOS
- An MCP-compatible AI assistant client

## Limitations & Caveats

- Read-only access to Bear Notes (we're not modifying your precious notes)
- macOS only (sorry Windows and Linux folks)
- If you add loads of new notes, you'll want to rebuild the index with `npm run index`
- First startup is a bit like waiting for the kettle to boil while the embedding model loads

## Troubleshooting

If things go wonky:

1. Double-check your Bear database path
2. Make sure you've run the indexing process with `npm run index`
3. Check permissions on the Bear Notes database
4. Verify the server scripts are executable
5. Look for error messages in the logs

When in doubt, try turning it off and on again. Works more often than we'd like to admit.

## 🐳 Running with Docker (Optional)

Prefer containers? You can run everything inside Docker too.

### 1. Build the Docker image

```bash
docker build -t bear-mcp-server .
```

### 2. Index your notes

You'll still need to run the indexing step before anything useful happens:

```bash
docker run \
  -v /path/to/your/NoteDatabase.sqlite:/app/database.sqlite \
  -e BEAR_DATABASE_PATH=/app/database.sqlite \
  bear-mcp-server \
  npm run index
```

> 🛠 Replace `/path/to/your/NoteDatabase.sqlite` with the actual path to your Bear database.

### 3. Start the server

Once indexed, fire it up:

```bash
docker run \
  -v /path/to/your/NoteDatabase.sqlite:/app/database.sqlite \
  -e BEAR_DATABASE_PATH=/app/database.sqlite \
  -i \
  bear-mcp-server
```

Boom: the MCP server is now running in a container and talking to your notes.

## License

MIT (Feel free to tinker, share, and improve)
```

--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------

```dockerfile
# Use the official Node.js 16 LTS image as the base
FROM node:16-slim

# Set the working directory
WORKDIR /app

# Copy package files and install dependencies
COPY package*.json ./
RUN npm install --production

# Copy the rest of the application code
COPY . .

# Make the server script executable
RUN chmod +x src/bear-mcp-server.js

# Define the default command to run the server
CMD ["node", "src/bear-mcp-server.js"]

```

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------

```json
{
  "name": "bear-mcp-server",
  "version": "1.0.0",
  "description": "Model Context Protocol server for Bear Notes with RAG capabilities",
  "main": "src/bear-mcp-server.js",
  "type": "module",
  "scripts": {
    "start": "node src/bear-mcp-server.js",
    "index": "node src/create-index.js",
    "test": "node src/lib/explore-database.js"
  },
  "dependencies": {
    "@modelcontextprotocol/sdk": "latest",
    "sqlite3": "latest",
    "@xenova/transformers": "^2.15.0",
    "faiss-node": "^0.5.1"
  },
  "engines": {
    "node": ">=16.0.0"
  }
}
```

--------------------------------------------------------------------------------
/src/create-index.js:
--------------------------------------------------------------------------------

```javascript
#!/usr/bin/env node

import { getDbPath, createDb, initEmbedder, createEmbedding } from './utils.js';
// Fix for CommonJS module import in ESM
import faissNode from 'faiss-node';
const { IndexFlatL2 } = faissNode;

import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';

// Get current file path for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Path to save the vector index
const INDEX_PATH = path.join(__dirname, 'note_vectors');

// Main indexing function
async function createVectorIndex() {
  console.log('Starting to create vector index for Bear Notes...');
  
  // Initialize the embedding model
  const modelInitialized = await initEmbedder();
  if (!modelInitialized) {
    console.error('Failed to initialize embedding model');
    process.exit(1);
  }
  
  // Connect to the database
  const dbPath = getDbPath();
  const db = createDb(dbPath);
  
  try {
    // Get all non-trashed notes
    const notes = await db.allAsync(`
      SELECT 
        ZUNIQUEIDENTIFIER as id,
        ZTITLE as title,
        ZTEXT as content
      FROM ZSFNOTE
      WHERE ZTRASHED = 0
    `);
    
    console.log(`Found ${notes.length} notes to index`);
    
    // Create vectors for all notes
    const noteIds = [];
    const dimension = 384; // Dimension of the all-MiniLM-L6-v2 model
    
    // Create FAISS index
    const index = new IndexFlatL2(dimension);
    
    // Embed notes one at a time, logging progress every 50 notes
    for (let i = 0; i < notes.length; i++) {
      const note = notes[i];
      
      // Create a combined text for embedding
      const textToEmbed = `${note.title || ''}\n${note.content || ''}`.trim();
      
      if (textToEmbed) {
        try {
          // Create embedding for the note
          const embedding = await createEmbedding(textToEmbed);
          
          // Add to index
          index.add(embedding);
          
          // Store note ID
          noteIds.push(note.id);
          
          if ((i + 1) % 50 === 0 || i === notes.length - 1) {
            console.log(`Indexed ${i + 1} of ${notes.length} notes`);
          }
        } catch (error) {
          console.error(`Error embedding note ${note.id}:`, error.message);
        }
      }
    }
    
    console.log(`Successfully created embeddings for ${noteIds.length} notes`);
    
    // Create mapping from index positions to note IDs
    const noteIdMap = {};
    for (let i = 0; i < noteIds.length; i++) {
      noteIdMap[i] = noteIds[i];
    }
    
    // Save the index and mapping
    index.write(`${INDEX_PATH}.index`);
    await fs.writeFile(`${INDEX_PATH}.json`, JSON.stringify(noteIdMap));
    
    console.log(`Vector index saved to ${INDEX_PATH}`);
  } catch (error) {
    console.error('Error creating vector index:', error);
    // Rethrow so the caller reports the failure and exits with a non-zero code
    throw error;
  } finally {
    // Close the database connection
    db.close();
  }
}

// Run the indexing
createVectorIndex().then(() => {
  console.log('Indexing complete');
  process.exit(0);
}).catch(error => {
  console.error('Indexing failed:', error);
  process.exit(1);
});
```

--------------------------------------------------------------------------------
/src/bear-mcp-server.js:
--------------------------------------------------------------------------------

```javascript
#!/usr/bin/env node

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
  CallToolRequestSchema,
  ErrorCode,
  ListToolsRequestSchema,
  McpError,
} from '@modelcontextprotocol/sdk/types.js';
import {
  getDbPath,
  createDb,
  searchNotes,
  retrieveNote,
  getAllTags,
  loadVectorIndex,
  initEmbedder,
  retrieveForRAG
} from './utils.js';

// Initialize dependencies
async function initialize() {
  console.error('Initializing Bear Notes MCP server...');
  
  // Initialize database connection
  const dbPath = getDbPath();
  const db = createDb(dbPath);
  
  // Initialize embedding model
  const modelInitialized = await initEmbedder();
  if (!modelInitialized) {
    console.error('Warning: Embedding model initialization failed, semantic search will not be available');
  }
  
  // Load vector index
  const indexLoaded = await loadVectorIndex();
  if (!indexLoaded) {
    console.error('Warning: Vector index not found, semantic search will not be available');
    console.error('Run "npm run index" to create the vector index');
  }
  
  return { db, hasSemanticSearch: modelInitialized && indexLoaded };
}

// Main function
async function main() {
  // Initialize components
  const { db, hasSemanticSearch } = await initialize();
  
  // Create MCP server
  const server = new Server(
    {
      name: 'bear-notes',
      version: '1.0.0',
    },
    {
      capabilities: {
        tools: {},
      }
    }
  );

  // Register the list tools handler
  server.setRequestHandler(ListToolsRequestSchema, async () => {
    const tools = [
      {
        name: 'search_notes',
        description: 'Search for notes in Bear that match a query',
        inputSchema: {
          type: 'object',
          properties: {
            query: {
              type: 'string',
              description: 'Search query to find matching notes',
            },
            limit: {
              type: 'number',
              description: 'Maximum number of results to return (default: 10)',
            },
            semantic: {
              type: 'boolean',
              description: 'Use semantic search instead of keyword search (default: true)',
            }
          },
          required: ['query'],
        },
      },
      {
        name: 'get_note',
        description: 'Retrieve a specific note by its ID',
        inputSchema: {
          type: 'object',
          properties: {
            id: {
              type: 'string',
              description: 'Unique identifier of the note to retrieve',
            },
          },
          required: ['id'],
        },
      },
      {
        name: 'get_tags',
        description: 'Get all tags used in Bear Notes',
        inputSchema: {
          type: 'object',
          properties: {},
        },
      }
    ];
    
    // Add RAG tool if semantic search is available
    if (hasSemanticSearch) {
      tools.push({
        name: 'retrieve_for_rag',
        description: 'Retrieve notes that are semantically similar to a query for RAG',
        inputSchema: {
          type: 'object',
          properties: {
            query: {
              type: 'string',
              description: 'Query for which to find relevant notes',
            },
            limit: {
              type: 'number',
              description: 'Maximum number of notes to retrieve (default: 5)',
            },
          },
          required: ['query'],
        },
      });
    }
    
    return { tools };
  });

  // Register the call tool handler
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    if (request.params.name === 'search_notes') {
      const { query, limit = 10, semantic = true } = request.params.arguments;
      const useSemanticSearch = semantic && hasSemanticSearch;
      
      try {
        const notes = await searchNotes(db, query, limit, useSemanticSearch);
        return { 
          toolResult: { 
            notes,
            searchMethod: useSemanticSearch ? 'semantic' : 'keyword' 
          } 
        };
      } catch (error) {
        return { 
          toolResult: { 
            error: `Search failed: ${error.message}`,
            searchMethod: 'keyword',
            notes: [] 
          } 
        };
      }
    }
    
    if (request.params.name === 'get_note') {
      const { id } = request.params.arguments;
      try {
        const note = await retrieveNote(db, id);
        return { toolResult: { note } };
      } catch (error) {
        return { toolResult: { error: error.message } };
      }
    }
    
    if (request.params.name === 'get_tags') {
      try {
        const tags = await getAllTags(db);
        return { toolResult: { tags } };
      } catch (error) {
        return { toolResult: { error: error.message } };
      }
    }
    
    if (request.params.name === 'retrieve_for_rag' && hasSemanticSearch) {
      const { query, limit = 5 } = request.params.arguments;
      try {
        const context = await retrieveForRAG(db, query, limit);
        return { 
          toolResult: { 
            context,
            query 
          } 
        };
      } catch (error) {
        return { 
          toolResult: { 
            error: `RAG retrieval failed: ${error.message}`,
            context: [] 
          } 
        };
      }
    }
    
    throw new McpError(ErrorCode.MethodNotFound, 'Tool not found');
  });

  // Use stdio transport instead of HTTP
  const transport = new StdioServerTransport();

  // Start the server with stdio transport
  await server.connect(transport);

  // Handle process termination
  ['SIGINT', 'SIGTERM', 'SIGHUP'].forEach(signal => {
    process.on(signal, () => {
      console.error(`Received ${signal}, shutting down Bear Notes MCP server...`);
      db.close(() => {
        console.error('Database connection closed.');
        process.exit(0);
      });
    });
  });

  // Important: Log to stderr for debugging, not stdout
  console.error('Bear Notes MCP server ready');
}

// Run the main function
main().catch(error => {
  console.error('Server error:', error);
  process.exit(1);
});
```

--------------------------------------------------------------------------------
/src/lib/explore-database.js:
--------------------------------------------------------------------------------

```javascript
#!/usr/bin/env node

import sqlite3 from 'sqlite3';
import { promisify } from 'util';
import path from 'path';
import os from 'os';

// Default path to Bear's database
const defaultDBPath = path.join(
  os.homedir(),
  'Library/Group Containers/9K33E3U3T4.net.shinyfrog.bear/Application Data/database.sqlite'
);

// Get the database path from environment variable or use default
const dbPath = process.env.BEAR_DATABASE_PATH || defaultDBPath;

console.log(`Examining Bear database at: ${dbPath}`);

// Connect to the database
const db = new sqlite3.Database(dbPath, sqlite3.OPEN_READONLY, (err) => {
  if (err) {
    console.error('Error connecting to Bear database:', err.message);
    process.exit(1);
  }
  console.log('Connected to Bear Notes database successfully');
});

// Promisify database methods
db.allAsync = promisify(db.all).bind(db);
db.getAsync = promisify(db.get).bind(db);

async function examineDatabase() {
  try {
    // List all tables in the database
    const tables = await db.allAsync(`
      SELECT name FROM sqlite_master 
      WHERE type='table'
      ORDER BY name;
    `);
    
    console.log('\n--- All Tables in Bear Database ---');
    tables.forEach(table => console.log(table.name));
    
    // Find tables related to tags
    const tagTables = tables.filter(table => 
      table.name.toLowerCase().includes('tag') || 
      table.name.toLowerCase().includes('z_')
    );
    
    console.log('\n--- Potential Tag-Related Tables ---');
    tagTables.forEach(table => console.log(table.name));
    
    // Detect Z_* junction tables which often connect many-to-many relationships
    const junctionTables = tables.filter(table => 
      table.name.startsWith('Z_') && 
      !table.name.includes('FTS')
    );
    
    console.log('\n--- Junction Tables (Z_*) ---');
    junctionTables.forEach(table => console.log(table.name));
    
    // Get schema for each tag-related table
    console.log('\n--- Schema Details for Tag-Related Tables ---');
    for (const table of tagTables) {
      const schema = await db.allAsync(`PRAGMA table_info(${table.name})`);
      console.log(`\nTable: ${table.name}`);
      schema.forEach(col => {
        console.log(`  - ${col.name} (${col.type})`);
      });
    }
    
    // Check if Z_7TAGS exists and suggest alternatives
    const hasZ7Tags = tables.some(table => table.name === 'Z_7TAGS');
    if (!hasZ7Tags) {
      console.log('\n--- Z_7TAGS Table Not Found ---');
      
      // Look for possible alternative junction tables between notes and tags
      console.log('\nPossible alternatives for note-tag relationships:');
      for (const table of junctionTables) {
        try {
          // Get the first few rows to sample the data
          const sampleData = await db.allAsync(`SELECT * FROM ${table.name} LIMIT 5`);
          if (sampleData && sampleData.length > 0) {
            console.log(`\nTable ${table.name} contents (sample):`);
            console.log(JSON.stringify(sampleData, null, 2));
          }
        } catch (error) {
          console.error(`Error reading from ${table.name}:`, error.message);
        }
      }
      
      // Look specifically at the ZSFNOTETAG table structure and contents
      if (tables.some(table => table.name === 'ZSFNOTETAG')) {
        try {
          console.log('\nExamining ZSFNOTETAG table structure:');
          const noteTagSchema = await db.allAsync(`PRAGMA table_info(ZSFNOTETAG)`);
          noteTagSchema.forEach(col => {
            console.log(`  - ${col.name} (${col.type})`);
          });
          
          // Sample some data from the note tag table
          const noteTagSample = await db.allAsync(`SELECT * FROM ZSFNOTETAG LIMIT 5`);
          console.log('\nZSFNOTETAG sample data:');
          console.log(JSON.stringify(noteTagSample, null, 2));
        } catch (error) {
          console.error('Error examining ZSFNOTETAG:', error.message);
        }
      }
      
      // Look for ZSFNOTE structure to understand how notes are stored
      if (tables.some(table => table.name === 'ZSFNOTE')) {
        try {
          console.log('\nExamining ZSFNOTE table structure:');
          const noteSchema = await db.allAsync(`PRAGMA table_info(ZSFNOTE)`);
          noteSchema.forEach(col => {
            console.log(`  - ${col.name} (${col.type})`);
          });
        } catch (error) {
          console.error('Error examining ZSFNOTE:', error.message);
        }
      }
    }
    
    // Try actual query used in the code to see what error it produces
    try {
      console.log('\n--- Testing the Problematic Query ---');
      // Get a sample note ID first
      const sampleNote = await db.getAsync(`
        SELECT ZUNIQUEIDENTIFIER as id FROM ZSFNOTE LIMIT 1
      `);
      
      if (sampleNote) {
        try {
          const tags = await db.allAsync(`
            SELECT ZT.ZTITLE as tag_name
            FROM Z_5TAGS ZNT
            JOIN ZSFNOTETAG ZT ON ZT.Z_PK = ZNT.Z_13TAGS
            JOIN ZSFNOTE ZN ON ZN.Z_PK = ZNT.Z_5NOTES
            WHERE ZN.ZUNIQUEIDENTIFIER = ?
          `, [sampleNote.id]);
          
          console.log('Query succeeded with results:', tags);
        } catch (error) {
          console.error('The problematic query failed with error:', error.message);
          
          // Try to identify the correct join pattern
          console.log('\nAttempting to find the correct table relationship...');
          
          for (const jTable of junctionTables) {
            // Skip large tables for performance reasons
            const count = await db.getAsync(`SELECT COUNT(*) as count FROM ${jTable.name}`);
            if (count.count > 1000) {
              console.log(`Skipping large table ${jTable.name} with ${count.count} rows`);
              continue;
            }
            
            const schema = await db.allAsync(`PRAGMA table_info(${jTable.name})`);
            const columns = schema.map(col => col.name);
            
            // Look for columns that might connect to notes and tags
            const noteCols = columns.filter(col => col.includes('NOTE') || col.includes('NOTES'));
            const tagCols = columns.filter(col => col.includes('TAG') || col.includes('TAGS'));
            
            if (noteCols.length > 0 && tagCols.length > 0) {
              console.log(`\nPotential junction table: ${jTable.name}`);
              console.log(`  Note columns: ${noteCols.join(', ')}`);
              console.log(`  Tag columns: ${tagCols.join(', ')}`);
              
              // Try a sample query with this table
              try {
                const noteCol = noteCols[0];
                const tagCol = tagCols[0];
                
                const testQuery = `
                  SELECT ZT.ZTITLE as tag_name
                  FROM ${jTable.name} J
                  JOIN ZSFNOTETAG ZT ON ZT.Z_PK = J.${tagCol}
                  JOIN ZSFNOTE ZN ON ZN.Z_PK = J.${noteCol}
                  WHERE ZN.ZUNIQUEIDENTIFIER = ?
                  LIMIT 5
                `;
                
                console.log(`Trying query: ${testQuery}`);
                const testResult = await db.allAsync(testQuery, [sampleNote.id]);
                
                console.log(`Test query succeeded! Found ${testResult.length} tags:`, testResult);
                
                // Print the full working query for implementation
                console.log('\nWORKING QUERY:');
                console.log(`
SELECT ZT.ZTITLE as tag_name
FROM ${jTable.name} J
JOIN ZSFNOTETAG ZT ON ZT.Z_PK = J.${tagCol}
JOIN ZSFNOTE ZN ON ZN.Z_PK = J.${noteCol}
WHERE ZN.ZUNIQUEIDENTIFIER = ?
                `);
              } catch (testError) {
                console.log(`Test query failed: ${testError.message}`);
              }
            }
          }
        }
      } else {
        console.log('No notes found in the database');
      }
    } catch (queryError) {
      console.error('Error running test query:', queryError.message);
    }
    
  } catch (error) {
    console.error('Error examining database:', error.message);
  } finally {
    db.close(() => {
      console.log('\nDatabase connection closed.');
    });
  }
}

examineDatabase();

```

--------------------------------------------------------------------------------
/src/utils.js:
--------------------------------------------------------------------------------

```javascript
import sqlite3 from 'sqlite3';
import path from 'path';
import os from 'os';
import { promisify } from 'util';
import { fileURLToPath } from 'url';
import fs from 'fs/promises';
import { pipeline } from '@xenova/transformers';
// Fix for CommonJS module import in ESM
import faissNode from 'faiss-node';
const { IndexFlatL2 } = faissNode;

// Get current file path for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Setup SQLite with verbose mode
const sqlite = sqlite3.verbose();
const { Database } = sqlite;

// Default path to Bear's database
const defaultDBPath = path.join(
  os.homedir(),
  'Library/Group Containers/9K33E3U3T4.net.shinyfrog.bear/Application Data/database.sqlite'
);

// Path to the vector index - store in src directory
const INDEX_PATH = path.join(__dirname, 'note_vectors');

// Embedding model name
const EMBEDDING_MODEL = 'Xenova/all-MiniLM-L6-v2';

// Global variables for embedding model and vector index
let embedder = null;
let vectorIndex = null;
let noteIdMap = null;

// Get the database path from environment variable or use default
export const getDbPath = () => process.env.BEAR_DATABASE_PATH || defaultDBPath;

// Create and configure database connection
export const createDb = (dbPath) => {
  const db = new Database(dbPath, sqlite3.OPEN_READONLY, (err) => {
    if (err) {
      console.error('Error connecting to Bear database:', err.message);
      process.exit(1);
    }
    console.error('Connected to Bear Notes database at:', dbPath);
  });

  // Promisify database methods
  db.allAsync = promisify(db.all).bind(db);
  db.getAsync = promisify(db.get).bind(db);
  
  return db;
};

// Initialize the embedding model
export const initEmbedder = async () => {
  if (!embedder) {
    try {
      // Using Xenova's implementation of transformers
      console.error(`Initializing embedding model (${EMBEDDING_MODEL})...`);
      embedder = await pipeline('feature-extraction', EMBEDDING_MODEL);
      console.error('Embedding model initialized');
      return true;
    } catch (error) {
      console.error('Error initializing embedding model:', error);
      return false;
    }
  }
  return true;
};

// Load the vector index
export const loadVectorIndex = async () => {
  try {
    if (!vectorIndex) {
      // Check if index exists
      try {
        await fs.access(`${INDEX_PATH}.index`);

        // Load the index directly from the file on disk
        vectorIndex = IndexFlatL2.read(`${INDEX_PATH}.index`);

        const idMapData = await fs.readFile(`${INDEX_PATH}.json`, 'utf8');
        noteIdMap = JSON.parse(idMapData);

        // ntotal() is a method in faiss-node, so it must be called
        console.error(`Loaded vector index with ${vectorIndex.ntotal()} vectors`);
        return true;
      } catch (error) {
        console.error('Vector index not found. Please run indexing first:', error.message);
        return false;
      }
    }
    return true;
  } catch (error) {
    console.error('Error loading vector index:', error);
    return false;
  }
};

// Create text embeddings
export const createEmbedding = async (text) => {
  if (!embedder) {
    const initialized = await initEmbedder();
    if (!initialized) {
      throw new Error('Failed to initialize embedding model');
    }
  }
  
  try {
    // Generate embeddings using Xenova transformers
    const result = await embedder(text, { 
      pooling: 'mean',
      normalize: true 
    });
    
    // Return the embedding as a regular array
    return Array.from(result.data);
  } catch (error) {
    console.error('Error creating embedding:', error);
    throw error;
  }
};

// Search for notes using semantic search
export const semanticSearch = async (db, query, limit = 10) => {
  try {
    // Ensure vector index is loaded
    if (!vectorIndex || !noteIdMap) {
      const loaded = await loadVectorIndex();
      if (!loaded) {
        throw new Error('Vector index not available. Please run indexing first.');
      }
    }
    
    // Create embedding for the query
    const queryEmbedding = await createEmbedding(query);
    
    // Search in vector index
    const { labels, distances } = vectorIndex.search(queryEmbedding, limit);
    
    // Get note IDs from the results
    const noteIds = labels.map(idx => noteIdMap[idx]).filter(id => id);
    
    if (noteIds.length === 0) {
      return [];
    }
    
    // Prepare placeholders for SQL query
    const placeholders = noteIds.map(() => '?').join(',');
    
    // Get full note details from database
    const notes = await db.allAsync(`
      SELECT 
        ZUNIQUEIDENTIFIER as id,
        ZTITLE as title,
        ZTEXT as content,
        ZSUBTITLE as subtitle,
        ZCREATIONDATE as creation_date
      FROM ZSFNOTE
      WHERE ZUNIQUEIDENTIFIER IN (${placeholders}) AND ZTRASHED = 0
      ORDER BY ZMODIFICATIONDATE DESC
    `, noteIds);
    
    // Get tags for each note
    for (const note of notes) {
      try {
        const tags = await db.allAsync(`
          SELECT ZT.ZTITLE as tag_name
          FROM Z_5TAGS ZNT
          JOIN ZSFNOTETAG ZT ON ZT.Z_PK = ZNT.Z_13TAGS
          JOIN ZSFNOTE ZN ON ZN.Z_PK = ZNT.Z_5NOTES
          WHERE ZN.ZUNIQUEIDENTIFIER = ?
        `, [note.id]);
        note.tags = tags.map(t => t.tag_name);
      } catch (tagError) {
        console.error(`Error fetching tags for note ${note.id}:`, tagError.message);
        note.tags = [];
      }
      
      // Convert Apple's timestamp (seconds since 2001-01-01) to standard timestamp
      if (note.creation_date) {
        // Apple's reference date is 2001-01-01, so add seconds to get UNIX timestamp
        note.creation_date = new Date((note.creation_date + 978307200) * 1000).toISOString();
      }
      
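      // Store the semantic similarity score (lower distance is better).
      // Look the distance up by position in the raw search results: noteIds was
      // filtered, so its indices are not guaranteed to line up with distances.
      const idx = labels.findIndex(label => noteIdMap[label] === note.id);
      note.score = idx >= 0 ? 1 - distances[idx] : 0;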
    }
    
    // Sort by similarity score
    return notes.sort((a, b) => b.score - a.score);
  } catch (error) {
    console.error('Semantic search error:', error);
    throw error;
  }
};

// Search notes: try semantic search first, then fall back to keyword search
export const searchNotes = async (db, query, limit = 10, useSemanticSearch = true) => {
  try {
    // Try semantic search first if enabled
    if (useSemanticSearch) {
      try {
        const semanticResults = await semanticSearch(db, query, limit);
        if (semanticResults && semanticResults.length > 0) {
          return semanticResults;
        }
      } catch (error) {
        console.error('Semantic search failed, falling back to keyword search:', error.message);
      }
    }
    
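    // Fallback to keyword search. Escape LIKE wildcards (% and _) so the
    // query text is matched literally.
    const likePattern = `%${String(query).replace(/[\\%_]/g, '\\$&')}%`;
    const notes = await db.allAsync(`
      SELECT
        ZUNIQUEIDENTIFIER as id,
        ZTITLE as title,
        ZTEXT as content,
        ZSUBTITLE as subtitle,
        ZCREATIONDATE as creation_date
      FROM ZSFNOTE
      WHERE ZTRASHED = 0
        AND (ZTITLE LIKE ? ESCAPE '\\' OR ZTEXT LIKE ? ESCAPE '\\')
      ORDER BY ZMODIFICATIONDATE DESC
      LIMIT ?
    `, [likePattern, likePattern, limit]);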
    
    // Get tags for each note
    for (const note of notes) {
      try {
        const tags = await db.allAsync(`
          SELECT ZT.ZTITLE as tag_name
          FROM Z_5TAGS ZNT
          JOIN ZSFNOTETAG ZT ON ZT.Z_PK = ZNT.Z_13TAGS
          JOIN ZSFNOTE ZN ON ZN.Z_PK = ZNT.Z_5NOTES
          WHERE ZN.ZUNIQUEIDENTIFIER = ?
        `, [note.id]);
        
        note.tags = tags.map(t => t.tag_name);
      } catch (tagError) {
        console.error(`Error fetching tags for note ${note.id}:`, tagError.message);
        note.tags = [];
      }
      
      // Convert Apple's timestamp (seconds since 2001-01-01) to standard timestamp
      if (note.creation_date) {
        // Apple's reference date is 2001-01-01, so add seconds to get UNIX timestamp
        note.creation_date = new Date((note.creation_date + 978307200) * 1000).toISOString();
      }
    }
    
    return notes;
  } catch (error) {
    console.error('Search error:', error);
    throw error;
  }
};

// Retrieve a specific note by ID
export const retrieveNote = async (db, id) => {
  try {
    if (!id) {
      throw new Error('Note ID is required');
    }
    
    // Get the note by ID
    const note = await db.getAsync(`
      SELECT 
        ZUNIQUEIDENTIFIER as id,
        ZTITLE as title,
        ZTEXT as content,
        ZSUBTITLE as subtitle,
        ZCREATIONDATE as creation_date
      FROM ZSFNOTE
      WHERE ZUNIQUEIDENTIFIER = ? AND ZTRASHED = 0
    `, [id]);
    
    if (!note) {
      throw new Error('Note not found');
    }
    
    // Get tags for the note
    try {
      const tags = await db.allAsync(`
        SELECT ZT.ZTITLE as tag_name
        FROM Z_5TAGS ZNT
        JOIN ZSFNOTETAG ZT ON ZT.Z_PK = ZNT.Z_13TAGS
        JOIN ZSFNOTE ZN ON ZN.Z_PK = ZNT.Z_5NOTES
        WHERE ZN.ZUNIQUEIDENTIFIER = ?
      `, [note.id]);
      note.tags = tags.map(t => t.tag_name);
    } catch (tagError) {
      console.error(`Error fetching tags for note ${note.id}:`, tagError.message);
      note.tags = [];
    }
    
    // Convert Apple's timestamp (seconds since 2001-01-01) to standard timestamp
    if (note.creation_date) {
      // Apple's reference date is 2001-01-01, so add seconds to get UNIX timestamp
      note.creation_date = new Date((note.creation_date + 978307200) * 1000).toISOString();
    }
    
    return note;
  } catch (error) {
    console.error('Retrieve error:', error);
    throw error;
  }
};

// Get all tags
export const getAllTags = async (db) => {
  try {
    const tags = await db.allAsync('SELECT ZTITLE as name FROM ZSFNOTETAG');
    return tags.map(tag => tag.name);
  } catch (error) {
    console.error('Get tags error:', error);
    throw error;
  }
};

// RAG function to retrieve notes that are semantically similar to a query
export const retrieveForRAG = async (db, query, limit = 5) => {
  try {
    // Get semantically similar notes
    const notes = await semanticSearch(db, query, limit);
    
    // Format for RAG context
    return notes.map(note => ({
      id: note.id,
      title: note.title,
      content: note.content,
      tags: note.tags,
      score: note.score
    }));
  } catch (error) {
    console.error('RAG retrieval error:', error);
    // Fallback to keyword search (these results carry no similarity score)
    const notes = await searchNotes(db, query, limit, false);
    return notes.map(note => ({
      id: note.id,
      title: note.title,
      content: note.content,
      tags: note.tags
    }));
  }
};
```
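The timestamp conversion that appears in `searchNotes`, `semanticSearch`, and `retrieveNote` could be factored into one helper. The following is a minimal sketch, not code from the repository: `coreDataToISO` and `CORE_DATA_EPOCH_OFFSET` are hypothetical names, and the sketch assumes `ZCREATIONDATE` holds seconds since 2001-01-01T00:00:00Z, as the comments in the source state.

```javascript
// Seconds between the Unix epoch (1970-01-01T00:00:00Z) and the
// 2001-01-01T00:00:00Z reference date used by the stored timestamps.
const CORE_DATA_EPOCH_OFFSET = 978307200;

// Hypothetical helper: convert a reference-date timestamp (in seconds)
// to an ISO 8601 string. Returns null for missing or non-numeric input.
function coreDataToISO(seconds) {
  if (typeof seconds !== 'number' || !Number.isFinite(seconds)) {
    return null;
  }
  // Shift to the Unix epoch, then convert seconds to milliseconds for Date.
  return new Date((seconds + CORE_DATA_EPOCH_OFFSET) * 1000).toISOString();
}

console.log(coreDataToISO(0));     // 2001-01-01T00:00:00.000Z
console.log(coreDataToISO(86400)); // 2001-01-02T00:00:00.000Z
console.log(coreDataToISO(null));  // null
```

Unlike the inline `if (note.creation_date)` check, this sketch also converts a timestamp of exactly `0`, which the truthiness test skips.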