This is page 1 of 2. Use http://codebase.md/gongrzhe/office-word-mcp-server?lines=true&page={x} to view the full context.
# Directory Structure
```
├── __init__.py
├── .gitignore
├── Dockerfile
├── LICENSE
├── mcp-config.json
├── office_word_mcp_server
│ └── __init__.py
├── pyproject.toml
├── README.md
├── RENDER_DEPLOYMENT.md
├── requirements.txt
├── setup_mcp.py
├── smithery.yaml
├── test_formatting.py
├── tests
│ └── test_convert_to_pdf.py
├── uv.lock
├── word_document_server
│ ├── __init__.py
│ ├── core
│ │ ├── __init__.py
│ │ ├── comments.py
│ │ ├── footnotes.py
│ │ ├── protection.py
│ │ ├── styles.py
│ │ ├── tables.py
│ │ └── unprotect.py
│ ├── main.py
│ ├── tools
│ │ ├── __init__.py
│ │ ├── comment_tools.py
│ │ ├── content_tools.py
│ │ ├── document_tools.py
│ │ ├── extended_document_tools.py
│ │ ├── footnote_tools.py
│ │ ├── format_tools.py
│ │ └── protection_tools.py
│ └── utils
│ ├── __init__.py
│ ├── document_utils.py
│ ├── extended_document_utils.py
│ └── file_utils.py
└── word_mcp_server.py
```
# Files
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
```
# Project files
.idea
.DS_Store

# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv
.env.example
```
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
````markdown
# Office-Word-MCP-Server

A Model Context Protocol (MCP) server for creating, reading, and manipulating Microsoft Word documents. This server enables AI assistants to work with Word documents through a standardized interface, providing rich document editing capabilities.

<a href="https://glama.ai/mcp/servers/@GongRzhe/Office-Word-MCP-Server">
  <img width="380" height="200" src="https://glama.ai/mcp/servers/@GongRzhe/Office-Word-MCP-Server/badge" alt="Office Word Server MCP server" />
</a>

## Overview

Office-Word-MCP-Server implements the [Model Context Protocol](https://modelcontextprotocol.io/) to expose Word document operations as tools and resources. It serves as a bridge between AI assistants and Microsoft Word documents, allowing for document creation, content addition, formatting, and analysis.

The server features a modular architecture that separates concerns into core functionality, tools, and utilities, making it highly maintainable and extensible for future enhancements.

### Example

#### Prompt

#### Output

## Features

### Document Management

- Create new Word documents with metadata
- Extract text and analyze document structure
- View document properties and statistics
- List available documents in a directory
- Create copies of existing documents
- Merge multiple documents into a single document
- Convert Word documents to PDF format

### Content Creation

- Add headings with different levels and direct formatting (font, size, bold, italic, borders)
- Insert paragraphs with optional styling and direct formatting (font, size, bold, italic, color)
- Create tables with custom data
- Add images with proportional scaling
- Insert page breaks
- Insert bulleted and numbered lists with proper XML formatting
- Add footnotes and endnotes to documents
- Convert footnotes to endnotes
- Customize footnote and endnote styling
- Create professional table layouts for technical documentation
- Design callout boxes and formatted content for instructional materials
- Build structured data tables for business reports with consistent styling
- Insert content relative to existing text or paragraph indices

### Rich Text Formatting

- Format specific text sections (bold, italic, underline)
- Change text color and font properties
- Apply custom styles to text elements
- Search and replace text throughout documents
- Individual cell text formatting within tables
- Multiple formatting combinations for enhanced visual appeal
- Font customization with family and size control
- Direct formatting during content creation (paragraphs and headings)
- Reduce function calls by combining content creation with formatting
- Add section header borders for visual separation

### Table Formatting

- Format tables with borders and styles
- Create header rows with distinct formatting
- Apply cell shading and custom borders
- Structure tables for better readability
- Individual cell background shading with color support
- Alternating row colors for improved readability
- Enhanced header row highlighting with custom colors
- Cell text formatting with bold, italic, underline, color, font size, and font family
- Comprehensive color support with named colors and hex color codes
- Cell padding management with independent control of all sides
- Cell alignment (horizontal and vertical positioning)
- Cell merging (horizontal, vertical, and rectangular areas)
- Column width management with multiple units (points, percentage, auto-fit)
- Auto-fit capabilities for dynamic column sizing
- Professional callout table support with icon cells and styled content

### Advanced Document Manipulation

- Delete paragraphs
- Insert content relative to specific text or paragraph indices
- Insert bulleted and numbered lists with proper XML numbering structure
- Insert headers and paragraphs before or after target locations
- Create custom document styles
- Apply consistent formatting throughout documents
- Format specific ranges of text with detailed control
- Flexible padding units with support for points and percentage-based measurements
- Clear, readable table presentation with proper alignment and spacing

### Document Protection

- Add password protection to documents
- Implement restricted editing with editable sections
- Add digital signatures to documents
- Verify document authenticity and integrity

### Comment Extraction

- Extract all comments from a document
- Filter comments by author
- Get comments for specific paragraphs
- Access comment metadata (author, date, text)

## Installation

### Installing via Smithery

To install Office Word Document Server for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@GongRzhe/Office-Word-MCP-Server):

```bash
npx -y @smithery/cli install @GongRzhe/Office-Word-MCP-Server --client claude
```

### Prerequisites

- Python 3.11 or higher
- pip package manager

### Basic Installation

```bash
# Clone the repository
git clone https://github.com/GongRzhe/Office-Word-MCP-Server.git
cd Office-Word-MCP-Server

# Install dependencies
pip install -r requirements.txt
```

### Using the Setup Script

Alternatively, you can use the provided setup script, which handles:

- Checking prerequisites
- Setting up a virtual environment
- Installing dependencies
- Generating MCP configuration

```bash
python setup_mcp.py
```

## Usage with Claude for Desktop

### Configuration

#### Method 1: After Local Installation

1. After installation, add the server to your Claude for Desktop configuration file:

   ```json
   {
     "mcpServers": {
       "word-document-server": {
         "command": "python",
         "args": ["/path/to/word_mcp_server.py"]
       }
     }
   }
   ```

#### Method 2: Without Installation (Using uvx)

1. You can also configure Claude for Desktop to use the server without a local installation by running it through `uvx`, the tool runner that ships with uv:

   ```json
   {
     "mcpServers": {
       "word-document-server": {
         "command": "uvx",
         "args": ["--from", "office-word-mcp-server", "word_mcp_server"]
       }
     }
   }
   ```

2. Configuration file locations:

   - macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
   - Windows: `%APPDATA%\Claude\claude_desktop_config.json`

3. Restart Claude for Desktop to load the configuration.

### Example Operations

Once configured, you can ask Claude to perform operations like:

- "Create a new document called 'report.docx' with a title page"
- "Add a heading and three paragraphs to my document"
- "Add my name in Helvetica 36pt bold at the top of the document"
- "Add a section heading 'Summary' in Helvetica 14pt bold with a bottom border"
- "Add a paragraph in Times New Roman 14pt with italic blue text"
- "Insert a bulleted list after the paragraph containing 'Introduction'"
- "Insert a numbered list with items: 'First step', 'Second step', 'Third step'"
- "Add bullet points after the 'Summary' heading"
- "Insert a 4x4 table with sales data"
- "Format the word 'important' in paragraph 2 to be bold and red"
- "Search and replace all instances of 'old term' with 'new term'"
- "Create a custom style for section headings"
- "Apply formatting to the table in my document"
- "Extract all comments from my document"
- "Show me all comments by John Doe"
- "Get comments for paragraph 3"
- "Make the text in table cell (1,2) bold and blue with 14pt font"
- "Add 10 points of padding to all sides of the header cells"
- "Create a callout table with a blue checkmark icon and white text"
- "Set the first column width to 50 points and auto-fit the remaining columns"
- "Apply alternating row colors to make the table more readable"

## API Reference

### Document Creation and Properties

```python
create_document(filename, title=None, author=None)
get_document_info(filename)
get_document_text(filename)
get_document_outline(filename)
list_available_documents(directory=".")
copy_document(source_filename, destination_filename=None)
convert_to_pdf(filename, output_filename=None)
```

### Content Addition

```python
add_heading(filename, text, level=1, font_name=None, font_size=None,
            bold=None, italic=None, border_bottom=False)
add_paragraph(filename, text, style=None, font_name=None, font_size=None,
              bold=None, italic=None, color=None)
add_table(filename, rows, cols, data=None)
add_picture(filename, image_path, width=None)
add_page_break(filename)
```

### Advanced Content Manipulation

```python
# Insert content relative to existing text or paragraph index
insert_header_near_text(filename, target_text=None, header_title=None,
                        position='after', header_style='Heading 1',
                        target_paragraph_index=None)

insert_line_or_paragraph_near_text(filename, target_text=None, line_text=None,
                                   position='after', line_style=None,
                                   target_paragraph_index=None)

# Insert bulleted or numbered lists with proper XML formatting
insert_numbered_list_near_text(filename, target_text=None, list_items=None,
                               position='after', target_paragraph_index=None,
                               bullet_type='bullet')
# bullet_type options:
#   'bullet' - Creates bulleted list with bullets (•)
#   'number' - Creates numbered list (1, 2, 3, ...)
```

### Content Extraction

```python
get_document_text(filename)
get_paragraph_text_from_document(filename, paragraph_index)
find_text_in_document(filename, text_to_find, match_case=True, whole_word=False)
```

### Text Formatting

```python
format_text(filename, paragraph_index, start_pos, end_pos, bold=None,
            italic=None, underline=None, color=None, font_size=None, font_name=None)
search_and_replace(filename, find_text, replace_text)
delete_paragraph(filename, paragraph_index)
create_custom_style(filename, style_name, bold=None, italic=None,
                    font_size=None, font_name=None, color=None, base_style=None)
```

### Table Formatting

```python
format_table(filename, table_index, has_header_row=None,
             border_style=None, shading=None)
set_table_cell_shading(filename, table_index, row_index, col_index,
                       fill_color, pattern="clear")
apply_table_alternating_rows(filename, table_index,
                             color1="FFFFFF", color2="F2F2F2")
highlight_table_header(filename, table_index,
                       header_color="4472C4", text_color="FFFFFF")

# Cell merging tools
merge_table_cells(filename, table_index, start_row, start_col, end_row, end_col)
merge_table_cells_horizontal(filename, table_index, row_index, start_col, end_col)
merge_table_cells_vertical(filename, table_index, col_index, start_row, end_row)

# Cell alignment tools
set_table_cell_alignment(filename, table_index, row_index, col_index,
                         horizontal="left", vertical="top")
set_table_alignment_all(filename, table_index,
                        horizontal="left", vertical="top")

# Cell text formatting tools
format_table_cell_text(filename, table_index, row_index, col_index,
                       text_content=None, bold=None, italic=None, underline=None,
                       color=None, font_size=None, font_name=None)

# Cell padding tools
set_table_cell_padding(filename, table_index, row_index, col_index,
                       top=None, bottom=None, left=None, right=None, unit="points")

# Column width management
set_table_column_width(filename, table_index, col_index, width, width_type="points")
set_table_column_widths(filename, table_index, widths, width_type="points")
set_table_width(filename, table_index, width, width_type="points")
auto_fit_table_columns(filename, table_index)
```

### Comment Extraction

```python
get_all_comments(filename)
get_comments_by_author(filename, author)
get_comments_for_paragraph(filename, paragraph_index)
```

## Troubleshooting

### Common Issues

1. **Missing Styles**

   - Some documents may lack required styles for heading and table operations
   - The server will attempt to create missing styles or use direct formatting
   - For best results, use templates with standard Word styles

2. **Permission Issues**

   - Ensure the server has permission to read/write to the document paths
   - Use the `copy_document` function to create editable copies of locked documents
   - Check file ownership and permissions if operations fail

3. **Image Insertion Problems**

   - Use absolute paths for image files
   - Verify image format compatibility (JPEG, PNG recommended)
   - Check image file size and permissions

4. **Table Formatting Issues**

   - **Cell index errors**: Ensure row and column indices are within table bounds (0-based indexing)
   - **Color format problems**: Use hex colors without '#' prefix (e.g., "FF0000" for red) or standard color names
   - **Padding unit confusion**: Specify "points" or "percent" explicitly when setting cell padding
   - **Column width conflicts**: Auto-fit may override manual column width settings
   - **Text formatting persistence**: Apply cell text formatting after setting cell content for best results

### Debugging

Enable detailed logging by setting the environment variable:

```bash
# Linux/macOS
export MCP_DEBUG=1

# Windows (Command Prompt)
set MCP_DEBUG=1
```

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- [Model Context Protocol](https://modelcontextprotocol.io/) for the protocol specification
- [python-docx](https://python-docx.readthedocs.io/) for Word document manipulation
- [FastMCP](https://github.com/modelcontextprotocol/python-sdk) for the Python MCP implementation

---

_Note: This server interacts with document files on your system. Always verify that requested operations are appropriate before confirming them in Claude for Desktop or other MCP clients._
````
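The README's table-formatting troubleshooting says colors are passed as hex strings without a `#` prefix (or as color names) and that widths can be given in points or percent. The server's conversion code is not part of this page, so the following is only a minimal sketch, assuming standard OOXML table units: point widths stored as `dxa` (twentieths of a point) and percentage widths stored as `pct` (fiftieths of a percent). The helper names `normalize_color` and `to_ooxml_width` and the small color table are hypothetical.

```python
import re

# Hypothetical subset of named colors; the server's real name table is not shown here.
NAMED_COLORS = {
    "black": "000000",
    "white": "FFFFFF",
    "red": "FF0000",
    "blue": "0000FF",
}

_HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def normalize_color(color: str) -> str:
    """Return a 6-digit uppercase hex string with no leading '#'."""
    value = color.strip()
    named = NAMED_COLORS.get(value.lower())
    if named is not None:
        return named
    value = value.lstrip("#")
    if not _HEX6.fullmatch(value):
        raise ValueError(f"Unsupported color: {color!r}")
    return value.upper()


def to_ooxml_width(value: float, unit: str = "points") -> tuple[int, str]:
    """Convert a width to an OOXML (w:w, w:type) pair.

    points  -> dxa (twentieths of a point)
    percent -> pct (fiftieths of a percent, so 5000 means 100%)
    """
    if unit == "points":
        return round(value * 20), "dxa"
    if unit == "percent":
        return round(value * 50), "pct"
    raise ValueError(f"Unknown unit: {unit!r}")


print(normalize_color("#ff0000"))     # FF0000
print(normalize_color("Blue"))        # 0000FF
print(to_ooxml_width(10))             # (200, 'dxa')
print(to_ooxml_width(50, "percent"))  # (2500, 'pct')
```

Validating colors up front turns a malformed value such as `#GG0000` into an immediate `ValueError` rather than letting it reach the document XML.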
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
```
fastmcp
python-docx
msoffcrypto-tool
docx2pdf
python-dotenv
```
--------------------------------------------------------------------------------
/office_word_mcp_server/__init__.py:
--------------------------------------------------------------------------------
```python
from word_document_server.main import run_server

__all__ = ["run_server"]
```
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
```python
"""Office Word MCP Server package entry point."""
from word_document_server.main import run_server

__all__ = ["run_server"]
```
--------------------------------------------------------------------------------
/word_mcp_server.py:
--------------------------------------------------------------------------------
```python
#!/usr/bin/env python3
"""
Run script for the Word Document Server.

This script provides a simple way to start the Word Document Server.
"""

from word_document_server.main import run_server

if __name__ == "__main__":
    run_server()
```
--------------------------------------------------------------------------------
/mcp-config.json:
--------------------------------------------------------------------------------
```json
{
  "mcpServers": {
    "word-document-server": {
      "command": "/Users/gongzhe/GitRepos/Office-Word-MCP-Server/.venv/bin/python",
      "args": [
        "/Users/gongzhe/GitRepos/Office-Word-MCP-Server/word_mcp_server.py"
      ],
      "env": {
        "PYTHONPATH": "/Users/gongzhe/GitRepos/Office-Word-MCP-Server",
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}
```
--------------------------------------------------------------------------------
/word_document_server/utils/__init__.py:
--------------------------------------------------------------------------------
```python
"""
Utility functions for the Word Document Server.

This package contains utility modules for file operations and document handling.
"""

from word_document_server.utils.file_utils import check_file_writeable, create_document_copy, ensure_docx_extension
from word_document_server.utils.document_utils import get_document_properties, extract_document_text, get_document_structure, find_paragraph_by_text, find_and_replace_text
```
--------------------------------------------------------------------------------
/smithery.yaml:
--------------------------------------------------------------------------------
```yaml
# Smithery configuration file: https://smithery.ai/docs/build/project-config

startCommand:
  type: stdio
  configSchema:
    # JSON Schema defining the configuration options for the MCP.
    type: object
    description: No configuration options required
  commandFunction:
    # A JS function that produces the CLI command based on the given config to start the MCP on stdio.
    |-
    (config) => ({command:'word_mcp_server', args:[]})
  exampleConfig: {}
```
--------------------------------------------------------------------------------
/word_document_server/__init__.py:
--------------------------------------------------------------------------------
```python
"""
Word Document Server - MCP server for Microsoft Word document manipulation.

This package provides tools for creating, reading, and manipulating Microsoft Word
documents through the Model Context Protocol (MCP).

Features:
- Document creation and management
- Content addition (headings, paragraphs, tables, images)
- Text and table formatting
- Document protection (password, restricted editing, signatures)
- Footnote and endnote management
"""

__version__ = "1.0.0"
```
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
```dockerfile
# syntax=docker/dockerfile:1
# Generated by https://smithery.ai. See: https://smithery.ai/docs/build/project-config

# Use official Python runtime
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install build dependencies
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy project files
COPY . /app

# Install Python dependencies
RUN pip install --no-cache-dir .

# Default command
ENTRYPOINT ["word_mcp_server"]
```
--------------------------------------------------------------------------------
/word_document_server/core/__init__.py:
--------------------------------------------------------------------------------
```python
"""
Core functionality for the Word Document Server.

This package contains the core functionality modules used by the Word Document Server.
"""

from word_document_server.core.styles import ensure_heading_style, ensure_table_style, create_style
from word_document_server.core.protection import add_protection_info, verify_document_protection, is_section_editable, create_signature_info, verify_signature
from word_document_server.core.footnotes import add_footnote, add_endnote, convert_footnotes_to_endnotes, find_footnote_references, get_format_symbols, customize_footnote_formatting
from word_document_server.core.tables import set_cell_border, apply_table_style, copy_table
```
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "office-word-mcp-server"
version = "1.1.10"
description = "MCP server for manipulating Microsoft Word documents"
readme = "README.md"
license = {file = "LICENSE"}
authors = [
    {name = "GongRzhe", email = "[email protected]"}
]
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]
requires-python = ">=3.11"
dependencies = [
    "python-docx>=1.1.2",
    "fastmcp>=2.8.1",
    "msoffcrypto-tool>=5.4.2",
    "docx2pdf>=0.1.8",
]

[project.optional-dependencies]
# Test-only dependencies; install with `pip install ".[dev]"`
dev = [
    "pytest>=8.4.2",
]

[project.urls]
"Homepage" = "https://github.com/GongRzhe/Office-Word-MCP-Server"
"Bug Tracker" = "https://github.com/GongRzhe/Office-Word-MCP-Server/issues"

[tool.hatch.build.targets.wheel]
only-include = [
    "word_document_server",
    "office_word_mcp_server",
]
sources = ["."]

[project.scripts]
word_mcp_server = "word_document_server.main:run_server"
```
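The `[project.scripts]` table maps the `word_mcp_server` command (used by the Dockerfile `ENTRYPOINT`, by `smithery.yaml`, and by the `uvx --from office-word-mcp-server word_mcp_server` configuration in the README) to `run_server` in `word_document_server.main`. The standard library's `importlib.metadata.EntryPoint` shows how that `module:attr` string is split; this is only an illustration, and calling `ep.load()` would additionally import the module, which requires the package to be installed.

```python
from importlib.metadata import EntryPoint

# The [project.scripts] line from pyproject.toml, expressed as an entry point.
ep = EntryPoint(
    name="word_mcp_server",
    value="word_document_server.main:run_server",
    group="console_scripts",
)

print(ep.module)  # word_document_server.main
print(ep.attr)    # run_server
```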
--------------------------------------------------------------------------------
/word_document_server/tools/__init__.py:
--------------------------------------------------------------------------------
```python
"""
MCP tool implementations for the Word Document Server.

This package contains the MCP tool implementations that expose functionality
to clients through the Model Context Protocol.
"""

# Document tools
from word_document_server.tools.document_tools import (
    create_document, get_document_info, get_document_text,
    get_document_outline, list_available_documents,
    copy_document, merge_documents
)

# Content tools
from word_document_server.tools.content_tools import (
    add_heading, add_paragraph, add_table, add_picture,
    add_page_break, add_table_of_contents, delete_paragraph,
    search_and_replace
)

# Format tools
from word_document_server.tools.format_tools import (
    format_text, create_custom_style, format_table
)

# Protection tools
from word_document_server.tools.protection_tools import (
    protect_document, add_restricted_editing,
    add_digital_signature, verify_document
)

# Footnote tools
from word_document_server.tools.footnote_tools import (
    add_footnote_to_document, add_endnote_to_document,
    convert_footnotes_to_endnotes_in_document, customize_footnote_style
)

# Comment tools
from word_document_server.tools.comment_tools import (
    get_all_comments, get_comments_by_author, get_comments_for_paragraph
)
```
--------------------------------------------------------------------------------
/RENDER_DEPLOYMENT.md:
--------------------------------------------------------------------------------
```markdown
# Render Deployment Guide

This document explains how to deploy the Office Word MCP Server on Render.

## Required Environment Variables

Set the following environment variables in your Render service:

### `MCP_TRANSPORT`
- **Value**: `sse`
- **Description**: Sets the transport type to Server-Sent Events (SSE) for HTTP communication
- **Required**: Yes (for Render deployment)

### `MCP_HOST`
- **Value**: `0.0.0.0`
- **Description**: Binds the server to all network interfaces
- **Required**: No (defaults to 0.0.0.0)

### `FASTMCP_LOG_LEVEL`
- **Value**: `INFO`
- **Description**: Sets the logging level for FastMCP
- **Required**: No (defaults to INFO)

## How to Set Environment Variables

1. Go to your Render dashboard: https://dashboard.render.com
2. Navigate to your service: `Office-Word-MCP-Server`
3. Click "Environment" in the left sidebar
4. Add the environment variable:
   - Key: `MCP_TRANSPORT`
   - Value: `sse`
5. Click "Save Changes"

## Deployment

After setting the environment variables:
1. Render automatically redeploys your service
2. The server starts with SSE transport on the port provided by Render
3. Access your server at: `https://office-word-mcp-server-bzlp.onrender.com/sse`

## Health Check Endpoint

The FastMCP server with SSE transport automatically provides a health check endpoint at:
- `https://your-service.onrender.com/health`

## Troubleshooting

### Server exits with status 1
- **Cause**: Server is running in STDIO mode instead of SSE
- **Fix**: Ensure `MCP_TRANSPORT=sse` is set in environment variables

### Port binding errors
- **Cause**: Server not using Render's `PORT` environment variable
- **Fix**: Update to the latest version of `main.py`, which binds to the port Render provides

### Cannot connect to server
- **Cause**: Health checks failing
- **Fix**: Ensure SSE transport is enabled and the server is listening on 0.0.0.0
```
--------------------------------------------------------------------------------
/word_document_server/utils/file_utils.py:
--------------------------------------------------------------------------------
```python
"""
File utility functions for Word Document Server.
"""
import os
from typing import Tuple, Optional
import shutil


def check_file_writeable(filepath: str) -> Tuple[bool, str]:
    """
    Check if a file can be written to.

    Args:
        filepath: Path to the file

    Returns:
        Tuple of (is_writeable, error_message)
    """
    # If file doesn't exist, check if directory is writeable
    if not os.path.exists(filepath):
        directory = os.path.dirname(filepath)
        # If no directory is specified (empty string), use current directory
        if directory == '':
            directory = '.'
        if not os.path.exists(directory):
            return False, f"Directory {directory} does not exist"
        if not os.access(directory, os.W_OK):
            return False, f"Directory {directory} is not writeable"
        return True, ""

    # If file exists, check if it's writeable
    if not os.access(filepath, os.W_OK):
        return False, f"File {filepath} is not writeable (permission denied)"

    # Try to open the file for writing to see if it's locked
    try:
        with open(filepath, 'a'):
            pass
        return True, ""
    except IOError as e:
        return False, f"File {filepath} is not writeable: {str(e)}"
    except Exception as e:
        return False, f"Unknown error checking file permissions: {str(e)}"


def create_document_copy(source_path: str, dest_path: Optional[str] = None) -> Tuple[bool, str, Optional[str]]:
    """
    Create a copy of a document.

    Args:
        source_path: Path to the source document
        dest_path: Optional path for the new document. If not provided, "_copy" is
            inserted before the source file's extension (report.docx -> report_copy.docx)

    Returns:
        Tuple of (success, message, new_filepath)
    """
    if not os.path.exists(source_path):
        return False, f"Source document {source_path} does not exist", None

    if not dest_path:
        # Generate a new filename if not provided
        base, ext = os.path.splitext(source_path)
        dest_path = f"{base}_copy{ext}"

    try:
        # Simple file copy (copy2 also preserves file metadata such as timestamps)
        shutil.copy2(source_path, dest_path)
        return True, f"Document copied to {dest_path}", dest_path
    except Exception as e:
        return False, f"Failed to copy document: {str(e)}", None


def ensure_docx_extension(filename: str) -> str:
    """
    Ensure filename has .docx extension.

    The check is case-insensitive, so "Report.DOCX" is returned unchanged.

    Args:
        filename: The filename to check

    Returns:
        Filename with .docx extension
    """
    if not filename.lower().endswith('.docx'):
        return filename + '.docx'
    return filename
```
--------------------------------------------------------------------------------
/word_document_server/core/unprotect.py:
--------------------------------------------------------------------------------
```python
"""
Unprotect document functionality for the Word Document Server.

This module handles removing document protection.
"""
import os
import json
import hashlib
import tempfile
import shutil
from typing import Tuple, Optional

def remove_protection_info(filename: str, password: Optional[str] = None) -> Tuple[bool, str]:
    """
    Remove protection information from a document and decrypt it if necessary.

    Args:
        filename: Path to the Word document
        password: Password to verify before removing protection (required whenever
            the protection metadata stores a password hash)

    Returns:
        Tuple of (success, message)
    """
    base_path, _ = os.path.splitext(filename)
    metadata_path = f"{base_path}.protection"

    # Check if protection metadata exists
    if not os.path.exists(metadata_path):
        return False, "Document is not protected"

    try:
        # Load protection data
        with open(metadata_path, 'r') as f:
            protection_data = json.load(f)

        # A stored hash makes the password mandatory: omitting it must not bypass the check
        stored_hash = protection_data.get("password_hash")
        if stored_hash:
            if not password:
                return False, "Password required to remove protection"
            password_hash = hashlib.sha256(password.encode()).hexdigest()
            if password_hash != stored_hash:
                return False, "Incorrect password"

        # Handle true encryption if it was applied
        if protection_data.get("true_encryption"):
            if not password:
                return False, "Password required to decrypt document"
            try:
                import msoffcrypto

                # Create a temporary file for the decrypted output
                temp_fd, temp_path = tempfile.mkstemp(suffix='.docx')
                os.close(temp_fd)

                try:
                    # Open the encrypted document and decrypt it with the provided password
                    with open(filename, 'rb') as f:
                        office_file = msoffcrypto.OfficeFile(f)
                        office_file.load_key(password=password)
                        with open(temp_path, 'wb') as out_file:
                            office_file.decrypt(out_file)

                    # Replace the encrypted file only after its read handle is closed
                    shutil.move(temp_path, filename)
                except Exception as decrypt_error:
                    if os.path.exists(temp_path):
                        os.unlink(temp_path)
                    return False, f"Failed to decrypt document: {str(decrypt_error)}"
            except ImportError:
                return False, "Missing msoffcrypto package required for encryption/decryption"
            except Exception as e:
                return False, f"Error decrypting document: {str(e)}"

        # Remove the protection metadata file
        os.remove(metadata_path)
        return True, "Protection removed successfully"
    except Exception as e:
        return False, f"Error removing protection: {str(e)}"
```
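The password check compares an unsalted SHA-256 hex digest using `!=`. A minimal sketch of the same check with `hmac.compare_digest`, whose running time does not depend on how many leading characters match; `verify_password` is a hypothetical name, and it keeps the stored format (hex SHA-256) so existing `.protection` files would still verify. An unsalted fast hash remains weak against offline guessing; moving to a salted KDF such as `hashlib.pbkdf2_hmac` would require a new metadata format.

```python
import hashlib
import hmac
from typing import Optional


def verify_password(password: Optional[str], stored_hash: Optional[str]) -> bool:
    """Check a password against a hex SHA-256 digest as stored in the .protection file.

    Hypothetical helper: no stored hash means nothing to verify; a stored hash
    with no password supplied counts as a failure.
    """
    if not stored_hash:
        return True
    if not password:
        return False
    candidate = hashlib.sha256(password.encode()).hexdigest()
    # Constant-time comparison of two ASCII hex strings
    return hmac.compare_digest(candidate, stored_hash)


stored = hashlib.sha256(b"s3cret").hexdigest()
print(verify_password("s3cret", stored), verify_password("guess", stored))  # True False
```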
--------------------------------------------------------------------------------
/test_formatting.py:
--------------------------------------------------------------------------------
```python
"""
Test script for add_paragraph and add_heading formatting parameters.
"""
import asyncio
from docx import Document
from word_document_server.tools.content_tools import add_paragraph, add_heading
from word_document_server.tools.document_tools import create_document


async def test_formatting():
    """Test the new formatting parameters."""
    test_doc = 'test_formatting.docx'

    # Create test document
    print("Creating test document...")
    await create_document(test_doc, title="Formatting Test", author="Test Script")

    # Test 1: Name with large font
    print("Test 1: Adding name with large Helvetica 36pt bold...")
    result = await add_paragraph(
        test_doc,
        "JAMES MEHORTER",
        font_name="Helvetica",
        font_size=36,
        bold=True
    )
    print(f"  Result: {result}")

    # Test 2: Title line
    print("Test 2: Adding title with Helvetica 14pt...")
    result = await add_paragraph(
        test_doc,
        "Principal Software Engineer | Technical Team Lead",
        font_name="Helvetica",
        font_size=14
    )
    print(f"  Result: {result}")

    # Test 3: Section header with border
    print("Test 3: Adding section header with border...")
    result = await add_heading(
        test_doc,
        "PROFESSIONAL SUMMARY",
        level=2,
        font_name="Helvetica",
        font_size=14,
        bold=True,
        border_bottom=True
    )
    print(f"  Result: {result}")

    # Test 4: Body text in Times New Roman
    print("Test 4: Adding body text in Times New Roman 14pt...")
    result = await add_paragraph(
        test_doc,
        "This is body text that should be in Times New Roman at 14pt. "
        "It demonstrates the ability to apply different fonts to different paragraphs.",
        font_name="Times New Roman",
        font_size=14
    )
    print(f"  Result: {result}")

    # Test 5: Another section header
    print("Test 5: Adding another section header with border...")
    result = await add_heading(
        test_doc,
        "SKILLS",
        level=2,
        font_name="Helvetica",
        font_size=14,
        bold=True,
        border_bottom=True
    )
    print(f"  Result: {result}")

    # Test 6: Italic text with color
    print("Test 6: Adding italic text with color...")
    result = await add_paragraph(
        test_doc,
        "This text is italic and colored blue.",
        font_name="Arial",
        font_size=12,
        italic=True,
        color="0000FF"
    )
    print(f"  Result: {result}")

    print(f"\n✅ Test document created: {test_doc}")

    # Verify formatting by reading back the first run of each paragraph
    print("\nVerifying formatting...")
    verify_doc = Document(test_doc)
    for i, para in enumerate(verify_doc.paragraphs):
        if para.runs:
            run = para.runs[0]
            text_preview = para.text[:50] + "..." if len(para.text) > 50 else para.text
            print(f"\nParagraph {i}: {text_preview}")
            print(f"  Font: {run.font.name}")
            # font.size is a Length in EMU; .pt converts it to points (None means inherited from the style)
            size = run.font.size
            print(f"  Size: {size.pt if size is not None else 'inherited'}")
            print(f"  Bold: {run.font.bold}")
            print(f"  Italic: {run.font.italic}")

    print("\n✅ All tests completed successfully!")
    print(f"Open {test_doc} in Word to verify the formatting visually.")


if __name__ == "__main__":
    asyncio.run(test_formatting())
```
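In python-docx, `run.font.size` is a `Length`, an integer count of English Metric Units (EMU): 914,400 per inch, hence 12,700 per point. Printed raw, a 36 pt run shows up as `457200`; `Length.pt` does the conversion. A stdlib-only sketch of the same arithmetic, useful when asserting sizes in a test without depending on python-docx (the helper name is hypothetical):

```python
from typing import Optional

EMU_PER_INCH = 914400
EMU_PER_POINT = EMU_PER_INCH // 72  # 12700


def emu_to_pt(emu: Optional[int]) -> Optional[float]:
    """Convert an EMU length to points; None (size inherited from the style) stays None."""
    if emu is None:
        return None
    return emu / EMU_PER_POINT


print(emu_to_pt(457200))  # 36.0
print(emu_to_pt(177800))  # 14.0
```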
--------------------------------------------------------------------------------
/tests/test_convert_to_pdf.py:
--------------------------------------------------------------------------------
```python
import asyncio
from pathlib import Path

import pytest
from docx import Document

# Target for testing: convert_to_pdf (async function)
from word_document_server.tools.extended_document_tools import convert_to_pdf


def _make_sample_docx(path: Path) -> None:
    """Generates a simple .docx file in a temporary directory."""
    doc = Document()
    doc.add_heading("Conversion Test Document", level=1)
    doc.add_paragraph("This is a test paragraph for PDF conversion. Contains ASCII too.")
    doc.add_paragraph("Second paragraph: Contains special characters and spaces to cover path/content edge cases.")
    doc.save(path)


def test_convert_to_pdf_with_temp_docx(tmp_path: Path):
    """
    End-to-end test: Create a temporary .docx -> call convert_to_pdf -> validate the PDF output.

    Notes:
    - On Linux/macOS, it first tries LibreOffice (soffice/libreoffice),
      and falls back to docx2pdf on failure (requires Microsoft Word).
    - If these tools are missing or the command is unavailable, the test is skipped with a reason.
    """
    # 1) Generate a docx file with spaces in its name in the temp directory
    src_doc = tmp_path / "sample document with spaces.docx"
    _make_sample_docx(src_doc)

    # 2) Define the output PDF path (also in the temp directory)
    out_pdf = tmp_path / "converted output.pdf"

    # 3) Run the asynchronous function under test
    result_msg = asyncio.run(convert_to_pdf(str(src_doc), output_filename=str(out_pdf)))

    # 4) Success condition: the return message contains success keywords, or the target PDF exists
    success_keywords = ["successfully converted", "converted to PDF"]
    success = any(k.lower() in result_msg.lower() for k in success_keywords) or out_pdf.exists()

    if not success:
        # When LibreOffice or Microsoft Word is not installed, the tool returns a hint.
        # In this case, skip the test instead of failing.
        pytest.skip(f"PDF conversion tool unavailable or conversion failed: {result_msg}")

    # 5) Assert: The PDF file was generated and is not empty
    # Some environments (especially docx2pdf) might ignore the exact output filename
    # and just generate a PDF with the same name as the source in the output or source directory,
    # so we check multiple possible locations.
    candidates = [
        out_pdf,
        # Common: A PDF with the same name as the source file in the output directory
        out_pdf.parent / f"{src_doc.stem}.pdf",
        # Fallback: A PDF in the same directory as the source file
        src_doc.with_suffix(".pdf"),
    ]

    found = None
    for p in candidates:
        if p.exists():
            found = p
            break
    if not found:
        # None of the expected paths exist: take the most recently modified PDF in the temp directory
        pdfs = sorted(tmp_path.glob("*.pdf"), key=lambda p: p.stat().st_mtime, reverse=True)
        if pdfs:
            found = pdfs[0]

    if not found:
        # If the tool returns success but the output can't be found,
        # treat it as an environment/tooling difference and skip instead of failing.
        pytest.skip(f"Could not find the generated PDF. Function output: {result_msg}")

    assert found.exists(), f"Generated PDF not found: {found}, function output: {result_msg}"
    assert found.stat().st_size > 0, f"The generated PDF file is empty: {found}"


if __name__ == "__main__":
    # Allow running this file directly for quick verification:
    #   python tests/test_convert_to_pdf.py
    import sys
    sys.exit(pytest.main([__file__, "-q"]))
```
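The final assertion only checks that the PDF is non-empty, which an HTML error page or other stray output would also satisfy. A conforming PDF file starts with the header `%PDF-` followed by a version number, so reading the first bytes gives a cheap, stronger check. A minimal stdlib sketch; `looks_like_pdf` is a hypothetical helper and validates the header only, not the whole file:

```python
import tempfile
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: Path) -> bool:
    """Return True if path is a non-empty file whose first bytes are the PDF header."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    with path.open("rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "ok.pdf"
        good.write_bytes(b"%PDF-1.7\n% illustrative body\n")
        bad = Path(tmp) / "bad.pdf"
        bad.write_bytes(b"<html>conversion error</html>")
        print(looks_like_pdf(good), looks_like_pdf(bad))  # True False
```

In the test, `assert looks_like_pdf(found)` could stand in for the size assertion.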
--------------------------------------------------------------------------------
/word_document_server/core/styles.py:
--------------------------------------------------------------------------------
```python
"""
Style-related functions for Word Document Server.
"""
from docx.shared import Pt, RGBColor
from docx.enum.style import WD_STYLE_TYPE


def ensure_heading_style(doc):
    """
    Ensure Heading styles exist in the document.

    Args:
        doc: Document object
    """
    for i in range(1, 10):  # Create Heading 1 through Heading 9
        style_name = f'Heading {i}'
        try:
            # Try to access the style to see if it exists
            style = doc.styles[style_name]
        except KeyError:
            # Create the style if it doesn't exist
            try:
                style = doc.styles.add_style(style_name, WD_STYLE_TYPE.PARAGRAPH)
                # Heading 1 is 16pt, Heading 2 is 14pt, the rest are 12pt; all are bold
                if i == 1:
                    style.font.size = Pt(16)
                elif i == 2:
                    style.font.size = Pt(14)
                else:
                    style.font.size = Pt(12)
                style.font.bold = True
            except Exception:
                # If style creation fails, we'll just use default formatting
                pass


def ensure_table_style(doc):
    """
    Check whether the Table Grid style exists in the document.

    Nothing is created here; a missing style is handled where tables are added.

    Args:
        doc: Document object
    """
    try:
        # Try to access the style to see if it exists
        style = doc.styles['Table Grid']
    except KeyError:
        # If style doesn't exist, we'll handle it at usage time
        pass


def create_style(doc, style_name, style_type, base_style=None, font_properties=None, paragraph_properties=None):
    """
    Create a new style in the document, or return the existing style of that name.

    Args:
        doc: Document object
        style_name: Name for the new style
        style_type: Type of style (WD_STYLE_TYPE)
        base_style: Optional name of a base style to inherit from
        font_properties: Dictionary of font properties (bold, italic, size, name, color)
        paragraph_properties: Dictionary of paragraph properties (alignment, spacing)

    Returns:
        The created (or already existing) style
    """
    # Return the existing style if the name is already taken. styles.get_by_id() cannot
    # serve as this check: it looks up style IDs rather than names, and it returns the
    # default style instead of raising when nothing matches.
    if style_name in doc.styles:
        return doc.styles[style_name]

    # Create new style
    new_style = doc.styles.add_style(style_name, style_type)

    # Set base style if specified
    if base_style:
        new_style.base_style = doc.styles[base_style]

    # Set font properties
    if font_properties:
        font = new_style.font
        if 'bold' in font_properties:
            font.bold = font_properties['bold']
        if 'italic' in font_properties:
            font.italic = font_properties['italic']
        if 'size' in font_properties:
            font.size = Pt(font_properties['size'])
        if 'name' in font_properties:
            font.name = font_properties['name']
        if 'color' in font_properties:
            # Define common RGB colors
            color_map = {
                'red': RGBColor(255, 0, 0),
                'blue': RGBColor(0, 0, 255),
                'green': RGBColor(0, 128, 0),
                'yellow': RGBColor(255, 255, 0),
                'black': RGBColor(0, 0, 0),
                'gray': RGBColor(128, 128, 128),
                'white': RGBColor(255, 255, 255),
                'purple': RGBColor(128, 0, 128),
                'orange': RGBColor(255, 165, 0)
            }

            color_value = font_properties['color']
            try:
                if isinstance(color_value, RGBColor):
                    # Already an RGBColor
                    font.color.rgb = color_value
                elif isinstance(color_value, str) and color_value.lower() in color_map:
                    # Named color
                    font.color.rgb = color_map[color_value.lower()]
                elif isinstance(color_value, str):
                    # Hex string such as "FF0000" (no leading '#')
                    font.color.rgb = RGBColor.from_string(color_value)
                else:
                    # Assume an (r, g, b) sequence of ints
                    font.color.rgb = RGBColor(*color_value)
            except Exception:
                # Fall back to black if the value cannot be interpreted
                font.color.rgb = RGBColor(0, 0, 0)

    # Set paragraph properties
    if paragraph_properties:
        if 'alignment' in paragraph_properties:
            new_style.paragraph_format.alignment = paragraph_properties['alignment']
        if 'spacing' in paragraph_properties:
            new_style.paragraph_format.line_spacing = paragraph_properties['spacing']

    return new_style
```
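`RGBColor.from_string` expects exactly six hex digits such as `"FF0000"`; a CSS-style `"#FF0000"` raises, and `create_style` then silently falls back to black. A stdlib-only sketch of a more forgiving parser that accepts the same color names plus an optional `#` and 3-digit shorthand, returning an `(r, g, b)` tuple that could be passed to `RGBColor(*rgb)`. The function name and the shorthand support are assumptions, not existing behavior:

```python
import re
from typing import Tuple

# Same names and values as the color_map in create_style
NAMED_COLORS = {
    'red': (255, 0, 0),
    'blue': (0, 0, 255),
    'green': (0, 128, 0),
    'yellow': (255, 255, 0),
    'black': (0, 0, 0),
    'gray': (128, 128, 128),
    'white': (255, 255, 255),
    'purple': (128, 0, 128),
    'orange': (255, 165, 0),
}

_HEX = re.compile(r'#?([0-9a-f]{3}|[0-9a-f]{6})')


def parse_color(value: str, default: Tuple[int, int, int] = (0, 0, 0)) -> Tuple[int, int, int]:
    """Parse a color name, 'RRGGBB', '#RRGGBB' or '#RGB' into an (r, g, b) tuple."""
    key = value.strip().lower()
    if key in NAMED_COLORS:
        return NAMED_COLORS[key]
    match = _HEX.fullmatch(key)
    if not match:
        return default  # same fallback as create_style: black
    digits = match.group(1)
    if len(digits) == 3:
        digits = ''.join(ch * 2 for ch in digits)  # 'f0a' -> 'ff00aa'
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


print(parse_color("#0000FF"))  # (0, 0, 255)
```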
--------------------------------------------------------------------------------
/word_document_server/tools/comment_tools.py:
--------------------------------------------------------------------------------
```python
"""
Comment extraction tools for Word Document Server.

These tools provide high-level interfaces for extracting and analyzing
comments from Word documents through the MCP protocol.
"""
import os
import json
from typing import Dict, List, Optional, Any
from docx import Document

from word_document_server.utils.file_utils import ensure_docx_extension
from word_document_server.core.comments import (
    extract_all_comments,
    filter_comments_by_author,
    # Aliased because the tool below is also named get_comments_for_paragraph
    get_comments_for_paragraph as core_get_comments_for_paragraph
)


async def get_all_comments(filename: str) -> str:
    """
    Extract all comments from a Word document.

    Args:
        filename: Path to the Word document

    Returns:
        JSON string containing all comments with metadata
    """
    filename = ensure_docx_extension(filename)

    if not os.path.exists(filename):
        return json.dumps({
            'success': False,
            'error': f'Document {filename} does not exist'
        }, indent=2)

    try:
        # Load the document
        doc = Document(filename)

        # Extract all comments
        comments = extract_all_comments(doc)

        # Return results
        return json.dumps({
            'success': True,
            'comments': comments,
            'total_comments': len(comments)
        }, indent=2)

    except Exception as e:
        return json.dumps({
            'success': False,
            'error': f'Failed to extract comments: {str(e)}'
        }, indent=2)


async def get_comments_by_author(filename: str, author: str) -> str:
    """
    Extract comments from a specific author in a Word document.

    Args:
        filename: Path to the Word document
        author: Name of the comment author to filter by

    Returns:
        JSON string containing filtered comments
    """
    filename = ensure_docx_extension(filename)

    if not os.path.exists(filename):
        return json.dumps({
            'success': False,
            'error': f'Document {filename} does not exist'
        }, indent=2)

    if not author or not author.strip():
        return json.dumps({
            'success': False,
            'error': 'Author name cannot be empty'
        }, indent=2)

    try:
        # Load the document
        doc = Document(filename)

        # Extract all comments
        all_comments = extract_all_comments(doc)

        # Filter by author
        author_comments = filter_comments_by_author(all_comments, author)

        # Return results
        return json.dumps({
            'success': True,
            'author': author,
            'comments': author_comments,
            'total_comments': len(author_comments)
        }, indent=2)

    except Exception as e:
        return json.dumps({
            'success': False,
            'error': f'Failed to extract comments: {str(e)}'
        }, indent=2)


async def get_comments_for_paragraph(filename: str, paragraph_index: int) -> str:
    """
    Extract comments for a specific paragraph in a Word document.

    Args:
        filename: Path to the Word document
        paragraph_index: Index of the paragraph (0-based)

    Returns:
        JSON string containing comments for the specified paragraph
    """
    filename = ensure_docx_extension(filename)

    if not os.path.exists(filename):
        return json.dumps({
            'success': False,
            'error': f'Document {filename} does not exist'
        }, indent=2)

    if paragraph_index < 0:
        return json.dumps({
            'success': False,
            'error': 'Paragraph index must be non-negative'
        }, indent=2)

    try:
        # Load the document
        doc = Document(filename)

        # Check if paragraph index is valid
        if paragraph_index >= len(doc.paragraphs):
            return json.dumps({
                'success': False,
                'error': f'Paragraph index {paragraph_index} is out of range. Document has {len(doc.paragraphs)} paragraphs.'
            }, indent=2)

        # Extract all comments
        all_comments = extract_all_comments(doc)

        # Filter for the specific paragraph
        para_comments = core_get_comments_for_paragraph(all_comments, paragraph_index)

        # Get the paragraph text for context
        paragraph_text = doc.paragraphs[paragraph_index].text

        # Return results
        return json.dumps({
            'success': True,
            'paragraph_index': paragraph_index,
            'paragraph_text': paragraph_text,
            'comments': para_comments,
            'total_comments': len(para_comments)
        }, indent=2)

    except Exception as e:
        return json.dumps({
            'success': False,
            'error': f'Failed to extract comments: {str(e)}'
        }, indent=2)
```
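All three comment tools build the same response envelope: `success` plus either payload fields or an `error` string, serialized with `indent=2`. A hedged sketch of factoring that envelope and the paragraph-index validation into small helpers; the names `ok`, `fail`, and `check_paragraph_index` are hypothetical, while the keys and messages match the tools in this file:

```python
import json
from typing import Any, Optional


def ok(**fields: Any) -> str:
    """Success envelope: {"success": true, ...fields}."""
    return json.dumps({'success': True, **fields}, indent=2)


def fail(message: str) -> str:
    """Failure envelope: {"success": false, "error": message}."""
    return json.dumps({'success': False, 'error': message}, indent=2)


def check_paragraph_index(paragraph_index: int, paragraph_count: int) -> Optional[str]:
    """Return a failure envelope for an unusable index, or None if the index is valid."""
    if paragraph_index < 0:
        return fail('Paragraph index must be non-negative')
    if paragraph_index >= paragraph_count:
        return fail(f'Paragraph index {paragraph_index} is out of range. '
                    f'Document has {paragraph_count} paragraphs.')
    return None


print(ok(comments=[], total_comments=0))
```

A tool body would then reduce to `error = check_paragraph_index(i, len(doc.paragraphs))`, returning `error` when it is not `None`.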
--------------------------------------------------------------------------------
/word_document_server/utils/extended_document_utils.py:
--------------------------------------------------------------------------------
```python
"""
Extended document utilities for Word Document Server.
"""
import os
import re
from typing import Dict, List, Any
from docx import Document


def get_paragraph_text(doc_path: str, paragraph_index: int) -> Dict[str, Any]:
    """
    Get text from a specific paragraph in a Word document.

    Args:
        doc_path: Path to the Word document
        paragraph_index: Index of the paragraph to extract (0-based)

    Returns:
        Dictionary with paragraph text and metadata
    """
    if not os.path.exists(doc_path):
        return {"error": f"Document {doc_path} does not exist"}

    try:
        doc = Document(doc_path)

        # Check if paragraph index is valid
        if paragraph_index < 0 or paragraph_index >= len(doc.paragraphs):
            return {"error": f"Invalid paragraph index: {paragraph_index}. Document has {len(doc.paragraphs)} paragraphs."}

        paragraph = doc.paragraphs[paragraph_index]

        return {
            "index": paragraph_index,
            "text": paragraph.text,
            "style": paragraph.style.name if paragraph.style else "Normal",
            "is_heading": paragraph.style.name.startswith("Heading") if paragraph.style else False
        }
    except Exception as e:
        return {"error": f"Failed to get paragraph text: {str(e)}"}


def _find_positions(text: str, query: str, match_case: bool, whole_word: bool) -> List[int]:
    """Return the character offsets of all non-overlapping matches of query in text."""
    pattern = re.escape(query)
    if whole_word:
        # The match must not touch a word character on either side. Unlike splitting on
        # whitespace, this still finds "word" in "word," and works for queries like "C++".
        pattern = rf"(?<!\w){pattern}(?!\w)"
    flags = 0 if match_case else re.IGNORECASE
    return [m.start() for m in re.finditer(pattern, text, flags)]


def _context(text: str) -> str:
    """Return the first 100 characters of text, with an ellipsis if it was truncated."""
    return text[:100] + ("..." if len(text) > 100 else "")


def find_text(doc_path: str, text_to_find: str, match_case: bool = True, whole_word: bool = False) -> Dict[str, Any]:
    """
    Find all occurrences of specific text in a Word document.

    Args:
        doc_path: Path to the Word document
        text_to_find: Text to search for
        match_case: Whether to perform case-sensitive search
        whole_word: Whether to match whole words only

    Returns:
        Dictionary with search results. Each occurrence records the character offset
        ("position") of the match within its paragraph, plus either "paragraph_index"
        (body text) or "location" (table cell).
    """
    if not os.path.exists(doc_path):
        return {"error": f"Document {doc_path} does not exist"}

    if not text_to_find:
        return {"error": "Search text cannot be empty"}

    try:
        doc = Document(doc_path)
        results = {
            "query": text_to_find,
            "match_case": match_case,
            "whole_word": whole_word,
            "occurrences": [],
            "total_count": 0
        }

        # Search in body paragraphs
        for i, para in enumerate(doc.paragraphs):
            for pos in _find_positions(para.text, text_to_find, match_case, whole_word):
                results["occurrences"].append({
                    "paragraph_index": i,
                    "position": pos,
                    "context": _context(para.text)
                })

        # Search in table cells
        for table_idx, table in enumerate(doc.tables):
            for row_idx, row in enumerate(table.rows):
                for col_idx, cell in enumerate(row.cells):
                    for para in cell.paragraphs:
                        for pos in _find_positions(para.text, text_to_find, match_case, whole_word):
                            results["occurrences"].append({
                                "location": f"Table {table_idx}, Row {row_idx}, Column {col_idx}",
                                "position": pos,
                                "context": _context(para.text)
                            })

        results["total_count"] = len(results["occurrences"])
        return results
    except Exception as e:
        return {"error": f"Failed to search for text: {str(e)}"}
```
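Whole-word matching is easy to get subtly wrong. Splitting on whitespace misses a word with punctuation attached (`cat,`), and a regex built with `\b` only behaves when the query starts and ends with word characters: for `C++` it rejects the standalone token and accepts the one inside `C++11`. Negative lookarounds `(?<!\w)` and `(?!\w)` state "not adjacent to a word character" directly. A small stdlib comparison (function names are illustrative):

```python
import re
from typing import List


def whole_word_b(text: str, query: str) -> List[int]:
    """Whole-word offsets using \\b word boundaries."""
    return [m.start() for m in re.finditer(rf"\b{re.escape(query)}\b", text)]


def whole_word_lookaround(text: str, query: str) -> List[int]:
    """Whole-word offsets: the match must not touch a word character on either side."""
    return [m.start() for m in re.finditer(rf"(?<!\w){re.escape(query)}(?!\w)", text)]


text = "Prefer C++ to C, not C++11."
print(whole_word_b(text, "C++"))                     # [21]  (the C++ inside "C++11")
print(whole_word_lookaround(text, "C++"))            # [7]
print(whole_word_lookaround("cat, concat.", "cat"))  # [0]
```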
--------------------------------------------------------------------------------
/word_document_server/core/comments.py:
--------------------------------------------------------------------------------
```python
"""
Core comment extraction functionality for Word documents.

This module provides low-level functions to extract and process comments
from Word documents using the python-docx library.
"""
import datetime
from typing import Dict, List, Optional, Any
from docx import Document
from docx.document import Document as DocumentType
from docx.text.paragraph import Paragraph

# Clark-notation namespace prefix for WordprocessingML attributes such as w:author
_W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def extract_all_comments(doc: DocumentType) -> List[Dict[str, Any]]:
    """
    Extract all comments from a Word document.

    Args:
        doc: The Document object to extract comments from

    Returns:
        List of dictionaries containing comment information
    """
    comments = []

    # Access the document's comment part if it exists
    try:
        # Get the document part
        document_part = doc.part

        # Find the comments part through the document's relationships
        comments_part = None
        for rel in document_part.rels.values():
            if rel.reltype.endswith('/comments'):
                comments_part = rel.target_part
                break

        if comments_part:
            # Extract comments from the comments part using proper xpath syntax
            comment_elements = comments_part.element.xpath('.//w:comment')

            for idx, comment_element in enumerate(comment_elements):
                comment_data = extract_comment_data(comment_element, idx)
                if comment_data:
                    comments.append(comment_data)

        # If no comments found, try alternative approach
        if not comments:
            # Fallback: scan paragraphs for comment references
            comments = extract_comments_from_paragraphs(doc)

    except Exception:
        # If direct access fails, try alternative approach
        comments = extract_comments_from_paragraphs(doc)

    return comments


def extract_comments_from_paragraphs(doc: DocumentType) -> List[Dict[str, Any]]:
    """
    Extract comments by scanning paragraphs for comment references.

    Args:
        doc: The Document object

    Returns:
        List of comment dictionaries
    """
    comments = []
    comment_id = 1

    # Check all paragraphs in the document
    for para_idx, paragraph in enumerate(doc.paragraphs):
        para_comments = find_paragraph_comments(paragraph, para_idx, comment_id)
        comments.extend(para_comments)
        comment_id += len(para_comments)

    # Check paragraphs in tables
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for para_idx, paragraph in enumerate(cell.paragraphs):
                    para_comments = find_paragraph_comments(paragraph, para_idx, comment_id, in_table=True)
                    comments.extend(para_comments)
                    comment_id += len(para_comments)

    return comments


def extract_comment_data(comment_element, index: int) -> Optional[Dict[str, Any]]:
    """
    Extract data from a comment XML element.

    Args:
        comment_element: The XML comment element
        index: Index for generating a unique ID

    Returns:
        Dictionary with comment data or None
    """
    try:
        # Extract comment attributes
        comment_id = comment_element.get(f'{_W}id', str(index))
        author = comment_element.get(f'{_W}author', 'Unknown')
        initials = comment_element.get(f'{_W}initials', '')
        date_str = comment_element.get(f'{_W}date', '')

        # Parse date if available; 'Z' is rewritten because fromisoformat() rejects it before Python 3.11
        date = None
        if date_str:
            try:
                date = datetime.datetime.fromisoformat(date_str.replace('Z', '+00:00'))
                date = date.isoformat()
            except ValueError:
                date = date_str

        # Extract comment text
        text_elements = comment_element.xpath('.//w:t')
        text = ''.join(elem.text or '' for elem in text_elements)

        return {
            'id': f'comment_{index + 1}',
            'comment_id': comment_id,
            'author': author,
            'initials': initials,
            'date': date,
            'text': text.strip(),
            'paragraph_index': None,  # Will be filled if we can determine it
            'in_table': False,
            'reference_text': ''
        }

    except Exception:
        return None


def find_paragraph_comments(paragraph: Paragraph, para_index: int,
                            start_id: int, in_table: bool = False) -> List[Dict[str, Any]]:
    """
    Find comments associated with a specific paragraph.

    Args:
        paragraph: The paragraph to check
        para_index: The index of the paragraph
        start_id: Starting ID for comments
        in_table: Whether the paragraph is in a table

    Returns:
        List of comment dictionaries
    """
    comments = []

    try:
        # Access the paragraph's XML element
        para_xml = paragraph._element

        # Look for comment range markers (simplified approach).
        # This is a basic implementation; linking markers to comment content would need fuller XML parsing.
        # Serialize the element: str(para_xml) only yields the object repr, never the XML markup.
        xml_text = para_xml.xml

        # Simple check for comment references in the XML
        if 'commentRangeStart' in xml_text or 'commentReference' in xml_text:
            # Create a placeholder comment entry
            comment_info = {
                'id': f'comment_{start_id}',
                'comment_id': f'{start_id}',
                'author': 'Unknown',
                'initials': '',
                'date': None,
                'text': 'Comment detected but content not accessible',
                'paragraph_index': para_index,
                'in_table': in_table,
                'reference_text': (paragraph.text[:50] + '...') if len(paragraph.text) > 50 else paragraph.text
            }
            comments.append(comment_info)

    except Exception:
        # If we can't access the XML, skip this paragraph
        pass

    return comments


def filter_comments_by_author(comments: List[Dict[str, Any]], author: str) -> List[Dict[str, Any]]:
    """
    Filter comments by author name.

    Args:
        comments: List of comment dictionaries
        author: Author name to filter by (case-insensitive)

    Returns:
        Filtered list of comments
    """
    author_lower = author.lower()
    return [c for c in comments if c.get('author', '').lower() == author_lower]


def get_comments_for_paragraph(comments: List[Dict[str, Any]], paragraph_index: int) -> List[Dict[str, Any]]:
    """
    Get all comments for a specific paragraph.

    Args:
        comments: List of all comments
        paragraph_index: Index of the paragraph

    Returns:
        Comments for the specified paragraph
    """
    return [c for c in comments if c.get('paragraph_index') == paragraph_index]
```
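In a .docx package, comments usually live in `word/comments.xml` as `w:comment` elements; their `w:id`, `w:author`, `w:initials` and `w:date` attributes and nested `w:t` runs are what `extract_comment_data` reads. To inspect that structure without python-docx, a stdlib `xml.etree.ElementTree` sketch over an inline sample (the sample XML and the function names are illustrative):

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

SAMPLE_COMMENTS_XML = f"""<w:comments xmlns:w="{W_NS}">
  <w:comment w:id="0" w:author="Alice" w:initials="A" w:date="2024-01-15T10:30:00Z">
    <w:p><w:r><w:t>Check this </w:t></w:r><w:r><w:t>figure.</w:t></w:r></w:p>
  </w:comment>
</w:comments>"""


def normalize_date(date_str: str) -> Optional[str]:
    """ISO-format a w:date value; 'Z' is rewritten because fromisoformat() rejects it before 3.11."""
    if not date_str:
        return None
    try:
        return datetime.datetime.fromisoformat(date_str.replace("Z", "+00:00")).isoformat()
    except ValueError:
        return date_str  # keep unparseable values verbatim


def parse_comments(xml_text: str) -> List[Dict[str, Any]]:
    root = ET.fromstring(xml_text)
    comments = []
    for el in root.iter(W + "comment"):
        # Concatenate every w:t run, since one comment may span several runs
        text = "".join(t.text or "" for t in el.iter(W + "t"))
        comments.append({
            "comment_id": el.get(W + "id"),
            "author": el.get(W + "author", "Unknown"),
            "initials": el.get(W + "initials", ""),
            "date": normalize_date(el.get(W + "date", "")),
            "text": text.strip(),
        })
    return comments


print(parse_comments(SAMPLE_COMMENTS_XML))
```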
--------------------------------------------------------------------------------
/word_document_server/tools/document_tools.py:
--------------------------------------------------------------------------------
```python
"""
Document creation and manipulation tools for Word Document Server.
"""
import os
import json
from typing import Dict, List, Optional, Any
from docx import Document

from word_document_server.utils.file_utils import check_file_writeable, ensure_docx_extension, create_document_copy
from word_document_server.utils.document_utils import get_document_properties, extract_document_text, get_document_structure, get_document_xml, insert_header_near_text, insert_line_or_paragraph_near_text
from word_document_server.core.styles import ensure_heading_style, ensure_table_style


async def create_document(filename: str, title: Optional[str] = None, author: Optional[str] = None) -> str:
    """Create a new Word document with optional metadata.

    Args:
        filename: Name of the document to create (with or without .docx extension)
        title: Optional title for the document metadata
        author: Optional author for the document metadata
    """
    filename = ensure_docx_extension(filename)

    # Check if file is writeable
    is_writeable, error_message = check_file_writeable(filename)
    if not is_writeable:
        return f"Cannot create document: {error_message}"

    try:
        doc = Document()

        # Set properties if provided
        if title:
            doc.core_properties.title = title
        if author:
            doc.core_properties.author = author

        # Ensure necessary styles exist
        ensure_heading_style(doc)
        ensure_table_style(doc)

        # Save the document
        doc.save(filename)

        return f"Document {filename} created successfully"
    except Exception as e:
        return f"Failed to create document: {str(e)}"


async def get_document_info(filename: str) -> str:
    """Get information about a Word document.

    Args:
        filename: Path to the Word document
    """
    filename = ensure_docx_extension(filename)

    if not os.path.exists(filename):
        return f"Document {filename} does not exist"

    try:
        properties = get_document_properties(filename)
        return json.dumps(properties, indent=2)
    except Exception as e:
        return f"Failed to get document info: {str(e)}"


async def get_document_text(filename: str) -> str:
    """Extract all text from a Word document.

    Args:
        filename: Path to the Word document
    """
    filename = ensure_docx_extension(filename)

    return extract_document_text(filename)


78 |
79 | async def get_document_outline(filename: str) -> str:
80 | """Get the structure of a Word document.
81 |
82 | Args:
83 | filename: Path to the Word document
84 | """
85 | filename = ensure_docx_extension(filename)
86 |
87 | structure = get_document_structure(filename)
88 | return json.dumps(structure, indent=2)
89 |
90 |
91 | async def list_available_documents(directory: str = ".") -> str:
92 | """List all .docx files in the specified directory.
93 |
94 | Args:
95 | directory: Directory to search for Word documents
96 | """
97 | try:
98 | if not os.path.exists(directory):
99 | return f"Directory {directory} does not exist"
100 |
101 | docx_files = [f for f in os.listdir(directory) if f.endswith('.docx')]
102 |
103 | if not docx_files:
104 | return f"No Word documents found in {directory}"
105 |
106 | result = f"Found {len(docx_files)} Word documents in {directory}:\n"
107 | for file in docx_files:
108 | file_path = os.path.join(directory, file)
109 | size = os.path.getsize(file_path) / 1024 # KB
110 | result += f"- {file} ({size:.2f} KB)\n"
111 |
112 | return result
113 | except Exception as e:
114 | return f"Failed to list documents: {str(e)}"
115 |
116 |
117 | async def copy_document(source_filename: str, destination_filename: Optional[str] = None) -> str:
118 | """Create a copy of a Word document.
119 |
120 | Args:
121 | source_filename: Path to the source document
122 | destination_filename: Optional path for the copy. If not provided, a default name will be generated.
123 | """
124 | source_filename = ensure_docx_extension(source_filename)
125 |
126 | if destination_filename:
127 | destination_filename = ensure_docx_extension(destination_filename)
128 |
129 | success, message, new_path = create_document_copy(source_filename, destination_filename)
130 | if success:
131 | return message
132 | else:
133 | return f"Failed to copy document: {message}"
134 |
135 |
136 | async def merge_documents(target_filename: str, source_filenames: List[str], add_page_breaks: bool = True) -> str:
137 | """Merge multiple Word documents into a single document.
138 |
139 | Args:
140 | target_filename: Path to the target document (will be created or overwritten)
141 | source_filenames: List of paths to source documents to merge
142 | add_page_breaks: If True, add page breaks between documents
143 | """
144 | from word_document_server.core.tables import copy_table
145 |
146 | target_filename = ensure_docx_extension(target_filename)
147 |
148 | # Check if target file is writeable
149 | is_writeable, error_message = check_file_writeable(target_filename)
150 | if not is_writeable:
151 | return f"Cannot create target document: {error_message}"
152 |
153 | # Validate all source documents exist
154 | missing_files = []
155 | for filename in source_filenames:
156 | doc_filename = ensure_docx_extension(filename)
157 | if not os.path.exists(doc_filename):
158 | missing_files.append(doc_filename)
159 |
160 | if missing_files:
161 | return f"Cannot merge documents. The following source files do not exist: {', '.join(missing_files)}"
162 |
163 | try:
164 | # Create a new document for the merged result
165 | target_doc = Document()
166 |
167 | # Process each source document
168 | for i, filename in enumerate(source_filenames):
169 | doc_filename = ensure_docx_extension(filename)
170 | source_doc = Document(doc_filename)
171 |
172 | # Add page break between documents (except before the first one)
173 | if add_page_breaks and i > 0:
174 | target_doc.add_page_break()
175 |
176 | # Copy all paragraphs
177 | for paragraph in source_doc.paragraphs:
178 | # Start from an empty paragraph; runs are copied one by one below
179 | new_paragraph = target_doc.add_paragraph()
180 | new_paragraph.style = target_doc.styles['Normal'] # Default style
181 |
182 | # Use the source style if the target document also defines it
183 | try:
184 | if paragraph.style and paragraph.style.name in target_doc.styles:
185 | new_paragraph.style = target_doc.styles[paragraph.style.name]
186 | except KeyError:
187 | pass
188 |
189 | # Copy each run with its own text so per-run formatting is kept
190 | for run in paragraph.runs:
191 | new_run = new_paragraph.add_run(run.text)
192 | # Copy basic formatting
193 | new_run.bold = run.bold
194 | new_run.italic = run.italic
195 | new_run.underline = run.underline
196 | # Font size if specified
197 | if run.font.size:
198 | new_run.font.size = run.font.size
199 |
200 | # Copy all tables (appended after the paragraphs, so their position
201 | # relative to the surrounding text is not preserved)
202 | for table in source_doc.tables:
203 | copy_table(table, target_doc)
204 |
205 | # Save the merged document
206 | target_doc.save(target_filename)
207 | return f"Successfully merged {len(source_filenames)} documents into {target_filename}"
208 | except Exception as e:
209 | return f"Failed to merge documents: {str(e)}"
210 |
211 |
212 | async def get_document_xml_tool(filename: str) -> str:
213 | """Get the raw XML structure of a Word document."""
214 | return get_document_xml(filename)
215 |
```
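Formatting in a Word paragraph lives on its runs: each run carries its own text plus bold/italic/size settings, so a merge keeps formatting only if it copies runs one at a time rather than re-creating the paragraph from its flattened text. The stdlib-only model below illustrates that idea; the `Run` and `Paragraph` dataclasses and `copy_paragraph` are simplified, hypothetical stand-ins, not python-docx classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Run:
    """Simplified stand-in for a python-docx run: text plus basic formatting."""
    text: str
    bold: Optional[bool] = None
    italic: Optional[bool] = None


@dataclass
class Paragraph:
    """Simplified stand-in for a paragraph: a style name and a list of runs."""
    style: str = 'Normal'
    runs: List[Run] = field(default_factory=list)

    @property
    def text(self) -> str:
        return ''.join(run.text for run in self.runs)


def copy_paragraph(source: Paragraph, available_styles: Set[str]) -> Paragraph:
    # Fall back to 'Normal' when the target lacks the source style
    style = source.style if source.style in available_styles else 'Normal'
    target = Paragraph(style=style)
    # Copy run by run so each span keeps its own formatting
    for run in source.runs:
        target.runs.append(Run(run.text, bold=run.bold, italic=run.italic))
    return target


src = Paragraph('Quote', [Run('Plain '), Run('bold', bold=True), Run(' end')])
copied = copy_paragraph(src, {'Normal', 'Heading 1'})
print(copied.style, copied.text)      # Normal Plain bold end
print([r.bold for r in copied.runs])  # [None, True, None]
```

`None` mirrors python-docx's convention that an unset run property is inherited from the style rather than explicitly on or off.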
--------------------------------------------------------------------------------
/word_document_server/tools/extended_document_tools.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Extended document tools for Word Document Server.
3 |
4 | These tools provide enhanced document content extraction and search capabilities.
5 | """
6 | import os
7 | import json
8 | import subprocess
9 | import platform
10 | import shutil
11 | from typing import Dict, List, Optional, Any, Union, Tuple
12 | from docx import Document
13 |
14 | from word_document_server.utils.file_utils import check_file_writeable, ensure_docx_extension
15 | from word_document_server.utils.extended_document_utils import get_paragraph_text, find_text
16 |
17 |
18 | async def get_paragraph_text_from_document(filename: str, paragraph_index: int) -> str:
19 | """Get text from a specific paragraph in a Word document.
20 |
21 | Args:
22 | filename: Path to the Word document
23 | paragraph_index: Index of the paragraph to retrieve (0-based)
24 | """
25 | filename = ensure_docx_extension(filename)
26 |
27 | if not os.path.exists(filename):
28 | return f"Document {filename} does not exist"
29 |
30 |
31 | if paragraph_index < 0:
32 | return "Invalid parameter: paragraph_index must be a non-negative integer"
33 |
34 | try:
35 | result = get_paragraph_text(filename, paragraph_index)
36 | return json.dumps(result, indent=2)
37 | except Exception as e:
38 | return f"Failed to get paragraph text: {str(e)}"
39 |
40 |
41 | async def find_text_in_document(filename: str, text_to_find: str, match_case: bool = True, whole_word: bool = False) -> str:
42 | """Find occurrences of specific text in a Word document.
43 |
44 | Args:
45 | filename: Path to the Word document
46 | text_to_find: Text to search for in the document
47 | match_case: Whether to match case (True) or ignore case (False)
48 | whole_word: Whether to match whole words only (True) or substrings (False)
49 | """
50 | filename = ensure_docx_extension(filename)
51 |
52 | if not os.path.exists(filename):
53 | return f"Document {filename} does not exist"
54 |
55 | if not text_to_find:
56 | return "Search text cannot be empty"
57 |
58 | try:
59 |
60 | result = find_text(filename, text_to_find, match_case, whole_word)
61 | return json.dumps(result, indent=2)
62 | except Exception as e:
63 | return f"Failed to search for text: {str(e)}"
64 |
65 |
66 | async def convert_to_pdf(filename: str, output_filename: Optional[str] = None) -> str:
67 | """Convert a Word document to PDF format.
68 |
69 | Args:
70 | filename: Path to the Word document
71 | output_filename: Optional path for the output PDF. If not provided,
72 | will use the same name with .pdf extension
73 | """
74 | filename = ensure_docx_extension(filename)
75 |
76 | if not os.path.exists(filename):
77 | return f"Document {filename} does not exist"
78 |
79 | # Generate output filename if not provided
80 | if not output_filename:
81 | base_name, _ = os.path.splitext(filename)
82 | output_filename = f"{base_name}.pdf"
83 | elif not output_filename.lower().endswith('.pdf'):
84 | output_filename = f"{output_filename}.pdf"
85 |
86 | # Convert to absolute path if not already
87 | if not os.path.isabs(output_filename):
88 | output_filename = os.path.abspath(output_filename)
89 |
90 | # Ensure the output directory exists
91 | output_dir = os.path.dirname(output_filename)
92 | if not output_dir:
93 | output_dir = os.path.abspath('.')
94 |
95 | # Create the directory if it doesn't exist
96 | os.makedirs(output_dir, exist_ok=True)
97 |
98 | # Check if output file can be written
99 | is_writeable, error_message = check_file_writeable(output_filename)
100 | if not is_writeable:
101 | return f"Cannot create PDF: {error_message} (Path: {output_filename}, Dir: {output_dir})"
102 |
103 | try:
104 | # Determine platform for appropriate conversion method
105 | system = platform.system()
106 |
107 | if system == "Windows":
108 | # On Windows, try docx2pdf which uses Microsoft Word
109 | try:
110 | from docx2pdf import convert
111 | convert(filename, output_filename)
112 | return f"Document successfully converted to PDF: {output_filename}"
113 | except Exception as e: # Also covers ImportError when docx2pdf is missing
114 | return f"Failed to convert document to PDF: {str(e)}\nNote: docx2pdf requires Microsoft Word to be installed."
115 |
116 | elif system in ["Linux", "Darwin"]: # Linux or macOS
117 | errors = []
118 |
119 | # --- Attempt 1: LibreOffice ---
120 | lo_commands = []
121 | if system == "Darwin": # macOS
122 | lo_commands = ["soffice", "/Applications/LibreOffice.app/Contents/MacOS/soffice"]
123 | else: # Linux
124 | lo_commands = ["libreoffice", "soffice"]
125 |
126 | for cmd_name in lo_commands:
127 | try:
128 | output_dir_for_lo = os.path.dirname(output_filename) or '.'
129 | os.makedirs(output_dir_for_lo, exist_ok=True)
130 |
131 | cmd = [cmd_name, '--headless', '--convert-to', 'pdf', '--outdir', output_dir_for_lo, filename]
132 | result = subprocess.run(cmd, capture_output=True, text=True, timeout=60, check=False)
133 |
134 | if result.returncode == 0:
135 | # LibreOffice typically creates a PDF with the same base name as the source file.
136 | # e.g., 'mydoc.docx' -> 'mydoc.pdf'
137 | base_name = os.path.splitext(os.path.basename(filename))[0]
138 | created_pdf_name = f"{base_name}.pdf"
139 | created_pdf_path = os.path.join(output_dir_for_lo, created_pdf_name)
140 |
141 | # If the created file exists, move it to the desired output_filename if necessary.
142 | if os.path.exists(created_pdf_path):
143 | if created_pdf_path != output_filename:
144 | shutil.move(created_pdf_path, output_filename)
145 |
146 | # Final check: does the target file now exist?
147 | if os.path.exists(output_filename):
148 | return f"Document successfully converted to PDF via {cmd_name}: {output_filename}"
149 |
150 | # If we get here, soffice returned 0 but the expected file wasn't created.
151 | errors.append(f"{cmd_name} returned success code, but output file '{created_pdf_path}' was not found.")
152 | # Continue to the next command or fallback.
153 | else:
154 | errors.append(f"{cmd_name} failed. Stderr: {result.stderr.strip()}")
155 | except FileNotFoundError:
156 | errors.append(f"Command '{cmd_name}' not found.")
157 | except (subprocess.SubprocessError, Exception) as e:
158 | errors.append(f"An error occurred with {cmd_name}: {str(e)}")
159 |
160 | # --- Attempt 2: docx2pdf (Fallback) ---
161 | try:
162 | from docx2pdf import convert
163 | convert(filename, output_filename)
164 | if os.path.exists(output_filename) and os.path.getsize(output_filename) > 0:
165 | return f"Document successfully converted to PDF via docx2pdf: {output_filename}"
166 | else:
167 | errors.append("docx2pdf fallback was executed but failed to create a valid output file.")
168 | except ImportError:
169 | errors.append("docx2pdf is not installed, skipping fallback.")
170 | except Exception as e:
171 | errors.append(f"docx2pdf fallback failed with an exception: {str(e)}")
172 |
173 | # --- If all attempts failed ---
174 | error_summary = "Failed to convert document to PDF using all available methods.\n"
175 | error_summary += "Recorded errors: " + "; ".join(errors) + "\n"
176 | error_summary += "To convert documents to PDF, please install either:\n"
177 | error_summary += "1. LibreOffice (recommended for Linux/macOS)\n"
178 | error_summary += "2. Microsoft Word (required for docx2pdf on Windows/macOS)"
179 | return error_summary
180 | else:
181 | return f"PDF conversion not supported on {system} platform"
182 |
183 | except Exception as e:
184 | return f"Failed to convert document to PDF: {str(e)}"
185 |
```
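The path handling in `convert_to_pdf` combines two rules: a missing output name defaults to the source name with `.pdf`, and a name without a `.pdf` suffix gets one; separately, LibreOffice's `--convert-to pdf --outdir DIR` writes `DIR/<source base name>.pdf`, which is then moved if it differs from the requested path. A minimal sketch of just that path logic, assuming POSIX-style paths; `plan_pdf_paths` is a hypothetical helper, not part of the server.

```python
import os
from typing import Optional, Tuple


def plan_pdf_paths(filename: str, output_filename: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper mirroring convert_to_pdf's path rules.

    Returns (final_pdf_path, libreoffice_pdf_path): where the PDF should end up,
    and where LibreOffice is expected to write it before any move.
    """
    if not output_filename:
        base_name, _ = os.path.splitext(filename)
        output_filename = f"{base_name}.pdf"
    elif not output_filename.lower().endswith('.pdf'):
        output_filename = f"{output_filename}.pdf"
    output_filename = os.path.abspath(output_filename)

    # LibreOffice names its output after the source file, inside --outdir
    out_dir = os.path.dirname(output_filename)
    source_base = os.path.splitext(os.path.basename(filename))[0]
    lo_path = os.path.join(out_dir, f"{source_base}.pdf")
    return output_filename, lo_path


final, produced = plan_pdf_paths('/tmp/report.docx', '/tmp/out/summary')
print(final)     # /tmp/out/summary.pdf
print(produced)  # /tmp/out/report.pdf
```

When the two paths differ, the function above would `shutil.move` the LibreOffice output to the requested name.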
--------------------------------------------------------------------------------
/word_document_server/core/protection.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Document protection functionality for Word Document Server.
3 | """
4 | import os
5 | import json
6 | import hashlib
7 | import datetime
8 | from typing import Dict, List, Tuple, Optional, Any
9 |
10 |
11 | def add_protection_info(doc_path: str, protection_type: str, password_hash: str,
12 | sections: Optional[List[str]] = None,
13 | signature_info: Optional[Dict[str, Any]] = None,
14 | raw_password: Optional[str] = None) -> bool:
15 | """
16 | Add document protection information to a separate metadata file and encrypt the document.
17 |
18 | Args:
19 | doc_path: Path to the document
20 | protection_type: Type of protection ('password', 'restricted', 'signature')
21 | password_hash: Hashed password for security
22 | sections: List of section names that can be edited (for restricted editing)
23 | signature_info: Information about digital signature
24 | raw_password: The actual password for document encryption
25 |
26 | Returns:
27 | True if protection info was successfully added, False otherwise
28 | """
29 | # Create metadata filename based on document path
30 | base_path, _ = os.path.splitext(doc_path)
31 | metadata_path = f"{base_path}.protection"
32 |
33 | # Prepare protection data
34 | protection_data = {
35 | "type": protection_type,
36 | "password_hash": password_hash,
37 | "applied_date": datetime.datetime.now().isoformat(),
38 | }
39 |
40 | if sections:
41 | protection_data["editable_sections"] = sections
42 |
43 | if signature_info:
44 | protection_data["signature"] = signature_info
45 |
46 | # Write protection info to metadata file
47 | try:
48 | with open(metadata_path, 'w') as f:
49 | json.dump(protection_data, f, indent=2)
50 |
51 | # Apply actual document encryption if raw_password is provided
52 | if protection_type == "password" and raw_password:
53 | import msoffcrypto
54 | import tempfile
55 | import shutil
56 |
57 | # Create a temporary file for the encrypted output
58 | temp_fd, temp_path = tempfile.mkstemp(suffix='.docx')
59 | os.close(temp_fd)
60 |
61 | try:
62 | # Open the document
63 | with open(doc_path, 'rb') as f:
64 | office_file = msoffcrypto.OfficeFile(f)
65 |
66 | # Encrypt with password
67 | office_file.load_key(password=raw_password)
68 |
69 | # Write the encrypted file to the temp path
70 | with open(temp_path, 'wb') as out_file:
71 | office_file.encrypt(raw_password, out_file)
72 |
73 | # Replace original with encrypted version
74 | shutil.move(temp_path, doc_path)
75 |
76 | # Update metadata to note that true encryption was applied
77 | protection_data["true_encryption"] = True
78 | with open(metadata_path, 'w') as f:
79 | json.dump(protection_data, f, indent=2)
80 |
81 | except Exception as e:
82 | print(f"Encryption error: {str(e)}")
83 | if os.path.exists(temp_path):
84 | os.unlink(temp_path)
85 | return False
86 |
87 | return True
88 | except Exception as e:
89 | print(f"Protection error: {str(e)}")
90 | return False
91 |
92 |
93 | def verify_document_protection(doc_path: str, password: Optional[str] = None) -> Tuple[bool, str]:
94 | """
95 | Verify if a document is protected and if the password is correct.
96 |
97 | Args:
98 | doc_path: Path to the document
99 | password: Password to verify
100 |
101 | Returns:
102 | Tuple of (is_protected_and_verified, message)
103 | """
104 | base_path, _ = os.path.splitext(doc_path)
105 | metadata_path = f"{base_path}.protection"
106 |
107 | # Check if protection metadata exists
108 | if not os.path.exists(metadata_path):
109 | return False, "Document is not protected"
110 |
111 | try:
112 | # Read protection data
113 | with open(metadata_path, 'r') as f:
114 | protection_data = json.load(f)
115 |
116 | # If password is provided, verify it
117 | if password:
118 | password_hash = hashlib.sha256(password.encode()).hexdigest()
119 | if password_hash != protection_data.get("password_hash"):
120 | return False, "Incorrect password"
121 |
122 | # Return protection type
123 | protection_type = protection_data.get("type", "unknown")
124 | return True, f"Document is protected with {protection_type} protection"
125 |
126 | except Exception as e:
127 | return False, f"Error verifying protection: {str(e)}"
128 |
129 |
130 | def is_section_editable(doc_path: str, section_name: str) -> bool:
131 | """
132 | Check if a specific section of a document is editable.
133 |
134 | Args:
135 | doc_path: Path to the document
136 | section_name: Name of the section to check
137 |
138 | Returns:
139 | True if section is editable, False otherwise
140 | """
141 | base_path, _ = os.path.splitext(doc_path)
142 | metadata_path = f"{base_path}.protection"
143 |
144 | # Check if protection metadata exists
145 | if not os.path.exists(metadata_path):
146 | # If no protection exists, all sections are editable
147 | return True
148 |
149 | try:
150 | # Read protection data
151 | with open(metadata_path, 'r') as f:
152 | protection_data = json.load(f)
153 |
154 | # Check protection type
155 | if protection_data.get("type") != "restricted":
156 | # If not restricted editing, return based on protection type
157 | return protection_data.get("type") != "password"
158 |
159 | # Check if the section is in the list of editable sections
160 | editable_sections = protection_data.get("editable_sections", [])
161 | return section_name in editable_sections
162 |
163 | except Exception:
164 | # In case of error, default to not editable for security
165 | return False
166 |
167 |
168 | def create_signature_info(doc, signer_name: str, reason: Optional[str] = None) -> Dict[str, Any]:
169 | """
170 | Create signature information for a document.
171 |
172 | Args:
173 | doc: Document object
174 | signer_name: Name of the person signing the document
175 | reason: Optional reason for signing
176 |
177 | Returns:
178 | Dictionary containing signature information
179 | """
180 | # Create signature info
181 | signature_info = {
182 | "signer": signer_name,
183 | "timestamp": datetime.datetime.now().isoformat(),
184 | }
185 |
186 | if reason:
187 | signature_info["reason"] = reason
188 |
189 | # Generate a simple signature hash based on document content and metadata
190 | text_content = "\n".join([p.text for p in doc.paragraphs])
191 | content_hash = hashlib.sha256(text_content.encode()).hexdigest()
192 | signature_info["content_hash"] = content_hash
193 |
194 | return signature_info
195 |
196 |
197 | def verify_signature(doc_path: str) -> Tuple[bool, str]:
198 | """
199 | Verify a document's digital signature.
200 |
201 | Args:
202 | doc_path: Path to the document
203 |
204 | Returns:
205 | Tuple of (is_valid, message)
206 | """
207 | from docx import Document
208 |
209 | base_path, _ = os.path.splitext(doc_path)
210 | metadata_path = f"{base_path}.protection"
211 |
212 | if not os.path.exists(metadata_path):
213 | return False, "Document is not signed"
214 |
215 | try:
216 | # Read protection data
217 | with open(metadata_path, 'r') as f:
218 | protection_data = json.load(f)
219 |
220 | if protection_data.get("type") != "signature":
221 | return False, f"Document is protected with {protection_data.get('type')} protection, not a signature"
222 |
223 | # Get the original content hash
224 | signature_info = protection_data.get("signature", {})
225 | original_hash = signature_info.get("content_hash")
226 |
227 | if not original_hash:
228 | return False, "Invalid signature: missing content hash"
229 |
230 | # Calculate current content hash
231 | doc = Document(doc_path)
232 | text_content = "\n".join([p.text for p in doc.paragraphs])
233 | current_hash = hashlib.sha256(text_content.encode()).hexdigest()
234 |
235 | # Compare hashes
236 | if current_hash != original_hash:
237 | return False, f"Document has been modified since it was signed by {signature_info.get('signer')}"
238 |
239 | return True, f"Document signature is valid. Signed by {signature_info.get('signer')} on {signature_info.get('timestamp')}"
240 |
241 | except Exception as e:
242 | return False, f"Error verifying signature: {str(e)}"
243 |
```
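The "signature" in this module is a SHA-256 hash of the top-level paragraph texts joined with newlines, stored in the `.protection` sidecar file; verification recomputes the hash and compares. It is an integrity check rather than a cryptographic signature: nothing ties the hash to the signer, and formatting changes or edits inside tables (which `doc.paragraphs` does not include) leave it unchanged. The stdlib sketch below models the check with paragraphs as plain strings; `content_hash` is a hypothetical name.

```python
import hashlib
from typing import List


def content_hash(paragraph_texts: List[str]) -> str:
    """Hash paragraph texts the same way create_signature_info does."""
    text_content = "\n".join(paragraph_texts)
    return hashlib.sha256(text_content.encode()).hexdigest()


signed = ["Contract", "Party A agrees to pay 100 EUR."]
stored_hash = content_hash(signed)

# Unchanged text verifies
assert content_hash(list(signed)) == stored_hash

# Editing text, or appending any paragraph (even an empty one), breaks it
tampered = ["Contract", "Party A agrees to pay 1000 EUR."]
appended = signed + [""]
print(content_hash(tampered) == stored_hash)  # False
print(content_hash(appended) == stored_hash)  # False
```

Because an appended empty paragraph adds a trailing newline to the joined text, any paragraph added after hashing, including a visible signature block, changes the result.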
--------------------------------------------------------------------------------
/word_document_server/tools/protection_tools.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Protection tools for Word Document Server.
3 |
4 | These tools handle document protection features such as
5 | password protection, restricted editing, and digital signatures.
6 | """
7 | import os
8 | import hashlib
9 | import datetime
10 | import io
11 | from typing import List, Optional, Dict, Any
12 | from docx import Document
13 | import msoffcrypto
14 |
15 | from word_document_server.utils.file_utils import check_file_writeable, ensure_docx_extension
16 |
17 |
18 |
19 | from word_document_server.core.protection import (
20 | add_protection_info,
21 | verify_document_protection,
22 | create_signature_info
23 | )
24 |
25 |
26 | async def protect_document(filename: str, password: str) -> str:
27 | """Add password protection to a Word document.
28 |
29 | Args:
30 | filename: Path to the Word document
31 | password: Password to protect the document with
32 | """
33 | filename = ensure_docx_extension(filename)
34 |
35 | if not os.path.exists(filename):
36 | return f"Document {filename} does not exist"
37 |
38 | # Check if file is writeable
39 | is_writeable, error_message = check_file_writeable(filename)
40 | if not is_writeable:
41 | return f"Cannot protect document: {error_message}"
42 |
43 | try:
44 | # Read the original file content
45 | with open(filename, "rb") as infile:
46 | original_data = infile.read()
47 |
48 | # Create an msoffcrypto file object from the original data
49 | file = msoffcrypto.OfficeFile(io.BytesIO(original_data))
50 | file.load_key(password=password) # Set the password for encryption
51 |
52 | # Encrypt the data into an in-memory buffer
53 | encrypted_data_io = io.BytesIO()
54 |
55 | file.encrypt(password=password, outfile=encrypted_data_io)
56 |
57 | # Overwrite the original file with the encrypted data
58 | with open(filename, "wb") as outfile:
59 | outfile.write(encrypted_data_io.getvalue())
60 |
61 |
62 | base_path, _ = os.path.splitext(filename)
63 | metadata_path = f"{base_path}.protection"
64 | if os.path.exists(metadata_path):
65 | os.remove(metadata_path)
66 |
67 | return f"Document {filename} encrypted successfully with password."
68 |
69 | except Exception as e:
70 | # Attempt to restore original file content on failure
71 | try:
72 | if 'original_data' in locals():
73 | with open(filename, "wb") as outfile:
74 | outfile.write(original_data)
75 | return f"Failed to encrypt document {filename}: {str(e)}. Original file restored."
76 | else:
77 | return f"Failed to encrypt document {filename}: {str(e)}. Could not restore original file."
78 | except Exception as restore_e:
79 | return f"Failed to encrypt document {filename}: {str(e)}. Also failed to restore original file: {str(restore_e)}"
80 |
81 |
82 | async def add_restricted_editing(filename: str, password: str, editable_sections: List[str]) -> str:
83 | """Add restricted editing to a Word document, allowing editing only in specified sections.
84 |
85 | Args:
86 | filename: Path to the Word document
87 | password: Password to protect the document with
88 | editable_sections: List of section names that can be edited
89 | """
90 | filename = ensure_docx_extension(filename)
91 |
92 | if not os.path.exists(filename):
93 | return f"Document {filename} does not exist"
94 |
95 | # Check if file is writeable
96 | is_writeable, error_message = check_file_writeable(filename)
97 | if not is_writeable:
98 | return f"Cannot protect document: {error_message}"
99 |
100 | try:
101 | # Hash the password for security
102 | password_hash = hashlib.sha256(password.encode()).hexdigest()
103 |
104 | # Add protection info to metadata
105 | success = add_protection_info(
106 | filename,
107 | protection_type="restricted",
108 | password_hash=password_hash,
109 | sections=editable_sections
110 | )
111 |
112 | if not success:
113 | return f"Failed to protect document {filename} with restricted editing"
114 |
115 | if not editable_sections:
116 | return f"No editable sections specified. Document {filename} is now fully protected."
117 |
118 | return f"Document {filename} protected with restricted editing. Editable sections: {', '.join(editable_sections)}"
119 | except Exception as e:
120 | return f"Failed to add restricted editing: {str(e)}"
121 |
122 | async def add_digital_signature(filename: str, signer_name: str, reason: Optional[str] = None) -> str:
123 | """Add a digital signature to a Word document.
124 |
125 | Args:
126 | filename: Path to the Word document
127 | signer_name: Name of the person signing the document
128 | reason: Optional reason for signing
129 | """
130 | filename = ensure_docx_extension(filename)
131 |
132 | if not os.path.exists(filename):
133 | return f"Document {filename} does not exist"
134 |
135 | # Check if file is writeable
136 | is_writeable, error_message = check_file_writeable(filename)
137 | if not is_writeable:
138 | return f"Cannot add signature to document: {error_message}"
139 |
140 | try:
141 | doc = Document(filename)
142 |
143 | # Hash the content being signed; its prefix becomes the visible Signature ID
144 | signature_id = create_signature_info(doc, signer_name, reason)['content_hash'][:8]
145 |
146 | # Add a visible signature block to the document (in memory only for now)
147 | doc.add_paragraph() # Empty paragraph for spacing
148 | signature_para = doc.add_paragraph()
149 | signature_para.add_run(f"Digitally signed by: {signer_name}").bold = True
150 | if reason:
151 | signature_para.add_run(f"\nReason: {reason}")
152 | signature_para.add_run(f"\nDate: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
153 | signature_para.add_run(f"\nSignature ID: {signature_id}")
154 |
155 | # Hash again so the stored hash covers the saved text, including the
156 | # signature block; otherwise verification always reports a modification
157 | signature_info = create_signature_info(doc, signer_name, reason)
158 | success = add_protection_info(
159 | filename,
160 | protection_type="signature",
161 | password_hash="", # No password for signature-only
162 | signature_info=signature_info
163 | )
164 |
165 | if success:
166 | doc.save(filename) # Save the document with the visible signature
167 | return f"Digital signature added to document {filename}"
168 | else:
169 | return f"Failed to add digital signature to document {filename}"
170 | except Exception as e:
171 | return f"Failed to add digital signature: {str(e)}"
172 |
173 | async def verify_document(filename: str, password: Optional[str] = None) -> str:
174 | """Verify document protection and/or digital signature.
175 |
176 | Args:
177 | filename: Path to the Word document
178 | password: Optional password to verify
179 | """
180 | filename = ensure_docx_extension(filename)
181 |
182 | if not os.path.exists(filename):
183 | return f"Document {filename} does not exist"
184 |
185 | try:
186 | # Verify document protection
187 | is_verified, message = verify_document_protection(filename, password)
188 |
189 | if not is_verified and password:
190 | return f"Document verification failed: {message}"
191 |
192 | # If document has a digital signature, verify content integrity
193 | base_path, _ = os.path.splitext(filename)
194 | metadata_path = f"{base_path}.protection"
195 |
196 | if os.path.exists(metadata_path):
197 | try:
198 | import json
199 | with open(metadata_path, 'r') as f:
200 | protection_data = json.load(f)
201 |
202 | if protection_data.get("type") == "signature":
203 | # Get the original content hash
204 | signature_info = protection_data.get("signature", {})
205 | original_hash = signature_info.get("content_hash")
206 |
207 | if original_hash:
208 | # Calculate current content hash
209 | doc = Document(filename)
210 | text_content = "\n".join([p.text for p in doc.paragraphs])
211 | current_hash = hashlib.sha256(text_content.encode()).hexdigest()
212 |
213 | # Compare hashes
214 | if current_hash != original_hash:
215 | return f"Document has been modified since it was signed by {signature_info.get('signer')}"
216 | else:
217 | return f"Document signature is valid. Signed by {signature_info.get('signer')} on {signature_info.get('timestamp')}"
218 | except Exception as e:
219 | return f"Error verifying signature: {str(e)}"
220 |
221 | return message
222 | except Exception as e:
223 | return f"Failed to verify document: {str(e)}"
224 |
225 | async def unprotect_document(filename: str, password: str) -> str:
226 | """Remove password protection from a Word document.
227 |
228 | Args:
229 | filename: Path to the Word document
230 | password: Password that was used to protect the document
231 | """
232 | filename = ensure_docx_extension(filename)
233 |
234 | if not os.path.exists(filename):
235 | return f"Document {filename} does not exist"
236 |
237 | # Check if file is writeable
238 | is_writeable, error_message = check_file_writeable(filename)
239 | if not is_writeable:
240 | return f"Cannot modify document: {error_message}"
241 |
242 | try:
243 | # Read the encrypted file content
244 | with open(filename, "rb") as infile:
245 | encrypted_data = infile.read()
246 |
247 | # Create an msoffcrypto file object from the encrypted data
248 | file = msoffcrypto.OfficeFile(io.BytesIO(encrypted_data))
249 | file.load_key(password=password) # Set the password for decryption
250 |
251 | # Decrypt the data into an in-memory buffer
252 | decrypted_data_io = io.BytesIO()
253 | file.decrypt(outfile=decrypted_data_io) # Pass the buffer as the 'outfile' argument
254 |
255 | # Overwrite the original file with the decrypted data
256 | with open(filename, "wb") as outfile:
257 | outfile.write(decrypted_data_io.getvalue())
258 |
259 | return f"Document {filename} decrypted successfully."
260 |
261 | except msoffcrypto.exceptions.InvalidKeyError:
262 | return f"Failed to decrypt document {filename}: Incorrect password."
263 | except msoffcrypto.exceptions.FileFormatError:
264 | return f"Failed to decrypt document {filename}: File is not encrypted or is not a supported Office format."
265 | except Exception as e:
266 | # Attempt to restore encrypted file content on failure
267 | try:
268 | if 'encrypted_data' in locals():
269 | with open(filename, "wb") as outfile:
270 | outfile.write(encrypted_data)
271 | return f"Failed to decrypt document {filename}: {str(e)}. Encrypted file restored."
272 | else:
273 | return f"Failed to decrypt document {filename}: {str(e)}. Could not restore encrypted file."
274 | except Exception as restore_e:
275 | return f"Failed to decrypt document {filename}: {str(e)}. Also failed to restore encrypted file: {str(restore_e)}"
276 |
```
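The signature check in `verify_document` above reduces to one comparison: join the document's paragraph text with newlines, hash it with SHA-256, and compare the result to the `content_hash` stored in the metadata file. The sketch below isolates that comparison on plain lists of strings so it runs without python-docx. `content_hash` and `signature_status` are hypothetical helper names used only for illustration. They are not functions in this repository.

```python
import hashlib
from typing import Iterable


def content_hash(paragraph_texts: Iterable[str]) -> str:
    """SHA-256 hex digest of the paragraph texts joined with newlines."""
    text_content = "\n".join(paragraph_texts)
    return hashlib.sha256(text_content.encode()).hexdigest()


def signature_status(paragraph_texts: Iterable[str], signed_hash: str) -> str:
    """Return 'valid' if the recomputed hash matches the stored one, else 'modified'."""
    return "valid" if content_hash(paragraph_texts) == signed_hash else "modified"


# Hash recorded at signing time (stands in for signature_info["content_hash"])
signed_hash = content_hash(["Contract", "The parties agree to the terms."])

print(signature_status(["Contract", "The parties agree to the terms."], signed_hash))  # valid
print(signature_status(["Contract", "The parties agree to new terms."], signed_hash))  # modified
```

Because only the joined paragraph text is hashed, formatting-only edits leave the hash unchanged. python-docx's `Document.paragraphs` also yields body-level paragraphs only, so text inside tables, headers, and footers is not covered by this check.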
--------------------------------------------------------------------------------
/word_document_server/tools/content_tools.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Content tools for Word Document Server.
3 |
4 | These tools add various types of content to Word documents,
5 | including headings, paragraphs, tables, images, and page breaks.
6 | """
7 | import os
8 | from typing import List, Optional, Dict, Any
9 | from docx import Document
10 | from docx.shared import Inches, Pt, RGBColor
11 |
12 | from word_document_server.utils.file_utils import check_file_writeable, ensure_docx_extension
13 | from word_document_server.utils.document_utils import find_and_replace_text, insert_header_near_text, insert_numbered_list_near_text, insert_line_or_paragraph_near_text, replace_paragraph_block_below_header, replace_block_between_manual_anchors
14 | from word_document_server.core.styles import ensure_heading_style, ensure_table_style
15 |
16 |
17 | async def add_heading(filename: str, text: str, level: int = 1,
18 | font_name: Optional[str] = None, font_size: Optional[int] = None,
19 | bold: Optional[bool] = None, italic: Optional[bool] = None,
20 | border_bottom: bool = False) -> str:
21 | """Add a heading to a Word document with optional formatting.
22 |
23 | Args:
24 | filename: Path to the Word document
25 | text: Heading text
26 | level: Heading level (1-9, where 1 is the highest level)
27 | font_name: Font family (e.g., 'Helvetica')
28 | font_size: Font size in points (e.g., 14)
29 | bold: True/False for bold text
30 | italic: True/False for italic text
31 | border_bottom: True to add bottom border (for section headers)
32 | """
33 | filename = ensure_docx_extension(filename)
34 |
35 | # Ensure level is converted to integer
36 | try:
37 | level = int(level)
38 | except (ValueError, TypeError):
39 | return "Invalid parameter: level must be an integer between 1 and 9"
40 |
41 | # Validate level range
42 | if level < 1 or level > 9:
43 | return f"Invalid heading level: {level}. Level must be between 1 and 9."
44 |
45 | if not os.path.exists(filename):
46 | return f"Document {filename} does not exist"
47 |
48 | # Check if file is writeable
49 | is_writeable, error_message = check_file_writeable(filename)
50 | if not is_writeable:
51 | # Suggest creating a copy
52 | return f"Cannot modify document: {error_message}. Consider creating a copy first or creating a new document."
53 |
54 | try:
55 | doc = Document(filename)
56 |
57 | # Ensure heading styles exist
58 | ensure_heading_style(doc)
59 |
60 | # Try to add heading with style
61 | try:
62 | heading = doc.add_heading(text, level=level)
63 | except Exception as style_error:
64 | # If style-based approach fails, use direct formatting
65 | heading = doc.add_paragraph(text)
66 | heading.style = doc.styles['Normal']
67 | if heading.runs:
68 | run = heading.runs[0]
69 | run.bold = True
70 | # Adjust size based on heading level
71 | if level == 1:
72 | run.font.size = Pt(16)
73 | elif level == 2:
74 | run.font.size = Pt(14)
75 | else:
76 | run.font.size = Pt(12)
77 |
78 | # Apply formatting to all runs in the heading
79 | if any([font_name, font_size, bold is not None, italic is not None]):
80 | for run in heading.runs:
81 | if font_name:
82 | run.font.name = font_name
83 | if font_size:
84 | run.font.size = Pt(font_size)
85 | if bold is not None:
86 | run.font.bold = bold
87 | if italic is not None:
88 | run.font.italic = italic
89 |
90 | # Add bottom border if requested
91 | if border_bottom:
92 | from docx.oxml import OxmlElement
93 | from docx.oxml.ns import qn
94 |
95 | pPr = heading._element.get_or_add_pPr()
96 | pBdr = OxmlElement('w:pBdr')
97 |
98 | bottom = OxmlElement('w:bottom')
99 | bottom.set(qn('w:val'), 'single')
100 | bottom.set(qn('w:sz'), '4') # 0.5pt border
101 | bottom.set(qn('w:space'), '0')
102 | bottom.set(qn('w:color'), '000000')
103 |
104 | pBdr.append(bottom)
105 | pPr.append(pBdr)
106 |
107 | doc.save(filename)
108 | return f"Heading '{text}' (level {level}) added to {filename}"
109 | except Exception as e:
110 | return f"Failed to add heading: {str(e)}"
111 |
112 |
113 | async def add_paragraph(filename: str, text: str, style: Optional[str] = None,
114 | font_name: Optional[str] = None, font_size: Optional[int] = None,
115 | bold: Optional[bool] = None, italic: Optional[bool] = None,
116 | color: Optional[str] = None) -> str:
117 | """Add a paragraph to a Word document with optional formatting.
118 |
119 | Args:
120 | filename: Path to the Word document
121 | text: Paragraph text
122 | style: Optional paragraph style name
123 | font_name: Font family (e.g., 'Helvetica', 'Times New Roman')
124 | font_size: Font size in points (e.g., 14, 36)
125 | bold: True/False for bold text
126 | italic: True/False for italic text
127 | color: RGB color as hex string (e.g., '000000' for black)
128 | """
129 | filename = ensure_docx_extension(filename)
130 |
131 | if not os.path.exists(filename):
132 | return f"Document {filename} does not exist"
133 |
134 | # Check if file is writeable
135 | is_writeable, error_message = check_file_writeable(filename)
136 | if not is_writeable:
137 | # Suggest creating a copy
138 | return f"Cannot modify document: {error_message}. Consider creating a copy first or creating a new document."
139 |
140 | try:
141 | doc = Document(filename)
142 | paragraph = doc.add_paragraph(text)
143 |
144 | style_note = ""
145 | if style:
146 | try:
147 | paragraph.style = style
148 | except KeyError:
149 | # Style doesn't exist: fall back to Normal and keep applying the formatting below
150 | paragraph.style = doc.styles['Normal']
151 | style_note = f" (style '{style}' not found, default style used)"
152 |
153 | # Apply formatting to all runs in the paragraph
154 | if any([font_name, font_size, bold is not None, italic is not None, color]):
155 | for run in paragraph.runs:
156 | if font_name:
157 | run.font.name = font_name
158 | if font_size:
159 | run.font.size = Pt(font_size)
160 | if bold is not None:
161 | run.font.bold = bold
162 | if italic is not None:
163 | run.font.italic = italic
164 | if color:
165 | # Remove any '#' prefix if present
166 | color_hex = color.lstrip('#')
167 | run.font.color.rgb = RGBColor.from_string(color_hex)
168 |
169 | doc.save(filename)
170 | return f"Paragraph added to {filename}{style_note}"
171 | except Exception as e:
172 | return f"Failed to add paragraph: {str(e)}"
173 |
174 |
175 | async def add_table(filename: str, rows: int, cols: int, data: Optional[List[List[str]]] = None) -> str:
176 | """Add a table to a Word document.
177 |
178 | Args:
179 | filename: Path to the Word document
180 | rows: Number of rows in the table
181 | cols: Number of columns in the table
182 | data: Optional 2D array of data to fill the table
183 | """
184 | filename = ensure_docx_extension(filename)
185 |
186 | if not os.path.exists(filename):
187 | return f"Document {filename} does not exist"
188 |
189 | # Check if file is writeable
190 | is_writeable, error_message = check_file_writeable(filename)
191 | if not is_writeable:
192 | # Suggest creating a copy
193 | return f"Cannot modify document: {error_message}. Consider creating a copy first or creating a new document."
194 |
195 | try:
196 | doc = Document(filename)
197 | table = doc.add_table(rows=rows, cols=cols)
198 |
199 | # Try to set the table style
200 | try:
201 | table.style = 'Table Grid'
202 | except KeyError:
203 | # Style not available in this document's template: leave the table unstyled
204 | pass
205 |
206 | # Fill table with data if provided
207 | if data:
208 | for i, row_data in enumerate(data):
209 | if i >= rows:
210 | break
211 | for j, cell_text in enumerate(row_data):
212 | if j >= cols:
213 | break
214 | table.cell(i, j).text = str(cell_text)
215 |
216 | doc.save(filename)
217 | return f"Table ({rows}x{cols}) added to {filename}"
218 | except Exception as e:
219 | return f"Failed to add table: {str(e)}"
220 |
221 |
222 | async def add_picture(filename: str, image_path: str, width: Optional[float] = None) -> str:
223 | """Add an image to a Word document.
224 |
225 | Args:
226 | filename: Path to the Word document
227 | image_path: Path to the image file
228 | width: Optional width in inches (proportional scaling)
229 | """
230 | filename = ensure_docx_extension(filename)
231 |
232 | # Validate document existence
233 | if not os.path.exists(filename):
234 | return f"Document {filename} does not exist"
235 |
236 | # Get absolute paths for better diagnostics
237 | abs_filename = os.path.abspath(filename)
238 | abs_image_path = os.path.abspath(image_path)
239 |
240 | # Validate image existence with improved error message
241 | if not os.path.exists(abs_image_path):
242 | return f"Image file not found: {abs_image_path}"
243 |
244 | # Check image file size
245 | try:
246 | image_size = os.path.getsize(abs_image_path) / 1024 # Size in KB
247 | if image_size <= 0:
248 | return f"Image file appears to be empty: {abs_image_path} (0 KB)"
249 | except Exception as size_error:
250 | return f"Error checking image file: {str(size_error)}"
251 |
252 | # Check if file is writeable
253 | is_writeable, error_message = check_file_writeable(abs_filename)
254 | if not is_writeable:
255 | return f"Cannot modify document: {error_message}. Consider creating a copy first or creating a new document."
256 |
257 | try:
258 | doc = Document(abs_filename)
259 | # Additional diagnostic info
260 | diagnostic = f"Attempting to add image ({abs_image_path}, {image_size:.2f} KB) to document ({abs_filename})"
261 |
262 | try:
263 | if width:
264 | doc.add_picture(abs_image_path, width=Inches(width))
265 | else:
266 | doc.add_picture(abs_image_path)
267 | doc.save(abs_filename)
268 | return f"Picture {image_path} added to {filename}"
269 | except Exception as inner_error:
270 | # More detailed error for the specific operation
271 | error_type = type(inner_error).__name__
272 | error_msg = str(inner_error)
273 | return f"Failed to add picture: {error_type} - {error_msg or 'No error details available'}\nDiagnostic info: {diagnostic}"
274 | except Exception as outer_error:
275 | # Fallback error handling
276 | error_type = type(outer_error).__name__
277 | error_msg = str(outer_error)
278 | return f"Document processing error: {error_type} - {error_msg or 'No error details available'}"
279 |
280 |
281 | async def add_page_break(filename: str) -> str:
282 | """Add a page break to the document.
283 |
284 | Args:
285 | filename: Path to the Word document
286 | """
287 | filename = ensure_docx_extension(filename)
288 |
289 | if not os.path.exists(filename):
290 | return f"Document {filename} does not exist"
291 |
292 | # Check if file is writeable
293 | is_writeable, error_message = check_file_writeable(filename)
294 | if not is_writeable:
295 | return f"Cannot modify document: {error_message}. Consider creating a copy first."
296 |
297 | try:
298 | doc = Document(filename)
299 | doc.add_page_break()
300 | doc.save(filename)
301 | return f"Page break added to {filename}."
302 | except Exception as e:
303 | return f"Failed to add page break: {str(e)}"
304 |
305 |
306 | async def add_table_of_contents(filename: str, title: str = "Table of Contents", max_level: int = 3) -> str:
307 | """Add a table of contents to a Word document based on heading styles.
308 |
309 | Args:
310 | filename: Path to the Word document
311 | title: Optional title for the table of contents
312 | max_level: Maximum heading level to include (1-9)
313 | """
314 | filename = ensure_docx_extension(filename)
315 |
316 | if not os.path.exists(filename):
317 | return f"Document {filename} does not exist"
318 |
319 | # Check if file is writeable
320 | is_writeable, error_message = check_file_writeable(filename)
321 | if not is_writeable:
322 | return f"Cannot modify document: {error_message}. Consider creating a copy first."
323 |
324 | try:
325 | # Ensure max_level is within valid range
326 | max_level = max(1, min(max_level, 9))
327 |
328 | doc = Document(filename)
329 |
330 | # Collect headings and their positions
331 | headings = []
332 | for i, paragraph in enumerate(doc.paragraphs):
333 | # Check if paragraph style is a heading
334 | if paragraph.style and paragraph.style.name.startswith('Heading '):
335 | try:
336 | # Extract heading level from style name
337 | level = int(paragraph.style.name.split(' ')[1])
338 | if level <= max_level:
339 | headings.append({
340 | 'level': level,
341 | 'text': paragraph.text,
342 | 'position': i
343 | })
344 | except (ValueError, IndexError):
345 | # Skip if heading level can't be determined
346 | pass
347 |
348 | if not headings:
349 | return f"No headings found in document {filename}. Table of contents not created."
350 |
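351 | # Rebuild as a new document: TOC first, then the original body as plain paragraph text (run formatting, images, and table positions are not preserved)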
352 | toc_doc = Document()
353 |
354 | # Add title
355 | if title:
356 | toc_doc.add_heading(title, level=1)
357 |
358 | # Add TOC entries
359 | for heading in headings:
360 | # Indent nested levels with leading spaces (one unit per level below 1)
361 | indent = ' ' * (heading['level'] - 1)
362 | toc_doc.add_paragraph(f"{indent}{heading['text']}")
363 |
364 | # Add page break
365 | toc_doc.add_page_break()
366 |
367 | # Get content from original document
368 | for paragraph in doc.paragraphs:
369 | p = toc_doc.add_paragraph(paragraph.text)
370 | # Copy style if possible
371 | try:
372 | if paragraph.style:
373 | p.style = paragraph.style.name
374 | except Exception:
375 | pass
376 |
377 | # Copy tables
378 | for table in doc.tables:
379 | # Create a new table with the same dimensions
380 | new_table = toc_doc.add_table(rows=len(table.rows), cols=len(table.columns))
381 | # Copy cell contents
382 | for i, row in enumerate(table.rows):
383 | for j, cell in enumerate(row.cells):
384 | # cell.text joins all of the cell's paragraphs with newlines
385 | new_table.cell(i, j).text = cell.text
386 |
387 | # Save the new document with TOC
388 | toc_doc.save(filename)
389 |
390 | return f"Table of contents with {len(headings)} entries added to {filename}"
391 | except Exception as e:
392 | return f"Failed to add table of contents: {str(e)}"
393 |
394 |
395 | async def delete_paragraph(filename: str, paragraph_index: int) -> str:
396 | """Delete a paragraph from a document.
397 |
398 | Args:
399 | filename: Path to the Word document
400 | paragraph_index: Index of the paragraph to delete (0-based)
401 | """
402 | filename = ensure_docx_extension(filename)
403 |
404 | if not os.path.exists(filename):
405 | return f"Document {filename} does not exist"
406 |
407 | # Check if file is writeable
408 | is_writeable, error_message = check_file_writeable(filename)
409 | if not is_writeable:
410 | return f"Cannot modify document: {error_message}. Consider creating a copy first."
411 |
412 | try:
413 | doc = Document(filename)
414 |
415 | # Validate paragraph index
416 | if paragraph_index < 0 or paragraph_index >= len(doc.paragraphs):
417 | return f"Invalid paragraph index. Document has {len(doc.paragraphs)} paragraphs (0-{len(doc.paragraphs)-1})."
418 |
419 | # python-docx has no public API for deleting a paragraph, so remove its
420 | # underlying <w:p> element from its parent XML element instead
421 | paragraph = doc.paragraphs[paragraph_index]
422 | p = paragraph._p
423 | p.getparent().remove(p)
424 |
425 | doc.save(filename)
426 | return f"Paragraph at index {paragraph_index} deleted successfully."
427 | except Exception as e:
428 | return f"Failed to delete paragraph: {str(e)}"
429 |
430 |
431 | async def search_and_replace(filename: str, find_text: str, replace_text: str) -> str:
432 | """Search for text and replace all occurrences.
433 |
434 | Args:
435 | filename: Path to the Word document
436 | find_text: Text to search for
437 | replace_text: Text to replace with
438 | """
439 | filename = ensure_docx_extension(filename)
440 |
441 | if not os.path.exists(filename):
442 | return f"Document {filename} does not exist"
443 |
444 | # Check if file is writeable
445 | is_writeable, error_message = check_file_writeable(filename)
446 | if not is_writeable:
447 | return f"Cannot modify document: {error_message}. Consider creating a copy first."
448 |
449 | try:
450 | doc = Document(filename)
451 |
452 | # Perform find and replace
453 | count = find_and_replace_text(doc, find_text, replace_text)
454 |
455 | if count > 0:
456 | doc.save(filename)
457 | return f"Replaced {count} occurrence(s) of '{find_text}' with '{replace_text}'."
458 | else:
459 | return f"No occurrences of '{find_text}' found."
460 | except Exception as e:
461 | return f"Failed to search and replace: {str(e)}"
462 |
463 | async def insert_header_near_text_tool(filename: str, target_text: Optional[str] = None, header_title: str = "", position: str = 'after', header_style: str = 'Heading 1', target_paragraph_index: Optional[int] = None) -> str:
464 | """Insert a header (with specified style) before or after the target paragraph. Specify by text or paragraph index."""
465 | return insert_header_near_text(filename, target_text, header_title, position, header_style, target_paragraph_index)
466 |
467 | async def insert_numbered_list_near_text_tool(filename: str, target_text: Optional[str] = None, list_items: Optional[list] = None, position: str = 'after', target_paragraph_index: Optional[int] = None, bullet_type: str = 'bullet') -> str:
468 | """Insert a bulleted or numbered list before or after the target paragraph. Specify by text or paragraph index."""
469 | return insert_numbered_list_near_text(filename, target_text, list_items, position, target_paragraph_index, bullet_type)
470 |
471 | async def insert_line_or_paragraph_near_text_tool(filename: str, target_text: Optional[str] = None, line_text: str = "", position: str = 'after', line_style: Optional[str] = None, target_paragraph_index: Optional[int] = None) -> str:
472 | """Insert a new line or paragraph (with specified or matched style) before or after the target paragraph. Specify by text or paragraph index."""
473 | return insert_line_or_paragraph_near_text(filename, target_text, line_text, position, line_style, target_paragraph_index)
474 |
475 | async def replace_paragraph_block_below_header_tool(filename: str, header_text: str, new_paragraphs: list, detect_block_end_fn=None) -> str:
476 | """Replace the block of paragraphs below a header without modifying the TOC."""
477 | return replace_paragraph_block_below_header(filename, header_text, new_paragraphs, detect_block_end_fn)
478 |
479 | async def replace_block_between_manual_anchors_tool(filename: str, start_anchor_text: str, new_paragraphs: list, end_anchor_text: Optional[str] = None, match_fn=None, new_paragraph_style: Optional[str] = None) -> str:
480 | """Replace all content between start_anchor_text and end_anchor_text (or next logical header if not provided)."""
481 | return replace_block_between_manual_anchors(filename, start_anchor_text, new_paragraphs, end_anchor_text, match_fn, new_paragraph_style)
482 |
```
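`add_table_of_contents` decides which paragraphs become TOC entries by parsing the level out of style names of the form `Heading N`, skipping names it cannot parse, and dropping anything deeper than `max_level` (clamped to the range 1 to 9). The following is a minimal sketch of that selection step on `(style_name, text)` pairs so it runs without python-docx. The helper names `heading_level` and `toc_entries` and the two-space indent per level are assumptions for illustration, not code from this repository.

```python
from typing import List, Optional, Tuple


def heading_level(style_name: Optional[str]) -> Optional[int]:
    """Return N for a style named 'Heading N'; None for anything else."""
    if not style_name or not style_name.startswith("Heading "):
        return None
    try:
        return int(style_name.split(" ")[1])
    except (ValueError, IndexError):
        # e.g. 'Heading X' has no numeric level
        return None


def toc_entries(paragraphs: List[Tuple[Optional[str], str]], max_level: int = 3) -> List[str]:
    """Build indented TOC lines from (style_name, text) pairs, in document order."""
    max_level = max(1, min(max_level, 9))  # same clamping as add_table_of_contents
    entries = []
    for style_name, text in paragraphs:
        level = heading_level(style_name)
        if level is not None and level <= max_level:
            entries.append("  " * (level - 1) + text)  # two spaces per nested level
    return entries


paras = [
    ("Heading 1", "Introduction"),
    ("Normal", "Body text is skipped."),
    ("Heading 2", "Scope"),
    ("Heading 4", "Too deep for max_level=3"),
    ("Heading X", "Unparseable level"),
]
print(toc_entries(paras))  # ['Introduction', '  Scope']
```

With a real document, the pairs could be built roughly as `[(p.style.name if p.style else None, p.text) for p in doc.paragraphs]`. Keeping the parsing in a separate helper makes the "skip anything that is not `Heading N`" rule easy to test on its own.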
--------------------------------------------------------------------------------
/setup_mcp.py:
--------------------------------------------------------------------------------
```python
1 | # Import necessary Python standard libraries
2 | import os
3 | import json
4 | import subprocess
5 | import sys
6 | import shutil
7 | import platform
8 |
9 | def check_prerequisites():
10 | """
11 | Check if necessary prerequisites are installed
12 |
13 | Returns:
14 | tuple: (python_ok, uv_installed, uvx_installed, word_server_installed)
15 | """
16 | # Check Python version
17 | python_version = sys.version_info
18 | python_ok = python_version >= (3, 8)
19 |
20 | # Check if uv/uvx is installed
21 | uv_installed = shutil.which("uv") is not None
22 | uvx_installed = shutil.which("uvx") is not None
23 |
24 | # Check if word-document-server is already installed via pip
25 | try:
26 | result = subprocess.run(
27 | [sys.executable, "-m", "pip", "show", "word-document-server"],
28 | capture_output=True,
29 | text=True,
30 | check=False
31 | )
32 | word_server_installed = result.returncode == 0
33 | except Exception:
34 | word_server_installed = False
35 |
36 | return (python_ok, uv_installed, uvx_installed, word_server_installed)
37 |
38 | def get_transport_choice():
39 | """
40 | Ask user to choose transport type
41 |
42 | Returns:
43 | dict: Transport configuration
44 | """
45 | print("\nTransport Configuration:")
46 | print("1. STDIO (default, local execution)")
47 | print("2. Streamable HTTP (modern, recommended for web deployment)")
48 | print("3. SSE (Server-Sent Events, for compatibility)")
49 |
50 | choice = input("\nSelect transport type (1-3, default: 1): ").strip()
51 |
52 | if choice == "2":
53 | host = input("Host (default: 127.0.0.1): ").strip() or "127.0.0.1"
54 | port = input("Port (default: 8000): ").strip() or "8000"
55 | path = input("Path (default: /mcp): ").strip() or "/mcp"
56 |
57 | return {
58 | "transport": "streamable-http",
59 | "host": host,
60 | "port": port,
61 | "path": path
62 | }
63 | elif choice == "3":
64 | host = input("Host (default: 127.0.0.1): ").strip() or "127.0.0.1"
65 | port = input("Port (default: 8000): ").strip() or "8000"
66 | sse_path = input("SSE Path (default: /sse): ").strip() or "/sse"
67 |
68 | return {
69 | "transport": "sse",
70 | "host": host,
71 | "port": port,
72 | "sse_path": sse_path
73 | }
74 | else:
75 | # Default to stdio
76 | return {
77 | "transport": "stdio"
78 | }
79 |
80 | def setup_venv():
81 | """
82 | Function to set up Python virtual environment
83 |
84 | Features:
85 | - Checks if Python version meets requirements (3.8+)
86 | - Creates Python virtual environment (if it doesn't exist)
87 | - Installs required dependencies in the newly created virtual environment
88 |
89 | No parameters required
90 |
91 | Returns: Path to Python interpreter in the virtual environment
92 | """
93 | # Check Python version
94 | python_version = sys.version_info
95 | if python_version.major < 3 or (python_version.major == 3 and python_version.minor < 8):
96 | print("Error: Python 3.8 or higher is required.")
97 | sys.exit(1)
98 |
99 | # Get absolute path of the directory containing the current script
100 | base_path = os.path.abspath(os.path.dirname(__file__))
101 | # Set virtual environment directory path
102 | venv_path = os.path.join(base_path, '.venv')
103 |
104 | # Determine pip and python executable paths based on operating system
105 | is_windows = platform.system() == "Windows"
106 | if is_windows:
107 | pip_path = os.path.join(venv_path, 'Scripts', 'pip.exe')
108 | python_path = os.path.join(venv_path, 'Scripts', 'python.exe')
109 | else:
110 | pip_path = os.path.join(venv_path, 'bin', 'pip')
111 | python_path = os.path.join(venv_path, 'bin', 'python')
112 |
113 | # Check if virtual environment already exists and is valid
114 | venv_exists = os.path.exists(venv_path)
115 | pip_exists = os.path.exists(pip_path)
116 |
117 | if not venv_exists or not pip_exists:
118 | print("Creating new virtual environment...")
119 | # Remove existing venv if it's invalid
120 | if venv_exists and not pip_exists:
121 | print("Existing virtual environment is incomplete, recreating it...")
122 | try:
123 | shutil.rmtree(venv_path)
124 | except Exception as e:
125 | print(f"Warning: Could not remove existing virtual environment: {e}")
126 | print("Please delete the .venv directory manually and try again.")
127 | sys.exit(1)
128 |
129 | # Create virtual environment
130 | try:
131 | subprocess.run([sys.executable, '-m', 'venv', venv_path], check=True)
132 | print("Virtual environment created successfully!")
133 | except subprocess.CalledProcessError as e:
134 | print(f"Error creating virtual environment: {e}")
135 | sys.exit(1)
136 | else:
137 | print("Valid virtual environment already exists.")
138 |
139 | # Double-check that pip exists after creating venv
140 | if not os.path.exists(pip_path):
141 | print(f"Error: pip executable not found at {pip_path}")
142 | print("Try creating the virtual environment manually with: python -m venv .venv")
143 | sys.exit(1)
144 |
145 | # Install or update dependencies
146 | print("\nInstalling requirements...")
147 | try:
148 | # Install FastMCP package (standalone library)
149 | subprocess.run([pip_path, 'install', 'fastmcp'], check=True)
150 | # Install python-docx package
151 | subprocess.run([pip_path, 'install', 'python-docx'], check=True)
152 |
153 | # Also install dependencies from requirements.txt if it exists
154 | requirements_path = os.path.join(base_path, 'requirements.txt')
155 | if os.path.exists(requirements_path):
156 | subprocess.run([pip_path, 'install', '-r', requirements_path], check=True)
157 |
158 | print("Requirements installed successfully!")
159 | except subprocess.CalledProcessError as e:
160 | print(f"Error installing requirements: {e}")
161 | sys.exit(1)
162 | except FileNotFoundError:
163 | print(f"Error: Could not execute {pip_path}")
164 | print("Try activating the virtual environment manually and installing requirements:")
165 | if is_windows:
166 | print(".venv\\Scripts\\activate")
167 | else:
168 | print("source .venv/bin/activate")
169 | print("pip install fastmcp python-docx -r requirements.txt")
170 | sys.exit(1)
171 |
172 | return python_path
173 |
174 | def generate_mcp_config_local(python_path, transport_config):
175 | """
176 | Generate MCP configuration for locally installed word-document-server
177 |
178 | Parameters:
179 | - python_path: Path to Python interpreter in the virtual environment
180 | - transport_config: Transport configuration dictionary
181 |
182 | Returns: Path to the generated config file
183 | """
184 | # Get absolute path of the directory containing the current script
185 | base_path = os.path.abspath(os.path.dirname(__file__))
186 |
187 | # Path to Word Document Server script
188 | server_script_path = os.path.join(base_path, 'word_mcp_server.py')
189 |
190 | # Build environment variables
191 | env = {
192 | "PYTHONPATH": base_path,
193 | "MCP_TRANSPORT": transport_config["transport"]
194 | }
195 |
196 | # Add transport-specific environment variables
197 | if transport_config["transport"] == "streamable-http":
198 | env.update({
199 | "MCP_HOST": transport_config["host"],
200 | "MCP_PORT": transport_config["port"],
201 | "MCP_PATH": transport_config["path"]
202 | })
203 | elif transport_config["transport"] == "sse":
204 | env.update({
205 | "MCP_HOST": transport_config["host"],
206 | "MCP_PORT": transport_config["port"],
207 | "MCP_SSE_PATH": transport_config["sse_path"]
208 | })
209 | # For stdio transport, no additional environment variables needed
210 |
211 | # Create MCP configuration dictionary
212 | config = {
213 | "mcpServers": {
214 | "word-document-server": {
215 | "command": python_path,
216 | "args": [server_script_path],
217 | "env": env
218 | }
219 | }
220 | }
221 |
222 | # Save configuration to JSON file
223 | config_path = os.path.join(base_path, 'mcp-config.json')
224 | with open(config_path, 'w') as f:
225 | json.dump(config, f, indent=2)
226 |
227 | return config_path
228 |
229 | def generate_mcp_config_uvx(transport_config):
230 | """
231 | Generate MCP configuration for PyPI-installed word-document-server using UVX
232 |
233 | Parameters:
234 | - transport_config: Transport configuration dictionary
235 |
236 | Returns: Path to the generated config file
237 | """
238 | # Get absolute path of the directory containing the current script
239 | base_path = os.path.abspath(os.path.dirname(__file__))
240 |
241 | # Build environment variables
242 | env = {
243 | "MCP_TRANSPORT": transport_config["transport"]
244 | }
245 |
246 | # Add transport-specific environment variables
247 | if transport_config["transport"] == "streamable-http":
248 | env.update({
249 | "MCP_HOST": transport_config["host"],
250 | "MCP_PORT": transport_config["port"],
251 | "MCP_PATH": transport_config["path"]
252 | })
253 | elif transport_config["transport"] == "sse":
254 | env.update({
255 | "MCP_HOST": transport_config["host"],
256 | "MCP_PORT": transport_config["port"],
257 | "MCP_SSE_PATH": transport_config["sse_path"]
258 | })
259 | # For stdio transport, no additional environment variables needed
260 |
261 | # Create MCP configuration dictionary
262 | config = {
263 | "mcpServers": {
264 | "word-document-server": {
265 | "command": "uvx",
266 | "args": ["--from", "word-mcp-server", "word_mcp_server"],
267 | "env": env
268 | }
269 | }
270 | }
271 |
272 | # Save configuration to JSON file
273 | config_path = os.path.join(base_path, 'mcp-config.json')
274 | with open(config_path, 'w') as f:
275 | json.dump(config, f, indent=2)
276 |
277 | return config_path
278 |
279 | def generate_mcp_config_module(transport_config):
280 | """
281 | Generate MCP configuration for PyPI-installed word-document-server using Python module
282 |
283 | Parameters:
284 | - transport_config: Transport configuration dictionary
285 |
286 | Returns: Path to the generated config file
287 | """
288 | # Get absolute path of the directory containing the current script
289 | base_path = os.path.abspath(os.path.dirname(__file__))
290 |
291 | # Build environment variables
292 | env = {
293 | "MCP_TRANSPORT": transport_config["transport"]
294 | }
295 |
296 | # Add transport-specific environment variables
297 | if transport_config["transport"] == "streamable-http":
298 | env.update({
299 | "MCP_HOST": transport_config["host"],
300 | "MCP_PORT": transport_config["port"],
301 | "MCP_PATH": transport_config["path"]
302 | })
303 | elif transport_config["transport"] == "sse":
304 | env.update({
305 | "MCP_HOST": transport_config["host"],
306 | "MCP_PORT": transport_config["port"],
307 | "MCP_SSE_PATH": transport_config["sse_path"]
308 | })
309 | # For stdio transport, no additional environment variables needed
310 |
311 | # Create MCP configuration dictionary
312 | config = {
313 | "mcpServers": {
314 | "word-document-server": {
315 | "command": sys.executable,
316 | "args": ["-m", "word_document_server"],
317 | "env": env
318 | }
319 | }
320 | }
321 |
322 | # Save configuration to JSON file
323 | config_path = os.path.join(base_path, 'mcp-config.json')
324 | with open(config_path, 'w') as f:
325 | json.dump(config, f, indent=2)
326 |
327 | return config_path
328 |
329 | def install_from_pypi():
330 | """
331 | Install word-document-server from PyPI
332 |
333 | Returns: True if successful, False otherwise
334 | """
335 | print("\nInstalling word-document-server from PyPI...")
336 | try:
337 | subprocess.run([sys.executable, "-m", "pip", "install", "word-mcp-server"], check=True)
338 | print("word-mcp-server successfully installed from PyPI!")
339 | return True
340 | except subprocess.CalledProcessError:
341 | print("Failed to install word-mcp-server from PyPI.")
342 | return False
343 |
344 | def print_config_instructions(config_path, transport_config):
345 | """
346 | Print instructions for using the generated config
347 |
348 | Parameters:
349 | - config_path: Path to the generated config file
350 | - transport_config: Transport configuration dictionary
351 | """
352 | print(f"\nMCP configuration has been written to: {config_path}")
353 |
354 | with open(config_path, 'r') as f:
355 | config = json.load(f)
356 |
357 | print("\nMCP configuration for Claude Desktop:")
358 | print(json.dumps(config, indent=2))
359 |
360 | # Print transport-specific instructions
361 | if transport_config["transport"] == "streamable-http":
362 | print(f"\n📡 Streamable HTTP Transport Configuration:")
363 | print(f" Server will be accessible at: http://{transport_config['host']}:{transport_config['port']}{transport_config['path']}")
364 | print(f" \n To test the server manually:")
365 | print(f" curl -X POST http://{transport_config['host']}:{transport_config['port']}{transport_config['path']}")
366 |
367 | elif transport_config["transport"] == "sse":
368 | print(f"\n📡 SSE Transport Configuration:")
369 | print(f" Server will be accessible at: http://{transport_config['host']}:{transport_config['port']}{transport_config['sse_path']}")
370 | print(f" \n To test the server manually:")
371 | print(f" curl http://{transport_config['host']}:{transport_config['port']}{transport_config['sse_path']}")
372 |
373 | else: # stdio
374 | print(f"\n💻 STDIO Transport Configuration:")
375 | print(f" Server runs locally with standard input/output")
376 |
377 | # Provide instructions for adding configuration to Claude Desktop configuration file
378 | if platform.system() == "Windows":
379 | claude_config_path = os.path.expandvars("%APPDATA%\\Claude\\claude_desktop_config.json")
380 | else: # macOS (also used as the fallback on any other platform)
381 | claude_config_path = os.path.expanduser("~/Library/Application Support/Claude/claude_desktop_config.json")
382 |
383 | print(f"\nTo use with Claude Desktop, merge this configuration into: {claude_config_path}")
384 |
385 | def create_package_structure():
386 | """
387 | Create necessary package structure and environment files
388 | """
389 | # Get absolute path of the directory containing the current script
390 | base_path = os.path.abspath(os.path.dirname(__file__))
391 |
392 | # Create __init__.py file
393 | init_path = os.path.join(base_path, '__init__.py')
394 | if not os.path.exists(init_path):
395 | with open(init_path, 'w') as f:
396 | f.write('# Word Document MCP Server')
397 | print(f"Created __init__.py at: {init_path}")
398 |
399 | # Create requirements.txt file
400 | requirements_path = os.path.join(base_path, 'requirements.txt')
401 | if not os.path.exists(requirements_path):
402 | with open(requirements_path, 'w') as f:
403 | f.write('fastmcp\npython-docx\nmsoffcrypto-tool\ndocx2pdf\nhttpx\ncryptography\n')
404 | print(f"Created requirements.txt at: {requirements_path}")
405 |
406 | # Create .env.example file
407 | env_example_path = os.path.join(base_path, '.env.example')
408 | if not os.path.exists(env_example_path):
409 | with open(env_example_path, 'w') as f:
410 | f.write("""# Transport Configuration
411 | # Valid options: stdio, streamable-http, sse
412 | MCP_TRANSPORT=stdio
413 |
414 | # HTTP/SSE Configuration (when not using stdio)
415 | MCP_HOST=127.0.0.1
416 | MCP_PORT=8000
417 |
418 | # Streamable HTTP specific
419 | MCP_PATH=/mcp
420 |
421 | # SSE specific
422 | MCP_SSE_PATH=/sse
423 |
424 | """)
425 | print(f"Created .env.example at: {env_example_path}")
426 |
427 | # Main execution entry point
428 | if __name__ == '__main__':
429 | # Check prerequisites
430 | python_ok, uv_installed, uvx_installed, word_server_installed = check_prerequisites()
431 |
432 | if not python_ok:
433 | print("Error: Python 3.8 or higher is required.")
434 | sys.exit(1)
435 |
436 | print("Word Document MCP Server Setup (Multi-Transport)")
437 | print("===============================================\n")
438 |
439 | # Create necessary files
440 | create_package_structure()
441 |
442 | # Get transport configuration
443 | transport_config = get_transport_choice()
444 |
445 | # If word-document-server is already installed, offer config options
446 | if word_server_installed:
447 | print("word-document-server is already installed via pip.")
448 |
449 | if uvx_installed:
450 | print("\nOptions:")
451 | print("1. Generate MCP config for UVX (recommended)")
452 | print("2. Generate MCP config for Python module")
453 | print("3. Set up local development environment")
454 |
455 | choice = input("\nEnter your choice (1-3): ")
456 |
457 | if choice == "1":
458 | config_path = generate_mcp_config_uvx(transport_config)
459 | print_config_instructions(config_path, transport_config)
460 | elif choice == "2":
461 | config_path = generate_mcp_config_module(transport_config)
462 | print_config_instructions(config_path, transport_config)
463 | elif choice == "3":
464 | python_path = setup_venv()
465 | config_path = generate_mcp_config_local(python_path, transport_config)
466 | print_config_instructions(config_path, transport_config)
467 | else:
468 | print("Invalid choice. Exiting.")
469 | sys.exit(1)
470 | else:
471 | print("\nOptions:")
472 | print("1. Generate MCP config for Python module")
473 | print("2. Set up local development environment")
474 |
475 | choice = input("\nEnter your choice (1-2): ")
476 |
477 | if choice == "1":
478 | config_path = generate_mcp_config_module(transport_config)
479 | print_config_instructions(config_path, transport_config)
480 | elif choice == "2":
481 | python_path = setup_venv()
482 | config_path = generate_mcp_config_local(python_path, transport_config)
483 | print_config_instructions(config_path, transport_config)
484 | else:
485 | print("Invalid choice. Exiting.")
486 | sys.exit(1)
487 |
488 | # If word-document-server is not installed, offer installation options
489 | else:
490 | print("word-document-server is not installed.")
491 |
492 | print("\nOptions:")
493 | print("1. Install from PyPI (recommended)")
494 | print("2. Set up local development environment")
495 |
496 | choice = input("\nEnter your choice (1-2): ")
497 |
498 | if choice == "1":
499 | if install_from_pypi():
500 | if uvx_installed:
501 | print("\nNow generating MCP config for UVX...")
502 | config_path = generate_mcp_config_uvx(transport_config)
503 | else:
504 | print("\nUVX not found. Generating MCP config for Python module...")
505 | config_path = generate_mcp_config_module(transport_config)
506 | print_config_instructions(config_path, transport_config)
507 | elif choice == "2":
508 | python_path = setup_venv()
509 | config_path = generate_mcp_config_local(python_path, transport_config)
510 | print_config_instructions(config_path, transport_config)
511 | else:
512 | print("Invalid choice. Exiting.")
513 | sys.exit(1)
514 |
515 | print("\nSetup complete! You can now use the Word Document MCP server with compatible clients like Claude Desktop.")
516 | print("\nTransport Summary:")
517 | print(f" - Transport: {transport_config['transport']}")
518 | if transport_config['transport'] != 'stdio':
519 | print(f" - Host: {transport_config.get('host', 'N/A')}")
520 | print(f" - Port: {transport_config.get('port', 'N/A')}")
521 | if transport_config['transport'] == 'streamable-http':
522 | print(f" - Path: {transport_config.get('path', 'N/A')}")
523 | elif transport_config['transport'] == 'sse':
524 | print(f" - SSE Path: {transport_config.get('sse_path', 'N/A')}")
```
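The config-building logic above (dump lines 291-320) can be condensed into pure functions, which makes the per-transport variables easy to check without writing a file. This is a minimal sketch, assuming `transport_config` carries the keys used above (`transport`, `host`, `port`, `path`, `sse_path`); `build_env` and `build_config` are hypothetical helpers, not functions in `setup_mcp.py`. The port is converted to a string because process environment variables are strings.

```python
import json
import sys


def build_env(transport_config: dict) -> dict:
    """Map a transport choice to MCP_* environment variables (all values strings)."""
    transport = transport_config["transport"]
    env = {"MCP_TRANSPORT": transport}
    if transport in ("streamable-http", "sse"):
        # Network transports need a bind address; env values must be strings
        env["MCP_HOST"] = str(transport_config["host"])
        env["MCP_PORT"] = str(transport_config["port"])
    if transport == "streamable-http":
        env["MCP_PATH"] = transport_config["path"]
    elif transport == "sse":
        env["MCP_SSE_PATH"] = transport_config["sse_path"]
    return env


def build_config(transport_config: dict) -> dict:
    """Build an mcpServers entry in the same shape the setup script writes to mcp-config.json."""
    return {
        "mcpServers": {
            "word-document-server": {
                "command": sys.executable,
                "args": ["-m", "word_document_server"],
                "env": build_env(transport_config),
            }
        }
    }


if __name__ == "__main__":
    sse = {"transport": "sse", "host": "127.0.0.1", "port": 8000, "sse_path": "/sse"}
    print(json.dumps(build_config(sse), indent=2))
```

Writing the result with `json.dump(config, f, indent=2)`, as the script does, produces the file whose contents get merged into `claude_desktop_config.json`.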
--------------------------------------------------------------------------------
/word_document_server/utils/document_utils.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Document utility functions for Word Document Server.
3 | """
4 | import json
5 | from typing import Dict, List, Any
6 | from docx import Document
7 | from docx.oxml.table import CT_Tbl
8 | from docx.oxml.text.paragraph import CT_P
9 | from docx.oxml.ns import qn
10 | from docx.oxml import OxmlElement
11 |
12 |
13 | def get_document_properties(doc_path: str) -> Dict[str, Any]:
14 | """Get properties of a Word document."""
15 | import os
16 | if not os.path.exists(doc_path):
17 | return {"error": f"Document {doc_path} does not exist"}
18 |
19 | try:
20 | doc = Document(doc_path)
21 | core_props = doc.core_properties
22 |
23 | return {
24 | "title": core_props.title or "",
25 | "author": core_props.author or "",
26 | "subject": core_props.subject or "",
27 | "keywords": core_props.keywords or "",
28 | "created": str(core_props.created) if core_props.created else "",
29 | "modified": str(core_props.modified) if core_props.modified else "",
30 | "last_modified_by": core_props.last_modified_by or "",
31 | "revision": core_props.revision or 0,
32 | "page_count": len(doc.sections), # NOTE: this is the section count; python-docx cannot compute rendered pages
33 | "word_count": sum(len(paragraph.text.split()) for paragraph in doc.paragraphs),
34 | "paragraph_count": len(doc.paragraphs),
35 | "table_count": len(doc.tables)
36 | }
37 | except Exception as e:
38 | return {"error": f"Failed to get document properties: {str(e)}"}
39 |
40 |
41 | def extract_document_text(doc_path: str) -> str:
42 | """Extract all text from a Word document."""
43 | import os
44 | if not os.path.exists(doc_path):
45 | return f"Document {doc_path} does not exist"
46 |
47 | try:
48 | doc = Document(doc_path)
49 | text = []
50 |
51 | for paragraph in doc.paragraphs:
52 | text.append(paragraph.text)
53 |
54 | for table in doc.tables:
55 | for row in table.rows:
56 | for cell in row.cells:
57 | for paragraph in cell.paragraphs:
58 | text.append(paragraph.text)
59 |
60 | return "\n".join(text)
61 | except Exception as e:
62 | return f"Failed to extract text: {str(e)}"
63 |
64 |
65 | def get_document_structure(doc_path: str) -> Dict[str, Any]:
66 | """Get the structure of a Word document."""
67 | import os
68 | if not os.path.exists(doc_path):
69 | return {"error": f"Document {doc_path} does not exist"}
70 |
71 | try:
72 | doc = Document(doc_path)
73 | structure = {
74 | "paragraphs": [],
75 | "tables": []
76 | }
77 |
78 | # Get paragraphs
79 | for i, para in enumerate(doc.paragraphs):
80 | structure["paragraphs"].append({
81 | "index": i,
82 | "text": para.text[:100] + ("..." if len(para.text) > 100 else ""),
83 | "style": para.style.name if para.style else "Normal"
84 | })
85 |
86 | # Get tables
87 | for i, table in enumerate(doc.tables):
88 | table_data = {
89 | "index": i,
90 | "rows": len(table.rows),
91 | "columns": len(table.columns),
92 | "preview": []
93 | }
94 |
95 | # Get sample of table data
96 | max_rows = min(3, len(table.rows))
97 | for row_idx in range(max_rows):
98 | row_data = []
99 | max_cols = min(3, len(table.columns))
100 | for col_idx in range(max_cols):
101 | try:
102 | cell_text = table.cell(row_idx, col_idx).text
103 | row_data.append(cell_text[:20] + ("..." if len(cell_text) > 20 else ""))
104 | except IndexError:
105 | row_data.append("N/A")
106 | table_data["preview"].append(row_data)
107 |
108 | structure["tables"].append(table_data)
109 |
110 | return structure
111 | except Exception as e:
112 | return {"error": f"Failed to get document structure: {str(e)}"}
113 |
114 |
115 | def find_paragraph_by_text(doc, text, partial_match=False):
116 | """
117 | Find paragraphs containing specific text.
118 |
119 | Args:
120 | doc: Document object
121 | text: Text to search for
122 | partial_match: If True, matches paragraphs containing the text; if False, matches exact text
123 |
124 | Returns:
125 | List of paragraph indices that match the criteria
126 | """
127 | matching_paragraphs = []
128 |
129 | for i, para in enumerate(doc.paragraphs):
130 | if partial_match and text in para.text:
131 | matching_paragraphs.append(i)
132 | elif not partial_match and para.text == text:
133 | matching_paragraphs.append(i)
134 |
135 | return matching_paragraphs
136 |
137 |
138 | def find_and_replace_text(doc, old_text, new_text):
139 | """
140 | Find and replace text throughout the document, skipping Table of Contents (TOC) paragraphs. Only matches that lie entirely within a single run are replaced.
141 |
142 | Args:
143 | doc: Document object
144 | old_text: Text to find
145 | new_text: Text to replace with
146 |
147 | Returns:
148 | Number of replacements made
149 | """
150 | count = 0
151 |
152 | # Search in paragraphs
153 | for para in doc.paragraphs:
154 | # Skip TOC paragraphs
155 | if para.style and para.style.name.startswith("TOC"):
156 | continue
157 | if old_text in para.text:
158 | for run in para.runs:
159 | if old_text in run.text:
160 | count += run.text.count(old_text)
161 | run.text = run.text.replace(old_text, new_text)
162 |
163 | # Search in tables
164 | for table in doc.tables:
165 | for row in table.rows:
166 | for cell in row.cells:
167 | for para in cell.paragraphs:
168 | # Skip TOC paragraphs in tables
169 | if para.style and para.style.name.startswith("TOC"):
170 | continue
171 | if old_text in para.text:
172 | for run in para.runs:
173 | if old_text in run.text:
174 | count += run.text.count(old_text)
175 | run.text = run.text.replace(old_text, new_text)
176 |
177 | return count
178 |
179 |
180 | def get_document_xml(doc_path: str) -> str:
181 | """Extract and return the raw XML structure of the Word document (word/document.xml)."""
182 | import os
183 | import zipfile
184 | if not os.path.exists(doc_path):
185 | return f"Document {doc_path} does not exist"
186 | try:
187 | with zipfile.ZipFile(doc_path) as docx_zip:
188 | with docx_zip.open('word/document.xml') as xml_file:
189 | return xml_file.read().decode('utf-8')
190 | except Exception as e:
191 | return f"Failed to extract XML: {str(e)}"
192 |
193 |
194 | def insert_header_near_text(doc_path: str, target_text: str = None, header_title: str = "", position: str = 'after', header_style: str = 'Heading 1', target_paragraph_index: int = None) -> str:
195 | """Insert a header (with specified style) before or after the target paragraph. Specify by text or paragraph index. Skips TOC paragraphs in text search."""
196 | import os
197 | from docx import Document
198 | if not os.path.exists(doc_path):
199 | return f"Document {doc_path} does not exist"
200 | try:
201 | doc = Document(doc_path)
202 | found = False
203 | para = None
204 | if target_paragraph_index is not None:
205 | if target_paragraph_index < 0 or target_paragraph_index >= len(doc.paragraphs):
206 | return f"Invalid target_paragraph_index: {target_paragraph_index}. Document has {len(doc.paragraphs)} paragraphs."
207 | para = doc.paragraphs[target_paragraph_index]
208 | found = True
209 | else:
210 | for i, p in enumerate(doc.paragraphs):
211 | # Skip TOC paragraphs
212 | if p.style and p.style.name.lower().startswith("toc"):
213 | continue
214 | if target_text and target_text in p.text:
215 | para = p
216 | found = True
217 | break
218 | if not found or para is None:
219 | return f"Target paragraph not found (by index or text). (TOC paragraphs are skipped in text search)"
220 | # Save anchor index before insertion
221 | if target_paragraph_index is not None:
222 | anchor_index = target_paragraph_index
223 | else:
224 | anchor_index = None
225 | for i, p in enumerate(doc.paragraphs):
226 | if p is para:
227 | anchor_index = i
228 | break
229 | new_para = doc.add_paragraph(header_title, style=header_style)
230 | if position == 'before':
231 | para._element.addprevious(new_para._element)
232 | else:
233 | para._element.addnext(new_para._element)
234 | doc.save(doc_path)
235 | if anchor_index is not None:
236 | return f"Header '{header_title}' (style: {header_style}) inserted {position} paragraph (index {anchor_index})."
237 | else:
238 | return f"Header '{header_title}' (style: {header_style}) inserted {position} the target paragraph."
239 | except Exception as e:
240 | return f"Failed to insert header: {str(e)}"
241 |
242 |
243 | def insert_line_or_paragraph_near_text(doc_path: str, target_text: str = None, line_text: str = "", position: str = 'after', line_style: str = None, target_paragraph_index: int = None) -> str:
244 | """
245 | Insert a new line or paragraph (with specified or matched style) before or after the target paragraph.
246 | You can specify the target by text (first match) or by paragraph index.
247 | Skips paragraphs whose style name starts with 'TOC' if using text search.
248 | """
249 | import os
250 | from docx import Document
251 | if not os.path.exists(doc_path):
252 | return f"Document {doc_path} does not exist"
253 | try:
254 | doc = Document(doc_path)
255 | found = False
256 | para = None
257 | if target_paragraph_index is not None:
258 | if target_paragraph_index < 0 or target_paragraph_index >= len(doc.paragraphs):
259 | return f"Invalid target_paragraph_index: {target_paragraph_index}. Document has {len(doc.paragraphs)} paragraphs."
260 | para = doc.paragraphs[target_paragraph_index]
261 | found = True
262 | else:
263 | for i, p in enumerate(doc.paragraphs):
264 | # Skip TOC paragraphs
265 | if p.style and p.style.name.lower().startswith("toc"):
266 | continue
267 | if target_text and target_text in p.text:
268 | para = p
269 | found = True
270 | break
271 | if not found or para is None:
272 | return f"Target paragraph not found (by index or text). (TOC paragraphs are skipped in text search)"
273 | # Save anchor index before insertion
274 | if target_paragraph_index is not None:
275 | anchor_index = target_paragraph_index
276 | else:
277 | anchor_index = None
278 | for i, p in enumerate(doc.paragraphs):
279 | if p is para:
280 | anchor_index = i
281 | break
282 | # Determine style: use provided or match target
283 | style = line_style if line_style else (para.style.name if para.style else None)
284 | new_para = doc.add_paragraph(line_text, style=style)
285 | if position == 'before':
286 | para._element.addprevious(new_para._element)
287 | else:
288 | para._element.addnext(new_para._element)
289 | doc.save(doc_path)
290 | if anchor_index is not None:
291 | return f"Line/paragraph inserted {position} paragraph (index {anchor_index}) with style '{style}'."
292 | else:
293 | return f"Line/paragraph inserted {position} the target paragraph with style '{style}'."
294 | except Exception as e:
295 | return f"Failed to insert line/paragraph: {str(e)}"
296 |
297 |
298 | def add_bullet_numbering(paragraph, num_id=1, level=0):
299 | """
300 | Add bullet/numbering XML to a paragraph.
301 |
302 | Args:
303 | paragraph: python-docx Paragraph object
304 | num_id: Numbering definition ID (assumed 1=bullets, 2=numbers; the ID must exist in the document's numbering part)
305 | level: Indentation level (0=first level, 1=second level, etc.)
306 |
307 | Returns:
308 | The modified paragraph
309 | """
310 | # Get or create paragraph properties
311 | pPr = paragraph._element.get_or_add_pPr()
312 |
313 | # Remove existing numPr if any (to avoid duplicates)
314 | existing_numPr = pPr.find(qn('w:numPr'))
315 | if existing_numPr is not None:
316 | pPr.remove(existing_numPr)
317 |
318 | # Create numbering properties element
319 | numPr = OxmlElement('w:numPr')
320 |
321 | # Set indentation level
322 | ilvl = OxmlElement('w:ilvl')
323 | ilvl.set(qn('w:val'), str(level))
324 | numPr.append(ilvl)
325 |
326 | # Set numbering definition ID
327 | numId = OxmlElement('w:numId')
328 | numId.set(qn('w:val'), str(num_id))
329 | numPr.append(numId)
330 |
331 | # Insert in schema order (w:numPr must precede w:spacing, w:ind, etc.) instead of appending at the end
332 | pPr._insert_numPr(numPr)
333 |
334 | return paragraph
335 |
336 |
337 | def insert_numbered_list_near_text(doc_path: str, target_text: str = None, list_items: list = None, position: str = 'after', target_paragraph_index: int = None, bullet_type: str = 'bullet') -> str:
338 | """
339 | Insert a bulleted or numbered list before or after the target paragraph. Specify by text or paragraph index. Skips TOC paragraphs in text search.
340 | Args:
341 | doc_path: Path to the Word document
342 | target_text: Text to search for in paragraphs (optional if using index)
343 | list_items: List of strings, each as a list item
344 | position: 'before' or 'after' (default: 'after')
345 | target_paragraph_index: Optional paragraph index to use as anchor
346 | bullet_type: 'bullet' for bullets (•), 'number' for numbers (1,2,3) (default: 'bullet')
347 | Returns:
348 | Status message
349 | """
350 | import os
351 | from docx import Document
352 | if not os.path.exists(doc_path):
353 | return f"Document {doc_path} does not exist"
354 | try:
355 | doc = Document(doc_path)
356 | found = False
357 | para = None
358 | if target_paragraph_index is not None:
359 | if target_paragraph_index < 0 or target_paragraph_index >= len(doc.paragraphs):
360 | return f"Invalid target_paragraph_index: {target_paragraph_index}. Document has {len(doc.paragraphs)} paragraphs."
361 | para = doc.paragraphs[target_paragraph_index]
362 | found = True
363 | else:
364 | for i, p in enumerate(doc.paragraphs):
365 | # Skip TOC paragraphs
366 | if p.style and p.style.name.lower().startswith("toc"):
367 | continue
368 | if target_text and target_text in p.text:
369 | para = p
370 | found = True
371 | break
372 | if not found or para is None:
373 | return f"Target paragraph not found (by index or text). (TOC paragraphs are skipped in text search)"
374 | # Save anchor index before insertion
375 | if target_paragraph_index is not None:
376 | anchor_index = target_paragraph_index
377 | else:
378 | anchor_index = None
379 | for i, p in enumerate(doc.paragraphs):
380 | if p is para:
381 | anchor_index = i
382 | break
383 | # Determine numbering ID based on bullet_type
384 | num_id = 1 if bullet_type == 'bullet' else 2
385 |
386 | # Use ListParagraph style for proper list formatting
387 | style_name = None
388 | for candidate in ['List Paragraph', 'ListParagraph', 'Normal']:
389 | try:
390 | _ = doc.styles[candidate]
391 | style_name = candidate
392 | break
393 | except KeyError:
394 | continue
395 | if not style_name:
396 | style_name = None # fallback to default
397 |
398 | new_paras = []
399 | for item in (list_items or []):
400 | p = doc.add_paragraph(item, style=style_name)
401 | # Attach w:numPr so Word renders the paragraph as a list item
402 | add_bullet_numbering(p, num_id=num_id, level=0)
403 | new_paras.append(p)
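404 | # Move the new paragraphs next to the anchor while preserving item order:
405 | # addprevious() keeps forward order, addnext() needs the list reversed
406 | if position == 'before':
407 | for p in new_paras: para._element.addprevious(p._element)
408 | else:
409 | for p in reversed(new_paras): para._element.addnext(p._element)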
410 | doc.save(doc_path)
411 | list_type = "bulleted" if bullet_type == 'bullet' else "numbered"
412 | if anchor_index is not None:
413 | return f"{list_type.capitalize()} list with {len(new_paras)} items inserted {position} paragraph (index {anchor_index})."
414 | else:
415 | return f"{list_type.capitalize()} list with {len(new_paras)} items inserted {position} the target paragraph."
416 | except Exception as e:
417 | return f"Failed to insert numbered list: {str(e)}"
418 |
419 |
420 | def is_toc_paragraph(para):
421 | """Return True if the paragraph uses a table-of-contents (TOC) style."""
422 | return bool(para.style and para.style.name.upper().startswith("TOC"))
423 |
424 |
425 | def is_heading_paragraph(para):
426 | """Return True if the paragraph uses a heading style (Heading 1, Heading 2, etc.)."""
427 | return bool(para.style and para.style.name.lower().startswith("heading"))
428 |
429 |
430 | # --- Helper: Get style name from a <w:p> element ---
431 | def get_paragraph_style(el):
432 | from docx.oxml.ns import qn
433 | pPr = el.find(qn('w:pPr'))
434 | if pPr is not None:
435 | pStyle = pPr.find(qn('w:pStyle'))
436 | if pStyle is not None and pStyle.get(qn('w:val')) is not None: # lxml keys attributes by Clark name, not 'w:val'
437 | return pStyle.get(qn('w:val'))
438 | return None
439 |
440 | # --- Main: Delete everything under a header until next heading/TOC ---
441 | def delete_block_under_header(doc, header_text):
442 | """
443 | Remove the body paragraphs after the header (matched by text) and before the next heading/TOC paragraph (matched by style).
444 | Tables in that range are not removed. Returns: (header_element, paragraphs_removed)
445 | """
446 | # Find the header paragraph by text (like delete_paragraph finds by index)
447 | header_para = None
448 | header_idx = None
449 |
450 | for i, para in enumerate(doc.paragraphs):
451 | if para.text.strip().lower() == header_text.strip().lower() and not is_toc_paragraph(para):
452 | header_para = para
453 | header_idx = i
454 | break
455 |
456 | if header_para is None:
457 | return None, 0
458 |
459 | # Find the next heading/TOC paragraph to determine the end of the block
460 | end_idx = None
461 | for i in range(header_idx + 1, len(doc.paragraphs)):
462 | para = doc.paragraphs[i]
463 | if para.style and para.style.name.lower().startswith(('heading', 'título', 'toc')):
464 | end_idx = i
465 | break
466 |
467 | # If no next heading found, delete until end of document
468 | if end_idx is None:
469 | end_idx = len(doc.paragraphs)
470 |
471 | # Remove paragraphs by index (like delete_paragraph does)
472 | removed_count = 0
473 | for _ in range(end_idx - header_idx - 1):
474 | # doc.paragraphs shrinks after each removal, so the next paragraph to delete is always at header_idx + 1
475 | para = doc.paragraphs[header_idx + 1]
476 | p = para._p
477 | p.getparent().remove(p)
478 | removed_count += 1
479 |
480 | return header_para._p, removed_count
481 |
482 | # --- Usage in replace_paragraph_block_below_header ---
483 | def replace_paragraph_block_below_header(
484 | doc_path: str,
485 | header_text: str,
486 | new_paragraphs: list,
487 | detect_block_end_fn=None,
488 | new_paragraph_style: str = None
489 | ) -> str:
490 | """
491 | Replace the paragraphs below a header (matched by text), up to the next heading/TOC paragraph (matched by style).
492 | """
493 | from docx import Document
494 | import os
495 | if not os.path.exists(doc_path):
496 | return f"Document {doc_path} not found."
497 |
498 | doc = Document(doc_path)
499 |
500 | # Find the header paragraph first
501 | header_para = None
502 | header_idx = None
503 | for i, para in enumerate(doc.paragraphs):
504 | para_text = para.text.strip().lower()
505 | is_toc = is_toc_paragraph(para)
506 | if para_text == header_text.strip().lower() and not is_toc:
507 | header_para = para
508 | header_idx = i
509 | break
510 |
511 | if header_para is None:
512 | return f"Header '{header_text}' not found in document."
513 |
514 | # Delete everything under the header using the same document instance
515 | header_el, removed_count = delete_block_under_header(doc, header_text)
516 |
517 | # Now insert new paragraphs after the header (which should still be in the document)
518 | style_to_use = new_paragraph_style or "Normal"
519 |
520 | # The header paragraph itself is not deleted, so insert the new paragraphs directly after it
521 | current_para = header_para
522 | for text in new_paragraphs:
523 | new_para = doc.add_paragraph(text, style=style_to_use)
524 | current_para._element.addnext(new_para._element)
525 | current_para = new_para
526 |
527 | doc.save(doc_path)
528 | return f"Replaced content under '{header_text}' with {len(new_paragraphs)} paragraph(s), style: {style_to_use}, removed {removed_count} elements."
529 |
530 |
531 | def replace_block_between_manual_anchors(
532 | doc_path: str,
533 | start_anchor_text: str,
534 | new_paragraphs: list,
535 | end_anchor_text: str = None,
536 | match_fn=None,
537 | new_paragraph_style: str = None
538 | ) -> str:
539 | """
540 | Replace all content (paragraphs, tables, etc.) between start_anchor_text and end_anchor_text (or next logical header if not provided).
541 | If end_anchor_text is None, deletes until the next paragraph containing a run with bold, all-caps, or an explicit font size (w:sz), or to the end of the document.
542 | Inserts new_paragraphs after the start anchor.
543 | """
544 | from docx import Document
545 | import os
546 | if not os.path.exists(doc_path):
547 | return f"Document {doc_path} not found."
548 | doc = Document(doc_path)
549 | body = doc.element.body
550 | elements = list(body)
551 | start_idx = None
552 | end_idx = None
553 | # Find start anchor
554 | for i, el in enumerate(elements):
555 | if el.tag == qn('w:p'): # CT_P.tag is lxml's tag descriptor, not the tag string
556 | p_text = "".join([node.text or '' for node in el.iter() if node.tag.endswith('}t')]).strip()
557 | if match_fn:
558 | if match_fn(p_text, el):
559 | start_idx = i
560 | break
561 | elif p_text == start_anchor_text.strip():
562 | start_idx = i
563 | break
564 | if start_idx is None:
565 | return f"Start anchor '{start_anchor_text}' not found."
566 | # Find end anchor
567 | if end_anchor_text:
568 | for i in range(start_idx + 1, len(elements)):
569 | el = elements[i]
570 | if el.tag == qn('w:p'):
571 | p_text = "".join([node.text or '' for node in el.iter() if node.tag.endswith('}t')]).strip()
572 | if match_fn:
573 | if match_fn(p_text, el, is_end=True):
574 | end_idx = i
575 | break
576 | elif p_text == end_anchor_text.strip():
577 | end_idx = i
578 | break
579 | else:
580 | # Heuristic: next paragraph with a bold, all-caps, or explicitly sized run, or end of document
581 | for i in range(start_idx + 1, len(elements)):
582 | el = elements[i]
583 | if el.tag == qn('w:p'):
584 | # Check for bold, all caps, or font size
585 | runs = [node for node in el.iter() if node.tag.endswith('}r')]
586 | for run in runs:
587 | rpr = run.find(qn('w:rPr'))
588 | if rpr is not None:
589 | if rpr.find(qn('w:b')) is not None or rpr.find(qn('w:caps')) is not None or rpr.find(qn('w:sz')) is not None:
590 | end_idx = i
591 | break
592 | if end_idx is not None:
593 | break
594 | # Mark elements for removal
595 | to_remove = []
596 | for i in range(start_idx + 1, end_idx if end_idx is not None else len(elements)):
597 | if elements[i].tag != qn('w:sectPr'): to_remove.append(elements[i]) # never delete the body's final section properties
598 | for el in to_remove:
599 | body.remove(el)
600 | doc.save(doc_path)
601 | # Reload and find start anchor for insertion
602 | doc = Document(doc_path)
603 | paras = doc.paragraphs
604 | anchor_idx = None
605 | for i, para in enumerate(paras):
606 | if para.text.strip() == start_anchor_text.strip():
607 | anchor_idx = i
608 | break
609 | if anchor_idx is None:
610 | return f"Start anchor '{start_anchor_text}' not found after deletion (unexpected)."
611 | anchor_para = paras[anchor_idx]
612 | style_to_use = new_paragraph_style or "Normal"
613 | for text in new_paragraphs:
614 | new_para = doc.add_paragraph(text, style=style_to_use)
615 | anchor_para._element.addnext(new_para._element)
616 | anchor_para = new_para
617 | doc.save(doc_path)
618 | return f"Replaced content between '{start_anchor_text}' and '{end_anchor_text or 'next logical header'}' with {len(new_paragraphs)} paragraph(s), style: {style_to_use}, removed {len(to_remove)} elements."
619 |
```
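`find_and_replace_text` above only edits a run when the whole of `old_text` sits inside that one run, so a phrase that Word has stored across several runs (common once text has been partially reformatted or edited) is skipped even though `old_text in para.text` is true. This is a minimal sketch of one workaround, assuming it is acceptable for each replacement to take the formatting of the run where the match starts. `replace_across_runs` and `replace_in_paragraph` are hypothetical helpers, not part of this repository; the second relies only on python-docx's `Paragraph.runs` and the `Run.text` setter.

```python
def replace_across_runs(run_texts, old, new):
    """Replace every non-overlapping `old` in the joined run texts.

    Returns (new_run_texts, count). Each replacement is placed in the run where
    the match starts; the matched characters in later runs are dropped.
    """
    full = "".join(run_texts)
    # owner[i] is the index of the run holding character i of the joined text
    owner = [idx for idx, text in enumerate(run_texts) for _ in text]
    out = [""] * len(run_texts)
    count = 0
    i = 0
    while i < len(full):
        if old and full.startswith(old, i):
            out[owner[i]] += new  # replacement inherits the first matched run
            i += len(old)
            count += 1
        else:
            out[owner[i]] += full[i]
            i += 1
    return out, count


def replace_in_paragraph(paragraph, old, new):
    """Apply replace_across_runs to a python-docx Paragraph in place; returns the count."""
    runs = paragraph.runs
    texts, count = replace_across_runs([run.text for run in runs], old, new)
    if count:
        for run, text in zip(runs, texts):
            run.text = text
    return count
```

Runs emptied by a match are left in place as empty runs, so the formatting of the surrounding text is untouched; a stricter version could remove them.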