This is page 1 of 4. Use http://codebase.md/ilikepizza2/qa-mcp?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .env.example
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── main.py
├── mcp_server.py
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── agents
│   │   ├── __init__.py
│   │   ├── auth_agent.py
│   │   ├── crawler_agent.py
│   │   ├── js_utils
│   │   │   └── xpathgenerator.js
│   │   └── recorder_agent.py
│   ├── browser
│   │   ├── __init__.py
│   │   ├── browser_controller.py
│   │   └── panel
│   │       └── panel.py
│   ├── core
│   │   ├── __init__.py
│   │   └── task_manager.py
│   ├── dom
│   │   ├── buildDomTree.js
│   │   ├── history
│   │   │   ├── service.py
│   │   │   └── view.py
│   │   ├── service.py
│   │   └── views.py
│   ├── execution
│   │   ├── __init__.py
│   │   └── executor.py
│   ├── llm
│   │   ├── __init__.py
│   │   ├── clients
│   │   │   ├── azure_openai_client.py
│   │   │   ├── gemini_client.py
│   │   │   └── openai_client.py
│   │   └── llm_client.py
│   ├── security
│   │   ├── __init__.py
│   │   ├── nuclei_scanner.py
│   │   ├── semgrep_scanner.py
│   │   ├── utils.py
│   │   └── zap_scanner.py
│   └── utils
│       ├── __init__.py
│       ├── image_utils.py
│       └── utils.py
└── test_schema.md
```

# Files

--------------------------------------------------------------------------------
/.env.example:
--------------------------------------------------------------------------------

```
1 | LLM_API_KEY="YOUR_LLM_API_KEY"
2 | LLM_BASE_URL="LLM_BASE_URL"
3 | LLM_MODEL="LLM_MODEL"
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------

```
 1 | .env
 2 | __pycache__/
 3 | *.pyc
 4 | /venv/
 5 | /.venv
 6 | /output
 7 | ignore-*
 8 | .DS_Store
 9 | doc
10 | stitched.png
11 | visual_baselines/
12 | /results/
```

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

```markdown
  1 | # VibeShift: The Security Engineer for Vibe Coders
  2 | 
  3 | **VibeShift** is an intelligent security agent designed to integrate seamlessly with AI coding assistants (like Cursor, GitHub Copilot, Claude Code, etc.). It acts as your automated security engineer, analyzing code generated by AI, identifying vulnerabilities, and facilitating AI-driven remediation *before* insecure code makes it to your codebase. It leverages the **MCP (Model Context Protocol)** for smooth interaction within your existing AI coding environment.
  4 | 
  5 | <a href="https://www.producthunt.com/posts/vibeshift-mcp?embed=true&utm_source=badge-featured&utm_medium=badge&utm_source=badge-vibeshift&#0045;mcp" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=966186&theme=light&t=1747654611925" alt="VibeShift&#0032;MCP - Get&#0032;secure&#0044;&#0032;working&#0032;code&#0032;in&#0032;1&#0032;shot | Product Hunt" style="width: 115px; height: 25px;" width="250" height="54" /></a>
  6 | [![Twitter Follow](https://img.shields.io/twitter/follow/Omiiee_Chan?style=social)](https://x.com/Omiiee_Chan)
  7 | [![Twitter Follow](https://img.shields.io/twitter/follow/_gauravkabra_?style=social)](https://x.com/_gauravkabra_)
  8 | ![](https://img.shields.io/github/stars/groundng/vibeshift)
  9 | 
 10 | 
 11 | **The Problem:** AI coding assistants accelerate development dramatically, but they can also generate code with subtle or overt security vulnerabilities. Manually reviewing all AI-generated code for security flaws is slow, error-prone, and doesn't scale with the speed of AI development. This "vibe-driven development" can leave applications exposed.
 12 | 
 13 | **The Solution: GroundNG's VibeShift** bridges this critical security gap by enabling your AI coding assistant to:
 14 | 
 15 | 1.  **Automatically Analyze AI-Generated Code:** As code is generated or modified by an AI assistant, VibeShift can be triggered to perform security analysis using a suite of tools (SAST, DAST components) and AI-driven checks.
 16 | 2.  **Identify Security Vulnerabilities:** Pinpoints common and complex vulnerabilities (e.g., XSS, SQLi, insecure configurations, logic flaws) within the AI-generated snippets or larger code blocks.
 17 | 3.  **Facilitate AI-Driven Remediation:** Provides detailed feedback and vulnerability information directly to the AI coding assistant, enabling it to suggest or even automatically apply fixes.
 18 | 4.  **Create a Security Feedback Loop:** Ensures that developers and their AI assistants are immediately aware of potential security risks, allowing for rapid correction and learning.
 19 | 
 20 | This creates a "shift-left" security paradigm for AI-assisted coding, embedding security directly into the development workflow and helping to ship more secure code, faster.
 21 | 
 22 | # Demo (Click to play these videos)
 23 | [![Demo](https://img.youtube.com/vi/bN_RgQGa8B0/maxresdefault.jpg)](https://www.youtube.com/watch?v=bN_RgQGa8B0)
 24 | [![Click to play](https://img.youtube.com/vi/wCbCUCqjnXQ/maxresdefault.jpg)](https://youtu.be/wCbCUCqjnXQ)
 25 | 
 26 | 
 27 | ## Features
 28 | 
 29 | *   **MCP Integration:** Seamlessly integrates with Cursor, Windsurf, GitHub Copilot, and Roo Code.
 30 | *   **Automated Security Scanning:** Triggers on AI code generation/modification to perform:
 31 |     *   **Static Code Analysis (SAST):** Integrates tools like Semgrep to find vulnerabilities in source code.
 32 |     *   **Dynamic Analysis (DAST Primitives):** Can invoke tools like Nuclei or ZAP for checks against running components (where applicable).
 33 | *   **AI-Assisted Test Recording:** Generate Playwright-based test scripts from natural language descriptions (in automated mode).
 34 | *   **Deterministic Test Execution:** Run recorded JSON test files reliably using Playwright.
 35 | *   **AI-Powered Test Discovery:** Crawl websites and leverage any OpenAI-compatible LLM to suggest test steps for discovered pages.
 36 | *   **Regression Testing:** Easily run existing test suites to catch regressions.
 37 | *   **Automated Feedback Loop:** Execution results (including failures, screenshots, console logs) are returned, providing direct feedback to the AI assistant.
 38 | *   **Self-Healing:** Existing tests self-heal when the application code changes; no manual updates needed.
 39 | *   **UI Tests:** Supports UI checks that Playwright can't express directly, e.g. `Check if the text is overflowing in the div`.
 40 | *   **Visual Regression Testing:** Combines a traditional pixelmatch comparison with a vision-LLM approach.
 41 | 
 42 | ## How it Works
 43 | 
 44 | ```
 45 | +-------------+       +-----------------+       +---------------------+       +-----------------+       +-------------+
 46 | |    User     | ----> | AI Coding Agent | ----> |     MCP Server      | ----> | Scan, test, exec| ----> | Browser     |
 47 | | (Developer) |       | (e.g., Copilot) |       | (mcp_server.py)     |       | (SAST, Record)  |       | (Playwright)|
 48 | +-------------+       +-----------------+       +---------------------+       +-----------------+       +-------------+
 49 |       ^                                                  |                            |                     |
 50 |       |--------------------------------------------------+----------------------------+---------------------+
 51 |                                       [Test Results / Feedback]
 52 | ```
 53 | 
 54 | 1.  **User:** Prompts their AI coding assistant (e.g., "Test this repository for security vulnerabilities", "Record a test for the login flow", "Run the regression test 'test_login.json'").
 55 | 2.  **AI Coding Agent:** Recognizes the intent and uses MCP to call the appropriate tool provided by the `MCP Server`.
 56 | 3.  **MCP Server:** Routes the request to the corresponding function (`get_security_scan`, `record_test_flow`, `run_regression_test`, `discover_test_flows`, `list_recorded_tests`).
 57 | 4.  **VibeShift Agent:**
 58 |     *   **Traditional Security Scan:**  Invokes **Static Analysis Tools** (e.g., Semgrep) on the code.
 59 |     *   **Recording:** The `WebAgent` (in automated mode) interacts with the LLM to plan steps, controls the browser via `BrowserController` (Playwright), processes HTML/Vision, and saves the resulting test steps to a JSON file in the `output/` directory.
 60 |     *   **Execution:** The `TestExecutor` loads the specified JSON test file, uses `BrowserController` to interact with the browser according to the recorded steps, and captures results, screenshots, and console logs.
 61 |     *   **Discovery:** The `CrawlerAgent` uses `BrowserController` and `LLMClient` to crawl pages and suggest test steps.
 62 | 5.  **Browser:** Playwright drives the actual browser interaction.
 63 | 6.  **Feedback Loop:**
 64 |     *   The comprehensive security report (vulnerabilities, locations, suggestions) is returned through the MCP server to the **AI Coding Agent**.
 65 |     *   The AI Coding Agent presents this to the developer and can use the information to **suggest or apply fixes**.
 66 |     *   The goal is a rapid cycle of code generation -> security scan -> AI-driven fix -> re-scan (optional).
 67 | 
 68 | ## Getting Started
 69 | 
 70 | ### Prerequisites
 71 | 
 72 | *   Python 3.10+
 73 | *   Access to any LLM (Gemini 2.0 Flash works best among the free options in my testing)
 74 | *   MCP installed (`pip install mcp[cli]`)
 75 | *   Playwright browsers installed (`patchright install`)
 76 | 
 77 | ### Installation
 78 | 
 79 | 1.  **Clone the repository:**
 80 |     ```bash
 81 |     git clone https://github.com/GroundNG/VibeShift
 82 |     cd VibeShift
 83 |     ```
 84 | 2.  **Create a virtual environment (recommended):**
 85 |     ```bash
 86 |     python -m venv venv
 87 |     source venv/bin/activate # Linux/macOS
 88 |     # venv\Scripts\activate # Windows
 89 |     ```
 90 | 3.  **Install dependencies:**
 91 |     ```bash
 92 |     pip install -r requirements.txt
 93 |     ```
 94 | 4.  **Install Playwright browsers:**
 95 |     ```bash
 96 |     patchright install --with-deps # Installs browsers and OS dependencies
 97 |     ```
 98 | 
 99 | ### Configuration
100 | 
101 | 1.  Rename `.env.example` to `.env` in the project root directory.
102 | 2.  Add your LLM API key and other necessary details:
103 |     ```dotenv
104 |     # .env
105 |     LLM_API_KEY="YOUR_LLM_API_KEY"
106 |     ```
107 |     *   Replace `YOUR_LLM_API_KEY` with your actual key.
108 | 
109 | ### Adding the MCP Server
110 | Add this to your MCP config:
111 | ```json
112 | {
113 |   "mcpServers": {
114 |     "VibeShift":{
115 |       "command": "uv",
116 |       "args": ["--directory","path/to/cloned_repo", "run", "mcp_server.py"]
117 |     }
118 |   }
119 | }
120 | ```
121 | 
122 | 
123 | Keep this server running while you interact with your AI coding assistant.
124 | 
125 | ## Usage
126 | 
127 | Interact with the agent through your MCP-enabled AI coding assistant using natural language.
128 | 
129 | **Examples:**
130 | *   **Security Analysis:**
131 |      *   **Automatic (Preferred):** VibeShift automatically analyzes code snippets generated or significantly modified by the AI assistant. 
132 |      *   **Explicit Commands:**
133 |           > "VibeShift, analyze this function for security vulnerabilities."
134 |           > "Ask VibeShift to check the Python code Copilot just wrote for SQL injection."
135 |           > "Secure the generated code with VibeShift before committing."
136 | 
137 | *   **Record a Test:**
138 |     > "Record a test: go to https://practicetestautomation.com/practice-test-login/, type 'student' into the username field, type 'Password123' into the password field, click the submit button, and verify the text 'Congratulations student' is visible."
139 |     *   *(The agent will perform these actions automatically and save a `test_....json` file in `output/`)*
140 | 
141 | *   **Execute a Test:**
142 |     > "Run the regression test `output/test_practice_test_login_20231105_103000.json`"
143 |     *   *(The agent will execute the steps in the specified file and report PASS/FAIL status with errors and details.)*
144 | 
145 | *   **Discover Test Steps:**
146 |     > "Discover potential test steps starting from https://practicetestautomation.com/practice/"
147 |     *   *(The agent will crawl the site, analyze pages, and return suggested test steps for each.)*
148 | 
149 | *   **List Recorded Tests:**
150 |     > "List the available recorded web tests."
151 |     *   *(The agent will return a list of `.json` files found in the `output/` directory.)*
152 | 
153 | **Output:**
154 | * **Security Reports:** Returned to the AI coding assistant, detailing:
155 |     *   Vulnerability type (e.g., CWE, OWASP category)
156 |     *   Location in code
157 |     *   Severity
158 |     *   Evidence / Explanation
159 |     *   Suggested remediations (often for the AI to action)
160 | *   **Recorded Tests:** Saved as JSON files in the `output/` directory (see `test_schema.md` for format).
161 | *   **Execution Results:** Returned as a JSON object summarizing the run (status, errors, evidence paths). Full results are also saved to `output/execution_result_....json`.
162 | *   **Discovery Results:** Returned as a JSON object with discovered URLs and suggested steps. Full results saved to `output/discovery_results_....json`.
163 | 
164 | 
165 | ## Inspiration
166 | * **[Browser Use](https://github.com/browser-use/browser-use/)**: The DOM context tree generation is heavily inspired by their work and modified to accommodate static/dynamic/visual elements. Special thanks to them for their contribution to open source.
167 | * **[Semgrep](https://github.com/returntocorp/semgrep)**: A powerful open-source static analysis tool we leverage.
168 | * **[Nuclei](https://github.com/projectdiscovery/nuclei)**: For template-based dynamic scanning capabilities.
169 | 
170 |   
171 | ## Contributing
172 | 
173 | We welcome contributions! Please see `CONTRIBUTING.md` for details on how to get started, report issues, and submit pull requests. We're particularly interested in:
174 | 
175 | *   New security analyzer integrations.
176 | 
177 | ## License
178 | 
179 | This project is licensed under the [Apache License 2.0](LICENSE).
180 | 
181 | 
182 | 
```
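
The README names five MCP tools (`get_security_scan`, `record_test_flow`, `run_regression_test`, `discover_test_flows`, `list_recorded_tests`) exposed by `mcp_server.py`. As a hedged sketch of how such tools are typically registered with the `mcp` package's FastMCP API — the tool body below is illustrative, not the repository's actual implementation:

```python
# Minimal sketch, assuming FastMCP from the `mcp` package; the tool name comes
# from the README, the body is a stand-in for the real implementation.
import glob

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("VibeShift")

@mcp.tool()
def list_recorded_tests() -> list[str]:
    """List recorded .json test files in the output/ directory."""
    return sorted(glob.glob("output/test_*.json"))

if __name__ == "__main__":
    mcp.run()  # stdio transport, matching the "uv ... run mcp_server.py" config above
```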

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------

```markdown
 1 | # Contributing to the AI Web Testing Agent
 2 | 
 3 | First off, thank you for considering contributing! This project aims to improve the development workflow by integrating automated web testing directly with AI coding assistants. Your contributions can make a real difference.
 4 | 
 5 | ## Code of Conduct
 6 | 
 7 | This project and everyone participating in it is governed by the [Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior. 
 8 | 
 9 | ## How Can I Contribute?
10 | 
11 | There are many ways to contribute, from reporting bugs to implementing new features.
12 | 
13 | ### Reporting Bugs
14 | 
15 | *   Ensure the bug was not already reported by searching on GitHub under [Issues](https://github.com/Ilikepizza2/GroundNG/issues). 
16 | *   If you're unable to find an open issue addressing the problem, [open a new one](https://github.com/Ilikepizza2/GroundNG/issues/new).  Be sure to include a **title and clear description**, as much relevant information as possible, and a **code sample or an executable test case** demonstrating the expected behavior that is not occurring.
17 | *   Include details about your environment (OS, Python version, library versions).
18 | 
19 | ### Suggesting Enhancements
20 | 
21 | *   Open a new issue to suggest an enhancement. Provide a clear description of the enhancement and its potential benefits.
22 | *   Explain why this enhancement would be useful and provide examples if possible.
23 | 
24 | ### Pull Requests
25 | 
26 | 1.  **Fork the repository** on GitHub.
 27 | 2.  **Clone your fork** locally: `git clone git@github.com:Ilikepizza2/GroundNG.git`
28 | 3.  **Create a virtual environment** and install dependencies:
29 |     ```bash
30 |     cd <repository-name>
31 |     python -m venv venv
32 |     source venv/bin/activate # Or venv\Scripts\activate on Windows
33 |     pip install -r requirements.txt
34 |     playwright install --with-deps
35 |     ```
36 | 4.  **Create a topic branch** for your changes: `git checkout -b feature/your-feature-name` or `git checkout -b fix/your-bug-fix`.
37 | 5.  **Make your changes.** Write clean, readable code. Add comments where necessary.
38 | 6.  **Add tests** for your changes. Ensure existing tests pass. (See Testing section below).
39 | 7.  **Format your code** (e.g., using Black): `black .`
40 | 8.  **Commit your changes** using a descriptive commit message. Consider using [Conventional Commits](https://www.conventionalcommits.org/).
41 | 9.  **Push your branch** to your fork on GitHub: `git push origin feature/your-feature-name`.
42 | 10. **Open a Pull Request** to the `main` branch of the original repository. Provide a clear description of your changes and link any relevant issues.
43 | 
44 | ## Development Setup
45 | 
46 | *   Follow the installation steps in the [README.md](README.md).
47 | *   Ensure you have a `.env` file set up with your LLM API key for running the agent components that require it.
48 | *   Use the `mcp dev mcp_server.py` command to run the server locally for testing MCP interactions.
49 | 
50 | ## Testing
51 | 
52 | *   This project uses `pytest`. Run tests using:
53 |     ```bash
54 |     pytest
55 |     ```
56 | *   Please add tests for any new features or bug fixes. Place tests in a `tests/` directory (if not already present).
57 | *   Ensure all tests pass before submitting a pull request.
58 | 
59 | ## Code Style
60 | 
61 | *   Please follow PEP 8 guidelines.
62 | *   We recommend using [Black](https://github.com/psf/black) for code formatting. Run `black .` before committing.
63 | *   Use clear and descriptive variable and function names.
64 | *   Add docstrings to modules, classes, and functions.
65 | 
66 | ## Questions?
67 | 
68 | If you have questions about contributing or the project in general, feel free to open an issue on GitHub.
69 | 
70 | Thank you for your contribution!
```

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Contributor Covenant Code of Conduct
  2 | 
  3 | ## Our Pledge
  4 | 
  5 | We as members, contributors, and leaders pledge to make participation in our
  6 | community a harassment-free experience for everyone, regardless of age, body
  7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
  8 | identity and expression, level of experience, education, socio-economic status,
  9 | nationality, personal appearance, race, religion, or sexual identity
 10 | and orientation.
 11 | 
 12 | We pledge to act and interact in ways that contribute to an open, welcoming,
 13 | diverse, inclusive, and healthy community.
 14 | 
 15 | ## Our Standards
 16 | 
 17 | Examples of behavior that contributes to a positive environment for our
 18 | community include:
 19 | 
 20 | *   Demonstrating empathy and kindness toward other people
 21 | *   Being respectful of differing opinions, viewpoints, and experiences
 22 | *   Giving and gracefully accepting constructive feedback
 23 | *   Accepting responsibility and apologizing to those affected by our mistakes,
 24 |     and learning from the experience
 25 | *   Focusing on what is best not just for us as individuals, but for the
 26 |     overall community
 27 | 
 28 | Examples of unacceptable behavior include:
 29 | 
 30 | *   The use of sexualized language or imagery, and sexual attention or
 31 |     advances of any kind
 32 | *   Trolling, insulting or derogatory comments, and personal or political attacks
 33 | *   Public or private harassment
 34 | *   Publishing others' private information, such as a physical or email
 35 |     address, without their explicit permission
 36 | *   Other conduct which could reasonably be considered inappropriate in a
 37 |     professional setting
 38 | 
 39 | ## Enforcement Responsibilities
 40 | 
 41 | Community leaders are responsible for clarifying and enforcing our standards of
 42 | acceptable behavior and will take appropriate and fair corrective action in
 43 | response to any behavior that they deem inappropriate, threatening, offensive,
 44 | or harmful.
 45 | 
 46 | Community leaders have the right and responsibility to remove, edit, or reject
 47 | comments, commits, code, wiki edits, issues, and other contributions that are
 48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
 49 | decisions when appropriate.
 50 | 
 51 | ## Scope
 52 | 
 53 | This Code of Conduct applies within all community spaces, and also applies when
 54 | an individual is officially representing the community in public spaces.
 55 | Examples of representing our community include using an official e-mail address,
 56 | posting via an official social media account, or acting as an appointed
 57 | representative at an online or offline event.
 58 | 
 59 | ## Enforcement
 60 | 
 61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
 62 | reported to the community leaders responsible for enforcement (currently: no one available :( ).
 63 | All complaints will be reviewed and investigated promptly and fairly.
 64 | 
 65 | All community leaders are obligated to respect the privacy and security of the
 66 | reporter of any incident.
 67 | 
 68 | ## Enforcement Guidelines
 69 | 
 70 | Community leaders will follow these Community Impact Guidelines in determining
 71 | the consequences for any action they deem in violation of this Code of Conduct:
 72 | 
 73 | ### 1. Correction
 74 | 
 75 | **Community Impact**: Use of inappropriate language or other behavior deemed
 76 | unprofessional or unwelcome in the community.
 77 | 
 78 | **Consequence**: A private, written warning from community leaders, providing
 79 | clarity around the nature of the violation and an explanation of why the
 80 | behavior was inappropriate. A public apology may be requested.
 81 | 
 82 | ### 2. Warning
 83 | 
 84 | **Community Impact**: A violation through a single incident or series
 85 | of actions.
 86 | 
 87 | **Consequence**: A warning with consequences for continued behavior. No
 88 | interaction with the people involved, including unsolicited interaction with
 89 | those enforcing the Code of Conduct, for a specified period of time. This
 90 | includes avoiding interactions in community spaces as well as external channels
 91 | like social media. Violating these terms may lead to a temporary or
 92 | permanent ban.
 93 | 
 94 | ### 3. Temporary Ban
 95 | 
 96 | **Community Impact**: A serious violation of community standards, including
 97 | sustained inappropriate behavior.
 98 | 
 99 | **Consequence**: A temporary ban from any sort of interaction or public
100 | communication with the community for a specified period of time. No public or
101 | private interaction with the people involved, including unsolicited interaction
102 | with those enforcing the Code of Conduct, is allowed during this period.
103 | Violating these terms may lead to a permanent ban.
104 | 
105 | ### 4. Permanent Ban
106 | 
107 | **Community Impact**: Demonstrating a pattern of violation of community
108 | standards, including sustained inappropriate behavior, harassment of an
109 | individual, or aggression toward or disparagement of classes of individuals.
110 | 
111 | **Consequence**: A permanent ban from any sort of public interaction within
112 | the community.
113 | 
114 | ## Attribution
115 | 
116 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
117 | version 2.1, available at
118 | [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
119 | 
120 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
121 | enforcement ladder][mozilla coc].
122 | 
123 | For answers to common questions about this code of conduct, see the FAQ at
124 | [https://www.contributor-covenant.org/faq][faq]. Translations are available at
125 | [https://www.contributor-covenant.org/translations][translations].
126 | 
127 | [homepage]: https://www.contributor-covenant.org
128 | [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
129 | [mozilla coc]: https://github.com/mozilla/diversity
130 | [faq]: https://www.contributor-covenant.org/faq
131 | [translations]: https://www.contributor-covenant.org/translations
```

--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/agents/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/browser/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/core/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/execution/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/llm/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/security/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/src/utils/__init__.py:
--------------------------------------------------------------------------------

```python
1 | 
```

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------

```
 1 | playwright
 2 | google-genai
 3 | beautifulsoup4
 4 | python-dotenv
 5 | Pillow
 6 | pydantic>=2.0  
 7 | pytest
 8 | pytest-html
 9 | mcp[cli]
10 | pixelmatch>=0.3.0
11 | openai
12 | patchright
13 | requests
14 | semgrep
```

--------------------------------------------------------------------------------
/src/utils/utils.py:
--------------------------------------------------------------------------------

```python
 1 | # /src/utils/utils.py
 2 | import os
 3 | from dotenv import load_dotenv
 4 | 
 5 | def load_api_key():
 6 |     """Loads the llm API key from .env file."""
 7 |     load_dotenv()
 8 |     api_key = os.getenv("LLM_API_KEY")
 9 |     if not api_key:
10 |         raise ValueError("LLM_API_KEY not found in .env file or environment variables.")
11 |     return api_key
12 | 
13 | def load_api_base_url():
14 |     """Loads the API base url from .env file."""
15 |     load_dotenv()
16 |     base_url = os.getenv("LLM_BASE_URL")
17 |     if not base_url:
18 |         raise ValueError("LLM_BASE_URL not found in .env file or environment variables.")
19 |     return base_url
20 | 
21 | def load_api_version():
22 |     """Loads the API Version from .env file."""
23 |     load_dotenv()
24 |     api_version = os.getenv("LLM_API_VERSION")
25 |     if not api_version:
26 |         raise ValueError("LLM_API_VERSION not found in .env file or environment variables.")
27 |     return api_version
28 | 
29 | def load_llm_model():
30 |     """Loads the llm model from .env file."""
31 |     load_dotenv()
32 |     llm_model = os.getenv("LLM_MODEL")
33 |     if not llm_model:
34 |         raise ValueError("LLM_MODEL not found in .env file or environment variables.")
35 |     return llm_model
36 | 
37 | def load_llm_timeout():
38 |     """Loads the default llm model timeout from .env file."""
39 |     load_dotenv()
40 |     llm_timeout = os.getenv("LLM_TIMEOUT")
41 |     if not llm_timeout:
42 |         raise ValueError("LLM_TIMEOUT not found in .env file or environment variables.")
43 |     return llm_timeout
```
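
These loaders return raw strings from the environment, so numeric settings like `LLM_TIMEOUT` need an explicit cast at the call site. A small usage sketch (assumes a populated `.env`):

```python
# Illustrative usage of the loaders above.
from src.utils.utils import load_api_key, load_llm_model, load_llm_timeout

api_key = load_api_key()                  # raises ValueError if LLM_API_KEY is unset
model = load_llm_model()                  # e.g. "gemini-2.0-flash"
timeout_secs = float(load_llm_timeout())  # env values are strings, so cast
```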

--------------------------------------------------------------------------------
/src/agents/js_utils/xpathgenerator.js:
--------------------------------------------------------------------------------

```javascript
 1 | function generateXPathForElement(currentElement) {
 2 |     function getElementPosition(currentElement) {
 3 |         if (!currentElement.parentElement) return 0;
 4 |         const tagName = currentElement.nodeName.toLowerCase();
 5 |         const siblings = Array.from(currentElement.parentElement.children)
 6 |             .filter((sib) => sib.nodeName.toLowerCase() === tagName);
 7 |         if (siblings.length === 1) return 0;
 8 |         const index = siblings.indexOf(currentElement) + 1;
 9 |         return index;
10 |     }
11 |     const segments = [];
12 |     let elementToProcess = currentElement;
13 |     while (elementToProcess && elementToProcess.nodeType === Node.ELEMENT_NODE) {
14 |         const position = getElementPosition(elementToProcess);
15 |         const tagName = elementToProcess.nodeName.toLowerCase();
16 |         const xpathIndex = position > 0 ? `[${position}]` : "";
17 |         segments.unshift(`${tagName}${xpathIndex}`);
18 |         const parentNode = elementToProcess.parentNode;
19 |         if (!parentNode || parentNode.nodeType !== Node.ELEMENT_NODE) {
20 |             elementToProcess = null;
21 |         } else if (parentNode instanceof ShadowRoot || parentNode instanceof HTMLIFrameElement) {
22 |             elementToProcess = null;
23 |         } else {
24 |             elementToProcess = parentNode;
25 |         }
26 |     }
27 |     let finalPath = segments.join("/");
28 |     if (finalPath && !finalPath.startsWith('html') && !finalPath.startsWith('/html')) {
29 |         if (finalPath.startsWith('body')) {
30 |             finalPath = '/html/' + finalPath;
31 |         } else if (!finalPath.startsWith('/')) {
32 |             finalPath = '/' + finalPath;
33 |         }
34 |     } else if (finalPath.startsWith('body')) {
35 |         finalPath = '/html/' + finalPath;
36 |     }
37 |     return finalPath || null;
38 | }
```
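
Since the repository drives browsers with Playwright from Python, here is a hedged sketch of how this helper could be injected and run against a live element (the URL and selector are placeholders):

```python
# Illustrative: wrap xpathgenerator.js so Playwright passes the element in.
from pathlib import Path
from playwright.sync_api import sync_playwright

js_source = Path("src/agents/js_utils/xpathgenerator.js").read_text()

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com")
    handle = page.query_selector("h1")
    # Define the helper inside an arrow function, then call it on the element.
    xpath = handle.evaluate("(el) => { " + js_source + " return generateXPathForElement(el); }")
    print(xpath)  # e.g. "/html/body/div/h1"
```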

--------------------------------------------------------------------------------
/src/security/utils.py:
--------------------------------------------------------------------------------

```python
 1 | # utils.py
 2 | import logging
 3 | import json
 4 | import os
 5 | from datetime import datetime
 6 | 
 7 | LOG_FORMAT = '%(asctime)s - %(levelname)s - %(message)s'
 8 | 
 9 | def setup_logging(log_level=logging.INFO):
10 |     """Configures basic logging."""
11 |     logging.basicConfig(level=log_level, format=LOG_FORMAT)
12 | 
13 | def save_report(data, tool_name, output_dir="results", filename_prefix="report"):
14 |     """Saves the collected data to a JSON file."""
15 |     if not os.path.exists(output_dir):
16 |         os.makedirs(output_dir)
17 | 
18 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
19 |     filename = f"{filename_prefix}_{tool_name}_{timestamp}.json"
20 |     filepath = os.path.join(output_dir, filename)
21 | 
22 |     try:
23 |         with open(filepath, 'w') as f:
24 |             json.dump(data, f, indent=4)
25 |         logging.info(f"Successfully saved {tool_name} report to {filepath}")
26 |         return filepath
27 |     except Exception as e:
28 |         logging.error(f"Failed to save {tool_name} report to {filepath}: {e}")
29 |         return None
30 | 
31 | def parse_json_lines_file(filepath):
32 |     """Parses a file containing JSON objects, one per line."""
33 |     results = []
34 |     if not os.path.exists(filepath):
35 |         logging.error(f"File not found for parsing: {filepath}")
36 |         return results
37 |     try:
38 |         with open(filepath, 'r') as f:
39 |             for line in f:
40 |                 try:
41 |                     if line.strip():
42 |                         results.append(json.loads(line))
43 |                 except json.JSONDecodeError as e:
44 |                     logging.warning(f"Skipping invalid JSON line in {filepath}: {line.strip()} - Error: {e}")
45 |         return results
46 |     except Exception as e:
47 |         logging.error(f"Failed to read or parse JSON lines file {filepath}: {e}")
48 |         return [] # Return empty list on failure
49 | 
50 | def parse_json_file(filepath):
51 |     """Parses a standard JSON file."""
52 |     if not os.path.exists(filepath):
53 |         logging.error(f"File not found for parsing: {filepath}")
54 |         return None
55 |     try:
56 |         with open(filepath, 'r') as f:
57 |             data = json.load(f)
58 |         return data
59 |     except json.JSONDecodeError as e:
60 |         logging.error(f"Invalid JSON format in {filepath}: {e}")
61 |         return None
62 |     except Exception as e:
63 |         logging.error(f"Failed to read or parse JSON file {filepath}: {e}")
64 |         return None
```
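
For example, the scanner modules below persist their findings through `save_report`; a short round-trip sketch (the findings list is invented):

```python
# Illustrative: save a findings list and read it back.
setup_logging()
findings = [{"tool": "Semgrep", "severity": "ERROR", "message": "tainted SQL query"}]
path = save_report(findings, tool_name="semgrep")
if path:
    assert parse_json_file(path) == findings
```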

--------------------------------------------------------------------------------
/src/dom/history/view.py:
--------------------------------------------------------------------------------

```python
 1 | # /src/dom/history/view.py 
 2 | from dataclasses import dataclass
 3 | from typing import Optional, List, Dict, Union # Added Dict
 4 | 
 5 | # Use Pydantic for coordinate models if available and desired
 6 | try:
 7 |     from pydantic import BaseModel, Field
 8 | 
 9 |     class Coordinates(BaseModel):
10 |         x: float # Use float for potential subpixel values
11 |         y: float
12 | 
13 |     class CoordinateSet(BaseModel):
14 |         # Match names from buildDomTree.js if they differ
15 |         top_left: Coordinates
16 |         top_right: Coordinates
17 |         bottom_left: Coordinates
18 |         bottom_right: Coordinates
19 |         center: Coordinates
20 |         width: float
21 |         height: float
22 | 
23 |     class ViewportInfo(BaseModel):
24 |         scroll_x: float = Field(alias="scrollX") # Match JS key if needed
25 |         scroll_y: float = Field(alias="scrollY")
26 |         width: float
27 |         height: float
28 | 
29 | except ImportError:
30 |     # Fallback if Pydantic is not installed (less type safety)
31 |     Coordinates = Dict[str, float]
32 |     CoordinateSet = Dict[str, Union[Coordinates, float]]
33 |     ViewportInfo = Dict[str, float]
34 |     BaseModel = object # Placeholder
35 | 
36 | 
37 | @dataclass
38 | class HashedDomElement:
39 |     """ Hash components of a DOM element for comparison. """
40 |     branch_path_hash: str
41 |     attributes_hash: str
42 |     xpath_hash: str
43 |     # text_hash: str (Still excluded)
44 | 
45 | @dataclass
46 | class DOMHistoryElement:
47 |     """ A serializable representation of a DOM element's state at a point in time. """
48 |     tag_name: str
49 |     xpath: str
50 |     highlight_index: Optional[int]
51 |     entire_parent_branch_path: List[str]
52 |     attributes: Dict[str, str]
53 |     shadow_root: bool = False
54 |     css_selector: Optional[str] = None # Generated enhanced selector
55 |     # Store the Pydantic models or dicts directly
56 |     page_coordinates: Optional[CoordinateSet] = None
57 |     viewport_coordinates: Optional[CoordinateSet] = None
58 |     viewport_info: Optional[ViewportInfo] = None
59 | 
60 |     def to_dict(self) -> dict:
61 |         """ Converts the history element to a dictionary. """
62 |         data = {
63 |             'tag_name': self.tag_name,
64 |             'xpath': self.xpath,
65 |             'highlight_index': self.highlight_index,
66 |             'entire_parent_branch_path': self.entire_parent_branch_path,
67 |             'attributes': self.attributes,
68 |             'shadow_root': self.shadow_root,
69 |             'css_selector': self.css_selector,
70 |              # Handle Pydantic models correctly if used
71 |             'page_coordinates': self.page_coordinates.model_dump() if isinstance(self.page_coordinates, BaseModel) else self.page_coordinates,
72 |             'viewport_coordinates': self.viewport_coordinates.model_dump() if isinstance(self.viewport_coordinates, BaseModel) else self.viewport_coordinates,
73 |             'viewport_info': self.viewport_info.model_dump() if isinstance(self.viewport_info, BaseModel) else self.viewport_info,
74 |         }
75 |         # Filter out None values if desired
76 |         # return {k: v for k, v in data.items() if v is not None}
77 |         return data
```
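
A small usage sketch of `DOMHistoryElement` (all values invented for illustration):

```python
# Illustrative: serialize a captured element state.
el = DOMHistoryElement(
    tag_name="button",
    xpath="/html/body/form/button[1]",
    highlight_index=3,
    entire_parent_branch_path=["html", "body", "form", "button"],
    attributes={"id": "submit", "type": "submit"},
    css_selector="button#submit",
)
print(el.to_dict()["css_selector"])  # "button#submit"
```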

--------------------------------------------------------------------------------
/src/security/nuclei_scanner.py:
--------------------------------------------------------------------------------

```python
 1 | # nuclei_scanner.py
 2 | import logging
 3 | import subprocess
 4 | import os
 5 | import shlex
 6 | from datetime import datetime
  7 | from .utils import parse_json_lines_file  # Relative import (Nuclei emits one JSON object per line)
 8 | 
 9 | NUCLEI_TIMEOUT_SECONDS = 900  # 15 minutes default
10 | 
11 | def run_nuclei(target_url: str, output_dir="results", timeout=NUCLEI_TIMEOUT_SECONDS, severity="low,medium,high,critical"):
12 |     """Runs the Nuclei security scanner against a target URL or IP."""
13 |     if not target_url:
14 |         logging.error("Nuclei target URL/IP is required")
15 |         return []
16 | 
17 |     logging.info(f"Starting Nuclei scan for target: {target_url}")
18 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
19 |     output_filename = f"nuclei_output_{timestamp}.json"
20 |     output_filepath = os.path.join(output_dir, output_filename)
21 | 
22 |     if not os.path.exists(output_dir):
23 |         os.makedirs(output_dir)
24 | 
25 |     # Configure nuclei command with common best practices
26 |     command = [
27 |         "nuclei", 
28 |         "-target", target_url,
29 |         "-json", 
30 |         "-o", output_filepath,
31 |         "-severity", severity,
32 |         "-silent"
33 |     ]
34 | 
35 |     logging.debug(f"Executing Nuclei command: {' '.join(shlex.quote(cmd) for cmd in command)}")
36 | 
37 |     try:
38 |         result = subprocess.run(command, capture_output=True, text=True, timeout=timeout, check=False)
39 | 
40 |         logging.info("Nuclei process finished.")
41 |         logging.debug(f"Nuclei stdout:\n{result.stdout}")
42 | 
43 |         if result.returncode != 0:
44 |             logging.warning(f"Nuclei exited with non-zero status code: {result.returncode}")
45 |             return [f"Nuclei exited with non-zero status code: {result.returncode}"]
46 | 
 47 |         # Parse the JSONL output file (Nuclei's -json mode writes one finding per line)
 48 |         findings = parse_json_lines_file(output_filepath)
49 |         if findings:
50 |             logging.info(f"Successfully parsed {len(findings)} findings from Nuclei output.")
51 |             # Add tool name for context
52 |             for finding in findings:
53 |                 finding['tool'] = 'Nuclei'
54 |                 # Standardize some fields to match our expected format
55 |                 if 'info' in finding:
56 |                     finding['severity'] = finding.get('info', {}).get('severity')
57 |                     finding['message'] = finding.get('info', {}).get('name')
58 |                     finding['description'] = finding.get('info', {}).get('description')
59 |                     finding['matched_at'] = finding.get('matched-at', '')
60 | 
61 |             return findings
62 |         else:
63 |             logging.warning(f"Could not parse findings from Nuclei output file: {output_filepath}")
64 |             return [f"Could not parse findings from Nuclei output file: {output_filepath}"]
65 | 
66 |     except subprocess.TimeoutExpired:
67 |         logging.error(f"Nuclei scan timed out after {timeout} seconds.")
68 |         return [f"Nuclei scan timed out after {timeout} seconds."]
69 |     except FileNotFoundError:
70 |         logging.error("Nuclei command not found. Is Nuclei installed and in PATH?")
71 |         return ["Nuclei command not found. Is Nuclei installed and in PATH?"]
72 |     except Exception as e:
73 |         logging.error(f"An unexpected error occurred while running Nuclei: {e}")
74 |         return [f"An unexpected error occurred while running Nuclei: {e}"]
75 | 
```

--------------------------------------------------------------------------------
/src/security/semgrep_scanner.py:
--------------------------------------------------------------------------------

```python
 1 | # semgrep_scanner.py
 2 | import logging
 3 | import subprocess
 4 | import os
 5 | import shlex
 6 | from datetime import datetime
 7 | from .utils import parse_json_file # Relative import
 8 | 
 9 | SEMGREP_TIMEOUT_SECONDS = 600 # 10 minutes default
10 | 
11 | def run_semgrep(code_path: str, config: str = "auto", output_dir="results", timeout=SEMGREP_TIMEOUT_SECONDS):
12 |     """Runs the Semgrep CLI tool."""
13 |     if not os.path.isdir(code_path):
14 |         logging.error(f"Semgrep target path is not a valid directory: {code_path}")
15 |         return []
16 | 
17 |     logging.info(f"Starting Semgrep scan for codebase: {code_path} using config: {config}")
18 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
19 |     output_filename = f"semgrep_output_{timestamp}.json"
20 |     output_filepath = os.path.join(output_dir, output_filename)
21 | 
22 |     if not os.path.exists(output_dir):
23 |         os.makedirs(output_dir)
24 | 
25 |     # Use --json for machine-readable output
26 |     command = ["semgrep", "scan", "--config", config, "--json", "-o", output_filepath, code_path]
27 | 
28 |     logging.debug(f"Executing Semgrep command: {' '.join(shlex.quote(cmd) for cmd in command)}")
29 | 
30 |     try:
31 |         result = subprocess.run(command, capture_output=True, text=True, timeout=timeout, check=False) # check=False
32 | 
33 |         logging.info("Semgrep process finished.")
34 |         logging.debug(f"Semgrep stdout:\n{result.stdout}") # Often has progress info
35 |         # if result.stderr:
36 |         #      logging.warning(f"Semgrep stderr:\n{result.stderr}")
37 |         #      return [f"semgrep stderr: \n{result.stderr}"]
38 | 
39 |         if result.returncode != 0:
 40 |             # It might still produce output even with errors (e.g., parse errors)
 41 |             logging.warning(f"Semgrep exited with non-zero status code: {result.returncode}")
 42 |             return [f"Semgrep exited with non-zero status code: {result.returncode}"]
43 | 
44 |         # Parse the JSON output file
45 |         report_data = parse_json_file(output_filepath)
46 |         if report_data and "results" in report_data:
47 |              findings = report_data["results"]
48 |              logging.info(f"Successfully parsed {len(findings)} findings from Semgrep output.")
49 |              # Add tool name for context
50 |              for finding in findings:
51 |                  finding['tool'] = 'Semgrep'
52 |                  # Simplify structure slightly if needed
53 |                  finding['message'] = finding.get('extra', {}).get('message')
54 |                  finding['severity'] = finding.get('extra', {}).get('severity')
55 |                  finding['code_snippet'] = finding.get('extra', {}).get('lines')
56 | 
57 |              return findings
58 |         else:
59 |              logging.warning(f"Could not parse findings from Semgrep output file: {output_filepath}")
60 |              return [f"Could not parse findings from Semgrep output file: {output_filepath}"]
61 | 
62 |     except subprocess.TimeoutExpired:
63 |         logging.error(f"Semgrep scan timed out after {timeout} seconds.")
64 |         return [f"Semgrep scan timed out after {timeout} seconds."]
65 |     except FileNotFoundError:
66 |         logging.error("Semgrep command not found. Is Semgrep installed and in PATH?")
67 |         return ["Semgrep command not found. Is Semgrep installed and in PATH?"]
68 |     except Exception as e:
69 |         logging.error(f"An unexpected error occurred while running Semgrep: {e}")
70 |         return [f"An unexpected error occurred while running Semgrep: {e}"]
```
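
Both scanners return either a list of finding dicts or a list containing a single error string, so callers have to distinguish the two cases. A hedged usage sketch (the path is a placeholder; Semgrep's top severity level is `ERROR`):

```python
# Illustrative: run Semgrep on a checkout and keep only the highest-severity findings.
from src.security.semgrep_scanner import run_semgrep

results = run_semgrep("path/to/repo", config="auto")
if results and isinstance(results[0], dict):
    errors = [f for f in results if f.get("severity") == "ERROR"]
    print(f"{len(errors)} ERROR-severity findings out of {len(results)} total")
else:
    print("Scan produced no findings or failed:", results)
```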

--------------------------------------------------------------------------------
/src/llm/llm_client.py:
--------------------------------------------------------------------------------

```python
 1 | # /src/llm/llm_client.py
 2 | from google import genai
 3 | from PIL import Image
 4 | import io
 5 | import logging
 6 | import time # Import time module
 7 | import threading # Import threading for lock
 8 | from typing import Type, Optional, Union, List, Dict, Any
 9 | logger = logging.getLogger(__name__)
10 | import base64
11 | import json 
12 | 
13 | from .clients.gemini_client import GeminiClient
14 | from .clients.azure_openai_client import AzureOpenAIClient
15 | from .clients.openai_client import OpenAIClient
16 | 
17 | 
18 | class LLMClient:
19 |     """
 20 |     Handles interactions with LLM APIs (Google Gemini, or any LLM exposed through the OpenAI SDK)
21 |     with rate limiting.
22 |     """
23 | 
24 |     # Rate limiting parameters (adjust based on the specific API limits)
25 |     # Consider making this provider-specific if needed
26 |     MIN_REQUEST_INTERVAL_SECONDS = 3.0 # Adjusted slightly, Gemini free is 15 RPM (4s), LLM depends on tier
27 | 
 28 |     def __init__(self, provider: str):  # 'gemini', 'openai', or 'azure'
29 |         """
30 |         Initializes the LLM client for the specified provider.
31 | 
32 |         Args:
33 |             provider: The LLM provider to use ('gemini' or 'openai' or 'azure').
34 |         """
35 |         self.provider = provider.lower()
36 |         self.client = None
37 | 
38 |         if self.provider == 'gemini':
39 |             self.client = GeminiClient()
40 |         elif self.provider == 'openai':
41 |             self.client = OpenAIClient()
42 |         elif self.provider == 'azure':
43 |             self.client = AzureOpenAIClient()
44 |         else:
45 |             raise ValueError(f"Unsupported provider: {provider}. Choose 'gemini' or 'openai' or 'azure'.")
46 |         
47 |         # Common initialization
48 |         self._last_request_time = 0.0
49 |         self._lock = threading.Lock() # Lock for rate limiting
50 |         logger.info(f"LLMClient initialized for provider '{self.provider}' with {self.MIN_REQUEST_INTERVAL_SECONDS}s request interval.")
51 | 
52 | 
53 |     def _wait_for_rate_limit(self):
54 |         """Waits if necessary to maintain the minimum request interval."""
55 |         with self._lock: # Ensure thread-safe access
56 |             now = time.monotonic()
57 |             elapsed = now - self._last_request_time
58 |             wait_time = self.MIN_REQUEST_INTERVAL_SECONDS - elapsed
59 | 
60 |             if wait_time > 0:
61 |                 logger.debug(f"Rate limiting: Waiting for {wait_time:.2f} seconds...")
62 |                 time.sleep(wait_time)
63 | 
64 |             self._last_request_time = time.monotonic() # Update after potential wait
65 | 
66 |     def generate_text(self, prompt: str) -> str:
67 |           """Generates text using the configured LLM provider, respecting rate limits."""
68 |           self._wait_for_rate_limit() # Wait before making the API call
69 |           return self.client.generate_text(prompt)
70 | 
71 | 
72 |     def generate_multimodal(self, prompt: str, image_bytes: bytes) -> str:
73 |           """Generates text based on a prompt and an image, respecting rate limits."""
74 |           self._wait_for_rate_limit() # Wait before making the API call
75 |           return self.client.generate_multimodal(prompt, image_bytes)
76 | 
77 |     def generate_json(self, Schema_Class: Type, prompt: str, image_bytes: Optional[bytes] = None) -> Union[Dict[str, Any], str]:
78 |           """
79 |           Generates structured JSON output based on a prompt, an optional image,
80 |           and a defined schema, respecting rate limits.
81 | 
82 |           For Gemini, Schema_Class should be a Pydantic BaseModel or compatible type.
83 |           For any other LLM, Schema_Class must be a Pydantic BaseModel.
84 | 
85 |           Returns:
86 |               A dictionary representing the parsed JSON on success, or an error string.
87 |           """
88 |           self._wait_for_rate_limit()
89 |           return self.client.generate_json(Schema_Class, prompt, image_bytes)
```
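
A usage sketch of the facade (assumes `.env` supplies the key and model; the schema is invented):

```python
# Illustrative: structured extraction through the rate-limited facade.
from pydantic import BaseModel

class TestStep(BaseModel):
    description: str
    selector: str | None = None

client = LLMClient(provider="gemini")
result = client.generate_json(TestStep, "Propose one test step for a login form.")
print(result)  # dict on success, error string on failure
```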

--------------------------------------------------------------------------------
/test_schema.md:
--------------------------------------------------------------------------------

```markdown
  1 | 
  2 | ```json
  3 | // output/test_case_example.json
  4 | {
  5 |   "test_name": "Login Functionality Test",
  6 |   "feature_description": "User logs in with valid credentials and verifies the welcome message.",
  7 |   "recorded_at": "2023-10-27T10:00:00Z",
  8 |   "steps": [
  9 |     {
 10 |       "step_id": 1,
 11 |       "action": "navigate",
 12 |       "description": "Navigate to the login page", // Natural language
 13 |       "parameters": {
 14 |         "url": "https://practicetestautomation.com/practice-test-login/"
 15 |       },
 16 |       "selector": null, // Not applicable
 17 |       "wait_after_secs": 1.0 // Optional: Simple wait after action
 18 |     },
 19 |     {
 20 |       "step_id": 2,
 21 |       "action": "type",
 22 |       "description": "Type username 'student'",
 23 |       "parameters": {
 24 |         "text": "student",
 25 |         "parameter_name": "username" // Optional: For parameterization
 26 |       },
 27 |       "selector": "#username", // Recorded robust selector
 28 |       "wait_after_secs": 0.5
 29 |     },
 30 |     {
 31 |       "step_id": 3,
 32 |       "action": "type",
 33 |       "description": "Type password 'Password123'",
 34 |       "parameters": {
 35 |         "text": "Password123",
 36 |         "parameter_name": "password" // Optional: For parameterization
 37 |       },
 38 |       "selector": "input[name='password']",
 39 |       "wait_after_secs": 0.5
 40 |     },
 41 |     {
 42 |       "step_id": 4,
 43 |       "action": "click",
 44 |       "description": "Click the submit button",
 45 |       "parameters": {},
 46 |       "selector": "button#submit",
 47 |       "wait_after_secs": 1.0 // Longer wait after potential navigation/update
 48 |     },
 49 |     {
 50 |       "step_id": 5,
 51 |       "action": "wait_for_load_state", // Explicit wait example
 52 |       "description": "Wait for page load after submit",
 53 |       "parameters": {
 54 |         "state": "domcontentloaded" // Or "load", "networkidle"
 55 |       },
 56 |       "selector": null,
 57 |       "wait_after_secs": 0
 58 |     },
 59 |     {
 60 |       "step_id": 6,
 61 |       "action": "assert_text_contains",
 62 |       "description": "Verify success message is shown",
 63 |       "parameters": {
 64 |         "expected_text": "Congratulations student. You successfully logged in!"
 65 |       },
 66 |       "selector": "div.post-content p strong", // Selector for the element containing the text
 67 |       "wait_after_secs": 0
 68 |     },
 69 |     {
 70 |       "step_id": 7,
 71 |       "action": "assert_visible",
 72 |       "description": "Verify logout button is visible",
 73 |       "parameters": {},
 74 |       "selector": "a.wp-block-button__link:has-text('Log out')",
 75 |       "wait_after_secs": 0
 76 |     },
 77 |     {
 78 |       "step_id": 8, // Example ID
 79 |       "action": "select",
 80 |       "description": "Select 'Weekly' notification frequency",
 81 |       "parameters": {
 82 |         "option_label": "Weekly" // Store the label (or value if preferred)
 83 |         // "parameter_name": "notification_pref" // Optional parameterization
 84 |       },
 85 |       "selector": "select#notificationFrequency", // Selector for the <select> element
 86 |       "wait_after_secs": 0.5
 87 |     },
 88 |     {
 89 |       "step_id": 9, // Example ID
 90 |       "action": "assert_passed_verification", // Special action
 91 |       "description": "Verify user avatar is displayed in header", // Original goal
 92 |       "parameters": {
 93 |         // Optional: might include reasoning from recorder's AI
 94 |         "reasoning": "The avatar image was visually confirmed present by the vision LLM during recording."
 95 |       },
 96 |       "selector": null, // No specific selector needed for executor's check
 97 |       "wait_after_secs": 0
 98 |       // NOTE: During execution, the TestExecutor will take a screenshot
 99 |       //       and use its own vision LLM call to re-verify the condition
100 |       //       described in the 'description' field. It passes if the LLM
101 |       //       confirms visually, otherwise it fails the test.
102 |     }
103 |     // ... more steps
104 |   ]
105 | }
106 | ```
107 | 
```
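
To make the schema concrete, here is a hedged sketch of the dispatch loop an executor might run over these steps; the repository's `TestExecutor` handles far more actions (selects, visual assertions, self-healing), this covers only the basic ones:

```python
# Illustrative executor loop over the schema above (not the repo's TestExecutor).
import json
import time

from playwright.sync_api import sync_playwright

with open("output/test_case_example.json") as f:
    test = json.load(f)

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    for step in test["steps"]:
        action, params, sel = step["action"], step["parameters"], step["selector"]
        if action == "navigate":
            page.goto(params["url"])
        elif action == "type":
            page.fill(sel, params["text"])
        elif action == "click":
            page.click(sel)
        elif action == "wait_for_load_state":
            page.wait_for_load_state(params["state"])
        elif action == "assert_text_contains":
            assert params["expected_text"] in page.inner_text(sel)
        time.sleep(step.get("wait_after_secs") or 0)
```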

--------------------------------------------------------------------------------
/src/utils/image_utils.py:
--------------------------------------------------------------------------------

```python
 1 | from PIL import Image, ImageDraw, ImageFont
 2 | import io
 3 | import logging
 4 | import base64
 5 | from typing import Optional
 6 | import os
 7 | 
 8 | from ..llm.llm_client import LLMClient
 9 | 
10 | logger = logging.getLogger(__name__)
11 | 
12 | 
13 | 
14 | # Helper Function
15 | def stitch_images(img1: Image.Image, img2: Image.Image, label1="Baseline", label2="Current") -> Optional[Image.Image]:
16 |     """Stitches two images side-by-side with labels."""
17 |     if img1.size != img2.size:
18 |         logger.error("Cannot stitch images of different sizes.")
19 |         return None
20 |
21 |     width1, height1 = img1.size
22 |     width2, height2 = img2.size # Should be same as height1
23 |
24 |     # Add padding for labels
25 |     label_height = 30 # Adjust as needed
26 |     total_width = width1 + width2
27 |     total_height = height1 + label_height
28 |
29 |     stitched_img = Image.new('RGBA', (total_width, total_height), (255, 255, 255, 255)) # White background
30 |
31 |     # Paste images
32 |     stitched_img.paste(img1, (0, label_height))
33 |     stitched_img.paste(img2, (width1, label_height))
34 |
35 |     # Add labels
36 |     try:
37 |         draw = ImageDraw.Draw(stitched_img)
38 |         # Attempt to load a simple font (adjust path or use default if needed)
39 |         try:
40 |             # On Linux/macOS, common paths
41 |             font_path = "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf"
42 |             if not os.path.exists(font_path): font_path = "/System/Library/Fonts/Supplemental/Arial Bold.ttf" # macOS fallback
43 |             font = ImageFont.truetype(font_path, 15)
44 |         except IOError:
45 |             logger.warning("Default font not found, using Pillow's default.")
46 |             font = ImageFont.load_default()
47 |
48 |         # Label 1 (Baseline)
49 |         label1_pos = (10, 5)
50 |         draw.text(label1_pos, f"1: {label1}", fill=(0, 0, 0, 255), font=font)
51 |
52 |         # Label 2 (Current)
53 |         label2_pos = (width1 + 10, 5)
54 |         draw.text(label2_pos, f"2: {label2}", fill=(0, 0, 0, 255), font=font)
55 |
56 |     except Exception as e:
57 |         logger.warning(f"Could not add labels to stitched image: {e}")
58 |         # Return image without labels if drawing fails
59 |
60 |     stitched_img.save("./stitched.png")
61 |     return stitched_img
62 |
63 | def compare_images(prompt: str, image_bytes_1: bytes, image_bytes_2: bytes, image_client: LLMClient) -> str:
64 |     """
65 |     Compares two images using the multimodal LLM based on the prompt,
66 |     by stitching them into a single image first.
67 |     """
68 |
69 |     logger.info("Preparing images for stitched comparison...")
70 |     try:
71 |         img1 = Image.open(io.BytesIO(image_bytes_1)).convert("RGBA")
72 |         img2 = Image.open(io.BytesIO(image_bytes_2)).convert("RGBA")
73 |
74 |         if img1.size != img2.size:
75 |             error_msg = f"Visual Comparison Failed: Image dimensions mismatch. Baseline: {img1.size}, Current: {img2.size}."
76 |             logger.error(error_msg)
77 |             return f"Error: {error_msg}" # Return error directly
78 |
79 |         stitched_image_pil = stitch_images(img1, img2)
80 |         if not stitched_image_pil:
81 |             return "Error: Failed to stitch images."
82 |
83 |         # Convert stitched image to bytes
84 |         stitched_buffer = io.BytesIO()
85 |         stitched_image_pil.save(stitched_buffer, format="PNG")
86 |         stitched_image_bytes = stitched_buffer.getvalue()
87 |         logger.info(f"Images stitched successfully (new size: {stitched_image_pil.size}). Requesting LLM comparison...")
88 |
89 |     except Exception as e:
90 |         logger.error(f"Error processing images for stitching: {e}", exc_info=True)
91 |         return f"Error: Image processing failed - {e}"
92 |
93 |
94 |     return image_client.generate_multimodal(prompt, stitched_image_bytes)
95 |
96 |
```
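
A minimal usage sketch for the helper above. The file paths and prompt are made up, and `LLMClient()` is assumed to be constructible from the environment configuration (`.env`):

```python
# Illustrative usage; paths and prompt are hypothetical.
from src.llm.llm_client import LLMClient
from src.utils.image_utils import compare_images

with open("visual_baselines/home.png", "rb") as f:
    baseline_bytes = f.read()
with open("output/home_current.png", "rb") as f:
    current_bytes = f.read()

verdict = compare_images(
    "Image 1 (left) is the baseline, image 2 (right) is the current UI. "
    "Describe any visual differences, or reply 'NO DIFFERENCES'.",
    baseline_bytes,
    current_bytes,
    LLMClient(),  # assumed zero-arg construction from env config
)
print(verdict)  # LLM text, or an 'Error: ...' string on size mismatch/failure
```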

--------------------------------------------------------------------------------
/src/llm/clients/gemini_client.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/llm/clients/gemini_client.py
  2 | from google import genai
  3 | from PIL import Image
  4 | import io
  5 | import logging
  6 | import time # Import time module
  7 | import threading # Import threading for lock
  8 | from typing import Type, Optional, Union, List, Dict, Any
  9 | logger = logging.getLogger(__name__)
 10 | import base64
 11 | import json
 12 | 
 13 | from ...utils.utils import load_api_key
 14 | 
 15 | class GeminiClient:
 16 |     def __init__(self):
 17 |         self.client = None
 18 |         gemini_api_key = load_api_key()
 19 |         if not gemini_api_key:
 20 |             raise ValueError("gemini_api_key is required for provider 'gemini'")
 21 |         try:
 22 |             # genai.configure(api_key=gemini_api_key) # configure is global, prefer Client
 23 |             self.client = genai.Client(api_key=gemini_api_key)
 24 |             # Test connection slightly by listing models (optional)
 25 |             # list(self.client.models.list())
 26 |             logger.info("Google Gemini Client initialized.")
 27 |         except Exception as e:
 28 |             logger.error(f"Failed to initialize Google Gemini Client: {e}", exc_info=True)
 29 |             raise RuntimeError(f"Gemini client initialization failed: {e}")
 30 |         
 31 |     
 32 |     def generate_text(self, prompt: str) -> str:
 33 |         """Generates text using the Gemini text model, respecting rate limits."""
 34 |         try:
 35 |             # Truncate prompt for logging if too long
 36 |             log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
 37 |             logger.debug(f"Sending text prompt (truncated): {log_prompt}")
 38 |             # response = self.text_model.generate_content(prompt)
 39 |             response = self.client.models.generate_content(
 40 |                         model='gemini-2.0-flash',
 41 |                         contents=prompt
 42 |                 )
 43 |             logger.debug("Received text response.")
 44 | 
 45 |             # Improved response handling
 46 |             if hasattr(response, 'text'):
 47 |                 return response.text
 48 |             elif response.parts:
 49 |                 # Sometimes response might be in parts without direct .text attribute
 50 |                 return "".join(part.text for part in response.parts if hasattr(part, 'text'))
 51 |             elif response.prompt_feedback and response.prompt_feedback.block_reason:
 52 |                 block_reason = response.prompt_feedback.block_reason
 53 |                 block_message = f"Error: Content generation blocked due to {block_reason}"
 54 |                 if response.prompt_feedback.safety_ratings:
 55 |                     block_message += f" - Safety Ratings: {response.prompt_feedback.safety_ratings}"
 56 |                 logger.warning(block_message)
 57 |                 return block_message
 58 |             else:
 59 |                 logger.warning(f"Text generation returned no text/parts and no block reason. Response: {response}")
 60 |                 return "Error: Empty or unexpected response from LLM."
 61 | 
 62 |         except Exception as e:
 63 |             logger.error(f"Error during Gemini text generation: {e}", exc_info=True)
 64 |             return f"Error: Failed to communicate with Gemini API - {type(e).__name__}: {e}"
 65 |         
 66 |     def generate_multimodal(self, prompt: str, image_bytes: bytes) -> str:
 67 |         """Generates text based on a prompt and an image, respecting rate limits."""
 68 |         try:
 69 |             log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
 70 |             logger.debug(f"Sending multimodal prompt (truncated): {log_prompt} with image.")
 71 |             image = Image.open(io.BytesIO(image_bytes))
 72 |             # response = self.vision_model.generate_content([prompt, image])
 73 |             response = self.client.models.generate_content(
 74 |                 model='gemini-2.0-flash',
 75 |                 contents=[
 76 |                     prompt,
 77 |                     image
 78 |                 ]
 79 |             )
 80 |             logger.debug("Received multimodal response.")
 81 | 
 82 |             # Improved response handling (similar to text)
 83 |             if hasattr(response, 'text'):
 84 |                 return response.text
 85 |             elif response.parts:
 86 |                 return "".join(part.text for part in response.parts if hasattr(part, 'text'))
 87 |             elif response.prompt_feedback and response.prompt_feedback.block_reason:
 88 |                 block_reason = response.prompt_feedback.block_reason
 89 |                 block_message = f"Error: Multimodal generation blocked due to {block_reason}"
 90 |                 if response.prompt_feedback.safety_ratings:
 91 |                     block_message += f" - Safety Ratings: {response.prompt_feedback.safety_ratings}"
 92 |                 logger.warning(block_message)
 93 |                 return block_message
 94 |             else:
 95 |                 logger.warning(f"Multimodal generation returned no text/parts and no block reason. Response: {response}")
 96 |                 return "Error: Empty or unexpected response from Vision LLM."
 97 | 
 98 |         except Exception as e:
 99 |             logger.error(f"Error during Gemini multimodal generation: {e}", exc_info=True)
100 |             return f"Error: Failed to communicate with Gemini Vision API - {type(e).__name__}: {e}"
101 |           
102 | 
103 |     def generate_json(self, Schema_Class: Type, prompt: str, image_bytes: Optional[bytes] = None) -> Union[Dict[str, Any], str]:
104 |         """generates json based on prompt and a defined schema"""
105 |         contents = prompt
106 |         if(image_bytes is not None):
107 |             image = Image.open(io.BytesIO(image_bytes))
108 |             contents = [prompt, image]
109 |         try:
110 |             log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
111 |             logger.debug(f"Sending text prompt (truncated): {log_prompt}")
112 |             response = self.client.models.generate_content(
113 |                 model='gemini-2.0-flash',
114 |                 contents=contents,
115 |                 config={
116 |                         'response_mime_type': 'application/json',
117 |                         'response_schema': Schema_Class
118 |                 }
119 |             )
120 |             logger.debug("Received json response from LLM")
121 |             if hasattr(response, 'parsed'):
122 |                 return response.parsed
123 |             elif response.prompt_feedback and response.prompt_feedback.block_reason:
124 |                 block_reason = response.prompt_feedback.block_reason
125 |                 block_message = f"Error: JSON generation blocked due to {block_reason}"
126 |                 if response.prompt_feedback.safety_ratings:
127 |                     block_message += f" - Safety Ratings: {response.prompt_feedback.safety_ratings}"
128 |                 logger.warning(block_message)
129 |                 return block_message
130 |             else:
131 |                 logger.warning(f"JSON generation returned no text/parts and no block reason. Response: {response}")
132 |                 return "Error: Empty or unexpected response from JSON LLM."
133 |         except Exception as e:
134 |             logger.error(f"Error during Gemini JSON generation: {e}", exc_info=True)
135 |             return f"Error: Failed to communicate with Gemini JSON API - {type(e).__name__}: {e}"
136 | 
```
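
To illustrate the structured-output path (`generate_json`), here is a hedged sketch with a made-up Pydantic schema and prompt; it assumes `LLM_API_KEY` is configured so `GeminiClient()` can initialize:

```python
# Hypothetical schema and prompt, purely for illustration.
from typing import List
from pydantic import BaseModel, Field

class StepList(BaseModel):
    steps: List[str] = Field(..., description="Ordered test steps")

client = GeminiClient()  # requires a configured LLM_API_KEY
result = client.generate_json(StepList, "Break 'log in and open the dashboard' into test steps.")
if isinstance(result, StepList):
    for step in result.steps:
        print("-", step)
else:
    print(result)  # error strings are returned rather than raised
```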

--------------------------------------------------------------------------------
/src/dom/history/service.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/dom/history/service.py
  2 | import hashlib
  3 | from typing import Optional, List, Dict # Added Dict
  4 | 
  5 | # Use relative imports
  6 | from ..views import DOMElementNode # Import from parent package's views
  7 | from .view import DOMHistoryElement, HashedDomElement # Import from sibling view
  8 | 
  9 | # Requires BrowserContext._enhanced_css_selector_for_element
 10 | # This needs to be available. Let's assume DomService provides it statically for now.
 11 | from ..service import DomService
 12 | 
 13 | 
 14 | class HistoryTreeProcessor:
 15 |     """
 16 |     Operations for comparing DOM elements across different states using hashing.
 17 |     """
 18 | 
 19 |     @staticmethod
 20 |     def convert_dom_element_to_history_element(dom_element: DOMElementNode) -> Optional[DOMHistoryElement]:
 21 |         """Converts a live DOMElementNode to a serializable DOMHistoryElement."""
 22 |         if not dom_element: return None # Added safety check
 23 | 
 24 |         parent_branch_path = HistoryTreeProcessor._get_parent_branch_path(dom_element)
 25 |         # Use the static method from DomService to generate the selector
 26 |         css_selector = DomService._enhanced_css_selector_for_element(dom_element)
 27 | 
 28 |         # Ensure coordinate/viewport data is copied correctly
 29 |         page_coords = dom_element.page_coordinates.model_dump() if dom_element.page_coordinates else None
 30 |         viewport_coords = dom_element.viewport_coordinates.model_dump() if dom_element.viewport_coordinates else None
 31 |         viewport_info = dom_element.viewport_info.model_dump() if dom_element.viewport_info else None
 32 | 
 33 | 
 34 |         return DOMHistoryElement(
 35 |             tag_name=dom_element.tag_name,
 36 |             xpath=dom_element.xpath,
 37 |             highlight_index=dom_element.highlight_index,
 38 |             entire_parent_branch_path=parent_branch_path,
 39 |             attributes=dom_element.attributes,
 40 |             shadow_root=dom_element.shadow_root,
 41 |             css_selector=css_selector, # Use generated selector
 42 |             # Pass the Pydantic models directly if DOMHistoryElement expects them
 43 |             page_coordinates=dom_element.page_coordinates,
 44 |             viewport_coordinates=dom_element.viewport_coordinates,
 45 |             viewport_info=dom_element.viewport_info,
 46 |         )
 47 | 
 48 |     @staticmethod
 49 |     def find_history_element_in_tree(dom_history_element: DOMHistoryElement, tree: DOMElementNode) -> Optional[DOMElementNode]:
 50 |         """Finds an element in a new DOM tree that matches a historical element."""
 51 |         if not dom_history_element or not tree: return None
 52 | 
 53 |         hashed_dom_history_element = HistoryTreeProcessor._hash_dom_history_element(dom_history_element)
 54 | 
 55 |         # Define recursive search function
 56 |         def process_node(node: DOMElementNode) -> Optional[DOMElementNode]:
 57 |             if not isinstance(node, DOMElementNode): # Skip non-element nodes
 58 |                 return None
 59 | 
 60 |             # Only hash and compare elements that could potentially match (e.g., have attributes/xpath)
 61 |             # Optimization: maybe check tag_name first?
 62 |             if node.tag_name == dom_history_element.tag_name:
 63 |                 hashed_node = HistoryTreeProcessor._hash_dom_element(node)
 64 |                 if hashed_node == hashed_dom_history_element:
 65 |                     # Found a match based on hash
 66 |                     # Optional: Add secondary checks here if needed (e.g., text snippet)
 67 |                     return node
 68 | 
 69 |             # Recursively search children
 70 |             for child in node.children:
 71 |                  # Important: Only recurse into DOMElementNode children
 72 |                  if isinstance(child, DOMElementNode):
 73 |                       result = process_node(child)
 74 |                       if result is not None:
 75 |                            return result # Return immediately if found in subtree
 76 | 
 77 |             return None # Not found in this branch
 78 | 
 79 |         return process_node(tree)
 80 | 
 81 |     @staticmethod
 82 |     def compare_history_element_and_dom_element(dom_history_element: DOMHistoryElement, dom_element: DOMElementNode) -> bool:
 83 |         """Compares a historical element and a live element using hashes."""
 84 |         if not dom_history_element or not dom_element: return False
 85 | 
 86 |         hashed_dom_history_element = HistoryTreeProcessor._hash_dom_history_element(dom_history_element)
 87 |         hashed_dom_element = HistoryTreeProcessor._hash_dom_element(dom_element)
 88 | 
 89 |         return hashed_dom_history_element == hashed_dom_element
 90 | 
 91 |     @staticmethod
 92 |     def _hash_dom_history_element(dom_history_element: DOMHistoryElement) -> Optional[HashedDomElement]:
 93 |         """Generates a hash object from a DOMHistoryElement."""
 94 |         if not dom_history_element: return None
 95 | 
 96 |         # Use the stored parent path
 97 |         branch_path_hash = HistoryTreeProcessor._parent_branch_path_hash(dom_history_element.entire_parent_branch_path)
 98 |         attributes_hash = HistoryTreeProcessor._attributes_hash(dom_history_element.attributes)
 99 |         xpath_hash = HistoryTreeProcessor._xpath_hash(dom_history_element.xpath)
100 | 
101 |         return HashedDomElement(branch_path_hash, attributes_hash, xpath_hash)
102 | 
103 |     @staticmethod
104 |     def _hash_dom_element(dom_element: DOMElementNode) -> Optional[HashedDomElement]:
105 |         """Generates a hash object from a live DOMElementNode."""
106 |         if not dom_element: return None
107 | 
108 |         parent_branch_path = HistoryTreeProcessor._get_parent_branch_path(dom_element)
109 |         branch_path_hash = HistoryTreeProcessor._parent_branch_path_hash(parent_branch_path)
110 |         attributes_hash = HistoryTreeProcessor._attributes_hash(dom_element.attributes)
111 |         xpath_hash = HistoryTreeProcessor._xpath_hash(dom_element.xpath)
112 |         # text_hash = DomTreeProcessor._text_hash(dom_element) # Text hash still excluded
113 | 
114 |         return HashedDomElement(branch_path_hash, attributes_hash, xpath_hash)
115 | 
116 |     @staticmethod
117 |     def _get_parent_branch_path(dom_element: DOMElementNode) -> List[str]:
118 |         """Gets the list of tag names from the element up to the root."""
119 |         parents: List[str] = [] # Store tag names directly
120 |         current_element: Optional[DOMElementNode] = dom_element
121 |         while current_element is not None:
122 |             # Prepend tag name to maintain order from root to element
123 |             parents.insert(0, current_element.tag_name)
124 |             current_element = current_element.parent # Access the parent attribute
125 | 
126 |         # The loop includes the element itself, the definition might imply *excluding* it
127 |         # If path should *exclude* the element itself, remove the first element:
128 |         # if parents: parents.pop(0) # No, the JS build tree Xpath includes self, let's keep it consistent
129 |         return parents
130 | 
131 |     @staticmethod
132 |     def _parent_branch_path_hash(parent_branch_path: List[str]) -> str:
133 |         """Hashes the parent branch path string."""
134 |         # Normalize: use lowercase tags and join consistently
135 |         parent_branch_path_string = '/'.join(tag.lower() for tag in parent_branch_path)
136 |         return hashlib.sha256(parent_branch_path_string.encode('utf-8')).hexdigest()
137 | 
138 |     @staticmethod
139 |     def _attributes_hash(attributes: Dict[str, str]) -> str:
140 |         """Hashes the element's attributes dictionary."""
141 |         # Ensure consistent order by sorting keys
142 |         # Normalize attribute values (e.g., strip whitespace?) - Keep simple for now
143 |         attributes_string = ''.join(f'{key}={attributes[key]}' for key in sorted(attributes.keys()))
144 |         return hashlib.sha256(attributes_string.encode('utf-8')).hexdigest()
145 | 
146 |     @staticmethod
147 |     def _xpath_hash(xpath: str) -> str:
148 |         """Hashes the element's XPath."""
149 |         # Normalize XPath? (e.g., lowercase tags) - Assume input is consistent for now
150 |         return hashlib.sha256(xpath.encode('utf-8')).hexdigest()
151 | 
152 |     # _text_hash remains commented out / unused based on the original code's decision
153 |     # @staticmethod
154 |     # def _text_hash(dom_element: DOMElementNode) -> str:
155 |     #     """ """
156 |     #     text_string = dom_element.get_all_text_till_next_clickable_element()
157 |     #     return hashlib.sha256(text_string.encode()).hexdigest()
158 | 
159 | 
```
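
The processor above identifies an element by three independent SHA-256 digests (parent branch path, sorted attributes, XPath). A tiny self-contained sketch of the same idea, with plain values standing in for `DOMElementNode` fields:

```python
# Standalone sketch mirroring HistoryTreeProcessor's three-part hash identity.
import hashlib

def element_fingerprint(branch_path, attributes, xpath):
    sha = lambda s: hashlib.sha256(s.encode("utf-8")).hexdigest()
    return (
        sha("/".join(tag.lower() for tag in branch_path)),                 # branch_path_hash
        sha("".join(f"{k}={attributes[k]}" for k in sorted(attributes))),  # attributes_hash
        sha(xpath),                                                        # xpath_hash
    )

before = element_fingerprint(["html", "body", "form", "input"], {"id": "user"}, "/html/body/form/input[1]")
after = element_fingerprint(["html", "body", "form", "input"], {"id": "user"}, "/html/body/form/input[1]")
assert before == after  # same element identity across two DOM snapshots
```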

--------------------------------------------------------------------------------
/src/core/task_manager.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/core/task_manager.py
  2 | import logging
  3 | from typing import List, Dict, Any, Optional
  4 | 
  5 | logger = logging.getLogger(__name__)
  6 | 
  7 | class TaskManager:
  8 |     """Manages the main task, subtasks, progress, and status."""
  9 | 
 10 |     def __init__(self, max_retries_per_subtask: int = 2): # Renamed parameter for clarity internally
 11 |         self.main_task: str = "" # Stores the overall feature description
 12 |         self.subtasks: List[Dict[str, Any]] = [] # Stores the individual test steps
 13 |         self.current_subtask_index: int = 0 # Index of the step being processed or next to process
 14 |         self.max_retries_per_subtask: int = max_retries_per_subtask
 15 |         logger.info(f"TaskManager (Test Mode) initialized (max_retries_per_step={max_retries_per_subtask}).")
 16 | 
 17 |     def set_main_task(self, feature_description: str):
 18 |         """Sets the main feature description being tested."""
 19 |         self.main_task = feature_description
 20 |         self.subtasks = []
 21 |         self.current_subtask_index = 0
 22 |         logger.info(f"Feature under test set: {feature_description}")
 23 | 
 24 | 
 25 |     def add_subtasks(self, test_step_list: List[str]):
 26 |         """Adds a list of test steps derived from the feature description."""
 27 |         if not self.main_task:
 28 |             logger.error("Cannot add test steps before setting a feature description.")
 29 |             return
 30 | 
 31 |         if not isinstance(test_step_list, list) or not all(isinstance(s, str) and s for s in test_step_list):
 32 |              logger.error(f"Invalid test step list format received: {test_step_list}")
 33 |              raise ValueError("Test step list must be a non-empty list of non-empty strings.")
 34 | 
 35 |         self.subtasks = [] # Clear existing steps before adding new ones
 36 |         for desc in test_step_list:
 37 |             self.subtasks.append({
 38 |                 "description": desc, # The test step description
 39 |                 "status": "pending",  # pending, in_progress, done, failed
 40 |                 "attempts": 0,
 41 |                 "result": None, # Store result of the step (e.g., extracted text)
 42 |                 "error": None,  # Store error if the step failed
 43 |                 "_recorded_": False,
 44 |                 "last_failed_selector": None # Store selector if failure was element-related
 45 |             })
 46 |         self.current_subtask_index = 0 if self.subtasks else -1 # Reset index
 47 |         logger.info(f"Added {len(test_step_list)} test steps.")
 48 | 
 49 |     def insert_subtasks(self, index: int, new_step_descriptions: List[str]):
 50 |         """Inserts new test steps at a specific index."""
 51 |         if not isinstance(new_step_descriptions, list) or not all(isinstance(s, str) and s for s in new_step_descriptions):
 52 |             logger.error(f"Invalid new step list format received for insertion: {new_step_descriptions}")
 53 |             return False # Indicate failure
 54 | 
 55 |         if not (0 <= index <= len(self.subtasks)): # Allow insertion at the end
 56 |              logger.error(f"Invalid index {index} for inserting subtasks (Total steps: {len(self.subtasks)}).")
 57 |              return False
 58 | 
 59 |         new_tasks = []
 60 |         for desc in new_step_descriptions:
 61 |             new_tasks.append({
 62 |                 "description": desc,
 63 |                 "status": "pending", # New tasks start as pending
 64 |                 "attempts": 0,
 65 |                 "result": None,
 66 |                 "error": None,
 67 |                 "_recorded_": False, # Ensure internal flags are initialized
 68 |                 "last_failed_selector": None
 69 |             })
 70 | 
 71 |         # Insert the new tasks into the list
 72 |         self.subtasks[index:index] = new_tasks
 73 |         logger.info(f"Inserted {len(new_tasks)} new subtasks at index {index}.")
 74 | 
 75 |         # Crucial: If the insertion happens at or before the current index,
 76 |         # we might need to adjust the current index, but generally, the next call
 77 |         # to get_next_subtask() should find the newly inserted pending tasks if they
 78 |         # are before the previously 'current' task. Let get_next_subtask handle finding the next actionable item.
 79 |         # If insertion happens *after* current processing index, it doesn't immediately affect flow.
 80 | 
 81 |         return True # Indicate success
 82 | 
 83 | 
 84 |     def get_next_subtask(self) -> Optional[Dict[str, Any]]:
 85 |         """
 86 |         Gets the first test step that is 'pending' or 'failed' with retries remaining.
 87 |         Iterates sequentially.
 88 |         """
 89 |         for index, task in enumerate(self.subtasks):
 90 |             # In recorder mode, 'failed' means AI suggestion failed, allow retry
 91 |             # In executor mode (if used here), 'failed' means execution failed
 92 |             is_pending = task["status"] == "pending"
 93 |             is_retryable_failure = (task["status"] == "failed" and
 94 |                                     task["attempts"] <= self.max_retries_per_subtask)
 95 | 
 96 |             if is_pending or is_retryable_failure:
 97 |                  # Found the next actionable step
 98 | 
 99 |                  if is_retryable_failure:
100 |                      logger.info(f"Retrying test step {index + 1} (Attempt {task['attempts'] + 1}/{self.max_retries_per_subtask + 1})")
101 |                  else: # Pending
102 |                       logger.info(f"Starting test step {index + 1}/{len(self.subtasks)}: {task['description']}")
103 | 
104 |                  # Update the main index to point to this task BEFORE changing status
105 |                  self.current_subtask_index = index
106 | 
107 |                  task["status"] = "in_progress"
108 |                  task["attempts"] += 1
109 |                  # Keep error context on retry, clear result
110 |                  task["result"] = None
111 |                  return task
112 | 
113 |         # No actionable tasks found
114 |         logger.info("No more actionable test steps found.")
115 |         self.current_subtask_index = len(self.subtasks) # Mark completion
116 |         return None
117 | 
118 | 
119 | 
120 |     def update_subtask_status(self, index: int, status: str, result: Any = None, error: Optional[str] = None, force_update: bool = False):
121 |         """Updates the status of a specific test step."""
122 |         if 0 <= index < len(self.subtasks):
123 |             task = self.subtasks[index]
124 |             current_status = task["status"]
125 |             # NOTE: the 'in_progress' guard below is intentionally disabled; force_update currently has no effect.
126 |             # if not force_update and task["status"] != "in_progress":
127 |             #     logger.warning(f"Attempted to update status of test step {index + 1} ('{task['description'][:50]}...') "
128 |             #                 f"from '{task['status']}' to '{status}', but it's not 'in_progress'. Ignoring (unless force_update=True).")
129 |             #     return
130 |             
131 |             # Log if the status is actually changing
132 |             if current_status != status:
133 |                 logger.info(f"Updating Test Step {index + 1} status from '{current_status}' to '{status}'.")
134 |             else:
135 |                  logger.debug(f"Test Step {index + 1} status already '{status}'. Updating result/error.")
136 | 
137 |             task["status"] = status
138 |             task["result"] = result
139 |             task["error"] = error
140 | 
141 |             log_message = f"Test Step {index + 1} ('{task['description'][:50]}...') processed. Status: {status}."
142 |             if result and status == 'done': log_message += f" Result: {str(result)[:100]}..."
143 |             if error: log_message += f" Error/Note: {error}"
144 |             # Use debug for potentially repetitive updates if status doesn't change
145 |             log_level = logging.INFO if current_status != status else logging.DEBUG
146 |             logger.log(log_level, log_message)
147 | 
148 |             # Log permanent failure clearly
149 |             if status == "failed" and task["attempts"] > self.max_retries_per_subtask:
150 |                  logger.warning(f"Test Step {index + 1} failed permanently after {task['attempts']} attempts.")
151 | 
152 |         else:
153 |             logger.error(f"Invalid index {index} for updating test step status (Total steps: {len(self.subtasks)}).")
154 | 
155 | 
156 | 
157 |     def get_current_subtask(self) -> Optional[Dict[str, Any]]:
158 |          """Gets the test step currently marked by current_subtask_index (likely 'in_progress')."""
159 |          if 0 <= self.current_subtask_index < len(self.subtasks):
160 |               return self.subtasks[self.current_subtask_index]
161 |          return None
162 | 
163 | 
164 | 
165 |     def is_complete(self) -> bool:
166 |         """Checks if all test steps have been processed (are 'done' or 'failed' permanently)."""
167 |         for task in self.subtasks:
168 |             if task['status'] == 'pending' or \
169 |                task['status'] == 'in_progress' or \
170 |                (task['status'] == 'failed' and task['attempts'] <= self.max_retries_per_subtask):
171 |                 return False # Found an actionable step
172 |         return True # All steps processed
173 | 
```
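
A sketch of the intended driving loop for `TaskManager`. The step outcomes are hard-coded here; a real caller would execute each step in the browser before reporting its status:

```python
# Illustrative loop; step descriptions and outcomes are made up.
from src.core.task_manager import TaskManager

tm = TaskManager(max_retries_per_subtask=1)
tm.set_main_task("User can log in")
tm.add_subtasks([
    "Navigate to the login page",
    "Enter valid credentials and submit",
    "Assert the dashboard is visible",
])

while not tm.is_complete():
    step = tm.get_next_subtask()
    if step is None:
        break
    # A real caller would run the step here; we pretend it succeeded.
    tm.update_subtask_status(tm.current_subtask_index, "done", result="ok")
```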

--------------------------------------------------------------------------------
/src/agents/auth_agent.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/agents/auth_agent.py
  2 | import time
  3 | import os
  4 | import logging
  5 | import getpass
  6 | from typing import Optional, Dict, Any
  7 | from pydantic import BaseModel, Field
  8 | 
  9 | from patchright.sync_api import Error as PlaywrightError, TimeoutError as PlaywrightTimeoutError
 10 | 
 11 | # Import necessary components from your project structure
 12 | from ..browser.browser_controller import BrowserController
 13 | from ..llm.llm_client import LLMClient # Assuming you have this initialized
 14 | from ..dom.views import DOMState # To type hint DOM state
 15 | 
 16 | # Configure basic logging for this script
 17 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
 18 | logger = logging.getLogger(__name__)
 19 | 
 20 | 
 21 | # Generic descriptions for LLM to find elements
 22 | USERNAME_FIELD_DESC = "the username input field"
 23 | PASSWORD_FIELD_DESC = "the password input field"
 24 | SUBMIT_BUTTON_DESC = "the login or submit button"
 25 | # Element to verify login success
 26 | LOGIN_SUCCESS_SELECTOR_DESC = "the logout button or link" # Description for verification element
 27 | 
 28 | # --- Output file path ---
 29 | AUTH_STATE_FILE = "auth_state.json"
 30 | # ---------------------
 31 | 
 32 | # --- Pydantic Schema for LLM Selector Response ---
 33 | class LLMSelectorResponse(BaseModel):
 34 |     selector: Optional[str] = Field(..., description="The best CSS selector found for the described element, or null if not found/identifiable.")
 35 |     reasoning: str = Field(..., description="Explanation for the chosen selector or why none was found.")
 36 | # -----------------------------------------------
 37 | 
 38 | # --- Helper Function to Find Selector via LLM ---
 39 | def find_element_selector_via_llm(
 40 |     llm_client: LLMClient,
 41 |     element_description: str,
 42 |     dom_state: Optional[DOMState],
 43 |     page: Any # Playwright Page object for validation
 44 | ) -> Optional[str]:
 45 |     """
 46 |     Uses LLM to find a selector for a described element based on DOM context.
 47 |     Validates the selector before returning.
 48 |     """
 49 |     if not llm_client:
 50 |         logger.error("LLMClient is not available.")
 51 |         return None
 52 |     if not dom_state or not dom_state.element_tree:
 53 |         logger.error(f"Cannot find selector for '{element_description}': DOM state is not available.")
 54 |         return None
 55 | 
 56 |     try:
 57 |         dom_context_str, _ = dom_state.element_tree.generate_llm_context_string(context_purpose='verification')
 58 |         current_url = page.url if page else "Unknown"
 59 | 
 60 |         prompt = f"""
 61 | You are an AI assistant identifying CSS selectors for web automation.
 62 | Based on the following HTML context and the element description, provide the most robust CSS selector.
 63 | 
 64 | **Current URL:** {current_url}
 65 | **Element to Find:** "{element_description}"
 66 | 
 67 | **HTML Context (Visible elements, interactive `[index]`, static `(Static)`):**
 68 | ```html
 69 | {dom_context_str}
 70 | \```
 71 | 
 72 | **Your Task:**
 73 | 1. Analyze the HTML context to find the single element that best matches the description "{element_description}".
 74 | 2. Provide the most stable and specific CSS selector for that element. Prioritize IDs, unique data attributes (like data-testid), or name attributes. Avoid relying solely on text or highly dynamic classes if possible.
 75 | 3. If no suitable element is found, return null for the selector.
 76 | 
 77 | **Output Format:** Respond ONLY with a JSON object matching the following schema:
 78 | ```json
 79 | {{
 80 |   "selector": "YOUR_SUGGESTED_CSS_SELECTOR_OR_NULL",
 81 |   "reasoning": "Explain your choice or why none was found."
 82 | }}
 83 | \```
 84 | """
 85 |         logger.debug(f"Sending prompt to LLM to find selector for: '{element_description}'")
 86 |         response_obj = llm_client.generate_json(LLMSelectorResponse, prompt)
 87 | 
 88 |         if isinstance(response_obj, LLMSelectorResponse):
 89 |             selector = response_obj.selector
 90 |             reasoning = response_obj.reasoning
 91 |             if selector:
 92 |                 logger.info(f"LLM suggested selector '{selector}' for '{element_description}'. Reasoning: {reasoning}")
 93 |                 # --- Validate Selector ---
 94 |                 try:
 95 |                     handles = page.query_selector_all(selector)
 96 |                     count = len(handles)
 97 |                     if count == 1:
 98 |                         logger.info(f"✅ Validation PASSED: Selector '{selector}' uniquely found the element.")
 99 |                         return selector
100 |                     elif count > 1:
101 |                         logger.warning(f"⚠️ Validation WARNING: Selector '{selector}' matched {count} elements. Using the first one.")
102 |                         return selector # Still return it, maybe it's okay
103 |                     else: # count == 0
104 |                         logger.error(f"❌ Validation FAILED: Selector '{selector}' did not find any elements.")
105 |                         return None
106 |                 except Exception as validate_err:
107 |                     logger.error(f"❌ Validation ERROR for selector '{selector}': {validate_err}")
108 |                     return None
109 |                 # --- End Validation ---
110 |             else:
111 |                 logger.error(f"LLM could not find a selector for '{element_description}'. Reasoning: {reasoning}")
112 |                 return None
113 |         elif isinstance(response_obj, str): # LLM Error string
114 |              logger.error(f"LLM returned an error finding selector for '{element_description}': {response_obj}")
115 |              return None
116 |         else:
117 |             logger.error(f"Unexpected response type from LLM finding selector for '{element_description}': {type(response_obj)}")
118 |             return None
119 | 
120 |     except Exception as e:
121 |         logger.error(f"Error during LLM selector identification for '{element_description}': {e}", exc_info=True)
122 |         return None
123 | # --- End Helper Function ---
124 | 
125 | 
126 | # --- Main Function ---
127 | def record_selectors_and_save_auth_state(llm_client: LLMClient, login_url: str, auth_state_file: str = AUTH_STATE_FILE):
128 |     """
129 |     Uses LLM to find login selectors, gets credentials securely, performs login,
130 |     and saves the authentication state.
131 |     """
132 |     logger.info("--- Authentication State Generation (Recorder-Assisted Selectors) ---")
133 | 
134 |     if not login_url:
135 |         logger.error(f"Login url not provided. Exiting...")
136 |         return False
137 |     
138 |     # Get credentials securely first
139 |     try:
140 |         username = input(f"Enter username (will be visible): ")
141 |         if not username: raise ValueError("Username cannot be empty.")
142 |         password = getpass.getpass(f"Enter password for '{username}' (input will be hidden): ")
143 |         if not password: raise ValueError("Password cannot be empty.")
144 |     except (EOFError, ValueError) as e:
145 |         logger.error(f"\n❌ Input error: {e}. Aborting.")
146 |         return False
147 |     except Exception as e:
148 |         logger.error(f"\n❌ Error reading input: {e}")
149 |         return False
150 | 
151 |     logger.info("Initializing BrowserController (visible browser)...")
152 |     # Must run non-headless for user interaction/visibility AND selector validation
153 |     browser_controller = BrowserController(headless=False)
154 |     final_success = False
155 | 
156 |     try:
157 |         browser_controller.start()
158 |         page = browser_controller.page
159 |         if not page: raise RuntimeError("Failed to initialize browser page.")
160 | 
161 |         logger.info(f"Navigating browser to login page: {login_url}")
162 |         browser_controller.goto(login_url)
163 | 
164 |         logger.info("Attempting to identify login form selectors using LLM...")
165 |         # Give the page a moment to settle before getting DOM
166 |         time.sleep(1)
167 |         dom_state = browser_controller.get_structured_dom(highlight_all_clickable_elements=False, viewport_expansion=-1)
168 | 
169 |         # Find Selectors using the helper function
170 |         username_selector = find_element_selector_via_llm(llm_client, USERNAME_FIELD_DESC, dom_state, page)
171 |         if not username_selector: return False # Abort if not found
172 | 
173 |         password_selector = find_element_selector_via_llm(llm_client, PASSWORD_FIELD_DESC, dom_state, page)
174 |         if not password_selector: return False
175 | 
176 |         submit_selector = find_element_selector_via_llm(llm_client, SUBMIT_BUTTON_DESC, dom_state, page)
177 |         if not submit_selector: return False
178 | 
179 |         logger.info("Successfully identified all necessary login selectors.")
180 |         logger.info(f"  Username Field: '{username_selector}'")
181 |         logger.info(f"  Password Field: '{password_selector}'")
182 |         logger.info(f"  Submit Button:  '{submit_selector}'")
183 | 
184 |         input("\n-> Press Enter to proceed with login using these selectors and your credentials...")
185 | 
186 |         # --- Execute Login (using identified selectors and secure credentials) ---
187 |         logger.info(f"Typing username into: {username_selector}")
188 |         browser_controller.type(username_selector, username)
189 |         time.sleep(0.3)
190 | 
191 |         logger.info(f"Typing password into: {password_selector}")
192 |         browser_controller.type(password_selector, password)
193 |         time.sleep(0.3)
194 | 
195 |         logger.info(f"Clicking submit button: {submit_selector}")
196 |         browser_controller.click(submit_selector)
197 | 
198 |         # --- Verify Login Success ---
199 |         logger.info("Attempting to identify login success element selector using LLM...")
200 |         # Re-fetch DOM state after potential page change/update
201 |         time.sleep(1) # Wait briefly for page update
202 |         post_login_dom_state = browser_controller.get_structured_dom(highlight_all_clickable_elements=False, viewport_expansion=-1)
203 |         login_success_selector = find_element_selector_via_llm(llm_client, LOGIN_SUCCESS_SELECTOR_DESC, post_login_dom_state, page)
204 | 
205 |         if not login_success_selector:
206 |             logger.error("❌ Login Verification Failed: Could not identify the confirmation element via LLM.")
207 |             raise RuntimeError("Failed to identify login confirmation element.") # Treat as failure
208 | 
209 |         logger.info(f"Waiting for login confirmation element ({login_success_selector}) to appear...")
210 |         try:
211 |             page.locator(login_success_selector).wait_for(state="visible", timeout=15000)
212 |             logger.info("✅ Login successful! Confirmation element found.")
213 |         except PlaywrightTimeoutError:
214 |             logger.error(f"❌ Login Failed: Confirmation element '{login_success_selector}' did not appear within timeout.")
215 |             raise # Re-raise to be caught by the main handler
216 | 
217 |         # --- Save the storage state ---
218 |         if browser_controller.context:
219 |             logger.info(f"Saving authentication state to {auth_state_file}...")
220 |             browser_controller.context.storage_state(path=auth_state_file)
221 |             logger.info(f"✅ Successfully saved authentication state.")
222 |             final_success = True
223 |         else:
224 |             logger.error("❌ Cannot save state: Browser context is not available.")
225 | 
226 |     except (PlaywrightError, ValueError, RuntimeError) as e:
227 |         logger.error(f"❌ An error occurred: {type(e).__name__}: {e}", exc_info=False)
228 |         if browser_controller and browser_controller.page:
229 |             ts = time.strftime("%Y%m%d_%H%M%S")
230 |             fail_path = f"output/record_auth_error_{ts}.png"
231 |             browser_controller.save_screenshot(fail_path)
232 |             logger.info(f"Saved error screenshot to: {fail_path}")
233 |     except Exception as e:
234 |         logger.critical(f"❌ An unexpected critical error occurred: {e}", exc_info=True)
235 |     finally:
236 |         logger.info("Closing browser...")
237 |         if browser_controller:
238 |             browser_controller.close()
239 | 
240 |     return final_success
241 | # --- End Main Function ---
242 | 
243 | 
```
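
A sketch of invoking the recorder and reusing the saved state. The login URL is a placeholder and `LLMClient()` construction is an assumption; credentials are prompted interactively at runtime:

```python
# Illustrative invocation; URL and client construction are assumptions.
from src.llm.llm_client import LLMClient
from src.agents.auth_agent import record_selectors_and_save_auth_state

if record_selectors_and_save_auth_state(LLMClient(), "https://example.com/login"):
    # Later sessions can reuse the saved cookies/localStorage, e.g. in Playwright:
    #   context = browser.new_context(storage_state="auth_state.json")
    print("Auth state saved to auth_state.json")
```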

--------------------------------------------------------------------------------
/src/security/zap_scanner.py:
--------------------------------------------------------------------------------

```python
  1 | # zap_scanner.py
  2 | import logging
  3 | import subprocess
  4 | import os
  5 | import shlex
  6 | import time
  7 | import json
  8 | import requests
  9 | from datetime import datetime
 10 | from .utils import parse_json_file  # Relative import
 11 | 
 12 | ZAP_TIMEOUT_SECONDS = 1800  # 30 minutes default
 13 | ZAP_API_PORT = 8080  # Default ZAP API port
 14 | 
 15 | def run_zap_scan(target_url: str, output_dir="results", timeout=ZAP_TIMEOUT_SECONDS, 
 16 |                  zap_path=None, api_key=None, scan_mode="baseline"):
 17 |     """
 18 |     Runs OWASP ZAP security scanner against a target URL.
 19 |     
 20 |     Args:
 21 |         target_url: The URL to scan
 22 |         output_dir: Directory to store scan results
 23 |         timeout: Maximum time in seconds for the scan
 24 |         zap_path: Path to ZAP installation (uses docker by default)
 25 |         api_key: ZAP API key if required
 26 |         scan_mode: Type of scan - 'baseline', 'full' or 'api'
 27 |     """
 28 |     if not target_url:
 29 |         logging.error("ZAP target URL is required")
 30 |         return []
 31 | 
 32 |     logging.info(f"Starting ZAP scan for target: {target_url}")
 33 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
 34 |     output_filename = f"zap_output_{timestamp}.json"
 35 |     output_filepath = os.path.join(output_dir, output_filename)
 36 | 
 37 |     if not os.path.exists(output_dir):
 38 |         os.makedirs(output_dir)
 39 | 
 40 |     # Determine if using Docker or local ZAP installation
 41 |     use_docker = zap_path is None
 42 |     
 43 |     if use_docker:
 44 |         # Docker command to run ZAP in a container
 45 |         command = [
 46 |             "docker", "run", "--rm", "-v", f"{os.path.abspath(output_dir)}:/zap/wrk:rw",
 47 |             "-t", "owasp/zap2docker-stable", "zap-" + scan_mode + "-scan.py",
 48 |             "-t", target_url,
 49 |             "-J", output_filename
 50 |         ]
 51 |         
 52 |         if api_key:
 53 |             command.extend(["-z", f"api.key={api_key}"])
 54 |     else:
 55 |         # Local ZAP installation
 56 |         script_name = f"zap-{scan_mode}-scan.py"
 57 |         command = [
 58 |             os.path.join(zap_path, script_name),
 59 |             "-t", target_url,
 60 |             "-J", output_filepath
 61 |         ]
 62 |         
 63 |         if api_key:
 64 |             command.extend(["-z", f"api.key={api_key}"])
 65 | 
 66 |     logging.debug(f"Executing ZAP command: {' '.join(shlex.quote(cmd) for cmd in command)}")
 67 | 
 68 |     try:
 69 |         result = subprocess.run(command, capture_output=True, text=True, timeout=timeout, check=False)
 70 | 
 71 |         logging.info("ZAP process finished.")
 72 |         logging.debug(f"ZAP stdout:\n{result.stdout}")
 73 | 
 74 |         if result.returncode != 0:
 75 |             logging.warning(f"ZAP exited with non-zero status code: {result.returncode}")
 76 |             return [f"ZAP exited with non-zero status code: {result.returncode}"]
 77 | 
 78 |         # For Docker, the output will be in the mapped volume
 79 |         actual_output_path = output_filepath if not use_docker else os.path.join(output_dir, output_filename)
 80 |         
 81 |         # Parse the JSON output file
 82 |         report_data = parse_json_file(actual_output_path)
 83 |         
 84 |         if report_data and "site" in report_data:
 85 |             # Process ZAP findings from the report
 86 |             findings = []
 87 |             
 88 |             # Structure varies based on scan mode but generally has sites with alerts
 89 |             for site in report_data.get("site", []):
 90 |                 site_url = site.get("@name", "")
 91 |                 for alert in site.get("alerts", []):
 92 |                     finding = {
 93 |                         'tool': 'OWASP ZAP',
 94 |                         'severity': alert.get("riskdesc", "").split(" ", 1)[0],
 95 |                         'message': alert.get("name", ""),
 96 |                         'description': alert.get("desc", ""),
 97 |                         'url': site_url,
 98 |                         'solution': alert.get("solution", ""),
 99 |                         'references': alert.get("reference", ""),
100 |                         'cweid': alert.get("cweid", ""),
101 |                         'instances': len(alert.get("instances", [])),
102 |                     }
103 |                     findings.append(finding)
104 |             
105 |             logging.info(f"Successfully parsed {len(findings)} findings from ZAP output.")
106 |             return findings
107 |         else:
108 |             logging.warning(f"Could not parse findings from ZAP output file: {actual_output_path}")
109 |             return [f"Could not parse findings from ZAP output file: {actual_output_path}"]
110 | 
111 |     except subprocess.TimeoutExpired:
112 |         logging.error(f"ZAP scan timed out after {timeout} seconds.")
113 |         return [f"ZAP scan timed out after {timeout} seconds."]
114 |     except FileNotFoundError as e:
115 |         if use_docker:
116 |             logging.error("Docker command not found. Is Docker installed and in PATH?")
117 |             return ["Docker command not found. Is Docker installed and in PATH?"]
118 |         else:
119 |             logging.error(f"ZAP command not found at {zap_path}. Is ZAP installed?")
120 |             return [f"ZAP command not found at {zap_path}. Is ZAP installed?"]
121 |     except Exception as e:
122 |         logging.error(f"An unexpected error occurred while running ZAP: {e}")
123 |         return [f"An unexpected error occurred while running ZAP: {e}"]
124 | 
125 | 
126 | def run_zap_api_scan(target_url: str, api_definition: str, output_dir="results", 
127 |                     timeout=ZAP_TIMEOUT_SECONDS, zap_path=None, api_key=None):
128 |     """
129 |     Runs ZAP API scan against a REST API with OpenAPI/Swagger definition.
130 |     
131 |     Args:
132 |         target_url: Base URL of the API
133 |         api_definition: Path to OpenAPI/Swagger definition file
134 |         output_dir: Directory to store scan results
135 |         timeout: Maximum time in seconds for the scan
136 |         zap_path: Path to ZAP installation (uses docker by default)
137 |         api_key: ZAP API key if required
138 |     """
139 |     if not os.path.isfile(api_definition):
140 |         logging.error(f"API definition file not found: {api_definition}")
141 |         return []
142 |         
143 |     # Similar implementation as run_zap_scan but with API scanning options
144 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
145 |     output_filename = f"zap_api_output_{timestamp}.json"
146 |     output_filepath = os.path.join(output_dir, output_filename)
147 | 
148 |     if not os.path.exists(output_dir):
149 |         os.makedirs(output_dir)
150 |     
151 |     use_docker = zap_path is None
152 |     
153 |     if use_docker:
154 |         # Volume mount for API definition file
155 |         api_def_dir = os.path.dirname(os.path.abspath(api_definition))
156 |         api_def_file = os.path.basename(api_definition)
157 |         
158 |         command = [
159 |             "docker", "run", "--rm", 
160 |             "-v", f"{os.path.abspath(output_dir)}:/zap/wrk:rw",
161 |             "-v", f"{api_def_dir}:/zap/api:ro",
162 |             "-t", "owasp/zap2docker-stable", "zap-api-scan.py",
163 |             "-t", target_url,
164 |             "-f", f"/zap/api/{api_def_file}",
165 |             "-J", output_filename
166 |         ]
167 |     else:
168 |         command = [
169 |             os.path.join(zap_path, "zap-api-scan.py"),
170 |             "-t", target_url,
171 |             "-f", api_definition,
172 |             "-J", output_filepath
173 |         ]
174 |     
175 |     if api_key:
176 |         command.extend(["-z", f"api.key={api_key}"])
177 |     
178 |     logging.debug(f"Executing ZAP API scan command: {' '.join(shlex.quote(cmd) for cmd in command)}")
179 |     
180 |     # The rest of the implementation follows similar pattern to run_zap_scan
181 |     try:
182 |         result = subprocess.run(command, capture_output=True, text=True, timeout=timeout, check=False)
183 |         
184 |         # Processing similar to run_zap_scan
185 |         logging.info("ZAP API scan process finished.")
186 |         logging.debug(f"ZAP stdout:\n{result.stdout}")
187 | 
188 |         if result.returncode != 0:
189 |             logging.warning(f"ZAP API scan exited with non-zero status code: {result.returncode}")
190 |             return [f"ZAP API scan exited with non-zero status code: {result.returncode}"]
191 | 
192 |         # For Docker, the output will be in the mapped volume
193 |         actual_output_path = output_filepath if not use_docker else os.path.join(output_dir, output_filename)
194 |         
195 |         # Parse the JSON output file - same processing as run_zap_scan
196 |         report_data = parse_json_file(actual_output_path)
197 |         
198 |         if report_data and "site" in report_data:
199 |             findings = []
200 |             
201 |             for site in report_data.get("site", []):
202 |                 site_url = site.get("@name", "")
203 |                 for alert in site.get("alerts", []):
204 |                     finding = {
205 |                         'tool': 'OWASP ZAP API Scan',
206 |                         'severity': alert.get("riskdesc", "").split(" ", 1)[0],
207 |                         'message': alert.get("name", ""),
208 |                         'description': alert.get("desc", ""),
209 |                         'url': site_url,
210 |                         'solution': alert.get("solution", ""),
211 |                         'references': alert.get("reference", ""),
212 |                         'cweid': alert.get("cweid", ""),
213 |                         'instances': len(alert.get("instances", [])),
214 |                     }
215 |                     findings.append(finding)
216 |             
217 |             logging.info(f"Successfully parsed {len(findings)} findings from ZAP API scan output.")
218 |             return findings
219 |         else:
220 |             logging.warning(f"Could not parse findings from ZAP API scan output file: {actual_output_path}")
221 |             return [f"Could not parse findings from ZAP API scan output file: {actual_output_path}"]
222 |             
223 |     except Exception as e:
224 |         logging.error(f"An unexpected error occurred while running ZAP API scan: {e}")
225 |         return [f"An unexpected error occurred while running ZAP API scan: {e}"]
226 | 
227 | def discover_endpoints(target_url: str, output_dir="results", timeout=600, zap_path=None, api_key=None):
228 |     """
229 |     Uses ZAP's spider to discover endpoints in a web application.
230 |     
231 |     Args:
232 |         target_url: The URL to scan
233 |         output_dir: Directory to store results
234 |         timeout: Maximum time in seconds for the spider
235 |         zap_path: Path to ZAP installation (uses docker by default)
236 |         api_key: ZAP API key if required
237 |     
238 |     Returns:
239 |         List of discovered endpoints and their details
240 |     """
241 |     if not target_url:
242 |         logging.error("Target URL is required for endpoint discovery")
243 |         return []
244 | 
245 |     logging.info(f"Starting ZAP endpoint discovery for: {target_url}")
246 |     timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
247 |     output_filename = f"zap_endpoints_{timestamp}.json"
248 |     output_filepath = os.path.join(output_dir, output_filename)
249 | 
250 |     if not os.path.exists(output_dir):
251 |         os.makedirs(output_dir)
252 | 
253 |     use_docker = zap_path is None
254 |     
255 |     if use_docker:
256 |         command = [
257 |             "docker", "run", "--rm",
258 |             "-v", f"{os.path.abspath(output_dir)}:/zap/wrk:rw",
259 |             "-t", "owasp/zap2docker-stable",
260 |             "zap-full-scan.py",
261 |             "-t", target_url,
262 |             "-J", output_filename,
263 |             "-z", "-config spider.maxDuration=1",  # Limit spider duration
264 |             "--spider-first",  # Run spider before the scan
265 |             "-n", "endpoints.context"  # Don't perform actual scan, just spider
266 |         ]
267 |     else:
268 |         command = [
269 |             os.path.join(zap_path, "zap-full-scan.py"),
270 |             "-t", target_url,
271 |             "-J", output_filepath,
272 |             "-z", "-config spider.maxDuration=1",
273 |             "--spider-first",
274 |             "-n", "endpoints.context"
275 |         ]
276 | 
277 |     if api_key:
278 |         command.extend(["-z", f"api.key={api_key}"])
279 | 
280 |     logging.debug(f"Executing ZAP endpoint discovery: {' '.join(shlex.quote(cmd) for cmd in command)}")
281 | 
282 |     try:
283 |         result = subprocess.run(command, capture_output=True, text=True, timeout=timeout, check=False)
284 |         
285 |         logging.info("ZAP endpoint discovery finished.")
286 |         logging.debug(f"ZAP stdout:\n{result.stdout}")
287 | 
288 |         actual_output_path = output_filepath if not use_docker else os.path.join(output_dir, output_filename)
289 |         
290 |         # Parse the JSON output file
291 |         report_data = parse_json_file(actual_output_path)
292 |         
293 |         if report_data:
294 |             endpoints = []
295 |             # Extract endpoints from spider results
296 |             if "site" in report_data:
297 |                 for site in report_data.get("site", []):
298 |                     site_url = site.get("@name", "")
299 |                     # Extract URLs from alerts and spider results
300 |                     urls = set()
301 |                     
302 |                     # Get URLs from alerts
303 |                     for alert in site.get("alerts", []):
304 |                         for instance in alert.get("instances", []):
305 |                             url = instance.get("uri", "")
306 |                             if url:
307 |                                 urls.add(url)
308 |                     
309 |                     # Add discovered endpoints
310 |                     for url in urls:
311 |                         endpoint = {
312 |                             'url': url,
313 |                             'method': 'GET',  # Default to GET, ZAP spider mainly discovers GET endpoints
314 |                             'source': 'ZAP Spider',
315 |                             'parameters': [],  # Could be enhanced to parse URL parameters
316 |                             'discovered_at': datetime.now().isoformat()
317 |                         }
318 |                         endpoints.append(endpoint)
319 |             
320 |             logging.info(f"Successfully discovered {len(endpoints)} endpoints.")
321 |             
322 |             # Save endpoints to a separate file
323 |             endpoints_file = os.path.join(output_dir, f"discovered_endpoints_{timestamp}.json")
324 |             with open(endpoints_file, 'w') as f:
325 |                 json.dump(endpoints, f, indent=2)
326 |             logging.info(f"Saved discovered endpoints to: {endpoints_file}")
327 |             
328 |             return endpoints
329 |         else:
330 |             logging.warning("No endpoints discovered or parsing failed.")
331 |             return []
332 | 
333 |     except subprocess.TimeoutExpired:
334 |         logging.error(f"Endpoint discovery timed out after {timeout} seconds.")
335 |         return []
336 |     except Exception as e:
337 |         logging.error(f"An error occurred during endpoint discovery: {e}")
338 |         return []
```
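
For context, a minimal usage sketch of `discover_endpoints` (hypothetical caller, not part of the repository; assumes a locally running target and an existing `./results` directory):

```python
# Hypothetical usage sketch, not part of the repository.
from src.security.zap_scanner import discover_endpoints

# Spider a locally running app; the function returns a list of endpoint dicts
# with 'url', 'method', 'source', 'parameters', and 'discovered_at' keys.
endpoints = discover_endpoints(
    target_url="http://localhost:8000",  # assumed test target
    output_dir="./results",
    timeout=600,
)
for ep in endpoints:
    print(ep["method"], ep["url"], "via", ep["source"])
```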

--------------------------------------------------------------------------------
/src/llm/clients/openai_client.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/llm/clients/openai_client.py
  2 | from PIL import Image
  3 | import io
  4 | import logging
  5 | import time # Import time module
  6 | import threading # Import threading for lock
  7 | from typing import Type, Optional, Union, List, Dict, Any
  8 | logger = logging.getLogger(__name__)
  9 | import base64
 10 | import json
 11 | 
 12 | from ...utils.utils import load_api_key, load_api_base_url, load_api_version, load_llm_model
 13 | 
 14 | # --- Provider Specific Imports ---
 15 | try:
 16 |     import openai
 17 |     from openai import OpenAI
 18 |     from pydantic import BaseModel # Needed for LLM JSON tool definition
 19 |     OPENAI_SDK = True
 20 | except ImportError:
 21 |     OPENAI_SDK = False
 22 |     # Define dummy classes if LLM libs are not installed to avoid NameErrors
 23 |     class BaseModel: pass
 24 |     class OpenAI: pass
 25 | 
 26 | 
 27 | 
 28 | 
 29 | # --- Helper Function ---
 30 | def _image_bytes_to_base64_url(image_bytes: bytes) -> Optional[str]:
 31 |     """Converts image bytes to a base64 data URL."""
 32 |     try:
 33 |         # Try to determine the image format
 34 |         img = Image.open(io.BytesIO(image_bytes))
 35 |         format = img.format
 36 |         if not format:
 37 |             logger.warning("Could not determine image format, assuming JPEG.")
 38 |             format = "jpeg" # Default assumption
 39 |         else:
 40 |             format = format.lower()
 41 |             if format == 'jpg': # Standardize to jpeg
 42 |                 format = 'jpeg'
 43 | 
 44 |         # Ensure format is supported (common web formats)
 45 |         if format not in ['jpeg', 'png', 'gif', 'webp']:
 46 |              logger.warning(f"Unsupported image format '{format}' for base64 URL, defaulting to JPEG.")
 47 |              format = 'jpeg' # Fallback
 48 | 
 49 |         encoded_string = base64.b64encode(image_bytes).decode('utf-8')
 50 |         return f"data:image/{format};base64,{encoded_string}"
 51 |     except Exception as e:
 52 |         logger.error(f"Error converting image bytes to base64 URL: {e}", exc_info=True)
 53 |         return None
 54 | 
 55 | 
 56 | class OpenAIClient:
 57 |     def __init__(self):
 58 |         self.client = None
 59 |         self.LLM_api_key = load_api_key()
 60 |         self.LLM_api_version = load_api_version()
 61 |         self.LLM_model_name = load_llm_model()
 62 |         self.LLM_endpoint = load_api_base_url()
 63 |         self.LLM_vision_model_name = self.LLM_model_name
 64 |         
 65 |         if not OPENAI_SDK:
 66 |             raise ImportError("LLM OpenAI libraries (openai, pydantic) are not installed. Please install them.")
 67 |         if not all([self.LLM_api_key, self.LLM_endpoint, self.LLM_model_name]):
 68 |             raise ValueError("LLM_api_key, LLM_endpoint, and LLM_model_name are required for provider 'LLM'")
 69 |         try:
 70 |             self.client = OpenAI(
 71 |                 api_key=self.LLM_api_key,
 72 |                 base_url=self.LLM_endpoint,
 73 |             )
 74 |             # Test connection slightly by listing models (optional, requires different permission potentially)
 75 |             # self.client.models.list()
 76 |             logger.info(f"LLM OpenAI Client initialized for endpoint {self.LLM_endpoint} and model {self.LLM_model_name}.")
 77 |         except Exception as e:
 78 |             logger.error(f"Failed to initialize LLM OpenAI Client: {e}", exc_info=True)
 79 |             raise RuntimeError(f"LLM client initialization failed: {e}")
 80 |     
 81 |     def generate_text(self, prompt: str) -> str:
 82 |          try:
 83 |              log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
 84 |              logger.debug(f"[LLM] Sending text prompt (truncated): {log_prompt}")
 85 |              messages = [{"role": "user", "content": prompt}]
 86 |              response = self.client.chat.completions.create(
 87 |                  model=self.LLM_model_name,
 88 |                  messages=messages,
 89 |                  max_tokens=1024, # Adjust as needed
 90 |              )
 91 |              logger.debug("[LLM] Received text response.")
 92 | 
 93 |              if response.choices:
 94 |                  message = response.choices[0].message
 95 |                  if message.content:
 96 |                      return message.content
 97 |                  else:
 98 |                      # Handle cases like function calls if they unexpectedly occur or content filter
 99 |                      finish_reason = response.choices[0].finish_reason
100 |                      logger.warning(f"[LLM] Text generation returned no content. Finish reason: {finish_reason}. Response: {response.model_dump_json(indent=2)}")
101 |                      if finish_reason == 'content_filter':
102 |                          return "Error: [LLM] Content generation blocked due to content filter."
103 |                      return "Error: [LLM] Empty response from LLM."
104 |              else:
105 |                  logger.warning(f"[LLM] Text generation returned no choices. Response: {response.model_dump_json(indent=2)}")
106 |                  return "Error: [LLM] No choices returned from LLM."
107 | 
108 |          except openai.APIError as e:
109 |              # Handle API error here, e.g. retry or log
110 |              logger.error(f"[LLM] OpenAI API returned an API Error: {e}", exc_info=True)
111 |              return f"Error: [LLM] API Error - {type(e).__name__}: {e}"
112 |          except openai.AuthenticationError as e:
113 |              logger.error(f"[LLM] OpenAI API authentication error: {e}", exc_info=True)
114 |              return f"Error: [LLM] Authentication Error - {e}"
115 |          except openai.RateLimitError as e:
116 |              logger.error(f"[LLM] OpenAI API request exceeded rate limit: {e}", exc_info=True)
117 |              # Note: Our simple time.sleep might not be enough for LLM's complex limits
118 |              return f"Error: [LLM] Rate limit exceeded - {e}"
119 |          except Exception as e:
120 |              logger.error(f"Error during LLM text generation: {e}", exc_info=True)
121 |              return f"Error: [LLM] Failed to communicate with API - {type(e).__name__}: {e}"
122 | 
123 |     def generate_multimodal(self, prompt: str, image_bytes: bytes) -> str:
124 |         if not self.LLM_vision_model_name:
125 |              logger.error("[LLM] LLM vision model name not configured.")
126 |              return "Error: [LLM] Vision model not configured."
127 | 
128 |         base64_url = _image_bytes_to_base64_url(image_bytes)
129 |         if not base64_url:
130 |             return "Error: [LLM] Failed to convert image to base64."
131 | 
132 |         try:
133 |             log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
134 |             logger.debug(f"[LLM] Sending multimodal prompt (truncated): {log_prompt} with image.")
135 | 
136 |             messages = [
137 |                 {
138 |                     "role": "user",
139 |                     "content": [
140 |                         {"type": "text", "text": prompt},
141 |                         {"type": "image_url", "image_url": {"url": base64_url}},
142 |                     ],
143 |                 }
144 |             ]
145 | 
146 |             response = self.client.chat.completions.create(
147 |                 model=self.LLM_vision_model_name, # Use the vision model deployment
148 |                 messages=messages,
149 |                 max_tokens=1024, # Adjust as needed
150 |             )
151 |             logger.debug("[LLM] Received multimodal response.")
152 | 
153 |             # Parsing logic similar to text generation
154 |             if response.choices:
155 |                 message = response.choices[0].message
156 |                 if message.content:
157 |                     return message.content
158 |                 else:
159 |                     finish_reason = response.choices[0].finish_reason
160 |                     logger.warning(f"[LLM] Multimodal generation returned no content. Finish reason: {finish_reason}. Response: {response.model_dump_json(indent=2)}")
161 |                     if finish_reason == 'content_filter':
162 |                         return "Error: [LLM] Content generation blocked due to content filter."
163 |                     return "Error: [LLM] Empty multimodal response from LLM."
164 |             else:
165 |                 logger.warning(f"[LLM] Multimodal generation returned no choices. Response: {response.model_dump_json(indent=2)}")
166 |                 return "Error: [LLM] No choices returned from Vision LLM."
167 | 
168 |         except openai.APIError as e:
169 |              logger.error(f"[LLM] OpenAI Vision API returned an API Error: {e}", exc_info=True)
170 |              return f"Error: [LLM] Vision API Error - {type(e).__name__}: {e}"
171 |         # Add other specific openai exceptions as needed (AuthenticationError, RateLimitError, etc.)
172 |         except Exception as e:
173 |             logger.error(f"Error during LLM multimodal generation: {e}", exc_info=True)
174 |             return f"Error: [LLM] Failed to communicate with Vision API - {type(e).__name__}: {e}"
175 | 
176 | 
177 |     def generate_json(self, Schema_Class: Type[BaseModel], prompt: str, image_bytes: Optional[bytes] = None) -> Union[BaseModel, str]:
178 |          if not issubclass(Schema_Class, BaseModel):
179 |               logger.error(f"[LLM] Schema_Class must be a Pydantic BaseModel for LLM JSON generation.")
180 |               return "Error: [LLM] Invalid schema type provided."
181 | 
182 |          current_model = self.LLM_model_name
183 |          messages: List[Dict[str, Any]] = [{"role": "user", "content": []}] # Initialize user content as list
184 | 
185 |          # Prepare content (text and optional image)
186 |          text_content = {"type": "text", "text": prompt}
187 |          messages[0]["content"].append(text_content) # type: ignore
188 | 
189 |          log_msg_suffix = ""
190 |          if image_bytes is not None:
191 |              if not self.LLM_vision_model_name:
192 |                   logger.error("[LLM] LLM vision model name not configured for multimodal JSON.")
193 |                   return "Error: [LLM] Vision model not configured for multimodal JSON."
194 |              current_model = self.LLM_vision_model_name # Use vision model if image is present
195 | 
196 |              base64_url = _image_bytes_to_base64_url(image_bytes)
197 |              if not base64_url:
198 |                  return "Error: [LLM] Failed to convert image to base64 for JSON."
199 |              image_content = {"type": "image_url", "image_url": {"url": base64_url}}
200 |              messages[0]["content"].append(image_content) # type: ignore
201 |              log_msg_suffix = " with image"
202 | 
203 | 
204 |          # Prepare the tool based on the Pydantic schema
205 |          try:
206 |              tool_def = openai.pydantic_function_tool(Schema_Class)
207 |              tools = [tool_def]
208 |              # Tool choice can force the model to use the function, or let it decide.
209 |              # Forcing it: tool_choice = {"type": "function", "function": {"name": Schema_Class.__name__}}
210 |              # Letting it decide (often better unless you *know* it must be called): tool_choice = "auto"
211 |              # Let's explicitly request the tool for structured output
212 |              tool_choice = {"type": "function", "function": {"name": tool_def['function']['name']}}
213 | 
214 |          except Exception as tool_err:
215 |              logger.error(f"[LLM] Failed to create tool definition from schema {Schema_Class.__name__}: {tool_err}", exc_info=True)
216 |              return f"Error: [LLM] Failed to create tool definition - {tool_err}"
217 | 
218 | 
219 |          try:
220 |              log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
221 |              logger.debug(f"[LLM] Sending JSON prompt (truncated): {log_prompt}{log_msg_suffix} with schema {Schema_Class.__name__}")
222 | 
223 |              # Add a system prompt to guide the model (optional but helpful)
224 |              system_message = {"role": "system", "content": f"You are a helpful assistant. Use the provided '{Schema_Class.__name__}' tool to structure your response based on the user's request."}
225 |              final_messages = [system_message] + messages
226 | 
227 |              response = self.client.chat.completions.create(
228 |                  model=current_model, # Use vision model if image included
229 |                  messages=final_messages,
230 |                  tools=tools,
231 |                  tool_choice=tool_choice, # Request the specific tool
232 |                  max_tokens=2048, # Adjust as needed
233 |              )
234 |              logger.debug("[LLM] Received JSON response structure.")
235 | 
236 |              if response.choices:
237 |                  message = response.choices[0].message
238 |                  finish_reason = response.choices[0].finish_reason
239 | 
240 |                  if message.tool_calls:
241 |                      if len(message.tool_calls) > 1:
242 |                           logger.warning(f"[LLM] Multiple tool calls received, using the first one for schema {Schema_Class.__name__}")
243 | 
244 |                      tool_call = message.tool_calls[0]
245 |                      if tool_call.type == 'function' and tool_call.function.name == tool_def['function']['name']:
246 |                          function_args_str = tool_call.function.arguments
247 |                          try:
248 |                              # Parse the arguments string into a dictionary
249 |                              parsed_args = json.loads(function_args_str)
250 |                              # Validate and potentially instantiate the Pydantic model
251 |                              model_instance = Schema_Class.model_validate(parsed_args)
252 |                              return model_instance # Return the validated Pydantic model instance
253 |                          # Alternatively, the parsed dict could be returned directly:
254 |                          #     return parsed_args
255 |                          except json.JSONDecodeError as json_err:
256 |                              logger.error(f"[LLM] Failed to parse JSON arguments from tool call: {json_err}. Arguments: '{function_args_str}'")
257 |                              return f"Error: [LLM] Failed to parse JSON arguments - {json_err}"
258 |                          except Exception as val_err: # Catch Pydantic validation errors if model_validate is used
259 |                              logger.error(f"[LLM] JSON arguments failed validation for schema {Schema_Class.__name__}: {val_err}. Arguments: {function_args_str}")
260 |                              return f"Error: [LLM] JSON arguments failed validation - {val_err}"
261 |                      else:
262 |                          logger.warning(f"[LLM] Expected function tool call for {Schema_Class.__name__} but got type '{tool_call.type}' or name '{tool_call.function.name}'.")
263 |                          return f"Error: [LLM] Unexpected tool call type/name received."
264 | 
265 |                  elif finish_reason == 'tool_calls':
266 |                       # This might happen if the model intended to call but failed, or structure is odd
267 |                       logger.warning(f"[LLM] Finish reason is 'tool_calls' but no tool_calls found in message. Response: {response.model_dump_json(indent=2)}")
268 |                       return "Error: [LLM] Model indicated tool use but none found."
269 |                  elif finish_reason == 'content_filter':
270 |                       logger.warning(f"[LLM] JSON generation blocked due to content filter.")
271 |                       return "Error: [LLM] Content generation blocked due to content filter."
272 |                  else:
273 |                       # Model didn't use the tool
274 |                       logger.warning(f"[LLM] Model did not use the requested JSON tool {Schema_Class.__name__}. Finish reason: {finish_reason}. Content: {message.content}")
275 |                       # You might return the text content or an error depending on requirements
276 |                       # return message.content or "Error: [LLM] Model generated text instead of using the JSON tool."
277 |                       return f"Error: [LLM] Model did not use the JSON tool. Finish Reason: {finish_reason}."
278 | 
279 |              else:
280 |                  logger.warning(f"[LLM] JSON generation returned no choices. Response: {response.model_dump_json(indent=2)}")
281 |                  return "Error: [LLM] No choices returned from LLM for JSON request."
282 | 
283 |          except openai.APIError as e:
284 |              logger.error(f"[LLM] OpenAI API returned an API Error during JSON generation: {e}", exc_info=True)
285 |              return f"Error: [LLM] API Error (JSON) - {type(e).__name__}: {e}"
286 |          # Add other specific openai exceptions (AuthenticationError, RateLimitError, etc.)
287 |          except Exception as e:
288 |              logger.error(f"Error during LLM JSON generation: {e}", exc_info=True)
289 |              return f"Error: [LLM] Failed to communicate with API for JSON - {type(e).__name__}: {e}"
290 | 
291 |     
```
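
A minimal usage sketch for `OpenAIClient.generate_json` (hypothetical, not part of the repository; `StepSuggestion` is an illustrative schema, and credentials are assumed to be configured for the utils loaders):

```python
# Hypothetical usage sketch, not part of the repository.
from pydantic import BaseModel
from src.llm.clients.openai_client import OpenAIClient

class StepSuggestion(BaseModel):  # illustrative schema, not defined in this repo
    action: str
    selector: str

client = OpenAIClient()  # credentials are loaded from the environment by the utils helpers
result = client.generate_json(StepSuggestion, "Suggest one test step for a login form.")
if isinstance(result, StepSuggestion):
    print(result.action, result.selector)
else:
    print(result)  # error strings start with "Error: [LLM]"
```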

--------------------------------------------------------------------------------
/src/llm/clients/azure_openai_client.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/llm/clients/azure_openai_client.py
  2 | from PIL import Image
  3 | import io
  4 | import logging
  5 | import time # Import time module
  6 | import threading # Import threading for lock
  7 | from typing import Type, Optional, Union, List, Dict, Any
  8 | logger = logging.getLogger(__name__)
  9 | import base64
 10 | import json
 11 | 
 12 | from ...utils.utils import load_api_key, load_api_base_url, load_api_version, load_llm_model
 13 | 
 14 | # --- Provider Specific Imports ---
 15 | try:
 16 |     import openai
 17 |     from openai import AzureOpenAI
 18 |     from pydantic import BaseModel # Needed for LLM JSON tool definition
 19 |     OPENAI_SDK = True
 20 | except ImportError:
 21 |     OPENAI_SDK = False
 22 |     # Define dummy classes if LLM libs are not installed to avoid NameErrors
 23 |     class BaseModel: pass
 24 |     class AzureOpenAI: pass
 25 | 
 26 | 
 27 | 
 28 | 
 29 | # --- Helper Function ---
 30 | def _image_bytes_to_base64_url(image_bytes: bytes) -> Optional[str]:
 31 |     """Converts image bytes to a base64 data URL."""
 32 |     try:
 33 |         # Try to determine the image format
 34 |         img = Image.open(io.BytesIO(image_bytes))
 35 |         format = img.format
 36 |         if not format:
 37 |             logger.warning("Could not determine image format, assuming JPEG.")
 38 |             format = "jpeg" # Default assumption
 39 |         else:
 40 |             format = format.lower()
 41 |             if format == 'jpg': # Standardize to jpeg
 42 |                 format = 'jpeg'
 43 | 
 44 |         # Ensure format is supported (common web formats)
 45 |         if format not in ['jpeg', 'png', 'gif', 'webp']:
 46 |              logger.warning(f"Unsupported image format '{format}' for base64 URL, defaulting to JPEG.")
 47 |              format = 'jpeg' # Fallback
 48 | 
 49 |         encoded_string = base64.b64encode(image_bytes).decode('utf-8')
 50 |         return f"data:image/{format};base64,{encoded_string}"
 51 |     except Exception as e:
 52 |         logger.error(f"Error converting image bytes to base64 URL: {e}", exc_info=True)
 53 |         return None
 54 | 
 55 | 
 56 | class AzureOpenAIClient:
 57 |     def __init__(self):
 58 |         self.client = None
 59 |         self.LLM_api_key = load_api_key()
 60 |         self.LLM_api_version = load_api_version()
 61 |         self.LLM_model_name = load_llm_model()
 62 |         self.LLM_endpoint = load_api_base_url()
 63 |         self.LLM_vision_model_name = self.LLM_model_name
 64 |         
 65 |         if not OPENAI_SDK:
 66 |             raise ImportError("LLM OpenAI libraries (openai, pydantic) are not installed. Please install them.")
 67 |         if not all([self.LLM_api_key, self.LLM_endpoint, self.LLM_api_version, self.LLM_model_name]):
 68 |             raise ValueError("LLM_api_key, LLM_endpoint, LLM_api_version, and LLM_model_name are required for provider 'LLM'")
 69 |         try:
 70 |             self.client = AzureOpenAI(
 71 |                 api_key=self.LLM_api_key,
 72 |                 azure_endpoint=self.LLM_endpoint,
 73 |                 api_version=self.LLM_api_version
 74 |             )
 75 |             # Test connection slightly by listing models (optional, requires different permission potentially)
 76 |             # self.client.models.list()
 77 |             logger.info(f"Azure OpenAI Client initialized for endpoint {self.LLM_endpoint} and model {self.LLM_model_name}.")
 78 |         except Exception as e:
 79 |             logger.error(f"Failed to initialize Azure OpenAI Client: {e}", exc_info=True)
 80 |             raise RuntimeError(f"Azure OpenAI client initialization failed: {e}")
 81 |     
 82 |     def generate_text(self, prompt: str) -> str:
 83 |          try:
 84 |              log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
 85 |              logger.debug(f"[LLM] Sending text prompt (truncated): {log_prompt}")
 86 |              messages = [{"role": "user", "content": prompt}]
 87 |              response = self.client.chat.completions.create(
 88 |                  model=self.LLM_model_name,
 89 |                  messages=messages,
 90 |                  max_tokens=1024, # Adjust as needed
 91 |              )
 92 |              logger.debug("[LLM] Received text response.")
 93 | 
 94 |              if response.choices:
 95 |                  message = response.choices[0].message
 96 |                  if message.content:
 97 |                      return message.content
 98 |                  else:
 99 |                      # Handle cases like function calls if they unexpectedly occur or content filter
100 |                      finish_reason = response.choices[0].finish_reason
101 |                      logger.warning(f"[LLM] Text generation returned no content. Finish reason: {finish_reason}. Response: {response.model_dump_json(indent=2)}")
102 |                      if finish_reason == 'content_filter':
103 |                          return "Error: [LLM] Content generation blocked due to content filter."
104 |                      return "Error: [LLM] Empty response from LLM."
105 |              else:
106 |                  logger.warning(f"[LLM] Text generation returned no choices. Response: {response.model_dump_json(indent=2)}")
107 |                  return "Error: [LLM] No choices returned from LLM."
108 | 
109 |          except openai.APIError as e:
110 |              # Handle API error here, e.g. retry or log
111 |              logger.error(f"[LLM] OpenAI API returned an API Error: {e}", exc_info=True)
112 |              return f"Error: [LLM] API Error - {type(e).__name__}: {e}"
113 |          except openai.AuthenticationError as e:
114 |              logger.error(f"[LLM] OpenAI API authentication error: {e}", exc_info=True)
115 |              return f"Error: [LLM] Authentication Error - {e}"
116 |          except openai.RateLimitError as e:
117 |              logger.error(f"[LLM] OpenAI API request exceeded rate limit: {e}", exc_info=True)
118 |              # Note: Our simple time.sleep might not be enough for LLM's complex limits
119 |              return f"Error: [LLM] Rate limit exceeded - {e}"
120 |          except Exception as e:
121 |              logger.error(f"Error during LLM text generation: {e}", exc_info=True)
122 |              return f"Error: [LLM] Failed to communicate with API - {type(e).__name__}: {e}"
123 | 
124 |     def generate_multimodal(self, prompt: str, image_bytes: bytes) -> str:
125 |         if not self.LLM_vision_model_name:
126 |              logger.error("[LLM] LLM vision model name not configured.")
127 |              return "Error: [LLM] Vision model not configured."
128 | 
129 |         base64_url = _image_bytes_to_base64_url(image_bytes)
130 |         if not base64_url:
131 |             return "Error: [LLM] Failed to convert image to base64."
132 | 
133 |         try:
134 |             log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
135 |             logger.debug(f"[LLM] Sending multimodal prompt (truncated): {log_prompt} with image.")
136 | 
137 |             messages = [
138 |                 {
139 |                     "role": "user",
140 |                     "content": [
141 |                         {"type": "text", "text": prompt},
142 |                         {"type": "image_url", "image_url": {"url": base64_url}},
143 |                     ],
144 |                 }
145 |             ]
146 | 
147 |             response = self.client.chat.completions.create(
148 |                 model=self.LLM_vision_model_name, # Use the vision model deployment
149 |                 messages=messages,
150 |                 max_tokens=1024, # Adjust as needed
151 |             )
152 |             logger.debug("[LLM] Received multimodal response.")
153 | 
154 |             # Parsing logic similar to text generation
155 |             if response.choices:
156 |                 message = response.choices[0].message
157 |                 if message.content:
158 |                     return message.content
159 |                 else:
160 |                     finish_reason = response.choices[0].finish_reason
161 |                     logger.warning(f"[LLM] Multimodal generation returned no content. Finish reason: {finish_reason}. Response: {response.model_dump_json(indent=2)}")
162 |                     if finish_reason == 'content_filter':
163 |                         return "Error: [LLM] Content generation blocked due to content filter."
164 |                     return "Error: [LLM] Empty multimodal response from LLM."
165 |             else:
166 |                 logger.warning(f"[LLM] Multimodal generation returned no choices. Response: {response.model_dump_json(indent=2)}")
167 |                 return "Error: [LLM] No choices returned from Vision LLM."
168 | 
169 |         except openai.APIError as e:
170 |              logger.error(f"[LLM] OpenAI Vision API returned an API Error: {e}", exc_info=True)
171 |              return f"Error: [LLM] Vision API Error - {type(e).__name__}: {e}"
172 |         # Add other specific openai exceptions as needed (AuthenticationError, RateLimitError, etc.)
173 |         except Exception as e:
174 |             logger.error(f"Error during LLM multimodal generation: {e}", exc_info=True)
175 |             return f"Error: [LLM] Failed to communicate with Vision API - {type(e).__name__}: {e}"
176 | 
177 | 
178 |     def generate_json(self, Schema_Class: Type[BaseModel], prompt: str, image_bytes: Optional[bytes] = None) -> Union[BaseModel, str]:
179 |          if not issubclass(Schema_Class, BaseModel):
180 |               logger.error(f"[LLM] Schema_Class must be a Pydantic BaseModel for LLM JSON generation.")
181 |               return "Error: [LLM] Invalid schema type provided."
182 | 
183 |          current_model = self.LLM_model_name
184 |          messages: List[Dict[str, Any]] = [{"role": "user", "content": []}] # Initialize user content as list
185 | 
186 |          # Prepare content (text and optional image)
187 |          text_content = {"type": "text", "text": prompt}
188 |          messages[0]["content"].append(text_content) # type: ignore
189 | 
190 |          log_msg_suffix = ""
191 |          if image_bytes is not None:
192 |              if not self.LLM_vision_model_name:
193 |                   logger.error("[LLM] LLM vision model name not configured for multimodal JSON.")
194 |                   return "Error: [LLM] Vision model not configured for multimodal JSON."
195 |              current_model = self.LLM_vision_model_name # Use vision model if image is present
196 | 
197 |              base64_url = _image_bytes_to_base64_url(image_bytes)
198 |              if not base64_url:
199 |                  return "Error: [LLM] Failed to convert image to base64 for JSON."
200 |              image_content = {"type": "image_url", "image_url": {"url": base64_url}}
201 |              messages[0]["content"].append(image_content) # type: ignore
202 |              log_msg_suffix = " with image"
203 | 
204 | 
205 |          # Prepare the tool based on the Pydantic schema
206 |          try:
207 |              tool_def = openai.pydantic_function_tool(Schema_Class)
208 |              tools = [tool_def]
209 |              # Tool choice can force the model to use the function, or let it decide.
210 |              # Forcing it: tool_choice = {"type": "function", "function": {"name": Schema_Class.__name__}}
211 |              # Letting it decide (often better unless you *know* it must be called): tool_choice = "auto"
212 |              # Let's explicitly request the tool for structured output
213 |              tool_choice = {"type": "function", "function": {"name": tool_def['function']['name']}}
214 | 
215 |          except Exception as tool_err:
216 |              logger.error(f"[LLM] Failed to create tool definition from schema {Schema_Class.__name__}: {tool_err}", exc_info=True)
217 |              return f"Error: [LLM] Failed to create tool definition - {tool_err}"
218 | 
219 | 
220 |          try:
221 |              log_prompt = prompt[:200] + ('...' if len(prompt) > 200 else '')
222 |              logger.debug(f"[LLM] Sending JSON prompt (truncated): {log_prompt}{log_msg_suffix} with schema {Schema_Class.__name__}")
223 | 
224 |              # Add a system prompt to guide the model (optional but helpful)
225 |              system_message = {"role": "system", "content": f"You are a helpful assistant. Use the provided '{Schema_Class.__name__}' tool to structure your response based on the user's request."}
226 |              final_messages = [system_message] + messages
227 | 
228 |              response = self.client.chat.completions.create(
229 |                  model=current_model, # Use vision model if image included
230 |                  messages=final_messages,
231 |                  tools=tools,
232 |                  tool_choice=tool_choice, # Request the specific tool
233 |                  max_tokens=2048, # Adjust as needed
234 |              )
235 |              logger.debug("[LLM] Received JSON response structure.")
236 | 
237 |              if response.choices:
238 |                  message = response.choices[0].message
239 |                  finish_reason = response.choices[0].finish_reason
240 | 
241 |                  if message.tool_calls:
242 |                      if len(message.tool_calls) > 1:
243 |                           logger.warning(f"[LLM] Multiple tool calls received, using the first one for schema {Schema_Class.__name__}")
244 | 
245 |                      tool_call = message.tool_calls[0]
246 |                      if tool_call.type == 'function' and tool_call.function.name == tool_def['function']['name']:
247 |                          function_args_str = tool_call.function.arguments
248 |                          try:
249 |                              # Parse the arguments string into a dictionary
250 |                              parsed_args = json.loads(function_args_str)
251 |                              # Validate and potentially instantiate the Pydantic model
252 |                              model_instance = Schema_Class.model_validate(parsed_args)
253 |                              return model_instance # Return the validated Pydantic model instance
254 |                          # Alternatively, the parsed dict could be returned directly:
255 |                          #     return parsed_args
256 |                          except json.JSONDecodeError as json_err:
257 |                              logger.error(f"[LLM] Failed to parse JSON arguments from tool call: {json_err}. Arguments: '{function_args_str}'")
258 |                              return f"Error: [LLM] Failed to parse JSON arguments - {json_err}"
259 |                          except Exception as val_err: # Catch Pydantic validation errors if model_validate is used
260 |                              logger.error(f"[LLM] JSON arguments failed validation for schema {Schema_Class.__name__}: {val_err}. Arguments: {function_args_str}")
261 |                              return f"Error: [LLM] JSON arguments failed validation - {val_err}"
262 |                      else:
263 |                          logger.warning(f"[LLM] Expected function tool call for {Schema_Class.__name__} but got type '{tool_call.type}' or name '{tool_call.function.name}'.")
264 |                          return f"Error: [LLM] Unexpected tool call type/name received."
265 | 
266 |                  elif finish_reason == 'tool_calls':
267 |                       # This might happen if the model intended to call but failed, or structure is odd
268 |                       logger.warning(f"[LLM] Finish reason is 'tool_calls' but no tool_calls found in message. Response: {response.model_dump_json(indent=2)}")
269 |                       return "Error: [LLM] Model indicated tool use but none found."
270 |                  elif finish_reason == 'content_filter':
271 |                       logger.warning(f"[LLM] JSON generation blocked due to content filter.")
272 |                       return "Error: [LLM] Content generation blocked due to content filter."
273 |                  else:
274 |                       # Model didn't use the tool
275 |                       logger.warning(f"[LLM] Model did not use the requested JSON tool {Schema_Class.__name__}. Finish reason: {finish_reason}. Content: {message.content}")
276 |                       # You might return the text content or an error depending on requirements
277 |                       # return message.content or "Error: [LLM] Model generated text instead of using the JSON tool."
278 |                       return f"Error: [LLM] Model did not use the JSON tool. Finish Reason: {finish_reason}."
279 | 
280 |              else:
281 |                  logger.warning(f"[LLM] JSON generation returned no choices. Response: {response.model_dump_json(indent=2)}")
282 |                  return "Error: [LLM] No choices returned from LLM for JSON request."
283 | 
284 |          except openai.APIError as e:
285 |              logger.error(f"[LLM] OpenAI API returned an API Error during JSON generation: {e}", exc_info=True)
286 |              return f"Error: [LLM] API Error (JSON) - {type(e).__name__}: {e}"
287 |          # Add other specific openai exceptions (AuthenticationError, RateLimitError, etc.)
288 |          except Exception as e:
289 |              logger.error(f"Error during LLM JSON generation: {e}", exc_info=True)
290 |              return f"Error: [LLM] Failed to communicate with API for JSON - {type(e).__name__}: {e}"
291 | 
292 |     
```
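
A minimal usage sketch for the multimodal path (hypothetical, not part of the repository; assumes a configured Azure deployment and a local `screenshot.png`):

```python
# Hypothetical usage sketch, not part of the repository.
from pathlib import Path
from src.llm.clients.azure_openai_client import AzureOpenAIClient

client = AzureOpenAIClient()  # requires key, endpoint, api_version, and deployment name
image_bytes = Path("screenshot.png").read_bytes()  # assumed local screenshot
answer = client.generate_multimodal("Describe the visible login form.", image_bytes)
print(answer)
```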

--------------------------------------------------------------------------------
/mcp_server.py:
--------------------------------------------------------------------------------

```python
  1 | # mcp_server.py
  2 | import sys
  3 | import os
  4 | import json
  5 | import logging
  6 | from typing import List, Dict, Any, Optional
  7 | import asyncio
  8 | import re
  9 | import time
 10 | from datetime import datetime
 11 | 
 12 | # Ensure agent modules are importable (adjust path if necessary)
 13 | # Assuming mcp_server.py is at the root level alongside agent.py etc.
 14 | sys.path.insert(0, os.path.abspath(os.path.dirname(__file__)))
 15 | 
 16 | from mcp.server.fastmcp import FastMCP
 17 | from mcp.server.fastmcp.prompts import base as mcp_prompts
 18 | 
 19 | # Import necessary components from your existing code
 20 | from src.agents.recorder_agent import WebAgent # Needs refactoring for non-interactive use
 21 | from src.agents.crawler_agent import CrawlerAgent
 22 | from src.llm.llm_client import LLMClient
 23 | from src.execution.executor import TestExecutor
 24 | from src.utils.utils import load_api_key, load_api_base_url, load_api_version, load_llm_model
 25 | from src.security.semgrep_scanner import run_semgrep
 26 | from src.security.zap_scanner import run_zap_scan, discover_endpoints
 27 | from src.security.nuclei_scanner import run_nuclei
 28 | from src.security.utils import save_report
 29 | 
 30 | # Configure logging for the MCP server
 31 | logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - [MCP Server] %(message)s')
 32 | logger = logging.getLogger(__name__)
 33 | 
 34 | # Define the output directory for tests (consistent with agent/executor)
 35 | TEST_OUTPUT_DIR = "output"
 36 | 
 37 | # --- Initialize FastMCP Server ---
 38 | mcp = FastMCP("WebTestAgentServer")
 39 | 
 40 | llm_client = LLMClient(provider='azure')
 41 | 
 42 | # --- MCP Tool: Record a New Test Flow (Automated - Requires Agent Refactoring) ---
 43 | @mcp.tool()
 44 | async def record_test_flow(feature_description: str, project_directory: str, headless: bool = True) -> Dict[str, Any]:
 45 |     """
 46 |     Attempts to automatically record a web test flow based on a natural language description. If a case fails, a step in the feature description may have been missing or wrong. Avoid vague actions like "select anything"; give exact actions such as "select the Country dropdown".
 47 |     Uses the WebAgent in automated mode (bypassing interactive prompts). Do not skip any step: give complete end-to-end steps covering what to do and what to verify.
 48 | 
 49 |     Args:
 50 |         feature_description: A natural language description of the test case or user flow. Crucially, this description MUST explicitly include the starting URL of the website to be tested (e.g., 'Go to https://example.com, then click...'). Do not give vague inputs; say exactly what to enter, e.g. 'enter invalid-email into the email box' or 'enter [email protected] into the email box'.
 51 |         project_directory: The project directory you are currently working in. This is used to identify the test flows of a project
 52 |         headless: Run the underlying browser in headless mode. Defaults to True.
 53 | 
 54 |     Returns:
 55 |         A dictionary containing the recording status, including success/failure,
 56 |         message, and the path to the generated test JSON file if successful.
 57 |     """
 58 |     logger.info(f"Received automated request to record test flow: '{feature_description[:100]}...' (Headless: {headless})")
 59 |     try:
 60 |         
 61 |         # 1. Instantiate WebAgent in AUTOMATED mode
 62 |         recorder_agent = WebAgent(
 63 |             llm_client=llm_client,
 64 |             headless=headless, # Allow MCP tool to specify headless
 65 |             is_recorder_mode=True,
 66 |             automated_mode=True, # <<< Set automated mode 
 67 |             max_retries_per_subtask=2,
 68 |             filename=re.sub(r"[ /]", "_", project_directory)
 69 |         )
 70 |         
 71 |         # Run the blocking recorder_agent.record method in a separate thread
 72 |         # Pass the method and its arguments to asyncio.to_thread
 73 |         logger.info("Delegating agent recording to a separate thread...")
 74 |         recording_result = await asyncio.to_thread(recorder_agent.record, feature_description)
 75 |         logger.info(f"Automated recording finished (thread returned). Result: {recording_result}")
 76 |         return recording_result
 77 | 
 78 |     except Exception as e:
 79 |         logger.error(f"Error in record_test_flow tool: {e}", exc_info=True)
 80 |         return {"success": False, "message": f"Internal server error during automated recording: {e}"}
 81 | 
 82 | 
 83 | # --- MCP Tool: Run a Single Regression Test ---
 84 | @mcp.tool()
 85 | async def run_regression_test(test_file_path: str, headless: bool = True, enable_healing: bool = True, healing_mode: str = 'soft', get_performance: bool = False, get_network_requests: bool = False) -> Dict[str, Any]:
 86 |     """
 87 |     Runs a previously recorded test case from a JSON file. If a case fails, it could be because your code has a problem, or because a step in the feature description was missing or wrong.
 88 |     
 89 | 
 90 |     Args:
 91 |         test_file_path: The relative or absolute path to the .json test file (e.g., 'output/test_login.json').
 92 |         headless: Run the browser in headless mode (no visible window). Defaults to True.
 93 |         enable_healing: Whether to run this regression test with healing mode enabled. In healing mode, if test fails because of a changed or flaky selector, the agent can try to heal the test automatically.
 94 |         healing_mode: can be 'soft' or 'hard'. In soft mode, only single step is attempted to heal. In hard healing, complete test is tried to be re-recorded
 95 |         get_performance: Whether to include performance stats in response
 96 |         get_network_requests: Whether to include network stats in response
 97 | 
 98 |     Returns:
 99 |         A dictionary containing the execution result summary, including status (PASS/FAIL),
100 |         duration, message, error details (if failed), and evidence paths.
101 |     """
102 |     logger.info(f"Received request to run regression test: '{test_file_path}', Headless: {headless}")
103 | 
104 |     # Basic path validation (relative to server or absolute)
105 |     if not os.path.isabs(test_file_path):
106 |         # Assume relative to the server's working directory or a known output dir
107 |         # For simplicity, let's check relative to CWD and TEST_OUTPUT_DIR
108 |         potential_paths = [
109 |             test_file_path,
110 |             os.path.join(TEST_OUTPUT_DIR, test_file_path)
111 |         ]
112 |         found_path = None
113 |         for p in potential_paths:
114 |             if os.path.exists(p) and os.path.isfile(p):
115 |                 found_path = p
116 |                 break
117 |         if not found_path:
118 |              logger.error(f"Test file not found at '{test_file_path}' or within '{TEST_OUTPUT_DIR}'.")
119 |              return {"success": False, "status": "ERROR", "message": f"Test file not found: {test_file_path}"}
120 |         test_file_path = os.path.abspath(found_path) # Use absolute path for executor
121 |         logger.info(f"Resolved test file path to: {test_file_path}")
122 | 
123 | 
124 |     try:
125 |         # The executor uses the LLM client for self-healing of failed steps
126 |         executor = TestExecutor(
127 |             headless=headless, 
128 |             llm_client=llm_client, 
129 |             enable_healing=enable_healing,
130 |             healing_mode=healing_mode,
131 |             get_network_requests=get_network_requests,
132 |             get_performance=get_performance
133 |             )
134 |         logger.info(f"Delegating test execution for '{test_file_path}' to a separate thread...")
135 |         test_result = await asyncio.to_thread(
136 |             executor.run_test, # The function to run
137 |             test_file_path     # Arguments for the function
138 |         )
139 | 
140 |         # Add a success flag for generic tool success/failure indication
141 |         # Post-processing (synchronous)
142 |         test_result["success"] = test_result.get("status") == "PASS"
143 |         logger.info(f"Execution finished for '{test_file_path}' (thread returned). Status: {test_result.get('status')}")
144 |         try:
145 |             base_name = os.path.splitext(os.path.basename(test_file_path))[0]
146 |             result_filename = os.path.join("output", f"execution_result_{base_name}_{time.strftime('%Y%m%d_%H%M%S')}.json")
147 |             with open(result_filename, 'w', encoding='utf-8') as f:
148 |                 json.dump(test_result, f, indent=2, ensure_ascii=False)
149 |             print(f"\nFull execution result details saved to: {result_filename}")
150 |         except Exception as save_err:
151 |             logger.error(f"Failed to save full execution result JSON: {save_err}")
152 |         return test_result
153 | 
154 |     except FileNotFoundError:
155 |         logger.error(f"Test file not found by executor: {test_file_path}")
156 |         return {"success": False, "status": "ERROR", "message": f"Test file not found: {test_file_path}"}
157 |     except Exception as e:
158 |         logger.error(f"Error running regression test '{test_file_path}': {e}", exc_info=True)
159 |         return {"success": False, "status": "ERROR", "message": f"Internal server error during execution: {e}"}
160 | 
161 | @mcp.tool()
162 | async def discover_test_flows(start_url: str, max_pages_to_crawl: int = 10, headless: bool = True) -> Dict[str, Any]:
163 |     """
164 |     Crawls a website starting from a given URL within the same domain, analyzes page content
165 |     (DOM, Screenshot), and uses an LLM to suggest potential specific test step descriptions
166 |     for each discovered page.
167 | 
168 |     Args:
169 |         start_url: The URL to begin crawling from (e.g., 'https://example.com').
170 |         max_pages_to_crawl: The maximum number of unique pages to visit (default: 10).
171 |         headless: Run the crawler's browser in headless mode (default: True).
172 | 
173 |     Returns:
174 |         A dictionary containing the crawl summary, including success status,
175 |         pages visited, and a dictionary mapping visited URLs to suggested test step descriptions.
176 |         Example: {"success": true, "discovered_steps": {"https://example.com/login": ["Type 'user' into Username field", ...]}}
177 |     """
178 |     logger.info(f"Received request to discover test flows starting from: '{start_url}', Max Pages: {max_pages_to_crawl}, Headless: {headless}")
179 | 
180 |     try:
181 |         # 1. Instantiate CrawlerAgent
182 |         crawler = CrawlerAgent(
183 |             llm_client=llm_client,
184 |             headless=headless
185 |         )
186 | 
187 |         # 2. Run the blocking crawl method in a separate thread
188 |         logger.info("Delegating crawler execution to a separate thread...")
189 |         crawl_results = await asyncio.to_thread(
190 |             crawler.crawl_and_suggest,
191 |             start_url,
192 |             max_pages_to_crawl
193 |         )
194 |         logger.info(f"Crawling finished (thread returned). Visited: {crawl_results.get('pages_visited')}, Suggestions: {len(crawl_results.get('discovered_steps', {}))}")
195 | 
196 | 
197 |         # Return the results dictionary from the crawler
198 |         return crawl_results
199 | 
200 |     except Exception as e:
201 |         logger.error(f"Error in discover_test_flows tool: {e}", exc_info=True)
202 |         return {"success": False, "message": f"Internal server error during crawling: {e}", "discovered_steps": {}}
203 | 
204 | 
205 | # --- MCP Tool: List Recorded Tests ---
206 | @mcp.tool()
207 | def list_recorded_tests(project_directory: str) -> List[str]:
208 |     """
209 |     Provides a list of available test JSON files in the standard output directory.
210 | 
211 |     Args:
212 |         project_directory: The project directory you are currently working in. This is used to identify the test flows of a project.
213 |     
214 |     Returns:
215 |         test_files: A list of filenames for each test flow (e.g., ["test_login_flow_....json", "test_search_....json"]).
216 |     """
217 |     logger.info(f"Providing resource list of tests from '{TEST_OUTPUT_DIR}'")
218 |     if not os.path.exists(TEST_OUTPUT_DIR) or not os.path.isdir(TEST_OUTPUT_DIR):
219 |         logger.warning(f"Test output directory '{TEST_OUTPUT_DIR}' not found.")
220 |         return []
221 | 
222 |     try:
223 |         test_files = [
224 |             f for f in os.listdir(TEST_OUTPUT_DIR)
225 |             if os.path.isfile(os.path.join(TEST_OUTPUT_DIR, f)) and f.endswith(".json") and f.startswith(re.sub(r"[ /]", "_", project_directory)) 
226 |         ]
227 |         # Optionally return just the test files, excluding execution results
228 |         test_files = [f for f in test_files if not f.startswith("execution_result_")]
229 |         return sorted(test_files)
230 |     except Exception as e:
231 |         logger.error(f"Error listing test files in '{TEST_OUTPUT_DIR}': {e}", exc_info=True)
232 |         # Re-raise or return empty list? Returning empty is safer for a tool result.
233 |         return []
234 | 
235 | 
236 | @mcp.tool()
237 | def get_security_scan(project_directory: str, target_url: str = None, semgrep_config: str = 'auto') -> Dict[str, Any]:
238 |     """
239 |     Provides a list of vulnerabilities found via static code scanning with Semgrep and dynamic scanning with ZAP and Nuclei.
240 |     Also discovers endpoints using ZAP's spider functionality. Try to fix findings automatically if you judge them to be true positives.
241 | 
242 |     Args:
243 |     project_directory: The project directory which you want to scan for security issues. Give absolute path only.
244 |     target_url: The target URL for dynamic scanning (ZAP and Nuclei). Required for endpoint discovery.
245 |     semgrep_config: The config for semgrep scans. Default: 'auto'
246 |     
247 |     Returns:
248 |         Dict containing:
249 |         - vulnerabilities: List of vulnerabilities found
250 |         - endpoints: List of discovered endpoints (if target_url provided)
251 |     """
252 |     logging.info("--- Starting Phase 1: Security Scanning ---")
253 |     all_findings = []
254 |     discovered_endpoints = []
255 | 
256 |     if project_directory:
257 |         # Run Semgrep scan
258 |         logging.info("--- Running Semgrep Scan ---")
259 |         semgrep_findings = run_semgrep(
260 |             code_path=project_directory,
261 |             config=semgrep_config,
262 |             output_dir='./results',
263 |             timeout=600
264 |         )
265 |         if semgrep_findings:
266 |             logging.info(f"Completed Semgrep Scan. Found {len(semgrep_findings)} potential issues.")
267 |             all_findings.extend(semgrep_findings)
268 |         else:
269 |             logging.warning("Semgrep scan completed with no findings or failed.")
270 |             all_findings.append({"Warning": "Semgrep scan completed with no findings or failed."})
271 | 
272 |         if target_url:
273 |             # First, discover endpoints using ZAP spider
274 |             logging.info("--- Running Endpoint Discovery ---")
275 |             try:
276 |                 discovered_endpoints = discover_endpoints(
277 |                     target_url=target_url,
278 |                     output_dir='./results',
279 |                     timeout=600  # 10 minutes for discovery
280 |                 )
281 |                 logging.info(f"Discovered {len(discovered_endpoints)} endpoints")
282 |             except Exception as e:
283 |                 logging.error(f"Error during endpoint discovery: {e}")
284 |                 discovered_endpoints = []
285 | 
286 |             # Run ZAP scan
287 |             logging.info("--- Running ZAP Scan ---")
288 |             try:
289 |                 zap_findings = run_zap_scan(
290 |                     target_url=target_url,
291 |                     output_dir='./results',
292 |                     scan_mode="baseline"  # Using baseline scan for quicker results
293 |                 )
294 |                 if zap_findings and not isinstance(zap_findings[0], str):
295 |                     logging.info(f"Completed ZAP Scan. Found {len(zap_findings)} potential issues.")
296 |                     all_findings.extend(zap_findings)
297 |                 else:
298 |                     logging.warning("ZAP scan completed with no findings or failed.")
299 |                     all_findings.append({"Warning": "ZAP scan completed with no findings or failed."})
300 |             except Exception as e:
301 |                 logging.error(f"Error during ZAP scan: {e}")
302 |                 all_findings.append({"Error": f"ZAP scan failed: {str(e)}"})
303 | 
304 |             # Run Nuclei scan
305 |             logging.info("--- Running Nuclei Scan ---")
306 |             try:
307 |                 nuclei_findings = run_nuclei(
308 |                     target_url=target_url,
309 |                     output_dir='./results'
310 |                 )
311 |                 if nuclei_findings and not isinstance(nuclei_findings[0], str):
312 |                     logging.info(f"Completed Nuclei Scan. Found {len(nuclei_findings)} potential issues.")
313 |                     all_findings.extend(nuclei_findings)
314 |                 else:
315 |                     logging.warning("Nuclei scan completed with no findings or failed.")
316 |                     all_findings.append({"Warning": "Nuclei scan completed with no findings or failed."})
317 |             except Exception as e:
318 |                 logging.error(f"Error during Nuclei scan: {e}")
319 |                 all_findings.append({"Error": f"Nuclei scan failed: {str(e)}"})
320 |         else:
321 |             logging.info("Skipping dynamic scans and endpoint discovery as target_url was not provided.")
322 | 
323 |     else:
324 |         logging.info("Skipping scans as project_directory was not provided.")
325 |         all_findings.append({"Warning": "Skipping scans as project_directory was not provided"})
326 | 
327 |     logging.info("--- Phase 1: Security Scanning Complete ---")
328 |     
329 |     logging.info("--- Starting Phase 2: Consolidating Results ---")
330 |     logging.info(f"Total findings aggregated from all tools: {len(all_findings)}")
331 | 
332 |     # Save the consolidated report
333 |     consolidated_report_path = save_report(all_findings, "consolidated", './results/', "consolidated_scan_results")
334 | 
335 |     if consolidated_report_path:
336 |         logging.info(f"Consolidated report saved to: {consolidated_report_path}")
337 |         print(f"\nConsolidated report saved to: {consolidated_report_path}")
338 |     else:
339 |         logging.error("Failed to save the consolidated report.")
340 | 
341 |     # Save discovered endpoints if any
342 |     if discovered_endpoints:
343 |         timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
344 |         endpoints_file = os.path.join('./results', f"discovered_endpoints_{timestamp}.json")
345 |         try:
346 |             with open(endpoints_file, 'w') as f:
347 |                 json.dump(discovered_endpoints, f, indent=2)
348 |             logging.info(f"Saved discovered endpoints to: {endpoints_file}")
349 |         except Exception as e:
350 |             logging.error(f"Failed to save endpoints report: {e}")
351 | 
352 |     logging.info("--- Phase 2: Consolidation Complete ---")
353 |     logging.info("--- Security Automation Script Finished ---")
354 |     
355 |     return {
356 |         "vulnerabilities": all_findings,
357 |         "endpoints": discovered_endpoints
358 |     }
359 | 
360 | 
361 | # --- Running the Server ---
362 | # The server is normally run via `mcp dev` or `mcp install`;
363 | # the `if __name__ == "__main__"` guard below also allows running it directly.
364 | logger.info("WebTestAgent MCP Server defined. Run with 'mcp dev mcp_server.py'")
365 | 
366 | if __name__ == "__main__":
367 |     mcp.run()
```
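
For reference, a minimal client-side sketch (hypothetical, not part of the repository) that launches this server over stdio using the `mcp` Python SDK and calls the `list_recorded_tests` tool:

```python
# Hypothetical client sketch, not part of the repository.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch mcp_server.py as a subprocess speaking MCP over stdio.
    params = StdioServerParameters(command="python", args=["mcp_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "list_recorded_tests",
                {"project_directory": "/path/to/project"},  # assumed path
            )
            print(result)

asyncio.run(main())
```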

--------------------------------------------------------------------------------
/src/dom/views.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/dom/views.py 
  2 | from dataclasses import dataclass, field, KW_ONLY # Use field for default_factory
  3 | from functools import cached_property
  4 | from typing import TYPE_CHECKING, Dict, List, Optional, Union, Literal, Tuple
  5 | import re # Added for selector generation
  6 | 
  7 | # Use relative imports if within the same package structure
  8 | from .history.view import CoordinateSet, HashedDomElement, ViewportInfo # Adjusted import
  9 | 
 10 | # Placeholder decorator if not using utils.time_execution_sync
 11 | def time_execution_sync(label):
 12 |     def decorator(func):
 13 |         def wrapper(*args, **kwargs):
 14 |             # Basic logging
 15 |             # logger.debug(f"Executing {label}...")
 16 |             result = func(*args, **kwargs)
 17 |             # logger.debug(f"Finished {label}.")
 18 |             return result
 19 |         return wrapper
 20 |     return decorator
 21 | 
 22 | # Avoid circular import issues
 23 | if TYPE_CHECKING:
 24 |     # DOMElementNode is defined later in this file, so references to it in
 25 |     # DOMBaseNode use string hints (Optional['DOMElementNode']) rather than
 26 |     # requiring the class definitions to be reordered.
 27 |     pass  # Forward references handled by string hints below
 28 | 
 29 | @dataclass(frozen=False)
 30 | class DOMBaseNode:
 31 |     # Parent needs to be Optional and potentially use string hint if defined later
 32 |     parent: Optional['DOMElementNode'] = None # Default to None
 33 |     is_visible: bool = False # Provide default
 34 | 
 35 | @dataclass(frozen=False)
 36 | class DOMTextNode(DOMBaseNode):
 37 |     # --- Field ordering within the subclass matters less with KW_ONLY, ---
 38 |     # --- but fields after the marker MUST be passed by keyword ---
 39 |     _: KW_ONLY  # Marks the following fields as keyword-only
 40 | 
 41 |     # Fields defined in this class (now keyword-only)
 42 |     text: str
 43 |     type: str = 'TEXT_NODE'
 44 | 
 45 |     def has_parent_with_highlight_index(self) -> bool:
 46 |         current = self.parent
 47 |         while current is not None:
 48 |             if current.highlight_index is not None:
 49 |                 return True
 50 |             current = current.parent
 51 |         return False
 52 | 
 53 |     # These visibility checks might be less useful now that JS handles it, but keep for now
 54 |     def is_parent_in_viewport(self) -> bool:
 55 |         if self.parent is None:
 56 |             return False
 57 |         return self.parent.is_in_viewport
 58 | 
 59 |     def is_parent_top_element(self) -> bool:
 60 |         if self.parent is None:
 61 |             return False
 62 |         return self.parent.is_top_element
 63 | 
 64 | # Define DOMElementNode *before* DOMBaseNode references it fully, or ensure Optional['DOMElementNode'] works
 65 | @dataclass(frozen=False)
 66 | class DOMElementNode(DOMBaseNode):
 67 |     """
 68 |     Represents an element node in the processed DOM tree.
 69 |     Includes information about interactivity, visibility, and structure.
 70 |     """
 71 |     tag_name: str = ""
 72 |     xpath: str = ""
 73 |     attributes: Dict[str, str] = field(default_factory=dict)
 74 |     # Use Union with string hint for forward reference if needed, or ensure DOMTextNode is defined first
 75 |     children: List[Union['DOMElementNode', DOMTextNode]] = field(default_factory=list)
 76 |     is_interactive: bool = False
 77 |     is_top_element: bool = False
 78 |     is_in_viewport: bool = False
 79 |     shadow_root: bool = False
 80 |     highlight_index: Optional[int] = None
 81 |     page_coordinates: Optional[CoordinateSet] = None
 82 |     viewport_coordinates: Optional[CoordinateSet] = None
 83 |     viewport_info: Optional[ViewportInfo] = None
 84 |     css_selector: Optional[str] = None # Added field for robust selector
 85 | 
 86 |     def __repr__(self) -> str:
 87 |         # Build a compact tag string with truncated attributes and status flags
 88 |         tag_str = f'<{self.tag_name}'
 89 |         for key, value in self.attributes.items():
 90 |              # Shorten long values in repr
 91 |              value_repr = value if len(value) < 50 else value[:47] + '...'
 92 |              tag_str += f' {key}="{value_repr}"'
 93 |         tag_str += '>'
 94 | 
 95 |         extras = []
 96 |         if self.is_interactive: extras.append('interactive')
 97 |         if self.is_top_element: extras.append('top')
 98 |         if self.is_in_viewport: extras.append('in-viewport')
 99 |         if self.shadow_root: extras.append('shadow-root')
100 |         if self.highlight_index is not None: extras.append(f'highlight:{self.highlight_index}')
101 |         if self.css_selector: extras.append(f'css:"{self.css_selector[:50]}..."') # Show generated selector
102 | 
103 |         if extras:
104 |             tag_str += f' [{", ".join(extras)}]'
105 |         return tag_str
106 | 
107 |     @cached_property
108 |     def hash(self) -> HashedDomElement:
109 |         """ Lazily computes and caches the hash of the element using HistoryTreeProcessor. """
110 |         # Use relative import within the method to avoid top-level circular dependencies
111 |         from .history.service import HistoryTreeProcessor
112 |         # Ensure HistoryTreeProcessor._hash_dom_element exists and is static or accessible
113 |         return HistoryTreeProcessor._hash_dom_element(self)
114 | 
115 |     def get_all_text_till_next_clickable_element(self, max_depth: int = -1) -> str:
116 |         """
117 |         Recursively collects all text content within this element, stopping descent
118 |         if a nested interactive element (with a highlight_index) is encountered.
119 |         """
120 |         text_parts = []
121 | 
122 |         def collect_text(node: Union['DOMElementNode', DOMTextNode], current_depth: int) -> None:
123 |             if max_depth != -1 and current_depth > max_depth:
124 |                 return
125 | 
126 |             # Check if the node itself is interactive and not the starting node
127 |             if isinstance(node, DOMElementNode) and node is not self and node.highlight_index is not None:
128 |                 # Stop recursion down this path if we hit an interactive element
129 |                 return
130 | 
131 |             if isinstance(node, DOMTextNode):
132 |                 # Only include visible text nodes
133 |                 if node.is_visible:
134 |                     text_parts.append(node.text)
135 |             elif isinstance(node, DOMElementNode):
136 |                 # Recursively process children
137 |                 for child in node.children:
138 |                     collect_text(child, current_depth + 1)
139 | 
140 |         # Start collection from the element itself
141 |         collect_text(self, 0)
142 |         # Join collected parts and clean up whitespace
143 |         return '\n'.join(filter(None, (tp.strip() for tp in text_parts))).strip()
144 | 
145 | 
146 |     @time_execution_sync('--clickable_elements_to_string')
147 |     def generate_llm_context_string(self, 
148 |             include_attributes: Optional[List[str]] = None, 
149 |             max_static_elements_action: int = 50, # Max static elements for action context
150 |             max_static_elements_verification: int = 150, # Allow more static elements for verification context
151 |             context_purpose: Literal['action', 'verification'] = 'action' # New parameter
152 |         ) -> Tuple[str, Dict[str, 'DOMElementNode']]:
153 |         """
154 |         Generates a string representation of VISIBLE elements tree for LLM context.
155 |         Clearly distinguishes interactive elements (with index) from static ones.
156 |         Assigns temporary IDs to static elements for later lookup.
157 | 
158 |         Args:
159 |             include_attributes: List of specific attributes to include. If None, uses defaults.
160 |             max_static_elements_action: Max static elements for 'action' purpose.
161 |             max_static_elements_verification: Max static elements for 'verification' purpose.
162 |             context_purpose: 'action' (concise) or 'verification' (more inclusive static).
163 |             
164 |         Returns:
165 |             Tuple containing:
166 |                 - The formatted context string.
167 |                 - A dictionary mapping temporary static IDs (e.g., "s1", "s2")
168 |                   to the corresponding DOMElementNode objects.
169 | 
170 |         """
171 |         formatted_lines = []
172 |         processed_node_ids = set()
173 |         static_element_count = 0
174 |         nodes_processed_count = 0 
175 |         static_id_counter = 1 # Counter for temporary static IDs
176 |         temp_static_id_map: Dict[str, 'DOMElementNode'] = {} # Map temporary ID to node
177 | 
178 |         max_static_elements = max_static_elements_verification if context_purpose == 'verification' else max_static_elements_action
179 | 
180 |         
181 |         def get_direct_visible_text(node: DOMElementNode, max_len=10000) -> str:
182 |             """Gets text directly within this node, ignoring children elements."""
183 |             texts = []
184 |             for child in node.children:
185 |                 if isinstance(child, DOMTextNode) and child.is_visible:
186 |                     texts.append(child.text.strip())
187 |             full_text = ' '.join(filter(None, texts))
188 |             if len(full_text) > max_len:
189 |                  return full_text[:max_len-3] + "..."
190 |             return full_text
191 | 
192 |         def get_parent_hint(node: DOMElementNode) -> Optional[str]:
193 |             """Gets a hint string for the nearest identifiable parent."""
194 |             parent = node.parent
195 |             if isinstance(parent, DOMElementNode):
196 |                 parent_attrs = parent.attributes
197 |                 hint_parts = []
198 |                 if parent_attrs.get('id'):
199 |                     hint_parts.append(f"id=\"{parent_attrs['id'][:20]}\"") # Limit length
200 |                 if parent_attrs.get('data-testid'):
201 |                     hint_parts.append(f"data-testid=\"{parent_attrs['data-testid'][:20]}\"")
202 |                 # Add class hint only if specific? Maybe too noisy. Start with id/testid.
203 |                 # if parent_attrs.get('class'):
204 |                 #    stable_classes = [c for c in parent_attrs['class'].split() if len(c) > 3 and not c.isdigit()]
205 |                 #    if stable_classes: hint_parts.append(f"class=\"{stable_classes[0][:15]}...\"") # Show first stable class
206 | 
207 |                 if hint_parts:
208 |                     return f"(inside: <{parent.tag_name} {' '.join(hint_parts)}>)"
209 |             return None
210 | 
211 |         def process_node(node: Union['DOMElementNode', DOMTextNode], depth: int) -> None:
212 |             nonlocal static_element_count, nodes_processed_count, static_id_counter # Allow modification
213 | 
214 |             # Skip if already processed or not an element
215 |             if not isinstance(node, DOMElementNode): return
216 |             nodes_processed_count += 1
217 |             node_id = id(node)
218 |             if node_id in processed_node_ids: return
219 |             processed_node_ids.add(node_id)
220 | 
221 |             is_node_visible = node.is_visible
222 |             visibility_marker = "" if is_node_visible else " (Not Visible)" 
223 | 
224 |             should_add_current_node = False
225 |             line_to_add = ""
226 |             is_interactive = node.highlight_index is not None
227 |             temp_static_id_assigned = None # Track if ID was assigned to this node
228 | 
229 | 
230 |             indent = '  ' * depth
231 | 
232 |             # --- Attribute Extraction (Common logic) ---
233 |             attributes_to_show = {}
234 |             default_attrs = ['id', 'name', 'class', 'aria-label', 'placeholder', 'role', 'type', 'value', 'title', 'alt', 'href', 'data-testid', 'data-value']
235 |             attrs_to_check = include_attributes if include_attributes else default_attrs
236 |             extract_attrs_for_this_node = is_interactive or (context_purpose == 'verification')
237 |             if extract_attrs_for_this_node:
238 |                 for attr_key in attrs_to_check:
239 |                     if attr_key in node.attributes and node.attributes[attr_key] is not None: # Check for not None
240 |                         # Truncate very long class lists (>100 chars) for brevity in the 'action' context
241 |                         if attr_key == 'class' and len(node.attributes[attr_key]) > 100 and context_purpose == 'action':
242 |                             attributes_to_show[attr_key] = node.attributes[attr_key][:97] + "..."
243 |                         else:
244 |                             attributes_to_show[attr_key] = node.attributes[attr_key]
245 |             attrs_str = ""
246 |             if attributes_to_show:
247 |                 parts = []
248 |                 for key, value in attributes_to_show.items():
249 |                     value_str = str(value) # Ensure it's a string
250 |                     # Limit length for display
251 |                     display_value = value_str if len(value_str) < 50 else value_str[:47] + '...'
252 |                     # *** HTML-escape attribute value strings ('&' must be replaced first) ***
253 |                     display_value = display_value.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')
254 |                     parts.append(f'{key}="{display_value}"')
255 |                 attrs_str = " ".join(parts)
256 | 
257 |             # --- Format line based on Interactive vs. Static ---
258 |             if is_interactive:
259 |                 # == INTERACTIVE ELEMENT == (Always include)
260 |                 text_content = node.get_all_text_till_next_clickable_element()
261 |                 text_content = ' '.join(text_content.split()) if text_content else ""
262 |                 # Truncate long text for display
263 |                 if len(text_content) > 150: text_content = text_content[:147] + "..."
264 | 
265 |                 line_to_add = f"{indent}[{node.highlight_index}]<{node.tag_name}"
266 |                 if attrs_str: line_to_add += f" {attrs_str}"
267 |                 if text_content: line_to_add += f">{text_content}</{node.tag_name}>"
268 |                 else: line_to_add += " />"
269 |                 line_to_add += visibility_marker
270 |                 should_add_current_node = True
271 | 
272 |             elif static_element_count < max_static_elements:
273 |                 # == VISIBLE STATIC ELEMENT ==
274 |                 text_content = get_direct_visible_text(node)
275 |                 include_this_static = False
276 | 
277 |                 # Determine if static node is relevant for verification
278 |                 if context_purpose == 'verification':
279 |                     common_static_tags = {'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'span', 'div', 'li', 'label', 'td', 'th', 'strong', 'em', 'dt', 'dd'}
280 |                     # Include if common tag OR has text OR *has attributes calculated in attrs_str*
281 |                     if node.tag_name in common_static_tags or text_content or attrs_str:
282 |                         include_this_static = True
283 |                         
284 |                 if not text_content:  # static elements must also have direct visible text
285 |                     include_this_static = False
286 | 
287 |                 if include_this_static:
288 |                     # --- Assign temporary static ID ---
289 |                     current_static_id = f"s{static_id_counter}"
290 |                     temp_static_id_map[current_static_id] = node
291 |                     temp_static_id_assigned = current_static_id # Mark that ID was assigned
292 |                     static_id_counter += 1
293 |                     
294 |                     # *** Start building the line ***
295 |                     line_to_add = f"{indent}<{node.tag_name}"
296 | 
297 |                     # *** CRUCIAL: Add the calculated attributes string ***
298 |                     if attrs_str:
299 |                         line_to_add += f" {attrs_str}"
300 |                         
301 |                     # --- Add the static ID attribute to the string ---
302 |                     line_to_add += f' data-static-id="{current_static_id}"'
303 | 
304 |                     # *** Add the static marker ***
305 |                     line_to_add += " (Static)"
306 |                     line_to_add += visibility_marker
307 | 
308 |                     # *** Add parent hint ONLY if element lacks key identifiers ***
309 |                     node_attrs = node.attributes # Use original attributes for this check
310 |                     has_key_identifier = node_attrs.get('id') or node_attrs.get('data-testid') or node_attrs.get('name')
311 |                     if not has_key_identifier:
312 |                             parent_hint = get_parent_hint(node)
313 |                             if parent_hint:
314 |                                 line_to_add += f" {parent_hint}"
315 | 
316 |                     # *** Add text content and close tag ***
317 |                     if text_content:
318 |                         line_to_add += f">{text_content}</{node.tag_name}>"
319 |                     else:
320 |                         line_to_add += " />"
321 | 
322 |                     should_add_current_node = True
323 |                     static_element_count += 1
324 | 
325 |             # --- Add the formatted line if needed ---
326 |             if should_add_current_node:
327 |                 formatted_lines.append(line_to_add)
328 |                 # logger.debug(f"Added line: {line_to_add}") # Optional debug
329 | 
330 |             # --- ALWAYS Recurse into children (unless static limit hit) ---
331 |             # We recurse even if the parent wasn't added, because children might be visible/interactive
332 |             if static_element_count >= max_static_elements:
333 |                  # Static element limit reached: stop recursing into this subtree
334 |                  pass
335 |             else:
336 |                  for child in node.children:
337 |                      # Pass DOMElementNode or DOMTextNode directly
338 |                      process_node(child, depth + 1)
339 | 
340 | 
341 |         # Start processing from the root element
342 |         process_node(self, 0)
343 | 
344 |         # logger.debug(f"Finished generate_llm_context_string. Processed {nodes_processed_count} nodes. Added {len(formatted_lines)} lines.") # Log summary
345 | 
346 |         output_str = '\n'.join(formatted_lines)
347 |         if static_element_count >= max_static_elements:
348 |              output_str += f"\n... (Static element list truncated after {max_static_elements} entries)"
349 |         return output_str, temp_static_id_map
350 | 
351 | 
352 |     def get_file_upload_element(self, check_siblings: bool = True) -> Optional['DOMElementNode']:
353 |         # Check if current element is a file input
354 |         if self.tag_name == 'input' and self.attributes.get('type') == 'file':
355 |             return self
356 | 
357 |         # Check children
358 |         for child in self.children:
359 |             if isinstance(child, DOMElementNode):
360 |                 result = child.get_file_upload_element(check_siblings=False)
361 |                 if result:
362 |                     return result
363 | 
364 |         # Check siblings only for the initial call
365 |         if check_siblings and self.parent:
366 |             for sibling in self.parent.children:
367 |                 if sibling is not self and isinstance(sibling, DOMElementNode):
368 |                     result = sibling.get_file_upload_element(check_siblings=False)
369 |                     if result:
370 |                         return result
371 | 
372 |         return None
373 | 
374 | # Type alias for the selector map
375 | SelectorMap = Dict[int, DOMElementNode]
376 | 
377 | 
378 | @dataclass
379 | class DOMState:
380 |     """Holds the state of the processed DOM at a point in time."""
381 |     element_tree: DOMElementNode
382 |     selector_map: SelectorMap
```
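
For illustration, a hedged sketch (not in the repository) that composes the dataclasses above into a small tree and renders the LLM context string. The field names come from the definitions in this file; the `src.dom.views` import path is an assumption based on the repository layout.

```python
# Minimal sketch: build a DOMElementNode/DOMTextNode tree and render it.
from src.dom.views import DOMElementNode, DOMTextNode  # assumed import path

label = DOMTextNode(is_visible=True, text="Sign in")  # keyword-only after KW_ONLY
button = DOMElementNode(
    tag_name="button",
    attributes={"id": "login-btn", "type": "submit"},
    is_visible=True,
    is_interactive=True,
    highlight_index=0,  # interactive elements carry a highlight index
    children=[label],
)
label.parent = button

root = DOMElementNode(tag_name="form", is_visible=True, children=[button])
button.parent = root

context, static_map = root.generate_llm_context_string(context_purpose="verification")
print(context)  # e.g. [0]<button id="login-btn" type="submit">Sign in</button>
print(button.get_all_text_till_next_clickable_element())  # "Sign in"
```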

--------------------------------------------------------------------------------
/src/agents/crawler_agent.py:
--------------------------------------------------------------------------------

```python
  1 | # /src/crawler_agent.py
  2 | import logging
  3 | import time
  4 | from urllib.parse import urlparse, urljoin
  5 | from typing import List, Set, Dict, Any, Optional
  6 | import asyncio # For potential async operations if BrowserController becomes async
  7 | import re
  8 | import os
  9 | import json
 10 | 
 11 | from pydantic import BaseModel, Field
 12 | 
 13 | # Use relative imports within the package if applicable, or adjust paths
 14 | from ..browser.browser_controller import BrowserController
 15 | from ..llm.llm_client import LLMClient
 16 | 
 17 | logger = logging.getLogger(__name__)
 18 | 
 19 | # --- Pydantic Schema for LLM Response ---
 20 | class SuggestedTestStepsSchema(BaseModel):
 21 |     """Schema for suggested test steps relevant to the current page."""
 22 |     suggested_test_steps: List[str] = Field(..., description="List of 3-5 specific, actionable test step descriptions (like 'Click button X', 'Type Y into Z', 'Verify text A') relevant to the current page context.")
 23 |     reasoning: str = Field(..., description="Brief reasoning for why these steps are relevant to the page content and URL.")
 24 | 
 25 | # --- Crawler Agent ---
 26 | class CrawlerAgent:
 27 |     """
 28 |     Crawls a given domain, identifies unique pages, and uses an LLM
 29 |     to suggest potential test flows for each discovered page.
 30 |     """
 31 | 
 32 |     def __init__(self, llm_client: LLMClient, headless: bool = True, politeness_delay_sec: float = 1.0):
 33 |         self.llm_client = llm_client
 34 |         self.headless = headless
 35 |         self.politeness_delay = politeness_delay_sec
 36 |         self.browser_controller: Optional[BrowserController] = None
 37 | 
 38 |         # State for crawling
 39 |         self.base_domain: Optional[str] = None
 40 |         self.queue: List[str] = []
 41 |         self.visited_urls: Set[str] = set()
 42 |         self.discovered_steps: Dict[str, List[str]] = {}
 43 | 
 44 |     def _normalize_url(self, url: str) -> Optional[str]:
 45 |         """Removes fragments and trailing slashes for consistent URL tracking. Returns None for invalid URLs."""
 46 |         parsed = urlparse(url)
 47 |         # Rebuild without fragment, ensure path starts with / if root
 48 |         path = parsed.path if parsed.path else '/'
 49 |         if len(path) > 1 and path.endswith('/'):
 50 |             path = path[:-1]  # Remove the trailing slash unless the path is just '/'
 51 |         # Ensure scheme and netloc are present
 52 |         scheme = parsed.scheme if parsed.scheme else 'http'  # Default to http if the scheme is missing
 53 |         netloc = parsed.netloc
 54 |         if not netloc:
 55 |             logger.warning(f"URL '{url}' missing network location (domain). Skipping.")
 56 |             return None  # Invalid URL for crawling
 57 |         # Query params are usually important, keep them
 58 |         query = parsed.query
 59 |         # Reconstruct
 60 |         rebuilt_url = f"{scheme.lower()}://{netloc.lower()}{path}"
 61 |         if query:
 62 |             rebuilt_url += f"?{query}"
 63 |         return rebuilt_url  # Lowercase scheme/host only; paths and queries can be case-sensitive
 64 | 
 65 |     def _get_domain(self, url: str) -> Optional[str]:
 66 |         """Extracts the network location (domain) from a URL."""
 67 |         try:
 68 |             return urlparse(url).netloc.lower()
 69 |         except Exception:
 70 |             return None
 71 | 
 72 |     def _is_valid_url(self, url: str) -> bool:
 73 |         """Basic check if a URL seems valid for crawling."""
 74 |         try:
 75 |             parsed = urlparse(url)
 76 |             # Must have scheme (http/https) and netloc (domain)
 77 |             return all([parsed.scheme in ['http', 'https'], parsed.netloc])
 78 |         except Exception:
 79 |             return False
 80 | 
 81 |     def _extract_links(self, current_url: str) -> Set[str]:
 82 |         """Extracts and normalizes unique, valid links from the current page."""
 83 |         if not self.browser_controller or not self.browser_controller.page:
 84 |             logger.error("Browser not available for link extraction.")
 85 |             return set()
 86 | 
 87 |         extracted_links = set()
 88 |         try:
 89 |             # Use Playwright's locator to find all anchor tags
 90 |             links = self.browser_controller.page.locator('a[href]').all()
 91 |             logger.debug(f"Found {len(links)} potential link elements on {current_url}.")
 92 | 
 93 |             for link_locator in links:
 94 |                 try:
 95 |                     href = link_locator.get_attribute('href')
 96 |                     if href:
 97 |                         # Resolve relative URLs against the current page's URL
 98 |                         absolute_url = urljoin(current_url, href.strip())
 99 |                         normalized_url = self._normalize_url(absolute_url)
100 | 
101 |                         if normalized_url and self._is_valid_url(normalized_url):
102 |                              # logger.debug(f"  Found link: {href} -> {normalized_url}")
103 |                              extracted_links.add(normalized_url)
104 |                         # else: logger.debug(f"  Skipping invalid/malformed link: {href} -> {normalized_url}")
105 | 
106 |                 except Exception as link_err:
107 |                     # Log error getting attribute but continue with others
108 |                     logger.warning(f"Could not get href for a link element on {current_url}: {link_err}")
109 |                     continue # Skip this link
110 | 
111 |         except Exception as e:
112 |             logger.error(f"Error extracting links from {current_url}: {e}", exc_info=True)
113 | 
114 |         logger.info(f"Extracted {len(extracted_links)} unique, valid, normalized links from {current_url}.")
115 |         return extracted_links
116 | 
117 |     def _get_test_step_suggestions(self,
118 |                                    page_url: str,
119 |                                    dom_context_str: Optional[str],
120 |                                    screenshot_bytes: Optional[bytes]
121 |                                    ) -> List[str]:
122 |         """Asks the LLM to suggest specific test steps based on page URL, DOM, and screenshot."""
123 |         logger.info(f"Requesting LLM suggestions for page: {page_url} (using DOM/Screenshot context)")
124 | 
125 |         prompt = f"""
126 |         You are an AI Test Analyst identifying valuable test scenarios by suggesting specific test steps.
127 |         The crawler is currently on the web page:
128 |         URL: {page_url}
129 | 
130 |         Analyze the following page context:
131 |         1.  **URL & Page Purpose:** Infer the primary purpose of this page (e.g., Login, Blog Post, Product Details, Form Submission, Search Results, Homepage).
132 |         2.  **Visible DOM Elements:** Review the HTML snippet of visible elements. Note forms, primary action buttons (Submit, Add to Cart, Subscribe), key content areas, inputs, etc. Interactive elements are marked `[index]`, static with `(Static)`.
133 |         3.  **Screenshot:** Analyze the visual layout, focusing on interactive elements and prominent information relevant to the page's purpose.
134 | 
135 |         **Visible DOM Context:**
136 |         ```html
137 |         {dom_context_str if dom_context_str else "DOM context not available."}
138 |         ```
139 |         {f"**Screenshot Analysis:** Please analyze the attached screenshot for layout, visible text, forms, and key interactive elements." if screenshot_bytes else "**Note:** No screenshot provided."}
140 | 
141 |         **Your Task:**
142 |         Based on the inferred purpose and the page context (URL, DOM, Screenshot), suggest **one or two short sequences (totaling 4-7 steps)** of specific, actionable test steps representing **meaningful user interactions or verifications** related to the page's **core functionality**.
143 | 
144 |         **Step Description Requirements:**
145 |         *   Each step should be a single, clear instruction (e.g., "Click 'Login' button", "Type '[email protected]' into 'Email' field", "Verify 'Welcome Back!' message is displayed").
146 |         *   Describe target elements clearly using visual labels, placeholders, or roles (e.g., 'Username field', 'Add to Cart button', 'Subscribe to newsletter checkbox'). **Do NOT include CSS selectors or indices `[index]`**.
147 |         *   **Prioritize sequences:** Group related actions together logically (e.g., fill form fields -> click submit; select options -> add to cart).
148 |         *   **Focus on core function:** Test the main reason the page exists (logging in, submitting data, viewing specific content details, adding an item, completing a search, signing up, etc.).
149 |         *   **Include Verifications:** Crucially, add steps to verify expected outcomes after actions (e.g., "Verify success message 'Item Added' appears", "Verify error message 'Password required' is shown", "Verify user is redirected to dashboard page", "Verify shopping cart count increases").
150 |         *   **AVOID:** Simply listing navigation links (header, footer, sidebar) unless they are part of a specific task *initiated* on this page (like password recovery). Avoid generic actions ("Click image", "Click text") without clear purpose or verification.
151 | 
152 |         **Examples of GOOD Step Sequences:**
153 |         *   Login Page: `["Type 'testuser' into Username field", "Type 'wrongpass' into Password field", "Click Login button", "Verify 'Invalid credentials' error message is visible"]`
154 |         *   Product Page: `["Select 'Red' from Color dropdown", "Click 'Add to Cart' button", "Verify cart icon shows '1 item'", "Navigate to the shopping cart page"]`
155 |         *   Blog Page (if comments enabled): `["Scroll down to the comments section", "Type 'Great post!' into the comment input box", "Click the 'Submit Comment' button", "Verify 'Comment submitted successfully' message appears"]`
156 |         *   Newsletter Signup Form: `["Enter 'John Doe' into the Full Name field", "Enter '[email protected]' into the Email field", "Click the 'Subscribe' button", "Verify confirmation text 'Thanks for subscribing!' is displayed"]`
157 | 
158 |         **Examples of BAD/LOW-VALUE Steps (to avoid):**
159 |         *   `["Click Home link", "Click About Us link", "Click Contact link"]` (Just navigation, low value unless testing navigation itself specifically)
160 |         *   `["Click the first image", "Click the second paragraph"]` (No clear purpose or verification)
161 |         *   `["Type text into search bar"]` (Incomplete - what text? what next? add submit/verify)
162 | 
163 |         **Output Requirements:**
164 |         - Provide a JSON object matching the required schema (`SuggestedTestStepsSchema`).
165 |         - The `suggested_test_steps` list should contain 4-7 specific steps, ideally forming 1-2 meaningful sequences.
166 |         - Provide brief `reasoning` explaining *why* these steps test the core function.
167 | 
168 |         Respond ONLY with the JSON object matching the schema.
169 | 
170 |         """
171 | 
172 |         # Call generate_json, passing image_bytes if available
173 |         response_obj = self.llm_client.generate_json(
174 |             SuggestedTestStepsSchema, # Use the updated schema class
175 |             prompt,
176 |             image_bytes=screenshot_bytes # Pass the image bytes here
177 |         )
178 | 
179 |         if isinstance(response_obj, SuggestedTestStepsSchema):
180 |             logger.debug(f"LLM suggested steps for {page_url}: {response_obj.suggested_test_steps} (Reason: {response_obj.reasoning})")
181 |             # Validate the response list
182 |             if response_obj.suggested_test_steps and isinstance(response_obj.suggested_test_steps, list):
183 |                  valid_steps = [step for step in response_obj.suggested_test_steps if isinstance(step, str) and step.strip()]
184 |                  if len(valid_steps) != len(response_obj.suggested_test_steps):
185 |                       logger.warning(f"LLM response for {page_url} contained invalid step entries. Using only valid ones.")
186 |                  return valid_steps
187 |             else:
188 |                  logger.warning(f"LLM did not return a valid list of steps for {page_url}.")
189 |                  return []
190 |         elif isinstance(response_obj, str): # Handle error string
191 |             logger.error(f"LLM suggestion failed for {page_url}: {response_obj}")
192 |             return []
193 |         else: # Handle unexpected type
194 |             logger.error(f"Unexpected response type from LLM for {page_url}: {type(response_obj)}")
195 |             return []
196 | 
197 | 
198 |     def crawl_and_suggest(self, start_url: str, max_pages: int = 10) -> Dict[str, Any]:
199 |         """
200 |         Starts the crawling process from the given URL.
201 | 
202 |         Args:
203 |             start_url: The initial URL to start crawling from.
204 |             max_pages: The maximum number of unique pages to visit and get suggestions for.
205 | 
206 |         Returns:
207 |             A dictionary containing the results:
208 |             {
209 |                 "success": bool,
210 |                 "message": str,
211 |                 "start_url": str,
212 |                 "base_domain": str,
213 |                 "pages_visited": int,
214 |                 "discovered_steps": Dict[str, List[str]] # {url: [step1, step2, ...]}
215 |             }
216 |         """
217 |         logger.info(f"Starting crawl from '{start_url}', max pages: {max_pages}")
218 |         crawl_result = {
219 |             "success": False,
220 |             "message": "Crawl initiated.",
221 |             "start_url": start_url,
222 |             "base_domain": None,
223 |             "pages_visited": 0,
224 |             "discovered_steps": {}
225 |         }
226 | 
227 |         # --- Initialization ---
228 |         self.queue = []
229 |         self.visited_urls = set()
230 |         self.discovered_steps = {}
231 | 
232 |         normalized_start_url = self._normalize_url(start_url)
233 |         if not normalized_start_url or not self._is_valid_url(normalized_start_url):
234 |             crawl_result["message"] = f"Invalid start URL provided: {start_url}"
235 |             logger.error(crawl_result["message"])
236 |             return crawl_result
237 | 
238 |         self.base_domain = self._get_domain(normalized_start_url)
239 |         if not self.base_domain:
240 |              crawl_result["message"] = f"Could not extract domain from start URL: {start_url}"
241 |              logger.error(crawl_result["message"])
242 |              return crawl_result
243 | 
244 |         crawl_result["base_domain"] = self.base_domain
245 |         self.queue.append(normalized_start_url)
246 |         logger.info(f"Base domain set to: {self.base_domain}")
247 | 
248 |         try:
249 |             # --- Setup Browser ---
250 |             logger.info("Starting browser for crawler...")
251 |             self.browser_controller = BrowserController(headless=self.headless)
252 |             self.browser_controller.start()
253 |             if not self.browser_controller.page:
254 |                  raise RuntimeError("Failed to initialize browser page for crawler.")
255 | 
256 |             # --- Crawling Loop ---
257 |             while self.queue and len(self.visited_urls) < max_pages:
258 |                 current_url = self.queue.pop(0)
259 | 
260 |                 # Skip if already visited
261 |                 if current_url in self.visited_urls:
262 |                     logger.debug(f"Skipping already visited URL: {current_url}")
263 |                     continue
264 | 
265 |                 # Check if it belongs to the target domain
266 |                 current_domain = self._get_domain(current_url)
267 |                 if current_domain != self.base_domain:
268 |                     logger.debug(f"Skipping URL outside base domain ({self.base_domain}): {current_url}")
269 |                     continue
270 | 
271 |                 logger.info(f"Visiting ({len(self.visited_urls) + 1}/{max_pages}): {current_url}")
272 |                 self.visited_urls.add(current_url)
273 |                 crawl_result["pages_visited"] += 1
274 | 
275 |                 dom_context_str: Optional[str] = None
276 |                 screenshot_bytes: Optional[bytes] = None
277 | 
278 |                 # Navigate
279 |                 try:
280 |                     self.browser_controller.goto(current_url)
281 |                     # Optional: Add wait for load state if needed, goto has basic wait
282 |                     self.browser_controller.page.wait_for_load_state('domcontentloaded', timeout=15000)
283 |                     actual_url_after_nav = self.browser_controller.get_current_url() # Get the URL we actually landed on
284 | 
285 |                     # --- >>> ADDED DOMAIN CHECK AFTER NAVIGATION <<< ---
286 |                     actual_domain = self._get_domain(actual_url_after_nav)
287 |                     if actual_domain != self.base_domain:
288 |                         logger.warning(f"Redirected outside base domain! "
289 |                                        f"Initial: {current_url}, Final: {actual_url_after_nav} ({actual_domain}). Skipping further processing for this page.")
290 |                         # Mark the off-domain URL as visited to prevent loops if other pages link back to it
291 |                         off_domain_normalized = self._normalize_url(actual_url_after_nav)
292 |                         if off_domain_normalized:
293 |                             self.visited_urls.add(off_domain_normalized)
294 |                         time.sleep(self.politeness_delay) # Still add delay
295 |                         continue # Skip context gathering, link extraction, suggestions for this off-domain page
296 |                     # --- Gather Context (DOM + Screenshot) ---
297 |                     try:
298 |                         logger.debug(f"Gathering DOM state for {current_url}...")
299 |                         dom_state = self.browser_controller.get_structured_dom()
300 |                         if dom_state and dom_state.element_tree:
301 |                             dom_context_str, _ = dom_state.element_tree.generate_llm_context_string(context_purpose='verification') # Use verification context (more static elements)
302 |                             logger.debug(f"DOM context string generated (length: {len(dom_context_str)}).")
303 |                         else:
304 |                              logger.warning(f"Could not get structured DOM for {current_url}.")
305 |                              dom_context_str = "Error retrieving DOM structure."
306 | 
307 |                         logger.debug(f"Taking screenshot for {current_url}...")
308 |                         screenshot_bytes = self.browser_controller.take_screenshot()
309 |                         if not screenshot_bytes:
310 |                              logger.warning(f"Failed to take screenshot for {current_url}.")
311 | 
312 |                     except Exception as context_err:
313 |                         logger.error(f"Failed to gather context (DOM/Screenshot) for {current_url}: {context_err}")
314 |                         dom_context_str = f"Error gathering context: {context_err}"
315 |                         screenshot_bytes = None # Ensure screenshot is None if context gathering failed
316 | 
317 |                 except Exception as nav_e:
318 |                     logger.warning(f"Failed to navigate to {current_url}: {nav_e}. Skipping this page.")
319 |                     # Don't add links or suggestions if navigation fails
320 |                     continue # Skip to next URL in queue
321 | 
322 |                 # Extract Links
323 |                 new_links = self._extract_links(current_url)
324 |                 for link in new_links:
325 |                     if link not in self.visited_urls and self._get_domain(link) == self.base_domain:
326 |                          if link not in self.queue: # Add only if not already queued
327 |                               self.queue.append(link)
328 | 
329 |                 # --- Get LLM Suggestions (using gathered context) ---
330 |                 suggestions = self._get_test_step_suggestions(
331 |                     current_url,
332 |                     dom_context_str,
333 |                     screenshot_bytes
334 |                 )
335 |                 if suggestions:
336 |                     self.discovered_steps[current_url] = suggestions 
337 | 
338 |                 # Politeness delay
339 |                 logger.debug(f"Waiting {self.politeness_delay}s before next page...")
340 |                 time.sleep(self.politeness_delay)
341 | 
342 | 
343 |             # --- Loop End ---
344 |             crawl_result["success"] = True
345 |             if len(self.visited_urls) >= max_pages:
346 |                 crawl_result["message"] = f"Crawl finished: Reached max pages limit ({max_pages})."
347 |                 logger.info(crawl_result["message"])
348 |             elif not self.queue:
349 |                 crawl_result["message"] = f"Crawl finished: Explored all reachable pages within domain ({len(self.visited_urls)} visited)."
350 |                 logger.info(crawl_result["message"])
351 |             else: # Should not happen unless error
352 |                 crawl_result["message"] = "Crawl finished unexpectedly."
353 | 
354 |             crawl_result["discovered_steps"] = self.discovered_steps
355 | 
356 | 
357 |         except Exception as e:
358 |             logger.critical(f"Critical error during crawl process: {e}", exc_info=True)
359 |             crawl_result["message"] = f"Crawler failed with error: {e}"
360 |             crawl_result["success"] = False
361 |         finally:
362 |             logger.info("--- Ending Crawl ---")
363 |             if self.browser_controller:
364 |                 self.browser_controller.close()
365 |                 self.browser_controller = None
366 | 
367 |             logger.info(f"Crawl Summary: Visited {crawl_result['pages_visited']} pages. Found suggestions for {len(crawl_result.get('discovered_steps', {}))} pages.")
368 | 
369 |         return crawl_result
370 |     
371 | 
```
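
A hedged usage sketch (not part of the repository). The `LLMClient()` construction is an assumption (presumably it reads `LLM_API_KEY` / `LLM_BASE_URL` / `LLM_MODEL` from the environment, as in `.env.example`); the rest relies only on the `crawl_and_suggest` return shape documented above.

```python
# Hypothetical driver for CrawlerAgent; LLMClient() setup is assumed.
from src.agents.crawler_agent import CrawlerAgent
from src.llm.llm_client import LLMClient

agent = CrawlerAgent(llm_client=LLMClient(), headless=True, politeness_delay_sec=1.0)
result = agent.crawl_and_suggest("https://example.com", max_pages=5)

if result["success"]:
    print(f"Visited {result['pages_visited']} pages on {result['base_domain']}")
    for url, steps in result["discovered_steps"].items():
        print(url)
        for step in steps:
            print("  -", step)
else:
    print("Crawl failed:", result["message"])
```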