This is page 18 of 28. Use http://codebase.md/trycua/cua?lines=true&page={x} to view the full context.

# Directory Structure

```
├── .cursorignore
├── .dockerignore
├── .editorconfig
├── .gitattributes
├── .github
│   ├── FUNDING.yml
│   ├── scripts
│   │   ├── get_pyproject_version.py
│   │   └── tests
│   │       ├── __init__.py
│   │       ├── README.md
│   │       └── test_get_pyproject_version.py
│   └── workflows
│       ├── bump-version.yml
│       ├── ci-lume.yml
│       ├── docker-publish-cua-linux.yml
│       ├── docker-publish-cua-windows.yml
│       ├── docker-publish-kasm.yml
│       ├── docker-publish-xfce.yml
│       ├── docker-reusable-publish.yml
│       ├── link-check.yml
│       ├── lint.yml
│       ├── npm-publish-cli.yml
│       ├── npm-publish-computer.yml
│       ├── npm-publish-core.yml
│       ├── publish-lume.yml
│       ├── pypi-publish-agent.yml
│       ├── pypi-publish-computer-server.yml
│       ├── pypi-publish-computer.yml
│       ├── pypi-publish-core.yml
│       ├── pypi-publish-mcp-server.yml
│       ├── pypi-publish-som.yml
│       ├── pypi-reusable-publish.yml
│       ├── python-tests.yml
│       ├── test-cua-models.yml
│       └── test-validation-script.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .prettierignore
├── .prettierrc.yaml
├── .vscode
│   ├── docs.code-workspace
│   ├── extensions.json
│   ├── launch.json
│   ├── libs-ts.code-workspace
│   ├── lume.code-workspace
│   ├── lumier.code-workspace
│   ├── py.code-workspace
│   └── settings.json
├── blog
│   ├── app-use.md
│   ├── assets
│   │   ├── composite-agents.png
│   │   ├── docker-ubuntu-support.png
│   │   ├── hack-booth.png
│   │   ├── hack-closing-ceremony.jpg
│   │   ├── hack-cua-ollama-hud.jpeg
│   │   ├── hack-leaderboard.png
│   │   ├── hack-the-north.png
│   │   ├── hack-winners.jpeg
│   │   ├── hack-workshop.jpeg
│   │   ├── hud-agent-evals.png
│   │   └── trajectory-viewer.jpeg
│   ├── bringing-computer-use-to-the-web.md
│   ├── build-your-own-operator-on-macos-1.md
│   ├── build-your-own-operator-on-macos-2.md
│   ├── cloud-windows-ga-macos-preview.md
│   ├── composite-agents.md
│   ├── computer-use-agents-for-growth-hacking.md
│   ├── cua-hackathon.md
│   ├── cua-playground-preview.md
│   ├── cua-vlm-router.md
│   ├── hack-the-north.md
│   ├── hud-agent-evals.md
│   ├── human-in-the-loop.md
│   ├── introducing-cua-cli.md
│   ├── introducing-cua-cloud-containers.md
│   ├── lume-to-containerization.md
│   ├── neurips-2025-cua-papers.md
│   ├── sandboxed-python-execution.md
│   ├── training-computer-use-models-trajectories-1.md
│   ├── trajectory-viewer.md
│   ├── ubuntu-docker-support.md
│   └── windows-sandbox.md
├── CONTRIBUTING.md
├── Development.md
├── Dockerfile
├── docs
│   ├── .env.example
│   ├── .gitignore
│   ├── content
│   │   └── docs
│   │       ├── agent-sdk
│   │       │   ├── agent-loops.mdx
│   │       │   ├── benchmarks
│   │       │   │   ├── index.mdx
│   │       │   │   ├── interactive.mdx
│   │       │   │   ├── introduction.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── osworld-verified.mdx
│   │       │   │   ├── screenspot-pro.mdx
│   │       │   │   └── screenspot-v2.mdx
│   │       │   ├── callbacks
│   │       │   │   ├── agent-lifecycle.mdx
│   │       │   │   ├── cost-saving.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── logging.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── pii-anonymization.mdx
│   │       │   │   └── trajectories.mdx
│   │       │   ├── chat-history.mdx
│   │       │   ├── custom-tools.mdx
│   │       │   ├── customizing-computeragent.mdx
│   │       │   ├── integrations
│   │       │   │   ├── hud.mdx
│   │       │   │   ├── meta.json
│   │       │   │   └── observability.mdx
│   │       │   ├── mcp-server
│   │       │   │   ├── client-integrations.mdx
│   │       │   │   ├── configuration.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── llm-integrations.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── tools.mdx
│   │       │   │   └── usage.mdx
│   │       │   ├── message-format.mdx
│   │       │   ├── meta.json
│   │       │   ├── migration-guide.mdx
│   │       │   ├── prompt-caching.mdx
│   │       │   ├── supported-agents
│   │       │   │   ├── composed-agents.mdx
│   │       │   │   ├── computer-use-agents.mdx
│   │       │   │   ├── grounding-models.mdx
│   │       │   │   ├── human-in-the-loop.mdx
│   │       │   │   └── meta.json
│   │       │   ├── supported-model-providers
│   │       │   │   ├── cua-vlm-router.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   └── local-models.mdx
│   │       │   ├── telemetry.mdx
│   │       │   └── usage-tracking.mdx
│   │       ├── cli-playbook
│   │       │   ├── commands.mdx
│   │       │   ├── index.mdx
│   │       │   └── meta.json
│   │       ├── computer-sdk
│   │       │   ├── cloud-vm-management.mdx
│   │       │   ├── commands.mdx
│   │       │   ├── computer-server
│   │       │   │   ├── Commands.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── REST-API.mdx
│   │       │   │   └── WebSocket-API.mdx
│   │       │   ├── computer-ui.mdx
│   │       │   ├── computers.mdx
│   │       │   ├── custom-computer-handlers.mdx
│   │       │   ├── meta.json
│   │       │   ├── sandboxed-python.mdx
│   │       │   └── tracing-api.mdx
│   │       ├── example-usecases
│   │       │   ├── form-filling.mdx
│   │       │   ├── gemini-complex-ui-navigation.mdx
│   │       │   ├── meta.json
│   │       │   ├── post-event-contact-export.mdx
│   │       │   └── windows-app-behind-vpn.mdx
│   │       ├── get-started
│   │       │   ├── meta.json
│   │       │   └── quickstart.mdx
│   │       ├── index.mdx
│   │       ├── macos-vm-cli-playbook
│   │       │   ├── lume
│   │       │   │   ├── cli-reference.mdx
│   │       │   │   ├── faq.md
│   │       │   │   ├── http-api.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── meta.json
│   │       │   │   └── prebuilt-images.mdx
│   │       │   ├── lumier
│   │       │   │   ├── building-lumier.mdx
│   │       │   │   ├── docker-compose.mdx
│   │       │   │   ├── docker.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   └── meta.json
│   │       │   └── meta.json
│   │       └── meta.json
│   ├── next.config.mjs
│   ├── package-lock.json
│   ├── package.json
│   ├── pnpm-lock.yaml
│   ├── postcss.config.mjs
│   ├── public
│   │   └── img
│   │       ├── agent_gradio_ui.png
│   │       ├── agent.png
│   │       ├── bg-dark.jpg
│   │       ├── bg-light.jpg
│   │       ├── cli.png
│   │       ├── computer.png
│   │       ├── grounding-with-gemini3.gif
│   │       ├── hero.png
│   │       ├── laminar_trace_example.png
│   │       ├── som_box_threshold.png
│   │       └── som_iou_threshold.png
│   ├── README.md
│   ├── source.config.ts
│   ├── src
│   │   ├── app
│   │   │   ├── (home)
│   │   │   │   ├── [[...slug]]
│   │   │   │   │   └── page.tsx
│   │   │   │   └── layout.tsx
│   │   │   ├── api
│   │   │   │   ├── posthog
│   │   │   │   │   └── [...path]
│   │   │   │   │       └── route.ts
│   │   │   │   └── search
│   │   │   │       └── route.ts
│   │   │   ├── favicon.ico
│   │   │   ├── global.css
│   │   │   ├── layout.config.tsx
│   │   │   ├── layout.tsx
│   │   │   ├── llms.mdx
│   │   │   │   └── [[...slug]]
│   │   │   │       └── route.ts
│   │   │   ├── llms.txt
│   │   │   │   └── route.ts
│   │   │   ├── robots.ts
│   │   │   └── sitemap.ts
│   │   ├── assets
│   │   │   ├── discord-black.svg
│   │   │   ├── discord-white.svg
│   │   │   ├── logo-black.svg
│   │   │   └── logo-white.svg
│   │   ├── components
│   │   │   ├── analytics-tracker.tsx
│   │   │   ├── cookie-consent.tsx
│   │   │   ├── doc-actions-menu.tsx
│   │   │   ├── editable-code-block.tsx
│   │   │   ├── footer.tsx
│   │   │   ├── hero.tsx
│   │   │   ├── iou.tsx
│   │   │   ├── mermaid.tsx
│   │   │   └── page-feedback.tsx
│   │   ├── lib
│   │   │   ├── llms.ts
│   │   │   └── source.ts
│   │   ├── mdx-components.tsx
│   │   └── providers
│   │       └── posthog-provider.tsx
│   └── tsconfig.json
├── examples
│   ├── agent_examples.py
│   ├── agent_ui_examples.py
│   ├── browser_tool_example.py
│   ├── cloud_api_examples.py
│   ├── computer_examples_windows.py
│   ├── computer_examples.py
│   ├── computer_ui_examples.py
│   ├── computer-example-ts
│   │   ├── .env.example
│   │   ├── .gitignore
│   │   ├── package-lock.json
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── README.md
│   │   ├── src
│   │   │   ├── helpers.ts
│   │   │   └── index.ts
│   │   └── tsconfig.json
│   ├── docker_examples.py
│   ├── evals
│   │   ├── hud_eval_examples.py
│   │   └── wikipedia_most_linked.txt
│   ├── pylume_examples.py
│   ├── sandboxed_functions_examples.py
│   ├── som_examples.py
│   ├── tracing_examples.py
│   ├── utils.py
│   └── winsandbox_example.py
├── img
│   ├── agent_gradio_ui.png
│   ├── agent.png
│   ├── cli.png
│   ├── computer.png
│   ├── logo_black.png
│   └── logo_white.png
├── libs
│   ├── kasm
│   │   ├── Dockerfile
│   │   ├── LICENSE
│   │   ├── README.md
│   │   └── src
│   │       └── ubuntu
│   │           └── install
│   │               └── firefox
│   │                   ├── custom_startup.sh
│   │                   ├── firefox.desktop
│   │                   └── install_firefox.sh
│   ├── lume
│   │   ├── .cursorignore
│   │   ├── CONTRIBUTING.md
│   │   ├── Development.md
│   │   ├── img
│   │   │   └── cli.png
│   │   ├── Package.resolved
│   │   ├── Package.swift
│   │   ├── README.md
│   │   ├── resources
│   │   │   └── lume.entitlements
│   │   ├── scripts
│   │   │   ├── build
│   │   │   │   ├── build-debug.sh
│   │   │   │   ├── build-release-notarized.sh
│   │   │   │   └── build-release.sh
│   │   │   └── install.sh
│   │   ├── src
│   │   │   ├── Commands
│   │   │   │   ├── Clone.swift
│   │   │   │   ├── Config.swift
│   │   │   │   ├── Create.swift
│   │   │   │   ├── Delete.swift
│   │   │   │   ├── Get.swift
│   │   │   │   ├── Images.swift
│   │   │   │   ├── IPSW.swift
│   │   │   │   ├── List.swift
│   │   │   │   ├── Logs.swift
│   │   │   │   ├── Options
│   │   │   │   │   └── FormatOption.swift
│   │   │   │   ├── Prune.swift
│   │   │   │   ├── Pull.swift
│   │   │   │   ├── Push.swift
│   │   │   │   ├── Run.swift
│   │   │   │   ├── Serve.swift
│   │   │   │   ├── Set.swift
│   │   │   │   └── Stop.swift
│   │   │   ├── ContainerRegistry
│   │   │   │   ├── ImageContainerRegistry.swift
│   │   │   │   ├── ImageList.swift
│   │   │   │   └── ImagesPrinter.swift
│   │   │   ├── Errors
│   │   │   │   └── Errors.swift
│   │   │   ├── FileSystem
│   │   │   │   ├── Home.swift
│   │   │   │   ├── Settings.swift
│   │   │   │   ├── VMConfig.swift
│   │   │   │   ├── VMDirectory.swift
│   │   │   │   └── VMLocation.swift
│   │   │   ├── LumeController.swift
│   │   │   ├── Main.swift
│   │   │   ├── Server
│   │   │   │   ├── Handlers.swift
│   │   │   │   ├── HTTP.swift
│   │   │   │   ├── Requests.swift
│   │   │   │   ├── Responses.swift
│   │   │   │   └── Server.swift
│   │   │   ├── Utils
│   │   │   │   ├── CommandRegistry.swift
│   │   │   │   ├── CommandUtils.swift
│   │   │   │   ├── Logger.swift
│   │   │   │   ├── NetworkUtils.swift
│   │   │   │   ├── Path.swift
│   │   │   │   ├── ProcessRunner.swift
│   │   │   │   ├── ProgressLogger.swift
│   │   │   │   ├── String.swift
│   │   │   │   └── Utils.swift
│   │   │   ├── Virtualization
│   │   │   │   ├── DarwinImageLoader.swift
│   │   │   │   ├── DHCPLeaseParser.swift
│   │   │   │   ├── ImageLoaderFactory.swift
│   │   │   │   └── VMVirtualizationService.swift
│   │   │   ├── VM
│   │   │   │   ├── DarwinVM.swift
│   │   │   │   ├── LinuxVM.swift
│   │   │   │   ├── VM.swift
│   │   │   │   ├── VMDetails.swift
│   │   │   │   ├── VMDetailsPrinter.swift
│   │   │   │   ├── VMDisplayResolution.swift
│   │   │   │   └── VMFactory.swift
│   │   │   └── VNC
│   │   │       ├── PassphraseGenerator.swift
│   │   │       └── VNCService.swift
│   │   └── tests
│   │       ├── Mocks
│   │       │   ├── MockVM.swift
│   │       │   ├── MockVMVirtualizationService.swift
│   │       │   └── MockVNCService.swift
│   │       ├── VM
│   │       │   └── VMDetailsPrinterTests.swift
│   │       ├── VMTests.swift
│   │       ├── VMVirtualizationServiceTests.swift
│   │       └── VNCServiceTests.swift
│   ├── lumier
│   │   ├── .dockerignore
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── src
│   │       ├── bin
│   │       │   └── entry.sh
│   │       ├── config
│   │       │   └── constants.sh
│   │       ├── hooks
│   │       │   └── on-logon.sh
│   │       └── lib
│   │           ├── utils.sh
│   │           └── vm.sh
│   ├── python
│   │   ├── agent
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── agent
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── adapters
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── cua_adapter.py
│   │   │   │   │   ├── huggingfacelocal_adapter.py
│   │   │   │   │   ├── human_adapter.py
│   │   │   │   │   ├── mlxvlm_adapter.py
│   │   │   │   │   └── models
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── generic.py
│   │   │   │   │       ├── internvl.py
│   │   │   │   │       ├── opencua.py
│   │   │   │   │       └── qwen2_5_vl.py
│   │   │   │   ├── agent.py
│   │   │   │   ├── callbacks
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── budget_manager.py
│   │   │   │   │   ├── image_retention.py
│   │   │   │   │   ├── logging.py
│   │   │   │   │   ├── operator_validator.py
│   │   │   │   │   ├── pii_anonymization.py
│   │   │   │   │   ├── prompt_instructions.py
│   │   │   │   │   ├── telemetry.py
│   │   │   │   │   └── trajectory_saver.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── computers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cua.py
│   │   │   │   │   └── custom.py
│   │   │   │   ├── decorators.py
│   │   │   │   ├── human_tool
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   ├── server.py
│   │   │   │   │   └── ui.py
│   │   │   │   ├── integrations
│   │   │   │   │   └── hud
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── agent.py
│   │   │   │   │       └── proxy.py
│   │   │   │   ├── loops
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── anthropic.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── composed_grounded.py
│   │   │   │   │   ├── gelato.py
│   │   │   │   │   ├── gemini.py
│   │   │   │   │   ├── generic_vlm.py
│   │   │   │   │   ├── glm45v.py
│   │   │   │   │   ├── gta1.py
│   │   │   │   │   ├── holo.py
│   │   │   │   │   ├── internvl.py
│   │   │   │   │   ├── model_types.csv
│   │   │   │   │   ├── moondream3.py
│   │   │   │   │   ├── omniparser.py
│   │   │   │   │   ├── openai.py
│   │   │   │   │   ├── opencua.py
│   │   │   │   │   ├── uiins.py
│   │   │   │   │   ├── uitars.py
│   │   │   │   │   └── uitars2.py
│   │   │   │   ├── proxy
│   │   │   │   │   ├── examples.py
│   │   │   │   │   └── handlers.py
│   │   │   │   ├── responses.py
│   │   │   │   ├── tools
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── browser_tool.py
│   │   │   │   ├── types.py
│   │   │   │   └── ui
│   │   │   │       ├── __init__.py
│   │   │   │       ├── __main__.py
│   │   │   │       └── gradio
│   │   │   │           ├── __init__.py
│   │   │   │           ├── app.py
│   │   │   │           └── ui_components.py
│   │   │   ├── benchmarks
│   │   │   │   ├── .gitignore
│   │   │   │   ├── contrib.md
│   │   │   │   ├── interactive.py
│   │   │   │   ├── models
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   └── gta1.py
│   │   │   │   ├── README.md
│   │   │   │   ├── ss-pro.py
│   │   │   │   ├── ss-v2.py
│   │   │   │   └── utils.py
│   │   │   ├── example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_computer_agent.py
│   │   ├── bench-ui
│   │   │   ├── bench_ui
│   │   │   │   ├── __init__.py
│   │   │   │   ├── api.py
│   │   │   │   └── child.py
│   │   │   ├── examples
│   │   │   │   ├── folder_example.py
│   │   │   │   ├── gui
│   │   │   │   │   ├── index.html
│   │   │   │   │   ├── logo.svg
│   │   │   │   │   └── styles.css
│   │   │   │   ├── output_overlay.png
│   │   │   │   └── simple_example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       └── test_port_detection.py
│   │   ├── computer
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer
│   │   │   │   ├── __init__.py
│   │   │   │   ├── computer.py
│   │   │   │   ├── diorama_computer.py
│   │   │   │   ├── helpers.py
│   │   │   │   ├── interface
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   ├── models.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── logger.py
│   │   │   │   ├── models.py
│   │   │   │   ├── providers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cloud
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── docker
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── lume
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── lume_api.py
│   │   │   │   │   ├── lumier
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── types.py
│   │   │   │   │   └── winsandbox
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── provider.py
│   │   │   │   │       └── setup_script.ps1
│   │   │   │   ├── tracing_wrapper.py
│   │   │   │   ├── tracing.py
│   │   │   │   ├── ui
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   └── gradio
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       └── app.py
│   │   │   │   └── utils.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_computer.py
│   │   ├── computer-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── browser.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── diorama
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── diorama_computer.py
│   │   │   │   │   ├── diorama.py
│   │   │   │   │   ├── draw.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── safezone.py
│   │   │   │   ├── handlers
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── main.py
│   │   │   │   ├── server.py
│   │   │   │   ├── utils
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── wallpaper.py
│   │   │   │   └── watchdog.py
│   │   │   ├── examples
│   │   │   │   ├── __init__.py
│   │   │   │   └── usage_example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   ├── run_server.py
│   │   │   ├── test_connection.py
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_server.py
│   │   ├── core
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── core
│   │   │   │   ├── __init__.py
│   │   │   │   └── telemetry
│   │   │   │       ├── __init__.py
│   │   │   │       └── posthog.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_telemetry.py
│   │   ├── mcp-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── build-extension.py
│   │   │   ├── CONCURRENT_SESSIONS.md
│   │   │   ├── desktop-extension
│   │   │   │   ├── cua-extension.mcpb
│   │   │   │   ├── desktop_extension.png
│   │   │   │   ├── manifest.json
│   │   │   │   ├── README.md
│   │   │   │   ├── requirements.txt
│   │   │   │   ├── run_server.sh
│   │   │   │   └── setup.py
│   │   │   ├── mcp_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── server.py
│   │   │   │   └── session_manager.py
│   │   │   ├── pdm.lock
│   │   │   ├── pyproject.toml
│   │   │   ├── QUICK_TEST_COMMANDS.sh
│   │   │   ├── quick_test_local_option.py
│   │   │   ├── README.md
│   │   │   ├── scripts
│   │   │   │   ├── install_mcp_server.sh
│   │   │   │   └── start_mcp_server.sh
│   │   │   ├── test_mcp_server_local_option.py
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_mcp_server.py
│   │   ├── pylume
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_pylume.py
│   │   └── som
│   │       ├── .bumpversion.cfg
│   │       ├── LICENSE
│   │       ├── poetry.toml
│   │       ├── pyproject.toml
│   │       ├── README.md
│   │       ├── som
│   │       │   ├── __init__.py
│   │       │   ├── detect.py
│   │       │   ├── detection.py
│   │       │   ├── models.py
│   │       │   ├── ocr.py
│   │       │   ├── util
│   │       │   │   └── utils.py
│   │       │   └── visualization.py
│   │       └── tests
│   │           ├── conftest.py
│   │           └── test_omniparser.py
│   ├── qemu-docker
│   │   ├── linux
│   │   │   ├── Dockerfile
│   │   │   ├── README.md
│   │   │   └── src
│   │   │       ├── entry.sh
│   │   │       └── vm
│   │   │           ├── image
│   │   │           │   └── README.md
│   │   │           └── setup
│   │   │               ├── install.sh
│   │   │               ├── setup-cua-server.sh
│   │   │               └── setup.sh
│   │   ├── README.md
│   │   └── windows
│   │       ├── Dockerfile
│   │       ├── README.md
│   │       └── src
│   │           ├── entry.sh
│   │           └── vm
│   │               ├── image
│   │               │   └── README.md
│   │               └── setup
│   │                   ├── install.bat
│   │                   ├── on-logon.ps1
│   │                   ├── setup-cua-server.ps1
│   │                   ├── setup-utils.psm1
│   │                   └── setup.ps1
│   ├── typescript
│   │   ├── .gitignore
│   │   ├── .nvmrc
│   │   ├── agent
│   │   │   ├── examples
│   │   │   │   ├── playground-example.html
│   │   │   │   └── README.md
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── client.ts
│   │   │   │   ├── index.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   └── client.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── computer
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── computer
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── providers
│   │   │   │   │   │   ├── base.ts
│   │   │   │   │   │   ├── cloud.ts
│   │   │   │   │   │   └── index.ts
│   │   │   │   │   └── types.ts
│   │   │   │   ├── index.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── base.ts
│   │   │   │   │   ├── factory.ts
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── linux.ts
│   │   │   │   │   ├── macos.ts
│   │   │   │   │   └── windows.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   ├── computer
│   │   │   │   │   └── cloud.test.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── factory.test.ts
│   │   │   │   │   ├── index.test.ts
│   │   │   │   │   ├── linux.test.ts
│   │   │   │   │   ├── macos.test.ts
│   │   │   │   │   └── windows.test.ts
│   │   │   │   └── setup.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── core
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── index.ts
│   │   │   │   └── telemetry
│   │   │   │       ├── clients
│   │   │   │       │   ├── index.ts
│   │   │   │       │   └── posthog.ts
│   │   │   │       └── index.ts
│   │   │   ├── tests
│   │   │   │   └── telemetry.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── cua-cli
│   │   │   ├── .gitignore
│   │   │   ├── .prettierrc
│   │   │   ├── bun.lock
│   │   │   ├── CLAUDE.md
│   │   │   ├── index.ts
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── auth.ts
│   │   │   │   ├── cli.ts
│   │   │   │   ├── commands
│   │   │   │   │   ├── auth.ts
│   │   │   │   │   └── sandbox.ts
│   │   │   │   ├── config.ts
│   │   │   │   ├── http.ts
│   │   │   │   ├── storage.ts
│   │   │   │   └── util.ts
│   │   │   └── tsconfig.json
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── pnpm-workspace.yaml
│   │   └── README.md
│   └── xfce
│       ├── .dockerignore
│       ├── .gitignore
│       ├── Development.md
│       ├── Dockerfile
│       ├── Dockerfile.dev
│       ├── README.md
│       └── src
│           ├── scripts
│           │   ├── resize-display.sh
│           │   ├── start-computer-server.sh
│           │   ├── start-novnc.sh
│           │   ├── start-vnc.sh
│           │   └── xstartup.sh
│           ├── supervisor
│           │   └── supervisord.conf
│           └── xfce-config
│               ├── helpers.rc
│               ├── xfce4-power-manager.xml
│               └── xfce4-session.xml
├── LICENSE.md
├── Makefile
├── notebooks
│   ├── agent_nb.ipynb
│   ├── blog
│   │   ├── build-your-own-operator-on-macos-1.ipynb
│   │   └── build-your-own-operator-on-macos-2.ipynb
│   ├── composite_agents_docker_nb.ipynb
│   ├── computer_nb.ipynb
│   ├── computer_server_nb.ipynb
│   ├── customizing_computeragent.ipynb
│   ├── eval_osworld.ipynb
│   ├── ollama_nb.ipynb
│   ├── README.md
│   ├── sota_hackathon_cloud.ipynb
│   └── sota_hackathon.ipynb
├── package-lock.json
├── package.json
├── pnpm-lock.yaml
├── pyproject.toml
├── pyrightconfig.json
├── README.md
├── scripts
│   ├── install-cli.ps1
│   ├── install-cli.sh
│   ├── playground-docker.sh
│   ├── playground.sh
│   ├── run-docker-dev.sh
│   └── typescript-typecheck.js
├── TESTING.md
├── tests
│   ├── agent_loop_testing
│   │   ├── agent_test.py
│   │   └── README.md
│   ├── pytest.ini
│   ├── shell_cmd.py
│   ├── test_files.py
│   ├── test_mcp_server_session_management.py
│   ├── test_mcp_server_streaming.py
│   ├── test_shell_bash.py
│   ├── test_telemetry.py
│   ├── test_tracing.py
│   ├── test_venv.py
│   └── test_watchdog.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/lume/provider.py:
--------------------------------------------------------------------------------

```python
"""Lume VM provider implementation using curl commands.

This provider uses direct curl commands to interact with the Lume API,
removing the dependency on the pylume Python package.
"""

import asyncio
import json
import logging
import os
import re
import subprocess
import urllib.parse
from typing import Any, Dict, List, Optional, Tuple

from ...logger import Logger, LogLevel
from ..base import BaseVMProvider, VMProviderType
from ..lume_api import (
    HAS_CURL,
    lume_api_get,
    lume_api_pull,
    lume_api_run,
    lume_api_stop,
    lume_api_update,
    parse_memory,
)

# Setup logging
logger = logging.getLogger(__name__)


class LumeProvider(BaseVMProvider):
    """Lume VM provider implementation using direct curl commands.

    This provider uses curl to interact with the Lume API server,
    removing the dependency on the pylume Python package.
    """

    def __init__(
        self,
        provider_port: int = 7777,
        host: str = "localhost",
        storage: Optional[str] = None,
        verbose: bool = False,
        ephemeral: bool = False,
    ):
        """Initialize the Lume provider.

        Args:
            provider_port: Port for the Lume API server (default: 7777)
            host: Host to use for API connections (default: localhost)
            storage: Path to store VM data
            verbose: Enable verbose logging
            ephemeral: If True, VMs will be deleted after stopping
        """
        if not HAS_CURL:
            raise ImportError(
                "curl is required for LumeProvider. "
                "Please ensure it is installed and in your PATH."
            )

        self.host = host
        self.port = provider_port  # Port for the Lume API server (default: 7777)
        self.storage = storage
        self.verbose = verbose
        self.ephemeral = ephemeral  # If True, VMs will be deleted after stopping

        # Base API URL for Lume API calls
        self.api_base_url = f"http://{self.host}:{self.port}"

        self.logger = logging.getLogger(__name__)

    @property
    def provider_type(self) -> VMProviderType:
        """Get the provider type."""
        return VMProviderType.LUME

    async def __aenter__(self):
        """Enter async context manager."""
        # No initialization needed, just return self
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        """Exit async context manager."""
        # No cleanup needed
        pass

    def _lume_api_get(
        self, vm_name: str = "", storage: Optional[str] = None, debug: bool = False
    ) -> Dict[str, Any]:
        """Get VM information using shared lume_api function.

        Args:
            vm_name: Optional name of the VM to get info for.
                     If empty, lists all VMs.
            storage: Optional storage path override. If provided, this will be used instead of self.storage
            debug: Whether to show debug output

        Returns:
            Dictionary with VM status information parsed from JSON response
        """
        # Use the shared implementation from lume_api module
        return lume_api_get(
            vm_name=vm_name,
            host=self.host,
            port=self.port,
            storage=storage if storage is not None else self.storage,
            debug=debug,
            verbose=self.verbose,
        )

    def _lume_api_run(
        self, vm_name: str, run_opts: Dict[str, Any], debug: bool = False
    ) -> Dict[str, Any]:
        """Run a VM using shared lume_api function.

        Args:
            vm_name: Name of the VM to run
            run_opts: Dictionary of run options
            debug: Whether to show debug output

        Returns:
            Dictionary with API response or error information
        """
        # Use the shared implementation from lume_api module
        return lume_api_run(
            vm_name=vm_name,
            host=self.host,
            port=self.port,
            run_opts=run_opts,
            storage=self.storage,
            debug=debug,
            verbose=self.verbose,
        )

    def _lume_api_stop(self, vm_name: str, debug: bool = False) -> Dict[str, Any]:
        """Stop a VM using shared lume_api function.

        Args:
            vm_name: Name of the VM to stop
            debug: Whether to show debug output

        Returns:
            Dictionary with API response or error information
        """
        # Use the shared implementation from lume_api module
        return lume_api_stop(
            vm_name=vm_name,
            host=self.host,
            port=self.port,
            storage=self.storage,
            debug=debug,
            verbose=self.verbose,
        )

    def _lume_api_update(
        self, vm_name: str, update_opts: Dict[str, Any], debug: bool = False
    ) -> Dict[str, Any]:
        """Update VM configuration using shared lume_api function.

        Args:
            vm_name: Name of the VM to update
            update_opts: Dictionary of update options
            debug: Whether to show debug output

        Returns:
            Dictionary with API response or error information
        """
        # Use the shared implementation from lume_api module
        return lume_api_update(
            vm_name=vm_name,
            host=self.host,
            port=self.port,
            update_opts=update_opts,
            storage=self.storage,
175 |             debug=debug,
176 |             verbose=self.verbose,
177 |         )
178 | 
179 |     async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
180 |         """Get VM information by name.
181 | 
182 |         Args:
183 |             name: Name of the VM to get information for
184 |             storage: Optional storage path override. If provided, this will be used
185 |                     instead of the provider's default storage path.
186 | 
187 |         Returns:
188 |             Dictionary with VM information including status, IP address, etc.
189 | 
190 |         Note:
191 |             If storage is not provided, the provider's default storage path will be used.
192 |             The storage parameter allows overriding the storage location for this specific call.
193 |         """
194 |         if not HAS_CURL:
195 |             logger.error("curl is not available. Cannot get VM status.")
196 |             return {"name": name, "status": "unavailable", "error": "curl is not available"}
197 | 
198 |         # First try to get detailed VM info from the API
199 |         try:
200 |             # Query the Lume API for VM status, using the storage override if given, else the provider default
201 |             vm_info = self._lume_api_get(
202 |                 vm_name=name,
203 |                 storage=storage if storage is not None else self.storage,
204 |                 debug=self.verbose,
205 |             )
206 | 
207 |             # Check for API errors
208 |             if "error" in vm_info:
209 |                 logger.debug(f"API request error: {vm_info['error']}")
210 |                 # If we got an error from the API, report the VM as not ready yet
211 |                 return {
212 |                     "name": name,
213 |                     "status": "starting",  # VM is still starting - do not attempt to connect yet
214 |                     "api_status": "error",
215 |                     "error": vm_info["error"],
216 |                 }
217 | 
218 |             # Process the VM status information
219 |             vm_status = vm_info.get("status", "unknown")
220 | 
221 |             # A stopped VM has no IP address, so return its status without waiting for one
222 |             if vm_status == "stopped":
223 |                 logger.info(f"VM {name} is in '{vm_status}' state - not waiting for IP address")
224 |                 # Return the status as-is without waiting for an IP
225 |                 result = {
226 |                     "name": name,
227 |                     "status": vm_status,
228 |                     **vm_info,  # Include all original fields from the API response
229 |                 }
230 |                 return result
231 | 
232 |             # Handle field name differences between APIs
233 |             # Some APIs use camelCase, others use snake_case
234 |             if "vncUrl" in vm_info:
235 |                 vnc_url = vm_info["vncUrl"]
236 |             elif "vnc_url" in vm_info:
237 |                 vnc_url = vm_info["vnc_url"]
238 |             else:
239 |                 vnc_url = ""
240 | 
241 |             if "ipAddress" in vm_info:
242 |                 ip_address = vm_info["ipAddress"]
243 |             elif "ip_address" in vm_info:
244 |                 ip_address = vm_info["ip_address"]
245 |             else:
246 |                 # If no IP address is provided and VM is supposed to be running,
247 |                 # report it as still starting
248 |                 ip_address = None
249 |                 logger.info(
250 |                     f"VM {name} is in '{vm_status}' state but no IP address found - reporting as still starting"
251 |                 )
252 | 
253 |             logger.info(f"VM {name} status: {vm_status}")
254 | 
255 |             # Return the complete status information
256 |             result = {
257 |                 "name": name,
258 |                 "status": vm_status if vm_status else "running",
259 |                 "ip_address": ip_address,
260 |                 "vnc_url": vnc_url,
261 |                 "api_status": "ok",
262 |             }
263 | 
264 |             # Include all original fields from the API response
265 |             if isinstance(vm_info, dict):
266 |                 for key, value in vm_info.items():
267 |                     if key not in result:  # Don't override our carefully processed fields
268 |                         result[key] = value
269 | 
270 |             return result
271 | 
272 |         except Exception as e:
273 |             logger.error(f"Failed to get VM status: {e}")
274 |             # Return a fallback status that indicates the VM is not ready yet
275 |             return {
276 |                 "name": name,
277 |                 "status": "initializing",  # VM is still initializing
278 |                 "error": f"Failed to get VM status: {str(e)}",
279 |             }
280 | 
281 |     async def list_vms(self) -> List[Dict[str, Any]]:
282 |         """List all available VMs."""
283 |         result = self._lume_api_get(debug=self.verbose)
284 | 
285 |         # Extract the VMs list from the response
286 |         if "vms" in result and isinstance(result["vms"], list):
287 |             return result["vms"]
288 |         elif "error" in result:
289 |             logger.error(f"Error listing VMs: {result['error']}")
290 |             return []
291 |         else:
292 |             return []
293 | 
294 |     async def run_vm(
295 |         self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None
296 |     ) -> Dict[str, Any]:
297 |         """Run a VM with the given options.
298 | 
299 |         If the VM does not exist in the storage location, this will attempt to pull it
300 |         from the Lume registry first.
301 | 
302 |         Args:
303 |             image: Image name to use when pulling the VM if it doesn't exist
304 |             name: Name of the VM to run
305 |             run_opts: Dictionary of run options (memory, cpu, etc.)
306 |             storage: Optional storage path override. If provided, this will be used
307 |                     instead of the provider's default storage path.
308 | 
309 |         Returns:
310 |             Dictionary with VM run status and information
311 |         """
312 |         # First check if VM exists by trying to get its info
313 |         vm_info = await self.get_vm(name, storage=storage)
314 | 
315 |         if "error" in vm_info:
316 |             # VM doesn't exist, try to pull it
317 |             self.logger.info(
318 |                 f"VM {name} not found, attempting to pull image {image} from registry..."
319 |             )
320 | 
321 |             # Call pull_vm with the image parameter
322 |             pull_result = await self.pull_vm(name=name, image=image, storage=storage)
323 | 
324 |             # Check if pull was successful
325 |             if "error" in pull_result:
326 |                 self.logger.error(f"Failed to pull VM image: {pull_result['error']}")
327 |                 return pull_result  # Return the error from pull
328 | 
329 |             self.logger.info(f"Successfully pulled VM image {image} as {name}")
330 | 
331 |         # Now run the VM with the given options
332 |         self.logger.info(f"Running VM {name} with options: {run_opts}")
333 | 
334 |         from ..lume_api import lume_api_run
335 | 
336 |         return lume_api_run(
337 |             vm_name=name,
338 |             host=self.host,
339 |             port=self.port,
340 |             run_opts=run_opts,
341 |             storage=storage if storage is not None else self.storage,
342 |             debug=self.verbose,
343 |             verbose=self.verbose,
344 |         )
345 | 
346 |     async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
347 |         """Stop a running VM.
348 | 
349 |         If this provider was initialized with ephemeral=True, the VM will also
350 |         be deleted after it is stopped.
351 | 
352 |         Args:
353 |             name: Name of the VM to stop
354 |             storage: Optional storage path override (applied to the ephemeral delete; the stop request itself uses self.storage)
355 | 
356 |         Returns:
357 |             Dictionary with stop status and information
358 |         """
359 |         # Stop the VM first
360 |         stop_result = self._lume_api_stop(name, debug=self.verbose)
361 | 
362 |         # Log ephemeral status for debugging
363 |         self.logger.info(f"Ephemeral mode status: {self.ephemeral}")
364 | 
365 |         # If ephemeral mode is enabled, delete the VM after stopping
366 |         if self.ephemeral and (stop_result.get("success", False) or "error" not in stop_result):
367 |             self.logger.info(f"Ephemeral mode enabled - deleting VM {name} after stopping")
368 |             try:
369 |                 delete_result = await self.delete_vm(name, storage=storage)
370 | 
371 |                 # Return combined result
372 |                 return {
373 |                     **stop_result,  # Include all stop result info
374 |                     "deleted": True,
375 |                     "delete_result": delete_result,
376 |                 }
377 |             except Exception as e:
378 |                 self.logger.error(f"Failed to delete ephemeral VM {name}: {e}")
379 |                 # Include the error but still return stop result
380 |                 return {**stop_result, "deleted": False, "delete_error": str(e)}
381 | 
382 |         # Just return the stop result if not ephemeral
383 |         return stop_result
384 | 
385 |     async def pull_vm(
386 |         self,
387 |         name: str,
388 |         image: str,
389 |         storage: Optional[str] = None,
390 |         registry: str = "ghcr.io",
391 |         organization: str = "trycua",
392 |         pull_opts: Optional[Dict[str, Any]] = None,
393 |     ) -> Dict[str, Any]:
394 |         """Pull a VM image from the registry.
395 | 
396 |         Args:
397 |             name: Name for the VM after pulling
398 |             image: The image name to pull (e.g. 'macos-sequoia-cua:latest')
399 |             storage: Optional storage path to use
400 |             registry: Registry to pull from (default: ghcr.io)
401 |             organization: Organization in registry (default: trycua)
402 |             pull_opts: Additional options for pulling the VM (optional)
403 | 
404 |         Returns:
405 |             Dictionary with information about the pulled VM
406 | 
407 |         Raises:
408 |             ValueError: If image is not provided. Pull failures are returned as a dict with an "error" key.
409 |         """
410 |         # Validate image parameter
411 |         if not image:
412 |             raise ValueError("Image parameter is required for pull_vm")
413 | 
414 |         self.logger.info(f"Pulling VM image '{image}' as '{name}'")
415 |         self.logger.info("You can check the pull progress using: lume logs -f")
416 | 
417 |         # Set default pull_opts if not provided
418 |         if pull_opts is None:
419 |             pull_opts = {}
420 | 
421 |         # Log information about the operation
422 |         self.logger.debug(f"Pull storage location: {storage or 'default'}")
423 | 
424 |         try:
425 |             # Call the lume_api_pull function from lume_api.py
426 |             from ..lume_api import lume_api_pull
427 | 
428 |             result = lume_api_pull(
429 |                 image=image,
430 |                 name=name,
431 |                 host=self.host,
432 |                 port=self.port,
433 |                 storage=storage if storage is not None else self.storage,
434 |                 registry=registry,
435 |                 organization=organization,
436 |                 debug=self.verbose,
437 |                 verbose=self.verbose,
438 |             )
439 | 
440 |             # Check for errors in the result
441 |             if "error" in result:
442 |                 self.logger.error(f"Failed to pull VM image: {result['error']}")
443 |                 return result
444 | 
445 |             self.logger.info(f"Successfully pulled VM image '{image}' as '{name}'")
446 |             return result
447 |         except Exception as e:
448 |             self.logger.error(f"Failed to pull VM image '{image}': {e}")
449 |             return {"error": f"Failed to pull VM: {str(e)}"}
450 | 
451 |     async def delete_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
452 |         """Delete a VM permanently.
453 | 
454 |         Args:
455 |             name: Name of the VM to delete
456 |             storage: Optional storage path override
457 | 
458 |         Returns:
459 |             Dictionary with delete status and information
460 |         """
461 |         self.logger.info(f"Deleting VM {name}...")
462 | 
463 |         try:
464 |             # Call the shared lume_api_delete helper from lume_api.py
465 |             from ..lume_api import lume_api_delete
466 | 
467 |             result = lume_api_delete(
468 |                 vm_name=name,
469 |                 host=self.host,
470 |                 port=self.port,
471 |                 storage=storage if storage is not None else self.storage,
472 |                 debug=self.verbose,
473 |                 verbose=self.verbose,
474 |             )
475 | 
476 |             # Check for errors in the result
477 |             if "error" in result:
478 |                 self.logger.error(f"Failed to delete VM: {result['error']}")
479 |                 return result
480 | 
481 |             self.logger.info(f"Successfully deleted VM '{name}'")
482 |             return result
483 |         except Exception as e:
484 |             self.logger.error(f"Failed to delete VM '{name}': {e}")
485 |             return {"error": f"Failed to delete VM: {str(e)}"}
486 | 
487 |     async def update_vm(
488 |         self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None
489 |     ) -> Dict[str, Any]:
490 |         """Update VM configuration (the storage argument is currently ignored; self.storage is used)."""
491 |         return self._lume_api_update(name, update_opts, debug=self.verbose)
492 | 
493 |     async def restart_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
494 |         raise NotImplementedError("LumeProvider does not support restarting VMs.")
495 | 
496 |     async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
497 |         """Get the IP address of a VM, waiting indefinitely until it's available.
498 | 
499 |         Args:
500 |             name: Name of the VM to get the IP for
501 |             storage: Optional storage path override
502 |             retry_delay: Delay between retries in seconds (default: 2)
503 | 
504 |         Returns:
505 |             IP address of the VM when it becomes available
506 |         """
507 |         # Track total attempts for logging purposes
508 |         total_attempts = 0
509 | 
510 |         # Loop indefinitely until we get a valid IP
511 |         while True:
512 |             total_attempts += 1
513 | 
514 |             # Log retry message but not on first attempt
515 |             if total_attempts > 1:
516 |                 self.logger.info(f"Waiting for VM {name} IP address (attempt {total_attempts})...")
517 | 
518 |             try:
519 |                 # Get VM information
520 |                 vm_info = await self.get_vm(name, storage=storage)
521 | 
522 |                 # Check if we got a valid IP
523 |                 ip = vm_info.get("ip_address", None)
524 |                 if ip and ip != "unknown" and not ip.startswith("0.0.0.0"):
525 |                     self.logger.info(f"Got valid VM IP address: {ip}")
526 |                     return ip
527 | 
528 |                 # Check the VM status
529 |                 status = vm_info.get("status", "unknown")
530 | 
531 |                 # If VM is not running yet, log and wait
532 |                 if status != "running":
533 |                     self.logger.info(f"VM is not running yet (status: {status}). Waiting...")
534 |                 # If VM is running but no IP yet, wait and retry
535 |                 else:
536 |                     self.logger.info("VM is running but no valid IP address yet. Waiting...")
537 | 
538 |             except Exception as e:
539 |                 self.logger.warning(f"Error getting VM {name} IP: {e}, continuing to wait...")
540 | 
541 |             # Wait before next retry
542 |             await asyncio.sleep(retry_delay)
543 | 
544 |             # Add progress log every 10 attempts
545 |             if total_attempts % 10 == 0:
546 |                 self.logger.info(
547 |                     f"Still waiting for VM {name} IP after {total_attempts} attempts..."
548 |                 )
549 | 
```
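
Because `get_ip` above polls indefinitely, a caller that needs an upper bound has to impose one from outside. The following is a minimal sketch of one way to do that with `asyncio.wait_for`. `FakeProvider` and `get_ip_with_timeout` are hypothetical names introduced only for illustration and are not part of the repository; the fake mimics the polling shape of `get_ip` so the snippet runs without a Lume server.

```python
import asyncio
from typing import Any, Dict


class FakeProvider:
    """Hypothetical stand-in for LumeProvider: reports an IP after N polls."""

    def __init__(self, ready_after: int) -> None:
        self.calls = 0
        self.ready_after = ready_after

    async def get_vm(self, name: str) -> Dict[str, Any]:
        self.calls += 1
        if self.calls >= self.ready_after:
            return {"name": name, "status": "running", "ip_address": "192.168.64.2"}
        return {"name": name, "status": "starting", "ip_address": None}

    async def get_ip(self, name: str, retry_delay: float = 0.01) -> str:
        # Same shape as the loop above: poll until a usable IP appears.
        while True:
            info = await self.get_vm(name)
            ip = info.get("ip_address")
            if ip and ip != "unknown" and not ip.startswith("0.0.0.0"):
                return ip
            await asyncio.sleep(retry_delay)


async def get_ip_with_timeout(provider: Any, name: str, timeout: float) -> str:
    """Bound the indefinite wait; raises asyncio.TimeoutError when it expires."""
    return await asyncio.wait_for(provider.get_ip(name), timeout=timeout)
```

`asyncio.wait_for` cancels the pending `get_ip` coroutine when the timeout expires, so the polling loop does not keep running in the background.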

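The camelCase/snake_case handling in `get_vm` above can also be expressed with a small lookup helper. This is only a sketch of the same merging logic under the assumption that the API response is a flat dict; `first_present` and `normalize_vm_info` are hypothetical names, not functions from the repository.

```python
from typing import Any, Dict


def first_present(info: Dict[str, Any], *keys: str, default: Any = None) -> Any:
    """Return the value of the first key found in info, else default."""
    for key in keys:
        if key in info:
            return info[key]
    return default


def normalize_vm_info(name: str, vm_info: Dict[str, Any]) -> Dict[str, Any]:
    """Build the result dict the way get_vm does for a non-stopped VM."""
    status = vm_info.get("status", "unknown")
    result = {
        "name": name,
        "status": status if status else "running",
        "ip_address": first_present(vm_info, "ipAddress", "ip_address"),
        "vnc_url": first_present(vm_info, "vncUrl", "vnc_url", default=""),
        "api_status": "ok",
    }
    # Keep every original field, but never overwrite the processed ones.
    for key, value in vm_info.items():
        result.setdefault(key, value)
    return result
```

As in the original, the raw `ipAddress` and `vncUrl` keys are preserved alongside the normalized `ip_address` and `vnc_url` keys.
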
--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/lume_api.py:
--------------------------------------------------------------------------------

```python
  1 | """Shared API utilities for Lume and Lumier providers.
  2 | 
  3 | This module contains shared functions for interacting with the Lume API,
  4 | used by both the LumeProvider and LumierProvider classes.
  5 | """
  6 | 
  7 | import json
  8 | import logging
  9 | import subprocess
 10 | import urllib.parse
 11 | from typing import Any, Dict, List, Optional
 12 | 
 13 | from computer.utils import safe_join
 14 | 
 15 | # Setup logging
 16 | logger = logging.getLogger(__name__)
 17 | 
 18 | # Check if curl is available
 19 | try:
 20 |     subprocess.run(["curl", "--version"], capture_output=True, check=True)
 21 |     HAS_CURL = True
 22 | except (subprocess.SubprocessError, FileNotFoundError):
 23 |     HAS_CURL = False
 24 | 
 25 | 
 26 | def lume_api_get(
 27 |     vm_name: str,
 28 |     host: str,
 29 |     port: int,
 30 |     storage: Optional[str] = None,
 31 |     debug: bool = False,
 32 |     verbose: bool = False,
 33 | ) -> Dict[str, Any]:
 34 |     """Use curl to get VM information from Lume API.
 35 | 
 36 |     Args:
 37 |         vm_name: Name of the VM to get info for
 38 |         host: API host
 39 |         port: API port
 40 |         storage: Storage path for the VM
 41 |         debug: Whether to show debug output
 42 |         verbose: Enable verbose logging
 43 | 
 44 |     Returns:
 45 |         Dictionary with VM status information parsed from JSON response
 46 |     """
 47 |     # URL encode the storage parameter for the query
 48 |     encoded_storage = ""
 49 |     storage_param = ""
 50 | 
 51 |     if storage:
 52 |         # First encode the storage path properly
 53 |         encoded_storage = urllib.parse.quote(storage, safe="")
 54 |         storage_param = f"?storage={encoded_storage}"
 55 | 
 56 |     # Construct API URL with encoded storage parameter if needed
 57 |     api_url = f"http://{host}:{port}/lume/vms/{vm_name}{storage_param}"
 58 | 
 59 |     # Construct the curl command with increased timeouts for more reliability
 60 |     # --connect-timeout: Time to establish connection (15 seconds)
 61 |     # --max-time: Maximum time for the whole operation (20 seconds)
 62 |     # -f: Fail silently (no output at all) on server errors
 63 |     # The URL is passed as a single list element; safe_join builds the shell string at execution time
 64 |     cmd = ["curl", "--connect-timeout", "15", "--max-time", "20", "-s", "-f", api_url]
 65 | 
 66 |     # For logging and display, show the properly escaped URL
 67 |     display_cmd = ["curl", "--connect-timeout", "15", "--max-time", "20", "-s", "-f", api_url]
 68 | 
 69 |     # Only print the curl command when debug is enabled
 70 |     display_curl_string = " ".join(display_cmd)
 71 |     logger.debug(f"Executing API request: {display_curl_string}")
 72 | 
 73 |     # Execute the command - for execution we need to use shell=True to handle URLs with special characters
 74 |     try:
 75 |         # Use a single string with shell=True for proper URL handling
 76 |         shell_cmd = safe_join(cmd)
 77 |         result = subprocess.run(shell_cmd, shell=True, capture_output=True, text=True)
 78 | 
 79 |         # Handle curl exit codes
 80 |         if result.returncode != 0:
 81 |             curl_error = "Unknown error"
 82 | 
 83 |             # Map common curl error codes to helpful messages
 84 |             if result.returncode == 7:
 85 |                 curl_error = "Failed to connect to the API server - it might still be starting up"
 86 |             elif result.returncode == 22:
 87 |                 curl_error = "HTTP error returned from API server"
 88 |             elif result.returncode == 28:
 89 |                 curl_error = "Operation timeout - the API server is taking too long to respond"
 90 |             elif result.returncode == 52:
 91 |                 curl_error = (
 92 |                     "Empty reply from server - the API server is starting but not fully ready yet"
 93 |                 )
 94 |             elif result.returncode == 56:
 95 |                 curl_error = "Network problem during data transfer - check container networking"
 96 | 
 97 |             # Only log at debug level to reduce noise during retries
 98 |             logger.debug(f"API request failed with code {result.returncode}: {curl_error}")
 99 | 
100 |             # Return a more useful error message
101 |             return {
102 |                 "error": f"API request failed: {curl_error}",
103 |                 "curl_code": result.returncode,
104 |                 "vm_name": vm_name,
105 |                 "status": "unknown",  # We don't know the actual status due to API error
106 |             }
107 | 
108 |         # Try to parse the response as JSON
109 |         if result.stdout and result.stdout.strip():
110 |             try:
111 |                 vm_status = json.loads(result.stdout)
112 |                 if debug or verbose:
113 |                     logger.info(
114 |                         f"Successfully parsed VM status: {vm_status.get('status', 'unknown')}"
115 |                     )
116 |                 return vm_status
117 |             except json.JSONDecodeError as e:
118 |                 # Not valid JSON: report not_found, or an error carrying the first 100 characters
119 |                 logger.warning(f"Invalid JSON response: {e}")
120 |                 if "Virtual machine not found" in result.stdout:
121 |                     return {"status": "not_found", "message": "VM not found in Lume API"}
122 | 
123 |                 return {
124 |                     "error": f"Invalid JSON response: {result.stdout[:100]}...",
125 |                     "status": "unknown",
126 |                 }
127 |         else:
128 |             return {"error": "Empty response from API", "status": "unknown"}
129 |     except subprocess.SubprocessError as e:
130 |         logger.error(f"Failed to execute API request: {e}")
131 |         return {"error": f"Failed to execute API request: {str(e)}", "status": "unknown"}
132 | 
133 | 
134 | def lume_api_run(
135 |     vm_name: str,
136 |     host: str,
137 |     port: int,
138 |     run_opts: Dict[str, Any],
139 |     storage: Optional[str] = None,
140 |     debug: bool = False,
141 |     verbose: bool = False,
142 | ) -> Dict[str, Any]:
143 |     """Run a VM using curl.
144 | 
145 |     Args:
146 |         vm_name: Name of the VM to run
147 |         host: API host
148 |         port: API port
149 |         run_opts: Dictionary of run options
150 |         storage: Storage path for the VM
151 |         debug: Whether to show debug output
152 |         verbose: Enable verbose logging
153 | 
154 |     Returns:
155 |         Dictionary with API response or error information
156 |     """
157 |     # Construct API URL
158 |     api_url = f"http://{host}:{port}/lume/vms/{vm_name}/run"
159 | 
160 |     # Prepare JSON payload with required parameters
161 |     payload = {}
162 | 
163 |     # Add CPU cores if specified
164 |     if "cpu" in run_opts:
165 |         payload["cpu"] = run_opts["cpu"]
166 | 
167 |     # Add memory if specified
168 |     if "memory" in run_opts:
169 |         payload["memory"] = run_opts["memory"]
170 | 
171 |     # Add storage parameter if specified
172 |     if storage:
173 |         payload["storage"] = storage
174 |     elif "storage" in run_opts:
175 |         payload["storage"] = run_opts["storage"]
176 | 
177 |     # Add shared directories if specified
178 |     if "shared_directories" in run_opts and run_opts["shared_directories"]:
179 |         payload["sharedDirectories"] = run_opts["shared_directories"]
180 | 
181 |     # Log the payload for debugging
182 |     logger.debug(f"API payload: {json.dumps(payload, indent=2)}")
183 | 
184 |     # Construct the curl command
185 |     cmd = [
186 |         "curl",
187 |         "--connect-timeout",
188 |         "30",
189 |         "--max-time",
190 |         "30",
191 |         "-s",
192 |         "-X",
193 |         "POST",
194 |         "-H",
195 |         "Content-Type: application/json",
196 |         "-d",
197 |         json.dumps(payload),
198 |         api_url,
199 |     ]
200 | 
201 |     # Execute the command
202 |     try:
203 |         result = subprocess.run(cmd, capture_output=True, text=True)
204 | 
205 |         if result.returncode != 0:
206 |             logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
207 |             return {"error": f"API request failed: {result.stderr}"}
208 | 
209 |         # Try to parse the response as JSON
210 |         if result.stdout and result.stdout.strip():
211 |             try:
212 |                 response = json.loads(result.stdout)
213 |                 return response
214 |             except json.JSONDecodeError:
215 |                 # Return the raw response if it's not valid JSON
216 |                 return {
217 |                     "success": True,
218 |                     "message": "VM started successfully",
219 |                     "raw_response": result.stdout,
220 |                 }
221 |         else:
222 |             return {"success": True, "message": "VM started successfully"}
223 |     except subprocess.SubprocessError as e:
224 |         logger.error(f"Failed to execute run request: {e}")
225 |         return {"error": f"Failed to execute run request: {str(e)}"}
226 | 
227 | 
228 | def lume_api_stop(
229 |     vm_name: str,
230 |     host: str,
231 |     port: int,
232 |     storage: Optional[str] = None,
233 |     debug: bool = False,
234 |     verbose: bool = False,
235 | ) -> Dict[str, Any]:
236 |     """Stop a VM using curl.
237 | 
238 |     Args:
239 |         vm_name: Name of the VM to stop
240 |         host: API host
241 |         port: API port
242 |         storage: Storage path for the VM
243 |         debug: Whether to show debug output
244 |         verbose: Enable verbose logging
245 | 
246 |     Returns:
247 |         Dictionary with API response or error information
248 |     """
249 |     # Construct API URL
250 |     api_url = f"http://{host}:{port}/lume/vms/{vm_name}/stop"
251 | 
252 |     # Prepare JSON payload with required parameters
253 |     payload = {}
254 | 
255 |     # Add storage path if specified
256 |     if storage:
257 |         payload["storage"] = storage
258 | 
259 |     # Construct the curl command
260 |     cmd = [
261 |         "curl",
262 |         "--connect-timeout",
263 |         "15",
264 |         "--max-time",
265 |         "20",
266 |         "-s",
267 |         "-X",
268 |         "POST",
269 |         "-H",
270 |         "Content-Type: application/json",
271 |         "-d",
272 |         json.dumps(payload),
273 |         api_url,
274 |     ]
275 | 
276 |     # Execute the command
277 |     try:
278 |         if debug or verbose:
279 |             logger.info(f"Executing: {' '.join(cmd)}")
280 | 
281 |         result = subprocess.run(cmd, capture_output=True, text=True)
282 | 
283 |         if result.returncode != 0:
284 |             logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
285 |             return {"error": f"API request failed: {result.stderr}"}
286 | 
287 |         # Try to parse the response as JSON
288 |         if result.stdout and result.stdout.strip():
289 |             try:
290 |                 response = json.loads(result.stdout)
291 |                 return response
292 |             except json.JSONDecodeError:
293 |                 # Return the raw response if it's not valid JSON
294 |                 return {
295 |                     "success": True,
296 |                     "message": "VM stopped successfully",
297 |                     "raw_response": result.stdout,
298 |                 }
299 |         else:
300 |             return {"success": True, "message": "VM stopped successfully"}
301 |     except subprocess.SubprocessError as e:
302 |         logger.error(f"Failed to execute stop request: {e}")
303 |         return {"error": f"Failed to execute stop request: {str(e)}"}
304 | 
305 | 
306 | def lume_api_update(
307 |     vm_name: str,
308 |     host: str,
309 |     port: int,
310 |     update_opts: Dict[str, Any],
311 |     storage: Optional[str] = None,
312 |     debug: bool = False,
313 |     verbose: bool = False,
314 | ) -> Dict[str, Any]:
315 |     """Update VM settings using curl.
316 | 
317 |     Args:
318 |         vm_name: Name of the VM to update
319 |         host: API host
320 |         port: API port
321 |         update_opts: Dictionary of update options
322 |         storage: Storage path for the VM
323 |         debug: Whether to show debug output
324 |         verbose: Enable verbose logging
325 | 
326 |     Returns:
327 |         Dictionary with API response or error information
328 |     """
329 |     # Construct API URL
330 |     api_url = f"http://{host}:{port}/lume/vms/{vm_name}/update"
331 | 
332 |     # Prepare JSON payload with required parameters
333 |     payload = {}
334 | 
335 |     # Add CPU cores if specified
336 |     if "cpu" in update_opts:
337 |         payload["cpu"] = update_opts["cpu"]
338 | 
339 |     # Add memory if specified
340 |     if "memory" in update_opts:
341 |         payload["memory"] = update_opts["memory"]
342 | 
343 |     # Add storage path if specified
344 |     if storage:
345 |         payload["storage"] = storage
346 | 
347 |     # Construct the curl command
348 |     cmd = [
349 |         "curl",
350 |         "--connect-timeout",
351 |         "15",
352 |         "--max-time",
353 |         "20",
354 |         "-s",
355 |         "-X",
356 |         "POST",
357 |         "-H",
358 |         "Content-Type: application/json",
359 |         "-d",
360 |         json.dumps(payload),
361 |         api_url,
362 |     ]
363 | 
364 |     # Execute the command
365 |     try:
366 |         if debug:
367 |             logger.info(f"Executing: {' '.join(cmd)}")
368 | 
369 |         result = subprocess.run(cmd, capture_output=True, text=True)
370 | 
371 |         if result.returncode != 0:
372 |             logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
373 |             return {"error": f"API request failed: {result.stderr}"}
374 | 
375 |         # Try to parse the response as JSON
376 |         if result.stdout and result.stdout.strip():
377 |             try:
378 |                 response = json.loads(result.stdout)
379 |                 return response
380 |             except json.JSONDecodeError:
381 |                 # Return the raw response if it's not valid JSON
382 |                 return {
383 |                     "success": True,
384 |                     "message": "VM updated successfully",
385 |                     "raw_response": result.stdout,
386 |                 }
387 |         else:
388 |             return {"success": True, "message": "VM updated successfully"}
389 |     except subprocess.SubprocessError as e:
390 |         logger.error(f"Failed to execute update request: {e}")
391 |         return {"error": f"Failed to execute update request: {str(e)}"}
392 | 
393 | 
394 | def lume_api_pull(
395 |     image: str,
396 |     name: str,
397 |     host: str,
398 |     port: int,
399 |     storage: Optional[str] = None,
400 |     registry: str = "ghcr.io",
401 |     organization: str = "trycua",
402 |     debug: bool = False,
403 |     verbose: bool = False,
404 | ) -> Dict[str, Any]:
405 |     """Pull a VM image from a registry using curl.
406 | 
407 |     Args:
408 |         image: Name/tag of the image to pull
409 |         name: Name to give the VM after pulling
410 |         host: API host
411 |         port: API port
412 |         storage: Storage path for the VM
413 |         registry: Registry to pull from (default: ghcr.io)
414 |         organization: Organization in registry (default: trycua)
415 |         debug: Whether to show debug output
416 |         verbose: Enable verbose logging
417 | 
418 |     Returns:
419 |         Dictionary with pull status and information
420 |     """
421 |     # Prepare pull request payload
422 |     pull_payload = {
423 |         "image": image,  # Use provided image name
424 |         "name": name,  # Always use name as the target VM name
425 |         "registry": registry,
426 |         "organization": organization,
427 |     }
428 | 
429 |     if storage:
430 |         pull_payload["storage"] = storage
431 | 
432 |     # Construct pull command with proper JSON payload
433 |     pull_cmd = ["curl"]
434 | 
435 |     if not verbose:
436 |         pull_cmd.append("-s")
437 | 
438 |     pull_cmd.extend(
439 |         [
440 |             "-X",
441 |             "POST",
442 |             "-H",
443 |             "Content-Type: application/json",
444 |             "-d",
445 |             json.dumps(pull_payload),
446 |             f"http://{host}:{port}/lume/pull",
447 |         ]
448 |     )
449 | 
450 |     logger.debug(f"Executing API request: {' '.join(pull_cmd)}")
451 | 
452 |     try:
453 |         # Execute pull command
454 |         result = subprocess.run(pull_cmd, capture_output=True, text=True)
455 | 
456 |         if result.returncode != 0:
457 |             error_msg = f"Failed to pull VM {name}: {result.stderr}"
458 |             logger.error(error_msg)
459 |             return {"error": error_msg}
460 | 
461 |         try:
462 |             response = json.loads(result.stdout)
463 |             logger.info(f"Successfully initiated pull for VM {name}")
464 |             return response
465 |         except json.JSONDecodeError:
466 |             if result.stdout:
467 |                 logger.info(f"Pull response: {result.stdout}")
468 |             return {"success": True, "message": f"Successfully initiated pull for VM {name}"}
469 | 
470 |     except subprocess.SubprocessError as e:
471 |         error_msg = f"Failed to execute pull command: {str(e)}"
472 |         logger.error(error_msg)
473 |         return {"error": error_msg}
474 | 
475 | 
476 | def lume_api_delete(
477 |     vm_name: str,
478 |     host: str,
479 |     port: int,
480 |     storage: Optional[str] = None,
481 |     debug: bool = False,
482 |     verbose: bool = False,
483 | ) -> Dict[str, Any]:
484 |     """Delete a VM using curl.
485 | 
486 |     Args:
487 |         vm_name: Name of the VM to delete
488 |         host: API host
489 |         port: API port
490 |         storage: Storage path for the VM
491 |         debug: Whether to show debug output
492 |         verbose: Enable verbose logging
493 | 
494 |     Returns:
495 |         Dictionary with API response or error information
496 |     """
497 |     # URL encode the storage parameter for the query
498 |     encoded_storage = ""
499 |     storage_param = ""
500 | 
501 |     if storage:
502 |         # First encode the storage path properly
503 |         encoded_storage = urllib.parse.quote(storage, safe="")
504 |         storage_param = f"?storage={encoded_storage}"
505 | 
506 |     # Construct API URL with encoded storage parameter if needed
507 |     api_url = f"http://{host}:{port}/lume/vms/{vm_name}{storage_param}"
508 | 
509 |     # Construct the curl command for DELETE operation - using much longer timeouts matching shell implementation
510 |     cmd = [
511 |         "curl",
512 |         "--connect-timeout",
513 |         "6000",
514 |         "--max-time",
515 |         "5000",
516 |         "-s",
517 |         "-X",
518 |         "DELETE",
519 |         api_url,
520 |     ]
521 | 
522 |     # Command used for logging; currently identical to cmd, since the storage parameter is already percent-encoded
523 |     display_cmd = [
524 |         "curl",
525 |         "--connect-timeout",
526 |         "6000",
527 |         "--max-time",
528 |         "5000",
529 |         "-s",
530 |         "-X",
531 |         "DELETE",
532 |         api_url,
533 |     ]
534 | 
535 |     # Log the curl command at debug level
536 |     display_curl_string = " ".join(display_cmd)
537 |     logger.debug(f"Executing API request: {display_curl_string}")
538 | 
539 |     # Execute the command - for execution we need to use shell=True to handle URLs with special characters
540 |     try:
541 |         # Use a single string with shell=True for proper URL handling
542 |         shell_cmd = safe_join(cmd)
543 |         result = subprocess.run(shell_cmd, shell=True, capture_output=True, text=True)
544 | 
545 |         # Handle curl exit codes
546 |         if result.returncode != 0:
547 |             curl_error = "Unknown error"
548 | 
549 |             # Map common curl error codes to helpful messages
550 |             if result.returncode == 7:
551 |                 curl_error = "Failed to connect to the API server - it might still be starting up"
552 |             elif result.returncode == 22:
553 |                 curl_error = "HTTP error returned from API server"
554 |             elif result.returncode == 28:
555 |                 curl_error = "Operation timeout - the API server is taking too long to respond"
556 |             elif result.returncode == 52:
557 |                 curl_error = (
558 |                     "Empty reply from server - the API server is starting but not fully ready yet"
559 |                 )
560 |             elif result.returncode == 56:
561 |                 curl_error = "Network problem during data transfer - check container networking"
562 | 
563 |             # Only log at debug level to reduce noise during retries
564 |             logger.debug(f"API request failed with code {result.returncode}: {curl_error}")
565 | 
566 |             # Return a more useful error message
567 |             return {
568 |                 "error": f"API request failed: {curl_error}",
569 |                 "curl_code": result.returncode,
570 |                 "vm_name": vm_name,
571 |                 "storage": storage,
572 |             }
573 | 
574 |         # Try to parse the response as JSON
575 |         if result.stdout and result.stdout.strip():
576 |             try:
577 |                 response = json.loads(result.stdout)
578 |                 return response
579 |             except json.JSONDecodeError:
580 |                 # Return the raw response if it's not valid JSON
581 |                 return {
582 |                     "success": True,
583 |                     "message": "VM deleted successfully",
584 |                     "raw_response": result.stdout,
585 |                 }
586 |         else:
587 |             return {"success": True, "message": "VM deleted successfully"}
588 |     except subprocess.SubprocessError as e:
589 |         logger.error(f"Failed to execute delete request: {e}")
590 |         return {"error": f"Failed to execute delete request: {str(e)}"}
591 | 
592 | 
593 | def parse_memory(memory_str: str) -> int:
594 |     """Parse a memory string (or an int already in MB) into an integer number of MB.
595 | 
596 |     Examples:
597 |         "8GB" -> 8192
598 |         "1024MB" -> 1024
599 |         "512" -> 512
600 | 
601 |     Returns:
602 |         Memory value in MB
603 |     """
604 |     if isinstance(memory_str, int):
605 |         return memory_str
606 | 
607 |     if isinstance(memory_str, str):
608 |         # Extract number and unit
609 |         import re
610 | 
611 |         match = re.match(r"(\d+)([A-Za-z]*)", memory_str)
612 |         if match:
613 |             value, unit = match.groups()
614 |             value = int(value)
615 |             unit = unit.upper()
616 | 
617 |             if unit == "GB" or unit == "G":
618 |                 return value * 1024
619 |             elif unit == "MB" or unit == "M" or unit == "":
620 |                 return value
621 | 
622 |     # Default fallback
623 |     logger.warning(f"Could not parse memory string '{memory_str}', using 8GB default")
624 |     return 8192  # Default to 8GB
625 | 
```
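
As written, `parse_memory` above uses `re.match`, which anchors only at the start of the string. An input such as `"8.5GB"` or `"8 GB"` therefore appears to match `8` with an empty unit and be read as 8 MB, and an unlisted unit such as `"TB"` falls through to the 8 GB default. The sketch below is one possible stricter variant, not the module's own code: `parse_memory_strict`, `_UNIT_TO_MB`, `_MEMORY_RE`, and the `default_mb` parameter are hypothetical names, and the decimal and `TB` handling are additions made for illustration.

```python
import re
from typing import Union

# Hypothetical lookup table (not in the original module): unit -> multiplier to MB.
_UNIT_TO_MB = {
    "": 1,
    "M": 1,
    "MB": 1,
    "G": 1024,
    "GB": 1024,
    "T": 1024 * 1024,
    "TB": 1024 * 1024,
}

# Optional whitespace, an integer or decimal number, optional whitespace, an optional unit.
_MEMORY_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*")


def parse_memory_strict(memory: Union[int, str], default_mb: int = 8192) -> int:
    """Parse a memory value into MB, returning default_mb for unrecognised input."""
    if isinstance(memory, int):
        # Integers are assumed to already be expressed in MB.
        return memory
    if isinstance(memory, str):
        # fullmatch rejects trailing junk instead of silently ignoring it.
        match = _MEMORY_RE.fullmatch(memory)
        if match:
            number, unit = match.groups()
            factor = _UNIT_TO_MB.get(unit.upper())
            if factor is not None:
                return int(float(number) * factor)
    return default_mb
```

For instance, `parse_memory_strict("8.5GB")` evaluates to 8704 under this sketch, while unparseable input such as `"lots"` returns the default of 8192.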

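The delete helper above percent-encodes the storage path, assembles a curl argument list, and maps a handful of curl exit codes to messages. A minimal sketch of those three steps as small pure functions follows. The names `CURL_ERRORS`, `build_delete_url`, `build_delete_command`, and `describe_curl_error` are hypothetical, the timeout defaults are illustrative values only, and the messages are shortened. Because the arguments stay in a list, they could be passed to `subprocess.run` without `shell=True`, in which case no shell parses the URL; that is a design alternative, not what the original does.

```python
import urllib.parse
from typing import Dict, List, Optional

# Shortened messages for the curl exit codes handled explicitly by the delete helper.
CURL_ERRORS: Dict[int, str] = {
    7: "Failed to connect to the API server",
    22: "HTTP error returned from API server",
    28: "Operation timeout",
    52: "Empty reply from server",
    56: "Network problem during data transfer",
}


def build_delete_url(vm_name: str, host: str, port: int, storage: Optional[str] = None) -> str:
    """Build the DELETE URL, percent-encoding the optional storage path."""
    # safe="" makes quote() encode "/" as well, so the path is a single query value.
    query = f"?storage={urllib.parse.quote(storage, safe='')}" if storage else ""
    return f"http://{host}:{port}/lume/vms/{vm_name}{query}"


def build_delete_command(url: str, connect_timeout: int = 15, max_time: int = 20) -> List[str]:
    """Return the curl argument list for a silent DELETE request (timeouts are illustrative)."""
    return [
        "curl",
        "--connect-timeout",
        str(connect_timeout),
        "--max-time",
        str(max_time),
        "-s",
        "-X",
        "DELETE",
        url,
    ]


def describe_curl_error(returncode: int) -> str:
    """Map a curl exit code to a message, defaulting to 'Unknown error'."""
    return CURL_ERRORS.get(returncode, "Unknown error")
```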
--------------------------------------------------------------------------------
/libs/python/computer-server/computer_server/handlers/linux.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Linux implementation of automation and accessibility handlers.
  3 | 
  4 | This implementation attempts to use pyautogui for GUI automation when available.
  5 | If running in a headless environment without X11, it will fall back to simulated responses.
  6 | To use GUI automation in a headless environment:
  7 | 1. Install Xvfb: sudo apt-get install xvfb
  8 | 2. Run with virtual display: xvfb-run python -m computer_server
  9 | """
 10 | 
 11 | import asyncio
 12 | import base64
 13 | import json
 14 | import logging
 15 | import os
 16 | import subprocess
 17 | from io import BytesIO
 18 | from typing import Any, Dict, List, Optional, Tuple
 19 | 
 20 | # Configure logger
 21 | logger = logging.getLogger(__name__)
 22 | 
 23 | # Try to import pyautogui, but don't fail if it's not available
 24 | # This allows the server to run in headless environments
 25 | try:
 26 |     import pyautogui
 27 | 
 28 |     pyautogui.FAILSAFE = False
 29 | 
 30 |     logger.info("pyautogui successfully imported, GUI automation available")
 31 | except Exception as e:
 32 |     logger.warning(f"pyautogui import failed: {str(e)}. GUI operations will be simulated.")
 33 | 
 34 | from pynput.keyboard import Controller as KeyboardController
 35 | from pynput.keyboard import Key
 36 | from pynput.mouse import Button
 37 | from pynput.mouse import Controller as MouseController
 38 | 
 39 | from .base import BaseAccessibilityHandler, BaseAutomationHandler
 40 | 
 41 | 
 42 | class LinuxAccessibilityHandler(BaseAccessibilityHandler):
 43 |     """Linux implementation of accessibility handler."""
 44 | 
 45 |     async def get_accessibility_tree(self) -> Dict[str, Any]:
 46 |         """Get the accessibility tree of the current window.
 47 | 
 48 |         Returns:
 49 |             Dict[str, Any]: A dictionary containing success status and a simulated tree structure,
 50 |                            since this handler has no Linux counterpart to the macOS accessibility API.
 51 |         """
 52 |         # This handler has no Linux counterpart to the macOS accessibility API,
 53 |         # so return a minimal dummy tree
 54 |         logger.info(
 55 |             "Getting accessibility tree (simulated, no accessibility API available on Linux)"
 56 |         )
 57 |         return {
 58 |             "success": True,
 59 |             "tree": {
 60 |                 "role": "Window",
 61 |                 "title": "Linux Window",
 62 |                 "position": {"x": 0, "y": 0},
 63 |                 "size": {"width": 1920, "height": 1080},
 64 |                 "children": [],
 65 |             },
 66 |         }
 67 | 
 68 |     async def find_element(
 69 |         self, role: Optional[str] = None, title: Optional[str] = None, value: Optional[str] = None
 70 |     ) -> Dict[str, Any]:
 71 |         """Find an element in the accessibility tree by criteria.
 72 | 
 73 |         Args:
 74 |             role: The role of the element to find.
 75 |             title: The title of the element to find.
 76 |             value: The value of the element to find.
 77 | 
 78 |         Returns:
 79 |             Dict[str, Any]: A dictionary indicating that element search is not supported on Linux.
 80 |         """
 81 |         logger.info(
 82 |             f"Finding element with role={role}, title={title}, value={value} (not supported on Linux)"
 83 |         )
 84 |         return {"success": False, "message": "Element search not supported on Linux"}
 85 | 
 86 |     def get_cursor_position(self) -> Tuple[int, int]:
 87 |         """Get the current cursor position.
 88 | 
 89 |         Returns:
 90 |             Tuple[int, int]: The x and y coordinates of the cursor position.
 91 |                            Returns (0, 0) if pyautogui is not available.
 92 |         """
 93 |         try:
 94 |             pos = pyautogui.position()
 95 |             return pos.x, pos.y
 96 |         except Exception as e:
 97 |             logger.warning(f"Failed to get cursor position with pyautogui: {e}")
 98 | 
 99 |         logger.info("Getting cursor position (simulated)")
100 |         return 0, 0
101 | 
102 |     def get_screen_size(self) -> Tuple[int, int]:
103 |         """Get the screen size.
104 | 
105 |         Returns:
106 |             Tuple[int, int]: The width and height of the screen in pixels.
107 |                            Returns (1920, 1080) if pyautogui is not available.
108 |         """
109 |         try:
110 |             size = pyautogui.size()
111 |             return size.width, size.height
112 |         except Exception as e:
113 |             logger.warning(f"Failed to get screen size with pyautogui: {e}")
114 | 
115 |         logger.info("Getting screen size (simulated)")
116 |         return 1920, 1080
117 | 
118 | 
119 | class LinuxAutomationHandler(BaseAutomationHandler):
120 |     """Linux implementation of automation handler using pyautogui and pynput."""
121 | 
122 |     keyboard = KeyboardController()
123 |     mouse = MouseController()
124 | 
125 |     # Mouse Actions
126 |     async def mouse_down(
127 |         self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
128 |     ) -> Dict[str, Any]:
129 |         """Press and hold a mouse button at the specified coordinates.
130 | 
131 |         Args:
132 |             x: The x coordinate to move to before pressing. If None, uses current position.
133 |             y: The y coordinate to move to before pressing. If None, uses current position.
134 |             button: The mouse button to press ("left", "right", or "middle").
135 | 
136 |         Returns:
137 |             Dict[str, Any]: A dictionary with success status and error message if failed.
138 |         """
139 |         try:
140 |             if x is not None and y is not None:
141 |                 pyautogui.moveTo(x, y)
142 |             pyautogui.mouseDown(button=button)
143 |             return {"success": True}
144 |         except Exception as e:
145 |             return {"success": False, "error": str(e)}
146 | 
147 |     async def mouse_up(
148 |         self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
149 |     ) -> Dict[str, Any]:
150 |         """Release a mouse button at the specified coordinates.
151 | 
152 |         Args:
153 |             x: The x coordinate to move to before releasing. If None, uses current position.
154 |             y: The y coordinate to move to before releasing. If None, uses current position.
155 |             button: The mouse button to release ("left", "right", or "middle").
156 | 
157 |         Returns:
158 |             Dict[str, Any]: A dictionary with success status and error message if failed.
159 |         """
160 |         try:
161 |             if x is not None and y is not None:
162 |                 pyautogui.moveTo(x, y)
163 |             pyautogui.mouseUp(button=button)
164 |             return {"success": True}
165 |         except Exception as e:
166 |             return {"success": False, "error": str(e)}
167 | 
168 |     async def move_cursor(self, x: int, y: int) -> Dict[str, Any]:
169 |         """Move the cursor to the specified coordinates.
170 | 
171 |         Args:
172 |             x: The x coordinate to move to.
173 |             y: The y coordinate to move to.
174 | 
175 |         Returns:
176 |             Dict[str, Any]: A dictionary with success status and error message if failed.
177 |         """
178 |         try:
179 |             pyautogui.moveTo(x, y)
180 |             return {"success": True}
181 |         except Exception as e:
182 |             return {"success": False, "error": str(e)}
183 | 
184 |     async def left_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
185 |         """Perform a left mouse click at the specified coordinates.
186 | 
187 |         Args:
188 |             x: The x coordinate to click at. If None, clicks at current position.
189 |             y: The y coordinate to click at. If None, clicks at current position.
190 | 
191 |         Returns:
192 |             Dict[str, Any]: A dictionary with success status and error message if failed.
193 |         """
194 |         try:
195 |             if x is not None and y is not None:
196 |                 pyautogui.moveTo(x, y)
197 |             pyautogui.click()
198 |             return {"success": True}
199 |         except Exception as e:
200 |             return {"success": False, "error": str(e)}
201 | 
202 |     async def right_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
203 |         """Perform a right mouse click at the specified coordinates.
204 | 
205 |         Args:
206 |             x: The x coordinate to click at. If None, clicks at current position.
207 |             y: The y coordinate to click at. If None, clicks at current position.
208 | 
209 |         Returns:
210 |             Dict[str, Any]: A dictionary with success status and error message if failed.
211 |         """
212 |         try:
213 |             if x is not None and y is not None:
214 |                 pyautogui.moveTo(x, y)
215 |             pyautogui.rightClick()
216 |             return {"success": True}
217 |         except Exception as e:
218 |             return {"success": False, "error": str(e)}
219 | 
220 |     async def double_click(
221 |         self, x: Optional[int] = None, y: Optional[int] = None
222 |     ) -> Dict[str, Any]:
223 |         """Perform a double click at the specified coordinates.
224 | 
225 |         Args:
226 |             x: The x coordinate to double click at. If None, clicks at current position.
227 |             y: The y coordinate to double click at. If None, clicks at current position.
228 | 
229 |         Returns:
230 |             Dict[str, Any]: A dictionary with success status and error message if failed.
231 |         """
232 |         try:
233 |             if x is not None and y is not None:
234 |                 pyautogui.moveTo(x, y)
235 |             pyautogui.doubleClick(interval=0.1)
236 |             return {"success": True}
237 |         except Exception as e:
238 |             return {"success": False, "error": str(e)}
239 | 
240 |     async def click(
241 |         self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
242 |     ) -> Dict[str, Any]:
243 |         """Perform a mouse click with the specified button at the given coordinates.
244 | 
245 |         Args:
246 |             x: The x coordinate to click at. If None, clicks at current position.
247 |             y: The y coordinate to click at. If None, clicks at current position.
248 |             button: The mouse button to click ("left", "right", or "middle").
249 | 
250 |         Returns:
251 |             Dict[str, Any]: A dictionary with success status and error message if failed.
252 |         """
253 |         try:
254 |             if x is not None and y is not None:
255 |                 pyautogui.moveTo(x, y)
256 |             pyautogui.click(button=button)
257 |             return {"success": True}
258 |         except Exception as e:
259 |             return {"success": False, "error": str(e)}
260 | 
261 |     async def drag_to(
262 |         self, x: int, y: int, button: str = "left", duration: float = 0.5
263 |     ) -> Dict[str, Any]:
264 |         """Drag from the current position to the specified coordinates.
265 | 
266 |         Args:
267 |             x: The x coordinate to drag to.
268 |             y: The y coordinate to drag to.
269 |             button: The mouse button to use for dragging.
270 |             duration: The time in seconds to take for the drag operation.
271 | 
272 |         Returns:
273 |             Dict[str, Any]: A dictionary with success status and error message if failed.
274 |         """
275 |         try:
276 |             pyautogui.dragTo(x, y, duration=duration, button=button)
277 |             return {"success": True}
278 |         except Exception as e:
279 |             return {"success": False, "error": str(e)}
280 | 
281 |     async def drag(
282 |         self, start_x: int, start_y: int, end_x: int, end_y: int, button: str = "left"
283 |     ) -> Dict[str, Any]:
284 |         """Drag from start coordinates to end coordinates.
285 | 
286 |         Args:
287 |             start_x: The starting x coordinate.
288 |             start_y: The starting y coordinate.
289 |             end_x: The ending x coordinate.
290 |             end_y: The ending y coordinate.
291 |             button: The mouse button to use for dragging.
292 | 
293 |         Returns:
294 |             Dict[str, Any]: A dictionary with success status and error message if failed.
295 |         """
296 |         try:
297 |             pyautogui.moveTo(start_x, start_y)
298 |             pyautogui.dragTo(end_x, end_y, duration=0.5, button=button)
299 |             return {"success": True}
300 |         except Exception as e:
301 |             return {"success": False, "error": str(e)}
302 | 
303 |     async def drag_path(
304 |         self, path: List[Tuple[int, int]], button: str = "left", duration: float = 0.5
305 |     ) -> Dict[str, Any]:
306 |         """Drag along a path defined by a list of coordinates.
307 | 
308 |         Args:
309 |             path: A list of (x, y) coordinate tuples defining the drag path.
310 |             button: The mouse button to use for dragging.
311 |             duration: The time in seconds to take for each segment of the drag.
312 | 
313 |         Returns:
314 |             Dict[str, Any]: A dictionary with success status and error message if failed.
315 |         """
316 |         try:
317 |             if not path:
318 |                 return {"success": False, "error": "Path is empty"}
319 |             pyautogui.moveTo(*path[0])
320 |             for x, y in path[1:]:
321 |                 pyautogui.dragTo(x, y, duration=duration, button=button)
322 |             return {"success": True}
323 |         except Exception as e:
324 |             return {"success": False, "error": str(e)}
325 | 
326 |     # Keyboard Actions
327 |     async def key_down(self, key: str) -> Dict[str, Any]:
328 |         """Press and hold a key.
329 | 
330 |         Args:
331 |             key: The key to press down.
332 | 
333 |         Returns:
334 |             Dict[str, Any]: A dictionary with success status and error message if failed.
335 |         """
336 |         try:
337 |             pyautogui.keyDown(key)
338 |             return {"success": True}
339 |         except Exception as e:
340 |             return {"success": False, "error": str(e)}
341 | 
342 |     async def key_up(self, key: str) -> Dict[str, Any]:
343 |         """Release a key.
344 | 
345 |         Args:
346 |             key: The key to release.
347 | 
348 |         Returns:
349 |             Dict[str, Any]: A dictionary with success status and error message if failed.
350 |         """
351 |         try:
352 |             pyautogui.keyUp(key)
353 |             return {"success": True}
354 |         except Exception as e:
355 |             return {"success": False, "error": str(e)}
356 | 
357 |     async def type_text(self, text: str) -> Dict[str, Any]:
358 |         """Type the specified text using the keyboard.
359 | 
360 |         Args:
361 |             text: The text to type.
362 | 
363 |         Returns:
364 |             Dict[str, Any]: A dictionary with success status and error message if failed.
365 |         """
366 |         try:
367 |             # Use pynput for Unicode support
368 |             self.keyboard.type(text)
369 |             return {"success": True}
370 |         except Exception as e:
371 |             return {"success": False, "error": str(e)}
372 | 
373 |     async def press_key(self, key: str) -> Dict[str, Any]:
374 |         """Press and release a key.
375 | 
376 |         Args:
377 |             key: The key to press.
378 | 
379 |         Returns:
380 |             Dict[str, Any]: A dictionary with success status and error message if failed.
381 |         """
382 |         try:
383 |             pyautogui.press(key)
384 |             return {"success": True}
385 |         except Exception as e:
386 |             return {"success": False, "error": str(e)}
387 | 
388 |     async def hotkey(self, keys: List[str]) -> Dict[str, Any]:
389 |         """Press a combination of keys simultaneously.
390 | 
391 |         Args:
392 |             keys: A list of keys to press together as a hotkey combination.
393 | 
394 |         Returns:
395 |             Dict[str, Any]: A dictionary with success status and error message if failed.
396 |         """
397 |         try:
398 |             pyautogui.hotkey(*keys)
399 |             return {"success": True}
400 |         except Exception as e:
401 |             return {"success": False, "error": str(e)}
402 | 
403 |     # Scrolling Actions
404 |     async def scroll(self, x: int, y: int) -> Dict[str, Any]:
405 |         """Scroll the mouse wheel.
406 | 
407 |         Args:
408 |             x: The horizontal scroll amount.
409 |             y: The vertical scroll amount.
410 | 
411 |         Returns:
412 |             Dict[str, Any]: A dictionary with success status and error message if failed.
413 |         """
414 |         try:
415 |             self.mouse.scroll(x, y)
416 |             return {"success": True}
417 |         except Exception as e:
418 |             return {"success": False, "error": str(e)}
419 | 
420 |     async def scroll_down(self, clicks: int = 1) -> Dict[str, Any]:
421 |         """Scroll down by the specified number of clicks.
422 | 
423 |         Args:
424 |             clicks: The number of scroll clicks to perform downward.
425 | 
426 |         Returns:
427 |             Dict[str, Any]: A dictionary with success status and error message if failed.
428 |         """
429 |         try:
430 |             pyautogui.scroll(-clicks)
431 |             return {"success": True}
432 |         except Exception as e:
433 |             return {"success": False, "error": str(e)}
434 | 
435 |     async def scroll_up(self, clicks: int = 1) -> Dict[str, Any]:
436 |         """Scroll up by the specified number of clicks.
437 | 
438 |         Args:
439 |             clicks: The number of scroll clicks to perform upward.
440 | 
441 |         Returns:
442 |             Dict[str, Any]: A dictionary with success status and error message if failed.
443 |         """
444 |         try:
445 |             pyautogui.scroll(clicks)
446 |             return {"success": True}
447 |         except Exception as e:
448 |             return {"success": False, "error": str(e)}
449 | 
450 |     # Screen Actions
451 |     async def screenshot(self) -> Dict[str, Any]:
452 |         """Take a screenshot of the current screen.
453 | 
454 |         Returns:
455 |             Dict[str, Any]: A dictionary containing success status and base64-encoded image data,
456 |                            or error message if failed.
457 |         """
458 |         try:
459 |             from PIL import Image
460 | 
461 |             screenshot = pyautogui.screenshot()
462 |             if not isinstance(screenshot, Image.Image):
463 |                 return {"success": False, "error": "Failed to capture screenshot"}
464 |             buffered = BytesIO()
465 |             screenshot.save(buffered, format="PNG", optimize=True)
466 |             buffered.seek(0)
467 |             image_data = base64.b64encode(buffered.getvalue()).decode()
468 |             return {"success": True, "image_data": image_data}
469 |         except Exception as e:
470 |             return {"success": False, "error": f"Screenshot error: {str(e)}"}
471 | 
472 |     async def get_screen_size(self) -> Dict[str, Any]:
473 |         """Get the size of the screen.
474 | 
475 |         Returns:
476 |             Dict[str, Any]: A dictionary containing success status and screen dimensions,
477 |                            or error message if failed.
478 |         """
479 |         try:
480 |             size = pyautogui.size()
481 |             return {"success": True, "size": {"width": size.width, "height": size.height}}
482 |         except Exception as e:
483 |             return {"success": False, "error": str(e)}
484 | 
485 |     async def get_cursor_position(self) -> Dict[str, Any]:
486 |         """Get the current position of the cursor.
487 | 
488 |         Returns:
489 |             Dict[str, Any]: A dictionary containing success status and cursor coordinates,
490 |                            or error message if failed.
491 |         """
492 |         try:
493 |             pos = pyautogui.position()
494 |             return {"success": True, "position": {"x": pos.x, "y": pos.y}}
495 |         except Exception as e:
496 |             return {"success": False, "error": str(e)}
497 | 
498 |     # Clipboard Actions
499 |     async def copy_to_clipboard(self) -> Dict[str, Any]:
500 |         """Get the current content of the clipboard.
501 | 
502 |         Returns:
503 |             Dict[str, Any]: A dictionary containing success status and clipboard content,
504 |                            or error message if failed.
505 |         """
506 |         try:
507 |             import pyperclip
508 | 
509 |             content = pyperclip.paste()
510 |             return {"success": True, "content": content}
511 |         except Exception as e:
512 |             return {"success": False, "error": str(e)}
513 | 
514 |     async def set_clipboard(self, text: str) -> Dict[str, Any]:
515 |         """Set the clipboard content to the specified text.
516 | 
517 |         Args:
518 |             text: The text to copy to the clipboard.
519 | 
520 |         Returns:
521 |             Dict[str, Any]: A dictionary with success status and error message if failed.
522 |         """
523 |         try:
524 |             import pyperclip
525 | 
526 |             pyperclip.copy(text)
527 |             return {"success": True}
528 |         except Exception as e:
529 |             return {"success": False, "error": str(e)}
530 | 
531 |     # Command Execution
532 |     async def run_command(self, command: str) -> Dict[str, Any]:
533 |         """Execute a shell command asynchronously.
534 | 
535 |         Args:
536 |             command: The shell command to execute.
537 | 
538 |         Returns:
539 |             Dict[str, Any]: A dictionary containing success status, stdout, stderr,
540 |                            and return code, or error message if failed.
541 |         """
542 |         try:
543 |             # Create subprocess
544 |             process = await asyncio.create_subprocess_shell(
545 |                 command, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
546 |             )
547 |             # Wait for the subprocess to finish
548 |             stdout, stderr = await process.communicate()
549 |             # Return decoded output
550 |             return {
551 |                 "success": True,
552 |                 "stdout": stdout.decode() if stdout else "",
553 |                 "stderr": stderr.decode() if stderr else "",
554 |                 "return_code": process.returncode,
555 |             }
556 |         except Exception as e:
557 |             return {"success": False, "error": str(e)}
558 | 
```
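
The `run_command` handler above sets `"success": True` whenever the subprocess could be started, even if the command itself exits with a non-zero code, so callers have to inspect `return_code` separately. Below is a minimal standalone sketch of that behaviour, reusing the same `asyncio.create_subprocess_shell` pattern outside the handler class. The `command_succeeded` helper is hypothetical and not part of the repository.

```python
import asyncio
from typing import Any, Dict


async def run_command(command: str) -> Dict[str, Any]:
    """Standalone copy of the handler's pattern, for illustration only."""
    try:
        process = await asyncio.create_subprocess_shell(
            command, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )
        stdout, stderr = await process.communicate()
        return {
            "success": True,  # the process ran; says nothing about its exit code
            "stdout": stdout.decode() if stdout else "",
            "stderr": stderr.decode() if stderr else "",
            "return_code": process.returncode,
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


def command_succeeded(result: Dict[str, Any]) -> bool:
    """Hypothetical helper: True only if the command ran AND exited with 0."""
    return bool(result.get("success")) and result.get("return_code") == 0


ok = asyncio.run(run_command("exit 0"))
failed = asyncio.run(run_command("exit 3"))
print(command_succeeded(ok), command_succeeded(failed))
```

Checking both fields keeps "the shell could not be spawned" distinct from "the command ran and reported failure".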

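On the receiving side, the `image_data` field returned by the screenshot handler is a base64-encoded PNG. The sketch below decodes it back to raw bytes, assuming the response dictionary has exactly the shape shown in the handler. `decode_screenshot` and the fake payload are hypothetical and exist only for this example.

```python
import base64
from typing import Any, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(response: Dict[str, Any]) -> bytes:
    """Hypothetical client-side helper: return PNG bytes or raise on failure."""
    if not response.get("success"):
        raise RuntimeError(response.get("error", "unknown screenshot error"))
    png_bytes = base64.b64decode(response["image_data"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    return png_bytes


# Fake payload standing in for a real handler response.
fake_png = PNG_SIGNATURE + b"placeholder-image-body"
response = {"success": True, "image_data": base64.b64encode(fake_png).decode()}
decoded = decode_screenshot(response)
```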
--------------------------------------------------------------------------------
/libs/python/computer/computer/interface/base.py:
--------------------------------------------------------------------------------

```python
  1 | """Base interface for computer control."""
  2 | 
  3 | from abc import ABC, abstractmethod
  4 | from typing import Any, Dict, List, Optional, Tuple
  5 | 
  6 | from ..logger import Logger, LogLevel
  7 | from .models import CommandResult, MouseButton
  8 | 
  9 | 
 10 | class BaseComputerInterface(ABC):
 11 |     """Base class for computer control interfaces."""
 12 | 
 13 |     def __init__(
 14 |         self,
 15 |         ip_address: str,
 16 |         username: str = "lume",
 17 |         password: str = "lume",
 18 |         api_key: Optional[str] = None,
 19 |         vm_name: Optional[str] = None,
 20 |     ):
 21 |         """Initialize interface.
 22 | 
 23 |         Args:
 24 |             ip_address: IP address of the computer to control
 25 |             username: Username for authentication
 26 |             password: Password for authentication
 27 |             api_key: Optional API key for cloud authentication
 28 |             vm_name: Optional VM name for cloud authentication
 29 |         """
 30 |         self.ip_address = ip_address
 31 |         self.username = username
 32 |         self.password = password
 33 |         self.api_key = api_key
 34 |         self.vm_name = vm_name
 35 |         self.logger = Logger("cua.interface", LogLevel.NORMAL)
 36 | 
 37 |         # Optional default delay time between commands (in seconds)
 38 |         self.delay: float = 0.0
 39 | 
 40 |     @abstractmethod
 41 |     async def wait_for_ready(self, timeout: int = 60) -> None:
 42 |         """Wait for interface to be ready.
 43 | 
 44 |         Args:
 45 |             timeout: Maximum time to wait in seconds
 46 | 
 47 |         Raises:
 48 |             TimeoutError: If interface is not ready within timeout
 49 |         """
 50 |         pass
 51 | 
 52 |     @abstractmethod
 53 |     def close(self) -> None:
 54 |         """Close the interface connection."""
 55 |         pass
 56 | 
 57 |     def force_close(self) -> None:
 58 |         """Force close the interface connection.
 59 | 
 60 |         By default, this just calls close(), but subclasses can override
 61 |         to provide more forceful cleanup.
 62 |         """
 63 |         self.close()
 64 | 
 65 |     # Mouse Actions
 66 |     @abstractmethod
 67 |     async def mouse_down(
 68 |         self,
 69 |         x: Optional[int] = None,
 70 |         y: Optional[int] = None,
 71 |         button: "MouseButton" = "left",
 72 |         delay: Optional[float] = None,
 73 |     ) -> None:
 74 |         """Press and hold a mouse button.
 75 | 
 76 |         Args:
 77 |             x: X coordinate to press at. If None, uses current cursor position.
 78 |             y: Y coordinate to press at. If None, uses current cursor position.
 79 |             button: Mouse button to press ('left', 'middle', 'right').
 80 |             delay: Optional delay in seconds after the action
 81 |         """
 82 |         pass
 83 | 
 84 |     @abstractmethod
 85 |     async def mouse_up(
 86 |         self,
 87 |         x: Optional[int] = None,
 88 |         y: Optional[int] = None,
 89 |         button: "MouseButton" = "left",
 90 |         delay: Optional[float] = None,
 91 |     ) -> None:
 92 |         """Release a mouse button.
 93 | 
 94 |         Args:
 95 |             x: X coordinate to release at. If None, uses current cursor position.
 96 |             y: Y coordinate to release at. If None, uses current cursor position.
 97 |             button: Mouse button to release ('left', 'middle', 'right').
 98 |             delay: Optional delay in seconds after the action
 99 |         """
100 |         pass
101 | 
102 |     @abstractmethod
103 |     async def left_click(
104 |         self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
105 |     ) -> None:
106 |         """Perform a left mouse button click.
107 | 
108 |         Args:
109 |             x: X coordinate to click at. If None, uses current cursor position.
110 |             y: Y coordinate to click at. If None, uses current cursor position.
111 |             delay: Optional delay in seconds after the action
112 |         """
113 |         pass
114 | 
115 |     @abstractmethod
116 |     async def right_click(
117 |         self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
118 |     ) -> None:
119 |         """Perform a right mouse button click.
120 | 
121 |         Args:
122 |             x: X coordinate to click at. If None, uses current cursor position.
123 |             y: Y coordinate to click at. If None, uses current cursor position.
124 |             delay: Optional delay in seconds after the action
125 |         """
126 |         pass
127 | 
128 |     @abstractmethod
129 |     async def double_click(
130 |         self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
131 |     ) -> None:
132 |         """Perform a double left mouse button click.
133 | 
134 |         Args:
135 |             x: X coordinate to double-click at. If None, uses current cursor position.
136 |             y: Y coordinate to double-click at. If None, uses current cursor position.
137 |             delay: Optional delay in seconds after the action
138 |         """
139 |         pass
140 | 
141 |     @abstractmethod
142 |     async def move_cursor(self, x: int, y: int, delay: Optional[float] = None) -> None:
143 |         """Move the cursor to the specified screen coordinates.
144 | 
145 |         Args:
146 |             x: X coordinate to move cursor to.
147 |             y: Y coordinate to move cursor to.
148 |             delay: Optional delay in seconds after the action
149 |         """
150 |         pass
151 | 
152 |     @abstractmethod
153 |     async def drag_to(
154 |         self,
155 |         x: int,
156 |         y: int,
157 |         button: str = "left",
158 |         duration: float = 0.5,
159 |         delay: Optional[float] = None,
160 |     ) -> None:
161 |         """Drag from current position to specified coordinates.
162 | 
163 |         Args:
164 |             x: The x coordinate to drag to
165 |             y: The y coordinate to drag to
166 |             button: The mouse button to use ('left', 'middle', 'right')
167 |             duration: How long the drag should take in seconds
168 |             delay: Optional delay in seconds after the action
169 |         """
170 |         pass
171 | 
172 |     @abstractmethod
173 |     async def drag(
174 |         self,
175 |         path: List[Tuple[int, int]],
176 |         button: str = "left",
177 |         duration: float = 0.5,
178 |         delay: Optional[float] = None,
179 |     ) -> None:
180 |         """Drag the cursor along a path of coordinates.
181 | 
182 |         Args:
183 |             path: List of (x, y) coordinate tuples defining the drag path
184 |             button: The mouse button to use ('left', 'middle', 'right')
185 |             duration: Total time in seconds that the drag operation should take
186 |             delay: Optional delay in seconds after the action
187 |         """
188 |         pass
189 | 
190 |     # Keyboard Actions
191 |     @abstractmethod
192 |     async def key_down(self, key: str, delay: Optional[float] = None) -> None:
193 |         """Press and hold a key.
194 | 
195 |         Args:
196 |             key: The key to press and hold (e.g., 'a', 'shift', 'ctrl').
197 |             delay: Optional delay in seconds after the action.
198 |         """
199 |         pass
200 | 
201 |     @abstractmethod
202 |     async def key_up(self, key: str, delay: Optional[float] = None) -> None:
203 |         """Release a previously pressed key.
204 | 
205 |         Args:
206 |             key: The key to release (e.g., 'a', 'shift', 'ctrl').
207 |             delay: Optional delay in seconds after the action.
208 |         """
209 |         pass
210 | 
211 |     @abstractmethod
212 |     async def type_text(self, text: str, delay: Optional[float] = None) -> None:
213 |         """Type the specified text string.
214 | 
215 |         Args:
216 |             text: The text string to type.
217 |             delay: Optional delay in seconds after the action.
218 |         """
219 |         pass
220 | 
221 |     @abstractmethod
222 |     async def press_key(self, key: str, delay: Optional[float] = None) -> None:
223 |         """Press and release a single key.
224 | 
225 |         Args:
226 |             key: The key to press (e.g., 'a', 'enter', 'escape').
227 |             delay: Optional delay in seconds after the action.
228 |         """
229 |         pass
230 | 
231 |     @abstractmethod
232 |     async def hotkey(self, *keys: str, delay: Optional[float] = None) -> None:
233 |         """Press multiple keys simultaneously (keyboard shortcut).
234 | 
235 |         Args:
236 |             *keys: Variable number of keys to press together (e.g., 'ctrl', 'c').
237 |             delay: Optional delay in seconds after the action.
238 |         """
239 |         pass
240 | 
241 |     # Scrolling Actions
242 |     @abstractmethod
243 |     async def scroll(self, x: int, y: int, delay: Optional[float] = None) -> None:
244 |         """Scroll the mouse wheel by specified amounts.
245 | 
246 |         Args:
247 |             x: Horizontal scroll amount (positive = right, negative = left).
248 |             y: Vertical scroll amount (positive = up, negative = down).
249 |             delay: Optional delay in seconds after the action.
250 |         """
251 |         pass
252 | 
253 |     @abstractmethod
254 |     async def scroll_down(self, clicks: int = 1, delay: Optional[float] = None) -> None:
255 |         """Scroll down by the specified number of clicks.
256 | 
257 |         Args:
258 |             clicks: Number of scroll clicks to perform downward.
259 |             delay: Optional delay in seconds after the action.
260 |         """
261 |         pass
262 | 
263 |     @abstractmethod
264 |     async def scroll_up(self, clicks: int = 1, delay: Optional[float] = None) -> None:
265 |         """Scroll up by the specified number of clicks.
266 | 
267 |         Args:
268 |             clicks: Number of scroll clicks to perform upward.
269 |             delay: Optional delay in seconds after the action.
270 |         """
271 |         pass
272 | 
273 |     # Screen Actions
274 |     @abstractmethod
275 |     async def screenshot(self) -> bytes:
276 |         """Take a screenshot.
277 | 
278 |         Returns:
279 |             Raw bytes of the screenshot image
280 |         """
281 |         pass
282 | 
283 |     @abstractmethod
284 |     async def get_screen_size(self) -> Dict[str, int]:
285 |         """Get the screen dimensions.
286 | 
287 |         Returns:
288 |             Dict with 'width' and 'height' keys
289 |         """
290 |         pass
291 | 
292 |     @abstractmethod
293 |     async def get_cursor_position(self) -> Dict[str, int]:
294 |         """Get the current cursor position on screen.
295 | 
296 |         Returns:
297 |             Dict with 'x' and 'y' keys containing cursor coordinates.
298 |         """
299 |         pass
300 | 
301 |     # Clipboard Actions
302 |     @abstractmethod
303 |     async def copy_to_clipboard(self) -> str:
304 |         """Get the current clipboard content.
305 | 
306 |         Returns:
307 |             The text content currently stored in the clipboard.
308 |         """
309 |         pass
310 | 
311 |     @abstractmethod
312 |     async def set_clipboard(self, text: str) -> None:
313 |         """Set the clipboard content to the specified text.
314 | 
315 |         Args:
316 |             text: The text to store in the clipboard.
317 |         """
318 |         pass
319 | 
320 |     # File System Actions
321 |     @abstractmethod
322 |     async def file_exists(self, path: str) -> bool:
323 |         """Check if a file exists at the specified path.
324 | 
325 |         Args:
326 |             path: The file path to check.
327 | 
328 |         Returns:
329 |             True if the file exists, False otherwise.
330 |         """
331 |         pass
332 | 
333 |     @abstractmethod
334 |     async def directory_exists(self, path: str) -> bool:
335 |         """Check if a directory exists at the specified path.
336 | 
337 |         Args:
338 |             path: The directory path to check.
339 | 
340 |         Returns:
341 |             True if the directory exists, False otherwise.
342 |         """
343 |         pass
344 | 
345 |     @abstractmethod
346 |     async def list_dir(self, path: str) -> List[str]:
347 |         """List the contents of a directory.
348 | 
349 |         Args:
350 |             path: The directory path to list.
351 | 
352 |         Returns:
353 |             List of file and directory names in the specified directory.
354 |         """
355 |         pass
356 | 
357 |     @abstractmethod
358 |     async def read_text(self, path: str) -> str:
359 |         """Read the text contents of a file.
360 | 
361 |         Args:
362 |             path: The file path to read from.
363 | 
364 |         Returns:
365 |             The text content of the file.
366 |         """
367 |         pass
368 | 
369 |     @abstractmethod
370 |     async def write_text(self, path: str, content: str) -> None:
371 |         """Write text content to a file.
372 | 
373 |         Args:
374 |             path: The file path to write to.
375 |             content: The text content to write.
376 |         """
377 |         pass
378 | 
379 |     @abstractmethod
380 |     async def read_bytes(self, path: str, offset: int = 0, length: Optional[int] = None) -> bytes:
381 |         """Read file binary contents with optional seeking support.
382 | 
383 |         Args:
384 |             path: Path to the file
385 |             offset: Byte offset to start reading from (default: 0)
386 |             length: Number of bytes to read (default: None for entire file)
387 |         """
388 |         pass
389 | 
390 |     @abstractmethod
391 |     async def write_bytes(self, path: str, content: bytes) -> None:
392 |         """Write binary content to a file.
393 | 
394 |         Args:
395 |             path: The file path to write to.
396 |             content: The binary content to write.
397 |         """
398 |         pass
399 | 
400 |     @abstractmethod
401 |     async def delete_file(self, path: str) -> None:
402 |         """Delete a file at the specified path.
403 | 
404 |         Args:
405 |             path: The file path to delete.
406 |         """
407 |         pass
408 | 
409 |     @abstractmethod
410 |     async def create_dir(self, path: str) -> None:
411 |         """Create a directory at the specified path.
412 | 
413 |         Args:
414 |             path: The directory path to create.
415 |         """
416 |         pass
417 | 
418 |     @abstractmethod
419 |     async def delete_dir(self, path: str) -> None:
420 |         """Delete a directory at the specified path.
421 | 
422 |         Args:
423 |             path: The directory path to delete.
424 |         """
425 |         pass
426 | 
427 |     @abstractmethod
428 |     async def get_file_size(self, path: str) -> int:
429 |         """Get the size of a file in bytes.
430 | 
431 |         Args:
432 |             path: The file path to get the size of.
433 | 
434 |         Returns:
435 |             The size of the file in bytes.
436 |         """
437 |         pass
438 | 
439 |     # Desktop actions
440 |     @abstractmethod
441 |     async def get_desktop_environment(self) -> str:
442 |         """Get the current desktop environment.
443 | 
444 |         Returns:
445 |             The name of the current desktop environment.
446 |         """
447 |         pass
448 | 
449 |     @abstractmethod
450 |     async def set_wallpaper(self, path: str) -> None:
451 |         """Set the desktop wallpaper to the specified path.
452 | 
453 |         Args:
454 |             path: The file path to set as wallpaper
455 |         """
456 |         pass
457 | 
458 |     # Window management
459 |     @abstractmethod
460 |     async def open(self, target: str) -> None:
461 |         """Open a target using the system's default handler.
462 | 
463 |         Typically opens files, folders, or URLs with the associated application.
464 | 
465 |         Args:
466 |             target: The file path, folder path, or URL to open.
467 |         """
468 |         pass
469 | 
470 |     @abstractmethod
471 |     async def launch(self, app: str, args: List[str] | None = None) -> Optional[int]:
472 |         """Launch an application with optional arguments.
473 | 
474 |         Args:
475 |             app: The application executable or bundle identifier.
476 |             args: Optional list of arguments to pass to the application.
477 | 
478 |         Returns:
479 |             Optional process ID (PID) of the launched application if available, otherwise None.
480 |         """
481 |         pass
482 | 
483 |     @abstractmethod
484 |     async def get_current_window_id(self) -> int | str:
485 |         """Get the identifier of the currently active/focused window.
486 | 
487 |         Returns:
488 |             A window identifier that can be used with other window management methods.
489 |         """
490 |         pass
491 | 
492 |     @abstractmethod
493 |     async def get_application_windows(self, app: str) -> List[int | str]:
494 |         """Get all window identifiers for a specific application.
495 | 
496 |         Args:
497 |             app: The application name, executable, or identifier to query.
498 | 
499 |         Returns:
500 |             A list of window identifiers belonging to the specified application.
501 |         """
502 |         pass
503 | 
504 |     @abstractmethod
505 |     async def get_window_name(self, window_id: int | str) -> str:
506 |         """Get the title/name of a window.
507 | 
508 |         Args:
509 |             window_id: The window identifier.
510 | 
511 |         Returns:
512 |             The window's title or name string.
513 |         """
514 |         pass
515 | 
516 |     @abstractmethod
517 |     async def get_window_size(self, window_id: int | str) -> tuple[int, int]:
518 |         """Get the size of a window in pixels.
519 | 
520 |         Args:
521 |             window_id: The window identifier.
522 | 
523 |         Returns:
524 |             A tuple of (width, height) representing the window size in pixels.
525 |         """
526 |         pass
527 | 
528 |     @abstractmethod
529 |     async def get_window_position(self, window_id: int | str) -> tuple[int, int]:
530 |         """Get the screen position of a window.
531 | 
532 |         Args:
533 |             window_id: The window identifier.
534 | 
535 |         Returns:
536 |             A tuple of (x, y) representing the window's top-left corner in screen coordinates.
537 |         """
538 |         pass
539 | 
540 |     @abstractmethod
541 |     async def set_window_size(self, window_id: int | str, width: int, height: int) -> None:
542 |         """Set the size of a window in pixels.
543 | 
544 |         Args:
545 |             window_id: The window identifier.
546 |             width: Desired width in pixels.
547 |             height: Desired height in pixels.
548 |         """
549 |         pass
550 | 
551 |     @abstractmethod
552 |     async def set_window_position(self, window_id: int | str, x: int, y: int) -> None:
553 |         """Move a window to a specific position on the screen.
554 | 
555 |         Args:
556 |             window_id: The window identifier.
557 |             x: X coordinate for the window's top-left corner.
558 |             y: Y coordinate for the window's top-left corner.
559 |         """
560 |         pass
561 | 
562 |     @abstractmethod
563 |     async def maximize_window(self, window_id: int | str) -> None:
564 |         """Maximize a window.
565 | 
566 |         Args:
567 |             window_id: The window identifier.
568 |         """
569 |         pass
570 | 
571 |     @abstractmethod
572 |     async def minimize_window(self, window_id: int | str) -> None:
573 |         """Minimize a window.
574 | 
575 |         Args:
576 |             window_id: The window identifier.
577 |         """
578 |         pass
579 | 
580 |     @abstractmethod
581 |     async def activate_window(self, window_id: int | str) -> None:
582 |         """Bring a window to the foreground and focus it.
583 | 
584 |         Args:
585 |             window_id: The window identifier.
586 |         """
587 |         pass
588 | 
589 |     @abstractmethod
590 |     async def close_window(self, window_id: int | str) -> None:
591 |         """Close a window.
592 | 
593 |         Args:
594 |             window_id: The window identifier.
595 |         """
596 |         pass
597 | 
598 |     # Convenience aliases
599 |     async def get_window_title(self, window_id: int | str) -> str:
600 |         """Convenience alias for get_window_name().
601 | 
602 |         Args:
603 |             window_id: The window identifier.
604 | 
605 |         Returns:
606 |             The window's title or name string.
607 |         """
608 |         return await self.get_window_name(window_id)
609 | 
610 |     async def window_size(self, window_id: int | str) -> tuple[int, int]:
611 |         """Convenience alias for get_window_size().
612 | 
613 |         Args:
614 |             window_id: The window identifier.
615 | 
616 |         Returns:
617 |             A tuple of (width, height) representing the window size in pixels.
618 |         """
619 |         return await self.get_window_size(window_id)
620 | 
621 |     # Shell actions
622 |     @abstractmethod
623 |     async def run_command(self, command: str) -> CommandResult:
624 |         """Run shell command and return structured result.
625 | 
626 |         Implementations behave like subprocess.run with shell=True and check=False:
627 |         the command runs in the target environment with stdout and stderr captured.
628 | 
629 |         Args:
630 |             command (str): The shell command to execute
631 | 
632 |         Returns:
633 |             CommandResult: A structured result containing:
634 |                 - stdout (str): Standard output from the command
635 |                 - stderr (str): Standard error from the command
636 |                 - returncode (int): Exit code from the command (0 indicates success)
637 | 
638 |         Raises:
639 |             RuntimeError: If the command execution fails at the system level
640 | 
641 |         Example:
642 |             result = await interface.run_command("ls -la")
643 |             if result.returncode == 0:
644 |                 print(f"Output: {result.stdout}")
645 |             else:
646 |                 print(f"Error: {result.stderr}, Exit code: {result.returncode}")
647 |         """
648 |         pass
649 | 
650 |     # Accessibility Actions
651 |     @abstractmethod
652 |     async def get_accessibility_tree(self) -> Dict:
653 |         """Get the accessibility tree of the current screen.
654 | 
655 |         Returns:
656 |             Dict containing the hierarchical accessibility information of screen elements.
657 |         """
658 |         pass
659 | 
660 |     @abstractmethod
661 |     async def to_screen_coordinates(self, x: float, y: float) -> tuple[float, float]:
662 |         """Convert screenshot coordinates to screen coordinates.
663 | 
664 |         Args:
665 |             x: X coordinate in screenshot space
666 |             y: Y coordinate in screenshot space
667 | 
668 |         Returns:
669 |             tuple[float, float]: (x, y) coordinates in screen space
670 |         """
671 |         pass
672 | 
673 |     @abstractmethod
674 |     async def to_screenshot_coordinates(self, x: float, y: float) -> tuple[float, float]:
675 |         """Convert screen coordinates to screenshot coordinates.
676 | 
677 |         Args:
678 |             x: X coordinate in screen space
679 |             y: Y coordinate in screen space
680 | 
681 |         Returns:
682 |             tuple[float, float]: (x, y) coordinates in screenshot space
683 |         """
684 |         pass
685 | 
```
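
Most action methods in `BaseComputerInterface` accept a per-call `delay`, and the constructor sets an instance-wide `self.delay` default. The base class does not show how the two interact. One plausible convention, sketched here purely as an assumption, is that an explicit `delay` overrides the instance default and `None` falls back to it. `DelayMixin`, `_resolve_delay`, and `_sleep_after_action` are hypothetical names.

```python
import asyncio
from typing import Optional


class DelayMixin:
    """Hypothetical mixin; not part of the repository."""

    def __init__(self) -> None:
        self.delay: float = 0.0  # instance-wide default, as in BaseComputerInterface

    def _resolve_delay(self, delay: Optional[float]) -> float:
        # `is not None` rather than truthiness, so an explicit 0.0 disables the default.
        return delay if delay is not None else self.delay

    async def _sleep_after_action(self, delay: Optional[float] = None) -> float:
        effective = self._resolve_delay(delay)
        if effective > 0:
            await asyncio.sleep(effective)
        return effective


iface = DelayMixin()
iface.delay = 0.01
default_used = asyncio.run(iface._sleep_after_action())      # falls back to 0.01
override_used = asyncio.run(iface._sleep_after_action(0.0))  # explicit 0.0 wins
```

Testing against `None` instead of truthiness matters here: with `delay or self.delay`, a caller could never switch the default off for a single call.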

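The convenience aliases in `BaseComputerInterface` are concrete methods that delegate to abstract ones, so a subclass only has to implement `get_window_name()` to get `get_window_title()` as well. Below is a trimmed-down sketch of that pattern. `WindowNaming` and `FakeWindows` are hypothetical stand-ins for the real classes.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, Union

WindowId = Union[int, str]


class WindowNaming(ABC):
    @abstractmethod
    async def get_window_name(self, window_id: WindowId) -> str:
        """Subclasses must implement this."""

    async def get_window_title(self, window_id: WindowId) -> str:
        """Concrete alias: inherited for free by every subclass."""
        return await self.get_window_name(window_id)


class FakeWindows(WindowNaming):
    """In-memory stand-in used only for this example."""

    def __init__(self) -> None:
        self._names: Dict[WindowId, str] = {1: "Terminal", "w2": "Browser"}

    async def get_window_name(self, window_id: WindowId) -> str:
        return self._names[window_id]


title = asyncio.run(FakeWindows().get_window_title(1))
```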
--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/docker/provider.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Docker VM provider implementation.
  3 | 
  4 | This provider uses Docker containers running the CUA Ubuntu image to create
  5 | Linux VMs with computer-server. It handles VM lifecycle operations through Docker
  6 | commands and container management.
  7 | """
  8 | 
  9 | import asyncio
 10 | import json
 11 | import logging
 12 | import re
 13 | import subprocess
 14 | import time
 15 | from typing import Any, Dict, List, Optional
 16 | 
 17 | from ..base import BaseVMProvider, VMProviderType
 18 | 
 19 | # Setup logging
 20 | logger = logging.getLogger(__name__)
 21 | 
 22 | # Check if Docker is available
 23 | try:
 24 |     subprocess.run(["docker", "--version"], capture_output=True, check=True)
 25 |     HAS_DOCKER = True
 26 | except (subprocess.SubprocessError, FileNotFoundError):
 27 |     HAS_DOCKER = False
 28 | 
 29 | 
 30 | class DockerProvider(BaseVMProvider):
 31 |     """
 32 |     Docker VM Provider implementation using Docker containers.
 33 | 
 34 |     This provider uses Docker to run containers with the CUA Ubuntu image
 35 |     that includes computer-server for remote computer use.
 36 |     """
 37 | 
 38 |     def __init__(
 39 |         self,
 40 |         host: str = "localhost",
 41 |         storage: Optional[str] = None,
 42 |         shared_path: Optional[str] = None,
 43 |         image: str = "trycua/cua-ubuntu:latest",
 44 |         verbose: bool = False,
 45 |         ephemeral: bool = False,
 46 |         vnc_port: Optional[int] = 6901,
 47 |         api_port: Optional[int] = None,
 48 |     ):
 49 |         """Initialize the Docker VM Provider.
 50 | 
 51 |         Args:
 52 |             host: Hostname for the API server (default: localhost)
 53 |             storage: Path for persistent VM storage
 54 |             shared_path: Path for shared folder between host and container
 55 |             image: Docker image to use (default: "trycua/cua-ubuntu:latest")
 56 |                    Supported images:
 57 |                    - "trycua/cua-ubuntu:latest" (Kasm-based)
 58 |                    - "trycua/cua-xfce:latest" (vanilla XFCE)
 59 |             verbose: Enable verbose logging
 60 |             ephemeral: Use ephemeral (temporary) storage
 61 |             vnc_port: Port for VNC interface (default: 6901)
 62 |             api_port: Port for API server (default: 8000)
 63 |         """
 64 |         self.host = host
 65 |         self.api_port = api_port if api_port is not None else 8000
 66 |         self.vnc_port = vnc_port
 67 |         self.ephemeral = ephemeral
 68 | 
 69 |         # Ephemeral mode overrides any storage path with the "ephemeral" sentinel
 70 |         if ephemeral:
 71 |             self.storage = "ephemeral"
 72 |         else:
 73 |             self.storage = storage
 74 | 
 75 |         self.shared_path = shared_path
 76 |         self.image = image
 77 |         self.verbose = verbose
 78 |         self._container_id = None
 79 |         self._running_containers = {}  # Track running containers by name
 80 | 
 81 |         # Detect image type and configure user directory accordingly
 82 |         self._detect_image_config()
 83 | 
 84 |     def _detect_image_config(self):
 85 |         """Detect image type and configure paths accordingly."""
 86 |         # Any image name containing "xfce" (including "docker-xfce") is treated as XFCE; all others as Kasm
 87 |         if "xfce" in self.image.lower():
 88 |             self._home_dir = "/home/cua"
 89 |             self._image_type = "docker-xfce"
 90 |             logger.info(f"Detected docker-xfce image: using {self._home_dir}")
 91 |         else:
 92 |             # Default to Kasm configuration
 93 |             self._home_dir = "/home/kasm-user"
 94 |             self._image_type = "kasm"
 95 |             logger.info(f"Detected Kasm image: using {self._home_dir}")
 96 | 
 97 |     @property
 98 |     def provider_type(self) -> VMProviderType:
 99 |         """Return the provider type."""
100 |         return VMProviderType.DOCKER
101 | 
102 |     def _parse_memory(self, memory_str: str) -> str:
103 |         """Parse a memory string (or an int, taken as MB) to Docker format.
104 | 
105 |         Examples:
106 |             "8GB" -> "8g"
107 |             "1024MB" -> "1024m"
108 |             "512" -> "512m"
109 |         """
110 |         if isinstance(memory_str, int):
111 |             return f"{memory_str}m"
112 | 
113 |         if isinstance(memory_str, str):
114 |             # Extract number and unit; fullmatch rejects inputs like "1.5GB" instead of reading them as "1"
115 |             match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", memory_str)
116 |             if match:
117 |                 value, unit = match.groups()
118 |                 unit = unit.upper()
119 | 
120 |                 if unit == "GB" or unit == "G":
121 |                     return f"{value}g"
122 |                 elif unit == "MB" or unit == "M" or unit == "":
123 |                     return f"{value}m"
124 | 
125 |         # Default fallback
126 |         logger.warning(f"Could not parse memory string '{memory_str}', using 4g default")
127 |         return "4g"  # Default to 4GB
128 | 
129 |     async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
130 |         """Get VM information by name.
131 | 
132 |         Args:
133 |             name: Name of the VM to get information for
134 |             storage: Optional storage path override. If provided, this will be used
135 |                     instead of the provider's default storage path.
136 | 
137 |         Returns:
138 |             Dictionary with VM information including status, IP address, etc.
139 |         """
140 |         try:
141 |             # Check if container exists and get its status
142 |             cmd = ["docker", "inspect", name]
143 |             result = subprocess.run(cmd, capture_output=True, text=True)
144 | 
145 |             if result.returncode != 0:
146 |                 # Container doesn't exist
147 |                 return {
148 |                     "name": name,
149 |                     "status": "not_found",
150 |                     "ip_address": None,
151 |                     "ports": {},
152 |                     "image": self.image,
153 |                     "provider": "docker",
154 |                 }
155 | 
156 |             # Parse container info
157 |             container_info = json.loads(result.stdout)[0]
158 |             state = container_info["State"]
159 |             network_settings = container_info["NetworkSettings"]
160 | 
161 |             # Determine status
162 |             if state["Running"]:
163 |                 status = "running"
164 |             elif state["Paused"]:
165 |                 status = "paused"
166 |             else:
167 |                 status = "stopped"
168 | 
169 |             # Get IP address
170 |             ip_address = network_settings.get("IPAddress", "")
171 |             if not ip_address and "Networks" in network_settings:
172 |                 # Try to get IP from bridge network
173 |                 for network_name, network_info in network_settings["Networks"].items():
174 |                     if network_info.get("IPAddress"):
175 |                         ip_address = network_info["IPAddress"]
176 |                         break
177 | 
178 |             # Get port mappings
179 |             ports = {}
180 |             if "Ports" in network_settings and network_settings["Ports"]:
181 |                 # network_settings["Ports"] is a dict like:
182 |                 # {'6901/tcp': [{'HostIp': '0.0.0.0', 'HostPort': '6901'}, ...], ...}
183 |                 for container_port, port_mappings in network_settings["Ports"].items():
184 |                     if port_mappings:  # Check if there are any port mappings
185 |                         # Take the first mapping (usually the IPv4 one)
186 |                         for mapping in port_mappings:
187 |                             if mapping.get("HostPort"):
188 |                                 ports[container_port] = mapping["HostPort"]
189 |                                 break  # Use the first valid mapping
190 | 
191 |             return {
192 |                 "name": name,
193 |                 "status": status,
194 |                 "ip_address": ip_address or "127.0.0.1",  # Use localhost if no IP
195 |                 "ports": ports,
196 |                 "image": container_info["Config"]["Image"],
197 |                 "provider": "docker",
198 |                 "container_id": container_info["Id"][:12],  # Short ID
199 |                 "created": container_info["Created"],
200 |                 "started": state.get("StartedAt", ""),
201 |             }
202 | 
203 |         except Exception as e:
204 |             logger.error(f"Error getting VM info for {name}: {e}")
205 |             import traceback
206 | 
207 |             traceback.print_exc()
208 |             return {"name": name, "status": "error", "error": str(e), "provider": "docker"}
209 | 
210 |     async def list_vms(self) -> List[Dict[str, Any]]:
211 |         """List all Docker containers managed by this provider."""
212 |         try:
213 |             # List all containers (running and stopped) with the CUA image
214 |             cmd = ["docker", "ps", "-a", "--filter", f"ancestor={self.image}", "--format", "json"]
215 |             result = subprocess.run(cmd, capture_output=True, text=True, check=True)
216 | 
217 |             containers = []
218 |             if result.stdout.strip():
219 |                 for line in result.stdout.strip().split("\n"):
220 |                     if line.strip():
221 |                         container_data = json.loads(line)
222 |                         vm_info = await self.get_vm(container_data["Names"])
223 |                         containers.append(vm_info)
224 | 
225 |             return containers
226 | 
227 |         except subprocess.CalledProcessError as e:
228 |             logger.error(f"Error listing containers: {e.stderr}")
229 |             return []
230 |         except Exception as e:
231 |             logger.error(f"Error listing VMs: {e}")
232 |             import traceback
233 | 
234 |             traceback.print_exc()
235 |             return []
236 | 
237 |     async def run_vm(
238 |         self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None
239 |     ) -> Dict[str, Any]:
240 |         """Run a VM with the given options.
241 | 
242 |         Args:
243 |             image: Name/tag of the Docker image to use
244 |             name: Name of the container to run
245 |             run_opts: Options for running the VM, including:
246 |                 - memory: Memory limit (e.g., "4GB", "2048MB")
247 |                 - cpu: CPU limit (e.g., 2 for 2 cores)
248 |                 - vnc_port: Specific port for VNC interface
249 |                 - api_port: Specific port for computer-server API
250 | 
251 |         Returns:
252 |             Dictionary with VM status information
253 |         """
254 |         try:
255 |             # Check if container already exists
256 |             existing_vm = await self.get_vm(name, storage)
257 |             if existing_vm["status"] == "running":
258 |                 logger.info(f"Container {name} is already running")
259 |                 return existing_vm
260 |             elif existing_vm["status"] in ["stopped", "paused"]:
261 |                 if self.ephemeral:
262 |                     # Delete existing container (-f so a paused container is removed too)
263 |                     logger.info(f"Deleting existing container {name}")
264 |                     delete_cmd = ["docker", "rm", "-f", name]
265 |                     result = subprocess.run(delete_cmd, capture_output=True, text=True, check=True)
266 |                 else:
267 |                     # Start existing container
268 |                     logger.info(f"Starting existing container {name}")
269 |                     start_cmd = ["docker", "start", name]
270 |                     result = subprocess.run(start_cmd, capture_output=True, text=True, check=True)
271 | 
272 |                     # Wait for container to be ready
273 |                     await self._wait_for_container_ready(name)
274 |                     return await self.get_vm(name, storage)
275 | 
276 |             # Use provided image or default
277 |             docker_image = image if image != "default" else self.image
278 | 
279 |             # Build docker run command
280 |             cmd = ["docker", "run", "-d", "--name", name]
281 | 
282 |             # Add memory limit if specified
283 |             if "memory" in run_opts:
284 |                 memory_limit = self._parse_memory(run_opts["memory"])
285 |                 cmd.extend(["--memory", memory_limit])
286 | 
287 |             # Add CPU limit if specified
288 |             if "cpu" in run_opts:
289 |                 cpu_count = str(run_opts["cpu"])
290 |                 cmd.extend(["--cpus", cpu_count])
291 | 
292 |             # Add port mappings
293 |             vnc_port = run_opts.get("vnc_port", self.vnc_port)
294 |             api_port = run_opts.get("api_port", self.api_port)
295 | 
296 |             if vnc_port:
297 |                 cmd.extend(["-p", f"{vnc_port}:6901"])  # VNC port
298 |             if api_port:
299 |                 # Map the API port to container port 8000 (computer-server default)
300 |                 cmd.extend(["-p", f"{api_port}:8000"])  # computer-server API port
301 | 
302 |             # Add volume mounts if storage is specified
303 |             storage_path = storage or self.storage
304 |             if storage_path and storage_path != "ephemeral":
305 |                 # Mount storage directory using detected home directory
306 |                 cmd.extend(["-v", f"{storage_path}:{self._home_dir}/storage"])
307 | 
308 |             # Add shared path if specified
309 |             if self.shared_path:
310 |                 # Mount shared directory using detected home directory
311 |                 cmd.extend(["-v", f"{self.shared_path}:{self._home_dir}/shared"])
312 | 
313 |             # Add environment variables
314 |             cmd.extend(["-e", "VNC_PW=password"])  # Set VNC password
315 |             cmd.extend(["-e", "VNCOPTIONS=-disableBasicAuth"])  # Disable VNC basic auth
316 | 
317 |             # Apply display resolution if provided as a dict, e.g. {"width": 1024, "height": 768}
318 |             display_resolution = run_opts.get("display")
319 |             if (
320 |                 isinstance(display_resolution, dict)
321 |                 and "width" in display_resolution
322 |                 and "height" in display_resolution
323 |             ):
324 |                 cmd.extend(
325 |                     [
326 |                         "-e",
327 |                         f"VNC_RESOLUTION={display_resolution['width']}x{display_resolution['height']}",
328 |                     ]
329 |                 )
330 | 
331 |             # Add the image
332 |             cmd.append(docker_image)
333 | 
334 |             logger.info(f"Running Docker container with command: {' '.join(cmd)}")
335 | 
336 |             # Run the container
337 |             result = subprocess.run(cmd, capture_output=True, text=True, check=True)
338 |             container_id = result.stdout.strip()
339 | 
340 |             logger.info(f"Container {name} started with ID: {container_id[:12]}")
341 | 
342 |             # Store container info
343 |             self._container_id = container_id
344 |             self._running_containers[name] = container_id
345 | 
346 |             # Wait for container to be ready
347 |             await self._wait_for_container_ready(name)
348 | 
349 |             # Return VM info
350 |             vm_info = await self.get_vm(name, storage)
351 |             vm_info["container_id"] = container_id[:12]
352 | 
353 |             return vm_info
354 | 
355 |         except subprocess.CalledProcessError as e:
356 |             error_msg = f"Failed to run container {name}: {e.stderr}"
357 |             logger.error(error_msg)
358 |             return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
359 |         except Exception as e:
360 |             error_msg = f"Error running VM {name}: {e}"
361 |             logger.error(error_msg)
362 |             return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
363 | 
364 |     async def _wait_for_container_ready(self, container_name: str, timeout: int = 60) -> bool:
365 |         """Wait for the Docker container to be fully ready.
366 | 
367 |         Args:
368 |             container_name: Name of the Docker container to check
369 |             timeout: Maximum time to wait in seconds (default: 60 seconds)
370 | 
371 |         Returns:
372 |             True if the container is running and ready
373 |         """
374 |         logger.info(f"Waiting for container {container_name} to be ready...")
375 | 
376 |         start_time = time.time()
377 |         while time.time() - start_time < timeout:
378 |             try:
379 |                 # Check if container is running
380 |                 vm_info = await self.get_vm(container_name)
381 |                 if vm_info["status"] == "running":
382 |                     logger.info(f"Container {container_name} is running")
383 | 
384 |                     # No API health check is performed here; instead, wait a few
385 |                     # seconds to give the in-container services time to start
386 |                     await asyncio.sleep(5)
387 |                     return True
388 | 
389 |             except Exception as e:
390 |                 logger.debug(f"Container {container_name} not ready yet: {e}")
391 | 
392 |             await asyncio.sleep(2)
393 | 
394 |         logger.warning(f"Container {container_name} did not become ready within {timeout} seconds")
395 |         return False
396 | 
397 |     async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
398 |         """Stop a running VM by stopping the Docker container."""
399 |         try:
400 |             logger.info(f"Stopping container {name}")
401 | 
402 |             # Stop the container
403 |             cmd = ["docker", "stop", name]
404 |             result = subprocess.run(cmd, capture_output=True, text=True, check=True)
405 | 
406 |             # Remove from running containers tracking
407 |             if name in self._running_containers:
408 |                 del self._running_containers[name]
409 | 
410 |             logger.info(f"Container {name} stopped successfully")
411 | 
412 |             # Delete container if ephemeral=True
413 |             if self.ephemeral:
414 |                 cmd = ["docker", "rm", name]
415 |                 result = subprocess.run(cmd, capture_output=True, text=True, check=True)
416 | 
417 |             return {
418 |                 "name": name,
419 |                 "status": "stopped",
420 |                 "message": "Container stopped successfully",
421 |                 "provider": "docker",
422 |             }
423 | 
424 |         except subprocess.CalledProcessError as e:
425 |             error_msg = f"Failed to stop container {name}: {e.stderr}"
426 |             logger.error(error_msg)
427 |             return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
428 |         except Exception as e:
429 |             error_msg = f"Error stopping VM {name}: {e}"
430 |             logger.error(error_msg)
431 |             return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
432 | 
433 |     async def restart_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
434 |         raise NotImplementedError("DockerProvider does not support restarting VMs.")
435 | 
436 |     async def update_vm(
437 |         self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None
438 |     ) -> Dict[str, Any]:
439 |         """Update VM configuration.
440 | 
441 |         Note: Docker containers cannot be updated while running.
442 |         This method will return an error suggesting to recreate the container.
443 |         """
444 |         return {
445 |             "name": name,
446 |             "status": "error",
447 |             "error": "Docker containers cannot be updated while running. Please stop and recreate the container with new options.",
448 |             "provider": "docker",
449 |         }
450 | 
451 |     async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
452 |         """Get the IP address of a VM, waiting indefinitely until it's available.
453 | 
454 |         Args:
455 |             name: Name of the VM to get the IP for
456 |             storage: Optional storage path override
457 |             retry_delay: Delay between retries in seconds (default: 2)
458 | 
459 |         Returns:
460 |             IP address of the VM when it becomes available
461 |         """
462 |         logger.info(f"Getting IP address for container {name}")
463 | 
464 |         total_attempts = 0
465 |         while True:
466 |             total_attempts += 1
467 | 
468 |             try:
469 |                 vm_info = await self.get_vm(name, storage)
470 | 
471 |                 if vm_info["status"] == "error":
472 |                     raise Exception(
473 |                         f"VM is in error state: {vm_info.get('error', 'Unknown error')}"
474 |                     )
475 | 
476 |                 # TODO: for now, return localhost
477 |                 # it seems the docker container is not accessible from the host
478 |                 # on WSL2, unless you port forward? not sure
479 |                 if True:
480 |                     logger.warning("Overriding container IP with localhost")
481 |                     return "localhost"
482 | 
483 |                 # Check if we got a valid IP
484 |                 ip = vm_info.get("ip_address", None)
485 |                 if ip and ip != "unknown" and not ip.startswith("0.0.0.0"):
486 |                     logger.info(f"Got valid container IP address: {ip}")
487 |                     return ip
488 | 
489 |                 # For Docker containers, we can also use localhost if ports are mapped
490 |                 if vm_info["status"] == "running" and vm_info.get("ports"):
491 |                     logger.info("Container is running with port mappings, using localhost")
492 |                     return "127.0.0.1"
493 | 
494 |                 # Check the container status
495 |                 status = vm_info.get("status", "unknown")
496 | 
497 |                 if status == "stopped":
498 |                     logger.info(f"Container status is {status}, but still waiting for it to start")
499 |                 elif status != "running":
500 |                     logger.info(f"Container is not running yet (status: {status}). Waiting...")
501 |                 else:
502 |                     logger.info("Container is running but no valid IP address yet. Waiting...")
503 | 
504 |             except Exception as e:
505 |                 logger.warning(f"Error getting container {name} IP: {e}, continuing to wait...")
506 | 
507 |             # Wait before next retry
508 |             await asyncio.sleep(retry_delay)
509 | 
510 |             # Add progress log every 10 attempts
511 |             if total_attempts % 10 == 0:
512 |                 logger.info(
513 |                     f"Still waiting for container {name} IP after {total_attempts} attempts..."
514 |                 )
515 | 
516 |     async def __aenter__(self):
517 |         """Async context manager entry."""
518 |         logger.debug("Entering DockerProvider context")
519 |         return self
520 | 
521 |     async def __aexit__(self, exc_type, exc_val, exc_tb):
522 |         """Async context manager exit.
523 | 
524 |         This method handles cleanup of running containers if needed.
525 |         """
526 |         logger.debug(f"Exiting DockerProvider context, handling exceptions: {exc_type}")
527 |         try:
528 |             # Optionally stop running containers on context exit
529 |             # For now, we'll leave containers running as they might be needed
530 |             # Users can manually stop them if needed
531 |             pass
532 |         except Exception as e:
533 |             logger.error(f"Error during DockerProvider cleanup: {e}")
534 |             if exc_type is None:
535 |                 raise
536 |         return False
537 | 
```
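
The memory-string normalization done by `_parse_memory` in the file above can be exercised in isolation. The following is a minimal standalone sketch, not the provider's code: the function name `parse_memory` and the constant `DEFAULT_MEMORY` are hypothetical, and it uses `re.fullmatch` where the original uses `re.match`, so an input such as `"8.5GB"` falls back to the default instead of being read as `"8m"`.

```python
import re

DEFAULT_MEMORY = "4g"  # same fallback value the provider uses


def parse_memory(memory) -> str:
    """Convert "8GB", "1024MB", "512" or 512 to Docker's --memory syntax."""
    if isinstance(memory, int):
        # Bare integers are treated as megabytes
        return f"{memory}m"
    if isinstance(memory, str):
        # Whole string must be digits followed by an optional unit
        match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", memory)
        if match:
            value, unit = match.groups()
            unit = unit.upper()
            if unit in ("GB", "G"):
                return f"{value}g"
            if unit in ("MB", "M", ""):
                return f"{value}m"
    # Unknown unit or unparseable input
    return DEFAULT_MEMORY


for sample in ("8GB", "1024MB", "512", 512, "8.5GB", "2TB"):
    print(repr(sample), "->", parse_memory(sample))
```

Whether a silent fallback or a raised `ValueError` is preferable for unparseable input is a design choice; the sketch keeps the fallback to stay close to the original behavior.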

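The port-mapping loop in `get_vm` above reads the `NetworkSettings.Ports` structure returned by `docker inspect`. As an illustration only, the sketch below applies the same "first mapping with a `HostPort` wins" rule to a hand-written sample. The helper name `extract_host_ports` and the sample values are assumptions made for this example; the `None` entry stands in for a port that is exposed but not published.

```python
from typing import Dict, List, Optional

PortBindings = Optional[List[Dict[str, str]]]


def extract_host_ports(ports: Optional[Dict[str, PortBindings]]) -> Dict[str, str]:
    """Map each container port to the first host port bound to it."""
    result: Dict[str, str] = {}
    for container_port, mappings in (ports or {}).items():
        # Unpublished ports have no binding list, so treat None as empty
        for mapping in mappings or []:
            host_port = mapping.get("HostPort")
            if host_port:
                result[container_port] = host_port
                break  # keep only the first valid mapping
    return result


# Hypothetical sample shaped like docker inspect's NetworkSettings.Ports
sample = {
    "6901/tcp": [
        {"HostIp": "0.0.0.0", "HostPort": "6901"},
        {"HostIp": "::", "HostPort": "6901"},
    ],
    "8000/tcp": [{"HostIp": "0.0.0.0", "HostPort": "8001"}],
    "5900/tcp": None,
}

print(extract_host_ports(sample))
```
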
--------------------------------------------------------------------------------
/docs/content/docs/example-usecases/gemini-complex-ui-navigation.mdx:
--------------------------------------------------------------------------------

```markdown
  1 | ---
  2 | title: GUI Grounding with Gemini 3
  3 | description: Using Google's Gemini 3 with OmniParser for Advanced GUI Grounding Tasks
  4 | ---
  5 | 
  6 | import { Step, Steps } from 'fumadocs-ui/components/steps';
  7 | import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
  8 | import { Callout } from 'fumadocs-ui/components/callout';
  9 | 
 10 | ## Overview
 11 | 
 12 | This example demonstrates how to use Google's Gemini 3 models with OmniParser for complex GUI grounding tasks. Gemini 3 Pro scores **72.7%** on the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) (compared to Claude Sonnet 4.5's 36.2%), which makes it well suited to precise UI element location and complex navigation tasks.
 13 | 
 14 | <img
 15 |   src="/docs/img/grounding-with-gemini3.gif"
 16 |   alt="Demo of Gemini 3 with OmniParser performing complex GUI navigation tasks"
 17 |   width="800px"
 18 | />
 19 | 
 20 | <Callout type="info" title="Why Gemini 3 for UI Navigation?">
 21 |   According to [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/), Gemini 3 Pro achieves:
 22 |   - **72.7%** on ScreenSpot-Pro (vs. Gemini 2.5 Pro's 11.4%)
 23 |   - Industry-leading performance on complex UI navigation tasks
 24 |   - Advanced multimodal understanding for high-resolution screens
 25 | </Callout>
 26 | 
 27 | ### What You'll Build
 28 | 
 29 | This guide shows how to:
 30 | 
 31 | - Set up Vertex AI with proper authentication
 32 | - Use OmniParser with Gemini 3 for GUI element detection
 33 | - Leverage Gemini 3-specific features like `thinking_level` and `media_resolution`
 34 | - Create agents that can perform complex multi-step UI interactions
 35 | 
 36 | ---
 37 | 
 38 | <Steps>
 39 | 
 40 | <Step>
 41 | 
 42 | ### Set Up Google Cloud and Vertex AI
 43 | 
 44 | Before using Gemini 3 models, you need to enable Vertex AI in Google Cloud Console.
 45 | 
 46 | #### 1. Create a Google Cloud Project
 47 | 
 48 | 1. Go to [Google Cloud Console](https://console.cloud.google.com/)
 49 | 2. Click **Select a project** → **New Project**
 50 | 3. Enter a project name and click **Create**
 51 | 4. Note your **Project ID** (you'll need this later)
 52 | 
 53 | #### 2. Enable Vertex AI API
 54 | 
 55 | 1. Navigate to [Vertex AI API](https://console.cloud.google.com/apis/library/aiplatform.googleapis.com)
 56 | 2. Select your project
 57 | 3. Click **Enable**
 58 | 
 59 | #### 3. Enable Billing
 60 | 
 61 | 1. Go to [Billing](https://console.cloud.google.com/billing)
 62 | 2. Link a billing account to your project
 63 | 3. Vertex AI offers a [free tier](https://cloud.google.com/vertex-ai/pricing) for testing
 64 | 
 65 | #### 4. Create a Service Account
 66 | 
 67 | 1. Go to [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
 68 | 2. Click **Create Service Account**
 69 | 3. Enter a name (e.g., "cua-gemini-agent")
 70 | 4. Click **Create and Continue**
 71 | 5. Grant the **Vertex AI User** role
 72 | 6. Click **Done**
 73 | 
 74 | #### 5. Create and Download Service Account Key
 75 | 
 76 | 1. Click on your newly created service account
 77 | 2. Go to **Keys** tab
 78 | 3. Click **Add Key** → **Create new key**
 79 | 4. Select **JSON** format
 80 | 5. Click **Create** (the key file will download automatically)
 81 | 6. **Important**: Store this key file securely! It contains credentials for accessing your Google Cloud resources
 82 | 
 83 | <Callout type="warn">
 84 |   Never commit your service account JSON key to version control! Add it to `.gitignore` immediately.
 85 | </Callout>
 86 | 
 87 | </Step>
 88 | 
 89 | <Step>
 90 | 
 91 | ### Install Dependencies
 92 | 
 93 | This step installs the packages required for OmniParser and Gemini 3.
 94 | 
 95 | Create a `requirements.txt` file:
 96 | 
 97 | ```text
 98 | cua-agent
 99 | cua-computer
100 | cua-som  # OmniParser for GUI element detection
101 | litellm>=1.0.0
102 | python-dotenv>=1.0.0
103 | google-cloud-aiplatform>=1.70.0
104 | ```
105 | 
106 | Install the dependencies:
107 | 
108 | ```bash
109 | pip install -r requirements.txt
110 | ```
111 | 
112 | </Step>
113 | 
114 | <Step>
115 | 
116 | ### Configure Environment Variables
117 | 
118 | Create a `.env` file in your project root:
119 | 
120 | ```text
121 | # Google Cloud / Vertex AI credentials
122 | GOOGLE_CLOUD_PROJECT=your-project-id
123 | GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-service-account-key.json
124 | 
125 | # Cua credentials (for cloud sandboxes)
126 | CUA_API_KEY=sk_cua-api01...
127 | CUA_SANDBOX_NAME=your-sandbox-name
128 | ```
129 | 
130 | Replace the values:
131 | 
132 | - `your-project-id`: Your Google Cloud Project ID from Step 1
133 | - `/path/to/your-service-account-key.json`: Path to the JSON key file you downloaded
134 | - `sk_cua-api01...`: Your Cua API key from the [Cua dashboard](https://cua.dev)
135 | - `your-sandbox-name`: Your sandbox name (if using cloud sandboxes)
136 | 
137 | </Step>
138 | 
139 | <Step>
140 | 
141 | ### Create Your Complex UI Navigation Script
142 | 
143 | Create a Python file (e.g., `gemini_ui_navigation.py`):
144 | 
145 | <Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox']}>
146 |   <Tab value="Cloud Sandbox">
147 | 
148 | ```python
149 | import asyncio
150 | import logging
151 | import os
152 | import signal
153 | import traceback
154 | 
155 | from agent import ComputerAgent
156 | from computer import Computer, VMProviderType
157 | from dotenv import load_dotenv
158 | 
159 | logging.basicConfig(level=logging.INFO)
160 | logger = logging.getLogger(__name__)
161 | 
162 | def handle_sigint(sig, frame):
163 |     print("\n\nExecution interrupted by user. Exiting gracefully...")
164 |     raise SystemExit(0)  # the built-in exit() helper is meant for interactive sessions
165 | 
166 | async def complex_ui_navigation():
167 |     """
168 |     Demonstrate Gemini 3's exceptional UI grounding capabilities
169 |     with complex, multi-step navigation tasks.
170 |     """
171 |     try:
172 |         async with Computer(
173 |             os_type="linux",
174 |             provider_type=VMProviderType.CLOUD,
175 |             name=os.environ["CUA_SANDBOX_NAME"],
176 |             api_key=os.environ["CUA_API_KEY"],
177 |             verbosity=logging.INFO,
178 |         ) as computer:
179 | 
180 |             agent = ComputerAgent(
181 |                 # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
182 |                 model="omniparser+vertex_ai/gemini-3-pro-preview",
183 |                 tools=[computer],
184 |                 only_n_most_recent_images=3,
185 |                 verbosity=logging.INFO,
186 |                 trajectory_dir="trajectories",
187 |                 use_prompt_caching=False,
188 |                 max_trajectory_budget=5.0,
189 |                 # Gemini 3-specific parameters
190 |                 thinking_level="high",  # Enables deeper reasoning (vs "low")
191 |                 media_resolution="high",  # High-resolution image processing (vs "low" or "medium")
192 |             )
193 | 
194 |             # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
195 |             # These test precise element location in professional UIs
196 |             tasks = [
197 |                 # Task 1: GitHub repository navigation
198 |                 {
199 |                     "instruction": (
200 |                         "Go to github.com/trycua/cua. "
201 |                         "Find and click on the 'Issues' tab. "
202 |                         "Then locate and click on the search box within the issues page "
203 |                         "(not the global GitHub search). "
204 |                         "Type 'omniparser' and press Enter."
205 |                     ),
206 |                     "description": "Tests precise UI element distinction in a complex interface",
207 |                 },
208 | 
209 |                 # Task 2: Search for and install Visual Studio Code
210 |                 {
211 |                     "instruction": (
212 |                         "Open your system's app store (e.g., Microsoft Store). "
213 |                         "Search for 'Visual Studio Code'. "
214 |                         "In the search results, select 'Visual Studio Code'. "
215 |                         "Click on 'Install' or 'Get' to begin the installation. "
216 |                         "If prompted, accept any permissions or confirm the installation. "
217 |                         "Wait for Visual Studio Code to finish installing."
218 |                     ),
219 |                     "description": "Tests the ability to search for an application and complete its installation through a step-by-step app store workflow.",
220 |                 },
221 |             ]
222 | 
223 |             history = []
224 | 
225 |             for i, task_info in enumerate(tasks, 1):
226 |                 task = task_info["instruction"]
227 |                 print(f"\n{'='*60}")
228 |                 print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
229 |                 print(f"{'='*60}")
230 |                 print(f"\nInstruction: {task}\n")
231 | 
232 |                 # Add user message to history
233 |                 history.append({"role": "user", "content": task})
234 | 
235 |                 # Run agent with conversation history
236 |                 async for result in agent.run(history, stream=False):
237 |                     history += result.get("output", [])
238 | 
239 |                     # Print output for debugging
240 |                     for item in result.get("output", []):
241 |                         if item.get("type") == "message":
242 |                             content = item.get("content", [])
243 |                             for content_part in content:
244 |                                 if content_part.get("text"):
245 |                                     logger.info(f"Agent: {content_part.get('text')}")
246 |                         elif item.get("type") == "computer_call":
247 |                             action = item.get("action", {})
248 |                             action_type = action.get("type", "")
249 |                             logger.debug(f"Computer Action: {action_type}")
250 | 
251 |                 print(f"\n✅ Task {i}/{len(tasks)} completed")
252 | 
253 |             print("\n🎉 All complex UI navigation tasks completed successfully!")
254 | 
255 |     except Exception as e:
256 |         logger.error(f"Error in complex_ui_navigation: {e}")
257 |         traceback.print_exc()
258 |         raise
259 | 
260 | def main():
261 |     try:
262 |         load_dotenv()
263 | 
264 |         # Validate required environment variables
265 |         required_vars = [
266 |             "GOOGLE_CLOUD_PROJECT",
267 |             "GOOGLE_APPLICATION_CREDENTIALS",
268 |             "CUA_API_KEY",
269 |             "CUA_SANDBOX_NAME",
270 |         ]
271 | 
272 |         missing_vars = [var for var in required_vars if not os.environ.get(var)]
273 |         if missing_vars:
274 |             raise RuntimeError(
275 |                 f"Missing required environment variables: {', '.join(missing_vars)}\n"
276 |                 f"Please check your .env file and ensure all keys are set.\n"
277 |                 f"See the setup guide for details on configuring Vertex AI credentials."
278 |             )
279 | 
280 |         signal.signal(signal.SIGINT, handle_sigint)
281 | 
282 |         asyncio.run(complex_ui_navigation())
283 | 
284 |     except Exception as e:
285 |         logger.error(f"Error running automation: {e}")
286 |         traceback.print_exc()
287 | 
288 | if __name__ == "__main__":
289 |     main()
290 | ```
291 | 
292 |   </Tab>
293 |   <Tab value="Linux on Docker">
294 | 
295 | ```python
296 | import asyncio
297 | import logging
298 | import os
299 | import signal
300 | import traceback
301 | 
302 | from agent import ComputerAgent
303 | from computer import Computer, VMProviderType
304 | from dotenv import load_dotenv
305 | 
306 | logging.basicConfig(level=logging.INFO)
307 | logger = logging.getLogger(__name__)
308 | 
309 | def handle_sigint(sig, frame):
310 |     print("\n\nExecution interrupted by user. Exiting gracefully...")
311 |     raise SystemExit(0)  # exit() comes from the site module and is intended for interactive use
312 | 
313 | async def complex_ui_navigation():
314 |     """
315 |     Demonstrate Gemini 3's exceptional UI grounding capabilities
316 |     with complex, multi-step navigation tasks.
317 |     """
318 |     try:
319 |         async with Computer(
320 |             os_type="linux",
321 |             provider_type=VMProviderType.DOCKER,
322 |             image="trycua/cua-xfce:latest",
323 |             verbosity=logging.INFO,
324 |         ) as computer:
325 | 
326 |             agent = ComputerAgent(
327 |                 # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
328 |                 model="omniparser+vertex_ai/gemini-3-pro-preview",
329 |                 tools=[computer],
330 |                 only_n_most_recent_images=3,
331 |                 verbosity=logging.INFO,
332 |                 trajectory_dir="trajectories",
333 |                 use_prompt_caching=False,
334 |                 max_trajectory_budget=5.0,
335 |                 # Gemini 3-specific parameters
336 |                 thinking_level="high",  # Enables deeper reasoning (vs "low")
337 |                 media_resolution="high",  # High-resolution image processing (vs "low" or "medium")
338 |             )
339 | 
340 |             # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
341 |             tasks = [
342 |                 {
343 |                     "instruction": (
344 |                         "Go to github.com/trycua/cua. "
345 |                         "Find and click on the 'Issues' tab. "
346 |                         "Then locate and click on the search box within the issues page "
347 |                         "(not the global GitHub search). "
348 |                         "Type 'omniparser' and press Enter."
349 |                     ),
350 |                     "description": "Tests precise UI element distinction in a complex interface",
351 |                 },
352 |             ]
353 | 
354 |             history = []
355 | 
356 |             for i, task_info in enumerate(tasks, 1):
357 |                 task = task_info["instruction"]
358 |                 print(f"\n{'='*60}")
359 |                 print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
360 |                 print(f"{'='*60}")
361 |                 print(f"\nInstruction: {task}\n")
362 | 
363 |                 history.append({"role": "user", "content": task})
364 | 
365 |                 async for result in agent.run(history, stream=False):
366 |                     history += result.get("output", [])
367 | 
368 |                     for item in result.get("output", []):
369 |                         if item.get("type") == "message":
370 |                             content = item.get("content", [])
371 |                             for content_part in content:
372 |                                 if content_part.get("text"):
373 |                                     logger.info(f"Agent: {content_part.get('text')}")
374 |                         elif item.get("type") == "computer_call":
375 |                             action = item.get("action", {})
376 |                             action_type = action.get("type", "")
377 |                             logger.debug(f"Computer Action: {action_type}")
378 | 
379 |                 print(f"\n✅ Task {i}/{len(tasks)} completed")
380 | 
381 |             print("\n🎉 All complex UI navigation tasks completed successfully!")
382 | 
383 |     except Exception as e:
384 |         logger.error(f"Error in complex_ui_navigation: {e}")
385 |         traceback.print_exc()
386 |         raise
387 | 
388 | def main():
389 |     try:
390 |         load_dotenv()
391 | 
392 |         required_vars = [
393 |             "GOOGLE_CLOUD_PROJECT",
394 |             "GOOGLE_APPLICATION_CREDENTIALS",
395 |         ]
396 | 
397 |         missing_vars = [var for var in required_vars if not os.environ.get(var)]
398 |         if missing_vars:
399 |             raise RuntimeError(
400 |                 f"Missing required environment variables: {', '.join(missing_vars)}\n"
401 |                 f"Please check your .env file."
402 |             )
403 | 
404 |         signal.signal(signal.SIGINT, handle_sigint)
405 | 
406 |         asyncio.run(complex_ui_navigation())
407 | 
408 |     except Exception as e:
409 |         logger.error(f"Error running automation: {e}")
410 |         traceback.print_exc()
411 | 
412 | if __name__ == "__main__":
413 |     main()
414 | ```
415 | 
416 |   </Tab>
417 |   <Tab value="macOS Sandbox">
418 | 
419 | ```python
420 | import asyncio
421 | import logging
422 | import os
423 | import signal
424 | import traceback
425 | 
426 | from agent import ComputerAgent
427 | from computer import Computer, VMProviderType
428 | from dotenv import load_dotenv
429 | 
430 | logging.basicConfig(level=logging.INFO)
431 | logger = logging.getLogger(__name__)
432 | 
433 | def handle_sigint(sig, frame):
434 |     print("\n\nExecution interrupted by user. Exiting gracefully...")
435 |     raise SystemExit(0)  # exit() comes from the site module and is intended for interactive use
436 | 
437 | async def complex_ui_navigation():
438 |     """
439 |     Demonstrate Gemini 3's exceptional UI grounding capabilities
440 |     with complex, multi-step navigation tasks.
441 |     """
442 |     try:
443 |         async with Computer(
444 |             os_type="macos",
445 |             provider_type=VMProviderType.LUME,
446 |             name="macos-sequoia-cua:latest",
447 |             verbosity=logging.INFO,
448 |         ) as computer:
449 | 
450 |             agent = ComputerAgent(
451 |                 # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
452 |                 model="omniparser+vertex_ai/gemini-3-pro-preview",
453 |                 tools=[computer],
454 |                 only_n_most_recent_images=3,
455 |                 verbosity=logging.INFO,
456 |                 trajectory_dir="trajectories",
457 |                 use_prompt_caching=False,
458 |                 max_trajectory_budget=5.0,
459 |                 # Gemini 3-specific parameters
460 |                 thinking_level="high",  # Enables deeper reasoning (vs "low")
461 |                 media_resolution="high",  # High-resolution image processing (vs "low" or "medium")
462 |             )
463 | 
464 |             # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
465 |             tasks = [
466 |                 {
467 |                     "instruction": (
468 |                         "Go to github.com/trycua/cua. "
469 |                         "Find and click on the 'Issues' tab. "
470 |                         "Then locate and click on the search box within the issues page "
471 |                         "(not the global GitHub search). "
472 |                         "Type 'omniparser' and press Enter."
473 |                     ),
474 |                     "description": "Tests precise UI element distinction in a complex interface",
475 |                 },
476 |             ]
477 | 
478 |             history = []
479 | 
480 |             for i, task_info in enumerate(tasks, 1):
481 |                 task = task_info["instruction"]
482 |                 print(f"\n{'='*60}")
483 |                 print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
484 |                 print(f"{'='*60}")
485 |                 print(f"\nInstruction: {task}\n")
486 | 
487 |                 history.append({"role": "user", "content": task})
488 | 
489 |                 async for result in agent.run(history, stream=False):
490 |                     history += result.get("output", [])
491 | 
492 |                     for item in result.get("output", []):
493 |                         if item.get("type") == "message":
494 |                             content = item.get("content", [])
495 |                             for content_part in content:
496 |                                 if content_part.get("text"):
497 |                                     logger.info(f"Agent: {content_part.get('text')}")
498 |                         elif item.get("type") == "computer_call":
499 |                             action = item.get("action", {})
500 |                             action_type = action.get("type", "")
501 |                             logger.debug(f"Computer Action: {action_type}")
502 | 
503 |                 print(f"\n✅ Task {i}/{len(tasks)} completed")
504 | 
505 |             print("\n🎉 All complex UI navigation tasks completed successfully!")
506 | 
507 |     except Exception as e:
508 |         logger.error(f"Error in complex_ui_navigation: {e}")
509 |         traceback.print_exc()
510 |         raise
511 | 
512 | def main():
513 |     try:
514 |         load_dotenv()
515 | 
516 |         required_vars = [
517 |             "GOOGLE_CLOUD_PROJECT",
518 |             "GOOGLE_APPLICATION_CREDENTIALS",
519 |         ]
520 | 
521 |         missing_vars = [var for var in required_vars if not os.environ.get(var)]
522 |         if missing_vars:
523 |             raise RuntimeError(
524 |                 f"Missing required environment variables: {', '.join(missing_vars)}\n"
525 |                 f"Please check your .env file."
526 |             )
527 | 
528 |         signal.signal(signal.SIGINT, handle_sigint)
529 | 
530 |         asyncio.run(complex_ui_navigation())
531 | 
532 |     except Exception as e:
533 |         logger.error(f"Error running automation: {e}")
534 |         traceback.print_exc()
535 | 
536 | if __name__ == "__main__":
537 |     main()
538 | ```
539 | 
540 |   </Tab>
541 | </Tabs>
542 | 
543 | </Step>
544 | 
545 | <Step>
546 | 
547 | ### Run Your Script
548 | 
549 | Execute your complex UI navigation automation:
550 | 
551 | ```bash
552 | python gemini_ui_navigation.py
553 | ```
554 | 
555 | The agent will:
556 | 
557 | 1. Navigate to GitHub and locate specific UI elements
558 | 2. Distinguish between similar elements (e.g., global search vs. issues search)
559 | 3. Perform multi-step interactions with visual feedback
560 | 4. Use Gemini 3's advanced reasoning for precise element grounding
561 | 
562 | Monitor the output to see the agent's progress through each task.
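
To make that progress easier to follow, the items the script reads from each `result["output"]` can be condensed into a short summary. Note that the script logs computer actions at DEBUG level while logging is configured at INFO, so those lines do not appear by default. The sketch below is illustrative only: `summarize_output` is a hypothetical helper, and the sample data is hand-written to match the item shapes the script already handles (`message` and `computer_call`), not real agent output.

```python
from collections import Counter


def summarize_output(items):
    """Collect agent message texts and count computer actions by type.

    Hypothetical helper: it only assumes the item shapes that the
    script above already reads ("message" and "computer_call").
    """
    messages = []
    actions = Counter()
    for item in items:
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("text"):
                    messages.append(part["text"])
        elif item.get("type") == "computer_call":
            action_type = item.get("action", {}).get("type", "unknown")
            actions[action_type] += 1
    return messages, actions


# Hand-written sample in the shape the script reads; not real agent output.
sample = [
    {"type": "message", "content": [{"text": "Opening the Issues tab."}]},
    {"type": "computer_call", "action": {"type": "click"}},
    {"type": "computer_call", "action": {"type": "type"}},
    {"type": "computer_call", "action": {"type": "click"}},
]

messages, actions = summarize_output(sample)
print(messages)
print(dict(actions))
```

Inside the script, a helper like this could be called on `result.get("output", [])` within the `async for` loop and its summary printed after each task.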
563 | 
564 | </Step>
565 | 
566 | </Steps>
567 | 
568 | ---
569 | 
570 | ## Understanding Gemini 3-Specific Parameters
571 | 
572 | ### `thinking_level`
573 | 
574 | Controls the amount of internal reasoning the model performs:
575 | 
576 | - `"high"`: Deeper reasoning, better for complex UI navigation (recommended for ScreenSpot-like tasks)
577 | - `"low"`: Faster responses, suitable for simpler tasks
578 | 
579 | ### `media_resolution`
580 | 
581 | Controls vision processing for multimodal inputs:
582 | 
583 | - `"high"`: Best for complex UIs with many small elements (recommended)
584 | - `"medium"`: Balanced quality and speed
585 | - `"low"`: Faster processing for simple interfaces
586 | 
587 | <Callout type="info">
588 |   For tasks requiring precise GUI element location (like ScreenSpot-Pro), use
589 |   `thinking_level="high"` and `media_resolution="high"` for optimal performance.
590 | </Callout>
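
One way to apply this guidance is to pick both settings from a rough judgment of how precise the task needs to be. This is a minimal sketch under stated assumptions: `gemini3_params` and its `precise_grounding` flag are hypothetical names, not part of the Cua SDK. Only the `thinking_level` and `media_resolution` values come from this guide.

```python
def gemini3_params(precise_grounding: bool) -> dict:
    """Return keyword arguments for the Gemini 3-specific settings.

    Hypothetical helper: maps a single yes/no judgment about the task
    onto the two parameters described in this guide.
    """
    if precise_grounding:
        # Dense UIs with many small elements: favor accuracy over speed.
        return {"thinking_level": "high", "media_resolution": "high"}
    # Simple interfaces: favor faster responses.
    return {"thinking_level": "low", "media_resolution": "low"}


params = gemini3_params(precise_grounding=True)
print(params)
```

The resulting dictionary could then be unpacked into the agent constructor shown earlier, for example `ComputerAgent(..., **gemini3_params(True))`.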
591 | 
592 | ---
593 | 
594 | ## Benchmark Performance
595 | 
596 | Gemini 3 Pro's performance on ScreenSpot-Pro demonstrates its exceptional UI grounding capabilities:
597 | 
598 | | Model             | ScreenSpot-Pro Score |
599 | | ----------------- | -------------------- |
600 | | **Gemini 3 Pro**  | **72.7%**            |
601 | | Claude Sonnet 4.5 | 36.2%                |
602 | | Gemini 2.5 Pro    | 11.4%                |
603 | | GPT-5.1           | 3.5%                 |
604 | 
605 | On this benchmark, Gemini 3 Pro scores roughly twice as high as the next-best model listed, which makes it a strong choice for complex UI navigation, element detection, and professional GUI automation tasks.
606 | 
607 | ---
608 | 
609 | ## Troubleshooting
610 | 
611 | ### Authentication Issues
612 | 
613 | If you encounter authentication errors:
614 | 
615 | 1. Verify your service account JSON key path is correct
616 | 2. Ensure the service account has the **Vertex AI User** role
617 | 3. Check that the Vertex AI API is enabled in your project
618 | 4. Confirm your `GOOGLE_CLOUD_PROJECT` matches your actual project ID
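
The local parts of this checklist (items 1 and 4) can be verified before running the agent. The following is a sketch, not part of the Cua SDK: `check_vertex_credentials` is a hypothetical helper that uses only the standard library. It relies on service account key files containing a `project_id` field. A mismatch is reported only as a hint, since a service account from one project can be granted access to another. Role and API checks (items 2 and 3) still need the Google Cloud console or `gcloud`.

```python
import json
import os
from pathlib import Path


def check_vertex_credentials(env=None):
    """Return a list of problems found; an empty list means the local setup looks plausible."""
    env = os.environ if env is None else env
    problems = []

    project = env.get("GOOGLE_CLOUD_PROJECT")
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")

    if not project:
        problems.append("GOOGLE_CLOUD_PROJECT is not set")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
        return problems

    path = Path(key_path)
    if not path.is_file():
        problems.append(f"Key file not found: {path}")
        return problems

    try:
        key = json.loads(path.read_text())
    except (OSError, ValueError):
        # ValueError covers both invalid JSON and undecodable bytes.
        problems.append(f"Key file is not readable JSON: {path}")
        return problems

    if not isinstance(key, dict):
        problems.append(f"Key file does not contain a JSON object: {path}")
        return problems

    key_project = key.get("project_id")
    if project and key_project and key_project != project:
        problems.append(
            f"Key file project_id '{key_project}' differs from "
            f"GOOGLE_CLOUD_PROJECT '{project}' (worth double-checking)"
        )
    return problems


if __name__ == "__main__":
    for problem in check_vertex_credentials():
        print(f"- {problem}")
```

Calling the function after `load_dotenv()` in `main()` would surface these issues before the sandbox starts.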
619 | 
620 | ### "Vertex AI API not enabled" Error
621 | 
622 | Run this command to enable the API:
623 | 
624 | ```bash
625 | gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
626 | ```
627 | 
628 | ### Billing Issues
629 | 
630 | Ensure billing is enabled for your Google Cloud project. Visit the [Billing section](https://console.cloud.google.com/billing) to verify.
631 | 
632 | ---
633 | 
634 | ## Next Steps
635 | 
636 | - Learn more about [OmniParser agent loops](/agent-sdk/agent-loops)
637 | - Explore [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing)
638 | - Read about the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding)
639 | - Check out [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/)
640 | - Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help
641 | 
```