# Directory Structure
```
├── .cursorignore
├── .dockerignore
├── .editorconfig
├── .gitattributes
├── .github
│ ├── FUNDING.yml
│ ├── scripts
│ │ ├── get_pyproject_version.py
│ │ └── tests
│ │ ├── __init__.py
│ │ ├── README.md
│ │ └── test_get_pyproject_version.py
│ └── workflows
│ ├── bump-version.yml
│ ├── ci-lume.yml
│ ├── docker-publish-cua-linux.yml
│ ├── docker-publish-cua-windows.yml
│ ├── docker-publish-kasm.yml
│ ├── docker-publish-xfce.yml
│ ├── docker-reusable-publish.yml
│ ├── link-check.yml
│ ├── lint.yml
│ ├── npm-publish-cli.yml
│ ├── npm-publish-computer.yml
│ ├── npm-publish-core.yml
│ ├── publish-lume.yml
│ ├── pypi-publish-agent.yml
│ ├── pypi-publish-computer-server.yml
│ ├── pypi-publish-computer.yml
│ ├── pypi-publish-core.yml
│ ├── pypi-publish-mcp-server.yml
│ ├── pypi-publish-som.yml
│ ├── pypi-reusable-publish.yml
│ ├── python-tests.yml
│ ├── test-cua-models.yml
│ └── test-validation-script.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .prettierignore
├── .prettierrc.yaml
├── .vscode
│ ├── docs.code-workspace
│ ├── extensions.json
│ ├── launch.json
│ ├── libs-ts.code-workspace
│ ├── lume.code-workspace
│ ├── lumier.code-workspace
│ ├── py.code-workspace
│ └── settings.json
├── blog
│ ├── app-use.md
│ ├── assets
│ │ ├── composite-agents.png
│ │ ├── docker-ubuntu-support.png
│ │ ├── hack-booth.png
│ │ ├── hack-closing-ceremony.jpg
│ │ ├── hack-cua-ollama-hud.jpeg
│ │ ├── hack-leaderboard.png
│ │ ├── hack-the-north.png
│ │ ├── hack-winners.jpeg
│ │ ├── hack-workshop.jpeg
│ │ ├── hud-agent-evals.png
│ │ └── trajectory-viewer.jpeg
│ ├── bringing-computer-use-to-the-web.md
│ ├── build-your-own-operator-on-macos-1.md
│ ├── build-your-own-operator-on-macos-2.md
│ ├── cloud-windows-ga-macos-preview.md
│ ├── composite-agents.md
│ ├── computer-use-agents-for-growth-hacking.md
│ ├── cua-hackathon.md
│ ├── cua-playground-preview.md
│ ├── cua-vlm-router.md
│ ├── hack-the-north.md
│ ├── hud-agent-evals.md
│ ├── human-in-the-loop.md
│ ├── introducing-cua-cli.md
│ ├── introducing-cua-cloud-containers.md
│ ├── lume-to-containerization.md
│ ├── neurips-2025-cua-papers.md
│ ├── sandboxed-python-execution.md
│ ├── training-computer-use-models-trajectories-1.md
│ ├── trajectory-viewer.md
│ ├── ubuntu-docker-support.md
│ └── windows-sandbox.md
├── CONTRIBUTING.md
├── Development.md
├── Dockerfile
├── docs
│ ├── .env.example
│ ├── .gitignore
│ ├── content
│ │ └── docs
│ │ ├── agent-sdk
│ │ │ ├── agent-loops.mdx
│ │ │ ├── benchmarks
│ │ │ │ ├── index.mdx
│ │ │ │ ├── interactive.mdx
│ │ │ │ ├── introduction.mdx
│ │ │ │ ├── meta.json
│ │ │ │ ├── osworld-verified.mdx
│ │ │ │ ├── screenspot-pro.mdx
│ │ │ │ └── screenspot-v2.mdx
│ │ │ ├── callbacks
│ │ │ │ ├── agent-lifecycle.mdx
│ │ │ │ ├── cost-saving.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ ├── logging.mdx
│ │ │ │ ├── meta.json
│ │ │ │ ├── pii-anonymization.mdx
│ │ │ │ └── trajectories.mdx
│ │ │ ├── chat-history.mdx
│ │ │ ├── custom-tools.mdx
│ │ │ ├── customizing-computeragent.mdx
│ │ │ ├── integrations
│ │ │ │ ├── hud.mdx
│ │ │ │ ├── meta.json
│ │ │ │ └── observability.mdx
│ │ │ ├── mcp-server
│ │ │ │ ├── client-integrations.mdx
│ │ │ │ ├── configuration.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ ├── installation.mdx
│ │ │ │ ├── llm-integrations.mdx
│ │ │ │ ├── meta.json
│ │ │ │ ├── tools.mdx
│ │ │ │ └── usage.mdx
│ │ │ ├── message-format.mdx
│ │ │ ├── meta.json
│ │ │ ├── migration-guide.mdx
│ │ │ ├── prompt-caching.mdx
│ │ │ ├── supported-agents
│ │ │ │ ├── composed-agents.mdx
│ │ │ │ ├── computer-use-agents.mdx
│ │ │ │ ├── grounding-models.mdx
│ │ │ │ ├── human-in-the-loop.mdx
│ │ │ │ └── meta.json
│ │ │ ├── supported-model-providers
│ │ │ │ ├── cua-vlm-router.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ └── local-models.mdx
│ │ │ ├── telemetry.mdx
│ │ │ └── usage-tracking.mdx
│ │ ├── cli-playbook
│ │ │ ├── commands.mdx
│ │ │ ├── index.mdx
│ │ │ └── meta.json
│ │ ├── computer-sdk
│ │ │ ├── cloud-vm-management.mdx
│ │ │ ├── commands.mdx
│ │ │ ├── computer-server
│ │ │ │ ├── Commands.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ ├── meta.json
│ │ │ │ ├── REST-API.mdx
│ │ │ │ └── WebSocket-API.mdx
│ │ │ ├── computer-ui.mdx
│ │ │ ├── computers.mdx
│ │ │ ├── custom-computer-handlers.mdx
│ │ │ ├── meta.json
│ │ │ ├── sandboxed-python.mdx
│ │ │ └── tracing-api.mdx
│ │ ├── example-usecases
│ │ │ ├── form-filling.mdx
│ │ │ ├── gemini-complex-ui-navigation.mdx
│ │ │ ├── meta.json
│ │ │ ├── post-event-contact-export.mdx
│ │ │ └── windows-app-behind-vpn.mdx
│ │ ├── get-started
│ │ │ ├── meta.json
│ │ │ └── quickstart.mdx
│ │ ├── index.mdx
│ │ ├── macos-vm-cli-playbook
│ │ │ ├── lume
│ │ │ │ ├── cli-reference.mdx
│ │ │ │ ├── faq.md
│ │ │ │ ├── http-api.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ ├── installation.mdx
│ │ │ │ ├── meta.json
│ │ │ │ └── prebuilt-images.mdx
│ │ │ ├── lumier
│ │ │ │ ├── building-lumier.mdx
│ │ │ │ ├── docker-compose.mdx
│ │ │ │ ├── docker.mdx
│ │ │ │ ├── index.mdx
│ │ │ │ ├── installation.mdx
│ │ │ │ └── meta.json
│ │ │ └── meta.json
│ │ └── meta.json
│ ├── next.config.mjs
│ ├── package-lock.json
│ ├── package.json
│ ├── pnpm-lock.yaml
│ ├── postcss.config.mjs
│ ├── public
│ │ └── img
│ │ ├── agent_gradio_ui.png
│ │ ├── agent.png
│ │ ├── bg-dark.jpg
│ │ ├── bg-light.jpg
│ │ ├── cli.png
│ │ ├── computer.png
│ │ ├── grounding-with-gemini3.gif
│ │ ├── hero.png
│ │ ├── laminar_trace_example.png
│ │ ├── som_box_threshold.png
│ │ └── som_iou_threshold.png
│ ├── README.md
│ ├── source.config.ts
│ ├── src
│ │ ├── app
│ │ │ ├── (home)
│ │ │ │ ├── [[...slug]]
│ │ │ │ │ └── page.tsx
│ │ │ │ └── layout.tsx
│ │ │ ├── api
│ │ │ │ ├── posthog
│ │ │ │ │ └── [...path]
│ │ │ │ │ └── route.ts
│ │ │ │ └── search
│ │ │ │ └── route.ts
│ │ │ ├── favicon.ico
│ │ │ ├── global.css
│ │ │ ├── layout.config.tsx
│ │ │ ├── layout.tsx
│ │ │ ├── llms.mdx
│ │ │ │ └── [[...slug]]
│ │ │ │ └── route.ts
│ │ │ ├── llms.txt
│ │ │ │ └── route.ts
│ │ │ ├── robots.ts
│ │ │ └── sitemap.ts
│ │ ├── assets
│ │ │ ├── discord-black.svg
│ │ │ ├── discord-white.svg
│ │ │ ├── logo-black.svg
│ │ │ └── logo-white.svg
│ │ ├── components
│ │ │ ├── analytics-tracker.tsx
│ │ │ ├── cookie-consent.tsx
│ │ │ ├── doc-actions-menu.tsx
│ │ │ ├── editable-code-block.tsx
│ │ │ ├── footer.tsx
│ │ │ ├── hero.tsx
│ │ │ ├── iou.tsx
│ │ │ ├── mermaid.tsx
│ │ │ └── page-feedback.tsx
│ │ ├── lib
│ │ │ ├── llms.ts
│ │ │ └── source.ts
│ │ ├── mdx-components.tsx
│ │ └── providers
│ │ └── posthog-provider.tsx
│ └── tsconfig.json
├── examples
│ ├── agent_examples.py
│ ├── agent_ui_examples.py
│ ├── browser_tool_example.py
│ ├── cloud_api_examples.py
│ ├── computer_examples_windows.py
│ ├── computer_examples.py
│ ├── computer_ui_examples.py
│ ├── computer-example-ts
│ │ ├── .env.example
│ │ ├── .gitignore
│ │ ├── package-lock.json
│ │ ├── package.json
│ │ ├── pnpm-lock.yaml
│ │ ├── README.md
│ │ ├── src
│ │ │ ├── helpers.ts
│ │ │ └── index.ts
│ │ └── tsconfig.json
│ ├── docker_examples.py
│ ├── evals
│ │ ├── hud_eval_examples.py
│ │ └── wikipedia_most_linked.txt
│ ├── pylume_examples.py
│ ├── sandboxed_functions_examples.py
│ ├── som_examples.py
│ ├── tracing_examples.py
│ ├── utils.py
│ └── winsandbox_example.py
├── img
│ ├── agent_gradio_ui.png
│ ├── agent.png
│ ├── cli.png
│ ├── computer.png
│ ├── logo_black.png
│ └── logo_white.png
├── libs
│ ├── kasm
│ │ ├── Dockerfile
│ │ ├── LICENSE
│ │ ├── README.md
│ │ └── src
│ │ └── ubuntu
│ │ └── install
│ │ └── firefox
│ │ ├── custom_startup.sh
│ │ ├── firefox.desktop
│ │ └── install_firefox.sh
│ ├── lume
│ │ ├── .cursorignore
│ │ ├── CONTRIBUTING.md
│ │ ├── Development.md
│ │ ├── img
│ │ │ └── cli.png
│ │ ├── Package.resolved
│ │ ├── Package.swift
│ │ ├── README.md
│ │ ├── resources
│ │ │ └── lume.entitlements
│ │ ├── scripts
│ │ │ ├── build
│ │ │ │ ├── build-debug.sh
│ │ │ │ ├── build-release-notarized.sh
│ │ │ │ └── build-release.sh
│ │ │ └── install.sh
│ │ ├── src
│ │ │ ├── Commands
│ │ │ │ ├── Clone.swift
│ │ │ │ ├── Config.swift
│ │ │ │ ├── Create.swift
│ │ │ │ ├── Delete.swift
│ │ │ │ ├── Get.swift
│ │ │ │ ├── Images.swift
│ │ │ │ ├── IPSW.swift
│ │ │ │ ├── List.swift
│ │ │ │ ├── Logs.swift
│ │ │ │ ├── Options
│ │ │ │ │ └── FormatOption.swift
│ │ │ │ ├── Prune.swift
│ │ │ │ ├── Pull.swift
│ │ │ │ ├── Push.swift
│ │ │ │ ├── Run.swift
│ │ │ │ ├── Serve.swift
│ │ │ │ ├── Set.swift
│ │ │ │ └── Stop.swift
│ │ │ ├── ContainerRegistry
│ │ │ │ ├── ImageContainerRegistry.swift
│ │ │ │ ├── ImageList.swift
│ │ │ │ └── ImagesPrinter.swift
│ │ │ ├── Errors
│ │ │ │ └── Errors.swift
│ │ │ ├── FileSystem
│ │ │ │ ├── Home.swift
│ │ │ │ ├── Settings.swift
│ │ │ │ ├── VMConfig.swift
│ │ │ │ ├── VMDirectory.swift
│ │ │ │ └── VMLocation.swift
│ │ │ ├── LumeController.swift
│ │ │ ├── Main.swift
│ │ │ ├── Server
│ │ │ │ ├── Handlers.swift
│ │ │ │ ├── HTTP.swift
│ │ │ │ ├── Requests.swift
│ │ │ │ ├── Responses.swift
│ │ │ │ └── Server.swift
│ │ │ ├── Utils
│ │ │ │ ├── CommandRegistry.swift
│ │ │ │ ├── CommandUtils.swift
│ │ │ │ ├── Logger.swift
│ │ │ │ ├── NetworkUtils.swift
│ │ │ │ ├── Path.swift
│ │ │ │ ├── ProcessRunner.swift
│ │ │ │ ├── ProgressLogger.swift
│ │ │ │ ├── String.swift
│ │ │ │ └── Utils.swift
│ │ │ ├── Virtualization
│ │ │ │ ├── DarwinImageLoader.swift
│ │ │ │ ├── DHCPLeaseParser.swift
│ │ │ │ ├── ImageLoaderFactory.swift
│ │ │ │ └── VMVirtualizationService.swift
│ │ │ ├── VM
│ │ │ │ ├── DarwinVM.swift
│ │ │ │ ├── LinuxVM.swift
│ │ │ │ ├── VM.swift
│ │ │ │ ├── VMDetails.swift
│ │ │ │ ├── VMDetailsPrinter.swift
│ │ │ │ ├── VMDisplayResolution.swift
│ │ │ │ └── VMFactory.swift
│ │ │ └── VNC
│ │ │ ├── PassphraseGenerator.swift
│ │ │ └── VNCService.swift
│ │ └── tests
│ │ ├── Mocks
│ │ │ ├── MockVM.swift
│ │ │ ├── MockVMVirtualizationService.swift
│ │ │ └── MockVNCService.swift
│ │ ├── VM
│ │ │ └── VMDetailsPrinterTests.swift
│ │ ├── VMTests.swift
│ │ ├── VMVirtualizationServiceTests.swift
│ │ └── VNCServiceTests.swift
│ ├── lumier
│ │ ├── .dockerignore
│ │ ├── Dockerfile
│ │ ├── README.md
│ │ └── src
│ │ ├── bin
│ │ │ └── entry.sh
│ │ ├── config
│ │ │ └── constants.sh
│ │ ├── hooks
│ │ │ └── on-logon.sh
│ │ └── lib
│ │ ├── utils.sh
│ │ └── vm.sh
│ ├── python
│ │ ├── agent
│ │ │ ├── .bumpversion.cfg
│ │ │ ├── agent
│ │ │ │ ├── __init__.py
│ │ │ │ ├── __main__.py
│ │ │ │ ├── adapters
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── cua_adapter.py
│ │ │ │ │ ├── huggingfacelocal_adapter.py
│ │ │ │ │ ├── human_adapter.py
│ │ │ │ │ ├── mlxvlm_adapter.py
│ │ │ │ │ └── models
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── generic.py
│ │ │ │ │ ├── internvl.py
│ │ │ │ │ ├── opencua.py
│ │ │ │ │ └── qwen2_5_vl.py
│ │ │ │ ├── agent.py
│ │ │ │ ├── callbacks
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── budget_manager.py
│ │ │ │ │ ├── image_retention.py
│ │ │ │ │ ├── logging.py
│ │ │ │ │ ├── operator_validator.py
│ │ │ │ │ ├── pii_anonymization.py
│ │ │ │ │ ├── prompt_instructions.py
│ │ │ │ │ ├── telemetry.py
│ │ │ │ │ └── trajectory_saver.py
│ │ │ │ ├── cli.py
│ │ │ │ ├── computers
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── cua.py
│ │ │ │ │ └── custom.py
│ │ │ │ ├── decorators.py
│ │ │ │ ├── human_tool
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── __main__.py
│ │ │ │ │ ├── server.py
│ │ │ │ │ └── ui.py
│ │ │ │ ├── integrations
│ │ │ │ │ └── hud
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── agent.py
│ │ │ │ │ └── proxy.py
│ │ │ │ ├── loops
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── anthropic.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── composed_grounded.py
│ │ │ │ │ ├── gelato.py
│ │ │ │ │ ├── gemini.py
│ │ │ │ │ ├── generic_vlm.py
│ │ │ │ │ ├── glm45v.py
│ │ │ │ │ ├── gta1.py
│ │ │ │ │ ├── holo.py
│ │ │ │ │ ├── internvl.py
│ │ │ │ │ ├── model_types.csv
│ │ │ │ │ ├── moondream3.py
│ │ │ │ │ ├── omniparser.py
│ │ │ │ │ ├── openai.py
│ │ │ │ │ ├── opencua.py
│ │ │ │ │ ├── uiins.py
│ │ │ │ │ ├── uitars.py
│ │ │ │ │ └── uitars2.py
│ │ │ │ ├── proxy
│ │ │ │ │ ├── examples.py
│ │ │ │ │ └── handlers.py
│ │ │ │ ├── responses.py
│ │ │ │ ├── tools
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── browser_tool.py
│ │ │ │ ├── types.py
│ │ │ │ └── ui
│ │ │ │ ├── __init__.py
│ │ │ │ ├── __main__.py
│ │ │ │ └── gradio
│ │ │ │ ├── __init__.py
│ │ │ │ ├── app.py
│ │ │ │ └── ui_components.py
│ │ │ ├── benchmarks
│ │ │ │ ├── .gitignore
│ │ │ │ ├── contrib.md
│ │ │ │ ├── interactive.py
│ │ │ │ ├── models
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ └── gta1.py
│ │ │ │ ├── README.md
│ │ │ │ ├── ss-pro.py
│ │ │ │ ├── ss-v2.py
│ │ │ │ └── utils.py
│ │ │ ├── example.py
│ │ │ ├── pyproject.toml
│ │ │ ├── README.md
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_computer_agent.py
│ │ ├── bench-ui
│ │ │ ├── bench_ui
│ │ │ │ ├── __init__.py
│ │ │ │ ├── api.py
│ │ │ │ └── child.py
│ │ │ ├── examples
│ │ │ │ ├── folder_example.py
│ │ │ │ ├── gui
│ │ │ │ │ ├── index.html
│ │ │ │ │ ├── logo.svg
│ │ │ │ │ └── styles.css
│ │ │ │ ├── output_overlay.png
│ │ │ │ └── simple_example.py
│ │ │ ├── pyproject.toml
│ │ │ ├── README.md
│ │ │ └── tests
│ │ │ └── test_port_detection.py
│ │ ├── computer
│ │ │ ├── .bumpversion.cfg
│ │ │ ├── computer
│ │ │ │ ├── __init__.py
│ │ │ │ ├── computer.py
│ │ │ │ ├── diorama_computer.py
│ │ │ │ ├── helpers.py
│ │ │ │ ├── interface
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── factory.py
│ │ │ │ │ ├── generic.py
│ │ │ │ │ ├── linux.py
│ │ │ │ │ ├── macos.py
│ │ │ │ │ ├── models.py
│ │ │ │ │ └── windows.py
│ │ │ │ ├── logger.py
│ │ │ │ ├── models.py
│ │ │ │ ├── providers
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── cloud
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── provider.py
│ │ │ │ │ ├── docker
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── provider.py
│ │ │ │ │ ├── factory.py
│ │ │ │ │ ├── lume
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── provider.py
│ │ │ │ │ ├── lume_api.py
│ │ │ │ │ ├── lumier
│ │ │ │ │ │ ├── __init__.py
│ │ │ │ │ │ └── provider.py
│ │ │ │ │ ├── types.py
│ │ │ │ │ └── winsandbox
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── provider.py
│ │ │ │ │ └── setup_script.ps1
│ │ │ │ ├── tracing_wrapper.py
│ │ │ │ ├── tracing.py
│ │ │ │ ├── ui
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── __main__.py
│ │ │ │ │ └── gradio
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── app.py
│ │ │ │ └── utils.py
│ │ │ ├── poetry.toml
│ │ │ ├── pyproject.toml
│ │ │ ├── README.md
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_computer.py
│ │ ├── computer-server
│ │ │ ├── .bumpversion.cfg
│ │ │ ├── computer_server
│ │ │ │ ├── __init__.py
│ │ │ │ ├── __main__.py
│ │ │ │ ├── browser.py
│ │ │ │ ├── cli.py
│ │ │ │ ├── diorama
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── diorama_computer.py
│ │ │ │ │ ├── diorama.py
│ │ │ │ │ ├── draw.py
│ │ │ │ │ ├── macos.py
│ │ │ │ │ └── safezone.py
│ │ │ │ ├── handlers
│ │ │ │ │ ├── base.py
│ │ │ │ │ ├── factory.py
│ │ │ │ │ ├── generic.py
│ │ │ │ │ ├── linux.py
│ │ │ │ │ ├── macos.py
│ │ │ │ │ └── windows.py
│ │ │ │ ├── main.py
│ │ │ │ ├── server.py
│ │ │ │ ├── utils
│ │ │ │ │ ├── __init__.py
│ │ │ │ │ └── wallpaper.py
│ │ │ │ └── watchdog.py
│ │ │ ├── examples
│ │ │ │ ├── __init__.py
│ │ │ │ └── usage_example.py
│ │ │ ├── pyproject.toml
│ │ │ ├── README.md
│ │ │ ├── run_server.py
│ │ │ ├── test_connection.py
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_server.py
│ │ ├── core
│ │ │ ├── .bumpversion.cfg
│ │ │ ├── core
│ │ │ │ ├── __init__.py
│ │ │ │ └── telemetry
│ │ │ │ ├── __init__.py
│ │ │ │ └── posthog.py
│ │ │ ├── poetry.toml
│ │ │ ├── pyproject.toml
│ │ │ ├── README.md
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_telemetry.py
│ │ ├── mcp-server
│ │ │ ├── .bumpversion.cfg
│ │ │ ├── build-extension.py
│ │ │ ├── CONCURRENT_SESSIONS.md
│ │ │ ├── desktop-extension
│ │ │ │ ├── cua-extension.mcpb
│ │ │ │ ├── desktop_extension.png
│ │ │ │ ├── manifest.json
│ │ │ │ ├── README.md
│ │ │ │ ├── requirements.txt
│ │ │ │ ├── run_server.sh
│ │ │ │ └── setup.py
│ │ │ ├── mcp_server
│ │ │ │ ├── __init__.py
│ │ │ │ ├── __main__.py
│ │ │ │ ├── server.py
│ │ │ │ └── session_manager.py
│ │ │ ├── pdm.lock
│ │ │ ├── pyproject.toml
│ │ │ ├── QUICK_TEST_COMMANDS.sh
│ │ │ ├── quick_test_local_option.py
│ │ │ ├── README.md
│ │ │ ├── scripts
│ │ │ │ ├── install_mcp_server.sh
│ │ │ │ └── start_mcp_server.sh
│ │ │ ├── test_mcp_server_local_option.py
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_mcp_server.py
│ │ ├── pylume
│ │ │ └── tests
│ │ │ ├── conftest.py
│ │ │ └── test_pylume.py
│ │ └── som
│ │ ├── .bumpversion.cfg
│ │ ├── LICENSE
│ │ ├── poetry.toml
│ │ ├── pyproject.toml
│ │ ├── README.md
│ │ ├── som
│ │ │ ├── __init__.py
│ │ │ ├── detect.py
│ │ │ ├── detection.py
│ │ │ ├── models.py
│ │ │ ├── ocr.py
│ │ │ ├── util
│ │ │ │ └── utils.py
│ │ │ └── visualization.py
│ │ └── tests
│ │ ├── conftest.py
│ │ └── test_omniparser.py
│ ├── qemu-docker
│ │ ├── linux
│ │ │ ├── Dockerfile
│ │ │ ├── README.md
│ │ │ └── src
│ │ │ ├── entry.sh
│ │ │ └── vm
│ │ │ ├── image
│ │ │ │ └── README.md
│ │ │ └── setup
│ │ │ ├── install.sh
│ │ │ ├── setup-cua-server.sh
│ │ │ └── setup.sh
│ │ ├── README.md
│ │ └── windows
│ │ ├── Dockerfile
│ │ ├── README.md
│ │ └── src
│ │ ├── entry.sh
│ │ └── vm
│ │ ├── image
│ │ │ └── README.md
│ │ └── setup
│ │ ├── install.bat
│ │ ├── on-logon.ps1
│ │ ├── setup-cua-server.ps1
│ │ ├── setup-utils.psm1
│ │ └── setup.ps1
│ ├── typescript
│ │ ├── .gitignore
│ │ ├── .nvmrc
│ │ ├── agent
│ │ │ ├── examples
│ │ │ │ ├── playground-example.html
│ │ │ │ └── README.md
│ │ │ ├── package.json
│ │ │ ├── README.md
│ │ │ ├── src
│ │ │ │ ├── client.ts
│ │ │ │ ├── index.ts
│ │ │ │ └── types.ts
│ │ │ ├── tests
│ │ │ │ └── client.test.ts
│ │ │ ├── tsconfig.json
│ │ │ ├── tsdown.config.ts
│ │ │ └── vitest.config.ts
│ │ ├── computer
│ │ │ ├── .editorconfig
│ │ │ ├── .gitattributes
│ │ │ ├── .gitignore
│ │ │ ├── LICENSE
│ │ │ ├── package.json
│ │ │ ├── README.md
│ │ │ ├── src
│ │ │ │ ├── computer
│ │ │ │ │ ├── index.ts
│ │ │ │ │ ├── providers
│ │ │ │ │ │ ├── base.ts
│ │ │ │ │ │ ├── cloud.ts
│ │ │ │ │ │ └── index.ts
│ │ │ │ │ └── types.ts
│ │ │ │ ├── index.ts
│ │ │ │ ├── interface
│ │ │ │ │ ├── base.ts
│ │ │ │ │ ├── factory.ts
│ │ │ │ │ ├── index.ts
│ │ │ │ │ ├── linux.ts
│ │ │ │ │ ├── macos.ts
│ │ │ │ │ └── windows.ts
│ │ │ │ └── types.ts
│ │ │ ├── tests
│ │ │ │ ├── computer
│ │ │ │ │ └── cloud.test.ts
│ │ │ │ ├── interface
│ │ │ │ │ ├── factory.test.ts
│ │ │ │ │ ├── index.test.ts
│ │ │ │ │ ├── linux.test.ts
│ │ │ │ │ ├── macos.test.ts
│ │ │ │ │ └── windows.test.ts
│ │ │ │ └── setup.ts
│ │ │ ├── tsconfig.json
│ │ │ ├── tsdown.config.ts
│ │ │ └── vitest.config.ts
│ │ ├── core
│ │ │ ├── .editorconfig
│ │ │ ├── .gitattributes
│ │ │ ├── .gitignore
│ │ │ ├── LICENSE
│ │ │ ├── package.json
│ │ │ ├── README.md
│ │ │ ├── src
│ │ │ │ ├── index.ts
│ │ │ │ └── telemetry
│ │ │ │ ├── clients
│ │ │ │ │ ├── index.ts
│ │ │ │ │ └── posthog.ts
│ │ │ │ └── index.ts
│ │ │ ├── tests
│ │ │ │ └── telemetry.test.ts
│ │ │ ├── tsconfig.json
│ │ │ ├── tsdown.config.ts
│ │ │ └── vitest.config.ts
│ │ ├── cua-cli
│ │ │ ├── .gitignore
│ │ │ ├── .prettierrc
│ │ │ ├── bun.lock
│ │ │ ├── CLAUDE.md
│ │ │ ├── index.ts
│ │ │ ├── package.json
│ │ │ ├── README.md
│ │ │ ├── src
│ │ │ │ ├── auth.ts
│ │ │ │ ├── cli.ts
│ │ │ │ ├── commands
│ │ │ │ │ ├── auth.ts
│ │ │ │ │ └── sandbox.ts
│ │ │ │ ├── config.ts
│ │ │ │ ├── http.ts
│ │ │ │ ├── storage.ts
│ │ │ │ └── util.ts
│ │ │ └── tsconfig.json
│ │ ├── package.json
│ │ ├── pnpm-lock.yaml
│ │ ├── pnpm-workspace.yaml
│ │ └── README.md
│ └── xfce
│ ├── .dockerignore
│ ├── .gitignore
│ ├── Development.md
│ ├── Dockerfile
│ ├── Dockerfile.dev
│ ├── README.md
│ └── src
│ ├── scripts
│ │ ├── resize-display.sh
│ │ ├── start-computer-server.sh
│ │ ├── start-novnc.sh
│ │ ├── start-vnc.sh
│ │ └── xstartup.sh
│ ├── supervisor
│ │ └── supervisord.conf
│ └── xfce-config
│ ├── helpers.rc
│ ├── xfce4-power-manager.xml
│ └── xfce4-session.xml
├── LICENSE.md
├── Makefile
├── notebooks
│ ├── agent_nb.ipynb
│ ├── blog
│ │ ├── build-your-own-operator-on-macos-1.ipynb
│ │ └── build-your-own-operator-on-macos-2.ipynb
│ ├── composite_agents_docker_nb.ipynb
│ ├── computer_nb.ipynb
│ ├── computer_server_nb.ipynb
│ ├── customizing_computeragent.ipynb
│ ├── eval_osworld.ipynb
│ ├── ollama_nb.ipynb
│ ├── README.md
│ ├── sota_hackathon_cloud.ipynb
│ └── sota_hackathon.ipynb
├── package-lock.json
├── package.json
├── pnpm-lock.yaml
├── pyproject.toml
├── pyrightconfig.json
├── README.md
├── scripts
│ ├── install-cli.ps1
│ ├── install-cli.sh
│ ├── playground-docker.sh
│ ├── playground.sh
│ ├── run-docker-dev.sh
│ └── typescript-typecheck.js
├── TESTING.md
├── tests
│ ├── agent_loop_testing
│ │ ├── agent_test.py
│ │ └── README.md
│ ├── pytest.ini
│ ├── shell_cmd.py
│ ├── test_files.py
│ ├── test_mcp_server_session_management.py
│ ├── test_mcp_server_streaming.py
│ ├── test_shell_bash.py
│ ├── test_telemetry.py
│ ├── test_tracing.py
│ ├── test_venv.py
│ └── test_watchdog.py
└── uv.lock
```
# Files
--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/lume/provider.py:
--------------------------------------------------------------------------------
```python
1 | """Lume VM provider implementation using curl commands.
2 |
3 | This provider uses direct curl commands to interact with the Lume API,
4 | removing the dependency on the pylume Python package.
5 | """
6 |
7 | import asyncio
8 | import json
9 | import logging
10 | import os
11 | import re
12 | import subprocess
13 | import urllib.parse
14 | from typing import Any, Dict, List, Optional, Tuple
15 |
16 | from ...logger import Logger, LogLevel
17 | from ..base import BaseVMProvider, VMProviderType
18 | from ..lume_api import (
19 | HAS_CURL,
20 | lume_api_get,
21 | lume_api_pull,
22 | lume_api_run,
23 | lume_api_stop,
24 | lume_api_update,
25 | parse_memory,
26 | )
27 |
28 | # Setup logging
29 | logger = logging.getLogger(__name__)
30 |
31 |
32 | class LumeProvider(BaseVMProvider):
33 | """Lume VM provider implementation using direct curl commands.
34 |
35 | This provider uses curl to interact with the Lume API server,
36 | removing the dependency on the pylume Python package.
37 | """
38 |
39 | def __init__(
40 | self,
41 | provider_port: int = 7777,
42 | host: str = "localhost",
43 | storage: Optional[str] = None,
44 | verbose: bool = False,
45 | ephemeral: bool = False,
46 | ):
47 | """Initialize the Lume provider.
48 |
49 | Args:
50 | provider_port: Port for the Lume API server (default: 7777)
51 | host: Host to use for API connections (default: localhost)
52 | storage: Path to store VM data
53 | verbose: Enable verbose logging
54 | """
55 | if not HAS_CURL:
56 | raise ImportError(
57 | "curl is required for LumeProvider. "
58 | "Please ensure it is installed and in your PATH."
59 | )
60 |
61 | self.host = host
62 | self.port = provider_port # Default port for Lume API
63 | self.storage = storage
64 | self.verbose = verbose
65 | self.ephemeral = ephemeral # If True, VMs will be deleted after stopping
66 |
67 | # Base API URL for Lume API calls
68 | self.api_base_url = f"http://{self.host}:{self.port}"
69 |
70 | self.logger = logging.getLogger(__name__)
71 |
72 | @property
73 | def provider_type(self) -> VMProviderType:
74 | """Get the provider type."""
75 | return VMProviderType.LUME
76 |
77 | async def __aenter__(self):
78 | """Enter async context manager."""
79 | # No initialization needed, just return self
80 | return self
81 |
82 | async def __aexit__(self, exc_type, exc_val, exc_tb):
83 | """Exit async context manager."""
84 | # No cleanup needed
85 | pass
86 |
87 | def _lume_api_get(
88 | self, vm_name: str = "", storage: Optional[str] = None, debug: bool = False
89 | ) -> Dict[str, Any]:
90 | """Get VM information using shared lume_api function.
91 |
92 | Args:
93 | vm_name: Optional name of the VM to get info for.
94 | If empty, lists all VMs.
95 | storage: Optional storage path override. If provided, this will be used instead of self.storage
96 | debug: Whether to show debug output
97 |
98 | Returns:
99 | Dictionary with VM status information parsed from JSON response
100 | """
101 | # Use the shared implementation from lume_api module
102 | return lume_api_get(
103 | vm_name=vm_name,
104 | host=self.host,
105 | port=self.port,
106 | storage=storage if storage is not None else self.storage,
107 | debug=debug,
108 | verbose=self.verbose,
109 | )
110 |
111 | def _lume_api_run(
112 | self, vm_name: str, run_opts: Dict[str, Any], debug: bool = False
113 | ) -> Dict[str, Any]:
114 | """Run a VM using shared lume_api function.
115 |
116 | Args:
117 | vm_name: Name of the VM to run
118 | run_opts: Dictionary of run options
119 | debug: Whether to show debug output
120 |
121 | Returns:
122 | Dictionary with API response or error information
123 | """
124 | # Use the shared implementation from lume_api module
125 | return lume_api_run(
126 | vm_name=vm_name,
127 | host=self.host,
128 | port=self.port,
129 | run_opts=run_opts,
130 | storage=self.storage,
131 | debug=debug,
132 | verbose=self.verbose,
133 | )
134 |
135 | def _lume_api_stop(self, vm_name: str, debug: bool = False) -> Dict[str, Any]:
136 | """Stop a VM using shared lume_api function.
137 |
138 | Args:
139 | vm_name: Name of the VM to stop
140 | debug: Whether to show debug output
141 |
142 | Returns:
143 | Dictionary with API response or error information
144 | """
145 | # Use the shared implementation from lume_api module
146 | return lume_api_stop(
147 | vm_name=vm_name,
148 | host=self.host,
149 | port=self.port,
150 | storage=self.storage,
151 | debug=debug,
152 | verbose=self.verbose,
153 | )
154 |
155 | def _lume_api_update(
156 | self, vm_name: str, update_opts: Dict[str, Any], debug: bool = False
157 | ) -> Dict[str, Any]:
158 | """Update VM configuration using shared lume_api function.
159 |
160 | Args:
161 | vm_name: Name of the VM to update
162 | update_opts: Dictionary of update options
163 | debug: Whether to show debug output
164 |
165 | Returns:
166 | Dictionary with API response or error information
167 | """
168 | # Use the shared implementation from lume_api module
169 | return lume_api_update(
170 | vm_name=vm_name,
171 | host=self.host,
172 | port=self.port,
173 | update_opts=update_opts,
174 | storage=self.storage,
175 | debug=debug,
176 | verbose=self.verbose,
177 | )
178 |
179 | async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
180 | """Get VM information by name.
181 |
182 | Args:
183 | name: Name of the VM to get information for
184 | storage: Optional storage path override. If provided, this will be used
185 | instead of the provider's default storage path.
186 |
187 | Returns:
188 | Dictionary with VM information including status, IP address, etc.
189 |
190 | Note:
191 | If storage is not provided, the provider's default storage path will be used.
192 | The storage parameter allows overriding the storage location for this specific call.
193 | """
194 | if not HAS_CURL:
195 | logger.error("curl is not available. Cannot get VM status.")
196 | return {"name": name, "status": "unavailable", "error": "curl is not available"}
197 |
198 | # First try to get detailed VM info from the API
199 | try:
200 | # Query the Lume API for VM status, using the storage override if given, else the provider's default storage
201 | vm_info = self._lume_api_get(
202 | vm_name=name,
203 | storage=storage if storage is not None else self.storage,
204 | debug=self.verbose,
205 | )
206 |
207 | # Check for API errors
208 | if "error" in vm_info:
209 | logger.debug(f"API request error: {vm_info['error']}")
210 | # If we got an error from the API, report the VM as not ready yet
211 | return {
212 | "name": name,
213 | "status": "starting", # VM is still starting - do not attempt to connect yet
214 | "api_status": "error",
215 | "error": vm_info["error"],
216 | }
217 |
218 | # Process the VM status information
219 | vm_status = vm_info.get("status", "unknown")
220 |
221 | # If the VM is stopped, there is no IP address to wait for
222 | if vm_status == "stopped":
223 | logger.info(f"VM {name} is in '{vm_status}' state - not waiting for IP address")
224 | # Return the status as-is without waiting for an IP
225 | result = {
226 | "name": name,
227 | "status": vm_status,
228 | **vm_info, # Include all original fields from the API response
229 | }
230 | return result
231 |
232 | # Handle field name differences between APIs
233 | # Some APIs use camelCase, others use snake_case
234 | if "vncUrl" in vm_info:
235 | vnc_url = vm_info["vncUrl"]
236 | elif "vnc_url" in vm_info:
237 | vnc_url = vm_info["vnc_url"]
238 | else:
239 | vnc_url = ""
240 |
241 | if "ipAddress" in vm_info:
242 | ip_address = vm_info["ipAddress"]
243 | elif "ip_address" in vm_info:
244 | ip_address = vm_info["ip_address"]
245 | else:
246 | # No IP address reported yet: leave ip_address as None and pass the
247 | # API status through unchanged; the VM may still be starting
248 | ip_address = None
249 | logger.info(
250 | f"VM {name} is in '{vm_status}' state but no IP address found - reporting as still starting"
251 | )
252 |
253 | logger.info(f"VM {name} status: {vm_status}")
254 |
255 | # Return the complete status information
256 | result = {
257 | "name": name,
258 | "status": vm_status if vm_status else "running",
259 | "ip_address": ip_address,
260 | "vnc_url": vnc_url,
261 | "api_status": "ok",
262 | }
263 |
264 | # Include all original fields from the API response
265 | if isinstance(vm_info, dict):
266 | for key, value in vm_info.items():
267 | if key not in result: # Don't override our carefully processed fields
268 | result[key] = value
269 |
270 | return result
271 |
272 | except Exception as e:
273 | logger.error(f"Failed to get VM status: {e}")
274 | # Return a fallback status that indicates the VM is not ready yet
275 | return {
276 | "name": name,
277 | "status": "initializing", # VM is still initializing
278 | "error": f"Failed to get VM status: {str(e)}",
279 | }
280 |
281 | async def list_vms(self) -> List[Dict[str, Any]]:
282 | """List all available VMs."""
283 | result = self._lume_api_get(debug=self.verbose)
284 |
285 | # Extract the VMs list from the response
286 | if "vms" in result and isinstance(result["vms"], list):
287 | return result["vms"]
288 | elif "error" in result:
289 | logger.error(f"Error listing VMs: {result['error']}")
290 | return []
291 | else:
292 | return []
293 |
294 | async def run_vm(
295 | self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None
296 | ) -> Dict[str, Any]:
297 | """Run a VM with the given options.
298 |
299 | If the VM does not exist in the storage location, this will attempt to pull it
300 | from the Lume registry first.
301 |
302 | Args:
303 | image: Image name to use when pulling the VM if it doesn't exist
304 | name: Name of the VM to run
305 | run_opts: Dictionary of run options (memory, cpu, etc.)
306 | storage: Optional storage path override. If provided, this will be used
307 | instead of the provider's default storage path.
308 |
309 | Returns:
310 | Dictionary with VM run status and information
311 | """
312 | # First check if VM exists by trying to get its info
313 | vm_info = await self.get_vm(name, storage=storage)
314 |
315 | if "error" in vm_info:
316 | # get_vm reported an error; assume the VM doesn't exist yet and try to pull it
317 | self.logger.info(
318 | f"VM {name} not found, attempting to pull image {image} from registry..."
319 | )
320 |
321 | # Call pull_vm with the image parameter
322 | pull_result = await self.pull_vm(name=name, image=image, storage=storage)
323 |
324 | # Check if pull was successful
325 | if "error" in pull_result:
326 | self.logger.error(f"Failed to pull VM image: {pull_result['error']}")
327 | return pull_result # Return the error from pull
328 |
329 | self.logger.info(f"Successfully pulled VM image {image} as {name}")
330 |
331 | # Now run the VM with the given options
332 | self.logger.info(f"Running VM {name} with options: {run_opts}")
333 |
334 | from ..lume_api import lume_api_run
335 |
336 | return lume_api_run(
337 | vm_name=name,
338 | host=self.host,
339 | port=self.port,
340 | run_opts=run_opts,
341 | storage=storage if storage is not None else self.storage,
342 | debug=self.verbose,
343 | verbose=self.verbose,
344 | )
345 |
346 | async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
347 | """Stop a running VM.
348 |
349 | If this provider was initialized with ephemeral=True, the VM will also
350 | be deleted after it is stopped.
351 |
352 | Args:
353 | name: Name of the VM to stop
354 | storage: Optional storage path override
355 |
356 | Returns:
357 | Dictionary with stop status and information
358 | """
359 | # Stop the VM first
360 | stop_result = self._lume_api_stop(name, debug=self.verbose)
361 |
362 | # Log ephemeral status for debugging
363 | self.logger.info(f"Ephemeral mode status: {self.ephemeral}")
364 |
365 | # If ephemeral mode is enabled, delete the VM after stopping
366 | if self.ephemeral and (stop_result.get("success", False) or "error" not in stop_result):
367 | self.logger.info(f"Ephemeral mode enabled - deleting VM {name} after stopping")
368 | try:
369 | delete_result = await self.delete_vm(name, storage=storage)
370 |
371 | # Return combined result
372 | return {
373 | **stop_result, # Include all stop result info
374 | "deleted": True,
375 | "delete_result": delete_result,
376 | }
377 | except Exception as e:
378 | self.logger.error(f"Failed to delete ephemeral VM {name}: {e}")
379 | # Include the error but still return stop result
380 | return {**stop_result, "deleted": False, "delete_error": str(e)}
381 |
382 | # Just return the stop result if not ephemeral
383 | return stop_result
384 |
385 | async def pull_vm(
386 | self,
387 | name: str,
388 | image: str,
389 | storage: Optional[str] = None,
390 | registry: str = "ghcr.io",
391 | organization: str = "trycua",
392 | pull_opts: Optional[Dict[str, Any]] = None,
393 | ) -> Dict[str, Any]:
394 | """Pull a VM image from the registry.
395 |
396 | Args:
397 | name: Name for the VM after pulling
398 | image: The image name to pull (e.g. 'macos-sequoia-cua:latest')
399 | storage: Optional storage path to use
400 | registry: Registry to pull from (default: ghcr.io)
401 | organization: Organization in registry (default: trycua)
402 | pull_opts: Additional options for pulling the VM (optional)
403 |
404 | Returns:
405 | Dictionary with information about the pulled VM
406 |
407 | Raises:
408 | ValueError: If the image parameter is empty (pull failures are returned as an error dict)
409 | """
410 | # Validate image parameter
411 | if not image:
412 | raise ValueError("Image parameter is required for pull_vm")
413 |
414 | self.logger.info(f"Pulling VM image '{image}' as '{name}'")
415 | self.logger.info("You can check the pull progress using: lume logs -f")
416 |
417 | # Set default pull_opts if not provided
418 | if pull_opts is None:
419 | pull_opts = {}
420 |
421 | # Log information about the operation
422 | self.logger.debug(f"Pull storage location: {storage or 'default'}")
423 |
424 | try:
425 | # Call the lume_api_pull function from lume_api.py
426 | from ..lume_api import lume_api_pull
427 |
428 | result = lume_api_pull(
429 | image=image,
430 | name=name,
431 | host=self.host,
432 | port=self.port,
433 | storage=storage if storage is not None else self.storage,
434 | registry=registry,
435 | organization=organization,
436 | debug=self.verbose,
437 | verbose=self.verbose,
438 | )
439 |
440 | # Check for errors in the result
441 | if "error" in result:
442 | self.logger.error(f"Failed to pull VM image: {result['error']}")
443 | return result
444 |
445 | self.logger.info(f"Successfully pulled VM image '{image}' as '{name}'")
446 | return result
447 | except Exception as e:
448 | self.logger.error(f"Failed to pull VM image '{image}': {e}")
449 | return {"error": f"Failed to pull VM: {str(e)}"}
450 |
451 | async def delete_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
452 | """Delete a VM permanently.
453 |
454 | Args:
455 | name: Name of the VM to delete
456 | storage: Optional storage path override
457 |
458 | Returns:
459 | Dictionary with delete status and information
460 | """
461 | self.logger.info(f"Deleting VM {name}...")
462 |
463 | try:
464 | # Call the lume_api_delete function from lume_api.py
465 | from ..lume_api import lume_api_delete
466 |
467 | result = lume_api_delete(
468 | vm_name=name,
469 | host=self.host,
470 | port=self.port,
471 | storage=storage if storage is not None else self.storage,
472 | debug=self.verbose,
473 | verbose=self.verbose,
474 | )
475 |
476 | # Check for errors in the result
477 | if "error" in result:
478 | self.logger.error(f"Failed to delete VM: {result['error']}")
479 | return result
480 |
481 | self.logger.info(f"Successfully deleted VM '{name}'")
482 | return result
483 | except Exception as e:
484 | self.logger.error(f"Failed to delete VM '{name}': {e}")
485 | return {"error": f"Failed to delete VM: {str(e)}"}
486 |
487 | async def update_vm(
488 | self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None
489 | ) -> Dict[str, Any]:
490 | """Update VM configuration."""
491 | return self._lume_api_update(name, update_opts, debug=self.verbose)
492 |
493 | async def restart_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
494 | raise NotImplementedError("LumeProvider does not support restarting VMs.")
495 |
496 | async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
497 | """Get the IP address of a VM, waiting indefinitely until it's available.
498 |
499 | Args:
500 | name: Name of the VM to get the IP for
501 | storage: Optional storage path override
502 | retry_delay: Delay between retries in seconds (default: 2)
503 |
504 | Returns:
505 | IP address of the VM when it becomes available
506 | """
507 | # Track total attempts for logging purposes
508 | total_attempts = 0
509 |
510 | # Loop indefinitely until we get a valid IP
511 | while True:
512 | total_attempts += 1
513 |
514 | # Log retry message but not on first attempt
515 | if total_attempts > 1:
516 | self.logger.info(f"Waiting for VM {name} IP address (attempt {total_attempts})...")
517 |
518 | try:
519 | # Get VM information
520 | vm_info = await self.get_vm(name, storage=storage)
521 |
522 | # Check if we got a valid IP
523 | ip = vm_info.get("ip_address", None)
524 | if ip and ip != "unknown" and not ip.startswith("0.0.0.0"):
525 | self.logger.info(f"Got valid VM IP address: {ip}")
526 | return ip
527 |
528 | # Check the VM status
529 | status = vm_info.get("status", "unknown")
530 |
531 | # If VM is not running yet, log and wait
532 | if status != "running":
533 | self.logger.info(f"VM is not running yet (status: {status}). Waiting...")
534 | # If VM is running but no IP yet, wait and retry
535 | else:
536 | self.logger.info("VM is running but no valid IP address yet. Waiting...")
537 |
538 | except Exception as e:
539 | self.logger.warning(f"Error getting VM {name} IP: {e}, continuing to wait...")
540 |
541 | # Wait before next retry
542 | await asyncio.sleep(retry_delay)
543 |
544 | # Add progress log every 10 attempts
545 | if total_attempts % 10 == 0:
546 | self.logger.info(
547 | f"Still waiting for VM {name} IP after {total_attempts} attempts..."
548 | )
549 |
```
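The `get_ip` method above polls `get_vm` with no upper bound, so a VM that never obtains an address keeps the caller waiting forever. A minimal sketch of the same polling loop with a deadline follows. The names `wait_for_ip`, `is_valid_ip`, and `max_wait` are hypothetical and not part of `LumeProvider`; only the validity check is copied from `get_ip`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


def is_valid_ip(ip: Any) -> bool:
    """Same check as get_ip: non-empty, not 'unknown', not 0.0.0.0."""
    return bool(ip) and ip != "unknown" and not str(ip).startswith("0.0.0.0")


async def wait_for_ip(
    get_vm: Callable[[], Awaitable[Dict[str, Any]]],
    retry_delay: float = 2.0,
    max_wait: float = 60.0,  # hypothetical upper bound, not in the original code
) -> str:
    """Poll get_vm until it reports a usable IP or max_wait seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait

    while True:
        try:
            vm_info = await get_vm()
            ip = vm_info.get("ip_address")
            if is_valid_ip(ip):
                return ip
        except Exception:
            # Like get_ip, treat lookup errors as "not ready yet" and keep waiting.
            pass

        # Give up if the next sleep would carry us past the deadline.
        if loop.time() + retry_delay > deadline:
            raise TimeoutError(f"No valid IP address after {max_wait} seconds")

        await asyncio.sleep(retry_delay)
```

A caller could pass something like `lambda: provider.get_vm(name, storage=storage)` as `get_vm` and handle `TimeoutError` by stopping or recreating the VM. Whether a bounded wait is preferable depends on how long a cold VM boot can legitimately take.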
--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/lume_api.py:
--------------------------------------------------------------------------------
```python
1 | """Shared API utilities for Lume and Lumier providers.
2 |
3 | This module contains shared functions for interacting with the Lume API,
4 | used by both the LumeProvider and LumierProvider classes.
5 | """
6 |
7 | import json
8 | import logging
9 | import subprocess
10 | import urllib.parse
11 | from typing import Any, Dict, List, Optional
12 |
13 | from computer.utils import safe_join
14 |
15 | # Setup logging
16 | logger = logging.getLogger(__name__)
17 |
18 | # Check if curl is available
19 | try:
20 | subprocess.run(["curl", "--version"], capture_output=True, check=True)
21 | HAS_CURL = True
22 | except (subprocess.SubprocessError, FileNotFoundError):
23 | HAS_CURL = False
24 |
25 |
26 | def lume_api_get(
27 | vm_name: str,
28 | host: str,
29 | port: int,
30 | storage: Optional[str] = None,
31 | debug: bool = False,
32 | verbose: bool = False,
33 | ) -> Dict[str, Any]:
34 | """Use curl to get VM information from Lume API.
35 |
36 | Args:
37 | vm_name: Name of the VM to get info for
38 | host: API host
39 | port: API port
40 | storage: Storage path for the VM
41 | debug: Whether to show debug output
42 | verbose: Enable verbose logging
43 |
44 | Returns:
45 | Dictionary with VM status information parsed from JSON response
46 | """
47 | # URL encode the storage parameter for the query
48 | encoded_storage = ""
49 | storage_param = ""
50 |
51 | if storage:
52 | # First encode the storage path properly
53 | encoded_storage = urllib.parse.quote(storage, safe="")
54 | storage_param = f"?storage={encoded_storage}"
55 |
56 | # Construct API URL with encoded storage parameter if needed
57 | api_url = f"http://{host}:{port}/lume/vms/{vm_name}{storage_param}"
58 |
59 | # Construct the curl command with increased timeouts for more reliability
60 | # --connect-timeout: Time to establish connection (15 seconds)
61 | # --max-time: Maximum time for the whole operation (20 seconds)
62 | # -f: Fail silently (no output at all) on server errors
63 | # The URL is kept as a single list element; the list is joined into one shell string with safe_join below
64 | cmd = ["curl", "--connect-timeout", "15", "--max-time", "20", "-s", "-f", api_url]
65 |
66 | # For logging and display, show the properly escaped URL
67 | display_cmd = ["curl", "--connect-timeout", "15", "--max-time", "20", "-s", "-f", api_url]
68 |
69 | # Only print the curl command when debug is enabled
70 | display_curl_string = " ".join(display_cmd)
71 | logger.debug(f"Executing API request: {display_curl_string}")
72 |
73 | # Execute the command - for execution we need to use shell=True to handle URLs with special characters
74 | try:
75 | # Use a single string with shell=True for proper URL handling
76 | shell_cmd = safe_join(cmd)
77 | result = subprocess.run(shell_cmd, shell=True, capture_output=True, text=True)
78 |
79 | # Handle curl exit codes
80 | if result.returncode != 0:
81 | curl_error = "Unknown error"
82 |
83 | # Map common curl error codes to helpful messages
84 | if result.returncode == 7:
85 | curl_error = "Failed to connect to the API server - it might still be starting up"
86 | elif result.returncode == 22:
87 | curl_error = "HTTP error returned from API server"
88 | elif result.returncode == 28:
89 | curl_error = "Operation timeout - the API server is taking too long to respond"
90 | elif result.returncode == 52:
91 | curl_error = (
92 | "Empty reply from server - the API server is starting but not fully ready yet"
93 | )
94 | elif result.returncode == 56:
95 | curl_error = "Network problem during data transfer - check container networking"
96 |
97 | # Only log at debug level to reduce noise during retries
98 | logger.debug(f"API request failed with code {result.returncode}: {curl_error}")
99 |
100 | # Return a more useful error message
101 | return {
102 | "error": f"API request failed: {curl_error}",
103 | "curl_code": result.returncode,
104 | "vm_name": vm_name,
105 | "status": "unknown", # We don't know the actual status due to API error
106 | }
107 |
108 | # Try to parse the response as JSON
109 | if result.stdout and result.stdout.strip():
110 | try:
111 | vm_status = json.loads(result.stdout)
112 | if debug or verbose:
113 | logger.info(
114 | f"Successfully parsed VM status: {vm_status.get('status', 'unknown')}"
115 | )
116 | return vm_status
117 | except json.JSONDecodeError as e:
118 | # Return the raw response if it's not valid JSON
119 | logger.warning(f"Invalid JSON response: {e}")
120 | if "Virtual machine not found" in result.stdout:
121 | return {"status": "not_found", "message": "VM not found in Lume API"}
122 |
123 | return {
124 | "error": f"Invalid JSON response: {result.stdout[:100]}...",
125 | "status": "unknown",
126 | }
127 | else:
128 | return {"error": "Empty response from API", "status": "unknown"}
129 | except subprocess.SubprocessError as e:
130 | logger.error(f"Failed to execute API request: {e}")
131 | return {"error": f"Failed to execute API request: {str(e)}", "status": "unknown"}
132 |
133 |
134 | def lume_api_run(
135 | vm_name: str,
136 | host: str,
137 | port: int,
138 | run_opts: Dict[str, Any],
139 | storage: Optional[str] = None,
140 | debug: bool = False,
141 | verbose: bool = False,
142 | ) -> Dict[str, Any]:
143 | """Run a VM using curl.
144 |
145 | Args:
146 | vm_name: Name of the VM to run
147 | host: API host
148 | port: API port
149 | run_opts: Dictionary of run options
150 | storage: Storage path for the VM
151 | debug: Whether to show debug output
152 | verbose: Enable verbose logging
153 |
154 | Returns:
155 | Dictionary with API response or error information
156 | """
157 | # Construct API URL
158 | api_url = f"http://{host}:{port}/lume/vms/{vm_name}/run"
159 |
160 | # Prepare JSON payload with required parameters
161 | payload = {}
162 |
163 | # Add CPU cores if specified
164 | if "cpu" in run_opts:
165 | payload["cpu"] = run_opts["cpu"]
166 |
167 | # Add memory if specified
168 | if "memory" in run_opts:
169 | payload["memory"] = run_opts["memory"]
170 |
171 | # Add storage parameter if specified
172 | if storage:
173 | payload["storage"] = storage
174 | elif "storage" in run_opts:
175 | payload["storage"] = run_opts["storage"]
176 |
177 | # Add shared directories if specified
178 | if "shared_directories" in run_opts and run_opts["shared_directories"]:
179 | payload["sharedDirectories"] = run_opts["shared_directories"]
180 |
181 | # Log the payload for debugging
182 | logger.debug(f"API payload: {json.dumps(payload, indent=2)}")
183 |
184 | # Construct the curl command
185 | cmd = [
186 | "curl",
187 | "--connect-timeout",
188 | "30",
189 | "--max-time",
190 | "30",
191 | "-s",
192 | "-X",
193 | "POST",
194 | "-H",
195 | "Content-Type: application/json",
196 | "-d",
197 | json.dumps(payload),
198 | api_url,
199 | ]
200 |
201 | # Execute the command
202 | try:
203 | result = subprocess.run(cmd, capture_output=True, text=True)
204 |
205 | if result.returncode != 0:
206 | logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
207 | return {"error": f"API request failed: {result.stderr}"}
208 |
209 | # Try to parse the response as JSON
210 | if result.stdout and result.stdout.strip():
211 | try:
212 | response = json.loads(result.stdout)
213 | return response
214 | except json.JSONDecodeError:
215 | # Return the raw response if it's not valid JSON
216 | return {
217 | "success": True,
218 | "message": "VM started successfully",
219 | "raw_response": result.stdout,
220 | }
221 | else:
222 | return {"success": True, "message": "VM started successfully"}
223 | except subprocess.SubprocessError as e:
224 | logger.error(f"Failed to execute run request: {e}")
225 | return {"error": f"Failed to execute run request: {str(e)}"}
226 |
227 |
228 | def lume_api_stop(
229 | vm_name: str,
230 | host: str,
231 | port: int,
232 | storage: Optional[str] = None,
233 | debug: bool = False,
234 | verbose: bool = False,
235 | ) -> Dict[str, Any]:
236 | """Stop a VM using curl.
237 |
238 | Args:
239 | vm_name: Name of the VM to stop
240 | host: API host
241 | port: API port
242 | storage: Storage path for the VM
243 | debug: Whether to show debug output
244 | verbose: Enable verbose logging
245 |
246 | Returns:
247 | Dictionary with API response or error information
248 | """
249 | # Construct API URL
250 | api_url = f"http://{host}:{port}/lume/vms/{vm_name}/stop"
251 |
252 | # Prepare JSON payload with required parameters
253 | payload = {}
254 |
255 | # Add storage path if specified
256 | if storage:
257 | payload["storage"] = storage
258 |
259 | # Construct the curl command
260 | cmd = [
261 | "curl",
262 | "--connect-timeout",
263 | "15",
264 | "--max-time",
265 | "20",
266 | "-s",
267 | "-X",
268 | "POST",
269 | "-H",
270 | "Content-Type: application/json",
271 | "-d",
272 | json.dumps(payload),
273 | api_url,
274 | ]
275 |
276 | # Execute the command
277 | try:
278 | if debug or verbose:
279 | logger.info(f"Executing: {' '.join(cmd)}")
280 |
281 | result = subprocess.run(cmd, capture_output=True, text=True)
282 |
283 | if result.returncode != 0:
284 | logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
285 | return {"error": f"API request failed: {result.stderr}"}
286 |
287 | # Try to parse the response as JSON
288 | if result.stdout and result.stdout.strip():
289 | try:
290 | response = json.loads(result.stdout)
291 | return response
292 | except json.JSONDecodeError:
293 | # Return the raw response if it's not valid JSON
294 | return {
295 | "success": True,
296 | "message": "VM stopped successfully",
297 | "raw_response": result.stdout,
298 | }
299 | else:
300 | return {"success": True, "message": "VM stopped successfully"}
301 | except subprocess.SubprocessError as e:
302 | logger.error(f"Failed to execute stop request: {e}")
303 | return {"error": f"Failed to execute stop request: {str(e)}"}
304 |
305 |
306 | def lume_api_update(
307 | vm_name: str,
308 | host: str,
309 | port: int,
310 | update_opts: Dict[str, Any],
311 | storage: Optional[str] = None,
312 | debug: bool = False,
313 | verbose: bool = False,
314 | ) -> Dict[str, Any]:
315 | """Update VM settings using curl.
316 |
317 | Args:
318 | vm_name: Name of the VM to update
319 | host: API host
320 | port: API port
321 | update_opts: Dictionary of update options
322 | storage: Storage path for the VM
323 | debug: Whether to show debug output
324 | verbose: Enable verbose logging
325 |
326 | Returns:
327 | Dictionary with API response or error information
328 | """
329 | # Construct API URL
330 | api_url = f"http://{host}:{port}/lume/vms/{vm_name}/update"
331 |
332 | # Prepare JSON payload with required parameters
333 | payload = {}
334 |
335 | # Add CPU cores if specified
336 | if "cpu" in update_opts:
337 | payload["cpu"] = update_opts["cpu"]
338 |
339 | # Add memory if specified
340 | if "memory" in update_opts:
341 | payload["memory"] = update_opts["memory"]
342 |
343 | # Add storage path if specified
344 | if storage:
345 | payload["storage"] = storage
346 |
347 | # Construct the curl command
348 | cmd = [
349 | "curl",
350 | "--connect-timeout",
351 | "15",
352 | "--max-time",
353 | "20",
354 | "-s",
355 | "-X",
356 | "POST",
357 | "-H",
358 | "Content-Type: application/json",
359 | "-d",
360 | json.dumps(payload),
361 | api_url,
362 | ]
363 |
364 | # Execute the command
365 | try:
366 | if debug:
367 | logger.info(f"Executing: {' '.join(cmd)}")
368 |
369 | result = subprocess.run(cmd, capture_output=True, text=True)
370 |
371 | if result.returncode != 0:
372 | logger.warning(f"API request failed with code {result.returncode}: {result.stderr}")
373 | return {"error": f"API request failed: {result.stderr}"}
374 |
375 | # Try to parse the response as JSON
376 | if result.stdout and result.stdout.strip():
377 | try:
378 | response = json.loads(result.stdout)
379 | return response
380 | except json.JSONDecodeError:
381 | # Return the raw response if it's not valid JSON
382 | return {
383 | "success": True,
384 | "message": "VM updated successfully",
385 | "raw_response": result.stdout,
386 | }
387 | else:
388 | return {"success": True, "message": "VM updated successfully"}
389 | except subprocess.SubprocessError as e:
390 | logger.error(f"Failed to execute update request: {e}")
391 | return {"error": f"Failed to execute update request: {str(e)}"}
392 |
393 |
394 | def lume_api_pull(
395 | image: str,
396 | name: str,
397 | host: str,
398 | port: int,
399 | storage: Optional[str] = None,
400 | registry: str = "ghcr.io",
401 | organization: str = "trycua",
402 | debug: bool = False,
403 | verbose: bool = False,
404 | ) -> Dict[str, Any]:
405 | """Pull a VM image from a registry using curl.
406 |
407 | Args:
408 | image: Name/tag of the image to pull
409 | name: Name to give the VM after pulling
410 | host: API host
411 | port: API port
412 | storage: Storage path for the VM
413 | registry: Registry to pull from (default: ghcr.io)
414 | organization: Organization in registry (default: trycua)
415 | debug: Whether to show debug output
416 | verbose: Enable verbose logging
417 |
418 | Returns:
419 | Dictionary with pull status and information
420 | """
421 | # Prepare pull request payload
422 | pull_payload = {
423 | "image": image, # Use provided image name
424 | "name": name, # Always use name as the target VM name
425 | "registry": registry,
426 | "organization": organization,
427 | }
428 |
429 | if storage:
430 | pull_payload["storage"] = storage
431 |
432 | # Construct pull command with proper JSON payload
433 | pull_cmd = ["curl"]
434 |
435 | if not verbose:
436 | pull_cmd.append("-s")
437 |
438 | pull_cmd.extend(
439 | [
440 | "-X",
441 | "POST",
442 | "-H",
443 | "Content-Type: application/json",
444 | "-d",
445 | json.dumps(pull_payload),
446 | f"http://{host}:{port}/lume/pull",
447 | ]
448 | )
449 |
450 | logger.debug(f"Executing API request: {' '.join(pull_cmd)}")
451 |
452 | try:
453 | # Execute pull command
454 | result = subprocess.run(pull_cmd, capture_output=True, text=True)
455 |
456 | if result.returncode != 0:
457 | error_msg = f"Failed to pull VM {name}: {result.stderr}"
458 | logger.error(error_msg)
459 | return {"error": error_msg}
460 |
461 | try:
462 | response = json.loads(result.stdout)
463 | logger.info(f"Successfully initiated pull for VM {name}")
464 | return response
465 | except json.JSONDecodeError:
466 | if result.stdout:
467 | logger.info(f"Pull response: {result.stdout}")
468 | return {"success": True, "message": f"Successfully initiated pull for VM {name}"}
469 |
470 | except subprocess.SubprocessError as e:
471 | error_msg = f"Failed to execute pull command: {str(e)}"
472 | logger.error(error_msg)
473 | return {"error": error_msg}
474 |
475 |
476 | def lume_api_delete(
477 | vm_name: str,
478 | host: str,
479 | port: int,
480 | storage: Optional[str] = None,
481 | debug: bool = False,
482 | verbose: bool = False,
483 | ) -> Dict[str, Any]:
484 | """Delete a VM using curl.
485 |
486 | Args:
487 | vm_name: Name of the VM to delete
488 | host: API host
489 | port: API port
490 | storage: Storage path for the VM
491 | debug: Whether to show debug output
492 | verbose: Enable verbose logging
493 |
494 | Returns:
495 | Dictionary with API response or error information
496 | """
497 | # URL encode the storage parameter for the query
498 | encoded_storage = ""
499 | storage_param = ""
500 |
501 | if storage:
502 | # First encode the storage path properly
503 | encoded_storage = urllib.parse.quote(storage, safe="")
504 | storage_param = f"?storage={encoded_storage}"
505 |
506 | # Construct API URL with encoded storage parameter if needed
507 | api_url = f"http://{host}:{port}/lume/vms/{vm_name}{storage_param}"
508 |
509 | # Construct the curl command for DELETE operation - using much longer timeouts matching shell implementation
510 | cmd = [
511 | "curl",
512 | "--connect-timeout",
513 | "6000",
514 | "--max-time",
515 | "5000",
516 | "-s",
517 | "-X",
518 | "DELETE",
519 | api_url,
520 | ]
521 |
522 | # For logging and display, show the properly escaped URL
523 | display_cmd = [
524 | "curl",
525 | "--connect-timeout",
526 | "6000",
527 | "--max-time",
528 | "5000",
529 | "-s",
530 | "-X",
531 | "DELETE",
532 | api_url,
533 | ]
534 |
535 | # Only print the curl command when debug is enabled
536 | display_curl_string = " ".join(display_cmd)
537 | logger.debug(f"Executing API request: {display_curl_string}")
538 |
539 | # Execute the command - for execution we need to use shell=True to handle URLs with special characters
540 | try:
541 | # Use a single string with shell=True for proper URL handling
542 | shell_cmd = safe_join(cmd)
543 | result = subprocess.run(shell_cmd, shell=True, capture_output=True, text=True)
544 |
545 | # Handle curl exit codes
546 | if result.returncode != 0:
547 | curl_error = "Unknown error"
548 |
549 | # Map common curl error codes to helpful messages
550 | if result.returncode == 7:
551 | curl_error = "Failed to connect to the API server - it might still be starting up"
552 | elif result.returncode == 22:
553 | curl_error = "HTTP error returned from API server"
554 | elif result.returncode == 28:
555 | curl_error = "Operation timeout - the API server is taking too long to respond"
556 | elif result.returncode == 52:
557 | curl_error = (
558 | "Empty reply from server - the API server is starting but not fully ready yet"
559 | )
560 | elif result.returncode == 56:
561 | curl_error = "Network problem during data transfer - check container networking"
562 |
563 | # Only log at debug level to reduce noise during retries
564 | logger.debug(f"API request failed with code {result.returncode}: {curl_error}")
565 |
566 | # Return a more useful error message
567 | return {
568 | "error": f"API request failed: {curl_error}",
569 | "curl_code": result.returncode,
570 | "vm_name": vm_name,
571 | "storage": storage,
572 | }
573 |
574 | # Try to parse the response as JSON
575 | if result.stdout and result.stdout.strip():
576 | try:
577 | response = json.loads(result.stdout)
578 | return response
579 | except json.JSONDecodeError:
580 | # Return the raw response if it's not valid JSON
581 | return {
582 | "success": True,
583 | "message": "VM deleted successfully",
584 | "raw_response": result.stdout,
585 | }
586 | else:
587 | return {"success": True, "message": "VM deleted successfully"}
588 | except subprocess.SubprocessError as e:
589 | logger.error(f"Failed to execute delete request: {e}")
590 | return {"error": f"Failed to execute delete request: {str(e)}"}
591 |
592 |
593 | def parse_memory(memory_str: str) -> int:
594 | """Parse a memory string to an integer number of MB (integers are returned unchanged).
595 |
596 | Examples:
597 | "8GB" -> 8192
598 | "1024MB" -> 1024
599 | "512" -> 512
600 |
601 | Returns:
602 | Memory value in MB
603 | """
604 | if isinstance(memory_str, int):
605 | return memory_str
606 |
607 | if isinstance(memory_str, str):
608 | # Extract number and unit
609 | import re
610 |
611 | match = re.match(r"(\d+)([A-Za-z]*)", memory_str)
612 | if match:
613 | value, unit = match.groups()
614 | value = int(value)
615 | unit = unit.upper()
616 |
617 | if unit == "GB" or unit == "G":
618 | return value * 1024
619 | elif unit == "MB" or unit == "M" or unit == "":
620 | return value
621 |
622 | # Default fallback
623 | logger.warning(f"Could not parse memory string '{memory_str}', using 8GB default")
624 | return 8192 # Default to 8GB
625 |
```
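`parse_memory` above uses an unanchored `re.match`, so an input such as `"8.5GB"` matches only the leading `8` with an empty unit and is read as 8 MB rather than falling through to the default. A stricter variant, sketched below, anchors the pattern with `fullmatch` and looks the unit up in a table. The names `parse_memory_strict`, `_UNIT_TO_MB`, and `default_mb` are hypothetical; the accepted units and the 8192 MB fallback are taken from the original function.

```python
import re
from typing import Union

# Whole-string pattern: digits, then an optional alphabetic unit.
_MEMORY_RE = re.compile(r"\s*(\d+)\s*([A-Za-z]*)\s*")

# Units accepted by the original parse_memory, expressed as MB multipliers.
_UNIT_TO_MB = {"": 1, "M": 1, "MB": 1, "G": 1024, "GB": 1024}


def parse_memory_strict(memory: Union[int, str], default_mb: int = 8192) -> int:
    """Return memory in MB, or default_mb when the input cannot be parsed."""
    if isinstance(memory, int):
        return memory

    match = _MEMORY_RE.fullmatch(memory) if isinstance(memory, str) else None
    if match:
        value, unit = match.groups()
        factor = _UNIT_TO_MB.get(unit.upper())
        if factor is not None:
            return int(value) * factor

    # Unparseable input (for example a decimal value or an unknown unit).
    return default_mb
```

With this sketch, `"8GB"`, `"1024MB"`, and `"512"` give the same results as the docstring examples above, while inputs the pattern does not fully cover fall back to the default instead of being partially parsed.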
--------------------------------------------------------------------------------
/libs/python/computer-server/computer_server/handlers/linux.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Linux implementation of automation and accessibility handlers.
3 |
4 | This implementation attempts to use pyautogui for GUI automation when available.
5 | If running in a headless environment without X11, it will fall back to simulated responses.
6 | To use GUI automation in a headless environment:
7 | 1. Install Xvfb: sudo apt-get install xvfb
8 | 2. Run with virtual display: xvfb-run python -m computer_server
9 | """
10 |
11 | import asyncio
12 | import base64
13 | import json
14 | import logging
15 | import os
16 | import subprocess
17 | from io import BytesIO
18 | from typing import Any, Dict, List, Optional, Tuple
19 |
20 | # Configure logger
21 | logger = logging.getLogger(__name__)
22 |
23 | # Try to import pyautogui, but don't fail if it's not available
24 | # This allows the server to run in headless environments
25 | try:
26 | import pyautogui
27 |
28 | pyautogui.FAILSAFE = False
29 |
30 | logger.info("pyautogui successfully imported, GUI automation available")
31 | except Exception as e:
32 | logger.warning(f"pyautogui import failed: {str(e)}. GUI operations will be simulated.")
33 |
34 | from pynput.keyboard import Controller as KeyboardController
35 | from pynput.keyboard import Key
36 | from pynput.mouse import Button
37 | from pynput.mouse import Controller as MouseController
38 |
39 | from .base import BaseAccessibilityHandler, BaseAutomationHandler
40 |
41 |
42 | class LinuxAccessibilityHandler(BaseAccessibilityHandler):
43 | """Linux implementation of accessibility handler."""
44 |
45 | async def get_accessibility_tree(self) -> Dict[str, Any]:
46 | """Get the accessibility tree of the current window.
47 |
48 | Returns:
49 | Dict[str, Any]: A dictionary containing success status and a simulated tree structure
50 | since Linux doesn't have an accessibility API equivalent to the one on macOS.
51 | """
52 | # Linux doesn't have an accessibility API equivalent to the one on macOS
53 | # Return a minimal dummy tree
54 | logger.info(
55 | "Getting accessibility tree (simulated, no accessibility API available on Linux)"
56 | )
57 | return {
58 | "success": True,
59 | "tree": {
60 | "role": "Window",
61 | "title": "Linux Window",
62 | "position": {"x": 0, "y": 0},
63 | "size": {"width": 1920, "height": 1080},
64 | "children": [],
65 | },
66 | }
67 |
68 | async def find_element(
69 | self, role: Optional[str] = None, title: Optional[str] = None, value: Optional[str] = None
70 | ) -> Dict[str, Any]:
71 | """Find an element in the accessibility tree by criteria.
72 |
73 | Args:
74 | role: The role of the element to find.
75 | title: The title of the element to find.
76 | value: The value of the element to find.
77 |
78 | Returns:
79 | Dict[str, Any]: A dictionary indicating that element search is not supported on Linux.
80 | """
81 | logger.info(
82 | f"Finding element with role={role}, title={title}, value={value} (not supported on Linux)"
83 | )
84 | return {"success": False, "message": "Element search not supported on Linux"}
85 |
86 | def get_cursor_position(self) -> Tuple[int, int]:
87 | """Get the current cursor position.
88 |
89 | Returns:
90 | Tuple[int, int]: The x and y coordinates of the cursor position.
91 | Returns (0, 0) if pyautogui is not available.
92 | """
93 | try:
94 | pos = pyautogui.position()
95 | return pos.x, pos.y
96 | except Exception as e:
97 | logger.warning(f"Failed to get cursor position with pyautogui: {e}")
98 |
99 | logger.info("Getting cursor position (simulated)")
100 | return 0, 0
101 |
102 | def get_screen_size(self) -> Tuple[int, int]:
103 | """Get the screen size.
104 |
105 | Returns:
106 | Tuple[int, int]: The width and height of the screen in pixels.
107 | Returns (1920, 1080) if pyautogui is not available.
108 | """
109 | try:
110 | size = pyautogui.size()
111 | return size.width, size.height
112 | except Exception as e:
113 | logger.warning(f"Failed to get screen size with pyautogui: {e}")
114 |
115 | logger.info("Getting screen size (simulated)")
116 | return 1920, 1080
117 |
118 |
119 | class LinuxAutomationHandler(BaseAutomationHandler):
120 | """Linux implementation of automation handler using pyautogui."""
121 |
122 | keyboard = KeyboardController()
123 | mouse = MouseController()
124 |
125 | # Mouse Actions
126 | async def mouse_down(
127 | self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
128 | ) -> Dict[str, Any]:
129 | """Press and hold a mouse button at the specified coordinates.
130 |
131 | Args:
132 | x: The x coordinate to move to before pressing. If None, uses current position.
133 | y: The y coordinate to move to before pressing. If None, uses current position.
134 | button: The mouse button to press ("left", "right", or "middle").
135 |
136 | Returns:
137 | Dict[str, Any]: A dictionary with success status and error message if failed.
138 | """
139 | try:
140 | if x is not None and y is not None:
141 | pyautogui.moveTo(x, y)
142 | pyautogui.mouseDown(button=button)
143 | return {"success": True}
144 | except Exception as e:
145 | return {"success": False, "error": str(e)}
146 |
147 | async def mouse_up(
148 | self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
149 | ) -> Dict[str, Any]:
150 | """Release a mouse button at the specified coordinates.
151 |
152 | Args:
153 | x: The x coordinate to move to before releasing. If None, uses current position.
154 | y: The y coordinate to move to before releasing. If None, uses current position.
155 | button: The mouse button to release ("left", "right", or "middle").
156 |
157 | Returns:
158 | Dict[str, Any]: A dictionary with success status and error message if failed.
159 | """
160 | try:
161 | if x is not None and y is not None:
162 | pyautogui.moveTo(x, y)
163 | pyautogui.mouseUp(button=button)
164 | return {"success": True}
165 | except Exception as e:
166 | return {"success": False, "error": str(e)}
167 |
168 | async def move_cursor(self, x: int, y: int) -> Dict[str, Any]:
169 | """Move the cursor to the specified coordinates.
170 |
171 | Args:
172 | x: The x coordinate to move to.
173 | y: The y coordinate to move to.
174 |
175 | Returns:
176 | Dict[str, Any]: A dictionary with success status and error message if failed.
177 | """
178 | try:
179 | pyautogui.moveTo(x, y)
180 | return {"success": True}
181 | except Exception as e:
182 | return {"success": False, "error": str(e)}
183 |
184 | async def left_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
185 | """Perform a left mouse click at the specified coordinates.
186 |
187 | Args:
188 | x: The x coordinate to click at. If None, clicks at current position.
189 | y: The y coordinate to click at. If None, clicks at current position.
190 |
191 | Returns:
192 | Dict[str, Any]: A dictionary with success status and error message if failed.
193 | """
194 | try:
195 | if x is not None and y is not None:
196 | pyautogui.moveTo(x, y)
197 | pyautogui.click()
198 | return {"success": True}
199 | except Exception as e:
200 | return {"success": False, "error": str(e)}
201 |
202 | async def right_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
203 | """Perform a right mouse click at the specified coordinates.
204 |
205 | Args:
206 | x: The x coordinate to click at. If None, clicks at current position.
207 | y: The y coordinate to click at. If None, clicks at current position.
208 |
209 | Returns:
210 | Dict[str, Any]: A dictionary with success status and error message if failed.
211 | """
212 | try:
213 | if x is not None and y is not None:
214 | pyautogui.moveTo(x, y)
215 | pyautogui.rightClick()
216 | return {"success": True}
217 | except Exception as e:
218 | return {"success": False, "error": str(e)}
219 |
220 | async def double_click(
221 | self, x: Optional[int] = None, y: Optional[int] = None
222 | ) -> Dict[str, Any]:
223 | """Perform a double click at the specified coordinates.
224 |
225 | Args:
226 | x: The x coordinate to double click at. If None, clicks at current position.
227 | y: The y coordinate to double click at. If None, clicks at current position.
228 |
229 | Returns:
230 | Dict[str, Any]: A dictionary with success status and error message if failed.
231 | """
232 | try:
233 | if x is not None and y is not None:
234 | pyautogui.moveTo(x, y)
235 | pyautogui.doubleClick(interval=0.1)
236 | return {"success": True}
237 | except Exception as e:
238 | return {"success": False, "error": str(e)}
239 |
240 | async def click(
241 | self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
242 | ) -> Dict[str, Any]:
243 | """Perform a mouse click with the specified button at the given coordinates.
244 |
245 | Args:
246 | x: The x coordinate to click at. If None, clicks at current position.
247 | y: The y coordinate to click at. If None, clicks at current position.
248 | button: The mouse button to click ("left", "right", or "middle").
249 |
250 | Returns:
251 | Dict[str, Any]: A dictionary with success status and error message if failed.
252 | """
253 | try:
254 | if x is not None and y is not None:
255 | pyautogui.moveTo(x, y)
256 | pyautogui.click(button=button)
257 | return {"success": True}
258 | except Exception as e:
259 | return {"success": False, "error": str(e)}
260 |
261 | async def drag_to(
262 | self, x: int, y: int, button: str = "left", duration: float = 0.5
263 | ) -> Dict[str, Any]:
264 | """Drag from the current position to the specified coordinates.
265 |
266 | Args:
267 | x: The x coordinate to drag to.
268 | y: The y coordinate to drag to.
269 | button: The mouse button to use for dragging.
270 | duration: The time in seconds to take for the drag operation.
271 |
272 | Returns:
273 | Dict[str, Any]: A dictionary with success status and error message if failed.
274 | """
275 | try:
276 | pyautogui.dragTo(x, y, duration=duration, button=button)
277 | return {"success": True}
278 | except Exception as e:
279 | return {"success": False, "error": str(e)}
280 |
281 | async def drag(
282 | self, start_x: int, start_y: int, end_x: int, end_y: int, button: str = "left"
283 | ) -> Dict[str, Any]:
284 | """Drag from start coordinates to end coordinates.
285 |
286 | Args:
287 | start_x: The starting x coordinate.
288 | start_y: The starting y coordinate.
289 | end_x: The ending x coordinate.
290 | end_y: The ending y coordinate.
291 | button: The mouse button to use for dragging.
292 |
293 | Returns:
294 | Dict[str, Any]: A dictionary with success status and error message if failed.
295 | """
296 | try:
297 | pyautogui.moveTo(start_x, start_y)
298 | pyautogui.dragTo(end_x, end_y, duration=0.5, button=button)
299 | return {"success": True}
300 | except Exception as e:
301 | return {"success": False, "error": str(e)}
302 |
303 | async def drag_path(
304 | self, path: List[Tuple[int, int]], button: str = "left", duration: float = 0.5
305 | ) -> Dict[str, Any]:
306 | """Drag along a path defined by a list of coordinates.
307 |
308 | Args:
309 | path: A list of (x, y) coordinate tuples defining the drag path.
310 | button: The mouse button to use for dragging.
311 | duration: The time in seconds to take for each segment of the drag.
312 |
313 | Returns:
314 | Dict[str, Any]: A dictionary with success status and error message if failed.
315 | """
316 | try:
317 | if not path:
318 | return {"success": False, "error": "Path is empty"}
319 | pyautogui.moveTo(*path[0])
320 | for x, y in path[1:]:
321 | pyautogui.dragTo(x, y, duration=duration, button=button)
322 | return {"success": True}
323 | except Exception as e:
324 | return {"success": False, "error": str(e)}
325 |
326 | # Keyboard Actions
327 | async def key_down(self, key: str) -> Dict[str, Any]:
328 | """Press and hold a key.
329 |
330 | Args:
331 | key: The key to press down.
332 |
333 | Returns:
334 | Dict[str, Any]: A dictionary with success status and error message if failed.
335 | """
336 | try:
337 | pyautogui.keyDown(key)
338 | return {"success": True}
339 | except Exception as e:
340 | return {"success": False, "error": str(e)}
341 |
342 | async def key_up(self, key: str) -> Dict[str, Any]:
343 | """Release a key.
344 |
345 | Args:
346 | key: The key to release.
347 |
348 | Returns:
349 | Dict[str, Any]: A dictionary with success status and error message if failed.
350 | """
351 | try:
352 | pyautogui.keyUp(key)
353 | return {"success": True}
354 | except Exception as e:
355 | return {"success": False, "error": str(e)}
356 |
357 | async def type_text(self, text: str) -> Dict[str, Any]:
358 | """Type the specified text using the keyboard.
359 |
360 | Args:
361 | text: The text to type.
362 |
363 | Returns:
364 | Dict[str, Any]: A dictionary with success status and error message if failed.
365 | """
366 | try:
367 | # Use pynput (self.keyboard) instead of pyautogui for Unicode support
368 | self.keyboard.type(text)
369 | return {"success": True}
370 | except Exception as e:
371 | return {"success": False, "error": str(e)}
372 |
373 | async def press_key(self, key: str) -> Dict[str, Any]:
374 | """Press and release a key.
375 |
376 | Args:
377 | key: The key to press.
378 |
379 | Returns:
380 | Dict[str, Any]: A dictionary with success status and error message if failed.
381 | """
382 | try:
383 | pyautogui.press(key)
384 | return {"success": True}
385 | except Exception as e:
386 | return {"success": False, "error": str(e)}
387 |
388 | async def hotkey(self, keys: List[str]) -> Dict[str, Any]:
389 | """Press a combination of keys simultaneously.
390 |
391 | Args:
392 | keys: A list of keys to press together as a hotkey combination.
393 |
394 | Returns:
395 | Dict[str, Any]: A dictionary with success status and error message if failed.
396 | """
397 | try:
398 | pyautogui.hotkey(*keys)
399 | return {"success": True}
400 | except Exception as e:
401 | return {"success": False, "error": str(e)}
402 |
403 | # Scrolling Actions
404 | async def scroll(self, x: int, y: int) -> Dict[str, Any]:
405 | """Scroll the mouse wheel.
406 |
407 | Args:
408 | x: The horizontal scroll amount.
409 | y: The vertical scroll amount.
410 |
411 | Returns:
412 | Dict[str, Any]: A dictionary with success status and error message if failed.
413 | """
414 | try:
415 | self.mouse.scroll(x, y)
416 | return {"success": True}
417 | except Exception as e:
418 | return {"success": False, "error": str(e)}
419 |
420 | async def scroll_down(self, clicks: int = 1) -> Dict[str, Any]:
421 | """Scroll down by the specified number of clicks.
422 |
423 | Args:
424 | clicks: The number of scroll clicks to perform downward.
425 |
426 | Returns:
427 | Dict[str, Any]: A dictionary with success status and error message if failed.
428 | """
429 | try:
430 | pyautogui.scroll(-clicks)
431 | return {"success": True}
432 | except Exception as e:
433 | return {"success": False, "error": str(e)}
434 |
435 | async def scroll_up(self, clicks: int = 1) -> Dict[str, Any]:
436 | """Scroll up by the specified number of clicks.
437 |
438 | Args:
439 | clicks: The number of scroll clicks to perform upward.
440 |
441 | Returns:
442 | Dict[str, Any]: A dictionary with success status and error message if failed.
443 | """
444 | try:
445 | pyautogui.scroll(clicks)
446 | return {"success": True}
447 | except Exception as e:
448 | return {"success": False, "error": str(e)}
449 |
450 | # Screen Actions
451 | async def screenshot(self) -> Dict[str, Any]:
452 | """Take a screenshot of the current screen.
453 |
454 | Returns:
455 | Dict[str, Any]: A dictionary containing success status and base64-encoded image data,
456 | or error message if failed.
457 | """
458 | try:
459 | from PIL import Image
460 |
461 | screenshot = pyautogui.screenshot()
462 | if not isinstance(screenshot, Image.Image):
463 | return {"success": False, "error": "Failed to capture screenshot"}
464 | buffered = BytesIO()
465 | screenshot.save(buffered, format="PNG", optimize=True)
466 | buffered.seek(0)
467 | image_data = base64.b64encode(buffered.getvalue()).decode()
468 | return {"success": True, "image_data": image_data}
469 | except Exception as e:
470 | return {"success": False, "error": f"Screenshot error: {str(e)}"}
471 |
472 | async def get_screen_size(self) -> Dict[str, Any]:
473 | """Get the size of the screen.
474 |
475 | Returns:
476 | Dict[str, Any]: A dictionary containing success status and screen dimensions,
477 | or error message if failed.
478 | """
479 | try:
480 | size = pyautogui.size()
481 | return {"success": True, "size": {"width": size.width, "height": size.height}}
482 | except Exception as e:
483 | return {"success": False, "error": str(e)}
484 |
485 | async def get_cursor_position(self) -> Dict[str, Any]:
486 | """Get the current position of the cursor.
487 |
488 | Returns:
489 | Dict[str, Any]: A dictionary containing success status and cursor coordinates,
490 | or error message if failed.
491 | """
492 | try:
493 | pos = pyautogui.position()
494 | return {"success": True, "position": {"x": pos.x, "y": pos.y}}
495 | except Exception as e:
496 | return {"success": False, "error": str(e)}
497 |
498 | # Clipboard Actions
499 | async def copy_to_clipboard(self) -> Dict[str, Any]:
500 | """Get the current content of the clipboard.
501 |
502 | Returns:
503 | Dict[str, Any]: A dictionary containing success status and clipboard content,
504 | or error message if failed.
505 | """
506 | try:
507 | import pyperclip
508 |
509 | content = pyperclip.paste()
510 | return {"success": True, "content": content}
511 | except Exception as e:
512 | return {"success": False, "error": str(e)}
513 |
514 | async def set_clipboard(self, text: str) -> Dict[str, Any]:
515 | """Set the clipboard content to the specified text.
516 |
517 | Args:
518 | text: The text to copy to the clipboard.
519 |
520 | Returns:
521 | Dict[str, Any]: A dictionary with success status and error message if failed.
522 | """
523 | try:
524 | import pyperclip
525 |
526 | pyperclip.copy(text)
527 | return {"success": True}
528 | except Exception as e:
529 | return {"success": False, "error": str(e)}
530 |
531 | # Command Execution
532 | async def run_command(self, command: str) -> Dict[str, Any]:
533 | """Execute a shell command asynchronously.
534 |
535 | Args:
536 | command: The shell command to execute.
537 |
538 | Returns:
539 | Dict[str, Any]: A dictionary containing success status, stdout, stderr,
540 | and return code, or error message if failed.
541 | """
542 | try:
543 | # Create subprocess
544 | process = await asyncio.create_subprocess_shell(
545 | command, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
546 | )
547 | # Wait for the subprocess to finish
548 | stdout, stderr = await process.communicate()
549 | # Return decoded output
550 | return {
551 | "success": True,
552 | "stdout": stdout.decode() if stdout else "",
553 | "stderr": stderr.decode() if stderr else "",
554 | "return_code": process.returncode,
555 | }
556 | except Exception as e:
557 | return {"success": False, "error": str(e)}
558 |
```
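Note that the `run_command` handler above sets `"success": True` whenever the subprocess could be started and awaited, even if the command itself exits with a non-zero status. The following is a minimal standalone sketch of that result shape, not the handler itself: `run_shell` and `command_succeeded` are hypothetical names introduced here for illustration, and only the dictionary keys are taken from the code above.

```python
import asyncio
from typing import Any, Dict


async def run_shell(command: str) -> Dict[str, Any]:
    """Hypothetical helper that mirrors the handler's result-dictionary shape."""
    try:
        process = await asyncio.create_subprocess_shell(
            command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        # communicate() waits for exit and returns the captured streams as bytes.
        stdout, stderr = await process.communicate()
        return {
            "success": True,  # the process ran; says nothing about its exit code
            "stdout": stdout.decode() if stdout else "",
            "stderr": stderr.decode() if stderr else "",
            "return_code": process.returncode,
        }
    except Exception as e:
        return {"success": False, "error": str(e)}


def command_succeeded(result: Dict[str, Any]) -> bool:
    """Hypothetical check: the process ran AND exited with status 0."""
    return bool(result.get("success")) and result.get("return_code") == 0
```

A caller that cares about the command's outcome would likely want to inspect `return_code` as well as `success`, which is what `command_succeeded` does in this sketch.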
--------------------------------------------------------------------------------
/libs/python/computer/computer/interface/base.py:
--------------------------------------------------------------------------------
```python
1 | """Base interface for computer control."""
2 |
3 | from abc import ABC, abstractmethod
4 | from typing import Any, Dict, List, Optional, Tuple
5 |
6 | from ..logger import Logger, LogLevel
7 | from .models import CommandResult, MouseButton
8 |
9 |
10 | class BaseComputerInterface(ABC):
11 | """Base class for computer control interfaces."""
12 |
13 | def __init__(
14 | self,
15 | ip_address: str,
16 | username: str = "lume",
17 | password: str = "lume",
18 | api_key: Optional[str] = None,
19 | vm_name: Optional[str] = None,
20 | ):
21 | """Initialize interface.
22 |
23 | Args:
24 | ip_address: IP address of the computer to control
25 | username: Username for authentication
26 | password: Password for authentication
27 | api_key: Optional API key for cloud authentication
28 | vm_name: Optional VM name for cloud authentication
29 | """
30 | self.ip_address = ip_address
31 | self.username = username
32 | self.password = password
33 | self.api_key = api_key
34 | self.vm_name = vm_name
35 | self.logger = Logger("cua.interface", LogLevel.NORMAL)
36 |
37 | # Optional default delay time between commands (in seconds)
38 | self.delay: float = 0.0
39 |
40 | @abstractmethod
41 | async def wait_for_ready(self, timeout: int = 60) -> None:
42 | """Wait for interface to be ready.
43 |
44 | Args:
45 | timeout: Maximum time to wait in seconds
46 |
47 | Raises:
48 | TimeoutError: If interface is not ready within timeout
49 | """
50 | pass
51 |
52 | @abstractmethod
53 | def close(self) -> None:
54 | """Close the interface connection."""
55 | pass
56 |
57 | def force_close(self) -> None:
58 | """Force close the interface connection.
59 |
60 | By default, this just calls close(), but subclasses can override
61 | to provide more forceful cleanup.
62 | """
63 | self.close()
64 |
65 | # Mouse Actions
66 | @abstractmethod
67 | async def mouse_down(
68 | self,
69 | x: Optional[int] = None,
70 | y: Optional[int] = None,
71 | button: "MouseButton" = "left",
72 | delay: Optional[float] = None,
73 | ) -> None:
74 | """Press and hold a mouse button.
75 |
76 | Args:
77 | x: X coordinate to press at. If None, uses current cursor position.
78 | y: Y coordinate to press at. If None, uses current cursor position.
79 | button: Mouse button to press ('left', 'middle', 'right').
80 | delay: Optional delay in seconds after the action
81 | """
82 | pass
83 |
84 | @abstractmethod
85 | async def mouse_up(
86 | self,
87 | x: Optional[int] = None,
88 | y: Optional[int] = None,
89 | button: "MouseButton" = "left",
90 | delay: Optional[float] = None,
91 | ) -> None:
92 | """Release a mouse button.
93 |
94 | Args:
95 | x: X coordinate to release at. If None, uses current cursor position.
96 | y: Y coordinate to release at. If None, uses current cursor position.
97 | button: Mouse button to release ('left', 'middle', 'right').
98 | delay: Optional delay in seconds after the action
99 | """
100 | pass
101 |
102 | @abstractmethod
103 | async def left_click(
104 | self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
105 | ) -> None:
106 | """Perform a left mouse button click.
107 |
108 | Args:
109 | x: X coordinate to click at. If None, uses current cursor position.
110 | y: Y coordinate to click at. If None, uses current cursor position.
111 | delay: Optional delay in seconds after the action
112 | """
113 | pass
114 |
115 | @abstractmethod
116 | async def right_click(
117 | self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
118 | ) -> None:
119 | """Perform a right mouse button click.
120 |
121 | Args:
122 | x: X coordinate to click at. If None, uses current cursor position.
123 | y: Y coordinate to click at. If None, uses current cursor position.
124 | delay: Optional delay in seconds after the action
125 | """
126 | pass
127 |
128 | @abstractmethod
129 | async def double_click(
130 | self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
131 | ) -> None:
132 | """Perform a double left mouse button click.
133 |
134 | Args:
135 | x: X coordinate to double-click at. If None, uses current cursor position.
136 | y: Y coordinate to double-click at. If None, uses current cursor position.
137 | delay: Optional delay in seconds after the action
138 | """
139 | pass
140 |
141 | @abstractmethod
142 | async def move_cursor(self, x: int, y: int, delay: Optional[float] = None) -> None:
143 | """Move the cursor to the specified screen coordinates.
144 |
145 | Args:
146 | x: X coordinate to move cursor to.
147 | y: Y coordinate to move cursor to.
148 | delay: Optional delay in seconds after the action
149 | """
150 | pass
151 |
152 | @abstractmethod
153 | async def drag_to(
154 | self,
155 | x: int,
156 | y: int,
157 | button: str = "left",
158 | duration: float = 0.5,
159 | delay: Optional[float] = None,
160 | ) -> None:
161 | """Drag from current position to specified coordinates.
162 |
163 | Args:
164 | x: The x coordinate to drag to
165 | y: The y coordinate to drag to
166 | button: The mouse button to use ('left', 'middle', 'right')
167 | duration: How long the drag should take in seconds
168 | delay: Optional delay in seconds after the action
169 | """
170 | pass
171 |
172 | @abstractmethod
173 | async def drag(
174 | self,
175 | path: List[Tuple[int, int]],
176 | button: str = "left",
177 | duration: float = 0.5,
178 | delay: Optional[float] = None,
179 | ) -> None:
180 | """Drag the cursor along a path of coordinates.
181 |
182 | Args:
183 | path: List of (x, y) coordinate tuples defining the drag path
184 | button: The mouse button to use ('left', 'middle', 'right')
185 | duration: Total time in seconds that the drag operation should take
186 | delay: Optional delay in seconds after the action
187 | """
188 | pass
189 |
190 | # Keyboard Actions
191 | @abstractmethod
192 | async def key_down(self, key: str, delay: Optional[float] = None) -> None:
193 | """Press and hold a key.
194 |
195 | Args:
196 | key: The key to press and hold (e.g., 'a', 'shift', 'ctrl').
197 | delay: Optional delay in seconds after the action.
198 | """
199 | pass
200 |
201 | @abstractmethod
202 | async def key_up(self, key: str, delay: Optional[float] = None) -> None:
203 | """Release a previously pressed key.
204 |
205 | Args:
206 | key: The key to release (e.g., 'a', 'shift', 'ctrl').
207 | delay: Optional delay in seconds after the action.
208 | """
209 | pass
210 |
211 | @abstractmethod
212 | async def type_text(self, text: str, delay: Optional[float] = None) -> None:
213 | """Type the specified text string.
214 |
215 | Args:
216 | text: The text string to type.
217 | delay: Optional delay in seconds after the action.
218 | """
219 | pass
220 |
221 | @abstractmethod
222 | async def press_key(self, key: str, delay: Optional[float] = None) -> None:
223 | """Press and release a single key.
224 |
225 | Args:
226 | key: The key to press (e.g., 'a', 'enter', 'escape').
227 | delay: Optional delay in seconds after the action.
228 | """
229 | pass
230 |
231 | @abstractmethod
232 | async def hotkey(self, *keys: str, delay: Optional[float] = None) -> None:
233 | """Press multiple keys simultaneously (keyboard shortcut).
234 |
235 | Args:
236 | *keys: Variable number of keys to press together (e.g., 'ctrl', 'c').
237 | delay: Optional delay in seconds after the action.
238 | """
239 | pass
240 |
241 | # Scrolling Actions
242 | @abstractmethod
243 | async def scroll(self, x: int, y: int, delay: Optional[float] = None) -> None:
244 | """Scroll the mouse wheel by specified amounts.
245 |
246 | Args:
247 | x: Horizontal scroll amount (positive = right, negative = left).
248 | y: Vertical scroll amount (positive = up, negative = down).
249 | delay: Optional delay in seconds after the action.
250 | """
251 | pass
252 |
253 | @abstractmethod
254 | async def scroll_down(self, clicks: int = 1, delay: Optional[float] = None) -> None:
255 | """Scroll down by the specified number of clicks.
256 |
257 | Args:
258 | clicks: Number of scroll clicks to perform downward.
259 | delay: Optional delay in seconds after the action.
260 | """
261 | pass
262 |
263 | @abstractmethod
264 | async def scroll_up(self, clicks: int = 1, delay: Optional[float] = None) -> None:
265 | """Scroll up by the specified number of clicks.
266 |
267 | Args:
268 | clicks: Number of scroll clicks to perform upward.
269 | delay: Optional delay in seconds after the action.
270 | """
271 | pass
272 |
273 | # Screen Actions
274 | @abstractmethod
275 | async def screenshot(self) -> bytes:
276 | """Take a screenshot.
277 |
278 | Returns:
279 | Raw bytes of the screenshot image
280 | """
281 | pass
282 |
283 | @abstractmethod
284 | async def get_screen_size(self) -> Dict[str, int]:
285 | """Get the screen dimensions.
286 |
287 | Returns:
288 | Dict with 'width' and 'height' keys
289 | """
290 | pass
291 |
292 | @abstractmethod
293 | async def get_cursor_position(self) -> Dict[str, int]:
294 | """Get the current cursor position on screen.
295 |
296 | Returns:
297 | Dict with 'x' and 'y' keys containing cursor coordinates.
298 | """
299 | pass
300 |
301 | # Clipboard Actions
302 | @abstractmethod
303 | async def copy_to_clipboard(self) -> str:
304 | """Get the current clipboard content.
305 |
306 | Returns:
307 | The text content currently stored in the clipboard.
308 | """
309 | pass
310 |
311 | @abstractmethod
312 | async def set_clipboard(self, text: str) -> None:
313 | """Set the clipboard content to the specified text.
314 |
315 | Args:
316 | text: The text to store in the clipboard.
317 | """
318 | pass
319 |
320 | # File System Actions
321 | @abstractmethod
322 | async def file_exists(self, path: str) -> bool:
323 | """Check if a file exists at the specified path.
324 |
325 | Args:
326 | path: The file path to check.
327 |
328 | Returns:
329 | True if the file exists, False otherwise.
330 | """
331 | pass
332 |
333 | @abstractmethod
334 | async def directory_exists(self, path: str) -> bool:
335 | """Check if a directory exists at the specified path.
336 |
337 | Args:
338 | path: The directory path to check.
339 |
340 | Returns:
341 | True if the directory exists, False otherwise.
342 | """
343 | pass
344 |
345 | @abstractmethod
346 | async def list_dir(self, path: str) -> List[str]:
347 | """List the contents of a directory.
348 |
349 | Args:
350 | path: The directory path to list.
351 |
352 | Returns:
353 | List of file and directory names in the specified directory.
354 | """
355 | pass
356 |
357 | @abstractmethod
358 | async def read_text(self, path: str) -> str:
359 | """Read the text contents of a file.
360 |
361 | Args:
362 | path: The file path to read from.
363 |
364 | Returns:
365 | The text content of the file.
366 | """
367 | pass
368 |
369 | @abstractmethod
370 | async def write_text(self, path: str, content: str) -> None:
371 | """Write text content to a file.
372 |
373 | Args:
374 | path: The file path to write to.
375 | content: The text content to write.
376 | """
377 | pass
378 |
379 | @abstractmethod
380 | async def read_bytes(self, path: str, offset: int = 0, length: Optional[int] = None) -> bytes:
381 | """Read file binary contents with optional seeking support.
382 |
383 | Args:
384 | path: Path to the file
385 | offset: Byte offset to start reading from (default: 0)
386 | length: Number of bytes to read (default: None for entire file)
387 | """
388 | pass
389 |
390 | @abstractmethod
391 | async def write_bytes(self, path: str, content: bytes) -> None:
392 | """Write binary content to a file.
393 |
394 | Args:
395 | path: The file path to write to.
396 | content: The binary content to write.
397 | """
398 | pass
399 |
400 | @abstractmethod
401 | async def delete_file(self, path: str) -> None:
402 | """Delete a file at the specified path.
403 |
404 | Args:
405 | path: The file path to delete.
406 | """
407 | pass
408 |
409 | @abstractmethod
410 | async def create_dir(self, path: str) -> None:
411 | """Create a directory at the specified path.
412 |
413 | Args:
414 | path: The directory path to create.
415 | """
416 | pass
417 |
418 | @abstractmethod
419 | async def delete_dir(self, path: str) -> None:
420 | """Delete a directory at the specified path.
421 |
422 | Args:
423 | path: The directory path to delete.
424 | """
425 | pass
426 |
427 | @abstractmethod
428 | async def get_file_size(self, path: str) -> int:
429 | """Get the size of a file in bytes.
430 |
431 | Args:
432 | path: The file path to get the size of.
433 |
434 | Returns:
435 | The size of the file in bytes.
436 | """
437 | pass
438 |
439 | # Desktop actions
440 | @abstractmethod
441 | async def get_desktop_environment(self) -> str:
442 | """Get the current desktop environment.
443 |
444 | Returns:
445 | The name of the current desktop environment.
446 | """
447 | pass
448 |
449 | @abstractmethod
450 | async def set_wallpaper(self, path: str) -> None:
451 | """Set the desktop wallpaper to the specified path.
452 |
453 | Args:
454 | path: The file path to set as wallpaper
455 | """
456 | pass
457 |
458 | # Window management
459 | @abstractmethod
460 | async def open(self, target: str) -> None:
461 | """Open a target using the system's default handler.
462 |
463 | Typically opens files, folders, or URLs with the associated application.
464 |
465 | Args:
466 | target: The file path, folder path, or URL to open.
467 | """
468 | pass
469 |
470 | @abstractmethod
471 | async def launch(self, app: str, args: List[str] | None = None) -> Optional[int]:
472 | """Launch an application with optional arguments.
473 |
474 | Args:
475 | app: The application executable or bundle identifier.
476 | args: Optional list of arguments to pass to the application.
477 |
478 | Returns:
479 | Optional process ID (PID) of the launched application if available, otherwise None.
480 | """
481 | pass
482 |
483 | @abstractmethod
484 | async def get_current_window_id(self) -> int | str:
485 | """Get the identifier of the currently active/focused window.
486 |
487 | Returns:
488 | A window identifier that can be used with other window management methods.
489 | """
490 | pass
491 |
492 | @abstractmethod
493 | async def get_application_windows(self, app: str) -> List[int | str]:
494 | """Get all window identifiers for a specific application.
495 |
496 | Args:
497 | app: The application name, executable, or identifier to query.
498 |
499 | Returns:
500 | A list of window identifiers belonging to the specified application.
501 | """
502 | pass
503 |
504 | @abstractmethod
505 | async def get_window_name(self, window_id: int | str) -> str:
506 | """Get the title/name of a window.
507 |
508 | Args:
509 | window_id: The window identifier.
510 |
511 | Returns:
512 | The window's title or name string.
513 | """
514 | pass
515 |
516 | @abstractmethod
517 | async def get_window_size(self, window_id: int | str) -> tuple[int, int]:
518 | """Get the size of a window in pixels.
519 |
520 | Args:
521 | window_id: The window identifier.
522 |
523 | Returns:
524 | A tuple of (width, height) representing the window size in pixels.
525 | """
526 | pass
527 |
528 | @abstractmethod
529 | async def get_window_position(self, window_id: int | str) -> tuple[int, int]:
530 | """Get the screen position of a window.
531 |
532 | Args:
533 | window_id: The window identifier.
534 |
535 | Returns:
536 | A tuple of (x, y) representing the window's top-left corner in screen coordinates.
537 | """
538 | pass
539 |
540 | @abstractmethod
541 | async def set_window_size(self, window_id: int | str, width: int, height: int) -> None:
542 | """Set the size of a window in pixels.
543 |
544 | Args:
545 | window_id: The window identifier.
546 | width: Desired width in pixels.
547 | height: Desired height in pixels.
548 | """
549 | pass
550 |
551 | @abstractmethod
552 | async def set_window_position(self, window_id: int | str, x: int, y: int) -> None:
553 | """Move a window to a specific position on the screen.
554 |
555 | Args:
556 | window_id: The window identifier.
557 | x: X coordinate for the window's top-left corner.
558 | y: Y coordinate for the window's top-left corner.
559 | """
560 | pass
561 |
562 | @abstractmethod
563 | async def maximize_window(self, window_id: int | str) -> None:
564 | """Maximize a window.
565 |
566 | Args:
567 | window_id: The window identifier.
568 | """
569 | pass
570 |
571 | @abstractmethod
572 | async def minimize_window(self, window_id: int | str) -> None:
573 | """Minimize a window.
574 |
575 | Args:
576 | window_id: The window identifier.
577 | """
578 | pass
579 |
580 | @abstractmethod
581 | async def activate_window(self, window_id: int | str) -> None:
582 | """Bring a window to the foreground and focus it.
583 |
584 | Args:
585 | window_id: The window identifier.
586 | """
587 | pass
588 |
589 | @abstractmethod
590 | async def close_window(self, window_id: int | str) -> None:
591 | """Close a window.
592 |
593 | Args:
594 | window_id: The window identifier.
595 | """
596 | pass
597 |
598 | # Convenience aliases
599 | async def get_window_title(self, window_id: int | str) -> str:
600 | """Convenience alias for get_window_name().
601 |
602 | Args:
603 | window_id: The window identifier.
604 |
605 | Returns:
606 | The window's title or name string.
607 | """
608 | return await self.get_window_name(window_id)
609 |
610 | async def window_size(self, window_id: int | str) -> tuple[int, int]:
611 | """Convenience alias for get_window_size().
612 |
613 | Args:
614 | window_id: The window identifier.
615 |
616 | Returns:
617 | A tuple of (width, height) representing the window size in pixels.
618 | """
619 | return await self.get_window_size(window_id)
620 |
621 | # Shell actions
622 | @abstractmethod
623 | async def run_command(self, command: str) -> CommandResult:
624 | """Run shell command and return structured result.
625 |
626 | Executes a shell command using subprocess.run with shell=True and check=False.
627 | The command is run in the target environment and captures both stdout and stderr.
628 |
629 | Args:
630 | command (str): The shell command to execute
631 |
632 | Returns:
633 | CommandResult: A structured result containing:
634 | - stdout (str): Standard output from the command
635 | - stderr (str): Standard error from the command
636 | - returncode (int): Exit code from the command (0 indicates success)
637 |
638 | Raises:
639 | RuntimeError: If the command execution fails at the system level
640 |
641 | Example:
642 | result = await interface.run_command("ls -la")
643 | if result.returncode == 0:
644 | print(f"Output: {result.stdout}")
645 | else:
646 | print(f"Error: {result.stderr}, Exit code: {result.returncode}")
647 | """
648 | pass
649 |
650 | # Accessibility Actions
651 | @abstractmethod
652 | async def get_accessibility_tree(self) -> Dict:
653 | """Get the accessibility tree of the current screen.
654 |
655 | Returns:
656 | Dict containing the hierarchical accessibility information of screen elements.
657 | """
658 | pass
659 |
660 | @abstractmethod
661 | async def to_screen_coordinates(self, x: float, y: float) -> tuple[float, float]:
662 | """Convert screenshot coordinates to screen coordinates.
663 |
664 | Args:
665 | x: X coordinate in screenshot space
666 | y: Y coordinate in screenshot space
667 |
668 | Returns:
669 | tuple[float, float]: (x, y) coordinates in screen space
670 | """
671 | pass
672 |
673 | @abstractmethod
674 | async def to_screenshot_coordinates(self, x: float, y: float) -> tuple[float, float]:
675 | """Convert screen coordinates to screenshot coordinates.
676 |
677 | Args:
678 | x: X coordinate in screen space
679 | y: Y coordinate in screen space
680 |
681 | Returns:
682 | tuple[float, float]: (x, y) coordinates in screenshot space
683 | """
684 | pass
685 |
```
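The two coordinate-conversion methods at the end of this interface are abstract, so the file does not show how a backend maps between the two spaces. The following is a minimal sketch, assuming the screenshot is a uniformly scaled copy of the screen with no offset or cropping. The function names and the `screenshot_size` / `screen_size` parameters are hypothetical and are not part of the interface.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def to_screen(x: float, y: float, screenshot_size: Size, screen_size: Size) -> Tuple[float, float]:
    """Map a point from screenshot space to screen space (assumes pure scaling)."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    return (x * screen_w / shot_w, y * screen_h / shot_h)


def to_screenshot(x: float, y: float, screenshot_size: Size, screen_size: Size) -> Tuple[float, float]:
    """Map a point from screen space back to screenshot space (inverse of to_screen)."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    return (x * shot_w / screen_w, y * shot_h / screen_h)


if __name__ == "__main__":
    # A 1280x720 screenshot taken of a 2560x1440 display.
    print(to_screen(100, 50, (1280, 720), (2560, 1440)))
    print(to_screenshot(200, 100, (1280, 720), (2560, 1440)))
```

A real backend may also need to account for display scaling factors or multi-monitor offsets, which this sketch ignores.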
--------------------------------------------------------------------------------
/libs/python/computer/computer/providers/docker/provider.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Docker VM provider implementation.
3 |
4 | This provider uses Docker containers running the CUA Ubuntu image to create
5 | Linux VMs with computer-server. It handles VM lifecycle operations through Docker
6 | commands and container management.
7 | """
8 |
9 | import asyncio
10 | import json
11 | import logging
12 | import re
13 | import subprocess
14 | import time
15 | from typing import Any, Dict, List, Optional
16 |
17 | from ..base import BaseVMProvider, VMProviderType
18 |
19 | # Setup logging
20 | logger = logging.getLogger(__name__)
21 |
22 | # Check if Docker is available
23 | try:
24 | subprocess.run(["docker", "--version"], capture_output=True, check=True)
25 | HAS_DOCKER = True
26 | except (subprocess.SubprocessError, FileNotFoundError):
27 | HAS_DOCKER = False
28 |
29 |
30 | class DockerProvider(BaseVMProvider):
31 | """
32 | Docker VM Provider implementation using Docker containers.
33 |
34 | This provider uses Docker to run containers with the CUA Ubuntu image
35 | that includes computer-server for remote computer use.
36 | """
37 |
38 | def __init__(
39 | self,
40 | host: str = "localhost",
41 | storage: Optional[str] = None,
42 | shared_path: Optional[str] = None,
43 | image: str = "trycua/cua-ubuntu:latest",
44 | verbose: bool = False,
45 | ephemeral: bool = False,
46 | vnc_port: Optional[int] = 6901,
47 | api_port: Optional[int] = None,
48 | ):
49 | """Initialize the Docker VM Provider.
50 |
51 | Args:
52 | host: Hostname for the API server (default: localhost)
53 | storage: Path for persistent VM storage
54 | shared_path: Path for shared folder between host and container
55 | image: Docker image to use (default: "trycua/cua-ubuntu:latest")
56 | Supported images:
57 | - "trycua/cua-ubuntu:latest" (Kasm-based)
58 | - "trycua/cua-xfce:latest" (vanilla XFCE)
59 | verbose: Enable verbose logging
60 | ephemeral: Use ephemeral (temporary) storage
61 | vnc_port: Port for VNC interface (default: 6901)
62 | api_port: Port for API server (default: 8000)
63 | """
64 | self.host = host
65 | self.api_port = api_port if api_port is not None else 8000
66 | self.vnc_port = vnc_port
67 | self.ephemeral = ephemeral
68 |
69 | # Handle ephemeral storage (temporary directory)
70 | if ephemeral:
71 | self.storage = "ephemeral"
72 | else:
73 | self.storage = storage
74 |
75 | self.shared_path = shared_path
76 | self.image = image
77 | self.verbose = verbose
78 | self._container_id = None
79 | self._running_containers = {} # Track running containers by name
80 |
81 | # Detect image type and configure user directory accordingly
82 | self._detect_image_config()
83 |
84 | def _detect_image_config(self):
85 | """Detect image type and configure paths accordingly."""
86 | # Detect if this is a docker-xfce image or Kasm image
87 | if "docker-xfce" in self.image.lower() or "xfce" in self.image.lower():
88 | self._home_dir = "/home/cua"
89 | self._image_type = "docker-xfce"
90 | logger.info(f"Detected docker-xfce image: using {self._home_dir}")
91 | else:
92 | # Default to Kasm configuration
93 | self._home_dir = "/home/kasm-user"
94 | self._image_type = "kasm"
95 | logger.info(f"Detected Kasm image: using {self._home_dir}")
96 |
97 | @property
98 | def provider_type(self) -> VMProviderType:
99 | """Return the provider type."""
100 | return VMProviderType.DOCKER
101 |
102 | def _parse_memory(self, memory_str: str) -> str:
103 | """Parse memory string to Docker format.
104 |
105 | Examples:
106 | "8GB" -> "8g"
107 | "1024MB" -> "1024m"
108 | "512" -> "512m"
109 | """
110 | if isinstance(memory_str, int):
111 | return f"{memory_str}m"
112 |
113 | if isinstance(memory_str, str):
114 | # Extract number and unit
115 | match = re.match(r"(\d+)([A-Za-z]*)", memory_str)
116 | if match:
117 | value, unit = match.groups()
118 | unit = unit.upper()
119 |
120 | if unit == "GB" or unit == "G":
121 | return f"{value}g"
122 | elif unit == "MB" or unit == "M" or unit == "":
123 | return f"{value}m"
124 |
125 | # Default fallback
126 | logger.warning(f"Could not parse memory string '{memory_str}', using 4g default")
127 | return "4g" # Default to 4GB
128 |
129 | async def get_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
130 | """Get VM information by name.
131 |
132 | Args:
133 | name: Name of the VM to get information for
134 | storage: Optional storage path override. If provided, this will be used
135 | instead of the provider's default storage path.
136 |
137 | Returns:
138 | Dictionary with VM information including status, IP address, etc.
139 | """
140 | try:
141 | # Check if container exists and get its status
142 | cmd = ["docker", "inspect", name]
143 | result = subprocess.run(cmd, capture_output=True, text=True)
144 |
145 | if result.returncode != 0:
146 | # Container doesn't exist
147 | return {
148 | "name": name,
149 | "status": "not_found",
150 | "ip_address": None,
151 | "ports": {},
152 | "image": self.image,
153 | "provider": "docker",
154 | }
155 |
156 | # Parse container info
157 | container_info = json.loads(result.stdout)[0]
158 | state = container_info["State"]
159 | network_settings = container_info["NetworkSettings"]
160 |
161 | # Determine status
162 | if state["Running"]:
163 | status = "running"
164 | elif state["Paused"]:
165 | status = "paused"
166 | else:
167 | status = "stopped"
168 |
169 | # Get IP address
170 | ip_address = network_settings.get("IPAddress", "")
171 | if not ip_address and "Networks" in network_settings:
172 | # Try to get IP from bridge network
173 | for network_name, network_info in network_settings["Networks"].items():
174 | if network_info.get("IPAddress"):
175 | ip_address = network_info["IPAddress"]
176 | break
177 |
178 | # Get port mappings
179 | ports = {}
180 | if "Ports" in network_settings and network_settings["Ports"]:
181 | # network_settings["Ports"] is a dict like:
182 | # {'6901/tcp': [{'HostIp': '0.0.0.0', 'HostPort': '6901'}, ...], ...}
183 | for container_port, port_mappings in network_settings["Ports"].items():
184 | if port_mappings: # Check if there are any port mappings
185 | # Take the first mapping (usually the IPv4 one)
186 | for mapping in port_mappings:
187 | if mapping.get("HostPort"):
188 | ports[container_port] = mapping["HostPort"]
189 | break # Use the first valid mapping
190 |
191 | return {
192 | "name": name,
193 | "status": status,
194 | "ip_address": ip_address or "127.0.0.1", # Use localhost if no IP
195 | "ports": ports,
196 | "image": container_info["Config"]["Image"],
197 | "provider": "docker",
198 | "container_id": container_info["Id"][:12], # Short ID
199 | "created": container_info["Created"],
200 | "started": state.get("StartedAt", ""),
201 | }
202 |
203 | except Exception as e:
204 | logger.error(f"Error getting VM info for {name}: {e}")
205 | import traceback
206 |
207 | traceback.print_exc()
208 | return {"name": name, "status": "error", "error": str(e), "provider": "docker"}
209 |
210 | async def list_vms(self) -> List[Dict[str, Any]]:
211 | """List all Docker containers managed by this provider."""
212 | try:
213 | # List all containers (running and stopped) with the CUA image
214 | cmd = ["docker", "ps", "-a", "--filter", f"ancestor={self.image}", "--format", "json"]
215 | result = subprocess.run(cmd, capture_output=True, text=True, check=True)
216 |
217 | containers = []
218 | if result.stdout.strip():
219 | for line in result.stdout.strip().split("\n"):
220 | if line.strip():
221 | container_data = json.loads(line)
222 | vm_info = await self.get_vm(container_data["Names"])
223 | containers.append(vm_info)
224 |
225 | return containers
226 |
227 | except subprocess.CalledProcessError as e:
228 | logger.error(f"Error listing containers: {e.stderr}")
229 | return []
230 | except Exception as e:
231 | logger.error(f"Error listing VMs: {e}")
232 | import traceback
233 |
234 | traceback.print_exc()
235 | return []
236 |
237 | async def run_vm(
238 | self, image: str, name: str, run_opts: Dict[str, Any], storage: Optional[str] = None
239 | ) -> Dict[str, Any]:
240 | """Run a VM with the given options.
241 |
242 | Args:
243 | image: Name/tag of the Docker image to use
244 | name: Name of the container to run
245 | run_opts: Options for running the VM, including:
246 | - memory: Memory limit (e.g., "4GB", "2048MB")
247 | - cpu: CPU limit (e.g., 2 for 2 cores)
248 | - vnc_port: Specific port for VNC interface
249 | - api_port: Specific port for computer-server API
250 |
251 | Returns:
252 | Dictionary with VM status information
253 | """
254 | try:
255 | # Check if container already exists
256 | existing_vm = await self.get_vm(name, storage)
257 | if existing_vm["status"] == "running":
258 | logger.info(f"Container {name} is already running")
259 | return existing_vm
260 | elif existing_vm["status"] in ["stopped", "paused"]:
261 | if self.ephemeral:
262 | # Delete existing container
263 | logger.info(f"Deleting existing container {name}")
264 | delete_cmd = ["docker", "rm", name]
265 | result = subprocess.run(delete_cmd, capture_output=True, text=True, check=True)
266 | else:
267 | # Start existing container
268 | logger.info(f"Starting existing container {name}")
269 | start_cmd = ["docker", "start", name]
270 | result = subprocess.run(start_cmd, capture_output=True, text=True, check=True)
271 |
272 | # Wait for container to be ready
273 | await self._wait_for_container_ready(name)
274 | return await self.get_vm(name, storage)
275 |
276 | # Use provided image or default
277 | docker_image = image if image != "default" else self.image
278 |
279 | # Build docker run command
280 | cmd = ["docker", "run", "-d", "--name", name]
281 |
282 | # Add memory limit if specified
283 | if "memory" in run_opts:
284 | memory_limit = self._parse_memory(run_opts["memory"])
285 | cmd.extend(["--memory", memory_limit])
286 |
287 | # Add CPU limit if specified
288 | if "cpu" in run_opts:
289 | cpu_count = str(run_opts["cpu"])
290 | cmd.extend(["--cpus", cpu_count])
291 |
292 | # Add port mappings
293 | vnc_port = run_opts.get("vnc_port", self.vnc_port)
294 | api_port = run_opts.get("api_port", self.api_port)
295 |
296 | if vnc_port:
297 | cmd.extend(["-p", f"{vnc_port}:6901"]) # VNC port
298 | if api_port:
299 | # Map the API port to container port 8000 (computer-server default)
300 | cmd.extend(["-p", f"{api_port}:8000"]) # computer-server API port
301 |
302 | # Add volume mounts if storage is specified
303 | storage_path = storage or self.storage
304 | if storage_path and storage_path != "ephemeral":
305 | # Mount storage directory using detected home directory
306 | cmd.extend(["-v", f"{storage_path}:{self._home_dir}/storage"])
307 |
308 | # Add shared path if specified
309 | if self.shared_path:
310 | # Mount shared directory using detected home directory
311 | cmd.extend(["-v", f"{self.shared_path}:{self._home_dir}/shared"])
312 |
313 | # Add environment variables
314 | cmd.extend(["-e", "VNC_PW=password"]) # Set VNC password
315 | cmd.extend(["-e", "VNCOPTIONS=-disableBasicAuth"]) # Disable VNC basic auth
316 |
317 | # Apply display resolution if provided as a dict, e.g. {"width": 1024, "height": 768}
318 | display_resolution = run_opts.get("display")
319 | if (
320 | isinstance(display_resolution, dict)
321 | and "width" in display_resolution
322 | and "height" in display_resolution
323 | ):
324 | cmd.extend(
325 | [
326 | "-e",
327 | f"VNC_RESOLUTION={display_resolution['width']}x{display_resolution['height']}",
328 | ]
329 | )
330 |
331 | # Add the image
332 | cmd.append(docker_image)
333 |
334 | logger.info(f"Running Docker container with command: {' '.join(cmd)}")
335 |
336 | # Run the container
337 | result = subprocess.run(cmd, capture_output=True, text=True, check=True)
338 | container_id = result.stdout.strip()
339 |
340 | logger.info(f"Container {name} started with ID: {container_id[:12]}")
341 |
342 | # Store container info
343 | self._container_id = container_id
344 | self._running_containers[name] = container_id
345 |
346 | # Wait for container to be ready
347 | await self._wait_for_container_ready(name)
348 |
349 | # Return VM info
350 | vm_info = await self.get_vm(name, storage)
351 | vm_info["container_id"] = container_id[:12]
352 |
353 | return vm_info
354 |
355 | except subprocess.CalledProcessError as e:
356 | error_msg = f"Failed to run container {name}: {e.stderr}"
357 | logger.error(error_msg)
358 | return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
359 | except Exception as e:
360 | error_msg = f"Error running VM {name}: {e}"
361 | logger.error(error_msg)
362 | return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
363 |
364 | async def _wait_for_container_ready(self, container_name: str, timeout: int = 60) -> bool:
365 | """Wait for the Docker container to be fully ready.
366 |
367 | Args:
368 | container_name: Name of the Docker container to check
369 | timeout: Maximum time to wait in seconds (default: 60 seconds)
370 |
371 | Returns:
372 | True if the container is running and ready
373 | """
374 | logger.info(f"Waiting for container {container_name} to be ready...")
375 |
376 | start_time = time.time()
377 | while time.time() - start_time < timeout:
378 | try:
379 | # Check if container is running
380 | vm_info = await self.get_vm(container_name)
381 | if vm_info["status"] == "running":
382 | logger.info(f"Container {container_name} is running")
383 |
384 | # Additional check: try to connect to computer-server API
385 | # This is optional - we'll just wait a bit more for services to start
386 | await asyncio.sleep(5)
387 | return True
388 |
389 | except Exception as e:
390 | logger.debug(f"Container {container_name} not ready yet: {e}")
391 |
392 | await asyncio.sleep(2)
393 |
394 | logger.warning(f"Container {container_name} did not become ready within {timeout} seconds")
395 | return False
396 |
397 | async def stop_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
398 | """Stop a running VM by stopping the Docker container."""
399 | try:
400 | logger.info(f"Stopping container {name}")
401 |
402 | # Stop the container
403 | cmd = ["docker", "stop", name]
404 | result = subprocess.run(cmd, capture_output=True, text=True, check=True)
405 |
406 | # Remove from running containers tracking
407 | if name in self._running_containers:
408 | del self._running_containers[name]
409 |
410 | logger.info(f"Container {name} stopped successfully")
411 |
412 | # Delete container if ephemeral=True
413 | if self.ephemeral:
414 | cmd = ["docker", "rm", name]
415 | result = subprocess.run(cmd, capture_output=True, text=True, check=True)
416 |
417 | return {
418 | "name": name,
419 | "status": "stopped",
420 | "message": "Container stopped successfully",
421 | "provider": "docker",
422 | }
423 |
424 | except subprocess.CalledProcessError as e:
425 | error_msg = f"Failed to stop container {name}: {e.stderr}"
426 | logger.error(error_msg)
427 | return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
428 | except Exception as e:
429 | error_msg = f"Error stopping VM {name}: {e}"
430 | logger.error(error_msg)
431 | return {"name": name, "status": "error", "error": error_msg, "provider": "docker"}
432 |
433 | async def restart_vm(self, name: str, storage: Optional[str] = None) -> Dict[str, Any]:
434 | raise NotImplementedError("DockerProvider does not support restarting VMs.")
435 |
436 | async def update_vm(
437 | self, name: str, update_opts: Dict[str, Any], storage: Optional[str] = None
438 | ) -> Dict[str, Any]:
439 | """Update VM configuration.
440 |
441 | Note: Docker containers cannot be updated while running.
442 | This method returns an error advising the caller to recreate the container.
443 | """
444 | return {
445 | "name": name,
446 | "status": "error",
447 | "error": "Docker containers cannot be updated while running. Please stop and recreate the container with new options.",
448 | "provider": "docker",
449 | }
450 |
451 | async def get_ip(self, name: str, storage: Optional[str] = None, retry_delay: int = 2) -> str:
452 | """Get the IP address of a VM, waiting indefinitely until it's available.
453 |
454 | Args:
455 | name: Name of the VM to get the IP for
456 | storage: Optional storage path override
457 | retry_delay: Delay between retries in seconds (default: 2)
458 |
459 | Returns:
460 | IP address of the VM when it becomes available
461 | """
462 | logger.info(f"Getting IP address for container {name}")
463 |
464 | total_attempts = 0
465 | while True:
466 | total_attempts += 1
467 |
468 | try:
469 | vm_info = await self.get_vm(name, storage)
470 |
471 | if vm_info["status"] == "error":
472 | raise Exception(
473 | f"VM is in error state: {vm_info.get('error', 'Unknown error')}"
474 | )
475 |
476 | # TODO: for now, return localhost
477 | # it seems the docker container is not accessible from the host
478 | # on WSL2, unless you port forward? not sure
479 | if True:
480 | logger.warning("Overriding container IP with localhost")
481 | return "localhost"
482 |
483 | # Check if we got a valid IP
484 | ip = vm_info.get("ip_address", None)
485 | if ip and ip != "unknown" and not ip.startswith("0.0.0.0"):
486 | logger.info(f"Got valid container IP address: {ip}")
487 | return ip
488 |
489 | # For Docker containers, we can also use localhost if ports are mapped
490 | if vm_info["status"] == "running" and vm_info.get("ports"):
491 | logger.info("Container is running with port mappings, using localhost")
492 | return "127.0.0.1"
493 |
494 | # Check the container status
495 | status = vm_info.get("status", "unknown")
496 |
497 | if status == "stopped":
498 | logger.info(f"Container status is {status}, but still waiting for it to start")
499 | elif status != "running":
500 | logger.info(f"Container is not running yet (status: {status}). Waiting...")
501 | else:
502 | logger.info("Container is running but no valid IP address yet. Waiting...")
503 |
504 | except Exception as e:
505 | logger.warning(f"Error getting container {name} IP: {e}, continuing to wait...")
506 |
507 | # Wait before next retry
508 | await asyncio.sleep(retry_delay)
509 |
510 | # Add progress log every 10 attempts
511 | if total_attempts % 10 == 0:
512 | logger.info(
513 | f"Still waiting for container {name} IP after {total_attempts} attempts..."
514 | )
515 |
516 | async def __aenter__(self):
517 | """Async context manager entry."""
518 | logger.debug("Entering DockerProvider context")
519 | return self
520 |
521 | async def __aexit__(self, exc_type, exc_val, exc_tb):
522 | """Async context manager exit.
523 |
524 | This method handles cleanup of running containers if needed.
525 | """
526 | logger.debug(f"Exiting DockerProvider context, handling exceptions: {exc_type}")
527 | try:
528 | # Optionally stop running containers on context exit
529 | # For now, we'll leave containers running as they might be needed
530 | # Users can manually stop them if needed
531 | pass
532 | except Exception as e:
533 | logger.error(f"Error during DockerProvider cleanup: {e}")
534 | if exc_type is None:
535 | raise
536 | return False
537 |
```
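The memory handling in `_parse_memory` above can be tried in isolation. The sketch below is a standalone variant, not the provider's own code: the name `parse_memory` and the `default` parameter are hypothetical. It differs in one respect, using `re.fullmatch` instead of `re.match`, so an input such as `"1.5GB"` falls back to the default rather than being read as `"1"` with an empty unit.

```python
import re
from typing import Union


def parse_memory(value: Union[int, str], default: str = "4g") -> str:
    """Convert "8GB", "1024MB", "512", or an int (megabytes) to Docker's --memory format."""
    if isinstance(value, int):
        # Bare integers are treated as megabytes, as in the provider.
        return f"{value}m"

    # fullmatch rejects trailing junk such as ".5GB" (an assumption of this sketch).
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", str(value))
    if match:
        number, unit = match.group(1), match.group(2).upper()
        if unit in ("GB", "G"):
            return f"{number}g"
        if unit in ("MB", "M", ""):
            return f"{number}m"

    # Unknown unit or unparseable input: fall back to the default limit.
    return default


if __name__ == "__main__":
    for sample in ("8GB", "1024MB", "512", 2048, "1.5GB", "8TB"):
        print(repr(sample), "->", parse_memory(sample))
```

The stricter match is a design choice for the sketch: silently turning `"1.5GB"` into a 1 MB limit would be harder to diagnose than falling back to the default.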
--------------------------------------------------------------------------------
/docs/content/docs/example-usecases/gemini-complex-ui-navigation.mdx:
--------------------------------------------------------------------------------
```markdown
1 | ---
2 | title: GUI Grounding with Gemini 3
3 | description: Using Google's Gemini 3 with OmniParser for Advanced GUI Grounding Tasks
4 | ---
5 |
6 | import { Step, Steps } from 'fumadocs-ui/components/steps';
7 | import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
8 | import { Callout } from 'fumadocs-ui/components/callout';
9 |
10 | ## Overview
11 |
12 | This example demonstrates how to use Google's Gemini 3 models with OmniParser for complex GUI grounding tasks. Gemini 3 Pro scores **72.7%** on the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding) (compared to Claude Sonnet 4.5's 36.2%), which makes it well suited to precise UI element location and complex navigation tasks.
13 |
14 | <img
15 | src="/docs/img/grounding-with-gemini3.gif"
16 | alt="Demo of Gemini 3 with OmniParser performing complex GUI navigation tasks"
17 | width="800px"
18 | />
19 |
20 | <Callout type="info" title="Why Gemini 3 for UI Navigation?">
21 | According to [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/), Gemini 3 Pro achieves:
22 | - **72.7%** on ScreenSpot-Pro (vs. Gemini 2.5 Pro's 11.4%)
23 | - Industry-leading performance on complex UI navigation tasks
24 | - Advanced multimodal understanding for high-resolution screens
25 | </Callout>
26 |
27 | ### What You'll Build
28 |
29 | This guide shows how to:
30 |
31 | - Set up Vertex AI with proper authentication
32 | - Use OmniParser with Gemini 3 for GUI element detection
33 | - Leverage Gemini 3-specific features like `thinking_level` and `media_resolution`
34 | - Create agents that can perform complex multi-step UI interactions
35 |
36 | ---
37 |
38 | <Steps>
39 |
40 | <Step>
41 |
42 | ### Set Up Google Cloud and Vertex AI
43 |
44 | Before using Gemini 3 models, you need to enable Vertex AI in Google Cloud Console.
45 |
46 | #### 1. Create a Google Cloud Project
47 |
48 | 1. Go to [Google Cloud Console](https://console.cloud.google.com/)
49 | 2. Click **Select a project** → **New Project**
50 | 3. Enter a project name and click **Create**
51 | 4. Note your **Project ID** (you'll need this later)
52 |
53 | #### 2. Enable Vertex AI API
54 |
55 | 1. Navigate to [Vertex AI API](https://console.cloud.google.com/apis/library/aiplatform.googleapis.com)
56 | 2. Select your project
57 | 3. Click **Enable**
58 |
59 | #### 3. Enable Billing
60 |
61 | 1. Go to [Billing](https://console.cloud.google.com/billing)
62 | 2. Link a billing account to your project
63 | 3. Vertex AI offers a [free tier](https://cloud.google.com/vertex-ai/pricing) for testing
64 |
65 | #### 4. Create a Service Account
66 |
67 | 1. Go to [IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
68 | 2. Click **Create Service Account**
69 | 3. Enter a name (e.g., "cua-gemini-agent")
70 | 4. Click **Create and Continue**
71 | 5. Grant the **Vertex AI User** role
72 | 6. Click **Done**
73 |
74 | #### 5. Create and Download Service Account Key
75 |
76 | 1. Click on your newly created service account
77 | 2. Go to **Keys** tab
78 | 3. Click **Add Key** → **Create new key**
79 | 4. Select **JSON** format
80 | 5. Click **Create** (the key file will download automatically)
81 | 6. **Important**: Store this key file securely! It contains credentials for accessing your Google Cloud resources
82 |
83 | <Callout type="warn">
84 | Never commit your service account JSON key to version control! Add it to `.gitignore` immediately.
85 | </Callout>
86 |
87 | </Step>
88 |
89 | <Step>
90 |
91 | ### Install Dependencies
92 |
93 | OmniParser and Gemini 3 require the packages listed below.
94 |
95 | Create a `requirements.txt` file:
96 |
97 | ```text
98 | cua-agent
99 | cua-computer
100 | cua-som # OmniParser for GUI element detection
101 | litellm>=1.0.0
102 | python-dotenv>=1.0.0
103 | google-cloud-aiplatform>=1.70.0
104 | ```
105 |
106 | Install the dependencies:
107 |
108 | ```bash
109 | pip install -r requirements.txt
110 | ```
111 |
112 | </Step>
113 |
114 | <Step>
115 |
116 | ### Configure Environment Variables
117 |
118 | Create a `.env` file in your project root:
119 |
120 | ```text
121 | # Google Cloud / Vertex AI credentials
122 | GOOGLE_CLOUD_PROJECT=your-project-id
123 | GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-service-account-key.json
124 |
125 | # Cua credentials (for cloud sandboxes)
126 | CUA_API_KEY=sk_cua-api01...
127 | CUA_SANDBOX_NAME=your-sandbox-name
128 | ```
129 |
130 | Replace the values:
131 |
132 | - `your-project-id`: Your Google Cloud Project ID from Step 1
133 | - `/path/to/your-service-account-key.json`: Path to the JSON key file you downloaded
134 | - `sk_cua-api01...`: Your Cua API key from the [Cua dashboard](https://cua.dev)
135 | - `your-sandbox-name`: Your sandbox name (if using cloud sandboxes)
136 |
137 | </Step>
138 |
139 | <Step>
140 |
141 | ### Create Your Complex UI Navigation Script
142 |
143 | Create a Python file (e.g., `gemini_ui_navigation.py`):
144 |
145 | <Tabs items={['Cloud Sandbox', 'Linux on Docker', 'macOS Sandbox']}>
146 | <Tab value="Cloud Sandbox">
147 |
148 | ```python
149 | import asyncio
150 | import logging
151 | import os
152 | import signal
153 | import traceback
154 |
155 | from agent import ComputerAgent
156 | from computer import Computer, VMProviderType
157 | from dotenv import load_dotenv
158 |
159 | logging.basicConfig(level=logging.INFO)
160 | logger = logging.getLogger(__name__)
161 |
162 | def handle_sigint(sig, frame):
163 | print("\n\nExecution interrupted by user. Exiting gracefully...")
164 | exit(0)
165 |
166 | async def complex_ui_navigation():
167 | """
168 | Demonstrate Gemini 3's exceptional UI grounding capabilities
169 | with complex, multi-step navigation tasks.
170 | """
171 | try:
172 | async with Computer(
173 | os_type="linux",
174 | provider_type=VMProviderType.CLOUD,
175 | name=os.environ["CUA_SANDBOX_NAME"],
176 | api_key=os.environ["CUA_API_KEY"],
177 | verbosity=logging.INFO,
178 | ) as computer:
179 |
180 | agent = ComputerAgent(
181 | # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
182 | model="omniparser+vertex_ai/gemini-3-pro-preview",
183 | tools=[computer],
184 | only_n_most_recent_images=3,
185 | verbosity=logging.INFO,
186 | trajectory_dir="trajectories",
187 | use_prompt_caching=False,
188 | max_trajectory_budget=5.0,
189 | # Gemini 3-specific parameters
190 | thinking_level="high", # Enables deeper reasoning (vs "low")
191 | media_resolution="high", # High-resolution image processing (vs "low" or "medium")
192 | )
193 |
194 | # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
195 | # These test precise element location in professional UIs
196 | tasks = [
197 | # Task 1: GitHub repository navigation
198 | {
199 | "instruction": (
200 | "Go to github.com/trycua/cua. "
201 | "Find and click on the 'Issues' tab. "
202 | "Then locate and click on the search box within the issues page "
203 | "(not the global GitHub search). "
204 | "Type 'omniparser' and press Enter."
205 | ),
206 | "description": "Tests precise UI element distinction in a complex interface",
207 | },
208 |
209 | # Task 2: Search for and install Visual Studio Code
210 | {
211 | "instruction": (
212 | "Open your system's app store (e.g., Microsoft Store). "
213 | "Search for 'Visual Studio Code'. "
214 | "In the search results, select 'Visual Studio Code'. "
215 | "Click on 'Install' or 'Get' to begin the installation. "
216 | "If prompted, accept any permissions or confirm the installation. "
217 | "Wait for Visual Studio Code to finish installing."
218 | ),
219 | "description": "Tests the ability to search for an application and complete its installation through a step-by-step app store workflow.",
220 | },
221 | ]
222 |
223 | history = []
224 |
225 | for i, task_info in enumerate(tasks, 1):
226 | task = task_info["instruction"]
227 | print(f"\n{'='*60}")
228 | print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
229 | print(f"{'='*60}")
230 | print(f"\nInstruction: {task}\n")
231 |
232 | # Add user message to history
233 | history.append({"role": "user", "content": task})
234 |
235 | # Run agent with conversation history
236 | async for result in agent.run(history, stream=False):
237 | history += result.get("output", [])
238 |
239 | # Print output for debugging
240 | for item in result.get("output", []):
241 | if item.get("type") == "message":
242 | content = item.get("content", [])
243 | for content_part in content:
244 | if content_part.get("text"):
245 | logger.info(f"Agent: {content_part.get('text')}")
246 | elif item.get("type") == "computer_call":
247 | action = item.get("action", {})
248 | action_type = action.get("type", "")
249 | logger.debug(f"Computer Action: {action_type}")
250 |
251 | print(f"\n✅ Task {i}/{len(tasks)} completed")
252 |
253 | print("\n🎉 All complex UI navigation tasks completed successfully!")
254 |
255 | except Exception as e:
256 | logger.error(f"Error in complex_ui_navigation: {e}")
257 | traceback.print_exc()
258 | raise
259 |
260 | def main():
261 | try:
262 | load_dotenv()
263 |
264 | # Validate required environment variables
265 | required_vars = [
266 | "GOOGLE_CLOUD_PROJECT",
267 | "GOOGLE_APPLICATION_CREDENTIALS",
268 | "CUA_API_KEY",
269 | "CUA_SANDBOX_NAME",
270 | ]
271 |
272 | missing_vars = [var for var in required_vars if not os.environ.get(var)]
273 | if missing_vars:
274 | raise RuntimeError(
275 | f"Missing required environment variables: {', '.join(missing_vars)}\n"
276 | f"Please check your .env file and ensure all keys are set.\n"
277 | f"See the setup guide for details on configuring Vertex AI credentials."
278 | )
279 |
280 | signal.signal(signal.SIGINT, handle_sigint)
281 |
282 | asyncio.run(complex_ui_navigation())
283 |
284 | except Exception as e:
285 | logger.error(f"Error running automation: {e}")
286 | traceback.print_exc()
287 |
288 | if __name__ == "__main__":
289 | main()
290 | ```
291 |
292 | </Tab>
293 | <Tab value="Linux on Docker">
294 |
295 | ```python
296 | import asyncio
297 | import logging
298 | import os
299 | import signal
300 | import traceback
301 |
302 | from agent import ComputerAgent
303 | from computer import Computer, VMProviderType
304 | from dotenv import load_dotenv
305 |
306 | logging.basicConfig(level=logging.INFO)
307 | logger = logging.getLogger(__name__)
308 |
309 | def handle_sigint(sig, frame):
310 | print("\n\nExecution interrupted by user. Exiting gracefully...")
311 | exit(0)
312 |
313 | async def complex_ui_navigation():
314 |     """
315 |     Demonstrate Gemini 3's exceptional UI grounding capabilities
316 |     with complex, multi-step navigation tasks.
317 |     """
318 |     try:
319 |         async with Computer(
320 |             os_type="linux",
321 |             provider_type=VMProviderType.DOCKER,
322 |             image="trycua/cua-xfce:latest",
323 |             verbosity=logging.INFO,
324 |         ) as computer:
325 |
326 |             agent = ComputerAgent(
327 |                 # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
328 |                 model="omniparser+vertex_ai/gemini-3-pro-preview",
329 |                 tools=[computer],
330 |                 only_n_most_recent_images=3,
331 |                 verbosity=logging.INFO,
332 |                 trajectory_dir="trajectories",
333 |                 use_prompt_caching=False,
334 |                 max_trajectory_budget=5.0,
335 |                 # Gemini 3-specific parameters
336 |                 thinking_level="high",  # Enables deeper reasoning (vs "low")
337 |                 media_resolution="high",  # High-resolution image processing (vs "low" or "medium")
338 |             )
339 |
340 |             # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
341 |             tasks = [
342 |                 {
343 |                     "instruction": (
344 |                         "Go to github.com/trycua/cua. "
345 |                         "Find and click on the 'Issues' tab. "
346 |                         "Then locate and click on the search box within the issues page "
347 |                         "(not the global GitHub search). "
348 |                         "Type 'omniparser' and press Enter."
349 |                     ),
350 |                     "description": "Tests precise UI element distinction in a complex interface",
351 |                 },
352 |             ]
353 |
354 |             history = []
355 |
356 |             for i, task_info in enumerate(tasks, 1):
357 |                 task = task_info["instruction"]
358 |                 print(f"\n{'='*60}")
359 |                 print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
360 |                 print(f"{'='*60}")
361 |                 print(f"\nInstruction: {task}\n")
362 |
363 |                 history.append({"role": "user", "content": task})
364 |
365 |                 async for result in agent.run(history, stream=False):
366 |                     history += result.get("output", [])
367 |
368 |                     for item in result.get("output", []):
369 |                         if item.get("type") == "message":
370 |                             content = item.get("content", [])
371 |                             for content_part in content:
372 |                                 if content_part.get("text"):
373 |                                     logger.info(f"Agent: {content_part.get('text')}")
374 |                         elif item.get("type") == "computer_call":
375 |                             action = item.get("action", {})
376 |                             action_type = action.get("type", "")
377 |                             logger.debug(f"Computer Action: {action_type}")
378 |
379 |                 print(f"\n✅ Task {i}/{len(tasks)} completed")
380 |
381 |             print("\n🎉 All complex UI navigation tasks completed successfully!")
382 |
383 |     except Exception as e:
384 |         logger.error(f"Error in complex_ui_navigation: {e}")
385 |         traceback.print_exc()
386 |         raise
387 |
388 | def main():
389 |     try:
390 |         load_dotenv()
391 |
392 |         required_vars = [
393 |             "GOOGLE_CLOUD_PROJECT",
394 |             "GOOGLE_APPLICATION_CREDENTIALS",
395 |         ]
396 |
397 |         missing_vars = [var for var in required_vars if not os.environ.get(var)]
398 |         if missing_vars:
399 |             raise RuntimeError(
400 |                 f"Missing required environment variables: {', '.join(missing_vars)}\n"
401 |                 f"Please check your .env file."
402 |             )
403 |
404 |         signal.signal(signal.SIGINT, handle_sigint)
405 |
406 |         asyncio.run(complex_ui_navigation())
407 |
408 |     except Exception as e:
409 |         logger.error(f"Error running automation: {e}")
410 |         traceback.print_exc()
411 |
412 | if __name__ == "__main__":
413 |     main()
414 | ```
415 |
416 | </Tab>
417 | <Tab value="macOS Sandbox">
418 |
419 | ```python
420 | import asyncio
421 | import logging
422 | import os
423 | import signal
424 | import traceback
425 |
426 | from agent import ComputerAgent
427 | from computer import Computer, VMProviderType
428 | from dotenv import load_dotenv
429 |
430 | logging.basicConfig(level=logging.INFO)
431 | logger = logging.getLogger(__name__)
432 |
433 | def handle_sigint(sig, frame):
434 |     print("\n\nExecution interrupted by user. Exiting gracefully...")
435 |     exit(0)
436 |
437 | async def complex_ui_navigation():
438 |     """
439 |     Demonstrate Gemini 3's exceptional UI grounding capabilities
440 |     with complex, multi-step navigation tasks.
441 |     """
442 |     try:
443 |         async with Computer(
444 |             os_type="macos",
445 |             provider_type=VMProviderType.LUME,
446 |             name="macos-sequoia-cua:latest",
447 |             verbosity=logging.INFO,
448 |         ) as computer:
449 |
450 |             agent = ComputerAgent(
451 |                 # Use OmniParser with Gemini 3 Pro for optimal GUI grounding
452 |                 model="omniparser+vertex_ai/gemini-3-pro-preview",
453 |                 tools=[computer],
454 |                 only_n_most_recent_images=3,
455 |                 verbosity=logging.INFO,
456 |                 trajectory_dir="trajectories",
457 |                 use_prompt_caching=False,
458 |                 max_trajectory_budget=5.0,
459 |                 # Gemini 3-specific parameters
460 |                 thinking_level="high",  # Enables deeper reasoning (vs "low")
461 |                 media_resolution="high",  # High-resolution image processing (vs "low" or "medium")
462 |             )
463 |
464 |             # Complex GUI grounding tasks inspired by ScreenSpot-Pro benchmark
465 |             tasks = [
466 |                 {
467 |                     "instruction": (
468 |                         "Go to github.com/trycua/cua. "
469 |                         "Find and click on the 'Issues' tab. "
470 |                         "Then locate and click on the search box within the issues page "
471 |                         "(not the global GitHub search). "
472 |                         "Type 'omniparser' and press Enter."
473 |                     ),
474 |                     "description": "Tests precise UI element distinction in a complex interface",
475 |                 },
476 |             ]
477 |
478 |             history = []
479 |
480 |             for i, task_info in enumerate(tasks, 1):
481 |                 task = task_info["instruction"]
482 |                 print(f"\n{'='*60}")
483 |                 print(f"[Task {i}/{len(tasks)}] {task_info['description']}")
484 |                 print(f"{'='*60}")
485 |                 print(f"\nInstruction: {task}\n")
486 |
487 |                 history.append({"role": "user", "content": task})
488 |
489 |                 async for result in agent.run(history, stream=False):
490 |                     history += result.get("output", [])
491 |
492 |                     for item in result.get("output", []):
493 |                         if item.get("type") == "message":
494 |                             content = item.get("content", [])
495 |                             for content_part in content:
496 |                                 if content_part.get("text"):
497 |                                     logger.info(f"Agent: {content_part.get('text')}")
498 |                         elif item.get("type") == "computer_call":
499 |                             action = item.get("action", {})
500 |                             action_type = action.get("type", "")
501 |                             logger.debug(f"Computer Action: {action_type}")
502 |
503 |                 print(f"\n✅ Task {i}/{len(tasks)} completed")
504 |
505 |             print("\n🎉 All complex UI navigation tasks completed successfully!")
506 |
507 |     except Exception as e:
508 |         logger.error(f"Error in complex_ui_navigation: {e}")
509 |         traceback.print_exc()
510 |         raise
511 |
512 | def main():
513 |     try:
514 |         load_dotenv()
515 |
516 |         required_vars = [
517 |             "GOOGLE_CLOUD_PROJECT",
518 |             "GOOGLE_APPLICATION_CREDENTIALS",
519 |         ]
520 |
521 |         missing_vars = [var for var in required_vars if not os.environ.get(var)]
522 |         if missing_vars:
523 |             raise RuntimeError(
524 |                 f"Missing required environment variables: {', '.join(missing_vars)}\n"
525 |                 f"Please check your .env file."
526 |             )
527 |
528 |         signal.signal(signal.SIGINT, handle_sigint)
529 |
530 |         asyncio.run(complex_ui_navigation())
531 |
532 |     except Exception as e:
533 |         logger.error(f"Error running automation: {e}")
534 |         traceback.print_exc()
535 |
536 | if __name__ == "__main__":
537 |     main()
538 | ```
539 |
540 | </Tab>
541 | </Tabs>
542 |
543 | </Step>
544 |
545 | <Step>
546 |
547 | ### Run Your Script
548 |
549 | Execute your complex UI navigation automation:
550 |
551 | ```bash
552 | python gemini_ui_navigation.py
553 | ```
554 |
555 | The agent will:
556 |
557 | 1. Navigate to GitHub and locate specific UI elements
558 | 2. Distinguish between similar elements (e.g., global search vs. issues search)
559 | 3. Perform multi-step interactions with visual feedback
560 | 4. Use Gemini 3's advanced reasoning for precise element grounding
561 |
562 | Monitor the output to see the agent's progress through each task.
563 |
564 | </Step>
565 |
566 | </Steps>
567 |
568 | ---
569 |
570 | ## Understanding Gemini 3-Specific Parameters
571 |
572 | ### `thinking_level`
573 |
574 | Controls the amount of internal reasoning the model performs:
575 |
576 | - `"high"`: Deeper reasoning, better for complex UI navigation (recommended for ScreenSpot-like tasks)
577 | - `"low"`: Faster responses, suitable for simpler tasks
578 |
579 | ### `media_resolution`
580 |
581 | Controls vision processing for multimodal inputs:
582 |
583 | - `"high"`: Best for complex UIs with many small elements (recommended)
584 | - `"medium"`: Balanced quality and speed
585 | - `"low"`: Faster processing for simple interfaces
586 |
587 | <Callout type="info">
588 | For tasks requiring precise GUI element location (like ScreenSpot-Pro), use
589 | `thinking_level="high"` and `media_resolution="high"` for optimal performance.
590 | </Callout>
591 |
592 | ---
593 |
594 | ## Benchmark Performance
595 |
596 | Gemini 3 Pro's performance on ScreenSpot-Pro demonstrates its exceptional UI grounding capabilities:
597 |
598 | | Model | ScreenSpot-Pro Score |
599 | | ----------------- | -------------------- |
600 | | **Gemini 3 Pro** | **72.7%** |
601 | | Claude Sonnet 4.5 | 36.2% |
602 | | Gemini 2.5 Pro | 11.4% |
603 | | GPT-5.1 | 3.5% |
604 |
605 | These scores make Gemini 3 Pro a strong choice for complex UI navigation, element detection, and professional GUI automation tasks.
606 |
607 | ---
608 |
609 | ## Troubleshooting
610 |
611 | ### Authentication Issues
612 |
613 | If you encounter authentication errors:
614 |
615 | 1. Verify your service account JSON key path is correct
616 | 2. Ensure the service account has the **Vertex AI User** role
617 | 3. Check that the Vertex AI API is enabled in your project
618 | 4. Confirm your `GOOGLE_CLOUD_PROJECT` matches your actual project ID
619 |
620 | ### "Vertex AI API not enabled" Error
621 |
622 | Run this command to enable the API:
623 |
624 | ```bash
625 | gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
626 | ```
627 |
628 | ### Billing Issues
629 |
630 | Ensure billing is enabled for your Google Cloud project. Visit the [Billing section](https://console.cloud.google.com/billing) to verify.
631 |
632 | ---
633 |
634 | ## Next Steps
635 |
636 | - Learn more about [OmniParser agent loops](/agent-sdk/agent-loops)
637 | - Explore [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing)
638 | - Read about the [ScreenSpot-Pro benchmark](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding)
639 | - Check out [Google's Gemini 3 announcement](https://blog.google/products/gemini/gemini-3/)
640 | - Join our [Discord community](https://discord.com/invite/mVnXXpdE85) for help
641 |
```
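
The three example scripts above repeat two small pieces of logic: checking that required environment variables are set, and pulling the agent's text out of each result's `output` list. A minimal sketch of how these could be factored into standalone helpers follows. The names `find_missing_env` and `collect_agent_text` are hypothetical and not part of the Cua SDK, and the item shape (`type`, `content`, `text`) is assumed to match what the loops above read.

```python
import os
from typing import Iterable, List, Mapping, Optional


def find_missing_env(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty.

    Hypothetical helper: mirrors the `missing_vars` list comprehension
    in each `main()` above. Pass `env` to check a dict instead of os.environ.
    """
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name)]


def collect_agent_text(output_items: Iterable[dict]) -> List[str]:
    """Gather non-empty text parts from items whose type is "message".

    Hypothetical helper: assumes each item is a dict shaped like the ones
    the example loops inspect; other item types (such as "computer_call")
    are skipped.
    """
    texts: List[str] = []
    for item in output_items:
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("text"):
                texts.append(part["text"])
    return texts


if __name__ == "__main__":
    # Illustrative values only; real scripts would read os.environ.
    sample_env = {
        "GOOGLE_CLOUD_PROJECT": "demo-project",
        "GOOGLE_APPLICATION_CREDENTIALS": "",
    }
    required = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS"]
    print("Missing:", find_missing_env(required, sample_env))

    sample_output = [
        {"type": "message", "content": [{"text": "Opened the Issues tab."}]},
        {"type": "computer_call", "action": {"type": "click"}},
    ]
    print("Agent said:", collect_agent_text(sample_output))
```

Accepting an optional `env` mapping keeps the check testable without touching the real process environment.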