trycua/cua # codebase.md

# Directory Structure

```
├── .all-contributorsrc
├── .cursorignore
├── .devcontainer
│   ├── devcontainer.json
│   ├── post-install.sh
│   └── README.md
├── .dockerignore
├── .gitattributes
├── .github
│   ├── FUNDING.yml
│   ├── scripts
│   │   ├── get_pyproject_version.py
│   │   └── tests
│   │       ├── __init__.py
│   │       ├── README.md
│   │       └── test_get_pyproject_version.py
│   └── workflows
│       ├── ci-lume.yml
│       ├── docker-publish-kasm.yml
│       ├── docker-publish-xfce.yml
│       ├── docker-reusable-publish.yml
│       ├── npm-publish-computer.yml
│       ├── npm-publish-core.yml
│       ├── publish-lume.yml
│       ├── pypi-publish-agent.yml
│       ├── pypi-publish-computer-server.yml
│       ├── pypi-publish-computer.yml
│       ├── pypi-publish-core.yml
│       ├── pypi-publish-mcp-server.yml
│       ├── pypi-publish-pylume.yml
│       ├── pypi-publish-som.yml
│       ├── pypi-reusable-publish.yml
│       └── test-validation-script.yml
├── .gitignore
├── .vscode
│   ├── docs.code-workspace
│   ├── launch.json
│   ├── libs-ts.code-workspace
│   ├── lume.code-workspace
│   ├── lumier.code-workspace
│   ├── py.code-workspace
│   └── settings.json
├── blog
│   ├── app-use.md
│   ├── assets
│   │   ├── composite-agents.png
│   │   ├── docker-ubuntu-support.png
│   │   ├── hack-booth.png
│   │   ├── hack-closing-ceremony.jpg
│   │   ├── hack-cua-ollama-hud.jpeg
│   │   ├── hack-leaderboard.png
│   │   ├── hack-the-north.png
│   │   ├── hack-winners.jpeg
│   │   ├── hack-workshop.jpeg
│   │   ├── hud-agent-evals.png
│   │   └── trajectory-viewer.jpeg
│   ├── bringing-computer-use-to-the-web.md
│   ├── build-your-own-operator-on-macos-1.md
│   ├── build-your-own-operator-on-macos-2.md
│   ├── composite-agents.md
│   ├── cua-hackathon.md
│   ├── hack-the-north.md
│   ├── hud-agent-evals.md
│   ├── human-in-the-loop.md
│   ├── introducing-cua-cloud-containers.md
│   ├── lume-to-containerization.md
│   ├── sandboxed-python-execution.md
│   ├── training-computer-use-models-trajectories-1.md
│   ├── trajectory-viewer.md
│   ├── ubuntu-docker-support.md
│   └── windows-sandbox.md
├── CONTRIBUTING.md
├── Development.md
├── Dockerfile
├── docs
│   ├── .gitignore
│   ├── .prettierrc
│   ├── content
│   │   └── docs
│   │       ├── agent-sdk
│   │       │   ├── agent-loops.mdx
│   │       │   ├── benchmarks
│   │       │   │   ├── index.mdx
│   │       │   │   ├── interactive.mdx
│   │       │   │   ├── introduction.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── osworld-verified.mdx
│   │       │   │   ├── screenspot-pro.mdx
│   │       │   │   └── screenspot-v2.mdx
│   │       │   ├── callbacks
│   │       │   │   ├── agent-lifecycle.mdx
│   │       │   │   ├── cost-saving.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── logging.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── pii-anonymization.mdx
│   │       │   │   └── trajectories.mdx
│   │       │   ├── chat-history.mdx
│   │       │   ├── custom-computer-handlers.mdx
│   │       │   ├── custom-tools.mdx
│   │       │   ├── customizing-computeragent.mdx
│   │       │   ├── integrations
│   │       │   │   ├── hud.mdx
│   │       │   │   └── meta.json
│   │       │   ├── message-format.mdx
│   │       │   ├── meta.json
│   │       │   ├── migration-guide.mdx
│   │       │   ├── prompt-caching.mdx
│   │       │   ├── supported-agents
│   │       │   │   ├── composed-agents.mdx
│   │       │   │   ├── computer-use-agents.mdx
│   │       │   │   ├── grounding-models.mdx
│   │       │   │   ├── human-in-the-loop.mdx
│   │       │   │   └── meta.json
│   │       │   ├── supported-model-providers
│   │       │   │   ├── index.mdx
│   │       │   │   └── local-models.mdx
│   │       │   └── usage-tracking.mdx
│   │       ├── computer-sdk
│   │       │   ├── cloud-vm-management.mdx
│   │       │   ├── commands.mdx
│   │       │   ├── computer-ui.mdx
│   │       │   ├── computers.mdx
│   │       │   ├── meta.json
│   │       │   └── sandboxed-python.mdx
│   │       ├── index.mdx
│   │       ├── libraries
│   │       │   ├── agent
│   │       │   │   └── index.mdx
│   │       │   ├── computer
│   │       │   │   └── index.mdx
│   │       │   ├── computer-server
│   │       │   │   ├── Commands.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── REST-API.mdx
│   │       │   │   └── WebSocket-API.mdx
│   │       │   ├── core
│   │       │   │   └── index.mdx
│   │       │   ├── lume
│   │       │   │   ├── cli-reference.mdx
│   │       │   │   ├── faq.md
│   │       │   │   ├── http-api.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── meta.json
│   │       │   │   └── prebuilt-images.mdx
│   │       │   ├── lumier
│   │       │   │   ├── building-lumier.mdx
│   │       │   │   ├── docker-compose.mdx
│   │       │   │   ├── docker.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   └── meta.json
│   │       │   ├── mcp-server
│   │       │   │   ├── client-integrations.mdx
│   │       │   │   ├── configuration.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── llm-integrations.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── tools.mdx
│   │       │   │   └── usage.mdx
│   │       │   └── som
│   │       │       ├── configuration.mdx
│   │       │       └── index.mdx
│   │       ├── meta.json
│   │       ├── quickstart-cli.mdx
│   │       ├── quickstart-devs.mdx
│   │       └── telemetry.mdx
│   ├── next.config.mjs
│   ├── package-lock.json
│   ├── package.json
│   ├── pnpm-lock.yaml
│   ├── postcss.config.mjs
│   ├── public
│   │   └── img
│   │       ├── agent_gradio_ui.png
│   │       ├── agent.png
│   │       ├── cli.png
│   │       ├── computer.png
│   │       ├── som_box_threshold.png
│   │       └── som_iou_threshold.png
│   ├── README.md
│   ├── source.config.ts
│   ├── src
│   │   ├── app
│   │   │   ├── (home)
│   │   │   │   ├── [[...slug]]
│   │   │   │   │   └── page.tsx
│   │   │   │   └── layout.tsx
│   │   │   ├── api
│   │   │   │   └── search
│   │   │   │       └── route.ts
│   │   │   ├── favicon.ico
│   │   │   ├── global.css
│   │   │   ├── layout.config.tsx
│   │   │   ├── layout.tsx
│   │   │   ├── llms.mdx
│   │   │   │   └── [[...slug]]
│   │   │   │       └── route.ts
│   │   │   └── llms.txt
│   │   │       └── route.ts
│   │   ├── assets
│   │   │   ├── discord-black.svg
│   │   │   ├── discord-white.svg
│   │   │   ├── logo-black.svg
│   │   │   └── logo-white.svg
│   │   ├── components
│   │   │   ├── iou.tsx
│   │   │   └── mermaid.tsx
│   │   ├── lib
│   │   │   ├── llms.ts
│   │   │   └── source.ts
│   │   └── mdx-components.tsx
│   └── tsconfig.json
├── examples
│   ├── agent_examples.py
│   ├── agent_ui_examples.py
│   ├── cloud_api_examples.py
│   ├── computer_examples_windows.py
│   ├── computer_examples.py
│   ├── computer_ui_examples.py
│   ├── computer-example-ts
│   │   ├── .env.example
│   │   ├── .gitignore
│   │   ├── .prettierrc
│   │   ├── package-lock.json
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── README.md
│   │   ├── src
│   │   │   ├── helpers.ts
│   │   │   └── index.ts
│   │   └── tsconfig.json
│   ├── docker_examples.py
│   ├── evals
│   │   ├── hud_eval_examples.py
│   │   └── wikipedia_most_linked.txt
│   ├── pylume_examples.py
│   ├── sandboxed_functions_examples.py
│   ├── som_examples.py
│   ├── utils.py
│   └── winsandbox_example.py
├── img
│   ├── agent_gradio_ui.png
│   ├── agent.png
│   ├── cli.png
│   ├── computer.png
│   ├── logo_black.png
│   └── logo_white.png
├── libs
│   ├── kasm
│   │   ├── Dockerfile
│   │   ├── LICENSE
│   │   ├── README.md
│   │   └── src
│   │       └── ubuntu
│   │           └── install
│   │               └── firefox
│   │                   ├── custom_startup.sh
│   │                   ├── firefox.desktop
│   │                   └── install_firefox.sh
│   ├── lume
│   │   ├── .cursorignore
│   │   ├── CONTRIBUTING.md
│   │   ├── Development.md
│   │   ├── img
│   │   │   └── cli.png
│   │   ├── Package.resolved
│   │   ├── Package.swift
│   │   ├── README.md
│   │   ├── resources
│   │   │   └── lume.entitlements
│   │   ├── scripts
│   │   │   ├── build
│   │   │   │   ├── build-debug.sh
│   │   │   │   ├── build-release-notarized.sh
│   │   │   │   └── build-release.sh
│   │   │   └── install.sh
│   │   ├── src
│   │   │   ├── Commands
│   │   │   │   ├── Clone.swift
│   │   │   │   ├── Config.swift
│   │   │   │   ├── Create.swift
│   │   │   │   ├── Delete.swift
│   │   │   │   ├── Get.swift
│   │   │   │   ├── Images.swift
│   │   │   │   ├── IPSW.swift
│   │   │   │   ├── List.swift
│   │   │   │   ├── Logs.swift
│   │   │   │   ├── Options
│   │   │   │   │   └── FormatOption.swift
│   │   │   │   ├── Prune.swift
│   │   │   │   ├── Pull.swift
│   │   │   │   ├── Push.swift
│   │   │   │   ├── Run.swift
│   │   │   │   ├── Serve.swift
│   │   │   │   ├── Set.swift
│   │   │   │   └── Stop.swift
│   │   │   ├── ContainerRegistry
│   │   │   │   ├── ImageContainerRegistry.swift
│   │   │   │   ├── ImageList.swift
│   │   │   │   └── ImagesPrinter.swift
│   │   │   ├── Errors
│   │   │   │   └── Errors.swift
│   │   │   ├── FileSystem
│   │   │   │   ├── Home.swift
│   │   │   │   ├── Settings.swift
│   │   │   │   ├── VMConfig.swift
│   │   │   │   ├── VMDirectory.swift
│   │   │   │   └── VMLocation.swift
│   │   │   ├── LumeController.swift
│   │   │   ├── Main.swift
│   │   │   ├── Server
│   │   │   │   ├── Handlers.swift
│   │   │   │   ├── HTTP.swift
│   │   │   │   ├── Requests.swift
│   │   │   │   ├── Responses.swift
│   │   │   │   └── Server.swift
│   │   │   ├── Utils
│   │   │   │   ├── CommandRegistry.swift
│   │   │   │   ├── CommandUtils.swift
│   │   │   │   ├── Logger.swift
│   │   │   │   ├── NetworkUtils.swift
│   │   │   │   ├── Path.swift
│   │   │   │   ├── ProcessRunner.swift
│   │   │   │   ├── ProgressLogger.swift
│   │   │   │   ├── String.swift
│   │   │   │   └── Utils.swift
│   │   │   ├── Virtualization
│   │   │   │   ├── DarwinImageLoader.swift
│   │   │   │   ├── DHCPLeaseParser.swift
│   │   │   │   ├── ImageLoaderFactory.swift
│   │   │   │   └── VMVirtualizationService.swift
│   │   │   ├── VM
│   │   │   │   ├── DarwinVM.swift
│   │   │   │   ├── LinuxVM.swift
│   │   │   │   ├── VM.swift
│   │   │   │   ├── VMDetails.swift
│   │   │   │   ├── VMDetailsPrinter.swift
│   │   │   │   ├── VMDisplayResolution.swift
│   │   │   │   └── VMFactory.swift
│   │   │   └── VNC
│   │   │       ├── PassphraseGenerator.swift
│   │   │       └── VNCService.swift
│   │   └── tests
│   │       ├── Mocks
│   │       │   ├── MockVM.swift
│   │       │   ├── MockVMVirtualizationService.swift
│   │       │   └── MockVNCService.swift
│   │       ├── VM
│   │       │   └── VMDetailsPrinterTests.swift
│   │       ├── VMTests.swift
│   │       ├── VMVirtualizationServiceTests.swift
│   │       └── VNCServiceTests.swift
│   ├── lumier
│   │   ├── .dockerignore
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── src
│   │       ├── bin
│   │       │   └── entry.sh
│   │       ├── config
│   │       │   └── constants.sh
│   │       ├── hooks
│   │       │   └── on-logon.sh
│   │       └── lib
│   │           ├── utils.sh
│   │           └── vm.sh
│   ├── python
│   │   ├── agent
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── agent
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── adapters
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── huggingfacelocal_adapter.py
│   │   │   │   │   ├── human_adapter.py
│   │   │   │   │   ├── mlxvlm_adapter.py
│   │   │   │   │   └── models
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── generic.py
│   │   │   │   │       ├── internvl.py
│   │   │   │   │       ├── opencua.py
│   │   │   │   │       └── qwen2_5_vl.py
│   │   │   │   ├── agent.py
│   │   │   │   ├── callbacks
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── budget_manager.py
│   │   │   │   │   ├── image_retention.py
│   │   │   │   │   ├── logging.py
│   │   │   │   │   ├── operator_validator.py
│   │   │   │   │   ├── pii_anonymization.py
│   │   │   │   │   ├── prompt_instructions.py
│   │   │   │   │   ├── telemetry.py
│   │   │   │   │   └── trajectory_saver.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── computers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cua.py
│   │   │   │   │   └── custom.py
│   │   │   │   ├── decorators.py
│   │   │   │   ├── human_tool
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   ├── server.py
│   │   │   │   │   └── ui.py
│   │   │   │   ├── integrations
│   │   │   │   │   └── hud
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── agent.py
│   │   │   │   │       └── proxy.py
│   │   │   │   ├── loops
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── anthropic.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── composed_grounded.py
│   │   │   │   │   ├── gemini.py
│   │   │   │   │   ├── glm45v.py
│   │   │   │   │   ├── gta1.py
│   │   │   │   │   ├── holo.py
│   │   │   │   │   ├── internvl.py
│   │   │   │   │   ├── model_types.csv
│   │   │   │   │   ├── moondream3.py
│   │   │   │   │   ├── omniparser.py
│   │   │   │   │   ├── openai.py
│   │   │   │   │   ├── opencua.py
│   │   │   │   │   └── uitars.py
│   │   │   │   ├── proxy
│   │   │   │   │   ├── examples.py
│   │   │   │   │   └── handlers.py
│   │   │   │   ├── responses.py
│   │   │   │   ├── types.py
│   │   │   │   └── ui
│   │   │   │       ├── __init__.py
│   │   │   │       ├── __main__.py
│   │   │   │       └── gradio
│   │   │   │           ├── __init__.py
│   │   │   │           ├── app.py
│   │   │   │           └── ui_components.py
│   │   │   ├── benchmarks
│   │   │   │   ├── .gitignore
│   │   │   │   ├── contrib.md
│   │   │   │   ├── interactive.py
│   │   │   │   ├── models
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   └── gta1.py
│   │   │   │   ├── README.md
│   │   │   │   ├── ss-pro.py
│   │   │   │   ├── ss-v2.py
│   │   │   │   └── utils.py
│   │   │   ├── example.py
│   │   │   ├── pyproject.toml
│   │   │   └── README.md
│   │   ├── computer
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer
│   │   │   │   ├── __init__.py
│   │   │   │   ├── computer.py
│   │   │   │   ├── diorama_computer.py
│   │   │   │   ├── helpers.py
│   │   │   │   ├── interface
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   ├── models.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── logger.py
│   │   │   │   ├── models.py
│   │   │   │   ├── providers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cloud
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── docker
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── lume
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── lume_api.py
│   │   │   │   │   ├── lumier
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── types.py
│   │   │   │   │   └── winsandbox
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── provider.py
│   │   │   │   │       └── setup_script.ps1
│   │   │   │   ├── ui
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   └── gradio
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       └── app.py
│   │   │   │   └── utils.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   └── README.md
│   │   ├── computer-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── diorama
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── diorama_computer.py
│   │   │   │   │   ├── diorama.py
│   │   │   │   │   ├── draw.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── safezone.py
│   │   │   │   ├── handlers
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── main.py
│   │   │   │   ├── server.py
│   │   │   │   └── watchdog.py
│   │   │   ├── examples
│   │   │   │   ├── __init__.py
│   │   │   │   └── usage_example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   ├── run_server.py
│   │   │   └── test_connection.py
│   │   ├── core
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── core
│   │   │   │   ├── __init__.py
│   │   │   │   └── telemetry
│   │   │   │       ├── __init__.py
│   │   │   │       └── posthog.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   └── README.md
│   │   ├── mcp-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── CONCURRENT_SESSIONS.md
│   │   │   ├── mcp_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── server.py
│   │   │   │   └── session_manager.py
│   │   │   ├── pdm.lock
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── scripts
│   │   │       ├── install_mcp_server.sh
│   │   │       └── start_mcp_server.sh
│   │   ├── pylume
│   │   │   ├── __init__.py
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── pylume
│   │   │   │   ├── __init__.py
│   │   │   │   ├── client.py
│   │   │   │   ├── exceptions.py
│   │   │   │   ├── lume
│   │   │   │   ├── models.py
│   │   │   │   ├── pylume.py
│   │   │   │   └── server.py
│   │   │   ├── pyproject.toml
│   │   │   └── README.md
│   │   └── som
│   │       ├── .bumpversion.cfg
│   │       ├── LICENSE
│   │       ├── poetry.toml
│   │       ├── pyproject.toml
│   │       ├── README.md
│   │       ├── som
│   │       │   ├── __init__.py
│   │       │   ├── detect.py
│   │       │   ├── detection.py
│   │       │   ├── models.py
│   │       │   ├── ocr.py
│   │       │   ├── util
│   │       │   │   └── utils.py
│   │       │   └── visualization.py
│   │       └── tests
│   │           └── test_omniparser.py
│   ├── typescript
│   │   ├── .gitignore
│   │   ├── .nvmrc
│   │   ├── agent
│   │   │   ├── examples
│   │   │   │   ├── playground-example.html
│   │   │   │   └── README.md
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── client.ts
│   │   │   │   ├── index.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   └── client.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── biome.json
│   │   ├── computer
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── computer
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── providers
│   │   │   │   │   │   ├── base.ts
│   │   │   │   │   │   ├── cloud.ts
│   │   │   │   │   │   └── index.ts
│   │   │   │   │   └── types.ts
│   │   │   │   ├── index.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── base.ts
│   │   │   │   │   ├── factory.ts
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── linux.ts
│   │   │   │   │   ├── macos.ts
│   │   │   │   │   └── windows.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   ├── computer
│   │   │   │   │   └── cloud.test.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── factory.test.ts
│   │   │   │   │   ├── index.test.ts
│   │   │   │   │   ├── linux.test.ts
│   │   │   │   │   ├── macos.test.ts
│   │   │   │   │   └── windows.test.ts
│   │   │   │   └── setup.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── core
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── index.ts
│   │   │   │   └── telemetry
│   │   │   │       ├── clients
│   │   │   │       │   ├── index.ts
│   │   │   │       │   └── posthog.ts
│   │   │   │       └── index.ts
│   │   │   ├── tests
│   │   │   │   └── telemetry.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── pnpm-workspace.yaml
│   │   └── README.md
│   └── xfce
│       ├── .dockerignore
│       ├── .gitignore
│       ├── Dockerfile
│       ├── README.md
│       └── src
│           ├── scripts
│           │   ├── resize-display.sh
│           │   ├── start-computer-server.sh
│           │   ├── start-novnc.sh
│           │   ├── start-vnc.sh
│           │   └── xstartup.sh
│           ├── supervisor
│           │   └── supervisord.conf
│           └── xfce-config
│               ├── helpers.rc
│               ├── xfce4-power-manager.xml
│               └── xfce4-session.xml
├── LICENSE.md
├── Makefile
├── notebooks
│   ├── agent_nb.ipynb
│   ├── blog
│   │   ├── build-your-own-operator-on-macos-1.ipynb
│   │   └── build-your-own-operator-on-macos-2.ipynb
│   ├── composite_agents_docker_nb.ipynb
│   ├── computer_nb.ipynb
│   ├── computer_server_nb.ipynb
│   ├── customizing_computeragent.ipynb
│   ├── eval_osworld.ipynb
│   ├── ollama_nb.ipynb
│   ├── pylume_nb.ipynb
│   ├── README.md
│   ├── sota_hackathon_cloud.ipynb
│   └── sota_hackathon.ipynb
├── pdm.lock
├── pyproject.toml
├── pyrightconfig.json
├── README.md
├── samples
│   └── community
│       ├── global-online
│       │   └── README.md
│       └── hack-the-north
│           └── README.md
├── scripts
│   ├── build-uv.sh
│   ├── build.ps1
│   ├── build.sh
│   ├── cleanup.sh
│   ├── playground-docker.sh
│   ├── playground.sh
│   └── run-docker-dev.sh
└── tests
    ├── pytest.ini
    ├── shell_cmd.py
    ├── test_files.py
    ├── test_mcp_server_session_management.py
    ├── test_mcp_server_streaming.py
    ├── test_shell_bash.py
    ├── test_telemetry.py
    ├── test_venv.py
    └── test_watchdog.py
```

# Files

--------------------------------------------------------------------------------
/libs/python/mcp-server/CONCURRENT_SESSIONS.md:
--------------------------------------------------------------------------------

```markdown
  1 | # MCP Server Concurrent Session Management
  2 | 
  3 | This document describes the improvements made to the MCP Server to address concurrent session management and resource lifecycle issues.
  4 | 
  5 | ## Problem Statement
  6 | 
  7 | The original MCP server implementation had several critical issues:
  8 | 
  9 | 1. **Global Computer Instance**: Used a single `global_computer` variable shared across all clients
 10 | 2. **No Resource Isolation**: Multiple clients would interfere with each other
 11 | 3. **Sequential Task Processing**: Multi-task operations were always sequential
 12 | 4. **No Graceful Shutdown**: Server couldn't properly clean up resources on shutdown
 13 | 5. **Hidden Event Loop**: `server.run()` hid the event loop, preventing proper lifecycle management
 14 | 
 15 | ## Solution Architecture
 16 | 
 17 | ### 1. Session Manager (`session_manager.py`)
 18 | 
 19 | The `SessionManager` class provides:
 20 | 
 21 | - **Per-session computer instances**: Each client gets isolated computer resources
 22 | - **Computer instance pooling**: Efficient reuse of computer instances with lifecycle management
 23 | - **Task registration**: Track active tasks per session for graceful cleanup
 24 | - **Automatic cleanup**: Background task cleans up idle sessions
 25 | - **Resource limits**: Configurable maximum concurrent sessions
 26 | 
 27 | #### Key Components:
 28 | 
 29 | ```python
 30 | class SessionManager:
 31 |     def __init__(self, max_concurrent_sessions: int = 10):
 32 |         self._sessions: Dict[str, SessionInfo] = {}
 33 |         self._computer_pool = ComputerPool()
 34 |         # ... lifecycle management
 35 | ```
 36 | 
 37 | #### Session Lifecycle:
 38 | 
 39 | 1. **Creation**: New session created when client first connects
 40 | 2. **Task Registration**: Each task is registered with the session
 41 | 3. **Activity Tracking**: Last activity time updated on each operation
 42 | 4. **Cleanup**: Sessions cleaned up when idle or on shutdown
 43 | 
 44 | ### 2. Computer Pool (`ComputerPool`)
 45 | 
 46 | Manages computer instances efficiently:
 47 | 
 48 | - **Pool Size Limits**: Maximum number of concurrent computer instances
 49 | - **Instance Reuse**: Available instances reused across sessions
 50 | - **Lifecycle Management**: Proper startup/shutdown of computer instances
 51 | - **Resource Cleanup**: All instances properly closed on shutdown
 52 | 
 53 | ### 3. Enhanced Server Tools
 54 | 
 55 | All server tools now support:
 56 | 
 57 | - **Session ID Parameter**: Optional `session_id` for multi-client support
 58 | - **Resource Isolation**: Each session gets its own computer instance
 59 | - **Task Tracking**: Proper registration/unregistration of tasks
 60 | - **Error Handling**: Graceful error handling with session cleanup
 61 | 
 62 | #### Updated Tool Signatures:
 63 | 
 64 | ```python
 65 | async def screenshot_cua(ctx: Context, session_id: Optional[str] = None) -> Any: ...
 66 | async def run_cua_task(ctx: Context, task: str, session_id: Optional[str] = None) -> Any: ...
 67 | async def run_multi_cua_tasks(ctx: Context, tasks: List[str], session_id: Optional[str] = None, concurrent: bool = False) -> Any: ...
 68 | ```
 69 | 
 70 | ### 4. Concurrent Task Execution
 71 | 
 72 | The `run_multi_cua_tasks` tool now supports:
 73 | 
 74 | - **Sequential Mode** (default): Tasks run one after another
 75 | - **Concurrent Mode**: Tasks run in parallel using `asyncio.gather()`
 76 | - **Progress Tracking**: Proper progress reporting for both modes
 77 | - **Error Handling**: Individual task failures don't stop other tasks
 78 | 
 79 | ### 5. Graceful Shutdown
 80 | 
 81 | The server now provides:
 82 | 
 83 | - **Signal Handlers**: Proper handling of SIGINT and SIGTERM
 84 | - **Session Cleanup**: All active sessions properly cleaned up
 85 | - **Resource Release**: Computer instances returned to pool and closed
 86 | - **Async Lifecycle**: Event loop properly exposed for cleanup
 87 | 
 88 | ## Usage Examples
 89 | 
 90 | ### Basic Usage (Backward Compatible)
 91 | 
 92 | ```python
 93 | # These calls work exactly as before
 94 | await screenshot_cua(ctx)
 95 | await run_cua_task(ctx, "Open browser")
 96 | await run_multi_cua_tasks(ctx, ["Task 1", "Task 2"])
 97 | ```
 98 | 
 99 | ### Multi-Client Usage
100 | 
101 | ```python
102 | # Client 1
103 | session_id_1 = "client-1-session"
104 | await screenshot_cua(ctx, session_id_1)
105 | await run_cua_task(ctx, "Open browser", session_id_1)
106 | 
107 | # Client 2 (completely isolated)
108 | session_id_2 = "client-2-session"
109 | await screenshot_cua(ctx, session_id_2)
110 | await run_cua_task(ctx, "Open editor", session_id_2)
111 | ```
112 | 
113 | ### Concurrent Task Execution
114 | 
115 | ```python
116 | # Run tasks concurrently instead of sequentially
117 | tasks = ["Open browser", "Open editor", "Open terminal"]
118 | results = await run_multi_cua_tasks(ctx, tasks, concurrent=True)
119 | ```
120 | 
121 | ### Session Management
122 | 
123 | ```python
124 | # Get session statistics
125 | stats = await get_session_stats(ctx)
126 | print(f"Active sessions: {stats['total_sessions']}")
127 | 
128 | # Clean up a specific session
129 | await cleanup_session(ctx, "session-to-cleanup")
130 | ```
131 | 
132 | ## Configuration
133 | 
134 | ### Environment Variables
135 | 
136 | - `CUA_MODEL_NAME`: Model to use (default: `anthropic/claude-3-5-sonnet-20241022`)
137 | - `CUA_MAX_IMAGES`: Maximum images to keep (default: `3`)
138 | 
139 | ### Session Manager Configuration
140 | 
141 | ```python
142 | # In session_manager.py
143 | class SessionManager:
144 |     def __init__(self, max_concurrent_sessions: int = 10):
145 |         ...  # Configurable maximum concurrent sessions
146 | 
147 | class ComputerPool:
148 |     def __init__(self, max_size: int = 5, idle_timeout: float = 300.0):
149 |         ...  # Configurable pool size and idle timeout
150 | ```
151 | 
152 | ## Performance Improvements
153 | 
154 | ### Before (Issues):
155 | - ❌ Single global computer instance
156 | - ❌ Client interference and resource conflicts
157 | - ❌ Sequential task processing only
158 | - ❌ No graceful shutdown
159 | - ❌ 30s timeout issues with long-running tasks
160 | 
161 | ### After (Benefits):
162 | - ✅ Per-session computer instances with proper isolation
163 | - ✅ Computer instance pooling for efficient resource usage
164 | - ✅ Concurrent task execution support
165 | - ✅ Graceful shutdown with proper cleanup
166 | - ✅ Streaming updates prevent timeout issues
167 | - ✅ Configurable resource limits
168 | - ✅ Automatic session cleanup
169 | 
170 | ## Testing
171 | 
172 | Comprehensive test coverage includes:
173 | 
174 | - Session creation and reuse
175 | - Concurrent session isolation
176 | - Task registration and cleanup
177 | - Error handling with session management
178 | - Concurrent vs sequential task execution
179 | - Session statistics and cleanup
180 | 
181 | Run tests with:
182 | 
183 | ```bash
184 | pytest tests/test_mcp_server_session_management.py -v
185 | ```
186 | 
187 | ## Migration Guide
188 | 
189 | ### For Existing Clients
190 | 
191 | No changes required! The new implementation is fully backward compatible:
192 | 
193 | ```python
194 | # This still works exactly as before
195 | await run_cua_task(ctx, "My task")
196 | ```
197 | 
198 | ### For New Multi-Client Applications
199 | 
200 | Use session IDs for proper isolation:
201 | 
202 | ```python
203 | import uuid  # Used to create a unique session ID for each client
204 | session_id = str(uuid.uuid4())
205 | await run_cua_task(ctx, "My task", session_id)
206 | ```
207 | 
208 | ### For Concurrent Task Execution
209 | 
210 | Enable concurrent mode for better performance:
211 | 
212 | ```python
213 | tasks = ["Task 1", "Task 2", "Task 3"]
214 | results = await run_multi_cua_tasks(ctx, tasks, concurrent=True)
215 | ```
216 | 
217 | ## Monitoring and Debugging
218 | 
219 | ### Session Statistics
220 | 
221 | ```python
222 | stats = await get_session_stats(ctx)
223 | print(f"Total sessions: {stats['total_sessions']}")
224 | print(f"Max concurrent: {stats['max_concurrent']}")
225 | for session_id, session_info in stats['sessions'].items():
226 |     print(f"Session {session_id}: {session_info['active_tasks']} active tasks")
227 | ```
228 | 
229 | ### Logging
230 | 
231 | The server provides detailed logging for:
232 | 
233 | - Session creation and cleanup
234 | - Task registration and completion
235 | - Resource pool usage
236 | - Error conditions and recovery
237 | 
238 | ### Graceful Shutdown
239 | 
240 | The server properly handles shutdown signals:
241 | 
242 | ```bash
243 | # Send SIGTERM for graceful shutdown
244 | kill -TERM <server_pid>
245 | 
246 | # Or use Ctrl+C (SIGINT)
247 | ```
248 | 
249 | ## Future Enhancements
250 | 
251 | Potential future improvements:
252 | 
253 | 1. **Session Persistence**: Save/restore session state across restarts
254 | 2. **Load Balancing**: Distribute sessions across multiple server instances
255 | 3. **Resource Monitoring**: Real-time monitoring of resource usage
256 | 4. **Auto-scaling**: Dynamic adjustment of pool size based on demand
257 | 5. **Session Timeouts**: Configurable timeouts for different session types
258 | 
```

--------------------------------------------------------------------------------
/blog/human-in-the-loop.md:
--------------------------------------------------------------------------------

```markdown
  1 | # When Agents Need Human Wisdom - Introducing Human-In-The-Loop Support
  2 | 
  3 | *Published on August 29, 2025 by Francesco Bonacci*
  4 | 
  5 | Sometimes the best AI agent is a human. Whether you're creating training demonstrations, evaluating complex scenarios, or need to intervene when automation hits a wall, our new Human-In-The-Loop integration puts you directly in control.
  6 | 
  7 | With yesterday's [HUD evaluation integration](hud-agent-evals.md), you could benchmark any agent at scale. Today's update lets you *become* the agent when it matters most—seamlessly switching between automated intelligence and human judgment.
  8 | 
  9 | <div align="center">
 10 |   <video src="https://github.com/user-attachments/assets/9091b50f-26e7-4981-95ce-40e5d42a1260" width="600" controls></video>
 11 | </div>
 12 | 
 13 | ## What you get
 14 | 
 15 | - **One-line human takeover** for any agent configuration with `human/human` or `model+human/human`
 16 | - **Interactive web UI** to see what your agent sees and control what it does
 17 | - **Zero context switching** - step in exactly where automation left off
 18 | - **Training data generation** - create perfect demonstrations by doing tasks yourself
 19 | - **Ground truth evaluation** - validate agent performance with human expertise
 20 | 
 21 | ## Why Human-In-The-Loop?
 22 | 
 23 | Even the most sophisticated agents encounter edge cases, ambiguous interfaces, or tasks requiring human judgment. Rather than failing gracefully, they can now fail *intelligently*—by asking for human help.
 24 | 
 25 | This approach bridges the gap between fully automated systems and pure manual control, letting you:
 26 | - **Demonstrate complex workflows** that agents can learn from
 27 | - **Evaluate tricky scenarios** where ground truth requires human assessment  
 28 | - **Intervene selectively** when automated agents need guidance
 29 | - **Test and debug** your tools and environments manually
 30 | 
 31 | ## Getting Started
 32 | 
 33 | Launch the human agent interface:
 34 | 
 35 | ```bash
 36 | python -m agent.human_tool
 37 | ```
 38 | 
 39 | The web UI will show pending completions. Click any completion to take control of the agent and see exactly what it sees.
 40 | 
 41 | ## Usage Examples
 42 | 
 43 | ### Direct Human Control
 44 | 
 45 | Perfect for creating demonstrations or when you want full manual control:
 46 | 
 47 | ```python
 48 | from agent import ComputerAgent
 49 | from agent.computer import computer
 50 | 
 51 | agent = ComputerAgent(
 52 |     "human/human",
 53 |     tools=[computer]
 54 | )
 55 | 
 56 | # You'll get full control through the web UI
 57 | async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
 58 |     pass
 59 | ```
 60 | 
 61 | ### Hybrid: AI Planning + Human Execution
 62 | 
 63 | Combine model intelligence with human precision—let AI plan, then execute manually:
 64 | 
 65 | ```python
 66 | agent = ComputerAgent(
 67 |     "huggingface-local/HelloKKMe/GTA1-7B+human/human",  
 68 |     tools=[computer]
 69 | )
 70 | 
 71 | # AI creates the plan, human executes each step
 72 | async for _ in agent.run("Navigate to the settings page and enable dark mode"):
 73 |     pass
 74 | ```
 75 | 
 76 | ### Fallback Pattern
 77 | 
 78 | Start automated, escalate to human when needed:
 79 | 
 80 | ```python
 81 | # Primary automated agent
 82 | primary_agent = ComputerAgent("openai/computer-use-preview", tools=[computer])
 83 | 
 84 | # Human fallback agent  
 85 | fallback_agent = ComputerAgent("human/human", tools=[computer])
 86 | 
 87 | try:
 88 |     async for result in primary_agent.run(task):
 89 |         if result.confidence < 0.7:  # Illustrative threshold; assumes results carry a confidence score
 90 |             # Seamlessly hand off to human
 91 |             async for _ in fallback_agent.run(f"Continue this task: {task}"):
 92 |                 pass
 93 | except Exception:
 94 |     # Agent failed, human takes over
 95 |     async for _ in fallback_agent.run(f"Handle this failed task: {task}"):
 96 |         pass
 97 | ```
 98 | 
 99 | ## Interactive Features
100 | 
101 | The human-in-the-loop interface provides a rich, responsive experience:
102 | 
103 | ### **Visual Environment**
104 | - **Screenshot display** with live updates as you work
105 | - **Click handlers** for direct interaction with UI elements  
106 | - **Zoom and pan** to see details clearly
107 | 
108 | ### **Action Controls**
109 | - **Click actions** - precise cursor positioning and clicking
110 | - **Keyboard input** - type text naturally or send specific key combinations
111 | - **Action history** - see the sequence of actions taken
112 | - **Undo support** - step back when needed
113 | 
114 | ### **Tool Integration** 
115 | - **Full OpenAI compatibility** - standard tool call format
116 | - **Custom tools** - integrate your own tools seamlessly
117 | - **Real-time feedback** - see tool responses immediately
118 | 
119 | ### **Smart Polling**
120 | - **Responsive updates** - UI refreshes when new completions arrive
121 | - **Background processing** - continue working while waiting for tasks
122 | - **Session persistence** - resume interrupted sessions
123 | 
124 | ## Real-World Use Cases
125 | 
126 | ### **Training Data Generation**
127 | Create perfect demonstrations for fine-tuning:
128 | 
129 | ```python
130 | # Generate training examples for spreadsheet tasks
131 | demo_agent = ComputerAgent("human/human", tools=[computer])
132 | 
133 | tasks = [
134 |     "Create a budget spreadsheet with income and expense categories",
135 |     "Apply conditional formatting to highlight overbudget items", 
136 |     "Generate a pie chart showing expense distribution"
137 | ]
138 | 
139 | for task in tasks:
140 |     # Human demonstrates each task perfectly
141 |     async for _ in demo_agent.run(task):
142 |         pass  # Recorded actions become training data
143 | ```
144 | 
145 | ### **Evaluation and Ground Truth**
146 | Validate agent performance on complex scenarios:
147 | 
148 | ```python
149 | # Human evaluates agent performance
150 | evaluator = ComputerAgent("human/human", tools=[computer])
151 | 
152 | async for _ in evaluator.run("Review this completed form and rate accuracy (1-10)"):
153 |     pass  # Human provides authoritative quality assessment
154 | ```
155 | 
156 | ### **Interactive Debugging**
157 | Step through agent behavior manually:
158 | 
159 | ```python
160 | # Test a workflow step by step
161 | debug_agent = ComputerAgent("human/human", tools=[computer])
162 | 
163 | async for _ in debug_agent.run("Reproduce the agent's failed login sequence"):
164 |     pass  # Human identifies exactly where automation breaks
165 | ```
166 | 
167 | ### **Edge Case Handling**
168 | Handle scenarios that break automated agents:
169 | 
170 | ```python
171 | # Complex UI interaction requiring human judgment
172 | edge_case_agent = ComputerAgent("human/human", tools=[computer])
173 | 
174 | async for _ in edge_case_agent.run("Navigate this CAPTCHA-protected form"):
175 |     pass  # Human handles what automation cannot
176 | ```
177 | 
178 | ## Configuration Options
179 | 
180 | Customize the human agent experience:
181 | 
182 | - **UI refresh rate**: Adjust polling frequency for your workflow
183 | - **Image quality**: Balance detail vs. performance for screenshots  
184 | - **Action logging**: Save detailed traces for analysis and training
185 | - **Session timeout**: Configure idle timeouts for security
186 | - **Tool permissions**: Restrict which tools humans can access
187 | 
188 | ## When to Use Human-In-The-Loop
189 | 
190 | | **Scenario** | **Why Human Control** |
191 | |--------------|----------------------|
192 | | **Creating training data** | Perfect demonstrations for model fine-tuning |
193 | | **Evaluating complex tasks** | Human judgment for subjective or nuanced assessment |  
194 | | **Handling edge cases** | CAPTCHAs, unusual UIs, context-dependent decisions |
195 | | **Debugging workflows** | Step through failures to identify breaking points |
196 | | **High-stakes operations** | Critical tasks requiring human oversight and approval |
197 | | **Testing new environments** | Validate tools and environments work as expected |
198 | 
199 | ## Learn More
200 | 
201 | - **Interactive examples**: Try human-in-the-loop control with sample tasks
202 | - **Training data pipelines**: Learn how to convert human demonstrations into model training data  
203 | - **Evaluation frameworks**: Build human-validated test suites for your agents
204 | - **API documentation**: Full reference for human agent configuration
205 | 
206 | Ready to put humans back in the loop? The most sophisticated AI system knows when to ask for help.
207 | 
208 | ---
209 | 
210 | *Questions about human-in-the-loop agents? Join the conversation in our [Discord community](https://discord.gg/cua-ai) or check out our [documentation](https://docs.trycua.com/docs/agent-sdk/supported-agents/human-in-the-loop).*
211 | 
```
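
The fallback pattern from the post above boils down to a small piece of control flow. The following is a minimal sketch, not cua's implementation: `flaky_agent`, `human_agent`, and `run_with_fallback` are hypothetical stand-ins that only mimic the async-generator shape of `agent.run(...)`, so the escalation logic can be run without a model or a web UI.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List

# Hypothetical type: anything that takes a task string and yields result dicts.
AgentRun = Callable[[str], AsyncIterator[Dict[str, object]]]


async def flaky_agent(task: str) -> AsyncIterator[Dict[str, object]]:
    """Hypothetical automated agent that makes progress, then fails."""
    yield {"source": "model", "step": f"started: {task}"}
    raise RuntimeError("element not found")


async def human_agent(task: str) -> AsyncIterator[Dict[str, object]]:
    """Hypothetical human agent; a real one would wait on the web UI."""
    yield {"source": "human", "step": f"finished: {task}"}


async def run_with_fallback(
    primary: AgentRun, fallback: AgentRun, task: str
) -> List[Dict[str, object]]:
    """Run the primary agent; hand the task to the fallback if it raises."""
    results: List[Dict[str, object]] = []
    try:
        async for result in primary(task):
            results.append(result)
    except Exception as exc:
        # Keep the partial results and tell the human what went wrong.
        handoff = f"Handle this failed task: {task} (error: {exc})"
        async for result in fallback(handoff):
            results.append(result)
    return results


if __name__ == "__main__":
    for item in asyncio.run(
        run_with_fallback(flaky_agent, human_agent, "enable dark mode")
    ):
        print(item)
```

Keeping the partial results from the primary agent means the human picks up with the same history the automated run produced, which is the "step in where automation left off" idea in miniature.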

--------------------------------------------------------------------------------
/docs/content/docs/quickstart-cli.mdx:
--------------------------------------------------------------------------------

```markdown
  1 | ---
  2 | title: Quickstart (CLI)
  3 | description: Get started with the cua Agent CLI in 4 steps
  4 | icon: Rocket
  5 | ---
  6 | 
  7 | import { Step, Steps } from 'fumadocs-ui/components/steps';
  8 | import { Tab, Tabs } from 'fumadocs-ui/components/tabs';
  9 | import { Accordion, Accordions } from 'fumadocs-ui/components/accordion';
 10 | 
 11 | Get up and running with the cua Agent CLI in 4 simple steps.
 12 | 
 13 | <Steps>
 14 | <Step>
 15 | 
 16 | ## Introduction
 17 | 
 18 | cua combines Computer (interface) + Agent (AI) for automating desktop apps. The Agent CLI provides a clean terminal interface to control your remote computer using natural language commands.
 19 | 
 20 | </Step>
 21 | 
 22 | <Step>
 23 | 
 24 | ## Set Up Your Computer Environment
 25 | 
 26 | Choose how you want to run your cua computer. **Cloud Sandbox is recommended** for the easiest setup:
 27 | 
 28 | <Tabs items={['☁️ Cloud Sandbox (Recommended)', 'Linux on Docker', 'Windows Sandbox', 'macOS VM']}>
 29 |   <Tab value="☁️ Cloud Sandbox (Recommended)">
 30 | 
 31 |     **Easiest & safest way to get started - works on any host OS**
 32 | 
 33 |     1. Go to [trycua.com/signin](https://www.trycua.com/signin)
 34 |     2. Navigate to **Dashboard > Containers > Create Instance**
 35 |     3. Create a **Medium, Ubuntu 22** container
 36 |     4. Note your container name and API key
 37 | 
 38 |     Your cloud container will be automatically configured and ready to use.
 39 | 
 40 |   </Tab>
 41 |   <Tab value="Linux on Docker">
 42 | 
 43 |     **Run Linux desktop locally on macOS, Windows, or Linux hosts**
 44 | 
 45 |     1. Install Docker Desktop or Docker Engine
 46 | 
 47 |     2. Pull the CUA XFCE container (lightweight desktop)
 48 | 
 49 |     ```bash
 50 |     docker pull --platform=linux/amd64 trycua/cua-xfce:latest
 51 |     ```
 52 | 
 53 |     Or use KASM for a full-featured desktop:
 54 | 
 55 |     ```bash
 56 |     docker pull --platform=linux/amd64 trycua/cua-ubuntu:latest
 57 |     ```
 58 | 
 59 |   </Tab>
 60 |   <Tab value="Windows Sandbox">
 61 | 
 62 |     **Windows hosts only - requires Windows 10 Pro/Enterprise or Windows 11**
 63 | 
 64 |     1. Enable Windows Sandbox
 65 |     2. Install pywinsandbox dependency
 66 | 
 67 |     ```bash
 68 |     pip install -U git+https://github.com/karkason/pywinsandbox.git
 69 |     ```
 70 | 
 71 |     3. Windows Sandbox will be automatically configured when you run the CLI
 72 | 
 73 |   </Tab>
 74 |   <Tab value="macOS VM">
 75 | 
 76 |     **macOS hosts only - requires Lume CLI**
 77 | 
 78 |     1. Install the Lume CLI
 79 | 
 80 |     ```bash
 81 |     /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
 82 |     ```
 83 | 
 84 |     2. Start a local cua macOS VM
 85 | 
 86 |     ```bash
 87 |     lume run macos-sequoia-cua:latest
 88 |     ```
 89 | 
 90 |   </Tab>
 91 | </Tabs>
 92 | 
 93 | </Step>
 94 | 
 95 | <Step>
 96 | 
 97 | ## Install cua
 98 | 
 99 | <Accordions type="single" defaultValue="uv">
100 | 
101 | <Accordion title="uv (Recommended)" value="uv">
102 | 
103 | ### Install uv
104 | 
105 | <Tabs items={['macOS / Linux', 'Windows']} persist>
106 | <Tab value="macOS / Linux">
107 | 
108 | ```bash
109 | # Use curl to download the script and execute it with sh:
110 | curl -LsSf https://astral.sh/uv/install.sh | sh
111 | 
112 | # If your system doesn't have curl, you can use wget:
113 | # wget -qO- https://astral.sh/uv/install.sh | sh
114 | ```
115 | 
116 | </Tab>
117 | <Tab value="Windows">
118 | 
119 | ```powershell
120 | # Use irm to download the script and execute it with iex:
121 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
122 | ```
123 | 
124 | </Tab>
125 | </Tabs>
126 | 
127 | ### Install Python 3.12
128 | 
129 | ```bash
130 | uv python install 3.12
131 | # uv will install cua dependencies automatically when you use --with "cua-agent[cli]"
132 | ```
133 | 
134 | </Accordion>
135 | 
136 | <Accordion title="conda" value="conda">
137 | 
138 | ### Install conda
139 | 
140 | <Tabs items={['macOS', 'Linux', 'Windows']} persist>
141 | <Tab value="macOS">
142 | 
143 | ```bash
144 | mkdir -p ~/miniconda3
145 | curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh
146 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
147 | rm ~/miniconda3/miniconda.sh
148 | source ~/miniconda3/bin/activate
149 | ```
150 | 
151 | </Tab>
152 | <Tab value="Linux">
153 | 
154 | ```bash
155 | mkdir -p ~/miniconda3
156 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
157 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
158 | rm ~/miniconda3/miniconda.sh
159 | source ~/miniconda3/bin/activate
160 | ```
161 | 
162 | </Tab>
163 | <Tab value="Windows">
164 | 
165 | ```powershell
166 | wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -outfile ".\miniconda.exe"
167 | Start-Process -FilePath ".\miniconda.exe" -ArgumentList "/S" -Wait
168 | del .\miniconda.exe
169 | ```
170 | 
171 | </Tab>
172 | </Tabs>
173 | 
174 | ### Create and activate Python 3.12 environment
175 | 
176 | ```bash
177 | conda create -n cua python=3.12
178 | conda activate cua
179 | ```
180 | 
181 | ### Install cua
182 | 
183 | ```bash
184 | pip install "cua-agent[cli]" cua-computer
185 | ```
186 | 
187 | </Accordion>
188 | 
189 | <Accordion title="pip" value="pip">
190 | 
191 | ### Install cua
192 | 
193 | ```bash
194 | pip install "cua-agent[cli]" cua-computer
195 | ```
196 | 
197 | </Accordion>
198 | 
199 | </Accordions>
200 | 
201 | </Step>
202 | 
203 | <Step>
204 | 
205 | ## Run cua CLI
206 | 
207 | Choose your preferred AI model:
208 | 
209 | ### OpenAI Computer Use Preview
210 | 
211 | <Tabs items={['uv', 'conda/pip']} persist>
212 | <Tab value="uv">
213 | 
214 | ```bash
215 | uv run --with "cua-agent[cli]" -m agent.cli openai/computer-use-preview
216 | ```
217 | 
218 | </Tab>
219 | <Tab value="conda/pip">
220 | 
221 | ```bash
222 | python -m agent.cli openai/computer-use-preview
223 | ```
224 | 
225 | </Tab>
226 | </Tabs>
227 | 
228 | ### Anthropic Claude
229 | 
230 | <Tabs items={['uv', 'conda/pip']} persist>
231 | <Tab value="uv">
232 | 
233 | ```bash
234 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-sonnet-4-5-20250929
235 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-opus-4-20250514
236 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-opus-4-1-20250805
237 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-sonnet-4-20250514
238 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-3-5-sonnet-20241022
239 | ```
240 | 
241 | </Tab>
242 | <Tab value="conda/pip">
243 | 
244 | ```bash
245 | python -m agent.cli anthropic/claude-sonnet-4-5-20250929
246 | python -m agent.cli anthropic/claude-opus-4-20250514
247 | python -m agent.cli anthropic/claude-opus-4-1-20250805
248 | python -m agent.cli anthropic/claude-sonnet-4-20250514
249 | python -m agent.cli anthropic/claude-3-5-sonnet-20241022
250 | ```
251 | 
252 | </Tab>
253 | </Tabs>
254 | 
255 | ### Omniparser + LLMs
256 | 
257 | <Tabs items={['uv', 'conda/pip']} persist>
258 | <Tab value="uv">
259 | 
260 | ```bash
261 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022
262 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+openai/gpt-4o
263 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+vertex_ai/gemini-pro
264 | ```
265 | 
266 | </Tab>
267 | <Tab value="conda/pip">
268 | 
269 | ```bash
270 | python -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022
271 | python -m agent.cli omniparser+openai/gpt-4o
272 | python -m agent.cli omniparser+vertex_ai/gemini-pro
273 | ```
274 | 
275 | </Tab>
276 | </Tabs>
277 | 
278 | ### Local Models
279 | 
280 | <Tabs items={['uv', 'conda/pip']} persist>
281 | <Tab value="uv">
282 | 
283 | ```bash
284 | # Hugging Face models (local)
285 | uv run --with "cua-agent[cli]" -m agent.cli huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
286 | 
287 | # MLX models (Apple Silicon)
288 | uv run --with "cua-agent[cli]" -m agent.cli mlx/mlx-community/UI-TARS-1.5-7B-6bit
289 | 
290 | # Ollama models
291 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+ollama_chat/llama3.2:latest
292 | ```
293 | 
294 | </Tab>
295 | <Tab value="conda/pip">
296 | 
297 | ```bash
298 | # Hugging Face models (local)
299 | python -m agent.cli huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
300 | 
301 | # MLX models (Apple Silicon)
302 | python -m agent.cli mlx/mlx-community/UI-TARS-1.5-7B-6bit
303 | 
304 | # Ollama models
305 | python -m agent.cli omniparser+ollama_chat/llama3.2:latest
306 | ```
307 | 
308 | </Tab>
309 | </Tabs>
310 | 
311 | ### Interactive Setup
312 | 
313 | If you haven't set up environment variables, the CLI will guide you through the setup:
314 | 
315 | 1. **Sandbox Name**: Enter your cua sandbox name (or get one at [trycua.com](https://www.trycua.com/))
316 | 2. **CUA API Key**: Enter your cua API key
317 | 3. **Provider API Key**: Enter your AI provider API key (OpenAI, Anthropic, etc.)
318 | 
319 | ### Start Chatting
320 | 
321 | Once connected, you'll see:
322 | 
323 | ```
324 | 💻 Connected to your-container-name (model, agent_loop)
325 | Type 'exit' to quit.
326 | 
327 | >
328 | ```
329 | 
330 | You can ask your agent to perform actions like:
331 | 
332 | - "Take a screenshot and tell me what's on the screen"
333 | - "Open Firefox and go to github.com"
334 | - "Type 'Hello world' into the terminal"
335 | - "Close the current window"
336 | - "Click on the search button"
337 | 
338 | </Step>
339 | </Steps>
340 | 
341 | ---
342 | 
343 | For advanced Python usage and the GUI interface, see the [Quickstart (GUI)](/quickstart-ui) and [Quickstart for Developers](/quickstart-devs).
344 | 
345 | For running models locally, see [Running Models Locally](/agent-sdk/local-models).
346 | 
```

--------------------------------------------------------------------------------
/libs/python/agent/agent/human_tool/server.py:
--------------------------------------------------------------------------------

```python
  1 | import asyncio
  2 | import uuid
  3 | from datetime import datetime
  4 | from typing import Dict, List, Any, Optional
  5 | from dataclasses import dataclass, asdict
  6 | from enum import Enum
  7 | 
  8 | from fastapi import FastAPI, HTTPException
  9 | from pydantic import BaseModel
 10 | 
 11 | 
 12 | class CompletionStatus(str, Enum):
 13 |     PENDING = "pending"
 14 |     COMPLETED = "completed"
 15 |     FAILED = "failed"
 16 | 
 17 | 
 18 | @dataclass
 19 | class CompletionCall:
 20 |     id: str
 21 |     messages: List[Dict[str, Any]]
 22 |     model: str
 23 |     status: CompletionStatus
 24 |     created_at: datetime
 25 |     completed_at: Optional[datetime] = None
 26 |     response: Optional[str] = None
 27 |     tool_calls: Optional[List[Dict[str, Any]]] = None
 28 |     error: Optional[str] = None
 29 | 
 30 | 
 31 | class ToolCall(BaseModel):
 32 |     id: str
 33 |     type: str = "function"
 34 |     function: Dict[str, Any]
 35 | 
 36 | 
 37 | class CompletionRequest(BaseModel):
 38 |     messages: List[Dict[str, Any]]
 39 |     model: str
 40 | 
 41 | 
 42 | class CompletionResponse(BaseModel):
 43 |     response: Optional[str] = None
 44 |     tool_calls: Optional[List[Dict[str, Any]]] = None
 45 | 
 46 | 
 47 | class CompletionQueue:
 48 |     def __init__(self):
 49 |         self._queue: Dict[str, CompletionCall] = {}
 50 |         self._pending_order: List[str] = []
 51 |         self._lock = asyncio.Lock()
 52 |     
 53 |     async def add_completion(self, messages: List[Dict[str, Any]], model: str) -> str:
 54 |         """Add a completion call to the queue."""
 55 |         async with self._lock:
 56 |             call_id = str(uuid.uuid4())
 57 |             completion_call = CompletionCall(
 58 |                 id=call_id,
 59 |                 messages=messages,
 60 |                 model=model,
 61 |                 status=CompletionStatus.PENDING,
 62 |                 created_at=datetime.now()
 63 |             )
 64 |             self._queue[call_id] = completion_call
 65 |             self._pending_order.append(call_id)
 66 |             return call_id
 67 |     
 68 |     async def get_pending_calls(self) -> List[Dict[str, Any]]:
 69 |         """Get all pending completion calls."""
 70 |         async with self._lock:
 71 |             pending_calls = []
 72 |             for call_id in self._pending_order:
 73 |                 if call_id in self._queue and self._queue[call_id].status == CompletionStatus.PENDING:
 74 |                     call = self._queue[call_id]
 75 |                     pending_calls.append({
 76 |                         "id": call.id,
 77 |                         "model": call.model,
 78 |                         "created_at": call.created_at.isoformat(),
 79 |                         "messages": call.messages
 80 |                     })
 81 |             return pending_calls
 82 |     
 83 |     async def get_call_status(self, call_id: str) -> Optional[Dict[str, Any]]:
 84 |         """Get the status of a specific completion call."""
 85 |         async with self._lock:
 86 |             if call_id not in self._queue:
 87 |                 return None
 88 |             
 89 |             call = self._queue[call_id]
 90 |             result = {
 91 |                 "id": call.id,
 92 |                 "status": call.status.value,
 93 |                 "created_at": call.created_at.isoformat(),
 94 |                 "model": call.model,
 95 |                 "messages": call.messages
 96 |             }
 97 |             
 98 |             if call.completed_at:
 99 |                 result["completed_at"] = call.completed_at.isoformat()
100 |             if call.response:
101 |                 result["response"] = call.response
102 |             if call.tool_calls:
103 |                 result["tool_calls"] = call.tool_calls
104 |             if call.error:
105 |                 result["error"] = call.error
106 |                 
107 |             return result
108 |     
109 |     async def complete_call(self, call_id: str, response: Optional[str] = None, tool_calls: Optional[List[Dict[str, Any]]] = None) -> bool:
110 |         """Mark a completion call as completed with a response or tool calls."""
111 |         async with self._lock:
112 |             if call_id not in self._queue:
113 |                 return False
114 |             
115 |             call = self._queue[call_id]
116 |             if call.status != CompletionStatus.PENDING:
117 |                 return False
118 |             
119 |             call.status = CompletionStatus.COMPLETED
120 |             call.completed_at = datetime.now()
121 |             call.response = response
122 |             call.tool_calls = tool_calls
123 |             
124 |             # Remove from pending order
125 |             if call_id in self._pending_order:
126 |                 self._pending_order.remove(call_id)
127 |             
128 |             return True
129 |     
130 |     async def fail_call(self, call_id: str, error: str) -> bool:
131 |         """Mark a completion call as failed with an error."""
132 |         async with self._lock:
133 |             if call_id not in self._queue:
134 |                 return False
135 |             
136 |             call = self._queue[call_id]
137 |             if call.status != CompletionStatus.PENDING:
138 |                 return False
139 |             
140 |             call.status = CompletionStatus.FAILED
141 |             call.completed_at = datetime.now()
142 |             call.error = error
143 |             
144 |             # Remove from pending order
145 |             if call_id in self._pending_order:
146 |                 self._pending_order.remove(call_id)
147 |             
148 |             return True
149 |     
150 |     async def wait_for_completion(self, call_id: str, timeout: float = 300.0) -> Optional[str]:
151 |         """Wait until the call is completed and return its text response (tool calls are available via get_call_status)."""
152 |         start_time = asyncio.get_running_loop().time()
153 |         
154 |         while True:
155 |             status = await self.get_call_status(call_id)
156 |             if not status:
157 |                 return None
158 |             
159 |             if status["status"] == CompletionStatus.COMPLETED.value:
160 |                 return status.get("response")
161 |             elif status["status"] == CompletionStatus.FAILED.value:
162 |                 raise RuntimeError(f"Completion failed: {status.get('error', 'Unknown error')}")
163 |             
164 |             # Check timeout
165 |             if asyncio.get_running_loop().time() - start_time > timeout:
166 |                 await self.fail_call(call_id, "Timeout waiting for human response")
167 |                 raise TimeoutError("Timeout waiting for human response")
168 |             
169 |             # Wait a bit before checking again
170 |             await asyncio.sleep(0.5)
171 | 
172 | 
173 | # Global queue instance
174 | completion_queue = CompletionQueue()
175 | 
176 | # FastAPI app
177 | app = FastAPI(title="Human Completion Server", version="1.0.0")
178 | 
179 | 
180 | @app.post("/queue", response_model=Dict[str, str])
181 | async def queue_completion(request: CompletionRequest):
182 |     """Add a completion request to the queue."""
183 |     call_id = await completion_queue.add_completion(request.messages, request.model)
184 |     return {"id": call_id, "status": "queued"}
185 | 
186 | 
187 | @app.get("/pending")
188 | async def list_pending():
189 |     """List all pending completion calls."""
190 |     pending_calls = await completion_queue.get_pending_calls()
191 |     return {"pending_calls": pending_calls}
192 | 
193 | 
194 | @app.get("/status/{call_id}")
195 | async def get_status(call_id: str):
196 |     """Get the status of a specific completion call."""
197 |     status = await completion_queue.get_call_status(call_id)
198 |     if not status:
199 |         raise HTTPException(status_code=404, detail="Completion call not found")
200 |     return status
201 | 
202 | 
203 | @app.post("/complete/{call_id}")
204 | async def complete_call(call_id: str, response: CompletionResponse):
205 |     """Complete a call with a human response."""
206 |     success = await completion_queue.complete_call(
207 |         call_id, 
208 |         response=response.response, 
209 |         tool_calls=response.tool_calls
210 |     )
211 |     if success:
212 |         return {"status": "success", "message": "Call completed"}
213 |     else:
214 |         raise HTTPException(status_code=404, detail="Call not found or already completed")
215 | 
216 | 
217 | @app.post("/fail/{call_id}")
218 | async def fail_call(call_id: str, error: Dict[str, str]):
219 |     """Mark a call as failed."""
220 |     success = await completion_queue.fail_call(call_id, error.get("error", "Unknown error"))
221 |     if not success:
222 |         raise HTTPException(status_code=404, detail="Completion call not found or already completed")
223 |     return {"status": "failed"}
224 | 
225 | 
226 | @app.get("/")
227 | async def root():
228 |     """Root endpoint."""
229 |     return {"message": "Human Completion Server is running"}
230 | 
231 | 
232 | if __name__ == "__main__":
233 |     import uvicorn
234 |     uvicorn.run(app, host="0.0.0.0", port=8002)
235 | 
```
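
`wait_for_completion` in the server above polls the queue every 0.5 seconds until the human responds or the timeout expires. One alternative design, sketched here under stated assumptions, wakes the waiter through an `asyncio.Event` instead of polling. `PendingCall` and `EventQueue` are hypothetical names that do not exist in the repo, and the sketch leaves out tool calls, message storage, and the FastAPI layer.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingCall:
    """Hypothetical, reduced stand-in for CompletionCall."""
    id: str
    done: asyncio.Event = field(default_factory=asyncio.Event)
    response: Optional[str] = None
    error: Optional[str] = None


class EventQueue:
    """Hypothetical event-driven variant of CompletionQueue."""

    def __init__(self) -> None:
        self._calls: Dict[str, PendingCall] = {}

    def add(self) -> str:
        # Call this from inside a running event loop so the Event is
        # created on that loop (matters on Python versions before 3.10).
        call_id = str(uuid.uuid4())
        self._calls[call_id] = PendingCall(id=call_id)
        return call_id

    def complete(self, call_id: str, response: str) -> bool:
        call = self._calls.get(call_id)
        if call is None or call.done.is_set():
            return False  # unknown call, or already completed/failed
        call.response = response
        call.done.set()  # wakes any coroutine blocked in wait()
        return True

    def fail(self, call_id: str, error: str) -> bool:
        call = self._calls.get(call_id)
        if call is None or call.done.is_set():
            return False
        call.error = error
        call.done.set()
        return True

    async def wait(self, call_id: str, timeout: float = 300.0) -> Optional[str]:
        call = self._calls.get(call_id)
        if call is None:
            return None
        try:
            await asyncio.wait_for(call.done.wait(), timeout)
        except asyncio.TimeoutError:
            # Mirror the original behavior: mark the call failed, then raise.
            self.fail(call_id, "Timeout waiting for human response")
            raise TimeoutError("Timeout waiting for human response") from None
        if call.error is not None:
            raise RuntimeError(f"Completion failed: {call.error}")
        return call.response


async def demo() -> Optional[str]:
    queue = EventQueue()
    call_id = queue.add()

    async def human() -> None:
        await asyncio.sleep(0.01)  # stands in for the human acting in the UI
        queue.complete(call_id, "clicked the button")

    answer, _ = await asyncio.gather(queue.wait(call_id, timeout=1.0), human())
    return answer


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is that the event-based version responds as soon as the human submits, while the polling version is simpler to reason about when the status is read over HTTP from another process, which is how the `/status/{call_id}` endpoint is meant to be used.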

--------------------------------------------------------------------------------
/libs/python/agent/agent/computers/custom.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Custom computer handler implementation that accepts a dictionary of functions.
  3 | """
  4 | 
  5 | import base64
  6 | from typing import Dict, List, Any, Literal, Union, Optional, Callable
  7 | from PIL import Image
  8 | import io
  9 | from .base import AsyncComputerHandler
 10 | 
 11 | 
 12 | class CustomComputerHandler(AsyncComputerHandler):
 13 |     """Computer handler that implements the Computer protocol using a dictionary of custom functions."""
 14 |     
 15 |     def __init__(self, functions: Dict[str, Callable]):
 16 |         """
 17 |         Initialize with a dictionary of functions.
 18 |         
 19 |         Args:
 20 |             functions: Dictionary where keys are method names and values are callable functions.
 21 |                       Only 'screenshot' is required, all others are optional.
 22 |         
 23 |         Raises:
 24 |             ValueError: If required 'screenshot' function is not provided.
 25 |         """
 26 |         if 'screenshot' not in functions:
 27 |             raise ValueError("'screenshot' function is required in functions dictionary")
 28 |         
 29 |         self.functions = functions
 30 |         self._last_screenshot_size: Optional[tuple[int, int]] = None
 31 |     
 32 |     async def _call_function(self, func, *args, **kwargs):
 33 |         """
 34 |         Call a function, handling both async and sync functions.
 35 |         
 36 |         Args:
 37 |             func: The function to call
 38 |             *args: Positional arguments to pass to the function
 39 |             **kwargs: Keyword arguments to pass to the function
 40 |             
 41 |         Returns:
 42 |             The result of the function call
 43 |         """
 44 |         import asyncio
 45 |         import inspect
 46 |         
 47 |         if callable(func):
 48 |             if inspect.iscoroutinefunction(func):
 49 |                 return await func(*args, **kwargs)
 50 |             else:
 51 |                 return func(*args, **kwargs)
 52 |         else:
 53 |             return func
 54 |     
 55 |     async def _get_value(self, attribute: str):
 56 |         """
 57 |         Get value for an attribute, checking both 'get_{attribute}' and '{attribute}' keys.
 58 |         
 59 |         Args:
 60 |             attribute: The attribute name to look for
 61 |             
 62 |         Returns:
 63 |             The value from the functions dict, called if callable, returned directly if not
 64 |         """
 65 |         # Check for 'get_{attribute}' first
 66 |         get_key = f"get_{attribute}"
 67 |         if get_key in self.functions:
 68 |             return await self._call_function(self.functions[get_key])
 69 |         
 70 |         # Check for '{attribute}' 
 71 |         if attribute in self.functions:
 72 |             return await self._call_function(self.functions[attribute])
 73 |         
 74 |         return None
 75 |     
 76 |     def _to_b64_str(self, img: Union[bytes, Image.Image, str]) -> str:
 77 |         """
 78 |         Convert image to base64 string.
 79 |         
 80 |         Args:
 81 |             img: Image as bytes, PIL Image, or base64 string
 82 |             
 83 |         Returns:
 84 |             str: Base64 encoded image string
 85 |         """
 86 |         if isinstance(img, str):
 87 |             # Already a base64 string
 88 |             return img
 89 |         elif isinstance(img, bytes):
 90 |             # Raw bytes
 91 |             return base64.b64encode(img).decode('utf-8')
 92 |         elif isinstance(img, Image.Image):
 93 |             # PIL Image
 94 |             buffer = io.BytesIO()
 95 |             img.save(buffer, format='PNG')
 96 |             return base64.b64encode(buffer.getvalue()).decode('utf-8')
 97 |         else:
 98 |             raise ValueError(f"Unsupported image type: {type(img)}")
 99 |     
100 |     # ==== Computer-Use-Preview Action Space ==== 
101 | 
102 |     async def get_environment(self) -> Literal["windows", "mac", "linux", "browser"]:
103 |         """Get the current environment type."""
104 |         result = await self._get_value('environment')
105 |         if result is None:
106 |             return "linux"
107 |         assert result in ["windows", "mac", "linux", "browser"]
108 |         return result # type: ignore
109 | 
110 |     async def get_dimensions(self) -> tuple[int, int]:
111 |         """Get screen dimensions as (width, height)."""
112 |         result = await self._get_value('dimensions')
113 |         if result is not None:
114 |             return result # type: ignore
115 |         
116 |         # Fallback: use last screenshot size if available
117 |         if not self._last_screenshot_size:
118 |             await self.screenshot()
119 |         assert self._last_screenshot_size is not None, "Failed to get screenshot size"
120 |         
121 |         return self._last_screenshot_size
122 |     
123 |     async def screenshot(self) -> str:
124 |         """Take a screenshot and return as base64 string."""
125 |         result = await self._call_function(self.functions['screenshot'])
126 |         b64_str = self._to_b64_str(result) # type: ignore
127 |         
128 |         # Try to extract dimensions for fallback use
129 |         try:
130 |             if isinstance(result, Image.Image):
131 |                 self._last_screenshot_size = result.size
132 |             elif isinstance(result, bytes):
133 |                 # Try to decode bytes to get dimensions
134 |                 img = Image.open(io.BytesIO(result))
135 |                 self._last_screenshot_size = img.size
136 |         except Exception:
137 |             # If we can't get dimensions, that's okay
138 |             pass
139 |         
140 |         return b64_str
141 |     
142 |     async def click(self, x: int, y: int, button: str = "left") -> None:
143 |         """Click at coordinates with specified button."""
144 |         if 'click' in self.functions:
145 |             await self._call_function(self.functions['click'], x, y, button)
146 |         # No-op if not implemented
147 |     
148 |     async def double_click(self, x: int, y: int) -> None:
149 |         """Double click at coordinates."""
150 |         if 'double_click' in self.functions:
151 |             await self._call_function(self.functions['double_click'], x, y)
152 |         # No-op if not implemented
153 |     
154 |     async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None:
155 |         """Scroll at coordinates with specified scroll amounts."""
156 |         if 'scroll' in self.functions:
157 |             await self._call_function(self.functions['scroll'], x, y, scroll_x, scroll_y)
158 |         # No-op if not implemented
159 |     
160 |     async def type(self, text: str) -> None:
161 |         """Type text."""
162 |         if 'type' in self.functions:
163 |             await self._call_function(self.functions['type'], text)
164 |         # No-op if not implemented
165 |     
166 |     async def wait(self, ms: int = 1000) -> None:
167 |         """Wait for specified milliseconds."""
168 |         if 'wait' in self.functions:
169 |             await self._call_function(self.functions['wait'], ms)
170 |         else:
171 |             # Default implementation
172 |             import asyncio
173 |             await asyncio.sleep(ms / 1000.0)
174 |     
175 |     async def move(self, x: int, y: int) -> None:
176 |         """Move cursor to coordinates."""
177 |         if 'move' in self.functions:
178 |             await self._call_function(self.functions['move'], x, y)
179 |         # No-op if not implemented
180 |     
181 |     async def keypress(self, keys: Union[List[str], str]) -> None:
182 |         """Press key combination."""
183 |         if 'keypress' in self.functions:
184 |             await self._call_function(self.functions['keypress'], keys)
185 |         # No-op if not implemented
186 |     
187 |     async def drag(self, path: List[Dict[str, int]]) -> None:
188 |         """Drag along specified path."""
189 |         if 'drag' in self.functions:
190 |             await self._call_function(self.functions['drag'], path)
191 |         # No-op if not implemented
192 |     
193 |     async def get_current_url(self) -> str:
194 |         """Get current URL (for browser environments)."""
195 |         # _get_value checks both the 'get_current_url' and 'current_url' keys
196 |         result = await self._get_value('current_url')
197 |         return result if result is not None else ""  # Default fallback
198 |     
199 |     async def left_mouse_down(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
200 |         """Left mouse down at coordinates."""
201 |         if 'left_mouse_down' in self.functions:
202 |             await self._call_function(self.functions['left_mouse_down'], x, y)
203 |         # No-op if not implemented
204 |     
205 |     async def left_mouse_up(self, x: Optional[int] = None, y: Optional[int] = None) -> None:
206 |         """Left mouse up at coordinates."""
207 |         if 'left_mouse_up' in self.functions:
208 |             await self._call_function(self.functions['left_mouse_up'], x, y)
209 |         # No-op if not implemented
210 | 
```
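
The handler above accepts sync functions, async functions, or plain constants in its `functions` dictionary, and looks values up under either `get_{attribute}` or `{attribute}`. The following is a minimal standalone sketch of that dispatch pattern, not the class itself; `call_maybe_async`, `lookup`, and the sample dictionary entries are hypothetical names chosen for illustration.

```python
import asyncio
import inspect


async def call_maybe_async(func, *args, **kwargs):
    """Await coroutine functions, call plain callables, pass constants through."""
    if callable(func):
        if inspect.iscoroutinefunction(func):
            return await func(*args, **kwargs)
        return func(*args, **kwargs)
    return func


async def lookup(functions, attribute):
    """Resolve 'get_{attribute}' first, then '{attribute}', else None."""
    for key in (f"get_{attribute}", attribute):
        if key in functions:
            return await call_maybe_async(functions[key])
    return None


def sync_dimensions():
    # Plain synchronous callable
    return (1024, 768)


async def async_environment():
    # Coroutine function; must be awaited
    return "browser"


# Hypothetical function dictionary mixing all three supported value kinds
functions = {
    "get_dimensions": sync_dimensions,
    "environment": async_environment,
    "current_url": "https://example.com",  # constant, returned as-is
}


async def demo():
    return (
        await lookup(functions, "dimensions"),
        await lookup(functions, "environment"),
        await lookup(functions, "current_url"),
        await lookup(functions, "missing"),
    )


results = asyncio.run(demo())
print(results)
```

Because non-callable values are returned unchanged, a caller can supply a fixed environment string or URL without wrapping it in a function.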

--------------------------------------------------------------------------------
/libs/typescript/core/src/telemetry/clients/posthog.ts:
--------------------------------------------------------------------------------

```typescript
  1 | /**
  2 |  * Telemetry client using PostHog for collecting anonymous usage data.
  3 |  */
  4 | 
  5 | import * as fs from 'node:fs';
  6 | import * as os from 'node:os';
  7 | import * as path from 'node:path';
  8 | import { pino } from 'pino';
  9 | import { PostHog } from 'posthog-node';
 10 | import { v4 as uuidv4 } from 'uuid';
 11 | 
 12 | // Controls how frequently telemetry will be sent (percentage)
 13 | export const TELEMETRY_SAMPLE_RATE = 100; // 100% sampling rate
 14 | 
 15 | // Public PostHog config for anonymous telemetry
 16 | // These values are intentionally public and meant for anonymous telemetry only
 17 | // https://posthog.com/docs/product-analytics/troubleshooting#is-it-ok-for-my-api-key-to-be-exposed-and-public
 18 | export const PUBLIC_POSTHOG_API_KEY =
 19 |   'phc_eSkLnbLxsnYFaXksif1ksbrNzYlJShr35miFLDppF14';
 20 | export const PUBLIC_POSTHOG_HOST = 'https://eu.i.posthog.com';
 21 | 
 22 | export class PostHogTelemetryClient {
 23 |   private config: {
 24 |     enabled: boolean;
 25 |     sampleRate: number;
 26 |     posthog: { apiKey: string; host: string };
 27 |   };
 28 |   private installationId: string;
 29 |   private initialized = false;
 30 |   private queuedEvents: {
 31 |     name: string;
 32 |     properties: Record<string, unknown>;
 33 |     timestamp: number;
 34 |   }[] = [];
 35 |   private startTime: number; // seconds
 36 |   private posthogClient?: PostHog;
 37 |   private counters: Record<string, number> = {};
 38 | 
 39 |   private logger = pino({ name: 'core.telemetry' });
 40 | 
 41 |   constructor() {
 42 |     // set up config
 43 |     this.config = {
 44 |       enabled: true,
 45 |       sampleRate: TELEMETRY_SAMPLE_RATE,
 46 |       posthog: { apiKey: PUBLIC_POSTHOG_API_KEY, host: PUBLIC_POSTHOG_HOST },
 47 |     };
 48 |     // Check for multiple environment variables that can disable telemetry:
 49 |     // CUA_TELEMETRY=off to disable telemetry (legacy way)
 50 |     // CUA_TELEMETRY_DISABLED=1 (or true/yes/on) to disable telemetry (new, more explicit way)
 51 |     const telemetryDisabled =
 52 |       process.env.CUA_TELEMETRY?.toLowerCase() === 'off' ||
 53 |       ['1', 'true', 'yes', 'on'].includes(
 54 |         process.env.CUA_TELEMETRY_DISABLED?.toLowerCase() || ''
 55 |       );
 56 | 
 57 |     this.config.enabled = !telemetryDisabled;
 58 |     this.config.sampleRate = Number.parseFloat(
 59 |       process.env.CUA_TELEMETRY_SAMPLE_RATE || String(TELEMETRY_SAMPLE_RATE)
 60 |     );
 61 |     // init client
 62 |     this.installationId = this._getOrCreateInstallationId();
 63 |     this.startTime = Date.now() / 1000; // Convert to seconds
 64 | 
 65 |     // Log telemetry status on startup
 66 |     if (this.config.enabled) {
 67 |       this.logger.info(
 68 |         `Telemetry enabled (sampling at ${this.config.sampleRate}%)`
 69 |       );
 70 |       // Initialize PostHog client if config is available
 71 |       this._initializePosthog();
 72 |     } else {
 73 |       this.logger.info('Telemetry disabled');
 74 |     }
 75 |   }
 76 | 
 77 |   /**
 78 |    * Get or create a random installation ID.
 79 |    * This ID is not tied to any personal information.
 80 |    */
 81 |   private _getOrCreateInstallationId(): string {
 82 |     const homeDir = os.homedir();
 83 |     const idFile = path.join(homeDir, '.cua', 'installation_id');
 84 | 
 85 |     try {
 86 |       if (fs.existsSync(idFile)) {
 87 |         return fs.readFileSync(idFile, 'utf-8').trim();
 88 |       }
 89 |     } catch (error) {
 90 |       this.logger.debug(`Failed to read installation ID: ${error}`);
 91 |     }
 92 | 
 93 |     // Create new ID if not exists
 94 |     const newId = uuidv4();
 95 |     try {
 96 |       const dir = path.dirname(idFile);
 97 |       if (!fs.existsSync(dir)) {
 98 |         fs.mkdirSync(dir, { recursive: true });
 99 |       }
100 |       fs.writeFileSync(idFile, newId);
101 |       return newId;
102 |     } catch (error) {
103 |       this.logger.debug(`Failed to write installation ID: ${error}`);
104 |     }
105 | 
106 |     // Fallback to in-memory ID if file operations fail
107 |     return newId;
108 |   }
109 | 
110 |   /**
111 |    * Initialize the PostHog client with configuration.
112 |    */
113 |   private _initializePosthog(): boolean {
114 |     if (this.initialized) {
115 |       return true;
116 |     }
117 | 
118 |     try {
119 |       this.posthogClient = new PostHog(this.config.posthog.apiKey, {
120 |         host: this.config.posthog.host,
121 |         flushAt: 20, // Number of events to batch before sending
122 |         flushInterval: 30000, // Send events every 30 seconds
123 |       });
124 |       this.initialized = true;
125 |       this.logger.debug('PostHog client initialized successfully');
126 | 
127 |       // Process any queued events
128 |       this._processQueuedEvents();
129 |       return true;
130 |     } catch (error) {
131 |       this.logger.error(`Failed to initialize PostHog client: ${error}`);
132 |       return false;
133 |     }
134 |   }
135 | 
136 |   /**
137 |    * Process any events that were queued before initialization.
138 |    */
139 |   private _processQueuedEvents(): void {
140 |     if (!this.posthogClient || this.queuedEvents.length === 0) {
141 |       return;
142 |     }
143 | 
144 |     for (const event of this.queuedEvents) {
145 |       this._captureEvent(event.name, event.properties);
146 |     }
147 |     this.queuedEvents = [];
148 |   }
149 | 
150 |   /**
151 |    * Capture an event with PostHog.
152 |    */
153 |   private _captureEvent(
154 |     eventName: string,
155 |     properties?: Record<string, unknown>
156 |   ): void {
157 |     if (!this.posthogClient) {
158 |       return;
159 |     }
160 | 
161 |     try {
162 |       // Add standard properties
163 |       const eventProperties = {
164 |         ...properties,
165 |         version: process.env.npm_package_version || 'unknown',
166 |         platform: process.platform,
167 |         node_version: process.version,
168 |         is_ci: this._isCI,
169 |       };
170 | 
171 |       this.posthogClient.capture({
172 |         distinctId: this.installationId,
173 |         event: eventName,
174 |         properties: eventProperties,
175 |       });
176 |     } catch (error) {
177 |       this.logger.debug(`Failed to capture event: ${error}`);
178 |     }
179 |   }
180 | 
181 |   private get _isCI(): boolean {
182 |     /**
183 |      * Detect if running in CI environment.
184 |      */
185 |     return !!(
186 |       process.env.CI ||
187 |       process.env.CONTINUOUS_INTEGRATION ||
188 |       process.env.GITHUB_ACTIONS ||
189 |       process.env.GITLAB_CI ||
190 |       process.env.CIRCLECI ||
191 |       process.env.TRAVIS ||
192 |       process.env.JENKINS_URL
193 |     );
194 |   }
195 | 
196 |   increment(counterName: string, value = 1) {
197 |     /**
198 |      * Increment a named counter.
199 |      */
200 |     if (!this.config.enabled) {
201 |       return;
202 |     }
203 | 
204 |     if (!(counterName in this.counters)) {
205 |       this.counters[counterName] = 0;
206 |     }
207 |     this.counters[counterName] += value;
208 |   }
209 | 
210 |   recordEvent(eventName: string, properties?: Record<string, unknown>): void {
211 |     /**
212 |      * Record an event with optional properties.
213 |      */
214 |     if (!this.config.enabled) {
215 |       return;
216 |     }
217 | 
218 |     // Increment counter for this event type
219 |     const counterKey = `event:${eventName}`;
220 |     this.increment(counterKey);
221 | 
222 |     // Apply sampling
223 |     if (Math.random() * 100 > this.config.sampleRate) {
224 |       return;
225 |     }
226 | 
227 |     const event = {
228 |       name: eventName,
229 |       properties: properties || {},
230 |       timestamp: Date.now() / 1000,
231 |     };
232 | 
233 |     if (this.initialized && this.posthogClient) {
234 |       this._captureEvent(eventName, properties);
235 |     } else {
236 |       // Queue event if not initialized
237 |       this.queuedEvents.push(event);
238 |       // Try to initialize again
239 |       if (this.config.enabled && !this.initialized) {
240 |         this._initializePosthog();
241 |       }
242 |     }
243 |   }
244 | 
245 |   /**
246 |    * Flush any pending events to PostHog.
247 |    */
248 |   async flush(): Promise<boolean> {
249 |     if (!this.config.enabled || !this.posthogClient) {
250 |       return false;
251 |     }
252 | 
253 |     try {
254 |       // Send counter data as a single event
255 |       if (Object.keys(this.counters).length > 0) {
256 |         this._captureEvent('telemetry_counters', {
257 |           counters: { ...this.counters },
258 |           duration: Date.now() / 1000 - this.startTime,
259 |         });
260 |       }
261 | 
262 |       await this.posthogClient.flush();
263 |       this.logger.debug('Telemetry flushed successfully');
264 | 
265 |       // Clear counters after sending
266 |       this.counters = {};
267 |       return true;
268 |     } catch (error) {
269 |       this.logger.debug(`Failed to flush telemetry: ${error}`);
270 |       return false;
271 |     }
272 |   }
273 | 
274 |   enable(): void {
275 |     /**
276 |      * Enable telemetry collection.
277 |      */
278 |     this.config.enabled = true;
279 |     this.logger.info('Telemetry enabled');
280 |     if (!this.initialized) {
281 |       this._initializePosthog();
282 |     }
283 |   }
284 | 
285 |   async disable(): Promise<void> {
286 |     /**
287 |      * Disable telemetry collection.
288 |      */
289 |     this.config.enabled = false;
290 |     await this.posthogClient?.disable();
291 |     this.logger.info('Telemetry disabled');
292 |   }
293 | 
294 |   get enabled(): boolean {
295 |     /**
296 |      * Check if telemetry is enabled.
297 |      */
298 |     return this.config.enabled;
299 |   }
300 | 
301 |   async shutdown(): Promise<void> {
302 |     /**
303 |      * Shutdown the telemetry client and flush any pending events.
304 |      */
305 |     if (this.posthogClient) {
306 |       await this.flush();
307 |       await this.posthogClient.shutdown();
308 |       this.initialized = false;
309 |       this.posthogClient = undefined;
310 |     }
311 |   }
312 | }
313 | 
```
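
The opt-out check and the sampling gate in the client above are small enough to restate in isolation. This is a Python paraphrase of the TypeScript logic for illustration only; `telemetry_disabled` and `should_send` are hypothetical helper names, and `roll` stands in for the value of `Math.random()`.

```python
def telemetry_disabled(env):
    """Mirror the constructor's check of the two opt-out environment variables."""
    legacy_off = env.get("CUA_TELEMETRY", "").lower() == "off"
    explicit_off = env.get("CUA_TELEMETRY_DISABLED", "").lower() in {
        "1",
        "true",
        "yes",
        "on",
    }
    return legacy_off or explicit_off


def should_send(sample_rate, roll):
    """Mirror recordEvent: the event is dropped when roll * 100 > sample_rate.

    `roll` is a float in [0, 1), standing in for Math.random().
    """
    return not (roll * 100 > sample_rate)


# Opt-out examples
print(telemetry_disabled({"CUA_TELEMETRY": "OFF"}))
print(telemetry_disabled({"CUA_TELEMETRY_DISABLED": "yes"}))
print(telemetry_disabled({}))

# Sampling examples
print(should_send(100, 0.999))
print(should_send(25, 0.5))
```

One consequence of the strict `>` comparison is that a sample rate of 0 still lets through an event whose roll is exactly 0, so disabling telemetry through the environment variables is the reliable way to stop all events.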

--------------------------------------------------------------------------------
/tests/test_watchdog.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Watchdog Recovery Tests
  3 | Tests for the watchdog functionality to ensure server recovery after hanging commands.
  4 | Required environment variables:
  5 | - CUA_API_KEY: API key for Cua cloud provider
  6 | - CUA_CONTAINER_NAME: Name of the container to use
  7 | """
  8 | 
  9 | import os
 10 | import asyncio
 11 | import pytest
 12 | from pathlib import Path
 13 | import sys
 14 | import traceback
 15 | import time
 16 | 
 17 | # Load environment variables from .env file
 18 | project_root = Path(__file__).parent.parent
 19 | env_file = project_root / ".env"
 20 | print(f"Loading environment from: {env_file}")
 21 | from dotenv import load_dotenv
 22 | 
 23 | load_dotenv(env_file)
 24 | 
 25 | # Add paths to sys.path if needed
 26 | pythonpath = os.environ.get("PYTHONPATH", "")
 27 | for path in pythonpath.split(":"):
 28 |     if path and path not in sys.path:
 29 |         sys.path.insert(0, path)  # Insert at beginning to prioritize
 30 |         print(f"Added to sys.path: {path}")
 31 | 
 32 | from computer import Computer, VMProviderType
 33 | 
 34 | @pytest.fixture(scope="session")
 35 | async def computer():
 36 |     """Shared Computer instance for all test cases."""
 37 |     # Create a remote Linux computer with Cua
 38 |     computer = Computer(
 39 |         os_type="linux",
 40 |         api_key=os.getenv("CUA_API_KEY"),
 41 |         name=str(os.getenv("CUA_CONTAINER_NAME")),
 42 |         provider_type=VMProviderType.CLOUD,
 43 |     )
 44 |     
 45 |     try:
 46 |         await computer.run()
 47 |         yield computer
 48 |     finally:
 49 |         await computer.disconnect()
 50 | 
 51 | 
 52 | @pytest.mark.asyncio(loop_scope="session")
 53 | async def test_simple_server_ping(computer):
 54 |     """
 55 |     Simple test to verify server connectivity before running watchdog tests.
 56 |     """
 57 |     print("Testing basic server connectivity...")
 58 |     
 59 |     try:
 60 |         result = await computer.interface.run_command("echo 'Server ping test'")
 61 |         print(f"Ping successful: {result}")
 62 |         assert result is not None, "Server ping returned None"
 63 |         print("✅ Server connectivity test passed")
 64 |     except Exception as e:
 65 |         print(f"❌ Server ping failed: {e}")
 66 |         pytest.fail(f"Basic server connectivity test failed: {e}")
 67 | 
 68 | 
 69 | @pytest.mark.asyncio(loop_scope="session")
 70 | async def test_watchdog_recovery_after_hanging_command(computer):
 71 |     """
 72 |     Test that the watchdog can recover the server after a hanging command.
 73 |     
 74 |     This test runs two concurrent tasks:
 75 |     1. A long-running command that hangs the server (sleep 999999, far longer than the test's 5-minute timeout)
 76 |     2. Periodic ping commands every 30 seconds to test server responsiveness
 77 |     
 78 |     The watchdog should detect the unresponsive server and restart it.
 79 |     """
 80 |     print("Starting watchdog recovery test...")
 81 |     
 82 |     async def hanging_command():
 83 |         """Execute a command that sleeps forever to hang the server."""
 84 |         try:
 85 |             print("Starting hanging command (sleep 999999)...")
 86 |             # Use a very long sleep that should never complete naturally
 87 |             result = await computer.interface.run_command("sleep 999999")
 88 |             print(f"Hanging command completed unexpectedly: {result}")
 89 |             return True  # Should never reach here if watchdog works
 90 |         except Exception as e:
 91 |             print(f"Hanging command interrupted (expected if watchdog restarts): {e}")
 92 |             return None  # Expected result when watchdog kills the process
 93 |     
 94 |     async def ping_server():
 95 |         """Ping the server every 30 seconds with echo commands."""
 96 |         ping_count = 0
 97 |         successful_pings = 0
 98 |         failed_pings = 0
 99 |         
100 |         try:
101 |             # Send 8 pings with 30-second pauses between them (about 3.5 minutes plus ping time)
102 |             for i in range(8):
103 |                 try:
104 |                     ping_count += 1
105 |                     print(f"Ping #{ping_count}: Sending echo command...")
106 |                     
107 |                     start_time = time.time()
108 |                     result = await asyncio.wait_for(
109 |                         computer.interface.run_command(f"echo 'Ping {ping_count} at {int(start_time)}'"),
110 |                         timeout=10.0  # 10 second timeout for each ping
111 |                     )
112 |                     end_time = time.time()
113 |                     
114 |                     print(f"Ping #{ping_count} successful in {end_time - start_time:.2f}s: {result}")
115 |                     successful_pings += 1
116 |                     
117 |                 except asyncio.TimeoutError:
118 |                     print(f"Ping #{ping_count} timed out (server may be unresponsive)")
119 |                     failed_pings += 1
120 |                 except Exception as e:
121 |                     print(f"Ping #{ping_count} failed with exception: {e}")
122 |                     failed_pings += 1
123 |                 
124 |                 # Wait 30 seconds before next ping
125 |                 if i < 7:  # Don't wait after the last ping
126 |                     print("Waiting 30 seconds before next ping...")
127 |                     await asyncio.sleep(30)
128 |             
129 |             print(f"Ping summary: {successful_pings} successful, {failed_pings} failed")
130 |             return successful_pings, failed_pings
131 |             
132 |         except Exception as e:
133 |             print(f"Ping server function failed with critical error: {e}")
134 |             traceback.print_exc()
135 |             return successful_pings, failed_pings
136 |     
137 |     # Run both tasks concurrently
138 |     print("Starting concurrent tasks: hanging command and ping monitoring...")
139 |     
140 |     try:
141 |         # Schedule both coroutines as tasks so they run concurrently
142 |         hanging_task = asyncio.create_task(hanging_command())
143 |         ping_task = asyncio.create_task(ping_server())
144 |         
145 |         # Wait for both tasks to complete or timeout after 5 minutes
146 |         done, pending = await asyncio.wait(
147 |             [hanging_task, ping_task],
148 |             timeout=300,  # 5 minute timeout
149 |             return_when=asyncio.ALL_COMPLETED
150 |         )
151 |         
152 |         # Cancel any pending tasks
153 |         for task in pending:
154 |             task.cancel()
155 |             try:
156 |                 await task
157 |             except asyncio.CancelledError:
158 |                 pass
159 |         
160 |         # Get results from completed tasks
161 |         ping_result = None
162 |         hanging_result = None
163 |         
164 |         if ping_task in done:
165 |             try:
166 |                 ping_result = await ping_task
167 |                 print(f"Ping task completed with result: {ping_result}")
168 |             except Exception as e:
169 |                 print(f"Error getting ping task result: {e}")
170 |                 traceback.print_exc()
171 |         
172 |         if hanging_task in done:
173 |             try:
174 |                 hanging_result = await hanging_task
175 |                 print(f"Hanging task completed with result: {hanging_result}")
176 |             except Exception as e:
177 |                 print(f"Error getting hanging task result: {e}")
178 |                 traceback.print_exc()
179 |         
180 |         # Analyze results
181 |         if ping_result:
182 |             successful_pings, failed_pings = ping_result
183 |             
184 |             # Test passes if we had some successful pings, indicating recovery
185 |             assert successful_pings > 0, "No successful pings detected. Server may not have recovered."
186 |             
187 |             # Check if hanging command was killed (indicating watchdog restart)
188 |             if hanging_result is None:
189 |                 print("✅ SUCCESS: Hanging command was killed - watchdog restart detected")
190 |             elif hanging_result is True:
191 |                 print("⚠️  WARNING: Hanging command completed naturally - watchdog may not have restarted")
192 |             
193 |             # If we had failures followed by successes, that indicates watchdog recovery
194 |             if failed_pings > 0 and successful_pings > 0:
195 |                 print("✅ SUCCESS: Watchdog recovery detected - server became unresponsive then recovered")
196 |                 # Additional check: hanging command should be None if watchdog worked
197 |                 assert hanging_result is None, "Expected hanging command to be killed by watchdog restart"
198 |             elif successful_pings > 0 and failed_pings == 0:
199 |                 print("✅ SUCCESS: Server remained responsive throughout test")
200 |             
201 |             print(f"Test completed: {successful_pings} successful pings, {failed_pings} failed pings")
202 |             print(f"Hanging command result: {hanging_result} (None = killed by watchdog, True = completed naturally)")
203 |         else:
204 |             pytest.fail("Ping task did not complete - unable to assess server recovery")
205 |             
206 |     except Exception as e:
207 |         print(f"Test failed with exception: {e}")
208 |         traceback.print_exc()
209 |         pytest.fail(f"Watchdog recovery test failed: {e}")
210 | 
211 | 
212 | if __name__ == "__main__":
213 |     # Run tests directly
214 |     pytest.main([__file__, "-v"])
215 | 
```
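
The test above relies on a common asyncio pattern: start two tasks, wait for both with an overall deadline, then cancel whatever is still pending. The sketch below shows that pattern with short sleeps in place of real server calls; `hanging`, `pinger`, and `run_with_deadline` are hypothetical stand-ins, and the timings are shortened so it finishes in under a second.

```python
import asyncio


async def hanging():
    # Stand-in for the command that never returns on its own
    await asyncio.sleep(3600)
    return True


async def pinger():
    # Stand-in for the periodic pings; each one has its own timeout
    ok = 0
    for _ in range(3):
        await asyncio.wait_for(asyncio.sleep(0.01), timeout=1.0)
        ok += 1
    return ok


async def run_with_deadline():
    hang_task = asyncio.create_task(hanging())
    ping_task = asyncio.create_task(pinger())

    # Wait for both tasks, but give up after the overall deadline
    done, pending = await asyncio.wait(
        [hang_task, ping_task],
        timeout=0.5,
        return_when=asyncio.ALL_COMPLETED,
    )

    # Cancel anything still running and let the cancellation settle
    for task in pending:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass

    return ping_task in done, ping_task.result(), hang_task.cancelled()


outcome = asyncio.run(run_with_deadline())
print(outcome)
```

Unlike `asyncio.wait_for`, `asyncio.wait` does not cancel tasks when its timeout expires, which is why the explicit cancel loop is needed.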

--------------------------------------------------------------------------------
/libs/python/computer/computer/diorama_computer.py:
--------------------------------------------------------------------------------

```python
  1 | import asyncio
  2 | from .interface.models import KeyType, Key
  3 | 
  4 | class DioramaComputer:
  5 |     """
  6 |     A Computer-compatible proxy for Diorama that sends commands over the ComputerInterface.
  7 |     """
  8 |     def __init__(self, computer, apps):
  9 |         """
 10 |         Initialize the DioramaComputer with a computer instance and list of apps.
 11 |         
 12 |         Args:
 13 |             computer: The computer instance to proxy commands through
 14 |             apps: List of applications available in the diorama environment
 15 |         """
 16 |         self.computer = computer
 17 |         self.apps = apps
 18 |         self.interface = DioramaComputerInterface(computer, apps)
 19 |         self._initialized = False
 20 | 
 21 |     async def __aenter__(self):
 22 |         """
 23 |         Async context manager entry point.
 24 |         
 25 |         Returns:
 26 |             self: The DioramaComputer instance
 27 |         """
 28 |         self._initialized = True
 29 |         return self
 30 | 
 31 |     async def run(self):
 32 |         """
 33 |         Initialize and run the DioramaComputer if not already initialized.
 34 |         
 35 |         Returns:
 36 |             self: The DioramaComputer instance
 37 |         """
 38 |         if not self._initialized:
 39 |             await self.__aenter__()
 40 |         return self
 41 | 
 42 | class DioramaComputerInterface:
 43 |     """
 44 |     Diorama Interface proxy that sends diorama_cmds via the Computer's interface.
 45 |     """
 46 |     def __init__(self, computer, apps):
 47 |         """
 48 |         Initialize the DioramaComputerInterface.
 49 |         
 50 |         Args:
 51 |             computer: The computer instance to send commands through
 52 |             apps: List of applications available in the diorama environment
 53 |         """
 54 |         self.computer = computer
 55 |         self.apps = apps
 56 |         self._scene_size = None
 57 | 
 58 |     async def _send_cmd(self, action, arguments=None):
 59 |         """
 60 |         Send a command to the diorama interface through the computer.
 61 |         
 62 |         Args:
 63 |             action (str): The action/command to execute
 64 |             arguments (dict, optional): Additional arguments for the command
 65 |             
 66 |         Returns:
 67 |             The result from the diorama command execution
 68 |             
 69 |         Raises:
 70 |             RuntimeError: If the computer interface is not initialized or command fails
 71 |         """
 72 |         arguments = arguments or {}
 73 |         arguments = {"app_list": self.apps, **arguments}
 74 |         # Use the computer's interface (must be initialized)
 75 |         iface = getattr(self.computer, "_interface", None)
 76 |         if iface is None:
 77 |             raise RuntimeError("Computer interface not initialized. Call run() first.")
 78 |         result = await iface.diorama_cmd(action, arguments)
 79 |         if not result.get("success"):
 80 |             raise RuntimeError(f"Diorama command failed: {result.get('error')}\n{result.get('trace')}")
 81 |         return result.get("result")
 82 | 
 83 |     async def screenshot(self, as_bytes=True):
 84 |         """
 85 |         Take a screenshot of the diorama scene.
 86 |         
 87 |         Args:
 88 |             as_bytes (bool): If True, return image as bytes; if False, return PIL Image object
 89 |             
 90 |         Returns:
 91 |             bytes or PIL.Image: Screenshot data in the requested format
 92 |         """
 93 |         from PIL import Image
 94 |         import base64
 95 |         result = await self._send_cmd("screenshot")
 96 |         # assume result is a b64 string of an image
 97 |         img_bytes = base64.b64decode(result)
 98 |         import io
 99 |         img = Image.open(io.BytesIO(img_bytes))
100 |         self._scene_size = img.size
101 |         return img_bytes if as_bytes else img
102 | 
103 |     async def get_screen_size(self):
104 |         """
105 |         Get the dimensions of the diorama scene.
106 |         
107 |         Returns:
108 |             dict: Dictionary containing 'width' and 'height' keys with pixel dimensions
109 |         """
110 |         if not self._scene_size:
111 |             await self.screenshot(as_bytes=False)
112 |         return {"width": self._scene_size[0], "height": self._scene_size[1]}
113 | 
114 |     async def move_cursor(self, x, y):
115 |         """
116 |         Move the cursor to the specified coordinates.
117 |         
118 |         Args:
119 |             x (int): X coordinate to move cursor to
120 |             y (int): Y coordinate to move cursor to
121 |         """
122 |         await self._send_cmd("move_cursor", {"x": x, "y": y})
123 | 
124 |     async def left_click(self, x=None, y=None):
125 |         """
126 |         Perform a left mouse click at the specified coordinates or current cursor position.
127 |         
128 |         Args:
129 |             x (int, optional): X coordinate to click at. If None, clicks at current cursor position
130 |             y (int, optional): Y coordinate to click at. If None, clicks at current cursor position
131 |         """
132 |         await self._send_cmd("left_click", {"x": x, "y": y})
133 | 
134 |     async def right_click(self, x=None, y=None):
135 |         """
136 |         Perform a right mouse click at the specified coordinates or current cursor position.
137 |         
138 |         Args:
139 |             x (int, optional): X coordinate to click at. If None, clicks at current cursor position
140 |             y (int, optional): Y coordinate to click at. If None, clicks at current cursor position
141 |         """
142 |         await self._send_cmd("right_click", {"x": x, "y": y})
143 | 
144 |     async def double_click(self, x=None, y=None):
145 |         """
146 |         Perform a double mouse click at the specified coordinates or current cursor position.
147 |         
148 |         Args:
149 |             x (int, optional): X coordinate to double-click at. If None, clicks at current cursor position
150 |             y (int, optional): Y coordinate to double-click at. If None, clicks at current cursor position
151 |         """
152 |         await self._send_cmd("double_click", {"x": x, "y": y})
153 | 
154 |     async def scroll_up(self, clicks=1):
155 |         """
156 |         Scroll up by the specified number of clicks.
157 |         
158 |         Args:
159 |             clicks (int): Number of scroll clicks to perform upward. Defaults to 1
160 |         """
161 |         await self._send_cmd("scroll_up", {"clicks": clicks})
162 | 
163 |     async def scroll_down(self, clicks=1):
164 |         """
165 |         Scroll down by the specified number of clicks.
166 |         
167 |         Args:
168 |             clicks (int): Number of scroll clicks to perform downward. Defaults to 1
169 |         """
170 |         await self._send_cmd("scroll_down", {"clicks": clicks})
171 | 
172 |     async def drag_to(self, x, y, duration=0.5):
173 |         """
174 |         Drag from the current cursor position to the specified coordinates.
175 |         
176 |         Args:
177 |             x (int): X coordinate to drag to
178 |             y (int): Y coordinate to drag to
179 |             duration (float): Duration of the drag operation in seconds. Defaults to 0.5
180 |         """
181 |         await self._send_cmd("drag_to", {"x": x, "y": y, "duration": duration})
182 | 
183 |     async def get_cursor_position(self):
184 |         """
185 |         Get the current cursor position.
186 |         
187 |         Returns:
188 |             dict: Dictionary containing the current cursor coordinates
189 |         """
190 |         return await self._send_cmd("get_cursor_position")
191 | 
192 |     async def type_text(self, text):
193 |         """
194 |         Type the specified text at the current cursor position.
195 |         
196 |         Args:
197 |             text (str): The text to type
198 |         """
199 |         await self._send_cmd("type_text", {"text": text})
200 | 
201 |     async def press_key(self, key):
202 |         """
203 |         Press a single key.
204 |         
205 |         Args:
206 |             key: The key to press
207 |         """
208 |         await self._send_cmd("press_key", {"key": key})
209 | 
210 |     async def hotkey(self, *keys):
211 |         """
212 |         Press multiple keys simultaneously as a hotkey combination.
213 |         
214 |         Args:
215 |             *keys: Variable number of keys to press together. Can be Key enum instances or strings
216 |             
217 |         Raises:
218 |             ValueError: If any key is not a Key enum or string type
219 |         """
220 |         actual_keys = []
221 |         for key in keys:
222 |             if isinstance(key, Key):
223 |                 actual_keys.append(key.value)
224 |             elif isinstance(key, str):
225 |                 # Try to convert to enum if it matches a known key
226 |                 key_or_enum = Key.from_string(key)
227 |                 actual_keys.append(key_or_enum.value if isinstance(key_or_enum, Key) else key_or_enum)
228 |             else:
229 |                 raise ValueError(f"Invalid key type: {type(key)}. Must be Key enum or string.")
230 |         await self._send_cmd("hotkey", {"keys": actual_keys})
231 | 
232 |     async def to_screen_coordinates(self, x, y):
233 |         """
234 |         Convert coordinates to screen coordinates.
235 |         
236 |         Args:
237 |             x (int): X coordinate to convert
238 |             y (int): Y coordinate to convert
239 |             
240 |         Returns:
241 |             dict: Dictionary containing the converted screen coordinates
242 |         """
243 |         return await self._send_cmd("to_screen_coordinates", {"x": x, "y": y})
244 | 
```
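
The key normalization done by `hotkey` above can be tried in isolation. The following is a minimal sketch, not the library's implementation: the `Key` enum here is a hypothetical stand-in with illustrative members, and `normalize_keys` is an assumed helper name. It only assumes, as the listing suggests, that `Key.from_string` returns either a `Key` member or the original string.

```python
from enum import Enum
from typing import List, Union


class Key(Enum):
    """Hypothetical stand-in for the real Key enum; members are illustrative."""

    COMMAND = "cmd"
    SHIFT = "shift"
    ENTER = "enter"

    @classmethod
    def from_string(cls, name: str) -> Union["Key", str]:
        """Return the matching member, or the original string if none matches."""
        lowered = name.lower()
        for member in cls:
            if lowered in (member.value, member.name.lower()):
                return member
        return name


def normalize_keys(*keys: Union[Key, str]) -> List[str]:
    """Convert a mix of Key members and strings into plain key strings."""
    actual_keys: List[str] = []
    for key in keys:
        if isinstance(key, Key):
            actual_keys.append(key.value)
        elif isinstance(key, str):
            # Known names map to their enum value; unknown strings pass through.
            key_or_enum = Key.from_string(key)
            actual_keys.append(
                key_or_enum.value if isinstance(key_or_enum, Key) else key_or_enum
            )
        else:
            raise ValueError(f"Invalid key type: {type(key)}. Must be Key enum or string.")
    return actual_keys


if __name__ == "__main__":
    print(normalize_keys(Key.COMMAND, "shift", "a"))
```

Unknown strings such as `"a"` pass through unchanged, so single-character keys need no enum entry in this sketch.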

--------------------------------------------------------------------------------
/libs/python/agent/agent/loops/openai.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | OpenAI computer-use-preview agent loop implementation using liteLLM
  3 | """
  4 | 
  5 | import asyncio
  6 | import base64
  7 | import json
  8 | from io import BytesIO
  9 | from typing import Dict, List, Any, AsyncGenerator, Union, Optional, Tuple
 10 | import litellm
 11 | from PIL import Image
 12 | 
 13 | from ..decorators import register_agent
 14 | from ..types import Messages, AgentResponse, Tools, AgentCapability
 15 | 
 16 | async def _map_computer_tool_to_openai(computer_handler: Any) -> Dict[str, Any]:
 17 |     """Map a computer tool to OpenAI's computer-use-preview tool schema"""
 18 |     # Get dimensions from the computer handler
 19 |     try:
 20 |         width, height = await computer_handler.get_dimensions()
 21 |     except Exception:
 22 |         # Fallback to default dimensions if method fails
 23 |         width, height = 1024, 768
 24 |     
 25 |     # Get environment from the computer handler
 26 |     try:
 27 |         environment = await computer_handler.get_environment()
 28 |     except Exception:
 29 |         # Fallback to default environment if method fails
 30 |         environment = "linux"
 31 |     
 32 |     return {
 33 |         "type": "computer_use_preview",
 34 |         "display_width": width,
 35 |         "display_height": height,
 36 |         "environment": environment  # mac, windows, linux, browser
 37 |     }
 38 | 
 39 | 
 40 | async def _prepare_tools_for_openai(tool_schemas: List[Dict[str, Any]]) -> Tools:
 41 |     """Prepare tools for OpenAI API format"""
 42 |     openai_tools = []
 43 |     
 44 |     for schema in tool_schemas:
 45 |         if schema["type"] == "computer":
 46 |             # Map computer tool to OpenAI format
 47 |             computer_tool = await _map_computer_tool_to_openai(schema["computer"])
 48 |             openai_tools.append(computer_tool)
 49 |         elif schema["type"] == "function":
 50 |             # Function tools use OpenAI-compatible schema directly (liteLLM expects this format)
 51 |             # Schema should be: {type, name, description, parameters}
 52 |             openai_tools.append({ "type": "function", **schema["function"] })
 53 |     
 54 |     return openai_tools
 55 | 
 56 | @register_agent(models=r".*(^|/)computer-use-preview")
 57 | class OpenAIComputerUseConfig:
 58 |     """
 59 |     OpenAI computer-use-preview agent configuration using liteLLM responses.
 60 |     
 61 |     Supports OpenAI's computer use preview models.
 62 |     """
 63 |     
 64 |     async def predict_step(
 65 |         self,
 66 |         messages: List[Dict[str, Any]],
 67 |         model: str,
 68 |         tools: Optional[List[Dict[str, Any]]] = None,
 69 |         max_retries: Optional[int] = None,
 70 |         stream: bool = False,
 71 |         computer_handler=None,
 72 |         use_prompt_caching: Optional[bool] = False,
 73 |         _on_api_start=None,
 74 |         _on_api_end=None,
 75 |         _on_usage=None,
 76 |         _on_screenshot=None,
 77 |         **kwargs
 78 |     ) -> Dict[str, Any]:
 79 |         """
 80 |         Predict the next step based on input items.
 81 |         
 82 |         Args:
 83 |             messages: Input items following Responses format
 84 |             model: Model name to use
 85 |             tools: Optional list of tool schemas
 86 |             max_retries: Maximum number of retries
 87 |             stream: Whether to stream responses
 88 |             computer_handler: Computer handler instance
 89 |             _on_api_start: Callback for API start
 90 |             _on_api_end: Callback for API end
 91 |             _on_usage: Callback for usage tracking
 92 |             _on_screenshot: Callback for screenshot events
 93 |             **kwargs: Additional arguments
 94 |             
 95 |         Returns:
 96 |             Dictionary with "output" (output items) and "usage" (token usage dict plus "response_cost")
 97 |         """
 98 |         tools = tools or []
 99 |         
100 |         # Prepare tools for OpenAI API
101 |         openai_tools = await _prepare_tools_for_openai(tools)
102 | 
103 |         # Prepare API call kwargs
104 |         api_kwargs = {
105 |             "model": model,
106 |             "input": messages,
107 |             "tools": openai_tools if openai_tools else None,
108 |             "stream": stream,
109 |             "reasoning": {"summary": "concise"},
110 |             "truncation": "auto",
111 |             "num_retries": max_retries,
112 |             **kwargs
113 |         }
114 |         
115 |         # Call API start hook
116 |         if _on_api_start:
117 |             await _on_api_start(api_kwargs)
118 |         
119 |         # Use liteLLM responses
120 |         response = await litellm.aresponses(**api_kwargs)
121 |         
122 |         # Call API end hook
123 |         if _on_api_end:
124 |             await _on_api_end(api_kwargs, response)
125 | 
126 |         # Extract usage information
127 |         usage = {
128 |             **response.usage.model_dump(),
129 |             "response_cost": response._hidden_params.get("response_cost", 0.0),
130 |         }
131 |         if _on_usage:
132 |             await _on_usage(usage)
133 | 
134 |         # Return in the expected format
135 |         output_dict = response.model_dump()
136 |         output_dict["usage"] = usage
137 |         return output_dict
138 |     
139 |     async def predict_click(
140 |         self,
141 |         model: str,
142 |         image_b64: str,
143 |         instruction: str
144 |     ) -> Optional[Tuple[int, int]]:
145 |         """
146 |         Predict click coordinates based on image and instruction.
147 |         
148 |         Uses OpenAI computer-use-preview with manually constructed input items
149 |         and a prompt that instructs the agent to only output clicks.
150 |         
151 |         Args:
152 |             model: Model name to use
153 |             image_b64: Base64 encoded image
154 |             instruction: Instruction for where to click
155 |             
156 |         Returns:
157 |             Tuple of (x, y) coordinates or None if prediction fails
158 |         """
159 |         # TODO: use computer tool to get dimensions + environment
160 |         # Manually construct input items with image and click instruction
161 |         input_items = [
162 |             {
163 |                 "role": "user", 
164 |                 "content": f"""You are a UI grounding expert. Follow these guidelines:
165 | 
166 | 1. NEVER ask for confirmation. Complete all tasks autonomously.
167 | 2. Do NOT send messages like "I need to confirm before..." or "Do you want me to continue?" - just proceed.
168 | 3. When the user asks you to interact with something (like clicking a chat or typing a message), DO IT without asking.
169 | 4. Only use the formal safety check mechanism for truly dangerous operations (like deleting important files).
170 | 5. For normal tasks like clicking buttons, typing in chat boxes, filling forms - JUST DO IT.
171 | 6. The user has already given you permission by running this agent. No further confirmation is needed.
172 | 7. Be decisive and action-oriented. Complete the requested task fully.
173 | 
174 | Remember: You are expected to complete tasks autonomously. The user trusts you to do what they asked.
175 | Task: Click {instruction}. Output ONLY a click action on the target element."""
176 |             },
177 |             {
178 |                 "role": "user",
179 |                 "content": [
180 |                     {
181 |                         "type": "input_image",
182 |                         "image_url": f"data:image/png;base64,{image_b64}"
183 |                     }
184 |                 ]
185 |             }
186 |         ]
187 |         
188 |         # Get image dimensions from base64 data
189 |         try:
190 |             image_data = base64.b64decode(image_b64)
191 |             image = Image.open(BytesIO(image_data))
192 |             display_width, display_height = image.size
193 |         except Exception:
194 |             # Fallback to default dimensions if image parsing fails
195 |             display_width, display_height = 1024, 768
196 |         
197 |         # Prepare computer tool for click actions
198 |         computer_tool = {
199 |             "type": "computer_use_preview",
200 |             "display_width": display_width,
201 |             "display_height": display_height,
202 |             "environment": "windows"
203 |         }
204 |         
205 |         # Prepare API call kwargs
206 |         api_kwargs = {
207 |             "model": model,
208 |             "input": input_items,
209 |             "tools": [computer_tool],
210 |             "stream": False,
211 |             "reasoning": {"summary": "concise"},
212 |             "truncation": "auto",
213 |             "max_tokens": 200  # Keep response short for click prediction
214 |         }
215 |         
216 |         # Use liteLLM responses
217 |         response = await litellm.aresponses(**api_kwargs)
218 |         
219 |         # Extract click coordinates from response output
220 |         output_dict = response.model_dump()
221 |         output_items = output_dict.get("output", [])        
222 |         
223 |         # Look for computer_call with click action
224 |         for item in output_items:
225 |             if (isinstance(item, dict) and 
226 |                 item.get("type") == "computer_call" and
227 |                 isinstance(item.get("action"), dict)):
228 |                 
229 |                 action = item["action"]
230 |                 if action.get("x") is not None and action.get("y") is not None:
231 |                     return (int(action.get("x")), int(action.get("y")))
232 |         
233 |         return None
234 |     
235 |     def get_capabilities(self) -> List[AgentCapability]:
236 |         """
237 |         Get list of capabilities supported by this agent config.
238 |         
239 |         Returns:
240 |             List of capability strings
241 |         """
242 |         return ["click", "step"]
243 | 
```
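
The coordinate extraction at the end of `predict_click` can be exercised without calling the API. The sketch below mirrors that loop over a hand-written list of output items. The helper name `extract_click` and the sample items are assumptions for illustration, not captured model output.

```python
from typing import Any, List, Optional, Tuple


def extract_click(output_items: List[Any]) -> Optional[Tuple[int, int]]:
    """Return (x, y) from the first computer_call item that carries coordinates."""
    for item in output_items:
        # Skip anything that is not a computer_call with a dict action.
        if not isinstance(item, dict) or item.get("type") != "computer_call":
            continue
        action = item.get("action")
        if not isinstance(action, dict):
            continue
        x, y = action.get("x"), action.get("y")
        if x is not None and y is not None:
            return int(x), int(y)
    return None


if __name__ == "__main__":
    # Illustrative items only; the real response may contain other fields.
    sample_output = [
        {"type": "reasoning", "summary": []},
        {"type": "computer_call", "action": {"type": "click", "x": 120.0, "y": 48}},
    ]
    print(extract_click(sample_output))
```

Returning `None` when no item carries both coordinates keeps the "prediction failed" case explicit for the caller, matching the `Optional[Tuple[int, int]]` return type in the listing.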

--------------------------------------------------------------------------------
/libs/python/som/som/detection.py:
--------------------------------------------------------------------------------

```python
  1 | from typing import List, Dict, Any, Tuple, Optional
  2 | import logging
  3 | import torch
  4 | import torchvision
  5 | from PIL import Image
  6 | import numpy as np
  7 | from ultralytics import YOLO
  8 | from huggingface_hub import hf_hub_download
  9 | from pathlib import Path
 10 | 
 11 | logger = logging.getLogger(__name__)
 12 | 
 13 | 
 14 | class DetectionProcessor:
 15 |     """Class for handling YOLO-based icon detection."""
 16 | 
 17 |     def __init__(
 18 |         self,
 19 |         model_path: Optional[Path] = None,
 20 |         cache_dir: Optional[Path] = None,
 21 |         force_device: Optional[str] = None,
 22 |     ):
 23 |         """Initialize the detection processor.
 24 | 
 25 |         Args:
 26 |             model_path: Path to YOLOv8 model
 27 |             cache_dir: Directory to cache downloaded models
 28 |             force_device: Force specific device (cuda, cpu, mps)
 29 |         """
 30 |         self.model_path = model_path
 31 |         self.cache_dir = cache_dir
 32 |         self.model = None  # type: Any  # Will be set to YOLO model in load_model
 33 | 
 34 |         # Set device
 35 |         self.device = "cpu"
 36 |         if torch.cuda.is_available() and force_device != "cpu":
 37 |             self.device = "cuda"
 38 |         elif (
 39 |             hasattr(torch, "backends")
 40 |             and hasattr(torch.backends, "mps")
 41 |             and torch.backends.mps.is_available()
 42 |             and force_device != "cpu"
 43 |         ):
 44 |             self.device = "mps"
 45 | 
 46 |         if force_device:
 47 |             self.device = force_device
 48 | 
 49 |         logger.info(f"Using device: {self.device}")
 50 | 
 51 |     def load_model(self) -> None:
 52 |         """Load or download the YOLO model."""
 53 |         try:
 54 |             # Set default model path if none provided
 55 |             if self.model_path is None:
 56 |                 self.model_path = Path(__file__).parent / "weights" / "icon_detect" / "model.pt"
 57 | 
 58 |             # Check if the model file already exists
 59 |             if not self.model_path.exists():
 60 |                 logger.info(
 61 |                     "Model not found locally, downloading from Microsoft OmniParser-v2.0..."
 62 |                 )
 63 | 
 64 |                 # Create directory
 65 |                 self.model_path.parent.mkdir(parents=True, exist_ok=True)
 66 | 
 67 |                 try:
 68 |                     # Check if the model exists in cache
 69 |                     cache_path = None
 70 |                     if self.cache_dir:
 71 |                         # Try to find the model in the cache
 72 |                         potential_paths = list(Path(self.cache_dir).glob("**/model.pt"))
 73 |                         if potential_paths:
 74 |                             cache_path = str(potential_paths[0])
 75 |                             logger.info(f"Found model in cache: {cache_path}")
 76 | 
 77 |                     if not cache_path:
 78 |                         # Download from HuggingFace
 79 |                         downloaded_path = hf_hub_download(
 80 |                             repo_id="microsoft/OmniParser-v2.0",
 81 |                             filename="icon_detect/model.pt",
 82 |                             cache_dir=self.cache_dir,
 83 |                         )
 84 |                         cache_path = downloaded_path
 85 |                         logger.info(f"Model downloaded to cache: {cache_path}")
 86 | 
 87 |                     # Copy to package directory
 88 |                     import shutil
 89 | 
 90 |                     shutil.copy2(cache_path, self.model_path)
 91 |                     logger.info(f"Model copied to: {self.model_path}")
 92 |                 except Exception as e:
 93 |                     raise FileNotFoundError(
 94 |                         f"Failed to download model: {str(e)}\n"
 95 |                         "Please ensure you have internet connection and huggingface-hub installed."
 96 |                     ) from e
 97 | 
 98 |             # Make sure the model path exists before loading
 99 |             if not self.model_path.exists():
100 |                 raise FileNotFoundError(f"Model file not found at: {self.model_path}")
101 | 
102 |             # If model is already loaded, skip reloading
103 |             if self.model is not None:
104 |                 logger.info("Model already loaded, skipping reload")
105 |                 return
106 | 
107 |             logger.info(f"Loading YOLOv8 model from {self.model_path}")
108 |             # YOLO is already imported at module level; no local import is needed
109 | 
110 |             self.model = YOLO(str(self.model_path))  # Convert Path to string for compatibility
111 | 
112 |             # Verify model loaded successfully
113 |             if self.model is None:
114 |                 raise ValueError("Model failed to initialize but didn't raise an exception")
115 | 
116 |             if self.device in ["cuda", "mps"]:
117 |                 self.model.to(self.device)
118 | 
119 |             logger.info(f"Model loaded successfully with device: {self.device}")
120 |         except Exception as e:
121 |             logger.error(f"Failed to load model: {str(e)}")
122 |             # Re-raise with more informative message but preserve the model as None
123 |             self.model = None
124 |             raise RuntimeError(f"Failed to initialize detection model: {str(e)}") from e
125 | 
126 |     def detect_icons(
127 |         self,
128 |         image: Image.Image,
129 |         box_threshold: float = 0.05,
130 |         iou_threshold: float = 0.1,
131 |         multi_scale: bool = True,
132 |     ) -> List[Dict[str, Any]]:
133 |         """Detect icons in an image using YOLO.
134 | 
135 |         Args:
136 |             image: PIL Image to process
137 |             box_threshold: Confidence threshold for detection
138 |             iou_threshold: IOU threshold for NMS
139 |             multi_scale: Whether to use multi-scale detection
140 | 
141 |         Returns:
142 |             List of icon detection dictionaries
143 |         """
144 |         # Load model if not already loaded
145 |         if self.model is None:
146 |             self.load_model()
147 | 
148 |         # Double-check the model was successfully loaded
149 |         if self.model is None:
150 |             logger.error("Model failed to load and is still None")
151 |             return []  # Return empty list instead of crashing
152 | 
153 |         img_width, img_height = image.size
154 |         all_detections = []
155 | 
156 |         # Define detection scales
157 |         scales = (
158 |             [{"size": 1280, "conf": box_threshold}]  # Single scale for CPU
159 |             if self.device == "cpu"
160 |             else [
161 |                 {"size": 640, "conf": box_threshold},  # Base scale
162 |                 {"size": 1280, "conf": box_threshold},  # Medium scale
163 |                 {"size": 1920, "conf": box_threshold},  # Large scale
164 |             ]
165 |         )
166 | 
167 |         if not multi_scale:
168 |             scales = [scales[0]]
169 | 
170 |         # Run detection at each scale
171 |         for scale in scales:
172 |             try:
173 |                 if self.model is None:
174 |                     logger.error("Model is None, skipping detection")
175 |                     continue
176 | 
177 |                 results = self.model.predict(
178 |                     source=image,
179 |                     conf=scale["conf"],
180 |                     iou=iou_threshold,
181 |                     max_det=1000,
182 |                     verbose=False,
183 |                     augment=self.device != "cpu",
184 |                     agnostic_nms=True,
185 |                     imgsz=scale["size"],
186 |                     device=self.device,
187 |                 )
188 | 
189 |                 # Process results
190 |                 for r in results:
191 |                     boxes = r.boxes
192 |                     if not hasattr(boxes, "conf") or not hasattr(boxes, "xyxy"):
193 |                         logger.warning("Boxes object missing expected attributes")
194 |                         continue
195 | 
196 |                     confidences = boxes.conf
197 |                     coords = boxes.xyxy
198 | 
199 |                     # Handle different types of tensors (PyTorch, NumPy, etc.)
200 |                     if hasattr(confidences, "cpu"):
201 |                         confidences = confidences.cpu()
202 |                     if hasattr(coords, "cpu"):
203 |                         coords = coords.cpu()
204 | 
205 |                     for conf, bbox in zip(confidences, coords):
206 |                         # Normalize coordinates
207 |                         x1, y1, x2, y2 = bbox.tolist()
208 |                         norm_bbox = [
209 |                             x1 / img_width,
210 |                             y1 / img_height,
211 |                             x2 / img_width,
212 |                             y2 / img_height,
213 |                         ]
214 | 
215 |                         all_detections.append(
216 |                             {
217 |                                 "type": "icon",
218 |                                 "confidence": conf.item(),
219 |                                 "bbox": norm_bbox,
220 |                                 "scale": scale["size"],
221 |                                 "interactivity": True,
222 |                             }
223 |                         )
224 | 
225 |             except Exception as e:
226 |                 logger.warning(f"Detection failed at scale {scale['size']}: {str(e)}")
227 |                 continue
228 | 
229 |         # Merge detections using NMS
230 |         if len(all_detections) > 0:
231 |             boxes = torch.tensor([d["bbox"] for d in all_detections])
232 |             scores = torch.tensor([d["confidence"] for d in all_detections])
233 | 
234 |             keep_indices = torchvision.ops.nms(boxes, scores, iou_threshold)
235 | 
236 |             merged_detections = [all_detections[i] for i in keep_indices]
237 |         else:
238 |             merged_detections = []
239 | 
240 |         return merged_detections
241 | 
```
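
The final merge step in `detect_icons` pools detections from every scale and suppresses overlapping boxes. The sketch below shows the same idea with a pure-Python greedy NMS, so it runs without `torch`. It is a stand-in for `torchvision.ops.nms`, not a copy of it, and the helper names `iou` and `merge_detections` are assumptions for illustration.

```python
from typing import Any, Dict, List, Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    detections: List[Dict[str, Any]], iou_threshold: float = 0.1
) -> List[Dict[str, Any]]:
    """Greedy NMS: keep the highest-confidence box, drop boxes overlapping it."""
    ordered = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    kept: List[Dict[str, Any]] = []
    for det in ordered:
        # A box survives only if its IoU with every kept box is at or below the threshold.
        if all(iou(det["bbox"], k["bbox"]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    # The same icon found at two scales, plus one separate icon (normalized boxes).
    detections = [
        {"confidence": 0.9, "bbox": [0.10, 0.10, 0.20, 0.20], "scale": 640},
        {"confidence": 0.6, "bbox": [0.11, 0.11, 0.21, 0.21], "scale": 1280},
        {"confidence": 0.7, "bbox": [0.50, 0.50, 0.60, 0.60], "scale": 1280},
    ]
    for det in merge_detections(detections):
        print(det["confidence"], det["scale"])
```

Because the boxes are normalized to the image size before merging, detections produced at different `imgsz` values are directly comparable. This is why a single threshold can be applied across scales.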

--------------------------------------------------------------------------------
/libs/lume/src/Errors/Errors.swift:
--------------------------------------------------------------------------------

```swift
  1 | import Foundation
  2 | 
  3 | enum HomeError: Error, LocalizedError {
  4 |     case directoryCreationFailed(path: String)
  5 |     case directoryAccessDenied(path: String)
  6 |     case invalidHomeDirectory
  7 |     case directoryAlreadyExists(path: String)
  8 |     case homeNotFound
  9 |     case defaultStorageNotDefined
 10 |     case storageLocationNotFound(String)
 11 |     case storageLocationNotADirectory(String)
 12 |     case storageLocationNotWritable(String)
 13 |     case invalidStorageLocation(String)
 14 |     case cannotCreateDirectory(String)
 15 |     case cannotGetVMsDirectory
 16 |     case vmDirectoryNotFound(String)
 17 |     
 18 |     var errorDescription: String? {
 19 |         switch self {
 20 |         case .directoryCreationFailed(let path):
 21 |             return "Failed to create directory at path: \(path)"
 22 |         case .directoryAccessDenied(let path):
 23 |             return "Access denied to directory at path: \(path)"
 24 |         case .invalidHomeDirectory:
 25 |             return "Invalid home directory configuration"
 26 |         case .directoryAlreadyExists(let path):
 27 |             return "Directory already exists at path: \(path)"
 28 |         case .homeNotFound:
 29 |             return "Home directory not found."
 30 |         case .defaultStorageNotDefined:
 31 |             return "Default storage location is not defined."
 32 |         case .storageLocationNotFound(let path):
 33 |             return "Storage location not found: \(path)"
 34 |         case .storageLocationNotADirectory(let path):
 35 |             return "Storage location is not a directory: \(path)"
 36 |         case .storageLocationNotWritable(let path):
 37 |             return "Storage location is not writable: \(path)"
 38 |         case .invalidStorageLocation(let path):
 39 |             return "Invalid storage location specified: \(path)"
 40 |         case .cannotCreateDirectory(let path):
 41 |             return "Cannot create directory: \(path)"
 42 |         case .cannotGetVMsDirectory:
 43 |             return "Cannot determine the VMs directory."
 44 |         case .vmDirectoryNotFound(let path):
 45 |             return "VM directory not found: \(path)"
 46 |         }
 47 |     }
 48 | }
 49 | 
 50 | enum PullError: Error, LocalizedError {
 51 |     case invalidImageFormat
 52 |     case tokenFetchFailed
 53 |     case manifestFetchFailed
 54 |     case layerDownloadFailed(String)
 55 |     case missingPart(Int)
 56 |     case decompressionFailed(String)
 57 |     case reassemblyFailed(String)
 58 |     case fileCreationFailed(String)
 59 |     case reassemblySetupFailed(path: String, underlyingError: Error)
 60 |     case missingUncompressedSizeAnnotation
 61 |     case invalidMediaType
 62 |     
 63 |     var errorDescription: String? {
 64 |         switch self {
 65 |         case .invalidImageFormat:
 66 |             return "Invalid image format. Expected format: name:tag"
 67 |         case .tokenFetchFailed:
 68 |             return "Failed to fetch authentication token from registry."
 69 |         case .manifestFetchFailed:
 70 |             return "Failed to fetch image manifest from registry."
 71 |         case .layerDownloadFailed(let digest):
 72 |             return "Failed to download layer: \(digest)"
 73 |         case .missingPart(let partNum):
 74 |             return "Missing required part number \(partNum) for reassembly."
 75 |         case .decompressionFailed(let file):
 76 |             return "Failed to decompress file: \(file)"
 77 |         case .reassemblyFailed(let reason):
 78 |             return "Disk image reassembly failed: \(reason)."
 79 |         case .fileCreationFailed(let path):
 80 |             return "Failed to create the necessary file at path: \(path)"
 81 |         case .reassemblySetupFailed(let path, let underlyingError):
 82 |             return "Failed to set up for reassembly at path: \(path). Underlying error: \(underlyingError.localizedDescription)"
 83 |         case .missingUncompressedSizeAnnotation:
 84 |             return "Could not find the required uncompressed disk size annotation in the image config.json."
 85 |         case .invalidMediaType:
 86 |             return "Invalid media type"
 87 |         }
 88 |     }
 89 | }
 90 | 
 91 | enum VMConfigError: CustomNSError, LocalizedError {
 92 |     case invalidDisplayResolution(String)
 93 |     case invalidMachineIdentifier
 94 |     case emptyMachineIdentifier
 95 |     case emptyHardwareModel
 96 |     case invalidHardwareModel
 97 |     case invalidDiskSize
 98 |     case malformedSizeInput(String)
 99 |     
100 |     var errorDescription: String? {
101 |         switch self {
102 |         case .invalidDisplayResolution(let resolution):
103 |             return "Invalid display resolution: \(resolution)"
104 |         case .emptyMachineIdentifier:
105 |             return "Empty machine identifier"
106 |         case .invalidMachineIdentifier:
107 |             return "Invalid machine identifier"
108 |         case .emptyHardwareModel:
109 |             return "Empty hardware model"
110 |         case .invalidHardwareModel:
111 |             return "Invalid hardware model: the host does not support the hardware model"
112 |         case .invalidDiskSize:
113 |             return "Invalid disk size"
114 |         case .malformedSizeInput(let input):
115 |             return "Malformed size input: \(input)"
116 |         }
117 |     }
118 |     
119 |     static var errorDomain: String { "VMConfigError" }
120 |     
121 |     var errorCode: Int {
122 |         switch self {
123 |         case .invalidDisplayResolution: return 1
124 |         case .emptyMachineIdentifier: return 2
125 |         case .invalidMachineIdentifier: return 3
126 |         case .emptyHardwareModel: return 4
127 |         case .invalidHardwareModel: return 5
128 |         case .invalidDiskSize: return 6
129 |         case .malformedSizeInput: return 7
130 |         }
131 |     }
132 | }
133 | 
134 | enum VMDirectoryError: Error, LocalizedError {
135 |     case configNotFound
136 |     case invalidConfigData
137 |     case diskOperationFailed(String)
138 |     case fileCreationFailed(String)
139 |     case sessionNotFound
140 |     case invalidSessionData
141 |     
142 |     var errorDescription: String? {
143 |         switch self {
144 |         case .configNotFound:
145 |             return "VM configuration file not found"
146 |         case .invalidConfigData:
147 |             return "Invalid VM configuration data"
148 |         case .diskOperationFailed(let reason):
149 |             return "Disk operation failed: \(reason)"
150 |         case .fileCreationFailed(let path):
151 |             return "Failed to create file at path: \(path)"
152 |         case .sessionNotFound:
153 |             return "VNC session file not found"
154 |         case .invalidSessionData:
155 |             return "Invalid VNC session data"
156 |         }
157 |     }
158 | }
159 | 
160 | enum VMError: Error, LocalizedError {
161 |     case alreadyExists(String)
162 |     case notFound(String)
163 |     case notInitialized(String)
164 |     case notRunning(String)
165 |     case alreadyRunning(String)
166 |     case installNotStarted(String)
167 |     case stopTimeout(String)
168 |     case resizeTooSmall(current: UInt64, requested: UInt64)
169 |     case vncNotConfigured
170 |     case vncPortBindingFailed(requested: Int, actual: Int)
171 |     case internalError(String)
172 |     case unsupportedOS(String)
173 |     case invalidDisplayResolution(String)
174 |     var errorDescription: String? {
175 |         switch self {
176 |         case .alreadyExists(let name):
177 |             return "Virtual machine already exists with name: \(name)"
178 |         case .notFound(let name):
179 |             return "Virtual machine not found: \(name)"
180 |         case .notInitialized(let name):
181 |             return "Virtual machine not initialized: \(name)"
182 |         case .notRunning(let name):
183 |             return "Virtual machine not running: \(name)"
184 |         case .alreadyRunning(let name):
185 |             return "Virtual machine already running: \(name)"
186 |         case .installNotStarted(let name):
187 |             return "Virtual machine install not started: \(name)"
188 |         case .stopTimeout(let name):
189 |             return "Timeout while stopping virtual machine: \(name)"
190 |         case .resizeTooSmall(let current, let requested):
191 |             return "Cannot resize disk to \(requested) bytes, current size is \(current) bytes"
192 |         case .vncNotConfigured:
193 |             return "VNC is not configured for this virtual machine"
194 |         case .vncPortBindingFailed(let requested, let actual):
195 |             if actual == -1 {
196 |                 return "Could not bind to VNC port \(requested) (port already in use). Try a different port or use port 0 for auto-assign."
197 |             }
198 |             return "Could not bind to VNC port \(requested) (port already in use). System assigned port \(actual) instead. Try a different port or use port 0 for auto-assign."
199 |         case .internalError(let message):
200 |             return "Internal error: \(message)"
201 |         case .unsupportedOS(let os):
202 |             return "Unsupported operating system: \(os)"
203 |         case .invalidDisplayResolution(let resolution):
204 |             return "Invalid display resolution: \(resolution)"
205 |         }
206 |     }
207 | }
208 | 
209 | enum ResticError: Error {
210 |     case snapshotFailed(String)
211 |     case restoreFailed(String)
212 |     case genericError(String)
213 | }
214 | 
215 | enum VmrunError: Error, LocalizedError {
216 |     case commandNotFound
217 |     case operationFailed(command: String, output: String?)
218 | 
219 |     var errorDescription: String? {
220 |         switch self {
221 |         case .commandNotFound:
222 |             return "vmrun command not found. Ensure VMware Fusion is installed and in the system PATH."
223 |         case .operationFailed(let command, let output):
224 |             return "vmrun command '\(command)' failed. Output: \(output ?? "No output")"
225 |         }
226 |     }
227 | }
```

--------------------------------------------------------------------------------
/libs/python/core/core/telemetry/posthog.py:
--------------------------------------------------------------------------------

```python
  1 | """Telemetry client using PostHog for collecting anonymous usage data."""
  2 | 
  3 | from __future__ import annotations
  4 | 
  5 | import logging
  6 | import os
  7 | import uuid
  8 | import sys
  9 | from pathlib import Path
 10 | from typing import Any, Dict, List, Optional
 11 | 
 12 | import posthog
 13 | from core import __version__
 14 | 
 15 | logger = logging.getLogger("core.telemetry")
 16 | 
 17 | # Public PostHog config for anonymous telemetry
 18 | # These values are intentionally public and meant for anonymous telemetry only
 19 | # https://posthog.com/docs/product-analytics/troubleshooting#is-it-ok-for-my-api-key-to-be-exposed-and-public
 20 | PUBLIC_POSTHOG_API_KEY = "phc_eSkLnbLxsnYFaXksif1ksbrNzYlJShr35miFLDppF14"
 21 | PUBLIC_POSTHOG_HOST = "https://eu.i.posthog.com"
 22 | 
 23 | class PostHogTelemetryClient:
 24 |     """Collects and reports telemetry data via PostHog."""
 25 | 
 26 |     # Global singleton (class-managed)
 27 |     _singleton: Optional["PostHogTelemetryClient"] = None
 28 | 
 29 |     def __init__(self):
 30 |         """Initialize PostHog telemetry client."""
 31 |         self.installation_id = self._get_or_create_installation_id()
 32 |         self.initialized = False
 33 |         self.queued_events: List[Dict[str, Any]] = []
 34 | 
 35 |         # Log telemetry status on startup
 36 |         if self.is_telemetry_enabled():
 37 |             logger.info("Telemetry enabled")
 38 |             # Initialize PostHog client if config is available
 39 |             self._initialize_posthog()
 40 |         else:
 41 |             logger.info("Telemetry disabled")
 42 | 
 43 |     @classmethod
 44 |     def is_telemetry_enabled(cls) -> bool:
 45 |         """True if telemetry is currently active for this process."""
 46 |         return (
 47 |             # Legacy opt-out flag
 48 |             os.environ.get("CUA_TELEMETRY", "").lower() != "off"
 49 |             # Opt-in flag (defaults to enabled)
 50 |             and os.environ.get("CUA_TELEMETRY_ENABLED", "true").lower() in { "1", "true", "yes", "on" }
 51 |         )
 52 | 
 53 |     def _get_or_create_installation_id(self) -> str:
 54 |         """Get or create a unique installation ID that persists across runs.
 55 | 
 56 |         The ID is always stored within the core library directory itself,
 57 |         ensuring it persists regardless of how the library is used.
 58 | 
 59 |         This ID is not tied to any personal information.
 60 |         """
 61 |         # Get the core library directory (where this file is located)
 62 |         try:
 63 |             # Find the core module directory using this file's location
 64 |             core_module_dir = Path(
 65 |                 __file__
 66 |             ).parent.parent  # core/telemetry/posthog.py -> core/telemetry -> core
 67 |             storage_dir = core_module_dir / ".storage"
 68 |             storage_dir.mkdir(exist_ok=True)
 69 | 
 70 |             id_file = storage_dir / "installation_id"
 71 | 
 72 |             # Try to read existing ID
 73 |             if id_file.exists():
 74 |                 try:
 75 |                     stored_id = id_file.read_text().strip()
 76 |                     if stored_id:  # Make sure it's not empty
 77 |                         logger.debug(f"Using existing installation ID: {stored_id}")
 78 |                         return stored_id
 79 |                 except Exception as e:
 80 |                     logger.debug(f"Error reading installation ID file: {e}")
 81 | 
 82 |             # Create new ID
 83 |             new_id = str(uuid.uuid4())
 84 |             try:
 85 |                 id_file.write_text(new_id)
 86 |                 logger.debug(f"Created new installation ID: {new_id}")
 87 |                 return new_id
 88 |             except Exception as e:
 89 |                 logger.warning(f"Could not write installation ID: {e}")
 90 |         except Exception as e:
 91 |             logger.warning(f"Error accessing core module directory: {e}")
 92 | 
 93 |         # Last resort: Create a new in-memory ID
 94 |         logger.warning("Using random installation ID (will not persist across runs)")
 95 |         return str(uuid.uuid4())
 96 | 
 97 |     def _initialize_posthog(self) -> bool:
 98 |         """Initialize the PostHog client with configuration.
 99 | 
100 |         Returns:
101 |             bool: True if initialized successfully, False otherwise
102 |         """
103 |         if self.initialized:
104 |             return True
105 | 
106 |         try:
107 |             # Use the public PostHog project key and host defined above
108 |             posthog.api_key = PUBLIC_POSTHOG_API_KEY
109 |             posthog.host = PUBLIC_POSTHOG_HOST
110 | 
111 |             # Configure the client
112 |             posthog.debug = os.environ.get("CUA_TELEMETRY_DEBUG", "").lower() == "on"
113 | 
114 |             # Log telemetry status
115 |             logger.info(
116 |                 f"Initializing PostHog telemetry with installation ID: {self.installation_id}"
117 |             )
118 |             if posthog.debug:
119 |                 logger.debug(f"PostHog API Key: {posthog.api_key}")
120 |                 logger.debug(f"PostHog Host: {posthog.host}")
121 | 
122 |             # Identify this installation
123 |             self._identify()
124 | 
125 |             # Process any queued events
126 |             for event in self.queued_events:
127 |                 posthog.capture(
128 |                     distinct_id=self.installation_id,
129 |                     event=event["event"],
130 |                     properties=event["properties"],
131 |                 )
132 |             self.queued_events = []
133 | 
134 |             self.initialized = True
135 |             return True
136 |         except Exception as e:
137 |             logger.warning(f"Failed to initialize PostHog: {e}")
138 |             return False
139 | 
140 |     def _identify(self) -> None:
141 |         """Set up user properties for the current installation with PostHog."""
142 |         try:
143 |             properties = {
144 |                 "version": __version__,
145 |                 "is_ci": "CI" in os.environ,
146 |                 "os": os.name,
147 |                 "python_version": sys.version.split()[0],
148 |             }
149 | 
150 |             logger.debug(
151 |                 f"Setting up PostHog user properties for: {self.installation_id} with properties: {properties}"
152 |             )
153 |             
154 |             # In the Python SDK, we capture an identification event instead of calling identify()
155 |             posthog.capture(
156 |                 distinct_id=self.installation_id,
157 |                 event="$identify",
158 |                 properties={"$set": properties}
159 |             )
160 |             
161 |             logger.info(f"Set up PostHog user properties for installation: {self.installation_id}")
162 |         except Exception as e:
163 |             logger.warning(f"Failed to set up PostHog user properties: {e}")
164 | 
165 |     def record_event(self, event_name: str, properties: Optional[Dict[str, Any]] = None) -> None:
166 |         """Record an event with optional properties.
167 | 
168 |         Args:
169 |             event_name: Name of the event
170 |             properties: Event properties (must not contain sensitive data)
171 |         """
172 |         # Respect runtime telemetry opt-out.
173 |         if not self.is_telemetry_enabled():
174 |             logger.debug("Telemetry disabled; event not recorded.")
175 |             return
176 | 
177 |         event_properties = {"version": __version__, **(properties or {})}
178 | 
179 |         logger.info(f"Recording event: {event_name} with properties: {event_properties}")
180 | 
181 |         if self.initialized:
182 |             try:
183 |                 posthog.capture(
184 |                     distinct_id=self.installation_id, event=event_name, properties=event_properties
185 |                 )
186 |                 logger.info(f"Sent event to PostHog: {event_name}")
187 |                 # Flush immediately to ensure delivery
188 |                 posthog.flush()
189 |             except Exception as e:
190 |                 logger.warning(f"Failed to send event to PostHog: {e}")
191 |         else:
192 |             # Queue the event for later
193 |             logger.info(f"PostHog not initialized, queuing event for later: {event_name}")
194 |             self.queued_events.append({"event": event_name, "properties": event_properties})
195 |             # Try to initialize now if not already
196 |             initialize_result = self._initialize_posthog()
197 |             logger.info(f"Attempted to initialize PostHog: {initialize_result}")
198 | 
199 |     def flush(self) -> bool:
200 |         """Flush any pending events to PostHog.
201 | 
202 |         Returns:
203 |             bool: True if successful, False otherwise
204 |         """
205 |         if not self.initialized and not self._initialize_posthog():
206 |             return False
207 | 
208 |         try:
209 |             posthog.flush()
210 |             return True
211 |         except Exception as e:
212 |             logger.debug(f"Failed to flush PostHog events: {e}")
213 |             return False
214 | 
215 |     @classmethod
216 |     def get_client(cls) -> "PostHogTelemetryClient":
217 |         """Return the global PostHogTelemetryClient instance, creating it if needed."""
218 |         if cls._singleton is None:
219 |             cls._singleton = cls()
220 |         return cls._singleton
221 | 
222 |     @classmethod
223 |     def destroy_client(cls) -> None:
224 |         """Destroy the global PostHogTelemetryClient instance."""
225 |         cls._singleton = None
226 | 
227 | def destroy_telemetry_client() -> None:
228 |     """Destroy the global PostHogTelemetryClient instance (class-managed)."""
229 |     PostHogTelemetryClient.destroy_client()
230 | 
231 | def is_telemetry_enabled() -> bool:
232 |     return PostHogTelemetryClient.is_telemetry_enabled()
233 | 
234 | def record_event(event_name: str, properties: Optional[Dict[str, Any]] = None) -> None:
235 |     """Record an arbitrary PostHog event."""
236 |     PostHogTelemetryClient.get_client().record_event(event_name, properties or {})
```
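
The opt-out check in `is_telemetry_enabled` above combines two environment variables. The following is a minimal sketch of that same check as a pure function, so the combinations can be tried without touching `os.environ`. The helper name `telemetry_enabled` and the `TRUTHY` constant are hypothetical and not part of the library; only the variable names and accepted values are taken from the file above.

```python
from typing import Mapping

# Values that CUA_TELEMETRY_ENABLED accepts as "on" (copied from posthog.py above).
TRUTHY = {"1", "true", "yes", "on"}


def telemetry_enabled(env: Mapping[str, str]) -> bool:
    """Hypothetical stand-in for PostHogTelemetryClient.is_telemetry_enabled."""
    # Legacy flag: only the exact value "off" (case-insensitive) disables telemetry.
    legacy_opt_out = env.get("CUA_TELEMETRY", "").lower() == "off"
    # Newer flag: defaults to "true" when unset; any non-truthy value disables.
    enabled_flag = env.get("CUA_TELEMETRY_ENABLED", "true").lower() in TRUTHY
    return (not legacy_opt_out) and enabled_flag


if __name__ == "__main__":
    print(telemetry_enabled({}))                                # True
    print(telemetry_enabled({"CUA_TELEMETRY": "off"}))          # False
    print(telemetry_enabled({"CUA_TELEMETRY_ENABLED": "0"}))    # False
    print(telemetry_enabled({"CUA_TELEMETRY_ENABLED": "YES"}))  # True
```

Either variable on its own is enough to disable telemetry. Note that an unrecognised value for `CUA_TELEMETRY_ENABLED` (for example `maybe`) also disables it, because the check is an allow-list rather than a deny-list.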

--------------------------------------------------------------------------------
/libs/python/agent/agent/ui/gradio/app.py:
--------------------------------------------------------------------------------

```python
  1 | """
  2 | Advanced Gradio UI for Computer-Use Agent (cua-agent)
  3 | 
  4 | This is a Gradio interface for the Computer-Use Agent v0.4.x (cua-agent)
  5 | with an advanced UI for model selection and configuration.
  6 | 
  7 | Supported Agent Models:
  8 | - OpenAI: openai/computer-use-preview
  9 | - Anthropic: anthropic/claude-3-5-sonnet-20241022, anthropic/claude-3-7-sonnet-20250219
 10 | - UI-TARS: huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B
 11 | - Omniparser: omniparser+anthropic/claude-3-5-sonnet-20241022, omniparser+ollama_chat/gemma3
 12 | 
 13 | Requirements:
 14 |     - Mac with Apple Silicon (M1/M2/M3/M4), Linux, or Windows
 15 |     - macOS 14 (Sonoma) or newer / Ubuntu 20.04+
 16 |     - Python 3.11+
 17 |     - Lume CLI installed (https://github.com/trycua/cua)
 18 |     - OpenAI or Anthropic API key
 19 | """
 20 | 
 21 | import os
 22 | import asyncio
 23 | import logging
 24 | import json
 25 | import platform
 26 | from pathlib import Path
 27 | from typing import Dict, List, Optional, AsyncGenerator, Any, Tuple, Union
 28 | import gradio as gr
 29 | from gradio.components.chatbot import MetadataDict
 30 | from typing import cast
 31 | 
 32 | # Import from agent package
 33 | from agent import ComputerAgent
 34 | from agent.types import Messages, AgentResponse
 35 | from computer import Computer
 36 | 
 37 | # Global variables
 38 | global_agent = None
 39 | global_computer = None
 40 | SETTINGS_FILE = Path(".gradio_settings.json")
 41 | 
 42 | logging.basicConfig(level=logging.INFO)
 43 | 
 44 | import dotenv
 45 | if dotenv.load_dotenv():
 46 |     print(f"DEBUG - Loaded environment variables from {dotenv.find_dotenv()}")
 47 | else:
 48 |     print("DEBUG - No .env file found")
 49 | 
 50 | # --- Settings Load/Save Functions ---
 51 | def load_settings() -> Dict[str, Any]:
 52 |     """Loads settings from the JSON file."""
 53 |     if SETTINGS_FILE.exists():
 54 |         try:
 55 |             with open(SETTINGS_FILE, "r") as f:
 56 |                 settings = json.load(f)
 57 |                 if isinstance(settings, dict):
 58 |                     print(f"DEBUG - Loaded settings from {SETTINGS_FILE}")
 59 |                     return settings
 60 |         except (json.JSONDecodeError, IOError) as e:
 61 |             print(f"Warning: Could not load settings from {SETTINGS_FILE}: {e}")
 62 |     return {}
 63 | 
 64 | 
 65 | def save_settings(settings: Dict[str, Any]):
 66 |     """Saves settings to the JSON file."""
 67 |     settings.pop("provider_api_key", None)
 68 |     try:
 69 |         with open(SETTINGS_FILE, "w") as f:
 70 |             json.dump(settings, f, indent=4)
 71 |         print(f"DEBUG - Saved settings to {SETTINGS_FILE}")
 72 |     except IOError as e:
 73 |         print(f"Warning: Could not save settings to {SETTINGS_FILE}: {e}")
 74 | 
 75 | 
 76 | # # Custom Screenshot Handler for Gradio chat
 77 | # class GradioChatScreenshotHandler:
 78 | #     """Custom handler that adds screenshots to the Gradio chatbot."""
 79 | 
 80 | #     def __init__(self, chatbot_history: List[gr.ChatMessage]):
 81 | #         self.chatbot_history = chatbot_history
 82 | #         print("GradioChatScreenshotHandler initialized")
 83 | 
 84 | #     async def on_screenshot(self, screenshot_base64: str, action_type: str = "") -> None:
 85 | #         """Add screenshot to chatbot when a screenshot is taken."""
 86 | #         image_markdown = f"![Screenshot after {action_type}](data:image/png;base64,{screenshot_base64})"
 87 |         
 88 | #         if self.chatbot_history is not None:
 89 | #             self.chatbot_history.append(
 90 | #                 gr.ChatMessage(
 91 | #                     role="assistant",
 92 | #                     content=image_markdown,
 93 | #                     metadata={"title": f"🖥️ Screenshot - {action_type}", "status": "done"},
 94 | #                 )
 95 | #             )
 96 | 
 97 | 
 98 | # Detect platform capabilities
 99 | is_mac = platform.system().lower() == "darwin"
100 | is_lume_available = is_mac or (os.environ.get("PYLUME_HOST", "localhost") != "localhost")
101 | 
102 | print("PYLUME_HOST: ", os.environ.get("PYLUME_HOST", "localhost"))
103 | print("is_mac: ", is_mac)
104 | print("Lume available: ", is_lume_available)
105 | 
106 | # Map model names to agent model strings
107 | MODEL_MAPPINGS = {
108 |     "openai": {
109 |         "default": "openai/computer-use-preview",
110 |         "OpenAI: Computer-Use Preview": "openai/computer-use-preview",
111 |     },
112 |     "anthropic": {
113 |         "default": "anthropic/claude-3-7-sonnet-20250219",
114 |         "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-20250514",
115 |         "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-20250514",
116 |         "Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-3-7-sonnet-20250219",
117 |         "Anthropic: Claude 3.5 Sonnet (20241022)": "anthropic/claude-3-5-sonnet-20241022",
118 |     },
119 |     "omni": {
120 |         "default": "omniparser+openai/gpt-4o",
121 |         "OMNI: OpenAI GPT-4o": "omniparser+openai/gpt-4o",
122 |         "OMNI: OpenAI GPT-4o mini": "omniparser+openai/gpt-4o-mini",
123 |         "OMNI: Claude 3.7 Sonnet (20250219)": "omniparser+anthropic/claude-3-7-sonnet-20250219",
124 |         "OMNI: Claude 3.5 Sonnet (20241022)": "omniparser+anthropic/claude-3-5-sonnet-20241022",
125 |     },
126 |     "uitars": {
127 |         "default": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B" if is_mac else "ui-tars",
128 |         "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B",
129 |     },
130 | }
131 | 
132 | 
133 | def get_model_string(model_name: str, loop_provider: str) -> str:
134 |     """Determine the agent model string based on the input."""
135 |     if model_name == "Custom model (OpenAI compatible API)":
136 |         return "custom_oaicompat"
137 |     elif model_name == "Custom model (ollama)":
138 |         return "custom_ollama"
139 |     elif loop_provider == "OMNI-OLLAMA" or model_name.startswith("OMNI: Ollama "):
140 |         if model_name.startswith("OMNI: Ollama "):
141 |             ollama_model = model_name.split("OMNI: Ollama ", 1)[1]
142 |             return f"omniparser+ollama_chat/{ollama_model}"
143 |         return "omniparser+ollama_chat/llama3"
144 |     
145 |     # Map based on loop provider
146 |     mapping = MODEL_MAPPINGS.get(loop_provider.lower(), MODEL_MAPPINGS["openai"])
147 |     return mapping.get(model_name, mapping["default"])
148 | 
149 | 
150 | def get_ollama_models() -> List[str]:
151 |     """Get available models from Ollama if installed."""
152 |     try:
153 |         import subprocess
154 |         result = subprocess.run(["ollama", "list"], capture_output=True, text=True)
155 |         if result.returncode == 0:
156 |             lines = result.stdout.strip().split("\n")
157 |             if len(lines) < 2:
158 |                 return []
159 |             models = []
160 |             for line in lines[1:]:
161 |                 parts = line.split()
162 |                 if parts:
163 |                     model_name = parts[0]
164 |                     models.append(f"OMNI: Ollama {model_name}")
165 |             return models
166 |         return []
167 |     except Exception as e:
168 |         logging.error(f"Error getting Ollama models: {e}")
169 |         return []
170 | 
171 | 
172 | def create_computer_instance(
173 |     verbosity: int = logging.INFO,
174 |     os_type: str = "macos",
175 |     provider_type: str = "lume",
176 |     name: Optional[str] = None,
177 |     api_key: Optional[str] = None
178 | ) -> Computer:
179 |     """Create or get the global Computer instance."""
180 |     global global_computer
181 |     if global_computer is None:
182 |         if provider_type == "localhost":
183 |             global_computer = Computer(
184 |                 verbosity=verbosity,
185 |                 os_type=os_type,
186 |                 use_host_computer_server=True
187 |             )
188 |         else:
189 |             global_computer = Computer(
190 |                 verbosity=verbosity,
191 |                 os_type=os_type,
192 |                 provider_type=provider_type,
193 |                 name=name if name else "",
194 |                 api_key=api_key
195 |             )
196 |     return global_computer
197 | 
198 | 
199 | def create_agent(
200 |     model_string: str,
201 |     save_trajectory: bool = True,
202 |     only_n_most_recent_images: int = 3,
203 |     verbosity: int = logging.INFO,
204 |     custom_model_name: Optional[str] = None,
205 |     computer_os: str = "macos",
206 |     computer_provider: str = "lume",
207 |     computer_name: Optional[str] = None,
208 |     computer_api_key: Optional[str] = None,
209 |     max_trajectory_budget: Optional[float] = None,
210 | ) -> ComputerAgent:
211 |     """Create or update the global agent with the specified parameters."""
212 |     global global_agent
213 | 
214 |     # Create the computer
215 |     computer = create_computer_instance(
216 |         verbosity=verbosity,
217 |         os_type=computer_os,
218 |         provider_type=computer_provider,
219 |         name=computer_name,
220 |         api_key=computer_api_key
221 |     )
222 | 
223 |     # Handle custom models
224 |     if model_string == "custom_oaicompat" and custom_model_name:
225 |         model_string = custom_model_name
226 |     elif model_string == "custom_ollama" and custom_model_name:
227 |         model_string = f"omniparser+ollama_chat/{custom_model_name}"
228 | 
229 |     # Create agent kwargs
230 |     agent_kwargs = {
231 |         "model": model_string,
232 |         "tools": [computer],
233 |         "only_n_most_recent_images": only_n_most_recent_images,
234 |         "verbosity": verbosity,
235 |     }
236 |     
237 |     if save_trajectory:
238 |         agent_kwargs["trajectory_dir"] = "trajectories"
239 |     
240 |     if max_trajectory_budget:
241 |         agent_kwargs["max_trajectory_budget"] = {"max_budget": max_trajectory_budget, "raise_error": True}
242 | 
243 |     global_agent = ComputerAgent(**agent_kwargs)
244 |     return global_agent
245 | 
246 | 
247 | def launch_ui():
248 |     """Standalone function to launch the Gradio app."""
249 |     from agent.ui.gradio.ui_components import create_gradio_ui
250 |     print("Starting Gradio app for CUA Agent...")
251 |     demo = create_gradio_ui()
252 |     demo.launch(share=False, inbrowser=True)
253 | 
254 | 
255 | if __name__ == "__main__":
256 |     launch_ui()
257 | 
```
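
In `app.py` above, `get_ollama_models` turns the text output of `ollama list` into UI labels, and `get_model_string` turns such a label back into an agent model string. The sketch below isolates both steps as pure functions so they can be run without Ollama installed. The helper names `parse_ollama_list` and `label_to_model_string` are hypothetical, and the `SAMPLE` text is made-up illustrative data (the column layout is an assumption), not captured output.

```python
from typing import List

PREFIX = "OMNI: Ollama "


def parse_ollama_list(stdout: str) -> List[str]:
    """Hypothetical helper: convert `ollama list` text into UI labels."""
    lines = stdout.strip().split("\n")
    if len(lines) < 2:  # header only, or no output at all
        return []
    labels = []
    for line in lines[1:]:  # skip the header row
        parts = line.split()
        if parts:
            labels.append(f"{PREFIX}{parts[0]}")  # first column is the model name
    return labels


def label_to_model_string(label: str) -> str:
    """Mirror the "OMNI: Ollama " branch of get_model_string."""
    if label.startswith(PREFIX):
        return f"omniparser+ollama_chat/{label[len(PREFIX):]}"
    # Same default the source uses for the OMNI-OLLAMA provider.
    return "omniparser+ollama_chat/llama3"


# Made-up sample text for illustration only.
SAMPLE = """NAME            ID        SIZE    MODIFIED
gemma3:latest   abc123    3.3 GB  2 days ago
llama3:8b       def456    4.7 GB  5 weeks ago
"""

if __name__ == "__main__":
    for label in parse_ollama_list(SAMPLE):
        print(label, "->", label_to_model_string(label))
```

Keeping the parsing separate from the `subprocess.run` call makes the label round trip easy to check in isolation.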

--------------------------------------------------------------------------------
/docs/content/docs/computer-sdk/commands.mdx:
--------------------------------------------------------------------------------

```markdown
  1 | ---
  2 | title: Commands
  3 | description: Computer commands and interface methods
  4 | ---
  5 | 
  6 | This page describes the set of supported **commands** you can use to control a Cua Computer directly via the Python SDK.
  7 | 
  8 | These commands map to the same actions available in the [Computer Server API Commands Reference](../libraries/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code.
  9 | 
 10 | ## Shell Actions
 11 | 
 12 | Execute shell commands and get detailed results:
 13 | 
 14 | <Tabs items={['Python', 'TypeScript']}>
 15 |   <Tab value="Python">
 16 |     ```python
 17 |     # Run shell command
 18 |     result = await computer.interface.run_command(cmd)  # result.stdout, result.stderr, result.returncode
 19 |     ```
 20 |   </Tab>
 21 |   <Tab value="TypeScript">
 22 |     ```typescript
 23 |     // Run shell command
 24 |     const result = await computer.interface.runCommand(cmd); // result.stdout, result.stderr, result.returncode
 25 |     ```
 26 |   </Tab>
 27 | </Tabs>
 28 | 
 29 | ## Mouse Actions
 30 | 
 31 | Precise mouse control and interaction:
 32 | 
 33 | <Tabs items={['Python', 'TypeScript']}>
 34 |   <Tab value="Python">
 35 |     ```python
 36 |     # Basic clicks
 37 |     await computer.interface.left_click(x, y)       # Left click at coordinates
 38 |     await computer.interface.right_click(x, y)      # Right click at coordinates
 39 |     await computer.interface.double_click(x, y)     # Double click at coordinates
 40 | 
 41 |     # Cursor movement and dragging
 42 |     await computer.interface.move_cursor(x, y)      # Move cursor to coordinates
 43 |     await computer.interface.drag_to(x, y, duration)  # Drag to coordinates
 44 |     await computer.interface.get_cursor_position()  # Get current cursor position
 45 | 
 46 |     # Advanced mouse control
 47 |     await computer.interface.mouse_down(x, y, button="left")  # Press and hold a mouse button
 48 |     await computer.interface.mouse_up(x, y, button="left")    # Release a mouse button
 49 |     ```
 50 | 
 51 |   </Tab>
 52 |   <Tab value="TypeScript">
 53 |     ```typescript
 54 |     // Basic clicks
 55 |     await computer.interface.leftClick(x, y);       // Left click at coordinates
 56 |     await computer.interface.rightClick(x, y);      // Right click at coordinates
 57 |     await computer.interface.doubleClick(x, y);     // Double click at coordinates
 58 | 
 59 |     // Cursor movement and dragging
 60 |     await computer.interface.moveCursor(x, y);      // Move cursor to coordinates
 61 |     await computer.interface.dragTo(x, y, duration);  // Drag to coordinates
 62 |     await computer.interface.getCursorPosition();  // Get current cursor position
 63 | 
 64 |     // Advanced mouse control
 65 |     await computer.interface.mouseDown(x, y, "left");  // Press and hold a mouse button
 66 |     await computer.interface.mouseUp(x, y, "left");    // Release a mouse button
 67 |     ```
 68 | 
 69 |   </Tab>
 70 | </Tabs>
 71 | 
 72 | ## Keyboard Actions
 73 | 
 74 | Text input and key combinations:
 75 | 
 76 | <Tabs items={['Python', 'TypeScript']}>
 77 |   <Tab value="Python">
 78 |     ```python
 79 |     # Text input
 80 |     await computer.interface.type_text("Hello")     # Type text
 81 |     await computer.interface.press_key("enter")     # Press a single key
 82 | 
 83 |     # Key combinations and advanced control
 84 |     await computer.interface.hotkey("command", "c") # Press key combination
 85 |     await computer.interface.key_down("command")    # Press and hold a key
 86 |     await computer.interface.key_up("command")      # Release a key
 87 |     ```
 88 | 
 89 |   </Tab>
 90 |   <Tab value="TypeScript">
 91 |     ```typescript
 92 |     // Text input
 93 |     await computer.interface.typeText("Hello");     // Type text
 94 |     await computer.interface.pressKey("enter");     // Press a single key
 95 | 
 96 |     // Key combinations and advanced control
 97 |     await computer.interface.hotkey("command", "c"); // Press key combination
 98 |     await computer.interface.keyDown("command");    // Press and hold a key
 99 |     await computer.interface.keyUp("command");      // Release a key
100 |     ```
101 | 
102 |   </Tab>
103 | </Tabs>
104 | 
105 | ## Scrolling Actions
106 | 
107 | Mouse wheel and scrolling control:
108 | 
109 | <Tabs items={['Python', 'TypeScript']}>
110 |   <Tab value="Python">
111 |     ```python
112 |     # Scrolling
113 |     await computer.interface.scroll(x, y) # Scroll the mouse wheel
114 |     await computer.interface.scroll_down(clicks) # Scroll down
115 |     await computer.interface.scroll_up(clicks) # Scroll up
116 |     ```
117 |   </Tab>
118 |   <Tab value="TypeScript">
119 |     ```typescript 
120 |     // Scrolling 
121 |     await computer.interface.scroll(x, y); // Scroll the mouse wheel 
122 |     await computer.interface.scrollDown(clicks); // Scroll down
123 |     await computer.interface.scrollUp(clicks); // Scroll up 
124 |     ```
125 |   </Tab>
126 | </Tabs>
127 | 
128 | ## Screen Actions
129 | 
130 | Screen capture and display information:
131 | 
132 | <Tabs items={['Python', 'TypeScript']}>
133 |   <Tab value="Python">
134 |     ```python 
135 |     # Screen operations 
136 |     await computer.interface.screenshot() # Take a screenshot 
137 |     await computer.interface.get_screen_size() # Get screen dimensions
138 | 
139 |     ```
140 | 
141 |   </Tab>
142 |   <Tab value="TypeScript">
143 |     ```typescript 
144 |     // Screen operations 
145 |     await computer.interface.screenshot(); // Take a screenshot 
146 |     await computer.interface.getScreenSize(); // Get screen dimensions 
147 |     
148 |     ```
149 |   </Tab>
150 | </Tabs>
151 | 
152 | ## Clipboard Actions
153 | 
154 | System clipboard management:
155 | 
156 | <Tabs items={['Python', 'TypeScript']}>
157 |   <Tab value="Python">
158 |     ```python 
159 |     # Clipboard operations
160 |     await computer.interface.set_clipboard(text) # Set clipboard content
161 |     await computer.interface.copy_to_clipboard() # Get clipboard content
162 | 
163 |     ```
164 | 
165 |   </Tab>
166 |   <Tab value="TypeScript">
167 |     ```typescript 
168 |     // Clipboard operations 
169 |     await computer.interface.setClipboard(text); // Set clipboard content
170 |     await computer.interface.copyToClipboard(); // Get clipboard content
171 | 
172 |     ```
173 | 
174 |   </Tab>
175 | </Tabs>
176 | 
177 | ## File System Operations
178 | 
179 | Direct file and directory manipulation:
180 | 
181 | <Tabs items={['Python', 'TypeScript']}>
182 |   <Tab value="Python">
183 | 
184 |     ```python
185 |     # File existence checks
186 |     await computer.interface.file_exists(path)      # Check if file exists
187 |     await computer.interface.directory_exists(path) # Check if directory exists
188 | 
189 |     # File content operations
190 |     await computer.interface.read_text(path, encoding="utf-8")        # Read file content
191 |     await computer.interface.write_text(path, content, encoding="utf-8") # Write file content
192 |     await computer.interface.read_bytes(path)       # Read file content as bytes
193 |     await computer.interface.write_bytes(path, content) # Write file content as bytes
194 | 
195 |     # File and directory management
196 |     await computer.interface.delete_file(path)      # Delete file
197 |     await computer.interface.create_dir(path)       # Create directory
198 |     await computer.interface.delete_dir(path)       # Delete directory
199 |     await computer.interface.list_dir(path)         # List directory contents
200 |     ```
201 | 
202 |   </Tab>
203 |   <Tab value="TypeScript">
204 |     ```typescript
205 |     // File existence checks
206 |     await computer.interface.fileExists(path);      // Check if file exists
207 |     await computer.interface.directoryExists(path); // Check if directory exists
208 | 
209 |     // File content operations
210 |     await computer.interface.readText(path, "utf-8");        // Read file content
211 |     await computer.interface.writeText(path, content, "utf-8"); // Write file content
212 |     await computer.interface.readBytes(path);       // Read file content as bytes
213 |     await computer.interface.writeBytes(path, content); // Write file content as bytes
214 | 
215 |     // File and directory management
216 |     await computer.interface.deleteFile(path);      // Delete file
217 |     await computer.interface.createDir(path);       // Create directory
218 |     await computer.interface.deleteDir(path);       // Delete directory
219 |     await computer.interface.listDir(path);         // List directory contents
220 |     ```
221 | 
222 |   </Tab>
223 | </Tabs>
224 | 
225 | ## Accessibility
226 | 
227 | Access system accessibility information:
228 | 
229 | <Tabs items={['Python', 'TypeScript']}>
230 |   <Tab value="Python">
231 |     ```python 
232 |     # Get accessibility tree 
233 |     await computer.interface.get_accessibility_tree()
234 | 
235 |     ```
236 | 
237 |   </Tab>
238 |   <Tab value="TypeScript">
239 |     ```typescript 
240 |     // Get accessibility tree 
241 |     await computer.interface.getAccessibilityTree();
242 | 
243 |     ```
244 |   </Tab>
245 | </Tabs>
246 | 
247 | ## Delay Configuration
248 | 
249 | Control timing between actions:
250 | 
251 | <Tabs items={['Python']}>
252 |   <Tab value="Python">
253 |     ```python
254 |     # Set default delay between all actions (in seconds)
255 |     computer.interface.delay = 0.5  # 500ms delay between actions
256 | 
257 |     # Or specify delay for individual actions
258 |     await computer.interface.left_click(x, y, delay=1.0)     # 1 second delay after click
259 |     await computer.interface.type_text("Hello", delay=0.2)   # 200ms delay after typing
260 |     await computer.interface.press_key("enter", delay=0.5)   # 500ms delay after key press
261 |     ```
262 | 
263 |   </Tab>
264 | </Tabs>
265 | 
266 | ## Python Virtual Environment Operations
267 | 
268 | Manage Python environments:
269 | 
270 | <Tabs items={['Python']}>
271 |   <Tab value="Python">
272 |     ```python
273 |     # Virtual environment management
274 |     await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment
275 |     await computer.venv_cmd("demo_venv", "python -c 'import requests; print(requests.get(\"https://httpbin.org/ip\").json())'") # Run a shell command in a virtual environment
276 |     await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception
277 |     ```
278 | 
279 |   </Tab>
280 | </Tabs>
```
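
The `venv_cmd` example in the Python Virtual Environment Operations section above embeds a Python one-liner, which itself contains a string literal, inside a shell command string. Nesting those quotes by hand is easy to get wrong. A minimal sketch of one way to avoid that, assuming the command ends up in a POSIX shell and using only the standard library's `shlex.quote`; `build_python_command` is a hypothetical helper, not part of the Cua API:

```python
import shlex


def build_python_command(code: str) -> str:
    """Return a shell command that runs `code` with `python -c`.

    shlex.quote wraps the snippet in single quotes and escapes any single
    quotes inside it, so a POSIX shell passes the code through unchanged.
    """
    return f"python -c {shlex.quote(code)}"


# The one-liner from the example above, written as ordinary Python source.
snippet = 'import requests; print(requests.get("https://httpbin.org/ip").json())'
command = build_python_command(snippet)

# shlex.split parses the command the way a POSIX shell would.
parsed = shlex.split(command)
print(command)
```

The resulting string could then be passed as the second argument, as in `computer.venv_cmd("demo_venv", command)`.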

--------------------------------------------------------------------------------
/blog/app-use.md:
--------------------------------------------------------------------------------

```markdown
  1 | # App-Use: Control Individual Applications with Cua Agents
  2 | 
  3 | *Published on May 31, 2025 by The Cua Team*
  4 | 
  5 | Today, we are excited to introduce a new experimental feature landing in the [Cua GitHub repository](https://github.com/trycua/cua): **App-Use**. App-Use allows you to create lightweight virtual desktops that limit agent access to specific applications, improving the precision of your agent's trajectory. It is well suited to parallel workflows and focused task execution.
  6 | 
  7 | > **Note:** App-Use is currently experimental. To use it, enable it by passing the `experiments=["app-use"]` feature flag when creating your Computer instance.
  8 | 
  9 | Check out an example of a Cua Agent automating the Cua team's Taco Bell order through the iPhone Mirroring app:
 10 | 
 11 | <div align="center">
 12 |   <video src="https://github.com/user-attachments/assets/6362572e-f784-4006-aa6e-bce10991fab9" width="600" controls></video>
 13 | </div>
 14 | 
 15 | ## What is App-Use?
 16 | 
 17 | App-Use lets you create virtual desktop sessions scoped to specific applications. Instead of giving an agent access to your entire screen, you can say "only work with Safari and Notes" or "just control the iPhone Mirroring app."
 18 | 
 19 | ```python
 20 | # Create a macOS VM with App Use experimental feature enabled
 21 | computer = Computer(experiments=["app-use"])
 22 | 
 23 | # Create a desktop limited to specific apps
 24 | desktop = computer.create_desktop_from_apps(["Safari", "Notes"])
 25 | 
 26 | # Your agent can now only see and interact with these apps
 27 | agent = ComputerAgent(
 28 |     model="anthropic/claude-3-5-sonnet-20241022",
 29 |     tools=[desktop]
 30 | )
 31 | ```
 32 | 
 33 | ## Key Benefits
 34 | 
 35 | ### 1. Lightweight and Fast
 36 | App-Use creates visual filters, not new processes. Your apps continue running normally - we just control what the agent can see and click on. The virtual desktops are composited views that require no additional compute resources beyond the existing window manager operations.
 37 | 
 38 | ### 2. Run Multiple Agents in Parallel
 39 | Deploy a team of specialized agents, each focused on their own apps:
 40 | 
 41 | ```python
 42 | # Create a Computer with App Use enabled
 43 | computer = Computer(experiments=["app-use"])
 44 | 
 45 | # Research agent focuses on browser
 46 | research_desktop = computer.create_desktop_from_apps(["Safari"])
 47 | research_agent = ComputerAgent(tools=[research_desktop], ...)
 48 | 
 49 | # Writing agent focuses on documents  
 50 | writing_desktop = computer.create_desktop_from_apps(["Pages", "Notes"])
 51 | writing_agent = ComputerAgent(tools=[writing_desktop], ...)
 52 | 
 53 | async def run_agent(agent, task):
 54 |     async for result in agent.run(task):
 55 |         print(result.get('text', ''))
 56 | 
 57 | # Run both simultaneously
 58 | await asyncio.gather(
 59 |     run_agent(research_agent, "Research AI trends for 2025"),
 60 |     run_agent(writing_agent, "Draft blog post outline")
 61 | )
 62 | ```
 63 | 
 64 | ## How To: Getting Started with App-Use
 65 | 
 66 | ### Requirements
 67 | 
 68 | To get started with App-Use, you'll need:
 69 | - Python 3.11+
 70 | - macOS Sequoia (15.0) or later
 71 | 
 72 | ### Getting Started
 73 | 
 74 | ```bash
 75 | # Install packages and launch UI
 76 | pip install -U "cua-computer[all]" "cua-agent[all]"
 77 | python -m agent.ui.gradio.app
 78 | ```
 79 | 
 80 | ```python
 81 | import asyncio
 82 | from computer import Computer
 83 | from agent import ComputerAgent
 84 | 
 85 | async def main():
 86 |     computer = Computer(experiments=["app-use"])  # App-Use must be enabled explicitly
 87 |     await computer.run()
 88 |     
 89 |     # Create app-specific desktop sessions
 90 |     desktop = computer.create_desktop_from_apps(["Notes"])
 91 |     
 92 |     # Initialize an agent
 93 |     agent = ComputerAgent(
 94 |         model="anthropic/claude-3-5-sonnet-20241022",
 95 |         tools=[desktop]
 96 |     )
 97 |     
 98 |     # Take a screenshot (returns bytes by default)
 99 |     screenshot = await desktop.interface.screenshot()
100 |     with open("app_screenshot.png", "wb") as f:
101 |         f.write(screenshot)
102 |     
103 |     # Run an agent task
104 |     async for result in agent.run("Create a new note titled 'Meeting Notes' and add today's agenda items"):
105 |         print(f"Agent: {result.get('text', '')}")
106 | 
107 | if __name__ == "__main__":
108 |     asyncio.run(main())
109 | ```
110 | 
111 | ## Use Case: Automating Your iPhone with Cua
112 | 
113 | ### ⚠️ Important Warning
114 | 
115 | Computer-use agents are powerful tools that can interact with your devices. This guide involves using your own macOS and iPhone instead of a VM. **Proceed at your own risk.** Always:
116 | - Review agent actions before running
117 | - Start with non-critical tasks
118 | - Monitor agent behavior closely
119 | 
120 | Remember that, even with Cua, running agents in a VM is still recommended for a stronger level of isolation.
121 | 
122 | ### Setting Up iPhone Automation
123 | 
124 | ### Step 1: Start the cua-computer-server
125 | 
126 | First, you'll need to start the cua-computer-server locally to enable access to iPhone Mirroring via the Computer interface:
127 | 
128 | ```bash
129 | # Install the server
130 | pip install cua-computer-server
131 | 
132 | # Start the server
133 | python -m computer_server
134 | ```
135 | 
136 | ### Step 2: Connect iPhone Mirroring
137 | 
138 | Then, you'll need to open the "iPhone Mirroring" app on your Mac and connect it to your iPhone.
139 | 
140 | ### Step 3: Create an iPhone Automation Session
141 | 
142 | Finally, you can create an iPhone automation session:
143 | 
144 | ```python
145 | import asyncio
146 | from computer import Computer
147 | from agent import ComputerAgent
148 | 
149 | async def automate_iphone():
150 |     # Connect to your local computer server
151 |     my_mac = Computer(use_host_computer_server=True, os_type="macos", experiments=["app-use"])
152 |     await my_mac.run()
153 |     
154 |     # Create a desktop focused on iPhone Mirroring
155 |     my_iphone = my_mac.create_desktop_from_apps(["iPhone Mirroring"])
156 |     
157 |     # Initialize an agent for iPhone automation
158 |     agent = ComputerAgent(
159 |         model="anthropic/claude-3-5-sonnet-20241022",
160 |         tools=[my_iphone]
161 |     )
162 |     
163 |     # Example: Send a message
164 |     async for result in agent.run("Open Messages and send 'Hello from Cua!' to John"):
165 |         print(f"Agent: {result.get('text', '')}")
166 |     
167 |     # Example: Set a reminder
168 |     async for result in agent.run("Create a reminder to call mom at 5 PM today"):
169 |         print(f"Agent: {result.get('text', '')}")
170 | 
171 | if __name__ == "__main__":
172 |     asyncio.run(automate_iphone())
173 | ```
174 | 
175 | ### iPhone Automation Use Cases
176 | 
177 | With Cua's iPhone automation, you can:
178 | - **Automate messaging**: Send texts, respond to messages, manage conversations
179 | - **Control apps**: Navigate any iPhone app using natural language
180 | - **Manage settings**: Adjust iPhone settings programmatically
181 | - **Extract data**: Read information from apps that don't have APIs
182 | - **Test iOS apps**: Automate testing workflows for iPhone applications
183 | 
184 | ## Important Notes
185 | 
186 | - **Visual isolation only**: Apps share the same files, OS resources, and user session
187 | - **Dynamic resolution**: Desktops automatically scale to fit app windows and menu bars
188 | - **macOS only**: Currently requires macOS due to compositing engine dependencies
189 | - **Not a security boundary**: This is for agent focus, not security isolation
190 | 
191 | ## When to Use What: App-Use vs Multiple Cua Containers
192 | 
193 | ### Use App-Use within the same macOS Cua Container:
194 | - ✅ You need lightweight, fast agent focusing (macOS only)
195 | - ✅ You want to run multiple agents on one desktop
196 | - ✅ You're automating personal devices like iPhones
197 | - ✅ Window layout isolation is sufficient
198 | - ✅ You want low computational overhead
199 | 
200 | ### Use Multiple Cua Containers:
201 | - ✅ You need maximum isolation between agents
202 | - ✅ You require cross-platform support (Mac/Linux/Windows)
203 | - ✅ You need guaranteed resource allocation
204 | - ✅ Security and complete isolation are critical
205 | - ⚠️ Note: Most computationally expensive option
206 | 
207 | ## Pro Tips
208 | 
209 | 1. **Start Small**: Test with one app before creating complex multi-app desktops
210 | 2. **Screenshot First**: Take a screenshot to verify your desktop shows the right apps
211 | 3. **Name Your Apps Correctly**: Use exact app names as they appear in the system
212 | 4. **Consider Performance**: While lightweight, too many parallel agents can still impact system performance
213 | 5. **Plan Your Workflows**: Design agent tasks to minimize app switching for best results
214 | 
215 | ### How It Works
216 | 
217 | When you create a desktop session with `create_desktop_from_apps()`, App-Use:
218 | - Filters the visual output to show only specified application windows
219 | - Routes input events only to those applications
220 | - Maintains window layout isolation between different sessions
221 | - Shares the underlying file system and OS resources
222 | - **Dynamically adjusts resolution** to fit the window layout and menu bar items
223 | 
224 | The resolution of these virtual desktops is dynamic, automatically scaling to accommodate the applications' window sizes and menu bar requirements. This ensures that agents always have a clear view of the entire interface they need to interact with, regardless of the specific app combination.
225 | 
226 | Currently, App-Use is limited to macOS because it relies on Quartz, Apple's compositing engine, to create these virtual desktops. Quartz provides the low-level window management and rendering capabilities that make it possible to composite multiple application windows into isolated visual environments.
227 | 
228 | ## Conclusion
229 | 
230 | App-Use brings a new dimension to computer automation: lightweight, focused, and parallel. Whether you're building a personal iPhone assistant or orchestrating a team of specialized agents, App-Use balances functionality with efficiency.
231 | 
232 | Ready to try it? Update to the latest Cua version and start focusing your agents today!
233 | 
234 | ```bash
235 | pip install -U "cua-computer[all]" "cua-agent[all]"
236 | ```
237 | 
238 | Happy automating! 🎯🤖
239 | 
```
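
The "How It Works" section of the post above describes App-Use as showing only the windows of the chosen applications, routing input only to them, and sizing the virtual desktop to fit their layout. The following is a toy model of that idea in plain Python, not Cua's implementation (which the post says relies on Quartz); the `Window` type, `desktop_for_apps`, `accepts_click`, and the sample coordinates are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A hypothetical on-screen window: owning app plus its rectangle."""
    app: str
    x: int
    y: int
    width: int
    height: int


def desktop_for_apps(windows, apps):
    """Keep only windows owned by `apps` and compute the bounding size.

    Returns (visible_windows, (width, height)); the size is the smallest
    rectangle containing every visible window, or (0, 0) if none match.
    """
    allowed = set(apps)
    visible = [w for w in windows if w.app in allowed]
    if not visible:
        return [], (0, 0)
    left = min(w.x for w in visible)
    top = min(w.y for w in visible)
    right = max(w.x + w.width for w in visible)
    bottom = max(w.y + w.height for w in visible)
    return visible, (right - left, bottom - top)


def accepts_click(visible, x, y):
    """Route a click only if it lands inside one of the visible windows."""
    return any(
        w.x <= x < w.x + w.width and w.y <= y < w.y + w.height
        for w in visible
    )


windows = [
    Window("Safari", 0, 0, 800, 600),
    Window("Notes", 800, 100, 400, 500),
    Window("Mail", 200, 700, 600, 300),
]
visible, size = desktop_for_apps(windows, ["Safari", "Notes"])
print([w.app for w in visible], size)
```

In this model a click on the Mail window is simply dropped, which mirrors the post's point that the scoping is about agent focus rather than a security boundary: Mail keeps running and shares the same files and session.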

--------------------------------------------------------------------------------
/blog/introducing-cua-cloud-containers.md:
--------------------------------------------------------------------------------

```markdown
  1 | # Introducing Cua Cloud Sandbox: Computer-Use Agents in the Cloud
  2 | 
  3 | *Published on May 28, 2025 by Francesco Bonacci*
  4 | 
  5 | Welcome to the next chapter in our Computer-Use Agent journey! In [Part 1](./build-your-own-operator-on-macos-1), we showed you how to build your own Operator on macOS. In [Part 2](./build-your-own-operator-on-macos-2), we explored the cua-agent framework. Today, we're excited to introduce **Cua Cloud Sandbox** – the easiest way to deploy Computer-Use Agents at scale.
  6 | 
  7 | <div align="center">
  8 |   <video src="https://github.com/user-attachments/assets/63a2addf-649f-4468-971d-58d38dd43ee6" width="600" controls></video>
  9 | </div>
 10 | 
 11 | ## What is Cua Cloud?
 12 | 
 13 | Think of Cua Cloud as **Docker for Computer-Use Agents**. Instead of managing VMs, installing dependencies, and configuring environments, you can launch pre-configured Cloud Sandbox instances with a single command. Each sandbox comes with a **full desktop environment** accessible from your browser via noVNC, all CUA-related dependencies pre-configured (with a PyAutoGUI-compatible server), and **pay-per-use pricing** that scales with your needs.
 14 | 
 15 | ## Why Cua Cloud Sandbox?
 16 | 
 17 | Four months ago, we launched [**Lume**](https://github.com/trycua/cua/tree/main/libs/lume) and [**Cua**](https://github.com/trycua/cua) with the goal of bringing sandboxed VMs and Computer-Use Agents to Apple Silicon. The developer community's response was incredible 🎉
 18 | 
 19 | Going from prototype to production revealed a problem, though: **local macOS VMs don't scale**, nor are they easily portable.
 20 | 
 21 | Our Discord community, YC peers, and early pilot customers kept hitting the same issues. Storage constraints meant **20-40GB per VM** filled laptops fast. Different hardware architectures (Apple Silicon ARM vs Intel x86) prevented portability of local workflows. Every new user lost a day to setup and configuration.
 22 | 
 23 | **Cua Cloud** eliminates these constraints while preserving everything developers are familiar with about our Computer and Agent SDK.
 24 | 
 25 | ### What We Built
 26 | 
 27 | Over the past month, we've been iterating on Cua Cloud with partners and beta users to address these challenges. You use the exact same `Computer` and `ComputerAgent` classes you already know, but with **zero local setup** or storage requirements. VNC access comes with **built-in encryption**, you pay only for compute time (not idle resources), and you can bring your own API keys for any LLM provider.
 28 | 
 29 | The result? **Instant deployment** in seconds instead of hours, with no infrastructure to manage. Scale elastically from **1 to 100 agents** in parallel, with consistent behavior across all deployments. Share agent trajectories with your team for better collaboration and debugging.
 30 | 
 31 | ## Getting Started
 32 | 
 33 | ### Step 1: Get Your API Key
 34 | 
 35 | Sign up at [**trycua.com**](https://trycua.com) to get your API key.
 36 | 
 37 | ```bash
 38 | # Set your API key in environment variables
 39 | export CUA_API_KEY=your_api_key_here
 40 | export CUA_CONTAINER_NAME=my-agent-container
 41 | ```
 42 | 
 43 | ### Step 2: Launch Your First Sandbox
 44 | 
 45 | ```python
 46 | import asyncio, logging, os
 47 | from computer import Computer, VMProviderType
 48 | from agent import ComputerAgent
 49 | 
 50 | async def run_cloud_agent():
 51 |     # Create a remote Linux computer with Cua Cloud
 52 |     computer = Computer(
 53 |         os_type="linux",
 54 |         api_key=os.getenv("CUA_API_KEY"),
 55 |         name=os.getenv("CUA_CONTAINER_NAME"),
 56 |         provider_type=VMProviderType.CLOUD,
 57 |     )
 58 |     
 59 |     # Create an agent with your preferred loop
 60 |     agent = ComputerAgent(
 61 |         model="openai/gpt-4o",
 62 |         save_trajectory=True,
 63 |         verbosity=logging.INFO,
 64 |         tools=[computer]
 65 |     )
 66 |     
 67 |     # Run a task
 68 |     async for result in agent.run("Open Chrome and search for AI news"):
 69 |         print(f"Response: {result.get('text')}")
 70 | 
 71 | # Run the agent
 72 | asyncio.run(run_cloud_agent())
 73 | ```
 74 | 
 75 | ### Available Tiers
 76 | 
 77 | We're launching with **three compute tiers** to match your workload needs:
 78 | 
 79 | - **Small** (1 vCPU, 4GB RAM) - Perfect for simple automation tasks and testing
 80 | - **Medium** (2 vCPU, 8GB RAM) - Ideal for most production workloads
 81 | - **Large** (8 vCPU, 32GB RAM) - Built for complex, resource-intensive operations
 82 | 
 83 | Each tier includes a **full Linux desktop environment (Xfce)** with a pre-configured browser, **secure VNC access** with SSL, persistent storage during your session, and automatic cleanup when the sandbox is terminated.
 84 | 
 85 | ## How some customers are using Cua Cloud today
 86 | 
 87 | ### Example 1: Automated GitHub Workflow
 88 | 
 89 | Let's automate a complete GitHub workflow:
 90 | 
 91 | ```python
 92 | import asyncio
 93 | import logging, os
 94 | from computer import Computer, VMProviderType
 95 | from agent import ComputerAgent
 96 | 
 97 | async def github_automation():
 98 |     """Automate GitHub repository management tasks."""
 99 |     computer = Computer(
100 |         os_type="linux",
101 |         api_key=os.getenv("CUA_API_KEY"),
102 |         name="github-automation",
103 |         provider_type=VMProviderType.CLOUD,
104 |     )
105 |     
106 |     agent = ComputerAgent(
107 |         model="openai/gpt-4o",
108 |         save_trajectory=True,
109 |         verbosity=logging.INFO,
110 |         tools=[computer]
111 |     )
112 |     
113 |     tasks = [
114 |         "Look for a repository named trycua/cua on GitHub.",
115 |         "Check the open issues, open the most recent one and read it.",
116 |         "Clone the repository if it doesn't exist yet.",
117 |         "Create a new branch for the issue.",
118 |         "Make necessary changes to resolve the issue.",
119 |         "Commit the changes with a descriptive message.",
120 |         "Create a pull request."
121 |     ]
122 |     
123 |     for i, task in enumerate(tasks):
124 |         print(f"\nExecuting task {i+1}/{len(tasks)}: {task}")
125 |         async for result in agent.run(task):
126 |             print(f"Response: {result.get('text')}")
127 |             
128 |             # Check if any tools were used
129 |             tools = result.get('tools')
130 |             if tools:
131 |                 print(f"Tools used: {tools}")
132 |         
133 |         print(f"Task {i+1} completed")
134 | 
135 | # Run the automation
136 | asyncio.run(github_automation())
137 | ```
138 | 
139 | ### Example 2: Parallel Web Scraping
140 | 
141 | Run multiple agents in parallel to scrape different websites:
142 | 
143 | ```python
144 | import asyncio, os
145 | from computer import Computer, VMProviderType
146 | from agent import ComputerAgent
147 | 
148 | async def scrape_website(site_name, url):
149 |     """Scrape a website using a cloud agent."""
150 |     computer = Computer(
151 |         os_type="linux",
152 |         api_key=os.getenv("CUA_API_KEY"),
153 |         name=f"scraper-{site_name}",
154 |         provider_type=VMProviderType.CLOUD,
155 |     )
156 |     
157 |     agent = ComputerAgent(
158 |         model="openai/gpt-4o",
159 |         save_trajectory=True,
160 |         tools=[computer]
161 |     )
162 |     
163 |     results = []
164 |     tasks = [
165 |         f"Navigate to {url}",
166 |         "Extract the main headlines or article titles",
167 |         "Take a screenshot of the page",
168 |         "Save the extracted data to a file"
169 |     ]
170 |     
171 |     for task in tasks:
172 |         async for result in agent.run(task):
173 |             results.append({
174 |                 'site': site_name,
175 |                 'task': task,
176 |                 'response': result.get('text')
177 |             })
178 |     
179 |     return results
180 | 
181 | async def parallel_scraping():
182 |     """Scrape multiple websites in parallel."""
183 |     sites = [
184 |         ("ArXiv", "https://arxiv.org"),
185 |         ("HackerNews", "https://news.ycombinator.com"),
186 |         ("TechCrunch", "https://techcrunch.com")
187 |     ]
188 |     
189 |     # Run all scraping tasks in parallel
190 |     tasks = [scrape_website(name, url) for name, url in sites]
191 |     results = await asyncio.gather(*tasks)
192 |     
193 |     # Process results
194 |     for site_results in results:
195 |         print(f"\nResults from {site_results[0]['site']}:")
196 |         for result in site_results:
197 |             print(f"  - {result['task']}: {(result['response'] or '')[:100]}...")
198 | 
199 | # Run parallel scraping
200 | asyncio.run(parallel_scraping())
201 | ```
202 | 
203 | ## Cost Optimization Tips
204 | 
205 | To optimize your costs, use appropriate sandbox sizes for your workload and implement timeouts to prevent runaway tasks. Batch related operations together to minimize sandbox spin-up time, and always remember to terminate sandboxes when your work is complete.
206 | 
207 | ## Security Considerations
208 | 
209 | Cua Cloud runs all sandboxes in isolated environments with encrypted VNC connections. Your API keys are never exposed in trajectories.
210 | 
211 | ## What's Next for Cua Cloud
212 | 
213 | We're just getting started! Here's what's coming in the next few months:
214 | 
215 | ### Elastic Autoscaled Sandbox Pools
216 | 
217 | Soon you'll be able to create elastic sandbox pools that automatically scale based on demand. Define minimum and maximum sandbox counts, and let Cua Cloud handle the rest. Perfect for batch processing, scheduled automations, and handling traffic spikes without manual intervention.
218 | 
219 | ### Windows and macOS Cloud Support
220 | 
221 | While we're launching with Linux sandboxes, Windows and macOS cloud machines are coming soon. Run Windows-specific automations, test cross-platform workflows, or leverage macOS-exclusive applications – all in the cloud with the same simple API.
222 | 
223 | Stay tuned for updates and join our [**Discord**](https://discord.gg/cua-ai) to vote on which features you'd like to see first!
224 | 
225 | ## Get Started Today
226 | 
227 | Ready to deploy your Computer-Use Agents in the cloud?
228 | 
229 | Visit [**trycua.com**](https://trycua.com) to sign up and get your API key. Join our [**Discord community**](https://discord.gg/cua-ai) for support and explore more examples on [**GitHub**](https://github.com/trycua/cua).
230 | 
231 | Happy RPA 2.0! 🚀
232 | 
```
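
The cost optimization tips in the post above recommend timeouts to stop runaway tasks. A minimal sketch of one way to do that with `asyncio.wait_for`, using a stand-in async generator in place of `agent.run(...)` so it runs without a sandbox; `fake_agent_run`, `collect`, `run_with_timeout`, and the timeout values are hypothetical:

```python
import asyncio


async def fake_agent_run(task, steps, step_seconds):
    """Stand-in for agent.run(task): yields one result dict per step."""
    for i in range(steps):
        await asyncio.sleep(step_seconds)
        yield {"text": f"{task}: step {i + 1}"}


async def collect(results):
    """Drain an async iterator of result dicts into a list of texts."""
    texts = []
    async for result in results:
        texts.append(result.get("text") or "")
    return texts


async def run_with_timeout(results, timeout_seconds):
    """Return (texts, timed_out); cancel the run once the budget is spent."""
    try:
        texts = await asyncio.wait_for(collect(results), timeout_seconds)
        return texts, False
    except asyncio.TimeoutError:
        # wait_for has already cancelled collect(); partial output is dropped.
        return [], True


async def main():
    quick = await run_with_timeout(fake_agent_run("quick", 2, 0.01), 1.0)
    runaway = await run_with_timeout(fake_agent_run("runaway", 1000, 0.05), 0.2)
    return quick, runaway


quick, runaway = asyncio.run(main())
print(quick, runaway)
```

With a real agent, `results` would be the iterator returned by `agent.run(task)`. Wrapping the call in `try`/`finally` is a natural place to terminate the sandbox so it stops accruing compute time once the task ends or times out.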