This is page 8 of 21. Use http://codebase.md/trycua/cua?lines=true&page={x} to view the full context. # Directory Structure ``` ├── .all-contributorsrc ├── .cursorignore ├── .devcontainer │ ├── devcontainer.json │ ├── post-install.sh │ └── README.md ├── .dockerignore ├── .gitattributes ├── .github │ ├── FUNDING.yml │ ├── scripts │ │ ├── get_pyproject_version.py │ │ └── tests │ │ ├── __init__.py │ │ ├── README.md │ │ └── test_get_pyproject_version.py │ └── workflows │ ├── ci-lume.yml │ ├── docker-publish-kasm.yml │ ├── docker-publish-xfce.yml │ ├── docker-reusable-publish.yml │ ├── npm-publish-computer.yml │ ├── npm-publish-core.yml │ ├── publish-lume.yml │ ├── pypi-publish-agent.yml │ ├── pypi-publish-computer-server.yml │ ├── pypi-publish-computer.yml │ ├── pypi-publish-core.yml │ ├── pypi-publish-mcp-server.yml │ ├── pypi-publish-pylume.yml │ ├── pypi-publish-som.yml │ ├── pypi-reusable-publish.yml │ └── test-validation-script.yml ├── .gitignore ├── .vscode │ ├── docs.code-workspace │ ├── launch.json │ ├── libs-ts.code-workspace │ ├── lume.code-workspace │ ├── lumier.code-workspace │ ├── py.code-workspace │ └── settings.json ├── blog │ ├── app-use.md │ ├── assets │ │ ├── composite-agents.png │ │ ├── docker-ubuntu-support.png │ │ ├── hack-booth.png │ │ ├── hack-closing-ceremony.jpg │ │ ├── hack-cua-ollama-hud.jpeg │ │ ├── hack-leaderboard.png │ │ ├── hack-the-north.png │ │ ├── hack-winners.jpeg │ │ ├── hack-workshop.jpeg │ │ ├── hud-agent-evals.png │ │ └── trajectory-viewer.jpeg │ ├── bringing-computer-use-to-the-web.md │ ├── build-your-own-operator-on-macos-1.md │ ├── build-your-own-operator-on-macos-2.md │ ├── composite-agents.md │ ├── cua-hackathon.md │ ├── hack-the-north.md │ ├── hud-agent-evals.md │ ├── human-in-the-loop.md │ ├── introducing-cua-cloud-containers.md │ ├── lume-to-containerization.md │ ├── sandboxed-python-execution.md │ ├── training-computer-use-models-trajectories-1.md │ ├── trajectory-viewer.md │ ├── ubuntu-docker-support.md │ └── 
windows-sandbox.md ├── CONTRIBUTING.md ├── Development.md ├── Dockerfile ├── docs │ ├── .gitignore │ ├── .prettierrc │ ├── content │ │ └── docs │ │ ├── agent-sdk │ │ │ ├── agent-loops.mdx │ │ │ ├── benchmarks │ │ │ │ ├── index.mdx │ │ │ │ ├── interactive.mdx │ │ │ │ ├── introduction.mdx │ │ │ │ ├── meta.json │ │ │ │ ├── osworld-verified.mdx │ │ │ │ ├── screenspot-pro.mdx │ │ │ │ └── screenspot-v2.mdx │ │ │ ├── callbacks │ │ │ │ ├── agent-lifecycle.mdx │ │ │ │ ├── cost-saving.mdx │ │ │ │ ├── index.mdx │ │ │ │ ├── logging.mdx │ │ │ │ ├── meta.json │ │ │ │ ├── pii-anonymization.mdx │ │ │ │ └── trajectories.mdx │ │ │ ├── chat-history.mdx │ │ │ ├── custom-computer-handlers.mdx │ │ │ ├── custom-tools.mdx │ │ │ ├── customizing-computeragent.mdx │ │ │ ├── integrations │ │ │ │ ├── hud.mdx │ │ │ │ └── meta.json │ │ │ ├── message-format.mdx │ │ │ ├── meta.json │ │ │ ├── migration-guide.mdx │ │ │ ├── prompt-caching.mdx │ │ │ ├── supported-agents │ │ │ │ ├── composed-agents.mdx │ │ │ │ ├── computer-use-agents.mdx │ │ │ │ ├── grounding-models.mdx │ │ │ │ ├── human-in-the-loop.mdx │ │ │ │ └── meta.json │ │ │ ├── supported-model-providers │ │ │ │ ├── index.mdx │ │ │ │ └── local-models.mdx │ │ │ └── usage-tracking.mdx │ │ ├── computer-sdk │ │ │ ├── cloud-vm-management.mdx │ │ │ ├── commands.mdx │ │ │ ├── computer-ui.mdx │ │ │ ├── computers.mdx │ │ │ ├── meta.json │ │ │ └── sandboxed-python.mdx │ │ ├── index.mdx │ │ ├── libraries │ │ │ ├── agent │ │ │ │ └── index.mdx │ │ │ ├── computer │ │ │ │ └── index.mdx │ │ │ ├── computer-server │ │ │ │ ├── Commands.mdx │ │ │ │ ├── index.mdx │ │ │ │ ├── REST-API.mdx │ │ │ │ └── WebSocket-API.mdx │ │ │ ├── core │ │ │ │ └── index.mdx │ │ │ ├── lume │ │ │ │ ├── cli-reference.mdx │ │ │ │ ├── faq.md │ │ │ │ ├── http-api.mdx │ │ │ │ ├── index.mdx │ │ │ │ ├── installation.mdx │ │ │ │ ├── meta.json │ │ │ │ └── prebuilt-images.mdx │ │ │ ├── lumier │ │ │ │ ├── building-lumier.mdx │ │ │ │ ├── docker-compose.mdx │ │ │ │ ├── docker.mdx │ │ │ │ ├── index.mdx 
│ │ │ │ ├── installation.mdx │ │ │ │ └── meta.json │ │ │ ├── mcp-server │ │ │ │ ├── client-integrations.mdx │ │ │ │ ├── configuration.mdx │ │ │ │ ├── index.mdx │ │ │ │ ├── installation.mdx │ │ │ │ ├── llm-integrations.mdx │ │ │ │ ├── meta.json │ │ │ │ ├── tools.mdx │ │ │ │ └── usage.mdx │ │ │ └── som │ │ │ ├── configuration.mdx │ │ │ └── index.mdx │ │ ├── meta.json │ │ ├── quickstart-cli.mdx │ │ ├── quickstart-devs.mdx │ │ └── telemetry.mdx │ ├── next.config.mjs │ ├── package-lock.json │ ├── package.json │ ├── pnpm-lock.yaml │ ├── postcss.config.mjs │ ├── public │ │ └── img │ │ ├── agent_gradio_ui.png │ │ ├── agent.png │ │ ├── cli.png │ │ ├── computer.png │ │ ├── som_box_threshold.png │ │ └── som_iou_threshold.png │ ├── README.md │ ├── source.config.ts │ ├── src │ │ ├── app │ │ │ ├── (home) │ │ │ │ ├── [[...slug]] │ │ │ │ │ └── page.tsx │ │ │ │ └── layout.tsx │ │ │ ├── api │ │ │ │ └── search │ │ │ │ └── route.ts │ │ │ ├── favicon.ico │ │ │ ├── global.css │ │ │ ├── layout.config.tsx │ │ │ ├── layout.tsx │ │ │ ├── llms.mdx │ │ │ │ └── [[...slug]] │ │ │ │ └── route.ts │ │ │ └── llms.txt │ │ │ └── route.ts │ │ ├── assets │ │ │ ├── discord-black.svg │ │ │ ├── discord-white.svg │ │ │ ├── logo-black.svg │ │ │ └── logo-white.svg │ │ ├── components │ │ │ ├── iou.tsx │ │ │ └── mermaid.tsx │ │ ├── lib │ │ │ ├── llms.ts │ │ │ └── source.ts │ │ └── mdx-components.tsx │ └── tsconfig.json ├── examples │ ├── agent_examples.py │ ├── agent_ui_examples.py │ ├── cloud_api_examples.py │ ├── computer_examples_windows.py │ ├── computer_examples.py │ ├── computer_ui_examples.py │ ├── computer-example-ts │ │ ├── .env.example │ │ ├── .gitignore │ │ ├── .prettierrc │ │ ├── package-lock.json │ │ ├── package.json │ │ ├── pnpm-lock.yaml │ │ ├── README.md │ │ ├── src │ │ │ ├── helpers.ts │ │ │ └── index.ts │ │ └── tsconfig.json │ ├── docker_examples.py │ ├── evals │ │ ├── hud_eval_examples.py │ │ └── wikipedia_most_linked.txt │ ├── pylume_examples.py │ ├── sandboxed_functions_examples.py │ ├── 
som_examples.py │ ├── utils.py │ └── winsandbox_example.py ├── img │ ├── agent_gradio_ui.png │ ├── agent.png │ ├── cli.png │ ├── computer.png │ ├── logo_black.png │ └── logo_white.png ├── libs │ ├── kasm │ │ ├── Dockerfile │ │ ├── LICENSE │ │ ├── README.md │ │ └── src │ │ └── ubuntu │ │ └── install │ │ └── firefox │ │ ├── custom_startup.sh │ │ ├── firefox.desktop │ │ └── install_firefox.sh │ ├── lume │ │ ├── .cursorignore │ │ ├── CONTRIBUTING.md │ │ ├── Development.md │ │ ├── img │ │ │ └── cli.png │ │ ├── Package.resolved │ │ ├── Package.swift │ │ ├── README.md │ │ ├── resources │ │ │ └── lume.entitlements │ │ ├── scripts │ │ │ ├── build │ │ │ │ ├── build-debug.sh │ │ │ │ ├── build-release-notarized.sh │ │ │ │ └── build-release.sh │ │ │ └── install.sh │ │ ├── src │ │ │ ├── Commands │ │ │ │ ├── Clone.swift │ │ │ │ ├── Config.swift │ │ │ │ ├── Create.swift │ │ │ │ ├── Delete.swift │ │ │ │ ├── Get.swift │ │ │ │ ├── Images.swift │ │ │ │ ├── IPSW.swift │ │ │ │ ├── List.swift │ │ │ │ ├── Logs.swift │ │ │ │ ├── Options │ │ │ │ │ └── FormatOption.swift │ │ │ │ ├── Prune.swift │ │ │ │ ├── Pull.swift │ │ │ │ ├── Push.swift │ │ │ │ ├── Run.swift │ │ │ │ ├── Serve.swift │ │ │ │ ├── Set.swift │ │ │ │ └── Stop.swift │ │ │ ├── ContainerRegistry │ │ │ │ ├── ImageContainerRegistry.swift │ │ │ │ ├── ImageList.swift │ │ │ │ └── ImagesPrinter.swift │ │ │ ├── Errors │ │ │ │ └── Errors.swift │ │ │ ├── FileSystem │ │ │ │ ├── Home.swift │ │ │ │ ├── Settings.swift │ │ │ │ ├── VMConfig.swift │ │ │ │ ├── VMDirectory.swift │ │ │ │ └── VMLocation.swift │ │ │ ├── LumeController.swift │ │ │ ├── Main.swift │ │ │ ├── Server │ │ │ │ ├── Handlers.swift │ │ │ │ ├── HTTP.swift │ │ │ │ ├── Requests.swift │ │ │ │ ├── Responses.swift │ │ │ │ └── Server.swift │ │ │ ├── Utils │ │ │ │ ├── CommandRegistry.swift │ │ │ │ ├── CommandUtils.swift │ │ │ │ ├── Logger.swift │ │ │ │ ├── NetworkUtils.swift │ │ │ │ ├── Path.swift │ │ │ │ ├── ProcessRunner.swift │ │ │ │ ├── ProgressLogger.swift │ │ │ │ ├── String.swift 
│ │ │ │ └── Utils.swift │ │ │ ├── Virtualization │ │ │ │ ├── DarwinImageLoader.swift │ │ │ │ ├── DHCPLeaseParser.swift │ │ │ │ ├── ImageLoaderFactory.swift │ │ │ │ └── VMVirtualizationService.swift │ │ │ ├── VM │ │ │ │ ├── DarwinVM.swift │ │ │ │ ├── LinuxVM.swift │ │ │ │ ├── VM.swift │ │ │ │ ├── VMDetails.swift │ │ │ │ ├── VMDetailsPrinter.swift │ │ │ │ ├── VMDisplayResolution.swift │ │ │ │ └── VMFactory.swift │ │ │ └── VNC │ │ │ ├── PassphraseGenerator.swift │ │ │ └── VNCService.swift │ │ └── tests │ │ ├── Mocks │ │ │ ├── MockVM.swift │ │ │ ├── MockVMVirtualizationService.swift │ │ │ └── MockVNCService.swift │ │ ├── VM │ │ │ └── VMDetailsPrinterTests.swift │ │ ├── VMTests.swift │ │ ├── VMVirtualizationServiceTests.swift │ │ └── VNCServiceTests.swift │ ├── lumier │ │ ├── .dockerignore │ │ ├── Dockerfile │ │ ├── README.md │ │ └── src │ │ ├── bin │ │ │ └── entry.sh │ │ ├── config │ │ │ └── constants.sh │ │ ├── hooks │ │ │ └── on-logon.sh │ │ └── lib │ │ ├── utils.sh │ │ └── vm.sh │ ├── python │ │ ├── agent │ │ │ ├── .bumpversion.cfg │ │ │ ├── agent │ │ │ │ ├── __init__.py │ │ │ │ ├── __main__.py │ │ │ │ ├── adapters │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── huggingfacelocal_adapter.py │ │ │ │ │ ├── human_adapter.py │ │ │ │ │ ├── mlxvlm_adapter.py │ │ │ │ │ └── models │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── generic.py │ │ │ │ │ ├── internvl.py │ │ │ │ │ ├── opencua.py │ │ │ │ │ └── qwen2_5_vl.py │ │ │ │ ├── agent.py │ │ │ │ ├── callbacks │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── budget_manager.py │ │ │ │ │ ├── image_retention.py │ │ │ │ │ ├── logging.py │ │ │ │ │ ├── operator_validator.py │ │ │ │ │ ├── pii_anonymization.py │ │ │ │ │ ├── prompt_instructions.py │ │ │ │ │ ├── telemetry.py │ │ │ │ │ └── trajectory_saver.py │ │ │ │ ├── cli.py │ │ │ │ ├── computers │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── cua.py │ │ │ │ │ └── custom.py │ │ │ │ ├── decorators.py │ │ │ │ ├── human_tool │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── __main__.py 
│ │ │ │ │ ├── server.py │ │ │ │ │ └── ui.py │ │ │ │ ├── integrations │ │ │ │ │ └── hud │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── agent.py │ │ │ │ │ └── proxy.py │ │ │ │ ├── loops │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── anthropic.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── composed_grounded.py │ │ │ │ │ ├── gemini.py │ │ │ │ │ ├── glm45v.py │ │ │ │ │ ├── gta1.py │ │ │ │ │ ├── holo.py │ │ │ │ │ ├── internvl.py │ │ │ │ │ ├── model_types.csv │ │ │ │ │ ├── moondream3.py │ │ │ │ │ ├── omniparser.py │ │ │ │ │ ├── openai.py │ │ │ │ │ ├── opencua.py │ │ │ │ │ └── uitars.py │ │ │ │ ├── proxy │ │ │ │ │ ├── examples.py │ │ │ │ │ └── handlers.py │ │ │ │ ├── responses.py │ │ │ │ ├── types.py │ │ │ │ └── ui │ │ │ │ ├── __init__.py │ │ │ │ ├── __main__.py │ │ │ │ └── gradio │ │ │ │ ├── __init__.py │ │ │ │ ├── app.py │ │ │ │ └── ui_components.py │ │ │ ├── benchmarks │ │ │ │ ├── .gitignore │ │ │ │ ├── contrib.md │ │ │ │ ├── interactive.py │ │ │ │ ├── models │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ └── gta1.py │ │ │ │ ├── README.md │ │ │ │ ├── ss-pro.py │ │ │ │ ├── ss-v2.py │ │ │ │ └── utils.py │ │ │ ├── example.py │ │ │ ├── pyproject.toml │ │ │ └── README.md │ │ ├── computer │ │ │ ├── .bumpversion.cfg │ │ │ ├── computer │ │ │ │ ├── __init__.py │ │ │ │ ├── computer.py │ │ │ │ ├── diorama_computer.py │ │ │ │ ├── helpers.py │ │ │ │ ├── interface │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── factory.py │ │ │ │ │ ├── generic.py │ │ │ │ │ ├── linux.py │ │ │ │ │ ├── macos.py │ │ │ │ │ ├── models.py │ │ │ │ │ └── windows.py │ │ │ │ ├── logger.py │ │ │ │ ├── models.py │ │ │ │ ├── providers │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── cloud │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ └── provider.py │ │ │ │ │ ├── docker │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ └── provider.py │ │ │ │ │ ├── factory.py │ │ │ │ │ ├── lume │ │ │ │ │ │ ├── __init__.py │ │ │ │ │ │ └── provider.py │ │ │ │ │ ├── lume_api.py │ │ │ │ │ ├── lumier │ │ │ │ │ │ ├── __init__.py │ │ │ │ 
│ │ └── provider.py │ │ │ │ │ ├── types.py │ │ │ │ │ └── winsandbox │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── provider.py │ │ │ │ │ └── setup_script.ps1 │ │ │ │ ├── ui │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── __main__.py │ │ │ │ │ └── gradio │ │ │ │ │ ├── __init__.py │ │ │ │ │ └── app.py │ │ │ │ └── utils.py │ │ │ ├── poetry.toml │ │ │ ├── pyproject.toml │ │ │ └── README.md │ │ ├── computer-server │ │ │ ├── .bumpversion.cfg │ │ │ ├── computer_server │ │ │ │ ├── __init__.py │ │ │ │ ├── __main__.py │ │ │ │ ├── cli.py │ │ │ │ ├── diorama │ │ │ │ │ ├── __init__.py │ │ │ │ │ ├── base.py │ │ │ │ │ ├── diorama_computer.py │ │ │ │ │ ├── diorama.py │ │ │ │ │ ├── draw.py │ │ │ │ │ ├── macos.py │ │ │ │ │ └── safezone.py │ │ │ │ ├── handlers │ │ │ │ │ ├── base.py │ │ │ │ │ ├── factory.py │ │ │ │ │ ├── generic.py │ │ │ │ │ ├── linux.py │ │ │ │ │ ├── macos.py │ │ │ │ │ └── windows.py │ │ │ │ ├── main.py │ │ │ │ ├── server.py │ │ │ │ └── watchdog.py │ │ │ ├── examples │ │ │ │ ├── __init__.py │ │ │ │ └── usage_example.py │ │ │ ├── pyproject.toml │ │ │ ├── README.md │ │ │ ├── run_server.py │ │ │ └── test_connection.py │ │ ├── core │ │ │ ├── .bumpversion.cfg │ │ │ ├── core │ │ │ │ ├── __init__.py │ │ │ │ └── telemetry │ │ │ │ ├── __init__.py │ │ │ │ └── posthog.py │ │ │ ├── poetry.toml │ │ │ ├── pyproject.toml │ │ │ └── README.md │ │ ├── mcp-server │ │ │ ├── .bumpversion.cfg │ │ │ ├── CONCURRENT_SESSIONS.md │ │ │ ├── mcp_server │ │ │ │ ├── __init__.py │ │ │ │ ├── __main__.py │ │ │ │ ├── server.py │ │ │ │ └── session_manager.py │ │ │ ├── pdm.lock │ │ │ ├── pyproject.toml │ │ │ ├── README.md │ │ │ └── scripts │ │ │ ├── install_mcp_server.sh │ │ │ └── start_mcp_server.sh │ │ ├── pylume │ │ │ ├── __init__.py │ │ │ ├── .bumpversion.cfg │ │ │ ├── pylume │ │ │ │ ├── __init__.py │ │ │ │ ├── client.py │ │ │ │ ├── exceptions.py │ │ │ │ ├── lume │ │ │ │ ├── models.py │ │ │ │ ├── pylume.py │ │ │ │ └── server.py │ │ │ ├── pyproject.toml │ │ │ └── README.md │ │ └── som │ │ ├── .bumpversion.cfg │ │ 
├── LICENSE │ │ ├── poetry.toml │ │ ├── pyproject.toml │ │ ├── README.md │ │ ├── som │ │ │ ├── __init__.py │ │ │ ├── detect.py │ │ │ ├── detection.py │ │ │ ├── models.py │ │ │ ├── ocr.py │ │ │ ├── util │ │ │ │ └── utils.py │ │ │ └── visualization.py │ │ └── tests │ │ └── test_omniparser.py │ ├── typescript │ │ ├── .gitignore │ │ ├── .nvmrc │ │ ├── agent │ │ │ ├── examples │ │ │ │ ├── playground-example.html │ │ │ │ └── README.md │ │ │ ├── package.json │ │ │ ├── README.md │ │ │ ├── src │ │ │ │ ├── client.ts │ │ │ │ ├── index.ts │ │ │ │ └── types.ts │ │ │ ├── tests │ │ │ │ └── client.test.ts │ │ │ ├── tsconfig.json │ │ │ ├── tsdown.config.ts │ │ │ └── vitest.config.ts │ │ ├── biome.json │ │ ├── computer │ │ │ ├── .editorconfig │ │ │ ├── .gitattributes │ │ │ ├── .gitignore │ │ │ ├── LICENSE │ │ │ ├── package.json │ │ │ ├── README.md │ │ │ ├── src │ │ │ │ ├── computer │ │ │ │ │ ├── index.ts │ │ │ │ │ ├── providers │ │ │ │ │ │ ├── base.ts │ │ │ │ │ │ ├── cloud.ts │ │ │ │ │ │ └── index.ts │ │ │ │ │ └── types.ts │ │ │ │ ├── index.ts │ │ │ │ ├── interface │ │ │ │ │ ├── base.ts │ │ │ │ │ ├── factory.ts │ │ │ │ │ ├── index.ts │ │ │ │ │ ├── linux.ts │ │ │ │ │ ├── macos.ts │ │ │ │ │ └── windows.ts │ │ │ │ └── types.ts │ │ │ ├── tests │ │ │ │ ├── computer │ │ │ │ │ └── cloud.test.ts │ │ │ │ ├── interface │ │ │ │ │ ├── factory.test.ts │ │ │ │ │ ├── index.test.ts │ │ │ │ │ ├── linux.test.ts │ │ │ │ │ ├── macos.test.ts │ │ │ │ │ └── windows.test.ts │ │ │ │ └── setup.ts │ │ │ ├── tsconfig.json │ │ │ ├── tsdown.config.ts │ │ │ └── vitest.config.ts │ │ ├── core │ │ │ ├── .editorconfig │ │ │ ├── .gitattributes │ │ │ ├── .gitignore │ │ │ ├── LICENSE │ │ │ ├── package.json │ │ │ ├── README.md │ │ │ ├── src │ │ │ │ ├── index.ts │ │ │ │ └── telemetry │ │ │ │ ├── clients │ │ │ │ │ ├── index.ts │ │ │ │ │ └── posthog.ts │ │ │ │ └── index.ts │ │ │ ├── tests │ │ │ │ └── telemetry.test.ts │ │ │ ├── tsconfig.json │ │ │ ├── tsdown.config.ts │ │ │ └── vitest.config.ts │ │ ├── package.json │ │ ├── 
pnpm-lock.yaml │ │ ├── pnpm-workspace.yaml │ │ └── README.md │ └── xfce │ ├── .dockerignore │ ├── .gitignore │ ├── Dockerfile │ ├── README.md │ └── src │ ├── scripts │ │ ├── resize-display.sh │ │ ├── start-computer-server.sh │ │ ├── start-novnc.sh │ │ ├── start-vnc.sh │ │ └── xstartup.sh │ ├── supervisor │ │ └── supervisord.conf │ └── xfce-config │ ├── helpers.rc │ ├── xfce4-power-manager.xml │ └── xfce4-session.xml ├── LICENSE.md ├── Makefile ├── notebooks │ ├── agent_nb.ipynb │ ├── blog │ │ ├── build-your-own-operator-on-macos-1.ipynb │ │ └── build-your-own-operator-on-macos-2.ipynb │ ├── composite_agents_docker_nb.ipynb │ ├── computer_nb.ipynb │ ├── computer_server_nb.ipynb │ ├── customizing_computeragent.ipynb │ ├── eval_osworld.ipynb │ ├── ollama_nb.ipynb │ ├── pylume_nb.ipynb │ ├── README.md │ ├── sota_hackathon_cloud.ipynb │ └── sota_hackathon.ipynb ├── pdm.lock ├── pyproject.toml ├── pyrightconfig.json ├── README.md ├── samples │ └── community │ ├── global-online │ │ └── README.md │ └── hack-the-north │ └── README.md ├── scripts │ ├── build-uv.sh │ ├── build.ps1 │ ├── build.sh │ ├── cleanup.sh │ ├── playground-docker.sh │ ├── playground.sh │ └── run-docker-dev.sh └── tests ├── pytest.ini ├── shell_cmd.py ├── test_files.py ├── test_mcp_server_session_management.py ├── test_mcp_server_streaming.py ├── test_shell_bash.py ├── test_telemetry.py ├── test_venv.py └── test_watchdog.py ``` # Files -------------------------------------------------------------------------------- /libs/python/mcp-server/CONCURRENT_SESSIONS.md: -------------------------------------------------------------------------------- ```markdown 1 | # MCP Server Concurrent Session Management 2 | 3 | This document describes the improvements made to the MCP Server to address concurrent session management and resource lifecycle issues. 4 | 5 | ## Problem Statement 6 | 7 | The original MCP server implementation had several critical issues: 8 | 9 | 1. 
**Global Computer Instance**: Used a single `global_computer` variable shared across all clients 10 | 2. **No Resource Isolation**: Multiple clients would interfere with each other 11 | 3. **Sequential Task Processing**: Multi-task operations were always sequential 12 | 4. **No Graceful Shutdown**: Server couldn't properly clean up resources on shutdown 13 | 5. **Hidden Event Loop**: `server.run()` hid the event loop, preventing proper lifecycle management 14 | 15 | ## Solution Architecture 16 | 17 | ### 1. Session Manager (`session_manager.py`) 18 | 19 | The `SessionManager` class provides: 20 | 21 | - **Per-session computer instances**: Each client gets isolated computer resources 22 | - **Computer instance pooling**: Efficient reuse of computer instances with lifecycle management 23 | - **Task registration**: Track active tasks per session for graceful cleanup 24 | - **Automatic cleanup**: Background task cleans up idle sessions 25 | - **Resource limits**: Configurable maximum concurrent sessions 26 | 27 | #### Key Components: 28 | 29 | ```python 30 | class SessionManager: 31 | def __init__(self, max_concurrent_sessions: int = 10): 32 | self._sessions: Dict[str, SessionInfo] = {} 33 | self._computer_pool = ComputerPool() 34 | # ... lifecycle management 35 | ``` 36 | 37 | #### Session Lifecycle: 38 | 39 | 1. **Creation**: New session created when client first connects 40 | 2. **Task Registration**: Each task is registered with the session 41 | 3. **Activity Tracking**: Last activity time updated on each operation 42 | 4. **Cleanup**: Sessions cleaned up when idle or on shutdown 43 | 44 | ### 2. 
Computer Pool (`ComputerPool`) 45 | 46 | Manages computer instances efficiently: 47 | 48 | - **Pool Size Limits**: Maximum number of concurrent computer instances 49 | - **Instance Reuse**: Available instances reused across sessions 50 | - **Lifecycle Management**: Proper startup/shutdown of computer instances 51 | - **Resource Cleanup**: All instances properly closed on shutdown 52 | 53 | ### 3. Enhanced Server Tools 54 | 55 | All server tools now support: 56 | 57 | - **Session ID Parameter**: Optional `session_id` for multi-client support 58 | - **Resource Isolation**: Each session gets its own computer instance 59 | - **Task Tracking**: Proper registration/unregistration of tasks 60 | - **Error Handling**: Graceful error handling with session cleanup 61 | 62 | #### Updated Tool Signatures: 63 | 64 | ```python 65 | async def screenshot_cua(ctx: Context, session_id: Optional[str] = None) -> Any: 66 | async def run_cua_task(ctx: Context, task: str, session_id: Optional[str] = None) -> Any: 67 | async def run_multi_cua_tasks(ctx: Context, tasks: List[str], session_id: Optional[str] = None, concurrent: bool = False) -> Any: 68 | ``` 69 | 70 | ### 4. Concurrent Task Execution 71 | 72 | The `run_multi_cua_tasks` tool now supports: 73 | 74 | - **Sequential Mode** (default): Tasks run one after another 75 | - **Concurrent Mode**: Tasks run in parallel using `asyncio.gather()` 76 | - **Progress Tracking**: Proper progress reporting for both modes 77 | - **Error Handling**: Individual task failures don't stop other tasks 78 | 79 | ### 5. 
Graceful Shutdown 80 | 81 | The server now provides: 82 | 83 | - **Signal Handlers**: Proper handling of SIGINT and SIGTERM 84 | - **Session Cleanup**: All active sessions properly cleaned up 85 | - **Resource Release**: Computer instances returned to pool and closed 86 | - **Async Lifecycle**: Event loop properly exposed for cleanup 87 | 88 | ## Usage Examples 89 | 90 | ### Basic Usage (Backward Compatible) 91 | 92 | ```python 93 | # These calls work exactly as before 94 | await screenshot_cua(ctx) 95 | await run_cua_task(ctx, "Open browser") 96 | await run_multi_cua_tasks(ctx, ["Task 1", "Task 2"]) 97 | ``` 98 | 99 | ### Multi-Client Usage 100 | 101 | ```python 102 | # Client 1 103 | session_id_1 = "client-1-session" 104 | await screenshot_cua(ctx, session_id_1) 105 | await run_cua_task(ctx, "Open browser", session_id_1) 106 | 107 | # Client 2 (completely isolated) 108 | session_id_2 = "client-2-session" 109 | await screenshot_cua(ctx, session_id_2) 110 | await run_cua_task(ctx, "Open editor", session_id_2) 111 | ``` 112 | 113 | ### Concurrent Task Execution 114 | 115 | ```python 116 | # Run tasks concurrently instead of sequentially 117 | tasks = ["Open browser", "Open editor", "Open terminal"] 118 | results = await run_multi_cua_tasks(ctx, tasks, concurrent=True) 119 | ``` 120 | 121 | ### Session Management 122 | 123 | ```python 124 | # Get session statistics 125 | stats = await get_session_stats(ctx) 126 | print(f"Active sessions: {stats['total_sessions']}") 127 | 128 | # Cleanup specific session 129 | await cleanup_session(ctx, "session-to-cleanup") 130 | ``` 131 | 132 | ## Configuration 133 | 134 | ### Environment Variables 135 | 136 | - `CUA_MODEL_NAME`: Model to use (default: `anthropic/claude-3-5-sonnet-20241022`) 137 | - `CUA_MAX_IMAGES`: Maximum images to keep (default: `3`) 138 | 139 | ### Session Manager Configuration 140 | 141 | ```python 142 | # In session_manager.py 143 | class SessionManager: 144 | def __init__(self, max_concurrent_sessions: int = 
10): 145 | # Configurable maximum concurrent sessions 146 | 147 | class ComputerPool: 148 | def __init__(self, max_size: int = 5, idle_timeout: float = 300.0): 149 | # Configurable pool size and idle timeout 150 | ``` 151 | 152 | ## Performance Improvements 153 | 154 | ### Before (Issues): 155 | - ❌ Single global computer instance 156 | - ❌ Client interference and resource conflicts 157 | - ❌ Sequential task processing only 158 | - ❌ No graceful shutdown 159 | - ❌ 30s timeout issues with long-running tasks 160 | 161 | ### After (Benefits): 162 | - ✅ Per-session computer instances with proper isolation 163 | - ✅ Computer instance pooling for efficient resource usage 164 | - ✅ Concurrent task execution support 165 | - ✅ Graceful shutdown with proper cleanup 166 | - ✅ Streaming updates prevent timeout issues 167 | - ✅ Configurable resource limits 168 | - ✅ Automatic session cleanup 169 | 170 | ## Testing 171 | 172 | Comprehensive test coverage includes: 173 | 174 | - Session creation and reuse 175 | - Concurrent session isolation 176 | - Task registration and cleanup 177 | - Error handling with session management 178 | - Concurrent vs sequential task execution 179 | - Session statistics and cleanup 180 | 181 | Run tests with: 182 | 183 | ```bash 184 | pytest tests/test_mcp_server_session_management.py -v 185 | ``` 186 | 187 | ## Migration Guide 188 | 189 | ### For Existing Clients 190 | 191 | No changes required! 
The new implementation is fully backward compatible: 192 | 193 | ```python 194 | # This still works exactly as before 195 | await run_cua_task(ctx, "My task") 196 | ``` 197 | 198 | ### For New Multi-Client Applications 199 | 200 | Use session IDs for proper isolation: 201 | 202 | ```python 203 | # Create a unique session ID for each client 204 | session_id = str(uuid.uuid4()) 205 | await run_cua_task(ctx, "My task", session_id) 206 | ``` 207 | 208 | ### For Concurrent Task Execution 209 | 210 | Enable concurrent mode for better performance: 211 | 212 | ```python 213 | tasks = ["Task 1", "Task 2", "Task 3"] 214 | results = await run_multi_cua_tasks(ctx, tasks, concurrent=True) 215 | ``` 216 | 217 | ## Monitoring and Debugging 218 | 219 | ### Session Statistics 220 | 221 | ```python 222 | stats = await get_session_stats(ctx) 223 | print(f"Total sessions: {stats['total_sessions']}") 224 | print(f"Max concurrent: {stats['max_concurrent']}") 225 | for session_id, session_info in stats['sessions'].items(): 226 | print(f"Session {session_id}: {session_info['active_tasks']} active tasks") 227 | ``` 228 | 229 | ### Logging 230 | 231 | The server provides detailed logging for: 232 | 233 | - Session creation and cleanup 234 | - Task registration and completion 235 | - Resource pool usage 236 | - Error conditions and recovery 237 | 238 | ### Graceful Shutdown 239 | 240 | The server properly handles shutdown signals: 241 | 242 | ```bash 243 | # Send SIGTERM for graceful shutdown 244 | kill -TERM <server_pid> 245 | 246 | # Or use Ctrl+C (SIGINT) 247 | ``` 248 | 249 | ## Future Enhancements 250 | 251 | Potential future improvements: 252 | 253 | 1. **Session Persistence**: Save/restore session state across restarts 254 | 2. **Load Balancing**: Distribute sessions across multiple server instances 255 | 3. **Resource Monitoring**: Real-time monitoring of resource usage 256 | 4. **Auto-scaling**: Dynamic adjustment of pool size based on demand 257 | 5. 
**Session Timeouts**: Configurable timeouts for different session types 258 | ``` -------------------------------------------------------------------------------- /blog/human-in-the-loop.md: -------------------------------------------------------------------------------- ```markdown 1 | # When Agents Need Human Wisdom - Introducing Human-In-The-Loop Support 2 | 3 | *Published on August 29, 2025 by Francesco Bonacci* 4 | 5 | Sometimes the best AI agent is a human. Whether you're creating training demonstrations, evaluating complex scenarios, or need to intervene when automation hits a wall, our new Human-In-The-Loop integration puts you directly in control. 6 | 7 | With yesterday's [HUD evaluation integration](hud-agent-evals.md), you could benchmark any agent at scale. Today's update lets you *become* the agent when it matters most—seamlessly switching between automated intelligence and human judgment. 8 | 9 | <div align="center"> 10 | <video src="https://github.com/user-attachments/assets/9091b50f-26e7-4981-95ce-40e5d42a1260" width="600" controls></video> 11 | </div> 12 | 13 | ## What you get 14 | 15 | - **One-line human takeover** for any agent configuration with `human/human` or `model+human/human` 16 | - **Interactive web UI** to see what your agent sees and control what it does 17 | - **Zero context switching** - step in exactly where automation left off 18 | - **Training data generation** - create perfect demonstrations by doing tasks yourself 19 | - **Ground truth evaluation** - validate agent performance with human expertise 20 | 21 | ## Why Human-In-The-Loop? 22 | 23 | Even the most sophisticated agents encounter edge cases, ambiguous interfaces, or tasks requiring human judgment. Rather than failing gracefully, they can now fail *intelligently*—by asking for human help. 
24 | 25 | This approach bridges the gap between fully automated systems and pure manual control, letting you: 26 | - **Demonstrate complex workflows** that agents can learn from 27 | - **Evaluate tricky scenarios** where ground truth requires human assessment 28 | - **Intervene selectively** when automated agents need guidance 29 | - **Test and debug** your tools and environments manually 30 | 31 | ## Getting Started 32 | 33 | Launch the human agent interface: 34 | 35 | ```bash 36 | python -m agent.human_tool 37 | ``` 38 | 39 | The web UI will show pending completions. Click any completion to take control of the agent and see exactly what it sees. 40 | 41 | ## Usage Examples 42 | 43 | ### Direct Human Control 44 | 45 | Perfect for creating demonstrations or when you want full manual control: 46 | 47 | ```python 48 | from agent import ComputerAgent 49 | from agent.computer import computer 50 | 51 | agent = ComputerAgent( 52 | "human/human", 53 | tools=[computer] 54 | ) 55 | 56 | # You'll get full control through the web UI 57 | async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"): 58 | pass 59 | ``` 60 | 61 | ### Hybrid: AI Planning + Human Execution 62 | 63 | Combine model intelligence with human precision—let AI plan, then execute manually: 64 | 65 | ```python 66 | agent = ComputerAgent( 67 | "huggingface-local/HelloKKMe/GTA1-7B+human/human", 68 | tools=[computer] 69 | ) 70 | 71 | # AI creates the plan, human executes each step 72 | async for _ in agent.run("Navigate to the settings page and enable dark mode"): 73 | pass 74 | ``` 75 | 76 | ### Fallback Pattern 77 | 78 | Start automated, escalate to human when needed: 79 | 80 | ```python 81 | # Primary automated agent 82 | primary_agent = ComputerAgent("openai/computer-use-preview", tools=[computer]) 83 | 84 | # Human fallback agent 85 | fallback_agent = ComputerAgent("human/human", tools=[computer]) 86 | 87 | try: 88 | async for result in primary_agent.run(task): 89 
| if result.confidence < 0.7: # Low confidence threshold 90 | # Seamlessly hand off to human 91 | async for _ in fallback_agent.run(f"Continue this task: {task}"): 92 | pass 93 | except Exception: 94 | # Agent failed, human takes over 95 | async for _ in fallback_agent.run(f"Handle this failed task: {task}"): 96 | pass 97 | ``` 98 | 99 | ## Interactive Features 100 | 101 | The human-in-the-loop interface provides a rich, responsive experience: 102 | 103 | ### **Visual Environment** 104 | - **Screenshot display** with live updates as you work 105 | - **Click handlers** for direct interaction with UI elements 106 | - **Zoom and pan** to see details clearly 107 | 108 | ### **Action Controls** 109 | - **Click actions** - precise cursor positioning and clicking 110 | - **Keyboard input** - type text naturally or send specific key combinations 111 | - **Action history** - see the sequence of actions taken 112 | - **Undo support** - step back when needed 113 | 114 | ### **Tool Integration** 115 | - **Full OpenAI compatibility** - standard tool call format 116 | - **Custom tools** - integrate your own tools seamlessly 117 | - **Real-time feedback** - see tool responses immediately 118 | 119 | ### **Smart Polling** 120 | - **Responsive updates** - UI refreshes when new completions arrive 121 | - **Background processing** - continue working while waiting for tasks 122 | - **Session persistence** - resume interrupted sessions 123 | 124 | ## Real-World Use Cases 125 | 126 | ### **Training Data Generation** 127 | Create perfect demonstrations for fine-tuning: 128 | 129 | ```python 130 | # Generate training examples for spreadsheet tasks 131 | demo_agent = ComputerAgent("human/human", tools=[computer]) 132 | 133 | tasks = [ 134 | "Create a budget spreadsheet with income and expense categories", 135 | "Apply conditional formatting to highlight overbudget items", 136 | "Generate a pie chart showing expense distribution" 137 | ] 138 | 139 | for task in tasks: 140 | # Human 
demonstrates each task perfectly 141 | async for _ in demo_agent.run(task): 142 | pass # Recorded actions become training data 143 | ``` 144 | 145 | ### **Evaluation and Ground Truth** 146 | Validate agent performance on complex scenarios: 147 | 148 | ```python 149 | # Human evaluates agent performance 150 | evaluator = ComputerAgent("human/human", tools=[computer]) 151 | 152 | async for _ in evaluator.run("Review this completed form and rate accuracy (1-10)"): 153 | pass # Human provides authoritative quality assessment 154 | ``` 155 | 156 | ### **Interactive Debugging** 157 | Step through agent behavior manually: 158 | 159 | ```python 160 | # Test a workflow step by step 161 | debug_agent = ComputerAgent("human/human", tools=[computer]) 162 | 163 | async for _ in debug_agent.run("Reproduce the agent's failed login sequence"): 164 | pass # Human identifies exactly where automation breaks 165 | ``` 166 | 167 | ### **Edge Case Handling** 168 | Handle scenarios that break automated agents: 169 | 170 | ```python 171 | # Complex UI interaction requiring human judgment 172 | edge_case_agent = ComputerAgent("human/human", tools=[computer]) 173 | 174 | async for _ in edge_case_agent.run("Navigate this CAPTCHA-protected form"): 175 | pass # Human handles what automation cannot 176 | ``` 177 | 178 | ## Configuration Options 179 | 180 | Customize the human agent experience: 181 | 182 | - **UI refresh rate**: Adjust polling frequency for your workflow 183 | - **Image quality**: Balance detail vs. 
performance for screenshots 184 | - **Action logging**: Save detailed traces for analysis and training 185 | - **Session timeout**: Configure idle timeouts for security 186 | - **Tool permissions**: Restrict which tools humans can access 187 | 188 | ## When to Use Human-In-The-Loop 189 | 190 | | **Scenario** | **Why Human Control** | 191 | |--------------|----------------------| 192 | | **Creating training data** | Perfect demonstrations for model fine-tuning | 193 | | **Evaluating complex tasks** | Human judgment for subjective or nuanced assessment | 194 | | **Handling edge cases** | CAPTCHAs, unusual UIs, context-dependent decisions | 195 | | **Debugging workflows** | Step through failures to identify breaking points | 196 | | **High-stakes operations** | Critical tasks requiring human oversight and approval | 197 | | **Testing new environments** | Validate tools and environments work as expected | 198 | 199 | ## Learn More 200 | 201 | - **Interactive examples**: Try human-in-the-loop control with sample tasks 202 | - **Training data pipelines**: Learn how to convert human demonstrations into model training data 203 | - **Evaluation frameworks**: Build human-validated test suites for your agents 204 | - **API documentation**: Full reference for human agent configuration 205 | 206 | Ready to put humans back in the loop? The most sophisticated AI system knows when to ask for help. 207 | 208 | --- 209 | 210 | *Questions about human-in-the-loop agents? 
Join the conversation in our [Discord community](https://discord.gg/cua-ai) or check out our [documentation](https://docs.trycua.com/docs/agent-sdk/supported-agents/human-in-the-loop).* 211 | ``` -------------------------------------------------------------------------------- /docs/content/docs/quickstart-cli.mdx: -------------------------------------------------------------------------------- ```markdown 1 | --- 2 | title: Quickstart (CLI) 3 | description: Get started with the cua Agent CLI in 4 steps 4 | icon: Rocket 5 | --- 6 | 7 | import { Step, Steps } from 'fumadocs-ui/components/steps'; 8 | import { Tab, Tabs } from 'fumadocs-ui/components/tabs'; 9 | import { Accordion, Accordions } from 'fumadocs-ui/components/accordion'; 10 | 11 | Get up and running with the cua Agent CLI in 4 simple steps. 12 | 13 | <Steps> 14 | <Step> 15 | 16 | ## Introduction 17 | 18 | cua combines Computer (interface) + Agent (AI) for automating desktop apps. The Agent CLI provides a clean terminal interface to control your remote computer using natural language commands. 19 | 20 | </Step> 21 | 22 | <Step> 23 | 24 | ## Set Up Your Computer Environment 25 | 26 | Choose how you want to run your cua computer. **Cloud Sandbox is recommended** for the easiest setup: 27 | 28 | <Tabs items={['☁️ Cloud Sandbox (Recommended)', 'Linux on Docker', 'Windows Sandbox', 'macOS VM']}> 29 | <Tab value="☁️ Cloud Sandbox (Recommended)"> 30 | 31 | **Easiest & safest way to get started - works on any host OS** 32 | 33 | 1. Go to [trycua.com/signin](https://www.trycua.com/signin) 34 | 2. Navigate to **Dashboard > Containers > Create Instance** 35 | 3. Create a **Medium, Ubuntu 22** container 36 | 4. Note your container name and API key 37 | 38 | Your cloud container will be automatically configured and ready to use. 39 | 40 | </Tab> 41 | <Tab value="Linux on Docker"> 42 | 43 | **Run Linux desktop locally on macOS, Windows, or Linux hosts** 44 | 45 | 1. Install Docker Desktop or Docker Engine 46 | 47 | 2. 
Pull the CUA XFCE container (lightweight desktop) 48 | 49 | ```bash 50 | docker pull --platform=linux/amd64 trycua/cua-xfce:latest 51 | ``` 52 | 53 | Or use KASM for a full-featured desktop: 54 | 55 | ```bash 56 | docker pull --platform=linux/amd64 trycua/cua-ubuntu:latest 57 | ``` 58 | 59 | </Tab> 60 | <Tab value="Windows Sandbox"> 61 | 62 | **Windows hosts only - requires Windows 10 Pro/Enterprise or Windows 11** 63 | 64 | 1. Enable Windows Sandbox 65 | 2. Install the pywinsandbox dependency 66 | 67 | ```bash 68 | pip install -U git+https://github.com/karkason/pywinsandbox.git 69 | ``` 70 | 71 | 3. Windows Sandbox will be automatically configured when you run the CLI 72 | 73 | </Tab> 74 | <Tab value="macOS VM"> 75 | 76 | **macOS hosts only - requires Lume CLI** 77 | 78 | 1. Install the Lume CLI 79 | 80 | ```bash 81 | /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)" 82 | ``` 83 | 84 | 2. Start a local cua macOS VM 85 | 86 | ```bash 87 | lume run macos-sequoia-cua:latest 88 | ``` 89 | 90 | </Tab> 91 | </Tabs> 92 | 93 | </Step> 94 | 95 | <Step> 96 | 97 | ## Install cua 98 | 99 | <Accordions type="single" defaultValue="uv"> 100 | 101 | <Accordion title="uv (Recommended)" value="uv"> 102 | 103 | ### Install uv 104 | 105 | <Tabs items={['macOS / Linux', 'Windows']} persist> 106 | <Tab value="macOS / Linux"> 107 | 108 | ```bash 109 | # Use curl to download the script and execute it with sh: 110 | curl -LsSf https://astral.sh/uv/install.sh | sh 111 | 112 | # If your system doesn't have curl, you can use wget: 113 | # wget -qO- https://astral.sh/uv/install.sh | sh 114 | ``` 115 | 116 | </Tab> 117 | <Tab value="Windows"> 118 | 119 | ```powershell 120 | # Use irm to download the script and execute it with iex: 121 | powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" 122 | ``` 123 | 124 | </Tab> 125 | </Tabs> 126 | 127 | ### Install Python 3.12 128 | 129 | ```bash 130 | uv python install 3.12
131 | # uv will install cua dependencies automatically when you use --with "cua-agent[cli]" 132 | ``` 133 | 134 | </Accordion> 135 | 136 | <Accordion title="conda" value="conda"> 137 | 138 | ### Install conda 139 | 140 | <Tabs items={['macOS', 'Linux', 'Windows']} persist> 141 | <Tab value="macOS"> 142 | 143 | ```bash 144 | mkdir -p ~/miniconda3 145 | curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o ~/miniconda3/miniconda.sh 146 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 147 | rm ~/miniconda3/miniconda.sh 148 | source ~/miniconda3/bin/activate 149 | ``` 150 | 151 | </Tab> 152 | <Tab value="Linux"> 153 | 154 | ```bash 155 | mkdir -p ~/miniconda3 156 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh 157 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 158 | rm ~/miniconda3/miniconda.sh 159 | source ~/miniconda3/bin/activate 160 | ``` 161 | 162 | </Tab> 163 | <Tab value="Windows"> 164 | 165 | ```powershell 166 | wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -outfile ".\miniconda.exe" 167 | Start-Process -FilePath ".\miniconda.exe" -ArgumentList "/S" -Wait 168 | del .\miniconda.exe 169 | ``` 170 | 171 | </Tab> 172 | </Tabs> 173 | 174 | ### Create and activate Python 3.12 environment 175 | 176 | ```bash 177 | conda create -n cua python=3.12 178 | conda activate cua 179 | ``` 180 | 181 | ### Install cua 182 | 183 | ```bash 184 | pip install "cua-agent[cli]" cua-computer 185 | ``` 186 | 187 | </Accordion> 188 | 189 | <Accordion title="pip" value="pip"> 190 | 191 | ### Install cua 192 | 193 | ```bash 194 | pip install "cua-agent[cli]" cua-computer 195 | ``` 196 | 197 | </Accordion> 198 | 199 | </Accordions> 200 | 201 | </Step> 202 | 203 | <Step> 204 | 205 | ## Run cua CLI 206 | 207 | Choose your preferred AI model: 208 | 209 | ### OpenAI Computer Use Preview 210 | 211 | <Tabs items={['uv', 'conda/pip']} persist> 212 | <Tab value="uv"> 213 
| 214 | ```bash 215 | uv run --with "cua-agent[cli]" -m agent.cli openai/computer-use-preview 216 | ``` 217 | 218 | </Tab> 219 | <Tab value="conda/pip"> 220 | 221 | ```bash 222 | python -m agent.cli openai/computer-use-preview 223 | ``` 224 | 225 | </Tab> 226 | </Tabs> 227 | 228 | ### Anthropic Claude 229 | 230 | <Tabs items={['uv', 'conda/pip']} persist> 231 | <Tab value="uv"> 232 | 233 | ```bash 234 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-sonnet-4-5-20250929 235 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-opus-4-20250514 236 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-opus-4-1-20250805 237 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-sonnet-4-20250514 238 | uv run --with "cua-agent[cli]" -m agent.cli anthropic/claude-3-5-sonnet-20241022 239 | ``` 240 | 241 | </Tab> 242 | <Tab value="conda/pip"> 243 | 244 | ```bash 245 | python -m agent.cli anthropic/claude-sonnet-4-5-20250929 246 | python -m agent.cli anthropic/claude-opus-4-20250514 247 | python -m agent.cli anthropic/claude-opus-4-1-20250805 248 | python -m agent.cli anthropic/claude-sonnet-4-20250514 249 | python -m agent.cli anthropic/claude-3-5-sonnet-20241022 250 | ``` 251 | 252 | </Tab> 253 | </Tabs> 254 | 255 | ### Omniparser + LLMs 256 | 257 | <Tabs items={['uv', 'conda/pip']} persist> 258 | <Tab value="uv"> 259 | 260 | ```bash 261 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022 262 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+openai/gpt-4o 263 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+vertex_ai/gemini-pro 264 | ``` 265 | 266 | </Tab> 267 | <Tab value="conda/pip"> 268 | 269 | ```bash 270 | python -m agent.cli omniparser+anthropic/claude-3-5-sonnet-20241022 271 | python -m agent.cli omniparser+openai/gpt-4o 272 | python -m agent.cli omniparser+vertex_ai/gemini-pro 273 | ``` 274 | 275 | </Tab> 276 | </Tabs> 277 | 278 | ### Local Models 279 | 280 | <Tabs
items={['uv', 'conda/pip']} persist> 281 | <Tab value="uv"> 282 | 283 | ```bash 284 | # Hugging Face models (local) 285 | uv run --with "cua-agent[cli]" -m agent.cli huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B 286 | 287 | # MLX models (Apple Silicon) 288 | uv run --with "cua-agent[cli]" -m agent.cli mlx/mlx-community/UI-TARS-1.5-7B-6bit 289 | 290 | # Ollama models 291 | uv run --with "cua-agent[cli]" -m agent.cli omniparser+ollama_chat/llama3.2:latest 292 | ``` 293 | 294 | </Tab> 295 | <Tab value="conda/pip"> 296 | 297 | ```bash 298 | # Hugging Face models (local) 299 | python -m agent.cli huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B 300 | 301 | # MLX models (Apple Silicon) 302 | python -m agent.cli mlx/mlx-community/UI-TARS-1.5-7B-6bit 303 | 304 | # Ollama models 305 | python -m agent.cli omniparser+ollama_chat/llama3.2:latest 306 | ``` 307 | 308 | </Tab> 309 | </Tabs> 310 | 311 | ### Interactive Setup 312 | 313 | If you haven't set up environment variables, the CLI will guide you through the setup: 314 | 315 | 1. **Sandbox Name**: Enter your cua sandbox name (or get one at [trycua.com](https://www.trycua.com/)) 316 | 2. **CUA API Key**: Enter your cua API key 317 | 3. **Provider API Key**: Enter your AI provider API key (OpenAI, Anthropic, etc.) 318 | 319 | ### Start Chatting 320 | 321 | Once connected, you'll see: 322 | 323 | ``` 324 | 💻 Connected to your-container-name (model, agent_loop) 325 | Type 'exit' to quit. 326 | 327 | > 328 | ``` 329 | 330 | You can ask your agent to perform actions like: 331 | 332 | - "Take a screenshot and tell me what's on the screen" 333 | - "Open Firefox and go to github.com" 334 | - "Type 'Hello world' into the terminal" 335 | - "Close the current window" 336 | - "Click on the search button" 337 | 338 | </Step> 339 | </Steps> 340 | 341 | --- 342 | 343 | For advanced Python usage and GUI interface, see the [Quickstart (GUI)](/quickstart-ui) and [Quickstart for Developers](/quickstart-devs). 
344 | 345 | For running models locally, see [Running Models Locally](/agent-sdk/local-models). 346 | ``` -------------------------------------------------------------------------------- /libs/python/agent/agent/human_tool/server.py: -------------------------------------------------------------------------------- ```python 1 | import asyncio 2 | import uuid 3 | from datetime import datetime 4 | from typing import Dict, List, Any, Optional 5 | from dataclasses import dataclass, asdict 6 | from enum import Enum 7 | 8 | from fastapi import FastAPI, HTTPException 9 | from pydantic import BaseModel 10 | 11 | 12 | class CompletionStatus(str, Enum): 13 | PENDING = "pending" 14 | COMPLETED = "completed" 15 | FAILED = "failed" 16 | 17 | 18 | @dataclass 19 | class CompletionCall: 20 | id: str 21 | messages: List[Dict[str, Any]] 22 | model: str 23 | status: CompletionStatus 24 | created_at: datetime 25 | completed_at: Optional[datetime] = None 26 | response: Optional[str] = None 27 | tool_calls: Optional[List[Dict[str, Any]]] = None 28 | error: Optional[str] = None 29 | 30 | 31 | class ToolCall(BaseModel): 32 | id: str 33 | type: str = "function" 34 | function: Dict[str, Any] 35 | 36 | 37 | class CompletionRequest(BaseModel): 38 | messages: List[Dict[str, Any]] 39 | model: str 40 | 41 | 42 | class CompletionResponse(BaseModel): 43 | response: Optional[str] = None 44 | tool_calls: Optional[List[Dict[str, Any]]] = None 45 | 46 | 47 | class CompletionQueue: 48 | def __init__(self): 49 | self._queue: Dict[str, CompletionCall] = {} 50 | self._pending_order: List[str] = [] 51 | self._lock = asyncio.Lock() 52 | 53 | async def add_completion(self, messages: List[Dict[str, Any]], model: str) -> str: 54 | """Add a completion call to the queue.""" 55 | async with self._lock: 56 | call_id = str(uuid.uuid4()) 57 | completion_call = CompletionCall( 58 | id=call_id, 59 | messages=messages, 60 | model=model, 61 | status=CompletionStatus.PENDING, 62 | created_at=datetime.now() 63 | ) 64 | 
self._queue[call_id] = completion_call 65 | self._pending_order.append(call_id) 66 | return call_id 67 | 68 | async def get_pending_calls(self) -> List[Dict[str, Any]]: 69 | """Get all pending completion calls.""" 70 | async with self._lock: 71 | pending_calls = [] 72 | for call_id in self._pending_order: 73 | if call_id in self._queue and self._queue[call_id].status == CompletionStatus.PENDING: 74 | call = self._queue[call_id] 75 | pending_calls.append({ 76 | "id": call.id, 77 | "model": call.model, 78 | "created_at": call.created_at.isoformat(), 79 | "messages": call.messages 80 | }) 81 | return pending_calls 82 | 83 | async def get_call_status(self, call_id: str) -> Optional[Dict[str, Any]]: 84 | """Get the status of a specific completion call.""" 85 | async with self._lock: 86 | if call_id not in self._queue: 87 | return None 88 | 89 | call = self._queue[call_id] 90 | result = { 91 | "id": call.id, 92 | "status": call.status.value, 93 | "created_at": call.created_at.isoformat(), 94 | "model": call.model, 95 | "messages": call.messages 96 | } 97 | 98 | if call.completed_at: 99 | result["completed_at"] = call.completed_at.isoformat() 100 | if call.response: 101 | result["response"] = call.response 102 | if call.tool_calls: 103 | result["tool_calls"] = call.tool_calls 104 | if call.error: 105 | result["error"] = call.error 106 | 107 | return result 108 | 109 | async def complete_call(self, call_id: str, response: Optional[str] = None, tool_calls: Optional[List[Dict[str, Any]]] = None) -> bool: 110 | """Mark a completion call as completed with a response or tool calls.""" 111 | async with self._lock: 112 | if call_id not in self._queue: 113 | return False 114 | 115 | call = self._queue[call_id] 116 | if call.status != CompletionStatus.PENDING: 117 | return False 118 | 119 | call.status = CompletionStatus.COMPLETED 120 | call.completed_at = datetime.now() 121 | call.response = response 122 | call.tool_calls = tool_calls 123 | 124 | # Remove from pending order 125 | 
if call_id in self._pending_order: 126 | self._pending_order.remove(call_id) 127 | 128 | return True 129 | 130 | async def fail_call(self, call_id: str, error: str) -> bool: 131 | """Mark a completion call as failed with an error.""" 132 | async with self._lock: 133 | if call_id not in self._queue: 134 | return False 135 | 136 | call = self._queue[call_id] 137 | if call.status != CompletionStatus.PENDING: 138 | return False 139 | 140 | call.status = CompletionStatus.FAILED 141 | call.completed_at = datetime.now() 142 | call.error = error 143 | 144 | # Remove from pending order 145 | if call_id in self._pending_order: 146 | self._pending_order.remove(call_id) 147 | 148 | return True 149 | 150 | async def wait_for_completion(self, call_id: str, timeout: float = 300.0) -> Optional[str]: 151 | """Wait for a completion call to be completed and return the response.""" 152 | start_time = asyncio.get_event_loop().time() 153 | 154 | while True: 155 | status = await self.get_call_status(call_id) 156 | if not status: 157 | return None 158 | 159 | if status["status"] == CompletionStatus.COMPLETED.value: 160 | return status.get("response") 161 | elif status["status"] == CompletionStatus.FAILED.value: 162 | raise Exception(f"Completion failed: {status.get('error', 'Unknown error')}") 163 | 164 | # Check timeout 165 | if asyncio.get_event_loop().time() - start_time > timeout: 166 | await self.fail_call(call_id, "Timeout waiting for human response") 167 | raise TimeoutError("Timeout waiting for human response") 168 | 169 | # Wait a bit before checking again 170 | await asyncio.sleep(0.5) 171 | 172 | 173 | # Global queue instance 174 | completion_queue = CompletionQueue() 175 | 176 | # FastAPI app 177 | app = FastAPI(title="Human Completion Server", version="1.0.0") 178 | 179 | 180 | @app.post("/queue", response_model=Dict[str, str]) 181 | async def queue_completion(request: CompletionRequest): 182 | """Add a completion request to the queue.""" 183 | call_id = await 
completion_queue.add_completion(request.messages, request.model) 184 | return {"id": call_id, "status": "queued"} 185 | 186 | 187 | @app.get("/pending") 188 | async def list_pending(): 189 | """List all pending completion calls.""" 190 | pending_calls = await completion_queue.get_pending_calls() 191 | return {"pending_calls": pending_calls} 192 | 193 | 194 | @app.get("/status/{call_id}") 195 | async def get_status(call_id: str): 196 | """Get the status of a specific completion call.""" 197 | status = await completion_queue.get_call_status(call_id) 198 | if not status: 199 | raise HTTPException(status_code=404, detail="Completion call not found") 200 | return status 201 | 202 | 203 | @app.post("/complete/{call_id}") 204 | async def complete_call(call_id: str, response: CompletionResponse): 205 | """Complete a call with a human response.""" 206 | success = await completion_queue.complete_call( 207 | call_id, 208 | response=response.response, 209 | tool_calls=response.tool_calls 210 | ) 211 | if success: 212 | return {"status": "success", "message": "Call completed"} 213 | else: 214 | raise HTTPException(status_code=404, detail="Call not found or already completed") 215 | 216 | 217 | @app.post("/fail/{call_id}") 218 | async def fail_call(call_id: str, error: Dict[str, str]): 219 | """Mark a call as failed.""" 220 | success = await completion_queue.fail_call(call_id, error.get("error", "Unknown error")) 221 | if not success: 222 | raise HTTPException(status_code=404, detail="Completion call not found or already completed") 223 | return {"status": "failed"} 224 | 225 | 226 | @app.get("/") 227 | async def root(): 228 | """Root endpoint.""" 229 | return {"message": "Human Completion Server is running"} 230 | 231 | 232 | if __name__ == "__main__": 233 | import uvicorn 234 | uvicorn.run(app, host="0.0.0.0", port=8002) 235 | ``` -------------------------------------------------------------------------------- /libs/python/agent/agent/computers/custom.py: 
-------------------------------------------------------------------------------- ```python 1 | """ 2 | Custom computer handler implementation that accepts a dictionary of functions. 3 | """ 4 | 5 | import base64 6 | from typing import Dict, List, Any, Literal, Union, Optional, Callable 7 | from PIL import Image 8 | import io 9 | from .base import AsyncComputerHandler 10 | 11 | 12 | class CustomComputerHandler(AsyncComputerHandler): 13 | """Computer handler that implements the Computer protocol using a dictionary of custom functions.""" 14 | 15 | def __init__(self, functions: Dict[str, Callable]): 16 | """ 17 | Initialize with a dictionary of functions. 18 | 19 | Args: 20 | functions: Dictionary where keys are method names and values are callable functions. 21 | Only 'screenshot' is required, all others are optional. 22 | 23 | Raises: 24 | ValueError: If required 'screenshot' function is not provided. 25 | """ 26 | if 'screenshot' not in functions: 27 | raise ValueError("'screenshot' function is required in functions dictionary") 28 | 29 | self.functions = functions 30 | self._last_screenshot_size: Optional[tuple[int, int]] = None 31 | 32 | async def _call_function(self, func, *args, **kwargs): 33 | """ 34 | Call a function, handling both async and sync functions. 35 | 36 | Args: 37 | func: The function to call 38 | *args: Positional arguments to pass to the function 39 | **kwargs: Keyword arguments to pass to the function 40 | 41 | Returns: 42 | The result of the function call 43 | """ 44 | import asyncio 45 | import inspect 46 | 47 | if callable(func): 48 | if inspect.iscoroutinefunction(func): 49 | return await func(*args, **kwargs) 50 | else: 51 | return func(*args, **kwargs) 52 | else: 53 | return func 54 | 55 | async def _get_value(self, attribute: str): 56 | """ 57 | Get value for an attribute, checking both 'get_{attribute}' and '{attribute}' keys. 
58 | 59 | Args: 60 | attribute: The attribute name to look for 61 | 62 | Returns: 63 | The value from the functions dict, called if callable, returned directly if not 64 | """ 65 | # Check for 'get_{attribute}' first 66 | get_key = f"get_{attribute}" 67 | if get_key in self.functions: 68 | return await self._call_function(self.functions[get_key]) 69 | 70 | # Check for '{attribute}' 71 | if attribute in self.functions: 72 | return await self._call_function(self.functions[attribute]) 73 | 74 | return None 75 | 76 | def _to_b64_str(self, img: Union[bytes, Image.Image, str]) -> str: 77 | """ 78 | Convert image to base64 string. 79 | 80 | Args: 81 | img: Image as bytes, PIL Image, or base64 string 82 | 83 | Returns: 84 | str: Base64 encoded image string 85 | """ 86 | if isinstance(img, str): 87 | # Already a base64 string 88 | return img 89 | elif isinstance(img, bytes): 90 | # Raw bytes 91 | return base64.b64encode(img).decode('utf-8') 92 | elif isinstance(img, Image.Image): 93 | # PIL Image 94 | buffer = io.BytesIO() 95 | img.save(buffer, format='PNG') 96 | return base64.b64encode(buffer.getvalue()).decode('utf-8') 97 | else: 98 | raise ValueError(f"Unsupported image type: {type(img)}") 99 | 100 | # ==== Computer-Use-Preview Action Space ==== 101 | 102 | async def get_environment(self) -> Literal["windows", "mac", "linux", "browser"]: 103 | """Get the current environment type.""" 104 | result = await self._get_value('environment') 105 | if result is None: 106 | return "linux" 107 | assert result in ["windows", "mac", "linux", "browser"] 108 | return result # type: ignore 109 | 110 | async def get_dimensions(self) -> tuple[int, int]: 111 | """Get screen dimensions as (width, height).""" 112 | result = await self._get_value('dimensions') 113 | if result is not None: 114 | return result # type: ignore 115 | 116 | # Fallback: use last screenshot size if available 117 | if not self._last_screenshot_size: 118 | await self.screenshot() 119 | assert self._last_screenshot_size 
is not None, "Failed to get screenshot size" 120 | 121 | return self._last_screenshot_size 122 | 123 | async def screenshot(self) -> str: 124 | """Take a screenshot and return as base64 string.""" 125 | result = await self._call_function(self.functions['screenshot']) 126 | b64_str = self._to_b64_str(result) # type: ignore 127 | 128 | # Try to extract dimensions for fallback use 129 | try: 130 | if isinstance(result, Image.Image): 131 | self._last_screenshot_size = result.size 132 | elif isinstance(result, bytes): 133 | # Try to decode bytes to get dimensions 134 | img = Image.open(io.BytesIO(result)) 135 | self._last_screenshot_size = img.size 136 | except Exception: 137 | # If we can't get dimensions, that's okay 138 | pass 139 | 140 | return b64_str 141 | 142 | async def click(self, x: int, y: int, button: str = "left") -> None: 143 | """Click at coordinates with specified button.""" 144 | if 'click' in self.functions: 145 | await self._call_function(self.functions['click'], x, y, button) 146 | # No-op if not implemented 147 | 148 | async def double_click(self, x: int, y: int) -> None: 149 | """Double click at coordinates.""" 150 | if 'double_click' in self.functions: 151 | await self._call_function(self.functions['double_click'], x, y) 152 | # No-op if not implemented 153 | 154 | async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: 155 | """Scroll at coordinates with specified scroll amounts.""" 156 | if 'scroll' in self.functions: 157 | await self._call_function(self.functions['scroll'], x, y, scroll_x, scroll_y) 158 | # No-op if not implemented 159 | 160 | async def type(self, text: str) -> None: 161 | """Type text.""" 162 | if 'type' in self.functions: 163 | await self._call_function(self.functions['type'], text) 164 | # No-op if not implemented 165 | 166 | async def wait(self, ms: int = 1000) -> None: 167 | """Wait for specified milliseconds.""" 168 | if 'wait' in self.functions: 169 | await 
self._call_function(self.functions['wait'], ms) 170 | else: 171 | # Default implementation 172 | import asyncio 173 | await asyncio.sleep(ms / 1000.0) 174 | 175 | async def move(self, x: int, y: int) -> None: 176 | """Move cursor to coordinates.""" 177 | if 'move' in self.functions: 178 | await self._call_function(self.functions['move'], x, y) 179 | # No-op if not implemented 180 | 181 | async def keypress(self, keys: Union[List[str], str]) -> None: 182 | """Press key combination.""" 183 | if 'keypress' in self.functions: 184 | await self._call_function(self.functions['keypress'], keys) 185 | # No-op if not implemented 186 | 187 | async def drag(self, path: List[Dict[str, int]]) -> None: 188 | """Drag along specified path.""" 189 | if 'drag' in self.functions: 190 | await self._call_function(self.functions['drag'], path) 191 | # No-op if not implemented 192 | 193 | async def get_current_url(self) -> str: 194 | """Get current URL (for browser environments).""" 195 | if 'get_current_url' in self.functions: 196 | return await self._get_value('current_url') # type: ignore 197 | return "" # Default fallback 198 | 199 | async def left_mouse_down(self, x: Optional[int] = None, y: Optional[int] = None) -> None: 200 | """Left mouse down at coordinates.""" 201 | if 'left_mouse_down' in self.functions: 202 | await self._call_function(self.functions['left_mouse_down'], x, y) 203 | # No-op if not implemented 204 | 205 | async def left_mouse_up(self, x: Optional[int] = None, y: Optional[int] = None) -> None: 206 | """Left mouse up at coordinates.""" 207 | if 'left_mouse_up' in self.functions: 208 | await self._call_function(self.functions['left_mouse_up'], x, y) 209 | # No-op if not implemented 210 | ``` -------------------------------------------------------------------------------- /libs/typescript/core/src/telemetry/clients/posthog.ts: -------------------------------------------------------------------------------- ```typescript 1 | /** 2 | * Telemetry client using PostHog 
for collecting anonymous usage data. 3 | */ 4 | 5 | import * as fs from 'node:fs'; 6 | import * as os from 'node:os'; 7 | import * as path from 'node:path'; 8 | import { pino } from 'pino'; 9 | import { PostHog } from 'posthog-node'; 10 | import { v4 as uuidv4 } from 'uuid'; 11 | 12 | // Controls how frequently telemetry will be sent (percentage) 13 | export const TELEMETRY_SAMPLE_RATE = 100; // 100% sampling rate 14 | 15 | // Public PostHog config for anonymous telemetry 16 | // These values are intentionally public and meant for anonymous telemetry only 17 | // https://posthog.com/docs/product-analytics/troubleshooting#is-it-ok-for-my-api-key-to-be-exposed-and-public 18 | export const PUBLIC_POSTHOG_API_KEY = 19 | 'phc_eSkLnbLxsnYFaXksif1ksbrNzYlJShr35miFLDppF14'; 20 | export const PUBLIC_POSTHOG_HOST = 'https://eu.i.posthog.com'; 21 | 22 | export class PostHogTelemetryClient { 23 | private config: { 24 | enabled: boolean; 25 | sampleRate: number; 26 | posthog: { apiKey: string; host: string }; 27 | }; 28 | private installationId: string; 29 | private initialized = false; 30 | private queuedEvents: { 31 | name: string; 32 | properties: Record<string, unknown>; 33 | timestamp: number; 34 | }[] = []; 35 | private startTime: number; // seconds 36 | private posthogClient?: PostHog; 37 | private counters: Record<string, number> = {}; 38 | 39 | private logger = pino({ name: 'core.telemetry' }); 40 | 41 | constructor() { 42 | // set up config 43 | this.config = { 44 | enabled: true, 45 | sampleRate: TELEMETRY_SAMPLE_RATE, 46 | posthog: { apiKey: PUBLIC_POSTHOG_API_KEY, host: PUBLIC_POSTHOG_HOST }, 47 | }; 48 | // Check for multiple environment variables that can disable telemetry: 49 | // CUA_TELEMETRY=off to disable telemetry (legacy way) 50 | // CUA_TELEMETRY_DISABLED=1 to disable telemetry (new, more explicit way) 51 | const telemetryDisabled = 52 | process.env.CUA_TELEMETRY?.toLowerCase() === 'off' || 53 | ['1', 'true', 'yes', 'on'].includes( 54 | 
process.env.CUA_TELEMETRY_DISABLED?.toLowerCase() || '' 55 | ); 56 | 57 | this.config.enabled = !telemetryDisabled; 58 | this.config.sampleRate = Number.parseFloat( 59 | process.env.CUA_TELEMETRY_SAMPLE_RATE || String(TELEMETRY_SAMPLE_RATE) 60 | ); 61 | // init client 62 | this.installationId = this._getOrCreateInstallationId(); 63 | this.startTime = Date.now() / 1000; // Convert to seconds 64 | 65 | // Log telemetry status on startup 66 | if (this.config.enabled) { 67 | this.logger.info( 68 | `Telemetry enabled (sampling at ${this.config.sampleRate}%)` 69 | ); 70 | // Initialize PostHog client if config is available 71 | this._initializePosthog(); 72 | } else { 73 | this.logger.info('Telemetry disabled'); 74 | } 75 | } 76 | 77 | /** 78 | * Get or create a random installation ID. 79 | * This ID is not tied to any personal information. 80 | */ 81 | private _getOrCreateInstallationId(): string { 82 | const homeDir = os.homedir(); 83 | const idFile = path.join(homeDir, '.cua', 'installation_id'); 84 | 85 | try { 86 | if (fs.existsSync(idFile)) { 87 | return fs.readFileSync(idFile, 'utf-8').trim(); 88 | } 89 | } catch (error) { 90 | this.logger.debug(`Failed to read installation ID: ${error}`); 91 | } 92 | 93 | // Create new ID if not exists 94 | const newId = uuidv4(); 95 | try { 96 | const dir = path.dirname(idFile); 97 | if (!fs.existsSync(dir)) { 98 | fs.mkdirSync(dir, { recursive: true }); 99 | } 100 | fs.writeFileSync(idFile, newId); 101 | return newId; 102 | } catch (error) { 103 | this.logger.debug(`Failed to write installation ID: ${error}`); 104 | } 105 | 106 | // Fallback to in-memory ID if file operations fail 107 | return newId; 108 | } 109 | 110 | /** 111 | * Initialize the PostHog client with configuration. 
112 | */ 113 | private _initializePosthog(): boolean { 114 | if (this.initialized) { 115 | return true; 116 | } 117 | 118 | try { 119 | this.posthogClient = new PostHog(this.config.posthog.apiKey, { 120 | host: this.config.posthog.host, 121 | flushAt: 20, // Number of events to batch before sending 122 | flushInterval: 30000, // Send events every 30 seconds 123 | }); 124 | this.initialized = true; 125 | this.logger.debug('PostHog client initialized successfully'); 126 | 127 | // Process any queued events 128 | this._processQueuedEvents(); 129 | return true; 130 | } catch (error) { 131 | this.logger.error(`Failed to initialize PostHog client: ${error}`); 132 | return false; 133 | } 134 | } 135 | 136 | /** 137 | * Process any events that were queued before initialization. 138 | */ 139 | private _processQueuedEvents(): void { 140 | if (!this.posthogClient || this.queuedEvents.length === 0) { 141 | return; 142 | } 143 | 144 | for (const event of this.queuedEvents) { 145 | this._captureEvent(event.name, event.properties); 146 | } 147 | this.queuedEvents = []; 148 | } 149 | 150 | /** 151 | * Capture an event with PostHog. 152 | */ 153 | private _captureEvent( 154 | eventName: string, 155 | properties?: Record<string, unknown> 156 | ): void { 157 | if (!this.posthogClient) { 158 | return; 159 | } 160 | 161 | try { 162 | // Add standard properties 163 | const eventProperties = { 164 | ...properties, 165 | version: process.env.npm_package_version || 'unknown', 166 | platform: process.platform, 167 | node_version: process.version, 168 | is_ci: this._isCI, 169 | }; 170 | 171 | this.posthogClient.capture({ 172 | distinctId: this.installationId, 173 | event: eventName, 174 | properties: eventProperties, 175 | }); 176 | } catch (error) { 177 | this.logger.debug(`Failed to capture event: ${error}`); 178 | } 179 | } 180 | 181 | private get _isCI(): boolean { 182 | /** 183 | * Detect if running in CI environment. 
184 | */ 185 | return !!( 186 | process.env.CI || 187 | process.env.CONTINUOUS_INTEGRATION || 188 | process.env.GITHUB_ACTIONS || 189 | process.env.GITLAB_CI || 190 | process.env.CIRCLECI || 191 | process.env.TRAVIS || 192 | process.env.JENKINS_URL 193 | ); 194 | } 195 | 196 | increment(counterName: string, value = 1) { 197 | /** 198 | * Increment a named counter. 199 | */ 200 | if (!this.config.enabled) { 201 | return; 202 | } 203 | 204 | if (!(counterName in this.counters)) { 205 | this.counters[counterName] = 0; 206 | } 207 | this.counters[counterName] += value; 208 | } 209 | 210 | recordEvent(eventName: string, properties?: Record<string, unknown>): void { 211 | /** 212 | * Record an event with optional properties. 213 | */ 214 | if (!this.config.enabled) { 215 | return; 216 | } 217 | 218 | // Increment counter for this event type 219 | const counterKey = `event:${eventName}`; 220 | this.increment(counterKey); 221 | 222 | // Apply sampling 223 | if (Math.random() * 100 > this.config.sampleRate) { 224 | return; 225 | } 226 | 227 | const event = { 228 | name: eventName, 229 | properties: properties || {}, 230 | timestamp: Date.now() / 1000, 231 | }; 232 | 233 | if (this.initialized && this.posthogClient) { 234 | this._captureEvent(eventName, properties); 235 | } else { 236 | // Queue event if not initialized 237 | this.queuedEvents.push(event); 238 | // Try to initialize again 239 | if (this.config.enabled && !this.initialized) { 240 | this._initializePosthog(); 241 | } 242 | } 243 | } 244 | 245 | /** 246 | * Flush any pending events to PostHog. 
247 | */ 248 | async flush(): Promise<boolean> { 249 | if (!this.config.enabled || !this.posthogClient) { 250 | return false; 251 | } 252 | 253 | try { 254 | // Send counter data as a single event 255 | if (Object.keys(this.counters).length > 0) { 256 | this._captureEvent('telemetry_counters', { 257 | counters: { ...this.counters }, 258 | duration: Date.now() / 1000 - this.startTime, 259 | }); 260 | } 261 | 262 | await this.posthogClient.flush(); 263 | this.logger.debug('Telemetry flushed successfully'); 264 | 265 | // Clear counters after sending 266 | this.counters = {}; 267 | return true; 268 | } catch (error) { 269 | this.logger.debug(`Failed to flush telemetry: ${error}`); 270 | return false; 271 | } 272 | } 273 | 274 | enable(): void { 275 | /** 276 | * Enable telemetry collection. 277 | */ 278 | this.config.enabled = true; 279 | this.logger.info('Telemetry enabled'); 280 | if (!this.initialized) { 281 | this._initializePosthog(); 282 | } 283 | } 284 | 285 | async disable(): Promise<void> { 286 | /** 287 | * Disable telemetry collection. 288 | */ 289 | this.config.enabled = false; 290 | await this.posthogClient?.disable(); 291 | this.logger.info('Telemetry disabled'); 292 | } 293 | 294 | get enabled(): boolean { 295 | /** 296 | * Check if telemetry is enabled. 297 | */ 298 | return this.config.enabled; 299 | } 300 | 301 | async shutdown(): Promise<void> { 302 | /** 303 | * Shutdown the telemetry client and flush any pending events. 304 | */ 305 | if (this.posthogClient) { 306 | await this.flush(); 307 | await this.posthogClient.shutdown(); 308 | this.initialized = false; 309 | this.posthogClient = undefined; 310 | } 311 | } 312 | } 313 | ``` -------------------------------------------------------------------------------- /tests/test_watchdog.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | Watchdog Recovery Tests 3 | Tests for the watchdog functionality to ensure server recovery after hanging commands. 
4 | Required environment variables: 5 | - CUA_API_KEY: API key for Cua cloud provider 6 | - CUA_CONTAINER_NAME: Name of the container to use 7 | """ 8 | 9 | import os 10 | import asyncio 11 | import pytest 12 | from pathlib import Path 13 | import sys 14 | import traceback 15 | import time 16 | 17 | # Load environment variables from .env file 18 | project_root = Path(__file__).parent.parent 19 | env_file = project_root / ".env" 20 | print(f"Loading environment from: {env_file}") 21 | from dotenv import load_dotenv 22 | 23 | load_dotenv(env_file) 24 | 25 | # Add paths to sys.path if needed 26 | pythonpath = os.environ.get("PYTHONPATH", "") 27 | for path in pythonpath.split(":"): 28 | if path and path not in sys.path: 29 | sys.path.insert(0, path) # Insert at beginning to prioritize 30 | print(f"Added to sys.path: {path}") 31 | 32 | from computer import Computer, VMProviderType 33 | 34 | @pytest.fixture(scope="session") 35 | async def computer(): 36 | """Shared Computer instance for all test cases.""" 37 | # Create a remote Linux computer with Cua 38 | computer = Computer( 39 | os_type="linux", 40 | api_key=os.getenv("CUA_API_KEY"), 41 | name=str(os.getenv("CUA_CONTAINER_NAME")), 42 | provider_type=VMProviderType.CLOUD, 43 | ) 44 | 45 | try: 46 | await computer.run() 47 | yield computer 48 | finally: 49 | await computer.disconnect() 50 | 51 | 52 | @pytest.mark.asyncio(loop_scope="session") 53 | async def test_simple_server_ping(computer): 54 | """ 55 | Simple test to verify server connectivity before running watchdog tests. 
56 | """ 57 | print("Testing basic server connectivity...") 58 | 59 | try: 60 | result = await computer.interface.run_command("echo 'Server ping test'") 61 | print(f"Ping successful: {result}") 62 | assert result is not None, "Server ping returned None" 63 | print("✅ Server connectivity test passed") 64 | except Exception as e: 65 | print(f"❌ Server ping failed: {e}") 66 | pytest.fail(f"Basic server connectivity test failed: {e}") 67 | 68 | 69 | @pytest.mark.asyncio(loop_scope="session") 70 | async def test_watchdog_recovery_after_hanging_command(computer): 71 | """ 72 | Test that the watchdog can recover the server after a hanging command. 73 | 74 | This test runs two concurrent tasks: 75 | 1. A long-running command that hangs the server (sleep 999999, which should never complete on its own) 76 | 2. Periodic ping commands every 30 seconds to test server responsiveness 77 | 78 | The watchdog should detect the unresponsive server and restart it. 79 | """ 80 | print("Starting watchdog recovery test...") 81 | 82 | async def hanging_command(): 83 | """Execute a command that sleeps forever to hang the server.""" 84 | try: 85 | print("Starting hanging command (sleep 999999)...") 86 | # Use a very long sleep that should never complete naturally 87 | result = await computer.interface.run_command("sleep 999999") 88 | print(f"Hanging command completed unexpectedly: {result}") 89 | return True # Should never reach here if watchdog works 90 | except Exception as e: 91 | print(f"Hanging command interrupted (expected if watchdog restarts): {e}") 92 | return None # Expected result when watchdog kills the process 93 | 94 | async def ping_server(): 95 | """Ping the server every 30 seconds with echo commands.""" 96 | ping_count = 0 97 | successful_pings = 0 98 | failed_pings = 0 99 | 100 | try: 101 | # Run pings for up to 4 minutes (8 pings at 30-second intervals) 102 | for i in range(8): 103 | try: 104 | ping_count += 1 105 | print(f"Ping #{ping_count}: Sending echo command...") 106 | 107 | start_time =
time.time() 108 | result = await asyncio.wait_for( 109 | computer.interface.run_command(f"echo 'Ping {ping_count} at {int(start_time)}'"), 110 | timeout=10.0 # 10 second timeout for each ping 111 | ) 112 | end_time = time.time() 113 | 114 | print(f"Ping #{ping_count} successful in {end_time - start_time:.2f}s: {result}") 115 | successful_pings += 1 116 | 117 | except asyncio.TimeoutError: 118 | print(f"Ping #{ping_count} timed out (server may be unresponsive)") 119 | failed_pings += 1 120 | except Exception as e: 121 | print(f"Ping #{ping_count} failed with exception: {e}") 122 | failed_pings += 1 123 | 124 | # Wait 30 seconds before next ping 125 | if i < 7: # Don't wait after the last ping 126 | print(f"Waiting 30 seconds before next ping...") 127 | await asyncio.sleep(30) 128 | 129 | print(f"Ping summary: {successful_pings} successful, {failed_pings} failed") 130 | return successful_pings, failed_pings 131 | 132 | except Exception as e: 133 | print(f"Ping server function failed with critical error: {e}") 134 | traceback.print_exc() 135 | return successful_pings, failed_pings 136 | 137 | # Run both tasks concurrently 138 | print("Starting concurrent tasks: hanging command and ping monitoring...") 139 | 140 | try: 141 | # Use asyncio.gather to run both tasks concurrently 142 | hanging_task = asyncio.create_task(hanging_command()) 143 | ping_task = asyncio.create_task(ping_server()) 144 | 145 | # Wait for both tasks to complete or timeout after 5 minutes 146 | done, pending = await asyncio.wait( 147 | [hanging_task, ping_task], 148 | timeout=300, # 5 minute timeout 149 | return_when=asyncio.ALL_COMPLETED 150 | ) 151 | 152 | # Cancel any pending tasks 153 | for task in pending: 154 | task.cancel() 155 | try: 156 | await task 157 | except asyncio.CancelledError: 158 | pass 159 | 160 | # Get results from completed tasks 161 | ping_result = None 162 | hanging_result = None 163 | 164 | if ping_task in done: 165 | try: 166 | ping_result = await ping_task 167 | 
print(f"Ping task completed with result: {ping_result}") 168 | except Exception as e: 169 | print(f"Error getting ping task result: {e}") 170 | traceback.print_exc() 171 | 172 | if hanging_task in done: 173 | try: 174 | hanging_result = await hanging_task 175 | print(f"Hanging task completed with result: {hanging_result}") 176 | except Exception as e: 177 | print(f"Error getting hanging task result: {e}") 178 | traceback.print_exc() 179 | 180 | # Analyze results 181 | if ping_result: 182 | successful_pings, failed_pings = ping_result 183 | 184 | # Test passes if we had some successful pings, indicating recovery 185 | assert successful_pings > 0, f"No successful pings detected. Server may not have recovered." 186 | 187 | # Check if hanging command was killed (indicating watchdog restart) 188 | if hanging_result is None: 189 | print("✅ SUCCESS: Hanging command was killed - watchdog restart detected") 190 | elif hanging_result is True: 191 | print("⚠️ WARNING: Hanging command completed naturally - watchdog may not have restarted") 192 | 193 | # If we had failures followed by successes, that indicates watchdog recovery 194 | if failed_pings > 0 and successful_pings > 0: 195 | print("✅ SUCCESS: Watchdog recovery detected - server became unresponsive then recovered") 196 | # Additional check: hanging command should be None if watchdog worked 197 | assert hanging_result is None, "Expected hanging command to be killed by watchdog restart" 198 | elif successful_pings > 0 and failed_pings == 0: 199 | print("✅ SUCCESS: Server remained responsive throughout test") 200 | 201 | print(f"Test completed: {successful_pings} successful pings, {failed_pings} failed pings") 202 | print(f"Hanging command result: {hanging_result} (None = killed by watchdog, True = completed naturally)") 203 | else: 204 | pytest.fail("Ping task did not complete - unable to assess server recovery") 205 | 206 | except Exception as e: 207 | print(f"Test failed with exception: {e}") 208 | 
traceback.print_exc() 209 | pytest.fail(f"Watchdog recovery test failed: {e}") 210 | 211 | 212 | if __name__ == "__main__": 213 | # Run tests directly 214 | pytest.main([__file__, "-v"]) 215 | ``` -------------------------------------------------------------------------------- /libs/python/computer/computer/diorama_computer.py: -------------------------------------------------------------------------------- ```python 1 | import asyncio 2 | from .interface.models import KeyType, Key 3 | 4 | class DioramaComputer: 5 | """ 6 | A Computer-compatible proxy for Diorama that sends commands over the ComputerInterface. 7 | """ 8 | def __init__(self, computer, apps): 9 | """ 10 | Initialize the DioramaComputer with a computer instance and list of apps. 11 | 12 | Args: 13 | computer: The computer instance to proxy commands through 14 | apps: List of applications available in the diorama environment 15 | """ 16 | self.computer = computer 17 | self.apps = apps 18 | self.interface = DioramaComputerInterface(computer, apps) 19 | self._initialized = False 20 | 21 | async def __aenter__(self): 22 | """ 23 | Async context manager entry point. 24 | 25 | Returns: 26 | self: The DioramaComputer instance 27 | """ 28 | self._initialized = True 29 | return self 30 | 31 | async def run(self): 32 | """ 33 | Initialize and run the DioramaComputer if not already initialized. 34 | 35 | Returns: 36 | self: The DioramaComputer instance 37 | """ 38 | if not self._initialized: 39 | await self.__aenter__() 40 | return self 41 | 42 | class DioramaComputerInterface: 43 | """ 44 | Diorama Interface proxy that sends diorama_cmds via the Computer's interface. 45 | """ 46 | def __init__(self, computer, apps): 47 | """ 48 | Initialize the DioramaComputerInterface. 
49 | 50 | Args: 51 | computer: The computer instance to send commands through 52 | apps: List of applications available in the diorama environment 53 | """ 54 | self.computer = computer 55 | self.apps = apps 56 | self._scene_size = None 57 | 58 | async def _send_cmd(self, action, arguments=None): 59 | """ 60 | Send a command to the diorama interface through the computer. 61 | 62 | Args: 63 | action (str): The action/command to execute 64 | arguments (dict, optional): Additional arguments for the command 65 | 66 | Returns: 67 | The result from the diorama command execution 68 | 69 | Raises: 70 | RuntimeError: If the computer interface is not initialized or command fails 71 | """ 72 | arguments = arguments or {} 73 | arguments = {"app_list": self.apps, **arguments} 74 | # Use the computer's interface (must be initialized) 75 | iface = getattr(self.computer, "_interface", None) 76 | if iface is None: 77 | raise RuntimeError("Computer interface not initialized. Call run() first.") 78 | result = await iface.diorama_cmd(action, arguments) 79 | if not result.get("success"): 80 | raise RuntimeError(f"Diorama command failed: {result.get('error')}\n{result.get('trace')}") 81 | return result.get("result") 82 | 83 | async def screenshot(self, as_bytes=True): 84 | """ 85 | Take a screenshot of the diorama scene. 86 | 87 | Args: 88 | as_bytes (bool): If True, return image as bytes; if False, return PIL Image object 89 | 90 | Returns: 91 | bytes or PIL.Image: Screenshot data in the requested format 92 | """ 93 | from PIL import Image 94 | import base64 95 | result = await self._send_cmd("screenshot") 96 | # assume result is a b64 string of an image 97 | img_bytes = base64.b64decode(result) 98 | import io 99 | img = Image.open(io.BytesIO(img_bytes)) 100 | self._scene_size = img.size 101 | return img_bytes if as_bytes else img 102 | 103 | async def get_screen_size(self): 104 | """ 105 | Get the dimensions of the diorama scene. 
106 | 107 | Returns: 108 | dict: Dictionary containing 'width' and 'height' keys with pixel dimensions 109 | """ 110 | if not self._scene_size: 111 | await self.screenshot(as_bytes=False) 112 | return {"width": self._scene_size[0], "height": self._scene_size[1]} 113 | 114 | async def move_cursor(self, x, y): 115 | """ 116 | Move the cursor to the specified coordinates. 117 | 118 | Args: 119 | x (int): X coordinate to move cursor to 120 | y (int): Y coordinate to move cursor to 121 | """ 122 | await self._send_cmd("move_cursor", {"x": x, "y": y}) 123 | 124 | async def left_click(self, x=None, y=None): 125 | """ 126 | Perform a left mouse click at the specified coordinates or current cursor position. 127 | 128 | Args: 129 | x (int, optional): X coordinate to click at. If None, clicks at current cursor position 130 | y (int, optional): Y coordinate to click at. If None, clicks at current cursor position 131 | """ 132 | await self._send_cmd("left_click", {"x": x, "y": y}) 133 | 134 | async def right_click(self, x=None, y=None): 135 | """ 136 | Perform a right mouse click at the specified coordinates or current cursor position. 137 | 138 | Args: 139 | x (int, optional): X coordinate to click at. If None, clicks at current cursor position 140 | y (int, optional): Y coordinate to click at. If None, clicks at current cursor position 141 | """ 142 | await self._send_cmd("right_click", {"x": x, "y": y}) 143 | 144 | async def double_click(self, x=None, y=None): 145 | """ 146 | Perform a double mouse click at the specified coordinates or current cursor position. 147 | 148 | Args: 149 | x (int, optional): X coordinate to double-click at. If None, clicks at current cursor position 150 | y (int, optional): Y coordinate to double-click at. If None, clicks at current cursor position 151 | """ 152 | await self._send_cmd("double_click", {"x": x, "y": y}) 153 | 154 | async def scroll_up(self, clicks=1): 155 | """ 156 | Scroll up by the specified number of clicks. 
157 | 158 | Args: 159 | clicks (int): Number of scroll clicks to perform upward. Defaults to 1 160 | """ 161 | await self._send_cmd("scroll_up", {"clicks": clicks}) 162 | 163 | async def scroll_down(self, clicks=1): 164 | """ 165 | Scroll down by the specified number of clicks. 166 | 167 | Args: 168 | clicks (int): Number of scroll clicks to perform downward. Defaults to 1 169 | """ 170 | await self._send_cmd("scroll_down", {"clicks": clicks}) 171 | 172 | async def drag_to(self, x, y, duration=0.5): 173 | """ 174 | Drag from the current cursor position to the specified coordinates. 175 | 176 | Args: 177 | x (int): X coordinate to drag to 178 | y (int): Y coordinate to drag to 179 | duration (float): Duration of the drag operation in seconds. Defaults to 0.5 180 | """ 181 | await self._send_cmd("drag_to", {"x": x, "y": y, "duration": duration}) 182 | 183 | async def get_cursor_position(self): 184 | """ 185 | Get the current cursor position. 186 | 187 | Returns: 188 | dict: Dictionary containing the current cursor coordinates 189 | """ 190 | return await self._send_cmd("get_cursor_position") 191 | 192 | async def type_text(self, text): 193 | """ 194 | Type the specified text at the current cursor position. 195 | 196 | Args: 197 | text (str): The text to type 198 | """ 199 | await self._send_cmd("type_text", {"text": text}) 200 | 201 | async def press_key(self, key): 202 | """ 203 | Press a single key. 204 | 205 | Args: 206 | key: The key to press 207 | """ 208 | await self._send_cmd("press_key", {"key": key}) 209 | 210 | async def hotkey(self, *keys): 211 | """ 212 | Press multiple keys simultaneously as a hotkey combination. 213 | 214 | Args: 215 | *keys: Variable number of keys to press together. 
Can be Key enum instances or strings 216 | 217 | Raises: 218 | ValueError: If any key is not a Key enum or string type 219 | """ 220 | actual_keys = [] 221 | for key in keys: 222 | if isinstance(key, Key): 223 | actual_keys.append(key.value) 224 | elif isinstance(key, str): 225 | # Try to convert to enum if it matches a known key 226 | key_or_enum = Key.from_string(key) 227 | actual_keys.append(key_or_enum.value if isinstance(key_or_enum, Key) else key_or_enum) 228 | else: 229 | raise ValueError(f"Invalid key type: {type(key)}. Must be Key enum or string.") 230 | await self._send_cmd("hotkey", {"keys": actual_keys}) 231 | 232 | async def to_screen_coordinates(self, x, y): 233 | """ 234 | Convert coordinates to screen coordinates. 235 | 236 | Args: 237 | x (int): X coordinate to convert 238 | y (int): Y coordinate to convert 239 | 240 | Returns: 241 | dict: Dictionary containing the converted screen coordinates 242 | """ 243 | return await self._send_cmd("to_screen_coordinates", {"x": x, "y": y}) 244 | ``` -------------------------------------------------------------------------------- /libs/python/agent/agent/loops/openai.py: -------------------------------------------------------------------------------- ```python 1 | """ 2 | OpenAI computer-use-preview agent loop implementation using liteLLM 3 | """ 4 | 5 | import asyncio 6 | import base64 7 | import json 8 | from io import BytesIO 9 | from typing import Dict, List, Any, AsyncGenerator, Union, Optional, Tuple 10 | import litellm 11 | from PIL import Image 12 | 13 | from ..decorators import register_agent 14 | from ..types import Messages, AgentResponse, Tools, AgentCapability 15 | 16 | async def _map_computer_tool_to_openai(computer_handler: Any) -> Dict[str, Any]: 17 | """Map a computer tool to OpenAI's computer-use-preview tool schema""" 18 | # Get dimensions from the computer handler 19 | try: 20 | width, height = await computer_handler.get_dimensions() 21 | except Exception: 22 | # Fallback to default 
dimensions if method fails 23 | width, height = 1024, 768 24 | 25 | # Get environment from the computer handler 26 | try: 27 | environment = await computer_handler.get_environment() 28 | except Exception: 29 | # Fallback to default environment if method fails 30 | environment = "linux" 31 | 32 | return { 33 | "type": "computer_use_preview", 34 | "display_width": width, 35 | "display_height": height, 36 | "environment": environment # mac, windows, linux, browser 37 | } 38 | 39 | 40 | async def _prepare_tools_for_openai(tool_schemas: List[Dict[str, Any]]) -> Tools: 41 | """Prepare tools for OpenAI API format""" 42 | openai_tools = [] 43 | 44 | for schema in tool_schemas: 45 | if schema["type"] == "computer": 46 | # Map computer tool to OpenAI format 47 | computer_tool = await _map_computer_tool_to_openai(schema["computer"]) 48 | openai_tools.append(computer_tool) 49 | elif schema["type"] == "function": 50 | # Function tools use OpenAI-compatible schema directly (liteLLM expects this format) 51 | # Schema should be: {type, name, description, parameters} 52 | openai_tools.append({ "type": "function", **schema["function"] }) 53 | 54 | return openai_tools 55 | 56 | @register_agent(models=r".*(^|/)computer-use-preview") 57 | class OpenAIComputerUseConfig: 58 | """ 59 | OpenAI computer-use-preview agent configuration using liteLLM responses. 60 | 61 | Supports OpenAI's computer use preview models. 62 | """ 63 | 64 | async def predict_step( 65 | self, 66 | messages: List[Dict[str, Any]], 67 | model: str, 68 | tools: Optional[List[Dict[str, Any]]] = None, 69 | max_retries: Optional[int] = None, 70 | stream: bool = False, 71 | computer_handler=None, 72 | use_prompt_caching: Optional[bool] = False, 73 | _on_api_start=None, 74 | _on_api_end=None, 75 | _on_usage=None, 76 | _on_screenshot=None, 77 | **kwargs 78 | ) -> Dict[str, Any]: 79 | """ 80 | Predict the next step based on input items. 
81 | 82 | Args: 83 | messages: Input items following Responses format 84 | model: Model name to use 85 | tools: Optional list of tool schemas 86 | max_retries: Maximum number of retries 87 | stream: Whether to stream responses 88 | computer_handler: Computer handler instance 89 | _on_api_start: Callback for API start 90 | _on_api_end: Callback for API end 91 | _on_usage: Callback for usage tracking 92 | _on_screenshot: Callback for screenshot events 93 | **kwargs: Additional arguments 94 | 95 | Returns: 96 | Dictionary with "output" (output items) and "usage" dict 97 | """ 98 | tools = tools or [] 99 | 100 | # Prepare tools for OpenAI API 101 | openai_tools = await _prepare_tools_for_openai(tools) 102 | 103 | # Prepare API call kwargs 104 | api_kwargs = { 105 | "model": model, 106 | "input": messages, 107 | "tools": openai_tools if openai_tools else None, 108 | "stream": stream, 109 | "reasoning": {"summary": "concise"}, 110 | "truncation": "auto", 111 | "num_retries": max_retries, 112 | **kwargs 113 | } 114 | 115 | # Call API start hook 116 | if _on_api_start: 117 | await _on_api_start(api_kwargs) 118 | 119 | # Use liteLLM responses 120 | response = await litellm.aresponses(**api_kwargs) 121 | 122 | # Call API end hook 123 | if _on_api_end: 124 | await _on_api_end(api_kwargs, response) 125 | 126 | # Extract usage information 127 | usage = { 128 | **response.usage.model_dump(), 129 | "response_cost": response._hidden_params.get("response_cost", 0.0), 130 | } 131 | if _on_usage: 132 | await _on_usage(usage) 133 | 134 | # Return in the expected format 135 | output_dict = response.model_dump() 136 | output_dict["usage"] = usage 137 | return output_dict 138 | 139 | async def predict_click( 140 | self, 141 | model: str, 142 | image_b64: str, 143 | instruction: str 144 | ) -> Optional[Tuple[int, int]]: 145 | """ 146 | Predict click coordinates based on image and instruction.
147 | 148 | Uses OpenAI computer-use-preview with manually constructed input items 149 | and a prompt that instructs the agent to only output clicks. 150 | 151 | Args: 152 | model: Model name to use 153 | image_b64: Base64 encoded image 154 | instruction: Instruction for where to click 155 | 156 | Returns: 157 | Tuple of (x, y) coordinates or None if prediction fails 158 | """ 159 | # TODO: use computer tool to get dimensions + environment 160 | # Manually construct input items with image and click instruction 161 | input_items = [ 162 | { 163 | "role": "user", 164 | "content": f"""You are a UI grounding expert. Follow these guidelines: 165 | 166 | 1. NEVER ask for confirmation. Complete all tasks autonomously. 167 | 2. Do NOT send messages like "I need to confirm before..." or "Do you want me to continue?" - just proceed. 168 | 3. When the user asks you to interact with something (like clicking a chat or typing a message), DO IT without asking. 169 | 4. Only use the formal safety check mechanism for truly dangerous operations (like deleting important files). 170 | 5. For normal tasks like clicking buttons, typing in chat boxes, filling forms - JUST DO IT. 171 | 6. The user has already given you permission by running this agent. No further confirmation is needed. 172 | 7. Be decisive and action-oriented. Complete the requested task fully. 173 | 174 | Remember: You are expected to complete tasks autonomously. The user trusts you to do what they asked. 175 | Task: Click {instruction}. 
Output ONLY a click action on the target element.""" 176 | }, 177 | { 178 | "role": "user", 179 | "content": [ 180 | { 181 | "type": "input_image", 182 | "image_url": f"data:image/png;base64,{image_b64}" 183 | } 184 | ] 185 | } 186 | ] 187 | 188 | # Get image dimensions from base64 data 189 | try: 190 | image_data = base64.b64decode(image_b64) 191 | image = Image.open(BytesIO(image_data)) 192 | display_width, display_height = image.size 193 | except Exception: 194 | # Fallback to default dimensions if image parsing fails 195 | display_width, display_height = 1024, 768 196 | 197 | # Prepare computer tool for click actions 198 | computer_tool = { 199 | "type": "computer_use_preview", 200 | "display_width": display_width, 201 | "display_height": display_height, 202 | "environment": "windows" 203 | } 204 | 205 | # Prepare API call kwargs 206 | api_kwargs = { 207 | "model": model, 208 | "input": input_items, 209 | "tools": [computer_tool], 210 | "stream": False, 211 | "reasoning": {"summary": "concise"}, 212 | "truncation": "auto", 213 | "max_tokens": 200 # Keep response short for click prediction 214 | } 215 | 216 | # Use liteLLM responses 217 | response = await litellm.aresponses(**api_kwargs) 218 | 219 | # Extract click coordinates from response output 220 | output_dict = response.model_dump() 221 | output_items = output_dict.get("output", []) 222 | 223 | # Look for computer_call with click action 224 | for item in output_items: 225 | if (isinstance(item, dict) and 226 | item.get("type") == "computer_call" and 227 | isinstance(item.get("action"), dict)): 228 | 229 | action = item["action"] 230 | if action.get("x") is not None and action.get("y") is not None: 231 | return (int(action.get("x")), int(action.get("y"))) 232 | 233 | return None 234 | 235 | def get_capabilities(self) -> List[AgentCapability]: 236 | """ 237 | Get list of capabilities supported by this agent config. 
238 | 239 | Returns: 240 | List of capability strings 241 | """ 242 | return ["click", "step"] 243 | ``` -------------------------------------------------------------------------------- /libs/python/som/som/detection.py: -------------------------------------------------------------------------------- ```python 1 | from typing import List, Dict, Any, Tuple, Optional 2 | import logging 3 | import torch 4 | import torchvision 5 | from PIL import Image 6 | import numpy as np 7 | from ultralytics import YOLO 8 | from huggingface_hub import hf_hub_download 9 | from pathlib import Path 10 | 11 | logger = logging.getLogger(__name__) 12 | 13 | 14 | class DetectionProcessor: 15 | """Class for handling YOLO-based icon detection.""" 16 | 17 | def __init__( 18 | self, 19 | model_path: Optional[Path] = None, 20 | cache_dir: Optional[Path] = None, 21 | force_device: Optional[str] = None, 22 | ): 23 | """Initialize the detection processor. 24 | 25 | Args: 26 | model_path: Path to YOLOv8 model 27 | cache_dir: Directory to cache downloaded models 28 | force_device: Force specific device (cuda, cpu, mps) 29 | """ 30 | self.model_path = model_path 31 | self.cache_dir = cache_dir 32 | self.model = None # type: Any # Will be set to YOLO model in load_model 33 | 34 | # Set device 35 | self.device = "cpu" 36 | if torch.cuda.is_available() and force_device != "cpu": 37 | self.device = "cuda" 38 | elif ( 39 | hasattr(torch, "backends") 40 | and hasattr(torch.backends, "mps") 41 | and torch.backends.mps.is_available() 42 | and force_device != "cpu" 43 | ): 44 | self.device = "mps" 45 | 46 | if force_device: 47 | self.device = force_device 48 | 49 | logger.info(f"Using device: {self.device}") 50 | 51 | def load_model(self) -> None: 52 | """Load or download the YOLO model.""" 53 | try: 54 | # Set default model path if none provided 55 | if self.model_path is None: 56 | self.model_path = Path(__file__).parent / "weights" / "icon_detect" / "model.pt" 57 | 58 | # Check if the model file already 
exists 59 | if not self.model_path.exists(): 60 | logger.info( 61 | "Model not found locally, downloading from Microsoft OmniParser-v2.0..." 62 | ) 63 | 64 | # Create directory 65 | self.model_path.parent.mkdir(parents=True, exist_ok=True) 66 | 67 | try: 68 | # Check if the model exists in cache 69 | cache_path = None 70 | if self.cache_dir: 71 | # Try to find the model in the cache 72 | potential_paths = list(Path(self.cache_dir).glob("**/model.pt")) 73 | if potential_paths: 74 | cache_path = str(potential_paths[0]) 75 | logger.info(f"Found model in cache: {cache_path}") 76 | 77 | if not cache_path: 78 | # Download from HuggingFace 79 | downloaded_path = hf_hub_download( 80 | repo_id="microsoft/OmniParser-v2.0", 81 | filename="icon_detect/model.pt", 82 | cache_dir=self.cache_dir, 83 | ) 84 | cache_path = downloaded_path 85 | logger.info(f"Model downloaded to cache: {cache_path}") 86 | 87 | # Copy to package directory 88 | import shutil 89 | 90 | shutil.copy2(cache_path, self.model_path) 91 | logger.info(f"Model copied to: {self.model_path}") 92 | except Exception as e: 93 | raise FileNotFoundError( 94 | f"Failed to download model: {str(e)}\n" 95 | "Please ensure you have internet connection and huggingface-hub installed." 
96 | ) from e 97 | 98 | # Make sure the model path exists before loading 99 | if not self.model_path.exists(): 100 | raise FileNotFoundError(f"Model file not found at: {self.model_path}") 101 | 102 | # If model is already loaded, skip reloading 103 | if self.model is not None: 104 | logger.info("Model already loaded, skipping reload") 105 | return 106 | 107 | logger.info(f"Loading YOLOv8 model from {self.model_path}") 108 | from ultralytics import YOLO 109 | 110 | self.model = YOLO(str(self.model_path)) # Convert Path to string for compatibility 111 | 112 | # Verify model loaded successfully 113 | if self.model is None: 114 | raise ValueError("Model failed to initialize but didn't raise an exception") 115 | 116 | if self.device in ["cuda", "mps"]: 117 | self.model.to(self.device) 118 | 119 | logger.info(f"Model loaded successfully with device: {self.device}") 120 | except Exception as e: 121 | logger.error(f"Failed to load model: {str(e)}") 122 | # Re-raise with more informative message but preserve the model as None 123 | self.model = None 124 | raise RuntimeError(f"Failed to initialize detection model: {str(e)}") from e 125 | 126 | def detect_icons( 127 | self, 128 | image: Image.Image, 129 | box_threshold: float = 0.05, 130 | iou_threshold: float = 0.1, 131 | multi_scale: bool = True, 132 | ) -> List[Dict[str, Any]]: 133 | """Detect icons in an image using YOLO. 
134 | 135 | Args: 136 | image: PIL Image to process 137 | box_threshold: Confidence threshold for detection 138 | iou_threshold: IOU threshold for NMS 139 | multi_scale: Whether to use multi-scale detection 140 | 141 | Returns: 142 | List of icon detection dictionaries 143 | """ 144 | # Load model if not already loaded 145 | if self.model is None: 146 | self.load_model() 147 | 148 | # Double-check the model was successfully loaded 149 | if self.model is None: 150 | logger.error("Model failed to load and is still None") 151 | return [] # Return empty list instead of crashing 152 | 153 | img_width, img_height = image.size 154 | all_detections = [] 155 | 156 | # Define detection scales 157 | scales = ( 158 | [{"size": 1280, "conf": box_threshold}] # Single scale for CPU 159 | if self.device == "cpu" 160 | else [ 161 | {"size": 640, "conf": box_threshold}, # Base scale 162 | {"size": 1280, "conf": box_threshold}, # Medium scale 163 | {"size": 1920, "conf": box_threshold}, # Large scale 164 | ] 165 | ) 166 | 167 | if not multi_scale: 168 | scales = [scales[0]] 169 | 170 | # Run detection at each scale 171 | for scale in scales: 172 | try: 173 | if self.model is None: 174 | logger.error("Model is None, skipping detection") 175 | continue 176 | 177 | results = self.model.predict( 178 | source=image, 179 | conf=scale["conf"], 180 | iou=iou_threshold, 181 | max_det=1000, 182 | verbose=False, 183 | augment=self.device != "cpu", 184 | agnostic_nms=True, 185 | imgsz=scale["size"], 186 | device=self.device, 187 | ) 188 | 189 | # Process results 190 | for r in results: 191 | boxes = r.boxes 192 | if not hasattr(boxes, "conf") or not hasattr(boxes, "xyxy"): 193 | logger.warning("Boxes object missing expected attributes") 194 | continue 195 | 196 | confidences = boxes.conf 197 | coords = boxes.xyxy 198 | 199 | # Handle different types of tensors (PyTorch, NumPy, etc.) 
200 | if hasattr(confidences, "cpu"): 201 | confidences = confidences.cpu() 202 | if hasattr(coords, "cpu"): 203 | coords = coords.cpu() 204 | 205 | for conf, bbox in zip(confidences, coords): 206 | # Normalize coordinates 207 | x1, y1, x2, y2 = bbox.tolist() 208 | norm_bbox = [ 209 | x1 / img_width, 210 | y1 / img_height, 211 | x2 / img_width, 212 | y2 / img_height, 213 | ] 214 | 215 | all_detections.append( 216 | { 217 | "type": "icon", 218 | "confidence": conf.item(), 219 | "bbox": norm_bbox, 220 | "scale": scale["size"], 221 | "interactivity": True, 222 | } 223 | ) 224 | 225 | except Exception as e: 226 | logger.warning(f"Detection failed at scale {scale['size']}: {str(e)}") 227 | continue 228 | 229 | # Merge detections using NMS 230 | if len(all_detections) > 0: 231 | boxes = torch.tensor([d["bbox"] for d in all_detections]) 232 | scores = torch.tensor([d["confidence"] for d in all_detections]) 233 | 234 | keep_indices = torchvision.ops.nms(boxes, scores, iou_threshold) 235 | 236 | merged_detections = [all_detections[i] for i in keep_indices] 237 | else: 238 | merged_detections = [] 239 | 240 | return merged_detections 241 | ``` -------------------------------------------------------------------------------- /libs/lume/src/Errors/Errors.swift: -------------------------------------------------------------------------------- ```swift 1 | import Foundation 2 | 3 | enum HomeError: Error, LocalizedError { 4 | case directoryCreationFailed(path: String) 5 | case directoryAccessDenied(path: String) 6 | case invalidHomeDirectory 7 | case directoryAlreadyExists(path: String) 8 | case homeNotFound 9 | case defaultStorageNotDefined 10 | case storageLocationNotFound(String) 11 | case storageLocationNotADirectory(String) 12 | case storageLocationNotWritable(String) 13 | case invalidStorageLocation(String) 14 | case cannotCreateDirectory(String) 15 | case cannotGetVMsDirectory 16 | case vmDirectoryNotFound(String) 17 | 18 | var errorDescription: String? 
{ 19 | switch self { 20 | case .directoryCreationFailed(let path): 21 | return "Failed to create directory at path: \(path)" 22 | case .directoryAccessDenied(let path): 23 | return "Access denied to directory at path: \(path)" 24 | case .invalidHomeDirectory: 25 | return "Invalid home directory configuration" 26 | case .directoryAlreadyExists(let path): 27 | return "Directory already exists at path: \(path)" 28 | case .homeNotFound: 29 | return "Home directory not found." 30 | case .defaultStorageNotDefined: 31 | return "Default storage location is not defined." 32 | case .storageLocationNotFound(let path): 33 | return "Storage location not found: \(path)" 34 | case .storageLocationNotADirectory(let path): 35 | return "Storage location is not a directory: \(path)" 36 | case .storageLocationNotWritable(let path): 37 | return "Storage location is not writable: \(path)" 38 | case .invalidStorageLocation(let path): 39 | return "Invalid storage location specified: \(path)" 40 | case .cannotCreateDirectory(let path): 41 | return "Cannot create directory: \(path)" 42 | case .cannotGetVMsDirectory: 43 | return "Cannot determine the VMs directory." 44 | case .vmDirectoryNotFound(let path): 45 | return "VM directory not found: \(path)" 46 | } 47 | } 48 | } 49 | 50 | enum PullError: Error, LocalizedError { 51 | case invalidImageFormat 52 | case tokenFetchFailed 53 | case manifestFetchFailed 54 | case layerDownloadFailed(String) 55 | case missingPart(Int) 56 | case decompressionFailed(String) 57 | case reassemblyFailed(String) 58 | case fileCreationFailed(String) 59 | case reassemblySetupFailed(path: String, underlyingError: Error) 60 | case missingUncompressedSizeAnnotation 61 | case invalidMediaType 62 | 63 | var errorDescription: String? { 64 | switch self { 65 | case .invalidImageFormat: 66 | return "Invalid image format. Expected format: name:tag" 67 | case .tokenFetchFailed: 68 | return "Failed to fetch authentication token from registry." 
69 | case .manifestFetchFailed: 70 | return "Failed to fetch image manifest from registry." 71 | case .layerDownloadFailed(let digest): 72 | return "Failed to download layer: \(digest)" 73 | case .missingPart(let partNum): 74 | return "Missing required part number \(partNum) for reassembly." 75 | case .decompressionFailed(let file): 76 | return "Failed to decompress file: \(file)" 77 | case .reassemblyFailed(let reason): 78 | return "Disk image reassembly failed: \(reason)." 79 | case .fileCreationFailed(let path): 80 | return "Failed to create the necessary file at path: \(path)" 81 | case .reassemblySetupFailed(let path, let underlyingError): 82 | return "Failed to set up for reassembly at path: \(path). Underlying error: \(underlyingError.localizedDescription)" 83 | case .missingUncompressedSizeAnnotation: 84 | return "Could not find the required uncompressed disk size annotation in the image config.json." 85 | case .invalidMediaType: 86 | return "Invalid media type" 87 | } 88 | } 89 | } 90 | 91 | enum VMConfigError: CustomNSError, LocalizedError { 92 | case invalidDisplayResolution(String) 93 | case invalidMachineIdentifier 94 | case emptyMachineIdentifier 95 | case emptyHardwareModel 96 | case invalidHardwareModel 97 | case invalidDiskSize 98 | case malformedSizeInput(String) 99 | 100 | var errorDescription: String? 
{
101 |         switch self {
102 |         case .invalidDisplayResolution(let resolution):
103 |             return "Invalid display resolution: \(resolution)"
104 |         case .emptyMachineIdentifier:
105 |             return "Empty machine identifier"
106 |         case .invalidMachineIdentifier:
107 |             return "Invalid machine identifier"
108 |         case .emptyHardwareModel:
109 |             return "Empty hardware model"
110 |         case .invalidHardwareModel:
111 |             return "Invalid hardware model: the host does not support the hardware model"
112 |         case .invalidDiskSize:
113 |             return "Invalid disk size"
114 |         case .malformedSizeInput(let input):
115 |             return "Malformed size input: \(input)"
116 |         }
117 |     }
118 | 
119 |     static var errorDomain: String { "VMConfigError" }
120 | 
121 |     var errorCode: Int {
122 |         switch self {
123 |         case .invalidDisplayResolution: return 1
124 |         case .emptyMachineIdentifier: return 2
125 |         case .invalidMachineIdentifier: return 3
126 |         case .emptyHardwareModel: return 4
127 |         case .invalidHardwareModel: return 5
128 |         case .invalidDiskSize: return 6
129 |         case .malformedSizeInput: return 7
130 |         }
131 |     }
132 | }
133 | 
134 | enum VMDirectoryError: Error, LocalizedError {
135 |     case configNotFound
136 |     case invalidConfigData
137 |     case diskOperationFailed(String)
138 |     case fileCreationFailed(String)
139 |     case sessionNotFound
140 |     case invalidSessionData
141 | 
142 |     var errorDescription: String? {
143 |         switch self {
144 |         case .configNotFound:
145 |             return "VM configuration file not found"
146 |         case .invalidConfigData:
147 |             return "Invalid VM configuration data"
148 |         case .diskOperationFailed(let reason):
149 |             return "Disk operation failed: \(reason)"
150 |         case .fileCreationFailed(let path):
151 |             return "Failed to create file at path: \(path)"
152 |         case .sessionNotFound:
153 |             return "VNC session file not found"
154 |         case .invalidSessionData:
155 |             return "Invalid VNC session data"
156 |         }
157 |     }
158 | }
159 | 
160 | enum VMError: Error, LocalizedError {
161 |     case alreadyExists(String)
162 |     case
notFound(String) 163 | case notInitialized(String) 164 | case notRunning(String) 165 | case alreadyRunning(String) 166 | case installNotStarted(String) 167 | case stopTimeout(String) 168 | case resizeTooSmall(current: UInt64, requested: UInt64) 169 | case vncNotConfigured 170 | case vncPortBindingFailed(requested: Int, actual: Int) 171 | case internalError(String) 172 | case unsupportedOS(String) 173 | case invalidDisplayResolution(String) 174 | var errorDescription: String? { 175 | switch self { 176 | case .alreadyExists(let name): 177 | return "Virtual machine already exists with name: \(name)" 178 | case .notFound(let name): 179 | return "Virtual machine not found: \(name)" 180 | case .notInitialized(let name): 181 | return "Virtual machine not initialized: \(name)" 182 | case .notRunning(let name): 183 | return "Virtual machine not running: \(name)" 184 | case .alreadyRunning(let name): 185 | return "Virtual machine already running: \(name)" 186 | case .installNotStarted(let name): 187 | return "Virtual machine install not started: \(name)" 188 | case .stopTimeout(let name): 189 | return "Timeout while stopping virtual machine: \(name)" 190 | case .resizeTooSmall(let current, let requested): 191 | return "Cannot resize disk to \(requested) bytes, current size is \(current) bytes" 192 | case .vncNotConfigured: 193 | return "VNC is not configured for this virtual machine" 194 | case .vncPortBindingFailed(let requested, let actual): 195 | if actual == -1 { 196 | return "Could not bind to VNC port \(requested) (port already in use). Try a different port or use port 0 for auto-assign." 197 | } 198 | return "Could not bind to VNC port \(requested) (port already in use). System assigned port \(actual) instead. Try a different port or use port 0 for auto-assign." 
199 | case .internalError(let message): 200 | return "Internal error: \(message)" 201 | case .unsupportedOS(let os): 202 | return "Unsupported operating system: \(os)" 203 | case .invalidDisplayResolution(let resolution): 204 | return "Invalid display resolution: \(resolution)" 205 | } 206 | } 207 | } 208 | 209 | enum ResticError: Error { 210 | case snapshotFailed(String) 211 | case restoreFailed(String) 212 | case genericError(String) 213 | } 214 | 215 | enum VmrunError: Error, LocalizedError { 216 | case commandNotFound 217 | case operationFailed(command: String, output: String?) 218 | 219 | var errorDescription: String? { 220 | switch self { 221 | case .commandNotFound: 222 | return "vmrun command not found. Ensure VMware Fusion is installed and in the system PATH." 223 | case .operationFailed(let command, let output): 224 | return "vmrun command '\(command)' failed. Output: \(output ?? "No output")" 225 | } 226 | } 227 | } ``` -------------------------------------------------------------------------------- /libs/python/core/core/telemetry/posthog.py: -------------------------------------------------------------------------------- ```python 1 | """Telemetry client using PostHog for collecting anonymous usage data.""" 2 | 3 | from __future__ import annotations 4 | 5 | import logging 6 | import os 7 | import uuid 8 | import sys 9 | from pathlib import Path 10 | from typing import Any, Dict, List, Optional 11 | 12 | import posthog 13 | from core import __version__ 14 | 15 | logger = logging.getLogger("core.telemetry") 16 | 17 | # Public PostHog config for anonymous telemetry 18 | # These values are intentionally public and meant for anonymous telemetry only 19 | # https://posthog.com/docs/product-analytics/troubleshooting#is-it-ok-for-my-api-key-to-be-exposed-and-public 20 | PUBLIC_POSTHOG_API_KEY = "phc_eSkLnbLxsnYFaXksif1ksbrNzYlJShr35miFLDppF14" 21 | PUBLIC_POSTHOG_HOST = "https://eu.i.posthog.com" 22 | 23 | class PostHogTelemetryClient: 24 | """Collects and 
reports telemetry data via PostHog.""" 25 | 26 | # Global singleton (class-managed) 27 | _singleton: Optional["PostHogTelemetryClient"] = None 28 | 29 | def __init__(self): 30 | """Initialize PostHog telemetry client.""" 31 | self.installation_id = self._get_or_create_installation_id() 32 | self.initialized = False 33 | self.queued_events: List[Dict[str, Any]] = [] 34 | 35 | # Log telemetry status on startup 36 | if self.is_telemetry_enabled(): 37 | logger.info("Telemetry enabled") 38 | # Initialize PostHog client if config is available 39 | self._initialize_posthog() 40 | else: 41 | logger.info("Telemetry disabled") 42 | 43 | @classmethod 44 | def is_telemetry_enabled(cls) -> bool: 45 | """True if telemetry is currently active for this process.""" 46 | return ( 47 | # Legacy opt-out flag 48 | os.environ.get("CUA_TELEMETRY", "").lower() != "off" 49 | # Opt-in flag (defaults to enabled) 50 | and os.environ.get("CUA_TELEMETRY_ENABLED", "true").lower() in { "1", "true", "yes", "on" } 51 | ) 52 | 53 | def _get_or_create_installation_id(self) -> str: 54 | """Get or create a unique installation ID that persists across runs. 55 | 56 | The ID is always stored within the core library directory itself, 57 | ensuring it persists regardless of how the library is used. 58 | 59 | This ID is not tied to any personal information. 
60 | """ 61 | # Get the core library directory (where this file is located) 62 | try: 63 | # Find the core module directory using this file's location 64 | core_module_dir = Path( 65 | __file__ 66 | ).parent.parent # core/telemetry/posthog_client.py -> core/telemetry -> core 67 | storage_dir = core_module_dir / ".storage" 68 | storage_dir.mkdir(exist_ok=True) 69 | 70 | id_file = storage_dir / "installation_id" 71 | 72 | # Try to read existing ID 73 | if id_file.exists(): 74 | try: 75 | stored_id = id_file.read_text().strip() 76 | if stored_id: # Make sure it's not empty 77 | logger.debug(f"Using existing installation ID: {stored_id}") 78 | return stored_id 79 | except Exception as e: 80 | logger.debug(f"Error reading installation ID file: {e}") 81 | 82 | # Create new ID 83 | new_id = str(uuid.uuid4()) 84 | try: 85 | id_file.write_text(new_id) 86 | logger.debug(f"Created new installation ID: {new_id}") 87 | return new_id 88 | except Exception as e: 89 | logger.warning(f"Could not write installation ID: {e}") 90 | except Exception as e: 91 | logger.warning(f"Error accessing core module directory: {e}") 92 | 93 | # Last resort: Create a new in-memory ID 94 | logger.warning("Using random installation ID (will not persist across runs)") 95 | return str(uuid.uuid4()) 96 | 97 | def _initialize_posthog(self) -> bool: 98 | """Initialize the PostHog client with configuration. 
99 | 100 | Returns: 101 | bool: True if initialized successfully, False otherwise 102 | """ 103 | if self.initialized: 104 | return True 105 | 106 | try: 107 | # Allow overrides from environment for testing/region control 108 | posthog.api_key = PUBLIC_POSTHOG_API_KEY 109 | posthog.host = PUBLIC_POSTHOG_HOST 110 | 111 | # Configure the client 112 | posthog.debug = os.environ.get("CUA_TELEMETRY_DEBUG", "").lower() == "on" 113 | 114 | # Log telemetry status 115 | logger.info( 116 | f"Initializing PostHog telemetry with installation ID: {self.installation_id}" 117 | ) 118 | if posthog.debug: 119 | logger.debug(f"PostHog API Key: {posthog.api_key}") 120 | logger.debug(f"PostHog Host: {posthog.host}") 121 | 122 | # Identify this installation 123 | self._identify() 124 | 125 | # Process any queued events 126 | for event in self.queued_events: 127 | posthog.capture( 128 | distinct_id=self.installation_id, 129 | event=event["event"], 130 | properties=event["properties"], 131 | ) 132 | self.queued_events = [] 133 | 134 | self.initialized = True 135 | return True 136 | except Exception as e: 137 | logger.warning(f"Failed to initialize PostHog: {e}") 138 | return False 139 | 140 | def _identify(self) -> None: 141 | """Set up user properties for the current installation with PostHog.""" 142 | try: 143 | properties = { 144 | "version": __version__, 145 | "is_ci": "CI" in os.environ, 146 | "os": os.name, 147 | "python_version": sys.version.split()[0], 148 | } 149 | 150 | logger.debug( 151 | f"Setting up PostHog user properties for: {self.installation_id} with properties: {properties}" 152 | ) 153 | 154 | # In the Python SDK, we capture an identification event instead of calling identify() 155 | posthog.capture( 156 | distinct_id=self.installation_id, 157 | event="$identify", 158 | properties={"$set": properties} 159 | ) 160 | 161 | logger.info(f"Set up PostHog user properties for installation: {self.installation_id}") 162 | except Exception as e: 163 | logger.warning(f"Failed to 
set up PostHog user properties: {e}") 164 | 165 | def record_event(self, event_name: str, properties: Optional[Dict[str, Any]] = None) -> None: 166 | """Record an event with optional properties. 167 | 168 | Args: 169 | event_name: Name of the event 170 | properties: Event properties (must not contain sensitive data) 171 | """ 172 | # Respect runtime telemetry opt-out. 173 | if not self.is_telemetry_enabled(): 174 | logger.debug("Telemetry disabled; event not recorded.") 175 | return 176 | 177 | event_properties = {"version": __version__, **(properties or {})} 178 | 179 | logger.info(f"Recording event: {event_name} with properties: {event_properties}") 180 | 181 | if self.initialized: 182 | try: 183 | posthog.capture( 184 | distinct_id=self.installation_id, event=event_name, properties=event_properties 185 | ) 186 | logger.info(f"Sent event to PostHog: {event_name}") 187 | # Flush immediately to ensure delivery 188 | posthog.flush() 189 | except Exception as e: 190 | logger.warning(f"Failed to send event to PostHog: {e}") 191 | else: 192 | # Queue the event for later 193 | logger.info(f"PostHog not initialized, queuing event for later: {event_name}") 194 | self.queued_events.append({"event": event_name, "properties": event_properties}) 195 | # Try to initialize now if not already 196 | initialize_result = self._initialize_posthog() 197 | logger.info(f"Attempted to initialize PostHog: {initialize_result}") 198 | 199 | def flush(self) -> bool: 200 | """Flush any pending events to PostHog. 
201 | 
202 |         Returns:
203 |             bool: True if successful, False otherwise
204 |         """
205 |         if not self.initialized and not self._initialize_posthog():
206 |             return False
207 | 
208 |         try:
209 |             posthog.flush()
210 |             return True
211 |         except Exception as e:
212 |             logger.debug(f"Failed to flush PostHog events: {e}")
213 |             return False
214 | 
215 |     @classmethod
216 |     def get_client(cls) -> "PostHogTelemetryClient":
217 |         """Return the global PostHogTelemetryClient instance, creating it if needed."""
218 |         if cls._singleton is None:
219 |             cls._singleton = cls()
220 |         return cls._singleton
221 | 
222 |     @classmethod
223 |     def destroy_client(cls) -> None:
224 |         """Destroy the global PostHogTelemetryClient instance."""
225 |         cls._singleton = None
226 | 
227 | def destroy_telemetry_client() -> None:
228 |     """Destroy the global PostHogTelemetryClient instance (class-managed)."""
229 |     PostHogTelemetryClient.destroy_client()
230 | 
231 | def is_telemetry_enabled() -> bool:
232 |     return PostHogTelemetryClient.is_telemetry_enabled()
233 | 
234 | def record_event(event_name: str, properties: Optional[Dict[str, Any]] = None) -> None:
235 |     """Record an arbitrary PostHog event."""
236 |     PostHogTelemetryClient.get_client().record_event(event_name, properties or {})
```
--------------------------------------------------------------------------------
/libs/python/agent/agent/ui/gradio/app.py:
--------------------------------------------------------------------------------
```python
1 | """
2 | Advanced Gradio UI for Computer-Use Agent (cua-agent)
3 | 
4 | This is a Gradio interface for the Computer-Use Agent v0.4.x (cua-agent)
5 | with an advanced UI for model selection and configuration.
6 | 7 | Supported Agent Models: 8 | - OpenAI: openai/computer-use-preview 9 | - Anthropic: anthropic/claude-3-5-sonnet-20241022, anthropic/claude-3-7-sonnet-20250219 10 | - UI-TARS: huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B 11 | - Omniparser: omniparser+anthropic/claude-3-5-sonnet-20241022, omniparser+ollama_chat/gemma3 12 | 13 | Requirements: 14 | - Mac with Apple Silicon (M1/M2/M3/M4), Linux, or Windows 15 | - macOS 14 (Sonoma) or newer / Ubuntu 20.04+ 16 | - Python 3.11+ 17 | - Lume CLI installed (https://github.com/trycua/cua) 18 | - OpenAI or Anthropic API key 19 | """ 20 | 21 | import os 22 | import asyncio 23 | import logging 24 | import json 25 | import platform 26 | from pathlib import Path 27 | from typing import Dict, List, Optional, AsyncGenerator, Any, Tuple, Union 28 | import gradio as gr 29 | from gradio.components.chatbot import MetadataDict 30 | from typing import cast 31 | 32 | # Import from agent package 33 | from agent import ComputerAgent 34 | from agent.types import Messages, AgentResponse 35 | from computer import Computer 36 | 37 | # Global variables 38 | global_agent = None 39 | global_computer = None 40 | SETTINGS_FILE = Path(".gradio_settings.json") 41 | 42 | logging.basicConfig(level=logging.INFO) 43 | 44 | import dotenv 45 | if dotenv.load_dotenv(): 46 | print(f"DEBUG - Loaded environment variables from {dotenv.find_dotenv()}") 47 | else: 48 | print("DEBUG - No .env file found") 49 | 50 | # --- Settings Load/Save Functions --- 51 | def load_settings() -> Dict[str, Any]: 52 | """Loads settings from the JSON file.""" 53 | if SETTINGS_FILE.exists(): 54 | try: 55 | with open(SETTINGS_FILE, "r") as f: 56 | settings = json.load(f) 57 | if isinstance(settings, dict): 58 | print(f"DEBUG - Loaded settings from {SETTINGS_FILE}") 59 | return settings 60 | except (json.JSONDecodeError, IOError) as e: 61 | print(f"Warning: Could not load settings from {SETTINGS_FILE}: {e}") 62 | return {} 63 | 64 | 65 | def save_settings(settings: Dict[str, 
Any]): 66 | """Saves settings to the JSON file.""" 67 | settings.pop("provider_api_key", None) 68 | try: 69 | with open(SETTINGS_FILE, "w") as f: 70 | json.dump(settings, f, indent=4) 71 | print(f"DEBUG - Saved settings to {SETTINGS_FILE}") 72 | except IOError as e: 73 | print(f"Warning: Could not save settings to {SETTINGS_FILE}: {e}") 74 | 75 | 76 | # # Custom Screenshot Handler for Gradio chat 77 | # class GradioChatScreenshotHandler: 78 | # """Custom handler that adds screenshots to the Gradio chatbot.""" 79 | 80 | # def __init__(self, chatbot_history: List[gr.ChatMessage]): 81 | # self.chatbot_history = chatbot_history 82 | # print("GradioChatScreenshotHandler initialized") 83 | 84 | # async def on_screenshot(self, screenshot_base64: str, action_type: str = "") -> None: 85 | # """Add screenshot to chatbot when a screenshot is taken.""" 86 | # image_markdown = f"" 87 | 88 | # if self.chatbot_history is not None: 89 | # self.chatbot_history.append( 90 | # gr.ChatMessage( 91 | # role="assistant", 92 | # content=image_markdown, 93 | # metadata={"title": f"🖥️ Screenshot - {action_type}", "status": "done"}, 94 | # ) 95 | # ) 96 | 97 | 98 | # Detect platform capabilities 99 | is_mac = platform.system().lower() == "darwin" 100 | is_lume_available = is_mac or (os.environ.get("PYLUME_HOST", "localhost") != "localhost") 101 | 102 | print("PYLUME_HOST: ", os.environ.get("PYLUME_HOST", "localhost")) 103 | print("is_mac: ", is_mac) 104 | print("Lume available: ", is_lume_available) 105 | 106 | # Map model names to agent model strings 107 | MODEL_MAPPINGS = { 108 | "openai": { 109 | "default": "openai/computer-use-preview", 110 | "OpenAI: Computer-Use Preview": "openai/computer-use-preview", 111 | }, 112 | "anthropic": { 113 | "default": "anthropic/claude-3-7-sonnet-20250219", 114 | "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-20250514", 115 | "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-20250514", 116 | "Anthropic: Claude 3.7 Sonnet 
(20250219)": "anthropic/claude-3-7-sonnet-20250219", 117 | "Anthropic: Claude 3.5 Sonnet (20241022)": "anthropic/claude-3-5-sonnet-20241022", 118 | }, 119 | "omni": { 120 | "default": "omniparser+openai/gpt-4o", 121 | "OMNI: OpenAI GPT-4o": "omniparser+openai/gpt-4o", 122 | "OMNI: OpenAI GPT-4o mini": "omniparser+openai/gpt-4o-mini", 123 | "OMNI: Claude 3.7 Sonnet (20250219)": "omniparser+anthropic/claude-3-7-sonnet-20250219", 124 | "OMNI: Claude 3.5 Sonnet (20241022)": "omniparser+anthropic/claude-3-5-sonnet-20241022", 125 | }, 126 | "uitars": { 127 | "default": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B" if is_mac else "ui-tars", 128 | "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B": "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", 129 | }, 130 | } 131 | 132 | 133 | def get_model_string(model_name: str, loop_provider: str) -> str: 134 | """Determine the agent model string based on the input.""" 135 | if model_name == "Custom model (OpenAI compatible API)": 136 | return "custom_oaicompat" 137 | elif model_name == "Custom model (ollama)": 138 | return "custom_ollama" 139 | elif loop_provider == "OMNI-OLLAMA" or model_name.startswith("OMNI: Ollama "): 140 | if model_name.startswith("OMNI: Ollama "): 141 | ollama_model = model_name.split("OMNI: Ollama ", 1)[1] 142 | return f"omniparser+ollama_chat/{ollama_model}" 143 | return "omniparser+ollama_chat/llama3" 144 | 145 | # Map based on loop provider 146 | mapping = MODEL_MAPPINGS.get(loop_provider.lower(), MODEL_MAPPINGS["openai"]) 147 | return mapping.get(model_name, mapping["default"]) 148 | 149 | 150 | def get_ollama_models() -> List[str]: 151 | """Get available models from Ollama if installed.""" 152 | try: 153 | import subprocess 154 | result = subprocess.run(["ollama", "list"], capture_output=True, text=True) 155 | if result.returncode == 0: 156 | lines = result.stdout.strip().split("\n") 157 | if len(lines) < 2: 158 | return [] 159 | models = [] 160 | for line in lines[1:]: 161 | parts = line.split() 
162 | if parts: 163 | model_name = parts[0] 164 | models.append(f"OMNI: Ollama {model_name}") 165 | return models 166 | return [] 167 | except Exception as e: 168 | logging.error(f"Error getting Ollama models: {e}") 169 | return [] 170 | 171 | 172 | def create_computer_instance( 173 | verbosity: int = logging.INFO, 174 | os_type: str = "macos", 175 | provider_type: str = "lume", 176 | name: Optional[str] = None, 177 | api_key: Optional[str] = None 178 | ) -> Computer: 179 | """Create or get the global Computer instance.""" 180 | global global_computer 181 | if global_computer is None: 182 | if provider_type == "localhost": 183 | global_computer = Computer( 184 | verbosity=verbosity, 185 | os_type=os_type, 186 | use_host_computer_server=True 187 | ) 188 | else: 189 | global_computer = Computer( 190 | verbosity=verbosity, 191 | os_type=os_type, 192 | provider_type=provider_type, 193 | name=name if name else "", 194 | api_key=api_key 195 | ) 196 | return global_computer 197 | 198 | 199 | def create_agent( 200 | model_string: str, 201 | save_trajectory: bool = True, 202 | only_n_most_recent_images: int = 3, 203 | verbosity: int = logging.INFO, 204 | custom_model_name: Optional[str] = None, 205 | computer_os: str = "macos", 206 | computer_provider: str = "lume", 207 | computer_name: Optional[str] = None, 208 | computer_api_key: Optional[str] = None, 209 | max_trajectory_budget: Optional[float] = None, 210 | ) -> ComputerAgent: 211 | """Create or update the global agent with the specified parameters.""" 212 | global global_agent 213 | 214 | # Create the computer 215 | computer = create_computer_instance( 216 | verbosity=verbosity, 217 | os_type=computer_os, 218 | provider_type=computer_provider, 219 | name=computer_name, 220 | api_key=computer_api_key 221 | ) 222 | 223 | # Handle custom models 224 | if model_string == "custom_oaicompat" and custom_model_name: 225 | model_string = custom_model_name 226 | elif model_string == "custom_ollama" and custom_model_name: 227 | 
model_string = f"omniparser+ollama_chat/{custom_model_name}" 228 | 229 | # Create agent kwargs 230 | agent_kwargs = { 231 | "model": model_string, 232 | "tools": [computer], 233 | "only_n_most_recent_images": only_n_most_recent_images, 234 | "verbosity": verbosity, 235 | } 236 | 237 | if save_trajectory: 238 | agent_kwargs["trajectory_dir"] = "trajectories" 239 | 240 | if max_trajectory_budget: 241 | agent_kwargs["max_trajectory_budget"] = {"max_budget": max_trajectory_budget, "raise_error": True} 242 | 243 | global_agent = ComputerAgent(**agent_kwargs) 244 | return global_agent 245 | 246 | 247 | def launch_ui(): 248 | """Standalone function to launch the Gradio app.""" 249 | from agent.ui.gradio.ui_components import create_gradio_ui 250 | print(f"Starting Gradio app for CUA Agent...") 251 | demo = create_gradio_ui() 252 | demo.launch(share=False, inbrowser=True) 253 | 254 | 255 | if __name__ == "__main__": 256 | launch_ui() 257 | ``` -------------------------------------------------------------------------------- /docs/content/docs/computer-sdk/commands.mdx: -------------------------------------------------------------------------------- ```markdown 1 | --- 2 | title: Commands 3 | description: Computer commands and interface methods 4 | --- 5 | 6 | This page describes the set of supported **commands** you can use to control a Cua Computer directly via the Python SDK. 7 | 8 | These commands map to the same actions available in the [Computer Server API Commands Reference](../libraries/computer-server/Commands), and provide low-level, async access to system operations from your agent or automation code. 
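These interface methods are asynchronous — in Python they are `asyncio` coroutines you `await` — and command results come back as plain objects (for example, `run_command` results expose `stdout`, `stderr`, and `returncode`). A minimal sketch of that calling pattern, using a hypothetical stub in place of the real `computer.interface` (which requires a running computer-server):

```python
import asyncio

class StubInterface:
    """Hypothetical stand-in for computer.interface; the real one talks to a VM."""

    async def run_command(self, cmd: str):
        await asyncio.sleep(0)  # simulate the async round-trip to the computer-server
        # Real results expose stdout, stderr, and returncode attributes
        return type("CommandResult", (), {"stdout": f"ran: {cmd}", "stderr": "", "returncode": 0})()

async def main() -> None:
    interface = StubInterface()
    result = await interface.run_command("echo hello")
    if result.returncode == 0:
        print(result.stdout)  # prints: ran: echo hello

asyncio.run(main())
```

The same awaiting pattern applies to every method below; only the call names and arguments change.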
9 | 
10 | ## Shell Actions
11 | 
12 | Execute shell commands and get detailed results:
13 | 
14 | <Tabs items={['Python', 'TypeScript']}>
15 | <Tab value="Python">
16 | ```python
17 | # Run shell command
18 | result = await computer.interface.run_command(cmd) # result.stdout, result.stderr, result.returncode
19 | ```
20 | </Tab>
21 | <Tab value="TypeScript">
22 | ```typescript
23 | // Run shell command
24 | const result = await computer.interface.runCommand(cmd); // result.stdout, result.stderr, result.returncode
25 | ```
26 | </Tab>
27 | </Tabs>
28 | 
29 | ## Mouse Actions
30 | 
31 | Precise mouse control and interaction:
32 | 
33 | <Tabs items={['Python', 'TypeScript']}>
34 | <Tab value="Python">
35 | ```python
36 | # Basic clicks
37 | await computer.interface.left_click(x, y) # Left click at coordinates
38 | await computer.interface.right_click(x, y) # Right click at coordinates
39 | await computer.interface.double_click(x, y) # Double click at coordinates
40 | 
41 | # Cursor movement and dragging
42 | await computer.interface.move_cursor(x, y) # Move cursor to coordinates
43 | await computer.interface.drag_to(x, y, duration) # Drag to coordinates
44 | await computer.interface.get_cursor_position() # Get current cursor position
45 | 
46 | # Advanced mouse control
47 | await computer.interface.mouse_down(x, y, button="left") # Press and hold a mouse button
48 | await computer.interface.mouse_up(x, y, button="left") # Release a mouse button
49 | ```
50 | 
51 | </Tab>
52 | <Tab value="TypeScript">
53 | ```typescript
54 | // Basic clicks
55 | await computer.interface.leftClick(x, y); // Left click at coordinates
56 | await computer.interface.rightClick(x, y); // Right click at coordinates
57 | await computer.interface.doubleClick(x, y); // Double click at coordinates
58 | 
59 | // Cursor movement and dragging
60 | await computer.interface.moveCursor(x, y); // Move cursor to coordinates
61 | await computer.interface.dragTo(x, y, duration); // Drag to coordinates
62 | await
computer.interface.getCursorPosition(); // Get current cursor position 63 | 64 | // Advanced mouse control 65 | await computer.interface.mouseDown(x, y, "left"); // Press and hold a mouse button 66 | await computer.interface.mouseUp(x, y, "left"); // Release a mouse button 67 | ``` 68 | 69 | </Tab> 70 | </Tabs> 71 | 72 | ## Keyboard Actions 73 | 74 | Text input and key combinations: 75 | 76 | <Tabs items={['Python', 'TypeScript']}> 77 | <Tab value="Python"> 78 | ```python 79 | # Text input 80 | await computer.interface.type_text("Hello") # Type text 81 | await computer.interface.press_key("enter") # Press a single key 82 | 83 | # Key combinations and advanced control 84 | await computer.interface.hotkey("command", "c") # Press key combination 85 | await computer.interface.key_down("command") # Press and hold a key 86 | await computer.interface.key_up("command") # Release a key 87 | ``` 88 | 89 | </Tab> 90 | <Tab value="TypeScript"> 91 | ```typescript 92 | // Text input 93 | await computer.interface.typeText("Hello"); // Type text 94 | await computer.interface.pressKey("enter"); // Press a single key 95 | 96 | // Key combinations and advanced control 97 | await computer.interface.hotkey("command", "c"); // Press key combination 98 | await computer.interface.keyDown("command"); // Press and hold a key 99 | await computer.interface.keyUp("command"); // Release a key 100 | ``` 101 | 102 | </Tab> 103 | </Tabs> 104 | 105 | ## Scrolling Actions 106 | 107 | Mouse wheel and scrolling control: 108 | 109 | <Tabs items={['Python', 'TypeScript']}> 110 | <Tab value="Python"> 111 | ```python 112 | # Scrolling 113 | await computer.interface.scroll(x, y) # Scroll the mouse wheel 114 | await computer.interface.scroll_down(clicks) # Scroll down 115 | await computer.interface.scroll_up(clicks) # Scroll up 116 | ``` 117 | </Tab> 118 | <Tab value="TypeScript"> 119 | ```typescript 120 | // Scrolling 121 | await computer.interface.scroll(x, y); // Scroll the mouse wheel 122 | await
computer.interface.scrollDown(clicks); // Scroll down 123 | await computer.interface.scrollUp(clicks); // Scroll up 124 | ``` 125 | </Tab> 126 | </Tabs> 127 | 128 | ## Screen Actions 129 | 130 | Screen capture and display information: 131 | 132 | <Tabs items={['Python', 'TypeScript']}> 133 | <Tab value="Python"> 134 | ```python 135 | # Screen operations 136 | await computer.interface.screenshot() # Take a screenshot 137 | await computer.interface.get_screen_size() # Get screen dimensions 138 | 139 | ``` 140 | 141 | </Tab> 142 | <Tab value="TypeScript"> 143 | ```typescript 144 | // Screen operations 145 | await computer.interface.screenshot(); // Take a screenshot 146 | await computer.interface.getScreenSize(); // Get screen dimensions 147 | 148 | ``` 149 | </Tab> 150 | </Tabs> 151 | 152 | ## Clipboard Actions 153 | 154 | System clipboard management: 155 | 156 | <Tabs items={['Python', 'TypeScript']}> 157 | <Tab value="Python"> 158 | ```python 159 | # Clipboard operations 160 | await computer.interface.set_clipboard(text) # Set clipboard content 161 | await computer.interface.copy_to_clipboard() # Get clipboard content 162 | 163 | ``` 164 | 165 | </Tab> 166 | <Tab value="TypeScript"> 167 | ```typescript 168 | // Clipboard operations 169 | await computer.interface.setClipboard(text); // Set clipboard content 170 | await computer.interface.copyToClipboard(); // Get clipboard content 171 | 172 | ``` 173 | 174 | </Tab> 175 | </Tabs> 176 | 177 | ## File System Operations 178 | 179 | Direct file and directory manipulation: 180 | 181 | <Tabs items={['Python', 'TypeScript']}> 182 | <Tab value="Python"> 183 | 184 | ```python 185 | # File existence checks 186 | await computer.interface.file_exists(path) # Check if file exists 187 | await computer.interface.directory_exists(path) # Check if directory exists 188 | 189 | # File content operations 190 | await computer.interface.read_text(path, encoding="utf-8") # Read file content 191 | await computer.interface.write_text(path,
content, encoding="utf-8") # Write file content 192 | await computer.interface.read_bytes(path) # Read file content as bytes 193 | await computer.interface.write_bytes(path, content) # Write file content as bytes 194 | 195 | # File and directory management 196 | await computer.interface.delete_file(path) # Delete file 197 | await computer.interface.create_dir(path) # Create directory 198 | await computer.interface.delete_dir(path) # Delete directory 199 | await computer.interface.list_dir(path) # List directory contents 200 | ``` 201 | 202 | </Tab> 203 | <Tab value="TypeScript"> 204 | ```typescript 205 | // File existence checks 206 | await computer.interface.fileExists(path); // Check if file exists 207 | await computer.interface.directoryExists(path); // Check if directory exists 208 | 209 | // File content operations 210 | await computer.interface.readText(path, "utf-8"); // Read file content 211 | await computer.interface.writeText(path, content, "utf-8"); // Write file content 212 | await computer.interface.readBytes(path); // Read file content as bytes 213 | await computer.interface.writeBytes(path, content); // Write file content as bytes 214 | 215 | // File and directory management 216 | await computer.interface.deleteFile(path); // Delete file 217 | await computer.interface.createDir(path); // Create directory 218 | await computer.interface.deleteDir(path); // Delete directory 219 | await computer.interface.listDir(path); // List directory contents 220 | ``` 221 | 222 | </Tab> 223 | </Tabs> 224 | 225 | ## Accessibility 226 | 227 | Access system accessibility information: 228 | 229 | <Tabs items={['Python', 'TypeScript']}> 230 | <Tab value="Python"> 231 | ```python 232 | # Get accessibility tree 233 | await computer.interface.get_accessibility_tree() 234 | 235 | ``` 236 | 237 | </Tab> 238 | <Tab value="TypeScript"> 239 | ```typescript 240 | // Get accessibility tree 241 | await computer.interface.getAccessibilityTree(); 242 | 243 | ``` 244 | </Tab> 245 |
</Tabs> 246 | 247 | ## Delay Configuration 248 | 249 | Control timing between actions: 250 | 251 | <Tabs items={['Python']}> 252 | <Tab value="Python"> 253 | ```python 254 | # Set default delay between all actions (in seconds) 255 | computer.interface.delay = 0.5 # 500ms delay between actions 256 | 257 | # Or specify delay for individual actions 258 | await computer.interface.left_click(x, y, delay=1.0) # 1 second delay after click 259 | await computer.interface.type_text("Hello", delay=0.2) # 200ms delay after typing 260 | await computer.interface.press_key("enter", delay=0.5) # 500ms delay after key press 261 | ``` 262 | 263 | </Tab> 264 | </Tabs> 265 | 266 | ## Python Virtual Environment Operations 267 | 268 | Manage Python environments: 269 | 270 | <Tabs items={['Python']}> 271 | <Tab value="Python"> 272 | ```python 273 | # Virtual environment management 274 | await computer.venv_install("demo_venv", ["requests", "macos-pyxa"]) # Install packages in a virtual environment 275 | await computer.venv_cmd("demo_venv", "python -c \"import requests; print(requests.get('https://httpbin.org/ip').json())\"") # Run a shell command in a virtual environment 276 | await computer.venv_exec("demo_venv", python_function_or_code, *args, **kwargs) # Run a Python function in a virtual environment and return the result / raise an exception 277 | ``` 278 | 279 | </Tab> 280 | </Tabs> ``` -------------------------------------------------------------------------------- /blog/app-use.md: -------------------------------------------------------------------------------- ```markdown 1 | # App-Use: Control Individual Applications with Cua Agents 2 | 3 | *Published on May 31, 2025 by The Cua Team* 4 | 5 | Today, we are excited to introduce a new experimental feature landing in the [Cua GitHub repository](https://github.com/trycua/cua): **App-Use**.
App-Use allows you to create lightweight virtual desktops that limit agent access to specific applications, improving the precision of your agent's trajectory. Perfect for parallel workflows and focused task execution. 6 | 7 | > **Note:** App-Use is currently experimental. To use it, you need to enable it by passing the `experiments=["app-use"]` feature flag when creating your Computer instance. 8 | 9 | Check out an example of a Cua Agent automating the Cua team's Taco Bell order through the iPhone Mirroring app: 10 | 11 | <div align="center"> 12 | <video src="https://github.com/user-attachments/assets/6362572e-f784-4006-aa6e-bce10991fab9" width="600" controls></video> 13 | </div> 14 | 15 | ## What is App-Use? 16 | 17 | App-Use lets you create virtual desktop sessions scoped to specific applications. Instead of giving an agent access to your entire screen, you can say "only work with Safari and Notes" or "just control the iPhone Mirroring app." 18 | 19 | ```python 20 | # Create a macOS VM with App Use experimental feature enabled 21 | computer = Computer(experiments=["app-use"]) 22 | 23 | # Create a desktop limited to specific apps 24 | desktop = computer.create_desktop_from_apps(["Safari", "Notes"]) 25 | 26 | # Your agent can now only see and interact with these apps 27 | agent = ComputerAgent( 28 | model="anthropic/claude-3-5-sonnet-20241022", 29 | tools=[desktop] 30 | ) 31 | ``` 32 | 33 | ## Key Benefits 34 | 35 | ### 1. Lightweight and Fast 36 | App-Use creates visual filters, not new processes. Your apps continue running normally - we just control what the agent can see and click on. The virtual desktops are composited views that require no additional compute resources beyond the existing window manager operations. 37 | 38 | ### 2.
Run Multiple Agents in Parallel 39 | Deploy a team of specialized agents, each focused on their own apps: 40 | 41 | ```python 42 | # Create a Computer with App Use enabled 43 | computer = Computer(experiments=["app-use"]) 44 | 45 | # Research agent focuses on browser 46 | research_desktop = computer.create_desktop_from_apps(["Safari"]) 47 | research_agent = ComputerAgent(tools=[research_desktop], ...) 48 | 49 | # Writing agent focuses on documents 50 | writing_desktop = computer.create_desktop_from_apps(["Pages", "Notes"]) 51 | writing_agent = ComputerAgent(tools=[writing_desktop], ...) 52 | 53 | async def run_agent(agent, task): 54 | async for result in agent.run(task): 55 | print(result.get('text', '')) 56 | 57 | # Run both simultaneously 58 | await asyncio.gather( 59 | run_agent(research_agent, "Research AI trends for 2025"), 60 | run_agent(writing_agent, "Draft blog post outline") 61 | ) 62 | ``` 63 | 64 | ## How To: Getting Started with App-Use 65 | 66 | ### Requirements 67 | 68 | To get started with App-Use, you'll need: 69 | - Python 3.11+ 70 | - macOS Sequoia (15.0) or later 71 | 72 | ### Getting Started 73 | 74 | ```bash 75 | # Install packages and launch UI 76 | pip install -U "cua-computer[all]" "cua-agent[all]" 77 | python -m agent.ui.gradio.app 78 | ``` 79 | 80 | ```python 81 | import asyncio 82 | from computer import Computer 83 | from agent import ComputerAgent 84 | 85 | async def main(): 86 | computer = Computer(experiments=["app-use"]) 87 | await computer.run() 88 | 89 | # Create app-specific desktop sessions 90 | desktop = computer.create_desktop_from_apps(["Notes"]) 91 | 92 | # Initialize an agent 93 | agent = ComputerAgent( 94 | model="anthropic/claude-3-5-sonnet-20241022", 95 | tools=[desktop] 96 | ) 97 | 98 | # Take a screenshot (returns bytes by default) 99 | screenshot = await desktop.interface.screenshot() 100 | with open("app_screenshot.png", "wb") as f: 101 | f.write(screenshot) 102 | 103 | # Run an agent task 104 | async for result in agent.run("Create a new
note titled 'Meeting Notes' and add today's agenda items"): 105 | print(f"Agent: {result.get('text', '')}") 106 | 107 | if __name__ == "__main__": 108 | asyncio.run(main()) 109 | ``` 110 | 111 | ## Use Case: Automating Your iPhone with Cua 112 | 113 | ### ⚠️ Important Warning 114 | 115 | Computer-use agents are powerful tools that can interact with your devices. This guide involves using your own macOS and iPhone instead of a VM. **Proceed at your own risk.** Always: 116 | - Review agent actions before running 117 | - Start with non-critical tasks 118 | - Monitor agent behavior closely 119 | 120 | Remember: with Cua, it is still advisable to use a VM for better isolation of your agents. 121 | 122 | ### Setting Up iPhone Automation 123 | 124 | ### Step 1: Start the cua-computer-server 125 | 126 | First, you'll need to start the cua-computer-server locally to enable access to iPhone Mirroring via the Computer interface: 127 | 128 | ```bash 129 | # Install the server 130 | pip install cua-computer-server 131 | 132 | # Start the server 133 | python -m computer_server 134 | ``` 135 | 136 | ### Step 2: Connect iPhone Mirroring 137 | 138 | Then, you'll need to open the "iPhone Mirroring" app on your Mac and connect it to your iPhone.
139 | 140 | ### Step 3: Create an iPhone Automation Session 141 | 142 | Finally, you can create an iPhone automation session: 143 | 144 | ```python 145 | import asyncio 146 | from computer import Computer 147 | from agent import ComputerAgent 148 | 149 | async def automate_iphone(): 150 | # Connect to your local computer server 151 | my_mac = Computer(use_host_computer_server=True, os_type="macos", experiments=["app-use"]) 152 | await my_mac.run() 153 | 154 | # Create a desktop focused on iPhone Mirroring 155 | my_iphone = my_mac.create_desktop_from_apps(["iPhone Mirroring"]) 156 | 157 | # Initialize an agent for iPhone automation 158 | agent = ComputerAgent( 159 | model="anthropic/claude-3-5-sonnet-20241022", 160 | tools=[my_iphone] 161 | ) 162 | 163 | # Example: Send a message 164 | async for result in agent.run("Open Messages and send 'Hello from Cua!' to John"): 165 | print(f"Agent: {result.get('text', '')}") 166 | 167 | # Example: Set a reminder 168 | async for result in agent.run("Create a reminder to call mom at 5 PM today"): 169 | print(f"Agent: {result.get('text', '')}") 170 | 171 | if __name__ == "__main__": 172 | asyncio.run(automate_iphone()) 173 | ``` 174 | 175 | ### iPhone Automation Use Cases 176 | 177 | With Cua's iPhone automation, you can: 178 | - **Automate messaging**: Send texts, respond to messages, manage conversations 179 | - **Control apps**: Navigate any iPhone app using natural language 180 | - **Manage settings**: Adjust iPhone settings programmatically 181 | - **Extract data**: Read information from apps that don't have APIs 182 | - **Test iOS apps**: Automate testing workflows for iPhone applications 183 | 184 | ## Important Notes 185 | 186 | - **Visual isolation only**: Apps share the same files, OS resources, and user session 187 | - **Dynamic resolution**: Desktops automatically scale to fit app windows and menu bars 188 | - **macOS only**: Currently requires macOS due to compositing engine dependencies 189 | - **Not a security
boundary**: This is for agent focus, not security isolation 190 | 191 | ## When to Use What: App-Use vs Multiple Cua Containers 192 | 193 | ### Use App-Use within the same macOS Cua Container: 194 | - ✅ You need lightweight, fast agent focusing (macOS only) 195 | - ✅ You want to run multiple agents on one desktop 196 | - ✅ You're automating personal devices like iPhones 197 | - ✅ Window layout isolation is sufficient 198 | - ✅ You want low computational overhead 199 | 200 | ### Use Multiple Cua Containers: 201 | - ✅ You need maximum isolation between agents 202 | - ✅ You require cross-platform support (Mac/Linux/Windows) 203 | - ✅ You need guaranteed resource allocation 204 | - ✅ Security and complete isolation are critical 205 | - ⚠️ Note: Most computationally expensive option 206 | 207 | ## Pro Tips 208 | 209 | 1. **Start Small**: Test with one app before creating complex multi-app desktops 210 | 2. **Screenshot First**: Take a screenshot to verify your desktop shows the right apps 211 | 3. **Name Your Apps Correctly**: Use exact app names as they appear in the system 212 | 4. **Consider Performance**: While lightweight, too many parallel agents can still impact system performance 213 | 5. **Plan Your Workflows**: Design agent tasks to minimize app switching for best results 214 | 215 | ### How It Works 216 | 217 | When you create a desktop session with `create_desktop_from_apps()`, App Use: 218 | - Filters the visual output to show only specified application windows 219 | - Routes input events only to those applications 220 | - Maintains window layout isolation between different sessions 221 | - Shares the underlying file system and OS resources 222 | - **Dynamically adjusts resolution** to fit the window layout and menu bar items 223 | 224 | The resolution of these virtual desktops is dynamic, automatically scaling to accommodate the applications' window sizes and menu bar requirements. 
This ensures that agents always have a clear view of the entire interface they need to interact with, regardless of the specific app combination. 225 | 226 | Currently, App Use is limited to macOS only due to its reliance on Quartz, Apple's powerful compositing engine, for creating these virtual desktops. Quartz provides the low-level window management and rendering capabilities that make it possible to composite multiple application windows into isolated visual environments. 227 | 228 | ## Conclusion 229 | 230 | App Use brings a new dimension to computer automation - lightweight, focused, and parallel. Whether you're building a personal iPhone assistant or orchestrating a team of specialized agents, App Use provides the perfect balance of functionality and efficiency. 231 | 232 | Ready to try it? Update to the latest Cua version and start focusing your agents today! 233 | 234 | ```bash 235 | pip install -U "cua-computer[all]" "cua-agent[all]" 236 | ``` 237 | 238 | Happy automating! 🎯🤖 239 | ``` -------------------------------------------------------------------------------- /blog/introducing-cua-cloud-containers.md: -------------------------------------------------------------------------------- ```markdown 1 | # Introducing Cua Cloud Sandbox: Computer-Use Agents in the Cloud 2 | 3 | *Published on May 28, 2025 by Francesco Bonacci* 4 | 5 | Welcome to the next chapter in our Computer-Use Agent journey! In [Part 1](./build-your-own-operator-on-macos-1), we showed you how to build your own Operator on macOS. In [Part 2](./build-your-own-operator-on-macos-2), we explored the cua-agent framework. Today, we're excited to introduce **Cua Cloud Sandbox** – the easiest way to deploy Computer-Use Agents at scale. 6 | 7 | <div align="center"> 8 | <video src="https://github.com/user-attachments/assets/63a2addf-649f-4468-971d-58d38dd43ee6" width="600" controls></video> 9 | </div> 10 | 11 | ## What is Cua Cloud? 
12 | 13 | Think of Cua Cloud as **Docker for Computer-Use Agents**. Instead of managing VMs, installing dependencies, and configuring environments, you can launch pre-configured Cloud Sandbox instances with a single command. Each sandbox comes with a **full desktop environment** accessible via browser (via noVNC), all CUA-related dependencies pre-configured (with a PyAutoGUI-compatible server), and **pay-per-use pricing** that scales with your needs. 14 | 15 | ## Why Cua Cloud Sandbox? 16 | 17 | Four months ago, we launched [**Lume**](https://github.com/trycua/cua/tree/main/libs/lume) and [**Cua**](https://github.com/trycua/cua) with the goal of bringing sandboxed VMs and Computer-Use Agents to Apple Silicon. The developer community's response was incredible 🎉 18 | 19 | Going from prototype to production revealed a problem though: **local macOS VMs don't scale**, nor are they easily portable. 20 | 21 | Our Discord community, YC peers, and early pilot customers kept hitting the same issues. Storage constraints meant **20-40GB per VM** filled laptops fast. Different hardware architectures (Apple Silicon ARM vs Intel x86) prevented portability of local workflows. Every new user lost a day to setup and configuration. 22 | 23 | **Cua Cloud** eliminates these constraints while preserving everything developers are familiar with about our Computer and Agent SDK. 24 | 25 | ### What We Built 26 | 27 | Over the past month, we've been iterating on Cua Cloud with partners and beta users to address these challenges. You use the exact same `Computer` and `ComputerAgent` classes you already know, but with **zero local setup** or storage requirements. VNC access comes with **built-in encryption**, you pay only for compute time (not idle resources), and can bring your own API keys for any LLM provider. 28 | 29 | The result? **Instant deployment** in seconds instead of hours, with no infrastructure to manage.
Scale elastically from **1 to 100 agents** in parallel, with consistent behavior across all deployments. Share agent trajectories with your team for better collaboration and debugging. 30 | 31 | ## Getting Started 32 | 33 | ### Step 1: Get Your API Key 34 | 35 | Sign up at [**trycua.com**](https://trycua.com) to get your API key. 36 | 37 | ```bash 38 | # Set your API key in environment variables 39 | export CUA_API_KEY=your_api_key_here 40 | export CUA_CONTAINER_NAME=my-agent-container 41 | ``` 42 | 43 | ### Step 2: Launch Your First Sandbox 44 | 45 | ```python 46 | import asyncio, logging, os 47 | from computer import Computer, VMProviderType 48 | from agent import ComputerAgent 49 | 50 | async def run_cloud_agent(): 51 | # Create a remote Linux computer with Cua Cloud 52 | computer = Computer( 53 | os_type="linux", 54 | api_key=os.getenv("CUA_API_KEY"), 55 | name=os.getenv("CUA_CONTAINER_NAME"), 56 | provider_type=VMProviderType.CLOUD, 57 | ) 58 | 59 | # Create an agent with your preferred loop 60 | agent = ComputerAgent( 61 | model="openai/gpt-4o", 62 | save_trajectory=True, 63 | verbosity=logging.INFO, 64 | tools=[computer] 65 | ) 66 | 67 | # Run a task 68 | async for result in agent.run("Open Chrome and search for AI news"): 69 | print(f"Response: {result.get('text')}") 70 | 71 | # Run the agent 72 | asyncio.run(run_cloud_agent()) 73 | ``` 74 | 75 | ### Available Tiers 76 | 77 | We're launching with **three compute tiers** to match your workload needs: 78 | 79 | - **Small** (1 vCPU, 4GB RAM) - Perfect for simple automation tasks and testing 80 | - **Medium** (2 vCPU, 8GB RAM) - Ideal for most production workloads 81 | - **Large** (8 vCPU, 32GB RAM) - Built for complex, resource-intensive operations 82 | 83 | Each tier includes a **full Linux desktop environment (Xfce)** with a pre-configured browser, **secure VNC access** with SSL, persistent storage during your session, and automatic cleanup of sandboxes on termination.
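If you provision sandboxes programmatically, it can help to map workload requirements onto a tier before launching. A small illustrative helper (the tier specs mirror the list above; the selection function itself is hypothetical, not part of the Cua SDK):

```python
# Tier specs as listed above; the helper below is illustrative, not a Cua API.
TIERS = {
    "small":  {"vcpu": 1, "ram_gb": 4},   # simple automation and testing
    "medium": {"vcpu": 2, "ram_gb": 8},   # most production workloads
    "large":  {"vcpu": 8, "ram_gb": 32},  # resource-intensive operations
}

def pick_tier(vcpus: int, ram_gb: int) -> str:
    """Return the smallest tier that satisfies the requirements."""
    for name, spec in TIERS.items():  # dicts preserve insertion order: small -> large
        if spec["vcpu"] >= vcpus and spec["ram_gb"] >= ram_gb:
            return name
    raise ValueError("No single tier fits; consider splitting the workload")

print(pick_tier(2, 6))  # -> medium
```

Starting from the smallest tier that fits keeps per-hour costs down, in line with the cost-optimization tips later in this post.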
84 | 85 | ## How some customers are using Cua Cloud today 86 | 87 | ### Example 1: Automated GitHub Workflow 88 | 89 | Let's automate a complete GitHub workflow: 90 | 91 | ```python 92 | import asyncio 93 | import logging, os 94 | from computer import Computer, VMProviderType 95 | from agent import ComputerAgent 96 | 97 | async def github_automation(): 98 | """Automate GitHub repository management tasks.""" 99 | computer = Computer( 100 | os_type="linux", 101 | api_key=os.getenv("CUA_API_KEY"), 102 | name="github-automation", 103 | provider_type=VMProviderType.CLOUD, 104 | ) 105 | 106 | agent = ComputerAgent( 107 | model="openai/gpt-4o", 108 | save_trajectory=True, 109 | verbosity=logging.INFO, 110 | tools=[computer] 111 | ) 112 | 113 | tasks = [ 114 | "Look for a repository named trycua/cua on GitHub.", 115 | "Check the open issues, open the most recent one and read it.", 116 | "Clone the repository if it doesn't exist yet.", 117 | "Create a new branch for the issue.", 118 | "Make necessary changes to resolve the issue.", 119 | "Commit the changes with a descriptive message.", 120 | "Create a pull request."
121 | ] 122 | 123 | for i, task in enumerate(tasks): 124 | print(f"\nExecuting task {i+1}/{len(tasks)}: {task}") 125 | async for result in agent.run(task): 126 | print(f"Response: {result.get('text')}") 127 | 128 | # Check if any tools were used 129 | tools = result.get('tools') 130 | if tools: 131 | print(f"Tools used: {tools}") 132 | 133 | print(f"Task {i+1} completed") 134 | 135 | # Run the automation 136 | asyncio.run(github_automation()) 137 | ``` 138 | 139 | ### Example 2: Parallel Web Scraping 140 | 141 | Run multiple agents in parallel to scrape different websites: 142 | 143 | ```python 144 | import asyncio, os 145 | from computer import Computer, VMProviderType 146 | from agent import ComputerAgent 147 | 148 | async def scrape_website(site_name, url): 149 | """Scrape a website using a cloud agent.""" 150 | computer = Computer( 151 | os_type="linux", 152 | api_key=os.getenv("CUA_API_KEY"), 153 | name=f"scraper-{site_name}", 154 | provider_type=VMProviderType.CLOUD, 155 | ) 156 | 157 | agent = ComputerAgent( 158 | model="openai/gpt-4o", 159 | save_trajectory=True, 160 | tools=[computer] 161 | ) 162 | 163 | results = [] 164 | tasks = [ 165 | f"Navigate to {url}", 166 | "Extract the main headlines or article titles", 167 | "Take a screenshot of the page", 168 | "Save the extracted data to a file" 169 | ] 170 | 171 | for task in tasks: 172 | async for result in agent.run(task): 173 | results.append({ 174 | 'site': site_name, 175 | 'task': task, 176 | 'response': result.get('text') 177 | }) 178 | 179 | return results 180 | 181 | async def parallel_scraping(): 182 | """Scrape multiple websites in parallel.""" 183 | sites = [ 184 | ("ArXiv", "https://arxiv.org"), 185 | ("HackerNews", "https://news.ycombinator.com"), 186 | ("TechCrunch", "https://techcrunch.com") 187 | ] 188 | 189 | # Run all scraping tasks in parallel 190 | tasks = [scrape_website(name, url) for name, url in sites] 191 | results = await asyncio.gather(*tasks) 192 | 193 | # Process results 194 | for
site_results in results: 195 | print(f"\nResults from {site_results[0]['site']}:") 196 | for result in site_results: 197 | print(f" - {result['task']}: {result['response'][:100]}...") 198 | 199 | # Run parallel scraping 200 | asyncio.run(parallel_scraping()) 201 | ``` 202 | 203 | ## Cost Optimization Tips 204 | 205 | To optimize your costs, use appropriate sandbox sizes for your workload and implement timeouts to prevent runaway tasks. Batch related operations together to minimize sandbox spin-up time, and always remember to terminate sandboxes when your work is complete. 206 | 207 | ## Security Considerations 208 | 209 | Cua Cloud runs all sandboxes in isolated environments with encrypted VNC connections. Your API keys are never exposed in trajectories. 210 | 211 | ## What's Next for Cua Cloud 212 | 213 | We're just getting started! Here's what's coming in the next few months: 214 | 215 | ### Elastic Autoscaled Sandbox Pools 216 | 217 | Soon you'll be able to create elastic sandbox pools that automatically scale based on demand. Define minimum and maximum sandbox counts, and let Cua Cloud handle the rest. Perfect for batch processing, scheduled automations, and handling traffic spikes without manual intervention. 218 | 219 | ### Windows and macOS Cloud Support 220 | 221 | While we're launching with Linux sandboxes, Windows and macOS cloud machines are coming soon. Run Windows-specific automations, test cross-platform workflows, or leverage macOS-exclusive applications – all in the cloud with the same simple API. 222 | 223 | Stay tuned for updates and join our [**Discord**](https://discord.gg/cua-ai) to vote on which features you'd like to see first! 224 | 225 | ## Get Started Today 226 | 227 | Ready to deploy your Computer-Use Agents in the cloud? 228 | 229 | Visit [**trycua.com**](https://trycua.com) to sign up and get your API key. 
Join our [**Discord community**](https://discord.gg/cua-ai) for support and explore more examples on [**GitHub**](https://github.com/trycua/cua). 230 | 231 | Happy RPA 2.0! 🚀 232 | ```