This is page 9 of 20. Use http://codebase.md/trycua/cua?page={x} to view the full context.

# Directory Structure

```
├── .cursorignore
├── .dockerignore
├── .editorconfig
├── .gitattributes
├── .github
│   ├── FUNDING.yml
│   ├── scripts
│   │   ├── get_pyproject_version.py
│   │   └── tests
│   │       ├── __init__.py
│   │       ├── README.md
│   │       └── test_get_pyproject_version.py
│   └── workflows
│       ├── bump-version.yml
│       ├── ci-lume.yml
│       ├── docker-publish-cua-linux.yml
│       ├── docker-publish-cua-windows.yml
│       ├── docker-publish-kasm.yml
│       ├── docker-publish-xfce.yml
│       ├── docker-reusable-publish.yml
│       ├── link-check.yml
│       ├── lint.yml
│       ├── npm-publish-cli.yml
│       ├── npm-publish-computer.yml
│       ├── npm-publish-core.yml
│       ├── publish-lume.yml
│       ├── pypi-publish-agent.yml
│       ├── pypi-publish-computer-server.yml
│       ├── pypi-publish-computer.yml
│       ├── pypi-publish-core.yml
│       ├── pypi-publish-mcp-server.yml
│       ├── pypi-publish-som.yml
│       ├── pypi-reusable-publish.yml
│       ├── python-tests.yml
│       ├── test-cua-models.yml
│       └── test-validation-script.yml
├── .gitignore
├── .pre-commit-config.yaml
├── .prettierignore
├── .prettierrc.yaml
├── .vscode
│   ├── docs.code-workspace
│   ├── extensions.json
│   ├── launch.json
│   ├── libs-ts.code-workspace
│   ├── lume.code-workspace
│   ├── lumier.code-workspace
│   ├── py.code-workspace
│   └── settings.json
├── blog
│   ├── app-use.md
│   ├── assets
│   │   ├── composite-agents.png
│   │   ├── docker-ubuntu-support.png
│   │   ├── hack-booth.png
│   │   ├── hack-closing-ceremony.jpg
│   │   ├── hack-cua-ollama-hud.jpeg
│   │   ├── hack-leaderboard.png
│   │   ├── hack-the-north.png
│   │   ├── hack-winners.jpeg
│   │   ├── hack-workshop.jpeg
│   │   ├── hud-agent-evals.png
│   │   └── trajectory-viewer.jpeg
│   ├── bringing-computer-use-to-the-web.md
│   ├── build-your-own-operator-on-macos-1.md
│   ├── build-your-own-operator-on-macos-2.md
│   ├── cloud-windows-ga-macos-preview.md
│   ├── composite-agents.md
│   ├── computer-use-agents-for-growth-hacking.md
│   ├── cua-hackathon.md
│   ├── cua-playground-preview.md
│   ├── cua-vlm-router.md
│   ├── hack-the-north.md
│   ├── hud-agent-evals.md
│   ├── human-in-the-loop.md
│   ├── introducing-cua-cli.md
│   ├── introducing-cua-cloud-containers.md
│   ├── lume-to-containerization.md
│   ├── neurips-2025-cua-papers.md
│   ├── sandboxed-python-execution.md
│   ├── training-computer-use-models-trajectories-1.md
│   ├── trajectory-viewer.md
│   ├── ubuntu-docker-support.md
│   └── windows-sandbox.md
├── CONTRIBUTING.md
├── Development.md
├── Dockerfile
├── docs
│   ├── .env.example
│   ├── .gitignore
│   ├── content
│   │   └── docs
│   │       ├── agent-sdk
│   │       │   ├── agent-loops.mdx
│   │       │   ├── benchmarks
│   │       │   │   ├── index.mdx
│   │       │   │   ├── interactive.mdx
│   │       │   │   ├── introduction.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── osworld-verified.mdx
│   │       │   │   ├── screenspot-pro.mdx
│   │       │   │   └── screenspot-v2.mdx
│   │       │   ├── callbacks
│   │       │   │   ├── agent-lifecycle.mdx
│   │       │   │   ├── cost-saving.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── logging.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── pii-anonymization.mdx
│   │       │   │   └── trajectories.mdx
│   │       │   ├── chat-history.mdx
│   │       │   ├── custom-tools.mdx
│   │       │   ├── customizing-computeragent.mdx
│   │       │   ├── integrations
│   │       │   │   ├── hud.mdx
│   │       │   │   ├── meta.json
│   │       │   │   └── observability.mdx
│   │       │   ├── mcp-server
│   │       │   │   ├── client-integrations.mdx
│   │       │   │   ├── configuration.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── llm-integrations.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── tools.mdx
│   │       │   │   └── usage.mdx
│   │       │   ├── message-format.mdx
│   │       │   ├── meta.json
│   │       │   ├── migration-guide.mdx
│   │       │   ├── prompt-caching.mdx
│   │       │   ├── supported-agents
│   │       │   │   ├── composed-agents.mdx
│   │       │   │   ├── computer-use-agents.mdx
│   │       │   │   ├── grounding-models.mdx
│   │       │   │   ├── human-in-the-loop.mdx
│   │       │   │   └── meta.json
│   │       │   ├── supported-model-providers
│   │       │   │   ├── cua-vlm-router.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   └── local-models.mdx
│   │       │   ├── telemetry.mdx
│   │       │   └── usage-tracking.mdx
│   │       ├── cli-playbook
│   │       │   ├── commands.mdx
│   │       │   ├── index.mdx
│   │       │   └── meta.json
│   │       ├── computer-sdk
│   │       │   ├── cloud-vm-management.mdx
│   │       │   ├── commands.mdx
│   │       │   ├── computer-server
│   │       │   │   ├── Commands.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── meta.json
│   │       │   │   ├── REST-API.mdx
│   │       │   │   └── WebSocket-API.mdx
│   │       │   ├── computer-ui.mdx
│   │       │   ├── computers.mdx
│   │       │   ├── custom-computer-handlers.mdx
│   │       │   ├── meta.json
│   │       │   ├── sandboxed-python.mdx
│   │       │   └── tracing-api.mdx
│   │       ├── example-usecases
│   │       │   ├── form-filling.mdx
│   │       │   ├── gemini-complex-ui-navigation.mdx
│   │       │   ├── meta.json
│   │       │   ├── post-event-contact-export.mdx
│   │       │   └── windows-app-behind-vpn.mdx
│   │       ├── get-started
│   │       │   ├── meta.json
│   │       │   └── quickstart.mdx
│   │       ├── index.mdx
│   │       ├── macos-vm-cli-playbook
│   │       │   ├── lume
│   │       │   │   ├── cli-reference.mdx
│   │       │   │   ├── faq.md
│   │       │   │   ├── http-api.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   ├── meta.json
│   │       │   │   └── prebuilt-images.mdx
│   │       │   ├── lumier
│   │       │   │   ├── building-lumier.mdx
│   │       │   │   ├── docker-compose.mdx
│   │       │   │   ├── docker.mdx
│   │       │   │   ├── index.mdx
│   │       │   │   ├── installation.mdx
│   │       │   │   └── meta.json
│   │       │   └── meta.json
│   │       └── meta.json
│   ├── next.config.mjs
│   ├── package-lock.json
│   ├── package.json
│   ├── pnpm-lock.yaml
│   ├── postcss.config.mjs
│   ├── public
│   │   └── img
│   │       ├── agent_gradio_ui.png
│   │       ├── agent.png
│   │       ├── bg-dark.jpg
│   │       ├── bg-light.jpg
│   │       ├── cli.png
│   │       ├── computer.png
│   │       ├── grounding-with-gemini3.gif
│   │       ├── hero.png
│   │       ├── laminar_trace_example.png
│   │       ├── som_box_threshold.png
│   │       └── som_iou_threshold.png
│   ├── README.md
│   ├── source.config.ts
│   ├── src
│   │   ├── app
│   │   │   ├── (home)
│   │   │   │   ├── [[...slug]]
│   │   │   │   │   └── page.tsx
│   │   │   │   └── layout.tsx
│   │   │   ├── api
│   │   │   │   ├── posthog
│   │   │   │   │   └── [...path]
│   │   │   │   │       └── route.ts
│   │   │   │   └── search
│   │   │   │       └── route.ts
│   │   │   ├── favicon.ico
│   │   │   ├── global.css
│   │   │   ├── layout.config.tsx
│   │   │   ├── layout.tsx
│   │   │   ├── llms.mdx
│   │   │   │   └── [[...slug]]
│   │   │   │       └── route.ts
│   │   │   ├── llms.txt
│   │   │   │   └── route.ts
│   │   │   ├── robots.ts
│   │   │   └── sitemap.ts
│   │   ├── assets
│   │   │   ├── discord-black.svg
│   │   │   ├── discord-white.svg
│   │   │   ├── logo-black.svg
│   │   │   └── logo-white.svg
│   │   ├── components
│   │   │   ├── analytics-tracker.tsx
│   │   │   ├── cookie-consent.tsx
│   │   │   ├── doc-actions-menu.tsx
│   │   │   ├── editable-code-block.tsx
│   │   │   ├── footer.tsx
│   │   │   ├── hero.tsx
│   │   │   ├── iou.tsx
│   │   │   ├── mermaid.tsx
│   │   │   └── page-feedback.tsx
│   │   ├── lib
│   │   │   ├── llms.ts
│   │   │   └── source.ts
│   │   ├── mdx-components.tsx
│   │   └── providers
│   │       └── posthog-provider.tsx
│   └── tsconfig.json
├── examples
│   ├── agent_examples.py
│   ├── agent_ui_examples.py
│   ├── browser_tool_example.py
│   ├── cloud_api_examples.py
│   ├── computer_examples_windows.py
│   ├── computer_examples.py
│   ├── computer_ui_examples.py
│   ├── computer-example-ts
│   │   ├── .env.example
│   │   ├── .gitignore
│   │   ├── package-lock.json
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── README.md
│   │   ├── src
│   │   │   ├── helpers.ts
│   │   │   └── index.ts
│   │   └── tsconfig.json
│   ├── docker_examples.py
│   ├── evals
│   │   ├── hud_eval_examples.py
│   │   └── wikipedia_most_linked.txt
│   ├── pylume_examples.py
│   ├── sandboxed_functions_examples.py
│   ├── som_examples.py
│   ├── tracing_examples.py
│   ├── utils.py
│   └── winsandbox_example.py
├── img
│   ├── agent_gradio_ui.png
│   ├── agent.png
│   ├── cli.png
│   ├── computer.png
│   ├── logo_black.png
│   └── logo_white.png
├── libs
│   ├── kasm
│   │   ├── Dockerfile
│   │   ├── LICENSE
│   │   ├── README.md
│   │   └── src
│   │       └── ubuntu
│   │           └── install
│   │               └── firefox
│   │                   ├── custom_startup.sh
│   │                   ├── firefox.desktop
│   │                   └── install_firefox.sh
│   ├── lume
│   │   ├── .cursorignore
│   │   ├── CONTRIBUTING.md
│   │   ├── Development.md
│   │   ├── img
│   │   │   └── cli.png
│   │   ├── Package.resolved
│   │   ├── Package.swift
│   │   ├── README.md
│   │   ├── resources
│   │   │   └── lume.entitlements
│   │   ├── scripts
│   │   │   ├── build
│   │   │   │   ├── build-debug.sh
│   │   │   │   ├── build-release-notarized.sh
│   │   │   │   └── build-release.sh
│   │   │   └── install.sh
│   │   ├── src
│   │   │   ├── Commands
│   │   │   │   ├── Clone.swift
│   │   │   │   ├── Config.swift
│   │   │   │   ├── Create.swift
│   │   │   │   ├── Delete.swift
│   │   │   │   ├── Get.swift
│   │   │   │   ├── Images.swift
│   │   │   │   ├── IPSW.swift
│   │   │   │   ├── List.swift
│   │   │   │   ├── Logs.swift
│   │   │   │   ├── Options
│   │   │   │   │   └── FormatOption.swift
│   │   │   │   ├── Prune.swift
│   │   │   │   ├── Pull.swift
│   │   │   │   ├── Push.swift
│   │   │   │   ├── Run.swift
│   │   │   │   ├── Serve.swift
│   │   │   │   ├── Set.swift
│   │   │   │   └── Stop.swift
│   │   │   ├── ContainerRegistry
│   │   │   │   ├── ImageContainerRegistry.swift
│   │   │   │   ├── ImageList.swift
│   │   │   │   └── ImagesPrinter.swift
│   │   │   ├── Errors
│   │   │   │   └── Errors.swift
│   │   │   ├── FileSystem
│   │   │   │   ├── Home.swift
│   │   │   │   ├── Settings.swift
│   │   │   │   ├── VMConfig.swift
│   │   │   │   ├── VMDirectory.swift
│   │   │   │   └── VMLocation.swift
│   │   │   ├── LumeController.swift
│   │   │   ├── Main.swift
│   │   │   ├── Server
│   │   │   │   ├── Handlers.swift
│   │   │   │   ├── HTTP.swift
│   │   │   │   ├── Requests.swift
│   │   │   │   ├── Responses.swift
│   │   │   │   └── Server.swift
│   │   │   ├── Utils
│   │   │   │   ├── CommandRegistry.swift
│   │   │   │   ├── CommandUtils.swift
│   │   │   │   ├── Logger.swift
│   │   │   │   ├── NetworkUtils.swift
│   │   │   │   ├── Path.swift
│   │   │   │   ├── ProcessRunner.swift
│   │   │   │   ├── ProgressLogger.swift
│   │   │   │   ├── String.swift
│   │   │   │   └── Utils.swift
│   │   │   ├── Virtualization
│   │   │   │   ├── DarwinImageLoader.swift
│   │   │   │   ├── DHCPLeaseParser.swift
│   │   │   │   ├── ImageLoaderFactory.swift
│   │   │   │   └── VMVirtualizationService.swift
│   │   │   ├── VM
│   │   │   │   ├── DarwinVM.swift
│   │   │   │   ├── LinuxVM.swift
│   │   │   │   ├── VM.swift
│   │   │   │   ├── VMDetails.swift
│   │   │   │   ├── VMDetailsPrinter.swift
│   │   │   │   ├── VMDisplayResolution.swift
│   │   │   │   └── VMFactory.swift
│   │   │   └── VNC
│   │   │       ├── PassphraseGenerator.swift
│   │   │       └── VNCService.swift
│   │   └── tests
│   │       ├── Mocks
│   │       │   ├── MockVM.swift
│   │       │   ├── MockVMVirtualizationService.swift
│   │       │   └── MockVNCService.swift
│   │       ├── VM
│   │       │   └── VMDetailsPrinterTests.swift
│   │       ├── VMTests.swift
│   │       ├── VMVirtualizationServiceTests.swift
│   │       └── VNCServiceTests.swift
│   ├── lumier
│   │   ├── .dockerignore
│   │   ├── Dockerfile
│   │   ├── README.md
│   │   └── src
│   │       ├── bin
│   │       │   └── entry.sh
│   │       ├── config
│   │       │   └── constants.sh
│   │       ├── hooks
│   │       │   └── on-logon.sh
│   │       └── lib
│   │           ├── utils.sh
│   │           └── vm.sh
│   ├── python
│   │   ├── agent
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── agent
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── adapters
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── cua_adapter.py
│   │   │   │   │   ├── huggingfacelocal_adapter.py
│   │   │   │   │   ├── human_adapter.py
│   │   │   │   │   ├── mlxvlm_adapter.py
│   │   │   │   │   └── models
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── generic.py
│   │   │   │   │       ├── internvl.py
│   │   │   │   │       ├── opencua.py
│   │   │   │   │       └── qwen2_5_vl.py
│   │   │   │   ├── agent.py
│   │   │   │   ├── callbacks
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── budget_manager.py
│   │   │   │   │   ├── image_retention.py
│   │   │   │   │   ├── logging.py
│   │   │   │   │   ├── operator_validator.py
│   │   │   │   │   ├── pii_anonymization.py
│   │   │   │   │   ├── prompt_instructions.py
│   │   │   │   │   ├── telemetry.py
│   │   │   │   │   └── trajectory_saver.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── computers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cua.py
│   │   │   │   │   └── custom.py
│   │   │   │   ├── decorators.py
│   │   │   │   ├── human_tool
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   ├── server.py
│   │   │   │   │   └── ui.py
│   │   │   │   ├── integrations
│   │   │   │   │   └── hud
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── agent.py
│   │   │   │   │       └── proxy.py
│   │   │   │   ├── loops
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── anthropic.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── composed_grounded.py
│   │   │   │   │   ├── gelato.py
│   │   │   │   │   ├── gemini.py
│   │   │   │   │   ├── generic_vlm.py
│   │   │   │   │   ├── glm45v.py
│   │   │   │   │   ├── gta1.py
│   │   │   │   │   ├── holo.py
│   │   │   │   │   ├── internvl.py
│   │   │   │   │   ├── model_types.csv
│   │   │   │   │   ├── moondream3.py
│   │   │   │   │   ├── omniparser.py
│   │   │   │   │   ├── openai.py
│   │   │   │   │   ├── opencua.py
│   │   │   │   │   ├── uiins.py
│   │   │   │   │   ├── uitars.py
│   │   │   │   │   └── uitars2.py
│   │   │   │   ├── proxy
│   │   │   │   │   ├── examples.py
│   │   │   │   │   └── handlers.py
│   │   │   │   ├── responses.py
│   │   │   │   ├── tools
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── browser_tool.py
│   │   │   │   ├── types.py
│   │   │   │   └── ui
│   │   │   │       ├── __init__.py
│   │   │   │       ├── __main__.py
│   │   │   │       └── gradio
│   │   │   │           ├── __init__.py
│   │   │   │           ├── app.py
│   │   │   │           └── ui_components.py
│   │   │   ├── benchmarks
│   │   │   │   ├── .gitignore
│   │   │   │   ├── contrib.md
│   │   │   │   ├── interactive.py
│   │   │   │   ├── models
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   └── gta1.py
│   │   │   │   ├── README.md
│   │   │   │   ├── ss-pro.py
│   │   │   │   ├── ss-v2.py
│   │   │   │   └── utils.py
│   │   │   ├── example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_computer_agent.py
│   │   ├── bench-ui
│   │   │   ├── bench_ui
│   │   │   │   ├── __init__.py
│   │   │   │   ├── api.py
│   │   │   │   └── child.py
│   │   │   ├── examples
│   │   │   │   ├── folder_example.py
│   │   │   │   ├── gui
│   │   │   │   │   ├── index.html
│   │   │   │   │   ├── logo.svg
│   │   │   │   │   └── styles.css
│   │   │   │   ├── output_overlay.png
│   │   │   │   └── simple_example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       └── test_port_detection.py
│   │   ├── computer
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer
│   │   │   │   ├── __init__.py
│   │   │   │   ├── computer.py
│   │   │   │   ├── diorama_computer.py
│   │   │   │   ├── helpers.py
│   │   │   │   ├── interface
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   ├── models.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── logger.py
│   │   │   │   ├── models.py
│   │   │   │   ├── providers
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── cloud
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── docker
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── lume
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── lume_api.py
│   │   │   │   │   ├── lumier
│   │   │   │   │   │   ├── __init__.py
│   │   │   │   │   │   └── provider.py
│   │   │   │   │   ├── types.py
│   │   │   │   │   └── winsandbox
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       ├── provider.py
│   │   │   │   │       └── setup_script.ps1
│   │   │   │   ├── tracing_wrapper.py
│   │   │   │   ├── tracing.py
│   │   │   │   ├── ui
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── __main__.py
│   │   │   │   │   └── gradio
│   │   │   │   │       ├── __init__.py
│   │   │   │   │       └── app.py
│   │   │   │   └── utils.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_computer.py
│   │   ├── computer-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── computer_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── browser.py
│   │   │   │   ├── cli.py
│   │   │   │   ├── diorama
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── diorama_computer.py
│   │   │   │   │   ├── diorama.py
│   │   │   │   │   ├── draw.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── safezone.py
│   │   │   │   ├── handlers
│   │   │   │   │   ├── base.py
│   │   │   │   │   ├── factory.py
│   │   │   │   │   ├── generic.py
│   │   │   │   │   ├── linux.py
│   │   │   │   │   ├── macos.py
│   │   │   │   │   └── windows.py
│   │   │   │   ├── main.py
│   │   │   │   ├── server.py
│   │   │   │   ├── utils
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── wallpaper.py
│   │   │   │   └── watchdog.py
│   │   │   ├── examples
│   │   │   │   ├── __init__.py
│   │   │   │   └── usage_example.py
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   ├── run_server.py
│   │   │   ├── test_connection.py
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_server.py
│   │   ├── core
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── core
│   │   │   │   ├── __init__.py
│   │   │   │   └── telemetry
│   │   │   │       ├── __init__.py
│   │   │   │       └── posthog.py
│   │   │   ├── poetry.toml
│   │   │   ├── pyproject.toml
│   │   │   ├── README.md
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_telemetry.py
│   │   ├── mcp-server
│   │   │   ├── .bumpversion.cfg
│   │   │   ├── build-extension.py
│   │   │   ├── CONCURRENT_SESSIONS.md
│   │   │   ├── desktop-extension
│   │   │   │   ├── cua-extension.mcpb
│   │   │   │   ├── desktop_extension.png
│   │   │   │   ├── manifest.json
│   │   │   │   ├── README.md
│   │   │   │   ├── requirements.txt
│   │   │   │   ├── run_server.sh
│   │   │   │   └── setup.py
│   │   │   ├── mcp_server
│   │   │   │   ├── __init__.py
│   │   │   │   ├── __main__.py
│   │   │   │   ├── server.py
│   │   │   │   └── session_manager.py
│   │   │   ├── pdm.lock
│   │   │   ├── pyproject.toml
│   │   │   ├── QUICK_TEST_COMMANDS.sh
│   │   │   ├── quick_test_local_option.py
│   │   │   ├── README.md
│   │   │   ├── scripts
│   │   │   │   ├── install_mcp_server.sh
│   │   │   │   └── start_mcp_server.sh
│   │   │   ├── test_mcp_server_local_option.py
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_mcp_server.py
│   │   ├── pylume
│   │   │   └── tests
│   │   │       ├── conftest.py
│   │   │       └── test_pylume.py
│   │   └── som
│   │       ├── .bumpversion.cfg
│   │       ├── LICENSE
│   │       ├── poetry.toml
│   │       ├── pyproject.toml
│   │       ├── README.md
│   │       ├── som
│   │       │   ├── __init__.py
│   │       │   ├── detect.py
│   │       │   ├── detection.py
│   │       │   ├── models.py
│   │       │   ├── ocr.py
│   │       │   ├── util
│   │       │   │   └── utils.py
│   │       │   └── visualization.py
│   │       └── tests
│   │           ├── conftest.py
│   │           └── test_omniparser.py
│   ├── qemu-docker
│   │   ├── linux
│   │   │   ├── Dockerfile
│   │   │   ├── README.md
│   │   │   └── src
│   │   │       ├── entry.sh
│   │   │       └── vm
│   │   │           ├── image
│   │   │           │   └── README.md
│   │   │           └── setup
│   │   │               ├── install.sh
│   │   │               ├── setup-cua-server.sh
│   │   │               └── setup.sh
│   │   ├── README.md
│   │   └── windows
│   │       ├── Dockerfile
│   │       ├── README.md
│   │       └── src
│   │           ├── entry.sh
│   │           └── vm
│   │               ├── image
│   │               │   └── README.md
│   │               └── setup
│   │                   ├── install.bat
│   │                   ├── on-logon.ps1
│   │                   ├── setup-cua-server.ps1
│   │                   ├── setup-utils.psm1
│   │                   └── setup.ps1
│   ├── typescript
│   │   ├── .gitignore
│   │   ├── .nvmrc
│   │   ├── agent
│   │   │   ├── examples
│   │   │   │   ├── playground-example.html
│   │   │   │   └── README.md
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── client.ts
│   │   │   │   ├── index.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   └── client.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── computer
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── computer
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── providers
│   │   │   │   │   │   ├── base.ts
│   │   │   │   │   │   ├── cloud.ts
│   │   │   │   │   │   └── index.ts
│   │   │   │   │   └── types.ts
│   │   │   │   ├── index.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── base.ts
│   │   │   │   │   ├── factory.ts
│   │   │   │   │   ├── index.ts
│   │   │   │   │   ├── linux.ts
│   │   │   │   │   ├── macos.ts
│   │   │   │   │   └── windows.ts
│   │   │   │   └── types.ts
│   │   │   ├── tests
│   │   │   │   ├── computer
│   │   │   │   │   └── cloud.test.ts
│   │   │   │   ├── interface
│   │   │   │   │   ├── factory.test.ts
│   │   │   │   │   ├── index.test.ts
│   │   │   │   │   ├── linux.test.ts
│   │   │   │   │   ├── macos.test.ts
│   │   │   │   │   └── windows.test.ts
│   │   │   │   └── setup.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── core
│   │   │   ├── .editorconfig
│   │   │   ├── .gitattributes
│   │   │   ├── .gitignore
│   │   │   ├── LICENSE
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── index.ts
│   │   │   │   └── telemetry
│   │   │   │       ├── clients
│   │   │   │       │   ├── index.ts
│   │   │   │       │   └── posthog.ts
│   │   │   │       └── index.ts
│   │   │   ├── tests
│   │   │   │   └── telemetry.test.ts
│   │   │   ├── tsconfig.json
│   │   │   ├── tsdown.config.ts
│   │   │   └── vitest.config.ts
│   │   ├── cua-cli
│   │   │   ├── .gitignore
│   │   │   ├── .prettierrc
│   │   │   ├── bun.lock
│   │   │   ├── CLAUDE.md
│   │   │   ├── index.ts
│   │   │   ├── package.json
│   │   │   ├── README.md
│   │   │   ├── src
│   │   │   │   ├── auth.ts
│   │   │   │   ├── cli.ts
│   │   │   │   ├── commands
│   │   │   │   │   ├── auth.ts
│   │   │   │   │   └── sandbox.ts
│   │   │   │   ├── config.ts
│   │   │   │   ├── http.ts
│   │   │   │   ├── storage.ts
│   │   │   │   └── util.ts
│   │   │   └── tsconfig.json
│   │   ├── package.json
│   │   ├── pnpm-lock.yaml
│   │   ├── pnpm-workspace.yaml
│   │   └── README.md
│   └── xfce
│       ├── .dockerignore
│       ├── .gitignore
│       ├── Development.md
│       ├── Dockerfile
│       ├── Dockerfile.dev
│       ├── README.md
│       └── src
│           ├── scripts
│           │   ├── resize-display.sh
│           │   ├── start-computer-server.sh
│           │   ├── start-novnc.sh
│           │   ├── start-vnc.sh
│           │   └── xstartup.sh
│           ├── supervisor
│           │   └── supervisord.conf
│           └── xfce-config
│               ├── helpers.rc
│               ├── xfce4-power-manager.xml
│               └── xfce4-session.xml
├── LICENSE.md
├── Makefile
├── notebooks
│   ├── agent_nb.ipynb
│   ├── blog
│   │   ├── build-your-own-operator-on-macos-1.ipynb
│   │   └── build-your-own-operator-on-macos-2.ipynb
│   ├── composite_agents_docker_nb.ipynb
│   ├── computer_nb.ipynb
│   ├── computer_server_nb.ipynb
│   ├── customizing_computeragent.ipynb
│   ├── eval_osworld.ipynb
│   ├── ollama_nb.ipynb
│   ├── README.md
│   ├── sota_hackathon_cloud.ipynb
│   └── sota_hackathon.ipynb
├── package-lock.json
├── package.json
├── pnpm-lock.yaml
├── pyproject.toml
├── pyrightconfig.json
├── README.md
├── scripts
│   ├── install-cli.ps1
│   ├── install-cli.sh
│   ├── playground-docker.sh
│   ├── playground.sh
│   ├── run-docker-dev.sh
│   └── typescript-typecheck.js
├── TESTING.md
├── tests
│   ├── agent_loop_testing
│   │   ├── agent_test.py
│   │   └── README.md
│   ├── pytest.ini
│   ├── shell_cmd.py
│   ├── test_files.py
│   ├── test_mcp_server_session_management.py
│   ├── test_mcp_server_streaming.py
│   ├── test_shell_bash.py
│   ├── test_telemetry.py
│   ├── test_tracing.py
│   ├── test_venv.py
│   └── test_watchdog.py
└── uv.lock
```

# Files

--------------------------------------------------------------------------------
/libs/python/computer-server/computer_server/handlers/base.py:
--------------------------------------------------------------------------------

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple


class BaseAccessibilityHandler(ABC):
    """Abstract base class for OS-specific accessibility handlers."""

    @abstractmethod
    async def get_accessibility_tree(self) -> Dict[str, Any]:
        """Get the accessibility tree of the current window."""
        pass

    @abstractmethod
    async def find_element(
        self, role: Optional[str] = None, title: Optional[str] = None, value: Optional[str] = None
    ) -> Dict[str, Any]:
        """Find an element in the accessibility tree by criteria."""
        pass


class BaseFileHandler(ABC):
    """Abstract base class for OS-specific file handlers."""

    @abstractmethod
    async def file_exists(self, path: str) -> Dict[str, Any]:
        """Check if a file exists at the specified path."""
        pass

    @abstractmethod
    async def directory_exists(self, path: str) -> Dict[str, Any]:
        """Check if a directory exists at the specified path."""
        pass

    @abstractmethod
    async def list_dir(self, path: str) -> Dict[str, Any]:
        """List the contents of a directory."""
        pass

    @abstractmethod
    async def read_text(self, path: str) -> Dict[str, Any]:
        """Read the text contents of a file."""
        pass

    @abstractmethod
    async def write_text(self, path: str, content: str) -> Dict[str, Any]:
        """Write text content to a file."""
        pass

    @abstractmethod
    async def write_bytes(self, path: str, content_b64: str) -> Dict[str, Any]:
        """Write binary content to a file. Sent over the websocket as a base64 string."""
        pass

    @abstractmethod
    async def delete_file(self, path: str) -> Dict[str, Any]:
        """Delete a file."""
        pass

    @abstractmethod
    async def create_dir(self, path: str) -> Dict[str, Any]:
        """Create a directory."""
        pass

    @abstractmethod
    async def delete_dir(self, path: str) -> Dict[str, Any]:
        """Delete a directory."""
        pass

    @abstractmethod
    async def read_bytes(
        self, path: str, offset: int = 0, length: Optional[int] = None
    ) -> Dict[str, Any]:
        """Read the binary contents of a file. Sent over the websocket as a base64 string.

        Args:
            path: Path to the file
            offset: Byte offset to start reading from (default: 0)
            length: Number of bytes to read (default: None for entire file)
        """
        pass

    @abstractmethod
    async def get_file_size(self, path: str) -> Dict[str, Any]:
        """Get the size of a file in bytes."""
        pass


class BaseDesktopHandler(ABC):
    """Abstract base class for OS-specific desktop handlers.

    Categories:
    - Wallpaper Actions: Methods for wallpaper operations
    - Desktop shortcut actions: Methods for managing desktop shortcuts
    """

    # Wallpaper Actions
    @abstractmethod
    async def get_desktop_environment(self) -> Dict[str, Any]:
        """Get the current desktop environment name."""
        pass

    @abstractmethod
    async def set_wallpaper(self, path: str) -> Dict[str, Any]:
        """Set the desktop wallpaper to the file at path."""
        pass


class BaseWindowHandler(ABC):
    """Abstract class for OS-specific window management handlers.

    Categories:
    - Window Management: Methods for application/window control
    """

    # Window Management
    @abstractmethod
    async def open(self, target: str) -> Dict[str, Any]:
        """Open a file or URL with the default application."""
        pass

    @abstractmethod
    async def launch(self, app: str, args: Optional[List[str]] = None) -> Dict[str, Any]:
        """Launch an application with optional arguments."""
        pass

    @abstractmethod
    async def get_current_window_id(self) -> Dict[str, Any]:
        """Get the currently active window ID."""
        pass

    @abstractmethod
    async def get_application_windows(self, app: str) -> Dict[str, Any]:
        """Get windows belonging to an application (by name or bundle)."""
        pass

    @abstractmethod
    async def get_window_name(self, window_id: str) -> Dict[str, Any]:
        """Get the title/name of a window by ID."""
        pass

    @abstractmethod
    async def get_window_size(self, window_id: str | int) -> Dict[str, Any]:
        """Get the size of a window by ID as {width, height}."""
        pass

    @abstractmethod
    async def activate_window(self, window_id: str | int) -> Dict[str, Any]:
        """Bring a window to the foreground by ID."""
        pass

    @abstractmethod
    async def close_window(self, window_id: str | int) -> Dict[str, Any]:
        """Close a window by ID."""
        pass

    @abstractmethod
    async def get_window_position(self, window_id: str | int) -> Dict[str, Any]:
        """Get the top-left position of a window as {x, y}."""
        pass

    @abstractmethod
    async def set_window_size(
        self, window_id: str | int, width: int, height: int
    ) -> Dict[str, Any]:
        """Set the size of a window by ID."""
        pass

    @abstractmethod
    async def set_window_position(self, window_id: str | int, x: int, y: int) -> Dict[str, Any]:
        """Set the position of a window by ID."""
        pass

    @abstractmethod
    async def maximize_window(self, window_id: str | int) -> Dict[str, Any]:
        """Maximize a window by ID."""
        pass

    @abstractmethod
    async def minimize_window(self, window_id: str | int) -> Dict[str, Any]:
        """Minimize a window by ID."""
        pass


class BaseAutomationHandler(ABC):
    """Abstract base class for OS-specific automation handlers.

    Categories:
    - Mouse Actions: Methods for mouse control
    - Keyboard Actions: Methods for keyboard input
    - Scrolling Actions: Methods for scrolling
    - Screen Actions: Methods for screen interaction
    - Clipboard Actions: Methods for clipboard operations
    """

    # Mouse Actions
    @abstractmethod
    async def mouse_down(
        self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
    ) -> Dict[str, Any]:
        """Perform a mouse down at the current or specified position."""
        pass

    @abstractmethod
    async def mouse_up(
        self, x: Optional[int] = None, y: Optional[int] = None, button: str = "left"
    ) -> Dict[str, Any]:
        """Perform a mouse up at the current or specified position."""
        pass

    @abstractmethod
    async def left_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
        """Perform a left click at the current or specified position."""
        pass

    @abstractmethod
    async def right_click(self, x: Optional[int] = None, y: Optional[int] = None) -> Dict[str, Any]:
        """Perform a right click at the current or specified position."""
        pass

    @abstractmethod
    async def double_click(
        self, x: Optional[int] = None, y: Optional[int] = None
    ) -> Dict[str, Any]:
        """Perform a double click at the current or specified position."""
        pass

    @abstractmethod
    async def move_cursor(self, x: int, y: int) -> Dict[str, Any]:
        """Move the cursor to the specified position."""
        pass

    @abstractmethod
    async def drag_to(
        self, x: int, y: int, button: str = "left", duration: float = 0.5
    ) -> Dict[str, Any]:
        """Drag the cursor from current position to specified coordinates.

        Args:
            x: The x coordinate to drag to
            y: The y coordinate to drag to
            button: The mouse button to use ('left', 'middle', 'right')
            duration: How long the drag should take in seconds
        """
        pass

    @abstractmethod
    async def drag(
        self, path: List[Tuple[int, int]], button: str = "left", duration: float = 0.5
    ) -> Dict[str, Any]:
        """Drag the cursor from current position to specified coordinates.

        Args:
            path: A list of tuples of x and y coordinates to drag to
            button: The mouse button to use ('left', 'middle', 'right')
            duration: How long the drag should take in seconds
        """
        pass

    # Keyboard Actions
    @abstractmethod
    async def key_down(self, key: str) -> Dict[str, Any]:
        """Press and hold the specified key."""
        pass

    @abstractmethod
    async def key_up(self, key: str) -> Dict[str, Any]:
        """Release the specified key."""
        pass

    @abstractmethod
    async def type_text(self, text: str) -> Dict[str, Any]:
        """Type the specified text."""
        pass

    @abstractmethod
    async def press_key(self, key: str) -> Dict[str, Any]:
        """Press the specified key."""
        pass

    @abstractmethod
    async def hotkey(self, keys: List[str]) -> Dict[str, Any]:
        """Press a combination of keys together."""
        pass

    # Scrolling Actions
    @abstractmethod
    async def scroll(self, x: int, y: int) -> Dict[str, Any]:
        """Scroll the specified amount."""
        pass

    @abstractmethod
    async def scroll_down(self, clicks: int = 1) -> Dict[str, Any]:
        """Scroll down by the specified number of clicks."""
        pass

    @abstractmethod
    async def scroll_up(self, clicks: int = 1) -> Dict[str, Any]:
        """Scroll up by the specified number of clicks."""
        pass

    # Screen Actions
    @abstractmethod
    async def screenshot(self) -> Dict[str, Any]:
        """Take a screenshot and return base64 encoded image data."""
        pass

    @abstractmethod
    async def get_screen_size(self) -> Dict[str, Any]:
        """Get the screen size of the VM."""
        pass

    @abstractmethod
    async def get_cursor_position(self) -> Dict[str, Any]:
        """Get the current cursor position."""
        pass

    # Clipboard Actions
    @abstractmethod
    async def copy_to_clipboard(self) -> Dict[str, Any]:
        """Get the current clipboard content."""
        pass

    @abstractmethod
    async def set_clipboard(self, text: str) -> Dict[str, Any]:
        """Set the clipboard content."""
        pass

    @abstractmethod
    async def run_command(self, command: str) -> Dict[str, Any]:
        """Run a command and return the output."""
        pass

```
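
The classes above only define the interface; the concrete, OS-specific implementations live alongside this file (`linux.py`, `macos.py`, `windows.py`). As a rough illustration of the contract, here is a minimal sketch of a file handler subclass. The class name, the `success`/`error` keys in the returned dictionaries, and the local-filesystem behavior are assumptions for illustration, not the repository's actual handlers.

```python
# Illustrative sketch only: not the repository's implementation.
from pathlib import Path
from typing import Any, Dict

from computer_server.handlers.base import BaseFileHandler


class ExampleFileHandler(BaseFileHandler):
    """Minimal local-filesystem handler; result keys are assumed for illustration."""

    async def file_exists(self, path: str) -> Dict[str, Any]:
        return {"success": True, "exists": Path(path).is_file()}

    async def read_text(self, path: str) -> Dict[str, Any]:
        try:
            return {"success": True, "content": Path(path).read_text()}
        except OSError as e:
            return {"success": False, "error": str(e)}

    # A real handler must also implement the remaining abstract methods
    # (directory_exists, list_dir, write_text, write_bytes, delete_file,
    # create_dir, delete_dir, read_bytes, get_file_size) before it can be
    # instantiated.
```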

--------------------------------------------------------------------------------
/docs/content/docs/computer-sdk/tracing-api.mdx:
--------------------------------------------------------------------------------

```markdown
---
title: Tracing
description: Record computer interactions for debugging, training, and analysis
---

# Tracing

The Computer tracing API provides a powerful way to record computer interactions for debugging, training, analysis, and compliance purposes. Inspired by Playwright's tracing functionality, it offers flexible recording options and standardized output formats.

## Overview

The tracing API allows you to:

- Record screenshots at key moments
- Log all API calls and their results
- Capture accessibility tree snapshots
- Add custom metadata
- Export recordings in standardized formats
- Support both automated and human-in-the-loop workflows

## Basic Usage

### Starting and Stopping Traces

```python
from computer import Computer

computer = Computer(os_type="macos")
await computer.run()

# Start tracing with default options
await computer.tracing.start()

# Perform some operations
await computer.interface.left_click(100, 200)
await computer.interface.type_text("Hello, World!")
await computer.interface.press_key("enter")

# Stop tracing and save
trace_path = await computer.tracing.stop()
print(f"Trace saved to: {trace_path}")
```

### Custom Configuration

```python
# Start tracing with custom configuration
await computer.tracing.start({
    'video': False,              # Record video frames (future feature)
    'screenshots': True,         # Record screenshots (default: True)
    'api_calls': True,          # Record API calls (default: True)
    'accessibility_tree': True, # Record accessibility snapshots
    'metadata': True,           # Allow custom metadata (default: True)
    'name': 'my_custom_trace',  # Custom trace name
    'path': './my_traces'       # Custom output directory
})

# Add custom metadata during tracing
await computer.tracing.add_metadata('user_id', 'user123')
await computer.tracing.add_metadata('test_case', 'login_flow')

# Stop with custom options
trace_path = await computer.tracing.stop({
    'path': './exports/trace.zip',
    'format': 'zip'  # 'zip' or 'dir'
})
```

## Configuration Options

### Start Options

| Option               | Type | Default        | Description                           |
| -------------------- | ---- | -------------- | ------------------------------------- |
| `video`              | bool | `False`        | Record video frames (future feature)  |
| `screenshots`        | bool | `True`         | Capture screenshots after key actions |
| `api_calls`          | bool | `True`         | Log all interface method calls        |
| `accessibility_tree` | bool | `False`        | Record accessibility tree snapshots   |
| `metadata`           | bool | `True`         | Enable custom metadata recording      |
| `name`               | str  | auto-generated | Custom name for the trace             |
| `path`               | str  | auto-generated | Custom directory for trace files      |

### Stop Options

| Option   | Type | Default        | Description                        |
| -------- | ---- | -------------- | ---------------------------------- |
| `path`   | str  | auto-generated | Custom output path for final trace |
| `format` | str  | `'zip'`        | Output format: `'zip'` or `'dir'`  |

## Use Cases

### Custom Agent Development

```python
from computer import Computer

async def test_custom_agent():
    computer = Computer(os_type="linux")
    await computer.run()

    # Start tracing for this test session
    await computer.tracing.start({
        'name': 'custom_agent_test',
        'screenshots': True,
        'accessibility_tree': True
    })

    # Your custom agent logic here
    screenshot = await computer.interface.screenshot()
    await computer.interface.left_click(500, 300)
    await computer.interface.type_text("test input")

    # Add context about what the agent is doing
    await computer.tracing.add_metadata('action', 'filling_form')
    await computer.tracing.add_metadata('confidence', 0.95)

    # Save the trace
    trace_path = await computer.tracing.stop()
    return trace_path
```

### Training Data Collection

```python
async def collect_training_data():
    computer = Computer(os_type="macos")
    await computer.run()

    tasks = [
        "open_browser_and_search",
        "create_document",
        "send_email"
    ]

    for task in tasks:
        # Start a new trace for each task
        await computer.tracing.start({
            'name': f'training_{task}',
            'screenshots': True,
            'accessibility_tree': True,
            'metadata': True
        })

        # Add task metadata
        await computer.tracing.add_metadata('task_type', task)
        await computer.tracing.add_metadata('difficulty', 'beginner')

        # Perform the task (automated or human-guided)
        await perform_task(computer, task)

        # Save this training example
        await computer.tracing.stop({
            'path': f'./training_data/{task}.zip'
        })
```

### Human-in-the-Loop Recording

```python
async def record_human_demonstration():
    computer = Computer(os_type="windows")
    await computer.run()

    # Start recording human demonstration
    await computer.tracing.start({
        'name': 'human_demo_excel_workflow',
        'screenshots': True,
        'api_calls': True,  # Will capture any programmatic actions
        'metadata': True
    })

    print("Trace recording started. Perform your demonstration...")
    print("The system will record all computer interactions.")

    # Add metadata about the demonstration
    await computer.tracing.add_metadata('demonstrator', 'expert_user')
    await computer.tracing.add_metadata('workflow', 'excel_data_analysis')

    # Human performs actions manually or through other tools
    # Tracing will still capture any programmatic interactions

    input("Press Enter when demonstration is complete...")

    # Stop and save the demonstration
    trace_path = await computer.tracing.stop()
    print(f"Human demonstration saved to: {trace_path}")
```

### RPA Debugging

```python
async def debug_rpa_workflow():
    computer = Computer(os_type="linux")
    await computer.run()

    # Start tracing with full debugging info
    await computer.tracing.start({
        'name': 'rpa_debug_session',
        'screenshots': True,
        'accessibility_tree': True,
        'api_calls': True
    })

    try:
        # Your RPA workflow
        await rpa_login_sequence(computer)
        await rpa_data_entry(computer)
        await rpa_generate_report(computer)

        await computer.tracing.add_metadata('status', 'success')

    except Exception as e:
        # Record the error in the trace
        await computer.tracing.add_metadata('error', str(e))
        await computer.tracing.add_metadata('status', 'failed')
        raise
    finally:
        # Always save the debug trace
        trace_path = await computer.tracing.stop()
        print(f"Debug trace saved to: {trace_path}")
```

## Output Format

### Directory Structure

When using `format='dir'`, traces are saved with this structure:

```
trace_20240922_143052_abc123/
├── trace_metadata.json         # Overall trace information
├── event_000001_trace_start.json
├── event_000002_api_call.json
├── event_000003_api_call.json
├── 000001_initial_screenshot.png
├── 000002_after_left_click.png
├── 000003_after_type_text.png
└── event_000004_trace_end.json
```

### Metadata Format

The `trace_metadata.json` contains:

```json
{
  "trace_id": "trace_20240922_143052_abc123",
  "config": {
    "screenshots": true,
    "api_calls": true,
    "accessibility_tree": false,
    "metadata": true
  },
  "start_time": 1695392252.123,
  "end_time": 1695392267.456,
  "duration": 15.333,
  "total_events": 12,
  "screenshot_count": 5,
  "events": [...] // All events in chronological order
}
```

### Event Format

Individual events follow this structure:

```json
{
  "type": "api_call",
  "timestamp": 1695392255.789,
  "relative_time": 3.666,
  "data": {
    "method": "left_click",
    "args": { "x": 100, "y": 200, "delay": null },
    "result": null,
    "error": null,
    "screenshot": "000002_after_left_click.png",
    "success": true
  }
}
```

## Integration with ComputerAgent

The tracing API works seamlessly with existing ComputerAgent workflows:

```python
from agent import ComputerAgent
from computer import Computer

# Create computer and start tracing
computer = Computer(os_type="macos")
await computer.run()

await computer.tracing.start({
    'name': 'agent_with_tracing',
    'screenshots': True,
    'metadata': True
})

# Create agent using the same computer
agent = ComputerAgent(
    model="openai/computer-use-preview",
    tools=[computer]
)

# Agent operations will be automatically traced
async for _ in agent.run("open cua.ai and navigate to docs"):
    pass

# Save the combined trace
trace_path = await computer.tracing.stop()
```

## Privacy Considerations

The tracing API is designed with privacy in mind:

- Clipboard content is not recorded (only content length)
- Screenshots can be disabled (see the sketch after this list)
- Sensitive text input can be filtered
- Custom metadata allows you to control what information is recorded
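
For example, a privacy-conscious session might disable visual captures entirely and record only coarse context. The trace name and metadata values below are illustrative; the configuration keys are those documented in the options table above:

```python
from computer import Computer

computer = Computer(os_type="linux")
await computer.run()

# Disable visual captures; keep only the API-call log and coarse metadata.
await computer.tracing.start({
    'name': 'privacy_conscious_session',
    'screenshots': False,
    'accessibility_tree': False,
    'api_calls': True,
    'metadata': True
})

# Record only coarse, non-sensitive context
await computer.tracing.add_metadata('workflow', 'payroll_entry')

trace_path = await computer.tracing.stop()
```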

## Comparison with ComputerAgent Trajectories

| Feature                | ComputerAgent Trajectories | Computer.tracing     |
| ---------------------- | -------------------------- | -------------------- |
| **Scope**              | ComputerAgent only         | Any Computer usage   |
| **Flexibility**        | Fixed format               | Configurable options |
| **Custom Agents**      | Not supported              | Fully supported      |
| **Human-in-the-loop**  | Limited                    | Full support         |
| **Real-time Control**  | No                         | Start/stop anytime   |
| **Output Format**      | Agent-specific             | Standardized         |
| **Accessibility Data** | No                         | Optional             |

## Best Practices

1. **Start tracing early**: Begin recording before your main workflow to capture the complete session
2. **Use meaningful names**: Provide descriptive trace names for easier organization
3. **Add contextual metadata**: Include information about what you're testing or demonstrating
4. **Handle errors gracefully**: Always stop tracing in a `finally` block (see the sketch after this list)
5. **Choose appropriate options**: Only record what you need to minimize overhead
6. **Organize output**: Use custom paths to organize traces by project or use case
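
A minimal skeleton that ties these practices together might look like the following sketch; the trace name, metadata, and output path are placeholders:

```python
from computer import Computer

computer = Computer(os_type="macos")
await computer.run()

# 1. Start early, 2. use a meaningful name, 5. record only what you need
await computer.tracing.start({
    'name': 'checkout_flow_regression',
    'screenshots': True,
    'accessibility_tree': False
})
# 3. Add contextual metadata
await computer.tracing.add_metadata('test_case', 'checkout_flow')

try:
    ...  # your workflow here
finally:
    # 4. Always stop tracing, 6. organize output by project
    await computer.tracing.stop({'path': './traces/checkout/checkout_flow.zip'})
```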

The Computer tracing API provides a powerful foundation for recording, analyzing, and improving computer automation workflows across all use cases.

```

--------------------------------------------------------------------------------
/libs/lume/scripts/install.sh:
--------------------------------------------------------------------------------

```bash
#!/bin/bash
set -e

# Lume Installer
# This script installs Lume to your system

# Define colors for output
BOLD=$(tput bold)
NORMAL=$(tput sgr0)
RED=$(tput setaf 1)
GREEN=$(tput setaf 2)
BLUE=$(tput setaf 4)
YELLOW=$(tput setaf 3)

# Check if running as root or with sudo
if [ "$(id -u)" -eq 0 ] || [ -n "$SUDO_USER" ]; then
  echo "${RED}Error: Do not run this script with sudo or as root.${NORMAL}"
  echo "If you need to install to a system directory, create it first with proper permissions:"
  echo "  sudo mkdir -p /desired/directory && sudo chown $(whoami) /desired/directory"
  echo "Then run the installer normally:"
  echo "  ./install.sh --install-dir=/desired/directory"
  exit 1
fi

# Default installation directory (user-specific, doesn't require sudo)
DEFAULT_INSTALL_DIR="$HOME/.local/bin"
INSTALL_DIR="${INSTALL_DIR:-$DEFAULT_INSTALL_DIR}"

# GitHub info
GITHUB_REPO="trycua/cua"
LATEST_RELEASE_URL="https://api.github.com/repos/$GITHUB_REPO/releases/latest"

# Option to skip background service setup (default: install it)
INSTALL_BACKGROUND_SERVICE=true

# Default port for lume serve (default: 7777)
LUME_PORT=7777

# Parse command line arguments
while [ "$#" -gt 0 ]; do
  case "$1" in
    --install-dir)
      INSTALL_DIR="$2"
      shift
      ;;
    --install-dir=*)
      INSTALL_DIR="${1#*=}"
      ;;
    --port)
      LUME_PORT="$2"
      shift
      ;;
    --port=*)
      LUME_PORT="${1#*=}"
      ;;
    --no-background-service)
      INSTALL_BACKGROUND_SERVICE=false
      ;;
    --help)
      echo "${BOLD}${BLUE}Lume Installer${NORMAL}"
      echo "Usage: $0 [OPTIONS]"
      echo ""
      echo "Options:"
      echo "  --install-dir DIR         Install to the specified directory (default: $DEFAULT_INSTALL_DIR)"
      echo "  --port PORT              Specify the port for lume serve (default: 7777)"
      echo "  --no-background-service   Do not setup the Lume background service (LaunchAgent)"
      echo "  --help                    Display this help message"
      echo ""
      echo "Examples:"
      echo "  $0                                   # Install to $DEFAULT_INSTALL_DIR and setup background service"
      echo "  $0 --install-dir=/usr/local/bin      # Install to system directory (may require root privileges)"
      echo "  $0 --port 7778                       # Use port 7778 instead of the default 7777"
      echo "  $0 --no-background-service           # Install without setting up the background service"
      echo "  INSTALL_DIR=/opt/lume $0             # Install to /opt/lume (legacy env var support)"
      exit 0
      ;;
    *)
      echo "${RED}Unknown option: $1${NORMAL}"
      echo "Use --help for usage information"
      exit 1
      ;;
  esac
  shift
done

echo "${BOLD}${BLUE}Lume Installer${NORMAL}"
echo "This script will install Lume to your system."

# Check if we're running with appropriate permissions
check_permissions() {
  # System directories that typically require root privileges
  SYSTEM_DIRS=("/usr/local/bin" "/usr/bin" "/bin" "/opt")
  
  NEEDS_ROOT=false
  for DIR in "${SYSTEM_DIRS[@]}"; do
    if [[ "$INSTALL_DIR" == "$DIR"* ]] && [ ! -w "$INSTALL_DIR" ]; then
      NEEDS_ROOT=true
      break
    fi
  done
  
  if [ "$NEEDS_ROOT" = true ]; then
    echo "${YELLOW}Warning: Installing to $INSTALL_DIR may require root privileges.${NORMAL}"
    echo "Consider these alternatives:"
    echo "  • Install to a user-writable location: $0 --install-dir=$HOME/.local/bin"
    echo "  • Create the directory with correct permissions first:"
    echo "    sudo mkdir -p $INSTALL_DIR && sudo chown $(whoami) $INSTALL_DIR"
    echo ""
    
    # Check if we already have write permission (might have been set up previously)
    if [ ! -w "$INSTALL_DIR" ] && [ ! -w "$(dirname "$INSTALL_DIR")" ]; then
      echo "${RED}Error: You don't have write permission to $INSTALL_DIR${NORMAL}"
      echo "Please choose a different installation directory or ensure you have the proper permissions."
      exit 1
    fi
  fi
}

# Detect OS and architecture
detect_platform() {
  OS=$(uname -s | tr '[:upper:]' '[:lower:]')
  ARCH=$(uname -m)
  
  if [ "$OS" != "darwin" ]; then
    echo "${RED}Error: Currently only macOS is supported.${NORMAL}"
    exit 1
  fi
  
  if [ "$ARCH" != "arm64" ]; then
    echo "${RED}Error: Lume only supports macOS on Apple Silicon (ARM64).${NORMAL}"
    exit 1
  fi
  
  PLATFORM="darwin-arm64"
  echo "Detected platform: ${BOLD}$PLATFORM${NORMAL}"
}

# Create temporary directory
create_temp_dir() {
  TEMP_DIR=$(mktemp -d)
  echo "Using temporary directory: $TEMP_DIR"
  
  # Make sure we clean up on exit
  trap 'rm -rf "$TEMP_DIR"' EXIT
}

# Download the latest release
download_release() {
  echo "Downloading latest Lume release..."
  
  # Use the direct download link with the non-versioned symlink
  DOWNLOAD_URL="https://github.com/$GITHUB_REPO/releases/latest/download/lume.tar.gz"
  echo "Downloading from: $DOWNLOAD_URL"
  
  # Download the tarball
  if command -v curl &> /dev/null; then
    curl -L --progress-bar "$DOWNLOAD_URL" -o "$TEMP_DIR/lume.tar.gz"
    
    # Verify the download was successful
    if [ ! -s "$TEMP_DIR/lume.tar.gz" ]; then
      echo "${RED}Error: Failed to download Lume.${NORMAL}"
      echo "The download URL may be incorrect or the file may not exist."
      exit 1
    fi
    
    # Verify the file is a valid archive
    if ! tar -tzf "$TEMP_DIR/lume.tar.gz" > /dev/null 2>&1; then
      echo "${RED}Error: The downloaded file is not a valid tar.gz archive.${NORMAL}"
      echo "Let's try the alternative URL..."
      
      # Try alternative URL
      ALT_DOWNLOAD_URL="https://github.com/$GITHUB_REPO/releases/latest/download/lume-$PLATFORM.tar.gz"
      echo "Downloading from alternative URL: $ALT_DOWNLOAD_URL"
      curl -L --progress-bar "$ALT_DOWNLOAD_URL" -o "$TEMP_DIR/lume.tar.gz"
      
      # Check again
      if ! tar -tzf "$TEMP_DIR/lume.tar.gz" > /dev/null 2>&1; then
        echo "${RED}Error: Could not download a valid Lume archive.${NORMAL}"
        echo "Please try installing Lume manually from: https://github.com/$GITHUB_REPO/releases/latest"
        exit 1
      fi
    fi
  else
    echo "${RED}Error: curl is required but not installed.${NORMAL}"
    exit 1
  fi
}

# Extract and install
install_binary() {
  echo "Extracting archive..."
  tar -xzf "$TEMP_DIR/lume.tar.gz" -C "$TEMP_DIR"
  
  echo "Installing to $INSTALL_DIR..."
  
  # Create install directory if it doesn't exist
  mkdir -p "$INSTALL_DIR"
  
  # Move the binary to the installation directory
  mv "$TEMP_DIR/lume" "$INSTALL_DIR/"
  
  # Make the binary executable
  chmod +x "$INSTALL_DIR/lume"
  
  echo "${GREEN}Installation complete!${NORMAL}"
  echo "Lume has been installed to ${BOLD}$INSTALL_DIR/lume${NORMAL}"
  
  # Check if the installation directory is in PATH
  if [ -n "${PATH##*$INSTALL_DIR*}" ]; then
    SHELL_NAME=$(basename "$SHELL")
    echo "${YELLOW}Warning: $INSTALL_DIR is not in your PATH.${NORMAL}"
    case "$SHELL_NAME" in
      zsh)
        echo "To add it, run:"
        echo "  echo 'export PATH=\"\$PATH:$INSTALL_DIR\"' >> ~/.zprofile"
        ;;
      bash)
        echo "To add it, run:"
        echo "  echo 'export PATH=\"\$PATH:$INSTALL_DIR\"' >> ~/.bash_profile"
        ;;
      fish)
        echo "To add it, run:"
        echo "  echo 'fish_add_path $INSTALL_DIR' >> ~/.config/fish/config.fish"
        ;;
      *)
        echo "Add $INSTALL_DIR to your PATH in your shell profile file."
        ;;
    esac
  fi
}

# Main installation flow
main() {
  check_permissions
  detect_platform
  create_temp_dir
  download_release
  install_binary

  echo ""
  echo "${GREEN}${BOLD}Lume has been successfully installed!${NORMAL}"
  echo "Run ${BOLD}lume${NORMAL} to get started."

  if [ "$INSTALL_BACKGROUND_SERVICE" = true ]; then
    # --- Setup background service (LaunchAgent) for Lume ---
    SERVICE_NAME="com.trycua.lume_daemon"
    PLIST_PATH="$HOME/Library/LaunchAgents/$SERVICE_NAME.plist"
    LUME_BIN="$INSTALL_DIR/lume"

    echo ""
    echo "Setting up LaunchAgent to run lume daemon on login..."

    # Create LaunchAgents directory if it doesn't exist
    mkdir -p "$HOME/Library/LaunchAgents"

    # Unload existing service if present
    if [ -f "$PLIST_PATH" ]; then
      echo "Existing LaunchAgent found. Unloading..."
      launchctl unload "$PLIST_PATH" 2>/dev/null || true
    fi

    # Create the plist file
    cat <<EOF > "$PLIST_PATH"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>$SERVICE_NAME</string>
    <key>ProgramArguments</key>
    <array>
        <string>$LUME_BIN</string>
        <string>serve</string>
        <string>--port</string>
        <string>$LUME_PORT</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>WorkingDirectory</key>
    <string>$HOME</string>
    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:$HOME/.local/bin</string>
        <key>HOME</key>
        <string>$HOME</string>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/lume_daemon.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/lume_daemon.error.log</string>
    <key>ProcessType</key>
    <string>Interactive</string>
    <key>SessionType</key>
    <string>Aqua</string>
</dict>
</plist>
EOF

    # Set permissions
    chmod 644 "$PLIST_PATH"
    touch /tmp/lume_daemon.log /tmp/lume_daemon.error.log
    chmod 644 /tmp/lume_daemon.log /tmp/lume_daemon.error.log

    # Load the LaunchAgent
    echo "Loading LaunchAgent..."
    launchctl unload "$PLIST_PATH" 2>/dev/null || true
    launchctl load "$PLIST_PATH"

    echo "${GREEN}Lume daemon LaunchAgent installed and loaded. It will start automatically on login!${NORMAL}"
    echo "To check status: launchctl list | grep $SERVICE_NAME"
    echo "To view logs: tail -f /tmp/lume_daemon.log"
    echo ""
    echo "To remove the lume daemon service, run:"
    echo "  launchctl unload \"$PLIST_PATH\""
    echo "  rm \"$PLIST_PATH\""
  else
    SERVICE_NAME="com.trycua.lume_daemon"
    PLIST_PATH="$HOME/Library/LaunchAgents/$SERVICE_NAME.plist"
    if [ -f "$PLIST_PATH" ]; then
      echo "Removing existing Lume background service (LaunchAgent)..."
      launchctl unload "$PLIST_PATH" 2>/dev/null || true
      rm "$PLIST_PATH"
      echo "Lume background service (LaunchAgent) removed."
    else
      echo "Skipping Lume background service (LaunchAgent) setup as requested (use --no-background-service)."
    fi
  fi
}

# Run the installation
main

```

--------------------------------------------------------------------------------
/blog/hack-the-north.md:
--------------------------------------------------------------------------------

```markdown
# What happens when hackathon judging is a public benchmark (Hack the North edition)

_Written by Francesco Bonacci — Reviewed by Parth Patel (HUD W25) — Sept 25, 2025_

## Prologue

Hack the North ran Sept 12–14 at the University of Waterloo. Official count this year: **1,778 hackers**, and a [Guinness World Record for the most people building interlocking plastic brick sculptures simultaneously](https://uwaterloo.ca/news/eweal-making-hackathons-fun-again-breaking-guinness-world-record).

Our team arrived from Europe and the US one day before the hackathon, after a summer scattered post–YC X25, waiting for our O-1 visas. **HUD**’s founders Parth and Jay flew in from SF to help us run evaluations, and Michael and Parth from **Ollama** joined as co-sponsors.

Our plan was ambitious: run the **first state-of-the-art Computer-Use Agents track**, score it on a public benchmark, and give the top performer a guaranteed YC interview. (Interview ≠ offer. YC didn’t judge.)

The rest, as they say, was a 36h story worth telling—and a playbook worth sharing for anyone thinking about running or sponsoring this type of hackathon track.

![hack-cua-ollama-hud](./assets/hack-cua-ollama-hud.jpeg)

## The sign-up problem we had to invent

We joined as a sponsor at the last minute, thanks to a push from our friend @Michael Chiang at Ollama—Waterloo alum, naturally. It’s kind of an open secret that UWaterloo turns out some of the sharpest hackers around (_no pun intended, HackMIT_). It was a bit of a scramble, but also great timing—our Agent framework had just finished a major refactor, with support for **100+ VLM configurations** now live. Naturally, we wanted to stress-test it at scale—and see whether teams could come up with SOTA-level setups. _This wasn’t a blank-slate, build-whatever-you-want kind of track._

From day one, though, we knew we’d have to fight for sign-ups. This was a niche track, and a guaranteed YC interview alone wouldn’t be enough to pull people in.

Unfortunately, Hack the North (HTN) didn’t offer an interest form to help us estimate demand, which made capacity planning tricky—especially with early-stage infra. Stress-testing takes foresight, and multimodal language model usage is still costly (~1.5× to 3–4× the price of comparable text-only models).

On top of that, we were discouraged from external promotion on [lu.ma](http://lu.ma). So we spun up our own sign-up page at **cua.ai/hackathon** and built ad-hoc Discord channels to share track details. We emphasized—repeatedly—that only students already accepted to Hack the North should register.

_(Moral: the “measure-zero effect”—no matter how many times you say it, some people won’t see it. Plenty of invalid sign-ups still slipped through.)_

Even so, having your own form is absolutely worth it: it gives you an **early funnel**, surfaces demand signals ahead of time, and—crucially—**lets you require platform sign-up before kickoff**. In our case, Hack the North didn’t provide Devpost access until the very end, so our form was the only way to build a working roster.

Only a small trickle of sign-ups came through by the time the event kicked off—too few to plan around, but clearly the right kind of crowd. Several were already familiar with computer-use agents; one was even interning at Shopify, working on this space.

## At the Sponsor Booth

Day 0 on campus made the difference. We arrived a couple of hours early to collect swag shipments (around 1,200 stickers of our new **Cua-la** mascot, plus t-shirts and hats—always plan ~1.5× the estimated number of hackers!). After walking the sponsor floor and explaining the track at our booth, ~40 hackers signed up.

**Moral:** sponsor booths are still the most effective way to recruit for a track.

**Suggestions to maximize booth time (for HTN this is only ~24 of the total 36 hours):**

- **Be unmistakable.** Run a mini-challenge and a visible giveaway. We offered 5 × $200 Anthropic credits as a lightning raffle and constantly advertised in HTN Slack. Shout-out to our neighbors at **Mintlify**, who dressed their teammate as a mint plant - memorable and effective.
- **Create multiple touchpoints.** Hand out flyers and QR codes, and ask nearby booths to cross-refer. Big thanks to the YC team for flyer space and student connections - and to Michael (Ollama) for pointing visitors our way.
- **Never leave the booth empty.** Keep someone at the booth at all times and rotate shifts. With four founding engineers on-site, coverage was easy. Even after hacking kicked off, the booth stayed a point of reference - and even then multiple participants DM’d us asking where to meet up.
- **Students are organic DevRel.** Our runner-up, Adam, hung out with us at the booth, pulling more people in. Peer-to-peer energy creates the network effect you need!

![hack-booth](./assets/hack-booth.png)

_(Our Founding Engineer, Morgan, hangs out with students at the stand, while Adam (runner-up) hacks on the side.)_

## 02:30 a.m. is still prime time at a hackathon

Hack the North gives sponsors a 30-minute API Workshop during the early hours of the event—a perfect moment to shift from talking to building.

Our slot landed at **2:30 a.m.** (_perks of the cheapest sponsor tier_). Thirty students showed up, energy surprisingly high. James, our new Founding DevRel Engineer, led the session and nailed it.

**Our track rules were simple:**

1. Build a Computer-Use Agent with the [Cua framework](https://github.com/trycua/cua)
2. Benchmark the agent on [HUD](https://www.hud.so)
3. Use [OSWorld-Tiny](https://huggingface.co/datasets/ddupont/OSWorld-Tiny-Public): a 14-task distillation of the full benchmark (~360 tasks, >1h)

**Suggestions:**

- **Leave something tangible.** We provided a Jupyter Notebook teams could run immediately.
- **Narrow scope, strong starts.** The more focused the challenge, the more **robust starting points** you should provide.
- **Want the details?** [Here’s the notebook we left participants](https://github.com/trycua/cua/blob/main/notebooks/sota_hackathon.ipynb).

![hack-workshop](./assets/hack-workshop.jpeg)

_(Our CUA Workshop at 2:30 AM.)_

## Making it possible to focus on the work

If you’re an OSS framework, it’s tempting to have hackers self-host on laptops. **Don’t.** You’ll spend the workshop debugging setups instead of reviewing ideas.

**Lesson learned:** within hours, we shifted to **cloud-only Sandboxes**. Payoff: consistent environments, faster starts, far less tech support.

We provided:

- **Credits:** $200 Cua Cloud + $200 HUD per team (manual top-ups for visible progress)
- **LLMs/VLMs:** Anthropic assigned $50 per participant—tight for VLM iteration—so we added capped access under our org
- **Pre-kickoff provisioning:** Platform sign-up auto-created projects, keys, and sandboxes

**Takeaway:** every minute not spent on setup is a minute gained for iterating.

## 12 Hours in the Hackathon

**After the workshop buzz.** Morning interest was high, but Docker setup + requiring focus on a single track thinned the crowd. Most sponsor prizes are broad (“use our product and you qualify”), letting students stack tracks. Ours required commitment. Upside: those who stayed shipped sharper, higher-quality submissions.

**The bell curve of submissions.** Most entries used _claude-sonnet-4-20250514_—proof that docs and public leaderboards ([OSWorld](https://os-world.github.io/#benchmark)) guide choices. Results clustered around the safe pick, with fewer pushing boundaries.

**Who went beyond the baseline.** A few tried multi-agent/tool graphs. One standout—[**cuala**](https://github.com/YeIIcw/cuala)—was a clean reference: deterministic actions, verifiable state changes, callbacks for saving images and trajectories.

**Bottom line:** Early excitement is easy; keeping teams engaged requires reducing friction and offering multiple entry points.

### What broke (and why)

We skipped a full end-to-end **Cua × HUD** dry-run. It showed.

- Hackers ran out of inference credits. Desktop tasks are token-heavy. A full OSWorld run (200 max steps) for _computer-use-preview_ (OpenAI Operator API) can cost >$600. Serious attempts: ~400k tokens × 14 tasks.
- Python version/build mismatches surfaced, requiring debug time across both OSS repos.
- Our Cua framework lacked a **Response Agent** to complete evaluation loops. Some runs stalled until patched.

## Scoring and Results

### Participation & Outcomes

- ~**30** hackers gave the track a serious try; **5** crossed the finish line
- All submissions were **solo**, mostly undergrads
- Judging: OSWorld-Tiny on HUD, with Cua + HUD reruns to verify scores
- Final leaderboard: [HUD Leaderboard](https://www.hud.so/leaderboards/ddupont/OSWorld-Tiny-Public)

![hack-leaderboard](./assets/hack-leaderboard.png)

_(Leaderboard on HUD)_

### Winners

**🥇 Winner — Ram**

- Devpost: https://devpost.com/software/sota-computer-use-agent-challenge
- Code: https://github.com/Ram-Raghav-S/cua/tree/ram
- Score: 68.3%

**🥈 Runner-up — Aryan**

- Devpost: https://devpost.com/software/loopdeloop-computer-use-agent-sota-attempt
- Code: https://github.com/Tumph/cua
- Score: 55.9%

**🥉 Special Mention — Adam**

- Devpost: https://devpost.com/software/cuala
- Code: https://github.com/YeIIcw/cuala
- Score: 42.1%

![hack-winners](./assets/hack-winners.jpeg)

_(Our finalists before the award ceremony)_

## What We’d Keep

- **Sponsor Hack the North again**
- **Keep a visible, staffed booth**
- **Publish a compact FAQ**
- **Simple, transparent scoring**

## What We’d Change

- **Run a full Cua × HUD dry-run under load**
- **Offer multiple on-ramps (evals, creative, RL)**
- **Keep a private eval set for judging**
- **Default to cloud sandboxes**
- **Handle ops earlier (swag, signage, QR codes)**
- **Reward generalization, not lucky runs**

## Closing Thoughts

Our first outing as sponsors wasn’t perfect, but it gave us a working playbook: **provision cloud early, keep scoring simple, always dry-run infra, and make the booth unforgettable**.

If more hackathon tracks leaned on **public benchmarks**, weekends like this would produce fewer demos-for-show and more measurable progress.

**P.S.** Huge thanks to the Ollama and HUD teams for co-sponsoring the track, and to our YC Partner Diana for offering a **guaranteed YC interview** as first prize.

Whether you’re a hacker who wants to participate, or a company looking to sponsor, let’s talk — we’re especially excited to support benchmark-first hackathon tracks in the Bay Area this year.

![hack-closing-ceremony](./assets/hack-closing-ceremony.jpg)

_(HTN Closing Ceremony — Cua Track Winner Announcement)_

```

--------------------------------------------------------------------------------
/docs/content/docs/agent-sdk/supported-model-providers/cua-vlm-router.mdx:
--------------------------------------------------------------------------------

```markdown
---
title: CUA VLM Router
description: Intelligent vision-language model routing with cost optimization and unified access
---

# CUA VLM Router

The **CUA VLM Router** is an intelligent inference API that provides unified access to multiple vision-language model providers through a single API key. It offers cost optimization and detailed observability for production AI applications.

## Overview

Instead of managing multiple API keys and provider-specific code, CUA VLM Router acts as a smart cloud gateway that:

- **Unifies access** to multiple model providers
- **Optimizes costs** through intelligent routing and provider selection
- **Tracks usage** and costs with detailed metadata
- **Provides observability** with routing decisions and attempt logs
- **Manages infrastructure** - no need to handle provider API keys yourself

## Quick Start

### 1. Get Your API Key

Sign up at [cua.ai](https://cua.ai/signin) and get your CUA API key from the dashboard.

### 2. Set Environment Variable

```bash
export CUA_API_KEY="sk_cua-api01_..."
```

### 3. Use with Agent SDK

```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer],
    max_trajectory_budget=5.0
)

messages = [{"role": "user", "content": "Take a screenshot and tell me what's on screen"}]

async for result in agent.run(messages):
    for item in result["output"]:
        if item["type"] == "message":
            print(item["content"][0]["text"])
```

## Available Models

The CUA VLM Router currently supports these models:

| Model ID                          | Provider  | Description       | Best For                                |
| --------------------------------- | --------- | ----------------- | --------------------------------------- |
| `cua/anthropic/claude-sonnet-4.5` | Anthropic | Claude Sonnet 4.5 | General-purpose tasks, recommended      |
| `cua/anthropic/claude-opus-4.5`   | Anthropic | Claude Opus 4.5   | Enhanced agentic and computer-use tasks |
| `cua/anthropic/claude-haiku-4.5`  | Anthropic | Claude Haiku 4.5  | Fast responses, cost-effective          |
| `cua/qwen/qwen3-vl-235b`          | Qwen      | Qwen3 VL 235B     | Large-scale vision-language tasks       |

## How It Works

### Intelligent Routing

When you make a request to CUA VLM Router:

1. **Model Resolution**: Your model ID (e.g., `cua/anthropic/claude-sonnet-4.5`) is resolved to the appropriate provider
2. **Provider Selection**: CUA routes your request to the appropriate model provider
3. **Response**: You receive an OpenAI-compatible response with metadata (see the direct-call sketch below)
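
Because the response format is OpenAI-compatible, you can also call the router without the Agent SDK. The sketch below assumes the standard `openai` Python client can be pointed at the router's base URL; the prompt and token limit are illustrative:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the CUA VLM Router endpoint.
client = OpenAI(
    base_url="https://inference.cua.ai/v1",
    api_key=os.environ["CUA_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",  # no "cua/" prefix for direct API calls
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)

print(response.choices[0].message.content)
print(response.usage)  # token counts and cost metadata
```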

## API Reference

### Base URL

```
https://inference.cua.ai/v1
```

### Authentication

All requests require an API key in the Authorization header:

```bash
Authorization: Bearer sk_cua-api01_...
```

### Endpoints

#### List Available Models

```bash
GET /v1/models
```

**Response:**

```json
{
  "data": [
    {
      "id": "anthropic/claude-sonnet-4.5",
      "name": "Claude Sonnet 4.5",
      "object": "model",
      "owned_by": "cua"
    }
  ],
  "object": "list"
}
```

#### Chat Completions

```bash
POST /v1/chat/completions
Content-Type: application/json
```

**Request:**

```json
{
  "model": "anthropic/claude-sonnet-4.5",
  "messages": [{ "role": "user", "content": "Hello!" }],
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
```

**Response:**

```json
{
  "id": "gen_...",
  "object": "chat.completion",
  "created": 1763554838,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 12,
    "total_tokens": 22,
    "cost": 0.01,
    "is_byok": true
  }
}
```

#### Streaming

Set `"stream": true` to receive server-sent events:

```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer sk_cua-api01_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

**Response (SSE format):**

```
data: {"id":"gen_...","choices":[{"delta":{"content":"1"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{"content":"\n2"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{"content":"\n3\n4\n5"}}],"object":"chat.completion.chunk"}

data: {"id":"gen_...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{...}}
```
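
If you would rather not parse SSE by hand, here is a minimal Python sketch of consuming the same stream, again assuming the standard `openai` client works against the router endpoint:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://inference.cua.ai/v1",
    api_key=os.environ["CUA_API_KEY"],
)

# stream=True yields chat.completion.chunk objects as they arrive.
stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)

for chunk in stream:
    # The final chunk may carry only usage/finish_reason, so guard the delta access.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```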

#### Check Balance

```bash
GET /v1/balance
```

**Response:**

```json
{
  "balance": 211689.85,
  "currency": "credits"
}
```

## Cost Tracking

CUA VLM Router provides detailed cost information in every response:

### Credit System

Requests are billed in **credits**:

- Credits are deducted from your CUA account balance
- Prices vary by model and usage
- CUA manages all provider API keys and infrastructure

### Response Cost Fields

```json
{
  "usage": {
    "cost": 0.01, // CUA gateway cost in credits
    "market_cost": 0.000065 // Actual upstream API cost
  }
}
```

**Note:** CUA VLM Router is a fully managed cloud service. If you want to use your own provider API keys directly (BYOK), see the [Supported Model Providers](/agent-sdk/supported-model-providers/) page for direct provider access via the agent SDK.

## Response Metadata

CUA VLM Router includes metadata about routing decisions and costs in the response. This information helps with debugging and monitoring your application's model usage.
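
For example, you can fold this usage data into your own monitoring with the Agent SDK's callback hooks. The sketch below subclasses `AsyncCallbackHandler` (from `agent.callbacks.base`) to accumulate usage across a run; how it is attached to an agent follows the callbacks documentation, and the exact fields present depend on the model you route to:

```python
from typing import Any, Dict, List

from agent.callbacks.base import AsyncCallbackHandler


class UsageTracker(AsyncCallbackHandler):
    """Accumulate the usage/cost reported by the router over a single agent run."""

    def __init__(self) -> None:
        self.total_usage: Dict[str, Any] = {}

    async def on_run_start(self, kwargs: Dict[str, Any], old_items: List[Dict[str, Any]]) -> None:
        # Reset totals at the start of each run.
        self.total_usage = {}

    async def on_usage(self, usage: Dict[str, Any]) -> None:
        # Sum numeric fields such as prompt_tokens, completion_tokens, and cost.
        for key, value in usage.items():
            if isinstance(value, (int, float)):
                self.total_usage[key] = self.total_usage.get(key, 0) + value

    async def on_run_end(
        self,
        kwargs: Dict[str, Any],
        old_items: List[Dict[str, Any]],
        new_items: List[Dict[str, Any]],
    ) -> None:
        print(f"Run finished. Accumulated usage: {self.total_usage}")
```

For a built-in, budget-based alternative, see [Cost Saving Callbacks](/agent-sdk/callbacks/cost-saving).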

## Configuration

### Environment Variables

```bash
# Required: Your CUA API key
export CUA_API_KEY="sk_cua-api01_..."

# Optional: Custom endpoint (defaults to https://inference.cua.ai/v1)
export CUA_BASE_URL="https://custom-endpoint.cua.ai/v1"
```

### Python SDK Configuration

```python
from agent import ComputerAgent

# Using environment variables (recommended)
agent = ComputerAgent(model="cua/anthropic/claude-sonnet-4.5")

# Or explicit configuration
agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    # CUA adapter automatically loads from CUA_API_KEY
)
```

## Benefits Over Direct Provider Access

| Feature                    | CUA VLM Router               | Direct Provider (BYOK)            |
| -------------------------- | ---------------------------- | --------------------------------- |
| **Single API Key**         | ✅ One key for all providers | ❌ Multiple keys to manage        |
| **Managed Infrastructure** | ✅ No API key management     | ❌ Manage multiple provider keys  |
| **Usage Tracking**         | ✅ Unified dashboard         | ❌ Per-provider tracking          |
| **Model Switching**        | ✅ Change model string only  | ❌ Change code + keys             |
| **Setup Complexity**       | ✅ One environment variable  | ❌ Multiple environment variables |

## Error Handling

### Common Error Responses

#### Insufficient Credits

```json
{
  "detail": "Insufficient credits. Current balance: 0.00 credits"
}
```

#### Missing Authorization

```json
{
  "detail": "Missing Authorization: Bearer token"
}
```

#### Invalid Model

```json
{
  "detail": "Invalid or unavailable model"
}
```

### Best Practices

1. **Check balance periodically** using `/v1/balance`
2. **Handle rate limits** with exponential backoff (see the sketch after this list)
3. **Log generation IDs** for debugging
4. **Set up usage alerts** in your CUA dashboard
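
A rough sketch of the first two practices using `requests`; the endpoints and response fields follow the API reference above, while the retry policy and 429 handling are illustrative assumptions rather than documented behavior:

```python
import os
import time

import requests

BASE_URL = "https://inference.cua.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['CUA_API_KEY']}"}


def check_balance() -> float:
    """Return the remaining credit balance."""
    resp = requests.get(f"{BASE_URL}/balance", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["balance"]


def chat_with_backoff(payload: dict, max_retries: int = 5) -> dict:
    """POST to /chat/completions, retrying with exponential backoff when rate limited."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(
            f"{BASE_URL}/chat/completions", json=payload, headers=HEADERS, timeout=120
        )
        if resp.status_code == 429:  # rate limited (assumed status code): wait, then retry
            time.sleep(delay)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")


if check_balance() < 10:
    print("Warning: low credit balance")

result = chat_with_backoff(
    {
        "model": "anthropic/claude-sonnet-4.5",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    }
)
print(result["id"], result["usage"])  # log the generation ID for debugging
```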

## Examples

### Basic Usage

```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",
    tools=[computer]
)

messages = [{"role": "user", "content": "Open Firefox"}]

async for result in agent.run(messages):
    print(result)
```

### Direct API Call (curl)

```bash
curl -X POST https://inference.cua.ai/v1/chat/completions \
  -H "Authorization: Bearer ${CUA_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.5",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "max_tokens": 200
  }'
```

### With Custom Parameters

```python
agent = ComputerAgent(
    model="cua/anthropic/claude-haiku-4.5",
    tools=[computer],
    max_trajectory_budget=10.0,
    temperature=0.7
)
```

### Using Qwen3 VL 235B

```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(os_type="linux", provider_type="docker")

agent = ComputerAgent(
    model="cua/qwen/qwen3-vl-235b",
    tools=[computer],
    only_n_most_recent_images=3
)

messages = [{"role": "user", "content": "Open a browser and search for Python tutorials"}]

async for result in agent.run(messages):
    print(result)
```

### Using Claude Opus 4.5

```python
from agent import ComputerAgent
from computer import Computer

computer = Computer(
    os_type="linux",
    provider_type="cloud",
    name="your-container-name",
    api_key="your-cua-api-key"
)

agent = ComputerAgent(
    model="cua/anthropic/claude-opus-4.5",
    tools=[computer],
    instructions="You are a helpful assistant that can control computers",
    only_n_most_recent_images=3
)

messages = [{"role": "user", "content": "Open a browser and search for Python tutorials"}]

async for result in agent.run(messages):
    print(result)
```

## Migration from Direct Provider Access

Switching from direct provider access (BYOK) to CUA VLM Router is simple:

**Before (Direct Provider Access with BYOK):**

```python
import os
# Required: Provider-specific API key
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."

agent = ComputerAgent(
    model="anthropic/claude-sonnet-4-5-20250929",
    tools=[computer]
)
```

**After (CUA VLM Router - Cloud Service):**

```python
import os
# Required: CUA API key only (no provider keys needed)
os.environ["CUA_API_KEY"] = "sk_cua-api01_..."

agent = ComputerAgent(
    model="cua/anthropic/claude-sonnet-4.5",  # Add "cua/" prefix
    tools=[computer]
)
```

That's it! Same code structure, just different model format. CUA manages all provider infrastructure and credentials for you.

## Support

- **Documentation**: [cua.ai/docs](https://cua.ai/docs)
- **Discord**: [Join our community](https://discord.com/invite/mVnXXpdE85)
- **Issues**: [GitHub Issues](https://github.com/trycua/cua/issues)

## Next Steps

- Explore [Agent Loops](/agent-sdk/agent-loops) to customize agent behavior
- Learn about [Cost Saving Callbacks](/agent-sdk/callbacks/cost-saving)
- Try [Example Use Cases](/example-usecases/form-filling)
- Review [Supported Model Providers](/agent-sdk/supported-model-providers/) for all options

```

--------------------------------------------------------------------------------
/libs/python/agent/agent/callbacks/logging.py:
--------------------------------------------------------------------------------

```python
"""
Logging callback for ComputerAgent that provides configurable logging of agent lifecycle events.
"""

import json
import logging
from typing import Any, Dict, List, Optional, Union

from .base import AsyncCallbackHandler


def sanitize_image_urls(data: Any) -> Any:
    """
    Recursively search for 'image_url' keys and set their values to '[omitted]'.

    Args:
        data: Any data structure (dict, list, or primitive type)

    Returns:
        A deep copy of the data with all 'image_url' values replaced with '[omitted]'
    """
    if isinstance(data, dict):
        # Create a copy of the dictionary
        sanitized = {}
        for key, value in data.items():
            if key == "image_url":
                sanitized[key] = "[omitted]"
            else:
                # Recursively sanitize the value
                sanitized[key] = sanitize_image_urls(value)
        return sanitized

    elif isinstance(data, list):
        # Recursively sanitize each item in the list
        return [sanitize_image_urls(item) for item in data]

    else:
        # For primitive types (str, int, bool, None, etc.), return as-is
        return data


class LoggingCallback(AsyncCallbackHandler):
    """
    Callback handler that logs agent lifecycle events with configurable verbosity.

    Logging levels:
    - DEBUG: All events including API calls, message preprocessing, and detailed outputs
    - INFO: Major lifecycle events (start/end, messages, outputs)
    - WARNING: Only warnings and errors
    - ERROR: Only errors
    """

    def __init__(self, logger: Optional[logging.Logger] = None, level: int = logging.INFO):
        """
        Initialize the logging callback.

        Args:
            logger: Logger instance to use. If None, creates a logger named 'agent.ComputerAgent'
            level: Logging level (logging.DEBUG, logging.INFO, etc.)
        """
        self.logger = logger or logging.getLogger("agent.ComputerAgent")
        self.level = level
        # Usage totals are reset in on_run_start, but initialize here so
        # on_usage/on_run_end are safe even if a run never starts.
        self.total_usage: Dict[str, Any] = {}

        # Set up logger if it doesn't have handlers
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
            handler.setFormatter(formatter)
            self.logger.addHandler(handler)
            self.logger.setLevel(level)

    def _update_usage(self, usage: Dict[str, Any]) -> None:
        """Update total usage statistics."""

        def add_dicts(target: Dict[str, Any], source: Dict[str, Any]) -> None:
            for key, value in source.items():
                if isinstance(value, dict):
                    if key not in target:
                        target[key] = {}
                    add_dicts(target[key], value)
                else:
                    if key not in target:
                        target[key] = 0
                    target[key] += value

        add_dicts(self.total_usage, usage)

    async def on_run_start(self, kwargs: Dict[str, Any], old_items: List[Dict[str, Any]]) -> None:
        """Called before the run starts."""
        self.total_usage = {}

    async def on_usage(self, usage: Dict[str, Any]) -> None:
        """Called when usage information is received."""
        self._update_usage(usage)

    async def on_run_end(
        self,
        kwargs: Dict[str, Any],
        old_items: List[Dict[str, Any]],
        new_items: List[Dict[str, Any]],
    ) -> None:
        """Called after the run ends."""

        def format_dict(d, indent=0):
            lines = []
            prefix = f" - {' ' * indent}"
            for key, value in d.items():
                if isinstance(value, dict):
                    lines.append(f"{prefix}{key}:")
                    lines.extend(format_dict(value, indent + 1))
                elif isinstance(value, float):
                    lines.append(f"{prefix}{key}: ${value:.4f}")
                else:
                    lines.append(f"{prefix}{key}: {value}")
            return lines

        formatted_output = "\n".join(format_dict(self.total_usage))
        self.logger.info(f"Total usage:\n{formatted_output}")

    async def on_llm_start(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Called before LLM processing starts."""
        if self.logger.isEnabledFor(logging.INFO):
            self.logger.info(f"LLM processing started with {len(messages)} messages")
        if self.logger.isEnabledFor(logging.DEBUG):
            sanitized_messages = [sanitize_image_urls(msg) for msg in messages]
            self.logger.debug(f"LLM input messages: {json.dumps(sanitized_messages, indent=2)}")
        return messages

    async def on_llm_end(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Called after LLM processing ends."""
        if self.logger.isEnabledFor(logging.DEBUG):
            sanitized_messages = [sanitize_image_urls(msg) for msg in messages]
            self.logger.debug(f"LLM output: {json.dumps(sanitized_messages, indent=2)}")
        return messages

    async def on_computer_call_start(self, item: Dict[str, Any]) -> None:
        """Called when a computer call starts."""
        action = item.get("action", {})
        action_type = action.get("type", "unknown")
        action_args = {k: v for k, v in action.items() if k != "type"}

        # INFO level logging for the action
        self.logger.info(f"Computer: {action_type}({action_args})")

        # DEBUG level logging for full details
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug(f"Computer call started: {json.dumps(action, indent=2)}")

    async def on_computer_call_end(self, item: Dict[str, Any], result: Any) -> None:
        """Called when a computer call ends."""
        if self.logger.isEnabledFor(logging.DEBUG):
            action = item.get("action", "unknown")
            self.logger.debug(f"Computer call completed: {json.dumps(action, indent=2)}")
            if result:
                sanitized_result = sanitize_image_urls(result)
                self.logger.debug(f"Computer call result: {json.dumps(sanitized_result, indent=2)}")

    async def on_function_call_start(self, item: Dict[str, Any]) -> None:
        """Called when a function call starts."""
        name = item.get("name", "unknown")
        arguments = item.get("arguments", "{}")

        # INFO level logging for the function call
        self.logger.info(f"Function: {name}({arguments})")

        # DEBUG level logging for full details
        if self.logger.isEnabledFor(logging.DEBUG):
            self.logger.debug(f"Function call started: {name}")

    async def on_function_call_end(self, item: Dict[str, Any], result: Any) -> None:
        """Called when a function call ends."""
        # INFO level logging for function output (similar to function_call_output)
        if result:
            # Handle both list and direct result formats
            if isinstance(result, list) and len(result) > 0:
                output = (
                    result[0].get("output", str(result))
                    if isinstance(result[0], dict)
                    else str(result[0])
                )
            else:
                output = str(result)

            # Truncate long outputs
            if len(output) > 100:
                output = output[:100] + "..."

            self.logger.info(f"Output: {output}")

        # DEBUG level logging for full details
        if self.logger.isEnabledFor(logging.DEBUG):
            name = item.get("name", "unknown")
            self.logger.debug(f"Function call completed: {name}")
            if result:
                self.logger.debug(f"Function call result: {json.dumps(result, indent=2)}")

    async def on_text(self, item: Dict[str, Any]) -> None:
        """Called when a text message is encountered."""
        # Get the role to determine if it's Agent or User
        role = item.get("role", "unknown")
        content_items = item.get("content", [])

        # Process content items to build display text
        text_parts = []
        for content_item in content_items:
            content_type = content_item.get("type", "output_text")
            if content_type == "output_text":
                text_content = content_item.get("text", "")
                if not text_content.strip():
                    text_parts.append("[empty]")
                else:
                    # Truncate long text and add ellipsis
                    if len(text_content) > 2048:
                        text_parts.append(text_content[:2048] + "...")
                    else:
                        text_parts.append(text_content)
            else:
                # Non-text content, show as [type]
                text_parts.append(f"[{content_type}]")

        # Join all text parts
        display_text = "".join(text_parts) if text_parts else "[empty]"

        # Log with appropriate level and format
        if role == "assistant":
            self.logger.info(f"Agent: {display_text}")
        elif role == "user":
            self.logger.info(f"User: {display_text}")
        else:
            # Fallback for unknown roles, use debug level
            if self.logger.isEnabledFor(logging.DEBUG):
                self.logger.debug(f"Text message ({role}): {display_text}")

    async def on_api_start(self, kwargs: Dict[str, Any]) -> None:
        """Called when an API call is about to start."""
        if self.logger.isEnabledFor(logging.DEBUG):
            model = kwargs.get("model", "unknown")
            self.logger.debug(f"API call starting for model: {model}")
            # Log sanitized messages if present
            if "messages" in kwargs:
                sanitized_messages = sanitize_image_urls(kwargs["messages"])
                self.logger.debug(f"API call messages: {json.dumps(sanitized_messages, indent=2)}")
            elif "input" in kwargs:
                sanitized_input = sanitize_image_urls(kwargs["input"])
                self.logger.debug(f"API call input: {json.dumps(sanitized_input, indent=2)}")

    async def on_api_end(self, kwargs: Dict[str, Any], result: Any) -> None:
        """Called when an API call has completed."""
        if self.logger.isEnabledFor(logging.DEBUG):
            model = kwargs.get("model", "unknown")
            self.logger.debug(f"API call completed for model: {model}")
            self.logger.debug(
                f"API call result: {json.dumps(sanitize_image_urls(result), indent=2)}"
            )

    async def on_screenshot(self, item: Union[str, bytes], name: str = "screenshot") -> None:
        """Called when a screenshot is taken."""
        if self.logger.isEnabledFor(logging.DEBUG):
            image_size = len(item) / 1024
            self.logger.debug(f"Screenshot captured: {name} {image_size:.2f} KB")

```

--------------------------------------------------------------------------------
/libs/python/computer/computer/tracing_wrapper.py:
--------------------------------------------------------------------------------

```python
"""
Tracing wrapper for computer interface that records API calls.
"""

from typing import Any, Dict, List, Optional, Tuple

from .interface.base import BaseComputerInterface


class TracingInterfaceWrapper:
    """
    Wrapper class that intercepts computer interface calls and records them for tracing.
    """

    def __init__(self, original_interface: BaseComputerInterface, tracing_instance):
        """
        Initialize the tracing wrapper.

        Args:
            original_interface: The original computer interface
            tracing_instance: The ComputerTracing instance
        """
        self._original_interface = original_interface
        self._tracing = tracing_instance

    def __getattr__(self, name):
        """
        Delegate attribute access to the original interface if not found in wrapper.
        """
        return getattr(self._original_interface, name)

    async def _record_call(
        self,
        method_name: str,
        args: Dict[str, Any],
        result: Any = None,
        error: Optional[Exception] = None,
    ):
        """
        Record an API call for tracing.

        Args:
            method_name: Name of the method called
            args: Arguments passed to the method
            result: Result returned by the method
            error: Exception raised, if any
        """
        if self._tracing.is_tracing:
            await self._tracing.record_api_call(method_name, args, result, error)

    # Mouse Actions
    async def left_click(
        self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
    ) -> None:
        """Perform a left mouse button click."""
        args = {"x": x, "y": y, "delay": delay}
        error = None
        try:
            result = await self._original_interface.left_click(x, y, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("left_click", args, None, error)

    async def right_click(
        self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
    ) -> None:
        """Perform a right mouse button click."""
        args = {"x": x, "y": y, "delay": delay}
        error = None
        try:
            result = await self._original_interface.right_click(x, y, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("right_click", args, None, error)

    async def double_click(
        self, x: Optional[int] = None, y: Optional[int] = None, delay: Optional[float] = None
    ) -> None:
        """Perform a double left mouse button click."""
        args = {"x": x, "y": y, "delay": delay}
        error = None
        try:
            result = await self._original_interface.double_click(x, y, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("double_click", args, None, error)

    async def move_cursor(self, x: int, y: int, delay: Optional[float] = None) -> None:
        """Move the cursor to the specified screen coordinates."""
        args = {"x": x, "y": y, "delay": delay}
        error = None
        try:
            result = await self._original_interface.move_cursor(x, y, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("move_cursor", args, None, error)

    async def drag_to(
        self,
        x: int,
        y: int,
        button: str = "left",
        duration: float = 0.5,
        delay: Optional[float] = None,
    ) -> None:
        """Drag from current position to specified coordinates."""
        args = {"x": x, "y": y, "button": button, "duration": duration, "delay": delay}
        error = None
        try:
            result = await self._original_interface.drag_to(x, y, button, duration, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("drag_to", args, None, error)

    async def drag(
        self,
        path: List[Tuple[int, int]],
        button: str = "left",
        duration: float = 0.5,
        delay: Optional[float] = None,
    ) -> None:
        """Drag the cursor along a path of coordinates."""
        args = {"path": path, "button": button, "duration": duration, "delay": delay}
        error = None
        try:
            result = await self._original_interface.drag(path, button, duration, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("drag", args, None, error)

    # Keyboard Actions
    async def key_down(self, key: str, delay: Optional[float] = None) -> None:
        """Press and hold a key."""
        args = {"key": key, "delay": delay}
        error = None
        try:
            result = await self._original_interface.key_down(key, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("key_down", args, None, error)

    async def key_up(self, key: str, delay: Optional[float] = None) -> None:
        """Release a previously pressed key."""
        args = {"key": key, "delay": delay}
        error = None
        try:
            result = await self._original_interface.key_up(key, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("key_up", args, None, error)

    async def type_text(self, text: str, delay: Optional[float] = None) -> None:
        """Type the specified text string."""
        args = {"text": text, "delay": delay}
        error = None
        try:
            result = await self._original_interface.type_text(text, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("type_text", args, None, error)

    async def press_key(self, key: str, delay: Optional[float] = None) -> None:
        """Press and release a single key."""
        args = {"key": key, "delay": delay}
        error = None
        try:
            result = await self._original_interface.press_key(key, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("press_key", args, None, error)

    async def hotkey(self, *keys: str, delay: Optional[float] = None) -> None:
        """Press multiple keys simultaneously (keyboard shortcut)."""
        args = {"keys": keys, "delay": delay}
        error = None
        try:
            result = await self._original_interface.hotkey(*keys, delay=delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("hotkey", args, None, error)

    # Scrolling Actions
    async def scroll(self, x: int, y: int, delay: Optional[float] = None) -> None:
        """Scroll the mouse wheel by specified amounts."""
        args = {"x": x, "y": y, "delay": delay}
        error = None
        try:
            result = await self._original_interface.scroll(x, y, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("scroll", args, None, error)

    async def scroll_down(self, clicks: int = 1, delay: Optional[float] = None) -> None:
        """Scroll down by the specified number of clicks."""
        args = {"clicks": clicks, "delay": delay}
        error = None
        try:
            result = await self._original_interface.scroll_down(clicks, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("scroll_down", args, None, error)

    async def scroll_up(self, clicks: int = 1, delay: Optional[float] = None) -> None:
        """Scroll up by the specified number of clicks."""
        args = {"clicks": clicks, "delay": delay}
        error = None
        try:
            result = await self._original_interface.scroll_up(clicks, delay)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("scroll_up", args, None, error)

    # Screen Actions
    async def screenshot(self) -> bytes:
        """Take a screenshot."""
        args = {}
        error = None
        result = None
        try:
            result = await self._original_interface.screenshot()
            return result
        except Exception as e:
            error = e
            raise
        finally:
            # For screenshots, we don't want to include the raw bytes in the trace args
            await self._record_call(
                "screenshot", args, "screenshot_taken" if result else None, error
            )

    async def get_screen_size(self) -> Dict[str, int]:
        """Get the screen dimensions."""
        args = {}
        error = None
        result = None
        try:
            result = await self._original_interface.get_screen_size()
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("get_screen_size", args, result, error)

    async def get_cursor_position(self) -> Dict[str, int]:
        """Get the current cursor position on screen."""
        args = {}
        error = None
        result = None
        try:
            result = await self._original_interface.get_cursor_position()
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("get_cursor_position", args, result, error)

    # Clipboard Actions
    async def copy_to_clipboard(self) -> str:
        """Get the current clipboard content."""
        args = {}
        error = None
        result = None
        try:
            result = await self._original_interface.copy_to_clipboard()
            return result
        except Exception as e:
            error = e
            raise
        finally:
            # Don't include clipboard content in trace for privacy
            await self._record_call(
                "copy_to_clipboard",
                args,
                f"content_length_{len(result)}" if result else None,
                error,
            )

    async def set_clipboard(self, text: str) -> None:
        """Set the clipboard content to the specified text."""
        # Don't include clipboard content in trace for privacy
        args = {"text_length": len(text)}
        error = None
        try:
            result = await self._original_interface.set_clipboard(text)
            return result
        except Exception as e:
            error = e
            raise
        finally:
            await self._record_call("set_clipboard", args, None, error)

```

--------------------------------------------------------------------------------
/libs/typescript/computer/src/interface/base.ts:
--------------------------------------------------------------------------------

```typescript
/**
 * Base interface for computer control.
 */

import pino from 'pino';
import WebSocket from 'ws';
import type { ScreenSize } from '../types';

export type MouseButton = 'left' | 'middle' | 'right';

export interface CursorPosition {
  x: number;
  y: number;
}

export interface AccessibilityNode {
  role: string;
  title?: string;
  value?: string;
  description?: string;
  bounds?: {
    x: number;
    y: number;
    width: number;
    height: number;
  };
  children?: AccessibilityNode[];
}

/**
 * Base class for computer control interfaces.
 */
export abstract class BaseComputerInterface {
  protected ipAddress: string;
  protected username: string;
  protected password: string;
  protected closed = false;
  protected commandLock: Promise<unknown> = Promise.resolve();
  protected ws: WebSocket;
  protected apiKey?: string;
  protected vmName?: string;

  protected logger = pino({ name: 'computer.interface-base' });

  constructor(
    ipAddress: string,
    username = 'lume',
    password = 'lume',
    apiKey?: string,
    vmName?: string
  ) {
    this.ipAddress = ipAddress;
    this.username = username;
    this.password = password;
    this.apiKey = apiKey;
    this.vmName = vmName;

    // Initialize WebSocket with headers if needed
    const headers: { [key: string]: string } = {};
    if (this.apiKey && this.vmName) {
      headers['X-API-Key'] = this.apiKey;
      headers['X-VM-Name'] = this.vmName;
    }

    // Create the WebSocket instance
    this.ws = new WebSocket(this.wsUri, { headers });
  }

  /**
   * Get the WebSocket URI for connection.
   * Subclasses can override this to customize the URI.
   */
  protected get wsUri(): string {
    const protocol = this.apiKey ? 'wss' : 'ws';

    // Check if ipAddress already includes a port
    if (this.ipAddress.includes(':')) {
      return `${protocol}://${this.ipAddress}/ws`;
    }

    // Otherwise, append the default port
    const port = this.apiKey ? '8443' : '8000';
    return `${protocol}://${this.ipAddress}:${port}/ws`;
  }

  /**
   * Wait for interface to be ready.
   * @param timeout Maximum time to wait in seconds
   * @throws Error if interface is not ready within timeout
   */
  async waitForReady(timeout = 60): Promise<void> {
    const startTime = Date.now();

    while (Date.now() - startTime < timeout * 1000) {
      try {
        await this.connect();
        return;
      } catch (error) {
        this.logger.error(`Error connecting to websocket: ${JSON.stringify(error)}`);
        // Wait a bit before retrying
        await new Promise((resolve) => setTimeout(resolve, 1000));
      }
    }

    throw new Error(`Interface not ready after ${timeout} seconds`);
  }

  /**
   * Authenticate with the WebSocket server.
   * This should be called immediately after the WebSocket connection is established.
   */
  private async authenticate(): Promise<void> {
    if (!this.apiKey || !this.vmName) {
      // No authentication needed
      return;
    }

    this.logger.info('Performing authentication handshake...');
    const authMessage = {
      command: 'authenticate',
      params: {
        api_key: this.apiKey,
        container_name: this.vmName,
      },
    };

    return new Promise<void>((resolve, reject) => {
      const authHandler = (data: WebSocket.RawData) => {
        try {
          const authResult = JSON.parse(data.toString());
          if (!authResult.success) {
            const errorMsg = authResult.error || 'Authentication failed';
            this.logger.error(`Authentication failed: ${errorMsg}`);
            this.ws.close();
            reject(new Error(`Authentication failed: ${errorMsg}`));
          } else {
            this.logger.info('Authentication successful');
            this.ws.off('message', authHandler);
            resolve();
          }
        } catch (error) {
          this.ws.off('message', authHandler);
          reject(error);
        }
      };

      this.ws.on('message', authHandler);
      this.ws.send(JSON.stringify(authMessage));
    });
  }

  /**
   * Connect to the WebSocket server.
   */
  public async connect(): Promise<void> {
    // If the WebSocket is already open, check if we need to authenticate
    if (this.ws.readyState === WebSocket.OPEN) {
      this.logger.info('Websocket is open, ensuring authentication is complete.');
      return this.authenticate();
    }

    // If the WebSocket is closed or closing, reinitialize it
    if (this.ws.readyState === WebSocket.CLOSED || this.ws.readyState === WebSocket.CLOSING) {
      this.logger.info('Websocket is closed. Reinitializing connection.');
      const headers: { [key: string]: string } = {};
      if (this.apiKey && this.vmName) {
        headers['X-API-Key'] = this.apiKey;
        headers['X-VM-Name'] = this.vmName;
      }
      this.ws = new WebSocket(this.wsUri, { headers });
      return this.authenticate();
    }

    // Connect and authenticate
    return new Promise((resolve, reject) => {
      const onOpen = async () => {
        try {
          // Always authenticate immediately after connection
          await this.authenticate();
          resolve();
        } catch (error) {
          reject(error);
        }
      };

      // If already connecting, wait for it to complete then authenticate
      if (this.ws.readyState === WebSocket.CONNECTING) {
        this.ws.addEventListener('open', onOpen, { once: true });
        this.ws.addEventListener('error', (error) => reject(error), {
          once: true,
        });
        return;
      }

      // Set up event handlers
      this.ws.on('open', onOpen);

      this.ws.on('error', (error: Error) => {
        reject(error);
      });

      this.ws.on('close', () => {
        if (!this.closed) {
          // Attempt to reconnect
          setTimeout(() => this.connect(), 1000);
        }
      });
    });
  }

  /**
   * Send a command to the WebSocket server.
   */
  public async sendCommand(
    command: string,
    params: { [key: string]: unknown } = {}
  ): Promise<{ [key: string]: unknown }> {
    // Create a new promise for this specific command
    const commandPromise = new Promise<{ [key: string]: unknown }>((resolve, reject) => {
      // Chain it to the previous commands
      const executeCommand = async (): Promise<{
        [key: string]: unknown;
      }> => {
        if (!this.ws || this.ws.readyState !== WebSocket.OPEN) {
          await this.connect();
        }

        return new Promise<{ [key: string]: unknown }>((innerResolve, innerReject) => {
          const messageHandler = (data: WebSocket.RawData) => {
            try {
              const response = JSON.parse(data.toString());
              if (response.error) {
                innerReject(new Error(response.error));
              } else {
                innerResolve(response);
              }
            } catch (error) {
              innerReject(error);
            }
            this.ws.off('message', messageHandler);
          };

          this.ws.on('message', messageHandler);
          const wsCommand = { command, params };
          this.ws.send(JSON.stringify(wsCommand));
        });
      };

      // Add this command to the lock chain
      this.commandLock = this.commandLock.then(() => executeCommand().then(resolve, reject));
    });

    return commandPromise;
  }

  /**
   * Check if the WebSocket is connected.
   */
  public isConnected(): boolean {
    return this.ws && this.ws.readyState === WebSocket.OPEN;
  }

  /**
   * Close the interface connection.
   */
  disconnect(): void {
    this.closed = true;
    if (this.ws && this.ws.readyState === WebSocket.OPEN) {
      this.ws.close();
    } else if (this.ws && this.ws.readyState === WebSocket.CONNECTING) {
      // If still connecting, terminate the connection attempt
      this.ws.terminate();
    }
  }

  /**
   * Force close the interface connection.
   * By default, this just calls close(), but subclasses can override
   * to provide more forceful cleanup.
   */
  forceClose(): void {
    this.disconnect();
  }

  // Mouse Actions
  abstract mouseDown(x?: number, y?: number, button?: MouseButton): Promise<void>;
  abstract mouseUp(x?: number, y?: number, button?: MouseButton): Promise<void>;
  abstract leftClick(x?: number, y?: number): Promise<void>;
  abstract rightClick(x?: number, y?: number): Promise<void>;
  abstract doubleClick(x?: number, y?: number): Promise<void>;
  abstract moveCursor(x: number, y: number): Promise<void>;
  abstract dragTo(x: number, y: number, button?: MouseButton, duration?: number): Promise<void>;
  abstract drag(
    path: Array<[number, number]>,
    button?: MouseButton,
    duration?: number
  ): Promise<void>;

  // Keyboard Actions
  abstract keyDown(key: string): Promise<void>;
  abstract keyUp(key: string): Promise<void>;
  abstract typeText(text: string): Promise<void>;
  abstract pressKey(key: string): Promise<void>;
  abstract hotkey(...keys: string[]): Promise<void>;

  // Scrolling Actions
  abstract scroll(x: number, y: number): Promise<void>;
  abstract scrollDown(clicks?: number): Promise<void>;
  abstract scrollUp(clicks?: number): Promise<void>;

  // Screen Actions
  abstract screenshot(): Promise<Buffer>;
  abstract getScreenSize(): Promise<ScreenSize>;
  abstract getCursorPosition(): Promise<CursorPosition>;

  // Window Management
  abstract open(target: string): Promise<void>;
  abstract launch(app: string, args?: string[]): Promise<number | undefined>;
  abstract getCurrentWindowId(): Promise<number | string>;
  abstract getApplicationWindows(app: string): Promise<Array<number | string>>;
  abstract getWindowName(windowId: number | string): Promise<string>;
  abstract getWindowSize(windowId: number | string): Promise<[number, number]>;
  abstract getWindowPosition(windowId: number | string): Promise<[number, number]>;
  abstract setWindowSize(windowId: number | string, width: number, height: number): Promise<void>;
  abstract setWindowPosition(windowId: number | string, x: number, y: number): Promise<void>;
  abstract maximizeWindow(windowId: number | string): Promise<void>;
  abstract minimizeWindow(windowId: number | string): Promise<void>;
  abstract activateWindow(windowId: number | string): Promise<void>;
  abstract closeWindow(windowId: number | string): Promise<void>;

  // Desktop Actions
  abstract getDesktopEnvironment(): Promise<string>;
  abstract setWallpaper(path: string): Promise<void>;

  // Clipboard Actions
  abstract copyToClipboard(): Promise<string>;
  abstract setClipboard(text: string): Promise<void>;

  // File System Actions
  abstract fileExists(path: string): Promise<boolean>;
  abstract directoryExists(path: string): Promise<boolean>;
  abstract listDir(path: string): Promise<string[]>;
  abstract readText(path: string): Promise<string>;
  abstract writeText(path: string, content: string): Promise<void>;
  abstract readBytes(path: string): Promise<Buffer>;
  abstract writeBytes(path: string, content: Buffer): Promise<void>;
  abstract deleteFile(path: string): Promise<void>;
  abstract createDir(path: string): Promise<void>;
  abstract deleteDir(path: string): Promise<void>;
  abstract runCommand(command: string): Promise<[string, string]>;

  // Accessibility Actions
  abstract getAccessibilityTree(): Promise<AccessibilityNode>;
  abstract toScreenCoordinates(x: number, y: number): Promise<[number, number]>;
  abstract toScreenshotCoordinates(x: number, y: number): Promise<[number, number]>;
}

```
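For orientation, the request/response protocol the class above implements is plain JSON over a WebSocket: each request is an object with `command` and `params`, and each reply either carries an `error` field or the result payload. Below is a minimal Python sketch of that exchange, assuming the `websockets` package; the endpoint URL and the `screenshot` command name are illustrative assumptions, not documented values.

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def send_command(uri: str, command: str, params: dict | None = None) -> dict:
    """Send one {command, params} request and return the decoded JSON reply."""
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"command": command, "params": params or {}}))
        response = json.loads(await ws.recv())
        if response.get("error"):
            raise RuntimeError(response["error"])
        return response


if __name__ == "__main__":
    # Hypothetical endpoint and command name, for illustration only.
    print(asyncio.run(send_command("ws://localhost:8000/ws", "screenshot")))
```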

--------------------------------------------------------------------------------
/scripts/playground.sh:
--------------------------------------------------------------------------------

```bash
#!/bin/bash

set -e

echo "🚀 Launching Cua Computer-Use Agent UI..."

# Save the original working directory
ORIGINAL_DIR="$(pwd)"

# Directories used by the script
DEMO_DIR="$HOME/.cua-demo"
VENV_DIR="$DEMO_DIR/venv"

# Function to clean up on exit
cleanup() {
  cd ~
  rm -rf "$TMP_DIR" 2>/dev/null || true
}

# Create a temporary directory for our work
TMP_DIR=$(mktemp -d)
cd "$TMP_DIR"
trap cleanup EXIT

# Ask user to choose between local macOS VMs or Cua Cloud Sandbox
echo ""
echo "Choose your Cua setup:"
echo "1) ☁️  Cua Cloud Sandbox (works on any system)"
echo "2) 🖥️  Local macOS VMs (requires Apple Silicon Mac + macOS 15+)"
echo ""
read -p "Enter your choice (1 or 2): " CHOICE

if [[ "$CHOICE" == "1" ]]; then
  # Cua Cloud Sandbox setup
  echo ""
  echo "☁️ Setting up Cua Cloud Sandbox..."
  echo ""
  
  # Check if existing .env.local already has CUA_API_KEY (check current dir and demo dir)
  # Look for .env.local in the original working directory (before cd to temp dir)
  CURRENT_ENV_FILE="$ORIGINAL_DIR/.env.local"
  DEMO_ENV_FILE="$DEMO_DIR/.env.local"
  
  CUA_API_KEY=""
  
  # First check current directory
  if [[ -f "$CURRENT_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$CURRENT_ENV_FILE"; then
    EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$CURRENT_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
    if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
      CUA_API_KEY="$EXISTING_CUA_KEY"
    fi
  fi
  
  # Then check demo directory if not found in current dir
  if [[ -z "$CUA_API_KEY" ]] && [[ -f "$DEMO_ENV_FILE" ]] && grep -q "CUA_API_KEY=" "$DEMO_ENV_FILE"; then
    EXISTING_CUA_KEY=$(grep "CUA_API_KEY=" "$DEMO_ENV_FILE" | cut -d'=' -f2- | tr -d '"' | tr -d "'" | xargs)
    if [[ -n "$EXISTING_CUA_KEY" && "$EXISTING_CUA_KEY" != "your_cua_api_key_here" && "$EXISTING_CUA_KEY" != "" ]]; then
      CUA_API_KEY="$EXISTING_CUA_KEY"
    fi
  fi
  
  # If no valid API key found, prompt for one
  if [[ -z "$CUA_API_KEY" ]]; then
    echo "To use Cua Cloud Sandbox, you need to:"
    echo "1. Sign up at https://cua.ai"
    echo "2. Create a Cloud Sandbox"
    echo "3. Generate an Api Key"
    echo ""
    read -p "Enter your Cua Api Key: " CUA_API_KEY
    
    if [[ -z "$CUA_API_KEY" ]]; then
      echo "❌ Cua Api Key is required for Cloud Sandbox."
      exit 1
    fi
  fi
  
  USE_CLOUD=true

elif [[ "$CHOICE" == "2" ]]; then
  # Local macOS VM setup
  echo ""
  echo "🖥️ Setting up local macOS VMs..."
  
  # Check for Apple Silicon Mac
  if [[ $(uname -s) != "Darwin" || $(uname -m) != "arm64" ]]; then
    echo "❌ Local macOS VMs require an Apple Silicon Mac (M1/M2/M3/M4)."
    echo "💡 Consider using Cua Cloud Sandbox instead (option 1)."
    exit 1
  fi

  # Check for macOS 15 (Sequoia) or newer
  OSVERSION=$(sw_vers -productVersion)
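  # sort -V returns the smaller of the two version strings; if 15.0 is not the
  # minimum, OSVERSION sorts below it and the host is older than Sequoia.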
  if [[ $(echo "$OSVERSION 15.0" | tr " " "\n" | sort -V | head -n 1) != "15.0" ]]; then
    echo "❌ Local macOS VMs require macOS 15 (Sequoia) or newer. You have $OSVERSION."
    echo "💡 Consider using Cua Cloud Sandbox instead (option 1)."
    exit 1
  fi

  USE_CLOUD=false

else
  echo "❌ Invalid choice. Please run the script again and choose 1 or 2."
  exit 1
fi

# Install Lume if not already installed (only for local VMs)
if [[ "$USE_CLOUD" == "false" ]]; then
  if ! command -v lume &> /dev/null; then
    echo "📦 Installing Lume CLI..."
    curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh | bash
    
    # Add lume to PATH for this session if it's not already there
    if ! command -v lume &> /dev/null; then
      export PATH="$PATH:$HOME/.local/bin"
    fi
  fi

  # Pull the macOS CUA image if not already present
  if ! lume ls | grep -q "macos-sequoia-cua"; then
    # Check available disk space
    IMAGE_SIZE_GB=30
    AVAILABLE_SPACE_KB=$(df -k $HOME | tail -1 | awk '{print $4}')
    AVAILABLE_SPACE_GB=$(($AVAILABLE_SPACE_KB / 1024 / 1024))
    
    echo "📊 The macOS CUA image will use approximately ${IMAGE_SIZE_GB}GB of disk space."
    echo "   You currently have ${AVAILABLE_SPACE_GB}GB available on your system."
    
    # Prompt for confirmation
    read -p "   Continue? [y]/n: " CONTINUE
    CONTINUE=${CONTINUE:-y}
    
    if [[ $CONTINUE =~ ^[Yy]$ ]]; then
      echo "📥 Pulling macOS CUA image (this may take a while)..."
      lume pull macos-sequoia-cua:latest
    else
      echo "❌ Installation cancelled."
      exit 1
    fi
  fi
fi

# Create a Python virtual environment
echo "🐍 Setting up Python environment..."

# Try different Python commands in order of preference
PYTHON_CMD=""
for cmd in python3.11 python3 python; do
  if command -v $cmd &> /dev/null; then
    # Check this Python version
    PYTHON_VERSION=$($cmd --version 2>&1 | cut -d" " -f2)
    PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
    PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)
    
    if [ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -eq 11 ]; then
      PYTHON_CMD=$cmd
      echo "✅ Found suitable Python: $cmd (version $PYTHON_VERSION)"
      break
    elif [ "$PYTHON_MAJOR" -eq 3 ] && [ "$PYTHON_MINOR" -gt 11 ]; then
      PYTHON_CMD=$cmd
      PYTHON_TOO_NEW=true
      echo "⚠️  Found $cmd (version $PYTHON_VERSION) but only Python 3.11.x is supported."
      break
    else
      echo "⚠️  Found $cmd (version $PYTHON_VERSION) but it's too old, trying next..."
    fi
  fi
done

# If no suitable Python was found, or if Python is too new, offer to exit or continue
if [ -z "$PYTHON_CMD" ] || [ "$PYTHON_TOO_NEW" = true ]; then
  OS_TYPE=$(uname -s)
  if [ "$PYTHON_TOO_NEW" = true ]; then
    echo -e "\n❌ Python version $PYTHON_VERSION detected. Only Python 3.11.x is supported. Newer versions (e.g., 3.12+) are not yet supported."
  else
    if [[ "$OS_TYPE" == "Darwin" ]]; then
      echo -e "\n❌ python3.11 not found. To continue, we recommend running this:\n\n    $ brew install [email protected]\n"
    elif [[ "$OS_TYPE" == "MINGW"* || "$OS_TYPE" == "CYGWIN"* || "$OS_TYPE" == "MSYS"* ]]; then
      echo -e "\n❌ python3.11 not found. Please install Python 3.11 from https://www.python.org/downloads/\n"
    else
      echo -e "\n❌ python3.11 not found. Please install Python 3.11 from your package manager or https://www.python.org/downloads/\n"
    fi
  fi
  while true; do
    echo "Would you like to exit so you can install Python 3.11, or continue anyway? (e = exit, c = continue): "
    read -n 1 -r PYTHON_CONT_CHOICE
    echo
    if [[ "$PYTHON_CONT_CHOICE" =~ ^[Ee]$ ]]; then
      echo "Exiting so you can install Python 3.11."
      exit 1
    elif [[ "$PYTHON_CONT_CHOICE" =~ ^[Cc]$ ]]; then
      echo "⚠️  Continuing without Python 3.11. Some features may not work as expected."
      break
    else
      echo "Please enter 'e' to exit or 'c' to continue."
    fi
  done
fi

# Create a virtual environment
if [ ! -d "$VENV_DIR" ]; then
  $PYTHON_CMD -m venv "$VENV_DIR"
fi

# Activate the virtual environment
source "$VENV_DIR/bin/activate"

# Install required packages
echo "📦 Updating Cua packages..."
pip install -U pip setuptools wheel Cmake
pip install -U cua-computer "cua-agent[all]"

# Create a simple demo script
mkdir -p "$DEMO_DIR"

# Create .env.local file with API keys (only if it doesn't exist)
if [[ ! -f "$DEMO_DIR/.env.local" ]]; then
  cat > "$DEMO_DIR/.env.local" << EOF
# Uncomment and add your API keys here
# OPENAI_API_KEY=your_openai_api_key_here
# ANTHROPIC_API_KEY=your_anthropic_api_key_here
CUA_API_KEY=your_cua_api_key_here
EOF
  echo "📝 Created .env.local file with API key placeholders"
else
  echo "📝 Found existing .env.local file - keeping your current settings"
fi

if [[ "$USE_CLOUD" == "true" ]]; then
  # Add CUA API key to .env.local if not already present
  if ! grep -q "CUA_API_KEY" "$DEMO_DIR/.env.local"; then
    echo "CUA_API_KEY=$CUA_API_KEY" >> "$DEMO_DIR/.env.local"
    echo "🔑 Added CUA_API_KEY to .env.local"
  elif grep -q "CUA_API_KEY=your_cua_api_key_here" "$DEMO_DIR/.env.local"; then
    # Update placeholder with actual key
    sed -i.bak "s/CUA_API_KEY=your_cua_api_key_here/CUA_API_KEY=$CUA_API_KEY/" "$DEMO_DIR/.env.local"
    echo "🔑 Updated CUA_API_KEY in .env.local"
  fi
fi

# Create a convenience script to run the demo
cat > "$DEMO_DIR/start_ui.sh" << EOF
#!/bin/bash
source "$VENV_DIR/bin/activate"
cd "$DEMO_DIR"
python run_demo.py
EOF
chmod +x "$DEMO_DIR/start_ui.sh"

echo "✅ Setup complete!"

if [[ "$USE_CLOUD" == "true" ]]; then
  # Create run_demo.py for cloud sandbox
  cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from pathlib import Path
from dotenv import load_dotenv
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.ui_components import create_gradio_ui

# Load environment variables from .env.local
load_dotenv(Path(__file__).parent / ".env.local")

# Check for required API keys
cua_api_key = os.environ.get("CUA_API_KEY", "")
if not cua_api_key:
    print("\n❌ CUA_API_KEY not found in .env.local file.")
    print("Please add your CUA API key to the .env.local file.")
    exit(1)

openai_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")

if not openai_key and not anthropic_key:
    print("\n⚠️  No OpenAI or Anthropic API keys found in .env.local.")
    print("Please add at least one API key to use AI agents.")

print("🚀 Starting CUA playground with Cloud Sandbox...")
print("📝 Edit .env.local to update your API keys")

# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
EOF
else
  # Create run_demo.py for local macOS VMs
  cat > "$DEMO_DIR/run_demo.py" << 'EOF'
import asyncio
import os
from pathlib import Path
from dotenv import load_dotenv
from computer import Computer
from agent import ComputerAgent, LLM, AgentLoop, LLMProvider
from agent.ui.gradio.ui_components import create_gradio_ui

# Load environment variables from .env.local
load_dotenv(Path(__file__).parent / ".env.local")

# Try to load API keys from environment
openai_key = os.environ.get("OPENAI_API_KEY", "")
anthropic_key = os.environ.get("ANTHROPIC_API_KEY", "")

if not openai_key and not anthropic_key:
    print("\n⚠️  No OpenAI or Anthropic API keys found in .env.local.")
    print("Please add at least one API key to use AI agents.")

print("🚀 Starting CUA playground with local macOS VMs...")
print("📝 Edit .env.local to update your API keys")

# Launch the Gradio UI and open it in the browser
app = create_gradio_ui()
app.launch(share=False, inbrowser=True)
EOF
fi

echo "☁️  CUA Cloud Sandbox setup complete!"
echo "📝 Edit $DEMO_DIR/.env.local to update your API keys"
echo "🖥️  Start the playground by running: $DEMO_DIR/start_ui.sh"

# Check if the VM is running (only for local setup)
if [[ "$USE_CLOUD" == "false" ]]; then
  echo "🔍 Checking if the macOS CUA VM is running..."
  VM_RUNNING=$(lume ls | grep "macos-sequoia-cua" | grep "running" || echo "")

  if [ -z "$VM_RUNNING" ]; then
    echo "🚀 Starting the macOS CUA VM in the background..."
    lume run macos-sequoia-cua:latest &
    # Wait a moment for the VM to initialize
    sleep 5
    echo "✅ VM started successfully."
  else
    echo "✅ macOS CUA VM is already running."
  fi
fi

# Ask if the user wants to start the demo now
echo
read -p "Would you like to start the Cua Computer-Use Agent UI now? (y/n) " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
  echo "🚀 Starting the Cua Computer-Use Agent UI..."
  echo ""
  "$DEMO_DIR/start_ui.sh"
fi

```

--------------------------------------------------------------------------------
/libs/python/agent/agent/integrations/hud/proxy.py:
--------------------------------------------------------------------------------

```python
"""HUD ComputerAgent wrapper and Fake AsyncOpenAI client.

Provides FakeAsyncOpenAI that adapts our ComputerAgent to the OpenAI Responses
interface needed by HUD's OperatorAgent. It implements only `responses.create`
and returns an OpenAI Response object with `id` and `output` fields, where `output` is a list of
OpenAI-like response blocks. We intentionally only support a single-step call
by consuming the first yielded result from `ComputerAgent.run()`.
"""

import time
import traceback
import uuid
from typing import Any, Dict, List, Optional

from agent.agent import ComputerAgent as BaseComputerAgent
from agent.callbacks import PromptInstructionsCallback
from hud.agents import OperatorAgent
from hud.tools.computer.settings import computer_settings

# OpenAI Responses typed models (required)
from openai.types.responses import (
    Response,
    ResponseComputerToolCall,
    ResponseInputParam,
    ResponseOutputItem,
    ResponseOutputMessage,
    ResponseOutputText,
    ResponseReasoningItem,
    ResponseUsage,
)
from PIL import Image


def _map_agent_output_to_openai_blocks(
    output_items: List[Dict[str, Any]],
) -> List[ResponseOutputItem]:
    """Map our agent output items to OpenAI ResponseOutputItem typed models.

    Only a subset is supported: computer_call, assistant message (text), and reasoning.
    Unknown types are ignored.
    """
    blocks: List[ResponseOutputItem] = []
    for item in output_items or []:
        t = item.get("type")
        if t == "computer_call":
            comp = ResponseComputerToolCall.model_validate(
                {
                    "id": item.get("id") or f"cu_{uuid.uuid4().hex}",
                    "type": "computer_call",
                    "call_id": item["call_id"],
                    "action": item["action"],
                    "pending_safety_checks": item.get("pending_safety_checks", []),
                    "status": "completed",
                }
            )
            blocks.append(comp)
            # we will exit early here as the responses api only supports a single step
            break
        elif t == "message" and item.get("role") == "assistant":
            content_blocks: List[ResponseOutputText] = []
            for c in item.get("content", []) or []:
                content_blocks.append(
                    ResponseOutputText.model_validate(
                        {
                            "type": "output_text",
                            "text": c["text"],
                            "annotations": [],
                        }
                    )
                )
            if content_blocks:
                msg = ResponseOutputMessage.model_validate(
                    {
                        "id": item.get("id") or f"msg_{uuid.uuid4()}",
                        "type": "message",
                        "role": "assistant",
                        "status": "completed",
                        "content": [ct.model_dump() for ct in content_blocks],
                    }
                )
                blocks.append(msg)
        elif t == "reasoning":
            reasoning = ResponseReasoningItem.model_validate(
                {
                    "id": item.get("id") or f"rsn_{uuid.uuid4()}",
                    "type": "reasoning",
                    "summary": item["summary"],
                }
            )
            blocks.append(reasoning)
        # Unhandled types are ignored
    return blocks


def _to_plain_dict_list(items: Any) -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    for it in list(items):
        if hasattr(it, "model_dump"):
            out.append(it.model_dump())  # type: ignore[attr-defined]
        elif isinstance(it, dict):
            out.append(it)
        else:
            # Fall back to treating the item as a mapping
            out.append(dict(it))  # may raise if not a mapping
    return out


class FakeAsyncOpenAI:
    """Minimal fake OpenAI client with only `responses.create` implemented.

    It uses a provided `ComputerAgent` instance to produce a single-step
    response compatible with HUD's OperatorAgent loop.
    """

    def __init__(self, computer_agent: BaseComputerAgent) -> None:
        self._agent = computer_agent
        self.responses = self._Responses(self)

    class _Responses:
        def __init__(self, parent: "FakeAsyncOpenAI") -> None:
            # Caches for cross-call context when using previous_response_id
            self.blocks_cache: Dict[str, ResponseInputParam | ResponseOutputItem] = {}
            self.context_cache: Dict[str, List[str]] = {}
            self.agent = parent._agent

        async def create(
            self,
            *,
            model: str,
            input: ResponseInputParam,
            tools: Optional[List[Dict[str, Any]]] = None,
            instructions: Optional[str] = None,
            previous_response_id: Optional[str] = None,
            max_retries: int = 5,
            **_: Any,
        ) -> Any:
            for attempt in range(max_retries):
                # Prepend cached blocks from previous_response_id to input
                full_input = input
                if previous_response_id is not None:
                    prev_block_ids = self.context_cache[previous_response_id]
                    prev_blocks = [self.blocks_cache[b_id] for b_id in prev_block_ids]
                    full_input = _to_plain_dict_list(prev_blocks + input)

                # Prepend the instructions message, if provided
                effective_input = full_input
                if instructions:
                    effective_input = [
                        {
                            "role": "user",
                            "content": instructions,
                        }
                    ] + full_input

                # Run a single iteration of the ComputerAgent
                agent_result: Optional[Dict[str, Any]] = None
                async for result in self.agent.run(effective_input):  # type: ignore[arg-type]
                    agent_result = result
                    break
                assert agent_result is not None, "Agent failed to produce result"

                output = _map_agent_output_to_openai_blocks(agent_result["output"])
                usage = agent_result["usage"]

                # Cache conversation context using the last response id
                block_ids: List[str] = []
                blocks_to_cache = full_input + output
                for b in blocks_to_cache:
                    bid = getattr(b, "id", None) or f"tmp-{hash(repr(b))}"
                    self.blocks_cache[bid] = b  # type: ignore[assignment]
                    block_ids.append(bid)
                response_id = agent_result.get("id") or f"fake-{int(time.time()*1000)}"
                self.context_cache[response_id] = block_ids

                try:
                    return Response.model_validate(
                        {
                            "id": response_id,
                            "created_at": time.time(),
                            "object": "response",
                            "model": model,
                            "output": output,
                            "parallel_tool_calls": False,
                            "tool_choice": "auto",
                            "tools": [],
                            "previous_response_id": previous_response_id,
                            "usage": ResponseUsage.model_validate(
                                {
                                    "input_tokens": usage.get("input_tokens", 0),
                                    "output_tokens": usage.get("output_tokens", 0),
                                    "total_tokens": usage.get("total_tokens", 0),
                                    "input_tokens_details": usage.get(
                                        "input_tokens_details", {"cached_tokens": 0}
                                    ),
                                    "output_tokens_details": usage.get(
                                        "output_tokens_details", {"reasoning_tokens": 0}
                                    ),
                                }
                            ),
                        }
                    )
                except Exception as e:
                    print(
                        f"Error while validating agent response (attempt {attempt + 1}/{max_retries}): ",
                        e,
                    )
                    if attempt == max_retries - 1:
                        print(traceback.format_exc())
                        raise e


# ---------------------------------------------------------------------------
# Proxy OperatorAgent (moved from __init__.py)
# ---------------------------------------------------------------------------


class ProxyOperatorAgent(OperatorAgent):
    """OperatorAgent that proxies model calls through our ComputerAgent.

    Accepts the same config keys we pass via hud.run_dataset `agent_config`:
    - model: str | None
    - allowed_tools: list[str] | None
    Additional kwargs are forwarded to OperatorAgent (if any are supported).
    """

    def __init__(
        self,
        *,
        model: str | None = None,
        allowed_tools: list[str] | None = None,
        trajectory_dir: str | dict | None = None,
        # === ComputerAgent kwargs ===
        tools: list[Any] | None = None,
        custom_loop: Any | None = None,
        only_n_most_recent_images: int | None = None,
        callbacks: list[Any] | None = None,
        instructions: str | None = None,
        verbosity: int | None = None,
        max_retries: int | None = 3,
        screenshot_delay: float | int = 0.5,
        use_prompt_caching: bool | None = False,
        max_trajectory_budget: float | dict | None = None,
        telemetry_enabled: bool | None = True,
        **kwargs: Any,
    ) -> None:
        model = model or "computer-use-preview"
        allowed_tools = allowed_tools or ["openai_computer"]

        computer_shim = {
            "screenshot": lambda: Image.new(
                "RGB",
                (computer_settings.OPENAI_COMPUTER_WIDTH, computer_settings.OPENAI_COMPUTER_HEIGHT),
            ),
            "environment": "linux",
            "dimensions": (
                computer_settings.OPENAI_COMPUTER_WIDTH,
                computer_settings.OPENAI_COMPUTER_HEIGHT,
            ),
        }
        # Build tools ensuring the computer_shim is included
        agent_tools: list[Any] = [computer_shim]
        if tools:
            agent_tools.extend(tools)

        # Build callbacks, injecting prompt instructions if provided
        agent_callbacks = list(callbacks or [])
        if instructions:
            agent_callbacks.append(PromptInstructionsCallback(instructions))

        computer_agent = BaseComputerAgent(
            model=model,
            tools=agent_tools,
            custom_loop=custom_loop,
            only_n_most_recent_images=only_n_most_recent_images,
            callbacks=agent_callbacks,
            verbosity=verbosity,
            trajectory_dir=trajectory_dir,
            max_retries=max_retries,
            screenshot_delay=screenshot_delay,
            use_prompt_caching=use_prompt_caching,
            max_trajectory_budget=max_trajectory_budget,
            telemetry_enabled=telemetry_enabled,
        )
        model_client = FakeAsyncOpenAI(computer_agent)

        super().__init__(
            model_client=model_client,  # type: ignore[arg-type]
            model=model,
            allowed_tools=allowed_tools,
            **kwargs,
        )


__all__ = [
    "FakeAsyncOpenAI",
    "ProxyOperatorAgent",
]

```
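A minimal usage sketch, assuming the `agent`, `hud`, and `openai` packages above are installed; the instructions and budget values are illustrative placeholders, not values taken from the repository.

```python
from agent.integrations.hud.proxy import ProxyOperatorAgent

# Construct the proxy agent the way a hud run_dataset `agent_config` would.
operator = ProxyOperatorAgent(
    model="computer-use-preview",       # same default the constructor applies when None
    allowed_tools=["openai_computer"],  # same default the constructor applies when None
    instructions="Prefer keyboard shortcuts over drag-and-drop.",  # illustrative
    max_trajectory_budget=5.0,          # optional budget forwarded to ComputerAgent
)

# `operator` behaves like hud.agents.OperatorAgent; each responses.create call is
# served by FakeAsyncOpenAI, which consumes a single ComputerAgent.run() step.
```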

--------------------------------------------------------------------------------
/libs/python/som/som/visualization.py:
--------------------------------------------------------------------------------

```python
import logging
import os
import platform
from typing import Any, Dict, List, Tuple

import numpy as np
import supervision as sv
from PIL import Image, ImageDraw, ImageFont

logger = logging.getLogger(__name__)


class BoxAnnotator:
    """Class for drawing bounding boxes and labels on images."""

    def __init__(self):
        """Initialize the box annotator with a color palette."""
        # WCAG 2.1 compliant color palette optimized for accessibility
        self.colors = [
            "#2E7D32",  # Green
            "#C62828",  # Red
            "#1565C0",  # Blue
            "#6A1B9A",  # Purple
            "#EF6C00",  # Orange
            "#283593",  # Indigo
            "#4527A0",  # Deep Purple
            "#00695C",  # Teal
            "#D84315",  # Deep Orange
            "#1B5E20",  # Dark Green
            "#B71C1C",  # Dark Red
            "#0D47A1",  # Dark Blue
            "#4A148C",  # Dark Purple
            "#E65100",  # Dark Orange
            "#1A237E",  # Dark Indigo
            "#311B92",  # Darker Purple
            "#004D40",  # Dark Teal
            "#BF360C",  # Darker Orange
            "#33691E",  # Darker Green
            "#880E4F",  # Pink
        ]
        self.color_index = 0
        self.default_font = None
        self._initialize_font()

    def _initialize_font(self) -> None:
        """Initialize the default font."""
        # Try to load a system font first
        system = platform.system()
        font_paths = []

        if system == "Darwin":  # macOS
            font_paths = [
                "/System/Library/Fonts/Helvetica.ttc",
                "/System/Library/Fonts/Arial.ttf",
                "/Library/Fonts/Arial.ttf",
            ]
        elif system == "Linux":
            font_paths = [
                "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf",
                "/usr/share/fonts/TTF/DejaVuSans.ttf",
                "/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf",
            ]
        else:  # Windows
            font_paths = ["C:\\Windows\\Fonts\\arial.ttf"]

        # Try each font path
        for font_path in font_paths:
            if os.path.exists(font_path):
                try:
                    # Test the font with a small size
                    test_font = ImageFont.truetype(font_path, 12)
                    # Test if the font can render text
                    test_font.getbbox("1")
                    self.default_font = font_path
                    return
                except Exception:
                    continue

    def _get_next_color(self) -> str:
        """Get the next color from the palette."""
        color = self.colors[self.color_index]
        self.color_index = (self.color_index + 1) % len(self.colors)
        return color

    def _hex_to_rgb(self, hex_color: str) -> Tuple[int, int, int]:
        """Convert hex color to RGB tuple."""
        hex_color = hex_color.lstrip("#")
        # Create explicit tuple of 3 integers to match the return type
        r = int(hex_color[0:2], 16)
        g = int(hex_color[2:4], 16)
        b = int(hex_color[4:6], 16)
        return (r, g, b)

    def draw_boxes(
        self, image: Image.Image, detections: List[Dict[str, Any]], draw_config: Dict[str, Any]
    ) -> Image.Image:
        """Draw bounding boxes and labels on the image."""
        draw = ImageDraw.Draw(image)

        # Create smaller font while keeping contrast
        try:
            if self.default_font:
                font = ImageFont.truetype(self.default_font, size=12)  # Reduced from 16 to 12
            else:
                # If no TrueType font available, use default
                font = ImageFont.load_default()
        except Exception:
            font = ImageFont.load_default()

        padding = 2  # Reduced padding for smaller overall box
        spacing = 1  # Reduced spacing between elements

        # Keep track of used label areas to check for collisions
        used_areas = []

        # Store label information for third pass
        labels_to_draw = []

        # First pass: Initialize used_areas with all bounding boxes
        for detection in detections:
            box = detection["bbox"]
            x1, y1, x2, y2 = [
                int(coord * dim) for coord, dim in zip(box, [image.width, image.height] * 2)
            ]
            used_areas.append((x1, y1, x2, y2))

        # Second pass: Draw all bounding boxes
        for idx, detection in enumerate(detections, 1):
            # Get box coordinates
            box = detection["bbox"]
            x1, y1, x2, y2 = [
                int(coord * dim) for coord, dim in zip(box, [image.width, image.height] * 2)
            ]

            # Get color for this detection
            color = self._get_next_color()
            rgb_color = self._hex_to_rgb(color)

            # Draw bounding box with original width
            draw.rectangle(((x1, y1), (x2, y2)), outline=rgb_color, width=2)

            # Use detection number as label
            label = str(idx)

            # Get text dimensions using getbbox
            bbox = font.getbbox(label)
            text_width = bbox[2] - bbox[0]
            text_height = bbox[3] - bbox[1]

            # Create box dimensions with padding
            box_width = text_width + (padding * 2)  # Removed multiplier for tighter box
            box_height = text_height + (padding * 2)  # Removed multiplier for tighter box

            def is_inside_bbox(x, y):
                """Check if a label box would be inside the bounding box."""
                return x >= x1 and x + box_width <= x2 and y >= y1 and y + box_height <= y2

            # Try different positions until we find one without collision
            positions = [
                # Top center (above bbox)
                lambda: (x1 + ((x2 - x1) - box_width) // 2, y1 - box_height - spacing),
                # Bottom center (below bbox)
                lambda: (x1 + ((x2 - x1) - box_width) // 2, y2 + spacing),
                # Right center (right of bbox)
                lambda: (x2 + spacing, y1 + ((y2 - y1) - box_height) // 2),
                # Left center (left of bbox)
                lambda: (x1 - box_width - spacing, y1 + ((y2 - y1) - box_height) // 2),
                # Top right (outside corner)
                lambda: (x2 + spacing, y1 - box_height - spacing),
                # Top left (outside corner)
                lambda: (x1 - box_width - spacing, y1 - box_height - spacing),
                # Bottom right (outside corner)
                lambda: (x2 + spacing, y2 + spacing),
                # Bottom left (outside corner)
                lambda: (x1 - box_width - spacing, y2 + spacing),
            ]

            def check_occlusion(x, y):
                """Check if a label box occludes any existing ones or is inside bbox."""
                # First check if it's inside the bounding box
                if is_inside_bbox(x, y):
                    return True

                # Then check collision with other labels
                new_box = (x, y, x + box_width, y + box_height)
                label_width = new_box[2] - new_box[0]
                label_height = new_box[3] - new_box[1]

                for used_box in used_areas:
                    if not (
                        new_box[2] < used_box[0]  # new box is left of used box
                        or new_box[0] > used_box[2]  # new box is right of used box
                        or new_box[3] < used_box[1]  # new box is above used box
                        or new_box[1] > used_box[3]  # new box is below used box
                    ):
                        # Calculate dimensions of the used box
                        used_box_width = used_box[2] - used_box[0]
                        used_box_height = used_box[3] - used_box[1]

                        # Only consider as collision if used box is NOT more than 5x bigger in both dimensions
                        if not (
                            used_box_width > 5 * label_width and used_box_height > 5 * label_height
                        ):
                            return True
                return False

            # Try each position until we find one without collision
            label_x = None
            label_y = None

            for get_pos in positions:
                x, y = get_pos()
                # Ensure position is within image bounds
                if x < 0 or y < 0 or x + box_width > image.width or y + box_height > image.height:
                    continue
                if not check_occlusion(x, y):
                    label_x = x
                    label_y = y
                    break

            # If all positions collide or are out of bounds, find the best possible position
            if label_x is None:
                # Try to place it in the nearest valid position outside the bbox
                best_pos = positions[0]()  # Default to top center
                label_x = max(0, min(image.width - box_width, best_pos[0]))
                label_y = max(0, min(image.height - box_height, best_pos[1]))

                # Ensure it's not inside the bounding box
                if is_inside_bbox(label_x, label_y):
                    # Force it above the bounding box
                    label_y = max(0, y1 - box_height - spacing)

            # Add this label area to used areas
            if (
                label_x is not None
                and label_y is not None
                and box_width is not None
                and box_height is not None
            ):
                used_areas.append((label_x, label_y, label_x + box_width, label_y + box_height))

            # Store label information for the third pass (label drawing)
            labels_to_draw.append(
                {
                    "label": label,
                    "x": label_x,
                    "y": label_y,
                    "width": box_width,
                    "height": box_height,
                    "text_width": text_width,
                    "text_height": text_height,
                    "color": rgb_color,
                }
            )

        # Third pass: Draw all labels on top
        for label_info in labels_to_draw:
            # Draw background box with white outline
            draw.rectangle(
                (
                    (label_info["x"] - 1, label_info["y"] - 1),
                    (
                        label_info["x"] + label_info["width"] + 1,
                        label_info["y"] + label_info["height"] + 1,
                    ),
                ),
                outline="white",
                width=2,
            )
            draw.rectangle(
                (
                    (label_info["x"], label_info["y"]),
                    (label_info["x"] + label_info["width"], label_info["y"] + label_info["height"]),
                ),
                fill=label_info["color"],
            )

            # Center text in box
            text_x = label_info["x"] + (label_info["width"] - label_info["text_width"]) // 2
            text_y = label_info["y"] + (label_info["height"] - label_info["text_height"]) // 2

            # Draw text with black outline for better visibility
            outline_width = 1
            for dx in [-outline_width, outline_width]:
                for dy in [-outline_width, outline_width]:
                    draw.text(
                        (text_x + dx, text_y + dy), label_info["label"], fill="black", font=font
                    )

            # Draw the main white text
            draw.text((text_x, text_y), label_info["label"], fill=(255, 255, 255), font=font)

        logger.info("Finished drawing all boxes")
        return image

```
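A small usage sketch, assuming the `som` package and its dependencies are importable; the image size and box coordinates are made up for illustration. Boxes are normalized `[x1, y1, x2, y2]`, and `draw_config` is accepted but not consulted by the drawing path above, so an empty dict suffices.

```python
from PIL import Image

from som.visualization import BoxAnnotator

image = Image.new("RGB", (800, 600), "white")
detections = [
    {"bbox": [0.10, 0.20, 0.45, 0.60]},  # normalized coordinates
    {"bbox": [0.55, 0.30, 0.90, 0.75]},
]

annotated = BoxAnnotator().draw_boxes(image, detections, draw_config={})
annotated.save("annotated.png")
```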

--------------------------------------------------------------------------------
/examples/evals/wikipedia_most_linked.txt:
--------------------------------------------------------------------------------

```
ISBN (identifier)
United States
Main Page
Tilde
Doi (identifier)
Fair use
Association football
Years
Wayback Machine
ISSN (identifier)
India
Wikimedia Foundation
Wikidata
Animal
Taxonomy (biology)
Australia
France
Eukaryote
IP address
U.S. state
Time zone
City
Copyright
Canada
Town
ASCII
Greek alphabet
Typographic ligature
Diacritical mark
Wikipedia
Germany
Human settlement
Open Tree of Life
IMDb (identifier)
United Kingdom
Catalogue of Life
Insect
Russia
Japan
Italy
Arthropod
Television show
Public domain
INaturalist
Poland
England
PMID (identifier)
Daylight saving time
S2CID (identifier)
China
Encyclopedia of Life
Spain
OCLC (identifier)
Plant
Flickr
Wikispecies
Africa
Song
Record label
Lepidoptera
Iran
English language
Music genre
News aggregator
Web feed
Proxy server
X-Forwarded-For
College football
World War II
Brazil
Sweden
Politics
Olympics
Netherlands
Record producer
California
New York City
Surname
The New York Times
London
New Zealand
PMC (identifier)
Logo
Synonym (taxonomy)
Switzerland
Turkey
Sport
Video game
Architecture
Norway
Bibcode (identifier)
Mexico
Botany
JSTOR (identifier)
Rail transport
Field hockey
Ireland
Scotland
Belgium
South Africa
Common name
Professional sports
Sport governing body
Sport industry
Olympic games
Election
Austria
Ukraine
Anthroponymy
Pakistan
Baseball
Denmark
Christianity
Philippines
Woman
Romania
Czech Republic
Album
Godzilla Minus One
Single (music)
Electoral reform
Nofollow
Basketball
New York (state)
Argentina
Finland
Soviet Union
Greece
Russian language
Historic site
Free content
YouTube
Catholic Church
Hungary
Kingdom Hearts
Beetle
Company
Tetris
Portugal
BioShock
Abandonware
Deus Ex (video game)
4A Engine
Yoshi's New Island
Kaboom! (video game)
Rain World
Juno (Overwatch)
Crash Team Rumble
Vault 101
Tales of Commons
NHL Hockey
Clutch Gaming
Haseo
Allin Kempthorne
Ilyas El Maliki
Ratalaika Games
3D mousepad
HaptX
Walid Sultan Midani
Rustler (video game)
Look Outside
Ducks Ahoy!
Fusion Engine
Cricket
Geography
Chordate
The Guardian
Israel
Billboard (magazine)
Ice hockey
Given name
Chicago
World War I
Pennsylvania
Indonesia
Alma mater
Vascular plant
Amorphea
Wikimedia Commons
Novel
Village
Visual arts
Film poster
Flowering plant
Opisthokont
Obazoa
County seat
Short story
First-class cricket
Law
Europe
University
Croatia
Sport of athletics
Holozoa
Choanozoa
Filozoa
German language
Tennis
Eumetazoa
Serbia
ParaHoxozoa
Thailand
History
Midfielder
Bilateria
Unincorporated area
French language
AllMusic
Astronomy
Nephrozoa
Novella
Ship
Twitter
Character (arts)
College
Malaysia
Conflict of interest
Higher education
IUCN Red List
Rock music
Gastropoda
Creative Commons
Wales
Bulgaria
UTC+2
Paris
Species
Illinois
HTML element
South Korea
BBC
Persian language
Moth
Conservation status
Pop music
Colombia
Wicket
American football
Jazz
World Flora Online
Los Angeles
Songwriter
Hong Kong
Hdl (identifier)
Genus
Spanish language
Egypt
Not out
Slovenia
Chile
Korea
Tropicos
Slovakia
Bishop
Family (biology)
Rugby union
Women's history
Nigeria
College basketball
Sports Reference
Washington, D.C.
GFDL
Afghanistan
Sri Lanka
Newspapers.com
UTC+1
Eudicots
Estonia
Los Angeles Times
Olympedia
Bangladesh
Peru
Singapore
Typographical error
UTC
Virginia
Taiwan
Fast bowling
COVID-19 pandemic
Food
Fish
River
Republic of Ireland
Beer
Caribbean
Michigan
Drink
Chinese language
Business
Leg break
Women's Test cricket
Women's cricket
Innings
New Jersey
Protostome
Spin bowling
Sugar
Underarm bowling
Roger Federer
Googly
Apple
Comics
Cricket Australia XI
Fair and unfair play
Anime
Rafael Nadal
Leander Paes
Kazakhstan
Capital city
Blessed Virgin Mary
Venezuela
Case sensitivity
Arabic language
North America
Texas
Burger King
The Plant List
Justine Henin
Sushi
Angelus
Beef
Sanctification
Cuthbert Tunstall
Bread
Saint Mungo
Incumbent
Americanism (heresy)
Curry
Ensoulment
Associated Press
Adolph John Paschang
French cuisine
Altar Society
UTC-5
Philadelphia
Bill Mallon
Yogurt
Soy sauce
Open Era (tennis)
Belarus
Manga
English Wikipedia
Islam
Trademark
ISO 4
Wisconsin
Lithuania
The Washington Post
Agaricus bisporus
Reptile
Sociology
Organizations
Death
Ham and eggs
Asia
Swimming (sport)
South America
Northern Ireland
Observation.org
European Union
Astronomical object
Georgia (U.S. state)
Gmina
Provinces of Iran
Computing
Counties of Iran
Discogs
Mathematics
Powiat
Missouri
Bachelor of Arts
Iran Standard Time
Florida
Bakhsh
Minnesota
Oregon
Nepal
Variety (magazine)
Japanese language
Journalism
Rome
Computer
Ohio
Ontario
Internet Archive
Latvia
Comedy
Azerbaijan
BBC News
Morocco
Ecdysozoa
Print-on-demand
Bengali language
A5 paper
Pedia Press
Education
Mollusca
American Civil War
Berlin
Taxon
Maryland
Panarthropoda
Hebrew language
Toronto
Tactopoda
Episode
Cuba
Country music
Religion
Rotten Tomatoes
Georgia (country)
Classical music
Month
Puerto Rico
GEOnet Names Server
Sydney
The Times
Iraq
Polyphaga
Derivative work
Lisbon
Syria
Ecuador
Uzbekistan
Greek language
Latin
United Nations
Literature
Animation
Physics
Amphibian
Romanize
List of countries
Moscow
Politician
Philosophy
Metacritic
Mammal
Pinyin
Open access
New South Wales
Theatre
Allmusic
Syntax
Women in music
Fly
Colorado
Academic journal
LGBTQ
Seal (emblem)
Rolling Stone
Saudi Arabia
Science fiction
Tweet (social media)
Heavy metal music
Boston
Vietnam
Molecular biology
Facebook
Iceland
Albania
Cycling
Tennessee
Armenia
Massachusetts
Mandibulata
United States Navy
Communes of France
Census
Algeria
United States Army
Wikilink
Pancrustacea
Alternative rock
American English
Radio stations
History of Romania
Endemism
San Francisco
Award
Ghana
Judaism
Alabama
Blog
The Independent
Melbourne
Cantons of France
Lebanon
West Germany
Quotation mark
Regions of France
Chernivtsi Oblast
Tokyo
Italian language
Connecticut
Country
Screenshot
Ghost town
Iran Daylight Time
NatureServe
Mongolia
Cyprus
Northern Bukovina
Rugby league
Northern Bessarabia
State highway
Harvard University
Yorkshire
Pterygota
Slash (punctuation)
Prize
Science
Asian Games
Eastern Time Zone
Myanmar
Nazi Germany
Ottoman Empire
Quebec
Billboard Hot 100
United Arab Emirates
Neoptera
Hexapoda
Least Concern
Type species
EPPO Code
Wikisource
Kyrgyzstan
Allotriocarida
Volleyball
Geology
Second World War
British Columbia
Socialism
Zoology
The Daily Telegraph
Paleontology
Vienna
Dicondylia
BugGuide
United States Senate
Hermit crab
Paraphrase
CNN
Royal Navy
Indian Standard Time
Billboard 200
Kenya
DVD
Sipuncula
Tajikistan
National park
Economics
Heterocyathus
Uruguay
Heteropsammia
Road
Spanish name
Luxembourg
Korean language
UK Singles Chart
Queensland
Montreal
New York Times
Bolivia
CP/M
Timestamp
Electronic music
INSEE code
ArXiv (identifier)
PubMed
SVG
USA Today
Omnivore
Tunisia
Psychology
ESPN
UEFA
Hawaii
Gastropod
Aliyah
North Carolina
Russian Empire
Tibet
Fungi
Oklahoma
Fauna Europaea
Turkmenistan
British English
The London Gazette
Civil township
Boxing
Barack Obama
Animal Diversity Web
Reuters
Eumetabola
Voter turnout
Transport
False positive
Donald Trump
Kansas
Antarctica
Lake
Ethiopia
Time (magazine)
Marriage
NBC
Beijing
Vertebrate
Czechoslovakia
Protected area
Energy
Poetry
Archaeology
Columbia University
Poverty line
Alaska
Computing platform
British Empire
University of Oxford
Costa Rica
Dublin
A-side and B-side
ZIP code
Actinopterygii
UTC-6
Photoperiodism
Mayor
Sphaeriidae
Animal suicide
Atka mackerel
Starling
Arizona
Entertainment Weekly
Sphaerium beckmani
Junqueira cow
Zaniolepis frenata
Campocraspedon
Zimbabwe
Motorsport
Bird flight
Cnemophilidae
Hinduism
Phalarope
Indiana
Museums
Holometabola
Pytilia
North Macedonia
Malta
Cathartiformes
Darter
Saker falcon
Cathartes
Avian malaria
Coal tit
Magpie duck
Video game developer
Bird bath
Vesper sparrow
Gouldian finch
Debeaking
Vector graphics
Semiplumbeous hawk
Scottish crossbill
Bullfinch
Fregata
Nidicolous
Plushcap
Pallid scops owl
Hip-hop
Blyth's frogmouth
Sunda scops owl
Argus (bird)
Operation Migration
Nik Borrow
Per capita income
Guy Oseary
Madrid
Buddhism
Drainage basin
Sephardic Haredim
Rami Kleinstein
Guy Bavli
David Bar-Hayim
Levin Kipnis
Edna Arbel
Prisoner of Zion
Ayala Procaccia
Nachum Heiman
Zman Tel Aviv
CBS
ARIA Charts
Cucujiformia
Away colours
Regex
2019 African Games
1962 Asian Games
1958 Asian Games
Chemistry
Olympic Games
The Middle Ages
Central Asia
Bengalis
Southeast Asia
Find a Grave
Microsoft Windows
Swing (politics)
White (U.S. Census)
Roman Catholic
Maine
The Times of India
Season (sports)
Jamaica
Video game genre
Munich
Asterids
Rosids
Golf
Language
Hangul
Atlanta
Glasgow
UTC+3
Library of Congress
Deuterostome
COVID-19
Video game publisher
Montenegro
ESPNcricinfo
Brand
UTC-4
IGN
Stockholm
Istanbul
NASA
Gnathostomata
Ukrainian language
Human rights
Chicago Tribune
ProQuest
IMDb
River mouth
Hip hop music
Gene
Netflix
Moldova
Barcelona
Paraguay
Olfactores
Labour Party (UK)
United States dollar
Qatar
Photography
Guatemala
Summit
Cold War
Running
First World War
Precipitation
Edinburgh
Amsterdam
Lima
New Eskaton
Computer program
Xinjiang
Women in science
Manhattan
Warsaw
Magazine
Horror film
Deadline Hollywood
Jordan
Aparaglossata
Agriculture
Internet
Prague
The Hindu
Cretaceous
Latino (U.S. Census)
Vietnam War
Music download
Encyclopedia
Chemical compounds
Pittsburgh
Soap opera
Budapest
George W. Bush
Seattle
Extended play
Washington (state)
Listed building
Palestine
LCCN (identifier)
Portland, Oregon
Panama
Plagiarism
Brooklyn
Teleostomi
Manchester
Bird
Mollusk
Automobile
Historic England
Linguistics
Dependent territory
Athens
Civil engineering
Sea snail
Population density
Finance
Disaster management
Tanzania
Jurassic
Districts of Russia
Western Australia
Louisiana
Portuguese language
Anatomy
The Beatles
Tamil language
Milan
Uganda
Natural environment
FIFA
Cameroon
Blu-ray
Mexico City
Chemical formula
Jimmy Wales
Papua New Guinea
Diaphoretickes
UNESCO
Forbes
Technology
Buenos Aires
Vancouver
Dominican Republic
2007
Species description
East Germany
Folk music
Kentucky
Multimedia
Monocotyledon
Rio de Janeiro
Automated
Hindi
Houston
Google
Devonian
Member of Parliament
Bible
Mumbai
FishBase
African diaspora
Carboniferous
Cambrian
Triassic
Montana
Handball
Ordovician
San Diego
Archive.today
Stanford University
British Army
Middle Ages
Frequency
Ultratop
Permian
Detroit
Earth
Precambrian
Hamburg
Alberta
Tamil Nadu
Madagascar
Lancashire
Guitar
Trade union
Instagram
Engineering
2006
Silurian
NPR
Railway station
CAS Registry Number
Yemen
Noctuoidea
Fiji
Haiti
Rowing (sport)
New Orleans
NME
Alternative media
North Korea
Microsoft
Jerusalem
Paleogene
Audery Mill Creek
Horse racing
Post town
Piano
Bavaria
Polish language
Horror fiction
Neogene
Kerala
Copenhagen
Google Books
Central Time Zone
Island
Birmingham
Anglicanism
Software
Mountain range
Investment
Brussels
Muhammad Ali
Asian (U.S. Census)
Video game culture
Brisbane
Church of England
Kosovo
Bachelor of Science
Molar mass
Arachnid
Own goal
Yale University
Caenogastropoda
Auckland
World Athletics
Trinidad and Tobago
Hanyu Pinyin
Sound bite
Time
El Salvador
Microbiology
Columbia Records
Seoul
Cerambycidae
Maharashtra
Chelicerata
Fungus
Media influence
South Carolina
Radio
Telenovela
FA Cup
Senegal
Internet trolling
Nashville, Tennessee
Demonym
Standard Chinese
Sculpture
Liverpool
Thesis
Bass guitar
Chess
Women artists
Icon (computing)
PubChem
UK Albums Chart
Head coach
Roman Empire
Grand Slam (tennis)
JSmol
Formula One
Biology
Kent
Ancient Rome
Inner Carniola
Oslo
Dutch language
Wingspan
Archaeplastida
MTV
Edvard Ravnikar
ITunes
Feminism
German Empire
Pacific Ocean
Atlantic Ocean
Pharmacology
Track gauge
ChemSpider
Doctor of Philosophy
Regions of England
Districts of England
Christmas
Pavel Golia
Predjama Castle
Overtime (sports)
Forum
Swiss Hitparade
Stumped
Majority
Male
Shanghai
Siddharta (band)
```

--------------------------------------------------------------------------------
/libs/python/agent/agent/adapters/models/internvl.py:
--------------------------------------------------------------------------------

```python
from __future__ import annotations

from typing import Any, Dict, List, Optional

# Hugging Face imports are local to avoid hard dependency at module import
try:
    import base64  # type: ignore
    from io import BytesIO  # type: ignore

    # Attempt to import InternVL's model dependencies
    import einops as _  # type: ignore
    import requests  # type: ignore
    import timm as _  # type: ignore
    import torch  # type: ignore
    import torchvision.transforms as T  # type: ignore
    from PIL import Image  # type: ignore
    from torchvision.transforms.functional import InterpolationMode  # type: ignore
    from transformers import AutoModel, AutoTokenizer  # type: ignore

    HF_AVAILABLE = True
except Exception:
    HF_AVAILABLE = False


class InternVLModel:
    """Generic Hugging Face vision-language model handler.
    Uses InternVL's native `model.chat()` interface with `AutoTokenizer`.
    Provides preprocessing to support multi-turn conversations with multiple images.
    """

    def __init__(
        self, model_name: str, device: str = "auto", trust_remote_code: bool = False
    ) -> None:
        if not HF_AVAILABLE:
            raise ImportError(
                'InternVL dependencies not found. Install with: pip install "cua-agent[internvl-hf]"'
            )
        self.model_name = model_name
        self.device = device
        self.model = None
        self.tokenizer = None
        self.trust_remote_code = trust_remote_code
        self._load()

    def _load(self) -> None:
        # Load model
        self.model = AutoModel.from_pretrained(
            self.model_name,
            torch_dtype=torch.bfloat16,
            low_cpu_mem_usage=True,
            use_flash_attn=True,
            device_map=self.device,
            trust_remote_code=self.trust_remote_code,
        ).eval()
        # Load tokenizer (InternVL requires trust_remote_code=True and often use_fast=False)
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_name,
            trust_remote_code=self.trust_remote_code,
            use_fast=False,
        )

    # ---- Image preprocessing utilities adapted from InternVL docs ----
    IMAGENET_MEAN = (0.485, 0.456, 0.406)
    IMAGENET_STD = (0.229, 0.224, 0.225)

    def _build_transform(self, input_size: int) -> T.Compose:
        MEAN, STD = self.IMAGENET_MEAN, self.IMAGENET_STD
        transform = T.Compose(
            [
                T.Lambda(lambda img: img.convert("RGB") if img.mode != "RGB" else img),
                T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
                T.ToTensor(),
                T.Normalize(mean=MEAN, std=STD),
            ]
        )
        return transform

    def _find_closest_aspect_ratio(
        self,
        aspect_ratio: float,
        target_ratios: List[tuple],
        width: int,
        height: int,
        image_size: int,
    ):
        best_ratio_diff = float("inf")
        best_ratio = (1, 1)
        area = width * height
        for ratio in target_ratios:
            target_aspect_ratio = ratio[0] / ratio[1]
            ratio_diff = abs(aspect_ratio - target_aspect_ratio)
            if ratio_diff < best_ratio_diff:
                best_ratio_diff = ratio_diff
                best_ratio = ratio
            elif ratio_diff == best_ratio_diff:
                if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                    best_ratio = ratio
        return best_ratio

    def _dynamic_preprocess(
        self,
        image: Image.Image,
        min_num: int = 1,
        max_num: int = 12,
        image_size: int = 448,
        use_thumbnail: bool = True,
    ) -> List[Image.Image]:
        orig_width, orig_height = image.size
        aspect_ratio = orig_width / orig_height

        target_ratios = set(
            (i, j)
            for n in range(min_num, max_num + 1)
            for i in range(1, n + 1)
            for j in range(1, n + 1)
            if i * j <= max_num and i * j >= min_num
        )
        target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

        target_aspect_ratio = self._find_closest_aspect_ratio(
            aspect_ratio, target_ratios, orig_width, orig_height, image_size
        )

        target_width = image_size * target_aspect_ratio[0]
        target_height = image_size * target_aspect_ratio[1]
        blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

        resized_img = image.resize((target_width, target_height))
        processed_images: List[Image.Image] = []
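        # Walk the tile grid row-major: with cols = target_width // image_size,
        # tile i covers column i % cols and row i // cols of the resized image.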
        for i in range(blocks):
            box = (
                (i % (target_width // image_size)) * image_size,
                (i // (target_width // image_size)) * image_size,
                ((i % (target_width // image_size)) + 1) * image_size,
                ((i // (target_width // image_size)) + 1) * image_size,
            )
            split_img = resized_img.crop(box)
            processed_images.append(split_img)
        assert len(processed_images) == blocks
        if use_thumbnail and len(processed_images) != 1:
            thumbnail_img = image.resize((image_size, image_size))
            processed_images.append(thumbnail_img)
        return processed_images

    def _load_image_from_source(self, src: str) -> Image.Image:
        """Load PIL image from various sources: data URL, http(s), or local path."""
        if src.startswith("data:image/"):
            # data URL base64
            header, b64data = src.split(",", 1)
            img_bytes = base64.b64decode(b64data)
            return Image.open(BytesIO(img_bytes)).convert("RGB")
        if src.startswith("http://") or src.startswith("https://"):
            resp = requests.get(src, timeout=10)
            resp.raise_for_status()
            return Image.open(BytesIO(resp.content)).convert("RGB")
        # Assume local file path
        return Image.open(src).convert("RGB")

    def _images_to_pixel_values(
        self, images: List[Image.Image], input_size: int = 448, max_num: int = 12
    ):
        transform = self._build_transform(input_size=input_size)
        pixel_values_list = []
        num_patches_list: List[int] = []
        for img in images:
            tiles = self._dynamic_preprocess(
                img, image_size=input_size, use_thumbnail=True, max_num=max_num
            )
            pv = [transform(tile) for tile in tiles]
            pv = torch.stack(pv)
            num_patches_list.append(pv.shape[0])
            pixel_values_list.append(pv)
        if not pixel_values_list:
            return None, []
        pixel_values = torch.cat(pixel_values_list)
        return pixel_values, num_patches_list

    def generate(self, messages: List[Dict[str, Any]], max_new_tokens: int = 128) -> str:
        """Generate text for the given HF-format messages.
        messages: [{ role, content: [{type:'text'|'image', text|image}] }]

        This implementation constructs InternVL-compatible inputs and uses
        `model.chat(tokenizer, pixel_values, question, history=...)` to avoid
        relying on AutoProcessor (which fails for some tokenizers).
        """
        assert self.model is not None and self.tokenizer is not None

        # Build textual context and collect images and the final question
        context_lines: List[str] = []
        all_images: List[Image.Image] = []
        last_user_text_parts: List[str] = []

        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content", [])
            if isinstance(content, str):
                content_items = [{"type": "text", "text": content}]
            else:
                content_items = content

            if role == "user":
                # Collect text and images
                parts_text: List[str] = []
                for item in content_items:
                    if item.get("type") == "text":
                        t = item.get("text", "")
                        if t:
                            parts_text.append(t)
                    elif item.get("type") == "image":
                        url = item.get("image", "")
                        if url:
                            try:
                                all_images.append(self._load_image_from_source(url))
                            except Exception:
                                # Ignore failed image loads but keep going
                                pass
                text = "\n".join(parts_text).strip()
                if text:
                    context_lines.append(f"User: {text}")
                # Track last user text separately for question
                last_user_text_parts = parts_text or last_user_text_parts
            elif role == "assistant":
                # Only keep text content for history
                parts_text = [
                    item.get("text", "") for item in content_items if item.get("type") == "text"
                ]
                text = "\n".join(parts_text).strip()
                if text:
                    context_lines.append(f"Assistant: {text}")

        # Prepare pixel values for all collected images (across turns)
        pixel_values = None
        num_patches_list: List[int] = []
        if all_images:
            pixel_values, num_patches_list = self._images_to_pixel_values(
                all_images, input_size=448, max_num=12
            )
            if pixel_values is not None:
                # Convert dtype/device as in docs
                pixel_values = pixel_values.to(torch.bfloat16)
                # Chat API expects tensors on CUDA when model is on CUDA
                try:
                    pixel_values = pixel_values.to(self.model.device)
                except Exception:
                    pass

        # Build question with any prior context and numbered image placeholders
        if all_images:
            # Separate images layout: Image-1: <image> ... then question text
            prefix_lines = [f"Image-{i+1}: <image>" for i in range(len(all_images))]
            prefix = "\n".join(prefix_lines) + "\n"
        else:
            prefix = ""

        last_user_text = "\n".join(last_user_text_parts).strip()
        # Combine prior text-only turns as context to emulate multi-turn
        context_text = "\n".join(context_lines[:-1]) if len(context_lines) > 1 else ""
        base_question = last_user_text if last_user_text else "Describe the image(s) in detail."
        if context_text:
            question = (context_text + "\n" + prefix + base_question).strip()
        else:
            question = (prefix + base_question).strip()

        # Generation config
        generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)

        # Call InternVL chat
        try:
            if pixel_values is None:
                # Pure-text conversation (embed prior turns in question)
                response = self.model.chat(self.tokenizer, None, question, generation_config)
            else:
                # Multi-image: pass num_patches_list if >1 image
                if len(num_patches_list) > 1:
                    response = self.model.chat(
                        self.tokenizer,
                        pixel_values,
                        question,
                        generation_config,
                        num_patches_list=num_patches_list,
                    )
                else:
                    response = self.model.chat(
                        self.tokenizer, pixel_values, question, generation_config
                    )
        except Exception:
            # Fallback: return empty string to avoid crashing the adapter
            return ""

        return response or ""

```

--------------------------------------------------------------------------------
/blog/training-computer-use-models-trajectories-1.md:
--------------------------------------------------------------------------------

```markdown
# Training Computer-Use Models: Creating Human Trajectories with Cua

_Published on May 1, 2025 by Dillon DuPont_

In our previous posts, we covered [building your own Computer-Use Operator](build-your-own-operator-on-macos-1) and [using the Agent framework](build-your-own-operator-on-macos-2) to simplify development. Today, we'll focus on a critical aspect of improving computer-use agents and models: gathering high-quality demonstration data using Cua's Computer-Use Interface (CUI) and its Gradio UI to create and share human-generated trajectories.

Why is this important? The models underlying computer-use agents need examples of how humans interact with computers in order to learn effectively. By creating a dataset of diverse, well-executed tasks, we can help train better models that understand how to navigate user interfaces and accomplish real tasks.

<video src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134" controls width="600"></video>

## What You'll Learn

By the end of this tutorial, you'll be able to:

- Set up the Computer-Use Interface (CUI) with Gradio UI support
- Record your own computer interaction trajectories
- Organize and tag your demonstrations
- Upload your datasets to Hugging Face for community sharing
- Contribute to improving computer-use AI for everyone

**Prerequisites:**

- macOS Sonoma (14.0) or later
- Python 3.10+
- Basic familiarity with Python and terminal commands
- A Hugging Face account (for uploading datasets)

**Estimated Time:** 20-30 minutes

## Understanding Human Trajectories

### What are Human Trajectories?

Human trajectories, in the context of computer-use agents, are recordings of how humans interact with computer interfaces to complete tasks. These interactions include:

- Mouse movements, clicks, and scrolls
- Keyboard input
- Changes in the UI state
- Time spent on different elements

These trajectories serve as examples for AI models to learn from, helping them understand the relationship between:

1. The visual state of the screen
2. The user's goal or task
3. The most appropriate action to take

### Why Human Demonstrations Matter

Unlike synthetic data or rule-based automation, human demonstrations capture the nuanced decision-making that happens during computer interaction:

- **Natural Pacing**: Humans pause to think, accelerate through familiar patterns, and adjust to unexpected UI changes
- **Error Recovery**: Humans demonstrate how to recover from mistakes or handle unexpected states
- **Context-Sensitive Actions**: The same UI element might be used differently depending on the task context

By contributing high-quality demonstrations, you're helping to create more capable, human-like computer-use AI systems.

## Setting Up Your Environment

### Installing the CUI with Gradio Support

The Computer-Use Interface includes an optional Gradio UI specifically designed to make recording and sharing demonstrations easy. Let's set it up:

1. **Create a Python environment** (optional but recommended):

   ```bash
   # Using conda
   conda create -n cua-trajectories python=3.10
   conda activate cua-trajectories

   # Using venv
   python -m venv cua-trajectories
   source cua-trajectories/bin/activate  # On macOS/Linux
   ```

2. **Install the CUI package with UI support**:

   ```bash
   pip install "cua-computer[ui]"
   ```

3. **Set up your Hugging Face access token**:
   Create a `.env` file in your project directory and add your Hugging Face token:
   ```bash
   echo "HF_TOKEN=your_huggingface_token" > .env
   ```
   You can get your token from your [Hugging Face account settings](https://huggingface.co/settings/tokens).

### Understanding the Gradio UI

The Computer-Use Interface Gradio UI provides three main components:

1. **Recording Panel**: Captures your screen, mouse, and keyboard activity during demonstrations
2. **Review Panel**: Allows you to review, tag, and organize your demonstration recordings
3. **Upload Panel**: Lets you share your demonstrations with the community via Hugging Face

The UI is designed to make the entire process seamless, from recording to sharing, without requiring deep technical knowledge of the underlying systems.

## Creating Your First Trajectory Dataset

### Launching the UI

To get started, create a simple Python script to launch the Gradio UI:

```python
# launch_trajectory_ui.py
from computer.ui.gradio.app import create_gradio_ui
from dotenv import load_dotenv

# Load your Hugging Face token from .env
load_dotenv('.env')

# Create and launch the UI
app = create_gradio_ui()
app.launch(share=False)
```

Run this script to start the UI:

```bash
python launch_trajectory_ui.py
```

### Recording a Demonstration

Let's walk through the process of recording your first demonstration:

1. **Start the VM**: Click the "Initialize Computer" button in the UI to initialize a fresh macOS sandbox. This ensures your demonstrations are clean and reproducible.
2. **Perform a Task**: Complete a simple task like creating a document, organizing files, or searching for information. Natural, everyday tasks make the best demonstrations.
3. **Review Recording**: Click the "Conversation Logs" or "Function Logs" tabs to review your captured interactions, making sure there is no personal information that you wouldn't want to share.
4. **Add Metadata**: In the "Save/Share Demonstrations" tab, give your recording a descriptive name (e.g., "Creating a Calendar Event") and add relevant tags (e.g., "productivity", "time-management").
5. **Save Your Demonstration**: Click "Save" to store your recording locally.

<video src="https://github.com/user-attachments/assets/de3c3477-62fe-413c-998d-4063e48de176" controls width="600"></video>

### Key Tips for Quality Demonstrations

To create the most valuable demonstrations:

- **Start and end at logical points**: Begin with a clear starting state and end when the task is visibly complete
- **Narrate your thought process**: Use the message input to describe what you're trying to do and why
- **Move at a natural pace**: Don't rush or perform actions artificially slowly
- **Include error recovery**: If you make a mistake, keep going and show how to correct it
- **Demonstrate variations**: Record multiple ways to complete the same task

## Organizing and Tagging Demonstrations

Effective tagging and organization make your demonstrations more valuable to researchers and model developers. Consider these tagging strategies:

### Task-Based Tags

Describe what the demonstration accomplishes:

- `web-browsing`
- `document-editing`
- `file-management`
- `email`
- `scheduling`

### Application Tags

Identify the applications used:

- `finder`
- `safari`
- `notes`
- `terminal`
- `calendar`

### Complexity Tags

Indicate the difficulty level:

- `beginner`
- `intermediate`
- `advanced`
- `multi-application`

### UI Element Tags

Highlight specific UI interactions:

- `drag-and-drop`
- `menu-navigation`
- `form-filling`
- `search`

The Computer-Use Interface UI allows you to apply and manage these tags across all your saved demonstrations, making it easy to create cohesive, well-organized datasets.

<video src="https://github.com/user-attachments/assets/5ad1df37-026a-457f-8b49-922ae805faef" controls width="600"></video>

## Uploading to Hugging Face

Sharing your demonstrations helps advance research in computer-use AI. The Gradio UI makes uploading to Hugging Face simple:

### Preparing for Upload

1. **Review Your Demonstrations**: Use the review panel to ensure all demonstrations are complete and correctly tagged.

2. **Select Demonstrations to Upload**: You can upload all demonstrations or filter by specific tags.

3. **Configure Dataset Information**:
   - **Repository Name**: Format as `{your_username}/{dataset_name}`, e.g., `johndoe/productivity-tasks`
   - **Visibility**: Choose `public` to contribute to the community or `private` for personal use
   - **License**: Standard licenses like CC-BY or MIT are recommended for public datasets

### The Upload Process

1. **Click "Upload to Hugging Face"**: This initiates the upload preparation.

2. **Review Dataset Summary**: Confirm the number of demonstrations and total size.

3. **Confirm Upload**: The UI will show progress as files are transferred.

4. **Receive Confirmation**: Once complete, you'll see a link to your new dataset on Hugging Face.

<video src="https://github.com/user-attachments/assets/c586d460-3877-4b5f-a736-3248886d2134" controls width="600"></video>

Your uploaded dataset will have a standardized format with the following structure:

```json
{
  "timestamp": "2025-05-01T09:20:40.594878",
  "session_id": "1fe9f0fe-9331-4078-aacd-ec7ffb483b86",
  "name": "penguin lemon forest",
  "tool_calls": [...],  // Detailed interaction records
  "messages": [...],    // User/assistant messages
  "tags": ["highquality", "tasks"],
  "images": [...]       // Screenshots of each state
}
```

This structured format makes it easy for researchers to analyze patterns across different demonstrations and build better computer-use models.
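
For context, the code below shows Cua's Computer interface being driven directly from Python: the screenshots, mouse and keyboard actions, and clipboard access are the same kinds of interactions a demonstration session exercises through the UI.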

```python
import asyncio

from computer import Computer


async def main():
    # Start a macOS sandbox and drive it through the Computer interface
    computer = Computer(os_type="macos", display="1024x768", memory="8GB", cpu="4")
    try:
        await computer.run()

        # Capture the current screen state
        screenshot = await computer.interface.screenshot()
        with open("screenshot.png", "wb") as f:
            f.write(screenshot)

        # Mouse actions
        await computer.interface.move_cursor(100, 100)
        await computer.interface.left_click()
        await computer.interface.right_click(300, 300)
        await computer.interface.double_click(400, 400)

        # Keyboard input
        await computer.interface.type_text("Hello, World!")
        await computer.interface.press_key("enter")

        # Clipboard access
        await computer.interface.set_clipboard("Test clipboard")
        content = await computer.interface.copy_to_clipboard()
        print(f"Clipboard content: {content}")
    finally:
        await computer.stop()


asyncio.run(main())
```

## Example: Shopping List Demonstration

Let's walk through a concrete example of creating a valuable demonstration:

### Task: Adding Shopping List Items to a Doordash Cart

1. **Start Recording**: Begin with a clean desktop and a text file containing a shopping list.

2. **Task Execution**: Open the file, read the list, open Safari, navigate to Doordash, and add each item to the cart.

3. **Narration**: Add messages like "Reading the shopping list" and "Searching for rice on Doordash" to provide context.

4. **Completion**: Verify all items are in the cart and end the recording.

5. **Tagging**: Add tags like `shopping`, `web-browsing`, `task-completion`, and `multi-step`.

This type of demonstration is particularly valuable because it showcases real-world task completion requiring multiple applications and context switching.

### Exploring Community Datasets

You can also learn from existing trajectory datasets contributed by the community:

1. Visit [Hugging Face Datasets tagged with 'cua'](https://huggingface.co/datasets?other=cua)
2. Explore different approaches to similar tasks
3. Download and analyze high-quality demonstrations, as sketched below
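
As a concrete starting point, the Hugging Face `datasets` library can pull a published trajectory dataset straight into Python. This is a minimal sketch; the exact columns depend on how each dataset was exported, so treat the field names here as illustrative.

```python
# explore_trajectories.py -- minimal sketch using the Hugging Face datasets library
from datasets import load_dataset

# The example dataset linked in the Resources section below
ds = load_dataset("ddupont/test-dataset", split="train")

print(ds)  # dataset size and column names
example = ds[0]
print(example.get("name"))  # illustrative fields: adjust to the dataset's actual schema
print(example.get("tags"))
```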

## Conclusion

### Summary

In this guide, we've covered how to:

- Set up the Computer-Use Interface with Gradio UI
- Record high-quality human demonstrations
- Organize and tag your trajectories
- Share your datasets with the community

By contributing your own demonstrations, you're helping to build more capable, human-like AI systems that can understand and execute complex computer tasks.

### Next Steps

Now that you know how to create and share trajectories, consider these advanced techniques:

- Create themed collections around specific productivity workflows
- Collaborate with others to build comprehensive datasets
- Use your datasets to fine-tune your own computer-use models

### Resources

- [Computer-Use Interface GitHub](https://github.com/trycua/cua/tree/main/libs/python/computer)
- [Hugging Face Datasets Documentation](https://huggingface.co/docs/datasets)
- [Example Dataset: ddupont/test-dataset](https://huggingface.co/datasets/ddupont/test-dataset)

```