Today's paper introduces Windows Agent Arena, a benchmark for evaluating multi-modal agents within the Windows operating system.
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale