Automated Design of Agentic Systems

TL;DR

In this work, we describe a newly forming research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including by inventing novel building blocks and/or combining them in new ways.

We present a simple yet effective ADAS algorithm named Meta Agent Search to demonstrate that agents can invent novel and powerful agent designs by programming in code.

In Meta Agent Search, we instruct the "meta" agent to iteratively program new agents, test their performance on tasks, add them to an archive of discovered agents, and use this archive to inform the meta agent in subsequent iterations.
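The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ArchiveEntry`, `meta_agent_search`, `toy_propose`, and `toy_evaluate` are hypothetical names, and the two toy functions merely stand in for the meta agent (which would be a foundation model writing agent code) and for evaluation on real tasks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ArchiveEntry:
    name: str     # label for the discovered agent
    code: str     # the agent's definition, represented here as a string
    score: float  # performance measured on the target tasks


def meta_agent_search(
    propose: Callable[[List[ArchiveEntry]], Tuple[str, str]],
    evaluate: Callable[[str], float],
    seed_agents: List[Tuple[str, str]],
    n_iterations: int,
) -> List[ArchiveEntry]:
    """Iteratively propose, test, and archive new agents."""
    # Start the archive from the seed agents and their scores.
    archive = [ArchiveEntry(n, c, evaluate(c)) for n, c in seed_agents]
    for _ in range(n_iterations):
        # The meta agent sees the whole archive when programming a new agent.
        name, code = propose(archive)
        # Every proposal is tested and added, so later iterations can build on it.
        archive.append(ArchiveEntry(name, code, evaluate(code)))
    return archive


# Toy stand-ins (assumptions): a real system would query a model and run tasks.
def toy_propose(archive: List[ArchiveEntry]) -> Tuple[str, str]:
    best = max(archive, key=lambda entry: entry.score)
    return f"agent_{len(archive)}", best.code + "+"


def toy_evaluate(code: str) -> float:
    return min(1.0, 0.1 * len(code))


archive = meta_agent_search(toy_propose, toy_evaluate, [("seed", "a")], 3)
for entry in archive:
    print(entry.name, entry.code, round(entry.score, 2))
```

Passing the full archive to `propose` is the point of the design: each new proposal is conditioned on everything discovered so far, rather than only on the current best agent.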

New Research Area: Automated Design of Agentic Systems (ADAS)

The three key components of ADAS. The search space determines which agentic systems can be represented in ADAS. The search algorithm specifies how the ADAS method explores the search space. The evaluation function defines how to evaluate a candidate agent on target objectives such as performance.
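To make the separation between the three components concrete, here is a deliberately small sketch. Everything in it is an assumption chosen for illustration: the search space is a set of tuples of hypothetical building-block names (in Meta Agent Search, agents are instead defined in code), the search algorithm is plain exhaustive enumeration, and the evaluation function is a made-up score.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Tuple

# Assumption: an agent is an ordered tuple of building-block names.
Agent = Tuple[str, ...]


def search_space(blocks: Tuple[str, ...], max_len: int) -> Iterator[Agent]:
    """Search space: every sequence of 1..max_len building blocks."""
    for length in range(1, max_len + 1):
        yield from product(blocks, repeat=length)


def exhaustive_search(
    candidates: Iterable[Agent], evaluate: Callable[[Agent], float]
) -> Agent:
    """Search algorithm: score every candidate and keep the best."""
    return max(candidates, key=evaluate)


def toy_evaluate(agent: Agent) -> float:
    """Evaluation function: made-up score rewarding distinct blocks."""
    return len(set(agent)) - 0.1 * len(agent)


blocks = ("cot", "reflect", "vote")  # hypothetical block names
best = exhaustive_search(search_space(blocks, 3), toy_evaluate)
print(best)
```

Because the three pieces only meet through their function signatures, any one of them can be swapped (for example, a larger search space or a smarter search algorithm) without touching the other two.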

Experiments

Results of Meta Agent Search on the ARC challenge. (a) Meta Agent Search progressively discovers high-performance agents based on an ever-growing archive of previous discoveries. We evaluate each agent five times on a held-out test set and report the median accuracy and the 95% bootstrap confidence interval. (b) A visualization of the best agent discovered by Meta Agent Search on the ARC challenge.
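For readers unfamiliar with bootstrap confidence intervals, the sketch below shows one common variant, the percentile bootstrap, applied to per-example correctness. The caption does not specify the exact resampling procedure, so this is an assumption; `bootstrap_ci` and the example data are hypothetical.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    outcomes: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap interval for accuracy over 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the test set with replacement and recompute accuracy each time.
    accuracies = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled accuracies.
    lower = accuracies[round((alpha / 2) * (n_resamples - 1))]
    upper = accuracies[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Made-up data: 70 correct answers out of 100 test examples.
outcomes = [1] * 70 + [0] * 30
accuracy = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_ci(outcomes)
print(f"accuracy={accuracy:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The interval reflects uncertainty from the finite test set only. Repeated evaluations of the same agent, as in the five runs mentioned above, would need to be pooled or summarized before this step.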

Performance comparison between Meta Agent Search and state-of-the-art hand-designed agents across multiple domains. Meta Agent Search discovers superior agents compared to the baselines in every domain. We report the test accuracy and the 95% bootstrap confidence interval on held-out test sets. The search is conducted independently for each domain.

Importantly, we consistently observe that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, a surprising result that demonstrates their robustness and generality. Here, we present performance across multiple domains when transferring top agents from the Math (MGSM) domain to non-math domains. Agents discovered in the math domain match or outperform the baselines after being transferred to domains beyond math. We report the test accuracy and the 95% bootstrap confidence interval. All results on transferability are available in the paper.

Citation

Acknowledgements

This work was supported by the Vector Institute, the Canada CIFAR AI Chairs program, grants from Schmidt Futures and Open Philanthropy, an NSERC Discovery Grant, and a generous donation from Rafael Cosman. We thank Jenny Zhang, Rach Pradhan, Ruiyu Gou, Nicholas Ioannidis, and Eunjeong Hwang for insightful discussions and feedback.

The website template was borrowed from Jon Barron.