Autonomous Agents: AIs Taking Action at the AGI House | Machine Yearning 005
If LLMs are the brains of AI, then Agents are the hands.
This weekend, I had the chance to participate in a fascinating hackathon at AGI House, a community of AI founders and researchers known for their awesome events and amazing speakers. The theme for this hackathon was "Autonomous Apps," with an intriguing twist: using a new form of technology called "Agents." Our team of 7 hackers decided to take the theme of autonomous app development a little too literally… read on for how!
Setting the Scene: Introducing "Agents"
If you’ve been following AI Twitter, you’ve probably heard of “Agents” like BabyAGI, AgentGPT, and others. What are Agents anyway, and why were we at AGI House testing their limits in a hackathon?
Put one way, if large language models (LLMs) are the brains of advanced AI systems, Agents are the hands: they use LLMs to automate complex chains of tasks. With an Agent, instead of explicitly programming each step, you prompt it with an objective in natural language (e.g. “take my information and register for a frequent flyer account number on these 90 airlines’ websites”), and it uses an LLM to work out the steps needed to accomplish that objective. It’s like Excel macros or robotic process automation (RPA), but on steroids.
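To make that concrete, here's a minimal sketch of the loop most Agents run: ask a planner for the next action, execute it with a tool, feed the result back, and repeat. This is not any particular framework's code. The `plan_next` callable stands in for an LLM call, and the tool names (`open_site`, `fill_form`) and the scripted planner are hypothetical, there only so the sketch runs without a model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: given the objective and the history
# so far, return the next action as "tool_name: argument", or "done".
Planner = Callable[[str, List[str]], str]


def run_agent(objective: str, plan_next: Planner,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 10) -> List[str]:
    """Ask the planner for one action at a time until it says 'done'."""
    history: List[str] = []
    for _ in range(max_steps):
        action = plan_next(objective, history)
        if action == "done":
            break
        tool_name, _, argument = action.partition(": ")
        if tool_name not in tools:
            # Record the failure so the planner can see it on the next turn.
            history.append(f"error: unknown tool {tool_name!r}")
            continue
        result = tools[tool_name](argument)
        history.append(f"{action} -> {result}")
    return history


# A scripted planner so the sketch runs without a model or network access.
def scripted_planner(objective: str, history: List[str]) -> str:
    script = ["open_site: airline-a.example", "fill_form: name, email", "done"]
    return script[len(history)]


# Hypothetical tools; a real Agent would drive a browser or call an API here.
demo_tools = {
    "open_site": lambda url: f"opened {url}",
    "fill_form": lambda fields: f"submitted {fields}",
}

if __name__ == "__main__":
    for line in run_agent("register for a frequent flyer account",
                          scripted_planner, demo_tools):
        print(line)
```

The `max_steps` cap is a design choice worth keeping in any real version: a planner that never says "done" would otherwise loop forever.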
A lot of traditional software is designed such that humans still end up doing most of the work: data entry, financial and accounting processes, IT operations. These tasks are menial but labor-intensive. Throughout the hackathon, we saw brilliant teams and demos that took the tedium of legacy workflows and chucked it out the window. This thread from Alex Reibman captured all the demos (including ours!).
Our Approach: Building an App Autonomously
Our team decided to take a literal approach to the theme of “autonomous agents.” Inspired by the recent paper "Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents" by Yashar Talebirad and Amirhossein Nadiri, we created a constrained environment where multiple agents could collaborate towards a common goal: autonomous app development in Agile sprints.
For our project, we chose an Agile development sprint in Linear (a popular project management tool) as our playground. Our playing pieces? A team of Agents designed to mimic the roles of a typical scrum team.
We devised three specialized Agents, each taking on a role in the Agile process that has a human analogue on a typical scrum team. To coordinate them, we also introduced an executive agent to oversee task delegation and bandwidth checks.
AutoPM: This GPT-4 powered agent accepts problem statements or user feedback, turning them into customer pain points. These are then translated into a framework such as "jobs to be done," converted into potential features, then refined, ranked, and finalized into a table of features and engineering sub-tasks. It uses human PM feedback before sending the tasks to AutoArch.
AutoArch: AutoArch prepares the groundwork before coding begins. It uses GPT-4 to draft the markup used to render UML diagrams and to generate the file structure for the project. These resources provide the scaffolding for the engineering work.
AutoEng: With a clear customer problem, component tasks that ladder into features, and architectural constraints, AutoEng is set to start writing code.
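The relay between the three roles can be sketched roughly like this. It is an illustration under assumptions, not our hackathon code: the `Ticket` class is an in-memory stand-in rather than the Linear API, the prompts are invented, and the `llm` callable is a fake that records what it was asked. The loop in `run_relay` only plays the delegation part of the executive agent; bandwidth checks are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Ticket:
    """In-memory stand-in for a sprint ticket (not the Linear API)."""
    problem: str
    artifacts: Dict[str, str] = field(default_factory=dict)


def auto_pm(ticket: Ticket, llm: LLM) -> None:
    # Problem statement -> features and engineering sub-tasks.
    ticket.artifacts["tasks"] = llm(
        f"Turn this problem into ranked features and sub-tasks: {ticket.problem}"
    )


def auto_arch(ticket: Ticket, llm: LLM) -> None:
    # Sub-tasks -> diagram markup and file structure.
    ticket.artifacts["design"] = llm(
        f"Draft UML markup and a file structure for: {ticket.artifacts['tasks']}"
    )


def auto_eng(ticket: Ticket, llm: LLM) -> None:
    # Sub-tasks plus design -> code.
    ticket.artifacts["code"] = llm(
        f"Write code for {ticket.artifacts['tasks']} "
        f"within this design: {ticket.artifacts['design']}"
    )


RELAY = [auto_pm, auto_arch, auto_eng]


def run_relay(problem: str, llm: LLM) -> Ticket:
    """Hand the ticket to each role in order, as an executive agent might."""
    ticket = Ticket(problem)
    for role in RELAY:
        role(ticket, llm)
    return ticket


def make_recording_llm(log: List[str]) -> LLM:
    """Fake model that records each prompt and returns a numbered reply."""
    def llm(prompt: str) -> str:
        log.append(prompt)
        return f"reply-{len(log)}"
    return llm


if __name__ == "__main__":
    prompts: List[str] = []
    done = run_relay("Build a Tic Tac Toe game", make_recording_llm(prompts))
    for name, value in done.artifacts.items():
        print(name, "=", value)
```

Each role only reads what earlier roles wrote to the ticket, which is what lets you swap in a different model or prompt for one role without touching the others.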
We quickly found that off-the-shelf LLMs handled most tasks capably, but not perfectly. The AutoPM, for instance, required a lot of human feedback before arriving at product recommendations that were relevant, actionable, rewarding, and specific enough for engineers to work with.
The Outcome: Success!
Despite the experimental nature of our project and only 8 hours to do it, everything (mostly) fell into place! Once the team aligned on AutoPM’s features and sub-tasks, the whole Agent relay took LESS THAN A MINUTE to ship working code.
To reiterate: this is strictly a toolchain of an LLM, a UML generator, and the Linear API. And apparently… that’s all you need to manage a team of Agents in an Agile sprint.
Astute readers will have already heard of “chain-of-thought prompting” (basically, showing your work) and how that elicits better reasoning skills in LLMs. We’re effectively replicating those advancements using Linear as a scratchpad, letting Agent team members show their work, share context, and provide clarifications where needed.
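Here's a rough sketch of the scratchpad idea: every Agent reads the whole ticket thread before it acts and posts its own work back, so later Agents (and humans) see everything that came before. The `Scratchpad` class is an in-memory stand-in for a ticket's comment thread, not the Linear API, and `stub_llm` is a hypothetical fake that just reports how much of the thread it could see.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Scratchpad:
    """In-memory stand-in for a ticket's comment thread (not the Linear API)."""
    title: str
    comments: List[Tuple[str, str]] = field(default_factory=list)

    def post(self, author: str, body: str) -> None:
        self.comments.append((author, body))

    def as_context(self) -> str:
        # Render the whole thread, one comment per line, for use in a prompt.
        lines = [f"Ticket: {self.title}"]
        lines += [f"{author}: {body}" for author, body in self.comments]
        return "\n".join(lines)


def agent_turn(name: str, instruction: str, pad: Scratchpad,
               llm: Callable[[str], str]) -> str:
    """Build a prompt from the full thread, then post the reply back to it."""
    prompt = f"{pad.as_context()}\n\nYou are {name}. {instruction}"
    reply = llm(prompt)
    pad.post(name, reply)
    return reply


def stub_llm(prompt: str) -> str:
    # Count the comment lines that precede the blank line in the prompt.
    seen = prompt.split("\n\n")[0].count("\n")
    return f"(worked with {seen} earlier comments in view)"


if __name__ == "__main__":
    pad = Scratchpad("Tic Tac Toe game")
    agent_turn("AutoPM", "List the sub-tasks.", pad, stub_llm)
    pad.post("Human PM", "Keep it to a single-page web app.")
    agent_turn("AutoArch", "Draft the file structure.", pad, stub_llm)
    print(pad.as_context())
```

Because humans post to the same thread, a clarification becomes part of the context for every later turn without any special handling.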
While we had to scrap some of the elements we had hoped to include, such as AutoUXR for user feedback curation and AutoEng specializations, we were amazed we even got this far!
You can check out this 45s demo for yourself to see the whole thing in action on Linear.
Here’s another example generating a Tic Tac Toe game from scratch. It’s not a difficult coding exercise, but drafting a diagram, file structure, and deployment destination gives AutoEng much more scaffolding to work from.
Takeaways: Specializations and Human Feedback
Despite the rapid turnaround, we gained some valuable insights about working with agents and LLMs:
Off-the-shelf LLMs are very capable of handling Agile tasks, but overloading a single LLM with the end-to-end process was highly error-prone.
Most agents today are fragile - small changes to web layouts or APIs could break their logic.
Breaking the process into specialized roles and discrete tasks made error tracing easier by avoiding “black box” productivity losses.
While we didn’t implement robust RLHF features, we definitely saw how human feedback enhanced agent outputs, especially on AutoPM.
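The kind of human feedback we mean can be sketched as a simple review loop. To be clear, this is in-context feedback (the reviewer's notes get folded into the next prompt), not RLHF, which would involve training the model. The `llm` and `review` callables, the prompt wording, and the round limit are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

LLM = Callable[[str], str]                  # hypothetical: prompt in, text out
Reviewer = Callable[[str], Optional[str]]   # feedback text, or None to approve


def refine_with_feedback(task: str, llm: LLM, review: Reviewer,
                         max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Redraft until the reviewer approves or the round limit is reached."""
    feedback_log: List[str] = []
    draft = llm(task)
    for _ in range(max_rounds):
        feedback = review(draft)
        if feedback is None:
            break
        feedback_log.append(feedback)
        # Fold the previous draft and the reviewer's note into the next prompt.
        draft = llm(
            f"{task}\nPrevious draft: {draft}\nReviewer feedback: {feedback}"
        )
    return draft, feedback_log


def make_stub_llm() -> LLM:
    """Fake model that returns a new version number on every call."""
    calls = {"n": 0}

    def llm(prompt: str) -> str:
        calls["n"] += 1
        return f"feature list v{calls['n']}"
    return llm


def stub_review(draft: str) -> Optional[str]:
    # Stand-in for a human PM: approve the second version, push back otherwise.
    return None if draft.endswith("v2") else "too vague for engineers"


if __name__ == "__main__":
    final, notes = refine_with_feedback(
        "Propose features for a Tic Tac Toe game", make_stub_llm(), stub_review
    )
    print(final, notes)
```

Keeping the feedback log alongside the final draft gives you an auditable record of why the output changed between rounds.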
If you imagine integrating this with other enterprise apps like Slack, Airtable, or email… you could create an entire workforce of digital agents, with auditable trails to measure and refine work outputs. How crazy would that be?
We walked away with new ideas, a deeper appreciation for advanced AI, and a glimpse into a future where intelligent agents could transform software development. Can't wait to see what's next!
Special thanks to the entire Relay team (Gurkaran, Tony, Anand, Travis, Steve, and Pascal), to Rocky Yu and AGI House for hosting, and to the event sponsors WING VC, RunPod, and MultiON for helping put on a great event with an awesome theme.
Disclaimer: This post was co-written by an AI.