I’ve been in full swing using agents (primarily Codex with a mix of Claude) for the past few weeks on a few personal software projects, and I’ve settled into a rhythm that works for me and results in relatively steady progress without many false starts, wildly expansive scope, or complete do-overs.

Work small

I like to keep the feedback loops relatively small, which means asking for small iterations and then checking often.

The guideline I’ve found that keeps the process moving along is to only ask it to build something that I could reasonably test in a few minutes.

“Test” in this case is more like an acceptance test — did the software do the thing I hoped it would do? I’ve actually found the automated tests that the agents generate and automatically run after every change to be fairly complete, so this has been the “Is this what I wanted? Does this feel right?” phase.

When I first started using these tools, I’d ask for fairly large, open-ended changes and then give incredibly specific instructions for one aspect of them. This never worked. I’d get something akin to what I asked for, but usually with lots of things I didn’t want and few of the things I did. I’d also have no idea how it fit together, what tradeoffs it made, or even how to test it fully.

Another benefit of working small is that I spend most of my time verifying and tweaking, which is way less frustrating than trying to rephrase and describe a full worldview to the model so that it might guess better next time.

Use plan mode liberally

All the major players have some version of plan mode, where the agent is explicitly forbidden from writing code and instead produces a readable planning doc.

My first couple of attempts at this involved writing a full requirements doc, similar to the ones a product manager would write. But I found that I could be much more concise and still get just-as-good plans.

For any feature with some complexity or ambiguity, or where I wasn’t quite sure what I wanted, I’d do this:

  • Enter plan mode
  • Write a quick, one sentence version of what I wanted
  • Add in some extra context of the motivation if I thought it was useful
  • If I knew of edge cases or error conditions that were important, I’d add them here
  • If it was similar to another feature, I’d mention that to give it a head start on what “good” would look like
  • Then, at the end, I literally say, “Ask me lots of questions.”
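Put together, a plan-mode prompt following those steps ends up surprisingly short. Here’s a made-up example (the feature, flags, and command are invented for illustration, not from my project):

```
Add a --json flag to the export command so output can be piped into
other tools. It should behave like the existing --csv flag. Watch out
for entries with missing timestamps.

Ask me lots of questions.
```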

And it would! It’d go off and read through code and docs, and quickly come back with implementation questions and scope questions (“do you want the whole kit and caboodle, or the quick-and-easy version?”). If at any point I wasn’t sure, I’d stop and go back and forth on a minor point until I could figure out and describe what should happen.

Plan mode is a really nice mix of being high-level and then being presented with some tradeoffs where it matters.

Side benefit: sometimes I don’t know what I want from a feature, but I’d know it when I saw it. Just being quizzed by the agent on tradeoffs has sometimes let me see the kernel of the feature better, while also making me realize it was way more complexity than I wanted to add, at which point I’d scrap the whole feature.

Fast, automated tests

My current software project is a CLI tool, written in Go. My original prompt mentioned that it needed to be robust and that tests should be maintained and run before returning success. The magic is watching it make changes, run the tests, see breakage, figure out what needs to change, and then change the code to get back to green.
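The loop only works if the checks are cheap and unambiguous. As a minimal sketch of the kind of table-driven check the agent keeps green (the function and cases here are invented for illustration, not from my project):

```go
package main

import (
	"fmt"
	"strings"
)

// slugify is a hypothetical helper of the sort a CLI tool might carry.
// The agent's job is to keep checks like the ones below passing after
// every change it makes.
func slugify(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	return strings.ReplaceAll(s, " ", "-")
}

func main() {
	// Table-driven cases, in the style `go test` encourages.
	cases := []struct{ in, want string }{
		{"Hello World", "hello-world"},
		{"  Trimmed  ", "trimmed"},
	}
	for _, c := range cases {
		if got := slugify(c.in); got != c.want {
			fmt.Printf("FAIL: slugify(%q) = %q, want %q\n", c.in, got, c.want)
			return
		}
	}
	fmt.Println("ok")
}
```

Because `go test` is fast and the pass/fail signal is binary, the agent can run the whole suite after every edit without slowing the loop down.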

For the current project, agents have written 100% of the code; not a single line or character change was manual. At one point I had a minor text change I wanted to make, so I manually updated the text and, sure enough, broke the tests. It turned out that minor bit of text was covered by tests three different ways, which gave me a lot of confidence in the rest of the suite.

One thread at a time

This one is probably controversial, and may be as much a reflection of my relative newbieness to this way of working as anything, but I really like the one-thread-at-a-time approach.

I know people spend time writing orchestration systems so they can spin up multiple changes at once, but for me the bottleneck is what I can reason through and verify. Having multiple things in flight means reasoning about how they interact, and that turns play into work.

The architecture matters

It’s easy to read “architecture matters” as a no-duh, of-course-it-does statement, but I think it’s more interesting than a quick read suggests.

Many of the reasons people favor certain tools or patterns are, in some way, less relevant now. “This tool doesn’t have good documentation” doesn’t matter when agents can just vacuum the whole codebase into context. And any missing shim, API, or wrapper is quick work for a well-prompted LLM.

But there is a compounding benefit to choosing certain tools or patterns. For my current project, I picked Go + bash as the weapons of choice. Go compiles fast and has a good testing story, which makes for fast feedback loops. There is also a tremendous amount of Go and bash code in the world, and LLMs are pretty dang good at producing idiomatic Go.

And having seen lots of systems over the years, and knowing why some are “good” and others “less good” under certain constraints, absolutely helps when making tradeoffs or figuring out which paths to go down and which to avoid. This is probably the “taste” that people say doesn’t scale. (I’m not sure what I believe here yet.)


Wrapping up, my biggest suggestion is to resist the urge to go big. Work in mind-sized chunks, use your taste and judgement at each step, and you’ll find the momentum builds from there. I have no doubt my approach will look wildly different as people figure out what these systems are capable of, and where their limits are, but right now this rhythm is working for me and I’m having fun.