Posts tagged "llm" - nolan caudill's internet house

Claude Cowork and Home Assistant

Tue, 07 Apr 2026 17:27:50 -0700

I haven’t dabbled in the OpenClaw ecosystem (yet), but I’ve used Claude enough on non-coding local tasks that I wanted to give Claude Cowork a try.

Cowork is a native app that lets Claude do more file and local machine manipulation, especially when combined with the Chrome extension that lets Claude access and drive a browser session.

Recently, I’ve started adding a few more home automation devices, like door sensors and lights. I’m using Home Assistant to organize them and build automations. The Home Assistant UI is extensive, but creating automations gets unwieldy, especially when you have a few dozen and want to refactor across them.

This seemed like it’d be a perfect test run of Cowork + Chrome for Cowork.

The first task I gave it was creating a better default dashboard. I’m sure I could have fiddled my way to something usable, but Claude with just a little direction created a dashboard that was admittedly better than what I could have gotten to, and definitely not in the one or two minutes it took.

Satisfied that it was doing reasonable things, I decided to tackle my automations.

First, I had it create a couple that I had wanted but hadn’t gotten around to yet: one that uses tomorrow’s forecast to notify me when to open and close windows and doors if the day is going to be warm, and another that sends me a nightly report confirming that everything is locked up and turned off.

The fun one: if I signaled that I was leaving on a bike ride (by clicking a new iOS widget on my lock screen), my garage door would automatically open when I got close to the house so I could roll straight in.

That all worked, so I asked Claude to look over all the automations, refactor away redundancy, standardize how the notifications looked, and fix any weird gotchas. It gave me a list of 17 things it found, I went through each one and gave feedback (e.g., do it, do it but with this tweak, ignore this one), and it ripped through the changes.

This was a fun way to spend a morning, and I was pleased with how effortless it now is to change something that had previously been fiddly.

LLMs to fix calendar files

Thu, 26 Feb 2026 08:21:37 -0800

I’ve started using Claude to do something that I really shouldn’t have to: making calendar files.

Every organization that could offer a simple calendar URL to subscribe to instead wants to make it fancy (read: harder). The Giants baseball schedule requires me to give them an email address, no doubt so they can put me on some marketing list. The youngest kid’s soccer league has a bug where I can only subscribe to the “home” games (despite every game, home or away, being held at the exact same field…). The adult rec league calendar awkwardly titles its events. And so on.

To get around all these little gotchas and obstacles, I tell Claude to read a website and create an ICS file, which it does without breaking a sweat. It even adds little surprise touches, like a checkered-flag emoji in front of every F1 race.

Important note: if you point an LLM at a URL to build a schedule from, there is no guarantee that the page is fetched in the same timezone as your own. My workaround is to print the calendar to PDF, so the times are the ones my own browser displayed, and then tell the model to use that instead of the URL.
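For the curious, here is a minimal Python sketch of the kind of ICS file this workflow produces, handy for checking a model’s output or hand-rolling one. It is an illustration under stated assumptions, not what Claude actually generates: the `Event` type, the `build_ics` helper, the placeholder PRODID, and the `example.invalid` UID domain are all hypothetical. The one deliberate choice is that every time is converted to UTC and a time with no timezone attached is rejected rather than guessed, which sidesteps the timezone mix-up described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Event:
    # Hypothetical record for one scheduled game or race; field names are illustrative.
    title: str
    start: datetime      # must be timezone-aware (e.g. built with zoneinfo.ZoneInfo)
    duration: timedelta


def utc_stamp(dt: datetime) -> str:
    """Format an aware datetime as an RFC 5545 UTC date-time, e.g. 20260404T021500Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: refusing to guess its timezone")
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def escape_text(s: str) -> str:
    """Escape backslash, semicolon, comma, and newline as RFC 5545 TEXT values require."""
    return (s.replace("\\", "\\\\").replace(";", "\\;")
             .replace(",", "\\,").replace("\n", "\\n"))


def build_ics(events: list[Event], prodid: str = "-//example//hand-rolled schedule//EN") -> str:
    """Render events as a minimal VCALENDAR string (no line folding; keep titles short)."""
    now = utc_stamp(datetime.now(timezone.utc))
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{prodid}"]
    for i, ev in enumerate(events):
        lines += [
            "BEGIN:VEVENT",
            # example.invalid is a placeholder domain for the required UID.
            f"UID:event-{i}-{utc_stamp(ev.start)}@example.invalid",
            f"DTSTAMP:{now}",
            f"DTSTART:{utc_stamp(ev.start)}",
            f"DTEND:{utc_stamp(ev.start + ev.duration)}",
            f"SUMMARY:{escape_text(ev.title)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

To use it, attach a real zone to each start time, for example `datetime(2026, 4, 3, 19, 15, tzinfo=ZoneInfo("America/Los_Angeles"))` after `from zoneinfo import ZoneInfo`; calendar apps then show each event in the viewer’s own timezone. Folding of lines longer than 75 octets is left out, so very long titles would need it added.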

My current agentic coding approach

Sun, 22 Feb 2026 16:41:52 -0800

I’ve been in full swing using agents (primarily Codex with a mix of Claude) for the past few weeks on a few personal software projects, and I’ve settled into a rhythm that works for me and results in relatively steady progress without many false starts, wildly expanding scope, or complete do-overs.

Work small

I like to keep the feedback loops relatively small, which means asking for small iterations and then checking often.

The guideline I’ve found that keeps the process moving along is to only ask it to build something that I could reasonably test in a few minutes.

“Test” in this case is more like an acceptance test: did the software do the thing I hoped it would do? I’ve actually found the automated tests that the agents have been generating and automatically running after every change to be fairly complete, so this has been the “Is this what I wanted? Does this feel right?” phase.

When I first started using these tools, I’d ask for fairly large, open-ended changes and then give incredibly specific instructions for one aspect of it. This never worked. I’d get something akin to what I asked for, but usually with lots of things I didn’t want and few things I did. I’d also have no idea how it fit together, what tradeoffs it made, or even how to test it fully.

Another benefit of working small is that I’m spending most of my time verifying and tweaking which is way less frustrating than trying to rephrase and describe a full worldview to the model so that it might guess better next time.

Use plan mode liberally

All the major players have some version of plan mode, where the agent is explicitly forbidden from writing code and instead produces a readable doc.

My first couple of attempts involved writing a full requirements doc, like the ones a product manager would write. But I found I could be much more concise and still get plans that were just as good.

For any feature of some complexity or ambiguity, or where I wasn’t quite sure what I wanted, I’d do this:

Enter plan mode
Write a quick, one sentence version of what I wanted
Add in some extra context of the motivation if I thought it was useful
If there were edge cases or error conditions I knew were important, I would add them here
If it was similar to another feature, I’d mention that to give it a head-start of what “good” would look like
Then, at the end, I literally say, “Ask me lots of questions.”

And it would! It’d go off and read through code and docs, then quickly come back with implementation questions and scope questions (“do you want the kit-and-caboodle, or the quick-and-easy?”). If at any point I wasn’t sure, I’d stop and go back and forth on a minor point until I could figure out and describe what should happen.

Plan mode is a really nice mix of being high-level and then being presented with some tradeoffs where it matters.

Side benefit: sometimes I’m not sure what I want from a feature, but I’d know it when I saw it. Just being quizzed by the agent on tradeoffs has sometimes helped me see the kernel of the feature more clearly, and occasionally made me realize it was way more complexity than I wanted to add, so I’d scrap the whole feature.

Fast, automated tests

My current software project is a CLI tool, written in go. My original prompt mentioned that it needed to be robust and that tests should be maintained and run before returning success. The magic is watching it make changes, run the tests, see breakage, figure out what needs to change, and then change the code to get back to green.

For the current project, I had used agents to write 100% of the code. Not a single line or character change was manual. I wanted to make a minor text change, so I updated the text by hand and, sure enough, broke the tests. It turned out that this text, minor in my eyes, was covered by tests three different ways, which gave me a lot of confidence in the rest of the tests.

One thread at a time

This one is probably controversial, and it may be as much a reflection of my relative newbieness to this way of working as anything, but I really like the one-thread-at-a-time approach.

I know people spend time writing orchestration systems so they can spin up multiple changes at once, but for me, the bottleneck is what I can reason through and verify. Having multiple things in flight means reasoning about how they interact, and that turns play into work.

The architecture matters

It’s easy to read “architecture matters” as a no-duh, of-course-it-does statement, but I think it is more interesting than a quick read suggests.

Many of the reasons people favor certain tools or patterns are, in some way, less relevant now. “This tool doesn’t have good documentation” doesn’t matter when agents can just vacuum the whole codebase into context. And any missing shim, API, or wrapper can be quick work for a well-prompted LLM.

But, there is a compounding benefit in choosing certain tools or patterns. For my current project, I picked go + bash as the weapons of choice. Go is fast to compile and has a good testing story and that makes for fast feedback loops. There is also a tremendous amount of go and bash code in the world and LLMs are pretty dang good at producing idiomatic go code.

And, having seen lots of systems over the years, and knowing why some are “good” and others “less good” given certain constraints, absolutely helps when making tradeoffs or figuring out which paths to go down and which to avoid. This is probably the “taste” that people say doesn’t scale. (I am not sure what I believe here yet.)

Wrapping up, my biggest suggestion is to resist the urge to go big. Work in mind-sized chunks, use your taste and judgement each step of the way, and you’ll find the momentum builds from there. I have no doubt my approach will be wildly different as people figure out what these systems are capable of, and their limitations, but right now, this rhythm is working for me and I’m having fun.

Writing a Simulator

Tue, 17 Feb 2026 13:46:32 -0800

To give myself a slightly bigger project, I came up with the idea of writing a simulator game where you’re the CTO of a small startup, with the goal of getting your company to an IPO.

I just started it around dinnertime last night, writing up a few paragraphs with how things like projects, budgets, hiring, incidents, and so on work. I let Claude and Codex rip on a planning (“here’s some words. now ask me questions.”) and implementation loop. After every loop, I’d play around with it, think of a thing that’d make it better, and repeat.

This post isn’t about writing software with agents (though I am amazed and am learning a lot about what they are good at and not so capable of) but a more general thing I’m learning when you’re writing a simulator.

Writing this simulator is, more than anything else, encoding how I think the job of engineering manager works. If someone asked, “How would you do the mechanics of the job?”, this would be an interactive answer to that question, which is neat in a meta way.

When thinking through the elements of the game and how they work and interact, I started to use (my many) years of seeing these things up close to model how I think they should behave.

The first part of writing the simulator was mostly coming up with the nouns, the physical objects: projects, engineers, budgets, incidents, and then what sort of attributes they’d have. Projects have a timeline, whether or not they’ve started, which team they’re assigned to, and other things that seemed relevant to the flow of the game.

But thinking harder, I started to consider things that “real” projects have. They have a distribution curve of when they’ll actually finish. The individuals who start on a project might not be the same set throughout. And when you assign a project to an already busy team, each of its projects takes a little longer (though you, as the CTO/manager, might not know that until it’s too late).
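Those properties translate fairly directly into code. Here is a minimal Python sketch, assuming a lognormal overrun curve and a flat 25% drag per extra project the team is carrying; the `Project` type, the tuning constants, and both helpers are hypothetical and not the game’s actual rules. It shows how the CTO can keep seeing the same estimate while the real finish date quietly moves.

```python
import random
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    estimate_weeks: float  # what the team promises up front; the CTO only sees this


# Hypothetical tuning knobs, not taken from the actual game.
OVERRUN_MU = 0.1      # median overrun factor is e**0.1, about 1.1x the estimate
OVERRUN_SIGMA = 0.35  # spread of the right-skewed (late) tail
LOAD_DRAG = 0.25      # each other active project stretches this one by 25%


def sample_actual_weeks(p: Project, team_load: int, rng: random.Random) -> float:
    """Draw one possible real duration for a project.

    A lognormal factor gives a distribution curve with a long late tail, and
    team_load (other projects the team is already carrying) adds drag that
    the estimate does not reflect.
    """
    overrun = rng.lognormvariate(OVERRUN_MU, OVERRUN_SIGMA)
    drag = 1.0 + LOAD_DRAG * team_load
    return p.estimate_weeks * overrun * drag


def median_weeks(p: Project, team_load: int, trials: int = 10_000, seed: int = 42) -> float:
    """Median simulated duration over many draws (same seed, so loads are comparable)."""
    rng = random.Random(seed)
    draws = sorted(sample_actual_weeks(p, team_load, rng) for _ in range(trials))
    return draws[len(draws) // 2]
```

With `p = Project("billing revamp", 6)`, `median_weeks(p, 2)` comes out exactly 1.5 times `median_weeks(p, 0)` because both runs share a seed, yet `p.estimate_weeks` still reads 6. A lognormal is one common choice for task durations because it skews toward late finishes; any right-skewed distribution would make the same point.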

I realized that when you write a simulator about something you know well, you quickly see where it isn’t correct, and you add that in. Before long, I realized I hadn’t written a generic “you’re the CTO” game but instead an interactive way to explore how I specifically see the job. I believe someone else given this prompt would emphasize different things, have objects interact differently, and likely even choose different success criteria.

(Also, I don’t think the game is “fun” yet, but I’m not sure a simulation of a fairly bureaucratic job would be fun? If it does end up being fun, or at least interesting in my eyes, I’ll likely open source it, but I’m not there yet.)

The other interesting part of this exercise is that the game does have some emergent properties. For example, new engineers need some ramp-up time before they are fully productive. New engineers on a project might take longer or make more mistakes (which is natural!). Newer engineers add some risk to projects, and missed projects hurt the CTO’s standing and lower the team’s morale. Each of these consequences follows naturally from some simple rules, but when they interact, every game is slightly different and somewhat surprising.
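The chain of consequences in that paragraph can be sketched as a handful of small rules. This Python version is illustrative only: the `Engineer` type, the 12-week linear ramp, the risk formula, and the morale and standing deltas are hypothetical values, not the game’s.

```python
import random
from dataclasses import dataclass


@dataclass
class Engineer:
    name: str
    weeks_on_team: int


# Hypothetical rules for illustration; not the real game's numbers.
RAMP_WEEKS = 12          # weeks until a new hire is fully productive
BASE_SLIP_RISK = 0.10    # chance that even a seasoned team misses


def productivity(e: Engineer) -> float:
    """Linear ramp from 25% on day one to 100% after RAMP_WEEKS."""
    return 0.25 + 0.75 * min(e.weeks_on_team / RAMP_WEEKS, 1.0)


def slip_risk(team: list[Engineer]) -> float:
    """Less-ramped teams are more likely to miss."""
    avg = sum(productivity(e) for e in team) / len(team)
    return min(1.0, BASE_SLIP_RISK + 0.5 * (1.0 - avg))


def settle_project(team: list[Engineer], morale: float, standing: float,
                   rng: random.Random) -> tuple[bool, float, float]:
    """Resolve one project; a miss lowers team morale and the CTO's standing."""
    slipped = rng.random() < slip_risk(team)
    if slipped:
        morale, standing = max(0.0, morale - 0.10), max(0.0, standing - 0.15)
    else:
        morale, standing = min(1.0, morale + 0.02), min(1.0, standing + 0.05)
    return slipped, morale, standing
```

Each rule only touches its neighbor: the ramp changes productivity, productivity changes risk, and a miss changes morale and standing. Seeding `random.Random` differently per game is what makes each playthrough come out a little differently.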

The AI hater's guide to code with LLMs

Fri, 13 Feb 2026 15:08:33 -0800

A fairly clear-eyed view of the LLM landscape. After a few days of plugging away with these tools, I find this quote resonates:

Coding with automated systems like this is intoxicating. It’s addictive, because it’s the lootbox effect. We don’t get addicted to rewards. We get addicted to potential rewards. Notice that gamblers aren’t actually motivated by having won. They’re motivated by maybe winning next time. It can lead us to the glassy eyed stare with a bucket of quarters at a slot machine, and it can lead us to 2am “one more prompt, maybe it’ll work this time” in a hurry.

from The AI hater’s guide to code with LLMs (The Overview)

Blink: A more secure OpenClaw

Fri, 13 Feb 2026 09:56:59 -0800

I have been intrigued by OpenClaw – a system that uses LLM agents to act locally on your behalf, doing things like sorting email, filling out webforms, cleaning up files, etc – but the security seemed like a complete afterthought. I was happy for folks using that, but it was something I would not run for myself.

This, though, looks interesting and well thought-out. I’m still not at the point where I’d set this up but this is moving things in the right direction.

Why I Ditched OpenClaw and Built a More Secure AI Agent on Blink + Mac Mini:

The community responded with workarounds: layering on firewalls, VPN tunnels, and reverse proxies. These were patches on a system that wasn’t built with security at its core. OpenClaw was designed as a single-user local tool that organically grew into something much bigger. Security was bolted on after the fact instead of baked in from the start.

I wanted everything OpenClaw offered: a personal AI agent on my own hardware, connected to my real tools, available around the clock. But I also wanted a system where the secure setup is the default, without requiring constant hardening and maintenance.

Fun with LLMs

Thu, 12 Feb 2026 21:41:26 -0800

I had a lot of fun playing with Claude and OpenRouter today, trying out new techniques, prompts, and different models. I realize I’m about 6 months to 2 years behind a lot of people, but this is truly the most fun I’ve had writing software in a long, long time.

Just today I’ve:

Diagnosed a bug in how I had added SSL to the miniflux installation
Completely revamped my vimrc
Completely revamped my zshrc
“Wrote” a basic test harness and test suite for this blog (in bash, naturally)
Played with 3-4 different models
And most fun of all, I sat with our 9-year-old this afternoon (who is home because of the strike) and wrote a complete browser-based store simulator, where you are a shopkeeper and have to do pricing and inventory management

And now I’m spending the evening reading about how best to use these tools beyond barking commands at them.