I just finished a trip to attend the ODSC conference in Boston, leaving without really understanding where we are on the hype cycle of agentic AI. But on the flight back, I read an article2 in which Rogé Karma was walking back his stance on the AI bubble given new revenue data. The conference gave me a lot of confidence that regardless of what benefit agentic AI will ultimately have, there is now no doubt that companies and individuals who do not upskill will fall behind in one way or another. And the article put down some numbers about the new revenue that Anthropic, OpenAI, Cursor, and the data center companies they rely on (Microsoft, Google, Amazon, CoreWeave) were now experiencing, taking them, perhaps, out of bubble territory.
At the conference, one of the first talks I went to was about private LLMs, and during this talk the speaker1 said that the adoption of AI had found a cheat code: agentic AI. He pointed out that although the chasm showed only 15% chat-style adoption, when tools embedded agent interfaces into the existing mediums people already use, adoption magically went to 100%.
Perhaps ironically, as I was at my gate waiting for departure, I also saw a post from a friend about https://copy.fail, the newly discovered Linux privilege escalation vulnerability, where a user-mode process can su to root. I kept this in mind while learning about people excitedly giving Anthropic access to rummage around and execute on their laptops.
Back to the speaker’s presentation: his point was that the risk of opening up your company to Anthropic and others, giving them a front-row seat to your company’s data, can indeed be avoided with the open-weight models that are now becoming as good as their SaaS counterparts. And with per-token costs going up, along with surge pricing, there is perhaps even a cost benefit too.
Between talks, I spoke with colleagues about the future of code. One was inspired by a conversation I had with a speaker on program static analysis11, Armando Solar-Lezama, about the field potentially making a comeback. Or at least his research team was identifying a gap in code evaluation, bug evaluation in particular. He noted that even Mythos (which he said he did not have access to) would not find all the bugs, but for the sake of reliability, we do need to find all the bugs. So after his talk, I asked him what his vision was. I wondered: Python was so popular for data science because it was so effortless to start learning, but it also comes with an ease of creating bugs that statically typed, memory-safe languages like Go do not run into. And it sounded like yes, he sees there is definitely now more room for strongly typed, memory-safe languages like Rust, though that will still not be enough to make sure newly generated code is reliable. One of my takeaways from his talk was that code generation is producing an unprecedented volume of code that needs to be reviewed, and we desperately need better ways of vetting it for reliability. And hopefully code gen now frees us up to focus on bringing back precisely the kind of code analysis that was very popular in the early 2000s but then faded away.
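To make the Python-vs-static-typing point concrete, here is a minimal sketch (my own toy example, not from the talk): a function that runs happily on good input but only surfaces a type bug at runtime, where a static type checker, or a compiled language like Rust or Go, would have flagged the bad call before the code ever ran.

```python
def average(values):
    """Mean of a sequence of numbers -- no static guarantee of what arrives."""
    return sum(values) / len(values)

# Works fine during exploration...
print(average([1, 2, 3]))  # 2.0

# ...but nothing stops a caller from passing the wrong shape of data.
# The failure only appears when this line actually executes.
try:
    average("123")  # sum() over characters blows up at runtime
except TypeError as e:
    print("runtime failure:", e)
```

With a type hint like `def average(values: list[float]) -> float:` and a checker such as mypy, the bad call would be rejected statically, which is the effortless-to-start but easy-to-break trade-off the conversation was about.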
One colleague I spoke with was in the audience for this talk as well. They had transitioned to our ML platform team from data science and self-assessed that their coding skills were not so great. However, they have been leaning into code gen since it started becoming a thing a few years ago, and so they have not developed past their plateau. At this point, and perhaps with this conference in particular, they are not yet convinced there is a benefit to getting good at writing code.
Rogé Karma points out4 in his article, citing a SemiAnalysis piece5, that knowledge work can be deconstructed into chunks of Read, Think, Write, and Verify. That makes it a good candidate for building blocks in agentic flows that can be learned, as long as the criteria are well defined. I pick things up, I put things down. In other words: can knowledge work be cut up into units of work that are commoditized? I would flag here that this sounds remarkably similar to the vision of the waterfall software planning model that the agile manifesto6 of 2001 responded to, as well as the Data Science as Pin Factory article7,8 from 2019, written in response to the desire to assembly-line-ify data science. The agile software movement pointed out that software projects are messy and customers cannot accurately describe what they want. And Eric Colson extended this to the messiness of extracting signal from data. In fact, his description of the data science pin factory echoes that SemiAnalysis article:
“one person sources the data, another models it, a third implements it, a fourth measures it”
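The Read/Think/Write/Verify decomposition can be sketched as a pipeline. This is my own framing, not SemiAnalysis's actual formulation; the function names and the `criteria` callback are invented for illustration. The whole premise rests on `verify` having well-defined acceptance criteria, which is exactly the contested assumption.

```python
# Toy decomposition of a unit of knowledge work into four chained steps.

def read(source: str) -> str:
    """Ingest raw material and produce working notes."""
    return f"notes extracted from {source}"

def think(notes: str) -> str:
    """Turn notes into a plan of action."""
    return f"plan based on ({notes})"

def write(plan: str) -> str:
    """Produce the deliverable from the plan."""
    return f"draft implementing ({plan})"

def verify(draft: str, criteria) -> bool:
    """Accept or reject the deliverable -- only works if criteria are well defined."""
    return criteria(draft)

draft = write(think(read("quarterly report")))
ok = verify(draft, criteria=lambda d: "plan" in d)
print(draft, ok)
```

Each step is a clean hand-off only when its output contract is explicit; the agile and pin-factory critiques below are precisely that, for messy work, it rarely is.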
During the conference, I was chatting with another colleague who was excitedly plotting how she could carve out some EDA (Exploratory Data Analysis) time soon with an unstructured dataset she has been sitting on, using some new techniques the conference inspired her to try. She believes she would need at least a good six months of finding time in the cracks of her day job to determine whether there is enough there there in her dataset, before even proposing an improvement for her customer to consider.
In an interview9 I listened to with Peter Steinberger (creator of OpenClaw, an open-source agent), he described his niche as “difficult but not too interesting”. This is precisely the opposite of low-hanging fruit: the problems that are right there in front of you, easy to understand, easy to describe, quick wins. He responds to people who attempt to preplan a backlog of units of work, orchestrating a team of agents to coordinate and execute on the plan:
“I don’t believe this works. Like, this is the waterfall model of software building. This we learned long ago that this doesn’t work. Like, yes, people work differently and maybe it does work for some. I just don’t see how this could work for me. Like, I have to start with an idea and often I purposefully under-prompt the agent so it would do something that would give me new ideas. You like maybe like 80% of the things I assumed were like crap, but like there were like two things like, ‘oh, I didn’t think about that way.’
“And then I iterate and shape the project. And I have to click it. I have to, like, I have to feel it. I feel, to make good software, you know one thing those things often lack is taste. I have to feel like, how does this feature feel? And the beauty now is that features are so easy, I can just, like, throw it away or, like, re-prompt it. My building model is usually very much forward. It’s very rarely that I actually revert and have to go back. It’s just, like, ‘okay, no, then let’s change this. No, let’s do this.’ It’s like it’s like shaping. I love how this, like, you start with a rock and then you, like, chisel away at it and, like, pick different areas, and then slowly like this statue emerges out of out of marble. That’s how I see, that’s how I see building something.”
That is a reflection on the creative process. If anyone would be a good judge, Steinberger would: agentic programming can massively speed up your experimentation loop, but it is nevertheless a loop you cannot reduce to a clearly defined, deterministic sequence of units to hand over to your army of agents.
More than anything, I hear Steinberger’s take as: agentic programming is the final nail in the coffin of using product-planned roadmaps to derisk quarter-long software development efforts.
Perhaps we can acknowledge, though, that there are still two kinds of work along the themes of explore and exploit: open-ended spikes that produce research artifacts, and repetitive tasks that are well defined because you have done them many times already. And agentic work can perhaps make the first kind easier to put bounds around.
An Internet for AI Agents
The first talk I attended was practical. It reminded me of a more fleshed-out moltbook.com. Ramesh Raskar laid out a vision10 for how agents can communicate in the future. He pointed out that agents are currently clients and do not have URI endpoints. NANDA proposes, among other things, a DNS for agents.
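To make the "DNS for agents" idea tangible, here is a hedged sketch of what resolving an agent name to an endpoint might look like. The registry, the record fields, and the names are all my invention for illustration; this is not NANDA's actual protocol or data model.

```python
# Toy agent registry: resolvable names mapping to capability endpoints,
# by analogy with DNS mapping hostnames to IP addresses.
AGENT_REGISTRY = {
    "summarizer.example": {
        "endpoint": "https://agents.example/summarizer",
        "capabilities": ["summarize", "translate"],
    },
}

def resolve_agent(name: str) -> dict:
    """Resolve an agent name to its endpoint record, DNS-style."""
    record = AGENT_REGISTRY.get(name)
    if record is None:
        raise LookupError(f"no agent registered under {name}")
    return record

print(resolve_agent("summarizer.example")["endpoint"])
```

The point of the analogy is that once agents have addressable names, they can be servers as well as clients, discoverable by other agents rather than only driven by a human at a chat window.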
References
Ivan @ datasaur.ai
SemiAnalysis, https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
The Agile Manifesto, https://agilemanifesto.org
Eric Colson, https://multithreaded.stitchfix.com/blog/2019/03/11/FullStackDS-Generalists/
Eric Colson, https://hbr.org/2019/03/why-data-science-teams-need-generalists-not-specialists
Ramesh Raskar, https://nanda.media.mit.edu
Armando Solar-Lezama, “Open Challenges for the Next Generation of Programming Agents”, https://x.com/_odsc/status/2047815353967780118
