Notes

Boundary Detection

A follow-up to here[1], expanding on the overwhelming intake you end up with when too many voices compete for your time. I want to sharpen what appears to be happening. Yes, your colleagues are asking for help and you don't want to turn them down. Yes, some production system is breaking; someone else is addressing it, but you have an inkling as to why and want to save them time on triage. Yes, your colleagues are asking for their pull requests to be reviewed, and yes, there are also all the other things you said yes to, presumably at the start of the sprint. And yes, you just attended a meeting reminding you that the next PI planning is only a few weeks away and that you should spend some time refining proposals. ...

Catalyst Optimizer Pushdown Encounter

I ended up learning how the Spark Catalyst optimizer performs so-called predicate pushdown after running into a weird issue, which I simplify below. The behavior came up with a DataFrame df that has the columns “left_id” and “free_form”, and a reference DataFrame df_ref that has “right_id” and “location”. The “free_form” column contains yyyyMMdd dates only when “location” is “12345”, which is why a join is performed:

    from pyspark.sql import functions as f

    start_date = '2024-01-01'
    end_date = '2025-12-31'

    (df.join(df_ref, df.left_id == df_ref.right_id, "left")
        .filter(f.col("location") == "12345")
        .withColumn("foo_date", f.to_date("free_form", "yyyyMMdd"))
        .where(f.col("foo_date").between(start_date, end_date))
        .display())

But mysteriously this crashes when the final start_date, end_date filter is done with ...
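The note is cut off before its explanation, so what follows is an assumption, not the author's diagnosis. One plausible mechanism: because the filter on “location” (a df_ref column) throws away the null-padded rows a left join would produce, Catalyst can treat the join as inner and push each predicate down to its own side. The “foo_date” range predicate then runs on every row of df, including “free_form” values that are not dates, and when to_date is configured to fail on malformed input (for example with ANSI mode enabled) it raises instead of returning null. The toy model below reproduces that ordering effect in plain Python; the data, `strict_to_date`, `lenient_to_date`, `as_written`, and `pushed_down` are hypothetical stand-ins, not Spark APIs.

```python
from datetime import date, datetime
from typing import Callable, List, Optional

# Hypothetical stand-ins for the two DataFrames.
DF = [(1, "20240315"), (2, "not-a-date"), (3, "20260101")]  # (left_id, free_form)
DF_REF = [(1, "12345"), (2, "99999"), (3, "12345")]         # (right_id, location)
START, END = date(2024, 1, 1), date(2025, 12, 31)

Parser = Callable[[str], Optional[date]]


def strict_to_date(s: str) -> Optional[date]:
    """Like a to_date that fails on malformed input: raises ValueError."""
    return datetime.strptime(s, "%Y%m%d").date()


def lenient_to_date(s: str) -> Optional[date]:
    """Tolerant parse: malformed input yields None (null) instead of raising."""
    try:
        return datetime.strptime(s, "%Y%m%d").date()
    except ValueError:
        return None


def in_range(d: Optional[date]) -> bool:
    return d is not None and START <= d <= END


def as_written(parse: Parser) -> List[int]:
    # Order as written: join, keep location "12345", then parse and range-filter.
    # (An inner join here; the location filter drops null-padded rows anyway.)
    joined = [(lid, ff, loc) for lid, ff in DF for rid, loc in DF_REF if lid == rid]
    located = [row for row in joined if row[2] == "12345"]
    return [lid for lid, ff, _ in located if in_range(parse(ff))]


def pushed_down(parse: Parser) -> List[int]:
    # Each predicate applied to its own side before the join, as a pushdown might do.
    left = [(lid, ff) for lid, ff in DF if in_range(parse(ff))]  # parses every free_form
    right = [(rid, loc) for rid, loc in DF_REF if loc == "12345"]
    return [lid for lid, _ in left for rid, _ in right if lid == rid]


print(as_written(strict_to_date))      # [1]
print(pushed_down(lenient_to_date))    # [1]
# pushed_down(strict_to_date) raises ValueError on "not-a-date"
```

The same query gives the same answer in both orders only when the parse tolerates non-date values; with a strict parse, evaluation order decides whether it crashes.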

What ends up happening

We want to follow the optimal path for the day, not the one based on yesterday’s information. One reason my plans don’t survive, I notice, is when multiple people are waiting on me, and also when I have spotted multiple opportunities. All of this competes with everything I have theoretically planned to do in a given day, week, or sprint. ...

Whose Responsibility is Tech Debt

Starting a small note here on discussions that others and I have been having on my team about whether addressing tech debt should be prioritized explicitly, or whether it is something you just address the way you might do local refactors under the so-called boy scout rule [1]. After several conversations about this lately, I realize there are many diverging opinions. Some teammates see it as the responsibility of the product team, and others tackle it head-on, like an obstacle in the way of the main task at hand. A more nuanced middle ground, from a colleague who is a fan of The Goal, is that you should not have tech debt in the first place if you “pay it down” in stride as part of your role of implementing whatever the company envisions. ...

Notes on Feast for Spark

Looking into Feast, the open source feature store, and whether it can serve as an interface around Parquet and/or Delta tables in a PySpark batch inference Databricks environment. I see that Parquet is mentioned in the quickstart[2], with Parquet as the offline store component and SQLite as the online store component. The offline store component is described as intended for training. Maybe it can be useful for a batch inference case too? ...
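For context on why the offline store might still fit batch inference: Feast describes its offline retrieval (`get_historical_features`) as a point-in-time correct join, where each entity row gets the latest feature value at or before that row's timestamp. Training passes past timestamps; a batch inference job could pass the scoring time and get the latest values. The sketch below models only that lookup in plain Python, not Feast's API; the `FEATURES` log, the entity IDs, and the `as_of` and `point_in_time_join` helpers are hypothetical, and Feast details such as feature TTLs are not modeled.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical feature log: entity_id -> [(event_timestamp, value)], sorted by time.
FEATURES: Dict[int, List[Tuple[datetime, float]]] = {
    1001: [(datetime(2024, 1, 1), 0.2), (datetime(2024, 2, 1), 0.5)],
    1002: [(datetime(2024, 1, 15), 0.9)],
}


def as_of(entity_id: int, ts: datetime) -> Optional[float]:
    """Latest feature value with event_timestamp <= ts, or None if there is none."""
    history = FEATURES.get(entity_id, [])
    i = bisect_right([t for t, _ in history], ts)
    return history[i - 1][1] if i > 0 else None


def point_in_time_join(
    entity_rows: List[Tuple[int, datetime]]
) -> List[Tuple[int, datetime, Optional[float]]]:
    """Attach the as-of feature value to each (entity_id, timestamp) row."""
    return [(eid, ts, as_of(eid, ts)) for eid, ts in entity_rows]


# Training: historical timestamps, one per labeled example.
print(point_in_time_join([(1001, datetime(2024, 1, 20)), (1002, datetime(2024, 1, 1))]))
# Batch inference: stamp every entity with the scoring time to get the latest values.
print(point_in_time_join([(1001, datetime(2024, 3, 1)), (1002, datetime(2024, 3, 1))]))
```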

The Good, the Bad and the Ugly of AI Benefit Shortfalls

I’m reading AI Snake Oil, and the authors open by saying they will focus on examples of AI that cause harm and will not include the ones that benefit society. But it is not straightforward to create a list of the good examples. I think one of their examples in the good column was autocorrect. But autocorrect is a good example of how you can’t just sprinkle AI on a problem to solve it. You need to put a lot of effort into getting the UX right. ...

More Is Less

Listening to Nate B Jones here[1] to the end. I thought I heard him refer to productivity gains and had to rewind to hear it again: every single agent you can run in parallel multiplies your productivity. But does that translate to value? Recently I’ve been reading the Healthcare Handbook[2], where I (re)learned that “productivity” has a specific meaning to labor economists: your output given your input. The HH authors further cite this piece[4]. But informally, “productivity” just means “hey, I got more stuff done” or “I checked off more to-do list items!” ...
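A toy calculation makes the two meanings concrete. The numbers are hypothetical, chosen only to show that the informal measure (items checked off) can triple while productivity in the economists' sense (output per unit of input) stays flat, if the input, here hours including prompting and reviewing agent output, grows just as fast. The `Period` class and its fields are illustrative names, not from any source.

```python
from dataclasses import dataclass


@dataclass
class Period:
    items_done: int     # informal measure: to-do items checked off
    input_hours: float  # input: all hours spent, including reviewing agent output

    def productivity(self) -> float:
        """Productivity in the labor-economics sense: output per unit of input."""
        return self.items_done / self.input_hours


# Hypothetical numbers for illustration only.
solo = Period(items_done=10, input_hours=40)
parallel_agents = Period(items_done=30, input_hours=120)

print(parallel_agents.items_done / solo.items_done)            # 3.0
print(parallel_agents.productivity() / solo.productivity())    # 1.0
```

Whether either number tracks value is a separate question; the ratio only says how much output each unit of input buys.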

Iteration Planning

Notes from reading this article. It has some zingers in it, hah! The iron cross, maybe without one leg? “You can’t cheat the scope–time–quality triangle. If we try to fix time and scope, quality takes the hit. In a developer-facing product, degraded quality means support escalations, technical debt, and broken trust. Like any debt, technical debt must be paid back, and its interest compounds over time.” I have heard many variations of this. The one I like most, maybe from Shane de la Moore though I forget precisely where I heard it, is to not even consider quality a degree of freedom and just treat it as non-negotiable. ...

do you have a flag?

No flag, no country! Steve Huynh (aka A Life Engineered) discusses here, https://youtu.be/oLzj67H-OHo, a few career concepts he says he had to unlearn. One is the hope that if he does good work, it will surely get noticed. But he points out that's highly unlikely to happen. His alternative is to get better at self-promotion: if you're not a natural braggart, start gently by highlighting the work of others and then slipping in something you did too, say in an email or chat message. ...

Hook Up Cloudflare RAG Search

Here are some of my notes on adding Cloudflare AI Search as the endpoint for my Hugo site’s search.

Summary

The other weekend, I randomly looked into some minimal ways to set up RAG search on my Hugo site. A year prior, I had tried out[3] Typesense as a hosted vector embedding store for a dataset of several million rows and many gigabytes, but a Hugo text site is pretty small, so I wondered what this might cost. Cloudflare AI Search came up, not only as a vector store alternative but as a self-contained store with indexing and a small RAG layer on top. ...
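As a rough mental model of what a hosted service like this does with a site's pages (an assumption about the general RAG pattern, not Cloudflare's API): index each page as a vector, embed the query the same way, return the closest pages, and let a small RAG layer hand those pages to a model as context. The toy below uses bag-of-words counts and cosine similarity in place of a real embedding model; the page paths and the `embed`, `cosine`, `top_k`, and `build_prompt` helpers are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real service uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, index: Dict[str, Counter], k: int = 2) -> List[Tuple[str, float]]:
    """Rank indexed pages by similarity to the query and keep the best k."""
    q = embed(query)
    scored = [(url, cosine(q, vec)) for url, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(query: str, hits: List[Tuple[str, float]], pages: Dict[str, str]) -> str:
    """The 'small RAG layer': retrieved page text becomes the model's context."""
    context = "\n".join(pages[url] for url, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical pages from a small Hugo site.
PAGES = {
    "/posts/catalyst/": "Spark Catalyst optimizer predicate pushdown",
    "/posts/feast/": "Feast feature store with Parquet and Delta on Spark",
    "/posts/flag/": "Career advice on self promotion",
}
INDEX = {url: embed(body) for url, body in PAGES.items()}

hits = top_k("spark pushdown", INDEX, k=1)
print(build_prompt("spark pushdown", hits, PAGES))
```

For a site this small the whole index fits in memory, which is part of why a managed all-in-one option is attractive compared with running a separate vector store.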