spotting the difference

I had a crazy issue where a Databricks job was taking over 13 hours and failing, when it had taken just over 1 hour previously on another workspace. Turned out, after a bit of staring at logs, I had a face-palming moment because yet again I got bit by spot instances/spot workers. I found in my Spark UI logs these weird errors, “Executor 1 removed”, “Executor 2 removed”, and so on. I was thinking memory issues or asymmetric shuffle issues, but then hovered my mouse and saw ...

March 11, 2026 · (updated May 7, 2026) · 1 min · 110 words · Michal Piekarczyk

Crowded Wisdom

At a recent team meeting, we discussed the merits of voting on story pointing efforts for story cards. Already yawning! A colleague shared a recent study looking back on jar counting and the Wisdom of the Crowds phenomenon. You know, the one where a bunch of people stare at an ox and influence each other’s votes about how the ox is heavier than it really is. At least that’s what this Royal Society paper studied: that if you are given information about your peers’ assessments of the ox’s weight, you are more likely to discount smaller guesses than larger ones. And they captured a correction factor, which might be handy the next time you are in a situation where you and other people around you are counting gumballs in a jar or group shaming chunky four-legged herbivores. ...

March 10, 2026 · 3 min · 535 words · Michal Piekarczyk

Spark small file problem

Making a mental note: today I encountered what is referred to as the Spark small file problem [1]. Well, I did not realize it initially anyway. I was running a pretty complicated feature engineering notebook, trying to reproduce the results of a colleague. I set up a Databricks workflow DAG on a recent evening, looking forward to seeing what happened the following morning. The result surprised me. I woke up to a job that ran for 13 hours and crashed with a shuffle-flavored error. ...

March 9, 2026 · (updated March 13, 2026) · 5 min · 1056 words · Michal Piekarczyk

Reverse Mick Jagger

Listening to this Cory Doctorow interview [1]. I like his hypothesis for why the foundation model companies are valued so disproportionately highly: that the aristocratic class really really really doesn’t want to depend on the working class, and they see a way out to finally do without them. That makes sense! But yeah, the reality is that the current models are really really really good at giving you what you “say” you want, which is far from what you need most of the time. ...

March 8, 2026 · (updated March 9, 2026) · 3 min · 439 words · Michal Piekarczyk

prompt-driven-development

Lack of clarity about what to do next? Write it out as if you were prompting an LLM tool like ChatGPT, but instead the prompt is for you. Prompting seems to have rewired my brain, over the past three years, to focus into words, into highly descriptive narrative, what I want, knowing my words need to be free of noise. This is to the point now that I’ll often start writing a prompt in a chat window, or a message to a friend or colleague at work, and I’ll stop, realizing that the writing helped me organize my thoughts and I no longer even need to ask the question. ...

March 4, 2026 · 4 min · 661 words · Michal Piekarczyk

going the manhattan distance

Reading the On Bullshit paper. The author says that in the old days craftsmen did not “cut corners”. Looking this up, I think that was literally about the road you took, per reading [2]: paved road vs over fields. But hmm, shortcuts are kind of organic roads, not necessarily sloppy. Like, do we have a planned vs actually useful question here? The idea can maybe be extended: what are we building anyway? You will not do it perfectly from your first idea. ...

March 3, 2026 · (updated March 7, 2026) · 2 min · 276 words · Michal Piekarczyk

humidifier debugging first try

Just had some first-try humidifier debugging results! Daily yay! Hah, debugging stories like this that I have rarely get resolved quickly. My Vornado fan heater that downgraded into just a fan had an issue where I found a workaround that ultimately stopped helping, and luckily the staff at Vornado sent me a replacement, so I guess I was under warranty. But for my Honeywell humidifier, I had a pretty lucky result just now. I usually fill the tank in the evening, for a mist to counter the evening dryness from the winter indoor weather. But this morning the tank was full and the orange panic light was on. This light only comes on when the water runs out. So that was weird. ...

March 2, 2026 · 1 min · 165 words · Michal Piekarczyk

Immich to vimeo Update

I was trying to use ChatGPT Codex to save myself a few minutes by adding two new Vimeo REST API parameters, name and description, to the upload script I mentioned earlier [3]. I encountered a 500, and so I could not create a PR. However, the cool thing is that I learned about the git apply capability, which somehow I did not realize existed. It worked seamlessly: I just took the patch and ran git apply codex.patch. ...

March 1, 2026 · 2 min · 266 words · Michal Piekarczyk

Boundary Detection

Follow-up to here [1], expanding on the overwhelm of intake you end up with if there are too many voices competing for your time. I want to sharpen what appears to be happening. Yes, your colleagues are asking for help and you don’t want to deny them. Yes, some production system is breaking, someone else is addressing it, but you have an inkling as to why and you want to save them time on triage. Yes, your colleagues are asking for their pull requests to be reviewed, and yes, there are also all the other things you said yes to, presumably at the start of the sprint. And yes, you just attended a meeting reminding you that the next PI planning is just a few weeks away and you should spend some time on refining proposals. ...

February 27, 2026 · (updated February 28, 2026) · 2 min · 410 words · Michal Piekarczyk

Catalyst Optimizer Pushdown Encounter

I ended up learning about how the Spark Catalyst optimizer performs a so-called predicate pushdown after running into this weird issue; my example below is simplified. The behavior was encountered where df has “left_id” and “free_form”, and df_ref contains “right_id” and “location”. The column “free_form” contains yyyyMMdd dates only when “location” is “12345”, and so that’s why a join is performed:

    start_date = '2024-01-01'
    end_date = '2025-12-31'
    (df.join(df_ref, df.left_id == df_ref.right_id, "left")
        .filter(f.col("location") == "12345")
        .withColumn("foo_date", f.to_date("free_form", "yyyyMMdd"))
        .where(f.col("foo_date").between(start_date, end_date))
        .display())

But mysteriously this crashes when the final start_date, end_date filter is done with ...

February 25, 2026 · (updated March 2, 2026) · 2 min · 295 words · Michal Piekarczyk
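
The excerpt above is cut off before the cause is explained, so this is only a guess at the general shape of the problem, sketched in plain Python rather than Spark. It assumes the crash comes from the date predicate being evaluated on rows that the location filter was meant to exclude, which can happen when an optimizer reorders or pushes down filters. The rows, the strict parser, and both function names are hypothetical stand-ins, not the author's code.

```python
from datetime import date, datetime

# Hypothetical rows standing in for the joined df / df_ref result.
rows = [
    {"left_id": 1, "location": "12345", "free_form": "20240315"},
    {"left_id": 2, "location": "99999", "free_form": "call back later"},
    {"left_id": 3, "location": "12345", "free_form": "20230101"},
]

start_date = date(2024, 1, 1)
end_date = date(2025, 12, 31)


def parse_strict(text):
    """Parse a yyyyMMdd string, raising ValueError on anything else."""
    return datetime.strptime(text, "%Y%m%d").date()


def location_then_date(rows):
    """Order as written: filter on location first, then parse and filter dates."""
    kept = [r for r in rows if r["location"] == "12345"]
    return [
        r["left_id"]
        for r in kept
        if start_date <= parse_strict(r["free_form"]) <= end_date
    ]


def date_then_location(rows):
    """Reordered: the date predicate runs before the location predicate."""
    kept = [
        r for r in rows
        if start_date <= parse_strict(r["free_form"]) <= end_date
    ]
    return [r["left_id"] for r in kept if r["location"] == "12345"]


print(location_then_date(rows))

try:
    date_then_location(rows)
except ValueError as err:
    # The non-date free_form value is parsed before the location filter removes it.
    print("reordered evaluation failed:", err)
```

Both functions describe the same logical query, yet only the first is safe on this data. A query that relies on one filter protecting a later parse depends on evaluation order, which the person writing the query may not control.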