Odd pyspark window function behavior

I was working through some odd window function behavior. Analyzing some feature drift data, I wanted to obtain min, max, and mean drift values for features, partitioning on the compare_date here. I would have just done a group by, but I also wanted to get the baseline_date relevant to the largest drift score, so I went with the below approach. But I ended up with some strange results. from pyspark....

September 23, 2023 · 6 min · 1275 words · Michal Piekarczyk

summary learning how to learn coursera

What: On Coursera, [[Barbara Oakley]]'s Learning How to Learn. I originally came across it by reading a Medium post by [[Aleksa Gordić]]. This was a surprisingly useful meta-course! (NOTE: my mini write-up here is still somewhat in DRAFT mode.) In three sentences: You control the input of learning, and the output is a side effect. But not all strategies are equally useful: rote learning is overrated, and recall, like spaced repetition, is where the benefit is....

September 10, 2023 · (updated January 1, 2026) · 6 min · 1078 words · Michal Piekarczyk

everybody loves reynauds

Inspiration: Haha, so per a friend's joke that “Reynaud’s” sounds like “Raymond”, as in “Everybody Loves Raymond”, I was wondering, can sentence transformers tell the difference? But haha, spoiler alert, sort of, is that if a model finds this funny, then it might also not really understand the medical condition at play here, haha nervous laughter 😅 Quick look: let’s compare this first embedding model. import os from sentence_transformers....

June 25, 2023 · 4 min · 751 words · Michal Piekarczyk

semantic code search part 2

Ok, continuing from last time, where I ran the sentence_transformers model 'msmarco-MiniLM-L-6-v3' against a code search problem of comparing query lines against a source code corpus, actually from the sentence_transformers GitHub. This time, I wanted to write some code around measuring the successful hits from a model run, using cosine similarity from sentence_transformers. Query set choice: I expanded the set of queries slightly, but not that much yet, so I could focus on the evaluation code....

June 13, 2023 · 10 min · 1959 words · Michal Piekarczyk

logseq publish hugo with python

Briefly: Inspired by the super popular #[[schrodinger logseq plugin]], I wrote up something in #python today to publish to #Hugo, like https://github.com/sawhney17/logseq-schrodinger but also with support for block embeds. In particular, I was also really inspired by [[Bas Grolleman]]’s concept here around how to use #interstitial-journaling and nicely coalesce selected sources, through their block embeds, into a target logseq concept they all refer to....

June 12, 2023 · 8 min · 1683 words · Michal Piekarczyk