batch a few book summaries

Drafting this here before it leaves my brain. In Feel Good Productivity, Ali Abdaal captures very well the idea that what we do with the intention of winding down at the end of the day often does not actually achieve that purpose. Instead he challenges his readers to paint, go for a walk, or otherwise find the things that really give you the relaxation you actually want. This reminds me of something a friend of mine once said about drinking water: sometimes people will consume food or other snacks when maybe all they want is to quench their thirst. So similarly, per Ali, maybe we reach for doomscrolling or binge-watching Netflix when we want to wind down, but what we really want to do is do nothing. Ali points out, for example, that George W. Bush has taken up painting. I looked online, and he indeed captures the visages of others very accurately! Ali describes the low-stakes, rewarding environment you can create for yourself with a hobby: it pulls you in and gives your creative or exploratory brain a chance to look around, but does not give you a dopamine hit that leaves you with a dopamine crash afterward. Slow reward. But there is a fine line here: if your hobby starts to feel like work, you will not enjoy it or look forward to it. A difficult balance to strike. One other anecdote he provided has stuck with me for a while now: LeBron James, by some stat figures at least, is both the slowest and the fastest player in the NBA. Apparently he is among the fastest players when he is sprinting on the court, but he also walks the most, while other players, in contrast, are perhaps constantly jogging in place and never really stopping. So with this anecdote I think Ali reminds us that in order to sustain an indefinite pace, whether in doing what we love or just paying the rent, we also need to stop consistently.
Oh yeah, and I think Andrew Huberman called this “non-sleep deep rest”: so not napping, and maybe not even meditating, but just being still, perhaps. ...

April 3, 2025 · (updated June 2, 2025) · 4 min · 697 words · Michal Piekarczyk

transformer architecture sweet spot

(DRAFT for now) What is the transformer architecture? Let me try for a, hopefully, sweet-spot explanation. A deep neural network, trained by backpropagation on language data, first by self-supervised learning (aka pre-training) using masked language modeling, and then by fine-tuning for tasks like text summarization, part-of-speech labeling, Named Entity Recognition labeling, question answering, translation, and others. Self-supervision, by way of next-token prediction or more generally masked language modeling, lets a model be trained without human-generated labels. ...
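That self-supervision point is the key trick: the text already contains its own labels. A toy, non-neural sketch of next-token prediction, just to show where the "free" training examples come from (the real thing is of course a deep network trained by backpropagation, not bigram counts):

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat because the cat was tired"
tokens = text.split()

# Self-supervision: every adjacent (context, next-token) pair in the raw
# text is a training example -- no human labeling needed.
counts = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    # Most frequent continuation seen during "training".
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once
```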

March 9, 2025 · (updated April 11, 2026) · 3 min · 553 words · Michal Piekarczyk

The local sediment

Recently went to check out some of the rock faces in New Haven. Nice rocks! Cool dirt too! And in the neighborhood, across the street from the local rock climbing gym, were some additional practice rocks.

March 8, 2025 · (updated November 13, 2025) · 1 min · 36 words · Michal Piekarczyk

Chopin Etude op 10 no 1 progress with voice memo and ffprobe and chatgpt

I’ve been recording my progress on Chopin Etude op 10 no 1 for a few years now and figured I’d see where I’m at. Asked chatgpt for help. I’ve been recording to .m4a with mac voice memo. I just downloaded my files. I had been using more or less consistent file naming. Chatgpt generated a quick script for polars (because I had stopped using pandas about a year ago). ...

February 22, 2025 · (updated March 1, 2025) · 3 min · 547 words · Michal Piekarczyk

build ground truth golden dataset for comparing embedding models faster with chromadb

Initially, thinking that I wanted to create this ground truth dataset quickly, I started out by writing a for loop and sampling data from my giant dataset of documents, looking for matches to input queries, but this ended up being pretty slow and tedious. Today I switched to just setting up a local index using ChromaDB, and this ended up being extremely fast because I am not having to redo the embedding. ...
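The speedup comes from embedding each document once up front and reusing the vectors for every query, which is what the local ChromaDB index handles. A tiny numpy stand-in just to illustrate the embed-once/query-many shape (the bag-of-words "embedding" is a toy, not a real model, and the docs are made-up examples):

```python
import numpy as np

vocab: dict[str, int] = {}

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy bag-of-words "embedding"; a real sentence-embedding model
    # (and ChromaDB's persistence) take this role in practice.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[vocab.setdefault(tok, len(vocab))] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = ["grilled salmon with lemon", "chocolate lava cake",
        "lemon tart with cream", "steak frites"]
index = np.stack([embed(d) for d in docs])  # embed ONCE, up front

def top_k(query: str, k: int = 2) -> list[str]:
    sims = index @ embed(query)             # cosine sims (unit vectors)
    return [docs[i] for i in np.argsort(-sims)[:k]]

print(top_k("lemon tart", k=1))
```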

February 1, 2025 · (updated February 2, 2025) · 2 min · 255 words · Michal Piekarczyk

Model Metric Updates

This is an update to an earlier post1, in which the goal is to increase the size of a golden dataset in order to help compare the query performance of two embedding models, all-MiniLM-L12-v2 and all-mpnet-base-v2. The comparison is important because they are 384 and 768 dimensions respectively, meaning that the second one has twice the storage cost of the other, and live postgresql storage is expensive not just for the vectors themselves but also for the HNSW indexes involved 😅. ...
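The storage gap is easy to ballpark: at float32 that is 4 bytes per dimension per row, before counting the HNSW index on top (the 1M row count below is an assumed figure for illustration):

```python
# Back-of-envelope raw vector storage at float32 (4 bytes/dim) for an
# assumed 1M rows; the HNSW index adds a comparable amount on top.
rows = 1_000_000
for model, dims in [("all-MiniLM-L12-v2", 384), ("all-mpnet-base-v2", 768)]:
    gb = rows * dims * 4 / 1e9
    print(f"{model} ({dims}d): {gb:.1f} GB")
```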

February 1, 2025 · (updated April 12, 2026) · 3 min · 463 words · Michal Piekarczyk

fun pool embedding bug

Recently, I had been interested in locally reproducing the typesense huggingface models on my laptop. I want to experiment with the https://typesense.org nodes, but I also want to be able to use the same embedding models on my laptop for local development. I noticed that the models in the typesense section of hugging face are in the model.onnx format, which I had not encountered before. I learned how to get them running locally, and I was able to verify that the vectors on a typesense cluster I was running matched vectors I generated locally. ...
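For the "do the vectors match" check, exact equality tends to be too strict across runtimes; a cosine-similarity tolerance is a safer comparison. A small helper sketch (not from the post, and the tolerance is an assumption to tune):

```python
import numpy as np

def vectors_match(a, b, tol: float = 1e-5) -> bool:
    # Two runtimes (e.g. an onnx export vs the original model) rarely
    # agree bit-for-bit; compare direction within a tolerance instead.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return bool(cos > 1.0 - tol)

print(vectors_match([0.1, 0.2, 0.3], [0.1000001, 0.2, 0.3]))
```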

January 27, 2025 · (updated June 2, 2025) · 9 min · 1795 words · Michal Piekarczyk

objective comparison of embedding models for your use case

Recently, I got to the point in a project of looking into TypeSense as an option for hosting embeddings for search. Prior to that, I was working with one particular embedding model, all-mpnet-base-v2, which intuitively and anecdotally performed decently well for my retrieval task. But yeah, that was the problem: my information was anecdotal and cherry-picked. When I started looking into TypeSense, I noticed my model of choice was not in the list, https://huggingface.co/typesense/models , and that gave me the direct motivation to finally run a comparison. ...

January 11, 2025 · (updated July 30, 2025) · 4 min · 815 words · Michal Piekarczyk

postgis and the order by cosine distance

A bit tricky, and not super intuitive in the docs, but I finally found how to build a typesense query to do what I had previously done with postgres, that is, first constrain by postgis distance and then order by cosine distance. With a text_embedding column that embeds a text column, and lat/lng available as well, flattened on the same collection, it is possible to query like so: { "q": query, "query_by": "text_embedding", "filter_by": f"location:({lat}, {lng}, {radius_km} km)", "sort_by": "_vector_distance:asc", "exclude_fields": "text_embedding", 'page': 1, 'per_page': 100 }
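The same parameters, wrapped as a small helper for reuse. The field names (location, text_embedding) come from the query above; the wrapper itself is just a sketch:

```python
def geo_vector_search_params(query: str, lat: float, lng: float,
                             radius_km: float, per_page: int = 100) -> dict:
    # Constrain by geo radius first (filter_by), then rank by cosine
    # distance (_vector_distance ascending = most similar first).
    return {
        "q": query,
        "query_by": "text_embedding",
        "filter_by": f"location:({lat}, {lng}, {radius_km} km)",
        "sort_by": "_vector_distance:asc",
        "exclude_fields": "text_embedding",
        "page": 1,
        "per_page": per_page,
    }

print(geo_vector_search_params("ramen", 41.31, -72.92, 5)["filter_by"])
```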

December 28, 2024 · (updated January 21, 2025) · 1 min · 87 words · Michal Piekarczyk

other interesting topics on postgres and kubernetes

Have not covered a few other cool topics yet. Can go into more detail later, but for now, some highlights. Runtime embeddings to save money: at one point in a recent postgres pgvector retrieval project, a really cool epiphany was, w.r.t. indexing in pgvector both a concatenated blob of items and the granular items, that it is not necessary to also embed the low-level items, because they can be embedded at runtime and it is not that time consuming. The dataset is a corpus of dishes grouped into menus. The initial design was to allow a two-step embedding search: first across the concatenated menu embeddings to narrow down, and then, at the menu level, to search the dish-item-level embeddings a second time. But disk-space wise, it became clear that the dish items and their embedding HNSW index would take up an enormous amount. There are typically 10 dishes per menu, so a proportionally ~10x larger space was needed. But the duhh moment was that maybe we don’t even need to store the dish-item-level index at all, because we can embed the dishes on the fly once a first pass has narrowed things down to maybe 5 to 10 menus. Now we would only need to embed maybe 50 to 100 dishes, and that would not be too slow. And indeed, embedding and searching through 50 to 100 dishes did take time, but it appeared to be worth the precious postgresql disk space saved. Of course, depending on future query latency requirements, it is always possible to reconsider this. ...
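The two-pass idea sketched end to end. The token-overlap "embeddings" and the menus are toy stand-ins, purely to show the flow: a precomputed menu-level index for pass one, on-the-fly dish embeddings for pass two, no stored dish index:

```python
import math

# Toy stand-in for a sentence-embedding model, just to make the
# two-pass flow runnable; the post uses a real pgvector-backed model.
def embed(text: str) -> dict:
    return {t: 1.0 for t in text.lower().split()}

def sim(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

menus = [
    {"name": "seafood bistro", "dishes": ["grilled salmon", "lemon shrimp pasta"]},
    {"name": "dessert bar", "dishes": ["chocolate cake", "lemon tart"]},
]

# Pass 1 index: menu-level embeddings are precomputed and stored.
menu_index = {m["name"]: embed(" ".join(m["dishes"])) for m in menus}

def search(query: str, top_menus: int = 1, top_dishes: int = 1) -> list[str]:
    q = embed(query)
    best = sorted(menu_index, key=lambda n: -sim(q, menu_index[n]))[:top_menus]
    # Pass 2: embed only the surviving ~50-100 dishes at runtime
    # instead of storing a dish-level HNSW index.
    dishes = [d for m in menus if m["name"] in best for d in m["dishes"]]
    return sorted(dishes, key=lambda d: -sim(q, embed(d)))[:top_dishes]

print(search("lemon tart"))
```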

December 1, 2024 · (updated July 31, 2025) · 6 min · 1111 words · Michal Piekarczyk