The local sediment

Recently went to check out some of the rock faces in New Haven. Nice rocks! Cool dirt too! And in the neighborhood, across the street from the local rock climbing gym, were some additional practice rocks.

March 8, 2025 · (updated April 6, 2025) · 1 min · 36 words · Michal Piekarczyk

Chopin Etude op 10 no 1 progress with voice memo and ffprobe and chatgpt

I’ve been recording my progress on Chopin Etude op 10 no 1 for a few years and figured I’d see where I’m at now. Asked chatgpt for help. I’ve been recording to .m4a with the Mac voice memo app. I just downloaded my files. I had been using a more or less consistent file naming scheme. Chatgpt generated a quick script for polars (because I had stopped using pandas about a year ago)....

February 22, 2025 · (updated March 1, 2025) · 3 min · 547 words · Michal Piekarczyk

build ground truth golden dataset for comparing embedding models faster with chromadb

Initially, thinking that I wanted to create this ground truth dataset quickly, I started out by having a for loop and sampling data from my giant dataset of documents, looking for matches to input queries, but this ended up being pretty slow and tedious. Today I switched to just setting up a local index using chromadb, and this ended up being extremely fast because I am not having to redo the embedding....

February 1, 2025 · (updated February 2, 2025) · 2 min · 255 words · Michal Piekarczyk

model-metric-updates

DRAFT This is an update to an earlier post [1], in which the goal is to increase the size of a golden dataset in order to help compare the query performance between two embedding models, all-MiniLM-L12-v2 and all-mpnet-base-v2. The comparison is important because they have 384 and 768 dimensions respectively, meaning that the second one has twice the storage cost of the first, and live postgresql storage is expensive not just for the vectors themselves but also for the hnsw indexes involved 😅....
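
The "twice the storage" point can be checked with quick arithmetic. This is only a rough sketch: it assumes each dimension is stored as a 4-byte float32, it ignores row and hnsw index overhead, and the corpus size `N_VECTORS` is a made-up number for illustration, not a figure from the post.

```python
# Rough storage arithmetic for two embedding sizes.
# Assumption: each dimension is a 4-byte float32; row and index overhead are ignored.
BYTES_PER_FLOAT32 = 4

MODEL_DIMS = {
    "all-MiniLM-L12-v2": 384,
    "all-mpnet-base-v2": 768,
}

N_VECTORS = 1_000_000  # hypothetical corpus size, for illustration only


def raw_vector_bytes(dimensions: int) -> int:
    """Bytes needed for the raw float32 values of one embedding."""
    return dimensions * BYTES_PER_FLOAT32


def raw_storage_mib(dimensions: int, n_vectors: int) -> float:
    """Raw storage in MiB for n_vectors embeddings of the given size."""
    return raw_vector_bytes(dimensions) * n_vectors / (1024 ** 2)


for name, dims in MODEL_DIMS.items():
    per_vector = raw_vector_bytes(dims)
    total = raw_storage_mib(dims, N_VECTORS)
    print(f"{name}: {per_vector} bytes per vector, {total:.1f} MiB for {N_VECTORS} vectors")
```

Under these assumptions the 768-dimension model needs exactly double the raw bytes per vector, before any index overhead is counted.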

February 1, 2025 · (updated July 30, 2025) · 3 min · 464 words · Michal Piekarczyk

fun pool embedding bug

Recently, I had been interested in locally reproducing the typesense huggingface models on my laptop. I want to experiment with the https://typesense.org nodes, but I also want to be able to use the same embedding models on my laptop for local development. I noticed that the models in the typesense section of hugging face are in the model.onnx format, which I had not encountered before. I learned how to get them running locally and I was able to confirm that the vectors on a typesense cluster I was running matched the vectors I generated locally....

January 27, 2025 · (updated June 2, 2025) · 9 min · 1795 words · Michal Piekarczyk