The local sediment
Recently went to check out some of the rock faces in New Haven. Nice rocks! Cool dirt too! And in the neighborhood, across the street from the local rock climbing gym, were some additional practice rocks.
I’ve been recording my progress on Chopin’s Etude Op. 10 No. 1 for a few years now and figured I’d see where I’m at. I asked ChatGPT for help. I’ve been recording to .m4a with the Mac Voice Memos app. I just downloaded my files. I had been using a more or less consistent file naming scheme. ChatGPT generated a quick script for polars (because I had stopped using pandas about a year ago)....
Initially, thinking that I wanted to create this ground truth data set quickly, I started out with a for loop, sampling data from my giant data set of documents and looking for matches to input queries, but this ended up being pretty slow and tedious. Today I switched to just setting up a local index using Chroma DB, and this ended up being extremely fast because I don't have to redo the embedding....
DRAFT This is an update to an earlier post¹, in which the goal is to increase the size of a golden dataset to help compare the query performance of two embedding models, all-MiniLM-L12-v2 and all-mpnet-base-v2. The comparison is important because they have 384 and 768 dimensions respectively, meaning the second has twice the storage cost of the first, and live PostgreSQL storage is expensive not just for the vectors themselves but also for the HNSW indexes involved 😅....
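As a rough back-of-the-envelope sketch of the storage difference between 384 and 768 dimensions: the snippet below assumes each dimension is stored as a 4-byte float32 and ignores per-row overhead and index size entirely. The row count and the helper name `raw_vector_bytes` are made up for illustration.

```python
# Back-of-the-envelope vector storage estimate.
# Assumption: each dimension is a 4-byte float32; per-row overhead
# and HNSW index size are NOT included.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(dimensions: int, rows: int) -> int:
    """Raw bytes needed to store `rows` vectors of `dimensions` floats."""
    return dimensions * BYTES_PER_FLOAT32 * rows


ROWS = 1_000_000  # hypothetical corpus size, for illustration only

small = raw_vector_bytes(384, ROWS)  # all-MiniLM-L12-v2
large = raw_vector_bytes(768, ROWS)  # all-mpnet-base-v2

print(f"all-MiniLM-L12-v2: {small / 1e9:.2f} GB")
print(f"all-mpnet-base-v2: {large / 1e9:.2f} GB")
print(f"ratio: {large / small:.1f}x")
```

The ratio stays at 2x regardless of the row count, since raw vector size scales linearly with the number of dimensions.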
Recently, I had been interested in locally reproducing the Typesense Hugging Face models on my laptop. I want to experiment with the https://typesense.org nodes, but I also want to be able to use the same embedding models on my laptop for local development. I noticed that the models in the Typesense section of Hugging Face are in the ONNX format (model.onnx), which I had not encountered before. I learned how to get them running locally and I was able to confirm that the vectors on a Typesense cluster I was running matched vectors I generated locally....