michal.piekarczyk.xyz

model-metric-updates

TODO

fun pool embedding bug

Recently, I have been interested in reproducing the Typesense Hugging Face models locally on my laptop. I want to experiment with the https://typesense.org nodes, but I also want to use the same embedding models on my laptop for local development. I noticed that the models in the Typesense section of Hugging Face are in the ONNX format (model.onnx), which I had not encountered before. I learned how to get them running locally, and I was able to confirm that the vectors from a Typesense cluster I was running matched the vectors I generated locally....

objective comparison of embedding models for your use case

Recently, I reached the point in a project where I started looking into Typesense as an option for hosting embeddings for search. Before that, I had been working with one particular embedding model, all-mpnet-base-v2, which intuitively and anecdotally performed decently well for my retrieval task. But yeah, that was the problem: my information was anecdotal and cherry-picked. And when I started looking into Typesense, I noticed my model of choice was not in the list, https://huggingface....

postgis and the order by cosine distance

A bit tricky, and not super intuitive in the docs, but I finally found how to build a Typesense query to do what I had previously done with Postgres: first constrain by PostGIS distance, then order by cosine distance. With a text_embedding column that embeds a text column, and lat/lng available as well, flattened on the same collection, it is possible to query like so: { "q": query, "query_by": "text_embedding", "filter_by": f"location:({lat}, {lng}, {radius_km} km)", "sort_by": "_vector_distance:asc", "exclude_fields": "text_embedding", "page": 1, "per_page": 100 }
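
The query above can be wrapped in a small helper. This is only a sketch: the function name build_search_params and the example coordinates are made up for illustration, and it assumes the collection has a geopoint field named location and an auto-embedding field named text_embedding, as in the query above.

```python
def build_search_params(query, lat, lng, radius_km, page=1, per_page=100):
    """Build Typesense search parameters: geo filter first, then vector ordering.

    Hypothetical helper; field names follow the query shown in the post.
    """
    return {
        "q": query,
        # Searching by the embedding field makes this a semantic search.
        "query_by": "text_embedding",
        # Keep only documents within radius_km of (lat, lng).
        "filter_by": f"location:({lat}, {lng}, {radius_km} km)",
        # Closest vectors first.
        "sort_by": "_vector_distance:asc",
        # Do not send the (large) embedding vectors back in the response.
        "exclude_fields": "text_embedding",
        "page": page,
        "per_page": per_page,
    }


# Example values are placeholders, not from a real dataset.
params = build_search_params("coffee shop", 40.7, -74.0, 5)
print(params["filter_by"])
```

With the typesense Python client, a dict like this would typically be passed to client.collections["<collection name>"].documents.search(params); the collection name is whatever you created, not something fixed by this sketch.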