How to query pgvector data leveraging multiple indexes

First, I created an embedding table using LangChain's pgvector integration (TODO: show that example). Here is the initial query that worked. It uses a chosen_embedding, taken from a UUID I picked at random from the vector table, and orders by the cosine distance operator <=>: explain analyze with myblah(chosen_embedding, chosen_id) as ( values ( (SELECT embedding FROM langchain_pg_embedding WHERE id = '280aefd0-cb15-4a54-924d-aab37ee8a816' ), '280aefd0-cb15-4a54-924d-aab37ee8a816') ) SELECT substr(id, 0, 8) as id, substr(document, 0, 40) as doc, round( 1 - cast(embedding <=> chosen_embedding as numeric), 3 ) as score, cmetadata->'name' as name FROM langchain_pg_embedding, myblah WHERE id !...
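As a rough illustration of what the score column computes: pgvector's <=> operator returns cosine distance (1 minus cosine similarity), so 1 - (embedding <=> chosen_embedding) gives back the cosine similarity, and ordering by <=> ascending puts the most similar rows first. This is a minimal plain-Python sketch of that arithmetic on hypothetical toy vectors; the function names and data are made up for illustration and are not part of pgvector or LangChain.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, the quantity pgvector's <=> returns: 1 - cos(a, b).

    Assumes non-zero vectors (a zero norm would divide by zero here).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def rank_by_similarity(chosen, rows):
    """Mimic ORDER BY embedding <=> chosen_embedding plus the rounded score column.

    rows maps a hypothetical id to its embedding.
    """
    # Ascending cosine distance is the same as descending cosine similarity.
    ordered = sorted(rows.items(), key=lambda item: cosine_distance(item[1], chosen))
    # score = round(1 - distance, 3), i.e. the cosine similarity to 3 decimals.
    return [(row_id, round(1 - cosine_distance(vec, chosen), 3)) for row_id, vec in ordered]


if __name__ == "__main__":
    toy_rows = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(rank_by_similarity([1.0, 0.0], toy_rows))
    # prints [('a', 1.0), ('c', 0.707), ('b', 0.0)]
```

A row identical to the chosen one scores 1.0, an orthogonal one scores 0.0, and everything in between lands in that range for these non-negative toy vectors.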

September 14, 2024 · (updated September 16, 2024) · 2 min · 233 words · Michal Piekarczyk

string to int conversion nulling

I had an interesting situation where I wanted to plot some numbers nested inside struct columns. They were row counts in a Delta table history output. When I tried to plot them, my plot treated them as categories. Realizing they were strings, I cast them to integers, but then I got nulls. After a bit of trial and error, I realized they were probably larger than 32-bit!...
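A minimal sketch of why the nulls show up, assuming Spark's non-ANSI cast behavior (ANSI mode off), where a string-to-int cast that overflows the signed 32-bit range quietly yields null instead of raising an error. The helper `lenient_cast` is hypothetical and only models the overflow part of that behavior for plain integer strings; it is not a Spark API.

```python
def lenient_cast(value, bits=32):
    """Hypothetical helper approximating a non-ANSI string-to-integer cast.

    Returns None (standing in for Spark's null) when the string is not an
    integer, or when the number does not fit in a signed integer of `bits` bits.
    bits=32 corresponds to int/IntegerType, bits=64 to bigint/LongType.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    try:
        number = int(value.strip())
    except (AttributeError, ValueError):  # None or non-numeric input
        return None
    return number if lo <= number <= hi else None


if __name__ == "__main__":
    row_count = "3000000000"  # hypothetical row count, larger than 2**31 - 1
    print(lenient_cast(row_count))           # None: overflows 32 bits
    print(lenient_cast(row_count, bits=64))  # 3000000000: fits in 64 bits
```

In PySpark, the corresponding fix would be casting to a 64-bit type, for example `some_col.cast("bigint")` (where `some_col` is a hypothetical Column) rather than `"int"`.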

September 13, 2024 · (updated September 16, 2024) · 1 min · 118 words · Michal Piekarczyk

postgresql, pgvector and indexing

Placeholder: I didn't get a chance to write this up yet, but I had used LangChain's pgvector integration to add embeddings to PostgreSQL. When I ran my queries, I noticed they were slow. I read about pgvector indexing and, in psql, noticed my embedding column was missing an index! I tried adding the HNSW index manually and got a weird error about the vector column having no dimensions. OK, lesson learned: the column needs an explicit dimension. I added one, and after that, adding the index worked....
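A minimal sketch of the two DDL steps described above, written as a Python helper that only builds the SQL strings. Assumptions: the table and column are LangChain's `langchain_pg_embedding.embedding`, and 1536 dimensions is just an example value; use whatever size your embedding model actually produces. `hnsw_index_ddl` is a hypothetical helper, not part of pgvector or LangChain.

```python
def hnsw_index_ddl(table, column, dimensions, opclass="vector_cosine_ops"):
    """Hypothetical helper: build the SQL to pin a vector column's size and index it.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return [
        # An unconstrained `vector` column has no dimensions, which HNSW rejects,
        # so first give it a fixed size such as vector(1536).
        f"ALTER TABLE {table} ALTER COLUMN {column} TYPE vector({dimensions});",
        # vector_cosine_ops serves queries that ORDER BY the <=> cosine distance operator.
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass});",
    ]


if __name__ == "__main__":
    # 1536 is an assumed example size, not necessarily what your model outputs.
    for statement in hnsw_index_ddl("langchain_pg_embedding", "embedding", 1536):
        print(statement)
```

The statements can be run in psql or through a client such as psycopg's `cursor.execute`. The ALTER should fail if any existing row has a different number of dimensions, so the size needs to match the stored embeddings.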

September 7, 2024 · (updated September 9, 2024) · 1 min · 101 words · Michal Piekarczyk

Mini survey of some high-level deep learning curiosities.

Motivation: I know more about Databricks for distributed computing, but there is also the Kubernetes world, so let's read some more on that. I was wondering: I have used TensorFlow and PyTorch on standalone VMs in the past, and I know that Spark has its own distributed libraries, so do TensorFlow and PyTorch natively support Spark? Lingering questions: (1) What are the dominant technologies for distributed computing? (Spark, and containerization with K8s.) (2) Specifically, GPU vs cluster based: are these alternatives?...

September 5, 2024 · (updated September 6, 2024) · 8 min · 1574 words · Michal Piekarczyk

read iphone temp

My iPhone has been overheating lately 🤔. I read one good tip (I forget where): try to avoid using your phone while charging it if it is starting to overheat, and if it is really heating up, stop charging it just in case. Makes sense; going to try that next time. But what is the temperature anyway? Nice, I learned from https://www.guidingtech.com/how-to-check-iphone-temperature/ that you can access your iPhone's temperature through the Privacy settings, under Analytics data....

August 25, 2024 · 4 min · 716 words · Michal Piekarczyk