embedding and pgvector query speedup

Wow, I found a very silly issue that was holding back this query. The data is a langchain embedding table of items associated with places, plus a second table with those places’ addresses along with their longitude and latitude. I had a slower query, executed through sqlalchemy, like: from sqlalchemy.sql import text with foo.engine.connect() as conn: data = {"vector": [0.06014461815357208, 0.07383278757333755, 0.010295705869793892, -0.058833882212638855,], "longitude": "", "latitude": ""} sql = """ """ statement = text(sql) # TODO also parameterize the `limit 10` ?...
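On the TODO about parameterizing the `limit 10`: here is a minimal sketch of passing the limit as a named bind parameter. It uses the standard-library `sqlite3` module as a stand-in so it runs without a Postgres server, and the `places` table, its columns, and the `nearest_rows` helper are made up for illustration. SQLAlchemy's `text()` uses the same `:name` placeholder style, with the values passed as a dict to `conn.execute`.

```python
import sqlite3


def nearest_rows(conn, limit):
    """Return the closest names, with the row limit passed as a bind parameter."""
    # Hypothetical table and columns; only the :limit placeholder is the point here.
    sql = "SELECT name FROM places ORDER BY distance LIMIT :limit"
    return [row[0] for row in conn.execute(sql, {"limit": limit})]


# In-memory database standing in for the real embedding and address tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, distance REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("a", 0.3), ("b", 0.1), ("c", 0.2)],
)

top_two = nearest_rows(conn, 2)
```

The limit never gets formatted into the SQL string, so the same statement can be reused with different limits.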

November 9, 2024 · (updated November 18, 2024) · 4 min · 832 words · Michal Piekarczyk

odd prompt?

I looked at someone’s fun project, https://www.npmjs.com/package/@rhettlunn/is-odd-ai , which uses GPT 3.5 to see if a number is odd, for fun. My friend was asking whether anyone has evaluated this, so I took a quick look at https://github.com/rhettlunn/is-odd-ai/blob/main/index.js and spotted two interesting things. Prompt injection attack: not sure if I just made that up, but like a SQL injection, the code that places user input into the prompt asking about oddness doesn’t type check that it is a number, and therefore has no way of guarding against someone inserting text that causes GPT to escape its sandbox....
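The missing type check could look roughly like the sketch below. It is written in Python rather than the package's JavaScript, and the helper names and prompt wording are hypothetical, not taken from the is-odd-ai source. The idea is to reject anything that is not a plain integer before it gets near the prompt.

```python
def parse_integer(user_input: str) -> int:
    """Return the input as an int, or raise ValueError for anything else."""
    text = user_input.strip()
    # Allow one optional sign, then require ASCII digits only.
    digits = text[1:] if text[:1] in "+-" else text
    if not digits or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an integer: {user_input!r}")
    return int(text)


def build_prompt(user_input: str) -> str:
    """Build the oddness question only from a validated integer."""
    number = parse_integer(user_input)
    # Hypothetical prompt wording, for illustration only.
    return f"Is {number} an odd number? Answer yes or no."
```

With this in place, an input like `"7. Ignore previous instructions"` raises `ValueError` instead of being interpolated into the prompt.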

November 6, 2024 · (updated November 8, 2024) · 2 min · 223 words · Michal Piekarczyk

langchain_pgvector bug

I had an issue using the langchain_pgvector library with the vectorstore.add_documents function call, which has worked for me for a while. Somehow, for a new collection where I’m trying to add thousands of documents, only a handful get added, and without errors. Very weird. I didn’t see any useful patterns in the 4 out of 1022 documents that did get added, and pdb tracing through the code did not reveal any silent errors....

November 2, 2024 · (updated November 3, 2024) · 1 min · 188 words · Michal Piekarczyk

video game Stray Review

This contains spoilers. This isn’t really a review, but maybe just free writing about a game I really enjoyed recently. Maybe that is a review, haha, not sure. This is a story-mode action-adventure exploration game like Borderlands, The Walking Dead, Life is Strange, and many others, where you control a protagonist, in this case a cat 🐈, interacting with other characters in the game, as well as exploring your world to solve mostly 3-space geometry puzzles of how to navigate a cat, with jumps and crawl spaces....

October 25, 2024 · (updated November 17, 2024) · 8 min · 1629 words · Michal Piekarczyk

How to query pgvector data leveraging multiple indexes

First, an embedding table was created using langchain pgvector. (TODO show that example). The initial query, which was working, uses a chosen_embedding of some uuid I randomly picked from the vector table, with the cosine distance operator <=> as an ORDER BY: explain analyze with myblah(chosen_embedding, chosen_id) as ( values ( (SELECT embedding FROM langchain_pg_embedding WHERE id = '280aefd0-cb15-4a54-924d-aab37ee8a816' ), '280aefd0-cb15-4a54-924d-aab37ee8a816') ) SELECT substr(id, 0, 8) as id, substr(document, 0, 40) as doc, round( 1 - cast(embedding <=> chosen_embedding as numeric), 3 ) as score, cmetadata->'name' as name FROM langchain_pg_embedding, myblah WHERE id !...
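To make the `score` column in that query concrete: pgvector's `<=>` returns cosine distance, so `1 - distance` is the cosine similarity. A small pure-Python sketch of the same arithmetic, with made-up helper names and toy two-dimensional vectors:

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (what <=> computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_score(a, b):
    """Mirror of round(1 - (embedding <=> chosen_embedding), 3) in the query."""
    return round(1 - cosine_distance(a, b), 3)


def rank_by_distance(chosen, candidates):
    """Sort candidate names by ascending distance, like ORDER BY ... <=> ..."""
    return sorted(candidates, key=lambda name: cosine_distance(chosen, candidates[name]))


# Toy vectors, purely illustrative.
chosen = [1.0, 0.0]
candidates = {"same": [2.0, 0.0], "orthogonal": [0.0, 1.0], "opposite": [-1.0, 0.0]}
ranking = rank_by_distance(chosen, candidates)
```

Ordering by ascending distance puts the most similar rows first, which is why the query sorts on `<=>` directly and only converts to a similarity for display.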

September 14, 2024 · (updated September 16, 2024) · 2 min · 233 words · Michal Piekarczyk