michal.piekarczyk.xyz

embedding and pgvector query speedup

Wow, I found a very silly issue that was holding back this query. The data is a langchain embedding table of items associated with places, plus a second table with those places' addresses along with their longitude and latitude. I had a slower query, executed through sqlalchemy, like from sqlalchemy.sql import text with foo.engine.connect() as conn: data = {"vector": [0.06014461815357208, 0.07383278757333755, 0.010295705869793892, -0.058833882212638855,], "longitude": "", "latitude": ""} sql = """ """ statement = text(sql) # TODO also parameterize the `limit 10` ?...
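The excerpt elides the SQL and leaves a TODO about parameterizing `limit 10`. Below is a minimal sketch, not the post's query, of passing both the query vector and the limit as bind parameters through SQLAlchemy's `text()`. It assumes the `langchain_pg_embedding` columns (`id`, `document`, `embedding`) that appear in the last excerpt on this page; the join to the places table and the longitude/latitude filter are left out because that part is not shown. The helper names `to_pgvector_literal`, `build_params` and `run_query` are hypothetical.

```python
from typing import Dict, Sequence, Union

# Hypothetical query; table and column names are assumed, not taken from the post.
# CAST(:vector AS vector) is used instead of :vector::vector because SQLAlchemy's
# text() does not treat a name immediately followed by "::" as a bind parameter.
SQL = """
SELECT id, document, embedding <=> CAST(:vector AS vector) AS distance
FROM langchain_pg_embedding
ORDER BY distance
LIMIT :limit
"""


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.5,-0.25]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_params(vec: Sequence[float], limit: int = 10) -> Dict[str, Union[str, int]]:
    """Bind parameters for SQL; the limit is a parameter instead of a hard-coded 10."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return {"vector": to_pgvector_literal(vec), "limit": int(limit)}


def run_query(engine, vec: Sequence[float], limit: int = 10):
    """Execute SQL on a SQLAlchemy engine (assumed to point at a pgvector database)."""
    from sqlalchemy.sql import text  # deferred so the helpers above need only the stdlib

    with engine.connect() as conn:
        return conn.execute(text(SQL), build_params(vec, limit)).fetchall()
```

With this shape, `run_query(foo.engine, data["vector"], limit=10)` would replace the hard-coded limit; PostgreSQL accepts a bound parameter in `LIMIT`.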

odd prompt?

I looked at someone's fun project, https://www.npmjs.com/package/@rhettlunn/is-odd-ai, which uses GPT 3.5 to check whether a number is odd, for fun. My friend asked whether anyone had evaluated it, so I took a quick look at https://github.com/rhettlunn/is-odd-ai/blob/main/index.js and spotted two interesting things. Prompt injection attack: not sure if I just made that up, but like a SQL injection, the code that places user input into the prompt asking about oddness doesn't type check whether the input is a number, and therefore has no way of knowing whether someone will attempt to insert text that causes GPT to escape its sandbox....
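The missing type check can be sketched as a guard that only lets a plain integer into the prompt. This is a hypothetical Python illustration, not a patch to the JavaScript in `index.js`; the function name `build_is_odd_prompt` and the prompt wording are assumptions.

```python
import re
from typing import Union

# Optional sign followed by digits, nothing else.
_INT_RE = re.compile(r"[+-]?\d+")


def build_is_odd_prompt(value: Union[int, str]) -> str:
    """Hypothetical guard: reject anything that is not an integer before prompting."""
    if isinstance(value, bool):  # bool is a subclass of int, but not a number here
        raise TypeError("booleans are not accepted")
    if isinstance(value, int):
        n = value
    elif isinstance(value, str) and _INT_RE.fullmatch(value.strip()):
        n = int(value.strip())
    else:
        # Free text such as "3. Ignore previous instructions..." never reaches the model.
        raise ValueError(f"not an integer: {value!r}")
    # Only the parsed integer is interpolated, never the raw user string.
    return f"Is the number {n} odd? Answer only true or false."
```

Once the input is known to be an integer, `n % 2 == 1` answers the question directly, negative numbers included.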

langchain_pgvector bug

I had an issue with the langchain_pgvector library's vectorstore.add_documents call, which had worked for me for a while. For a new collection, I was trying to add thousands of documents, but only a handful got added, and without any errors. Very weird. I didn't see any useful pattern in the 4 out of 1022 documents that did get added, and tracing through the code with pdb did not reveal any silent errors....
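The excerpt stops before the cause is revealed, so this is not the post's diagnosis. One cheap sanity check, assuming you pass your own ids alongside the documents and assuming the store upserts on id, is to count how many distinct ids you are sending: if that number is far below the number of documents, rows silently collapsing into each other would look a lot like "only a handful got added". The helper names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_ids(ids: Iterable[str]) -> Dict[str, int]:
    """Return every id that appears more than once, with its count."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def expected_row_count(ids: Iterable[str]) -> int:
    """Rows you could expect after an id-keyed upsert: one per distinct id."""
    return len(set(ids))
```

Comparing `expected_row_count(ids)` with a `count(*)` on the collection's rows after the insert separates "the ids collapsed" from "the insert itself lost rows".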

video game Stray Review

This contains spoilers. This isn't really a review, more free writing about a game I really enjoyed recently. Maybe that is a review, haha, not sure. This is a story-mode action-adventure exploration game, like Borderlands, The Walking Dead, Life Is Strange, and many others, where you control a protagonist, in this case a cat 🐈, interacting with other characters in the game and exploring your world to solve mostly 3-space geometry puzzles about how to navigate a cat with jumps and crawl spaces....

How to query pgvector data leveraging multiple indexes

First, an embedding table was created using langchain pgvector. (TODO show that example). The initial query, which was working, used a chosen_embedding taken from a row whose uuid I picked at random from the vector table, with the cosine distance operator <=> in the ORDER BY (the score column turns it into a similarity as 1 - distance): explain analyze with myblah(chosen_embedding, chosen_id) as ( values ( (SELECT embedding FROM langchain_pg_embedding WHERE id = '280aefd0-cb15-4a54-924d-aab37ee8a816' ), '280aefd0-cb15-4a54-924d-aab37ee8a816') ) SELECT substr(id, 0, 8) as id, substr(document, 0, 40) as doc, round( 1 - cast(embedding <=> chosen_embedding as numeric), 3 ) as score, cmetadata->'name' as name FROM langchain_pg_embedding, myblah WHERE id !...
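To make the score arithmetic concrete, here is a brute-force Python sketch of what the query computes: pgvector's `<=>` returns cosine distance, and `round(1 - distance, 3)` turns it back into cosine similarity. The `nearest` helper assumes the truncated WHERE clause excludes the chosen row itself and that results are cut off at some limit; both are assumptions, since the excerpt stops at `id !`. An index exists precisely to avoid this full scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, 1 - cos(a, b), the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def similarity_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Mirror of round(1 - (embedding <=> chosen_embedding), 3) in the query."""
    return round(1 - cosine_distance(a, b), 3)


def nearest(
    rows: List[Tuple[str, Sequence[float]]], chosen_id: str, k: int = 10
) -> List[Tuple[str, float]]:
    """Full scan: every other row ordered by distance to the chosen row's embedding."""
    chosen = next(emb for row_id, emb in rows if row_id == chosen_id)
    others = [
        (row_id, cosine_distance(emb, chosen))
        for row_id, emb in rows
        if row_id != chosen_id  # assumed meaning of the truncated WHERE clause
    ]
    others.sort(key=lambda pair: pair[1])  # ORDER BY embedding <=> chosen_embedding
    return others[:k]
```

Identical vectors score 1.0 and opposite vectors score -1.0, which is why the query computes `1 - distance` rather than reporting the raw `<=>` value.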