I was working through some odd window function behavior. While analyzing some feature drift data, I wanted the min, max, and mean drift values for each feature, partitioning on `compare_date`. A plain group-by would have covered that, but I also wanted the `baseline_date` corresponding to the largest drift score, so I took the approach below. It gave me some strange results.
```python
from pyspark.sql.window import Window
import pyspark.sql.functions as F

toy_df = spark.createDataFrame(
    [
        {'feature': 'feat1', 'category': 'cat1', 'Drift score': 0.0, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat1', 'category': 'cat1', 'Drift score': 0.0, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat1', 'category': 'cat1', 'Drift score': 0.0, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat2', 'category': 'cat1', 'Drift score': 0.16076398135644604, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat2', 'category': 'cat1', 'Drift score': 0.07818495131083669, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat2', 'category': 'cat1', 'Drift score': 0.07164427544566881, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat3', 'category': 'cat1', 'Drift score': 0.2018208744775895, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat3', 'category': 'cat1', 'Drift score': 0.06897468871439233, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat3', 'category': 'cat1', 'Drift score': 0.07111383432227428, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat5', 'category': 'cat1', 'Drift score': 0.20151850543660316, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat5', 'category': 'cat1', 'Drift score': 0.05584133483840621, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat5', 'category': 'cat1', 'Drift score': 0.056223672793567, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat6', 'category': 'cat1', 'Drift score': 0.10648175064912868, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat6', 'category': 'cat1', 'Drift score': 0.03398787644288803, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat6', 'category': 'cat1', 'Drift score': 0.027693531284292805, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat7', 'category': 'cat1', 'Drift score': 0.12696742943404185, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat7', 'category': 'cat1', 'Drift score': 0.07147622765870758, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
        {'feature': 'feat7', 'category': 'cat1', 'Drift score': 0.07478091185430771, 'group': 'blah', 'baseline_date': '20220731', 'compare_date': '20230131'},
        {'feature': 'feat8', 'category': 'cat2', 'Drift score': 0.11779958630386245, 'group': 'blah', 'baseline_date': '20191231', 'compare_date': '20230131'},
        {'feature': 'feat8', 'category': 'cat2', 'Drift score': 0.04240444683921199, 'group': 'blah', 'baseline_date': '20220131', 'compare_date': '20230131'},
    ]
)
toy_df.show()
```

```
+--------------------+-------------+--------+------------+-------+-----+
|         Drift score|baseline_date|category|compare_date|feature|group|
+--------------------+-------------+--------+------------+-------+-----+
|                 0.0|     20191231|    cat1|    20230131|  feat1| blah|
|                 0.0|     20220131|    cat1|    20230131|  feat1| blah|
|                 0.0|     20220731|    cat1|    20230131|  feat1| blah|
| 0.16076398135644604|     20191231|    cat1|    20230131|  feat2| blah|
| 0.07818495131083669|     20220131|    cat1|    20230131|  feat2| blah|
| 0.07164427544566881|     20220731|    cat1|    20230131|  feat2| blah|
|  0.2018208744775895|     20191231|    cat1|    20230131|  feat3| blah|
| 0.06897468871439233|     20220131|    cat1|    20230131|  feat3| blah|
| 0.07111383432227428|     20220731|    cat1|    20230131|  feat3| blah|
| 0.20151850543660316|     20191231|    cat1|    20230131|  feat5| blah|
| 0.05584133483840621|     20220131|    cat1|    20230131|  feat5| blah|
|   0.056223672793567|     20220731|    cat1|    20230131|  feat5| blah|
| 0.10648175064912868|     20191231|    cat1|    20230131|  feat6| blah|
| 0.03398787644288803|     20220131|    cat1|    20230131|  feat6| blah|
|0.027693531284292805|     20220731|    cat1|    20230131|  feat6| blah|
| 0.12696742943404185|     20191231|    cat1|    20230131|  feat7| blah|
| 0.07147622765870758|     20220131|    cat1|    20230131|  feat7| blah|
| 0.07478091185430771|     20220731|    cat1|    20230131|  feat7| blah|
| 0.11779958630386245|     20191231|    cat2|    20230131|  feat8| blah|
| 0.04240444683921199|     20220131|    cat2|    20230131|  feat8| blah|
+--------------------+-------------+--------+------------+-------+-----+
```

Applying the window function:

```python
w = Window.partitionBy("group", "feature", "compare_date").orderBy(F.col("Drift score").desc())

(
    toy_df
    .withColumn("mean_score", F.round(F.mean("Drift score").over(w), 4))
    .withColumn("max_score", F.round(F.max("Drift score").over(w), 4))
    .withColumn("min_score", F.round(F.min("Drift score").over(w), 4))
    .withColumn("baseline_date_max_score", F.first("baseline_date").over(w))
    .withColumn("row_num", F.row_number().over(w))
    .where(F.col("row_num") == 1)
    .drop("row_num")
    .select("category", "feature", "compare_date", "mean_score", "max_score", "min_score", "baseline_date_max_score")
    .show()
)
```

```
+--------+-------+------------+----------+---------+---------+-----------------------+
|category|feature|compare_date|mean_score|max_score|min_score|baseline_date_max_score|
+--------+-------+------------+----------+---------+---------+-----------------------+
|    cat1|  feat1|    20230131|       0.0|      0.0|      0.0|               20191231|
|    cat1|  feat2|    20230131|    0.1608|   0.1608|   0.1608|               20191231|
|    cat1|  feat3|    20230131|    0.2018|   0.2018|   0.2018|               20191231|
|    cat1|  feat5|    20230131|    0.2015|   0.2015|   0.2015|               20191231|
|    cat1|  feat6|    20230131|    0.1065|   0.1065|   0.1065|               20191231|
|    cat1|  feat7|    20230131|     0.127|    0.127|    0.127|               20191231|
|    cat2|  feat8|    20230131|    0.1178|   0.1178|   0.1178|               20191231|
+--------+-------+------------+----------+---------+---------+-----------------------+
```

I was confused:
why were the min, max, and mean all the same? My first thought was that my data might be corrupted and some of my partitions only had one row.
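The data was fine, though, so another hypothesis worth checking is the window frame itself. My understanding is that when a window spec has an `orderBy`, Spark defaults the frame to `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which turns `mean`, `max`, and `min` into *running* aggregates. A plain-Python sketch of that hypothesis (the scores here are made up, sorted descending like one partition of the toy data):

```python
# Simulate an ordered window's default frame:
# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
scores = [0.1608, 0.0782, 0.0716]  # one partition, ordered by score desc

running = []
for i in range(1, len(scores) + 1):
    frame = scores[:i]  # rows from partition start up to the current row
    running.append((i, min(frame), max(frame), sum(frame) / len(frame)))

for row_num, mn, mx, mean in running:
    print(row_num, mn, mx, mean)
# For the row with row_number == 1, the frame contains only that row,
# so min, max, and mean all collapse to its own score.
```

If that is indeed the cause, the aggregate columns would need their own full-partition frame, e.g. `w.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)`, while `row_number` keeps the ordered spec.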
...