Port my notes from here https://gist.github.com/namoopsoo/fa903799b958ffc9f279cd293e83e9d9, here https://gist.github.com/namoopsoo/df08c674b4e3e4794e97601682242c51, and here https://gist.github.com/namoopsoo/607f29e923ceaba890588e69293413cf

(updated February 26, 2023) · 1 min · 12 words · Michal Piekarczyk

tools.trace
Debugging exceptions and stack traces: https://github.com/clojure/tools.trace
Dependency: [org.clojure/tools.trace "0.7.9"]
user=> (use 'clojure.tools.trace)

(updated February 26, 2023) · 1 min · 13 words · Michal Piekarczyk

hmm… using is: the library is built in, but you still have to start using it..
boot.user=> (is (= 4 (+ 2 2)))
java.lang.RuntimeException: Unable to resolve symbol: is in this context
clojure.lang.Compiler$CompilerException: java.lang.RuntimeException: Unable to resolve symbol: is in this context, compiling:(/var/folders/7_/sbz867_n7bdcdtdry2mdz1z00000gn/T/boot.user2780891586981282255.clj:1:1)
boot.user=> (use 'clojure.test)
nil
boot.user=> (is (= 4 (+ 2 2)))
true
lein test
Running all tests in a file: lein test module/blah/test_file.py
Running a specific deftest in module_hmm/blah/test_file....

(updated February 26, 2023) · 1 min · 80 words · Michal Piekarczyk

Passing large dataframes with dbutils.notebook.run! At one point, while migrating Databricks notebooks to be usable purely through dbutils.notebook.run, a question came up: dbutils.notebook.run is a great way of calling notebooks explicitly, avoiding the global variables that make code difficult to lint and debug, but what about Spark dataframes? I had come across this nice bit of documentation, https://docs.databricks.com/notebooks/notebook-workflows.html#pass-structured-data, about using the Spark global temp view to shuttle dataframes around by name reference: since a caller notebook and a callee notebook share a JVM, the handoff is theoretically instantaneous....
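The handoff described in that documentation can be sketched roughly as follows. This is a sketch, not the post's actual code: the notebook name "callee_notebook", the view name "shared_df", and the timeout are placeholder assumptions, while createOrReplaceGlobalTempView, dbutils.notebook.run, and spark.table come from the linked Databricks docs.

```python
# Sketch of passing a dataframe by name via a global temp view,
# per https://docs.databricks.com/notebooks/notebook-workflows.html#pass-structured-data

GLOBAL_TEMP_DB = "global_temp"  # Spark's reserved database for global temp views


def qualified_view_name(view_name):
    # Global temp views are always looked up under the global_temp database.
    return f"{GLOBAL_TEMP_DB}.{view_name}"


def caller_side(df, dbutils, view_name="shared_df"):
    # Register the dataframe under a name visible across the shared JVM,
    # then call the notebook, passing only the name, not the data.
    df.createOrReplaceGlobalTempView(view_name)
    dbutils.notebook.run("callee_notebook", 600, {"view_name": view_name})


def callee_side(spark, view_name="shared_df"):
    # Resolve the dataframe back by reference; no data is copied between notebooks.
    return spark.table(qualified_view_name(view_name))
```

The key design point is that only a string crosses the notebook boundary; the dataframe itself never leaves the shared JVM.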

(updated February 26, 2023) · 3 min · 518 words · Michal Piekarczyk

Comparing really large Spark dataframes. I had a use case where I wanted to check whether very large dataframes, multi-million rows and multi-thousand columns, were equal, but the advice online about using df1.subtract(df2) just was not cutting it: it was too slow. It seems to me the df1.subtract(df2) approach is more or less O(n^2), since it effectively compares each row in df1 with each row in df2....
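To see why hashing rows can beat pairwise comparison, here is a minimal plain-Python sketch (not the post's actual Spark code; the names row_digest and frames_equal are made up for illustration): hash each row once, then compare sorted digests, which is O(n log n) overall rather than comparing every row against every other row.

```python
import hashlib


def row_digest(row):
    # Hash one row's values into a fixed-size digest; column order must match
    # between the two frames for digests to be comparable.
    joined = "\x1f".join("" if v is None else str(v) for v in row)
    return hashlib.md5(joined.encode()).hexdigest()


def frames_equal(rows_a, rows_b):
    # Compare the two frames as multisets of row digests: hashing is O(n)
    # and the sort-and-compare is O(n log n), avoiding pairwise row comparison.
    return sorted(map(row_digest, rows_a)) == sorted(map(row_digest, rows_b))
```

In Spark terms the analogous move would be computing a per-row hash column and comparing aggregates of those hashes, so each dataframe is scanned only once.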

(updated February 26, 2023) · 3 min · 463 words · Michal Piekarczyk