Handies

My Rules of text dbutils.widgets
(0) Reading a widget that does not exist results in "com.databricks.dbutils_v1.InputWidgetNotDefined".
(1) "dbutils.widgets.text(name, value)" will set the value of a widget only if it does not already exist. If it already exists, this does nothing.
(2) You cannot change the value of a widget, but you can remove it and then set it again with the same name, with "dbutils.widgets.text(name, value)". However, if a widget was set in cell1, then cell2 cannot both remove and reset the widget....
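A minimal sketch of how rules (0) and (1) might be handled in notebook code. `FakeWidgets` and `get_widget_or_default` are hypothetical names; the fake only mimics the behaviour described in these notes so the example runs outside Databricks. In a notebook you would presumably pass `dbutils.widgets` instead.

```python
class FakeWidgets:
    """Hypothetical stand-in that mimics the rules noted above; not Databricks code."""

    def __init__(self):
        self._values = {}

    def text(self, name, value):
        # Rule (1): only takes effect if the widget does not already exist.
        self._values.setdefault(name, value)

    def get(self, name):
        # Rule (0): reading a widget that does not exist raises an error.
        if name not in self._values:
            raise KeyError(f"InputWidgetNotDefined: {name}")
        return self._values[name]

    def remove(self, name):
        self._values.pop(name, None)


def get_widget_or_default(widgets, name, default):
    """Return the widget value, or `default` if the widget is not defined."""
    try:
        return widgets.get(name)
    except Exception:  # broad on purpose: the real error type is an assumption here
        return default


widgets = FakeWidgets()
widgets.text("env", "dev")
widgets.text("env", "prod")  # ignored, because "env" already exists
print(widgets.get("env"))
print(get_widget_or_default(widgets, "missing", "fallback"))
```

Catching a broad `Exception` is a design shortcut for the sketch; narrowing it to the actual exception type would be safer once that type is confirmed in your environment.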

Do a build
login from shell,
    $(aws --profile my-local-aws-profile ecr get-login --no-include-email --region us-east-1)
build,
    docker build -t name-of-image -f path/to/Dockerfile path/to/docker/context
Run your container
    # run using an image name,
    # note that -v takes an absolute path...
    docker run -i -t -v $(pwd)/local/path:/docker/path <name-of-image>:<tag>
    # or with a specific image id... say "ad6576e"
    docker run -d=false -i -t ad6576e
If you need your container to have your aws creds
Nice hack is to map the “root” user of your container ....
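The note that -v takes an absolute path can be sketched as a small helper that assembles the same "docker run" arguments and resolves the host path first. `docker_run_args` and its parameters are hypothetical names used only for illustration; the function builds the argument list and does not run Docker.

```python
import os


def docker_run_args(image, local_path, container_path, tag="latest"):
    """Build the argument list for an interactive `docker run` with one volume."""
    # -v wants an absolute host path, so resolve it relative to the current directory.
    host_path = os.path.abspath(local_path)
    return [
        "docker", "run", "-i", "-t",
        "-v", f"{host_path}:{container_path}",
        f"{image}:{tag}",
    ]


args = docker_run_args("name-of-image", "local/path", "/docker/path")
print(" ".join(args))
```

The list form can be handed to `subprocess.run(args)` if you want to launch the container from Python.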

F test statistic to evaluate the features
One F test produces a ratio (called an F-value) comparing the variation between two populations’ sample means to the variation within the samples. The greater the variation between the sample means relative to the variation within the samples, the more likely we are to reject the null hypothesis that the samples come from the same source distribution. The higher the F-value, the lower the p-value associated with this test....
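To make the ratio concrete, here is a minimal pure-Python sketch of a one-way ANOVA F-value (between-group mean square over within-group mean square). The function name `f_value` and the sample numbers are made up for illustration; it does not compute the p-value.

```python
def f_value(groups):
    """One-way ANOVA F-value for a list of samples (each a list of numbers)."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total number of observations
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation between the group means, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


overlapping = f_value([[1, 2, 3], [2, 3, 4]])
separated = f_value([[1, 2, 3], [11, 12, 13]])
print(overlapping, separated)
```

The well-separated samples give a much larger F-value than the overlapping ones, which matches the intuition in the paragraph above.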

Appreciate this post for helping to choose between a few available tests for determining whether there are meaningful relationships in feature data. In particular, ANOVA compares two variables, where one is categorical (binning is helpful here) and one is continuous. Chi-square, on the other hand, is useful for comparing two categorical variables. And Pearson Correlation can be used between two continuous variables. But the caveat is that this test assumes both variables are normally distributed, and outliers should be chopped off with some preprocessing....
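The choice described above can be written down as a tiny lookup. This is only a sketch of the rule of thumb in this note; `suggest_test` and the "categorical"/"continuous" labels are hypothetical names, not part of any library.

```python
def suggest_test(kind_a, kind_b):
    """Suggest a test from the kinds of two variables: 'categorical' or 'continuous'."""
    kinds = sorted([kind_a, kind_b])
    if kinds == ["categorical", "categorical"]:
        return "chi-square"
    if kinds == ["categorical", "continuous"]:
        return "ANOVA"
    if kinds == ["continuous", "continuous"]:
        # Caveat from the note: assumes both variables are normally distributed.
        return "Pearson correlation"
    raise ValueError(f"unknown variable kinds: {kind_a!r}, {kind_b!r}")


print(suggest_test("continuous", "categorical"))
```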

get the parents of a <blah-branch>
    git rev-list --parents <blah-commit>
    beaaaafffff1111111111111111111 fe0000aaaad111111111111111
Each output line starts with the commit's own hash, followed by the hashes of its parents, so one of them will typically be the hash of <blah-branch> itself
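A small sketch for splitting one line of that output into the commit and its parents. `parse_parents_line` is a hypothetical helper, and the sample line reuses the placeholder hashes from the output above.

```python
def parse_parents_line(line):
    """Split one `git rev-list --parents` output line into (commit, [parents])."""
    # The first field is the commit itself; any remaining fields are its parents.
    commit, *parents = line.split()
    return commit, parents


commit, parents = parse_parents_line(
    "beaaaafffff1111111111111111111 fe0000aaaad111111111111111"
)
print(commit, parents)
```

A merge commit would simply yield more than one entry in `parents`.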