- my backprop SGD from scratch 2022-Aug
- 14:13 ok, reviewing from last time.
- Yea so I had switched from relu to sigmoid on commit b88ef76daf, but yea, the log loss is still going up during training. At least that change for sure got rid of the earlier bug, where it did not make sense to map the relu output into a sigmoid: a relu only produces non-negative numbers, so the sigmoid on top of it could only ever produce values of 0.5 or above anyway.
- So at this point one thought I have for sure is whether this network is just one layer more complicated than would be needed for a problem set this simple. The thought arose after seeing the weight output from the last training run.
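Just to convince myself of that relu-into-sigmoid point, here is a tiny standalone check (my own sketch, not code from this repo): sigmoid of any relu output is pinned at 0.5 or above.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# relu(z) >= 0 everywhere, and sigmoid(0) = 0.5 with sigmoid increasing,
# so sigmoid(relu(z)) can never go below 0.5.
z = np.linspace(-10, 10, 201)
out = sigmoid(relu(z))
print(out.min())  # 0.5
assert (out >= 0.5).all()
```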
- But in any case, I think for now I am curious if I can find more bugs.
- So we are underfitting here. The loss is just increasing steadily, and I see the layer 1 and layer 2 weights are increasing steadily as well, which makes me think the two are related.
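One cheap way to confirm that hunch would be to log the weight norms alongside the loss. A minimal sketch, assuming the layer weights can be pulled out of `model`; the `"W1"` / `"W2"` keys here are just my guess at how it is stored:

```python
import numpy as np

# Hypothetical helper: call this once per training step so the growth of the
# weights can be plotted next to the loss curves. The "W1"/"W2" keys are an
# assumption about how `model` stores its two layers.
def record_weight_norms(model, metrics):
    metrics.setdefault("weight_norms", {"layer1": [], "layer2": []})
    metrics["weight_norms"]["layer1"].append(np.linalg.norm(model["W1"]))
    metrics["weight_norms"]["layer2"].append(np.linalg.norm(model["W2"]))
```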
- 16:02 let me try to observe the updates,
```python
import network as n
import dataset
import plot
import runner
import ipdb
import matplotlib.pyplot as plt
import pylab
from collections import Counter
from utils import utc_now, utc_ts

data = dataset.build_dataset_inside_outside_circle(0.5)
parameters = {"learning_rate": 0.01,
              "steps": 50,
              "log_loss_every_k_steps": 10}
runner.train_and_analysis(data, parameters)
```
- 17:26 ah, spotted one silly bug in tracking the metrics: the train and validation losses I was logging were flipped,
```python
if step % log_loss_every_k_steps == 0:
    _, total_loss = loss(model, data.X_validation, data.Y_validation)
    metrics["train"]["loss_vec"].append(total_loss)

    _, total_loss = loss(model, data.X_train, data.Y_train)
    metrics["validation"]["loss_vec"].append(total_loss)
```
Fixed now, so it is:
```python
if step % log_loss_every_k_steps == 0:
    _, total_loss = loss(model, data.X_validation, data.Y_validation)
    metrics["validation"]["loss_vec"].append(total_loss)

    _, total_loss = loss(model, data.X_train, data.Y_train)
    metrics["train"]["loss_vec"].append(total_loss)
```
A bug indeed, but one that would not affect the training itself.
- Ok, I think a good thing to do next is to continue my low level debugging: as I calculate `g`, if gradient descent is working properly, then I should be able to write an assert that the loss, at least for the single example, decreases after applying the `g` update, otherwise something is wrong! (Sketched just below.)
- 19:27 ok, to check this I then have to calculate the loss on the micro-batch I'm using here,
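Roughly, the check I have in mind looks like this (a sketch only: `compute_gradient` and `apply_update` are placeholder names for however the real code builds `g` and applies it, and `n.loss` is used the same way as in the snippet below):

```python
# Sketch of the per-example sanity check; `compute_gradient` and `apply_update`
# are placeholders, not real functions from network.py. Assumes x, y, model,
# and learning_rate are already defined in the session.
x_batch = x.reshape((1, -1))
y_batch = y.reshape((1, 1))

_, loss_before = n.loss(model, x_batch, y_batch)

g = compute_gradient(model, x_batch, y_batch)   # placeholder
model = apply_update(model, g, learning_rate)   # placeholder

_, loss_after = n.loss(model, x_batch, y_batch)

# With a small enough learning rate, one gradient step on this single example
# should not make that same example's loss worse.
assert loss_after <= loss_before, (loss_before, loss_after)
```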
- ok, first here is how I would reshape a single example to obtain its loss,
```python
i = 0
x, y = data.X_train[i], data.Y_train[i]
x.shape, y.shape

Y_actual, total_loss = n.loss(model, x.reshape((1, -1)), y.reshape((1, 1)))
print("(x, y)", (x, y))
print("Y_actual", Y_actual)
print("loss", total_loss)
```
```
(x, y) (array([ -7.55637702, -12.67353685]), 1)
Y_actual [0.93243955]
loss 0.06995095896007311
```
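Quick cross-check of that number by hand (assuming the loss is the usual log loss / binary cross-entropy): for a positive example it should just be -ln(y_hat), and indeed -ln(0.93243955) ≈ 0.069951, matching the printed loss.

```python
import math

# Binary cross-entropy for y = 1 reduces to -ln(y_hat)
y_hat = 0.93243955
print(-math.log(y_hat))  # ~0.0699510
```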
- And side note, I realized technically I'm not plotting the training loss, since the training set has 9,000 rows and I'm only really using 500 or so of them so far. So I will adjust the training loss calculation to cover specifically the portion I use (roughly as sketched below).
- 20:27 ok cool, going to try out this new code where I also now am logging the before and after loss for each microbatch
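For the train-loss adjustment mentioned above, something like this is what I mean (a sketch of the logging branch; it assumes one example is consumed per step, which matches the ~500 rows used across 500 steps, and the slicing is my guess at the runner's internals):

```python
if step % log_loss_every_k_steps == 0:
    _, total_loss = loss(model, data.X_validation, data.Y_validation)
    metrics["validation"]["loss_vec"].append(total_loss)

    # Only score the slice of the training set actually consumed so far,
    # assuming one training example per step (an assumption, see above).
    rows_used = min(step + 1, data.X_train.shape[0])
    _, total_loss = loss(model, data.X_train[:rows_used], data.Y_train[:rows_used])
    metrics["train"]["loss_vec"].append(total_loss)
```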
```python
import network as n
import dataset
import plot
import runner
import ipdb
import matplotlib.pyplot as plt
import pylab
from collections import Counter
from utils import utc_now, utc_ts

data = dataset.build_dataset_inside_outside_circle(0.5)
parameters = {"learning_rate": 0.01,
              "steps": 500,
              "log_loss_every_k_steps": 10}
model, artifacts, metrics = runner.train_and_analysis(data, parameters)
```

```
outer: 100%|█████████████████████████████████████████████████████████████████████| 500/500 [00:12<00:00, 40.35it/s]
saving to 2022-10-03T003158.png
2022-10-03T003158.png
2022-10-03T003159-weights.png
2022-10-03T003200-hist.png
saving to 2022-10-03T003201-scatter.png
2022-10-03T003201-scatter.png
```
And let me look at those micro batch updates then
```
In [8]: metrics["micro_batch_updates"][:5]
Out[8]:
[{'loss_before': 0.43903926069642474,
  'y_actual_before': array([0.64465547]),
  'x': array([-9.44442228,  1.4129736 ]),
  'y': 1,
  'loss_after': 0.43757904199626413,
  'y_actual_after': array([0.6455975])},
 {'loss_before': 1.0263273283159982,
  'y_actual_before': array([0.64167946]),
  'x': array([-3.4136343 , 17.13301918]),
  'y': 0,
  'loss_after': 1.0309406841349795,
  'y_actual_after': array([0.64332871])},
 {'loss_before': 0.4300753021013386,
  'y_actual_before': array([0.65046011]),
  'x': array([-2.26675345, -5.20582749]),
  'y': 1,
  'loss_after': 0.4285424015973017,
  'y_actual_after': array([0.65145797])},
 {'loss_before': 1.0544704530873739,
  'y_actual_before': array([0.65162314]),
  'x': array([ 14.74873303, -16.34664216]),
  'y': 0,
  'loss_after': 1.0598453040833464,
  'y_actual_after': array([0.65349059])},
 {'loss_before': 0.42370781274874675,
  'y_actual_before': array([0.65461512]),
  'x': array([ 1.71615885, -11.0142264 ]),
  'y': 1,
  'loss_after': 0.42217509911520096,
  'y_actual_after': array([0.65561923])}]
```

```python
import matplotlib.pyplot as plt
from utils import utc_now, utc_ts
import pylab

deltas = [x["loss_after"] - x["loss_before"] for x in metrics["micro_batch_updates"]]

with plt.style.context("fivethirtyeight"):
    plt.hist(deltas, bins=50)
    out_loc = f"{utc_ts(utc_now())}-micro-batch-loss-deltas.png"
    print("saving to", out_loc)
    pylab.savefig(out_loc, bbox_inches="tight")
    pylab.close()
    plt.close()

# saving to 2022-10-03T005623-micro-batch-loss-deltas.png
```
- Wow, fascinating, so a lot of the time the loss is getting reduced, at least slightly more often than not, haha,
```
In [17]: from collections import Counter
    ...: Counter(["loss_reduction" if x < 0 else "loss_increase"
    ...:          for x in [y for y in deltas if y != 0]])
Out[17]: Counter({'loss_reduction': 260, 'loss_increase': 240})
```
And
```python
with plt.style.context("fivethirtyeight"):
    plt.plot(deltas)
    out_loc = f"{utc_ts(utc_now())}-micro-batch-loss-deltas-over-steps.png"
    print("saving to", out_loc)
    pylab.savefig(out_loc, bbox_inches="tight")
    pylab.close()
    plt.close()
```
But wow, this next plot is fascinating!
```python
with plt.style.context("fivethirtyeight"):
    fig = plt.figure(figsize=(20, 9))
    plt.plot(deltas, linewidth=0.7)
    plt.title("Microbatch loss_after - loss_before")
    out_loc = f"{utc_ts(utc_now())}-micro-batch-loss-deltas-over-steps.png"
    print("saving to", out_loc)
    pylab.savefig(out_loc, bbox_inches="tight")
    pylab.close()
    plt.close()
```
- So according to the above, yes, the microbatch delta loss is ping-ponging back and forth and basically getting worse across the different microbatch inputs. Wow, so glad I looked at this chronological kind of plot!
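To put a rough number on that ping-ponging, one follow-up I could run on the same `metrics["micro_batch_updates"]` list (just a sketch using the fields already shown above):

```python
import numpy as np

updates = metrics["micro_batch_updates"]
deltas = np.array([u["loss_after"] - u["loss_before"] for u in updates])
labels = np.array([u["y"] for u in updates])

# Net drift of the per-microbatch loss across the whole run
print("sum of deltas:", deltas.sum())

# Split the deltas by class, to see whether the back-and-forth lines up
# with which label each microbatch example happens to have
print("mean delta, y=1 examples:", deltas[labels == 1].mean())
print("mean delta, y=0 examples:", deltas[labels == 0].mean())
```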