- 13:48 quick capture: look back at my network.

- Just going to write up the partial derivative parts of the gradient one at a time.
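- To make the shapes concrete, here's a minimal sketch of those per-term partials, assuming one hidden layer, sigmoid activations, and log loss; all the names here (gradients, W1, b1, ...) are illustrative, not necessarily what network.py uses.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(x, y, W1, b1, W2, b2):
    # Forward pass, keeping intermediates for the chain rule.
    z1 = W1 @ x + b1            # hidden pre-activation
    a1 = sigmoid(z1)            # hidden activation
    z2 = W2 @ a1 + b2           # output pre-activation
    y_hat = sigmoid(z2)         # predicted probability
    # Backward pass, one partial at a time. For sigmoid + log loss the
    # output error collapses nicely: dL/dz2 = y_hat - y.
    dz2 = y_hat - y
    dW2 = np.outer(dz2, a1)     # dL/dW2 = dz2 * a1^T
    db2 = dz2                   # dL/db2
    da1 = W2.T @ dz2            # error pushed back through W2
    dz1 = da1 * a1 * (1 - a1)   # through the hidden sigmoid: sigma' = a * (1 - a)
    dW1 = np.outer(dz1, x)      # dL/dW1
    db1 = dz1                   # dL/db1
    return dW1, db1, dW2, db2
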
- 16:53 ok I have added this to the code now. Let me try that train loop again then,
import network as n
import plot
X, Y = n.build_dataset_inside_outside_circle()
layers = n.initialize_network_layers()
loss_vec, layers = n.train_network(X, Y, layers)
plot.plot_loss_vec(loss_vec)

- 17:12 ok, the output above is pretty interesting. If things are indeed working, the learning rate is probably too high. But haha, in case something actually is working, let me try plotting the predictions after the first round, where I think the loss looks lowest.
layers = n.initialize_network_layers()
loss_vec, layers, artifacts = n.train_network(X, Y, layers, log_loss_each_round=True, steps=10)
layers = artifacts["9"]["model"]
Y_actual, total_loss = n.loss(layers, X, Y)
plot.scatter_plot_by_z(X, Y_actual)

- 17:39 ok haha, that's kind of confusing. The second time around the loss went down and stayed down, and I'm pretty sure I did not re-generate the data. Likely the first time we jumped irrecoverably far from the minimum, and the second time, since the weights are generated randomly, we happened to stay close.
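- To sanity-check the "jumped too far" idea, here's a tiny standalone illustration (not my network code): gradient descent on f(w) = w² diverges as soon as the learning rate exceeds 1.0.
w, lr = 1.0, 1.1
for step in range(5):
    w -= lr * 2 * w    # gradient of w^2 is 2w, so w <- (1 - 2*lr) * w = -1.2 * w
    print(step, w)     # |w| grows every step: -1.2, 1.44, -1.728, 2.07, -2.49
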
- But also for the second round, when plotting some outputs, we clearly see something funky going on. And I suspect that since I have not fixed that whole 95% to 5% dataset imbalance, the loss only appears small because the penalty on the minority class is not shining through.
- 17:47 So the imbalanced dataset is likely messing with both learning and the perception of the loss.
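- Quick back-of-the-envelope check of that (plain numpy, not my network code): on a 95/5 split, a model that constantly predicts p(y=1) = 0.05 already gets a small average log loss, even though it never finds the minority class.
import numpy as np
y = np.array([0] * 95 + [1] * 5)        # 95%/5% imbalance
p = np.full(100, 0.05)                  # constant prediction p(y=1) = 0.05
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_loss)                         # ~0.199: looks "low" despite missing all positives
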
- 20:30 ok, to balance out that data, probably the simplest fix is to generate data where the circle is just bigger.
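- Checking the geometry: the circle covers πr² / (40·40) of the box, so for balance = 0.5 the radius comes out to r = √(1600 · 0.5 / π) ≈ 15.96, which still fits inside the ±20 box (the corners are ≈ 28.3 away).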
- 20:52 ok so I ended up with something like,
# dataset.py
import math
import numpy as np
from collections import Counter

def build_dataset_inside_outside_circle(balance=0.5):
    # Sample points uniformly in a 40x40 box centered at the origin.
    num_samples = 10000
    # Pick the radius so the circle covers `balance` of the box area:
    # pi * r^2 = balance * 40 * 40.
    radius = math.sqrt(40 * 40 * balance / math.pi)
    X = np.random.random((num_samples, 2)) * 40 - 20
    f = lambda a: int(np.sqrt(a[0]**2 + a[1]**2) <= radius)
    Y = np.array(list(map(f, X)))
    # Validate that the realized balance is within 2% of the target.
    assert abs(Counter(Y)[1] / num_samples - balance) < 0.02
    return X, Y

import dataset
import plot
X, Y = dataset.build_dataset_inside_outside_circle(0.5)
plot.scatter_plot_by_z(X, Y) # saving to 2022-08-28T005137-scatter.png

- 20:58 ok let's see what happens with training then,
layers = n.initialize_network_layers()
loss_vec, layers, artifacts = n.train_network(X, Y, layers, log_loss_every_k_steps=10, steps=1000)
outer: 100%|███████████████████████████████████████████████| 1000/1000 [04:45<00:00, 3.51it/s]
plot.plot_loss_vec(loss_vec)
# saving to 2022-08-28T012206.png

- 21:20 ok so this time it definitely took a little more time to train, since I've been measuring log loss every 10 steps on all 10,000 samples; I can use fewer next time to iterate more quickly (see the sketch after these notes).
- Especially since, darn, indeed this time the loss spiraled out of control.
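- Here's the kind of cheaper measurement I mean: a hypothetical helper that evaluates the loss on a random subsample instead of all 10,000 points (subsampled_loss is my illustrative name; it assumes the import network as n and the n.loss signature used above).
import numpy as np

def subsampled_loss(layers, X, Y, k=1000, rng=np.random.default_rng(0)):
    # Evaluate on k random samples instead of the full dataset.
    idx = rng.choice(len(X), size=k, replace=False)
    _, total_loss = n.loss(layers, X[idx], Y[idx])
    return total_loss
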
- Out of curiosity, let me plot the outputs for basically the earliest model,
layers = artifacts["10"]["model"]
Y_actual, total_loss = n.loss(layers, X, Y)
plot.scatter_plot_by_z(X, Y_actual) # saving to 2022-08-28T012619-scatter.png

- ok wow pretty quirky.
- 21:28 ok yea so super curious what reducing the learning rate does then. Added some additional code to support this too (a sketch of the update step follows the run below).
import network as n
model = n.initialize_model({"learning_rate": 0.01})
(
loss_vec, model, artifacts, X_validation, Y_validation, Y_prob
) = n.train_network(X, Y, model)
outer: 100%|███████████████████████████████████████████████████| 60/60 [00:01<00:00, 31.19it/s]
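- For reference, the learning-rate support presumably boils down to scaling the gradient step; a minimal standalone sketch of that kind of update (sgd_step is my illustrative name, not necessarily how network.py structures it):
def sgd_step(weights, grads, learning_rate):
    # In-place gradient descent: w <- w - lr * dL/dw for every parameter.
    # With learning_rate=0.01 each step is 100x smaller than with lr=1.0,
    # which should make the "jump past the minimum" failure less likely.
    for W, dW in zip(weights, grads):
        W -= learning_rate * dW
    return weights
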
- 21:49 ok let's look at a first run then,
plot.plot_loss_vec(loss_vec)
# saving to 2022-08-28T015127.png

plot.scatter_plot_by_z(X_validation, Y_prob) # saving to 2022-08-28T015518-scatter.png

- 21:56 darn okay, still not learning, despite the additional balancing and the lower learning rate. Super curious what the fundamental issue in this network is. Curious to debug this.
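- One concrete thing to try next: compare the hand-derived gradients against a finite-difference estimate. A minimal sketch, assuming a hypothetical closure loss_given_weight(w) that recomputes the loss with a single weight set to w:
def numerical_gradient(loss_given_weight, w, eps=1e-5):
    # Central difference: dL/dw ~= (L(w + eps) - L(w - eps)) / (2 * eps).
    # A large mismatch with the backprop value points at the derivation.
    return (loss_given_weight(w + eps) - loss_given_weight(w - eps)) / (2 * eps)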