- [[my back prop SGD from scratch 2022-Aug]]
- 13:35 yea so last time I had noticed, hey, on a random initialization, why was the `y_prob` `0.5`? I had literally just initialized a new network and got this first example,
  ```
  ipdb> p x, y
  (array([10.31816265, -8.80044688]), 1)
  ```
  while running the `ipdb` debug mode, and inside of `train_network()`, ran
  ```
  --> 186 y_prob = feed_forward(x, model.layers, verbose=False)
  ```
  and got
  ```
  ipdb> p y_prob
  0.5
  ```
- Let me just use `joblib` to save this so I can just test it again. So I ran this inside of my ipdb,
  ```python
  import joblib
  joblib.dump(model, f"{utc_ts(utc_now())}-model.joblib")
  # '2022-09-25T174708-model.joblib'
  ```
- 13:57 And then in another session,
  ```python
  import joblib
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  import numpy as np
  from collections import Counter
  from utils import utc_now, utc_ts
  import network as n
  import dataset
  import plot
  import runner

  model = joblib.load("2022-09-25T174708-model.joblib")
  x = np.array([10.31816265, -8.80044688])  # the same example as above
  y_prob = n.feed_forward(x, model.layers, verbose=False)
  y_prob
  # Out[8]: 0.5
  ```
ok nice, now that I got this reproduced, let me hunt for some more bugs. Maybe this is purely a coincidence?!????!
- So at this moment, for this particular network, the input basically has no effect,
  ```python
  In [12]: [n.feed_forward(x, model.layers, verbose=False)
      ...:  for x in [
      ...:      [10.31816265, -8.80044688],
      ...:      [1, 1],
      ...:      [0, 0],
      ...:      [1e4, -1e4]
      ...:  ]]
  Out[12]: [0.5, 0.5, 0.5, 0.5]
  ```
- ok so currently, the final sum appears to always be negative, and so the `relu(negative_num)` step at the end always produces a `0`, and then after that I have a sigmoid, so for `0`, yea it makes sense the output is the `0.5`.
- but why is the input into that final relu always seeming to be negative then?
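- A quick numeric sanity check of that relu-into-sigmoid story (a minimal standalone sketch, assuming plain relu/sigmoid definitions rather than the actual ones in my `network` module):
  ```python
  import numpy as np

  def relu(z):
      return np.maximum(0, z)

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  # any negative final sum gets clamped to 0 by the relu,
  # and sigmoid(0) is exactly 0.5
  print(sigmoid(relu(-3.7)))    # 0.5
  print(sigmoid(relu(-100.0)))  # 0.5
  ```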
- 15:05 so I think I want to answer a side question: hey, if I create a bunch of random networks, will they all have this weirdness? If not, then this might not be a problem.
- ok so,
  ```python
  from tqdm import tqdm
  import numpy as np
  import network as n
  import dataset
  import plot
  import runner
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  from collections import Counter
  from utils import utc_now, utc_ts

  data = dataset.build_dataset_inside_outside_circle(0.5)
  parameters = {"learning_rate": 0.01,
                "steps": 100,
                "log_loss_every_k_steps": 10}

  outputs = []
  x = np.array([10.31816265, -8.80044688])
  for _ in tqdm(range(10000)):
      model = n.initialize_model(parameters)
      outputs.append(n.feed_forward(x, model.layers, verbose=False))

  plt.hist(outputs, bins=50)
  out_loc = f"{utc_ts(utc_now())}.png"
  print("saving to", out_loc)
  pylab.savefig(out_loc, bbox_inches="tight")
  pylab.close()
  # saving to 2022-09-25T192657.png
  ```
- 15:28 ok so, false alarm in the sense that the network is not always stuck at 0.5.
- And the `0.5` is special only because this is the minimum possible output when the step before the sigmoid happens to be a relu, heh.
- But it maybe does beg the question: hey, my dataset output is either `0` or `1`, so if my minimum is `0.5`, well that's not ideal haha!
- So next I need to, for sure, either scale the output so that `[0.5, 1.0]` maps to `[0, 1.0]`, or otherwise just get rid of that relu altogether, since I can then allow all values to map into that sigmoid, and that would produce the `[0, 1.0]` I need (see the sketch below).
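- Roughly, the two options side by side (just a sketch, reusing the toy `relu`/`sigmoid` from above, not the real `network` code):
  ```python
  def output_scaled(z):
      # option 1: keep the relu; relu-then-sigmoid lands in [0.5, 1.0],
      # so rescale that onto [0, 1.0]
      return 2 * (sigmoid(relu(z)) - 0.5)

  def output_no_relu(z):
      # option 2: drop the final relu; sigmoid alone already covers (0, 1)
      return sigmoid(z)
  ```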
- 17:38 ok so if I take out the final relu, I will also have to adjust the partial derivative calculations (sketched below),
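- For my own reference, the change to the last-layer chain rule, as I understand it (a hedged sketch; `z` here stands for the final pre-activation sum, not necessarily the same symbols as in the code):
  ```python
  # before: y = sigmoid(relu(z)), so dy/dz = sigmoid'(relu(z)) * relu'(z),
  #         and relu'(z) = 0 whenever z < 0, which is what killed the gradient
  # after:  y = sigmoid(z), so dy/dz is just the usual sigmoid derivative
  def dsigmoid_dz(z):
      s = sigmoid(z)
      return s * (1 - s)
  ```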
- 18:19 ok so first I just updated the `feed_forward`, so then now on commit `61465a5`, redoing the above mini test, we have,
  ```python
  outputs = []  # reset, so the pre-fix results don't mix in
  for _ in tqdm(range(10000)):
      model = n.initialize_model(parameters)
      outputs.append(n.feed_forward(x, model.layers, verbose=False))

  plt.hist(outputs, bins=50)
  out_loc = f"{utc_ts(utc_now())}.png"
  print("saving to", out_loc)
  pylab.savefig(out_loc, bbox_inches="tight")
  pylab.close()
  # saving to 2022-09-25T222204.png
  ```
- 18:41 hmm ok so this is still kind of asymmetric, but finally getting values less than `0.5`, so better than before.
- Ok, going to update the partial derivatives too then.
- 18:56 ok let's try this out on commit `ea46849`, having updated the partial derivatives for the weights `w13`, `w14`, which are affected.
  ```python
  import network as n
  import dataset
  import plot
  import runner
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  from collections import Counter
  from utils import utc_now, utc_ts

  data = dataset.build_dataset_inside_outside_circle(0.5)
  parameters = {"learning_rate": 0.01,
                "steps": 50,
                "log_loss_every_k_steps": 10}
  runner.train_and_analysis(data, parameters)
  ```