- [[my back prop SGD from scratch 2022-Aug]]
- 13:35 yea so last time I had noticed, hey, on a random initialization, why was the `y_prob` `0.5`? I had literally just initialized a new network and got this first example,
  ```
  ipdb> p x, y
  (array([10.31816265, -8.80044688]), 1)
  ```
  while running the `ipdb` debug mode, and inside of `train_network()`, ran
  ```
  --> 186 y_prob = feed_forward(x, model.layers, verbose=False)
  ```
  and got
  ```
  ipdb> p y_prob
  0.5
  ```
- Let me just use `joblib` to save this so I can just test it again. So I ran this inside of my ipdb,
  ```python
  import joblib
  joblib.dump(model, f"{utc_ts(utc_now())}-model.joblib")
  # '2022-09-25T174708-model.joblib'
  ```
- 13:57 And then in another session,
  ```python
  import joblib
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  import numpy as np
  from collections import Counter
  from utils import utc_now, utc_ts
  import network as n
  import dataset
  import plot
  import runner

  model = joblib.load("2022-09-25T174708-model.joblib")
  x = np.array([10.31816265, -8.80044688])  # the same example as above
  y_prob = n.feed_forward(x, model.layers, verbose=False)
  y_prob
  # Out[8]: 0.5
  ```
ok nice, now that I got this reproduced, let me hunt for some more bugs. Maybe this is purely a coincidence?!????!
- So at this moment, for this particular network, the input basically has no effect,
  ```python
  In [12]: [n.feed_forward(x, model.layers, verbose=False)
      ...:  for x in [
      ...:      [10.31816265, -8.80044688],
      ...:      [1, 1],
      ...:      [0, 0],
      ...:      [1e4, -1e4]
      ...:  ]]
  Out[12]: [0.5, 0.5, 0.5, 0.5]
  ```
- ok so currently, the final sum appears to always be negative, and so the `relu(negative_num)` step at the end always produces a `0`, and then after that I have a sigmoid, so for `0`, yea it makes sense the output is the `0.5`.
- but why is the input into that final relu always seeming to be negative then?
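- A quick numeric sanity check of that relu-into-sigmoid story (a minimal standalone sketch, assuming plain relu/sigmoid definitions rather than the actual ones in my `network` module):
  ```python
  import numpy as np

  def relu(z):
      return np.maximum(0, z)

  def sigmoid(z):
      return 1 / (1 + np.exp(-z))

  # any negative final sum gets clamped to 0 by the relu,
  # and sigmoid(0) is exactly 0.5
  print(sigmoid(relu(-3.7)))    # 0.5
  print(sigmoid(relu(-100.0)))  # 0.5
  ```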
- 15:05 so I think I want to answer a side question: hey, if I create a bunch of random networks, will they all have this weirdness? If not, then this might not be a problem.
- ok so,
  ```python
  from tqdm import tqdm
  import numpy as np
  import network as n
  import dataset
  import plot
  import runner
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  from collections import Counter
  from utils import utc_now, utc_ts

  data = dataset.build_dataset_inside_outside_circle(0.5)
  parameters = {"learning_rate": 0.01,
                "steps": 100,
                "log_loss_every_k_steps": 10}

  outputs = []
  x = np.array([10.31816265, -8.80044688])
  for _ in tqdm(range(10000)):
      model = n.initialize_model(parameters)
      outputs.append(n.feed_forward(x, model.layers, verbose=False))

  plt.hist(outputs, bins=50)
  out_loc = f"{utc_ts(utc_now())}.png"
  print("saving to", out_loc)
  pylab.savefig(out_loc, bbox_inches="tight")
  pylab.close()
  # saving to 2022-09-25T192657.png
  ```
- 15:28 ok so, false alarm in the sense that the network is not always stuck at 0.5.
- And the `0.5` is special only because this is the minimum possible output when the step before the sigmoid happens to be a relu, heh.
- But it maybe does beg the question: hey, my dataset output is either `0` or `1`, so if my minimum is `0.5`, well that's not ideal haha!
- So next I need to, for sure, either scale the output so that `[0.5, 1.0]` maps to `[0, 1.0]`, or otherwise just get rid of that relu altogether, since I can then allow all values to map into that sigmoid, and that would produce the `[0, 1.0]` I need (see the sketch below).
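- Roughly, the two options side by side (just a sketch, reusing the toy `relu`/`sigmoid` from above, not the real `network` code):
  ```python
  def output_scaled(z):
      # option 1: keep the relu; relu-then-sigmoid lands in [0.5, 1.0],
      # so rescale that onto [0, 1.0]
      return 2 * (sigmoid(relu(z)) - 0.5)

  def output_no_relu(z):
      # option 2: drop the final relu; sigmoid alone already covers (0, 1)
      return sigmoid(z)
  ```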
- 17:38 ok so if I take out the final relu, I will also have to adjust the partial derivative calculations (sketched below),
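- For my own reference, the change to the last-layer chain rule, as I understand it (a hedged sketch; `z` here stands for the final pre-activation sum, not necessarily the same symbols as in the code):
  ```python
  # before: y = sigmoid(relu(z)), so dy/dz = sigmoid'(relu(z)) * relu'(z),
  #         and relu'(z) = 0 whenever z < 0, which is what killed the gradient
  # after:  y = sigmoid(z), so dy/dz is just the usual sigmoid derivative
  def dsigmoid_dz(z):
      s = sigmoid(z)
      return s * (1 - s)
  ```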
- 18:19 ok so first I just updated the `feed_forward`, so then now on commit `61465a5`, redoing the above mini test, we have,
  ```python
  outputs = []  # reset, so the pre-fix results don't mix in
  for _ in tqdm(range(10000)):
      model = n.initialize_model(parameters)
      outputs.append(n.feed_forward(x, model.layers, verbose=False))

  plt.hist(outputs, bins=50)
  out_loc = f"{utc_ts(utc_now())}.png"
  print("saving to", out_loc)
  pylab.savefig(out_loc, bbox_inches="tight")
  pylab.close()
  # saving to 2022-09-25T222204.png
  ```
- 18:41 hmm ok so this is still kind of asymmetric, but finally getting values less than `0.5`, so better than before.
- Ok, going to update the partial derivatives too then.
- 18:56 ok let's try this out on commit `ea46849`, having updated the partial derivatives for the weights `w13`, `w14`, which are affected.
  ```python
  import network as n
  import dataset
  import plot
  import runner
  import ipdb
  import matplotlib.pyplot as plt
  import pylab
  from collections import Counter
  from utils import utc_now, utc_ts

  data = dataset.build_dataset_inside_outside_circle(0.5)
  parameters = {"learning_rate": 0.01,
                "steps": 50,
                "log_loss_every_k_steps": 10}
  runner.train_and_analysis(data, parameters)
  ```