• [[my back prop SGD from scratch 2022-Aug]]
    • 13:35 yea so last time I had noticed , hey on a random initialization why was the y_prob 0.5 ?
      • I had literally just initialized a new network and got this first example,
        			  ipdb> p x, y
        			  (array([10.31816265, -8.80044688]), 1)
        
        

        while running in ipdb debug mode, inside of train_network(), I ran

        			  --> 186         y_prob = feed_forward(x, model.layers, verbose=False)
        
        

        and got

        			  ipdb> p y_prob
        			  0.5
        
        
      • Let me use joblib to save this so I can test it again later. So I ran this inside of my ipdb session,
        			  import joblib
        			  joblib.dump(model, f"{utc_ts(utc_now())}-model.joblib")
        			  # '2022-09-25T174708-model.joblib'
        
        
      • 13:57 And then in another session,
        			  import joblib
        			  
        			  import ipdb
        			  import matplotlib.pyplot as plt
        			  import pylab
        			  from collections import Counter
        			  from utils import utc_now, utc_ts
        			  import network as n
        			  import dataset
        			  import plot
        			  import runner
        			  
        			  model = joblib.load("2022-09-25T174708-model.joblib")
        			  y_prob = n.feed_forward(x, model.layers, verbose=False)
        			  
        			  y_prob # Out[8]: 0.5
        
        

        ok nice, now that I got this reproduced, let me hunt for some more bugs. Maybe this is purely a coincidence?!????!

      • So at this moment, for this particular network, the input basically has no effect,
        			  
        			  In [12]: [n.feed_forward(x, model.layers, verbose=False)
        			      ...: for x in [
        			      ...: [10.31816265, -8.80044688],
        			      ...: [1, 1],
        			      ...: [0, 0],
        			      ...: [1e4, -1e4]
        			      ...: ]]
        			  Out[12]: [0.5, 0.5, 0.5, 0.5]
        
        
      • ok so currently, the final sum appears to always be negative, so the relu(negative_num) step at the end always produces a 0, and then after that I have a sigmoid, so with a 0 going in, yea it makes sense the output is the 0.5 (quick standalone check of that below).
      • but why does the input into that final relu always seem to be negative then?
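      • Quick sanity check of that relu-into-sigmoid story, with throwaway numpy stand-ins (my own little functions here, not the actual ones in network.py): any non-positive pre-activation gets pinned to exactly 0.5,
        			  import numpy as np
        			  
        			  def relu(z):
        			      return np.maximum(0, z)
        			  
        			  def sigmoid(z):
        			      return 1 / (1 + np.exp(-z))
        			  
        			  # relu zeroes out any negative sum, and sigmoid(0) == 0.5,
        			  # so every non-positive final sum produces exactly 0.5
        			  [float(sigmoid(relu(z))) for z in [-100.0, -1.0, -0.001, 0.0]]
        			  # [0.5, 0.5, 0.5, 0.5]
        			  
        			  float(sigmoid(relu(2.0)))  # ~0.88, only a positive sum moves the output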
    • 15:05 so I think I want to answer a side question first, of hey, if I create a bunch of random networks, will they all have this weirdness? If not, then this might not be a problem.
      • ok so ,
        			  from tqdm import tqdm
        			  import numpy as np
        			  import network as n
        			  import dataset
        			  import plot
        			  import runner
        			  import ipdb
        			  import matplotlib.pyplot as plt
        			  import pylab
        			  from collections import Counter
        			  from utils import utc_now, utc_ts
        			  
        			  data = dataset.build_dataset_inside_outside_circle(0.5)
        			  parameters = {"learning_rate": 0.01,
        			          "steps": 100,
        			          "log_loss_every_k_steps": 10
        			          }
        			  outputs = []
        			  x = np.array([10.31816265, -8.80044688])
        			  
        			  for _ in tqdm(range(10000)):
        			  	model = n.initialize_model(parameters)
        			  	outputs.append(n.feed_forward(x, model.layers, verbose=False))
        			      
        			  plt.hist(outputs, bins=50)
        			  out_loc = f"{utc_ts(utc_now())}.png"
        			  print("saving to", out_loc)
        			  pylab.savefig(out_loc, bbox_inches="tight")
        			  pylab.close()
        			  # saving to 2022-09-25T192657.png
        			  
        
        


      • 15:28 ok so false alarm, in the sense that the network is not always stuck at 0.5.
        • And the 0.5 is special only because it is the minimum possible output when the step before the sigmoid happens to be a relu. stories heh.
        • But maybe that does beg the question: hey, my dataset output is either 0 or 1, so if my minimum is 0.5, well that's not ideal haha!
        • So next I for sure need to either scale the output so that [0.5, 1.0] maps to [0, 1.0], or otherwise just get rid of that relu altogether, since then all values can map into that sigmoid, and that would produce the [0, 1.0] range I need (tiny sketch of both options below).
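        • A tiny sketch of those two options with plain numpy (the names and numbers here are made up for illustration, not from the repo),
        			  import numpy as np
        			  
        			  def sigmoid(z):
        			      return 1 / (1 + np.exp(-z))
        			  
        			  def relu(z):
        			      return np.maximum(0, z)
        			  
        			  z = -3.0  # some final pre-activation sum
        			  
        			  # option 1: keep the relu but rescale [0.5, 1.0] -> [0.0, 1.0]
        			  2 * (sigmoid(relu(z)) - 0.5)  # 0.0
        			  
        			  # option 2: drop the relu and let the sigmoid see the raw sum
        			  sigmoid(z)  # ~0.047, can now go below 0.5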
      • 17:38 ok so if I take out the final relu, I will have to adjust the partial derivative calculations too,
      • 18:19 ok so first I just updated the feed_forward, so now on commit 61465a5, redoing the above mini test, we have,


        			  
        			  for _ in tqdm(range(10000)):
        			      model = n.initialize_model(parameters)
        			      outputs.append(n.feed_forward(x, model.layers, verbose=False))
        			      
        			  plt.hist(outputs, bins=50)
        			  out_loc = f"{utc_ts(utc_now())}.png"
        			  print("saving to", out_loc)
        			  pylab.savefig(out_loc, bbox_inches="tight")
        			  pylab.close()
        			  
        			  # saving to 2022-09-25T222204.png
        
        



      • 18:41 hmm ok so this is still kind of asymmetric, but finally getting values less than 0.5, so better than before.
      • Ok going to update the partial derivatives too then.
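      • Before doing that, here is a rough sketch of what changes in the chain rule at the output. I am assuming the final node computes something like z = w13*h3 + w14*h4 (just my guess at the wiring from those weight names; the real partial derivative code is in the repo, not this snippet),
        			  import numpy as np
        			  
        			  def sigmoid(z):
        			      return 1 / (1 + np.exp(-z))
        			  
        			  def d_sigmoid(z):
        			      s = sigmoid(z)
        			      return s * (1 - s)
        			  
        			  def d_relu(z):
        			      return 1.0 if z > 0 else 0.0
        			  
        			  h3, h4 = 0.7, 0.2      # made-up hidden activations
        			  w13, w14 = -0.4, 0.9   # made-up weights
        			  z = w13 * h3 + w14 * h4
        			  
        			  # before: y_prob = sigmoid(relu(z)), so relu' sits in the chain
        			  # and the gradient dies whenever z <= 0
        			  d_sigmoid(max(z, 0.0)) * d_relu(z) * h3   # dy_prob/dw13, old
        			  
        			  # after: y_prob = sigmoid(z), so the relu factor just drops out
        			  d_sigmoid(z) * h3                         # dy_prob/dw13, new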
      • 18:56 ok let's try this out on commit ea46849, having updated the partial derivatives for the weights w13, w14 which are affected.


        			  import network as n
        			  import dataset
        			  import plot
        			  import runner
        			  import ipdb
        			  import matplotlib.pyplot as plt
        			  import pylab
        			  from collections import Counter
        			  from utils import utc_now, utc_ts
        			  
        			  data = dataset.build_dataset_inside_outside_circle(0.5)
        			  parameters = {"learning_rate": 0.01,
        			          "steps": 50,
        			          "log_loss_every_k_steps": 10
        			          }
        			  runner.train_and_analysis(data, parameters)