• [[my back prop SGD from scratch 2022-Aug]]
    • 16:38 why no learning going on
      • hmm look at this network
          import network as n
          import dataset
          import plot

          # synthetic 2-D dataset: points labeled inside vs. outside a circle
          X, Y = dataset.build_dataset_inside_outside_circle(0.5)

          model = n.initialize_model({"learning_rate": 0.01})
          (
              loss_vec, model, artifacts, X_validation, Y_validation, Y_prob
          ) = n.train_network(X, Y, model)
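      • quick sanity check on the loss curve first (a sketch; assuming loss_vec from train_network is one loss value per training step, a flat curve would confirm nothing is being learned)
          import matplotlib.pyplot as plt

          plt.plot(loss_vec)  # loss_vec from the train_network call above
          plt.xlabel("step")
          plt.ylabel("loss")
          plt.savefig("loss-curve.png", bbox_inches="tight")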
      • 17:02 wondering if I can inspect the gradient, to see if it is pointing where it should
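        • a finite-difference sketch for that, assuming each entry of model.layers exposes a numpy weight matrix W (the attribute name is a guess) and reusing the loss(model.layers, X, Y) -> (Y_prob, total_loss) signature used below:
            def numeric_grad_entry(model, X, Y, layer_idx, i, j, eps=1e-5):
                # central difference on one weight entry: perturb, rerun the forward pass, restore
                W = model.layers[layer_idx].W
                orig = W[i, j]
                W[i, j] = orig + eps
                _, loss_plus = loss(model.layers, X, Y)
                W[i, j] = orig - eps
                _, loss_minus = loss(model.layers, X, Y)
                W[i, j] = orig  # restore the weight
                return (loss_plus - loss_minus) / (2 * eps)

            # compare this against whatever the backprop code computes for the same entry
            print(numeric_grad_entry(model, X_validation, Y_validation, 0, 0, 0))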
      • 23:02 so for a network with randomly initialized weights, the response should at least be nonlinear here, right?
        • hmm
            model = n.initialize_model({"learning_rate": 0.01})

            # X_validation, Y_validation from the earlier train_network call;
            # evaluate the freshly initialized (untrained) model on them
            Y_prob, total_loss = loss(model.layers, X_validation, Y_validation)

            plot.scatter_plot_by_z(X_validation, Y_prob)  # 2022-09-04T031837-scatter.png
        • wow, super weird: even with random weights we get only a linear separation, which makes me think maybe even the basic feed-forward pass has a problem?
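          • (a self-contained sketch of that suspected failure mode: if the activation between layers is missing or accidentally linear, any stack of linear layers collapses to a single linear map, so the decision boundary stays a line)
              import numpy as np

              rng = np.random.default_rng(0)
              W1 = rng.standard_normal((4, 2))  # hidden layer weights
              W2 = rng.standard_normal((1, 4))  # output layer weights
              x = rng.standard_normal((2, 5))   # five random 2-D inputs

              # two linear layers with no activation in between...
              two_layers = W2 @ (W1 @ x)
              # ...are exactly equivalent to one linear map
              one_layer = (W2 @ W1) @ x
              print(np.allclose(two_layers, one_layer))  # True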
        • and what about the probability sharpness?
            from utils import utc_now, utc_ts
            import matplotlib.pyplot as plt

            out_loc = f"{utc_ts(utc_now())}-hist.png"
            plt.hist(Y_prob, bins=50)
            plt.savefig(out_loc, bbox_inches='tight')  # 2022-09-04T033435-hist.png
        • 23:38 yeah, the probability output above is super sharp for a completely random network. Hmm, ok. Nice separation, but why is it only doing linear separation right now, and why is even the randomly initialized network not nonlinear?
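          • (one possible explanation, purely an assumption: if the output layer is a sigmoid and the random initial weights are large, the pre-activations land far from zero and the sigmoid saturates, pinning probabilities near 0 or 1; tiny demo below)
              import numpy as np

              def sigmoid(z):
                  return 1.0 / (1.0 + np.exp(-z))

              rng = np.random.default_rng(0)
              z = rng.standard_normal(5)
              print(sigmoid(z * 0.1))   # small pre-activations -> probabilities near 0.5
              print(sigmoid(z * 10.0))  # large pre-activations -> probabilities pinned near 0 or 1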
      • also, one side idea: I'm not updating the bias at all. But that is unrelated to the linear weirdness above. (sketch below)
        • hm
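          • (a self-contained sketch of what the missing bias update could look like; delta, b, and learning_rate here are hypothetical stand-ins for whatever the backprop code actually produces)
              import numpy as np

              rng = np.random.default_rng(0)
              delta = rng.standard_normal((8, 3))  # stand-in for dL/d(pre-activation): batch of 8, layer width 3
              b = np.zeros(3)                      # stand-in bias vector for one layer
              learning_rate = 0.01

              # bias gradient is delta summed over the batch; the SGD step mirrors the weight update
              grad_b = delta.sum(axis=0)
              b = b - learning_rate * grad_b
              print(b)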