Loss functions vs Metric functions
I like the phrasing in this SO answer: loss functions are optimized directly during training, while metrics are optimized only indirectly. Last year I was trying to figure out why functions commonly used as metrics (F1 and AUC) are not listed among the TensorFlow Keras loss functions. I had earlier tried using F1 as a loss function while trying to understand my particular problem. (At least one error I ran into hints that it is not that simple, because you need to write extra code for computing the gradient.) But even if you can write the code to compute a gradient for your custom loss function, maybe some metrics are more expensive to run SGD on than others. (Also, F1 is clearly less sensitive than a function that uses probabilities or logits directly: it is computed from thresholded predictions, so it only changes when a prediction crosses the threshold.)
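To make the sensitivity point concrete, here is a minimal sketch in plain Python (no TensorFlow, and not how Keras implements anything). The function names `hard_f1`, `soft_f1` and `log_loss`, and the toy labels and probabilities, are all made up for illustration. `soft_f1` is a commonly used surrogate that swaps the thresholded counts for sums of probabilities; it is my own addition here, not something taken from the SO answer.

```python
import math


def hard_f1(y_true, y_prob, threshold=0.5):
    """Usual F1: probabilities are thresholded to 0/1 before counting."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def soft_f1(y_true, y_prob):
    """Hypothetical surrogate: same formula, but counts are sums of probabilities."""
    tp = sum(t * p for t, p in zip(y_true, y_prob))
    fp = sum((1 - t) * p for t, p in zip(y_true, y_prob))
    fn = sum(t * (1 - p) for t, p in zip(y_true, y_prob))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy, computed directly from the probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


# Toy data (made up): model B is more confident than model A,
# but no prediction crosses the 0.5 threshold.
y_true = [1, 1, 0, 0]
probs_a = [0.9, 0.6, 0.4, 0.1]
probs_b = [0.95, 0.7, 0.3, 0.05]

for name, probs in [("A", probs_a), ("B", probs_b)]:
    print(name, hard_f1(y_true, probs), soft_f1(y_true, probs), log_loss(y_true, probs))
```

Hard F1 comes out identical for A and B, so it gives an optimizer nothing to follow between the two, while log loss and the soft variant both register B as the better model. That flat behavior between threshold crossings is at least one reason a thresholded metric is awkward to use as a loss.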