Simply put, training a model means learning (determining) good values for all the weights and the bias from labeled examples. In supervised learning, a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss; this process is called empirical risk minimization.
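The idea of examining examples and picking the model with the lowest average loss can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data points, the candidate grid, and the helper name average_loss are hypothetical, the model is assumed to be a one-feature line y' = weight * x + bias, and the squared difference is assumed as the per-example loss. Practical training uses far more efficient search methods than trying every candidate.

```python
# Hypothetical labeled examples as (feature, label) pairs, made up for illustration.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def average_loss(weight, bias, data):
    """Average per-example loss of the model y' = weight * x + bias."""
    total = 0.0
    for x, y in data:
        prediction = weight * x + bias
        total += (y - prediction) ** 2  # assumed per-example loss: squared difference
    return total / len(data)


# Candidate values 0.0, 0.5, ..., 5.0 for both the weight and the bias.
candidates = [i * 0.5 for i in range(11)]

# Empirical risk minimization by exhaustive search:
# keep the (weight, bias) pair with the lowest average loss on the examples.
best_weight, best_bias = min(
    ((w, b) for w in candidates for b in candidates),
    key=lambda wb: average_loss(wb[0], wb[1], examples),
)

print(best_weight, best_bias, average_loss(best_weight, best_bias, examples))
```

The exhaustive search is chosen here only because it makes the "try models, keep the one with the least loss" idea explicit.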
Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the model's prediction was on a single example. If the model's prediction is perfectly accurate, the loss is 0; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples. For example, the left side of Figure 3 shows a model with high loss, and the right side shows a model with low loss. Note the following about the figure:
The red arrows represent loss. The blue lines represent predictions. Notice that the red arrows in the left plot are much longer than their counterparts in the right plot. Clearly, the blue line in the right plot is a much better predictive model than the blue line in the left plot.
You may be wondering whether you could create a mathematical function (a loss function) that aggregates the individual losses in a meaningful way.
Squared loss: a popular loss function
The linear regression models we examine next use a loss function called squared loss (also known as L2 loss). The squared loss for a single example is as follows:
$$(y - \mathrm{prediction}(x))^2$$
Mean squared error (MSE) is the average squared loss per example. To calculate MSE, sum the squared losses of all the individual examples, then divide by the number of examples:
$$\mathrm{MSE} = \frac{1}{N} \sum_{(x, y) \in D} (y - \mathrm{prediction}(x))^2$$
where:
(x, y) is an example, in which x is the set of features that the model uses to make predictions (for example, temperature, age, and mating success rate), and y is the example's label (for example, chirps per minute).
prediction(x) is a function of the weights and bias in combination with the set of features x.
D is a data set containing many labeled examples, which are (x, y) pairs.
N is the number of examples in D.
Although MSE is commonly used in machine learning, it is neither the only practical loss function nor the best loss function for all circumstances.
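The squared loss and MSE definitions above can be checked with a short Python sketch. Everything concrete here is an assumption made for illustration: the data set D, the two candidate models model_a and model_b, and the helper names are hypothetical, and x is simplified to a single number instead of a set of features.

```python
def squared_loss(label, prediction):
    """Squared (L2) loss for a single example."""
    return (label - prediction) ** 2


def mean_squared_error(dataset, predict):
    """Average squared loss of `predict` over all (x, y) pairs in `dataset`."""
    losses = [squared_loss(y, predict(x)) for x, y in dataset]
    return sum(losses) / len(losses)  # divide by N, the number of examples


# D: hypothetical labeled examples as (x, y) pairs.
D = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def model_a(x):
    # Hypothetical model with weight 2 and bias 0; it fits D exactly.
    return 2.0 * x


def model_b(x):
    # Hypothetical model with weight 1 and bias 1; it misses two of the three labels.
    return 1.0 * x + 1.0


mse_a = mean_squared_error(D, model_a)
mse_b = mean_squared_error(D, model_b)
print(mse_a, mse_b)
```

For this made-up data, model_a predicts every label exactly, so its MSE is 0. The squared losses of model_b are 0, 1, and 4, giving an MSE of 5/3, so model_a is the lower-loss model of the two.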