What is random forest?
Random forest (Random Forest, abbreviated RF) is a relatively new and fast-growing machine learning algorithm with broad applications, from marketing to healthcare and insurance. It can be used to build marketing simulation models and to analyze customer acquisition, retention, and churn. It can also be used to predict disease risk and patients' susceptibility.
Random forest can be used for both regression and classification. It can handle large datasets, and it helps estimate which variables are important when modeling the underlying data.
So what kind of algorithm is random forest, exactly?
If you are already familiar with decision trees (Decision Tree), random forest is easy to understand. Random forest is an algorithm that trains many models following the idea of ensemble learning. Its basic unit is the decision tree, and in essence it belongs to ensemble learning (Ensemble Learning), a major branch of machine learning. The name contains two keywords: "random" and "forest". "Forest" is easy to understand: one tree is a tree, so hundreds or thousands of trees can be called a forest. The analogy is apt, and it reflects the main idea of random forest, namely ensembling. The meaning of "random" is explained in the parts below.
Intuitively, each decision tree is a classifier (assuming, for now, a classification problem), so for one input sample, N trees produce N classification results. The random forest collects all of these votes and outputs the class with the most votes as the final result. This is one of the simplest forms of the Bagging idea.
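The voting step above can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular library: `majority_vote` is a hypothetical helper, and the five tree outputs are made up.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Return the class predicted by the most trees.

    Ties go to the tied class encountered first in the input, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not tree_predictions:
        raise ValueError("need at least one tree prediction")
    winner, _count = Counter(tree_predictions).most_common(1)[0]
    return winner


# Hypothetical outputs of N = 5 trees for a single input sample.
votes = ["A", "B", "A", "A", "B"]
print(majority_vote(votes))  # → A
```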
What is ensemble learning?
Ensemble learning solves a single prediction problem by building several models and combining them. It works by generating multiple classifiers/models, each of which learns and makes predictions independently. These predictions are finally combined into a single prediction, which is usually better than the prediction of any single classifier.
Random forest is a subclass of ensemble learning that relies on combining decision trees. It is a relatively new machine learning strategy that emerged in the 1990s (at Bell Labs), and it can be applied to almost any kind of problem.
How a random forest decides: the trees vote
Since a random forest contains many classification trees, each tree classifies the input sample on its own, as if every tree had one vote: it can vote for class A or for class B. The class that receives the most votes is the random forest's classification result. Note that every tree gives its own classification result. In addition, the trees are independent: they are trained to be largely uncorrelated, so their predictions cover many different situations and their individual errors tend to cancel each other out. The predictions of the few outstanding trees rise above the "noise" of the many, producing a good overall prediction.
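To see why many independent trees can beat a single tree, here is a small calculation under a strong, idealized assumption: a binary problem where each tree is right with the same probability `p`, independently of the others. The helper name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p: float) -> float:
    """Probability that a strict majority of n_trees independent voters is correct.

    Assumes a binary problem, an odd n_trees (so there are no ties), and that
    each tree is correct independently with probability p.
    """
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    need = n_trees // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of getting at least `need` correct votes.
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

For `p` above 0.5 the majority's accuracy grows with the number of trees (and for `p` below 0.5 it shrinks). Real trees trained on overlapping bootstrap samples are correlated, so in practice the gain is smaller than this idealized figure.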
Random forest is an important ensemble learning method based on Bagging
What is the principle behind Bagging?
Its key feature is "random sampling". So what is random sampling?
Random sampling (Bootstrap) means drawing a fixed number of samples from our training set, putting each sample back after it is drawn. That is, a sample that has already been drawn may be drawn again. For the Bagging algorithm, the number of samples drawn is generally the same as the number of samples in the training set. Each resulting sample set therefore has the same size as the training set, but its contents differ. If we perform T rounds of random sampling on a training set of m samples, then, because of the randomness, the T sample sets generally differ from one another.
Random sampling ensures that each sampled training set is different, although the sets may contain some of the same samples.
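A minimal Python sketch of bootstrap sampling, assuming the training set is just a list of sample IDs. `bootstrap_sample` is a hypothetical helper built on the standard library's `random.Random.choices`, which draws with replacement.

```python
import random
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def bootstrap_sample(data: Sequence[Item], rng: random.Random) -> List[Item]:
    """Draw len(data) items from data uniformly, *with replacement*."""
    return rng.choices(data, k=len(data))


rng = random.Random(0)          # fixed seed so the run is reproducible
training_set = list(range(10))  # hypothetical training set of m = 10 sample IDs
for t in range(3):              # T = 3 rounds of bootstrap sampling
    sample = bootstrap_sample(training_set, rng)
    print(t, sorted(sample))    # same size as the training set, repeats allowed
```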
Why sample with replacement?
If sampling were done without replacement, the training samples of every tree would be completely different, with no overlap at all. The trees trained on them could then differ enormously in their decision criteria. A random forest's final classification is decided by a vote among many trees (weak classifiers), and that vote is meant to find agreement; it only makes sense if the trees do not apply completely different standards. Training each tree on a completely different training set therefore would not produce a meaningful voting result.
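The contrast can be made concrete with a small sketch. Here "without replacement" is modeled, as an assumption, by shuffling the indices and dealing them into disjoint parts, one per tree; "with replacement" gives each tree its own bootstrap sample. The helper names are hypothetical.

```python
import random
from typing import List


def disjoint_subsets(m: int, n_trees: int, rng: random.Random) -> List[List[int]]:
    """Without replacement: shuffle indices and deal them into n_trees disjoint parts."""
    idx = list(range(m))
    rng.shuffle(idx)
    return [idx[t::n_trees] for t in range(n_trees)]


def bootstrap_subsets(m: int, n_trees: int, rng: random.Random) -> List[List[int]]:
    """With replacement: each tree gets m indices drawn from all m samples."""
    return [rng.choices(range(m), k=m) for _ in range(n_trees)]


def shared_fraction(a: List[int], b: List[int], m: int) -> float:
    """Fraction of the m original samples that appear in both subsets."""
    return len(set(a) & set(b)) / m


rng = random.Random(0)
m = 1000
d = disjoint_subsets(m, 2, rng)
b = bootstrap_subsets(m, 2, rng)
print("without replacement:", shared_fraction(d[0], d[1], m))  # → 0.0
print("with replacement:   ", round(shared_fraction(b[0], b[1], m), 2))
```

The disjoint parts share nothing, while two bootstrap samples typically share about $(1 - 1/e)^2 \approx 0.40$ of the original samples, since each one contains roughly 63.2% of them.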
For a given sample, in one round of random sampling from a training set of $m$ samples, the probability of being drawn on any single draw is $1/m$, so the probability of not being drawn is $1 - 1/m$. The probability of never being drawn in all $m$ draws is $(1 - 1/m)^m$. As $m \to \infty$, $(1 - 1/m)^m \to 1/e \approx 0.368$. In other words, in each round of Bagging's random sampling, about 36.8% of the training data is never drawn into the sampled set.
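The limit can be checked numerically. This sketch evaluates $(1 - 1/m)^m$ exactly for a few values of $m$ and also simulates one bootstrap round; the function names are hypothetical.

```python
import math
import random


def prob_never_drawn(m: int) -> float:
    """Exact probability that one given sample is missed by all m draws: (1 - 1/m)^m."""
    return (1 - 1 / m) ** m


def simulated_oob_fraction(m: int, rng: random.Random) -> float:
    """Fraction of the m samples never picked in one simulated bootstrap round."""
    drawn = set(rng.choices(range(m), k=m))
    return 1 - len(drawn) / m


for m in (10, 100, 10_000):
    print(m, round(prob_never_drawn(m), 4))
print("1/e =", round(1 / math.e, 4))
print("simulated:", round(simulated_oob_fraction(100_000, random.Random(0)), 3))
```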
We call this roughly 36.8% of data that was never sampled the out-of-bag data (Out Of Bag, abbreviated OOB). Because these data did not take part in fitting the model, they can be used to assess the model's generalization ability.
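Here is a minimal sketch of how OOB data can estimate generalization. To keep it short, it bags one-split decision stumps on made-up one-dimensional data instead of full decision trees, and it is plain bagging rather than a complete random forest (full implementations typically also randomize the features considered at each split). Each sample is scored only by the trees whose bootstrap sample did not contain it. All names and data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Stump = Tuple[float, int, int]  # (threshold, label if x <= threshold, label otherwise)


def fit_stump(xs: List[float], ys: List[int]) -> Stump:
    """Choose the threshold (among observed x values) with the fewest training errors."""
    best: Stump = (xs[0], ys[0], ys[0])
    best_err = len(ys) + 1
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
        err = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if err < best_err:
            best, best_err = (thr, l_lab, r_lab), err
    return best


def predict_stump(stump: Stump, x: float) -> int:
    thr, l_lab, r_lab = stump
    return l_lab if x <= thr else r_lab


def bagging_with_oob(xs: List[float], ys: List[int], n_trees: int,
                     rng: random.Random) -> Tuple[List[Stump], float]:
    """Train n_trees stumps on bootstrap samples; score each sample only with
    the trees that did NOT see it (its out-of-bag trees)."""
    m = len(xs)
    stumps: List[Stump] = []
    oob_votes = [Counter() for _ in range(m)]
    for _ in range(n_trees):
        idx = rng.choices(range(m), k=m)                # bootstrap sample
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        stumps.append(stump)
        in_bag = set(idx)
        for i in range(m):
            if i not in in_bag:                         # i is out-of-bag for this tree
                oob_votes[i][predict_stump(stump, xs[i])] += 1
    scored = [i for i in range(m) if oob_votes[i]]      # samples with >= 1 OOB vote
    correct = sum(oob_votes[i].most_common(1)[0][0] == ys[i] for i in scored)
    return stumps, (correct / len(scored) if scored else float("nan"))


# Hypothetical 1-D data: true label is x > 0.5, with about 10% of labels flipped.
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [int(x > 0.5) if rng.random() > 0.1 else int(x <= 0.5) for x in xs]
stumps, oob_acc = bagging_with_oob(xs, ys, n_trees=25, rng=rng)
print(f"OOB accuracy estimate: {oob_acc:.2f}")
```

Because every OOB vote comes from trees that never saw that sample, the OOB accuracy behaves like a held-out estimate without setting aside a separate validation set.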
Why use random forests?
1. It usually achieves high accuracy.
2. The randomness it introduces makes random forests less prone to overfitting.
3. The randomness it introduces gives random forests good resistance to noise.
4. It can handle very high-dimensional data without requiring feature selection.
5. It can handle both discrete and continuous data, and the dataset does not need to be normalized.
6. Training is fast, and it can produce a ranking of variable importance.
7. It is easy to parallelize, which makes it very attractive in today's era of big data and large sample sizes.
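Because each tree depends only on its own bootstrap sample, trees can be trained as independent tasks. Below is a minimal sketch with Python's `concurrent.futures`; `train_one_tree` is a hypothetical stand-in that returns a single split threshold rather than a real tree. Giving each tree its own seed makes the result independent of scheduling, so the parallel forest matches a serial loop exactly.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def train_one_tree(seed: int, data: List[Tuple[float, int]]) -> float:
    """Hypothetical stand-in for fitting one tree: bootstrap the data, then
    return a split threshold halfway between the two class means."""
    rng = random.Random(seed)                  # per-tree RNG: result depends only on seed
    sample = rng.choices(data, k=len(data))    # bootstrap sample with replacement
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    if not zeros or not ones:                  # degenerate sample: fall back to the mean x
        return sum(x for x, _ in sample) / len(sample)
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def train_forest_parallel(data: List[Tuple[float, int]], n_trees: int,
                          max_workers: int = 4) -> List[float]:
    """Trees do not depend on each other, so each one can run as a separate task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, regardless of finish order.
        return list(pool.map(lambda seed: train_one_tree(seed, data), range(n_trees)))


data = [(i / 100, int(i >= 50)) for i in range(100)]  # hypothetical toy data
forest = train_forest_parallel(data, n_trees=8)
serial = [train_one_tree(seed, data) for seed in range(8)]
print(forest == serial)  # → True
```

In CPython, threads only give a real speedup when the per-tree work releases the GIL (as many NumPy-based routines do); for pure-Python tree code, `ProcessPoolExecutor` offers the same `map` interface using separate processes.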
Of course, the advantages of random forest go beyond those listed above. Random forest is something of an all-round ace in machine learning: you can throw almost anything at it, and it will basically be usable. It is particularly good at estimating and inferring mappings, so it does not need as much parameter tuning as many other classification algorithms.