Compiled by Xin Zhiyuan
[Xin Zhiyuan Introduction] Today, the latest installment of the deep forest series, the third in the line, appeared on arXiv: multi-layered GBDTs that can perform representation learning. Feng Ji, Yu Yang, and Zhou Zhihua propose a novel multi-layered GBDT forest (mGBDT) with explicit representation-learning ability, which can be jointly trained with a variant of target propagation (Target Propagation). The approach holds tremendous potential in the many domains where neural networks are a poor fit.
Remember the "deep forest" paper by Professor Zhou Zhihua and colleagues? Today, the latest, third installment of the deep forest series appeared on arXiv: multi-layered GBDTs capable of representation learning.
In the paper, titled "Multi-Layered Gradient Boosting Decision Trees", the authors Feng Ji, Yu Yang, and Zhou Zhihua propose a novel multi-layered GBDT forest (mGBDT) with explicit representation-learning ability, which can be jointly trained with a variant of target propagation (Target Propagation). Thanks to the superior performance of tree ensembles (Tree Ensembles), the method has enormous potential in the many domains where neural networks are a poor fit. The work also shows that non-differentiable systems can perform the key function of differentiable ones: multi-layered representation learning.
Multi-layered distributed representation learning with decision trees

The development of deep neural networks has made striking progress over the past decade or so. Building hierarchical or "deep" structures that can learn good representations from raw data, in supervised or unsupervised settings, is considered a key factor in this success. Successful application domains include computer vision, speech recognition, natural language processing, and more.
Currently, almost all deep neural networks use backpropagation with stochastic gradient descent as the workhorse for updating parameters during training. Indeed, when a model is composed of differentiable components (for example, weighted sums with nonlinear activation functions), backpropagation remains the best current option. Some other methods, such as target propagation, have been proposed as alternatives for training neural networks, but their effectiveness and generality are still at an early stage. For instance, some work has shown that target propagation can be at best as good as backpropagation, and in practice it often needs an additional backpropagation step for fine-tuning. In other words, good old backpropagation remains the most effective way to train differentiable learning systems such as neural networks. On the other hand, exploring the possibility of building multi-layered or deep models out of non-differentiable modules is not only of academic interest but also carries important application potential. For example,
tree ensembles such as random forests and gradient boosted decision trees (GBDT) remain the dominant way of modeling discrete or tabular data across a wide range of domains, so it would be desirable to obtain the hierarchical distributed representations learned by tree ensembles on such data.
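To ground the idea, here is a dependency-free sketch of gradient boosting for squared loss, using one-split regression stumps as the weak learners. This is a simplified illustration only; real GBDT libraries such as XGBoost grow full trees and add regularization, subsampling, and many other refinements, and all names below (`Stump`, `gbdt_fit`, `gbdt_predict`) are invented for this sketch.

```python
# Minimal gradient boosting for squared loss with one-split regression
# stumps as weak learners (illustration only).

class Stump:
    """A regression tree of depth 1: one feature, one threshold."""
    def fit(self, X, residuals):
        best = None
        for j in range(len(X[0])):                   # try every feature
            for t in sorted({x[j] for x in X}):      # try every threshold
                left = [r for x, r in zip(X, residuals) if x[j] <= t]
                right = [r for x, r in zip(X, residuals) if x[j] > t]
                if not left or not right:
                    continue
                lm, rm = sum(left) / len(left), sum(right) / len(right)
                sse = (sum((r - lm) ** 2 for r in left)
                       + sum((r - rm) ** 2 for r in right))
                if best is None or sse < best[0]:
                    best = (sse, j, t, lm, rm)
        _, self.j, self.t, self.left_val, self.right_val = best
        return self

    def predict(self, x):
        return self.left_val if x[self.j] <= self.t else self.right_val

def gbdt_fit(X, y, n_rounds=50, lr=0.1):
    base = sum(y) / len(y)                 # start from the mean prediction
    pred = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = Stump().fit(X, residuals)
        stumps.append(s)
        pred = [pi + lr * s.predict(xi) for pi, xi in zip(pred, X)]
    return base, lr, stumps

def gbdt_predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * s.predict(x) for s in stumps)
```

Each round fits a stump to the residuals of the ensemble built so far, so the training error shrinks geometrically on simple data; this additive, stage-wise structure is what the paper reuses as the building block of each layer.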
Because there is no opportunity to use the chain rule to propagate errors, backpropagation is impossible for such models. This raises two key questions. First, can we construct a multi-layered model with non-differentiable components, such that the outputs of its intermediate layers can be regarded as distributed representations? Second, if so, how can such models be trained jointly without the aid of backpropagation? The purpose of the paper is to offer exactly such an attempt.
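For contrast, the following sketch shows what the chain rule provides when every component is differentiable: the error at the output is turned, layer by layer, into a gradient for every weight. This is precisely the step that has no analogue for tree-based layers. The network size, seed, learning rate, and epoch count here are arbitrary illustrative choices, not anything from the paper.

```python
import math, random

# A tiny two-layer sigmoid network trained on XOR with hand-coded
# backpropagation: the chain rule converts the output error into
# gradients for every weight. Trees have no such derivative to chain.
random.seed(0)
HIDDEN = 4
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
W2 = [random.uniform(-1, 1) for _ in range(HIDDEN)]
b2 = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[i], x)) + b1[i])
         for i in range(HIDDEN)]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, o

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def total_loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

loss_before = total_loss()
lr = 0.5
for _ in range(2000):
    for x, y in data:
        h, o = forward(x)
        do = (o - y) * o * (1 - o)                 # dLoss/dz at the output
        for i in range(HIDDEN):
            dh = do * W2[i] * h[i] * (1 - h[i])    # chain rule into hidden layer
            W2[i] -= lr * do * h[i]
            b1[i] -= lr * dh
            for j in range(2):
                W1[i][j] -= lr * dh * x[j]
        b2 -= lr * do
loss_after = total_loss()
```

After training, the squared loss over the four XOR patterns is substantially lower than at initialization; the mGBDT question is how to get a comparable joint-training signal when no such gradients exist.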
Zhou Zhihua and Feng Ji recently proposed the "deep forest" framework, the first attempt to build a multi-layered model out of tree ensembles. Specifically, by introducing fine-grained scanning (Fine-grained Scanning) and cascading operations (Cascading Operations), the model can build a multi-layered structure whose complexity adapts to the data, and it is competitive across a wide range of tasks. The gcForest model they proposed exploits all the strategies in ensemble learning for enhancing diversity, but that approach applies only to supervised settings. Meanwhile, how to build multi-layered models out of forests and explicitly examine their representation-learning ability remains unclear. Since much prior research suggests that multi-layered distributed representations may be the key reason for the success of deep neural networks, exploring this kind of representation learning is worthwhile.
In this work, the goal is to make the most of both worlds: the excellent performance of tree ensembles and the expressive power of hierarchical distributed representations (so far explored mainly in neural networks). Specifically, the authors propose the first multi-layered structure that uses gradient boosted decision trees as the building block of each layer, explicitly emphasizing its representation-learning ability, with the whole structure jointly optimized through a variant of target propagation. The model can be trained in both supervised and unsupervised settings. Whereas trees were traditionally thought incapable of the hierarchical representation learning associated with differentiable systems such as neural networks, the authors point out that this work proves for the first time that hierarchical and distributed representations can be obtained with trees. Theoretical justification and experimental results demonstrate the effectiveness of the method. The rest of the paper is organized as follows: first, related work is discussed; next, the theoretically grounded method is presented; finally, experimental results are explained and discussed.
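The training scheme can be sketched as follows: each layer has a forward mapping F_i and a learned (pseudo-)inverse G_i; the label serves as the target for the top layer, the inverses propagate targets downward, and each F_i is then re-fitted to hit its layer's target. To keep the sketch dependency-free, a trivial 1-nearest-neighbor regressor stands in for the GBDT learners of the paper. Everything here, including the names `NN1Regressor` and `train_layers`, is a simplified illustration of the target-propagation idea, not the authors' algorithm, which differs in important details.

```python
class NN1Regressor:
    """Trivial 1-nearest-neighbor 'regressor' standing in for a GBDT layer.
    Any learner with fit/predict on vector outputs would do for this sketch."""
    def fit(self, X, Y):
        self.X, self.Y = X, Y
        return self

    def predict(self, X):
        def nearest(x):
            i = min(range(len(self.X)),
                    key=lambda k: sum((a - b) ** 2
                                      for a, b in zip(x, self.X[k])))
            return self.Y[i]
        return [nearest(x) for x in X]

def train_layers(X, Y, n_layers=2, epochs=2):
    F = [NN1Regressor() for _ in range(n_layers)]  # forward mappings (GBDTs in the paper)
    G = [NN1Regressor() for _ in range(n_layers)]  # inverse mappings (also GBDTs in the paper)
    # Crude initialization: every layer starts by regressing toward Y.
    acts = [X]
    for i in range(n_layers):
        F[i].fit(acts[-1], Y)
        acts.append(F[i].predict(acts[-1]))
    for _ in range(epochs):
        # Forward pass: record each layer's output.
        acts = [X]
        for i in range(n_layers):
            acts.append(F[i].predict(acts[-1]))
        # Fit each inverse: G[i] maps layer i's output back to its input.
        for i in range(1, n_layers):
            G[i].fit(acts[i + 1], acts[i])
        # Target propagation: the label is the top target; inverses push it down.
        targets = {n_layers: Y}
        for i in range(n_layers - 1, 0, -1):
            targets[i] = G[i].predict(targets[i + 1])
        # Update each forward mapping to hit its layer's target.
        for i in range(n_layers):
            F[i].fit(acts[i], targets[i + 1])
    return F

def predict(F, X):
    h = X
    for f in F:
        h = f.predict(h)
    return h
```

The key design choice, shared with the paper, is that no step requires a derivative: both the forward mappings and the inverses are fitted by ordinary supervised regression, so any learner with fit/predict, including a GBDT, can fill either role.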
Experimental results: mGBDT beats neural networks in accuracy and robustness

In the experimental section, the authors state that their main purpose is to verify whether joint training of mGBDT is feasible, not to show that the proposed method outperforms CNNs on visual tasks. Specifically, they designed experiments on synthetic data classification, income prediction, protein localization, and more, examining the following questions: (Q1) Does the training procedure converge in practice? (Q2) What do the learned features look like? (Q3) Does depth help learn better representations? (Q4) Given the same structure, how does the proposed layered structure perform compared with neural networks trained by backpropagation or target propagation?
The tables below show the accuracy of the multi-layered GBDT forest (mGBDT) compared with XGBoost and neural networks on the income prediction (left) and protein localization (right) tasks.
Given the same model structure, mGBDT achieves higher accuracy than neural networks (trained with either target propagation or backpropagation), and multi-layered GBDT performs better than single-layer GBDT. Moreover, neural networks trained with target propagation did not converge as well as expected, whereas the same structure built from GBDT layers achieved a lower training loss without overfitting.
In addition, in the protein localization experiment, the authors varied the structural design of the network and showed that in most cases mGBDT is more robust than neural networks. In particular, for neural networks trained with target propagation, the best accuracy dropped from 0.5964 to 0.3654 after intermediate layers were added, while mGBDT remained relatively stable throughout.
Finally, the authors list directions they have not yet discussed, such as deep forest integration (Deep Forest Integration) and hybrid DNNs that use mGBDT variants.
Paper: https://arxiv.org/pdf/1806.00007.pdf