Master Python machine learning feature selection in ten minutes

In machine learning, feature selection is an important practical step: it helps you choose, from all the features, the ones that contribute most to the result. Using irrelevant data can reduce a model's accuracy. This is especially true for linear algorithms such as linear regression and logistic regression. Linear algorithms usually rely on gradient descent to find the minimum cost, so irrelevant features can mislead the search.

Effective feature selection has the following advantages:

1, Less overfitting. This is a fairly large topic in itself. Overfitting means that a model fits the training set too closely, which leads to poor generalization to new data.

2, Better accuracy. Less useless data often means a more accurate model.

3, Shorter training time. With less data, computation naturally takes less time.

The classes in the sklearn.feature_selection module can be used for feature selection / dimensionality reduction on sample sets, in order to improve the accuracy of estimators.

1, SelectKBest class

Scikit-learn provides the SelectKBest class, which uses a univariate statistical test to select a given number of the most effective features. It keeps the K features with the highest scores (the strongest dependency on the target) and removes all other features.
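
The selection rule itself is simple: score every feature, then keep the K highest-scoring ones. The sketch below reproduces that rule by hand with NumPy and checks it against SelectKBest.get_support(). It assumes scikit-learn's bundled iris dataset and the default f_classif scorer. The helper name top_k_by_score is hypothetical and used only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif


def top_k_by_score(scores, k):
    """Hypothetical helper: indices of the k largest scores, in ascending index order."""
    return np.sort(np.argsort(scores)[-k:])


X, y = load_iris(return_X_y=True)

# Score each feature independently against the class labels.
scores, _pvalues = f_classif(X, y)

manual = top_k_by_score(scores, k=2)
selector = SelectKBest(f_classif, k=2).fit(X, y)
library = selector.get_support(indices=True)

print("by hand:     ", manual)
print("SelectKBest: ", library)
```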

SelectKBest(score_func=f_classif, k=10)

score_func: the scoring function (a callable). The default is f_classif, which is based on analysis of variance (ANOVA) and computes an F-value between each feature and the target.
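
The ANOVA F-value behind f_classif is the ratio of between-class variance to within-class variance: F = (SS_between / (K - 1)) / (SS_within / (N - K)), where K is the number of classes and N the number of samples. The minimal sketch below computes that ratio by hand for each iris feature and compares it with f_classif. The function name anova_f is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif


def anova_f(feature, labels):
    """One-way ANOVA F-statistic of a single feature across the classes in labels."""
    classes = np.unique(labels)
    grand_mean = feature.mean()
    n, k = len(feature), len(classes)
    # Between-group sum of squares: how far each class mean lies from the overall mean.
    ss_between = sum(
        (labels == c).sum() * (feature[labels == c].mean() - grand_mean) ** 2
        for c in classes
    )
    # Within-group sum of squares: spread of samples around their own class mean.
    ss_within = sum(
        ((feature[labels == c] - feature[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


X, y = load_iris(return_X_y=True)
f_scores, p_values = f_classif(X, y)
manual = np.array([anova_f(X[:, j], y) for j in range(X.shape[1])])

print(np.round(f_scores, 2))
print("matches f_classif:", np.allclose(manual, f_scores))
```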

For regression: f_regression, mutual_info_regression

For classification: chi2, f_classif, mutual_info_classif
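
For a regression target, you swap in a regression scorer. The sketch below assumes a synthetic make_regression dataset in which only 3 of 10 features are informative. It compares f_regression, a linear-dependency test, with mutual_info_regression, which can also pick up non-linear dependency. mutual_info_regression is randomized, so random_state is pinned with functools.partial. SelectKBest accepts a scorer that returns a single score array. No particular selected indices are claimed here.

```python
from functools import partial

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression, mutual_info_regression

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# F-statistic from univariate linear regression of y on each feature.
f_selector = SelectKBest(f_regression, k=3).fit(X, y)

# Mutual information; fix random_state so the scores are reproducible.
mi_scorer = partial(mutual_info_regression, random_state=0)
mi_selector = SelectKBest(mi_scorer, k=3).fit(X, y)

print("f_regression keeps:          ", f_selector.get_support(indices=True))
print("mutual_info_regression keeps:", mi_selector.get_support(indices=True))
```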

Class method: fit_transform(X[, y]) fits the selector to the data, then transforms it and returns the transformed data.
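
Here is a short usage sketch of fit_transform with the chi2 scorer on the iris dataset. chi2 requires non-negative feature values (such as counts or measurements), which holds for iris. On this data the two petal measurements (columns 2 and 3) receive the highest chi2 scores.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative features

# fit_transform = fit (compute chi2 scores) + transform (keep the top-k columns).
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, "->", X_new.shape)          # (150, 4) -> (150, 2)
print(selector.get_support(indices=True))  # [2 3]: petal length and petal width
```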

2, VarianceThreshold(threshold=0.0)

VarianceThreshold removes all features whose variance does not exceed a given threshold. By default, it removes all zero-variance features, that is, features that have the same value in every sample.
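
The default behaviour is demonstrated below on a tiny hand-made matrix (an illustrative assumption). Columns 0 and 3 never change, so with threshold=0.0 only columns 1 and 2 survive.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])

# Default threshold=0.0: drop every column whose values are identical in all rows.
selector = VarianceThreshold()
X_reduced = selector.fit_transform(X)

print(selector.variances_)                 # per-column variance computed during fit
print(selector.get_support(indices=True))  # columns 0 and 3 are constant, so [1 2] remain
print(X_reduced)
```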

For example, it can remove the features whose value is 0 (or 1) in more than 80% of the samples in the whole dataset.
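
A boolean feature is a Bernoulli variable, and its variance is p(1 - p), where p is the fraction of samples taking one of the two values. "Removing features that are 0 (or 1) in more than 80% of samples" therefore becomes threshold = 0.8 * (1 - 0.8) = 0.16. The matrix below is a made-up example with three boolean features.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Six samples, three boolean features (illustrative data).
X = np.array([[0, 0, 1],
              [0, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0],
              [0, 1, 1]])

# Var[Bernoulli(p)] = p * (1 - p); p = 0.8 gives a cut-off of 0.16.
p = 0.8
selector = VarianceThreshold(threshold=p * (1 - p))
X_reduced = selector.fit_transform(X)

# Column 0 is 0 in 5 of 6 samples (about 83%): variance 5/36, about 0.139 < 0.16, so it is dropped.
print(selector.variances_)
print(selector.get_support(indices=True))
```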

3, Recursive feature elimination (RFE)

RFE(estimator, n_features_to_select=None, step=1, verbose=0)

Recursive feature elimination removes, in each round, the feature that contributes least to the result, and continues this way until only the number of features you want to keep remains.
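
A minimal usage sketch follows. It assumes the synthetic Friedman #1 dataset, where only the first 5 of 10 features influence the target, and a linear-kernel SVR. That estimator exposes coef_, which RFE uses to decide which feature to drop next. After fitting, support_ marks the kept features and ranking_ records the elimination order.

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

# Friedman #1: only the first 5 of the 10 features influence the target.
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)

# Drop one feature per round (step=1) until 5 remain.
selector = RFE(SVR(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(X, y)

print(selector.support_)   # True for the features that were kept
print(selector.ranking_)   # 1 = kept; larger numbers were eliminated earlier
```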

The main idea of recursive feature elimination is to build a model repeatedly (for example an SVM or a regression model) and pick out the best (or worst) feature, for example according to its coefficients. The selected feature is set aside, and the process is repeated on the remaining features until all features have been processed. The order in which features are eliminated gives a ranking of the features. This makes RFE a greedy algorithm for searching for the best feature subset.
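
The greedy loop described above can be written out by hand. The sketch below assumes ordinary LinearRegression as the underlying model and |coef_| as the "least useful" criterion. It repeatedly drops the weakest remaining feature and then checks that scikit-learn's RFE keeps the same features. The helper greedy_elimination and the synthetic dataset are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def greedy_elimination(X, y, n_keep):
    """Hypothetical helper: drop the feature with the smallest |coef_| until n_keep remain.

    Returns the kept column indices and the eliminated ones in elimination order.
    """
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        model = LinearRegression().fit(X[:, remaining], y)
        weakest = int(np.argmin(np.abs(model.coef_)))  # position within `remaining`
        eliminated.append(remaining.pop(weakest))
    return sorted(remaining), eliminated


X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=1.0, random_state=0)

kept, order = greedy_elimination(X, y, n_keep=3)
rfe = RFE(LinearRegression(), n_features_to_select=3, step=1).fit(X, y)

print("kept by hand:", kept)
print("kept by RFE: ", list(rfe.get_support(indices=True)))
print("elimination order:", order)
```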

The stability of RFE depends largely on which model is used underneath in each iteration. For example, if RFE uses ordinary (unregularized) linear regression, which is unstable, RFE will be unstable too. If it uses Ridge regression, whose regularization makes it stable, RFE will be stable as well.
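
Stability can be measured instead of assumed. The sketch below reruns RFE on bootstrap resamples and reports how often each feature is kept. A frequency near 0 or 1 means a stable choice, and values in between mean the selection flips from resample to resample. The dataset (three near-duplicate features, one independent feature, two noise features), the Ridge alpha and the helper name selection_frequency are all hypothetical choices. No particular output is claimed, so run it on your own data.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Ridge


def selection_frequency(estimator, X, y, n_keep, n_rounds=50, seed=0):
    """Hypothetical helper: fraction of bootstrap resamples in which RFE keeps each feature."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_rounds):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample (with replacement)
        rfe = RFE(estimator, n_features_to_select=n_keep).fit(X[idx], y[idx])
        counts += rfe.support_
    return counts / n_rounds


# Illustrative data: features 0-2 are near-duplicates of one signal,
# feature 3 is independent and informative, features 4-5 are pure noise.
rng = np.random.default_rng(42)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.01 * rng.normal(size=(100, 3)),
               rng.normal(size=(100, 3))])
y = 3 * base[:, 0] + 2 * X[:, 3] + rng.normal(scale=0.5, size=100)

for est in (LinearRegression(), Ridge(alpha=10.0)):
    freq = selection_frequency(est, X, y, n_keep=2)
    print(type(est).__name__, np.round(freq, 2))
```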

4, Feature selection with SelectFromModel

SelectFromModel is a meta-transformer that can be used with any estimator that has a coef_ or feature_importances_ attribute after fitting. If a feature's coef_ or feature_importances_ value is below a preset threshold, the feature is considered unimportant and removed.
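
A minimal sketch of SelectFromModel with an L1-regularized model follows. It assumes a synthetic make_regression dataset and Lasso(alpha=1.0). The L1 penalty pushes many coefficients toward zero. threshold="mean" keeps the features whose |coef_| is at least the mean |coef_|. Both the alpha value and the "mean" threshold are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# Fit Lasso inside the selector, then drop features whose |coef_| < mean |coef_|.
selector = SelectFromModel(Lasso(alpha=1.0), threshold="mean")
X_new = selector.fit_transform(X, y)

importances = np.abs(selector.estimator_.coef_)
print("threshold:", selector.threshold_)
print("kept:", selector.get_support(indices=True))
print(X.shape, "->", X_new.shape)
```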
