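1. Pre-processing:
1.1 Use Boruta to clean the data column-wise (feature selection: drop features that are no more important than randomly shuffled "shadow" copies of the features).
1.2 Use TomekLinks to clean the data row-wise (remove samples that form Tomek links, i.e. pairs of nearest neighbours with different labels).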
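Both pre-processing steps can be sketched with NumPy and scikit-learn. The post does not name a package; in practice the BorutaPy package and imbalanced-learn's TomekLinks are common choices. The helpers below, `boruta_like_select` and `tomek_links_mask`, are hypothetical. The first is a simplified Boruta: it keeps the shadow-feature comparison but replaces Boruta's statistical test with a majority vote. The second drops only the majority-class member of each Tomek link.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def boruta_like_select(X, y, n_iter=20, random_state=0):
    """Simplified Boruta-style feature selection (hypothetical helper).

    Each round appends a shuffled "shadow" copy of every column, fits a
    random forest and records a hit for each real feature whose importance
    beats the best shadow importance. Features with hits in more than half
    of the rounds are kept (real Boruta uses a binomial test instead).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(random_state)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shuffling each column independently breaks any link with y.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=random_state + i)
        forest.fit(np.hstack([X, shadows]), y)
        imp = forest.feature_importances_
        hits += imp[:n_features] > imp[n_features:].max()
    return hits > n_iter / 2  # boolean mask of columns to keep


def tomek_links_mask(X, y):
    """Boolean mask of rows to keep after removing Tomek links (hypothetical helper).

    A Tomek link is a pair of samples with different labels that are each
    other's nearest neighbour. Only the majority-class member of each link
    is dropped. Uses an O(n^2) distance matrix, so it suits small data only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nn = d.argmin(axis=1)                # index of each row's nearest neighbour
    idx = np.arange(len(y))
    is_link = (nn[nn] == idx) & (y[nn] != y)   # mutual neighbours, different labels
    classes, counts = np.unique(y, return_counts=True)
    majority = classes[counts.argmax()]
    return ~(is_link & (y == majority))
```

The steps run in the order the list gives: first `keep = boruta_like_select(X, y)` on the columns, then `rows = tomek_links_mask(X[:, keep], y)` on the rows of the reduced matrix.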
2. Training (if the data set is too big, use a random sample for this step):
2.1 Create a base model with a fixed learning rate and n_estimators:
from xgboost import XGBRFClassifier
XGBRFClassifier(objective='binary:logistic',
                n_estimators=X,       # X: the fixed number of trees
                learning_rate=0.1,
                n_jobs=-1)            # use all CPU cores
2.2 Grid search for the optimal 'max_depth' and 'min_child_weight':
-- use scoring='roc_auc'
-- use cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3)
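The sampling note in step 2 and the search in step 2.2 can be sketched with scikit-learn as follows. The helper names `tuning_sample` and `search_depth_and_weight`, the `max_rows` cut-off and the default grid values are all hypothetical; only the scoring and the cross-validation scheme come from the list above.

```python
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, train_test_split


def tuning_sample(X, y, max_rows=50_000, random_state=0):
    """Stratified random sample for tuning when the data set is too big."""
    if len(y) <= max_rows:
        return X, y
    X_s, _, y_s, _ = train_test_split(
        X, y, train_size=max_rows, stratify=y, random_state=random_state)
    return X_s, y_s


def search_depth_and_weight(estimator, X, y, param_grid=None, random_state=0):
    """Step 2.2: grid search over max_depth and min_child_weight."""
    if param_grid is None:
        # Hypothetical starting ranges; widen or narrow them for your data.
        param_grid = {"max_depth": [3, 5, 7, 9], "min_child_weight": [1, 3, 5]}
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=random_state)
    search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search
```

With xgboost installed, `estimator` would be the base XGBRFClassifier from step 2.1. GridSearchCV is left at its default `n_jobs` because the estimator already uses `n_jobs=-1`, which avoids nested parallelism.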
2.3 Get 'current_best' from gridsearch.best_estimator_, then fit current_best again.
2.4 Grid search for 'gamma' with current_best; when it's done, get 'current_best' from gridsearch.best_estimator_ and fit it again.
2.5 Grid search for 'subsample' and 'colsample_bytree' with current_best; when it's done, get 'current_best' from gridsearch.best_estimator_ and fit it again.
2.6 Grid search for 'learning_rate' with current_best; when it's done, get 'current_best' from gridsearch.best_estimator_ and fit it again.
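Steps 2.2 to 2.6 all follow one pattern: search one group of parameters, take the best estimator, and start the next search from it. A minimal sketch of that loop is shown below. The helper `staged_grid_search` and the grid values in `XGB_STAGES` are hypothetical; the parameter names and their order follow the list above.

```python
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Parameter groups in the order of steps 2.2-2.6; the values are hypothetical.
XGB_STAGES = [
    {"max_depth": [3, 5, 7, 9], "min_child_weight": [1, 3, 5]},
    {"gamma": [0, 0.1, 0.2, 0.3, 0.4]},
    {"subsample": [0.6, 0.8, 1.0], "colsample_bytree": [0.6, 0.8, 1.0]},
    {"learning_rate": [0.01, 0.05, 0.1, 0.2]},
]


def staged_grid_search(estimator, stages, X, y, random_state=0):
    """Tune one parameter group at a time, carrying the best values forward."""
    cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=random_state)
    current_best = clone(estimator)
    history = []
    for grid in stages:
        # GridSearchCV clones current_best, so earlier tuned values are kept.
        search = GridSearchCV(current_best, grid, scoring="roc_auc", cv=cv)
        search.fit(X, y)
        # With the default refit=True, best_estimator_ is already refit on (X, y).
        current_best = search.best_estimator_
        history.append((search.best_params_, search.best_score_))
    return current_best, history
```

Because GridSearchCV's default `refit=True` refits `best_estimator_` on the data passed to `fit`, the explicit re-fit after each stage is not strictly needed with this setup. The `history` list records the best parameters and mean ROC AUC of each stage.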
3. Final training: use all the data to fit the model with all the optimized parameters.
4. Evaluation.
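The post does not say which data the evaluation uses. The sketch below assumes a hold-out test set was split off before tuning. It refits a fresh copy of the tuned model on the full training data rather than the tuning sample, then reports ROC AUC, the same metric used during the search, and a confusion matrix at a 0.5 threshold. The helper `final_fit_and_evaluate` and the threshold are hypothetical choices.

```python
from sklearn.base import clone
from sklearn.metrics import confusion_matrix, roc_auc_score


def final_fit_and_evaluate(tuned, X_train, y_train, X_test, y_test, threshold=0.5):
    """Refit the tuned parameters on all training data, then score a hold-out set."""
    final_model = clone(tuned).fit(X_train, y_train)  # fresh fit, same params
    proba = final_model.predict_proba(X_test)[:, 1]   # probability of class 1
    metrics = {
        "roc_auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, (proba >= threshold).astype(int)),
    }
    return final_model, metrics
```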