MATLAB ... and more ...: 2020

Tuesday, March 3, 2020

Building an XGBoost model

1. Pre-processing:

1.1 Boruta to clean data column-wise
1.2 TomekLinks to clean data row-wise

2. Training (use a random sample for this step if the dataset is too large):

2.1 create a baseline model with a fixed learning rate and n_estimators:
XGBClassifier(objective='binary:logistic',
n_estimators=X,
learning_rate=0.1,
n_jobs=-1)

2.2 grid search for optimal 'max_depth' and 'min_child_weight'.
-- use scoring='roc_auc'
-- use cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=3)

2.3 get 'current_best' from gridsearch.best_estimator_, then fit current_best again.

2.4 then grid search for 'gamma' starting from current_best. When it's done, get 'current_best' from gridsearch.best_estimator_, then fit current_best again.

2.5 then grid search for 'subsample' and 'colsample_bytree'. When it's done, get 'current_best' from gridsearch.best_estimator_, then fit current_best again.

2.6 then grid search for 'learning_rate'. When it's done, get 'current_best' from gridsearch.best_estimator_, then fit current_best again.

3. Final training: use all the data to fit the model with all the optimized params.

4. Evaluation.

Friday, February 28, 2020

Assign bin labels to new values during model inference

In model development:

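import numpy as np
import pandas as pd

# bin edges; pd.cut uses right-closed intervals by default: (0, 10], (10, 15], ..., (30, inf]
bins = [0, 10, 15, 20, 25, 30, np.inf]
ages = list(range(5, 90, 5))
df = pd.DataFrame({"user_age": ages})
# labels=False returns the 0-based integer bin index instead of an interval label
df["user_age_bin"] = pd.cut(df["user_age"], bins=bins, labels=False)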

# sort by age
print(df.sort_values('user_age'))

In production, I need to assign each individual age value to its corresponding bin. Here's how to do it:

# a new age value
new_age=30

# right=True makes the intervals right-closed like pd.cut's default;
# subtracting 1 converts the result to pd.cut's 0-based bin index
print(np.digitize(new_age, bins=bins, right=True) - 1)
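To check that the trick really reproduces the training-time bins, the two approaches can be compared over the same ages. This is a minimal sketch; age_to_bin is a hypothetical helper name, and returning None for values at or below the lowest edge is my own choice (pd.cut yields NaN there, while the raw digitize expression yields -1).

```python
import numpy as np
import pandas as pd

bins = [0, 10, 15, 20, 25, 30, np.inf]

def age_to_bin(age, bins=bins):
    """Hypothetical helper: 0-based bin index for a single value."""
    idx = int(np.digitize(age, bins=bins, right=True)) - 1
    # idx == -1 means the value is at or below the lowest edge, i.e. outside
    # every right-closed interval; pd.cut would give NaN in that case.
    return None if idx < 0 else idx

# Same ages as in model development.
ages = pd.Series(range(5, 90, 5))
expected = pd.cut(ages, bins=bins, labels=False)
actual = ages.map(age_to_bin)

# Every production-style lookup agrees with the pd.cut result.
assert (expected == actual).all()
print(age_to_bin(30))
```

Keeping the bin edges in one shared place (a config file or a constant imported by both the training and the serving code) avoids the two sides drifting apart.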
