MATLAB ... and more ...: 2017-08-11

Saturday, August 12, 2017

2017-08-11

Working on LogisticRegression today:

* Adding class_weight = 'balanced' significantly improves the F-score: recall increases while precision decreases.
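To see why 'balanced' pushes recall up at the cost of precision, it helps to look at the weights it assigns. scikit-learn documents the 'balanced' mode as n_samples / (n_classes * count of each class), so the rare class gets a much larger weight. The sketch below reproduces that formula in plain Python. The label counts and confusion-matrix numbers are made up for illustration only; they are not results from this project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)  # the positive class gets weight 10.0, the negative about 0.53

# Hypothetical confusion counts, chosen only to illustrate the trade-off:
# the unweighted model finds few positives, the weighted one finds more
# positives but also raises more false alarms.
plain = precision_recall_f1(tp=1, fp=0, fn=4)
weighted = precision_recall_f1(tp=4, fp=6, fn=1)
print("unweighted (P, R, F1):", plain)
print("balanced   (P, R, F1):", weighted)
```

With the larger weight, every missed positive costs more during training, so the model flags more cases as positive. That is consistent with the higher recall and lower precision noted above.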

Shiny App Update:

The data originally had 652303 obs. of 299 variables.

Frank added these filters:

library(dplyr)  # provides %>% and filter()

# keep single-applicant records that pass every threshold below
dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 700) %>%
  filter(income > 10000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 19000)

The resulting file, 'High86.rds', now has 86 obs. of 299 variables.
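When a chain of filters cuts 652303 rows down to a few dozen, it is useful to know which step removes the most. The sketch below shows one way to count the rows that survive each step. It is written in Python rather than R, the helper name filter_funnel is hypothetical, and the four toy records are invented. Only the column names and thresholds come from the filters above.

```python
def filter_funnel(rows, steps):
    """Apply named predicates in order and record how many rows survive each."""
    remaining = list(rows)
    report = []
    for name, keep in steps:
        remaining = [row for row in remaining if keep(row)]
        report.append((name, len(remaining)))
    return remaining, report


# Invented toy records; the real data has 299 columns.
rows = [
    {"app_scr": 720, "income": 12000, "applicant_age": 45},
    {"app_scr": 650, "income": 15000, "applicant_age": 50},
    {"app_scr": 710, "income": 9000, "applicant_age": 38},
    {"app_scr": 730, "income": 20000, "applicant_age": 28},
]

# Three of the thresholds from the first filter set.
steps = [
    ("app_scr > 700", lambda r: r["app_scr"] > 700),
    ("income > 10000", lambda r: r["income"] > 10000),
    ("applicant_age > 30", lambda r: r["applicant_age"] > 30),
]

kept, report = filter_funnel(rows, steps)
for name, count in report:
    print(f"{name}: {count} rows remain")
```

A per-step count like this makes it easier to decide which threshold to relax when the final file turns out too small.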

Changed filters again:

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 500) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 15000)

The resulting file, 'High61.rds', now has 61 obs. of 299 variables.

Then updated the filters again:

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 300) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 40) %>%
  filter(time_in_file < 36,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 1500) %>%
  filter(application_status == 'A')

The resulting file, 'High4.rds', has only 4 records.

Then updated again:

library(stringr)  # provides str_detect()

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(income > 6000) %>%
  filter(app_scr > 400) %>%
  # any of the three dealer score reasons flags a risky dealer
  filter(str_detect(dealer_scr_reason1, 'Matches to Risky Dealer in Consortium') |
         str_detect(dealer_scr_reason2, 'Matches to Risky Dealer in Consortium') |
         str_detect(dealer_scr_reason3, 'Matches to Risky Dealer in Consortium')) %>%
  # employer name contains one of these keywords (case-sensitive)
  filter(str_detect(empl_name, 'LLC') |
         str_detect(empl_name, 'venture') |
         str_detect(empl_name, 'consult') |
         str_detect(empl_name, 'enterprise'))

The resulting file is 'High42.rds'.

Then updated again:

dt = dtx %>%
  filter(income > 8000,
         time_in_file > 0,
         time_in_file < 36,
         applicant_age > 30,
         application_status == 'A',
         # city matches any of the five names, ignoring case
         str_detect(addr_city, regex('Miami', ignore_case = TRUE)) |
           str_detect(addr_city, regex('chicago', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Houston', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Baltimore', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Los Angeles', ignore_case = TRUE)))

The resulting file is 'High21.rds'.
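The five case-insensitive city checks amount to one regular expression with alternation. The sketch below shows that idea in Python's re module rather than stringr. The names CITIES, CITY_PATTERN and city_matches are hypothetical. Like str_detect, it matches substrings, so "North Miami Beach" counts as a match. Missing values are treated as non-matches, in line with how filter() drops NA results.

```python
import re

# City names taken from the filter above.
CITIES = ["Miami", "Chicago", "Houston", "Baltimore", "Los Angeles"]

# One alternation pattern, compiled once, case-insensitive.
CITY_PATTERN = re.compile("|".join(re.escape(city) for city in CITIES), re.IGNORECASE)


def city_matches(addr_city):
    """Return True if addr_city contains any target city name, ignoring case."""
    if addr_city is None:
        return False
    return CITY_PATTERN.search(addr_city) is not None


for value in ["MIAMI", "los angeles", "North Miami Beach", "Dallas", None]:
    print(repr(value), city_matches(value))
```

Keeping the city list in one place makes it simpler to add or remove a city than editing a long chain of OR conditions.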

cygwin commands:

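# sort the first file on field 2 (plain lexical order, which is what join expects; a numeric -n sort can disagree with it when the IDs differ in length)

sort -t"|" -k2,2 all_with_appscr_dlrscr_v2.filtered.pip > all_with_appscr_dlrscr_v2.filtered.sorted.pip

# join field 2 of the sorted file with field 1 of appids2reasons.pip (which must also be sorted on its field 1)

join -1 2 -2 1 -t"|" all_with_appscr_dlrscr_v2.filtered.sorted.pip appids2reasons.pip > all_with_appscr_dlrscr_appreasons.pip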
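For reference, here is what the join step produces. It is an inner join on the key: each output line starts with the join field, followed by the remaining fields of the first file and then the remaining fields of the second. The Python sketch below mimics that layout with a dictionary lookup, so unlike join it does not need sorted input. The function name join_pipe and the sample lines are hypothetical; the real files have many more fields.

```python
def join_pipe(left_lines, right_lines, left_field=2, right_field=1, sep="|"):
    """Inner-join two lists of delimited lines on 1-based field numbers.

    Output lines follow join's default layout: key, then the other
    fields of the left line, then the other fields of the right line.
    """
    right_by_key = {}
    for line in right_lines:
        fields = line.rstrip("\n").split(sep)
        key = fields[right_field - 1]
        rest = fields[:right_field - 1] + fields[right_field:]
        right_by_key.setdefault(key, []).append(rest)

    joined = []
    for line in left_lines:
        fields = line.rstrip("\n").split(sep)
        key = fields[left_field - 1]
        rest = fields[:left_field - 1] + fields[left_field:]
        # Unmatched keys are dropped, as in an inner join.
        for right_rest in right_by_key.get(key, []):
            joined.append(sep.join([key] + rest + right_rest))
    return joined


# Invented sample lines: the application ID is field 2 on the left, field 1 on the right.
left = ["A|101|720", "B|102|650", "C|104|700"]
right = ["101|reason x", "103|reason y", "104|reason z"]

for row in join_pipe(left, right):
    print(row)
```

IDs 102 and 103 appear in only one of the two inputs, so they are absent from the result. It is worth comparing line counts before and after the join to see how many applications had no matching reasons.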
