Saturday, August 12, 2017

2017-08-11

Working on LogisticRegression today:

* Adding class_weight = 'balanced' significantly improves the F-score: recall goes up while precision goes down.
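Here is a minimal Python sketch of what 'balanced' does. It assumes scikit-learn's documented rule, weight = n_samples / (n_classes × count of that class), and adds the F1 formula. The label counts and the precision/recall pairs are hypothetical, not results from this model. They only show how a large recall gain can lift F1 even when precision drops.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class count),
    the rule scikit-learn documents for class_weight='balanced'."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
print(balanced_class_weights(y))  # {0: 0.5555555555555556, 1: 5.0}

# Hypothetical scores before/after balancing: precision falls, recall rises.
print(round(f1(0.60, 0.20), 2))  # 0.3
print(round(f1(0.40, 0.60), 2))  # 0.48
```

With these weights the 10 positives carry the same total weight (10 × 5.0 = 50) as the 90 negatives (90 × 5/9 = 50). The model therefore predicts the minority class more often, which is why recall rises and precision tends to fall.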







Shiny App Update:

The original data has 652,303 obs. of 299 variables.

Frank added these filters:

library(dplyr)  # filter() and %>%

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%  # keep rows with an empty co-applicant name
  filter(app_scr > 700) %>%
  filter(income > 10000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 19000)

The resulting file 'High86.rds' has 86 obs. of 299 variables.



Changed the filters again:

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 500) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 15000)

The resulting file 'High61.rds' has 61 obs. of 299 variables.





Then updated the filters again:

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 300) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 40) %>%
  filter(time_in_file < 36,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 1500) %>%
  filter(application_status == 'A')

The resulting file 'High4.rds' has only 4 records.
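When a filter set collapses the data this far, counting the rows that survive each step shows which cutoff is responsible. Below is a hypothetical Python sketch of that check. The thresholds are copied from the filter set above, and four made-up rows stand in for dtx. Column names follow the R code. Unlike dplyr::filter, this sketch does not handle missing values.

```python
def filter_funnel(rows, steps):
    """Apply (name, predicate) steps in order, like chained dplyr::filter
    calls, and return (name, rows remaining) after each step."""
    counts = [("start", len(rows))]
    for name, keep in steps:
        rows = [r for r in rows if keep(r)]
        counts.append((name, len(rows)))
    return counts

# Thresholds from the 'High4.rds' filter set.
steps = [
    ("no co-applicant name", lambda r: r["coapplicant_first_name"] == ""),
    ("app_scr > 300", lambda r: r["app_scr"] > 300),
    ("income > 8000", lambda r: r["income"] > 8000),
    ("applicant_age > 40", lambda r: r["applicant_age"] > 40),
    ("0 < time_in_file < 36", lambda r: 0 < r["time_in_file"] < 36),
    ("high_credit_amount < 15000", lambda r: r["high_credit_amount"] < 15000),
    ("no_of_trades < 6", lambda r: r["no_of_trades"] < 6),
    ("selling_price > 1500", lambda r: r["selling_price"] > 1500),
    ("application_status == 'A'", lambda r: r["application_status"] == "A"),
]

# Hypothetical rows: one passes everything, the others each fail one step.
base = {"coapplicant_first_name": "", "app_scr": 650, "income": 9000,
        "applicant_age": 45, "time_in_file": 12, "high_credit_amount": 5000,
        "no_of_trades": 3, "selling_price": 20000, "application_status": "A"}
rows = [
    base,
    dict(base, income=5000),             # fails income > 8000
    dict(base, applicant_age=35),        # fails applicant_age > 40
    dict(base, application_status="D"),  # fails application_status == 'A'
]

for name, n in filter_funnel(rows, steps):
    print(f"{name:<30} {n}")
```

The step where the count drops most sharply is the first candidate to relax.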



Then updated again:

library(stringr)  # str_detect()

dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(income > 6000) %>%
  filter(app_scr > 400) %>%
  # the phrase appears in any of the three dealer score reasons
  filter(str_detect(dealer_scr_reason1, 'Matches to Risky Dealer in Consortium') |
         str_detect(dealer_scr_reason2, 'Matches to Risky Dealer in Consortium') |
         str_detect(dealer_scr_reason3, 'Matches to Risky Dealer in Consortium')) %>%
  # employer name contains any of these substrings (case-sensitive)
  filter(str_detect(empl_name, 'LLC|venture|consult|enterprise'))

The resulting file is 'High42.rds'.
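By default, str_detect() runs an unanchored, case-sensitive regular-expression search. The employer filter therefore keeps 'Smith Consulting LLC' through 'LLC', but drops 'Blue Sky Ventures' because of the capital V. Below is a hypothetical Python sketch of the same two conditions. It uses re.search as a stand-in for str_detect, which behaves the same for plain patterns like these. The sample strings are made up.

```python
import re

# Same alternation as the R filter on empl_name; case-sensitive, like str_detect's default.
EMPLOYER_PATTERN = re.compile(r"LLC|venture|consult|enterprise")
# The dealer phrase has no regex metacharacters, so a substring test is equivalent.
RISKY_DEALER = "Matches to Risky Dealer in Consortium"

def employer_flag(empl_name):
    """True if the employer name contains any of the four patterns."""
    return EMPLOYER_PATTERN.search(empl_name) is not None

def risky_dealer_flag(reason1, reason2, reason3):
    """True if any of the three dealer score reasons contains the phrase."""
    return any(RISKY_DEALER in r for r in (reason1, reason2, reason3))

print(employer_flag("Smith Consulting LLC"))    # True ('LLC' matches; 'Consulting' does not match 'consult')
print(employer_flag("Blue Sky Ventures"))       # False (capital V)
print(risky_dealer_flag("", RISKY_DEALER, ""))  # True
```

To count capitalized variants as well, wrap the R pattern in regex(..., ignore_case = TRUE), or use re.IGNORECASE here.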





Then updated once more:

dt = dtx %>%
  # comma-separated conditions in one filter() are combined with AND
  filter(income > 8000,
         time_in_file > 0,
         time_in_file < 36,
         applicant_age > 30,
         application_status == 'A',
         # city contains any of these names, ignoring case
         str_detect(addr_city, regex('Miami|chicago|Houston|Baltimore|Los Angeles', ignore_case = TRUE)))

The resulting file is 'High21.rds'.
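str_detect() looks for the pattern anywhere in the string, so the city filter also keeps values such as 'North Miami Beach' or 'Chicago Heights'. Below is a hypothetical Python sketch that contrasts this substring match with an exact, whole-value match. It uses re.IGNORECASE as a stand-in for regex(..., ignore_case = TRUE). The city strings are made up.

```python
import re

CITIES = r"Miami|chicago|Houston|Baltimore|Los Angeles"
CITY_SEARCH = re.compile(CITIES, re.IGNORECASE)           # like str_detect: match anywhere
CITY_EXACT = re.compile(rf"(?:{CITIES})", re.IGNORECASE)  # used with fullmatch: whole value only

def city_contains(addr_city):
    return CITY_SEARCH.search(addr_city) is not None

def city_equals(addr_city):
    return CITY_EXACT.fullmatch(addr_city) is not None

for city in ["MIAMI", "North Miami Beach", "Chicago Heights", "Dallas"]:
    print(city, city_contains(city), city_equals(city))
```

If only exact city names should pass, the R pattern can be anchored instead, e.g. '^(Miami|chicago|Houston|Baltimore|Los Angeles)$'.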









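Cygwin commands:

# sort on field 2 (the join key); join expects a plain lexical sort, so no -n
LC_ALL=C sort -t"|" -k2,2 all_with_appscr_dlrscr_v2.filtered.pip > all_with_appscr_dlrscr_v2.filtered.sorted.pip

# join field 2 of file 1 with field 1 of file 2;
# appids2reasons.pip must also be sorted (same locale) on its field 1
LC_ALL=C join -1 2 -2 1 -t"|" all_with_appscr_dlrscr_v2.filtered.sorted.pip appids2reasons.pip > all_with_appscr_dlrscr_appreasons.pip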
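For reference, join -1 2 -2 1 -t"|" pairs lines whose field 2 in the first file equals field 1 in the second file. By default, each output line holds the join field, then the first file's remaining fields, then the second file's remaining fields. Unmatched lines are dropped. Below is a hypothetical Python sketch of that inner join. It is a hash join, so unlike join it does not need sorted input. The sample lines are made up.

```python
from collections import defaultdict

def pipe_join(lines1, lines2, field1=2, field2=1, sep="|"):
    """Inner-join delimited lines on field1 of lines1 and field2 of lines2
    (1-based, as in join -1/-2). Output: key, rest of line 1, rest of line 2."""
    index = defaultdict(list)
    for line in lines2:
        cols = line.split(sep)
        index[cols[field2 - 1]].append(cols[:field2 - 1] + cols[field2:])
    out = []
    for line in lines1:
        cols = line.split(sep)
        key = cols[field1 - 1]
        rest1 = cols[:field1 - 1] + cols[field1:]
        for rest2 in index.get(key, []):  # every matching pair, like join
            out.append(sep.join([key] + rest1 + rest2))
    return out

print(pipe_join(["x|101|700", "y|102|650"], ["101|reason A", "103|reason B"]))
```

To run it on the real files, read each one with `open(path).read().splitlines()` so that trailing newlines do not end up in the last field.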
