Working on LogisticRegression today:
* Setting class_weight='balanced' significantly improves the F-score: recall goes up, while precision goes down.
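For reference, a minimal sketch of what 'balanced' does to the class weights. scikit-learn documents the weight for each class as n_samples / (n_classes * class_count); the helper name and the toy labels below are hypothetical, not taken from the real data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 90 negatives and 10 positives.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With such an imbalance the rare class gets a much larger weight, so missing a positive costs more than a false alarm. That is consistent with recall rising and precision falling.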
Shiny App Update:
The data originally had 652303 obs. of 299 variables.
Frank added these filters:
dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 700) %>%
  filter(income > 10000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 19000)
The resulting file 'High86.rds' now has 86 obs. of 299 variables.
Changed filters again:
dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 500) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 30) %>%
  filter(time_in_file < 25,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 15000)
The resulting file 'High61.rds' now has 61 obs. of 299 variables.
Then updated the filters again:
dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(app_scr > 300) %>%
  filter(income > 8000) %>%
  filter(applicant_age > 40) %>%
  filter(time_in_file < 36,
         time_in_file > 0) %>%
  filter(high_credit_amount < 15000) %>%
  filter(no_of_trades < 6) %>%
  filter(selling_price > 1500) %>%
  filter(application_status == 'A')
The resulting file 'High4.rds' has only 4 records.
Then updated again:
dt = dtx %>%
  filter(coapplicant_first_name == "") %>%
  filter(income > 6000) %>%
  filter(app_scr > 400) %>%
  filter(str_detect(dealer_scr_reason1, 'Matches to Risky Dealer in Consortium') |
           str_detect(dealer_scr_reason2, 'Matches to Risky Dealer in Consortium') |
           str_detect(dealer_scr_reason3, 'Matches to Risky Dealer in Consortium')) %>%
  filter(str_detect(empl_name, 'LLC') |
           str_detect(empl_name, 'venture') |
           str_detect(empl_name, 'consult') |
           str_detect(empl_name, 'enterprise'))
The resulting file is 'High42.rds'.
Then updated again:
dt = dtx %>%
  filter(income > 8000,
         time_in_file > 0,
         time_in_file < 36,
         applicant_age > 30,
         application_status == 'A',
         str_detect(addr_city, regex('Miami', ignore_case = TRUE)) |
           str_detect(addr_city, regex('chicago', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Houston', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Baltimore', ignore_case = TRUE)) |
           str_detect(addr_city, regex('Los Angeles', ignore_case = TRUE)))
The resulting file is 'High21.rds'.
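cygwin commands:
# sort the pipe-delimited file numerically on field 2 (no need for cat)
sort -t"|" -k2,2n all_with_appscr_dlrscr_v2.filtered.pip > all_with_appscr_dlrscr_v2.filtered.sorted.pip
# join field 2 of the sorted file with field 1 of appids2reasons.pip
# note: join expects both inputs in lexical order on the join field; numeric order only matches that when all IDs have the same number of digits
join -1 2 -2 1 -t"|" all_with_appscr_dlrscr_v2.filtered.sorted.pip appids2reasons.pip > all_with_appscr_dlrscr_appreasons.pip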
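To double-check what the join step produces, here is a rough Python sketch of the same inner join on small in-memory rows. The function name and the sample rows are hypothetical (the real column layout is not shown in these notes). It builds a dictionary index instead of relying on sort order, so it is not a drop-in replacement for join, only an illustration of the output shape: join key first, then the remaining fields of the first file, then those of the second.

```python
def join_on_key(left_rows, right_rows, left_key=1, right_key=0, sep="|"):
    """Inner join mimicking `join -1 2 -2 1`: key first, then the other fields."""
    # Index the right-hand rows by key (a key may appear more than once).
    right_index = {}
    for line in right_rows:
        fields = line.split(sep)
        rest = fields[:right_key] + fields[right_key + 1:]
        right_index.setdefault(fields[right_key], []).append(rest)

    joined = []
    for line in left_rows:
        fields = line.split(sep)
        key = fields[left_key]
        rest = fields[:left_key] + fields[left_key + 1:]
        # Rows without a match on the right are dropped, as in a plain join.
        for right_rest in right_index.get(key, []):
            joined.append(sep.join([key] + rest + right_rest))
    return joined


# Hypothetical sample rows, not taken from the real files.
apps = ["a|101|720", "b|102|655", "c|103|580"]
reasons = ["101|reason A", "103|reason C"]
result = join_on_key(apps, reasons)
print(result)
```

Output rows follow the order of the first input here, whereas join follows the sorted order of its inputs.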
MATLAB applications, tutorials, examples, tricks, resources,...and a little bit of everything I learned ...
Saturday, August 12, 2017