Say I have this dataframe:
order_id | class
1 | furniture
2 | book
2 | furniture
2 | book
3 | auto
3 | auto
3 | electronics
3 | pet
and to get the most frequent class of each order:
df.groupby('order_id').agg({'class': lambda x: x.value_counts().index[0]})
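A minimal runnable sketch of this, using the sample data above (note the aggregation has to target the class column, not order_id):

import pandas as pd

df = pd.DataFrame({'order_id': [1, 2, 2, 2, 3, 3, 3, 3],
                   'class': ['furniture', 'book', 'furniture', 'book',
                             'auto', 'auto', 'electronics', 'pet']})
# value_counts() sorts by frequency, so index[0] is the most frequent value
most_frequent = df.groupby('order_id')['class'].agg(
    lambda x: x.value_counts().index[0])
print(most_frequent)
# order_id 1 -> furniture, 2 -> book, 3 -> auto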
Wednesday, November 14, 2018
Make a simple heatmap in R with ggplot2
So today I got a file that looks like this:
And here's the end result of the heat map:
(Notice that the order of the names in the chart is not the same as in the dataframe.)
Here's the code I used to make this plot:
rm(list=ls())
library(ggplot2)
library(reshape2)  # melt() comes from reshape2
df = read.csv('fakedata.csv')
# reshape the dataframe
df.m = melt(df, id.vars = 'Name')
ggplot(df.m, aes(variable, Name)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient(low = 'white', high = 'blue4')
This is what df.m looks like:
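For comparison, a rough sketch of roughly the same heatmap in Python with pandas and matplotlib (assuming the same fakedata.csv, with a Name column plus numeric columns):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('fakedata.csv')
m = df.set_index('Name')                 # rows: Name, columns: the numeric fields
fig, ax = plt.subplots()
im = ax.imshow(m.values, cmap='Blues')   # white-to-blue scale, like the ggplot version
ax.set_xticks(range(len(m.columns)), m.columns)
ax.set_yticks(range(len(m.index)), m.index)
fig.colorbar(im)
plt.show()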
Tuesday, October 30, 2018
convert a sqlite3 query result table into a pandas dataframe
The sqlite3 query returns a list of tuples. Here's a good way to convert the list of tuples into a pandas data frame.
import sqlite3
import pandas as pd

# dbc is assumed to be a sqlite3 cursor created elsewhere, e.g.:
# dbc = sqlite3.connect('mydb.sqlite').cursor()
def q():
    sql_c = "some query commands here"
    df = pd.DataFrame(dbc.execute(sql_c).fetchall())
    # get all column names from the cursor's description
    cns = [d[0] for d in dbc.description]
    df.columns = cns
    return df
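pandas can also do this in one call with read_sql_query, which pulls the column names for you (the database file and table name here are placeholders):

import sqlite3
import pandas as pd

conn = sqlite3.connect('mydb.sqlite')
df = pd.read_sql_query('SELECT * FROM mytable', conn)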
Saturday, October 27, 2018
DataFrame: create multiple new columns by applying a function that returns multiple values
# add_subtract_list is assumed to return two values per row, e.g. [a + b, a - b];
# wrapping the result in pd.Series is what spreads it across the two columns
df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(add_subtract_list(row['a'], row['b'])), axis=1)
Monday, September 17, 2018
5 available pane layouts when using tmuxp
tmuxp pane layout options (a sample config follows the list):
1. even-horizontal
2. even-vertical
3. main-horizontal
4. main-vertical
5. tiled
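A minimal tmuxp config sketch using one of these layouts (the session/window names and commands are made up):

session_name: demo
windows:
  - window_name: work
    layout: tiled
    panes:
      - echo 'pane one'
      - echo 'pane two'
      - echo 'pane three'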
Thursday, July 12, 2018
Add multiple columns to Pandas dataframe by applying a function that returns multiple values
df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(myfunc(row['a'], row['b'])), axis=1)
# The pd.Series is the key!
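A self-contained sketch of this trick, with myfunc as a stand-in for your own function:

import pandas as pd

def myfunc(a, b):
    # stand-in: returns two values per row
    return a + b, a - b

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})
df[['sum', 'difference']] = df.apply(
    lambda row: pd.Series(myfunc(row['a'], row['b'])), axis=1)
print(df)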
Friday, June 8, 2018
Bash | Print confirmation before exit
command 1
command 2
.
.
.
status=$?   # capture the last command's exit status before echo overwrites it
[ $status -eq 0 ] && echo "Job done!"
exit $status
Tuesday, June 5, 2018
AWK function to add a double quote to the beginning of a line
function addQuote(input) {
return ("\"" input)
}
Tuesday, May 8, 2018
Python read files line by line
# 'with' closes the file automatically; iterating the handle streams line by line
with open('inputFileName', 'r') as fo:
    for ln in fo:
        print(ln)
Monday, April 30, 2018
my .vimrc file
set nocompatible
filetype off
set rtp+=~/.vim/vundle.git/
call vundle#rc()
"Bundles:
Bundle 'gmarik/vundle'
Bundle 'jiangmiao/auto-pairs'
Bundle 'Lokaltog/vim-powerline'
Bundle 'scrooloose/syntastic'
Bundle 'scrooloose/nerdcommenter'
Bundle 'scrooloose/nerdtree'
Bundle 'tmhedberg/SimpylFold'
"
set number
set relativenumber
syntax on
set background=light
hi clear
syntax enable
colorscheme monokai
set shiftwidth=4
set tabstop=4
set expandtab
set laststatus=2
" searching
set hlsearch
set incsearch
set ignorecase
set smartcase
set showmatch
Friday, April 27, 2018
Python | Process text file and get word counts
import codecs
import pandas as pd
import jieba
import jieba.analyse

tag_list = []  # list of all tags
with codecs.open('soup.csv', 'r', 'utf-8') as f:
    for ln in f:
        item = ln.strip("\n\r").split("\t")
        tags = jieba.analyse.extract_tags(item[0])
        for t in tags:
            tag_list.append(t)

tagS = pd.Series(tag_list)
output = tagS.value_counts(ascending=False)
output.to_csv('output.csv', encoding='gbk')
Thursday, April 26, 2018
sed | remove leading space for each row
sed -i "s/^[ \t]*//" filename
# There is no output; the change is made in place.
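A rough Python equivalent, for comparison (reads the whole file, then rewrites it without leading spaces or tabs; the filename is the same placeholder):

with open('filename') as src:
    cleaned = [line.lstrip(' \t') for line in src]
with open('filename', 'w') as dst:
    dst.writelines(cleaned)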
Thursday, April 12, 2018
AWK | remove comma only between double quotes
awk -F'"' -v OFS='' '{ for (i=2; i<=NF; i+=2) gsub(",", "", $i) } 1' inputFile
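For comparison, a rough Python sketch of the same idea using the csv module (file names are placeholders; assumes comma-separated input with double-quoted fields):

import csv

with open('inputFile', newline='') as src, open('outputFile', 'w', newline='') as dst:
    reader = csv.reader(src)   # parses commas inside quoted fields correctly
    writer = csv.writer(dst)
    for row in reader:
        # strip the commas that appeared inside quoted fields
        writer.writerow([field.replace(',', '') for field in row])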
Thursday, April 5, 2018
State names and abbreviations
states = { 'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AS': 'American Samoa', 'AZ': 'Arizona', 'CA': 'California', 'CO': 'Colorado', 'CT': 'Connecticut', 'DC': 'District of Columbia', 'DE': 'Delaware', 'FL': 'Florida', 'GA': 'Georgia', 'GU': 'Guam', 'HI': 'Hawaii', 'IA': 'Iowa', 'ID': 'Idaho', 'IL': 'Illinois', 'IN': 'Indiana', 'KS': 'Kansas', 'KY': 'Kentucky', 'LA': 'Louisiana', 'MA': 'Massachusetts', 'MD': 'Maryland', 'ME': 'Maine', 'MI': 'Michigan', 'MN': 'Minnesota', 'MO': 'Missouri', 'MP': 'Northern Mariana Islands', 'MS': 'Mississippi', 'MT': 'Montana', 'NA': 'National', 'NC': 'North Carolina', 'ND': 'North Dakota', 'NE': 'Nebraska', 'NH': 'New Hampshire', 'NJ': 'New Jersey', 'NM': 'New Mexico', 'NV': 'Nevada', 'NY': 'New York', 'OH': 'Ohio', 'OK': 'Oklahoma', 'OR': 'Oregon', 'PA': 'Pennsylvania', 'PR': 'Puerto Rico', 'RI': 'Rhode Island', 'SC': 'South Carolina', 'SD': 'South Dakota', 'TN': 'Tennessee', 'TX': 'Texas', 'UT': 'Utah', 'VA': 'Virginia', 'VI': 'Virgin Islands', 'VT': 'Vermont', 'WA': 'Washington', 'WI': 'Wisconsin', 'WV': 'West Virginia', 'WY': 'Wyoming' }
Wednesday, March 21, 2018
AWK: add suffix to one column and print out all
The raw data look like:
1 | Field1 | Field2 | Field3
2 | Field1 | Field2 | Field3
3 | Field1 | Field2 | Field3
4 | Field1 | Field2 | Field3
5 | Field1 | Field2 | Field3
6 | Field1 | Field2 | Field3
Want to change it to:
1 | Field1 | Field2$ | Field3
2 | Field1 | Field2$ | Field3
3 | Field1 | Field2$ | Field3
4 | Field1 | Field2$ | Field3
5 | Field1 | Field2$ | Field3
6 | Field1 | Field2$ | Field3
All I need to do is:
awk -F"|" 'BEGIN{OFS="|"} {$3=$3"$"; print}' input.file > outcome.file
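A minimal Python sketch of the same transformation, for comparison (same placeholder file names; it reproduces the example output above):

with open('input.file') as src, open('outcome.file', 'w') as dst:
    for line in src:
        fields = line.rstrip('\n').split('|')
        fields[2] = fields[2].rstrip() + '$ '  # append '$' to the third field
        dst.write('|'.join(fields) + '\n')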
Saturday, March 3, 2018
This is How You Download Your FitBit Second-Level Data without Coding
If you are a Fitbit user who wants to save a copy of your Fitbit data on your computer but doesn't have advanced programming skills, this tutorial is right for you! You don't need to do any coding at all to save your second-level data. I had been struggling to get all the so-called 'intraday data' for quite a while. I found many useful resources online, for example, Paul's tutorial, Collin Chaffin's Powershell module, and the Fitbit-Python API, but they are somewhat complicated and I just could not make any of them work smoothly. Recently I finally figured out a way to download these Fitbit data without any coding. Are you ready?
Step 1: Register an app on http://dev.fitbit.com
First, you need to register an account on dev.fitbit.com, then click 'MANAGE YOUR APPS' in the top right area. Next, click the 'Register a new app' button at the top right to start. This step is simple; just make sure the OAuth 2.0 Application Type is set to 'Personal' and the Callback URL is complete, including the 'http://' part and a trailing '/'. Here I used 'http://google.com/' as an example. I have also used this blog's URL, 'http://shishu.info/', which worked just fine.
After you click the red ‘Register’ button, you will be able to see the credentials for the app you just registered. The ‘OAuth 2.0 Client ID’ and ‘Client Secret’ will be used in the next step.
Step 2: Use the OAuth 2.0 tutorial page
Next, you need to right click the ‘OAuth 2.0 tutorial page’ link and open it in a new tab, so that you can look back at your app’s credentials easily. Make sure the ‘Implicit Grant Flow’ is chosen instead of ‘Authorization Code Flow’ – this will make things much easier! After you copy/paste the ‘Client ID’ and ‘Client Secret’ into the blanks and put in the Redirect URI, click the auto-generated link.
Then you will see the confirmation page. Just click ‘Allow’.
And you will be led to the ‘Redirect URI’, which is ‘http://google.com/’ in this case. But the address bar now shows a very long string which is the token for your app.
Next, copy and paste everything in the address bar except the starting part ( https://www.google.com/ ) into the 'Parse response' section, and hit the Enter key once. This way you can clearly see what the token is, what the scope is, and how long the token is valid. In this case, the token is valid for 1 week, which equals 604800 seconds. Pretty good, right?
Step 3: Make the request and get the data!
After you are done with the ‘Parse response’, the next step is ready for you automatically.
Just click the 'Send to Hurl.it' link and then 'Launch Request' on the new page. Make sure to confirm that you are not a robot, too.
After that, if you see the '200 OK' status, everything worked fine, and you can find the data you want in the lower half of the page. Like this:
If you click 'view raw' on the right side, the 'BODY' section will change to raw text, and you can simply copy/paste it into a text editor. And that's all it takes to download your Fitbit second-level data! As I promised, you don't need any programming skills to accomplish this. Isn't that cool?
Additional tips:
tip 1: other data types
If you want data other than the 'user profile', you can simply change the 'API endpoint URL' in the 'Make Request' step. According to the Fitbit API documentation, you can get heart rate data by using this URL:
https://api.fitbit.com/1/user/-/activities/heart/date/today/1d.json
Or the sleep data by using:
https://api.fitbit.com/1/user/28H22H/sleep/date/2014-09-01.json
tip 2: save json file in an easier way
If you think the 'Send to Hurl.it' method is not fast enough, you can copy the 'curl' command auto-generated in the 'Make Request' step and run it in a terminal (on Windows, the 'cmd' window). Add the following to the end of the copied 'curl' command so that the data is saved to your disk:
>> data_file.json
tip 3: save multiple days’ data automatically
I believe Fitbit provides functionality to download multiple days' data with one request, by specifying the date range in the curl request. However, I could not make it work for some unknown reason, so I used a piece of Python code to download multiple days' data instead. Here's the code, which I think is pretty self-explanatory.
import requests
import pandas as pd
from time import sleep

# put the token for your app in between the single quotes
token = ''

# make a list of dates
# ref: http://stackoverflow.com/questions/993358/creating-a-range-of-dates-in-python
# You can change the start and end date as you want
# Just make sure to use the yyyy-mm-dd format
start_date = '2015-12-28'
end_date = '2016-06-14'
datelist = pd.date_range(start=pd.to_datetime(start_date),
                         end=pd.to_datetime(end_date)).tolist()

'''
The code below uses a for loop to generate one URL for each day in datelist,
then requests each day's data and saves it into an individual json file.
Because Fitbit limits you to 150 requests per hour, the code sleeps for 30
seconds between requests to stay under that limit.
'''
for ts in datelist:
    date = ts.strftime('%Y-%m-%d')
    url = ('https://api.fitbit.com/1/user/-/activities/heart/date/'
           + date + '/1d/1sec/time/00:00/23:59.json')
    filename = 'HR' + date + '.json'
    response = requests.get(url=url, headers={'Authorization': 'Bearer ' + token})
    if response.ok:
        with open(filename, 'w') as f:
            f.write(response.text)  # save the raw JSON response as-is
        print(date + ' is saved!')
        sleep(30)
    else:
        print('The file of %s is not saved due to error!' % date)
        sleep(30)
Monday, February 26, 2018
Find the speed limiting part of your Python code
Say you have a Python script, 'simple.py', like this:
import time

def func_A():
    time.sleep(1)

def func_B():
    time.sleep(10)

if __name__ == '__main__':
    func_A()
    func_B()
and you want to figure out which part takes the most time to finish. What you can do is:
python -m cProfile simple.py
However, the output is a bit too complicated to understand.
A better way is to use 'cprofilev', which you can install from here.
python -m cprofilev simple.py
The output would look like:
[cProfileV]: cProfile output available at http://127.0.0.1:4000
Just go to http://127.0.0.1:4000 and all the information is right there waiting for you.
## https://github.com/ymichael/cprofilev
Tuesday, February 6, 2018
Round to thousands | AWK
function rounding(var1){   # round to the nearest thousand
    a = var1 + 500
    b = int(a / 1000)
    return b
}
BEGIN{
    FS = "|"
    OFS = "|"
}
{
    print $1, $2, $3, rounding($4), rounding($5), rounding($6), rounding($7), rounding($8), rounding($9), rounding($10), rounding($11), rounding($12)
}
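The same rounding in Python, for comparison (a tiny sketch of the add-500-then-integer-divide trick):

def round_to_thousands(x):
    # add 500, then integer-divide by 1000, matching the AWK function above
    return int((x + 500) // 1000)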
Monday, January 29, 2018
Restart Postgres Service
Go to:
C:\Program Files\PostgreSQL\10\bin
Run:
pg_ctl restart -D "C:\Program Files\PostgreSQL\10\data"
Friday, January 26, 2018
When Python is not working properly on Cygwin, use this!
export PATH="C:\Users\Administrator\Anaconda2:$PATH"
Thursday, January 11, 2018
A self-defined algorithm to group strings | Python
import Levenshtein              # pip install python-Levenshtein
from itertools import compress

def group_names(x):
    d1 = dict()
    while x:                    # loop until every name has been grouped
        wx = x[0]
        dist_list = [Levenshtein.distance(wx, w2) for w2 in x]
        indx = [d <= 4 for d in dist_list]
        sub_lst = list(compress(x, indx))       # names within distance 4 of wx
        x = [e for e in x if e not in sub_lst]  # drop the grouped names
        print(len(x))                           # progress: names left to group
        if len(sub_lst) > 1:
            for i in sub_lst[1:]:
                d1[i] = sub_lst[0]              # map each variant to the group's first name
    return d1
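A quick usage sketch with made-up names:

names = ['john smith', 'jon smith', 'john smyth', 'alice wong']
print(group_names(names))
# {'jon smith': 'john smith', 'john smyth': 'john smith'}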
The problem is that when the input list (x) is too long, it takes quite a while to finish.
Wednesday, January 10, 2018
Replace multiple characters in a string | Tableau
Just use nested REPLACE() function:
replace(replace(replace(replace(replace(lower([Employer Name]), "inc",""), "-",""), "&",""), " co","")," llc","")
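The same cleanup in Python, for comparison (a loop instead of nesting; employer_name is a stand-in string):

name = employer_name.lower()
for junk in ('inc', '-', '&', ' co', ' llc'):
    name = name.replace(junk, '')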