Friday, June 14, 2019

speed up loading local csv file into AWS RDS MySQL database

tricks I learned today:
1. use 'LOAD DATA LOCAL INFILE'
2. 'SET AUTOCOMMIT=0', then commit manually at the end.
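
A minimal sketch of the two tricks combined, driven from Python. The function name `bulk_load_csv`, the comma delimiter, and the skipped header row are assumptions for illustration, not part of the original note. `conn` is assumed to be a DB-API connection to MySQL (for example from PyMySQL or mysqlclient) opened with local-infile support enabled, and `table` is assumed to be a trusted identifier.

```python
def bulk_load_csv(conn, csv_path, table):
    """Load a local CSV file into `table` inside one manually committed transaction.

    Assumptions (hypothetical, adjust to your data): comma-separated fields
    and a header row that should be skipped.
    """
    sql = (
        "LOAD DATA LOCAL INFILE %s "
        "INTO TABLE `{}` "
        "FIELDS TERMINATED BY ',' "
        "IGNORE 1 LINES"
    ).format(table)

    cur = conn.cursor()
    try:
        cur.execute("SET autocommit = 0")  # trick 2: no implicit commit
        cur.execute(sql, (csv_path,))      # trick 1: bulk load from the client machine
        conn.commit()                      # commit manually at the end
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Turning autocommit off matters most when several loads or inserts run back to back, since they can then share a single commit.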

Friday, March 15, 2019

Thursday, March 14, 2019

Python function: format dollars

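def format_dollar(s):
    """Take a str or a number and format it as whole dollars,
    e.g. '24567.0' --> '$24,567'.
    Input that cannot be parsed is returned unchanged (as a str).
    """
    s = str(s)  # in case the input is not a string

    try:
        # keep only the integer part; the cents are dropped, not rounded
        i = int(s.split('.')[0])
        output = "$" + "{:,}".format(i)
    except ValueError:
        output = s

    return output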

Tuesday, February 5, 2019

AWK: single-quote each line and add a comma at the end

file.csv looks like:

line1
line2
line3

Use:

awk -v a="'" '{print a $0 a ","}' file.csv

to make it look like:

'line1',
'line2',
'line3',
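
For comparison, here is a rough Python equivalent of the awk one-liner. The helper name `quote_lines` is made up for this sketch, and the input is assumed to be an iterable of lines such as an open file.

```python
def quote_lines(lines):
    """Wrap each line in single quotes and append a comma."""
    return ["'{}',".format(line.rstrip("\n")) for line in lines]


# Works on any iterable of lines, e.g. open('file.csv')
print("\n".join(quote_lines(["line1\n", "line2\n", "line3\n"])))
# 'line1',
# 'line2',
# 'line3',
```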

Thursday, January 10, 2019

Python: Notes on Fluent Python

1.

2. List comprehension

a = [['-'] * 3 for i in range(3)]

b = [['-']*3] *3

What is the difference between a and b?
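
A short sketch that answers the question: the comprehension builds three separate inner lists, while the outer `* 3` repeats a reference to the same inner list three times.

```python
a = [['-'] * 3 for i in range(3)]  # three distinct inner lists
b = [['-'] * 3] * 3                # three references to ONE inner list

a[0][0] = 'X'
b[0][0] = 'X'

print(a)  # [['X', '-', '-'], ['-', '-', '-'], ['-', '-', '-']]
print(b)  # [['X', '-', '-'], ['X', '-', '-'], ['X', '-', '-']]

print(a[0] is a[1])  # False
print(b[0] is b[1])  # True
```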

3. In-place methods

An in-place method changes the object itself, returns None, and does not create a new object. For example:

lst = [5,4,3,2,1]
lst.sort() # sorts lst in place and returns None
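
A small sketch contrasting the in-place `list.sort()` with the built-in `sorted()`, which leaves its input alone and returns a new list. The variable names are only for illustration.

```python
lst = [5, 4, 3, 2, 1]
result = lst.sort()       # mutates lst
print(result)             # None
print(lst)                # [1, 2, 3, 4, 5]

original = [5, 4, 3, 2, 1]
new_list = sorted(original)  # builds and returns a new list
print(new_list)           # [1, 2, 3, 4, 5]
print(original)           # [5, 4, 3, 2, 1]
```

A common slip is writing `lst = lst.sort()`, which replaces the list with None.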


4. Sort a list of strings by length

fruits = ['apple', 'grape', 'orange', 'banana', 'dragon fruit']
sorted(fruits, key=len)

5. Recursion

def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)

print(factorial(5))


6. from operator import itemgetter, attrgetter, methodcaller
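
A brief sketch of what each of the three helpers does. The sample data (`rows`, `Point`, `points`) is invented for illustration.

```python
from collections import namedtuple
from operator import itemgetter, attrgetter, methodcaller

# itemgetter(1) behaves like: lambda row: row[1]
rows = [('pear', 3), ('apple', 7), ('fig', 1)]
by_count = sorted(rows, key=itemgetter(1))
print(by_count)  # [('fig', 1), ('pear', 3), ('apple', 7)]

# attrgetter('x') behaves like: lambda obj: obj.x
Point = namedtuple('Point', 'x y')
points = [Point(2, 5), Point(1, 9)]
by_x = sorted(points, key=attrgetter('x'))
print(by_x)  # [Point(x=1, y=9), Point(x=2, y=5)]

# methodcaller('upper') behaves like: lambda s: s.upper()
upper = methodcaller('upper')
shout = upper('hello')
print(shout)  # HELLO
```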

Monday, December 31, 2018

Pandas: groupby and find the most frequent item

Say I have this dataframe:

order_id | class
1        | furniture
2        | book
2        | furniture
2        | book
3        | auto
3        | auto
3        | electronics
3        | pet

and to get the most frequent class of each order:

df.groupby('order_id').agg({'class': lambda x: x.value_counts().index[0]})
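
As a cross-check on the expected result, the same "most frequent class per order" logic can be sketched with only the standard library. This is not pandas code; the `rows` list simply mirrors the dataframe above.

```python
from collections import Counter, defaultdict

rows = [
    (1, 'furniture'),
    (2, 'book'), (2, 'furniture'), (2, 'book'),
    (3, 'auto'), (3, 'auto'), (3, 'electronics'), (3, 'pet'),
]

# One Counter of classes per order_id
counts = defaultdict(Counter)
for order_id, cls in rows:
    counts[order_id][cls] += 1

# most_common(1) returns [(class, count)] for the top class
most_frequent = {oid: c.most_common(1)[0][0] for oid, c in counts.items()}
print(most_frequent)  # {1: 'furniture', 2: 'book', 3: 'auto'}
```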

Wednesday, November 14, 2018

Make a simple heatmap in R with ggplot2

So today I got a file that looks like this:












And here's the end result of the heat map:
















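(Notice that the order of the names in the chart differs from their order in the dataframe: ggplot2 arranges a discrete axis by factor level, which defaults to alphabetical order, and draws the y axis from bottom to top.)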

Here's the code I used to make this plot:

rm(list=ls())
library(ggplot2)
library(reshape2)  # provides melt()

df = read.csv('fakedata.csv')

# reshape the dataframe
df.m = melt(df, id.vars = 'Name')


ggplot(df.m, aes(variable, Name)) +
  geom_tile(aes(fill = value),
            colour = "white") +
  scale_fill_gradient(low = 'white',
                      high = 'blue4')




This is what df.m looks like:

my-alpine and docker-compose.yml

```
version: '1'
services:
  man:
    build: .
    image: my-alpine:latest
```

Dockerfile:

```
FROM alpine:latest
ENV PYTH...
```