MATLAB ... and more ...

Friday, June 14, 2019

Speed up loading a local CSV file into an AWS RDS MySQL database

Tricks I learned today:
1. Use 'LOAD DATA LOCAL INFILE'.
2. 'SET AUTOCOMMIT=0', then commit manually at the end.
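
The two tricks combine into a short sequence of statements. Below is a minimal sketch that only builds the SQL strings. The helper name, the table name, and the CSV layout (comma-separated, one header row) are assumptions for illustration, and the client usually needs local-infile enabled for the load statement to be accepted.

```python
def build_bulk_load_statements(csv_path, table):
    """Return the SQL statements, in order, for a single-transaction bulk load.

    Hypothetical helper: assumes a comma-separated file with one header row.
    """
    return [
        # turn off per-statement commits so the load is one transaction
        "SET autocommit = 0",
        (
            f"LOAD DATA LOCAL INFILE '{csv_path}' "
            f"INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' "
            "IGNORE 1 LINES"
        ),
        # commit once, after all rows are loaded
        "COMMIT",
    ]


statements = build_bulk_load_statements('orders.csv', 'orders')
for statement in statements:
    print(statement + ';')
```

Each string can be passed in order to whatever MySQL client or cursor is in use.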

Friday, March 15, 2019

Python function: check if an object can be converted to a float

def isfloat(value):
  try:
    float(value)
    return True
  except (TypeError, ValueError):
    # TypeError covers inputs such as None or a list
    return False

Thursday, March 14, 2019

Python function: format dollars

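def format_dollar(s):
    """Take a str or a number and format it as whole dollars,
    e.g. u'24567.0' --> u'$24,567'.
    """
    s = str(s)  # in case the input is not a string

    try:
        # drop anything after the decimal point, then add thousands separators
        i = int(s.split('.')[0])
        output = "$" + "{:,}".format(i)
    except ValueError:
        # not a number: return the input unchanged
        output = s

    return output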

Tuesday, February 5, 2019

AWK: single-quote each line and add a comma at the end

file.csv looks like:

line1
line2
line3

Use:

awk -v q="'" '{print q $0 q ","}' file.csv

to make it look like:

'line1',
'line2',
'line3',

Thursday, January 10, 2019

Python: Notes on Fluent Python

1.

2. List comprehension

a = [['-'] * 3 for i in range(3)]

b = [['-']*3] *3

What is the difference between a and b?
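
A short sketch that answers the question by changing one cell in each list. The list comprehension builds three separate inner lists, while the repetition operator repeats a reference to the same inner list three times.

```python
a = [['-'] * 3 for _ in range(3)]  # three distinct inner lists
b = [['-'] * 3] * 3                # one inner list, referenced three times

a[0][0] = 'X'
b[0][0] = 'X'

print(a)  # only the first row of a changes
print(b)  # every row of b changes, because the rows are the same object

print(a[0] is a[1])  # False
print(b[0] is b[1])  # True
```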

3. In-place methods

An in-place method modifies the object directly and returns None instead of creating a new object. For example:

lst = [5,4,3,2,1]
lst.sort() # returns None; lst is now [1, 2, 3, 4, 5]
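
For contrast, a small sketch comparing the list method with the built-in sorted(), which leaves the original untouched and returns a new list. The variable names are just for illustration.

```python
lst = [5, 4, 3, 2, 1]

new_lst = sorted(lst)       # builds and returns a new sorted list
unchanged = list(lst)       # snapshot: lst is still in its original order

result = lst.sort()         # sorts lst itself and returns None

print(new_lst, unchanged, result, lst)
```

Assigning the result of an in-place call, as in lst = lst.sort(), is a common mistake because it replaces the list with None.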

4. Sort a list of strings by length

fruits = ['apple', 'grape', 'orange', 'banana', 'dragon fruit']
sorted(fruits, key=len)

5. Recursion

def factorial(n):
    return 1 if n < 2 else n * factorial(n - 1)

print(factorial(5))

6. from operator import itemgetter, attrgetter, methodcaller
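
A minimal sketch of what the three helpers do. Each one builds a small callable that can stand in for a lambda. The Fruit record and its values are made up for illustration.

```python
from collections import namedtuple
from operator import itemgetter, attrgetter, methodcaller

# hypothetical records, invented for this example
Fruit = namedtuple('Fruit', 'name count')
stock = [Fruit('orange', 7), Fruit('apple', 3), Fruit('grape', 5)]

# itemgetter(1) is like: lambda row: row[1]
by_count = sorted(stock, key=itemgetter(1))

# attrgetter('name') is like: lambda obj: obj.name
names = [attrgetter('name')(fruit) for fruit in by_count]

# methodcaller('upper') is like: lambda obj: obj.upper()
to_upper = methodcaller('upper')
loud_names = [to_upper(name) for name in names]

print(names)
print(loud_names)
```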

Monday, December 31, 2018

Pandas: groupby and find the most frequent item

Say I have this dataframe:

order_id | class
1 | furniture
2 | book
2 | furniture
2 | book
3 | auto
3 | auto
3 | electronics
3 | pet

To get the most frequent class in each order, aggregate the class column:

df.groupby('order_id').agg({'class': lambda x: x.value_counts().index[0]})
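
Here is a self-contained sketch of the same idea using the table from this post. It selects the class column before aggregating, and the helper name most_frequent is only for illustration. If two classes tie within an order, the winner is whichever one value_counts() lists first.

```python
import pandas as pd

# the example table from this post
df = pd.DataFrame({
    'order_id': [1, 2, 2, 2, 3, 3, 3, 3],
    'class': ['furniture', 'book', 'furniture', 'book',
              'auto', 'auto', 'electronics', 'pet'],
})


def most_frequent(values):
    # value_counts() sorts by count, descending, so the first index wins
    return values.value_counts().index[0]


top_class = df.groupby('order_id')['class'].agg(most_frequent)
print(top_class)
```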

Wednesday, November 14, 2018

Make a simple heatmap in R with ggplot2

So today I got a file that looks like this:

And here's the end result of the heat map:

(Notice that the order of the names in the chart is not the same as in the dataframe.)

Here's the code I used to make this plot:

rm(list=ls())
library(ggplot2)
library(reshape2)  # provides melt()

df = read.csv('fakedata.csv')

# reshape the dataframe from wide to long format
df.m = melt(df, id.vars = 'Name')

# one tile per (variable, Name) pair, shaded by value
ggplot(df.m, aes(variable, Name)) +
  geom_tile(aes(fill = value),
            colour = "white") +
  scale_fill_gradient(low = 'white',
                      high = 'blue4')

This is what df.m looks like: