I know that these two functions: zeros() and ones() can be used to create a series of '0's and '1's in MatLab. For example:
a = zeros(5,1)
b = ones(7)
I just learned about another function today: NaN(). This function can create a vector or matrix with all the elements equals to nan. This is useful if you want to add a new empty column to a dataset, or table.
c = NaN(4,5)
MATLAB applications, tutorials, examples, tricks, resources,...and a little bit of everything I learned ...
Thursday, November 5, 2015
Thursday, October 1, 2015
Aggregate R dataframe with multiple conditions and multiple outputs
t_ag = aggregate(cbind(CPC, PM25)~SiteID+Date+Session+stType2,
data = t,
FUN = mean)
data = t,
FUN = mean)
Thursday, September 24, 2015
Data Wrangling with MongoDB Lesson 6.11 Improving Street Names
"""
Your task in this exercise has two steps:
- audit the OSMFILE and change the variable 'mapping' to reflect the changes needed to fix
the unexpected street types to the appropriate ones in the expected list.
You have to add mappings only for the actual problems you find in this OSMFILE,
not a generalized solution, since that may and will depend on the particular area you are auditing.
- write the update_name function, to actually fix the street name.
The function takes a string with street name as an argument and should return the fixed name
We have provided a simple test so that you see what exactly is expected
"""
import xml.etree.cElementTree as ET
from collections import defaultdict
import re
import pprint
OSMFILE = "example.osm"
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place", "Square", "Lane", "Road",
"Trail", "Parkway", "Commons"]
# UPDATE THIS VARIABLE
mapping = { "St": "Street",
"St.": "Street",
"Rd.": "Road",
"Ave": "Avenue",
"N." : "North"}
def audit_street_type(street_types, street_name):
m = street_type_re.search(street_name)
if m:
street_type = m.group()
if street_type not in expected:
street_types[street_type].add(street_name)
def is_street_name(elem):
#print elem.attrib['k']
#print elem.attrib['v']
return (elem.attrib['k'] == "addr:street")
def audit(osmfile):
osm_file = open(osmfile, "r")
street_types = defaultdict(set)
print street_types
for event, elem in ET.iterparse(osm_file, events=("start",)):
if elem.tag == "node" or elem.tag == "way":
for tag in elem.iter("tag"):
if is_street_name(tag):
audit_street_type(street_types, tag.attrib['v'])
return street_types
def update_name(name, mapping):
# YOUR CODE HERE
betterNameParts= []
name = name.split()
for index, part in enumerate(name):
if part in mapping.keys():
betterNameParts.append(mapping[part])
else:
betterNameParts.append(part)
betterName = ' '.join(betterNameParts)
return betterName
def test():
st_types = audit(OSMFILE)
assert len(st_types) == 3
pprint.pprint(dict(st_types))
for st_type, ways in st_types.iteritems():
for name in ways:
better_name = update_name(name, mapping)
print name, "=>", better_name
if name == "West Lexington St.":
assert better_name == "West Lexington Street"
if name == "Baldwin Rd.":
assert better_name == "Baldwin Road"
if __name__ == '__main__':
test()
Your task in this exercise has two steps:
- audit the OSMFILE and change the variable 'mapping' to reflect the changes needed to fix
the unexpected street types to the appropriate ones in the expected list.
You have to add mappings only for the actual problems you find in this OSMFILE,
not a generalized solution, since that may and will depend on the particular area you are auditing.
- write the update_name function, to actually fix the street name.
The function takes a string with street name as an argument and should return the fixed name
We have provided a simple test so that you see what exactly is expected
"""
import xml.etree.cElementTree as ET
from collections import defaultdict
import re
import pprint
OSMFILE = "example.osm"
street_type_re = re.compile(r'\b\S+\.?$', re.IGNORECASE)
expected = ["Street", "Avenue", "Boulevard", "Drive", "Court", "Place", "Square", "Lane", "Road",
"Trail", "Parkway", "Commons"]
# UPDATE THIS VARIABLE
mapping = { "St": "Street",
"St.": "Street",
"Rd.": "Road",
"Ave": "Avenue",
"N." : "North"}
def audit_street_type(street_types, street_name):
m = street_type_re.search(street_name)
if m:
street_type = m.group()
if street_type not in expected:
street_types[street_type].add(street_name)
def is_street_name(elem):
#print elem.attrib['k']
#print elem.attrib['v']
return (elem.attrib['k'] == "addr:street")
def audit(osmfile):
osm_file = open(osmfile, "r")
street_types = defaultdict(set)
print street_types
for event, elem in ET.iterparse(osm_file, events=("start",)):
if elem.tag == "node" or elem.tag == "way":
for tag in elem.iter("tag"):
if is_street_name(tag):
audit_street_type(street_types, tag.attrib['v'])
return street_types
def update_name(name, mapping):
# YOUR CODE HERE
betterNameParts= []
name = name.split()
for index, part in enumerate(name):
if part in mapping.keys():
betterNameParts.append(mapping[part])
else:
betterNameParts.append(part)
betterName = ' '.join(betterNameParts)
return betterName
def test():
st_types = audit(OSMFILE)
assert len(st_types) == 3
pprint.pprint(dict(st_types))
for st_type, ways in st_types.iteritems():
for name in ways:
better_name = update_name(name, mapping)
print name, "=>", better_name
if name == "West Lexington St.":
assert better_name == "West Lexington Street"
if name == "Baldwin Rd.":
assert better_name == "Baldwin Road"
if __name__ == '__main__':
test()
Wednesday, September 23, 2015
numpy.ndarray doesn't have count attribute
First convert it into a list, by using list() function. Then use the count method to count.
list(nda).count(1)
list(nda).count(1)
Tuesday, September 15, 2015
Python code to process CPC 3007 data
# this code process the one minutes cpc data and convert it to each seconds
import xlrd, xlwt, datetime
from array import array
fileName1 = '06082015.xlsx'
f1WorkBook = xlrd.open_workbook(fileName1)
#check sheet names
sheetName = f1WorkBook.sheet_names()
# select the first sheet
f1Sheet1 = f1WorkBook.sheet_by_index(0)
sn = 19
en = 147
#copy and paste the header lines, row 0 to 17
header = [f1Sheet1.row(r) for r in range(18)]
#header = [f1Sheet1.cell_value(row, col) for col in range(2) for row in range(18)]
print header[0]
### ----------This section deals with header---------------------------
newWorkBook = xlwt.Workbook()
newsheet = newWorkBook.add_sheet('processed_'+fileName1[0:8])
for rowIndex, rowValue in enumerate(header):
for colIndex, cellValue in enumerate(rowValue):
newsheet.write(rowIndex, colIndex, cellValue.value)
###-------This part deals with the time column---------------------------------
time = [f1Sheet1.cell_value(row, col) for col in range(2) for row in range(sn-1,en)]
newTime = [0]*((en-sn)*60) # create an empty vector
newTime[0] = time[0]
print newTime[0]
for i in range(0, en-sn):
for j in range(0,60):
#if j+60*i <70: font="">70:>
# print j+60*i
newTime[j+60*i] = newTime[0]+(j+60*i)/(1440*60.0)
for rowIndex in range(18, 18+(en-sn)*60):
#print rowIndex
newsheet.write(rowIndex, 0, newTime[rowIndex-18])
###---------This section deals with the concentrations -----------------------
conc = [f1Sheet1.cell_value(row,col) for col in range(1,2) for row in range(sn-1, en)]
newConc = [0]*((en-sn)*60)
for i in range(0, en-sn):
for j in range(0,60):
newConc[j+60*i] = conc[i]
for rowIndex in range(18, 18+(en-sn)*60):
newsheet.write(rowIndex, 1, newConc[rowIndex-18])
### ------------------Save file
newWorkBook.save('processed_'+fileName1[0:8]+'.xls')
import xlrd, xlwt, datetime
from array import array
fileName1 = '06082015.xlsx'
f1WorkBook = xlrd.open_workbook(fileName1)
#check sheet names
sheetName = f1WorkBook.sheet_names()
# select the first sheet
f1Sheet1 = f1WorkBook.sheet_by_index(0)
sn = 19
en = 147
#copy and paste the header lines, row 0 to 17
header = [f1Sheet1.row(r) for r in range(18)]
#header = [f1Sheet1.cell_value(row, col) for col in range(2) for row in range(18)]
print header[0]
### ----------This section deals with header---------------------------
newWorkBook = xlwt.Workbook()
newsheet = newWorkBook.add_sheet('processed_'+fileName1[0:8])
for rowIndex, rowValue in enumerate(header):
for colIndex, cellValue in enumerate(rowValue):
newsheet.write(rowIndex, colIndex, cellValue.value)
###-------This part deals with the time column---------------------------------
time = [f1Sheet1.cell_value(row, col) for col in range(2) for row in range(sn-1,en)]
newTime = [0]*((en-sn)*60) # create an empty vector
newTime[0] = time[0]
print newTime[0]
for i in range(0, en-sn):
for j in range(0,60):
#if j+60*i <70: font="">70:>
# print j+60*i
newTime[j+60*i] = newTime[0]+(j+60*i)/(1440*60.0)
for rowIndex in range(18, 18+(en-sn)*60):
#print rowIndex
newsheet.write(rowIndex, 0, newTime[rowIndex-18])
###---------This section deals with the concentrations -----------------------
conc = [f1Sheet1.cell_value(row,col) for col in range(1,2) for row in range(sn-1, en)]
newConc = [0]*((en-sn)*60)
for i in range(0, en-sn):
for j in range(0,60):
newConc[j+60*i] = conc[i]
for rowIndex in range(18, 18+(en-sn)*60):
newsheet.write(rowIndex, 1, newConc[rowIndex-18])
### ------------------Save file
newWorkBook.save('processed_'+fileName1[0:8]+'.xls')
Tuesday, June 9, 2015
A simple example of mixed effects model using simulated data in R
First let's make the dataframe.
rm(list=ls())
library(lme4)
set.seed(1)
ID = c(rep('A',10), rep('B', 10), rep('c', 10))
x = c(1:10)
ya = x + 1 + rnorm(10)
yb = x + 2 + rnorm(10)
yc = x + 3 + rnorm(10)
df <- data.frame="" id="" x="rep(x,3)," y="c(ya,yb,yc))</font">->
Then let's do liner regression.
mdl.lm <- span=""> lm(y~x, data=df)
coef(mdl.lm)->
coef(mdl.lm)->
## (Intercept) x
## 2.073487 1.001631
## 2.073487 1.001631
The linear regression cannot treat each ID seperately. It
provides an overall intercept (2.073487) and overall slope (1.001631) for all
the 30 data points.
Next, let's build a random-intercept mixed effects model.
mdl.mix.RI <- span=""> lmer(y~x+(1+1|ID),data=df, REML=F)
coef(mdl.mix.RI)->
coef(mdl.mix.RI)->
## $ID
## (Intercept) x
## A 1.287210 1.001631
## B 2.211162 1.001631
## c 2.722090 1.001631
##
## attr(,"class")
## [1] "coef.mer"
## (Intercept) x
## A 1.287210 1.001631
## B 2.211162 1.001631
## c 2.722090 1.001631
##
## attr(,"class")
## [1] "coef.mer"
The above model can treat each ID differently, so the
Intercept for each ID are different.
Last, let's build a random-intercept and random-slope mixed
effects model.
mdl.mix.RIS <- span=""> lmer(y~x+(1+x|ID),data=df, REML=F)
coef(mdl.mix.RIS)->
coef(mdl.mix.RIS)->
## $ID
## (Intercept) x
## A 0.9373545 1.0650834
## B 2.2293276 0.9929275
## c 3.0537802 0.9468823
##
## attr(,"class")
## [1] "coef.mer"
## (Intercept) x
## A 0.9373545 1.0650834
## B 2.2293276 0.9929275
## c 3.0537802 0.9468823
##
## attr(,"class")
## [1] "coef.mer"
This model also treat each ID differently, and it gives
different intercept and slope for each ID.
Thursday, May 21, 2015
r apply function to list
http://stackoverflow.com/questions/30339652/r-apply-correlation-function-to-a-list/30340905#30340905
Tuesday, May 19, 2015
annotate p value in italicized font in ggplot
label = substitute(italic(p) == a, list(a=0.371) )
ggplot() + annotate('text', label=as.character(as.expression(label)), x=10, y=10, parse=T)
ggplot() + annotate('text', label=as.character(as.expression(label)), x=10, y=10, parse=T)
Wednesday, April 29, 2015
Relationship between Pearson correlation coefficient and the r-squared from linear regression
Simply put, if you take the square root of the r-squared value, which can be easily obtained in Excel, the outcome equals to the Pearson correlation coefficient.
Here's my R code to test this:
x<-rnorm c="" font="" nbsp="">-rnorm>
y<-rnorm c="" font="">-rnorm>
mdl <- font="" lm="" x="" y=""># linear regression model->
r = summary(mdl)$r.squared # the r-squared value from linear regression
rp = cor.test(x,y, method='pearson')$estimate # rp is the Pearson correlation coefficient
sqrt(r) == rp
Here's my R code to test this:
x<-rnorm c="" font="" nbsp="">-rnorm>
y<-rnorm c="" font="">-rnorm>
mdl <- font="" lm="" x="" y=""># linear regression model->
r = summary(mdl)$r.squared # the r-squared value from linear regression
rp = cor.test(x,y, method='pearson')$estimate # rp is the Pearson correlation coefficient
sqrt(r) == rp
Wednesday, February 11, 2015
Subscribe to:
Posts (Atom)
my-alpine and docker-compose.yml
``` version: '1' services: man: build: . image: my-alpine:latest ``` Dockerfile: ``` FROM alpine:latest ENV PYTH...
-
It took me a while to figure out how to insert a space in Mathtype equations. This is especially useful when you write an equation with mult...
-
Recently I read post from Dr. Doug Hull's blog: http://blogs.mathworks.com/videos/2009/10/23/basics-volume-visualization-19-defining-s...
-
To get the slope of a pair of x and y, usually I first plot the curve and then add the trend line. Actually there are two functions i...