Monday, October 30, 2017

Concatenate multiple files with same headers (and only keep one header line in the output file)

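awk '
# At the first line of every file except the first, skip leading header lines.
FNR==1 && NR!=1 { while (/^<header>/) if ((getline) <= 0) exit }
# Print every remaining line.
1 {print}
' file*.txt > all.txt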
Note: the /^<header>/ pattern needs to be changed to match whatever the actual header is.
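
If every input file is known to start with exactly one header line, the header pattern does not have to be hard-coded at all. A minimal sketch under that assumption (the function name `concat_keep_first_header` is made up for illustration):

```shell
# Hypothetical helper: concatenate files, keeping only the first file's header.
# Assumes every input file starts with exactly one header line.
concat_keep_first_header() {
  # FNR==1 is the first line of each file; NR!=1 excludes the very first line read.
  awk 'FNR==1 && NR!=1 { next } { print }' "$@"
}
```

Usage would look like `concat_keep_first_header file*.txt > all.txt`. Unlike the pattern-based version, this drops the first line of every later file whether or not it really is a header, so it only fits files with a guaranteed header line.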

Tuesday, October 17, 2017

R - conversion

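Don't convert a factor directly to numeric! Convert it to character first! Calling as.numeric() on a factor returns the internal integer level codes, not the numbers shown as labels.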

df$a = as.numeric(df$a) ---> NOT GOOD!

df$a = as.numeric(as.character(df$a)) ---> GOOD!


Monday, October 9, 2017

Convert the first tab on each line into a pipe

sed -e "s/$(printf '\t')/|/" SOC.csv > SOC.pip
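
To see what the substitution does on a line with more than one tab, here is a small sketch. The function name `first_tab_to_pipe` and the sample line are made up for illustration; the literal tab is produced with printf so the command does not depend on sed understanding `\t`.

```shell
# Hypothetical helper: replace only the first tab on each line with a pipe.
first_tab_to_pipe() {
  tab=$(printf '\t')       # a literal tab character
  sed -e "s/${tab}/|/"     # no g flag, so later tabs on the line are kept
}

# Sample input with two tabs; only the first one becomes a pipe.
printf 'Alabama\tAL\textra\n' | first_tab_to_pipe
```

Adding the `g` flag to the `s` command would convert every tab on the line instead of just the first.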

Sanitize U.S. State Names


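# amatch() comes from the stringdist package
library(stringdist)

# load data: one "FullName|Abbr" pair per line
StateData = read.csv('65States.pip', sep="|", col.names = c("FullName", "Abbr"))
StateFullName = toupper(StateData$FullName)
StateAbbr = as.vector(StateData$Abbr)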

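# define a function: replace full state names (at most one edit away) with abbreviations
sanitizeState = function(inputcol, StateFullName, StateAbbr){
  # work on characters, not factor codes
  inputcol = as.character(inputcol)
  # StateFullName is upper case, so compare in upper case
  match = amatch(toupper(inputcol), StateFullName, maxDist=1)
  # entries without a match are left unchanged
  inputcol[!is.na(match)] = StateAbbr[na.omit(match)]
  return (inputcol)
}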

# use the function
df$State = sanitizeState(gls$State, StateFullName, StateAbbr)

my-alpine and docker-compose.yml

```
version: '1'
services:
    man:
      build: .
      image: my-alpine:latest
```

Dockerfile:
```
FROM alpine:latest
ENV PYTH...