MATLAB ... and more ...: September 2017

Wednesday, September 20, 2017

python - read large csv into pandas by chunks

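import pandas as pd
# Read the file lazily, 500,000 rows per chunk (read_table splits on tabs by default; use pd.read_csv for commas)
chunks = pd.read_table('filename', chunksize=500000)
# Keep only the matching rows of each chunk ('col' is a hypothetical column name), then concatenate
df = pd.concat(chunk[chunk['col'] == 1] for chunk in chunks)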
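A self-contained sketch of the same idea, using an in-memory string instead of a file so it runs as-is. The helper `read_filtered`, the column names `id` and `flag`, and the sample data are all hypothetical. Each chunk is an ordinary DataFrame, so it can be filtered before the pieces are concatenated; that way only one full chunk plus the matching rows needs to be held in memory at a time.

```python
import io

import pandas as pd


def read_filtered(source, column, value, chunksize):
    """Read `source` in chunks and keep only rows where `column` == `value`.

    Hypothetical helper: filtering happens per chunk, so the full file
    is never loaded into a single DataFrame.
    """
    chunks = pd.read_csv(source, chunksize=chunksize)
    kept = [chunk[chunk[column] == value] for chunk in chunks]
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame
    return pd.concat(kept, ignore_index=True)


# Hypothetical sample data standing in for a large CSV file
data = io.StringIO("id,flag\n1,1\n2,0\n3,1\n4,0\n5,1\n")
df = read_filtered(data, "flag", 1, chunksize=2)
print(df)
```

With `chunksize=2` the five rows arrive as three chunks, and only rows 1, 3 and 5 survive the filter.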

remove duplicates

To remove duplicated rows (the first occurrence of each row is kept):

awk '!seen[$0]++' <filename>

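To remove rows with a duplicated field (say $1 is the ID, and the entire row should be dropped when its ID has already appeared; awk splits fields on whitespace by default, so add -F',' for a comma-separated file):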

awk '!seen[$1]++' <filename>
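Both one-liners rely on the same trick: `seen[key]++` returns the old count (0 the first time a key appears) and then increments it, so `!seen[key]++` is true only on the first sighting, and awk's default action prints the line. A Python sketch of the same logic, for use outside awk (the function names are hypothetical); like the awk versions it keeps the first occurrence and preserves the original order:

```python
def unique_lines(lines):
    """Yield each line the first time it is seen, like awk '!seen[$0]++'."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def unique_by_field(lines, index=0, sep=None):
    """Yield a line only the first time its field `index` is seen,
    like awk '!seen[$1]++' (index=0 corresponds to awk's $1).

    sep=None splits on runs of whitespace, similar to awk's default FS.
    """
    seen = set()
    for line in lines:
        fields = line.split(sep)
        # A line with too few fields gets an empty key, as awk gives "" for a missing $N
        key = fields[index] if index < len(fields) else ""
        if key not in seen:
            seen.add(key)
            yield line


rows = ["a 1", "b 2", "a 1", "a 3", "c 4"]
print(list(unique_lines(rows)))     # ['a 1', 'b 2', 'a 3', 'c 4']
print(list(unique_by_field(rows)))  # ['a 1', 'b 2', 'c 4']
```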

Tuesday, September 19, 2017

filter a file based on tokens in another file

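# Print only the rows whose first field appears as a token in the first
# column of Token_list_file.csv (both files are "|"-delimited).
# Usage: awk -f <this_script> <filename>
BEGIN {
    FS = "|"
    OFS = "|"
    # Load every token from the token list into the array id
    while ((getline < "Token_list_file.csv") > 0) {
        id[$1] = $1
    }
    close("Token_list_file.csv")
}

{
    appid = $1
    # Print the row only if its first field is a known token
    if (appid in id) { print $0 }
}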
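The awk script above builds a lookup table from the token file in its BEGIN block and then tests each input row against it. A Python sketch of the same two-pass approach, with hypothetical in-memory data standing in for Token_list_file.csv and the input file; the function name `filter_by_tokens` is also hypothetical:

```python
import io


def filter_by_tokens(token_file, data_file, sep="|"):
    """Yield rows of data_file whose first field is a token listed in the
    first field of token_file."""
    # Build the lookup set, like the id[] array filled in the BEGIN block
    tokens = {line.rstrip("\n").split(sep)[0] for line in token_file}
    for line in data_file:
        row = line.rstrip("\n")
        if row.split(sep)[0] in tokens:
            yield row


# Hypothetical stand-ins for Token_list_file.csv and the data file
token_list = io.StringIO("A1|x\nC3|y\n")
data = io.StringIO("A1|alpha\nB2|beta\nC3|gamma\nD4|delta\n")
print(list(filter_by_tokens(token_list, data)))  # ['A1|alpha', 'C3|gamma']
```

A set gives constant-time membership tests, which is the same reason the awk version uses an associative array with `in`.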
