Friday, April 27, 2018

Python | Process text file and get word counts


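import codecs

import jieba.analyse
import pandas as pd

tag_list = []  # every keyword extracted from every line

# Read the tab-separated file as UTF-8; the text to analyse is in the first column.
with codecs.open('soup.csv', 'r', 'utf-8') as f:
    for ln in f:
        item = ln.strip("\n\r").split("\t")
        # extract_tags returns the top keywords (ranked by TF-IDF) for the text
        tags = jieba.analyse.extract_tags(item[0])
        tag_list.extend(tags)

# Count how often each keyword occurs, most frequent first.
tagS = pd.Series(tag_list)
output = tagS.value_counts(ascending=False)

# Write the counts as a GBK-encoded CSV.
output.to_csv('output.csv', encoding='gbk')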
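
If pandas is only needed for the counting step, the standard library can do the same job. The following is a minimal sketch, not part of the original script: `count_tags` and `sample_tags` are hypothetical names, and the sample list simply stands in for the `tag_list` built above.

```python
from collections import Counter


def count_tags(tags):
    """Return (tag, count) pairs sorted by count, highest first."""
    return Counter(tags).most_common()


# Hypothetical sample data standing in for the real tag_list
sample_tags = ["python", "pandas", "python", "jieba", "python", "pandas"]

for tag, count in count_tags(sample_tags):
    print(tag, count)
```

`Counter.most_common()` returns the pairs in descending order of count, which matches what `value_counts(ascending=False)` produces. Writing the result to a file would then be a matter of passing the pairs to `csv.writer`.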

