Thursday, January 11, 2018

A self-defined algorithm to group strings | Python

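from itertools import compress

import Levenshtein  # third-party package (python-Levenshtein)


def group_names(x):
    # Map every name to the first name found within edit distance 4 of it.
    d1 = dict()
    remaining = list(x)
    while remaining:
        wx = remaining[0]  # first ungrouped name represents its group
        dist_list = [Levenshtein.distance(wx, w2) for w2 in remaining]
        indx = [d <= 4 for d in dist_list]
        sub_lst = list(compress(remaining, indx))
        # Drop the whole group so its members are not compared again.
        remaining = [e for e in remaining if e not in sub_lst]
        print(len(remaining))  # progress: names still ungrouped
        for i in sub_lst[1:]:
            d1[i] = sub_lst[0]
    return d1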


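The problem is that when the input list (x) is long, the function takes quite a while to finish: in the worst case every name gets compared with every other name, so the number of distance calls grows quadratically with the length of the list.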
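
One cheap way to cut down the number of distance calls, sketched here only as an idea and not benchmarked, is to skip pairs whose lengths already differ by more than the threshold, since the edit distance between two strings is never smaller than the difference of their lengths. The names `edit_distance`, `group_names_prefiltered`, `max_dist` and `distance` below are made up for this sketch; `edit_distance` is a plain-Python stand-in so the example runs without extra packages, and `Levenshtein.distance` could presumably be passed as `distance` instead.

```python
def edit_distance(a, b):
    """Plain dynamic-programming edit distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def group_names_prefiltered(names, max_dist=4, distance=edit_distance):
    """Greedy grouping that skips pairs whose lengths differ by more than max_dist."""
    mapping = {}
    remaining = list(names)
    while remaining:
        rep = remaining[0]  # first ungrouped name represents its group
        rest = []
        for other in remaining[1:]:
            # Length difference is a lower bound on the edit distance, so the
            # (expensive) distance call only runs when the lengths are close.
            if abs(len(rep) - len(other)) <= max_dist and distance(rep, other) <= max_dist:
                mapping[other] = rep
            else:
                rest.append(other)
        remaining = rest
    return mapping


names = ["Jonathan Smith", "Jonathon Smith", "Jonathan Smyth",
         "Al", "Alexander the Great"]
print(group_names_prefiltered(names))
# {'Jonathon Smith': 'Jonathan Smith', 'Jonathan Smyth': 'Jonathan Smith'}
```

The returned dictionary has the same shape as before: each grouped name points to the representative of its group, and names without a close match do not appear. The worst case is still quadratic, so this only helps when the lengths of the names vary a lot.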
