Tuesday, 26 June 2018

python - Quickest way to dedupe list in dict

I have a dict containing lists and need a fast way to dedupe the lists.



I know how to dedupe a single list using the set() function, but in this case I want a fast way of iterating through the dict and deduping each list as I go.
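A minimal sketch of the "iterate and dedupe as I go" idea is shown here. It assumes the element order inside each list does not matter, because a set does not keep the original order. Reassigning the value of an existing key while iterating is safe. Adding or removing keys during iteration is not, and raises a RuntimeError.

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Replace each list with a deduped copy. Only values change, never keys,
# so it is safe to do this while iterating over the dict.
for key, value in hello.items():
    hello[key] = list(set(value))  # element order is not guaranteed

print(hello)
```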



hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}



I'd like the result to look like this:



hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}


That said, I don't necessarily need the original order of each list to be preserved.
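If the order does turn out to matter, one common approach (not the one used in the question) is dict.fromkeys, which keeps the first occurrence of each element. The sketch below assumes Python 3.7 or later, where dicts are guaranteed to preserve insertion order, and hashable list elements. It builds a new dict and leaves hello unchanged.

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# dict.fromkeys(value) keeps only the first occurrence of each element,
# and the dict remembers the order in which elements first appeared.
deduped = {key: list(dict.fromkeys(value)) for key, value in hello.items()}

print(deduped)  # {'test1': [2, 3, 4, 5, 6], 'test2': [5, 8, 4, 3, 9]}
```

This produces exactly the desired output shown above, with the original order of each list kept.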



I've tried using a set like this, but it isn't quite right (it doesn't seem to iterate properly, and I'm losing the first key):



for key, value in hello.items(): goodbye = {key: set(value)}

>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}
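The attempt above rebinds goodbye to a brand-new one-entry dict on every pass of the loop, so only the key processed last survives. (The set([...]) display is Python 2 output; Python 3 shows {8, 9, 3, 4, 5}.) A sketch of two equivalent fixes: create the output dict once before the loop, or use a dict comprehension.

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Fix 1: create the output dict once, then add one entry per key.
goodbye = {}
for key, value in hello.items():
    goodbye[key] = set(value)

# Fix 2: the same result as a dict comprehension.
goodbye2 = {key: set(value) for key, value in hello.items()}

print(goodbye == goodbye2)  # True
print(sorted(goodbye))      # ['test1', 'test2']
```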


EDIT: Following PM 2Ring's comment, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but using sets prevents duplicates from being added at all:



>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)

>>> for n in my_numbers['first']: final_list['test_first'].add(n)
...
>>> final_list['test_first']
set([1, 2, 5, 6])


As you can see, the final output is a deduped set, as required.
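The same defaultdict(set) pattern extends to many keys at once. The sketch below assumes the data arrives as (key, number) pairs; the incoming list is hypothetical sample data, not from the question. If plain lists are needed afterwards, the sets can be converted once at the end.

```python
from collections import defaultdict

# Hypothetical input: (key, number) pairs arriving one at a time.
incoming = [('test1', 2), ('test1', 3), ('test1', 2),
            ('test2', 5), ('test2', 5), ('test2', 8)]

final = defaultdict(set)  # a missing key starts out as an empty set
for key, number in incoming:
    final[key].add(number)  # adding an element that is already present does nothing

# Convert to sorted lists only if list values are actually required.
as_lists = {key: sorted(numbers) for key, numbers in final.items()}
print(as_lists)  # {'test1': [2, 3], 'test2': [5, 8]}
```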
