Python: find a duplicate in a container efficiently
I have a container cont. If I want to find out whether it has any duplicates, I'll check len(cont) == len(set(cont)).

Suppose I want to find a duplicate element if one exists (just an arbitrary duplicate element). Is there a neat, efficient way to write that?
[Python 3]
OK, my first answer has gotten quite a bit of flak, so I thought I'd try a few different ways of doing it and report the differences. Here's the code.
import sys
import itertools

def getfirstdup(c, totest):
    # original idea using list slicing => 5.014 s
    if totest == '1':
        for i in range(0, len(c)):
            if c[i] in c[:i]:
                return c[i]
    # using 2 sets => 4.305 s
    elif totest == '2':
        s = set()
        for i in c:
            s2 = s.copy()
            s.add(i)
            if len(s) == len(s2):
                return i
    # using a dictionary LUT => 0.763 s
    elif totest == '3':
        d = {}
        for i in c:
            if i in d:
                return i
            else:
                d[i] = 1
    # using set operations => 0.772 s
    elif totest == '4':
        s = set()
        for i in c:
            if i in s:
                return i
            else:
                s.add(i)
    # sorting and walking => 5.130 s
    elif totest == '5':
        c = sorted(c)
        for i in range(1, len(c)):
            if c[i] == c[i - 1]:
                return c[i]
    # sorting and groupby-ing => 5.086 s
    else:
        c = sorted(c)
        for k, g in itertools.groupby(c):
            if len(list(g)) > 1:
                return k
    return None

c = list(range(0, 10000000))
c[5000] = 0
for i in range(0, 10):
    print(getfirstdup(c, sys.argv[1]))
Basically, I try it in six different ways, as listed in the source file. I used the Linux time command and collected the real-time runtimes, running the commands like so:

time python ./test.py 1

with 1 being the algorithm I wanted to try. Each algorithm looks for the first duplicate in 10,000,000 integers, and runs ten times. There is one duplicate in the list, which is "mostly sorted", though I did try reverse-sorted lists without noticing a proportional difference between the algorithms.
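The same kind of timing can also be collected in-process with the standard timeit module instead of the shell time command. A minimal sketch, using the set-based variant (algorithm 4 above) under a hypothetical name and a smaller list so it runs quickly; the numbers will not match the benchmark above:

```python
import timeit

def firstdup_set(c):
    # set-based variant: return the first duplicate seen, or None
    seen = set()
    for x in c:
        if x in seen:
            return x
        seen.add(x)
    return None

c = list(range(1_000_000))  # smaller list than the original benchmark
c[5000] = 0                 # plant a single duplicate, as in the test above

elapsed = timeit.timeit(lambda: firstdup_set(c), number=10)
print(f"10 runs: {elapsed:.3f} s")
```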
My original suggestion did poorly at 5.014 s. My understanding of icyrock.com's solution did poorly at 4.305 s. Next I tried using a dictionary to create a LUT, which gave the best runtime at 0.763 s. I tried using the in operator on sets, and got 0.772 s, nearly as good as the dictionary LUT. I tried sorting and walking the list, which gave a pitiful time of 5.130 s. Finally, I tried John Machin's suggestion of itertools, which gave a poor time of 5.086 s.
In summary, the dictionary LUT seems to be the way to go, with set operations (which may use LUTs in their implementation) being a close second.
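Note that the hash-based variants also answer the original question more directly than the len(cont) == len(set(cont)) check: they work on any iterable of hashable items (including generators) and stop at the first duplicate instead of consuming everything. A sketch, with the helper name being mine rather than from the code above:

```python
def first_duplicate(items):
    """Return an arbitrary (first-encountered) duplicate, or None."""
    seen = set()
    for item in items:
        if item in seen:
            return item
        seen.add(item)
    return None

print(first_duplicate([1, 2, 3, 2, 1]))       # → 2
print(first_duplicate(iter(range(5))))        # → None (works on iterators too)
```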
Update: I tried razpeitia's suggestion, and apart from the fact that you need to know precisely which duplicate key you're looking for, the actual algorithm did the worst by far (66.366 s).
Update 2: I'm sure somebody is going to say this test is biased because the duplicate's location is near one end of the list. Try running the code with the duplicate at a different location before downvoting, and report your results!
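To check that location bias directly, one can plant the duplicate at a few different positions and time each run. A rough sketch using the set-based variant under a hypothetical helper name and a smaller list than the original benchmark:

```python
import time

def firstdup_set(c):
    # set-based variant: return the first duplicate seen, or None
    seen = set()
    for x in c:
        if x in seen:
            return x
        seen.add(x)
    return None

n = 1_000_000
for pos in (5000, n // 2, n - 1):
    c = list(range(n))
    c[pos] = 0  # plant the single duplicate at a different spot each time
    t0 = time.perf_counter()
    result = firstdup_set(c)
    print(f"duplicate at {pos}: {time.perf_counter() - t0:.4f} s (found {result})")
```

The hash-based approaches should be fastest when the duplicate is early, since they short-circuit; the sorting-based ones pay the full sort cost regardless of where the duplicate sits.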