python - Abuse yield to avoid condition in loop
I need to search for the first, last, any, or all occurrences of something in something else. To avoid repeating myself (DRY) I came up with the following solution.
Of interest are the methods search_revisions() and collect_one_occurence() of both searcher classes.
In SearcherYield I create a generator in search_revisions(), only to abandon the generator in collect_one_occurence() after collecting the first result. In SearcherCondition I put a condition in the loop. This condition has to be checked on every iteration of the loop.
I can't decide whether my (ab)use of yield and the subsequent abandoning of the generator is a stroke of genius or a hideous hack. What do you think? Do you have any other ideas for such a situation?
    #!/usr/bin/python

    class Revision:
        # A revision is a textfile.
        # The search() method will search the textfile
        # and return the lines which match the given pattern.
        # For demonstration purposes this class is simplified
        # to return predefined results.
        def __init__(self, results):
            self.results = results

        def search(self, pattern):
            return self.results

    class AbstractSearcher:
        def __init__(self, revisions):
            self.revisions = revisions

        def search_for_first_occurence(self, pattern):
            keys = sorted(self.revisions.iterkeys())
            return self.collect_one_occurence(keys, pattern)

        def search_for_last_occurence(self, pattern):
            keys = sorted(self.revisions.iterkeys(), reverse = True)
            return self.collect_one_occurence(keys, pattern)

        def search_for_any_occurence(self, pattern):
            keys = self.revisions.iterkeys()
            return self.collect_one_occurence(keys, pattern)

        def search_for_all_occurences(self, pattern):
            keys = self.revisions.iterkeys()
            return self.collect_all_occurences(keys, pattern)

    class SearcherYield(AbstractSearcher):
        def search_revisions(self, keys, pattern):
            # create a generator that yields the results one by one
            for key in keys:
                rev = self.revisions[key]
                result = rev.search(pattern)
                if result:
                    yield result

        def collect_one_occurence(self, keys, pattern):
            # take the first result and then abandon the generator
            for result in self.search_revisions(keys, pattern):
                return result
            return []

        def collect_all_occurences(self, keys, pattern):
            # collect all results from the generator
            results = []
            for result in self.search_revisions(keys, pattern):
                results.extend(result)
            return results

    class SearcherCondition(AbstractSearcher):
        def search_revisions(self, keys, pattern, just_one):
            # collect either all results from all revisions
            # or break the loop after the first result is found
            results = []
            for key in keys:
                rev = self.revisions[key]
                result = rev.search(pattern)
                if result:
                    results.extend(result)
                    if just_one:
                        break
            return results

        def collect_one_occurence(self, keys, pattern):
            return self.search_revisions(keys, pattern, just_one = True)

        def collect_all_occurences(self, keys, pattern):
            return self.search_revisions(keys, pattern, just_one = False)

    def demo(searcher):
        print searcher.__class__.__name__
        print 'first:', searcher.search_for_first_occurence('foo')
        print 'last: ', searcher.search_for_last_occurence('foo')
        print 'any: ', searcher.search_for_any_occurence('foo')
        print 'all: ', searcher.search_for_all_occurences('foo')

    def main():
        revisions = {
            1: Revision([]),
            2: Revision(['a', 'b']),
            3: Revision(['c']),
            4: Revision(['d', 'e', 'f']),
            5: Revision([])}
        demo(SearcherYield(revisions))
        demo(SearcherCondition(revisions))

    if __name__ == '__main__':
        main()
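The early return in collect_one_occurence() works because a generator is lazy: it only runs as far as the consumer asks. Here is a minimal sketch of that behaviour, not the code from the listing above. The standalone search_revisions() function, the visited list and the plain dict of result lists are hypothetical stand-ins added for illustration, and the built-in next() with a default value (available since Python 2.6) replaces the for/return pair.

```python
def search_revisions(revisions, keys, pattern, visited):
    """Yield non-empty results lazily and record every key that was searched.

    `visited` is a hypothetical helper list, added only to make the
    laziness observable. `pattern` is unused, as in the simplified post.
    """
    for key in keys:
        visited.append(key)
        result = revisions[key]  # stand-in for rev.search(pattern)
        if result:
            yield result


# Same shape as the demo data in the post, with plain lists as results.
revisions = {1: [], 2: ['a', 'b'], 3: ['c'], 4: ['d', 'e', 'f'], 5: []}

visited = []
generator = search_revisions(revisions, sorted(revisions), 'foo', visited)

# Ask for one result only; the default [] covers the "no match" case.
first = next(generator, [])

print(first)    # the first non-empty result
print(visited)  # keys searched before the generator was abandoned
```

Revisions 3, 4 and 5 are never searched in this sketch, so abandoning the generator does not waste work on the remaining revisions.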
Some context: the revisions are basically text files. You can think of them as the revisions of a wiki page. Typically there are hundreds of revisions, sometimes thousands. Each revision contains up to thousands of lines of text. There are also cases where there are only a few revisions with a few lines each.
A search in a revision will search for a pattern in the text and return the matching lines. Sometimes there are thousands of results, sometimes there are no results.
Sometimes I just need to know whether there are any results in any revision (search any). Sometimes I have to collect all the results for further processing (search all). Sometimes I just need the first revision with a match, sometimes just the last revision (search first and last).
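For the "search any" case, where only a yes/no answer is needed, one possible alternative is the built-in any() over a generator expression, which also stops at the first match. This is a sketch under assumptions, not the approach used in the post: the Revision class below is a simplified stand-in, and its calls counter is hypothetical, added only to show the short-circuiting.

```python
class Revision:
    """Simplified stand-in for the post's revision class.

    The class-level `calls` counter is hypothetical; it exists only to
    show how many revisions were actually searched.
    """
    calls = 0

    def __init__(self, results):
        self.results = results

    def search(self, pattern):
        Revision.calls += 1
        return self.results


revisions = [Revision([]), Revision(['a', 'b']), Revision(['c'])]

# any() consumes the generator expression lazily and stops at the
# first truthy (non-empty) search result.
has_match = any(rev.search('foo') for rev in revisions)

print(has_match)       # whether any revision matched
print(Revision.calls)  # number of revisions searched
```

Unlike search_for_any_occurence() in the post, this returns a boolean rather than the matching lines, so it only fits callers that do not need the lines themselves.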
I did a benchmark. Here are the results:

    $ ./benchmark.py
    benchmark revcount: 1000 timeitcount: 1000
    last, first, yield: 0.902059793472
    last, first, cond: 0.897155046463
    last, all, yield: 0.818709135056
    last, all, cond: 0.818334102631
    all, all, yield: 1.26602506638
    all, all, cond: 1.17208003998
    benchmark revcount: 2000 timeitcount: 1000
    last, first, yield: 1.80768609047
    last, first, cond: 1.84234118462
    last, all, yield: 1.64661192894
    last, all, cond: 1.67588806152
    all, all, yield: 2.55621600151
    all, all, cond: 2.37582707405
    benchmark revcount: 10000 timeitcount: 1000
    last, first, yield: 9.34304785728
    last, first, cond: 9.33725094795
    last, all, yield: 8.4673140049
    last, all, cond: 8.49153590202
    all, all, yield: 12.9636368752
    all, all, cond: 11.780673027
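To put the raw timings in proportion, the relative difference between the two variants can be computed directly from the numbers above. The sketch below only does arithmetic on the posted figures; the helper name overhead_percent() is hypothetical, and the "last, all" rows are left out for brevity.

```python
# Timings in seconds, copied from the benchmark output above:
# (revcount, yield time, cond time)
all_all = [
    (1000, 1.26602506638, 1.17208003998),
    (2000, 2.55621600151, 2.37582707405),
    (10000, 12.9636368752, 11.780673027),
]
last_first = [
    (1000, 0.902059793472, 0.897155046463),
    (2000, 1.80768609047, 1.84234118462),
    (10000, 9.34304785728, 9.33725094795),
]


def overhead_percent(yield_time, cond_time):
    """Return how much slower (positive) or faster (negative) yield is, in percent."""
    return (yield_time / cond_time - 1.0) * 100.0


for label, rows in (("all, all", all_all), ("last, first", last_first)):
    for revcount, yield_time, cond_time in rows:
        print("%-11s revcount=%5d  yield vs cond: %+.1f%%"
              % (label, revcount, overhead_percent(yield_time, cond_time)))
```

On these figures the yield variant is roughly 8 to 10 percent slower when every revision produces a result ("all, all"), and within about 2 percent of the condition variant in the "last, first" case.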
The yield and condition solutions show very similar times. I think this is because the generator (yield) has a loop with a condition in it (if not empty, or something like that). I thought I had avoided the condition in the loop, but I only moved it out of sight.
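The hidden check can be made visible by writing out what a for loop over a generator does: it keeps asking for the next item and stops when the iterator signals exhaustion with StopIteration. The following is a rough sketch of those semantics, not of how the interpreter implements them internally; the function names and the list-of-lists input are hypothetical simplifications of the searcher classes.

```python
def search_revisions(results_per_revision):
    """Yield each non-empty result, like SearcherYield.search_revisions()."""
    for result in results_per_revision:
        if result:
            yield result


def collect_all_with_for(results_per_revision):
    """Collect all results with an ordinary for loop."""
    collected = []
    for result in search_revisions(results_per_revision):
        collected.extend(result)
    return collected


def collect_all_desugared(results_per_revision):
    """Same loop with the per-iteration exhaustion check written out."""
    collected = []
    iterator = iter(search_revisions(results_per_revision))
    while True:
        try:
            result = next(iterator)  # ask the generator for one more item
        except StopIteration:        # the "condition" the for loop hides
            break
        collected.extend(result)
    return collected


data = [[], ['a', 'b'], ['c'], ['d', 'e', 'f'], []]
print(collect_all_with_for(data))
print(collect_all_desugared(data))
```

Both functions return the same list, which illustrates that the generator version still performs one check per iteration; it is simply performed by the loop machinery instead of an explicit if.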
Anyway, the numbers show that the performance is mostly equal, so the code should be judged by readability. I will stick with the condition in the loop. I like explicit.
Here is the benchmark code:
    #!/usr/bin/python

    import functools
    import timeit

    class Revision:
        # A revision is a textfile.
        # The search() method will search the textfile
        # and return the lines which match the given pattern.
        # For demonstration purposes this class is simplified
        # to return predefined results.
        def __init__(self, results):
            self.results = results

        def search(self, pattern):
            return self.results

    class AbstractSearcher:
        def __init__(self, revisions):
            self.revisions = revisions

        def search_for_first_occurence(self, pattern):
            keys = sorted(self.revisions.iterkeys())
            return self.collect_one_occurence(keys, pattern)

        def search_for_last_occurence(self, pattern):
            keys = sorted(self.revisions.iterkeys(), reverse = True)
            return self.collect_one_occurence(keys, pattern)

        def search_for_any_occurence(self, pattern):
            keys = self.revisions.iterkeys()
            return self.collect_one_occurence(keys, pattern)

        def search_for_all_occurences(self, pattern):
            keys = self.revisions.iterkeys()
            return self.collect_all_occurences(keys, pattern)

    class SearcherYield(AbstractSearcher):
        def search_revisions(self, keys, pattern):
            # create a generator that yields the results one by one
            for key in keys:
                rev = self.revisions[key]
                result = rev.search(pattern)
                if result:
                    yield result

        def collect_one_occurence(self, keys, pattern):
            # take the first result and then abandon the generator
            for result in self.search_revisions(keys, pattern):
                return result
            return []

        def collect_all_occurences(self, keys, pattern):
            # collect all results from the generator
            results = []
            for result in self.search_revisions(keys, pattern):
                results.extend(result)
            return results

    class SearcherCondition(AbstractSearcher):
        def search_revisions(self, keys, pattern, just_one):
            # collect either all results from all revisions
            # or break the loop after the first result is found
            results = []
            for key in keys:
                rev = self.revisions[key]
                result = rev.search(pattern)
                if result:
                    results.extend(result)
                    if just_one:
                        break
            return results

        def collect_one_occurence(self, keys, pattern):
            return self.search_revisions(keys, pattern, just_one = True)

        def collect_all_occurences(self, keys, pattern):
            return self.search_revisions(keys, pattern, just_one = False)

    def benchmark(revcount, timeitcount):
        # only the last revision has a result
        lastrev = {}
        for i in range(revcount):
            lastrev[i] = Revision([])
        lastrev[revcount] = Revision([1])

        # every revision has a result
        allrevs = {}
        for i in range(revcount):
            allrevs[i] = Revision([1])

        last_yield = SearcherYield(lastrev)
        last_cond = SearcherCondition(lastrev)
        all_yield = SearcherYield(allrevs)
        all_cond = SearcherCondition(allrevs)

        lfy = functools.partial(last_yield.search_for_first_occurence, 'foo')
        lfc = functools.partial(last_cond.search_for_first_occurence, 'foo')
        lay = functools.partial(last_yield.search_for_all_occurences, 'foo')
        lac = functools.partial(last_cond.search_for_all_occurences, 'foo')
        aay = functools.partial(all_yield.search_for_all_occurences, 'foo')
        aac = functools.partial(all_cond.search_for_all_occurences, 'foo')

        print 'benchmark revcount: %d timeitcount: %d' % (revcount, timeitcount)
        print 'last, first, yield:', timeit.timeit(lfy, number = timeitcount)
        print 'last, first, cond:', timeit.timeit(lfc, number = timeitcount)
        print 'last, all, yield:', timeit.timeit(lay, number = timeitcount)
        print 'last, all, cond:', timeit.timeit(lac, number = timeitcount)
        print ' all, all, yield:', timeit.timeit(aay, number = timeitcount)
        print ' all, all, cond:', timeit.timeit(aac, number = timeitcount)

    def main():
        timeitcount = 1000
        benchmark(1000, timeitcount)
        benchmark(2000, timeitcount)
        benchmark(10000, timeitcount)

    if __name__ == '__main__':
        main()
Some information about my system:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 10.04.1 LTS
    Release:        10.04
    Codename:       lucid
    $ uname -a
    Linux lesmana-laptop 2.6.32-26-generic #46-Ubuntu SMP Tue Oct 26 16:46:46 UTC 2010 i686 GNU/Linux
    $ python --version
    Python 2.6.5
    $ cat /proc/cpuinfo | grep name
    model name      : Intel(R) Pentium(R) M processor 1.60GHz