python - Abuse yield to avoid condition in loop


I need to search for the first, last, any, or all occurrences of something in something else. To avoid repeating myself (DRY) I came up with the following solution.

Of interest are the methods search_revisions() and collect_one_occurence() of both searcher classes.

In SearcherYield I create a generator in search_revisions(), only to abandon the generator in collect_one_occurence() after collecting the first result. In SearcherCondition I put a condition in the loop. This condition has to be checked on every iteration of the loop.

I can't decide whether my (ab)use of yield and the subsequent abandoning of the generator is a strike of genius or a hideous hack. What do you think? Do you have any other ideas for such a situation?

    #!/usr/bin/python

    class Revision:
      # a revision is something like a textfile.
      # the search() method will search the textfile
      # and return the lines which match the given pattern.
      # for demonstration purposes this class is simplified
      # to return predefined results
      def __init__(self, results):
        self.results = results
      def search(self, pattern):
        return self.results

    class AbstractSearcher:
      def __init__(self, revisions):
        self.revisions = revisions
      def search_for_first_occurence(self, pattern):
        keys = sorted(self.revisions.iterkeys())
        return self.collect_one_occurence(keys, pattern)
      def search_for_last_occurence(self, pattern):
        keys = sorted(self.revisions.iterkeys(), reverse = True)
        return self.collect_one_occurence(keys, pattern)
      def search_for_any_occurence(self, pattern):
        keys = self.revisions.iterkeys()
        return self.collect_one_occurence(keys, pattern)
      def search_for_all_occurences(self, pattern):
        keys = self.revisions.iterkeys()
        return self.collect_all_occurences(keys, pattern)

    class SearcherYield(AbstractSearcher):

      def search_revisions(self, keys, pattern):
        # create a generator which yields the results one by one
        for key in keys:
          rev = self.revisions[key]
          result = rev.search(pattern)
          if result:
            yield result

      def collect_one_occurence(self, keys, pattern):
        # take the first result and then abandon the generator
        for result in self.search_revisions(keys, pattern):
          return result
        return []

      def collect_all_occurences(self, keys, pattern):
        # collect all results from the generator
        results = []
        for result in self.search_revisions(keys, pattern):
          results.extend(result)
        return results

    class SearcherCondition(AbstractSearcher):

      def search_revisions(self, keys, pattern, just_one):
        # collect either all results from all revisions
        # or break the loop after the first result is found
        results = []
        for key in keys:
          rev = self.revisions[key]
          result = rev.search(pattern)
          if result:
            results.extend(result)
            if just_one:
              break
        return results

      def collect_one_occurence(self, keys, pattern):
        return self.search_revisions(keys, pattern, just_one = True)

      def collect_all_occurences(self, keys, pattern):
        return self.search_revisions(keys, pattern, just_one = False)

    def demo(searcher):
      print searcher.__class__.__name__
      print 'first:', searcher.search_for_first_occurence('foo')
      print 'last: ', searcher.search_for_last_occurence('foo')
      print 'any:  ', searcher.search_for_any_occurence('foo')
      print 'all:  ', searcher.search_for_all_occurences('foo')

    def main():
      revisions = {
            1: Revision([]),
            2: Revision(['a', 'b']),
            3: Revision(['c']),
            4: Revision(['d','e', 'f']),
            5: Revision([])}
      demo(SearcherYield(revisions))
      demo(SearcherCondition(revisions))

    if __name__ == '__main__':
      main()
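One possible variation on collect_one_occurence(), assuming Python 2.6 or later: the built-in next() accepts a default value, so taking the first item from the generator may not need a for loop with an early return. This is only a sketch and not part of the code above; FakeRevision, the module-level search_revisions() and first_occurrence() are made-up names for illustration.

```python
class FakeRevision(object):
    # Hypothetical stand-in for the Revision class: returns canned results.
    def __init__(self, results):
        self.results = results

    def search(self, pattern):
        return self.results


def search_revisions(revisions, keys, pattern):
    # Lazily yield each non-empty search result, one revision at a time.
    for key in keys:
        result = revisions[key].search(pattern)
        if result:
            yield result


def first_occurrence(revisions, keys, pattern):
    # next() pulls a single item from the generator;
    # the default [] covers the case where nothing matches.
    return next(search_revisions(revisions, keys, pattern), [])


revisions = {
    1: FakeRevision([]),
    2: FakeRevision(['a', 'b']),
    3: FakeRevision(['c']),
}
first = first_occurrence(revisions, sorted(revisions), 'foo')
last = first_occurrence(revisions, sorted(revisions, reverse=True), 'foo')
```

The generator is still abandoned after one item, exactly as in SearcherYield; next() just makes that intent visible at the call site.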

Some context: revisions are basically text files. You can think of them as the revisions of a wiki page. Typically there are hundreds of revisions, sometimes thousands. Each revision contains up to thousands of lines of text. There are also cases when there are just a few revisions of a few lines each.

A search in a revision will search for a pattern in the text and return the matching lines. Sometimes there are thousands of results, sometimes there are no results.

Sometimes I just need to know whether there are any results in any revision (search any). Sometimes I have to collect all the results for further processing (search all). Sometimes I just need the first revision with a match, sometimes just the last revision (search first and last).

I did a benchmark. Here are the results:
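For the "search any" case, where only a yes/no answer is needed, a generator expression passed to the built-in any() might be enough, since any() stops at the first truthy item. The sketch below assumes a simplified model that is not in the original code: each revision is a plain list of lines, and search() and has_any_match() are hypothetical helpers.

```python
def search(lines, pattern):
    # Hypothetical search: keep the lines that contain the pattern.
    return [line for line in lines if pattern in line]


def has_any_match(revisions, pattern):
    # any() short-circuits on the first revision with a non-empty result,
    # so later revisions are not searched once a match is found.
    return any(search(lines, pattern) for lines in revisions.values())


revisions = {1: ['spam'], 2: ['foo bar', 'baz'], 3: ['foo']}
found = has_any_match(revisions, 'foo')
missing = has_any_match(revisions, 'qux')
```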

    $ ./benchmark.py
    benchmark revcount: 1000 timeitcount: 1000
    last, first, yield: 0.902059793472
    last, first,  cond: 0.897155046463
    last,   all, yield: 0.818709135056
    last,   all,  cond: 0.818334102631
     all,   all, yield: 1.26602506638
     all,   all,  cond: 1.17208003998
    benchmark revcount: 2000 timeitcount: 1000
    last, first, yield: 1.80768609047
    last, first,  cond: 1.84234118462
    last,   all, yield: 1.64661192894
    last,   all,  cond: 1.67588806152
     all,   all, yield: 2.55621600151
     all,   all,  cond: 2.37582707405
    benchmark revcount: 10000 timeitcount: 1000
    last, first, yield: 9.34304785728
    last, first,  cond: 9.33725094795
    last,   all, yield: 8.4673140049
    last,   all,  cond: 8.49153590202
     all,   all, yield: 12.9636368752
     all,   all,  cond: 11.780673027

The yield and condition solutions show very similar times. I think this is because the generator (yield) has a loop with a condition in it (if not empty, or something like that). I thought I had avoided the condition in the loop, but I only moved it out of sight.

Anyway, the numbers show that the performance is mostly equal, so the code should be judged by readability. I will stick with the condition in the loop. It is more explicit.

Here is the benchmark code:
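To illustrate why the two variants end up doing about the same work, here is a small sketch that counts calls to search(). It is not the benchmark itself; CountingRevision, search_lazy() and search_with_flag() are hypothetical names that mirror the two search_revisions() methods.

```python
class CountingRevision(object):
    # Hypothetical revision that counts how often search() is called.
    calls = 0  # class-level counter shared by all instances

    def __init__(self, results):
        self.results = results

    def search(self, pattern):
        CountingRevision.calls += 1
        return self.results


def search_lazy(revisions, keys, pattern):
    # Generator variant: one search and one "if result" check per revision.
    for key in keys:
        result = revisions[key].search(pattern)
        if result:
            yield result


def search_with_flag(revisions, keys, pattern, just_one):
    # Condition variant: the same search and check, plus the just_one test.
    results = []
    for key in keys:
        result = revisions[key].search(pattern)
        if result:
            results.extend(result)
            if just_one:
                break
    return results


revisions = {1: CountingRevision([]), 2: CountingRevision(['a']),
             3: CountingRevision(['b']), 4: CountingRevision(['c'])}
keys = sorted(revisions)

CountingRevision.calls = 0
lazy_first = next(search_lazy(revisions, keys, 'foo'), [])
lazy_calls = CountingRevision.calls

CountingRevision.calls = 0
flag_first = search_with_flag(revisions, keys, 'foo', just_one=True)
flag_calls = CountingRevision.calls
```

In this setup both variants stop after searching revisions 1 and 2, so the only difference left is the extra just_one test versus the overhead of suspending and resuming the generator.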

    #!/usr/bin/python

    import functools
    import timeit

    class Revision:
      # a revision is something like a textfile.
      # the search() method will search the textfile
      # and return the lines which match the given pattern.
      # for demonstration purposes this class is simplified
      # to return predefined results
      def __init__(self, results):
        self.results = results
      def search(self, pattern):
        return self.results

    class AbstractSearcher:
      def __init__(self, revisions):
        self.revisions = revisions
      def search_for_first_occurence(self, pattern):
        keys = sorted(self.revisions.iterkeys())
        return self.collect_one_occurence(keys, pattern)
      def search_for_last_occurence(self, pattern):
        keys = sorted(self.revisions.iterkeys(), reverse = True)
        return self.collect_one_occurence(keys, pattern)
      def search_for_any_occurence(self, pattern):
        keys = self.revisions.iterkeys()
        return self.collect_one_occurence(keys, pattern)
      def search_for_all_occurences(self, pattern):
        keys = self.revisions.iterkeys()
        return self.collect_all_occurences(keys, pattern)

    class SearcherYield(AbstractSearcher):

      def search_revisions(self, keys, pattern):
        # create a generator which yields the results one by one
        for key in keys:
          rev = self.revisions[key]
          result = rev.search(pattern)
          if result:
            yield result

      def collect_one_occurence(self, keys, pattern):
        # take the first result and then abandon the generator
        for result in self.search_revisions(keys, pattern):
          return result
        return []

      def collect_all_occurences(self, keys, pattern):
        # collect all results from the generator
        results = []
        for result in self.search_revisions(keys, pattern):
          results.extend(result)
        return results

    class SearcherCondition(AbstractSearcher):

      def search_revisions(self, keys, pattern, just_one):
        # collect either all results from all revisions
        # or break the loop after the first result is found
        results = []
        for key in keys:
          rev = self.revisions[key]
          result = rev.search(pattern)
          if result:
            results.extend(result)
            if just_one:
              break
        return results

      def collect_one_occurence(self, keys, pattern):
        return self.search_revisions(keys, pattern, just_one = True)

      def collect_all_occurences(self, keys, pattern):
        return self.search_revisions(keys, pattern, just_one = False)

    def benchmark(revcount, timeitcount):

      # only the last revision has a result
      lastrev = {}
      for i in range(revcount):
        lastrev[i] = Revision([])
      lastrev[revcount] = Revision([1])

      # every revision has a result
      allrevs = {}
      for i in range(revcount):
        allrevs[i] = Revision([1])

      last_yield = SearcherYield(lastrev)
      last_cond = SearcherCondition(lastrev)
      all_yield = SearcherYield(allrevs)
      all_cond = SearcherCondition(allrevs)

      lfy = functools.partial(last_yield.search_for_first_occurence, 'foo')
      lfc = functools.partial(last_cond.search_for_first_occurence, 'foo')
      lay = functools.partial(last_yield.search_for_all_occurences, 'foo')
      lac = functools.partial(last_cond.search_for_all_occurences, 'foo')
      aay = functools.partial(all_yield.search_for_all_occurences, 'foo')
      aac = functools.partial(all_cond.search_for_all_occurences, 'foo')

      print 'benchmark revcount: %d timeitcount: %d' % (revcount, timeitcount)
      print 'last, first, yield:', timeit.timeit(lfy, number = timeitcount)
      print 'last, first,  cond:', timeit.timeit(lfc, number = timeitcount)
      print 'last,   all, yield:', timeit.timeit(lay, number = timeitcount)
      print 'last,   all,  cond:', timeit.timeit(lac, number = timeitcount)
      print ' all,   all, yield:', timeit.timeit(aay, number = timeitcount)
      print ' all,   all,  cond:', timeit.timeit(aac, number = timeitcount)

    def main():
      timeitcount = 1000
      benchmark(1000, timeitcount)
      benchmark(2000, timeitcount)
      benchmark(10000, timeitcount)

    if __name__ == '__main__':
      main()

Some information about my system:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 10.04.1 LTS
    Release:        10.04
    Codename:       lucid
    $ uname -a
    Linux lesmana-laptop 2.6.32-26-generic #46-Ubuntu SMP Tue Oct 26 16:46:46 UTC 2010 i686 GNU/Linux
    $ python --version
    Python 2.6.5
    $ cat /proc/cpuinfo | grep name
    model name  : Intel(R) Pentium(R) M processor 1.60GHz
