performance - Forcing sequential processing in Haskell's Data.Binary.Get
After trying to import the basic Java runtime library rt.jar with language-java-classfile, I've discovered that it uses huge amounts of memory.
I've reduced the program demonstrating the problem to 100 lines and uploaded it to hpaste. Without forcing the evaluation of stream in line #94, I have no chance of ever running it, because it eats all my memory. Forcing stream before passing it to getClass lets the program finish, but it still uses huge amounts of memory:
    34,302,587,664 bytes allocated in the heap
    32,583,990,728 bytes copied during GC
       139,810,024 bytes maximum residency (398 sample(s))
        29,142,240 bytes maximum slop
               281 MB total memory in use (4 MB lost due to fragmentation)

    Generation 0: 64992 collections,     0 parallel, 38.07s, 37.94s elapsed
    Generation 1:   398 collections,     0 parallel, 25.87s, 27.78s elapsed

    INIT  time    0.01s  (  0.00s elapsed)
    MUT   time   37.22s  ( 36.85s elapsed)
    GC    time   63.94s  ( 65.72s elapsed)
    RP    time    0.00s  (  0.00s elapsed)
    PROF  time   13.00s  ( 13.18s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time  114.17s  (115.76s elapsed)

    %GC time      56.0%  (56.8% elapsed)

    Alloc rate    921,369,531 bytes per MUT second

    Productivity  32.6% of total user, 32.2% of total elapsed
I thought the problem was the ConstTables staying around, so I tried forcing cls in line #94 as well. But this makes both the memory consumption and the runtime even worse:
    34,300,700,520 bytes allocated in the heap
    23,579,794,624 bytes copied during GC
       487,798,904 bytes maximum residency (423 sample(s))
        36,312,104 bytes maximum slop
               554 MB total memory in use (10 MB lost due to fragmentation)

    Generation 0: 64983 collections,     0 parallel, 71.19s, 71.48s elapsed
    Generation 1:   423 collections,     0 parallel, 344.74s, 353.01s elapsed

    INIT  time    0.01s  (  0.00s elapsed)
    MUT   time   40.60s  ( 42.38s elapsed)
    GC    time  415.93s  (424.49s elapsed)
    RP    time    0.00s  (  0.00s elapsed)
    PROF  time   56.53s  ( 57.71s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time  513.07s  (524.58s elapsed)

    %GC time      81.1%  (80.9% elapsed)

    Alloc rate    844,636,801 bytes per MUT second

    Productivity   7.9% of total user, 7.7% of total elapsed
So my question is basically: how can I force sequential processing of the files involved, so that after each one is processed, only the String result (cls) remains in memory?
Edit 2: I just realized that your code does this:

    stream <- BL.pack <$> fileContents [] classfile
Don't do that. The pack functions are notoriously slow. You'll need to find a solution that doesn't involve using pack to create a ByteString.

I'm leaving the rest of my answer because I still think it applies, but this is the biggest problem.

Unfortunately I can't test this because I don't recognize your imports.

If you want only the result cls to remain in memory, why don't you force it instead of forcing stream? Change line 94 to

    cls `seq` return cls
It may be necessary to use deepseq instead of seq, since seq only evaluates to the outermost constructor, although I have a suspicion that plain seq is sufficient here.
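To make the difference between the two levels of forcing concrete, here is a minimal sketch that uses only base. The names forceList and throwsWhenForced are hypothetical helpers written for this illustration; forceList is a hand-rolled stand-in for what deepseq would do for a list of Ints, not part of any library.

```haskell
import Control.Exception (SomeException, catch, evaluate)

-- Hypothetical helper: force every element of the list, then return the list.
-- foldr seq xs xs expands to x1 `seq` (x2 `seq` ... `seq` xs).
forceList :: [Int] -> [Int]
forceList xs = foldr seq xs xs

-- Hypothetical helper: evaluate a value to weak head normal form (what seq
-- does) and report whether doing so raised an exception.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = (evaluate x >> return False) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = return True

main :: IO ()
main = do
  let partial = [1, undefined, 3] :: [Int]
  -- Forcing only the outermost constructor never touches the elements.
  shallow <- throwsWhenForced partial
  -- Forcing every element hits the undefined one.
  deep <- throwsWhenForced (forceList partial)
  print (shallow, deep)  -- prints (False,True)
```

If cls contains lazy fields that still reference the input, shallow forcing alone would leave those references alive, which is the case where deepseq would be needed.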
However, I think there's a better solution, and that's to use mapM_ instead of mapM. I think it's better style (and often better performance) to write a function that does what it's supposed to with each result rather than returning a list. Here, you can change your main function to:
    main = withArchive [CheckConsFlag] jarPath $ do
        classfiles <- filter isClassfile <$> fileNames []
        forM_ classfiles $ \classfile -> do
            stream <- BL.pack <$> fileContents [] classfile
            let cls = runGet getClass stream
            lift $ print cls
Now print is lifted into the function passed to forM_ and runs once for each classfile. The value cls is used internally and never returned, so it's both evaluated and GC'd on each iteration of forM_.
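The same pattern can be sketched without the archive code, using only base. Here parseItem is a hypothetical stand-in for runGet getClass stream, and the two functions contrast consuming each result inside the loop with returning all of them in a list; this is an illustration of the style, not a measurement of the original program.

```haskell
import Control.Monad (forM, forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for parsing one class file into a sizeable result.
parseItem :: Int -> [Int]
parseItem n = replicate 1000 n

-- forM_ style: each result is consumed inside the loop body, so nothing
-- but the strictly updated running total outlives an iteration.
sumConsumed :: [Int] -> IO Int
sumConsumed inputs = do
  total <- newIORef 0
  forM_ inputs $ \n -> do
    let cls = parseItem n
    modifyIORef' total (+ sum cls)
  readIORef total

-- forM style, for contrast: the list of results is built and returned
-- before the caller consumes any of it.
sumCollected :: [Int] -> IO Int
sumCollected inputs = do
  results <- forM inputs (return . parseItem)
  return (sum (map sum results))

main :: IO ()
main = do
  a <- sumConsumed [1 .. 100]
  b <- sumCollected [1 .. 100]
  print (a == b)
```

Both functions compute the same total; the difference is only in how long each intermediate result has to stay reachable.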
Making use of this style in a larger application may require some refactoring or redesign, but the results may be worth it.

Edit: if you're going to the trouble of redesigning the code, use iteratees and avoid this problem entirely.