serialization - How to deserialize 1GB of objects into Python faster than cPickle?


We've got a Python-based web server that unpickles a number of large data files on startup using cPickle. The data files (pickled using HIGHEST_PROTOCOL) are around 0.4 GB on disk and load into memory as about 1.2 GB of Python objects, which takes about 20 seconds. We're using Python 2.6 on 64-bit Windows machines.

The bottleneck is not the disk (it takes less than 0.5s to read the data), but memory allocation and object creation (there are millions of objects being created). We want to reduce the 20s to decrease startup time.

Is there a way to deserialize more than 1 GB of objects into Python faster than cPickle (like 5-10x)? Because the execution time is bound by memory allocation and object creation, I presume that using another unpickling technique such as JSON wouldn't help here.

I know some interpreted languages have a way to save their entire memory image as a disk file, so they can load it back into memory in one go, without allocation/creation for each object. Is there a way to do this, or achieve something similar, in Python?

  1. Try the marshal module. It's internal (used by the byte-compiler) and intentionally not advertised much, but it is faster. Note that it doesn't serialize arbitrary instances like pickle does, only builtin types (I don't remember the exact constraints, see the docs). Also note that the format isn't stable.

  2. If you need to initialize multiple processes and can tolerate one process always being loaded, there is an elegant solution: load the objects in one process, and then do nothing in it except fork processes on demand. Forking is fast (copy on write) and shares the memory between all processes. [Disclaimers: untested; unlike Ruby, Python's reference counting will trigger page copies, so this is probably useless if you have huge objects and/or access only a small fraction of them.]

  3. If your objects contain lots of raw data like numpy arrays, you can memory-map them for faster startup. pytables is also good for these scenarios.

  4. If you'll only use a small part of the objects, then an OO database (like Zope's) can probably help you. Though if you need them all in memory, you will just waste lots of overhead for little gain. (I've never used one, so this might be nonsense.)

  5. Maybe other Python implementations can do it? I don't know, it's just a thought...
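
To make suggestion 1 concrete, here is a minimal sketch that times loading the same data with pickle and with marshal. It is written for Python 3, where `pickle` uses the C implementation that replaced `cPickle`; `build_sample` and `time_load` are hypothetical helpers invented for this illustration, and the sample data is a stand-in made only of builtin types, since that is all marshal supports. No timings are claimed here; measure on your own data.

```python
import marshal
import pickle
import time


def build_sample(n=200000):
    # Hypothetical stand-in for the real data set: builtin types only,
    # because marshal cannot serialize arbitrary class instances.
    return [{"id": i, "name": "item%d" % i, "pair": (i, i * 0.5)} for i in range(n)]


def time_load(dumps, loads, data):
    # Serialize once up front, then time only the deserialization step,
    # which is the part the question is concerned with.
    blob = dumps(data)
    start = time.perf_counter()
    result = loads(blob)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    data = build_sample()
    from_pickle, t_pickle = time_load(
        lambda obj: pickle.dumps(obj, pickle.HIGHEST_PROTOCOL), pickle.loads, data
    )
    from_marshal, t_marshal = time_load(marshal.dumps, marshal.loads, data)
    # Both round trips should reproduce the original structure.
    assert from_pickle == data and from_marshal == data
    print("pickle:  %.3fs" % t_pickle)
    print("marshal: %.3fs" % t_marshal)
```

Because the marshal format is not guaranteed to be stable across Python versions, files written this way are best treated as a cache that can be regenerated, not as long-term storage.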

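Suggestion 2 could look roughly like the sketch below. Note that `os.fork` exists only on Unix-like systems, so it would not run on the Windows machines mentioned in the question. `load_big_objects` and `run_in_forked_child` are hypothetical names, and the small dictionary stands in for the slow unpickling step.

```python
import os


def load_big_objects(n=100000):
    # Hypothetical stand-in for the expensive startup load.
    return {i: str(i) for i in range(n)}


def run_in_forked_child(data, key):
    # Fork a child that reads from `data` without reloading it, and
    # send the looked-up value back to the parent through a pipe.
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # Unix-only; not available on Windows
    if pid == 0:
        # Child: the already-loaded objects are shared copy-on-write.
        os.close(read_fd)
        try:
            os.write(write_fd, data[key].encode("ascii"))
        finally:
            os._exit(0)  # leave immediately, skipping parent cleanup code
    # Parent: close the unused write end, then read until the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        payload = reader.read()
    os.waitpid(pid, 0)
    return payload.decode("ascii")


if __name__ == "__main__":
    objects = load_big_objects()  # paid once, in the long-lived parent
    print(run_in_forked_child(objects, 42))
```

As the disclaimer in the answer says, merely reading an object updates its reference count, so pages the child touches still get copied; the saving is in not repeating the load, not necessarily in memory.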

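For suggestion 3, the idea of memory-mapping raw numeric data can be sketched with the standard library alone. With numpy installed, `numpy.memmap` or `numpy.load(path, mmap_mode="r")` would be the more natural tools; the stdlib version below only illustrates the principle. The helper names and the file layout (a flat run of native doubles) are assumptions made for this example.

```python
import mmap
import os
import tempfile
from array import array


def write_raw_doubles(path, values):
    # Store the values as a flat block of native 8-byte floats.
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def open_mapped_doubles(path):
    # Map the file read-only and view it as doubles. No per-element
    # Python objects are created until an element is actually read.
    f = open(path, "rb")
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mapped).cast("d")
    return f, mapped, view


def close_mapped(f, mapped, view):
    # The view must be released before the mapping can be closed.
    view.release()
    mapped.close()
    f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "values.bin")
    write_raw_doubles(path, [0.5 * i for i in range(1000000)])
    f, mapped, view = open_mapped_doubles(path)
    print(len(view), view[10])
    close_mapped(f, mapped, view)
```

Opening the mapping does not read the whole file up front; the operating system pages data in as it is accessed, which is why this helps only when the bulk of the data is raw numbers and not millions of small Python objects.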