Ways to implement data versioning in Cassandra


Can anyone share their thoughts on how to implement data versioning in Cassandra?

Suppose that I need to version records in a simple address book. (Address book records are stored as rows in a ColumnFamily.) I expect that the history:

  • will be used infrequently
  • will be used all at once to present it in a "time machine" fashion
  • won't have more than a few hundred versions of a single record
  • won't expire

I'm considering the following approaches:

  • Convert the address book to a super column family and store multiple versions of the address book records in one row, keyed (by time stamp) as super columns.

  • Create a new super column family to store old records or changes to records. Such a structure could look as follows:

    {
      'address book row key': {
        'time stamp1': {
          'first name': 'new name',
          'modified by': 'user id',
        },
        'time stamp2': {
          'first name': 'new name',
          'modified by': 'user id',
        },
      },
      'another address book row key': {
        'time stamp': {
          ....

  • Store versions as serialized (JSON) objects in a new ColumnFamily, representing sets of versions as rows and versions as columns (modelled after simple document versioning with CouchDB).
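To make the second option (storing only the changes per time stamp) more concrete, here is a minimal sketch of how a record could be rebuilt for a "time machine" view. It uses plain Python dicts as a stand-in for the super column family; the `record_as_of` helper, the sample keys, and the integer time stamps are assumptions for illustration, not Cassandra API calls.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the history super column family:
# row key -> {time stamp -> {changed field -> new value}}
History = Dict[str, Dict[int, Dict[str, str]]]


def record_as_of(history: History, row_key: str, as_of: int) -> Dict[str, str]:
    """Rebuild one record by replaying its changes up to and including as_of."""
    record: Dict[str, str] = {}
    for ts in sorted(history.get(row_key, {})):
        if ts > as_of:
            break
        # Later changes overwrite earlier values of the same field.
        record.update(history[row_key][ts])
    return record


history: History = {
    "contact_1": {
        100: {"first name": "Ann", "modified by": "user_1"},
        200: {"first name": "Anna", "modified by": "user_2"},
    }
}

old_view = record_as_of(history, "contact_1", 150)
new_view = record_as_of(history, "contact_1", 250)
```

With a few hundred versions per record at most, replaying every change on each read should stay cheap, which fits the expectation that the history is read infrequently and all at once.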

If you can add the assumption that address books typically have fewer than 10,000 entries in them, then using one row per address book time line in a super column family would be a decent approach.

A row would look like:

{'address_book_18f3a8':
  {1290635938721704: {'entry1': 'entry1_stuff', 'entry2': 'entry2_stuff'},
   1290636018401680: {'entry1': 'entry1_stuff_v2', ...},
   ...
  }
}

Here the row key identifies the address book, each super column name is a time stamp, and the subcolumns represent the address book's contents for that version.

This would allow you to read the latest version of an address book with only one query and also write a new version with a single insert.

The reason I suggest this only if address books have fewer than 10,000 elements is that super columns must be completely deserialized when you read even a single subcolumn. Overall, that's not too bad in this case, but it's something to keep in mind.
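As a rough illustration of this layout, the sketch below models the super column family with nested Python dicts. The helper names (`write_snapshot`, `read_latest`, `read_as_of`) and the sample data are assumptions made for the example; they are not part of any Cassandra client library.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the super column family:
# row key (address book) -> {time stamp (super column name) -> {entry -> value}}
Timeline = Dict[str, Dict[int, Dict[str, str]]]


def write_snapshot(cf: Timeline, book_key: str, timestamp: int,
                   entries: Dict[str, str]) -> None:
    """Store one full version of the address book as a new super column."""
    cf.setdefault(book_key, {})[timestamp] = dict(entries)


def read_latest(cf: Timeline, book_key: str) -> Dict[str, str]:
    """Return the version stored under the highest time stamp."""
    versions = cf[book_key]
    return versions[max(versions)]


def read_as_of(cf: Timeline, book_key: str, as_of: int) -> Dict[str, str]:
    """Return the newest version at or before as_of, or {} if none exists."""
    versions = cf.get(book_key, {})
    candidates = [ts for ts in versions if ts <= as_of]
    return versions[max(candidates)] if candidates else {}


cf: Timeline = {}
write_snapshot(cf, "address_book_18f3a8", 1290635938721704,
               {"entry1": "entry1_stuff", "entry2": "entry2_stuff"})
write_snapshot(cf, "address_book_18f3a8", 1290636018401680,
               {"entry1": "entry1_stuff_v2", "entry2": "entry2_stuff"})

latest = read_latest(cf, "address_book_18f3a8")
earlier = read_as_of(cf, "address_book_18f3a8", 1290635938721704)
```

Note that every read in this model hands back a whole version, which mirrors the point about super columns being deserialized in full.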

An alternative approach would be to use a single row per version of the address book, and use a separate CF with a time line row per address book, like:

{'address_book_18f3a8': {1290635938721704: some_uuid1, 1290636018401680: some_uuid2...}} 

Here, some_uuid1 and some_uuid2 correspond to the row keys for those versions of the address book. The downside to this approach is that it requires two queries every time the address book is read. The upside is that it lets you efficiently read only select parts of an address book.
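A minimal sketch of this two-CF layout, again using Python dicts in place of real column families. The names `timeline_cf`, `versions_cf`, `store_version`, and `read_entries` are hypothetical and only serve to show where the two lookups happen.

```python
import uuid
from typing import Dict, Iterable

# Hypothetical in-memory stand-ins for the two column families.
timeline_cf: Dict[str, Dict[int, str]] = {}  # book key -> {time stamp -> version row key}
versions_cf: Dict[str, Dict[str, str]] = {}  # version row key -> {entry -> value}


def store_version(book_key: str, timestamp: int, entries: Dict[str, str]) -> str:
    """Write the version row, then point the time line at it."""
    version_key = str(uuid.uuid4())
    versions_cf[version_key] = dict(entries)
    timeline_cf.setdefault(book_key, {})[timestamp] = version_key
    return version_key


def read_entries(book_key: str, wanted: Iterable[str]) -> Dict[str, str]:
    """Read only the requested entries from the latest version."""
    timeline = timeline_cf[book_key]          # lookup 1: the time line row
    version_key = timeline[max(timeline)]
    row = versions_cf[version_key]            # lookup 2: the version row
    return {name: row[name] for name in wanted if name in row}


store_version("address_book_18f3a8", 1290635938721704,
              {"entry1": "entry1_stuff", "entry2": "entry2_stuff"})
store_version("address_book_18f3a8", 1290636018401680,
              {"entry1": "entry1_stuff_v2", "entry2": "entry2_stuff"})

selected = read_entries("address_book_18f3a8", ["entry1"])
```

Because each version is an ordinary row here, the second lookup can ask for just the columns it needs, which is the efficiency gain mentioned above.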

