Most efficient data structure to add styles to text -


i'm looking best data structure add styles text (say in text editor). structure should allow following operations:

  1. quick lookup of styles @ absolute position x
  2. quick insert of text @ position (styles after position must moved).
  3. every position of text must support arbitrary number of styles (overlapping).

i've considered lists/arrays contain text ranges don't allow quick insert without recalculating positions of styles after insert point.

a tree structure relative offsets supports #2 tree degenerate fast when add lots of styles text.

any other options?

i have never developped editor, how this:

i believe possible expand scheme used store text characters themeselves, depending of course on details of implementation (language, toolkits etc) , performance , resource usage requirements.

rather use separate data structure styles, i'd prefer having reference accompany each character , point array or list applicable characters. characters same set of styles point same array or list, 1 shared.

character insertions , deletions not affect styles themeselves, apart changing number of references them, handled bit of reference counting.

depending on programming language compress things bit more pointing halfway list, although additional bookkeeping might in fact make more inefficient.

the main issue suggestion memory usage. in ascii editor written in c, bundling pointer each char raise effective memory usage 1 byte 12 bytes on 64 bit system, due struct alignment padding.

i breaking text small variable size blocks allow efficiently compress pointers. e.g. 32-character block might in c:

struct _blk_ {     unsigned char size;     unsigned int styles;     char content[]; } 

the interesting part metadata processing on variable part of struct, contains both stored text , style pointers. size element indicate number of characters. styles integer (hence 32-character limit) seen set of 32 1-bit fields, each 1 indicating whether character has own style pointer, or whether should use same style previous character. way 32-char block single style have additional overhead of size char, styles mask , single pointer, along padding bytes. inserting , deleting characters small array should quite fast.

as text storage itself, tree sounds idea. perhaps binary tree each node value sum of children values, leaf nodes pointing text blocks size node value? root node value total size of text, each subtree ideally holding half of text. you'd still have auto-balance it, though, having merge half-empty text blocks.

and in case missed it, no expert in trees :-)

edit:

apparently suggested modified version of data structure:

http://en.wikipedia.org/wiki/rope_%28computer_science%29

as referenced in post:

data structure text editor

edit 2:

deletion in proposed data structure should relatively fast, come down byte shifting in array , few bitwise operations on styles mask. insertion pretty same, unless block fills up. might make sense reserve space (i.e. bits in styles mask) within each block allow future insertions directly in blocks, without having alter tree relatively small amounts of new text.

another advantage of bundling characters , styles in blocks inherent data locality should allow more efficient use of cpu cache other alternatives, improving processing speed extent.

much complex data structure, though, you'd need either profiling representative test cases or adaptive algorithm determine optimal parameters operation (block size, reserved space etc).


Comments

Popular posts from this blog

android - Spacing between the stars of a rating bar? -

html - Instapaper-like algorithm -

c# - How to execute a particular part of code asynchronously in a class -