r - ddply: push or pull?


Does ddply push or pull when grouping data? That is, does it make many passes over the data frame, or just one?

If you take a look at the code, you'll see the general structure of the function:

```r
function (.data, .variables, .fun = NULL, ..., .progress = "none",
    .drop = TRUE, .parallel = FALSE)
{
    .variables <- as.quoted(.variables)
    pieces <- splitter_d(.data, .variables, drop = .drop)
    ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
        .parallel = .parallel)
}
<environment: namespace:plyr>
```

So it rearranges the variables into a format that's easier to use, breaks the data into pieces, and then uses ldply on those pieces. The pieces are generated by the function splitter_d. They are a little more sophisticated than a plain list: the object holds a pointer to the original data and a list of indices. Whenever you request a piece of the list, it looks up the matching indices and extracts the appropriate data. This avoids having multiple copies of the data floating around. You can see how the function works using getAnywhere("splitter_d") or plyr:::splitter_d.
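To make the index-based splitting concrete, here is a minimal base-R sketch of the same idea. It is not plyr's splitter_d: the helpers make_pieces() and get_piece() are hypothetical names used only for illustration. The grouping column is scanned once to build a list of row indices, and a piece is materialized only when it is requested.

```r
# Hypothetical sketch of an index-based "lazy split" (not plyr's splitter_d).
# The pieces object keeps the original data plus a list of row indices per
# group; no per-group copies of the data are made up front.
make_pieces <- function(data, by) {
  # One pass over the grouping column(s): interaction() builds a single
  # grouping factor, and split() maps each group to its row numbers.
  groups <- interaction(data[by], drop = TRUE)
  list(data = data, index = split(seq_len(nrow(data)), groups))
}

# Extract a single piece on demand by looking up its row indices.
get_piece <- function(pieces, i) {
  pieces$data[pieces$index[[i]], , drop = FALSE]
}

df <- data.frame(g = c("a", "b", "a", "b", "c"), x = c(1, 2, 3, 4, 5))
pieces <- make_pieces(df, "g")

str(pieces$index)       # row numbers per group: a -> 1, 3; b -> 2, 4; c -> 5
get_piece(pieces, "a")  # extracts only rows 1 and 3 from df
```

Only the integer indices are stored per group; the data frame itself exists once, which is the property the answer attributes to splitter_d.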

ldply then makes one pass over every piece of data, and after that it combines the results into a data frame. In fact, the help file of ldply says:

All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure. This function splits lists by elements and combines the result into a data frame. If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
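The quoted strategy can be sketched in a few lines of base R. This is a hypothetical helper, split_apply_combine(), not plyr's ldply or ddply; it assumes, for this sketch only, that a NULL returned by the function means "no result" for that piece. It shows that after the split step the function is applied once per piece, and that an empty set of results falls back to data.frame(), as the help text describes.

```r
# Hypothetical sketch of split-apply-combine in base R (not plyr's ldply).
split_apply_combine <- function(data, by, fun) {
  # Split: row indices per group, computed in one pass over the key column(s).
  index <- split(seq_len(nrow(data)), interaction(data[by], drop = TRUE))

  # Apply: call fun exactly once per piece; each piece is extracted on demand.
  results <- lapply(index, function(rows) fun(data[rows, , drop = FALSE]))

  # In this sketch, a NULL from fun means "no result" for that piece.
  results <- Filter(Negate(is.null), results)

  # Combine: with no results, return an empty data frame (0 rows, 0 columns).
  if (length(results) == 0) return(data.frame())
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

df <- data.frame(g = c("a", "b", "a", "b", "c"), x = c(1, 2, 3, 4, 5))

totals <- split_apply_combine(df, "g", function(piece) {
  data.frame(g = piece$g[1], total = sum(piece$x))
})
print(totals)  # one row per group: a = 4, b = 6, c = 5

empty <- split_apply_combine(df, "g", function(piece) NULL)
dim(empty)     # 0 0
```

Because the per-group work happens in a single lapply() over the index list, each row is visited once in the apply step; the only other pass is the one that builds the indices.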

I couldn't have said it better myself. And, miracle, the first sentence is found on the help page of ddply as well.

