R - ddply: push or pull?
Does ddply push or pull when grouping data? That is, does it make many passes over the data frame, or just one?
If you take a look at the code, you can see the general structure of the function:
function (.data, .variables, .fun = NULL, ..., .progress = "none",
    .drop = TRUE, .parallel = FALSE)
{
    .variables <- as.quoted(.variables)
    pieces <- splitter_d(.data, .variables, drop = .drop)
    ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
        .parallel = .parallel)
}
<environment: namespace:plyr>
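The three steps in that body (quote the grouping variables, split the data into pieces, hand the pieces to ldply) can be mirrored in plain base R. This is a minimal sketch under stated assumptions, not plyr's code: `mini_ddply` is a hypothetical name, the grouping variables are passed as a character vector of column names, `.fun` is assumed to return a data frame, and the combine step is a plain `do.call(rbind, ...)`.

```r
# Hypothetical mini_ddply: mirrors ddply's split / apply / combine structure
# with base R only. This is an illustration, not plyr's implementation.
mini_ddply <- function(.data, .variables, .fun, ...) {
  # Split: one pass over the grouping columns gives every row a group id;
  # split() then turns those ids into a list of row indices per group.
  groups <- interaction(.data[.variables], drop = TRUE, lex.order = TRUE)
  index <- split(seq_len(nrow(.data)), groups)

  # Apply: extract each piece from the original data only when it is needed,
  # call .fun on it (assumed to return a data frame), and prepend the key.
  results <- lapply(index, function(rows) {
    piece <- .data[rows, , drop = FALSE]
    key <- piece[1, .variables, drop = FALSE]
    rownames(key) <- NULL
    cbind(key, .fun(piece, ...))
  })

  # Combine: stack the per-group data frames into a single data frame.
  combined <- do.call(rbind, unname(results))
  rownames(combined) <- NULL
  combined
}

df <- data.frame(g = c("a", "b", "a", "b"), x = 1:4)
mini_ddply(df, "g", function(d) data.frame(total = sum(d$x)))
```

Only the grouping columns are scanned to build the index; each row is then copied once, into the piece it belongs to, at the moment that piece is processed.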
So ddply rearranges the variables into a format that's easier to use, breaks the data into pieces, and then uses ldply on those pieces. The pieces are generated by the function splitter_d. The pieces object is a slightly more sophisticated list: it holds a pointer to the original data and a list of indices. Whenever you request a piece of the list, it looks up the matching indices and extracts the appropriate data. This avoids having multiple copies of the data floating around. You can see how the function works using getAnywhere("splitter_d") or plyr:::splitter_d.
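The idea of a list that keeps one reference to the data plus per-group row indices can be sketched in base R. This is a minimal illustration, not plyr's splitter_d: the names `indexed_pieces`, `length.indexed_pieces` and `[[.indexed_pieces` are hypothetical, and grouping is done with base `interaction()` and `split()`. Building the index is a single pass over the grouping columns; a piece is only copied out of the data frame when it is requested with `[[`. Storing the data frame in an environment does not duplicate it, because R copies objects only when they are modified.

```r
# Hypothetical constructor: keeps ONE reference to the data plus a list of
# row indices per group; no piece is materialised at construction time.
indexed_pieces <- function(data, vars) {
  groups <- interaction(data[vars], drop = TRUE, lex.order = TRUE)
  env <- new.env(parent = emptyenv())
  env$data <- data
  env$index <- split(seq_len(nrow(data)), groups)
  structure(list(env = env), class = "indexed_pieces")
}

# Number of pieces = number of non-empty groups.
length.indexed_pieces <- function(x) length(unclass(x)$env$index)

# Extract piece i (by position or group name) on demand, by subsetting the
# shared data frame with that group's row indices.
`[[.indexed_pieces` <- function(x, i) {
  env <- unclass(x)$env
  env$data[env$index[[i]], , drop = FALSE]
}

df <- data.frame(g = c("a", "b", "a", "b"), x = 1:4)
p <- indexed_pieces(df, "g")
length(p)    # 2
p[[1]]$x     # 1 3
p[["b"]]$x   # 2 4
```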
ldply makes a single pass over every piece of data. After that, it combines the results into a data frame. In fact, the help file for ldply puts it this way:
All plyr functions use the same split-apply-combine strategy: split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure. This function splits lists by elements and combines the result into a data frame. If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
I couldn't have said it better myself. And, miracle, the first sentence is found on the help page of ddply as well.
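The behaviour described in that help text (apply .fun to each list element, stack the results into a data frame, return data.frame() when there is nothing to combine) can be sketched as follows. `mini_ldply` is a hypothetical name; the sketch assumes .fun returns either a data frame or NULL, and its `do.call(rbind, ...)` combiner is much simpler than plyr's own combining code.

```r
# Hypothetical mini_ldply: split a list into its elements, apply .fun to each,
# and combine the results into one data frame. Not plyr's implementation.
mini_ldply <- function(.data, .fun, ...) {
  # Apply: call .fun once on every element of the list.
  results <- lapply(.data, .fun, ...)
  # Discard elements for which .fun produced nothing.
  results <- Filter(Negate(is.null), results)
  # No results at all: return a data frame with 0 rows and 0 columns.
  if (length(results) == 0) return(data.frame())
  # Combine: coerce each result to a data frame and stack them.
  pieces <- lapply(results, as.data.frame)
  combined <- do.call(rbind, unname(pieces))
  rownames(combined) <- NULL
  combined
}

mini_ldply(list(1:3, 4:6), function(v) data.frame(mean = mean(v), n = length(v)))
dim(mini_ldply(list(), function(v) data.frame(mean = mean(v))))   # 0 0
```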