r - Only keep min value for each factor level -


I've got a problem that has been bugging me for some time… hopefully someone here can help me.

I've got the following data frame:

f <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- c(1:10)
df <- data.frame(f,v1,v2)

f is a factor; v1 and v2 are values. For each level of f, I want to keep only one row: the one that has the lowest value of v1 in that factor level.

f   v1  v2
a   1.3 1
b   2   3
c   1.1 6
d   3.1 8

I've tried various things with aggregate, ddply, by, tapply… but nothing seems to work. For any suggestions, I'd be very thankful.

Using DWin's solution, tapply can be avoided by using ave.

df[ df$v1 == ave(df$v1, df$f, FUN=min), ]
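To see why this works, it may help to look at what ave returns on the small example from the question: a vector as long as v1 in which every element is replaced by the minimum of its own group. The sketch below uses only base R. Note that the argument has to be spelled FUN in upper case, since R matches argument names case-sensitively and a lower-case fun would be passed through ... as if it were another grouping variable. The names group_min and result are introduced here for illustration only.

```r
# Sample data from the question
f  <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- 1:10
df <- data.frame(f, v1, v2)

# ave() returns one value per row: the minimum of v1 within that row's level of f
group_min <- ave(df$v1, df$f, FUN = min)
group_min
# values: 1.3 1.3 2.0 2.0 2.0 1.1 3.1 3.1 3.1 3.1

# Keep the rows whose v1 equals the minimum of their own group
result <- df[df$v1 == group_min, ]
result
# rows 1, 3, 6 and 8 remain, matching the desired output in the question
```

Because the comparison is done row by row against the row's own group minimum, no merge or join back onto the data frame is needed.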

This gives a speed-up, as shown below. Mind you, this is dependent on the number of levels. I mention it because I notice that ave is far too often forgotten about, although it is one of the more powerful functions in R.

f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)

> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
   user  system elapsed
   0.05    0.00    0.05

> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
   user  system elapsed
   0.25    0.03    0.29

> system.time(lapply(split(df, df$f), FUN = function(x) {
+             vec <- which(x[3] == min(x[3]))
+             return(x[vec, ])
+         })
+  .... [truncated]
   user  system elapsed
   0.56    0.00    0.58

> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
   user  system elapsed
   0.17    0.00    0.19

> system.time( ddply(df, .var = "f", .fun = function(x) {
+     return(subset(x, v1 %in% min(v1)))
+     }
+ )
+ )
   user  system elapsed
   0.28    0.00    0.28
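One caveat when comparing these approaches: they are not strictly equivalent. The == ave(...) filter keeps every row that ties for the minimum in its level, and the %in% tapply(...) filter can additionally keep a row whose v1 happens to equal the minimum of a different level. If exactly one row per level is needed, the which.min variant picks the first minimum only. A small sketch of the difference, using a made-up data frame called ties (hypothetical, not from the original question):

```r
# Hypothetical data: level 'a' has a tie for its minimum,
# and level 'b' contains a value (1) that is the minimum of level 'a'
ties <- data.frame(f  = c('a','a','b','b'),
                   v1 = c(1, 1, 0.5, 1),
                   v2 = 1:4)

# ave(): keeps all rows tied for the minimum within their own level
by_ave <- ties[ties$v1 == ave(ties$v1, ties$f, FUN = min), ]

# %in% tapply(): compares against the minima of all levels at once,
# so row 4 (v1 = 1 in level 'b') is kept although it is not the minimum of 'b'
by_in <- ties[ties$v1 %in% tapply(ties$v1, ties$f, min), ]

# which.min(): returns the position of the first minimum only,
# so exactly one row per level survives
idx <- tapply(seq_len(nrow(ties)), ties$f,
              function(i) i[which.min(ties$v1[i])])
by_which <- ties[idx, ]
```

On the data from the question the three filters agree, because each level has a unique minimum and no minimum reappears in another level.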
