R - Only keep min value for each factor level
I have a problem that has been bugging me for some time; I hope someone here can help me.
I have the following data frame:
f <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- c(1:10)
df <- data.frame(f,v1,v2)
f is a factor; v1 and v2 are values. For each level of f, I want to keep only one row: the one that has the lowest value of v1 within that factor level.
f  v1 v2
a 1.3  1
b 2.0  3
c 1.1  6
d 3.1  8
I have tried various things with aggregate, ddply, by, tapply… but nothing seems to work. I would be thankful for any suggestions.
Using DWin's solution, tapply can be avoided by using ave.
df[ df$v1 == ave(df$v1, df$f, FUN=min), ]
This gives a further speed-up, as shown below. Mind you, this is dependent on the number of levels. I mention it because ave is far too often forgotten about, although it is one of the more powerful functions in R.
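To see why the comparison works, here is a minimal sketch on the small data frame from the question. The intermediate names group_min and res are only for illustration. ave returns a vector of the same length as v1 that holds, for every row, the minimum of that row's group, so the == comparison can be done row by row.

```r
# Data frame from the question
f  <- c('a','a','b','b','b','c','d','d','d','d')
v1 <- c(1.3,10,2,10,10,1.1,10,3.1,10,10)
v2 <- 1:10
df <- data.frame(f, v1, v2)

# One value per row: the minimum of v1 within that row's level of f
group_min <- ave(df$v1, df$f, FUN = min)
print(group_min)

# Keep the rows whose v1 equals the minimum of their own group
res <- df[df$v1 == group_min, ]
print(res)
```

Note that the argument must be spelled FUN in upper case. ave takes its grouping variables through ..., so a lower-case fun=min would be treated as another grouping variable instead of the function to apply.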
f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)

> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
   user  system elapsed
   0.05    0.00    0.05

> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
   user  system elapsed
   0.25    0.03    0.29

> system.time(lapply(split(df, df$f), FUN = function(x) {
+     vec <- which(x[3] == min(x[3]))
+     return(x[vec, ])
+ })
+ .... [truncated]
   user  system elapsed
   0.56    0.00    0.58

> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
   user  system elapsed
   0.17    0.00    0.19

> system.time( ddply(df, .var = "f", .fun = function(x) {
+     return(subset(x, v1 %in% min(v1)))
+   }
+ )
+ )
   user  system elapsed
   0.28    0.00    0.28
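The approaches timed above are not exactly equivalent when there are ties or when a value in one group happens to equal the minimum of another group. The following is a rough sketch on made-up data (the toy data frame and all variable names are hypothetical, not part of the benchmark) that shows the difference.

```r
# Hypothetical toy data: group "a" has a tied minimum, and group "b"
# contains a value (1) that equals the minimum of group "a"
toy <- data.frame(
  f  = c("a", "a", "a", "b", "b"),
  v1 = c(1, 1, 5, 0.5, 1),
  v2 = 1:5
)

# ave(): each row is compared with the minimum of its own group,
# so both tied rows of "a" are kept
by_ave <- toy[toy$v1 == ave(toy$v1, toy$f, FUN = min), ]

# %in% tapply(): each row is compared with the minima of all groups,
# so row 5 (v1 = 1 in group "b") is also kept although min of "b" is 0.5
by_in <- toy[toy$v1 %in% tapply(toy$v1, toy$f, min), ]

# which.min(): exactly one row per group (the first minimum)
idx <- tapply(seq_len(nrow(toy)), toy$f,
              function(i) i[which.min(toy$v1[i])])
by_which <- toy[idx, ]

print(by_ave)
print(by_in)
print(by_which)
```

If exactly one row per level is required even with ties, the which.min variant is probably the safer choice. The ave comparison keeps every row that reaches its group minimum.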