Saturday, 15 December 2018

Random sample of rows from subset of an R dataframe






Is there a good way of getting a sample of rows from part of a dataframe?



If I just have data such as




gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)


then I can easily sample the ages of three of the Fs with



sample(age[gender == "F"], 3)



and get something like



[1] 31 35 29


but if I turn this data into a dataframe



mydf <- data.frame(gender, age) 



I cannot use the obvious



sample(mydf[mydf$gender == "F", ], 3)


though I can concoct something convoluted with an absurd number of brackets like



mydf[sample((1:nrow(mydf))[mydf$gender == "F"], 3), ]



and get what I want which is something like



  gender age
7      F  35
4      F  29
1      F  23


Is there a better way that takes me less time to work out how to write?


Answer




Your convoluted way is pretty much how to do it - I think all the answers will be variations on that theme.



For example, I like to generate the indices of the rows where mydf$gender=="F" first:



idx <- which(mydf$gender=="F")


Then I sample from that:



mydf[ sample(idx,3), ]



So in one line (although you can cut down on the absurd number of brackets, and possibly make your code easier to understand, by splitting it across multiple lines):



mydf[ sample( which(mydf$gender=='F'), 3 ), ]


While the "wheee I'm a hacker!" part of me prefers the one-liner, the sensible part of me says that the two-liner, despite being longer, is much easier to understand. The choice is yours.
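
If you do this often, the two-step version can be wrapped in a small function. The following is only a sketch under some assumptions: sample_rows_where() is a made-up name (not a base R function), and set.seed() is there purely to make the example repeatable. It also works around one quirk of sample(): when its first argument is a single number such as 5, it samples from 1:5, so sample(idx, 1) can return the wrong row if only one row matches. Sampling positions within idx avoids that.

```r
# Data from the question
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age)

# Hypothetical helper (not part of base R).
# df:   a data frame
# keep: logical vector, one entry per row of df, marking eligible rows
# n:    number of rows to draw without replacement
sample_rows_where <- function(df, keep, n) {
  idx <- which(keep)  # row numbers of the eligible rows (NAs are dropped)
  if (n > length(idx)) {
    stop("requested ", n, " rows but only ", length(idx), " match")
  }
  # Sample positions within idx instead of calling sample(idx, n),
  # because sample(5, 1) would draw from 1:5 rather than return 5.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(1)  # only so the example is repeatable
picked <- sample_rows_where(mydf, mydf$gender == "F", 3)
print(picked)
```

The call reads much like the sample(age[gender == "F"], 3) you started with, and the brackets stay hidden inside the function.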

