Sunday 29 July 2018

r - Extracting a random sample of rows in a data.frame with a nested conditional

This question builds on the SO post found here and uses code modified from a post on the R-help mailing list, which can be seen here.

I am trying to extract a random sample of rows from a data frame, subject to a condition. I am using R's built-in iris data, which looks like this:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The code below works fine for taking a simple random sample of 2 rows:

iris[sample(nrow(iris), 2), ]

However, I am unsure how to condition on the Species field. For example, how do I take the random sample shown above, but only from rows where Species != "setosa"?

There are three categories in iris$Species:

> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50

I am unsure how to nest the conditional correctly. One of my earlier attempts is below, along with its obviously incorrect result:

> iris[sample(nrow(iris)[iris$Species != "setosa"], 2), ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species



I'd use which() to get the vector of row numbers that satisfy your condition, and then sample from that vector:

iris[ sample( which( iris$Species != "setosa" ) , 2 ) , ]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#59           6.6         2.9          4.6         1.3 versicolor
#133          6.4         2.8          5.6         2.2  virginica
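
To spell out why the attempt in the question fails and how the which() approach might be packaged for reuse, here is a minimal sketch. The helper name sample_rows_where() and the seed value are my own assumptions for illustration, not part of base R or of the original posts. The helper samples positions within the index vector rather than the vector itself, because sample(x, n) treats a single number x as 1:x, which could return the wrong rows if only one row matched the condition.

```r
# Why the earlier attempt fails: nrow(iris) is the single number 150, so
# subsetting it with a 150-long logical vector returns only NA values
# instead of row numbers.
bad_idx <- nrow(iris)[iris$Species != "setosa"]
all(is.na(bad_idx))

# Hypothetical helper (not a base R function): sample n rows of df from
# those rows where the logical vector keep is TRUE.
sample_rows_where <- function(df, keep, n) {
  idx <- which(keep)            # row numbers that satisfy the condition
  stopifnot(n <= length(idx))   # cannot draw more rows than are available
  # Draw positions within idx, so a single matching row is handled correctly.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(42)  # arbitrary seed, used only to make the draw repeatable
picked <- sample_rows_where(iris, iris$Species != "setosa", 2)
picked
```

An equivalent route is to subset first and then sample from the subset, at the cost of copying the filtered rows before drawing from them.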

