This question builds on an SO post and uses code modified from a post on the R-help mailing list.
I am trying to extract a random sample of rows from a data frame, but subject to a condition. I am using R's built-in iris data, which looks like this:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
To take a simple random sample of 2 rows, the code below works fine:
iris[sample(nrow(iris), 2), ]
However, I am unsure how to add a condition on the Species field. For example, how can I take a random sample as above, but only from rows where Species != "setosa"?
There are three levels of iris$Species:
> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50
I am unsure how to nest the condition correctly. One of my earlier attempts is below, with its obviously incorrect result:
> iris[sample(nrow(iris)[iris$Species != "setosa"], 2), ]
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
NA             NA          NA           NA          NA    <NA>
NA.1           NA          NA           NA          NA    <NA>
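Why this attempt returns all-NA rows: nrow(iris) is the single number 150, and indexing that length-one vector with a 150-element logical vector keeps one element per TRUE position. Position 1 is FALSE (row 1 is setosa), while positions 51 to 150 are TRUE but lie past the end of the length-one vector, so each of them yields NA. sample() then draws two of those NAs, and iris[c(NA, NA), ] returns rows of NA. The sketch below reproduces those steps on the built-in iris data; the variable names n, keep, bad and good are only for illustration.

```r
# Reproduce the failed attempt step by step (built-in iris data)
n    <- nrow(iris)                  # a single integer: 150
keep <- iris$Species != "setosa"    # logical, length 150: 50 FALSE then 100 TRUE

# Indexing a length-1 vector with a length-150 logical vector:
# positions past the end of `n` are treated as NA, so every TRUE gives NA
bad <- n[keep]
length(bad)                         # 100, one element per TRUE in `keep`
all(is.na(bad))                     # TRUE

# which() converts the logical vector into actual row numbers instead
good <- which(keep)
range(good)                         # 51 150
```

In other words, the condition has to select from the row numbers themselves, not from the row count.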
Thanks
Answer
I'd use which() to get the vector of row numbers that satisfy your condition, and then sample from that vector:
iris[ sample( which( iris$Species != "setosa" ) , 2 ) , ]
# Example output (the rows drawn will differ from run to run):
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#59           6.6         2.9          4.6         1.3 versicolor
#133          6.4         2.8          5.6         2.2  virginica
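One caveat with this pattern: when the first argument to sample() is a single number n >= 1, sample() draws from 1:n rather than returning that one value. If the condition ever matched exactly one row, sample(which(...), 1) could therefore return an unrelated row number. The ?sample help page points this out and suggests indexing through sample.int() instead. Below is a minimal sketch of that approach, wrapped in a hypothetical helper called sample_rows() (not part of base R), followed by the equivalent "subset first, then sample" form.

```r
# Hypothetical helper (not part of base R): draw `size` rows of `df`
# at random from the rows where the logical vector `cond` is TRUE.
sample_rows <- function(df, cond, size) {
  idx <- which(cond)                        # row numbers meeting the condition
  if (length(idx) < size) {
    stop("fewer matching rows than requested")
  }
  # Sample positions within idx, so a single match is not silently
  # expanded to 1:idx the way sample(idx, size) would expand it.
  df[idx[sample.int(length(idx), size)], , drop = FALSE]
}

set.seed(1)                                 # only to make the draw reproducible
picked <- sample_rows(iris, iris$Species != "setosa", 2)
picked$Species                              # never "setosa"

# Edge case: exactly one matching row (row 120) is returned as-is
one <- sample_rows(iris, seq_len(nrow(iris)) == 120, 1)
rownames(one)                               # "120"

# The same idea written as "subset first, then sample":
not_setosa <- iris[iris$Species != "setosa", ]
not_setosa[sample(nrow(not_setosa), 2), ]
```

Subsetting first is arguably easier to read; the which() version avoids building an intermediate copy of the matching rows.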