Wednesday 10 July 2019

r - Extracting a random sample of rows in a data.frame with a nested conditional



This question builds on an earlier SO post and uses code modified from a post on the R-help mailing list.



I am trying to extract a random sample of rows from a data frame, subject to a condition. I am using R's built-in iris data, which looks like this:



> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa


Taking a simple random sample of 2 rows works fine with the code below:



iris[sample(nrow(iris), 2), ]



However, I am unsure how to condition on the Species field. For example, how do I take the random sample shown above, but only from rows where Species != "setosa"?



There are three categories in iris$Species:



> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50



I am unsure how to nest the conditional correctly. One of my earlier attempts is below, with its obviously incorrect result included:



> iris[sample(nrow(iris)[iris$Species != "setosa"], 2), ]
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
NA             NA          NA           NA          NA    <NA>
NA.1           NA          NA           NA          NA    <NA>


Thanks



Answer



I'd use which to get the vector of row numbers that satisfy your condition, and then sample from that vector:



iris[ sample( which( iris$Species != "setosa" ) , 2 ) , ]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#59           6.6         2.9          4.6         1.3 versicolor
#133          6.4         2.8          5.6         2.2  virginica
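
For what it's worth, here is a rough sketch of why the attempt in the question returns NA rows, plus a slightly more defensive variant of the which approach. The helper name sample_rows_where is made up for illustration and is not part of base R or the original answer. It indexes by position with sample.int because sample(x, n) treats a single number x as 1:x, which would bite if only one row matched the condition.

```r
# Why the attempt in the question fails: nrow(iris) is the single number 150.
# Indexing a length-1 vector with a 150-element logical vector gives NA for
# every TRUE position beyond the first element.
bad <- nrow(iris)[iris$Species != "setosa"]
length(bad)      # 100
all(is.na(bad))  # TRUE

# which() instead returns the row numbers where the condition is TRUE.
keep <- which(iris$Species != "setosa")
range(keep)      # 51 150

# Hypothetical helper (an assumption, not from the original answer):
# sample n rows of df from those where cond is TRUE.
sample_rows_where <- function(df, cond, n) {
  idx <- which(cond)            # matching row numbers (NAs are dropped)
  stopifnot(n <= length(idx))   # fail early if too few rows match
  # Sample positions within idx, so a single matching row is handled safely.
  df[idx[sample.int(length(idx), n)], , drop = FALSE]
}

set.seed(1)  # only to make the example reproducible
picked <- sample_rows_where(iris, iris$Species != "setosa", 2)
```

Calling sample_rows_where(iris, iris$Species == "virginica", 5) would work the same way for any other logical condition of length nrow(df).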
