Data Filtering
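Data filtering means extracting only the data you need. The most common case is extracting part of a data frame: a single element, a row, or a column. The following example shows how:
data <- data.frame(a = sample(1:10), b = rep(c("a", "b"), each = 5), cdf = rnorm(10))
data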
# Extract a single element (row 2 of column a)
data[2, 1]
data[[1]][[2]]
data[[c(1, 2)]]   # recursive indexing, same as data[[1]][[2]]
data$a[2]
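As a quick check, the sketch below rebuilds the example data frame and confirms that the four expressions above return the same value. The `set.seed(1)` call is an assumption added only for reproducibility. Leaving the column index empty in `[i, j]` returns a whole row, which covers the row case mentioned at the start.

```r
set.seed(1)  # fixed seed, added only so the example is reproducible
data <- data.frame(a = sample(1:10), b = rep(c("a", "b"), each = 5), cdf = rnorm(10))

# Four ways of reading row 2 of column a
v1 <- data[2, 1]        # [row, column]
v2 <- data[[1]][[2]]    # column 1 as a vector, then its 2nd element
v3 <- data[[c(1, 2)]]   # recursive indexing: same as data[[1]][[2]]
v4 <- data$a[2]         # column by name, then its 2nd element
print(c(v1, v2, v3, v4))

# Extract a whole row: leave the column index empty
row2 <- data[2, ]       # a one-row data frame with columns a, b, cdf
print(row2)
```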
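# Extract a column
data[[3]]
data$cdf
data$c                      # $ partially matches "c" to "cdf"
data[["c"]]                 # [[ ]] matches names exactly by default, so this returns NULL
data[["c", exact = FALSE]]  # partial matching enabled explicitly, returns cdf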
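The last three lines differ because of partial name matching: `$` on a data frame accepts an unambiguous prefix of a column name, while `[[` requires the exact name unless `exact = FALSE` is given. A minimal sketch that makes the difference visible, using the same example data (again with an added seed for reproducibility):

```r
set.seed(1)  # fixed seed, added only for reproducibility
data <- data.frame(a = sample(1:10), b = rep(c("a", "b"), each = 5), cdf = rnorm(10))

by_dollar  <- data$c                       # prefix "c" is matched to column "cdf"
by_exact   <- data[["c"]]                  # no column is named exactly "c": NULL
by_partial <- data[["c", exact = FALSE]]   # partial matching requested explicitly

str(by_dollar)
str(by_exact)
str(by_partial)
```

If the data frame also had a column such as `cx`, the prefix `c` would become ambiguous and both `data$c` and `data[["c", exact = FALSE]]` would return `NULL`, so code meant for reuse is safer with full column names.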
Another very common filtering task is removing missing values:
data <- data.frame(a = c(sample(1:5), NA, NA, sample(6:10)), b = c(rep(c("a", "b"), each = 5), NA, NA), cdf = rnorm(12))
data
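# Keep only the rows that have no missing values
good <- complete.cases(data)
data[good, ]
# Equivalent: build a data frame of NA flags and drop every row with any NA
bad <- as.data.frame(is.na(data))
data[!(bad$a | bad$b | bad$cdf), ]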
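Both approaches keep the same rows. The sketch below checks this and adds two common alternatives that are not in the original post: `rowSums(is.na(data)) == 0`, which does not require naming each column, and `na.omit()`. The seed is an assumption added for reproducibility.

```r
set.seed(1)  # fixed seed, added only for reproducibility
data <- data.frame(a = c(sample(1:5), NA, NA, sample(6:10)),
                   b = c(rep(c("a", "b"), each = 5), NA, NA),
                   cdf = rnorm(12))

r_complete <- data[complete.cases(data), ]

bad <- as.data.frame(is.na(data))
r_flags <- data[!(bad$a | bad$b | bad$cdf), ]

# Alternative 1: count the NAs in each row; works for any number of columns
r_rowsums <- data[rowSums(is.na(data)) == 0, ]

# Alternative 2: na.omit() drops incomplete rows and records which ones
r_omit <- na.omit(data)

print(r_complete)
```

`na.omit()` also attaches an `na.action` attribute listing the dropped rows, so its result is not `identical()` to the others even though it keeps the same rows.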
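Sometimes the goal of filtering is to keep only the rows that satisfy a condition:
X <- data.frame("var1" = sample(1:5), "var2" = sample(6:10), "var3" = sample(11:15))
X <- X[sample(1:5), ]; X$var2[c(1, 3)] <- NA   # shuffle the rows, then set two var2 values to NA
X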
# AND: both conditions must hold
X[(X$var1 <= 3 & X$var3 > 11), ]
subset(X, var1 <= 3 & var3 > 11)   # subset() looks up column names inside X
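For a condition like this one, the bracket form and `subset()` return the same rows. The sketch below rebuilds `X` (with an added seed) and compares them; it also shows `subset()`'s `select` argument, which picks columns in the same call.

```r
set.seed(1)  # fixed seed, added only for reproducibility
X <- data.frame(var1 = sample(1:5), var2 = sample(6:10), var3 = sample(11:15))
X <- X[sample(1:5), ]
X$var2[c(1, 3)] <- NA

r_bracket <- X[X$var1 <= 3 & X$var3 > 11, ]
r_subset  <- subset(X, var1 <= 3 & var3 > 11)

# subset() can also keep only some columns through its select argument
r_cols <- subset(X, var1 <= 3 & var3 > 11, select = c(var1, var3))
print(r_cols)
```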
# OR: at least one condition must hold
X[(X$var1 <= 3 | X$var3 > 15), ]
X[which(X$var1 <= 3 | X$var3 > 15), ]   # which() keeps only the TRUE positions and drops NA
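In the two OR lines above the results are identical, because `var1` and `var3` contain no `NA` (and since `var3` is drawn from 11:15, `var3 > 15` is never `TRUE`). The difference appears when the condition involves `var2`, the column that was given `NA` values: a logical index containing `NA` produces rows filled with `NA`, while `which()` keeps only the positions that are `TRUE`. A sketch, with an added seed:

```r
set.seed(1)  # fixed seed, added only for reproducibility
X <- data.frame(var1 = sample(1:5), var2 = sample(6:10), var3 = sample(11:15))
X <- X[sample(1:5), ]
X$var2[c(1, 3)] <- NA

cond <- X$var2 > 7                 # NA wherever var2 is NA
r_logical <- X[cond, ]             # each NA in cond yields a row of NAs
r_which   <- X[which(cond), ]      # which() returns only the TRUE positions

print(r_logical)
print(r_which)
```

`subset()` handles an `NA` in its condition the same way as `which()`: such rows are dropped.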
The help page for the subsetting function subset() contains a warning worth keeping in mind: "This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences."
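One concrete way the non-standard evaluation can surprise you: `subset()` looks up names in the data frame before the calling environment, so a column that happens to share a name with an outside variable silently takes precedence. A minimal sketch; the name `limit` is hypothetical, chosen only for illustration:

```r
X <- data.frame(var1 = 1:5, var3 = 11:15)
limit <- 3

# limit is found in the calling environment: rows with var1 <= 3
r1 <- subset(X, var1 <= limit)

# The data frame now gains a column that is also called limit
X$limit <- 10

# subset() evaluates the condition inside X first, so the column
# (value 10) shadows the outer variable (value 3): all 5 rows come back
r2 <- subset(X, var1 <= limit)

# Standard [ indexing is explicit about where each name comes from
r3 <- X[X$var1 <= limit, ]
```

With `[` the reader sees that `X$var1` comes from the data frame and `limit` from the surrounding code, which is why the help page recommends it for programming.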