Chapter 7 Basic Statistics

Chapter outline

1 Descriptive statistics
2 Frequency and contingency tables
3 Correlations and covariances
4 t-tests
5 Nonparametric statistics

In brief: once the data are properly organized, the first step is to explore them visually. The next step is to examine the distribution of individual variables and the relationships between pairs of variables.

1 Descriptive statistics

The data set consists of three variables from mtcars: mpg (miles per gallon), hp (horsepower), and wt (weight in 1000 lbs).

    > vars <- c("mpg", "hp", "wt")
    > head(mtcars[vars])
                       mpg  hp    wt
    Mazda RX4         21.0 110 2.620
    Mazda RX4 Wag     21.0 110 2.875
    Datsun 710        22.8  93 2.320
    Hornet 4 Drive    21.4 110 3.215
    Hornet Sportabout 18.7 175 3.440
    Valiant           18.1 105 3.460

Some functions for descriptive statistics

summary() reports the minimum, maximum, quartiles, and mean of each numeric variable:

    > summary(mtcars[vars])
          mpg              hp              wt
     Min.   :10.40   Min.   : 52.0   Min.   :1.513
     1st Qu.:15.43   1st Qu.: 96.5   1st Qu.:2.581
     Median :19.20   Median :123.0   Median :3.325
     Mean   :20.09   Mean   :146.7   Mean   :3.217
     3rd Qu.:22.80   3rd Qu.:180.0   3rd Qu.:3.610
     Max.   :33.90   Max.   :335.0   Max.   :5.424

A user-defined function can be applied to every column with sapply():

    > mystats <- function(x, na.omit=FALSE) {
    +   if(na.omit)
    +     x <- x[!is.na(x)]                     # drop missing values on request
    +   m <- mean(x)
    +   n <- length(x)
    +   s <- sd(x)
    +   skew <- sum((x-m)^3/s^3) / n            # skewness
    +   kurt <- sum((x-m)^4/s^4) / n - 3        # excess kurtosis (normal = 0)
    +   return(c(n=n, mean=m, stdev=s, skew=skew, kurtosis=kurt))
    + }
    > sapply(mtcars[vars], mystats)
                   mpg          hp          wt
    n        32.000000  32.0000000 32.00000000
    mean     20.090625 146.6875000  3.21725000
    stdev     6.026948  68.5628685  0.97845744
    skew      0.610655   0.7260237  0.42314646
    kurtosis -0.372766  -0.1355511 -0.02271075

apply() over the columns (MARGIN = 2) gives the same result:

    > apply(mtcars[vars], 2, mystats)
                   mpg          hp          wt
    n        32.000000  32.0000000 32.00000000
    mean     20.090625 146.6875000  3.21725000
    stdev     6.026948  68.5628685  0.97845744
    skew      0.610655   0.7260237  0.42314646
    kurtosis -0.372766  -0.1355511 -0.02271075

Going further: several packages provide additional descriptive statistics, for example Hmisc, pastecs, and psych.

    > library(Hmisc)
    Loading required package: grid
    Loading required package: lattice
    Loading required package: survival
    Loading required package: splines
    Loading required package: Formula

    Attaching package: 'Hmisc'

    The following objects are masked from 'package:base':

        format.pval, round.POSIXt, trunc.POSIXt, units

    > describe(mtcars[vars])
    mtcars[vars]

     3  Variables      32  Observations
    ---------------------------------------------------------------------------
    mpg
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         32       0      25   20.09   12.00   14.34   15.43   19.20   22.80   30.09   31.30

    lowest : 10.4 13.3 14.3 14.7 15.0, highest: 26.0 27.3 30.4 32.4 33.9
    ---------------------------------------------------------------------------
    hp
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         32       0      22   146.7   63.65   66.00   96.50  123.00  180.00  243.50  253.55

    lowest :  52  62  65  66  91, highest: 215 230 245 264 335
    ---------------------------------------------------------------------------
    wt
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         32       0      29   3.217   1.736   1.956   2.581   3.325   3.610   4.048   5.293

    lowest : 1.513 1.615 1.835 1.935 2.140, highest: 3.845 4.070 5.250 5.345 5.424
    ---------------------------------------------------------------------------

Loading psych after Hmisc masks Hmisc's describe(), so the same call now dispatches to the psych version:

    > library(psych)

    Attaching package: 'psych'

    The following object is masked from 'package:Hmisc':

        describe

    > describe(mtcars[vars])
        vars  n   mean    sd median trimmed   mad   min    max  range skew kurtosis    se
    mpg    1 32  20.09  6.03  19.20   19.70  5.41 10.40  33.90  23.50 0.61    -0.37  1.07
    hp     2 32 146.69 68.56 123.00  141.19 77.10 52.00 335.00 283.00 0.73    -0.14 12.12
    wt     3 32   3.22  0.98   3.33    3.15  0.77  1.51   5.42   3.91 0.42    -0.02  0.17

Descriptive statistics by group

Approach 1: the aggregate() function

    > aggregate(mtcars[vars], by=list(am=mtcars$am), mean)
      am      mpg       hp       wt
    1  0 17.14737 160.2632 3.768895
    2  1 24.39231 126.8462 2.411000
    > aggregate(mtcars[vars], by=list(am=mtcars$am), sd)
      am      mpg       hp        wt
    1  0 3.833966 53.90820 0.7774001
    2  1 6.166504 84.06232 0.6169816

Approach 2: the by() function, which has the form

    by(data, INDICES, FUN)

Approach 3: summaryBy() in the doBy package, describe.by() in the psych package (named describeBy() in current psych releases), or melt() and cast() in the reshape package.

Visualizing the results

Histograms, density plots, box plots, dot plots, and so on.

2 Frequency and contingency tables

Subject of study: categorical variables.

The data set is Arthritis from the vcd package.

    > library(vcd)
    Loading required package: grid
    > ?Arthritis
    starting httpd help server ... done
    > head(Arthritis)
      ID Treatment  Sex Age Improved
    1 57   Treated Male  27     Some
    2 46   Treated Male  29     None
    3 77   Treated Male  30     None
    4 17   Treated Male  32   Marked
    5 36   Treated Male  46   Marked
    6 23   Treated Male  58   Marked

The functions for creating and manipulating contingency tables are listed in the figure.

[Figure: functions for creating and manipulating contingency tables]

Examples follow.

One-way tables

    > mytable <- with(Arthritis, table(Improved))
    > mytable
    Improved
      None   Some Marked
        42     14     28
    > prop.table(mytable)
    Improved
         None      Some    Marked
    0.5000000 0.1666667 0.3333333
    > prop.table(mytable) * 100
    Improved
        None     Some   Marked
    50.00000 16.66667 33.33333

Two-way tables

    > mytable1 <- xtabs(~ Treatment + Improved, data=Arthritis)
    > mytable1
             Improved
    Treatment None Some Marked
      Placebo   29    7      7
      Treated   13    7     21
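A two-way table is usually followed by marginal totals and proportions. The sketch below is one possible continuation, not part of the original walkthrough. It rebuilds the Treatment by Improved table from the counts printed above (an assumption made so that it runs without the vcd package) and uses only base R functions: margin.table(), prop.table(), and addmargins(). The names row_totals, col_totals, row_props, and with_sums are illustrative.

```r
# Rebuild the two-way table from the counts shown in the text, so the
# sketch does not depend on the vcd package (assumption for portability).
counts <- matrix(c(29, 7,  7,
                   13, 7, 21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Treatment = c("Placebo", "Treated"),
                                 Improved  = c("None", "Some", "Marked")))
mytable1 <- as.table(counts)

row_totals <- margin.table(mytable1, 1)  # patients per treatment group
col_totals <- margin.table(mytable1, 2)  # patients per outcome level
row_props  <- prop.table(mytable1, 1)    # proportions within each row
with_sums  <- addmargins(mytable1)       # table plus a "Sum" row and column

print(round(row_props, 3))
print(with_sums)
```

Passing 1 as the second argument to prop.table() conditions on the row variable, so each treatment group's proportions sum to 1. Passing 2 would condition on the outcome instead.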
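Returning to the by(data, INDICES, FUN) form given under per-group descriptive statistics in section 1, a minimal sketch of its use on mtcars might look like the following. The helper group_means is hypothetical and introduced only for illustration. by() splits the data frame on the grouping factor and passes each piece to the function.

```r
vars <- c("mpg", "hp", "wt")

# Hypothetical helper: column means of the selected variables
# within one group's subset of the data frame.
group_means <- function(d) colMeans(d[vars])

# Split mtcars by transmission type (am: 0 = automatic, 1 = manual)
# and apply the helper to each subset.
res <- by(mtcars, list(am = mtcars$am), group_means)
print(res)
```

The group means should agree with the aggregate() output shown earlier. Unlike aggregate(), by() hands the whole subset to FUN, so FUN may return several statistics at once.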