Line plot with error bars in R
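        Line plots are produced in much the same way as scatter plots: both use the plot command, and the customizations (axes, box, labels, text, arrows, gridlines, colors, symbols, and legends) work the same way.
        The difference between a line plot and a scatter plot is conceptual: if the x-axis is an ordered sequence or a time variable rather than a predictor variable, the data points should be connected by lines. In all other cases the points should not be connected.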
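Basic line plot
        To simply show how a variable changes over time (or along a sequence), a simple line plot is enough. In the example data here (Example Data 1), we want to plot how the values for March, April, and May change over the years:
[Example Data 1: the table image is missing from the source; the code below uses its columns YEAR, T_MAR, T_APR, T_MAY, BUD, and BUD_SE]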
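Because the Example Data 1 table is not reproduced in this text, the code that follows cannot run as-is. A minimal sketch of a stand-in data frame is given here so the plotting commands have something to work on. The column names are taken from the code in this post; every value is synthetic, and the year range and value ranges are only guesses based on the axis limits used later (they are assumptions, not the original data).

```r
# Hypothetical stand-in for Example Data 1: all values below are made up.
set.seed(1)                      # reproducible synthetic numbers
n <- 10
data <- data.frame(
  YEAR   = 1999:2008,                                # assumed year range
  T_MAR  = round(rnorm(n, mean = 2,  sd = 1), 1),    # synthetic March values
  T_APR  = round(rnorm(n, mean = 6,  sd = 1), 1),    # synthetic April values
  T_MAY  = round(rnorm(n, mean = 10, sd = 1), 1),    # synthetic May values
  BUD    = round(rnorm(n, mean = 90, sd = 5), 1),    # synthetic means
  BUD_SE = round(runif(n, min = 1, max = 4), 1)      # synthetic standard errors
)
str(data)
```

With the real table at hand, replace this block with a call such as read.csv() on your own file.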
        The R code can be written as follows. The six resulting panels show how the individual features of a line plot can be set.
x <- data$YEAR
y3 <- data$T_MAR
y4 <- data$T_APR
y5 <- data$T_MAY
yall <- data.frame(y3, y4, y5)
par(mfrow=c(2,3))
matplot(x, yall, type="b", pch=1, col=1:3, main="Pic. 1")
matplot(x, yall, type="p", pch=1, col=1:3, main="Pic. 2")
matplot(x, yall, type="o", pch=19, col=1:3, main="Pic. 3") # type="o" overplots points on the lines; usually used with a filled symbol such as pch=19
matplot(x, yall, type="l", pch=1, col=1:3, main="Pic. 4") # all three series are drawn as lines
matplot(x, yall, type="pll", pch=1, col=1:3, main="Pic. 5") # the series are drawn as points, lines, and lines, respectively
matplot(x, yall, type="l", pch=1, col=1:3, lty=c(1,3,5), main="Pic. 6") # line types are 1, 3, and 5
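Line plot with two Y axes
        Sometimes two Y axes are needed in the same figure, to compare how two dependent variables change with the same independent variable. In that case a second Y axis is placed on the right side of the plot, and the scales of both axes and the labels of the two curves have to be defined.
(continuing from the code above)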
par(mar=c(5,4,4,5)+.1) # Make room on the right for the second Y axis
matplot(x, y3, type="b", pch=1, col=1:3)  # Draw the plot of y3 against x
par(new=TRUE) # Draw the next plot on top of the current one instead of starting a new figure
plot(data$BUD ~ data$YEAR, type="o", ann=FALSE, axes=FALSE, pch=19, ylim=c(60,100)) # Add the second curve without axes or annotation
axis(4, las=2) # Add the second Y axis on the right; las=2 draws the tick labels perpendicular to the axis
mtext("BUD Value", side=4, padj=4) # Label the second Y axis
legend("topright", pch=c(1,19), lty=1, legend=c("y3", "BUD")) # Legend for the two curves
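Line plot with error bars (standard deviation or standard error known)
When means come with standard deviations or standard errors, an error bar needs to be added at the mean of each data point in the line plot. When the data are formatted like Example Data 1, that is, the SD or SE has already been calculated (the table was summarized beforehand in Excel or similar, and each group's standard deviation or standard error is listed next to its MEAN), the plot can be drawn with the following code:
(continuing from the code above)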
par(mfrow=c(1,2))
matplot(data$YEAR, data$BUD, type="o",pch=15,col=1:3,ylim=c(70,110), main="Up&Down Error Bar")
arrows(data$YEAR, data$BUD, data$YEAR,data$BUD+data$BUD_SE, length=0.1, angle=90) # Draw the upper error bar
arrows(data$YEAR, data$BUD, data$YEAR,data$BUD-data$BUD_SE, length=0.1, angle=90) # Draw the lower error bar
matplot(data$YEAR, data$BUD, type="o",pch=15,col=1:3,ylim=c(70,110), main="Arrow Error Bar")
arrows(data$YEAR,data$BUD+data$BUD_SE, data$YEAR,data$BUD-data$BUD_SE, length=0.1, angle=30) # Draw each error bar as a single arrow from mean+SE down to mean-SE
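The upper and lower bars above take two arrows() calls. As a possible shortcut, arrows() accepts code=3, which draws a head at both ends, so one call from mean-SE to mean+SE with angle=90 gives both caps. The helper name add_error_bars and the numbers below are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: draw symmetric error bars with a single arrows() call.
add_error_bars <- function(x, y, err, cap = 0.05, ...) {
  lower <- y - err
  upper <- y + err
  # code = 3 puts a head at both ends; angle = 90 turns the heads into flat caps
  arrows(x, lower, x, upper, length = cap, angle = 90, code = 3, ...)
  # Return the bar limits invisibly so they can be inspected
  invisible(data.frame(x = x, lower = lower, upper = upper))
}

# Made-up values, for illustration only
year <- 2001:2005
bud  <- c(85, 90, 88, 95, 92)
se   <- c(2, 3, 1.5, 2.5, 2)

pdf(NULL)  # null device; remove this line and dev.off() to see the plot on screen
plot(year, bud, type = "o", pch = 15, ylim = range(bud - se, bud + se))
limits <- add_error_bars(year, bud, se)
invisible(dev.off())
```

Computing ylim from the bar limits, as done here, keeps the caps from being clipped at the plot edge.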
        Put the three line series in one plot and the means with error bars in another, then stack the two plots in one figure:
par(mfrow=c(2,1))
par(mar=c(0,5,0,0)) # mar is a numeric vector of the form c(bottom, left, top, right) giving the number of lines of margin on the four sides of the plot
matplot(x, yall, type="o", pch=19, col=1:3, bg=1:3, ylab="Temperature(°C)")
legend(1999, 9.5, lty=c(1,2,3), pch=19, col=1:3, legend=c("March", "April","May"))
par(mar=c(3,5,0,0))
matplot(data$YEAR, data$BUD, type="o",pch=15,col=1:3,ylim=c(70,110), ylab="BUD values")
arrows(data$YEAR, data$BUD, data$YEAR,data$BUD+data$BUD_SE, length=0.1, angle=90)
arrows(data$YEAR, data$BUD, data$YEAR,data$BUD-data$BUD_SE, length=0.1, angle=90)
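Line plot with error bars (standard deviation or standard error unknown)
For raw data that have not been summarized (that is, no software has been used to compute the means, standard deviations, or standard errors), or for intermediate data produced during computation in R, no explicit standard deviation or standard error is available. The mean, standard deviation, and standard error of each group have to be computed first, and only then can the line plot with error bars be drawn. See Example Data 2: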
| NO. | len | supp | dose | 
| 1 | 4.2 | VC | 0.5 | 
| 2 | 11.5 | VC | 0.5 | 
| 3 | 7.3 | VC | 0.5 | 
| 4 | 5.8 | VC | 0.5 | 
| 5 | 6.4 | VC | 0.5 | 
| 6 | 10 | VC | 0.5 | 
| 7 | 11.2 | VC | 0.5 | 
| 8 | 11.2 | VC | 0.5 | 
| 9 | 5.2 | VC | 0.5 | 
| 10 | 7 | VC | 0.5 | 
| 11 | 16.5 | VC | 1 | 
| 12 | 16.5 | VC | 1 | 
| 13 | 15.2 | VC | 1 | 
| 14 | 17.3 | VC | 1 | 
| 15 | 22.5 | VC | 1 | 
| 16 | 17.3 | VC | 1 | 
| 17 | 13.6 | VC | 1 | 
| 18 | 14.5 | VC | 1 | 
| 19 | 18.8 | VC | 1 | 
| 20 | 15.5 | VC | 1 | 
| 21 | 23.6 | VC | 2 | 
| 22 | 18.5 | VC | 2 | 
| 23 | 33.9 | VC | 2 | 
| 24 | 25.5 | VC | 2 | 
| 25 | 26.4 | VC | 2 | 
| 26 | 32.5 | VC | 2 | 
| 27 | 26.7 | VC | 2 | 
| 28 | 21.5 | VC | 2 | 
| 29 | 23.3 | VC | 2 | 
| 30 | 29.5 | VC | 2 | 
| 31 | 15.2 | OJ | 0.5 | 
| 32 | 21.5 | OJ | 0.5 | 
| 33 | 17.6 | OJ | 0.5 | 
| 34 | 9.7 | OJ | 0.5 | 
| 35 | 14.5 | OJ | 0.5 | 
| 36 | 10 | OJ | 0.5 | 
| 37 | 8.2 | OJ | 0.5 | 
| 38 | 9.4 | OJ | 0.5 | 
| 39 | 16.5 | OJ | 0.5 | 
| 40 | 9.7 | OJ | 0.5 | 
| 41 | 19.7 | OJ | 1 | 
| 42 | 23.3 | OJ | 1 | 
| 43 | 23.6 | OJ | 1 | 
| 44 | 26.4 | OJ | 1 | 
| 45 | 20 | OJ | 1 | 
| 46 | 25.2 | OJ | 1 | 
| 47 | 25.8 | OJ | 1 | 
| 48 | 21.2 | OJ | 1 | 
| 49 | 14.5 | OJ | 1 | 
| 50 | 27.3 | OJ | 1 | 
| 51 | 25.5 | OJ | 2 | 
| 52 | 26.4 | OJ | 2 | 
| 53 | 22.4 | OJ | 2 | 
| 54 | 24.5 | OJ | 2 | 
| 55 | 24.8 | OJ | 2 | 
| 56 | 30.9 | OJ | 2 | 
| 57 | 26.4 | OJ | 2 | 
| 58 | 27.3 | OJ | 2 | 
| 59 | 29.4 | OJ | 2 | 
| 60 | 23 | OJ | 2 | 
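Computing the standard deviation and standard error
The previous post showed how to compute group means in R, but not standard deviations or standard errors. Here is a function that computes the mean, standard deviation, and standard error for several groups of data:
The function is named data_summary.
Its arguments are:
        data: a data frame
        varname: the name of the column for which SE/SD is computed
        groupnames: the names of the columns used as grouping variables
In Example Data 2, len is the variable to summarize, and supp and dose are the grouping variables.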
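data_summary <- function(data, varname, groupnames){
   library(plyr)  # provides ddply() and rename()
   # Mean, SD, and SE of column `col` within one group, ignoring missing values
   summary_func <- function(x, col){
     v <- x[[col]][!is.na(x[[col]])]
     c(mean = mean(v),
       sd = sd(v),
       se = sd(v)/sqrt(length(v)))
   }
   # Apply summary_func to every combination of the grouping variables
   data_sum <- ddply(data, groupnames, .fun = summary_func,
                   varname)
   # Name the mean column after the summarized variable
   data_sum <- plyr::rename(data_sum, c("mean" = varname))
   return(data_sum)
 }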
df <-read.csv("data.csv", sep=";", header=TRUE)
df.sdse <- data_summary(df, varname = "len", groupnames = c("supp", "dose"))
head(df.sdse)
#   supp dose   len       sd        se
# 1   OJ  0.5 13.23 4.459709 1.4102837
# 2   OJ  1.0 22.70 3.910953 1.2367520
# 3   OJ  2.0 26.06 2.655058 0.8396031
# 4   VC  0.5  7.98 2.746634 0.8685620
# 5   VC  1.0 16.77 2.515309 0.7954104
# 6   VC  2.0 26.14 4.797731 1.5171757
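As a cross-check that does not depend on plyr, the same summary can be sketched with base R's aggregate(). This sketch assumes that Example Data 2 is the ToothGrowth dataset shipped with R (the listed values appear to match it); the names summary_stats and base_sum are made up for illustration.

```r
# Assumption: Example Data 2 matches R's built-in ToothGrowth dataset.
summary_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

agg <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = summary_stats)

# aggregate() stores the three statistics in one matrix column; flatten it
base_sum <- do.call(data.frame, agg)
names(base_sum) <- c("supp", "dose", "len", "sd", "se")

# Sort like the ddply() result: by supp, then by dose
base_sum <- base_sum[order(base_sum$supp, base_sum$dose), ]
print(base_sum)
```

If the assumption holds, the printed values should agree with the df.sdse output above.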
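 ***   When should the standard deviation be used, and when the standard error? [2]
        It depends on the analysis. If the focus is on the spread and variability of the data, use the standard deviation (sd). If the focus is on the precision of the means, or on comparing and testing the differences between means, use the standard error (se).
        Note that deriving a confidence interval for the mean or for the data from the standard deviation assumes that the data are normally distributed. When it is unclear whether the data are normal, bootstrapping can be used to obtain the confidence interval instead.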
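To make the distinction concrete, here is a small sketch that computes the SD, the SE, a t-based 95% confidence interval, and a percentile bootstrap interval for one group. The values are the ten OJ, dose 0.5 observations from Example Data 2; the number of resamples (2000) and the seed are arbitrary choices, not something the references prescribe.

```r
# OJ, dose 0.5 values copied from Example Data 2 (rows 31 to 40)
len <- c(15.2, 21.5, 17.6, 9.7, 14.5, 10, 8.2, 9.4, 16.5, 9.7)

n      <- length(len)
sd_len <- sd(len)               # spread of the individual observations
se_len <- sd_len / sqrt(n)      # precision of the estimated mean

# Normal-theory 95% CI for the mean, using the t distribution
ci_t <- mean(len) + c(-1, 1) * qt(0.975, df = n - 1) * se_len

# Percentile bootstrap: resample with replacement, take the mean each time
set.seed(42)                    # arbitrary seed, only for reproducibility
boot_means <- replicate(2000, mean(sample(len, replace = TRUE)))
ci_boot <- quantile(boot_means, c(0.025, 0.975))

round(c(sd = sd_len, se = se_len), 3)
ci_t
ci_boot
```

The SE shrinks as the group size grows while the SD does not, which is why the two answer different questions.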
Plotting with error bars
        Here we use geom_errorbar() from the ggplot2 package [3], first on a bar chart:
library(ggplot2)
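# Each line must end with "+" so that R keeps reading the same expression.
# dose is treated as a factor so the bars are evenly spaced; a dodge width of 0.9
# matches the default bar width, so each error bar sits on its own bar.
p <- ggplot(df.sdse, aes(x=factor(dose), y=len, fill=supp)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=0.2, position=position_dodge(0.9))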
p
        Different colors and background themes can be used [4], according to your own preference:
default <- p + labs(title="default", x="Dose (mg)", y = "Length") + scale_fill_manual(values=c('#999999','#E69F00'))
default
gray <- p + labs(title="theme_gray", x="Dose (mg)", y = "Length") +
scale_fill_manual(values=c('#999999','#E69F00')) + theme_gray()
gray
classic <- p + labs(title="theme_classic", x="Dose (mg)", y = "Length") +
scale_fill_manual(values=c('#999999','#E69F00')) + theme_classic()
classic
bw <- p + labs(title="theme_bw", x="Dose (mg)", y = "Length") +
scale_fill_manual(values=c('#999999','#E69F00')) + theme_bw()
bw
linedraw <- p + labs(title="theme_linedraw", x="Dose (mg)", y = "Length") +
scale_fill_manual(values=c('#999999','#E69F00')) + theme_linedraw()
linedraw
light <- p + labs(title="theme_light", x="Dose (mg)", y = "Length") +
scale_fill_manual(values=c('#999999','#E69F00')) + theme_light()
light
        To draw a line plot with error bars like the one for the first example data, ggplot2 can be used as well [3]:
l <- ggplot(df.sdse, aes(x=dose, y=len, color=supp, group=supp)) +
     geom_point(pch=18, cex=4) +
     geom_line(lty=1) +
     geom_errorbar(aes(ymin=len-sd, ymax=len+sd), width=0.2, position=position_dodge(0.05))
l
## group should be set in aes(); without it the lines may not be drawn (for example when x is a factor) ##
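For readers who prefer to stay with the base graphics used earlier in this post, a comparable grouped line plot can be sketched with lines() and arrows(). This again assumes that ToothGrowth stands in for Example Data 2; the object names s, groups, and cols are made up for illustration, and the bars show mean ± SD as in the ggplot2 version.

```r
# Assumption: the built-in ToothGrowth dataset stands in for Example Data 2
stat <- function(x) c(mean = mean(x), sd = sd(x))
s <- do.call(data.frame,
             aggregate(len ~ supp + dose, data = ToothGrowth, FUN = stat))
names(s) <- c("supp", "dose", "len", "sd")

groups <- levels(s$supp)               # "OJ" and "VC"
cols   <- c("#999999", "#E69F00")      # the two colors used for the themes in this post

pdf(NULL)  # null device; remove this line and dev.off() to see the plot on screen
# Empty frame sized so that every error bar fits
plot(range(s$dose), range(s$len - s$sd, s$len + s$sd), type = "n",
     xlab = "Dose (mg)", ylab = "Length")
for (i in seq_along(groups)) {
  g <- s[s$supp == groups[i], ]
  g <- g[order(g$dose), ]
  lines(g$dose, g$len, type = "o", pch = 18, col = cols[i])
  # code = 3 with angle = 90 draws both caps of the mean +/- SD bar
  arrows(g$dose, g$len - g$sd, g$dose, g$len + g$sd,
         length = 0.05, angle = 90, code = 3, col = cols[i])
}
legend("bottomright", legend = groups, col = cols, pch = 18, lty = 1)
invisible(dev.off())
```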
        Likewise, different colors and background themes can be tried to see which one looks best [4]:
default.l <- l + labs(title="default", x="Dose (mg)", y = "Length") + scale_color_manual(values=c('#999999','#E69F00'))
default.l
gray.l <- l + labs(title="theme_gray", x="Dose (mg)", y = "Length") +
scale_color_manual(values=c('#999999','#E69F00')) + theme_gray()
gray.l
classic.l <- l + labs(title="theme_classic", x="Dose (mg)", y = "Length") +
scale_color_manual(values=c('#999999','#E69F00')) + theme_classic()
classic.l
bw.l <- l + labs(title="theme_bw", x="Dose (mg)", y = "Length") +
scale_color_manual(values=c('#999999','#E69F00')) + theme_bw()
bw.l
linedraw.l <- l + labs(title="theme_linedraw", x="Dose (mg)", y = "Length") +
scale_color_manual(values=c('#999999','#E69F00')) + theme_linedraw()
linedraw.l
light.l <- l + labs(title="theme_light", x="Dose (mg)", y = "Length") +
scale_color_manual(values=c('#999999','#E69F00')) + theme_light()
light.l
References
[1] Line plots with error bars. https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/6c_-_line_plots_with_error_bars.pdf
[2] Standard deviation vs Standard error. https://www.r-bloggers.com/standard-deviation-vs-standard-error/
[3] ggplot2 error bars. http://www.sthda.com/english/wiki/ggplot2-error-bars-quick-start-guide-r-software-and-data-visualization
[4] ggplot2 themes and background colors. http://www.sthda.com/english/wiki/ggplot2-themes-and-background-colors-the-3-elements









