使用 dplyr 包进行数据操作与清洗


在我们数据分析的实际应用中,我们可能会花费大量的时间在数据清洗上,而如果使用 R 里面自带的一些函数(base 包的 transform 等),可能会觉得力不从心,或者不是很人性化。好在我们有其他选择。这里我们介绍 dplyr 包。

单表操作函数(one table verbs)

  • filter: 保留满足条件的行
  • select: 使用列名选出列
  • arrange: 对数据的所有行排序
  • mutate: 添加新的变量
  • summarise: 分类汇总

这些函数的结构如下:

  • 第一个参数为 data frame 名
  • 接下来的参数是指定函数操作的依据(或条件)
  • 通常返回的是一个 data frame
  • 我们这里一个一个函数介绍,为了更好地表述我们的操作,我们这里使用一个构造的 data frame。
df <- data.frame(
      color = c("blue", "black", "blue", "blue", "black"),
      value = 1:5)

filter

filter 用于选择满足条件的观测(行),第一个参数是 data frame 名字,第二个参数是条件。这里举两个例子,我们对上面构造的 data frame 进行 filter 操作。第一个例子要求选取 color == blue 的观测; 第二个例子要求选取 value == 1 或者 value == 4 的观测。 
我们的 data frame 如下:

> df
  color value
1  blue     1
2 black     2
3  blue     3
4  blue     4
5 black     5

例子:

# 选取 `color == blue` 的观测
> filter(df, color == "blue")
  color value
1  blue     1
2  blue     3
3  blue     4
 # 选取 `value == 1` 或者 `value == 4 ` 的观测。
> filter(df, value %in% c(1, 4))
  color value
1  blue     1
2  blue     4

效果如下: 

注意: 这里需要提醒的是,对于多条件的选择,需要完整条件的,然后使用集合运算符将条件拼接起来。集合运算符有 !、|、\&、xor(交补)。条件的判断符有>(=)、\<(=)、==、!=、\%in\% (判断元素是否在集合或者列表内,返回逻辑值)。

select

select 用于选择列,比如

> select(df, color)
  color
1  blue
2 black
3  blue
4  blue
5 black
> select(df, -color)
  value
1     1
2     2
3     3
4     4
5     5

注意: select 中负号表示不选择。其中变量的声明还有其他形式,比如B:F表示从 B 列到 F 列所有列;ends_with(“string”) 表示选取列名以 string 结尾的全部列;contains(“string”) 表示选取列名中含有 string 的所有列。代码如下:

select(flights, arr_delay, dep_delay)
select(flights, arr_delay:dep_delay)
select(flights, ends_with("delay"))
select(flights, contains("delay"))

arrange

arrange 用于根据变量排序,如果排序依据(列)是字符,按照字母表的顺序,如果是数字,默认按照从小到大的顺序排序,如果需要使用逆序排,可以使用desc(var) 或者 -var。

> arrange(df, color)
  color value
1 black     2
2 black     5
3  blue     1
4  blue     3
5  blue     4
> arrange(df, desc(value))
  color value
1 black     5
2  blue     4
3  blue     3
4 black     2
5  blue     1

        注意: 多个排序依据直接把列名放在函数内,用逗号隔开;可以在排序里面使用计算,比如

arrange(flights, date, hour, minute)
arrange(flights, desc(dep_delay - arr_delay))

mutate

mutate 用于添加新的变量,直接使用列名进行计算得到新变量即可。而且它很有特色地方是,可以使用刚添加的变量,也就是在一个语句中可以多个变量,而且变量可以来源于刚新建的变量。

> mutate(df, double = 2 * value, quadruple = 2 * double)
  color value double quadruple
1  blue     1      2         4
2 black     2      4         8
3  blue     3      6        12
4  blue     4      8        16
5 black     5     10        20

summarise

summarise 可以用于“分类汇总”,但不是传统意义上的分类汇总,它还能做更多。

> by_color <- group_by(df, color) # 分组依据
> summarise(by_color, total = sum(value)) # 分组求和
Source: local data frame [2 x 2]
  color total
1 black     7
2  blue     8

实际上它是把我们现有的完整 data frame 依据分组依据(这里是 color)拆分成多个 data frame,然后对每个 data frame 分别计算,类似于 ddply。使用经历:分组依据可以多个,比如根据城市、月份、年份,我们对数据进行分类汇总,可以得到每个城市每一年每月的情况。

summarise 可以使用的函数有:

  • min(x), median(x), max(x), quantile(x, p)
  • n(), n_distinct(), sum(x), mean(x)
  • sum(x > 10), mean(x > 10)
  • sd(x), var(x), iqr(x), mad(x)

本文转载自:使用 dplyr 包进行数据操作与清洗




评论

支持上传图片(拖动图片或者截图粘贴)

129 评论
nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 20:36:30 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:40:59 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:33:09 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:43 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:22 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:21 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:20 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:18 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:17 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:14 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:32:13 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:31:50 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:31:25 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:30:59 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:30:22 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:29:42 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:29:09 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:28:14 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:27:07 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:26:00 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:49 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:47 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:41 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:39 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:37 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:36 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:35 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:32 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:30 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:28 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:26 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:24 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:24:23 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:48 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:21 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:19 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:18 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:17 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:15 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:14 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:23:13 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:22:46 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:22:17 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:21:42 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:21:14 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:20:49 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:19:48 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:18:13 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:16:59 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:15:33 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:19 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:17 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:14 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:11 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:09 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:08 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:14:04 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:13:24 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 19:12:24 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:59:27 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:58:53 回复

555

@@naaA4Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:58:16 回复

555

nZkkAbWB'"Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:58:13 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:58:09 回复

555

nZkkAbWBiAbg0HWo'Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:55:52 回复

555

-1" OR 5*5=25 -- Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:49 回复

555

-1' OR 5*5=25 -- Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:47 回复

555

-1 OR 5*5=25Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:45 回复

555

-1 OR 5*5=25 -- Chrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:43 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:42 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:48:02 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:40 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:39 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:38 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:35 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:34 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:32 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:31 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:47:05 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:46:40 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:46:15 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:45:46 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:45:18 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:44:42 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:43:28 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:42:07 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:41:06 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:40:16 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:40:13 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:40:10 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:40:04 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:40:01 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:39:59 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:39:55 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:39:20 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:38:50 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:33:26 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:55 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:23 回复

(select 198766*667891 from DUAL)

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:20 回复

(select 198766*667891)

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:19 回复

@@ZEXTx

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:17 回复

555����%2527%2522\'\"

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:16 回复

555'"

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:15 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:32:13 回复

555'||DBMS_PIPE.RECEIVE_MESSAGE(CHR(98)||CHR(98)||CHR(98),15)||'

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:31:42 回复

555*DBMS_PIPE.RECEIVE_MESSAGE(CHR(99)||CHR(99)||CHR(99),15)

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:31:24 回复

555Pfs0p6UJ')) OR 330=(SELECT 330 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:31:02 回复

555eBj2rhR7') OR 609=(SELECT 609 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:30:39 回复

555agRlXfgJ' OR 278=(SELECT 278 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:30:25 回复

555-1)) OR 777=(SELECT 777 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:30:05 回复

555-1) OR 616=(SELECT 616 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:29:45 回复

555-1 OR 762=(SELECT 762 FROM PG_SLEEP(15))--

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:29:09 回复

5551f6TsspB'

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:28:18 回复

555-1 waitfor delay '0:0:15' --

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:27:31 回复

555-1)

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:27:01 回复

555-1

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:26:31 回复

(select(0)from(select(sleep(15)))v)/*'+(select(0)from(select(sleep(15)))v)+'"+(select(0)from(select(sleep(15)))v)+"*/

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:26:02 回复

5550"XOR(555*if(now()=sysdate(),sleep(15),0))XOR"Z

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:25:09 回复

5550'XOR(555*if(now()=sysdate(),sleep(15),0))XOR'Z

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:24:46 回复

555*if(now()=sysdate(),sleep(15),0)

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:24:02 回复

-1" OR 5*5=25 or "HMPI5esF"="

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:59 回复

-1' OR 5*5=25 or 'l2muEXWp'='

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:57 回复

-1" OR 5*5=25 --

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:55 回复

-1' OR 5*5=25 --

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:53 回复

-1 OR 5*5=25

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:52 回复

-1 OR 5*5=25 --

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:50 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:30 回复

555

nZkkAbWBChrome/131.0.0.0Windows NT 10.0; Win64; x64
2026-04-10 18:23:08 回复

555