PostgreSQL 11 新特性之窗口函数(window function)增强

作者: 不剪发的Tony老师
毕业于北京航空航天大学,十多年数据库管理与开发经验,目前在一家全球性的金融公司从事数据库架构设计。CSDN学院签约讲师以及GitChat专栏作者。csdn上的博客收藏于以下地址:https://tonydong.blog.csdn.net

文章目录

PostgreSQL 11 针对窗口函数(window function)进行了增强,添加了 SQL:2011 标准中的所有窗口范围(window frame)子句选项。具体来说,新增了以下功能:

    允许在RANGE窗口模式中使用off_set PRECEDING/FOLLOWING限定窗口范围
    支持GROUPS窗口模式,使用基于组的方式限定窗口范围
    支持窗口范围子句的排除选项,用于排除窗口内的某些数据行

首先,对于RANGE窗口模式,增加了off_set PRECEDING/FOLLOWING子句,现在可以支持所有的选项:

RANGE frame_start
RANGE BETWEEN frame_start AND frame_end

   

其中,frame_start 和 frame_end 可以是以下选项之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

    
创建一个测试表:

CREATE TABLE test
(
  id      SERIAL PRIMARY KEY,
  val     INT4,
  logtime TIMESTAMP
);

INSERT INTO test(val, logtime)
VALUES
(1, '2019-01-15 08:08:17'),
(1, '2019-01-15 08:14:30'),
(3, '2019-01-15 08:36:00'),
(6, '2019-01-15 09:20:56'),
(6, '2019-01-15 10:15:41');

以下示例比较了ROWS模式和RANGE模式的区别:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_rows,
       sum(val) OVER (ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_range,
       sum(val) OVER (ORDER BY logtime RANGE BETWEEN INTERVAL '10 minute' PRECEDING AND '10 minute' FOLLOWING) as sum_range_time
  FROM test;
 id | val |       logtime       | sum_rows | sum_range | sum_range_time
----+-----+---------------------+----------+-----------+----------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |         2 |              2
  2 |   1 | 2019-01-15 08:14:30 |        5 |         2 |              2
  3 |   3 | 2019-01-15 08:36:00 |       10 |         3 |              3
  4 |   6 | 2019-01-15 09:20:56 |       15 |        12 |              6
  5 |   6 | 2019-01-15 10:15:41 |       12 |        12 |              6
(5 rows)

   
其中,

    sum_rows 列计算按照字段 val 排序后,每一行以及前后各一行的合计值;
    sum_range 列计算按照字段 val 排序后,每一行以及前后和它的值相差小于等于 1 的那些行的合计值;
    sum_range_time 列计算按照字段 logtime 排序后,每一行以及前后和它的值相差小于等于 10 分钟的那些行的合计值。

其次,PostgreSQL 11 增加了GROUPS窗口模式。

GROUPS frame_start
GROUPS BETWEEN frame_start AND frame_end

 
其中,frame_start 和 frame_end 可以是以下选项之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

   
对于GROUPS窗口模式,计算窗口范围时使用组(由ORDER BY决定)为单位,而不是以行ROWS模式)或者值RANGE模式)为单位。

基于上面的示例,增加GROUPS模式:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_rows,
       sum(val) OVER (ORDER BY val RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_range,
       sum(val) OVER (ORDER BY val GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as sum_groups
  FROM test;
 id | val |       logtime       | sum_rows | sum_range | sum_groups
----+-----+---------------------+----------+-----------+------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |         2 |          5
  2 |   1 | 2019-01-15 08:14:30 |        5 |         2 |          5
  3 |   3 | 2019-01-15 08:36:00 |       10 |         3 |         17
  4 |   6 | 2019-01-15 09:20:56 |       15 |        12 |         15
  5 |   6 | 2019-01-15 10:15:41 |       12 |        12 |         15
(5 rows)

 
其中,sum_groups 计算按照字段 val 排序后,每一行所在的组以及前后各一个组中所有行的合计值。

ORDER BY val决定了组,示例中存在 3 个组(1、3、6)。对于第一行数据,它所在的组有两个 val 值(1 和 1),1 PRECEDING组不存在,1 FOLLOWING组有一个值(3),合计为 5(1+1+3)。

最后,PostgreSQL 11 支持窗口范围子句的排除选项 frame_exclusion,用于排除窗口内的某些数据:

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

    
其中,frame_start 和 frame_end 可以是以下选项之一:

UNBOUNDED PRECEDING
off_set PRECEDING
CURRENT ROW
off_set FOLLOWING
UNBOUNDED FOLLOWING

    
frame_exclusion 可以是以下选项之一:

EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
EXCLUDE NO OTHERS

 
它们的作用分别为:

    EXCLUDE CURRENT ROW排除当前行
    EXCLUDE GROUP排除当前行所在的组,也就是和当前行排序相同的行
    EXCLUDE TIES排除当前行所在组中的其他行,但是不排除当前行
    EXCLUDE NO OTHERS不排除任何行,默认值

以下示例演示了这些选项的效果:

SELECT id, val, logtime,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE NO OTHERS) as sum_rows,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW) as exclude_current_row,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) as exclude_group,
       sum(val) OVER (ORDER BY val ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) as exclude_ties
  FROM test;
 id | val |       logtime       | sum_rows | exclude_current_row | exclude_group | exclude_ties
----+-----+---------------------+----------+---------------------+---------------+--------------
  1 |   1 | 2019-01-15 08:08:17 |        2 |                   1 |               |            1
  2 |   1 | 2019-01-15 08:14:30 |        5 |                   4 |             3 |            4
  3 |   3 | 2019-01-15 08:36:00 |       10 |                   7 |             7 |           10
  4 |   6 | 2019-01-15 09:20:56 |       15 |                   9 |             3 |            9
  5 |   6 | 2019-01-15 10:15:41 |       12 |                   6 |               |            6
(5 rows)

    
至此,PostgreSQL 11 支持 SQL:2011 标准中定义的所有窗口范围子句选项。

官方文档:
https://www.postgresql.org/docs/11/tutorial-window.html
https://www.postgresql.org/docs/11/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-WINDOW
https://www.postgresql.org/docs/11/functions-window.html
https://www.postgresql.org/docs/11/sql-select.html#SQL-WINDOW