DolphinDB SQL查询:从简单到复杂
目录摘要一、基础查询1.1 SELECT语句1.2 条件过滤1.3 排序与限制二、聚合查询2.1 基本聚合2.2 分组聚合2.3 分组集三、连接查询3.1 连接类型3.2 内连接3.3 左连接3.4 多表连接四、子查询4.1 标量子查询4.2 行子查询4.3 表子查询五、窗口函数5.1 排序函数5.2 聚合窗口函数5.3 偏移函数六、时间序列查询6.1 时间窗口6.2 时间对齐6.3 重采样七、复杂查询实战7.1 分组Top-N7.2 同比环比7.3 连续区间八、查询优化8.1 执行计划8.2 优化建议8.3 常见问题九、总结参考资料摘要本文系统介绍DolphinDB SQL查询语言。从基础SELECT语句到复杂的多表关联从条件过滤到分组聚合从子查询到窗口函数逐步带领读者掌握DolphinDB SQL查询的核心技能。通过丰富的实战案例帮助读者编写高效的数据查询语句。一、基础查询1.1 SELECT语句//创建示例表 ttable(1..10asid,ABCABCABCAascategory,10203015253512223218asvalue,2024.01.010..9asdate)//查询所有列 select*fromt//查询指定列 selectid,category,valuefromt//列别名 selectidasdevice_id,valueastemperaturefromt//计算列 selectid,value,value*1.832asfahrenheitfromt1.2 条件过滤//WHERE条件 select*fromt whereid5//多条件 select*fromt whereid3andvalue30//IN条件 select*fromt where categoryin[A,B]//BETWEEN条件 select*fromt where value between15and30//LIKE模糊匹配 select*fromt where category likeA%//NULL判断 select*fromt where valueisnotNULL1.3 排序与限制//排序 select*fromt order by value desc select*fromt order by category,value desc//限制行数 select top5*fromt select top5*fromt order by value desc//分页 select*fromt limit5offset3//从第4行开始取5行二、聚合查询2.1 基本聚合//聚合函数 select count(*)ascntfromt selectsum(value)astotalfromt select avg(value)asaveragefromt selectmax(value)asmax_val,min(value)asmin_valfromt//多聚合 select count(*)ascnt,sum(value)astotal,avg(value)asaverage,std(value)asstd_devfromt2.2 分组聚合//GROUP BY select category,count(*)ascnt,sum(value)astotal,avg(value)asaveragefromt group by category//多列分组 select category,date,count(*)ascnt,avg(value)asavg_valfromt group by category,date//HAVING过滤 select category,avg(value)asavg_valfromt group by category having avg(value)202.3 分组集//ROLLUP select category,date,sum(value)astotalfromt group by rollup(category,date)//CUBE select category,date,sum(value)astotalfromt group by cube(category,date)//GROUPING SETS select category,date,sum(value)astotalfromt group by grouping sets((category),(date),(category,date))三、连接查询3.1 连接类型连接类型INNER JOIN内连接交集LEFT JOIN左连接左表全部RIGHT JOIN右连接右表全部FULL JOIN全连接并集3.2 内连接//创建示例表 t1table(1..5asid,ABCDEasname)t2table(1..3asid,100..102asvalue)//内连接 select*fromlj(t1,t2,id)//或 select t1.id,t1.name,t2.valuefromt1 inner join t2 on t1.idt2.id3.3 左连接//左连接 select*fromlj(t1,t2,id)/*idname value1A1002B1013C1024D NULL5E NULL*/3.4 多表连接//多表连接 t1table(1..5asid,ABCDEasname)t2table(1..5asid,100..104asvalue)t3table(1..5asid,XYZXYastype)select t1.id,t1.name,t2.value,t3.typefromt1 left join t2 on t1.idt2.idleft join t3 on t1.idt3.id四、子查询4.1 标量子查询//标量子查询返回单个值 select*fromt where value(select avg(value)fromt)//在SELECT中使用 selectid,value,(select avg(value)fromt)asavg_valuefromt4.2 行子查询//行子查询返回多行 select*fromt where categoryin(select distinct categoryfromt where value25)//EXISTS子查询 select*fromt1 where exists(select*fromt2 where t2.idt1.id)4.3 表子查询//表子查询返回表 select category,avg_valfrom(select category,avg(value)asavg_valfromt group by category)where avg_val20//WITH子句CTEwithstatsas(select category,avg(value)asavg_valfromt group by category)select*fromstats where avg_val20五、窗口函数5.1 排序函数//ROW_NUMBER selectid,category,value,row_number()over(partition by category order by value desc)asrankfromt//RANK有并列 selectid,category,value,rank()over(order by value desc)asrankfromt//DENSE_RANK连续排名 selectid,category,value,dense_rank()over(order by value desc)asrankfromt5.2 聚合窗口函数//累积聚合 selectid,value,sum(value)over(order byid)ascumsum,avg(value)over(order byidrows between2precedingandcurrent row)asmavgfromt//分组窗口 selectid,category,value,sum(value)over(partition by category order byid)ascategory_cumsumfromt5.3 偏移函数//LAG/LEAD selectid,value,lag(value,1)over(order byid)asprev_value,lead(value,1)over(order byid)asnext_valuefromt//FIRST/LAST selectid,value,first(value)over(order byid)asfirst_val,last(value)over(order byid)aslast_valfromt六、时间序列查询6.1 时间窗口//创建时间序列数据 ttable(1..100asid,2024.01.01T00:00:000..99*60000astimestamp,//每分钟一条 rand(100.0,100)asvalue)//时间窗口聚合 select bar(timestamp,10m)astime_window,avg(value)asavg_val,max(value)asmax_val,count(*)ascntfromt group by bar(timestamp,10m)6.2 时间对齐//时间对齐 select*fromt where timestamp between2024.01.01T00:00:00and2024.01.01T01:00:00//按小时聚合 select bar(timestamp,1h)ashour,avg(value)asavg_valfromt group by bar(timestamp,1h)6.3 重采样//重采样1分钟→5分钟 select bar(timestamp,5m)astime_5m,first(value)asopen,max(value)ashigh,min(value)aslow,last(value)asclosefromt group by bar(timestamp,5m)七、复杂查询实战7.1 分组Top-N//每个类别取前3条 select*from(selectid,category,value,row_number()over(partition by category order by value desc)asrankfromt)where rank37.2 同比环比//环比计算 select date,value,lag(value,1)over(order by date)asprev_value,(value-lag(value,1)over(order by date))/lag(value,1)over(order by date)asmom_ratefromt//同比计算 select date,value,lag(value,12)over(order by date)asprev_year_value,(value-lag(value,12)over(order by date))/lag(value,12)over(order by date)asyoy_ratefromt7.3 连续区间//查找连续值 select*from(selectid,value,id-row_number()over(order byid)asgrpfromt where value20)group by grp having count(*)3//连续3个以上八、查询优化8.1 执行计划//查看执行计划 explain select*fromt whereid5//分析查询性能 timer select count(*)fromt8.2 优化建议优化项说明分区裁剪在分区列上过滤索引使用在索引列上查询减少扫描只查询需要的列避免全表使用WHERE条件8.3 常见问题//避免SELECT*selectid,valuefromt//好 select*fromt//避免//使用分区过滤 select*fromt where date between2024.01.01and2024.01.31//好//避免函数包装索引列 select*fromt where date2024.01.15//好 select*fromt where year(date)2024//避免九、总结本文系统介绍了DolphinDB SQL查询基础查询SELECT、WHERE、ORDER BY聚合查询GROUP BY、HAVING、分组集连接查询内连接、左连接、多表连接子查询标量、行、表子查询窗口函数排序、聚合、偏移时间序列时间窗口、重采样查询优化执行计划、优化建议思考题如何选择合适的连接类型窗口函数和GROUP BY有什么区别如何优化大数据量查询参考资料DolphinDB SQL查询DolphinDB窗口函数