## 1. Linux / Shell

Log file `app.log` format:

```
2025-04-11 08:00:00,uid100,SUCCESS
2025-04-11 08:01:00,uid101,FAIL
2025-04-11 08:02:00,uid100,SUCCESS
```

Task:

1. Filter the SUCCESS lines.
2. Extract the UID (column 2, comma-separated).
3. Deduplicate and count occurrences.
4. Sort by count in descending order.

```bash
grep "SUCCESS" app.log | cut -d, -f2 | sort | uniq -c | sort -nr
```

`grep "SUCCESS" app.log` keeps only the successful log lines containing SUCCESS; its purpose is to filter out failed/abnormal entries so you only look at the target data. For the sample log above, the pipeline prints `2 uid100` followed by `1 uid101`.

## 2. SQL

### 176. Second Highest Salary

```sql
SELECT
    (SELECT DISTINCT salary
     FROM Employee
     ORDER BY salary DESC
     LIMIT 1 OFFSET 1) AS SecondHighestSalary;
```

`LIMIT 1 OFFSET 1`: skip one row, then take one row, i.e. the second row.

### 178. Rank Scores

```sql
SELECT score,
       DENSE_RANK() OVER (ORDER BY score DESC) AS `rank`
FROM Scores
ORDER BY score DESC;
```

- `DENSE_RANK()`: dense ranking; equal scores share the same rank and ranks stay consecutive: 1, 1, 2, 3, 3, 4.
- `RANK()`: ranking with gaps after ties: 1, 1, 3, 4.
- `ROW_NUMBER()`: no ties at all: 1, 2, 3, 4.

### 180. Consecutive Numbers

```sql
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1
JOIN Logs l2 ON l1.id = l2.id - 1
JOIN Logs l3 ON l1.id = l3.id - 2
WHERE l1.num = l2.num
  AND l2.num = l3.num;
```

## 3. PySpark

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("day11") \
    .getOrCreate()

data = [
    (1, "click", 10),
    (1, "buy", 20),
    (2, "click", 15),
    (3, "view", 5),
]
df = spark.createDataFrame(data, ["uid", "event", "cost"])

# 1. Group by with multiple aggregations
df.groupBy("uid") \
    .agg(F.sum("cost").alias("total"),
         F.avg("cost").alias("avg"),
         F.count("*").alias("cnt")) \
    .show()

# 2. Filter + sort
df.filter(F.col("cost") > 10) \
    .orderBy(F.desc("cost")) \
    .show()

# 3. Add a derived column
df.withColumn("cost_double", F.col("cost") * 2).show()

spark.stop()
```

## 4. Algorithm

Merge Two Sorted Lists:

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def mergeTwoLists(l1, l2):
    # Dummy head avoids special-casing the first node.
    dummy = cur = ListNode()
    while l1 and l2:
        if l1.val <= l2.val:
            cur.next = l1
            l1 = l1.next
        else:
            cur.next = l2
            l2 = l2.next
        cur = cur.next
    # Whichever list is left over is already sorted; attach it directly.
    cur.next = l1 or l2
    return dummy.next
```

Key points:

- Dummy head node `dummy`: avoids null checks for the head node and keeps the code concise.
- Two-pointer traversal: at each step, attach the smaller node to the result.
- Append the remainder: once one list is exhausted, link the other list on directly.
- Complexity: O(m + n) time, O(1) extra space (the merge is done in place).
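To double-check the merge logic, here is a minimal sanity test; the helper names `build`/`to_list` and the sample values [1, 2, 4] and [1, 3, 4] are illustrative assumptions, not part of the original notes.

```python
# Helpers for a quick sanity check (assumed names, not from the original notes).
def build(values):
    """Build a linked list from a Python list and return its head."""
    dummy = cur = ListNode()
    for v in values:
        cur.next = ListNode(v)
        cur = cur.next
    return dummy.next

def to_list(node):
    """Collect linked-list values back into a Python list."""
    out = []
    while node:
        out.append(node.val)
        node = node.next
    return out

# Example input: 1->2->4 merged with 1->3->4.
merged = mergeTwoLists(build([1, 2, 4]), build([1, 3, 4]))
print(to_list(merged))  # [1, 1, 2, 3, 4, 4]
```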