1 00:00:00,150 --> 00:00:03,230 And the final two functions will look at the final two window functions. 2 00:00:03,240 --> 00:00:06,120 We'll look at our lead and lag. 3 00:00:06,120 --> 00:00:10,650 And these ones can seem intimidating up front, but they're quite useful. 4 00:00:10,740 --> 00:00:17,640 They are most often used to make some comparison to find a difference between one row and the preceding 5 00:00:17,640 --> 00:00:20,130 row or the next row. 6 00:00:20,130 --> 00:00:26,130 So the way that it works syntax wise is we have some expression that we provide between the parentheses, 7 00:00:26,130 --> 00:00:32,100 and then if we're doing lag, that will return the value of that expression from the previous row. 8 00:00:32,130 --> 00:00:37,170 If we're doing lead, that will return the value of the expression from the next row. 9 00:00:37,170 --> 00:00:41,820 And I think it's easiest to show this as an example, because it is a little confusing when you just 10 00:00:41,820 --> 00:00:43,320 read about it on the docks. 11 00:00:43,530 --> 00:00:51,090 So if we do a salary for every employee and then I do lag of salary. 12 00:00:52,460 --> 00:00:59,960 That will evaluate to if I let me do an over I have to have that let's do order by salary to sending 13 00:01:00,710 --> 00:01:01,940 something like that. 14 00:01:02,420 --> 00:01:08,870 This is going to evaluate to the previous salary for each row in a given window. 15 00:01:08,870 --> 00:01:14,720 And if we only have one window, then it will just be the previous salary and you'll see what we get. 16 00:01:14,720 --> 00:01:19,070 So for this first row, 159,000, there is no row before it. 17 00:01:19,070 --> 00:01:20,630 There is no previous salary. 18 00:01:20,630 --> 00:01:23,660 But then we get to this row 103,000. 19 00:01:23,660 --> 00:01:25,250 What's the previous value? 20 00:01:25,280 --> 00:01:28,460 159,000 right from right there. 21 00:01:29,150 --> 00:01:32,120 So maybe useful, maybe not useful on its own. 22 00:01:32,120 --> 00:01:35,210 But what we could do is calculate the difference. 23 00:01:35,750 --> 00:01:39,050 And what is the salary jump between these rows. 24 00:01:39,050 --> 00:01:42,680 So salary minus the previous row salary. 25 00:01:43,280 --> 00:01:46,480 And then we'll call this as I don't know, salary dif. 26 00:01:47,720 --> 00:01:49,940 Okay, let's try running that. 27 00:01:50,450 --> 00:01:51,380 Here we go. 28 00:01:51,560 --> 00:01:53,180 And here's our difference. 29 00:01:53,180 --> 00:01:55,730 So the first salary up here, 159,000. 30 00:01:56,210 --> 00:01:59,630 There is no difference with the previous value because there is no previous value. 31 00:01:59,630 --> 00:02:00,580 That's the highest. 32 00:02:00,590 --> 00:02:05,870 But then we have 103,000, which is a 56,000 difference. 33 00:02:06,110 --> 00:02:09,740 And then this row, 91,000 versus 103,000. 34 00:02:09,740 --> 00:02:11,180 The difference is 12,000. 35 00:02:11,180 --> 00:02:13,090 89,000 versus 91,000. 36 00:02:13,100 --> 00:02:14,440 The difference is 2000. 37 00:02:14,450 --> 00:02:16,340 So hopefully you can see how that works. 38 00:02:16,550 --> 00:02:18,800 We also have lead. 39 00:02:19,580 --> 00:02:21,740 So if I did minus lead. 40 00:02:22,850 --> 00:02:27,380 That's going to compute to the value of the the next row's salary. 41 00:02:28,180 --> 00:02:31,430 So right here we have 159,000 and 103. 42 00:02:31,450 --> 00:02:38,770 This was calculated based off of 159,000 minus the next row's salary, which is this right here. 43 00:02:38,860 --> 00:02:40,990 So we can look forward or backwards. 44 00:02:40,990 --> 00:02:44,800 And again, that's within each partition or within each window. 45 00:02:44,800 --> 00:02:47,130 And right now we have one window. 46 00:02:47,140 --> 00:02:50,890 But let's try another example where we partition by department. 47 00:02:51,960 --> 00:02:59,790 So I'll do salary minus the previous rose salary over and then I'll say partition by department. 48 00:03:00,030 --> 00:03:02,790 So now we end up with our different windows. 49 00:03:02,820 --> 00:03:04,740 Each one is ordered by salary. 50 00:03:04,770 --> 00:03:11,010 So then we're going to calculate the difference between each row and its previous row in terms of department 51 00:03:11,010 --> 00:03:11,790 salary. 52 00:03:12,000 --> 00:03:14,420 So we'll call this department salary diff. 53 00:03:16,500 --> 00:03:18,180 And I'll run this. 54 00:03:19,600 --> 00:03:20,260 And here we are. 55 00:03:20,260 --> 00:03:24,460 We see within customer service the highest paid person of 61,000. 56 00:03:24,940 --> 00:03:28,150 There is no lag, so the value here is null. 57 00:03:28,270 --> 00:03:30,760 But then the next person is 56,000. 58 00:03:31,000 --> 00:03:32,320 That's a difference of 5000. 59 00:03:33,460 --> 00:03:38,080 We took this row and we subtracted the previous value for salary. 60 00:03:38,440 --> 00:03:42,700 Same thing within engineering 103,000 sets the top salary. 61 00:03:42,700 --> 00:03:44,260 And then the next one is 90. 62 00:03:44,260 --> 00:03:46,690 1000 is a $12,000 difference. 63 00:03:47,170 --> 00:03:49,990 Nowhere near as big as this one within sales. 64 00:03:50,140 --> 00:03:56,890 I guess our sales team works on commission and this person is a real producer, a real sales guy or 65 00:03:56,890 --> 00:03:58,930 gal, 159,000. 66 00:03:59,620 --> 00:04:07,060 And then the drop of 87,000 in salary to 72,000 is the second highest paid employee within sales. 67 00:04:07,480 --> 00:04:09,910 So that is lag and lead. 68 00:04:10,540 --> 00:04:13,000 It's often used to calculate differences between rows. 69 00:04:13,000 --> 00:04:14,480 It doesn't have to be a difference. 70 00:04:14,500 --> 00:04:16,089 You could do whatever you want. 71 00:04:16,089 --> 00:04:18,100 I mean, as you saw, we could just put the lag. 72 00:04:18,100 --> 00:04:20,170 That's only the previous salary. 73 00:04:20,920 --> 00:04:23,830 Not that useful, but that's what we see, right? 74 00:04:23,860 --> 00:04:25,900 Previous salary was 61,000. 75 00:04:26,200 --> 00:04:28,930 Previous salary for this row, 56,000. 76 00:04:28,990 --> 00:04:31,420 More often than not, you'll do some calculation. 77 00:04:31,930 --> 00:04:36,400 You can actually even get fancier and you can specify how many rows back. 78 00:04:37,240 --> 00:04:38,740 Let me what happened here. 79 00:04:39,100 --> 00:04:43,030 You can specify how many rows back by providing a number afterwards. 80 00:04:43,030 --> 00:04:49,480 Like if I do salary comma to this will get me the salary from two rows previous. 81 00:04:49,480 --> 00:04:52,000 So we have this row right here. 82 00:04:52,000 --> 00:04:56,140 55,002 rows ago was 61,000. 83 00:04:56,680 --> 00:05:00,160 Not useful probably in this situation, but you can do that. 84 00:05:00,520 --> 00:05:01,090 All right. 85 00:05:01,090 --> 00:05:04,060 So that wraps up our discussion of window functions. 86 00:05:04,180 --> 00:05:13,240 There's a lot of tricky stuff here, but I hope you can walk away understanding by now the concept of 87 00:05:13,240 --> 00:05:17,770 partitioning your data, your rows into some set of windows. 88 00:05:17,770 --> 00:05:21,310 Each one of those windows can be ordered and then you can run functions on them. 89 00:05:21,310 --> 00:05:26,230 Get the first value, get the nth value, get the average, get the sum, get the rank. 90 00:05:26,350 --> 00:05:33,730 And then instead of being reduced down to one value, we end up with rows where we have the results 91 00:05:33,730 --> 00:05:37,360 of those window functions alongside the individual rows. 92 00:05:37,360 --> 00:05:42,670 So we take some big chunk of data, hundreds of rows, maybe thousands of rows, partitioned it into 93 00:05:42,670 --> 00:05:49,090 windows, smaller pieces, and then do some analysis and then recombine it all back into one big table. 94 00:05:49,210 --> 00:05:51,010 So that is window functions.