1 00:00:00,300 --> 00:00:01,460 All righty, then. 2 00:00:01,470 --> 00:00:02,719 Welcome back, everyone. 3 00:00:02,730 --> 00:00:09,030 In this section, we're kind of continuing with the trend of focusing on selecting our data on reading 4 00:00:09,030 --> 00:00:09,330 it. 5 00:00:09,330 --> 00:00:16,379 If we're going back to CRUD in the last section we saw things like sorting with order by using limit 6 00:00:16,379 --> 00:00:17,010 and so on. 7 00:00:17,010 --> 00:00:20,880 In this section we're going to focus on what are called aggregate functions. 8 00:00:20,880 --> 00:00:27,780 And basically these are built in functions in my SQL that will allow us to aggregate or combine data 9 00:00:27,810 --> 00:00:29,280 to get meaning out of it. 10 00:00:29,280 --> 00:00:36,630 So by the end, we'll be able to do things like find the average page length of all of our books or 11 00:00:36,630 --> 00:00:39,900 find the average page length for each author. 12 00:00:39,900 --> 00:00:46,920 So let's say we'd be able to tell that someone like Carver, who writes short stories, maybe has shorter 13 00:00:46,920 --> 00:00:50,760 books compared to David Foster Wallace, and that's just average. 14 00:00:50,760 --> 00:00:56,490 But we'll see how to do other things like count some men, Max and a bunch of others, and they're known 15 00:00:56,490 --> 00:00:59,580 as aggregate functions because they work on aggregated data. 16 00:00:59,910 --> 00:01:01,350 So yeah, let's get started. 17 00:01:01,350 --> 00:01:06,690 It's time to learn some more stuff and it will definitely 100% be the most fun thing that you've ever 18 00:01:06,690 --> 00:01:07,200 done. 19 00:01:07,230 --> 00:01:08,160 Let's get going. 20 00:01:08,940 --> 00:01:13,260 The first thing we're going to take a look at, the first one of these aggregate functions is called 21 00:01:13,260 --> 00:01:13,920 count. 22 00:01:13,920 --> 00:01:18,330 And it does pretty much what you would expect, at least in the way that I'm going to show it to you 23 00:01:18,330 --> 00:01:18,870 now. 24 00:01:19,020 --> 00:01:24,000 We're going to revisit Count in the next video once we introduce what's called Group Buy. 25 00:01:24,000 --> 00:01:26,250 But for now, we're just focusing on count on its own. 26 00:01:26,250 --> 00:01:29,340 And what it will do is count whatever you tell it to count. 27 00:01:29,340 --> 00:01:31,140 So here's an example. 28 00:01:31,140 --> 00:01:33,090 How many books are in our database? 29 00:01:33,360 --> 00:01:39,840 If we wanted to do that right now, you could do a select star from books and manually count, which 30 00:01:39,840 --> 00:01:41,460 technically would I mean, that would work. 31 00:01:41,460 --> 00:01:49,440 But if we have Amazon's database that has millions of books, that would be a lot of work that would 32 00:01:49,440 --> 00:01:53,880 take at least a couple of thousand mechanical Turks to do so. 33 00:01:53,880 --> 00:01:55,320 That's not ideal. 34 00:01:55,320 --> 00:02:02,610 Instead, we can use count and it just looks like this select count star from books. 35 00:02:02,820 --> 00:02:05,640 So let's try it and then we'll discuss it a bit more. 36 00:02:06,450 --> 00:02:15,120 So here I am, I'm using the same database, just double checking bookshop so we can do a select star 37 00:02:15,120 --> 00:02:16,170 from books. 38 00:02:16,350 --> 00:02:18,390 And like I said, we could go through and count. 39 00:02:20,040 --> 00:02:24,450 And you can't just rely on the IDs of course, because we could have deleted some. 40 00:02:24,510 --> 00:02:29,940 So just the highest number of the ID is not a reliable way of knowing how many books are in there, 41 00:02:30,450 --> 00:02:34,170 because when you delete something, those IDs don't change, they don't shift down. 42 00:02:34,440 --> 00:02:42,930 So instead we do a select count and we'll put Star here, which basically will count every row, the 43 00:02:42,930 --> 00:02:43,440 entire row. 44 00:02:43,440 --> 00:02:49,080 So select count star from books and we get 19. 45 00:02:49,560 --> 00:02:54,120 So that tells us there are 19 books and we could verify that. 46 00:02:54,360 --> 00:02:56,400 Let me just speed this up as I count. 47 00:02:56,400 --> 00:03:02,910 One, two, three, four, five, six, seven, eight, nine, ten, 11, 12, 13, 14, 15, 16, 17, 48 00:03:02,910 --> 00:03:03,570 18, 19. 49 00:03:04,110 --> 00:03:04,530 All right. 50 00:03:04,530 --> 00:03:07,860 So there were 19 books, so we can trust this count. 51 00:03:07,860 --> 00:03:08,520 It's right. 52 00:03:09,270 --> 00:03:13,440 And that's one way of using count just to check how many things are in the table in general. 53 00:03:13,470 --> 00:03:16,560 A more specific way of using it would be to do something like this. 54 00:03:16,560 --> 00:03:19,230 How many author first names are in our database? 55 00:03:19,230 --> 00:03:20,340 And it's not 19. 56 00:03:20,340 --> 00:03:29,940 We know that now because if we do a select author f name from books, you know, we have some authors 57 00:03:29,940 --> 00:03:31,710 who are in here multiple times. 58 00:03:31,710 --> 00:03:35,040 Neil Gaiman is in there twice, so it's going to count them twice. 59 00:03:35,040 --> 00:03:36,870 It's just going to count everything here. 60 00:03:37,230 --> 00:03:43,260 So if we did this select count author name from books, is it really going to be correct? 61 00:03:43,260 --> 00:03:44,850 And the answer is no, as I just said. 62 00:03:44,850 --> 00:03:45,990 But let's just try it. 63 00:03:45,990 --> 00:03:53,130 Select the count of author l name from books or we're doing actually f name, aren't we? 64 00:03:54,420 --> 00:03:58,340 And it tells us there are 19 author f names and that's technically true. 65 00:03:58,350 --> 00:04:00,450 There are 19 F names in here. 66 00:04:00,660 --> 00:04:03,450 I want to know how many unique first names are in there. 67 00:04:03,450 --> 00:04:10,050 And fortunately there's a way of doing that and it relies on distinct, which we've already seen. 68 00:04:10,380 --> 00:04:13,260 So we just put distinct inside of that count. 69 00:04:13,830 --> 00:04:18,120 So select count, distinct author f name from books. 70 00:04:18,120 --> 00:04:19,079 Let's try it. 71 00:04:20,130 --> 00:04:22,170 Select Count. 72 00:04:22,410 --> 00:04:25,020 Distinct author. 73 00:04:26,640 --> 00:04:29,490 F name from books. 74 00:04:30,060 --> 00:04:36,060 And now it tells us 12 and I won't make you sit through me counting it, but there are indeed 12/1 names 75 00:04:36,060 --> 00:04:37,350 that we have in our database. 76 00:04:37,680 --> 00:04:44,760 So instead of 19, it's now decided, okay, there's only one here and there's one here and there's 77 00:04:44,760 --> 00:04:45,480 one here. 78 00:04:46,080 --> 00:04:51,600 So that's the basics of using count, along with using count with distinct. 79 00:04:51,630 --> 00:04:54,210 Now, I do want to show you a bit of an edge case here. 80 00:04:54,360 --> 00:04:57,930 Let's say I wanted to know how many unique authors are in my database. 81 00:04:57,930 --> 00:05:00,240 So that means first name and last name. 82 00:05:01,290 --> 00:05:09,270 So if we do our select count and let's just do author last name first from books. 83 00:05:10,960 --> 00:05:13,270 We get 19, as you would expect, there's 19 rows. 84 00:05:13,270 --> 00:05:14,620 Every row has a last name. 85 00:05:15,380 --> 00:05:19,160 But we know that there are some duplicated last names. 86 00:05:19,970 --> 00:05:24,680 And if we just look at those, I'll just do this very quickly. 87 00:05:26,490 --> 00:05:28,370 It's the same problem that we had with first name, right? 88 00:05:28,380 --> 00:05:32,400 We had Foster Wallace, we have Eggers and we have gay men. 89 00:05:33,120 --> 00:05:35,970 So if we use that distinct. 90 00:05:37,430 --> 00:05:40,400 Select Count, distinct last names. 91 00:05:41,690 --> 00:05:43,670 Maybe you can see where I'm going here. 92 00:05:44,690 --> 00:05:50,570 If I hit enter, it tells us there's 11 distinct author last names, which is true. 93 00:05:50,600 --> 00:05:56,480 There are 11 distinct last names, but that doesn't mean there are 11 distinct authors. 94 00:05:56,930 --> 00:06:04,230 Because if we go and do a select author f name, comma, author, l name. 95 00:06:04,250 --> 00:06:10,610 So if we look at both first name and last name from books, remember that we have Dan Harris and Frieda 96 00:06:10,610 --> 00:06:11,300 Harris. 97 00:06:11,450 --> 00:06:16,340 So when we're selecting distinct last names, it treats this as the same because Harris is the same 98 00:06:16,340 --> 00:06:17,030 as Harris. 99 00:06:17,480 --> 00:06:23,540 So if we wanted to know how many distinct authors were in there, what we could do is do author l name, 100 00:06:23,540 --> 00:06:25,790 comma, author f name. 101 00:06:25,970 --> 00:06:32,420 And now it's only going to return distinct rows where first name and last name or last name and first 102 00:06:32,420 --> 00:06:33,740 name are distinct. 103 00:06:35,350 --> 00:06:42,700 And that gives us 12, which in our case is the same as just doing this right here as we just saw. 104 00:06:42,730 --> 00:06:44,830 If we only do author f name. 105 00:06:45,400 --> 00:06:50,080 But that's because we don't have any duplicated names or any duplicated first names in there. 106 00:06:50,080 --> 00:06:51,220 We have David Foster Wallace. 107 00:06:51,220 --> 00:06:54,670 But we don't have I don't know, David Johnson or something. 108 00:06:55,450 --> 00:06:59,470 So this would be problematic if we had multiple Davids. 109 00:07:00,220 --> 00:07:04,780 So then we could use this where we're doing distinct author l name and author f name together. 110 00:07:04,870 --> 00:07:08,020 And the last thing we'll do is this question here. 111 00:07:08,050 --> 00:07:12,490 How many titles contain the the string, the E. 112 00:07:12,790 --> 00:07:16,350 So if you'd like to make this a little bit of a mini exercise, go ahead and try it. 113 00:07:16,360 --> 00:07:20,800 How many titles contain the in the title or how many books title contains the? 114 00:07:21,280 --> 00:07:26,350 And the answer is that we need to use like alongside with count. 115 00:07:26,530 --> 00:07:27,760 So let's do it now. 116 00:07:28,390 --> 00:07:36,690 We're going to do a select and let's just start by selecting the books, the title from books where 117 00:07:37,150 --> 00:07:38,860 title like. 118 00:07:39,670 --> 00:07:46,360 And then this is where we need those wildcards t h e we need a wild card on either side. 119 00:07:47,910 --> 00:07:48,150 All right. 120 00:07:48,150 --> 00:07:48,630 And here we go. 121 00:07:48,630 --> 00:07:53,880 We've got the namesake, the king, the circle, the amazing, the lobster, the Bardo. 122 00:07:53,910 --> 00:07:54,960 Great book, by the way. 123 00:07:55,620 --> 00:07:59,520 So there are six of them here, right? 124 00:07:59,520 --> 00:08:00,150 Yes. 125 00:08:01,240 --> 00:08:07,480 So if we wanted to do that with our code dynamically, not by counting ourselves, we would just replace 126 00:08:08,740 --> 00:08:15,970 that with a count, select count from books where title is like the and it tells us there are six. 127 00:08:16,750 --> 00:08:17,080 All right. 128 00:08:17,080 --> 00:08:20,800 So I just want to show you, you can combine it just like you can with any other of the things that 129 00:08:20,800 --> 00:08:21,580 we've seen. 130 00:08:21,840 --> 00:08:26,890 So substring or cat or upper, they work the same way. 131 00:08:27,040 --> 00:08:28,840 In this case, we're just selecting count. 132 00:08:29,230 --> 00:08:30,280 All right, so let's count. 133 00:08:30,280 --> 00:08:34,480 Next step, we're going to go a little bit off the rails and talk about something called Group By. 134 00:08:34,510 --> 00:08:35,549 Brace yourself. 135 00:08:35,559 --> 00:08:37,419 It's it's refreshing. 136 00:08:37,429 --> 00:08:38,350 Let's put it that way.