1 00:00:00,090 --> 00:00:00,710 All right. 2 00:00:00,720 --> 00:00:01,470 Welcome back. 3 00:00:01,500 --> 00:00:03,710 So we're moving on to our next question here. 4 00:00:03,719 --> 00:00:07,800 Even though this one actually isn't framed as a question, there's no question mark. 5 00:00:07,800 --> 00:00:11,330 But that doesn't make it any less relevant or less real world. 6 00:00:11,340 --> 00:00:16,770 In fact, this is one that I think is a really common operation, a common question to ask, and it 7 00:00:16,770 --> 00:00:19,020 has to do with inactive users. 8 00:00:19,170 --> 00:00:27,300 So across any sort of startup or online app company, there's often users who sign up, create an account 9 00:00:27,300 --> 00:00:32,700 and then do nothing, whether it means never posting or never buying something, never converting from 10 00:00:32,700 --> 00:00:38,850 a free account to a paid account, like in the case of Spotify or never in my case. 11 00:00:39,360 --> 00:00:41,520 This is a real example for my courses. 12 00:00:41,520 --> 00:00:45,780 Students who sign up and then never start it, never make it past the first video. 13 00:00:45,810 --> 00:00:47,130 They're at 0%. 14 00:00:47,490 --> 00:00:52,530 And I'm actually able to send targeted, I think once a month, twice a month. 15 00:00:52,530 --> 00:00:56,610 I can send an email update and I can target that to specific segments. 16 00:00:56,610 --> 00:01:04,440 So if I updated an introduction section to a course and I only wanted to let students know who haven't 17 00:01:04,440 --> 00:01:08,930 started yet, I want to target them and say, Hey, by the way, I noticed you haven't started yet. 18 00:01:08,940 --> 00:01:10,230 Well, this is a good time to start. 19 00:01:10,230 --> 00:01:12,150 I just revamped the whole intro. 20 00:01:12,180 --> 00:01:15,720 I could do that based off of what we're going to be doing here. 21 00:01:15,750 --> 00:01:19,760 Of course, our data and our database is set up differently, but it's the same idea. 22 00:01:19,770 --> 00:01:25,470 So this is a really common thing and all we're going to try and do is for our Instagram clone, our 23 00:01:25,470 --> 00:01:31,050 fake Instagram, we're trying to identify users who have never posted a photo, so they've signed up, 24 00:01:31,080 --> 00:01:32,520 they never posted anything. 25 00:01:32,520 --> 00:01:36,210 And we want our users to be more engaged for a bunch of reasons. 26 00:01:36,360 --> 00:01:41,070 We want higher engagement in our app, which means that we could have a higher valuation potentially, 27 00:01:41,070 --> 00:01:45,210 or sell ads for more, maybe get higher investments, all that kind of stuff. 28 00:01:45,210 --> 00:01:47,520 We want people posting if they're signed up. 29 00:01:47,520 --> 00:01:50,550 It looks bad to have dead accounts, people who sign up and then give up. 30 00:01:50,850 --> 00:01:52,590 So we're going to try and send an email. 31 00:01:53,010 --> 00:01:57,300 We're not worrying about doing that, but how are we going to target these people who have never posted 32 00:01:57,300 --> 00:01:58,680 a photo first step? 33 00:01:58,680 --> 00:01:59,790 Let's identify them. 34 00:02:00,180 --> 00:02:05,550 And essentially we're going to need to work with a users table and the photos table. 35 00:02:06,000 --> 00:02:16,200 So I'll add my three here, which is identify inactive users, users with no photos. 36 00:02:18,630 --> 00:02:21,150 So you've probably gathered we need to do a join. 37 00:02:21,150 --> 00:02:28,740 We're going to be working with the users table so we can start there, select start from users and we 38 00:02:28,740 --> 00:02:35,010 have 100 something users, but we can't see who's done what because we have the photos table and if 39 00:02:35,010 --> 00:02:38,130 we do our select star from photos just to jog your memory. 40 00:02:38,250 --> 00:02:40,650 Well, I guess we haven't actually seen all of our data here yet. 41 00:02:40,650 --> 00:02:46,740 We've got about exactly 257 photos, but a lot of them are done by the same user. 42 00:02:46,740 --> 00:02:48,750 So this here is user ID. 43 00:02:49,770 --> 00:02:53,760 If I scroll way up, here we go, we can see these. 44 00:02:54,060 --> 00:02:55,680 These are all posted by the same user. 45 00:02:55,680 --> 00:02:58,410 So three these are all from the same user. 46 00:02:58,440 --> 00:02:59,850 These are all from the same user. 47 00:03:00,090 --> 00:03:05,820 But then we have some users who haven't posted anything, but there's no way of knowing that just by 48 00:03:05,820 --> 00:03:08,880 looking at the photos, because these are the posted photos. 49 00:03:08,880 --> 00:03:14,490 So what we need to do is join these two together and hopefully you know what we need to do. 50 00:03:14,490 --> 00:03:22,890 If I just do a regular inner join in or join photos and where do we want to do it? 51 00:03:22,890 --> 00:03:32,220 I want to join on users id equals photos dot user id and if we run this now. 52 00:03:35,790 --> 00:03:36,500 It's a mess. 53 00:03:36,510 --> 00:03:38,790 So let's focus on just the things we need. 54 00:03:39,000 --> 00:03:45,690 So rather than Star can do username, maybe the image URL and we can just start with those two. 55 00:03:47,130 --> 00:03:47,400 All right. 56 00:03:47,400 --> 00:03:51,000 So we can see all these usernames and what they've submitted. 57 00:03:51,000 --> 00:03:55,310 But the problem here is that this doesn't account for the people who haven't submitted anything. 58 00:03:55,320 --> 00:04:00,360 So if there was a user who didn't submit something, they're not showing up here because we're doing 59 00:04:00,360 --> 00:04:05,910 an inner join, which remember is the intersection of both circles, the left and the right, where 60 00:04:05,910 --> 00:04:06,750 they overlap. 61 00:04:06,750 --> 00:04:14,520 But what we want is to identify where users on the left have no overlap on the right for photos so we 62 00:04:14,520 --> 00:04:20,160 can change this to a left join and that one change if we rerun this. 63 00:04:21,850 --> 00:04:22,790 We'll have to scroll a bit. 64 00:04:22,810 --> 00:04:23,500 There we go. 65 00:04:23,710 --> 00:04:29,950 You can see we have certain users who have a null over here and that's referring to the fact that they 66 00:04:29,950 --> 00:04:32,260 have no photos corresponding to them. 67 00:04:32,260 --> 00:04:37,530 So there's a couple of them as I go here, and that's what we want to be able to identify. 68 00:04:37,540 --> 00:04:38,860 So we're almost there. 69 00:04:38,860 --> 00:04:46,210 Actually, all we need to do is only select the users that have photos of NULL essentially. 70 00:04:46,210 --> 00:04:52,210 And of course, if I go back to having star here, it's not just image URL that's null, it's everything 71 00:04:52,210 --> 00:04:52,780 in photos. 72 00:04:52,780 --> 00:04:53,800 It's hard to see. 73 00:04:53,800 --> 00:04:57,280 Here's an example, but we've got a couple of nulls here. 74 00:04:57,280 --> 00:05:04,570 We've got the image or the photo ID, we've got the image URL, we've got the user ID of the photo and 75 00:05:04,570 --> 00:05:06,510 then we have the created additional. 76 00:05:06,790 --> 00:05:08,410 So all four things are null. 77 00:05:08,560 --> 00:05:10,270 We just need to focus on one of them. 78 00:05:10,270 --> 00:05:11,380 It doesn't matter which one. 79 00:05:11,380 --> 00:05:17,110 So I'll go back to just adding username here and we can do it. 80 00:05:17,110 --> 00:05:18,190 Doesn't matter really. 81 00:05:18,760 --> 00:05:23,110 Let's do image URL again and then all we want to do is add a wear down here. 82 00:05:23,110 --> 00:05:27,640 But we can't just say where photos dot. 83 00:05:28,030 --> 00:05:29,710 We can just do id equals null. 84 00:05:29,710 --> 00:05:30,340 That doesn't work. 85 00:05:30,340 --> 00:05:35,650 Remember null is weird so we need to do where photos id is null. 86 00:05:36,580 --> 00:05:38,890 And this could be anything on the photo side. 87 00:05:38,900 --> 00:05:44,680 Could be photos ID it could be photos that image URL, photos that user ID, any of them. 88 00:05:44,680 --> 00:05:51,400 So I'll just do it as ID and if I do this now, you'll see. 89 00:05:51,400 --> 00:05:53,980 And we actually don't even need to display the image URL. 90 00:05:54,760 --> 00:06:01,450 But if I do that, this is our list of users who have never posted anything for whatever reasons. 91 00:06:01,450 --> 00:06:03,610 Some of them programmatically. 92 00:06:03,610 --> 00:06:09,940 I created them as bots, which all they do is go comment and like other people's photos, but they don't 93 00:06:09,940 --> 00:06:11,230 post anything themselves. 94 00:06:11,230 --> 00:06:17,740 And then another persona of a user is a lurker, which is somebody who has an account and they don't 95 00:06:17,740 --> 00:06:22,990 use it all that much, but they just kind of look at people's photos, maybe like something, but they 96 00:06:22,990 --> 00:06:25,600 don't really they definitely don't post and they don't comment. 97 00:06:25,600 --> 00:06:30,970 They're kind of a silent member of the community, so we get a decent number of them. 98 00:06:31,150 --> 00:06:34,030 Sorry for that fire truck and that's all we need to do. 99 00:06:34,270 --> 00:06:36,190 If we want to figure out how many, that's another thing. 100 00:06:36,190 --> 00:06:40,870 We can just add account in, but we just want the username so that we can target them. 101 00:06:40,870 --> 00:06:42,700 These are people who've never posted anything. 102 00:06:42,700 --> 00:06:47,710 And just a note, you know, if you did this the other way around, it would have worked just fine as 103 00:06:47,710 --> 00:06:48,490 well too. 104 00:06:48,490 --> 00:06:52,660 Do photos users and then a right join. 105 00:06:55,200 --> 00:06:56,220 Does the same thing. 106 00:06:56,460 --> 00:06:58,710 So either way it works. 107 00:06:58,710 --> 00:07:06,090 But the key thing is that we're working with NULL, where we have an intersection where some users and 108 00:07:06,090 --> 00:07:08,070 some photos overlap, most of them do. 109 00:07:08,280 --> 00:07:14,190 But then we also have that subset of users who have no photos and we need to also join those with NULL, 110 00:07:14,280 --> 00:07:16,950 which is how we were able to tell if they have no photos. 111 00:07:16,950 --> 00:07:18,240 So here's my solution. 112 00:07:18,240 --> 00:07:20,970 If you came up with something different, post it and we can discuss it. 113 00:07:21,270 --> 00:07:22,050 Moving on.