1 00:00:00,120 --> 00:00:06,390 And finally, we've arrived at our last entity to store, which is hash tags or tags, and we save this 2 00:00:06,390 --> 00:00:12,150 one for last because as I mentioned in the intro video, it's the hardest, I guess. 3 00:00:12,600 --> 00:00:16,800 Well, it doesn't have to be, but there's a couple of different solutions we could go about and I wanted 4 00:00:16,800 --> 00:00:20,430 to have a discussion about which is best if there is one answer. 5 00:00:20,430 --> 00:00:24,960 So let's step through three potential structures that we could use. 6 00:00:24,960 --> 00:00:31,260 And remember, the way that this works is that one photo can have ten, 20, 50 hashtags. 7 00:00:31,260 --> 00:00:33,300 I do think there is a limit on Instagram. 8 00:00:33,570 --> 00:00:39,060 I mean, you can keep adding them as many as you want, but only 30, I believe, actually will show 9 00:00:39,060 --> 00:00:39,570 up. 10 00:00:40,410 --> 00:00:45,840 Or it's not that they won't show up, but it's that when you search for a hashtag, your photo and they 11 00:00:45,840 --> 00:00:49,620 show up if you use less than 30 Anyways, we're getting off track here. 12 00:00:50,310 --> 00:00:57,030 But the idea is you can add hashtags to a photo, but then you can also go click on a hashtag and see 13 00:00:57,030 --> 00:00:59,100 all the associated photos with it. 14 00:00:59,250 --> 00:01:06,210 So here's one way we could structure it, which might seem maybe the least obvious to you, which would 15 00:01:06,210 --> 00:01:13,770 be to do it like this where we actually just have our photos table, but we add a column called tags 16 00:01:13,770 --> 00:01:20,910 and it's a string, just a var char that just has our tags and we could separate them by the hash sign 17 00:01:20,910 --> 00:01:23,700 the octo thorp or by spaces or dashes or something. 18 00:01:23,700 --> 00:01:27,090 But we're just storing them together and this could work. 19 00:01:27,090 --> 00:01:28,770 And places do use this. 20 00:01:28,770 --> 00:01:33,750 I've seen it suggested at least I haven't seen it in practice, but you can imagine how it would work 21 00:01:34,410 --> 00:01:36,240 to add a new tag into something. 22 00:01:36,240 --> 00:01:41,610 You'd basically just take all the other tags and then concatenate in at the end a new tag or at the 23 00:01:41,610 --> 00:01:42,300 beginning. 24 00:01:42,300 --> 00:01:48,450 And then you could have an order to figure out which ones were posted first, I guess based off of that 25 00:01:48,450 --> 00:01:49,410 order that they're in. 26 00:01:50,400 --> 00:01:51,600 And it's pretty simple. 27 00:01:51,840 --> 00:01:58,470 If you wanted to find all photos that are tagged with cute, you just basically have to use like. 28 00:01:59,460 --> 00:02:01,620 Where tags like cute. 29 00:02:01,620 --> 00:02:04,700 But then as you can see down here, there are a couple problems. 30 00:02:04,710 --> 00:02:07,740 Let's start with an advantage, which is that it's very easy to implement. 31 00:02:07,740 --> 00:02:09,419 You don't need any other tables. 32 00:02:09,630 --> 00:02:11,630 But there are a couple of disadvantages. 33 00:02:11,640 --> 00:02:17,550 One is that there's a limit to how many tags you can have in there, which is the length of whatever 34 00:02:17,550 --> 00:02:22,470 the column is of a bar char, whatever the maximum length is, you can only put so many in there. 35 00:02:23,340 --> 00:02:28,500 The other problem or any other problem actually, is that you can't store additional information. 36 00:02:28,500 --> 00:02:32,460 Like if we wanted to store the first time a hash tag was used. 37 00:02:33,000 --> 00:02:39,720 If I wanted to be able to see the first person to use the hashtag whatever cat or something, I'm not 38 00:02:39,720 --> 00:02:44,460 storing the time that something was tagged because, you know, on Instagram you can add hash tags after 39 00:02:44,460 --> 00:02:45,330 the fact. 40 00:02:45,720 --> 00:02:50,400 I'm sure they're storing that when they're tagged so that they can track trends as hash tags come and 41 00:02:50,400 --> 00:02:51,030 go. 42 00:02:51,270 --> 00:02:54,000 So it doesn't allow us to store additional information like that. 43 00:02:54,270 --> 00:02:59,940 And then the other thing that's kind of important is that you have to be careful when you're searching 44 00:02:59,940 --> 00:03:00,610 for stuff. 45 00:03:00,630 --> 00:03:06,210 So if I was trying to find the hash tag food and I wanted to see nice food, beautiful food, so the 46 00:03:06,210 --> 00:03:07,470 hash tag is just food. 47 00:03:07,800 --> 00:03:10,860 Well, it will return this, which has sad food. 48 00:03:11,460 --> 00:03:15,030 If I'm just doing like, food, so I'd have to be careful. 49 00:03:15,030 --> 00:03:20,550 And I need to add spaces maybe to make it easier to separate things out. 50 00:03:21,030 --> 00:03:22,290 Basically, you just have to be careful. 51 00:03:22,500 --> 00:03:23,470 There's ways around it. 52 00:03:23,490 --> 00:03:24,450 We've seen them before. 53 00:03:24,450 --> 00:03:26,670 How you use search with the like. 54 00:03:26,910 --> 00:03:32,340 But the point is this is an easy solution, but it's not going to be the best and it's not the best 55 00:03:32,340 --> 00:03:33,450 for us right now. 56 00:03:33,990 --> 00:03:36,810 So another solution is to use two tables. 57 00:03:37,200 --> 00:03:43,260 So we have our photos as our photos table currently is now, but then we have a tags table where we 58 00:03:43,260 --> 00:03:48,870 have our tag name in this case cute, and then a corresponding photo ID. 59 00:03:48,900 --> 00:03:56,790 So if this photo with ID of one is tagged cute, we put it there and then a photo with ID three which 60 00:03:56,790 --> 00:03:59,280 is the selfie here is also tags. 61 00:03:59,280 --> 00:03:59,850 Cute. 62 00:04:00,390 --> 00:04:05,790 And then photo ID two is tagged with microwave and three with ego, three with smile, two with gross. 63 00:04:05,790 --> 00:04:12,720 So as you can see, we can have a single photo with multiple tags like here where we have this selfie 64 00:04:12,720 --> 00:04:16,470 ID three is tagged with smile, ego and cute. 65 00:04:16,769 --> 00:04:21,510 And we can also have tags that are applied to multiple photos like cute. 66 00:04:21,540 --> 00:04:24,030 It's applied to the first photo and the third photo. 67 00:04:24,420 --> 00:04:25,770 So this works. 68 00:04:26,460 --> 00:04:28,440 There's unlimited number of tags. 69 00:04:28,680 --> 00:04:29,870 That's an advantage. 70 00:04:29,880 --> 00:04:35,970 We don't have to worry about how many can fit into the string, but a disadvantage is that for one thing, 71 00:04:35,970 --> 00:04:42,360 we are storing these tag names over and over and over and these are strings and that it's maybe not 72 00:04:42,360 --> 00:04:46,950 ideal to store all these different strings and have duplicated data, but it's not a big deal. 73 00:04:46,950 --> 00:04:48,510 The duplication isn't the problem. 74 00:04:48,510 --> 00:04:48,950 Really. 75 00:04:48,960 --> 00:04:54,240 What it comes down to is that this is actually slower than the previous solution when it comes to things 76 00:04:54,240 --> 00:04:57,300 like inserting or updating or deleting. 77 00:04:57,300 --> 00:05:01,230 But then another problem is that it's actually slower than the previous solution. 78 00:05:01,230 --> 00:05:06,270 So in just a moment I can show you a comparison that I didn't do, but there's a great article that 79 00:05:06,270 --> 00:05:12,630 I found where somebody does a speed test between these the three solutions to figure out when they perform 80 00:05:12,630 --> 00:05:13,170 best. 81 00:05:13,290 --> 00:05:16,290 So this is not one we're going to use either. 82 00:05:17,610 --> 00:05:20,670 But so there's been three issues which maybe you got. 83 00:05:20,910 --> 00:05:22,860 It involves three tables. 84 00:05:23,100 --> 00:05:25,950 So we have our photos table unchanged. 85 00:05:26,070 --> 00:05:28,650 We have a different tags table. 86 00:05:28,650 --> 00:05:32,460 So this tags table is just a tag name and an ID. 87 00:05:33,150 --> 00:05:40,230 And then we have a middle table, a join table called photo tags or tag photos or tags or something, 88 00:05:40,230 --> 00:05:44,070 whatever you want to call it, which is an instance of a hash tag being applied to a photo. 89 00:05:44,220 --> 00:05:49,710 And all that it is is a photo ID and a tag ID and that's it. 90 00:05:50,310 --> 00:05:59,070 So in this case, photo ID one is my cat is being tagged with cute and then the same photo. 91 00:05:59,070 --> 00:06:01,320 My cat is being tagged with pets. 92 00:06:01,650 --> 00:06:03,860 And then we have we can keep going. 93 00:06:03,870 --> 00:06:10,260 My meal is being tagged with microwave and my meal is being tagged with groups and so on. 94 00:06:10,260 --> 00:06:13,380 So we have these two columns and that's pretty much it. 95 00:06:14,040 --> 00:06:18,900 The advantage, again, just like the previous one, there's an unlimited number of tags and we can 96 00:06:18,900 --> 00:06:20,430 also add additional information. 97 00:06:20,430 --> 00:06:26,760 So if I wanted to, when a tag is created the first time over here, I could store the first time it 98 00:06:26,760 --> 00:06:27,570 was created. 99 00:06:27,720 --> 00:06:34,620 And then over here, every time it's subsequently used, I could store the time it was used or something. 100 00:06:35,010 --> 00:06:41,160 And so I'm sure there's some interesting data you could garner from that or the location of where you 101 00:06:41,160 --> 00:06:43,270 know, where in the world it was used or something like that. 102 00:06:43,270 --> 00:06:47,820 At the time of the day, I don't know, day of the week, whatever. 103 00:06:47,820 --> 00:06:50,610 And we also have less duplication of the tags over here. 104 00:06:51,120 --> 00:06:56,070 But then there are a couple of problems and they really come down to just more work. 105 00:06:56,070 --> 00:07:01,080 So when we're inserting something, if it's a first time a hash tag has been used and it's not in the 106 00:07:01,080 --> 00:07:07,620 database yet, we have to create it and then we have to associate it with a photo using photo tags. 107 00:07:07,830 --> 00:07:10,500 And the same thing when we're updating, we have to. 108 00:07:12,470 --> 00:07:12,680 Okay. 109 00:07:12,980 --> 00:07:13,580 What? 110 00:07:14,370 --> 00:07:16,830 She just vomited anyway. 111 00:07:16,890 --> 00:07:19,080 And then we also have to worry about orphans. 112 00:07:19,080 --> 00:07:21,420 We should always all be worried about orphans. 113 00:07:21,420 --> 00:07:27,510 But in particular, when we're talking about these tables, when we're deleting something, if we delete 114 00:07:27,510 --> 00:07:30,420 a tag for some reason, which Instagram does. 115 00:07:30,750 --> 00:07:36,900 If there's a problem, if something is being misused or somehow being, I don't know, controversial, 116 00:07:37,800 --> 00:07:39,240 they can be removed. 117 00:07:39,240 --> 00:07:43,680 And so then we just have to make sure when we're removing it from here, we also need to remove it from 118 00:07:43,680 --> 00:07:47,730 all associated photo tags So it can be a bit more complicated. 119 00:07:47,730 --> 00:07:54,750 But basically it comes down to at this, the cost of having this nicer structure is that it's just a 120 00:07:54,750 --> 00:07:55,740 little more work. 121 00:07:55,770 --> 00:08:02,040 And just as an end note here, I did mention earlier that there is this great article online. 122 00:08:02,040 --> 00:08:09,900 This guy who does speed tests of different ways of doing tagging and it's written up include the link 123 00:08:10,710 --> 00:08:11,490 along with this video. 124 00:08:11,490 --> 00:08:14,280 If you want to read it, I'll just spend 10 seconds talking about it. 125 00:08:14,280 --> 00:08:21,780 But basically he does some test here to show that depending on one the size of your data, set this 126 00:08:21,780 --> 00:08:26,820 example with a smaller data set, the second and third approaches I showed you with the multiple tables 127 00:08:26,820 --> 00:08:27,930 are actually faster. 128 00:08:28,140 --> 00:08:33,390 But then as we get a larger data set with thousands of tags, it actually changes. 129 00:08:33,780 --> 00:08:36,000 So it's just interesting to note that and to read. 130 00:08:36,000 --> 00:08:42,299 But also it talks about how much time it takes to insert compared to compare it across the three different 131 00:08:42,299 --> 00:08:43,020 versions. 132 00:08:43,650 --> 00:08:48,720 And it's just an interesting article, but if you actually look at its conclusion section, which is 133 00:08:48,720 --> 00:08:51,720 probably what most of you care about, where's that? 134 00:08:52,880 --> 00:08:58,580 Is that he actually thinks that it would be best to have if you're working on a huge app to have two 135 00:08:58,580 --> 00:08:59,390 versions. 136 00:08:59,900 --> 00:09:06,740 The the first version I showed you and the third version I showed you and basically they are perform 137 00:09:06,740 --> 00:09:08,660 best at in different situations. 138 00:09:08,660 --> 00:09:10,880 And this is getting into the woods a little bit. 139 00:09:10,880 --> 00:09:16,670 But basically the version that we're going to go with the third one with three different tables is faster 140 00:09:16,670 --> 00:09:20,780 if you're working with common tags, things that are used often. 141 00:09:20,780 --> 00:09:28,460 But if you're searching or you're trying to do joint with like rare or the wrong word, but uncommon 142 00:09:28,460 --> 00:09:34,190 tags that just aren't used very often, then it's actually slower and it would be faster to do the first 143 00:09:34,190 --> 00:09:36,710 version I showed you where you store them as text. 144 00:09:37,370 --> 00:09:43,520 So this guy is suggesting to use a combination of the two, which we're not going to get into. 145 00:09:43,520 --> 00:09:50,030 But just interesting to note that there isn't one best way and that it really comes down to your needs, 146 00:09:50,240 --> 00:09:53,270 what your app needs, the product needs dictate. 147 00:09:53,270 --> 00:09:59,960 And in all likelihood, as an app like Instagram grows, or especially when it started out to where 148 00:09:59,960 --> 00:10:06,680 it is now, it's schema, it's database, it's undergone multiple big overhauls and changes, probably 149 00:10:06,680 --> 00:10:13,340 not just the schema itself, but the database, whether it's going from something like MySQL to another 150 00:10:13,340 --> 00:10:21,140 type database or a bigger change going from a relational database like MySQL to a non relational database. 151 00:10:21,140 --> 00:10:23,660 So it's really hard to say what's best. 152 00:10:23,810 --> 00:10:28,760 But it's interesting to take a look at somebody who's actually crunched the numbers and even after doing 153 00:10:28,760 --> 00:10:34,310 that, can't necessarily give a definitive answer because the answer is it depends.