Hey Cloud Gurus, welcome to our lesson on implementing data redundancy.

In this lesson, we'll start with an overview of what we mean by data redundancy. We'll then look at our options for redundancy in the primary region, and then at how we accomplish redundancy in a secondary region. We'll wrap everything up with a review.

This will be a high-level overview of data redundancy. As an Azure professional, you should already be familiar with these concepts. They're built into Azure Storage in general, not just the portions dealing with data engineering. So we're going to keep it high level, but make sure you know the fundamentals of each option and how they compare to one another.

To start with an overview: Azure Storage is replicated by default. It automatically creates multiple copies of your data, and this is built right into the service, so you don't have to worry about a single copy of your data being compromised. This keeps you ready in the event of failures and helps you meet your availability and durability targets. Azure Storage offers several options for how to go about this, allowing you to have it your own way.
You can weigh the trade-offs between lower cost and higher availability and pick the solution that works best for your needs.

First, let's talk about protecting home base, or in other words, your primary region. This is where you first create your resources.

The most basic level of protection is LRS, locally redundant storage. LRS creates 3 synchronous copies of your data within a single physical location, that is, inside 1 data center. It's the lowest-cost redundancy option, but it also has the lowest availability. It protects you against server rack and drive failures, but if something happens to the entire data center, such as a fire, then all 3 copies of your data are lost.

If you're not comfortable with that, there is also ZRS, zone-redundant storage. ZRS creates 3 synchronous copies spread across 3 Azure availability zones in the primary region. So even if 1 data center is compromised, you still have copies in 2 other data centers. As a bonus tip, ZRS is what Microsoft recommends in the primary region for Azure Data Lake Storage Gen2 workloads.
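To make the LRS-versus-ZRS difference concrete, here's a minimal Python sketch. It's a toy model, not anything Azure exposes: it assumes failures can be ranked as nested scopes (drive, rack, data center, region), treats each availability zone as its own data center, and simply checks whether any copy would remain after a failure of a given scope. The names `Failure`, `LARGEST_SURVIVABLE`, and `survives` are hypothetical.

```python
from enum import IntEnum

# Hypothetical failure scopes, ordered from smallest to largest blast radius.
class Failure(IntEnum):
    DRIVE = 1
    RACK = 2
    DATACENTER = 3
    REGION = 4

# Largest failure scope each primary-region option is expected to survive.
# LRS keeps all 3 copies in 1 data center; ZRS puts 1 copy in each of 3 zones.
LARGEST_SURVIVABLE = {
    "LRS": Failure.RACK,
    "ZRS": Failure.DATACENTER,
}

def survives(option: str, failure: Failure) -> bool:
    """Return True if at least one copy is expected to remain after `failure`."""
    return failure <= LARGEST_SURVIVABLE[option]

for option in ("LRS", "ZRS"):
    for failure in Failure:
        outcome = "survives" if survives(option, failure) else "all copies lost"
        print(f"{option:4} {failure.name:10} -> {outcome}")
```

Neither option survives a REGION failure in this model, which is exactly the gap the secondary-region options fill.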
Let's look at these 2 options visually. With LRS, we have our data center and the 3 copies of the data within it. While that gives some peace of mind about avoiding hardware failures, you can see that this data center is the only thing housing our data.

ZRS has a similar setup, except that the copies are spread out. There's a copy in a data center in Availability Zone 1, another synchronous copy in Availability Zone 2, and another in Availability Zone 3. You can easily see how this gives you more peace of mind for your critical applications.

However, this is all still within 1 region: your primary region. Sometimes you need to take it up a notch. If a disaster affects an entire region, so that all of the data centers within that region go down, ZRS by itself is not enough to protect you.

For this, we have GRS, or geo-redundant storage. GRS is basically LRS plus an asynchronous copy to a single physical location in a secondary region. So you still get the 3 copies within 1 data center in your primary region, and then another 3 copies, using LRS, in the secondary region.
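The copy layouts described above can be sketched as lists of (region, location) pairs, one entry per copy. This is an illustration of the counts only; the region, data center, and zone labels are hypothetical placeholders, and the sketch does not model which copies are written synchronously.

```python
from collections import Counter

# Each copy is represented as a (region, location) pair; labels are placeholders.
Copy = tuple[str, str]

def lrs(region: str, datacenter: str) -> list[Copy]:
    # LRS: 3 copies inside a single data center.
    return [(region, datacenter)] * 3

def zrs(region: str) -> list[Copy]:
    # ZRS: 1 copy in each of 3 availability zones of the same region.
    return [(region, f"zone-{n}") for n in (1, 2, 3)]

def grs(primary: str, secondary: str) -> list[Copy]:
    # GRS: LRS in the primary region plus another LRS set in the secondary region.
    return lrs(primary, "dc-1") + lrs(secondary, "dc-2")

layout = grs("primary-region", "secondary-region")
print(len(layout))  # 6
print(Counter(region for region, _ in layout))
```

Counting distinct locations shows the difference at a glance: `lrs` yields 1 location, `zrs` yields 3, and `grs` yields 2 data centers in 2 different regions.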
Notice that the copy to the secondary region is asynchronous, whereas the copies in your primary region are synchronous. Synchronous means a write isn't considered complete until all of the primary copies have it, so they're always in step. The asynchronous secondary copy can lag somewhat behind, which means the most recent writes might not have reached the secondary region yet when a disaster strikes.

If GRS doesn't meet your needs, you can step it up from there to GZRS, or geo-zone-redundant storage. In this case, ZRS is used in the primary region for your synchronous copies, and then, similar to GRS, an asynchronous copy is made to a single physical location in the secondary region. So the differentiator between these two is not what happens in the secondary region; either way, that's an LRS copy of the data. The difference is in the primary region: GRS uses LRS there, and GZRS uses ZRS.

It should also be noted that you don't choose the secondary region. When you create your storage account, you select the primary region, and the paired secondary region is chosen for you based on that primary region. This can't be changed; it's all handled by Microsoft.

Also note that your secondary copies are not readable until you fail over to them.
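Here's a toy Python model of the synchronous-versus-asynchronous behavior just described. It's a sketch under simplifying assumptions, not how Azure Storage is implemented: the 3 primary replicas are plain lists that every write updates before returning, the secondary region is collapsed into a single list (ignoring its own 3 LRS copies), and replication to it runs only when `replicate()` is called. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GeoReplicatedAccount:
    """Toy model of GRS/GZRS write behavior (illustrative only)."""
    primary_replicas: list[list[str]] = field(default_factory=lambda: [[], [], []])
    secondary: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)

    def write(self, value: str) -> None:
        # Synchronous: the write returns only after every primary replica has it.
        for replica in self.primary_replicas:
            replica.append(value)
        # Asynchronous: the secondary copy is only queued here, not yet applied.
        self.pending.append(value)

    def replicate(self) -> None:
        # Runs later, independently of writes; drains the queue to the secondary.
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lag(self) -> int:
        """Number of writes the secondary region has not received yet."""
        return len(self.pending)

account = GeoReplicatedAccount()
account.write("a")
account.write("b")
print(account.lag())  # 2: the primary has both writes, the secondary has neither
account.replicate()
account.write("c")
print(account.lag())  # 1: "c" has not reached the secondary region yet
```

Whatever is still pending when the primary region goes down is the data at risk in a failover, which is why the lag matters.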
If you need those copies to be readable, you'll need the RA versions of these options, RA-GRS and RA-GZRS, with RA standing for read access.

Again, let's take a look at these visually. With GRS, we have our primary region, and within it, LRS: 3 copies of the data within 1 data center. However, we want data in the secondary region as well, so we use geo-replication to create a second LRS set of copies in Datacenter 2, which is located in the secondary region. If we want read access to those copies, we enable RA-GRS, for an additional cost, of course.

With GZRS, again we have our primary region, but this time ZRS is in play there, so our data is spread across 3 different availability zones within that region. Again, we want to copy data to the secondary region, so we use geo-replication to create an LRS set of 3 copies in a single data center in the secondary region. And just as before, if we need those copies to be readable, we use RA-GZRS, again at a higher cost.
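One practical effect of the RA variants is an extra read endpoint. The sketch below assumes the Azure SKU names `Standard_GRS`, `Standard_RAGRS`, `Standard_GZRS`, and `Standard_RAGZRS` and the convention of appending `-secondary` to the account name for the secondary Blob endpoint. The function `readable_blob_endpoints` and the account name `mystorageacct` are hypothetical, and nothing here calls Azure; it only builds URL strings.

```python
# SKUs whose secondary copy is readable without a failover (read-access variants).
READ_ACCESS_SKUS = {"Standard_RAGRS", "Standard_RAGZRS"}

def readable_blob_endpoints(account: str, sku: str) -> list[str]:
    """Return the Blob endpoints a client could read from before any failover."""
    primary = f"https://{account}.blob.core.windows.net"
    if sku in READ_ACCESS_SKUS:
        # RA-GRS / RA-GZRS: the secondary region is also exposed for reads.
        return [primary, f"https://{account}-secondary.blob.core.windows.net"]
    # Plain GRS / GZRS: the secondary exists but is not readable until failover.
    return [primary]

for sku in ("Standard_GRS", "Standard_RAGRS", "Standard_GZRS", "Standard_RAGZRS"):
    print(sku, readable_blob_endpoints("mystorageacct", sku))
```

Plain GRS and GZRS still replicate to the secondary region; the only difference this sketch captures is whether that copy can be read before a failover.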
By way of review: LRS creates 3 copies in 1 data center, whereas ZRS spreads the copies across 3 availability zones in the primary region.

GRS uses LRS in the primary region and then implements LRS in a secondary region. Similarly, GZRS is ZRS in the primary region plus LRS in the secondary. Both of these secondary-region options use LRS in the secondary region; the difference is whether they use LRS or ZRS in the primary region. And if we need our secondary copies to be readable, we need to use RA-GRS or RA-GZRS.

Remember that each of these options comes with a different cost. LRS is your cheapest option, but it also has the lowest durability, and as you add more durability, you also add more cost. So every situation is a balance between the amount of durability you need and the cost to implement it.

Thank you for joining me for this lesson on implementing data redundancy.
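The review boils down to three independent choices, which this small Python sketch turns into a decision function. It's an illustration of the lesson's logic, not an official selection tool: the function name and its three boolean requirements are hypothetical, and it assumes the least-protective option that meets every requirement is the one to prefer, since the lesson notes that added durability means added cost. It doesn't compare actual prices.

```python
def minimal_redundancy(survive_datacenter_outage: bool,
                       survive_region_outage: bool,
                       readable_secondary: bool) -> str:
    """Pick the option that adds only the protection each requirement demands."""
    # Primary region: ZRS if losing one data center must be survivable, else LRS.
    zone_redundant_primary = survive_datacenter_outage
    # A secondary region is needed to survive a regional outage or to read from it.
    needs_geo = survive_region_outage or readable_secondary
    if not needs_geo:
        return "ZRS" if zone_redundant_primary else "LRS"
    # The secondary region always uses LRS; only the primary side differs.
    geo = "GZRS" if zone_redundant_primary else "GRS"
    return f"RA-{geo}" if readable_secondary else geo

print(minimal_redundancy(False, False, False))  # LRS
print(minimal_redundancy(True, True, False))    # GZRS
print(minimal_redundancy(False, True, True))    # RA-GRS
```

Keeping the three requirements separate mirrors the review: the primary-region choice (LRS or ZRS) and the read-access choice (RA or not) vary independently, while the secondary region is always LRS.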
Again, these concepts aren't exclusive to data engineering, so you should already have a good base understanding of them. But I hope this is a good refresher and helps you understand how to best protect your data from disasters.

That's it for now. When you're ready, I'll see you in the next video.