1 00:00:00,000 --> 00:00:01,140 Hey Cloud Gurus. 2 00:00:01,140 --> 00:00:04,003 Welcome to this lesson on using Azure Data Lakes. 3 00:00:05,830 --> 00:00:08,100 In this lesson, we'll be doing a quick recap 4 00:00:08,100 --> 00:00:11,400 on what Azure Data Lake Storage Gen2 is. 5 00:00:11,400 --> 00:00:13,040 I know that Brian has already covered that 6 00:00:13,040 --> 00:00:16,640 in the previous section, so that will be just a quick stop. 7 00:00:16,640 --> 00:00:18,380 From there, we'll talk about where it fits 8 00:00:18,380 --> 00:00:20,970 in the modern data warehouse. 9 00:00:20,970 --> 00:00:23,950 We're going to do a demo on creating and interacting 10 00:00:23,950 --> 00:00:26,817 with these accounts live in the Azure portal, 11 00:00:26,817 --> 00:00:29,840 and then we'll wrap everything up with a review. 12 00:00:29,840 --> 00:00:30,943 So let's get started. 13 00:00:33,450 --> 00:00:37,230 What exactly is an Azure Data Lake Storage Gen2 account? 14 00:00:37,230 --> 00:00:40,690 Well, it's built on Azure Blob storage. 15 00:00:40,690 --> 00:00:44,040 That was kind of our first iteration and gave us several 16 00:00:44,040 --> 00:00:46,490 benefits in the way of storage, 17 00:00:46,490 --> 00:00:49,600 including being a low-cost solution. 18 00:00:49,600 --> 00:00:52,950 They had various tiers we could take advantage of, 19 00:00:52,950 --> 00:00:54,913 and it was highly available. 20 00:00:55,870 --> 00:00:58,090 And so these great benefits are still at the core 21 00:00:58,090 --> 00:01:00,710 of what Azure Data Lake Storage is. 
22 00:01:00,710 --> 00:01:04,000 But now with the advent of storage Gen2 accounts, 23 00:01:04,000 --> 00:01:07,300 it builds on top of that and combines the capabilities 24 00:01:07,300 --> 00:01:11,850 of Azure Blob Storage and Azure Data Lake Storage Gen1, 25 00:01:11,850 --> 00:01:13,575 giving us additional abilities 26 00:01:13,575 --> 00:01:16,266 such as file system semantics, 27 00:01:16,266 --> 00:01:19,823 file-level security, and scale. 28 00:01:21,970 --> 00:01:25,190 Really what this does for us in the end is 29 00:01:25,190 --> 00:01:28,570 enhances the solution for analytics. 30 00:01:28,570 --> 00:01:31,440 And there are a few key areas in which it does this, 31 00:01:31,440 --> 00:01:35,390 namely, performance, because you have no need to copy 32 00:01:35,390 --> 00:01:38,472 or transform the data before analysis, 33 00:01:38,472 --> 00:01:42,720 compared to just the flat namespace of Blob storage. 34 00:01:42,720 --> 00:01:46,240 It also benefits us in the area of management. 35 00:01:46,240 --> 00:01:49,570 Having that hierarchical namespace allows us to organize 36 00:01:49,570 --> 00:01:52,240 by directories and subdirectories, 37 00:01:52,240 --> 00:01:54,203 making management much easier. 38 00:01:55,200 --> 00:01:58,160 And finally it benefits us with security, 39 00:01:58,160 --> 00:02:00,360 allowing POSIX permissions on directories 40 00:02:00,360 --> 00:02:01,923 or individual files. 41 00:02:02,860 --> 00:02:04,320 And so, as you can ascertain, 42 00:02:04,320 --> 00:02:06,140 and as Brian pointed out earlier, 43 00:02:06,140 --> 00:02:09,410 this all comes back to the hierarchical namespace. 44 00:02:09,410 --> 00:02:11,260 That's really what makes it stand apart, 45 00:02:11,260 --> 00:02:15,220 and really what makes it enhanced for analytics. 46 00:02:15,220 --> 00:02:17,419 That being the case, let's talk for a second 47 00:02:17,419 --> 00:02:20,370 about how it fits into our modern data warehouse. 
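To make the flat-versus-hierarchical point concrete, here is a minimal Python sketch. It is a toy in-memory model, not how Azure implements either namespace, and the function names and file paths are made up for illustration. It only shows why renaming a directory touches every object in a flat namespace but just one entry in a hierarchical one.

```python
# Toy model only: compares renaming a "directory" in a flat namespace
# versus a hierarchical one. Names and paths are hypothetical, and this
# is not Azure's actual implementation.

def rename_flat(store, old_prefix, new_prefix):
    """Flat namespace: every key that starts with the prefix is rewritten."""
    touched = 0
    for key in list(store):  # snapshot of keys, since the dict changes below
        if key.startswith(old_prefix):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            touched += 1
    return touched


def rename_hierarchical(tree, parent_path, old_name, new_name):
    """Hierarchical namespace: one entry in the parent directory is relinked."""
    parent = tree
    for part in parent_path:
        parent = parent[part]
    parent[new_name] = parent.pop(old_name)
    return 1


flat = {
    "logs/raw/a.log": b"a",
    "logs/raw/b.log": b"b",
    "logs/raw/c.log": b"c",
    "media/intro.mp4": b"m",
}
tree = {
    "logs": {"raw": {"a.log": b"a", "b.log": b"b", "c.log": b"c"}},
    "media": {"intro.mp4": b"m"},
}

flat_ops = rename_flat(flat, "logs/raw/", "logs/archive/")
tree_ops = rename_hierarchical(tree, ["logs"], "raw", "archive")
print(flat_ops, tree_ops)  # prints "3 1"
```

In the flat model the cost grows with the number of files under the prefix, while the hierarchical rename stays a single operation no matter how many files sit inside the directory.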
48 00:02:20,370 --> 00:02:23,770 These days, we have all kinds of data of all kinds of types 49 00:02:23,770 --> 00:02:25,890 coming from all different directions. 50 00:02:25,890 --> 00:02:30,070 We have logs, media files, relational database data, 51 00:02:30,070 --> 00:02:33,510 all kinds of structured and unstructured data. 52 00:02:33,510 --> 00:02:36,270 We use a component such as Azure Data Factory 53 00:02:36,270 --> 00:02:38,010 to ingest that. 54 00:02:38,010 --> 00:02:39,910 And then we need a place to store it, 55 00:02:39,910 --> 00:02:43,037 namely Azure Data Lake Storage Gen2. 56 00:02:43,037 --> 00:02:46,250 And this is a great repository for all of our data 57 00:02:46,250 --> 00:02:48,700 in our analytics solution. 58 00:02:48,700 --> 00:02:50,860 From there, it kind of becomes the central hub 59 00:02:50,860 --> 00:02:53,240 from which all the fun breaks out. 60 00:02:53,240 --> 00:02:56,792 You could have Azure Databricks in your prep and train step 61 00:02:56,792 --> 00:03:00,690 and maybe services such as Azure Synapse Analytics, 62 00:03:00,690 --> 00:03:02,300 Azure Analysis Services, 63 00:03:02,300 --> 00:03:06,620 or Power BI downstream in your model and serve step. 64 00:03:06,620 --> 00:03:08,650 But each of these is going to be relying 65 00:03:08,650 --> 00:03:11,053 on Azure Data Lake Storage Gen2. 66 00:03:12,170 --> 00:03:14,050 We could be pulling information from there, 67 00:03:14,050 --> 00:03:15,720 prepping and training it into Databricks, 68 00:03:15,720 --> 00:03:18,000 and then sending it over to Synapse, 69 00:03:18,000 --> 00:03:21,570 or Synapse could be pulling directly from our data lake. 
70 00:03:21,570 --> 00:03:24,430 Those services can further pass it over to Analysis Services 71 00:03:24,430 --> 00:03:27,520 and Power BI, but no matter what services you add 72 00:03:27,520 --> 00:03:29,400 into your analytics solution, 73 00:03:29,400 --> 00:03:33,460 be it Cosmos DB, Stream Analytics, or what have you, 74 00:03:33,460 --> 00:03:36,495 it's all tying back to the Azure data lake 75 00:03:36,495 --> 00:03:38,363 as its central hub for this data. 76 00:03:39,390 --> 00:03:42,290 With that in mind, let's jump over to the Azure portal 77 00:03:42,290 --> 00:03:45,453 and take a live look at this crucial analytics component. 78 00:03:47,340 --> 00:03:50,100 Here we are in the Azure portal, and I'm utilizing 79 00:03:50,100 --> 00:03:51,950 the cloud sandbox feature available 80 00:03:51,950 --> 00:03:54,260 through A Cloud Guru subscriptions. 81 00:03:54,260 --> 00:03:57,420 This is a fantastic tool for getting hands-on experience 82 00:03:57,420 --> 00:03:59,430 with Azure services. 83 00:03:59,430 --> 00:04:01,750 To get started, it's the same as creating 84 00:04:01,750 --> 00:04:03,810 any other storage account. 85 00:04:03,810 --> 00:04:06,900 We could search for storage accounts in the search box, 86 00:04:06,900 --> 00:04:10,440 or click our 3-line hamburger menu 87 00:04:10,440 --> 00:04:12,823 and go to Storage Accounts. 88 00:04:14,760 --> 00:04:16,753 Let's Create Storage Account. 89 00:04:19,830 --> 00:04:21,940 Since I'm using the cloud sandbox, 90 00:04:21,940 --> 00:04:23,890 it's already filled in my subscription 91 00:04:23,890 --> 00:04:25,994 and resource group for me. Of course, 92 00:04:25,994 --> 00:04:28,173 you can make those any that you want. 93 00:04:30,190 --> 00:04:35,190 We can set our storage account name, and it's already taken. 94 00:04:36,120 --> 00:04:38,700 So I'll add some numbers to the end. 
95 00:04:38,700 --> 00:04:42,090 East US is fine with me, and I'll leave everything else 96 00:04:42,090 --> 00:04:45,140 default for now because what we are really concerned about 97 00:04:45,140 --> 00:04:49,130 with a data lake is on our Advanced tab. 98 00:04:49,130 --> 00:04:52,480 And let's come down to Data Lake Storage Gen2. 99 00:04:52,480 --> 00:04:55,200 This one setting is what sets this apart 100 00:04:55,200 --> 00:04:58,210 from creation of a standard storage account, 101 00:04:58,210 --> 00:05:03,210 enabling the hierarchical namespace. As mentioned earlier, 102 00:05:03,280 --> 00:05:06,930 all of those performance, management, and security benefits 103 00:05:06,930 --> 00:05:10,243 stem from having this hierarchical namespace. 104 00:05:11,631 --> 00:05:13,290 I'll leave everything else at default for now 105 00:05:13,290 --> 00:05:14,853 and come to Review + Create. 106 00:05:15,820 --> 00:05:18,740 It's going to verify that the setup looks good, 107 00:05:18,740 --> 00:05:20,633 and now I can click Create. 108 00:05:23,020 --> 00:05:24,560 That's going to take a couple of moments 109 00:05:24,560 --> 00:05:26,120 to spin up this resource. 110 00:05:26,120 --> 00:05:28,830 I'll pause the video here and be right back with you 111 00:05:28,830 --> 00:05:29,930 when it has completed. 112 00:05:31,940 --> 00:05:33,910 Alright, that resource is ready. 113 00:05:33,910 --> 00:05:36,350 It actually spun that up very quickly. 114 00:05:36,350 --> 00:05:40,370 And now we can click Go to Resource. Of course, 115 00:05:40,370 --> 00:05:44,730 lots of helpful information right on our Overview page.
116 00:05:44,730 --> 00:05:48,790 Let's come over under Data Storage and look at Containers, 117 00:05:48,790 --> 00:05:51,200 and containers are really how you're going to be dividing 118 00:05:51,200 --> 00:05:53,670 up a lot of your analytics files, 119 00:05:53,670 --> 00:05:55,720 which we'll talk about in another lesson, 120 00:05:56,750 --> 00:05:59,690 but it's easy as clicking on the Create Container button, 121 00:05:59,690 --> 00:06:02,720 and we can name this something like landing 122 00:06:02,720 --> 00:06:03,953 for our landing zone. 123 00:06:06,730 --> 00:06:11,023 And maybe we have another that has our curated data. 124 00:06:16,500 --> 00:06:19,170 And so you can already begin to imagine the benefits 125 00:06:19,170 --> 00:06:23,210 of having this hierarchical namespace and easily managing 126 00:06:23,210 --> 00:06:27,340 different zones and those files within the zones. 127 00:06:27,340 --> 00:06:31,310 We can manage access control and ACLs per folder 128 00:06:31,310 --> 00:06:32,910 and per file, 129 00:06:32,910 --> 00:06:37,023 huge advantages over just your traditional blob storage. 130 00:06:37,860 --> 00:06:40,540 We can also manage things like the networking, 131 00:06:40,540 --> 00:06:44,040 which networks have access to this Azure data lake, 132 00:06:44,040 --> 00:06:46,300 as well as shared access signatures 133 00:06:46,300 --> 00:06:48,330 and other security features. 134 00:06:48,330 --> 00:06:51,050 We'll go further into those in a future lesson. 135 00:06:51,050 --> 00:06:53,030 For now, I just want you to start imagining 136 00:06:53,030 --> 00:06:55,560 how we can create this Azure data lake, 137 00:06:55,560 --> 00:06:57,670 start storing our files there, 138 00:06:57,670 --> 00:06:59,440 and set it up to be that central hub 139 00:06:59,440 --> 00:07:01,183 in our analytical solution. 
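As a rough illustration of per-folder and per-file ACLs, the Python sketch below parses a POSIX-style ACL string in the short form `user::rwx,group::r-x,other::---` and answers a simple permission question. The helper names and the two example ACL values are hypothetical, and the sketch assumes only the owning user, owning group, and other entries. Named users, named groups, and the mask are deliberately left out, so treat it as a teaching aid, not the service's evaluation logic.

```python
# Toy sketch: parse a POSIX-style ACL string and check one permission.
# Only the owning user, owning group, and "other" entries are handled;
# named entries and the mask are out of scope. Helper names are hypothetical.

def parse_acl(acl):
    """Turn 'user::rwx,group::r-x,other::---' into {'user': 'rwx', ...}."""
    entries = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        if qualifier:
            raise ValueError("named entries are out of scope for this sketch")
        if scope not in ("user", "group", "other") or len(perms) != 3:
            raise ValueError(f"unexpected ACL entry: {entry!r}")
        entries[scope] = perms
    return entries


def is_allowed(acl, scope, permission):
    """Return True if the scope's entry grants 'r', 'w', or 'x'."""
    return permission in parse_acl(acl)[scope]


# Hypothetical ACLs: one for a directory in the landing zone,
# one for a single file in the curated zone.
landing_dir_acl = "user::rwx,group::r-x,other::---"
curated_file_acl = "user::rw-,group::r--,other::---"

print(is_allowed(landing_dir_acl, "group", "w"))   # prints "False"
print(is_allowed(curated_file_acl, "user", "w"))   # prints "True"
```

The point is simply that a directory and a file inside it can carry different entries, which is what traditional flat blob storage does not offer.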
140 00:07:03,240 --> 00:07:06,310 By way of review, Azure Data Lake Storage Gen2 141 00:07:06,310 --> 00:07:09,610 combines the benefits of Azure Blob storage 142 00:07:09,610 --> 00:07:11,433 and ADLS Gen1. 143 00:07:12,890 --> 00:07:16,160 Those added features make a great solution for analytic 144 00:07:16,160 --> 00:07:20,690 workloads or when a human-readable hierarchy is needed, 145 00:07:20,690 --> 00:07:23,480 because let's be honest, it's just easier for us to read 146 00:07:23,480 --> 00:07:25,723 sensible directory and file structures. 147 00:07:26,840 --> 00:07:29,500 ADLS Gen2 is a foundational component 148 00:07:29,500 --> 00:07:32,710 of almost every Azure analytics solution. 149 00:07:32,710 --> 00:07:36,260 So you're going to be circling back around to this a lot. 150 00:07:36,260 --> 00:07:38,040 No matter how you build your solution, 151 00:07:38,040 --> 00:07:40,993 it's usually going to be a key component of it. 152 00:07:42,160 --> 00:07:45,220 And finally, even though we're going to go over security 153 00:07:45,220 --> 00:07:46,620 in more depth later, 154 00:07:46,620 --> 00:07:48,840 I want you to have in mind that it's recommended 155 00:07:48,840 --> 00:07:51,860 to turn on the firewall and only allow access 156 00:07:51,860 --> 00:07:54,360 from other Azure services. 157 00:07:54,360 --> 00:07:55,970 That's really just in your best interest 158 00:07:55,970 --> 00:07:57,420 for keeping that data secure. 159 00:07:58,560 --> 00:08:00,270 Thank you for joining me for this lesson. 160 00:08:00,270 --> 00:08:01,440 If you have any questions, 161 00:08:01,440 --> 00:08:03,180 please feel free to reach out to me 162 00:08:03,180 --> 00:08:04,830 and I'll see you in the next one.