1 00:00:00,610 --> 00:00:03,250 Hey Cloud Gurus, welcome to the section recap. 2 00:00:03,250 --> 00:00:05,290 You made it! I'm so proud of you. 3 00:00:05,290 --> 00:00:07,570 I know this has been a lot of information, 4 00:00:07,570 --> 00:00:10,040 so let's take a second to review some of the main points 5 00:00:10,040 --> 00:00:12,990 from this section and take a look at where we go from here. 6 00:00:17,070 --> 00:00:21,290 Azure Data Lake Storage Gen2 provides scalable, flexible, 7 00:00:21,290 --> 00:00:25,490 and highly available storage for a variety of data formats. 8 00:00:25,490 --> 00:00:28,180 And remember, this technology is a central piece 9 00:00:28,180 --> 00:00:31,220 of almost every data engineering solution. 10 00:00:31,220 --> 00:00:33,460 Of course, there are many other pieces, 11 00:00:33,460 --> 00:00:36,540 including Azure Synapse Analytics and Databricks, 12 00:00:36,540 --> 00:00:39,940 but for our storage section, it almost always comes back 13 00:00:39,940 --> 00:00:41,783 to Azure Data Lake Storage Gen2. 14 00:00:43,290 --> 00:00:46,410 Distribution, partitioning, sharding, and pruning 15 00:00:46,410 --> 00:00:48,930 are all distinct methods of breaking up data 16 00:00:48,930 --> 00:00:50,830 into workable subsets. 17 00:00:50,830 --> 00:00:52,530 Remember that distribution has to do 18 00:00:52,530 --> 00:00:55,040 with those 60 underlying databases 19 00:00:55,040 --> 00:00:58,610 that Azure Synapse Analytics creates for you automatically. 20 00:00:58,610 --> 00:01:01,880 Partitioning is dividing up a single database instance 21 00:01:01,880 --> 00:01:03,780 into different chunks. 22 00:01:03,780 --> 00:01:06,860 Sharding differs from that in that it spreads the data 23 00:01:06,860 --> 00:01:08,970 across multiple computers.
24 00:01:08,970 --> 00:01:11,320 And pruning allows you to selectively pull back 25 00:01:11,320 --> 00:01:15,240 only specific pieces, even within a single partition, 26 00:01:15,240 --> 00:01:17,223 according to your query filters. 27 00:01:18,550 --> 00:01:21,720 Know your data well in order to make wise decisions 28 00:01:21,720 --> 00:01:25,780 on folder structure, file formats, and partition keys. 29 00:01:25,780 --> 00:01:28,080 Each of your situations will be different, 30 00:01:28,080 --> 00:01:31,200 and how you answer exam questions may be different 31 00:01:31,200 --> 00:01:33,990 depending on the needs of the scenario. 32 00:01:33,990 --> 00:01:37,070 Keep in mind our zones for landing, 33 00:01:37,070 --> 00:01:39,078 curated production, and so forth, 34 00:01:39,078 --> 00:01:41,253 the different file formats, 35 00:01:41,253 --> 00:01:44,460 and each of their advantages and disadvantages, 36 00:01:44,460 --> 00:01:47,290 and how to wisely choose partition keys. 37 00:01:47,290 --> 00:01:49,420 Not necessarily the perfect key, 38 00:01:49,420 --> 00:01:51,290 because there is no perfect key, 39 00:01:51,290 --> 00:01:53,310 but the one that gets you the best results 40 00:01:53,310 --> 00:01:54,763 for your situation. 41 00:01:56,280 --> 00:01:57,940 All of these concepts come back 42 00:01:57,940 --> 00:02:01,540 to one central theme: going fast. 43 00:02:01,540 --> 00:02:04,760 Remember, properly designed storage lays a foundation 44 00:02:04,760 --> 00:02:07,090 for achieving maximum query performance 45 00:02:07,090 --> 00:02:09,210 and data availability. 46 00:02:09,210 --> 00:02:10,670 All of the methods we've discussed 47 00:02:10,670 --> 00:02:12,810 have different ways of going about this, 48 00:02:12,810 --> 00:02:16,070 but it's pretty much all to accomplish this one goal: 49 00:02:16,070 --> 00:02:18,620 making things performant and making them available.
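As a rough illustration of the distribution and pruning ideas recapped above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rows are made up, the month-based partition key is hypothetical, and `zlib.crc32` is only a stand-in for a hash function. It is not how Azure Synapse Analytics assigns rows internally. It simply shows a key being mapped to one of 60 buckets, and a filtered query reading one partition instead of all of them.

```python
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60  # the count mentioned in the recap


def distribution_for(key: str) -> int:
    """Map a key to one of 60 buckets.

    crc32 is an assumption for illustration only; it is NOT the hash
    function Azure Synapse Analytics uses internally.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def partition_key(order_date: str) -> str:
    """Hypothetical partition key: the 'YYYY-MM' month of an ISO date."""
    return order_date[:7]


def build_partitions(rows):
    """Group rows by partition key so each month lives in its own chunk."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["order_date"])].append(row)
    return dict(partitions)


def query_month(partitions, month):
    """Return rows for one month, reading only the matching partition.

    Also reports how many partitions were read, to show the pruning effect.
    """
    matching = partitions.get(month, [])
    partitions_read = 1 if month in partitions else 0
    return matching, partitions_read


# Made-up sample rows, not data from the course.
rows = [
    {"customer_id": "C001", "order_date": "2023-01-15", "amount": 40},
    {"customer_id": "C002", "order_date": "2023-01-20", "amount": 25},
    {"customer_id": "C001", "order_date": "2023-02-03", "amount": 60},
    {"customer_id": "C003", "order_date": "2023-03-11", "amount": 10},
]

partitions = build_partitions(rows)
january, partitions_read = query_month(partitions, "2023-01")

print(f"partitions total: {len(partitions)}, read: {partitions_read}")
print(f"January amounts: {[r['amount'] for r in january]}")
print(f"C001 lands in distribution {distribution_for('C001')}")
```

The point of the sketch is the shape of the trade-off: a good partition key lets a filter skip most of the data, while a hash on a well-chosen column spreads rows across the buckets.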
50 00:02:20,060 --> 00:02:21,870 And so at this point in the course, 51 00:02:21,870 --> 00:02:24,320 you're familiar with what data engineering is, 52 00:02:24,320 --> 00:02:26,357 thanks to Brian's spectacular introduction 53 00:02:26,357 --> 00:02:30,000 in his crash course, and you now have a foundation 54 00:02:30,000 --> 00:02:32,000 in data storage. 55 00:02:32,000 --> 00:02:33,700 From here, you're going to move on 56 00:02:33,700 --> 00:02:37,240 to learning about data ingestion and transformation 57 00:02:37,240 --> 00:02:38,850 before we get into how to create 58 00:02:38,850 --> 00:02:40,473 different processing solutions. 59 00:02:41,520 --> 00:02:43,810 I hope that you have enjoyed this section. 60 00:02:43,810 --> 00:02:45,910 If there's any part that you're shaky on, 61 00:02:45,910 --> 00:02:48,270 feel free to go back and review that video 62 00:02:48,270 --> 00:02:51,330 and practice the hands-on labs. I would also encourage you 63 00:02:51,330 --> 00:02:53,370 to look up Microsoft's documentation 64 00:02:53,370 --> 00:02:55,470 on that particular subject. 65 00:02:55,470 --> 00:02:57,270 Approach learning from multiple avenues 66 00:02:57,270 --> 00:02:58,973 to get the best result possible. 67 00:03:00,090 --> 00:03:02,010 Thank you so much for joining me. 68 00:03:02,010 --> 00:03:04,400 Take a break if you need to, get a little coffee, 69 00:03:04,400 --> 00:03:07,070 absorb all that you've learned, and when you're ready, 70 00:03:07,070 --> 00:03:09,760 proceed on to the next section. 71 00:03:09,760 --> 00:03:11,083 Keep being awesome, Gurus.