1 00:00:00,220 --> 00:00:02,820 Welcome to the "Section Recap". 2 00:00:02,820 --> 00:00:05,140 In this lesson, we're going to review everything 3 00:00:05,140 --> 00:00:07,300 that we learned in the section. 4 00:00:07,300 --> 00:00:09,130 Yes, it should be a review. 5 00:00:09,130 --> 00:00:11,520 If you find that it's not, jump back in, 6 00:00:11,520 --> 00:00:12,710 re-watch the lessons 7 00:00:12,710 --> 00:00:15,560 to make sure that you understand the content. 8 00:00:15,560 --> 00:00:18,770 Also, we are focusing on the DP-203. 9 00:00:18,770 --> 00:00:21,860 So, there are several lessons, especially in this section, 10 00:00:21,860 --> 00:00:23,960 where we didn't go as deep as you would need 11 00:00:23,960 --> 00:00:25,950 to really understand the concepts, 12 00:00:25,950 --> 00:00:27,670 but we went as deep as we needed to 13 00:00:27,670 --> 00:00:31,683 to get you to pass the DP-203, which is the focus. 14 00:00:32,770 --> 00:00:34,620 And finally, number 3, 15 00:00:34,620 --> 00:00:36,880 again, if you don't know something, review. 16 00:00:36,880 --> 00:00:39,280 Jump back in, do the labs, review the lessons, 17 00:00:39,280 --> 00:00:40,630 read through Microsoft docs, 18 00:00:40,630 --> 00:00:43,010 make sure that you understand the material 19 00:00:43,010 --> 00:00:44,563 before moving forward. 20 00:00:45,480 --> 00:00:47,590 Alright, star schemas. 21 00:00:47,590 --> 00:00:49,160 We talked about star schemas, 22 00:00:49,160 --> 00:00:52,000 and if you remember, star schemas have the fact table. 23 00:00:52,000 --> 00:00:53,780 That's the bit in the middle right there. 24 00:00:53,780 --> 00:00:56,440 And that's full of all the countable items. 25 00:00:56,440 --> 00:00:59,390 Star schemas have 1 dimension table level, 26 00:00:59,390 --> 00:01:01,510 and those are the dimension tables. 27 00:01:01,510 --> 00:01:05,000 And all of those tie into the fact table.
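As a rough illustration of the star schema just described, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows (fact_sales, dim_product, dim_store) are hypothetical and not taken from the course. It shows one fact table in the middle holding the countable items, with a single level of dimension tables tied into it.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table, one level of dimension tables.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT
);
CREATE TABLE dim_store (
    store_id INTEGER PRIMARY KEY,
    city TEXT
);
-- The fact table holds the countable items; each row is a single sale
-- and points at one row in each dimension table.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_store VALUES (10, 'Seattle'), (20, 'Austin');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 2, 20.0),
    (101, 2, 10, 1, 15.0),
    (102, 1, 20, 3, 30.0);
""")

# A simple query: one join from the fact table to a dimension table.
rows = conn.execute("""
SELECT s.city, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_store AS s ON s.store_id = f.store_id
GROUP BY s.city
ORDER BY s.city
""").fetchall()

print(rows)
```

Because every dimension table sits one join away from the fact table, a query like this stays short, which is the "simple queries" point of the star schema.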
28 00:01:05,000 --> 00:01:07,810 Also, we talked about it not being normalized, 29 00:01:07,810 --> 00:01:10,850 which means that there's lots of copies of data. 30 00:01:10,850 --> 00:01:14,210 So star schemas are known for not being normalized, 31 00:01:14,210 --> 00:01:17,070 and they're used for simple queries. 32 00:01:17,070 --> 00:01:19,730 So when we do star schemas, think simple queries 33 00:01:19,730 --> 00:01:23,067 and not normalized, which means there's copies of data, 34 00:01:23,067 --> 00:01:25,660 and 1 dimension table level. 35 00:01:25,660 --> 00:01:27,730 When we talk about snowflake though, 36 00:01:27,730 --> 00:01:30,810 that is going to have multiple dimension table levels, 37 00:01:30,810 --> 00:01:32,083 as seen here. 38 00:01:32,930 --> 00:01:35,557 And then, we talked a little bit about high cardinality, 39 00:01:35,557 --> 00:01:38,050 and that was just little repetition. 40 00:01:38,050 --> 00:01:40,310 So, it is very normalized. 41 00:01:40,310 --> 00:01:42,740 So there's not going to be a lot of copies of data, there's 42 00:01:42,740 --> 00:01:46,260 not going to be a lot of repetition, that's snowflake. 43 00:01:46,260 --> 00:01:49,460 So this is going to be much better for complex queries, 44 00:01:49,460 --> 00:01:52,510 that's where you would use a snowflake schema. 45 00:01:52,510 --> 00:01:55,440 And snowflake schemas are going to take less storage space 46 00:01:55,440 --> 00:01:59,123 because again, there's not tons of copies of the data. 47 00:02:00,210 --> 00:02:03,250 We also talked about fact tables and fact table grains. 48 00:02:03,250 --> 00:02:06,620 If you remember, fact tables contain numeric data. 49 00:02:06,620 --> 00:02:08,970 So think profits, product sales, 50 00:02:08,970 --> 00:02:10,630 registers, things like that. 51 00:02:10,630 --> 00:02:12,230 Countable items. 52 00:02:12,230 --> 00:02:14,410 Each row represents a single event. 
53 00:02:14,410 --> 00:02:15,920 So that could be a purchase, 54 00:02:15,920 --> 00:02:19,083 or register receipt, something like that. 55 00:02:20,030 --> 00:02:22,930 And those single events are going to give us data 56 00:02:22,930 --> 00:02:25,840 that we can measure to give us insights. 57 00:02:25,840 --> 00:02:27,570 We talked about fact table grains, 58 00:02:27,570 --> 00:02:29,800 and the grain was just the level of detail 59 00:02:29,800 --> 00:02:31,610 that the fact table contains. 60 00:02:31,610 --> 00:02:33,170 We talked about how important it was 61 00:02:33,170 --> 00:02:35,320 to adjust the grain appropriately, 62 00:02:35,320 --> 00:02:38,090 so that you have a good fit for the queries you're running 63 00:02:38,090 --> 00:02:41,053 and the data that exists in your tables. 64 00:02:42,330 --> 00:02:44,450 Alright, we also talked about that relationship 65 00:02:44,450 --> 00:02:46,470 between fact and dimension tables. 66 00:02:46,470 --> 00:02:48,375 In fact, they do have a relationship 67 00:02:48,375 --> 00:02:51,570 and that's defined by your keys. 68 00:02:51,570 --> 00:02:54,710 We have primary keys and foreign keys. 69 00:02:54,710 --> 00:02:57,370 So primary keys are the unique data column 70 00:02:57,370 --> 00:02:59,560 used to define your relationships. 71 00:02:59,560 --> 00:03:00,930 And then your foreign keys, 72 00:03:00,930 --> 00:03:03,433 that's going to link 2 different tables together. 73 00:03:05,730 --> 00:03:08,120 We also talked about external tables. 74 00:03:08,120 --> 00:03:09,900 External tables are just tables 75 00:03:09,900 --> 00:03:12,730 that live outside of your database. 76 00:03:12,730 --> 00:03:15,540 And it's helpful to be able to connect to those 77 00:03:15,540 --> 00:03:17,560 when you need to access data, 78 00:03:17,560 --> 00:03:20,560 but you don't want to copy the entire data set.
79 00:03:20,560 --> 00:03:24,820 So we looked at some scripts to give us a fast ad hoc way 80 00:03:24,820 --> 00:03:29,503 to access data outside the bounds of our databases. 81 00:03:31,150 --> 00:03:33,320 And then, we talked about metastores. 82 00:03:33,320 --> 00:03:37,110 And if you remember, metastores are just databases 83 00:03:37,110 --> 00:03:39,820 that hold metadata about our data. 84 00:03:39,820 --> 00:03:42,030 So think paths and formats. 85 00:03:42,030 --> 00:03:45,590 And we talked about Databricks and Synapse Spark Pools, 86 00:03:45,590 --> 00:03:49,140 creating a metadata database, 87 00:03:49,140 --> 00:03:52,170 but an individual one for each workspace. 88 00:03:52,170 --> 00:03:54,670 And when you want to access multiple workspaces, 89 00:03:54,670 --> 00:03:56,260 the metadata within that, 90 00:03:56,260 --> 00:03:59,460 you can create a metastore and you can access 91 00:03:59,460 --> 00:04:01,852 and connect all of that individual metadata 92 00:04:01,852 --> 00:04:05,483 into 1 metastore that you can then work with. 93 00:04:06,950 --> 00:04:09,790 And then we talked about the what, when, where, and why 94 00:04:09,790 --> 00:04:12,617 of maintaining metadata. 95 00:04:12,617 --> 00:04:16,690 And we talked about copy activities in Data Factory 96 00:04:16,690 --> 00:04:21,090 and Azure Synapse Analytics. And being able to pull metadata 97 00:04:21,090 --> 00:04:25,500 on customer-specified metadata, language, disposition, 98 00:04:25,500 --> 00:04:27,570 type, encoding, cache control, 99 00:04:27,570 --> 00:04:29,690 and pulling that data in when we use 100 00:04:29,690 --> 00:04:34,150 our copy data activity in Data Factory or Synapse. 101 00:04:34,150 --> 00:04:36,030 Remember, you need to use binary. 102 00:04:36,030 --> 00:04:39,700 And remember that we use it because we get continuity. 103 00:04:39,700 --> 00:04:41,020 We get more information, 104 00:04:41,020 --> 00:04:43,490 so we can kind of see the trail of what's happening.
105 00:04:43,490 --> 00:04:45,833 So that's a very helpful thing as well. 106 00:04:47,120 --> 00:04:49,310 And then finally, in summary, 107 00:04:49,310 --> 00:04:51,940 so that takes us to the end of all of the lessons 108 00:04:51,940 --> 00:04:54,030 that we've looked through in this section. 109 00:04:54,030 --> 00:04:56,770 This was a much more conceptual section, 110 00:04:56,770 --> 00:04:59,960 but still keep a focus on the services that you would use 111 00:04:59,960 --> 00:05:01,600 for those concepts. 112 00:05:01,600 --> 00:05:03,850 So make sure that you're thinking about that. 113 00:05:04,720 --> 00:05:06,450 Also that database material 114 00:05:06,450 --> 00:05:10,070 when we talk about snowflake and star schemas, 115 00:05:10,070 --> 00:05:12,280 that should be largely a review. 116 00:05:12,280 --> 00:05:14,170 If it's not, you might want to go back in 117 00:05:14,170 --> 00:05:16,380 and just spend a little bit of time digging deeper 118 00:05:16,380 --> 00:05:20,040 into snowflake and star schemas in general. 119 00:05:20,040 --> 00:05:21,350 Then that would also hold true 120 00:05:21,350 --> 00:05:24,420 for primary and foreign keys. 121 00:05:24,420 --> 00:05:25,620 You don't need to know everything, 122 00:05:25,620 --> 00:05:28,120 but you do need to know the basics. 123 00:05:28,120 --> 00:05:29,590 Don't forget the labs. 124 00:05:29,590 --> 00:05:32,200 You are going to need those for the exam 125 00:05:32,200 --> 00:05:34,390 and for your career. So don't forget about the labs. 126 00:05:34,390 --> 00:05:36,500 Lastly, if you're enjoying this course, 127 00:05:36,500 --> 00:05:39,490 hey, do me a huge favor and smash that thumbs up button 128 00:05:39,490 --> 00:05:40,970 as you go through the lessons. 129 00:05:40,970 --> 00:05:42,068 It means the world to me, 130 00:05:42,068 --> 00:05:45,010 I'm super excited we have finished this section.
131 00:05:45,010 --> 00:05:46,390 We are getting very close, 132 00:05:46,390 --> 00:05:48,860 we're in the last quarter of this course. 133 00:05:48,860 --> 00:05:51,640 I'm getting excited for you as you near completion, 134 00:05:51,640 --> 00:05:53,570 and get ready to take your exam, 135 00:05:53,570 --> 00:05:55,196 and grab your certification. 136 00:05:55,196 --> 00:05:59,090 Alright, with that, let's jump on to the next section. 137 00:05:59,090 --> 00:06:00,040 I'll see you there.