1 00:00:00,650 --> 00:00:01,730 Hey, what's up Gurus. 2 00:00:01,730 --> 00:00:04,180 In this lesson, we are going to be talking about 3 00:00:04,180 --> 00:00:07,730 implementing version control for pipeline artifacts. 4 00:00:07,730 --> 00:00:08,770 And in order to do that, 5 00:00:08,770 --> 00:00:10,880 we need to talk about GitHub. 6 00:00:10,880 --> 00:00:13,310 And so I'm going to provide you a very basic introduction 7 00:00:13,310 --> 00:00:16,370 to GitHub and CI/CD. 8 00:00:16,370 --> 00:00:18,270 However, CI/CD and GitHub, 9 00:00:18,270 --> 00:00:20,950 we're going to talk about those at the 20,000 foot level, 10 00:00:20,950 --> 00:00:24,600 because they're not really on the DP-203. 11 00:00:24,600 --> 00:00:25,830 I mean, they kind of are 12 00:00:25,830 --> 00:00:28,010 because we're talking about version control, 13 00:00:28,010 --> 00:00:29,140 but just a little bit. 14 00:00:29,140 --> 00:00:32,430 So we're not going to go very deep into quality assurance, 15 00:00:32,430 --> 00:00:34,030 CI/CD, GitHub. 16 00:00:34,030 --> 00:00:35,610 We're just going to talk in broad strokes 17 00:00:35,610 --> 00:00:38,170 about what they actually are. 18 00:00:38,170 --> 00:00:39,950 Then we're going to talk about source control 19 00:00:39,950 --> 00:00:41,130 and Azure Data Factory, 20 00:00:41,130 --> 00:00:43,620 and I'm going to talk to you about using GitHub 21 00:00:43,620 --> 00:00:47,070 in Azure Data Factory, and the best way to do that 22 00:00:47,070 --> 00:00:48,500 is to show it to you in the portal. 23 00:00:48,500 --> 00:00:49,600 So we'll hop into the portal 24 00:00:49,600 --> 00:00:52,590 and take a look at how everything functions. 25 00:00:52,590 --> 00:00:54,973 With that, let's jump into the lesson. 26 00:00:55,840 --> 00:00:58,360 So, start off with CI/CD. 27 00:00:58,360 --> 00:01:03,350 Why do we care about CI/CD, and what even is CI/CD? 
28 00:01:03,350 --> 00:01:05,430 The thought process behind CI/CD 29 00:01:05,430 --> 00:01:08,780 is to take small changes and continually push them 30 00:01:08,780 --> 00:01:12,180 through the system rather than wait, store them up, 31 00:01:12,180 --> 00:01:15,510 and then do one massive change all at once. 32 00:01:15,510 --> 00:01:18,963 That's the thought process behind CI/CD. 33 00:01:19,870 --> 00:01:22,850 So CI/CD is important because it helps you to decide 34 00:01:22,850 --> 00:01:25,260 how code is built, tested, and released. 35 00:01:25,260 --> 00:01:27,030 It allows you to reduce downtime, 36 00:01:27,030 --> 00:01:28,370 increases your efficiency, 37 00:01:28,370 --> 00:01:31,860 and allows for parallel development. 38 00:01:31,860 --> 00:01:33,290 How does it do that? 39 00:01:33,290 --> 00:01:35,820 Well, through branches and commits. 40 00:01:35,820 --> 00:01:36,750 And of course, again, 41 00:01:36,750 --> 00:01:38,830 we're talking about this in very broad strokes, 42 00:01:38,830 --> 00:01:41,260 but at the base level, branches and commits 43 00:01:41,260 --> 00:01:43,880 are one way that CI/CD helps you 44 00:01:43,880 --> 00:01:45,430 to do parallel development 45 00:01:45,430 --> 00:01:48,720 and increase your efficiency, et cetera. 46 00:01:48,720 --> 00:01:52,340 So let's take a look at this picture over here on the right. 47 00:01:52,340 --> 00:01:55,480 And what we can see here is that we have a master branch, 48 00:01:55,480 --> 00:01:57,780 as it's called. That's the blue circles. 49 00:01:57,780 --> 00:01:59,700 And then we have two other branches. 50 00:01:59,700 --> 00:02:01,300 We have an orange branch, 51 00:02:01,300 --> 00:02:04,800 and then we have that green branch down there at the bottom. 52 00:02:04,800 --> 00:02:08,170 So let's say that we are implementing some new code. 
53 00:02:08,170 --> 00:02:10,310 And we implement the new code, 54 00:02:10,310 --> 00:02:11,920 and as soon as we've implemented the new code, 55 00:02:11,920 --> 00:02:14,450 someone says, hey, you know what we really need? 56 00:02:14,450 --> 00:02:16,450 It'd be really great if we had some orange cats 57 00:02:16,450 --> 00:02:18,680 that just popped up on the screen 58 00:02:18,680 --> 00:02:21,410 when you opened up your browser. 59 00:02:21,410 --> 00:02:22,630 Okay. 60 00:02:22,630 --> 00:02:25,910 So they set you on the task of getting those orange cats 61 00:02:25,910 --> 00:02:27,400 to pop up on the browser. 62 00:02:27,400 --> 00:02:28,800 That may take a while. 63 00:02:28,800 --> 00:02:32,160 So that represents our orange circles up there at the top. 64 00:02:32,160 --> 00:02:34,620 That's you working through the changes 65 00:02:34,620 --> 00:02:36,950 that need to happen to the code 66 00:02:36,950 --> 00:02:39,750 in order to get those orange cats to pop up. 67 00:02:39,750 --> 00:02:41,860 Well, while that's going on, that blue branch, 68 00:02:41,860 --> 00:02:42,830 hey, it's still going. 69 00:02:42,830 --> 00:02:45,430 And there's still some basic things that you need to do 70 00:02:45,430 --> 00:02:49,170 in order to keep the code base moving forward 71 00:02:49,170 --> 00:02:52,380 and error-free, et cetera, et cetera. 72 00:02:52,380 --> 00:02:55,540 And so each one of those orange bubbles or blue bubbles 73 00:02:55,540 --> 00:02:57,730 represents basically a save, or commit. 74 00:02:57,730 --> 00:03:00,520 So we can work on our orange cats. 75 00:03:00,520 --> 00:03:03,130 We can continue to iteratively work on that 76 00:03:03,130 --> 00:03:03,990 until we're done, 77 00:03:03,990 --> 00:03:05,000 those orange bubbles. 78 00:03:05,000 --> 00:03:06,240 And once we're done, 79 00:03:06,240 --> 00:03:08,690 we can open a pull request. 80 00:03:08,690 --> 00:03:11,400 Now that is this line right here. 
81 00:03:11,400 --> 00:03:13,740 And that is basically us saying hey, we're done. 82 00:03:13,740 --> 00:03:17,060 We want to pull our work back into another branch. 83 00:03:17,060 --> 00:03:17,893 So in this case, 84 00:03:17,893 --> 00:03:19,320 we want to take our work and pull it 85 00:03:19,320 --> 00:03:21,140 into this master branch. 86 00:03:21,140 --> 00:03:23,060 And we want to combine those two together 87 00:03:23,060 --> 00:03:25,050 once we're sure that everything is safe 88 00:03:25,050 --> 00:03:27,580 and that our cats aren't going to break 89 00:03:27,580 --> 00:03:29,410 the rest of the application. 90 00:03:29,410 --> 00:03:33,870 So this is essentially how branches and commits work in CI/CD, 91 00:03:33,870 --> 00:03:36,570 and GitHub hosts the repository that we use 92 00:03:36,570 --> 00:03:37,800 to make that happen. 93 00:03:37,800 --> 00:03:39,180 Again, keep in mind, 94 00:03:39,180 --> 00:03:43,673 this is an extremely high-level look at CI/CD and GitHub. 95 00:03:44,540 --> 00:03:46,290 The goal is just to get you to understand 96 00:03:46,290 --> 00:03:48,650 a few of the basics of what's happening here, 97 00:03:48,650 --> 00:03:51,171 and then the real work is actually going to happen now, 98 00:03:51,171 --> 00:03:55,330 as we start to talk about source control and Data Factory. 99 00:03:55,330 --> 00:03:57,366 So where do we use this? 100 00:03:57,366 --> 00:04:00,230 Well, for the DP-203, the answer is Data Factory. 101 00:04:00,230 --> 00:04:02,700 Of course, there are other places we can use GitHub 102 00:04:02,700 --> 00:04:04,500 and we can use CI/CD, 103 00:04:04,500 --> 00:04:05,990 but for the DP-203, 104 00:04:05,990 --> 00:04:08,080 you're going to use it in Data Factory. 105 00:04:08,080 --> 00:04:11,750 The advantages, again, are source control, performance, 106 00:04:11,750 --> 00:04:13,540 collaboration, and integration. 
107 00:04:13,540 --> 00:04:17,800 Those are really the reasons that we want to do CI/CD, 108 00:04:17,800 --> 00:04:19,690 and we want to use source control. 109 00:04:19,690 --> 00:04:22,150 Source control being those branches, right. 110 00:04:22,150 --> 00:04:23,660 We're going to have the master branch, 111 00:04:23,660 --> 00:04:25,760 and then we're going to build additional branches 112 00:04:25,760 --> 00:04:28,510 in order to work on multiple projects 113 00:04:28,510 --> 00:04:30,830 in parallel at the same time, 114 00:04:30,830 --> 00:04:33,000 and then pull everything into that master 115 00:04:33,000 --> 00:04:36,080 once we're sure that everything is safe. 116 00:04:36,080 --> 00:04:40,150 So let's jump in and take a look at this in Data Factory. 117 00:04:40,150 --> 00:04:42,530 So I'm going to switch over here to the portal. 118 00:04:42,530 --> 00:04:43,740 The first thing we need to see 119 00:04:43,740 --> 00:04:46,070 is under Creating a Data Factory. 120 00:04:46,070 --> 00:04:48,680 So if I went in to start a brand new data factory, 121 00:04:48,680 --> 00:04:52,030 I have an option to configure a GitHub repository. 122 00:04:52,030 --> 00:04:54,400 So I can come in here and I can choose 123 00:04:54,400 --> 00:04:58,460 and put all of my information in for my repository. 124 00:04:58,460 --> 00:04:59,680 So I click on GitHub, 125 00:04:59,680 --> 00:05:01,080 I put my repo name in, 126 00:05:01,080 --> 00:05:03,250 I put the branch in that I'm working on, 127 00:05:03,250 --> 00:05:06,210 all of the information as I create my Data Factory, 128 00:05:06,210 --> 00:05:10,240 and then I can use that repository in Data Factory. 129 00:05:10,240 --> 00:05:13,393 So now let's jump over and look in Data Factory Studio 130 00:05:13,393 --> 00:05:15,300 by clicking here. 
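Note: the repository details entered on that creation screen (GitHub account, repo name, branch) can also be written down as configuration rather than clicked through. The following is only a minimal sketch in Python, assuming the property names used by the `repoConfiguration` block of a Data Factory ARM template (`FactoryGitHubConfiguration`, `accountName`, `repositoryName`, `collaborationBranch`, `rootFolder`); check them against the current template reference before relying on them. The helper name `github_repo_configuration` and the account and repository values are hypothetical and are not part of the lesson.

```python
import json


def github_repo_configuration(account_name, repository_name,
                              collaboration_branch="master", root_folder="/"):
    """Build a dict shaped like a Data Factory repoConfiguration block.

    Hypothetical helper for illustration; the keys are assumed to match
    the ARM template schema for a GitHub-backed data factory.
    """
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,                  # GitHub account or organization
        "repositoryName": repository_name,            # the repo name typed into the portal
        "collaborationBranch": collaboration_branch,  # the branch entered at creation time
        "rootFolder": root_folder,                    # folder in the repo that holds the factory JSON
    }


# Placeholder values, not taken from the lesson.
config = github_repo_configuration("example-account", "example-adf-repo")
print(json.dumps(config, indent=2))
```

The sketch only builds and prints the settings; it does not call Azure, so it is safe to run anywhere.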
131 00:05:15,300 --> 00:05:17,340 And first, we're going to go down to Manage, 132 00:05:17,340 --> 00:05:20,620 and we're going to see here under Git Configuration 133 00:05:20,620 --> 00:05:23,230 that this starts off in the master branch. 134 00:05:23,230 --> 00:05:25,550 So if I do work in Data Factory right now, 135 00:05:25,550 --> 00:05:28,060 it's going to save it to that master branch. 136 00:05:28,060 --> 00:05:30,950 But if you remember, I am working on orange cats, 137 00:05:30,950 --> 00:05:32,500 so I can click on this, 138 00:05:32,500 --> 00:05:35,700 and I can choose my orange cat branch. 139 00:05:35,700 --> 00:05:36,533 Hey, there we go. 140 00:05:36,533 --> 00:05:38,110 Let's go ahead and discard changes, 141 00:05:38,110 --> 00:05:40,910 and now we're in our orange cat branch. 142 00:05:40,910 --> 00:05:42,220 So anything that I do now 143 00:05:42,220 --> 00:05:45,180 is going to go to that orange cat branch. 144 00:05:45,180 --> 00:05:48,647 So I can come in here and I can create a new pipeline. 145 00:05:48,647 --> 00:05:49,480 So let's just start. 146 00:05:49,480 --> 00:05:50,313 We're not going to build the whole thing, 147 00:05:50,313 --> 00:05:52,840 but let's just say that we were working on our pipeline here 148 00:05:52,840 --> 00:05:54,590 under this orange cat branch. 149 00:05:54,590 --> 00:05:56,170 And we finished all of our work. 150 00:05:56,170 --> 00:05:57,270 We've saved everything, 151 00:05:57,270 --> 00:06:00,920 and we are ready to pull it into our master branch. 152 00:06:00,920 --> 00:06:02,380 Well, I can click on that 153 00:06:02,380 --> 00:06:04,550 and I can create a pull request, 154 00:06:04,550 --> 00:06:06,250 which we talked about, 155 00:06:06,250 --> 00:06:10,810 and then I can choose to pull into our master branch. 
156 00:06:10,810 --> 00:06:14,090 So here's my orange cat moving into the master branch 157 00:06:14,090 --> 00:06:16,260 and it'll look and compare any changes 158 00:06:16,260 --> 00:06:19,060 between my version and the master version. 159 00:06:19,060 --> 00:06:21,640 And we can talk about what kinds of things 160 00:06:21,640 --> 00:06:24,430 we would need to do before we were comfortable 161 00:06:24,430 --> 00:06:25,800 and ready to do that, 162 00:06:25,800 --> 00:06:29,590 understanding that there's not going to be any bad effects. 163 00:06:29,590 --> 00:06:31,940 Alright, so with that, I'm actually going to stop, 164 00:06:31,940 --> 00:06:34,120 jump back over into the lesson, 165 00:06:34,120 --> 00:06:36,500 and we are going to stop talking about this. 166 00:06:36,500 --> 00:06:38,580 There's tons of things that we could talk about 167 00:06:38,580 --> 00:06:40,403 with CI/CD and GitHub, 168 00:06:41,530 --> 00:06:43,840 but for the DP-203, which is our focus, 169 00:06:43,840 --> 00:06:46,840 we only need to understand those basics. 170 00:06:46,840 --> 00:06:48,150 So just remember that. 171 00:06:48,150 --> 00:06:51,230 So it's just the basics for CI/CD, 172 00:06:51,230 --> 00:06:53,270 source control being really important, 173 00:06:53,270 --> 00:06:56,030 which is why I spent so much time focusing on branches 174 00:06:56,030 --> 00:06:57,390 and how that works. 175 00:06:57,390 --> 00:06:59,800 And you need to understand the basic steps. 176 00:06:59,800 --> 00:07:00,810 And most importantly, 177 00:07:00,810 --> 00:07:02,860 you need to understand that everything lives 178 00:07:02,860 --> 00:07:06,670 in Data Factory, at least for the DP-203 179 00:07:06,670 --> 00:07:09,710 when we talk about CI/CD and version control. 180 00:07:09,710 --> 00:07:11,950 Alright, that is it for this lesson. 181 00:07:11,950 --> 00:07:13,553 I will see you in the next.
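Note: the branch, commit, and pull request flow from the orange cats example can be sketched in a few lines of Python. This is a toy model for illustration only, not how Git or GitHub is implemented: real Git stores a graph of commits identified by hashes, while here a branch is just a list of commit messages. The class name `ToyRepo`, the branch name `orange-cats`, and the commit messages are all made up for the example.

```python
class ToyRepo:
    """A toy stand-in for a repository: each branch is a list of saved commits."""

    def __init__(self):
        self.branches = {"master": ["initial commit"]}

    def create_branch(self, name, source="master"):
        # A new branch starts with a copy of the source branch's history.
        self.branches[name] = list(self.branches[source])

    def commit(self, branch, message):
        # Each commit is one "bubble" (a save) on that branch's line.
        self.branches[branch].append(message)

    def merge(self, source, target="master"):
        # Roughly what accepting a pull request does: bring the commits
        # that exist only on the source branch into the target branch.
        new_commits = [c for c in self.branches[source]
                       if c not in self.branches[target]]
        self.branches[target].extend(new_commits)
        return new_commits


repo = ToyRepo()
repo.create_branch("orange-cats")                      # start the orange branch
repo.commit("orange-cats", "add orange cat image")     # work continues on orange...
repo.commit("master", "fix typo on home page")         # ...while master keeps moving
repo.commit("orange-cats", "show cat when browser opens")
merged = repo.merge("orange-cats")                     # the pull request is accepted
print(merged)
print(repo.branches["master"])
```

After the merge, master holds its own commit plus the two orange cat commits, which is the point of the picture in the lesson: both lines of work progress in parallel and are combined only once the feature branch is ready.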