1
00:00:00,640 --> 00:00:01,690
Hey, what's up, Gurus?

2
00:00:01,690 --> 00:00:04,750
In this lesson, we are going to take a couple of minutes,

3
00:00:04,750 --> 00:00:07,740
and we're going to talk about Data Factory Flow.

4
00:00:07,740 --> 00:00:09,770
So let's get started.

5
00:00:09,770 --> 00:00:12,180
So in the lesson, we are going to discuss

6
00:00:12,180 --> 00:00:14,810
what Data Factory Flow is.

7
00:00:14,810 --> 00:00:18,490
And then we're going to talk about why it's useful.

8
00:00:18,490 --> 00:00:20,760
We'll finish up by taking a look in the portal,

9
00:00:20,760 --> 00:00:24,493
and I'll actually show you how Data Factory Flow works.

10
00:00:27,210 --> 00:00:29,910
So at the beginning, Data Factory Flow

11
00:00:29,910 --> 00:00:33,564
is just a visual way to do data transformation

12
00:00:33,564 --> 00:00:36,720
in Azure Data Factory.

13
00:00:36,720 --> 00:00:40,380
So this is a visual no-code solution

14
00:00:40,380 --> 00:00:42,989
that allows you to do some basic transformations

15
00:00:42,989 --> 00:00:46,313
in Data Factory, which is very helpful.

16
00:00:47,270 --> 00:00:51,058
Now, basically what you do is, you go into Data Factory,

17
00:00:51,058 --> 00:00:54,480
and you can create Data Factory Flows,

18
00:00:54,480 --> 00:00:58,830
which are then added into your Data Factory pipelines

19
00:00:58,830 --> 00:01:00,423
as an additional step.

20
00:01:02,010 --> 00:01:04,610
So why would we want to use a Data Factory Flow,

21
00:01:04,610 --> 00:01:08,120
and why would we not want to use a Data Factory Flow?

22
00:01:08,120 --> 00:01:11,040
Well, the reason we want to use it is because,

23
00:01:11,040 --> 00:01:13,860
hey, it's faster, and it doesn't require code.

24
00:01:13,860 --> 00:01:15,710
If we're already in Data Factory,

25
00:01:15,710 --> 00:01:18,380
and we want to do some simple transformations,

26
00:01:18,380 --> 00:01:20,980
simple being the important word, hey,

27
00:01:20,980 --> 00:01:23,820
Data Factory Flow can be a fantastic solution,

28
00:01:23,820 --> 00:01:27,190
because we can just very quickly visually draw out

29
00:01:27,190 --> 00:01:28,780
what we want to have happen,

30
00:01:28,780 --> 00:01:32,333
and then implement that directly into our pipelines.

31
00:01:33,650 --> 00:01:37,740
So think Databricks Light when we talk about this.

32
00:01:37,740 --> 00:01:39,140
Now, why we wouldn't want it

33
00:01:39,140 --> 00:01:41,850
is if we have a lot of complexity.

34
00:01:41,850 --> 00:01:44,470
If you have a lot of complexity in your transformations,

35
00:01:44,470 --> 00:01:46,010
Data Factory Flow is probably

36
00:01:46,010 --> 00:01:48,670
not going to be the best solution.

37
00:01:48,670 --> 00:01:50,849
Or, if you need something lighter still,

38
00:01:50,849 --> 00:01:52,470
let's say you're pulling your data

39
00:01:52,470 --> 00:01:56,530
into a visualization solution like Power BI,

40
00:01:56,530 --> 00:01:58,980
you might want to do some transformations in Power BI,

41
00:01:58,980 --> 00:02:01,560
some very simple transformations.

42
00:02:01,560 --> 00:02:03,850
So it really comes down to what you're doing.

43
00:02:03,850 --> 00:02:06,070
If you're already using Data Factory,

44
00:02:06,070 --> 00:02:08,520
and you need some basic transformations,

45
00:02:08,520 --> 00:02:11,300
Data Factory Flow is a fantastic solution.

46
00:02:11,300 --> 00:02:14,190
If you have complex transformation needs

47
00:02:14,190 --> 00:02:17,170
or you're using an additional service like Power BI,

48
00:02:17,170 --> 00:02:18,560
you might want to evaluate

49
00:02:18,560 --> 00:02:21,210
and see which solution would be the best one for you.

50
00:02:23,050 --> 00:02:26,120
So that is the basics of Data Factory Flow.

51
00:02:26,120 --> 00:02:27,440
Let's jump on over to the portal

52
00:02:27,440 --> 00:02:29,563
and take a look at it in action.

53
00:02:31,410 --> 00:02:34,970
So here we find ourselves in our Data Factory studio,

54
00:02:34,970 --> 00:02:36,050
and what we're going to do,

55
00:02:36,050 --> 00:02:38,620
is we're going to choose Author,

56
00:02:38,620 --> 00:02:41,500
and I'm just going to create a real quick data flow for you.

57
00:02:41,500 --> 00:02:45,080
So we'll come over here, click on the 3 dots,

58
00:02:45,080 --> 00:02:47,560
do a new Data Factory Flow.

59
00:02:47,560 --> 00:02:48,680
Now, at the beginning,

60
00:02:48,680 --> 00:02:51,440
there's 3 basic things that you want to see.

61
00:02:51,440 --> 00:02:54,940
So, the first is our top bar for this.

62
00:02:54,940 --> 00:02:57,440
Then we have our graph or our pane,

63
00:02:57,440 --> 00:02:59,700
where all the work is going to take place.

64
00:02:59,700 --> 00:03:02,260
And then down below that, we have our configuration area,

65
00:03:02,260 --> 00:03:04,740
where we can configure the different components

66
00:03:04,740 --> 00:03:06,913
that we're dragging into Data Flow.

67
00:03:07,820 --> 00:03:10,490
So let's go ahead and start by adding a source.

68
00:03:10,490 --> 00:03:12,590
So I'm just going to click on the Add Source.

69
00:03:12,590 --> 00:03:16,140
It's going to pull in our first source box,

70
00:03:16,140 --> 00:03:18,760
and I'm going to come down to my configuration panel,

71
00:03:18,760 --> 00:03:21,030
and I'm just going to choose the components.

72
00:03:21,030 --> 00:03:22,110
Now, for this,

73
00:03:22,110 --> 00:03:24,080
I'm not going to go into all the configuration,

74
00:03:24,080 --> 00:03:26,800
because that's way more detailed than we need,

75
00:03:26,800 --> 00:03:27,633
but we're going to go ahead

76
00:03:27,633 --> 00:03:30,980
and just choose a sample dataset that I had created.

77
00:03:30,980 --> 00:03:34,560
Choose source2 by clicking another Add Source.

78
00:03:34,560 --> 00:03:36,730
And we're going to create a second source,

79
00:03:36,730 --> 00:03:39,380
and we could continue that process on as much as we wanted.

80
00:03:39,380 --> 00:03:42,710
And you can see that I can drag around my canvas

81
00:03:42,710 --> 00:03:44,863
and kind of resize things as needed.

82
00:03:46,090 --> 00:03:48,010
So now let's say that we have our 2 sources,

83
00:03:48,010 --> 00:03:49,860
and the very first thing that we want to do

84
00:03:49,860 --> 00:03:53,440
is we want to join those 2 sources.

85
00:03:53,440 --> 00:03:56,010
So I can click on this little plus button here,

86
00:03:56,010 --> 00:03:58,360
and I can look at all of the different types

87
00:03:58,360 --> 00:04:01,030
of transformations that I could do.

88
00:04:01,030 --> 00:04:04,180
So let's start off by creating a join.

89
00:04:04,180 --> 00:04:05,720
So right now, we're going to join,

90
00:04:05,720 --> 00:04:08,160
and it's only connected to 1 source.

91
00:04:08,160 --> 00:04:10,230
So that's our left stream,

92
00:04:10,230 --> 00:04:12,640
and I can also choose our right stream,

93
00:04:12,640 --> 00:04:13,970
and you can see that it just creates

94
00:04:13,970 --> 00:04:16,140
that nice little bar for me,

95
00:04:16,140 --> 00:04:18,300
and I can choose different types of joins,

96
00:04:18,300 --> 00:04:22,830
and I can create join conditions however I would like to.

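For anyone following along without the portal, the join step shown here can be pictured with a small Python sketch. This is only an illustration of what an inner join on a shared key does conceptually, not Data Factory code and not how the service implements it. The function name `inner_join`, the column names, and the sample rows are all hypothetical stand-ins for the two sources in the demo.

```python
def inner_join(left_rows, right_rows, key):
    """Join two lists of dicts on a shared key column (inner join)."""
    # Index the right stream by key so each left row can look up its matches.
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        # Left rows with no match on the right are dropped (inner join).
        for right in right_index.get(left[key], []):
            # Merge the two rows; right-hand columns win on name clashes.
            joined.append({**left, **right})
    return joined


# Hypothetical sample data standing in for source1 (left) and source2 (right).
source1 = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
source2 = [
    {"customer_id": 1, "order_total": 40},
    {"customer_id": 1, "order_total": 15},
    {"customer_id": 3, "order_total": 99},
]

joined_rows = inner_join(source1, source2, "customer_id")
```

Only customer 1 appears in both sample sources, so `joined_rows` holds one row per matching order for that customer. Picking a different join type in the portal would change which unmatched rows are kept.
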
97
00:04:22,830 --> 00:04:24,502
I can also come in to Optimize

98
00:04:24,502 --> 00:04:26,750
if I want to go more detailed,

99
00:04:26,750 --> 00:04:29,900
and I can create different types of partitioning

100
00:04:29,900 --> 00:04:31,620
for the different steps.

101
00:04:31,620 --> 00:04:34,800
So you can get pretty detailed with what you want to do.

102
00:04:34,800 --> 00:04:36,130
But essentially, all you're going to do

103
00:04:36,130 --> 00:04:37,955
is click on that plus button,

104
00:04:37,955 --> 00:04:39,238
and you're going to keep going.

105
00:04:39,238 --> 00:04:41,280
So I could create an aggregate.

106
00:04:41,280 --> 00:04:43,610
And I could come down here and I could define the columns

107
00:04:43,610 --> 00:04:45,820
that I want to aggregate together.

108
00:04:45,820 --> 00:04:47,760
So I could do all of my transformations,

109
00:04:47,760 --> 00:04:50,750
and then when I'm done, I can take that data flow,

110
00:04:50,750 --> 00:04:55,500
and I can introduce it into my Data Factory pipeline.

111
00:04:55,500 --> 00:04:57,990
So it's really a very handy toolset

112
00:04:57,990 --> 00:05:00,290
that allows you to do quite a bit.

113
00:05:00,290 --> 00:05:02,010
So let's finish up and kind of review

114
00:05:02,010 --> 00:05:03,690
what we have talked about.

115
00:05:03,690 --> 00:05:05,690
So what is Data Factory Flow?

116
00:05:05,690 --> 00:05:08,280
Well, it's a code-free transformational tool

117
00:05:08,280 --> 00:05:11,160
that we can use in Data Factory.

118
00:05:11,160 --> 00:05:14,020
We use it because it is code free.

119
00:05:14,020 --> 00:05:16,560
It's already integrated in with Data Factory

120
00:05:16,560 --> 00:05:20,300
and allows us to directly connect our flows into pipelines.

121
00:05:20,300 --> 00:05:23,170
But there are a few limitations in what you can do,

122
00:05:23,170 --> 00:05:26,150
so if you have something very complex, again,

123
00:05:26,150 --> 00:05:28,480
you may not want Data Factory Flow.

124
00:05:28,480 --> 00:05:30,330
Kind of depends on what you're doing.

125
00:05:31,260 --> 00:05:33,640
So a very useful example would be,

126
00:05:33,640 --> 00:05:36,800
if we have duplicate data, missing data, things like that.

127
00:05:36,800 --> 00:05:39,060
So, simple transformations.

128
00:05:39,060 --> 00:05:41,320
You want to take a look at Data Factory Flow.

129
00:05:41,320 --> 00:05:43,390
So just make sure you have those concepts in mind.

130
00:05:43,390 --> 00:05:45,560
You understand what Data Factory Flow is

131
00:05:45,560 --> 00:05:47,040
and the basics of how easy it is

132
00:05:47,040 --> 00:05:49,850
to create a flow in Data Factory.

133
00:05:49,850 --> 00:05:52,450
That's it for this lesson. I'll see you in the next.
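
The "duplicate data, missing data" example from the recap can be made concrete with a short Python sketch. It shows the kind of simple cleanup the lesson has in mind, under stated assumptions: the helper names `drop_duplicates` and `fill_missing`, the columns, and the sample rows are hypothetical, and a real data flow would express the same idea with visual transformations rather than code.

```python
def drop_duplicates(rows, key):
    """Keep the first row seen for each value of the key column."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def fill_missing(rows, column, default):
    """Replace missing (None or absent) values in one column with a default."""
    return [
        {**row, column: default} if row.get(column) is None else row
        for row in rows
    ]


# Hypothetical raw input with one duplicate row and one missing value.
raw = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": None},
    {"id": 1, "city": "Oslo"},
]

cleaned = fill_missing(drop_duplicates(raw, "id"), "city", "unknown")
```

Here `cleaned` ends up with two rows: the duplicate `id` 1 row is removed, and the missing city for `id` 2 becomes `"unknown"`. Cleanup at this level of simplicity is the case the lesson recommends a data flow for.
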