1 00:00:00,933 --> 00:00:02,330 All right, so in this lesson, 2 00:00:02,330 --> 00:00:04,100 we are going to talk about how to load 3 00:00:04,100 --> 00:00:07,100 a DataFrame with sensitive information. 4 00:00:07,100 --> 00:00:10,180 Now, this lesson is very much going to be a blending lesson. 5 00:00:10,180 --> 00:00:13,060 We're going to be combining topics on SAS tokens, 6 00:00:13,060 --> 00:00:15,240 Azure Key Vault, and we're going to be taking 7 00:00:15,240 --> 00:00:17,080 several trips to the portal. 8 00:00:17,080 --> 00:00:21,010 So if you find yourself questioning what a SAS token is 9 00:00:21,010 --> 00:00:23,350 or how to use Azure Key Vault, make sure to jump back 10 00:00:23,350 --> 00:00:26,410 into those lessons. But I'm hoping that, as we move forward, 11 00:00:26,410 --> 00:00:29,053 you're starting to see all of this come together. 12 00:00:29,930 --> 00:00:33,230 Next, we're going to take a look at some code in Databricks 13 00:00:33,230 --> 00:00:36,450 that you need in order to complete this task. 14 00:00:36,450 --> 00:00:38,360 It's time to go back to the Key Vault. 15 00:00:38,360 --> 00:00:40,430 Now, the very first step in this process 16 00:00:40,430 --> 00:00:44,960 is adding a Blob Storage and generating a SAS token. 17 00:00:44,960 --> 00:00:48,600 We need to do that in order to create this secure connection 18 00:00:48,600 --> 00:00:51,410 between Databricks and our Blob Storage. 19 00:00:51,410 --> 00:00:54,220 So let's actually go ahead and hop into the portal 20 00:00:54,220 --> 00:00:57,010 and take a look at how we do that. 21 00:00:57,010 --> 00:01:00,220 So here, we find ourselves in my storage account. 22 00:01:00,220 --> 00:01:04,430 And what we need to do is we need to generate a SAS token. 23 00:01:04,430 --> 00:01:08,010 So we're going to come down here to Shared Access Signature, 24 00:01:08,010 --> 00:01:12,230 click on that. 
And you can see here that I can create 25 00:01:12,230 --> 00:01:14,450 a whole bunch of different options to really specify 26 00:01:14,450 --> 00:01:16,290 what I want this token to do. 27 00:01:16,290 --> 00:01:20,200 So let's just go ahead and allow all resource types for now. 28 00:01:20,200 --> 00:01:22,100 And we'll leave the rest of this as is, 29 00:01:22,100 --> 00:01:24,810 and we're going to go ahead and generate our SAS token. 30 00:01:24,810 --> 00:01:27,732 Now if you're doing this for real, you're going to want 31 00:01:27,732 --> 00:01:29,490 to change your start and expiration time, 32 00:01:29,490 --> 00:01:32,020 because that is same day and not very long. 33 00:01:32,020 --> 00:01:35,230 So, let's go ahead and generate that SAS token string. 34 00:01:35,230 --> 00:01:38,270 So I've clicked on that, and now it has generated my string. 35 00:01:38,270 --> 00:01:40,900 And so if I scroll down, and again, let me point this out. 36 00:01:40,900 --> 00:01:44,760 So these resources are all going away. 37 00:01:44,760 --> 00:01:47,180 So it's not really that much of a security concern, 38 00:01:47,180 --> 00:01:50,150 but you would not want to be sharing this information. 39 00:01:50,150 --> 00:01:52,710 So it gives me my SAS information. 40 00:01:52,710 --> 00:01:55,510 So we'll come back to that here in just a few minutes. 41 00:01:55,510 --> 00:01:57,970 But that's the first step in the process. 42 00:01:57,970 --> 00:02:00,390 The second step is hopping into Key Vault 43 00:02:00,390 --> 00:02:04,110 and configuring a secret to store my SAS token, 44 00:02:04,110 --> 00:02:06,320 because it's got to go somewhere, and we talked a lot 45 00:02:06,320 --> 00:02:08,530 about not hardcoding, but instead, 46 00:02:08,530 --> 00:02:12,310 storing our keys in the Key Vault. 47 00:02:12,310 --> 00:02:14,470 So let me hop into my Key Vault, 48 00:02:14,470 --> 00:02:16,380 which is the second tab here. 
49 00:02:16,380 --> 00:02:19,180 So now we find ourselves in my Key Vault, 50 00:02:19,180 --> 00:02:22,010 and I have scrolled down and clicked on Secrets. 51 00:02:22,010 --> 00:02:25,630 So now, I can generate or import a secret. 52 00:02:25,630 --> 00:02:27,900 So let me go ahead and click on that. 53 00:02:27,900 --> 00:02:29,900 So let's go ahead and give this a name, 54 00:02:29,900 --> 00:02:34,669 and we'll just call it dp203storagesecret. 55 00:02:35,980 --> 00:02:38,490 And we want to paste our token in, 56 00:02:38,490 --> 00:02:41,570 and this is the token that we got from our storage account. 57 00:02:41,570 --> 00:02:43,860 So we'll go ahead and paste that in. 58 00:02:43,860 --> 00:02:48,560 And then let's go ahead and create that secret. 59 00:02:48,560 --> 00:02:51,090 All right, so now we have our storage secret, 60 00:02:51,090 --> 00:02:52,513 which is step 2. 61 00:02:53,420 --> 00:02:57,343 Then, we need to link our Azure Databricks to the Key Vault. 62 00:02:58,370 --> 00:03:01,150 And we do that by going into Databricks, 63 00:03:01,150 --> 00:03:03,670 under Create Secret Scope. 64 00:03:03,670 --> 00:03:06,520 And you can actually find that by going to the URL 65 00:03:06,520 --> 00:03:11,520 of your Databricks, and then #secrets/createScope. 66 00:03:12,110 --> 00:03:14,110 Or you can access it from the menu here. 67 00:03:14,110 --> 00:03:17,010 But I've opened this up, so let's go ahead and type in 68 00:03:17,010 --> 00:03:19,890 just dp203secret for our name, 69 00:03:19,890 --> 00:03:21,950 and then we need to link our Key Vault. 70 00:03:21,950 --> 00:03:24,910 And so to do that, we're going to come into our Key Vault, 71 00:03:24,910 --> 00:03:29,910 go down to Properties. We're going to grab our Vault URI, 72 00:03:30,070 --> 00:03:31,920 and then we're going to paste that here. 73 00:03:33,300 --> 00:03:36,470 And then we're going to grab our resource ID as well, 74 00:03:36,470 --> 00:03:40,610 and paste that in right here. 
75 00:03:40,610 --> 00:03:42,530 Go ahead and create that, 76 00:03:42,530 --> 00:03:46,973 and then that will configure our secret scope. 77 00:03:48,240 --> 00:03:50,010 Let's go ahead and click on that. 78 00:03:50,010 --> 00:03:53,970 All right, so we have now generated a SAS token, 79 00:03:53,970 --> 00:03:56,390 we have created a secret to store our SAS token 80 00:03:56,390 --> 00:03:58,440 in the Key Vault, and then we have linked 81 00:03:58,440 --> 00:04:01,000 our Databricks to our Key Vault. 82 00:04:01,000 --> 00:04:03,140 Now we're ready to go ahead 83 00:04:03,140 --> 00:04:08,140 and mount our storage container to our Databricks account. 84 00:04:08,380 --> 00:04:12,230 And this is the code that we will be using to do that. 85 00:04:12,230 --> 00:04:14,820 And so you can see, actually, here at the very top, 86 00:04:14,820 --> 00:04:17,140 we are adding the storage account, the container, 87 00:04:17,140 --> 00:04:19,530 and the reference for the secret 88 00:04:19,530 --> 00:04:21,300 in order to pass our SAS token. 89 00:04:21,300 --> 00:04:24,170 So this is the piece that's really important, 90 00:04:24,170 --> 00:04:25,380 up here at the top. 91 00:04:25,380 --> 00:04:26,950 Now, there are 2 different ways to do this. 92 00:04:26,950 --> 00:04:28,450 This is the first way. 93 00:04:28,450 --> 00:04:32,180 Now if we do this, anyone who has access to this workspace 94 00:04:32,180 --> 00:04:35,660 will have access to this data source. 95 00:04:35,660 --> 00:04:39,060 If we just want the data source to be passed or written 96 00:04:39,060 --> 00:04:42,260 directly using our SAS token, which means that not everyone 97 00:04:42,260 --> 00:04:44,200 in the workspace would have access, 98 00:04:44,200 --> 00:04:46,650 then we want to use this. 
99 00:04:46,650 --> 00:04:51,040 So this is a code snippet that allows us to use 100 00:04:51,040 --> 00:04:53,070 our SAS token to directly write, 101 00:04:53,070 --> 00:04:55,970 without actually mounting our data source. 102 00:04:55,970 --> 00:04:58,970 Very similar, you can see our container and our SAS token, 103 00:04:58,970 --> 00:05:00,800 and then the source and URI. 104 00:05:00,800 --> 00:05:02,260 So this is what we would use 105 00:05:02,260 --> 00:05:05,590 if we want to just directly write. 106 00:05:05,590 --> 00:05:08,270 All right, so let's go ahead and wrap all this up. 107 00:05:08,270 --> 00:05:11,580 It really comes down to the basic flow that you need 108 00:05:11,580 --> 00:05:15,100 in order to mount and write securely to a data source. 109 00:05:15,100 --> 00:05:17,050 First, we create the SAS token. 110 00:05:17,050 --> 00:05:19,690 Then we create a secret to store the SAS token. 111 00:05:19,690 --> 00:05:22,930 Then we link our Databricks to our Azure Key Vault. 112 00:05:22,930 --> 00:05:25,420 Then we mount the container using the Key Vault 113 00:05:25,420 --> 00:05:28,230 or alternatively, we just write directly 114 00:05:28,230 --> 00:05:31,990 using that SAS token with that second code snippet. 115 00:05:31,990 --> 00:05:33,300 And with that, we're all done. 116 00:05:33,300 --> 00:05:35,570 That's what we need to remember to load a DataFrame 117 00:05:35,570 --> 00:05:39,140 with sensitive information for the DP-203. 118 00:05:39,140 --> 00:05:41,240 All right, I'll see you in the next lesson.
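
The two on-screen snippets from the lesson are not captured in this transcript, so here is a minimal Python sketch of what the mount option and the direct-access option might look like in a Databricks notebook. It is not the lesson's actual code. The scope name dp203secret and the secret name dp203storagesecret come from the walkthrough; the storage account name, container name, mount point, and file path are hypothetical placeholders. The sketch assumes the legacy wasbs:// Blob Storage driver and a CSV source file, and it takes dbutils and spark as arguments so the functions can be read and exercised outside a notebook.

```python
def sas_conf_key(storage_account: str, container: str) -> str:
    """Spark config key that carries a container-level SAS token (wasbs driver)."""
    return f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"


def wasbs_url(storage_account: str, container: str, path: str = "") -> str:
    """Build the wasbs:// URL for a container, optionally with a file path."""
    base = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base


def mount_with_sas(dbutils, storage_account, container, mount_point, scope, key):
    """Option 1: mount the container. Anyone in the workspace can use the mount."""
    # The SAS token is read from the Key Vault-backed secret scope,
    # so it never appears in the notebook source.
    sas_token = dbutils.secrets.get(scope=scope, key=key)
    dbutils.fs.mount(
        source=wasbs_url(storage_account, container),
        mount_point=mount_point,
        extra_configs={sas_conf_key(storage_account, container): sas_token},
    )


def read_directly_with_sas(spark, dbutils, storage_account, container, path, scope, key):
    """Option 2: no mount. The token is set only on the current Spark session."""
    sas_token = dbutils.secrets.get(scope=scope, key=key)
    spark.conf.set(sas_conf_key(storage_account, container), sas_token)
    # Assumes a CSV file with a header row; swap the reader for other formats.
    return spark.read.csv(wasbs_url(storage_account, container, path), header=True)


# In a notebook (placeholder account, container, and path names):
# mount_with_sas(dbutils, "mystorageacct", "mycontainer", "/mnt/dp203",
#                scope="dp203secret", key="dp203storagesecret")
# df = read_directly_with_sas(spark, dbutils, "mystorageacct", "mycontainer",
#                             "data/customers.csv",
#                             scope="dp203secret", key="dp203storagesecret")
```

The first function matches the trade-off described in the lesson: a mount is visible to the whole workspace. The second keeps access tied to the session that set the SAS token.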