1 00:00:00,330 --> 00:00:02,010 ‫So let's do a deeper dive 2 00:00:02,010 --> 00:00:04,230 ‫into how the repository works 3 00:00:04,230 --> 00:00:07,440 ‫and the concept of upstream repositories. 4 00:00:07,440 --> 00:00:10,170 ‫So, when you have a CodeArtifact repo, 5 00:00:10,170 --> 00:00:13,920 ‫you can actually have multiple upstream repositories. 6 00:00:13,920 --> 00:00:15,060 ‫So let's have an example. 7 00:00:15,060 --> 00:00:17,550 ‫Here, the repository "my-repo" 8 00:00:17,550 --> 00:00:19,761 ‫can have an upstream of Repository A 9 00:00:19,761 --> 00:00:22,830 ‫and an upstream of Repository B. 10 00:00:22,830 --> 00:00:24,180 ‫So what does it mean to have upstreams 11 00:00:24,180 --> 00:00:25,470 ‫and why is it helpful? 12 00:00:25,470 --> 00:00:26,915 ‫Well, when you have an upstream, 13 00:00:26,915 --> 00:00:31,234 ‫any package manager that is trying to access your base repo 14 00:00:31,234 --> 00:00:33,936 ‫can also try to find dependencies 15 00:00:33,936 --> 00:00:37,380 ‫in all the upstream repositories. 16 00:00:37,380 --> 00:00:40,260 ‫But the benefit of this is that, for your developer, 17 00:00:40,260 --> 00:00:42,510 ‫for whoever connects to your repo, 18 00:00:42,510 --> 00:00:45,480 ‫there's only a single repository endpoint, 19 00:00:45,480 --> 00:00:46,620 ‫and then you can have 20 00:00:46,620 --> 00:00:49,800 ‫up to ten upstream repositories per repository. 21 00:00:49,800 --> 00:00:51,360 ‫So that means we're going to be able to search 22 00:00:51,360 --> 00:00:55,055 ‫a tree of repositories for the right dependencies. 23 00:00:55,055 --> 00:00:58,163 ‫Now also, when you define a repository, 24 00:00:58,163 --> 00:01:00,930 ‫you can have what's called an external connection, 25 00:01:00,930 --> 00:01:03,386 ‫and it can only be one per repository. 
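To make the idea of searching a tree of repositories through a single endpoint more concrete, here is a minimal Python sketch. It is a toy model, not CodeArtifact's implementation: the `Repo` class, the `find_package` function, and the package names are hypothetical, and the depth-first search in declared order is an assumption made for illustration. The limit of ten upstreams comes from the lecture.

```python
from typing import List, Optional, Set

MAX_UPSTREAMS = 10  # the lecture states up to ten upstream repositories per repository


class Repo:
    """Toy stand-in for a repository (hypothetical, for illustration only)."""

    def __init__(self, name: str, packages: Optional[Set[str]] = None,
                 upstreams: Optional[List["Repo"]] = None) -> None:
        upstreams = upstreams or []
        if len(upstreams) > MAX_UPSTREAMS:
            raise ValueError(f"{name}: at most {MAX_UPSTREAMS} upstreams allowed")
        self.name = name
        self.packages = set(packages or ())
        self.upstreams = list(upstreams)


def find_package(repo: Repo, package: str,
                 seen: Optional[Set[str]] = None) -> Optional[str]:
    """Return the name of the first repo holding `package`, or None.

    Assumption: the repo itself is checked first, then each upstream
    depth-first in the order it was declared.
    """
    seen = set() if seen is None else seen
    if repo.name in seen:  # guard against visiting a shared upstream twice
        return None
    seen.add(repo.name)
    if package in repo.packages:
        return repo.name
    for upstream in repo.upstreams:
        found = find_package(upstream, package, seen)
        if found is not None:
            return found
    return None


# The developer only ever talks to "my-repo"; the upstreams are searched for them.
repo_a = Repo("repository-a", {"lib-a"})
repo_b = Repo("repository-b", {"lib-b"})
my_repo = Repo("my-repo", {"my-app"}, upstreams=[repo_a, repo_b])
print(find_package(my_repo, "lib-b"))  # repository-b
```

The point of the sketch is that the caller passes only `my_repo`; which upstream actually holds the dependency is resolved behind that single entry point.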
26 00:01:03,386 --> 00:01:04,219 ‫So for example, 27 00:01:04,219 --> 00:01:08,311 ‫Repository A can be connected to an external repository, 28 00:01:08,311 --> 00:01:11,432 ‫and this is called an external connection, 29 00:01:11,432 --> 00:01:14,671 ‫and it could be the public NPM repo, 30 00:01:14,671 --> 00:01:17,550 ‫and here's the benefit that we get. 31 00:01:17,550 --> 00:01:18,990 ‫We get the benefit that, 32 00:01:18,990 --> 00:01:21,360 ‫by connecting to just one single repo, 33 00:01:21,360 --> 00:01:24,060 ‫we have access to our private dependencies 34 00:01:24,060 --> 00:01:27,030 ‫but also any public dependencies that we've defined 35 00:01:27,030 --> 00:01:29,580 ‫through external connections. 36 00:01:29,580 --> 00:01:31,260 ‫So let's talk about these external connections 37 00:01:31,260 --> 00:01:32,233 ‫a little bit more. 38 00:01:32,233 --> 00:01:34,200 ‫So when you have an external connection, 39 00:01:34,200 --> 00:01:36,725 ‫it is by default a connection 40 00:01:36,725 --> 00:01:41,340 ‫between one of your repos and an external public repository. 41 00:01:41,340 --> 00:01:43,860 ‫For example, it could be Maven for the Java world, 42 00:01:43,860 --> 00:01:47,190 ‫it could be NPM for the JavaScript world, 43 00:01:47,190 --> 00:01:50,010 ‫PyPI for the Python world, 44 00:01:50,010 --> 00:01:53,482 ‫and NuGet for the .NET C# world. 45 00:01:53,482 --> 00:01:56,010 ‫And so we have here a- 46 00:01:56,010 --> 00:02:00,270 ‫the opportunity to connect NPM to a repository 47 00:02:00,270 --> 00:02:02,760 ‫within CodeArtifact, called an external connection. 48 00:02:02,760 --> 00:02:04,200 ‫And then what's going to happen is that, 49 00:02:04,200 --> 00:02:09,200 ‫if a package that is fetched from NPM is not available 50 00:02:09,510 --> 00:02:11,670 ‫in Repo A, then we're going to fetch it 51 00:02:11,670 --> 00:02:15,810 ‫and it's going to be stored as a cached version in our Repo A. 
52 00:02:15,810 --> 00:02:18,660 ‫And so this is why there's only a maximum of one 53 00:02:18,660 --> 00:02:21,090 ‫external connection per public repository, to 54 00:02:21,090 --> 00:02:23,567 ‫have like a perfect cache mechanism. 55 00:02:23,567 --> 00:02:26,583 ‫But if you want to have multiple public repositories, 56 00:02:26,583 --> 00:02:29,040 ‫then you can just have many repositories 57 00:02:29,040 --> 00:02:30,170 ‫within CodeArtifact. 58 00:02:30,170 --> 00:02:34,395 ‫So for example, here we want to connect to npmjs.com, 59 00:02:34,395 --> 00:02:37,290 ‫and so we're going to configure one repo with one 60 00:02:37,290 --> 00:02:40,123 ‫external connection in your account, and then 61 00:02:40,123 --> 00:02:43,650 ‫all the other repositories of your account that need to 62 00:02:43,650 --> 00:02:45,647 ‫also fetch from NPM can just 63 00:02:45,647 --> 00:02:49,457 ‫have Repository A defined as an upstream. 64 00:02:49,457 --> 00:02:53,490 ‫And now that we have this, any package fetched 65 00:02:53,490 --> 00:02:55,830 ‫from npmjs.com is going to be cached 66 00:02:55,830 --> 00:02:59,258 ‫in the upstream Repo A, and then automatically 67 00:02:59,258 --> 00:03:03,540 ‫Repos B, C, and D will have access to these packages. 68 00:03:03,540 --> 00:03:05,340 ‫So as a developer, when you pull a package, 69 00:03:05,340 --> 00:03:06,540 ‫it's going to be cached in Repo A 70 00:03:06,540 --> 00:03:10,830 ‫and then sent right through to your build system. 71 00:03:10,830 --> 00:03:12,330 ‫So what about the retention 72 00:03:12,330 --> 00:03:14,340 ‫of these artifacts in CodeArtifact? 73 00:03:14,340 --> 00:03:16,920 ‫So, if you have a requested package 74 00:03:16,920 --> 00:03:18,559 ‫and it's found in an upstream repo, 75 00:03:18,559 --> 00:03:20,880 ‫then a reference to it is going to be retained 76 00:03:20,880 --> 00:03:24,128 ‫and always available to the downstream repository. 
77 00:03:24,128 --> 00:03:27,360 ‫And then if you somehow change the package 78 00:03:27,360 --> 00:03:29,670 ‫in the upstream repository, it's not going to 79 00:03:29,670 --> 00:03:32,855 ‫affect your own copy in the downstream repository. 80 00:03:32,855 --> 00:03:35,670 ‫And any intermediate repository that allowed you to 81 00:03:35,670 --> 00:03:38,850 ‫fetch that package will not keep the package, 82 00:03:38,850 --> 00:03:40,257 ‫for efficiency purposes. 83 00:03:40,257 --> 00:03:44,505 ‫So let's have a look. If we take a package from npmjs.com, 84 00:03:44,505 --> 00:03:47,220 ‫but to get there we have two upstreams, 85 00:03:47,220 --> 00:03:48,333 ‫what's going to happen is 86 00:03:48,333 --> 00:03:51,107 ‫that the package manager is going to request this package. 87 00:03:51,107 --> 00:03:54,390 ‫Then it's not present in any of the three repos, 88 00:03:54,390 --> 00:03:57,000 ‫so it's going to be fetched, and once it's fetched, 89 00:03:57,000 --> 00:03:59,670 ‫it's going to be retained in Repository A. 90 00:03:59,670 --> 00:04:02,310 ‫That's because it's the most downstream repository 91 00:04:02,310 --> 00:04:03,510 ‫and it's the one we connect 92 00:04:03,510 --> 00:04:05,570 ‫to from our developer machine. 93 00:04:05,570 --> 00:04:07,428 ‫Repository C, as well, 94 00:04:07,428 --> 00:04:10,212 ‫because that's the one that has the external connection 95 00:04:10,212 --> 00:04:12,840 ‫to the NPM public repository. 96 00:04:12,840 --> 00:04:14,798 ‫So we'll keep a copy cached in it, 97 00:04:14,798 --> 00:04:18,720 ‫and Repository B will not have anything 98 00:04:18,720 --> 00:04:19,578 ‫because it is considered 99 00:04:19,578 --> 00:04:23,631 ‫in this chain an intermediate repository. 
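The retention rule just described (Repository A and Repository C keep the package, Repository B does not) can be sketched in a few lines of Python. This is only a toy model of the behavior explained in the lecture: the `pull` function, the `PUBLIC_REGISTRY` set, and the package name `example-pkg` are hypothetical, and the chain is assumed to be a simple list ordered from the most downstream repo to the one holding the external connection.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the public registry behind the external connection.
PUBLIC_REGISTRY: Set[str] = {"example-pkg"}


def pull(package: str, chain: List[str], stores: Dict[str, Set[str]]) -> str:
    """Pull `package` through `chain` and return where it was found.

    `chain` is ordered from the most downstream repo (the one the developer
    connects to) to the repo that holds the external connection.
    """
    downstream, external = chain[0], chain[-1]

    # 1. Look for an already retained copy, walking downstream to upstream.
    for name in chain:
        if package in stores[name]:
            stores[downstream].add(package)  # the downstream repo keeps a reference
            return name

    # 2. Not retained anywhere: fetch through the external connection.
    if package not in PUBLIC_REGISTRY:
        raise KeyError(f"{package} not found in any repository")
    stores[external].add(package)    # cached next to the external connection
    stores[downstream].add(package)  # retained in the most downstream repo
    return "public"                  # intermediate repos are left untouched


stores: Dict[str, Set[str]] = {"A": set(), "B": set(), "C": set()}
print(pull("example-pkg", ["A", "B", "C"], stores))  # public
print(sorted(name for name, pkgs in stores.items() if pkgs))  # ['A', 'C']
```

A second `pull` of the same package is served from Repository A directly, which is the behavior the retention rule is meant to guarantee.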
100 00:04:23,631 --> 00:04:27,810 ‫Finally, it looks like we're copying stuff all around 101 00:04:27,810 --> 00:04:30,211 ‫and so we are sort of duplicating packages, 102 00:04:30,211 --> 00:04:33,090 ‫but this is a good time to introduce domains. 103 00:04:33,090 --> 00:04:34,770 ‫So when you have repositories, 104 00:04:34,770 --> 00:04:37,050 ‫you can also introduce the concept of a domain. 105 00:04:37,050 --> 00:04:39,810 ‫And a domain can span across multiple accounts 106 00:04:39,810 --> 00:04:42,360 ‫and multiple repositories within these accounts. 107 00:04:42,360 --> 00:04:45,780 ‫And when you have a domain, actually you define one storage 108 00:04:45,780 --> 00:04:48,954 ‫for all your repositories, and so you de-duplicate storage 109 00:04:48,954 --> 00:04:51,782 ‫because, if the same dependency 110 00:04:51,782 --> 00:04:54,211 ‫must be in different repositories, 111 00:04:54,211 --> 00:04:56,280 ‫then it's going to be stored once 112 00:04:56,280 --> 00:04:59,640 ‫in a domain in shared storage, and then only references 113 00:04:59,640 --> 00:05:02,310 ‫to it will be stored in your repository. 114 00:05:02,310 --> 00:05:04,110 ‫So it makes it extremely efficient. 115 00:05:04,110 --> 00:05:05,970 ‫And then you can see that you can create 116 00:05:05,970 --> 00:05:08,310 ‫as many upstream connections as you want. 117 00:05:08,310 --> 00:05:09,420 ‫You also have fast copying 118 00:05:09,420 --> 00:05:11,969 ‫because when you pull repository dependencies, 119 00:05:11,969 --> 00:05:13,650 ‫then, instead of copying them, 120 00:05:13,650 --> 00:05:17,040 ‫you just create a new reference to the shared storage. 121 00:05:17,040 --> 00:05:19,470 ‫Also, it's very easy to share stuff 122 00:05:19,470 --> 00:05:21,030 ‫in the domain across teams 123 00:05:21,030 --> 00:05:24,450 ‫because you have the same metadata, the same assets, 124 00:05:24,450 --> 00:05:27,601 ‫and you encrypt everything with the same KMS key. 
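The de-duplicated storage and fast copying described for domains can be illustrated with a small Python sketch. This is a conceptual model only: the `Domain` class and its methods are hypothetical, and keying assets by a SHA-256 digest is an assumption chosen for illustration. The lecture only says that an asset is stored once in the domain and that repositories hold references to it.

```python
import hashlib
from typing import Dict


class Domain:
    """Toy model of a domain with one shared asset store (hypothetical)."""

    def __init__(self) -> None:
        self.assets: Dict[str, bytes] = {}          # digest -> content, stored once
        self.repos: Dict[str, Dict[str, str]] = {}  # repo -> {package: digest}

    def publish(self, repo: str, package: str, content: bytes) -> str:
        """Store `content` once and record a reference to it in `repo`."""
        digest = hashlib.sha256(content).hexdigest()
        self.assets.setdefault(digest, content)  # no-op if already stored
        self.repos.setdefault(repo, {})[package] = digest
        return digest

    def copy(self, src: str, dst: str, package: str) -> None:
        """'Fast copy': add a new reference in `dst`, no bytes are duplicated."""
        self.repos.setdefault(dst, {})[package] = self.repos[src][package]


domain = Domain()
domain.publish("repo-a", "example-pkg", b"package contents")
domain.publish("repo-c", "example-pkg", b"package contents")  # same bytes again
domain.copy("repo-a", "repo-b", "example-pkg")

print(len(domain.assets))  # 1
print(len(domain.repos))   # 3
```

Three repositories reference the package, yet the shared store holds a single asset, which is the efficiency argument made in the lecture.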
125 00:05:27,601 --> 00:05:30,270 ‫And on top of it, if you wanted to define an 126 00:05:30,270 --> 00:05:32,490 ‫access policy on your repositories, 127 00:05:32,490 --> 00:05:34,410 ‫you can just define what's called a domain 128 00:05:34,410 --> 00:05:37,026 ‫resource-based policy, which is going to be applied 129 00:05:37,026 --> 00:05:39,854 ‫to all the accounts and all the repositories 130 00:05:39,854 --> 00:05:41,760 ‫within the domain. 131 00:05:41,760 --> 00:05:43,890 ‫And you can define advanced rules, 132 00:05:43,890 --> 00:05:46,260 ‫such as who has the right to set up 133 00:05:46,260 --> 00:05:48,570 ‫and modify external connections. 134 00:05:48,570 --> 00:05:50,640 ‫Okay, so that's it for this lecture. 135 00:05:50,640 --> 00:05:53,823 ‫I hope you liked it and I will see you in the next lecture.