So let's go ahead and get some practice with Kinesis Data Streams. I'm going to open the Kinesis service and create our first Kinesis Data Stream. As we can see, we have three options here: Data Streams, Data Firehose, or Data Analytics, but we only know about Data Streams so far, so let's go ahead. We get some information around the pricing, which is that per shard we pay $0.05 per hour, and then there's a cost for making PUTs, that is, for sending data into Kinesis Data Streams.

So we'll create a data stream. Let's name our stream DemoStream, and then we have to define the data stream capacity. As you can see, we have two modes: the On-demand mode and the Provisioned mode. In the On-demand mode, you don't have to think about capacity; it's automatically going to scale for you. There's a maximum write throughput of 200 megabytes per second and 200,000 records per second, and a maximum read capacity of 400 megabytes per second per consumer if you're using the enhanced fan-out option. So the On-demand mode has a pay-per-throughput pricing model, but there is no free tier. The Provisioned mode also doesn't have a free tier, and in the Provisioned mode you need to provision shards. There's a shard estimator tool if you want to understand how many shards you need, based on how many records you send per second, the size of your records, and how many consumers you have.

But in our example, we're going to just set up one shard. One shard gives us one megabyte per second for writes and two megabytes per second for reads, and obviously, if you provision 10 shards, everything is multiplied by 10. So one shard is enough for us to do the demo, and it's also the cheapest option we can get. If you do not want to pay any money in this course, then do not do this hands-on, because you will pay some money for your shard, although we will keep it short and delete the stream quickly.
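For reference, although the lecture creates the stream through the console, the same thing could be done from the CLI; a minimal sketch, assuming the stream is named DemoStream with one provisioned shard:

    # Create a provisioned-mode stream with a single shard
    aws kinesis create-stream \
      --stream-name DemoStream \
      --shard-count 1

    # Optionally block until the stream is ACTIVE before writing to it
    aws kinesis wait stream-exists --stream-name DemoStream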
So when you're ready, you just click on Create data stream, and you wait for the data stream to be created. And our stream is now successfully created. In terms of applications, we can see we have producers, and we are recommended three options: the Kinesis Agent, the SDK, or the Kinesis Producer Library (KPL). They're all available on GitHub for you to check out. The Kinesis Agent is a way to stream data from application servers into Kinesis Data Streams. The SDK is for you to develop producers at a very low level, and the KPL is for you to develop producers at a higher level, with a better API. In terms of consumers, we get Kinesis Data Analytics, Kinesis Data Firehose, or the Kinesis Client Library, and also Lambda, which is not shown as an option right here.

We get some monitoring information about our Kinesis Data Stream, such as how many records we are sending into it. We could look at the configuration: if we wanted to scale the stream, we could say how many shards we want, go from one to, say, five, and scale our Kinesis Data Stream. We could add some tags, and we could configure enhanced fan-out if we wanted to have some consumer applications leveraging the enhanced fan-out capability.

But for now, let's keep it simple. We just want to write to and read from our stream, and therefore we're going to use the SDK for producing and for consuming. To do so, we want to open a CLI, and let's use CloudShell because it's fun. So I'm going to click on the CloudShell icon right here, next to the bell icon, and this is going to open a command line interface in AWS for me. As an alternative, you could be using your own terminal or CLI if it's preconfigured, but I like to switch things up, and I really like this one because there's less configuration involved.
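As a side note, the scaling we just looked at in the console can also be done from the CLI; a rough sketch, assuming we wanted to go from one shard to five (we don't run this in the demo, since it would multiply the shard cost by five):

    aws kinesis update-shard-count \
      --stream-name DemoStream \
      --target-shard-count 5 \
      --scaling-type UNIFORM_SCALING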
Creating the environment the first time can take a bit of time, and CloudShell is free on AWS, so no worries here. In the meantime, on the Kinesis side, open the kinesis-data-streams.sh file, as we're going to use it throughout. There are two variants of the commands to write to a Kinesis Data Stream, depending on your AWS CLI version. Usually you have version 2 installed, but it is possible that you somehow have version 1 installed. To get the version, you just type aws --version, and you will get some information about the version of the AWS CLI. In CloudShell, the CLI version that's installed is version 2; as you can see here, it was CLI version 2.1.16, so we're going to use the version 2 commands. But if you had version 1, then you would use these other commands.

Okay, so now we're good to go. The first thing we want to do is send a record into our Kinesis Data Stream, and to do so there is an API called put-record. With put-record, we need to specify a stream name. In this example I didn't name my stream test, I named it DemoStream, so we'll have to change this in the CLI command, but you get the idea. Then you specify a partition key for the data we're sending, so user1, and remember, data that shares the same partition key will go to the same shard; we only have one shard, so that doesn't matter in this case. Then the data itself, so "user signup". And finally, because we are writing text data, we need to pass the option --cli-binary-format raw-in-base64-out. So let me copy and paste this command, but let me just edit the stream name to make sure that it is DemoStream.
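Putting it together, a minimal sketch of the CLI v2 command used at this step, assuming the stream is named DemoStream:

    # Send one text record; raw-in-base64-out lets us pass plain text as the data blob
    aws kinesis put-record \
      --stream-name DemoStream \
      --partition-key user1 \
      --data "user signup" \
      --cli-binary-format raw-in-base64-out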
CloudShell is automatically configured with your own IAM credentials, so it will inherit whatever IAM credentials you have, and by default it will use the region in which it was launched, so us-east-1. I press Enter, and we get a success message: the record was sent to shardId-000000000000, so our first shard, and the sequence number of the record is here. If I do it again, then I'm going to get a second record with a success response, so we can send user signup again. We can mix up the messages: user login, and then maybe user logout. So we're just sending a few messages into our Kinesis Data Stream.

Perfect, so I'm going to clear this. If you waited a little bit and went into Monitoring to look at the stream metrics, setting the range to one hour, you would see the put records right here; it takes a bit of time for the CloudWatch metrics to be updated, but you would see them here.

Next, we want to be able to consume from the Kinesis Data Stream. To do so, we're going to first describe the stream, to get some information about what this stream is made of, because we need to be able to consume from a specific shard. So it's DemoStream, I'll press Enter, and as you can see here, we have this StreamDescription: we have one shard, called shardId-000000000000, and we need to keep this in mind to be able to read from this stream. When you use the CLI, or the SDK at a very low level, you need to specify which shard you are reading from. If you are using the Kinesis Client Library, all of this is handled for you by the library itself, but we are using the CLI, so we have to specify the shard ID. So I press Q to quit this, and I'm going to consume some data, so I'm going to run this command right here, and let me clear this.
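For reference, a sketch of the two commands involved at this step, assuming the stream name and shard ID from the demo:

    # Inspect the stream to find its shard IDs
    aws kinesis describe-stream --stream-name DemoStream

    # Request an iterator starting from the oldest record in the shard
    aws kinesis get-shard-iterator \
      --stream-name DemoStream \
      --shard-id shardId-000000000000 \
      --shard-iterator-type TRIM_HORIZON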
And there are two things to note. Number one, I need to change the name of the stream I'm consuming from, so DemoStream is the stream, and then the shard-iterator-type is TRIM_HORIZON. This means that you're going to read from the very beginning of the stream, so it will read all the records that were sent since the beginning. The other option, LATEST, makes sure you only receive the records sent from the moment you launch the CLI command onwards. Anyway, I'm going to press Enter, and this is going to give me a ShardIterator. This ShardIterator can then be used to consume records, so the next API command is kinesis get-records --shard-iterator, and then we just specify this entire string right here.

The consumption mode I'm using right now, with the low-level API of describing the stream, getting a ShardIterator, and getting records, is the shared consumption mode. This is not using enhanced fan-out, which in my opinion you should leverage through the Kinesis Client Library (a consumer library) so that you have a nice API to do so. But this is low level. So let's press Enter on the kinesis get-records command, and we get a batch of records out of it. We have our first record right here, with PartitionKey user1, and we have some data right here, but it's base64-encoded. We have another piece of data, base64-encoded, some timestamp information, and another piece of data, base64-encoded. And if I press Enter, it scrolls down a little, and there's some more data that is base64-encoded.

So just to make sure we can read the data, I can go to a website for online base64 decoding, paste this data right here into the base64 decoder, and click on DECODE, and this gives us "user signup". And if I paste the second piece of data, this one, copy and paste it in here, it's going to give us "user login".
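A rough sketch of this consumption step; the iterator value is the long string returned by the previous command (left as a placeholder here), and instead of a website you could also decode the data directly in the shell — the base64 string below is just an illustration of what "user signup" encodes to:

    # Fetch a batch of records using the iterator from get-shard-iterator
    aws kinesis get-records --shard-iterator <ShardIterator-from-previous-step>

    # Decode a record's Data field without leaving the terminal
    echo "dXNlciBzaWdudXA=" | base64 --decode   # prints: user signup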
So that's exactly what we sent, so that's perfect, everything is working. And then, as you can see, there's a NextShardIterator value right here. The next time we consume, we need to specify this NextShardIterator to continue consuming from where we stopped, so this is something you have to iterate on in your code (there's a rough sketch of that loop below). At a low level, we've produced data to a Kinesis Data Stream and consumed data from a Kinesis Data Stream, which is awesome, and we've also used CloudShell in the meantime, which I think is very handy. So that's it for this demo. Just keep this stream open, as we will be using it for Kinesis Data Firehose in a second, and I will see you in the next lecture.
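The sketch mentioned above: a small shell loop, not run in the lecture, that keeps following NextShardIterator to tail the shard; it assumes jq is available (it should be preinstalled in CloudShell):

    # Get an initial iterator from the beginning of our only shard
    ITERATOR=$(aws kinesis get-shard-iterator \
      --stream-name DemoStream \
      --shard-id shardId-000000000000 \
      --shard-iterator-type TRIM_HORIZON \
      --query 'ShardIterator' --output text)

    # Repeatedly fetch records, decode their payloads, then follow NextShardIterator
    while [ -n "$ITERATOR" ] && [ "$ITERATOR" != "null" ]; do
      RESPONSE=$(aws kinesis get-records --shard-iterator "$ITERATOR")
      echo "$RESPONSE" | jq -r '.Records[].Data' | while read -r DATA; do
        echo "$DATA" | base64 --decode; echo
      done
      ITERATOR=$(echo "$RESPONSE" | jq -r '.NextShardIterator')
      sleep 1
    done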