1 00:00:00,030 --> 00:00:02,909 so in this lesson we're just going to take this a little bit further and look 2 00:00:02,909 --> 00:00:08,030 at how we can use CloudWatch as an administration tool so not only just for 3 00:00:08,030 --> 00:00:13,080 invoking EC2 instances in an Auto Scaling group or whatever we're doing 4 00:00:13,080 --> 00:00:20,160 but also to look at how we can use it as one of our key toolbox tools that 5 00:00:20,160 --> 00:00:25,710 we can actually monitor with and then we can actually use for compliance or that we 6 00:00:25,710 --> 00:00:30,720 can use for providing information to management and the like so we'll go 7 00:00:30,720 --> 00:00:36,059 through and look at statistics we'll go further into CloudWatch alarms 8 00:00:36,059 --> 00:00:40,559 and also look at CloudWatch Events so the way CloudWatch can trigger 9 00:00:40,559 --> 00:00:45,829 something to happen and also we'll look at CloudWatch Logs so how CloudWatch can 10 00:00:45,829 --> 00:00:56,550 monitor log information and also act on that log information if required so 11 00:00:56,550 --> 00:01:03,180 CloudWatch metrics are available across a wide range and the majority of AWS 12 00:01:03,180 --> 00:01:07,020 resources but they're not available in all regions on all resources so you 13 00:01:07,020 --> 00:01:10,560 really need to have a look at the full list that's online in the developer 14 00:01:10,560 --> 00:01:14,549 guide look at the online version because it's going to be updated when 15 00:01:14,549 --> 00:01:19,020 those services become more available for example it could be billing 16 00:01:19,020 --> 00:01:23,310 so we not only look at it for resources but we can also look at it for 17 00:01:23,310 --> 00:01:27,780 billing so if our costs are going out of control then it's a really good thing to 18 00:01:27,780 --> 00:01:31,110 be able to be alerted to that so that's something to definitely 19 
00:01:36,600 consider if you are an administrator and you're responsible for a budget for 20 00:01:36,600 --> 00:01:44,250 example we can look at DynamoDB EC2 EBS Elastic Beanstalk OpsWorks Kinesis 21 00:01:44,250 --> 00:01:49,079 Firehose streams and the like so a wide range of areas where we can use CloudWatch 22 00:01:49,079 --> 00:01:59,790 metrics not just in EC2 so not only can we look at and monitor raw data we can 23 00:01:59,790 --> 00:02:03,630 actually look at statistics so if we wanted to look at average and max and 24 00:02:03,630 --> 00:02:08,660 min that's probably what we would mostly look at if we're looking at this for a 25 00:02:08,660 --> 00:02:12,989 monitoring purpose as far as getting information and displaying that 26 00:02:12,989 --> 00:02:18,110 information we can also get those statistics 27 00:02:18,110 --> 00:02:22,100 through three methods we can get it through the CLI the API or through the 28 00:02:22,100 --> 00:02:30,020 console so through the CLI or the API we use the GetMetricStatistics call and we 29 00:02:30,020 --> 00:02:36,709 can query up to a maximum of 50,850 data points but the maximum number 30 00:02:36,709 --> 00:02:43,700 of data points returned from a single request is 1,440 so you need to be aware 31 00:02:43,700 --> 00:02:48,890 of that if you are looking to dump large or you're looking to collect large amounts 32 00:02:48,890 --> 00:02:55,100 of information and bring it in so you really need to treat this information as 33 00:02:55,100 --> 00:02:58,100 what it is it's a stream of information so it's not something that you would 34 00:02:58,100 --> 00:03:03,140 just come along and just get a great big dump of you would normally manage it 35 00:03:03,140 --> 00:03:11,239 as a stream or as a regular reading of that data in the short term so and again 36 00:03:11,239 --> 00:03:15,320 you can do that with the console and you can also create dashboards so you can 37 00:03:15,320 --> 00:03:24,799 display multiple 
graphs of alarms and metrics on a dashboard screen so 38 00:03:24,799 --> 00:03:32,540 it's very useful so you need to really not only understand that alarms occur but the 39 00:03:32,540 --> 00:03:37,790 way in which they occur and how they are invoked so again we can have billing 40 00:03:37,790 --> 00:03:42,500 alarms as well as resource alarms so be aware of that and we'll have a look at 41 00:03:42,500 --> 00:03:48,440 how we utilize billing alarms at the end of this lesson we know that it 42 00:03:48,440 --> 00:03:52,790 integrates with SNS and does that quite well we know that there are three states 43 00:03:52,790 --> 00:03:58,070 OK ALARM and INSUFFICIENT_DATA so we just need to make sure that we fully 44 00:03:58,070 --> 00:04:03,799 understand how an alarm is invoked and when it is actually invoked and 45 00:04:03,799 --> 00:04:07,640 under what conditions so I've just got this graph here straight out of the 46 00:04:07,640 --> 00:04:10,790 developer guide and the reason I've done that is because this is a graph that you 47 00:04:10,790 --> 00:04:15,609 will see on your exam no doubt so just want to make sure that it looks and 48 00:04:15,609 --> 00:04:20,660 feels the way that you're going to be used to in the exam so looking at this 49 00:04:20,660 --> 00:04:25,880 we've got an alarm that is set up with a threshold set to three and an evaluation 50 00:04:25,880 --> 00:04:30,420 period that is also set to three so if a metric is 51 00:04:30,420 --> 00:04:35,820 above the alarm threshold for the number of time periods defined by the 52 00:04:35,820 --> 00:04:40,830 evaluation period then the alarm is invoked so looking at our graph here 53 00:04:40,830 --> 00:04:46,920 at time periods one and two we're down at one so we're well below our threshold 54 00:04:46,920 --> 00:04:53,460 of three so that blue line there is our threshold of three so when we go to time 55 00:04:53,460 --> 00:05:00,360 period three we can see that it all of a sudden jumps 
up to four so that's our 56 00:05:00,360 --> 00:05:07,230 first point where it has exceeded the threshold but an alarm has not 57 00:05:07,230 --> 00:05:12,630 been invoked because it hasn't done it for the evaluation period of three so it 58 00:05:12,630 --> 00:05:16,320 hasn't done it three consecutive times so that's the first time and then we 59 00:05:16,320 --> 00:05:21,750 look at time period four where it's still above and time period five which is 60 00:05:21,750 --> 00:05:28,140 above so at time period five it's occurred for three time periods 61 00:05:28,140 --> 00:05:34,800 so that's our evaluation period of three so at time period number five our action is 62 00:05:34,800 --> 00:05:42,150 invoked our alarm will be invoked so then it drops back down and basically 63 00:05:42,150 --> 00:05:48,630 our service can allow another alarm to occur once it's dropped 64 00:05:48,630 --> 00:05:52,740 back down below the alarm threshold so we've dropped back down and we've gone 65 00:05:52,740 --> 00:05:56,640 down there to time period eight where we're down to one again and then all of 66 00:05:56,640 --> 00:06:01,470 a sudden at nine it's jumped right up to five and a half and then dropped back down 67 00:06:01,470 --> 00:06:06,600 again so it's jumped up there for only one time period so because it 68 00:06:06,600 --> 00:06:11,340 hasn't done it for three consecutive time periods so at the end of three 69 00:06:11,340 --> 00:06:16,830 consecutive time periods it hasn't occurred that alarm won't be invoked so you just need 70 00:06:16,830 --> 00:06:20,880 to understand how that works and understand that graph also 71 00:06:20,880 --> 00:06:24,390 understand that it's not always above the threshold it could be below a 72 00:06:24,390 --> 00:06:32,490 threshold as well depending on how that metric is set up so another 73 00:06:32,490 --> 00:06:39,810 thing to consider is CloudWatch Logs so CloudWatch can monitor store and access 74 
00:06:39,810 --> 00:06:45,179 log files from EC2 instances from AWS 75 00:06:45,179 --> 00:06:51,269 CloudTrail which collects logs on our API calls across all of our resources 76 00:06:51,269 --> 00:06:57,629 or it could be from a number of other resources that generate log files so 77 00:06:57,629 --> 00:07:02,129 that allows us to have real-time monitoring of our log information so no 78 00:07:02,129 --> 00:07:06,809 longer is this log just being created and then stored away somewhere for later 79 00:07:06,809 --> 00:07:13,559 viewing CloudWatch can monitor that log information as a stream in real time as 80 00:07:13,559 --> 00:07:20,249 it's occurring which is a great thing so you can really define certain situations 81 00:07:20,249 --> 00:07:23,879 where you want to be notified in real time of that and to take corrective 82 00:07:23,879 --> 00:07:28,769 action or whatever you need to do so those logs will be set up as a log 83 00:07:28,769 --> 00:07:34,319 stream and that will be a sequence of log events from a particular source be it 84 00:07:34,319 --> 00:07:41,099 an EC2 instance or whatever it is now we can set up our log streams in log groups 85 00:07:41,099 --> 00:07:47,039 so those log groups they are groups of streams that have the same 86 00:07:47,039 --> 00:07:54,269 retention and monitoring and access control settings so you would define 87 00:07:54,269 --> 00:07:57,509 those retention monitoring and access control settings and then you would have 88 00:07:57,509 --> 00:08:03,179 your log streams organized into log groups so it's one thing to have these 89 00:08:03,179 --> 00:08:06,899 streams and these log groups and have all this information coming in 90 00:08:06,899 --> 00:08:13,259 but CloudWatch needs to know what to look for it needs to know what is not 91 00:08:13,259 --> 00:08:19,309 normal and so metric filters they give CloudWatch the 92 --> 00:08:23,999 
opportunity to look at that information and to extract what it needs so a metric filter defines 93 00:08:23,999 --> 00:08:31,709 how information is extracted from that log stream and in what situation a data 94 00:08:31,709 --> 00:08:37,169 point would be created by CloudWatch and then we have our 95 00:08:37,169 --> 00:08:41,669 retention settings and so they're just how long our events are kept in 96 00:08:41,669 --> 00:08:46,800 CloudWatch Logs so CloudWatch Logs and as we'll see again next CloudWatch 97 00:08:46,800 --> 00:08:49,980 Events are two very good services that you can 98 00:08:49,980 --> 00:08:54,060 use outside of just the usual alarm situations from an EC2 99 00:08:54,060 --> 00:08:58,560 instance or whatever that you can use in particular for compliance and 100 00:08:58,560 --> 00:09:02,400 security issues and also for troubleshooting issues you can use it if 101 00:09:02,400 --> 00:09:06,690 you know you've got problems and then you can clearly define what you are 102 00:09:06,690 --> 00:09:13,560 looking for so you can really sort out the bugs in your infrastructure next we have 103 00:09:13,560 --> 00:09:19,710 CloudWatch Events so they occur when a resource changes state for example it 104 00:09:19,710 --> 00:09:23,870 could be an EC2 instance that changes from being running to terminated 105 00:09:23,870 --> 00:09:29,700 it could be when an Auto Scaling group launches a new instance so that's an 106 00:09:29,700 --> 00:09:35,760 event that occurs when a resource has changed we can also look at integrating 107 00:09:35,760 --> 00:09:42,300 this with CloudTrail so CloudTrail as we know logs our 108 00:09:42,300 --> 00:09:50,520 API calls so it's a very good security compliance and troubleshooting tool that 109 00:09:50,520 --> 00:09:55,110 can help us greatly and combined with CloudWatch Events it can alert us to 110 00:09:55,110 --> 00:10:00,720 problems very quickly so for example we might want to look at situations when 111 
00:10:00,720 --> 00:10:06,300 someone logs into the console as the root user and so that's something that we 112 00:10:06,300 --> 00:10:09,990 don't want to happen and shouldn't be happening so we could actually have 113 00:10:09,990 --> 00:10:13,770 CloudTrail set up with a CloudWatch event 114 00:10:13,770 --> 00:10:20,790 attached to that CloudTrail log stream and it could send off for example 115 00:10:20,790 --> 00:10:27,570 an SNS message or something could happen when that situation occurs so again 116 00:10:27,570 --> 00:10:33,300 CloudWatch needs to know which events it needs to be alerted to and how to 117 00:10:33,300 --> 00:10:38,190 route those to the appropriate target for processing so that's where rules 118 00:10:38,190 --> 00:10:43,230 come along where we define the rules for CloudWatch to match an event when 119 00:10:43,230 --> 00:10:47,910 that situation occurs and so that will route them to one or more targets now a 120 00:10:47,910 --> 00:10:54,540 target can be a Lambda function it can be an SNS topic it could be an SQS queue 121 00:10:54,540 --> 00:10:59,430 it could be a Kinesis stream or it could be built-in targets and an example of a 122 00:10:59,430 --> 00:11:04,290 built-in target would be to terminate an instance so you know 123 00:11:04,290 --> 00:11:07,160 so when someone logs in as root we might 124 00:11:07,160 --> 00:11:11,990 have an instance there that we want to terminate or whatever we do so that's 125 00:11:11,990 --> 00:11:18,410 an example of a built-in target that we can use so now we'll just move on 126 00:11:18,410 --> 00:11:22,610 to some hands-on stuff and we've got the theory out of the way for now so let's 127 00:11:22,610 --> 00:11:25,149 move on
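
The alarm walkthrough in this lesson (a threshold of three and an evaluation period of three) can be sketched in a few lines of Python. This is only a rough illustration and not how CloudWatch itself is implemented. The function name `alarm_states` is hypothetical, the exact readings for time periods 4 to 7 and 10 are assumed because the lesson only gives values for periods 1, 2, 3, 8 and 9, and the INSUFFICIENT_DATA state is left out.

```python
def alarm_states(readings, threshold, evaluation_periods):
    """Return "OK" or "ALARM" for each time period.

    The state is "ALARM" only while the metric has been above the
    threshold for `evaluation_periods` consecutive time periods.
    """
    states = []
    consecutive = 0  # number of consecutive periods above the threshold
    for value in readings:
        consecutive = consecutive + 1 if value > threshold else 0
        states.append("ALARM" if consecutive >= evaluation_periods else "OK")
    return states


# Time periods 1 to 10. Periods 1, 2, 3, 8 and 9 follow the lesson;
# the remaining values are assumed for illustration.
readings = [1, 1, 4, 4.5, 5, 2, 1.5, 1, 5.5, 2]

states = alarm_states(readings, threshold=3, evaluation_periods=3)
alarm_periods = [period for period, state in enumerate(states, start=1)
                 if state == "ALARM"]
print(alarm_periods)  # -> [5]
```

The spike at time period nine never sets the alarm off because it lasts for one period only. For an alarm that fires below a threshold, the comparison in the loop would simply be flipped.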