1

00:00:00,030  -->  00:00:02,909
so in this lesson we're just going to
take this a little bit further and look

2

00:00:02,909  -->  00:00:08,030
at how we can use CloudWatch as an
administration tool so not only just for

3

00:00:08,030  -->  00:00:13,080
invoking EC2 instances in an Auto
Scaling group or whatever we're doing

4

00:00:13,080  -->  00:00:20,160
but also to look at how we can use it as
one of our key toolbox tools that

5

00:00:20,160  -->  00:00:25,710
we can actually monitor and then we can
actually use for compliance, where we

6

00:00:25,710  -->  00:00:30,720
can use it for providing information to
management and the like so we'll go

7

00:00:30,720  -->  00:00:36,059
through and look at statistics we'll go
through further into CloudWatch alarms

8

00:00:36,059  -->  00:00:40,559
and also look at CloudWatch Events, so
the way CloudWatch can trigger

9

00:00:40,559  -->  00:00:45,829
something to happen and also we'll look
at CloudWatch Logs, how CloudWatch can

10

00:00:45,829  -->  00:00:56,550
monitor log information and also act
on that log information if required so

11

00:00:56,550  -->  00:01:03,180
CloudWatch metrics are available across
a wide range, the majority, of AWS

12

00:01:03,180  -->  00:01:07,020
resources but they're not available in
all regions or on all resources, so you

13

00:01:07,020  -->  00:01:10,560
really need to have a look at the full
list that's online at the developer

14

00:01:10,560  -->  00:01:14,549
guide look at the online version because
it's going to be updated

15

00:01:14,549  -->  00:01:19,020
when those services are more available
for example, there could be billing

16

00:01:19,020  -->  00:01:23,310
so we not only look at it for resources
but we can also look at it for

17

00:01:23,310  -->  00:01:27,780
billing so if our costs are going out of
control then it's a really good thing to

18

00:01:27,780  -->  00:01:31,110
be able to be alerted to that, so
that's something to definitely

19

00:01:31,110  -->  00:01:36,600
consider if you are an administrator and
you're responsible for a budget for

20

00:01:36,600  -->  00:01:44,250
example, we can look at DynamoDB, EC2, EBS,
Elastic Beanstalk, OpsWorks, Kinesis

21

00:01:44,250  -->  00:01:49,079
Firehose streams and the like, so a wide range of
areas where we can use CloudWatch

22

00:01:49,079  -->  00:01:59,790
metrics, not just in EC2, so not only can
we look at and monitor raw data we can

23

00:01:59,790  -->  00:02:03,630
actually look at statistics so if we
wanted to look at average and max and

24

00:02:03,630  -->  00:02:08,660
min, that's probably what we would mostly
look at if we're looking at this for a

25

00:02:08,660  -->  00:02:12,989
monitoring purpose, as far as getting
information and displaying that

26

00:02:12,989  -->  00:02:18,110
information
we can also get those statistics

27

00:02:18,110  -->  00:02:22,100
through three methods we can get it
through the CLI the API or through the

28

00:02:22,100  -->  00:02:30,020
console, so through the CLI or the API through
the GetMetricStatistics call, and we

29

00:02:30,020  -->  00:02:36,709
can query a maximum number of data points
of up to 50,850, but the maximum number

30

00:02:36,709  -->  00:02:43,700
of data points returned from a single
request is 1440 so you need to be aware

31

00:02:43,700  -->  00:02:48,890
of that if you are looking to dump large
or you're looking to collect large amounts

32

00:02:48,890  -->  00:02:55,100
of information and bring it in so you
really need to treat this information as

33

00:02:55,100  -->  00:02:58,100
what it is it's a stream of information
so it's not something that you would

34

00:02:58,100  -->  00:03:03,140
just come along and just get a great big
dump of, you would normally manage it

35

00:03:03,140  -->  00:03:11,239
as a stream or as a regular reading of
that data in the short term so and again

36

00:03:11,239  -->  00:03:15,320
you can do that with the console and you
can also create dashboards so you can

37

00:03:15,320  -->  00:03:24,799
display multiple graphs of alarms and
metrics on a dashboard screen, so

38

00:03:24,799  -->  00:03:32,540
it's very useful so you need to really
not only understand that alarms occur but the

39

00:03:32,540  -->  00:03:37,790
way in which they occur and how they are
invoked, so again we can have billing

40

00:03:37,790  -->  00:03:42,500
alarms as well as resource alarms, so be
aware of that and we'll have a look at

41

00:03:42,500  -->  00:03:48,440
how we utilize billing alarms at
the end of this lesson we know that it

42

00:03:48,440  -->  00:03:52,790
integrates with SNS and does that quite
well we know that there are three states

43

00:03:52,790  -->  00:03:58,070
OK, ALARM and INSUFFICIENT_DATA, so we
just need to make sure that we fully

44

00:03:58,070  -->  00:04:03,799
understand how an alarm is invoked and
and when it is actually invoked and

45

00:04:03,799  -->  00:04:07,640
under what conditions so I've just got
this graph here straight out of the

46

00:04:07,640  -->  00:04:10,790
developer guide and the reason I've done
that is because this is a graph that you

47

00:04:10,790  -->  00:04:15,609
will see on your exam no doubt so just
want to make sure that it looks and

48

00:04:15,609  -->  00:04:20,660
feels the way that you're going to be
used to in the exam so looking at this

49

00:04:20,660  -->  00:04:25,880
we've got an alarm that is set up with a
threshold set to 3 and an evaluation

50

00:04:25,880  -->  00:04:30,420
period
that is also set to three, so if a metric is

51

00:04:30,420  -->  00:04:35,820
above the alarm threshold for the number
of time periods defined by the

52

00:04:35,820  -->  00:04:40,830
evaluation period then the alarm is
invoked so looking at our graph here

53

00:04:40,830  -->  00:04:46,920
after time period one and two we're down
at one so we're well below our threshold

54

00:04:46,920  -->  00:04:53,460
of three, so that blue line there is our
threshold of three so when we go to time

55

00:04:53,460  -->  00:05:00,360
period three we can see that all of a
sudden it jumps up to four, so that's our

56

00:05:00,360  -->  00:05:07,230
first point where it has exceeded the
threshold, but an alarm has not

57

00:05:07,230  -->  00:05:12,630
been invoked because it hasn't done it
for the evaluation period of three so it

58

00:05:12,630  -->  00:05:16,320
hasn't done it three consecutive times
so that's the first time and then we

59

00:05:16,320  -->  00:05:21,750
look at time period four, where it's still
above and time period five which is

60

00:05:21,750  -->  00:05:28,140
above so at time period five it's
occurred for three consecutive time periods,

61

00:05:28,140  -->  00:05:34,800
so our evaluation period of three, so at
time period number five our action is

62

00:05:34,800  -->  00:05:42,150
invoked our alarm will be invoked so
then it drops back down and basically

63

00:05:42,150  -->  00:05:48,630
our service can allow another alarm
to occur when it's dropped

64

00:05:48,630  -->  00:05:52,740
back down below the alarm threshold, so
we've dropped back down and we've gone

65

00:05:52,740  -->  00:05:56,640
down there to time period eight where
we're down to one again and then all of

66

00:05:56,640  -->  00:06:01,470
a sudden at nine it's jumped right up to
five and a half and then dropped back down

67

00:06:01,470  -->  00:06:06,600
again, so that's jumped up there for
only one time period, so because it

68

00:06:06,600  -->  00:06:11,340
hasn't done it for three consecutive
time periods, so at the end of three

69

00:06:11,340  -->  00:06:16,830
consecutive time periods it hasn't occurred,
that alarm won't be invoked so just need

70

00:06:16,830  -->  00:06:20,880
to understand how that works
and understand that graph also

71

00:06:20,880  -->  00:06:24,390
understand that it's not always above
the threshold it could be below a

72

00:06:24,390  -->  00:06:32,490
threshold as well so depending on how
that metric is set up, so another

73

00:06:32,490  -->  00:06:39,810
thing to consider is CloudWatch Logs, so
CloudWatch can monitor, store and access

74

00:06:39,810  -->  00:06:45,179
log
files from EC2 instances, from AWS

75

00:06:45,179  -->  00:06:51,269
CloudTrail, which collects logs on our
API calls through all of our resources

76

00:06:51,269  -->  00:06:57,629
or it could be from a number of other
resources that generate log files, so

77

00:06:57,629  -->  00:07:02,129
that allows us to have real time
monitoring of our log information so no

78

00:07:02,129  -->  00:07:06,809
longer is this log just being created
and then stored away somewhere for later

79

00:07:06,809  -->  00:07:13,559
view, CloudWatch can monitor that log
information as a stream in real time as

80

00:07:13,559  -->  00:07:20,249
it's occurring which is a great thing so
you can really define certain situations

81

00:07:20,249  -->  00:07:23,879
where you want to be notified in
real time of that and to take corrective

82

00:07:23,879  -->  00:07:28,769
action or whatever you need to do so
those logs will be set up as a log

83

00:07:28,769  -->  00:07:34,319
stream, and that will be a sequence of
log events from a particular source, be it

84

00:07:34,319  -->  00:07:41,099
an EC2 instance or whatever it is, now we
can set up our log streams in log groups

85

00:07:41,099  -->  00:07:47,039
so those log groups, they
are groups of streams that have the same

86

00:07:47,039  -->  00:07:54,269
retention and monitoring and access
control settings so you would define

87

00:07:54,269  -->  00:07:57,509
those retention monitoring and access
control settings and then you would have

88

00:07:57,509  -->  00:08:03,179
your log streams organized into log
groups so it's one thing to have these

89

00:08:03,179  -->  00:08:06,899
streams and these log groups
and have all this information coming in

90

00:08:06,899  -->  00:08:13,259
but CloudWatch needs to know what to
look for it needs to know what is not

91

00:08:13,259  -->  00:08:19,309
normal and so metric filters they allow
CloudWatch, they give CloudWatch the

92

00:08:19,309  -->  00:08:23,999
opportunity to look at that information
and to extract what it needs, so it defines

93

00:08:23,999  -->  00:08:31,709
how information is extracted from that
log stream and in what situation a data

94

00:08:31,709  -->  00:08:37,169
point would be created by
CloudWatch, and then we have our

95

00:08:37,169  -->  00:08:41,669
retention settings and so they're just
how long our events are kept in

96

00:08:41,669  -->  00:08:46,800
CloudWatch Logs, so CloudWatch Logs and,
combined with it as we'll see next, CloudWatch

97

00:08:46,800  -->  00:08:49,980
Events,
two very, very good services that you can

98

00:08:49,980  -->  00:08:54,060
use outside of just the usual alarm
situations from an EC2

99

00:08:54,060  -->  00:08:58,560
instance or whatever, that you can use
in particular for compliance and

100

00:08:58,560  -->  00:09:02,400
security issues and also for
troubleshooting issues, so if

101

00:09:02,400  -->  00:09:06,690
you know you've got problems
then you can clearly define what you are

102

00:09:06,690  -->  00:09:13,560
looking for, so you can really sort out the
bugs in your infrastructure next we have

103

00:09:13,560  -->  00:09:19,710
CloudWatch Events, so they occur when a
resource changes state, for example it

104

00:09:19,710  -->  00:09:23,870
could be when an EC2 instance
changes from running to terminated

105

00:09:23,870  -->  00:09:29,700
it could be when an auto scaling group
launches a new instance so that's an

106

00:09:29,700  -->  00:09:35,760
event that occurs when a resource has
changed we can also look at integrating

107

00:09:35,760  -->  00:09:42,300
this with CloudTrail,
so CloudTrail, as we know, logs our

108

00:09:42,300  -->  00:09:50,520
API calls so it's a very good security
compliance and troubleshooting tool that

109

00:09:50,520  -->  00:09:55,110
can help us greatly, and when combined with
CloudWatch Events it can alert us to

110

00:09:55,110  -->  00:10:00,720
problems very quickly so for an example
we might want to look at situations when

111

00:10:00,720  -->  00:10:06,300
someone logs into the console as a root
user and so that's something that we

112

00:10:06,300  -->  00:10:09,990
don't want to happen and shouldn't be
happening so we could actually have

113

00:10:09,990  -->  00:10:13,770
CloudTrail set up with a CloudWatch
event

114

00:10:13,770  -->  00:10:20,790
attached to that CloudTrail log
stream and it could send off for example

115

00:10:20,790  -->  00:10:27,570
an SNS message or something could happen
when that situation occurs so again

116

00:10:27,570  -->  00:10:33,300
CloudWatch needs to know of an event that
it needs to be alerted to and how to

117

00:10:33,300  -->  00:10:38,190
route those to the appropriate target
for processing so that's where rules

118

00:10:38,190  -->  00:10:43,230
come along where we define the rules for
CloudWatch to generate an event when

119

00:10:43,230  -->  00:10:47,910
that situation occurs and so that will
route them to one or more targets now a

120

00:10:47,910  -->  00:10:54,540
target can be a Lambda function, it can
be an SNS topic, it could be an SQS queue,

121

00:10:54,540  -->  00:10:59,430
it could be a Kinesis stream or it could
be built-in targets and an example of a

122

00:10:59,430  -->  00:11:04,290
built-in target would be to terminate an
instance, so, you know,

123

00:11:04,290  -->  00:11:07,160
so when someone logs in as root we
might

124

00:11:07,160  -->  00:11:11,990
have an instance there and we want to terminate
the instance or whatever we do so that's

125

00:11:11,990  -->  00:11:18,410
an example of a built-in target that we
can use, so now we'll just move on

126

00:11:18,410  -->  00:11:22,610
to some hands-on stuff and we've got the
theory out of the way for now so let's

127

00:11:22,610  -->  00:11:25,149
move on