0
1
00:00:00,450 --> 00:00:05,250
Please note that this content is
targeted for SysOps administrators. If
1

2
00:00:05,250 --> 00:00:09,840
you're a Solutions Architect or a
developer, you may want to skip over this
2

3
00:00:09,840 --> 00:00:16,379
one. Welcome back to BackSpace
Academy. In this hands-on lecture I'm
3

4
00:00:16,379 --> 00:00:21,600
just going to run through some of the
techniques that are available to us for
4

5
00:00:21,600 --> 00:00:25,949
troubleshooting our EC2 instances. So I'm
just starting off here in the EC2
5

6
00:00:25,949 --> 00:00:30,390
dashboard. What I'm going to look at here
is on the left-hand side, where we can see we
6

7
00:00:30,390 --> 00:00:34,859
have our limits. So if we click on that, it's
going to provide us with all of the
7

8
00:00:34,859 --> 00:00:40,499
available limits that we have. So we can
see here, on this account that I've got
8

9
00:00:40,499 --> 00:00:46,979
here, I can run five c1.medium
instances but I can only run one
9

10
00:00:46,979 --> 00:00:52,589
c3.4xlarge instance, so
just one of those. So if I try and run
10

11
00:00:52,589 --> 00:00:56,819
more than that it's not going to let me,
so I would have to go through and
11

12
00:00:56,819 --> 00:01:02,399
request a limit increase. So what I'll
do now is I'm just going to try and
12

13
00:01:02,399 --> 00:01:06,840
launch two of these instances. Now, I
don't recommend that you do this
13

14
00:01:06,840 --> 00:01:11,399
yourself because they're very expensive
instances, and if you forget to delete
14

15
00:01:11,399 --> 00:01:14,640
them you're going to end up with a
pretty nasty bill at the end of the
15

16
00:01:14,640 --> 00:01:19,259
month. So just go back to the EC2
dashboard, Launch Instance. I'm just going
16

17
00:01:19,259 --> 00:01:26,100
to grab a Linux AMI, that will be fine.
Let's scroll down to a c4
17

18
00:01:26,100 --> 00:01:35,909
4xlarge. Okay, c4.4xlarge.
I'm just going to review and launch
18

19
00:01:35,909 --> 00:01:38,210
that.
19

20
00:01:44,320 --> 00:01:49,120
So it's allowed me to launch that
instance. It's quite a big one, and I'm
20

21
00:01:49,120 --> 00:01:54,940
just going to try and launch another one
of those. So again, we'll just use a
21

22
00:01:54,940 --> 00:02:08,490
Linux AMI and a c4.4xlarge, and
we'll just review and launch,
22

23
00:02:08,490 --> 00:02:16,000
select our key pair and launch. OK, so the
launch has failed. We've exceeded our
23

24
00:02:16,000 --> 00:02:22,240
limits and it's stopped us from
launching that second instance. So I'll just
24

25
00:02:22,240 --> 00:02:26,890
cancel that before I get a big bill,
and I'll just terminate the one that
25

26
00:02:26,890 --> 00:02:29,670
I've created there.
26

27
00:02:36,950 --> 00:02:43,080
Okay, so what I'll do now is just launch
a smaller instance, and we'll just see
27

28
00:02:43,080 --> 00:02:47,430
what tools are available for us to
troubleshoot that instance. So we'll just
28

29
00:02:47,430 --> 00:02:52,489
launch another instance. Again, we'll
just use Amazon Linux, that will be fine,
29

30
00:02:52,489 --> 00:02:58,849
and a t2.micro will be fine, and I'm just
going to review and launch.
30

31
00:03:09,090 --> 00:03:14,590
So we'll just wait for that to finish
being launched and finish all its
31

32
00:03:14,590 --> 00:03:21,910
status checks. Okay, so after a certain
amount of time we have our instance up
32

33
00:03:21,910 --> 00:03:26,680
and running and its status checks have
passed. Now if we select that instance
33

34
00:03:26,680 --> 00:03:34,150
and we click on the Status Checks tab, we
can see there that yes, it has passed its
34

35
00:03:34,150 --> 00:03:38,140
system status checks and its instance
status checks. Now if you want to
35

36
00:03:38,140 --> 00:03:42,280
understand the difference between the
two, just hover over the information icon
36

37
00:03:42,280 --> 00:03:48,760
there. The system status check
verifies that your instance is reachable,
37

38
00:03:48,760 --> 00:03:54,010
whereas if we look at the instance status
check, that verifies that your
38

39
00:03:54,010 --> 00:04:01,030
instance's operating system is accepting
traffic. Also, it tells you what to do if
39

40
00:04:01,030 --> 00:04:05,230
there is a problem and you fail that
check. So you can see there that if
40

41
00:04:05,230 --> 00:04:09,880
you fail an instance status check, that is,
the operating system is not operating
41

42
00:04:09,880 --> 00:04:16,209
correctly, then you are best off to
reboot that instance. Whereas if you look
42

43
00:04:16,209 --> 00:04:21,430
at the system status check, if something
goes wrong with that, then it's obviously
43

44
00:04:21,430 --> 00:04:26,380
at a system level, and there is a problem
with the underlying host that is running
44

45
00:04:26,380 --> 00:04:31,150
that instance, and so the best
thing to do is to stop and start it or
45

46
00:04:31,150 --> 00:04:40,090
else replace that instance. Now, the next
thing we can look at is the system log for
46

47
00:04:40,090 --> 00:04:47,620
the instance. So we go to Instance
Settings and then Get System Log. So
47

48
00:04:47,620 --> 00:04:52,270
what that is, is the output of the
console of our Linux operating system. So,
48

49
00:04:52,270 --> 00:04:57,520
that is, if we were connected to this
Linux instance, this is what we would see
49

50
00:04:57,520 --> 00:05:02,410
from the console, and so that gives
you a lot of information as to what is
50

51
00:05:02,410 --> 00:05:07,539
going on. So we can scroll down here to
the latest and we can see it's at
51

52
00:05:07,539 --> 00:05:11,260
the login stage there, waiting for a
login to the instance. So just close that,
52

53
00:05:11,260 --> 00:05:17,050
and what we can also do is we can go
again into Instance Settings and we
53

54
00:05:17,050 --> 00:05:22,210
can get a screenshot of the console of
that instance
54

55
00:05:22,210 --> 00:05:30,090
as it is right now. And so there we can
see it's waiting for a login, and
55

56
00:05:30,090 --> 00:05:35,620
we can refresh that if we want as well
and get a later one. And so these are
56

57
00:05:35,620 --> 00:05:40,449
quite good because they allow you
to see exactly what has been going on
57

58
00:05:40,449 --> 00:05:45,280
from a historical perspective through
the boot-up stage, and also what the
58

59
00:05:45,280 --> 00:05:50,139
console is telling you right
at this point in time. Just close out of
59

60
00:05:50,139 --> 00:05:58,380
that. Now, if we find that we have a
problem with an instance status check,
60

61
00:05:58,380 --> 00:06:04,360
what we can do is we can implement a
CloudWatch action or a CloudWatch
61

62
00:06:04,360 --> 00:06:09,340
alarm. So we just go into CloudWatch
Monitoring and Add/Edit Alarms and
62

63
00:06:09,340 --> 00:06:14,440
Create Alarm. So what we can do is we
obviously can send a notification, as we
63

64
00:06:14,440 --> 00:06:19,319
can do with most CloudWatch alarms, but
we can also take an action on that
64

65
00:06:19,319 --> 00:06:24,699
instance. So if we click on Take Action, we
can actually recover that instance. So if
65

66
00:06:24,699 --> 00:06:30,490
it's failed a status check
then we can automatically repair this
66

67
00:06:30,490 --> 00:06:36,250
instance in the event of that. But you
can only recover certain types, not all
67

68
00:06:36,250 --> 00:06:41,080
EC2 instance types, but you can
recover t2 and most of the common types
68

69
00:06:41,080 --> 00:06:47,560
of EC2 instance. You can also take an
action to stop the instance. Now if you
69

70
00:06:47,560 --> 00:06:52,030
want to stop the instance as opposed to
recover the instance, the CloudWatch
70

71
00:06:52,030 --> 00:06:56,680
service needs to have an IAM role to do
that, so you need to select here Create
71

72
00:06:56,680 --> 00:07:04,750
IAM Role. And you can only stop an
instance if it's an EBS volume
72

73
00:07:04,750 --> 00:07:10,060
backed instance, so you can't stop
an instance that is instance store backed,
73

74
00:07:10,060 --> 00:07:14,620
it will just need to be terminated. So
then we've got Terminate Instance as
74

75
00:07:14,620 --> 00:07:19,150
well, and we can also reboot that
instance, but again we need to create or
75

76
00:07:19,150 --> 00:07:23,710
we need to have an IAM role created for
that. So what we can do is, if we would
76

77
00:07:23,710 --> 00:07:29,050
like to have different stages that kick
in, we can first off try and recover the
77

78
00:07:29,050 --> 00:07:33,909
instance if it's failed for so many
consecutive periods, and we can create
78

79
00:07:33,909 --> 00:07:37,960
an alarm for that,
and then we can look at rebooting the
79

80
00:07:37,960 --> 00:07:43,030
instance, and we can do that after, say,
you know, ten consecutive periods,
80

81
00:07:43,030 --> 00:07:49,120
and then we can create another
alarm to terminate that instance,
81

82
00:07:49,120 --> 00:07:53,200
which would be very handy if it's in an
Auto Scaling group, so you can terminate
82

83
00:07:53,200 --> 00:07:59,740
that and then another instance can be
created quite quickly for you. So I'll
83

84
00:07:59,740 --> 00:08:08,290
just get out of that now. So just back
here in our Status Checks tab, we can see
84

85
00:08:08,290 --> 00:08:13,030
here there is also a link to create a
status check alarm, and that is the same
85

86
00:08:13,030 --> 00:08:18,520
alarm that we were talking about before, and so
we can take an action to recover the
86

87
00:08:18,520 --> 00:08:21,730
instance. Now if we want to terminate an
instance, one thing I didn't mention
87

88
00:08:21,730 --> 00:08:27,100
before is that we need to make sure that
termination protection on that instance
88

89
00:08:27,100 --> 00:08:32,740
is disabled, otherwise CloudWatch won't be
able to terminate that instance. So
89

90
00:08:32,740 --> 00:08:38,410
that's about all I need to tell you
about troubleshooting EC2. Coming up
90

91
00:08:38,410 --> 00:08:42,219
next we'll have the same for
troubleshooting RDS. I'll see you in the
91

92
00:08:42,219 --> 00:08:44,580
next lesson.
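
As a rough companion to the staged alarms walked through in this lecture (recover first, then reboot, then terminate), here is a minimal Python sketch that only builds the settings for each alarm and does not contact AWS. The function names, the instance ID, the region, and the period counts of 2 and 15 are assumptions made up for illustration; only the ten periods for the reboot stage comes from the lecture. The metric names and action ARNs follow the pattern CloudWatch uses for EC2 alarm actions, as far as I know.

```python
# Sketch only: builds keyword arguments shaped for CloudWatch's PutMetricAlarm.
# Nothing here calls AWS. Instance ID, region and most period counts are hypothetical.

def status_check_alarm(instance_id, region, action, metric, periods):
    """Return alarm settings for one stage of the escalation."""
    return {
        "AlarmName": f"{instance_id}-{action}",
        "Namespace": "AWS/EC2",
        "MetricName": metric,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,                  # seconds per evaluation period
        "EvaluationPeriods": periods,  # consecutive failed periods before acting
        "Threshold": 1.0,              # the metric is 1 when the check fails
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:{action}"],
    }


def staged_alarms(instance_id, region):
    """Recover on system check failure, then reboot, then terminate."""
    stages = [
        ("recover", "StatusCheckFailed_System", 2),     # 2 is an assumed value
        ("reboot", "StatusCheckFailed_Instance", 10),   # ten periods, as in the lecture
        ("terminate", "StatusCheckFailed", 15),         # 15 is an assumed value
    ]
    return [
        status_check_alarm(instance_id, region, action, metric, periods)
        for action, metric, periods in stages
    ]


if __name__ == "__main__":
    for alarm in staged_alarms("i-0123456789abcdef0", "us-east-1"):
        print(alarm["AlarmName"], alarm["MetricName"], alarm["EvaluationPeriods"])
```

If you wanted to actually create these alarms, each dictionary could be passed to `put_metric_alarm` on a boto3 CloudWatch client, for example `boto3.client("cloudwatch").put_metric_alarm(**alarm)`, assuming credentials and any required IAM role are already in place.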