1
00:00:00,120 --> 00:00:02,130
In this video, we're going to discuss collisions

2
00:00:02,130 --> 00:00:04,410
and broadcast storms, how to identify them

3
00:00:04,410 --> 00:00:06,060
and how to overcome them.

4
00:00:06,060 --> 00:00:07,560
First, collisions.

5
00:00:07,560 --> 00:00:10,230
A collision occurs on your network when something happens

6
00:00:10,230 --> 00:00:13,350
to the data as it's sent through the physical network medium

7
00:00:13,350 --> 00:00:15,870
and it's prevented from reaching its final destination.

8
00:00:15,870 --> 00:00:18,420
Now, most of the time a collision occurs when two hosts

9
00:00:18,420 --> 00:00:21,090
on the network are transmitting at the same time,

10
00:00:21,090 --> 00:00:23,010
and therefore their signals get combined

11
00:00:23,010 --> 00:00:25,620
on the network medium and become unreadable.

12
00:00:25,620 --> 00:00:27,720
Now, a collision can occur in both wired

13
00:00:27,720 --> 00:00:29,190
and wireless networks.

14
00:00:29,190 --> 00:00:31,530
If you think back to earlier lessons on Ethernet,

15
00:00:31,530 --> 00:00:34,170
you learned how CSMA/CD works

16
00:00:34,170 --> 00:00:37,950
or on wireless networks, how we use CSMA/CA.

17
00:00:37,950 --> 00:00:39,450
You remember that collisions are possible

18
00:00:39,450 --> 00:00:41,220
on both of these types of networks,

19
00:00:41,220 --> 00:00:43,650
and we have to figure out a way to get through them.

20
00:00:43,650 --> 00:00:45,270
Now, to prevent collisions,

21
00:00:45,270 --> 00:00:46,650
you need to architect your networks

22
00:00:46,650 --> 00:00:48,330
with smaller collision domains

23
00:00:48,330 --> 00:00:51,390
because this decreases the chance of a collision happening.

24
00:00:51,390 --> 00:00:53,430
A collision domain is a network segment

25
00:00:53,430 --> 00:00:56,280
connected by a shared medium or through repeaters

26
00:00:56,280 --> 00:00:57,870
where simultaneous data transmissions

27
00:00:57,870 --> 00:01:00,090
can collide with one another.

28
00:01:00,090 --> 00:01:02,700
Now, if we connect all of our devices to a hub,

29
00:01:02,700 --> 00:01:04,290
all of those devices are going to share

30
00:01:04,290 --> 00:01:06,090
a single collision domain.

31
00:01:06,090 --> 00:01:07,770
Similarly, if we're all connected

32
00:01:07,770 --> 00:01:09,570
to the same wireless access point,

33
00:01:09,570 --> 00:01:11,280
we're all going to be treated as being a part

34
00:01:11,280 --> 00:01:13,050
of the same collision domain.

35
00:01:13,050 --> 00:01:14,760
To break apart those collision domains

36
00:01:14,760 --> 00:01:16,470
into smaller collision domains,

37
00:01:16,470 --> 00:01:18,630
we need to use any Layer 2 device,

38
00:01:18,630 --> 00:01:20,610
like a switch or a bridge.

39
00:01:20,610 --> 00:01:22,800
Now, when we replace a hub with a switch,

40
00:01:22,800 --> 00:01:25,320
each switch port becomes its own collision domain,

41
00:01:25,320 --> 00:01:27,330
and this completely prevents all the collisions

42
00:01:27,330 --> 00:01:30,330
from occurring on that switch port between the switch

43
00:01:30,330 --> 00:01:31,560
and that client device

44
00:01:31,560 --> 00:01:33,510
if you're connecting it directly to it.

45
00:01:33,510 --> 00:01:36,870
So how can you detect collisions and why are they so bad?

46
00:01:36,870 --> 00:01:38,160
Well, the first indication

47
00:01:38,160 --> 00:01:40,140
that you might have excessive collisions

48
00:01:40,140 --> 00:01:42,930
is when your network performance starts to go bad.

49
00:01:42,930 --> 00:01:44,790
After all, anytime you have collisions

50
00:01:44,790 --> 00:01:46,350
on an Ethernet-based network,

51
00:01:46,350 --> 00:01:48,720
the devices are going to pick a random backoff timer,

52
00:01:48,720 --> 00:01:50,460
and then they're going to retransmit.

53
00:01:50,460 --> 00:01:51,990
So if you have a collision,

54
00:01:51,990 --> 00:01:54,210
it now requires two additional transmissions

55
00:01:54,210 --> 00:01:56,640
to get that data sent back out from the sources

56
00:01:56,640 --> 00:01:58,050
to those destinations.

57
00:01:58,050 --> 00:01:59,490
If you have a lot of collisions,

58
00:01:59,490 --> 00:02:01,170
this causes a steep decline

59
00:02:01,170 --> 00:02:03,240
in the performance of your network's throughput.

60
00:02:03,240 --> 00:02:05,550
Another way you can more accurately determine if collisions

61
00:02:05,550 --> 00:02:06,780
are occurring on your network

62
00:02:06,780 --> 00:02:09,660
is to run the show interface command on your network device,

63
00:02:09,660 --> 00:02:10,979
and then look at the statistics

64
00:02:10,979 --> 00:02:12,600
for the different switch ports.

65
00:02:12,600 --> 00:02:14,910
Let's take a look at a few examples here.

66
00:02:14,910 --> 00:02:16,620
First, let's look at this example

67
00:02:16,620 --> 00:02:19,140
where we can see the collision counter is increasing.

68
00:02:19,140 --> 00:02:21,030
We expect to see zero collisions,

69
00:02:21,030 --> 00:02:23,280
but as the collisions begin to occur,

70
00:02:23,280 --> 00:02:26,430
the interface statistics are going to start climbing upwards.

71
00:02:26,430 --> 00:02:28,050
If you're using hubs in your network,

72
00:02:28,050 --> 00:02:30,660
you're going to have some collisions occur, and that's fine.

73
00:02:30,660 --> 00:02:31,710
This only becomes a problem

74
00:02:31,710 --> 00:02:33,480
when you start to have too many collisions

75
00:02:33,480 --> 00:02:35,940
and your network performance starts to deteriorate.

76
00:02:35,940 --> 00:02:38,370
That said, if you're running a switch-based network

77
00:02:38,370 --> 00:02:40,740
and you really should be, then you shouldn't have

78
00:02:40,740 --> 00:02:42,960
collisions occurring, and this would be an indication

79
00:02:42,960 --> 00:02:45,510
that something is not working the way you designed it.

80
00:02:45,510 --> 00:02:48,600
Next, we could also see the deferred counter increasing.

81
00:02:48,600 --> 00:02:51,120
Now the deferred counter is going to count the number of times

82
00:02:51,120 --> 00:02:53,220
the interface has tried to send a frame,

83
00:02:53,220 --> 00:02:56,130
but it found the carrier busy at the first attempt.

84
00:02:56,130 --> 00:02:57,870
This is called carrier sensing.

85
00:02:57,870 --> 00:02:59,610
Again, if you're running a switch,

86
00:02:59,610 --> 00:03:01,890
you should not see a deferred counter rising

87
00:03:01,890 --> 00:03:04,200
because nobody should be waiting to transmit.

88
00:03:04,200 --> 00:03:06,180
Now, if you're using a hub-based network,

89
00:03:06,180 --> 00:03:08,460
this is going to be a normal part of your network operations

90
00:03:08,460 --> 00:03:10,320
and it shouldn't be a concern.

91
00:03:10,320 --> 00:03:12,150
Next, we have late collisions.

92
00:03:12,150 --> 00:03:14,040
Now, this occurs when a collision is detected

93
00:03:14,040 --> 00:03:18,420
after 5.12 microseconds (on a 100 Mbps link), which is the amount of time it takes

94
00:03:18,420 --> 00:03:21,270
for the 512th bit of a frame to be sent.
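As a rough check on that figure, the 512-bit window can be converted into time for a given link speed. This is only a minimal Python sketch: the function name and the two link speeds are assumptions chosen for illustration, not values shown in this video.
```python
def late_collision_window_us(link_speed_bps: float, window_bits: int = 512) -> float:
    """Return the time, in microseconds, needed to send window_bits at link_speed_bps."""
    return window_bits / link_speed_bps * 1_000_000
# 512 bit times on an assumed 100 Mbps link
fast_ethernet_us = late_collision_window_us(100_000_000)
# 512 bit times on an assumed 10 Mbps link
ethernet_us = late_collision_window_us(10_000_000)
print(f"100 Mbps: {fast_ethernet_us:.2f} microseconds")
print(f"10 Mbps: {ethernet_us:.2f} microseconds")
```
A collision seen after this window on the matching link speed would count as a late collision.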

95
00:03:21,270 --> 00:03:23,460
This is displayed under the late collision counter

96
00:03:23,460 --> 00:03:25,170
in the interface statistics.

97
00:03:25,170 --> 00:03:27,660
A late collision by itself indicates a problem

98
00:03:27,660 --> 00:03:29,700
but not the root cause.

99
00:03:29,700 --> 00:03:32,790
Instead, the cause is usually an incorrect cable being used,

100
00:03:32,790 --> 00:03:34,350
a bad network interface card

101
00:03:34,350 --> 00:03:36,750
or the use of too many hubs on the network.

102
00:03:36,750 --> 00:03:38,850
Finally, we have excessive collisions.

103
00:03:38,850 --> 00:03:40,650
Basically, there is a limit to the number

104
00:03:40,650 --> 00:03:42,960
of times a device can back off from transmitting

105
00:03:42,960 --> 00:03:46,260
and wait before retransmitting when it experiences a collision.

106
00:03:46,260 --> 00:03:47,460
When the collision occurs,

107
00:03:47,460 --> 00:03:49,140
it's going to choose a backoff timer,

108
00:03:49,140 --> 00:03:51,060
and then it tries retransmitting again.

109
00:03:51,060 --> 00:03:52,590
If it detects another collision,

110
00:03:52,590 --> 00:03:55,350
it picks a new backoff timer and tries again.

111
00:03:55,350 --> 00:03:58,080
It'll keep doing this up to 16 times,

112
00:03:58,080 --> 00:04:00,600
but on the 16th time, it's going to give up

113
00:04:00,600 --> 00:04:02,700
and simply just drop that frame.

114
00:04:02,700 --> 00:04:05,100
In this case, it's marked by the interface statistic

115
00:04:05,100 --> 00:04:06,870
as an excessive collision.
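The retry loop just described could be sketched roughly like this in Python. It assumes the standard truncated binary exponential backoff used by Ethernet, where the random range stops doubling after 10 collisions; that cap, and the names backoff_slots, send_frame, and medium_collides, are illustrative assumptions rather than anything shown in this video.
```python
import random
MAX_ATTEMPTS = 16  # the frame is dropped after the 16th failed attempt
BACKOFF_CAP = 10   # assumed cap: the random range stops doubling after 10 collisions
def backoff_slots(collision_count: int, rng: random.Random) -> int:
    """Pick a random wait, in slot times, after the given number of collisions."""
    exponent = min(collision_count, BACKOFF_CAP)
    return rng.randint(0, 2 ** exponent - 1)
def send_frame(medium_collides, rng: random.Random) -> tuple[bool, int]:
    """Try to send one frame and return (delivered, attempts_used).
    medium_collides is a hypothetical callable that returns True on a collision."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not medium_collides():
            return True, attempt
        if attempt < MAX_ATTEMPTS:
            # A real interface would stay idle for this many slot times before retrying.
            _wait = backoff_slots(attempt, rng)
    return False, MAX_ATTEMPTS  # this is what gets counted as an excessive collision
rng = random.Random(0)
dropped = send_frame(lambda: True, rng)     # medium always collides, so the frame is dropped
delivered = send_frame(lambda: False, rng)  # clear medium, so the first attempt succeeds
print(dropped, delivered)
```
The always-colliding case is the one that ends up in the excessive collision counter.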

116
00:04:06,870 --> 00:04:08,760
Now, if you want to display the exact number

117
00:04:08,760 --> 00:04:11,070
of excessive collisions, you can enter the command

118
00:04:11,070 --> 00:04:14,040
show controller ethernet on your network platform,

119
00:04:14,040 --> 00:04:16,740
and the excessive collision counters will be displayed.

120
00:04:16,740 --> 00:04:18,720
If you're experiencing excessive collisions,

121
00:04:18,720 --> 00:04:21,060
this is going to indicate a problem in the network.

122
00:04:21,060 --> 00:04:22,320
Usually, this is caused

123
00:04:22,320 --> 00:04:24,660
by devices using full duplex communication

124
00:04:24,660 --> 00:04:27,090
over a shared Ethernet segment, like a hub,

125
00:04:27,090 --> 00:04:29,460
or you have a broken network interface card,

126
00:04:29,460 --> 00:04:31,710
or you simply have too many clients connected

127
00:04:31,710 --> 00:04:33,480
to the same collision domain.

128
00:04:33,480 --> 00:04:35,550
To overcome an excessive collision issue,

129
00:04:35,550 --> 00:04:37,710
you should turn off auto negotiation for the speed

130
00:04:37,710 --> 00:04:40,140
and duplex of an interface, hardcode the speed

131
00:04:40,140 --> 00:04:42,480
to a lower setting, and change the duplex

132
00:04:42,480 --> 00:04:44,940
to half duplex instead of full duplex.

133
00:04:44,940 --> 00:04:47,040
These speed and duplex settings can be configured

134
00:04:47,040 --> 00:04:49,800
on the networking device or on the client itself

135
00:04:49,800 --> 00:04:52,530
within the Windows, Linux, Unix, and OSX

136
00:04:52,530 --> 00:04:55,740
operating systems under the network adapter settings.

137
00:04:55,740 --> 00:04:58,500
Next, we need to talk about broadcast storms.

138
00:04:58,500 --> 00:05:00,840
Now, a broadcast storm occurs when a network system

139
00:05:00,840 --> 00:05:04,890
is overwhelmed by continuous multicast or broadcast traffic.

140
00:05:04,890 --> 00:05:06,960
Broadcast storms are dangerous to your network

141
00:05:06,960 --> 00:05:08,700
because they can quickly overwhelm switches

142
00:05:08,700 --> 00:05:10,680
and other devices as they struggle to keep up

143
00:05:10,680 --> 00:05:13,470
with the flood of packets that's trying to get processed.

144
00:05:13,470 --> 00:05:14,970
When a broadcast storm occurs,

145
00:05:14,970 --> 00:05:17,580
your network performance is going to decrease rapidly,

146
00:05:17,580 --> 00:05:20,190
and the worst case is it can cause a complete denial

147
00:05:20,190 --> 00:05:21,930
of service in your network.

148
00:05:21,930 --> 00:05:24,030
Remember, a broadcast packet is addressed

149
00:05:24,030 --> 00:05:26,520
at both Layer 2 and Layer 3.

150
00:05:26,520 --> 00:05:27,840
On Layer 2 devices,

151
00:05:27,840 --> 00:05:32,550
the address is FF:FF:FF:FF:FF:FF.

152
00:05:32,550 --> 00:05:33,960
Now if it's at Layer 3,

153
00:05:33,960 --> 00:05:35,730
you're going to see the IP address used

154
00:05:35,730 --> 00:05:39,720
of 255.255.255.255.

155
00:05:39,720 --> 00:05:42,120
Now, a broadcast domain is a logical division

156
00:05:42,120 --> 00:05:44,250
of a computer network in which all of your nodes

157
00:05:44,250 --> 00:05:46,230
can reach each other using the broadcast

158
00:05:46,230 --> 00:05:49,050
at the data link layer, which is Layer 2.

159
00:05:49,050 --> 00:05:51,690
Now, a broadcast domain can be within the same local area

160
00:05:51,690 --> 00:05:53,880
network segment, or it can be bridged

161
00:05:53,880 --> 00:05:54,713
to other local area network segments as well.

162
00:05:54,713 --> 00:05:59,430
Remember, a switch or other Layer 2 device will not break up

163
00:05:59,430 --> 00:06:02,670
broadcast domains because they bridge these things together.

164
00:06:02,670 --> 00:06:04,560
Now instead, you have to use a router

165
00:06:04,560 --> 00:06:07,470
or a Layer 3 switch to break up the broadcast domain

166
00:06:07,470 --> 00:06:09,420
into smaller broadcast domains.

167
00:06:09,420 --> 00:06:11,700
In general, there are just a few main causes

168
00:06:11,700 --> 00:06:14,160
for broadcast storms occurring in your network.

169
00:06:14,160 --> 00:06:16,290
First, you have a singular broadcast domain

170
00:06:16,290 --> 00:06:17,910
that's just way too large.

171
00:06:17,910 --> 00:06:19,530
In this case, the number of clients

172
00:06:19,530 --> 00:06:21,240
will simply create a broadcast storm

173
00:06:21,240 --> 00:06:23,310
when they're conducting their normal operations.

174
00:06:23,310 --> 00:06:25,560
For example, say you have a large enterprise network

175
00:06:25,560 --> 00:06:26,850
that has numerous switches,

176
00:06:26,850 --> 00:06:28,560
all interconnecting the network.

177
00:06:28,560 --> 00:06:31,050
This is going to create a really large broadcast domain

178
00:06:31,050 --> 00:06:33,870
because again, switches don't break apart broadcast domains

179
00:06:33,870 --> 00:06:35,100
like a router does.

180
00:06:35,100 --> 00:06:37,350
So if you have a large broadcast domain set up

181
00:06:37,350 --> 00:06:42,350
using Class B private address ranges like 172.16.0.0/16,

182
00:06:43,410 --> 00:06:47,700
you could have up to 65,534 usable IP addresses

183
00:06:47,700 --> 00:06:50,130
and hosts on that one broadcast domain,

184
00:06:50,130 --> 00:06:52,620
and that's a really large broadcast domain.

185
00:06:52,620 --> 00:06:55,140
Instead, you should break up that broadcast domain

186
00:06:55,140 --> 00:06:58,440
by subnetting out the Class B network into smaller networks

187
00:06:58,440 --> 00:07:00,420
to reduce the number of broadcasts being generated

188
00:07:00,420 --> 00:07:02,020
by those clients on the network.

189
00:07:03,326 --> 00:07:04,560
Between each subnet you're then going to use a router

190
00:07:04,560 --> 00:07:05,790
or a Layer 3 switch

191
00:07:05,790 --> 00:07:09,180
to break up those subnets into separate broadcast domains.
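To put numbers on that, here is a minimal Python sketch using the standard ipaddress module. The /24 subnet size is only an assumed example; the right size depends on your own network design.
```python
import ipaddress
big_domain = ipaddress.ip_network("172.16.0.0/16")
# Subtract the network and broadcast addresses to get usable hosts.
usable_hosts = big_domain.num_addresses - 2
# Split the /16 into /24 subnets (an assumed example size).
subnets = list(big_domain.subnets(new_prefix=24))
hosts_per_subnet = subnets[0].num_addresses - 2
print(usable_hosts)
print(len(subnets), hosts_per_subnet)
print(subnets[0], subnets[-1])
```
Each of those smaller subnets would then sit behind a router or Layer 3 switch interface as its own broadcast domain.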

192
00:07:09,180 --> 00:07:11,520
Now, our second cause of broadcast storms occurs

193
00:07:11,520 --> 00:07:12,570
when we have a large volume

194
00:07:12,570 --> 00:07:15,840
of DHCP requests on a given broadcast domain.

195
00:07:15,840 --> 00:07:18,240
Whenever a new host connects to a broadcast domain,

196
00:07:18,240 --> 00:07:20,220
it attempts to get an IP address assignment

197
00:07:20,220 --> 00:07:21,780
using the DORA process

198
00:07:21,780 --> 00:07:24,450
of Discover, Offer, Request, and Acknowledge

199
00:07:24,450 --> 00:07:26,370
using the DHCP protocol.

200
00:07:26,370 --> 00:07:29,130
Now, the D or Discover part of this process happens

201
00:07:29,130 --> 00:07:30,510
as a broadcast packet

202
00:07:30,510 --> 00:07:32,910
because the new network client doesn't know the location

203
00:07:32,910 --> 00:07:34,770
of the DHCP server yet,

204
00:07:34,770 --> 00:07:37,590
so this can create a storm of broadcast packets

205
00:07:37,590 --> 00:07:39,810
if the network has a lot of clients trying to negotiate

206
00:07:39,810 --> 00:07:42,270
for an IP address all at the same time.

207
00:07:42,270 --> 00:07:44,550
For example, let's say your network device reboots

208
00:07:44,550 --> 00:07:45,750
and all the clients attempt

209
00:07:45,750 --> 00:07:47,820
to renegotiate their DHCP leases.

210
00:07:47,820 --> 00:07:50,040
That can cause a broadcast storm.

211
00:07:50,040 --> 00:07:51,960
If you see a broadcast storm is being caused

212
00:07:51,960 --> 00:07:53,820
by your DHCP configurations,

213
00:07:53,820 --> 00:07:56,220
you need to check if you're using DHCP relays

214
00:07:56,220 --> 00:07:57,840
between your different VLANs.

215
00:07:57,840 --> 00:08:00,240
If you are, you're essentially allowing these VLANs

216
00:08:00,240 --> 00:08:03,270
to be treated as a single broadcast domain for the purposes

217
00:08:03,270 --> 00:08:06,510
of DHCP discovery, and this can lead to broadcast storms

218
00:08:06,510 --> 00:08:09,390
if you have a very large network with a lot of clients.

219
00:08:09,390 --> 00:08:12,030
The third cause of broadcast storms on a network

220
00:08:12,030 --> 00:08:13,740
occurs when loops are created

221
00:08:13,740 --> 00:08:15,540
inside of a switching environment.

222
00:08:15,540 --> 00:08:17,310
If you happen to have unmanaged switches

223
00:08:17,310 --> 00:08:19,380
and somebody accidentally cables them together,

224
00:08:19,380 --> 00:08:21,270
this can create an unintentional loop

225
00:08:21,270 --> 00:08:23,280
and this can lead to broadcast storms.

226
00:08:23,280 --> 00:08:26,340
To prevent these, make sure you're enabling BPDUs

227
00:08:26,340 --> 00:08:29,550
or Bridge Protocol Data Units on your managed switches.

228
00:08:29,550 --> 00:08:31,740
Also, you should enforce a maximum number

229
00:08:31,740 --> 00:08:33,330
of MAC addresses per port

230
00:08:33,330 --> 00:08:36,090
because this will also shut down a port if a broadcast storm

231
00:08:36,090 --> 00:08:38,280
starts to go on through that port.

232
00:08:38,280 --> 00:08:41,850
So how can you identify if you're having a broadcast storm?

233
00:08:41,850 --> 00:08:43,590
Well, one of the easiest methods

234
00:08:43,590 --> 00:08:45,300
is to look at your packet counters.

235
00:08:45,300 --> 00:08:47,340
If you know your normal baseline for your network,

236
00:08:47,340 --> 00:08:49,860
and now you see it rapidly increasing way faster

237
00:08:49,860 --> 00:08:52,170
than you normally do, this could be an indication

238
00:08:52,170 --> 00:08:53,640
of a broadcast storm.

239
00:08:53,640 --> 00:08:55,560
For example, let's say you're running a network

240
00:08:55,560 --> 00:08:58,320
where you're used to seeing about 10,000 packets per second,

241
00:08:58,320 --> 00:09:01,230
and now you're seeing 100,000 packets per second.

242
00:09:01,230 --> 00:09:04,170
This could be indicative of a potential broadcast storm.

243
00:09:04,170 --> 00:09:06,600
In this example, you can see our average broadcast

244
00:09:06,600 --> 00:09:08,850
is around 8,700 packets per second,

245
00:09:08,850 --> 00:09:12,120
and we rarely go above 20,000 broadcast packets per second.

246
00:09:12,120 --> 00:09:15,360
So if I see a spike upwards to 100,000 packets

247
00:09:15,360 --> 00:09:16,560
or more per second,

248
00:09:16,560 --> 00:09:19,410
I would know I have a potential broadcast storm on my hands.
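That baseline comparison could be sketched roughly like this in Python. The function name looks_like_storm and the tenfold threshold are assumptions made for illustration; the right threshold depends on your own baseline.
```python
def looks_like_storm(current_pps: float, baseline_pps: float, factor: float = 10.0) -> bool:
    """Flag a rate that is at least `factor` times the normal baseline."""
    return current_pps >= baseline_pps * factor
baseline_pps = 8_700  # average broadcast rate from the example
normal_peak = looks_like_storm(20_000, baseline_pps)  # a rare but normal peak
big_spike = looks_like_storm(100_000, baseline_pps)   # the spike from the example
print(normal_peak, big_spike)
```
A flag like this is only a hint to go look closer, not proof of a storm.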

249
00:09:19,410 --> 00:09:21,750
Now, another way is to look at your network monitoring tools

250
00:09:21,750 --> 00:09:24,030
and determine the packet loss on your network.

251
00:09:24,030 --> 00:09:25,950
If a broadcast storm starts to occur,

252
00:09:25,950 --> 00:09:28,170
your network devices are going to have a hard time keeping up

253
00:09:28,170 --> 00:09:30,750
with the processing of all the packets on the network,

254
00:09:30,750 --> 00:09:34,110
and so packet loss will rise sharply.

255
00:09:34,110 --> 00:09:35,400
Now, the most definitive way

256
00:09:35,400 --> 00:09:37,140
to identify a broadcast storm though

257
00:09:37,140 --> 00:09:40,920
is to set up a packet analyzer like Wireshark or tcpdump,

258
00:09:40,920 --> 00:09:43,380
and then you start collecting the traffic on the network.

259
00:09:43,380 --> 00:09:45,870
As you look at that traffic, you're going to look for packets

260
00:09:45,870 --> 00:09:47,610
and see if there's a lot of broadcast packets

261
00:09:47,610 --> 00:09:48,840
that are occurring rapidly.

262
00:09:48,840 --> 00:09:51,600
If so, you're suffering from a broadcast storm.

263
00:09:51,600 --> 00:09:54,210
In this example, you see a Layer 2 broadcast storm

264
00:09:54,210 --> 00:09:55,830
occurring due to an enormous amount

265
00:09:55,830 --> 00:09:58,890
of ARP broadcasts being sent on this network.

266
00:09:58,890 --> 00:10:01,470
Now, in this example, I'm using tcpdump

267
00:10:01,470 --> 00:10:03,870
to see all the broadcast traffic on the network.

268
00:10:03,870 --> 00:10:07,650
To do this, you're going to run the command tcpdump -i,

269
00:10:07,650 --> 00:10:12,300
and then your interface, ether broadcast or ether multicast.

270
00:10:12,300 --> 00:10:15,240
This will run tcpdump and only display broadcast

271
00:10:15,240 --> 00:10:17,730
and multicast packets to the screen.
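The destination address test behind a filter like that could be sketched roughly like this in Python. This is not how tcpdump itself is implemented; classify_destination is a hypothetical helper, and the sample addresses are made up for illustration.
```python
def classify_destination(mac: str) -> str:
    """Classify a destination MAC address as broadcast, multicast, or unicast."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    if octets == b"\xff" * 6:
        return "broadcast"  # FF:FF:FF:FF:FF:FF
    if octets[0] & 0x01:
        return "multicast"  # lowest bit of the first octet marks a group address
    return "unicast"
for sample in ("FF:FF:FF:FF:FF:FF", "01:00:5e:00:00:fb", "00:1a:2b:3c:4d:5e"):
    print(sample, classify_destination(sample))
```
Only the first two sample addresses would show up in a capture limited to broadcast and multicast traffic.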

272
00:10:17,730 --> 00:10:20,520
In this example, you can see there are a lot of ARP requests

273
00:10:20,520 --> 00:10:22,530
that are going on because there's a broadcast storm

274
00:10:22,530 --> 00:10:24,000
that's starting to happen.

275
00:10:24,000 --> 00:10:25,260
Remember, the best way

276
00:10:25,260 --> 00:10:27,360
to prevent a broadcast storm is by setting up

277
00:10:27,360 --> 00:10:29,970
loop prevention, like using BPDUs,

278
00:10:29,970 --> 00:10:31,410
limiting the number of MAC addresses

279
00:10:31,410 --> 00:10:33,060
that can access a given switch port,

280
00:10:33,060 --> 00:10:35,010
and breaking up large broadcast domains

281
00:10:35,010 --> 00:10:36,570
into smaller broadcast domains

282
00:10:36,570 --> 00:10:38,883
using routers and multilayer switches.