1 00:00:00,120 --> 00:00:02,130 In this video, we're going to discuss collisions 2 00:00:02,130 --> 00:00:04,410 and broadcast storms, how to identify 'em 3 00:00:04,410 --> 00:00:06,060 and how to overcome them. 4 00:00:06,060 --> 00:00:07,560 First, collisions. 5 00:00:07,560 --> 00:00:10,230 A collision occurs on your network when something happens 6 00:00:10,230 --> 00:00:13,350 to the data as it's sent through the physical network medium 7 00:00:13,350 --> 00:00:15,870 and it's prevented from reaching its final destination. 8 00:00:15,870 --> 00:00:18,420 Now, most of the time a collision occurs when two hosts 9 00:00:18,420 --> 00:00:21,090 on the network are transmitting at the same time, 10 00:00:21,090 --> 00:00:23,010 and therefore their signals get combined 11 00:00:23,010 --> 00:00:25,620 on the network medium and become unreadable. 12 00:00:25,620 --> 00:00:27,720 Now, a collision can occur in both wired 13 00:00:27,720 --> 00:00:29,190 and wireless networks. 14 00:00:29,190 --> 00:00:31,530 If you think back to earlier lessons on Ethernet, 15 00:00:31,530 --> 00:00:34,170 you learned how CSMA/CD works 16 00:00:34,170 --> 00:00:37,950 and, on wireless networks, how we use CSMA/CA. 17 00:00:37,950 --> 00:00:39,450 You'll remember that collisions are possible 18 00:00:39,450 --> 00:00:41,220 on both of these types of networks, 19 00:00:41,220 --> 00:00:43,650 and we have to figure out a way to get through them. 20 00:00:43,650 --> 00:00:45,270 Now, to minimize collisions, 21 00:00:45,270 --> 00:00:46,650 you need to architect your networks 22 00:00:46,650 --> 00:00:48,330 with smaller collision domains 23 00:00:48,330 --> 00:00:51,390 because this decreases the chance of a collision happening.
24 00:00:51,390 --> 00:00:53,430 A collision domain is a network segment 25 00:00:53,430 --> 00:00:56,280 connected by a shared medium or through repeaters 26 00:00:56,280 --> 00:00:57,870 where simultaneous data transmissions 27 00:00:57,870 --> 00:01:00,090 can collide with one another. 28 00:01:00,090 --> 00:01:02,700 Now, if we connect all of our devices to a hub, 29 00:01:02,700 --> 00:01:04,290 all of those devices are going to share 30 00:01:04,290 --> 00:01:06,090 a single collision domain. 31 00:01:06,090 --> 00:01:07,770 Similarly, if we're all connected 32 00:01:07,770 --> 00:01:09,570 to the same wireless access point, 33 00:01:09,570 --> 00:01:11,280 we're all going to be treated as being a part 34 00:01:11,280 --> 00:01:13,050 of the same collision domain. 35 00:01:13,050 --> 00:01:14,760 To break apart those collision domains 36 00:01:14,760 --> 00:01:16,470 into smaller collision domains, 37 00:01:16,470 --> 00:01:18,630 we need to use a Layer 2 device, 38 00:01:18,630 --> 00:01:20,610 like a switch or a bridge. 39 00:01:20,610 --> 00:01:22,800 Now, when we replace a hub with a switch, 40 00:01:22,800 --> 00:01:25,320 each switch port becomes its own collision domain, 41 00:01:25,320 --> 00:01:27,330 and if you connect a client device directly 42 00:01:27,330 --> 00:01:30,330 to that switch port at full duplex, this eliminates collisions 43 00:01:30,330 --> 00:01:31,560 between the switch 44 00:01:31,560 --> 00:01:33,510 and that client device. 45 00:01:33,510 --> 00:01:36,870 So how can you detect collisions and why are they so bad? 46 00:01:36,870 --> 00:01:38,160 Well, the first indication 47 00:01:38,160 --> 00:01:40,140 that you might have excessive collisions 48 00:01:40,140 --> 00:01:42,930 is when your network performance starts to degrade.
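To make the hub-versus-switch difference concrete, here is a minimal sketch that counts collision domains. It assumes each device in the list is its own isolated segment (no links between them), and the `Device` class and `count_collision_domains` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Device:
    kind: str          # "hub", "switch", or "bridge"
    ports_in_use: int  # number of ports with something plugged in

def count_collision_domains(devices: list[Device]) -> int:
    """Count collision domains, assuming each device is an isolated segment
    (no links between the devices in the list)."""
    total = 0
    for device in devices:
        if device.kind == "hub":
            # A hub repeats every signal out of every port: one shared domain.
            total += 1
        elif device.kind in ("switch", "bridge"):
            # A switch or bridge gives each port its own collision domain.
            total += device.ports_in_use
        else:
            raise ValueError(f"unknown device kind: {device.kind!r}")
    return total

print(count_collision_domains([Device("hub", 24)]))     # 1
print(count_collision_domains([Device("switch", 24)]))  # 24
```

Swapping a 24-port hub for a 24-port switch takes you from one shared collision domain to 24 separate ones, which is exactly why the switch-based design reduces the chance of collisions.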
49 00:01:42,930 --> 00:01:44,790 After all, anytime you have collisions 50 00:01:44,790 --> 00:01:46,350 on an Ethernet-based network, 51 00:01:46,350 --> 00:01:48,720 the devices are going to pick a random backoff timer, 52 00:01:48,720 --> 00:01:50,460 and then they're going to retransmit. 53 00:01:50,460 --> 00:01:51,990 So if you have a collision, 54 00:01:51,990 --> 00:01:54,210 it now requires two additional transmissions 55 00:01:54,210 --> 00:01:56,640 to get that data resent from the sources 56 00:01:56,640 --> 00:01:58,050 to those destinations. 57 00:01:58,050 --> 00:01:59,490 If you have a lot of collisions, 58 00:01:59,490 --> 00:02:01,170 this causes a sharp decline 59 00:02:01,170 --> 00:02:03,240 in your network's throughput. 60 00:02:03,240 --> 00:02:05,550 A more accurate way to determine if collisions 61 00:02:05,550 --> 00:02:06,780 are occurring on your network 62 00:02:06,780 --> 00:02:09,660 is to run the show interface command on your network device 63 00:02:09,660 --> 00:02:10,979 and then look at the statistics 64 00:02:10,979 --> 00:02:12,600 for the different switch ports. 65 00:02:12,600 --> 00:02:14,910 Let's take a look at a few examples here. 66 00:02:14,910 --> 00:02:16,620 First, let's look at this example 67 00:02:16,620 --> 00:02:19,140 where we can see the collision counter is increasing. 68 00:02:19,140 --> 00:02:21,030 We expect to see zero collisions, 69 00:02:21,030 --> 00:02:23,280 but as the collisions begin to occur, 70 00:02:23,280 --> 00:02:26,430 the interface statistics are going to start climbing. 71 00:02:26,430 --> 00:02:28,050 If you're using hubs in your network, 72 00:02:28,050 --> 00:02:30,660 you're going to have some collisions occur, and that's fine.
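Watching for a rising collision counter can be automated by polling the interface statistics twice and comparing them. The sketch below is a minimal example under stated assumptions: the two text samples only imitate the counter lines of Cisco IOS `show interfaces` output (the real layout varies by platform and software version), and `parse_counters` and `rising_counters` are hypothetical helper names. The same output also carries the late collision and deferred counters, so the sketch picks those up too.

```python
import re

# Sample excerpts imitating "show interfaces" counter lines (assumed format).
SAMPLE_T0 = """
     0 output errors, 12 collisions, 1 interface resets
     0 babbles, 0 late collision, 3 deferred
"""
SAMPLE_T1 = """
     0 output errors, 57 collisions, 1 interface resets
     0 babbles, 2 late collision, 9 deferred
"""

# Counter name -> regex. "collisions" (plural) is the main collision counter;
# "late collision" (singular) is the late-collision counter.
PATTERNS = {
    "collisions": re.compile(r"(\d+) collisions\b"),
    "late_collision": re.compile(r"(\d+) late collision\b"),
    "deferred": re.compile(r"(\d+) deferred\b"),
}

def parse_counters(text: str) -> dict[str, int]:
    """Extract each counter from the text, defaulting to 0 if it is missing."""
    counters = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        counters[name] = int(match.group(1)) if match else 0
    return counters

def rising_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return only the counters that increased between two polls, with their deltas."""
    return {name: after[name] - before[name] for name in before if after[name] > before[name]}

delta = rising_counters(parse_counters(SAMPLE_T0), parse_counters(SAMPLE_T1))
print(delta)  # {'collisions': 45, 'late_collision': 2, 'deferred': 6}
```

On a full-duplex switch port you would expect `delta` to stay empty between polls; any entry is worth investigating.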
73 00:02:30,660 --> 00:02:31,710 This only becomes a problem 74 00:02:31,710 --> 00:02:33,480 when you start to have too many collisions 75 00:02:33,480 --> 00:02:35,940 and your network performance starts to deteriorate. 76 00:02:35,940 --> 00:02:38,370 That said, if you're running a switch-based network, 77 00:02:38,370 --> 00:02:40,740 and you really should be, then you shouldn't have 78 00:02:40,740 --> 00:02:42,960 collisions occurring, and this would be an indication 79 00:02:42,960 --> 00:02:45,510 that something is not working the way you designed it. 80 00:02:45,510 --> 00:02:48,600 Next, we could also see the deferred counter increasing. 81 00:02:48,600 --> 00:02:51,120 Now, the deferred counter counts the number of times 82 00:02:51,120 --> 00:02:53,220 the interface has tried to send a frame 83 00:02:53,220 --> 00:02:56,130 but found the carrier busy on the first attempt. 84 00:02:56,130 --> 00:02:57,870 This is called carrier sensing. 85 00:02:57,870 --> 00:02:59,610 Again, if you're running a switch, 86 00:02:59,610 --> 00:03:01,890 you should not see a deferred counter rising 87 00:03:01,890 --> 00:03:04,200 because nobody should be waiting to transmit. 88 00:03:04,200 --> 00:03:06,180 Now, if you're using a hub-based network, 89 00:03:06,180 --> 00:03:08,460 this is going to be a normal part of your network operations 90 00:03:08,460 --> 00:03:10,320 and it shouldn't be a concern. 91 00:03:10,320 --> 00:03:12,150 Next, we have late collisions. 92 00:03:12,150 --> 00:03:14,040 Now, this occurs when a collision is detected 93 00:03:14,040 --> 00:03:18,420 after the first 512 bits of a frame have been sent, which takes 94 00:03:18,420 --> 00:03:21,270 51.2 microseconds at 10 Mbps or 5.12 microseconds at 100 Mbps. 95 00:03:21,270 --> 00:03:23,460 This is displayed under the late collision counter 96 00:03:23,460 --> 00:03:25,170 in the interface statistics.
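The late-collision threshold is just arithmetic on the 512-bit (64-byte) slot time, so here is a small sketch that converts it to microseconds and classifies a collision. It applies only to 10 Mbps and 100 Mbps half-duplex Ethernet; Gigabit half duplex uses a longer 4096-bit slot time. The function names are hypothetical.

```python
# Slot time for 10 Mbps and 100 Mbps half-duplex Ethernet: 512 bit times (64 bytes).
SLOT_TIME_BITS = 512

def slot_time_microseconds(rate_bps: int) -> float:
    """Time needed to put the first 512 bits on the wire at the given line rate."""
    return SLOT_TIME_BITS * 1_000_000 / rate_bps

def is_late_collision(bits_sent: int) -> bool:
    """A collision detected after the first 512 bits have gone out is a late collision."""
    return bits_sent > SLOT_TIME_BITS

for label, rate in (("10 Mbps", 10_000_000), ("100 Mbps", 100_000_000)):
    print(f"{label}: {slot_time_microseconds(rate)} microseconds")
# 10 Mbps: 51.2 microseconds
# 100 Mbps: 5.12 microseconds
```

A collision seen at bit 100 is a normal collision that CSMA/CD handles; one seen at bit 600 is late, which usually points to a cabling, NIC, or network-length problem.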
97 00:03:25,170 --> 00:03:27,660 A late collision by itself indicates a problem 98 00:03:27,660 --> 00:03:29,700 but not the root cause. 99 00:03:29,700 --> 00:03:32,790 Instead, the cause is usually an incorrect cable being used, 100 00:03:32,790 --> 00:03:34,350 a bad network interface card, 101 00:03:34,350 --> 00:03:36,750 or the use of too many hubs on the network. 102 00:03:36,750 --> 00:03:38,850 Finally, we have excessive collisions. 103 00:03:38,850 --> 00:03:40,650 Basically, there is a limit to the number 104 00:03:40,650 --> 00:03:42,960 of times a device can back off from transmitting 105 00:03:42,960 --> 00:03:46,260 after a collision and then try to retransmit. 106 00:03:46,260 --> 00:03:47,460 When a collision occurs, 107 00:03:47,460 --> 00:03:49,140 the device chooses a backoff timer 108 00:03:49,140 --> 00:03:51,060 and then tries retransmitting. 109 00:03:51,060 --> 00:03:52,590 If it detects another collision, 110 00:03:52,590 --> 00:03:55,350 it picks a new backoff timer and tries again. 111 00:03:55,350 --> 00:03:58,080 It'll keep doing this up to 16 times, 112 00:03:58,080 --> 00:04:00,600 but after the 16th failed attempt, it's going to give up 113 00:04:00,600 --> 00:04:02,700 and simply drop that frame. 114 00:04:02,700 --> 00:04:05,100 In this case, the interface statistics 115 00:04:05,100 --> 00:04:06,870 record it as an excessive collision. 116 00:04:06,870 --> 00:04:08,760 Now, if you want to display the exact number 117 00:04:08,760 --> 00:04:11,070 of excessive collisions, you can enter the command 118 00:04:11,070 --> 00:04:14,040 show controller ethernet on your network platform, 119 00:04:14,040 --> 00:04:16,740 and the excessive collision counters will be displayed. 120 00:04:16,740 --> 00:04:18,720 If you're experiencing excessive collisions, 121 00:04:18,720 --> 00:04:21,060 this is going to indicate a problem in the network.
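The retry-then-give-up behavior described above is Ethernet's truncated binary exponential backoff. Here is a minimal sketch of it. The shared medium is replaced by a hypothetical `collides(attempt)` callback, so this models the decision logic only, not real timing. After the nth collision the device waits a random number of slot times from 0 to 2^k - 1, where k = min(n, 10). After 16 failed attempts the frame is dropped and counted as an excessive collision.

```python
import random

ATTEMPT_LIMIT = 16  # frame is discarded after 16 failed transmission attempts
BACKOFF_LIMIT = 10  # the random range stops doubling after 10 collisions

def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff: wait r slot times,
    with r drawn uniformly from 0 .. 2**k - 1 and k = min(collisions, 10)."""
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2**k - 1)

def send_frame(collides, rng: random.Random) -> tuple[str, int, int]:
    """Try to send one frame.

    `collides(attempt)` is a hypothetical stand-in for the shared medium:
    it returns True if that attempt collides. Returns the outcome, the
    number of attempts made, and the total backoff in slot times.
    """
    waited = 0
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not collides(attempt):
            return "sent", attempt, waited
        if attempt < ATTEMPT_LIMIT:
            waited += backoff_slots(attempt, rng)
    return "dropped (excessive collisions)", ATTEMPT_LIMIT, waited

rng = random.Random(1)
print(send_frame(lambda attempt: attempt <= 2, rng)[:2])  # ('sent', 3)
print(send_frame(lambda attempt: True, rng)[:2])          # ('dropped (excessive collisions)', 16)
```

Because the backoff range doubles with each collision, a busy collision domain spends more and more time waiting. This is why heavy collisions hurt throughput so badly even before frames start being dropped.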
122 00:04:21,060 --> 00:04:22,320 Usually, this is caused 123 00:04:22,320 --> 00:04:24,660 by devices using full-duplex communication 124 00:04:24,660 --> 00:04:27,090 over a shared Ethernet segment, like a hub, 125 00:04:27,090 --> 00:04:29,460 by a broken network interface card, 126 00:04:29,460 --> 00:04:31,710 or simply by too many clients connected 127 00:04:31,710 --> 00:04:33,480 to the same collision domain. 128 00:04:33,480 --> 00:04:35,550 To overcome an excessive collision issue, 129 00:04:35,550 --> 00:04:37,710 you should turn off auto-negotiation for the speed 130 00:04:37,710 --> 00:04:40,140 and duplex of an interface, hardcode the speed 131 00:04:40,140 --> 00:04:42,480 to a lower setting, and change the duplex 132 00:04:42,480 --> 00:04:44,940 to half duplex instead of full duplex. 133 00:04:44,940 --> 00:04:47,040 These speed and duplex settings can be configured 134 00:04:47,040 --> 00:04:49,800 on the networking device or on the client itself 135 00:04:49,800 --> 00:04:52,530 within the Windows, Linux, Unix, and macOS 136 00:04:52,530 --> 00:04:55,740 operating systems under the network adapter settings. 137 00:04:55,740 --> 00:04:58,500 Next, we need to talk about broadcast storms. 138 00:04:58,500 --> 00:05:00,840 Now, a broadcast storm occurs when a network system 139 00:05:00,840 --> 00:05:04,890 is overwhelmed by continuous multicast or broadcast traffic. 140 00:05:04,890 --> 00:05:06,960 Broadcast storms are dangerous to your network 141 00:05:06,960 --> 00:05:08,700 because they can quickly overwhelm switches 142 00:05:08,700 --> 00:05:10,680 and other devices as they struggle to keep up 143 00:05:10,680 --> 00:05:13,470 with the flood of packets that need to be processed.
144 00:05:13,470 --> 00:05:14,970 When a broadcast storm occurs, 145 00:05:14,970 --> 00:05:17,580 your network performance is going to decrease rapidly, 146 00:05:17,580 --> 00:05:20,190 and in the worst case, it can cause a complete denial 147 00:05:20,190 --> 00:05:21,930 of service in your network. 148 00:05:21,930 --> 00:05:24,030 Remember, a broadcast packet is addressed 149 00:05:24,030 --> 00:05:26,520 at both Layer 2 and Layer 3. 150 00:05:26,520 --> 00:05:27,840 At Layer 2, 151 00:05:27,840 --> 00:05:32,550 the destination MAC address is FF:FF:FF:FF:FF:FF. 152 00:05:32,550 --> 00:05:33,960 At Layer 3, 153 00:05:33,960 --> 00:05:35,730 you're going to see the destination IP address 154 00:05:35,730 --> 00:05:39,720 255.255.255.255. 155 00:05:39,720 --> 00:05:42,120 Now, a broadcast domain is a logical division 156 00:05:42,120 --> 00:05:44,250 of a computer network in which all of your nodes 157 00:05:44,250 --> 00:05:46,230 can reach each other by broadcast 158 00:05:46,230 --> 00:05:49,050 at the data link layer, which is Layer 2. 159 00:05:49,050 --> 00:05:51,690 Now, a broadcast domain can be within the same local area 160 00:05:51,690 --> 00:05:53,880 network segment, or it can be bridged 161 00:05:53,880 --> 00:05:54,713 to other local area network segments as well. 162 00:05:54,713 --> 00:05:59,430 Remember, a switch or any other Layer 2 device will not break up 163 00:05:59,430 --> 00:06:02,670 broadcast domains because it bridges segments together. 164 00:06:02,670 --> 00:06:04,560 Instead, you have to use a router 165 00:06:04,560 --> 00:06:07,470 or a Layer 3 switch to break up the broadcast domain 166 00:06:07,470 --> 00:06:09,420 into smaller broadcast domains. 167 00:06:09,420 --> 00:06:11,700 In general, there are a few main causes 168 00:06:11,700 --> 00:06:14,160 for broadcast storms occurring in your network.
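As a quick illustration of the two broadcast addresses, here is a small sketch that checks whether a destination is a broadcast at Layer 2 or Layer 3. It uses only Python's standard `ipaddress` module. The helper names are hypothetical, and the Layer 3 check covers only the limited broadcast address 255.255.255.255 mentioned above.

```python
import ipaddress

# All-ones destination addresses used for broadcasts at each layer.
L2_BROADCAST = bytes.fromhex("ffffffffffff")
L3_LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")

def is_l2_broadcast(mac: str) -> bool:
    """True if a MAC string such as 'FF:FF:FF:FF:FF:FF' is the Ethernet broadcast address."""
    digits = mac.replace(":", "").replace("-", "")
    return bytes.fromhex(digits) == L2_BROADCAST

def is_l3_broadcast(ip: str) -> bool:
    """True if an IPv4 string is the limited broadcast address 255.255.255.255."""
    return ipaddress.IPv4Address(ip) == L3_LIMITED_BROADCAST

print(is_l2_broadcast("FF:FF:FF:FF:FF:FF"), is_l3_broadcast("255.255.255.255"))  # True True
print(is_l2_broadcast("00:1a:2b:3c:4d:5e"), is_l3_broadcast("172.16.0.10"))      # False False
```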
169 00:06:14,160 --> 00:06:16,290 First, you have a single broadcast domain 170 00:06:16,290 --> 00:06:17,910 that's just way too large. 171 00:06:17,910 --> 00:06:19,530 In this case, the number of clients 172 00:06:19,530 --> 00:06:21,240 will simply create a broadcast storm 173 00:06:21,240 --> 00:06:23,310 when they're conducting their normal operations. 174 00:06:23,310 --> 00:06:25,560 For example, if you have a large enterprise network 175 00:06:25,560 --> 00:06:26,850 with numerous switches 176 00:06:26,850 --> 00:06:28,560 all interconnecting the network, 177 00:06:28,560 --> 00:06:31,050 this is going to create a really large broadcast domain 178 00:06:31,050 --> 00:06:33,870 because again, switches don't break apart broadcast domains 179 00:06:33,870 --> 00:06:35,100 like a router does. 180 00:06:35,100 --> 00:06:37,350 So if you have a large broadcast domain set up 181 00:06:37,350 --> 00:06:42,350 using a Class B private network like 172.16.0.0/16, 182 00:06:43,410 --> 00:06:47,700 you could have up to 65,534 usable IP addresses 183 00:06:47,700 --> 00:06:50,130 and hosts on that one broadcast domain, 184 00:06:50,130 --> 00:06:52,620 and that's a really large broadcast domain. 185 00:06:52,620 --> 00:06:55,140 Instead, you should break up that broadcast domain 186 00:06:55,140 --> 00:06:58,440 by subnetting the Class B network into smaller networks 187 00:06:58,440 --> 00:07:00,420 to reduce the number of broadcasts being generated 188 00:07:00,420 --> 00:07:02,020 within each broadcast domain. 189 00:07:03,326 --> 00:07:04,560 Between subnets, you then use a router 190 00:07:04,560 --> 00:07:05,790 or a Layer 3 switch 191 00:07:05,790 --> 00:07:09,180 to break up those subnets into separate broadcast domains.
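The subnetting arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch. The /24 prefix is an assumption chosen for illustration, not a sizing recommendation. Each resulting subnet would sit behind its own router or Layer 3 switch interface and form its own broadcast domain.

```python
import ipaddress

network = ipaddress.ip_network("172.16.0.0/16")
print(network.is_private)         # True
print(network.num_addresses - 2)  # 65534 usable hosts (minus network and broadcast addresses)

# Split the /16 into /24 subnets (prefix length chosen only for illustration).
subnets = list(network.subnets(new_prefix=24))
print(len(subnets))                              # 256
print(subnets[0], subnets[0].num_addresses - 2)  # 172.16.0.0/24 254
print(subnets[0].broadcast_address)              # 172.16.0.255
```

Instead of one domain where a broadcast can reach up to 65,534 hosts, each broadcast now reaches at most 254 hosts in its own subnet.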
192 00:07:09,180 --> 00:07:11,520 Now, the second cause of broadcast storms occurs 193 00:07:11,520 --> 00:07:12,570 when we have a large volume 194 00:07:12,570 --> 00:07:15,840 of DHCP requests on a given broadcast domain. 195 00:07:15,840 --> 00:07:18,240 Whenever a new host connects to a broadcast domain, 196 00:07:18,240 --> 00:07:20,220 it attempts to get an IP address assignment 197 00:07:20,220 --> 00:07:21,780 using the DORA process 198 00:07:21,780 --> 00:07:24,450 of Discover, Offer, Request, and Acknowledge 199 00:07:24,450 --> 00:07:26,370 defined by DHCP. 200 00:07:26,370 --> 00:07:29,130 Now, the D or Discover part of this process happens 201 00:07:29,130 --> 00:07:30,510 as a broadcast packet 202 00:07:30,510 --> 00:07:32,910 because the new network client doesn't know the location 203 00:07:32,910 --> 00:07:34,770 of the DHCP server yet, 204 00:07:34,770 --> 00:07:37,590 so this can create a storm of broadcast packets 205 00:07:37,590 --> 00:07:39,810 if the network has a lot of clients trying to negotiate 206 00:07:39,810 --> 00:07:42,270 for an IP address all at the same time. 207 00:07:42,270 --> 00:07:44,550 For example, let's say your network device reboots 208 00:07:44,550 --> 00:07:45,750 and all the clients attempt 209 00:07:45,750 --> 00:07:47,820 to renegotiate their DHCP leases, 210 00:07:47,820 --> 00:07:50,040 which can cause a broadcast storm. 211 00:07:50,040 --> 00:07:51,960 If you suspect a broadcast storm is being caused 212 00:07:51,960 --> 00:07:53,820 by your DHCP configuration, 213 00:07:53,820 --> 00:07:56,220 you need to check if you're using DHCP relays 214 00:07:56,220 --> 00:07:57,840 between your different VLANs.
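To see why Discover has to be a broadcast, it helps to look at what the client actually knows when it builds the message. This is a minimal sketch that builds a DHCPDISCOVER payload following the BOOTP/DHCP layout (RFC 2131, with the message-type option from RFC 2132). It only constructs bytes and sends nothing. The function name and the example MAC address are hypothetical.

```python
import os
import struct

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # 0x63825363, marks the start of DHCP options

def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER payload (BOOTP header + options).

    The client would send this in a UDP datagram from 0.0.0.0:68 to
    255.255.255.255:67, i.e. as a broadcast, because it has no address
    yet and does not know where the DHCP server is.
    """
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                       # op: 1 = BOOTREQUEST
        1,                       # htype: 1 = Ethernet
        6,                       # hlen: MAC address length
        0,                       # hops
        xid,                     # transaction ID chosen by the client
        0,                       # secs
        0,                       # flags (broadcast bit left clear in this sketch)
        bytes(4),                # ciaddr: client has no address yet
        bytes(4),                # yiaddr
        bytes(4),                # siaddr
        bytes(4),                # giaddr: a DHCP relay fills in its own address here
        mac.ljust(16, b"\x00"),  # chaddr: MAC padded to 16 bytes
        bytes(64),               # sname
        bytes(128),              # file
    )
    options = bytes([53, 1, 1,   # option 53 (message type), length 1, value 1 = DHCPDISCOVER
                     255])       # end option
    return header + MAGIC_COOKIE + options

discover = build_dhcp_discover(bytes.fromhex("020000000001"), int.from_bytes(os.urandom(4), "big"))
print(len(discover))  # 244
```

Every client that boots or renews at the same moment emits one of these to 255.255.255.255, so a mass reboot puts a burst of broadcasts onto the domain all at once.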
215 00:07:57,840 --> 00:08:00,240 If you are, you're essentially allowing these VLANs 216 00:08:00,240 --> 00:08:03,270 to be treated as a single broadcast domain for the purposes 217 00:08:03,270 --> 00:08:06,510 of DHCP discovery, and this can lead to broadcast storms 218 00:08:06,510 --> 00:08:09,390 if you have a very large network with a lot of clients. 219 00:08:09,390 --> 00:08:12,030 The third cause of broadcast storms on a network 220 00:08:12,030 --> 00:08:13,740 is loops being created 221 00:08:13,740 --> 00:08:15,540 inside a switching environment. 222 00:08:15,540 --> 00:08:17,310 If you happen to have unmanaged switches 223 00:08:17,310 --> 00:08:19,380 and somebody accidentally cables them together, 224 00:08:19,380 --> 00:08:21,270 this can create an unintentional loop, 225 00:08:21,270 --> 00:08:23,280 which can lead to broadcast storms. 226 00:08:23,280 --> 00:08:26,340 To prevent these, make sure you're running Spanning Tree Protocol, 227 00:08:26,340 --> 00:08:29,550 which uses BPDUs, or Bridge Protocol Data Units, on your managed switches. 228 00:08:29,550 --> 00:08:31,740 Also, you should enforce a maximum number 229 00:08:31,740 --> 00:08:33,330 of MAC addresses per port 230 00:08:33,330 --> 00:08:36,090 because this can also shut down a port if a broadcast storm 231 00:08:36,090 --> 00:08:38,280 starts flowing through that port. 232 00:08:38,280 --> 00:08:41,850 So how can you identify if you're having a broadcast storm? 233 00:08:41,850 --> 00:08:43,590 Well, one of the easiest methods 234 00:08:43,590 --> 00:08:45,300 is to look at your packet counters. 235 00:08:45,300 --> 00:08:47,340 If you know your network's normal baseline 236 00:08:47,340 --> 00:08:49,860 and you now see the counters increasing much faster 237 00:08:49,860 --> 00:08:52,170 than they normally do, this could be an indication 238 00:08:52,170 --> 00:08:53,640 of a broadcast storm.
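The per-port MAC limit can be modeled in a few lines. This is a hypothetical sketch: `MacLimitedSwitch` and the port names are illustrative only. It is loosely similar in spirit to a port security feature in shutdown mode, but it is not any vendor's actual implementation. Each port learns source MAC addresses up to the limit. The first new MAC beyond the limit shuts the port down, and after that the port drops everything.

```python
from collections import defaultdict

class MacLimitedSwitch:
    """Hypothetical model of a per-port MAC address limit that shuts the
    port down on a violation."""

    def __init__(self, max_macs_per_port: int):
        self.max_macs = max_macs_per_port
        self.learned: dict[str, set[str]] = defaultdict(set)
        self.shut_down: set[str] = set()

    def receive(self, port: str, src_mac: str) -> bool:
        """Return True if the frame is accepted, False if the port is (or just got) shut down."""
        if port in self.shut_down:
            return False
        macs = self.learned[port]
        if src_mac not in macs and len(macs) >= self.max_macs:
            self.shut_down.add(port)  # new MAC beyond the limit: violation
            return False
        macs.add(src_mac)
        return True

switch = MacLimitedSwitch(max_macs_per_port=2)
print(switch.receive("port1", "00:00:00:00:00:01"))  # True
print(switch.receive("port1", "00:00:00:00:00:02"))  # True
print(switch.receive("port1", "00:00:00:00:00:03"))  # False: limit exceeded, port shut down
print(switch.receive("port2", "00:00:00:00:00:03"))  # True: other ports are unaffected
```

The design choice here is to fail closed. Once a port violates the limit, it stays down until an administrator intervenes, so the flood stops at the port instead of spreading across the broadcast domain.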
239 00:08:53,640 --> 00:08:55,560 For example, let's say you're running a network 240 00:08:55,560 --> 00:08:58,320 where you're used to seeing about 10,000 packets per second, 241 00:08:58,320 --> 00:09:01,230 and now you're seeing 100,000 packets per second. 242 00:09:01,230 --> 00:09:04,170 This could be indicative of a potential broadcast storm. 243 00:09:04,170 --> 00:09:06,600 In this example, you can see our average broadcast rate 244 00:09:06,600 --> 00:09:08,850 is around 8,700 packets per second, 245 00:09:08,850 --> 00:09:12,120 and we rarely go above 20,000 broadcast packets per second. 246 00:09:12,120 --> 00:09:15,360 So if I see a spike upwards to 100,000 packets 247 00:09:15,360 --> 00:09:16,560 or more per second, 248 00:09:16,560 --> 00:09:19,410 I would know I have a potential broadcast storm on my hands. 249 00:09:19,410 --> 00:09:21,750 Now, another way is to look at your network monitoring tools 250 00:09:21,750 --> 00:09:24,030 and determine the packet loss on your network. 251 00:09:24,030 --> 00:09:25,950 If a broadcast storm starts to occur, 252 00:09:25,950 --> 00:09:28,170 your network devices are going to have a hard time keeping up 253 00:09:28,170 --> 00:09:30,750 with the processing of all the packets on the network, 254 00:09:30,750 --> 00:09:34,110 and so packet loss will rise sharply. 255 00:09:34,110 --> 00:09:35,400 Now, the most definitive way 256 00:09:35,400 --> 00:09:37,140 to identify a broadcast storm, though, 257 00:09:37,140 --> 00:09:40,920 is to set up a packet analyzer like Wireshark or tcpdump 258 00:09:40,920 --> 00:09:43,380 and start collecting the traffic on the network. 259 00:09:43,380 --> 00:09:45,870 As you look at that traffic, you're going to check 260 00:09:45,870 --> 00:09:47,610 whether there are a lot of broadcast packets 261 00:09:47,610 --> 00:09:48,840 occurring in rapid succession. 262 00:09:48,840 --> 00:09:51,600 If so, you're suffering from a broadcast storm.
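The baseline comparison can be expressed as a simple threshold check. This is a minimal sketch under stated assumptions. The baseline samples are made-up numbers that happen to average 8,700 packets per second. The multiplier of 5 is an arbitrary choice you would tune for your own network. `storm_threshold` and `storm_suspected` are hypothetical names.

```python
from statistics import mean

def storm_threshold(baseline_pps: list[float], factor: float = 5.0) -> float:
    """Alert threshold: `factor` times the baseline average, but never below
    the highest rate already seen during normal operation."""
    return max(factor * mean(baseline_pps), max(baseline_pps))

def storm_suspected(baseline_pps: list[float], current_pps: float, factor: float = 5.0) -> bool:
    """Flag a possible broadcast storm when the current rate exceeds the threshold."""
    return current_pps > storm_threshold(baseline_pps, factor)

# Hypothetical per-interval broadcast rates averaging 8,700 packets per second.
baseline = [7_000, 8_000, 8_700, 9_400, 10_400]
print(storm_threshold(baseline))           # 43500.0
print(storm_suspected(baseline, 20_000))   # False
print(storm_suspected(baseline, 100_000))  # True
```

A multiple of the average keeps normal peaks like 20,000 packets per second from raising alarms, while a jump to 100,000 clearly stands out.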
263 00:09:51,600 --> 00:09:54,210 In this example, you see a Layer 2 broadcast storm 264 00:09:54,210 --> 00:09:55,830 occurring due to an enormous number 265 00:09:55,830 --> 00:09:58,890 of ARP broadcasts being sent on this network. 266 00:09:58,890 --> 00:10:01,470 Now, in this example, I'm using tcpdump 267 00:10:01,470 --> 00:10:03,870 to see all the broadcast traffic on the network. 268 00:10:03,870 --> 00:10:07,650 To do this, you're going to run the command tcpdump -i, 269 00:10:07,650 --> 00:10:12,300 followed by your interface name and the filter ether broadcast or ether multicast. 270 00:10:12,300 --> 00:10:15,240 This will run tcpdump and only display broadcast 271 00:10:15,240 --> 00:10:17,730 and multicast packets to the screen. 272 00:10:17,730 --> 00:10:20,520 In this example, you can see there are a lot of ARP requests 273 00:10:20,520 --> 00:10:22,530 that are going on because there's a broadcast storm 274 00:10:22,530 --> 00:10:24,000 that's starting to happen. 275 00:10:24,000 --> 00:10:25,260 Remember, the best ways 276 00:10:25,260 --> 00:10:27,360 to prevent a broadcast storm are to set up 277 00:10:27,360 --> 00:10:29,970 loop prevention using Spanning Tree and its BPDUs, 278 00:10:29,970 --> 00:10:31,410 limit the number of MAC addresses 279 00:10:31,410 --> 00:10:33,060 that can access a given switch port, 280 00:10:33,060 --> 00:10:35,010 and break up large broadcast domains 281 00:10:35,010 --> 00:10:36,570 into smaller broadcast domains 282 00:10:36,570 --> 00:10:38,883 using routers and multilayer switches.
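Here is a minimal sketch of the classification a capture filter like this performs, applied to destination MAC addresses. The broadcast address FF:FF:FF:FF:FF:FF also has the group (multicast) bit set in its first octet. A filter for multicast frames therefore matches broadcasts too, and you combine the two with "or" rather than "and". The sample addresses are hypothetical stand-ins for destinations pulled from a capture, and `classify_dst_mac` is a hypothetical helper name.

```python
from collections import Counter

def classify_dst_mac(mac: str) -> str:
    """Classify a destination MAC as broadcast, multicast, or unicast."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac}")
    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:  # I/G bit set: group (multicast) address
        return "multicast"
    return "unicast"

# Hypothetical destination MACs taken from a capture.
captured = [
    "ff:ff:ff:ff:ff:ff", "ff:ff:ff:ff:ff:ff", "ff:ff:ff:ff:ff:ff",
    "01:00:5e:00:00:fb", "00:1a:2b:3c:4d:5e",
]
counts = Counter(classify_dst_mac(mac) for mac in captured)
print(counts["broadcast"], counts["multicast"], counts["unicast"])  # 3 1 1
```

Counting these per second and comparing the result against your baseline gives the same signal you would look for by eye in the tcpdump output.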