1 00:00:00,05 --> 00:00:03,04 - [Instructor] Computer vision is the AI 2 00:00:03,04 --> 00:00:07,03 that is trained using images and videos. 3 00:00:07,03 --> 00:00:08,08 When you think about 4 00:00:08,08 --> 00:00:12,06 what is the most important computer vision model, 5 00:00:12,06 --> 00:00:16,01 what comes to mind is object detection. 6 00:00:16,01 --> 00:00:19,06 Object detection is, as the name suggests, 7 00:00:19,06 --> 00:00:21,06 the ability of the model 8 00:00:21,06 --> 00:00:25,02 to detect common objects in an image. 9 00:00:25,02 --> 00:00:28,04 It is about detecting many different objects 10 00:00:28,04 --> 00:00:30,04 in the same image. 11 00:00:30,04 --> 00:00:33,08 We are going to do this demo using Azure AI. 12 00:00:33,08 --> 00:00:36,05 It's part of the Vision Studio. 13 00:00:36,05 --> 00:00:40,02 Azure Vision AI is a pre-trained model 14 00:00:40,02 --> 00:00:44,03 and it has been trained with thousands of common objects. 15 00:00:44,03 --> 00:00:46,00 So let us look at a demo. 16 00:00:46,00 --> 00:00:49,03 Let's pick one of the examples they've given here. 17 00:00:49,03 --> 00:00:51,03 And as you can see, I'm not signed in, 18 00:00:51,03 --> 00:00:53,07 so I'm using a free version of Azure. 19 00:00:53,07 --> 00:00:57,06 So you can try this for yourself directly without any cost. 20 00:00:57,06 --> 00:01:01,00 So here, as soon as I brought in this image, 21 00:01:01,00 --> 00:01:03,05 one of their example images here, 22 00:01:03,05 --> 00:01:05,08 you can see there are boxes around the objects. 23 00:01:05,08 --> 00:01:08,01 Each one is called a bounding box. 24 00:01:08,01 --> 00:01:11,04 That is the output of an object detection model. 25 00:01:11,04 --> 00:01:16,06 It identifies objects in the image and puts a box 26 00:01:16,06 --> 00:01:19,03 around each one, and it recognizes a person 27 00:01:19,03 --> 00:01:22,09 with 95.5% confidence.
28 00:01:22,09 --> 00:01:25,06 And this one is a skateboard 29 00:01:25,06 --> 00:01:28,09 and it recognizes it with 90% confidence. 30 00:01:28,09 --> 00:01:34,02 What does that mean? Remember, all AI is stochastic. 31 00:01:34,02 --> 00:01:36,06 That means it is making a prediction 32 00:01:36,06 --> 00:01:38,08 every time you use the AI. 33 00:01:38,08 --> 00:01:41,05 So this AI is an inference AI. 34 00:01:41,05 --> 00:01:44,07 This is a pre-trained model from Azure 35 00:01:44,07 --> 00:01:47,04 and we are users of this model 36 00:01:47,04 --> 00:01:51,01 and we are giving it this image as the data 37 00:01:51,01 --> 00:01:53,04 for the model to make a prediction 38 00:01:53,04 --> 00:01:56,00 to say these are two common objects 39 00:01:56,00 --> 00:01:58,05 and it looks at a person also as an object, right? 40 00:01:58,05 --> 00:02:01,06 So an object doesn't have to be an inanimate object. 41 00:02:01,06 --> 00:02:04,04 So it recognizes a person and a skateboard. 42 00:02:04,04 --> 00:02:09,00 So it is giving the statistical confidence of how sure 43 00:02:09,00 --> 00:02:11,04 it is that this is a skateboard. 44 00:02:11,04 --> 00:02:14,03 You see something called a threshold value here. 45 00:02:14,03 --> 00:02:17,03 That is something that you control to decide 46 00:02:17,03 --> 00:02:21,09 how flexible you are in accepting a low confidence 47 00:02:21,09 --> 00:02:23,07 or high confidence from the AI. 48 00:02:23,07 --> 00:02:26,02 In this example, it has only two objects. 49 00:02:26,02 --> 00:02:29,07 So why don't I pick another one. So let's take this one. 50 00:02:29,07 --> 00:02:33,09 As soon as I put it here, it is recognizing three people. 51 00:02:33,09 --> 00:02:37,04 So it says person 85.6% confidence, 52 00:02:37,04 --> 00:02:40,07 another person is 76.5% confidence, 53 00:02:40,07 --> 00:02:44,04 and a third person at 72.3% confidence.
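A minimal Python sketch of what one object detection result looks like as data, assuming a simple in-memory representation rather than any particular Azure response format. The class names `BoundingBox` and `Detection`, the helper `describe`, and the pixel coordinates are hypothetical; only the labels and confidences (person 95.5%, skateboard 90%) come from the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        return self.width * self.height


@dataclass(frozen=True)
class Detection:
    """One detected object: a label, a confidence in [0, 1], and its box."""
    label: str
    confidence: float
    box: BoundingBox


def describe(d: Detection) -> str:
    """Format a detection the way the demo reads it out loud."""
    return (f"{d.label}: {d.confidence:.1%} confidence, "
            f"box at ({d.box.x}, {d.box.y}) size {d.box.width}x{d.box.height}")


# Labels and confidences are from the demo; box coordinates are made up.
skateboard_image = [
    Detection("person", 0.955, BoundingBox(x=120, y=40, width=210, height=380)),
    Detection("skateboard", 0.90, BoundingBox(x=150, y=400, width=180, height=60)),
]

for detection in skateboard_image:
    print(describe(detection))
```

A person and a skateboard are both just entries in the same list here, which matches the point that an "object" does not have to be inanimate.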
54 00:02:44,04 --> 00:02:49,01 So outside of that, we can also see it recognizes objects, 55 00:02:49,01 --> 00:02:52,01 it sees a laptop, it sees the seating 56 00:02:52,01 --> 00:02:56,07 and more seating, and it also sees the table and footwear. 57 00:02:56,07 --> 00:03:00,04 That's beautiful. So all of these objects are recognized. 58 00:03:00,04 --> 00:03:02,02 So now, we're going to play with the threshold. 59 00:03:02,02 --> 00:03:07,00 So threshold, we can make it low or we can make it high. 60 00:03:07,00 --> 00:03:09,03 If I make the threshold high, 61 00:03:09,03 --> 00:03:11,07 then it does not bring up as many objects. 62 00:03:11,07 --> 00:03:16,06 So you can see, if you want the AI to show you only objects 63 00:03:16,06 --> 00:03:19,07 that it has detected with a high degree 64 00:03:19,07 --> 00:03:23,01 of confidence, then you can increase the threshold. 65 00:03:23,01 --> 00:03:26,04 If you say it doesn't matter, even if you have a low degree 66 00:03:26,04 --> 00:03:27,08 of confidence, show me everything 67 00:03:27,08 --> 00:03:29,09 that you can recognize in this image, 68 00:03:29,09 --> 00:03:32,07 then you can lower the threshold and find more things. 69 00:03:32,07 --> 00:03:35,00 What is the difference? Can you think for a minute? 70 00:03:35,00 --> 00:03:38,06 Why would this matter? What can you do with it? 71 00:03:38,06 --> 00:03:41,06 So it goes back to what do you want 72 00:03:41,06 --> 00:03:44,05 this computer vision model to do for you? 73 00:03:44,05 --> 00:03:48,09 So if this is going to be a robotic surgery 74 00:03:48,09 --> 00:03:52,01 and a robotic arm that is going to use the camera 75 00:03:52,01 --> 00:03:54,07 to look at things, then you would want it 76 00:03:54,07 --> 00:03:58,07 to have a very high degree of confidence, right? 77 00:03:58,07 --> 00:04:00,06 Something like that.
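The threshold slider described above can be sketched in a few lines of Python. This is an illustration under assumptions, not how Vision Studio is implemented: the function name `filter_by_threshold` is hypothetical, the comparison is assumed to be "at or above", and only the three person confidences come from the demo. The confidences for the laptop, seating, table, and footwear are made-up values.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float  # 0.0 to 1.0


def filter_by_threshold(detections, threshold):
    """Keep only detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]


office_image = [
    Detection("person", 0.856),    # from the demo
    Detection("person", 0.765),    # from the demo
    Detection("person", 0.723),    # from the demo
    Detection("laptop", 0.68),     # hypothetical confidence
    Detection("seating", 0.61),    # hypothetical confidence
    Detection("seating", 0.55),    # hypothetical confidence
    Detection("table", 0.58),      # hypothetical confidence
    Detection("footwear", 0.52),   # hypothetical confidence
]

# Raising the threshold shows fewer objects; lowering it shows more.
for threshold in (0.50, 0.75, 0.85):
    kept = filter_by_threshold(office_image, threshold)
    labels = [d.label for d in kept]
    print(f"threshold {threshold:.2f}: {len(kept)} objects {labels}")
```

With these sample values, a threshold of 0.85 leaves a single person, which is the kind of strict setting the robotic surgery example calls for.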
78 00:04:00,06 --> 00:04:03,09 If it is just identifying things around 79 00:04:03,09 --> 00:04:05,07 in your office or in your warehouse 80 00:04:05,07 --> 00:04:09,00 or counting inventory, then you might not be 81 00:04:09,00 --> 00:04:12,06 so upset if it misses a few objects 82 00:04:12,06 --> 00:04:14,02 or it adds a few more objects 83 00:04:14,02 --> 00:04:15,09 with a low degree of confidence. 84 00:04:15,09 --> 00:04:19,07 So it is important for you to think about what it is 85 00:04:19,07 --> 00:04:23,02 that you want the object detection model to do. 86 00:04:23,02 --> 00:04:26,00 Object detection is a very important model used 87 00:04:26,00 --> 00:04:28,02 by autonomous vehicles that we have been using 88 00:04:28,02 --> 00:04:30,03 as an example in this course. 89 00:04:30,03 --> 00:04:34,09 But you can apply this to any camera, anything with vision. 90 00:04:34,09 --> 00:04:38,07 Okay, so let us move ahead 91 00:04:38,07 --> 00:04:41,01 and I've shown you two examples. 92 00:04:41,01 --> 00:04:44,09 I want to actually show you a different image 93 00:04:44,09 --> 00:04:47,00 that is not in the examples. 94 00:04:47,00 --> 00:04:49,09 So the way you insert your own image 95 00:04:49,09 --> 00:04:52,06 is you click on browse for a file. 96 00:04:52,06 --> 00:04:55,06 So I brought this image as a sample image 97 00:04:55,06 --> 00:04:57,01 and you can see it's working 98 00:04:57,01 --> 00:05:00,07 and it has identified a couple of humans in there. 99 00:05:00,07 --> 00:05:03,00 Oh, it's recognizing three people in here. 100 00:05:03,00 --> 00:05:08,02 What about this person and what about this robotic arm? 101 00:05:08,02 --> 00:05:13,02 So again, it goes back to how confident you want this to be 102 00:05:13,02 --> 00:05:17,00 and it brings it down to one person at a high degree 103 00:05:17,00 --> 00:05:19,08 of confidence, or two people or three people.
104 00:05:19,08 --> 00:05:23,05 And however low I go on my threshold, 105 00:05:23,05 --> 00:05:27,07 it does not recognize this robotic arm or a fire spark here 106 00:05:27,07 --> 00:05:30,04 or the cables, nothing else. 107 00:05:30,04 --> 00:05:32,01 So if I want to see what this is, 108 00:05:32,01 --> 00:05:34,00 it doesn't recognize the table. 109 00:05:34,00 --> 00:05:36,05 So again, it goes back to your use case 110 00:05:36,05 --> 00:05:40,00 and how important it is that it recognizes everyone. 111 00:05:40,00 --> 00:05:44,06 So I would say if it recognizes three humans in here 112 00:05:44,06 --> 00:05:48,02 and you're building something like a robotic arm 113 00:05:48,02 --> 00:05:52,00 that is going to be doing welding and it is going to coexist 114 00:05:52,00 --> 00:05:55,06 and work well with humans, then it is important 115 00:05:55,06 --> 00:05:57,02 to recognize all humans. 116 00:05:57,02 --> 00:05:59,00 Let me give you another example. 117 00:05:59,00 --> 00:06:00,08 Think of a warehouse 118 00:06:00,08 --> 00:06:04,04 and it could be an autonomous mobile robot 119 00:06:04,04 --> 00:06:08,08 that is moving around in the warehouse or a factory. 120 00:06:08,08 --> 00:06:13,04 Then it cannot say, I can recognize three people 121 00:06:13,04 --> 00:06:15,09 and not the other person. 122 00:06:15,09 --> 00:06:18,03 What if this robot is moving around? 123 00:06:18,03 --> 00:06:20,08 Autonomous mobile robots are called AMRs. 124 00:06:20,08 --> 00:06:22,09 What if the AMR is moving around 125 00:06:22,09 --> 00:06:27,09 and any of these people are opening an elevator door 126 00:06:27,09 --> 00:06:29,01 and coming in? 127 00:06:29,01 --> 00:06:31,08 It has to stop for everyone equally. 128 00:06:31,08 --> 00:06:33,07 So that becomes very important. 129 00:06:33,07 --> 00:06:38,02 So what do you do if your existing model does not recognize 130 00:06:38,02 --> 00:06:40,09 a situation that is important for your work?
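To make the safety point concrete, here is a small Python sketch that compares how many people the model boxes at each threshold against a hand count of the scene. Everything in it is an assumption for illustration: the function names are hypothetical, the hand count of four follows from "three people and not the other person", and the three confidence values are invented because the demo does not state them for this image.

```python
def count_label(detections, label, threshold):
    """Count detections of one label at or above the confidence threshold."""
    return sum(1 for name, conf in detections if name == label and conf >= threshold)


def missed_people(detections, people_in_scene, threshold):
    """How many hand-counted people have no 'person' detection at this threshold."""
    found = count_label(detections, "person", threshold)
    return max(people_in_scene - found, 0)


# Hypothetical (label, confidence) pairs for the welding image.
# The model returns nothing for the fourth person or the robotic arm.
welding_image = [("person", 0.81), ("person", 0.64), ("person", 0.52)]
PEOPLE_IN_SCENE = 4  # counted by a human looking at the image

for threshold in (0.80, 0.60, 0.50, 0.10):
    found = count_label(welding_image, "person", threshold)
    missed = missed_people(welding_image, PEOPLE_IN_SCENE, threshold)
    print(f"threshold {threshold:.2f}: found {found}, missed {missed}")
```

Even at the lowest threshold one person is still missed, because the threshold can only hide or reveal detections the model already produced. It cannot create a detection the model never made, which is why testing with images from your own environment matters.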
131 00:06:40,09 --> 00:06:43,08 That is why you have the option of testing this out 132 00:06:43,08 --> 00:06:46,06 with your own images for your own environment. 133 00:06:46,06 --> 00:06:50,01 And if it still doesn't work, then you go down here 134 00:06:50,01 --> 00:06:52,07 and it says you want to use your own images. 135 00:06:52,07 --> 00:06:55,05 You'll have to sign up for Azure 136 00:06:55,05 --> 00:06:58,06 and it costs you to try out the REST API, 137 00:06:58,06 --> 00:07:00,08 and all the details of their API 138 00:07:00,08 --> 00:07:03,03 and the SDK reference are all available here. 139 00:07:03,03 --> 00:07:06,01 So your data scientists can actually use this 140 00:07:06,01 --> 00:07:10,03 pre-trained model and train this model to become smarter, 141 00:07:10,03 --> 00:07:13,08 to understand more objects to the degree of confidence 142 00:07:13,08 --> 00:07:16,07 that you want using the images that you provide. 143 00:07:16,07 --> 00:07:21,05 So computer cameras are essentially 144 00:07:21,05 --> 00:07:24,05 trying to recreate vision as in human vision. 145 00:07:24,05 --> 00:07:25,08 And we take it for granted. 146 00:07:25,08 --> 00:07:28,09 It is a very complicated phenomenon 147 00:07:28,09 --> 00:07:31,03 for us to see our environment. 148 00:07:31,03 --> 00:07:36,00 So it is not just that we are seeing things, they are in 3D. 149 00:07:36,00 --> 00:07:40,03 So when you collect images of your own environment 150 00:07:40,03 --> 00:07:44,09 and you want to train your model, you have to think about 151 00:07:44,09 --> 00:07:48,00 how human vision is a miracle. 152 00:07:48,00 --> 00:07:50,05 It understands multiple layers of the image. 153 00:07:50,05 --> 00:07:53,07 It understands background, angle, lighting, placement. 154 00:07:53,07 --> 00:07:55,09 You'll have to capture all that in your workflow.
155 00:07:55,09 --> 00:07:58,08 So you have to collect sample images 156 00:07:58,08 --> 00:08:01,06 that cover different lighting: daytime, evening, 157 00:08:01,06 --> 00:08:03,06 twilight, and different angles. 158 00:08:03,06 --> 00:08:05,08 So the difference between these people 159 00:08:05,08 --> 00:08:08,03 and this person might be the angle 160 00:08:08,03 --> 00:08:10,05 at which they are standing, 161 00:08:10,05 --> 00:08:13,08 or it could be that this image is layered 162 00:08:13,08 --> 00:08:16,08 and this person is at a layer at the back. 163 00:08:16,08 --> 00:08:20,01 So you might want to capture that placement of the people 164 00:08:20,01 --> 00:08:22,05 or the objects that you want to identify. 165 00:08:22,05 --> 00:08:24,07 A different background might confuse 166 00:08:24,07 --> 00:08:26,02 the computer vision model. 167 00:08:26,02 --> 00:08:28,05 And so you might want to give the same objects 168 00:08:28,05 --> 00:08:30,07 in multiple different backgrounds 169 00:08:30,07 --> 00:08:34,02 to teach the algorithm that it is the same object 170 00:08:34,02 --> 00:08:36,02 regardless of the background where it is. 171 00:08:36,02 --> 00:08:39,06 So there are a lot of different samples that you have to give 172 00:08:39,06 --> 00:08:44,01 and you have to bring your business acumen of what you want, 173 00:08:44,01 --> 00:08:46,08 how your environment changes over time 174 00:08:46,08 --> 00:08:48,02 and what you want to capture, 175 00:08:48,02 --> 00:08:50,04 what is important for your safety. 176 00:08:50,04 --> 00:08:53,06 And later, we will learn how you want 177 00:08:53,06 --> 00:08:55,02 to think about the data privacy 178 00:08:55,02 --> 00:08:56,08 of what you don't want to capture. 179 00:08:56,08 --> 00:08:59,07 I can see this person is holding a tablet, 180 00:08:59,07 --> 00:09:01,06 this person is holding a paper. 181 00:09:01,06 --> 00:09:05,00 Does it matter in your company, in your environment?
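The sampling advice above (different lighting, angles, and backgrounds) can be tracked with a simple coverage check before any training starts. This Python sketch is one possible way to do it, not a required workflow: the function name `coverage_gaps`, the metadata keys, the file names, and the angle and background values are all hypothetical. Only the lighting values (daytime, evening, twilight) come from the lesson.

```python
from collections import Counter
from itertools import product

LIGHTING = ("daytime", "evening", "twilight")  # named in the lesson
ANGLES = ("front", "side", "back")             # hypothetical values
BACKGROUNDS = ("warehouse", "office")          # hypothetical values


def coverage_gaps(samples, min_per_combo=1):
    """Return every (lighting, angle, background) combination that has
    fewer than min_per_combo sample images."""
    counts = Counter(
        (s["lighting"], s["angle"], s["background"]) for s in samples
    )
    return [
        combo
        for combo in product(LIGHTING, ANGLES, BACKGROUNDS)
        if counts[combo] < min_per_combo
    ]


# Hypothetical metadata recorded for each collected image.
samples = [
    {"file": "img_001.jpg", "lighting": "daytime", "angle": "front", "background": "warehouse"},
    {"file": "img_002.jpg", "lighting": "daytime", "angle": "side", "background": "warehouse"},
    {"file": "img_003.jpg", "lighting": "evening", "angle": "front", "background": "office"},
]

gaps = coverage_gaps(samples)
total = len(LIGHTING) * len(ANGLES) * len(BACKGROUNDS)
print(f"{len(gaps)} of {total} combinations still need images")
for lighting, angle, background in gaps[:3]:
    print(f"  missing: {lighting} / {angle} / {background}")
```

The same table is a natural place to record what should not be captured, such as the contents of a tablet screen or a sheet of paper, so the privacy question is answered before images are collected.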
182 00:09:05,00 --> 00:09:07,00 You might want to think about the privacy implications 183 00:09:07,00 --> 00:09:10,08 of what you are using to train the model too. 184 00:09:10,08 --> 00:09:13,05 So there's a lot to learn here. 185 00:09:13,05 --> 00:09:15,05 The easiest thing we learned here 186 00:09:15,05 --> 00:09:17,04 is to use an existing image 187 00:09:17,04 --> 00:09:19,08 and see how it puts bounding boxes 188 00:09:19,08 --> 00:09:21,05 and does object detection. 189 00:09:21,05 --> 00:09:23,07 The next step was to understand 190 00:09:23,07 --> 00:09:25,02 that there is a difference in how well 191 00:09:25,02 --> 00:09:27,00 it recognizes different objects. 192 00:09:27,00 --> 00:09:29,07 And the third thing is for you to understand 193 00:09:29,07 --> 00:09:33,08 how you as a product manager get to control the threshold 194 00:09:33,08 --> 00:09:36,09 on what is applicable for your own application. 195 00:09:36,09 --> 00:09:40,09 And then finally, you can test it with your own image. 196 00:09:40,09 --> 00:09:42,05 And if you do not want 197 00:09:42,05 --> 00:09:45,00 to use the existing model because it doesn't work for you, 198 00:09:45,00 --> 00:09:48,03 you can try a vision model from some other technology company. 199 00:09:48,03 --> 00:09:50,09 I'm showing you Azure's Vision Studio, 200 00:09:50,09 --> 00:09:53,04 but test and see what works for you. 201 00:09:53,04 --> 00:09:57,07 Or as a last resort, you can go train this existing model 202 00:09:57,07 --> 00:09:59,03 with your own data set 203 00:09:59,03 --> 00:10:01,05 or your own collection of images 204 00:10:01,05 --> 00:10:02,09 so that you're training the model. 205 00:10:02,09 --> 00:10:05,02 At that point, it is important to remember 206 00:10:05,02 --> 00:10:08,00 the difference between training and inference. 207 00:10:08,00 --> 00:10:10,01 What you have here is an inference model. 208 00:10:10,01 --> 00:10:13,02 It has been pre-trained by Azure and is given to you.
209 00:10:13,02 --> 00:10:15,07 You are a user, it's an inference model. 210 00:10:15,07 --> 00:10:17,08 The minute you say, I'm going to train it 211 00:10:17,08 --> 00:10:22,07 and take a whole bunch of images and run the SDK 212 00:10:22,07 --> 00:10:27,06 or API on top of it, you are training the AI to learn, 213 00:10:27,06 --> 00:10:29,04 to get to the next level of AI. 214 00:10:29,04 --> 00:10:32,01 Again, you have to test it and then bring it to inference 215 00:10:32,01 --> 00:10:34,06 and then take it to your customer, to your environment 216 00:10:34,06 --> 00:10:36,06 to test whether it works for you, to see 217 00:10:36,06 --> 00:10:38,08 exactly what it is detecting 218 00:10:38,08 --> 00:10:41,08 so that it works for your example. 219 00:10:41,08 --> 00:10:44,06 So you might want to just try this out 220 00:10:44,06 --> 00:10:47,07 with different camera settings for factory equipment or city 221 00:10:47,07 --> 00:10:49,02 or home or medical devices 222 00:10:49,02 --> 00:10:51,07 or the AMR in the warehouse that we were talking about. 223 00:10:51,07 --> 00:10:53,07 Next, we will do an interesting challenge. 224 00:10:53,07 --> 00:10:54,07 And before I wrap up, 225 00:10:54,07 --> 00:10:56,07 I want to give you two more references. 226 00:10:56,07 --> 00:11:00,02 One is, if you're thinking about artificial intelligence 227 00:11:00,02 --> 00:11:03,00 in IoT and Edge devices, you can refer 228 00:11:03,00 --> 00:11:04,09 to my course artificial intelligence 229 00:11:04,09 --> 00:11:07,02 and connected products right here on LinkedIn Learning. 230 00:11:07,02 --> 00:11:10,04 Especially if you want to learn about computer vision 231 00:11:10,04 --> 00:11:13,00 and different models of computer vision, 232 00:11:13,00 --> 00:11:15,09 there is a special lesson just for that.
233 00:11:15,09 --> 00:11:18,05 And the other thing is, if you're thinking about, 234 00:11:18,05 --> 00:11:20,07 hey, this is human-computer interface 235 00:11:20,07 --> 00:11:22,09 and the device is standing here 236 00:11:22,09 --> 00:11:25,03 with the sparkly fire in here and the people 237 00:11:25,03 --> 00:11:28,05 are standing here, how do I design the AI 238 00:11:28,05 --> 00:11:33,00 to work as a proper application in a human-centered way? 239 00:11:33,00 --> 00:11:35,00 Then you can refer to my course AI Design 240 00:11:35,00 --> 00:11:36,02 Using Autonomous Vehicles. 241 00:11:36,02 --> 00:11:37,03 It's actually Innovating 242 00:11:37,03 --> 00:11:39,07 With AI Design Using Autonomous Vehicles 243 00:11:39,07 --> 00:11:41,05 and you can learn about a variety 244 00:11:41,05 --> 00:11:45,02 of different AI on Edge devices, including computer vision. 245 00:11:45,02 --> 00:11:47,05 So next, I'm going to see you at a challenge 246 00:11:47,05 --> 00:11:49,04 and I'll give you the solution 247 00:11:49,04 --> 00:11:51,06 and we will take this lesson forward 248 00:11:51,06 --> 00:11:55,00 with your own image from your own environment.