1 00:00:00,00 --> 00:00:05,09 (upbeat techno music) 2 00:00:05,09 --> 00:00:08,05 - [Instructor] Now I'm going to share my solution 3 00:00:08,05 --> 00:00:09,07 for the challenge 4 00:00:09,07 --> 00:00:15,02 where you have taken your own image from your environment 5 00:00:15,02 --> 00:00:20,06 and done object detection using Azure AI. 6 00:00:20,06 --> 00:00:22,03 So let's get started. 7 00:00:22,03 --> 00:00:26,00 I want to show you a couple of different images that I used 8 00:00:26,00 --> 00:00:29,02 and let's see what resonates with you 9 00:00:29,02 --> 00:00:33,06 and the image that you did object detection for. 10 00:00:33,06 --> 00:00:36,09 So here's my first image. 11 00:00:36,09 --> 00:00:38,05 Let's see. 12 00:00:38,05 --> 00:00:39,09 Okay. 13 00:00:39,09 --> 00:00:44,08 So this shows five people, okay? 14 00:00:44,08 --> 00:00:49,03 And it is showing all five people, again, 15 00:00:49,03 --> 00:00:51,04 with different degrees of confidence, right? 16 00:00:51,04 --> 00:00:53,02 This is 88, 90. 17 00:00:53,02 --> 00:00:55,03 This person is slightly at the back, 18 00:00:55,03 --> 00:00:57,05 so it is saying 57. 19 00:00:57,05 --> 00:00:59,01 This person is 81. 20 00:00:59,01 --> 00:01:00,05 This person is also at the back, 21 00:01:00,05 --> 00:01:02,09 but it recognizes them at 81. 22 00:01:02,09 --> 00:01:06,02 And this person is looking slightly bigger 23 00:01:06,02 --> 00:01:07,08 because they're in the front and they are taller, 24 00:01:07,08 --> 00:01:10,02 so that's 93%. 25 00:01:10,02 --> 00:01:13,07 So if you want, based on the use case, remember you have 26 00:01:13,07 --> 00:01:15,04 to change the threshold. 27 00:01:15,04 --> 00:01:17,05 So the important thing is in this case, 28 00:01:17,05 --> 00:01:21,01 if you wanted the computer to be captured, 29 00:01:21,01 --> 00:01:22,03 then it's not being captured. 30 00:01:22,03 --> 00:01:24,04 If you want this machine to be captured, 31 00:01:24,04 --> 00:01:26,01 then it's not being captured.
32 00:01:26,01 --> 00:01:29,05 So it's important for you to think about what you want 33 00:01:29,05 --> 00:01:33,00 to capture, whether that is missing in this picture. 34 00:01:33,00 --> 00:01:33,08 Then what do you do? 35 00:01:33,08 --> 00:01:35,07 You'll have to train the model. 36 00:01:35,07 --> 00:01:38,01 And you have to create an account, 37 00:01:38,01 --> 00:01:40,07 sign into Azure and get started, 38 00:01:40,07 --> 00:01:43,00 and the SDK and everything is there. 39 00:01:43,00 --> 00:01:44,01 It's going to cost you money, 40 00:01:44,01 --> 00:01:45,07 so they're very transparent about it, 41 00:01:45,07 --> 00:01:48,01 and you can review the pricing. 42 00:01:48,01 --> 00:01:51,00 I'm going to show you another image. 43 00:01:51,00 --> 00:01:53,09 Here's the next one. 44 00:01:53,09 --> 00:01:56,05 So here, it is showing four people. 45 00:01:56,05 --> 00:01:58,09 Seems very straightforward, right? 46 00:01:58,09 --> 00:02:02,06 So again, it is showing 92%, 47 00:02:02,06 --> 00:02:04,08 95, 91, 87. 48 00:02:04,08 --> 00:02:06,03 This is pretty good. 49 00:02:06,03 --> 00:02:08,01 And I wonder what they are all looking at. 50 00:02:08,01 --> 00:02:09,04 They seem very excited. 51 00:02:09,04 --> 00:02:12,02 So if I reduce the threshold, 52 00:02:12,02 --> 00:02:14,09 it's still showing four people. 53 00:02:14,09 --> 00:02:19,07 If I increase the threshold, it still shows four people. 54 00:02:19,07 --> 00:02:21,03 So that's not too bad. 55 00:02:21,03 --> 00:02:24,07 And again, it doesn't recognize this image in here. 56 00:02:24,07 --> 00:02:29,04 If I wanted it to understand this machine in here, I'm going 57 00:02:29,04 --> 00:02:31,02 to need a lot more training data 58 00:02:31,02 --> 00:02:35,03 to retrain this existing pre-trained model.
59 00:02:35,03 --> 00:02:37,08 So you understand that that's a drill 60 00:02:37,08 --> 00:02:40,01 that we have to go through and it's an expensive exercise, 61 00:02:40,01 --> 00:02:42,05 so you have to be aware of that. 62 00:02:42,05 --> 00:02:44,07 So I'm going to show you a third image. 63 00:02:44,07 --> 00:02:47,01 This is an exciting one. 64 00:02:47,01 --> 00:02:51,02 On the surface, it looks like more people in the factory, 65 00:02:51,02 --> 00:02:53,00 and it's three people. 66 00:02:53,00 --> 00:02:56,00 One person with 81% confidence. 67 00:02:56,00 --> 00:02:58,04 Another recognized at 71. 68 00:02:58,04 --> 00:03:01,02 Another recognized at 92. 69 00:03:01,02 --> 00:03:04,01 And again, it's not recognizing the laptop or paper, 70 00:03:04,01 --> 00:03:08,06 and it is actually showing more people recognized, right? 71 00:03:08,06 --> 00:03:11,04 So it's recognizing even this blurred image 72 00:03:11,04 --> 00:03:14,00 of a person at the back here 73 00:03:14,00 --> 00:03:15,05 and this other person at the back. 74 00:03:15,05 --> 00:03:17,08 That's not bad, right? 75 00:03:17,08 --> 00:03:20,09 Again, it depends on what you want the environment 76 00:03:20,09 --> 00:03:24,03 to be seen, how you want the environment to be seen, 77 00:03:24,03 --> 00:03:28,01 what you want to be visible to the camera, right? 78 00:03:28,01 --> 00:03:29,07 And the reason I want 79 00:03:29,07 --> 00:03:34,07 to show this picture is you can see there are five people, 80 00:03:34,07 --> 00:03:39,02 and it is showing six people in here. 81 00:03:39,02 --> 00:03:41,00 Who noticed that? 82 00:03:41,00 --> 00:03:42,00 Nice job. 83 00:03:42,00 --> 00:03:44,02 Which one is missing? 84 00:03:44,02 --> 00:03:47,00 This person that it is recognizing 85 00:03:47,00 --> 00:03:53,06 at 54.7% confidence doesn't exist in the picture. 86 00:03:53,06 --> 00:03:56,02 Let's go figure that out. 87 00:03:56,02 --> 00:03:59,09 So, we've been talking about recognizing objects. 
88 00:03:59,09 --> 00:04:01,02 It's doing a pretty good job. 89 00:04:01,02 --> 00:04:03,05 This Azure AI is doing a pretty good job 90 00:04:03,05 --> 00:04:05,07 of recognizing persons. 91 00:04:05,07 --> 00:04:09,04 And in the previous demo, the real demo exercise 92 00:04:09,04 --> 00:04:11,05 that I showed you, if you want to go back, 93 00:04:11,05 --> 00:04:15,00 look at the demo lesson, a couple lessons back, 94 00:04:15,00 --> 00:04:17,03 a couple videos back, you can see 95 00:04:17,03 --> 00:04:21,07 that it recognized the footwear of the person, the seating. 96 00:04:21,07 --> 00:04:25,01 It did a lot more object recognition. 97 00:04:25,01 --> 00:04:28,02 And in this case, it doesn't seem to be doing that. 98 00:04:28,02 --> 00:04:32,01 But if that's not your requirement, this is great, 99 00:04:32,01 --> 00:04:33,08 then it'll do the job for you. 100 00:04:33,08 --> 00:04:37,03 But, I am interested in seeing what is it looking at 101 00:04:37,03 --> 00:04:40,00 as this other person? 102 00:04:40,00 --> 00:04:45,08 Aha, it's a person at 54% confidence, 103 00:04:45,08 --> 00:04:47,07 and it's not a person at all. 104 00:04:47,07 --> 00:04:51,09 It is somehow seeing this part of this machinery 105 00:04:51,09 --> 00:04:54,00 as the head of a person, 106 00:04:54,00 --> 00:04:58,06 and it is saying, "I see a person, not so confident. 107 00:04:58,06 --> 00:05:02,05 It is at 54.7% confidence." 108 00:05:02,05 --> 00:05:04,01 So the lesson here is 109 00:05:04,01 --> 00:05:07,05 you create pictures from your own environment 110 00:05:07,05 --> 00:05:11,09 on what your Edge AI camera should be watching for, 111 00:05:11,09 --> 00:05:15,03 and you test it with lots of different images. 112 00:05:15,03 --> 00:05:19,02 So, don't pick one image, test it and say, "Okay, good. 113 00:05:19,02 --> 00:05:20,05 It's doing what it needs to do." 114 00:05:20,05 --> 00:05:23,05 Here it is making things up. 115 00:05:23,05 --> 00:05:26,07 And all AI is stochastic.
116 00:05:26,07 --> 00:05:30,00 It makes a prediction every single time you deal 117 00:05:30,00 --> 00:05:32,07 with an inference AI and feed it a piece of data. 118 00:05:32,07 --> 00:05:35,01 In this case, it's computer vision. 119 00:05:35,01 --> 00:05:38,04 And the data is this image. 120 00:05:38,04 --> 00:05:39,08 And when you feed it an image, 121 00:05:39,08 --> 00:05:42,00 it is going to make a prediction. 122 00:05:42,00 --> 00:05:44,04 It's not just the low degree of confidence, 123 00:05:44,04 --> 00:05:49,04 but it can make a false positive or a false negative. 124 00:05:49,04 --> 00:05:52,00 So it's important to understand whether it is missing out 125 00:05:52,00 --> 00:05:55,01 on some objects that it should recognize. 126 00:05:55,01 --> 00:05:56,05 Then you need to retrain the model 127 00:05:56,05 --> 00:06:00,00 or find a different model that could work 128 00:06:00,00 --> 00:06:02,00 for you from a different vendor maybe. 129 00:06:02,00 --> 00:06:03,04 But in this case, 130 00:06:03,04 --> 00:06:08,07 it is making a false prediction as if a person is there. 131 00:06:08,07 --> 00:06:12,01 So in this case, if you're trying to get a headcount of 132 00:06:12,01 --> 00:06:14,03 how many people are in the factory, instead 133 00:06:14,03 --> 00:06:16,06 of five people, it's going to say six. 134 00:06:16,06 --> 00:06:18,07 What if you're running a fire drill 135 00:06:18,07 --> 00:06:19,05 and you want to make sure 136 00:06:19,05 --> 00:06:20,09 that everyone has left the building? 137 00:06:20,09 --> 00:06:23,08 It'll create a problem 138 00:06:23,08 --> 00:06:25,03 because you're going to be looking 139 00:06:25,03 --> 00:06:28,04 for this non-existent person who's just sitting and smiling 140 00:06:28,04 --> 00:06:30,09 and happens to be a machine. 141 00:06:30,09 --> 00:06:32,01 So think about it.
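The headcount example can be put into numbers. The following is a minimal sketch, not part of the original demo: it assumes all five real people were detected (five true positives, no false negatives) and counts the machinery mistaken for a person as one false positive. The function name `precision_recall` is a hypothetical helper introduced only for illustration.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    detected = true_positives + false_positives   # everything the model reported
    actual = true_positives + false_negatives     # everything really in the scene
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Assumed counts for the third image: 5 real people found, 1 phantom person, 0 missed.
precision, recall = precision_recall(true_positives=5, false_positives=1, false_negatives=0)
reported_headcount = 5 + 1

print(f"reported headcount: {reported_headcount}")
print(f"precision: {precision:.2f}, recall: {recall:.2f}")
```

Under these assumptions nobody is missed, so recall is perfect, but one in six reported persons is not real, and that is the kind of error that matters for a fire-drill headcount.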
142 00:06:32,01 --> 00:06:34,05 I want you to get very imaginative 143 00:06:34,05 --> 00:06:38,09 about the different possibilities of the use cases 144 00:06:38,09 --> 00:06:41,08 of every Edge device that you have. 145 00:06:41,08 --> 00:06:43,05 What can it do? 146 00:06:43,05 --> 00:06:45,08 What can it not do? 147 00:06:45,08 --> 00:06:47,07 What should it not be doing? 148 00:06:47,07 --> 00:06:51,02 It should not be making fake person predictions 149 00:06:51,02 --> 00:06:52,03 in your images. 150 00:06:52,03 --> 00:06:54,04 An easy way to deal with this, 151 00:06:54,04 --> 00:06:57,00 now that you know it is at 54% confidence is 152 00:06:57,00 --> 00:06:59,01 to change the threshold value. 153 00:06:59,01 --> 00:07:02,07 So here, if I bring it down all the way, 154 00:07:02,07 --> 00:07:04,09 nope, it doesn't. 155 00:07:04,09 --> 00:07:07,01 It's still, if I change the threshold... 156 00:07:07,01 --> 00:07:08,00 Oh, it did. 157 00:07:08,00 --> 00:07:09,02 So if I went this way 158 00:07:09,02 --> 00:07:14,01 and change the threshold, it says, aha, 54. 159 00:07:14,01 --> 00:07:17,07 Mm, it's more than that, then it will, right? 160 00:07:17,07 --> 00:07:19,09 So now, I change the threshold, 161 00:07:19,09 --> 00:07:25,00 and I said, "Okay, if I have a high threshold, 57, 162 00:07:25,00 --> 00:07:28,02 it's not going to make that prediction of that person." 163 00:07:28,02 --> 00:07:31,03 So, you could actually change the threshold 164 00:07:31,03 --> 00:07:34,06 and get rid of these imaginary images 165 00:07:34,06 --> 00:07:36,04 that you don't want in the picture, 166 00:07:36,04 --> 00:07:37,07 which have low confidence. 167 00:07:37,07 --> 00:07:39,07 But what if this image was predicted, 168 00:07:39,07 --> 00:07:41,01 but it was at high confidence? 169 00:07:41,01 --> 00:07:42,06 Then this threshold won't work. 170 00:07:42,06 --> 00:07:44,03 So that's a trade-off you make.
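The threshold slider shown in the demo amounts to a simple filter on confidence scores. Here is a rough sketch of that idea, using only the four confidence values that are read out for the third image (81, 71, 92 and 54.7); the `Detection` type and `filter_by_threshold` function are hypothetical names for illustration, not part of the Azure SDK.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    confidence: float  # 0.0 to 1.0


def filter_by_threshold(detections, threshold):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Confidences mentioned for the third image; the last one is the
# piece of machinery that was mistaken for a person.
detections = [
    Detection("person", 0.81),
    Detection("person", 0.71),
    Detection("person", 0.92),
    Detection("person", 0.547),
]

for threshold in (0.50, 0.57):
    kept = filter_by_threshold(detections, threshold)
    print(f"threshold={threshold:.2f}: {len(kept)} detections kept")
```

Raising the threshold to 0.57 drops the phantom person, but as noted above, this only helps when the false prediction happens to have low confidence.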
171 00:07:44,03 --> 00:07:47,06 So first you get pictures of your environment, get lots 172 00:07:47,06 --> 00:07:49,06 of different pictures of your environment. 173 00:07:49,06 --> 00:07:53,03 Look for what you are expecting the camera to catch, 174 00:07:53,03 --> 00:07:55,03 make sure that's covered. 175 00:07:55,03 --> 00:07:58,05 Make sure there are no objects that are missing. 176 00:07:58,05 --> 00:08:00,04 Then you have to get more and more pictures 177 00:08:00,04 --> 00:08:01,08 to train your model, 178 00:08:01,08 --> 00:08:04,01 and it's going to cost you money, right? 179 00:08:04,01 --> 00:08:06,00 The other thing is you don't want the model 180 00:08:06,00 --> 00:08:08,05 to be making false predictions 181 00:08:08,05 --> 00:08:11,03 and showing things that don't exist. 182 00:08:11,03 --> 00:08:13,09 And again, I was giving you this example of, you know, 183 00:08:13,09 --> 00:08:17,00 fire drill and making sure everyone got out of the building. 184 00:08:17,00 --> 00:08:19,07 But, it might not matter if it is showing some piece 185 00:08:19,07 --> 00:08:23,07 of equipment or something that is not as material 186 00:08:23,07 --> 00:08:24,08 for your use case. 187 00:08:24,08 --> 00:08:26,00 It might not matter. 188 00:08:26,00 --> 00:08:26,09 But if you're sitting 189 00:08:26,09 --> 00:08:29,01 and using your computer vision model to sit 190 00:08:29,01 --> 00:08:31,08 and count how many products were produced, 191 00:08:31,08 --> 00:08:36,01 and it is going to do a counting by identifying them. 192 00:08:36,01 --> 00:08:40,06 If it makes a false prediction, it might double count 193 00:08:40,06 --> 00:08:42,05 and it might cost you money. 194 00:08:42,05 --> 00:08:46,02 So, it is important to not get carried away by, wow, 195 00:08:46,02 --> 00:08:48,03 it is recognizing people, 196 00:08:48,03 --> 00:08:51,07 but to get to the purpose of the use case 197 00:08:51,07 --> 00:08:53,05 of what you want it to do.
198 00:08:53,05 --> 00:08:55,05 So it has to identify the things you want. 199 00:08:55,05 --> 00:08:58,03 It has to not identify the things you don't want. 200 00:08:58,03 --> 00:09:00,07 You can play with the threshold value. 201 00:09:00,07 --> 00:09:04,08 Try to make this work with the existing API. 202 00:09:04,08 --> 00:09:08,07 Again, this, I'm showing this as a simple demo style 203 00:09:08,07 --> 00:09:12,09 of input images and then try it easily. 204 00:09:12,09 --> 00:09:16,08 But you can also do this using the help of a data scientist. 205 00:09:16,08 --> 00:09:18,06 Use your REST API. 206 00:09:18,06 --> 00:09:21,07 You can use the SDK for reference. 207 00:09:21,07 --> 00:09:23,02 When would you do that? 208 00:09:23,02 --> 00:09:25,01 When you want to integrate this. 209 00:09:25,01 --> 00:09:27,08 So the easiest thing I think as a product manager, 210 00:09:27,08 --> 00:09:30,09 you think about different images and test it. 211 00:09:30,09 --> 00:09:32,01 See if this works for you. 212 00:09:32,01 --> 00:09:33,09 So when many different vendors show up 213 00:09:33,09 --> 00:09:35,06 and say, "Hey, we can do computer vision 214 00:09:35,06 --> 00:09:39,01 and object detection, and we can do anomaly detection 215 00:09:39,01 --> 00:09:41,07 and find that odd thing out in your image", 216 00:09:41,07 --> 00:09:43,07 you actually think about your use case, 217 00:09:43,07 --> 00:09:46,05 you think about the accuracy, think about the threshold, 218 00:09:46,05 --> 00:09:48,05 test it out with various different pictures.
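Testing a vendor's model against many pictures at several thresholds can be organized as a small table of expected versus reported counts. The sketch below is illustrative only: the image names are made up, the second image uses the four confidences read out in the demo, and the third image mixes the stated values (0.81, 0.71, 0.92, 0.547) with two invented values (0.60, 0.65) standing in for the two people at the back, whose scores are not given.

```python
# Hypothetical test set: image name -> (expected person count, confidences).
# 0.60 and 0.65 are invented placeholders, not values from the demo.
TEST_IMAGES = {
    "second_image": (4, [0.92, 0.95, 0.91, 0.87]),
    "third_image": (5, [0.81, 0.71, 0.92, 0.60, 0.65, 0.547]),
}


def count_at(confidences, threshold):
    """Number of detections at or above the threshold."""
    return sum(1 for c in confidences if c >= threshold)


def mismatches(test_images, threshold):
    """Return {image: (expected, got)} for images whose count is wrong."""
    wrong = {}
    for name, (expected, confidences) in test_images.items():
        got = count_at(confidences, threshold)
        if got != expected:
            wrong[name] = (expected, got)
    return wrong


for threshold in (0.50, 0.57, 0.90):
    print(f"threshold={threshold:.2f}: mismatches={mismatches(TEST_IMAGES, threshold)}")
```

With this made-up data, a low threshold lets the phantom person through, a very high one starts dropping real people, and a value in between matches both images. The right setting depends on the use case and on how many of your own images you test.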
219 00:09:48,05 --> 00:09:53,00 And then, you can actually use the REST API and SDK, 220 00:09:53,00 --> 00:09:54,03 go to your IT department, 221 00:09:54,03 --> 00:09:56,01 or you know, if you're going 222 00:09:56,01 --> 00:09:57,08 to get your data scientists in-house, 223 00:09:57,08 --> 00:10:01,06 get them to actually use this as an Azure AI 224 00:10:01,06 --> 00:10:03,03 from Vision Studio, 225 00:10:03,03 --> 00:10:07,01 and integrate it into your own product, into your workflow, 226 00:10:07,01 --> 00:10:10,04 into your existing technology as an existing AI. 227 00:10:10,04 --> 00:10:12,04 So that's what happens with all Edge AI. 228 00:10:12,04 --> 00:10:15,01 Edge AI is built, trained by somebody. 229 00:10:15,01 --> 00:10:17,04 It could be retrained by your own team if you want, 230 00:10:17,04 --> 00:10:18,08 which is expensive. 231 00:10:18,08 --> 00:10:21,00 It is all done by data science. 232 00:10:21,00 --> 00:10:24,07 And then the final, the product, which is the AI, 233 00:10:24,07 --> 00:10:26,04 is an inference model. 234 00:10:26,04 --> 00:10:28,03 You just burn it into the device 235 00:10:28,03 --> 00:10:31,01 or you actually get that AI 236 00:10:31,01 --> 00:10:33,05 and integrate it into your workflow 237 00:10:33,05 --> 00:10:34,09 so that it solves your problem. 238 00:10:34,09 --> 00:10:37,06 So never lose focus on what is the customer problem. 239 00:10:37,06 --> 00:10:38,06 What is the use case? 240 00:10:38,06 --> 00:10:40,08 What is the degree of accuracy you want, 241 00:10:40,08 --> 00:10:44,04 and what is right for your business and customers? 242 00:10:44,04 --> 00:10:46,00 Good luck.