1
00:00:00,00 --> 00:00:05,09
(upbeat techno music)

2
00:00:05,09 --> 00:00:08,05
- [Instructor] Now I'm going to share my solution

3
00:00:08,05 --> 00:00:09,07
for the challenge

4
00:00:09,07 --> 00:00:15,02
where you have taken your own image from your environment

5
00:00:15,02 --> 00:00:20,06
and done object detection using Azure AI.

6
00:00:20,06 --> 00:00:22,03
So let's get started.

7
00:00:22,03 --> 00:00:26,00
I want to show you a couple of different images that I used

8
00:00:26,00 --> 00:00:29,02
and let's see what resonates with you

9
00:00:29,02 --> 00:00:33,06
and the image that you did object detection for.

10
00:00:33,06 --> 00:00:36,09
So here's my first image.

11
00:00:36,09 --> 00:00:38,05
Let's see.

12
00:00:38,05 --> 00:00:39,09
Okay.

13
00:00:39,09 --> 00:00:44,08
So this shows five people, okay?

14
00:00:44,08 --> 00:00:49,03
And it is showing all five people, again,

15
00:00:49,03 --> 00:00:51,04
with different degrees of confidence, right?

16
00:00:51,04 --> 00:00:53,02
This is 88, 90.

17
00:00:53,02 --> 00:00:55,03
This person is slightly at the back,

18
00:00:55,03 --> 00:00:57,05
so it is saying 57.

19
00:00:57,05 --> 00:00:59,01
This person is 81.

20
00:00:59,01 --> 00:01:00,05
This person is also at the back,

21
00:01:00,05 --> 00:01:02,09
but it recognizes them at 81.

22
00:01:02,09 --> 00:01:06,02
And this person is looking slightly bigger

23
00:01:06,02 --> 00:01:07,08
because they're in the front and they are taller,

24
00:01:07,08 --> 00:01:10,02
so that's 93%.

25
00:01:10,02 --> 00:01:13,07
So if you want, based on the use case, remember you have

26
00:01:13,07 --> 00:01:15,04
to change the threshold.

27
00:01:15,04 --> 00:01:17,05
So the important thing is in this case,

28
00:01:17,05 --> 00:01:21,01
if you wanted the computer to be captured,

29
00:01:21,01 --> 00:01:22,03
then it's not being captured.

30
00:01:22,03 --> 00:01:24,04
If you want this machine to be captured,

31
00:01:24,04 --> 00:01:26,01
then it's not being captured.

32
00:01:26,01 --> 00:01:29,05
So it's important for you to think about what you want

33
00:01:29,05 --> 00:01:33,00
to capture, whether that is missing in this picture.

34
00:01:33,00 --> 00:01:33,08
Then what do you do?

35
00:01:33,08 --> 00:01:35,07
You'll have to train the model.

36
00:01:35,07 --> 00:01:38,01
And you have to create an account,

37
00:01:38,01 --> 00:01:40,07
sign into Azure and get started,

38
00:01:40,07 --> 00:01:43,00
and the SDK and everything is there.

39
00:01:43,00 --> 00:01:44,01
It's going to cost you money,

40
00:01:44,01 --> 00:01:45,07
so they're very transparent about it,

41
00:01:45,07 --> 00:01:48,01
and you can review the pricing.

42
00:01:48,01 --> 00:01:51,00
I'm going to show you another image.

43
00:01:51,00 --> 00:01:53,09
Here's the next one.

44
00:01:53,09 --> 00:01:56,05
So here, it is showing four people.

45
00:01:56,05 --> 00:01:58,09
Seems very straightforward, right?

46
00:01:58,09 --> 00:02:02,06
So again, it is showing 92%,

47
00:02:02,06 --> 00:02:04,08
95, 91, 87.

48
00:02:04,08 --> 00:02:06,03
This is pretty good.

49
00:02:06,03 --> 00:02:08,01
And I wonder what they are all looking at.

50
00:02:08,01 --> 00:02:09,04
They seem very excited.

51
00:02:09,04 --> 00:02:12,02
So if I reduce the threshold,

52
00:02:12,02 --> 00:02:14,09
it's still showing four people.

53
00:02:14,09 --> 00:02:19,07
If I increase the threshold, it still shows four people.

54
00:02:19,07 --> 00:02:21,03
So that's not too bad.

55
00:02:21,03 --> 00:02:24,07
And again, it doesn't recognize this image in here.

56
00:02:24,07 --> 00:02:29,04
If I wanted it to understand this machine in here, I'm going

57
00:02:29,04 --> 00:02:31,02
to need a lot more training data

58
00:02:31,02 --> 00:02:35,03
to retrain this existing pre-trained model.

59
00:02:35,03 --> 00:02:37,08
So you understand that that's a drill

60
00:02:37,08 --> 00:02:40,01
that we have to go through and it's an expensive exercise,

61
00:02:40,01 --> 00:02:42,05
so you have to be aware of that.

62
00:02:42,05 --> 00:02:44,07
So I'm going to show you a third image.

63
00:02:44,07 --> 00:02:47,01
This is an exciting one.

64
00:02:47,01 --> 00:02:51,02
On the surface, it looks like more people in the factory,

65
00:02:51,02 --> 00:02:53,00
and it's three people.

66
00:02:53,00 --> 00:02:56,00
One person with 81% confidence.

67
00:02:56,00 --> 00:02:58,04
Another recognized at 71.

68
00:02:58,04 --> 00:03:01,02
Another recognized at 92.

69
00:03:01,02 --> 00:03:04,01
And again, it's not recognizing the laptop or paper,

70
00:03:04,01 --> 00:03:08,06
and it is actually showing more people recognized, right?

71
00:03:08,06 --> 00:03:11,04
So it's recognizing even this blurred image

72
00:03:11,04 --> 00:03:14,00
of a person at the back here

73
00:03:14,00 --> 00:03:15,05
and this other person at the back.

74
00:03:15,05 --> 00:03:17,08
That's not bad, right?

75
00:03:17,08 --> 00:03:20,09
Again, it depends on what you want the environment

76
00:03:20,09 --> 00:03:24,03
to be seen, how you want the environment to be seen,

77
00:03:24,03 --> 00:03:28,01
what you want to be visible to the camera, right?

78
00:03:28,01 --> 00:03:29,07
And the reason I want

79
00:03:29,07 --> 00:03:34,07
to show this picture is you can see there are five people,

80
00:03:34,07 --> 00:03:39,02
and it is showing six people in here.

81
00:03:39,02 --> 00:03:41,00
Who noticed that?

82
00:03:41,00 --> 00:03:42,00
Nice job.

83
00:03:42,00 --> 00:03:44,02
Which one is missing?

84
00:03:44,02 --> 00:03:47,00
This person that it is recognizing

85
00:03:47,00 --> 00:03:53,06
at 54.7% confidence doesn't exist in the picture.

86
00:03:53,06 --> 00:03:56,02
Let's go figure that out.

87
00:03:56,02 --> 00:03:59,09
So, we've been talking about recognizing objects.

88
00:03:59,09 --> 00:04:01,02
It's doing a pretty good job.

89
00:04:01,02 --> 00:04:03,05
This Azure AI is doing a pretty good job

90
00:04:03,05 --> 00:04:05,07
of recognizing persons.

91
00:04:05,07 --> 00:04:09,04
And in the previous demo, the real demo exercise

92
00:04:09,04 --> 00:04:11,05
that I showed you, if you want to go back,

93
00:04:11,05 --> 00:04:15,00
look at the demo lesson, a couple lessons back,

94
00:04:15,00 --> 00:04:17,03
a couple videos back, you can see

95
00:04:17,03 --> 00:04:21,07
that it recognized the footwear of the person, the seating.

96
00:04:21,07 --> 00:04:25,01
It did a lot more object recognition.

97
00:04:25,01 --> 00:04:28,02
And in this case, it doesn't seem to be doing that.

98
00:04:28,02 --> 00:04:32,01
But if that's not your requirement, this is great,

99
00:04:32,01 --> 00:04:33,08
then it'll do the job for you.

100
00:04:33,08 --> 00:04:37,03
But I am interested in seeing what it is looking at

101
00:04:37,03 --> 00:04:40,00
as this other person.

102
00:04:40,00 --> 00:04:45,08
Aha, it's a person at 54% confidence,

103
00:04:45,08 --> 00:04:47,07
and it's not a person at all.

104
00:04:47,07 --> 00:04:51,09
It is somehow seeing this part of this machinery

105
00:04:51,09 --> 00:04:54,00
as the head of a person,

106
00:04:54,00 --> 00:04:58,06
and it is saying, "I see a person, but I'm not so confident.

107
00:04:58,06 --> 00:05:02,05
It is at 54.7% confidence."

108
00:05:02,05 --> 00:05:04,01
So the lesson here is

109
00:05:04,01 --> 00:05:07,05
you create pictures from your own environment

110
00:05:07,05 --> 00:05:11,09
of what your Edge AI camera should be watching for,

111
00:05:11,09 --> 00:05:15,03
and you test it with lots of different images.

112
00:05:15,03 --> 00:05:19,02
So, don't pick one image, test it and say, "Okay, good.

113
00:05:19,02 --> 00:05:20,05
It's doing what it needs to do."

114
00:05:20,05 --> 00:05:23,05
Here it is making things up.

115
00:05:23,05 --> 00:05:26,07
And all AI is stochastic.

116
00:05:26,07 --> 00:05:30,00
It makes a prediction every single time you deal

117
00:05:30,00 --> 00:05:32,07
with an inference AI and feed it a piece of data.

118
00:05:32,07 --> 00:05:35,01
In this case, it's computer vision.

119
00:05:35,01 --> 00:05:38,04
And the data is this image.

120
00:05:38,04 --> 00:05:39,08
And when you feed it an image,

121
00:05:39,08 --> 00:05:42,00
it is going to make a prediction.

122
00:05:42,00 --> 00:05:44,04
It's not just the low degree of confidence,

123
00:05:44,04 --> 00:05:49,04
but it can make a false positive or a false negative.

124
00:05:49,04 --> 00:05:52,00
So it's important to understand whether it is missing out

125
00:05:52,00 --> 00:05:55,01
on some objects that it should recognize.

126
00:05:55,01 --> 00:05:56,05
Then you need to retrain the model

127
00:05:56,05 --> 00:06:00,00
or find a different model that could work

128
00:06:00,00 --> 00:06:02,00
for you from a different vendor maybe.

129
00:06:02,00 --> 00:06:03,04
But in this case,

130
00:06:03,04 --> 00:06:08,07
it is making a false prediction as if a person is there.

131
00:06:08,07 --> 00:06:12,01
So in this case, if you're trying to get a headcount of

132
00:06:12,01 --> 00:06:14,03
how many people are in the factory, instead

133
00:06:14,03 --> 00:06:16,06
of five people, it's going to say six.

134
00:06:16,06 --> 00:06:18,07
What if you're running a fire drill

135
00:06:18,07 --> 00:06:19,05
and you want to make sure

136
00:06:19,05 --> 00:06:20,09
that everyone has left the building,

137
00:06:20,09 --> 00:06:23,08
and it'll create a problem

138
00:06:23,08 --> 00:06:25,03
because you're going to be looking

139
00:06:25,03 --> 00:06:28,04
for this non-existent person who's just sitting and smiling

140
00:06:28,04 --> 00:06:30,09
and happens to be a machine.

141
00:06:30,09 --> 00:06:32,01
So think about it.

142
00:06:32,01 --> 00:06:34,05
I want you to get very imaginative

143
00:06:34,05 --> 00:06:38,09
about the different possibilities of the use cases

144
00:06:38,09 --> 00:06:41,08
of every Edge device that you have.

145
00:06:41,08 --> 00:06:43,05
What can it do?

146
00:06:43,05 --> 00:06:45,08
What can it not do?

147
00:06:45,08 --> 00:06:47,07
What should it not be doing?

148
00:06:47,07 --> 00:06:51,02
It should not be making fake person predictions

149
00:06:51,02 --> 00:06:52,03
in your images.

150
00:06:52,03 --> 00:06:54,04
An easy way to deal with this,

151
00:06:54,04 --> 00:06:57,00
now that you know it is at 54% confidence, is

152
00:06:57,00 --> 00:06:59,01
to change the threshold value.

153
00:06:59,01 --> 00:07:02,07
So here, if I bring it down all the way,

154
00:07:02,07 --> 00:07:04,09
nope, it doesn't.

155
00:07:04,09 --> 00:07:07,01
It's still, if I change the threshold...

156
00:07:07,01 --> 00:07:08,00
Oh, it did.

157
00:07:08,00 --> 00:07:09,02
So if I went this way

158
00:07:09,02 --> 00:07:14,01
and change the threshold, it says, aha, 54.

159
00:07:14,01 --> 00:07:17,07
Mm, it's more than that, then it will, right?

160
00:07:17,07 --> 00:07:19,09
So now, I change the threshold,

161
00:07:19,09 --> 00:07:25,00
and I said, "Okay, if I have a high threshold, 57,

162
00:07:25,00 --> 00:07:28,02
it's not going to make that prediction of that person."

163
00:07:28,02 --> 00:07:31,03
So, you could actually change the threshold

164
00:07:31,03 --> 00:07:34,06
and get rid of these imaginary images

165
00:07:34,06 --> 00:07:36,04
that you don't want in the picture,

166
00:07:36,04 --> 00:07:37,07
which have low confidence.

167
00:07:37,07 --> 00:07:39,07
But what if this image was predicted,

168
00:07:39,07 --> 00:07:41,01
but it was at high confidence?

169
00:07:41,01 --> 00:07:42,06
Then this threshold won't work.

170
00:07:42,06 --> 00:07:44,03
So that's a trade-off you make.

171
00:07:44,03 --> 00:07:47,06
So first you get pictures of your environment, get lots

172
00:07:47,06 --> 00:07:49,06
of different pictures of your environment.

173
00:07:49,06 --> 00:07:53,03
Look for what you are expecting the camera to catch,

174
00:07:53,03 --> 00:07:55,03
make sure that's covered.

175
00:07:55,03 --> 00:07:58,05
Make sure no objects are missing.

176
00:07:58,05 --> 00:08:00,04
If some are, you have to get more and more pictures

177
00:08:00,04 --> 00:08:01,08
to train your model,

178
00:08:01,08 --> 00:08:04,01
and it's going to cost you money, right?

179
00:08:04,01 --> 00:08:06,00
The other thing is you don't want the model

180
00:08:06,00 --> 00:08:08,05
to be making false predictions

181
00:08:08,05 --> 00:08:11,03
and showing things that don't exist.

182
00:08:11,03 --> 00:08:13,09
And again, I was giving you this example of, you know,

183
00:08:13,09 --> 00:08:17,00
fire drill and making sure everyone got out of the building.

184
00:08:17,00 --> 00:08:19,07
But, it might not matter if it is showing some piece

185
00:08:19,07 --> 00:08:23,07
of equipment or something that is not as material

186
00:08:23,07 --> 00:08:24,08
for your use case.

187
00:08:24,08 --> 00:08:26,00
It might not matter.

188
00:08:26,00 --> 00:08:26,09
But if you're sitting

189
00:08:26,09 --> 00:08:29,01
and using your computer vision model to sit

190
00:08:29,01 --> 00:08:31,08
and count how many products were produced,

191
00:08:31,08 --> 00:08:36,01
and it is going to do the counting by identifying them.

192
00:08:36,01 --> 00:08:40,06
If it makes a false prediction, it might double count

193
00:08:40,06 --> 00:08:42,05
and it might cost you money.

194
00:08:42,05 --> 00:08:46,02
So, it is important to not get carried away by, wow,

195
00:08:46,02 --> 00:08:48,03
it is recognizing people,

196
00:08:48,03 --> 00:08:51,07
but to get to the purpose of the use case

197
00:08:51,07 --> 00:08:53,05
of what you want it to do.

198
00:08:53,05 --> 00:08:55,05
So it has to identify the things you want.

199
00:08:55,05 --> 00:08:58,03
It has to not identify the things you don't want.

200
00:08:58,03 --> 00:09:00,07
You can play with the threshold value.

201
00:09:00,07 --> 00:09:04,08
Try to make this work with the existing API.

202
00:09:04,08 --> 00:09:08,07
Again, I'm showing this in a simple demo style

203
00:09:08,07 --> 00:09:12,09
where you input images and try it out easily.

204
00:09:12,09 --> 00:09:16,08
But you can also do this using the help of a data scientist.

205
00:09:16,08 --> 00:09:18,06
Use your REST API.

206
00:09:18,06 --> 00:09:21,07
You can use the SDK for reference.

207
00:09:21,07 --> 00:09:23,02
When would you do that?

208
00:09:23,02 --> 00:09:25,01
When you want to integrate this.

209
00:09:25,01 --> 00:09:27,08
So the easiest thing, I think, as a product manager,

210
00:09:27,08 --> 00:09:30,09
is to think about different images and test them.

211
00:09:30,09 --> 00:09:32,01
See if this works for you.

212
00:09:32,01 --> 00:09:33,09
So when many different vendors show up

213
00:09:33,09 --> 00:09:35,06
and say, "Hey, we can do computer vision

214
00:09:35,06 --> 00:09:39,01
and object detection, and we can do anomaly detection

215
00:09:39,01 --> 00:09:41,07
and find that odd thing out in your image",

216
00:09:41,07 --> 00:09:43,07
you actually think about your use case,

217
00:09:43,07 --> 00:09:46,05
you think about the accuracy, think about the threshold,

218
00:09:46,05 --> 00:09:48,05
test it out with various different pictures.

219
00:09:48,05 --> 00:09:53,00
And then, you can actually use the REST API and SDK,

220
00:09:53,00 --> 00:09:54,03
go to your IT department,

221
00:09:54,03 --> 00:09:56,01
or you know, if you're going

222
00:09:56,01 --> 00:09:57,08
to get your data scientists in-house,

223
00:09:57,08 --> 00:10:01,06
get them to actually use this as an Azure AI

224
00:10:01,06 --> 00:10:03,03
from Vision Studio,

225
00:10:03,03 --> 00:10:07,01
and integrate it into your own product, into your workflow,

226
00:10:07,01 --> 00:10:10,04
into your existing technology as an existing AI.

227
00:10:10,04 --> 00:10:12,04
So that's what happens with all Edge AI.

228
00:10:12,04 --> 00:10:15,01
Edge AI is built and trained by somebody.

229
00:10:15,01 --> 00:10:17,04
It could be retrained by your own team if you want,

230
00:10:17,04 --> 00:10:18,08
which is expensive.

231
00:10:18,08 --> 00:10:21,00
It is all done by data science.

232
00:10:21,00 --> 00:10:24,07
And then the final, the product, which is the AI,

233
00:10:24,07 --> 00:10:26,04
is an inference model.

234
00:10:26,04 --> 00:10:28,03
You just burn it into the device

235
00:10:28,03 --> 00:10:31,01
or you actually get that AI

236
00:10:31,01 --> 00:10:33,05
and integrate it into your workflow

237
00:10:33,05 --> 00:10:34,09
so that it solves your problem.

238
00:10:34,09 --> 00:10:37,06
So never lose focus on what the customer problem is.

239
00:10:37,06 --> 00:10:38,06
What is the use case?

240
00:10:38,06 --> 00:10:40,08
What is the degree of accuracy you want,

241
00:10:40,08 --> 00:10:44,04
and what is right for your business and customers?

242
00:10:44,04 --> 00:10:46,00
Good luck.