1
00:00:00,05 --> 00:00:03,04
- [Instructor] Computer vision is the AI

2
00:00:03,04 --> 00:00:07,03
that is trained using images and videos.

3
00:00:07,03 --> 00:00:08,08
When you think about

4
00:00:08,08 --> 00:00:12,06
what is the most important computer vision model,

5
00:00:12,06 --> 00:00:16,01
what comes to mind is object detection.

6
00:00:16,01 --> 00:00:19,06
Object detection is, as the name suggests,

7
00:00:19,06 --> 00:00:21,06
the ability of the model

8
00:00:21,06 --> 00:00:25,02
to detect common objects in an image.

9
00:00:25,02 --> 00:00:28,04
It is about detecting many different objects

10
00:00:28,04 --> 00:00:30,04
in the same image.

11
00:00:30,04 --> 00:00:33,08
We are going to do this demo using Azure AI.

12
00:00:33,08 --> 00:00:36,05
It's part of the Vision Studio.

13
00:00:36,05 --> 00:00:40,02
Azure Vision AI is a pre-trained model

14
00:00:40,02 --> 00:00:44,03
and it has been trained with thousands of common objects.

15
00:00:44,03 --> 00:00:46,00
So let us look at a demo.

16
00:00:46,00 --> 00:00:49,03
Let's pick one of the examples they've given here.

17
00:00:49,03 --> 00:00:51,03
And as you can see, I'm not signed in,

18
00:00:51,03 --> 00:00:53,07
so I'm using a free version of Azure.

19
00:00:53,07 --> 00:00:57,06
So you can try this for yourself directly without any cost.

20
00:00:57,06 --> 00:01:01,00
So here, as soon as I brought this image,

21
00:01:01,00 --> 00:01:03,05
one of their example images here,

22
00:01:03,05 --> 00:01:05,08
you can see there are boxes around it

23
00:01:05,08 --> 00:01:08,01
each of which is called a bounding box.

24
00:01:08,01 --> 00:01:11,04
That is the output of an object detection model.

25
00:01:11,04 --> 00:01:16,06
It identifies objects in the image and it puts a box

26
00:01:16,06 --> 00:01:19,03
and it recognizes a person

27
00:01:19,03 --> 00:01:22,09
with 95.5% confidence.

28
00:01:22,09 --> 00:01:25,06
And this one is a skateboard

29
00:01:25,06 --> 00:01:28,09
and it recognizes it with 90% confidence.

30
00:01:28,09 --> 00:01:34,02
What does that mean? Remember, all AI is stochastic.

31
00:01:34,02 --> 00:01:36,06
That means it is making a prediction

32
00:01:36,06 --> 00:01:38,08
every time you use the AI.

33
00:01:38,08 --> 00:01:41,05
So this AI is an inference AI.

34
00:01:41,05 --> 00:01:44,07
This is a pre-trained model from Azure

35
00:01:44,07 --> 00:01:47,04
and we are users of this model

36
00:01:47,04 --> 00:01:51,01
and we are giving this data, this image as the data

37
00:01:51,01 --> 00:01:53,04
for the model to make a prediction

38
00:01:53,04 --> 00:01:56,00
to say these are two common objects

39
00:01:56,00 --> 00:01:58,05
and it looks at a person also as an object, right?

40
00:01:58,05 --> 00:02:01,06
So an object doesn't have to be an inanimate object.

41
00:02:01,06 --> 00:02:04,04
So it recognizes a person and a skateboard.

42
00:02:04,04 --> 00:02:09,00
So it is giving the statistical confidence of how sure

43
00:02:09,00 --> 00:02:11,04
it is that this is a skateboard.

44
00:02:11,04 --> 00:02:14,03
You see something called a threshold value here.

45
00:02:14,03 --> 00:02:17,03
That is something you control to decide

46
00:02:17,03 --> 00:02:21,09
how flexible you are in accepting a low confidence

47
00:02:21,09 --> 00:02:23,07
or high confidence from the AI.

48
00:02:23,07 --> 00:02:26,02
In this example, it has only two objects.

49
00:02:26,02 --> 00:02:29,07
So why don't I pick another one? Let's take this one.

50
00:02:29,07 --> 00:02:33,09
As soon as I put it here, it is recognizing three people.

51
00:02:33,09 --> 00:02:37,04
So it says person 85.6% confidence,

52
00:02:37,04 --> 00:02:40,07
another person is 76.5% confidence,

53
00:02:40,07 --> 00:02:44,04
and a third person at 72.3% confidence.

54
00:02:44,04 --> 00:02:49,01
So outside of that, we can also see it recognizes objects,

55
00:02:49,01 --> 00:02:52,01
it sees a laptop, it sees the seating

56
00:02:52,01 --> 00:02:56,07
and more seating, and it also sees the table and footwear.

57
00:02:56,07 --> 00:03:00,04
That's beautiful. So all of these objects are recognized.

58
00:03:00,04 --> 00:03:02,02
So now, you're going to play with the threshold.

59
00:03:02,02 --> 00:03:07,00
So threshold, we can make it low or we can make it high.

60
00:03:07,00 --> 00:03:09,03
If I make the threshold high,

61
00:03:09,03 --> 00:03:11,07
then it does not bring up as many objects.

62
00:03:11,07 --> 00:03:16,06
So you can see if you want the AI to show you objects

63
00:03:16,06 --> 00:03:19,07
that it recognizes with a high degree

64
00:03:19,07 --> 00:03:23,01
of confidence, then you can increase the threshold.

65
00:03:23,01 --> 00:03:26,04
If you say it doesn't matter, even if you have a low degree

66
00:03:26,04 --> 00:03:27,08
of confidence, show me everything

67
00:03:27,08 --> 00:03:29,09
that you can recognize in this image,

68
00:03:29,09 --> 00:03:32,07
then you can actually find more things.

69
00:03:32,07 --> 00:03:35,00
What is the difference? Can you think for a minute?

70
00:03:35,00 --> 00:03:38,06
Why would this matter? What can you do with it?

71
00:03:38,06 --> 00:03:41,06
So it goes back to what do you want

72
00:03:41,06 --> 00:03:44,05
this computer vision model to do for you?

73
00:03:44,05 --> 00:03:48,09
So if this is going to be used for robotic surgery

74
00:03:48,09 --> 00:03:52,01
with a robotic arm that is going to use the camera

75
00:03:52,01 --> 00:03:54,07
to look at things, then you would want it

76
00:03:54,07 --> 00:03:58,07
to have a very high degree of confidence, right?

77
00:03:58,07 --> 00:04:00,06
Something like that.

78
00:04:00,06 --> 00:04:03,09
If it is just identifying things around

79
00:04:03,09 --> 00:04:05,07
in your office or in your warehouse

80
00:04:05,07 --> 00:04:09,00
or counting inventory, then you might not be

81
00:04:09,00 --> 00:04:12,06
so upset if it misses a few objects

82
00:04:12,06 --> 00:04:14,02
or it adds a few more objects

83
00:04:14,02 --> 00:04:15,09
with a low degree of confidence.

84
00:04:15,09 --> 00:04:19,07
So it is important for you to think about what it is

85
00:04:19,07 --> 00:04:23,02
that you want the object detection model to do.

86
00:04:23,02 --> 00:04:26,00
Object detection is a very important model used

87
00:04:26,00 --> 00:04:28,02
by autonomous vehicles that we have been using

88
00:04:28,02 --> 00:04:30,03
as an example in this course.

89
00:04:30,03 --> 00:04:34,09
But you can apply this for any camera, anything with vision.

90
00:04:34,09 --> 00:04:38,07
Okay, so let us move ahead

91
00:04:38,07 --> 00:04:41,01
and I've shown you two examples.

92
00:04:41,01 --> 00:04:44,09
I want to actually show you a different image

93
00:04:44,09 --> 00:04:47,00
that is not in the examples.

94
00:04:47,00 --> 00:04:49,09
So the way you insert your own image

95
00:04:49,09 --> 00:04:52,06
is you click on browse for a file.

96
00:04:52,06 --> 00:04:55,06
So I brought this image as a sample image

97
00:04:55,06 --> 00:04:57,01
and you can see it's working

98
00:04:57,01 --> 00:05:00,07
and it has identified a couple of humans in there.

99
00:05:00,07 --> 00:05:03,00
Oh, it's recognizing three people in here.

100
00:05:03,00 --> 00:05:08,02
What about this person and what about this robotic arm?

101
00:05:08,02 --> 00:05:13,02
So again, it goes back to how confident you want this to be

102
00:05:13,02 --> 00:05:17,00
and it brings it down to one person at a high degree

103
00:05:17,00 --> 00:05:19,08
of confidence, two people or three people.

104
00:05:19,08 --> 00:05:23,05
And however low I go on my threshold,

105
00:05:23,05 --> 00:05:27,07
it does not recognize this robotic arm or a fire spark here

106
00:05:27,07 --> 00:05:30,04
or the cables, nothing else.

107
00:05:30,04 --> 00:05:32,01
So if I want to see what this is,

108
00:05:32,01 --> 00:05:34,00
it doesn't recognize the table.

109
00:05:34,00 --> 00:05:36,05
So again, it goes back to your use case

110
00:05:36,05 --> 00:05:40,00
and how important it is that it recognizes everyone.

111
00:05:40,00 --> 00:05:44,06
So I would say if it recognizes three humans in here

112
00:05:44,06 --> 00:05:48,02
and you're building something like a robotic arm

113
00:05:48,02 --> 00:05:52,00
that is going to be doing welding and it is going to coexist

114
00:05:52,00 --> 00:05:55,06
and work well with humans, then it is important

115
00:05:55,06 --> 00:05:57,02
to recognize all humans.

116
00:05:57,02 --> 00:05:59,00
Let me give you another example.

117
00:05:59,00 --> 00:06:00,08
Think of a warehouse

118
00:06:00,08 --> 00:06:04,04
and it could be an autonomous mobile robot

119
00:06:04,04 --> 00:06:08,08
that is moving around in the warehouse or a factory.

120
00:06:08,08 --> 00:06:13,04
Then it cannot say, I can recognize three people

121
00:06:13,04 --> 00:06:15,09
and not the other person.

122
00:06:15,09 --> 00:06:18,03
What if this robot is moving around?

123
00:06:18,03 --> 00:06:20,08
Autonomous mobile robots are called AMRs.

124
00:06:20,08 --> 00:06:22,09
What if the AMR is moving around

125
00:06:22,09 --> 00:06:27,09
and any of these people are opening an elevator door

126
00:06:27,09 --> 00:06:29,01
and coming in?

127
00:06:29,01 --> 00:06:31,08
It has to stop for everyone equally.

128
00:06:31,08 --> 00:06:33,07
So that becomes very important.

129
00:06:33,07 --> 00:06:38,02
So what do you do if your existing model does not recognize

130
00:06:38,02 --> 00:06:40,09
a situation that is important for your work?

131
00:06:40,09 --> 00:06:43,08
That is why you have the option of testing this out

132
00:06:43,08 --> 00:06:46,06
with your own images for your own environment.

133
00:06:46,06 --> 00:06:50,01
And if it still doesn't work, then you go down here

134
00:06:50,01 --> 00:06:52,07
and it says you want to use your own images

135
00:06:52,07 --> 00:06:55,05
and you'll have to sign up for Azure

136
00:06:55,05 --> 00:06:58,06
and it costs you to try out the REST API

137
00:06:58,06 --> 00:07:00,08
and all the details of their API

138
00:07:00,08 --> 00:07:03,03
and the SDK reference are all available here.

139
00:07:03,03 --> 00:07:06,01
So your data scientists can actually use this

140
00:07:06,01 --> 00:07:10,03
pre-trained model and train this model to become smarter,

141
00:07:10,03 --> 00:07:13,08
to understand more objects to the degree of confidence

142
00:07:13,08 --> 00:07:16,07
that you want using the images that you provide.

143
00:07:16,07 --> 00:07:21,05
So computer cameras are essentially

144
00:07:21,05 --> 00:07:24,05
trying to recreate vision as in human vision.

145
00:07:24,05 --> 00:07:25,08
And we take it for granted.

146
00:07:25,08 --> 00:07:28,09
It is a very complicated phenomenon

147
00:07:28,09 --> 00:07:31,03
for us to see our environment.

148
00:07:31,03 --> 00:07:36,00
So it is not just that we are seeing things, they are in 3D.

149
00:07:36,00 --> 00:07:40,03
So when you collect images of your own environment

150
00:07:40,03 --> 00:07:44,09
and you want to train your model, you have to think about

151
00:07:44,09 --> 00:07:48,00
how human vision is a miracle.

152
00:07:48,00 --> 00:07:50,05
It understands multiple layers of the image.

153
00:07:50,05 --> 00:07:53,07
It understands background, angle, lighting, placement.

154
00:07:53,07 --> 00:07:55,09
You'll have to capture all that in your workflow.

155
00:07:55,09 --> 00:07:58,08
So you have to collect sample images

156
00:07:58,08 --> 00:08:01,06
that cover different lighting, daytime, evening,

157
00:08:01,06 --> 00:08:03,06
twilight, different angles.

158
00:08:03,06 --> 00:08:05,08
So the difference between these people

159
00:08:05,08 --> 00:08:08,03
and this person might be the angle

160
00:08:08,03 --> 00:08:10,05
at which they are standing,

161
00:08:10,05 --> 00:08:13,08
or it could be that this image is layered

162
00:08:13,08 --> 00:08:16,08
and this person is at a layer at the back.

163
00:08:16,08 --> 00:08:20,01
So you might want to capture that placement of the people

164
00:08:20,01 --> 00:08:22,05
or the objects that you want to identify.

165
00:08:22,05 --> 00:08:24,07
A different background might confuse

166
00:08:24,07 --> 00:08:26,02
the computer vision model.

167
00:08:26,02 --> 00:08:28,05
And so you might want to give the same objects

168
00:08:28,05 --> 00:08:30,07
in multiple different backgrounds

169
00:08:30,07 --> 00:08:34,02
to teach the algorithm that it is the same object

170
00:08:34,02 --> 00:08:36,02
regardless of the background where it is.

171
00:08:36,02 --> 00:08:39,06
So there's a lot of different samples that you have to give

172
00:08:39,06 --> 00:08:44,01
and you have to bring your business acumen of what you want,

173
00:08:44,01 --> 00:08:46,08
how your environment changes over time

174
00:08:46,08 --> 00:08:48,02
and what you want to capture,

175
00:08:48,02 --> 00:08:50,04
what is important for your safety.

176
00:08:50,04 --> 00:08:53,06
And later, we will learn how you want

177
00:08:53,06 --> 00:08:55,02
to think about the data privacy

178
00:08:55,02 --> 00:08:56,08
of what you don't want to capture.

179
00:08:56,08 --> 00:08:59,07
I can see this person is holding a tablet,

180
00:08:59,07 --> 00:09:01,06
this person is holding a paper.

181
00:09:01,06 --> 00:09:05,00
Does it matter in your company, in your environment?

182
00:09:05,00 --> 00:09:07,00
You might want to think about the privacy implication

183
00:09:07,00 --> 00:09:10,08
of what you are using to train the model too.

184
00:09:10,08 --> 00:09:13,05
So there's a lot to learn here.

185
00:09:13,05 --> 00:09:15,05
The easiest thing that we learned here

186
00:09:15,05 --> 00:09:17,04
is to use an existing image

187
00:09:17,04 --> 00:09:19,08
and see how it puts bounding boxes

188
00:09:19,08 --> 00:09:21,05
and does object detection.

189
00:09:21,05 --> 00:09:23,07
The next step was to understand

190
00:09:23,07 --> 00:09:25,02
that there is a difference in how

191
00:09:25,02 --> 00:09:27,00
it understands different objects.

192
00:09:27,00 --> 00:09:29,07
And the third thing is for you to understand

193
00:09:29,07 --> 00:09:33,08
how you as a product manager get to control the threshold

194
00:09:33,08 --> 00:09:36,09
on what is applicable for your own application.

195
00:09:36,09 --> 00:09:40,09
And then finally, you can test it with your own image.

196
00:09:40,09 --> 00:09:42,05
And if you do not want

197
00:09:42,05 --> 00:09:45,00
to use the existing model, it doesn't work for you,

198
00:09:45,00 --> 00:09:48,03
you can try a vision model from some other technology company.

199
00:09:48,03 --> 00:09:50,09
I'm showing you Azure's Vision Studio,

200
00:09:50,09 --> 00:09:53,04
but test and see what works for you.

201
00:09:53,04 --> 00:09:57,07
Or as a last resort, you can go train this existing model

202
00:09:57,07 --> 00:09:59,03
with your own data set

203
00:09:59,03 --> 00:10:01,05
or your own collection of images

204
00:10:01,05 --> 00:10:02,09
so that you're training the model.

205
00:10:02,09 --> 00:10:05,02
At that point, it is important to remember

206
00:10:05,02 --> 00:10:08,00
the difference between training and inference.

207
00:10:08,00 --> 00:10:10,01
What you have here is an inference model.

208
00:10:10,01 --> 00:10:13,02
It has been pre-trained by Azure and is given to you.

209
00:10:13,02 --> 00:10:15,07
You are a user, it's an inference model.

210
00:10:15,07 --> 00:10:17,08
The minute you say, I'm going to train it

211
00:10:17,08 --> 00:10:22,07
and take a whole bunch of images and run the SDK

212
00:10:22,07 --> 00:10:27,06
or API on top of it, you are training the AI to learn

213
00:10:27,06 --> 00:10:29,04
to get the next level of AI.

214
00:10:29,04 --> 00:10:32,01
Again, you have to test it and then bring it to inference

215
00:10:32,01 --> 00:10:34,06
and then take it to your customer, to your environment

216
00:10:34,06 --> 00:10:36,06
to test whether it works for you to see

217
00:10:36,06 --> 00:10:38,08
exactly what it is detecting

218
00:10:38,08 --> 00:10:41,08
so that it works for your example.

219
00:10:41,08 --> 00:10:44,06
So you might want to just try this out

220
00:10:44,06 --> 00:10:47,07
with different camera settings for factory equipment or city

221
00:10:47,07 --> 00:10:49,02
or home or medical devices

222
00:10:49,02 --> 00:10:51,07
or the AMR in the warehouse that we were talking about.

223
00:10:51,07 --> 00:10:53,07
Next, we will do an interesting challenge.

224
00:10:53,07 --> 00:10:54,07
And before I wrap up,

225
00:10:54,07 --> 00:10:56,07
I want to give you two more references.

226
00:10:56,07 --> 00:11:00,02
One is, if you're thinking about artificial intelligence

227
00:11:00,02 --> 00:11:03,00
in IoT and Edge devices, you can refer

228
00:11:03,00 --> 00:11:04,09
to my course artificial intelligence

229
00:11:04,09 --> 00:11:07,02
and connected products right here on LinkedIn Learning,

230
00:11:07,02 --> 00:11:10,04
especially if you want to learn about computer vision

231
00:11:10,04 --> 00:11:13,00
and different models of computer vision,

232
00:11:13,00 --> 00:11:15,09
there is a special lesson just for that.

233
00:11:15,09 --> 00:11:18,05
And the other thing is, if you're thinking about,

234
00:11:18,05 --> 00:11:20,07
hey, this is human-computer interface

235
00:11:20,07 --> 00:11:22,09
and the device is standing here

236
00:11:22,09 --> 00:11:25,03
with the sparkly fire in here and the people

237
00:11:25,03 --> 00:11:28,05
are standing here, how do I design the AI

238
00:11:28,05 --> 00:11:33,00
to work as a proper application in a human-centered way?

239
00:11:33,00 --> 00:11:35,00
Then you can refer to my course AI Design

240
00:11:35,00 --> 00:11:36,02
Using Autonomous Vehicles.

241
00:11:36,02 --> 00:11:37,03
It's actually Innovating

242
00:11:37,03 --> 00:11:39,07
With AI Design Using Autonomous Vehicles

243
00:11:39,07 --> 00:11:41,05
and you can learn about a variety

244
00:11:41,05 --> 00:11:45,02
of different AI on Edge devices, including computer vision.

245
00:11:45,02 --> 00:11:47,05
So next, I'm going to see you at a challenge

246
00:11:47,05 --> 00:11:49,04
and I'll give you the solution

247
00:11:49,04 --> 00:11:51,06
and we will take this lesson forward

248
00:11:51,06 --> 00:11:55,00
with your own image from your own environment.