We've got our data now in X_train, y_train, X_valid and y_valid, but in a previous video we evaluated our model using the score function. What does the score function use? It uses the coefficient of determination.

But if we remember, right back at the top — we'll go right back up there just to check, because we wrote this down — we know that the evaluation metric for this competition, so for this project that we're working on, is RMSLE, or in other words, root mean squared log error. That's not the same as the coefficient of determination. And if you're wondering where we got this evaluation metric from, it's from the Kaggle competition Overview > Evaluation page, which we saw right at the start, but we'll just check that out again. Evaluation: so the evaluation metric for this competition is the RMSLE. We can Google that to see what it actually is. Let's go here: root mean squared log error. "What is the difference between...", "How do you interpret root mean squared log error?" Okay, so we've got a fair few options. If you wanted to read further, these are things you can look up, but we're going to work through this in code. Let's go down.

You might be wondering — we haven't seen or really mentioned root mean squared log error, and that's a bit of a mouthful in itself. So we're going to have to create our own evaluation function, because if we come back to our keynote — classification and regression metrics, and we're working on a regression problem here — this is one of the slides from the scikit-learn section. We've got mean absolute error, MAE. Okay, mean squared error, MSE. Okay, root mean squared error. But this is missing the log component. Let's see if we can get that in through scikit-learn.

So we'll go back here. What we might do is build our own. So: building an evaluation function. Now, this is something you may have to do depending on the project you're working on, right? So if we're working on this Kaggle competition and they say that for some reason we have to use this metric — who knows, the people who posted this challenge may require it; they may have decided this is their best evaluation metric. Depending on what project you're working on, you may have to find an evaluation metric which suits your use case the best.
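For reference, root mean squared log error measures how far off predictions are on a log scale, which makes it sensitive to relative (percentage-style) errors rather than raw dollar differences. Here's a minimal NumPy sketch of the formula — the function name and variables are just for illustration; we build the version we'll actually use with scikit-learn shortly:

```python
import numpy as np

def rmsle_from_scratch(y_true, y_pred):
    # log1p(x) = log(1 + x), which handles zero values safely
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return np.sqrt(np.mean(log_diff ** 2))
```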
And so what we're going to do is create an evaluation function, because we're going to be running multiple experiments — we want to be fitting multiple different machine learning models and evaluating them. That's why we're going to build it into a function, so we can use this functionality multiple times without having to repeat ourselves, because we're kind of lazy when it comes to writing all this code — we just want to experiment more and more.

So: create an evaluation function. The competition uses root mean squared log error, RMSLE. So we'll go from sklearn.metrics — we can check that. In scikit-learn, we could go through the scikit-learn evaluation metrics documentation, and if we go through here we can have a look at some evaluation metrics: mean squared log error. Okay. So we might use that, mean squared log error, but we're going to have to add our own little piece onto it, because this is not root mean squared log error. So what we might do is take this one and import it. Let's do that. So: from sklearn.metrics import mean_squared_log_error, and for the hell of it, we'll do mean_absolute_error too.

You make predictions on all the examples, then you subtract your predictions from your actual values and take the average — that's the mean absolute error. So: how much, on average, is your prediction off from the actual sale price? But mean squared log error is more to do with the ratio. Most of the time with regression problems, right, if you're 10 dollars off something — so if we were 10 dollars off this sale price prediction — it wouldn't really matter. But if we were 10 per cent off, that'd be pretty bad, right? We wouldn't want to be more than 10 per cent off — or whatever value, insert something here; it could be 20 per cent, could be 30 per cent. That's kind of the difference between these two. Any time you see absolute, think of it as being 10 dollars off, and any time you see squared or some sort of log error, think of it as the ratio — so being 10 per cent off. That's how you can really compare these two to each other. We've got a little bit more on that in the slides — you might want to take a screenshot of that one, it's a pretty good one — but this will all be in the slides you can check out later.
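As a rough sketch of the dollars-off versus percent-off point — the prices here are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_log_error

# Two hypothetical sale prices, each predicted $10 too high
y_true = np.array([1_000.0, 100_000.0])
y_pred = y_true + 10

# MAE: $10 off is $10 off, regardless of how big the sale price was
print(mean_absolute_error(y_true, y_pred))    # 10.0

# MSLE: being $10 off a $1,000 sale (1% out) contributes far more to the
# error than being $10 off a $100,000 sale (0.01% out)
print(mean_squared_log_error(y_true, y_pred))
```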
We want to create an evaluation function, and we've got mean squared log error. So how would we turn that into root mean squared log error? Let's try it out — maybe we go here. I don't want to type out "root mean squared log error" in full every time — that's too much — so rmsle: root mean squared log error. Beautiful. And that can just take y_test and y_preds — it's going to take the same inputs as the scikit-learn function. See how that one takes y_true and y_pred? It could be y_true, it could be y_test, it doesn't really matter — y_test. And then we'll leave a little docstring for ourselves: "Calculates root mean squared log error between predictions and true labels." Wonderful. And then we might go return — nice and simple — we can use np.sqrt of mean_squared_log_error, and we'll have to pass it the test labels and the predictions. That's actually pretty easy. What we're doing here is taking the square root — so that's the root component — of the mean squared log error between predictions and true labels. Wonderful. So we've created our own evaluation function that lines up with Kaggle's requirements: the root mean squared log error. Okay.
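Pieced together from the narration above, the function looks something like this — a sketch; the exact names in the notebook may differ slightly:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def rmsle(y_test, y_preds):
    """Calculates root mean squared log error between predictions and true labels."""
    # Take the square root (the "root" part) of scikit-learn's mean squared log error
    return np.sqrt(mean_squared_log_error(y_test, y_preds))
```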
And we probably want to check a few more things — we want to create an evaluation function that checks a few things at the same time, so we don't have to keep calling something like this. So let's create a function to evaluate our model on a few different levels. Again, evaluation is just as important as building the actual model. So we'll go here: show_scores(model). We'll pass it our model, and we'll make some predictions on the training data — model.predict — so we'll evaluate how our model did on the training data, X_train, and then we'll evaluate how our model did on the validation data — model.predict again, and we'll pass it our X_valid here. If our model is performing far better on the training data than on the validation data, that's hinting to us that our model is overfitting. Generally, because our validation set only has 12,000 examples and we're not training our model on it, you'd want to see slightly worse metrics on the validation set versus the training set.

So we'll go scores = ... maybe we'll return a dictionary of some sort. Yeah. So "Training MAE", a.k.a. mean absolute error — can I tab-autocomplete mean_absolute_error? Yes, I can. y_train — so remember, we're just comparing the y_train here. So this is the training mean absolute error: the training labels versus the training predictions. Beautiful. We're going to go "Valid MAE" — so again, you'd expect this valid MAE to be slightly worse than the training one — and it's going to be y_valid, and then this is going to be val_preds. And then we're going to go "Training RMSLE", for root mean squared log error, and we can do this using our fancy little function that we've just created up here. This is exciting — writing our own evaluation functions. Maybe that's a potential pull request to scikit-learn. Wonder why they haven't got root mean squared log error? Maybe it's not that popular, or maybe it's because you can easily create it yourself, like that. Is that correct? y_train, train_preds. I'm talking too much — don't talk and code at the same time, no one can. Actually, you can do whatever you want here, except don't under-evaluate your machine learning models. That's the only thing you're not allowed to do. Everyone loves a good model, but everyone loves a well-evaluated machine learning model even more.

And finally, for good measure, since we're here, let's do the default score as well — r2_score — and we'll just import that; that's probably a bit better. r2_score... what is this? Why is it not working? Tab — the tab autocomplete is giving me some issues. So we've got to ask: is it just r2_score? Yeah. Trust your instincts, Daniel, come on. r2_score. So that's all we're changing there — rather than recomputing something, we'll just use what we imported. Now, this is a fairly decent function, not going to lie — it takes care of a few things. train_preds... there's probably some redundancy we could fix, but that's all right. We're trying to experiment as fast as possible. You might be thinking our experiments are getting slowed down, but with your own practice, without having to listen to me talk and go through these sorts of things, you'll be whizzing through experiments. Oh no, that's wrong, isn't it? This needs to be val_preds. Beautiful. And we're going to return the scores. Okay. So what have we done here?
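Reconstructed from the walkthrough above, show_scores ends up looking roughly like this — a sketch, assuming the training and validation splits from earlier in the video are named X_train, y_train, X_valid and y_valid:

```python
from sklearn.metrics import mean_absolute_error, r2_score

def show_scores(model):
    """Evaluate a fitted model on the training and validation sets."""
    train_preds = model.predict(X_train)
    val_preds = model.predict(X_valid)
    scores = {"Training MAE": mean_absolute_error(y_train, train_preds),
              "Valid MAE": mean_absolute_error(y_valid, val_preds),
              "Training RMSLE": rmsle(y_train, train_preds),
              "Valid RMSLE": rmsle(y_valid, val_preds),
              "Training R^2": r2_score(y_train, train_preds),
              "Valid R^2": r2_score(y_valid, val_preds)}
    return scores
```

Calling something like show_scores(model) on a fitted model then returns that dictionary of metrics, which is what gets tried out in the next video.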
So we've built an evaluation function in line with what's required for this particular project, and in our case that's root mean squared log error — I'm going to go hoarse saying that if I keep doing it. And then we've created another little helper function that's going to do a whole bunch of other metrics: it's going to make some predictions using a model we pass to it, and then it's going to compare those predictions on a whole bunch of different playing fields: on the mean absolute error, on the root mean squared log error, and on the coefficient of determination.

So that's enough for this one video. Building our own custom evaluation functions, again, is something you might want to do in your own projects, because when we're running as many experiments as possible, it's helpful to have evaluation functions like this, so you can quickly check a whole bunch of different things rather than having to retype all of this every time we try a new model. But let's press Shift and Enter so our functions are instantiated, and we're going to try and test them out in the next video.