We've got our data now in X_train, y_train, X_valid and y_valid, but in a previous video we evaluated our model using the score function. What does the score function use? It uses the coefficient of determination.

But if we remember, right back at the top — we'll go right back up there just to check, because we wrote this down — we know that the evaluation metric for this competition, so for this project that we're working on, is RMSLE, or in other words, root mean squared log error. That's not the same as the coefficient of determination. And if you're wondering where we got this evaluation metric from, it's from the Kaggle competition Overview > Evaluation page, which we saw right at the start, but we'll just check that out again. Evaluation: so the evaluation metric for this competition is the RMSLE. We can Google that to see what it actually is. Let's go here: root mean squared log error. "What is the difference between...", "How do you interpret root mean squared log error?" Okay, so we've got a fair few options. If you wanted to read further, these are things you can look up, but we're going to work through this in code. Let's go down.

You might be wondering — we haven't seen or really mentioned root mean squared log error, and that's a bit of a mouthful in itself. So we're going to have to create our own evaluation function, because if we come back to our keynote — classification and regression metrics, and we're working on a regression problem here — this is one of the slides from the scikit-learn section. We've got mean absolute error, MAE. Okay, mean squared error, MSE. Okay, root mean squared error. But this is missing the log component. Let's see if we can get that in through scikit-learn.

So we'll go back here. What we might do is build our own. So: building an evaluation function. Now, this is something you may have to do depending on the project you're working on, right? So if we're working on this Kaggle competition and they say that for some reason we have to use this metric — who knows, the people who posted this challenge may require it; they may have decided this is their best evaluation metric. Depending on what project you're working on, you may have to find an evaluation metric which suits your use case the best.
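For reference, root mean squared log error measures how far off predictions are on a log scale, which makes it sensitive to relative (percentage-style) errors rather than raw dollar differences. Here's a minimal NumPy sketch of the formula — the function name and variables are just for illustration; we build the version we'll actually use with scikit-learn shortly:

```python
import numpy as np

def rmsle_from_scratch(y_true, y_pred):
    # log1p(x) = log(1 + x), which handles zero values safely
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return np.sqrt(np.mean(log_diff ** 2))
```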
And so what we're going to do is create an evaluation function, because we're going to be running multiple experiments — we want to be fitting multiple different machine learning models and evaluating them. That's why we're going to build it into a function, so we can use this functionality multiple times without having to repeat ourselves, because we're kind of lazy when it comes to writing all this code — we just want to experiment more and more.

So: create an evaluation function. The competition uses root mean squared log error, RMSLE. So we'll go from sklearn.metrics — we can check that. In scikit-learn, we could go through the scikit-learn evaluation metrics documentation, and if we go through here we can have a look at some evaluation metrics: mean squared log error. Okay. So we might use that, mean squared log error, but we're going to have to add our own little piece onto it, because this is not root mean squared log error. So what we might do is take this one and import it. Let's do that. So: from sklearn.metrics import mean_squared_log_error, and for the hell of it, we'll do mean_absolute_error too.

You make predictions on all the examples, then you subtract your predictions from your actual values and take the average — that's the mean absolute error. So: how much, on average, is your prediction off from the actual sale price? But mean squared log error is more to do with the ratio. Most of the time with regression problems, right, if you're 10 dollars off something — so if we were 10 dollars off this sale price prediction — it wouldn't really matter. But if we were 10 per cent off, that'd be pretty bad, right? We wouldn't want to be more than 10 per cent off — or whatever value, insert something here; it could be 20 per cent, could be 30 per cent. That's kind of the difference between these two. Any time you see absolute, think of it as being 10 dollars off, and any time you see squared or some sort of log error, think of it as the ratio — so being 10 per cent off. That's how you can really compare these two to each other. We've got a little bit more on that in the slides — you might want to take a screenshot of that one, it's a pretty good one — but this will all be in the slides you can check out later.
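As a rough sketch of the dollars-off versus percent-off point — the prices here are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_log_error

# Two hypothetical sale prices, each predicted $10 too high
y_true = np.array([1_000.0, 100_000.0])
y_pred = y_true + 10

# MAE: $10 off is $10 off, regardless of how big the sale price was
print(mean_absolute_error(y_true, y_pred))    # 10.0

# MSLE: being $10 off a $1,000 sale (1% out) contributes far more to the
# error than being $10 off a $100,000 sale (0.01% out)
print(mean_squared_log_error(y_true, y_pred))
```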
We want to create an evaluation function, and we've got mean squared log error. So how would we turn that into root mean squared log error? Let's try it out — maybe we go here. I don't want to type out "root mean squared log error" in full every time — that's too much — so rmsle: root mean squared log error. Beautiful. And that can just take y_test and y_preds — it's going to take the same inputs as the scikit-learn function. See how that one takes y_true and y_pred? It could be y_true, it could be y_test, it doesn't really matter — y_test. And then we'll leave a little docstring for ourselves: "Calculates root mean squared log error between predictions and true labels." Wonderful. And then we might go return — nice and simple — we can use np.sqrt of mean_squared_log_error, and we'll have to pass it the test labels and the predictions. That's actually pretty easy. What we're doing here is taking the square root — so that's the root component — of the mean squared log error between predictions and true labels. Wonderful. So we've created our own evaluation function that lines up with Kaggle's requirements: the root mean squared log error. Okay.
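Pieced together from the narration above, the function looks something like this — a sketch; the exact names in the notebook may differ slightly:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def rmsle(y_test, y_preds):
    """Calculates root mean squared log error between predictions and true labels."""
    # Take the square root (the "root" part) of scikit-learn's mean squared log error
    return np.sqrt(mean_squared_log_error(y_test, y_preds))
```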
And we probably want to check a few more things — we want to create an evaluation function that checks a few things at the same time, so we don't have to keep calling something like this. So let's create a function to evaluate our model on a few different levels. Again, evaluation is just as important as building the actual model. So we'll go here: show_scores(model). We'll pass it our model, and we'll make some predictions on the training data — model.predict — so we'll evaluate how our model did on the training data, X_train, and then we'll evaluate how our model did on the validation data — model.predict again, and we'll pass it our X_valid here. If our model is performing far better on the training data than on the validation data, that's hinting to us that our model is overfitting. Generally, because our validation set only has 12,000 examples and we're not training our model on it, you'd want to see slightly worse metrics on the validation set versus the training set.

So we'll go scores = ... maybe we'll return a dictionary of some sort. Yeah. So "Training MAE", a.k.a. mean absolute error — can I tab-autocomplete mean_absolute_error? Yes, I can. y_train — so remember, we're just comparing the y_train here. So this is the training mean absolute error: the training labels versus the training predictions. Beautiful. We're going to go "Valid MAE" — so again, you'd expect this valid MAE to be slightly worse than the training one — and it's going to be y_valid, and then this is going to be val_preds. And then we're going to go "Training RMSLE", for root mean squared log error, and we can do this using our fancy little function that we've just created up here. This is exciting — writing our own evaluation functions. Maybe that's a potential pull request to scikit-learn. Wonder why they haven't got root mean squared log error? Maybe it's not that popular, or maybe it's because you can easily create it yourself, like that. Is that correct? y_train, train_preds. I'm talking too much — don't talk and code at the same time, no one can. Actually, you can do whatever you want here, except don't under-evaluate your machine learning models. That's the only thing you're not allowed to do. Everyone loves a good model, but everyone loves a well-evaluated machine learning model even more.

And finally, for good measure, since we're here, let's do the default score as well — r2_score — and we'll just import that; that's probably a bit better. r2_score... what is this? Why is it not working? Tab — the tab autocomplete is giving me some issues. So we've got to ask: is it just r2_score? Yeah. Trust your instincts, Daniel, come on. r2_score. So that's all we're changing there — rather than recomputing something, we'll just use what we imported. Now, this is a fairly decent function, not going to lie — it takes care of a few things. train_preds... there's probably some redundancy we could fix, but that's all right. We're trying to experiment as fast as possible. You might be thinking our experiments are getting slowed down, but with your own practice, without having to listen to me talk and go through these sorts of things, you'll be whizzing through experiments. Oh no, that's wrong, isn't it? This needs to be val_preds. Beautiful. And we're going to return the scores. Okay. So what have we done here?
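Reconstructed from the walkthrough above, show_scores ends up looking roughly like this — a sketch, assuming the training and validation splits from earlier in the video are named X_train, y_train, X_valid and y_valid:

```python
from sklearn.metrics import mean_absolute_error, r2_score

def show_scores(model):
    """Evaluate a fitted model on the training and validation sets."""
    train_preds = model.predict(X_train)
    val_preds = model.predict(X_valid)
    scores = {"Training MAE": mean_absolute_error(y_train, train_preds),
              "Valid MAE": mean_absolute_error(y_valid, val_preds),
              "Training RMSLE": rmsle(y_train, train_preds),
              "Valid RMSLE": rmsle(y_valid, val_preds),
              "Training R^2": r2_score(y_train, train_preds),
              "Valid R^2": r2_score(y_valid, val_preds)}
    return scores
```

Calling something like show_scores(model) on a fitted model then returns that dictionary of metrics, which is what gets tried out in the next video.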
So we've built an evaluation function in line with what's required for this particular project, and in our case that's root mean squared log error — I'm going to go hoarse saying that if I keep doing it. And then we've created another little helper function that's going to do a whole bunch of other metrics: it's going to make some predictions using a model we pass to it, and then it's going to compare those predictions on a whole bunch of different playing fields: on the mean absolute error, on the root mean squared log error, and on the coefficient of determination.

So that's enough for this one video. Building our own custom evaluation functions, again, is something you might want to do in your own projects, because when we're running as many experiments as possible, it's helpful to have evaluation functions like this, so you can quickly check a whole bunch of different things rather than having to retype all of this every time we try a new model. But let's press Shift and Enter so our functions are instantiated, and we're going to try and test them out in the next video.