1 00:00:00,360 --> 00:00:07,350 Before diving deep into this session, let's have a quick recap of what we have done in 2 00:00:07,350 --> 00:00:08,810 all our previous sessions. 3 00:00:09,150 --> 00:00:16,170 So, starting from importing the data, we have read the data from databases as well as from our CSV file. After that, 4 00:00:16,170 --> 00:00:23,010 we performed analysis on our data, which is exactly my sentiment analysis with 5 00:00:23,010 --> 00:00:31,350 respect to this summary feature. After that, we analyzed here what exactly the exploratory analysis looks like for 6 00:00:31,680 --> 00:00:34,870 positive sentiment, or you can say positive sentences. 7 00:00:34,890 --> 00:00:40,470 So this is exactly the exploratory analysis for our positive sentiment. 8 00:00:40,530 --> 00:00:42,690 So in this session, we have this assignment. 9 00:00:42,690 --> 00:00:49,650 The very first one is we have to perform this exploratory analysis for the negative sentences. 10 00:00:49,650 --> 00:00:57,570 And this is exactly the second problem statement, in which I have to analyze to what type of users Amazon can 11 00:00:57,570 --> 00:00:59,540 recommend more products. 12 00:00:59,670 --> 00:01:02,430 So it's a very popular use case of Amazon. 13 00:01:02,430 --> 00:01:04,840 And believe me, it's a very popular use case of Amazon. 14 00:01:05,310 --> 00:01:11,150 So, yeah, let's go ahead with the very first problem statement, in which you have to perform this analysis. 15 00:01:11,700 --> 00:01:17,370 So what you can do, rather than writing code again and again: I'm going to share a trick with you. 16 00:01:17,370 --> 00:01:19,340 Just copy and paste. 17 00:01:19,710 --> 00:01:20,090 Yeah. 18 00:01:20,100 --> 00:01:23,870 Because most of the time you have to do this as well. 19 00:01:23,880 --> 00:01:29,370 So I'm just going to copy, paste, and make some modifications. 20 00:01:29,730 --> 00:01:33,510 That's it. Here I am going to say it is less than zero. 
21 00:01:33,510 --> 00:01:40,740 And here I will say this time my data is nothing but data_negative. 22 00:01:40,770 --> 00:01:48,660 So now, what we have to do with this huge chunk of data: you will figure out you have to just combine 23 00:01:48,660 --> 00:01:50,880 this huge chunk of data. 24 00:01:50,880 --> 00:01:58,350 Here I'm going to say I have to combine, but make sure you combine on this data_negative 25 00:01:58,350 --> 00:01:58,630 data. 26 00:01:58,920 --> 00:02:01,560 So I have to just combine all this stuff here. 27 00:02:01,560 --> 00:02:06,150 I'm going to say this is my total_text2, so just execute it. 28 00:02:06,180 --> 00:02:10,500 Now what you have to do, you have to perform some cleaning on your data. 29 00:02:10,650 --> 00:02:13,320 So which is exactly this one. 30 00:02:13,560 --> 00:02:21,140 So now I just paste it over here, and this time I'm going to say my data is total_text2. 31 00:02:21,420 --> 00:02:25,110 And this is also total_text2. Now, what do we have to do? 32 00:02:25,110 --> 00:02:29,850 We have to remove extra spaces if there are any in the data. 33 00:02:30,030 --> 00:02:33,530 So let me check whether there are any in the data 34 00:02:33,560 --> 00:02:34,260 or not. 35 00:02:34,260 --> 00:02:37,090 You will figure out you still have some extra spaces. 36 00:02:37,380 --> 00:02:44,700 So now what you have to do, you have to just copy this block of code from here and you have to just 37 00:02:44,910 --> 00:02:46,650 paste it over here. 38 00:02:46,650 --> 00:02:53,770 This time I would say it is my total_text2, and this is also my total_text2. Just execute it. 39 00:02:53,790 --> 00:02:59,990 Now it's time to generate your beautiful visual, which is exactly my word cloud. 40 00:03:00,270 --> 00:03:06,960 So here I have to say my total_text2. This time this is my wordcloud2, this is my 41 00:03:06,960 --> 00:03:08,130 wordcloud2 here. 
42 00:03:08,130 --> 00:03:14,460 I have to pass my wordcloud2, and after that everything is the same. Just execute it. 43 00:03:14,460 --> 00:03:20,400 It will take a couple of seconds to showcase that beautiful word cloud in front of you. 44 00:03:20,430 --> 00:03:23,250 So this is exactly that word cloud. Here 45 00:03:23,250 --> 00:03:26,430 you will see the negative-sentiment customers. 46 00:03:26,580 --> 00:03:34,830 They are basically going to use these words, say disappointed, bad, taste, horrible, terrible, expensive, 47 00:03:34,830 --> 00:03:44,150 expected, flavor, little, and all these kinds of keywords are what a user or customer is going to prefer. 48 00:03:44,310 --> 00:03:50,970 So now let's move to the next problem statement, in which you have to perform analysis and come 49 00:03:50,970 --> 00:03:56,970 up with some conclusion about what type of users Amazon can recommend more products to. 50 00:03:57,240 --> 00:03:59,520 So for this, what am I going to do? 51 00:03:59,550 --> 00:04:07,860 So first of all you have to understand to whom, or you can say to what type of users, Amazon recommends products. 52 00:04:07,860 --> 00:04:15,450 So Amazon recommends more products only to those who are going to buy more, or to those who 53 00:04:15,450 --> 00:04:17,100 have a better conversion rate. 54 00:04:17,430 --> 00:04:22,380 So first of all you have to make your data ready for your analysis. 55 00:04:22,740 --> 00:04:29,550 So what I am going to do: let me show you a few things. If I call .head() on this, or 56 00:04:29,550 --> 00:04:31,860 you can inspect the data another way, it's all up to you. 57 00:04:32,310 --> 00:04:39,570 So in this .head() you will see you have a column, this user ID. In this user ID, 58 00:04:39,960 --> 00:04:45,920 you have to search for those users to whom Amazon is going to recommend products. 
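Looking back at the first problem statement, the negative-sentiment text preparation can be sketched roughly as follows. This is a minimal sketch, not the session's exact notebook code: the `polarity` and `Summary` column names, the sample rows, and the cleaning regular expressions are assumptions. Only the names `data_negative` and `total_text2` echo what is said in the session. The cleaned string is what would then be handed to the word cloud step.

```python
import re

import pandas as pd

# Hypothetical sample; the real data frame comes from the earlier sessions.
data = pd.DataFrame({
    "Summary": ["Very  disappointed", "Great taste", "Horrible,   terrible flavor"],
    "polarity": [-0.6, 0.8, -1.0],  # assumed name for the sentiment column
})

# Keep only the rows whose sentiment value is below zero.
data_negative = data[data["polarity"] < 0]

# Combine all negative summaries into one long string.
total_text2 = " ".join(data_negative["Summary"])

# Assumed cleaning: replace non-letters with spaces, then collapse repeated spaces.
total_text2 = re.sub(r"[^a-zA-Z]", " ", total_text2)
total_text2 = re.sub(r" +", " ", total_text2).strip()

print(total_text2)
```

Checking for a double space in the result (`"  " in total_text2`) is a quick way to confirm that no extra spaces remain.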
59 00:04:46,170 --> 00:04:54,630 So in this user ID: if I access this user ID column and on it call nunique, 60 00:04:54,750 --> 00:04:59,970 the number of unique users, you will observe it has that many unique users. 61 00:05:00,450 --> 00:05:06,960 So from that huge chunk of users, you have to extract some top N users, you have to extract some 62 00:05:07,260 --> 00:05:10,940 top 20 users depending upon the different qualities 63 00:05:10,950 --> 00:05:13,500 they have, the different specifications 64 00:05:13,500 --> 00:05:15,710 they have, like a higher number of products. 65 00:05:15,720 --> 00:05:18,420 Those users are going to buy more. 66 00:05:18,450 --> 00:05:24,540 Those are the users to recommend to. For this, you have to group this huge chunk of data on the basis 67 00:05:24,540 --> 00:05:33,300 of this user ID. So here I am going to say .groupby, and on the basis of this user ID 68 00:05:33,300 --> 00:05:36,290 I have to group my data. Once I have grouped my data, 69 00:05:36,510 --> 00:05:39,200 I have to aggregate all of my features. 70 00:05:39,390 --> 00:05:46,530 So here I am going to say I need the total summaries of each and every user. 71 00:05:46,560 --> 00:05:49,860 I need this total of summaries, or I can say 72 00:05:49,860 --> 00:05:54,820 I need the count of summaries for each and every user. 73 00:05:54,840 --> 00:06:03,090 So here I am going to say .agg, and I have to aggregate the summary by performing my 74 00:06:03,120 --> 00:06:05,360 count on it. In a similar way, 75 00:06:05,400 --> 00:06:15,240 on this text I'm also going to perform my count, because I need to know how many 76 00:06:15,240 --> 00:06:17,430 feedbacks every user has given. 77 00:06:17,640 --> 00:06:24,300 So I'm going to say on this text, I have to just perform this count operation. 78 00:06:24,600 --> 00:06:31,890 And after that, I just need the mean score of each and every user. 
79 00:06:31,900 --> 00:06:36,410 So here I am going to say I need a mean score. 80 00:06:36,420 --> 00:06:37,580 So, on score. 81 00:06:37,890 --> 00:06:42,050 And here you have to pass all of that in the form of a dictionary. 82 00:06:42,330 --> 00:06:50,970 And now, on this product ID, you have to perform count again, because I have to 83 00:06:50,970 --> 00:06:53,940 find out, for each and every user, 84 00:06:53,940 --> 00:06:56,370 how many products they have bought. 85 00:06:56,700 --> 00:07:05,010 So here I am going to say on this product ID, I have to just perform this count operation. Now if I 86 00:07:05,010 --> 00:07:07,140 execute this block of code, 87 00:07:07,150 --> 00:07:14,080 it will return these beautiful statistics, on which you have to analyze the data. 88 00:07:14,430 --> 00:07:17,430 So let's say I have to sort this data frame. 89 00:07:17,430 --> 00:07:19,580 Let's say I have to sort all of this. 90 00:07:19,860 --> 00:07:24,570 So for this, I'm going to say .sort_values. 91 00:07:24,570 --> 00:07:29,090 And here you have a parameter which is exactly your by. 92 00:07:29,310 --> 00:07:35,820 So here I am going to say I have to sort this on the basis of, basically, this text. 93 00:07:36,240 --> 00:07:42,090 I'm going to say sort this on the basis of text. Then I have to sort this data in, let's 94 00:07:42,090 --> 00:07:43,770 say, descending order. For this, 95 00:07:43,770 --> 00:07:51,480 I have to set my ascending parameter to False, because by default it is True. Just execute it. Again, 96 00:07:51,480 --> 00:07:53,620 it will take a couple of seconds. 97 00:07:53,640 --> 00:08:00,660 Now, in this table, you will figure out this user has that many summaries, and has given 98 00:08:00,660 --> 00:08:02,160 that many feedbacks. 99 00:08:02,940 --> 00:08:09,860 This user has that mean score and has bought that many products. 
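The grouping, aggregation, and sorting described above can be sketched as follows. This is a minimal sketch under stated assumptions: the column names (`UserId`, `ProductId`, `Summary`, `Text`, `Score`) and the sample rows are hypothetical, and the name `raw` is borrowed from the name the session gives this frame a little later.

```python
import pandas as pd

# Hypothetical sample with assumed column names; the real data has far more rows.
data = pd.DataFrame({
    "UserId": ["A", "A", "B", "A", "C", "B"],
    "ProductId": ["p1", "p2", "p1", "p3", "p2", "p4"],
    "Summary": ["ok", "bad", "good", "fine", "great", "poor"],
    "Text": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "Score": [3, 1, 4, 3, 5, 2],
})

# How many distinct users are there?
print(data["UserId"].nunique())

# One row per user: counts of summaries, texts and products, plus the mean score.
# The aggregation spec is passed as a dictionary of column -> function.
raw = data.groupby("UserId").agg(
    {"Summary": "count", "Text": "count", "Score": "mean", "ProductId": "count"}
)

# Most active reviewers first; ascending defaults to True, so set it to False.
raw = raw.sort_values(by="Text", ascending=False)
print(raw)
```

Note that `count` on the product column counts rows, so a user who reviewed the same product twice is counted twice; `nunique` would be the alternative if distinct products are wanted.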
100 00:08:10,080 --> 00:08:14,650 So these are beautiful statistics with respect to each and every user ID. 101 00:08:15,000 --> 00:08:18,510 Now, you will see that all these are exactly my top 10 users. 102 00:08:18,510 --> 00:08:21,630 Let's say I have to store it somewhere. 103 00:08:21,750 --> 00:08:24,960 Let's say I will call it raw, my raw data frame. 104 00:08:24,960 --> 00:08:25,330 It's all up 105 00:08:25,440 --> 00:08:30,990 to you. Now, if I print it: in this raw data frame, 106 00:08:31,200 --> 00:08:33,280 what do we have to do? We have to manage 107 00:08:33,510 --> 00:08:35,180 the column names as well. 108 00:08:35,490 --> 00:08:39,480 So here I am going to say raw.columns. 109 00:08:39,480 --> 00:08:43,120 Let's say I have to assign my own column names. So the very first column 110 00:08:43,120 --> 00:08:49,770 name that I have to assign is, let's say, number of summaries. So here I'm going to say number 111 00:08:49,770 --> 00:08:52,830 of summaries. 112 00:08:53,010 --> 00:09:01,200 The second column name that I want to assign is, let's say, num_text, or number of 113 00:09:01,350 --> 00:09:01,830 texts. 114 00:09:02,310 --> 00:09:10,830 The third one is, let's say, average_score, the average score, or you can say it is the mean score. 115 00:09:10,830 --> 00:09:13,520 And then the fourth column, to which 116 00:09:13,720 --> 00:09:23,550 I have to assign, let's say, number of products that a user purchased, or number of products 117 00:09:23,550 --> 00:09:25,380 purchased. 118 00:09:25,380 --> 00:09:28,280 It's all up to you, whatever column names you want to assign. 119 00:09:28,680 --> 00:09:34,790 Now, after doing all these things, let's say I have to just print my data and just execute it. 120 00:09:34,800 --> 00:09:40,640 This is the final data frame that you have to consider for your visualization. 
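Renaming the aggregated columns as described can be sketched like this. The small frame below is a hypothetical stand-in for the session's `raw`, and the exact spellings of the new column names are assumptions based on what is spoken.

```python
import pandas as pd

# Hypothetical aggregated frame standing in for the session's `raw`.
raw = pd.DataFrame(
    {"Summary": [3, 2], "Text": [3, 2], "Score": [2.5, 3.0], "ProductId": [3, 2]},
    index=pd.Index(["A", "B"], name="UserId"),
)

# Assign new names positionally; the list length must match the number of columns.
raw.columns = ["num_summaries", "num_text", "avg_score", "num_products_purchased"]

print(raw)
```

Because the assignment is positional, the order of the new names has to follow the order of the columns in the aggregated frame.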
121 00:09:40,650 --> 00:09:47,190 Now, let's say from this huge chunk of data, I just need my top ten users. 122 00:09:47,430 --> 00:09:53,910 So I'm going to use the bar plot from matplotlib. 123 00:09:53,910 --> 00:09:59,430 So I'm going to say plt.bar, and here, if you press Shift+Tab, 124 00:09:59,720 --> 00:10:06,200 you will figure out all the parameters of this function: what is x, and all these kinds 125 00:10:06,200 --> 00:10:06,520 of things, 126 00:10:06,560 --> 00:10:13,340 what is x, height, width and all these things. Let's say on the x-axis 127 00:10:13,580 --> 00:10:18,050 I just need the user ID. For this, 128 00:10:18,050 --> 00:10:24,470 what I'm going to do: I'm going to say it is nothing but my raw.index. 129 00:10:24,470 --> 00:10:28,160 And from this index I just need my top 10. 130 00:10:28,380 --> 00:10:30,350 Let's say I have to store it somewhere. 131 00:10:30,740 --> 00:10:34,970 So for this, I'll say it is nothing but my user_10. 132 00:10:35,270 --> 00:10:36,410 So just execute it. 133 00:10:36,440 --> 00:10:43,610 Now, first of all, to this x we have to assign this user_10, and now we have some other 134 00:10:43,610 --> 00:10:44,410 parameters. 135 00:10:44,450 --> 00:10:48,080 The second parameter, the most important one, is exactly your height. 136 00:10:48,410 --> 00:10:52,040 Let's say for this height, I just need this count. 137 00:10:52,310 --> 00:10:56,980 So here I'm going to say it is nothing but my raw column, 138 00:10:57,170 --> 00:11:02,950 let's say the number of, or you can say, number of products purchased. 139 00:11:03,200 --> 00:11:11,660 So here I will say it is nothing but the number of purchases, and I just need the top ten, or however many 140 00:11:11,990 --> 00:11:14,870 you want. I have to store it somewhere as well. 141 00:11:14,870 --> 00:11:16,280 So I'm going to say it is nothing 
142 00:11:16,280 --> 00:11:20,520 but my number_10. Just execute it. 143 00:11:20,540 --> 00:11:24,880 Now you have to pass this number_10 over here. 144 00:11:25,280 --> 00:11:28,130 Now you have some additional parameters. 145 00:11:28,130 --> 00:11:34,550 Let's say, what label do you want to assign? My label is nothing but, let's say, whatever label you want 146 00:11:34,550 --> 00:11:37,400 to assign, let's say most recommended users. 147 00:11:37,400 --> 00:11:42,560 So I'm going to say most recommended users. 148 00:11:42,560 --> 00:11:46,120 And after that, I'm just going to execute it. 149 00:11:46,120 --> 00:11:47,750 It will take a couple of seconds. 150 00:11:48,020 --> 00:11:55,190 You will still see over here this graph looks a little bit messy, and you still don't have any axis 151 00:11:55,190 --> 00:11:56,090 labels available. 152 00:11:56,360 --> 00:12:03,950 So for this, I'm going to assign an x label using plt.xlabel. 153 00:12:04,190 --> 00:12:12,290 And let's say my x label is user ID. And after that, we have to assign a y label 154 00:12:12,290 --> 00:12:12,830 as well. 155 00:12:12,830 --> 00:12:17,570 So my y label is nothing but, let's say, number of products purchased. Here 156 00:12:17,570 --> 00:12:26,290 I would say number of products purchased, and after that we have to rotate our x labels as well. 157 00:12:26,540 --> 00:12:35,890 So for this I would say plt.xticks, and here I have a parameter which is exactly my rotation parameter. 158 00:12:35,910 --> 00:12:38,450 Here I'm going to say rotation 159 00:12:38,720 --> 00:12:41,300 equals 'vertical'. 160 00:12:41,690 --> 00:12:43,460 So execute it. 161 00:12:43,460 --> 00:12:45,070 It will take a couple of seconds. 162 00:12:45,440 --> 00:12:49,430 And this is exactly the beautiful plot that you need. 
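The bar plot built up in this part of the session can be sketched as follows. It is a minimal sketch, not the exact notebook cell: the sample frame and the column name `num_products_purchased` are assumptions, while `user_10` and `number_10` follow the names spoken in the session. The `Agg` backend line is only there so the sketch runs without a display; it is not needed in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the sorted, renamed `raw` frame from the session.
raw = pd.DataFrame(
    {"num_products_purchased": [30, 25, 12]},
    index=pd.Index(["A", "B", "C"], name="UserId"),
)

# Take (at most) the first ten users and their purchase counts.
user_10 = raw.index[0:10]
number_10 = raw["num_products_purchased"].iloc[0:10]

# x is the user IDs, height is the purchase count.
bars = plt.bar(user_10, number_10, label="Most recommended users")
plt.xlabel("User ID")
plt.ylabel("Number of products purchased")
plt.xticks(rotation="vertical")  # long user IDs overlap unless rotated
plt.legend()
plt.tight_layout()
```

The `label` passed to `plt.bar` only becomes visible once `plt.legend()` is called, which is why the sketch adds that line.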
163 00:12:49,730 --> 00:12:57,200 And if you want some more beautiful visualizations, you can go ahead with the module that we are 164 00:12:57,200 --> 00:13:00,950 going to cover in all our upcoming projects. 165 00:13:00,950 --> 00:13:09,590 So from this bar plot, you will conclude these are exactly my top ten users, so we can recommend more 166 00:13:09,590 --> 00:13:18,020 and more products to these user IDs, as there is a high probability that these people are going to buy the 167 00:13:18,020 --> 00:13:18,620 products. 168 00:13:19,190 --> 00:13:21,320 So that's all about this conclusion. 169 00:13:21,320 --> 00:13:23,940 That's all about this problem statement as well. 170 00:13:23,940 --> 00:13:26,090 I hope you loved the session very much. 171 00:13:26,390 --> 00:13:27,080 Thank you. 172 00:13:27,080 --> 00:13:28,060 Have a nice day. 173 00:13:28,250 --> 00:13:29,090 Keep learning. 174 00:13:29,090 --> 00:13:31,160 Keep growing, keep motivating.