So let's go to pdf-parser. We run it the same way as before, 'python pdf-parser.py', followed by the location of the PDF file, example1.pdf. Press Enter and it throws a bunch of results at us.

The first result from pdf-parser is simply the complete raw output of the PDF file. You can see that it begins with the PDF magic bytes, %PDF-1.4, which tell us that it's a PDF file of version 1.4. Then we have the objects inside it. As you keep scrolling down, you can see one object that contains a stream, and then another object. This might look suspicious, but you have to look at what exactly is inside this particular dictionary. It seems to be a font setting element: this PDF has some specific font settings, and these values are basically the hex representation of that font. So it's not really critical in terms of the maliciousness of the file.

Moving further down, the objects that contain streams can be of interest. But as you can see, these objects are referenced, so we have to check which objects reference them, or whether they are really referenced at all or are just placeholders.
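The first two things pdf-parser shows, the %PDF magic bytes and the list of indirect objects, can be approximated in a few lines of Python. This is a minimal sketch, not how pdf-parser is implemented: it assumes an uncompressed, reasonably well-formed file, the function names pdf_version and list_objects are hypothetical, and the regular expression can be fooled by binary stream data or objects packed inside object streams.

```python
from __future__ import annotations

import re

# Header line of an indirect object, e.g. b"24 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def pdf_version(data: bytes) -> str | None:
    """Return the version in the %PDF-x.y magic bytes, or None if missing."""
    # Searching the first 1024 bytes (rather than only offset 0) is a lenient
    # assumption of this sketch, since some files carry junk before the header.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def list_objects(data: bytes) -> list[tuple[int, int, bool]]:
    """Return (object number, generation, contains_stream) per indirect object."""
    headers = list(OBJ_HEADER.finditer(data))
    objects = []
    for i, header in enumerate(headers):
        # An object's body is taken to run until the next object header (or EOF).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(data)
        body = data[header.end():end]
        objects.append(
            (int(header.group(1)), int(header.group(2)), b"stream" in body)
        )
    return objects


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"4 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
        b"%%EOF\n"
    )
    print(pdf_version(sample))   # 1.4
    print(list_objects(sample))  # [(1, 0, False), (4, 0, True)]
```

To try it on the file from this walkthrough, read it in binary mode (for example, data = open('example1.pdf', 'rb').read()) and pass data to both functions. On real samples, prefer pdf-parser's own output; the sketch only illustrates what that first raw listing contains.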
If you move further down, object 24 tells us that it basically contains JavaScript, and that JavaScript executes a URL passed through unescape. Further down, object 25 is mostly about the title of the PDF, and after that we reach the end of the file.

OK. To quickly search for anything inside the PDF, pdf-parser gives us the '-s' option. With this parameter you can search for any string in the PDF's indirect objects (this option does not search inside streams). Let's say I want to look for 'javascript'. It gets me all the locations where JavaScript appears. For example, object number 24 contains JavaScript, and it has the actual script as well. There is another object, object 26, which contains a dictionary that calls the JavaScript and references object number 23.

So let's see what exactly is in object number 26. I think it's going to be the same data we see here, but let's run '-o', which selects an object, and pass it the object number, 26. When we press Enter, it gives us the content of object number 26. Again, object 26 says that it's trying to call JavaScript located at object number 23. So let's go to object 23 and see what's there. Object 23 is interesting here: it's not really doing anything.
It just references object number 24. And you know what is in object 24: it's the JavaScript we just saw. This is one way malware authors create a sort of loop, really a chain of references, so that PDF tools cannot quickly recognize where the JavaScript is located. Object 26 referenced object 23, object 23 referenced object 24, and it was object 24 that actually contained the JavaScript.

So we have the JavaScript here, and it's a simple unescape script. All you have to do is add a document.write to it and run it, and you can see exactly what this JavaScript translates into.

Let's quickly analyze another example, example2.pdf. Again it's a pretty long output, and we already know from the PDFiD result that example2.pdf also contains JavaScript. So let's search for that.

OK. It says there is a script that references an action, so let's search for the action and see what exactly it does. Looking at the referenced action, this JavaScript is trying to launch cmd.exe. From there it goes to the home drive and checks whether template.pdf exists on the desktop.
If that file exists, it executes it. So that is what this JavaScript is trying to do. It's basically a launch action: as soon as you open that PDF, this particular portion of the script gets executed. That is how we follow the trail of references and work out what the JavaScript is trying to do.
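The trail-following from the first example (object 26 pointing to 23, and 23 pointing to 24) can be sketched in Python. This is a minimal illustration under stated assumptions: the objects are taken to be already extracted into a dict mapping object number to dictionary text (for instance, copied from pdf-parser's '-o' output), the names OBJECTS, follow_chain and js_unescape are hypothetical, and the object bodies are invented to mirror the chain, not copied from example1.pdf. js_unescape is a Python re-implementation of JavaScript's unescape(), used here in place of the document.write trick.

```python
from __future__ import annotations

import re

# An indirect reference such as "23 0 R" (object number, generation, R).
REF = re.compile(r"(\d+)\s+(\d+)\s+R\b")

# Hypothetical object bodies mirroring the 26 -> 23 -> 24 chain;
# these are NOT the real contents of example1.pdf.
OBJECTS = {
    26: "<< /Type /Action /S /JavaScript /JS 23 0 R >>",
    23: "24 0 R",
    24: "<< /S /JavaScript /JS (var s = unescape('%48%69');) >>",
}


def follow_chain(objects: dict[int, str], start: int) -> list[int]:
    """Follow the first indirect reference in each object, starting at `start`.

    Stops at an object with no outgoing reference (where the payload
    usually sits), at a dangling reference, or when an object repeats.
    """
    path = [start]
    while True:
        match = REF.search(objects[path[-1]])
        if match is None:
            return path  # end of the trail
        nxt = int(match.group(1))
        if nxt not in objects or nxt in path:
            return path  # dangling reference or genuine cycle
        path.append(nxt)


def js_unescape(s: str) -> str:
    """Python stand-in for JavaScript's unescape(): decodes %uXXXX and %XX."""
    return re.sub(
        r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1) or m.group(2), 16)),
        s,
    )


if __name__ == "__main__":
    trail = follow_chain(OBJECTS, 26)
    print(" -> ".join(map(str, trail)))  # 26 -> 23 -> 24
    script = OBJECTS[trail[-1]]
    escaped = re.search(r"unescape\('([^']*)'\)", script).group(1)
    print(js_unescape(escaped))  # Hi
```

The sketch follows only the first reference in each object. Real action dictionaries can branch (for example through a /Next entry), and a script's own text may contain digit patterns that look like references, so a fuller version would walk every reference and parse strings properly.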