0 1 00:00:10,420 --> 00:00:11,260 Hello everyone. 1 2 00:00:11,260 --> 00:00:16,010 Welcome to the last video about PE files. 2 3 00:00:16,030 --> 00:00:24,130 So in the first two videos we spent time understanding the structure of PE files, looking at different 3 4 00:00:24,130 --> 00:00:32,440 sections and really understanding what exactly a PE file looks like, what it consists of, and how to 4 5 00:00:32,560 --> 00:00:42,010 make sense of the different sections or different attributes which are present inside the headers of a PE 5 6 00:00:42,040 --> 00:00:42,910 file. 6 7 00:00:42,910 --> 00:00:50,680 So I just wanted to quickly go through another video to help you understand what happens when you execute 7 8 00:00:50,770 --> 00:00:53,980 a program inside an operating system. 8 9 00:00:53,980 --> 00:00:58,390 Let us, for simplicity, consider the Windows operating system. 9 10 00:00:58,390 --> 00:01:02,090 So what happens when a program is executed? 10 11 00:01:02,110 --> 00:01:11,350 So when you run the binary, the shell picks up the information and tells the PE file loader to run 11 12 00:01:11,350 --> 00:01:12,170 the file. 12 13 00:01:12,370 --> 00:01:18,320 So for every file format, operating systems have a different file loader. 13 14 00:01:18,370 --> 00:01:21,370 Because in this case we are talking about PE files, 14 15 00:01:21,370 --> 00:01:30,310 the shell informs the PE loader to run the file. The kernel then allocates virtual memory from the pool 15 16 00:01:30,610 --> 00:01:33,810 to fit the binary's image inside it. 16 17 00:01:33,850 --> 00:01:41,830 So if you remember, when we were talking about import address tables and other virtual address spaces 17 18 00:01:41,920 --> 00:01:48,190 in our previous video, we used the terms relative virtual address and virtual address. 18 19 00:01:48,220 --> 00:01:54,110 So I'll quickly help you understand what exactly a virtual address means.
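The point above about each file format having its own loader can be sketched in a few lines. This is only an illustration of format detection by magic bytes, not how the Windows loader is actually implemented; the function name `identify_format` is hypothetical. It relies on two well-known facts: a PE file starts with the DOS signature `MZ` and stores the offset of its `PE\0\0` signature at byte 0x3C, and an ELF file starts with `\x7fELF`.

```python
import struct


def identify_format(data: bytes) -> str:
    """Guess the executable format from its leading bytes (hypothetical helper)."""
    # ELF binaries (Linux and others) begin with 0x7F followed by "ELF".
    if data[:4] == b"\x7fELF":
        return "ELF"
    # PE binaries begin with the DOS header, whose first two bytes are "MZ".
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # The 4-byte little-endian field at 0x3C (e_lfanew) points to the PE signature.
        (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
        if data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00":
            return "PE"
        return "MZ (DOS)"
    return "unknown"


def make_minimal_pe_stub() -> bytes:
    """Build a tiny fake header for demonstration; it is not a runnable program."""
    stub = bytearray(0x80)
    stub[0:2] = b"MZ"
    struct.pack_into("<I", stub, 0x3C, 0x40)  # e_lfanew -> offset 0x40
    stub[0x40:0x44] = b"PE\x00\x00"
    return bytes(stub)
```

Calling `identify_format(make_minimal_pe_stub())` returns `"PE"`; a real tool would read the first bytes of a file from disk instead of using the fake stub.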
19 20 00:01:54,160 --> 00:01:58,210 So there are two types of memory on your system. 20 21 00:01:58,210 --> 00:02:05,800 One is the hard disk, which is your local storage, and the other one is your RAM, or random access 21 22 00:02:05,800 --> 00:02:08,870 memory, which is volatile, dynamic memory. 22 23 00:02:08,890 --> 00:02:16,430 So when a PE file is sitting on the disk, it's basically saved on the local hard disk. 23 24 00:02:16,570 --> 00:02:24,790 When you run it, the operating system allocates a virtual address space in which the program can run. 24 25 00:02:24,790 --> 00:02:32,470 So this is basically an optimization that the operating system implements in order to make sure that 25 26 00:02:32,770 --> 00:02:39,560 it allocates enough memory for an executable to really run. If, let's say, you're 26 27 00:02:39,590 --> 00:02:46,270 running a program, you do not want to continuously add more and more space to it; you just want to allocate 27 28 00:02:46,270 --> 00:02:52,350 it a wide space so that it can stay there and run as long as it is alive. 28 29 00:02:52,390 --> 00:02:59,140 And once you close that executable, the operating system just flushes 29 30 00:02:59,140 --> 00:03:03,630 that memory out, and the file still exists on the disk. 30 31 00:03:03,640 --> 00:03:09,010 So this is how memory management happens inside the operating system. 31 32 00:03:09,010 --> 00:03:14,640 When the file is sitting on the disk, it basically has a local 32 33 00:03:14,640 --> 00:03:22,520 address space, and when it is executed it has a virtual memory space in the random access memory. 33 34 00:03:22,570 --> 00:03:28,720 So what the kernel does is basically allocate the virtual memory for the executable to load, 34 35 00:03:28,930 --> 00:03:36,950 and it then loads it into that particular memory section.
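To make the difference between an address on disk and an address in memory concrete, here is a minimal sketch of the usual arithmetic. A relative virtual address (RVA) is an offset from the image base, so the virtual address is the image base plus the RVA; mapping an RVA back to a position in the file on disk goes through the section table. The image base, the section values, and the names `Section`, `rva_to_va`, and `rva_to_file_offset` are all made up for illustration and do not come from any real binary.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section starts once loaded in memory
    virtual_size: int     # size of the section in memory
    raw_offset: int       # offset of the section's data in the file on disk
    raw_size: int         # size of the section's data on disk


# Illustrative values only (assumptions, not read from a real PE file).
IMAGE_BASE = 0x400000
SECTIONS = [
    Section(".text", 0x1000, 0x2000, 0x400, 0x2000),
    Section(".rdata", 0x3000, 0x1000, 0x2400, 0x1000),
]


def rva_to_va(rva: int, image_base: int = IMAGE_BASE) -> int:
    """Virtual address = image base + relative virtual address."""
    return image_base + rva


def rva_to_file_offset(rva: int, sections: List[Section] = SECTIONS) -> Optional[int]:
    """Translate an RVA into an offset in the on-disk file, or None if unmapped."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            # Bytes past raw_size exist only in memory (zero-filled), not on disk.
            if delta >= s.raw_size:
                return None
            return s.raw_offset + delta
    return None
```

With these made-up values, RVA `0x1234` corresponds to virtual address `0x401234` in memory and to file offset `0x634` on disk.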
Then what happens is, the loader 35 36 00:03:36,950 --> 00:03:42,200 starts parsing the different sections of the PE file to start making sense of it. 36 37 00:03:42,220 --> 00:03:46,510 I'm referring to the same sections which we talked about in the previous videos. 37 38 00:03:46,690 --> 00:03:50,230 If you remember, we talked about the import address table (IAT). 38 39 00:03:50,230 --> 00:03:58,120 So what the import address table contains is the information about all the imported functions that the 39 40 00:03:58,120 --> 00:04:00,230 executable wants to use. 40 41 00:04:00,250 --> 00:04:08,730 So you might be familiar with DLLs in the Windows operating system, which are dynamic link libraries. 41 42 00:04:09,210 --> 00:04:18,090 So DLLs are nothing but executable code and functions which can be reused by any program running 42 43 00:04:18,090 --> 00:04:19,700 on the Windows operating system. 43 44 00:04:19,770 --> 00:04:23,430 So DLLs are a classic example of code reuse. 44 45 00:04:23,430 --> 00:04:31,290 You basically create all these dynamic link libraries and you let any other executable use them on demand. 45 46 00:04:31,290 --> 00:04:39,480 For example, let's say you want to load a video inside your application; then you do not have to write 46 47 00:04:39,520 --> 00:04:48,330 the entire program to open that movie file, create the media player where it should be 47 48 00:04:48,330 --> 00:04:49,710 loaded, and stuff like that. 48 49 00:04:49,770 --> 00:04:56,340 All you need to do is simply import the Windows DLLs which are responsible for running a video 49 50 00:04:56,340 --> 00:04:57,970 file on the operating system. 50 51 00:04:58,020 --> 00:05:04,170 You can just import them inside your code and you can use them to quickly launch any video inside your 51 52 00:05:04,800 --> 00:05:05,490 application. 52 53 00:05:05,490 --> 00:05:08,560 You do not have to write that same code again and again.
53 54 00:05:08,640 --> 00:05:14,220 Another example: let's say you want to establish a network connection; you don't have to 54 55 00:05:14,220 --> 00:05:15,780 write all of that code yourself. 55 56 00:05:15,780 --> 00:05:21,600 You can use the Windows DLLs which are responsible for setting up these socket connections, 56 57 00:05:21,600 --> 00:05:26,310 establishing the connection, doing the handshakes, and stuff like that, and the operating system basically 57 58 00:05:26,310 --> 00:05:28,320 takes care of all those things. 58 59 00:05:28,320 --> 00:05:36,150 So that's the major advantage of using DLLs. So every executable on a Windows machine 59 60 00:05:36,300 --> 00:05:41,730 will be using some form or other of DLLs, because they all rely on certain facilities 60 61 00:05:41,730 --> 00:05:48,950 of the operating system to run, to execute, and to do whatever they want to do on the OS (operating system). 61 62 00:05:48,990 --> 00:05:54,500 So the information about all those DLLs is basically contained in the import address table. 62 63 00:05:54,540 --> 00:06:00,810 So as we move ahead with our other videos, import address tables (IATs) will play a key role in understanding 63 64 00:06:01,110 --> 00:06:07,890 the behavior of an executable, that is, what kind of capabilities that executable 64 65 00:06:07,890 --> 00:06:08,370 has. 65 66 00:06:08,370 --> 00:06:14,220 Because we know which DLLs can be used to perform which kinds of tasks, 66 67 00:06:14,310 --> 00:06:19,650 we can immediately guess that, okay, this particular executable wants to 67 68 00:06:19,980 --> 00:06:26,400 connect to the Internet, this particular executable wants to launch the shell prompt and run certain 68 69 00:06:26,400 --> 00:06:27,630 commands inside it.
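As a rough sketch of this idea of guessing behavior from imports, the snippet below maps a few well-known Windows DLL names to the kind of functionality they usually provide. The hint table is a small hand-picked sample, and the `summarize_imports` helper and its input dictionary are hypothetical; in practice the DLL and function names would be read from the PE file's import table by a parser.

```python
from typing import Dict, List

# Small, hand-picked sample of well-known Windows DLLs (not an exhaustive list).
DLL_HINTS: Dict[str, str] = {
    "ws2_32.dll": "network sockets (Winsock)",
    "wininet.dll": "HTTP/FTP internet access",
    "advapi32.dll": "registry, services and security APIs",
    "kernel32.dll": "core process, memory and file APIs",
}


def summarize_imports(imports: Dict[str, List[str]]) -> List[str]:
    """Return one human-readable note per imported DLL (hypothetical helper)."""
    notes = []
    for dll, functions in sorted(imports.items()):
        # DLL names are case-insensitive on Windows, so normalise before lookup.
        hint = DLL_HINTS.get(dll.lower(), "no hint in this table")
        notes.append(f"{dll}: {hint} ({', '.join(functions)})")
    return notes


# Example input: an import list assumed to have been extracted already.
example = {
    "WS2_32.dll": ["connect", "send"],
    "KERNEL32.dll": ["CreateProcessA"],
}
for line in summarize_imports(example):
    print(line)
```

An import of `ws2_32.dll` only suggests that the program can open network connections; it is a hint for further analysis, not proof of what the program actually does at run time.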
69 70 00:06:27,810 --> 00:06:33,300 This particular executable is trying to launch PowerShell, and stuff like that, so it becomes slightly 70 71 00:06:33,330 --> 00:06:39,030 easier to really understand what the actual behaviors of the executable file are. 71 72 00:06:40,350 --> 00:06:47,540 So moving to the second part. What happens is that the PE loader, once it has populated the import address 72 73 00:06:47,560 --> 00:06:53,640 table with the addresses of all the DLLs which are loaded in memory, will start recursively 73 74 00:06:53,850 --> 00:06:59,700 parsing through all those imported symbols and functions, locating them in memory, and assigning them to the executable. 74 75 00:06:59,730 --> 00:07:07,430 This is the standard process of how an operating system loads a process into memory. 75 76 00:07:07,470 --> 00:07:12,020 The kernel then pushes the current location onto an execution stack. 76 77 00:07:12,030 --> 00:07:19,380 So for example, what happens is, when you are executing a certain section of the code and all of a sudden 77 78 00:07:19,380 --> 00:07:23,880 you call a DLL which says, okay, play the movie, 78 79 00:07:23,880 --> 00:07:33,060 the execution has to now jump from your section to another memory section where that DLL is loaded. 79 80 00:07:33,060 --> 00:07:39,270 So what the kernel does is that it stores the current execution location on the stack, jumps to the 80 81 00:07:39,270 --> 00:07:40,650 DLL's memory location, 81 82 00:07:40,650 --> 00:07:49,200 executes the DLL, then comes back to the same place where it had last executed 82 83 00:07:49,200 --> 00:07:52,460 your binary and then starts executing the next instructions again. 83 84 00:07:52,470 --> 00:08:00,540 So this is how the kernel does a memory mapping of your executable with the other linked libraries 84 85 00:08:00,660 --> 00:08:04,290 that you are using from the Windows operating system.
85 86 00:08:04,470 --> 00:08:10,110 So the kernel basically pushes the current location onto an execution stack so that it knows where to 86 87 00:08:10,110 --> 00:08:17,100 come back to. It then jumps from the current memory to the shared memory location where the DLL is loaded, 87 88 00:08:17,610 --> 00:08:25,050 executes it, performs the action, then pops the saved location from the stack and returns to continue the 88 89 00:08:25,050 --> 00:08:27,450 execution of your program. 89 90 00:08:27,450 --> 00:08:35,370 So once the execution is complete, the kernel just deletes all the virtual space that it had created for 90 91 00:08:35,370 --> 00:08:36,280 the executable. 91 92 00:08:36,400 --> 00:08:43,860 And this is how a program basically gets executed inside the operating system: a very high-level overview 92 93 00:08:43,860 --> 00:08:48,000 of how a program would run inside an operating system. 93 94 00:08:48,000 --> 00:08:49,440 I would highly encourage you 94 95 00:08:49,440 --> 00:08:56,340 to go ahead and read more about a lot of interesting operating system concepts like segmentation 95 96 00:08:56,340 --> 00:09:02,200 and paging, and a lot of other stuff related to dynamic link libraries, how the operating system manages 96 97 00:09:02,200 --> 00:09:03,240 memory, and stuff like that. 97 98 00:09:03,240 --> 00:09:05,940 Those are really, really interesting computer science concepts. 98 99 00:09:07,820 --> 00:09:08,730 So, lastly, 99 100 00:09:08,750 --> 00:09:13,030 this is a very simple representation of what we just talked about. 100 101 00:09:13,040 --> 00:09:20,390 So if you look at the first point, it says that a running program gets its own area in RAM, which 101 102 00:09:20,390 --> 00:09:26,300 is our virtual memory, to hold the code and the data, so everything that was there inside the executable 102 103 00:09:26,510 --> 00:09:33,760 gets written onto the RAM and then the CPU basically executes all those instructions one by one.
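The push, jump, and pop sequence described above can be modelled with a toy interpreter. This is a deliberately simplified sketch: the "instructions" are plain strings, the `run` function and the `play_movie` library routine are hypothetical, and on real x86 hardware it is the `call` instruction itself that pushes the return address and `ret` that pops it.

```python
from typing import Dict, List, Tuple


def run(program: List[str], libraries: Dict[str, List[str]]) -> List[str]:
    """Execute a toy program; return the instructions in the order they ran."""
    stack: List[Tuple[List[str], int]] = []  # saved (code, return position) pairs
    trace: List[str] = []
    code, pc = program, 0
    while True:
        if pc >= len(code):
            if not stack:
                break  # main program finished and nothing left to return to
            # Pop the saved location and resume the caller where it left off.
            code, pc = stack.pop()
            continue
        op = code[pc]
        if op.startswith("call "):
            # Push the location of the next instruction, then jump into the library.
            stack.append((code, pc + 1))
            code, pc = libraries[op[len("call "):]], 0
        else:
            trace.append(op)
            pc += 1
    return trace


libraries = {"play_movie": ["open file", "render frames"]}
program = ["load settings", "call play_movie", "exit"]
print(run(program, libraries))
```

The printed trace shows the program's own instruction first, then the two library instructions, then the program's remaining instruction, which is the "come back to the same place" behavior described in the video.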
103 104 00:09:33,890 --> 00:09:40,160 It does all this memory management, linking with these shared libraries, executing them, coming back, and 104 105 00:09:40,160 --> 00:09:45,410 stuff like that, so that's how a program gets executed in memory. 105 106 00:09:45,410 --> 00:09:52,550 So this was all about understanding the basic behavior of PE files and their structure, and how an operating 106 107 00:09:52,550 --> 00:09:54,350 system looks at a PE file. 107 108 00:09:54,500 --> 00:10:00,890 So going forward we'll start looking into specific cases of static malware analysis, and then we'll look 108 109 00:10:00,890 --> 00:10:08,510 at different cases of how we can analyze a PE file to understand whether it has certain suspicious 109 110 00:10:08,510 --> 00:10:11,500 or malicious properties inside it or not. 110 111 00:10:11,750 --> 00:10:13,550 So that's all for this video. 111 112 00:10:13,550 --> 00:10:14,560 Thanks a lot for watching.