00:00:13,120 --> 00:00:23,470
Continuing from our previous video, we'll cover a quick demo of how we can compile source code to generate an executable.

00:00:23,760 --> 00:01:09,110
Here we'll take an example of C/C++ source code and go through the same steps we talked about. The code first goes to a compiler, which generates the object file. From there it goes to a linker, which pulls in the library code that the header files declare, and from there it finally generates the executable, or EXE file. Once the file is generated, we can obviously run it, and if you remember the first few videos of this module, we talked about how an operating system loads an executable into memory.

00:01:09,120 --> 00:01:31,260
In this way we'll be able to connect the entire process together: how we start writing the code, how we compile it, how we generate an executable file, and what happens when the operating system runs that executable. Once this entire loop is clear, that's when we'll start jumping into the actual reverse engineering problem.

00:01:31,770 --> 00:01:48,720
If you remember, in the previous video we mentioned assembly code being generated by the compiler. The usual engineering process goes from source code, to compilation of the code, to generation of the executable file.
00:01:48,720 --> 00:02:04,810
Now, when you talk about reverse engineering, you basically go in the other direction. When you're looking at a malware sample, you're actually looking at an executable file. You don't have the object file, and you don't have the source code file.

00:02:05,730 --> 00:02:44,110
So what do we do? We use tools like decompilers and disassemblers. Notice the term decompiler: it's the reverse of the compilation process. Such a tool picks up an executable file and converts it into a low-level but readable format, like assembly, so that you can understand what operations that executable would perform. This was the reason I went through this entire cycle: if you look at an EXE straight away, you also have to understand where it came from.

00:02:44,100 --> 00:03:27,080
What was the process behind it? And why spend so much time understanding those things when you can just look at the behavior and say, OK, these are the things the malware is doing? The whole challenge is that an EXE is a final compiled file, so you cannot look at the exact source code from which it was generated. You can do it in certain cases where well-defined decompilers are available, for example .NET, but not with most other file formats or programming languages from which the EXE was compiled and generated.
00:03:27,440 --> 00:04:07,330
So that's the whole reason for creating these first few videos: to really help you understand the meaning of reverse engineering, and why exactly it is called reverse engineering. It is because we are not working on just the code itself. We work on the final output: we pick it up and start tracing it back to its origins to see what exactly the purpose of creating this file was. That's the whole idea behind the reverse engineering piece of malware analysis. So let's move to a demo.

00:04:07,340 --> 00:04:23,790
Here I'm using an example file called hello.c. I will also include this example file in the course resource section, so you can download it and test these steps on your own as well.

00:04:28,800 --> 00:04:47,640
Now I will be using the standard GCC compiler, which is used for compiling C and C++ programs. You can easily download it from the internet; I can provide the link in the resource section as well. Once you have downloaded the compiler, you have to make sure that its location is added to your path as well.

00:04:48,300 --> 00:05:20,250
Once you have defined the path, gcc will be recognized as a command on the command line, so we can give the command gcc -S hello.c. The -S option (capital S) tells GCC to generate the assembly code of the file. If you look here, a new file called hello.s has been created.
Let's look at the content of hello.s now.

00:05:25,260 --> 00:06:03,550
This is what the file looks like. You can see here that it tells me the original source file was hello.c, and here is the main function. Then, if you start looking here, there are a bunch of assembly instructions. We'll go through what those assembly instructions actually mean in much more detail later. This was just a demonstration to show you how we can generate the assembly if we have the source code: we can just compile that code to generate the assembly file.

00:06:03,560 --> 00:06:40,190
Now, if you want to generate a final executable from your source code file, all you have to do is use gcc -o (the letter o, not zero), then give a name to the executable, let's say hello, and then the name of your source code file, which is hello.c. When you run it, GCC will compile it and generate an EXE file for you. So this is our EXE file. Now, when you run the file, you get the output Hello World. That's what the program was designed to do.

00:06:40,190 --> 00:07:09,080
If you look at the program, you'll see that it's a pretty straightforward, standard program which includes stdio.h, then defines a main function, then uses printf to print Hello World onto your screen, and then returns 0. So that's how we converted source code into an executable.
00:07:09,080 --> 00:07:41,820
Now, when you are doing malware analysis, it will be the other way around. It's easy to understand what actions source code performs, but you cannot really look at an EXE file and say, without even running it, what actions it performs. If you look at the raw content of a PE file, you won't see much inside: it mostly looks random, and what you can make out basically points you to the file's parsing information, the things we looked at in the previous videos.

00:07:42,390 --> 00:08:12,810
So this was a quick demonstration to help you really understand the process of compilation. We have now covered a lot of ground on how programs are created, how they are compiled, and how operating systems load them into memory. With this background built up, we can start focusing on the reverse engineering process, where we will start decompiling executable files to understand what operations they perform on the system. Thanks for watching this video.