Can anyone explain reverse engineering a program ? Like, how

You are currently reading a thread in /sci/ - Science & Math

Thread replies: 23
Thread images: 1
File: 1459386937241.jpg (51 KB, 591x855)
Can anyone explain reverse engineering a program? Like, how do you make out sensible bulks of code from just a lot of opcodes and assembly?
>>
>>7978975
The term you're looking for is "visualization." There are no good decompilers just like there are no fully optimizing compilers. To native assembly programmers those bulks of code ARE sensible already, you're just not fluent in the language. It's a very hard problem made all the harder when you don't have a formal notion of what an algorithm is.
>>
>>7978975
There is software out there that helps with this. It shows you memory addresses, functions, etc. It's still no easy task, however, and anyone who does it seriously is not just going straight from assembly.
>>
>>7979011
Example
https://www.youtube.com/watch?v=9VO74HdCex0
>>
>>7979004
>>7979011
>>7979017
ty anons, I'm looking to become a software engineer as I find this kind of stuff interesting
>>
>>7978975

Manual reverse engineering:

You're going to want to use an interactive disassembler with a debugger, like IDA. It'll provide some useful tools like the ability to manually follow branches in assembly, and it'll also provide a friendly visual representation of program control flow between basic blocks. You'll essentially need to trace inputs as they propagate through the target program and identify the addresses in the program's code section at which whatever action you're interested in happens. Using the interactive debugger, you can incrementally step through instructions, observing register state so as to deduce how the program is working.
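
To make "basic blocks" a bit more concrete, here is a minimal Python sketch. It assumes a made-up toy instruction set; the mnemonics, the addresses, and the `basic_blocks` helper are hypothetical and are not anything IDA exposes. A block starts at the first instruction, at every jump target, and right after every jump.

```python
from typing import List, Optional, Tuple

# (address, mnemonic, jump target or None) in a hypothetical toy ISA.
Instr = Tuple[int, str, Optional[int]]

PROGRAM: List[Instr] = [
    (0, "load", None),
    (1, "cmp", None),
    (2, "jz", 5),     # conditional jump to address 5
    (3, "add", None),
    (4, "jmp", 0),    # unconditional jump back to address 0
    (5, "store", None),
    (6, "ret", None),
]

JUMPS = {"jmp", "jz"}


def basic_blocks(program: List[Instr]) -> List[List[int]]:
    """Split a linear listing into basic blocks (lists of addresses)."""
    leaders = {program[0][0]}  # the first instruction always starts a block
    for i, (_addr, op, target) in enumerate(program):
        if op in JUMPS:
            if target is not None:
                leaders.add(target)              # a jump target starts a block
            if i + 1 < len(program):
                leaders.add(program[i + 1][0])   # so does the fall-through

    blocks: List[List[int]] = []
    current: List[int] = []
    for addr, _op, _target in program:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    if current:
        blocks.append(current)
    return blocks


print(basic_blocks(PROGRAM))
```

A real tool then draws edges between these blocks (fall-through and jump edges) to get the control flow graph you step through.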

Once you get comfortable with using an interactive debugger, you can work with automatic methods. For example, a common area of work in program verification in computer science is something called concolic execution. Concolic execution is a mixture of concrete (using real inputs) and symbolic (using logical variables) execution. It works like this: manually inspect your target binary to determine a point of interest (in the form of an instruction address). Next, choose a target region of memory (or register--doesn't matter) that you're looking to influence to be a certain value. Finally, with these in hand, you can run the binary through a concolic execution engine which uses underconstrained approximations to satisfiability to determine what inputs the program needs to see to reach your desired state.
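
The loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the `program` function stands in for the target binary, branch conditions are recorded as plain Python predicates instead of symbolic formulas, and `solve` is a brute-force search over a small integer range standing in for a real constraint solver. None of these names come from an actual concolic engine.

```python
from typing import Callable, List, Optional, Tuple

# (predicate over the input, outcome we require from it)
Constraint = Tuple[Callable[[int], bool], bool]


def program(x: int, trace: List[Constraint]) -> bool:
    """Hypothetical target. Returns True when the point of interest is hit."""
    def branch(pred: Callable[[int], bool]) -> bool:
        taken = pred(x)
        trace.append((pred, taken))  # record the path constraint
        return taken

    if branch(lambda v: v > 100):
        if branch(lambda v: v % 7 == 5):
            if branch(lambda v: v * 2 < 250):
                return True
    return False


def solve(constraints: List[Constraint],
          domain: range = range(0, 1024)) -> Optional[int]:
    """Brute-force stand-in for an SMT solver: first input meeting all constraints."""
    for candidate in domain:
        if all(pred(candidate) == want for pred, want in constraints):
            return candidate
    return None


def concolic_search(seed: int, max_runs: int = 32) -> Optional[int]:
    """Run concretely, flip the deepest flippable branch, re-solve, repeat."""
    x = seed
    for _ in range(max_runs):
        trace: List[Constraint] = []
        if program(x, trace):
            return x
        next_input = None
        while trace and next_input is None:
            pred, taken = trace.pop()
            # Keep the path prefix, require the opposite outcome at this branch.
            next_input = solve(trace + [(pred, not taken)])
        if next_input is None:
            return None
        x = next_input
    return None


print(concolic_search(0))
```

Starting from the seed 0, the search first flips `v > 100` (getting 101), then flips `v % 7 == 5` (getting 103), which reaches the point of interest. The naive "flip the last branch" strategy can oscillate on other programs, which is why the sketch bounds the number of runs.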

Have fun, OP!
>>
>>7979028

As a software engineer, you usually won't be working directly with assembly (unless you're working in compilers or static/dynamic analysis tools).

Which domains are you interested in?
>>
>>7979028
Well, reverse engineering isn't as helpful as you'd hope it would be. Most programs these days are written at extremely high levels of abstraction, to the point where "decompiling" will produce pages of entirely redundant code. Even with access to the original source code of a given application, it'll still likely be filled with lots of disorganized junk code that slows the rate at which a programmer can read and understand the coded algorithm. If you have a particular interest in decompiling and reverse engineering specifically, I'd recommend looking at older gaming systems rather than trying to go at modern architectures.
>>
>>7979065

This is, in general, false. In compiled languages, regardless of the level of abstraction occurring in a high-level programming language, control flow is highly optimized by the compiler to a set of well-defined (and well-known) patterns. The exception to this is machine code obfuscation, which is an additional layer of protection some more paranoid software development agencies employ.

Reverse engineering doesn't necessarily involve code decompilation (though this is a legitimate research area). It instead focuses on the methods required to deduce enough about the way a program works that it could (hypothetically) be modified to achieve some predictable result.
>>
>>7979065
>>7979076

An addendum to this: a lot of compiled code is written in C or C++. The best way to get practice with reverse engineering would be to reverse engineer your own toy programs written in C++ and compiled using a common compiler like clang or g++.
>>
>>7978975

A lot of simple programs decompile quite easily. Unless they were compiled with debug info (debug=true), you probably won't get any function names, though. (You would be surprised how many programs still have all the debug shit intact.)

I have decompiled crippleware programs so I could keep running them before. I have also decompiled my own programs which I carelessly lost the source for.

It really is not hard to decompile a small program.
>>
>>7979076
>control flow is highly optimized by the compiler
That's often highly unreliable. Some compilers do a decent job but most are, themselves, bloated by the same mess of code that they're trying to piece together in the first place.
>>
>>7979132

I'm sorry, but decades of work objectively quantifying optimization in compilers directly contradict what you're saying regarding reliability, and the current industry state of the art, in which compiler optimization is extremely common, contradicts your implication that patterned optimization is somehow rare.

Let's be honest for a moment. Do you actually have any idea what you're talking about, or do you just like to see yourself post?

I've been passively noting the threads you post in, and, given the raw frequency and breadth of the topics you seem to respond to, I had conjectured that it's extremely unlikely that there's much content to what you're posting. The fact that you've posted such inaccurate information on a topic I'm personally fairly experienced with has seemingly confirmed this suspicion.

Objectively, what do you get out of this?
>>
>>7979557
CS is one of the things I'm extremely well versed in. Most optimization is directed at the efficiency of the compiled machine code. It doesn't generally benefit anyone to be able to take the compiled code and visualize it concisely at a higher level of abstraction. When you need it to be higher level, you can usually work with the original source code rather than trying to decompile it. I'm coming at this from a practical angle since I deal with systems software research. You sound like a more theoretical student. I'm not saying optimization is rare, I'm saying the algorithms behind the code don't gain any visualization points as a result of optimization research. Basically the code is a mess no matter how you attack it, and it won't make too much of a difference how you try to visualize a given algorithm because it'll still just be the same algorithm in the end.

You can't do non-theoretical CS without some level of tolerance to terrible code. That's all I was saying.
>>
>>7979576

Whether or not you are well-versed in computer science in general, there comes a point where you should realize when you're talking about a specialty you're not so aware of. As a point of reference, although I do have an interest in theoretical computer science, most of my work and research (until recently) has been in systems research from the perspective of security. I work at a very large company which performs low-level code analysis daily. One specific project of ours which has gained a lot of traction, especially with our government sponsors, involves reverse engineering cryptographic drivers by relying on compiler-level optimization structures, working from control flow graphs built out of basic blocks.

There are plenty of companies in the wild whose living comes from doing precisely what you're saying is useless. If you'd like an example, feel free to research any modern antivirus software provider.

From a systems security perspective, terrible code is a wonderful entry point for valuable research.
>>
>>7979605
>useless
I'm not saying it's useless, I'm saying it's generally intolerable. It's fine if you have a career that benefits from an ecosystem of bad code, but for most of us it's just terrible to work with.
>>
>>7979004
Like this anon said. As an example, certain patterns of opcodes become recognizable as common constructs in higher level languages. A sequence of loading, adding, storing, then jumping back to the load statement is a simple loop. Making this jump conditional, subtracting a constant from a counter, and checking whether it's positive or negative before the jump is common as well.
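
A minimal sketch of that pattern, assuming a hypothetical accumulator machine (the opcodes `LOAD`, `ADD`, `SUBI`, `STORE`, `JPOS`, `HALT` and the `run` interpreter are invented for illustration). The listing is load/add/store followed by a counter decrement and a conditional jump back to the top, which is what a counted loop tends to look like once compiled.

```python
from typing import Dict, List, Tuple, Union

Instr = Tuple[str, Union[str, int, None]]

# total += n; n -= 1; repeat while n > 0
LOOP: List[Instr] = [
    ("LOAD", "total"),   # 0: acc = total
    ("ADD", "n"),        # 1: acc += n
    ("STORE", "total"),  # 2: total = acc
    ("LOAD", "n"),       # 3: acc = n
    ("SUBI", 1),         # 4: acc -= 1 (subtract a constant)
    ("STORE", "n"),      # 5: n = acc
    ("JPOS", 0),         # 6: if acc > 0, jump back to the load at address 0
    ("HALT", None),      # 7
]


def run(program: List[Instr], memory: Dict[str, int]) -> Dict[str, int]:
    """Interpret the toy program with a single accumulator register."""
    acc, pc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "LOAD":
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "SUBI":
            acc -= arg
        elif op == "STORE":
            memory[arg] = acc
        elif op == "JPOS":
            if acc > 0:
                pc = arg      # backward jump: this is the loop's back edge
                continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return memory


def high_level(n: int) -> int:
    """The source-level loop the listing above corresponds to."""
    total = 0
    while True:
        total += n
        n -= 1
        if n <= 0:
            break
    return total


print(run(LOOP, {"total": 0, "n": 4})["total"], high_level(4))
```

Spotting the backward conditional jump at address 6 is what tells you "this is a loop"; the decrement right before it tells you it is a counted one.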
>>
>>7979643
If it was that easy, why isn't there a good decompiler?

Can even the language used to write the code be determined, or can code be reversed into several different languages?
>>
>>7979661
>why isn't there a good decompiler?
I'd assume it's because there isn't a one-to-one mapping between operations, and because there are multiple ways of implementing certain constructs.

>>7979661
>Can even the language used to write the code be determined, or can code be reversed into several different languages
I think this is very ISA specific and I don't know enough to give a good answer on this.

To be honest, I'm more experienced in assembler writing versus compiler writing.
>>
>>7979661
>If it was that easy, why isn't there a good decompiler?

Decompiling code is actually very easy; the problem is that little things like "formatting", "variable names", and "comments" are artifacts of the language and thus not present in the compiled code. Also, compiler optimizations mean that the compiled code does not have a direct one-to-one relationship with the written code; redundancies and inefficient patterns are automatically rewritten, leaving no way to figure out if they were originally written that way or used to be equivalent slower code.

All of this makes decompiled code a nightmare to understand.
>>
>>7979689
>used to be equivalent slower code
Also noting that there are many different patterns that are equivalent to the same set of optimized instructions, and you don't know which it used to be.
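
A small Python sketch of that many-to-one problem. It assumes a made-up expression format (ints, variable names, and `(op, left, right)` tuples) and a hypothetical `fold` pass that does constant folding plus two identity simplifications. It is not how any particular compiler works, just the shape of the issue.

```python
from typing import Tuple, Union

Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]


def fold(expr: Expr) -> Expr:
    """Toy optimizer: fold constants and drop `+ 0` / `* 1`."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    if op == "+" and right == 0:
        return left
    if op == "*" and right == 1:
        return left
    return (op, left, right)


# Three different "source programs" that a programmer might have written.
SOURCES = [
    ("*", "x", 4),
    ("*", "x", ("+", 2, 2)),
    ("*", ("+", "x", 0), ("*", 2, 2)),
]

optimized = [fold(src) for src in SOURCES]
print(optimized)
```

All three inputs come out as `("*", "x", 4)`. Given only that output, a decompiler has no way to tell which of the three (or of infinitely many others) was the original.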
>>
>>7979557
>Do you actually have any idea what you're talking about, or do you just like to see yourself post?

Got it in one. He's the somewhat-more-coherent successor to M. Fractals, our previous resident crank/shitposter
>>
i can explain it..

write a function and compile it. now decompile it and look at the god damn opcodes.
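
One low-effort way to try that exercise is Python's standard `dis` module, which disassembles a function's bytecode. Note the assumption: this is CPython bytecode, not native machine code, so it is only a stand-in for what a native disassembler would show, and the exact opcode names vary between Python versions. The `add` function is just an example.

```python
import dis


def add(a, b):
    return a + b


# Collect the opcode names of the compiled function.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# Human-readable listing with offsets and operands.
dis.dis(add)
```

Even in this tiny case you can see the shape the thread is talking about: load the operands, do the arithmetic, return the result.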

This is a 4chan archive - all of the content originated from them. If you need IP information for a Poster - you need to contact them. This website shows only archived content.