Batu69 Posted January 1, 2016
Resistance coders, malware writers, and copyright infringers take note
32c3 Anonymous programmers, from malware writers to copyright infringers and those baiting governments with censorship-foiling software, may all be unmasked through stylistic programming traits that survive into compiled binaries, regardless of common obfuscation methods.
Video: Aylin: De-anonymizing Programmers
The work, titled De-anonymizing Programmers: Large Scale Authorship Attribution from Executable Binaries of Compiled Code and Source Code, was presented by Aylin Caliskan-Islam at the 32nd annual Chaos Communication Congress on Tuesday. It was accompanied by the publication of an arXiv paper [PDF] titled When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries, written by researchers based at Princeton University in the US, one of whom is notably part of the Army Research Laboratory.
The researchers began by trying to identify malicious programmers, noting that there is "no technical difference" between security-enhancing use cases for mapping the style of posts and privacy-infringing use cases. In other words, writing style betrays the writer.
Many of the distinguishing features (such as variable names) in the C/C++ source code analysed by the researchers are removed when that code is compiled, and compiler optimisation may further alter the structural qualities of programs, obscuring authorship even further. However, in examining the authorship of executable binaries "from the standpoint of machine learning, using a novel set of features that includes ones obtained by decompiling the executable binary to source code," the researchers were able to show "that many syntactical features present in source code do in fact survive compilation and can be recovered from [the] decompiled executable binary."
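
To make the idea of syntactic features surviving in decompiled code a little more concrete, here is a minimal Python sketch that counts a few C keywords in a decompiler-style snippet. The keyword list, the `keyword_counts` helper and the sample snippet are all assumptions made up for illustration; the article does not describe the researchers' actual feature set at this level of detail, and it is certainly richer than this.

```python
import re
from collections import Counter

# Hypothetical feature set: a handful of C control-flow keywords.
KEYWORDS = ("for", "while", "do", "if", "else", "goto", "switch", "return")

def keyword_counts(code):
    """Count how often each keyword in KEYWORDS appears as a whole token."""
    tokens = re.findall(r"[A-Za-z_]\w*", code)  # identifiers and keywords only
    counts = Counter(tokens)
    return {kw: counts[kw] for kw in KEYWORDS}

# Made-up snippet written in the style of decompiler output
# (generated names such as sub_401000, a1, v1 replace the originals).
DECOMPILED = """
int sub_401000(int a1) {
    int v1 = 0;
    for (int i = 0; i < a1; i++) {
        if (i % 2) v1 += i;
    }
    return v1;
}
"""

features = keyword_counts(DECOMPILED)
print(features)
```

The variable names are gone, but the choice of a `for` loop with a nested `if` is still visible, which is the kind of trait the quoted passage says can be recovered.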
The researchers used "state-of-the-art reverse engineering methods" (as displayed in the above graph) to "extract a large variety of features from each executable binary" and represent programmers' stylistic quirks as feature vectors. Practically, this meant running the binaries through the Netwide disassembler and then the Radare2 disassembler, before using both the "state-of-the-art" Hex-Rays decompiler and the open source Snowman decompiler to extract 426 stylometrically significant feature vectors from the binaries for comparison.
A random forest classifier was then trained on eight executable binaries per programmer to generate accurate author models of coding style. It was thus capable of attributing authorship "to the vectorial representations of previously unseen executable binaries."
The researchers noted that: "While we can de-anonymize 100 programmers from unoptimized executable binaries with 78 per cent accuracy, we can de-anonymize them from optimized executable binaries with 64 per cent accuracy."
"We also show that stripping and removing symbol information from the executable binaries reduces the accuracy to 66 per cent, which is a surprisingly small drop. This suggests that coding style survives complicated transformations."
In their future work, the researchers plan to investigate whether stylistic properties may be "completely stripped from binaries to render them anonymous" and also to look at real-world authorship attribution cases, "such as identifying authors of malware, which go through a mixture of sophisticated obfuscation methods by combining polymorphism and encryption."
Article source
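
For readers wondering what "train a random forest on eight binaries per programmer" looks like in practice, below is a small, standard-library-only Python sketch of the general technique: bootstrap samples, a random subset of features at each split, and a majority vote across trees. Everything in it is an assumption for illustration: the author names, the six numeric features and the synthetic data are invented, and this is not the researchers' code or feature set.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, n_try, depth=0, max_depth=8):
    """Grow one tree; a leaf is a label, an inner node is (f, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    # Random forests consider only a random subset of features at each split.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], rng, n_try, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], rng, n_try, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest

def predict_author(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

def make_samples(rng, authors, n_features, per_author):
    """Synthetic 'style' vectors: one well-separated cluster per hypothetical author."""
    rows, labels = [], []
    for a, name in enumerate(authors):
        for _ in range(per_author):
            rows.append([10 * ((a + j) % len(authors)) + rng.uniform(-1, 1)
                         for j in range(n_features)])
            labels.append(name)
    return rows, labels

AUTHORS = ["author_a", "author_b", "author_c", "author_d", "author_e"]  # hypothetical
data_rng = random.Random(42)
train_rows, train_labels = make_samples(data_rng, AUTHORS, 6, 8)  # eight samples each
test_rows, test_labels = make_samples(data_rng, AUTHORS, 6, 2)    # unseen samples
forest = train_forest(train_rows, train_labels)
hits = sum(predict_author(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_labels)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic clusters are deliberately easy to separate, so the score here says nothing about real binaries; the accuracy figures quoted in the article are the researchers' own and cannot be reproduced from this toy.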
steven36 Posted January 1, 2016
What are they going to do, turn the government in for writing malware? LMAO. Did the internet all of a sudden turn safe and hackers stop hacking? NOOOOO. That leads me to think this is just PR to get grant money. Even this commenter in the topic says it's hogwash:
Quote
"These detection methods don't scale. With statistical detection methods the number of false positives and false negatives increases geometrically with sample size. Increase the sample size to 1000, then 10,000, and you will see it's pointless except to conjure up some grant money."
http://forums.theregister.co.uk/user/31834/
straycat19 Posted January 1, 2016
I believe you misinterpreted the author's meaning of his remarks concerning statistical analysis in relation to this project. He meant that you don't need more samples because statistical analysis does not apply in this case and that a conclusion can be reached with fewer samples to the same reliability. Many times it is helpful if the code can be tracked down to one person or group of persons so that a direction for further investigation can be determined. Just another tool to thin out the group of possible suspects in a timely manner.
steven36 Posted January 1, 2016
8 hours ago, straycat19 said:
"I believe you misinterpreted the author's meaning of his remarks concerning statistical analysis in relation to this project. He meant that you don't need more samples because statistical analysis does not apply in this case and that a conclusion can be reached with fewer samples to the same reliability. Many times it is helpful if the code can be tracked down to one person or group of persons so that a direction for further investigation can be determined. Just another tool to thin out the group of possible suspects in a timely manner."
So you believe a bunch of college kids can stop something that the antivirus industry has been trying to prevent since 1980, and has failed at with all the money in the world for research? Every day it gets harder and harder to detect malware. They don't make it sound like just another tool. They make it sound like they could backdoor any cracker or hacker by de-anonymizing them and figure out who they are, and that's hogwash. And even if it could, some countries harbor hackers where the USA or the West can't touch them, and those places are where most of the really bad stuff you read about in the news is based. Probably just another tool to give out false positives and convict the innocent. Just like the antivirus industry and its false positives, and if there was a real 0-day exploit they couldn't protect us if they had to.