
Forget anonymity, we can remember you wholesale with machine intel, hackers warned


Batu69


Resistance coders, malware writers, and copyright infringers take note

32c3 Anonymous programmers, from malware writers to copyright infringers and those baiting governments with censorship-foiling software, may all be unveiled using stylistic programming traits which survive into the compiled binaries – regardless of common obfuscation methods.

 

Video: Aylin: De-anonymizing Programmers

 

The work, titled De-anonymizing Programmers: Large Scale Authorship Attribution from Executable Binaries of Compiled Code and Source Code, was presented by Aylin Caliskan-Islam to the 32nd annual Chaos Communication Congress on Tuesday.

 

It was accompanied by the publication of an arXiv paper [PDF] titled When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries, written by researchers based at Princeton University in the US, one of whom is notably part of the Army Research Laboratory.

 

The researchers began by trying to identify malicious programmers, noting that there is "no technical difference" between security-enhancing use cases for mapping the style of posts and privacy-infringing use cases. In other words, writing style betrays the writer.

 

Many of the distinguishing features (such as variable names) in the C/C++ source code analysed by the researchers are removed when that code is compiled, and compiler optimisation procedures may further alter the structural qualities of programs, obscuring authorship even further.

 

However, in examining the authorship of executable binaries "from the standpoint of machine learning, using a novel set of features that includes ones obtained by decompiling the executable binary to source code," the researchers were able to show "that many syntactical features present in source code do in fact survive compilation and can be recovered from [the] decompiled executable binary."

 

The researchers used "state-of-the-art reverse engineering methods" to "extract a large variety of features from each executable binary" that would represent the stylistic quirks of programmers using feature vectors.

 

Practically, this meant querying the Netwide disassembler and then the radare2 disassembler, before using both the "state-of-the-art" Hex-Rays decompiler and the open source Snowman decompiler to extract 426 stylometrically significant feature vectors from the binaries for comparison.
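
As a rough illustration of how disassembler output can become a feature vector, here is a minimal Python sketch. It is not the researchers' feature set: the mnemonic streams are made up, and the helper names `ngram_features` and `to_vector` are hypothetical. It simply counts instruction bigrams and normalises them over a shared vocabulary.

```python
from collections import Counter


def ngram_features(tokens, n=2):
    """Count overlapping n-grams in a token sequence (e.g. instruction mnemonics)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def to_vector(counts, vocabulary):
    """Project n-gram counts onto a fixed vocabulary, normalised by total count."""
    total = sum(counts.values()) or 1
    return [counts.get(gram, 0) / total for gram in vocabulary]


# Hypothetical mnemonic streams standing in for real disassembler output.
binary_a = ["push", "mov", "sub", "mov", "call", "mov", "call", "ret"]
binary_b = ["push", "mov", "xor", "cmp", "jne", "xor", "cmp", "jne", "ret"]

counts_a = ngram_features(binary_a)
counts_b = ngram_features(binary_b)

# A shared, ordered vocabulary makes the two vectors directly comparable.
vocabulary = sorted(set(counts_a) | set(counts_b))
vector_a = to_vector(counts_a, vocabulary)
vector_b = to_vector(counts_b, vocabulary)
```

Each position in `vector_a` and `vector_b` refers to the same bigram, which is what lets a classifier compare binaries from different authors.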

 

A random forest classifier was then trained on eight executable binaries per programmer, to generate accurate author models of coding style. It was thus capable of attributing authorship "to the vectorial representations of previously unseen executable binaries."
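
The train-then-attribute workflow can be sketched in a few lines of Python. Note the substitution: this uses a nearest-centroid classifier as a simple stand-in, not the random forest described in the article, and the author names and three-dimensional vectors are invented for illustration (the article mentions eight binaries per programmer; two are used here to keep it short).

```python
from collections import defaultdict
import math


def train_centroids(samples):
    """samples: list of (author, feature_vector) pairs. Returns author -> mean vector."""
    grouped = defaultdict(list)
    for author, vector in samples:
        grouped[author].append(vector)
    return {
        author: [sum(column) / len(vectors) for column in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def attribute(centroids, vector):
    """Return the author whose centroid is closest (Euclidean distance) to vector."""
    return min(centroids, key=lambda author: math.dist(centroids[author], vector))


# Hypothetical training data: two feature vectors per (made-up) author.
training = [
    ("alice", [0.9, 0.1, 0.0]),
    ("alice", [0.8, 0.2, 0.0]),
    ("bob", [0.1, 0.7, 0.2]),
    ("bob", [0.0, 0.8, 0.2]),
]

centroids = train_centroids(training)

# A previously unseen binary, represented by its feature vector.
guess = attribute(centroids, [0.7, 0.3, 0.0])
```

A random forest would replace the centroid model with many decision trees voting on the author, but the inputs and outputs have the same shape: labelled feature vectors in, a predicted author out.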

 

The researchers noted that: "While we can de-anonymize 100 programmers from unoptimized executable binaries with 78 per cent accuracy, we can de-anonymize them from optimized executable binaries with 64 per cent accuracy."

 

"We also show that stripping and removing symbol information from the executable binaries reduces the accuracy to 66 per cent, which is a surprisingly small drop. This suggests that coding style survives complicated transformations."

 

In their future work, the researchers plan to investigate whether stylistic properties may be "completely stripped from binaries to render them anonymous" and also to look at real-world authorship attribution cases, "such as identifying authors of malware, which go through a mixture of sophisticated obfuscation methods by combining polymorphism and encryption."

 

Article source


What are they going to do, turn the government in for writing malware? LMAO. Did the internet all of a sudden turn safe and hackers stop hacking? NOOOOO. That leads me to think this is just PR to get grant money.

 

Even this commenter on the article says it's hogwash:

Quote

 

These detection methods don't scale.

With statistical detection methods the number of false positives and false negatives increases geometrically with sample size.

Increase the sample size to 1000, then 10,000, and you will see it's pointless except to conjure up some grant money.

 



I believe you misinterpreted the author's remarks about statistical analysis in relation to this project.  He meant that you don't need more samples because statistical analysis does not apply in this case, and that a conclusion can be reached with fewer samples at the same reliability.  Many times it is helpful if the code can be traced to one person or group of persons so that a direction for further investigation can be determined.  It's just another tool to thin out the group of possible suspects in a timely manner.



8 hours ago, straycat19 said:

I believe you misinterpreted the author's remarks about statistical analysis in relation to this project.  He meant that you don't need more samples because statistical analysis does not apply in this case, and that a conclusion can be reached with fewer samples at the same reliability.  Many times it is helpful if the code can be traced to one person or group of persons so that a direction for further investigation can be determined.  It's just another tool to thin out the group of possible suspects in a timely manner.

So you believe a bunch of college kids can stop what the antivirus industry has been trying to prevent since 1980, and has failed at with all the money in the world for research? Every day it gets harder and harder to detect malware. They don't make it sound like just another tool; they make it sound like they could backdoor any cracker or hacker by de-anonymizing them and figuring out who they are. That's hogwash. And even if it could, some countries harbor hackers where the USA or the West can't touch them, and these places are where most of the really bad stuff you read about in the news is based. Probably just another tool to give out false positives and convict the innocent. Just like the antivirus industry and its false positives; and if there was a real 0day exploit, they couldn't protect us if they had to.


