
Twitter’s vast metadata haul is a privacy nightmare for users


tao


Working with publicly available metadata from Twitter, a machine learning algorithm was able to identify users with 96.7 per cent accuracy

Metadata is everywhere: in everything you tweet, every picture you take, and every status update you post on Facebook. It’s used by police and security forces to identify people who try to hide their identities and locations, while metadata embedded in selfies can inadvertently ensnare criminals unaware that the data can destroy their alibi.

And metadata on Twitter can also be used to identify each and every one of us with extreme precision, according to a new paper by researchers at University College London and the Alan Turing Institute. Your tweets, it turns out, no matter how anonymous you might think they are, can be traced back to you with unerring accuracy. All someone needs to do is look at the metadata.

The scientists used tweets and the associated metadata to identify any user in a group of 10,000 Twitter users with 96.7 per cent accuracy. Even when muddling up to 60 per cent of the metadata, the model could still pinpoint a single person with more than 95 per cent accuracy.

“Metadata is much larger when compared to the actual content of a tweet,” says Savvas Zannettou, a PhD student at the Cyprus University of Technology. People wrongly assume that because the data is online, they aren’t vulnerable to identification, adds Beatrice Perez of University College London, a co-author of the paper.

No right-thinking person would tell a total stranger what their address is if approached on the street. But they might tell them how often they turn their bedroom light on and off. “That’s the mentality with metadata,” says Perez. “People think it’s not a big deal. But couple it with another piece of information and I know when you’re home or not.”

It’s a commonly held belief, agrees Zannettou. “The average person doesn’t recognise that she can be easily identified using metadata.” Most Twitter users, he reckons, have no idea that Twitter holds 144 pieces of metadata on them, which is publicly accessible through the site’s API.

Being anonymous won't help

The researchers took a corpus of five million Twitter users and ran 14 pieces of metadata from their tweets (including the time the account was created, the time a tweet was published, and the number of favourites, followers and following) through three different machine learning algorithms.

The algorithm that identified individual accounts most accurately was also one of the most basic machine learning algorithms, say the researchers. It showed that it’s possible to identify an individual almost perfectly using just a handful of pieces of metadata.

It does so by training the model with a known dataset of users, demonstrating that they behave in a certain way on Twitter based on the metadata of their tweets. When the model is run “in the wild”, using new tweets from the same users, it can unpick people’s behaviour from metadata, identifying them as a specific individual.
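The train-then-identify loop described above can be sketched in a few lines. This is only an illustration, not the researchers' code: the article does not name the algorithm, so a nearest-neighbour classifier is assumed here as a stand-in for a "basic" method, the four fields are a hypothetical subset of the 14, and all of the data is synthetic.

```python
import math
import random

# Assumed value ranges for four hypothetical metadata fields:
# followers, following, favourites, posting hour. Not the paper's 14 fields.
SCALES = (5000.0, 2000.0, 10000.0, 24.0)


def make_profile(rng):
    """Draw a synthetic 'typical' metadata profile for one account."""
    return tuple(rng.uniform(0, s) for s in SCALES)


def sample_tweet(profile, rng, noise=0.02):
    """Simulate one tweet's metadata: the account profile plus small jitter."""
    return tuple(v + rng.gauss(0, noise * s) for v, s in zip(profile, SCALES))


def normalise(row):
    """Rescale every field to roughly 0..1 so no single field dominates."""
    return tuple(v / s for v, s in zip(row, SCALES))


def nearest_user(train, row):
    """1-nearest-neighbour: return the user whose known tweet is closest."""
    query = normalise(row)
    user, _ = min(train, key=lambda item: math.dist(normalise(item[1]), query))
    return user


def run_demo(n_users=50, train_per_user=20, test_per_user=5, seed=0):
    """Train on known tweets, then identify the authors of unseen ones."""
    rng = random.Random(seed)
    profiles = {u: make_profile(rng) for u in range(n_users)}
    train = [(u, sample_tweet(p, rng))
             for u, p in profiles.items() for _ in range(train_per_user)]
    test = [(u, sample_tweet(p, rng))
            for u, p in profiles.items() for _ in range(test_per_user)]
    hits = sum(nearest_user(train, row) == u for u, row in test)
    return hits / len(test)


if __name__ == "__main__":
    print(f"identification accuracy: {run_demo():.1%}")
```

The "training" step here is nothing more than storing labelled examples, which is why such a simple method is cheap to run against a large set of accounts. The accuracy this toy prints says nothing about the figures reported in the paper.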

Trying to anonymise the data collected by social networks isn’t the answer, says Perez. “It’s very hard to anonymise a data set,” she explains. Triangulation using one or more sets of data is easy to do, and can often undo any attempts to remove identifying information.
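The triangulation Perez describes can be illustrated with a toy linkage attack. The sketch below assumes two made-up datasets, one with names stripped and one that still carries them, and a hypothetical helper `link_records` that matches rows on the fields they share. None of the field names or values come from the paper.

```python
def link_records(anon_rows, named_rows, keys):
    """Re-identify anonymised rows by matching shared fields in a named dataset.

    Only unambiguous matches (exactly one named candidate) are returned.
    """
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:
            matches[row["id"]] = candidates[0]
    return matches


# Hypothetical "anonymised" release: names stripped, metadata kept.
ANON = [
    {"id": "r1", "created": "2011-06", "utc_offset": 0, "followers": 312},
    {"id": "r2", "created": "2011-06", "utc_offset": 0, "followers": 58},
    {"id": "r3", "created": "2014-01", "utc_offset": -5, "followers": 58},
]

# Hypothetical second dataset that carries names alongside the same fields.
NAMED = [
    {"name": "alice", "created": "2011-06", "utc_offset": 0, "followers": 312},
    {"name": "bob", "created": "2011-06", "utc_offset": 0, "followers": 58},
    {"name": "carol", "created": "2014-01", "utc_offset": -5, "followers": 58},
]

if __name__ == "__main__":
    # One shared field leaves most rows ambiguous; two fields pin down all three.
    print(link_records(ANON, NAMED, ["created"]))
    print(link_records(ANON, NAMED, ["created", "followers"]))
```

On its own, neither the creation month nor the follower count singles out every row in this toy data, but the combination does, which is the point Perez makes about coupling one piece of information with another.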

Perez and her colleagues proved that by obfuscating the dataset they had from Twitter, removing some fields to try and make it more difficult for their system to pinpoint individuals. “If we had a few data points not blurred, it was still easy,” she says. The identification rate stayed largely stable right up to the point at which all unique elements are removed – and it becomes impossible to discern one person from any other.
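One rough way to see why removing fields helps so little is to count how many accounts are still unique on whatever fields remain. The sketch below uses made-up records and field names, and a hypothetical `unique_fraction` helper. It is an illustration of the idea, not the researchers' obfuscation experiment.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Share of records that no other record matches on the given fields."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(counts[k] == 1 for k in keys) / len(records)


# Hypothetical per-account metadata; field names and values are made up.
RECORDS = [
    {"created_year": 2009, "followers": 120, "lang": "en"},
    {"created_year": 2009, "followers": 450, "lang": "en"},
    {"created_year": 2012, "followers": 120, "lang": "en"},
    {"created_year": 2012, "followers": 450, "lang": "en"},
    {"created_year": 2015, "followers": 120, "lang": "fr"},
]

if __name__ == "__main__":
    for fields in (["created_year", "followers", "lang"],
                   ["created_year", "followers"],
                   ["followers"],
                   ["lang"],
                   []):
        print(fields, unique_fraction(RECORDS, fields))
```

In this toy data, dropping one of the three fields still leaves every record unique. Only when the remaining fields stop distinguishing accounts does the unique share collapse, and with no fields left every record looks the same.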

Things are likely to improve following the introduction of GDPR in late May. “I think we’re going to find more scrutiny around metadata,” explains Pat Walshe, a data protection consultant. Article 25 of GDPR calls for “data protection by design and by default”. That principle, also called data minimisation, requires that companies process only the specific data required to carry out a task.

But the bigger question, beyond whether it’s right or not that companies can hold so much identifying information about us all, is whether the average person values their privacy in the first place. “For sure, the average user should care,” says Zannettou. “But I’m sceptical if they do.”



Hundreds of articles have been written warning people that if they are on the internet, they and their data are not secure. If you use any website, whether you use a commercial VPN or proxy or a free one, or even if you use Tor, you can be traced. It may not be possible for normal users to track you, but any of the thousands of law enforcement and intelligence organizations can, and some can do it as easily as you open a web page. For years it has been public knowledge that people can be traced on Facebook, Twitter, etc. by anyone. The show Catfish showed viewers every week how they tracked people down on Facebook and Twitter; it didn't take a rocket scientist or a bunch of code and software to do it.



10 hours ago, straycat19 said:

Hundreds of articles have been written warning people that if they are on the internet, they and their data are not secure. If you use any website, whether you use a commercial VPN or proxy or a free one, or even if you use Tor, you can be traced.

There's a big difference between being traced and being served malware that the FBI or some other agency bought from a hacker who is no better than the people they are exploiting, or being a careless user with poor cybersecurity who uses sites like Twitter, Facebook, etc. in the first place. In many countries people use Tor and open source social media because it's a matter of life or death: they speak on subjects their government doesn't allow. Things that people like you take for granted, like freedom of the press, can get you killed in many countries. These are Tor users too, and their lives depend on this software. Funny that they were never able to catch up with Snowden until after he escaped, and all the carnies they put on the sites he used didn't do a bit of good after the fact. The last time I read anything about them exploiting Tor was some years ago, when they were using it to catch a pedophile ring.

Basically every article tells you how they got caught, and none of it was as simple as tracing someone. They have only ever posted about the very few they caught, or the ones they knew got away, because the majority never got caught. For every person they took out on the darkweb there were hundreds they never caught. Drug markets on the darkweb are so bad that the idiots used to come onto the clearnet, on sites like Reddit, and tell each other where to buy. So Reddit banned the sub, but that just did these junkies a favor by shutting them up where the FBI and others could not see them: the further they drive them underground, the harder they are to catch. It's like all the hackers who are using Twitter: they're idiots, and as soon as one of them does something serious, the FBI will have enough to catch them on.

Quote

 

hackerfactor:

How do Tor users get caught later down the track after years?

(A) The user was careless and leaked enough information about themselves outside of TOR.

(B) The user was targetted by malware or a hostile site (even a hostile hidden service) that exposed enough information to determine their identity.

(C) A hostile exit node alters content, permitting tracking. (E.g., call-home javascript, replacing downloaded executables, altering SSL certificates to force a non-tunnel CA lookup), tracking bitcoin transfers, etc.

(D) A server creates a profile of the user that is specific enough to identify the user/browser/computer outside of TOR.

(E) The user's non-TOR activities led to the capture. After acquiring the computer (via warrant), they also identify past TOR activity.

(F) The user is lured out of TOR. E.g., "We should meet in person..."

 

You wrote: "Isn't the whole idea of Tor to remain anonymous?" No. TOR only anonymizes the network and transport layers. It does nothing for anonymizing the session, presentation, or application layers. And it does nothing to anonymize the user (your name or any information that you reveal about yourself).

You also mentioned people who were captured years later. Even if law enforcement knows who you are right now, they have no obligation to arrest you right now. They may find you due to a minor crime and wait until you commit a major infraction before sweeping in. Or in the case of child porn, they may queue up the people so they can arrest them all at the same time. With a big bust, there's no early warning that the cops are coming. (Basically, rather than arresting one person and warning the others -- potentially allowing criminals to flee/hide, the police will wait and arrest everyone at once.)

As you can see from the quote above, smart hackers know how you get caught and don't even need to be told, because they have discussed it a million times. The ones on Twitter want media attention, are trying to make a name for themselves, and are reckless to even be posting about hacking on such a site. If everyone got caught online, the dark markets and hacking forums would not even exist. Most of them stay one step ahead, or crime would go away on the internet. And every day someone is getting exploited with a bitcoin miner, which is legal software you can even download from the Windows Store to mine on your own PC.

 

If they always caught the bad guys on the internet, as in the perfect fairytale world you make cybercrime solving out to be, whitehat hackers would be without jobs, there would be no need for billions of people to do security updates or use antivirus, and law enforcement would be able to put all their efforts into solving crimes in the street. Even the government uses security software, because there is no agency that has never been hacked.

 

The Equifax hack alone put something like half of US users' data at risk, and that was just one hack. They're really doing a super job stopping it.

https://money.cnn.com/2018/02/09/pf/equifax-hack-senate-disclosure/index.html


