Hack Ideas


Raising Awareness of Personal Information Leaks


Northeastern University has been running the ReCon project with the goal of identifying which mobile apps leak personal information, which types of information are leaked, and where that information is sent. We want you to create tools that leverage the insights discovered by this project.

  1. The first idea is to develop a visualization on top of the ReCon Dataset that lets people understand what happens to their data when they use applications on their mobile devices. This could be built directly on the ReCon Dataset.
  2. Another idea is to develop a browser plugin that lets end users check how leaky an application is while they browse its page in Google Play or the App Store.
  3. An Android application that scans the applications a user has installed on their device, tells them what information those applications could be leaking, and calculates a leak score for the device. The application would use the ReCon API to check whether any of the installed apps has been spotted by ReCon leaking personal information in the past. It could also advise the user on how to proceed, depending on how serious the issues found in the installed applications are.
  4. A browser extension that inspects traffic and provides a similar level of detection of personal information leaks to third parties.
  5. We would also like to provide users with a real-time visualization of leaks that lets them better understand which information is being sent and where it is sent to.
  6. We would also love a mechanism through which users could contribute feedback about the leaks, so that we could create a crowdsourced database of all the modification/blocking rules users create.
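
As a starting point for idea 3, here is a minimal Python sketch of a device leak score. The record schema (`app`, `leaks`), the leak type names, and the `SEVERITY` weights are all assumptions made for illustration; the actual ReCon Dataset and API may be structured differently.

```python
import json

# Hypothetical severity weights per leak type; these are NOT defined by ReCon.
SEVERITY = {"password": 10, "location": 5, "email": 4, "device_id": 2}


def load_leak_index(dataset_json):
    """Map package name -> set of leaked PI types (assumed schema)."""
    records = json.loads(dataset_json)
    return {record["app"]: set(record["leaks"]) for record in records}


def device_leak_score(installed_packages, leak_index):
    """Return (total score, per-app scores) for the installed packages."""
    per_app = {}
    for package in installed_packages:
        leaks = leak_index.get(package)
        if leaks:
            # Leak types without a known weight count as 1.
            per_app[package] = sum(SEVERITY.get(kind, 1) for kind in leaks)
    return sum(per_app.values()), per_app
```

An Android client would obtain the list of installed packages from the device and could call the API per package instead of loading the whole JSON file; the scoring step would stay the same.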

Type: Visualization, Browser Plugin, App
Assets
  • ReCon Dataset: a JSON file that gathers aggregated information about all the leaks of personal information (PI) that have been detected by the ReCon project.
  • API: an API built on top of that JSON file that can be used to check directly whether an app is leaking PI, or whether a domain is receiving information and from which apps.
  • ReCon ML Source Code: source code of the ReCon machine learning system that detects PI leaks.
  • Example Visualization: you can get some inspiration by checking a sample visualization developed by the Data Transparency Lab team.

Links

Privacy Census


Thanks to a Princeton project named Privacy Census, we can now understand how widespread tracking techniques are across the web, including advanced ones such as fingerprinting. However, this information might be difficult for the average user to process and understand. We are looking for ways to communicate it appropriately to end users so they can understand what those techniques are, how they are used, and how they affect them.

  1. One idea would be to use either the whole dataset shared by Princeton or a simplified dataset that contains only the key insights about some of the main tracking techniques being analysed. The visualization could be, for instance, a website that shows which top domains use fingerprinting, or how fingerprinting adoption varies by geography or by website type.
  2. Another idea would be to build a browser plugin that, using the information already collected by Princeton, shows users information about the site they are browsing, or even their level of exposure to fingerprinting according to their browsing history. The plugin would monitor the current website, compare it to the findings database, and inform the user (e.g. via an icon) about the use of fingerprinting.
  3. The challenges above mostly target fingerprinting techniques, but the dataset also contains information about stateful tracking (e.g. cookies, headers...) that could be used for many other ideas.
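
The lookup step of idea 2 (compare the current website to the findings database) could be sketched as follows. This is an illustrative Python sketch only: the row schema (`domain`, `techniques`) is an assumption, not the real layout of the Princeton datasets, and an actual browser plugin would implement the same logic in JavaScript.

```python
from urllib.parse import urlsplit


def build_index(rows):
    """Map domain -> list of fingerprinting techniques (assumed schema)."""
    return {row["domain"].lower(): row["techniques"] for row in rows}


def fingerprinting_for(url, index):
    """Return the techniques recorded for the URL's host or a parent domain."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then each parent domain (a.b.c -> b.c),
    # stopping before the bare top-level label.
    for i in range(max(len(labels) - 1, 1)):
        candidate = ".".join(labels[i:])
        if candidate in index:
            return index[candidate]
    return []


def exposure(history_urls, index):
    """Fraction of visited URLs whose site appears in the findings index."""
    if not history_urls:
        return 0.0
    hits = sum(1 for url in history_urls if fingerprinting_for(url, index))
    return hits / len(history_urls)
```

The `exposure` helper corresponds to the browsing-history variant: it reports the share of visited pages that belong to sites found in the database.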

Type: Visualization, Plugin
Assets
  • Complete Datasets: the complete Postgres dumps with all the information from the 1,000,000-site crawl.
  • Insight Dataset: distilled information about the websites that have been spotted using different fingerprinting techniques.

Links

How much are you worth to Facebook?


The Facebook Data Valuation Tool (FDVT) is a browser plugin developed by Carlos III University. The plugin shows end users an estimate of the value they generate for Facebook based on their browsing activity (i.e. the ads they see and click on).

In this context, we are thinking of two different directions that teams could take. On the one hand, we would like to offer end users the possibility to explore what types of ads are shown to them. Thanks to this tool we have a good set of historical data about Facebook sessions and ads. The idea would be to build visualizations that let users understand which companies are targeting them: top advertisers, types of ads, historical evolution, etc. This could be implemented as a stand-alone visualization, but it could also be embedded into the plugin itself. On the other hand, we are also interested in the creation of companion educational material, based on the tool, that lets users clearly understand the value they have for Facebook and which factors that value depends on. The educational material could be some statistics/data, a video, a website, etc. For instance, it would be interesting to show users how their value varies depending on the audience in which they are categorised and how they are assigned to those audiences.
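
For the visualization direction, the aggregation behind a "top advertisers" or "types of ads" chart could look like the Python sketch below. The field names `advertiser` and `category` are assumptions for illustration; the actual ads dataset may use different keys.

```python
from collections import Counter


def top_advertisers(ads, n=3):
    """Return the n advertisers with the most ads as (name, count) pairs."""
    counts = Counter(ad["advertiser"] for ad in ads)
    return counts.most_common(n)


def ads_by_category(ads):
    """Count ads per category; records without one fall under 'unknown'."""
    return dict(Counter(ad.get("category", "unknown") for ad in ads))
```

The resulting counts can be fed directly into a bar chart or, grouped by date, into a historical view.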


Type: Plugin, Website, Education material
Assets
  • Tool The FDVT Chrome plugin
  • Dataset: a JSON file that shows the ads that some users of the tool have been shown during a limited period of time.

Links

How much is my data worth?


During the last year, people have started to become aware of the importance of personal data for Internet companies. However, it’s not clear how much value they have for those companies. We think users would love a way to calculate how much they are worth, taking into account the type of data, the specific data and the potential number of similar users.

This could be built in different ways: as a website in which users can manually enter some data or connect to some of the services they use, as a mobile application, or as a browser plugin that detects user activity. For instance, it should be possible to build something similar to what the Financial Times built (see link below), either in a more user-friendly way or in a way that monitors user activity instead of asking users to fill in data.


Type: Browser Plugin, Website, Mobile App
Links

How revealing can my public information be?


Many people unknowingly volunteer sensitive personal information and fail to restrict access to it. This kind of information has analytical value to data brokers and exploitation value to attackers. The idea is to raise awareness of the potential danger this poses for users.

The challenge is to use public, unauthenticated APIs to collect information on individuals that could be used to answer a security question or is otherwise sensitive. You may make inferences based solely on metadata if appropriate. The goal is to demonstrate accidental disclosure and raise awareness. Example: Someone tweets “I got a new car!” and a photo. You now have the make and model of their current, and maybe first, vehicle. This information may come up in an account recovery process.
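
A very simple way to prototype the detection step is keyword matching over public posts, as in the Python sketch below. The topics and regular expressions in `PATTERNS` are hypothetical examples chosen for illustration; a real tool would need a much richer set of rules and a source of posts, neither of which is specified here.

```python
import re

# Hypothetical patterns linking phrases in public posts to typical
# account-recovery questions; these are illustrative only.
PATTERNS = {
    "vehicle": re.compile(r"\b(new|first) car\b", re.IGNORECASE),
    "pet": re.compile(r"\bmy (dog|cat|puppy|kitten)\b", re.IGNORECASE),
    "birthday": re.compile(r"\bmy birthday\b", re.IGNORECASE),
}


def flag_posts(posts):
    """Return (topic, post) pairs for posts that may reveal an answer."""
    findings = []
    for post in posts:
        for topic, pattern in PATTERNS.items():
            if pattern.search(post):
                findings.append((topic, post))
    return findings
```

Applied to the example above, the tweet “I got a new car!” would be flagged under the `vehicle` topic, which the tool could then present to the user as a possible security-question answer they have disclosed.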


Type: Browser Plugin, Website
Assets

Parent Child Education Challenge


Tracking mechanisms often can’t separate child and adult activity, putting children at risk for targeted advertising and analytics. Younger generations also tend to have a much larger “digital footprint”, often increasing their risk of identity theft and cyberstalking. There’s no technical “silver bullet” for these complex issues, so educating parents and children is key to improving their cyber safety.

The goal is to create an interactive tool for educating parents, children, or both on topics related to online privacy. This is an open-ended challenge, so you may use any format, API, language, or platform you see fit! Some examples: a “Privacy checkup” style web app that aggregates data from multiple sources, or a “What to do if …?” guide to prevent particularly risky situations.


Type: All

Education and Raising Awareness on 3rd Party Tracking behind websites and apps


Average users don’t usually understand what is really going on when they browse a website or use an application on their smartphone. Every time they perform such a simple action, many third parties are involved, and in most cases all of them try to track the user.

The goal is to create an interactive tool that explains to users what goes on every time they connect, for instance, to a news website. Such a tool could use the databases collected by either the ReCon project (mobile apps) or Privacy Census (websites).
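
To illustrate what "many third parties are involved" means in practice, the Python sketch below lists the third-party sites contacted while loading a page, given the page URL and the URLs it requested. It is a simplification under stated assumptions: `site_of` approximates a site by the last two labels of the host name, which is wrong for suffixes such as `co.uk`; a real tool would use the Public Suffix List.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive site name: the last two labels of the host (an approximation)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def third_parties(page_url, request_urls):
    """Return the sorted sites contacted that differ from the page's own site."""
    first_party = site_of(page_url)
    contacted = {site_of(url) for url in request_urls}
    return sorted(contacted - {first_party, ""})
```

The list of requested URLs would come from wherever the tool observes traffic, and each resulting site could then be looked up in the ReCon or Privacy Census data to explain what it is known to do.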


Type: All
Assets
  • Insight Dataset: a JSON file that gathers information about which sites have been spotted fingerprinting users, categorised by site type and country.
  • ReCon Dataset: a JSON file that gathers aggregated information about all the leaks of personal information that have been detected by the ReCon project.

Links

Tools to bring transparency to targeted advertising


Targeted advertising has been the focus of much research effort, mostly dedicated to optimizing the strategies for targeting users. As a consequence, it has increased online advertising revenues significantly. However, it has also been raising more and more concerns from users, who often feel that it constitutes an invasion of their privacy. In particular, users often wonder “why am I being shown this ad?” or similar questions. EURECOM Institute and MPI-SWS have been conducting a study on how to provide answers to such questions in a privacy-sensitive way. To this end, the following would be useful:

  1. Facebook Ad Collector: Build a browser extension (preferably a Chrome extension) that collects the ads users receive on Facebook, as well as data from the new “Why am I seeing this” functionality on Facebook, and sends them all to a server. This functionality provides explanations that could be useful to our end goal, since it tells users why they have been targeted. For each ad, the information collected should include at least the landing page URL of the ad and the media content of the ad (e.g. image).
  2. Android Ad Collector: Build an application that collects all the ads that appear on an Android device. This means that the app needs to collect not only browser ads but also ads that appear in other Android applications. Insights on how to achieve this can be gained by examining how ad blockers like Adblock Plus for Android work and emulating their functionality. As with the Facebook Ad Collector, the data should be sent to a server after it has been gathered. In this case, apart from the landing page URL and the media content of the ad, we would also like to collect the link that invokes the ad (or some information on the ad distributor if getting the exact URL is not possible).
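
Both collectors need to send the same kind of record to a server. One possible shape for that record is sketched below in Python; the class name `AdRecord` and all field names are assumptions made for illustration, since the challenge does not define a payload format or server endpoint.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AdRecord:
    """One collected ad (hypothetical field names)."""
    landing_url: str                    # landing page URL of the ad
    media_url: str                      # media content of the ad, e.g. an image
    source: str                         # e.g. "facebook" or "android"
    explanation: Optional[str] = None   # text from "Why am I seeing this"
    invoking_url: Optional[str] = None  # link that invokes the ad (Android)


def to_payload(record):
    """Serialize a record to JSON, omitting fields that were not collected."""
    fields = {key: value for key, value in asdict(record).items() if value is not None}
    return json.dumps(fields, sort_keys=True)
```

Keeping the optional fields in a single record type lets the Facebook and Android collectors share one server-side format while each fills in only what it can observe.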

Type: Browser Plugin, Android App
Links