Data Transparency Lab Research empowers its users and beyond.

Daniel Coloma, CTO, Data Transparency Lab.
17 Jul 2016
Research findings resulted in the correction of privacy failings by developers in following releases.

One of the main aims of DTL is to provide insights about the use of our personal data by web services and apps. One of our 2015 Grantees, Northeastern University's project ReCon , offers a tool to end-users that lets them understand which bits of Personal Information (PI) are sent by their smartphones to other parties, usually unbeknownst to them. ReCon specifically details the type of PI that is sent, the applications that are sending it, and the domains that are receiving it.

One of the key insights the ReCon project unveiled was that some applications were sending user passwords in plain text (i.e. not encrypted). After detecting this situation, the ReCon team contacted the application developers. As a result, most of the app developers fixed the situation. Interestingly, these insights were not only useful to ReCon users, but also extremely useful to all users of the identified applications, mainly because the findings resulted in the improvement and correction of the privacy failings by their developers in following releases.

This is one example of how the tools that DTL develops directly benefit the end users of the tool itself, but also have impact beyond them. In the case of ReCon, millions of users who were not using the ReCon tool were affected by the same vulnerabilities. That made me wonder how we could make the most of the projects we were running, particularly ReCon.

ReCon: Data Visualizations

The first approach that came to mind was sharing all the PI leaks detected openly (the meta-data, not the info. itself), so any lay user could look for information about how a particular application behaves privacy-wise. I decided to explore the possibility to develop a visualization of the PI leaks detected by the ReCon project. I shared this idea with Dave Choffnes, the ReCon Principal Investigator, who was very supportive of the idea and excited to work with us on it.

The ReCon team shared with me a json file that included the aggregated information about PI leaks detected by ReCon so I already had a good starting point to play with. With the help of my colleagues Blanca and Rafa we developed a first version of the visualization that you can play with at the DTL Website.

It is a very early quick and dirty version, developed by an engineer with no expertise in visualization, but despite that, it is worth checking out because of the insights that were revealed: e.g. some applications send a user's device identifier up to 9 different domains. We are working with visualization experts from our community to upgrade this visualization, to make it more understandable and compelling for end-users. We hope to release it in the coming months.

The JSON file and the source code of the initial visualizations are fully available, so please feel free to play with it, send us your Pull Requests or create a totally brand new visualization.

Open API

We also thought that there was be extra-value beyond a nice visualization of the data discovered by providing real-time interactive applications that can leverage ReCon findings. For example, a mobile application could warn a user about which apps on her phone are leaking certain kinds of PI, based on the information ReCon detected about other users of those same privacy-lacking apps.

In order to address this kind of use of cases we decided to develop an API that wraps the ReCon dataset. For instance, such an API could be used to detect whether a particular app has been previously spotted leaking Personal Information (or not). This could be useful to identify how "leaky" a device is and to warn users before installing an application.

The API is in alpha status and is freely and openly available. The documentation provides information about how to use the API and there are some slides that provide more general context beyond the API.