Data Portability. The EU's General Data Protection Regulation (GDPR) took effect in May 2018 with the goal of increasing transparency and privacy around user data. One major pillar of GDPR is giving users access to the data that services have collected about them. To that end, GDPR contains a clause that encourages the development of infrastructure for moving data from one service to another. This clause aims to solve two problems: (1) giving users access to their data (via export), and (2) leveling the playing field for all services by making it easier for users to move to smaller ones. In an era where services thrive on user data, this is a strong incentive for smaller services (at least in theory).
Services. Because no platform exists for direct data transfer between services, the authors analyze second-hand transfer: exporting data from one service and importing it into another. They do so by manually examining 182 services, including 100 top Alexa-ranked services. Building the dataset is manual but extremely thorough: they record all interactions, time windows, formats, correspondence, etc. while importing and exporting data.
Dataset. To produce real-world data, the authors created accounts and interacted with each service manually. Services acquire four categories of data from users: received (given directly by the user, e.g. email, clicks), observed (collected by sensors), inferred, and predicted. The analysis that follows is limited to received data.
Questions. Using regression models, the authors address the following questions:
* Are services with higher rank more compliant?
* Do services with higher rank provide less data to users?
* Do higher rank services use extensive authentication for data porting?
* Do higher rank services provide faster data transfer?
* Do lower rank services provide more and better import opportunities?
Compliance. A service is considered compliant if it processes a data export request within the legally allowed time (30 days), provides the data in a machine-readable format (JSON, HTML, XML), and provides all of the user's received data. Only 74% of the 182 services facilitated data export at some level, and popularity (Alexa rank) was not a significant predictor of overall compliance.
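The three-part compliance definition can be read as a simple predicate. The following is a minimal sketch under that reading. The `ExportResult` record and its field names are hypothetical, and the format list is limited to the three formats named in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the compliance definition in the summary.
DEADLINE_DAYS = 30
MACHINE_READABLE_FORMATS = {"json", "html", "xml"}


@dataclass
class ExportResult:
    """Hypothetical record of one export request (field names are assumptions)."""
    days_to_fulfil: Optional[int]   # None means the export never arrived
    file_format: Optional[str]      # e.g. "json"; None if nothing was delivered
    includes_all_received_data: bool


def is_compliant(result: ExportResult) -> bool:
    """Return True only if all three criteria from the definition hold."""
    on_time = (
        result.days_to_fulfil is not None
        and result.days_to_fulfil <= DEADLINE_DAYS
    )
    machine_readable = (
        result.file_format is not None
        and result.file_format.lower() in MACHINE_READABLE_FORMATS
    )
    return on_time and machine_readable and result.includes_all_received_data


# Example: a timely JSON export that contains all received data.
print(is_compliant(ExportResult(12, "JSON", True)))
```

Treating the criteria as a strict conjunction is a simplification: the summary also mentions services that facilitated export "at some level", which a boolean check like this does not capture.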
Scope of Data. Intuitively, one would expect higher-ranked services to export less data than lower-ranked ones, on the hunch that popular services value exclusive possession of unique data. Surprisingly, higher rank was significantly associated with providing more data on export. In some cases, higher-ranked services even provided observed data.
Authentication. An analysis of data scope is incomplete without examining how export requests are authenticated, since large amounts of data in malicious hands can cause serious harm. The data shows that higher-ranked services require significantly more authentication than lower-ranked ones. This is a reassuring result, given that higher-ranked services also provide more data on export.
Transfer Speed and Import Opportunities. Finally, the authors analyzed how quickly each service processed a data transfer and whether it offered import opportunities. They expected higher-ranked services to show slower processing times and lower-ranked services to provide more import opportunities. They found that rank had no significant effect on processing time. Adding to the unexpected results, they also found no relation between lower rank and more import opportunities.
Discussion. The paper highlights several interesting results and offers possible explanations regarding compliance with data portability. The authors argue that the absence of a direct transfer platform is one major reason higher-ranked services are more compliant: complying builds trust with their audience, while the lack of direct transfer makes it less convenient, and therefore less likely, for users to actually move their data. This makes compliance a win-win for higher-ranked services. The authors further argue that lack of awareness explains why lower-ranked services are not taking advantage of this golden opportunity.