The UpGuard Cyber Risk team can now report that two more third-party developed Facebook app datasets have been found exposed to the public internet. One, originating from the Mexico-based media company Cultura Colectiva, weighs in at 146 gigabytes and contains over 540 million records detailing comments, likes, reactions, account names, FB IDs and more. This same type of collection, in similarly concentrated form, has been cause for concern in the recent past, given the potential uses of such data.
A separate backup from a Facebook-integrated app titled “At the Pool” was also found exposed to the public internet via an Amazon S3 bucket. This database backup contained columns for fk_user_id, fb_user, fb_friends, fb_likes, fb_music, fb_movies, fb_books, fb_photos, fb_events, fb_groups, fb+checkins, fb_interests, password, and more. The passwords are presumably for the “At the Pool” app rather than for the user’s Facebook account, but would put users at risk who have reused the same password across accounts.
Redacted example of Facebook data from the exposed At the Pool dataset.
The At the Pool discovery is not as large as the Cultura Colectiva dataset, but it contains plaintext (i.e. unprotected) Facebook passwords for 22,000 users. At the Pool ceased operation in 2014 (last non-redirect web archived capture here), and even the parent company’s website is currently returning a 404 error notice. This should offer little consolation to the app’s end users whose names, passwords, email addresses, Facebook IDs, and other details were openly exposed for an unknown period of time.
Data contained in the exposed Cultura Colectiva dataset.
Each of the data sets was stored in its own Amazon S3 bucket configured to allow public download of files.
The data sets vary in when they were last updated, the data points present, and the number of unique individuals in each. What ties them together is that they both contain data about Facebook users, describing their interests, relationships, and interactions, that were available to third party developers. As Facebook faces scrutiny over its data stewardship practices, they have made efforts to reduce third party access. But as these exposures show, the data genie cannot be put back in the bottle. Data about Facebook users has been spread far beyond the bounds of what Facebook can control today. Combine that plenitude of personal data with storage technologies that are often misconfigured for public access, and the result is a long tail of data about Facebook users that continues to leak.
Redacted example of Facebook data from the exposed Cultura Colectiva dataset.
Incident Response
These two separate discoveries demonstrated two polar opposite ends of the spectrum when it comes to the ease, or difficulty, of seeing them secured. With regard to the Cultura Colectiva data, our first notification email went out to Cultura Colectiva on January 10th, 2019. The second email to them went out on January 14th. To this day there has been no response.
Due to the data being stored in Amazon’s S3 cloud storage, we then notified Amazon Web Services of the situation on January 28th. AWS sent a response on February 1st informing us that the bucket’s owner was made aware of the exposure.
When February 21st rolled around and the data was still not secured, we again sent an email to Amazon Web Services. AWS again responded on that same day stating they would look into further potential ways to handle the situation.
It was not until the morning of April 3rd, 2019, after Facebook was contacted by Bloomberg for comment, that the database backup, inside an AWS S3 storage bucket titled “cc-datalake,” was finally secured.
On the flip side of the coin, the data stemming from “At the Pool” had been taken offline during the time UpGuard was looking into the likely data origin, and prior to a formal notification email being sent. It is unknown if this is a coincidence, if there was a hosting period lapse, or if a responsible party became aware of the exposure at that time. Regardless, the application is no longer active and all signs point to its parent company having shut down.
Conclusion
These two situations speak to the inherent problem of mass information collection: the data doesn’t naturally go away, and a derelict storage location may or may not be given the attention it requires.
For app developers on Facebook, part of the platform’s appeal is access to some slice of the data generated by and about Facebook users. For Cultura Colectiva, data on responses to each post allows them to tune an algorithm for predicting which future content will generate the most traffic. The data exposed in each of these sets would not exist without Facebook, yet these data sets are no longer under Facebook’s control. In each case, the Facebook platform facilitated the collection of data about individuals and its transfer to third parties, who became responsible for its security. The surface area for protecting the data of Facebook users is thus vast and heterogenous, and the responsibility for securing it lies with millions of app developers who have built on its platform.