Reference: When is S3 data public?

Published: Saturday, 8 February 2025

reference
security
domains

SkySiege tests A-S3-4, A-S3-5, A-S3-6, A-S3-7, A-S3-8 and A-S3-9 all focus on Public Access Blocks and the quality of S3 Bucket Policies. The reasoning is simple: Public Access Blocks and Bucket Policies are front-line protections for ensuring that your data is not publicly available.

By publicly available we mean one of two definitions:

The technical definition, where data is available to any entity without any authentication or authorisation challenge
The practical definition, where there may be some authentication or authorisation challenges but they’re so trivial that the data is practically public. This can mean data gated behind a self-service login, or data that is weakly protected, such as with a weak password

For AWS S3 buckets, configuring the bucket’s data to be technically public will set off a number of AWS flags alerting you that your data is publicly available. This is easy to detect and is one of the first SkySiege Cloud Assessment checks.

However, access policies that are weak enough to make the data practically public are less obvious, and can occur even in AWS despite the depth of IAM and access control configuration available. We classify any data that is open to every logged-in AWS customer as practically public. The UpGuard Cyber Risk Team discovered an example of practically public data hosted by data analytics firm Alteryx, covering approximately 123 million US households.
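The two definitions can be sketched as a small classifier. The source does not say how the affected bucket was configured; this sketch assumes one mechanism that produces "any authenticated AWS customer" access, namely an S3 ACL grant to the predefined AuthenticatedUsers group. The function name `classify_acl_exposure` and the example ACL are hypothetical, and the input is assumed to have the shape of a boto3 `get_bucket_acl` response.

```python
# Predefined S3 ACL grantee groups (URIs as documented by AWS).
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def classify_acl_exposure(acl):
    """Return 'technically public', 'practically public' or 'restricted'.

    `acl` is assumed to look like a boto3 `get_bucket_acl` response:
    {"Grants": [{"Grantee": {"Type": ..., "URI": ...}, "Permission": ...}]}
    """
    # Collect the URIs of every group grantee in the ACL.
    group_uris = {
        grant.get("Grantee", {}).get("URI")
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("Type") == "Group"
    }
    if ALL_USERS in group_uris:
        # Anyone, with or without credentials.
        return "technically public"
    if AUTHENTICATED_USERS in group_uris:
        # Anyone holding credentials for any AWS account.
        return "practically public"
    return "restricted"


# Hypothetical ACL: the owner plus a READ grant to every authenticated AWS user.
example_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}

print(classify_acl_exposure(example_acl))
```

This only inspects ACL grants. A fuller check would also need to consider the bucket policy and object-level ACLs, which are outside this sketch.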

Concept

On October 6, 2017, the UpGuard Cyber Risk Team discovered an S3 bucket that, whilst not public, only required requests to come from an authenticated AWS account, meaning that the contents of the bucket were available to any authenticated AWS customer. This bucket contained a huge amount of sensitive information, including:

Alteryx software including various versions of the software and beta releases
Data from Alteryx partner Experian covering household preferences, behaviour and categorisation of US households
Data from the US Census Bureau sourced from the 2010 US Census

Accessing the data was simple: you authenticated with AWS as a valid AWS customer, through the browser or the AWS SDKs, and requested the contents of the exposed bucket, named alteryxdownload. No further authentication or authorisation was necessary.

Impact

The exposed datasets comprised sensitive personal information such as home addresses, contact information, financial histories and purchasing behaviours of millions of US households. The data is a goldmine for identity theft, fraud and targeted marketing.

Additionally, while the data impact takes centre stage in UpGuard’s report, Alteryx’s entire software stack was also accessible, providing access to Alteryx’s products, including beta versions.

Expansion

Whilst the US Census data is available from the US Census Bureau, Experian’s data was not publicly accessible. The scope for using this data to aid identity theft and fraud is substantial, but it could also be used to create shadow profiles of real people en masse, aiding the generation of fake identities that mirror real-life counterparts for purposes such as social media manipulation or brigading.

Alongside the data, the bucket contained Alteryx’s software releases, including versions labelled as beta. Depending on the software architecture, the beta versions could make it possible to gain access to proprietary code or other corporate knowledge that could be sold to competitors or used to create a fraudulent version for distribution in the wild. Beta versions are not usually obfuscated, nor do they usually carry the protective measures of production software.

A final question is whether the open bucket allowed uploads from logged-in AWS customers in addition to the read access that was discovered. If so, maliciously modified versions of Alteryx’s software could have been uploaded directly to the bucket and made available to all existing and new customers without immediate detection.
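Whether a broad grant is read-only or also writable can be checked in the same way. The following is a minimal sketch, again assuming the exposure comes from S3 ACL grants (the source does not confirm this) and that the input has the shape of a boto3 `get_bucket_acl` response. The function name `broad_write_grants` and both example ACLs are hypothetical.

```python
# Predefined S3 ACL grantee groups (URIs as documented by AWS).
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
BROAD_GROUPS = {ALL_USERS, AUTHENTICATED_USERS}

# ACL permissions that allow writing objects or rewriting the ACL itself.
WRITE_PERMISSIONS = {"WRITE", "WRITE_ACP", "FULL_CONTROL"}


def broad_write_grants(acl):
    """List (group URI, permission) pairs that let a broad group modify the bucket."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        permission = grant.get("Permission")
        if grantee.get("URI") in BROAD_GROUPS and permission in WRITE_PERMISSIONS:
            findings.append((grantee["URI"], permission))
    return findings


# Hypothetical ACLs: one read-only, one that also allows uploads.
read_only_acl = {
    "Grants": [
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ]
}
writable_acl = {
    "Grants": [
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "WRITE"},
    ]
}

print(broad_write_grants(read_only_acl))
print(broad_write_grants(writable_acl))
```

An empty result means no broad group can write through the ACL. It says nothing about write access granted through a bucket policy.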

Conclusion

In addition to the standard considerations around technically public data, it’s important to explicitly vet access policies to ensure that data is not also practically public. Whilst the alteryxdownload bucket wasn’t technically public, it was public enough to be effectively freely available.
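As a starting point for that vetting, a Public Access Block configuration can be checked for gaps. This is a minimal sketch rather than the SkySiege implementation: the function name `missing_public_access_blocks` and the example configuration are hypothetical, and the input is assumed to have the shape of the `PublicAccessBlockConfiguration` dictionary returned by boto3's `get_public_access_block`. AWS documents ACL grants to the AuthenticatedUsers group as public for the purposes of these settings, so they are relevant to practically public data as well.

```python
# The four S3 Block Public Access settings.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config):
    """Return the names of settings that are absent or not enabled."""
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]


# Hypothetical configuration with only two of the four settings enabled.
partial_config = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": False,
    "BlockPublicPolicy": True,
}

print(missing_public_access_blocks(partial_config))
```

An empty list means all four settings are enabled. Policies that grant access to specific but overly broad principals still need to be reviewed by hand.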

Another consideration is the potential blast radius of the breach. For unambitious threat actors, the data alone is valuable enough. A more motivated assailant, however, could poison the Alteryx codebase over a longer period of time, feed data directly to a competitor, maliciously destroy installations, or use installations as an attack vector to compromise Alteryx customers.