The danger of Cloud services like GitHub or Amazon S3

Date: February 08, 2018

Publication: Article

In February 2018, Uber's CISO testified before a U.S. Senate committee about the massive data leak that occurred at Uber in 2016 (and that was disclosed in late 2017). An article has been published on this, and the full testimony is also available on the U.S. Senate website.

He explained how the Uber data theft happened:

  • The hackers obtained the login and password for a private GitHub account used by Uber to host some of its development projects. It is not known how this information was obtained; it could have been, for example, through a brute-force attack (weak password) or through phishing. Uber has since implemented two-factor authentication on its GitHub accounts.
  • While exploring the GitHub content, the hackers found, in Uber's source code, authentication tokens giving access to Amazon S3 "Buckets" created by Uber to store backup data.
  • The Amazon S3 access gained that way allowed the hackers to steal Uber's data.

Note: "Buckets" are storage spaces accessible via web services, marketed in the AWS (Amazon Web Services) offer under the name Amazon S3 (Simple Storage Service).

This attack scheme is becoming common, and it is one of the points of vigilance that we identified in our 2017 review of major vulnerabilities and attacks:

  • Companies are increasingly building cloud-based solutions (for example via Amazon AWS or Microsoft Azure) and are also using cloud-based collaborative tools (such as GitHub) to develop their IT projects.
  • If these cloud-based solutions are not properly protected, they can easily be hacked.


The quest for poorly protected Cloud spaces is becoming more and more active:

Attackers have become aware of these weaknesses, and this has led to an increase in this type of attack. GitHub and S3 Buckets are the most exposed targets, and there are more and more tools to automate the search for vulnerable targets. For example:

  • Gitleaks: a tool (written in Go) that searches a GitHub repository for AWS access tokens (API keys) and other secrets.
  • Slurp, S3Scanner or AWSBucketDump: tools to search for S3 buckets and test if they are weakly protected.
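
The kind of search these tools automate can be illustrated with a minimal Python sketch. It assumes that long-term AWS access key IDs are 20 characters long and start with "AKIA"; the function name `find_aws_key_ids` and the sample text are hypothetical, and real scanners such as Gitleaks rely on a much broader rule set than this single pattern.

```python
import re

# Assumption: long-term AWS access key IDs are 20 characters and start with "AKIA".
# This is a single illustrative rule, not the rule set of any real scanner.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")


def find_aws_key_ids(text):
    """Return (line_number, key_id) pairs for every candidate key ID in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((number, match.group(1)))
    return hits


# Hypothetical file content; the key is the placeholder used in AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_aws_key_ids(sample))
```

Running such a check on a repository before each commit is one way for a development team to catch a hard-coded key before it reaches GitHub.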

Finding a list of the S3 Buckets that exist at Amazon is the main challenge for attackers, as Buckets are identified by names chosen by their creators and there is no directory. So the hacker must guess these names at random, or perform searches to discover the Buckets that really exist.
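
To give an idea of what "guessing names" means in practice, here is a minimal Python sketch that a company could use to list the names an attacker would plausibly try against it. The company name, the suffixes and the function `candidate_bucket_names` are hypothetical, and the regular expression only covers an assumed subset of the S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import re
from itertools import product

# Assumed subset of the S3 bucket naming rules (see the lead-in above).
VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def candidate_bucket_names(company, suffixes, separators=("-", ".", "")):
    """Combine a company name with common suffixes and keep the valid names."""
    names = set()
    for separator, suffix in product(separators, suffixes):
        name = f"{company}{separator}{suffix}".lower()
        if VALID_NAME.match(name):
            names.add(name)
    return sorted(names)


# Hypothetical inputs, for illustration only.
for name in candidate_bucket_names("examplecorp", ["backup", "dev"]):
    print(name)
```

The list grows quickly with the number of suffixes and separators, which is why this search is usually automated.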

Because of this, there would be a boom in attack attempts if a search engine listing existing Buckets became available on the Internet. There have been at least two recent attempts in this area.


How many sites are really badly protected?

HTTPCS has published an interesting study on this topic. According to the tests it performed:

  • 5.8% of the 100,000 Buckets tested contained data that can be read by everyone (no login).
  • 2% are not even write-protected and allow file upload or overwrite.
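
A read test of this kind can be sketched in Python as follows. This is not the method used by HTTPCS, which the study summary above does not describe. It assumes that an anonymous request to the URL `https://<bucket>.s3.amazonaws.com/` typically answers 200 when the content listing is public, 403 when the Bucket exists but is protected, and 404 when no such Bucket exists; other answers (redirections, for example) are reported as unknown. The function names are hypothetical, and the write test is deliberately left out because it would modify the target.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map the HTTP status of an anonymous request to a rough verdict."""
    if status == 200:
        return "public-read"      # listing readable by everyone, no login
    if status == 403:
        return "exists-private"   # the bucket exists but refuses anonymous access
    if status == 404:
        return "no-such-bucket"
    return "unknown"


def anonymous_status(bucket, opener=urllib.request.urlopen):
    """Return the HTTP status of an unauthenticated GET on the bucket URL."""
    url = f"https://{bucket}.s3.amazonaws.com/"  # assumed URL form
    try:
        with opener(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


# Usage, to be run only against buckets you own:
# print(classify_status(anonymous_status("name-of-your-own-bucket")))
```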

The 5.8% figure is not very high, especially as the exposed data is not always sensitive, but it definitely demonstrates some negligence in data protection practices. It would be useful to also have figures on the extent of AWS token leaks (leaks via GitHub for example) to better measure the risk in the case of combined attacks (GitHub + S3). The Gitleaks tool mentioned above provides in its README file a reference to a publication which could cover this topic.

Amazon is obviously aware of this negligence:

  • Since November 2017, it displays warning messages in the Amazon S3 administration interface if the bucket is "public" (readable by all).
  • Since February 2018, it has given free access to its "Trusted Advisor S3 Bucket Permissions Check" tool.
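
A similar check can be made by the Bucket owner on its own access control lists. The Python sketch below is not Amazon's tool: it simply looks, in an ACL description, for permissions granted to the predefined "AllUsers" and "AuthenticatedUsers" groups. The dictionary layout (`Grants`, `Grantee`, `Type`, `URI`, `Permission`) is assumed to mirror what the boto3 `get_bucket_acl` call returns, and the function name and sample data are hypothetical.

```python
# Predefined Amazon S3 groups that make a grant effectively public.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"


def public_grants(acl):
    """Return (group, permission) pairs for grants given to the public groups."""
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        uri = grantee.get("URI")
        if grantee.get("Type") == "Group" and uri in (ALL_USERS, AUTHENTICATED_USERS):
            group = uri.rsplit("/", 1)[-1]  # keep only the group name
            found.append((group, grant.get("Permission")))
    return found


# Hypothetical ACL, shaped like the assumed API response.
sample_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": ALL_USERS}, "Permission": "READ"},
    ]
}
print(public_grants(sample_acl))
```

An empty result does not prove that a Bucket is private, since access can also be opened through a bucket policy, which this sketch does not examine.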



Of course, these problems are not limited to Amazon products; they also exist for other similar offers, such as Microsoft Azure, Dropbox, Google Cloud, etc.

The use of Cloud solutions exposes companies to new risks, or rather exposes to the Internet risks that sometimes already exist with the company's internal repositories. For example, we know that maintaining proper access rights on a shared document repository is a tough task which is difficult to sustain over the long term without strict organization. When the document repository is accessible only within the company, the data leak risk is limited to this perimeter. In the case of an outsourced solution (via the cloud), there is no longer any perimeter limit.

Finally, it should also be noted that most often these leaks are not really due to negligence, but rather to a lack of awareness about the possible attacks.

