Microsoft's Accidental Data Exposure on GitHub

Microsoft's AI researchers inadvertently exposed tens of terabytes of sensitive data, including private keys and passwords, while publishing open source training data on GitHub.

Cloud security startup Wiz told TechCrunch that it had found a GitHub repository, belonging to Microsoft's AI research division, that unintentionally exposed cloud-hosted data. The repository, which offered open source code and AI models for image recognition, directed users to download the models from an Azure Storage URL. Wiz found that this URL granted access to the entire storage account rather than just the models, leaving additional private data exposed.

The exposed data encompassed 38 terabytes, including two Microsoft employees' personal computer backups. This trove contained passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams conversations from several employees.

Wiz also found that the URL, which had been exposing this data since 2020, was misconfigured to grant "full control" rather than "read-only" permissions. This meant that anyone who knew where to look could have deleted, overwritten, or injected malicious content into the stored files.

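The error stemmed from an overly permissive shared access signature (SAS) token that Microsoft AI developers included in the URL. A SAS token is a signed set of URL parameters that lets users create shareable links to data in an Azure Storage account; the token itself determines which resources the link covers, which permissions it grants, and when it expires.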
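
To make the permission problem concrete, the sketch below parses the query string of a SAS URL and reports which permissions it grants and when it expires. It assumes the standard SAS query parameters `sp` (permissions) and `se` (expiry). The function name `inspect_sas_url`, the account name, the expiry date, and the placeholder signature are all hypothetical, and this is an illustrative check rather than the tooling Wiz or Microsoft actually used.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Permission letters that only allow reading or listing data.
READ_ONLY = set("rl")


def inspect_sas_url(url, now=None):
    """Summarize the permissions and expiry encoded in a SAS URL (sketch)."""
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlparse(url).query)

    # `sp` holds one letter per permission, e.g. r=read, w=write, d=delete.
    permissions = set(params.get("sp", [""])[0])

    # `se` holds the expiry time in ISO 8601 UTC form.
    expiry = None
    expiry_raw = params.get("se", [None])[0]
    if expiry_raw:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            # Date-only values carry no timezone; treat them as UTC.
            expiry = expiry.replace(tzinfo=timezone.utc)

    return {
        "permissions": "".join(sorted(permissions)),
        "read_only": bool(permissions) and permissions <= READ_ONLY,
        "expires": expiry,
        "days_until_expiry": (expiry - now).days if expiry else None,
    }


# Hypothetical URL: every value here is made up for illustration.
example_url = (
    "https://exampleaccount.blob.core.windows.net/models"
    "?sv=2020-08-04&sp=racwdl&se=2050-01-01T00:00:00Z&sig=PLACEHOLDER"
)
report = inspect_sas_url(
    example_url, now=datetime(2023, 9, 19, tzinfo=timezone.utc)
)
print(report)
```

A token meant only for downloading models would be expected to carry read (and perhaps list) permission and a short lifetime; a report showing write or delete letters and an expiry decades away is the kind of mismatch described above.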

Wiz's co-founder and CTO, Ami Luttwak, pointed out to TechCrunch the challenges of maintaining security with rapidly advancing AI technologies. He emphasized the importance of extra security measures, especially when vast amounts of data are at play.

After being informed by Wiz on June 22, Microsoft deactivated the SAS token by June 24 and finished assessing the organizational implications by August 16. In a subsequent statement, Microsoft confirmed that no customer data was jeopardized and other internal services remained safe.

In response to Wiz's research, Microsoft has expanded GitHub's secret scanning service. The service will now monitor changes to public open source code for exposed credentials and for SAS tokens with overly permissive privileges or expiry dates.
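
The general idea of scanning code for SAS tokens can be approximated with a simple pattern match. The sketch below is not GitHub's implementation, whose detection rules are not described here; the regular expression, the function name `find_sas_urls`, and the sample text are assumptions made for illustration. It flags Azure Storage URLs whose query string carries a `sig=` (signature) parameter.

```python
import re

# Hypothetical pattern: an Azure Storage host followed by a query string
# that contains a `sig=` parameter. Real scanners use stricter rules.
SAS_URL_PATTERN = re.compile(
    r"https://[a-z0-9]+\.(?:blob|file|queue|table|dfs)\.core\.windows\.net"
    r"/[^\s\"'?]*\?[^\s\"']*\bsig=[^\s\"'&]+",
    re.IGNORECASE,
)


def find_sas_urls(text):
    """Return (line_number, url) pairs for every SAS-like URL in the text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SAS_URL_PATTERN.finditer(line):
            hits.append((line_number, match.group(0)))
    return hits


# Made-up sample: the first URL is a plain link, the second embeds a token.
sample = """\
MODEL_URL = "https://exampleaccount.blob.core.windows.net/models/model.ckpt"
DATA_URL = "https://exampleaccount.blob.core.windows.net/models?sp=racwdl&se=2050-01-01&sig=PLACEHOLDER"
"""
hits = find_sas_urls(sample)
for line_number, url in hits:
    print(f"line {line_number}: possible SAS token in {url}")
```

A check like this only finds candidate tokens; deciding whether one is overly permissive still requires looking at the permissions and expiry it encodes.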

Posted: 2023-09-19
By: dwirch
