RubyGems.org Vulnerability Explained

After evaluating Gemfury’s processing of RubyGems, we feel it is important to share our understanding and bring awareness to possible security issues when parsing untrusted YAML input.

On January 30, 2013, the community package server RubyGems.org was compromised through an arbitrary code execution vulnerability. The all-volunteer team sprang into action and, over the following 53 hours, yanked the exploit, patched the vulnerability, verified all existing gems, and migrated the service to AWS. As of today, the service has been restored and deemed safe for use.

Important: This vulnerability came from misuse of a standard YAML library and might not be specific to just RubyGems.org. Many applications depend on this library and are potentially vulnerable to a similar exploit if exposed to untrusted YAML input – please take this opportunity to audit and secure your own applications.

Quick review of RubyGem structure

RubyGems are used to encapsulate, package, and share Ruby code. A Gem built with gem build is nothing more than a tar archive that wraps two compressed files:

$ tar -tf rails-3.2.11.gem
data.tar.gz
metadata.gz

The data.tar.gz archive contains all packaged files that the author has chosen to distribute. A list of these files is specified in the original gemspec.

The metadata.gz file is a compressed YAML.dump serialization of the Gem::Specification object that is defined by the above-mentioned gemspec. This specification contains the name, version, author, file list, dependencies, and other important information about the Gem.
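To make the metadata format concrete, here is a minimal sketch that builds a specification in memory and serializes it the same way. The gem name, author, and file list are made up for illustration, and Zlib.gzip assumes a reasonably recent Ruby; this is not how gem build is implemented internally.

```ruby
require "rubygems"
require "yaml"
require "zlib"

# Hypothetical gem, defined only for this illustration.
spec = Gem::Specification.new do |s|
  s.name    = "example_gem"
  s.version = "0.0.1"
  s.summary = "Illustration only"
  s.authors = ["Example Author"]
  s.files   = ["lib/example_gem.rb"]
end

# Serialize the specification to YAML, then gzip it, which is roughly
# what ends up in metadata.gz.
metadata_yaml = spec.to_yaml
metadata_gz   = Zlib.gzip(metadata_yaml)

# Decompress again and show the first line of the document.
puts Zlib.gunzip(metadata_gz).lines.first
```

The first line of the document carries a !ruby/object:Gem::Specification tag. That tag is what tells the YAML loader to build a Ruby object rather than a plain hash.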

Uploading to RubyGems.org

When a Gem is uploaded to RubyGems.org or Gemfury, the server extracts the contents of metadata.gz and uses this to index the Gem. The extracted data is used on the Gem information page and, more importantly, in the backend indexes queried by gem install and Bundler when a developer installs that Gem.

The vulnerability

Before the discovery of this exploit, RubyGems.org loaded the content of metadata.gz by calling YAML.load, which is part of the Ruby standard library.

A powerful feature of the Ruby YAML library is the ability to serialize and deserialize Ruby objects. For example, when YAML.load was called on the Gem metadata, the returned object was a Gem::Specification instance and not one of the basic types.

This feature was used to compromise RubyGems.org – the exploit was an uploaded gem with a well-crafted metadata.gz file that instantiated an object that could and did execute arbitrary Ruby code.

The Ruby YAML library has a number of ways to deserialize Ruby objects. One of them is designed specifically for subclasses of Hash and takes the following form in the YAML file:

--- !ruby/hash:MyHashClass
Hello: World
Foo: Bar

When the parser encounters this input, it creates a new instance of MyHashClass and calls the []= method for each listed key/value pair. It does so without verifying that MyHashClass is actually a subclass of Hash.
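The behavior is easy to observe with a harmless class. The sketch below uses a hypothetical KeyRecorder class that does not inherit from Hash and only records the calls it receives. Newer Psych releases make YAML.load safe by default, so the sketch falls back to YAML.unsafe_load where that method exists; the exact behavior may differ between Psych versions.

```ruby
require "yaml"

# Hypothetical class for illustration: NOT a subclass of Hash.
class KeyRecorder
  attr_reader :calls

  def []=(key, value)
    # The loader allocates the object without calling initialize,
    # so the array is created lazily here.
    (@calls ||= []) << [key, value]
  end
end

doc = <<~DOC
  --- !ruby/hash:KeyRecorder
  Hello: World
  Foo: Bar
DOC

# Use the unrestricted loader explicitly when the installed Psych has one.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
obj = YAML.public_send(loader, doc)

p obj.class
p obj.calls
```

The loaded object is a KeyRecorder, and its []= method has been invoked once per key/value pair, with data that came entirely from the YAML document.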

So, to execute arbitrary code, an attacker only has to find an existing class that calls eval on either argument of the []= method. Unfortunately, the class used in this exploit is included in every Ruby on Rails application as part of Action Pack’s routing.

If you trace the []= method of NamedRouteCollection, you will find that it inserts the content of the first argument into a module_eval block, thus executing rogue code.
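The underlying pattern can be shown with a toy class. ToyHelperCollection below is a hypothetical stand-in written for this post, not the Rails source; it only mimics the idea of generating helper methods by interpolating a key into a string of Ruby code. The "injected" code just appends a symbol to a global array.

```ruby
$toy_log = []

# Hypothetical stand-in for a collection that generates helper methods.
class ToyHelperCollection
  def initialize
    @helpers = Module.new
  end

  def []=(name, _route)
    # Dangerous pattern: the key becomes part of the evaluated source.
    @helpers.module_eval "def #{name}_path; :ok; end"
  end
end

routes = ToyHelperCollection.new

# Intended use: defines a users_path helper.
routes["users"] = :users_route

# A crafted key closes the method definition early and smuggles in
# an extra statement before opening a new definition.
routes["x; end; $toy_log << :injected; def y"] = :anything

p $toy_log
```

Any class with a []= method of this shape becomes a code execution gadget once a YAML loader is willing to call []= on it with attacker-chosen keys.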

Assessment

Please evaluate whether your application loads YAML input from an untrusted source anywhere. A good way to catch this is to stub the YAML.load method after all your configuration files are loaded and then re-run your test suite.
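One way to set up such a stub is sketched below. The module and error names are hypothetical, and where exactly to install the tripwire (for example, a test helper that runs after your configuration is loaded) depends on your application.

```ruby
require "yaml"

# Hypothetical error class, used to make tripwire hits easy to spot.
class UntrustedYamlLoad < StandardError; end

# Hypothetical module that replaces YAML.load with a tripwire.
module YamlLoadTripwire
  def load(*_args, **_opts)
    raise UntrustedYamlLoad, "YAML.load called from #{caller(1, 1).first}"
  end
end

# Load trusted configuration files BEFORE this line; every later
# call to YAML.load will raise and point at the calling line.
YAML.singleton_class.prepend(YamlLoadTripwire)
```

Each failure in the test run then identifies a call site to review. Calls that only ever see trusted input can be left alone; the rest are candidates for the mitigations below.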

Mitigation

If your application is supposed to process untrusted YAML input, I recommend two possible solutions:

If your input is expected to contain only basic types, without any Ruby objects, I recommend looking at safe_yaml, which disables non-basic types for both the Syck and Psych parsers.

Using only basic types should be the standard approach of serializing to YAML. It is not a good practice to expose internal details of your application (like class names) outside of a trusted environment.
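As a related sketch: later versions of Psych ship a YAML.safe_load method in the standard library that follows the same idea. It is not the safe_yaml gem mentioned above, and it assumes a Ruby recent enough to include it. The class name in the second document is made up.

```ruby
require "yaml"

# Basic types load as plain hashes, arrays, strings, and numbers.
plain = YAML.safe_load("name: example\nversions: [1, 2, 3]\n")
p plain

# A document that asks for a Ruby class is rejected before any
# object is created. SomeAppClass is a hypothetical name.
begin
  YAML.safe_load("--- !ruby/hash:SomeAppClass\nHello: World\n")
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.message}"
end
```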

However, if your input is expected to contain certain Ruby classes, as is the case for RubyGems.org, then you should customize the behavior of Psych to instantiate only a whitelisted set of classes. Also, audit and/or stub the following methods for each of the whitelisted classes:

def []=(k, v) end
def init_with(v) end
def yaml_initialize(k, v) end
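A whitelist can be sketched with the permitted_classes option of YAML.safe_load, assuming a Psych version that supports that keyword. This is one possible approach and not necessarily how RubyGems.org was patched. Point is a hypothetical application class.

```ruby
require "yaml"

# Hypothetical application class that we are willing to deserialize.
class Point
  attr_accessor :x, :y
end

doc = "--- !ruby/object:Point\nx: 1\ny: 2\n"

# Only classes named in permitted_classes may be instantiated.
point = YAML.safe_load(doc, permitted_classes: [Point])
p point.x
p point.y

# Without the whitelist entry, the same document is refused.
begin
  YAML.safe_load(doc)
rescue Psych::DisallowedClass => e
  puts "rejected: #{e.message}"
end
```

A whitelist narrows the attack surface but does not remove the need for the audit above: the methods the loader calls on a permitted class still run with attacker-supplied data.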
