Diving into the internals of Gem packaging

Package managers are indispensable to most modern languages. They let developers create and distribute building blocks that can be combined to bootstrap complex web applications. In this post we will dive into the internals of Rubygems, the Ruby programming language’s package manager.

Any package manager has a simple mission: take code separated into a module and convert it into a format that can be easily distributed and installed in many different environments.

Anatomy of a gem

At the very minimum, a gem consists of a gemspec file and a Ruby file named after the package. By convention, this Ruby file is placed in a lib folder:

-- hola
   |
    -- hola.gemspec
   |
    -- lib
       |
        -- hola.rb
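To make the layout concrete, here is a small Ruby script that scaffolds it in a temporary directory. It is only a sketch: the gemspec fields, the author name "Jane Doe" and the body of hola.rb are hypothetical placeholders, not taken from any real gem.

```ruby
require "fileutils"
require "tmpdir"

# Create the minimal hola layout under +root+ and return the paths it
# contains, relative to the gem root. All file contents are placeholders.
def scaffold_hola(root)
  gem_root = File.join(root, "hola")
  FileUtils.mkdir_p(File.join(gem_root, "lib"))

  File.write(File.join(gem_root, "hola.gemspec"), <<~RUBY)
    Gem::Specification.new do |s|
      s.name    = "hola"
      s.version = "0.1.0"
      s.summary = "A minimal example gem"
      s.authors = ["Jane Doe"]
      s.files   = ["lib/hola.rb"]
    end
  RUBY

  File.write(File.join(gem_root, "lib", "hola.rb"), <<~RUBY)
    class Hola
      def self.hi
        "Hello world!"
      end
    end
  RUBY

  # Sorted so the listing is stable: the gemspec, the lib folder, lib/hola.rb.
  Dir.glob("**/*", base: gem_root).sort
end

Dir.mktmpdir do |dir|
  puts scaffold_hola(dir)
end
```

The generated hola.gemspec is a real, loadable gemspec, so Gem::Specification.load can read it back.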

The gemspec file lists the specifications for the gem. It must specify the gem's authors, files, name, summary and version; optionally, it can also declare runtime and development dependencies. Rubygems uses these specifications both when building and when installing the gem. Here is the gemspec for the rspec gem:

$LOAD_PATH.unshift File.expand_path("../lib", __FILE__)
require "rspec/version"

Gem::Specification.new do |s|
  s.name        = "rspec"
  s.version     = RSpec::Version::STRING
  s.platform    = Gem::Platform::RUBY
  s.license     = "MIT"
  s.authors     = ["Steven Baker", "David Chelimsky", "Myron Marston"]
  s.email       = "rspec@googlegroups.com"
  s.homepage    = "http://github.com/rspec"
  s.summary     = "rspec-#{RSpec::Version::STRING}"
  s.description = "BDD for Ruby"

  s.metadata = {
    'bug_tracker_uri'   => 'https://github.com/rspec/rspec/issues',
    'documentation_uri' => 'https://rspec.info/documentation/',
    'mailing_list_uri'  => 'https://groups.google.com/forum/#!forum/rspec',
    'source_code_uri'   => 'https://github.com/rspec/rspec',
  }

  s.files            = `git ls-files -- lib/*`.split("\n")
  s.files           += ["LICENSE.md"]
  s.test_files       = `git ls-files -- {spec,features}/*`.split("\n")
  s.executables      = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
  s.extra_rdoc_files = [ "README.md" ]
  s.rdoc_options     = ["--charset=UTF-8"]
  s.require_path     = "lib"
end
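By contrast, a gemspec for the hola gem from the layout above can get away with only the required attributes. The sketch below builds one in Ruby and asks Rubygems for a few values it derives from them; the version number, summary and author are hypothetical.

```ruby
require "rubygems"

# A minimal spec for the hypothetical hola gem, setting only the
# attributes a gemspec must specify.
spec = Gem::Specification.new do |s|
  s.name    = "hola"
  s.version = "0.1.0"
  s.summary = "A minimal example gem"
  s.authors = ["Jane Doe"]
  s.files   = ["lib/hola.rb"]
end

puts spec.full_name     # hola-0.1.0 (the "ruby" platform is left out of the name)
puts spec.file_name     # hola-0.1.0.gem, the name of the built package
p spec.require_paths    # ["lib"], the default when require_paths is not set
p spec.version.class    # Gem::Version, converted from the "0.1.0" string
```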

gem build

To convert the gem's codebase into a distributable package, Rubygems provides a build command. The build command takes a gemspec file as its argument (for example, gem build hola.gemspec) and uses its specifications to create a .gem file, which can then be distributed.
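The same build can be driven from Ruby through Gem::Package.build, which takes a Gem::Specification, writes the package to the current directory and returns its file name. The sketch below assumes the hypothetical hola gem and builds it inside a temporary directory; expect Rubygems to print its usual "Successfully built RubyGem" summary and possibly a validation warning such as a missing homepage.

```ruby
require "rubygems"
require "rubygems/package"
require "fileutils"
require "tmpdir"

# Write the hypothetical hola gem into +root+ and build it there.
# Gem::Package.build reads spec.files relative to the current directory,
# so the build runs inside +root+. Returns the absolute .gem path.
def build_hola_gem(root)
  FileUtils.mkdir_p(File.join(root, "lib"))
  File.write(File.join(root, "lib", "hola.rb"), "class Hola; end\n")

  spec = Gem::Specification.new do |s|
    s.name    = "hola"
    s.version = "0.1.0"
    s.summary = "A minimal example gem"
    s.authors = ["Jane Doe"]
    s.license = "MIT"
    s.files   = ["lib/hola.rb"]
  end

  Dir.chdir(root) do
    # Gem::Package.build returns the file name it wrote, e.g. "hola-0.1.0.gem".
    File.expand_path(Gem::Package.build(spec))
  end
end

Dir.mktmpdir do |dir|
  gem_path = build_hola_gem(dir)
  puts "#{File.basename(gem_path)}: #{File.size(gem_path)} bytes"
end
```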

The .gem file is simply a tarball that in turn contains three gzipped files: data.tar.gz, checksums.yaml.gz and metadata.gz. A cryptographically signed gem also contains corresponding .sig files holding the signatures.

-- hola.gem
   |
    -- checksums.yaml.gz
   |
    -- metadata.gz
   |
    -- data.tar.gz
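This nesting is easy to reproduce with Gem::Package::TarWriter directly. The sketch below is a simplified illustration, not what Gem::Package does internally: it writes an uncompressed outer tar holding metadata.gz and data.tar.gz for the hypothetical hola gem, skips checksums.yaml.gz and signing entirely, and then lists the outer entries with Gem::Package::TarReader. The helper names tar_bytes and gem_shaped_tar are made up for this example.

```ruby
require "rubygems"
require "rubygems/package"
require "stringio"
require "zlib"

# Pack a { "path" => "contents" } hash into an in-memory tar archive.
def tar_bytes(files)
  io = StringIO.new("".b)
  Gem::Package::TarWriter.new(io) do |tar|
    files.each do |name, contents|
      tar.add_file_simple(name, 0o644, contents.bytesize) { |f| f.write(contents) }
    end
  end # the block form writes the end-of-archive marker on exit
  io.string
end

# Simplified .gem layout: gzipped YAML metadata plus a gzipped tar of code.
# A real package also carries checksums.yaml.gz (and .sig files if signed).
def gem_shaped_tar(spec, sources)
  tar_bytes({
    "metadata.gz" => Zlib.gzip(spec.to_yaml),
    "data.tar.gz" => Zlib.gzip(tar_bytes(sources))
  })
end

spec = Gem::Specification.new do |s|
  s.name    = "hola"
  s.version = "0.1.0"
  s.summary = "A minimal example gem"
  s.authors = ["Jane Doe"]
  s.files   = ["lib/hola.rb"]
end

archive = gem_shaped_tar(spec, { "lib/hola.rb" => "class Hola; end\n" })

# List the top-level entries, much as tar -tf hola.gem would.
Gem::Package::TarReader.new(StringIO.new(archive)).each do |entry|
  puts entry.full_name
end
```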

Rubygems assembles the .gem archive with the Gem::Package::TarWriter class, using the information and files listed in the gemspec. The metadata file is a gzip-compressed YAML serialization of the gemspec; for rspec 3.9.0 it looks like this (abridged):

--- !ruby/object:Gem::Specification
name: rspec
version: !ruby/object:Gem::Version
  version: 3.9.0
platform: ruby
authors:
- Steven Baker
- David Chelimsky
- Myron Marston
    ~~~~~~~~~~~~~~~ removed for brevity ~~~~~~~~~~~~~~~~~~~~~~~~
dependencies:
- !ruby/object:Gem::Dependency
  name: rspec-core
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 3.9.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - "~>"
      - !ruby/object:Gem::Version
        version: 3.9.0
    ~~~~~~~~~~~~~~~ removed for brevity ~~~~~~~~~~~~~~~~~~~~~~~~
executables: []
extensions: []
extra_rdoc_files:
- README.md
files:
- LICENSE.md
- README.md
- lib/rspec.rb
- lib/rspec/version.rb
homepage: http://github.com/rspec
licenses:
- MIT
    ~~~~~~~~~~~~~~~ removed for brevity ~~~~~~~~~~~~~~~~~~~~~~~~
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubygems_version: 3.0.6
signing_key:
specification_version: 4
summary: rspec-3.9.0
test_files: []
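Because the metadata is just a YAML dump of a Gem::Specification, it can be turned back into a spec object. The round trip below uses Gem::Specification#to_yaml and Gem::Specification.from_yaml on the hypothetical hola gem; the rake dependency is made up purely to show how dependencies survive serialization.

```ruby
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name    = "hola"
  s.version = "0.1.0"
  s.summary = "A minimal example gem"
  s.authors = ["Jane Doe"]
  s.files   = ["lib/hola.rb"]
  s.add_dependency "rake", "~> 13.0" # hypothetical runtime dependency
end

# Serialize to the same YAML format shown above ...
yaml = spec.to_yaml
puts yaml.lines.first

# ... and parse it back into a Gem::Specification.
loaded = Gem::Specification.from_yaml(yaml)
dep = loaded.dependencies.first
puts "#{loaded.full_name} depends on #{dep.name} #{dep.requirement} (#{dep.type})"
```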

The data.tar.gz file is a gzipped tarball containing the gem's actual code, that is, the files listed in the gemspec's files attribute.

The checksums.yaml.gz file is a compressed YAML file containing SHA256 and SHA512 checksums of the metadata.gz and data.tar.gz files:

---
SHA256:
  metadata.gz: 717820f4463baa76607e57e500d14c680608fe6aac01405c7cfe6fd2dcd990db
  data.tar.gz: 919fc9aedde011882f1814d4d16cf92fdfedc728a979f0f814c819211787627f
SHA512:
  metadata.gz: c39a368fbab5da77ca12870485b0d7663fce1deb90b9528f57f695a8543525d61494ac55ffb4ffc7fc6a6c80c2ca2e5492499965bb26485f5c76a49916b699b7
  data.tar.gz: 90ee39bf3cb841049201bdec98d2d45dcdfd0d7c927566621ca7be5529a6c89c8b6b85cab37166e8005aeb44279d3fcf20001aa80cdf4f1d62cd92be391bea82
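Producing and checking a file in this shape needs only Digest, YAML and Zlib from the standard library. The sketch below is an independent illustration rather than Rubygems' own implementation; the two blobs are placeholder strings standing in for the real metadata.gz and data.tar.gz entries, and checksums_for and verify_checksums are hypothetical helper names.

```ruby
require "digest"
require "yaml"
require "zlib"

ALGORITHMS = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }.freeze

# Build { "SHA256" => { name => hex }, "SHA512" => { name => hex } }.
def checksums_for(blobs)
  ALGORITHMS.transform_values do |digest|
    blobs.transform_values { |bytes| digest.hexdigest(bytes) }
  end
end

# Recompute every digest and compare it with the recorded one.
def verify_checksums(blobs, checksums)
  checksums.all? do |algo, sums|
    sums.all? { |name, hex| ALGORITHMS.fetch(algo).hexdigest(blobs.fetch(name)) == hex }
  end
end

blobs = {
  "metadata.gz" => Zlib.gzip("placeholder metadata"),
  "data.tar.gz" => Zlib.gzip("placeholder data")
}

checksums = checksums_for(blobs)
checksums_gz = Zlib.gzip(checksums.to_yaml) # the bytes a checksums.yaml.gz would hold

puts Zlib.gunzip(checksums_gz)
puts verify_checksums(blobs, checksums)      # true: nothing was modified

tampered = blobs.merge("data.tar.gz" => Zlib.gzip("modified data"))
puts verify_checksums(tampered, checksums)   # false: data.tar.gz no longer matches
```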

gem install

When gem install is run, Rubygems fetches the .gem file from the configured gem repository (https://rubygems.org by default), reads it with the Gem::Package::TarReader class, and extracts the files into their proper location. That location depends on the OS and on the Ruby version manager in use (for example rbenv or rvm).
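The unpacking step on its own can be reproduced with Gem::Package#extract_files, which reads the outer tar and expands its data.tar.gz entry into a destination directory. This is a sketch of that single step, not a full gem install: it does not write the installed spec, generate executables or build extensions. The hola gem is hypothetical and is built on the fly so the example is self-contained, and the destination directory name is an arbitrary choice.

```ruby
require "rubygems"
require "rubygems/package"
require "fileutils"
require "tmpdir"

# Build the hypothetical hola gem inside +root+ and return the .gem path.
def build_sample_gem(root)
  FileUtils.mkdir_p(File.join(root, "lib"))
  File.write(File.join(root, "lib", "hola.rb"), "class Hola; end\n")
  spec = Gem::Specification.new do |s|
    s.name    = "hola"
    s.version = "0.1.0"
    s.summary = "A minimal example gem"
    s.authors = ["Jane Doe"]
    s.license = "MIT"
    s.files   = ["lib/hola.rb"]
  end
  Dir.chdir(root) { File.expand_path(Gem::Package.build(spec)) }
end

# Expand the data.tar.gz entry of +gem_path+ into +dest+ and return the
# extracted regular files, relative to +dest+.
def unpack_gem(gem_path, dest)
  FileUtils.mkdir_p(dest)
  Gem::Package.new(gem_path).extract_files(dest)
  Dir.glob("**/*", base: dest).select { |f| File.file?(File.join(dest, f)) }.sort
end

Dir.mktmpdir do |dir|
  gem_path = build_sample_gem(File.join(dir, "src"))
  dest = File.join(dir, "gems", "hola-0.1.0")
  puts unpack_gem(gem_path, dest)
end
```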

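Once a gem is installed, it can be loaded from any Ruby file with a require statement. Rubygems overrides Kernel#require: when the requested file cannot be found in $LOAD_PATH (the global list of directories Ruby searches for code), Rubygems looks up the installed gem that contains it and activates that gem, which adds the gem's require paths (usually its lib directory) to $LOAD_PATH before the file is loaded.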
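The $LOAD_PATH mechanism that activation relies on can be seen without any gem at all: require only finds files in directories listed in $LOAD_PATH, so prepending a directory makes its files requirable. The sketch below uses a hypothetical hola_demo.rb in a temporary directory; prepending that directory by hand stands in for what activation does with a gem's require paths.

```ruby
require "tmpdir"

# True if +feature+ can be required right now, false on LoadError.
def requirable?(feature)
  require feature
  true
rescue LoadError
  false
end

result = nil
Dir.mktmpdir do |lib_dir|
  File.write(File.join(lib_dir, "hola_demo.rb"), <<~RUBY)
    module HolaDemo
      def self.hi
        "hola"
      end
    end
  RUBY

  before = requirable?("hola_demo") # lib_dir is not in $LOAD_PATH yet
  $LOAD_PATH.unshift(lib_dir)       # stand-in for gem activation
  after = requirable?("hola_demo")  # now resolved through $LOAD_PATH
  $LOAD_PATH.delete(lib_dir)        # tidy up before the directory is removed
  result = [before, after]
end

puts "before: #{result[0]}, after: #{result[1]}, HolaDemo.hi => #{HolaDemo.hi}"
```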