Ruby tracing part two - rbtrace

February 4, 2017

In a previous post, I looked at how Linux’s eBPF framework can be used with Ruby.

So, it turns out there’s an excellent gem for Ruby tracing: rbtrace by Aman Gupta, which I encountered a few months ago, in Sam Saffron’s post when I needed to get a heap dump from a Ruby process to track down a memory leak.

rbtrace

Justin Weiss wrote up a summary with some cool examples of what the gem can do, but in a nutshell:

It allows the tracer (rbtrace script) being notified, when a tracee (Ruby process) executes a method.

Additionally, it allows the tracer to execute snippets of Ruby code in the tracee’s context and to fork the ruby process.

So, using these capabilities, it’s possible to get heap dumps, thread backtraces, profile slow functions (SQL queries, HTTP calls), and attach GDB safely, all in production!

But how does it work? And, if it is ‘safe for production’, what is its performance impact?

To understand them better, I tried to explore rbtrace’s and Ruby VM’s trace internals.

In this post, I’ll cover the how’s of both the C extension and Ruby parts, the communication between them, and look at a few interesting and sometimes undocumented Ruby APIs. Afterwards, I’ll present some benchmarks against a test application with and without rbtrace enabled.

Internals

rbtrace has two parts - a C extension for MRI, which registers a hook in the VM, and a command line tool that communicates with the extension and interrogates it for useful information.

Ruby library

The ’non-C part’ of rbtrace is quite simple, it consists mostly of a:

Command-line client, for doing the usual option-parsing dance
MsgQ class for inter-process communication using the queue. This class uses FFI to access underlying system functions.
The tracer that handles the communication between the VM and rbtrace, interpreting and printing incoming data.

The communication is handled in an interesting way:

The tracer asks the OS to set up a IPC message queue between the rbtrace and ruby, using PID of the Ruby process as the shared key. This queue then allows the tracer pass short commands (up to 120 or 256 characters, depending on the platform) to the tracee. It also sets up a socket, into which the tracee writes its output.

So the complete picture looks like:

rbtrace writes a command into an IPC message queue
It sends Ruby VM a SIGURG signal. The C-extension previously added a listener for this signal.
rbtrace starts waiting for output on the socket
Upon receiving the signal, the extension executes the command (registers a hook for profiling or evaluates Ruby code provided by the user).
As the Ruby VM is calling the hook, the extension writes out the resulting info into a socket.
rbtrace reads and parses this info, displays it on the console.

I haven’t seen SIGURG before - reading about it lead me to an article on Wikipedia about out-of-band communication. So it’s use as notification between these processes suddenly made sense.

C extension

It turns out, that the ‘C part’ - the native extension part of rbtrace, is not that complicated either. Most of the code is related to passing the data in an out of the native extension, and bookkeeping around traced method names and expressions.

There are three interesting bits:

signal handler, or rather the rb_postponed_job API.
Ruby’s event_hook API
hook that gets called on method/cmethod entry/return

First two are APIs for interaction with Ruby - these allow the extension to run code inside the VM. The last one is the hook that extracts info from the runtime context.

The rb_postponed_job API

During compilation, the extension tests for presence of rb_postponed_job and if not present, the implementation falls back to executing the incoming command immediately.

The postponed job API allows for running the registered function (a ‘job’), whenever the thread handles interrupts.

Defined in debug.h, the API looks like:

typedef void (*rb_postponed_job_func_t)(void *arg);
int rb_postponed_job_register(unsigned int flags, rb_postponed_job_func_t func, void *data);
int rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data);

The implementation is in vm_trace.c. But how do the jobs get executed? And when?

First part is the responsibility of rb_postponed_job_flush(rb_vm_t), which sets interrupt mask on the current thread POSTPONED_JOB_INTERRUPT_MASK|TRAP_INTERRUPT_MASK, walks the registered jobs and executes them, then restores the mask to previous state.

To find out when, we need to see how rb_postponed_job_flush is called. Grepping for its name and a bit of digging reveals rb_thread_execute_interrupts, which gets called on the VM receiving a signal, or during IO.

Now, I don’t know for certain, but it seems that the same signal that triggers reading of the incoming command by the C-extension, also triggers execution of the job?

rb_event_hook API

Apparently, it’s possible to register hooks to various events in the Ruby VM. These hooks then get automatically executed when the event occurs. rbtrace uses it to trace function calls:

static void event_hook_install()
{
  if (!rbtracer.installed) {
    rb_add_event_hook(
      event_hook,
      RUBY_EVENT_CALL   | RUBY_EVENT_C_CALL |
      RUBY_EVENT_RETURN | RUBY_EVENT_C_RETURN
#ifdef RUBY_VM
      , 0
#endif
    );
    rbtracer.installed = true;
  }
}

rbtrace hook

This is the actual rbtrace’s hook that gets called by Ruby VM on method entries and returns:

static void event_hook(rb_event_flag_t event, VALUE data, VALUE self, ID mid, VALUE klass)

It:

resolves the current context (method, class)
and based on the tracer options:
goes through the filters (tracers) to see if it should execute any of those
records method entry/exit (limited stack depth) and times them, prints out if they’re slower than threshold
walks through registered expressions and evaluates them in the context of the caller

The hook also supports several expressions passed via -m option, which are interesting for debugging:

method_name(self), which inspects the value of ‘self’, so e.g. tracing sleep(self) in a Puma app:

Kernel#sleep(self=#<Puma::ThreadPool::Reaper:0x00561287c2a098...

method(__source__) outputs the current source location:

Kernel#sleep(__source__="/home/tiny/.gem/ruby/2.4.0/gems/puma-3.6.2/lib/puma/thread_pool.rb:230") <1.001770>

method_name(@ivar_name) prints a value of an instance variable:

Kernel#sleep(@timeout=1) <1.001300>

any other expression gets evaluated in context of of current object, as can be seen []:

8< --- snip from rbtrace.c ---
snprintf(buffer, len+150, "(begin; ObjectSpace._id2ref(%ld).instance_eval{ %s  }; rescue Exception => e; e; end).inspect", NUM2LONG(rb_obj_id(self)), expr);
8< -- snip ---

So for example sleep(instance_variables) will output:

Kernel#sleep(instance_variables=[:@pool, :@timeout, :@running, :@thread]) <1.001604>

One thing that I was unable to get working was the GC tracing, as it relies on a patch that no longer applies on 2.4 Ruby. However, Ruby now has internal events related to GC, so maybe it should be possible to patch rbtrace to use them.

Performance and security

I was curious about the performance - besides security , it’s the most common argument for or against running tracing or debugging tools in a production environment. So I wrote a simple script using benchmark-ips, then traced the process with rbtrace -p PID -m 'Hash#key?', before sending it a signal, which started the benchmark:

require 'benchmark/ips'
require 'rbtrace'

def run
  n    = 1_000_000
  h    = Hash.new
  i    = 0

  keys = Array.new(n) { |i| i }
  keys.sample(n >> 1).each { |k| h[k] = true }

  Benchmark.ips do |x|
    x.report { h.key?(keys.sample) }
  end
end

go = false

Signal.trap('USR1') { go = true }

puts Process.pid
sleep 1 while !go
run

And the results - without require "rbtrace":

Warming up --------------------------------------
        150.284k i/100ms
Calculating -------------------------------------
        2.120M (± 5.6%) i/s -     10.670M in   5.049265s

With rbtrace required, untraced process:

Warming up --------------------------------------
        148.118k i/100ms
Calculating -------------------------------------
        2.177M (± 3.9%) i/s -     10.961M in   5.042478s

With rbtrace required, tracing Hash#key? (rbtrace -p PID -m 'Hash#key?' &>/dev/null):

Warming up --------------------------------------
        5.678k i/100ms
Calculating -------------------------------------
        62.708k (± 2.6%) i/s -    317.968k in   5.074310s

The first two measurements suggest, that including rbtrace does not add any overhead to the process. Which is exactly what we need - as long rbtrace is not used, it should not slow down the app.

The last one shows the process slow down about 30x during tracing. While this may seem like a lot, it only lasts as long as the rbtrace is attached. It is also proportional to how often the traced method (and the hook) gets invoked, so for tracing HTTP requests or SQL queries, the overhead should be negligible.

What about safety? I’m not a security expert, so I can’t really tell - caveat emptor! It seems to me that the attacker would have to already have access to the box (to send commands to the process via the message queue), or execute arbitrary methods, so the extension only opens channels that are already irrelevant in the case there is a vulnerability. I’d definitely love to hear from someone with security experience on this one, though…

Conclusion

rbtrace is a powerful tool for production Ruby debugging - be it memory leaks, multi-threaded programs, or slow external API calls. And its internals provide a look into how to attach a C extension into the heart of the Ruby VM - definitely something worth knowing about.

How do you use rbtrace? Are you solving Ruby performance issues in prod? Or just like to talk about Ruby? Shoot me a mail to (balazs at kutilovi dot cz), or a tweet! I’ll be happy to chat.