Ruby tracing part two - rbtrace
In a previous post, I looked at how Linux’s eBPF framework can be used with Ruby.
Of course TIMTOWTDI!
So, it turns out there’s an excellent gem for Ruby tracing: rbtrace by Aman Gupta, which I encountered a few months ago, in Sam Saffron’s post when I needed to get a heap dump from a Ruby process to track down a memory leak.
rbtrace
Justin Weiss wrote up a summary with some cool examples of what the gem can do, but in a nutshell:
It allows the tracer (rbtrace script) being notified, when a tracee (Ruby process) executes a method.
Additionally, it allows the tracer to execute snippets of Ruby code in
the tracee’s context and to fork the ruby
process.
So, using these capabilities, it’s possible to get heap dumps, thread backtraces, profile slow functions (SQL queries, HTTP calls), and attach GDB safely, all in production!
But how does it work? And, if it is ‘safe for production’, what is its performance impact?
To understand them better, I tried to explore rbtrace
’s and Ruby VM’s trace internals.
In this post, I’ll cover the how’s of both the C extension and Ruby
parts, the communication between them, and look at a few interesting
and sometimes undocumented Ruby APIs. Afterwards, I’ll present some
benchmarks against a test application with and without rbtrace
enabled.
Internals
rbtrace
has two parts - a C extension for MRI, which registers a hook in the
VM, and a command line tool that communicates with the extension and
interrogates it for useful information.
Ruby library
The ’non-C part’ of rbtrace
is quite simple, it consists mostly of a:
- Command-line client, for doing the usual option-parsing dance
- MsgQ class for inter-process communication using the queue. This class uses FFI to access underlying system functions.
- The tracer that handles the communication between the VM and
rbtrace
, interpreting and printing incoming data.
The communication is handled in an interesting way:
The tracer asks the OS to set up a IPC message queue
between the rbtrace
and ruby
, using PID of the Ruby process as the shared
key. This queue then allows the tracer pass short commands (up to 120 or 256
characters, depending on the platform) to the tracee. It also sets up a
socket,
into which the tracee writes its output.
So the complete picture looks like:
rbtrace
writes a command into an IPC message queue- It sends Ruby VM a SIGURG signal. The C-extension previously added a listener for this signal.
rbtrace
starts waiting for output on the socket- Upon receiving the signal, the extension executes the command (registers a hook for profiling or evaluates Ruby code provided by the user).
- As the Ruby VM is calling the hook, the extension writes out the resulting info into a socket.
rbtrace
reads and parses this info, displays it on the console.
I haven’t seen SIGURG
before - reading about it lead me to an
article on Wikipedia about out-of-band communication. So it’s use as
notification between these processes suddenly made sense.
C extension
It turns out, that the ‘C part’ - the native extension part of rbtrace
, is not that
complicated either. Most of the code is related to passing the data in an out
of the native extension, and bookkeeping around traced method names and
expressions.
There are three interesting bits:
- signal handler, or rather the
rb_postponed_job
API. - Ruby’s
event_hook
API - hook that gets called on method/cmethod entry/return
First two are APIs for interaction with Ruby - these allow the extension to run code inside the VM. The last one is the hook that extracts info from the runtime context.
The rb_postponed_job API
During compilation, the extension tests for presence of
rb_postponed_job
and if not present, the implementation falls back to
executing the incoming command immediately.
The postponed job API allows for running the registered function (a ‘job’), whenever the thread handles interrupts.
Defined in debug.h, the API looks like:
typedef void (*rb_postponed_job_func_t)(void *arg);
int rb_postponed_job_register(unsigned int flags, rb_postponed_job_func_t func, void *data);
int rb_postponed_job_register_one(unsigned int flags, rb_postponed_job_func_t func, void *data);
The implementation is in vm_trace.c. But how do the jobs get executed? And when?
First part is the responsibility of
rb_postponed_job_flush(rb_vm_t)
, which sets interrupt mask on the
current thread POSTPONED_JOB_INTERRUPT_MASK|TRAP_INTERRUPT_MASK
, walks
the registered jobs and executes them, then restores the mask to
previous state.
To find out when, we need to see how rb_postponed_job_flush
is called.
Grepping for its name and a bit of digging reveals rb_thread_execute_interrupts
,
which gets called on the VM receiving a
signal, or
during IO.
Now, I don’t know for certain, but it seems that the same signal that triggers reading of the incoming command by the C-extension, also triggers execution of the job?
rb_event_hook API
Apparently, it’s possible to register hooks to various events in the
Ruby VM. These hooks then get automatically executed when the event
occurs. rbtrace
uses it to trace function calls:
static void event_hook_install()
{
if (!rbtracer.installed) {
rb_add_event_hook(
event_hook,
RUBY_EVENT_CALL | RUBY_EVENT_C_CALL |
RUBY_EVENT_RETURN | RUBY_EVENT_C_RETURN
#ifdef RUBY_VM
, 0
#endif
);
rbtracer.installed = true;
}
}
rbtrace hook
This is the actual rbtrace
’s hook that gets called by Ruby VM on method entries and
returns:
static void event_hook(rb_event_flag_t event, VALUE data, VALUE self, ID mid, VALUE klass)
It:
- resolves the current context (method, class)
- and based on the tracer options:
- goes through the filters (tracers) to see if it should execute any of those
- records method entry/exit (limited stack depth) and times them, prints out if they’re slower than threshold
- walks through registered expressions and evaluates them in the context of the caller
The hook also supports several expressions passed via -m
option, which
are interesting for debugging:
method_name(self)
, which inspects the value of ‘self’, so e.g. tracing sleep(self)
in a Puma app:
Kernel#sleep(self=#<Puma::ThreadPool::Reaper:0x00561287c2a098...
method(__source__)
outputs the current source location:
Kernel#sleep(__source__="/home/tiny/.gem/ruby/2.4.0/gems/puma-3.6.2/lib/puma/thread_pool.rb:230") <1.001770>
method_name(@ivar_name)
prints a value of an instance variable:
Kernel#sleep(@timeout=1) <1.001300>
any other expression gets evaluated in context of of current object, as can be seen []:
8< --- snip from rbtrace.c ---
snprintf(buffer, len+150, "(begin; ObjectSpace._id2ref(%ld).instance_eval{ %s }; rescue Exception => e; e; end).inspect", NUM2LONG(rb_obj_id(self)), expr);
8< -- snip ---
So for example sleep(instance_variables)
will output:
Kernel#sleep(instance_variables=[:@pool, :@timeout, :@running, :@thread]) <1.001604>
One thing that I was unable to get working was the GC tracing, as it relies
on a patch that no longer applies on 2.4 Ruby.
However, Ruby now has internal events related to GC, so maybe it
should be possible to patch rbtrace
to use them.
Performance and security
I was curious about the performance - besides security , it’s the most common
argument for or against running tracing or debugging tools in a production
environment. So I wrote a simple script using
benchmark-ips, then traced the
process with rbtrace -p PID -m 'Hash#key?'
, before sending it a signal, which
started the benchmark:
require 'benchmark/ips'
require 'rbtrace'
def run
n = 1_000_000
h = Hash.new
i = 0
keys = Array.new(n) { |i| i }
keys.sample(n >> 1).each { |k| h[k] = true }
Benchmark.ips do |x|
x.report { h.key?(keys.sample) }
end
end
go = false
Signal.trap('USR1') { go = true }
puts Process.pid
sleep 1 while !go
run
And the results - without require "rbtrace"
:
Warming up --------------------------------------
150.284k i/100ms
Calculating -------------------------------------
2.120M (± 5.6%) i/s - 10.670M in 5.049265s
With rbtrace
required, untraced process:
Warming up --------------------------------------
148.118k i/100ms
Calculating -------------------------------------
2.177M (± 3.9%) i/s - 10.961M in 5.042478s
With rbtrace
required, tracing Hash#key?
(rbtrace -p PID -m 'Hash#key?' &>/dev/null
):
Warming up --------------------------------------
5.678k i/100ms
Calculating -------------------------------------
62.708k (± 2.6%) i/s - 317.968k in 5.074310s
The first two measurements suggest, that including rbtrace
does not add any
overhead to the process. Which is exactly what we need - as long rbtrace
is
not used, it should not slow down the app.
The last one shows the process slow down about 30x during tracing. While this
may seem like a lot, it only lasts as long as the rbtrace
is attached. It
is also proportional to how often the traced method (and the hook) gets
invoked, so for tracing HTTP
requests or SQL
queries, the overhead should
be negligible.
What about safety? I’m not a security expert, so I can’t really tell - caveat emptor! It seems to me that the attacker would have to already have access to the box (to send commands to the process via the message queue), or execute arbitrary methods, so the extension only opens channels that are already irrelevant in the case there is a vulnerability. I’d definitely love to hear from someone with security experience on this one, though…
Conclusion
rbtrace
is a powerful tool for production Ruby debugging - be it memory
leaks, multi-threaded programs, or slow external API calls. And its internals
provide a look into how to attach a C extension into the heart of
the Ruby VM - definitely something worth knowing about.
How do you use rbtrace
? Are you solving Ruby performance issues in prod? Or
just like to talk about Ruby? Shoot me a mail to (balazs at kutilovi dot cz),
or a tweet! I’ll be happy to chat.