So, for those of you who haven't heard, DTrace is a tool that's been around since late 2003, but has recently started gaining popularity.  It's an end-to-end instrumentation framework for system debugging and performance analysis, which gives developers a complete picture of the environment in which an application is running, from the kernel all the way up to individual library function calls.

To utilize it, you start with an OS kernel that has DTrace support.  At the moment, that means Solaris, FreeBSD, NetBSD, and Mac OS X, with an experimental Linux port well underway.

Then, you get your compiled / interpreted languages and dependent libraries with DTrace support.  It's available for just about every language used in client apps and server-side scripting today, with the merciful exception of that awful "dot" stuff.  There are also quite a few full application suites with end-to-end support, notably Firefox.

Once you're working with a complete toolset, you use the dtrace command to load scripts (written in the 'D' programming language, hence the name) that describe the things you want to know and, optionally, what action(s) to take based on that information.

For example, you might ask DTrace, "which of my application's threads take the most CPU time to execute, and within those threads, which syscall in particular consumes the most time, and what's blocking that syscall so that it takes so long?"  Such information is immensely useful: it can tell you about e.g. deadlocks between otherwise-independent threads in your application that want disk access at the same time, or resource conflicts between your application itself and other tools on which it depends.  It's also virtually impossible to obtain with conventional debugging and tracing tools.
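
To make that a bit more concrete, here's a minimal sketch of a D script that answers a cut-down version of that question: total time per thread and per syscall for a single process.  The file name syscall_time.d and the TARGET_PID variable are just placeholders for the example, not anything DTrace defines, and the script measures elapsed wall-clock time (timestamp) rather than strictly on-CPU time.

```sh
# Write the D script to a file (syscall_time.d is an arbitrary name).
cat > syscall_time.d <<'EOF'
/* Record when the traced process enters any syscall. */
syscall:::entry
/pid == $target/
{
    self->ts = timestamp;
}

/* On return, add the elapsed nanoseconds to a per-thread, per-syscall sum. */
syscall:::return
/self->ts/
{
    @elapsed[tid, probefunc] = sum(timestamp - self->ts);
    self->ts = 0;
}
EOF

# Attach to a running process if a pid was supplied (hypothetical TARGET_PID variable).
# Needs root and a DTrace-enabled kernel; Ctrl-C prints the aggregation.
if [ -n "${TARGET_PID:-}" ] && command -v dtrace >/dev/null 2>&1; then
    sudo dtrace -s syscall_time.d -p "$TARGET_PID"
fi
```

The aggregation is keyed on thread id and syscall name, so the biggest offenders end up at the bottom of the output.  Working out what a slow syscall is actually blocked on takes more probes than this, but it's a reasonable starting point.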

So, just how usable is DTrace?  Since I'm an avid FreeBSD user myself, and Erlang is my poison of choice for application development, I decided to give that combo a try.

First off, you need to have a FreeBSD kernel with DTrace support.  DTrace itself has been in the tree since 7.1, and support has been built into GENERIC kernels on Tier 1 platforms since 9.2 and 10.0, iirc.  If you're on a custom kernel, instructions can be found here.  It's worth noting that on -CURRENT, as usual, your mileage will almost certainly vary (and in odd and confusing ways).  Once you've got an appropriately-configured kernel, you still need to actually load the DTrace modules.
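
Loading the modules usually comes down to a single kldload.  Here's a minimal sketch, assuming sudo is set up and that you want everything at once (dtraceall pulls in the individual provider modules as dependencies); the uname guard is only there so the snippet does nothing on other systems.

```sh
# dtraceall depends on every DTrace provider module, so loading it loads them all.
module=dtraceall

# kldstat and kldload only exist on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # Load the module only if it is not already listed by kldstat.
    kldstat | grep -q "$module" || sudo kldload "$module"
    # Sanity check: list the first few probes the kernel now exposes.
    sudo dtrace -l | head
fi
```

If dtrace -l prints a table of providers and probe names, the kernel side is ready to go.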

Second, you need an application and/or language suite with DTrace support.  Erlang straight from ports has this capability, but it's not enabled in the default build (so ditto for binary pkgs).  Grab an up-to-date ports tree, run cd /usr/ports/lang/erlang; sudo make config, and turn on DTrace.  Build and install the port with the usual process.  If you're going with a language suite here, then you also need an application to run with it.  I just used my own (very immature, pre-alpha) EMiLE project as a crash-test dummy.
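
Once the port is rebuilt, it's worth checking that the VM actually picked up the option.  One way to do that, assuming erl is on your PATH, is to ask the runtime via erlang:system_info(dynamic_trace); as far as I know it reports dtrace on a DTrace-enabled build and none otherwise.

```sh
# Erlang expression that prints the tracing framework the VM was built with,
# then shuts the VM down again.
check='io:format("~p~n", [erlang:system_info(dynamic_trace)]), halt().'

# Only run it when an Erlang installation is actually present.
if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval "$check"
fi
```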

Finally, you need some probes to actually run.  There are plenty of demo scripts in /usr/share/dtrace for 10.0 and later.  If you're running something earlier, you can get them with the sysutils/DTraceToolkit port.  I also found some demo goodies just for Erlang in this presentation.

Once you've got all of those things, you fire up your application, and run dtrace(1) with the appropriate arguments, either giving your D commands directly on the command line (-n) or pointing it at your script file (-s).  The results vary by your target, but are generally pretty amazing.  I discovered several resource contention issues and corrected them in about fifteen minutes thanks to DTrace, whereas I probably would have otherwise spent weeks hunting them down (if I found them at all).
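
As a rough illustration of the script-file route, the sketch below counts Erlang process spawns and message sends for ten seconds and then exits.  It rests on a few assumptions: the process-spawn and message-send probes of the erlang provider with their arguments as I understand them from the Erlang/OTP DTrace examples (arg0 is the pid, arg1 of process-spawn is the spawned module:function/arity), the VM showing up as beam.smp, and erl_activity.d being nothing more than a name picked for the example.

```sh
# Write the D script to a file (erl_activity.d is an arbitrary name).
cat > erl_activity.d <<'EOF'
/* Count spawns, keyed by the function each new Erlang process starts in. */
erlang*:::process-spawn
{
    @spawns[copyinstr(arg1)] = count();
}

/* Count messages sent, keyed by the sending Erlang pid. */
erlang*:::message-send
{
    @sends[copyinstr(arg0)] = count();
}

/* Stop tracing after ten seconds; the aggregations are printed on exit. */
tick-10s
{
    exit(0);
}
EOF

# Attach to the most recently started Erlang VM, if there is one.
beam_pid=$(pgrep -n beam.smp || true)
if [ -n "$beam_pid" ] && command -v dtrace >/dev/null 2>&1; then
    sudo dtrace -s erl_activity.d -p "$beam_pid"
fi
```

Swapping -p for -c 'some command' has dtrace launch the target itself instead of attaching to one that's already running.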

The summary: apart from the headaches of using a new scripting language, if you're an application developer and you're not using DTrace, you're probably missing out on the 'deep view' of what your code is actually doing not just within its own confines, but to the hosting system at large.  It's definitely a tool that I'll be using going forward.