On Editors

“What’s your favorite editor?” This question is almost bound to start a fight amongst even the most amiable of coders. Nothing boils programmer blood as much as an argument over dev tools. Of course, that never stops anyone from asking, and I’ve frequently debated the virtues of all sorts of programming tools, including of course the one true editor, Emacs.

Now, before you write me off as just another of those wacko Emacs users, hear me out. As much as I love Emacs, I firmly believe that every need deserves a careful look at which tool is best. For example, I am currently writing this post with a new tool I’m trying out, IA Writer, rather than my usual Emacs. I’ll give a run-down of a few of the tools I use, and the needs they fulfill, and hopefully this will help you out in your journey to find the best tools for the job.

First, a bit of background: I’ve used a multitude of editors and environments, from QBasic in Windows 3.1 to Visual Studio. I’ve run Windows, Linux, and OS X as my daily work OS at various times. With that said, I don’t claim to be unbiased regarding programming tools. I am very partial to open source software, and will always choose the open source alternative if it is good enough for my needs.

What exactly are my needs? I write code on a daily basis, in a variety of languages. Lately, for example, I’ve needed to write C/C++, Java, Python, JavaScript, LaTeX, shell scripts, C#, PHP, and scrape together some HTML and CSS. My preference would naturally be to learn one tool that I can use for all these languages. I want to work as quickly and efficiently as possible, which means I need very usable tools. For me, usability requires a simple, clean, keyboard-driven interface. I prefer to touch my mouse as little as possible while working, since the mouse disturbs my flow of work and is far slower than a few quick key presses. Finally, for bonus points, I’d like my tools to be cross-platform, since I switch back and forth between various machines and would like a somewhat consistent interface. So, this adds up to lots of languages, keyboard-friendly, and hopefully cross-platform.

These requirements leave me few choices for a primary editor: Kate (and similar basic text editors), Emacs, Vim, Eclipse, Netbeans, and Visual Studio. Visual Studio immediately fails my requirements, since it only excels for a small subset of the languages I use. However, I must admit that VS is easily the best tool I’ve found if you are primarily working in C# (sorry Mono…). I’m not a big fan of heavy IDEs such as Eclipse and Netbeans, since they tend to be a bit slow for me. In addition, both tend to focus primarily on Java, which I use as rarely as possible out of personal preference. This leaves a choice of either a basic text editor such as Kate or Notepad++, or something more complex, such as Emacs or Vim. I used Kate for a long time as my primary editor, but I never looked back after eventually switching to Emacs. I also used Vim for about two years, but found that my brain didn’t work well with modal editing. Additionally, I was annoyed by the need to heavily customize Vim to make it usable for the way I work. I won’t dwell too much on comparing Emacs to other editors, but I hope you can make your own comparison from the features I highlight.

Starting out with Emacs

Now, I’ve heard many complaints about the usability of Emacs, but most of them seem to stem from the complexity of its shortcuts. First of all, you must use Caps Lock as your Control key to make any sense of Emacs. There’s just no way around it, since Control is so crucial and yet so far away on modern keyboard layouts. Second, most of the important keybindings fit on the front and back of a single letter-size sheet of paper. Print out the Emacs quick reference, stick it next to your keyboard, and you’ll quickly learn all the everyday shortcuts.

Unless otherwise noted, all code snippets should be placed in your ~/.emacs file. For installing packages I try to use ELPA. Starting in the latest version (24), Emacs finally has built-in package installation support. This makes installing packages a breeze: meta-x package-list-packages, move the cursor to a package you want, hit i to mark for installation, then x to install. Done! ELPA can keep your packages up-to-date, and has a wide selection of common packages. I use the following config for access to all the usual repositories:

.emacs
(setq package-archives '(("gnu" . "http://elpa.gnu.org/packages/")
                         ("marmalade" . "http://marmalade-repo.org/packages/")
                         ("melpa" . "http://melpa.milkbox.net/packages/")))

How I use Emacs

To give you a feel for how powerful Emacs is, I’ll describe a few of my favorite tricks and features, such as navigation, LaTeX integration, and gdb integration.

Favorite Tricks

I have a few favorite Emacs tricks and commands that I find terribly useful for what I need, and you might too.

I absolutely love org-mode, but I think I would need to take an entire blog post to really do it justice. For now, I shall simply say that it is the simplest, greatest, all-around note-taking and organization software around. I know people who use Emacs solely for org-mode. It’s that good.

Hexl-mode is a fairly nice built-in hex viewer, although it is not very feature-complete. It works great for quickly viewing some binary data, but isn’t quite enough if you plan to do a lot of hex hacking.

While not really a trick, I love the way that Emacs handles indentation. Emacs always “does the right thing” after you set up your indent style, such as with this config:

.emacs
(setq c-default-style "k&r"
      c-basic-offset 2)

;; default-tab-width is obsolete; set the default value of tab-width instead
(setq-default tab-width 2)

Hitting tab always indents the current line to the right position, regardless of where it started. This is great for making all of your indentation consistently perfect. If you need to re-indent a section, just select the section and hit ctrl-meta-\ (indent-region) to fix all the indentation in that section. While I’ve found most modes do a great job of auto-indentation, unfortunately js2-mode and the HTML modes tend to fall short. However, on the whole, I much prefer this single-tab-to-indent style to forcing the programmer to think about how many times to hit tab.

How to get around

I like having a lot of buffers (files) open at once. In fact I’m guilty of this in general – my browser usually has at least 150 tabs open. Fortunately, Emacs makes this super easy. First, I need it to remember all my open files between sessions:

.emacs
(desktop-save-mode 1)

For a long time I simply used ctrl-x b to switch buffers. By default, this buffer switch does a few helpful things. The default buffer is the most recently used buffer, which is helpful for flipping back and forth between a few files, and older buffers are quickly accessible via the up arrow. However, I found that IDO makes switching buffers and opening files much easier. IDO is an unobtrusive plugin which completes buffer names and filenames as you type, generally in a natural manner. IDO is built in to recent versions of Emacs and is enabled with:

.emacs
(require 'ido)
(ido-mode t)

I tend to use the following options for extra goodness:

.emacs
(ido-everywhere t)
(setq ido-enable-flex-matching t)
(setq ido-create-new-buffer 'always)
(setq ido-enable-tramp-completion nil)
(setq ido-confirm-unique-completion nil)

Modes

As a grad student now, I tend to write a lot of stuff in LaTeX, so I need a good editing environment for that. Along with almost every other Emacs user, my mode of choice is AUCTeX, but I augment this with a few additions. First of all, I use the following settings for AUCTeX:

.emacs
(setq TeX-PDF-mode t)

(add-hook 'LaTeX-mode-hook 'visual-line-mode)
(add-hook 'LaTeX-mode-hook 'speck-mode)
(add-hook 'LaTeX-mode-hook 'LaTeX-math-mode)
(add-hook 'LaTeX-mode-hook 'writegood-mode)
(add-hook 'TeX-mode-hook 'zotelo-minor-mode)

(add-hook 'LaTeX-mode-hook 'turn-on-reftex)
(setq reftex-plug-into-AUCTeX t)

These settings set the default output to PDF, turn on visual line mode (line navigation respects soft-wraps), turn on speck mode for spell checking (an alternative to flyspell, which is also great), turn on math mode for LaTeX since I write a lot of math, and turn on reftex and integration with Zotero, my reference manager. I won’t go into a lot of detail on these, since the manuals for each tool tend to be fairly good.

I have found that writegood-mode is great for keeping my writing clean and straightforward when I’m writing papers. It simply marks common errors such as repeated words, as well as uses of the passive voice, and weak or overused words. Try it out, and see what you think!

I use the usual modes for editing various sources, such as js2-mode, python-mode, etc. for syntax highlighting and other goodness. These modes are all available from ELPA, and a quick google or check of the Emacs wiki should point you in the right direction.

GDB

The gdb integration in Emacs is so useful it deserves a section of its own. To fire up gdb in emacs, simply use meta-x gdb. This will prompt you for a command to run gdb with, defaulting to the current executable if your working directory is simple enough to figure this out. You can use any of the usual gdb options here, but make sure that you keep the -i=mi bit at the beginning – this allows Emacs to integrate gdb nicely into its interface.

The best way to use gdb, assuming you have enough screen space, is the many-windows mode. Simply hit meta-x gdb-many-windows after starting a gdb session, and voilà, you have a gdb prompt, current source file, local vars, breakpoints, etc. Breakpoints can be set in open source files simply by navigating to the line you want to break on and hitting ctrl-x SPACE. Breakpoints are deleted by finding the breakpoint in the breakpoints buffer and hitting D on it. Hitting spacebar on a breakpoint toggles it between enabled and disabled. You can also do lots more fun things with gdb, such as open buffers for live memory dumps or disassembly views. Check out the manual for more details.

Now go forth and conquer

I keep my configs in git, which allows me to easily branch for any new machine or platform I’m running on. Check out my exact configs on github and let me know if you have any suggestions or comments!

I hope this post is useful to some, and that others would have great ideas that I can incorporate into my editing workflow. Please comment or contact me (links in the header) if you have any suggestions or comments!

Runtime Function Patching in JS

Have you ever needed to create a stub function which patched itself with the real function at runtime? As part of some research I’ve been working on lately, I found that I needed to do exactly this. Just to make things more challenging this needed to be automated for any valid JS function, no matter how crazy and convoluted. This post will explore how I went about this and the rewriting I ended up with. I’ll try to stick fairly close to how my function patching code actually evolved, but this should certainly not be treated as a history.

Disclaimer: This is still a work in progress, and although it’s been tested with fairly large and comprehensive code bases, there may certainly be edge cases I haven’t thought of. Please comment or send me a message if you find any!

Background

I’m currently working on a JavaScript source-to-source compiler which transforms any arbitrary function into a string and defers the execution and parsing of the function until it gets called (if ever). To do this, the compiler needs to rewrite a function into a stub which fetches and parses the original code, and then replaces itself with the real function.

Starting out

Let’s trace an example of rewriting a simple function. I’ll be using human readable variable names, but in a compiler all generated variable names would need to be checked against current valid variable names and/or be namespaced to not conflict with any existing variable names.

Here’s a simple example function to start out with:

function foo() {
  console.log('Hello, world!');
}

// and actually execute the function
foo();

It turns out that in JavaScript you can assign a value to a function name from inside that function. So, I started by simply replacing the original function with the following stub:

var fooStr = "function() {console.log('Hello, world!');}";

function foo() {
  foo = loadFunction(fooStr);
  foo();
}

Note: We can do an analogous rewrite to anonymous functions assigned to variables. Awesome, we just need to define loadFunction and we’re done, right? Shouldn’t be too hard… Let’s give it a try:

function loadFunction(fnStr) {
  return eval(fnStr);
}

There, that should just about do it… What’s that you say? eval() is complaining about not being able to parse the function? Oh, let’s see: a statement that starts with the function keyword is parsed as a function declaration, which must have a name. The anonymous function needs to be wrapped in parens so it gets parsed as a function expression instead! Ok, fine:

var fooStr = "(function() {console.log('Hello, world!');})";

Awesome, it works now!
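
To make the difference concrete, here is a small sketch that feeds both versions of the string to eval. The tryEval helper is a made-up name for this illustration only; it just returns whatever eval produced, or the error it threw.

```javascript
// Hypothetical helper for this illustration: returns the evaluated
// value, or the error that eval threw while parsing the source.
function tryEval(src) {
  try {
    return eval(src);
  } catch (e) {
    return e;
  }
}

// Without parens the parser sees a function *declaration*, which must
// have a name, so parsing fails.
var bare = tryEval("function() { return 42; }");

// With parens the same text is a function *expression*, and eval
// returns the resulting function object.
var wrapped = tryEval("(function() { return 42; })");

console.log(bare instanceof SyntaxError); // true
console.log(typeof wrapped);              // 'function'
console.log(wrapped());                   // 42
```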

One thing to note here – loadFunction needs to be in the same frame (function scope) as the original function so that the function created with eval closes over the same variables as the original function would have. This means that nested functions will need a different loadFunction at each nesting level containing a rewritten function so the eval occurs in the correct scope.
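
Here is a minimal sketch of why the loader’s position matters. The names outer, hiddenValue, and globalLoadFunction are invented for the example; the point is only that a direct eval call sees the variables of the scope it is written in.

```javascript
function outer() {
  var hiddenValue = 'visible only inside outer';

  // This loader lives inside outer, so its direct eval call runs in a
  // scope nested in outer and the new function closes over hiddenValue.
  function loadFunction(fnStr) {
    return eval(fnStr);
  }

  return loadFunction("(function() { return hiddenValue; })");
}

// A loader defined at the top level evaluates the string in a scope
// that cannot see outer's local variables.
function globalLoadFunction(fnStr) {
  return eval(fnStr);
}

var inner = outer();
console.log(inner()); // 'visible only inside outer'

var broken = globalLoadFunction("(function() { return hiddenValue; })");
var failed = false;
try {
  broken();
} catch (e) {
  failed = e instanceof ReferenceError;
}
console.log(failed); // true
```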

Patching references

Now, we’re not done yet… What if some other part of the code took a reference to foo before it executed? Something simple like var fooAlias = foo; would do this. Now fooAlias holds a reference to the stub, and every time fooAlias is called, the function has to be recreated. This is, of course, horribly bad for performance! While we can’t replace references to the stub, we can make the stub into a simple trampoline that stores the loaded function and jumps to it if available. Although there are probably other ways to do this, I chose to set the function string variable (fooStr in this case) to null after loading, and to check that it is a valid string before the eval. With this change, the code now looks like:

function loadFunction(fnStr, fn) {
  if (typeof fnStr === 'string') {
    return eval(fnStr);
  } else {
    return fn;
  }
}

function foo() {
  foo = loadFunction(fooStr, foo);
  fooStr = null;
  foo();
}

var fooAlias = foo;

fooAlias();
fooAlias();

The first time fooAlias is called, fooStr will be a valid string and the function will be created as it should. The second call will still result in a call to loadFunction, but this will immediately return the replaced version of foo, without recreating it.
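
To check that claim, here is the same trampoline with two counters bolted on. evalCount and calls are not part of the real rewrite; they are assumptions added purely so the behavior can be observed.

```javascript
var evalCount = 0; // hypothetical counter: how often eval actually runs
var calls = 0;     // hypothetical counter: how often the real body runs

function loadFunction(fnStr, fn) {
  if (typeof fnStr === 'string') {
    evalCount++;
    return eval(fnStr);
  } else {
    return fn;
  }
}

var fooStr = "(function() { calls++; })";

function foo() {
  foo = loadFunction(fooStr, foo);
  fooStr = null;
  foo();
}

var fooAlias = foo; // reference to the stub, taken before loading

fooAlias();
fooAlias();
fooAlias();

console.log(calls);     // 3
console.log(evalCount); // 1
```

Every call through the alias still pays for one extra function call into loadFunction, but the expensive part, the eval, happens only once.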

Redefinition

Well, now this is all fine and dandy, but does it work for all cases? Unfortunately, as you may have guessed, not yet. First of all, what happens if the code redefines foo after we rewrite it? (I ran into this while trying to compile BananaBread, where a function is redefined to extend it with a bit of new functionality). For example:

function foo() {
  console.log('foo');
}

var oldFoo = foo;

foo = function() {
  console.log('In the new foo');
};

oldFoo();
foo();

should output:

foo
In the new foo

Using the current transformation, we would rewrite this code to:

var fooStr1 = '(function(){console.log("foo")})';
var fooStr2 = '(function(){console.log("In the new foo")})';

function foo() {
  foo = loadFunction(fooStr1, foo);
  fooStr1 = null;
  foo();
}

var oldFoo = foo;

var foo = function () {
  foo = loadFunction(fooStr2, foo);
  fooStr2 = null;
  foo();
};

oldFoo();
foo();

but this outputs:

foo
foo

Wait a second! That’s not what we expected! What happened? Well, if you trace carefully, you will see that the call to oldFoo() on line 18 of the listing overwrites foo with the function generated from fooStr1. The call to foo() on line 19 then runs that old version rather than the correctly redefined one.

We can avoid this by overwriting the stub with the generated function only when it has not been changed elsewhere. This is easily done by checking that foo === arguments.callee before overwriting foo (be careful here: arguments.callee is not the same as this in the browser, as one might expect; see this MDN doc for more info). Note also that accessing arguments.callee throws a TypeError in strict mode, so this check only works in stubs that are not strict-mode code.

With this change, the rewritten code now looks like:

var fooStr1 = '(function(){console.log("foo")})';
var fooStr2 = '(function(){console.log("In the new foo")})';

function foo() {
  var temp = loadFunction(fooStr1, foo);
  fooStr1 = null;
  if (foo === arguments.callee)
    foo = temp;
  temp();
}

var oldFoo = foo;

var foo = function () {
  var temp = loadFunction(fooStr2, foo);
  fooStr2 = null;
  if (foo === arguments.callee)
    foo = temp;
  temp();
};

oldFoo();
foo();

This now works as expected. Unfortunately loadFunction will still get called every time oldFoo is called, but fixing this would add length to the stub, which I’d like to avoid.

Unfortunately we also need to keep track of each stub (and replacement) so that loadFunction works correctly. Without this, a second call to oldFoo which in turn calls loadFunction, returning foo, will return the wrong version of foo! If that seems a bit convoluted, it’s probably because it is. Hopefully tracing through the code will help:

var generated1 = foo, generated2;
var fooStr1 = '(function(){console.log("foo")})';
var fooStr2 = '(function(){console.log("In the new foo")})';

function foo() {
  generated1 = loadFunction(fooStr1, generated1);
  fooStr1 = null;
  if (foo === arguments.callee)
    foo = generated1;
  generated1();
}

var oldFoo = foo;

var foo = function () {
  generated2 = loadFunction(fooStr2, generated2);
  fooStr2 = null;
  if (foo === arguments.callee)
    foo = generated2;
  generated2();
}, generated2 = foo;

oldFoo();
foo();

This stores the stubs in generated1 and generated2 respectively, and then overwrites these vars when the function is loaded, regardless of whether we can replace the foo name itself. Pay special attention to where these vars are assigned. generated1 can be assigned at the very top of the frame because function declarations are hoisted to the top of their frame in JavaScript, so foo already refers to the first stub there. generated2, on the other hand, cannot be assigned until after foo is redefined.
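
If hoisting is unfamiliar, this short sketch shows the difference between the two kinds of definition. The names declared and assigned are placeholders for the example.

```javascript
// Function declarations are hoisted together with their bodies, so
// `declared` is already a function before its definition is reached.
var earlyDeclared = typeof declared;

// A `var` holding a function expression is hoisted too, but only the
// name: its value stays undefined until the assignment executes.
var earlyAssigned = typeof assigned;

function declared() {}
var assigned = function () {};

var lateAssigned = typeof assigned;

console.log(earlyDeclared); // 'function'
console.log(earlyAssigned); // 'undefined'
console.log(lateAssigned);  // 'function'
```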

All that’s left now for the stub is to properly pass arguments and context to the loaded function by using .apply(), and to hand the loaded function’s return value back to the caller:

function foo() {
  generated1 = loadFunction(fooStr1, generated1);
  fooStr1 = null;
  if (foo === arguments.callee)
    foo = generated1;
  return generated1.apply(this, arguments);
}
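
As a quick sanity check of what .apply() buys us, here is a stripped-down forwarding stub used as a method. The real function, the person object, and the stub name are all invented for the example, and the stub also returns the result so callers that use the return value keep working.

```javascript
// Stands in for the loaded function: it depends on both `this` and
// its argument, and it returns a value.
var real = function (greeting) {
  return greeting + ', ' + this.name;
};

// Forwarding with apply passes the receiver and every argument
// through unchanged, and returns whatever the real function returns.
function forwardingStub() {
  return real.apply(this, arguments);
}

var person = { name: 'Ada', greet: forwardingStub };

console.log(person.greet('Hello')); // 'Hello, Ada'
```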

Properties

Finally, there’s one last piece of the puzzle – copying over any attributes that should be attached to the function being loaded, but that instead were attached to the stub. For example:

function foo() {
  generated = loadFunction(fooStr, generated);
  fooStr = null;
  if (foo === arguments.callee)
    foo = generated;
  return generated.apply(this, arguments);
}

foo.prototype.prototypeMethod = function() {
}

foo.objectMethod = function() {
}

As you can see, prototypeMethod and objectMethod were attached to the stub prototype and object. When foo is loaded and the stub replaced, these methods disappear! We need to extend loadFunction to copy any properties over to the loaded function like so:

function loadFunction(fnStr, fn) {
  if (typeof fnStr === 'string') {
    var temp = eval(fnStr);
    temp.prototype = fn.prototype;
    for (var prop in fn) {
      temp[prop] = fn[prop];
    }
    return temp;
  } else {
    return fn;
  }
}
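
Here is a small sketch exercising that version of loadFunction directly. The empty foo below merely stands in for a stub, and the realFoo and instance names are assumptions for the demo.

```javascript
function loadFunction(fnStr, fn) {
  if (typeof fnStr === 'string') {
    var temp = eval(fnStr);
    temp.prototype = fn.prototype;
    for (var prop in fn) {
      temp[prop] = fn[prop];
    }
    return temp;
  } else {
    return fn;
  }
}

var fooStr = "(function() { this.ready = true; })";

function foo() {} // stands in for the stub in this demo

// Properties attached to the stub before the real function is loaded.
foo.prototype.prototypeMethod = function () { return 'proto'; };
foo.objectMethod = function () { return 'object'; };

var realFoo = loadFunction(fooStr, foo);
var instance = new realFoo();

console.log(realFoo.objectMethod());     // 'object'
console.log(instance.prototypeMethod()); // 'proto'
console.log(instance.ready);             // true
```

One thing to keep in mind: because the prototype object is shared rather than copied, instance.constructor still points at the stub, not at the loaded function.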

Putting it all together

Here’s my final version of function loading and patching for a typical function:

var generated = foo;

function loadFunction(fnStr, fn) {
  if (typeof fnStr === 'string') {
    var temp = eval(fnStr);
    temp.prototype = fn.prototype;
    for (var prop in fn) {
      temp[prop] = fn[prop];
    }
    return temp;
  } else {
    return fn;
  }
}

var fooStr = '(function(){console.log("foo")})';


function foo() {
  generated = loadFunction(fooStr, generated);
  fooStr = null;
  if (foo === arguments.callee)
    foo = generated;
  return generated.apply(this, arguments);
}

foo();

This covers every edge case that I could think of, and should work to patch a function at runtime (as well as possible).

Low-Level JavaScript

This summer I’m taking a short break from grad school and working as a research intern at Mozilla. Awesome experience so far, and it’s great to be back to working on open source projects! I don’t think I’ve ever cloned so many github repos in such a short time.

Ever wanted to have manual memory allocation in JavaScript? How about types? One of the projects that I’m helping out with is LLJS, a typed dialect of JavaScript which resembles C/C++. It features manual memory allocation when you want it, co-existing alongside JavaScript’s normal garbage-collected object system. The goal here is to allow programmers to write fast JS code with a familiar syntax. Definitely be sure to check out the interactive demos to see examples of LLJS in action.

What’s new with LLJS?

We recently added arrays, with both statically stack-allocated and dynamically heap-allocated variants. These resemble C and C++ array syntax, respectively. I also added union types, to complement our existing struct types. These behave precisely as you would expect from C. Recently I added syntax for defining structs and unions inside other structs or unions, which significantly simplifies writing structs.

I also worked on optimizing our malloc implementation, which is much faster now. We use a naive malloc algorithm from the K&R book, but since we don’t actually need to worry about paging, I modified this so that all memory is on a single page. This cut out function calls for allocating new pages, speeding allocation up quite a bit.
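
For readers who haven’t seen this style of allocator, here is a rough sketch of the idea in plain JavaScript. This is not the LLJS implementation: it is a simplified first-fit free list over one fixed typed array, without the coalescing that the K&R version does, and every name in it (heap, HEADER, NIL, and so on) is made up for the illustration.

```javascript
var HEAP_WORDS = 1024;
var heap = new Uint32Array(HEAP_WORDS); // the single "page" of memory
var NIL = 0;    // index 0 is reserved so that 0 can mean "no block"
var HEADER = 2; // block layout in words: [size, nextFree, payload...]

// Start with one big free block covering everything after index 0.
var freeList = 1;
heap[1] = HEAP_WORDS - 1 - HEADER; // payload size in words
heap[2] = NIL;                     // no next free block

function malloc(words) {
  var prev = NIL;
  var cur = freeList;
  while (cur !== NIL) {
    var size = heap[cur];
    if (size >= words) {
      var next = heap[cur + 1];
      // Split when the remainder can hold a header plus one word.
      if (size >= words + HEADER + 1) {
        var rest = cur + HEADER + words;
        heap[rest] = size - words - HEADER;
        heap[rest + 1] = next;
        heap[cur] = words;
        next = rest;
      }
      // Unlink the chosen block from the free list.
      if (prev === NIL) {
        freeList = next;
      } else {
        heap[prev + 1] = next;
      }
      return cur + HEADER; // "pointer" to the payload
    }
    prev = cur;
    cur = heap[cur + 1];
  }
  return NIL; // out of memory: there is no second page to ask for
}

function free(ptr) {
  var block = ptr - HEADER;
  heap[block + 1] = freeList; // push the block on the free list
  freeList = block;
}

var a = malloc(4);
var b = malloc(4);
free(a);
var c = malloc(4);      // reuses the block that a occupied
var d = malloc(100000); // too large for the heap

console.log(c === a); // true
console.log(d);       // 0
```

Since the heap never grows, the allocator has no path that requests more memory, which is the kind of simplification described above.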

My mentor here at Mozilla, Michael Bebenita, added functions and optional constructors to structs, which are beginning to look a lot like C++ structs rather than C structs.

Finally, perhaps the most exciting new thing in LLJS is memory checking! Tim Disney, another intern here at Mozilla, implemented Valgrind style memory checking for LLJS. It can currently detect the most common memory errors: use after free, uninitialized reads, double frees, and memory leaks.

Esprima in LLJS

As part of another project, I needed super fast JS parsing in JS. Since I was already working with LLJS, I figured I might as well try it out on a larger scale and see if I couldn’t get Esprima ported over to LLJS and using structs for the generated AST. I hoped this would make an already fast parser even faster and leaner by using manual memory allocation.

Well, over 4000 lines of LLJS later, I finished the port. String handling is somewhat inefficient (it creates a new C string for every string in the program), but it works! Check out the sources on github if you’re interested.

So, was it faster? Well… not really. As it turns out, modern JS engines are very good at allocating objects, so manual allocation does not appear to be faster than the engine’s own object allocation. The LLJS version does use about 10% less memory when parsing large JS sources, so that’s a win. From some preliminary testing, traversing the AST appears to be faster with LLJS, since property access is fast, but this comes at the cost of making traversal code harder to write.

Where I think we can definitely win, though, is code which allocates and frees memory very often. Manual memory reuse and freeing could provide speedups here over engine garbage collection for some applications (note: I have yet to test this…).

What’s next?

At the beginning of the summer I started adding source map support to LLJS, so that debug tools in the browser will show the corresponding LLJS sources instead of the compiled JS sources. This is mostly implemented in escodegen, which we use to generate the JS code from a rewritten AST. This got a little bogged down due to a bug in the Chrome DevTools (not loading sources from a data URI source map), so I haven’t finished it off yet. Pretty much all that’s left to do there is testing, though.

As far as language features go, I will definitely be adding support for enum types as soon as I get the chance. We are also looking at implementing bit fields as a convenience for bit packing in structs.

If you have more ideas for where LLJS should head, please start an issue on github or (even better) toss us a pull request. Most importantly though, go try it out! If you find it helpful (or frustrating), let us know.

Hello World

Test Post

Just a test post to get things started. Not sure how often I’ll be posting here, but I wanted someplace to write up a few of the things I’m working on. More content will be coming soon!