CodeMod Tutorial - Part 3 - Paths, Moving Nodes


Royston Shufflebotham is a Front-end developer and architect at i2, in Cambridge, UK.

He got his first taste of coding in 1981, and really hasn't stopped since then, working his way through backend and frontend jobs in text matching, web & PKI infrastructure and crypto, finance, web server software, laboratory automation and investigative analysis, journeying through Quick Basic, Perl, TCL, C++ (several times), Delphi, C#, Java (several times) and JavaScript.

He appears to have finally settled down as a Front-end developer, preferring to work in React and TypeScript.

He also has a very unhealthy fondness for fiddling round with, or writing, developer tools and DevOps instead of writing real production code.

NOTE: This is Royston's personal blog.

If you fancy some legal nonsense, here is some:

Whilst the author does take some care to ensure the information presented here is accurate and useful, he makes no claims or representations as to accuracy, completeness, correctness, suitability or validity of any information on this site, and refuses to be liable for any errors, omissions, or delays in this information or any losses, injuries or damages arising from its display or use. All information is provided on an as-is basis with no warranties, and confers no rights. There is no warranty that the site is free of viruses or other harmful components. It is the reader's responsibility to verify their own facts and to secure their own systems.

The views, thoughts and opinions expressed in this blog are those of the author only and do not necessarily reflect the official policy or position of any other agency, organization, employer or company, including, but not limited to, my employer.

These views are also subject to change, revision and rethinking at any time; please do not hold us to them in perpetuity.


Oh, and in case you're wondering, it's pronounced ˈrɔɪstən ˈʃʌfl̩boθəm (or ˈrɔɪstən ˈʃʌfl̩bəʊθəm if it's easier for you)(guide). Yeah, I'm definitely helping there...

In part 2 we walked through the process of developing a codemod, and built one from scratch. This time we're going to build a more complex one, introduce the concept of Paths, node factory functions, and look at why we should "use it in a sentence".

Our Codemod

This time, we're going to write a codemod that rewrites array construction.

Let's say you've just inherited a large old JS codebase that contains lots of code like this:

var myArray = new Array(1, 2, 3);
var myOtherArray = new Array('x', 'y', 'z');

Note the explicit array constructor: new Array(). That's generally a pretty unnecessary thing to be doing (in those forms), and can actually be very confusing. We're going to write a codemod that changes those array construction calls into plain array literal syntax, i.e. one that converts the above into:

var myArray = [1, 2, 3];
var myOtherArray = ['x', 'y', 'z'];

Aside

Array construction

A brief interlude for a quick pop quiz...

Do you know what new Array(5, 'b') does? You probably know, or can guess, that it creates a two-element array containing 5 and 'b'. It's more simply written as [5, 'b'].

What about new Array(5)? Some readers may be surprised to hear that it doesn't create a one-element array; it creates an array of length five whose slots are all empty. Its equivalent is [,,,,,].

There are many JS developers who don't know about that one-argument special case, and it's easy to fall into it accidentally even if you do know (e.g. by removing the 'b' element from the first example).

So, it's usually better to use the array syntax [5, 6] instead of explicit Array construction (new Array(5, 6)). The codemod we develop on this page will perform that change automatically across an entire codebase.
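If you'd like to check the quiz answers for yourself, here's a small snippet you can paste into Node or a browser console. The variable names are just for illustration.

```javascript
// Two or more arguments: the arguments become the elements.
const pair = new Array(5, 'b');
console.log(pair.length);    // 2

// Exactly one numeric argument: it is treated as the length.
const sparse = new Array(5);
console.log(sparse.length);  // 5
console.log(0 in sparse);    // false: the slots are empty, not set to undefined

// The literal form has no such special case.
const literal = [5];
console.log(literal.length); // 1
```

Note that the slots in the five-element array are genuinely empty rather than holding undefined, which is why the `in` check reports false.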

Building the codemod

We'll work through the development steps we listed last time.

1. Figure out what we want to change

We want to convert any new Array(...) call, such as new Array('a', 'b') or a bare new Array() with no arguments, into plain array literal syntax: ['a', 'b'] or []. The array literal will contain the arguments from the new Array call as its contents, even if they're complex expressions.

2. Figure out what we don't want to change

  • It's important that we don't modify any single-argument new Array(5) calls. new Array(5) is absolutely not replaceable by [5].
  • We don't want to modify the construction of any other type, only Array.

Aside

Actually, things are a little more nuanced than this. The special case Array construction behaviour only happens when the single parameter to new Array() is a number.

If we know that a single parameter isn't a number (e.g. new Array('a string')), or if we had access to type information from, for instance, a TypeScript parser, we could be smarter, but we'll leave those as 'extra credit' exercises.
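As a starting point for that extra credit, here's one possible sketch of a smarter check. Everything in it is an assumption for illustration: isSafeToConvert is a made-up helper name (it isn't part of JSCodeShift), the nodes are hand-written, and it assumes esprima-style nodes where a string shows up as a Literal whose value is a string. It's also deliberately cautious about spread arguments, since something like new Array(...values) could end up passing a single number at runtime.

```javascript
// Runtime behaviour: a single non-numeric argument becomes the only element.
const single = new Array('a string');
console.log(single.length); // 1

// Hypothetical helper (not part of JSCodeShift): decides whether the
// arguments of a `new Array(...)` NewExpression are safe to move into a literal.
// Assumes esprima-style ESTree nodes, where strings are `Literal` nodes.
function isSafeToConvert(newExpression) {
  const args = newExpression.arguments;
  // A spread could expand to a single number at runtime, so be conservative.
  if (args.some(arg => arg.type === 'SpreadElement')) return false;
  // Zero arguments, or two or more, never hit the special case.
  if (args.length !== 1) return true;
  // One argument: only accept it if it is provably a string literal.
  const only = args[0];
  return only.type === 'Literal' && typeof only.value === 'string';
}

// Hand-written nodes for new Array('a string') and new Array(5).
const callee = { type: 'Identifier', name: 'Array' };
const stringCall = { type: 'NewExpression', callee, arguments: [{ type: 'Literal', value: 'a string' }] };
const numberCall = { type: 'NewExpression', callee, arguments: [{ type: 'Literal', value: 5 }] };

console.log(isSafeToConvert(stringCall)); // true
console.log(isSafeToConvert(numberCall)); // false
```

A predicate like this could be dropped into a .filter() call in place of a simple length check, but for the rest of this page we'll stick with the simpler rule.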

3. Construct a testbed

We'll develop the testbed shortly, but first, let's look at what happens if our testbed is a little... lacking.

It's very easy, when first writing codemods, to rush into targeting a very precise piece of syntax. In our case, we want to target

new Array('a')

so let's look at the AST for exactly that. It's:

ASTDiagram2.png

It would be easy to conclude, from that code and diagram, that we should be looking for an ExpressionStatement containing a NewExpression with the appropriate callee.

Let's look at a very similar piece of code:

const x = new Array(3, 4);

The AST for that is

ASTDiagram1.png

Notice that there's no ExpressionStatement in that tree.

It turns out that the original ExpressionStatement was only there because the new Array code was at the root of our source file. It's actually only the NewExpression that we're interested in. The ExpressionStatement was a red herring.
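To make the comparison concrete, here are the two trees written out by hand as plain objects. This is a simplified sketch: real parser output carries extra details (location info, a wrapping File node and so on), and the newArray helper is just something made up here to cut down on typing.

```javascript
// Hand-written, simplified ESTree-style shapes (location info omitted).
const newArray = (...values) => ({
  type: 'NewExpression',
  callee: { type: 'Identifier', name: 'Array' },
  arguments: values.map(value => ({ type: 'Literal', value })),
});

// new Array('a');
const standalone = {
  type: 'Program',
  body: [{ type: 'ExpressionStatement', expression: newArray('a') }],
};

// const x = new Array(3, 4);
const inDeclaration = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'x' },
      init: newArray(3, 4),
    }],
  }],
};

// The wrapper differs, but the node we care about has the same shape.
console.log(standalone.body[0].type);                          // ExpressionStatement
console.log(inDeclaration.body[0].type);                       // VariableDeclaration
console.log(standalone.body[0].expression.type);               // NewExpression
console.log(inDeclaration.body[0].declarations[0].init.type);  // NewExpression
```

The only thing the two trees have in common, as far as we're concerned, is the NewExpression itself.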

How do we avoid making this mistake?

"Use it in a sentence"

Imagine you're in an elementary spelling bee competition and you're asked to spell the word 'pair'. Of course, in such a competition you only hear the word spoken, so you're not sure whether they said 'pair' or 'pear' (or 'pare' or 'pere').

You can't ask them to spell the word to clarify, so what do you do? You ask the caller to use the word in a sentence. If they respond with "I'm going to peer through the window.", you know you completely mis-heard originally, but you now know what the word is.

Similarly, if we're trying to figure out the AST structure for a piece of code, we can easily make bad assumptions if we only look at one example with no context.

So, when collecting interesting cases to match, try to 'use them in a sentence' to get a more realistic AST. In fact, use them in multiple different ways that are close to how your actual code uses them.

We'll create a set of sample code that illustrates our positive and negative cases and doesn't simply leave the new Array at the top of the source file:

// Empty array case
// -> const x = [];
const x = new Array();

// Complex arguments case
// -> somefunc(['a', 3 + 4]);
somefunc(new Array('a', 3 + 4));

// Single argument case; leave it alone
somefunc(new Array(5));

// Not Array case; leave it alone
somefunc(new NotArray('a', 5));

That includes things we want to change and similar things we don't want to change. Good!

4. Build a target profile

Follow this link to AST Explorer and it'll take you directly to a pre-configured AST Explorer with that code in it.

The configuration is simply to do 'JavaScript parsing' and to use 'recast' with the 'esprima' parser, and be set up for JSCodeShift transformations.

It also includes the same skeleton codemod from last time ready for us to develop, namely:

export default function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.Identifier)
    .forEach(path => path.node.name = 'Foo')
    .toSource();
}

Explore the AST

Click around the code and get a feel for the syntax tree.

You'll discover that we're looking for NewExpression nodes where the callee is an Identifier with a name of Array:

NewExpression.png

5. Fire tracer bullets

Let's update our .find() call to target those nodes.

We'll also need to tweak our .forEach(): we were previously targeting Identifiers and could simply set their name property. We're now targeting NewExpressions and they don't have a name property; it's their callee that does.

export default function transformer(file, api) {
  const j = api.jscodeshift;
  return j(file.source)
    .find(j.NewExpression, { callee: { type: 'Identifier', name: 'Array' } })
    .forEach(path => path.node.callee.name = 'Foo')
    .toSource();
}

That's an excellent start. We're targeting all the new Array calls:

// Empty array case
// -> const x = [];
const x = new Foo();

// Complex arguments case
// -> somefunc(['a', 3 + 4]);
somefunc(new Foo('a', 3 + 4));

// Single argument case; leave it alone
somefunc(new Foo(5));

// Not Array case; leave it alone
somefunc(new NotArray('a', 5));

We're still incorrectly targeting the one-argument new Array(5) call. Can we extend the pattern we pass to the find call so that it excludes that call? Sadly not: a plain pattern like this one can only say what a property must equal. If we wanted to look for a NewExpression that had an arguments property with a length equal to 1, we could do it, but we want only those with a length not equal to 1.
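To see why a pattern object gets stuck here, it can help to think of the matching as a recursive "does this node contain at least these values?" check. The matchesShape function below is a hypothetical, simplified model written for this page; it is not JSCodeShift's actual code, and the nodes are hand-written.

```javascript
// Hypothetical simplified model of partial-shape matching.
function matchesShape(value, shape) {
  if (shape !== null && typeof shape === 'object') {
    // Every property named in the shape must match on the value.
    return value !== null && typeof value === 'object' &&
      Object.keys(shape).every(key => matchesShape(value[key], shape[key]));
  }
  // Plain values can only be compared for equality.
  return value === shape;
}

const call = (name, ...values) => ({
  type: 'NewExpression',
  callee: { type: 'Identifier', name },
  arguments: values.map(value => ({ type: 'Literal', value })),
});

const arrayPattern = { callee: { type: 'Identifier', name: 'Array' } };
console.log(matchesShape(call('Array', 5), arrayPattern));        // true
console.log(matchesShape(call('NotArray', 'a', 5), arrayPattern)); // false

// "Exactly one argument" is expressible...
const oneArgPattern = { callee: { name: 'Array' }, arguments: { length: 1 } };
console.log(matchesShape(call('Array', 5), oneArgPattern));       // true
console.log(matchesShape(call('Array', 'a', 7), oneArgPattern));  // false
// ...but there's no plain value we can write that means "anything except 1".
```

Under this model, equality is the only comparison on offer, which is exactly what we need to get beyond.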

It's time to introduce another JSCodeShift function that gives us more power: .filter(). But to make sense of that, we firstly need to introduce the concept of Paths.

Paths

Let's take a look at a simple bit of code and its syntax tree:

new Array(3, 'q')

NodesTree.png

The NewExpression AST node has a callee property which points at an Identifier (with a name of Array). It also has an arguments property, but this doesn't point at an AST node object; it points at an array object which contains some nodes.

When processing ASTs we often discover candidate target nodes and want to 'walk around the tree a bit' to confirm that we've found the correct pattern. But navigating around trees like this can be very difficult:

  • There are no 'parent' pointers from a node to its parent.
  • For nodes such as the NumericLiteral and the StringLiteral, what is their parent anyway? Is it the NewExpression or the containing array?
  • Even if there was a 'parent' pointer, we often want to know which property we would come through from parent to child (e.g. left or right side of a + operator?), and a simple 'parent' pointer wouldn't provide that.
  • It's tricky to inspect all of a node's child nodes (e.g. to recurse through a tree): you'd have to examine all the properties and deal with any arrays you came across. Examining the properties safely (without tripping over internal implementation properties, potentially triggering infinite loops) requires knowledge of the properties available on each node type.

This is where Paths come in. Path objects (NodePath instances) come from the ast-types library, which sits underneath recast, which JSCodeShift in turn uses.

Recast builds a tree of Path objects parallel to, and pointing at, the tree of AST nodes:

PathsToNodes.png

Amongst the properties on a Path object are:

  • value: the corresponding entry in the syntax tree. Note that Path objects don't always point at syntax Nodes. Path objects are also created for intermediate containers in the syntax tree, such as arrays, in which case the value property would point at the array.
  • parentPath (not shown on the above diagram, for simplicity): this enables us to walk up the tree
  • name: the 'property' (or numeric index for array Path objects) that was traversed when moving from parent Path object to child Path object
  • node: the containing syntax tree node. That is, if a Path object's value property is pointing to an array for the arguments property of a syntax tree node, the node property will point at the syntax tree node itself
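Those four properties are easier to picture with a toy version. The sketch below builds Path-like objects by hand for new Array(3, 'q'). To be clear, makePath and isNode are made-up helpers for illustration, the nodes are simplified, and real Path objects carry a lot more than this.

```javascript
// Treat anything with a string `type` as a syntax node (an assumption of this sketch).
const isNode = v => v !== null && typeof v === 'object' && typeof v.type === 'string';

// Hypothetical, minimal Path-like object.
function makePath(value, parentPath = null, name = null) {
  return {
    value,
    parentPath,
    name,
    // Nearest syntax node: the value itself if it is a node,
    // otherwise whatever the parent Path considers its node.
    node: isNode(value) ? value : parentPath.node,
  };
}

// new Array(3, 'q')
const ast = {
  type: 'NewExpression',
  callee: { type: 'Identifier', name: 'Array' },
  arguments: [{ type: 'Literal', value: 3 }, { type: 'Literal', value: 'q' }],
};

const rootPath = makePath(ast);
const argsPath = makePath(ast.arguments, rootPath, 'arguments');
const secondArgPath = makePath(ast.arguments[1], argsPath, 1);

console.log(Array.isArray(argsPath.value));               // true: this Path points at an array
console.log(argsPath.node === ast);                       // true: the containing node
console.log(secondArgPath.name);                          // 1: index within the array
console.log(secondArgPath.parentPath.name);               // arguments
console.log(secondArgPath.parentPath.parentPath === rootPath); // true: walking up the tree
```

Notice how name answers the "which property did we come through?" question at every step, and parentPath gives us the way back up that plain nodes lack.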

Filtering

When we work with a chain of commands in JSCodeShift (e.g. .find(j.Identifier).forEach(...)), the chain is actually operating on a collection of Path objects, not simply nodes.

In our particular case, we have found all the NewExpression syntax tree nodes, and are manipulating the Path objects that point at them. We want to keep only those Path objects whose NewExpressions have an argument count !== 1.

So we add a new .filter call as follows:

export default function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.NewExpression, { callee: { type: 'Identifier', name: 'Array' } })
    .filter(path => path.node.arguments.length !== 1)
    .forEach(path => path.node.callee.name = 'Foo')
    .toSource();
}

The result is

// Empty array case
// -> const x = [];
const x = new Foo();

// Complex arguments case
// -> somefunc(['a', 3 + 4]);
somefunc(new Foo('a', 3 + 4));

// Single argument case; leave it alone
somefunc(new Array(5));

// Not Array case; leave it alone
somefunc(new NotArray('a', 5));

The difference this time is that, as we'd hoped, the single-argument new Array(5) call is now left alone.

That's taken care of all the 'negative' cases that we don't want to change. We now need to replace the array with something more useful than merely 'Foo'.

6. Apply changes

Last time our codemod didn't do very much: it just changed the name of an existing Identifier. This time, we want to replace, completely, a NewExpression with an array.

Let's start by replacing them with an empty array, to keep things simple.

What does an array look like - what AST structure do we need to create? We can repeat the same exercise we went through to find the structure to target (remembering to "use it in a sentence").

We need to create an ArrayExpression node like this:

ArrayExpression.png

.replaceWith()

Thankfully, JSCodeShift provides a neat little function called .replaceWith(). It can be called on a collection of Paths, and it uses the navigation smarts that we outlined earlier to replace the Paths' nodes with ones we specify.

We're going to start by using .replaceWith() to replace all our target nodes with empty arrays. But how do we actually create those nodes?

JSCodeShift provides us with a whole suite of AST node factory functions.

AST Node Factories

By now, we're used to writing j.Identifier or j.NewExpression to indicate a type of node to search for. JSCodeShift provides a parallel set of functions, starting with lowercase letters, which construct nodes.

e.g. j.identifier('Foo') will create an Identifier with the name property set to 'Foo'.

In our case, we need to create an ArrayExpression, so we'll need to use j.arrayExpression(). Let's try it:

export default function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.NewExpression, { callee: { type: 'Identifier', name: 'Array' } })
    .filter(path => path.node.arguments.length !== 1)
    .replaceWith(path => j.arrayExpression())
    .toSource();
}

Our output?

no value or default function given for field "elements" of ArrayExpression("elements": [Expression | SpreadElement | RestElement | null])

Uh oh. That's not good. We've obviously missed out a value for elements, whatever that is. We can get a clue by looking at the typing information in the j.arrayExpression() constructor:

arrayExpressionCompletion.png

It's expecting to be given an array of AST nodes for the array elements, but we didn't provide that.
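If it helps, you can think of a node factory as a function that checks its required fields before handing back a plain node object. The version below is a hypothetical stand-in written for this page; the real j.arrayExpression() comes from the builders that JSCodeShift exposes, and does more thorough checking than this.

```javascript
// Hypothetical stand-in for a node factory function (not the real one).
function arrayExpression(elements) {
  if (!Array.isArray(elements)) {
    // The real factory reports a more detailed message, as seen above.
    throw new Error('no value given for field "elements" of ArrayExpression');
  }
  return { type: 'ArrayExpression', elements };
}

// Providing the elements array, even an empty one, succeeds.
const emptyArray = arrayExpression([]);
console.log(emptyArray.type);            // ArrayExpression
console.log(emptyArray.elements.length); // 0

// Leaving it out fails, much like our first attempt did.
let message = null;
try {
  arrayExpression();
} catch (error) {
  message = error.message;
}
console.log(message !== null);           // true
```

The important point is that the factory wants every required field up front, so an "empty" array still needs an explicit, empty list of elements.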

Let's try again, providing an empty array of AST nodes:

export default function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.NewExpression, { callee: { type: 'Identifier', name: 'Array' } })
    .filter(path => path.node.arguments.length !== 1)
    .replaceWith(j.arrayExpression([]))
    .toSource();
}

The result is

// Empty array case
// -> const x = [];
const x = [];

// Complex arguments case
// -> somefunc(['a', 3 + 4]);
somefunc([]);

// Single argument case; leave it alone
somefunc(new Array(5));

// Not Array case; leave it alone
somefunc(new NotArray('a', 5));

That's looking good. Three of the four cases are now correct. We just need to fix the case where the new Array call actually has arguments.

.replaceWith(j.arrayExpression([])) replaces all of the found nodes with the same expression. We want to examine each Path being replaced, and create an appropriate ArrayExpression for each Path.

The good news is that .replaceWith also accepts a transformation function which is expected to transform a Path object into a replacement expression. We want to transform a NewExpression of a series of array parameters into a literal array containing those same parameters.

Moving nodes around

Of course, we don't want to replace new Array(1, 2) with []. We want to replace it with [1, 2]. How do we construct all the correct array parameters? They could be anything, not just simple numbers.

Well, the good news is that we already have them. The arguments property to the NewExpression already contains exactly the nodes that we need.

Our transformation ends up being surprisingly simple: instead of creating a new ArrayExpression with an empty array of AST nodes, we reuse the AST nodes that were the arguments to the NewExpression:

export default function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.NewExpression, { callee: { type: 'Identifier', name: 'Array' } })
    .filter(path => path.node.arguments.length !== 1)
    .replaceWith(path => j.arrayExpression(path.node.arguments))
    .toSource();
}

The result is our final correct desired output:

// Empty array case
// -> const x = [];
const x = [];

// Complex arguments case
// -> somefunc(['a', 3 + 4]);
somefunc(['a', 3 + 4]);

// Single argument case; leave it alone
somefunc(new Array(5));

// Not Array case; leave it alone
somefunc(new NotArray('a', 5));

Hooray!
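If you want to convince yourself that the nodes really are moved rather than copied, here's the same logic boiled down to plain objects, with no JSCodeShift involved. The convert function and the little node helpers are hypothetical, written just for this demonstration, and they work on one hand-written node at a time rather than on a collection of Paths.

```javascript
// Small helpers for hand-writing simplified ESTree-style nodes.
const literal = value => ({ type: 'Literal', value });
const newExpression = (name, ...args) => ({
  type: 'NewExpression',
  callee: { type: 'Identifier', name },
  arguments: args,
});

// Hypothetical stand-in for the find + filter + replaceWith chain.
function convert(node) {
  const isArrayCall =
    node.type === 'NewExpression' &&
    node.callee.type === 'Identifier' &&
    node.callee.name === 'Array';
  if (!isArrayCall || node.arguments.length === 1) {
    return node; // negative cases: leave untouched
  }
  // Reuse the existing argument nodes rather than rebuilding them.
  return { type: 'ArrayExpression', elements: node.arguments };
}

// new Array('a', 3 + 4)
const sum = { type: 'BinaryExpression', operator: '+', left: literal(3), right: literal(4) };
const complex = convert(newExpression('Array', literal('a'), sum));
console.log(complex.type);                // ArrayExpression
console.log(complex.elements[1] === sum); // true: the very same node object

// new Array(5)
const oneArg = newExpression('Array', literal(5));
console.log(convert(oneArg) === oneArg);  // true: left alone
```

Because the argument nodes are reused as-is, it doesn't matter how complicated they are: the 3 + 4 expression comes along for the ride untouched.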

Summary

We developed our understanding of codemods quite a bit this time. You should now have some familiarity with

  • the Path object tree
  • .find()ing and .filter()ing Path collections
  • creating replacement ASTs using AST node factory functions such as j.identifier()

We also discussed why it's a good idea to 'use it in a sentence' when trying to understand an AST.

Next time, we'll talk about Scopes.
