Using jq with jless to explore json objects (Draft)

It is possible to use jless to quickly generate paths for jq which will target a given node in a json structure.

This is a really useful feature, but there are a couple of potential 'gotchas' which I'll try to describe here. I'm relatively new to jq, so what I say here should be taken with a pinch of salt, and I'd be very glad to hear any feedback about points that I might have got wrong. With that said, this is my current understanding of how to use jless with jq.

jless

jless is a command line tool for viewing json (and yaml) data. If you'd like to learn more, I have written a brief post about how I use jless, and there is also a helpful user guide on the jless website. When you're exploring a data structure with jless, you can use shortcut commands to generate paths to the currently focused node.

In this post I'm outlining how to use the output of the yq command.

jless with jq

According to the jless user guide the command yq copies "a jq style path that will select the currently focused node" to the clipboard.

This is true, but I find that it's perhaps not as clear as it could be. In theory, lots of different jq filters would select any given node; where they differ is in which other nodes they would also select.

My understanding is that the jq paths which jless generates are filters which jq can use to extract the currently focused value and any values occupying the same position in the json structure. We should think of structure here in a particular way: two values share a structural position if the path to one of them (such as would be generated by yp) can be turned into the path to the other by editing only the array indices.

It might be easier to illustrate with an example:

{
  "drinks": [
    {
      "name": "earl gray",
      "ingredients": ["earl gray tea bag", "lemon slice", "boiling water"]
    },
    {
      "name": "macchiato",
      "ingredients": ["espresso", "foamed milk"]
    }
  ]
}

Let's imagine that we've focused on the line containing the value 'lemon slice' and asked jless to generate a jq path for it. The path it generates is:

  .drinks[].ingredients[]

If you feed this path into jq, it does capture the value 'lemon slice'. But it will also capture 'earl gray tea bag', 'boiling water', 'espresso', and 'foamed milk'.

The JavaScript-style paths to the values captured by this jq filter are as follows:

 .drinks[0].ingredients[0]
 .drinks[0].ingredients[1]
 .drinks[0].ingredients[2]
 .drinks[1].ingredients[0]
 .drinks[1].ingredients[1]

We can see from this that the object or dictionary keys stay constant, but that the array indexes change. The jq filter generated by jless will capture any values which can be accessed by substituting different numbers into the path that gets us to our original value 'lemon slice'.
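For comparison, here is a minimal Python sketch of the same idea. It is not how jq or jless work internally; it just walks the first example object with two nested loops and prints the indexed path next to each value. The names DOC and indexed_ingredients are made up for this illustration.

```python
import json

# The first example object from this post, inlined so the sketch is self-contained.
DOC = json.loads("""
{
  "drinks": [
    {"name": "earl gray",
     "ingredients": ["earl gray tea bag", "lemon slice", "boiling water"]},
    {"name": "macchiato",
     "ingredients": ["espresso", "foamed milk"]}
  ]
}
""")


def indexed_ingredients(doc):
    """Yield (path, value) pairs for the values that .drinks[].ingredients[] selects.

    The keys "drinks" and "ingredients" stay fixed; only the two array
    indices (i and j) vary from one result to the next.
    """
    for i, drink in enumerate(doc["drinks"]):
        for j, ingredient in enumerate(drink["ingredients"]):
            yield f".drinks[{i}].ingredients[{j}]", ingredient


if __name__ == "__main__":
    for path, value in indexed_ingredients(DOC):
        print(path, "->", value)
```

Each [] in the jq filter corresponds to one of the loops, which is why every combination of indices shows up in the results.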

Collecting the values from a data structure which share this kind of 'structural' position is a common task, and I've found this to be a really useful shortcut. It's true that writing the same sort of query in JavaScript or Python would be fairly trivial, but it would still take a certain amount of work to load and parse the json data, work out the right path structure, write the loops, etc. When we're just exploring a dataset, rather than trying to build functionality around it, combining jless and jq is a very quick and easy way to generate insights.

A potential 'gotcha'

Having said this, there is at least one potential problem to watch out for when using jless to generate jq filters.

To illustrate, consider a slight variation on our original json object:

{
  "drinks": [
    {
      "name": "earl gray",
      "ingredients": ["earl gray tea bag", "lemon slice"]
    },
    {
      "name": "macchiato",
      "ingredients": ["espresso", "foamed milk"]
    },
    {
      "name": "custard"
    }
  ]
}

For whatever reason, the author of this object hasn't added any ingredients for 'custard'. Now if we focus on 'lemon slice' in jless and ask it for a jq path, we'll get the same path we did before: .drinks[].ingredients[]. But if we use this path for the new file which includes the 'custard' entry, jq will throw an error:

  jq: error (at <stdin>:21): Cannot iterate over null (null)

We've hit a familiar problem with json paths here. The 'custard' object has no ingredients key, so .ingredients evaluates to null for that entry, and the trailing [] then asks jq to iterate over null. jless doesn't appear to take this into account, so the path it gives us will only run without errors if, at each point in the tree that we're navigating, the key or array that it expects for the original case is also present on the other isomorphic branches.
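The same failure can be reproduced in a rough Python analogue, which may make the cause easier to see. This is only a sketch under the assumption that dict.get returning None is a fair stand-in for jq returning null for a missing key; the names drinks and strict_ingredients are invented for the example.

```python
# The 'custard' variation of the example data, as a Python structure.
drinks = [
    {"name": "earl gray", "ingredients": ["earl gray tea bag", "lemon slice"]},
    {"name": "macchiato", "ingredients": ["espresso", "foamed milk"]},
    {"name": "custard"},
]


def strict_ingredients(drinks):
    """Collect every ingredient, failing loudly if a drink has none listed."""
    found = []
    for drink in drinks:
        # A missing key gives None here, much as a missing key gives null in jq.
        ingredients = drink.get("ingredients")
        # Iterating over None raises TypeError, like jq refusing to iterate over null.
        for ingredient in ingredients:
            found.append(ingredient)
    return found


if __name__ == "__main__":
    try:
        strict_ingredients(drinks)
    except TypeError as err:
        print("failed on the custard entry:", err)
```

Looking up the missing key is not what fails; the error only appears once we try to loop over the result.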

The good news is that jq has a built-in option for avoiding these kinds of errors. Much like optional chaining in JavaScript, you can put a ? after a path segment and jq will not throw errors if it doesn't find what it's expecting for that segment on any given branch.

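In this case, we can modify the query as follows (the ? makes the final [] optional, and the surrounding square brackets collect the results into a single array):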

 cat jq-break-test.json | jq '[.drinks[].ingredients[]?]'

This gives us something more like what we probably want here:

["earl gray tea bag", "lemon slice", "espresso", "foamed milk"]
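In Python terms, the effect of the trailing ? is roughly "skip this branch if there is nothing to iterate over". The sketch below is a simplification, not a full model of jq: it only treats lists as iterable, whereas jq's [] also iterates over the values of an object. The names drinks and lenient_ingredients are invented for the example.

```python
# The 'custard' variation of the example data, as a Python structure.
drinks = [
    {"name": "earl gray", "ingredients": ["earl gray tea bag", "lemon slice"]},
    {"name": "macchiato", "ingredients": ["espresso", "foamed milk"]},
    {"name": "custard"},
]


def lenient_ingredients(drinks):
    """Collect ingredients, silently skipping drinks that have no list of them."""
    found = []
    for drink in drinks:
        ingredients = drink.get("ingredients")  # None for the custard entry
        if isinstance(ingredients, list):       # rough stand-in for the trailing ?
            found.extend(ingredients)
    return found


if __name__ == "__main__":
    print(lenient_ingredients(drinks))
```

The custard entry simply contributes nothing to the result, which matches the jq output above.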

It's tempting to just make all path segments optional in this way, to avoid errors. But it's worth bearing in mind that key and index errors can be a useful check on whether your assumptions about a json structure are accurate, so for some purposes it will be worth using as few ? operators as you can get away with.

I hope that helps if you want to try out using jless with jq to explore json data. Questions, corrections, or suggestions for alternative ways to approach these kinds of tasks are very welcome!