Introducing HTTPQL: A New Query Language for Hackers
Emile Fugulin • January 17, 2024
A common request we received from users in the last few months was to improve our search functionality. Our old interface was decent for simple cases but fell short the moment you wanted to do some more serious filtering. It was also not super clear what "All of" versus "Any of" would do.
When we started designing the new filtering system, we looked at existing solutions we liked and how we could take inspiration from them. We quickly eliminated UI-based filters in favour of text-based ones, mainly because it is so much faster to type than to click on a bunch of buttons.
We are committed to making Caido more keyboard friendly in the coming months. Please reach out if you have ideas!
One of the existing systems we looked at is Wireshark's popular and powerful filter language.
Another one, closer to our space, is the Burp Suite extension Logger++.
Just from those two systems, we can already draw some conclusions on what the system should include:
- The concept of operand, operator and value
- Some form of boolean logic (and, or, parentheses)
- A way to save and re-use filters (maybe even nesting)
In this sense, the new search interface for Caido does look familiar but also contains some new features.
The specification
As far as we are aware, there is no common specification for a query language that operates on HTTP. We are humbly trying to change that with HTTPQL. It was obviously modeled on the needs and limits of our existing backend, so it would likely need to evolve before it could hope to become an industry standard.
We won't go over the whole specification in this blog post; the Caido documentation covers it in more detail.
In HTTPQL, each clause contains a namespace, a field, an operator and a value. Right now, the available namespaces are req (request) and resp (response). Each has fields that are parsed from the data.
The operators available depend on the type of the value. For text and bytes, we currently offer eq (equals), like (the SQL LIKE operator) and cont (contains), as well as their negated counterparts (neq, nlike and ncont).
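To make the clause structure concrete, here is a minimal Python sketch of how a single clause could be evaluated against one request/response pair. It is not Caido's implementation: the `Clause` class, the `matches` helper, the record shape and the field names `path` and `method` are all hypothetical. Matching is case-sensitive here as a simplification, which may differ from how the real operators behave.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    namespace: str  # "req" or "resp"
    field: str      # hypothetical field name, e.g. "path"
    operator: str   # eq, like, cont, or a negated form (neq, nlike, ncont)
    value: str


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate SQL LIKE wildcards: % matches any run, _ matches one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


# The three positive operators; negated ones are derived from them below.
POSITIVE = {
    "eq": lambda actual, value: actual == value,
    "cont": lambda actual, value: value in actual,
    "like": lambda actual, value: like_to_regex(value).fullmatch(actual) is not None,
}


def matches(clause: Clause, record: dict) -> bool:
    """Evaluate one clause against a record shaped like {"req": {...}, "resp": {...}}."""
    actual = record[clause.namespace][clause.field]
    op = clause.operator
    if op in POSITIVE:
        return POSITIVE[op](actual, clause.value)
    if op.startswith("n") and op[1:] in POSITIVE:
        return not POSITIVE[op[1:]](actual, clause.value)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical record with made-up field names.
record = {"req": {"method": "POST", "path": "/api/users"}, "resp": {}}
path_has_api = matches(Clause("req", "path", "cont", "/api"), record)
path_like_users = matches(Clause("req", "path", "like", "%users"), record)
method_not_get = matches(Clause("req", "method", "neq", "GET"), record)
```

Deriving each negated operator from its positive counterpart keeps the two in sync, which is one simple way to guarantee that `ncont` is always the exact opposite of `cont`.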
Clauses are then combined with logical operators to form a query.
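As an illustration of that combination step, a query can be pictured as a small tree whose leaves are clauses and whose inner nodes are logical operators. The sketch below is written in Python for brevity and is not taken from Caido's code: `Leaf`, `And`, `Or` and `evaluate` are hypothetical names, and the record keys are made up.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Leaf:
    test: Callable[[dict], bool]  # stands in for a single clause


@dataclass(frozen=True)
class And:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Or:
    left: "Node"
    right: "Node"


Node = Union[Leaf, And, Or]


def evaluate(node: Node, record: dict) -> bool:
    """Walk the tree; parentheses in the query text become nesting here."""
    if isinstance(node, Leaf):
        return node.test(record)
    if isinstance(node, And):
        return evaluate(node.left, record) and evaluate(node.right, record)
    if isinstance(node, Or):
        return evaluate(node.left, record) or evaluate(node.right, record)
    raise TypeError(f"unexpected node: {node!r}")


# (method is POST or method is PUT) and status code is 200
query = And(
    Or(
        Leaf(lambda r: r["req.method"] == "POST"),
        Leaf(lambda r: r["req.method"] == "PUT"),
    ),
    Leaf(lambda r: r["resp.code"] == 200),
)

put_ok = evaluate(query, {"req.method": "PUT", "resp.code": 200})
get_ok = evaluate(query, {"req.method": "GET", "resp.code": 200})
```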
We also added the ability to save HTTPQL queries (we call them presets) and reuse them in other queries. This is super useful, for example, to quickly eliminate some unwanted requests like images, styles, ads tracking, etc.
For example, a no-images preset is expanded into a bunch of nested clauses automatically, without polluting your original query.
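Preset expansion can be sketched as a substitution over the query tree. The Python below is only an illustration under stated assumptions: queries are modelled as nested tuples, the `expand` function and the `ext` field are hypothetical, and letting presets reference other presets (with a cycle check) is our own guess rather than documented behaviour.

```python
# Hypothetical tree encoding:
#   ("clause", namespace, field, operator, value)
#   ("and" | "or", left, right)
#   ("preset", name)


def expand(node: tuple, presets: dict, seen: tuple = ()) -> tuple:
    """Return a copy of the tree with every preset reference replaced by its body."""
    kind = node[0]
    if kind == "preset":
        name = node[1]
        if name in seen:
            raise ValueError(f"preset cycle through {name!r}")
        return expand(presets[name], presets, seen + (name,))
    if kind in ("and", "or"):
        return (kind, expand(node[1], presets, seen), expand(node[2], presets, seen))
    return node  # plain clause, nothing to expand


presets = {
    "no-images": (
        "and",
        ("clause", "req", "ext", "ncont", ".png"),
        ("clause", "req", "ext", "ncont", ".jpg"),
    ),
}

query = ("and", ("preset", "no-images"), ("clause", "req", "method", "eq", "POST"))
expanded = expand(query, presets)
```

Because `expand` builds a new tree, the query the user typed keeps its short preset reference while the expanded form is what gets evaluated.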
Finally, when you just want to search for a certain string everywhere, we offer a shortcut to do just that!
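One plausible way to picture this shortcut is as sugar for an "or" over the whole request and the whole response. The sketch below assumes exactly that, which the post does not spell out; the `raw` field name, the tuple encoding and both helper functions are hypothetical, and strings are used instead of bytes to keep it short.

```python
def expand_shortcut(text: str) -> tuple:
    """Assumed expansion: bare string -> contained in raw request OR raw response."""
    return (
        "or",
        ("clause", "req", "raw", "cont", text),
        ("clause", "resp", "raw", "cont", text),
    )


def search_everywhere(text: str, record: dict) -> bool:
    """Evaluate the expanded shortcut against one request/response pair."""
    _, left, right = expand_shortcut(text)
    return any(
        value in record[namespace][field]
        for (_, namespace, field, _, value) in (left, right)
    )


record = {
    "req": {"raw": "GET /admin HTTP/1.1"},
    "resp": {"raw": "HTTP/1.1 403 Forbidden"},
}
found_in_response = search_everywhere("Forbidden", record)
found_nowhere = search_everywhere("token", record)
```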
Under the hood
We know some of you might be wondering how we make this new search feature work. Here are some of the juicy details. As you might know, the Caido Proxy has a clear frontend/backend separation. This means the query is first parsed on the frontend using Lezer (here is the grammar) into an Abstract Syntax Tree (AST).
It is then converted to a tree of GraphQL objects to be transmitted to the Caido Proxy backend. We don't hide this schema; you can even use it yourself directly in Caido!
Once the query reaches the backend, it is transformed again into Rust objects. We then do two final transformations:
- One for SQL clauses to fetch the existing data in the SQLite database of the current project.
- One for Rust clauses to filter new requests that come through the proxy. We send those that pass the filter to the frontend using a websocket.
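The idea of compiling one query tree into two targets can be sketched as follows. This is a Python stand-in for what the backend does in Rust, not the actual code: the table layout, the column names, the tuple encoding and the `to_sql` / `to_predicate` helpers are all assumptions made for the example, and only four operators are covered.

```python
import sqlite3

# Hypothetical tree: ("clause", column, operator, value) or ("and" | "or", left, right)
COLUMNS = {"method", "path"}  # whitelist, since column names cannot be bound as parameters

SQL_OPS = {
    "eq": "{col} = ?",
    "neq": "{col} <> ?",
    "cont": "instr({col}, ?) > 0",
    "ncont": "instr({col}, ?) = 0",
}

PY_OPS = {
    "eq": lambda actual, value: actual == value,
    "neq": lambda actual, value: actual != value,
    "cont": lambda actual, value: value in actual,
    "ncont": lambda actual, value: value not in actual,
}


def to_sql(node: tuple) -> tuple:
    """Compile the tree to a WHERE fragment plus bound parameters (stored data)."""
    kind = node[0]
    if kind == "clause":
        _, col, op, value = node
        if col not in COLUMNS:
            raise ValueError(f"unknown column: {col}")
        return SQL_OPS[op].format(col=col), [value]
    left_sql, left_params = to_sql(node[1])
    right_sql, right_params = to_sql(node[2])
    return f"({left_sql} {kind.upper()} {right_sql})", left_params + right_params


def to_predicate(node: tuple):
    """Compile the same tree to an in-memory filter (new requests as they arrive)."""
    kind = node[0]
    if kind == "clause":
        _, col, op, value = node
        test = PY_OPS[op]
        return lambda row: test(row[col], value)
    left, right = to_predicate(node[1]), to_predicate(node[2])
    if kind == "and":
        return lambda row: left(row) and right(row)
    return lambda row: left(row) or right(row)


rows = [
    {"method": "GET", "path": "/index.html"},
    {"method": "POST", "path": "/api/login"},
    {"method": "POST", "path": "/static/app.js"},
]
query = ("and", ("clause", "method", "eq", "POST"), ("clause", "path", "cont", "/api"))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE requests (method TEXT, path TEXT)")
db.executemany("INSERT INTO requests VALUES (:method, :path)", rows)

where, params = to_sql(query)
stored = [r[0] for r in db.execute(f"SELECT path FROM requests WHERE {where}", params)]

keep = to_predicate(query)
live = [row["path"] for row in rows if keep(row)]
```

Both targets are generated from the same tree, so the history loaded from the database and the requests streamed afterwards are filtered by the same logic, as long as each operator is given equivalent semantics on both sides.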
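With those two transformations, we are ready to serve you the data and new requests as they come in. The whole pipeline looks like this: the query text is parsed into an AST on the frontend, converted to GraphQL objects, sent to the backend, transformed into Rust objects, and finally turned into SQL clauses for the stored data and Rust clauses for live traffic.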
What's next?
We have a lot of ideas on how we want to continue improving the search experience in Caido. Our Pro users should see a Caido Assistant integration this year to help them craft complex queries faster. We also want to add a regex operator to provide even more advanced search functionality.
If you have ideas, feel free to share them with us using our GitHub issue tracker or directly on Discord!