The pi coding agent I use is pretty minimal by design. So every now and then I end up needing to create a skill or an extension to handle something specific.
One of the first things I needed was a way to grab HTML from the web (blog posts, library docs) and turn it into markdown locally with a CLI tool. I didn’t want the agent to waste tokens parsing raw HTML itself.
That’s when I found html-to-markdown. Here is what it is and how to use it.
It does exactly what the name says: converts HTML to markdown.
It’s a Rust library with bindings in a bunch of languages (Python and TypeScript included), but it also comes with a CLI. You just call it standalone and pipe stuff through it.
Now whenever I have some HTML that needs to become markdown, this is my go-to.
You can install it through a few different package managers. I just go with whatever the tool’s written in, and since this one is Rust, that means cargo:
cargo install html-to-markdown-cli
Other installation options are here: https://docs.html-to-markdown.kreuzberg.dev/installation/
The easiest way to use it is with pipes. You pipe HTML text in, you get markdown out.
Pair it with curl and you can convert remote pages just like this:
curl -L https://mkennedy.codes/posts/python-numbers-every-programmer-should-know | html-to-markdown > /tmp/article.md
To wire this into pi, I just created a skill that runs that exact command. The agent takes a URL, fetches it with curl, and pipes the HTML through html-to-markdown before doing anything else. This saves a lot of tokens in the long run, because the agent only ever reads the markdown, never the raw HTML.
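If you'd rather not repeat the one-liner everywhere, the same pipeline can be wrapped in a small shell function. This is only a minimal sketch, not the actual skill: the name `fetch_md` and the default output path are assumptions made up for illustration. It also adds curl's `-f` flag so an HTTP error doesn't get silently converted into a markdown file.

```shell
# Hypothetical helper (the name and the default path are illustrative assumptions).
# Usage: fetch_md URL [OUTPUT_FILE]
# The body runs in a subshell, so the variables below don't leak into your session.
fetch_md() (
  url="${1:-}"
  out="${2:-/tmp/article.md}"

  # Refuse to run without a URL.
  if [ -z "$url" ]; then
    echo "usage: fetch_md URL [OUTPUT_FILE]" >&2
    return 2
  fi

  # -f: fail on HTTP errors, -sS: no progress bar but still print errors,
  # -L: follow redirects. Bail out before converting if the download fails.
  html="$(curl -fsSL "$url")" || return 1

  # Same conversion step as the one-liner: HTML on stdin, markdown on stdout.
  printf '%s\n' "$html" | html-to-markdown > "$out"
)
```

Calling `fetch_md https://mkennedy.codes/posts/python-numbers-every-programmer-should-know` would then write the converted page to `/tmp/article.md`, and a failed download returns a non-zero status instead of producing an empty or misleading file.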
To check that the install worked:
# Should show the installed version
html-to-markdown --version
# Quick test with inline HTML
echo "<h1>Hello</h1><p>World</p>" | html-to-markdown
You should see the markdown version of whatever HTML you piped in. The test above should output something like:
# Hello
World
If curl isn’t enough because you’re dealing with JavaScript-heavy pages, you’ll need something like Playwright. I’ll probably create a pi extension in TypeScript or search for a tool that gets the HTML of those pages directly, if and when the need arises. So far, curl + html-to-markdown has been enough for months.