TURNING HTML INTO MARKDOWN WITH HTML-TO-MARKDOWN

I needed a way to grab HTML content from the web and convert it to markdown locally without letting my pi coding agent waste tokens parsing raw HTML. I found html-to-markdown, a Rust CLI tool that does exactly that. Here is how to install and use it.

2026 May 20

How I did it

The pi coding agent I use is pretty minimal by design. So every now and then I end up needing to create a skill or an extension to handle something specific.

One of the first things I needed was a way to grab HTML from the web (blog posts, library docs) and turn it into markdown locally with a CLI tool. I didn’t want the agent to waste tokens parsing raw HTML itself.

That’s when I found html-to-markdown. Here is what it is and how to use it.

What is html-to-markdown?

It does exactly what the name says: converts HTML to markdown.

It’s a Rust library with bindings in a bunch of languages (Python and TypeScript included), but it also comes with a CLI. You just call it standalone and pipe stuff through it.

Now whenever I have some HTML that needs to become markdown, this is my go-to.

How to install the CLI

You can install it through a few different package managers. I usually go with the one for whatever language the tool is written in, and since this one is Rust, that means cargo:

cargo install html-to-markdown-cli

Other installation options are here: https://docs.html-to-markdown.kreuzberg.dev/installation/

How to use it

The easiest way is with pipes. You pipe HTML text in, you get markdown out.

Pair it with curl and you can convert remote pages just like this:

curl -L https://mkennedy.codes/posts/python-numbers-every-programmer-should-know | html-to-markdown > /tmp/article.md

To wire this into pi, I created a skill that runs that exact command. Whenever the agent needs to read a URL, it pipes the page through curl + html-to-markdown first, so it only ever reads the markdown. This saves a lot of tokens in the long run.
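If you want the same pipeline as a reusable command, here is a minimal sketch of a bash helper a skill could call. The name `fetch_md` is hypothetical (it isn’t part of html-to-markdown or pi), and it assumes bash with html-to-markdown on your PATH. Compared to the one-liner, it adds `curl -fsS` so HTTP errors surface as failures instead of getting converted into an error-page markdown file, and `pipefail` so a curl failure makes the whole pipeline fail.

```shell
#!/usr/bin/env bash
# fetch_md URL [OUTFILE]
# Hypothetical helper: download URL with curl and convert the HTML to markdown
# with html-to-markdown. Prints to stdout, or writes OUTFILE if given.
# The body runs in a subshell ( ... ) so `set -o pipefail` does not leak into
# the caller's shell.
fetch_md() (
  set -o pipefail   # fail if curl fails, not only if html-to-markdown fails
  url="${1:?usage: fetch_md URL [OUTFILE]}"

  if ! command -v html-to-markdown >/dev/null 2>&1; then
    echo "fetch_md: html-to-markdown not found on PATH" >&2
    exit 127        # exits only the subshell, i.e. this function
  fi

  # -f: treat HTTP errors (4xx/5xx) as failures instead of converting the error page
  # -sS: hide the progress meter but still print errors
  # -L: follow redirects, as in the original command
  if [ -n "${2:-}" ]; then
    curl -fsSL "$url" | html-to-markdown > "$2"
  else
    curl -fsSL "$url" | html-to-markdown
  fi
)
```

Usage, with the same article as before: `fetch_md https://mkennedy.codes/posts/python-numbers-every-programmer-should-know /tmp/article.md`. Note that if curl fails, the output file may still be created (empty), but the function’s exit status tells you something went wrong.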

Verify the setup is correct

# Should show the installed version
html-to-markdown --version

# Quick test with inline HTML
echo "<h1>Hello</h1><p>World</p>" | html-to-markdown

You should see the markdown version of whatever HTML you piped in. The test above should output something like:

# Hello

World
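If you want to rerun this check later, say on a new machine, here’s a minimal sketch that wraps the same test in a bash function. The function name and messages are hypothetical, and since exact whitespace in the output may vary, it only looks for a `# Hello` heading line and the word `World` instead of comparing the full output.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the html-to-markdown CLI.
# Prints "html-to-markdown OK" and returns 0 if the conversion looks right.
check_html_to_markdown() {
  if ! command -v html-to-markdown >/dev/null 2>&1; then
    echo "html-to-markdown not found; install it with: cargo install html-to-markdown-cli" >&2
    return 1
  fi

  local out
  # Same inline HTML as the manual test
  out="$(echo '<h1>Hello</h1><p>World</p>' | html-to-markdown)" || return 1

  # Loose match: expect an ATX heading for "Hello" and the word "World"
  if grep -q '^# Hello' <<<"$out" && grep -q 'World' <<<"$out"; then
    echo "html-to-markdown OK"
  else
    echo "unexpected output:" >&2
    printf '%s\n' "$out" >&2
    return 1
  fi
}
```

Source the file and run `check_html_to_markdown`; a non-zero exit status means the tool is missing or produced something unexpected.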

Important notes

If you’re dealing with JavaScript-heavy pages, curl isn’t enough: it only fetches the HTML the server sends, not the content scripts render afterward, so you’ll need a browser automation tool like Playwright. I’ll probably create a pi extension in TypeScript, or look for a tool that gets the rendered HTML of those pages directly, if and when the need arises.
For code docs and most blog posts, curl + html-to-markdown has been enough for months now.
The tool strips images and media by default. You get clean markdown text only.