Tools, JSON, Automation, Open Source

I Built Two Browser-Based Converters: Document ↔ JSON and Spreadsheet ↔ JSON

I just released two small utilities I've been using in my own workflow: a Document ↔ JSON converter and a Spreadsheet ↔ JSON converter.

Both run entirely in your browser. Your files never leave your machine.


Why JSON?

If you've ever tried to automate anything involving Word documents or Excel spreadsheets, you've run into the same wall: .docx and .xlsx files are zipped XML packages that behave like opaque binaries. You can't grep them, you can't get a meaningful diff out of git, and you can't pass them directly to an API or LLM without a heavy library.

JSON changes that.

JSON is text. That means you can read it, edit it, diff it, commit it to git, and pipe it through any tool in any language with nothing more than a standard JSON parser.

JSON is universal. Python, JavaScript, and Ruby ship a JSON library in their standard distribution, and most other languages, Rust included, are one well-known package away. Once your document is JSON, the entire ecosystem is available to you.

JSON is the lingua franca of APIs. LLM APIs and web services speak JSON, and most databases can store or import it. If you want to feed document content to GPT-4 or Claude, you don't hand it a .docx file. You hand it text, ideally structured text. That's exactly what these tools produce.

JSON is roundtrippable. This is the part most converters miss. Converting to JSON is only useful if you can go back. Both tools support bidirectional conversion — so you can edit the JSON programmatically and regenerate the original file format.


What the converters produce

Document → JSON

A .docx file becomes a structured object:

{
  "fileName": "report.docx",
  "convertedAt": "2026-03-01T...",
  "content": [
    { "type": "heading", "level": 1, "text": "Quarterly Report" },
    { "type": "paragraph", "text": "Revenue increased by...", "runs": [...] },
    { "type": "list-item", "ordered": false, "level": 1, "text": "Key finding" },
    { "type": "table", "rows": [["Q1", "¥1.2M"], ["Q2", "¥1.8M"]] }
  ]
}

Every block is typed — headings with their level, paragraphs with inline formatting runs (bold, italic, underline), list items with nesting level, and tables as row/column arrays. This structure is designed to be easy to process programmatically.
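
As a rough illustration of what "easy to process programmatically" can look like, here is a minimal Python sketch that walks the content array and builds a heading outline plus a count of block types. It assumes the output matches the sample above. The function names outline and count_block_types are made up for this example, and the "runs" field is left out because its exact shape isn't specified here.

```python
import json

# Sample input mirroring the structure shown above. "runs" is omitted
# because its exact shape is not specified in this post.
SAMPLE = """
{
  "fileName": "report.docx",
  "content": [
    {"type": "heading", "level": 1, "text": "Quarterly Report"},
    {"type": "paragraph", "text": "Revenue increased."},
    {"type": "list-item", "ordered": false, "level": 1, "text": "Key finding"},
    {"type": "table", "rows": [["Q1", "¥1.2M"], ["Q2", "¥1.8M"]]}
  ]
}
"""


def outline(doc: dict) -> list[str]:
    """Return one line per heading block, indented by heading level."""
    lines = []
    for block in doc.get("content", []):
        if block.get("type") == "heading":
            indent = "  " * (block.get("level", 1) - 1)
            lines.append(f"{indent}{block['text']}")
    return lines


def count_block_types(doc: dict) -> dict[str, int]:
    """Count how many blocks of each type the document contains."""
    counts: dict[str, int] = {}
    for block in doc.get("content", []):
        kind = block.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


doc = json.loads(SAMPLE)
print(outline(doc))
print(count_block_types(doc))
```

Because every block carries a type, filters like these stay a few lines long and don't need to know anything about the .docx format itself.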

Spreadsheet → JSON

A single-sheet .xlsx becomes a flat array of row objects. A multi-sheet file becomes an object keyed by sheet name:

{
  "Sales": [
    { "Month": "Jan", "Revenue": 120000, "Region": "Tokyo" },
    { "Month": "Feb", "Revenue": 145000, "Region": "Osaka" }
  ],
  "Summary": [
    { "Total": 265000, "YoY": "+12%" }
  ]
}

Column headers become object keys. Every row becomes an object. The result slots directly into any script, API call, or database import.
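
Since the output shape differs between single-sheet and multi-sheet files, a script that consumes it may want to normalize first. The sketch below is one way to do that in Python, assuming the two shapes described above. The helper names sheets_of and column_total, and the fallback sheet name "Sheet1", are hypothetical choices for this example.

```python
def sheets_of(data, default_name="Sheet1"):
    """Normalize converter output to {sheet_name: [row, ...]}.

    A bare list is treated as a single sheet; default_name is a
    hypothetical label, since a flat array carries no sheet name.
    """
    if isinstance(data, list):
        return {default_name: data}
    return dict(data)


def column_total(rows, column):
    """Sum a numeric column, skipping rows where the value is not a number."""
    return sum(
        row[column]
        for row in rows
        if isinstance(row.get(column), (int, float))
    )


# Same data as the multi-sheet sample above.
workbook = {
    "Sales": [
        {"Month": "Jan", "Revenue": 120000, "Region": "Tokyo"},
        {"Month": "Feb", "Revenue": 145000, "Region": "Osaka"},
    ],
    "Summary": [
        {"Total": 265000, "YoY": "+12%"},
    ],
}

sales_total = column_total(sheets_of(workbook)["Sales"], "Revenue")
print(sales_total)
```

After normalization, the rest of the script can treat every file as a mapping of sheet names to rows, whichever shape the converter produced.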


Practical use cases

Document processing pipelines — convert a batch of reports to JSON, extract specific sections with jq or Python, feed the relevant text to an LLM for summarization or translation, then reconstruct the document.
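
The "extract specific sections" step could look something like the following Python sketch. It assumes a section is everything between a heading and the next heading of the same or a higher level, which is my reading of the block structure rather than something the converter enforces. The function name section_text is made up for this example.

```python
def section_text(content, heading):
    """Collect the text of blocks under `heading`.

    Collection stops at the next heading whose level is the same as or
    higher than the matched heading. Blocks without a "text" field
    (tables, for example) are skipped.
    """
    collected = []
    current_level = None
    for block in content:
        is_heading = block.get("type") == "heading"
        if current_level is None:
            # Still searching for the heading that opens the section.
            if is_heading and block.get("text") == heading:
                current_level = block.get("level", 1)
            continue
        if is_heading and block.get("level", 1) <= current_level:
            break
        if "text" in block:
            collected.append(block["text"])
    return collected


content = [
    {"type": "heading", "level": 1, "text": "Quarterly Report"},
    {"type": "heading", "level": 2, "text": "Findings"},
    {"type": "paragraph", "text": "Revenue increased."},
    {"type": "list-item", "ordered": False, "level": 1, "text": "Key finding"},
    {"type": "heading", "level": 2, "text": "Outlook"},
    {"type": "paragraph", "text": "Stable."},
]

findings = section_text(content, "Findings")
print(findings)
```

The returned list of strings is what you would join and pass on to a summarization or translation step.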

Spreadsheet data in code — pull data from .xlsx into a Node.js or Python script without installing a heavy spreadsheet library. One conversion, then standard JSON handling.

AI-assisted editing — convert a document to JSON, send the content array to an LLM with editing instructions, write the modified JSON back to .docx. The structured format makes it easy for models to operate on specific blocks without touching the rest.
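
To make "operate on specific blocks without touching the rest" concrete, here is a small Python sketch that applies text edits by block index. The edits dictionary stands in for whatever a model returns, and no particular LLM API is assumed. The function name apply_text_edits is hypothetical, and how the converter reconciles an edited "text" with an existing "runs" array isn't covered here, so treat that part as an open assumption.

```python
import copy

TEXT_BLOCKS = {"heading", "paragraph", "list-item"}


def apply_text_edits(doc, edits):
    """Return a copy of `doc` with content[i]["text"] replaced per `edits`.

    `edits` maps a block index to its new text. Blocks that are not
    listed are left exactly as they were. Note: a block's "runs" (if
    any) are not updated here, which is an assumption to revisit.
    """
    updated = copy.deepcopy(doc)
    for index, new_text in edits.items():
        block = updated["content"][index]
        if block.get("type") not in TEXT_BLOCKS:
            raise ValueError(f"block {index} is not a text block")
        block["text"] = new_text
    return updated


doc = {
    "fileName": "report.docx",
    "content": [
        {"type": "heading", "level": 1, "text": "Quarterly Report"},
        {"type": "paragraph", "text": "Revenue increased."},
        {"type": "table", "rows": [["Q1", "¥1.2M"], ["Q2", "¥1.8M"]]},
    ],
}

# Stand-in for a model response: rewrite only the paragraph at index 1.
edits = {1: "Revenue rose 12% year over year."}
edited = apply_text_edits(doc, edits)
print(edited["content"][1]["text"])
```

Working on a deep copy keeps the original JSON available for comparison before anything is written back to .docx.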

Git-tracking document content — store the JSON alongside the source file in your repository. Now git diff actually tells you what changed between versions.
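
For diffs to stay readable, it helps to serialize the JSON the same way every time. The Python sketch below shows one possible approach: sorted keys, fixed indentation, and the convertedAt field dropped, since it would otherwise change on every conversion. The function names and the sample timestamps are invented for illustration.

```python
import difflib
import json

# Fields that change on every conversion and only add noise to a diff.
VOLATILE_KEYS = ("convertedAt",)


def canonical(doc: dict) -> str:
    """Serialize with sorted keys and fixed indentation, minus volatile fields."""
    stable = {k: v for k, v in doc.items() if k not in VOLATILE_KEYS}
    return json.dumps(stable, indent=2, sort_keys=True, ensure_ascii=False)


def content_diff(old: dict, new: dict) -> list[str]:
    """Return a unified diff of two converted documents as a list of lines."""
    return list(
        difflib.unified_diff(
            canonical(old).splitlines(),
            canonical(new).splitlines(),
            fromfile="old.json",
            tofile="new.json",
            lineterm="",
        )
    )


# Made-up sample versions; the timestamps are illustrative only.
old = {
    "fileName": "report.docx",
    "convertedAt": "2026-03-01T00:00:00Z",
    "content": [{"type": "paragraph", "text": "Revenue increased."}],
}
new = {
    "fileName": "report.docx",
    "convertedAt": "2026-03-02T00:00:00Z",
    "content": [{"type": "paragraph", "text": "Revenue rose 12%."}],
}

diff = content_diff(old, new)
print("\n".join(diff))
```

Committing the canonical form rather than the raw converter output means git diff shows only the blocks that actually changed.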

Data migration — export spreadsheet data to JSON as a clean intermediate step when moving between systems.
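
As one example of JSON as an intermediate step, this Python sketch loads converted rows into SQLite using only the standard library. It assumes every row has the same keys as the first row and that table and column names come from a trusted file, because the identifier quoting is deliberately simplistic. The function name load_rows is hypothetical.

```python
import sqlite3


def load_rows(conn, table, rows):
    """Create `table` from the first row's keys and insert every row.

    Assumes trusted table and column names that contain no double
    quotes. Values are passed as bound parameters.
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [tuple(row.get(name) for name in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


# Rows as they appear in the "Sales" sheet of the sample above.
sales = [
    {"Month": "Jan", "Revenue": 120000, "Region": "Tokyo"},
    {"Month": "Feb", "Revenue": 145000, "Region": "Osaka"},
]

conn = sqlite3.connect(":memory:")
inserted = load_rows(conn, "Sales", sales)
total = conn.execute('SELECT SUM("Revenue") FROM "Sales"').fetchone()[0]
print(inserted, total)
```

The same rows could just as easily go to another target system. SQLite is used here only because it needs no setup.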


No server, no uploads

Both converters run entirely in the browser using:

Your file is processed locally by the browser's JavaScript engine. Nothing is sent to any server.


Try them

Both tools are open source and free to use. I built them primarily for my own automation work — but if they're useful to you, all the better.

If you're building document or spreadsheet pipelines and want something more tailored to your workflow, reach out — that's the kind of problem I enjoy working on.
