HTML to JSON

Welcome to the documentation for the html-to-json library — a small Python library for converting HTML (and HTML tables) to JSON.

📢 If this library is useful to you, please consider sponsoring the project.

Quick-Start

Install html-to-json:

pip install html-to-json

Use it:

import html_to_json

html_string = """<head>
    <title>Test site</title>
    <meta charset="UTF-8"></head>"""

# convert() returns a Python dictionary, not a JSON string
output_json = html_to_json.convert(html_string)
print(output_json)

Try it in your browser

No install required — run the real html-to-json library right in your browser via Pyodide. Edit some HTML, flip the options, and watch the JSON update live.

Open the interactive demo

Capabilities

What this library converts
  • Arbitrary HTML to a JSON-friendly Python dictionary (html_to_json.convert)
  • HTML tables to a list of row dictionaries (html_to_json.convert_tables), including:
    • Tables with headers in the first row
    • Tables with headers in the first column
    • Tables without headers

Configuration options

convert():

  • capture_element_values (default True) — capture the text inside each element under the _value key.
  • capture_element_attributes (default True) — capture each element's attributes under the _attributes key.

convert_tables():

  • record_html (default False) — capture each cell's inner HTML as a string. Useful for preserving links and other inline markup.
  • record_children (default False) — capture each cell's children as JSON (using the same shape produced by convert). If both flags are set, record_html wins.

Feedback

If you have ideas to improve this package, please open an issue!

Credits

This package was created with Cookiecutter and fhightower's Python project template.