HTML to JSON
Welcome to the documentation for the html-to-json library — a small Python library for converting HTML (and HTML tables) to JSON.
📢 If this library is useful to you, please consider sponsoring the project.
Quick-Start
Install html-to-json:
```shell
pip install html-to-json
```
Use it:
```python
import html_to_json

html_string = """<head>
<title>Test site</title>
<meta charset="UTF-8"></head>"""

# convert() parses the HTML and returns a JSON-friendly Python dict
output_json = html_to_json.convert(html_string)
print(output_json)
```
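`convert()` returns a Python dictionary rather than a JSON string, so `json.dumps` is the usual way to turn it into JSON text. To make the dictionary's shape concrete without installing anything, here is a stdlib-only sketch of the kind of mapping `convert()` performs. It is not the library's code: `sketch_convert` and `_TreeBuilder` are hypothetical names, and the nesting they build (each tag name maps to a list of element dicts, with text under `_value` and attributes under `_attributes`) is an assumption based on the keys described under Configuration options.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List

# Hypothetical, stdlib-only sketch; NOT html_to_json's implementation.
# Assumed shape: tag name -> list of element dicts; text under "_value",
# attributes under "_attributes".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.root: Dict[str, Any] = {}
        self.stack: List[Dict[str, Any]] = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        element: Dict[str, Any] = {}
        if attrs:
            element["_attributes"] = dict(attrs)
        # Sibling elements with the same tag name share one list.
        self.stack[-1].setdefault(tag, []).append(element)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Naive: assumes well-formed nesting and does not check the tag name.
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and len(self.stack) > 1:  # skip whitespace and top-level text
            element = self.stack[-1]
            element["_value"] = (element.get("_value", "") + " " + text).strip()


def sketch_convert(html_string: str) -> Dict[str, Any]:
    builder = _TreeBuilder()
    builder.feed(html_string)
    builder.close()
    return builder.root


html_string = """<head>
<title>Test site</title>
<meta charset="UTF-8"></head>"""

# json.dumps turns the dict into JSON text.
print(json.dumps(sketch_convert(html_string), indent=2))
```

In real code, call `html_to_json.convert` instead; this sketch does not handle unclosed or mis-nested tags, for example.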
Try it in your browser (INTERACTIVE!)
No install required — run the real html-to-json library right in your browser via Pyodide. Edit some HTML, flip the options, and watch the JSON update live.
Capabilities
What this library converts
- Arbitrary HTML to a JSON-friendly Python dictionary (`html_to_json.convert`)
- HTML tables to a list of row dictionaries (`html_to_json.convert_tables`), including:
    - Tables with headers in the first row
    - Tables with headers in the first column
    - Tables without headers
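The two header layouts listed above can be pictured with a small stdlib-only sketch. It is not html-to-json's implementation: `table_to_grid`, `header_row_records`, and `header_column_records` are hypothetical names, the caller picks the layout explicitly, and turning each data column into one record for header-column tables is this sketch's own choice, not necessarily what the library returns. Header-less tables are left out.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Hypothetical, stdlib-only sketch; NOT html_to_json's implementation.


class _TableGrid(HTMLParser):
    """Collects the stripped text of each <th>/<td> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell, self._cell = True, []

    def handle_endtag(self, tag):
        # Assumes explicit </td>/</th> end tags inside a <tr>.
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1].append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def table_to_grid(html: str) -> List[List[str]]:
    parser = _TableGrid()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


def header_row_records(grid: List[List[str]]) -> List[Dict[str, str]]:
    """Headers in the first row: one dict per remaining row."""
    headers, *body = grid
    return [dict(zip(headers, row)) for row in body]


def header_column_records(grid: List[List[str]]) -> List[Dict[str, str]]:
    """Headers in the first column: one dict per remaining column."""
    labels = [row[0] for row in grid]
    columns = zip(*(row[1:] for row in grid))
    return [dict(zip(labels, column)) for column in columns]


rows_html = ("<table><tr><th>Name</th><th>Age</th></tr>"
             "<tr><td>Ada</td><td>36</td></tr></table>")
print(header_row_records(table_to_grid(rows_html)))  # [{'Name': 'Ada', 'Age': '36'}]
```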
Configuration options
`convert()`:

- `capture_element_values` (default `True`): capture the text inside each element under the `_value` key.
- `capture_element_attributes` (default `True`): capture each element's attributes under the `_attributes` key.
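Because these flags decide whether the `_value` and `_attributes` keys appear, code that reads `convert()` output can use `.get()` so it keeps working with either flag turned off (assuming a disabled flag simply omits its key). A minimal sketch of such a reader follows; `find_elements` is a hypothetical helper, and the `converted` literal is written by hand in an assumed shape (each tag name maps to a list of element dicts), not captured from the library.

```python
from typing import Any, Dict, Iterator

# Keys that hold element data rather than child elements.
META_KEYS = {"_value", "_attributes"}


def find_elements(tree: Dict[str, Any], tag: str) -> Iterator[Dict[str, Any]]:
    """Yield every element dict stored under `tag`, at any depth (hypothetical helper)."""
    for key, children in tree.items():
        if key in META_KEYS:
            continue
        for child in children:
            if key == tag:
                yield child
            yield from find_elements(child, tag)


# Hand-written data in the assumed output shape (not real library output).
converted = {
    "head": [{
        "title": [{"_value": "Test site"}],
        "meta": [{"_attributes": {"charset": "UTF-8"}}],
    }]
}

# .get() tolerates a missing key, e.g. when a capture flag is turned off.
titles = [el.get("_value") for el in find_elements(converted, "title")]
charsets = [el.get("_attributes", {}).get("charset")
            for el in find_elements(converted, "meta")]
print(titles, charsets)  # ['Test site'] ['UTF-8']
```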
`convert_tables()`:

- `record_html` (default `False`): capture each cell's inner HTML as a string. Useful for preserving links and other inline markup.
- `record_children` (default `False`): capture each cell's children as JSON, using the same shape produced by `convert()`.

If both flags are set, `record_html` wins.
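Since `record_html` keeps a cell's inner HTML as a string, links can be recovered afterwards with a second pass over that string. A minimal stdlib sketch, assuming you already have one cell's HTML string in hand (where that string sits in the `convert_tables()` output is not spelled out here, so the example uses a literal); `extract_links` is a hypothetical helper, not part of html-to-json.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _LinkCollector(HTMLParser):
    """Records (link text, href) for every <a> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, Optional[str]]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")  # None if the <a> has no href
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)


def extract_links(cell_html: str) -> List[Tuple[str, Optional[str]]]:
    parser = _LinkCollector()
    parser.feed(cell_html)
    parser.close()
    return parser.links


cell_html = 'See <a href="https://example.com/docs">the docs</a> or <a href="/faq">FAQ</a>.'
print(extract_links(cell_html))  # [('the docs', 'https://example.com/docs'), ('FAQ', '/faq')]
```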
Feedback
If you have ideas to improve this package, please open an issue!
Credits
This package was created with Cookiecutter and fhightower's Python project template.