gh-131535: Fix stale example in html.parser docs, make examples doctests (GH-131551)

Brian Schubert 2025-05-07 11:50:05 -04:00 committed by GitHub
parent 77b14a6d58
commit ee76e36d76
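
Reviewer note: the character-reference example in this diff now shows two behaviors, one for the default parser and one for `convert_charrefs=False`. The sketch below is one way to check both outside the doc build. `EventCollector` and `collect` are hypothetical helpers written for this note and are not part of the commit; only `HTMLParser`, its handler methods, and `name2codepoint` come from the standard library.

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records handler calls instead of printing them."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False, e.g. for "&gt;".
        self.events.append(("entityref", chr(name2codepoint[name])))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False, e.g. for "&#62;" or "&#x3E;".
        if name.startswith(("x", "X")):
            char = chr(int(name[1:], 16))
        else:
            char = chr(int(name))
        self.events.append(("charref", char))


def collect(markup, **kwargs):
    """Hypothetical helper: parse markup and return the recorded events."""
    parser = EventCollector(**kwargs)
    parser.feed(markup)
    parser.close()
    return parser.events


print(collect("&gt;&#62;&#x3E;"))
print(collect("&gt;&#62;&#x3E;", convert_charrefs=False))
```

With the default parser the three references should arrive as a single data event containing `>>>`. With `convert_charrefs=False` they should arrive as one entity-reference event followed by two character-reference events, which matches the two outputs in the updated doctest.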


@@ -43,7 +43,9 @@ Example HTML Parser Application

 As a basic example, below is a simple HTML parser that uses the
 :class:`HTMLParser` class to print out start tags, end tags, and data
-as they are encountered::
+as they are encountered:
+
+.. testcode::

    from html.parser import HTMLParser
@@ -63,7 +65,7 @@ as they are encountered::

 The output will then be:

-.. code-block:: none
+.. testoutput::

    Encountered a start tag: html
    Encountered a start tag: head
@@ -230,7 +232,9 @@ Examples
 --------

 The following class implements a parser that will be used to illustrate more
-examples::
+examples:
+
+.. testcode::

    from html.parser import HTMLParser
    from html.entities import name2codepoint
@@ -266,13 +270,17 @@ examples::

    parser = MyHTMLParser()

-Parsing a doctype::
+Parsing a doctype:
+
+.. doctest::

    >>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    ... '"http://www.w3.org/TR/html4/strict.dtd">')
    Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"

-Parsing an element with a few attributes and a title::
+Parsing an element with a few attributes and a title:
+
+.. doctest::

    >>> parser.feed('<img src="python-logo.png" alt="The Python logo">')
    Start tag: img
@@ -285,7 +293,9 @@ Parsing an element with a few attributes and a title::
    End tag : h1

 The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+further parsing:
+
+.. doctest::

    >>> parser.feed('<style type="text/css">#python { color: green }</style>')
    Start tag: style
@@ -300,16 +310,25 @@ further parsing::
    Data : alert("<strong>hello!</strong>");
    End tag : script

-Parsing comments::
+Parsing comments:

-   >>> parser.feed('<!-- a comment -->'
+.. doctest::
+
+   >>> parser.feed('<!--a comment-->'
    ... '<!--[if IE 9]>IE-specific content<![endif]-->')
    Comment : a comment
    Comment : [if IE 9]>IE-specific content<![endif]

 Parsing named and numeric character references and converting them to the
-correct char (note: these 3 references are all equivalent to ``'>'``)::
+correct char (note: these 3 references are all equivalent to ``'>'``):

+.. doctest::
+
+   >>> parser = MyHTMLParser()
+   >>> parser.feed('&gt;&#62;&#x3E;')
+   Data : >>>
+
+   >>> parser = MyHTMLParser(convert_charrefs=False)
    >>> parser.feed('&gt;&#62;&#x3E;')
    Named ent: >
    Num ent : >
@@ -317,18 +336,22 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::

 Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
 :meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+(unless *convert_charrefs* is set to ``True``):

-   >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
+.. doctest::
+
+   >>> for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']:
    ...     parser.feed(chunk)
    ...
    Start tag: span
    Data : buff
    Data : ered
    Data : text
    End tag : span

-Parsing invalid HTML (e.g. unquoted attributes) also works::
+Parsing invalid HTML (e.g. unquoted attributes) also works:
+
+.. doctest::

    >>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>')
    Start tag: p