gh-131535: Fix stale example in html.parser docs, make examples doctests (GH-131551)
parent 77b14a6d58
commit ee76e36d76
@@ -43,7 +43,9 @@ Example HTML Parser Application
 
 As a basic example, below is a simple HTML parser that uses the
 :class:`HTMLParser` class to print out start tags, end tags, and data
-as they are encountered::
+as they are encountered:
 
+.. testcode::
+
    from html.parser import HTMLParser
 
@@ -63,7 +65,7 @@ as they are encountered::
 
 The output will then be:
 
-.. code-block:: none
+.. testoutput::
 
    Encountered a start tag: html
    Encountered a start tag: head
@@ -230,7 +232,9 @@ Examples
 --------
 
 The following class implements a parser that will be used to illustrate more
-examples::
+examples:
 
+.. testcode::
+
    from html.parser import HTMLParser
    from html.entities import name2codepoint
@@ -266,13 +270,17 @@ examples::
 
    parser = MyHTMLParser()
 
-Parsing a doctype::
+Parsing a doctype:
 
+.. doctest::
+
    >>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    ...             '"http://www.w3.org/TR/html4/strict.dtd">')
    Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"
 
-Parsing an element with a few attributes and a title::
+Parsing an element with a few attributes and a title:
 
+.. doctest::
+
    >>> parser.feed('<img src="python-logo.png" alt="The Python logo">')
    Start tag: img
@@ -285,7 +293,9 @@ Parsing an element with a few attributes and a title::
    End tag : h1
 
 The content of ``script`` and ``style`` elements is returned as is, without
-further parsing::
+further parsing:
 
+.. doctest::
+
    >>> parser.feed('<style type="text/css">#python { color: green }</style>')
    Start tag: style
@@ -300,16 +310,25 @@ further parsing::
    Data : alert("<strong>hello!</strong>");
    End tag : script
 
-Parsing comments::
+Parsing comments:
 
-   >>> parser.feed('<!-- a comment -->'
+.. doctest::
+
+   >>> parser.feed('<!--a comment-->'
    ...             '<!--[if IE 9]>IE-specific content<![endif]-->')
    Comment : a comment
    Comment : [if IE 9]>IE-specific content<![endif]
 
 Parsing named and numeric character references and converting them to the
-correct char (note: these 3 references are all equivalent to ``'>'``)::
+correct char (note: these 3 references are all equivalent to ``'>'``):
 
+.. doctest::
+
+   >>> parser = MyHTMLParser()
+   >>> parser.feed('&gt;&#62;&#x3E;')
+   Data : >>>
+
+   >>> parser = MyHTMLParser(convert_charrefs=False)
    >>> parser.feed('&gt;&#62;&#x3E;')
    Named ent: >
    Num ent : >
@@ -317,18 +336,22 @@ correct char (note: these 3 references are all equivalent to ``'>'``)::
 
 Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
 :meth:`~HTMLParser.handle_data` might be called more than once
-(unless *convert_charrefs* is set to ``True``)::
+(unless *convert_charrefs* is set to ``True``):
 
-   >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
+.. doctest::
+
+   >>> for chunk in ['<sp', 'an>buff', 'ered', ' text</s', 'pan>']:
    ...     parser.feed(chunk)
    ...
    Start tag: span
    Data : buff
    Data : ered
    Data : text
    End tag : span
 
-Parsing invalid HTML (e.g. unquoted attributes) also works::
+Parsing invalid HTML (e.g. unquoted attributes) also works:
 
+.. doctest::
+
    >>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>')
    Start tag: p
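
For readers who want to try the behaviour these doctests exercise outside the Sphinx build, the following is a minimal sketch and not part of the commit. `RecordingParser` and `parse` are hypothetical names introduced here. They stand in for the documentation's `MyHTMLParser` (whose body is not shown in this diff) and record handler calls in a list instead of printing them. Only the `convert_charrefs=False` chunked case from the diff is reproduced.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records handler calls instead of printing them."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def parse(chunks, **kwargs):
    """Feed each chunk to a fresh parser and return the recorded events."""
    parser = RecordingParser(**kwargs)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.events


if __name__ == "__main__":
    refs = "&gt;&#62;&#x3E;"
    # Default mode: the three references arrive as ordinary character data.
    print(parse([refs]))
    # convert_charrefs=False: each reference goes to its own handler.
    print(parse([refs], convert_charrefs=False))
    # Chunked input, as in the updated doctest: the text arrives in pieces.
    print(parse(["<sp", "an>buff", "ered", " text</s", "pan>"],
                convert_charrefs=False))
```

Collecting events in a list rather than printing them makes the results easy to compare in a test, which is the same motivation as converting the documentation examples to doctests.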