In this assignment you'll be writing a program that converts an HTML file into a format that is easier to read.
When one "pretty-prints" a file, its contents are reformatted to produce a more attractive-looking document, usually for printing or reading purposes. (See Wikipedia for more information.)
Write a Python program, prettyprint.py, which:
… atds.py) to analyze the tokens
… prettyprint.py
An HTML document, which describes the contents of a webpage, consists of a series of markup "tags" (easily identified by the angle brackets that surround them) and content. A simple example:
<html>
<head>
<title>
My favorite equation of all time
</title>
</head>
<body>
<p>
c<sup>2</sup> >= a<sup>2</sup> + b<sup>2</sup>
</p>
</body>
</html>
Each tag is enclosed by an opening and a closing angle bracket, and tags themselves occur in pairs: the second tag of a pair includes a forward slash (/) to indicate the end of that part of the document.
So, <html> indicates the beginning of an HTML document, and </html> indicates the end of the HTML document. Likewise, <p> indicates the beginning of a paragraph, and </p> indicates the end of the paragraph, and so on.
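A first step toward analyzing a document is to split it into tags and the content between them. The following is a minimal sketch using Python's re module; the names tokenize, is_closing, and tag_name are hypothetical, and the pattern assumes that no tag contains a literal > character (for example, inside an attribute value).

```python
import re

# A token is either a whole tag like <p> or </p>, or a run of text
# that contains no '<'.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    """Split an HTML string into a list of tag and text tokens.

    Whitespace-only text (such as line breaks between tags) is dropped;
    other text is kept exactly, so spacing inside content survives.
    """
    return [m.group() for m in TOKEN_RE.finditer(html) if m.group().strip()]


def is_closing(tag):
    """Return True for a closing tag such as </p>."""
    return tag.startswith("</")


def tag_name(tag):
    """Return the lower-case tag name: '<p>' and '</P>' both give 'p'."""
    return tag.strip("</>").split()[0].lower()
```

Keeping text tokens unstripped matters for in-line content: in c<sup>2</sup> >= a, the spaces around >= belong to the content, not to the layout.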
Between an opening tag and its matching closing tag are the contents of that block.
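Because paired tags nest inside one another, a stack is a natural tool for checking that every block is properly closed: push each opening tag's name and pop when a closing tag arrives. The sketch below uses a plain Python list as the stack, since the contents of atds.py are not shown here. The function name tags_balanced is hypothetical, and it assumes, as the text does, that every tag is paired; an unpaired tag such as <br> would be reported as unbalanced.

```python
import re

# A token is either a whole tag like <p> or </p>, or a run of text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tags_balanced(html):
    """Return True if every opening tag is closed, in the correct order."""
    stack = []  # names of tags opened but not yet closed
    for match in TOKEN_RE.finditer(html):
        token = match.group()
        if not token.startswith("<"):
            continue  # text content, not a tag
        name = token.strip("</>").split()[0].lower()
        if token.startswith("</"):
            # A closing tag must match the most recently opened tag.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    # Anything left on the stack was opened but never closed.
    return not stack
```

For instance, <p><sup>2</p></sup> fails the check because </p> arrives while <sup> is still the innermost open tag.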
To "prettify" an HTML document, we want to be able to convert it, regardless of its original formatting, to the style demonstrated above.
… <html>, <head>, <title>, <body>, <p>, and <div> tags.
… (such as <sup> above) can occur in-line.
Note that a browser displays a webpage according to its HTML syntax, regardless of how the HTML document itself is formatted. So, this code:
<html><head><title>My favorite equation of all time</title></head><body>
<p>c<sup>2</sup> >= a<sup>2</sup> + b<sup>2</sup></p></body></html>
will display exactly the same in a webpage as the "pretty" code above. This code is just harder for a programmer to read and work with.
The prettyprint.py program takes a file containing ugly code and converts it to pretty code, as shown in the first example above.
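The conversion can be sketched as follows. This is a minimal sketch, not a required design: it assumes, as the first example suggests, that the <html>, <head>, <title>, <body>, <p>, and <div> tags each go on their own line with no indentation, that all other text and tags (such as <sup>) are gathered onto a single line, and that runs of whitespace can be collapsed to one space. The names prettify, tag_name, and BLOCK_TAGS, and the command-line usage, are hypothetical.

```python
import re
import sys

# Tags placed on their own line; every other tag (e.g. <sup>) stays in-line.
BLOCK_TAGS = {"html", "head", "title", "body", "p", "div"}

# A token is either a whole tag like <p> or </p>, or a run of text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tag_name(tag):
    """Return the lower-case tag name: '<p>' and '</P>' both give 'p'."""
    return tag.strip("</>").split()[0].lower()


def prettify(html):
    """Return html with block tags and in-line runs on separate lines."""
    lines = []
    inline = []  # pending text and in-line tags awaiting output

    def flush():
        # Join the pending pieces, collapse whitespace runs to single
        # spaces, and emit them as one line (skipping blank runs).
        text = " ".join("".join(inline).split())
        if text:
            lines.append(text)
        inline.clear()

    for match in TOKEN_RE.finditer(html):
        token = match.group()
        if token.startswith("<") and tag_name(token) in BLOCK_TAGS:
            flush()              # finish any in-line run first
            lines.append(token)  # the block tag gets its own line
        else:
            inline.append(token)
    flush()
    return "\n".join(lines)


# Hypothetical command-line use: python prettyprint.py ugly.html
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as infile:
        print(prettify(infile.read()))
```

Applied to the two-line "ugly" version above, prettify returns the twelve lines of the first example, and running it again on that output leaves it unchanged. Collapsing whitespace is harmless for ordinary text but would alter content inside a <pre> element, which this sketch does not special-case.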