Introduction to Software Development
Homework 4: Make a Website
(Deadline as per Canvas)
This homework deals with the following topics:
● Reading and writing files
● Scraping and parsing information from a text file
● Very basic HTML
● Unit testing
General Problem Specification
The basic skeleton of a website is an HTML page. This HTML page is a text file with a certain
format. Taking advantage of this fact, one can take an HTML template and create multiple
pages with different values stored. This is exactly what we are going to do in this homework.
Many websites use these kinds of scripts to mass generate HTML pages from databases. We
will use a sample text file as our ‘database’.
Our goal will be to take a resume in a simplistic text file and convert it into an HTML file that
can be displayed by a web browser.
What is HTML?
You do not need to know much HTML to do this assignment. You can do the assignment by
just understanding that HTML is the language that your browser interprets to display the page.
It is a tag-based language where each set of tags provides some basic information for how the
browser renders the text that the tags enclose.
For example, <h1>The Beatles</h1> will be rendered by your browser with “The Beatles”
as a heading with a large font. The closing </h1> indicates the ending of the heading.
An HTML webpage is typically divided into a head section
and a body section. We have provided you with a basic website template. We want you to
retain the head section and write only the body section via your Python program.
For more details about HTML, your best resource will be to use the w3schools website which
can be found here: www.w3schools.com. This website has a ton of information and provides
all of the common HTML you’ll need to know for this assignment.
Input Test File
We will read a simple text file that is supposed to represent a student’s resume. The resume
has some key points but can be somewhat unstructured. In particular, the order of some of the
information will definitely be different for different people.
The following are the things that you definitely DO know about the resume:
● Every resume will have a name, which will be written at the top. The top line in the text
file will contain just the name.
● There will be a line in the file, which contains an email address. It will be a single line
with just the email address and nothing else.
● Every resume will have a list of projects. Projects are listed line-by-line below a heading
called “Projects”. An example of what you might see in a resume file is something like
this:
Projects
Worked on big data to turn it into bigger data
Applied deep learning to learn how to boil water
Styled web pages with blink tags.
Washed cars …
----------
● The list of projects ends with a single line that looks like ‘----------’. That is, it will have
at least 10 minus signs. While this is an odd formatting requirement, this will actually
make the assignment easier for you.
● Every resume will have a list of Courses. Courses are listed like:
Courses - CIT590, AB120
OR
Courses :- Pottery, Making money by lying
● The formatting for Courses will be the word “Courses” followed by some amount of
punctuation, followed by the comma-separated list of courses.
● Your program should be able to look at the above example and then extract the
courses without including the ‘-’ sign or the ‘:-’. Note that any type of punctuation
could be between “Courses” and the list of courses.
Functions for Parsing the File
At the very least, you need to write one function for each piece of information that you want to
extract from the text file. Contrary to previous homework assignments, in this assignment we
will not provide you with a strict outline for the functions or what arguments to pass to them.
When we grade, we’ll look at how modular your code is and how you decided to break up the
functionality into separate functions. We’ll also look at how you named the functions, what
arguments they take, and how well you unit test the functions.
Here’s a basic breakdown of the functionality required to read the file into memory, parse each
section of the file to extract the relevant information, and write the final HTML-formatted
information to a new file. You should be writing to a file, not appending.
Reading the File
● Since the resume file is pretty small, write a function that reads the file and stores it in
memory as a list of lines.
● Then, you can use list and string manipulations to do all of the other necessary work.
● You should not prompt the user for a filename (or any other information). You can rely
on the provided code which includes hardcoded resume filenames to read.
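A minimal sketch of the reading step is shown below. The function name read_file is only an illustration (the assignment leaves naming up to you), and it assumes the file exists and is plain text.

```python
def read_file(filename):
    """Read the whole file and return it as a list of lines.

    Each line keeps its trailing newline, so later steps should
    strip whitespace before using the text.
    """
    with open(filename, "r") as f:
        return f.readlines()
```

Keeping the raw lines in a list means every later parsing function can take that list as its argument, which also makes the functions easy to unit test without touching the disk.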
Detecting the Name
● Detect and return the name by extracting the first line.
● The one extra thing we want you to do, just for practice, is if the first character in the
name string is not an uppercase letter (capital ’A’ through ’Z’), consider the name invalid
and ignore it. In this case, use ‘Invalid Name’ as the user’s name.
● For example:
Brandon Krakowsky is a valid name
brandon Krakowsky is not a valid name, so your output HTML file will display ‘Invalid
Name’ instead
● Another thing to note is that the name on the first line could have leading or trailing
whitespace, which you will need to remove.
● Note: Do not use the istitle() function in Python. This returns True if ALL words in a text
start with an upper case letter, AND the rest of the characters in each word are lower
case letters, otherwise False. This function will incorrectly identify a name like Edward
jones as being an invalid name, when it’s actually valid.
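One possible way to apply the name rules above is sketched here. The function name detect_name and the handling of an empty file (treated as an invalid name) are assumptions, not part of the specification.

```python
def detect_name(lines):
    """Return the name from the first line, or 'Invalid Name'.

    lines is the resume as a list of strings, one per line.
    """
    if not lines:
        # Assumption: an empty file has no valid name.
        return "Invalid Name"
    name = lines[0].strip()  # remove leading/trailing whitespace
    # Only the FIRST character has to be an uppercase letter A-Z.
    if name and "A" <= name[0] <= "Z":
        return name
    return "Invalid Name"
```

Because only the first character is checked, a name like Edward jones is accepted, which is the behavior istitle() would get wrong.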
Detecting the Email
● Detect and return the email address by looking for a line that has the ‘@’
character.
● For an email to be valid:
○ The last four characters of the email need to be either ‘.com’ or ’.edu’.
○ The email contains a lowercase English character after the ‘@’.
○ There should be no digits or numbers in the email address.
● These rules will accommodate lbrandon@wharton.upenn.edu but will not
accommodate lbrandon@python.org or lbrandon2@wharton.upenn.edu
● For example:
lbrandon@wharton.upenn.edu is a valid email
lbrandon@wharton2.upenn.com is not a valid email
lbrandon2@wharton.upenn.com is also not a valid email
● The email string could have leading or trailing whitespace, which will need to be
stripped.
● We are fully aware that these rules are inadequate. However, we want you to
use these rules and only these rules.
● If an email string is invalid based on the given rules, consider the email address to be
missing. This means your function should return an empty string and your output
resume file will not display an email address.
● DO NOT GOOGLE FOR A FUNCTION FOR THIS. Googling for solutions to your
homework is an act of academic dishonesty and in this particular case, you will get
solutions involving crazy regular expressions, which is a topic we haven’t yet discussed
in class. (In general, your code should never involve a topic that we have not discussed
in class.) Plus, you can easily achieve the required functionality without the use of a
regular expression.
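The email rules above can be checked with plain string operations, as in the sketch below. The function names are illustrative, and the sketch reads the rule about a lowercase character after the ‘@’ as meaning the character immediately following the ‘@’; that reading is an assumption.

```python
def is_valid_email(email):
    """Check an already-stripped email against the homework rules."""
    if "@" not in email:
        return False
    # Rule 1: the last four characters must be '.com' or '.edu'.
    if email[-4:] not in (".com", ".edu"):
        return False
    # Rule 2 (assumed reading): the character right after '@'
    # must be a lowercase English letter.
    after_at = email[email.index("@") + 1:]
    if not after_at or not ("a" <= after_at[0] <= "z"):
        return False
    # Rule 3: no digits anywhere in the address.
    for ch in email:
        if "0" <= ch <= "9":
            return False
    return True


def detect_email(lines):
    """Return the first line containing '@' if it is valid, else ''."""
    for line in lines:
        if "@" in line:
            email = line.strip()
            if is_valid_email(email):
                return email
            return ""
    return ""
```

Splitting validation from detection keeps each function small and lets you unit test the rules one at a time.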
Detecting the Courses
● Detect and return the courses as a list by looking for the word “Courses” in the list
and then extract the line that contains that word.
● Then make sure you extract the correct courses. In particular, any random
punctuation after the word “Courses” and before the first actual course needs to be
ignored.
● You are allowed to assume that every course begins with a letter of the English
alphabet.
● Note that the word “Courses”, the random punctuation, or individual courses in the
list could have leading or trailing whitespace that needs to be removed.
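A sketch of the course extraction follows. The name detect_courses is hypothetical, and the sketch relies on the stated assumption that every course begins with a letter: everything between “Courses” and the first letter is treated as punctuation and skipped.

```python
def detect_courses(lines):
    """Return the list of courses, or an empty list if none found."""
    for line in lines:
        if "Courses" in line:
            # Keep only the text after the word 'Courses'.
            rest = line[line.index("Courses") + len("Courses"):]
            # Skip punctuation and spaces until the first letter.
            start = 0
            while start < len(rest) and not rest[start].isalpha():
                start += 1
            courses = []
            for course in rest[start:].split(","):
                course = course.strip()
                if course:  # ignore empty pieces
                    courses.append(course)
            return courses
    return []
```

For instance, both sample lines from the input description produce clean lists, without the ‘-’ or ‘:-’.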
Detecting the Projects
● Detect and return the projects as a list by looking for the word “Projects” in the list.
● Each subsequent line is a project, until you hit a line that contains ‘----------’. This is
NOT an underscore. It is (at least) ten minus signs put together. You have reached the
end of the projects section if and only if you see a line that has at least 10 minus
signs, one after the other.
● If you detect a blank line between project descriptions, ignore that line.
● Each project could have leading or trailing whitespace that needs to be removed.
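The project rules above might be implemented along these lines. The name detect_projects is illustrative, and the sketch assumes the heading line contains only the word “Projects” once whitespace is stripped.

```python
def detect_projects(lines):
    """Return the project descriptions as a list of strings."""
    projects = []
    in_projects = False
    for line in lines:
        text = line.strip()
        if not in_projects:
            # Assumption: the heading line is exactly 'Projects'.
            if text == "Projects":
                in_projects = True
        elif "-" * 10 in text:
            # At least ten minus signs in a row ends the section.
            break
        elif text:
            # Non-blank line inside the section: one project.
            projects.append(text)
    return projects
```

Blank lines fall through every branch and are skipped, so they never end up in the returned list.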
Writing the HTML
Once you have gathered all the pieces of information from the text file, we want you to
programmatically write HTML. Here are the steps for that:
● Start by saving the file resume_template.html that is provided in the same directory as
your code.
● Preview the file in a text editor that does HTML syntax highlighting (e.g. Sublime
Text). Note that opening this page in your browser will give you a blank web page
since the file only contains a header and an empty body.
● You are going to programmatically copy the HTML in resume_template.html, fill in the
empty <body> with the resume content, then write the final HTML to a new file
resume.html. You should be writing to a file, not appending to a file. (Note: you should
not modify or overwrite the resume_template.html file. You should copy the
information from this file, modify it as needed, and then write the modified information
to a new file, resume.html. Think about how you can save the information from
resume_template.html in program memory to help you accomplish this.)
● More specifically, your Python code will do the following:
o Open and read resume_template.html
o Read every line of HTML into program memory
o Remove the last 2 lines of HTML (the </body> and </html> lines). (You can delete
these lines, and you will programmatically add them back later)
o Add all HTML-formatted resume content (as described below).
o Add the last 2 lines of HTML back in (the </body> and </html> lines).
o Write the final HTML to a new file resume.html
Why are we doing this? Because the HTML in resume_template.html looks something
like the below, and we need to start by removing the last two lines of HTML (closing
</body> and </html> tags) in order to insert the resume content in the correct
location.
<html>
<head>
random header stuff
lots of style rules we won’t worry about
</head>
<body>
</body>
</html>
We want to put our resume content in between the body tags to make it look like this:
<html>
<head>
random header stuff
lots of style rules we won’t worry about
</head>
<body>
HTML-formatted resume content goes here
</body>
</html>
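The template steps above can be sketched as follows. Both function names are hypothetical, and the sketch assumes the template ends with exactly two closing lines (</body> and </html>) and no trailing blank lines.

```python
def read_template_without_closing_tags(template_filename):
    """Read the template and drop its last two lines.

    Assumes the last two lines are the closing </body> and </html>.
    The template file itself is never modified.
    """
    with open(template_filename, "r") as f:
        lines = f.readlines()
    return lines[:-2]


def write_resume_html(output_filename, template_lines, content_lines):
    """Write template + resume content + closing tags to a new file."""
    # Mode 'w' overwrites any existing file, so we write, not append.
    with open(output_filename, "w") as f:
        f.writelines(template_lines)
        f.writelines(content_lines)
        f.write("</body>\n")
        f.write("</html>\n")
```

Holding the template lines in a list keeps resume_template.html untouched while still letting the program place the resume content between the body tags.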
In order to write proper HTML you will need to write the following helper function:
def surround_block(tag, text):
● This function surrounds the given text with the given HTML tag and returns the
string
● For example, surround_block(‘h1’, ‘The Beatles’) would return
‘<h1>The Beatles</h1>’
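A minimal sketch of this helper is below. It assumes the opening tag, the text, and the closing tag are joined on one line with no extra whitespace or newlines; check the expected output format before relying on that.

```python
def surround_block(tag, text):
    """Surround text with an opening and closing HTML tag."""
    # Assumption: no newlines or spaces are added around the text.
    return "<" + tag + ">" + text + "</" + tag + ">"
```

For example, surround_block('p', 'Hello') gives '<p>Hello</p>', and calls can be nested to build larger blocks.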
You’re going to display the email address in your webpage as an active email link. The proper
way to create a link in HTML is to use the <a> and </a> tags. The <a> tells where the link
should start and the </a> indicates where the link should end. Everything between these two
tags will be displayed as a link and the target of the link is added to the <a> tag using the href
attribute.
For example, a link to google.com would look like this:
<a href="https://www.google.com">Click here to go to Google</a>
The <a> tag also provides the option to specify an email address as the target of the link. To
do this, you use the “mailto: email address” format for the href attribute.
For example, an email link to abc@example.com would look like this:
<a href="mailto:abc@example.com">Send Email</a>
In order to write proper HTML for the email link, you will need to write the following helper
function:
def create_email_link(email_address):
● This function creates an email link with the given email_address
● To cut down on spammers harvesting the email address from your webpage, this
function should display the email address with an [aT] instead of an @
● For example, create_email_link(‘tom@seas.upenn.edu’) would return
‘<a href="mailto:tom@seas.upenn.edu">tom[aT]seas.upenn.edu</a>’
● Note: If (for some reason) the given email address does not contain @, use the email
address as is and don't replace the @*
● For example, create_email_link(‘tom.at.seas.upenn.edu’) would return
‘<a href="mailto:tom.at.seas.upenn.edu">tom.at.seas.upenn.edu</a>’
*Note: the create_email_link function does not determine if the email address is valid as
described in the “Detecting the Email” section above. Detecting a valid email address should
be separate from creating the email link. Even though every resume should contain an email
address with an @ symbol, we are asking that you create the function this way to practice
what to do for unexpected inputs.
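One way to write this helper is sketched below. It assumes the href keeps the real address in “mailto:” form while only the displayed text swaps the @ for [aT]; the exact expected string should be checked against the examples above.

```python
def create_email_link(email_address):
    """Return an HTML email link that displays [aT] instead of @."""
    # If there is no '@', replace() leaves the address unchanged,
    # which covers the unexpected-input case.
    display_text = email_address.replace("@", "[aT]")
    # Assumption: the link target keeps the real address.
    return ('<a href="mailto:' + email_address + '">'
            + display_text + "</a>")
```

Note that the function does no validation at all: deciding whether an address is valid stays in the email detection code.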
Now break the writing of the resume into the following steps:
1. In order to format the resume content, we have to make sure that all the text is
enclosed within the following tags:
Since we typically write things line-by-line, this initial step will write the 1st line
Think about how you can do that. In particular, think about how the create_email_link
and surround_block functions can be used to achieve this. Write another function to
make this entire intro section of the resume and then write it out to a file. Note
that the email address should be preceded by “Email:” in your final output.
3. For the projects section, you will have to produce output like the following. Assume
that you have already read the project data from the original file and stored
it in some data structure (decide whether you want to use a list, set, tuple, or
dictionary).
<ul>
<li>built a robot</li>
<li>fixed an ios app</li>
</ul>
Note that you MUST have <li> tags surrounded by <ul> tags in your output. If you do
not exactly match this format, you will have issues with the project bullet points not
displaying correctly. <li> is short for list item, and <ul> is short for unordered list.
4. For the courses section, we actually just want to list the courses. We do not want to
create bullet points. We also want the heading to be a bit smaller in size and enclosed
in
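The projects step above (list items nested inside an unordered list) can be sketched as follows. The function name create_projects_list is hypothetical, the projects are assumed to be stored in a list, and any heading or wrapper around the list is left out because its exact form is not shown here.

```python
def surround_block(tag, text):
    """Surround text with an opening and closing HTML tag."""
    return "<" + tag + ">" + text + "</" + tag + ">"


def create_projects_list(projects):
    """Return an unordered HTML list with one item per project."""
    items = ""
    for project in projects:
        # Each project becomes its own list item.
        items += surround_block("li", project)
    # All list items are wrapped in a single unordered list.
    return surround_block("ul", items)
```

Calling create_projects_list(['built a robot', 'fixed an ios app']) returns a single string with both projects as list items inside one unordered list.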