
3 - Covid-19 stats scraper site (recreate)

 1. Beautiful Soup is a Python library for pulling data out of _______________________.

  routes, such as the ones set up in Flask.

  Python and Python-related files.

  Excel and Access files.

  HTML and XML files.
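A minimal sketch of Beautiful Soup pulling data out of a small HTML string. It assumes the bs4 package is installed. The markup is made up for illustration, and "html.parser" is the parser bundled with Python, the same one used in the scraper code later in this quiz.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
markup = "<html><body><h1>Coronavirus Cases:</h1><p>Example text</p></body></html>"

# "html.parser" is the parser that ships with Python's standard library
soup = BeautifulSoup(markup, "html.parser")

# Pull the text out of the first <h1> tag
heading = soup.h1.get_text()
print(heading)  # Coronavirus Cases:
```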

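 2. The following HTML is the example used in Beautiful Soup's documentation. Every tag has a name, accessible as __________________.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""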

  .title

  .attribute

  .name

  .tag
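A short sketch of reading a tag's name, assuming bs4 is installed. It uses a trimmed copy of the html_doc string above, and the variable name tag is only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the html_doc example from the question
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p></body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.b            # the first <b> tag in the document
print(tag.name)         # b
print(soup.title.name)  # title
```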

 3. A tag may have any number of attributes. The tag <b id="boldest"> has an attribute "id" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary.

  True

  False
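A minimal sketch of dictionary-style attribute access, assuming bs4 is installed. The one-tag markup mirrors the <b id="boldest"> example in the question; the "title" lookup is a hypothetical missing attribute.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b id="boldest">bold</b>', "html.parser")
tag = soup.b

# Square brackets work like a dictionary lookup
print(tag["id"])         # boldest

# .get() returns None for a missing attribute instead of raising KeyError
print(tag.get("title"))  # None

# .attrs holds every attribute of the tag as an ordinary dict
print(tag.attrs)         # {'id': 'boldest'}
```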

 4. The following code extracts all the letter a's in the text.
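from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')  # html_doc is the string from question 2
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie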

  False

  True
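A sketch contrasting what find_all('a') matches with counting the letter "a", assuming bs4 is installed. The markup is the story paragraph from html_doc. find_all('a') looks for <a> (anchor) tags, while counting letters is a plain string operation on the extracted text.

```python
from bs4 import BeautifulSoup

# The story paragraph from html_doc, with its three <a> links
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all('a') matches <a> tags; .get('href') reads each link's address
links = [link.get("href") for link in soup.find_all("a")]
print(links)

# Counting the letter "a" in the visible text is a different job entirely
letter_a_count = soup.get_text().count("a")
print(letter_a_count)  # 8
```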

 5. In the following code, the last line outputs totalcases_stripped as _________________.
import requests
from bs4 import BeautifulSoup
url = 'https://www.worldometers.info/coronavirus/'
req = requests.get(url)
bsObj = BeautifulSoup(req.text, "html.parser")

data=bsObj.find_all('div',class_="maincounter-number")

print(data)
print()
totalcases=data[0].text
print(totalcases)
totalcases_stripped=totalcases.strip()
print()
print()
print("Stripped total cases:",totalcases_stripped)

  a data type

  a list

  a string

  an integer
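An offline sketch for checking the types involved, assuming bs4 is installed. The HTML is a hypothetical stand-in for the live page (the real Worldometers markup may differ), and the numbers are placeholders, not real data. find_all() returns a list-like ResultSet, .text returns a string, and strip() keeps it a string.

```python
from bs4 import BeautifulSoup

# Hypothetical offline stand-in for the counters; numbers are placeholders
html = """
<div class="maincounter-number"> <span>1,234,567 </span> </div>
<div class="maincounter-number"> <span>89,012</span> </div>
"""

bsObj = BeautifulSoup(html, "html.parser")
data = bsObj.find_all("div", class_="maincounter-number")

totalcases = data[0].text                 # text of the first div, padded with spaces
totalcases_stripped = totalcases.strip()  # spaces trimmed, still text

print(isinstance(data, list))     # True (a ResultSet is a list subclass)
print(type(totalcases_stripped))  # <class 'str'>
print(repr(totalcases_stripped))  # '1,234,567'
```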

 6. In the following code, the method strip() ________________________________________.
txt = '     example     '
print(txt.strip())
# output
# example

  removes the string of any spaces before the word

  removes the spaces before the particular word and after it

  strips the words of integers

  removes any punctuation marks from the word - e.g. commas, full stops or exclamation marks
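A small sketch of strip() and its one-sided relatives, using only built-in string methods. repr() is used so the remaining spaces are visible; the sample strings are made up.

```python
txt = '     example     '

print(repr(txt.strip()))   # 'example'       (both ends trimmed)
print(repr(txt.lstrip()))  # 'example     '  (left end only)
print(repr(txt.rstrip()))  # '     example'  (right end only)

# strip() only trims the ends; inner spaces and punctuation are kept
print(repr('  hello, world!  '.strip()))  # 'hello, world!'

# An optional argument lists the characters to trim instead of whitespace
print(repr('...example!!!'.strip('.!')))  # 'example'
```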

 7. Which statement best describes what the following bit of code does?
totalcases=int(data[0].text.strip().replace(',',''))

  It removes any letters from the variable totalcases and adds a 0 to the end

  It removes spaces from the value held inside totalcases and replaces the whole value with the integer 0

  It removes any spaces from either end of the value held in totalcases and also turns it back into a string from an integer

  It removes any commas from the value held in totalcases and also casts it to an integer
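The one-liner can be unpacked step by step. This sketch assumes a hypothetical raw value like the text data[0].text might return; the intermediate names step1 and step2 are only for illustration.

```python
# Hypothetical text, as it might come out of data[0].text
raw = " 1,234,567 "

step1 = raw.strip()             # '1,234,567'  (spaces trimmed from both ends)
step2 = step1.replace(',', '')  # '1234567'    (commas removed)
totalcases = int(step2)         # 1234567      (cast to an integer)

print(totalcases + 100)         # 1234667
```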

 8. Analyse the following code. The final line print(totalcases+100) will __________________________________.
import requests
from bs4 import BeautifulSoup


url = "https://www.worldometers.info/coronavirus/"
req = requests.get(url)
bsObj = BeautifulSoup(req.text, "html.parser")
data = bsObj.find_all("div",class_ = "maincounter-number")

totalcases=data[0].text.strip()
print(totalcases)
print(totalcases+100)

  produce an output of the totalcases+100 and convert it to an integer

  produce an error: TypeError: can only concatenate str (not "int") to str

  produce an output of "totalcases" instead of the actual number

  produce an output of the totalcases+100
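A minimal sketch of why adding 100 to the stripped text fails, and one way to fix it. It uses a hypothetical string in place of the scraped value. The exact TypeError message depends on the Python version.

```python
totalcases = "1,234,567"  # hypothetical stripped text; still a str

error_message = None
try:
    print(totalcases + 100)  # str + int is not allowed
except TypeError as err:
    error_message = str(err)  # wording varies between Python versions
    print("TypeError:", error_message)

# Removing the commas and converting to int makes the addition work
total_int = int(totalcases.replace(",", ""))
print(total_int + 100)  # 1234667
```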

 9. Running the following code raises the error IndexError: list index out of range. This suggests that ______________________.
import requests
from bs4 import BeautifulSoup
url = "https://www.worldometers.info/coronavirus/"
req = requests.get(url)
bsObj = BeautifulSoup(req.text, "html.parser")
data = bsObj.find_all("div",class_ = "maincounter-number")
totalcases=data[4].text.strip()
print(totalcases)

  The list 'data' is too short so it should always start at 0. Line 7 should be: data[0]

  The list bsObj is too short.

  The list 'data' is too long and line 7 should be: totalcases=data[5].text.strip()

  In the 'data' list there are only 3 instances of data being extracted.
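A sketch of the IndexError and a simple guard against it. The list is a hypothetical stand-in for three scraped counters (placeholder values). A list of three items only has the indexes 0, 1 and 2.

```python
# Hypothetical stand-in for three scraped counters (placeholder values)
data = ["1,234,567", "89,012", "1,100,000"]

print(len(data))  # 3, so the valid indexes are 0, 1 and 2

caught = None
try:
    print(data[4])
except IndexError as err:
    caught = str(err)
    print("IndexError:", caught)  # IndexError: list index out of range

# Checking the length first avoids the crash
index = 4
if index < len(data):
    print(data[index])
else:
    print(f"Only {len(data)} items; index {index} does not exist")
```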

 10. The simplest way to navigate the parse tree is to use the _____________________ you want. If you want the <head> tag, you would write soup.head:
soup.head
# <head><title>The Dormouse's story</title></head>

soup.title
# <title>The Dormouse's story</title>

  name of the python variable

  name of the tag

  name of the id attribute

  name of the div class
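A sketch of navigating by tag name, assuming bs4 is installed and using a trimmed copy of the Dormouse html_doc. Names can be chained with dots, and each one returns only the first matching tag.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Dormouse html_doc
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.head)    # <head><title>The Dormouse's story</title></head>
print(soup.title)   # <title>The Dormouse's story</title>

# Dots can be chained, and a tag name always gives the FIRST matching tag
print(soup.body.b)  # <b>The Dormouse's story</b>
print(soup.p)       # <p class="title"><b>The Dormouse's story</b></p>
```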