
2 - Scrape a site & display on HTML webpage

 1. The easiest way to install external libraries in Python is to use ____. It is a package management system used to install and manage software packages written in Python.

  requests

  beautifulsoup

  pygame

  pip
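Before scraping, it helps to confirm that the libraries used in this quiz are installed. The sketch below checks for them and, if any are missing, prints the pip command to run. It assumes the two packages are requests and Beautiful Soup; note that Beautiful Soup is installed under the pip name `beautifulsoup4` but imported as `bs4`. The helper names `missing_packages` and `pip_install_command` are hypothetical.

```python
import importlib.util
import sys

# Import name -> name used with pip. For Beautiful Soup these differ.
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be imported."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


def pip_install_command(packages):
    """Build a command that runs pip through the current Python interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


missing = missing_packages()
if missing:
    print("Run:", " ".join(pip_install_command(missing)))
else:
    print("requests and beautifulsoup4 are already installed.")
```

Running pip as `python -m pip` ties the install to the same interpreter that will run the scraper.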

 2. The first step in web scraping is to send an ____________ to the URL of the webpage you want to access. The server responds to the request by returning the HTML content of the webpage.

  HTTP request

  POST request

  XMLS request

  Beautiful Soup object request
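The request/response exchange described in this question can be sketched with the standard library alone, so it runs offline: a tiny local server (a hypothetical stand-in for a real website) answers a GET request with an HTML page, and the client reads that HTML back. With the third-party requests library the equivalent call is `requests.get(url)`, after which the HTML is available as `r.text` or `r.content`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by a local stand-in for a real website.
PAGE = b"<html><head><title>Demo</title></head><body><h2>Hello</h2></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the fixed HTML page above."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_html(url):
    """Send an HTTP GET request to url; return (status code, HTML text)."""
    # Bypass any system proxy so the request goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.read().decode("utf-8")


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

status, html = fetch_html(url)
print(status)  # 200
print(html)

server.shutdown()
server.server_close()
```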

 3. An alternative to web scraping is to use the ____ of the website (if one exists). For example, Facebook offers the Facebook Graph ______, which allows retrieval of data posted on Facebook.

  MNI

  GUI

  API

  HTML source
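An API usually returns structured data (often JSON) instead of a full HTML page, so there is no markup to parse. The sketch below assumes a hypothetical JSON response body; real field names such as `data` and `message` depend entirely on the API you call, and the helper `extract_messages` is illustrative only.

```python
import json

# Hypothetical body of an API response; real field names depend on the API.
api_response_body = (
    '{"data": [{"id": "1", "message": "Hello"},'
    ' {"id": "2", "message": "World"}]}'
)


def extract_messages(body):
    """Parse a JSON API response and return the 'message' field of each item."""
    payload = json.loads(body)
    return [item["message"] for item in payload.get("data", [])]


print(extract_messages(api_response_body))  # ['Hello', 'World']
```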

 4. What is 'soup' in the following line of code? (some knowledge of OOP required)
soup = BeautifulSoup(html_doc, 'html.parser')

  It is a class that is being created

  It is an object (an instance of the BeautifulSoup class) that has all the attributes and methods of BeautifulSoup.

  It is a variable that is given the value of an HTML document

  It is the name given to the web scraper text
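To see what kind of thing `soup` is, the sketch below creates it and inspects it. It shows that `soup` is an instance of `BeautifulSoup`, and that inheritance is a relationship between classes: `BeautifulSoup` itself subclasses Beautiful Soup's `Tag` class, which is where methods such as `find()` are defined. The small HTML string is a made-up example.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html_doc = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>"

# Calling the class creates a new object: soup is an instance of BeautifulSoup.
soup = BeautifulSoup(html_doc, "html.parser")

print(type(soup).__name__)              # BeautifulSoup
print(isinstance(soup, BeautifulSoup))  # True

# Inheritance is a relationship between classes: BeautifulSoup subclasses Tag,
# which is where methods such as find() are defined.
print(issubclass(BeautifulSoup, Tag))   # True
print(soup.find("p").get_text())        # Hi
```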

 5. What is the output of the following code?
from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>The post-corona age</title>
</head>
<body>
<h2>Corona virus and 2020</h2>
<p>
This is a paragraph which says something about the world as it stands in 2020 and everything the covid pandemic has
done to create havoc</p>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find("title"))

  Answer: text/html

  Answer: "The post-corona age"

  Answer: Head

  Answer: "The post-corona age" but with the title tags around it
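The difference between printing a tag and printing its text can be checked directly. In this sketch (a trimmed copy of the question's HTML), `find()` returns a `Tag` object: converting it to a string keeps the surrounding tags, while `.string` gives only the text inside.

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>The post-corona age</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

title_tag = soup.find("title")
print(title_tag)         # <title>The post-corona age</title>
print(title_tag.string)  # The post-corona age
print(title_tag.name)    # title
```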

 6. What is the output of the last line of the following code?
from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<title>CoronaVirusAge</title>
</head>
<body>
<h2>Welcome</h2>
<p>Here is a random paragraph</p>
<p>Learn stuff here</p>
<p>...or from here</p>
</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

  Error

  7

  Welcome

  h2
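The sketch below reproduces the length calculation on a small made-up page and also shows when the "Error" option would be the right answer. `find()` returns `None` if no tag matches, and reading `.text` on `None` raises an `AttributeError`, so it is safer to check the result first.

```python
from bs4 import BeautifulSoup

html_doc = "<body><h2>Welcome</h2><p>Here is a random paragraph</p></body>"
soup = BeautifulSoup(html_doc, "html.parser")

heading = soup.find("h2")
print(len(heading.text))  # 7: "Welcome" has seven characters

# find() returns None when nothing matches, and None has no .text attribute,
# so check the result before using it.
missing = soup.find("h3")
print(missing is None)  # True
if missing is not None:
    print(missing.text)
```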

 7. In the following code, we create a BeautifulSoup object and _____________________________________________.
soup = BeautifulSoup(r.content, 'html5lib')

  pass it the content of the Python file we are using to parse and the website URL

  pass it the content of the HTML website that we wish to extract.

  None of the options listed here are valid

  pass it two arguments, the raw HTML content and the HTML parser we want to use.
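The two constructor arguments can be seen in isolation in the sketch below. The byte string is a hypothetical stand-in for `r.content`, which requests exposes as raw bytes. The sketch uses Python's built-in `"html.parser"` so nothing extra is needed. The question's `'html5lib'` parser is a separate package that must be installed first (`pip install html5lib`).

```python
from bs4 import BeautifulSoup

# Stand-in for r.content: requests exposes the raw response body as bytes.
raw_content = b"<html><body><p>Parsed from bytes</p></body></html>"

# First argument: the raw markup. Second argument: the name of the parser.
soup = BeautifulSoup(raw_content, "html.parser")
print(soup.p.text)  # Parsed from bytes
```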

 8. In this code, the first argument is the HTML tag you want to search for, and the second argument is a dictionary that specifies the _____________________.
table = soup.find('div', attrs = {'class':'container'})

  HTML div tags associated with that div class.

  additional variables associated with the BeautifulSoup object

  additional HTML pages associated with that site.

  additional attributes associated with that tag.
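The sketch below runs the question's `find()` call against a small made-up page. The `attrs` dictionary narrows the search to `div` tags whose `class` attribute is `container`. Beautiful Soup also accepts the `class_` keyword, which has a trailing underscore because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="header">Top</div>
<div class="container">Main content</div>
<div class="container">Second container</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Only div tags with class="container" match; the first match is returned.
table = soup.find("div", attrs={"class": "container"})
print(table.text)  # Main content

# Equivalent shorthand: class_ avoids clashing with Python's 'class' keyword.
print(soup.find("div", class_="container").text)  # Main content
```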

 9. In the example above, the find() method returns __________________. The findAll() method takes the same arguments as find(), but it returns a list of all matching elements.

  the first matching element.

  all instances of the word "find".

  every single matching element associated with that tag.

  the entire HTML page
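The sketch below compares the two methods on a small made-up page. It uses `find_all()`, the current Beautiful Soup 4 name for the older `findAll()`. `find()` gives the first match or `None`. `find_all()` gives a list of every match, which is empty when nothing matches.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="container">Main content</div>
<div class="container">Second container</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first = soup.find("div", attrs={"class": "container"})
containers = soup.find_all("div", attrs={"class": "container"})

print(first.text)                        # Main content
print([div.text for div in containers])  # ['Main content', 'Second container']

# With no match, find() gives None while find_all() gives an empty list.
print(soup.find("span"), soup.find_all("span"))  # None []
```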

 10. Web scraping can be considered illegal in many cases. It may also cause your IP address to be blocked permanently by a website.

  FALSE

  TRUE
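One courtesy many scrapers adopt to reduce the risk of being blocked is to consult a site's `robots.txt` file before fetching pages. This does not by itself make scraping legal. The sketch uses Python's standard `urllib.robotparser` on hypothetical `robots.txt` rules supplied inline. A real scraper would instead point `set_url()` at the site's own file and call `read()`. The user-agent name `MyScraper` and the `example.com` URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; a real scraper would download the site's own file.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(robots_lines)
parser.modified()  # record that the rules have been loaded


def allowed(url, agent="MyScraper"):
    """Return True if the robots.txt rules permit this user agent to fetch url."""
    return parser.can_fetch(agent, url)


print(allowed("https://example.com/articles/1"))    # True
print(allowed("https://example.com/private/data"))  # False
print(parser.crawl_delay("MyScraper"))              # 2 (seconds between requests)
```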