
2 - Scrape a site & display on HTML webpage

 1. The easiest way to install external libraries in Python is to use ____. It is a package management system used to install and manage software packages written in Python.

  requests

  pygame

  pip

  beautifulsoup

 2. The first step in web scraping is to send an ____________ to the URL of the webpage you want to access. The server responds to the request by returning the HTML content of the webpage.

  XMLS request

  Beautiful Soup object request

  POST request

  HTTP request
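A minimal sketch of this first step, assuming the `requests` library is installed (`pip install requests`). So that it runs without internet access, the example starts a throwaway local server with Python's built-in `http.server` module in place of a real website. The handler `DemoHandler`, the helper `fetch_html`, and the page it serves are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real website: answers every GET with a tiny page."""

    def do_GET(self):
        body = b"<html><head><title>Demo page</title></head></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def fetch_html(url):
    """Send an HTTP GET request to url and return the HTML the server sends back."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response.text


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

html = fetch_html(f"http://127.0.0.1:{server.server_port}/")
print(html)

server.shutdown()
server.server_close()
```

Against a real site you would pass the page's own URL to `fetch_html`; the returned string is what gets handed to BeautifulSoup for parsing.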

 3. An alternative to web scraping is to use the website's ____ (if one exists). For example, Facebook provides the Facebook Graph ______, which allows retrieval of data posted on Facebook.

  GUI

  MNI

  HTML source

  API
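When a site offers an API, the data usually comes back as structured JSON rather than HTML, so no HTML parsing is needed. The sketch below is hypothetical throughout: the endpoint `https://api.example.com/posts`, the `limit` parameter, and the JSON body are made up for illustration and are not the Facebook Graph API. It only builds the request with `requests` (nothing is sent) and parses a sample response with the standard `json` module; against a live API you would call `requests.get(url, params=...)` and then `response.json()`.

```python
import json

import requests

# Hypothetical endpoint and query parameter, for illustration only.
# prepare() builds the final request without sending it.
prepared = requests.Request(
    "GET", "https://api.example.com/posts", params={"limit": 2}
).prepare()
print(prepared.method, prepared.url)

# A made-up JSON body of the kind an API might return.
sample_body = '{"data": [{"id": "1", "message": "Hello"}, {"id": "2", "message": "World"}]}'

# Structured data: read fields directly instead of searching through HTML tags.
payload = json.loads(sample_body)
messages = [post["message"] for post in payload["data"]]
print(messages)
```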

 4. What is 'soup' in the following line of code? (Some knowledge of OOP is required.)
soup = BeautifulSoup(html_doc, 'html.parser')

  It is the name given to the web scraper text

  It is an object (an instance of the BeautifulSoup class) that has access to all the attributes and methods Beautiful Soup defines.

  It is a class that is being created

  It is a variable that is given the value of an HTML document
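A short sketch to make the distinction concrete, assuming `beautifulsoup4` is installed; the one-line `html_doc` is a made-up sample. `html_doc` stays an ordinary string, while `soup` is a new object created from the `BeautifulSoup` class, so the class's methods such as `find()` can be called on it.

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>Example</title></head></html>"

# Calling the class creates a new BeautifulSoup object (an instance).
soup = BeautifulSoup(html_doc, "html.parser")

print(type(html_doc).__name__)          # the input is a plain str
print(type(soup).__name__)              # the result is a BeautifulSoup object
print(isinstance(soup, BeautifulSoup))  # soup is an instance of the class

# Methods defined on the class are available through the instance.
print(soup.find("title").get_text())
```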

 5. What is the output of the following code?
from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>The post-corona age</title>
</head>
<body>
<h2>Corona virus and 2020</h2>
<p>
This is a paragraph which says something about the world as it stands in 2020 and everything the covid pandemic has
done to create havoc</p>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find("title"))

  Answer: Head

  Answer: text/html

  Answer: "The post-corona age"

  Answer: "The post-corona age" but with the title tags around it
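`find()` returns a `Tag` object, and printing a `Tag` prints its markup. To get only the words, call `get_text()` (or use `.text`) on the result. A sketch, assuming `beautifulsoup4` is installed and using a trimmed copy of the document above:

```python
from bs4 import BeautifulSoup

# Shortened version of the document from question 5.
html_doc = """
<html>
<head><title>The post-corona age</title></head>
<body><h2>Corona virus and 2020</h2></body>
</html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

title_tag = soup.find("title")
print(title_tag)             # the whole Tag, markup included
print(title_tag.get_text())  # only the text inside the tag
```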

 6. What is the output of the last line of the following code?
from bs4 import BeautifulSoup
html_doc = """
CoronaVirusAge
<h2>Welcome</h2>
Here is a random paragraph
Learn stuff here
...or from here
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

  h2

  7

  Error

  Welcome
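`.text` returns the string inside the tag, and `len()` counts its characters. If the tag is missing, `find()` returns `None` and `.text` raises an `AttributeError`, which is when an error would appear. A sketch of guarding against that, assuming `beautifulsoup4` is installed; the helper `heading_text_length` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

html_doc = "<body><h2>Welcome</h2><p>Here is a random paragraph</p></body>"
soup = BeautifulSoup(html_doc, "html.parser")


def heading_text_length(soup, tag_name="h2"):
    """Hypothetical helper: length of the first matching tag's text, or None if absent."""
    tag = soup.find(tag_name)
    if tag is None:
        # Calling .text on None would raise AttributeError.
        return None
    return len(tag.text)


print(heading_text_length(soup))        # "Welcome" has 7 characters
print(heading_text_length(soup, "h3"))  # there is no <h3> in the document
```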

 7. In the following code, we create a BeautifulSoup object and _____________________________________________.
soup = BeautifulSoup(r.content, 'html5lib')

  pass it two arguments: the raw HTML content and the HTML parser we want to use.

  pass it the content of the HTML website that we wish to extract.

  None of the options listed here are valid

  pass it the content of the Python file we are using to parse and the website URL
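`r.content` holds the response body as raw bytes, and the second argument names the parser. `'html5lib'` is a separate package (`pip install html5lib`); the sketch below uses Python's built-in `'html.parser'` instead so it runs with only `beautifulsoup4` installed. The byte string is a made-up stand-in for a real `r.content`.

```python
from bs4 import BeautifulSoup

# Stand-in for r.content: requests exposes the raw response body as bytes.
raw_html = b"<html><body><p class='intro'>Hello, soup!</p></body></html>"

# Argument 1: the markup (bytes or str). Argument 2: the parser to use.
soup = BeautifulSoup(raw_html, "html.parser")
print(soup.find("p").get_text())
```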

 8. In this code, the first argument is the HTML tag you want to search for, and the second argument is a dictionary that specifies the _____________________.
table = soup.find('div', attrs={'class': 'container'})

  HTML div tags associated with that div class.

  additional attributes associated with that tag.

  additional variables associated with the BeautifulSoup object

  additional HTML pages associated with that site.
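Without `attrs`, `soup.find('div')` would return the first `<div>` on the page, whatever its class. A sketch, assuming `beautifulsoup4` is installed and using a made-up page with two `div` tags; `class_` (with a trailing underscore, because `class` is a Python keyword) is an equivalent shortcut for filtering by CSS class.

```python
from bs4 import BeautifulSoup

# Made-up page with two <div> tags that differ only by class.
html_doc = """
<div class="sidebar">Links</div>
<div class="container">Main content</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Second argument: a dict of attribute names and values the tag must have.
table = soup.find("div", attrs={"class": "container"})
print(table.get_text())

# The same filter written with the class_ keyword argument.
same = soup.find("div", class_="container")
print(same is table)

# Without the attribute filter, the first <div> wins.
print(soup.find("div").get_text())
```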

 9. In the above example, the find() method returns __________________. The findAll() method takes the same arguments as find(), but it returns a list of all matching elements.

  every single matching element associated with that tag.

  all instances of the word "find".

  the entire HTML page

  the first matching element.
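A side-by-side sketch, assuming `beautifulsoup4` is installed and using a made-up list. It calls `find_all()`, the current name of the method; `findAll()` is the older camelCase alias for the same method.

```python
from bs4 import BeautifulSoup

html_doc = """
<ul>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first = soup.find("li")      # a single Tag: the first match only
every = soup.find_all("li")  # a list-like ResultSet of every match

print(first.get_text())
print([li.get_text() for li in every])

# With no match, find() returns None while find_all() returns an empty list.
print(soup.find("table"), soup.find_all("table"))
```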

 10. Web scraping can be considered illegal in many cases. It may also cause your IP address to be permanently blocked by a website.

  TRUE

  FALSE