Test&Track


1. The easiest way to install external libraries in Python is to use ____. It is a package management system used to install and manage software packages written in Python.

requests

beautifulsoup

pygame

pip
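
Before scraping, it can help to check whether a library is already available. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the install command in the comment is meant to be run in a terminal, not inside Python.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can find the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Typical install command, run in a terminal rather than in Python:
#   python -m pip install beautifulsoup4
print(is_installed("json"))  # standard-library module, always present
print(is_installed("bs4"))   # depends on whether Beautiful Soup is installed
```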

2. The first step in web scraping is to send an ____________ to the URL of the webpage you want to access. The server responds to the request by returning the HTML content of the webpage.

HTTP request

POST request

XMLS request

Beautiful Soup object request
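
The request and response step can be sketched with the standard library alone. This is only an illustration under stated assumptions: a throwaway local server stands in for a real website, and the page content and the handler name `DemoHandler` are hypothetical. Scraping tutorials often use the third-party `requests` library for this step instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

# Hypothetical page, served locally so the example needs no internet access.
PAGE = b"<html><head><title>Demo page</title></head><body><h2>Welcome</h2></body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer the incoming HTTP GET request with the HTML content.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
opener = build_opener(ProxyHandler({}))  # ignore any system proxy for this local URL
with opener.open(url, timeout=5) as response:  # sends an HTTP GET request
    status = response.status
    html = response.read()  # raw HTML bytes returned by the server

server.shutdown()
server.server_close()

print(status)
print(html.decode("utf-8"))
```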

3. An alternative way to web scrape is to use the ____ of the website (if it exists). For example, Facebook has the Facebook Graph ______ which allows retrieval of data posted on Facebook.

MNI

GUI

API

HTML source
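
When a site offers this alternative, the response is usually structured data such as JSON rather than HTML, so no HTML parsing is needed. A minimal sketch, assuming a made-up response body; the field names below are hypothetical and do not come from any real service.

```python
import json

# Hypothetical response body; the structure and field names are invented.
response_body = '{"posts": [{"id": 1, "message": "Hello"}, {"id": 2, "message": "World"}]}'

data = json.loads(response_body)  # parse the JSON text into Python objects
messages = [post["message"] for post in data["posts"]]

print(messages)  # ['Hello', 'World']
```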

4. What is 'soup' in the following line of code? (some knowledge of OOP required)

soup = BeautifulSoup(html_doc, 'html.parser')

It is a class that is being created

It is an object (an instance of the BeautifulSoup class) that has all of that class's attributes and methods.

It is a variable that is given the value of an HTML document

It is the name given to the web scraper text
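
What the line of code produces can be inspected directly. This is a small sketch, assuming the `beautifulsoup4` package is installed; the short `html_doc` string is a hypothetical stand-in for a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration.
html_doc = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Calling the class builds an object from the HTML text and the chosen parser.
soup = BeautifulSoup(html_doc, "html.parser")

print(type(soup).__name__)              # BeautifulSoup
print(isinstance(soup, BeautifulSoup))  # True
print(soup.title.string)                # Demo
```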

5. What is the output of the following code?

from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>The post-corona age</title>
</head>
<body>
<h2>Corona virus and 2020</h2>
<p>
This is a paragraph which says something about the world as it stands in 2020 and everything the covid pandemic has
done to create havoc</p>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find("title"))

Answer: text/html

Answer: "The post-corona age"

Answer: Head

Answer: "The post-corona age" but with the title tags around it
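
The difference between a tag object and the text inside it can be checked with a shortened version of the document. A minimal sketch, assuming `beautifulsoup4` is installed; the variable name `title_tag` is introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question.
html_doc = (
    "<html><head><title>The post-corona age</title></head>"
    "<body><h2>Corona virus and 2020</h2></body></html>"
)

soup = BeautifulSoup(html_doc, "html.parser")
title_tag = soup.find("title")  # a Tag object, not a plain string

print(title_tag)       # <title>The post-corona age</title>
print(title_tag.text)  # The post-corona age
print(title_tag.name)  # title
```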

6. What is the output of line 20 of the following code?

from bs4 import BeautifulSoup
html_doc = """



CoronaVirusAge


<h2>Welcome</h2>

Here is a random paragraph
Learn stuff here
...or from here


"""
soup = BeautifulSoup(html_doc, 'html.parser')
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

Error

Welcome
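
Whether code like this raises an error depends on whether the tag exists: `find()` returns `None` when nothing matches, and accessing `.text` on `None` raises an `AttributeError`. A minimal sketch of a guarded lookup, assuming `beautifulsoup4` is installed and using a hypothetical document with no `<h2>` tag:

```python
from bs4 import BeautifulSoup

# Hypothetical document that contains no <h2> tag.
html_doc = "<html><body><p>No heading here</p></body></html>"

soup = BeautifulSoup(html_doc, "html.parser")
heading = soup.find("h2")
print(heading)  # None

# Guard against the missing tag instead of calling .text on None.
if heading is None:
    length = 0
else:
    length = len(heading.text)

print(length)  # 0
```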

7. In the following code, we create a BeautifulSoup object and _____________________________________________.

soup = BeautifulSoup(r.content, 'html5lib')

pass it the content of the Python file we are using to parse and the website URL

pass it the content of the HTML website that we wish to extract.

None of the options listed here are valid

pass it two arguments, the raw HTML content and the HTML parser we want to use.

8. In this code, the first argument is the HTML tag you want to search for and the second argument is a dictionary that specifies the _____________________.

table = soup.find('div', attrs = {'class':'container'})

HTML div tags associated with that div class.

additional variables associated with the BeautifulSoup object

additional HTML pages associated with that site.

additional attributes associated with that tag.

9. In the above example, the find() method returns __________________. The findAll() method (named find_all() in current versions of Beautiful Soup) takes the same arguments as find() but returns a list of all matching elements.

the first matching element.

all instances of the word "find".

every single matching element associated with that tag.

the entire HTML page
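
The two methods can be compared side by side. A minimal sketch, assuming `beautifulsoup4` is installed; the three `<div>` elements and their contents are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two divs of class "container" and one other div.
html_doc = """
<div class="container"><p>first</p></div>
<div class="sidebar"><p>other</p></div>
<div class="container"><p>second</p></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Same arguments for both calls: the tag name and a dictionary of attributes.
first = soup.find("div", attrs={"class": "container"})
matches = soup.find_all("div", attrs={"class": "container"})

print(first.p.text)                   # first
print(len(matches))                   # 2
print([d.p.text for d in matches])    # ['first', 'second']
```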

10. Web scraping can be considered illegal in many cases. It may also cause your IP address to be permanently blocked by a website.


2 - Scrape a site & display on HTML webpage
