Test&Track

1. The easiest way to install external libraries in Python is to use ____. It is a package management system used to install and manage software packages written in Python.

requests

pygame

pip

beautifulsoup

2. The first step in web scraping is to send an ____________ to the URL of the webpage you want to access. The server responds to the request by returning the HTML content of the webpage.

XMLS request

Beautiful Soup object request

POST request

HTTP request
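As a rough illustration of this first step, the sketch below builds and sends a request using only Python's standard library. The URL, the `User-Agent` string, and the helper names `build_request` and `fetch_html` are assumptions made for this example; many tutorials use the third-party `requests` library instead.

```python
from urllib.request import Request, urlopen

# Hypothetical URL, used only for illustration.
URL = "https://example.com/"


def build_request(url):
    """Build (but do not send) a GET request for the given URL."""
    return Request(url, headers={"User-Agent": "quiz-example/0.1"}, method="GET")


def fetch_html(url, timeout=10.0):
    """Send the request and return the response body decoded as text."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs network access):
# html = fetch_html(URL)
# print(html[:200])
```

The returned string is the raw HTML that a parser such as Beautiful Soup would then process.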

3. An alternative way to web scrape is to use the ____ of the website (if it exists). For example, Facebook has the Facebook Graph ______ which allows retrieval of data posted on Facebook.

GUI

MNI

HTML source

API
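To show why this route can be simpler than parsing HTML, here is a minimal sketch that reads structured data of the kind such a service might return. The JSON body, its field names, and the helper `extract_messages` are invented for illustration and do not reflect the response format of any real service.

```python
import json

# Hypothetical JSON body; a real service defines its own fields.
SAMPLE_RESPONSE = (
    '{"posts": [{"id": 1, "message": "Hello"}, {"id": 2, "message": "World"}]}'
)


def extract_messages(raw_json):
    """Parse the JSON text and return the message of every post."""
    data = json.loads(raw_json)
    return [post["message"] for post in data["posts"]]


print(extract_messages(SAMPLE_RESPONSE))
```

Because the data arrives already structured, no HTML tags need to be searched or stripped.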

4. What is 'soup' in the following line of code? (some knowledge of OOP required)

soup = BeautifulSoup(html_doc, 'html.parser')

It is the name given to the web scraper text

It is a created object (an instance of the BeautifulSoup class) that has all the attributes and methods of that class.

It is a class that is being created

It is a variable that is given the value of an HTML document
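One way to explore this question is to inspect `soup` directly. The sketch below assumes the `beautifulsoup4` package is installed; the short `html_doc` string is made up for the example.

```python
from bs4 import BeautifulSoup

# Made-up document, kept short for illustration.
html_doc = "<html><head><title>Example</title></head><body><p>Hi</p></body></html>"

soup = BeautifulSoup(html_doc, "html.parser")

# Show which class the object was created from.
print(type(soup).__name__)
print(isinstance(soup, BeautifulSoup))

# Attributes and methods defined by the class are available on the object.
print(soup.title.string)
```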

5. What is the output of the following code?

from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>The post-corona age</title>
</head>
<body>
<h2>Corona virus and 2020</h2>
<p>
This is a paragraph which says something about the world as it stands in 2020 and everything the covid pandemic has
done to create havoc</p>

</body>
</html>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find("title"))

Answer: Head

Answer: text/html

Answer: "The post-corona age"

Answer: "The post-corona age" but with the title tags around it

6. What is the output from line 20 from the following code?

from bs4 import BeautifulSoup
html_doc = """



CoronaVirusAge


Welcome

Here is a random paragraph
Learn stuff here
...or from here


"""
soup = BeautifulSoup(html_doc, 'html.parser')
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

Error

Welcome

7. In the following code, we create a BeautifulSoup object and _____________________________________________.

soup = BeautifulSoup(r.content, 'html5lib')

pass it two arguments, the raw HTML content and the HTML parser we want to use.

pass it the content of the HTML website that we wish to extract.

None of the options listed here are valid

pass it the content of the python file we are using to parse and the website URL

8. In this code, the first argument is the HTML tag you want to search for, and the second argument is a dictionary that specifies the _____________________.

table = soup.find('div', attrs = {'class':'container'})

HTML div tags associated with that div class.

additional attributes associated with that tag.

additional variables associated with the BeautifulSoup object

additional HTML pages associated with that site.

9. In the above example, the find() method returns __________________. The findAll() method takes the same arguments as find(), but it returns a list of all matching elements.

every single matching element associated with that tag.

all instances of the word "find".

the entire HTML page

the first matching element.
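The difference between the two methods in questions 8 and 9 can be tried out on a small document. This sketch assumes `beautifulsoup4` is installed and uses made-up HTML; it calls `find_all()`, the current spelling in Beautiful Soup 4, for which `findAll()` is kept as an older alias.

```python
from bs4 import BeautifulSoup

# Made-up HTML with two matching divs and one non-matching div.
html_doc = """
<div class="container"><p>first</p></div>
<div class="container"><p>second</p></div>
<div class="other"><p>third</p></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find() stops at one element; find_all() collects every match in a list.
first = soup.find("div", attrs={"class": "container"})
all_matches = soup.find_all("div", attrs={"class": "container"})

# When nothing matches, find() returns None rather than raising an error.
missing = soup.find("table")

print(first.get_text())
print(len(all_matches))
print(missing)
```

Checking the result for `None` before reading `.text` avoids an `AttributeError` when a tag is absent from the page.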

10. Web Scraping can be considered illegal in many cases. It may also cause your IP to be blocked permanently by a website.
