
01 - Big Data

 1. One characteristic of a data set that might result in it being classified as Big Data is that it contains a ________________________________.

  variety of different forms of information.

  series of indexes

  primary key as well as a secondary key

  primary key

 2. Another characteristic that might result in a data set being classed as Big Data is:

  there is a lot of duplicated data that is not necessary.

  there is hardly any data to process, suggesting that it will fit on a single server.

  there is a lot of nonsensical or irrelevant data present.

  there is a high volume of data to process, so the data will not fit on one server.

 3. Big Data is used in real-time applications, the internet and mobile networks.

  FALSE

  TRUE

 4. It is possible to model a database and define it in terms of nodes, edges, & properties using a _____________ data structure.

 5. Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with __________ that are too large or complex to be dealt with by traditional data-processing application software.
Trivia: The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, every day 2.5 exabytes (2.5 × 10^18 bytes) of data are generated.

Based on an IDC report prediction, the global data volume will grow exponentially from 4.4 zettabytes to 44 zettabytes between 2013 and 2020.

By 2025, IDC predicts there will be 163 zettabytes of data.

  data sets

  numeric data

  string data

  database fields
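
Worked example for the trivia in question 5: a minimal sketch of the arithmetic behind the IDC figures, assuming the growth from 4.4 to 44 zettabytes is compounded at a constant yearly rate (the source only says "exponentially"). The helper names `annual_growth_factor` and `doubling_time_months` are illustrative and not taken from any library.

```python
import math

# Figures quoted in the trivia above (IDC prediction).
START_ZB, END_ZB = 4.4, 44.0
START_YEAR, END_YEAR = 2013, 2020


def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that turns `start` into `end` after `years` years."""
    return (end / start) ** (1 / years)


def doubling_time_months(factor):
    """Months needed for the volume to double at a constant yearly growth factor."""
    return 12 * math.log(2) / math.log(factor)


factor = annual_growth_factor(START_ZB, END_ZB, END_YEAR - START_YEAR)
print(f"growth per year: {(factor - 1) * 100:.1f}%")
print(f"doubling time: {doubling_time_months(factor):.1f} months")
```

Under that assumption, a tenfold increase over seven years works out to roughly 39% growth per year, which means the volume doubles about every 25 months. This is a different quantity from the 40-month doubling of per-capita storage capacity mentioned in the same trivia, so the two figures are not directly comparable.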

 6. There are five widely recognised Vs of Big Data. Three of them are Velocity, Veracity and Value. Can you name the other two? (2 marks)

 7. Velocity refers to everyday data growth, which includes conversations in forums, blogs, social media posts, etc.

  TRUE

  FALSE

 8. The Variety characteristic refers to the format of big data: whether it is ________________________.

  static or dynamic

  variable or constant

  fluid or monotone

   structured or unstructured.

 9. Examples of an ______________________ format include video files, image files, and plain text from web documents.

  variable

  unstructured

  structured

  static

 10. This is a common Big Data interview question. Big data is important because, by processing it, organizations can gain insight into the following (see excerpt). Fill in the blank. (1 mark)
1. ______________
2. Improvements in products or services
3. To understand customer behavior and markets 
4. Effective decision making
5. To become more competitive

 11. Name two tools or systems used in big data processing. (2 marks)

 12. Generally speaking, there are three steps involved in big data solutions. Fill in the blank for the second step below.
Big data solutions follow three standard steps in their implementation.

They are:
1. Data ingestion: This step defines the approach to extracting and consolidating data from multiple sources. For example, data sources can be social network feeds, CRM, RDBMS, etc. The data extracted from the different sources is stored in a Hadoop Distributed File System (HDFS).

2.  __________________________________________

3. Process the data: This is the last step. The data stored must be processed. Processing is done using tools such as Spark, Pig, MapReduce, and others.

 13. Apache Hadoop is an open-source framework used for storing, processing, and analyzing complex unstructured data sets to derive insights for companies. The three main components of Hadoop are:
1. _______________: A programming model that processes large datasets in parallel
2. HDFS: A Java-based distributed file system used for data storage without prior organization
3. YARN: A framework that manages resources and handles requests from distributed applications

  Cart GUL

  Core Providence

  2D Ambience

  Map Reduce

 14. Data Storage is the next step in Big Data Solutions. In this step, the data extracted in the first step is stored in HDFS or in a NoSQL database, also known as ______. The HDFS storage is widely used for sequential access. On the contrary, ________ is use

  Fbun

  QuantumDon

  RandomHob

  HBase

 15. ______________ refers to the minimal hardware resources and components, collectively needed, to run the Apache Hadoop framework and related data management tools. Apache Hadoop requires 64-512 GB of RAM to execute tasks, and any hardware that supports its

  Cartesian Hardware

  Hardware Minomo

  Matrix Perception

   Commodity Hardware

 16. Define and describe the term FSCK. FSCK (______________) is a command used to run a Hadoop summary report that describes the state of the Hadoop file system.

 17. Big data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year, about twice as fast as the software business as a whole.

  TRUE

  FALSE

 18. CRVS (Civil Registration and Vital Statistics) collects the status of all certificates from birth to death. CRVS is, however, NOT a source of big data for governments.

  TRUE

  FALSE

 19. Big data can be used to improve training and the understanding of competitors using sports sensors. It is also possible to _______________ in a match using big data analytics.

  predict winners

  predict a terror attack

  predict injuries

  predict exact scores with 100% accuracy every time

 20. In March 2012, the White House announced a national "Big Data Initiative" in which six Federal departments and agencies committed more than _______________ to big data research projects.

  $200,000

  $2 million

  $20 million

  $200 million