1. One characteristic of a data set that might result in it being classified as Big Data is that it contains a ________________________________.
2. Another characteristic that might result in a data set being classed as Big Data is:
3. Big Data is used in real-time applications, on the internet, and in mobile networks.
4. It is possible to model a database and define it in terms of nodes, edges, and properties using a _____________ data structure.
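As background for this question, a structure of nodes, edges, and properties can be sketched in a few lines of plain Python (the class and method names here are illustrative, not tied to any particular graph database):

```python
# A minimal in-memory sketch of a node/edge/property structure.
class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> properties dict
        self.edges = []   # (src, dst, label, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, label, **props):
        self.edges.append((src, dst, label, props))

g = PropertyGraph()
g.add_node("u1", name="Alice")
g.add_node("u2", name="Bob")
g.add_edge("u1", "u2", "FOLLOWS", since=2021)
```

Real systems in this family (e.g. Neo4j) expose the same three concepts through a query language rather than raw dictionaries.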
5. Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with __________ that are too large or complex to be dealt with by traditional data-processing application software.
6. There are five widely recognised Vs of Big Data. Three of them are Velocity, Veracity, and Value. Can you name the other two? (2 marks)
7. Velocity refers to the speed of everyday data growth, which includes conversations in forums, blogs, social media posts, etc.
8. The Variety characteristic refers to the format of big data: whether it is ________________________.
9. Examples of an ______________________ format include video files, image files, plain text, and web documents.
10. This is a common Big Data interview question. Big data is important because, by processing it, organizations can obtain insights related to the following (see excerpt); fill in the blank. (1 mark)
1. ______________
2. Improvements in products or services
3. To understand customer behavior and markets
4. Effective decision making
5. To become more competitive
11. Name two tools or systems used in big data processing. (2 marks)
12. Generally speaking, there are three steps involved in big data solutions. Fill in the blanks for the second step below.
Big data solutions follow three standard steps in their implementation. They are:
1. Data ingestion: This step will define the approach
to extract and consolidate data from multiple sources.
For example, data sources can be social network feeds,
CRM, RDBMS, etc. The data extracted from different
sources is stored in a Hadoop distributed file system
(HDFS).
2. __________________________________________
3. Process the data: This is the last step.
The data stored must be processed. Processing is
done using tools such as Spark, Pig, MapReduce,
and others.
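The three steps in the excerpt can be sketched in miniature. This is purely illustrative: plain lists and a Counter stand in for HDFS and a processing engine such as Spark or MapReduce, and the source names are made up:

```python
# Miniature sketch of the ingest -> store -> process pipeline.
from collections import Counter

# 1. Data ingestion: consolidate records from multiple (mock) sources,
#    e.g. social feeds, CRM exports, RDBMS dumps.
social_feed = ["big data rocks", "data pipelines"]
crm_export = ["customer data"]
ingested = social_feed + crm_export

# 2. Data storage: in a real solution this lands in HDFS or a NoSQL store;
#    here a plain list stands in.
storage = list(ingested)

# 3. Process the data: a word count, the classic MapReduce example.
word_counts = Counter(word for line in storage for word in line.split())
print(word_counts["data"])  # "data" appears once in each of the three lines
```

In a production stack, step 3 would be expressed as a Spark job or MapReduce program running over the stored files rather than an in-memory Counter.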
13. Apache Hadoop is an open-source framework used for storing, processing, and analyzing complex unstructured data sets for deriving insights for companies. The three main components of Hadoop are:
14. Data Storage is the next step in Big Data Solutions. In this step, the data extracted in the first step is stored in HDFS or a NoSQL database, also known as ______. HDFS storage is widely used for sequential access. On the contrary, ________ is used for random read/write access.
15. ______________ refers to the minimal hardware resources and components, collectively, needed to run the Apache Hadoop framework and related data management tools. Apache Hadoop requires 64-512 GB of RAM to execute tasks, and any hardware that supports its minimum requirements can be used.
16. Define and describe the term FSCK. FSCK (______________) is a command used to run a Hadoop summary report that describes the state of the Hadoop file system.
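For reference, a typical invocation looks like the following (run on a cluster node; the flags shown are standard HDFS fsck options):

```shell
# Summarize the health of the entire HDFS namespace
hdfs fsck /

# Include per-file block and replica details for a subtree
hdfs fsck /user/data -files -blocks -locations
```

Unlike the Linux fsck, the Hadoop version only reports on file-system state; it does not repair it.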
17. Big data has increased the demand for information management specialists so much so that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than $15 billion on software firms specializing in data management and analytics.
18. CRVS (Civil Registration and Vital Statistics) collects records of all certificate statuses from birth to death. CRVS is, however, NOT a source of big data for governments.
19. Big data can be used to improve training and to understand competitors, using sport sensors. It is also possible to _______________ in a match using big data analytics.
20. In March 2012, the White House announced a national "Big Data Initiative" that consisted of six Federal departments and agencies committing more than _______________ to big data research projects.