



Big Data Analytics is a course designed to introduce students to the concepts, tools, and techniques used to analyze and interpret massive datasets. Through lectures and hands-on projects, students learn to manage, process, and extract valuable insights from structured and unstructured data using platforms such as Hadoop and Spark. The course covers data mining, machine learning, data visualization, and ethical considerations in big data, enabling students to solve real-world problems in various industries by making data-driven decisions.
Recommended Textbook
Data Mining: A Tutorial-Based Primer, 1st Edition, by Richard Roiger
Available Study Resources on Quizplus
14 Chapters
167 Verified Questions
167 Flashcards
Source URL: https://quizplus.com/study-set/3934

Available Study Resources on Quizplus for this Chapter
22 Verified Questions
22 Flashcards
Source URL: https://quizplus.com/quiz/78433
Sample Questions
Q1) Database query is used to uncover this type of knowledge.
A) deep
B) hidden
C) shallow
D) multidimensional
Answer: C
Q2) A statement to be tested.
A) theory
B) procedure
C) principle
D) hypothesis
Answer: D
Q3) A nearest neighbor approach is best used
A) with large-sized datasets.
B) when irrelevant attributes have been removed from the data.
C) when a generalized model of the data is desirable.
D) when an explanation of what has been found is of primary importance.
Answer: B
To view all questions and flashcards with answers, click on the resource link above.

Available Study Resources on Quizplus for this Chapter
16 Verified Questions
16 Flashcards
Source URL: https://quizplus.com/quiz/78434
Sample Questions
Q1) Which statement about outliers is true?
A) Outliers should be identified and removed from a dataset.
B) Outliers should be part of the training dataset but should not be present in the test data.
C) Outliers should be part of the test dataset but should not be present in the training data.
D) The nature of the problem determines how outliers are used.
E) More than one of A, B, C, or D is true.
Answer: D
Q2) How many class 2 instances are in the dataset?
Answer: 23
Q3) Given desired class C and population P, lift is defined as
A) the probability of class C given population P divided by the probability of C given a sample taken from the population.
B) the probability of population P given a sample taken from P.
C) the probability of class C given a sample taken from population P.
D) the probability of class C given a sample taken from population P divided by the probability of C within the entire population P.
Answer: D
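The lift definition in Q3 can be illustrated with a small calculation. This is a minimal sketch, assuming class labels are stored in plain Python lists; the function name `lift` and the example data are hypothetical and do not come from the textbook.

```python
def lift(sample_labels, population_labels, target):
    """Return P(target | sample) / P(target | population)."""
    # Fraction of the selected sample that belongs to the target class.
    p_sample = sample_labels.count(target) / len(sample_labels)
    # Fraction of the whole population that belongs to the target class.
    p_population = population_labels.count(target) / len(population_labels)
    return p_sample / p_population


# Hypothetical data: 25% of the population is class C,
# while 50% of the sample chosen by some model is class C.
population = ["C"] * 25 + ["other"] * 75
sample = ["C"] * 5 + ["other"] * 5

print(lift(sample, population, "C"))
```

A lift greater than 1 means the sample is richer in class C than the population as a whole; a lift of 1 means the sample does no better than random selection.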

Available Study Resources on Quizplus for this Chapter
13 Verified Questions
13 Flashcards
Source URL: https://quizplus.com/quiz/78435
Sample Questions
Q1) Given a rule of the form IF X THEN Y, rule confidence is defined as the conditional probability that
A) Y is true when X is known to be true.
B) X is true when Y is known to be true.
C) Y is false when X is known to be false.
D) X is false when Y is known to be false.
Answer: A
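The confidence of a rule IF X THEN Y can be estimated from data as the fraction of instances satisfying X that also satisfy Y. The following is a minimal sketch under that assumption; the function name `rule_confidence`, the attribute names, and the example instances are hypothetical.

```python
def rule_confidence(instances, antecedent, consequent):
    """Estimate P(Y | X) for the rule IF X THEN Y over a list of instances."""
    # Keep only the instances for which the antecedent X holds.
    covered = [inst for inst in instances if antecedent(inst)]
    if not covered:
        raise ValueError("the antecedent matches no instances")
    # Count how many of those instances also satisfy the consequent Y.
    satisfied = sum(1 for inst in covered if consequent(inst))
    return satisfied / len(covered)


# Hypothetical instances for the rule IF age = young THEN buys = True.
instances = [
    {"age": "young", "buys": True},
    {"age": "young", "buys": True},
    {"age": "young", "buys": True},
    {"age": "young", "buys": False},
    {"age": "old", "buys": True},
    {"age": "old", "buys": False},
]

confidence = rule_confidence(
    instances,
    antecedent=lambda inst: inst["age"] == "young",
    consequent=lambda inst: inst["buys"],
)
print(confidence)
```

Instances where X is false are ignored entirely, which is why options C and D in Q1 do not describe rule confidence.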
Q2) An evolutionary approach to data mining.
A) backpropagation learning
B) genetic learning
C) decision tree learning
D) linear regression
Answer: B
Q3) A genetic learning operation that creates new population elements by combining parts of two or more existing elements.
A) selection
B) crossover
C) mutation
D) absorption
Answer: B
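Crossover, as described in Q3, can be sketched for two equal-length parents. The example below uses single-point crossover, which is only one of several crossover variants; the function name `single_point_crossover` and the bit-string encoding are assumptions made for illustration.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Combine two equal-length parents by swapping their tails at `point`."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("crossover point must fall strictly inside the parents")
    # Each child takes the head of one parent and the tail of the other.
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


# Hypothetical population elements encoded as bit strings.
children = single_point_crossover("11111", "00000", point=2)
print(children)
```

Selection decides which elements survive and mutation alters a single element at random, so crossover is the only operation among the first three options that combines parts of two existing elements.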

Available Study Resources on Quizplus for this Chapter
12 Verified Questions
12 Flashcards
Source URL: https://quizplus.com/quiz/78436
Sample Questions
Q1) A particular categorical attribute value has a predictiveness score of 1.0 and a predictability score of 0.50. The attribute value is
A) necessary but not sufficient for class membership.
B) sufficient but not necessary for class membership.
C) necessary and sufficient for class membership.
D) neither necessary nor sufficient for class membership.
Q2) The single best representative of a class.
A) mean
B) centroid
C) signature
D) prototype
Q3) The first row of an iDAV formatted file contains attribute names. The second row reflects attribute types. What is specified in the third row of an iDAV formatted file?
A) attribute predictability
B) attribute tolerance
C) attribute similarity
D) attribute usage

Available Study Resources on Quizplus for this Chapter
10 Verified Questions
10 Flashcards
Source URL: https://quizplus.com/quiz/78437
Sample Questions
Q1) KDD has been described as the application of ___ to data mining.
A) the waterfall model
B) object-oriented programming
C) the scientific method
D) procedural intuition
Q2) Attributes may be eliminated from the target dataset during this step of the KDD process.
A) creating a target dataset
B) data preprocessing
C) data transformation
D) data mining
Q3) This technique uses mean and standard deviation scores to transform real-valued attributes.
A) decimal scaling
B) min-max normalization
C) z-score normalization
D) logarithmic normalization

Available Study Resources on Quizplus for this Chapter
13 Verified Questions
13 Flashcards
Source URL: https://quizplus.com/quiz/78438
Sample Questions
Q1) Operational databases are designed to support _____ whereas decision support systems are designed to support _____.
A) transactional processing, data analysis
B) data analysis, transactional processing
C) independent data marts, dependent data marts
D) dependent data marts, independent data marts
Q2) The level of detail of the information stored in a data warehouse.
A) granularity
B) scope
C) functionality
D) level of query
Q3) Which of the following is not an example of a slice operation?
A) Select all cells where purchase category = retail.
B) Select all cells where purchase category = retail or vehicle.
C) Provide a spreadsheet of quarter and region information for all cells pertaining to restaurant.
D) Identify the region of peak travel expenditure for each quarter.

Available Study Resources on Quizplus for this Chapter
13 Verified Questions
13 Flashcards
Source URL: https://quizplus.com/quiz/78439
Sample Questions
Q1) The correlation between the number of years an employee has worked for a company and the salary of the employee is 0.75. What can be said about employee salary and years worked?
A) There is no relationship between salary and years worked.
B) Individuals that have worked for the company the longest have higher salaries.
C) Individuals that have worked for the company the longest have lower salaries.
D) The majority of employees have been with the company a long time.
E) The majority of employees have been with the company a short period of time.
Q2) Selecting data so as to assure that each class is properly represented in both the training and test set.
A) cross validation
B) stratification
C) verification
D) bootstrapping
Q3) Data used to optimize the parameter settings of a supervised learner model.
A) training
B) test
C) verification
D) validation

Available Study Resources on Quizplus for this Chapter
10 Verified Questions
10 Flashcards
Source URL: https://quizplus.com/quiz/78440
Sample Questions
Q1) A feed-forward neural network is said to be fully connected when
A) all nodes are connected to each other.
B) all nodes at the same layer are connected to each other.
C) all nodes at one layer are connected to the nodes in the next higher layer.
D) all hidden layer nodes are connected to all output layer nodes.
Q2) This neural network explanation technique is used to determine the relative importance of individual input attributes.
A) sensitivity analysis
B) average member technique
C) mean squared error analysis
D) absolute average technique
Q3) Neural network training is accomplished by repeatedly passing the training data through the network while
A) individual network weights are modified.
B) training instance attribute values are modified.
C) the ordering of the training instances is modified.
D) individual network nodes have the coefficients on their corresponding functional parameters modified.

Available Study Resources on Quizplus for this Chapter
4 Verified Questions
4 Flashcards
Source URL: https://quizplus.com/quiz/78441
Sample Questions
Q1) The test set accuracy of a backpropagation neural network can often be improved by
A) increasing the number of epochs used to train the network.
B) decreasing the number of hidden layer nodes.
C) increasing the learning rate.
D) decreasing the number of hidden layers.
Q2) The total delta measures the total absolute change in network connection weights for each pass of the training data through a neural network. This value is most often used to determine the convergence of a
A) perceptron network.
B) feed-forward network.
C) backpropagation network.
D) self-organizing network.
Q3) This type of supervised network architecture does not contain a hidden layer.
A) backpropagation
B) perceptron
C) self-organizing map
D) genetic

Available Study Resources on Quizplus for this Chapter
13 Verified Questions
13 Flashcards
Source URL: https://quizplus.com/quiz/78442
Sample Questions
Q1) This clustering algorithm merges and splits nodes to help modify nonoptimal partitions.
A) agglomerative clustering
B) expectation maximization
C) conceptual clustering
D) K-Means clustering
Q2) This supervised learning technique can process both numeric and categorical input attributes.
A) linear regression
B) Bayes classifier
C) logistic regression
D) backpropagation learning
Q3) Machine learning techniques differ from statistical techniques in that machine learning methods
A) typically assume an underlying distribution for the data.
B) are better able to deal with missing and noisy data.
C) are not able to explain their behavior.
D) have trouble with large-sized datasets.
Available Study Resources on Quizplus for this Chapter
10 Verified Questions
10 Flashcards
Source URL: https://quizplus.com/quiz/78443
Sample Questions
Q1) A set of pageviews requested by a single user from a Web server.
A) index page
B) common log
C) session
D) page frame
Q2) The automation of Web site adaptation involves creating and deleting
A) index pages
B) cookies
C) pageviews
D) clickstreams
Q3) A data file that contains session information.
A) cookie
B) pageview
C) page frame
D) common log
Q4) Usage profiles for Web-based personalization contain several
A) pageviews
B) clickstreams
C) cookies
D) session files


Available Study Resources on Quizplus for this Chapter
15 Verified Questions
15 Flashcards
Source URL: https://quizplus.com/quiz/78444
Sample Questions
Q1) An internal test of an expert system whose purpose is to determine if the system uses the same reasoning process as the expert(s) used to build the system.
A) validation
B) verification
C) reliability
D) suitability
Q2) A problem that cannot be solved with a computer using a traditional algorithmic technique.
A) exponentially hard problem
B) recursive problem
C) non-transformable problem
D) combinatorial problem
Q3) Knowledge about knowledge is known as
A) metaknowledge
B) class knowledge
C) structured knowledge
D) classified knowledge
Q4) Construct a goal tree using the following production rules. Assume the goal is g.

Available Study Resources on Quizplus for this Chapter
10 Verified Questions
10 Flashcards
Source URL: https://quizplus.com/quiz/78445
Sample Questions
Q1) A fuzzy set is associated with a
A) linguistic variable.
B) certainty factor.
C) hypothesis to be tested.
D) linguistic value.
Q2) This technique is used to determine the height of a rule consequent membership function as determined by the truth of the rule's antecedent condition.
A) fuzzy set union
B) fuzzy set intersection
C) center of gravity
D) clipping
Q3) Computing the probability of picking a heart from a deck of 52 cards can be determined using ______ probability technique.
A) an objective
B) an experimental
C) a subjective
D) an inexact
Available Study Resources on Quizplus for this Chapter
6 Verified Questions
6 Flashcards
Source URL: https://quizplus.com/quiz/78446
Sample Questions
Q1) This type of agent resides inside a data warehouse in an attempt to discover changes in business trends.
A) semiautonomous agent
B) cooperative agent
C) data mining agent
D) filtering agent
Q2) A fundamental difference between a data mining approach to problem solving and an expert systems approach is
A) the output of an expert system is a set of rules and the output of a data mining technique is a decision tree.
B) a data mining technique builds a model without the aid of a human expert.
C) a model built using a data mining technique can explain how decisions are made but an expert system cannot.
D) an expert system is built using inductive learning whereas a data mining model is built using one or several deductive techniques.