Review the following data and Pig code.
M,38,95111
F,29,95060
F,45,95192
M,62,95102
F,56,95102
A = LOAD 'data' USING PigStorage(',') AS (gender:chararray, age:int, zip:chararray);
B = FOREACH A GENERATE age;
Which one of the following commands would save the results of B to a folder in HDFS named myoutput?
A. STORE A INTO 'myoutput' USING PigStorage(',');
B. DUMP B using PigStorage('myoutput');
C. STORE B INTO 'myoutput';
D. DUMP B INTO 'myoutput';
You want to count the number of occurrences of each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit a literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize it by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?
A. Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.
B. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
C. No, because the Reducer and Combiner are separate interfaces.
D. No, because the Combiner is incompatible with a mapper which doesn't use the same data type for both the key and value.
E. Yes, because Java is a polymorphic object-oriented language and thus reducer code can be reused as a combiner.
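The combiner question above turns on whether the reduce step gives the same result when it is first applied to partial groups of values. The following is a minimal plain-Java sketch of that idea, not Hadoop code: the class and method names (CombinerSketch, map, sumByKey, withCombiner, withoutCombiner) are hypothetical, and it assumes the reducer sums the values it receives. A reducer that merely counted how many values arrived, rather than summing them, would not be safe to reuse, because a combiner's output values are no longer all 1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerSketch {
    // Hypothetical stand-in for a mapper: emits (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Hypothetical stand-in for a summing reduce step: adds up the values seen for each key.
    // Input and output are both (String, Integer) pairs, so the output can be fed back in.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // All map output goes straight to one final reduce.
    static Map<String, Integer> withoutCombiner(List<String> splits) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String split : splits) {
            all.addAll(map(split));
        }
        return sumByKey(all);
    }

    // The same sum is first applied per split (the combiner), then again over the partial sums.
    static Map<String, Integer> withCombiner(List<String> splits) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (String split : splits) {
            combined.addAll(sumByKey(map(split)).entrySet());
        }
        return sumByKey(combined);
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("apple banana apple", "banana apple cherry");
        System.out.println(withoutCombiner(splits));
        System.out.println(withCombiner(splits));
    }
}
```

Both calls print the same totals because addition is associative and commutative, so the grouping of partial sums does not affect the result.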
A client application creates an HDFS file named foo.txt with a replication factor of 3. Which statement best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B, and C?
A. The file will be marked as corrupted if data node B fails during the creation of the file.
B. Each data node locks the local file to prohibit concurrent readers and writers of the file.
C. Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
D. The file can be accessed if at least one of the data nodes storing the file is available.
Which one of the following statements is FALSE regarding the communication between DataNodes and a federation of NameNodes in Hadoop 2.2?
A. Each DataNode receives commands from one designated master NameNode.
B. DataNodes send periodic heartbeats to all the NameNodes.
C. Each DataNode registers with all the NameNodes.
D. DataNodes send periodic block reports to all the NameNodes.
What does the following command do?
register '/piggybank/pig-files.jar';
A. Invokes the user-defined functions contained in the jar file
B. Assigns a name to a user-defined function or streaming command
C. Transforms Pig user-defined functions into a format that Hive can accept
D. Specifies the location of the JAR file containing the user-defined functions
Which one of the following statements is true about a Hive-managed table?
A. Records can only be added to the table using the Hive INSERT command.
B. When the table is dropped, the underlying folder in HDFS is deleted.
C. Hive dynamically defines the schema of the table based on the FROM clause of a SELECT query.
D. Hive dynamically defines the schema of the table based on the format of the underlying data.
MapReduce v2 (MRv2/YARN) is designed to address which two issues?
A. Single point of failure in the NameNode.
B. Resource pressure on the JobTracker.
C. HDFS latency.
D. Ability to run frameworks other than MapReduce, such as MPI.
E. Reduce complexity of the MapReduce APIs.
F. Standardize on a single MapReduce API.
You have written a Mapper which makes the following five calls to the OutputCollector.collect method:
output.collect(new Text("Apple"), new Text("Red"));
output.collect(new Text("Banana"), new Text("Yellow"));
output.collect(new Text("Apple"), new Text("Yellow"));
output.collect(new Text("Cherry"), new Text("Red"));
output.collect(new Text("Apple"), new Text("Green"));
How many times will the Reducer's reduce method be invoked?
A. 6
B. 3
C. 1
D. 0
E. 5
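The question above depends on how map output is grouped before the reduce phase: the framework groups values by key and calls reduce once per distinct key. The following plain-Java sketch imitates that grouping for the five emitted pairs. It is an illustration only, not Hadoop code: ReduceCallSketch and groupByKey are hypothetical names, and it assumes a single reducer with default key grouping. The order of values within a key is shown in emit order here, which a real job does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceCallSketch {
    // Groups emitted (key, value) pairs by key, imitating what the shuffle does before reduce runs.
    static Map<String, List<String>> groupByKey(String[][] emitted) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] pair : emitted) {
            groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] emitted = {
            {"Apple", "Red"},
            {"Banana", "Yellow"},
            {"Apple", "Yellow"},
            {"Cherry", "Red"},
            {"Apple", "Green"},
        };
        Map<String, List<String>> groups = groupByKey(emitted);
        // Each entry corresponds to one reduce(key, values) invocation.
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            System.out.println("reduce(" + group.getKey() + ", " + group.getValue() + ")");
        }
        System.out.println("reduce calls: " + groups.size());
    }
}
```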
A NameNode in Hadoop 2.2 manages ______________.
A. Two namespaces: an active namespace and a backup namespace
B. A single namespace
C. An arbitrary number of namespaces
D. No namespaces
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A. Increase the parameter that controls minimum split size in the job configuration.
B. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
C. Set the number of mappers equal to the number of input files you want to process.
D. Write a custom FileInputFormat and override the method isSplitable to always return false.