
DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Online Practice Questions and Answers

Questions 4

Which of the following statements about stages is correct?

A. Different stages in a job may be executed in parallel.

B. Stages consist of one or more jobs.

C. Stages ephemerally store transactions, before they are committed through actions.

D. Tasks in a stage may be executed by multiple machines at the same time.

E. Stages may contain multiple actions, narrow, and wide transformations.


Correct Answer: D

Questions 5

Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?

A. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))

B. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-ddHH:mm:ss"))

C. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))

D. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-ddHH:mm:ss"))

E. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])


Correct Answer: C
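For reference, a minimal runnable sketch of answer C (assuming only that PySpark is installed; the session variable spark mirrors the question):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.appName("to_timestamp_demo").getOrCreate()

# One-element tuples create a single column named "date"
dfDates = spark.createDataFrame(
    [("23/01/2022 11:28:12",), ("24/01/2022 10:58:34",)],
    ["date"],
)

# to_timestamp(column, format): the format pattern must match the stored strings
dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
dfDates.printSchema()   # date: timestamp
dfDates.show(truncate=False)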

Questions 6

Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

A. DataFrame.repartition(12)

B. DataFrame.coalesce(6).shuffle()

C. DataFrame.coalesce(6)

D. DataFrame.coalesce(6, shuffle=True)

E. DataFrame.repartition(6)


Correct Answer: E
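repartition(6) performs a full shuffle, while coalesce(6) only combines existing partitions without one. A minimal sketch contrasting the two (assuming only that PySpark is installed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition_demo").getOrCreate()

df = spark.range(1000).repartition(12)    # start with 12 partitions

shuffled = df.repartition(6)              # full shuffle down to 6 partitions (the correct answer)
merged = df.coalesce(6)                   # combines existing partitions, no full shuffle

print(shuffled.rdd.getNumPartitions())    # 6
print(merged.rdd.getNumPartitions())      # 6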

Questions 7

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows for those rows in DataFrame itemsDf in which the column attributes contains the element cozy.

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A. 1. filter  2. array_contains("cozy")  3. select  4. "itemId"  5. explode  6. "attributes"

B. 1. where  2. "array_contains(attributes, 'cozy')"  3. select  4. itemId  5. explode  6. attributes

C. 1. filter  2. "array_contains(attributes, 'cozy')"  3. select  4. "itemId"  5. map  6. "attributes"

D. 1. filter  2. "array_contains(attributes, cozy)"  3. select  4. "itemId"  5. explode  6. "attributes"

E. 1. filter  2. "array_contains(attributes, 'cozy')"  3. select  4. "itemId"  5. explode  6. "attributes"


Correct Answer: E

The correct code block is:

itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))

The key here is understanding how to use array_contains(). You can either use it as an expression inside a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:

itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))

Static notebook | Dynamic notebook: See test 1, 29 (Databricks import instructions): https://flrs.github.io/spark_practice_tests_code/#1/29.html, https://bit.ly/sparkpracticeexams_import_instructions
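For context, a self-contained sketch of both variants; the sample rows here are illustrative assumptions, not the question's actual data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, explode

spark = SparkSession.builder.appName("array_contains_demo").getOrCreate()

# Illustrative sample data (the question's real itemsDf is not reproduced here)
itemsDf = spark.createDataFrame(
    [(1, ["blue", "winter", "cozy"]), (2, ["red", "summer", "fresh"])],
    ["itemId", "attributes"],
)

# Variant 1: array_contains as a SQL expression inside a string (the correct answer)
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes")).show()

# Variant 2: array_contains imported from pyspark.sql.functions
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes")).show()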

Questions 8

Which of the following code blocks returns a DataFrame that matches the multi-column DataFrame itemsDf, except that integer column itemId has been converted into a string column?

A. itemsDf.withColumn("itemId", convert("itemId", "string"))

B. itemsDf.withColumn("itemId", col("itemId").cast("string"))

C. itemsDf.select(cast("itemId", "string"))

D. itemsDf.withColumn("itemId", col("itemId").convert("string"))

E. spark.cast(itemsDf, "itemId", "string")


Correct Answer: B

itemsDf.withColumn("itemId", col("itemId").cast("string"))

Correct. You can convert the data type of a column using the cast method of the Column class. Also note that you have to use the withColumn method on itemsDf to replace the existing itemId column with the new version that contains strings.

itemsDf.withColumn("itemId", col("itemId").convert("string"))

Incorrect. The Column object that col("itemId") returns does not have a convert method.

itemsDf.withColumn("itemId", convert("itemId", "string"))

Wrong. Spark's pyspark.sql.functions module does not have a convert function. The question is trying to mislead you by using the word "converted". Type conversion is also called "type casting"; this may help you remember to look for a cast method instead of a convert method (see correct answer).

itemsDf.select(cast("itemId", "string"))

False. While cast (and its alias astype) is a method of the Column class, it is not a function in pyspark.sql.functions, which is what this code block implies. In addition, the select call would return only the itemId column rather than the full multi-column DataFrame the question asks for.
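A short sketch of the correct cast pattern (sample data is an illustrative assumption; the real itemsDf has more columns):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("cast_demo").getOrCreate()

itemsDf = spark.createDataFrame([(1, "Outdoors Backpack"), (2, "Summer Dress")], ["itemId", "itemName"])

# Replace itemId with a string-typed version; all other columns stay unchanged
itemsDf = itemsDf.withColumn("itemId", col("itemId").cast("string"))
itemsDf.printSchema()   # itemId: string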

Questions 9

Which of the following describes Spark's way of managing memory?

A. Spark uses a subset of the reserved system memory.

B. Storage memory is used for caching partitions derived from DataFrames.

C. As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D. Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E. Spark's memory usage can be divided into three categories: Execution, transaction, and storage.


Correct Answer: B

Spark's memory usage can be divided into three categories: execution, transaction, and storage.
No, it is either execution or storage; there is no transaction category.

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.
No, Spark's garbage collection runs faster on a few big objects than on many small objects.

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.
The opposite is true: serialization reduces the memory footprint, but may impact performance negatively.

Spark uses a subset of the reserved system memory.
No, the reserved system memory is separate from Spark memory. Reserved memory stores Spark's internal objects.

More info: Tuning - Spark 3.1.2 Documentation; Spark Memory Management | Distributed Systems Architecture; Learning Spark, 2nd Edition, Chapter 7
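The execution/storage split described above is governed by two standard configuration keys; a minimal sketch, with both values written out at their Spark defaults, and a cache() call that is what actually fills storage memory:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("memory_demo")
    .config("spark.memory.fraction", "0.6")          # unified execution + storage share of (heap - 300MB reserved)
    .config("spark.memory.storageFraction", "0.5")   # portion of that share protected for storage (caching)
    .getOrCreate()
)

df = spark.range(1_000_000)
df.cache()      # cached partitions live in storage memory
df.count()      # an action materializes the cache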

Questions 10

Which of the following code blocks prints out in how many rows the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

A.
counter = 0

for index, row in itemsDf.iterrows():
    if 'Inc.' in row['supplier']:
        counter = counter + 1

print(counter)

B.
counter = 0

def count(x):
    if 'Inc.' in x['supplier']:
        counter = counter + 1

itemsDf.foreach(count)
print(counter)

C. print(itemsDf.foreach(lambda x: 'Inc.' in x))

D. print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.
accum = sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)


Correct Answer: E

Correct code block:

accum = sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)

To answer this correctly, you need to know about both the DataFrame.foreach() method and accumulators.

When Spark runs the code, it executes it on the executors. The executors do not have any information about variables outside of their scope. This is why simply using a Python variable counter, like in the two examples that start with counter = 0, will not work. You need to tell the executors explicitly that counter is a special shared variable, an Accumulator, which is managed by the driver and can be accessed by all executors for the purpose of adding to it.

If you have used Pandas in the past, you might be familiar with the iterrows() command. Notice that there is no such command in PySpark.

The two examples that start with print do not work, since DataFrame.foreach() does not have a return value.

More info: pyspark.sql.DataFrame.foreach - PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 3, 22 (Databricks import instructions)
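A self-contained version of the correct code block; the sample rows are illustrative assumptions, and spark.sparkContext stands in for the bare sc used in the answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator_demo").getOrCreate()

# Illustrative sample data for itemsDf
itemsDf = spark.createDataFrame(
    [(1, "Sports Company Inc."), (2, "YetiX"), (3, "Whole Bikes Inc.")],
    ["itemId", "supplier"],
)

accum = spark.sparkContext.accumulator(0)

def check_if_inc_in_supplier(row):
    # Runs on the executors; the accumulator is the only writable shared state
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)   # 2 for this sample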

Questions 11

Which of the following options describes the responsibility of the executors in Spark?

A. The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

B. The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

C. The executors accept tasks from the driver, execute those tasks, and return results to the driver.

D. The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

E. The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.


Correct Answer: C

More info: Running Spark: an overview of Spark's runtime architecture - Manning (https://bit.ly/2RPmJn9)

Questions 12

The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching column names and inserting null values where column names do not appear in both DataFrames. Find the error.

Sample of DataFrame transactionsDfMonday:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+

Sample of DataFrame transactionsDfTuesday:

+-------+-------------+---------+-----+
|storeId|transactionId|productId|value|
+-------+-------------+---------+-----+
|     25|            1|        1|    4|
|      2|            2|        2|    7|
|      3|            4|        2| null|
|   null|            5|        2| null|
+-------+-------------+---------+-----+

Code block:

sc.union([transactionsDfMonday, transactionsDfTuesday])

A. The DataFrames' RDDs need to be passed into the sc.union method instead of the DataFrame variable names.

B. Instead of union, the concat method should be used, making sure to not use its default arguments.

C. Instead of the Spark context, transactionsDfMonday should be called with the join method instead of the union method, making sure to use its default arguments.

D. Instead of the Spark context, transactionsDfMonday should be called with the union method.

E. Instead of the Spark context, transactionsDfMonday should be called with the unionByName method instead of the union method, making sure to not use its default arguments.


Correct Answer: E

Correct code block:

transactionsDfMonday.unionByName(transactionsDfTuesday, True)

Output of correct code block:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
|            1|     null|    4|     25|        1|null|
|            2|     null|    7|      2|        2|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
+-------------+---------+-----+-------+---------+----+

To solve this question, you should be aware of the difference between the DataFrame.union() and DataFrame.unionByName() methods. The first one matches columns independent of their names, just by their order. The second one matches columns by their name, which is what the question asks for. Its second argument, allowMissingColumns, must be set to True (not its default) so that columns missing from one DataFrame are filled with nulls.
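A minimal sketch of unionByName with allowMissingColumns=True (available in Spark 3.1 and later); the rows below are a subset of the question's samples, with an explicit DDL schema so that all-null columns can be created:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unionByName_demo").getOrCreate()

transactionsDfMonday = spark.createDataFrame(
    [(5, None, None, None, 2, None), (6, 3, 2, 25, 2, None)],
    "transactionId INT, predError INT, value INT, storeId INT, productId INT, f INT",
)
transactionsDfTuesday = spark.createDataFrame(
    [(25, 1, 1, 4), (2, 2, 2, 7), (3, 4, 2, None), (None, 5, 2, None)],
    "storeId INT, transactionId INT, productId INT, value INT",
)

# Columns are matched by name; allowMissingColumns=True fills absent columns with nulls
merged = transactionsDfMonday.unionByName(transactionsDfTuesday, allowMissingColumns=True)
merged.show()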

Questions 13

The code block shown below should return a new 2-column DataFrame that shows one attribute from column attributes per row next to the associated itemName, for all suppliers in column supplier whose name includes Sports. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.__1__(__2__).select(__3__, __4__)

A. 1. filter  2. col("supplier").isin("Sports")  3. "itemName"  4. explode(col("attributes"))

B. 1. where  2. col("supplier").contains("Sports")  3. "itemName"  4. "attributes"

C. 1. where  2. col(supplier).contains("Sports")  3. explode(attributes)  4. itemName

D. 1. where  2. "Sports".isin(col("Supplier"))  3. "itemName"  4. array_explode("attributes")

E. 1. filter  2. col("supplier").contains("Sports")  3. "itemName"  4. explode("attributes")


Correct Answer: E

Output of correct code block:

+----------------------------------+------+
|itemName                          |col   |
+----------------------------------+------+
|Thick Coat for Walking in the Snow|blue  |
|Thick Coat for Walking in the Snow|winter|
|Thick Coat for Walking in the Snow|cozy  |
|Outdoors Backpack                 |green |
|Outdoors Backpack                 |summer|
|Outdoors Backpack                 |travel|
+----------------------------------+------+

The key to solving this question is knowing about Spark's explode operator. Using this operator, you can extract values from arrays into single rows. Note that there is more than one plausible way to fill each gap, so work through the answers systematically from the first gap to the last.
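A self-contained sketch of the correct answer, rebuilding itemsDf from the sample table above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("explode_demo").getOrCreate()

itemsDf = spark.createDataFrame(
    [
        (1, "Thick Coat for Walking in the Snow", ["blue", "winter", "cozy"], "Sports Company Inc."),
        (2, "Elegant Outdoors Summer Dress", ["red", "summer", "fresh", "cooling"], "YetiX"),
        (3, "Outdoors Backpack", ["green", "summer", "travel"], "Sports Company Inc."),
    ],
    ["itemId", "itemName", "attributes", "supplier"],
)

# Keep suppliers whose name contains "Sports", then explode attributes into one row per element
(itemsDf
 .filter(col("supplier").contains("Sports"))
 .select("itemName", explode("attributes"))
 .show(truncate=False))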

Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK
Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
Last Update: Jun 13, 2025
Questions: 180

