PROFESSIONAL-DATA-ENGINEER Online Practice Questions and Answers

Questions 4

You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want to see if you can improve training speed by removing some features while having a minimum effect on model accuracy. What can you do?

A. Eliminate features that are highly correlated to the output labels.

B. Combine highly co-dependent features into one representative feature.

C. Instead of feeding in each feature individually, average their values in batches of 3.

D. Remove the features that have null values for more than 50% of the training records.

Correct Answer: B
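
As a rough illustration of option B, the sketch below groups columns whose Pearson correlation exceeds a threshold and replaces each group with the mean of its standardized members. The feature names, the 0.95 threshold, and the helper `combine_codependent` are assumptions made for this example, not part of the question; it also assumes no column is constant.

```python
import random
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

def standardize(xs):
    """Rescale a column to zero mean and unit variance."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def combine_codependent(columns, threshold=0.95):
    """Greedily group strongly correlated columns and average each group."""
    names = list(columns)
    assigned = set()
    combined = {}
    for i, a in enumerate(names):
        if a in assigned:
            continue
        group = [a]
        assigned.add(a)
        for b in names[i + 1:]:
            if b not in assigned and pearson(columns[a], columns[b]) >= threshold:
                group.append(b)
                assigned.add(b)
        z = [standardize(columns[n]) for n in group]
        combined["+".join(group)] = [mean(vals) for vals in zip(*z)]
    return combined

# Hypothetical weather features: temp_f is a linear function of temp_c.
random.seed(0)
temp_c = [random.gauss(15, 5) for _ in range(200)]
temp_f = [c * 9 / 5 + 32 for c in temp_c]
humidity = [random.gauss(60, 10) for _ in range(200)]

reduced = combine_codependent(
    {"temp_c": temp_c, "temp_f": temp_f, "humidity": humidity}
)
```

Here three input columns collapse to two, because the two temperature columns carry the same information.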

Questions 5

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

A. Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.

B. Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.

C. Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.

D. Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.

Correct Answer: D
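
The merge logic in option D can be sketched in plain Python: load the new records as one batch, then combine them with the existing records in a single pass instead of issuing one UPDATE per row. This is only an in-memory illustration; the `customer_id` key and the `merge_records` helper are assumptions, and in BigQuery the same idea would run as one query job over a staging table.

```python
def merge_records(existing, updates, key="customer_id"):
    """Return a new list where rows from `updates` override or extend `existing`."""
    merged = {row[key]: dict(row) for row in existing}
    for row in updates:
        # Overwrite matching rows, insert rows whose key is new.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())

existing = [
    {"customer_id": 1, "segment": "A"},
    {"customer_id": 2, "segment": "B"},
]
updates = [
    {"customer_id": 2, "segment": "C"},
    {"customer_id": 3, "segment": "D"},
]
result = merge_records(existing, updates)
```

The inputs are left unchanged and the merged rows go to a new collection, mirroring the "write the results to a new table" step.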

Questions 6

You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)

A. Get more training examples

B. Reduce the number of training examples

C. Use a smaller set of features

D. Use a larger set of features

E. Increase the regularization parameters

F. Decrease the regularization parameters

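Correct Answer: ACE

Overfitting means the model is too complex for the amount of training data. More training examples, fewer features, and stronger regularization all reduce it; a larger feature set or weaker regularization makes it worse.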
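
To illustrate how a regularization parameter affects a model, the sketch below fits a one-feature linear model with an L2 (ridge) penalty for several penalty strengths. The data points and the helper `ridge_weight` are made up for this example; the point is only that a larger penalty shrinks the learned weight, which constrains model complexity.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + lam
    return numerator / denominator

# Hypothetical training data, roughly y = 2x with noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Larger lam means a stronger penalty on the weight.
weights = {lam: ridge_weight(xs, ys, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
```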

Questions 7

You are developing a model to identify the factors that lead to sales conversions for your customers. You have completed processing your data. You want to continue through the model development lifecycle. What should you do next?

A. Use your model to run predictions on fresh customer input data.

B. Test and evaluate your model on your curated data to determine how well the model performs.

C. Monitor your model performance, and make any adjustments needed.

D. Delineate what data will be used for testing and what will be used for training the model.

Correct Answer: B

After processing your data, the next step in the model development lifecycle is to test and evaluate your model on the curated data. This determines how well the model can predict sales conversions for your customers. The evaluation phase uses metrics such as accuracy, precision, and recall to assess the model, and it helps identify issues or areas for improvement before the model is deployed to a production environment. Reference: the Google Professional Data Engineer Certification Guide and related resources, which outline the steps and best practices in the model development lifecycle.
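
The metrics named above can be computed from a confusion count, as in this minimal sketch for a binary "converted / did not convert" label. The label vectors and the `classification_metrics` helper are invented for the example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```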

Questions 8

You need to set access to BigQuery for different departments within your company.

Your solution should comply with the following requirements:

1. Each department should have access only to their data. Each department will have one or more leads who need to be able to create and update tables and provide them to their team.

2. Each department has data analysts who need to be able to query but not modify data.

How should you set access to the data in BigQuery?

A. Create a dataset for each department. Assign the department leads the role of OWNER, and assign the data analysts the role of WRITER on their dataset.

B. Create a dataset for each department. Assign the department leads the role of WRITER, and assign the data analysts the role of READER on their dataset.

C. Create a table for each department. Assign the department leads the role of Owner, and assign the data analysts the role of Editor on the project the table is in.

D. Create a table for each department. Assign the department leads the role of Editor, and assign the data analysts the role of Viewer on the project the table is in.

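Correct Answer: B

With these basic roles, access is granted per dataset, so each department needs its own dataset. WRITER lets the leads create and update tables in that dataset, and READER lets the analysts query the data without modifying it. Project-level Editor or Viewer roles would expose every department's data.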

Questions 9

Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your worldwide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?

A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

B. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.

C. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.

Correct Answer: B

Questions 10

Which of the following are feature engineering techniques? (Select 2 answers)

A. Hidden feature layers

B. Feature prioritization

C. Crossed feature columns

D. Bucketization of a continuous feature

Correct Answer: CD

Selecting and crafting the right set of feature columns is key to learning an effective model. Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into. Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.

Reference: https://www.tensorflow.org/tutorials/wide#selecting_and_engineering_features_for_the_model
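
Both techniques can be sketched without any ML framework. The example below turns a continuous age into a bucket ID and then crosses that bucket with a second categorical feature. The boundary values, the `occupation` feature, and the helper names are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for a continuous "age" feature.
AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

def bucketize(value, boundaries):
    """Return the ID of the bucket that `value` falls into.

    Bucket 0 is everything below the first boundary; each boundary
    value itself belongs to the bucket that starts at that boundary.
    """
    return bisect_right(boundaries, value)

def cross(*categories):
    """Combine several categorical values into one crossed category."""
    return "_x_".join(str(c) for c in categories)

age_bucket = bucketize(33, AGE_BOUNDARIES)
crossed = cross(age_bucket, "engineer")
```

The crossed value lets a linear model learn a separate weight for each combination, such as engineers aged 30 to 34, rather than one weight per base feature.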

Questions 11

When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?

A. 500 TB

B. 1 GB

C. 1 TB

D. 500 GB

Correct Answer: C

Cloud Bigtable is not a relational database. It does not support SQL queries, joins, or multi-row transactions. It is not a good solution for less than 1 TB of data.

Reference:

https://cloud.google.com/bigtable/docs/overview#title_short_and_other_storage_options

Questions 12

How can you get a neural network to learn about relationships between categories in a categorical feature?

A. Create a multi-hot column

B. Create a one-hot column

C. Create a hash bucket

D. Create an embedding column

Correct Answer: D

There are two problems with one-hot encoding. First, it has high dimensionality, meaning that instead of having just one value, like a continuous feature, it has many values, or dimensions. This makes computation more time-consuming, especially if a feature has a very large number of categories. The second problem is that it doesn't encode any relationships between the categories. They are completely independent from each other, so the network has no way of knowing which ones are similar to each other.

Both of these problems can be solved by representing a categorical feature with an embedding column. The idea is that each category has a smaller vector with, let's say, 5 values in it. But unlike a one-hot vector, the values are not usually 0. The values are weights, similar to the weights that are used for basic features in a neural network. The difference is that each category has a set of weights (5 of them in this case). You can think of each value in the embedding vector as a feature of the category. So, if two categories are very similar to each other, then their embedding vectors should be very similar too.

Reference: https://cloudacademy.com/google/introduction-to-google-cloud-machine-learning-engine-course/a-wide-and-deep-model.html
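
The contrast between the two encodings can be seen in a small sketch. The three categories and the 5-value embedding vectors below are hand-picked for illustration; in a real network the embedding values would be learned during training.

```python
import math

CATEGORIES = ["cat", "dog", "car"]

def one_hot(category):
    """One dimension per category, with a single 1."""
    return [1.0 if category == c else 0.0 for c in CATEGORIES]

# Hypothetical embedding table: one 5-value weight vector per category.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0, 0.2],
    "dog": [0.8, 0.9, 0.2, 0.1, 0.1],
    "car": [0.0, 0.1, 0.9, 0.8, 0.7],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# One-hot vectors of different categories are always orthogonal.
one_hot_similarity = cosine(one_hot("cat"), one_hot("dog"))

# Embedding vectors can place related categories close together.
cat_dog = cosine(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine(EMBEDDINGS["cat"], EMBEDDINGS["car"])
```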

Questions 13

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison.

What should you do?

A. Select random samples from the tables using the RAND() function and compare the samples.

B. Select random samples from the tables using the HASH() function and compare the samples.

C. Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.

D. Create stratified random samples using the OVER() function and compare equivalent samples from each table.

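Correct Answer: C

Comparing random samples cannot show that two tables are identical. Hashing the full, sorted contents of each table, excluding timestamp columns that differ between runs, and then comparing the hashes can.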
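
The hash-after-sorting idea described in option C can be sketched on in-memory rows. This is a simplified illustration, not the Dataproc job itself; the column names and the `table_fingerprint` helper are assumptions.

```python
import hashlib

def table_fingerprint(rows, exclude=()):
    """Order-independent SHA-256 over all rows, skipping excluded columns."""
    # Serialize each row with its columns in a fixed order, then sort the
    # rows so that the original row order does not affect the hash.
    serialized = sorted(
        repr(tuple((col, row[col]) for col in sorted(row) if col not in exclude))
        for row in rows
    )
    digest = hashlib.sha256()
    for line in serialized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

# Hypothetical outputs: same data, different row order and load timestamps.
original = [
    {"name": "a", "total": 10, "loaded_at": "2024-01-01"},
    {"name": "b", "total": 20, "loaded_at": "2024-01-01"},
]
migrated = [
    {"name": "b", "total": 20, "loaded_at": "2024-02-01"},
    {"name": "a", "total": 10, "loaded_at": "2024-02-01"},
]

same = table_fingerprint(original, exclude=("loaded_at",)) == table_fingerprint(
    migrated, exclude=("loaded_at",)
)
```

Because every row contributes to the hash, a single changed value in either table produces a different fingerprint, which sampling cannot guarantee.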

Exam Code: PROFESSIONAL-DATA-ENGINEER
Exam Name: Professional Data Engineer on Google Cloud Platform
Last Update: Jun 11, 2025
Questions: 331

2025 Copyright @ exam2pass.com All trademarks are the property of their respective vendors. We are not associated with any of them.