Databricks-Certified-Professional-Data-Engineer Exam [2023] Question Set for Beginners: Databricks PDF Questions


New Question 11
To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.
The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.
Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?


New Question 12
What is the main difference between the below two commands?
1. SELECT * FROM table
2. AS SELECT * FROM table


New Question 13
If you create a database sample_db with the statement CREATE DATABASE sample_db, what will be the default location of the database in DBFS?
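
As a rough illustration of how a default location is derived, the sketch below assumes the workspace uses the standard Hive metastore warehouse directory (dbfs:/user/hive/warehouse) and that no LOCATION clause was given. The helper name default_database_location is hypothetical and is not a Databricks API.

```python
def default_database_location(db_name, warehouse_dir="dbfs:/user/hive/warehouse"):
    """Build the path a database gets when CREATE DATABASE has no LOCATION clause.

    Assumption: the warehouse directory is the usual default; a workspace can
    override it through the spark.sql.warehouse.dir setting.
    """
    return f"{warehouse_dir}/{db_name}.db"


location = default_database_location("sample_db")
print(location)
```

In a notebook, DESCRIBE DATABASE sample_db reports the actual location, which is the safer check when the warehouse directory may have been customized.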


New Question 14
An external object storage container has been mounted to the location /mnt/finance_eda_bucket.
The following logic was executed to create a database for the finance team:

After the database was successfully created and permissions configured, a member of the finance team runs the following code:

If all users on the finance team are members of the finance group, which statement describes how the tx_sales table will be created?


New Question 15
The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings.
The below query is used to create the alert:

The query is set to refresh each minute and always completes in less than 10 seconds. The alert is set to trigger when mean(temperature) > 120. Notifications are triggered to be sent at most every 1 minute.
If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?


New Question 16
The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema “customer_id LONG, predictions DOUBLE, date DATE”.

The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?
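
One way to keep every day's predictions comparable over time is to append each batch to the same Delta table rather than overwrite or merge it. The sketch below assumes that approach; the function name append_predictions and the table name churn_preds are hypothetical, and preds is expected to be a Spark DataFrame supplied by the caller.

```python
def append_predictions(preds, table_name="churn_preds"):
    """Append one batch of predictions to a Delta table.

    Assumption: `preds` is a Spark DataFrame with the schema
    "customer_id LONG, predictions DOUBLE, date DATE". Append mode only adds
    new files, so earlier predictions stay in place and no existing data has
    to be rewritten or scanned.
    """
    preds.write.format("delta").mode("append").saveAsTable(table_name)
```

Because each run only writes new rows, the date column is what lets later queries compare predictions across days.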


New Question 17
A team member who currently owns a few tables is leaving the team. Instead of transferring ownership to another user, you have decided to transfer it to a group so that, in the future, anyone in the group can manage the permissions rather than a single individual. Which of the following commands helps you accomplish this?
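
Databricks SQL provides an ALTER TABLE ... OWNER TO statement for changing a table's owner, and the new owner can be a group. The sketch below only builds that statement as a string; the helper transfer_ownership_sql and the names sales and data_engineers are hypothetical.

```python
def transfer_ownership_sql(table_name, group_name):
    """Return an ALTER TABLE statement that makes a group the table owner.

    The group name is wrapped in backticks, which is how principals with
    special characters are quoted in Databricks SQL.
    """
    return f"ALTER TABLE {table_name} OWNER TO `{group_name}`"


statement = transfer_ownership_sql("sales", "data_engineers")
print(statement)
```

In a notebook the resulting string could be passed to spark.sql(statement), once per table that the departing team member owns.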


New Question 18
You are still noticing slow queries after running OPTIMIZE, which resolved the small files problem. The column you filter on (transactionId) has high cardinality and is an auto-incrementing number. Which Delta optimization can you enable to filter data effectively based on this column?
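
Delta Lake records per-file minimum and maximum values for columns and uses them to skip files that cannot match a filter. The toy simulation below is a sketch under simplified assumptions (plain Python lists standing in for data files, hypothetical helper names) of why co-locating a high-cardinality column, as OPTIMIZE ... ZORDER BY (transactionId) aims to do, makes that skipping effective.

```python
import random


def files_to_scan(files, target):
    """Count files whose [min, max] range could contain the target value."""
    return sum(1 for f in files if min(f) <= target <= max(f))


def chunk(values, n_files):
    """Split a list of values into n_files equally sized 'files'."""
    size = len(values) // n_files
    return [values[i * size:(i + 1) * size] for i in range(n_files)]


random.seed(0)
ids = list(range(10_000))      # stand-in for transactionId values
shuffled = ids[:]
random.shuffle(shuffled)

unclustered = chunk(shuffled, 100)  # every file holds a random mix of IDs
clustered = chunk(ids, 100)         # nearby IDs are stored in the same file

target = 4_242
scanned_unclustered = files_to_scan(unclustered, target)
scanned_clustered = files_to_scan(clustered, target)
print(scanned_unclustered, scanned_clustered)
```

When the IDs are clustered, only the one file whose range covers the target has to be read; when they are scattered, almost every file's range overlaps the target and little can be skipped.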


New Question 19
The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake’s time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?
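
The timing in this scenario can be checked with simple date arithmetic. The sketch below assumes the default VACUUM retention threshold of 7 days and picks an arbitrary example week (Sunday, 1 January 2023); the function vacuum_removes is a hypothetical helper, not a Delta Lake API.

```python
from datetime import datetime, timedelta

# Assumption: default table settings, so VACUUM only removes unreferenced
# data files that are older than 7 days.
RETENTION = timedelta(days=7)

delete_job_end = datetime(2023, 1, 1, 2, 0)       # Sunday: job starts 01:00, runs < 1 hour
first_vacuum = datetime(2023, 1, 2, 3, 0)         # the following Monday at 03:00
second_vacuum = first_vacuum + timedelta(days=7)  # the Monday after that


def vacuum_removes(invalidated_at, vacuum_time, retention=RETENTION):
    """True if files invalidated at `invalidated_at` are old enough to be removed."""
    return vacuum_time - invalidated_at > retention


removed_first = vacuum_removes(delete_job_end, first_vacuum)
removed_second = vacuum_removes(delete_job_end, second_vacuum)
print(removed_first, removed_second)
```

Under these assumptions the files invalidated by Sunday's deletes are only about a day old at the first Monday VACUUM, so they survive it and remain reachable through time travel until the VACUUM run a week later.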


New Question 20
A data engineer has created a Delta table as part of a data pipeline. Downstream data analysts now need SELECT permission on the Delta table.
Assuming the data engineer is the Delta table owner, which part of the Databricks Lakehouse Platform can the data engineer use to grant the data analysts the appropriate access?


New Question 21
Which of the following data workloads will utilize a Silver table as its source?


New Question 22
Which of the following Structured Streaming queries is performing a hop from a Bronze table to a Silver table?


New Question 23
When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?
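
For reference, the settings most relevant to this question map onto fields of the Databricks Jobs API: a job cluster (new_cluster), a retry policy (max_retries) and a concurrency limit (max_concurrent_runs). The dictionary below is a sketch of one commonly recommended combination, not a definitive configuration; the job name, notebook path, runtime version, node type and worker count are all placeholder assumptions.

```python
job_settings = {
    "name": "streaming-ingest",        # hypothetical job name
    "max_concurrent_runs": 1,          # never run two copies of the same stream
    "tasks": [
        {
            "task_key": "run_stream",  # hypothetical task key
            "notebook_task": {
                "notebook_path": "/Repos/example/stream"  # placeholder path
            },
            # A job cluster exists only for the duration of the run, which is
            # typically cheaper than keeping an all-purpose cluster alive.
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",  # placeholder runtime
                "node_type_id": "i3.xlarge",          # placeholder node type
                "num_workers": 2,                     # placeholder size
            },
            "max_retries": -1,         # -1 means retry indefinitely after a failure
        }
    ],
}
```

Unlimited retries let the query restart from its checkpoint after a failure, while the single concurrent run prevents two instances from writing to the same checkpoint location.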


New Question 24
Which of the following developer operations in CI/CD flow can be implemented in Databricks Repos?


New Question 25
Which of the following is correct for the global temporary view?


New Question 26
Consider flipping a coin for which the probability of heads is p, where p is unknown, and our goal is to estimate p. The obvious approach is to count how many times the coin came up heads and divide by the total number of coin flips. If we flip the coin 1000 times and it comes up heads 367 times, it is very reasonable to estimate p as approximately 0.367. However, suppose we flip the coin only twice and we get heads both times. Is it reasonable to estimate p as 1.0? Intuitively, given that we only flipped the coin twice, it seems a bit rash to conclude that the coin will always come up heads, and ____________ is a way of avoiding such rash
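
One common technique that matches this description is additive (Laplace) smoothing, which adds a small pseudo-count to every outcome before dividing. The sketch below assumes that technique with a pseudo-count of 1 per outcome; the function name smoothed_estimate is hypothetical.

```python
def smoothed_estimate(heads, flips, alpha=1.0):
    """Estimate P(heads) with additive smoothing over two outcomes.

    Each of the two outcomes (heads, tails) receives `alpha` pseudo-counts,
    so the denominator grows by 2 * alpha.
    """
    return (heads + alpha) / (flips + 2 * alpha)


raw_small = 2 / 2                               # unsmoothed estimate from two flips
smoothed_small = smoothed_estimate(2, 2)        # (2 + 1) / (2 + 2)
smoothed_large = smoothed_estimate(367, 1000)   # (367 + 1) / (1000 + 2)
print(raw_small, smoothed_small, round(smoothed_large, 4))
```

With only two flips the smoothed estimate is pulled well away from 1.0, while with 1000 flips the pseudo-counts barely change the result.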


New Question 27
How do you access or use tables in Unity Catalog?


New Question 28
Which of the following locations hosts the driver and worker nodes of a Databricks-managed cluster?


New Question 29
Your team has hundreds of jobs running, but it is difficult to track the cost of each job run. You are asked to recommend how to monitor and track cost across various workloads.


New Question 30
What could be the expected output of the query SELECT COUNT(DISTINCT *) FROM user on this table?


New Question 31
Which of the following Structured Streaming queries is performing a hop from a Bronze table to a Silver table?

