Which of the following processes best covers all of the following characteristics?
Collecting descriptive statistics like min, max, count and sum.
Collecting data types, length and recurring patterns.
Tagging data with keywords, descriptions or categories.
Performing data quality assessment and assessing the risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
A. Data Visualization
B. Data Virtualization
C. Data Profiling
D. Data Collection
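The characteristics listed in this question can be illustrated with a small sketch. It uses only the Python standard library; the sample rows, the column names, and the helper `profile_column` are hypothetical and chosen purely for illustration.

```python
# Hypothetical sample table; the question itself names no dataset.
rows = [
    {"id": 1, "country": "NO", "amount": 10.0},
    {"id": 2, "country": "IT", "amount": 20.5},
    {"id": 3, "country": "NO", "amount": 10.0},
    {"id": 4, "country": None, "amount": 4.5},
]

def profile_column(rows, name):
    """Collect simple descriptive statistics and metadata for one column."""
    values = [row[name] for row in rows]
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(non_null),
        "nulls": len(values) - len(non_null),
        "types": sorted({type(v).__name__ for v in non_null}),
        "distinct": len(set(non_null)),
        # A column with no nulls and all-distinct values is a key candidate.
        "key_candidate": len(non_null) == len(values)
        and len(set(non_null)) == len(values),
    }
    # min, max and sum are only reported for numeric columns.
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        stats.update(min=min(non_null), max=max(non_null), sum=sum(non_null))
    return stats

id_profile = profile_column(rows, "id")
country_profile = profile_column(rows, "country")
amount_profile = profile_column(rows, "amount")
```

Here `id` comes out as a key candidate, while `country` does not because it contains a null and a repeated value.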
Which command is used to install Jupyter Notebook?
A. pip install jupyter
B. pip install notebook
C. pip install jupyter-notebook
D. pip install nbconvert
Which Python method can a data scientist use to remove duplicates?
A. remove_duplicates()
B. duplicates()
C. drop_duplicates()
D. clean_duplicates()
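For reference, pandas exposes `DataFrame.drop_duplicates()`. The sketch below shows its basic use, assuming pandas is installed; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one fully duplicated row (id 2, Rome).
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Oslo", "Rome", "Rome", "Oslo"],
})

# Drop rows that are identical across all columns.
deduped = df.drop_duplicates()

# Drop rows that repeat a value in "city", keeping the first occurrence.
by_city = df.drop_duplicates(subset=["city"], keep="first")
```

The call returns a new DataFrame and leaves `df` unchanged.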
As a data scientist looking to use a reader account, which of the following are correct considerations about reader accounts for third-party access?
A. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
B. Each reader account belongs to the provider account that created it.
C. Users in a reader account can query data that has been shared with the reader account, but cannot perform any of the DML tasks that are allowed in a full account, such as data loading, insert, update, and similar data manipulation operations.
D. Data sharing is only possible between Snowflake accounts.
Which method is used for detecting data outliers in machine learning?
A. Scaler
B. Z-Score
C. BOXI
D. CMIYC
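A Z-score measures how many standard deviations a value lies from the mean. The sketch below is a minimal illustration using the standard library. The threshold of 3, the use of the population standard deviation, the sample readings, and the helper name `zscore_outliers` are assumptions, not part of the question.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Hypothetical sensor readings with one obviously extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 95]
outliers = zscore_outliers(readings)
```

With very small samples the largest possible Z-score is bounded, so a fixed threshold of 3 may never trigger; a lower threshold or a robust alternative may suit such data better.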
Consider a data frame df with 10 rows and index ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']. What does the aggregate method in the code below do?
g = df.groupby(df.index.str.len())
g.aggregate({'A':len, 'B':np.sum})
A. Computes Sum of column A values
B. Computes length of column A
C. Computes length of column A and Sum of Column B values of each group
D. Computes length of column A and Sum of Column B values
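The grouping in this question can be reproduced with a small sketch, assuming pandas and NumPy are installed. The question does not give the contents of columns A and B, so the values below are made up.

```python
import numpy as np
import pandas as pd

index = ['r1', 'r2', 'r3', 'row4', 'row5', 'row6', 'r7', 'r8', 'r9', 'row10']
# Hypothetical column values; only the index comes from the question.
df = pd.DataFrame({'A': range(10), 'B': range(10, 20)}, index=index)

# Group rows by the string length of their index label (2, 4 or 5).
g = df.groupby(df.index.str.len())

# Apply a different aggregation per column, once for every group.
result = g.aggregate({'A': len, 'B': np.sum})
print(result)
```

The result has one row per index-label length, holding the number of rows in that group under A and the sum of that group's B values under B.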
Which of the following is a Python-based web application framework for visualizing data and analyzing results in a more efficient and flexible way?
A. StreamBI
B. Streamlit
C. Streamsets
D. Rapter
In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?
A. by 1
B. no change
C. by intercept
D. by its slope
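A simple linear regression model predicts y = intercept + slope * x. The sketch below evaluates such a model at two inputs one unit apart; the coefficient values and the function name `predict` are arbitrary and chosen only for illustration.

```python
def predict(x, intercept=2.0, slope=0.5):
    """Prediction of a simple linear regression model with one input."""
    return intercept + slope * x

# Change in the prediction when the input increases by exactly one unit.
delta = predict(11.0) - predict(10.0)
print(delta)
```

The intercept cancels in the difference, so the change does not depend on where along the x-axis the one-unit step is taken.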
Which of the following are not types of window functions in Snowflake? Choose 2.
A. Rank-related functions.
B. Window frame functions.
C. Aggregation window functions.
D. Association functions.
Which of the following is a common evaluation metric for binary classification?
A. Accuracy
B. F1 score
C. Mean squared error (MSE)
D. Area under the ROC curve (AUC)
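Accuracy, F1 score and AUC can each be computed from true labels and predicted scores. The sketch below is a plain-Python illustration, not a library implementation. The labels, scores, the 0.5 decision threshold, and the function names are all assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and F1 depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent.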