A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?
A. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
B. Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
C. Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
D. Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.
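Options B and C both describe splitting the daily data into several gzip files before upload. As a minimal sketch of that splitting step, assuming the slice count is already known and the records fit in memory, the following distributes lines round-robin across a number of gzip parts. The function name `split_into_gzip_parts` and its parameters are hypothetical, and the upload to Amazon S3 is deliberately left out.

```python
import gzip
import io


def split_into_gzip_parts(lines, slice_count, multiple=1):
    """Distribute lines round-robin across slice_count * multiple gzip parts.

    Hypothetical helper: returns a list of gzip-compressed byte strings,
    one per part, instead of writing or uploading any files.
    """
    part_count = slice_count * multiple
    buffers = [io.BytesIO() for _ in range(part_count)]
    writers = [gzip.GzipFile(fileobj=buf, mode="wb") for buf in buffers]

    for index, line in enumerate(lines):
        # Round-robin keeps the parts close to equal in size.
        writers[index % part_count].write(line.encode("utf-8") + b"\n")

    for writer in writers:
        # Closing the GzipFile flushes the gzip trailer but leaves the buffer open.
        writer.close()

    return [buf.getvalue() for buf in buffers]


if __name__ == "__main__":
    # Assumed example: a cluster with 4 slices, producing 8 parts.
    parts = split_into_gzip_parts((f"row-{i}" for i in range(100)), slice_count=4, multiple=2)
    print(len(parts))
```

Each returned part would then be uploaded under a common key prefix so that a single COPY command can reference all of them.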
A retail company's data analytics team recently created multiple product sales analysis dashboards for the average selling price per product using Amazon QuickSight. The dashboards were created from .csv files uploaded to Amazon S3. The team is now planning to share the dashboards with the respective external product owners by creating individual users in Amazon QuickSight. For compliance and governance reasons, restricting access is a key requirement. The product owners should view only their respective product analysis in the dashboard reports.
Which approach should the data analytics team take to allow product owners to view only their products in the dashboard?
A. Separate the data by product and use S3 bucket policies for authorization.
B. Separate the data by product and use IAM policies for authorization.
C. Create a manifest file with row-level security.
D. Create dataset rules with row-level security.
A large university has adopted a strategic goal of increasing diversity among enrolled students. The data analytics team is creating a dashboard with data visualizations to enable stakeholders to view historical trends. All access must be authenticated using Microsoft Active Directory. All data in transit and at rest must be encrypted.
Which solution meets these requirements?
A. Amazon QuickSight Standard edition configured to perform identity federation using SAML 2.0 and the default encryption settings.
B. Amazon QuickSight Enterprise edition configured to perform identity federation using SAML 2.0 and the default encryption settings.
C. Amazon QuickSight Standard edition using AD Connector to authenticate using Active Directory. Configure Amazon QuickSight to use customer-provided keys imported into AWS KMS.
D. Amazon QuickSight Enterprise edition using AD Connector to authenticate using Active Directory. Configure Amazon QuickSight to use customer-provided keys imported into AWS KMS.
A bank operates in a regulated environment. The compliance requirements for the country in which the bank operates say that customer data for each state should only be accessible by the bank's employees located in the same state. Bank employees in one state should NOT be able to access data for customers who have provided a home address in a different state.
The bank's marketing team has hired a data analyst to gather insights from customer data for a new campaign being launched in certain states. Currently, data linking each customer account to its home state is stored in a tabular .csv file within a single Amazon S3 folder in a private S3 bucket. The total size of the S3 folder is 2 GB uncompressed. Due to the country's compliance requirements, the marketing team is not able to access this folder.
The data analyst is responsible for ensuring that the marketing team gets one-time access to customer data for their campaign analytics project, while being subject to all the compliance requirements and controls.
Which solution should the data analyst implement to meet the desired requirements with the LEAST amount of setup effort?
A. Re-arrange data in Amazon S3 to store customer data about each state in a different S3 folder within the same bucket. Set up S3 bucket policies to provide marketing employees with appropriate data access under compliance controls. Delete the bucket policies after the project.
B. Load tabular data from Amazon S3 to an Amazon EMR cluster using S3DistCp. Implement a custom Hadoop-based row-level security solution on the Hadoop Distributed File System (HDFS) to provide marketing employees with appropriate data access under compliance controls. Terminate the EMR cluster after the project.
C. Load tabular data from Amazon S3 to Amazon Redshift with the COPY command. Use the built-in row-level security feature in Amazon Redshift to provide marketing employees with appropriate data access under compliance controls. Delete the Amazon Redshift tables after the project.
D. Load tabular data from Amazon S3 to Amazon QuickSight Enterprise edition by directly importing it as a data source. Use the built-in row-level security feature in Amazon QuickSight to provide marketing employees with appropriate data access under compliance controls. Delete Amazon QuickSight data sources after the project is complete.
An online retail company is using Amazon Redshift to run queries and perform analytics on customer shopping behavior. When multiple queries are running on the cluster, runtime for small queries increases significantly. The company's data analytics team wants to decrease the runtime of these small queries by prioritizing them ahead of large queries.
Which solution will meet these requirements?
A. Use Amazon Redshift Spectrum for small queries.
B. Increase the concurrency limit in workload management (WLM).
C. Configure short query acceleration in workload management (WLM).
D. Add a dedicated compute node for small queries.
A company is storing millions of sales transaction records in Amazon Redshift. A data analyst must perform an analysis on sales data. The analysis depends on a subset of customer record data that resides in a Salesforce application. The company wants to transfer the data from Salesforce with the least possible infrastructure setup, coding, and operational effort.
Which solution meets these requirements?
A. Use AWS Glue and the SpringML library to connect Apache Spark with Salesforce and extract the data as a table to Amazon S3 in Apache Parquet format. Query the data by using Amazon Redshift Spectrum.
B. Use Amazon AppFlow to create a flow. Establish a connection and a flow trigger to transfer customer record data from Salesforce to an Amazon Redshift table.
C. Use Amazon API Gateway to configure a Salesforce customer data flow subscription to AWS Lambda events and create tables in Amazon S3 in Apache Parquet format. Query the data by using Amazon Redshift Spectrum.
D. Use Salesforce Data Loader to export the Salesforce customer data as a .csv file and load it into Amazon S3. Query the data by using Amazon Redshift Spectrum.
A company needs a solution to control data access for the company's Amazon S3 data lake. The company expects the number of data sources in the data lake and the number of users that access the data to increase rapidly. All the data in the data lake is cataloged in an AWS Glue Data Catalog. Users access the data by using Amazon Athena and Amazon QuickSight.
A data analytics specialist must implement a solution that controls which users can ingest new data into the data lake. The solution also must restrict access to data at the column level and must provide audit capabilities.
Which solution will meet these requirements?
A. Use IAM resource-based policies to allow access to required S3 prefixes only. Use AWS CloudTrail for audit logs.
B. Use AWS Lake Formation access controls for the data in the data lake. Use AWS CloudTrail for audit logs.
C. Use IAM identity-based policies to allow access to authorized users only. Use Amazon CloudWatch for audit logs.
D. Use Athena federated queries to access the data in the data lake. Use S3 server access logs for audit logs.
A data analyst notices the following error message while loading data to an Amazon Redshift cluster:
“The bucket you are attempting to access must be addressed using the specified endpoint.”
What should the data analyst do to resolve this issue?
A. Specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.
B. Change the Amazon S3 object's ACL to grant the S3 bucket owner full control of the object.
C. Launch the Redshift cluster in a VPC.
D. Configure the timeout settings according to the operating system used to connect to the Redshift cluster.
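Option A refers to the REGION parameter of the Redshift COPY command. As a rough illustration of where that parameter sits in the statement, the sketch below assembles a COPY string in Python. The helper name `build_copy_statement`, the table name, the bucket path, and the role ARN are all hypothetical placeholders, and the sketch only builds the SQL text without connecting to a cluster.

```python
def build_copy_statement(table, s3_uri, iam_role_arn, bucket_region):
    """Assemble a COPY statement that names the S3 bucket's Region explicitly.

    Hypothetical helper: all arguments are placeholders supplied by the caller.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{bucket_region}';"
    )


if __name__ == "__main__":
    # Assumed example values; none of these come from the question.
    print(
        build_copy_statement(
            table="sales",
            s3_uri="s3://example-bucket/daily/part-",
            iam_role_arn="arn:aws:iam::123456789012:role/ExampleCopyRole",
            bucket_region="us-west-2",
        )
    )
```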
A large fashion retailer wants to transform a source dataset to a consumable format. The retailer is building an ETL pipeline and needs to deduplicate the data because the retailer's various departments share similar customer and stock information. The retailer wants to build a data lake in Amazon S3 after the transformation and deduplication processes are completed.
Which solution MOST cost-effectively meets these requirements?
A. Load the data into Amazon Redshift and build custom deduplication scripts by using SQL. Use the UNLOAD command in Amazon Redshift to store the data in Amazon S3.
B. Use AWS Glue to transform the data and use FindMatches to deduplicate the data. Store the output in Amazon S3.
C. Use Amazon EMR to transform the data. Deduplicate the data by using custom Spark SQL scripts and use EMRFS to store the output in Amazon S3.
D. Use an Amazon Athena federated query to load the data from the sources. Build custom Athena SQL scripts to deduplicate and store the output to Amazon S3.
A bank wants to migrate a Teradata data warehouse to the AWS Cloud. The bank needs a solution for reading large amounts of data and requires the highest possible performance. The solution also must maintain the separation of storage and compute.
Which solution meets these requirements?
A. Use Amazon Athena to query the data in Amazon S3.
B. Use Amazon Redshift with dense compute nodes to query the data in Amazon Redshift managed storage.
C. Use Amazon Redshift with RA3 nodes to query the data in Amazon Redshift managed storage.
D. Use PrestoDB on Amazon EMR to query the data in Amazon S3.