Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?
A. Recall
B. Misclassification rate
C. Mean absolute percentage error (MAPE)
D. Area Under the ROC Curve (AUC)
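For reference, the four metrics named in the options can be computed by hand. The following is a minimal sketch in plain Python; the function names, labels, scores, and the 0.5 threshold are illustrative assumptions and are not taken from any particular library. Note that recall and misclassification rate depend on the chosen threshold, AUC is computed from the raw scores, and MAPE applies to numeric forecasts rather than class labels.

```python
def recall(y_true, y_pred):
    # Fraction of actual positives that were predicted positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def misclassification_rate(y_true, y_pred):
    # Fraction of all predictions that differ from the true label.
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def mape(actual, forecast):
    # Mean absolute percentage error, for numeric targets (actual must be non-zero).
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def auc(y_true, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count as half). This equals the area under the ROC curve.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(recall(y_true, y_pred))
print(misclassification_rate(y_true, y_pred))
print(auc(y_true, scores))
print(mape([100.0, 200.0], [110.0, 180.0]))
```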
A company has raw user and transaction data stored in Amazon S3, a MySQL database, and Amazon Redshift. A Data Scientist needs to perform an analysis by joining the three datasets from Amazon S3, MySQL, and Amazon Redshift, and then calculating the average of a few selected columns from the joined data.
Which AWS service should the Data Scientist use?
A. Amazon Athena
B. Amazon Redshift Spectrum
C. AWS Glue
D. Amazon QuickSight
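The operation described in the scenario (join three datasets on a shared key, then average selected columns) can be sketched independently of any AWS service. The example below is a minimal in-memory illustration in plain Python; the table contents, the `user_id` key, the column names, and the `join_on` helper are all hypothetical.

```python
from statistics import mean

# Hypothetical rows standing in for the three sources.
users_s3 = [{"user_id": 1, "age": 30}, {"user_id": 2, "age": 40}]
orders_mysql = [
    {"user_id": 1, "amount": 20.0},
    {"user_id": 2, "amount": 60.0},
    {"user_id": 1, "amount": 40.0},
]
profiles_redshift = [{"user_id": 1, "score": 0.5}, {"user_id": 2, "score": 0.9}]


def join_on(left, right, key):
    # Inner join: index the right side by key, then merge matching rows.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Join all three datasets on the shared key.
joined = join_on(join_on(orders_mysql, users_s3, "user_id"), profiles_redshift, "user_id")

# Average a few selected columns over the joined rows.
averages = {col: mean(row[col] for row in joined) for col in ("amount", "age", "score")}
print(averages)
```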
A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked.
Which services are integrated with Amazon SageMaker to track this information? (Choose two.)
A. AWS CloudTrail
B. AWS Health
C. AWS Trusted Advisor
D. Amazon CloudWatch
E. AWS Config
A Machine Learning Specialist is working with multiple data sources containing billions of records that need to be joined. What feature engineering and model development approach should the Specialist take with a dataset this large?
A. Use an Amazon SageMaker notebook for both feature engineering and model development
B. Use an Amazon SageMaker notebook for feature engineering and Amazon ML for model development
C. Use Amazon EMR for feature engineering and Amazon SageMaker SDK for model development
D. Use Amazon ML for both feature engineering and model development.
A technology startup is using complex deep neural networks and GPU compute to recommend the company's products to its existing customers based upon each customer's habits and interactions. The solution currently pulls each dataset from an Amazon S3 bucket before loading the data into a TensorFlow model pulled from the company's Git repository that runs locally. This job then runs for several hours while continually outputting its progress to the same S3 bucket. The job can be paused, restarted, and continued at any time in the event of a failure, and is run from a central queue.
Senior managers are concerned about the complexity of the solution's resource management and the costs involved in repeating the process regularly. They ask for the workload to be automated so it runs once a week, starting Monday and completing by the close of business Friday.
Which architecture should be used to scale the solution at the lowest cost?
A. Implement the solution using AWS Deep Learning Containers and run the container as a job using AWS Batch on a GPU-compatible Spot Instance
B. Implement the solution using a low-cost GPU-compatible Amazon EC2 instance and use the AWS Instance Scheduler to schedule the task
C. Implement the solution using AWS Deep Learning Containers, run the workload using AWS Fargate running on Spot Instances, and then schedule the task using the built-in task scheduler
D. Implement the solution using Amazon ECS running on Spot Instances and schedule the task using the ECS service scheduler.
A company needs to quickly make sense of a large amount of data and gain insight from it. The data is in different formats, the schemas change frequently, and new data sources are added regularly. The company wants to use AWS services to explore multiple data sources, suggest schemas, and enrich and transform the data. The solution should require the least possible coding effort for the data flows and the least possible infrastructure management.
Which combination of AWS services will meet these requirements?
A. 1. Amazon EMR for data discovery, enrichment, and transformation
   2. Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL
   3. Amazon QuickSight for reporting and getting insights
B. 1. Amazon Kinesis Data Analytics for data ingestion
   2. Amazon EMR for data discovery, enrichment, and transformation
   3. Amazon Redshift for querying and analyzing the results in Amazon S3
C. 1. AWS Glue for data discovery, enrichment, and transformation
   2. Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL
   3. Amazon QuickSight for reporting and getting insights
D. 1. AWS Data Pipeline for data transfer
   2. AWS Step Functions for orchestrating AWS Lambda jobs for data discovery, enrichment, and transformation
   3. Amazon Athena for querying and analyzing the results in Amazon S3 using standard SQL
   4. Amazon QuickSight for reporting and getting insights
A machine learning (ML) specialist wants to bring a custom training algorithm to Amazon SageMaker. The ML specialist implements the algorithm in a Docker container that is supported by SageMaker. How should the ML specialist package the Docker container so that SageMaker can launch the training correctly?
A. Specify the server argument in the ENTRYPOINT instruction in the Dockerfile.
B. Specify the training program in the ENTRYPOINT instruction in the Dockerfile.
C. Include the path to the training data in the docker build command when packaging the container.
D. Use a COPY instruction in the Dockerfile to copy the training program to the /opt/ml/train directory.
An online retail company wants to develop a natural language processing (NLP) model to improve customer service. A machine learning (ML) specialist is setting up distributed training of a Bidirectional Encoder Representations from Transformers (BERT) model on Amazon SageMaker. SageMaker will use eight compute instances for the distributed training.
The ML specialist wants to ensure the security of the data during the distributed training. The data is stored in an Amazon S3 bucket.
Which combination of steps should the ML specialist take to protect the data during the distributed training? (Choose three.)
A. Run distributed training jobs in a private VPC. Enable inter-container traffic encryption.
B. Run distributed training jobs across multiple VPCs. Enable VPC peering.
C. Create an S3 VPC endpoint. Then configure network routes, endpoint policies, and S3 bucket policies.
D. Grant read-only access to SageMaker resources by using an IAM role.
E. Create a NAT gateway. Assign an Elastic IP address for the NAT gateway.
F. Configure an inbound rule to allow traffic from a security group that is associated with the training instances.
A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model.
Which approaches will meet this requirement? (Choose two.)
A. Replace On-Demand Instances with Spot Instances
B. Configure model auto scaling dynamically to adjust the number of instances automatically.
C. Replace CPU-based EC2 instances with GPU-based EC2 instances.
D. Use multiple training instances.
E. Use a pre-trained version of the model. Run incremental training.
A machine learning (ML) specialist needs to solve a binary classification problem for a marketing dataset. The ML specialist must maximize the Area Under the ROC Curve (AUC) of the algorithm by training an XGBoost algorithm. The ML specialist must find values for the eta, alpha, min_child_weight, and max_depth hyperparameters that will generate the most accurate model.
Which approach will meet these requirements with the LEAST operational overhead?
A. Use a bootstrap script to install scikit-learn on an Amazon EMR cluster. Deploy the EMR cluster. Apply k-fold cross-validation methods to the algorithm.
B. Deploy Amazon SageMaker prebuilt Docker images that have scikit-learn installed. Apply k-fold cross-validation methods to the algorithm.
C. Use Amazon SageMaker automatic model tuning (AMT). Specify a range of values for each hyperparameter.
D. Subscribe to an AUC algorithm that is on AWS Marketplace. Specify a range of values for each hyperparameter.
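To illustrate what searching over "a range of values for each hyperparameter" means in general, the following is a minimal sketch of a plain random search in Python. It is not the SageMaker API and makes no claim about the search strategy any managed service uses. The ranges, the `sample` and `random_search` helpers, and especially `toy_objective` are assumptions made for illustration; in practice the objective would be the validation AUC of a trained XGBoost model.

```python
import random

# Hypothetical search ranges for the four hyperparameters named in the question.
ranges = {
    "eta": (0.01, 0.5),
    "alpha": (0.0, 2.0),
    "min_child_weight": (1.0, 10.0),
    "max_depth": (3, 10),  # integer bounds mean an integer-valued parameter
}


def sample(ranges, rng):
    # Draw one configuration uniformly from the given ranges.
    config = {}
    for name, (low, high) in ranges.items():
        if isinstance(low, int) and isinstance(high, int):
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config


def toy_objective(config):
    # Placeholder score (assumption): stands in for validation AUC.
    # A real objective would train a model with `config` and evaluate it.
    return -((config["eta"] - 0.1) ** 2) - 0.01 * (config["max_depth"] - 6) ** 2


def random_search(ranges, objective, trials, seed=0):
    # Evaluate `trials` random configurations and keep the best-scoring one.
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(trials):
        config = sample(ranges, rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(ranges, toy_objective, trials=50)
print(best_config, best_score)
```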