# Amazon EMR code examples for the SDK for Python

## Overview

Shows how to use the AWS SDK for Python (Boto3) to work with Amazon EMR.

*Amazon EMR is a web service that makes it easy to process vast amounts of data efficiently using Apache Hadoop and services offered by Amazon Web Services.*

## ⚠ Important

* Running this code might result in charges to your AWS account. For more details, see [AWS Pricing](https://aws.amazon.com/pricing/?aws-products-pricing.sort-by=item.additionalFields.productNameLowercase&aws-products-pricing.sort-order=asc&awsf.Free%20Tier%20Type=*all&awsf.tech-category=*all) and [Free Tier](https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&all-free-tier.sort-order=asc&awsf.Free%20Tier%20Types=*all&awsf.Free%20Tier%20Categories=*all).
* Running the tests might result in charges to your AWS account.
* We recommend that you grant your code least privilege. At most, grant only the minimum permissions required to perform the task. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege).
* This code is not tested in every AWS Region. For more information, see [AWS Regional Services](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services).

## Code examples

### Prerequisites

For prerequisites, see the [README](../../README.md#Prerequisites) in the `python` folder.

Install the packages required by these examples by running the following in a virtual environment:

```
python -m pip install -r requirements.txt
```

### Single actions

Code excerpts that show you how to call individual service functions.

* [Add steps to a job flow](emr_basics.py#L128) (`AddJobFlowSteps`)
* [Describe a cluster](emr_basics.py#L89) (`DescribeCluster`)
* [Describe a step](emr_basics.py#L185) (`DescribeStep`)
* [List steps for a cluster](emr_basics.py#L163) (`ListSteps`)
* [Run a job flow](emr_basics.py#L18) (`RunJobFlow`)
* [Terminate job flows](emr_basics.py#L110) (`TerminateJobFlows`)

### Scenarios

Code examples that show you how to accomplish a specific task by calling multiple functions within the same service.

* [Create a short-lived Amazon EMR cluster and run a step](emr_usage_demo.py)
* [Run a shell script to install libraries](install_libraries.py)

## Run the examples

### Instructions

#### Create a short-lived Amazon EMR cluster and run a step

This example shows you how to create a short-lived Amazon EMR cluster that runs a step and automatically terminates after the step completes.

Start the example by running the following at a command prompt:

```
python emr_usage_demo.py
```

The example also shows how to write a job step that uses Apache Spark to estimate the value of pi by performing a large number of parallelized calculations on cluster instances. Results are written to Amazon EMR logs and also to an Amazon S3 bucket.

#### Run a shell script to install libraries

This example shows you how to use AWS Systems Manager to run a shell script on Amazon EMR instances that installs additional libraries. This way, you can automate instance management instead of running commands manually through an SSH connection.

Start the example by running the following at a command prompt:

```
python install_libraries.py
```

To install additional libraries on running cluster instances, run the following command at a command prompt:

```
python install_libraries.py CLUSTER_ID SHELL_SCRIPT_PATH
```

This example is intended to be run as part of the tutorial in [Installing Additional Kernels and Libraries](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-install-kernels-libs.html).
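
As a rough illustration of the Systems Manager approach described above, the following is a minimal sketch and not necessarily how `install_libraries.py` is implemented. The function names `build_commands` and `install_libraries`, the `/home/hadoop` destination directory, the one-hour timeout, and the choice to target only core instances are all assumptions made for this example. It uses the Boto3 `list_instances` (Amazon EMR) and `send_command` (Systems Manager) calls with the `AWS-RunShellScript` document.

```python
import posixpath
import shlex


def build_commands(script_path, dest_dir="/home/hadoop"):
    """Build the shell commands that copy a script from Amazon S3 and run it.

    dest_dir is an assumed destination chosen for illustration only.
    """
    if not script_path.startswith("s3://"):
        raise ValueError(f"Expected an s3:// URI, got {script_path!r}")
    script_name = posixpath.basename(script_path)
    if not script_name:
        raise ValueError("The S3 URI must point to a script object, not a prefix")
    local_path = posixpath.join(dest_dir, script_name)
    return [
        f"aws s3 cp {shlex.quote(script_path)} {shlex.quote(local_path)}",
        f"bash {shlex.quote(local_path)}",
    ]


def install_libraries(cluster_id, script_path, emr_client, ssm_client):
    """Send the copy-and-run commands to the core instances of a cluster.

    emr_client and ssm_client are expected to be Boto3 clients for the
    'emr' and 'ssm' services. Returns the Systems Manager command ID.
    """
    # Only the first page of results is read here; a cluster with many
    # instances would need to follow the 'Marker' value in the response.
    response = emr_client.list_instances(
        ClusterId=cluster_id, InstanceGroupTypes=["CORE"]
    )
    instance_ids = [inst["Ec2InstanceId"] for inst in response["Instances"]]
    if not instance_ids:
        raise RuntimeError(f"No core instances found for cluster {cluster_id}")

    # AWS-RunShellScript runs the listed commands in order on each instance.
    result = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": build_commands(script_path)},
        TimeoutSeconds=3600,  # assumed timeout for this sketch
    )
    return result["Command"]["CommandId"]
```

With real clients, a call might look like `install_libraries(cluster_id, script_path, boto3.client("emr"), boto3.client("ssm"))`. Passing the clients in as arguments keeps the function easy to exercise with stand-in objects before running it against a live cluster.
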
The cluster specified by *CLUSTER_ID* must be set up to work with Systems Manager. You must also have previously uploaded a shell script to the Amazon S3 location specified by *SHELL_SCRIPT_PATH*.

### Tests

⚠ Running tests might result in charges to your AWS account.

To find instructions for running these tests, see the [README](../../README.md#Tests) in the `python` folder.

## Additional resources

* [Amazon EMR Management Guide](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html)
* [Amazon EMR API Reference](https://docs.aws.amazon.com/emr/latest/APIReference/Welcome.html)
* [SDK for Python Amazon EMR reference](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html)

---

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: Apache-2.0