# AWS Glue code examples for the SDK for Python

## Overview

Shows how to use the AWS SDK for Python (Boto3) to work with AWS Glue.

*AWS Glue is a scalable, serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.*

## ⚠ Important

* Running this code might result in charges to your AWS account. For more details, see [AWS Pricing](https://aws.amazon.com/pricing/?aws-products-pricing.sort-by=item.additionalFields.productNameLowercase&aws-products-pricing.sort-order=asc&awsf.Free%20Tier%20Type=*all&awsf.tech-category=*all) and [Free Tier](https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&all-free-tier.sort-order=asc&awsf.Free%20Tier%20Types=*all&awsf.Free%20Tier%20Categories=*all).
* Running the tests might result in charges to your AWS account.
* We recommend that you grant your code least privilege. At most, grant only the minimum permissions required to perform the task. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege).
* This code is not tested in every AWS Region. For more information, see [AWS Regional Services](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services).

## Code examples

### Prerequisites

For prerequisites, see the [README](../../README.md#Prerequisites) in the `python` folder.

Install the packages required by these examples by running the following in a virtual environment:

```
python -m pip install -r requirements.txt
```

### Single actions

Code excerpts that show you how to call individual service functions.

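
To illustrate what a single action can look like, here is a minimal sketch of a `GetCrawler` call. It is not necessarily how `glue_wrapper.py` implements the action. The function name `get_crawler_or_none` is hypothetical, and the sketch assumes that `glue_client` is a Boto3 Glue client created with `boto3.client("glue")`.

```python
def get_crawler_or_none(glue_client, name):
    """Return the description of the crawler called `name`, or None if it does not exist.

    Assumption: `glue_client` is a Boto3 Glue client, for example
    `boto3.client("glue")`. The function name is hypothetical and is not
    taken from glue_wrapper.py.
    """
    try:
        # GetCrawler returns a dict whose "Crawler" key holds the crawler metadata.
        response = glue_client.get_crawler(Name=name)
    except glue_client.exceptions.EntityNotFoundException:
        # Raised by the service when no crawler with this name exists.
        return None
    return response["Crawler"]


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   crawler = get_crawler_or_none(boto3.client("glue"), "my-crawler")
```

Passing the client in as a parameter keeps the function easy to test with a stub in place of a real AWS connection.
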
* [Create a crawler](glue_wrapper.py#L51) (`CreateCrawler`)
* [Create a job definition](glue_wrapper.py#L137) (`CreateJob`)
* [Delete a crawler](glue_wrapper.py#L304) (`DeleteCrawler`)
* [Delete a database from the Data Catalog](glue_wrapper.py#L288) (`DeleteDatabase`)
* [Delete a job definition](glue_wrapper.py#L254) (`DeleteJob`)
* [Delete a table from a database](glue_wrapper.py#L271) (`DeleteTable`)
* [Get a crawler](glue_wrapper.py#L28) (`GetCrawler`)
* [Get a database from the Data Catalog](glue_wrapper.py#L99) (`GetDatabase`)
* [Get a job run](glue_wrapper.py#L234) (`GetJobRun`)
* [Get runs of a job](glue_wrapper.py#L214) (`GetJobRuns`)
* [Get tables from a database](glue_wrapper.py#L118) (`GetTables`)
* [List job definitions](glue_wrapper.py#L196) (`ListJobs`)
* [Start a crawler](glue_wrapper.py#L82) (`StartCrawler`)
* [Start a job run](glue_wrapper.py#L163) (`StartJobRun`)

### Scenarios

Code examples that show you how to accomplish a specific task by calling multiple functions within the same service.

* [Get started with crawlers and jobs](glue_wrapper.py)

## Run the examples

### Instructions

#### Get started with crawlers and jobs

This example shows you how to do the following:

* Create a crawler that crawls a public Amazon S3 bucket and generates a database of CSV-formatted metadata.
* List information about databases and tables in your AWS Glue Data Catalog.
* Create a job to extract CSV data from the S3 bucket, transform the data, and load JSON-formatted output into another S3 bucket.
* List information about job runs, view transformed data, and clean up resources.

This example requires the following scaffold resources that are defined in the accompanying AWS CloudFormation script `setup_scenario_getting_started.yaml`.

* An Amazon Simple Storage Service (Amazon S3) bucket that can contain the Python ETL job script and receive output data.
* An AWS Identity and Access Management (IAM) role that can be assumed by AWS Glue.
  The role must grant read-write access to the S3 bucket and standard rights needed by AWS Glue.

You can deploy the scaffold resources at a command prompt.

```
python scaffold.py deploy
```

This outputs a role and bucket name similar to the following.

```
Outputs:
    RoleName: AWSGlueServiceRole-DocExample
    BucketName: doc-example-glue-scenario-docexampleglue6e2f12e5-3zjkuexample
```

If you prefer, you can deploy and destroy scaffold resources by using the AWS Cloud Development Kit (AWS CDK). To do this, run `cdk deploy` or `cdk destroy` in the [/resources/cdk/glue_role_bucket](/resources/cdk/glue_role_bucket) folder.

Start the example by running the following at a command prompt:

```
python glue_wrapper.py
```

After the example is done, destroy scaffold resources at a command prompt.

```
python scaffold.py destroy
```

### Tests

⚠ Running tests might result in charges to your AWS account.

To find instructions for running these tests, see the [README](../../README.md#Tests) in the `python` folder.

## Additional resources

* [AWS Glue Developer Guide](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html)
* [AWS Glue API Reference](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html)
* [SDK for Python AWS Glue reference](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html)

---

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: Apache-2.0