Airflow Sagemaker

Bases: airflow.contrib.operators.sagemaker_base_operator.SageMakerBaseOperator. The SageMakerTransformOperator initiates a SageMaker transform job; it returns the ARN of the model created in Amazon SageMaker, and its config argument is the configuration necessary to start a transform job (templated). The SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. With the SDK, you can train and deploy models using popular deep learning frameworks such as Apache MXNet and TensorFlow.
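For orientation, here is a minimal sketch of how the transform operator might be used, assuming the Airflow 1.10.x contrib operators and a config dict that follows the CreateTransformJob API shape; the bucket, model, and job names are placeholders, not values from this notebook.

```python
# Minimal sketch: SageMakerTransformOperator with a CreateTransformJob-style config.
# Normally this operator is created inside a DAG (see the DAG sketch later in this notebook).
from airflow.contrib.operators.sagemaker_transform_operator import SageMakerTransformOperator

transform_config = {
    "TransformJobName": "fashion-mnist-batch-transform",      # placeholder
    "ModelName": "fashion-mnist-model",                       # placeholder
    "TransformInput": {
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/fashion-mnist/test/x_test.csv",
        }},
        "ContentType": "text/csv",
        "SplitType": "Line",
    },
    "TransformOutput": {"S3OutputPath": "s3://my-bucket/fashion-mnist/output"},
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
}

transform_op = SageMakerTransformOperator(
    task_id="sagemaker_batch_transform",
    config=transform_config,   # templated, as described above
    wait_for_completion=True,
)
```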

This notebook uses the Fashion-MNIST classification task as an example to show how to track Airflow workflow executions using SageMaker Experiments.

Overall, the notebook is organized as follows:

  1. Download dataset and upload to Amazon S3.

  2. Create a simple CNN model to do the classification.

  3. Define the workflow as a DAG with two executions: a SageMaker TrainingJob for training the CNN model, followed by a SageMaker TransformJob to run batch predictions with the trained model.

  4. Host and run the workflow locally, and track the workflow run as an Experiment.

  5. List executions.

Note that if you are running the notebook in SageMaker Studio, please select the Python 3 (TensorFlow CPU Optimized) kernel; if you are running it in a SageMaker notebook instance, please select the conda_tensorflow_py36 kernel.

Setup¶

Create an S3 bucket to hold the data¶
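The notebook's bucket-creation cell is not shown here; a minimal sketch using boto3 (the bucket name and region are placeholders) might look like:

```python
# Sketch: create an S3 bucket to hold the dataset (bucket name/region are placeholders).
import boto3

region = boto3.Session().region_name
bucket = "sagemaker-fashion-mnist-example"  # must be globally unique

s3 = boto3.client("s3", region_name=region)
if region == "us-east-1":
    # us-east-1 does not accept a LocationConstraint
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )
```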

Prepare the dataset¶

We will be creating a SageMaker Training Job fitted on (x_train, y_train), and then a SageMaker Transform Job to perform batch inference over the large-scale (10K-sample) test set. To do the batch inference, we first need to flatten each sample image (28x28) in x_test into a float array with 784 features, and then concatenate all flattened samples into a CSV file.
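A sketch of this preprocessing step, assuming the dataset is loaded via tf.keras.datasets.fashion_mnist (the [0, 1] scaling and the saved file names are illustrative choices, not the notebook's exact ones):

```python
# Sketch: load Fashion-MNIST, save the training arrays, and flatten each 28x28
# test image into a 784-feature row of a CSV file for batch transform.
import numpy as np
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Save the training data for the SageMaker training job (placeholder format).
np.savez("train_data.npz", x_train=x_train, y_train=y_train)

# Scale to [0, 1] and flatten each test image into a 784-dimensional float vector.
x_test_flat = (x_test.astype("float32") / 255.0).reshape(x_test.shape[0], -1)

# One sample per line, comma-separated, no header or label column -- a format a
# transform job can consume with ContentType "text/csv" and SplitType "Line".
np.savetxt("x_test.csv", x_test_flat, delimiter=",", fmt="%.6f")
```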

Upload the dataset to S3¶
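A sketch of the upload using the SageMaker Python SDK's session helper, reusing the placeholder bucket and file names from the sketches above:

```python
# Sketch: upload the training data and the flattened test CSV to S3
# (placeholder bucket name and key prefixes).
import sagemaker

sess = sagemaker.Session()
bucket = "sagemaker-fashion-mnist-example"

train_s3_uri = sess.upload_data(path="train_data.npz", bucket=bucket, key_prefix="fashion-mnist/train")
test_s3_uri = sess.upload_data(path="x_test.csv", bucket=bucket, key_prefix="fashion-mnist/test")
print(train_s3_uri)
print(test_s3_uri)
```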

Create a simple CNN¶

The CNN we use in this example contains two consecutive (Conv2D - MaxPool - Dropout) modules, followed by a feed-forward layer, and a softmax layer to normalize the output into a valid probability distribution.
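A sketch of such a model in tf.keras follows; the layer sizes and dropout rates are illustrative rather than the notebook's exact values. In the actual training script, this model would be fit on (x_train, y_train) inside the SageMaker training job.

```python
# Sketch: two (Conv2D -> MaxPool -> Dropout) blocks, a dense layer, and a softmax output.
from tensorflow.keras import layers, models

def build_cnn(num_classes=10):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```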

Create workflow configurations¶

For the purpose of demonstration, we will be executing our workflow locally. Let's first create a directory under the Airflow root to store our DAGs, for example:
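```python
# Create the DAGs directory under the Airflow home (default ~/airflow).
import os

dags_dir = os.path.join(os.path.expanduser("~"), "airflow", "dags")
os.makedirs(dags_dir, exist_ok=True)
```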

First, we will create an experiment named fashion-mnist-classification-experiment to track our workflow execution.
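A sketch of this step using the sagemaker-experiments package (assuming it is installed and the default boto3 session has SageMaker permissions):

```python
# Sketch: create an Experiment to track the workflow run.
from smexperiments.experiment import Experiment

experiment = Experiment.create(
    experiment_name="fashion-mnist-classification-experiment",
    description="Track Airflow workflow executions for fashion-mnist classification",
)
```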

The following cell defines our DAG, a workflow with two steps: a training job on SageMaker, followed by a transform job that performs batch inference on the fashion-mnist test set we created earlier.

We will write the DAG definition into the airflow/dags directory we just created above.
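A condensed sketch of such a DAG file is shown below, assuming Airflow 1.10.x contrib operators, SageMaker Python SDK v2 parameter names, and the SDK's Airflow config helpers; the role ARN, S3 paths, entry-point script, and instance types are placeholders.

```python
# Sketch of ~/airflow/dags/fashion_mnist_dag.py: train, then batch transform.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.sagemaker_training_operator import SageMakerTrainingOperator
from airflow.contrib.operators.sagemaker_transform_operator import SageMakerTransformOperator
from sagemaker.tensorflow import TensorFlow
from sagemaker.workflow.airflow import training_config, transform_config_from_estimator

ROLE = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"   # placeholder
TRAIN_S3 = "s3://my-bucket/fashion-mnist/train"                      # placeholder
TEST_S3 = "s3://my-bucket/fashion-mnist/test/x_test.csv"             # placeholder

estimator = TensorFlow(
    entry_point="train.py",           # training script that builds and fits the CNN
    role=ROLE,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="1.15.2",
    py_version="py3",
)

train_cfg = training_config(estimator=estimator, inputs=TRAIN_S3)
transform_cfg = transform_config_from_estimator(
    estimator=estimator,
    task_id="sagemaker_training",     # the training task that produces the model
    task_type="training",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    data=TEST_S3,
    content_type="text/csv",
)

with DAG("fashion_mnist_workflow",
         start_date=datetime(2020, 1, 1),
         schedule_interval=None) as dag:
    train = SageMakerTrainingOperator(task_id="sagemaker_training", config=train_cfg)
    transform = SageMakerTransformOperator(task_id="sagemaker_transform", config=transform_cfg)
    train >> transform
```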

Now, let's initialize the Airflow DB and host Airflow locally.
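A sketch of the corresponding cells, assuming the Airflow 1.10 CLI (in Airflow 2, initdb became db init):

```python
# Initialize the Airflow metadata DB and start the webserver in daemon mode
# so the notebook is not blocked (Airflow 1.10 CLI assumed).
!airflow initdb
!airflow webserver -p 8080 -D
```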

Then, we start a backfill job to execute our workflow. Note that we use a backfill job simply because we don't want to wait for the Airflow scheduler to trigger the workflow.
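```python
# Trigger the DAG right away by backfilling a single date (Airflow 1.10 CLI assumed;
# the DAG id matches the placeholder DAG sketched above).
!airflow backfill fashion_mnist_workflow -s 2020-01-01 -e 2020-01-01
```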

List workflow executions¶

Each execution in the workflow is modeled as a trial; let's list our workflow executions.
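A sketch using the sagemaker-experiments package (experiment name as created above):

```python
# Sketch: list the trials (workflow executions) and their components for our experiment.
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

experiment = Experiment.load(experiment_name="fashion-mnist-classification-experiment")
for trial_summary in experiment.list_trials():
    print("Trial:", trial_summary.trial_name)
    trial = Trial.load(trial_name=trial_summary.trial_name)
    for component_summary in trial.list_trial_components():
        print("  Trial component:", component_summary.trial_component_name)
```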

Let's take a closer look at the jobs created and executed by our workflow.

Cleanup¶

Run the following cell to clean up the sample experiment. If you are working on your own experiment, please skip this step.
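A cleanup sketch using the sagemaker-experiments package, deleting trial components, trials, and finally the experiment itself:

```python
# Sketch: tear down the sample experiment created above.
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.trial_component import TrialComponent

experiment = Experiment.load(experiment_name="fashion-mnist-classification-experiment")
for trial_summary in experiment.list_trials():
    trial = Trial.load(trial_name=trial_summary.trial_name)
    for component_summary in trial.list_trial_components():
        tc = TrialComponent.load(trial_component_name=component_summary.trial_component_name)
        trial.remove_trial_component(tc)
        try:
            tc.delete()
        except Exception:
            # Components created by SageMaker jobs may not be deletable immediately.
            pass
    trial.delete()
experiment.delete()
```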

Set up instance profiles to deploy MLflow models to SageMaker

This article describes how to set up instance profiles to allow you to deploy MLflow models to AWS SageMaker. It is possible to use access keys for an AWS user with permissions similar to the IAM role specified here, but Databricks recommends using instance profiles to give a cluster permission to deploy to SageMaker.

Step 1: Create an AWS IAM role and attach SageMaker permission policy

  1. In the AWS console, go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click Create role.

    1. Under Select type of trusted entity, select AWS service.

    2. Under Choose the service that will use this role, click the EC2 service.

    3. Click Next: Permissions.

  4. In the Attach permissions policies screen, select AmazonSageMakerFullAccess.

  5. Click Next: Review.

  6. In the Role name field, enter a role name.

  7. Click Create role.

  8. In the Roles list, click the role name.

Make note of your Role ARN, which is of the format arn:aws:iam::<account-id>:role/<role-name>.
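The same role can also be created programmatically; a boto3 sketch mirroring the console steps above (the role name is a placeholder):

```python
# Sketch: the equivalent of Step 1 via boto3 (role name is a placeholder).
import json
import boto3

iam = boto3.client("iam")
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
role = iam.create_role(
    RoleName="sagemaker-deploy-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.attach_role_policy(
    RoleName="sagemaker-deploy-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)
print(role["Role"]["Arn"])
```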

Step 2: Add an inline policy for access to SageMaker deployment resources

Add a policy to the role.

  1. Click Add inline policy.

  2. Paste in the following JSON definition:

These permissions are required to allow the Databricks cluster to:

  1. Obtain the new role’s canonical ARN.
  2. Upload permission-scoped objects to S3 for use by SageMaker endpoint servers.

The role’s permissions will look like:

Step 3: Update the role’s trust policy

Add sts:AssumeRole access for the sagemaker.amazonaws.com service principal.

  1. Go to Role Summary > Trust relationships > Edit trust relationship.

  2. Paste and save the following JSON:

Your role’s trust relationships should resemble the following:
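Equivalently, the trust policy can be updated with boto3; a sketch, assuming the placeholder role name from the Step 1 sketch and the standard sts:AssumeRole trust statement:

```python
# Sketch: Step 3 via boto3 -- allow both EC2 and SageMaker to assume the role.
import json
import boto3

iam = boto3.client("iam")
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["ec2.amazonaws.com", "sagemaker.amazonaws.com"]},
        "Action": "sts:AssumeRole",
    }],
}
iam.update_assume_role_policy(
    RoleName="sagemaker-deploy-role",  # placeholder role name from the Step 1 sketch
    PolicyDocument=json.dumps(trust_policy),
)
```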

Step 4: Allow your Databricks workspace AWS role to pass the role

  1. Go to your Databricks workspace AWS role.

  2. Click Add inline policy.

  3. Paste and save the following JSON definition:

where account-id is the ID of the account running the AWS SageMaker service and role-name is the role you defined in Step 1.
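For reference, the same inline policy could be attached with boto3; a sketch, with the account ID, role name, and workspace role name left as placeholders:

```python
# Sketch: Step 4 via boto3 -- let the Databricks workspace role pass the new role.
import json
import boto3

iam = boto3.client("iam")
pass_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "iam:PassRole",
        "Resource": "arn:aws:iam::<account-id>:role/<role-name>",  # role from Step 1
    }],
}
iam.put_role_policy(
    RoleName="<databricks-workspace-role>",  # placeholder: your workspace AWS role
    PolicyName="sagemaker-pass-role",
    PolicyDocument=json.dumps(pass_role_policy),
)
```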

Step 5: Create a Databricks cluster instance profile

  1. In your Databricks Admin Console, go to the Instance Profiles tab and click Add Instance Profile.

  2. Paste in the instance profile ARN associated with the AWS role you created. This ARN is of the form arn:aws:iam::<account-id>:instance-profile/<role-name> and can be found in the AWS console:

  3. Click the Add button.

For details, see Secure access to S3 buckets using instance profiles.

Step 6: Launch a cluster with the instance profile

Launch a cluster and select the instance profile you created in Step 5. For details, see Secure access to S3 buckets using instance profiles.
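With the cluster running under the instance profile, a deployment from a notebook might look like the following sketch. It uses the MLflow 1.x mlflow.sagemaker.deploy API (newer MLflow versions expose this through mlflow.deployments); the app name, run ID, role ARN, region, and instance type are all placeholders.

```python
# Sketch: deploy a logged MLflow model to a SageMaker endpoint (MLflow 1.x API).
import mlflow.sagemaker as mfs

mfs.deploy(
    app_name="my-model-endpoint",                                     # placeholder
    model_uri="runs:/<run-id>/model",                                 # placeholder
    execution_role_arn="arn:aws:iam::<account-id>:role/<role-name>",  # role from Step 1
    region_name="us-west-2",                                          # placeholder
    mode="create",
    instance_type="ml.m5.xlarge",
)
```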