Adding retry logic to urllib3 Python code

In this post I’m going to cover the basics of implementing retry logic using urllib3.

There is probably a solid argument saying “why aren’t you just using requests?”. As it happens, requests uses urllib3 and its Retry functionality under the hood.
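In fact, if you are already using requests, the same Retry object can be mounted on a session via an HTTPAdapter. A minimal sketch, reusing the flaky endpoint from below;

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# mount urllib3's Retry onto a requests session
session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=Retry(total=3, status_forcelist=[503])))
r = session.get('http://www.myflakyendpoint.com/dicey')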

For the purposes of this post, let’s imagine that we have a REST service and one of the resources is particularly popular, or flaky, and is throwing the occasional 503 HTTP status code.

Our initial code might look something like;

import logging

import urllib3

logger = logging.getLogger(__name__)

http = urllib3.PoolManager()
r = http.request('GET', 'http://www.myflakyendpoint.com/dicey')
if r.status == 200:
    logger.info('That was lucky')

We have one chance to get it right. Yes, some convoluted while loop against the status code could be used, but that’s ugly.
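For contrast, a minimal sketch of what that manual loop might look like (retrying up to 3 times on any 5xx status);

import urllib3

http = urllib3.PoolManager()
r = http.request('GET', 'http://www.myflakyendpoint.com/dicey')
attempts = 1
while r.status >= 500 and attempts < 3:
    r = http.request('GET', 'http://www.myflakyendpoint.com/dicey')
    attempts += 1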

Another option available to us is to make use of urllib3.util.Retry and have our request retry a specified number of times.

import logging

import urllib3
from urllib3.util import Retry
from urllib3.exceptions import MaxRetryError

logger = logging.getLogger(__name__)

http = urllib3.PoolManager()
retry = Retry(3, raise_on_status=True, status_forcelist=range(500, 600))

try:
    r = http.request('GET', 'http://www.myflakyendpoint.com/dicey', retries=retry)
except MaxRetryError as m_err:
    logger.error('Failed due to {}'.format(m_err.reason))

In this code we’ve created a Retry object, telling it to retry a total of 3 times and raise an exception if all retries are exhausted. The status_forcelist is the set of HTTP status codes that will be considered failures.

Some other interesting arguments for the Retry object are:

Argument          Comment
total             The total number of retries allowed. Takes precedence over the combined connect and read counts
read              How many read retries are allowed
connect           How many connect errors are allowed
redirect          How many redirects to allow. This is handy to prevent redirect loops
method_whitelist  Which HTTP methods are allowed. By default only idempotent methods are allowed, ruling out POST
backoff_factor    A factor used to apply an increasing delay between retry attempts (see the docs for the formula)
raise_on_status   Whether to return the failed response or raise an exception once retries are exhausted
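Pulling a few of these together, a Retry combining several of these arguments might look like the following (a sketch; the values are illustrative and should be tuned for your own service);

from urllib3.util import Retry

retry = Retry(
    total=5,                            # overall cap on attempts
    connect=2,
    read=2,
    redirect=3,
    backoff_factor=0.5,                 # increasing sleep between attempts
    method_whitelist=['GET', 'POST'],   # also allow retrying POST
    status_forcelist=[502, 503, 504],
    raise_on_status=True
)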

For more information, see the urllib3 documentation.

Refreshing AWS credentials in Python

In a recent post I covered using RefreshingAWSCredentials within the .NET AWS SDK to solve an issue with the way my current organisation has configured Single Sign On (SSO) and temporary credentials.

Essentially, the solution involves a background process updating a credential file, then using a time limited credentials object to refresh the credentials.

Next…

The next issue to surface was satisfying the same requirement, but for the Python-based component of the 3rd party solution.

Refreshing Credential File

In this case, on a Red Hat instance, there is a cron job executing a Python script which handles the SSO process and writes the updated credentials and session token to a file which can be used by the 3rd party component.

Refreshing the Credentials in code

The existing code creates a session then creates the required resources. This works fine for the first hour, until the temporary credentials expire.

import boto3

region = 'eu-west-1'  # ideally taken from config
queues = {}

session = boto3.Session()
queues['incoming'] = session.resource('sqs', region).get_queue_by_name(QueueName='incoming_queue')

There is only a small amount of work needed to make this refresh against the externally updated credential file. For this we’ll make use of RefreshableCredentials from botocore.credentials.

from botocore.credentials import RefreshableCredentials
from botocore.session import get_session
from configparser import ConfigParser
from datetime import datetime, timedelta, timezone

def refresh_external_credentials():
    # credential_file_path, profile_name and refresh_minutes come from config
    config = ConfigParser()
    config.read(credential_file_path)
    profile = config[profile_name]
    expiry = datetime.now(timezone.utc) + timedelta(minutes=refresh_minutes)
    return {
        "access_key": profile.get('aws_access_key_id'),
        "secret_key": profile.get('aws_secret_access_key'),
        "token": profile.get('aws_session_token'),
        "expiry_time": expiry.isoformat()
    }

There are a few config entries here.

  • credential_file_path is the location of the credential file that is getting externally updated
  • profile_name is the profile in the credential file that you want to use
  • refresh_minutes is how long botocore treats the credentials as valid; once that window passes, the refresh_external_credentials() function will be called again.
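For illustration, these might be defined like this (the values here are hypothetical; in practice they would come from your own config);

credential_file_path = '/home/appuser/.aws/credentials'  # hypothetical path
profile_name = 'sso-profile'                             # hypothetical profile
refresh_minutes = 45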

We now need to create the credential object for a session which will then be able to auto refresh.

session_credentials = RefreshableCredentials.create_from_metadata(
    metadata=refresh_external_credentials(),
    refresh_using=refresh_external_credentials,
    method='sts-assume-role'
)

Going back to the original code, the new session_credentials can be plugged in to give the application a long-lived session on top of the temporary tokens.

import boto3
from botocore.session import get_session

# ideally taken from config
region = 'eu-west-1'
incoming_queue_name = 'incoming_queue'
queues = {}

session = get_session()
session._credentials = session_credentials  # note: private attribute
autorefresh_session = boto3.Session(botocore_session=session)

queues['incoming'] = autorefresh_session.resource('sqs', region).get_queue_by_name(QueueName=incoming_queue_name)

RefreshingAWSCredentials with .NET

NOTE: I haven’t really written any C# in about 5 years, so this code may be a bit crap and use old techniques

Overview

Where I am currently working we have Single Sign On for AWS API calls and need to use task accounts to connect and get temporary credentials. To that end, it’s not very easy to have long running processes making calls to AWS APIs such as S3 and SQS.

I am working on a proof of concept which has a 3rd party .NET component which listens to SQS messages, calls into a proprietary API, then dumps the results on S3. This code wasn’t written by people who knew about the hoops we need to jump through to authenticate.

To complicate things, the context the application runs under isn’t the context of the service account which has been granted rights to the appropriate role, and the instance profile doesn’t have the correct rights. (For reasons I won’t go into, the Ops team won’t correct this.)

So, having written that, I realise we’re looking at a very niche use case, but if it looks familiar, read on.

Solution

Behind the scenes, there is a PowerShell script running in the background as a scheduled task to go through Single Sign On and get new tokens to write to the credential file. I won’t go into any more detail than that as it’s very company specific.

As the 3rd party application creates the client on startup, it uses the latest credentials, but they don’t get refreshed from the credential file. I found an abstract class in the .NET SDK called RefreshingAWSCredentials which looked promising.

With this class, you can set an expiration on the credentials object such that any AWS SDK client that is using it for API calls - for example;

var s3Client = new AmazonS3Client(ExternalRefreshingAWSCredentials.Credentials);

will create an S3Client that is given the refreshing credentials specified below.

using Amazon.Runtime;
using Amazon.Runtime.CredentialManagement;
using log4net;
using System;
using System.Configuration;
 
namespace AwsCredentialsExample.Credentials
{
    public class ExternalRefreshingAWSCredentials : RefreshingAWSCredentials
    {
        private static readonly object lockObj = new Object();
        private static readonly ILog Logger = LogManager.GetLogger(typeof(ExternalRefreshingAWSCredentials));
        private readonly string credentialFileProfile;
        private readonly string credentialFileLocation;
        private static ExternalRefreshingAWSCredentials refreshingCredentials;
        private CredentialsRefreshState credentialRefreshState;
        private int refreshMinutes = 45;

        public static ExternalRefreshingAWSCredentials Credentials {
            get {
                if (refreshingCredentials == null) {
                    lock (lockObj) {
                        if (refreshingCredentials == null) {
                            refreshingCredentials = new ExternalRefreshingAWSCredentials();
                        }
                    }
                }
                return refreshingCredentials;
            }
        }

        private ExternalRefreshingAWSCredentials()
        {
            credentialFileProfile = ConfigurationManager.AppSettings["CredentialFileProfile"];
            credentialFileLocation = ConfigurationManager.AppSettings["CredentialFileLocation"];
            if (ConfigurationManager.AppSettings["ClientRefreshIntervalMinutes"] != null)
            {
                refreshMinutes = int.Parse(ConfigurationManager.AppSettings["ClientRefreshIntervalMinutes"]);
            }
            Logger.Info(string.Format("Credential file location is {0}", credentialFileLocation));
            Logger.Info(string.Format("Credential file profile is {0}", credentialFileProfile));
            credentialRefreshState = GenerateNewCredentials();
        }
 
        public override void ClearCredentials()
        {
            Logger.Info("Clearing the credentials");
            credentialRefreshState = null;
        }
 
        protected override CredentialsRefreshState GenerateNewCredentials()
        {
            Logger.Info(string.Format("Generating credentials, valid for {0} minutes", refreshMinutes));
            var credFile = new StoredProfileAWSCredentials(credentialFileProfile, credentialFileLocation);
            return new CredentialsRefreshState(credFile.GetCredentials(), DateTime.Now.AddMinutes(refreshMinutes));
        }
 
        public override ImmutableCredentials GetCredentials()
        {
            if (credentialRefreshState == null || credentialRefreshState.Expiration < DateTime.Now)
            {
                credentialRefreshState = GenerateNewCredentials();
            }
            return credentialRefreshState.Credentials;
        }
    }
}

Usage

There are three configuration settings used with this. These should be added as AppSettings in the app.config.

  1. CredentialFileLocation - The location of the credential file that is being updated externally (in this case by PowerShell)
  2. CredentialFileProfile - The profile from the credential file to use
  3. ClientRefreshIntervalMinutes - How long to keep the credentials before expiring them (defaults to 45 minutes)

As suggested before, you can now create your AWS clients passing in the Credentials property in place of any of the normally used Credentials objects.

var s3Client = new AmazonS3Client(ExternalRefreshingAWSCredentials.Credentials);

Generating test data with Faker

Python is one of those languages where if you can conceive it, there is probably already a library for it.

One great library is Faker - this makes the generation of sensible test data much easier and removes a lot of the issues around having to use unrealistic junk values when creating it on your own.

There is lots to see, and you’re probably best off reading the docs, but this should give you an overview.

Installation

Installation is simple, just use pip to install;

pip install faker

Usage

Now that you have it installed, you can use the Python REPL or ptpython to have a play.

Some basics

from faker import Factory

fake = Factory.create()
fake.first_name(), fake.last_name(), fake.email()

This will give you a tuple with a random name and email;

('Mary', 'Bennett', 'jamesrodriguez@hotmail.com')

Localisation

If you want to get UK post codes, you can tell the factory which locale to use when generating the data;

from faker import Factory

fake = Factory.create('en_GB')

fake.street_address(), fake.postcode()

which will yield;

('Studio 54\nCollins fork', 'L2 7XP')

Synchronising Multiple Fakes

Every time you call a method on the fake object you get a new value. If you want to synchronise two fake objects you can seed them with the same value. This means that each consecutive call on each fake will return the same value.

This is probably more easily demonstrated in code;

from faker import Factory

fake1 = Factory.create()
fake2 = Factory.create()

fake1.seed_instance(12345)
fake2.seed_instance(12345)

fake1.name(), fake2.name()
fake1.name(), fake2.name()

This results in tuples containing the same name across the synchronised fake objects.

('Adam Bryan', 'Adam Bryan')
('Jacob Lee', 'Jacob Lee')

Making it a bit more interesting

In a previous post I fiddled with credit card data where I created test data. Faker can be used to help out here. The code below isn’t an example of amazing Python, it’s just simple code to show it working.

First, we bring in the imports that are going to be used;

import csv
import random
from faker import Factory
from faker.providers import credit_card

Some helper methods, just to keep things clean;

def get_transaction_amount():
    return round(random.randint(1, 1000) * random.random(), 2)

def get_transaction_date(fake):
    return fake.date_time_between(start_date='-30y', end_date='now').isoformat()

Some more helpers for the creation of our customer and transaction records;

def create_customer_record(customer_id):
    fake = Factory.create('en_GB')
    return [customer_id, fake.first_name(), fake.last_name(), fake.street_address().replace('\n', ', '), fake.city(), fake.postcode(), fake.email()]

def create_financials_record(customer_id):
    fake = Factory.create('en_GB')
    return [customer_id, fake.credit_card_number(), get_transaction_amount(), get_transaction_date(fake)]

A helper function to save the records to file

def flush_records(records, filename):
    # newline='' prevents blank lines appearing in the CSV on Windows
    with open(filename, 'a', newline='') as file:
        csv_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for record in records:
            csv_writer.writerow(record)
    records.clear()

Finally the main calling block to create the records

def create_customer_files(customer_count=100):
    customer_records = []
    financial_records = []
    for i in range(1, customer_count + 1):
        customer_id = str(i).zfill(10)
        customer_records.append(create_customer_record(customer_id))
        financial_records.append(create_financials_record(customer_id))
        if len(customer_records) == 100:
            flush_records(customer_records, 'customer.csv')
            flush_records(financial_records, 'financials.csv')
    # flush anything left over
    flush_records(customer_records, 'customer.csv')
    flush_records(financial_records, 'financials.csv')

create_customer_files()

Once we run this, we’ll have 2 files: one with customer details and one with a credit card transaction for each customer.

Customer records

0000000001,Clifford,Turner,"Flat 17, Smith crescent",Johnsonborough,DN5 7JJ,ucooper@gmail.com
0000000002,Amy,Clements,"Studio 96s, Anne harbor",Maureenfurt,LA53 8HZ,marshalllee@williams-hart.info
0000000003,Robin,Sinclair,5 Lesley motorway,Bryanbury,E2 9EU,sheilawhitehead@miles.com

Financial records

0000000001,4851450166907,179.28,2009-06-01T07:03:43
0000000002,370196022599953,229.46,2017-12-11T10:14:59
0000000003,4285121047016,10.61,1995-04-23T23:54:19

By sharing the customer ID across both files we have some semblance of referential integrity.

This code only creates a single transaction per customer, but it can easily be modified to create multiple transactions by adjusting create_financials_record to take an optional argument of transaction_count=1 and updating the append to handle a list of lists, as sketched below.
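A minimal sketch of that change, assuming the helpers above;

def create_financials_record(customer_id, transaction_count=1):
    # now returns a list of transaction records for one customer
    fake = Factory.create('en_GB')
    return [
        [customer_id, fake.credit_card_number(), get_transaction_amount(), get_transaction_date(fake)]
        for _ in range(transaction_count)
    ]

# in create_customer_files, swap append for extend to flatten the per-customer lists
financial_records.extend(create_financials_record(customer_id, transaction_count=3))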

AWS EC2 Comparison

When working with EC2 instances, you really want to be right-sizing the instance from the outset. With Amazon regularly bringing out new classes of instance, it is hard to keep track of what is available and what each one's characteristics are.

Some time ago I came across EC2Instances.info. This site regularly scrapes the instance information pages in the AWS documentation to gather the latest data about available instances and aggregate it all together.

[Screenshot: EC2Instances.info]

The functionality I find most useful is that when I know I want, for example, 4GB of RAM, I can filter to get the list of all suitable instances, then sort them in ascending order and apply additional search criteria.

Next time you’re looking for an instance type, it’s worth a look.