One of the interview rounds that is a key part of major organisations (like AWS, Microsoft and Google) is the System Design interview. This is where you do a high level design of a scalable product, and talk through the components and how you would build it (further on that in a coming post). One of the well know questions is “How would you design a URL shortening service like tinyURL / bit.ly ?” I did some reading on this and it looked like a good product to actually build, so lets go through the steps and code that can make this work.
I built this based on a a single instance of the app and database(s) – and review at the end some ideas on scaling. Its setup on a Windows 10 PC, with Python 3.8.5 via VisualStudio Code, and Docker for windows installed. In the code I use:
- Python with BOTO3 module installed (see my previous post HERE)
- Redis via docker instance (download HERE) (in my case, running on default port 6379). You can run Redis from the source code on any instance method you like, but I wanted to work with Docker just to add to the technologies being used in the design
- DynamoDB (with a table called tinyUrl set up to store the entries with a sort key called miniUrl)
We will build this as a CLI based python app – but in the real world you would have this done via a webpage (for the creation) and using DNS to do the lookup of tinyURL to match it to the full domain.
The logic of the python app is as follows:
# — Create
 takes in the URL and shrinks to a 8 char string
 checks if already used (redis) – if so return shortened string
 if not in redis, adds to DynamoDB and redis
# — Read
 Looks in to redis for key
 if present, returns full URL
 if not present, returns ERR
# — __init__
 Reads all existing key/value pairs from DynamoDB and inserts into Redis cache
A few notes on the steps and some reasoning behind why we use each technology:
- The 8 char string will be built from an MD5 hash of the URL. Each character in the tinyURL will be from a base62 choice [A-Za-z0-9], which gives us 8^62 different combinations of characters (which is a BIG space), also meaning that the chance of a collision (same hash/tinyURL) is very small.
- I use Redis as the READ database, since its an in memory cache system that is super fast for NoSQL lookups (like this one – simply needing a key/value pair for the tinyURL and Original URL).
- DynamoDB is used as the backend to provide a persistent store of the URLs (which could also hold more data for each entry like user and time/date requested).
- I also use the aioredis module for python, allowing us to do async operations with Redis.
I will go over the key commands in the code, but the full code can be downloaded from HERE.
async def read_redis(url):
redis_db = await aioredis.create_redis(‘redis://localhost:6379′, encoding=’utf-8’)
val = await redis_db.get(url, encoding=’utf-8′)
This function connects to Redis and then reads in the tinyURL from the cache. If its not present – it will return false (ie a cache miss).
async def startup_load_redis():
dynamo_db = boto3.resource(‘dynamodb’, region_name=myRegion)
table = dynamo_db.Table(table_name)
response = table.scan()
for entries in response[‘Items’]:
await redis_db.set(entries[‘miniUrl’], entries[‘fullUrl’])
This function is called at startup. It creates a DynamoDB connection and reads the entries in the Table (table_name – our tinyURL key/pair table). It then goes through and adds those to the Redis cache so Redis is up to date and ready to answer the queries (instead of the DynamoDB).
async def create_short_url(url):
url_hash = hashlib.md5(url.encode()).hexdigest()
url_hash_len = int(len(url_hash)/3)
left_char = url_hash[:url_hash_len]
middle_char = url_hash[url_hash_len:-url_hash_len]
right_char = url_hash[-url_hash_len:]
xor_char = int(left_char, 16) ^ int(middle_char, 16) ^ int(right_char, 16)
xor_char, r = divmod(xor_char, base_62_len)
shortUrl = ”.join([str(elem) for elem in result])
This routine creates the tinyURL. Firstly it creates and MD5 hash of the URL, to give it a unique 32 byte value. We then break this 32 byte value up into 3x ten bytes, and xor these together, to get an 8 byte value. We then use mod (%) to turn that into a base62 string, which is our tinyURL.
From here we simply check if that tinyUrl is already in the cache (cache hit), otherwise we add it to both the DynamoDB and the Redis cache for future reads.
And that’s it ! We now have the core of a very fast tinyURL system that works.
System Design Scaling Notes
With this built, we have a fast tinyURL service, that is essentially single threaded. So how to now scale this into a large region or globally, with millions of requests a day and multi-millions of URLs to store ? That’s the System Design part – and here are a few things to consider in the list:
- Move the app to a web based solution, which runs behind scaled instances of web servers
- Redis should be run in a cluster, possible across regions
- Availability can be an issue if this is to be replicated across regions (how long after a tinyURL is created must it be available to the whole world ?)
- An SQL solution may be an option for the persistent storage, since something like DynamoDB is eventually consistent by design so takes time to be read ready across regions
- If a cache miss occurs – perhaps a read to a single SQL DB (worldwide) may be feasible ? Consider the tradeoffs
- DNS severs will need to have apps built that can query the Redis cache and also the persistent storage to see if the tinyURL being created OR being read is present. Services like AWS Route53 can be considered to also support things like load balancing and geo-zone responses
Once you start diving into the design and then the ideas behind how to scale it becomes very interesting ! Hope you enjoyed this post, and make sure you continue to learn about and use products like AWS and Redis in solutions you are working on.