Validating API Requests and Its Importance

Hey guys, as most of you know, request validation should be mandatory and strict when building an API. In this article we will go through how you can implement request validation when building an API (in Python) and why it is so important.

Problem At Hand

Let’s start with a scenario in which you have a simple (totally hypothetical) API that takes an array of numbers (integers/decimals) in a POST request, calculates their sum, saves the answer in a database and also returns it in the response. The example API could be /add with the following requestBody:

{
    "numbers": [1, 2, 3.0, 4, 5]
}

Now this may seem very simple to you: we have a simple array of numbers and the following simple code for the add function.

def add(numbers):
    # "total" avoids shadowing Python's built-in sum()
    total = 0
    for num in numbers:
        total += num
    return total

This will work just fine as long as the “numbers” array contains only numbers. Now suppose the client sends this array:

{
    "numbers": [1, 2, 3.0, 4, 5, "hello"]
}

While this may be perfectly valid JSON as a requestBody, your addition algorithm would fail with the following error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: unsupported operand type(s) for +=: 'float' and 'str'

Obviously, you might think: why would we ever have a string in the array in the first place? Ideally you would not, but consider the case where your API is published and some third-party client (any type of app or another service) consumes it and sends this by mistake. Rather than receiving a graceful error, the client would get a generic HTTP 500 INTERNAL SERVER ERROR, because the unhandled exception aborted the request.

Ideally, your application should never exit or behave unexpectedly. You should handle all scenarios gracefully.
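
For the /add example, a minimal sketch of graceful handling could look like the following. The `validate_numbers` and `handle_add` helpers, the error message format and the `(status, body)` return shape are assumptions made for illustration; they are not part of any framework.

```python
from numbers import Real


def validate_numbers(payload):
    """Return a list of error messages; an empty list means the payload is valid."""
    numbers = payload.get("numbers") if isinstance(payload, dict) else None
    if not isinstance(numbers, list):
        return ["'numbers' must be an array."]
    errors = []
    for index, value in enumerate(numbers):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            errors.append(f"numbers[{index}] is not a number: {value!r}")
    return errors


def add(numbers):
    total = 0
    for num in numbers:
        total += num
    return total


def handle_add(payload):
    """Hypothetical handler returning a (status_code, body) pair."""
    errors = validate_numbers(payload)
    if errors:
        # The client made the mistake, so answer with 400 instead of crashing.
        return 400, {"errors": errors}
    return 200, {"sum": add(payload["numbers"])}


print(handle_add({"numbers": [1, 2, 3.0, 4, 5]}))
print(handle_add({"numbers": [1, 2, 3.0, 4, 5, "hello"]}))
```

The second call reports the offending element instead of raising a TypeError, so the client learns what to fix.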

Request Validation

To solve this, you can add an initial check that iterates over the array and verifies that each element meets your requirements before passing the array to your addition function (you could also check while adding, but this article is about simplicity, not performance). This approach works well: if you find any issue, you can return an HTTP 400 BAD REQUEST status saying that the array is not in the proper format.

Now consider a more realistic situation where you have the following JSON as the requestBody for an API that adds a new employee:

{
    "name": "Shrey",
    "countryCode": 91,
    "age": 25,
    "address": {
        "city": "Abc",
        "stateCode": "XY",
        "zipcode": 12345
    },
    "employeeType": "E1",
    "department": "D2"
}

Now, to validate this via if/else checks, just imagine how many custom checks you would need to enforce the following rules (all hypothetical assumptions):

  1. Name is always required.
  2. Country code must be one of the allowed codes in the list [1, 2, 91, 1050].
  3. Age should be greater than 0 but less than 120.
  4. Address is a nested object where city should have a minimum of 3 characters, stateCode should be a valid state code and zipcode should always be a 5-digit number.
  5. Department and employee type have an M:N relationship, where each department allows only a defined set of employee types.
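
To get a feel for the effort, here is a rough sketch of those five rules as hand-written checks. The function name `validate_employee`, the error messages, the list of state codes and the employee-type-to-department mapping are all assumed sample values (the latter two borrowed from the schema shown later in this article).

```python
ALLOWED_COUNTRY_CODES = (1, 2, 91, 1050)
ALLOWED_STATE_CODES = ("OH", "XY", "AB")  # assumed sample values
# Assumed mapping: employee type -> departments it may belong to
ALLOWED_DEPARTMENTS = {
    "E1": ("D1", "D2"), "E2": ("D1", "D2"), "E3": ("D1", "D2"),
    "E4": ("D2", "D3", "D6"), "E5": ("D2", "D3", "D6"), "E6": ("D2", "D3", "D6"),
}


def is_int(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def validate_employee(body):
    """Return a dict of field -> error message; an empty dict means valid."""
    errors = {}
    name = body.get("name")
    if not isinstance(name, str) or not name:
        errors["name"] = "Name is required."
    if body.get("countryCode") not in ALLOWED_COUNTRY_CODES:
        errors["countryCode"] = "Unsupported country code."
    age = body.get("age")
    if not is_int(age) or not 0 < age < 120:
        errors["age"] = "Age must be greater than 0 and less than 120."
    address = body.get("address")
    if not isinstance(address, dict):
        errors["address"] = "Address must be an object."
    else:
        city = address.get("city")
        if not isinstance(city, str) or len(city) < 3:
            errors["address.city"] = "City needs at least 3 characters."
        if address.get("stateCode") not in ALLOWED_STATE_CODES:
            errors["address.stateCode"] = "Invalid state code."
        zipcode = address.get("zipcode")
        if not is_int(zipcode) or not 10000 <= zipcode <= 99999:
            errors["address.zipcode"] = "Zipcode must be a 5-digit number."
    employee_type = body.get("employeeType")
    allowed = ALLOWED_DEPARTMENTS.get(employee_type, ()) if isinstance(employee_type, str) else ()
    if body.get("department") not in allowed:
        errors["department"] = "Department not allowed for this employee type."
    return errors
```

Even this stripped-down version needs a branch per rule, and every new field or rule adds more.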

Writing all of these validations and cases in your view/controller gets complicated, and the logic becomes tightly coupled to that particular view. What if there is a better way to achieve this, making it independent and strict? Obviously there is! We will see how we can use the marshmallow package in Python to achieve this (and yes, you can use this concept in any language or framework you want).

Introduction to Marshmallow

Marshmallow is an awesome package that lets you check, verify and serialise your objects against a well-defined, strict model, allowing you to decouple request validation logic from your views. It is well suited to any type of Python API, be it Flask, Django, Pyramid or any other. A small example would be:

from marshmallow import Schema, fields, validate

class PersonSchema(Schema):
    name = fields.String(required=True)
    age = fields.Integer(required=True, validate=validate.Range(min=0, max=120))

Now to validate your requestBody, all you need to do is –

from custom_responses import BadRequest

# Extract POST body or the arguments of GET request (request.GET.dict())
requestBody = request.POST
errors = PersonSchema().validate(requestBody)
if errors:
    # raise a 400 Bad Request with the errors dictionary.
    raise BadRequest("Validation failed.", errors=errors)

So all you need to do is build a Validation Schema for each API as a Model Class and use that to validate each of your incoming API calls. You can also reuse your models for different related APIs where you have similar structure.

How to use Request Validation in Production

As we saw, we can build our own schemas by inheriting from marshmallow’s Schema class and call the .validate(data_dict) method to validate the data. Writing this in the views/controllers is okay, but you might want to decouple it from your view, either in some type of middleware or in a simple decorator that you apply to each of your APIs. That way, you just pass your validation schema to the decorator and your view contains only the business logic of your API. An example decorator (just a sketch):

# Only a sketch for Django-style views; BadRequest is the custom exception used earlier
from functools import wraps

from custom_responses import BadRequest

def request_validator(validation_schema):
    def validator(original_function):
        @wraps(original_function)
        def inner(request, *args, **kwargs):
            # Query parameters for GET/HEAD, request body for everything else
            if request.method in ["GET", "HEAD"]:
                data = request.GET.dict()
            else:
                data = request.POST
            errors = validation_schema().validate(data)
            if errors:
                # raise a 400 HTTP response with errors.
                raise BadRequest("Validation failed.", errors=errors)
            return original_function(request, *args, **kwargs)

        return inner
    return validator

Then just wrap your API with this decorator –

from django.views.decorators.http import require_http_methods  # assuming a Django project

from custom_utils import request_validator
from custom_schemas import SomeAPISchema

@require_http_methods(["GET"])
@request_validator(validation_schema=SomeAPISchema)
def some_api(request):
    # do stuff
    return response

This way, you can actually scale your code, and all your schemas and validations can live independently of your view code.
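
To try the decorator idea without a web framework, stand-ins are enough. In the sketch below, `FakeRequest`, `FakeQueryDict`, `BadRequest` and `RequireNameSchema` are all hypothetical stubs: the request only mimics the `method`, `GET` and `POST` attributes used above, and the schema stub only follows the same "`.validate(data)` returns an errors dict" contract as a marshmallow Schema.

```python
from functools import wraps


class BadRequest(Exception):
    """Hypothetical stand-in for the custom 400 exception."""

    def __init__(self, message, errors=None):
        super().__init__(message)
        self.errors = errors or {}


class FakeQueryDict(dict):
    """Mimics the .dict() method of a Django QueryDict."""

    def dict(self):
        return dict(self)


class FakeRequest:
    def __init__(self, method, params):
        self.method = method
        self.GET = FakeQueryDict(params)
        self.POST = FakeQueryDict(params)


class RequireNameSchema:
    """Stub with the same validate(data) -> errors contract as a marshmallow Schema."""

    def validate(self, data):
        return {} if data.get("name") else {"name": ["Missing data for required field."]}


def request_validator(validation_schema):
    def validator(original_function):
        @wraps(original_function)
        def inner(request, *args, **kwargs):
            if request.method in ("GET", "HEAD"):
                data = request.GET.dict()
            else:
                data = request.POST
            errors = validation_schema().validate(data)
            if errors:
                raise BadRequest("Validation failed.", errors=errors)
            return original_function(request, *args, **kwargs)

        return inner
    return validator


@request_validator(validation_schema=RequireNameSchema)
def some_api(request):
    # Only business logic lives here; validation already happened.
    return "ok"


print(some_api(FakeRequest("GET", {"name": "Shrey"})))
try:
    some_api(FakeRequest("POST", {}))
except BadRequest as exc:
    print(exc, exc.errors)
```

The view body never runs for the invalid request, which is the whole point of moving validation into the decorator.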

Important Validation Examples

Marshmallow is a very powerful tool for building really complex schemas, and it can apply strict, custom, complex validations just the way you want. Let’s recreate the add-employee request body from above, with the rules listed below that example.

from marshmallow import Schema, ValidationError, fields, validate, validates, validates_schema

class AddressSchema(Schema):
    city = fields.String(required=True, validate=validate.Length(min=3))
    # Assumes the default is meant for incoming data; load_default needs marshmallow >= 3.13
    stateCode = fields.String(load_default="XY", validate=validate.OneOf(["OH", "XY", "AB"]))
    zipcode = fields.Integer(validate=validate.Range(min=10000, max=99999))

class EmployeeSchema(Schema):
    name = fields.String(required=True)
    countryCode = fields.Integer(validate=validate.OneOf([1, 2, 91, 1050]))
    # Rule 3: greater than 0 and less than 120, so the inclusive bounds are 1 and 119
    age = fields.Integer(validate=validate.Range(min=1, max=119))
    address = fields.Nested(AddressSchema)
    employeeType = fields.String(required=True)
    department = fields.String(required=True)
    # extra examples
    marks = fields.List(fields.Integer())
    extraInfo = fields.Dict()

    # Custom validator for a single field
    @validates("age")
    def custom_age_validator(self, value, **kwargs):
        if 25 < value < 35:
            raise ValidationError("People aged between 25 and 35 are not allowed.")

    # Schema-level validator that can compare several fields
    @validates_schema
    def validate_employee_with_department(self, data, **kwargs):
        if not (
            (
                data["employeeType"] in ["E1", "E2", "E3"] and
                data["department"] in ["D1", "D2"]
            ) or
            (
                data["employeeType"] in ["E4", "E5", "E6"] and
                data["department"] in ["D3", "D2", "D6"]
            )
        ):
            raise ValidationError("Incorrect mapping.")

Do check out the official marshmallow documentation for other validation functions, field types, parameters and customisations. Do like and comment on how this helped you in your coding life, and keep sharing with your friends too!
