Pros and Cons of an API

I'm not completely sure if explicitly defining an API is the best solution for this situation. In the company where I work, we have them defined and there are mixed feelings.

👍 On the one hand, explicitly stating what a microservice expects as input seems logical: other developers then only need to read that contract and don't have to worry about the underlying implementation.
And if someone changes their API, the change is clearly visible in that API's changelog.
Also, when creating a new feature, developers can start by defining the APIs, which helps a lot in the thought process.

👎 On the other hand, making an API easily readable is hard, and almost impossible for scenarios with complex data structures.
It also takes a lot of time to write and maintain.
If the developer who wrote it is nearby, it's often much easier to just ask them for help.

Approaches to defining an API

There are two approaches to defining an API I'm going to cover.

To keep things simple, we'll write an API for a microservice that can add or subtract N numbers.

For our example, all we need are two fields:

action which can be ADD/SUBTRACT,
arguments which is a list of numbers.

The message payload into the microservice will look like this:

{
  "action": "ADD",
  "arguments": [4, 8.3, 11]
}
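
To make the example concrete, here is a minimal sketch of what the service might do with such a payload. The function name `handle_message` is hypothetical, and the meaning of SUBTRACT (the first number minus all the others) is an assumption, since the payload format alone doesn't define it.

```python
import json
from functools import reduce


def handle_message(raw: str) -> float:
    """Parse a JSON payload and apply the requested action (hypothetical handler)."""
    payload = json.loads(raw)
    action = payload["action"]
    arguments = payload["arguments"]

    if action == "ADD":
        return sum(arguments)
    if action == "SUBTRACT":
        # Assumption: subtract every following number from the first one.
        return reduce(lambda total, number: total - number, arguments)
    raise ValueError(f"Unknown action: {action!r}")


print(handle_message('{"action": "SUBTRACT", "arguments": [10, 3, 2]}'))  # prints 5
```

Note that nothing here checks the shape of the payload: a missing field or a string in the list only fails somewhere inside the function. That gap is what an explicit API definition is meant to close.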

Approach 1: JsonSchema

This is the approach I have the most experience with. JsonSchema is a vocabulary that allows you to annotate and validate JSON documents (our messages from Kafka).

Everything in the schema is pretty self-explanatory, so I won't go into any details.

{
  "type": "object",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "required": [
    "action",
    "arguments"
  ],
  "properties": {
    "action": {
      "type": "string",
      "enum": ["ADD", "SUBTRACT"]
    },
    "arguments": {
      "type": "array",
      "items": {
        "type": "number"
      }
    },
  }
  "additionalProperties": false
}

I can only tell you that things get pretty crazy when the data structures are more complex (schemas of more than 700 lines).

To cope with the low readability and to test our schema, we oftentimes write examples for different cases. This way, other developers can look at them instead and reference the real schema for fine details.
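
One way to keep such examples honest is to run them against the schema in a test. Below is a rough sketch, assuming the third-party `jsonschema` package for Python is installed. The `EXAMPLES` list and its contents are made up for illustration.

```python
from jsonschema import Draft7Validator

# The same schema as above, written as a Python dict.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["action", "arguments"],
    "properties": {
        "action": {"type": "string", "enum": ["ADD", "SUBTRACT"]},
        "arguments": {"type": "array", "items": {"type": "number"}},
    },
    "additionalProperties": False,
}

# Hypothetical example payloads, each paired with whether it should be valid.
EXAMPLES = [
    ({"action": "ADD", "arguments": [4, 8.3, 11]}, True),
    ({"action": "MULTIPLY", "arguments": [1, 2]}, False),      # unknown action
    ({"action": "ADD", "arguments": ["4"]}, False),            # string, not a number
    ({"action": "ADD"}, False),                                # missing field
    ({"action": "ADD", "arguments": [], "extra": 1}, False),   # additional property
]

validator = Draft7Validator(SCHEMA)

for payload, expected in EXAMPLES:
    # Fails loudly if the schema and an example disagree.
    assert validator.is_valid(payload) == expected, payload
```

If the schema changes and an example no longer matches, the loop fails, so the examples double as documentation and as a regression test.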

P.S. An awesome tool for debugging schemas is this JSON Schema Validator. I've been using it for months now.

Approach 2: Python `dataclasses`

For this example, we'll stick with Python, but I guess the approach is applicable to most object-oriented languages.

Basically, Python introduced dataclasses in version 3.7, and they make defining data structures easier and cleaner.

from dataclasses import dataclass
from enum import Enum
from typing import List


class ActionOptions(Enum):
    ADD = 'ADD'
    SUBTRACT = 'SUBTRACT'


@dataclass
class Message:
    action: ActionOptions
    arguments: List[float]

And then, using the dacite package, we can load our message into this structure:

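from dacite import from_dict, Config


# The keys must match the field names of Message ("arguments", not "args").
data = {"action": "ADD", "arguments": [4, 8.3, 11]}
message = from_dict(
    Message,
    data,
    # type_hooks turns the raw string into an ActionOptions member;
    # strict=True rejects keys that are not fields of Message.
    config=Config(type_hooks={ActionOptions: ActionOptions}, strict=True)
)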

dacite allows us to import more complex data structures and raises an exception when the data doesn't match our structure.

I like the idea that dataclasses can have methods on them that fit their context. It is easy to write any custom validation that can't be easily expressed otherwise.
It is also possibly easier for a developer to read, but I don't know for sure yet.
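
As a small illustration of such custom validation, here is a sketch that uses only the standard library. The rule that SUBTRACT needs at least two numbers and the `result` method are hypothetical, chosen only to show where this kind of logic could live.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ActionOptions(Enum):
    ADD = 'ADD'
    SUBTRACT = 'SUBTRACT'


@dataclass
class Message:
    action: ActionOptions
    arguments: List[float]

    def __post_init__(self) -> None:
        # Hypothetical rule: SUBTRACT only makes sense with at least two numbers.
        if self.action is ActionOptions.SUBTRACT and len(self.arguments) < 2:
            raise ValueError("SUBTRACT needs at least two arguments")

    def result(self) -> float:
        # Assumption: SUBTRACT means the first number minus all the others.
        if self.action is ActionOptions.ADD:
            return sum(self.arguments)
        first, *rest = self.arguments
        return first - sum(rest)


print(Message(ActionOptions.SUBTRACT, [10, 3, 2]).result())  # prints 5
```

Dataclasses run `__post_init__` every time an instance is constructed, so the check should also apply when a loader builds the object from a dictionary.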

The downside is that the classes have to be defined bottom-up so Python can reference everything correctly (although that can probably be bypassed). Also, dacite is still in active development and needs some improvements.
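
On the ordering problem: plain dataclasses accept annotations written as strings, which lets a class mention another class that is defined further down the file. The sketch below uses only the standard library. Whether a loading library resolves such references on its own is worth checking separately (dacite appears to offer a `forward_references` option in its `Config`, but treat that as an assumption).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, get_type_hints


@dataclass
class Message:
    # Quoted annotation: ActionOptions does not exist yet at this point.
    action: "ActionOptions"
    arguments: List[float]


class ActionOptions(Enum):
    ADD = 'ADD'
    SUBTRACT = 'SUBTRACT'


# Resolve the string annotation once both classes exist.
hints = get_type_hints(Message, localns={"ActionOptions": ActionOptions})
print(hints["action"] is ActionOptions)  # prints True
```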

What to choose?

I'm really looking forward to trying the second approach and battle-testing it. As time passes by, I'm seeing JsonSchema as more of a pain point than a relief. Although, to be honest, maybe I just don't appreciate it enough 🙂

What I want to hear is what you think:

Do you write APIs for your microservices?
Which approaches work and which don't?
Do you know other reasons why the two approaches I listed could be awesome/awful?
