Machine learning Model to Microservice Part 1: Code Quality from Get Set Go..

Part 1 on converting model code to Microservice

The first step toward production code is ensuring that the code meets a certain level of quality. Code quality is not hand-waving, nor is it a subjective aspect of programming that varies from person to person. Most languages have established code quality standards. Since most data science and deep learning work is done in Python, I will use PEP 8, Python's style guide, as the standard to maintain.

The good part is that many people have built tools that enforce code quality programmatically, through linting and pre-commit hooks.

Linting:

Linting is the process of statically analyzing code, without running it, for potential errors. In Python, several tools exist for checking code quality and code style (PEP 8). Some of them are:

Tools

  • Black: An opinionated code formatter whose style follows PEP 8. It rewrites the code itself so that it stays compliant.
  • flake8: Verifies PEP 8 compliance and can report cyclomatic complexity
  • bandit: A tool designed to detect common security issues in Python code
  • radon: Computes code metrics (such as cyclomatic complexity and a maintainability index) for the given source code
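
To make "static analysis" and "cyclomatic complexity" a little more concrete, here is a minimal sketch that parses a piece of source code with Python's standard ast module and counts decision points per function. It is a simplified approximation for illustration only, not the algorithm that flake8 or radon actually use, and the names estimate_complexity and SAMPLE are made up for this example.

```python
import ast

# Node types that open an extra path through the code (a simplification).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def estimate_complexity(source: str) -> dict:
    """Return {function_name: rough cyclomatic complexity} for a source string."""
    tree = ast.parse(source)  # static analysis: the code is parsed, never run
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1  # a function without branches has exactly one path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b" adds one extra decision point
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


# Hypothetical sample code to analyze; it is only parsed, not executed.
SAMPLE = '''
def classify(score, threshold):
    if score is None:
        return "missing"
    for _ in range(3):
        if score > threshold and threshold > 0:
            return "positive"
    return "negative"
'''

print(estimate_complexity(SAMPLE))  # prints {'classify': 5}
```

A function with a high score has many independent paths and is usually harder to test, which is why linters can be configured to flag it.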

These tools can be run individually every time, but it is better to add them to pre-commit hooks and also as a stage in CI testing (coming soon). That way the code is checked for quality automatically, and if any check fails, the change is blocked until it complies with the standards.
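
As a rough illustration of what running the tools "individually" amounts to, the sketch below calls each linter as a subprocess and collects the exit codes, which is essentially what a CI stage does. It assumes black, flake8 and bandit are installed and on the PATH, and that the project code lives in a hypothetical src directory. The helper names are made up for this example.

```python
import subprocess

# Assumed commands; "src" is a hypothetical project directory.
LINT_COMMANDS = {
    "black": ["black", "--check", "src"],  # --check reports, does not rewrite
    "flake8": ["flake8", "src"],
    "bandit": ["bandit", "-r", "src"],     # -r scans the directory recursively
}


def run_checks(commands: dict) -> dict:
    """Run each command and return {name: exit_code}."""
    results = {}
    for name, cmd in commands.items():
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True)
            results[name] = completed.returncode
        except FileNotFoundError:
            results[name] = 127  # tool is not installed; treat as a failure
    return results


def all_passed(results: dict) -> bool:
    """A run passes only if every tool exited with code 0."""
    return all(code == 0 for code in results.values())


def main() -> int:
    results = run_checks(LINT_COMMANDS)
    for name, code in results.items():
        print(f"{name}: {'ok' if code == 0 else 'failed'}")
    return 0 if all_passed(results) else 1
```

Ending a script with `raise SystemExit(main())` would make it exit non-zero whenever a check fails, which is the signal a CI system uses to mark the stage as failed.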

Pre-commit hooks

Before looking at pre-commit, we must get familiar with the idea of Git hooks. Git, like many other version control systems, provides hooks as a way to run scripts automatically at certain points in its workflow. A pre-commit hook runs just before a commit is created, which makes it a good place for automated linting, type-checking, etc., so that the commit is rejected if the code doesn't meet the standards.
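
To show the mechanism without any framework, here is a minimal sketch of a hand-written Git hook in Python. Git runs the executable file .git/hooks/pre-commit before each commit and aborts the commit if it exits with a non-zero status. The check used here (does every staged Python file parse?) is just a stand-in for a real linter, and for simplicity it reads the files from the working tree rather than the exact staged content.

```python
#!/usr/bin/env python3
import ast
import subprocess
import sys


def staged_python_files() -> list:
    """Ask Git for staged (added, copied, modified) files; keep the .py ones."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [name for name in out.splitlines() if name.endswith(".py")]


def syntax_errors(paths) -> list:
    """Return one message per file that fails to parse."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            try:
                ast.parse(handle.read(), filename=path)
            except SyntaxError as exc:
                problems.append(f"{path}:{exc.lineno}: {exc.msg}")
    return problems


def main() -> int:
    problems = syntax_errors(staged_python_files())
    for line in problems:
        print(line, file=sys.stderr)
    return 1 if problems else 0  # non-zero exit makes Git abort the commit
```

To use it as a hook, the script would be saved as .git/hooks/pre-commit, made executable, and finished with a final line `sys.exit(main())`. The pre-commit framework described next takes care of this wiring for you.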

Below is an example .pre-commit-config.yaml file, the file the pre-commit framework reads, containing the pre-commit hooks for this project:

repos:
  # Black: formats the code (PEP 8 style, 99-character lines)
  - repo: https://github.com/psf/black
    rev: 21.4b0
    hooks:
      - id: black
        args:
          - --line-length=99
        language_version: python3
  # flake8: checks PEP 8 compliance with the same line length
  - repo: https://github.com/PyCQA/flake8
    rev: 3.9.1
    hooks:
      - id: flake8
        args:
          - "--max-line-length=99"
  # General-purpose hooks maintained by the pre-commit project
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.4.0
    hooks:
      - id: name-tests-test
        args:
          - --django
      - id: requirements-txt-fixer
      - id: check-executables-have-shebangs

Explanation

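The above pre-commit config lists the repositories whose hooks should be installed. In this case, we have added black, flake8, and the general-purpose pre-commit-hooks repository, whose hooks here check that test files are named correctly (name-tests-test), sort the entries in requirements.txt (requirements-txt-fixer), and check that executable scripts such as bash scripts start with a shebang line (check-executables-have-shebangs).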

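The repo field holds the URL of the repository that provides the hooks, and rev pins the version (a Git tag or commit) of that repository to use. Each entry under hooks selects a hook by its id, and args passes command-line arguments to that hook, for example the maximum line length for black and flake8.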

Package Management

Another huge part of making a piece of code reproducible is package management. A Python project usually depends on multiple packages, each of which exists in many versions. If the versions installed in deployment differ from the ones used during development, the behavior of the code may not be reproducible.

I prefer using Poetry. It declares the project's dependencies in a pyproject.toml file and records the exact resolved versions in a poetry.lock file, so that the same versions can be installed everywhere.
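
To illustrate the version drift that a lock file guards against, here is a small sketch that compares a set of pinned versions with what is actually installed in the current environment. This is not how Poetry works internally. It only shows the kind of mismatch that makes code hard to reproduce. The PINNED dictionary and the helper names are hypothetical, standing in for versions copied out of a lock file.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_drift(pinned: dict, lookup=installed_version) -> dict:
    """Return {package: (pinned, installed)} for every mismatch."""
    drift = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            drift[package] = (wanted, found)
    return drift


# Hypothetical pins, as they might be copied out of a lock file.
PINNED = {"numpy": "1.20.2", "scikit-learn": "0.24.1"}

for package, (wanted, found) in find_drift(PINNED).items():
    print(f"{package}: pinned {wanted}, installed {found or 'not installed'}")
```

An empty result means the environment matches the pins. Anything printed is a package whose installed version differs from the one the code was developed against.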

I hope this served as an introduction to maintaining code quality and managing package versions. If you are interested in learning more about