DOT Document Server

v5.17.0

Overview

DOT Document Server is a RESTful microservice that normalizes document images and recognizes visual zones on documents, mainly text fields.

API Reference

The DOT Document Server API reference is published here

Distribution package contents

You can find the distribution package in our CRM portal. Your sales representative will provide you with the credentials for the CRM login.

The package contains these files:
  • config – The configuration folder

    • application.yml – The application configuration file, see Externalized configuration

    • logback-spring.xml – The logging configuration file

  • doc – The documentation folder

    • Innovatrics_DOT_Document_Server_version_Technical_Documentation.html – Technical documentation

    • Innovatrics_DOT_Document_Server_version_Technical_Documentation.pdf – Technical documentation

    • swagger.json – Swagger API file

  • docker – The Docker folder

    • Dockerfile – The text document that contains all the commands to assemble a Docker image, see Docker

    • entrypoint.sh – The entry point script

  • libs – The libraries folder

    • libsam.so – The Innovatrics OCR library

    • libiface.so – The Innovatrics IFace library

    • solvers – The Innovatrics IFace library solvers

  • dot-document-server.jar – The executable JAR file, see How to run

  • Innovatrics_DOT_Document_Server_version_postman_collection.json – Postman collection

Installation

System requirements

  • Ubuntu 18.04 (64-bit)

Steps

  1. Install the following packages:

    • OpenJDK Runtime Environment (JRE) (openjdk-11-jre)

    • Locales

    apt-get update
    apt-get install -y openjdk-11-jre locales
  2. Set the locale

    sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen
    export LANG=en_US.UTF-8; export LANGUAGE=en_US:en; export LC_ALL=en_US.UTF-8
  3. Extract the DOT Document Server distribution package to any folder.

  4. Link the application libraries:

    ldconfig /local/path/to/current/dir/libs
    Replace the path /local/path/to/current/dir in the command with your current path. Keep /libs as a suffix in the path.

Activate the DOT license

You only need to activate the DOT license if you are going to use Display Attack Detection.

The activation of the DOT license depends on the type of your deployment.

If you perform serverless or Docker deployments, please contact your sales representative or sales@innovatrics.com to receive a license. Once you receive the license, please deploy it as described in steps 5 and 6 below.

If you perform a bare metal installation, or use a fixed VM or AWS instance, perform the following steps:

  1. Run DOT Document Server to generate the Hardware ID necessary for the license.

    java -Dspring.config.additional-location=file:config/application.yml -Dlogging.config=file:config/logback-spring.xml -DLOGS_DIR=logs -Djna.library.path=libs/ -jar dot-document-server.jar

    Copy the Hardware ID, which you can find in the output. See the example below:

    Unable to init IFace. Hardware ID: xxxxxxxxxxxx
  2. Visit our CRM portal and go to Products > Digital Onboarding Toolkit > Licenses.

  3. Then, select Generate License and paste the Hardware ID.

    [Figure: Generate license]
  4. Confirm again with Generate License and download the license.

  5. Copy your license file for Innovatrics IFace SDK 4.10.0 into your application folder (e.g. {DOT_DOCUMENT_SERVER_DIR}/license/iengine.lic).

  6. If your license is installed in the system, then it is used by default. Otherwise, set the property innovatrics.dot.iface.license.filepath to the path of your iengine.lic file in config/application.yml:

    innovatrics:
        dot:
            iface:
                license:
                    filepath: license/iengine.lic
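
Step 1 above requires copying the Hardware ID out of the server output. The following is a minimal sketch of a helper that does this, assuming the message format matches the example line shown in step 1. The function name extract_hardware_id and the file name startup.log are hypothetical and not part of the distribution package.

```shell
#!/bin/sh
# Print the first Hardware ID found in server output read from stdin.
# Assumes the line format shown in step 1: "... Hardware ID: <id>".
extract_hardware_id() {
  sed -n 's/.*Hardware ID: *\([^[:space:]]*\).*/\1/p' | head -n 1
}
```

One possible use is to save the console output of the run command to a file (for example by appending `2>&1 | tee startup.log` to it) and then run `extract_hardware_id < startup.log`.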

How to run

As DOT Document Server is a stand-alone Spring Boot application with an embedded servlet container, there is no need to deploy it on a pre-installed web server. Instead, run the following command in the application folder:

java -Dspring.config.additional-location=file:config/application.yml -Dlogging.config=file:config/logback-spring.xml -DLOGS_DIR=logs -Djna.library.path=libs/ -jar dot-document-server.jar

An embedded Tomcat web server starts and the application listens on port 8080 (or another configured port).
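
The run command refers to several paths relative to the application folder. As a sketch only, the following helper checks that those paths exist before the server is started. The function name check_layout is hypothetical. The paths are the ones used in the run command and listed in the distribution package contents.

```shell
#!/bin/sh
# Report which paths used by the run command are missing under the given
# application folder (default: current directory).
# Prints one line per missing item; returns 0 only if nothing is missing.
check_layout() {
  app_dir=${1:-.}
  missing=0
  for f in config/application.yml config/logback-spring.xml dot-document-server.jar; do
    if [ ! -f "$app_dir/$f" ]; then
      echo "missing file: $f"
      missing=1
    fi
  done
  if [ ! -d "$app_dir/libs" ]; then
    echo "missing directory: libs"
    missing=1
  fi
  return "$missing"
}
```

Running `check_layout . && java ...` in the application folder would start the server only when all four items are present.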

Docker

To build a Docker image, you can use the Dockerfile and the entrypoint.sh script. A Dockerfile example and an Entrypoint.sh script example can also be found in the Appendix.

Build the Docker image as follows:

cd docker
cp ../dot-document-server.jar .
cp ../libs/libsam.so.* .
cp ../libs/libiface.so.* .
cp -r ../libs/solvers/ ./solvers
sed -i -e 's/#license/license/' ../config/application.yml
sed -i -e 's/#filepath/filepath/' ../config/application.yml
docker build --build-arg JAR_FILE=dot-document-server.jar --build-arg SAM_OCR_LIB=libsam.so.* --build-arg IFACE_LIB=libiface.so.* -t dot-document-server .

Run Docker container without IFace

Run the container according to the instructions below:

docker run -v /local/path/to/config/dir/:/srv/dot-document-server/config -v /local/path/to/logs/dir/:/srv/dot-document-server/logs -p 8080:8080 dot-document-server
Replace the path /local/path/to/config/dir/ in the command with your local path to the config directory (from the distribution package).
Replace the path /local/path/to/logs/dir/ in the command with your local path to the logs directory (you need to create the directory).

Run Docker container with IFace

When you have an active DOT license, you can run the container according to the instructions below:

docker run -v /local/path/to/license/dir/:/srv/dot-document-server/license -v /local/path/to/config/dir/:/srv/dot-document-server/config -v /local/path/to/logs/dir/:/srv/dot-document-server/logs -p 8080:8080 dot-document-server
Replace the path /local/path/to/license/dir/ in the command with your local path to the license directory.
Replace the path /local/path/to/config/dir/ in the command with your local path to the config directory (from the distribution package).
Replace the path /local/path/to/logs/dir/ in the command with your local path to the logs directory (you need to create the directory).
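
The two docker run variants differ only in the optional license volume. The sketch below wraps both in one helper and creates the logs directory first, since that directory must exist. The function name run_container and the DOCKER variable (a dry-run hook) are hypothetical. The container paths, port mapping, and image name are taken from the commands above.

```shell
#!/bin/sh
# Run the dot-document-server image with the given host directories.
# Usage: run_container CONFIG_DIR LOGS_DIR [LICENSE_DIR]
# Pass LICENSE_DIR only when IFace is used. Use absolute host paths.
# Set DOCKER=echo to print the arguments instead of running Docker.
run_container() {
  config_dir=$1
  logs_dir=$2
  license_dir=${3:-}
  base=/srv/dot-document-server
  mkdir -p "$logs_dir"    # the logs directory must exist on the host
  set -- -v "$config_dir:$base/config" -v "$logs_dir:$base/logs"
  if [ -n "$license_dir" ]; then
    set -- -v "$license_dir:$base/license" "$@"
  fi
  ${DOCKER:-docker} run "$@" -p 8080:8080 dot-document-server
}
```

For example, `DOCKER=echo run_container /opt/dot/config /opt/dot/logs` prints the arguments that would be passed to Docker without starting a container. The /opt/dot paths are placeholders.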

Externalized configuration

The YAML configuration file is located in the config folder:

config/application.yml

There are two groups of properties:

  • Spring Boot properties

  • DOT Document Server specific properties

Spring Boot properties

You can find the specification at Common Application properties.

For example, to specify a different server port, add the following property to config/application.yml:

server:
  port: 9080

To fully understand how the externalized configuration works in Spring Boot, see Spring Boot documentation, chapter Externalized Configuration.

DOT Document Server specific properties

These properties are tied to DOT Document Server specific behavior. All project properties are described directly in the distributed YAML configuration file.

Externalized configuration via command line arguments

You can specify any Spring Boot property via command line arguments, as you can see below:

java -jar dot-document-server.jar --server.port=9080
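
Spring Boot can also read properties from environment variables, which is often convenient in Docker deployments. The sketch below applies Spring Boot's documented naming rule (replace dots with underscores, remove dashes, convert to upper case). The helper name to_env_name is hypothetical.

```shell
#!/bin/sh
# Convert a Spring Boot property name to its environment-variable form:
# dashes are removed, dots become underscores, letters are upper-cased.
to_env_name() {
  printf '%s\n' "$1" | sed 's/-//g' | tr '.' '_' | tr '[:lower:]' '[:upper:]'
}
```

With this rule, `to_env_name server.port` gives SERVER_PORT, so `SERVER_PORT=9080 java -jar dot-document-server.jar` should have the same effect as the command line argument above.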

Logging

DOT Document Server logs to the console and also writes a log file (dot-document-server.log). The log file is located in the directory defined by the LOGS_DIR system property. By default, log files rotate when they reach 5 MB and the maximum history is 5 files.

API Transaction Counter Log

The separate log file dot-document-transaction-counter.log is located in the directory defined by the LOGS_DIR system property. This log file contains the counts of API calls (transactions). The same rolling policy applies as for the application log, except that the maximum history of this log file is 180 files.
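
The rolling limits above give a rough upper bound on the disk space the logs can occupy. The sketch below does that arithmetic. It assumes one active file plus the retained history, each at most the rotation size, and that rotated files are stored uncompressed, which this document does not state. The function name max_log_mb is hypothetical.

```shell
#!/bin/sh
# Estimate an upper bound, in MB, on the disk space used by one rolling log:
# the active file plus the retained history, each at most max_size_mb.
# Assumption: rotated files are kept uncompressed.
max_log_mb() {
  max_size_mb=$1
  max_history=$2
  echo $(( (max_history + 1) * max_size_mb ))
}
```

Under these assumptions, `max_log_mb 5 5` gives 30 for the application log and `max_log_mb 5 180` gives 905 for the transaction counter log.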

Docker: Persisting log files in local filesystem

When you run DOT Document Server as a Docker container, you can keep access to the log files even after the container no longer exists. This is achieved by using Docker volumes. To find out how to run a container, see Docker.

Monitoring

Information such as build or license info can be accessed at /api/v6/actuator/info. Information about available endpoints can be viewed at /swagger-ui.html.

The health endpoint, accessible under /actuator/health, provides information about the health of the application. This feature can be used by an external tool such as Spring Boot Admin. For more information, see Spring Boot documentation, section Production ready monitoring. Spring Boot Actuator Documentation also provides info about other monitoring endpoints that can be enabled.

The application also supports exposing metrics in the standardised Prometheus format. These are accessible under /actuator/prometheus.
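
As an illustration of how an external script might use the health endpoint, the sketch below polls it until the application reports UP. It assumes the default Spring Boot Actuator response shape, where the top-level "status" field comes first (for example {"status":"UP"}), and that curl is installed. The function names health_status and wait_until_up are hypothetical.

```shell
#!/bin/sh
# Extract the top-level "status" value from an actuator health response
# read from stdin, e.g. {"status":"UP"} -> UP.
# Assumes "status" is the first key of the JSON object.
health_status() {
  sed -n 's/^{[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll the health endpoint once per second until it reports UP.
# Usage: wait_until_up [URL] [TRIES]; returns 1 if TRIES run out.
wait_until_up() {
  url=${1:-http://localhost:8080/actuator/health}
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    status=$(curl -s --max-time 2 "$url" | health_status)
    if [ "$status" = "UP" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A deployment script could call `wait_until_up` after starting the server and continue only when it returns successfully.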

Tracing

The OpenTracing API with the Jaeger implementation is used for tracing. The DOT Document Server tracing implementation supports SpanContext extraction from an HTTP request using the HTTP Headers format. For more information, see OpenTracing Specification. Tracing is disabled by default. To enable Jaeger tracing, set these application properties:

opentracing:
  jaeger:
    enabled: true
    udp-sender:
      host: jaegerhost
      port: portNumber

For more information about Jaeger configuration, see Jaeger Client Lib Docs.
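
To illustrate SpanContext extraction from HTTP headers, the sketch below builds a header in the Jaeger client propagation format {trace-id}:{span-id}:{parent-span-id}:{flags}. It assumes the server uses Jaeger's default uber-trace-id header, which this document does not confirm. The function name jaeger_trace_header and the IDs in the usage example are hypothetical.

```shell
#!/bin/sh
# Build an "uber-trace-id" header line in the Jaeger propagation format
# {trace-id}:{span-id}:{parent-span-id}:{flags}.
# parent-span-id defaults to 0 (root span) and flags to 1 (sampled).
jaeger_trace_header() {
  trace_id=$1
  span_id=$2
  parent_id=${3:-0}
  flags=${4:-1}
  printf 'uber-trace-id: %s:%s:%s:%s\n' "$trace_id" "$span_id" "$parent_id" "$flags"
}
```

A client could then attach the header to a request, for example `curl -H "$(jaeger_trace_header 4bf92f3577b34da6 00f067aa0ba902b7)" http://localhost:8080/actuator/health`, so that the server continues the caller's trace if tracing is enabled.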

Image requirements

  • The supported image formats are JPEG, PNG, BMP, WebP, and GIF

  • The document image must be large enough: when the document card is normalized (document card height is approximately 1000 px), the text height must be at least 32 px

  • The document card edges must be clearly visible and placed at least 10 px inside the image area

  • The image must be sharp enough for the human eye to recognize the text

  • The image should not contain objects or a background with visible edges (see the example below), because they can confuse the detection of the card in the image

[Figure: EdgesDemo]
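
A client can reject obviously unsupported files before sending them to the server. The sketch below checks only the file extension against the format list above. It is a convenience pre-check, not a description of how the server detects formats. The function name is_supported_image is hypothetical, and the webp extension is assumed for the WebP format.

```shell
#!/bin/sh
# Return 0 if the file name has an extension matching one of the supported
# image formats, 1 otherwise. Only the name is checked, not the content.
is_supported_image() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|bmp|webp|gif) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_image scan.JPG` succeeds, while `is_supported_image scan.tiff` fails.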

Appendix

Changelog

5.16.0 - 2021-07-15

  • Internal improvements.

5.15.0 - 2021-06-18

Added
  • Added Display Attack Detection. Enabled and configured IFace is required for this functionality.

  • Added config property: innovatrics.dot.iface.enabled: Enable IFace. When set to true, an active IFace license is needed.

  • API: DocumentOcrRequest.documentProperties.displayAttackDetection.enabled: Request the display attack detection.

  • API: DocumentOcrResponse.documentProperties.displayAttackDetection: Display attack detection result.

5.14.0 - 2021-05-14

  • Internal improvements.

5.13.0 - 2021-04-27

Changed
  • Internal improvements.

5.12.1 - 2021-04-09

Changed
  • API: DocumentOcrResponse.textFields: The textFields field will always be returned when the response is 200.

  • API: DocumentOcrResponse.imageFields: The imageFields field will always be returned when the response is 200.

5.12.0 - 2021-04-07

Added
  • API: DocumentOcrRequest.documentProperties.quality.enabled: Enable the quality check.

  • API: DocumentOcrResponse.documentProperties.quality: The quality check result.

Changed
  • API: DocumentOcrResponse.documentProperties: The document properties are returned now when the authenticity check or quality check are enabled in request. The document properties can contain the document color similarity and a result of the authenticity check and/or result of the quality check. Therefore, the authenticity check result and the color similarity are optional in the document properties from now on.

5.11.0 - 2021-03-18

Changed
  • API: DocumentOcrResponse.documentProperties.colorProfile.similarityScoreLevel: Added VERY_LOW value

  • API: DocumentOcrResponse.documentProperties.authenticity.confidenceLevel: Added VERY_LOW value

  • API: DocumentOcrResponse.documentProperties.authenticity.suspiciousFields.textFields.confidenceLevel: Added VERY_LOW value

  • API: DocumentOcrResponse.documentProperties.authenticity.suspiciousFields.imageFields.confidenceLevel: Added VERY_LOW value

5.10.1 - 2021-03-08

Changed
  • Internal improvements.

5.10.0 - 2021-03-05

Changed
  • Internal improvements.

5.9.0 - 2021-02-26

Changed
  • Internal improvements.

5.8.0 - 2021-02-18

Changed
  • Internal improvements.

5.7.0 - 2021-01-15

Changed
  • Internal improvements.

5.6.0 - 2020-12-16

Changed
  • Internal improvements.

5.5.0 - 2020-12-14

Changed
  • Internal improvements.

5.4.0 - 2020-12-08

Changed
  • Internal improvements.

5.3.0 - 2020-12-04

Added
  • API: DocumentMetadataResponse.documentTypes.pages.textFields.valueNormalized: Flag to inform if the value in this field is being returned normalized.

5.2.0 - 2020-11-12

Changed
  • Internal improvements.

5.1.1 - 2020-10-30

Fixed
  • API: DocumentOcrResponse.documentProperties.authenticity.details: Removed from the response.

5.1.0 - 2020-10-30

Added
  • API: DocumentOcrResponse.documentProperties.authenticity.suspiciousFields: Text fields and image fields, which are suspicious by their authenticity.

Removed
  • API: DocumentOcrResponse.documentProperties.authenticity.details: The authenticity details were removed from the response.

5.0.0 - 2020-10-23

Added
  • Added the authenticity check feature.

  • API: DocumentOcrRequest.documentProperties.authenticity.enabled: Enable authenticity check.

  • API: DocumentMetadataResponse.documentTypes.pages.authenticity: The authenticity metadata.

  • API: DocumentOcrResponse.documentProperties: The document properties, which are returned only when the authenticity check is enabled in request. The document properties contain a document color similarity and a result of the authenticity check.

Changed
  • Rename the application to Document Server.

  • New API version 5.

  • API: DocumentOcrResponse.textFields.confidence: Change the data type from Int to Double and the value interval from [0,1000] to [0,1].

  • API: DocumentOcrResponse.textFields.lines.confidence: Change the data type from Int to Double and the value interval from [0,1000] to [0,1].

4.26.0 - 2020-10-19

Changed
  • Internal improvements.

4.25.0 - 2020-10-12

Changed
  • API: DocumentOcrRequest.documentTypeAdvice.country: Case insensitive.

  • API: DocumentOcrRequest.documentTypeAdvice.type: Case insensitive.

  • API: DocumentOcrRequest.documentTypeAdvice.edition: Case insensitive.

  • API: DocumentOcrRequest.documentTypeAdvice.machineReadableTravelDocument: Case insensitive.

  • API: DocumentOcrRequest.documentTypeAdvice.pageTypes: Case insensitive.

4.24.0 - 2020-10-09

Changed
  • Internal improvements.

4.23.0 - 2020-10-01

Changed
  • Internal improvements.

4.22.0 - 2020-09-30

Changed
  • Internal improvements.

4.21.0 - 2020-09-10

Changed
  • Internal improvements.

4.20.0 - 2020-09-04

Changed
  • Internal improvements.

4.19.0 - 2020-08-25

Changed
  • Internal improvements.

4.18.0 - 2020-08-22

Changed
  • Internal improvements.

4.17.0 - 2020-08-13

Changed
  • Internal improvements.

4.16.0 - 2020-08-13

Changed
  • Internal improvements.

4.15.1 - 2020-08-10

Changed
  • Internal improvements.

4.14.0 - 2020-08-07

Changed
  • Internal improvements.

4.13.0 - 2020-07-30

Changed
  • Internal improvements.

4.12.0 - 2020-07-15

Changed
  • Internal improvements.

4.11.0 - 2020-07-02

Changed
  • Internal improvements.

4.10.0 - 2020-06-24

Changed
  • Internal improvements.

4.9.0 - 2020-06-22

Changed
  • Internal improvements.

4.8.0 - 2020-06-12

Changed
  • Internal improvements.

4.7.0 - 2020-06-11

Changed
  • Internal improvements.

4.6.0 - 2020-06-10

Changed
  • Internal improvements.

4.5.0 - 2020-06-08

Changed
  • Internal improvements.

4.4.0 - 2020-05-07

Fixed
  • Normalized image aspect ratio.

4.3.0 - 2020-04-27

Added
  • API: New attribute DocumentOcrResponse.normalizedDocumentImage: If an image of a normalized document should be present in a response.

Changed
  • API: DocumentOcrResponse.normalizedImage: Change to JPG image format.

  • API: DocumentOcrResponse.normalizedImage: Not present in response by default. (See DocumentOcrResponse.normalizedDocumentImage).

  • OCR performance optimization.

4.2.0 - 2020-04-17

Changed
  • Improve document image normalization.

4.1.0 - 2020-04-03

Added
  • New major release.

DOT Document Server Dockerfile example

FROM ubuntu:18.04

WORKDIR /srv/dot-document-server

VOLUME ["/srv/dot-document-server/logs"]
VOLUME ["/srv/dot-document-server/config"]

ARG JAR_FILE
ARG SAM_OCR_LIB

COPY ${JAR_FILE} ./app.jar
COPY ${SAM_OCR_LIB} ./lib/
COPY entrypoint.sh /usr/local/bin/

RUN set -ex \
  && apt-get update \
  && apt-get install -y openjdk-11-jre curl locales \
  && touch *.jar \
  && chmod 500 /usr/local/bin/entrypoint.sh \
  && ldconfig /srv/dot-document-server/lib/

ENV JAVA_OPTS=''
ENV LOGS_DIR=/logs

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en
ENV LC_ALL=en_US.UTF-8

ENTRYPOINT ["sh", "-c", "entrypoint.sh"]

Entrypoint.sh script example

#!/bin/sh
set -e

exec sh -c "java $JAVA_OPTS -Dspring.config.additional-location=file:config/application.yml -Dlogging.config=file:config/logback-spring.xml -jar app.jar"