Travel Document Data Extraction

DOT provides both parsing of raw MRZ data and direct extraction of MRZ data from an image of a passport.

Most travel and identity documents today are machine-readable, which means that their most relevant data is encoded in a format suitable for optical character recognition (OCR). The part of the document where this information is embedded is called the machine-readable zone, or MRZ for short.

Passport data extraction

DOT Document Server provides a dedicated Passports API call to extract and parse data from any passport that complies with the ICAO Document 9303 specification. This call returns only the data encoded in the MRZ.

Parsing the MRZ data

DOT Document Server can extract the raw MRZ field data from an identity document via the OCR API call. Subsequently, DOT Core Server can decode the document's MRZ data (for all types described below) via the MRZ Parsing API call.

Short Extract of the MRZ specification

According to ICAO Document 9303, there are three standardized document types, distinguished by the size of the document, which in turn determines the layout of the MRZ. These three types are:

  • Size 1 Travel Document (TD1)
  • Size 2 Travel Document (TD2)
  • Size 3 Travel Document (TD3)
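
The three types can be told apart from the shape of the MRZ alone: 3 lines of 30 characters for TD1, 2 lines of 36 for TD2, and 2 lines of 44 for TD3 (see the sections below). The following is a minimal sketch of that rule; the function name `detect_mrz_format` is hypothetical and is not part of any DOT API.

```python
# (number of lines, characters per line) -> document type, per ICAO Doc 9303
MRZ_SHAPES = {
    (3, 30): "TD1",
    (2, 36): "TD2",
    (2, 44): "TD3",
}


def detect_mrz_format(mrz_text: str) -> str:
    """Return "TD1", "TD2" or "TD3" for a raw MRZ string, or raise ValueError."""
    # Ignore blank lines and surrounding whitespace left over from OCR output.
    lines = [line.strip() for line in mrz_text.splitlines() if line.strip()]
    lengths = {len(line) for line in lines}
    if len(lengths) != 1:
        raise ValueError("MRZ lines must be non-empty and of equal length")
    shape = (len(lines), lengths.pop())
    if shape not in MRZ_SHAPES:
        raise ValueError(f"unrecognised MRZ shape: {shape}")
    return MRZ_SHAPES[shape]
```

For example, a string made of two 44-character lines is reported as `"TD3"`, while any other shape raises `ValueError` so that the caller can reject or re-scan the input.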

Size 1 Travel Document (TD1)

The TD1 is mostly used for identity cards. Because space is limited, the MRZ is placed on the back of the card, so both the front and the back of the document must be captured in order to assess its validity and extract the required information. Furthermore, each issuing country can add optional content to the document, usually on the back above the MRZ.

TD1 document example:

Travel Document TD1 - Sample

As seen in the picture above, the MRZ of the TD1 spans 3 lines, and each line is 30 characters long. The optional content goes in the area above the MRZ. Furthermore, there are check digits added to the MRZ so that the data in the MRZ can be verified.
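
The check digit calculation defined in ICAO Document 9303 can be sketched as follows: each character is converted to a number (digits keep their value, A to Z map to 10 to 35, the filler `<` counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. This is an illustration of the published algorithm, not of the DOT implementation; the function names are hypothetical.

```python
WEIGHTS = (7, 3, 1)  # repeating weighting sequence from ICAO Doc 9303


def char_value(ch: str) -> int:
    """Numeric value of a single MRZ character."""
    if "0" <= ch <= "9":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Check digit (0-9) for the given MRZ field."""
    total = sum(char_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field))
    return total % 10


print(check_digit("L898902C3"))  # document number -> 6
print(check_digit("740812"))     # date of birth   -> 2
```

A field is considered valid when the digit printed in the MRZ next to it equals the computed value.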

The following image explains the different fields that are present in the MRZ:

Travel Document TD1 - MRZ Sample

Innovatrics DOT Core Server is able to extract and present this information, as well as validate the check digits.
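
The TD1 field layout can also be written down as character positions. The sketch below slices the three lines according to the positions given in ICAO Document 9303 Part 5; the function `parse_td1` and the dictionary keys are hypothetical names chosen for illustration and do not reflect the response format of DOT Core Server. The sample values are specimen-style data, not a real document.

```python
def parse_td1(lines):
    """Split a TD1 MRZ (3 lines of 30 characters) into its raw fields."""
    if len(lines) != 3 or any(len(line) != 30 for line in lines):
        raise ValueError("TD1 MRZ must be 3 lines of 30 characters")
    l1, l2, l3 = lines
    # In the name line, "<<" separates surname from given names,
    # and a single "<" separates individual name parts.
    surname, _, given = l3.partition("<<")
    return {
        "document_code": l1[0:2].rstrip("<"),
        "issuing_state": l1[2:5].rstrip("<"),
        "document_number": l1[5:14].rstrip("<"),
        "document_number_check": l1[14],
        "optional_data_1": l1[15:30].rstrip("<"),
        "birth_date": l2[0:6],            # YYMMDD
        "birth_date_check": l2[6],
        "sex": l2[7],
        "expiry_date": l2[8:14],          # YYMMDD
        "expiry_date_check": l2[14],
        "nationality": l2[15:18].rstrip("<"),
        "optional_data_2": l2[18:29].rstrip("<"),
        "composite_check": l2[29],
        "surname": surname.replace("<", " ").strip(),
        "given_names": given.replace("<", " ").strip(),
    }


sample = [
    "I<UTOD231458907".ljust(30, "<"),
    "7408122F1204159UTO".ljust(29, "<") + "6",
    "ERIKSSON<<ANNA<MARIA".ljust(30, "<"),
]
fields = parse_td1(sample)
print(fields["document_number"], fields["surname"], fields["given_names"])
```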

Size 2 Travel Document (TD2)

Although it is still used for several identity cards, the TD2 is gradually being replaced by the smaller, more manageable TD1.

One of the main benefits of the TD2 is that it has the MRZ on the front side – making the back side less important. This means that only the front side of the document needs to be scanned to extract the required information.

TD2 document example:

Travel Document TD2 - Sample

As illustrated above, the MRZ of the TD2 spans 2 lines, and each line is 36 characters long.

The following image explains the different fields that are present in the MRZ:

Travel Document TD2 - MRZ Sample
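
The same TD2 layout can be expressed as a small position table. The sketch below assumes the character positions from ICAO Document 9303 Part 6 and returns the raw field contents, including any `<` filler characters; `TD2_FIELDS` and `parse_td2` are hypothetical names, and the sample is specimen-style data.

```python
# field name -> (line index, start, end), 0-based with exclusive end
TD2_FIELDS = {
    "document_code": (0, 0, 2),
    "issuing_state": (0, 2, 5),
    "name": (0, 5, 36),
    "document_number": (1, 0, 9),
    "document_number_check": (1, 9, 10),
    "nationality": (1, 10, 13),
    "birth_date": (1, 13, 19),        # YYMMDD
    "birth_date_check": (1, 19, 20),
    "sex": (1, 20, 21),
    "expiry_date": (1, 21, 27),       # YYMMDD
    "expiry_date_check": (1, 27, 28),
    "optional_data": (1, 28, 35),
    "composite_check": (1, 35, 36),
}


def parse_td2(lines):
    """Split a TD2 MRZ (2 lines of 36 characters) into raw, unstripped fields."""
    if len(lines) != 2 or any(len(line) != 36 for line in lines):
        raise ValueError("TD2 MRZ must be 2 lines of 36 characters")
    return {
        name: lines[row][start:end]
        for name, (row, start, end) in TD2_FIELDS.items()
    }


sample = [
    "I<UTOERIKSSON<<ANNA<MARIA".ljust(36, "<"),
    "D231458907UTO7408122F1204159".ljust(35, "<") + "6",
]
print(parse_td2(sample)["document_number"])  # D23145890
```

Keeping the positions in a table rather than in code makes it straightforward to compare them against the specification or to add a table for another format.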

Size 3 Travel Document (TD3)

The TD3 is used for most passports worldwide. Although the document is a booklet, it contains a data page with all the information on the front. Therefore, only the front of this page needs to be scanned, which makes the process easier for passport control officers as well as for data extraction software such as Innovatrics DOT Core Server.

TD3 document example:

Travel Document TD3 - Sample

As illustrated above, the MRZ of the TD3 spans 2 lines, and each line is 44 characters long. Furthermore, there are several check digits added to the MRZ so that the data in the MRZ can be verified.

The following image explains the different fields that are present in the MRZ:

Travel Document TD3 - MRZ Sample

Innovatrics DOT Core Server can extract and process this data for most TD3 documents, and verify whether the check digits are valid.
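
To illustrate what check digit verification involves for a TD3, the sketch below recomputes the five check digits of the second MRZ line using the algorithm and positions from ICAO Document 9303 (weights 7, 3, 1, sum modulo 10). It is not the DOT Core Server implementation; `validate_td3` and the result keys are hypothetical names, and the sample is specimen-style data.

```python
def check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * (7, 3, 1)[i % 3]
    return total % 10


def validate_td3(lines):
    """Return {check name: bool} for a TD3 MRZ (2 lines of 44 characters)."""
    if len(lines) != 2 or any(len(line) != 44 for line in lines):
        raise ValueError("TD3 MRZ must be 2 lines of 44 characters")
    l2 = lines[1]
    checks = {
        "document_number": (l2[0:9], l2[9]),
        "birth_date": (l2[13:19], l2[19]),
        "expiry_date": (l2[21:27], l2[27]),
        "personal_number": (l2[28:42], l2[42]),
        # The composite digit covers positions 1-10, 14-20 and 22-43.
        "composite": (l2[0:10] + l2[13:20] + l2[21:43], l2[43]),
    }
    results = {}
    for name, (data, digit) in checks.items():
        # An unused personal number may carry "<" instead of a check digit.
        if name == "personal_number" and digit == "<" and set(data) == {"<"}:
            results[name] = True
        else:
            results[name] = str(check_digit(data)) == digit
    return results


sample = [
    "P<UTOERIKSSON<<ANNA<MARIA".ljust(44, "<"),
    "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
]
print(all(validate_td3(sample).values()))  # True
```

Because the composite digit spans several fields, a single misread character typically fails both the check for its own field and the composite check, which helps to localise OCR errors.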