department of informatics

Geometric Relations in Allograph-Based Writer Identification

Student:
Marcel Würsch

Contact: 
Rolf Ingold

Angelika Garz

 

Introduction

Handwriting is a biometric feature and is generally considered unique to a person. It is shaped by biological and cultural factors alike. Research nevertheless still relies on the hypothesis that a person’s writing style is consistent over time and distinct from the handwriting of any other individual.

Writer identification is the task of matching a handwriting sample against a database of known scribes. The aim is to return a list of candidates whose handwriting shares the attributes of the document in question. Current research addresses two open problems. On the one hand, identification rates can still be improved: modern systems sometimes achieve near-perfect recognition on small datasets, but accuracy decreases as the database grows. On the other hand, writer identification systems should be able to explain their decisions, i.e., make the process transparent to a human. State-of-the-art systems built on distance- or similarity-based matching fall short here. They answer the binary question of whether two handwritings are alike, but they are black boxes: they provide no information about how the decision was reached or why two handwritings are considered similar.
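As an illustration of the distance-based matching described above, the following minimal Java sketch ranks known writers by the distance between feature histograms. The class name WriterRanking, the choice of the chi-square distance, and the representation of a handwriting sample as a normalised histogram are assumptions made for this example; they are not taken from the thesis or from HisDoc 2.0.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Illustrative only: ranks known writers by histogram distance to a query sample. */
final class WriterRanking {

    /** Chi-square distance between two histograms of equal length. */
    static double chiSquare(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("histograms must have equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            double denom = p[i] + q[i];
            if (denom > 0.0) { // skip bins that are empty in both histograms
                double diff = p[i] - q[i];
                sum += diff * diff / denom;
            }
        }
        return sum;
    }

    /** Returns the writer ids ordered from most to least similar to the query. */
    static List<String> rank(double[] query, Map<String, double[]> database) {
        List<String> ids = new ArrayList<>(database.keySet());
        ids.sort(Comparator.comparingDouble(id -> chiSquare(query, database.get(id))));
        return ids;
    }
}
```

The result is only an ordered candidate list with distances behind it. Nothing in it says which shapes made two samples similar, which is the transparency problem raised above.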

Allographic methods are one way of addressing the latter problem. Allographs are writer-specific shape variants of graphemes, the smallest semantically distinguishing units of a writing system, i.e., parts of characters such as loops. The results of allographic methods have the potential to be translated into a human-understandable report.

Such a representation of the results would be of great use. Forensic writer identification aims at a semi- or fully automatic system that can decide on the identity of a writer. Such a system cannot be used in court, however, if its results cannot be understood by a judge or jury.

Scope of the thesis

The aim of this thesis is to assess allographic methods and evaluate their performance. This could lead to writer identification systems that are better able to explain their decisions.

We will investigate whether additional meta-codebooks can be used to improve identification rates. Meta-codebooks encode information not available in standard allographic codebooks, e.g., geometric relations between allographs. The goal is to examine whether writer-specific combinations of allographs can be found.
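One conceivable way to encode geometric relations between allographs is sketched below: a histogram over ordered pairs of nearby allographs, binned by the two codebook entries and the direction from the first to the second. This is a sketch under stated assumptions, not the method of the thesis. The names GeometricRelationHistogram and Occurrence, the pairwise direction binning, and the parameters codebookSize, angleBins and maxDist are all hypothetical.

```java
import java.util.List;

/** Illustrative only: a possible meta-codebook histogram over pairs of allographs. */
final class GeometricRelationHistogram {

    /** One detected allograph: its codebook entry and its centre position in the image. */
    static final class Occurrence {
        final int codeword;
        final double x;
        final double y;

        Occurrence(int codeword, double x, double y) {
            this.codeword = codeword;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Counts ordered pairs (a, b) of allographs at most maxDist apart, binned by
     * (codeword of a, codeword of b, direction from a to b), then normalises to sum 1.
     */
    static double[] build(List<Occurrence> occurrences, int codebookSize,
                          int angleBins, double maxDist) {
        double[] hist = new double[codebookSize * codebookSize * angleBins];
        double total = 0.0;
        for (Occurrence a : occurrences) {
            for (Occurrence b : occurrences) {
                if (a == b) {
                    continue; // an allograph is not related to itself
                }
                double dx = b.x - a.x;
                double dy = b.y - a.y;
                if (Math.hypot(dx, dy) > maxDist) {
                    continue; // only nearby allographs form a relation
                }
                double angle = Math.atan2(dy, dx) + Math.PI; // shifted to [0, 2*pi]
                int bin = Math.min(angleBins - 1,
                        (int) (angle / (2.0 * Math.PI) * angleBins));
                hist[(a.codeword * codebookSize + b.codeword) * angleBins + bin] += 1.0;
                total += 1.0;
            }
        }
        if (total > 0.0) {
            for (int i = 0; i < hist.length; i++) {
                hist[i] /= total;
            }
        }
        return hist;
    }
}
```

A histogram of this kind could be compared between writers in the same way as a standard allographic codebook histogram. Because each bin corresponds to a nameable combination of two shapes and a direction, it might also be easier to translate into a readable report.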

The performance of the method will be compared with two baseline methods, chosen so that code can be reused. One of them is non-allographic. This makes it possible to evaluate not only whether the proposed method improves on allographic approaches but also how well it performs in comparison with other methods.

The goals of this thesis are the following:

  • Implement and compare selected state-of-the-art methods (as baseline)
  • Evaluate the use of the Implicit Shape Model (ISM) for allograph-based identification
  • Incorporate implemented methods in the HisDoc 2.0 framework

Technologies

  • Java – the programming language (HisDoc 2.0 is Java-based)
  • BoofCV – an open-source Java library for computer vision
  • Piccolo2D – a GUI toolkit library for Java
  • Apache Commons Libraries – a repository of reusable Java components