department of informatics

HisDoc 2.0 — Towards Computer-Assisted Palaeography

An Integrated Approach Incorporating Text Localization, Script Analysis, and Semantics for Historical Documents

Summary

In HisDoc 2.0 we will investigate the ingredients still missing for automatic large-scale analysis of historical documents, and how to make the results useful for historians. The project builds upon the foundations laid in the HisDoc project and continues research on textual heritage preservation in a novel direction.

HisDoc 2.0 will take the approach a step further: it will be dedicated to palaeographical studies and incorporate semantic domain knowledge automatically extracted from existing document databases into Document Image Analysis (DIA) methods in order to facilitate large-scale processing.

Holistic DIA Approach

The innovations of HisDoc 2.0 are twofold. First, we will address documents with complex layouts and several scribes, which the research community has so far avoided. Existing approaches presume laboratory conditions (e.g. high-quality binarization or pre-segmented text regions) and focus on sub-tasks, treating interrelated tasks independently.

In real-world applications, mutual dependencies exist. For example, reliable script analysis depends on the exact localization of text regions on a page, which, in the presence of various kinds of scripts or personal writing styles, in turn depends on discriminating scripts. We will exploit these mutual dependencies and analyze, develop, and integrate methods for text localization, script discrimination, and scribe identification into one holistic approach, in order to obtain a flexible, robust, and generic method for historical manuscript analysis in the presence of complex layouts.

Detailed project description, publications, and further information: HisDoc 2.0 Homepage

Period

The project will start in January 2014 and is funded for three years.

Funding

The project is funded by the SNF.

Participants

External Partner

Master Thesis Projects