department of informatics

HisDoc: Historical Document Analysis, Recognition, and Retrieval

Summary

The HisDoc research project, short for “Historical Document Analysis, Recognition, and Retrieval”, brought together three partners from the Universities of Fribourg, Bern, and Neuchâtel, working in the fields of document image analysis, handwriting recognition, and information retrieval, respectively. Funded under the Sinergia program of the SNF, its goal was to develop tools that support cultural heritage preservation by integrating historical manuscripts into digital libraries.

It aimed to make historical documents, particularly medieval documents, electronically available via the Internet. The research project had three major aims. First, to develop generic tools that can be adapted with little effort to different types of documents and languages. Second, to perform image analysis and text recognition fully automatically after an interactive training phase. Third, to provide a text search engine able to cope with old languages as well as with errors in the automatic transcription. We consider all of these aims to have been reached.

The project comprised three distinct but closely related modules:

  • The Image Analysis Module had two main goals:
    •  Image enhancement, which consists in preparing the historical document image to improve further analysis;
    •  Layout analysis, which aims at providing a structural description of the page’s content.
  • The Text Recognition Module proposed flexible and robust recognition systems that are suitable for the transcription of historical texts.
    Flexibility means that the systems can be adapted to new writing styles without great effort, while robustness means that the recognizers should attain a high rate of correct recognition.
  • The Information Retrieval Module determined whether the quality obtained with the text recognition module would result in effective information retrieval from older manuscript collections.
    Instead of focusing only on OCR accuracy, this module complemented it with measures of both retrieval effectiveness (precision and recall) and retrieval efficiency (response delay).
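
The two effectiveness measures named above can be illustrated with a minimal sketch. This is not the project's evaluation code: the function name `precision_recall` and the page identifiers are hypothetical and chosen only for illustration. Precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers returned by the search engine (hypothetical data)
    relevant:  identifiers judged relevant for the query (hypothetical data)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant items actually retrieved

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: four pages retrieved, three pages relevant, two in common.
retrieved = ["page_12", "page_40", "page_77", "page_91"]
relevant = ["page_12", "page_77", "page_103"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Under these assumptions, a recognition error that garbles a query term in the transcription would typically show up as lower recall, which is why retrieval measures complement raw recognition accuracy.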

Detailed project description, publications and further information: HisDoc Homepage

Period

The project started in May 2009 and finished in June 2013.

Funding

The project was funded by the Sinergia program of the SNF.

Participants

Partners

The project was carried out in collaboration with:

  • University of Fribourg (Unifr)
  • University of Bern (Unibe)
  • University of Neuchâtel (Unine)