department of informatics

Dolores - Document Logical Restructuring

Summary :

Dolores aims at recovering the logical structures of electronic documents (newspapers, scientific papers, ...). Dolores uses the canonical format (XCDF) generated by XED as a starting point. From this format, which represents the physical structure of a document, Dolores focuses on recovering further logical structures. This recovered knowledge could be extremely useful:

  • retrieval and document alignment could be drastically improved thanks to more precise indexing;
  • document re-editing would be far easier (reverse engineering);
  • views and styles could be applied over logical data, allowing powerful information selection and presentation.

Currently Dolores focuses on the newspaper class. This kind of document offers many interesting and relevant features: a rich layout with a lot of typographical and topological information, as well as deep logical hierarchies. In the future, Dolores should be able to process any kind of text-centric document.
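
As an illustration of how typographical and topological information can hint at logical roles, here is a minimal sketch. It is not Dolores's method and does not use the real XCDF schema: the `Block` class, its fields, the `label_blocks` function, and the size thresholds are all hypothetical, chosen only to show the general idea of labelling physical blocks.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Block:
    """Hypothetical physical text block (NOT the actual XCDF schema)."""
    text: str
    font_size: float  # typographical feature
    y: float          # topological feature: distance from the top of the page


def label_blocks(blocks):
    """Assign a coarse logical label to each block from its font size."""
    body_size = median(b.font_size for b in blocks)
    labelled = []
    # Process blocks in reading order (top to bottom).
    for block in sorted(blocks, key=lambda b: b.y):
        # Assumed rule: text much larger than the median size is a headline,
        # text smaller than the median is a caption, the rest is body text.
        if block.font_size >= 1.5 * body_size:
            label = "headline"
        elif block.font_size < body_size:
            label = "caption"
        else:
            label = "body"
        labelled.append((label, block.text))
    return labelled


page = [
    Block("Local team wins final", 24.0, 10.0),
    Block("The match ended late on Sunday.", 10.0, 40.0),
    Block("Supporters filled the streets.", 10.0, 60.0),
    Block("Photo: the winning goal", 8.0, 80.0),
]
result = label_blocks(page)
```

A real newspaper page would need far richer cues (columns, rules, nesting of articles), which is why such a single-feature rule can only serve as a toy example.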

 

Period : The project started in summer 2004 and is planned to last 4 years.

 

Participants :


Publications related to this project

  • J.-L. Bloechle, M. Rigamonti, D. Lalanne, R. Ingold, "XCDF : un format canonique pour la représentation de documents." In proc. of Colloque International Francophone sur l'Ecrit et le Document (CIFED'06), Fribourg (Switzerland), September 18-22, 2006, pp. 19-23.
  • J.-L. Bloechle, M. Rigamonti, K. Hadjar, D. Lalanne, R. Ingold, "XCDF: A Canonical and Structured Document Format." In Horst Bunke, A. Lawrence Spitz (eds.), LNCS: "7th International Workshop, DAS 2006, Nelson, New Zealand, February 13-15, 2006, Proceedings", Springer-Verlag, vol. 3872, ISBN: 3-540-32140-3, 2006, pp. 141-152.