Fribourg Document-centric multimedia meeting browsers

Current research in image and video analysis aims to automatically create indexes and pictorial video summaries to help users browse multimedia corpora. However, these methods are often based on low-level visual features and lack semantic information. Other research projects use language understanding techniques or text captions derived from OCR to create more powerful indexes and search mechanisms. Our assumption is that in a large proportion of multimedia applications (e.g. lectures, meetings, news), classical printable documents play a central role in the thematic structure of discussions. Further, we believe printable documents could provide a natural and thematic means of browsing and searching through large multimedia repositories.


JFriDoc [video]

JFriDoc is a multimodal browser for navigating within a meeting. It is an adaptation of the FriDoc browser, described below on this page, built with the JFerret framework.


TDoc

TDoc is a low-fidelity Tangible Document User Interface, developed in a single day. It uses RFID readers to identify the document in use and distance sensors to determine which document block the user has selected. It then indicates the number of utterances in the related meetings that refer to this document. Finally, the user can use a physical scroll bar to select and listen to the corresponding meeting dialogs.


Inquisitor [video]

Inquisitor is a mono-modal browser for visualizing static documents and for correcting annotations produced by document analysis systems such as XED. It was developed as a master's thesis by Florian Evéquoz.

Inquisitor focuses on printable electronic documents (e.g. books, articles, newspapers), which are defined as static documents. Inquisitor is therefore a system used to (a) visualize a single static document, its annotations and the existing intra-document links, (b) validate annotations and links, and (c) edit them. Since static documents are considered a meaningful entry point for cross-media navigation in FriDoc and FaericWorld, the main conceptual task of Inquisitor is to prepare these static documents and ensure the consistency of their annotations. In the context of static documents, the physical structure is the base annotation: it is derived directly from the raw media, i.e. a PDF document, and can be extracted automatically using classical document analysis methods. Other annotations include the reading order, the logical structure, the table of contents, the thumbnail view, etc. Those derived annotations can be computed either automatically using recognizers or manually by the user. Physical structures are composed of clusters grouping homogeneous document primitives (text, graphics and images) that respect topological, stylistic and typographic proximity.


FriDoc [video]

Our prototype document-centric multimedia meeting browser is illustrated in figures 1 and 2. Figure 1 presents our cross-meeting browser, which supports thematic search and browsing of a multimedia archive. All the newspaper articles stored in the press review archive are plotted on the visualization according to the user's request (e.g. Bush, Iraq, Sharon). The most relevant articles are returned by the system and organized spatially according to the user's keywords: the higher an article (represented as a white circle) appears on the visualization, the more user keywords it contains and thus the better it answers the request. Further, the relative contribution of each keyword is represented using histograms. The horizontal axis represents the date of the meeting in which the article was projected or discussed, so the visualization also shows how a theme evolves over time. On the same visualization, the speech transcript of each meeting, represented as a black circle, is plotted following the same rules. This cross-meeting browser thus gives a quick overview of a large number of meetings and favours thematic browsing of the meeting archive, using not only the meetings' speech transcripts but also the content of the documents discussed or projected during the meetings as entry points to the archive.
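The placement rule described above can be sketched in a few lines of Python. This is only an illustration, not the FriDoc implementation: it assumes that relevance is a raw count of keyword occurrences, and the function names, field names and sample archive are all hypothetical.

```python
import re
from collections import Counter
from datetime import date


def keyword_counts(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    tokens = Counter(re.findall(r"\w+", text.lower()))
    return {kw: tokens[kw.lower()] for kw in keywords}


def layout(items, keywords):
    """Place articles and transcripts on the cross-meeting view.

    x is the meeting date, y is the total keyword count (assumed relevance
    measure), and the per-keyword counts feed the histogram of each point.
    Items that match no keyword are not plotted.
    """
    points = []
    for item in items:
        counts = keyword_counts(item["text"], keywords)
        score = sum(counts.values())
        if score == 0:
            continue
        points.append({
            "title": item["title"],
            "kind": item["kind"],  # "article" (white) or "transcript" (black)
            "x": item["meeting_date"],
            "y": score,
            "histogram": counts,
        })
    return sorted(points, key=lambda p: p["y"], reverse=True)


# Hypothetical sample archive, for illustration only.
archive = [
    {"title": "Article A", "kind": "article",
     "meeting_date": date(2004, 1, 12),
     "text": "Bush spoke about Iraq. Bush then left."},
    {"title": "Meeting 1 transcript", "kind": "transcript",
     "meeting_date": date(2004, 1, 12),
     "text": "We discussed the Iraq article."},
    {"title": "Article B", "kind": "article",
     "meeting_date": date(2004, 1, 19),
     "text": "Sports results."},
]

points = layout(archive, ["Bush", "Iraq"])
```

With this sample data, Article A is placed highest, the transcript comes second, and Article B is not plotted because it contains none of the keywords.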

When the user selects an article, the corresponding meeting recordings are opened at the time when the article was discussed or projected. Figure 2 presents our intra-meeting browser. It is composed of the following components: the documents in focus on the left (discussed documents on top, projected documents below), the audio/video clips in the middle, the structured transcription of the meeting dialogs on the right, and finally the chronograph visualization at the bottom right of the interface. All the representations are synchronized, meaning they share the same time reference, and clicking on one of them causes all the components to display their content at that same time. For instance, clicking on a journal article positions the audio/video clips at the time when it was discussed, positions the speech transcription at the same time, and displays the document that was projected. These visual links directly illustrate the document/speech and document/video alignments presented above in the article. The chronograph visualization at the bottom-right of figure 3 represents the complete duration of the meeting. It is a visual overview of the whole meeting and can serve as a control bar. Each layer stands for a different temporal annotation: speaker turns, utterances, document blocks and slides projected. Other annotations can be displayed depending on the meeting type (topics, silences, dialog acts, pen strokes for handwritten notes, gestures, etc.). Those temporal annotations are currently stored as XML files, which hold timestamps for each state change (i.e. new speaker, new topic, slide change, etc.) and spatial information for documents. For example, the speech transcript contains speaker turns, divided into speech utterances, with their corresponding start and end times.
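To make the transcript annotation concrete, here is a minimal Python sketch that reads speaker turns and utterances from XML and finds the utterance being spoken at a given time. The actual XML schema is not given on this page, so the element and attribute names (transcript, turn, utterance, speaker, start, end) and the sample content are assumptions made for illustration.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical annotation file content; the real schema may differ.
SAMPLE = """
<transcript>
  <turn speaker="Denis">
    <utterance start="0.0" end="4.5">Let us start with the first article.</utterance>
    <utterance start="4.5" end="9.0">It was published yesterday.</utterance>
  </turn>
  <turn speaker="Speaker B">
    <utterance start="9.0" end="15.2">I have a comment on the second paragraph.</utterance>
  </turn>
</transcript>
"""


def load_utterances(xml_text):
    """Return a list of (start, end, speaker, text) tuples sorted by start time."""
    root = ET.fromstring(xml_text.strip())
    utterances = []
    for turn in root.iter("turn"):
        for utt in turn.iter("utterance"):
            utterances.append((
                float(utt.get("start")),
                float(utt.get("end")),
                turn.get("speaker"),
                utt.text,
            ))
    utterances.sort()
    return utterances


def utterance_at(utterances, t):
    """Return the utterance whose [start, end) interval contains t, or None."""
    starts = [u[0] for u in utterances]
    i = bisect.bisect_right(starts, t) - 1
    if i >= 0 and t < utterances[i][1]:
        return utterances[i]
    return None


utterances = load_utterances(SAMPLE)
current = utterance_at(utterances, 10.0)
```

A lookup of this kind is what a transcript view would need in order to scroll to the right utterance when another component changes the current time.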

Furthermore, the chronograph visualization is interactive: users can click on any pie slice of a circle layer to access a specific moment of the meeting, a specific topic or a specific document article, thanks to the document/speech alignment. On the document side, clicking on an article places the audio/video sequences at the moment when the content of this document block is being discussed, a direct illustration of the document/speech and document/document alignments. The chronograph and other similar visualizations reveal potential relationships between sets of annotations, such as synergies or conflicts, and can bring to light new methods to improve the automatic generation of annotations.
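The click-to-seek behaviour and the shared time reference can be sketched as follows. This is a hedged illustration in Python, not the browser's actual code: the SharedClock class, the alignment table, the block identifier and the timestamps are all hypothetical, and jumping to the earliest aligned interval is an assumed policy.

```python
class SharedClock:
    """Single time reference that every view (video, transcript, documents) follows."""

    def __init__(self):
        self.time = 0.0
        self._listeners = []

    def subscribe(self, callback):
        """Register a view callback that is called with the new time on each seek."""
        self._listeners.append(callback)

    def seek(self, t):
        """Move the shared time and notify every subscribed view."""
        self.time = t
        for callback in self._listeners:
            callback(t)


# Hypothetical document/speech alignment: block id -> (start, end) intervals
# in seconds during which the block is discussed.
ALIGNMENT = {
    "article-3/block-2": [(410.0, 432.0), (125.0, 160.5)],
}


def on_block_clicked(block_id, clock, alignment):
    """Seek all views to the first moment the clicked block is discussed.

    Returns the target time, or None if the block has no alignment.
    """
    intervals = alignment.get(block_id)
    if not intervals:
        return None
    start = min(s for s, _ in intervals)
    clock.seek(start)
    return start


clock = SharedClock()
video_positions = []
clock.subscribe(video_positions.append)  # stands in for a video player view
on_block_clicked("article-3/block-2", clock, ALIGNMENT)
```

Routing every click through one clock object is one simple way to keep all components on the same time reference; a click on a chronograph slice could call seek in the same manner.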

At the time of writing, 22 meetings of roughly 15 minutes each have been integrated in our meeting browser, at both the cross-meeting and intra-meeting levels. Based on those data, a preliminary user evaluation of this document-centric browser has been performed with 8 users. The goal was to measure the usefulness of document alignments for browsing and searching through a multimedia meeting archive. Users' performance in answering questions, both unimodal and multimodal (e.g. Which articles from the New York Times have been discussed by Denis?), was measured both qualitatively and quantitatively (e.g. task duration, number of clicks, satisfaction). Users browsing meetings with the document alignments solved 76% of the questions, whereas users browsing meetings without them solved 66%. The performance difference becomes particularly significant for multimodal questions, i.e. those requiring information both from the speech transcript and from the documents discussed or projected. In this case, around 70% of the questions were solved when users benefited from the alignments, and only half were solved without them.

Further reading

  • Denis Lalanne, Rolf Ingold. "Documents statiques et multimodalité, L'alignement temporel pour structurer des archives multimédias de réunions". Document numérique, Vol. 8, N° 4/2004, "Temps et Documents", Lavoisier, 02-2005, pp. 65-89.

  • Denis Lalanne, Rolf Ingold, Didier von Rotz, Ardhendu Behera, Dalila Mekhaldi, Andrei Popescu-Belis. "Using static documents as structured and thematic interfaces to multimedia meeting archives". In Bourlard H. & Bengio S., eds. (2004), Multimodal Interaction and Related Machine Learning Algorithms, LNCS, Springer-Verlag, Berlin, pp. 87-100.

  • "The IM2 Multimodal Meeting Browser Family", IM2 technical report [PDF].

  • A preliminary user evaluation, technical report (send email to Denis Lalanne for more info).

For more information, contact Denis Lalanne (AT) unifr.ch