Multimedia Search Engines

School of the 3eme cycle romand in Informatics

October 8th and 9th, 2007


With the emergence of multimedia archives, the expansion of infotainment and the democratisation of multimedia acquisition devices, multimedia search engines are becoming crucial for managing both personal data and large multimedia archives. Further, media (images, videos, sounds, documents, blogs, etc.) are often related, and thus cross-media mining methods are fundamental not only to enrich multimedia indexing methods but also to extend current querying and search strategies. The major goal of this seminar is to present the state of the art in current research projects and to bring out the major challenges related to the creation of multimedia search engines, both personal and universal. The main players of the domain will present their approaches and projects, as will researchers from various domains tackling related themes such as mono-modal data analysis (sound, speech, image, video, document, etc.), cross-media analysis, multimedia search strategies, multimodal querying methods, and multimedia information visualization.

Content of Tutorials

Semantic Indexing & Retrieval of Video (Dr. Marcel Worring, University of Amsterdam)

The semantic gap between the low-level information that can be derived from the visual data and the conceptual view the user has of the same data is a major bottleneck in video retrieval systems. It has dictated that solutions to image and video indexing could only be applied in narrow domains using specific concept detectors, e.g., “sunset” or “face”. This leads to lexica of at most 10-20 concepts. The use of multimodal indexing, advances in machine learning, and the availability of some large, annotated information sources, e.g., the TRECVID benchmark, have paved the way to increase lexicon size by orders of magnitude (now 100 concepts, in a few years 1,000). This brings it within reach of research in ontology engineering, i.e., creating and maintaining large structured sets of shared concepts, typically 10,000 or more. Once this goal is reached, we could search for videos in our home collection or on the web based on their semantic content, develop semantic video editing tools, or develop tools that monitor various video sources and trigger alerts based on semantic events. This tutorial lays the foundation for these exciting new horizons. It will cover: different methods for semantic video indexing; semantic retrieval; interactive access to the data; evaluation of indexing and interactive access in TRECVID; the challenges ahead and how to meet them.

Marcel Worring received the MSc degree (honors) and PhD degree, both in computer science, from the Vrije Universiteit, Amsterdam, The Netherlands, in 1988 and the Universiteit van Amsterdam in 1993, respectively. He is currently an associate professor at the University of Amsterdam. His interests are in multimedia search and systems. He has published over 100 scientific papers and serves on the program committee of several international conferences. He is the chair of the IAPR TC12 on Multimedia and Visual Information Systems. He is general chair of the 2007 ACM International Conference on Image and Video Retrieval in Amsterdam.

Video processing for indexing and retrieval (Dr. Georges Quénot, Laboratoire d'Informatique de Grenoble - CNRS)

Indexing the contents of video documents requires many consecutive steps to go from the raw binary content up to its semantic interpretation. In this course, we will focus on the first stages of content analysis, which correspond to the extraction of low- to intermediate-level information. This includes: low-level visual feature extraction; shot boundary detection, keyframe selection and shot or keyframe clustering; camera motion indexing and mobile object segmentation and tracking; story segmentation and video structuring or summarization; speaker identification and emotion indexing. The question of the evaluation of video indexing and retrieval systems will also be addressed and illustrated in the context of the TREC/TRECVID campaigns.

Georges Quénot is a researcher at CNRS (French National Centre for Scientific Research). He holds an engineering diploma from the French Polytechnic School (1983) and a PhD in computer science (1988) from the University of Orsay. He is currently with the Multimedia Information Indexing and Retrieval group (MRIM) of the Laboratoire d'Informatique de Grenoble (LIG), where he is responsible for the activities on video indexing and retrieval. His current research focuses on semantic indexing of image and video documents using supervised learning, networks of classifiers and multimodal fusion. He has participated since 2001 in the NIST TRECVID evaluations on shot segmentation, story segmentation, concept indexing and search tasks.

Music Information Retrieval (Prof. Andreas Rauber, Vienna University of Technology)

In this course we will take a closer look at the various areas, tasks, and methods that together form the field of music information retrieval (MIR). We will start by considering the various types of data that are relevant for MIR activities, ranging from symbolic and acoustic music data, through textual data, to image and video data. This will be followed by a brief overview of the large number of tasks and challenges in MIR, to provide a thorough understanding of the problem domain and its interdisciplinary nature. The core part of the course will then address a number of selected topics. Specifically, we will focus on various techniques for feature extraction from music, and their use for tasks such as retrieval, genre classification, chord detection, and others. We will also analyze and discuss the benefits of combining different modalities, such as textual and acoustic information, as well as the use of web information for these tasks. Last, but not least, we will take a closer look at a few applications, such as the PlaySOM and PocketSOM, that assist users in organizing their music collections and creating playlists on desktop computers as well as mobile phones.

Andreas Rauber is Associate Professor at the Department of Software Technology and Interactive Systems (ifs) at the Vienna University of Technology (TU-Wien). He is also head of the iSpaces research group at the eCommerce Competence Center (ec3). He received his MSc and PhD in Computer Science from the Vienna University of Technology in 1997 and 2000, respectively. In 2001 he joined the National Research Council of Italy (CNR) in Pisa as an ERCIM Research Fellow, followed by an ERCIM Research position at the French National Institute for Research in Computer Science and Control (INRIA), at Rocquencourt, France, in 2002. In 1998 he received the ÖGAI Award of the Austrian Society for Artificial Intelligence (ÖGAI), and in 2002 the Cor-Baayen Award of the European Research Consortium for Informatics and Mathematics (ERCIM). His research interests cover the broad scope of digital libraries, including specifically text and music information retrieval and organization, information visualization, as well as data analysis and neural computation.

This workshop is part of the 3eme cycle romand in Informatics
