With the emergence of large multimedia archives, the expansion of infotainment, and the democratisation of multimedia acquisition devices, multimedia search engines are becoming crucial for managing both personal data and large multimedia archives. Moreover, media (images, videos, sounds, documents, blogs, etc.) are often related, so cross-media mining methods are fundamental not only to enrich multimedia indexing methods but also to extend current querying and search strategies.
The major goal of this seminar is to give a state-of-the-art overview of current research projects and to bring out the major challenges related to the creation of multimedia search engines, both personal and universal. Leading players in the field will present their approaches and projects, alongside researchers from various domains tackling related themes such as mono-modal data analysis (sound, speech, image, video, document, etc.), cross-media analysis, multimedia search strategies, multimodal querying methods, and multimedia information visualization.
Interested participants are invited to register here.
- Prof. Dr. Andreas Rauber, Vienna University of Technology, Austria
- Dr. Marcel Worring, University of Amsterdam, The Netherlands
- Dr. Nozha Boujemaa, INRIA Rocquencourt, France
- Dr. Georges Quénot, CLIPS-IMAG, Grenoble, France
- Prof. Fabio Crestani, University of Lugano, Switzerland
- Dr. Robert van Kommer, Swisscom Innovation, Switzerland
- Dr. Alessandro Vinciarelli, IDIAP, Switzerland
- Dr. Thomas Hofmann, Google, Switzerland
[ September 15th, 2007 ] - Registration deadline
[ October 8th and 9th, 2007 ] - Dates of the school
Content of Tutorials
Semantic Indexing & Retrieval of Video (Dr. Marcel Worring, University of Amsterdam)
The semantic gap between the low-level information that can be derived from visual data and the conceptual view the user has of the same data is a major bottleneck in video retrieval systems. It has dictated that solutions to image and video indexing could only be applied in narrow domains using specific concept detectors, e.g., "sunset" or "face". This leads to lexica of at most 10-20 concepts. The use of multimodal indexing, advances in machine learning, and the availability of large, annotated information sources, e.g., the TRECVID benchmark, have paved the way to increasing lexicon size by orders of magnitude (100 concepts now, 1,000 within a few years). This brings it within reach of research in ontology engineering, i.e., the creation and maintenance of large (typically 10,000+) structured sets of shared concepts. Once this goal is reached, we will be able to search for videos in our home collections or on the web based on their semantic content, develop semantic video editing tools, or build tools that monitor various video sources and trigger alerts based on semantic events. This tutorial lays the foundation for these exciting new horizons. It will cover: different methods for semantic video indexing; semantic retrieval; interactive access to the data; evaluation of indexing and interactive access in TRECVID; and the challenges ahead and how to meet them.
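The idea of a narrow-domain concept detector mentioned above can be illustrated with a toy sketch: a shot or keyframe is reduced to a low-level visual feature vector, and a classifier decides whether a concept such as "sunset" applies. The feature (a crude 3-bin colour histogram), the training data, and the nearest-centroid classifier are all hypothetical simplifications, not the methods used in the tutorial.

```python
# Toy concept detector: nearest-centroid classification of keyframes
# described by a 3-bin colour histogram (fractions of red/green/blue).
# All data below is made up for illustration.

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train(labelled):
    """labelled: list of (feature_vector, label) pairs.
    Returns a model mapping each label to its class centroid."""
    by_label = {}
    for vec, label in labelled:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def detect(model, vec):
    """Assign the concept label whose centroid is nearest to vec."""
    return min(model, key=lambda label: distance(model[label], vec))

# Hypothetical training shots: sunsets are red-dominated, others are not.
training = [
    ([0.70, 0.20, 0.10], "sunset"),
    ([0.80, 0.10, 0.10], "sunset"),
    ([0.20, 0.50, 0.30], "other"),
    ([0.10, 0.30, 0.60], "other"),
]
model = train(training)
print(detect(model, [0.75, 0.15, 0.10]))  # a red-heavy keyframe -> "sunset"
```

Scaling such detectors from a handful of concepts to hundreds is precisely where the machine-learning advances and annotated benchmarks discussed in the tutorial come in.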
Marcel Worring received the MSc degree (honors) in computer science from the Vrije Universiteit, Amsterdam, The Netherlands, in 1988, and the PhD degree in computer science from the Universiteit van Amsterdam in 1993. He is currently an associate professor at the University of Amsterdam. His interests are in multimedia search and systems.
He has published over 100 scientific papers and serves on the program committee of several international conferences. He is the chair of the IAPR TC12 on Multimedia and Visual Information Systems. He is general chair of the 2007 ACM International Conference on Image and Video Retrieval in Amsterdam.
Video processing for indexing and retrieval (Dr. Georges Quénot, Laboratoire d'Informatique de Grenoble - CNRS)
Indexing the content of video documents requires many successive steps to go from the raw binary stream up to its semantic interpretation. This course focuses on the first stages of content analysis, which correspond to the extraction of low- to intermediate-level information. These include: low-level visual feature extraction; shot boundary detection, key frame selection, and shot or keyframe clustering; camera motion indexing and mobile object segmentation and tracking; story segmentation and video structuring or summarization; and speaker identification and emotion indexing. The evaluation of video indexing and retrieval systems will also be addressed and illustrated in the context of the TREC/TRECVID campaigns.
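Among the steps listed above, shot boundary detection is perhaps the simplest to sketch: hard cuts can be found by comparing colour histograms of consecutive frames and flagging abrupt changes. The histograms and the threshold below are made-up illustrative values, not the course's actual method or parameters.

```python
# Minimal sketch of shot boundary (hard cut) detection by histogram
# differencing. Frame histograms and the threshold are hypothetical.

def hist_diff(h1, h2):
    """L1 distance between two normalised frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def shot_boundaries(histograms, threshold=0.5):
    """Indices where consecutive frame histograms differ sharply,
    i.e. likely hard cuts between shots."""
    return [i for i in range(1, len(histograms))
            if hist_diff(histograms[i - 1], histograms[i]) > threshold]

# Four frames: frames 0-1 form one shot, then an abrupt cut to frames 2-3.
frames = [
    [0.90, 0.10, 0.00],
    [0.85, 0.15, 0.00],
    [0.10, 0.10, 0.80],   # abrupt content change -> shot boundary here
    [0.15, 0.10, 0.75],
]
print(shot_boundaries(frames))  # [2]
```

Gradual transitions (fades, dissolves) spread the histogram change over many frames and need more elaborate detectors, which is part of what makes the TRECVID shot boundary task non-trivial.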
Georges Quénot is a researcher at CNRS (French National Centre for Scientific Research). He holds an engineering degree from the École Polytechnique (1983) and a PhD in computer science (1988) from the University of Orsay. He is currently with the Multimedia Information Indexing and Retrieval group (MRIM) of the Laboratoire d'Informatique de Grenoble (LIG), where he is responsible for the activities on video indexing and retrieval. His current research activity concerns semantic indexing of image and video documents using supervised learning, networks of classifiers, and multimodal fusion. He has participated since 2001 in the NIST TRECVID evaluations on shot segmentation, story segmentation, and concept indexing.
Music Information Retrieval (Prof. Andreas Rauber, Vienna University of Technology)
In this course we will take a closer look at the various areas, tasks, and methods that together form the field of music information retrieval (MIR). We will start by considering the various types of data relevant to MIR activities, ranging from symbolic and acoustic music data, through textual data, to image and video data. This will be followed by a brief overview of the many tasks and challenges in MIR, to provide a thorough understanding of the problem domain and its interdisciplinary nature.
The core part of the course will then address a number of selected topics. Specifically, we will focus on various techniques for feature extraction from music, and their use for tasks such as retrieval, genre classification, and chord detection. We will also analyze and discuss the benefits of combining different modalities, such as textual and acoustic information, as well as the use of web information for these tasks. Last but not least, we will take a closer look at a few applications, such as PlaySOM and PocketSOM, which assist users in organizing their music collections and creating playlists on desktop computers as well as mobile phones.
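To give a flavour of acoustic feature extraction, here is a sketch of one of the most elementary audio features: the zero-crossing rate, which tends to be higher for high-frequency or percussive content than for low-frequency tonal content. The synthetic sine-wave signals are illustrative only; the course's actual feature sets are considerably richer.

```python
import math

# Zero-crossing rate (ZCR): the fraction of consecutive sample pairs
# whose signs differ. Signals below are synthetic sine tones.

def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

rate = 8000  # samples per second (hypothetical sampling rate)
low_tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(rate)]    # 100 Hz
high_tone = [math.sin(2 * math.pi * 2000 * t / rate) for t in range(rate)]  # 2 kHz

# The 2 kHz tone crosses zero far more often per second than the 100 Hz tone.
print(zero_crossing_rate(low_tone) < zero_crossing_rate(high_tone))  # True
```

In practice such frame-level features (ZCR, spectral descriptors, rhythm patterns, etc.) are aggregated over time and fed to classifiers for tasks like the genre classification mentioned above.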
Andreas Rauber is Associate Professor at the Department of Software Technology and Interactive Systems (ifs) at the Vienna University of Technology (TU-Wien). He is also head of the iSpaces research group at the eCommerce Competence Center (ec3).
He received his MSc and PhD in Computer Science from the Vienna
University of Technology in 1997 and 2000, respectively. In 2001 he
joined the National Research Council of Italy (CNR) in Pisa as an ERCIM
Research Fellow, followed by an ERCIM Research position at the French
National Institute for Research in Computer Science and Control (INRIA),
at Rocquencourt, France, in 2002.
In 1998 he received the ÖGAI Award of the Austrian Society for
Artificial Intelligence (ÖGAI), and the Cor-Baayen Award of the European
Research Consortium for Informatics and Mathematics (ERCIM) in 2002.
His research interests cover the broad scope of digital libraries, specifically including text and music information retrieval and organization, information visualization, and data analysis.
This workshop is part of the 3ème cycle romand in Informatics.
For more information, please refer to the conference website at:
http://diuf.unifr.ch/3e-cycle/ or http://www.cuso.ch/3e-cycle/bienvenue.html