Fall 2015

Data Science Seminar

Lecturers: Mourad Khayati and Djellel Eddine Difallah

Teaching language: English

Level: MSc students

Academic year: Fall 2015

Overview

Structure

Evaluation and Expectations

Schedule

List of Papers


Overview

The seminar on data science consists of presentations covering recent topics in data science. In the scope of this seminar, we investigate two sets of papers. The first set covers scalable machine learning techniques, with a special focus on clustering, compression and similarity techniques for time series data and graphs. Additionally, matrix decomposition/factorization and sentiment analysis techniques will be studied.

The second set of papers will cover big-data management infrastructures. We will focus on data storage techniques tailored to specific data types, e.g., graphs, time series and arrays, in addition to generic data formats used in scalable distributed file systems such as Hadoop's HDFS. We will also consider papers on job scheduling techniques used in large data processing centers shared by thousands of data scientists.


Structure

The goal for the students is to learn how to critically read and study research papers, how to describe a paper in a report, and how to present it in a seminar. Under supervision, each student will select one paper to study and will compare and contrast it with related work. This seminar aims to help students gain in-depth knowledge of an advanced topic and develop the skills required to describe a complex problem in both a presentation and a written report.

IMPORTANT NOTE: The papers will be distributed on a first-come, first-served basis.


Evaluation and Expectations

The final grade depends on the quality of the report, the presentation, and active participation during the seminar. Each participant prepares a self-contained report of at most 10 pages and gives a 20-minute presentation. The report should describe the proposed technique(s) in detail. It may contain a small running example and should explore the extreme cases in which the proposed approach performs best and worst.

IMPORTANT NOTE: Attendance is mandatory for both seminar sessions.


Schedule

Kickoff Meeting. Date: Tue, 22.09.2015, 14:00-15:00
Setup and organization of seminar, and paper assignment

----------------------------------------------------------------------

Date: Tue, 3.11.2015
Report deadline of Batch1

Date: Tue, 10.11.2015, all day
Office meeting with students from Batch1

First Seminar Session. Date: Tue, 17.11.2015, 14:00-18:00, room: A303

Presentations of Batch1

----------------------------------------------------------------------

Date: Tue, 1.12.2015
Report deadline of Batch2

Date: Wed, 09.12.2015, all day
Office meeting with students from Batch2

Second Seminar Session. Date: Tue, 15.12.2015, 14:00-18:00, room: A303

Presentations of Batch2

Date: Tue, 12.01.2016
Final report deadline for Batch1 and Batch2


Paper Assignment

The papers will be distributed on a first-come, first-served basis. Please use the following online application to select one paper from the list of papers below: assignment app

Paper | Presentation Date | Presenter | Contact | Report Deadline
(1) Spinning Fast Iterative Data Flows | 17.11.2015 | Adrian Hänni | Mourad Khayati | 3.11.2015
(2) Crowdsourced Enumeration Queries | 17.11.2015 | Soheil Roshankish | Djellel Eddine Difallah | 3.11.2015
(3) Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing | 17.11.2015 | Jeremy Serre | Mourad Khayati | 3.11.2015
(4) Distributed Representations of Words and Phrases and their Compositionality + implementation | 17.11.2015 | Alexandre Nikodemski | Djellel Eddine Difallah | 3.11.2015
(5) Entity Linking meets Word Sense Disambiguation: a Unified Approach | 17.11.2015 | Axel Cotting | Djellel Eddine Difallah | 3.11.2015
(6) Discovering Recurring Patterns in Time Series | 15.12.2015 | Felix Meyenhofer | Mourad Khayati | 1.12.2015
(7) Rare Time Series Motif Discovery from Unbounded Streams | 15.12.2015 | Oliver Stapleton | Mourad Khayati | 1.12.2015
(8) Time series anomaly discovery with grammar-based compression | 15.12.2015 | Abir Ben Slimane | Mourad Khayati | 1.12.2015
(9) Paxos Quorum Leases: Fast Reads Without Sacrificing Writes | 15.12.2015 | Marian Briceag | Djellel Eddine Difallah | 1.12.2015
(10) Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning | 15.12.2015 | Arun Sittampalam | Djellel Eddine Difallah | 1.12.2015