Call for Participation

ICDAR2017 Competition on Multi-font and Multi-size Digitally Represented Arabic Text

We propose to organize the third edition of the Arabic Printed Recognition Competition. The underlying objective is to contribute to the evolution of Arabic printed text recognition research. Following the competitions at ICDAR 2011 ([Slimane 11]) and ICDAR 2013 ([Slimane 13]), the third competition will take place at the 14th International Conference on Document Analysis and Recognition (ICDAR 2017), November 10-15, 2017, in Kyoto, Japan. It will be organized using the freely available Arabic Printed Text Images (APTI) database presented at ICDAR 2009. A description of this database is available at http://diuf.unifr.ch/diva/APTI. The APTI database is currently used by more than 100 groups all over the world to develop or benchmark Arabic printed text image recognition systems.

Scientific Objectives

The scientific objectives of this third edition are to measure the capacity of recognition systems to identify the font and the font-size from a single Arabic word, and to measure the impact of font and font-size on text recognition performance. This will be evaluated in multi-font and multi-size contexts. To our knowledge, no competition has previously been organized for font and font-size identification. The protocols will be defined to evaluate the capacity of recognition systems to handle different sizes and fonts using low-resolution images, with the aim of finding a robust approach to screen-based OCR. The main difficulty probably lies in the multi-font and multi-size context, as differences between fonts are rather important for Arabic text.

Modalities of the Evaluation

The evaluation will be organized using a blind procedure. Participants are allowed to train their systems using their own database or the available sets of APTI. At a given date, participants have to send their executable, which will be run on an unseen data set on our premises.
Participants can use APTI as training material. The training data in APTI consists of 5 sets, as described in the ICDAR 2009 paper [Slimane 09]. The testing data of the evaluation consists of an unpublished set (set 6), which is kept secret for evaluation purposes.
Participants using APTI are advised to strictly follow the rotation procedure described in the ICDAR 2009 paper when training their system. Doing so makes comparisons of training algorithms easier to interpret. We encourage participants to send us the pre-evaluation recognition rates they obtain with the rotation procedure before submitting their executables.
The results of the competition will be presented in a special session at ICDAR 2017.

Evaluation Protocols

The evaluation will be reported as font, font-size, word and character recognition rates (FRR, FSRR, WRR and CRR). This edition uses the same font sizes as the first edition (6, 8, 10, 12, 18 and 24), and all systems will be tested on set 6 of APTI.
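The four rates named above can be illustrated with a minimal Python sketch. This call does not spell out the scoring formulas, so the definitions below are assumptions: FRR, FSRR and WRR are taken as the percentage of exactly matching labels, and CRR is taken as an edit-distance-based accuracy over character labels. All function names are hypothetical and are not part of any organizer tooling.

```python
from typing import List, Sequence


def label_accuracy(truth: Sequence, predicted: Sequence) -> float:
    """Percentage of items whose predicted label equals the reference label.

    Assumed to apply to FRR and FSRR (one font or font-size label per image).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        return 0.0
    correct = sum(t == p for t, p in zip(truth, predicted))
    return 100.0 * correct / len(truth)


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of character labels."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_recognition_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """CRR under an ASSUMED definition: 100 * (N - errors) / N over all words."""
    total = sum(len(r) for r in refs)
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * (total - errors) / total if total else 0.0


def word_recognition_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """WRR: percentage of words whose whole label sequence is correct."""
    return label_accuracy([tuple(r) for r in refs], [tuple(h) for h in hyps])
```

For instance, with two reference words of three and two character labels and a single substitution in the second word, this sketch reports a CRR of 80% and a WRR of 50%.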

First APTI Protocol for Competition: 1st APTIPC

Font: Andalus, Arabic Transparent, AdvertisingBold, Diwani Letter, DecoType Thuluth, Simplified Arabic, Tahoma, Traditional Arabic, DecoType Naskh, M Unicode Sara, Style: Plain, Size: All

font-size (FS) recognition system
Size    6    8    10   12   18   24
FSRR    %    %    %    %    %    %

Recognition rates of the font-size (FS) recognition system
This protocol aims to identify the font-size from Arabic words, independently of the font. Participants in this protocol should submit one font-size recognition system for all fonts.

Second APTI Protocol for Competition: 2nd APTIPC

Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J), Style: Plain, Size: All

font (F) recognition system
Font    A    B    C    D    E    F    G    H    I    J
FRR     %    %    %    %    %    %    %    %    %    %

Recognition rates of the font (F) recognition system

This protocol aims to identify the font from Arabic words, independently of the font-size. Participants in this protocol should submit one font recognition system for all font-sizes.

Third APTI Protocol for Competition: 3rd APTIPC

Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J), Style: Plain, Size: All

font and font-size recognition system
Size/Font   A    B    C    D    E    F    G    H    I    J
6           %    %    %    %    %    %    %    %    %    %
8           %    %    %    %    %    %    %    %    %    %
10          %    %    %    %    %    %    %    %    %    %
12          %    %    %    %    %    %    %    %    %    %
18          %    %    %    %    %    %    %    %    %    %
24          %    %    %    %    %    %    %    %    %    %

Recognition rates of the font and font-size recognition system
This protocol aims to identify the font and the font-size at the same time from Arabic words. Participants in this protocol should submit one font and font-size recognition system.
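As an illustration of how a per-size, per-font grid of rates could be filled in, the following Python sketch tallies results from combined labels of the form `AdvertisingBold_ten`, the form used in this call's output example for font and font-size recognition. The helper names are hypothetical, and scoring by exact match of the combined label is an assumption, not the organizers' stated method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split a combined label such as 'AdvertisingBold_ten' into (font, size).

    The split is made at the LAST underscore, on the assumption that the
    size label itself never contains one.
    """
    font, _, size = label.rpartition("_")
    return font, size


def per_cell_rates(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Recognition rate (%) for each (font, size) cell of the grid.

    `pairs` yields (reference label, predicted label). A prediction counts
    as correct only when both font and font-size match the reference.
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    for reference, predicted in pairs:
        cell = split_label(reference)
        totals[cell] += 1
        correct[cell] += int(reference == predicted)
    return {cell: 100.0 * correct[cell] / totals[cell] for cell in totals}
```

Each key of the returned dictionary corresponds to one cell of the size/font grid for this protocol.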

Fourth APTI Protocol for Competition: 4th APTIPC

Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J), Style: Plain, Size: All

multi-font and multi-size text recognition system
Size/Font   A         B         C         D         E         F         G         H         I         J
6           CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR
8           CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR
10          CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR
12          CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR
18          CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR
24          CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR   CRR WRR

Recognition rates of the multi-font and multi-size text recognition system
This protocol uses all APTI fonts, independently of the size. Participants in this protocol should submit one multi-font and multi-size text recognition system.

Protocol participation: participants can take part in one, two, three or all of the proposed protocols.

Recognizer Running Format

We run a recognizer (called ProposedRec) by invoking it from the command line as follows:

> ProposedRec input.txt output.txt

input.txt
The input file is a plain list of paths to the PNG images to be recognized, one per line. For example:
D:\data\competition\Image_0.png
D:\data\competition\Image_1.png
D:\data\competition\Image_2.png
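
A recognizer entry point has to accept the two file names and read the image list before doing any recognition. The Python sketch below shows one way to do that. The function names are hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions about robustness, not requirements stated in this call.

```python
from typing import List, Sequence, Tuple


def parse_args(argv: Sequence[str]) -> Tuple[str, str]:
    """Return (input list path, output file path) from a command line.

    Expects the form: ProposedRec input.txt output.txt
    """
    if len(argv) != 3:
        raise SystemExit("usage: ProposedRec input.txt output.txt")
    return argv[1], argv[2]


def read_image_list(text: str) -> List[str]:
    """Return the image paths listed in the input file, one per line.

    Blank lines are skipped and surrounding whitespace is trimmed
    (an assumption; the call only shows one path per line).
    """
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A typical use would read the whole input file into a string, pass it to `read_image_list`, and then recognize the images in the order listed.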

output.txt
The output file should contain, for each recognized image, its path followed by the font (for the font recognition protocol), the font-size (for the font-size recognition protocol), the font and font-size (for the font and font-size recognition protocol) or the character labels composing the word image (for the text recognition protocol). Participants should use the font, font-size and character labels presented in [Slimane 09], which are available with the database. Examples of output files are given below:

  1. For font recognition

#!MLF!#
"D:/data/competition/Image_0.rec"
Andalus
.
"D:/data/competition/Image_1.rec"
ArabicTransparent
.
.....

  2. For font-size recognition

#!MLF!#
"D:/data/competition/Image_0.rec"
ten
.
"D:/data/competition/Image_1.rec"
twelve
.
.....

  3. For font and font-size recognition

#!MLF!#
"D:/data/competition/Image_0.rec"
AdvertisingBold_ten
.
"D:/data/competition/Image_1.rec"
DecoTypeThuluth_twelve
.
.....

  4. For text recognition

#!MLF!#
"D:/data/competition/Image_0.rec"
TildAboveAlif
Laam
Taaa
.
"D:/data/competition/Image_1.rec"
Laam
TildAboveAlif
Raa
Alif
HamzaAboveAlifBroken
Haa
Miim
.
.....

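The output examples above share one layout: a `#!MLF!#` header, then for each image a quoted path, one label per line, and a line holding a single period. The Python sketch below writes that layout. It is inferred from the examples only: the function names are hypothetical, and the conversion of the input path to forward slashes with a `.rec` extension is an assumption copied from the examples rather than a stated rule. The `.....` lines in the examples are read as an ellipsis for further entries and are not emitted.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List, Tuple


def rec_label_path(image_path: str) -> str:
    """Map an input image path to the path form seen in the output examples.

    Example: 'D:\\data\\competition\\Image_0.png' becomes
    'D:/data/competition/Image_0.rec' (assumed convention).
    """
    return PureWindowsPath(image_path).with_suffix(".rec").as_posix()


def format_mlf(results: Iterable[Tuple[str, List[str]]]) -> str:
    """Render (image path, labels) pairs in the layout of the examples.

    For the font, font-size and combined protocols the label list holds a
    single label; for text recognition it holds one label per character.
    """
    lines = ["#!MLF!#"]
    for image_path, labels in results:
        lines.append(f'"{rec_label_path(image_path)}"')
        lines.extend(labels)
        lines.append(".")  # a lone period closes each entry
    return "\n".join(lines) + "\n"
```

Writing the returned string to `output.txt` would complete a run for any of the four protocols.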
Important Dates
Competition opens to participants: January 23, 2017
Deadline for submission of executables: June 15, 2017

Expected number of participants in the proposed contest: 10

Organizers

Fouad Slimane (1, 3), main contact
Jean Hennebert (2)
Rolf Ingold (3)

(1) MEDIA research lab, Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland
(2) iCoSys Institute, College of Engineering and Architecture of Fribourg, Switzerland
(3) Diva Group, University of Fribourg, Switzerland

References

[Slimane 09]: Fouad Slimane, Rolf Ingold, Slim Kanoun, Adel M. Alimi, Jean Hennebert, "A New Arabic Printed Text Image Database and Evaluation Protocols". In proc. of the 10th IEEE International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona (Spain), July 26-29, 2009, pp. 946-950.

[Slimane 11]: Fouad Slimane, Slim Kanoun, Haikel El-Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert, "Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text". In proc. of the Eleventh International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing (China), September 18-21, 2011, pp. 1449-1453.

[Slimane 13]: Fouad Slimane, Slim Kanoun, Haikel El-Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert, "ICDAR2013 Competition on Multi-font and Multi-size Digitally Represented Arabic Text". In proc. of the Twelfth International Conference on Document Analysis and Recognition (ICDAR 2013), Washington DC (USA), August 25-28, 2013, pp. 1433-1437.