Call for Participation

ICDAR2013 Competition on Multi-font and Multi-size Digitally Represented Arabic Text

We are organizing the second edition of the Arabic Printed Text Recognition Competition. The first edition was held at the Eleventh International Conference on Document Analysis and Recognition (ICDAR’11) in Beijing, China [Slimane 11]. The underlying objective is to contribute to the progress of research on Arabic printed text recognition. This competition takes place at the 12th International Conference on Document Analysis and Recognition (ICDAR’13), held August 25-28, 2013, in Washington DC, United States of America, and will be organized using the freely available Arabic Printed Text Images (APTI) database presented at ICDAR’09. A description of this database is available at http://diuf.unifr.ch/diva/APTI. The APTI database is currently used by more than 27 groups around the world to develop or benchmark Arabic printed text image recognition systems.

Scientific Objectives

The scientific objective of this second edition is to measure the impact of font size on recognition performance. This will be evaluated in mono-font and multi-font contexts. The protocols are defined to evaluate the capacity of recognition systems to handle different sizes and fonts on low-resolution images, with the aim of finding a robust approach to screen-based OCR. The main difficulty probably lies in the multi-font context, as the differences between fonts are substantial for Arabic text.

Modalities of the evaluation

The evaluation will be organized using a blind procedure. Participants may train their systems using their own databases or the available sets of APTI. By a given date, participants must send their executable, which will be run on an unseen data set on our premises.
Participants will have full access to the public part of APTI as training material. The training data in APTI consists of 5 sets, as described in the ICDAR’09 paper [Slimane 09]. The test data for the evaluation is an unpublished set (APTI set 6), which is similar to the other sets in terms of image characteristics but is kept undisclosed for evaluation purposes. Participants will not have access to set 6 either before or after the ICDAR competition.
Participants using APTI are advised to follow strictly the rotation procedure described in the ICDAR’09 paper when training their systems; this makes comparisons of training algorithms easier to interpret. We encourage participants to send us the pre-evaluation recognition rates they obtain with the rotation procedure before submitting their executables.
The results of the competition will be presented in a special session at ICDAR 2013.

Evaluation Protocols

The results will be reported as word and character recognition rates (WRR and CRR). In this edition, we use the same font sizes as in the first edition (6, 8, 10, 12, 18 and 24), and all systems will be tested on a part of set 6 of APTI.
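
This call does not spell out how the two rates are computed. As a rough sketch only, the Python code below assumes that WRR is the percentage of word images whose label sequence is recognized exactly, and that CRR is derived from the Levenshtein edit distance between the reference and recognized label sequences. Both definitions, and the function names, are assumptions made for illustration and may differ from the scoring used by the organizers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of character labels."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference label deleted
                            curr[j - 1] + 1,      # extra label inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def recognition_rates(references, hypotheses):
    """Return (CRR, WRR) in percent for parallel lists of label sequences.

    Assumed definitions (not taken from the call):
      CRR = 100 * (reference labels - edit errors) / reference labels
      WRR = 100 * exactly recognized words / words
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("no reference labels to score")
    pairs = list(zip(references, hypotheses))
    char_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    correct_words = sum(1 for ref, hyp in pairs if list(ref) == list(hyp))
    crr = 100.0 * (total_chars - char_errors) / total_chars
    wrr = 100.0 * correct_words / len(references)
    return crr, wrr
```

Under these assumptions, two words of two and three labels with a single substituted label give a CRR of 80% and a WRR of 50%. Note that with this definition the CRR can become negative when a system inserts many spurious labels.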

Zero APTI Protocol for Competition: 0 APTIPC
Font: Arabic Transparent, Style: Plain

        Font Size = 6   Font Size = 8   Font Size = 10  Font Size = 12  Font Size = 18  Font Size = 24
        System 1        System 2        System 3        System 4        System 5        System 6
CRR     %               %               %               %               %               %
WRR     %               %               %               %               %               %

Recognition rates of the “Arabic Transparent” mono-size systems

This protocol is the same as the one proposed in the first edition of the competition. To participate in this protocol, participants should submit either six systems (one for each size) or a single system that takes the size as a parameter, for example: ProposedRec -s size input.txt output.txt
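
For participants who choose the single-executable option, the following Python sketch shows one possible way to parse such a command line with the standard argparse module. The help texts, the choice to restrict -s to the six competition sizes, and the function names are assumptions for illustration; the call only fixes the command-line shape shown above.

```python
import argparse

# Font sizes listed in this call for participation.
ALLOWED_SIZES = (6, 8, 10, 12, 18, 24)


def build_parser():
    """Parser for: ProposedRec [-s size] input.txt output.txt"""
    parser = argparse.ArgumentParser(prog="ProposedRec")
    parser.add_argument(
        "-s", dest="size", type=int, choices=ALLOWED_SIZES, default=None,
        help="font size of the images (omit for a multi-size system)")
    parser.add_argument("input", help="text file listing the image paths")
    parser.add_argument("output", help="file that receives the recognized labels")
    return parser


def parse_args(argv=None):
    """Parse argv (defaults to the process arguments when argv is None)."""
    return build_parser().parse_args(argv)
```

With this layout the same executable could also serve the multi-size protocols: when -s is omitted, size is None and the system would have to handle the size on its own.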

First APTI Protocol for Competition: 1st APTIPC

Font: Arabic Transparent, Style: Plain, Size: 6, 8, 10, 12, 18, 24
“Arabic Transparent” multi-size system

Size    6       8       10      12      18      24
CRR     %       %       %       %       %       %
WRR     %       %       %       %       %       %

Recognition rates of the “Arabic Transparent” multi-size system

This protocol uses the same font as protocol zero, but independently of the size. Participants should submit one multi-size system for this protocol.

Second APTI Protocol for Competition: 2nd APTIPC

Font: DecoType Naskh, Style: Plain, Size: 6, 8, 10, 12, 18, 24
“DecoType Naskh” multi-size system

Size    6       8       10      12      18      24
CRR     %       %       %       %       %       %
WRR     %       %       %       %       %       %

Recognition rates of the “DecoType Naskh” multi-size system

This protocol uses the ligatured font “DecoType Naskh” independently of the size. Participants should submit one multi-size system for this protocol.

Third APTI Protocol for Competition: 3rd APTIPC

Font: Andalus, Arabic Transparent, AdvertisingBold, Diwani Letter, DecoType Thuluth, Simplified Arabic, Tahoma, Traditional Arabic, DecoType Naskh, M Unicode Sara, Style: Plain, Size: 6, 8, 10, 12, 18, 24
Multi-font and multi-size system

Size    6       8       10      12      18      24
CRR     %       %       %       %       %       %
WRR     %       %       %       %       %       %

Recognition rates of the multi-font and multi-size system

This protocol uses all APTI fonts independently of the size. Participants in this protocol should submit one multi-font and multi-size system.

Protocol participation: participants may take part in one, two, three or all of the proposed protocols.

Recognizer Running Format

We will run each recognizer (called ProposedRec here) by invoking it from the command line as follows:
> ProposedRec input.txt output.txt
input.txt
The input file is simply a list of the paths of the PNG images to be recognized, one per line. For example:
D:\APTI-Database\Images\Set12345\Andalus_6_Bold\set1\ Image_6_Andalus_0.png
D:\APTI-Database\Images\Set12345\Andalus_6_Bold\set1\ Image_6_Andalus_1.png
D:\APTI-Database\Images\Set12345\Andalus_6_Bold\set1\ Image_6_Andalus_2.png
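
As a minimal sketch of how a recognizer might read such a list, the Python function below returns one path per non-empty line. The function name and the UTF-8 encoding of the list file are assumptions; the call does not state the encoding.

```python
def read_image_list(list_path):
    """Return the image paths listed in the file, one per non-empty line.

    Assumes the list is a UTF-8 text file (not stated in the call).
    """
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

The paths are returned in file order, so results can be written out in the same order as the input list.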

output.txt
The output file should contain the path of each recognized image followed by the labels of the characters composing the word image. Participants should use the character labels presented in [Slimane 09], which are available with the database. An example of an output file follows:

#!MLF!#
"D:\APTI-Database\Images\Set12345\Andalus_6_Bold\set1\ Image_6_Andalus_0.rec"
TildAboveAlif
Laam
Taaa
.
"D:\APTI-Database\Images\Set12345\Andalus_6_Bold\set1\ Image_6_Andalus_1.rec"
Laam
TildAboveAlif
Raa
Alif
HamzaAboveAlifBroken
Haa
Miim
.
…..
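
The example above resembles a master label file: a header line, then for each image a quoted path with the extension replaced by .rec, one character label per line, and a closing line holding a single period. A minimal Python sketch that produces this layout is given below, assuming that reading of the example is correct. The function name is hypothetical, and PureWindowsPath is used only because the example paths are Windows paths.

```python
from pathlib import PureWindowsPath


def format_mlf(results):
    """Build the output text from (image_path, labels) pairs.

    image_path is a path as listed in input.txt; labels is the sequence
    of recognized character labels for that word image.
    """
    lines = ["#!MLF!#"]
    for image_path, labels in results:
        # The example output names each entry after the image, with a .rec suffix.
        rec_path = PureWindowsPath(image_path).with_suffix(".rec")
        lines.append(f'"{rec_path}"')
        lines.extend(labels)
        lines.append(".")  # a single period closes each entry
    return "\n".join(lines) + "\n"
```

The returned string can then be written to output.txt in one call, for example with open(output_path, "w").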
Important Dates
New deadline for competition registration: March 31, 2013 (by email)
New deadline for submission of executables: March 31, 2013

Expected number of participants in the proposed contest: 10

Organizers

Fouad Slimane1,2 (main contact)
Slim Kanoun4
Haikal El Abed3
Jean Hennebert1
Rolf Ingold1
Adel M. Alimi2

1 Diva Group, University of Fribourg, Switzerland
2 REGIM Group, National Engineering School of Sfax, Tunisia
3 Institute for Communications Technology (IfN), Germany
4 National Engineering School of Sfax, Tunisia

References

[Slimane 09]: Fouad Slimane, Rolf Ingold, Slim Kanoun, Adel M. Alimi, Jean Hennebert, "A New Arabic Printed Text Image Database and Evaluation Protocols". In proc. of the 10th International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona (Spain), July 26-29, 2009, pp. 946-950.

[Slimane 11]: Fouad Slimane, Slim Kanoun, Haikal El Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert, "Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text". In proc. of the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing (China), September 18-21, 2011, pp. 1449-1453.