Call for Participation

Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text

We are organizing the first edition of the Arabic Digitally Printed Text Recognition Competition. The underlying objective is to compare recognition rates on different fonts and sizes of digitally represented Arabic text and to contribute to the advancement of research on Arabic printed text recognition. The competition takes place at the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), September 18-21, 2011, in Beijing, China, and will be organized using the new, freely available Arabic Printed Text Images (APTI) database presented at ICDAR’09. A description of this database is available at http://diuf.unifr.ch/diva/APTI. Many research groups have already started using the APTI database.

Scientific Objectives

The scientific objective of this first edition is to measure the impact of font size on recognition performance, in both mono-font and multi-font contexts. The protocols are designed to evaluate the ability of recognition systems to handle different sizes and fonts in low-resolution digital images, with the aim of finding robust approaches to screen-based OCR. The main difficulty probably lies in the multi-font context, as the differences between fonts are considerable for Arabic text.

Modalities of the evaluation

The evaluation will follow a blind procedure. Participants may train their systems using their own databases or the available APTI sets. By a given date, participants must send their executables, which will be run at our premises on an unseen data set.
The APTI training data is composed of 5 sets, as described in the ICDAR’09 paper [Slimane 09]. The test data of the evaluation is an unpublished set (set 6), kept secret for evaluation purposes.
Participants using APTI are recommended to strictly follow the rotation procedure described in the ICDAR’09 paper when training their systems, so that comparisons between training algorithms are easier to interpret. We encourage participants to send us the pre-evaluation recognition rates they obtain with the rotation procedure before submitting their executables.
The results of the competition will be presented in a special session at ICDAR 2011.

Evaluation Protocols

Results will be reported as word recognition rates and as character edit distances within words.
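The exact scoring code is not specified here. As a minimal sketch, assuming each word is a sequence of APTI character labels and that the character distance is a unit-cost Levenshtein distance averaged over words (both assumptions, as are the function names), the two measures could be computed as follows:

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Unit-cost Levenshtein distance between two label sequences (assumed metric)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                         # delete reference label r
                curr[j - 1] + 1,                     # insert hypothesis label h
                prev[j - 1] + (0 if r == h else 1),  # match or substitute
            )
        prev = curr
    return prev[-1]


def word_recognition_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Percentage of words whose full label sequence is recognized exactly."""
    if len(refs) != len(hyps):
        raise ValueError("reference and hypothesis lists must have the same length")
    if not refs:
        return 0.0
    correct = sum(1 for r, h in zip(refs, hyps) if r == h)
    return 100.0 * correct / len(refs)


def mean_edit_distance(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Average character-label edit distance per word (assumed normalization)."""
    if len(refs) != len(hyps):
        raise ValueError("reference and hypothesis lists must have the same length")
    if not refs:
        return 0.0
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / len(refs)


# Hypothetical example: the second word misses the label "Raa".
refs = [["TildAboveAlif", "Laam", "Taaa"], ["Laam", "Raa", "Alif"]]
hyps = [["TildAboveAlif", "Laam", "Taaa"], ["Laam", "Alif"]]
print(word_recognition_rate(refs, hyps))  # 50.0
print(mean_edit_distance(refs, hyps))     # 0.5
```

Whole-word exact match gives the recognition rate, while the edit distance credits partially correct words; the organizers may normalize the distance differently.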

First APTI Protocol for Competition: 1st APTIPC

Font: Arabic Transparent, Style: Plain

| Font size              | 6        | 8        | 10       | 12       | 18       | 24       |
| System                 | System 1 | System 2 | System 3 | System 4 | System 5 | System 6 |
| Reco. rate (%)         |          |          |          |          |          |          |
| Char. distance in word |          |          |          |          |          |          |

Recognition rates of systems tested with set 6

Second APTI Protocol for Competition: 2nd APTIPC
In this first competition, only the following fonts and sizes will be used: Diwani Letter, Andalus, Arabic Transparent, Simplified Arabic and Traditional Arabic, in sizes 6, 8, 10, 12, 18 and 24.

Fonts: Diwani Letter, Andalus, Arabic Transparent, Simplified Arabic, Traditional Arabic; Style: Plain

| Font size              | 6        | 8        | 10       | 12       | 18       | 24       |
| System                 | System 1 | System 2 | System 3 | System 4 | System 5 | System 6 |
| Reco. rate (%)         |          |          |          |          |          |          |
| Char. distance in word |          |          |          |          |          |          |

Recognition rates of systems tested with set 6

Systems: participants may submit separate executables for the different font/size configurations, or a single global system whose behavior can be set through parameters.
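Read together, the two protocol tables call for one system per font size in each protocol. The following sketch enumerates those configurations; the names `PROTOCOL_FONTS` and `system_configurations` are hypothetical and only restate the fonts and sizes listed in the protocols:

```python
from typing import Dict, List, Tuple

# Font sizes and fonts as listed in the two protocol tables.
SIZES = [6, 8, 10, 12, 18, 24]
PROTOCOL_FONTS: Dict[str, List[str]] = {
    "1st APTIPC": ["Arabic Transparent"],
    "2nd APTIPC": [
        "Diwani Letter",
        "Andalus",
        "Arabic Transparent",
        "Simplified Arabic",
        "Traditional Arabic",
    ],
}
STYLE = "Plain"  # both protocols use the Plain style only


def system_configurations() -> Dict[Tuple[str, int], List[str]]:
    """Map each (protocol, font size) pair to the fonts its system must recognize."""
    return {
        (protocol, size): list(fonts)
        for protocol, fonts in PROTOCOL_FONTS.items()
        for size in SIZES
    }


for (protocol, size), fonts in system_configurations().items():
    print(f"{protocol}, size {size}: {', '.join(fonts)} ({STYLE})")
```

This yields 12 (protocol, size) pairs: the first protocol is mono-font, while each system of the second protocol must cope with all five fonts at a given size.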

Recognizer Running Format

For all tests, each participant sends us 12 systems: one executable file per font size for each of the two protocols.
Each recognizer (called ProposedRec here) is invoked from the command line as follows:
> ProposedRec [parameters] input.txt output.txt
Example:
> ProposedRec -f font -s size input.txt output.txt
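A participant executable could accept this calling convention with a standard argument parser. The sketch below assumes that `-f` takes a font name and `-s` an integer point size; the exact meaning of `[parameters]` is left to each participant, and `build_parser` is a hypothetical name:

```python
import argparse

APTI_SIZES = [6, 8, 10, 12, 18, 24]  # sizes used in this competition


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser for a hypothetical ProposedRec executable."""
    parser = argparse.ArgumentParser(
        prog="ProposedRec",
        description="Recognize APTI word images listed in an input file.",
    )
    # Optional parameters (assumed meaning: font name and point size).
    parser.add_argument("-f", dest="font", help="font name, e.g. Andalus")
    parser.add_argument("-s", dest="size", type=int, choices=APTI_SIZES,
                        help="font size in points")
    # Positional arguments, in the order required by the organizers.
    parser.add_argument("input", help="text file with one PNG image path per line")
    parser.add_argument("output", help="file where recognition results are written")
    return parser


args = build_parser().parse_args(["-f", "Andalus", "-s", "6", "input.txt", "output.txt"])
print(args.font, args.size, args.input, args.output)  # Andalus 6 input.txt output.txt
```

Font names containing spaces, such as Arabic Transparent, must be quoted on the command line so that they reach `-f` as a single argument.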
input.txt
The input file is simply a list of paths to the PNG images to be recognized, one path per line. For example:
D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_0.png
D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_1.png
D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_2.png

output.txt
For each input image, the output file should contain the path of the recognized image in double quotes, followed by the character labels composing the word image (one label per line) and a line containing a single period that closes the entry. Participants should use the character labels presented in [Slimane 09], which are available with the database. An example of an output file is shown below:

"D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_0.png"
TildAboveAlif
Laam
Taaa
.
"D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_1.png"
Laam
TildAboveAlif
Raa
Alif
HamzaAboveAlifBroken
Haa
Miim
.
…..
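The input and output formats described above can be handled with a few string helpers. This is a sketch under stated assumptions: blank lines are ignored, files are read and written by the caller, and the helper names are hypothetical; the recognition step itself is not shown.

```python
from typing import Dict, List, Optional


def parse_input_list(text: str) -> List[str]:
    """Return the image paths of input.txt, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def format_output(results: Dict[str, List[str]]) -> str:
    """Serialize results: quoted image path, one label per line, then a '.' line."""
    lines: List[str] = []
    for image_path, labels in results.items():
        lines.append(f'"{image_path}"')
        lines.extend(labels)
        lines.append(".")
    return "".join(line + "\n" for line in lines)


def parse_output(text: str) -> Dict[str, List[str]]:
    """Parse an output file back into {image path: [character labels]}."""
    results: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if len(line) >= 2 and line.startswith('"') and line.endswith('"'):
            current = line[1:-1]            # start of a new entry
            results[current] = []
        elif line == ".":
            current = None                  # end of the current entry
        elif current is not None:
            results[current].append(line)   # one character label
    return results


# Hypothetical example based on the sample entries above.
results = {
    r"D:\APTI-Database\Images\Andalus_6_Plain\set6\Image_6_Andalus_0.png":
        ["TildAboveAlif", "Laam", "Taaa"],
}
text = format_output(results)
print(text, end="")
```

A wrapper would typically read input.txt (assumed UTF-8), run its recognizer on each path from `parse_input_list`, and write `format_output(results)` to output.txt; `parse_output` shows how such a file can be read back, for example to check it before submission.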
Important Dates

Deadline for competition registration (by email): April 30, 2011 (originally March 31, 2011)
Deadline for submission of executables: April 30, 2011
Expected number of participants: 10

Organizers

Fouad Slimane 1,2 (main contact)
Slim Kanoun 5
Haikal El Abed 4
Jean Hennebert 1,3
Rolf Ingold 1
Adel M. Alimi 2

1 Diva Group, University of Fribourg, Switzerland
2 REGIM Group, National Engineering School of Sfax, Tunisia
3 HES-SO // Wallis, University of Applied Sciences Western Switzerland, Switzerland
4 Institute for Communications Technology (IfN), Germany
5 National Engineering School of Sfax, Tunisia

References

[Slimane 09]: Fouad Slimane, Rolf Ingold, Slim Kanoun, Adel M. Alimi, Jean Hennebert, "A New Arabic Printed Text Image Database and Evaluation Protocols," in Proc. of the 10th IEEE International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona, Spain, July 26-29, 2009, pp. 946-950.