Call for Participation
Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text
We are organizing the first edition of the Arabic Digitally Printed Text Recognition Competition. The underlying objective is to compare recognition rates across different fonts and sizes of digitally represented Arabic text and to contribute to the progress of research on Arabic printed text recognition. The competition takes place at the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), September 18-21, 2011, in Beijing, China, and will be organized using the new, freely available Arabic Printed Text Images (APTI) database presented at ICDAR’09. A description of this database is available at http://diuf.unifr.ch/diva/APTI. Many research groups have already started using the APTI database.
Scientific Objectives
The scientific objective of this first edition is to measure the impact of font size on recognition performance. This will be evaluated in mono-font and multi-font contexts. The protocols are defined to evaluate the capacity of recognition systems to handle different sizes and fonts using low-resolution digital images, with the aim of finding a robust approach to screen-based OCR. The main difficulty is probably the multi-font context, as the differences between fonts are substantial for Arabic text.
Modalities of the evaluation
The evaluation will be organized using a blind procedure. Participants may train their systems using their own databases or the available sets of APTI. At a given date, participants have to send their executables, which will be run on an unseen data set on our premises.
Participants can use APTI as training material. The training data in APTI is composed of 5 sets, as described in the ICDAR’09 paper [Slimane 09]. The test data for the evaluation consists of an unpublished set (set6), which is kept secret for evaluation purposes.
Participants using APTI are advised to strictly follow the rotation procedure described in the ICDAR’09 paper when training their systems. This makes comparisons of training algorithms easier to interpret. We encourage participants to send us the pre-evaluation recognition rates they obtain with the rotation procedure before submitting their executables.
The results of the competition will be presented in a special session at ICDAR 2011.
Evaluation Protocols
The evaluation will be reported as word recognition rates and also as an edit distance over the characters in words.
First APTI Protocol for Competition: 1st APTIPC
Font: Arabic Transparent, Style: Plain
Font Size = 6 | Font Size = 8 | Font Size = 10 | Font Size = 12 | Font Size = 18 | Font Size = 24
System 1 | System 2 | System 3 | System 4 | System 5 | System 6
Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate %
Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word
Recognition rates of systems tested with set 6
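The two measures reported in these tables can be sketched as follows. This is only an illustration, not the organizers' scoring tool: the function names `edit_distance` and `evaluate` are hypothetical, a word is assumed to count as recognized only when its whole label sequence matches, and averaging the character distance per word is an assumption, since the call does not say how distances are aggregated.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of character labels."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference label
                            curr[j - 1] + 1,      # insert a hypothesis label
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def evaluate(refs: List[Sequence[str]],
             hyps: List[Sequence[str]]) -> Tuple[float, float]:
    """Return (word recognition rate in %, mean character edit distance per word).

    Assumption: the mean per word is one plausible aggregation; the
    competition may aggregate distances differently.
    """
    if not refs or len(refs) != len(hyps):
        raise ValueError("refs and hyps must be non-empty and of equal length")
    correct = sum(1 for r, h in zip(refs, hyps) if list(r) == list(h))
    total = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs), total / len(refs)
```

Each word is represented as its list of character labels, so a dropped or substituted label costs one unit regardless of how long the label name is.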
Second APTI Protocol for Competition: 2nd APTIPC
In this first competition, we will use only the following fonts: Diwani letter, Andalus, Arabic Transparent, Simplified Arabic and Traditional Arabic, in sizes 6, 8, 10, 12, 18 and 24.
Font: Diwani letter, Andalus, Arabic Transparent, Simplified Arabic, Traditional Arabic, Style: Plain
Font Size = 6 | Font Size = 8 | Font Size = 10 | Font Size = 12 | Font Size = 18 | Font Size = 24
System 1 | System 2 | System 3 | System 4 | System 5 | System 6
Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate % | Reco. Rate %
Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word | Dist. charac. in word
Recognition rates of systems tested with set 6
Systems: participants can submit different executable systems for the different fonts and sizes, or one global system that accepts different parameters.
Recognizer Running Format
For all tests, participants in this competition send us 12 systems (one executable file for each size). We run each recognizer (called ProposedRec here) by invoking it from the command line as follows:
> ProposedRec [parameters] input.txt output.txt
> Example: ProposedRec -f font -s size input.txt output.txt
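A recognizer written in Python could accept this command line roughly as sketched below. The `-f` and `-s` options simply mirror the example above; the call leaves the parameters up to each participant, so their names, types and defaults here are assumptions, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse: ProposedRec [parameters] input.txt output.txt"""
    parser = argparse.ArgumentParser(prog="ProposedRec")
    # Optional parameters, modelled on the "-f font -s size" example (assumed).
    parser.add_argument("-f", dest="font", default=None,
                        help="font name, e.g. Andalus")
    parser.add_argument("-s", dest="size", type=int, default=None,
                        help="font size, e.g. 6")
    # The two positional arguments required by the running format.
    parser.add_argument("input_list", help="text file listing the image paths")
    parser.add_argument("output_file", help="text file to write the results to")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.font, args.size, args.input_list, args.output_file)
```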
input.txt
The input file is simply a list of paths to the PNG images to be recognized, one path per line. For example:
D:\APTI-Database\Images\Andalus_6_Plain\set6\ Image_6_Andalus_0.png
D:\APTI-Database\Images\Andalus_6_Plain\set6\ Image_6_Andalus_1.png
D:\APTI-Database\Images\Andalus_6_Plain\set6\ Image_6_Andalus_2.png
…
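Reading such a list is straightforward; the sketch below is one way to do it. The helper names are hypothetical, the UTF-8 encoding is an assumption (the call does not specify one), and each path is kept verbatim apart from its line ending so that spaces inside a path are preserved.

```python
from typing import Iterable, List


def parse_image_list(lines: Iterable[str]) -> List[str]:
    """Return one image path per non-blank line, without the line ending."""
    return [line.rstrip("\r\n") for line in lines if line.strip()]


def read_image_list(path: str, encoding: str = "utf-8") -> List[str]:
    """Read the input list file; the encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        return parse_image_list(handle)
```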
output.txt
The output file should contain the path of each recognized image followed by the character labels composing the word image. Participants should use the character labels presented in [Slimane 09], which are available with the database. An example of an output file follows:
"D:\APTI-Database\Images\Andalus_6_Plain\set6\ Image_6_Andalus_0.png"
TildAboveAlif
Laam
Taaa
.
"D:\APTI-Database\Images\Andalus_6_Plain\set6\ Image_6_Andalus_1.png"
Laam
TildAboveAlif
Raa
Alif
HamzaAboveAlifBroken
Haa
Miim
.
…..
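The layout of the example above can be produced and read back with a few lines of code. This is a sketch under assumptions inferred from the example only: the path is written in double quotes on its own line, each label occupies one line, and a line containing a single "." closes each word. The helper names `format_result` and `parse_results` are hypothetical.

```python
from typing import List, Sequence, Tuple


def format_result(image_path: str, labels: Sequence[str]) -> str:
    """Format one recognized word: quoted path, one label per line, then '.'."""
    lines = [f'"{image_path}"', *labels, "."]
    return "\n".join(lines) + "\n"


def parse_results(text: str) -> List[Tuple[str, List[str]]]:
    """Parse output text back into (image path, labels) pairs."""
    results = []
    path = None
    labels: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if path is None:
            path = line.strip('"')  # first line of a record is the quoted path
            labels = []
        elif line == ".":
            results.append((path, labels))  # '.' closes the current record
            path = None
        else:
            labels.append(line)
    return results
```

A recognizer would call `format_result` once per image and write the concatenated strings to output.txt.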
Important Dates
Deadline for competition registration: March 31, April 30, 2011 (by email)
Deadline for submission of executables: April 30, 2011
Expected number of participants in the proposed contest: 10
Organizers
Fouad Slimane (1,2) (main contact)
Slim Kanoun (5)
Haikal El Abed (4)
Jean Hennebert (1,3)
Rolf Ingold (1)
Adel M. Alimi (2)
1 Diva Group, University of Fribourg, Switzerland
2 REGIM Group, National Engineering School of Sfax, Tunisia
3 HES-SO // Wallis, University of Applied Sciences Western Switzerland, Switzerland
4 Institute for Communications Technology (IfN), Germany
5 National Engineering School of Sfax, Tunisia
References
[Slimane 09]: Fouad Slimane, Rolf Ingold, Slim Kanoun, Adel M. Alimi, Jean Hennebert, "A New Arabic Printed Text Image Database and Evaluation Protocols." In Proc. of the 10th IEEE International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona, Spain, July 26-29, 2009, pp. 946-950.
Recent News
[23/01/2017] The third edition of the ICDAR2017 Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2017 using APTI Database.
[03/01/2013] The second edition of the Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2013 using APTI Database.
[14/02/2011] The first edition of the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text was organized at ICDAR'2011 using APTI Database.
[06/06/2009] APTI Database was officially presented at ICDAR'09.
This work is a collaboration between different research groups:
http://diuf.unifr.ch/diva
DIVA Group from University of Fribourg (Switzerland)
REGIM Group from University of Sfax (Tunisia)
http://iig.hevs.ch/valais/software-engineering.html
Software Engineering Unit from Business Information System Institute (HES-SO //Wallis - Switzerland)