Call for Participation
ICDAR2017 Competition on Multi-font and Multi-size Digitally Represented Arabic Text
We propose to organize the third edition of the Arabic printed text recognition competition. The underlying objective is to contribute to the evolution of Arabic printed text recognition research. Following the competitions at ICDAR 2011 ([Slimane 11]) and ICDAR 2013 ([Slimane 13]), the third competition takes place at the 14th International Conference on Document Analysis and Recognition (ICDAR 2017), November 10-15, 2017, Kyoto, Japan. It will be organized using the freely available Arabic Printed Text Images (APTI) Database presented at ICDAR 2009. A description of this database is published at http://diuf.unifr.ch/diva/APTI. The APTI database is currently used by more than 100 groups all over the world to develop or benchmark Arabic printed text image recognition systems.
Scientific Objectives
The scientific objectives of this third edition are to measure the capacity of recognition systems to identify the font and the font-size from a single Arabic word, and to measure the impact of font and font-size on text recognition performance. This will be evaluated in multi-font and multi-size contexts. To our knowledge, no previous competition has addressed font and font-size identification. The protocols are defined to evaluate the capacity of recognition systems to handle different sizes and fonts using low-resolution images, with the aim of finding a robust approach to screen-based OCR. The main difficulty probably lies in the multi-font and multi-size context, as differences between fonts are substantial for Arabic text.
Modalities of the evaluation
The evaluation will be organized using a blind procedure. Participants are allowed to train their systems using their own database or the available sets of APTI. By a given date, participants have to send their executable, which will be run on an unseen data set on our premises.
Participants can use APTI as training material. The training data in APTI is composed of 5 sets, as described in the ICDAR 2009 paper [Slimane 09]. The testing data of the evaluation is composed of an unpublished set (set 6), which is kept secret for evaluation purposes.
Participants using APTI are advised to follow strictly the rotation procedure described in the ICDAR 2009 paper when training their systems. Doing so makes comparisons between training algorithms easier to interpret. We encourage participants to send us the pre-evaluation recognition rates they obtain with the rotation procedure before submitting their executables.
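As a rough illustration of what a rotation over the five public sets could look like, here is a minimal sketch. It assumes that "rotation" means holding out each public set once while training on the other four; the authoritative definition is the one in [Slimane 09], and the set identifiers and function name below are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumption: the five public APTI sets are identified simply as 1..5.
PUBLIC_SETS = (1, 2, 3, 4, 5)


def rotation_folds(sets: Sequence[int] = PUBLIC_SETS) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_sets, held_out_set) pairs, holding out each set once."""
    for held_out in sets:
        training = [s for s in sets if s != held_out]
        yield training, held_out


folds = list(rotation_folds())
for training, held_out in folds:
    print(f"train on sets {training}, evaluate on set {held_out}")
```

Averaging the rates obtained on the held-out sets would give one pre-evaluation figure per system; set 6 never appears here because it is not published.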
The results of the competition will be presented in a special session at ICDAR 2017.
Evaluation Protocols
The evaluation will be reported as font, font-size, word and character recognition rates (FRR, FSRR, WRR and CRR). In this edition, we use the same font sizes (6, 8, 10, 12, 18 and 24) as in the first edition, and all systems will be tested using set 6 of APTI.
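The exact scoring formulas are not spelled out in this call. The sketch below shows one common way such rates are computed, under two assumptions: WRR counts a word as correct only when its whole label sequence matches the reference, and CRR is derived from the edit distance between recognized and reference label sequences. The function names are hypothetical; the character labels are taken from the text recognition output example in this call.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_recognition_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Percentage of words whose label sequence is entirely correct."""
    correct = sum(1 for r, h in zip(refs, hyps) if r == h)
    return 100.0 * correct / len(refs)


def character_recognition_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Percentage of reference labels left after subtracting edit errors (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * (total - errors) / total


refs = [["TildAboveAlif", "Laam", "Taaa"], ["Laam", "Raa"]]
hyps = [["TildAboveAlif", "Laam", "Taaa"], ["Laam", "Haa"]]
wrr = word_recognition_rate(refs, hyps)
crr = character_recognition_rate(refs, hyps)
print(f"WRR = {wrr:.1f} %, CRR = {crr:.1f} %")  # WRR = 50.0 %, CRR = 80.0 %
```

FRR and FSRR can be computed like WRR, with a single font or font-size label per image instead of a label sequence.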
First APTI Protocol for Competition: 1st APTIPC
Font: Andalus, Arabic Transparent, AdvertisingBold, Diwani Letter, DecoType Thuluth, Simplified Arabic, Tahoma, Traditional Arabic, DecoType Naskh, M Unicode Sara; Style: Plain; Size: All
font-size (FS) recognition system
Size | 6 | 8 | 10 | 12 | 18 | 24
FSRR | % | % | % | % | % | %
Recognition rates of the font-size (FS) recognition system
This protocol aims to identify the font-size from Arabic words independently of the font. Participants in this protocol should submit one font-size recognition system for all fonts.
Second APTI Protocol for Competition: 2nd APTIPC
Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J); Style: Plain; Size: All
font (F) recognition system
Font | A | B | C | D | E | F | G | H | I | J
FRR | % | % | % | % | % | % | % | % | % | %
Recognition rates of the font (F) recognition system
This protocol aims to identify the font from Arabic words independently of the font-size. Participants in this protocol should submit one font recognition system for all font-sizes.
Third APTI Protocol for Competition: 3rd APTIPC
Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J); Style: Plain; Size: All
font and font-size recognition system
Size/Font | A | B | C | D | E | F | G | H | I | J
6 | % | % | % | % | % | % | % | % | % | %
8 | % | % | % | % | % | % | % | % | % | %
10 | % | % | % | % | % | % | % | % | % | %
12 | % | % | % | % | % | % | % | % | % | %
18 | % | % | % | % | % | % | % | % | % | %
24 | % | % | % | % | % | % | % | % | % | %
Recognition rates of the font and font-size recognition system
This protocol aims to identify the font and the font-size at the same time from Arabic words. Participants in this protocol should submit one font and font-size recognition system.
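To make the per-cell reporting of this protocol concrete, the following minimal sketch groups results by (font, font-size) and computes one rate per cell. It assumes combined labels of the form Font_size, as in the output example of this call (for instance AdvertisingBold_ten), and it assumes a prediction counts as correct only when both parts match. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split a combined label such as 'AdvertisingBold_ten' into (font, size)."""
    font, _, size = label.rpartition("_")
    return font, size


def rates_per_cell(references: List[str], predictions: List[str]) -> Dict[Tuple[str, str], float]:
    """Recognition rate in percent for every (font, size) cell present in the references."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    for ref, pred in zip(references, predictions):
        cell = split_label(ref)
        totals[cell] += 1
        if pred == ref:  # both font and font-size must be right
            correct[cell] += 1
    return {cell: 100.0 * correct[cell] / totals[cell] for cell in totals}


references = ["AdvertisingBold_ten", "AdvertisingBold_ten", "DecoTypeThuluth_twelve"]
predictions = ["AdvertisingBold_ten", "AdvertisingBold_twelve", "DecoTypeThuluth_twelve"]
rates = rates_per_cell(references, predictions)
for (font, size), rate in sorted(rates.items()):
    print(f"{font} / {size}: {rate:.1f} %")
```

In this toy run the second image has the right font but the wrong size, so its cell scores 50 % while the other cell scores 100 %.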
Fourth APTI Protocol for Competition: 4th APTIPC
Font: Andalus (A), Arabic Transparent (B), AdvertisingBold (C), Diwani Letter (D), DecoType Thuluth (E), Simplified Arabic (F), Tahoma (G), Traditional Arabic (H), DecoType Naskh (I), M Unicode Sara (J); Style: Plain; Size: All
multi-font and multi-size text recognition system
Size/Font | A | B | C | D | E | F | G | H | I | J
6 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
8 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
10 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
12 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
18 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
24 | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR | CRR WRR
Recognition rates of the multi-font and multi-size text recognition system
This protocol uses all APTI fonts, independently of the size. Participants in this protocol should submit one multi-font and multi-size text recognition system.
Protocol participation: participants can take part in one, two, three or all of the proposed protocols.
Recognizer Running Format
We run a recognizer (called ProposedRec) by invoking it from the command line as follows:
input.txt
The input file is simply a list of paths to the PNG images to be recognized, one per line. For example:
D:\data\competition\Image_0.png
D:\data\competition\Image_1.png
D:\data\competition\Image_2.png
…
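A recognizer has to read this list before processing any image. The sketch below parses such a list, assuming one Windows-style path per line and ignoring blank lines; the function name is hypothetical, and the sample text simply repeats the three paths shown above.

```python
from pathlib import PureWindowsPath
from typing import List


def read_image_list(text: str) -> List[PureWindowsPath]:
    """Return one path per non-empty line of the input list."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            paths.append(PureWindowsPath(line))
    return paths


sample = (
    "D:\\data\\competition\\Image_0.png\n"
    "D:\\data\\competition\\Image_1.png\n"
    "D:\\data\\competition\\Image_2.png\n"
)
image_paths = read_image_list(sample)
for path in image_paths:
    print(path.name)
```

PureWindowsPath is used so that the backslash-separated paths are parsed the same way on any operating system; a real recognizer would read the text from the input file instead of a string.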
output.txt
The output file should contain the path of each recognized image followed by the result for the protocol concerned: the font (font recognition protocol), the font-size (font-size recognition protocol), the font and font-size (font and font-size recognition protocol), or the character labels composing the word image (text recognition protocol). Participants should use the font, font-size and character labels presented in [Slimane 09] and available with the database. Example output files are shown below:
- For font recognition
#!MLF!#
"D:/data/competition/Image_0.rec"
Andalus
.
"D:/data/competition/Image_1.rec"
ArabicTransparent
.
.....
- For font-size recognition
#!MLF!#
"D:/data/competition/Image_0.rec"
ten
.
"D:/data/competition/Image_1.rec"
twelve
.
.....
- For font and font-size recognition
#!MLF!#
"D:/data/competition/Image_0.rec"
AdvertisingBold_ten
.
"D:/data/competition/Image_1.rec"
DecoTypeThuluth_twelve
.
.....
- For text recognition
#!MLF!#
"D:/data/competition/Image_0.rec"
TildAboveAlif
Laam
Taaa
.
"D:/data/competition/Image_1.rec"
Laam
TildAboveAlif
Raa
Alif
HamzaAboveAlifBroken
Haa
Miim
.
…..
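The examples above share one layout: a #!MLF!# header, then for each image a quoted path ending in .rec, one label per line, and a closing line containing a single period. The following minimal sketch produces that layout. It assumes the .rec path is obtained by swapping the extension of the input image path and writing it with forward slashes, as the examples suggest; the function names are hypothetical.

```python
from pathlib import PureWindowsPath
from typing import List, Sequence, Tuple


def mlf_entry(image_path: str, labels: Sequence[str]) -> str:
    """Format one image: quoted .rec path, one label per line, closing period."""
    rec_path = PureWindowsPath(image_path).with_suffix(".rec").as_posix()
    return "\n".join([f'"{rec_path}"', *labels, "."])


def build_mlf(results: List[Tuple[str, Sequence[str]]]) -> str:
    """Assemble the whole output file from (image_path, labels) pairs."""
    parts = ["#!MLF!#"] + [mlf_entry(path, labels) for path, labels in results]
    return "\n".join(parts) + "\n"


results = [
    ("D:\\data\\competition\\Image_0.png", ["Andalus"]),
    ("D:\\data\\competition\\Image_1.png", ["ArabicTransparent"]),
]
mlf_text = build_mlf(results)
print(mlf_text, end="")
```

The same two functions cover all four protocols: a font, a font-size or a combined label is passed as a one-element list, and a recognized word is passed as its full list of character labels.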
Important Dates
Competitions open to participants: January 23, 2017
Deadline for submission of executables: June 15, 2017
Expected number of participants in the proposed contest: 10
Organizers
Fouad Slimane (1, 3) (main contact)
Jean Hennebert (2)
Rolf Ingold (3)
(1) MEDIA research lab, Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland
(2) iCoSys Institute, College of Engineering and Architecture of Fribourg, Switzerland
(3) DIVA Group, University of Fribourg, Switzerland
References
[Slimane 09]: Fouad Slimane, Rolf Ingold, Slim Kanoun, Adel M. Alimi, Jean Hennebert, "A New Arabic Printed Text Image Database and Evaluation Protocols." In proc. of 10th IEEE International Conference on Document Analysis and Recognition (ICDAR 2009), Barcelona (Spain), July 26 - 29 2009, pp. 946-950.
[Slimane 11]: Fouad Slimane, Slim Kanoun, Haikel El-Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert, "Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text." In proc. of the 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing (China), September 18-21, 2011, pp. 1449-1453.
[Slimane 13]: Fouad Slimane, Slim Kanoun, Haikel El-Abed, Adel M. Alimi, Rolf Ingold, Jean Hennebert, "ICDAR2013 Competition on Multi-font and Multi-size Digitally Represented Arabic Text." In proc. of the 12th International Conference on Document Analysis and Recognition (ICDAR 2013), Washington DC (USA), August 25-28, 2013, pp. 1433-1437.
Recent News
[23/01/2017] The third edition of the ICDAR2017 Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2017 using APTI Database.
[03/01/2013] The second edition of the Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2013 using APTI Database.
[14/02/2011] The first edition of the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text was organized at ICDAR'2011 using APTI Database.
[06/06/2009] APTI Database was officially presented at ICDAR'09.
This work is a joint collaboration between different research groups:
http://diuf.unifr.ch/diva
DIVA Group from University of Fribourg (Switzerland)
REGIM Group from University of Sfax (Tunisia)
http://iig.hevs.ch/valais/software-engineering.html
Software Engineering Unit from Business Information System Institute (HES-SO //Wallis - Switzerland)