Database statistics
The APTI Database consists of 113’284 different single words rendered in 10 fonts, 10 font sizes and 4 font styles. Table 1 shows the total quantity of word images, PAWs (Pieces of Arabic Words), and characters in the APTI Database.
 | Number of Words | Number of PAWs | Number of Characters |
Base count | 113’284 | 274’833 | 648’280 |
Number of Fonts | 10 | 10 | 10 |
Number of Font Sizes | 10 | 10 | 10 |
Number of Font Styles | 4 | 4 | 4 |
Total | 45’313’600 | 109’933’200 | 259’312’000 |
Table 1: Quantity of words, PAWs, and characters in the APTI Database
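The totals in Table 1 follow from multiplying each base count by the number of font, size and style combinations (10 × 10 × 4 = 400 renderings per word). The short Python sketch below reproduces that arithmetic. The variable names are chosen here for illustration only and are not part of any APTI tooling.

```python
# Base counts taken from Table 1.
base_counts = {"words": 113_284, "paws": 274_833, "characters": 648_280}

# Rendering variations taken from Table 1.
n_fonts, n_sizes, n_styles = 10, 10, 4
renderings_per_word = n_fonts * n_sizes * n_styles  # 400 combinations

# Each word (and therefore each of its PAWs and characters) is rendered
# once per font / size / style combination.
totals = {name: count * renderings_per_word for name, count in base_counts.items()}

for name, total in totals.items():
    print(f"{name}: {total:,}")
# words: 45,313,600
# paws: 109,933,200
# characters: 259,312,000
```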
Division into sets
The APTI Database is divided into six balanced sets to allow flexibility in composing development and evaluation partitions. The words in each set are different, but the distribution of letters is nearly the same across the sets (see Table 2).
Letter label | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Set 6 |
Alif | 15078 | 14925 | 15165 | 15120 | 15046 | 15019 |
Baa | 4513 | 4763 | 4692 | 4704 | 4730 | 4717 |
Taaa | 9926 | 9884 | 9897 | 9797 | 9942 | 9897 |
Thaa | 634 | 633 | 631 | 634 | 643 | 628 |
Jiim | 1893 | 1897 | 1887 | 1924 | 1915 | 1939 |
Haaa | 2953 | 2963 | 3017 | 2933 | 3000 | 3000 |
Xaa | 1407 | 1435 | 1439 | 1401 | 1403 | 1407 |
Daal | 3187 | 3033 | 3075 | 2990 | 3028 | 3086 |
Thaal | 514 | 520 | 528 | 504 | 516 | 518 |
Raa | 6304 | 6243 | 6169 | 6335 | 6253 | 6267 |
Zaay | 1064 | 1054 | 1054 | 1066 | 1042 | 1045 |
Siin | 3674 | 3556 | 3674 | 3512 | 3629 | 3603 |
Shiin | 1457 | 1446 | 1418 | 1434 | 1455 | 1458 |
Saad | 1374 | 1377 | 1388 | 1411 | 1371 | 1389 |
Daad | 922 | 943 | 936 | 906 | 921 | 920 |
Thaaa | 1419 | 1426 | 1431 | 1426 | 1446 | 1462 |
Taa | 242 | 238 | 240 | 238 | 239 | 241 |
Ayn | 2764 | 2823 | 2769 | 2718 | 2755 | 2723 |
Ghayn | 981 | 970 | 983 | 984 | 990 | 1004 |
Faa | 2305 | 2256 | 2221 | 2313 | 2339 | 2315 |
Gaaf | 2784 | 2734 | 2853 | 2883 | 2762 | 2803 |
Kaaf | 2101 | 2090 | 2099 | 2145 | 2136 | 2140 |
Laam | 6745 | 6926 | 6972 | 7002 | 6790 | 6724 |
Miim | 7871 | 7836 | 7957 | 7806 | 7797 | 7817 |
Nuun | 7484 | 7433 | 7289 | 7316 | 7400 | 7264 |
NuunChadda | 225 | 224 | 224 | 223 | 224 | 223 |
Haa | 2670 | 2687 | 2590 | 2718 | 2705 | 2724 |
Waaw | 4421 | 4313 | 4325 | 4333 | 4264 | 4352 |
Yaa | 6641 | 6630 | 6876 | 6685 | 6648 | 6735 |
YaaChadda | 725 | 727 | 709 | 719 | 735 | 733 |
Hamza | 192 | 187 | 190 | 193 | 192 | 188 |
HamzaAboveAlif | 1437 | 1483 | 1455 | 1512 | 1456 | 1427 |
TaaaClosed | 1417 | 1407 | 1394 | 1364 | 1409 | 1385 |
HamzaUnderAlif | 253 | 250 | 256 | 247 | 248 | 247 |
AlifBroken | 162 | 161 | 164 | 163 | 161 | 161 |
TildAboveAlif | 84 | 84 | 83 | 83 | 83 | 83 |
HamzaAboveAlifBroken | 210 | 208 | 208 | 209 | 208 | 210 |
HamzaAboveWaaw | 89 | 90 | 89 | 91 | 89 | 90 |
Quantity of Characters | 108’122 | 107’855 | 108’347 | 108’042 | 107’970 | 107’944 |
Quantity of PAWs | 45’982 | 45’740 | 45’792 | 45’884 | 45’630 | 45’805 |
Quantity of words | 18897 | 18892 | 18886 | 18875 | 18868 | 18866 |
Table 2: Distribution of characters in the different sets of APTI
For more details about the distribution of each character shape in the respective sets, see the paper.
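As a consistency check on Table 2, the per-set quantities can be summed and compared with the overall figures in Table 1. The sketch below also computes a simple relative spread, (max - min) / mean, as an informal indication of how balanced the six sets are. This measure and the names used are assumptions made for illustration, not a metric defined by the APTI authors.

```python
# Per-set quantities from the last three rows of Table 2 (Set 1 .. Set 6).
per_set = {
    "characters": [108_122, 107_855, 108_347, 108_042, 107_970, 107_944],
    "paws": [45_982, 45_740, 45_792, 45_884, 45_630, 45_805],
    "words": [18_897, 18_892, 18_886, 18_875, 18_868, 18_866],
}

# Overall base counts from Table 1.
table1 = {"characters": 648_280, "paws": 274_833, "words": 113_284}


def relative_spread(values):
    """Return (max - min) / mean; an informal balance measure (assumption)."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


sums = {name: sum(values) for name, values in per_set.items()}
spreads = {name: relative_spread(values) for name, values in per_set.items()}

for name in per_set:
    matches = sums[name] == table1[name]
    print(f"{name}: sum={sums[name]:,} matches Table 1: {matches}, "
          f"relative spread={spreads[name]:.2%}")
```

With the figures above, the six sets add up exactly to the Table 1 counts, and each quantity varies by less than 1% of its mean across the sets.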
Recent News
[23/01/2017] The third edition of the ICDAR2017 Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2017 using APTI Database.
[03/01/2013] The second edition of the Competition on Multi-font and Multi-size Digitally Represented Arabic Text will be organized at ICDAR'2013 using APTI Database.
[14/02/2011] The first edition of the Arabic Recognition Competition: Multi-font Multi-size Digitally Represented Text was organized at ICDAR'2011 using APTI Database.
[06/06/2009] APTI Database was officially presented at ICDAR'09.
This work is a collaboration between different research groups:
DIVA Group from University of Fribourg (Switzerland): http://diuf.unifr.ch/diva
REGIM Group from University of Sfax (Tunisia)
Software Engineering Unit from Business Information System Institute (HES-SO //Wallis - Switzerland): http://iig.hevs.ch/valais/software-engineering.html