Database statistics

The APTI Database consists of 113’284 distinct single words, each rendered in 10 fonts, 10 font sizes and 4 font styles. Table 1 shows the total number of word images, PAWs (Pieces of Arabic Words), and characters in the APTI Database.

                        Number of words   Number of PAWs   Number of characters
                        113’284           274’833          648’280
Number of fonts         10                10               10
Number of font sizes    10                10               10
Number of font styles   4                 4                4
Total                   45’313’600        109’933’200      259’312’000

Table 1: Quantity of words, PAWs, and characters in the database
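The totals in Table 1 follow from multiplying each base count by the number of font, size, and style combinations. The short Python sketch below reproduces that arithmetic; the variable names are illustrative and not part of any APTI tooling.

```python
# Base counts taken from the first row of Table 1.
base_counts = {"words": 113_284, "paws": 274_833, "characters": 648_280}

# Rendering variations listed in Table 1.
n_fonts, n_font_sizes, n_font_styles = 10, 10, 4

# Each item is rendered once per (font, size, style) combination.
renderings_per_item = n_fonts * n_font_sizes * n_font_styles

totals = {name: count * renderings_per_item for name, count in base_counts.items()}

for name, total in totals.items():
    print(f"{name}: {total:,}")
```

Each base count is multiplied by 400 (10 × 10 × 4), which gives the "Total" row of the table.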

Division into sets

The APTI Database is divided into six balanced sets to allow flexibility in composing development and evaluation partitions. The words in each set are different, but the distribution of letters is nearly the same across the sets (see Table 2).

Letter label Set 1 Set 2 Set 3 Set 4 Set 5 Set 6
Alif 15078 14925 15165 15120 15046 15019
Baa 4513 4763 4692 4704 4730 4717
Taaa 9926 9884 9897 9797 9942 9897
Thaa 634 633 631 634 643 628
Jiim 1893 1897 1887 1924 1915 1939
Haaa 2953 2963 3017 2933 3000 3000
Xaa 1407 1435 1439 1401 1403 1407
Daal 3187 3033 3075 2990 3028 3086
Thaal 514 520 528 504 516 518
Raa 6304 6243 6169 6335 6253 6267
Zaay 1064 1054 1054 1066 1042 1045
Siin 3674 3556 3674 3512 3629 3603
Shiin 1457 1446 1418 1434 1455 1458
Saad 1374 1377 1388 1411 1371 1389
Daad 922 943 936 906 921 920
Thaaa 1419 1426 1431 1426 1446 1462
Taa 242 238 240 238 239 241
Ayn 2764 2823 2769 2718 2755 2723
Ghayn 981 970 983 984 990 1004
Faa 2305 2256 2221 2313 2339 2315
Gaaf 2784 2734 2853 2883 2762 2803
Kaaf 2101 2090 2099 2145 2136 2140
Laam 6745 6926 6972 7002 6790 6724
Miim 7871 7836 7957 7806 7797 7817
Nuun 7484 7433 7289 7316 7400 7264
NuunChadda 225 224 224 223 224 223
Haa 2670 2687 2590 2718 2705 2724
Waaw 4421 4313 4325 4333 4264 4352
Yaa 6641 6630 6876 6685 6648 6735
YaaChadda 725 727 709 719 735 733
Hamza 192 187 190 193 192 188
HamzaAboveAlif 1437 1483 1455 1512 1456 1427
TaaaClosed 1417 1407 1394 1364 1409 1385
HamzaUnderAlif 253 250 256 247 248 247
AlifBroken 162 161 164 163 161 161
TildAboveAlif 84 84 83 83 83 83
HamzaAboveAlifBroken 210 208 208 209 208 210
HamzaAboveWaaw 89 90 89 91 89 90
Quantity of Characters 108’122 107’855 108’347 108’042 107’970 107’944
Quantity of PAWs 45’982 45’740 45’792 45’884 45’630 45’805
Quantity of words 18’897 18’892 18’886 18’875 18’868 18’866

Table 2: Distribution of characters in the different sets of APTI
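As a rough consistency check, the per-set quantities in Table 2 can be summed and compared with the base counts in Table 1, and the balance between sets can be gauged from each letter's share of the characters in a set. The Python sketch below does this for a small, illustrative subset of letter rows; the helper names `relative_frequencies` and `spread` are hypothetical and chosen only for this example.

```python
# Per-set quantities copied from the last three rows of Table 2.
chars_per_set = [108_122, 107_855, 108_347, 108_042, 107_970, 107_944]
paws_per_set = [45_982, 45_740, 45_792, 45_884, 45_630, 45_805]
words_per_set = [18_897, 18_892, 18_886, 18_875, 18_868, 18_866]

# A few letter rows from Table 2 (illustrative subset, not the full table).
letter_counts = {
    "Alif": [15078, 14925, 15165, 15120, 15046, 15019],
    "Laam": [6745, 6926, 6972, 7002, 6790, 6724],
    "Hamza": [192, 187, 190, 193, 192, 188],
}


def relative_frequencies(counts, totals):
    """Share of each set's characters taken up by one letter."""
    return [count / total for count, total in zip(counts, totals)]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


overall = {
    "characters": sum(chars_per_set),
    "paws": sum(paws_per_set),
    "words": sum(words_per_set),
}

spreads = {
    letter: spread(relative_frequencies(counts, chars_per_set))
    for letter, counts in letter_counts.items()
}

print(overall)
for letter, value in spreads.items():
    print(f"{letter}: spread of relative frequency = {value:.4%}")
```

The sums over the six sets match the base counts in Table 1 (113’284 words, 274’833 PAWs, 648’280 characters). A small spread for a letter means it makes up nearly the same share of the characters in every set.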

For more details about the distribution of each character shape in the respective sets, see the paper.