Encoding the Qur'an

Many, myself included, have tried to encode the Qur'an using Unicode. Each of us has struggled in our own way to work around the problems we encountered and to decide what to include and what to omit, and many have achieved a decent rendering.

The problems are twofold.

First, Unicode is not complete. It does not contain all of the diacritics and additional characters needed for a full and accurate rendering.

Second, the way each glyph (character) is rendered depends only on how the previous one was drawn. This means that many ligatures (multiple characters joined together), particularly lam-alif combinations, cannot be drawn correctly.

In short, what is required is a markup approach that contains all of the characters, large and small, from which the best current Unicode rendering can be extracted, together with a rules-based approach for drawing the full text.

Existing Unicode versions of the Qur'an

When I first started attempting a 'clean' Unicode version, I could find only three sources. I do not know where any of them originated, but I think of them as the "Sacred Texts" version, the "One Ummah" version, and the "Hadith.net" version.

Since then, a few more versions have come to light. Here are the ones I found, and whether I have used them:

| Name | URL or source | Same as | Used? | Notes |
|---|---|---|---|---|
| Sacred Texts | http://www.sacred-texts.com | | Yes | |
| One Ummah | http://www.oneummah.net | | Yes | |
| Open Source | http://www.pakistanopensource.org/... | Sacred Texts | No | |
| Hadith.net | http://www.hadith.net | | Yes | |
| Quran.com | http://quran.com/article49.html | | Yes | |
| Mystic Letters | http://www.mysticletters.com | | Yes | |
| Previous | http://www.ivencia.com/quran | | Yes | Created from the Sacred Texts, Hadith.net and One Ummah versions, then edited by hand |
| Qur'an Karim | http://www.lib.umich.edu/... | One Ummah | No | |
| Translations | http://www.usc.edu/ | | Yes | Yusuf Ali, Shakir and Pickthall translations |
| Khalifa | WinQT2 | | Yes | I added the last two verses of Surah 9 while comparing |
| DivineIslam | Quran Viewer | | Yes | |

Differences in style

Encoding

The Khalifa and Hadith.net versions use the Windows-1256 Arabic encoding, e.g. بسم الله الرحمن الرحيم

Divine Islam, Quran.com, Mystic Letters, One Ummah and Previous all use seven-character HTML numeric character references (NCRs), e.g. &#1576;&#1616;&#1587;&#1618;&#1605;&#1616; (bismi)

Sacred Texts uses Unicode Arabic characters directly, e.g. بِسْمِ
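Before any comparison, the three encodings have to be brought into one form. A minimal Python sketch, assuming each version has already been read from disk (file handling is left out, and the function names are illustrative, not taken from any of the sources):

```python
import html


def from_windows_1256(raw: bytes) -> str:
    """Decode text stored in the Windows-1256 Arabic code page (Khalifa, Hadith.net)."""
    return raw.decode("cp1256")


def from_ncr(text: str) -> str:
    """Expand HTML numeric character references such as &#1576; into characters.

    html.unescape follows the HTML5 rules, so references missing their
    trailing semicolon (e.g. &#1614) are expanded as well.
    """
    return html.unescape(text)


# Sacred Texts is already Unicode, so it needs no conversion.
bismi = from_ncr("&#1576;&#1616;&#1587;&#1618;&#1605;&#1616;")
print([ord(c) for c in bismi])  # [1576, 1616, 1587, 1618, 1605, 1616]
```

Once every version is an ordinary Unicode string, the remaining differences are choices of characters rather than of encoding.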

Alif hints

The Sacred Texts and One Ummah versions contain alif hints. The Sacred Texts version encodes them as \u065F (character 1631), which appears on my computer as a rectangle, indicating an unused character. The One Ummah version uses a full alif, character 1575 (\u0627).
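To compare these two versions, it may help to put their alif hints on an equal footing. The sketch below is my own assumption rather than anything either source does: it maps the Sacred Texts hint (U+065F) to the full alif One Ummah uses. The mapping is lossy, so it belongs on a working copy only.

```python
SACRED_ALIF_HINT = "\u065F"  # character 1631, as used by Sacred Texts
FULL_ALIF = "\u0627"         # character 1575, as used by One Ummah

# One-way mapping for comparison only: the hint/alif distinction is lost.
_HINT_TO_ALIF = str.maketrans({SACRED_ALIF_HINT: FULL_ALIF})


def unify_alif_hints(text: str) -> str:
    """Rewrite Sacred Texts alif hints as full alifs, One Ummah style."""
    return text.translate(_HINT_TO_ALIF)
```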

Diacritics

Khalifa contains virtually no diacritics (kasra, damma, fatha, sukun, etc.).
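Because Khalifa is essentially unvocalised, the other versions can only be compared with it after their diacritics are removed too. A minimal sketch, assuming that every character in Unicode category Mn (non-spacing mark) counts as a diacritic for this purpose:

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove non-spacing marks (category Mn): fatha, kasra, damma, sukun,
    shadda, tanwin, and small marks such as the alif hint U+065F."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


print(strip_marks("\u0628\u0650\u0633\u0652\u0645\u0650") == "\u0628\u0633\u0645")  # True
```

Stripping marks alone will not make Khalifa match: the letter-count table shows it has almost no alif hamza or alif hamza below, so hamza-bearing alifs would also need folding to a bare alif.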

Lam alif ligatures

When a lam is immediately followed by an alif, a lam-alif ligature is drawn. If there is a diacritic between the lam and the alif, the lam-alif is not drawn. The first lam-alif in the Qur'an is in Surah 1, ayah 7. According to the printed text, it should be lam.fatha.alif.

In Sacred Texts and Previous, it is lam.fatha.alif. In Divine Islam, it is lam.alif.fatha. In the others it is lam.alif. Both lam.alif and lam.alif.fatha produce the lam-alif glyph (وَلاَ); the versions where it is lam.fatha.alif do not (وَلَا).
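The orderings can be told apart mechanically. This hypothetical helper reports each lam followed by a bare alif (U+0627), together with any harakat, tanwin, shadda or sukun (U+064B to U+0652) between or after them, in the same dotted notation used above:

```python
import re

MARKS = "[\u064B-\u0652]*"  # fathatan .. sukun
LAM_ALIF = re.compile("\u0644(" + MARKS + ")\u0627(" + MARKS + ")")
MARK_NAMES = {"\u064E": "fatha", "\u064F": "damma", "\u0650": "kasra",
              "\u0651": "shadda", "\u0652": "sukun"}


def lam_alif_orders(text: str) -> list[str]:
    """Describe every lam + alif sequence, e.g. 'lam.fatha.alif' or 'lam.alif.fatha'."""
    def names(marks: str) -> list[str]:
        return [MARK_NAMES.get(m, f"U+{ord(m):04X}") for m in marks]

    return [".".join(["lam", *names(m.group(1)), "alif", *names(m.group(2))])
            for m in LAM_ALIF.finditer(text)]


print(lam_alif_orders("\u0648\u064E\u0644\u064E\u0627"))  # ['lam.fatha.alif']
print(lam_alif_orders("\u0648\u064E\u0644\u0627\u064E"))  # ['lam.alif.fatha']
```

Running it over Surah 1, ayah 7 of each version would show which convention each one follows.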

Comparing the versions

There are seven versions (readings) of the Qur'an which, although almost identical, differ slightly in pronunciation. For more, see http://www.islamic-awareness.org/Quran/Text/Qiraat/hafs.html#1. I have used the common printed edition from the Complex for the printing of the Holy Qur'an which, as far as I am aware, is the Hafs version.

I have taken two steps to begin comparing the electronic versions.

Step 1. Look for obvious errors and correct them manually. So far I have done this in the following cases, most of which involve a short passage of a few words being repeated. The corrections to Previous are not particularly relevant, as I shall only be using the surahs I edited by hand (Surahs 1, part of 2, 42, 50, 68 and 100 to 114 inclusive), but I have included them for completeness.

Manual correction 1.

Previous and Khalifa both have an extra piece of text in Surah 6, line 81. Compare the following:

Sacred T: وَكَيْفَ أَخَافُ مَآ أَشْرَكْتُمْ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُم بِٱللَّهِ مَا لَمْ يُنَزِّلْ بِهِۦ عَلَيْكُمْ سُلْطَٟنًۭاۚ فَأَىُّ ٱلْفَرِيقَيْنِ أَحَقُّ بِٱلْأَمْنِۖ إِن كُنتُمْ تَعْلَمُونَ

Previous:وَكَيْفَ أَخَافُ مَا أَشْرَكْتُمْ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُمْ بِاللَّهِ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُمْ بِاللَّهِ مَا لَمْ يُنَزِّلْ بِهِ عَلَيْكُمْ سُلْطَانًا فَأَيُّ الْفَرِيقَيْنِ أَحَقُّ بِالْأَمْنِ إِنْ كُنتُمْ تَعْلَمُونَ

The repeated run of words, وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُمْ بِاللَّهِ, appears only once in my printed Qur'an. The duplicate has been excised from Previous and Khalifa.
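Repeats like this can be found automatically by looking for a run of words that is immediately followed by the same run. A minimal sketch, assuming words are separated by whitespace, which is not consistent between versions:

```python
def immediate_repeats(words: list[str], min_len: int = 2) -> list[tuple[int, int]]:
    """Return (start, length) for each run words[i:i+k] that equals words[i+k:i+2k].

    The longest run starting at each position is tried first, and scanning resumes
    after the repeated copy, so one duplicated passage is reported once.
    """
    hits = []
    i = 0
    while i < len(words):
        for k in range((len(words) - i) // 2, min_len - 1, -1):
            if words[i:i + k] == words[i + k:i + 2 * k]:
                hits.append((i, k))
                i += 2 * k
                break
        else:  # no repeat starts at i
            i += 1
    return hits
```

Splitting the Previous text of Surah 6, line 81 on spaces and passing it in should flag the doubled run. The text does contain genuine repetitions, so every hit still needs checking against the printed copy.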

Manual Correction 2.

Surah 13 in Previous was missing ayah 11 onwards. I have added the remaining ayat from the One Ummah version.

Manual Correction 3.

Khalifa has (infamously) removed the last two lines from Surah 9. I have replaced them with the Hadith.net version, but may return to this subject later.

Manual Correction 4.

Surah 5, line 6 has another repeat in Previous. The following text shows only the start of the relevant line. The numbers 7 and 8 are counts of a single letter; the mismatch between them is what identified the repeat.

Sacred: 7يَٟٓأَيُّهَا ٱلَّذِينَ ءَامَنُوٓا۟ إِذَا قُمْتُمْ إِلَى ٱلصَّلَوٰةِ فَٱغْسِلُوا۟ وُجُوهَكُمْ وَأَيْدِيَكُمْ إِلَى ٱلْمَرَافِقِ وَٱمْسَحُوا

Previous: 8 يَاأَيُّهَا الَّذِينَ آمَنُوا إِذَا قُمْتُمْ إِلَى الصَّلَاةِ فَاغْسِلُوا وُجُوهَكُمْ وَأَيْدِيَكُمْ فَاغْسِلُوا وُجُوهَكُمْ وَأَيْدِيَكُمْ إِلَى الْمَرَافِقِ وَامْسَحُوا

Manual Correction 5.

Surah 4, line 102 has another repeat in Previous. The following text shows the relevant passage.

Previous: 4 وَإِذَا كُنتَ فِيهِمْ فَأَقَمْتَ لَهُمْ الصَّلَوةَ فَلْتَقُمْ طَائِفَةٌ مِنْهُمْ مَعَكَ وَلْيَأْخُذُوا أَسْلِحَتَهُمْ فَإِذَا سَجَدُوا فَلْيَكُونُوا مِنْ وَرَائِكُمْ وَلْتَأْتِ طَائِفَةٌوَلْتَأْتِ طَائِفَةٌ أُخْرَى لَمْ

Khalifa: 3 واذا كنت فيهم فاقمت لهم الصلوة فلتقم طائفة منهم معك ولياخذوا اسلحتهم فاذا سجدوا فليكونوا من ورائكم ولتات طائفة اخرى لم

Manual Correction 6.

Surah 24 has the first word of ayah 8 at the end of ayah 7 in the Mystic Letters and One Ummah versions.

That some of these manual corrections apply to more than one version raises the question: how many differences are there between the versions?

Version differences

The last manual correction, moving a word from the end of one line to the beginning of the next in both the Mystic Letters and One Ummah versions, indicates a common source. A comparison of those two versions finds only 17 differences, most but not all of which are a choice between hamza and alif.

In all other cases, comparison reveals many hundreds, if not thousands, of differences. Most are attributable to alif hints, the choice of alif form, or variations in diacritics. The next stage is to take one letter or character at a time and look for variations in its count in each ayah or word. Ideally I would compare word by word, but the versions do not separate words consistently, which makes this difficult.
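A character-level diff shows what kind of difference each mismatch is. A sketch using Python's difflib, assuming each version has been loaded into a dictionary keyed by (surah, ayah) with the same keys in every version (the data layout and function names are illustrative):

```python
import difflib


def differing_ayat(a: dict, b: dict) -> list:
    """Keys (surah, ayah) whose text differs between two versions."""
    return sorted(key for key in a if a[key] != b.get(key))


def char_edits(x: str, y: str) -> list[tuple[str, str, str]]:
    """Character-level edits that turn x into y, as (operation, old, new)."""
    matcher = difflib.SequenceMatcher(None, x, y, autojunk=False)
    return [(op, x[i1:i2], y[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]


# A hamza-versus-alif choice shows up as a single one-character 'replace' edit.
print(char_edits("\u0623\u0645", "\u0627\u0645"))
```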

Letter counts

The following table shows the letter counts for most of the alphabet and diacritics across all surahs, after the corrections noted above. The bismillah is counted only at the start of Surah 1.

| Letter | Char | Code point (decimal) | Mystic | Hadith | Ummah | Quran | Sacred | Previous | Khalifa | Divine | All agree |
|---|---|---|---|---|---|---|---|---|---|---|---|
| space | | 32 | 71579 | 68565 | 71565 | 71564 | 75330 | 71201 | 71196 | 71567 | |
| hamza | ء | 1569 | 1515 | 1527 | 1525 | 1526 | 3060 | 3357 | 3551 | 1578 | |
| alif madda | آ | 1570 | 1730 | 1735 | 1730 | 1506 | 391 | 190 | 18 | 1511 | |
| alif hamza | أ | 1571 | 9120 | 9109 | 9120 | 9121 | 8900 | 6373 | 1 | 9117 | |
| wa hamza | ؤ | 1572 | 784 | 794 | 784 | 757 | 705 | 702 | 708 | 675 | |
| alif hamza below | إ | 1573 | 5086 | 5085 | 5086 | 5086 | 5088 | 3599 | 3 | 5084 | |
| ya hamza | ئ | 1574 | 1161 | 1152 | 1151 | 1177 | 851 | 936 | 915 | 1206 | |
| alif | ا | 1575 | 43322 | 43247 | 43324 | 43569 | 24793 | 43116 | 52641 | 43558 | |
| ba | ب | 1576 | 11491 | 11491 | 11491 | 11491 | 11491 | 11491 | 11490 | 11491 | |
| ta marbuta | ة | 1577 | 2363 | 2362 | 2363 | 2367 | 2344 | 2348 | 2346 | 2396 | |
| ta | ت | 1578 | 10501 | 10502 | 10501 | 10497 | 10520 | 10517 | 10519 | 10468 | |
| tha | ث | 1579 | 1414 | 1414 | 1414 | 1414 | 1414 | 1414 | 1414 | 1414 | Y |
| jim | ج | 1580 | 3317 | 3317 | 3317 | 3317 | 3317 | 3317 | 3317 | 3317 | Y |
| ha | ح | 1581 | 4140 | 4140 | 4140 | 4140 | 4140 | 4140 | 4140 | 4140 | Y |
| kha | خ | 1582 | 2497 | 2497 | 2497 | 2497 | 2497 | 2497 | 2497 | 2497 | Y |
| dal | د | 1583 | 5991 | 5991 | 5991 | 5991 | 5991 | 5991 | 5991 | 5991 | Y |
| zal | ذ | 1584 | 4932 | 4932 | 4932 | 4932 | 4932 | 4932 | 4932 | 4932 | Y |
| ra | ر | 1585 | 12403 | 12403 | 12403 | 12403 | 12403 | 12403 | 12403 | 12403 | Y |
| za | ز | 1586 | 1599 | 1599 | 1599 | 1599 | 1599 | 1599 | 1599 | 1599 | Y |
| sin | س | 1587 | 6012 | 6012 | 6012 | 6012 | 6011 | 6013 | 6013 | 6013 | |
| shin | ش | 1588 | 2124 | 2124 | 2124 | 2124 | 2124 | 2124 | 2124 | 2124 | Y |
| sad | ص | 1589 | 2072 | 2072 | 2072 | 2072 | 2074 | 2071 | 2071 | 2071 | |
| dad | ض | 1590 | 1686 | 1686 | 1686 | 1686 | 1686 | 1686 | 1686 | 1686 | Y |
| taa | ط | 1591 | 1273 | 1273 | 1273 | 1273 | 1273 | 1273 | 1273 | 1273 | Y |
| zaa | ظ | 1592 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | 853 | Y |
| ayn | ع | 1593 | 9405 | 9405 | 9405 | 9405 | 9405 | 9405 | 9405 | 9405 | Y |
| ghayn | غ | 1594 | 1221 | 1221 | 1221 | 1221 | 1221 | 1221 | 1221 | 1221 | Y |
| tatweel | ـ | 1600 | 0 | 66 | 375 | 0 | 495 | 1 | 0 | 0 | |
| fa | ف | 1601 | 8748 | 8748 | 8748 | 8747 | 8747 | 8747 | 8747 | 8747 | |
| qaf | ق | 1602 | 7034 | 7034 | 7034 | 7034 | 7034 | 7034 | 7034 | 7034 | Y |
| kaf | ك | 1603 | 10497 | 10497 | 10497 | 10497 | 10497 | 10497 | 10497 | 10497 | Y |
| lam | ل | 1604 | 38191 | 38191 | 38191 | 38190 | 38102 | 38109 | 38105 | 38190 | |
| mim | م | 1605 | 26735 | 26735 | 26735 | 26735 | 26735 | 26735 | 26735 | 26740 | |
| nun | ن | 1606 | 27270 | 27273 | 27271 | 27270 | 27268 | 27270 | 27270 | 27270 | |
| ha | ه | 1607 | 14850 | 14850 | 14850 | 14850 | 14850 | 14849 | 14849 | 14849 | |
| wa | و | 1608 | 24814 | 24900 | 24813 | 24813 | 24971 | 24959 | 24975 | 24798 | |
| alif maksura | ى | 1609 | 2593 | 2603 | 2603 | 2602 | 6684 | 2600 | 2599 | 2585 | |
| ya | ي | 1610 | 21977 | 21977 | 21977 | 21977 | 18222 | 21918 | 21911 | 21977 | |
| fathatan | ً | 1611 | 3666 | 3724 | 3666 | 3742 | 3741 | 3650 | 0 | 3347 | |
| dammatan | ٌ | 1612 | 2490 | 2513 | 2490 | 2519 | 2519 | 2477 | 5 | 2517 | |
| kasratan | ٍ | 1613 | 2604 | 2625 | 2604 | 2634 | 2633 | 2567 | 0 | 2633 | |
| fatha | َ | 1614 | 119032 | 119367 | 119031 | 119736 | 122946 | 117047 | 38 | 117065 | |
| damma | ُ | 1615 | 37046 | 37207 | 37048 | 37311 | 37320 | 36094 | 14 | 37203 | |
| kasra | ِ | 1616 | 45612 | 45799 | 45611 | 45970 | 45969 | 44242 | 18 | 45715 | |
| shadda | ّ | 1617 | 22496 | 22488 | 22496 | 22509 | 22666 | 18699 | 8 | 18109 | |
| sukun | ْ | 1618 | 38852 | 38862 | 38853 | 38849 | 37149 | 44530 | 18 | 42792 | |
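A table like this can be produced by counting every character in every version. A minimal sketch, assuming each version is a list of ayah strings; the last field of each row is True when all versions give the same total, as in the "All agree" column:

```python
from collections import Counter


def letter_table(versions: dict[str, list[str]], chars: str) -> list[tuple]:
    """One row per character: (char, code point, counts per version, all agree?)."""
    totals = {name: Counter("".join(lines)) for name, lines in versions.items()}
    rows = []
    for ch in chars:
        counts = [totals[name][ch] for name in versions]
        rows.append((ch, ord(ch), counts, len(set(counts)) == 1))
    return rows


# Code points 1569 to 1618 span hamza to sukun; the table above shows a subset.
arabic = "".join(chr(cp) for cp in range(1569, 1619))
```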

 

Although there is close agreement over many of the letters, for letters with several forms (e.g. hamza, alif or ya) agreement varies with each version's encoding choices.

The job now is to eliminate the excess or missing characters.

Please do not consider these numbers as correct or definitive in any way. I am still in the process of checking my conversion and comparison algorithms.

Textual differences

Here I present the differences letter by letter, source by source. Although the comment "All agree" may seem superfluous given the table above, this comparison counts each letter line by line rather than totalling its occurrences over all ayat in all surahs. It therefore catches a letter missing from one line and an extra one in another, which the totals above would not.
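The line-by-line test can be sketched as follows, assuming every version has already been split into the same sequence of lines, one per ayah:

```python
def line_mismatches(versions: dict[str, list[str]], ch: str) -> list[tuple[int, dict[str, int]]]:
    """(line index, counts per version) for every line where the versions disagree on ch."""
    names = list(versions)
    mismatches = []
    for i in range(len(versions[names[0]])):
        counts = {name: versions[name][i].count(ch) for name in names}
        if len(set(counts.values())) > 1:
            mismatches.append((i, counts))
    return mismatches


# The totals agree (one "b" each), but the per-line counts expose the shift.
demo = {"A": ["ab", "c"], "B": ["a", "bc"]}
print(line_mismatches(demo, "b"))  # [(0, {'A': 1, 'B': 0}), (1, {'A': 0, 'B': 1})]
```

Replacing `.count(ch)` with `len(line.split())` gives a per-line word-count comparison instead.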

Space (char 32)

There are 2506 lines (approximately 1/3 of the total lines) where the word counts do not match.

Alif ا

At present, I cannot see the best way to compare alif counts, because of the many different versions and Unicode choices. I shall return to this later.

Ba ب

Khalifa, Surah 5 line 72 word 13 is ييني (yayani). The others have يَبَنِىٓ (yabani), which translates as 'O Children'.

Ta and Ta Marbuta ة ت

The counts for ta and ta marbuta added together show close agreement. I shall return to this later.

Tha ث

All agree

Jim ج

All agree

Ha ح

All agree

Kha خ

All agree

Dal د

All agree

Zal ذ

All agree

Ra ر

All agree

Za ز

All agree

Sin س and Sad ص

Surah 2, line 245, word 13 is wayabsut, translated as 'and amplifies'. The printed version shows a small sin above a sad. The Mystic, Hadith, Ummah, Quran and Divine versions show a sin, وَيَبْسُطُ, while the Previous and Khalifa versions show a sad. Only the Sacred Texts version shows a sad with a small sin, وَيَبْصُۣط, but there the small sin appears below the sad rather than above it as in the printed version.

Surah 4 line 176 ends with salim in Sacred. All the others end with alim (the All-Knower).

Surah 7 line 69 word 21, like Surah 2, line 245, word 13 above, uses a sad rather than a sin in Sacred Texts: بَصْۣطَةًۭۖ

Surah 52 line 37 word 6 uses a sin in Previous, Khalifa and Divine, and a sad in all others. الْمُسَيْطِرُونَ

Surah 88 line 22 ends with the word بِمُسَيْطِرٍ which uses a sin in Previous and Khalifa and a sad in all others.

Shin ش

All agree

Dad ض

All agree

Ta ط

All agree

Za ظ

All agree

Ayn ع

 All agree

Ghayn غ

All agree

Fa ف

Surah 7 line 14 in the Mystic, Hadith and Ummah versions has an extra fa before the alif of the second word: فَأَنظِرْنِي

Qaf ق

All agree

Kaf ك

All agree

Lam ل

There are 88 differences in the use of lam. The most common is the choice between اللَّيْلِ and ٱلَّيْلِ, which first occurs in Surah 2 line 164 word 6. Khalifa, Previous and Sacred have one lam; the others have two. The printed version has one lam.

Other words that appear with either a single or a double lam include وَاللاَّتِي (Surah 4 line 15 and elsewhere) and وَاللَّذَانِ (Surah 4 line 16 and elsewhere).

Mim م

The Divine source has a few extra mims, all in the same sequence, أَمْ مَنْ, where the second mim (counting from the right) is extra. This occurs in Surah 10, lines 31 and 35, and Surah 67, lines 20, 21 and 22.

Nun ن

Surah 4 line 90 word 13 has an extra nun in the Mystic, Hadith and Ummah versions. يُقَاتِلُونَكُمْ rather than يُقَتِلُوكُمْ There is no nun in the printed version.

Surah 11, line 14 begins فَإِنْ لَمْ in the Sacred, Previous and Khalifa versions. In all the others it is فَالمْ. The printed version has the latter form, without the nun.

Previous has an extra nun in Surah 18, line 95 word 2. مَكَّنَنِي

Surah 21, line 88, word 6 has a nun with a small nun above it in the printed version. Only Sacred has the small nun (نُۨجِى). Previous and Khalifa omit it (نُجِي), and the others show it as a full nun (نُنْجِي).

Surah 39 line 64 word 3: Previous and Khalifa both have a double nun, تَأْمُرُونَنِي, instead of a nun with a shadda.

Surah 68 line 1 begins نون in Khalifa. All others begin with a single nun as does the printed version.

Surah 72 line 16 begins وَأَنْ in Hadith. All others and the printed version have no nun.

Ha ه

Surah 24 line 33 word 40: Previous, Khalifa and Divine have a single ha. All the others, and the printed version, have two ha characters together: يُكْرِههُّنَّ

Wa و

There are just over 200 differences in the use of the letter wa. Many appear to be a choice between wa and alif, as in the following examples:

ٱلصَّلَوٰةَ or الصَّلاةَ
الزَّكَاةَ or ٱلزَّكَوٰةَ
الْحَيَاةِ or ٱلْحَيَوٰةِ
الرِّبَا or ٱلرِّبَوٰا۟ۗ
تَلْوُونَ or تَلْوُۥنَ

Other places with mismatches (surah:line:word) are 2:251:4, 12:85:2 and 14:21:4.

Ya and Alif Maksura ى ي

There are many places (a few hundred) where ya and alif maksura have been interchanged. For example, the word fee ('in') at the start of Surah 2 line 10 is fa.kasra.alif-maksura only in Sacred and the printed text; the others have fa.kasra.ya.
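When comparing words, it can help to set this particular choice aside so that the remaining differences stand out. A sketch that folds alif maksura (U+0649) into ya (U+064A) for comparison only; since the printed text distinguishes the two, the folded form should never be used as output:

```python
_FOLD_YA = str.maketrans({"\u0649": "\u064A"})  # alif maksura -> ya


def same_ignoring_ya(a: str, b: str) -> bool:
    """True if two words differ at most in alif maksura versus ya."""
    return a.translate(_FOLD_YA) == b.translate(_FOLD_YA)


# fa.kasra.alif-maksura (Sacred) against fa.kasra.ya (the others)
print(same_ignoring_ya("\u0641\u0650\u0649", "\u0641\u0650\u064A"))  # True
```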

 

Building a more accurate version
I am now building what I hope will be a more accurate version of the Hafs reading. The method is as follows:
  1. Load each line from all versions.
  2. If the lines are identical, then accept the line as correct. See note 1.
  3. Split the line into individual words
  4. If the number of words is the same in all versions, compare the words one by one. Where the words differ, choose the correct or nearest version and edit it according to the printed text.
  5. If the number of words differs (which it does in about 1/3 of all lines), choose the correct or nearest version and edit according to the printed text.
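The five steps can be outlined in code, under stated assumptions: every version is a list of ayah strings in the same order, and `resolve` is a hypothetical callback standing in for the manual step of choosing the nearest version and editing it against the printed text.

```python
from typing import Callable, Optional

# resolve((line, word index or None), {version: candidate}) -> checked text
Resolver = Callable[[tuple[int, Optional[int]], dict[str, str]], str]


def build_composite(versions: dict[str, list[str]], resolve: Resolver) -> list[str]:
    names = list(versions)
    composite = []
    for i in range(len(versions[names[0]])):
        lines = {name: versions[name][i] for name in names}          # 1. load the line
        if len(set(lines.values())) == 1:                             # 2. identical: accept
            composite.append(lines[names[0]])
            continue
        split = {name: text.split() for name, text in lines.items()}  # 3. split into words
        if len({len(words) for words in split.values()}) == 1:        # 4. same word count
            chosen = []
            for j, options in enumerate(zip(*split.values())):
                if len(set(options)) == 1:
                    chosen.append(options[0])
                else:
                    chosen.append(resolve((i, j), dict(zip(names, options))))
            composite.append(" ".join(chosen))
        else:                                                         # 5. word counts differ
            composite.append(resolve((i, None), lines))
    return composite
```

Steps 1 to 3 are mechanical; all of the judgement sits in `resolve`.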

As I do this, I am beginning to see that the Sacred Texts version is by far the most accurate. I am also beginning to think that the current Sacred Texts version is not the one that was there when I previously worked on this. Because the Pakistan Open Source and Sacred Texts versions are identical, Sacred Texts may now use the Pakistan Open Source version after making comparisons of its own. It could also be that the Pakistan Open Source version is a copy of the Sacred Texts version. I shall contact the encoders at some point to try to find out, but for now I prefer to spend my energy on the composite version.

1. Al Fatiha - The Opening
2. Al Baqarah - The Heifer
3. Al Imran - The Family of Imran
4. An Nisaa - The Women
5. Al Ma'ida - The Table Spread
6. Al An'am - Cattle
7. Al A'raf - The Heights
8. Al Anfal - Spoils of War
9. At Tauba - Repentance, or Baraat - Immunity
10. Yunus - Jonah
11. Hud - The Prophet Hud
12. Yusuf - Joseph
13. Al Ra'd - The Thunder
14. Ibrahim - Abraham
15. Al Hijr - The Rocky Tract
16. An Nahl - The Bee
17. Bani Israil - The Children of Israel, or Al Isra - The Night Journey
18. Al Kahf - The Cave
19. Maryam - Mary
20. Ta Ha
21. Al Anbiyaa - The Prophets
22. Al Hajj - The Pilgrimage
23. Al Muminun - The Believers
24. An Nur - Light
25. Al Furqan - The Criterion
26. Ash-Shu'araa - The Poets
27. An Naml - The Ants
28. Al Qasas - The Narration
29. Al 'Ankabut - The Spider
30. Ar Rum - The Romans
31. Luqman - Luqman the Wise
32. As Sajdah - Adoration
33. Al Ahzab - The Confederates
34. Saba - Sheba
35. Fatir - Originator, or Malaika - The Angels
36. Ya Sin
37. As Saffat - Those Ranged in Ranks
38. Sad
39. Az Zumar - The Groups
40. Al Mumin - The Believer
41. Ha Mim, or Fussilat
42. Ash Shura - The Consultation
43. Az Zukhruf - Gold Adornments
44. Ad Dukhan - Smoke, or Mist
45. Al Jathiya - Bowing the Knee
46. Al Ahqaf - Winding Sand Tracts
47. Muhammad
48. Al Fath - Victory
49. Al Hujurat - The Inner Apartments
50. Qaf
51. Az Zariyat - The Winds that Scatter
52. At Tur - The Mount
53. An Najm - The Star
54. Al Qamar - The Moon
55. Al Rahman - The Most Gracious
56. Al Waqi'a - The Inevitable Event
57. Al Hadid - Iron
58. Al Mujadila - The Woman who Pleads
59. Al Hashr - The Gathering
60. Al Mumtahana - The Woman to be Examined
61. As Saff - Battle Array
62. Al Jumu'a - The Assembly (Friday) Prayer
63. Al Munafiqun - The Hypocrites
64. Tagabun - Mutual Loss and Gain
65. At Talaq - Divorce
66. At Tahrim - Holding something to be forbidden
67. Al Mulk - Dominion
68. Al Qalam - The Pen, or Nun
69. Al Haqqa - The Sure Reality
70. Al Ma'arij - The Ways of Ascent
71. Nuh - Noah
72. Al Jinn - The Jinn
73. Al Muzzammil - Folded in Garments
74. Al Muddathir - One Wrapped Up
75. Al Qiyamat - The Resurrection
76. Ad Dahr - Time, or Al Insan - Man
77. Al Mursalat - Those Sent Forth
78. An Nabaa - The Great News
79. An Nazi'at - Those who Tear Out
80. 'Abasa - He Frowned
81. At Takwir - The Folding Up
82. Al Infitar - The Cleaving Asunder
83. At Tatfif - Dealing in Fraud
84. Al Inshiqaq - The Rending Asunder
85. Al Buruj - The Signs of the Zodiac
86. At Tariq - The Night Visitant
87. Al A'la - The Most High
88. Al Gashiya - The Overwhelming Event
89. Al Fajr - Dawn
90. Al Balad - The City
91. Ash Shams - The Sun
92. Al Lail - The Night
93. Ad Dhuha - The Glorious Morning Light
94. Al Inshirah - The Expansion
95. At Tin - The Fig
96. Iqraa - Read, or Al Alaq - The Leech-like Clot
97. Al Qadr - The Night of Power
98. Al Baiyina - The Clear Evidence
99. Az Zilzal - The Convulsion
100. Al Adiyat - Those who Run
101. Al Qari'a - The Day of Clamour
102. At Takathur - Piling Up
103. Al Asr - Time through the Ages
104. Al Humaza - Scandalmonger
105. Al Fil - The Elephant
106. Quraish - The Quraish
107. Al Ma'un - Neighbourly Needs
108. Al Kauthar - Abundance
109. Al Kafirun - The Disbelievers
110. An Nasr - Help
111. Al Lahab - The Flame
112. Al Ikhlas - Purity of Faith
113. Al Falaq - The Dawn
114. An Nas - Mankind

Notes:

I have yet to eyeball the output against a printed version. That is my next task with these 16 surahs.

The Sacred Texts/Pakistan open source has many cases of a high-isolated-mim following dammatan and kasratan. They are not in my printed version so I have not included them. In some cases, the kasratan at the end of a word is a kasra in the printed version.

For now I have listed the low wa in Surah 111 line 2 as a high wa to make things easier, as there is no low-wa character in Unicode. I have yet to come across a high wa in the printed version. The same applies to low ya and high ya.

Alif maksura followed by an alif hint doesn't look right when rendered in unicode.