|Encoding the Qur'an|
Many, myself included, have tried to encode the Qur'an using Unicode. We have each struggled in our own way to work around the various problems encountered and chosen what to include and what to omit, and many have succeeded in producing a decent rendering.
The problems are twofold.
First, Unicode is not complete. It does not contain all of the diacritics and additional characters necessary for a full and accurate rendering.
Second, the method of rendering each glyph (character) depends only upon how the previous one was drawn. This means that many of the ligatures (multiple characters joined together) cannot be drawn correctly, particularly lam-alif combinations.
In short, a markup approach is required, containing all of the characters, large and small, from which the best current rendering in Unicode might be extracted, together with a rule-based approach for drawing out the full text.
|Existing unicode versions of the Qur'an|
When I first started attempting a 'clean' Unicode version, I could only find three sources. I do not know the origin of any of their sources, but think of them as the "Sacred Texts" version, the "One Ummah" version, and the "Hadith.net" version.
Since then, a few more versions have come to light. Here are the ones I could find and have used:
|Open Source||http://www.pakistanopensource.org/...||Sacred texts||No||Notes|
Created from Sacred Texts, Hadith.net & One Ummah versions, then edited by hand
|Qur'an Karim||http://www.lib.umich.edu/...||One Ummah||No|
Yusuf ali, Shakir |
I have added in the last|
two verses of Surah 9 while
|Differences in style|
Encoding
The Khalifa and Hadith.Net versions use Windows-1256 Arabic encoding, e.g. بسم الله الرحمن الرحيم
Divine Islam, Quran.com, Mystic Letters, One Ummah and Previous all use 7-character HTML numeric character references (NCRs), e.g. ذَلِكَ (dhalika), where a reference such as &#1584; stands for the single letter ذ
Sacred Texts uses Unicode Arabic, e.g. بِسْمِ
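Before any comparison, the three encodings have to be brought to a common form. The following is a minimal sketch of how that might be done in Python; the function names are my own for illustration, and it assumes the Windows-1256 sources are available as raw bytes and the NCR sources as plain text.

```python
import html

def decode_cp1256(raw: bytes) -> str:
    """Decode Windows-1256 bytes into a Unicode string."""
    return raw.decode("cp1256")

def decode_ncr(text: str) -> str:
    """Expand HTML numeric character references such as &#1576; into characters."""
    return html.unescape(text)

# ba, sin, mim: the letters of "bismi" without diacritics
bsm = "\u0628\u0633\u0645"

# Round trip through Windows-1256 and through decimal NCRs
print(decode_cp1256(bsm.encode("cp1256")) == bsm)
print(decode_ncr("&#1576;&#1587;&#1605;") == bsm)
```

Once every source is a Unicode string, the versions can be compared character by character regardless of how they were originally stored.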
The Sacred Texts and One Ummah versions contain alif hints. The Sacred Texts version encodes them as \u065F (character 1631), which appears on my computer as a rectangle indicating an unused character. The One Ummah version uses a full alif, \u0627 (character 1575).
Khalifa does not contain any diacritics (kasra, damma, fatha, sukun etc.).
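To compare a source without diacritics against one that has them, the diacritics can be stripped first. This is only a sketch: the function name is hypothetical, and it assumes that removing every character in Unicode category Mn (nonspacing mark) is acceptable, which also removes small annotation marks as well as kasra, damma, fatha and sukun.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove every nonspacing combining mark (Unicode category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# ba kasra sin sukun mim kasra
bismi = "\u0628\u0650\u0633\u0652\u0645\u0650"
print(strip_marks(bismi) == "\u0628\u0633\u0645")
```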
Lam alif ligatures
When a lam is immediately followed by an alif, a lam-alif is created. If there is a diacritic between the lam and the alif, the lam-alif is not drawn. The first lam alif in the Qur'an is in Surah 1, ayah 7. It should be lam.fatha.alif.
In Sacred Texts and Previous, it is lam.fatha.alif. In Divine Islam, it is lam alif. In the others it is lam.alif. Both lam.alif and lam.alif.fatha produce the lam-alif glyph - وَلاَ - The versions where it is lam.fatha.alif do not show the lam-alif glyph - وَلَا .
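The ordering of lam, diacritic and alif can be checked mechanically. The sketch below classifies each lam whose next base letter is an alif; the function name and the form labels are my own, and it assumes only the plain alif (\u0627) is of interest, not the hamza or wasla forms.

```python
import unicodedata

LAM = "\u0644"
ALIF = "\u0627"

def lam_alif_forms(text):
    """List (index, form) for every lam whose next base letter is a plain alif.

    form is "lam.alif" when the alif follows immediately, and
    "lam.marks.alif" when combining marks sit between the two letters.
    """
    found = []
    for i, ch in enumerate(text):
        if ch != LAM:
            continue
        j = i + 1
        # Skip combining marks (fatha, shadda, sukun, ...) after the lam.
        while j < len(text) and unicodedata.combining(text[j]):
            j += 1
        if j < len(text) and text[j] == ALIF:
            found.append((i, "lam.alif" if j == i + 1 else "lam.marks.alif"))
    return found

# waw fatha lam fatha alif
print(lam_alif_forms("\u0648\u064E\u0644\u064E\u0627"))  # [(2, 'lam.marks.alif')]
# waw fatha lam alif fatha
print(lam_alif_forms("\u0648\u064E\u0644\u0627\u064E"))  # [(2, 'lam.alif')]
```

Running this over each source gives a quick list of the places where the versions disagree on the position of the diacritic.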
|Comparing the versions|
There are seven versions of the Qur'an, which, although they are almost identical, do differ slightly in pronunciation. For more, see here: http://www.islamic-awareness.org/Quran/Text/Qiraat/hafs.html#1. I have used the common printed version of the Complex for the printing of the Holy Qur'an, which, as far as I am aware, is the Hafs version.
To begin to compare the electronic versions, there are two steps I have taken.
Step 1. Look for obvious errors and manually correct them. I have done this in the following cases so far, where short passages of a few words were repeated. The corrections to Previous are not particularly relevant as I shall only be using those surahs I edited by hand: Surahs 1, part of 2, 42, 50, 68 and 100 to 114 inclusive. I have included them for the sake of completeness.
Manual correction 1.
Previous and Khalifa both have an extra piece of text in Surah 6, line 81. Compare the following:
Sacred T:وَكَيْفَ أَخَافُ مَآ أَشْرَكْتُمْ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُم بِٱللَّهِ مَا لَمْ يُنَزِّلْ بِهِۦ عَلَيْكُمْ سُلْطَٟنًۭاۚ فَأَىُّ ٱلْفَرِيقَيْنِ أَحَقُّ بِٱلْأَمْنِۖ إِن كُنتُمْ تَعْلَمُونَ
Previous:وَكَيْفَ أَخَافُ مَا أَشْرَكْتُمْ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُمْ بِاللَّهِ وَلَا تَخَافُونَ أَنَّكُمْ أَشْرَكْتُمْ بِاللَّهِ مَا لَمْ يُنَزِّلْ بِهِ عَلَيْكُمْ سُلْطَانًا فَأَيُّ الْفَرِيقَيْنِ أَحَقُّ بِالْأَمْنِ إِنْ كُنتُمْ تَعْلَمُونَ
The highlighted text is not in my printed Qur'an. It is a repetition of six words, and has been excised from Previous and Khalifa.
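Repeats of this kind, where a short run of words appears twice in succession, can be searched for automatically. The following is a rough sketch rather than the method I actually used; the function name is hypothetical, it assumes the line has already been split into words on spaces, and the sample uses transliterated stand-ins instead of the Arabic text.

```python
def find_adjacent_repeats(words, min_len=2):
    """Return (start, length) for each run of words that is immediately repeated."""
    hits = []
    n = len(words)
    i = 0
    while i < n:
        found = False
        # Try the longest possible repeat first, down to min_len words.
        for length in range((n - i) // 2, min_len - 1, -1):
            if words[i:i + length] == words[i + length:i + 2 * length]:
                hits.append((i, length))
                i += 2 * length  # continue after the repeated copy
                found = True
                break
        if not found:
            i += 1
    return hits

# Stand-in tokens: the phrase "wala takhafuna annakum" occurs twice in a row.
line = "wakayfa akhafu wala takhafuna annakum wala takhafuna annakum ma lam".split()
print(find_adjacent_repeats(line))  # [(2, 3)]
```

A result such as (2, 3) means that the three words starting at position 2 (counting from zero) are followed at once by an identical copy, which is then a candidate for checking against the printed text.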
Manual Correction 2.
Surah 13 in Previous was missing ayah 11 onwards. I have added the remaining ayat from the One Ummah version.
Manual Correction 3.
Khalifa has (infamously) removed the last two lines from Surah 9. I have replaced them with the Hadith.net version, but may return to this subject later.
Manual Correction 4.
Surah 5, line 6 has another repeat in Previous. The following text shows only the start of the relevant line. The numbers 7 and 8 are counts of a single letter which has identified the repeat.
Sacred: 7يَٟٓأَيُّهَا ٱلَّذِينَ ءَامَنُوٓا۟ إِذَا قُمْتُمْ إِلَى ٱلصَّلَوٰةِ فَٱغْسِلُوا۟ وُجُوهَكُمْ وَأَيْدِيَكُمْ إِلَى ٱلْمَرَافِقِ وَٱمْسَحُوا
Previous: 8 يَاأَيُّهَا الَّذِينَ آمَنُوا إِذَا قُمْتُمْ إِلَى الصَّلَاةِ فَاغْسِلُوا وُجُوهَكُمْ وَأَيْدِيَكُمْ فَاغْسِلُوا وُجُوهَكُمْ وَأَيْدِيَكُمْ إِلَى الْمَرَافِقِ وَامْسَحُوا
Manual Correction 5.
Surah 4, line 102 has another repeat in Previous. The following text shows the relevant passage.
Previous: 4وَإِذَا كُنتَ فِيهِمْ فَأَقَمْتَ لَهُمْ الصَّلَوةَ فَلْتَقُمْ طَائِفَةٌ مِنْهُمْ مَعَكَ وَلْيَأْخُذُوا أَسْلِحَتَهُمْ فَإِذَا سَجَدُوا فَلْيَكُونُوا مِنْ وَرَائِكُمْ وَلْتَأْتِ طَائِفَةٌوَلْتَأْتِ طَائِفَةٌ أُخْرَى لَمْ
Khalifa: 3واذا كنت فيهم فاقمت لهم الصلوة فلتقم طائفة منهم معك ولياخذوا اسلحتهم فاذا سجدوا فليكونوا من ورائكم ولتات طائفة اخرى لم
Manual Correction 6.
Surah 24 has the first word of ayah 8 at the end of ayah 7 in the Mystic Letters and One Ummah versions.
The fact that these manual corrections in some cases span more than one version raises the question: how many differences are there between the versions?
The last manual change, moving a word at the end of one line to the beginning of the next, indicates a common source. Looking for the differences between the Mystic Letters and One Ummah versions finds only 17 differences, most but not all of which are a choice of hamza or alif.
In all other cases, comparison reveals many hundreds if not thousands of differences. The majority are attributable to alif hints, choices of alif form, or variations in diacritics. The next stage is to take one letter or character at a time, and look for variations in counts of each character in each ayah or word. At this point, the preference would be to compare word by word, but the separation of words in the different versions is not consistent and thus makes it difficult.
The following table shows the letter counts for most of the alphabet and diacritics in all surahs after the corrections noted above. This includes the bismillah at the start of the surah only in Surah 1.
|alif hamza below||إ||1573||5086||5085||5086||5086||5088||3599||3||5084|
Although there is close agreement over many of the letters, for those with many forms, e.g. hamza, alif or ya, agreement varies depending upon encoding choices.
The job now is to eliminate the excess or missing characters.
Please do not consider these numbers as correct or definitive in any way. I am still in the process of checking my conversion and comparison algorithms.
Here, I shall present the differences letter by letter, source by source. Although the comment "All agree" may seem superfluous, it is a count line by line rather than adding up the total occurrences of a letter in all ayat in all surahs. This test identifies if there is a missing letter in one line and an excess letter in another line, which the above test did not.
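The difference between the two tests can be illustrated with a short sketch. The function names are illustrative only, and it assumes each version is held as a list of lines (ayat) in the same order.

```python
def total_count(lines, ch):
    """Total occurrences of ch across all lines of one version."""
    return sum(line.count(ch) for line in lines)

def mismatched_lines(version_a, version_b, ch):
    """1-based numbers of the lines where the two versions disagree on the count of ch."""
    return [
        n
        for n, (a, b) in enumerate(zip(version_a, version_b), start=1)
        if a.count(ch) != b.count(ch)
    ]

NUN = "\u0646"
MIM = "\u0645"

# Two made-up two-line versions: the nun is in line 1 of one and line 2 of the other.
version_a = [MIM + NUN, MIM]
version_b = [MIM, MIM + NUN]

print(total_count(version_a, NUN), total_count(version_b, NUN))  # 1 1
print(mismatched_lines(version_a, version_b, NUN))               # [1, 2]
```

The totals agree, yet the line by line test reports both lines, which is exactly the case of a missing letter in one line and an excess letter in another.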
|Space (char 32)|
There are 2506 lines (approximately 1/3 of the total lines) where the word counts do not match.
At present, I cannot see the best way to compare alif counts because of the many different versions and unicode choices. I shall return to this later.
Khalifa, Surah 5 line 72 word 13 is ييني yayani. The others have يَبَنِىٓ yabani, which translates to 'O Children'.
|Ta and Ta Marbuta ة ت|
The counts for ta and ta marbuta added together show close agreement. I shall return to this later.
|Sin س and Sad ص|
Surah 2, line 245, word 13 is wayabsut, translated as 'and amplifies'. The printed versions show a small sin above a sad. The Mystic, Hadith, Ummah, Quran and Divine versions show a sin: وَيَبْسُطُ . The Previous and Khalifa versions show a sad. Only the Sacred Texts version shows a sad with a small sin: وَيَبْصُۣط . There the small sin appears below the sad, whereas in the printed version it is above the sad.
Surah 4 line 176 ends with salim in Sacred. All others end in alim - the All-Knower.
Surah 7 line 69 word 21, similar to Surah 2, line 245 word 13 above, uses a sad rather than a sin in Sacred Texts: بَصْۣطَةًۭۖ
Surah 52 line 37 word 6 uses a sin in Previous, Khalifa and Divine, and a sad in all others: الْمُسَيْطِرُونَ
Surah 88 line 22 ends with the word بِمُسَيْطِرٍ which uses a sin in Previous and Khalifa and a sad in all others.
Surah 7 line 14 in the mystic, hadith and ummah versions has an extra fa before the alif of the second word فَأَنظِرْنِي
There are 88 differences in the use of lam. The most common difference is over the choice of اللَّيْلِ or ٱلَّيْلِ . The first occurrence of this is in Surah 2 line 164 word 6. Khalifa, Previous and Sacred have one lam, the others have two lams. The printed version has one lam.
Other words that vary between a single and a double lam are وَاللاَّتِي in Surah 4 line 15 and other places, and وَاللَّذَانِ in Surah 4 line 16 and other places.
The Divine source has a few extra mims, all in the same sequence: أَمْ مَنْ
The second mim (from the right) is extra, and occurs in
Surah 10, lines 31 and 35
Surah 67 lines 20, 21 and 22
Surah 4 line 90 word 13 has an extra nun in the Mystic, Hadith and Ummah versions: يُقَاتِلُونَكُمْ rather than يُقَتِلُوكُمْ . There is no nun in the printed version.
Surah 11, line 14 begins فَإِنْ لَمْ in the Sacred, Previous and Khalifa versions. In all others it is فَالمْ . It is the latter without the nun in the printed version.
Previous has an extra nun in Surah 18, line 95 word 2. مَكَّنَنِي
Surah 21, line 88, word 6 has a nun with a small nun above it in the printed version. Only Sacred has the small nun - نُۨجِى . Previous and Khalifa omit the small nun - نُجِي . The others show it as a full nun - نُنْجِي
Surah 39 line 64 word 3. Previous and Khalifa both have a double nun تَأْمُرُونَنِي instead of a nun with a shadda.
Surah 68 line 1 begins نون in Khalifa. All others begin with a single nun as does the printed version.
Surah 72 line 16 begins وَأَنْ in Hadith. All others and the printed version have no nun.
Surah 24 line 33 word 40. Previous, Khalifa and Divine have a single ha. All others, and the printed version have two ha characters together يُكْرِههُّنَّ
There are just over 200 differences in the use of the letter wa. Many appear to be a choice between wa and alif, as in the following examples:
Other places with mismatches are 2:251:4, 12:85:2, 14:21:4
|Ya and Alif Maksura ى ي|
There are many places (a few hundred) where ya and alif maksura have been interchanged. For example the word (pronounced fee and translated as 'in') in Surah 2 at the start of line 10, is fa.kasra.alif-maksura only in Sacred and the printed text. The others have fa.kasra.ya.
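One way to separate these interchanges from genuine differences is to fold alif maksura into ya before comparing. This is a sketch under that assumption only; the helper names are hypothetical, and folding the two letters is a comparison aid, not a claim about which spelling is correct.

```python
YA = "\u064A"
ALIF_MAKSURA = "\u0649"

def fold_ya(text: str) -> str:
    """Replace every alif maksura with ya so the two spellings compare equal."""
    return text.replace(ALIF_MAKSURA, YA)

def differs_only_by_ya(a: str, b: str) -> bool:
    """True when a and b differ, but only in ya versus alif maksura."""
    return a != b and fold_ya(a) == fold_ya(b)

fi_with_ya = "\u0641\u0650\u064A"       # fa kasra ya
fi_with_maksura = "\u0641\u0650\u0649"  # fa kasra alif maksura
print(differs_only_by_ya(fi_with_ya, fi_with_maksura))  # True
```

Lines flagged this way can be set aside as encoding choices, leaving a shorter list of differences that need checking against the printed text.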
|Building a more accurate version|
As I do this, I am beginning to see that the Sacred Texts version is by far the most accurate. I am also beginning to think that the current Sacred Texts version is not the version that was there when I previously worked on this. Because the Pakistan Open Source and Sacred Texts versions are identical, it may be that Sacred Texts now uses the Pakistan Open Source version after making comparisons of their own. It could also be that the Pakistan Open Source version is a copy of the Sacred Texts version. I shall contact the encoders at some point to try to find out, but for now I prefer to expend my energy on the composite version.
|1. Al Fatiha - The Opening|
|2. Al Baqarah - The Heifer|
|3. Al Imran - The Family of Imran|
|4. An Nisaa - The Women|
|5. Al Ma'ida - The Table Spread|
|6. Al An'am - Cattle|
|7. Al A'raf - The Heights|
|8. Al Anfal - Spoils of War|
|9. At Tauba - Repentance, or Baraat - Immunity|
|10. Yunus - Jonah|
|11. Hud - The Prophet Hud|
|12. Yusuf - Joseph|
|13. Al Ra'd - The Thunder|
|14. Ibrahim - Abraham|
|15. Al Hijr - The Rocky Tract|
|16. An Nahl - The Bee|
|17. Bani Israil - The Children of Israel, or Al Isra - The Night Journey|
|18. Al Kahf - The Cave|
|19. Maryam - Mary|
|20. Ta Ha|
|21. Al Anbiyaa - The Prophets|
|22. Al Hajj - The Pilgrimage|
|23. Al Muminun - The Believers|
|24. An Nur - Light|
|25. Al Furqan - The Criterion|
|26. Ash-Shu'araa - The Poets|
|27. An Naml - The Ants|
|28. Al Qasas - The Narration|
|29. Al 'Ankabut - The Spider|
|30. Ar Rum - The Romans|
|31. Luqman - Luqman the Wise|
|32. As Sajdah - Adoration|
|33. Al Ahzab - The Confederates|
|34. Saba - Sheba|
|35. Fatir - Originator, or Malaika - The Angels|
|36. Ya Sin|
|37. As Saffat - Those Ranged in Ranks|
|39. Az Zumar - The Groups|
|40. Al Mumin - The Believer|
|41. Ha Mim, or Fussilat|
|42. Ash Shura - The Consultation|
|43. Az Zukhruf - Gold Adornments|
|44. Ad Dukhan - Smoke, or Mist|
|45. Al Jathiya - Bowing the Knee|
|46. Al Ahqaf - Winding Sand Tracts|
|48. Al Fath - Victory|
|49. Al Hujurat - The Inner Apartments|
|51. Az Zariyat - The Winds that Scatter|
|52. At Tur - The Mount|
|53. An Najm - The Star|
|54. Al Qamar - The Moon|
|55. Al Rahman - The Most Gracious|
|56. Al Waqi'a - The Inevitable Event|
|57. Al Hadid - Iron|
|58. Al Mujadila - The Woman who Pleads|
|59. Al Hashr - The Gathering|
|60. Al Mumtahana - The Woman to be Examined|
|61. As Saff - Battle Array|
|62. Al Jumu'a - The Assembly (Friday) Prayer|
|63. Al Munafiqun - The Hypocrites|
|64. Tagabun - Mutual Loss and Gain|
|65. At Talaq - Divorce|
|66. At Tahrim - Holding something to be forbidden|
|67. Al Mulk - Dominion|
|68. Al Qalam - The Pen, or Nun|
|69. Al Haqqa - The Sure Reality|
|70. Al Ma'arij - The Ways of Ascent|
|71. Nuh - Noah|
|72. Al Jinn - The Jinn|
|73. Al Muzzammil - Folded in Garments|
|74. Al Muddathir - One Wrapped Up|
|75. Al Qiyamat - The Resurrection|
|76. Ad Dahr - Time, or Al Insan - Man|
|77. Al Mursalat - Those Sent Forth|
|78. An Nabaa - The Great News|
|79. An Nazi'at - Those who Tear Out|
|80. 'Abasa - He Frowned|
|81. At Takwir - The Folding Up|
|82. Al Infitar - The Cleaving Asunder|
|83. At Tatfif - Dealing in Fraud|
|84. Al Inshiqaq - The Rending Asunder|
|85. Al Buruj - The Signs of the Zodiac|
|86. At Tariq - The Night Visitant|
|87. Al A'la - The Most High|
|88. Al Gashiya - The Overwhelming Event|
|89. Al Fajr - Dawn|
|90. Al Balad - The City|
|91. Ash Shams - The Sun|
|92. Al Lail - The Night|
|93. Ad Dhuha - The Glorious Morning Light|
|94. Al Inshirah - The Expansion|
|95. At Tin - The Fig|
|96. Iqraa - Read, or Al Alaq - The Leech-like Clot|
|97. Al Qadr - The Night of Power|
|98. Al Baiyina - The Clear Evidence|
|99. Az Zilzal - The Convulsion|
|100. Al Adiyat - Those who Run|
|101. Al Qari'a - The Day of Clamour|
|102. At Takathur - Piling Up|
|103. Al Asr - Time through the Ages|
|104. Al Humaza - Scandalmonger|
|105. Al Fil - The Elephant|
|106. Quraish - The Quraish|
|107. Al Ma'un - Neighbourly Needs|
|108. Al Kauthar - Abundance|
|109. Al Kafirun - The Disbelievers|
|110. An Nasr - Help|
|111. Al Lahab - The Flame|
|112. Al Ikhlas - Purity of Faith|
|113. Al Falaq - The Dawn|
|114. An Nas - Mankind|
I have yet to eyeball the output against a printed version. That is my next task with these 16 surahs.
The Sacred Texts/Pakistan Open Source version has many cases of a high-isolated-mim following dammatan and kasratan. They are not in my printed version so I have not included them. In some cases, the kasratan at the end of a word is a kasra in the printed version.
The low wa in Surah 111 line 2 I have listed for now as high-wa to make things easier, as there is no low-wa character in Unicode. I have yet to come across a high-wa in the printed version. The same applies to low-ya/high-ya.
Alif maksura followed by an alif hint doesn't look right when rendered in Unicode.