Arabic and the WWW
Sibylle Wegener
Introduction
Most internet users who try to read Arabic web sites for the first time run into difficulties.
Often they see nothing but a meaningless jumble of characters.
If this muddle is the only problem, however, there are ways to solve it.
This article introduces various browsers that can display Arabic
and shows how to adjust your browser settings to make Arabic readable.
Please note that this is only the abridged English version of a more detailed German text.
First of all, let us look at the causes of these difficulties.
Most of them arise from the characteristics of the Arabic script.
One way of solving problems such as displaying bidirectional text and rendering Arabic characters in their
correct shapes has been to embed Arabic text as images.
The drawbacks of this practice include the large amount of storage space needed, which also leads to slower transfer and display.
Moreover, text in such a page cannot be indexed, searched, or copied and pasted.
These drawbacks led web developers to look for other ways of including Arabic text in their pages.
Most of these alternatives depend on the existing character codes for Arabic.
As with other languages, Arabic characters have to be converted into numeric codes before they can be processed on a computer or transmitted over the internet.
However, the well-known ASCII character set was not sufficient to represent all the characters of Arabic, so other character sets had to be developed.
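To illustrate the point, here is a minimal sketch in Python (the language we use for all examples; the helper names `code_points` and `fits_in_ascii` are hypothetical). It prints the numeric codes of a Latin and an Arabic word: ASCII only defines the codes 0 to 127, while the Arabic letters have Unicode code points around 1,600.

```python
# Minimal sketch: computers store each character as a number (its code point).
# ASCII defines only the numbers 0-127, so it has no codes for Arabic letters.


def code_points(text: str) -> list[int]:
    """Return the numeric code point of each character in text."""
    return [ord(ch) for ch in text]


def fits_in_ascii(text: str) -> bool:
    """True if every character has a code below 128, i.e. is plain ASCII."""
    return all(ord(ch) < 128 for ch in text)


latin = "Salam"
arabic = "\u0633\u0644\u0627\u0645"  # سلام, the same greeting in Arabic script

print(code_points(latin))   # [83, 97, 108, 97, 109]
print(code_points(arabic))  # [1587, 1604, 1575, 1605]
print(fits_in_ascii(latin), fits_in_ascii(arabic))  # True False

try:
    arabic.encode("ascii")
except UnicodeEncodeError:
    print("ASCII has no codes for these characters")
```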
Today we are faced with the fact that far too many different character sets for Arabic have evolved.
For instance, most computer platforms (e.g. Mac, PC) have their own character sets for Arabic.
As a result, internet users were often unable to read a web site written by someone who had used the character set specific to a different platform.
This is still a widespread problem and the main reason for the muddle described above.
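The muddle is easy to reproduce. The sketch below assumes Python's built-in `cp1256` (Windows Arabic) and `latin-1` (Western European) codecs: the author saves Arabic text in one character set, and the reader's software interprets the same bytes in another.

```python
# Sketch: the same bytes read with the wrong character set produce gibberish.
text = "\u0633\u0644\u0627\u0645 \u0639\u0644\u064a\u0643\u0645"  # سلام عليكم

# The author saves the page using the Windows Arabic code page (CP-1256) ...
data = text.encode("cp1256")

# ... but the reader's software assumes a Western European character set.
garbled = data.decode("latin-1")
print(garbled)  # accented Latin letters instead of Arabic

# Decoding with the character set the author actually used restores the text.
restored = data.decode("cp1256")
print(restored == text)  # True
```

Nothing is lost in the bytes themselves; only the interpretation is wrong, which is why choosing the right character set in the browser fixes the display.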
To display a web site correctly, you have to find out which character set its author used.
The necessary information is often hidden in the source code of an online document.
Your browser, however, is the most important tool in this respect. It can obtain information about the character set a page is written in and display the page correctly. Browser and server can also exchange information about the character sets the browser is able to display, so that the server can send a matching version of the page.
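In HTTP, this exchange can use the Accept-Charset request header, in which the browser lists the character sets it accepts, optionally weighted with quality values (`q`). The following is only a sketch of how a server might pick one of several prepared versions of a page; the function names and the list of available versions are hypothetical, and details of the HTTP specification (such as the special default rule HTTP/1.1 gave ISO-8859-1) are ignored.

```python
from typing import Optional


def parse_accept_charset(header: str) -> dict:
    """Map each charset in an Accept-Charset header to its quality value."""
    prefs = {}
    for item in header.split(","):
        name, *params = [p.strip() for p in item.split(";")]
        if not name:
            continue
        quality = 1.0  # default when no q= parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs[name.lower()] = quality
    return prefs


def choose_version(header: str, available: list) -> Optional[str]:
    """Return the available charset the browser rates highest (None if none fits)."""
    prefs = parse_accept_charset(header)
    default = prefs.get("*", 0.0)  # "*" covers charsets not named explicitly
    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset.lower(), default)
        if q > best_q:
            best, best_q = charset, q
    return best


# A browser with Arabic support might send a header like this one.
header = "windows-1256, iso-8859-6;q=0.8, *;q=0.1"
print(choose_version(header, ["iso-8859-6", "windows-1256"]))  # windows-1256
```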
A browser also lets you choose among the character sets it is able to display. This is especially important when you have found out which character set was used to create a site and want to change your browser settings to make the site readable.
Some browsers available today choose the right character set automatically.
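How such automatic selection works differs from browser to browser. A very naive approach, sketched below purely as an illustration (the scoring rule and the function names are our own assumptions, not any browser's actual method), is to decode the bytes with each candidate character set, discard those that fail, and keep the one that yields the most Arabic letters.

```python
from typing import Optional


def arabic_letter_count(text: str) -> int:
    """Count characters in the basic Arabic letter range U+0621..U+064A."""
    return sum(1 for ch in text if "\u0621" <= ch <= "\u064a")


def guess_charset(data: bytes, candidates: list) -> Optional[str]:
    """Return the candidate whose decoding yields the most Arabic letters."""
    best, best_score = None, -1
    for name in candidates:
        try:
            decoded = data.decode(name)  # strict: undefined bytes raise an error
        except UnicodeDecodeError:
            continue  # this character set cannot have produced these bytes
        score = arabic_letter_count(decoded)
        if score > best_score:
            best, best_score = name, score
    return best


# "كتاب عربي" ("Arabic book") as saved by an author using CP-1256.
page = "\u0643\u062a\u0627\u0628 \u0639\u0631\u0628\u064a".encode("cp1256")
print(guess_charset(page, ["latin-1", "iso8859_6", "cp1256"]))  # cp1256
```

Short texts can easily fool a heuristic like this one, which is why manual selection in the browser remains useful.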
This short overview was intended to show the importance of browser software for reading Arabic web sites successfully. Today there are various Arabic and multilingual browsers for the different operating systems. The most recommended ones are listed on our browser page.
From this page you can choose the software for the operating system you are using.
For more detailed information, please follow the links to the providers.
Unfortunately, we cannot report on all of this software from our own experience.
We would therefore appreciate your comments.
Further information on this topic can be found on the web pages listed on our links page.
The Arabic Script
The ASCII character set is insufficient for Arabic because of the special features of the Arabic script.
This is not the place to discuss its peculiarities in detail, but some general information
is given here to shed light on the background of the encoding problems.
Arabic text is cursive and written from right to left, with the exception of numbers, which are laid out from left to right. This change in writing direction is usually referred to as the bidirectionality of Arabic text. Bidirectional text also arises when Latin and Arabic text are mixed.
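Unicode assigns every character a bidirectional class, which is what makes mixed text manageable. The sketch below uses Python's `unicodedata.bidirectional` to split a mixed string into right-to-left, left-to-right and neutral runs; treating both kinds of digits as left-to-right is a simplification matching the description above. It only classifies characters: the actual reordering for display is done by the Unicode Bidirectional Algorithm (UAX #9), which is considerably more involved. The helper names are hypothetical.

```python
import unicodedata
from itertools import groupby

# Sketch: classify characters by their Unicode bidirectional class and split
# a string into runs. This is NOT the full Unicode Bidirectional Algorithm.


def direction(ch: str) -> str:
    cls = unicodedata.bidirectional(ch)
    if cls in ("R", "AL"):        # R: right-to-left (e.g. Hebrew), AL: Arabic letters
        return "RTL"
    if cls in ("L", "EN", "AN"):  # Latin letters, European and Arabic-Indic digits
        return "LTR"
    return "neutral"              # spaces, punctuation: direction comes from context


def direction_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters sharing one direction."""
    return [(d, "".join(chars)) for d, chars in groupby(text, key=direction)]


mixed = "\u0633\u0646\u0629 2001 HTML"  # سنة 2001 HTML ("year 2001 HTML")
for d, run in direction_runs(mixed):
    print(d, repr(run))
```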
The shape of an Arabic letter changes depending on its position in the word: a single character can have up to four different shapes (isolated, initial, medial and final). Most, but not all, letters connect to one another within a word. In short, the shape of a character depends on its context, and a contextual analysis algorithm is needed to choose the right shape.
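A much simplified contextual analysis can be sketched as follows. It assumes that only the basic Arabic letters (U+0621 to U+064A) occur, treats a short hard-coded list of letters as connecting only to the preceding letter, ignores diacritics and ligatures, and uses the Unicode character names of the presentation forms to look up the glyph for each position. The function names are hypothetical.

```python
import unicodedata

# Simplified contextual analysis (a sketch, not a full shaping engine).
# Assumptions: only basic letters U+0621-U+064A occur; diacritics, tatweel,
# non-Arabic letters and ligatures such as lam-alef are not handled.

# آ أ ؤ إ ا ة د ذ ر ز و connect only to the preceding letter.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")
NON_JOINING = {"\u0621"}  # hamza connects to neither side


def is_arabic_letter(ch: str) -> bool:
    return "\u0621" <= ch <= "\u064a"


def joins_next(ch: str) -> bool:
    """Can ch connect to the letter that follows it?"""
    return is_arabic_letter(ch) and ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_prev(ch: str) -> bool:
    """Can ch connect to the letter that precedes it?"""
    return is_arabic_letter(ch) and ch not in NON_JOINING


def positional_forms(word: str) -> list[str]:
    """Return 'isolated', 'initial', 'medial' or 'final' for each character."""
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        to_prev = joins_next(prev) and joins_prev(ch)
        to_next = joins_next(ch) and joins_prev(nxt)
        if to_prev and to_next:
            forms.append("medial")
        elif to_next:
            forms.append("initial")
        elif to_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms


def shape(word: str) -> str:
    """Replace each letter by the presentation-form glyph for its position."""
    glyphs = []
    for ch, form in zip(word, positional_forms(word)):
        try:
            # e.g. "ARABIC LETTER KAF" -> "ARABIC LETTER KAF INITIAL FORM"
            glyphs.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form.upper()} FORM"))
        except (KeyError, ValueError):
            glyphs.append(ch)  # no such presentation form: keep the character
    return "".join(glyphs)


kitab = "\u0643\u062a\u0627\u0628"  # كتاب, "book"
print(positional_forms(kitab))  # ['initial', 'medial', 'final', 'isolated']
print(shape(kitab))
```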
Arabic also uses ligatures, which combine two, and sometimes three, characters into one shape. The resulting single glyph replaces the two or three characters of which it is composed.
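The most frequent case is the obligatory lam-alef ligature. The sketch below, assuming only plain alef (not the variants with hamza or madda) and a hard-coded list of letters that do not connect to the next letter, replaces each lam followed by alef with the isolated or final form of the ligature, depending on whether the lam is joined to the preceding letter. A complete renderer would combine this step with the positional analysis of the surrounding letters; the function name is hypothetical.

```python
import unicodedata

# Sketch of the obligatory lam-alef ligature. Assumptions: only plain alef
# (U+0627) is handled, not the variants with hamza or madda, and the list of
# letters that never connect to the following letter is hard-coded.

NON_CONNECTING = set("\u0621\u0622\u0623\u0624\u0625\u0627\u0629\u062f\u0630\u0631\u0632\u0648")
LAM, ALEF = "\u0644", "\u0627"
LAM_ALEF_ISOLATED = unicodedata.lookup("ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM")
LAM_ALEF_FINAL = unicodedata.lookup("ARABIC LIGATURE LAM WITH ALEF FINAL FORM")


def apply_lam_alef(text: str) -> str:
    """Replace every lam + alef pair with the matching ligature glyph."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == LAM and text[i + 1 : i + 2] == ALEF:
            prev = text[i - 1] if i > 0 else ""
            # The lam is joined to the preceding letter if that letter connects forward.
            joined = "\u0621" <= prev <= "\u064a" and prev not in NON_CONNECTING
            out.append(LAM_ALEF_FINAL if joined else LAM_ALEF_ISOLATED)
            i += 2  # two characters become one glyph
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(len(apply_lam_alef("\u0644\u0627")))  # 1: the word "la" (no) is a single glyph
```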
Last but not least, Arabic uses diacritics, which are marks that modify the implicit sound or value of a character. They can be written above or below a character.
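In Unicode, Arabic diacritics are encoded as separate combining characters that follow the letter they modify, and they belong to the general category "Mn" (nonspacing mark). The sketch below, with hypothetical helper names, groups each letter with its marks and strips the marks, which is useful, for instance, when searching vowelled and unvowelled spellings alike.

```python
import unicodedata

# Sketch: Arabic diacritics are combining characters (category "Mn").


def split_marks(text: str) -> list[tuple[str, str]]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[tuple[str, str]] = []
    for ch in text:
        if unicodedata.category(ch) == "Mn" and clusters:
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            clusters.append((ch, ""))
    return clusters


def strip_diacritics(text: str) -> str:
    """Remove all nonspacing marks, leaving the bare letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # كَتَبَ "he wrote", each letter with fatha
print(len(kataba), len(strip_diacritics(kataba)))  # 6 3
for base, marks in split_marks(kataba):
    print(unicodedata.name(base), [unicodedata.name(m) for m in marks])
```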
These are the most important features of Arabic that a browser has to be able to handle.
To sum up: a browser needs a contextual analysis algorithm and must be able to display bidirectional text and diacritics properly.
For further information see, for instance, the article Arabic Language Features
(from Microsoft's Middle Eastern Languages Issues).
Arabic Character Encoding
For Arabic, as for all other languages, characters have to be encoded so that the computer can handle them and text can be transmitted over networks.
This requires character sets. For English, the so-called lingua franca of the internet, there is a standardized character encoding, but this is not the case for Arabic.
Over the years a variety of different character sets have been developed and spread.
Milestones in the history of Arabic character sets include CUDAR-U, the first standard Arabic character set (7 bits per character), developed in 1981, and ASMO-449 (also 7 bits), which appeared one year later and was to become the basis for all subsequent standard sets.
The first international standard character set for Arabic, ISO-8859-6 (8 bits), appeared in 1986.
In the 1990s the dominance of Microsoft's operating system MS Windows led to the spread of Microsoft's encoding for Arabic, MSCP-1256.
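The practical consequence of this variety is that the same letter is stored as different bytes depending on the character set. The sketch below encodes the letter kaf with a few of Python's built-in codecs, assuming that `iso8859_6` corresponds to ISO-8859-6 (Python also accepts the names `asmo_708` and `ecma_114` for it), `cp1256` to Microsoft's Windows code page, and `utf-8` to a Unicode encoding. The older 7-bit sets have no Python codec and are not shown.

```python
import codecs

# Sketch: one Arabic letter stored in several character sets, using Python's
# built-in codecs. Assumption: "iso8859_6" stands for ISO-8859-6, "cp1256"
# for Microsoft's Windows code page and "utf-8" for Unicode.

letter = "\u0643"  # ARABIC LETTER KAF (ك)

stored = {name: letter.encode(name) for name in ("iso8859_6", "cp1256", "utf-8")}
for name, data in stored.items():
    print(f"{name:10} {data.hex(' ')}")  # byte values differ from set to set

# The alias "asmo_708" resolves to the same codec as ISO-8859-6.
print(codecs.lookup("asmo_708").name == codecs.lookup("iso8859_6").name)
```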
Given the confusingly large number of Arabic character sets (about 20 already in the 1980s), Unicode or the Universal Character Set (UCS) might be a solution.
UCS can encode almost all living languages of the world and also offers several advantages for Arabic text; for instance, it can handle ligatures and bidirectional text.
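Unicode stores Arabic text as abstract letters in logical order and leaves shaping to the renderer; for compatibility it also contains presentation forms for shaped glyphs and ligatures. The short sketch below shows that Unicode normalization (NFKC) maps the lam-alef ligature back to its two letters, and that each Arabic letter takes two bytes in UTF-8.

```python
import unicodedata

# Sketch: presentation forms vs. logical letters in Unicode.
lam_alef = "\ufefb"  # a precomposed ligature glyph from the presentation-forms block
print(unicodedata.name(lam_alef))  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

# NFKC normalization maps the ligature back to the two underlying letters.
letters = unicodedata.normalize("NFKC", lam_alef)
print([unicodedata.name(c) for c in letters])  # ['ARABIC LETTER LAM', 'ARABIC LETTER ALEF']

# Stored as abstract letters, Arabic text takes two bytes per letter in UTF-8.
salam = "\u0633\u0644\u0627\u0645"  # سلام
encoded = salam.encode("utf-8")
print(len(salam), len(encoded))  # 4 8
print(encoded.decode("utf-8") == salam)  # True
```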
So far, however, Unicode is just one of several encodings we must be able to handle.
We therefore need suitable browser software.
Browser Settings
Some browsers, such as Internet Explorer, offer to download Arabic language support during setup. When installing Internet Explorer, choose the Customize Your Installation option to add further components such as Arabic language support.
If you skipped this step, Internet Explorer will offer to run Windows Update to obtain the required support.
From the browser options you can choose a particular character set, for instance Arabic (Windows), Arabic (ASMO 708), Arabic (DOS) or Arabic (ISO).
In this way you can select the right character set for a page. The information about the character set can be taken from the CHARSET attribute of the link ("A HREF" element) pointing to the page, or it may be declared in a META element in the HEAD section of the HTML document.
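Reading the declared character set can be automated. The following sketch, built on Python's standard `html.parser` module, looks for a META element of the form `<meta http-equiv="Content-Type" content="text/html; charset=...">` and for CHARSET attributes on A elements, then maps the declared name to a Python codec with `codecs.lookup`. The class and function names are hypothetical, and a real browser also consults the Content-Type header sent by the server.

```python
import codecs
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Collect charset declarations from META and A elements (a sketch)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_charset: Optional[str] = None
        self.link_charsets: dict = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("http-equiv", "").lower() == "content-type":
            # content="text/html; charset=windows-1256"
            for part in attrs.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value and self.meta_charset is None:
                    self.meta_charset = value.strip()
        elif tag == "a" and "charset" in attrs and "href" in attrs:
            # <a href="..." charset="..."> declares the encoding of the linked page.
            self.link_charsets[attrs["href"]] = attrs["charset"].strip()


def codec_for(declared: str) -> Optional[str]:
    """Map a declared charset name to Python's codec name, or None if unknown."""
    try:
        return codecs.lookup(declared).name
    except LookupError:
        return None


page = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1256">
</head><body><a href="news.html" charset="ISO-8859-6">News</a></body></html>"""

finder = CharsetFinder()
finder.feed(page)
print(finder.meta_charset, codec_for(finder.meta_charset))  # windows-1256 cp1256
print(finder.link_charsets)  # {'news.html': 'ISO-8859-6'}
```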
If you are using Internet Explorer you can also enable your browser to select the proper character set for a web site by itself.
Browser Programs
This section offers a survey of popular browser software for different operating systems and platforms.
Choose a browser from the table below to find out more about it and to reach the provider of the software.
Please let us know if there is new software that should be added to our list.
Microsoft Internet Explorer 5.x
After installing the Arabic language support you can choose one of the following character sets:
Arabic (Windows)
Arabic (ASMO 708)
Arabic (ISO)
Arabic (DOS)
You can choose between two fonts, Arabic Traditional and Arabic Transparent.
Please note that Arabic text cannot be entered unless you have a localized version of Windows.
MS Internet Explorer 5.x runs under
- MS Windows 95 or higher and Windows NT 4.0.
Further information can be found on the Microsoft Internet Explorer homepage.
- Macintosh MacOS 8.1 or later
Further information at MacTopia.
Netscape Navigator and Sindbad Communicator
To read Arabic pages with Netscape Navigator, you need to install additional software, Sindbad Communicator; a free version is available for Netscape Navigator 4.0 and later.
Sindbad Communicator is provided by Sakhrsoft and supports:
- Arabic and non-Arabic Windows 95/98
- Arabic version of Windows NT
You can download Sindbad Communicator from Sakhr's homepage.
Tango Lite Version 3.1 (Arabic, French, English)
Tango Lite version 3.1 supports Arabic and French. The user can choose among different interfaces, and the pop-up keyboard makes it easier to enter Arabic text.
Email messages can be created and viewed with Tango.
Tango Lite version 3.1 (Arabic, French, English) runs under:
- MS Windows 95/98
- Windows NT
For further information, see the homepage of worldlanguage.com, where you can buy the program online.
Tango Browser V. 3.1
The Tango Browser Pro with Japanese Support v. 3.1 offers even more than the Lite version: it allows you to display web pages written in any of over 90 languages, select the language of its interface, automatically retrieve pages in the language version you prefer, and even enter text in a wide variety of languages.
Tango Browser Pro with Japanese Support v. 3.1 runs under:
- MS Windows 95/98
- Windows NT
More detailed information can be found at worldlanguage.com.
iCab
The iCab web browser is now available as iCabPreview.
A free version is provided by Alexander Clauss & iCab Company.
System requirements:
- Minimum 4MB free RAM
- System 7.0.1 or 7.1 if ThreadManager and DragManager are also installed
- System 7.5 or newer
- MacTCP or OpenTransport
- InternetConfig 1.2 (or Mac OS 8.5 or newer)
- PPC, 68020, 68030 or 68040
For other recommended software see the iCab info.
Download iCabPreview from the iCab homepage.
WinArabic 1.5
WinArabic 1.5, developed by Mughamarat, allows Mac users to read Arabic web sites.
It supports CP 1256 encoding and Arabic Unicode.
To find out more about the advantages of WinArabic 1.5 see the Mughamarat homepage.
System requirements:
- PowerPC Macintosh
- Mac OS 7.5 or newer
- Arabic Language Kit
- Netscape 4.5 or newer
- Eudora 3.x
You can download a demo version of WinArabic 1.5.
AraMosaic
AraMosaic is an enhanced version of the NCSA Mosaic 2.7b4 Unix/X11 WWW browser that supports Arabic and English text.
So far it is available only for Unix/X11.
It supports copying and pasting as well as printing Arabic pages (PostScript).
AraMosaic is provided in binary form for the following systems:
- SGI Irix 5.2/5.3/6.2/6.5 (AraMosaic.sgi.tar.gz)
- Sun Solaris 2.4/2.5 (AraMosaic.solaris.tar.gz)
- Sun Solaris 7 (AraMosaic.solaris7.tar.gz)
- Sun Solaris X86 2.6 (AraMosaic.solarisX86.tar.gz)
- SunOS 4.1.3/X11/OpenWindows (AraMosaic.sunos.tar.gz)
- Linux 2.x.x/Motif 2. (AraMosaic.linux_MotifDynam.tar.gz)
- Linux 2.x.x/no Motif (AraMosaic.linux_MotifStatic.tar.gz)
- Linux 2.x.x/Full Arabic support (AraMosaic_linux_ArabicMotif.tar.gz)
- DEC Alpha OSF1 3.2 (AraMosaic.alpha-dec-osf32.tar.gz)
Download is available via anonymous ftp, for instance from this web site:
http://www.langbox.com/arabic/download/AraMosaic
For other download areas and detailed information see the web site of LangBox International.
PMosaic
PMosaic is an enhanced version of the NCSA Mosaic 2.71 Unix/X11 WWW browser. It supports trilingual Persian, Arabic and English hypertext.
Like AraMosaic it is only available for Unix/X11.
PMosaic is provided in binary form for the following systems:
- DEC Ultrix (pmosaic92.ultrix.tar.gz)
- DEC Alpha (pmosaic92.alpha.tar.gz)
- HP 7xx (pmosaic92.hp.tar.gz)
- IBM RS6000 (pmosaic92.ibm.tar.gz)
- Linux (pmosaic92.linux.tar.gz)
- SGI 5.2/5.3 (pmosaic92.sgi.tar.gz)
- SunOS 4.1.3/X11/OpenWindows (pmosaic92.sun.tar.gz)
- Sun Solaris 2.3 (pmosaic92.solaris.tar.gz)
Download via anonymous ftp is possible, for instance from the following address:
http://tehran.stanford.edu/Iran_Lib/Pmosaic/.
Further information is available on the web site of the Global Publishing Group.
AraZilla
AraZilla is a free Arabic web browser for the Linux platform.
AraZilla extends the Mozilla code with LangBox's XLANGBOX-ARA Arabic toolkit for UNIX.
AraZilla has been tested by users of the following systems:
- SuSE Linux 6.3
- SuSE Linux 6.2
- RedHat 6.1
For further information and download see the AraZilla homepage.
Links
BabelSite, an Alis Technologies / Internet Society joint project to internationalize the Internet
Non English and the net by Knut Vikør
The Arabic Macintosh - An informal resource center by Knut Vikør
Alan Wood's Unicode Resources: Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications
Using the Internet in Arabic: Problems and Solutions by Badr H. AL-BADR
Nicholas Heer's Home Page
How to read Arabic on the net
Multilingual World Wide Web Working Group: Useful resources for accessing the Web in non-Latin fonts
© Virtual Library Middle East North Africa, SSG ULB Halle, 2000-2001
URL of this page: http://ssgdoc.bibliothek.uni-halle.de/vlib/html/docs/arabwww_en.html