Thursday, July 10, 2008

Character encoding in Apache

One of the last problems I faced was how to set up character encoding in Apache httpd for content served directly by the web server. I wanted to serve all content with UTF-8 character encoding. Let's see how to proceed.

First, edit httpd.conf. Search for the AddDefaultCharset directive and set it to utf-8:

AddDefaultCharset utf-8
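If you prefer to script this change rather than edit the file by hand, here is a minimal sketch, assuming a POSIX shell and awk. The function name set_default_charset is hypothetical. It rewrites every active (uncommented) AddDefaultCharset line in the given file, or appends one at the end if none exists.

```shell
#!/bin/sh
# Hypothetical helper: set_default_charset CONF CHARSET
# Rewrites every active AddDefaultCharset line in CONF, or appends one.
set_default_charset() {
    conf=$1
    charset=$2
    tmp="$conf.tmp.$$"
    if awk -v cs="$charset" '
        # Replace each uncommented AddDefaultCharset directive
        /^[ \t]*AddDefaultCharset[ \t]/ { print "AddDefaultCharset " cs; found = 1; next }
        # Copy every other line unchanged
        { print }
        # No active directive found: append one at the end of the file
        END { if (!found) print "AddDefaultCharset " cs }
    ' "$conf" > "$tmp"; then
        # Overwrite in place so the original owner and permissions are kept
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}
```

For example, set_default_charset /etc/httpd/conf/httpd.conf utf-8 (adjust the path to your installation). Replaced lines lose their original indentation. Afterwards, apachectl configtest checks the syntax, apachectl graceful reloads the server, and curl -I http://localhost/ shows the Content-Type header that is actually sent.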

Next, make sure that the content you are going to serve is stored on the file system in UTF-8 as well. If it is not, you will see weird behavior, with characters not displayed correctly. Unfortunately, a file on Linux does not record its own encoding, so the encoding cannot be determined with certainty (tools such as file --mime-encoding can only guess). The quickest check is to look at the output directly in the browser. Linux does, however, provide a quick way to convert a file from one encoding to another:
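Since the encoding cannot be read back from the file, one practical check is whether its bytes form valid UTF-8 at all. The sketch below assumes an iconv that rejects invalid input (as glibc and GNU libiconv do); the function name is_utf8 is hypothetical. Plain ASCII also passes, which is fine because ASCII is identical in both encodings.

```shell
#!/bin/sh
# Hypothetical helper: is_utf8 FILE
# Exit status 0 if FILE is a valid UTF-8 byte sequence, non-zero otherwise.
# A pass only rules out "cannot be UTF-8"; it does not prove the author's intent.
is_utf8() {
    # Decode as UTF-8 and discard the output; iconv fails on invalid bytes.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example: report each HTML file in the current directory
for f in *.html; do
    [ -f "$f" ] || continue   # the glob did not match anything
    if is_utf8 "$f"; then
        echo "valid UTF-8:     $f"
    else
        echo "NOT valid UTF-8: $f"
    fi
done
```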

iconv -f ISO-8859-1 -t UTF-8 index.html > index.html.utf8

Then replace the original file with the converted one:

mv -f index.html.utf8 index.html

Note that iconv does NOT check the source file's encoding: it simply trusts the -f argument. If you run this command on a file that is already stored in UTF-8, every multibyte character is encoded a second time and the output is garbled (for example, é becomes Ã©).
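To guard against that double conversion, the two commands above can be wrapped so that files which are already valid UTF-8 are skipped. This is a minimal sketch, assuming the source files are ISO-8859-1 as in the example; the function name to_utf8 is hypothetical. A Latin-1 file that happens to form valid UTF-8 would also be skipped, which is rare for real text.

```shell
#!/bin/sh
# Hypothetical helper: to_utf8 FILE
# Converts FILE from ISO-8859-1 to UTF-8 in place, unless it is already
# valid UTF-8, so running it twice does not double-encode the file.
to_utf8() {
    f=$1
    # iconv exits non-zero on byte sequences that are not valid UTF-8
    if iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
        echo "skipped (already valid UTF-8): $f"
        return 0
    fi
    # Convert to a temporary file; replace the original only on success
    if iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.utf8"; then
        mv -f "$f.utf8" "$f"
        echo "converted: $f"
    else
        rm -f "$f.utf8"
        echo "failed: $f" >&2
        return 1
    fi
}
```

Running to_utf8 index.html a second time leaves the file untouched instead of garbling it.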

Last but not least, I suggest avoiding putting this inside the page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

because the charset that Apache sends in the HTTP Content-Type header takes precedence over it anyway.
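To find pages that still declare a charset in a meta element, a line-based search is usually enough. This is a rough sketch, assuming the declaration sits on a single line; the function name find_meta_charset is hypothetical. It matches any meta tag containing charset=, which covers the http-equiv form shown above.

```shell
#!/bin/sh
# Hypothetical helper: find_meta_charset [DIR]
# Lists *.html files under DIR (default: current directory) that contain
# a <meta ... charset=...> declaration on a single line.
find_meta_charset() {
    dir=${1:-.}
    # -i: case-insensitive match, -l: print only the matching file names
    find "$dir" -type f -name '*.html' -exec grep -ilE '<meta[^>]*charset=' {} +
}
```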

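PS: I've created a small script to convert a bunch of files in one go. It accepts a single search-like parameter and converts every file in the current directory whose name contains it (and also contains "html"):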

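#!/usr/bin/ksh
# Convert every file in the current directory whose name matches *PATTERN*
# and contains "html" from ISO-8859-1 to UTF-8, in place.
USAGE="usage: $0 <pattern>"
if (( $# != 1 )); then
    print -- "$USAGE"
    exit 1
fi

for file in *$1*
do
    # Only regular files (an unmatched glob stays literal and is skipped)
    if [[ -f $file && $file = *html* ]]
    then
        # Write to a temp file and replace the original only if iconv succeeded
        iconv -f ISO-8859-1 -t UTF-8 "$file" > "$file.utf8" && mv -f "$file.utf8" "$file"
        print -- "$file"
    fi
done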


Wednesday, July 9, 2008

Compress / Decompress initrd files

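To decompress and mount the image (mounting requires root; /mnt/initrd is just an example mount point, since mounting directly on /tmp would hide its contents):

zcat initrd.gz > initrd
mkdir -p /mnt/initrd
mount -t ext2 -o loop,rw initrd /mnt/initrd

After modifying the folder structure, unmount the image so that all changes are written back, then compress it again:

umount /mnt/initrd
gzip -c initrd > initrd.gz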
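The round trip above can be collected into two small functions. This is a sketch assuming a gzip-compressed ext2 image like the one in this post; the names initrd_unpack and initrd_pack and the /mnt/initrd mount point are hypothetical. It uses gzip -dc, which behaves like zcat on Linux.

```shell
#!/bin/sh
# Hypothetical helpers for a gzip-compressed ext2 initrd image.
# initrd_unpack IN.gz OUT : decompress the image to a plain file
# initrd_pack   IN OUT.gz : recompress the image (unmount it first!)
initrd_unpack() {
    # gzip -dc: decompress to stdout, same as zcat
    gzip -dc "$1" > "$2"
}

initrd_pack() {
    # -c: write to stdout and keep the uncompressed input file
    gzip -c "$1" > "$2"
}

# Typical session (mount/umount need root):
#   initrd_unpack initrd.gz initrd
#   mkdir -p /mnt/initrd
#   mount -t ext2 -o loop,rw initrd /mnt/initrd
#   ...edit files under /mnt/initrd...
#   umount /mnt/initrd
#   initrd_pack initrd initrd.gz
```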