[Clfs-commits] [CLFS Trac] #182: UTF-8
CLFS Trac
trac at cross-lfs.org
Mon Sep 29 11:05:27 PDT 2008
#182: UTF-8
-------------------+--------------------------------------------------------
Reporter: ken | Owner: clfs-commits at lists.cross-lfs.org.
Type: task | Status: new
Priority: minor | Milestone:
Component: BOOK | Version:
Keywords: |
-------------------+--------------------------------------------------------
Couldn't find a ticket for this, so starting a new one as an aide-memoire.
If people want to use UTF-8 (and so far, there seems to be a lack of
consensus), the assumption is that it should be optional. I've been using
it for a couple of years or so, and I'm aware of at least the following
additions (there are probably others):
1. for glibc add libidn. Now that glibc no longer gets releases, I'm
going to try this with upstream libidn (v1.9), but I haven't yet.
2. for ncurses, --enable-widec so that we build the ...w versions, and
remove/replace the non-wide versions much as LFS does (ISTR the details of
how to do this are slightly different on multilib).
3. perhaps a note that if procps fails to compile in a UTF-8 system, check
what you did to ncurses.
4. for groff, optionally sed characters U+2010, U+2018, U+2019 and U+2212
to ASCII characters more likely to be found in common screen fonts, as in
LFS.
5. for man, convert the message files from various legacy encodings to
UTF-8, and similarly the supplied non-English man pages (apropos,
makewhatis, etc). I don't know if any other core packages need this. The
problem for each package is to find a message that has been translated,
and work out how to generate that error so it can be tested to see if the
translation appears or if a legacy encoding appears.
6. follow man by groff-utf8 and sed man.conf to use it.
7. alter vim to put UTF-8 pages (fr, it, pl, ru) into the language
directory instead of fr.UTF-8 etc. My notes say that Russian otherwise
goes into ru.KOI8-R, but I don't appear to do any recoding, so that needs
to be checked again - certainly, with vim-7.1 I've got UTF-8 pages
installed.
8. At the moment, I don't think there are any UTF-8 pages shipped in any
of the core packages. Shadow used to have loads, but those seem to have
been dropped when Debian rescued it. Perhaps we should have something a
bit like what is in LFS explaining how to recode pages, but with the
presumption that anyone doing this will be recoding to UTF-8. Maybe also a
note that support for non-alphabetic languages in groff-utf8 is not
perfect - sometimes there are error messages about fitting the text to the
line, e.g.
<standard input>:51: warning [p 1, 2.3i]: cannot adjust line
This applies particularly to Japanese, but maybe also to Chinese or Korean
(I can only trigger it for Japanese).
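To make item 4 concrete, here is a small Python sketch of the character mapping involved. It only illustrates the substitution itself and is not the sed that LFS applies to groff; the particular ASCII replacements chosen (hyphen-minus, grave accent, apostrophe) are an assumption, and `to_ascii` is a made-up name.

```python
# Illustrative mapping only; the replacement characters are an assumption.
ASCII_FALLBACK = str.maketrans({
    "\u2010": "-",   # HYPHEN
    "\u2018": "`",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u2212": "-",   # MINUS SIGN
})


def to_ascii(text):
    """Replace the four characters above; leave everything else alone."""
    return text.translate(ASCII_FALLBACK)


if __name__ == "__main__":
    print(to_ascii("\u2018ls \u2212l\u2019 is a well\u2010known command"))
```

Characters outside these four, including accented letters, pass through unchanged, which is the point: only the glyphs that common screen fonts tend to lack are touched.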
Doing the recoding of the man files apparently means that 'man' cannot use
legacy encodings (e.g. latin2, koi8r) - even latin1 might have oddities.
Note that man pages in UTF-8 alphabetic languages work in the console,
provided you have a suitable font. For Chinese, Japanese and Korean you
need a graphical display - rxvt-unicode works, and I assume gnome-terminal
does too.
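The recoding of man pages and message files mentioned above could be sketched roughly as follows. This is a minimal Python illustration, not what LFS or any package actually ships: `recode` and `recode_file` are hypothetical names, the source encoding has to be supplied by whoever runs it, and compressed pages would need unpacking first.

```python
import sys


def recode(data, source_encoding):
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    # Single-byte encodings such as latin_1 or koi8_r accept any byte, so a
    # wrong guess at source_encoding will not raise; check the result by eye.
    return data.decode(source_encoding).encode("utf-8")


def recode_file(path, source_encoding):
    """Rewrite one uncompressed file in place as UTF-8."""
    with open(path, "rb") as f:
        original = f.read()
    converted = recode(original, source_encoding)
    with open(path, "wb") as f:
        f.write(converted)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python recode.py koi8_r path/to/page.1
    recode_file(sys.argv[2], sys.argv[1])
```

Running it twice on the same file would mangle the text, because the second pass would treat UTF-8 bytes as the legacy encoding, so any real instructions would need to say which pages start out in which encoding.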
We would also need some explanation of why to use this (easy - it supports
multiple languages on screen at the same time, rather than just a number
of neighbouring languages, and handles "fancy quotes" sometimes found in
English pages, e.g. from smartmontools), and alternatively why not to use
it (perhaps, for people who have a large amount of text in legacy
encodings, or who need to use legacy encodings).
Discussion about the "should we do this" part on -dev, please.
--
Ticket URL: <http://trac.cross-lfs.org/ticket/182>
CLFS Trac <http://trac.cross-lfs.org>
The Cross Linux From Scratch Project.