[Clfs-commits] [CLFS Trac] #182: UTF-8

Mon Sep 29 11:05:27 PDT 2008

#182: UTF-8
-------------------+--------------------------------------------------------
 Reporter:  ken    |       Owner:  clfs-commits at lists.cross-lfs.org.
     Type:  task   |      Status:  new                              
 Priority:  minor  |   Milestone:                                   
Component:  BOOK   |     Version:                                   
 Keywords:         |  
-------------------+--------------------------------------------------------
 I couldn't find an existing ticket for this, so I'm starting a new one as
 an aide-memoire.

 If people want to use UTF-8 (and so far there seems to be a lack of
 consensus), the assumption is that it should be optional.  I've been using
 it for a couple of years or so, and I'm aware of at least the following
 additions (there are probably others):

 1. for glibc add libidn.  Now that glibc no longer gets releases, I'm
 going to try this with upstream libidn (v1.9), but I haven't yet.

 2. for ncurses, pass --enable-widec so that we build the ...w versions,
 and remove or replace the non-wide versions much as LFS does (ISTR the
 detail of how to do this is slightly different on multilib).

 3. perhaps a note that if procps fails to compile in a UTF-8 system, check
 what you did to ncurses.

 4. for groff, optionally sed characters U+2010, U+2018, U+2019 and U+2212
 to ASCII characters more likely to be found in common screen fonts, as in
 LFS.

 5. for man, convert the message files from their various legacy encodings
 to UTF-8, and similarly the supplied non-English man pages (apropos,
 makewhatis, etc).  I don't know if any other core packages need this.  The
 problem for each package is to find a message that has been translated,
 and work out how to generate that error so it can be tested to see if the
 translation appears or if a legacy encoding appears.

 6. follow man with groff-utf8, and sed man.conf so that man uses it.

 7. alter vim to put UTF-8 pages (fr, it, pl, ru) into the language
 directory instead of fr.UTF-8 etc.  My notes say that Russian otherwise
 goes into ru.KOI8-R, but I don't appear to do any recoding, so that needs
 to be checked again - certainly, with vim-7.1 I've got UTF-8 pages
 installed.

 8. At the moment, I don't think there are any UTF-8 pages shipped in any
 of the core packages.  Shadow used to have loads, but those seem to have
 been dropped when Debian rescued it.  Perhaps we should have something a
 bit like what is in LFS explaining how to recode pages, but with the
 presumption that anyone doing this will be recoding to UTF-8.  Maybe also
 a note that support for non-alphabetic languages in groff-utf8 is not
 perfect - sometimes there are error messages about fitting the text to
 the line, e.g.
 <standard input>:51: warning [p 1, 2.3i]: cannot adjust line
 This applies particularly to Japanese, but maybe also to Chinese or Korean
 (I can only trigger it for Japanese).
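
 As an illustration of the substitution in item 4, here is a minimal Python
 sketch of the character mapping.  It only demonstrates the idea on ordinary
 text; it does not touch groff's own files, and the choice of ASCII
 stand-in for each code point is my assumption, not something stated in
 this ticket.  The names ASCII_FALLBACKS and to_ascii_fallbacks are
 hypothetical.

```python
# Illustrative mapping only: which ASCII character replaces each code point
# is an assumption, not taken from the ticket or from any package.
ASCII_FALLBACKS = {
    0x2010: "-",   # HYPHEN
    0x2018: "`",   # LEFT SINGLE QUOTATION MARK
    0x2019: "'",   # RIGHT SINGLE QUOTATION MARK
    0x2212: "-",   # MINUS SIGN
}


def to_ascii_fallbacks(text: str) -> str:
    """Replace the four code points named in item 4 with ASCII stand-ins."""
    # str.translate accepts a dict keyed by code point (ordinal).
    return text.translate(ASCII_FALLBACKS)


if __name__ == "__main__":
    sample = "\u2018quoted\u2019 well\u2010known \u22121"
    print(to_ascii_fallbacks(sample))
```

 Any other characters pass through unchanged, so text that is already
 plain ASCII is unaffected.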

 Doing the recoding of the man files apparently means that 'man' cannot use
 legacy encodings (e.g. latin2, koi8r) - even latin1 might have oddities.
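
 The recoding step itself might look something like the Python sketch
 below.  It assumes you already know which legacy encoding a given page
 uses (e.g. from its directory name); the function name recode_to_utf8 is
 hypothetical.  The "already UTF-8?" check is only a heuristic, since a
 legacy byte sequence could in principle happen to be valid UTF-8 as well.

```python
def recode_to_utf8(data: bytes, legacy_encoding: str) -> bytes:
    """Return data as UTF-8, decoding from legacy_encoding when needed."""
    try:
        # Pure ASCII and existing UTF-8 both decode cleanly; leave them alone.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        pass
    # Not valid UTF-8, so treat it as the stated legacy encoding.
    return data.decode(legacy_encoding).encode("utf-8")


if __name__ == "__main__":
    # Simulate a Russian page stored in KOI8-R.
    legacy_page = "привет".encode("koi8-r")
    print(recode_to_utf8(legacy_page, "koi8-r").decode("utf-8"))
```

 Running it a second time on its own output is harmless, because the
 result is then valid UTF-8 and is returned unchanged.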

 Note that UTF-8 man pages in alphabetic languages work in the console,
 provided you have a suitable font.  For Chinese, Japanese and Korean you
 need a graphical display - rxvt-unicode works, and I assume gnome-terminal
 does too.

 We would also need some explanation of why to use this (easy - it supports
 multiple languages on screen at the same time, rather than just a number
 of neighbouring languages, and handles "fancy quotes" sometimes found in
 English pages, e.g. from smartmontools), and alternatively why not to use
 it (perhaps for people who have a large amount of text in legacy
 encodings, or who need to use legacy encodings).

  Discussion about the "should we do this" part on -dev, please.

-- 
Ticket URL: <http://trac.cross-lfs.org/ticket/182>
CLFS Trac <http://trac.cross-lfs.org>
The Cross Linux From Scratch Project.