Encoding.3
- '"
- '" Copyright (c) 1997-1998 Sun Microsystems, Inc.
- '"
- '" See the file "license.terms" for information on usage and redistribution
- '" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
- '"
- '" RCS: @(#) $Id: Encoding.3,v 1.11.2.1 2003/07/18 16:56:24 dgp Exp $
- '"
- .so man.macros
- .TH Tcl_GetEncoding 3 "8.1" Tcl "Tcl Library Procedures"
- .BS
- .SH NAME
- Tcl_GetEncoding, Tcl_FreeEncoding, Tcl_ExternalToUtfDString, Tcl_ExternalToUtf, Tcl_UtfToExternalDString, Tcl_UtfToExternal, Tcl_WinTCharToUtf, Tcl_WinUtfToTChar, Tcl_GetEncodingName, Tcl_SetSystemEncoding, Tcl_GetEncodingNames, Tcl_CreateEncoding, Tcl_GetDefaultEncodingDir, Tcl_SetDefaultEncodingDir - procedures for creating and using encodings.
- .SH SYNOPSIS
- .nf
- fB#include <tcl.h>fR
- .sp
- Tcl_Encoding
- fBTcl_GetEncodingfR(fIinterp, namefR)
- .sp
- void
- fBTcl_FreeEncodingfR(fIencodingfR)
- .sp
- char *
- fBTcl_ExternalToUtfDStringfR(fIencoding, src, srcLen, dstPtrfR)
- .sp
- int
- fBTcl_ExternalToUtffR(fIinterp, encoding, src, srcLen, flags, statePtr, dst, dstLen, srcReadPtr, dstWrotePtr,
- dstCharsPtrfR)
- .sp
- char *
- fBTcl_UtfToExternalDStringfR(fIencoding, src, srcLen, dstPtrfR)
- .sp
- int
- fBTcl_UtfToExternalfR(fIinterp, encoding, src, srcLen, flags, statePtr, dst, dstLen, srcReadPtr, dstWrotePtr,
- dstCharsPtrfR)
- .sp
- char *
- fBTcl_WinTCharToUtffR(fItsrc, srcLen, dstPtrfR)
- .sp
- TCHAR *
- fBTcl_WinUtfToTCharfR(fIsrc, srcLen, dstPtrfR)
- .sp
- CONST char *
- fBTcl_GetEncodingNamefR(fIencodingfR)
- .sp
- int
- fBTcl_SetSystemEncodingfR(fIinterp, namefR)
- .sp
- void
- fBTcl_GetEncodingNamesfR(fIinterpfR)
- .sp
- Tcl_Encoding
- fBTcl_CreateEncodingfR(fItypePtrfR)
- .sp
- CONST char *
- fBTcl_GetDefaultEncodingDirfR(fIvoidfR)
- .sp
- void
- fBTcl_SetDefaultEncodingDirfR(fIpathfR)
- .SH ARGUMENTS
- .AS Tcl_EncodingState *dstWrotePtr
- .AP Tcl_Interp *interp in
- Interpreter to use for error reporting, or NULL if no error reporting is
- desired.
- .AP "CONST char" *name in
- Name of encoding to load.
- .AP Tcl_Encoding encoding in
- The encoding to query, free, or use for converting text. If fIencodingfR is
- NULL, the current system encoding is used.
- .AP "CONST char" *src in
- For the fBTcl_ExternalToUtffR functions, an array of bytes in the
- specified encoding that are to be converted to UTF-8. For the
- fBTcl_UtfToExternalfR and fBTcl_WinUtfToTCharfR functions, an array of
- UTF-8 characters to be converted to the specified encoding.
- .AP "CONST TCHAR" *tsrc in
- An array of Windows TCHAR characters to convert to UTF-8.
- .AP int srcLen in
- Length of fIsrcfR or fItsrcfR in bytes. If the length is negative, the
- encoding-specific length of the string is used.
- .AP Tcl_DString *dstPtr out
- Pointer to an uninitialized or free fBTcl_DStringfR in which the converted
- result will be stored.
- .AP int flags in
- Various flag bits OR-ed together.
- TCL_ENCODING_START signifies that the
- source buffer is the first block in a (potentially multi-block) input
- stream, telling the conversion routine to reset to an initial state and
- perform any initialization that needs to occur before the first byte is
- converted. TCL_ENCODING_END signifies that the source buffer is the last
- block in a (potentially multi-block) input stream, telling the conversion
- routine to perform any finalization that needs to occur after the last
- byte is converted and then to reset to an initial state.
- TCL_ENCODING_STOPONERROR signifies that the conversion routine should
- return immediately upon reading a source character that doesn't exist in
- the target encoding; otherwise a default fallback character will
- automatically be substituted.
- .AP Tcl_EncodingState *statePtr in/out
- Used when converting a (generally long or indefinite length) byte stream
- in a piece by piece fashion. The conversion routine stores its current
- state in fI*statePtrfR after fIsrcfR (the buffer containing the
- current piece) has been converted; that state information must be passed
- back when converting the next piece of the stream so the conversion
- routine knows what state it was in when it left off at the end of the
- last piece. May be NULL, in which case the value specified for fIflagsfR
- is ignored and the source buffer is assumed to contain the complete string to
- convert.
- .AP char *dst out
- Buffer in which the converted result will be stored. No more than
- fIdstLenfR bytes will be stored in fIdstfR.
- .AP int dstLen in
- The maximum length of the output buffer fIdstfR in bytes.
- .AP int *srcReadPtr out
- Filled with the number of bytes from fIsrcfR that were actually
- converted. This may be less than the original source length if there was
- a problem converting some source characters. May be NULL.
- .AP int *dstWrotePtr out
- Filled with the number of bytes that were actually stored in the output
- buffer as a result of the conversion. May be NULL.
- .AP int *dstCharsPtr out
- Filled with the number of characters that correspond to the number of bytes
- stored in the output buffer. May be NULL.
- .AP Tcl_EncodingType *typePtr in
- Structure that defines a new type of encoding.
- .AP "CONST char" *path in
- A path to the location of the encoding file.
- .BE
- .SH INTRODUCTION
- .PP
- These routines convert between Tcl's internal character representation,
- UTF-8, and character representations used by various operating systems or
- file systems, such as Unicode, ASCII, or Shift-JIS. When operating on
- strings, such as obtaining the names of files or displaying
- characters using international fonts, the strings must be translated into
- one or possibly multiple formats that the various system calls can use. For
- instance, on a Japanese Unix workstation, a user might obtain a filename
- represented in the EUC-JP file encoding and then translate the characters to
- the jisx0208 font encoding in order to display the filename in a Tk widget.
- The purpose of the encoding package is to help bridge the translation gap.
- UTF-8 provides an intermediate staging ground for all the various
- encodings. In the example above, text would be translated into UTF-8 from
- whatever file encoding the operating system is using. Then it would be
- translated from UTF-8 into whatever font encoding the display routines
- require.
- .PP
- Some basic encodings are compiled into Tcl. Others can be defined by the
- user or dynamically loaded from encoding files in a
- platform-independent manner.
- .SH DESCRIPTION
- .PP
- fBTcl_GetEncodingfR finds an encoding given its fInamefR. The name may
- refer to a builtin Tcl encoding, a user-defined encoding registered by
- calling fBTcl_CreateEncodingfR, or a dynamically-loadable encoding
- file. The return value is a token that represents the encoding and can be
- used in subsequent calls to procedures such as fBTcl_GetEncodingNamefR,
- fBTcl_FreeEncodingfR, and fBTcl_UtfToExternalfR. If the name did not
- refer to any known or loadable encoding, NULL is returned and an error
- message is left in fIinterpfR.
- .PP
- The encoding package maintains a database of all encodings currently in use.
- The first time fInamefR is seen, fBTcl_GetEncodingfR returns an
- encoding with a reference count of 1. If the same fInamefR is requested
- further times, then the reference count for that encoding is incremented
- without the overhead of allocating a new encoding and all its associated
- data structures.
- .PP
- When an fIencodingfR is no longer needed, fBTcl_FreeEncodingfR
- should be called to release it. When an fIencodingfR is no longer in use
- anywhere (i.e., it has been freed as many times as it has been gotten)
- fBTcl_FreeEncodingfR will release all storage the encoding was using
- and delete it from the database.
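The get/free pairing described above can be pictured with a toy reference-counted table. This is only an illustrative sketch: the names ToyEncoding, ToyGetEncoding, and ToyFreeEncoding are hypothetical, the table size is arbitrary, and nothing here is Tcl's actual database implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENCODINGS 8   /* arbitrary size for this sketch */

/* Hypothetical stand-in for an encoding entry; not a Tcl type. */
typedef struct {
    char name[32];
    int refCount;          /* 0 means the slot is unused */
} ToyEncoding;

static ToyEncoding table[MAX_ENCODINGS];

/* Returns the entry for name, creating it with refCount 1 if unseen. */
ToyEncoding *ToyGetEncoding(const char *name)
{
    ToyEncoding *freeSlot = NULL;
    int i;

    if (strlen(name) >= sizeof table[0].name) {
        return NULL;
    }
    for (i = 0; i < MAX_ENCODINGS; i++) {
        if (table[i].refCount > 0 && strcmp(table[i].name, name) == 0) {
            table[i].refCount++;        /* seen before: just count it */
            return &table[i];
        }
        if (table[i].refCount == 0 && freeSlot == NULL) {
            freeSlot = &table[i];
        }
    }
    if (freeSlot == NULL) {
        return NULL;                    /* table full */
    }
    strcpy(freeSlot->name, name);
    freeSlot->refCount = 1;
    return freeSlot;
}

/* Drops one reference; the slot is released when the count reaches 0. */
void ToyFreeEncoding(ToyEncoding *enc)
{
    if (enc != NULL && enc->refCount > 0 && --enc->refCount == 0) {
        enc->name[0] = '\0';
    }
}
```

In this model, an entry obtained twice must be freed twice before its slot becomes available again, which mirrors the rule that an encoding is deleted only after it has been freed as many times as it has been gotten.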
- .PP
- fBTcl_ExternalToUtfDStringfR converts a source buffer fIsrcfR from the
- specified fIencodingfR into UTF-8. The converted bytes are stored in
- fIdstPtrfR, which is then null-terminated. The caller should eventually
- call fBTcl_DStringFreefR to free any information stored in fIdstPtrfR.
- When converting, if any of the characters in the source buffer cannot be
- represented in the target encoding, a default fallback character will be
- used. The return value is a pointer to the value stored in the DString.
- .PP
- fBTcl_ExternalToUtffR converts a source buffer fIsrcfR from the specified
- fIencodingfR into UTF-8. Up to fIsrcLenfR bytes are converted from the
- source buffer and up to fIdstLenfR converted bytes are stored in fIdstfR.
- In all cases, fI*srcReadPtrfR is filled with the number of bytes that were
- successfully converted from fIsrcfR and fI*dstWrotePtrfR is filled with
- the corresponding number of bytes that were stored in fIdstfR. The return
- value is one of the following:
- .RS
- .IP fBTCL_OKfR 29
- All bytes of fIsrcfR were converted.
- .IP fBTCL_CONVERT_NOSPACEfR 29
- The destination buffer was not large enough for all of the converted data; as
- many characters as could fit were converted though.
- .IP fBTCL_CONVERT_MULTIBYTEfR 29
- The last few bytes in the source buffer were the beginning of a multibyte
- sequence, but more bytes were needed to complete this sequence. A
- subsequent call to the conversion routine should pass a buffer containing
- the unconverted bytes that remained in fIsrcfR plus some further bytes
- from the source stream to properly convert the formerly split-up multibyte
- sequence.
- .IP fBTCL_CONVERT_SYNTAXfR 29
- The source buffer contained an invalid character sequence. This may occur
- if the input stream has been damaged or if the input encoding method was
- misidentified.
- .IP fBTCL_CONVERT_UNKNOWNfR 29
- The source buffer contained a character that could not be represented in
- the target encoding and TCL_ENCODING_STOPONERROR was specified.
- .RE
- .LP
- fBTcl_UtfToExternalDStringfR converts a source buffer fIsrcfR from UTF-8
- into the specified fIencodingfR. The converted bytes are stored in
- fIdstPtrfR, which is then terminated with the appropriate encoding-specific
- null. The caller should eventually call fBTcl_DStringFreefR to free any
- information stored in fIdstPtrfR. When converting, if any of the
- characters in the source buffer cannot be represented in the target
- encoding, a default fallback character will be used. The return value is
- a pointer to the value stored in the DString.
- .PP
- fBTcl_UtfToExternalfR converts a source buffer fIsrcfR from UTF-8 into
- the specified fIencodingfR. Up to fIsrcLenfR bytes are converted from
- the source buffer and up to fIdstLenfR converted bytes are stored in
- fIdstfR. In all cases, fI*srcReadPtrfR is filled with the number of
- bytes that were successfully converted from fIsrcfR and fI*dstWrotePtrfR
- is filled with the corresponding number of bytes that were stored in
- fIdstfR. The return values are the same as the return values for
- fBTcl_ExternalToUtffR.
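When UTF-8 text is fed to a converter in pieces, a piece may end in the middle of a multibyte sequence, which is the situation TCL_CONVERT_MULTIBYTE reports. The following self-contained sketch shows how that boundary condition can be detected for UTF-8 input. The function name IncompleteUtf8Tail is hypothetical and not part of Tcl, and the sketch assumes standard UTF-8 lead-byte patterns for 2-, 3-, and 4-byte sequences.

```c
#include <stddef.h>

/*
 * Returns how many bytes at the end of buf[0..len) belong to a UTF-8
 * sequence whose lead byte is present but whose continuation bytes are
 * not all there yet. Returns 0 if the buffer ends on a character
 * boundary (or if the tail is not a valid partial sequence).
 */
size_t IncompleteUtf8Tail(const unsigned char *buf, size_t len)
{
    size_t back;

    /* Walk backwards over at most 4 bytes looking for a lead byte. */
    for (back = 1; back <= 4 && back <= len; back++) {
        unsigned char c = buf[len - back];
        size_t need;

        if ((c & 0xC0) == 0x80) {
            continue;                   /* continuation byte */
        }
        if ((c & 0xE0) == 0xC0) {
            need = 2;
        } else if ((c & 0xF0) == 0xE0) {
            need = 3;
        } else if ((c & 0xF8) == 0xF0) {
            need = 4;
        } else {
            return 0;                   /* ASCII or not a lead byte */
        }
        /* Incomplete only if fewer bytes are present than needed. */
        return (back < need) ? back : 0;
    }
    return 0;
}
```

A caller could hold back the reported number of trailing bytes and prepend them to the next piece of the stream. With the Tcl routines themselves, the value stored in *srcReadPtr already says how many source bytes were consumed.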
- .PP
- fBTcl_WinUtfToTCharfR and fBTcl_WinTCharToUtffR are
- Windows-only convenience
- functions for converting between UTF-8 and Windows strings. On Windows 95
- (as with the Macintosh and Unix operating systems),
- all strings exchanged between Tcl and the operating system are "char"
- based. On Windows NT, some strings exchanged between Tcl and the
- operating system are "char" oriented while others are in Unicode. By
- convention, in Windows a TCHAR is a character in the ANSI code page
- on Windows 95 and a Unicode character on Windows NT.
- .PP
- If you planned to use the same "char" based interfaces on both Windows
- 95 and Windows NT, you could use fBTcl_UtfToExternalfR and
- fBTcl_ExternalToUtffR (or their fBTcl_DStringfR equivalents) with an
- encoding of NULL (the current system encoding). On the other hand,
- if you planned to use the Unicode interface when running on Windows NT
- and the "char" interfaces when running on Windows 95, you would have
- to perform the following type of test over and over in your program
- (as represented in pseudo-code):
- .CS
- if (running NT) {
-     encoding <- Tcl_GetEncoding("unicode");
-     nativeBuffer <- Tcl_UtfToExternal(encoding, utfBuffer);
-     Tcl_FreeEncoding(encoding);
- } else {
-     nativeBuffer <- Tcl_UtfToExternal(NULL, utfBuffer);
- }
- .CE
- fBTcl_WinUtfToTCharfR and fBTcl_WinTCharToUtffR automatically
- handle this test and use the proper encoding based on the current
- operating system. fBTcl_WinUtfToTCharfR returns a pointer to
- a TCHAR string, and fBTcl_WinTCharToUtffR expects a TCHAR string
- pointer as the fIsrcfR string. Otherwise, these functions
- behave identically to fBTcl_UtfToExternalDStringfR and
- fBTcl_ExternalToUtfDStringfR.
- .PP
- fBTcl_GetEncodingNamefR is roughly the inverse of fBTcl_GetEncodingfR.
- Given an fIencodingfR, the return value is the fInamefR argument that
- was used to create the encoding. The string returned by
- fBTcl_GetEncodingNamefR is only guaranteed to persist until the
- fIencodingfR is deleted. The caller must not modify this string.
- .PP
- fBTcl_SetSystemEncodingfR sets the default encoding that should be used
- whenever the user passes a NULL value for the fIencodingfR argument to
- any of the other encoding functions. If fInamefR is NULL, the system
- encoding is reset to the default system encoding, fBbinaryfR. If the
- name did not refer to any known or loadable encoding, TCL_ERROR is
- returned and an error message is left in fIinterpfR. Otherwise, this
- procedure increments the reference count of the new system encoding,
- decrements the reference count of the old system encoding, and returns
- TCL_OK.
- .PP
- fBTcl_GetEncodingNamesfR sets the fIinterpfR result to a list
- consisting of the names of all the encodings that are currently defined
- or can be dynamically loaded, searching the encoding path specified by
- fBTcl_SetDefaultEncodingDirfR. This procedure does not ensure that the
- dynamically-loadable encoding files contain valid data, but merely that they
- exist.
- .PP
- fBTcl_CreateEncodingfR defines a new encoding and registers the C
- procedures that are called back to convert between the encoding and
- UTF-8. Encodings created by fBTcl_CreateEncodingfR are thereafter
- visible in the database used by fBTcl_GetEncodingfR. Just as with the
- fBTcl_GetEncodingfR procedure, the return value is a token that
- represents the encoding and can be used in subsequent calls to other
- encoding functions. fBTcl_CreateEncodingfR returns an encoding with a
- reference count of 1. If an encoding with the specified fInamefR
- already exists, then its entry in the database is replaced with the new
- encoding; the token for the old encoding will remain valid and continue
- to behave as before, but users of the new token will now call the new
- encoding procedures.
- .PP
- The fItypePtrfR argument to fBTcl_CreateEncodingfR contains information
- about the name of the encoding and the procedures that will be called to
- convert between this encoding and UTF-8. It is defined as follows:
- .PP
- .CS
- typedef struct Tcl_EncodingType {
-     CONST char *fIencodingNamefR;
-     Tcl_EncodingConvertProc *fItoUtfProcfR;
-     Tcl_EncodingConvertProc *fIfromUtfProcfR;
-     Tcl_EncodingFreeProc *fIfreeProcfR;
-     ClientData fIclientDatafR;
-     int fInullSizefR;
- } Tcl_EncodingType;
- .CE
- .PP
- The fIencodingNamefR provides a string name for the encoding, by
- which it can be referred to in other procedures such as
- fBTcl_GetEncodingfR. The fItoUtfProcfR refers to a callback
- procedure to invoke to convert text from this encoding into UTF-8.
- The fIfromUtfProcfR refers to a callback procedure to invoke to
- convert text from UTF-8 into this encoding. The fIfreeProcfR refers
- to a callback procedure to invoke when this encoding is deleted. The
- fIfreeProcfR field may be NULL. The fIclientDatafR contains an
- arbitrary one-word value passed to fItoUtfProcfR, fIfromUtfProcfR,
- and fIfreeProcfR whenever they are called. Typically, this is a
- pointer to a data structure containing encoding-specific information
- that can be used by the callback procedures. For instance, two very
- similar encodings such as fBasciifR and fBmacRomanfR may use the
- same callback procedure, but use different values of fIclientDatafR
- to control its behavior. The fInullSizefR specifies the number of
- zero bytes that signify end-of-string in this encoding. It must be
- fB1fR (for single-byte or multi-byte encodings like ASCII or
- Shift-JIS) or fB2fR (for double-byte encodings like Unicode).
- Constant-sized encodings with 3 or more bytes per character (such as
- CNS11643) are not accepted.
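The role of the null size can be illustrated by computing the encoding-specific length of a terminated string, which is what a negative fIsrcLenfR asks for. The sketch below is an assumption-laden illustration, not Tcl's code: EncodedLength is a hypothetical name, and the two-byte case simply scans aligned byte pairs for a pair of zero bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns the length in bytes of src up to, but not including, its
 * terminator. nullSize is the number of zero bytes that end a string:
 * 1 for single-byte or multi-byte encodings, 2 for double-byte ones.
 * Returns -1 for any other nullSize.
 */
long EncodedLength(const char *src, int nullSize)
{
    if (nullSize == 1) {
        return (long) strlen(src);
    }
    if (nullSize == 2) {
        size_t i = 0;

        /* Step over two-byte units until both bytes are zero. */
        while (src[i] != '\0' || src[i + 1] != '\0') {
            i += 2;
        }
        return (long) i;
    }
    return -1;
}
```

The distinction matters because a double-byte string may contain single zero bytes inside characters, so a plain byte-wise scan would stop too early.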
- .PP
- The callback procedures fItoUtfProcfR and fIfromUtfProcfR should match the
- type fBTcl_EncodingConvertProcfR:
- .PP
- .CS
- typedef int Tcl_EncodingConvertProc(
-     ClientData fIclientDatafR,
-     CONST char *fIsrcfR,
-     int fIsrcLenfR,
-     int fIflagsfR,
-     Tcl_EncodingState *fIstatePtrfR,
-     char *fIdstfR,
-     int fIdstLenfR,
-     int *fIsrcReadPtrfR,
-     int *fIdstWrotePtrfR,
-     int *fIdstCharsPtrfR);
- .CE
- .PP
- The fItoUtfProcfR and fIfromUtfProcfR procedures are called by the
- fBTcl_ExternalToUtffR or fBTcl_UtfToExternalfR family of functions to
- perform the actual conversion. The fIclientDatafR parameter to these
- procedures is the same as the fIclientDatafR field specified to
- fBTcl_CreateEncodingfR when the encoding was created. The remaining
- arguments to the callback procedures are the same as the arguments,
- documented at the top, to fBTcl_ExternalToUtffR or
- fBTcl_UtfToExternalfR, with the following exceptions. If the
- fIsrcLenfR argument to one of those high-level functions is negative,
- the value passed to the callback procedure will be the appropriate
- encoding-specific string length of fIsrcfR. If any of the fIsrcReadPtrfR,
- fIdstWrotePtrfR, or fIdstCharsPtrfR arguments to one of the high-level
- functions is NULL, the corresponding value passed to the callback
- procedure will be a non-NULL location.
- .PP
- The callback procedure fIfreeProcfR, if non-NULL, should match the type
- fBTcl_EncodingFreeProcfR:
- .CS
- typedef void Tcl_EncodingFreeProc(
-     ClientData fIclientDatafR);
- .CE
- .PP
- This fIfreeProcfR function is called when the encoding is deleted. The
- fIclientDatafR parameter is the same as the fIclientDatafR field
- specified to fBTcl_CreateEncodingfR when the encoding was created.
- .PP
- fBTcl_GetDefaultEncodingDirfR and fBTcl_SetDefaultEncodingDirfR
- access and set the directory to use when locating the default encoding
- files. If this value is not NULL, the fBTclpInitLibraryPathfR routine
- places the path at the head of the search path, so that it is
- the first place searched when trying to locate the encoding file.
- .SH "ENCODING FILES"
- Space would prohibit precompiling into Tcl every possible encoding
- algorithm, so many encodings are stored on disk as dynamically-loadable
- encoding files. This behavior also allows the user to create additional
- encoding files that can be loaded using the same mechanism. These
- encoding files contain information about the tables and/or escape
- sequences used to map between an external encoding and Unicode. The
- external encoding may consist of single-byte, multi-byte, or double-byte
- characters.
- .PP
- Each dynamically-loadable encoding is represented as a text file. The
- initial line of the file, beginning with a ``#'' symbol, is a comment
- that provides a human-readable description of the file. The next line
- identifies the type of encoding file. It can be one of the following
- letters:
- .IP "[1] fBSfR"
- A single-byte encoding, where one character is always one byte long in the
- encoding. An example is fBiso8859-1fR, used by many European languages.
- .IP "[2] fBDfR"
- A double-byte encoding, where one character is always two bytes long in the
- encoding. An example is fBbig5fR, used for Chinese text.
- .IP "[3] fBMfR"
- A multi-byte encoding, where one character may be either one or two bytes long.
- Certain bytes are lead bytes, indicating that another byte must follow
- and that together the two bytes represent one character. Other bytes are not
- lead bytes and represent themselves. An example is fBshiftjisfR, used by
- many Japanese computers.
- .IP "[4] fBEfR"
- An escape-sequence encoding, specifying that certain sequences of bytes
- do not represent characters, but commands that describe how following bytes
- should be interpreted.
- .PP
- The rest of the lines in the file depend on the type.
- .PP
- Cases [1], [2], and [3] are collectively referred to as table-based encoding
- files. The lines in a table-based encoding file are in the same
- format as this example taken from the fBshiftjisfR encoding (this is not
- the complete file):
- .CS
- # Encoding file: shiftjis, multi-byte
- M
- 003F 0 40
- 00
- 0000000100020003000400050006000700080009000A000B000C000D000E000F
- 0010001100120013001400150016001700180019001A001B001C001D001E001F
- 0020002100220023002400250026002700280029002A002B002C002D002E002F
- 0030003100320033003400350036003700380039003A003B003C003D003E003F
- 0040004100420043004400450046004700480049004A004B004C004D004E004F
- 0050005100520053005400550056005700580059005A005B005C005D005E005F
- 0060006100620063006400650066006700680069006A006B006C006D006E006F
- 0070007100720073007400750076007700780079007A007B007C007D203E007F
- 0080000000000000000000000000000000000000000000000000000000000000
- 0000000000000000000000000000000000000000000000000000000000000000
- 0000FF61FF62FF63FF64FF65FF66FF67FF68FF69FF6AFF6BFF6CFF6DFF6EFF6F
- FF70FF71FF72FF73FF74FF75FF76FF77FF78FF79FF7AFF7BFF7CFF7DFF7EFF7F
- FF80FF81FF82FF83FF84FF85FF86FF87FF88FF89FF8AFF8BFF8CFF8DFF8EFF8F
- FF90FF91FF92FF93FF94FF95FF96FF97FF98FF99FF9AFF9BFF9CFF9DFF9EFF9F
- 0000000000000000000000000000000000000000000000000000000000000000
- 0000000000000000000000000000000000000000000000000000000000000000
- 81
- 0000000000000000000000000000000000000000000000000000000000000000
- 0000000000000000000000000000000000000000000000000000000000000000
- 0000000000000000000000000000000000000000000000000000000000000000
- 0000000000000000000000000000000000000000000000000000000000000000
- 300030013002FF0CFF0E30FBFF1AFF1BFF1FFF01309B309C00B4FF4000A8FF3E
- FFE3FF3F30FD30FE309D309E30034EDD30053006300730FC20152010FF0F005C
- 301C2016FF5C2026202520182019201C201DFF08FF0930143015FF3BFF3DFF5B
- FF5D30083009300A300B300C300D300E300F30103011FF0B221200B100D70000
- 00F7FF1D2260FF1CFF1E22662267221E22342642264000B0203220332103FFE5
- FF0400A200A3FF05FF03FF06FF0AFF2000A72606260525CB25CF25CE25C725C6
- 25A125A025B325B225BD25BC203B301221922190219121933013000000000000
- 000000000000000000000000000000002208220B2286228722822283222A2229
- 000000000000000000000000000000002227222800AC21D221D4220022030000
- 0000000000000000000000000000000000000000222022A52312220222072261
- 2252226A226B221A223D221D2235222B222C0000000000000000000000000000
- 212B2030266F266D266A2020202100B6000000000000000025EF000000000000
- .CE
- .PP
- The third line of the file is three numbers. The first number is the
- fallback character (in base 16) to use when converting from UTF-8 to this
- encoding. The second number is a fB1fR if this file represents the
- encoding for a symbol font, or fB0fR otherwise. The last number (in base
- 10) is how many pages of data follow.
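As a rough illustration of the three fields just described, the line could be parsed as follows. This is a sketch only: EncFileHeader and ParseEncHeader are hypothetical names, and Tcl's own loader is not shown here.

```c
#include <stdio.h>

/* Hypothetical holder for the three numbers on the third line. */
typedef struct {
    unsigned int fallback;  /* fallback character, read as base 16 */
    int symbol;             /* 1 for a symbol font encoding, else 0 */
    int numPages;           /* number of pages that follow, base 10 */
} EncFileHeader;

/* Returns 1 if line holds three valid fields, 0 otherwise. */
int ParseEncHeader(const char *line, EncFileHeader *hdr)
{
    if (sscanf(line, "%x %d %d", &hdr->fallback, &hdr->symbol,
            &hdr->numPages) != 3) {
        return 0;
    }
    return (hdr->symbol == 0 || hdr->symbol == 1) && hdr->numPages >= 0;
}
```

For the shiftjis sample, the line "003F 0 40" yields a fallback of 0x3F (the question mark), a symbol flag of 0, and 40 pages.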
- .PP
- Subsequent lines in the example above are pages that describe how to map
- from the encoding into 2-byte Unicode. The first line in a page identifies
- the page number. Following it are 256 double-byte numbers, arranged as 16
- rows of 16 numbers. Given a character in the encoding, the high byte of
- that character is used to select which page, and the low byte of that
- character is used as an index to select one of the double-byte numbers in
- that page - the value obtained being the corresponding Unicode character.
- By examination of the example above, one can see that the characters 0x7E
- and 0x8163 in fBshiftjisfR map to 203E and 2026 in Unicode, respectively.
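The page lookup can be sketched in a few lines of C. The helper names below (DecodePageRow, PageNumber, PageIndex) are hypothetical and chosen for this illustration; the sketch assumes each row is exactly 64 hexadecimal digits, as in the sample file.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int HexVal(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

/*
 * Decodes one row of a page (16 four-digit hex numbers) into out[0..15].
 * Returns 1 on success, 0 if the row is short or has a bad digit.
 */
int DecodePageRow(const char *row, unsigned short out[16])
{
    int i, j;

    for (i = 0; i < 16; i++) {
        unsigned int v = 0;

        for (j = 0; j < 4; j++) {
            int d = HexVal(row[i * 4 + j]);

            if (d < 0) {
                return 0;   /* also stops at an early end of string */
            }
            v = (v << 4) | (unsigned int) d;
        }
        out[i] = (unsigned short) v;
    }
    return 1;
}

/* The high byte of a character selects the page. */
unsigned int PageNumber(unsigned int ch)
{
    return (ch >> 8) & 0xFF;
}

/* The low byte selects one of the 256 entries within the page. */
unsigned int PageIndex(unsigned int ch)
{
    return ch & 0xFF;
}
```

Decoding row 7 of page 00 into entries 0x70 through 0x7F, and row 6 of page 81 into entries 0x60 through 0x6F, reproduces the two mappings cited above: entry 0x7E of page 00 is 203E, and entry 0x63 of page 81 is 2026.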
- .PP
- Following the first page will be all the other pages, each in the same
- format as the first: one number identifying the page followed by 256
- double-byte Unicode characters. If a character in the encoding maps to the
- Unicode character 0000, it means that the character doesn't actually exist.
- If all characters on a page would map to 0000, that page can be omitted.
- .PP
- Case [4] is the escape-sequence encoding file. The lines in this type of
- file are in the same format as this example taken from the fBiso2022-jpfR
- encoding:
- .CS
- .ta 1.5i
- # Encoding file: iso2022-jp, escape-driven
- E
- init {}
- final {}
- iso8859-1 \x1b(B
- jis0201 \x1b(J
- jis0208 \x1b$@
- jis0208 \x1b$B
- jis0212 \x1b$(D
- gb2312 \x1b$A
- ksc5601 \x1b$(C
- .CE
- .PP
- In the file, the first column represents an option and the second column
- is the associated value. fBinitfR is a string to emit or expect before
- the first character is converted, while fBfinalfR is a string to emit
- or expect after the last character. All other options are names of
- table-based encodings; the associated value is the escape-sequence that
- marks that encoding. Tcl syntax is used for the values; in the above
- example, for instance, ``fB{}fR'' represents the empty string and
- ``fB\x1bfR'' represents character 27.
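A minimal sketch of reading one option line from such a file follows. It is deliberately limited: ParseEscapeLine is a hypothetical name, and the sketch understands only the two forms that appear in the sample ("{}" for the empty string, and a backslash-x escape followed by at most two hex digits), not full Tcl syntax.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a character already known to be a hex digit. */
static int HexDigit(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return tolower(c) - 'a' + 10;
}

/*
 * Splits "name value" into name (NUL-terminated) and the decoded bytes
 * of value in seq. Returns the number of bytes stored in seq, or -1 if
 * the name is missing or either buffer is too small.
 */
int ParseEscapeLine(const char *line, char *name, size_t nameSize,
        char *seq, size_t seqSize)
{
    const char *p = line;
    size_t n = 0, s = 0;

    /* First column: the option name, up to the first whitespace. */
    while (*p != '\0' && !isspace((unsigned char) *p)) {
        if (n + 1 >= nameSize) {
            return -1;
        }
        name[n++] = *p++;
    }
    if (n == 0) {
        return -1;
    }
    name[n] = '\0';
    while (isspace((unsigned char) *p)) {
        p++;
    }

    /* "{}" is the Tcl spelling of the empty string. */
    if (p[0] == '{' && p[1] == '}'
            && (p[2] == '\0' || isspace((unsigned char) p[2]))) {
        return 0;
    }

    /* Second column: literal bytes, with \xHH decoded to one byte. */
    while (*p != '\0' && !isspace((unsigned char) *p)) {
        unsigned int v;

        if (p[0] == '\\' && p[1] == 'x' && isxdigit((unsigned char) p[2])) {
            int digits = 0;

            v = 0;
            p += 2;
            while (digits < 2 && isxdigit((unsigned char) *p)) {
                v = v * 16 + (unsigned int) HexDigit((unsigned char) *p);
                p++;
                digits++;
            }
        } else {
            v = (unsigned char) *p++;
        }
        if (s >= seqSize) {
            return -1;
        }
        seq[s++] = (char) v;
    }
    return (int) s;
}
```

Applied to the jis0208 line ending in $B, this yields the three bytes 27, '$', 'B'; applied to the init line, it yields an empty sequence.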
- .PP
- When fBTcl_GetEncodingfR encounters an encoding fInamefR that has not
- been loaded, it attempts to load an encoding file called fInamefB.encfR
- from the fBencodingfR subdirectory of each directory specified in the
- library path fB$tcl_libPathfR. If the encoding file exists, but is
- malformed, an error message will be left in fIinterpfR.
- .SH KEYWORDS
- utf, encoding, convert