Utf.3
'\"
'\" Copyright (c) 1997 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
'\" RCS: @(#) $Id: Utf.3,v 1.13.2.2 2003/07/18 22:15:45 dkf Exp $
'\"
.so man.macros
.TH Utf 3 "8.1" Tcl "Tcl Library Procedures"
.BS
.SH NAME
Tcl_UniChar, Tcl_UniCharCaseMatch, Tcl_UniCharNcasecmp, Tcl_UniCharToUtf, Tcl_UtfToUniChar, Tcl_UniCharToUtfDString, Tcl_UtfToUniCharDString, Tcl_UniCharLen, Tcl_UniCharNcmp, Tcl_UtfCharComplete, Tcl_NumUtfChars, Tcl_UtfFindFirst, Tcl_UtfFindLast, Tcl_UtfNext, Tcl_UtfPrev, Tcl_UniCharAtIndex, Tcl_UtfAtIndex, Tcl_UtfBackslash \- routines for manipulating UTF-8 strings.
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
typedef ... Tcl_UniChar;
.sp
int
\fBTcl_UniCharToUtf\fR(\fIch, buf\fR)
.sp
int
\fBTcl_UtfToUniChar\fR(\fIsrc, chPtr\fR)
.VS 8.4
.sp
char *
\fBTcl_UniCharToUtfDString\fR(\fIuniStr, numChars, dstPtr\fR)
.sp
Tcl_UniChar *
\fBTcl_UtfToUniCharDString\fR(\fIsrc, len, dstPtr\fR)
.VE 8.4
.sp
int
\fBTcl_UniCharLen\fR(\fIuniStr\fR)
.sp
int
\fBTcl_UniCharNcmp\fR(\fIuniStr, uniStr, num\fR)
.VS 8.4
.sp
int
\fBTcl_UniCharNcasecmp\fR(\fIuniStr, uniStr, num\fR)
.sp
int
\fBTcl_UniCharCaseMatch\fR(\fIuniStr, uniPattern, nocase\fR)
.VE 8.4
.sp
int
\fBTcl_UtfNcmp\fR(\fIsrc, src, num\fR)
.sp
int
\fBTcl_UtfNcasecmp\fR(\fIsrc, src, num\fR)
.sp
int
\fBTcl_UtfCharComplete\fR(\fIsrc, len\fR)
.sp
int
\fBTcl_NumUtfChars\fR(\fIsrc, len\fR)
.VS 8.4
.sp
CONST char *
\fBTcl_UtfFindFirst\fR(\fIsrc, ch\fR)
.sp
CONST char *
\fBTcl_UtfFindLast\fR(\fIsrc, ch\fR)
.sp
CONST char *
\fBTcl_UtfNext\fR(\fIsrc\fR)
.sp
CONST char *
\fBTcl_UtfPrev\fR(\fIsrc, start\fR)
.VE 8.4
.sp
Tcl_UniChar
\fBTcl_UniCharAtIndex\fR(\fIsrc, index\fR)
.VS 8.4
.sp
CONST char *
\fBTcl_UtfAtIndex\fR(\fIsrc, index\fR)
.VE 8.4
.sp
int
\fBTcl_UtfBackslash\fR(\fIsrc, readPtr, dst\fR)
.SH ARGUMENTS
.AS "CONST Tcl_UniChar" numChars in/out
.AP char *buf out
Buffer in which the UTF-8 representation of the Tcl_UniChar is stored. At most
TCL_UTF_MAX bytes are stored in the buffer.
.AP int ch in
The Tcl_UniChar to be converted or examined.
.AP Tcl_UniChar *chPtr out
Filled with the Tcl_UniChar represented by the head of the UTF-8 string.
.AP "CONST char" *src in
Pointer to a UTF-8 string.
.AP "CONST Tcl_UniChar" *uniStr in
A null-terminated Unicode string.
.AP "CONST Tcl_UniChar" *uniPattern in
A null-terminated Unicode string.
.AP int len in
The length of the UTF-8 string in bytes (not UTF-8 characters). If
negative, all bytes up to the first null byte are used.
.AP int numChars in
The length of the Unicode string in characters. Must be greater than or
equal to 0.
.AP "Tcl_DString" *dstPtr in/out
A pointer to a previously-initialized \fBTcl_DString\fR.
.AP "unsigned long" num in
The number of characters to compare.
.AP "CONST char" *start in
Pointer to the beginning of a UTF-8 string.
.AP int index in
The index of a character (not byte) in the UTF-8 string.
.AP int *readPtr out
If non-NULL, filled with the number of bytes in the backslash sequence,
including the backslash character.
.AP char *dst out
Buffer in which the bytes represented by the backslash sequence are stored.
At most TCL_UTF_MAX bytes are stored in the buffer.
.VS 8.4
.AP int nocase in
Specifies whether the match should be done case-sensitive (0) or
case-insensitive (1).
.VE 8.4
.BE
.SH DESCRIPTION
.PP
These routines convert between UTF-8 strings and Tcl_UniChars. A
Tcl_UniChar is a Unicode character represented as an unsigned, fixed-size
quantity. A UTF-8 character is a Unicode character represented as
a varying-length sequence of up to TCL_UTF_MAX bytes. A multibyte UTF-8
sequence consists of a lead byte followed by some number of trail bytes.
.PP
\fBTCL_UTF_MAX\fR is the maximum number of bytes that it takes to
represent one Unicode character in the UTF-8 representation.
.PP
\fBTcl_UniCharToUtf\fR stores the Tcl_UniChar \fIch\fR as a UTF-8 string
in memory starting at \fIbuf\fR. The return value is the number of bytes
stored in \fIbuf\fR.
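.PP
The conversion described above can be illustrated with a minimal,
standalone sketch of standard UTF-8 encoding for code points up to 0xFFFF.
It is not Tcl's source and may differ from \fBTcl_UniCharToUtf\fR in edge
cases; the name SketchUniCharToUtf and its unsigned parameter are
assumptions made for this example.

```c
/*
 * Illustrative only: standard UTF-8 encoding for code points up to 0xFFFF.
 * SketchUniCharToUtf is a hypothetical name, not part of the Tcl API.
 * Returns the number of bytes stored in buf (1 to 3), or 0 if ch is
 * outside the range this sketch handles. buf needs room for 3 bytes.
 */
int SketchUniCharToUtf(unsigned int ch, char *buf)
{
    if (ch < 0x80) {                    /* one byte: 0xxxxxxx */
        buf[0] = (char) ch;
        return 1;
    }
    if (ch < 0x800) {                   /* lead 110xxxxx, one trail byte */
        buf[0] = (char) (0xC0 | (ch >> 6));
        buf[1] = (char) (0x80 | (ch & 0x3F));
        return 2;
    }
    if (ch <= 0xFFFF) {                 /* lead 1110xxxx, two trail bytes */
        buf[0] = (char) (0xE0 | (ch >> 12));
        buf[1] = (char) (0x80 | ((ch >> 6) & 0x3F));
        buf[2] = (char) (0x80 | (ch & 0x3F));
        return 3;
    }
    return 0;
}
```

As with the real routine, the caller supplies the buffer and uses the
return value to advance past the bytes just written.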
.PP
\fBTcl_UtfToUniChar\fR reads one UTF-8 character starting at \fIsrc\fR
and stores it as a Tcl_UniChar in \fI*chPtr\fR. The return value is the
number of bytes read from \fIsrc\fR. The caller must ensure that the
source buffer is long enough that this routine does not run off the
end and dereference non-existent or random memory; if the source buffer
is known to be null-terminated, this will not happen. If the input is
not in proper UTF-8 format, \fBTcl_UtfToUniChar\fR will store the first
byte of \fIsrc\fR in \fI*chPtr\fR as a Tcl_UniChar between 0x0000 and
0x00ff and return 1.
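.PP
The decoding rule and its fallback for malformed input can be sketched as
follows. This is a standalone illustration under stated assumptions, not
Tcl's implementation: SketchUniChar (a 16-bit stand-in for Tcl_UniChar),
SketchIsTrail, and SketchUtfToUniChar are hypothetical names, and only
sequences of up to three bytes are handled.

```c
/* Hypothetical 16-bit stand-in for Tcl_UniChar. */
typedef unsigned short SketchUniChar;

/* A trail byte has the bit pattern 10xxxxxx. */
static int SketchIsTrail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Decode one character from the null-terminated string src into *chPtr
 * and return the number of bytes consumed. On malformed input the first
 * byte is stored unchanged (0x00 to 0xFF) and 1 is returned.
 */
int SketchUtfToUniChar(const char *src, SketchUniChar *chPtr)
{
    const unsigned char *s = (const unsigned char *) src;

    if (s[0] < 0x80) {
        *chPtr = s[0];
        return 1;
    }
    /* The && chain stops at a null byte, so s[2] is never read past it. */
    if ((s[0] & 0xE0) == 0xC0 && SketchIsTrail(s[1])) {
        *chPtr = (SketchUniChar) (((s[0] & 0x1F) << 6) | (s[1] & 0x3F));
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && SketchIsTrail(s[1]) && SketchIsTrail(s[2])) {
        *chPtr = (SketchUniChar) (((s[0] & 0x0F) << 12)
                | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F));
        return 3;
    }
    *chPtr = s[0];      /* malformed or unsupported: pass the byte through */
    return 1;
}
```

Because a null byte is never a trail byte, the checks stop at the
terminator, which is why a null-terminated source is safe to decode.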
.PP
\fBTcl_UniCharToUtfDString\fR converts the given Unicode string
to UTF-8, storing the result in a previously-initialized \fBTcl_DString\fR.
You must specify the length of the given Unicode string.
The return value is a pointer to the UTF-8 representation of the
Unicode string. Storage for the return value is appended to the
end of the \fBTcl_DString\fR.
.PP
\fBTcl_UtfToUniCharDString\fR converts the given UTF-8 string to Unicode,
storing the result in the previously-initialized \fBTcl_DString\fR.
You may either specify the length of the given UTF-8 string or "-1",
in which case \fBTcl_UtfToUniCharDString\fR uses \fBstrlen\fR to
calculate the length. The return value is a pointer to the Unicode
representation of the UTF-8 string. Storage for the return value
is appended to the end of the \fBTcl_DString\fR. The Unicode string
is terminated with a Unicode null character.
.PP
\fBTcl_UniCharLen\fR corresponds to \fBstrlen\fR for Unicode
characters. It accepts a null-terminated Unicode string and returns
the number of Unicode characters (not bytes) in that string.
.PP
\fBTcl_UniCharNcmp\fR and \fBTcl_UniCharNcasecmp\fR correspond to
\fBstrncmp\fR and \fBstrncasecmp\fR, respectively, for Unicode characters.
They accept two null-terminated Unicode strings and the number of characters
to compare. Both strings are assumed to be at least \fInum\fR characters
long. \fBTcl_UniCharNcmp\fR compares the two strings character-by-character
according to the Unicode character ordering. It returns an integer greater
than, equal to, or less than 0 if the first string is greater than, equal
to, or less than the second string respectively. \fBTcl_UniCharNcasecmp\fR
is the Unicode case-insensitive version.
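.PP
The comparison contract above (sign of the result, both strings at least
\fInum\fR characters long) can be sketched in standalone C. The names
SketchUniChar and SketchUniCharNcmp are hypothetical, and the code is an
illustration of the described behavior rather than Tcl's implementation.

```c
/* Hypothetical 16-bit stand-in for Tcl_UniChar. */
typedef unsigned short SketchUniChar;

/*
 * Compare the first num characters of a and b by code point value.
 * Both arrays are assumed to hold at least num characters.
 * Returns a value less than, equal to, or greater than 0.
 */
int SketchUniCharNcmp(const SketchUniChar *a, const SketchUniChar *b,
        unsigned long num)
{
    for (; num > 0; num--, a++, b++) {
        if (*a != *b) {
            return (int) *a - (int) *b;
        }
    }
    return 0;
}
```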
.PP
.VS 8.4
\fBTcl_UniCharCaseMatch\fR is the Unicode equivalent to
\fBTcl_StringCaseMatch\fR. It accepts a null-terminated Unicode string,
a Unicode pattern, and a boolean value specifying whether the match should
ignore case, and returns whether the string matches the pattern.
.VE 8.4
.PP
\fBTcl_UtfNcmp\fR corresponds to \fBstrncmp\fR for UTF-8 strings. It
accepts two null-terminated UTF-8 strings and the number of characters
to compare. (Both strings are assumed to be at least \fInum\fR
characters long.) \fBTcl_UtfNcmp\fR compares the two strings
character-by-character according to the Unicode character ordering.
It returns an integer greater than, equal to, or less than 0 if the
first string is greater than, equal to, or less than the second string
respectively.
.PP
\fBTcl_UtfNcasecmp\fR corresponds to \fBstrncasecmp\fR for UTF-8
strings. It is similar to \fBTcl_UtfNcmp\fR except comparisons ignore
differences in case when comparing upper, lower or title case
characters.
.PP
\fBTcl_UtfCharComplete\fR returns 1 if the source UTF-8 string \fIsrc\fR
of length \fIlen\fR bytes is long enough to be decoded by
\fBTcl_UtfToUniChar\fR, or 0 otherwise. This function does not guarantee
that the UTF-8 string is properly formed. This routine is used by
procedures that are operating on a byte at a time and need to know if a
full Tcl_UniChar has been seen.
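.PP
A byte-at-a-time reader can apply the check above by looking only at the
lead byte. The following standalone sketch assumes sequences of at most
three bytes; SketchUtfCharComplete is a hypothetical name and the code is
not taken from Tcl.

```c
/*
 * Return 1 if the len bytes at src are enough to decode one character,
 * judging only by the lead byte, or 0 otherwise. Like the routine it
 * illustrates, this does not check that the trail bytes are well formed.
 */
int SketchUtfCharComplete(const char *src, int len)
{
    unsigned char lead;
    int needed;

    if (len < 1) {
        return 0;               /* nothing to look at yet */
    }
    lead = (unsigned char) src[0];
    if ((lead & 0xE0) == 0xC0) {
        needed = 2;             /* 110xxxxx starts a two-byte sequence */
    } else if ((lead & 0xF0) == 0xE0) {
        needed = 3;             /* 1110xxxx starts a three-byte sequence */
    } else {
        needed = 1;             /* ASCII, stray trail byte, or other */
    }
    return len >= needed;
}
```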
.PP
\fBTcl_NumUtfChars\fR corresponds to \fBstrlen\fR for UTF-8 strings. It
returns the number of Tcl_UniChars that are represented by the UTF-8 string
\fIsrc\fR. The length of the source string is \fIlen\fR bytes. If the
length is negative, all bytes up to the first null byte are used.
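.PP
The difference between byte length and character count can be shown with
a short standalone sketch. It assumes well-formed input, where every
character contributes exactly one byte that is not a trail byte;
SketchNumUtfChars is a hypothetical name and the code is not Tcl's
implementation.

```c
#include <string.h>

/*
 * Count the characters in the first len bytes of src. A negative len
 * means "up to the first null byte". Assumes well-formed UTF-8.
 */
int SketchNumUtfChars(const char *src, int len)
{
    int i, count = 0;

    if (len < 0) {
        len = (int) strlen(src);
    }
    for (i = 0; i < len; i++) {
        /* Count every byte that is not a 10xxxxxx trail byte. */
        if (((unsigned char) src[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For a six-byte string holding five characters, one of them two bytes
long, this returns 5 where \fBstrlen\fR would return 6.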
.PP
\fBTcl_UtfFindFirst\fR corresponds to \fBstrchr\fR for UTF-8 strings. It
returns a pointer to the first occurrence of the Tcl_UniChar \fIch\fR
in the null-terminated UTF-8 string \fIsrc\fR. The null terminator is
considered part of the UTF-8 string.
.PP
\fBTcl_UtfFindLast\fR corresponds to \fBstrrchr\fR for UTF-8 strings. It
returns a pointer to the last occurrence of the Tcl_UniChar \fIch\fR
in the null-terminated UTF-8 string \fIsrc\fR. The null terminator is
considered part of the UTF-8 string.
.PP
Given \fIsrc\fR, a pointer to some location in a UTF-8 string,
\fBTcl_UtfNext\fR returns a pointer to the next UTF-8 character in the
string. The caller must not ask for the next character after the last
character in the string if the string is not terminated by a null
character.
.PP
Given \fIsrc\fR, a pointer to some location in a UTF-8 string (or to a
null byte immediately following such a string), \fBTcl_UtfPrev\fR
returns a pointer to the closest preceding byte that starts a UTF-8
character.
This function will not back up to a position before \fIstart\fR,
the start of the UTF-8 string. If \fIsrc\fR was already at \fIstart\fR, the
return value will be \fIstart\fR.
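.PP
Backing up over trail bytes without passing \fIstart\fR can be sketched as
follows. The code assumes well-formed UTF-8 and is a standalone
illustration; SketchUtfPrev is a hypothetical name, and the real routine
may treat malformed input differently.

```c
/*
 * Return a pointer to the closest byte before src that starts a
 * character, never backing up past start. Assumes well-formed UTF-8.
 */
const char *SketchUtfPrev(const char *src, const char *start)
{
    const char *p = src;

    if (p <= start) {
        return start;           /* already at the beginning */
    }
    p--;
    /* Skip 10xxxxxx trail bytes until a lead byte or start is reached. */
    while (p > start && ((unsigned char) *p & 0xC0) == 0x80) {
        p--;
    }
    return p;
}
```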
.PP
\fBTcl_UniCharAtIndex\fR corresponds to a C string array dereference or the
Pascal Ord() function. It returns the Tcl_UniChar represented at the
specified character (not byte) \fIindex\fR in the UTF-8 string
\fIsrc\fR. The source string must contain at least \fIindex\fR
characters. Behavior is undefined if a negative \fIindex\fR is given.
.PP
\fBTcl_UtfAtIndex\fR returns a pointer to the specified character (not
byte) \fIindex\fR in the UTF-8 string \fIsrc\fR. The source string must
contain at least \fIindex\fR characters. This is equivalent to calling
\fBTcl_UtfNext\fR \fIindex\fR times. If a negative \fIindex\fR is given,
the return pointer points to the first character in the source string.
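.PP
The equivalence with repeated stepping can be made concrete with a
standalone sketch. It assumes a null-terminated, well-formed UTF-8 string
containing at least \fIindex\fR characters; SketchUtfNext and
SketchUtfAtIndex are hypothetical names and the code is not taken from
Tcl.

```c
/* Step from the start of one character to the start of the next. */
static const char *SketchUtfNext(const char *src)
{
    src++;
    /* Skip trail bytes; the null terminator is not one, so we stop there. */
    while (((unsigned char) *src & 0xC0) == 0x80) {
        src++;
    }
    return src;
}

/*
 * Return a pointer to character number index (counting from 0) in src.
 * A negative index leaves the pointer at the first character.
 */
const char *SketchUtfAtIndex(const char *src, int index)
{
    while (index-- > 0) {
        src = SketchUtfNext(src);
    }
    return src;
}
```

Since each call walks the string from the front, indexing this way costs
time proportional to \fIindex\fR, unlike indexing into an array of
fixed-size characters.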
.PP
\fBTcl_UtfBackslash\fR is a utility procedure used by several of the Tcl
commands. It parses a backslash sequence and stores the properly formed
UTF-8 character represented by the backslash sequence in the output
buffer \fIdst\fR. At most TCL_UTF_MAX bytes are stored in the buffer.
If \fIreadPtr\fR is non-NULL, \fBTcl_UtfBackslash\fR sets \fI*readPtr\fR
to the number of bytes in the backslash sequence, including the backslash
character.
The return value is the number of bytes stored in the output buffer.
.PP
See the \fBTcl\fR manual entry for information on the valid backslash
sequences. All of the sequences described in the Tcl manual entry are
supported by \fBTcl_UtfBackslash\fR.
.SH KEYWORDS
utf, unicode, backslash