'\"
'\" Copyright (c) 1994 The Regents of the University of California.
'\" Copyright (c) 1994-1996 Sun Microsystems, Inc.
'\" Copyright (c) 1998-1999 Scriptics Corporation
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
'\" RCS: @(#) $Id: RegExp.3,v 1.13 2002/11/13 22:11:40 vincentdarley Exp $
'\"
.so man.macros
.TH Tcl_RegExpMatch 3 8.1 Tcl "Tcl Library Procedures"
.BS
.SH NAME
Tcl_RegExpMatch, Tcl_RegExpCompile, Tcl_RegExpExec, Tcl_RegExpRange, Tcl_GetRegExpFromObj, Tcl_RegExpMatchObj, Tcl_RegExpExecObj, Tcl_RegExpGetInfo \- Pattern matching with regular expressions
.SH SYNOPSIS
.nf
\fB#include <tcl.h>\fR
.sp
int
\fBTcl_RegExpMatchObj\fR(\fIinterp\fR, \fIstrObj\fR, \fIpatObj\fR)
.sp
int
\fBTcl_RegExpMatch\fR(\fIinterp\fR, \fIstring\fR, \fIpattern\fR)
.sp
Tcl_RegExp
\fBTcl_RegExpCompile\fR(\fIinterp\fR, \fIpattern\fR)
.sp
int
\fBTcl_RegExpExec\fR(\fIinterp\fR, \fIregexp\fR, \fIstring\fR, \fIstart\fR)
.sp
\fBTcl_RegExpRange\fR(\fIregexp\fR, \fIindex\fR, \fIstartPtr\fR, \fIendPtr\fR)
.VS 8.1
.sp
Tcl_RegExp
\fBTcl_GetRegExpFromObj\fR(\fIinterp\fR, \fIpatObj\fR, \fIcflags\fR)
.sp
int
\fBTcl_RegExpExecObj\fR(\fIinterp\fR, \fIregexp\fR, \fIobjPtr\fR, \fIoffset\fR, \fInmatches\fR, \fIeflags\fR)
.sp
\fBTcl_RegExpGetInfo\fR(\fIregexp\fR, \fIinfoPtr\fR)
.VE 8.1
.SH ARGUMENTS
.AS Tcl_Interp *interp
.AP Tcl_Interp *interp in
Tcl interpreter to use for error reporting. The interpreter may be
NULL if no error reporting is desired.
.VS 8.1
.AP Tcl_Obj *strObj in/out
Refers to the object from which to get the string to search. The
internal representation of the object may be converted to a form that
can be efficiently searched.
.AP Tcl_Obj *patObj in/out
Refers to the object from which to get a regular expression. The
compiled regular expression is cached in the object.
.VE 8.1
.AP char *string in
String to check for a match with a regular expression.
.AP "CONST char" *pattern in
String in the form of a regular expression pattern.
.AP Tcl_RegExp regexp in
Compiled regular expression. Must have been returned previously
by \fBTcl_GetRegExpFromObj\fR or \fBTcl_RegExpCompile\fR.
.AP char *start in
If \fIstring\fR is just a portion of some other string, this argument
identifies the beginning of the larger string.
If it isn't the same as \fIstring\fR, then no \fB^\fR matches
will be allowed.
.AP int index in
Specifies which range is desired: 0 means the range of the entire
match, 1 or greater means the range that matched a parenthesized
sub-expression.
.VS 8.4
.AP "CONST char" **startPtr out
The address of the first character in the range is stored here, or
NULL if there is no such range.
.AP "CONST char" **endPtr out
The address of the character just after the last one in the range
is stored here, or NULL if there is no such range.
.VE 8.4
.VS 8.1
.AP int cflags in
OR-ed combination of compilation flags. See below for more information.
.AP Tcl_Obj *objPtr in/out
An object which contains the string to check for a match with a
regular expression.
.AP int offset in
The character offset into the string where matching should begin.
The value of the offset has no impact on \fB^\fR matches. This
behavior is controlled by \fIeflags\fR.
.AP int nmatches in
The number of matching subexpressions that should be remembered for
later use. If this value is 0, then no subexpression match
information will be computed. If the value is -1, then
all of the matching subexpressions will be remembered. Any other
value will be taken as the maximum number of subexpressions to
remember.
.AP int eflags in
OR-ed combination of the values TCL_REG_NOTBOL and TCL_REG_NOTEOL.
See below for more information.
.AP Tcl_RegExpInfo *infoPtr out
The address of the location where information about a previous match
should be stored by \fBTcl_RegExpGetInfo\fR.
.VE 8.1
.BE
.SH DESCRIPTION
.PP
\fBTcl_RegExpMatch\fR determines whether its \fIstring\fR argument
matches \fIpattern\fR, where \fIpattern\fR is interpreted
as a regular expression using the rules in the \fBre_syntax\fR
reference page.
If there is a match then \fBTcl_RegExpMatch\fR returns 1.
If there is no match then \fBTcl_RegExpMatch\fR returns 0.
If an error occurs in the matching process (e.g. \fIpattern\fR
is not a valid regular expression) then \fBTcl_RegExpMatch\fR
returns -1 and leaves an error message in the interpreter result.
.VS 8.1.2
\fBTcl_RegExpMatchObj\fR is similar to \fBTcl_RegExpMatch\fR except it
operates on the Tcl objects \fIstrObj\fR and \fIpatObj\fR instead of
UTF strings.
\fBTcl_RegExpMatchObj\fR is generally more efficient than
\fBTcl_RegExpMatch\fR, so it is the preferred interface.
.VE 8.1.2
.PP
\fBTcl_RegExpCompile\fR, \fBTcl_RegExpExec\fR, and \fBTcl_RegExpRange\fR
provide lower-level access to the regular expression pattern matcher.
\fBTcl_RegExpCompile\fR compiles a regular expression string into
the internal form used for efficient pattern matching.
The return value is a token for this compiled form, which can be
used in subsequent calls to \fBTcl_RegExpExec\fR or \fBTcl_RegExpRange\fR.
If an error occurs while compiling the regular expression then
\fBTcl_RegExpCompile\fR returns NULL and leaves an error message
in the interpreter result.
Note: the return value from \fBTcl_RegExpCompile\fR is only valid
up to the next call to \fBTcl_RegExpCompile\fR; it is not safe to
retain these values for long periods of time.
.PP
\fBTcl_RegExpExec\fR executes the regular expression pattern matcher.
It returns 1 if \fIstring\fR contains a range of characters that
match \fIregexp\fR, 0 if no match is found, and
-1 if an error occurs.
In the case of an error, \fBTcl_RegExpExec\fR leaves an error
message in the interpreter result.
When searching a string for multiple matches of a pattern,
it is important to distinguish between the start of the original
string and the start of the current search.
For example, when searching for the second occurrence of a
match, the \fIstring\fR argument might point to the character
just after the first match; however, it is important for the
pattern matcher to know that this is not the start of the entire string,
so that it doesn't allow \fB^\fR atoms in the pattern to match.
The \fIstart\fR argument provides this information by pointing
to the start of the overall string containing \fIstring\fR.
\fIStart\fR will be less than or equal to \fIstring\fR; if it
is less than \fIstring\fR then no \fB^\fR matches will be allowed.
.PP
\fBTcl_RegExpRange\fR may be invoked after \fBTcl_RegExpExec\fR
returns; it provides detailed information about what ranges of
the string matched what parts of the pattern.
\fBTcl_RegExpRange\fR returns a pair of pointers in \fI*startPtr\fR
and \fI*endPtr\fR that identify a range of characters in
the source string for the most recent call to \fBTcl_RegExpExec\fR.
\fIIndex\fR indicates which of several ranges is desired:
if \fIindex\fR is 0, information is returned about the overall range
of characters that matched the entire pattern; otherwise,
information is returned about the range of characters that matched the
\fIindex\fR'th parenthesized subexpression within the pattern.
If there is no range corresponding to \fIindex\fR then NULL
is stored in \fI*startPtr\fR and \fI*endPtr\fR.
.PP
.VS 8.1
\fBTcl_GetRegExpFromObj\fR, \fBTcl_RegExpExecObj\fR, and
\fBTcl_RegExpGetInfo\fR are object interfaces that provide the most
direct control of Henry Spencer's regular expression library.
Users who need to modify compilation and execution options directly
should use these interfaces instead of calling the
internal regexp functions. These interfaces handle the details of UTF
to Unicode translations as well as providing improved performance
through caching in the pattern and string objects.
.PP
\fBTcl_GetRegExpFromObj\fR attempts to return a compiled regular
expression from the \fIpatObj\fR. If the object does not already
contain a compiled regular expression it will attempt to create one
from the string in the object and assign it to the internal
representation of the \fIpatObj\fR. The return value of this function
is of type \fBTcl_RegExp\fR. The return value is a token for this
compiled form, which can be used in subsequent calls to
\fBTcl_RegExpExecObj\fR or \fBTcl_RegExpGetInfo\fR. If an error
occurs while compiling the regular expression then
\fBTcl_GetRegExpFromObj\fR returns NULL and leaves an error message in
the interpreter result. The regular expression token can be used as
long as the internal representation of \fIpatObj\fR refers to the
compiled form. The \fIcflags\fR argument is a bitwise OR of
zero or more of the following flags that control the compilation of
\fIpatObj\fR:
.RS 2
.TP
\fBTCL_REG_ADVANCED\fR
Compile advanced regular expressions (`AREs'). This mode corresponds to
the normal regular expression syntax accepted by the Tcl regexp and
regsub commands.
.TP
\fBTCL_REG_EXTENDED\fR
Compile extended regular expressions (`EREs'). This mode corresponds
to the regular expression syntax recognized by Tcl 8.0 and earlier
versions.
.TP
\fBTCL_REG_BASIC\fR
Compile basic regular expressions (`BREs'). This mode corresponds
to the regular expression syntax recognized by common Unix utilities
like \fBsed\fR and \fBgrep\fR. This is the default if no flags are
specified.
.TP
\fBTCL_REG_EXPANDED\fR
Compile the regular expression (basic, extended, or advanced) using an
expanded syntax that allows comments and whitespace. This mode causes
non-backslashed non-bracket-expression white
space and #-to-end-of-line comments to be ignored.
.TP
\fBTCL_REG_QUOTE\fR
Compile a literal string, with all characters treated as ordinary characters.
.TP
\fBTCL_REG_NOCASE\fR
Compile for matching that ignores upper/lower case distinctions.
.TP
\fBTCL_REG_NEWLINE\fR
Compile for newline-sensitive matching. By default, newline is a
completely ordinary character with no special meaning in either
regular expressions or strings. With this flag, `[^' bracket
expressions and `.' never match newline, `^' matches an empty string
after any newline in addition to its normal function, and `$' matches
an empty string before any newline in addition to its normal function.
\fBTCL_REG_NEWLINE\fR is the bitwise OR of \fBTCL_REG_NLSTOP\fR and
\fBTCL_REG_NLANCH\fR.
.TP
\fBTCL_REG_NLSTOP\fR
Compile for partial newline-sensitive matching,
with the behavior of
`[^' bracket expressions and `.' affected,
but not the behavior of `^' and `$'. In this mode, `[^' bracket
expressions and `.' never match newline.
.TP
\fBTCL_REG_NLANCH\fR
Compile for inverse partial newline-sensitive matching,
with the behavior of
`^' and `$' (the ``anchors'') affected, but not the behavior of
`[^' bracket expressions and `.'. In this mode `^' matches an empty string
after any newline in addition to its normal function, and `$' matches
an empty string before any newline in addition to its normal function.
.TP
\fBTCL_REG_NOSUB\fR
Compile for matching that reports only success or failure,
not what was matched. This reduces compile overhead and may improve
performance. Subsequent calls to \fBTcl_RegExpGetInfo\fR or
\fBTcl_RegExpRange\fR will not report any match information.
.TP
\fBTCL_REG_CANMATCH\fR
Compile for matching that reports the potential to complete a partial
match given more text (see below).
.RE
.PP
Only one of
\fBTCL_REG_EXTENDED\fR,
\fBTCL_REG_ADVANCED\fR,
\fBTCL_REG_BASIC\fR, and
\fBTCL_REG_QUOTE\fR may be specified.
.PP
\fBTcl_RegExpExecObj\fR executes the regular expression pattern
matcher. It returns 1 if \fIobjPtr\fR contains a range of characters
that match \fIregexp\fR, 0 if no match is found, and -1 if an error
occurs. In the case of an error, \fBTcl_RegExpExecObj\fR leaves an
error message in the interpreter result. The \fInmatches\fR value
indicates to the matcher how many subexpressions are of interest. If
\fInmatches\fR is 0, then no subexpression match information is
recorded, which may allow the matcher to make various optimizations.
If the value is -1, then all of the subexpressions in the pattern are
remembered. If the value is a positive integer, then only that number
of subexpressions will be remembered. Matching begins at the
specified Unicode character index given by \fIoffset\fR. Unlike
\fBTcl_RegExpExec\fR, the behavior of anchors is not affected by the
offset value. Instead the behavior of the anchors is explicitly
controlled by the \fIeflags\fR argument, which is a bitwise OR of
zero or more of the following flags:
.RS 2
.TP
\fBTCL_REG_NOTBOL\fR
The starting character will not be treated as the beginning of a
line or the beginning of the string, so `^' will not match there.
Note that this flag has no effect on how `\fB\eA\fR' matches.
.TP
\fBTCL_REG_NOTEOL\fR
The last character in the string will not be treated as the end of a
line or the end of the string, so `$' will not match there.
Note that this flag has no effect on how `\fB\eZ\fR' matches.
.RE
.PP
\fBTcl_RegExpGetInfo\fR retrieves information about the last match
performed with a given regular expression \fIregexp\fR. The
\fIinfoPtr\fR argument contains a pointer to a structure that is
defined as follows:
.PP
.CS
typedef struct Tcl_RegExpInfo {
    int \fInsubs\fR;
    Tcl_RegExpIndices *\fImatches\fR;
    long \fIextendStart\fR;
} Tcl_RegExpInfo;
.CE
.PP
The \fInsubs\fR field contains a count of the number of parenthesized
subexpressions within the regular expression. If the \fBTCL_REG_NOSUB\fR
flag was used, then this value will be zero. The \fImatches\fR field
points to an array of \fInsubs\fR + 1 values that indicate the bounds of
each subexpression matched. The first element in the array refers to the
range matched by the entire regular expression, and subsequent elements
refer to the parenthesized subexpressions in the order that they
appear in the pattern. Each element is a structure that is defined as
follows:
.PP
.CS
typedef struct Tcl_RegExpIndices {
    long \fIstart\fR;
    long \fIend\fR;
} Tcl_RegExpIndices;
.CE
.PP
The \fIstart\fR and \fIend\fR values are Unicode character indices
relative to the offset location within \fIobjPtr\fR where matching began.
The \fIstart\fR index identifies the first character of the matched
subexpression. The \fIend\fR index identifies the first character
after the matched subexpression. If the subexpression matched the
empty string, then \fIstart\fR and \fIend\fR will be equal. If the
subexpression did not participate in the match, then \fIstart\fR and
\fIend\fR will be set to -1.
.PP
The \fIextendStart\fR field in \fBTcl_RegExpInfo\fR is only set if the
\fBTCL_REG_CANMATCH\fR flag was used. It indicates the first
character in the string where a match could occur. If a match was
found, this will be the same as the beginning of the current match.
If no match was found, then it indicates the earliest point at which a
match might occur if additional text is appended to the string. If
no match is possible even with further text, this field will be set
to -1.
.VE 8.1
.SH "SEE ALSO"
re_syntax(n)
.SH KEYWORDS
match, pattern, regular expression, string, subexpression, Tcl_RegExpIndices, Tcl_RegExpInfo