AWK.9
- .CD "awk \(en pattern matching language"
- .SX "awk \fIrules\fR [\fIfile\fR] ..."
- .FL "\fR(none)"
- .EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
- .EX "awk rules \(en >out" "Input from terminal, output to \fIout\fR"
- .PP
- AWK is a programming language devised by Aho, Weinberger, and Kernighan
- at Bell Labs (hence the name).
- \fIAwk\fR programs search files for
- specific patterns and perform \*(OQactions\*(CQ for every occurrence
- of these patterns. The patterns can be \*(OQregular expressions\*(CQ
- as used in the \fIed\fR editor. The actions are expressed
- using a subset of the C language.
- .PP
- The patterns and actions are usually placed in a \*(OQrules\*(CQ file
- whose name is given on the command line immediately after
- the flag \fB\(enf\fR. Otherwise, the first argument on the
- command line is taken to be a string containing the rules
- themselves. All other arguments are taken to be the names of text
- files on which the rules are to be applied, with \fB\(en\fR being the
- standard input. To take rules from the standard input, use \fB\(enf \(en\fR.
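- .PP
- The argument conventions just described can be sketched in Python.
- This is an illustration only, not the program's own code: the function
- name, the tuple layout of its result, and the choice to fall back to the
- standard input when no file is named are all assumptions.

```python
def parse_command_line(args):
    """Classify awk-style arguments (everything after the program name).

    Returns (rules_source, input_files), where rules_source is one of
    ("file", name), ("stdin", None) or ("inline", text).
    """
    if not args:
        raise ValueError("usage: awk rules [file] ...")
    if args[0] == "-f":
        if len(args) < 2:
            raise ValueError("-f requires a rules file name")
        # "-f -" asks for the rules to be read from standard input.
        source = ("stdin", None) if args[1] == "-" else ("file", args[1])
        files = args[2:]
    else:
        # Without -f, the first argument is the rules text itself.
        source = ("inline", args[0])
        files = args[1:]
    # Assumption: with no file arguments, read the standard input ("-").
    return source, (files or ["-"])
```

- For instance, parse_command_line(["-f", "rules", "a.c"]) yields
- (("file", "rules"), ["a.c"]).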
- .PP
- The command:
- .HS
- .Cx "awk \(enf rules prog.\d\s+2*\s0\u"
- .HS
- would read the patterns and actions from the file \fIrules\fR
- and apply them to every file matching \fIprog.*\fR.
- .PP
- The general format of a rules file is:
- .HS
- ~~~<pattern> { <action> }
- ~~~<pattern> { <action> }
- ~~~...
- .HS
- There may be any number of these <pattern> { <action> }
- sequences in the rules file. \fIAwk\fR reads a line of input from
- the current input file and applies every <pattern> { <action> }
- in sequence to the line.
- .PP
- If the <pattern> corresponding to any { <action> } is missing,
- the action is applied to every line of input. The default
- { <action> } is to print the matched input line.
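- .PP
- The processing loop described above can be modelled in a few lines of
- Python. This is a sketch under stated assumptions, not the real
- implementation: run_rules and its representation of a rule as a
- (pattern, action) pair of Python callables are hypothetical.

```python
def run_rules(rules, lines):
    """Apply every (pattern, action) pair, in order, to every input line.

    pattern: a predicate on the line, or None to match every line.
    action:  a function of the line returning text to output, or None
             for the default action of printing the matched line.
    """
    output = []
    for line in lines:
        for pattern, action in rules:
            if pattern is None or pattern(line):
                output.append(line if action is None else action(line))
    return output


rules = [
    (lambda line: "PRINT" in line, None),   # missing action: print the line
    (None, lambda line: str(len(line))),    # missing pattern: every line
]
result = run_rules(rules, ['120 PRINT "hi"', "130 END"])
# result is ['120 PRINT "hi"', '14', '7']
```

- Each line is offered to every rule before the next line is read, so the
- output interleaves the results of the rules line by line.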
- .SS "Patterns"
- .PP
- The <pattern>s may consist of any valid C expression. If the
- <pattern> consists of two expressions separated by a comma, it
- is taken to be a range and the <action> is performed on all
- lines of input that match the range. <pattern>s may contain
- \*(OQregular expressions\*(CQ delimited by @ symbols. Regular
- expressions can be thought of as a generalized \*(OQwildcard\*(CQ
- string matching mechanism, similar to that used by many
- operating systems to specify file names. Regular expressions
- may contain any of the following characters:
- .HS
- .in +0.75i
- .ta +0.5i
- .ti -0.5i
- x	An ordinary character
- .ti -0.5i
- \\	The backslash quotes any character
- .ti -0.5i
- ^	A circumflex at the beginning of an expression matches the beginning of a line.
- .ti -0.5i
- $	A dollar-sign at the end of an expression matches the end of a line.
- .ti -0.5i
- \&.	A period matches any single character except newline.
- .ti -0.5i
- *	An expression followed by an asterisk matches zero or more occurrences
- of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
- .ti -0.5i
- +	An expression followed by a plus sign matches one or more occurrences
- of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
- .ti -0.5i
- []	A string enclosed in square brackets matches any single character in that
- string, but no others. If the first character in the string is a circumflex, the
- expression matches any character except newline and the characters in the
- string. For example, \*(OQ[xyz]\*(CQ matches every character of \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
- \*(OQ[^xyz]\*(CQ matches every character of \*(OQabc\*(CQ but not the \*(OQx\*(CQ in \*(OQaxb\*(CQ. A range of characters may be
- specified by two characters separated by \*(OQ-\*(CQ, as in \*(OQ[a-z]\*(CQ.
- .in -0.75i
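- .PP
- The pattern forms in this section can be tried out with Python's re
- module, which uses the same notation for the operators listed above.
- This is only a stand-in for illustration: the helper names are
- hypothetical, Python's \*(OQ[^...]\*(CQ also matches a newline, and the range
- behaviour shown (from a line matching the first expression through the
- next line matching the second, inclusive) is an assumption borrowed from
- conventional awk implementations.

```python
import re


def matches(regex, text):
    """True if the regular expression matches somewhere in text."""
    return re.search(regex, text) is not None


def select_range(lines, start, end):
    """Return the lines from one matching start through one matching end."""
    selected, inside = [], False
    for line in lines:
        if not inside and matches(start, line):
            inside = True
        if inside:
            selected.append(line)
            if matches(end, line):
                inside = False
    return selected


# "fo*" accepts a bare "f"; "fo+" needs at least one "o".
star_ok = matches("^fo*$", "f")          # True
plus_ok = matches("^fo+$", "f")          # False

# A bracket expression tests one character at a time.
has_xyz = matches("[xyz]", "axb")        # True, because of the "x"
none_xyz = matches("^[^xyz]*$", "abc")   # True, no x, y or z present

block = select_range(["a", "BEGIN x", "b", "END x", "c"], "^BEGIN", "^END")
# block is ['BEGIN x', 'b', 'END x']
```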
- .SS "Actions"
- .PP
- Actions are expressed as a subset of the C language. All
- variables are global and default to int if not formally
- declared.
- Only char and int variables, and pointers to and arrays of
- char and int, are allowed. \fIAwk\fR allows only decimal integer
- constants; hex (0xnn) and octal (0nn) constants are not accepted. String
- and character constants may contain all of the special C
- escapes (\\n, \\r, etc.).
- .PP
- \fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
- \*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
- control constructs, which behave exactly as in C.
- .PP
- Also supported are the following unary and binary operators,
- listed in order from highest to lowest precedence:
- .HS
- .ta 0.25i 1.75i 3.0i
- .nf
- \fB	Operator	Type	Associativity\fR
- 	() []	unary	left to right
- .tr ~~
- 	! ~ ++ \(en\(en \(en * &	unary	right to left
- .tr ~
- 	* / %	binary	left to right
- 	+ \(en	binary	left to right
- 	<< >>	binary	left to right
- 	< <= > >=	binary	left to right
- 	== !=	binary	left to right
- 	&	binary	left to right
- 	^	binary	left to right
- 	|	binary	left to right
- 	&&	binary	left to right
- 	||	binary	left to right
- 	=	binary	right to left
- .fi
- .HS
- Comments are introduced by a '#' symbol and are terminated by
- the first newline character. The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
- comment delimiters are not supported and will result in a
- syntax error.
- .SP 0.5
- .SS "Fields"
- .SP 0.5
- .PP
- When \fIawk\fR reads a line from the current input file, the
- record is automatically separated into \*(OQfields.\*(CQ A field is
- simply a string of consecutive characters delimited by either
- the beginning or end of line, or a \*(OQfield separator\*(CQ character.
- Initially, the field separators are the space and tab characters.
- The special unary operator '$' is used to reference one of the
- fields in the current input record (line). The fields are
- numbered sequentially starting at 1. The expression \*(OQ$0\*(CQ
- references the entire input line.
- .PP
- Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
- of an input \*(OQline,\*(CQ initially the newline character. The field
- and record separators may be changed programmatically by one of
- the actions and will remain in effect until changed again.
- .PP
- Multiple (up to 10) field separators are allowed at a time, but
- only one record separator.
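- .PP
- Field splitting as described above can be sketched in Python. The
- helper names split_fields and field are hypothetical, and treating a run
- of adjacent separators as a single delimiter is an assumption; the text
- only states that separators delimit fields and that at most 10 may be
- active at once.

```python
def split_fields(record, separators=" \t"):
    """Split a record into fields at any of the separator characters.

    Assumption: a run of adjacent separators counts as one delimiter.
    """
    if len(separators) > 10:
        raise ValueError("at most 10 field separators are allowed")
    fields, current = [], ""
    for ch in record:
        if ch in separators:
            if current:
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    return fields


def field(record, n, separators=" \t"):
    """Mimic the $ operator: $0 is the whole record, $1 the first field."""
    if n == 0:
        return record
    return split_fields(record, separators)[n - 1]


record = '120 PRINT "Name address Zip"'
fourth = field(record, 4)                     # 'address'
parts = split_fields("root:x:0,0", ":,")      # ['root', 'x', '0', '0']
```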
- .PP
- Fields behave exactly like strings and can be used in the same
- context as a character array. These \*(OQarrays\*(CQ can be considered
- to have been declared as:
- .SP 0.15
- .HS
- ~~~~~char ($n)[ 128 ];
- .HS
- .SP 0.15
- In other words, they are 128 bytes long. Notice that the
- parentheses are necessary because the [] operator binds more
- tightly than the unary $ operator; without them, the statement
- would have parsed as:
- .HS
- .SP 0.15
- ~~~~~char $(1[ 128 ]);
- .HS
- .SP 0.15
- which is obviously ridiculous.
- .PP
- If the contents of one of these field arrays are altered, the
- \*(OQ$0\*(CQ field will reflect this change. For example, this
- expression:
- .HS
- .SP 0.15
- ~~~~~*$4 = 'A';
- .HS
- .SP 0.15
- will change the first character of the fourth field to an upper-
- case letter 'A'. Then, when the following input line:
- .HS
- .SP 0.15
- ~~~~~120 PRINT "Name address Zip"
- .SP 0.15
- .HS
- is processed, it would be printed as:
- .HS
- .SP 0.15
- ~~~~~120 PRINT "Name Address Zip"
- .HS
- .SP 0.15
- Fields may also be modified with the strcpy() function (see
- below). For example, the expression:
- .HS
- ~~~~~strcpy( $4, "Addr." );
- .HS
- applied to the same line above would yield:
- .HS
- ~~~~~120 PRINT "Name Addr. Zip"
- .HS
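- .PP
- The way a field change shows up in \*(OQ$0\*(CQ can be modelled in Python.
- This is a sketch, not the program's actual mechanism: the Record class
- and its get and set methods are hypothetical, the 128-byte field limit is
- not modelled, and rebuilding the line with its original separators is an
- assumption chosen to reproduce the two results shown above.

```python
import re


class Record:
    """An input record whose fields can be edited and read back as $0."""

    def __init__(self, line, separators=" \t"):
        # Split into fields and separator runs, keeping the separators
        # so the original spacing survives when the line is rebuilt.
        pattern = "([" + re.escape(separators) + "]+)"
        self.parts = re.split(pattern, line)
        # Fields sit at the even positions; skip empty edge pieces.
        self.slots = [i for i in range(0, len(self.parts), 2) if self.parts[i]]

    def get(self, n):
        """Return field n, or the whole rebuilt line when n is 0."""
        if n == 0:
            return "".join(self.parts)
        return self.parts[self.slots[n - 1]]

    def set(self, n, text):
        """Replace field n; the change is visible through get(0)."""
        self.parts[self.slots[n - 1]] = text


line = '120 PRINT "Name address Zip"'

first = Record(line)
first.set(4, "A" + first.get(4)[1:])   # like  *$4 = 'A';
# first.get(0) is '120 PRINT "Name Address Zip"'

second = Record(line)
second.set(4, "Addr.")                 # like  strcpy( $4, "Addr." );
# second.get(0) is '120 PRINT "Name Addr. Zip"'
```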
- .SS "Predefined Variables"
- .PP
- The following variables are pre-defined:
- .HS
- .in +1.5i
- .ta +1.25i
- .ti -1.25i
- FS	Field separator (see above).
- .ti -1.25i
- RS	Record separator (see above).
- .ti -1.25i
- NF	Number of fields in current input record (line).
- .ti -1.25i
- NR	Number of records processed thus far.
- .ti -1.25i
- FILENAME	Name of current input file.
- .ti -1.25i
- BEGIN	A special <pattern> that matches the beginning of input text.
- .ti -1.25i
- END	A special <pattern> that matches the end of input text.
- .in -1.5i
- .HS
- \fIAwk\fR also provides some useful built-in functions for string
- manipulation and printing:
- .HS
- .in +1.5i
- .ta +1.25i
- .ti -1.25i
- print(arg)	Simple printing of strings only, terminated by '\\n'.
- .ti -1.25i
- printf(arg...)	Exactly the printf() function from C.
- .ti -1.25i
- getline()	Reads the next record and returns 0 on end of file.
- .ti -1.25i
- nextfile()	Closes the current input file and begins processing the next file.
- .ti -1.25i
- strlen(s)	Returns the length of its string argument.
- .ti -1.25i
- strcpy(s,t)	Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
- .ti -1.25i
- strcmp(s,t)	Compares \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
- .ti -1.25i
- toupper(c)	Returns its character argument converted to upper-case.
- .ti -1.25i
- tolower(c)	Returns its character argument converted to lower-case.
- .ti -1.25i
- match(s,@re@)	Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and
- returns the number of matches found (zero if none).
- .in -1.5i
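- .PP
- Two of the functions above can be approximated in Python to show what
- their return values mean. These are illustrative stand-ins only:
- counting non-overlapping matches in match(), and returning a negative or
- positive value from strcmp() for unequal strings as C does, are
- assumptions, since the list above only specifies the zero results.

```python
import re


def match(s, regex):
    """Count the non-overlapping matches of regex in s (0 if none)."""
    return sum(1 for _ in re.finditer(regex, s))


def strcmp(s, t):
    """Return 0 if the strings are equal, otherwise -1 or 1 by ordering."""
    return (s > t) - (s < t)


hits = match("banana", "an")       # 2
misses = match("banana", "x")      # 0
same = strcmp("abc", "abc")        # 0
```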
- .SS "Authors"
- .PP
- \fIAwk\fR was written by Saeko Hirabayashi and Kouichi Hirabayashi.