xfunc.sgml
上传用户:blenddy
上传日期:2007-01-07
资源大小:6495k
文件大小:20k
- <Chapter Id="xfunc">
- <Title>Extending <Acronym>SQL</Acronym>: Functions</Title>
- <Para>
- As it turns out, part of defining a new type is the
- definition of functions that describe its behavior.
- Consequently, while it is possible to define a new
- function without defining a new type, the reverse is
- not true. We therefore describe how to add new functions
- to <ProductName>Postgres</ProductName> before describing
- how to add new types.
- <ProductName>Postgres</ProductName> <Acronym>SQL</Acronym>
- provides two types of functions: query language functions
- (functions written in <Acronym>SQL</Acronym> and programming
- language functions (functions written in a compiled
- programming language such as <Acronym>C</Acronym>.) Either kind
- of function can take a base type, a composite type or
- some combination as arguments (parameters). In addition,
- both kinds of functions can return a base type or
- a composite type. It's easier to define <Acronym>SQL</Acronym>
- functions, so we'll start with those. Examples in this section
- can also be found in <FileName>funcs.sql</FileName>
- and <FileName>funcs.c</FileName>.
- </Para>
- <Sect1>
- <Title>Query Language (<Acronym>SQL</Acronym>) Functions</Title>
- <Sect2>
- <Title><Acronym>SQL</Acronym> Functions on Base Types</Title>
- <Para>
- The simplest possible <Acronym>SQL</Acronym> function has no arguments and
- simply returns a base type, such as <Acronym>int4</Acronym>:
-
- <ProgramListing>
- CREATE FUNCTION one() RETURNS int4
- AS 'SELECT 1 as RESULT' LANGUAGE 'sql';
- SELECT one() AS answer;
- +-------+
- |answer |
- +-------+
- |1 |
- +-------+
- </ProgramListing>
- </Para>
- <Para>
- Notice that we defined a target list for the function
- (with the name RESULT), but the target list of the
- query that invoked the function overrode the function's
- target list. Hence, the result is labelled answer
- instead of one.
- </Para>
- <Para>
- It's almost as easy to define <Acronym>SQL</Acronym> functions
- that take base types as arguments. In the example below, notice
- how we refer to the arguments within the function as $1
- and $2.
-
- <ProgramListing>
- CREATE FUNCTION add_em(int4, int4) RETURNS int4
- AS 'SELECT $1 + $2;' LANGUAGE 'sql';
- SELECT add_em(1, 2) AS answer;
- +-------+
- |answer |
- +-------+
- |3 |
- +-------+
- </ProgramListing>
- </Para>
- </sect2>
- <Sect2>
- <Title><Acronym>SQL</Acronym> Functions on Composite Types</Title>
- <Para>
- When specifying functions with arguments of composite
- types (such as EMP), we must not only specify which
- argument we want (as we did above with $1 and $2) but
- also the attributes of that argument. For example,
- take the function double_salary that computes what your
- salary would be if it were doubled.
-
- <ProgramListing>
- CREATE FUNCTION double_salary(EMP) RETURNS int4
- AS 'SELECT $1.salary * 2 AS salary;' LANGUAGE 'sql';
- SELECT name, double_salary(EMP) AS dream
- FROM EMP
- WHERE EMP.cubicle ~= '(2,1)'::point;
-
- +-----+-------+
- |name | dream |
- +-----+-------+
- |Sam | 2400 |
- +-----+-------+
- </ProgramListing>
- </para>
- <Para>
- Notice the use of the syntax $1.salary.
- Before launching into the subject of functions that
- return composite types, we must first introduce the
- function notation for projecting attributes. The simple way
- to explain this is that we can usually use the
- notation attribute(class) and class.attribute interchangably.
-
- <ProgramListing>
- --
- -- this is the same as:
- -- SELECT EMP.name AS youngster FROM EMP WHERE EMP.age < 30
- --
- SELECT name(EMP) AS youngster
- FROM EMP
- WHERE age(EMP) < 30;
- +----------+
- |youngster |
- +----------+
- |Sam |
- +----------+
- </ProgramListing>
- </para>
- <Para>
- As we shall see, however, this is not always the case.
- This function notation is important when we want to use
- a function that returns a single instance. We do this
- by assembling the entire instance within the function,
- attribute by attribute. This is an example of a function
- that returns a single EMP instance:
-
- <ProgramListing>
- CREATE FUNCTION new_emp() RETURNS EMP
- AS 'SELECT 'None'::text AS name,
- 1000 AS salary,
- 25 AS age,
- '(2,2)'::point AS cubicle'
- LANGUAGE 'sql';
- </ProgramListing>
- </Para>
- <Para>
- In this case we have specified each of the attributes
- with a constant value, but any computation or expression
- could have been substituted for these constants.
- Defining a function like this can be tricky. Some of
- the more important caveats are as follows:
-
-
- <ItemizedList>
- <ListItem>
- <Para>
- The target list order must be exactly the same as
- that in which the attributes appear in the CREATE
- TABLE statement (or when you execute a .* query).
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- You must typecast the expressions (using ::) very carefully
- or you will see the following error:
-
- <ProgramListing>
- WARN::function declared to return type EMP does not retrieve (EMP.*)
- </ProgramListing>
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- When calling a function that returns an instance, we
- cannot retrieve the entire instance. We must either
- project an attribute out of the instance or pass the
- entire instance into another function.
- <ProgramListing>
- SELECT name(new_emp()) AS nobody;
- +-------+
- |nobody |
- +-------+
- |None |
- +-------+
- </ProgramListing>
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- The reason why, in general, we must use the function
- syntax for projecting attributes of function return
- values is that the parser just doesn't understand
- the other (dot) syntax for projection when combined
- with function calls.
-
- <ProgramListing>
- SELECT new_emp().name AS nobody;
- WARN:parser: syntax error at or near "."
- </ProgramListing>
- </Para>
- </ListItem>
- </ItemizedList>
- </para>
- <Para>
- Any collection of commands in the <Acronym>SQL</Acronym> query
- language can be packaged together and defined as a function.
- The commands can include updates (i.e., <Acronym>insert</Acronym>,
- <Acronym>update</Acronym> and <Acronym>delete</Acronym>) as well
- as <Acronym>select</Acronym> queries. However, the final command
- must be a <Acronym>select</Acronym> that returns whatever is
- specified as the function's returntype.
-
- <ProgramListing>
- CREATE FUNCTION clean_EMP () RETURNS int4
- AS 'DELETE FROM EMP WHERE EMP.salary <= 0;
- SELECT 1 AS ignore_this'
- LANGUAGE 'sql';
- SELECT clean_EMP();
- +--+
- |x |
- +--+
- |1 |
- +--+
-
- </ProgramListing>
- </Para>
- </sect2>
- </sect1>
- <Sect1>
- <Title>Programming Language Functions</Title>
- <Sect2>
- <Title>Programming Language Functions on Base Types</Title>
- <Para>
- Internally, <ProductName>Postgres</ProductName> regards a
- base type as a "blob of memory." The user-defined
- functions that you define over a type in turn define the
- way that <ProductName>Postgres</ProductName> can operate
- on it. That is, <ProductName>Postgres</ProductName> will
- only store and retrieve the data from disk and use your
- user-defined functions to input, process, and output the data.
- Base types can have one of three internal formats:
- <ItemizedList>
- <ListItem><Para>pass by value, fixed-length</Para>
- </ListItem>
- <ListItem><Para>pass by reference, fixed-length</Para>
- </ListItem>
- <ListItem><Para>pass by reference, variable-length</Para>
- </ListItem>
- </ItemizedList>
- </Para>
- <Para>
- By-value types can only be 1, 2 or 4 bytes in length
- (even if your computer supports by-value types of other
- sizes). <ProductName>Postgres</ProductName> itself
- only passes integer types by value. You should be careful
- to define your types such that they will be the same
- size (in bytes) on all architectures. For example, the
- <Acronym>long</Acronym> type is dangerous because it
- is 4 bytes on some machines and 8 bytes on others, whereas
- <Acronym>int</Acronym> type is 4 bytes on most
- <Acronym>UNIX</Acronym> machines (though not on most
- personal computers). A reasonable implementation of
- the <Acronym>int4</Acronym> type on <Acronym>UNIX</Acronym>
- machines might be:
-
- <ProgramListing>
- /* 4-byte integer, passed by value */
- typedef int int4;
- </ProgramListing>
- </Para>
- <Para>
- On the other hand, fixed-length types of any size may
- be passed by-reference. For example, here is a sample
- implementation of a <ProductName>Postgres</ProductName> type:
-
- <ProgramListing>
- /* 16-byte structure, passed by reference */
- typedef struct
- {
- double x, y;
- } Point;
- </ProgramListing>
- </Para>
- <Para>
- Only pointers to such types can be used when passing
- them in and out of <ProductName>Postgres</ProductName> functions.
- Finally, all variable-length types must also be passed
- by reference. All variable-length types must begin
- with a length field of exactly 4 bytes, and all data to
- be stored within that type must be located in the memory
- immediately following that length field. The
- length field is the total length of the structure
- (i.e., it includes the size of the length field
- itself). We can define the text type as follows:
- </Para>
- <Para>
- <ProgramListing>
- typedef struct {
- int4 length;
- char data[1];
- } text;
- </ProgramListing>
- </Para>
- <Para>
- Obviously, the data field is not long enough to hold
- all possible strings -- it's impossible to declare such
- a structure in <Acronym>C</Acronym>. When manipulating
- variable-length types, we must be careful to allocate
- the correct amount of memory and initialize the length field.
- For example, if we wanted to store 40 bytes in a text
- structure, we might use a code fragment like this:
- <ProgramListing>
- #include "postgres.h"
- ...
- char buffer[40]; /* our source data */
- ...
- text *destination = (text *) palloc(VARHDRSZ + 40);
- destination->length = VARHDRSZ + 40;
- memmove(destination->data, buffer, 40);
- ...
- </ProgramListing>
- </Para>
- <Para>
- Now that we've gone over all of the possible structures
- for base types, we can show some examples of real functions.
- Suppose <FileName>funcs.c</FileName> look like:
- <ProgramListing>
- #include <string.h>
- #include "postgres.h"
- /* By Value */
-
- int
- add_one(int arg)
- {
- return(arg + 1);
- }
-
- /* By Reference, Fixed Length */
-
- Point *
- makepoint(Point *pointx, Point *pointy )
- {
- Point *new_point = (Point *) palloc(sizeof(Point));
-
- new_point->x = pointx->x;
- new_point->y = pointy->y;
-
- return new_point;
- }
-
- /* By Reference, Variable Length */
-
- text *
- copytext(text *t)
- {
- /*
- * VARSIZE is the total size of the struct in bytes.
- */
- text *new_t = (text *) palloc(VARSIZE(t));
- memset(new_t, 0, VARSIZE(t));
- VARSIZE(new_t) = VARSIZE(t);
- /*
- * VARDATA is a pointer to the data region of the struct.
- */
- memcpy((void *) VARDATA(new_t), /* destination */
- (void *) VARDATA(t), /* source */
- VARSIZE(t)-VARHDRSZ); /* how many bytes */
- return(new_t);
- }
-
- text *
- concat_text(text *arg1, text *arg2)
- {
- int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
- text *new_text = (text *) palloc(new_text_size);
- memset((void *) new_text, 0, new_text_size);
- VARSIZE(new_text) = new_text_size;
- strncpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
- strncat(VARDATA(new_text), VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
- return (new_text);
- }
- </ProgramListing>
- </Para>
- <Para>
- On <Acronym>OSF/1</Acronym> we would type:
-
- <ProgramListing>
- CREATE FUNCTION add_one(int4) RETURNS int4
- AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c';
- CREATE FUNCTION makepoint(point, point) RETURNS point
- AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c';
-
- CREATE FUNCTION concat_text(text, text) RETURNS text
- AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c';
-
- CREATE FUNCTION copytext(text) RETURNS text
- AS 'PGROOT/tutorial/funcs.so' LANGUAGE 'c';
- </ProgramListing>
- </Para>
- <Para>
- On other systems, we might have to make the filename
- end in .sl (to indicate that it's a shared library).
- </Para>
- </Sect2>
- <Sect2>
- <Title>Programming Language Functions on Composite Types</Title>
- <Para>
- Composite types do not have a fixed layout like C
- structures. Instances of a composite type may contain
- null fields. In addition, composite types that are
- part of an inheritance hierarchy may have different
- fields than other members of the same inheritance hierarchy.
- Therefore, <ProductName>Postgres</ProductName> provides
- a procedural interface for accessing fields of composite types
- from C. As <ProductName>Postgres</ProductName> processes
- a set of instances, each instance will be passed into your
- function as an opaque structure of type <Acronym>TUPLE</Acronym>.
- Suppose we want to write a function to answer the query
- <ProgramListing>
- * SELECT name, c_overpaid(EMP, 1500) AS overpaid
- FROM EMP
- WHERE name = 'Bill' or name = 'Sam';
- </ProgramListing>
- In the query above, we can define c_overpaid as:
-
- <ProgramListing>
- #include "postgres.h"
- #include "executor/executor.h" /* for GetAttributeByName() */
-
- bool
- c_overpaid(TupleTableSlot *t, /* the current instance of EMP */
- int4 limit)
- {
- bool isnull = false;
- int4 salary;
- salary = (int4) GetAttributeByName(t, "salary", &isnull);
- if (isnull)
- return (false);
- return(salary > limit);
- }
- </ProgramListing>
- </Para>
- <Para>
- <Acronym>GetAttributeByName</Acronym> is the
- <ProductName>Postgres</ProductName> system function that
- returns attributes out of the current instance. It has
- three arguments: the argument of type TUPLE passed into
- the function, the name of the desired attribute, and a
- return parameter that describes whether the attribute
- is null. <Acronym>GetAttributeByName</Acronym> will
- align data properly so you can cast its return value to
- the desired type. For example, if you have an attribute
- name which is of the type name, the <Acronym>GetAttributeByName</Acronym>
- call would look like:
- <ProgramListing>
- char *str;
- ...
- str = (char *) GetAttributeByName(t, "name", &isnull)
- </ProgramListing>
- </Para>
- <Para>
- The following query lets <ProductName>Postgres</ProductName>
- know about the c_overpaid function:
- <ProgramListing>
- * CREATE FUNCTION c_overpaid(EMP, int4) RETURNS bool
- AS 'PGROOT/tutorial/obj/funcs.so' LANGUAGE 'c';
- </ProgramListing>
- </Para>
- <Para>
- While there are ways to construct new instances or modify
- existing instances from within a C function, these
- are far too complex to discuss in this manual.
- </Para>
- </Sect2>
- <Sect2>
- <Title>Caveats</Title>
- <Para>
- We now turn to the more difficult task of writing
- programming language functions. Be warned: this section
- of the manual will not make you a programmer. You must
- have a good understanding of <Acronym>C</Acronym>
- (including the use of pointers and the malloc memory manager)
- before trying to write <Acronym>C</Acronym> functions for
- use with <ProductName>Postgres</ProductName>. While it may
- be possible to load functions written in languages other
- than <Acronym>C</Acronym> into <ProductName>Postgres</ProductName>,
- this is often difficult (when it is possible at all)
- because other languages, such as <Acronym>FORTRAN</Acronym>
- and <Acronym>Pascal</Acronym> often do not follow the same
- "calling convention" as <Acronym>C</Acronym>. That is, other
- languages do not pass argument and return values
- between functions in the same way. For this reason, we
- will assume that your programming language functions
- are written in <Acronym>C</Acronym>.
- The basic rules for building <Acronym>C</Acronym> functions
- are as follows:
- <ItemizedList>
- <ListItem>
- <Para>
- Most of the header (include) files for
- <ProductName>Postgres</ProductName>
- should already be installed in
- <FileName>PGROOT/include</FileName> (see Figure 2).
- You should always include
-
- <ProgramListing>
- -I$PGROOT/include
- </ProgramListing>
- on your cc command lines. Sometimes, you may
- find that you require header files that are in
- the server source itself (i.e., you need a file
- we neglected to install in include). In those
- cases you may need to add one or more of
- <ProgramListing>
- -I$PGROOT/src/backend
- -I$PGROOT/src/backend/include
- -I$PGROOT/src/backend/port/<PORTNAME>
- -I$PGROOT/src/backend/obj
- </ProgramListing>
- (where <PORTNAME> is the name of the port, e.g.,
- alpha or sparc).
- </para>
- </ListItem>
- <ListItem>
- <Para> When allocating memory, use the
- <ProductName>Postgres</ProductName>
- routines palloc and pfree instead of the
- corresponding <Acronym>C</Acronym> library routines
- malloc and free.
- The memory allocated by palloc will be freed
- automatically at the end of each transaction,
- preventing memory leaks.
- </Para>
- </ListItem>
- <ListItem>
- <Para> Always zero the bytes of your structures using
- memset or bzero. Several routines (such as the
- hash access method, hash join and the sort algorithm)
- compute functions of the raw bits contained in
- your structure. Even if you initialize all fields
- of your structure, there may be
- several bytes of alignment padding (holes in the
- structure) that may contain garbage values.
- </Para>
- </ListItem>
- <ListItem>
- <Para> Most of the internal <ProductName>Postgres</ProductName>
- types are declared in postgres.h, so it's a good
- idea to always include that file as well. Including
- postgres.h will also include elog.h and palloc.h for you.
- </Para>
- </ListItem>
- <ListItem>
- <Para> Compiling and loading your object code so that
- it can be dynamically loaded into
- <ProductName>Postgres</ProductName>
- always requires special flags. See Appendix A
- for a detailed explanation of how to do it for
- your particular operating system.
- </Para>
- </ListItem>
- </ItemizedList>
- </Para>
- </Sect2>
- </sect1>
- </chapter>