xindex.sgml
上传用户:blenddy
上传日期:2007-01-07
资源大小:6495k
文件大小:22k
- <chapter id="xindex">
- <title>Interfacing Extensions To Indices</title>
- <para>
- The procedures described thus far let you define a new type, new
- functions and new operators. However, we cannot yet define a secondary
- index (such as a <acronym>B-tree</acronym>, <acronym>R-tree</acronym> or
- hash access method) over a new type or its operators.
- </para>
- <para>
- Look back at
- <xref endterm="EXTEND-CATALOGS" linkend="EXTEND-CATALOGS">.
- The right half shows the catalogs that we must modify in order to tell
- <productname>Postgres</productname> how to use a user-defined type and/or
- user-defined operators with an index (i.e., <filename>pg_am, pg_amop,
- pg_amproc, pg_operator</filename> and <filename>pg_opclass</filename>).
- Unfortunately, there is no simple command to do this. We will demonstrate
- how to modify these catalogs through a running example: a new operator
- class for the <acronym>B-tree</acronym> access method that stores and
- sorts complex numbers in ascending absolute value order.
- </para>
- <para>
- The <filename>pg_am</filename> class contains one instance for every user
- defined access method. Support for the heap access method is built into
- <productname>Postgres</productname>, but every other access method is
- described here. The schema is
- <table tocentry="1">
- <title>Index Schema</title>
- <titleabbrev>Indices</titleabbrev>
- <tgroup cols="2">
- <thead>
- <row>
- <entry>Attribute</entry>
- <entry>Description</entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry>amname</entry>
- <entry>name of the access method</entry>
- </row>
- <row>
- <entry>amowner</entry>
- <entry>object id of the owner's instance in pg_user</entry>
- </row>
- <row>
- <entry>amkind</entry>
- <entry>not used at present, but set to 'o' as a place holder</entry>
- </row>
- <row>
- <entry>amstrategies</entry>
- <entry>number of strategies for this access method (see below)</entry>
- </row>
- <row>
- <entry>amsupport</entry>
- <entry>number of support routines for this access method (see below)</entry>
- </row>
- <row>
- <entry>amgettuple</entry>
- </row>
- <row>
- <entry>aminsert</entry>
- </row>
- <row>
- <entry>...</entry>
- <entry>procedure identifiers for interface routines to the access
- method. For example, regproc ids for opening, closing, and
- getting instances from the access method appear here.</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- </para>
- <para>
- The <acronym>object ID</acronym> of the instance in
- <filename>pg_am</filename> is used as a foreign key in lots of other
- classes. You don't need to add a new instance to this class; all
- you're interested in is the <acronym>object ID</acronym> of the access
- method instance you want to extend:
- <programlisting>
- SELECT oid FROM pg_am WHERE amname = 'btree';
- +----+
- |oid |
- +----+
- |403 |
- +----+
- </programlisting>
- We will use that <command>SELECT</command> in a <command>WHERE</command>
- clause later.
- </para>
- <para>
- The <filename>amstrategies</filename> attribute exists to standardize
- comparisons across data types. For example, <acronym>B-tree</acronym>s
- impose a strict ordering on keys, lesser to greater. Since
- <productname>Postgres</productname> allows the user to define operators,
- <productname>Postgres</productname> cannot look at the name of an operator
- (eg, ">" or "<") and tell what kind of comparison it is. In fact,
- some access methods don't impose any ordering at all. For example,
- <acronym>R-tree</acronym>s express a rectangle-containment relationship,
- whereas a hashed data structure expresses only bitwise similarity based
- on the value of a hash function. <productname>Postgres</productname>
- needs some consistent way of taking a qualification in your query,
- looking at the operator and then deciding if a usable index exists. This
- implies that <productname>Postgres</productname> needs to know, for
- example, that the "<=" and ">" operators partition a
- <acronym>B-tree</acronym>. <productname>Postgres</productname>
- uses strategies to express these relationships between
- operators and the way they can be used to scan indices.
- </para>
- <para>
- Defining a new set of strategies is beyond the scope of this discussion,
- but we'll explain how <acronym>B-tree</acronym> strategies work because
- you'll need to know that to add a new operator class. In the
- <filename>pg_am</filename> class, the amstrategies attribute is the
- number of strategies defined for this access method. For
- <acronym>B-tree</acronym>s, this number is 5. These strategies
- correspond to
- <table tocentry="1">
- <title>B-tree Strategies</title>
- <titleabbrev>B-tree</titleabbrev>
- <tgroup cols="2">
- <thead>
- <row>
- <entry>Operation</entry>
- <entry>Index</entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry>less than</entry>
- <entry>1</entry>
- </row>
- <row>
- <entry>less than or equal</entry>
- <entry>2</entry>
- </row>
- <row>
- <entry>equal</entry>
- <entry>3</entry>
- </row>
- <row>
- <entry>greater than or equal</entry>
- <entry>4</entry>
- </row>
- <row>
- <entry>greater than</entry>
- <entry>5</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- </para>
- <para>
- The idea is that you'll need to add procedures corresponding to the
- comparisons above to the <filename>pg_amop</filename> relation (see below).
- The access method code can use these strategy numbers, regardless of data
- type, to figure out how to partition the <acronym>B-tree</acronym>,
- compute selectivity, and so on. Don't worry about the details of adding
- procedures yet; just understand that there must be a set of these
- procedures for <filename>int2, int4, oid,</filename> and every other
- data type on which a <acronym>B-tree</acronym> can operate.
- </para>
- <para>
- Sometimes, strategies aren't enough information for the system to figure
- out how to use an index. Some access methods require other support
- routines in order to work. For example, the <acronym>B-tree</acronym>
- access method must be able to compare two keys and determine whether one
- is greater than, equal to, or less than the other. Similarly, the
- <acronym>R-tree</acronym> access method must be able to compute
- intersections, unions, and sizes of rectangles. These
- operations do not correspond to user qualifications in
- SQL queries; they are administrative routines used by
- the access methods, internally.
- </para>
- <para>
- In order to manage diverse support routines consistently across all
- <productname>Postgres</productname> access methods,
- <filename>pg_am</filename> includes an attribute called
- <filename>amsupport</filename>. This attribute records the number of
- support routines used by an access method. For <acronym>B-tree</acronym>s,
- this number is one -- the routine to take two keys and return -1, 0, or
- +1, depending on whether the first key is less than, equal
- to, or greater than the second.
- <note>
- <para>
- Strictly speaking, this routine can return a negative
- number (< 0), 0, or a non-zero positive number (> 0).
- </para>
- </note>
- </para>
- <para>
- The <filename>amstrategies</filename> entry in pg_am is just the number
- of strategies defined for the access method in question. The procedures
- for less than, less equal, and so on don't appear in
- <filename>pg_am</filename>. Similarly, <filename>amsupport</filename>
- is just the number of support routines required by the access
- method. The actual routines are listed elsewhere.
- </para>
- <para>
- The next class of interest is pg_opclass. This class exists only to
- associate a name and default type with an oid. In pg_amop, every
- <acronym>B-tree</acronym> operator class has a set of procedures, one
- through five, above. Some existing opclasses are <filename>int2_ops,
- int4_ops, and oid_ops</filename>. You need to add an instance with your
- opclass name (for example, <filename>complex_abs_ops</filename>) to
- <filename>pg_opclass</filename>. The <filename>oid</filename> of
- this instance is a foreign key in other classes.
- <programlisting>
- INSERT INTO pg_opclass (opcname, opcdeftype)
- SELECT 'complex_abs_ops', oid FROM pg_type WHERE typname = 'complex_abs';
- SELECT oid, opcname, opcdeftype
- FROM pg_opclass
- WHERE opcname = 'complex_abs_ops';
- +------+-----------------+------------+
- |oid | opcname | opcdeftype |
- +------+-----------------+------------+
- |17314 | complex_abs_ops | 29058 |
- +------+-----------------+------------+
- </programlisting>
- Note that the oid for your <filename>pg_opclass</filename> instance will
- be different! Don't worry about this though. We'll get this number
- from the system later just like we got the oid of the type here.
- </para>
- <para>
- So now we have an access method and an operator class.
- We still need a set of operators; the procedure for
- defining operators was discussed earlier in this manual.
- For the complex_abs_ops operator class on Btrees,
- the operators we require are:
- <programlisting>
- absolute value less-than
- absolute value less-than-or-equal
- absolute value equal
- absolute value greater-than-or-equal
- absolute value greater-than
- </programlisting>
- </para>
- <para>
- Suppose the code that implements the functions defined
- is stored in the file
- <filename>PGROOT/src/tutorial/complex.c</filename>
- </para>
- <para>
- Part of the code look like this: (note that we will only show the
- equality operator for the rest of the examples. The other four
- operators are very similar. Refer to <filename>complex.c</filename>
- or <filename>complex.source</filename> for the details.)
- <programlisting>
- #define Mag(c) ((c)->x*(c)->x + (c)->y*(c)->y)
- bool
- complex_abs_eq(Complex *a, Complex *b)
- {
- double amag = Mag(a), bmag = Mag(b);
- return (amag==bmag);
- }
- </programlisting>
- </para>
- <para>
- There are a couple of important things that are happening below.
- </para>
- <para>
- First, note that operators for less-than, less-than-or equal, equal,
- greater-than-or-equal, and greater-than for <filename>int4</filename>
- are being defined. All of these operators are already defined for
- <filename>int4</filename> under the names <, <=, =, >=,
- and >. The new operators behave differently, of course. In order
- to guarantee that <productname>Postgres</productname> uses these
- new operators rather than the old ones, they need to be named differently
- from the old ones. This is a key point: you can overload operators in
- <productname>Postgres</productname>, but only if the operator isn't
- already defined for the argument types. That is, if you have <
- defined for (int4, int4), you can't define it again.
- <productname>Postgres</productname> does not check this when you define
- your operator, so be careful. To avoid this problem, odd names will be
- used for the operators. If you get this wrong, the access methods
- are likely to crash when you try to do scans.
- </para>
- <para>
- The other important point is that all the operator functions return
- Boolean values. The access methods rely on this fact. (On the other
- hand, the support function returns whatever the particular access method
- expects -- in this case, a signed integer.) The final routine in the
- file is the "support routine" mentioned when we discussed the amsupport
- attribute of the <filename>pg_am</filename> class. We will use this
- later on. For now, ignore it.
- </para>
- <para>
- <programlisting>
- CREATE FUNCTION complex_abs_eq(complex_abs, complex_abs)
- RETURNS bool
- AS 'PGROOT/tutorial/obj/complex.so'
- LANGUAGE 'c';
- </programlisting>
- </para>
- <para>
- Now define the operators that use them. As noted, the operator names
- must be unique among all operators that take two <filename>int4</filename>
- operands. In order to see if the operator names listed below are taken,
- we can do a query on <filename>pg_operator</filename>:
- <programlisting>
- /*
- * this query uses the regular expression operator (~)
- * to find three-character operator names that end in
- * the character &
- */
- SELECT *
- FROM pg_operator
- WHERE oprname ~ '^..&$'::text;
- </programlisting>
- </para>
- <para>
- to see if your name is taken for the types you want. The important
- things here are the procedure (which are the <acronym>C</acronym>
- functions defined above) and the restriction and join selectivity
- functions. You should just use the ones used below--note that there
- are different such functions for the less-than, equal, and greater-than
- cases. These must be supplied, or the access method will crash when it
- tries to use the operator. You should copy the names for restrict and
- join, but use the procedure names you defined in the last step.
- <programlisting>
- CREATE OPERATOR = (
- leftarg = complex_abs, rightarg = complex_abs,
- procedure = complex_abs_eq,
- restrict = eqsel, join = eqjoinsel
- )
- </programlisting>
- </para>
- <para>
- Notice that five operators corresponding to less, less equal, equal,
- greater, and greater equal are defined.
- </para>
- <para>
- We're just about finished. the last thing we need to do is to update
- the <filename>pg_amop</filename> relation. To do this, we need the
- following attributes:
- <table tocentry="1">
- <title><filename>pg_amproc</filename> Schema</title>
- <titleabbrev><filename>pg_amproc</filename></titleabbrev>
- <tgroup cols="2">
- <thead>
- <row>
- <entry>Attribute</entry>
- <entry>Description</entry>
- </row>
- </thead>
- <tbody>
- <row>
- <entry>amopid</entry>
- <entry>the <filename>oid</filename> of the <filename>pg_am</filename> instance
- for B-tree (== 403, see above)</entry>
- </row>
- <row>
- <entry>amopclaid</entry>
- <entry>the <filename>oid</filename> of the
- <filename>pg_opclass</filename> instance for <filename>complex_abs_ops</filename>
- (== whatever you got instead of <filename>17314</filename>, see above)</entry>
- </row>
- <row>
- <entry>amopopr</entry>
- <entry>the <filename>oid</filename>s of the operators for the opclass
- (which we'll get in just a minute)</entry>
- </row>
- <row>
- <entry>amopselect, amopnpages</entry>
- <entry>cost functions</entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- The cost functions are used by the query optimizer to decide whether or
- not to use a given index in a scan. Fortunately, these already exist.
- The two functions we'll use are <filename>btreesel</filename>, which
- estimates the selectivity of the <acronym>B-tree</acronym>, and
- <filename>btreenpage</filename>, which estimates the number of pages a
- search will touch in the tree.
- </para>
- <para>
- So we need the <filename>oid</filename>s of the operators we just
- defined. We'll look up the names of all the operators that take
- two <filename>complex</filename>es, and pick ours out:
-
- <programlisting>
- SELECT o.oid AS opoid, o.oprname
- INTO TABLE complex_ops_tmp
- FROM pg_operator o, pg_type t
- WHERE o.oprleft = t.oid and o.oprright = t.oid
- and t.typname = 'complex_abs';
- +------+---------+
- |oid | oprname |
- +------+---------+
- |17321 | < |
- +------+---------+
- |17322 | <= |
- +------+---------+
- |17323 | = |
- +------+---------+
- |17324 | >= |
- +------+---------+
- |17325 | > |
- +------+---------+
- </programlisting>
- (Again, some of your <filename>oid</filename> numbers will almost
- certainly be different.) The operators we are interested in are those
- with <filename>oid</filename>s 17321 through 17325. The values you
- get will probably be different, and you should substitute them for the
- values below. We will do this with a select statement.
- </para>
- <para>
- Now we're ready to update <filename>pg_amop</filename> with our new
- operator class. The most important thing in this entire discussion
- is that the operators are ordered, from less equal through greater
- equal, in <filename>pg_amop</filename>. We add the instances we need:
- <programlisting>
- INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
- amopselect, amopnpages)
- SELECT am.oid, opcl.oid, c.opoid, 1,
- 'btreesel'::regproc, 'btreenpage'::regproc
- FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
- WHERE amname = 'btree' AND
- opcname = 'complex_abs_ops' AND
- c.oprname = '<';
- </programlisting>
- Now do this for the other operators substituting for the "1" in the
- third line above and the "<" in the last line. Note the order:
- "less than" is 1, "less than or equal" is 2, "equal" is 3, "greater
- than or equal" is 4, and "greater than" is 5.
- </para>
- <para>
- The next step is registration of the "support routine" previously
- described in our discussion of <filename>pg_am</filename>. The
- <filename>oid</filename> of this support routine is stored in the
- <filename>pg_amproc</filename> class, keyed by the access method
- <filename>oid</filename> and the operator class <filename>oid</filename>.
- First, we need to register the function in
- <productname>Postgres</productname> (recall that we put the
- <acronym>C</acronym> code that implements this routine in the bottom of
- the file in which we implemented the operator routines):
- <programlisting>
- CREATE FUNCTION complex_abs_cmp(complex, complex)
- RETURNS int4
- AS 'PGROOT/tutorial/obj/complex.so'
- LANGUAGE 'c';
- SELECT oid, proname FROM pg_proc
- WHERE proname = 'complex_abs_cmp';
- +------+-----------------+
- |oid | proname |
- +------+-----------------+
- |17328 | complex_abs_cmp |
- +------+-----------------+
- </programlisting>
- (Again, your <filename>oid</filename> number will probably be different
- and you should substitute the value you see for the value below.)
- We can add the new instance as follows:
- <programlisting>
- INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
- SELECT a.oid, b.oid, c.oid, 1
- FROM pg_am a, pg_opclass b, pg_proc c
- WHERE a.amname = 'btree' AND
- b.opcname = 'complex_abs_ops' AND
- c.proname = 'complex_abs_cmp';
- </programlisting>
- </para>
- <para>
- Now we need to add a hashing strategy to allow the type to be indexed.
- We do this by using another type in pg_am but we reuse the sames ops.
- <programlisting>
- INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
- amopselect, amopnpages)
- SELECT am.oid, opcl.oid, c.opoid, 1,
- 'hashsel'::regproc, 'hashnpage'::regproc
- FROM pg_am am, pg_opclass opcl, complex_abs_ops_tmp c
- WHERE amname = 'hash' AND
- opcname = 'complex_abs_ops' AND
- c.oprname = '=';
- </programlisting>
- </para>
- <para>
- In order to use this index in a where clause, we need to modify the
- <filename>pg_operator</filename> class as follows.
- <programlisting>
- UPDATE pg_operator
- SET oprrest = 'eqsel'::regproc, oprjoin = 'eqjoinsel'
- WHERE oprname = '=' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
- WHERE oprname = '<filename>' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'neqsel'::regproc, oprjoin = 'neqjoinsel'
- WHERE oprname = '<filename>' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'
- WHERE oprname = '<' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'intltsel'::regproc, oprjoin = 'intltjoinsel'
- WHERE oprname = '<=' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'
- WHERE oprname = '>' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');
-
- UPDATE pg_operator
- SET oprrest = 'intgtsel'::regproc, oprjoin = 'intgtjoinsel'
- WHERE oprname = '>=' AND
- oprleft = oprright AND
- oprleft = (SELECT oid FROM pg_type WHERE typname = 'complex_abs');</filename></filename>
- </programlisting>
- </para>
- <para>
- And last (Finally!) we register a description of this type.
- <programlisting>
- INSERT INTO pg_description (objoid, description)
- SELECT oid, 'Two part G/L account'
- FROM pg_type WHERE typname = 'complex_abs';
- </programlisting>
- </para>
- </chapter>
- <!-- Keep this comment at the end of the file
- Local variables:
- mode: sgml
- sgml-omittag:nil
- sgml-shorttag:t
- sgml-minimize-attributes:nil
- sgml-always-quote-attributes:t
- sgml-indent-step:1
- sgml-indent-data:t
- sgml-parent-document:nil
- sgml-default-dtd-file:"./reference.ced"
- sgml-exposed-tags:nil
- sgml-local-catalogs:"/usr/lib/sgml/catalog"
- sgml-local-ecat-files:nil
- End:
- -->