geqo.sgml
上传用户:blenddy
上传日期:2007-01-07
资源大小:6495k
文件大小:12k
- <!--
- $Header: /usr/local/cvsroot/pgsql/doc/src/sgml/geqo.sgml,v 1.8 1999/03/30 15:25:56 thomas Exp $
- Genetic Optimizer
- $Log: geqo.sgml,v $
- Revision 1.8 1999/03/30 15:25:56 thomas
- Fix up small markup problems. Force omit-tags to nil so we have tag
- completion as required by the newest DocBook conventions.
- Revision 1.7 1999/02/19 01:57:08 thomas
- Fix SGML markup from last content changes.
- Revision 1.6 1999/02/18 05:26:17 momjian
- Enable bushy plans by default.
- Revision 1.5 1998/12/29 02:24:15 thomas
- Clean up to ensure tag completion as required by the newest versions
- of Norm's Modular Style Sheets and jade/docbook.
- From Vince Vielhaber <vev@michvhf.com>.
- Revision 1.4 1998/08/15 06:55:05 thomas
- Change Id field in chapter tag to change html output file name.
- -->
- <Chapter Id="geqo">
- <DocInfo>
- <Author>
- <FirstName>Martin</FirstName>
- <SurName>Utesch</SurName>
- <Affiliation>
- <Orgname>
- University of Mining and Technology
- </Orgname>
- <Orgdiv>
- Institute of Automatic Control
- </Orgdiv>
- <Address>
- <City>
- Freiberg
- </City>
- <Country>
- Germany
- </Country>
- </Address>
- </Affiliation>
- </Author>
- <Date>1997-10-02</Date>
- </DocInfo>
- <Title>Genetic Query Optimization in Database Systems</Title>
- <Para>
- <Note>
- <Title>Author</Title>
- <Para>
- Written by <ULink url="utesch@aut.tu-freiberg.de">Martin Utesch</ULink>
- for the Institute of Automatic Control at the University of Mining and Technology in Freiberg, Germany.
- </Para>
- </Note>
- </para>
- <Sect1>
- <Title>Query Handling as a Complex Optimization Problem</Title>
- <Para>
- Among all relational operators the most difficult one to process and
- optimize is the <FirstTerm>join</FirstTerm>. The number of alternative plans to answer a query
- grows exponentially with the number of <Command>join</Command>s included in it. Further
- optimization effort is caused by the support of a variety of <FirstTerm>join methods</FirstTerm>
- (e.g., nested loop, index scan, merge join in <ProductName>Postgres</ProductName>) to
- process individual <Command>join</Command>s and a diversity of <FirstTerm>indices</FirstTerm> (e.g., r-tree,
- b-tree, hash in <ProductName>Postgres</ProductName>) as access paths for relations.
- </para>
- <Para>
- The current <ProductName>Postgres</ProductName> optimizer implementation performs a <FirstTerm>near-
- exhaustive search</FirstTerm> over the space of alternative strategies. This query
- optimization technique is inadequate to support database application
- domains that involve the need for extensive queries, such as artificial
- intelligence.
- </para>
- <Para>
- The Institute of Automatic Control at the University of Mining and
- Technology, in Freiberg, Germany, encountered the described problems as its
- folks wanted to take the <ProductName>Postgres</ProductName> DBMS as the backend for a decision
- support knowledge based system for the maintenance of an electrical
- power grid. The DBMS needed to handle large <Command>join</Command> queries for the
- inference machine of the knowledge based system.
- </para>
- <Para>
- Performance difficulties within exploring the space of possible query
- plans arose the demand for a new optimization technique being developed.
- </para>
- <Para>
- In the following we propose the implementation of a <FirstTerm>Genetic Algorithm</FirstTerm>
- as an option for the database query optimization problem.
- </para>
- </sect1>
- <Sect1>
- <Title>Genetic Algorithms (<Acronym>GA</Acronym>)</Title>
- <Para>
- The <Acronym>GA</Acronym> is a heuristic optimization method which operates through
- determined, randomized search. The set of possible solutions for the
- optimization problem is considered as a <FirstTerm>population</FirstTerm> of <FirstTerm>individuals</FirstTerm>.
- The degree of adaption of an individual to its environment is specified
- by its <FirstTerm>fitness</FirstTerm>.
- </para>
- <Para>
- The coordinates of an individual in the search space are represented
- by <FirstTerm>chromosomes</FirstTerm>, in essence a set of character strings. A <FirstTerm>gene</FirstTerm> is a
- subsection of a chromosome which encodes the value of a single parameter
- being optimized. Typical encodings for a gene could be <FirstTerm>binary</FirstTerm> or
- <FirstTerm>integer</FirstTerm>.
- </para>
- <Para>
- Through simulation of the evolutionary operations <FirstTerm>recombination</FirstTerm>,
- <FirstTerm>mutation</FirstTerm>, and <FirstTerm>selection</FirstTerm> new generations of search points are found
- that show a higher average fitness than their ancestors.
- </para>
- <Para>
- According to the "comp.ai.genetic" <Acronym>FAQ</Acronym> it cannot be stressed too
- strongly that a <Acronym>GA</Acronym> is not a pure random search for a solution to a
- problem. A <Acronym>GA</Acronym> uses stochastic processes, but the result is distinctly
- non-random (better than random).
- <ProgramListing>
- Structured Diagram of a <Acronym>GA</Acronym>:
- ---------------------------
- P(t) generation of ancestors at a time t
- P''(t) generation of descendants at a time t
- +=========================================+
- |>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
- +=========================================+
- | INITIALIZE t := 0 |
- +=========================================+
- | INITIALIZE P(t) |
- +=========================================+
- | evalute FITNESS of P(t) |
- +=========================================+
- | while not STOPPING CRITERION do |
- | +-------------------------------------+
- | | P'(t) := RECOMBINATION{P(t)} |
- | +-------------------------------------+
- | | P''(t) := MUTATION{P'(t)} |
- | +-------------------------------------+
- | | P(t+1) := SELECTION{P''(t) + P(t)} |
- | +-------------------------------------+
- | | evalute FITNESS of P''(t) |
- | +-------------------------------------+
- | | t := t + 1 |
- +===+=====================================+
- </ProgramListing>
- </para>
- </sect1>
- <Sect1>
- <Title>Genetic Query Optimization (<Acronym>GEQO</Acronym>) in Postgres</Title>
- <Para>
- The <Acronym>GEQO</Acronym> module is intended for the solution of the query
- optimization problem similar to a traveling salesman problem (<Acronym>TSP</Acronym>).
- Possible query plans are encoded as integer strings. Each string
- represents the <Command>join</Command> order from one relation of the query to the next.
- E. g., the query tree
- <ProgramListing>
- /
- / 2
- / 3
- 4 1
- </ProgramListing>
- is encoded by the integer string '4-1-3-2',
- which means, first join relation '4' and '1', then '3', and
- then '2', where 1, 2, 3, 4 are relids in <ProductName>Postgres</ProductName>.
- </para>
- <Para>
- Parts of the <Acronym>GEQO</Acronym> module are adapted from D. Whitley's Genitor
- algorithm.
- </para>
- <Para>
- Specific characteristics of the <Acronym>GEQO</Acronym> implementation in <ProductName>Postgres</ProductName>
- are:
- <ItemizedList Mark="bullet" Spacing="compact">
- <ListItem>
- <Para>
- Usage of a <FirstTerm>steady state</FirstTerm> <Acronym>GA</Acronym> (replacement of the least fit
- individuals in a population, not whole-generational replacement)
- allows fast convergence towards improved query plans. This is
- essential for query handling with reasonable time;
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- Usage of <FirstTerm>edge recombination crossover</FirstTerm> which is especially suited
- to keep edge losses low for the solution of the <Acronym>TSP</Acronym> by means of a <Acronym>GA</Acronym>;
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- Mutation as genetic operator is deprecated so that no repair
- mechanisms are needed to generate legal <Acronym>TSP</Acronym> tours.
- </Para>
- </ListItem>
- </ItemizedList>
- </para>
- <Para>
- The <Acronym>GEQO</Acronym> module gives the following benefits to the <ProductName>Postgres</ProductName> DBMS
- compared to the <ProductName>Postgres</ProductName> query optimizer implementation:
- <ItemizedList Mark="bullet" Spacing="compact">
- <ListItem>
- <Para>
- Handling of large <Command>join</Command> queries through non-exhaustive search;
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- Improved cost size approximation of query plans since no longer
- plan merging is needed (the <Acronym>GEQO</Acronym> module evaluates the cost for a
- query plan as an individual).
- </Para>
- </ListItem>
- </ItemizedList>
- </para>
- </Sect1>
- <Sect1>
- <Title>Future Implementation Tasks for <ProductName>Postgres</ProductName> <Acronym>GEQO</Acronym></Title>
- <Sect2>
- <Title>Basic Improvements</Title>
- <Sect3>
- <Title>Improve freeing of memory when query is already processed</Title>
- <Para>
- With large <Command>join</Command> queries the computing time spent for the genetic query
- optimization seems to be a mere <Emphasis>fraction</Emphasis> of the time
- <ProductName>Postgres</ProductName>
- needs for freeing memory via routine <Function>MemoryContextFree</Function>,
- file <FileName>backend/utils/mmgr/mcxt.c</FileName>.
- Debugging showed that it get stucked in a loop of routine
- <Function>OrderedElemPop</Function>, file <FileName>backend/utils/mmgr/oset.c</FileName>.
- The same problems arise with long queries when using the normal
- <ProductName>Postgres</ProductName> query optimization algorithm.
- </para>
- </sect3>
- <Sect3>
- <Title>Improve genetic algorithm parameter settings</Title>
- <Para>
- In file <FileName>backend/optimizer/geqo/geqo_params.c</FileName>, routines
- <Function>gimme_pool_size</Function> and <Function>gimme_number_generations</Function>,
- we have to find a compromise for the parameter settings
- to satisfy two competing demands:
- <ItemizedList Spacing="compact">
- <ListItem>
- <Para>
- Optimality of the query plan
- </Para>
- </ListItem>
- <ListItem>
- <Para>
- Computing time
- </Para>
- </ListItem>
- </ItemizedList>
- </para>
- </sect3>
- <Sect3>
- <Title>Find better solution for integer overflow</Title>
- <Para>
- In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine
- <Function>geqo_joinrel_size</Function>,
- the present hack for MAXINT overflow is to set the <ProductName>Postgres</ProductName> integer
- value of <StructField>rel->size</StructField> to its logarithm.
- Modifications of <StructName>Rel</StructName> in <FileName>backend/nodes/relation.h</FileName> will
- surely have severe impacts on the whole <ProductName>Postgres</ProductName> implementation.
- </para>
- </sect3>
- <Sect3>
- <Title>Find solution for exhausted memory</Title>
- <Para>
- Memory exhaustion may occur with more than 10 relations involved in a query.
- In file <FileName>backend/optimizer/geqo/geqo_eval.c</FileName>, routine
- <Function>gimme_tree</Function> is recursively called.
- Maybe I forgot something to be freed correctly, but I dunno what.
- Of course the <StructName>rel</StructName> data structure of the <Command>join</Command> keeps growing and
- growing the more relations are packed into it.
- Suggestions are welcome :-(
- </para>
- </sect3>
- </sect2>
- <BIBLIOGRAPHY Id="geqo-biblio">
- <TITLE>
- References
- </TITLE>
- <PARA>Reference information for <Acronym>GEQ</Acronym> algorithms.
- </PARA>
- <BIBLIOENTRY>
- <BOOKBIBLIO>
- <TITLE>
- The Hitch-Hiker's Guide to Evolutionary Computation
- </TITLE>
- <AUTHORGROUP>
- <AUTHOR>
- <FIRSTNAME>Jörg</FIRSTNAME>
- <SURNAME>Heitkötter</SURNAME>
- </AUTHOR>
- <AUTHOR>
- <FIRSTNAME>David</FIRSTNAME>
- <SURNAME>Beasley</SURNAME>
- </AUTHOR>
- </AUTHORGROUP>
- <PUBLISHER>
- <PUBLISHERNAME>
- InterNet resource
- </PUBLISHERNAME>
- </PUBLISHER>
- <ABSTRACT>
- <Para>
- FAQ in <ULink url="news://comp.ai.genetic">comp.ai.genetic</ULink>
- is available at <ULink url="ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html">Encore</ULink>.
- </Para>
- </ABSTRACT>
- </BOOKBIBLIO>
- <BOOKBIBLIO>
- <TITLE>
- The Design and Implementation of the Postgres Query Optimizer
- </TITLE>
- <AUTHORGROUP>
- <AUTHOR>
- <FIRSTNAME>Z.</FIRSTNAME>
- <SURNAME>Fong</SURNAME>
- </AUTHOR>
- </AUTHORGROUP>
- <PUBLISHER>
- <PUBLISHERNAME>
- University of California, Berkeley Computer Science Department
- </PUBLISHERNAME>
- </PUBLISHER>
- <ABSTRACT>
- <Para>
- File <FileName>planner/Report.ps</FileName> in the 'postgres-papers' distribution.
- </Para>
- </ABSTRACT>
- </BOOKBIBLIO>
- <BOOKBIBLIO>
- <TITLE>
- Fundamentals of Database Systems
- </TITLE>
- <AUTHORGROUP>
- <AUTHOR>
- <FIRSTNAME>R.</FIRSTNAME>
- <SURNAME>Elmasri</SURNAME>
- </AUTHOR>
- <AUTHOR>
- <FIRSTNAME>S.</FIRSTNAME>
- <SURNAME>Navathe</SURNAME>
- </AUTHOR>
- </AUTHORGROUP>
- <PUBLISHER>
- <PUBLISHERNAME>
- The Benjamin/Cummings Pub., Inc.
- </PUBLISHERNAME>
- </PUBLISHER>
- </BOOKBIBLIO>
- </BIBLIOENTRY>
- </BIBLIOGRAPHY>
- </sect1>
- </Chapter>
- <!-- Keep this comment at the end of the file
- Local variables:
- mode: sgml
- sgml-omittag:nil
- sgml-shorttag:t
- sgml-minimize-attributes:nil
- sgml-always-quote-attributes:t
- sgml-indent-step:1
- sgml-indent-data:t
- sgml-parent-document:nil
- sgml-default-dtd-file:"./reference.ced"
- sgml-exposed-tags:nil
- sgml-local-catalogs:"/usr/lib/sgml/catalog"
- sgml-local-ecat-files:nil
- End:
- -->