manual.txt
上传用户:tsgydb
上传日期:2007-04-14
资源大小:10674k
文件大小:1762k
- several date-part extraction functions, such as `YEAR()', `MONTH()', and
- `DAYOFMONTH()'. `MONTH()' is the appropriate function here. To see
- how it works, run a simple query that displays the value of both
- `birth' and `MONTH(birth)':
- mysql> SELECT name, birth, MONTH(birth) FROM pet;
- +----------+------------+--------------+
- | name | birth | MONTH(birth) |
- +----------+------------+--------------+
- | Fluffy | 1993-02-04 | 2 |
- | Claws | 1994-03-17 | 3 |
- | Buffy | 1989-05-13 | 5 |
- | Fang | 1990-08-27 | 8 |
- | Bowser | 1989-08-31 | 8 |
- | Chirpy | 1998-09-11 | 9 |
- | Whistler | 1997-12-09 | 12 |
- | Slim | 1996-04-29 | 4 |
- | Puffball | 1999-03-30 | 3 |
- +----------+------------+--------------+
- Finding animals with birthdays in the upcoming month is easy, too.
- Suppose the current month is April. Then the month value is `4' and
- you look for animals born in May (month 5) like this:
- mysql> SELECT name, birth FROM pet WHERE MONTH(birth) = 5;
- +-------+------------+
- | name | birth |
- +-------+------------+
- | Buffy | 1989-05-13 |
- +-------+------------+
- There is a small complication if the current month is December, of
- course. You don't just add one to the month number (`12') and look for
- animals born in month 13, because there is no such month. Instead, you
- look for animals born in January (month 1).
- You can even write the query so that it works no matter what the current
- month is. That way you don't have to use a particular month number in
- the query. `DATE_ADD()' allows you to add a time interval to a given
- date. If you add a month to the value of `NOW()', then extract the
- month part with `MONTH()', the result produces the month in which to
- look for birthdays:
- mysql> SELECT name, birth FROM pet
- -> WHERE MONTH(birth) = MONTH(DATE_ADD(NOW(), INTERVAL 1 MONTH));
- A different way to accomplish the same task is to add `1' to get the
- next month after the current one (after using the modulo function
- (`MOD') to wrap around the month value to `0' if it is currently `12'):
- mysql> SELECT name, birth FROM pet
- -> WHERE MONTH(birth) = MOD(MONTH(NOW()), 12) + 1;
- Note that `MONTH' returns a number between 1 and 12. And
- `MOD(something,12)' returns a number between 0 and 11. So the addition
- has to be after the `MOD()' otherwise we would go from November (11) to
- January (1).
- Working with `NULL' Values
- ..........................
- The `NULL' value can be surprising until you get used to it.
- Conceptually, `NULL' means missing value or unknown value and it is
- treated somewhat differently than other values. To test for `NULL',
- you cannot use the arithmetic comparison operators such as `=', `<', or
- `!='. To demonstrate this for yourself, try the following query:
- mysql> SELECT 1 = NULL, 1 != NULL, 1 < NULL, 1 > NULL;
- +----------+-----------+----------+----------+
- | 1 = NULL | 1 != NULL | 1 < NULL | 1 > NULL |
- +----------+-----------+----------+----------+
- | NULL | NULL | NULL | NULL |
- +----------+-----------+----------+----------+
- Clearly you get no meaningful results from these comparisons. Use the
- `IS NULL' and `IS NOT NULL' operators instead:
- mysql> SELECT 1 IS NULL, 1 IS NOT NULL;
- +-----------+---------------+
- | 1 IS NULL | 1 IS NOT NULL |
- +-----------+---------------+
- | 0 | 1 |
- +-----------+---------------+
- In *MySQL*, 0 or `NULL' means false and anything else means true. The
- default trueth value from a boolean operation is 1.
- This special treatment of `NULL' is why, in the previous section, it
- was necessary to determine which animals are no longer alive using
- `death IS NOT NULL' instead of `death != NULL'.
- Pattern Matching
- ................
- *MySQL* provides standard SQL pattern matching as well as a form of
- pattern matching based on extended regular expressions similar to those
- used by Unix utilities such as `vi', `grep', and `sed'.
- SQL pattern matching allows you to use `_' to match any single
- character and `%' to match an arbitrary number of characters (including
- zero characters). In *MySQL*, SQL patterns are case insensitive by
- default. Some examples are shown below. Note that you do not use `='
- or `!=' when you use SQL patterns; use the `LIKE' or `NOT LIKE'
- comparison operators instead.
- To find names beginning with `b':
- mysql> SELECT * FROM pet WHERE name LIKE "b%";
- +--------+--------+---------+------+------------+------------+
- | name | owner | species | sex | birth | death |
- +--------+--------+---------+------+------------+------------+
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- | Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
- +--------+--------+---------+------+------------+------------+
- To find names ending with `fy':
- mysql> SELECT * FROM pet WHERE name LIKE "%fy";
- +--------+--------+---------+------+------------+-------+
- | name | owner | species | sex | birth | death |
- +--------+--------+---------+------+------------+-------+
- | Fluffy | Harold | cat | f | 1993-02-04 | NULL |
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- +--------+--------+---------+------+------------+-------+
- To find names containing a `w':
- mysql> SELECT * FROM pet WHERE name LIKE "%w%";
- +----------+-------+---------+------+------------+------------+
- | name | owner | species | sex | birth | death |
- +----------+-------+---------+------+------------+------------+
- | Claws | Gwen | cat | m | 1994-03-17 | NULL |
- | Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
- | Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
- +----------+-------+---------+------+------------+------------+
- To find names containing exactly five characters, use the `_' pattern
- character:
- mysql> SELECT * FROM pet WHERE name LIKE "_____";
- +-------+--------+---------+------+------------+-------+
- | name | owner | species | sex | birth | death |
- +-------+--------+---------+------+------------+-------+
- | Claws | Gwen | cat | m | 1994-03-17 | NULL |
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- +-------+--------+---------+------+------------+-------+
- The other type of pattern matching provided by *MySQL* uses extended
- regular expressions. When you test for a match for this type of
- pattern, use the `REGEXP' and `NOT REGEXP' operators (or `RLIKE' and
- `NOT RLIKE', which are synonyms).
- Some characteristics of extended regular expressions are:
- * `.' matches any single character.
- * A character class `[...]' matches any character within the
- brackets. For example, `[abc]' matches `a', `b', or `c'. To name
- a range of characters, use a dash. `[a-z]' matches any lowercase
- letter, whereas `[0-9]' matches any digit.
- * `*' matches zero or more instances of the thing preceding it. For
- example, `x*' matches any number of `x' characters, `[0-9]*'
- matches any number of digits, and `.*' matches any number of
- anything.
- * Regular expressions are case sensitive, but you can use a
- character class to match both lettercases if you wish. For
- example, `[aA]' matches lowercase or uppercase `a' and `[a-zA-Z]'
- matches any letter in either case.
- * The pattern matches if it occurs anywhere in the value being
- tested. (SQL patterns match only if they match the entire value.)
- * To anchor a pattern so that it must match the beginning or end of
- the value being tested, use `^' at the beginning or `$' at the end
- of the pattern.
- To demonstrate how extended regular expressions work, the `LIKE' queries
- shown above are rewritten below to use `REGEXP'.
- To find names beginning with `b', use `^' to match the beginning of the
- name:
- mysql> SELECT * FROM pet WHERE name REGEXP "^b";
- +--------+--------+---------+------+------------+------------+
- | name | owner | species | sex | birth | death |
- +--------+--------+---------+------+------------+------------+
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- | Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
- +--------+--------+---------+------+------------+------------+
- Prior to MySQL 3.23.4, `REGEXP' is case sensitive, and the previous
- query will return no rows. To match either lowercase or uppercase `b',
- use this query instead:
- mysql> SELECT * FROM pet WHERE name REGEXP "^[bB]";
- From MySQL 3.23.4 on, to force a `REGEXP' comparison to be case
- sensitive, use the `BINARY' keyword to make one of the strings a binary
- string. This query will match only lowercase `b' at the beginning of a
- name:
- mysql> SELECT * FROM pet WHERE name REGEXP BINARY "^b";
- To find names ending with `fy', use `$' to match the end of the name:
- mysql> SELECT * FROM pet WHERE name REGEXP "fy$";
- +--------+--------+---------+------+------------+-------+
- | name | owner | species | sex | birth | death |
- +--------+--------+---------+------+------------+-------+
- | Fluffy | Harold | cat | f | 1993-02-04 | NULL |
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- +--------+--------+---------+------+------------+-------+
- To find names containing a lowercase or uppercase `w', use this query:
- mysql> SELECT * FROM pet WHERE name REGEXP "w";
- +----------+-------+---------+------+------------+------------+
- | name | owner | species | sex | birth | death |
- +----------+-------+---------+------+------------+------------+
- | Claws | Gwen | cat | m | 1994-03-17 | NULL |
- | Bowser | Diane | dog | m | 1989-08-31 | 1995-07-29 |
- | Whistler | Gwen | bird | NULL | 1997-12-09 | NULL |
- +----------+-------+---------+------+------------+------------+
- Because a regular expression pattern matches if it occurs anywhere in
- the value, it is not necessary in the previous query to put a wild card
- on either side of the pattern to get it to match the entire value like
- it would be if you used a SQL pattern.
- To find names containing exactly five characters, use `^' and `$' to
- match the beginning and end of the name, and five instances of `.' in
- between:
- mysql> SELECT * FROM pet WHERE name REGEXP "^.....$";
- +-------+--------+---------+------+------------+-------+
- | name | owner | species | sex | birth | death |
- +-------+--------+---------+------+------------+-------+
- | Claws | Gwen | cat | m | 1994-03-17 | NULL |
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- +-------+--------+---------+------+------------+-------+
- You could also write the previous query using the `{n}'
- "repeat-`n'-times" operator:
- mysql> SELECT * FROM pet WHERE name REGEXP "^.{5}$";
- +-------+--------+---------+------+------------+-------+
- | name | owner | species | sex | birth | death |
- +-------+--------+---------+------+------------+-------+
- | Claws | Gwen | cat | m | 1994-03-17 | NULL |
- | Buffy | Harold | dog | f | 1989-05-13 | NULL |
- +-------+--------+---------+------+------------+-------+
- Counting Rows
- .............
- Databases are often used to answer the question, "How often does a
- certain type of data occur in a table?" For example, you might want to
- know how many pets you have, or how many pets each owner has, or you
- might want to perform various kinds of censuses on your animals.
- Counting the total number of animals you have is the same question as
- "How many rows are in the `pet' table?" because there is one record per
- pet. The `COUNT()' function counts the number of non-`NULL' results, so
- the query to count your animals looks like this:
- mysql> SELECT COUNT(*) FROM pet;
- +----------+
- | COUNT(*) |
- +----------+
- | 9 |
- +----------+
- Earlier, you retrieved the names of the people who owned pets. You can
- use `COUNT()' if you want to find out how many pets each owner has:
- mysql> SELECT owner, COUNT(*) FROM pet GROUP BY owner;
- +--------+----------+
- | owner | COUNT(*) |
- +--------+----------+
- | Benny | 2 |
- | Diane | 2 |
- | Gwen | 3 |
- | Harold | 2 |
- +--------+----------+
- Note the use of `GROUP BY' to group together all records for each
- `owner'. Without it, all you get is an error message:
- mysql> SELECT owner, COUNT(owner) FROM pet;
- ERROR 1140 at line 1: Mixing of GROUP columns (MIN(),MAX(),COUNT()...)
- with no GROUP columns is illegal if there is no GROUP BY clause
- `COUNT()' and `GROUP BY' are useful for characterizing your data in
- various ways. The following examples show different ways to perform
- animal census operations.
- Number of animals per species:
- mysql> SELECT species, COUNT(*) FROM pet GROUP BY species;
- +---------+----------+
- | species | COUNT(*) |
- +---------+----------+
- | bird | 2 |
- | cat | 2 |
- | dog | 3 |
- | hamster | 1 |
- | snake | 1 |
- +---------+----------+
- Number of animals per sex:
- mysql> SELECT sex, COUNT(*) FROM pet GROUP BY sex;
- +------+----------+
- | sex | COUNT(*) |
- +------+----------+
- | NULL | 1 |
- | f | 4 |
- | m | 4 |
- +------+----------+
- (In this output, `NULL' indicates sex unknown.)
- Number of animals per combination of species and sex:
- mysql> SELECT species, sex, COUNT(*) FROM pet GROUP BY species, sex;
- +---------+------+----------+
- | species | sex | COUNT(*) |
- +---------+------+----------+
- | bird | NULL | 1 |
- | bird | f | 1 |
- | cat | f | 1 |
- | cat | m | 1 |
- | dog | f | 1 |
- | dog | m | 2 |
- | hamster | f | 1 |
- | snake | m | 1 |
- +---------+------+----------+
- You need not retrieve an entire table when you use `COUNT()'. For
- example, the previous query, when performed just on dogs and cats,
- looks like this:
- mysql> SELECT species, sex, COUNT(*) FROM pet
- -> WHERE species = "dog" OR species = "cat"
- -> GROUP BY species, sex;
- +---------+------+----------+
- | species | sex | COUNT(*) |
- +---------+------+----------+
- | cat | f | 1 |
- | cat | m | 1 |
- | dog | f | 1 |
- | dog | m | 2 |
- +---------+------+----------+
- Or, if you wanted the number of animals per sex only for known-sex
- animals:
- mysql> SELECT species, sex, COUNT(*) FROM pet
- -> WHERE sex IS NOT NULL
- -> GROUP BY species, sex;
- +---------+------+----------+
- | species | sex | COUNT(*) |
- +---------+------+----------+
- | bird | f | 1 |
- | cat | f | 1 |
- | cat | m | 1 |
- | dog | f | 1 |
- | dog | m | 2 |
- | hamster | f | 1 |
- | snake | m | 1 |
- +---------+------+----------+
- Using More Than one Table
- -------------------------
- The `pet' table keeps track of which pets you have. If you want to
- record other information about them, such as events in their lives like
- visits to the vet or when litters are born, you need another table.
- What should this table look like? It needs:
- * To contain the pet name so you know which animal each event
- pertains to.
- * A date so you know when the event occurred.
- * A field to describe the event.
- * An event type field, if you want to be able to categorize events.
- Given these considerations, the `CREATE TABLE' statement for the
- `event' table might look like this:
- mysql> CREATE TABLE event (name VARCHAR(20), date DATE,
- -> type VARCHAR(15), remark VARCHAR(255));
- As with the `pet' table, it's easiest to load the initial records by
- creating a tab-delimited text file containing the information:
- Fluffy 1995-05-15 litter 4 kittens, 3 female, 1 male
- Buffy 1993-06-23 litter 5 puppies, 2 female, 3 male
- Buffy 1994-06-19 litter 3 puppies, 3 female
- Chirpy 1999-03-21 vet needed beak straightened
- Slim 1997-08-03 vet broken rib
- Bowser 1991-10-12 kennel
- Fang 1991-10-12 kennel
- Fang 1998-08-28 birthday Gave him a new chew toy
- Claws 1998-03-17 birthday Gave him a new flea collar
- Whistler 1998-12-09 birthday First birthday
- Load the records like this:
- mysql> LOAD DATA LOCAL INFILE "event.txt" INTO TABLE event;
- Based on what you've learned from the queries you've run on the `pet'
- table, you should be able to perform retrievals on the records in the
- `event' table; the principles are the same. But when is the `event'
- table by itself insufficient to answer questions you might ask?
- Suppose you want to find out the ages of each pet when they had their
- litters. The `event' table indicates when this occurred, but to
- calculate the age of the mother, you need her birth date. Because that
- is stored in the `pet' table, you need both tables for the query:
- mysql> SELECT pet.name, (TO_DAYS(date) - TO_DAYS(birth))/365 AS age, remark
- -> FROM pet, event
- -> WHERE pet.name = event.name AND type = "litter";
- +--------+------+-----------------------------+
- | name | age | remark |
- +--------+------+-----------------------------+
- | Fluffy | 2.27 | 4 kittens, 3 female, 1 male |
- | Buffy | 4.12 | 5 puppies, 2 female, 3 male |
- | Buffy | 5.10 | 3 puppies, 3 female |
- +--------+------+-----------------------------+
- There are several things to note about this query:
- * The `FROM' clause lists two tables because the query needs to pull
- information from both of them.
- * When combining (joining) information from multiple tables, you
- need to specify how records in one table can be matched to records
- in the other. This is easy because they both have a `name'
- column. The query uses `WHERE' clause to match up records in the
- two tables based on the `name' values.
- * Because the `name' column occurs in both tables, you must be
- specific about which table you mean when referring to the column.
- This is done by prepending the table name to the column name.
- You need not have two different tables to perform a join. Sometimes it
- is useful to join a table to itself, if you want to compare records in
- a table to other records in that same table. For example, to find
- breeding pairs among your pets, you can join the `pet' table with
- itself to pair up males and females of like species:
- mysql> SELECT p1.name, p1.sex, p2.name, p2.sex, p1.species
- -> FROM pet AS p1, pet AS p2
- -> WHERE p1.species = p2.species AND p1.sex = "f" AND p2.sex = "m";
- +--------+------+--------+------+---------+
- | name | sex | name | sex | species |
- +--------+------+--------+------+---------+
- | Fluffy | f | Claws | m | cat |
- | Buffy | f | Fang | m | dog |
- | Buffy | f | Bowser | m | dog |
- +--------+------+--------+------+---------+
- In this query, we specify aliases for the table name in order to refer
- to the columns and keep straight which instance of the table each
- column reference is associated with.
- Getting Information About Databases and Tables
- ==============================================
- What if you forget the name of a database or table, or what the
- structure of a given table is (for example, what its columns are
- called)? *MySQL* addresses this problem through several statements
- that provide information about the databases and tables it supports.
- You have already seen `SHOW DATABASES', which lists the databases
- managed by the server. To find out which database is currently
- selected, use the `DATABASE()' function:
- mysql> SELECT DATABASE();
- +------------+
- | DATABASE() |
- +------------+
- | menagerie |
- +------------+
- If you haven't selected any database yet, the result is blank.
- To find out what tables the current database contains (for example, when
- you're not sure about the name of a table), use this command:
- mysql> SHOW TABLES;
- +---------------------+
- | Tables in menagerie |
- +---------------------+
- | event |
- | pet |
- +---------------------+
- If you want to find out about the structure of a table, the `DESCRIBE'
- command is useful; it displays information about each of a table's
- columns:
- mysql> DESCRIBE pet;
- +---------+-------------+------+-----+---------+-------+
- | Field | Type | Null | Key | Default | Extra |
- +---------+-------------+------+-----+---------+-------+
- | name | varchar(20) | YES | | NULL | |
- | owner | varchar(20) | YES | | NULL | |
- | species | varchar(20) | YES | | NULL | |
- | sex | char(1) | YES | | NULL | |
- | birth | date | YES | | NULL | |
- | death | date | YES | | NULL | |
- +---------+-------------+------+-----+---------+-------+
- `Field' indicates the column name, `Type' is the data type for the
- column, `Null' indicates whether or not the column can contain `NULL'
- values, `Key' indicates whether or not the column is indexed, and
- `Default' specifies the column's default value.
- If you have indexes on a table, `SHOW INDEX FROM tbl_name' produces
- information about them.
- Using `mysql' in Batch Mode
- ===========================
- In the previous sections, you used `mysql' interactively to enter
- queries and view the results. You can also run `mysql' in batch mode.
- To do this, put the commands you want to run in a file, then tell
- `mysql' to read its input from the file:
- shell> mysql < batch-file
- If you need to specify connection parameters on the command line, the
- command might look like this:
- shell> mysql -h host -u user -p < batch-file
- Enter password: ********
- When you use `mysql' this way, you are creating a script file, then
- executing the script.
- Why use a script? Here are a few reasons:
- * If you run a query repeatedly (say, every day or every week),
- making it a script allows you to avoid retyping it each time you
- execute it.
- * You can generate new queries from existing ones that are similar
- by copying and editing script files.
- * Batch mode can also be useful while you're developing a query,
- particularly for multiple-line commands or multiple-statement
- sequences of commands. If you make a mistake, you don't have to
- retype everything. Just edit your script to correct the error,
- then tell `mysql' to execute it again.
- * If you have a query that produces a lot of output, you can run the
- output through a pager rather than watching it scroll off the top
- of your screen:
- shell> mysql < batch-file | more
- * You can catch the output in a file for further processing:
- shell> mysql < batch-file > mysql.out
- * You can distribute your script to other people so they can run the
- commands, too.
- * Some situations do not allow for interactive use, for example,
- when you run a query from a `cron' job. In this case, you must
- use batch mode.
- The default output format is different (more concise) when you run
- `mysql' in batch mode than when you use it interactively. For example,
- the output of `SELECT DISTINCT species FROM pet' looks like this when
- run interactively:
- +---------+
- | species |
- +---------+
- | bird |
- | cat |
- | dog |
- | hamster |
- | snake |
- +---------+
- But like this when run in batch mode:
- species
- bird
- cat
- dog
- hamster
- snake
- If you want to get the interactive output format in batch mode, use
- `mysql -t'. To echo to the output the commands that are executed, use
- `mysql -vvv'.
- Queries from Twin Project
- =========================
- At Analytikerna and Lentus, we have been doing the systems and field
- work for a big research project. This project is a collaboration
- between the Institute of Environmental Medicine at Karolinska
- Institutet Stockholm and the Section on Clinical Research in Aging and
- Psychology at the University of Southern California.
- The project involves a screening part where all twins in Sweden older
- than 65 years are interviewed by telephone. Twins who meet certain
- criteria are passed on to the next stage. In this latter stage, twins
- who want to participate are visited by a doctor/nurse team. Some of the
- examinations include physical and neuropsychological examination,
- laboratory testing, neuroimaging, psychological status assessment, and
- family history collection. In addition, data are collected on medical
- and environmental risk factors.
- More information about Twin studies can be found at:
- <http://www.imm.ki.se/TWIN/TWINUKW.HTM>
- The latter part of the project is administered with a Web interface
- written using Perl and *MySQL*.
- Each night all data from the interviews are moved into a *MySQL*
- database.
- Find all Non-distributed Twins
- ------------------------------
- The following query is used to determine who goes into the second part
- of the project:
- select
- concat(p1.id, p1.tvab) + 0 as tvid,
- concat(p1.christian_name, " ", p1.surname) as Name,
- p1.postal_code as Code,
- p1.city as City,
- pg.abrev as Area,
- if(td.participation = "Aborted", "A", " ") as A,
- p1.dead as dead1,
- l.event as event1,
- td.suspect as tsuspect1,
- id.suspect as isuspect1,
- td.severe as tsevere1,
- id.severe as isevere1,
- p2.dead as dead2,
- l2.event as event2,
- h2.nurse as nurse2,
- h2.doctor as doctor2,
- td2.suspect as tsuspect2,
- id2.suspect as isuspect2,
- td2.severe as tsevere2,
- id2.severe as isevere2,
- l.finish_date
- from
- twin_project as tp
- /* For Twin 1 */
- left join twin_data as td on tp.id = td.id and tp.tvab = td.tvab
- left join informant_data as id on tp.id = id.id and tp.tvab = id.tvab
- left join harmony as h on tp.id = h.id and tp.tvab = h.tvab
- left join lentus as l on tp.id = l.id and tp.tvab = l.tvab
- /* For Twin 2 */
- left join twin_data as td2 on p2.id = td2.id and p2.tvab = td2.tvab
- left join informant_data as id2 on p2.id = id2.id and p2.tvab = id2.tvab
- left join harmony as h2 on p2.id = h2.id and p2.tvab = h2.tvab
- left join lentus as l2 on p2.id = l2.id and p2.tvab = l2.tvab,
- person_data as p1,
- person_data as p2,
- postal_groups as pg
- where
- /* p1 gets main twin and p2 gets his/her twin. */
- /* ptvab is a field inverted from tvab */
- p1.id = tp.id and p1.tvab = tp.tvab and
- p2.id = p1.id and p2.ptvab = p1.tvab and
- /* Just the sceening survey */
- tp.survey_no = 5 and
- /* Skip if partner died before 65 but allow emigration (dead=9) */
- (p2.dead = 0 or p2.dead = 9 or
- (p2.dead = 1 and
- (p2.death_date = 0 or
- (((to_days(p2.death_date) - to_days(p2.birthday)) / 365)
- >= 65))))
- and
- (
- /* Twin is suspect */
- (td.future_contact = 'Yes' and td.suspect = 2) or
- /* Twin is suspect - Informant is Blessed */
- (td.future_contact = 'Yes' and td.suspect = 1 and id.suspect = 1) or
- /* No twin - Informant is Blessed */
- (ISNULL(td.suspect) and id.suspect = 1 and id.future_contact = 'Yes') or
- /* Twin broken off - Informant is Blessed */
- (td.participation = 'Aborted'
- and id.suspect = 1 and id.future_contact = 'Yes') or
- /* Twin broken off - No inform - Have partner */
- (td.participation = 'Aborted' and ISNULL(id.suspect) and p2.dead = 0))
- and
- l.event = 'Finished'
- /* Get at area code */
- and substring(p1.postal_code, 1, 2) = pg.code
- /* Not already distributed */
- and (h.nurse is NULL or h.nurse=00 or h.doctor=00)
- /* Has not refused or been aborted */
- and not (h.status = 'Refused' or h.status = 'Aborted'
- or h.status = 'Died' or h.status = 'Other')
- order by
- tvid;
- Some explanations:
- `concat(p1.id, p1.tvab) + 0 as tvid'
- We want to sort on the concatenated `id' and `tvab' in numerical
- order. Adding `0' to the result causes *MySQL* to treat the result
- as a number.
- column `id'
- This identifies a pair of twins. It is a key in all tables.
- column `tvab'
- This identifies a twin in a pair. It has a value of `1' or `2'.
- column `ptvab'
- This is an inverse of `tvab'. When `tvab' is `1' this is `2', and
- vice versa. It exists to save typing and to make it easier for
- *MySQL* to optimize the query.
- This query demonstrates, among other things, how to do lookups on a
- table from the same table with a join (`p1' and `p2'). In the example,
- this is used to check whether a twin's partner died before the age of
- 65. If so, the row is not returned.
- All of the above exist in all tables with twin-related information. We
- have a key on both `id,tvab' (all tables), and `id,ptvab'
- (`person_data') to make queries faster.
- On our production machine (A 200MHz UltraSPARC), this query returns
- about 150-200 rows and takes less than one second.
- The current number of records in the tables used above:
- *Table* *Rows*
- `person_data' 71074
- `lentus' 5291
- `twin_project' 5286
- `twin_data' 2012
- `informant_data' 663
- `harmony' 381
- `postal_groups' 100
- Show a Table on Twin Pair Status
- --------------------------------
- Each interview ends with a status code called `event'. The query shown
- below is used to display a table over all twin pairs combined by event.
- This indicates in how many pairs both twins are finished, in how many
- pairs one twin is finished and the other refused, and so on.
- select
- t1.event,
- t2.event,
- count(*)
- from
- lentus as t1,
- lentus as t2,
- twin_project as tp
- where
- /* We are looking at one pair at a time */
- t1.id = tp.id
- and t1.tvab=tp.tvab
- and t1.id = t2.id
- /* Just the sceening survey */
- and tp.survey_no = 5
- /* This makes each pair only appear once */
- and t1.tvab='1' and t2.tvab='2'
- group by
- t1.event, t2.event;
- MySQL Server Functions
- **********************
- What Languages Are Supported by MySQL?
- ======================================
- `mysqld' can issue error messages in the following languages: Czech,
- Danish, Dutch, English (the default), Estonian, French, German, Greek,
- Hungarian, Italian, Japanese, Korean, Norwegian, Norwegian-ny, Polish,
- Portuguese, Romanian, Russian, Slovak, Spanish, and Swedish.
- To start `mysqld' with a particular language, use either the
- `--language=lang' or `-L lang' options. For example:
- shell> mysqld --language=swedish
- or:
- shell> mysqld --language=/usr/local/share/swedish
- Note that all language names are specified in lowercase.
- The language files are located (by default) in
- `MYSQL_BASE_DIR/share/LANGUAGE/'.
- To update the error message file, you should edit the `errmsg.txt' file
- and execute the following command to generate the `errmsg.sys' file:
- shell> comp_err errmsg.txt errmsg.sys
- If you upgrade to a newer version of *MySQL*, remember to repeat your
- changes with the new `errmsg.txt' file.
- The Character Set Used for Data and Sorting
- -------------------------------------------
- By default, *MySQL* uses the ISO-8859-1 (Latin1) character set. This is
- the character set used in the USA and western Europe.
- All standard *MySQL* binaries are compiled with
- `--with-extra-charsets=complex'. This will add code to all standard
- programs to be able to handle `latin1' and all multi-byte character
- sets within the binary. Other character sets will be loaded from a
- character-set definition file when needed.
- The character set determines what characters are allowed in names and
- how things are sorted by the `ORDER BY' and `GROUP BY' clauses of the
- `SELECT' statement.
- You can change the character set with the `--default-character-set'
- option when you start the server. The character sets available depend
- on the `--with-charset=charset' option to `configure', and the
- character set configuration files listed in `SHAREDIR/charsets/Index'.
- *Note Quick install::.
- If you change the character set when running MySQL (which may also
- change the sort order), you must run myisamchk -r -q on all tables.
- Otherwise your indexes may not be ordered correctly.
- When a client connects to a *MySQL* server, the server sends the
- default character set in use to the client. The client will switch to
- use this character set for this connection.
- One should use `mysql_real_escape_string()' when escaping strings for a
- SQL query. `mysql_real_escape_string()' is identical to the old
- `mysql_escape_string()' function, except that it takes the MYSQL
- connection handle as the first parameter.
- If the client is compiled with different paths than where the server is
- installed and the user who configured *MySQL* didn't included all
- character sets in the *MySQL* binary, one must specify for the client
- where it can find the additional character sets it will need if the
- server runs with a different character set than the client.
- One can specify this by putting in a *MySQL* option file:
- [client]
- character-sets-dir=/usr/local/mysql/share/mysql/charsets
- where the path points to where the dynamic *MySQL* character sets are
- stored.
- One can force the client to use specific character set by specifying:
- [client]
- default-character-set=character-set-name
- but normally this is never needed.
- Adding a New Character Set
- --------------------------
- To add another character set to *MySQL*, use the following procedure.
- Decide if the set is simple or complex. If the character set does not
- need to use special string collating routines for sorting and does not
- need multi-byte character support, it is simple. If it needs either of
- those features, it is complex.
- For example, `latin1' and `danish' are simple charactersets while
- `big5' or `czech' are complex character sets.
- In the following section, we have assumed that you name your character
- set `MYSET'.
- For a simple character set do the following:
- 1. Add MYSET to the end of the `sql/share/charsets/Index' file Assign
- an unique number to it.
- 2. Create the file `sql/share/charsets/MYSET.conf'. (You can use
- `sql/share/charsets/latin1.conf' as a base for this).
- The syntax for the file very simple:
- * Comments start with a '#' character and proceed to the end of
- the line.
- * Words are separated by arbitrary amounts of whitespace.
- * When defining the character set, every word must be a number
- in hexadecimal format
- * The `ctype' array takes up the first 257 words. The
- `to_lower', `to_upper' and `sort_order' arrays take up 256
- words each after that.
- *Note Character arrays::.
- 3. Add the character set name to the `CHARSETS_AVAILABLE' and
- `COMPILED_CHARSETS' lists in `configure.in'.
- 4. Reconfigure, recompile, and test.
- For a complex character set do the following:
- 1. Create the file `strings/ctype-MYSET.c' in the *MySQL* source
- distribution.
- 2. Add MYSET to the end of the `sql/share/charsets/Index' file.
- Assign an unique number to it.
- 3. Look at one of the existing `ctype-*.c' files to see what needs to
- be defined, for example `strings/ctype-big5.c'. Note that the
- arrays in your file must have names like `ctype_MYSET',
- `to_lower_MYSET', and so on. This corresponds to the arrays in
- the simple character set. *Note Character arrays::. For a complex
- character set
- 4. Near the top of the file, place a special comment like this:
- /*
- * This comment is parsed by configure to create ctype.c,
- * so don't change it unless you know what you are doing.
- *
- * .configure. number_MYSET=MYNUMBER
- * .configure. strxfrm_multiply_MYSET=N
- * .configure. mbmaxlen_MYSET=N
- */
- The `configure' program uses this comment to include the character
- set into the *MySQL* library automatically.
- The strxfrm_multiply and mbmaxlen lines will be explained in the
- following sections. Only include them if you the string collating
- functions or the multi-byte character set functions, respectively.
- 5. You should then create some of the following functions:
- * `my_strncoll_MYSET()'
- * `my_strcoll_MYSET()'
- * `my_strxfrm_MYSET()'
- * `my_like_range_MYSET()'
- *Note String collating::.
- 6. Add the character set name to the `CHARSETS_AVAILABLE' and
- `COMPILED_CHARSETS' lists in `configure.in'.
- 7. Reconfigure, recompile, and test.
- The file `sql/share/charsets/README' includes some more instructions.
- If you want to have the character set included in the *MySQL*
- distribution, mail a patch to <internals@lists.mysql.com>.
- The character definition arrays
- -------------------------------
- `to_lower[]' and `to_upper[]' are simple arrays that hold the lowercase
- and uppercase characters corresponding to each member of the character
- set. For example:
- to_lower['A'] should contain 'a'
- to_upper['a'] should contain 'A'
- `sort_order[]' is a map indicating how characters should be ordered for
- comparison and sorting purposes. For many character sets, this is the
- same as `to_upper[]' (which means sorting will be case insensitive).
- *MySQL* will sort characters based on the value of
- `sort_order[character]'. For more complicated sorting rules, see the
- discussion of string collating below. *Note String collating::.
- `ctype[]' is an array of bit values, with one element for one character.
- (Note that `to_lower[]', `to_upper[]', and `sort_order[]' are indexed
- by character value, but `ctype[]' is indexed by character value + 1.
- This is an old legacy to be able to handle EOF.)
- You can find the following bitmask definitions in `m_ctype.h':
- #define _U 01 /* Uppercase */
- #define _L 02 /* Lowercase */
- #define _N 04 /* Numeral (digit) */
- #define _S 010 /* Spacing character */
- #define _P 020 /* Punctuation */
- #define _C 040 /* Control character */
- #define _B 0100 /* Blank */
- #define _X 0200 /* heXadecimal digit */
- The `ctype[]' entry for each character should be the union of the
- applicable bitmask values that describe the character. For example,
- `'A'' is an uppercase character (`_U') as well as a hexadecimal digit
- (`_X'), so `ctype['A'+1]' should contain the value:
- _U + _X = 01 + 0200 = 0201
- String Collating Support
- ------------------------
- If the sorting rules for your language are too complex to be handled
- with the simple `sort_order[]' table, you need to use the string
- collating functions.
- Right now the best documentation on this is the character sets that are
- already implemented. Look at the big5, czech, gbk, sjis, and tis160
- character sets for examples.
- You must specify the `strxfrm_multiply_MYSET=N' value in the special
- comment at the top of the file. `N' should be set to the maximum ratio
- the strings may grow during `my_strxfrm_MYSET' (it must be a positive
- integer).
- Multi-byte Character Support
- ----------------------------
- If your want to add support for a new character set that includes
- multi-byte characters, you need to use the multi-byte character
- functions.
- Right now the best documentation on this is the character sets that are
- already implemented. Look at the euc_kr, gb2312, gbk, sjis and ujis
- character sets for examples. These are implemented in the
- `ctype-'charset'.c' files in the `strings' directory.
- You must specify the `mbmaxlen_MYSET=N' value in the special comment at
- the top of the source file. `N' should be set to the size in bytes of
- the largest character in the set.
- How Big MySQL Tables Can Be
- ===========================
- *MySQL* Version 3.22 has a 4G limit on table size. With the new
- `MyISAM' in *MySQL* Version 3.23 the maximum table size is pushed up to
- 8 million terabytes (2 ^ 63 bytes).
- Note, however, that operating systems have their own file size limits.
- Here are some examples:
- *Operating System* *File Size Limit*
- Linux-Intel 32 bit 2G, 4G or bigger depending on Linux
- version
- Linux-Alpha 8T (?)
- Solaris 2.5.1 2G (possible 4G with patch)
- Solaris 2.6 4G
- Solaris 2.7 Intel 4G
- Solaris 2.7 ULTRA-SPARC 8T (?)
- On Linux 2.2 you can get bigger tables than 2G by using the LFS patch
- for the ext2 file system. On Linux 2.4 there exists also patches for
- ReiserFS to get support for big files.
- This means that the table size for *MySQL* is normally limited by the
- operating system.
- By default, *MySQL* tables have a maximum size of about 4G. You can
- check the maximum table size for a table with the `SHOW TABLE STATUS'
- command or with the `myisamchk -dv table_name'. *Note SHOW::.
- If you need bigger tables than 4G (and your operating system supports
- this), you should set the `AVG_ROW_LENGTH' and `MAX_ROWS' parameter
- when you create your table. *Note CREATE TABLE::. You can also set
- these later with `ALTER TABLE'. *Note ALTER TABLE::.
- If your big table is going to be read-only, you could use `myisampack'
- to merge and compress many tables to one. `myisampack' usually
- compresses a table by at least 50%, so you can have, in effect, much
- bigger tables. *Note `myisampack': myisampack.
- You can go around the operating system file limit for `MyISAM' data
- files by using the `RAID' option. *Note CREATE TABLE::.
- Another solution can be the included MERGE library, which allows you to
- handle a collection of identical tables as one. *Note MERGE tables:
- MERGE.
- Replication in MySQL
- ********************
- Introduction
- ============
- One way replication can be used is to increase both robustness and
- speed. For robustness you can have two systems and can switch to the
- backup if you have problems with the master. The extra speed is
- achieved by sending a part of the non-updating queries to the replica
- server. Of course this only works if non-updating queries dominate, but
- that is the normal case.
- Starting in Version 3.23.15, *MySQL* supports one-way replication
- internally. One server acts as the master, while the other acts as the
- slave. Note that one server could play the roles of master in one pair
- and slave in the other. The master server keeps a binary log of updates
- (*Note Binary log::.) and an index file to binary logs to keep track of
- log rotation. The slave, upon connecting, informs the master where it
- left off since the last successfully propagated update, catches up on
- the updates, and then blocks and waits for the master to notify it of
- the new updates.
- Note that if you are replicating a database, all updates to this
- database should be done through the master!
- On older servers one can use the update log to do simple replication.
- *Note Log Replication::.
- Another benefit of using replication is that one can get live backups of
- the system by doing a backup on a slave instead of doing it on the
- master. *Note Backup::.
- Replication Implementation Overview
- ===================================
- *MySQL* replication is based on the server keeping track of all changes
- to your database (updates, deletes, etc) in the binary log. (*Note
- Binary log::.) and the slave server(s) reading the saved queries from
- the master server's binary log so that the slave can execute the same
- queries on its copy of the data.
- It is *very important* to realize that the binary log is simply a
- record starting from a fixed point in time (the moment you enable binary
- logging). Any slaves which you set up will need copies of all the data
- from your master as it existed the moment that you enabled binary
- logging on the master. If you start your slaves with data that doesn't
- agree with what was on the master *when the binary log was started*,
- your slaves may fail.
- A future version (4.0) of *MySQL* will remove the need to keep a
- (possibly large) snapshot of data for new slaves that you might wish to
- set up through the live backup functionality with no locking required.
- However, at this time, it is necessary to block all writes either with a
- global read lock or by shutting down the master while taking a snapshot.
- Once a slave is properly configured and running, it will simply connect
- to the master and wait for updates to process. If the master goes away
- or the slave loses connectivity with your master, it will keep trying to
- connect every `master-connect-retry' seconds until it is able to
- reconnect and resume listening for updates.
- Each slave keeps track of where it left off. The master server has no
- knowledge of how many slaves there are or which ones are up-to-date at
- any given time.
- The next section explains the master/slave setup process in more detail.
- HOWTO
- =====
- Below is a quick description of how to set up complete replication on
- your current *MySQL* server. It assumes you want to replicate all your
- databases and have not configured replication before. You will need to
- shutdown your master server briefly to complete the steops outlined
- below.
- 1. Make sure you have a recent version of *MySQL* installed on the
- master and slave(s).
- Use Version 3.23.29 or higher. Previous releases used a different
- binary log format and had bugs which have been fixed in newer
- releases. Please, do not report bugs until you have verified that
- the problem is present in the latest release.
- 2. Set up special a replication user on the master with the `FILE'
- privilege and permission to connect from all the slaves. If the
- user is only doing replication (which is recommended), you don't
- need to grant any additional privileges.
- For example, to create a user named `repl' which can access your
- master from any host, you might use this command:
- GRANT FILE ON *.* TO repl@"%" IDENTIFIED BY '<password>';
- 3. Shut down *MySQL* on the master.
- mysqladmin -u root -p<password> shutdown
- 4. Snapshot all the data on your master server.
- The easiest way to do this (on Unix) is to simply use *tar* to
- produce an archvie of your entrie data directory. The exact data
- directory location depends on your installation.
- tar -cvf /tmp/mysql-snapshot.tar /path/to/data-dir
- Windows users can use WinZip or similar software to create an
- archive of the data directory.
- 5. In `my.cnf' on the master add `log-bin' and `server-id=unique
- number' to the `[mysqld]' section and restart it. It is very
- important that the id of the slave is different from the id of the
- master. Think of `server-id' as something similar to the IP
- address - it uniquely identifies the server instance in the
- comminity of replication partners.
- [mysqld]
- log-bin
- server-id=1
- 6. Restart *MySQL* on the master.
- 7. Add the following to `my.cnf' on the slave(s):
- master-host=<hostname of the master>
- master-user=<replication user name>
- master-password=<replication user password>
- master-port=<TCP/IP port for master>
- server-id=<some unique number between 2 and 2^32-1>
- replacing the values in <> with what is relevant to your system.
- `server-id' must be different for each server participating in
- replication. If you don't specify a server-id, it will be set to
- 1 if you have not defined `master-host', else it will be set to 2.
- Note that in the case of `server-id' omission the master will
- refuse connections from all slaves, and the slave will refuse to
- connect to a master. Thus, omitting `server-id' is only good for
- backup with a binary log.
- 8. Copy the snapshot data into your data directory on your slave(s).
- Make sure that the privileges on the files and directories are
- correct. The user which *MySQL* runs as needs to be able to read
- and write to them, just as on the master.
- 9. Restart the slave(s).
- After you have done the above, the slave(s) should connect to the master
- and catch up on any updates which happened since the snapshot was taken.
- If you have forgotten to set `server-id' for the slave you will get the
- following error in the error log file:
- Warning: one should set server_id to a non-0 value if master_host is set.
- The server will not act as a slave.
- If you have forgot to do this for the master, the slaves will not be
- able to connect to the master.
- If a slave is not able to replicate for any reason, you will find error
- messages in the error log on the slave.
- Once a slave is replicating, you will find a file called `master.info'
- in the same directory as your error log. The `master.info' file is used
- by the slave to keep track of how much of the master's binary log is
- has processed. *Do not* remove or edit the file, unless you really know
- what you are doing. Even in that case, it is preferred that you use
- `CHANGE MASTER TO' command.
- Replication Features and known problems
- =======================================
- Below is an explanation of what is supported and what is not:
- * Replication will be done correctly with `AUTO_INCREMENT',
- `LAST_INSERT_ID', and `TIMESTAMP' values.
- * `RAND()' in updates does not replicate properly. Use
- `RAND(some_non_rand_expr)' if you are replcating updates with
- `RAND()'. You can, for example, use `UNIX_TIMESTAMP()' for the
- argument to `RAND()'.
- * `LOAD DATA INFILE' will be handled properly as long as the file
- still resides on the master server at the time of update
- propagation. `LOAD LOCAL DATA INFILE' will be skipped.
- * Update queries that use user variables are not replication-safe
- (yet).
- * `FLUSH' commands are not stored in the binary log and are because
- of this not replicated to the slaves. This is not normally a
- problem as `FLUSH' doesn't change anything. This does however mean
- that if you update the `MySQL' privilege tables directly without
- using `GRANT' statement and you replicate the `MySQL' privilege
- database, you must do a `FLUSH PRIVILEGES' on your slaves to put
- the new privileges into effect.
- * Temporary tables starting in 3.23.29 are replicated properly with
- the exception of the case when you shut down slave server ( not
- just slave thread), you have some temporary tables open, and the
- are used in subsequent updates. To deal with this problem, to
- shut down the slave, do `SLAVE STOP', then check
- `Slave_open_temp_tables' variable to see if it is 0, then issue
- `mysqladmin shutdown'. If the number is not 0, restart the slave
- thread with `SLAVE START' and see if you have better luck next
- time. There will be a cleaner solution, but it has to wait until
- version 4.0. In earlier versions temporary tables are not being
- replicated properly - we recommend that you either upgrade, or
- execute `SET SQL_LOG_BIN=0' on your clients before all queries
- with temp tables.
- * *MySQL* only supports one master and many slaves. We will in 4.x
- add a voting algorithm to automaticly change master if something
- goes wrong with the current master. We will also introduce 'agent'
- processes to help doing load balancing by sending select queries
- to different slaves.
- * Starting in Version 3.23.26, it is safe to connect servers in a
- circular master-slave relationship with `log-slave-updates'
- enabled. Note, however, that many queries will not work right in
- this kind of setup unless your client code is written to take care
- of the potential problems that can happen from updates that occur
- in different sequence on different servers. Note that the log
- format has changed in Version 3.23.26 so that pre-3.23.26 slaves
- will not be able to read it.
- * If the query on the slave gets an error, the slave thread will
- terminate, and a message will appear in the `.err' file. You should
- then connect to the slave manually, fix the cause of the error (for
- example, non-existent table), and then run `SLAVE START' sql
- command (available starting in Version 3.23.16). In Version
- 3.23.15, you will have to restart the server.
- * If connection to the master is lost, the slave will retry
- immediately, and then in case of failure every
- `master-connect-retry' (default 60) seconds. Because of this, it
- is safe to shut down the master, and then restart it after a
- while. The slave will also be able to deal with network
- connectivity outages.
- * Shutting down the slave (cleanly) is also safe, as it keeps track
- of where it left off. Unclean shutdowns might produce problems,
- especially if disk cache was not synced before the system died.
- Your system fault tolerance will be greatly increased if you have
- a good UPS.
- * If the master is listening on a non-standard port, you will also
- need to specify this with `master-port' parameter in `my.cnf' .
- * In Version 3.23.15, all of the tables and databases will be
- replicated. Starting in Version 3.23.16, you can restrict
- replication to a set of databases with `replicate-do-db'
- directives in `my.cnf' or just exclude a set of databases with
- `replicate-ignore-db'. Note that up until Version 3.23.23, there
- was a bug that did not properly deal with `LOAD DATA INFILE' if
- you did it in a database that was excluded from replication.
- * Starting in Version 3.23.16, `SET SQL_LOG_BIN = 0' will turn off
- replication (binary) logging on the master, and `SET SQL_LOG_BIN =
- 1' will turn in back on - you must have the process privilege to do
- this.
- * Starting in Version 3.23.19, you can clean up stale replication
- leftovers when something goes wrong and you want a clean start
- with `FLUSH MASTER' and `FLUSH SLAVE' commands. In Version 3.23.26
- we have renamed them to `RESET MASTER' and `RESET SLAVE'
- respectively to clarify what they do. The old `FLUSH' variants
- still work, though, for compatibility.
- * Starting in Version 3.23.21, you can use `LOAD TABLE FROM MASTER'
- for network backup and to set up replication initially. We have
- recently received a number of bug reports concerning it that we
- are investigating, so we recommend that you use it only in testing
- until we make it more stable.
- * Starting in Version 3.23.23, you can change masters and adjust log
- position with `CHANGE MASTER TO'.
- * Starting in Version 3.23.23, you tell the master that updates in
- certain databases should not be logged to the binary log with
- `binlog-ignore-db'.
- * Starting in Version 3.23.26, you can use `replicate-rewrite-db' to
- tell the slave to apply updates from one database on the master to
- the one with a different name on the slave.
- * Starting in Version 3.23.28, you can use `PURGE MASTER LOGS TO
- 'log-name'' to get rid of old logs while the slave is running.
- Replication Options in my.cnf
- =============================
- If you are using replication, we recommend you to use MySQL Version
- 3.23.30 or later. Older versions work, but they do have some bugs and
- are missing some features.
- On both master and slave you need to use the `server-id' option. This
- sets an unique replication id. You should pick a unique value in the
- range between 1 to 2^32-1 for each master and slave. Example:
- `server-id=3'
- The following table has the options you can use for the *MASTER*:
- *Option* *Description*
- `log-bin=filename' Write to a binary update log to the specified
- location. Note that if you give it a parameter
- with an extension (for example,
- `log-bin=/mysql/logs/replication.log' ) versions
- up to 3.23.24 will not work right during
- replication if you do `FLUSH LOGS' . The problem
- is fixed in Version 3.23.25. If you are using
- this kind of log name, `FLUSH LOGS' will be
- ignored on binlog. To clear the log, run `FLUSH
- MASTER', and do not forget to run `FLUSH SLAVE'
- on all slaves. In Version 3.23.26 and in later
- versions you should use `RESET MASTER' and `RESET
- SLAVE'
- `log-bin-index=filename' Because the user could issue the `FLUSH LOGS'
- command, we need to know which log is currently
- active and which ones have been rotated out and
- in what sequence. This information is stored in
- the binary log index file. The default is
- `hostname`.index. You can use this option if you
- want to be a rebel. (Example:
- `log-bin-index=db.index')
- `sql-bin-update-same' If set, setting `SQL_LOG_BIN' to a value will
- automatically set `SQL_LOG_UPDATE' to the same
- value and vice versa.
- `binlog-do-db=database_name' Tells the master it should log updates for the
- specified database, and exclude all others not
- explicitly mentioned. (Example:
- `binlog-do-db=some_database')
- `binlog-ignore-db=database_name' Tells the master that updates to the given
- database should not be logged to the binary log
- (Example: `binlog-ignore-db=some_database')
- The following table has the options you can use for the *SLAVE*:
- *Option* *Description*
- `master-host=host' Master hostname or IP address for replication.
- If not set, the slave thread will not be started.
- (Example: `master-host=db-master.mycompany.com')
- `master-user=username' The user the slave thread will us for
- authentication when connecting to the master. The
- user must have `FILE' privilege. If the master
- user is not set, user `test' is assumed.
- (Example: `master-user=scott')
- `master-password=password' The password the slave thread will authenticate
- with when connecting to the master. If not set,
- an empty password is assumed. (Example:
- `master-password=tiger')
- `master-port=portnumber' The port the master is listening on. If not set,
- the compiled setting of `MYSQL_PORT' is assumed.
- If you have not tinkered with `configure'
- options, this should be 3306. (Example:
- `master-port=3306')
- `master-connect-retry=seconds' The number of seconds the slave thread will
- sleep before retrying to connect to the master in
- case the master goes down or the connection is
- lost. Default is 60. (Example:
- `master-connect-retry=60')
- `master-info-file=filename' The location of the file that remembers where we
- left off on the master during the replication
- process. The default is master.info in the data
- directory. Sasha: The only reason I see for ever
- changing the default is the desire to be
- rebelious. (Example:
- `master-info-file=master.info')
- `replicate-do-table=db_name.table_name' Tells the slave thread to restrict replication
- to the specified database. To specify more than
- one table, use the directive multiple times, once
- for each table. . (Example:
- `replicate-do-table=some_db.some_table')
- `replicate-ignore-table=db_name.table_name' Tells the slave thread to not replicate to the
- specified table. To specify more than one table
- to ignore, use the directive multiple times, once
- for each table.(Example:
- `replicate-ignore-table=db_name.some_table')
- `replicate-wild-do-table=db_name.table_name' Tells the slave thread to restrict replication
- to the tables that match the specified wildcard
- pattern. . To specify more than one table, use
- the directive multiple times, once for each
- table. . (Example:
- `replicate-do-table=foo%.bar%' will replicate
- only updates to tables in all databases that
- start with foo and whose table names start with
- bar)
- `replicate-wild-ignore-table=db_name.table_name' Tells the slave thread to not replicate to the
- tables that match the given wild card pattern. To
- specify more than one table to ignore, use the
- directive multiple times, once for each
- table.(Example:
- `replicate-ignore-table=foo%.bar%' - will not
- upates to tables in all databases that start with
- foo and whose table names start with bar)
- `replicate-ignore-db=database_name' Tells the slave thread to not replicate to the
- specified database. To specify more than one
- database to ignore, use the directive multiple
- times, once for each database. This option will
- not work if you use cross database updates. If
- you need cross database updates to work, make sure
- you have 3.23.28 or later, and use
- `replicate-wild-ignore-table=db_name.%'(Example:
- `replicate-ignore-db=some_db')
- `replicate-do-db=database_name' Tells the slave thread to restrict replication
- to the specified database. To specify more than
- one database, use the directive multiple times,
- once for each database. Note that this will only
- work if you do not use cross-database queries
- such as `UPDATE some_db.some_table SET foo='bar''
- while having selected a different or no database.
- If you need cross database updates to work, make
- sure you have 3.23.28 or later, and use
- `replicate-wild-do-table=db_name.%' (Example:
- `replicate-do-db=some_db')
- `log-slave-updates' Tells the slave to log the updates from the
- slave thread to the binary log. Off by default.
- You will need to turn it on if you plan to
- daisy-chain the slaves.
- `replicate-rewrite-db=from_name->to_name' Updates to a database with a different name than
- the original (Example:
- `replicate-rewrite-db=master_db_name->slave_db_name'
- `skip-slave-start' Tells the slave server not to start the slave on
- the startup. The user can start it later with
- `SLAVE START'.
- SQL Commands Related to Replication
- ===================================
- Replication can be controlled through the SQL interface. Below is the
- summary of commands:
- *Command* *Description*
- `SLAVE START' Starts the slave thread. (Slave)
- `SLAVE STOP' Stops the slave thread. (Slave)
- `SET SQL_LOG_BIN=0' Disables update logging if the user has process
- privilege. Ignored otherwise. (Master)
- `SET SQL_LOG_BIN=1' Re-enables update logging if the user has process
- privilege. Ignored otherwise. (Master)
- `SET Skip the next `n' events from the master. Only
- SQL_SLAVE_SKIP_COUNTER=n'valid when the slave thread is not running,
- otherwise, gives an error. Useful for recovering
- from replication glitches.
- `RESET MASTER' Deletes all binary logs listed in the index file,
- resetting the binlog index file to be empty. In
- pre-3.23.26 versions, `FLUSH MASTER' (Master)
- `RESET SLAVE' Makes the slave forget its replication position
- in the master logs. In pre 3.23.26 versions the
- command was called `FLUSH SLAVE'(Slave)
- `LOAD TABLE tblname Downloads a copy of the table from master to the
- FROM MASTER' slave. (Slave)
- `CHANGE MASTER TO Changes the master parameters to the values
- master_def_list' specified in `master_def_list' and restarts the
- slave thread. `master_def_list' is a
- comma-separated list of `master_def' where
- `master_def' is one of the following:
- `MASTER_HOST', `MASTER_USER', `MASTER_PASSWORD',
- `MASTER_PORT', `MASTER_CONNECT_RETRY',
- `MASTER_LOG_FILE', `MASTER_LOG_POS'. Example:
-
-
- CHANGE MASTER TO
- MASTER_HOST='master2.mycompany.com',
- MASTER_USER='replication',
- MASTER_PASSWORD='bigs3cret',
- MASTER_PORT=3306,
- MASTER_LOG_FILE='master2-bin.001',
- MASTER_LOG_POS=4;
- You only need to specify the values that need to
- be changed. The values that you omit will stay
- the same with the exception of when you change
- the host or the port. In that case, the slave
- will assume that since you are connecting to a
- different host or a different port, the master is
- different. Therefore, the old values of log and
- position are not applicable anymore, and will
- automatically be reset to an empty string and 0,
- respectively (the start values). Note that if you
- restart the slave, it will remember its last
- master. If this is not desirable, you should
- delete the `master.info' file before restarting,
- and the slave will read its master from `my.cnf'
- or the command line. (Slave)
- `SHOW MASTER STATUS' Provides status information on the binlog of the
- master. (Master)
- `SHOW SLAVE STATUS' Provides status information on essential
- parameters of the slave thread. (Slave)
- `SHOW MASTER LOGS' Only available starting in Version 3.23.28. Lists
- the binary logs on the master. You should use
- this command prior to `PURGE MASTER LOGS TO' to
- find out how far you should go.
- `PURGE MASTER LOGS TO Available starting in Version 3.23.28. Deletes
- 'logname'' all the replication logs that are listed in the
- log index as being prior to the specified log,
- and removed them from the log index, so that the
- given log now becomes first. Example:
-
- PURGE MASTER LOGS TO 'mysql-bin.010'
- This command will do nothing and fail with an
- error if you have an active slave that is
- currently reading one of the logs you are trying
- to delete. However, if you have a dormant slave,
- and happen to purge one of the logs it wants to
- read, the slave will be unable to replicate once
- it comes up. The command is safe to run while
- slaves are replicating - you do not need to stop
- them.
-
- You must first check all the slaves with `SHOW
- SLAVE STATUS' to see which log they are on, then
- do a listing of the logs on the master with `SHOW
- MASTER LOGS', find the earliest log among all the
- slaves (if all the slaves are up to date, this
- will be the last log on the list), backup all the
- logs you are about to delete (optional) and purge
- up to the target log.
- Replication FAQ
- ===============
- *Q*: Why do I sometimes see more than one `Binlog_Dump' thread on the
- master after I have restarted the slave?
- *A*: `Binlog_Dump' is a continuous process that is handled by the
- server in the following way:
- * Catch up on the updates.
- * Once there are no more updates left, go into `pthread_cond_wait()',
- from which we can be awakened either by an update or a kill.
- * On wake up, check the reason. If we are not supposed to die,
- continue the `Binlog_dump' loop.
- * If there is some fatal error, such as detecting a dead client,
- terminate the loop.
- So if the slave thread stops on the slave, the corresponding
- `Binlog_Dump' thread on the master will not notice it until after at
- least one update to the master (or a kill), which is needed to wake it
- up from `pthread_cond_wait()'. In the meantime, the slave could have
- opened another connection, which resulted in another `Binlog_Dump'
- thread.
- The above problem should not be present in Version 3.23.26 and later
- versions. In Version 3.23.26 we added `server-id' to each replication
- server, and now all the old zombie threads are killed on the master
- when a new replication thread connects from the same slave
- *Q*: How do I rotate replication logs?
- *A*: In Version 3.23.28 you should use `PURGE MASTER LOGS TO' command
- after determining which logs can be deleted, and optionally backing
- them up first. In earlier versions the process is much more painful,
- and cannot be safely done without stopping all the slaves in the case
- that you plan to re-use log names. You will need to stop the slave
- threads, edit the binary log index file, delete all the old logs,
- restart the master, start slave threads,and then remove the old log
- files.
- *Q*: How do I upgrade on a hot replication setup?
- *A*: If you are upgrading pre-3.23.26 versions, you should just lock
- the master tables, let the slave catch up, then run `FLUSH MASTER' on
- the master, and `FLUSH SLAVE' on the slave to reset the logs, then
- restart new versions of the master and the slave. Note that the slave
- can stay down for some time - since the master is logging all the
- updates, the slave will be able to catch up once it is up and can
- connect.
- After 3.23.26, we have locked the replication protocol for
- modifications, so you can upgrade masters and slave on the fly to a
- newer 3.23 version and you can have different versions of *MySQL*
- running on the slave and the master, as long as they are both newer
- than 3.23.26.
- *Q*: What issues should I be aware of when setting up two-way
- replication?
- *A*: *MySQL* replication currently does not support any locking
- protocol between master and slave to guarantee the atomicity of a
- distributed (cross-server) update. In in other words, it is possible
- for client A to make an update to co-master 1, and in the meantime,
- before it propagates to co-master 2, client B could make an update to
- co-master 2 that will make the update of client A work differently than
- it did on co-master 1. Thus when the update of client A will make it to
- co-master 2, it will produce tables that will be different than what
- you have on co-master 1, even after all the updates from co-master 2
- have also propagated. So you should not co-chain two servers in a
- two-way replication relationship, unless you are sure that you updates
- can safely happen in any order, or unless you take care of mis-ordered
- updates somehow in the client code.
- You must also realize that two-way replication actually does not improve
- performance very much, if at all, as far as updates are concerned. Both
- servers need to do the same amount of updates each, as you would have
- one server do. The only difference is that there will be a little less
- lock contention, because the updates originating on another server will
- be serialized in one slave thread. This benefit, though, might be
- offset by network delays.
- *Q*: How can I use replication to improve performance of my system?
- *A*: You should set up one server as the master, and direct all writes
- to it, and configure as many slaves as you have the money and rackspace
- for, distributing the reads among the master and the slaves. You can
- also start the slaves with `--skip-bdb', `--low-priority-updates' and
- `--delay-key-write-for-all-tables' to get speed improvements for the
- slave. In this case the slave will use non-transactional `MyISAM'
- tables instead of `BDB' tables to get more speed.
- *Q*: What should I do to prepare my client code to use
- performance-enhancing replication?
- *A*: If the part of your code that is responsible for database access
- has been properly abstracted/modularized, converting it to run with the
- replicated setup should be very smooth and easy - just change the
- implementation of your database access to read from some slave or the
- master, and to awlays write to the master. If your code does not have
- this level of abstraction, setting up a replicated system will give you
- an opportunity/motivation to it clean up. You should start by
- creating a wrapper library /module with the following functions:
- * `safe_writer_connect()'
- * `safe_reader_connect()'
- * `safe_reader_query()'
- * `safe_writer_query()'
- `safe_' means that the function will take care of handling all the
- error conditions.
- You should then convert your client code to use the wrapper library.
- It may be a painful and scary process at first, but it will pay off in
- the long run. All applications that follow the above pattern will be
- able to take advantage of one-master/many slaves solution. The code
- will be a lot easier to maintain, and adding troubleshooting options
- will be trivial. You will just need to modify one or two functions, for
- example, to log how long each query took, or which query, among your
- many thousands, gave you an error. If you have written a lot of code
- already, you may want to automate the conversion task by using Monty's
- `replace' utility, which comes with the standard distribution of
- *MySQL*, or just write your own Perl script. Hopefully, your code
- follows some recognizable pattern. If not, then you are probably better
- off re-writing it anyway, or at least going through and manually
- beating it into a pattern.
- Note that, of course, you can use different names for the functions.
- What is important is having unified interface for connecting for reads,
- connecting for writes, doing a read, and doing a write.
- *Q*: When and how much can *MySQL* replication improve the performance
- of my system?
- *A*: *MySQL* replication is most beneficial for a system with frequent
- reads and not so frequent writes. In theory, by using a one master/many
- slaves setup you can scale by adding more slaves until you either run
- out of network bandwidth, or your update load grows to the point that
- the master cannot handle it.
- In order to determine how many slaves you can get before the added
- benefits begin to level out, and how much you can improve performance
- of your site, you need to know your query patterns, and empirically
- (by benchmarking) determine the relationship between the throughput on
- reads (reads per second, or `max_reads') and on writes `max_writes') on
- a typical master and a typical slave. The example below will show you a
- rather simplified calculation of what you can get with replication for
- our imagined system.
- Let's say our system load consists of 10% writes and 90% reads, and we
- have determined that `max_reads' = 1200 - 2 * `max_writes', or in other
- words, our system can do 1200 reads per second with no writes, our
- average write is twice as slow as average read, and the relationship is
- linear. Let us suppose that our master and slave are of the same
- capacity, and we have N slaves and 1 master. Then we have for each
- server (master or slave):
- `reads = 1200 - 2 * writes' (from bencmarks)
- `reads = 9* writes / (N + 1) ' (reads split, but writes go to all
- servers)
- `9*writes/(N+1) + 2 * writes = 1200'
- `writes = 1200/(2 + 9/(N+1)'
- So if N = 0, which means we have no replication, our system can handle
- 1200/11, about 109 writes per second (which means we will have 9 times
- as many reads due to the nature of our application).
- If N = 1, we can get up to 184 writes per second.
- If N = 8, we get up to 400.
- If N = 17, 480 writes.
- Eventually as N approaches infinity (and our budget negative infinity),
- we can get very close to 600 writes per second, increasing system
- throughput about 5.5 times. However, with only 8 servers, we increased
- it almost 4 times already.
- Note that our computations assumed infinite network bandwidth, and
- neglected several other factors that could turn out to be signficant on
- your system. In many cases, you may not be able to make a computation
- similar to the one above that will accurately predict what will happen
- on your system if you add N replication slaves. However, answering the
- following questions should help you decided whether and how much, if at
- all, the replication will improve the performance of your system:
- * What is the read/write ratio on your system?
- * How much more write load can one server handle if you reduce the
- reads?
- * How many slaves do you have bandwidth for on your network?
- *Q*: How can I use replication to provide redundancy/high availability?
- *A*: With the currently available features, you would have to set up a
- master and a slave (or several slaves), and write a script that will
- monitor the master to see if it is up, and instruct your applications
- and the slaves of the master change in case of failure. Some
- suggestions:
- * To tell a slave to change the master use the `CHANGE MASTER TO'
- command.
- * A good way to keep your applications informed where the master is
- by having a dynamic DNS entry for the master. With *bind* you can
- use `nsupdate' to dynamically update your DNS.
- * You should run your slaves with the `log-bin' option and without
- `log-slave-updates'. This way the slave will be ready to become a
- master as soon as you issue `STOP SLAVE'; `RESET MASTER', and
- `CHANGE MASTER TO' on the other slaves. It will also help you catch
- spurious updates that may happen because of misconfiguration of the
- slave (ideally, you want to configure access rights so that no
- client can update the slave, except for the slave thread) combined
- with the bugs in your client programs (they should never update
- the slave directly).
- We are currently working on intergrating an automatic master election
- system into *MySQL*, but until it is ready, you will have to create
- your own monitoring tools.
- Troubleshooting Replication
- ===========================
- If you have followed the instructions, and your replication setup is not
- working, first elliminate the user error factor by checking the
- following:
- * Is the master logging to the binary log? Check with `SHOW MASTER
- STATUS'. If it is, `Position' will be non-zero. If not, verify
- that you have given the master `log-bin' option and have set
- `server-id'.
- * Is the slave running? Check with `SHOW SLAVE STATUS'. The answer
- is found in `Slave_running' column. If not, verify slave options
- and check the error log for messages.
- * If the slave is running, did it establish connection with the
- master? Do `SHOW PROCESSLIST', find the thread with `system user'
- value in `User' column and `none' in the `Host' column, and check
- the `State' column. If it says `connecting to master', verify the
- privileges for the replication user on the master, master host
- name, your DNS setup, whether the master is actually running,
- whether it is reachable from the slave, and if all that seems ok,
- read the error logs.
- * If the slave was running, but then stopped, look at SHOW SLAVE
- STATUS output andcheck the error logs. It usually happens when
- some query that succeeded on the master fails on the slave. This
- should never happen if you have taken a proper snapshot of the
- master, and never modify the data on the slave outside of the
- slave thread. If it does, it is a bug, read below on how to report
- it.
- * If a query on that succeeded on the master refuses to run on the
- slave, and a full database resync ( the proper thing to do ) does
- not seem feasible, try the following:
- bullet First see if there is some stray record in the way.
- Understand how it got there, then delete it and run `SLAVE
- START'
- bullet If the above does not work or does not apply, try to
- understand if it would be safe to make the update manually (
- if needed) and then ignore the next query from the master.
- bullet If you have decided you can skip the next query, do `SET
- SQL_SLAVE_SKIP_COUNTER=1; SLAVE START;' to skip a query that
- does not use auto_increment, last_insert_id or timestamp, or
- `SET SQL_SLAVE_SKIP_COUNTER=2; SLAVE START;' otherwise
- bullet If you are sure the slave started out perfectly in sync
- with the master, and no one has updated the tables involved
- outside of slave thread, report the bug, so you will not have
- to do the above tricks again.
- * Make sure you are not running into an old bug by upgrading to the
- most recent version.
- * If all else fails, read the error logs. If they are big, `grep -i
- slave /path/to/your-log.err' on the slave. There is no generic
- pattern to search for on the master, as the only errors it logs
- are general system errors - if it can, it will send the error to
- the slave when things go wrong.
- When you have determined that there is no user error involved, and
- replication still either does not work at all or is unstable, it is
- time to start working on a bug report. We need to get as much info as
- possible from you to be able to track down the bug. Please do spend
- some time and effort preparing a good bug report. Ideally, we would
- like to have a test case in the format found in `mysql-test/t/rpl*'
- directory of the source tree. If you submit a test case like that, you
- can expect a patch within a day or two in most cases, although, of
- course, you mileage may vary depending on a number of factors.
- Second best option is a just program with easily configurable connection
- arguments for the master and the slave that will demonstrate the
- problem on our systems. You can write one in Perl or in C, depending on
- which language you know better.
- If you have one of the above ways to demonstrate the bug, use
- `mysqlbug' to prepare a bug report and send it to
- <bugs@lists.mysql.com>. If you have a phantom - a problem that does
- occur but you cannot duplicate "at will":
- * Verify that there is no user error involved. For example, if you
- update the slave outside of the slave thread, the data will be out
- of sync, and you can have unique key violations on updates, in
- which case the slave thread will stop and wait for you to clean up
- the tables manually to bring them in sync.
- * Run slave with `log-slave-updates' and `log-bin' - this will keep
- a log of all updates on the slave.
- * Save all evidence before reseting the replication. If we have no
- or only sketchy information, it would take us a while to track
- down the problem. The evidence you should collect is:
- * All binary logs on the master
- * All binary log on the slave
- * The output of `SHOW MASTER STATUS' on the master at the time
- you have discovered the problem
- * The output of `SHOW SLAVE STATUS' on the master at the time
- you have discovered the problem
- * Error logs on the master and on the slave
- * Use `mysqlbinlog' to examine the binary logs. The following should
- be helpful to find the trouble query, for example:
- mysqlbinlog -j pos_from_slave_status /path/to/log_from_slave_status | head
- Once you have collected the evidence on the phantom problem, try hard to
- isolate it into a separate test case first. Then report the problem to
- <bugs@lists.mysql.com> with as much info as possible.
- Getting Maximum Performance from MySQL
- **************************************
- Optimization is a complicated task because it ultimately requires
- understanding of the whole system. While it may be possible to do some
- local optimizations with small knowledge of your system/application, the
- more optimal you want your system to become the more you will have to
- know about it.
- So this chapter will try to explain and give some examples of different
- ways to optimize *MySQL*. But remember that there are always some
- (increasingly harder) additional ways to make the system even faster.
- Optimization Overview
- =====================
- The most important part for getting a system fast is of course the basic
- design. You also need to know what kinds of things your system will be
- doing, and what your bottlenecks are.
- The most common bottlenecks are:
- * Disk seeks. It takes time for the disk to find a piece of data.
- With modern disks in 1999, the mean time for this is usually lower
- than 10ms, so we can in theory do about 1000 seeks a second. This
- time improves slowly with new disks and is very hard to optimize
- for a single table. The way to optimize this is to spread the data
- on more than one disk.
- * Disk reading/writing. When the disk is at the correct position we
- need to read the data. With modern disks in 1999, one disk
- delivers something like 10-20Mb/s. This is easier to optimize than
- seeks because you can read in parallel from multiple disks.
- * CPU cycles. When we have the data in main memory (or if it
- already were there) we need to process it to get to our result.
- Having small tables compared to the memory is the most common
- limiting factor. But then, with small tables speed is usually not
- the problem.
- * Memory bandwidth. When the CPU needs more data than can fit in
- the CPU cache the main memory bandwidth becomes a bottleneck. This
- is an uncommon bottleneck for most systems, but one should be
- aware of it.
- System/Compile Time and Startup Parameter Tuning
- ================================================
- We start with the system level things since some of these decisions have
- to be made very early. In other cases a fast look at this part may
- suffice because it not that important for the big gains. However, it is
- always nice to have a feeling about how much one could gain by changing
- things at this level.
- The default OS to use is really important! To get the most use of
- multiple CPU machines one should use Solaris (because the threads works
- really nice) or Linux (because the 2.2 kernel has really good SMP
- support). Also on 32-bit machines Linux has a 2G file size limit by
- default. Hopefully this will be fixed soon when new filesystems are
- released (XFS/Reiserfs). If you have a desperate need for files bigger
- than 2G on Linux-intel 32 bit, you should get the LFS patch for the ext2
- file system.
- Because we have not run *MySQL* in production on that many platforms, we
- advice you to test your intended platform before choosing it, if
- possible.
- Other tips:
- * If you have enough RAM, you could remove all swap devices. Some
- operating systems will use a swap device in some contexts even if
- you have free memory.
- * Use the `--skip-locking' *MySQL* option to avoid external locking.
- Note that this will not impact *MySQL*'s functionality as long as
- you only run one server. Just remember to take down the server (or
- lock relevant parts) before you run `myisamchk'. On some system
- this switch is mandatory because the external locking does not
- work in any case.
- The `--skip-locking' option is on by default when compiling with
- MIT-pthreads, because `flock()' isn't fully supported by
- MIT-pthreads on all platforms. It's also on default for Linux as
- Linux file locking are not yet safe.
- The only case when you can't use `--skip-locking' is if you run
- multiple *MySQL* _servers_ (not clients) on the same data, or run
- `myisamchk' on the table without first flushing and locking the
- `mysqld' server tables first.
- You can still use `LOCK TABLES'/`UNLOCK TABLES' even if you are
- using `--skip-locking'
- How Compiling and Linking Affects the Speed of MySQL
- ----------------------------------------------------
- Most of the following tests are done on Linux with the *MySQL*
- benchmarks, but they should give some indication for other operating
- systems and workloads.
- You get the fastest executable when you link with `-static'.
- On Linux, you will get the fastest code when compiling with `pgcc' and
- `-O6'. To compile `sql_yacc.cc' with these options, you need about 200M
- memory because `gcc/pgcc' needs a lot of memory to make all functions
- inline. You should also set `CXX=gcc' when configuring *MySQL* to avoid
- inclusion of the `libstdc++' library (it is not needed). Note that with
- some versions of `pgcc', the resulting code will only run on true
- Pentium processors, even if you use the compiler option that you want
- the resulting code to be working on all x586 type processors (like AMD).
- By just using a better compiler and/or better compiler options you can
- get a 10-30 % speed increase in your application. This is particularly
- important if you compile the SQL server yourself!
- We have tested both the Cygnus CodeFusion and Fujitsu compilers, but
- when we tested them, neither was sufficiently bug free to allow *MySQL*
- to be compiled with optimizations on.
- When you compile *MySQL* you should only include support for the
- character sets that you are going to use. (Option `--with-charset=xxx').
- The standard *MySQL* binary distributions are compiled with support for
- all character sets.
- Here is a list of some mesurements that we have done:
- * If you use `pgcc' and compile everything with `-O6', the `mysqld'
- server is 1% faster than with `gcc' 2.95.2.
- * If you link dynamically (without `-static'), the result is 13%
- slower on Linux. Note that you still can use a dynamic linked
- *MySQL* library. It is only the server that is critical for
- performance.
- * If you connect using TCP/IP rather than Unix sockets, the result
- is 7.5% slower on the same computer. (If you are connection to
- `localhost', *MySQL* will, by default, use sockets).
- * If you compile with `--with-debug=full', then you will loose 20 %
- for most queries, but some queries may take substantially longer
- (The *MySQL* benchmarks ran 35 % slower) If you use
- `--with-debug', then you will only loose 15 %.
- * On a Sun SPARCstation 20, SunPro C++ 4.2 is 5 % faster than `gcc'
- 2.95.2.
- * Compiling with `gcc' 2.95.2 for ultrasparc with the option
- `-mcpu=v8 -Wa,-xarch=v8plusa' gives 4 % more performance.
- * On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native
- threads on a single processor. With more load/CPUs the difference
- should get bigger.
- * Running with `--log-bin' makes *[MySQL* 1 % slower.
- * Compiling without frame pointers `-fomit-frame-pointer' with gcc
- makes MySQL 1 % faster.
- The *MySQL*-Linux distribution provided by MySQL AB used to be compiled
- with `pgcc', but we had to go back to regular gcc because of a bug in
- `pgcc' that would generate the code that does not run on AMD. We will
- continue using gcc until that bug is resolved. In the meantime, if you
- have a non-AMD machine, you can get a faster binary by compiling with
- `pgcc'. The standard MySqL Linux binary is linked statically to get it
- faster and more portable.
- Disk Issues
- -----------
- * As mentioned before, disks seeks are a big performance bottleneck.
- This problems gets more and more apparent when the data starts to
- grow so large that effective caching becomes impossible. For large
- databases, where you access data more or less randomly, you can be
- sure that you will need at least one disk seek to read and a
- couple of disk seeks to write things. To minimize this problem,
- use disks with low seek times.
- * Increase the number of available disk spindles (and thereby reduce
- the seek overhead) by either symlink files to different disks or
- striping the disks.
- *Using symbolic links*
- This means that you symlink the index and/or data file(s)
- from the normal data directory to another disk (that may also
- be striped). This makes both the seek and read times better
- (if the disks are not used for other things). *Note Symbolic
- links::.
- *Striping*
- Striping means that you have many disks and put the first
- block on the first disk, the second block on the second disk,
- and the Nth on the (N mod number_of_disks) disk, and so on.
- This means if your normal data size is less than the stripe
- size (or perfectly aligned) you will get much better
- performance. Note that striping is very dependent on the OS
- and stripe-size. So benchmark your application with different
- stripe-sizes. *Note Benchmarks::.
- Note that the speed difference for striping is *very*
- dependent on the parameters. Depending on how you set the
- striping parameters and number of disks you may get a
- difference in orders of magnitude. Note that you have to
- choose to optimize for random or sequential access.
- * For reliability you may want to use RAID 0+1 (striping +
- mirroring), but in this case you will need 2*N drives to hold N
- drives of data. This is probably the best option if you have the
- money for it! You may, however, also have to invest in some
- volume-management software to handle it efficiently.
- * A good option is to have semi-important data (that can be
- regenerated) on RAID 0 disk while storing really important data
- (like host information and logs) on a RAID 0+1 or RAID N disk.
- RAID N can be a problem if you have many writes because of the
- time to update the parity bits.
- * You may also set the parameters for the file system that the
- database uses. One easy change is to mount the file system with
- the noatime option. That makes it skip the updating of the last
- access time in the inode and by this will avoid some disk seeks.
- * On Linux, you can get much more performance (up to 100 % under
- load is not uncommon) by using hdpram to configure your disk's
- interface! The following should be quite good hdparm options for
- *MySQL* (and probably many other applications):
- hdparm -m 16 -d 1
- Note that the performance/reliability when using the above depends
- on your hardware, so we strongly suggest that you test your system
- throughly after using `hdparm'! Please consult the `hdparm' man
- page for more information! If `hdparm' is not used wisely,
- filesystem corruption may result. Backup everything before
- experimenting!
- * On many operating systems you can mount the disks with the 'async'
- flag to set the file system to be updated asynchronously. If your
- computer is reasonable stable, this should give you more
- performance without sacrificing too much reliability. (This flag
- is on by default on Linux.)
- * If you don't need to know when a file was last accessed (which is
- not really useful on a databasa server), you can mount your file
- systems with the noatime flag.
- Using Symbolic Links for Databases and Tables
- .............................................
- You can move tables and databases from the database directory to other
- locations and replace them with symbolic links to the new locations.
- You might want to do this, for example, to move a database to a file
- system with more free space.
- If *MySQL* notices that a table is symbolically linked, it will resolve
- the symlink and use the table it points to instead. This works on all
- systems that support the `realpath()' call (at least Linux and Solaris
- support `realpath()')! On systems that don't support `realpath()', you
- should not access the table through the real path and through the
- symlink at the same time! If you do, the table will be inconsistent
- after any update.
- *MySQL* doesn't that you link one directory to multiple databases.
- Replacing a database directory with a symbolic link will work fine as
- long as you don't make a symbolic link between databases. Suppose you
- have a database `db1' under the *MySQL* data directory, and then make a
- symlink `db2' that points to `db1':
- shell> cd /path/to/datadir
- shell> ln -s db1 db2
- Now, for any table `tbl_a' in `db1', there also appears to be a table
- `tbl_a' in `db2'. If one thread updates `db1.tbl_a' and another thread
- updates `db2.tbl_a', there will be problems.
- If you really need this, you must change the following code in
- `mysys/mf_format.c':
- if (flag & 32 || (!lstat(to,&stat_buff) && S_ISLNK(stat_buff.st_mode)))
- to
- if (1)
- On Windows you can use internal symbolic links to directories by
- compiling *MySQL* with `-DUSE_SYMDIR'. This allows you to put different
- databases on different disks. *Note Windows symbolic links::.
- Tuning Server Parameters
- ------------------------
- You can get the default buffer sizes used by the `mysqld' server with
- this command:
- shell> mysqld --help
- This command produces a list of all `mysqld' options and configurable
- variables. The output includes the default values and looks something
- like this:
- Possible variables for option --set-variable (-O) are:
- back_log current value: 5
- bdb_cache_size current value: 1048540
- binlog_cache_size current_value: 32768
- connect_timeout current value: 5
- delayed_insert_timeout current value: 300
- delayed_insert_limit current value: 100
- delayed_queue_size current value: 1000
- flush_time current value: 0
- interactive_timeout current value: 28800
- join_buffer_size current value: 131072
- key_buffer_size current value: 1048540
- lower_case_table_names current value: 0
- long_query_time current value: 10
- max_allowed_packet current value: 1048576
- max_binlog_cache_size current_value: 4294967295
- max_connections current value: 100
- max_connect_errors current value: 10
- max_delayed_threads current value: 20
- max_heap_table_size current value: 16777216
- max_join_size current value: 4294967295
- max_sort_length current value: 1024
- max_tmp_tables current value: 32
- max_write_lock_count current value: 4294967295
- myisam_sort_buffer_size current value: 8388608
- net_buffer_length current value: 16384
- net_retry_count current value: 10
- net_read_timeout current value: 30
- net_write_timeout current value: 60
- query_buffer_size current value: 0
- record_buffer current value: 131072
- slow_launch_time current value: 2
- sort_buffer current value: 2097116
- table_cache current value: 64
- thread_concurrency current value: 10
- tmp_table_size current value: 1048576
- thread_stack current value: 131072
- wait_timeout current value: 28800
- If there is a `mysqld' server currently running, you can see what
- values it actually is using for the variables by executing this command:
- shell> mysqladmin variables
- You can find a full description for all variables in the `SHOW
- VARIABLES' section in this manual. *Note SHOW VARIABLES::.
- You can also see some statistics from a running server by issuing the
- command `SHOW STATUS'. *Note SHOW STATUS::.
- *MySQL* uses algorithms that are very scalable, so you can usually run
- with very little memory. If you, however, give *MySQL* more memory, you
- will normally also get better performance.
- When tuning a *MySQL* server, the two most important variables to use
- are `key_buffer_size' and `table_cache'. You should first feel
- confident that you have these right before trying to change any of the
- other variables.
- If you have much memory (>=256M) and many tables and want maximum
- performance with a moderate number of clients, you should use something
- like this:
- shell> safe_mysqld -O key_buffer=64M -O table_cache=256
- -O sort_buffer=4M -O record_buffer=1M &
- If you have only 128M and only a few tables, but you still do a lot of
- sorting, you can use something like:
- shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M
- If you have little memory and lots of connections, use something like
- this:
- shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k
- -O record_buffer=100k &
- or even:
- shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k
- -O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
- When you have installed *MySQL*, the `support-files' directory will
- contain some different `my.cnf' example files, `my-huge.cnf',
- `my-large.cnf', `my-medium.cnf', and `my-small.cnf', you can use as a
- base to optimize your system.
- If there are very many connections, "swapping problems" may occur unless
- `mysqld' has been configured to use very little memory for each
- connection. `mysqld' performs better if you have enough memory for all
- connections, of course.
- Note that if you change an option to `mysqld', it remains in effect only
- for that instance of the server.
- To see the effects of a parameter change, do something like this:
- shell> mysqld -O key_buffer=32m --help
- Make sure that the `--help' option is last; otherwise, the effect of any
- options listed after it on the command line will not be reflected in the
- output.
- How MySQL Opens and Closes Tables
- ---------------------------------
- `table_cache', `max_connections', and `max_tmp_tables' affect the
- maximum number of files the server keeps open. If you increase one or
- both of these values, you may run up against a limit imposed by your
- operating system on the per-process number of open file descriptors.
- However, you can increase the limit on many systems. Consult your OS
- documentation to find out how to do this, because the method for
- changing the limit varies widely from system to system.
- `table_cache' is related to `max_connections'. For example, for 200
- concurrent running connections, you should have a table cache of at
- least `200 * n', where `n' is the maximum number of tables in a join.
- The cache of open tables can grow to a maximum of `table_cache'
- (default 64; this can be changed with the `-O table_cache=#' option to
- `mysqld'). A table is never closed, except when the cache is full and
- another thread tries to open a table or if you use `mysqladmin refresh'
- or `mysqladmin flush-tables'.
- When the table cache fills up, the server uses the following procedure
- to locate a cache entry to use:
- * Tables that are not currently in use are released, in
- least-recently-used order.
- * If the cache is full and no tables can be released, but a new
- table needs to be opened, the cache is temporarily extended as
- necessary.
- * If the cache is in a temporarily-extended state and a table goes
- from in-use to not-in-use state, the table is closed and released
- from the cache.
- A table is opened for each concurrent access. This means that if you
- have two threads accessing the same table or access the table twice in
- the same query (with `AS') the table needs to be opened twice. The
- first open of any table takes two file descriptors; each additional use
- of the table takes only one file descriptor. The extra descriptor for
- the first open is used for the index file; this descriptor is shared
- among all threads.
- You can check if your table cache is too small by checking the mysqld
- variable `opened_tables'. If this is quite big, even if you haven't
- done a lot of `FLUSH TABLES', you should increase your table cache.
- *Note SHOW STATUS::.
- Drawbacks to Creating Large Numbers of Tables in the Same Database
- ------------------------------------------------------------------
- If you have many files in a directory, open, close, and create
- operations will be slow. If you execute `SELECT' statements on many
- different tables, there will be a little overhead when the table cache
- is full, because for every table that has to be opened, another must be
- closed. You can reduce this overhead by making the table cache larger.
- Why So Many Open tables?
- ------------------------
- When you run `mysqladmin status', you'll see something like this:
- Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12
- This can be somewhat perplexing if you only have 6 tables.
- *MySQL* is multithreaded, so it may have many queries on the same table
- simultaneously. To minimize the problem with two threads having
- different states on the same file, the table is opened independently by
- each concurrent thread. This takes some memory and one extra file
- descriptor for the data file. The index file descriptor is shared
- between all threads.
- How MySQL Uses Memory
- ---------------------
- The list below indicates some of the ways that the `mysqld' server uses
- memory. Where applicable, the name of the server variable relevant to
- the memory use is given:
- * The key buffer (variable `key_buffer_size') is shared by all
- threads; Other buffers used by the server are allocated as needed.
- *Note Server parameters::.
- * Each connection uses some thread-specific space: A stack (default
- 64K, variable `thread_stack'), a connection buffer (variable
- `net_buffer_length'), and a result buffer (variable
- `net_buffer_length'). The connection buffer and result buffer are
- dynamically enlarged up to `max_allowed_packet' when needed. When
- a query is running, a copy of the current query string is also
- allocated.
- * All threads share the same base memory.
- * Only the compressed ISAM / MyISAM tables are memory mapped. This
- is because the 32-bit memory space of 4GB is not large enough for
- most big tables. When systems with a 64-bit address space become
- more common we may add general support for memory mapping.
- * Each request doing a sequential scan over a table allocates a read
- buffer (variable `record_buffer').
- * All joins are done in one pass, and most joins can be done without
- even using a temporary table. Most temporary tables are
- memory-based (HEAP) tables. Temporary tables with a big record
- length (calculated as the sum of all column lengths) or that
- contain `BLOB' columns are stored on disk.
- One problem in *MySQL* versions before Version 3.23.2 is that if a
- HEAP table exceeds the size of `tmp_table_size', you get the error
- `The table tbl_name is full'. In newer versions this is handled by
- automatically changing the in-memory (HEAP) table to a disk-based
- (MyISAM) table as necessary. To work around this problem, you can
- increase the temporary table size by setting the `tmp_table_size'
- option to `mysqld', or by setting the SQL option `SQL_BIG_TABLES'
- in the client program. *Note `SET OPTION': SET OPTION. In *MySQL*
- Version 3.20, the maximum size of the temporary table was
- `record_buffer*16', so if you are using this version, you have to
- increase the value of `record_buffer'. You can also start `mysqld'
- with the `--big-tables' option to always store temporary tables on
- disk. However, this will affect the speed of many complicated
- queries.
- * Most requests doing a sort allocates a sort buffer and 0-2
- temporary files depending on the result set size. *Note Temporary
- files::.
- * Almost all parsing and calculating is done in a local memory
- store. No memory overhead is needed for small items and the normal
- slow memory allocation and freeing is avoided. Memory is allocated
- only for unexpectedly large strings (this is done with `malloc()'
- and `free()').
- * Each index file is opened once and the data file is opened once
- for each concurrently running thread. For each concurrent thread,
- a table structure, column structures for each column, and a buffer
- of size `3 * n' is allocated (where `n' is the maximum row length,
- not counting `BLOB' columns). A `BLOB' uses 5 to 8 bytes plus the
- length of the `BLOB' data. The `ISAM'/`MyISAM' table handlers will
- use one extra row buffer for internal usage.
- * For each table having `BLOB' columns, a buffer is enlarged
- dynamically to read in larger `BLOB' values. If you scan a table,
- a buffer as large as the largest `BLOB' value is allocated.
- * Table handlers for all in-use tables are saved in a cache and
- managed as a FIFO. Normally the cache has 64 entries. If a table
- has been used by two running threads at the same time, the cache
- contains two entries for the table. *Note Table cache::.
- * A `mysqladmin flush-tables' command closes all tables that are not
- in use and marks all in-use tables to be closed when the currently
- executing thread finishes. This will effectively free most in-use
- memory.
- `ps' and other system status programs may report that `mysqld' uses a
- lot of memory. This may be caused by thread-stacks on different memory
- addresses. For example, the Solaris version of `ps' counts the unused
- memory between stacks as used memory. You can verify this by checking
- available swap with `swap -s'. We have tested `mysqld' with commercial
- memory-leakage detectors, so there should be no memory leaks.
- How MySQL Locks Tables
- ----------------------
- You can find a discussion about different locking methods in the
- appendix. *Note Locking methods::.
- All locking in *MySQL* is deadlock-free. This is managed by always
- requesting all needed locks at once at the beginning of a query and
- always locking the tables in the same order.
- The locking method *MySQL* uses for `WRITE' locks works as follows:
- * If there are no locks on the table, put a write lock on it.
- * Otherwise, put the lock request in the write lock queue.
- The locking method *MySQL* uses for `READ' locks works as follows:
- * If there are no write locks on the table, put a read lock on it.
- * Otherwise, put the lock request in the read lock queue.
- When a lock is released, the lock is made available to the threads in
- the write lock queue, then to the threads in the read lock queue.
- This means that if you have many updates on a table, `SELECT'
- statements will wait until there are no more updates.
- To work around this for the case where you want to do many `INSERT' and
- `SELECT' operations on a table, you can insert rows in a temporary
- table and update the real table with the records from the temporary
- table once in a while.
- This can be done with the following code:
- mysql> LOCK TABLES real_table WRITE, insert_table WRITE;
- mysql> insert into real_table select * from insert_table;
- mysql> TRUNCATE TABLE insert_table;
- mysql> UNLOCK TABLES;
- You can use the `LOW_PRIORITY' options with `INSERT' if you want to
- prioritize retrieval in some specific cases. *Note `INSERT': INSERT.
- You could also change the locking code in `mysys/thr_lock.c' to use a
- single queue. In this case, write locks and read locks would have the
- same priority, which might help some applications.
- Table Locking Issues
- --------------------
- The table locking code in *MySQL* is deadlock free.
- *MySQL* uses table locking (instead of row locking or column locking)
- on all table types, except `BDB' tables, to achieve a very high lock
- speed. For large tables, table locking is MUCH better than row locking
- for most applications, but there are, of course, some pitfalls.
- For `BDB' tables, *MySQL* only uses table locking if you explicitely
- lock the table with `LOCK TABLES' or execute a command that will modify
- every row in the table, like `ALTER TABLE'.
- In *MySQL* Version 3.23.7 and above, you can insert rows into `MyISAM'
- tables at the same time other threads are reading from the table. Note
- that currently this only works if there are no holes after deleted rows
- in the table at the time the insert is made.
- Table locking enables many threads to read from a table at the same
- time, but if a thread wants to write to a table, it must first get
- exclusive access. During the update, all other threads that want to
- access this particular table will wait until the update is ready.
- As updates on tables normally are considered to be more important than
- `SELECT', all statements that update a table have higher priority than
- statements that retrieve information from a table. This should ensure
- that updates are not 'starved' because one issues a lot of heavy
- queries against a specific table. (You can change this by using
- LOW_PRIORITY with the statement that does the update or `HIGH_PRIORITY'
- with the `SELECT' statement.)
- Starting from *MySQL* Version 3.23.7 one can use the
- `max_write_lock_count' variable to force *MySQL* to temporary give all
- `SELECT' statements, that wait for a table, a higher priority after a
- specific number of inserts on a table.
- Table locking is, however, not very good under the following senario:
- * A client issues a `SELECT' that takes a long time to run.
- * Another client then issues an `UPDATE' on a used table. This client
- will wait until the `SELECT' is finished.
- * Another client issues another `SELECT' statement on the same
- table. As `UPDATE' has higher priority than `SELECT', this `SELECT'
- will wait for the `UPDATE' to finish. It will also wait for the
- first `SELECT' to finish!
- * A thread is waiting for something like `full disk', in which case
- all threads that wants to access the problem table will also be
- put in a waiting state until more disk space is made available.
- Some possible solutions to this problem are:
- * Try to get the `SELECT' statements to run faster. You may have to
- create some summary tables to do this.
- * Start `mysqld' with `--low-priority-updates'. This will give all
- statements that update (modify) a table lower priority than a
- `SELECT' statement. In this case the last `SELECT' statement in
- the previous scenario would execute before the `INSERT' statement.
- * You can give a specific `INSERT', `UPDATE', or `DELETE' statement
- lower priority with the `LOW_PRIORITY' attribute.
- * Start `mysqld' with a low value for *max_write_lock_count* to give
- `READ' locks after a certain number of `WRITE' locks.
- * You can specify that all updates from a specific thread should be
- done with low priority by using the SQL command: `SET
- SQL_LOW_PRIORITY_UPDATES=1'. *Note `SET OPTION': SET OPTION.
- * You can specify that a specific `SELECT' is very important with the
- `HIGH_PRIORITY' attribute. *Note `SELECT': SELECT.
- * If you have problems with `INSERT' combined with `SELECT', switch
- to use the new `MyISAM' tables as these support concurrent
- `SELECT's and `INSERT's.
- * If you mainly mix `INSERT' and `SELECT' statements, the `DELAYED'
- attribute to `INSERT' will probably solve your problems. *Note
- `INSERT': INSERT.
- * If you have problems with `SELECT' and `DELETE', the `LIMIT'
- option to `DELETE' may help. *Note `DELETE': DELETE.
- How MySQL uses DNS
- ------------------
- When a new threads connects to `mysqld', `mysqld' will span a new
- thread to handle the request. This thread will first check if the
- hostname is in the hostname cache. If not the thread will call
- `gethostbyaddr_r()' and `gethostbyname_r()' to resolve the hostname.
- If the operating system doesn't support the above thread-safe calls, the
- thread will lock a mutex and call `gethostbyaddr()' and
- `gethostbyname()' instead. Note that in this case no other thread can
- resolve other hostnames that is not in the hostname cache until the
- first thread is ready.
- You can disable DNS host lookup by starting `mysqld' with
- `--skip-name-resolve'. In this case you can however only use IP names
- in the *MySQL* privilege tables.
- If you have a very slow DNS and many hosts, you can get more
- performance by either disabling DNS lookop with `--skip-name-resolve'
- or by increasing the `HOST_CACHE_SIZE' define (default: 128) and
- recompile `mysqld'.
- You can disable the hostname cache with `--skip-host-cache'. You can
- clear the hostname cache with `FLUSH HOSTS' or `mysqladmin flush-hosts'.
- If you don't want to allow connections over `TCP/IP', you can do this
- by starting mysqld with `--skip-networking'.
- Get Your Data as Small as Possible
- ==================================
- One of the most basic optimization is to get your data (and indexes) to
- take as little space on the disk (and in memory) as possible. This can
- give huge improvements because disk reads are faster and normally less
- main memory will be used. Indexing also takes less resources if done on
- smaller columns.
- *MySQL* supports a lot of different table types and row formats.
- Choosing the right table format may give you a big performance gain.
- *Note Table types::.
- You can get better performance on a table and minimize storage space
- using the techniques listed below:
- * Use the most efficient (smallest) types possible. *MySQL* has many
- specialized types that save disk space and memory.
- * Use the smaller integer types if possible to get smaller tables.
- For example, `MEDIUMINT' is often better than `INT'.
- * Declare columns to be `NOT NULL' if possible. It makes everything
- faster and you save one bit per column. Note that if you really
- need `NULL' in your application you should definitely use it. Just
- avoid having it on all columns by default.
- * If you don't have any variable-length columns (`VARCHAR', `TEXT',
- or `BLOB' columns), a fixed-size record format is used. This is
- faster but unfortunately may waste some space. *Note MyISAM table
- formats::.
- * The primary index of a table should be as short as possible. This
- makes identification of one row easy and efficient.
- * For each table, you have to decide which storage/index method to
- use. *Note Table types::.
- * Only create the indexes that you really need. Indexes are good for
- retrieval but bad when you need to store things fast. If you mostly
- access a table by searching on a combination of columns, make an
- index on them. The first index part should be the most used
- column. If you are ALWAYS using many columns, you should use the
- column with more duplicates first to get better compression of the
- index.
- * If it's very likely that a column has a unique prefix on the first
- number of characters, it's better to only index this prefix.
- *MySQL* supports an index on a part of a character column. Shorter
- indexes are faster not only because they take less disk space but
- also because they will give you more hits in the index cache and
- thus fewer disk seeks. *Note Server parameters::.
- * In some circumstances it can be beneficial to split into two a
- table that is scanned very often. This is especially true if it is
- a dynamic format table and it is possible to use a smaller static
- format table that can be used to find the relevant rows when
- scanning the table.
- How MySQL Uses Indexes
- ======================
- Indexes are used to find rows with a specific value of one column fast.
- Without an index *MySQL* has to start with the first record and then
- read through the whole table until it finds the relevant rows. The
- bigger the table, the more this costs. If the table has an index for
- the colums in question, *MySQL* can quickly get a position to seek to
- in the middle of the data file without having to look at all the data.
- If a table has 1000 rows, this is at least 100 times faster than
- reading sequentially. Note that if you need to access almost all 1000
- rows it is faster to read sequentially because we then avoid disk seeks.
- All *MySQL* indexes (`PRIMARY', `UNIQUE', and `INDEX') are stored in
- B-trees. Strings are automatically prefix- and end-space compressed.
- *Note `CREATE INDEX': CREATE INDEX.
- Indexes are used to:
- * Quickly find the rows that match a `WHERE' clause.
- * Retrieve rows from other tables when performing joins.
- * Find the `MAX()' or `MIN()' value for a specific indexed column.
- This is optimized by a preprocessor that checks if you are using
- `WHERE' key_part_# = constant on all key parts < N. In this case
- *MySQL* will do a single key lookup and replace the `MIN()'
- expression with a constant. If all expressions are replaced with
- constants, the query will return at once:
- SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10
- * Sort or group a table if the sorting or grouping is done on a
- leftmost prefix of a usable key (for example, `ORDER BY
- key_part_1,key_part_2 '). The key is read in reverse order if all
- key parts are followed by `DESC'.
- The index can also be used even if the `ORDER BY' doesn't match
- the index exactly, as long as all the unused index parts and all
- the extra are `ORDER BY' columns are constants in the `WHERE'
- clause. The following queries will use the index to resolve the
- `ORDER BY' part:
- SELECT * FROM foo ORDER BY key_part1,key_part2,key_part3;
- SELECT * FROM foo WHERE column=constant ORDER BY column, key_part1;
- SELECT * FROM foo WHERE key_part1=const GROUP BY key_part2;
- * In some cases a query can be optimized to retrieve values without
- consulting the data file. If all used columns for some table are
- numeric and form a leftmost prefix for some key, the values may be
- retrieved from the index tree for greater speed:
- SELECT key_part3 FROM table_name WHERE key_part1=1
- Suppose you issue the following `SELECT' statement:
- mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
- If a multiple-column index exists on `col1' and `col2', the appropriate
- rows can be fetched directly. If separate single-column indexes exist
- on `col1' and `col2', the optimizer tries to find the most restrictive
- index by deciding which index will find fewer rows and using that index
- to fetch the rows.
- If the table has a multiple-column index, any leftmost prefix of the
- index can be used by the optimizer to find rows. For example, if you
- have a three-column index on `(col1,col2,col3)', you have indexed
- search capabilities on `(col1)', `(col1,col2)', and `(col1,col2,col3)'.
- *MySQL* can't use a partial index if the columns don't form a leftmost
- prefix of the index. Suppose you have the `SELECT' statements shown
- below:
- mysql> SELECT * FROM tbl_name WHERE col1=val1;
- mysql> SELECT * FROM tbl_name WHERE col2=val2;
- mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
- If an index exists on `(col1,col2,col3)', only the first query shown
- above uses the index. The second and third queries do involve indexed
- columns, but `(col2)' and `(col2,col3)' are not leftmost prefixes of
- `(col1,col2,col3)'.
- *MySQL* also uses indexes for `LIKE' comparisons if the argument to
- `LIKE' is a constant string that doesn't start with a wild-card
- character. For example, the following `SELECT' statements use indexes:
- mysql> select * from tbl_name where key_col LIKE "Patrick%";
- mysql> select * from tbl_name where key_col LIKE "Pat%_ck%";
- In the first statement, only rows with `"Patrick" <= key_col <
- "Patricl"' are considered. In the second statement, only rows with
- `"Pat" <= key_col < "Pau"' are considered.
- The following `SELECT' statements will not use indexes:
- mysql> select * from tbl_name where key_col LIKE "%Patrick%";
- mysql> select * from tbl_name where key_col LIKE other_col;
- In the first statement, the `LIKE' value begins with a wild-card
- character. In the second statement, the `LIKE' value is not a constant.
- Searching using `column_name IS NULL' will use indexes if column_name
- is an index.
- *MySQL* normally uses the index that finds the least number of rows. An
- index is used for columns that you compare with the following operators:
- `=', `>', `>=', `<', `<=', `BETWEEN', and a `LIKE' with a non-wild-card
- prefix like `'something%''.