Add note about space usage of 'manual' approach to clustering, per
suggestion from Sergey Koposov. Also some other minor editing.
parent 6fada49805
commit 10c70b8602
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.37 2006/10/31 01:52:31 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/ref/cluster.sgml,v 1.38 2006/11/04 19:03:51 tgl Exp $
 PostgreSQL documentation
 -->
 
@@ -108,8 +108,8 @@ CLUSTER
 If you are requesting a range of indexed values from a table, or a
 single indexed value that has multiple rows that match,
 <command>CLUSTER</command> will help because once the index identifies the
-heap page for the first row that matches, all other rows
-that match are probably already on the same heap page,
+table page for the first row that matches, all other rows
+that match are probably already on the same table page,
 and so you save disk accesses and speed up the query.
 </para>
 
@@ -137,30 +137,33 @@ CLUSTER
 
 <para>
 There is another way to cluster data. The
-<command>CLUSTER</command> command reorders the original table using
-the ordering of the index you specify. This can be slow
-on large tables because the rows are fetched from the heap
-in index order, and if the heap table is unordered, the
+<command>CLUSTER</command> command reorders the original table by
+scanning it using the index you specify. This can be slow
+on large tables because the rows are fetched from the table
+in index order, and if the table is disordered, the
 entries are on random pages, so there is one disk page
-retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
-but the majority of a big table will not fit in the cache.)
+retrieved for every row moved. (<productname>PostgreSQL</productname> has
+a cache, but the majority of a big table will not fit in the cache.)
 The other way to cluster a table is to use
 
 <programlisting>
 CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
-SELECT <replaceable class="parameter">columnlist</replaceable> FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
+SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
 </programlisting>
 
-which uses the <productname>PostgreSQL</productname> sorting code in
-the <literal>ORDER BY</literal> clause to create the desired order; this is usually much
-faster than an index scan for
-unordered data. You then drop the old table, use
+which uses the <productname>PostgreSQL</productname> sorting code
+to produce the desired order;
+this is usually much faster than an index scan for disordered data.
+Then you drop the old table, use
 <command>ALTER TABLE ... RENAME</command>
-to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
-recreate the table's indexes. However, this approach does not preserve
+to rename <replaceable class="parameter">newtable</replaceable> to the
+old name, and recreate the table's indexes.
+The big disadvantage of this approach is that it does not preserve
 OIDs, constraints, foreign key relationships, granted privileges, and
 other ancillary properties of the table — all such items must be
-manually recreated.
+manually recreated. Another disadvantage is that this way requires a sort
+temporary file about the same size as the table itself, so peak disk usage
+is about three times the table size instead of twice the table size.
 </para>
 </refsect1>
 
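Reviewer note: the 'manual' approach described in the revised paragraph can be sketched end to end as below. This is only an illustration under stated assumptions, not part of the patch: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a PostgreSQL server, and the table name `events`, the column `user_id`, and the index name `events_user_id_idx` are all hypothetical. The four SQL statements themselves are written in a form that PostgreSQL should accept as well. The sketch also shows the disadvantage the patch calls out: the rewritten table loses its constraints, and the index has to be recreated by hand.

```python
import sqlite3

# Hypothetical names throughout; SQLite is only a stand-in for PostgreSQL here.
MANUAL_CLUSTER_STEPS = [
    # 1. Rewrite the table in the desired order using the sort code.
    "CREATE TABLE events_sorted AS SELECT * FROM events ORDER BY user_id",
    # 2. Drop the old table (its indexes go with it).
    "DROP TABLE events",
    # 3. Give the sorted copy the old name.
    "ALTER TABLE events_sorted RENAME TO events",
    # 4. Recreate the index; constraints etc. would also need recreating.
    "CREATE INDEX events_user_id_idx ON events (user_id)",
]


def manual_cluster(conn):
    """Run the four manual clustering steps in order."""
    for stmt in MANUAL_CLUSTER_STEPS:
        conn.execute(stmt)
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER NOT NULL, payload TEXT)")
    conn.execute("CREATE INDEX events_user_id_idx ON events (user_id)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(3, "c"), (1, "a"), (2, "b"), (1, "d")],
    )
    manual_cluster(conn)

    # Storage order of the rewritten table (rowid follows insertion order).
    physical = [r[0] for r in conn.execute(
        "SELECT user_id FROM events ORDER BY rowid")]
    # NOT NULL flag of user_id after the rewrite (column 3 of table_info).
    notnull = conn.execute("PRAGMA table_info(events)").fetchone()[3]
    indexes = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'events'")]
    conn.close()
    return physical, notnull, indexes


if __name__ == "__main__":
    print(demo())
```

On a real PostgreSQL table the same sequence would additionally drop OIDs, foreign keys, and granted privileges, and would need the extra sort temporary space mentioned in the new text.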