<!--
$Header: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v 1.17 2000/07/22 02:39:10 momjian Exp $
Postgres documentation
-->
<refentry id="SQL-COPY">
<refmeta>
<refentrytitle id="sql-copy-title">
COPY
</refentrytitle>
<refmiscinfo>SQL - Language Statements</refmiscinfo>
</refmeta>
<refnamediv>
<refname>
COPY
</refname>
<refpurpose>
Copies data between files and tables
</refpurpose>
</refnamediv>
<refsynopsisdiv>
<refsynopsisdivinfo>
<date>1999-12-11</date>
</refsynopsisdivinfo>
<synopsis>
COPY [ BINARY ] <replaceable class="parameter">table</replaceable> [ WITH OIDS ]
FROM { '<replaceable class="parameter">filename</replaceable>' | <filename>stdin</filename> }
[ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
[ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]
COPY [ BINARY ] <replaceable class="parameter">table</replaceable> [ WITH OIDS ]
TO { '<replaceable class="parameter">filename</replaceable>' | <filename>stdout</filename> }
[ [USING] DELIMITERS '<replaceable class="parameter">delimiter</replaceable>' ]
[ WITH NULL AS '<replaceable class="parameter">null string</replaceable>' ]
</synopsis>
<refsect2 id="R2-SQL-COPY-1">
<refsect2info>
<date>1998-09-08</date>
</refsect2info>
<title>
Inputs
</title>
<para>
<variablelist>
<varlistentry>
<term>BINARY</term>
<listitem>
<para>
Changes the behavior of field formatting, forcing all data to be
stored or read in binary format rather than as text.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">table</replaceable></term>
<listitem>
<para>
The name of an existing table.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>WITH OIDS</term>
<listitem>
<para>
Copies the internal unique object id (OID) for each row.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">filename</replaceable></term>
<listitem>
<para>
The absolute Unix pathname of the input or output file.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>stdin</filename></term>
<listitem>
<para>
Specifies that input comes from a pipe or terminal.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><filename>stdout</filename></term>
<listitem>
<para>
Specifies that output goes to a pipe or terminal.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">delimiter</replaceable></term>
<listitem>
<para>
A character that delimits the input or output fields.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">null string</replaceable></term>
<listitem>
<para>
A string to represent NULL values. The default is
<quote><literal>\N</literal></quote> (backslash-N).
You might prefer an empty string, for example.
</para>
<note>
<para>
On a copy in, any data item that matches this string will be stored as
a NULL value, so you should make sure that you use the same string
as you used on copy out.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect2>
<refsect2 id="R2-SQL-COPY-2">
<refsect2info>
<date>1998-09-08</date>
</refsect2info>
<title>
Outputs
</title>
<para>
<variablelist>
<varlistentry>
<term><computeroutput>
COPY
</computeroutput></term>
<listitem>
<para>
The copy completed successfully.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><computeroutput>
ERROR: <replaceable>reason</replaceable>
</computeroutput></term>
<listitem>
<para>
The copy failed for the reason stated in the error message.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect2>
</refsynopsisdiv>
<refsect1 id="R1-SQL-COPY-1">
<refsect1info>
<date>1998-09-08</date>
</refsect1info>
<title>
Description
</title>
<para>
<command>COPY</command> moves data between
<productname>Postgres</productname> tables and
standard file-system files.
<command>COPY</command> instructs
the <productname>Postgres</productname> backend
to read from or write to a file directly. The file must be directly visible to
the backend, and its name must be specified from the viewpoint of the backend.
If <filename>stdin</filename> or <filename>stdout</filename> is
specified, data flows through the client frontend to the backend.
</para>
<refsect2 id="R2-SQL-COPY-3">
<refsect2info>
<date>1998-09-08</date>
</refsect2info>
<title>
Notes
</title>
<para>
The BINARY keyword forces all data to be
stored or read in binary format rather than as text. It is
somewhat faster than the normal copy command, but is not
generally portable, and the files generated are somewhat larger,
although this factor is highly dependent on the data itself.
</para>
<para>
By default, a text copy uses a tab ("\t") character as a delimiter.
The delimiter may be changed to any other single character
with the keyword phrase USING DELIMITERS. Characters
in data fields that happen to match the delimiter character will
be backslash quoted.
</para>
<para>
You must have <firstterm>select access</firstterm> on any table
whose values are read by
<command>COPY</command>, and either
<firstterm>insert or update access</firstterm> to a
table into which values are being inserted by <command>COPY</command>.
The backend also needs appropriate Unix permissions for any file read
or written by <command>COPY</command>.
</para>
<para>
The keyword phrase USING DELIMITERS specifies a single character
to be used for all delimiters between columns. If multiple characters
are specified in the delimiter string, only the first character is
used.
<tip>
<para>
Do not confuse <command>COPY</command> with the
<application>psql</application> instruction <command>\copy</command>.
</para>
</tip>
</para>
<para>
<command>COPY</command> neither invokes rules nor acts on column defaults.
It does invoke triggers, however.
</para>
<para>
<command>COPY</command> stops operation at the first error. This
should not lead to problems in the event of
a <command>COPY TO</command>, but the
target relation will, of course, be partially modified in a
<command>COPY FROM</command>.
<command>VACUUM</command> should be used to clean up
after a failed copy.
</para>
<para>
Because the <productname>Postgres</productname> backend's current working directory
is not usually the same as the user's
working directory, copying to a file
"<filename>foo</filename>" (without
additional path information) may yield unexpected results.
In this case, <filename>foo</filename>
will wind up in <filename>$PGDATA/foo</filename>. In
general, the full pathname as it would appear to the backend server machine
should be used when specifying files to
be copied.
<para>
Files used as arguments to <command>COPY</command>
must reside on or be
accessible to the database server machine by being either on
local disks or on a networked file system.
</para>
<para>
When a TCP/IP connection from one machine to another is used, and a
target file is specified, the target file will be written on the
machine where the backend is running rather than the user's
machine.
</para>
</refsect2>
</refsect1>
<refsect1 id="R1-SQL-COPY-2">
<refsect1info>
<date>1998-05-04</date>
</refsect1info>
<title>File Formats</title>
<refsect2>
<refsect2info>
<date>1998-05-04</date>
</refsect2info>
<title>Text Format</title>
<para>
When <command>COPY TO</command> is used without the BINARY option,
the file generated will have each row (instance) on a single line, with each
column (attribute) separated by the delimiter character. Embedded
delimiter characters will be preceded by a backslash character
("\"). The attribute values themselves are strings generated by the
output function associated with each attribute type. The output
function for a type should not try to generate the backslash
character; this will be handled by <command>COPY</command> itself.
</para>
<para>
The actual format for each instance is
<programlisting>
&lt;attr1&gt;&lt;<replaceable class="parameter">separator</replaceable>&gt;&lt;attr2&gt;&lt;<replaceable class="parameter">separator</replaceable>&gt;...&lt;<replaceable class="parameter">separator</replaceable>&gt;&lt;attr<replaceable class="parameter">n</replaceable>&gt;&lt;newline&gt;
</programlisting>
The oid is placed at the beginning of the line
if WITH OIDS is specified.
</para>
<para>
If <command>COPY</command> is sending its output to standard
output instead of a file, it will send a backslash ("\") and a period
(".") followed immediately by a newline, on a separate line,
when it is done. Similarly, if <command>COPY</command> is reading
from standard input, it will expect a backslash ("\") and a period
(".") followed by a newline, as the first three characters on a
line to denote end-of-file. However, <command>COPY</command>
will terminate (followed by the backend itself) if a true EOF is
encountered before this special end-of-file pattern is found.
</para>
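<para>
As a rough illustration of the end-of-data convention described above,
the following Python sketch collects text-format lines until it meets
the terminating line. The function name <literal>read_copy_lines</literal>
and the tab-separated sample rows are assumptions made for this example;
they are not part of <productname>Postgres</productname>.
</para>
<programlisting>
```python
def read_copy_lines(lines):
    """Collect data lines up to the backslash-period terminator.

    Returns (rows, terminated). terminated is False when the input ran
    out before the terminator line was seen (a "true EOF").
    """
    rows = []
    for line in lines:
        # The terminator is exactly: backslash, period, newline.
        if line == "\\.\n":
            return rows, True
        rows.append(line)
    return rows, False


rows, terminated = read_copy_lines(
    ["AF\tAFGHANISTAN\n", "AL\tALBANIA\n", "\\.\n", "not read\n"]
)
print(rows, terminated)
```
</programlisting>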
<para>
The backslash character has other special meanings. A literal backslash
character is represented as two
consecutive backslashes ("\\"). A literal tab character is represented
as a backslash and a tab. A literal newline character is
represented as a backslash and a newline. When loading text data
not generated by <productname>Postgres</productname>,
you will need to convert backslash
characters ("\") to double-backslashes ("\\") to ensure that they
are loaded properly.
</para>
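<para>
When preparing text data outside <productname>Postgres</productname>,
the quoting rules above might be applied along the following lines.
This is only a sketch: the names <literal>escape_copy_field</literal>
and <literal>format_copy_row</literal> are hypothetical, and it assumes
that the null string itself is written without quoting.
</para>
<programlisting>
```python
def escape_copy_field(value, delimiter="\t"):
    """Quote one text value using the backslash rules described above."""
    out = []
    for ch in value:
        # Backslash, tab, newline and the active delimiter each get a
        # leading backslash; every other character is copied unchanged.
        if ch in ("\\", "\t", "\n", delimiter):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def format_copy_row(values, delimiter="\t", null_string="\\N"):
    """Build one line of COPY text input; None stands for NULL."""
    fields = [
        null_string if v is None else escape_copy_field(v, delimiter)
        for v in values
    ]
    return delimiter.join(fields) + "\n"


print(repr(format_copy_row(["AF", "AFGHANISTAN", None])))
```
</programlisting>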
</refsect2>
<refsect2>
<refsect2info>
<date>1998-05-04</date>
</refsect2info>
<title>Binary Format</title>
<para>
In the case of <command>COPY BINARY</command>, the first four
bytes in the file will be the number of instances in the file. If
this number is zero, the <command>COPY BINARY</command> command
will read until end of file is encountered. Otherwise, it will
stop reading when this number of instances has been read.
Remaining data in the file will be ignored.
</para>
<para>
The format for each instance in the file is as follows. Note that
this format must be followed <emphasis>exactly</emphasis>.
Unsigned four-byte integer quantities are called uint32 in the
table below.
</para>
<table frame="all">
<title>Contents of a binary copy file</title>
<tgroup cols="2" colsep="1" rowsep="1" align="center">
<colspec colname="col1">
<colspec colname="col2">
<spanspec namest="col1" nameend="col2" spanname="subhead">
<tbody>
<row>
<entry spanname="subhead" align="center">At the start of the file</entry>
</row>
<row>
<entry>uint32</entry>
<entry>number of tuples</entry>
</row>
<row>
<entry spanname="subhead" align="center">For each tuple</entry>
</row>
<row>
<entry>uint32</entry>
<entry>total length of tuple data</entry>
</row>
<row>
<entry>uint32</entry>
<entry>oid (if specified)</entry>
</row>
<row>
<entry>uint32</entry>
<entry>number of null attributes</entry>
</row>
<row>
<entry>[uint32,...,uint32]</entry>
<entry>attribute numbers of null attributes, counting from 0</entry>
</row>
<row>
<entry>-</entry>
<entry>&lt;tuple data&gt;</entry>
</row>
</tbody>
</tgroup>
</table>
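<para>
The layout in the table can be walked with a short reader such as the
Python sketch below. It is an illustration under stated assumptions,
not the backend's own code: the name <literal>parse_binary_copy</literal>
is hypothetical, integers are taken to be little-endian as in the
Linux/i586 dump in the Usage section, and the tuple count is assumed to
be nonzero. The sample bytes reproduce the first two rows of that dump,
with the count set to 2.
</para>
<programlisting>
```python
def read_uint32(buf, pos):
    """Return the little-endian uint32 at pos and the next position."""
    return int.from_bytes(buf[pos:pos + 4], "little"), pos + 4


def parse_binary_copy(buf, with_oids=False):
    """Split a binary copy file into per-tuple headers and raw data."""
    count, pos = read_uint32(buf, 0)          # number of tuples
    tuples = []
    for _ in range(count):
        length, pos = read_uint32(buf, pos)   # total length of tuple data
        oid = None
        if with_oids:
            oid, pos = read_uint32(buf, pos)  # present only WITH OIDS
        n_nulls, pos = read_uint32(buf, pos)  # number of null attributes
        nulls = []
        for _ in range(n_nulls):
            attno, pos = read_uint32(buf, pos)
            nulls.append(attno)               # numbered from 0
        data = buf[pos:pos + length]
        pos += length
        tuples.append({"oid": oid, "nulls": nulls, "data": data})
    return tuples


def u32(n):
    return n.to_bytes(4, "little")


sample = (
    u32(2)
    + u32(23) + u32(1) + u32(2)
    + u32(6) + b"AF\x00\x00" + u32(15) + b"AFGHANISTAN"
    + u32(19) + u32(1) + u32(2)
    + u32(6) + b"AL\x00\x00" + u32(11) + b"ALBANIA"
)
tuples = parse_binary_copy(sample)
print(tuples[0]["nulls"], len(tuples[0]["data"]))
```
</programlisting>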
</refsect2>
<refsect2>
<refsect2info>
<date>1998-05-04</date>
</refsect2info>
1998-09-16 14:43:12 +00:00
<title>Alignment of Binary Data</title>
<para>
On Sun-3s, 2-byte attributes are aligned on two-byte boundaries,
and all larger attributes are aligned on four-byte boundaries.
Character attributes are aligned on single-byte boundaries. On
most other machines, all attributes larger than 1 byte are aligned on
four-byte boundaries. Note that variable length attributes are
preceded by the attribute's length; arrays are simply contiguous
streams of the array element type.
</para>
</refsect2>
</refsect1>
<refsect1 id="R1-SQL-COPY-3">
<title>
Usage
</title>
<para>
The following example copies a table to standard output,
using a pipe (|) as the field
delimiter:
</para>
<programlisting>
COPY country TO <filename>stdout</filename> USING DELIMITERS '|';
</programlisting>
<para>
To copy data from a Unix file into a table "country":
</para>
<programlisting>
COPY country FROM '/usr1/proj/bray/sql/country_data';
</programlisting>
<para>
Here is a sample of data suitable for copying into a table
from <filename>stdin</filename> (so it
has the termination sequence on the last line):
</para>
<programlisting>
AF AFGHANISTAN
AL ALBANIA
DZ ALGERIA
...
ZM ZAMBIA
ZW ZIMBABWE
\.
</programlisting>
<para>
The following is the same data, output in binary format on a Linux/i586 machine.
The data is shown after filtering through
the Unix utility <command>od -c</command>. The table has
three fields; the first is <classname>char(2)</classname>
and the second is <classname>text</classname>. All the
rows have a null value in the third field.
Notice how the <classname>char(2)</classname>
field is padded with nulls to four bytes and the text field is
preceded by its length:
</para>
<programlisting>
355 \0 \0 \0 027 \0 \0 \0 001 \0 \0 \0 002 \0 \0 \0
006 \0 \0 \0 A F \0 \0 017 \0 \0 \0 A F G H
A N I S T A N 023 \0 \0 \0 001 \0 \0 \0 002
\0 \0 \0 006 \0 \0 \0 A L \0 \0 \v \0 \0 \0 A
L B A N I A 023 \0 \0 \0 001 \0 \0 \0 002 \0
\0 \0 006 \0 \0 \0 D Z \0 \0 \v \0 \0 \0 A L
G E R I A
... \n \0 \0 \0 Z A M B I A 024 \0
\0 \0 001 \0 \0 \0 002 \0 \0 \0 006 \0 \0 \0 Z W
\0 \0 \f \0 \0 \0 Z I M B A B W E
</programlisting>
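<para>
To make the padding and length prefixes in this dump easier to follow,
the Python sketch below splits the tuple data of the first row into its
two non-null fields. The name <literal>split_fields</literal> is
hypothetical. The sketch assumes little-endian integers, four-byte
alignment, and that each length word counts its own four bytes; the last
point is inferred from the dump (6 for the two-character field, 15 for
the eleven-character field) rather than stated elsewhere on this page.
</para>
<programlisting>
```python
def split_fields(data, n_fields):
    """Split tuple data into length-prefixed fields on 4-byte boundaries."""
    fields = []
    pos = 0
    for _ in range(n_fields):
        # Assumption read off the dump: the length word counts itself.
        length = int.from_bytes(data[pos:pos + 4], "little")
        fields.append(data[pos + 4:pos + length])
        pos += length
        pos += -pos % 4  # skip padding up to the next four-byte boundary
    return fields


# Tuple data of the first row in the dump (23 bytes in total).
first = (
    b"\x06\x00\x00\x00AF\x00\x00"
    + b"\x0f\x00\x00\x00AFGHANISTAN"
)
print(split_fields(first, 2))
```
</programlisting>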
</refsect1>
<refsect1 id="R1-SQL-COPY-6">
<title>
Compatibility
</title>
<refsect2 id="R2-SQL-COPY-4">
<refsect2info>
<date>1998-09-08</date>
</refsect2info>
<title>
SQL92
</title>
<para>
There is no <command>COPY</command> statement in SQL92.
</para>
</refsect2>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"../reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:"/usr/lib/sgml/catalog"
sgml-local-ecat-files:nil
End:
-->