Advanced Features

Advanced Features Introduction In the previous chapter we have covered the basics of using SQL to store and access your data in PostgreSQL. We will now discuss some more advanced features of SQL that simplify management and prevent loss or corruption of your data. Finally, we will look at some PostgreSQL extensions. This chapter will on occasion refer to examples found in to change or improve them, so it will be of advantage if you have read that chapter. Some examples from this chapter can also be found in advanced.sql in the tutorial directory. This file also contains some example data to load, which is not repeated here. (Refer to for how to use the file.) Views view Refer back to the queries in . Suppose the combined listing of weather records and city location is of particular interest to your application, but you do not want to type the query each time you need it. You can create a view over the query, which gives a name to the query that you can refer to like an ordinary table. CREATE VIEW myview AS SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name; SELECT * FROM myview; Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which may change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon. Foreign Keys foreign key referential integrity Recall the weather and cities tables from . Consider the following problem: You want to make sure that no one can insert rows in the weather table that do not have a matching entry in the cities table. This is called maintaining the referential integrity of your data. In simplistic database systems this would be implemented (if at all) by first looking at the cities table to check if a matching record exists, and then inserting or rejecting the new weather records. This approach has a number of problems and is very inconvenient, so PostgreSQL can do this for you. The new declaration of the tables would look like this: CREATE TABLE cities ( city varchar(80) primary key, location point ); CREATE TABLE weather ( city varchar(80) references cities(city), temp_lo int, temp_hi int, prcp real, date date ); Now try inserting an invalid record: INSERT INTO weather VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28'); ERROR: insert or update on table "weather" violates foreign key constraint "weather_city_fkey" DETAIL: Key (city)=(Berkeley) is not present in table "cities". The behavior of foreign keys can be finely tuned to your application. We will not go beyond this simple example in this tutorial, but just refer you to for more information. Making correct use of foreign keys will definitely improve the quality of your database applications, so you are strongly encouraged to learn about them. Transactions transaction Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation. The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all. For example, consider a bank database that contains balances for various customer accounts, as well as total deposit balances for branches. Suppose that we want to record a payment of $100.00 from Alice's account to Bob's account. Simplifying outrageously, the SQL commands for this might look like UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; UPDATE branches SET balance = balance - 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice'); UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; UPDATE branches SET balance = balance + 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob'); The details of these commands are not important here; the important point is that there are several separate updates involved to accomplish this rather simple operation. Our bank's officers will want to be assured that either all these updates happen, or none of them happen. It would certainly not do for a system failure to result in Bob receiving $100.00 that was not debited from Alice. Nor would Alice long remain a happy customer if she was debited without Bob being credited. We need a guarantee that if something goes wrong partway through the operation, none of the steps executed so far will take effect. Grouping the updates into a transaction gives us this guarantee. A transaction is said to be atomic: from the point of view of other transactions, it either happens completely or not at all. We also want a guarantee that once a transaction is completed and acknowledged by the database system, it has indeed been permanently recorded and won't be lost even if a crash ensues shortly thereafter. For example, if we are recording a cash withdrawal by Bob, we do not want any chance that the debit to his account will disappear in a crash just as he walks out the bank door. A transactional database guarantees that all the updates made by a transaction are logged in permanent storage (i.e., on disk) before the transaction is reported complete. Another important property of transactional databases is closely related to the notion of atomic updates: when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others. For example, if one transaction is busy totalling all the branch balances, it would not do for it to include the debit from Alice's branch but not the credit to Bob's branch, nor vice versa. So transactions must be all-or-nothing not only in terms of their permanent effect on the database, but also in terms of their visibility as they happen. The updates made so far by an open transaction are invisible to other transactions until the transaction completes, whereupon all the updates become visible simultaneously. In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands. So our banking transaction would actually look like BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; -- etc etc COMMIT; If, partway through the transaction, we decide we do not want to commit (perhaps we just noticed that Alice's balance went negative), we can issue the command ROLLBACK instead of COMMIT, and all our updates so far will be canceled. PostgreSQL actually treats every SQL statement as being executed within a transaction. If you do not issue a BEGIN command, then each individual statement has an implicit BEGIN and (if successful) COMMIT wrapped around it. A group of statements surrounded by BEGIN and COMMIT is sometimes called a transaction block. Some client libraries issue BEGIN and COMMIT commands automatically, so that you may get the effect of transaction blocks without asking. Check the documentation for the interface you are using. It's possible to control the statements in a transaction in a more granular fashion through the use of savepoints. Savepoints allow you to selectively discard parts of the transaction, while committing the rest. After defining a savepoint with SAVEPOINT, you can if needed roll back to the savepoint with ROLLBACK TO. All the transaction's database changes between defining the savepoint and rolling back to it are discarded, but changes earlier than the savepoint are kept. After rolling back to a savepoint, it continues to be defined, so you can roll back to it several times. Conversely, if you are sure you won't need to roll back to a particular savepoint again, it can be released, so the system can free some resources. Keep in mind that either releasing or rolling back to a savepoint will automatically release all savepoints that were defined after it. All this is happening within the transaction block, so none of it is visible to other database sessions. When and if you commit the transaction block, the committed actions become visible as a unit to other sessions, while the rolled-back actions never become visible at all. Remembering the bank database, suppose we debit $100.00 from Alice's account, and credit Bob's account, only to find later that we should have credited Wally's account. We could do it using savepoints like this: BEGIN; UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; SAVEPOINT my_savepoint; UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; -- oops ... forget that and use Wally's account ROLLBACK TO my_savepoint; UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Wally'; COMMIT; This example is, of course, oversimplified, but there's a lot of control to be had over a transaction block through the use of savepoints. Moreover, ROLLBACK TO is the only way to regain control of a transaction block that was put in aborted state by the system due to an error, short of rolling it back completely and starting again. Inheritance inheritance Inheritance is a concept from object-oriented databases. Although it opens up interesting new possibilities of database design, this feature is currently unmaintained and known to have serious gotchas in its foreign key implementation, which you should take care to avoid. The fixes below are probably version-specific and may require updates in the future. The example below illustrates the gotcha. BEGIN; CREATE TABLE foo ( foo_id SERIAL PRIMARY KEY ); CREATE TABLE parent ( parent_id SERIAL PRIMARY KEY , foo_id INTEGER NOT NULL REFERENCES foo(foo_id) ON DELETE CASCADE , parent_1_text TEXT NOT NULL ); CREATE TABLE child_1 ( child_1_text TEXT NOT NULL ) INHERITS(parent); CREATE TABLE child_2 ( child_2_text TEXT NOT NULL ) INHERITS(parent); INSERT INTO foo VALUES(DEFAULT); INSERT INTO child_1 (foo_id, parent_1_text, child_1_text) VALUES (currval('public.foo_foo_id_seq'), 'parent text 1', 'child_1 text 1'); INSERT INTO foo VALUES(DEFAULT); INSERT INTO child_1 (foo_id, parent_1_text, child_1_text) VALUES (currval('public.foo_foo_id_seq'), 'parent text 2', 'child_1 text 2'); INSERT INTO foo VALUES(DEFAULT); INSERT INTO child_2 (foo_id, parent_1_text, child_2_text) VALUES (currval('foo_foo_id_seq'), 'parent text 3', 'child_2 text 1'); DELETE FROM foo WHERE foo_id = 1; SELECT * FROM parent; parent_id | foo_id | parent_1_text -----------+--------+--------------- 1 | 1 | parent text 1 2 | 2 | parent text 2 3 | 3 | parent text 3 (3 rows) SELECT * FROM child_1; parent_id | foo_id | parent_1_text | child_1_text -----------+--------+---------------+---------------- 1 | 1 | parent text 1 | child_1 text 1 2 | 2 | parent text 2 | child_1 text 2 (2 rows) ROLLBACK; Oops!! None of parent, child or foo should have any rows with foo_id = 1 in them. Here is a way to fix the above tables. To fix the gotcha, you must put foreign key constraints on each of the child tables, as they will not be automatically inherited as you might expect. ALTER TABLE child_1 ADD CONSTRAINT cascade_foo FOREIGN KEY (foo_id) REFERENCES foo(foo_id) ON DELETE CASCADE; ALTER TABLE child_2 ADD CONSTRAINT cascade_foo FOREIGN KEY (foo_id) REFERENCES foo(foo_id) ON DELETE CASCADE; That caveat out of the way, let's create two tables: A table cities and a table capitals. Naturally, capitals are also cities, so you want some way to show the capitals implicitly when you list all cities. If you're really clever you might invent some scheme like this: CREATE TABLE capitals ( name text, population real, altitude int, -- (in ft) state char(2) ); CREATE TABLE non_capitals ( name text, population real, altitude int -- (in ft) ); CREATE VIEW cities AS SELECT name, population, altitude FROM capitals UNION SELECT name, population, altitude FROM non_capitals; This works OK as far as querying goes, but it gets ugly when you need to update several rows, for one thing. A better solution is this: CREATE TABLE cities ( name text, population real, altitude int -- (in ft) ); CREATE TABLE capitals ( state char(2) ) INHERITS (cities); In this case, a row of capitals inherits all columns (name, population, and altitude) from its parent, cities. The type of the column name is text, a native PostgreSQL type for variable length character strings. State capitals have an extra column, state, that shows their state. In PostgreSQL, a table can inherit from zero or more other tables. For example, the following query finds the names of all cities, including state capitals, that are located at an altitude over 500 ft.: SELECT name, altitude FROM cities WHERE altitude > 500; which returns: name | altitude -----------+---------- Las Vegas | 2174 Mariposa | 1953 Madison | 845 (3 rows) On the other hand, the following query finds all the cities that are not state capitals and are situated at an altitude of 500 ft. or higher: SELECT name, altitude FROM ONLY cities WHERE altitude > 500; name | altitude -----------+---------- Las Vegas | 2174 Mariposa | 1953 (2 rows) Here the ONLY before cities indicates that the query should be run over only the cities table, and not tables below cities in the inheritance hierarchy. Many of the commands that we have already discussed -- SELECT, UPDATE, and DELETE -- support this ONLY notation. Conclusion PostgreSQL has many features not touched upon in this tutorial introduction, which has been oriented toward newer users of SQL. These features are discussed in more detail in the remainder of this book. If you feel you need more introductory material, please visit the PostgreSQL web site for links to more resources.