This reverts the removal of the call in commit (272923a0). It turns out it
wasn't superfluous after all: without it, renegotiation fails if a client
certificate was used. The rest of the changes in that commit are still OK
and not reverted.
Per investigation of bug #12769 by Arne Scheffer, although this doesn't fix
the reported bug yet.
The client socket is always in non-blocking mode, and if we actually want
blocking behaviour, we emulate it by sleeping and retrying. But we have
retry loops at different layers for reads and writes, which was confusing.
To simplify, remove all the sleeping and retrying code from the lower
levels, from be_tls_read and secure_raw_read and secure_raw_write, and put
all the logic in secure_read() and secure_write().
At least in all modern versions of OpenSSL, it is enough to call
SSL_renegotiate() once, and then forget about it. Subsequent SSL_write()
and SSL_read() calls will finish the handshake.
The SSL_set_session_id_context() call is unnecessary too. We only have
one SSL context, and the SSL session was created with that to begin with.
We'd leak the ident_serv data structure if the second pg_getaddrinfo_all
(the one for the local address) failed. This is not of great consequence
because a failure return here just leads directly to backend exit(), but
if this function is going to try to clean up after itself at all, it should
not have such holes in the logic. Try to fix it in a future-proof way by
having all the failure exits go through the same cleanup path, rather than
"optimizing" some of them.
Per Coverity. Back-patch to 9.2, which is as far back as this patch
applies cleanly.
We used to handle authentication_timeout by setting
ImmediateInterruptOK to true during large parts of the authentication
phase of a new connection. While that happens to work acceptably in
practice, it's not particularly nice and has ugly corner cases.
Previous commits converted the FE/BE communication to use latches and
implemented support for interrupt handling during both
send/recv. Building on top of that work we can get rid of
ImmediateInterruptOK during authentication, by immediately treating
timeouts during authentication as a reason to die. As die interrupts
are handled immediately during client communication that provides a
sensibly quick reaction time to authentication timeout.
Additionally add a few CHECK_FOR_INTERRUPTS() to some more complex
authentication methods. More could be added, but this already should
provides a reasonable coverage.
While it this overall increases the maximum time till a timeout is
reacted to, it greatly reduces complexity and increases
reliability. That seems like a overall win. If the increase proves to
be noticeable we can deal with those cases by moving to nonblocking
network code and add interrupt checking there.
Reviewed-By: Heikki Linnakangas
Up to now it was impossible to terminate a backend that was trying to
send/recv data to/from the client when the socket's buffer was already
full/empty. While the send/recv calls itself might have gotten
interrupted by signals on some platforms, we just immediately retried.
That could lead to situations where a backend couldn't be terminated ,
after a client died without the connection being closed, because it
was blocked in send/recv.
The problem was far more likely to be hit when sending data than when
reading. That's because while reading a command from the client, and
during authentication, we processed interrupts immediately . That
primarily left COPY FROM STDIN as being problematic for recv.
Change things so that that we process 'die' events immediately when
the appropriate signal arrives. We can't sensibly react to query
cancels at that point, because we might loose sync with the client as
we could be in the middle of writing a message.
We don't interrupt writes if the write buffer isn't full, as indicated
by write() returning EWOULDBLOCK, as that would lead to fewer error
messages reaching clients.
Per discussion with Kyotaro HORIGUCHI and Heikki Linnakangas
Discussion: 20140927191243.GD5423@alap3.anarazel.de
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
This allows to introduce more elaborate handling of interrupts while
reading from a socket. Currently some interrupt handlers have to do
significant work from inside signal handlers, and it's very hard to
correctly write code to do so. Generic signal handler limitations,
combined with the fact that we can't safely jump out of a signal
handler while reading from the client have prohibited implementation
of features like timeouts for idle-in-transaction.
Additionally we use the latch code to wait in a couple places where we
previously only had waiting code on windows as other platforms just
busy looped.
This can increase the number of systemcalls happening during FE/BE
communication. Benchmarks so far indicate that the impact isn't very
high, and there's room for optimization in the latch code. The chance
of cleaning up the usage of latches gives us, seem to outweigh the
risk of small performance regressions.
This commit theoretically can't used without the next patch in the
series, as WaitLatchOrSocket is not defined to be fully signal
safe. As we already do that in some cases though, it seems better to
keep the commits separate, so they're easier to understand.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.
We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.
To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.
To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.
Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.
Security: CVE-2015-0244
strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.
A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about. So just use memcpy() in
these cases.
In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).
I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution). There are also a
few such calls in ecpg that could possibly do with more analysis.
AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior. These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.
This can happen if an error occurs in a standalone backend. This bug
was introduced by commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d.
Reported by Álvaro Herrera.
Transactions can now set their commit timestamp directly as they commit,
or an external transaction commit timestamp can be fed from an outside
system using the new function TransactionTreeSetCommitTsData(). This
data is crash-safe, and truncated at Xid freeze point, same as pg_clog.
This module is disabled by default because it causes a performance hit,
but can be enabled in postgresql.conf requiring only a server restart.
A new test in src/test/modules is included.
Catalog version bumped due to the new subdirectory within PGDATA and a
couple of new SQL functions.
Authors: Álvaro Herrera and Petr Jelínek
Reviewed to varying degrees by Michael Paquier, Andres Freund, Robert
Haas, Amit Kapila, Fujii Masao, Jaime Casanova, Simon Riggs, Steven
Singer, Peter Eisentraut
Code that check the flag no longer need #ifdef's, which is more convenient.
In particular, makes it easier to write extensions that depend on it.
In the passing, modify sslinfo's ssl_is_used function to check ssl_in_use
instead of the OpenSSL specific 'ssl' pointer. It doesn't make any
difference currently, as sslinfo is only compiled when built with OpenSSL,
but seems cleaner anyway.
Obviously, every translation unit should not be declaring this
separately. It needs to be PGDLLIMPORT as well, to avoid breaking
third-party code that uses any of the functions that the commit
mentioned above changed to macros.
A background worker can use pq_redirect_to_shm_mq() to direct protocol
that would normally be sent to the frontend to a shm_mq so that another
process may read them.
The receiving process may use pq_parse_errornotice() to parse an
ErrorResponse or NoticeResponse from the background worker and, if
it wishes, ThrowErrorData() to propagate the error (with or without
further modification).
Patch by me. Review by Andres Freund.
Move the functions within the file so that public interface functions come
first, followed by internal functions. Previously, be_tls_write was first,
then internal stuff, and finally the rest of the public interface, which
clearly didn't make much sense.
Per Andres Freund's complaint.
This refactoring is in preparation for adding support for other SSL
implementations, with no user-visible effects. There are now two #defines,
USE_OPENSSL which is defined when building with OpenSSL, and USE_SSL which
is defined when building with any SSL implementation. Currently, OpenSSL is
the only implementation so the two #defines go together, but USE_SSL is
supposed to be used for implementation-independent code.
The libpq SSL code is changed to use a custom BIO, which does all the raw
I/O, like we've been doing in the backend for a long time. That makes it
possible to use MSG_NOSIGNAL to block SIGPIPE when using SSL, which avoids
a couple of syscall for each send(). Probably doesn't make much performance
difference in practice - the SSL encryption is expensive enough to mask the
effect - but it was a natural result of this refactoring.
Based on a patch by Martijn van Oosterhout from 2006. Briefly reviewed by
Alvaro Herrera, Andreas Karlsson, Jeff Janes.
The previous naming broke the query that libpq's lo_initialize() uses
to collect the OIDs of the server-side functions it requires, because
that query effectively assumes that there is only one function named
lo_create in the pg_catalog schema (and likewise only one lo_open, etc).
While we should certainly make libpq more robust about this, the naive
query will remain in use in the field for the foreseeable future, so it
seems the only workable choice is to use a different name for the new
function. lo_from_bytea() won a small straw poll.
Back-patch into 9.4 where the new function was introduced.
According to the Single Unix Spec and assorted man pages, you're supposed
to use the constants named AF_xxx when setting ai_family for a getaddrinfo
call. In a few places we were using PF_xxx instead. Use of PF_xxx
appears to be an ancient BSD convention that was not adopted by later
standardization. On BSD and most later Unixen, it doesn't matter much
because those constants have equivalent values anyway; but nonetheless
this code is not per spec.
In the same vein, replace PF_INET by AF_INET in one socket() call, which
wasn't even consistent with the other socket() call in the same function
let alone the remainder of our code.
Per investigation of a Cygwin trouble report from Marco Atzeri. It's
probably a long shot that this will fix his issue, but it's wrong in
any case.
Previously, in some places, socket creation errors were checked for
negative values, which is not true for Windows because sockets are
unsigned. This masked socket creation errors on Windows.
Backpatch through 9.0. 8.4 doesn't have the infrastructure to fix this.
The code for matching clients to pg_hba.conf lines that specify host names
(instead of IP address ranges) failed to complain if reverse DNS lookup
failed; instead it silently didn't match, so that you might end up getting
a surprising "no pg_hba.conf entry for ..." error, as seen in bug #9518
from Mike Blackwell. Since we don't want to make this a fatal error in
situations where pg_hba.conf contains a mixture of host names and IP
addresses (clients matching one of the numeric entries should not have to
have rDNS data), remember the lookup failure and mention it as DETAIL if
we get to "no pg_hba.conf entry". Apply the same approach to forward-DNS
lookup failures, too, rather than treating them as immediate hard errors.
Along the way, fix a couple of bugs that prevented us from detecting an
rDNS lookup error reliably, and make sure that we make only one rDNS lookup
attempt; formerly, if the lookup attempt failed, the code would try again
for each host name entry in pg_hba.conf. Since more or less the whole
point of this design is to ensure there's only one lookup attempt not one
per entry, the latter point represents a performance bug that seems
sufficient justification for back-patching.
Also, adjust src/port/getaddrinfo.c so that it plays as well as it can
with this code. Which is not all that well, since it does not have actual
support for rDNS lookup, but at least it should return the expected (and
required by spec) error codes so that the main code correctly perceives the
lack of functionality as a lookup failure. It's unlikely that PG is still
being used in production on any machines that require our getaddrinfo.c,
so I'm not excited about working harder than this.
To keep the code in the various branches similar, this includes
back-patching commits c424d0d1052cb4053c8712ac44123f9b9a9aa3f2 and
1997f34db4687e671690ed054c8f30bb501b1168 into 9.2 and earlier.
Back-patch to 9.1 where the facility for hostnames in pg_hba.conf was
introduced.
Commit 613c6d26bd42dd8c2dd0664315be9551475b8864 sloppily replaced a
lookup of the UID obtained from getpeereid() with a lookup of the
server's own user name, thus totally destroying peer authentication.
Revert. Per report from Christoph Berg.
In passing, make sure get_user_name() zeroes *errstr on success on
Windows as well as non-Windows. I don't think any callers actually
depend on this ATM, but we should be consistent across platforms.
krb_srvname is actually not available anymore as a parameter server-side, since
with gssapi we accept all principals in our keytab. It's still used in libpq for
client side specification.
In passing remove declaration of krb_server_hostname, where all the functionality
was already removed.
Noted by Stephen Frost, though a different solution than his suggestion
Additional non-security issues/improvements spotted by Coverity.
In backend/libpq, no sense trying to protect against port->hba being
NULL after we've already dereferenced it in the switch() statement.
Prevent against possible overflow due to 32bit arithmitic in
basebackup throttling (not yet released, so no security concern).
Remove nonsensical check of array pointer against NULL in procarray.c,
looks to be a holdover from 9.1 and earlier when there were pointers
being used but now it's just an array.
Remove pointer check-against-NULL in tsearch/spell.c as we had already
dereferenced it above (in the strcmp()).
Remove dead code from adt/orderedsetaggs.c, isnull is checked
immediately after each tuplesort_getdatum() call and if true we return,
so no point checking it again down at the bottom.
Remove recently added minor error-condition memory leak in pg_regress.
A number of issues were identified by the Coverity scanner and are
addressed in this patch. None of these appear to be security issues
and many are mostly cosmetic changes.
Short comments for each of the changes follows.
Correct the semi-colon placement in be-secure.c regarding SSL retries.
Remove a useless comparison-to-NULL in proc.c (value is dereferenced
prior to this check and therefore can't be NULL).
Add checking of chmod() return values to initdb.
Fix a couple minor memory leaks in initdb.
Fix memory leak in pg_ctl- involves free'ing the config file contents.
Use an int to capture fgetc() return instead of an enum in pg_dump.
Fix minor memory leaks in pg_dump.
(note minor change to convertOperatorReference()'s API)
Check fclose()/remove() return codes in psql.
Check fstat(), find_my_exec() return codes in psql.
Various ECPG memory leak fixes.
Check find_my_exec() return in ECPG.
Explicitly ignore pqFlush return in libpq error-path.
Change PQfnumber() to avoid doing an strdup() when no changes required.
Remove a few useless check-against-NULL's (value deref'd beforehand).
Check rmtree(), malloc() results in pg_regress.
Also check get_alternative_expectfile() return in pg_regress.
Commit 820f08cabdcbb8998050c3d4873e9619d6d8cba4 claimed to make the server
and libpq handle SSL protocol versions identically, but actually the server
was still accepting SSL v3 protocol while libpq wasn't. Per discussion,
SSL v3 is obsolete, and there's no good reason to continue to accept it.
So make the code really equivalent on both sides. The behavior now is
that we use the highest mutually-supported TLS protocol version.
Marko Kreen, some comment-smithing by me
It's worth distinguishing these cases from run-of-the-mill wrong-password
problems, since users have been known to waste lots of time pursuing the
wrong theory about what's failing. Now, our longstanding policy about how
to report authentication failures is that we don't really want to tell the
*client* such things, since that might be giving information to a bad guy.
But there's nothing wrong with reporting the details to the postmaster log,
and indeed the comments in this area of the code contemplate that
interesting details should be so reported. We just weren't handling these
particular interesting cases usefully.
To fix, add infrastructure allowing subroutines of ClientAuthentication()
to return a string to be added to the errdetail_log field of the main
authentication-failed error report. We might later want to use this to
report other subcases of authentication failure the same way, but for the
moment I just dealt with password cases.
Per discussion of a patch from Josh Drake, though this is not what
he proposed.
krb5 has been deprecated since 8.3, and the recommended way to do
Kerberos authentication is using the GSSAPI authentication method
(which is still fully supported).
libpq retains the ability to identify krb5 authentication, but only
gives an error message about it being unsupported. Since all authentication
is initiated from the backend, there is no need to keep it at all
in the backend.
Previously, lookups of non-existent user names could return "Success";
it will now return "User does not exist" by resetting errno. This also
centralizes the user name lookup code in libpgport.
Report and analysis by Nicolas Marchildon; patch by me
This sets up ECDH key exchange, when compiling against OpenSSL that
supports EC. Then the ECDHE-RSA and ECDHE-ECDSA cipher suites can be
used for SSL connections. The latter one means that EC keys are now
usable.
The reason for EC key exchange is that it's faster than DHE and it
allows to go to higher security levels where RSA will be horribly slow.
There is also new GUC option ssl_ecdh_curve that specifies the curve
name used for ECDH. It defaults to "prime256v1", which is the most
common curve in use in HTTPS.
From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
By default, OpenSSL (and SSL/TLS in general) lets the client cipher
order take priority. This is OK for browsers where the ciphers were
tuned, but few PostgreSQL client libraries make the cipher order
configurable. So it makes sense to have the cipher order in
postgresql.conf take priority over client defaults.
This patch adds the setting "ssl_prefer_server_ciphers" that can be
turned on so that server cipher order is preferred. Per discussion,
this now defaults to on.
From: Marko Kreen <markokr@gmail.com>
Reviewed-by: Adrian Klaver <adrian.klaver@gmail.com>
Current OpenSSL code includes a BIO_clear_retry_flags() step in the
sock_write() function. Either we failed to copy the code correctly, or
they added this since we copied it. In any case, lack of the clear step
appears to be the cause of the server lockup after connection loss reported
in bug #8647 from Valentine Gogichashvili. Assume that this is correct
coding for all OpenSSL versions, and hence back-patch to all supported
branches.
Diagnosis and patch by Alexander Kukushkin.
These functions must be careful that they return the intended value of
errno to their callers. There were several scenarios where this might
not happen:
1. The recent SSL renegotiation patch added a hunk of code that would
execute after setting errno. In the first place, it's doubtful that we
should consider renegotiation to be successfully completed after a failure,
and in the second, there's no real guarantee that the called OpenSSL
routines wouldn't clobber errno. Fix by not executing that hunk except
during success exit.
2. errno was left in an unknown state in case of an unrecognized return
code from SSL_get_error(). While this is a "can't happen" case, it seems
like a good idea to be sure we know what would happen, so reset errno to
ECONNRESET in such cases. (The corresponding code in libpq's fe-secure.c
already did this.)
3. There was an (undocumented) assumption that client_read_ended() wouldn't
change errno. While true in the current state of the code, this seems less
than future-proof. Add explicit saving/restoring of errno to make sure
that changes in the called functions won't break things.
I see no need to back-patch, since #1 is new code and the other two issues
are mostly hypothetical.
Per discussion with Amit Kapila.
With these, one need no longer manipulate large object descriptors and
extract numeric constants from header files in order to read and write
large object contents from SQL.
Pavel Stehule, reviewed by Rushabh Lathia.
asprintf(), aside from not being particularly portable, has a fundamentally
badly-designed API; the psprintf() function that was added in passing in
the previous patch has a much better API choice. Moreover, the NetBSD
implementation that was borrowed for the previous patch doesn't work with
non-C99-compliant vsnprintf, which is something we still have to cope with
on some platforms; and it depends on va_copy which isn't all that portable
either. Get rid of that code in favor of an implementation similar to what
we've used for many years in stringinfo.c. Also, move it into libpgcommon
since it's not really libpgport material.
I think this patch will be enough to turn the buildfarm green again, but
there's still cosmetic work left to do, namely get rid of pg_asprintf()
in favor of using psprintf(). That will come in a followon patch.
Add asprintf(), pg_asprintf(), and psprintf() to simplify string
allocation and composition. Replacement implementations taken from
NetBSD.
Reviewed-by: Álvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Asif Naeem <anaeem.it@gmail.com>
The existing renegotiation code was home for several bugs: it might
erroneously report that renegotiation had failed; it might try to
execute another renegotiation while the previous one was pending; it
failed to terminate the connection if the renegotiation never actually
took place; if a renegotiation was started, the byte count was reset,
even if the renegotiation wasn't completed (this isn't good from a
security perspective because it means continuing to use a session that
should be considered compromised due to volume of data transferred.)
The new code is structured to avoid these pitfalls: renegotiation is
started a little earlier than the limit has expired; the handshake
sequence is retried until it has actually returned successfully, and no
more than that, but if it fails too many times, the connection is
closed. The byte count is reset only when the renegotiation has
succeeded, and if the renegotiation byte count limit expires, the
connection is terminated.
This commit only touches the master branch, because some of the changes
are controversial. If everything goes well, a back-patch might be
considered.
Per discussion started by message
20130710212017.GB4941@eldon.alvh.no-ip.org