* Fix for a potential bug: when the SQL thread stops, set rli->inside_transaction to 0. This is needed if the user later restarts replication from a completely different place where there are only autocommit statements.
* Detect the case where the master died while flushing the binlog cache to the binlog and stop with error. Cannot add a testcase for this in 4.0 (I tested it manually) as the slave always runs with --skip-innodb.

sql/log_event.cc:
  Detect the case where the master died while flushing the binlog cache to the binlog: in that case, we have a BEGIN with no COMMIT/ROLLBACK in the relay log; we detect this with rli->inside_transaction in Rotate_log_event::exec_event() (which is the only right place to detect this, see comments). When we see it, we stop with error.
  In 4.1, I had put code in Start_log_event::exec_event(); I'll remove it next time I push in the 4.1 tree.

sql/slave.cc:
  * Use slave_print_error instead of sql_print_error, to put the info in SHOW SLAVE STATUS too.
  * Fix for a potential bug: when the SQL thread stops, set rli->inside_transaction to 0. This is not needed if replication later restarts from the same position; but this is needed if the user restarts replication from a completely different place where there are only autocommit statements (in that case, if we didn't set to 0, the position would never increment in SHOW SLAVE STATUS, even if queries are processed well).
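The two behaviours described above can be illustrated with a small toy model. This is only a sketch and not the MySQL source: `RelayLogInfo`, `Event`, `apply_event`, `stop_sql_thread` and `restart_at` are hypothetical names, and the position bookkeeping (a `pending` byte count that is added to the position at COMMIT) is a simplifying assumption about how the slave advances its position inside a transaction.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: all names here are hypothetical, not MySQL identifiers.
enum class EventType { Begin, Commit, Query, Rotate };

struct Event {
  EventType type;
  std::size_t len;  // size of the event in the relay log, in bytes
};

struct RelayLogInfo {
  std::size_t relay_log_pos = 0;    // position shown to the user
  std::size_t pending = 0;          // bytes read inside an open transaction
  bool inside_transaction = false;  // set by BEGIN, cleared by COMMIT
};

// Applies one event. Returns false when the applier must stop with an error.
bool apply_event(RelayLogInfo& rli, const Event& ev) {
  switch (ev.type) {
  case EventType::Begin:
    rli.inside_transaction = true;
    rli.pending += ev.len;
    return true;
  case EventType::Commit:
    rli.inside_transaction = false;
    rli.relay_log_pos += rli.pending + ev.len;
    rli.pending = 0;
    return true;
  case EventType::Query:
    if (rli.inside_transaction)
      rli.pending += ev.len;  // the position only moves at COMMIT
    else
      rli.relay_log_pos += ev.len;
    return true;
  case EventType::Rotate:
    if (rli.inside_transaction)
      return false;  // BEGIN with no COMMIT/ROLLBACK: stop with error
    rli.relay_log_pos += ev.len;
    return true;
  }
  return false;
}

// Applies events in order; stops at the first one that fails.
bool apply_all(RelayLogInfo& rli, const std::vector<Event>& events) {
  for (const Event& ev : events)
    if (!apply_event(rli, ev))
      return false;
  return true;
}

// Models the SQL thread stopping. reset_flag == true mirrors the fix.
void stop_sql_thread(RelayLogInfo& rli, bool reset_flag) {
  if (reset_flag)
    rli.inside_transaction = false;
}

// Models restarting from a completely different place (as with CHANGE MASTER).
void restart_at(RelayLogInfo& rli, std::size_t pos) {
  rli.relay_log_pos = pos;
  rli.pending = 0;
}
```

In this model, a Rotate event that arrives after a BEGIN with no COMMIT makes `apply_event` fail, and a stale `inside_transaction` flag left over from before `stop_sql_thread` keeps `relay_log_pos` frozen while autocommit statements are applied after `restart_at`.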
2003-08-23 16:53:04 +02:00 · 2003-08-23 16:53:04 +02:00 · 6e10224d71
commit 6e10224d71
parent e3563c7911
2 changed files with 43 additions and 8 deletions
--- a/sql/log_event.cc
+++ b/sql/log_event.cc
@@ -2068,9 +2068,6 @@ Fatal error running LOAD DATA INFILE on table '%s'. Default database: '%s'",

  TODO
    - Remove all active user locks
-    - If we have an active transaction at this point, the master died
-      in the middle while writing the transaction to the binary log.
-      In this case we should stop the slave.
 */

 int Start_log_event::exec_event(struct st_relay_log_info* rli)
@@ -2098,8 +2095,10 @@ int Start_log_event::exec_event(struct st_relay_log_info* rli)
    break;
 case BINLOG_FORMAT_323_GEQ_57 : 
    /* Can distinguish, based on the value of 'created' */
-    if (created) /* this was generated at master startup*/
-      close_temporary_tables(thd);
+    if (!created) 
+      break;
+    /* otherwise this was generated at master startup*/  
+    close_temporary_tables(thd);
    break;
  default :
    /* this case is impossible */
@@ -2156,10 +2155,28 @@ int Stop_log_event::exec_event(struct st_relay_log_info* rli)
    We can't rotate the slave as this will cause infinitive rotations
    in a A -> B -> A setup.

+  NOTES
+    As a transaction NEVER spans on 2 or more binlogs:
+    if we have an active transaction at this point, the master died while
+    writing the transaction to the binary log, i.e. while flushing the binlog
+    cache to the binlog. As the write was started, the transaction had been
+    committed on the master, so we lack of information to replay this
+    transaction on the slave; all we can do is stop with error.
+    If we didn't detect it, then positions would start to become garbage (as we
+    are incrementing rli->relay_log_pos whereas we are in a transaction: the new
+    rli->relay_log_pos will be
+    relay_log_pos of the BEGIN + size of the Rotate event = garbage.
+
+    Since MySQL 4.0.14, the master ALWAYS sends a Rotate event when it starts
+    sending the next binlog, so we are sure to receive a Rotate event just
+    after the end of the "dead master"'s binlog; so this exec_event() is the
+    right place to catch the problem. If we would wait until
+    Start_log_event::exec_event() it would be too late, rli->relay_log_pos would
+    already be garbage.
+
  RETURN VALUES
    0	ok
- */
-  
+*/

 int Rotate_log_event::exec_event(struct st_relay_log_info* rli)
 {
@@ -2167,6 +2184,18 @@ int Rotate_log_event::exec_event(struct st_relay_log_info* rli)
  DBUG_ENTER("Rotate_log_event::exec_event");

  pthread_mutex_lock(&rli->data_lock);
+
+  if (rli->inside_transaction)
+  {
+    slave_print_error(rli, 0,
+                      "there is an unfinished transaction in the relay log \
+(could find neither COMMIT nor ROLLBACK in the relay log); it could be that \
+the master died while writing the transaction to its binary log. Now the slave \
+is rolling back the transaction.");
+    pthread_mutex_unlock(&rli->data_lock);
+    DBUG_RETURN(1);
+  }
+
  memcpy(log_name, new_log_ident, ident_len+1);
  rli->master_log_pos = pos;
  rli->relay_log_pos += get_event_len();
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -2260,7 +2260,7 @@ static int exec_relay_log_event(THD* thd, RELAY_LOG_INFO* rli)
  }
  else
  {
-    sql_print_error("\
+    slave_print_error(rli, 0, "\
 Could not parse relay log event entry. The possible reasons are: the master's \
 binary log is corrupted (you can check this by running 'mysqlbinlog' on the \
 binary log), the slave's relay log is corrupted (you can check this by running \
@@ -2695,6 +2695,12 @@ the slave SQL thread with \"SLAVE START\". We stopped at log \
  DBUG_ASSERT(rli->slave_running == 1); // tracking buffer overrun
  /* When master_pos_wait() wakes up it will check this and terminate */
  rli->slave_running= 0; 
+  /* 
+     Going out of the transaction. Necessary to mark it, in case the user
+     restarts replication from a non-transactional statement (with CHANGE
+     MASTER).
+  */
+  rli->inside_transaction= 0;
  /* Wake up master_pos_wait() */
  pthread_mutex_unlock(&rli->data_lock);
  DBUG_PRINT("info",("Signaling possibly waiting master_pos_wait() functions"));