MDEV-27025 allows to insert records before the record on which DELETE is
locked, as a result the DELETE misses those records, what causes serious ACID
violation.
Revert MDEV-27025, MDEV-27550. The test which shows the scenario of ACID
violation is added.
During an increase in resize, the new curr_size got a value
less than old_size.
As n_chunks_new and n_chunks have a strong correlation to the
resizing operation in progress, we can use them and remove the
need for old_size.
For convienece the n_chunks_new < n_chunks is now the is_shrinking
function.
The volatile compiler optimization on n_chunks{,_new} is removed
as real mutex uses are needed.
Other n_chunks_new/n_chunks methods:
n_chunks_new and n_chunks almost always read and altered under
the pool mutex.
Exceptions are:
* i_s_innodb_buffer_page_fill,
* buf_pool_t::is_uncompressed (via is_blocked_field)
These need reexamining for the need of a mutex, however comments
indicates this already.
get_n_pages has uses in buffer pool load, recover log memory
exhaustion estimates and innodb status so take the minimum number
of chunks for safety.
The buf_pool_t::running_out function also uses curr_size/old_size.
We replace this hot function calculation with just n_chunks_new.
This is the new size of the chunks before the resizing occurs.
If we are resizing down, we've already got the case we had previously
(as the minimum). If we are resizing upwards, we are taking an
optimistic view that there will be buffer chunks available for locks.
As this memory allocation is occurring immediately next the resizing
function it seems likely.
Compiler hint UNIV_UNLIKELY removed to leave it to the branch
predictor to make an informed decision.
Added test case of a smaller size than the Marko/Roel original
in JIRA reducing the size to 256M. SEGV hits roughly 1/10 times
but its better than a 21G memory size.
Reviewer: Marko
The virtual member function handler::reset_auto_increment(ulonglong)
is only ever invoked by the default implementation of the virtual
member function handler::truncate().
Because ha_innobase::truncate() overrides handler::truncate() without
ever invoking handler::truncate(), some InnoDB member functions are
never called.
ha_innobase::innobase_reset_autoinc(), ha_innobase::reset_auto_increment():
Removed (unreachable code).
ha_innobase::delete_all_rows(): Removed. The default implementation
handler::delete_all_rows() works just as fine.
- InnoDB FTS DDL decrements the FTS_DOC_ID when there is a
deleted marked record involved. FTS_DOC_ID must never be
reused. The purpose of FTS_DOC_ID is to be a unique row
identifier that will be changed whenever a fulltext indexed
column is updated.
btr_cur_optimistic_insert(): Disregard DEBUG_DBUG injection to
invoke btr_page_reorganize() if the page (and the table) is empty.
Otherwise, an assertion would fail in btr_page_reorganize_low()
because PAGE_MAX_TRX_ID is 0 in an empty secondary index leaf page.
create_table_info_t::innobase_table_flags(): Ignore page_compressed
and page_compression_level on TEMPORARY tables.
ha_innobase::truncate(): Add a debug assertion that create() must succeed
on temporary tables.
udf_handler::fix_fields(): Execute an assignment outside "if"
so that GCC 12 will not issue a bogus-looking warning.
Also, deduplicate some error handling code.
- Server incorrectly downgrading the MDL after prepare phase when
table is empty. mdl_exclusive_after_prepare is being set in
prepare phase only. But mdl_exclusive_after_prepare condition was
misplaced and checked before prepare phase by
commit d270525dfde86bcb92a2327234a0954083e14a94 and it is now
changed to check after prepare phase.
- main.innodb_mysql_sync test case was changed to avoid locking
optimization when table is empty.
The test innodb.log_file_size would occasionally crash in
a debug assertion failure in srv_prepare_to_delete_redo_log_file().
In the core dump, the assertion would hold.
buf_pool_t::any_io_pending(): Avoid dirty reads by acquiring
buf_pool.mutex. This function is only being invoked in debug builds,
so we do not care about any performance impact.
The follwoing could happen if InnoDB did some background work while
test was running:
@@ -5,6 +5,8 @@
SELECT LOCK_MODE, LOCK_TYPE, TABLE_SCHEMA, TABLE_NAME FROM information_schema.metadata_lock_info;
LOCK_MODE LOCK_TYPE TABLE_SCHEMA TABLE_NAME
MDL_SHARED_HIGH_PRIO Table metadata lock test t1
+MDL_SHARED Table metadata lock mysql innodb_table_stats
+MDL_SHARED Table metadata lock mysql innodb_index_stat
In commit c7d0448797a2013079c3e0b7ca97f7bdc1138696 (MDEV-15132)
MariaDB Server 10.3 stopped writing the latest transaction identifier
to the TRX_SYS page. Instead, the transaction identifier will be
recovered from undo log pages.
Unfortunately, before commit 3926673ce7149aa223103126b6aeac819b10fab5
and mysql/mysql-server@dc29792ff2
(MySQL 5.1.48 or MariaDB 5.1.48) InnoDB did not always initialize all
data fields, but some garbage could be left behind in unused parts
of data pages.
In undo log pages that are essentially free, but added to a list for
reuse (TRX_UNDO_CACHED) the TRX_UNDO_TRX_NO fields could contain garbage,
instead of 0. As long as such undo pages are being reused and never
marked completely free, the garbage contents may remain forever.
In fact, the function trx_undo_header_create() and the record
MLOG_UNDO_HDR_CREATE will only initialize TRX_UNDO_TRX_ID, but leave
TRX_UNDO_TRX_NO uninitialized.
trx_undo_mem_create_at_db_start(): Only read the TRX_UNDO_TRX_NO
fields of TRX_UNDO_CACHED pages if the TRX_UNDO_PAGE_TYPE is 0,
that is, the page was updated by MariaDB Server 10.3. Earlier versions
would always write the TRX_UNDO_PAGE_TYPE as 1 or 2.
trx_undo_header_create(): Zero out the TRX_UNDO_TRX_NO field.
Strictly speaking, this will change the semantics of the
MLOG_UNDO_HDR_CREATE record, but it should not do any harm to
overwrite a potentially garbage field with zeroes.
Note: This fix will only help future upgrades straight from
MariaDB Server 10.2 or MySQL 5.6 or earlier. If such an upgrade has
already been made, then an earlier server startup could have
fast-forwarded the transaction ID sequence to a large value.
If this large value cannot be represented in 48 bits (the size of
the DB_TRX_ID column in clustered index records), then various
strange things can happen.
DEBUG_SYNC signals can get lost in certain tests due to later
DEBUG_SYNC commands overwriting them. This patch addresses
these issues in three tests: main.query_cache_debug,
main.partition_debug_sync, and
rpl.rpl_dump_request_retry_warning.
Additionally, main.partition_debug_sync needed changes to the
result file (the others did not). The synchronization happened
between two commands, one based on ALTER, the other on DROP.
A new thread/connection was needed to synchronize the DEBUG_SYNC
actions between these commands, thereby changing the result file.
Additional comments were added for clarification.
Reviewed By:
============
Andrei Elkin <andrei.elkin@mariadb.com>
- Make innodb_ft_cache_size & innodb_ft_total_cache_size are dynamic
variable and increase the maximum value of innodb_ft_cache_size to
512MB for 32-bit system and 1 TB for 64-bit system and set
innodb_ft_total_cache_size maximum value to 1 TB for 64-bit system.
- Print warning if the fts cache exceeds the innodb_ft_cache_size
and also unlock the cache if fts cache memory reduces less than
innodb_ft_cache_size.
- In 10.6, trx_rseg_t mutex was ported to use latch. As part of this porting
profiling of the patch was removed. This patch reenables it given that
the said latch continues to occupy the top-slots in the contention list.