gh-134004: Added the reorganize() methods to dbm.sqlite3, dbm.dumb and shelve (GH-134028)

They are similar to the method of the same name in dbm.gnu.
Andrea-Oliveri 2025-06-01 14:30:04 +02:00 committed by GitHub
parent b595237166
commit f806463e16
9 changed files with 172 additions and 6 deletions

View File

@ -15,10 +15,16 @@
* :mod:`dbm.ndbm`
If none of these modules are installed, the
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
the Oracle Berkeley DB.
.. note::
None of the underlying modules will automatically shrink the disk space used by
the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
provide a :meth:`!reorganize` method that can be used for this purpose.
.. exception:: error
A tuple containing the exceptions that can be raised by each of the supported
@ -186,6 +192,17 @@ or any other SQLite browser, including the SQLite CLI.
The Unix file access mode of the file (default: octal ``0o666``),
used only when the database has to be created.
.. method:: sqlite3.reorganize()
If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will be kept and reused as new (key, value) pairs are added.
.. note::
While reorganizing, as much as two times the size of the original database is required
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
.. versionadded:: next
:mod:`dbm.gnu` --- GNU database manager
---------------------------------------
@ -284,6 +301,10 @@ functionality like crash tolerance.
reorganization; otherwise, deleted file space will be kept and reused as new
(key, value) pairs are added.
.. note::
While reorganizing, as much as the size of the original database is required
in free disk space. However, be aware that this factor changes for each :mod:`dbm` submodule.
.. method:: gdbm.sync()
When the database has been opened in fast mode, this method forces any
@ -438,6 +459,11 @@ The :mod:`!dbm.dumb` module defines the following:
with a sufficiently large/complex entry due to stack depth limitations in
Python's AST compiler.
.. warning::
:mod:`dbm.dumb` does not support concurrent read/write access. (Multiple
simultaneous read accesses are safe.) When a program has the database open
for writing, no other program should have it open for reading or writing.
.. versionchanged:: 3.5
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.
@ -460,3 +486,15 @@ The :mod:`!dbm.dumb` module defines the following:
.. method:: dumbdbm.close()
Close the database.
.. method:: dumbdbm.reorganize()
If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will not be reused.
.. note::
While reorganizing, no additional free disk space is required. However, be aware
that this factor changes for each :mod:`dbm` submodule.
.. versionadded:: next
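As a rough illustration of the behaviour documented above, the following sketch fills a :mod:`dbm.dumb` database, deletes half of its entries and calls ``reorganize()`` when the method exists. The helper name ``shrink_after_deletes``, the key names and the value sizes are made up for this example. The ``hasattr`` check is there because the method is only present on Python versions that include this change.

```python
import dbm.dumb
import os
import tempfile


def shrink_after_deletes(path):
    """Return (size_before, size_after) of the .dat file, in bytes.

    Hypothetical helper: creates a fresh database at *path*, deletes half
    of its entries, and reorganizes it if the method is available.
    """
    with dbm.dumb.open(path, "n") as db:
        for i in range(100):
            db[f"key{i}".encode()] = b"x" * 4096
    size_before = os.path.getsize(path + ".dat")

    with dbm.dumb.open(path, "c") as db:
        for i in range(50):
            del db[f"key{i}".encode()]
        # Older Python versions have no reorganize(); the file then keeps its size.
        if hasattr(db, "reorganize"):
            db.reorganize()
    size_after = os.path.getsize(path + ".dat")
    return size_before, size_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shrink_after_deletes(os.path.join(tmp, "demo")))
```

On a Python without ``reorganize()`` the two sizes should be equal, since :mod:`dbm.dumb` does not shrink its data file on deletion.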

View File

@ -75,8 +75,15 @@ Two additional methods are supported:
Write back all entries in the cache if the shelf was opened with *writeback*
set to :const:`True`. Also empty the cache and synchronize the persistent
dictionary on disk, if feasible. This is called automatically when
:meth:`reorganize` is called or the shelf is closed with :meth:`close`.
.. method:: Shelf.reorganize()
Calls :meth:`sync` and attempts to shrink space used on disk by removing empty
space resulting from deletions.
.. versionadded:: next
.. method:: Shelf.close()
@ -116,6 +123,11 @@ Restrictions
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
which can cause hard crashes when trying to read from the database.
* :meth:`Shelf.reorganize` may not be available for all database packages and
may temporarily increase resource usage (especially disk space) when called.
Additionally, it will never run automatically and instead needs to be called
explicitly.
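Because ``Shelf.reorganize`` never runs automatically and may be missing on older Python versions, calling code might guard the call. The sketch below is one possible way to do that; ``prune_shelf`` and its parameters are hypothetical names chosen for this example.

```python
import os
import shelve
import tempfile


def prune_shelf(path, keep):
    """Delete every key not in *keep*, then try to compact the shelf.

    Returns the sorted list of remaining keys.
    """
    with shelve.open(path) as shelf:
        for key in list(shelf.keys()):
            if key not in keep:
                del shelf[key]
        # Shelf.reorganize() is new; fall back to doing nothing without it.
        reorganize = getattr(shelf, "reorganize", None)
        if reorganize is not None:
            reorganize()
        return sorted(shelf.keys())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo_shelf")
        with shelve.open(demo_path) as shelf:
            shelf["keep"] = [1, 2, 3]
            shelf["drop"] = "x" * 10_000
        print(prune_shelf(demo_path, {"keep"}))
```

Whether any disk space is actually recovered depends on the underlying database package, as noted in the restriction above.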
.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')

View File

@ -89,6 +89,14 @@ New modules
Improved modules
================
dbm
---
* Added new :meth:`!reorganize` methods to :mod:`dbm.dumb` and :mod:`dbm.sqlite3`
which allow recovering unused free space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)
difflib
-------
@ -96,6 +104,15 @@ difflib
class, and migrated the output to the HTML5 standard.
(Contributed by Jiahao Li in :gh:`134580`.)
shelve
------
* Added new :meth:`!reorganize` method to :mod:`shelve` used to recover unused free
space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)
ssl
---

View File

@ -9,7 +9,7 @@ XXX TO DO:
- seems to contain a bug when updating...
- reclaim free space (currently, space once occupied by deleted or expanded
items is not reused except if .reorganize() is called)
- support concurrent access (currently, if two processes take turns making
updates, they can mess up the index)
@ -17,8 +17,6 @@ updates, they can mess up the index)
- support efficient access to large databases (currently, the whole index
is read when the database is opened, and some updates rewrite the whole index)
- support opening for read-only (flag = 'm')
"""
import ast as _ast
@ -289,6 +287,34 @@ class _Database(collections.abc.MutableMapping):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        if self._readonly:
            raise error('The database is opened for reading only')
        self._verify_open()
        # Ensure all changes are committed before reorganizing.
        self._commit()
        # Open file in r+ to allow changing in-place.
        with _io.open(self._datfile, 'rb+') as f:
            reorganize_pos = 0

            # Iterate over existing keys, sorted by starting byte.
            for key in sorted(self._index, key=lambda k: self._index[k][0]):
                pos, siz = self._index[key]
                f.seek(pos)
                val = f.read(siz)

                f.seek(reorganize_pos)
                f.write(val)
                self._index[key] = (reorganize_pos, siz)

                blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
                reorganize_pos += blocks_occupied * _BLOCKSIZE

            f.truncate(reorganize_pos)
        # Commit changes to index, which were not in-place.
        self._commit()
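The in-place compaction performed by ``reorganize()`` above can be sketched in isolation on any seekable binary file. This is a simplified illustration, not the module's code: the function name ``compact`` is hypothetical, and ``BLOCKSIZE = 512`` is an assumed value because the diff does not show the definition of ``_BLOCKSIZE``.

```python
import io

BLOCKSIZE = 512  # assumed block size for this sketch


def compact(f, index):
    """Slide live records toward the start of *f* and cut off the tail.

    *index* maps key -> (pos, size) and is updated in place.
    Returns the new end offset of the file.
    """
    write_pos = 0
    # Records are visited in ascending position order, so each one is
    # rewritten at or before its current offset and never clobbers a
    # record that has not been read yet.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, size = index[key]
        f.seek(pos)
        data = f.read(size)

        f.seek(write_pos)
        f.write(data)
        index[key] = (write_pos, size)

        # Each record occupies a whole number of blocks.
        blocks_occupied = (size + BLOCKSIZE - 1) // BLOCKSIZE
        write_pos += blocks_occupied * BLOCKSIZE

    f.truncate(write_pos)
    return write_pos


if __name__ == "__main__":
    # Three one-block records; the middle one is "deleted" (absent from the index).
    buf = io.BytesIO(b"A" * 512 + b"B" * 512 + b"C" * 512)
    idx = {"a": (0, 512), "c": (1024, 512)}
    compact(buf, idx)
    print(len(buf.getvalue()), idx)
```

The sketch relies on records being block-aligned and non-overlapping. With that assumption, the write offset can never run ahead of the read offset.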
def open(file, flag='c', mode=0o666):
    """Open the database file, filename, and return corresponding object.

View File

@ -15,6 +15,7 @@ LOOKUP_KEY = "SELECT value FROM Dict WHERE key = CAST(? AS BLOB)"
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
ITER_KEYS = "SELECT key FROM Dict"
REORGANIZE = "VACUUM"
class error(OSError):
@ -122,6 +123,9 @@ class _Database(MutableMapping):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        self._execute(REORGANIZE)
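Since ``REORGANIZE`` is simply SQLite's ``VACUUM`` statement, its effect can be observed with the :mod:`sqlite3` module directly. The sketch below is an illustration under assumptions: ``vacuum_demo`` is a hypothetical helper, and the table definition is only modelled on the ``Dict`` table named in the queries above.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(path):
    """Fill a table, delete its rows, then VACUUM.

    Returns the file size after each of the three steps.
    """
    con = sqlite3.connect(path)
    try:
        # Assumed schema for this sketch.
        con.execute(
            "CREATE TABLE Dict (key BLOB UNIQUE NOT NULL, value BLOB NOT NULL)"
        )
        with con:  # commits on success
            con.executemany(
                "INSERT INTO Dict VALUES (?, ?)",
                ((str(i).encode(), b"x" * 10_000) for i in range(200)),
            )
        size_full = os.path.getsize(path)

        with con:
            con.execute("DELETE FROM Dict")
        size_deleted = os.path.getsize(path)

        # VACUUM must run outside a transaction; the commits above ensure that.
        con.execute("VACUUM")
        size_vacuumed = os.path.getsize(path)
    finally:
        con.close()
    return size_full, size_deleted, size_vacuumed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(vacuum_demo(os.path.join(tmp, "demo.sqlite3")))
```

Deleting rows alone normally leaves the file size unchanged because freed pages are kept for reuse. The file only shrinks once ``VACUUM`` rebuilds the database.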
def open(filename, /, flag="r", mode=0o666):
    """Open a dbm.sqlite3 database and return the dbm object.

View File

@ -171,6 +171,11 @@ class Shelf(collections.abc.MutableMapping):
        if hasattr(self.dict, 'sync'):
            self.dict.sync()

    def reorganize(self):
        self.sync()
        if hasattr(self.dict, 'reorganize'):
            self.dict.reorganize()
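The delegation pattern used here (always sync first, then forward to the backend only if it offers the method) can be shown with stand-in classes. Everything in the sketch below is hypothetical: ``MiniShelf``, ``CompactingBackend`` and ``PlainBackend`` exist only for illustration and are not part of :mod:`shelve`.

```python
class CompactingBackend(dict):
    """Hypothetical backend that offers both sync() and reorganize()."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def sync(self):
        self.calls.append("sync")

    def reorganize(self):
        self.calls.append("reorganize")


class PlainBackend(dict):
    """Hypothetical backend that offers neither method."""


class MiniShelf:
    """Stripped-down stand-in that mirrors the delegation in the diff."""

    def __init__(self, backend):
        self.dict = backend

    def sync(self):
        if hasattr(self.dict, "sync"):
            self.dict.sync()

    def reorganize(self):
        # Flush pending writes first, then compact if the backend can.
        self.sync()
        if hasattr(self.dict, "reorganize"):
            self.dict.reorganize()


if __name__ == "__main__":
    backend = CompactingBackend()
    MiniShelf(backend).reorganize()
    print(backend.calls)
    MiniShelf(PlainBackend()).reorganize()  # silently does nothing
```

One consequence of this design is that a caller cannot tell from the return value whether any compaction took place.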
class BsdDbShelf(Shelf):
    """Shelf implementation using the "BSD" db interface.

View File

@ -135,6 +135,67 @@ class AnyDBMTestCase:
        assert(f[key] == b"Python:")
        f.close()

    def test_anydbm_readonly_reorganize(self):
        self.init_db()
        with dbm.open(_fname, 'r') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            self.assertRaises(dbm.error, lambda: d.reorganize())

    def test_anydbm_reorganize_not_changed_content(self):
        self.init_db()
        with dbm.open(_fname, 'c') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            keys_before = sorted(d.keys())
            values_before = [d[k] for k in keys_before]
            d.reorganize()
            keys_after = sorted(d.keys())
            values_after = [d[k] for k in keys_after]
            self.assertEqual(keys_before, keys_after)
            self.assertEqual(values_before, values_after)

    def test_anydbm_reorganize_decreased_size(self):

        def _calculate_db_size(db_path):
            if os.path.isfile(db_path):
                return os.path.getsize(db_path)
            total_size = 0
            for root, _, filenames in os.walk(db_path):
                for filename in filenames:
                    file_path = os.path.join(root, filename)
                    total_size += os.path.getsize(file_path)
            return total_size

        # This test requires relatively large databases to reliably show difference in size before and after reorganizing.
        with dbm.open(_fname, 'n') as f:
            # Early stopping.
            if not hasattr(f, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            for k in self._dict:
                f[k.encode('ascii')] = self._dict[k] * 100000
            db_keys = list(f.keys())

        # Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
        size_before = _calculate_db_size(os.path.dirname(_fname))

        # Delete some elements from the start of the database.
        keys_to_delete = db_keys[:len(db_keys) // 2]
        with dbm.open(_fname, 'c') as f:
            for k in keys_to_delete:
                del f[k]
            f.reorganize()

        # Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
        size_after = _calculate_db_size(os.path.dirname(_fname))

        self.assertLess(size_after, size_before)

    def test_open_with_bytes(self):
        dbm.open(os.fsencode(_fname), "c").close()

View File

@ -1365,6 +1365,7 @@ Milan Oberkirch
Pascal Oberndoerfer
Géry Ogam
Seonkyo Ok
Andrea Oliveri
Jeffrey Ollie
Adam Olsen
Bryan Olson

View File

@ -0,0 +1,2 @@
:mod:`shelve` as well as underlying :mod:`!dbm.dumb` and :mod:`!dbm.sqlite3` now have :meth:`!reorganize` methods to
recover unused free space previously occupied by deleted entries.