1739 Commits

Author SHA1 Message Date
Petr Viktorin
e413e26719
gh-134891: Add PyUnstable_Unicode_GET_CACHED_HASH (GH-134892) 2025-06-06 15:51:00 +02:00
Victor Stinner
f49a07b531
gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
Replace most PyUnicodeWriter_WriteUTF8() calls with
PyUnicodeWriter_WriteASCII().

Unrelated change to please the linter: remove an unused
import in test_ctypes.

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-05-29 14:54:30 +00:00
Victor Stinner
fe9f6e829a
gh-133968: Add fast path to PyUnicodeWriter_WriteStr() (#133969)
Don't call PyObject_Str() if the input type is str.
2025-05-13 15:31:41 +02:00
Serhiy Storchaka
9f69a58623
gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler (GH-129648)
If the error handler is used, a new bytes object is created to set as
the object attribute of UnicodeDecodeError, and that bytes object then
replaces the original data. A pointer to the decoded data will became invalid
after destroying that temporary bytes object. So we need other way to return
the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have such issue, because it does not
use the error handlers registry, but it should be changed for compatibility
with _PyUnicode_DecodeUnicodeEscapeInternal().
2025-05-12 20:42:23 +03:00
Stan Ulbrych
4fd1095280
gh-133610: Remove PyUnicode_AsDecoded/Encoded functions (#133612) 2025-05-09 17:31:24 +02:00
Petr Viktorin
987e45e632
gh-128972: Add _Py_ALIGN_AS and revert PyASCIIObject memory layout. (GH-133085)
Add `_Py_ALIGN_AS` as per C API WG vote: https://github.com/capi-workgroup/decisions/issues/61
This patch only adds it to free-threaded builds; the `#ifdef Py_GIL_DISABLED`
can be removed in the future.

Use this to revert `PyASCIIObject` memory layout for non-free-threaded builds.
The long-term plan is to deprecate the entire struct; until that happens
it's better to keep it unchanged, as courtesy to people that rely on it despite
it not being stable ABI.
2025-05-02 18:30:40 +02:00
Lysandros Nikolaou
60202609a2
gh-132661: Implement PEP 750 (#132662)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Wingy <git@wingysam.xyz>
Co-authored-by: Koudai Aono <koxudaxi@gmail.com>
Co-authored-by: Dave Peck <davepeck@gmail.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Paul Everitt <pauleveritt@me.com>
Co-authored-by: sobolevn <mail@sobolevn.me>
2025-04-30 11:46:41 +02:00
Donghee Na
75cbb8d89e
gh-132070: Use _PyObject_IsUniquelyReferenced in unicodeobject (gh-133039)
---------

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
2025-04-29 09:48:53 +09:00
Stan Ulbrych
f6fb498c97
gh-132798: Schedule removal of PyUnicode_AsDecoded/Encoded functions for 3.15 (#132799)
Co-authored-by: Victor Stinner <vstinner@python.org>
2025-04-25 15:07:41 +02:00
Jon Crall
fc0ec29889
gh-103997: Automatically dedent the argument to "-c" (#103998)
Co-authored-by: sunmy2019 <59365878+sunmy2019@users.noreply.github.com>
Co-authored-by: Kirill Podoprigora <80244920+Eclips4@users.noreply.github.com>
Co-authored-by: Inada Naoki <songofacandy@gmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
2025-04-18 17:39:30 +09:00
Bénédikt Tran
edbf7fb129
gh-111178: remove redundant casts for functions with correct signatures (#131673) 2025-04-01 17:18:11 +02:00
Victor Stinner
22706843e0
gh-131238: Remove many includes from pycore_interp.h (#131472) 2025-03-19 17:46:24 +00:00
Mark Shannon
a1aeec61c4
GH-131238: Core header refactor (GH-131250)
* Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h

* Removes many cross-header dependencies
2025-03-17 09:19:04 +00:00
Mark Shannon
f30376c650
GH-127705: Fix _Py_RefcntAdd to handle objects becoming immortal (GH-131140) 2025-03-12 16:54:10 +00:00
Victor Stinner
ed8675c571
gh-111178: Fix function signatures of unicodeiter (#130684) 2025-03-04 10:33:09 +01:00
Sergey Miryanov
3a7f17c7e2
gh-130790: Remove references about unicode's readiness from comments (#130801) 2025-03-03 19:18:09 +00:00
Sergey B Kirpichev
f39a07be47
gh-87790: support thousands separators for formatting fractional part of floats (#125304)
```pycon
>>> f"{123_456.123_456:_._f}"  # Whole and fractional
'123_456.123_456'
>>> f"{123_456.123_456:_f}"    # Integer component only
'123_456.123456'
>>> f"{123_456.123_456:._f}"   # Fractional component only
'123456.123_456'
>>> f"{123_456.123_456:.4_f}"  # with precision
'123456.1_235'
```
2025-02-25 16:27:07 +01:00
Sam Gross
b9d2ee687c
gh-129701: Fix a data race in intern_common in the free threaded build (GH-130089)
* gh-129701: Fix a data race in `intern_common` in the free threaded build

* Use a mutex to avoid potentially returning a non-immortalized string,
  because immortalization happens after the insertion into the interned
  dict.

* Use `Py_DECREF()` calls instead of `Py_SET_REFCNT(s, Py_REFCNT(s) - 2)`
  for thread-safety. This code path isn't performance sensistive, so
  just use `Py_DECREF()` unconditionally for simplicity.
2025-02-17 14:15:40 +01:00
Stan Ulbrych
3402e133ef
gh-82045: Correct and deduplicate "isprintable" docs; add test. (GH-130118)
We had the definition of what makes a character "printable" documented in three places, giving two different definitions.
The definition in the comment on `_PyUnicode_IsPrintable` was inverted; correct that.

With that correction, the two definitions turn out to be equivalent -- but to confirm that, you have to go look up, or happen to know, that those are the only five "Other" categories and only three "Separator" categories in the Unicode character database.  That makes it hard for the reader to tell whether they really are the same, or if there's some subtle difference in the intended semantics.

Fix that by cutting the C API docs' and the C comment's copies of the subtle details, in favor of referring to the Python-level docs. That ensures it's explicit that these are all meant to agree, and also lets us concentrate improvements to the wording in one place.

Speaking of which, borrow some ideas from the C comment, along with other tweaks, to hopefully add a bit more clarity to that one newly-centralized copy in the docs.

Also add a thorough test that the implementation agrees with this definition.

Author:    Greg Price <gnprice@gmail.com>

Co-authored-by: Greg Price <gnprice@gmail.com>
2025-02-14 18:16:47 +01:00
Victor Stinner
0373926260
gh-129354: Use PyErr_FormatUnraisable() function (#129511)
Replace PyErr_WriteUnraisable() with PyErr_FormatUnraisable().
2025-01-31 13:16:08 +01:00
Victor Stinner
a810cb89f1
gh-89188: Implement PyUnicode_KIND() as a function (#129412)
Implement PyUnicode_KIND() and PyUnicode_DATA() as function, in
addition to the macros with the same names. The macros rely on C bit
fields which have compiler-specific layout.
2025-01-30 11:27:27 +00:00
Umar Butler
8d8b854824
gh-128016: Improved invalid escape sequence warning message (#128020) 2025-01-15 18:00:54 +01:00
Donghee Na
ae23a012e6
gh-128137: Update PyASCIIObject to handle interned field with the atomic operation (gh-128196) 2025-01-05 18:17:06 +09:00
Alexander Shadchin
46cb6340d7
gh-127903: Fix a crash on debug builds when calling Objects/unicodeobject::_copy_characters` (#127876) 2025-01-03 18:47:58 +00:00
Sam Gross
8eebe4e6d0
gh-128212: Fix race in _PyUnicode_CheckConsistency (GH-128367)
There was a data race on the utf8 field between `PyUnicode_SET_UTF8` and
`_PyUnicode_CheckConsistency`. Use the `_PyUnicode_UTF8()` accessor,
which uses an atomic load internally, to avoid the data race.
2025-01-02 14:02:54 -05:00
Kumar Aditya
3c168f7f79
gh-128013: fix data race in PyUnicode_AsUTF8AndSize on free-threading (#128021) 2024-12-19 17:08:32 +05:30
Victor Stinner
f802c8bf87
gh-128013: Convert unicodeobject.c macros to functions (#128061)
Convert unicodeobject.c macros to static inline functions.

* Add _PyUnicode_SET_UTF8() and _PyUnicode_SET_UTF8_LENGTH() macros.
* Add PyUnicode_HASH() and PyUnicode_SET_HASH() macros.
* Remove unused _PyUnicode_KIND() and _PyUnicode_GET_LENGTH() macros.
2024-12-18 16:34:31 +01:00
Inada Naoki
5dd775bed0
gh-126024: unicodeobject: optimize find_first_nonascii (GH-127790)
Remove 1 branch.
2024-12-13 17:21:46 +01:00
Bénédikt Tran
36c6178d37
gh-126024: fix UBSan failure in unicodeobject.c:find_first_nonascii (GH-127566) 2024-12-06 09:31:30 -05:00
Victor Stinner
bf21e2160d
Fix Unicode encode_wstr_utf8() (#127420)
Raise RuntimeError instead of RuntimeWarning.
2024-12-02 11:14:47 +01:00
Inada Naoki
7043bbd1ca
gh-127417: fix UTF-8 decoder optimization on AIX (#127433) 2024-11-30 21:52:37 +09:00
Inada Naoki
322b486010
gh-126024: optimize UTF-8 decoder for short non-ASCII string (#126025) 2024-11-29 19:48:02 +09:00
Pablo Galindo Salgado
30aeb00d36
gh-126076: Account for relocated objects in tracemalloc (#126077) 2024-11-19 10:35:17 +00:00
Eric Snow
6f26d496d3
gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709)
They used to be shared, before 3.12.  Returning to sharing them resolves a failure on Py_TRACE_REFS builds.

Co-authored-by: Petr Viktorin <encukou@gmail.com>
2024-10-23 10:10:06 -06:00
Victor Stinner
1639d934b9
gh-125196: Add a free list to PyUnicodeWriter (#125227) 2024-10-10 12:11:06 +02:00
Victor Stinner
1b2a5485f9
gh-125196: PyUnicodeWriter_Discard(NULL) does nothing (#125222) 2024-10-09 23:32:02 +00:00
Victor Stinner
ee3167b978
gh-125196: Add fast-path for int in PyUnicodeWriter_WriteStr() (#125214)
PyUnicodeWriter_WriteStr() and PyUnicodeWriter_WriteRepr() now call
directly _PyLong_FormatWriter() if the argument is an int.
2024-10-10 00:01:02 +02:00
Eric Snow
f2cb399470
gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865)
Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785.
2024-10-09 11:32:16 -06:00
Victor Stinner
e0c87c64b1
gh-124502: Remove _PyUnicode_EQ() function (#125114)
* Replace unicode_compare_eq() with unicode_eq().
* Use unicode_eq() in setobject.c.
* Replace _PyUnicode_EQ() with _PyUnicode_Equal().
* Remove unicode_compare_eq() and _PyUnicode_EQ().
2024-10-09 10:15:17 +02:00
Victor Stinner
a7f0727ca5
gh-124502: Add PyUnicode_Equal() function (#124504) 2024-10-07 21:24:53 +00:00
T. Wouters
7bdfabe2d1
gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807)
Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)"

This reverts commit 98b2ed7e239c807f379cd2bf864f372d79064aac.
2024-09-30 16:41:46 -07:00
Neil Schemenauer
98b2ed7e23
gh-116510: Fix crash due to shared immortal interned strings. (gh-124646) 2024-09-26 19:16:51 -07:00
Petr Viktorin
da5855e99a
gh-112301: Use literal format strings in unicode_fromformat_arg (GH-124203) 2024-09-25 19:46:01 +02:00
Victor Stinner
ef9d54703f
gh-107954, PEP 741: Add PyInitConfig C API (#123502)
Add Doc/c-api/config.rst documentation.
2024-09-03 12:33:49 +00:00
Victor Stinner
d8e69b2c1b
gh-122854: Add Py_HashBuffer() function (#122855) 2024-08-30 15:42:27 +00:00
Serhiy Storchaka
1a0b828994
gh-122561: Clean up and microoptimize str.translate and charmap codec (GH-122932)
* Replace PyLong_AS_LONG() with PyLong_AsLong().
* Call PyLong_AsLong() only once per the replacement code.
* Use PyMapping_GetOptionalItem() instead of PyObject_GetItem().
2024-08-28 12:11:13 +03:00
Eddie Elizondo
3203a74129
gh-113190: Reenable non-debug interned string cleanup (GH-113601) 2024-08-15 11:55:09 +00:00
Jelle Zijlstra
53ebb6232a
gh-122888: Fix crash on certain calls to str() (#122889)
Fixes #122888
2024-08-12 09:20:09 -07:00
Victor Stinner
fda6bd842a
Replace PyObject_Del with PyObject_Free (#122453)
PyObject_Del() is just a alias to PyObject_Free() kept for backward
compatibility. Use directly PyObject_Free() instead.
2024-08-01 14:12:33 +02:00
Petr Viktorin
bb09ba6792
gh-122291: Intern latin-1 one-byte strings at startup (GH-122303) 2024-07-27 10:27:06 +02:00