1berry/ruby - ruby - Gitea : Git Mirror

Author	SHA1	Message	Date
Luke Gruber	97994c77fb	Only use regex internal reg_cache when in main ractor Using this `reg_cache` is racy across ractors, so don't use it when in a ractor. Also, its use across ractors can cause a regular expression created in 1 ractor to be used in another ractor (an isolation bug).	2025-06-12 13:13:18 -07:00
Luke Gruber	585dcffff1	Fix regular expressions across ractors that match different encodings In commit d42b9ffb206, an optimization was introduced that can speed up Regexp#match by 15% when it matches with strings of different encodings. This optimization, however, does not work across ractors. To fix this, we only use the optimization if no ractors have been started. In the future, we could use atomics for the reference counting if we find it's needed and if it's more performant. The backtrace of the misbehaving native thread: ``` * frame #0: 0x0000000189c94388 libsystem_kernel.dylib`__pthread_kill + 8 frame #1: 0x0000000189ccd88c libsystem_pthread.dylib`pthread_kill + 296 frame #2: 0x0000000189bd6c60 libsystem_c.dylib`abort + 124 frame #3: 0x0000000189adb174 libsystem_malloc.dylib`malloc_vreport + 892 frame #4: 0x0000000189adec90 libsystem_malloc.dylib`malloc_report + 64 frame #5: 0x0000000189ae321c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32 frame #6: 0x00000001001c3be4 ruby`onig_free_body(reg=0x000000012d84b660) at regcomp.c:5663:5 frame #7: 0x00000001001ba828 ruby`rb_reg_prepare_re(re=4748462304, str=4748451168) at re.c:1680:13 frame #8: 0x00000001001bac58 ruby`rb_reg_onig_match(re=4748462304, str=4748451168, match=(ruby`reg_onig_search [inlined] rbimpl_RB_TYPE_P_fastpath at value_type.h:349:14 ruby`reg_onig_search [inlined] rbimpl_rstring_getmem at rstring.h:391:5 ruby`reg_onig_search at re.c:1781:5), args=0x000000013824b168, regs=0x000000013824b150) at re.c:1708:20 frame #9: 0x00000001001baefc ruby`rb_reg_search_set_match(re=4748462304, str=4748451168, pos=<unavailable>, reverse=0, set_backref_str=1, set_match=0x0000000000000000) at re.c:1809:27 frame #10: 0x00000001001bae80 ruby`rb_reg_search0(re=<unavailable>, str=<unavailable>, pos=<unavailable>, reverse=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at re.c:1861:12 [artificial] frame #11: 0x0000000100230b90 ruby`rb_pat_search0(pat=<unavailable>, str=<unavailable>, pos=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at string.c:6619:16 [artificial] frame #12: 0x00000001002287f4 ruby`rb_str_sub_bang [inlined] rb_pat_search(pat=4748462304, str=4748451168, pos=0, set_backref_str=1) at string.c:6626:12 frame #13: 0x00000001002287dc ruby`rb_str_sub_bang(argc=1, argv=0x00000001381280d0, str=4748451168) at string.c:6668:11 frame #14: 0x000000010022826c ruby`rb_str_sub ``` You can reproduce this by running: ``` RUBY_TESTOPTS="--name=/test_str_capitalize/" make test-all TESTS=test/ruby/test_m17n_comb.rb ``` However, you need to run it with multiple ractors at once. Co-authored-by: jhawthorn <john@hawthorn.email>	2025-06-10 09:00:17 -07:00
Peter Zhu	1cdec3240b	Fix memory leak in rb_reg_search_set_match https://github.com/ruby/ruby/pull/12801 changed regexp matches to reuse the backref, which causes memory to leak if the original registers of the match are not freed. For example, the following script leaks memory: 10.times do 1_000_000.times do "aaaaaaaaaaa".gsub(/a/, "") end puts `ps -o rss= -p #{$$}` end Before: 774256 1535152 2297360 3059280 3821296 4583552 5160304 5091456 5114256 4980192 After: 12480 11440 11696 11632 11632 11760 11824 11824 11824 11888	2025-03-11 21:55:03 -04:00
Jean Boussier	97e6ad49a4	Reuse the backref if it isn't marked as busy. [Misc #20652]	2025-02-24 18:32:46 +01:00
Jean Boussier	87f9c3c65e	String#gsub! Elide MatchData allocation when we know it can't escape If gsub is used with a string replacement or a map that doesn't have a default proc, we know for sure no code can cause the MatchData to escape the `gsub` call. In such a case, we still have to allocate a new MatchData because we don't know what the lifetime of the backref is, but for any subsequent match we can re-use the MatchData we allocated ourselves, reducing allocations significantly. This partially fixes [Misc #20652], except when a block is used, and partially reduces the performance impact of abc0304cb28cb9dcc3476993bc487884c139fd11 / [Bug #17507] ``` compare-ruby: ruby 3.5.0dev (2025-02-24T09:44:57Z master 5cf146399f) +PRISM [arm64-darwin24] built-ruby: ruby 3.5.0dev (2025-02-24T10:58:27Z gsub-elude-match da966636e9) +PRISM [arm64-darwin24] warming up.... | |compare-ruby|built-ruby| |:----------------|-----------:|---------:| |escape | 3.577k| 3.697k| | | -| 1.03x| |escape_bin | 5.869k| 6.743k| | | -| 1.15x| |escape_utf8 | 3.448k| 3.738k| | | -| 1.08x| |escape_utf8_bin | 6.361k| 7.267k| | | -| 1.14x| ``` Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>	2025-02-24 18:32:46 +01:00
Nobuyoshi Nakada	f2c9eac887	[DOC] Follow up link to heading changes The section "Special global variables" has changed: e021754db013ca9cd6dbd68b416425b32ee81490: Special Global Variables 2b4b513ef046c25c0a8d3d7b10a0566314b27099: Regexp Global Variables e50b7bf784b53ac126986dd7f9fd22ccc9b59c60: Regexp@Global+Variables	2025-01-16 15:20:28 +09:00
Stan Lo	730731cc86	Fix links to syntax/literals.rdoc	2024-12-15 15:36:08 +09:00
Alan Wu	5a570421a5	[DOC] Regexp.last_match returns `$~`, not `$!`	2024-08-09 16:02:36 -04:00
Peter Zhu	7464514ca5	Fix memory leak in String#start_with? when regexp times out [Bug #20653] This commit refactors how Onigmo handles timeout. Instead of raising a timeout error, onig_search will return ONIGERR_TIMEOUT so that the caller can free memory and then raise a timeout error. This fixes a memory leak in String#start_with? when the regexp times out. For example: regex = Regexp.new("^#{"(a)" * 10_000}x$", timeout: 0.000001) str = "a" * 1000000 + "x" 10.times do 100.times do str.start_with?(regex) rescue end puts `ps -o rss= -p #{$$}` end Before: 33216 51936 71152 81728 97152 103248 120384 133392 133520 133616 After: 14912 15376 15824 15824 16128 16128 16144 16144 16160 16160	2024-07-26 08:42:38 -04:00
Shugo Maeda	e048a073a3	Add MatchData#bytebegin and MatchData#byteend These methods return the byte-based offset of the beginning or end of the specified match. [Feature #20576]	2024-07-16 14:48:06 +09:00
Jean Boussier	3a7846b1aa	Add a hint of `ASCII-8BIT` being `BINARY` [Feature #18576] Since outright renaming `ASCII-8BIT` is deemed too backward incompatible, the next best thing would be to only change its `#inspect`, particularly in exception messages.	2024-04-18 10:17:26 +02:00
Peter Zhu	01bfd1a2bf	Fix memory leak in OnigRegion when match raises [Bug #20228] rb_reg_onig_match can raise a Regexp::TimeoutError, which would cause the OnigRegion to leak.	2024-02-02 10:39:42 -05:00
Peter Zhu	1c120efe02	Fix memory leak in stk_base when Regexp timeout [Bug #20228] If rb_reg_check_timeout raises a Regexp::TimeoutError, then the stk_base will leak.	2024-02-02 10:39:42 -05:00
git	5b6167c252	* expand tabs. [ci skip] Please consider using misc/expand_tabs.rb as a pre-commit hook.	2024-01-07 15:50:59 +00:00
Nobuyoshi Nakada	c30b8ae947	Adjust styles and indents [ci skip]	2024-01-08 00:50:41 +09:00
Luke Gruber	e12d4c654e	Don't create T_MATCH object if /regexp/.match(string) doesn't match Fixes [Bug #20104]	2024-01-01 13:28:26 -08:00
Peter Zhu	f0efeddd41	Fix Regexp#inspect for GC compaction rb_reg_desc was not safe for GC compaction because it took in the C string and length but not the backing String object, so the string could get moved during compaction. This commit changes rb_reg_desc to use the string from the Regexp object. The test fails when RGENGC_CHECK_MODE is turned on: TestRegexp#test_inspect_under_gc_compact_stress [test/ruby/test_regexp.rb:474]: <"(?-mix:\\/)|"> expected but was <"/\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00/">.	2023-12-24 11:04:41 -05:00
Peter Zhu	42442ed789	Fix Regexp#match for GC compaction The test fails when RGENGC_CHECK_MODE is turned on: TestRegexp#test_match_under_gc_compact_stress: NoMethodError: undefined method `match' for nil test_regexp.rb:878:in `block in test_match_under_gc_compact_stress'	2023-12-24 09:03:55 -05:00
Peter Zhu	fadda88903	Fix Regexp#to_s for GC compaction The test fails when RGENGC_CHECK_MODE is turned on: TestRegexp#test_to_s_under_gc_compact_stress = 13.46 s 1) Failure: TestRegexp#test_to_s_under_gc_compact_stress [test/ruby/test_regexp.rb:81]: <"(?-mix:abcd\u3042)"> expected but was <"(?-mix:\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030\u5C78\u3030)">.	2023-12-23 16:52:05 -05:00
Nobuyoshi Nakada	dee45ac231	[DOC] State MatchData#[] when multiple captures with the same name	2023-12-19 13:48:51 +09:00
Victor Shepelev	570d7b2c3e	[DOC] Adjust some new features wording/examples. (#9183) * Reword Range#overlap? docs last paragraph. * Docs: add explanation about Queue#freeze * Docs: Add :rescue event docs for TracePoint * Docs: Enhance Module#set_temporary_name documentation * Docs: Slightly expand Process::Status deprecations * Fix MatchData#named_captures rendering glitch * Improve Dir.fchdir examples * Adjust Refinement#target docs	2023-12-14 23:01:48 +02:00
Dustin Brown	d89280e8bf	Copy encoding flags when copying a regex [Bug #20039] * 🐛 Fixes [Bug #20039](https://bugs.ruby-lang.org/issues/20039) When a Regexp is initialized with another Regexp, we simply copy the properties from the original. However, the flags on the original were not being copied correctly. This caused an issue when the original had multibyte characters and was being compared with an ASCII string. Without the forced encoding flag (`KCODE_FIXED`) transferred on to the new Regexp, the comparison would fail. See the included test for an example. Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>	2023-12-06 19:25:29 -08:00
Nobuyoshi Nakada	caa9881fde	[DOC] Fix doc/regexp.rdoc links - Rename regexp.rdoc to exclude it from "Pages". This file is meant to be included in the "class Regexp" document, but it also appeared as a duplicate separate page. - Fix links on case-sensitive filesystems. - Fix to use rdoc-ref instead of converted HTML page names.	2023-11-14 15:56:57 +09:00
Herwin	8b3d044004	[DOC] Indentation fix in comments of MatchData#inspect The old version did not add syntax highlighting to the code block, and included the "Related:" line in the code block as well.	2023-10-20 18:26:37 +09:00
Herwin	3467355450	[DOC] Fix typo in docs of Regexp#deconstruct_keys of => if	2023-10-20 07:18:03 +09:00
Peter Zhu	d42b9ffb20	Reuse Regexp ptr when recompiling When matching an incompatible encoding, the Regexp needs to recompile. If `usecnt == 0`, then we can reuse the `ptr` because nothing else is using it. This avoids allocating another `regex_t`. This speeds up matches that switch to incompatible encodings by 15%. Branch: ``` Regex#match? with different encoding 1.431M (± 1.3%) i/s - 7.264M in 5.076153s Regex#match? with same encoding 16.858M (± 1.1%) i/s - 85.347M in 5.063279s ``` Base: ``` Regex#match? with different encoding 1.248M (± 2.0%) i/s - 6.342M in 5.083151s Regex#match? with same encoding 16.377M (± 1.1%) i/s - 82.519M in 5.039504s ``` Script: ``` regex = /foo/ str1 = "日本語" str2 = "English".force_encoding("ASCII-8BIT") Benchmark.ips do |x| x.report("Regex#match? with different encoding") do |times| i = 0 while i < times regex.match?(str1) regex.match?(str2) i += 1 end end x.report("Regex#match? with same encoding") do |times| i = 0 while i < times regex.match?(str1) i += 1 end end end ```	2023-07-31 09:17:18 -04:00
Takashi Kokubun	9721972175	Resurrect rb_reg_prepare_re C API Existing strscan releases rely on this C API. It means that the current Ruby master doesn't work if your Gemfile.lock has strscan unless it's locked to 3.0.7, which is not released yet. To fix it, let's not remove the C API we've exposed to users.	2023-07-27 15:30:10 -07:00
Peter Zhu	69b20d1196	Don't load RREGEXP_PTR twice	2023-07-27 14:41:12 -04:00
Peter Zhu	511c51e116	Refactor err string in rb_reg_prepare_re	2023-07-27 14:04:02 -04:00
Peter Zhu	7193b404a1	Add function rb_reg_onig_match rb_reg_onig_match performs preparation, error handling, and cleanup for matching a regex against a string. This reduces repetitive code and removes the need for StringScanner to access internal data of regex.	2023-07-27 13:33:40 -04:00
Kunshan Wang	639aa76e82	Embed struct rmatch into GC slot (#8097)	2023-07-20 14:17:38 -04:00
Nobuyoshi Nakada	913e01e80e	Stop allocating unused backref strings at `defined?`	2023-06-27 23:14:10 +09:00
Nobuyoshi Nakada	df5ae0a550	Use `rb_reg_nth_defined` instead of `rb_match_nth_defined`	2023-06-27 22:39:15 +09:00
Burdette Lamar	932dd9f10e	[DOC] Regexp doc (#7923)	2023-06-20 09:28:21 -04:00
git	d7300038e4	* expand tabs. [ci skip] Please consider using misc/expand_tabs.rb as a pre-commit hook.	2023-06-09 12:45:58 +00:00
Nobuyoshi Nakada	ab6eb3786c	Optimize `Regexp#dup` and `Regexp.new(/RE/)` When copying from another regexp, copy already built `regex_t` instead of re-compiling its source.	2023-06-09 20:22:30 +09:00
Jeremy Evans	a8ba1ddd78	Use UTF-8 encoding for literal extended regexps with UTF-8 characters in comments Fixes [Bug #19455]	2023-04-23 19:27:58 -07:00
Vladimir Dementyev	b09f5c7bf7	MatchData#named_captures: add optional symbolize_names keyword (#6952)	2023-04-19 11:19:31 +12:00
Matt Valentine-House	026321c5b9	[Feature #19474] Refactor NEWOBJ macros NEWOBJ_OF is now our canonical newobj macro. It takes an optional ec	2023-04-06 11:07:16 +01:00
Takashi Kokubun	233ddfac54	Stop exporting symbols for MJIT	2023-03-06 21:59:23 -08:00
Nobuyoshi Nakada	a5310e609d	[DOC] Fix options of `Regexp#initialize` `Integer#|` is the bitwise OR operator, not logical OR.	2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada	8ee604b9d4	`rb_scan_args` never fills optional arguments with `Qundef`	2023-03-06 13:57:17 +09:00
Nobuyoshi Nakada	680bd9027f	[Bug #19471] `Regexp.compile` should handle keyword arguments As well as `Regexp.new`, it should pass keyword arguments to the `Regexp#initialize` method.	2023-03-03 15:27:37 +09:00
Jeremy Evans	04cfb26bd3	Remove support for the Regexp.new 3rd argument This was deprecated in Ruby 3.2. Fixes [Bug #18797]	2023-03-01 23:42:47 -08:00
Nobuyoshi Nakada	ef00c6da88	Adjust `else` style to be consistent in each files [ci skip]	2023-02-26 13:20:43 +09:00
BurdetteLamar	3b239d2480	Remove (newly unneeded) remarks about aliases	2023-02-19 14:26:34 -08:00
Jean Boussier	46298955e4	Implement Write Barrier for RMatch objects They only have two references.	2023-02-10 16:12:22 +01:00
OKURA Masafumi	11e0f62148	[DOC] Fix typo in document of regexp [ci skip]	2023-02-10 18:32:21 +09:00
Nobuyoshi Nakada	b49cd84311	Remove `REG_LITERAL` flag All `Regexp` literals are frozen now.	2023-02-09 19:21:24 +09:00
Jeremy Evans	eccfc978fd	Fix parsing of regexps that toggle extended mode on/off inside regexp This was broken in ec3542229b29ec93062e9d90e877ea29d3c19472. That commit didn't handle cases where extended mode was turned on/off inside the regexp. There are two ways to turn extended mode on/off: ``` /(?-x:#y)#z /x =~ '#y' /(?-x)#y(?x)#z /x =~ '#y' ``` These can be nested inside the same regexp: ``` /(?-x:(?x)#x (?-x)#y)#z /x =~ '#y' ``` As you can probably imagine, this makes handling these regexps somewhat complex. Due to the nesting inside portions of regexps, the unassign_nonascii function needs to be recursive. In recursive mode, it needs to track both opening and closing parentheses, similar to how it already tracked opening and closing brackets for character classes. When scanning the regexp and coming to `(?` not followed by `#`, scan for options, and use `x` and `i` to determine whether to turn on or off extended mode. For `:`, indicating only the current regexp section should have the extended mode switched, recurse with the extended mode set or unset. For `)`, indicating the remainder of the regexp (or current regexp portion if already recursing) should turn extended mode on or off, just change the extended mode flag and keep scanning. While testing this, I noticed that `a`, `d`, and `u` are accepted as options, in addition to `i`, `m`, and `x`, but I can't see where those options are documented. I'm not sure whether or not handling `a`, `d`, and `u` as options is a bug. Fixes [Bug #19379]	2023-01-30 08:51:12 -08:00
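
Commit eccfc978fd lists three ways to toggle extended mode inside a regexp, but this page flattens its examples onto one line. The Ruby sketch below restates them, assuming (as a reconstruction, not the commit's exact text) that each space after a `#` comment was originally a newline. In `/x` mode a `#` comment runs to the end of the line, so without that newline the nested example would swallow its own closing parenthesis. On a Ruby that includes the fix, each pattern is expected to match `'#y'` at index 0.

```ruby
# In extended (/x) mode, whitespace is ignored and "#" starts a comment that
# runs to the end of the line. With /x turned off, "#" is a literal character.

# 1. (?-x:...) disables extended mode only inside the group; "#z" is a comment.
scoped = /(?-x:#y)#z
/x

# 2. (?-x) and (?x) switch extended mode for the rest of the pattern.
switched = /(?-x)#y(?x)#z
/x

# 3. Nested form: the newline after "#x" ends that comment, so the following
#    (?-x) is parsed as an option switch rather than as comment text.
nested = /(?-x:(?x)#x
(?-x)#y)#z
/x

[scoped, switched, nested].each do |re|
  p(re =~ '#y') # expected to print 0 for each pattern
end
```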
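
Commit d89280e8bf concerns `Regexp.new(regexp)` dropping the fixed-encoding flag (`KCODE_FIXED`) of a multibyte regexp. A minimal check from Ruby code, assuming a release that contains the fix, can go through the public `Regexp#fixed_encoding?` predicate instead of the internal flag:

```ruby
# \u3042 is HIRAGANA LETTER A. A regexp literal containing it is a UTF-8
# regexp with a fixed encoding (not applicable to other ASCII-compatible encodings).
original = /\u3042/
copy = Regexp.new(original) # initialize a Regexp from another Regexp

p original.fixed_encoding?          # => true
p copy.fixed_encoding?              # => true once the flag is copied
p copy.encoding == Encoding::UTF_8  # => true
p copy.match?("plain ascii")        # => false, with no encoding error raised
```

This only observes the flag through public API; the commit itself points to its own test in the Ruby test suite as the reference case.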


656 Commits