143 Commits

Author SHA1 Message Date
Nobuyoshi Nakada
2e3f81838c
Align styles [ci skip] 2025-05-15 17:48:40 +09:00
Hiroya Fujinami
18f8c514ea
Fix memoization for the /(...){0}/ case (#13169)
In this case, the previous implementation counted extra opcodes to
cache, and matching was unstable under memoization.

This patch fixes that problem by not counting the opcodes inside the
parentheses of `(...){0}` as opcodes to cache.
2025-04-24 12:03:24 +00:00
Daniel Colson
29b26fd3e7 Fix macro for disabled match cache
The `MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT` macro needs an argument
otherwise we end up with:

```
../regexec.c:3955:2: error: called object type 'void' is not a function or function pointer
 3955 |         STACK_POS_END(stkp);
      |         ^~~~~~~~~~~~~~~~~~~
../regexec.c:1680:41: note: expanded from macro 'STACK_POS_END'
 1680 |     MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(k);\
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
../regexec.c:3969:7: error: called object type 'void' is not a function or function pointer
 3969 |       STACK_POP_TIL_POS_NOT;
      |       ^~~~~~~~~~~~~~~~~~~~~
../regexec.c:1616:41: note: expanded from macro 'STACK_POP_TIL_POS_NOT'
 1616 |     MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(stk);\
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
```

The macro definition with the match cache enabled already has the
correct argument. This one is for when the match cache is disabled (I
had disabled it while trying to learn more about how it works.)
2025-04-13 11:44:49 +09:00
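The errors above are consistent with the disabled-cache variant having been defined as an object-like macro. The sketch below assumes that earlier shape (something like `((void) 0)`, which is a guess, not the recorded definition) and shows why a function-like definition that accepts the argument compiles. With an object-like macro, `MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(k)` would expand to `((void) 0)(k)`, an attempt to call a `void` expression. `USE_MATCH_CACHE_SKETCH` and `pos_end` are hypothetical.

```c
/* Hypothetical toggle; the real configuration macro in regexec.c may differ. */
#define USE_MATCH_CACHE_SKETCH 0

#if USE_MATCH_CACHE_SKETCH
/* Enabled variant: records a cache point (reduced here to a counter). */
static int cache_points_recorded = 0;
#define MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(stkp) \
  ((void) (stkp), cache_points_recorded++)
#else
/* Disabled variant: must still be function-like and take the argument.
   An object-like `#define MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT ((void) 0)`
   would turn `MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(k)` into `((void) 0)(k)`,
   producing the "called object type 'void'" error. */
#define MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(stkp) ((void) 0)
#endif

/* Hypothetical call site standing in for STACK_POS_END(stkp). */
static int pos_end(int k)
{
  MEMOIZE_LOOKAROUND_MATCH_CACHE_POINT(k);
  return k;
}
```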
John Hawthorn
8409edc497 Fix regex timeout double-free after stack_double
As of 10574857ce167869524b97ee862b610928f6272f, it's possible to crash
on a double free due to `stk_alloc` AKA `msa->stack_p` being freed
twice, once at the end of match_at and a second time in `FREE_MATCH_ARG`
in the parent caller.

Fixes [Bug #20886]
2024-11-11 23:33:21 -08:00
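A minimal sketch of the ownership rule behind a fix like this, assuming the simplified struct below: once a heap stack has been saved to `msa->stack_p`, only one party should free it, and clearing the pointer makes any later cleanup harmless. `match_arg_sketch`, `match_at_sketch`, and `free_match_arg_sketch` are hypothetical stand-ins; this is not the code from 8409edc497.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Onigmo's match argument; only the saved stack
   pointer matters here. */
typedef struct {
  void *stack_p;
} match_arg_sketch;

/* Parent-side cleanup, analogous in spirit to FREE_MATCH_ARG. */
static void free_match_arg_sketch(match_arg_sketch *msa)
{
  free(msa->stack_p);   /* free(NULL) is a no-op */
  msa->stack_p = NULL;  /* a second cleanup now does nothing */
}

/* Callee that grows its stack on the heap and saves it in msa->stack_p.
   On the error path it releases the buffer only through msa, so the
   parent's cleanup sees NULL instead of a dangling pointer. */
static int match_at_sketch(match_arg_sketch *msa, int timed_out)
{
  void *stk_alloc = malloc(64);
  if (stk_alloc == NULL) return -1;
  msa->stack_p = stk_alloc;         /* ownership moves to msa */
  if (timed_out) {
    free_match_arg_sketch(msa);     /* single owner frees, pointer cleared */
    return -2;
  }
  return 0;
}
```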
kojix2
550ac2f2ed
[DOC] Fix typos 2024-10-31 12:44:50 +09:00
Nobuyoshi Nakada
c94ea1cccb
Fix size modifier for size_t 2024-09-25 10:40:14 +09:00
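The commit message does not say which call was wrong, so this is only a general sketch of formatting a `size_t` with the C99 `%zu` conversion; passing a `size_t` to a mismatched specifier such as `%d` is undefined behavior. The helper `format_size` is hypothetical, and Ruby itself may use its own portability macro rather than `%zu`.

```c
#include <stdio.h>

/* Formats a size_t using the C99 `z` length modifier, which always
   matches size_t regardless of the platform's integer widths. */
static int format_size(char *buf, size_t bufsize, size_t n)
{
  return snprintf(buf, bufsize, "size=%zu", n);
}
```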
Peter Zhu
7464514ca5 Fix memory leak in String#start_with? when regexp times out
[Bug #20653]

This commit refactors how Onigmo handles timeouts. Instead of raising a
timeout error, onig_search now returns ONIGERR_TIMEOUT, so the caller
can free memory and then raise a timeout error.

This fixes a memory leak in String#start_with? when the regexp times out.
For example:

    regex = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        str.start_with?(regex)
      rescue
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    33216
    51936
    71152
    81728
    97152
    103248
    120384
    133392
    133520
    133616

After:

    14912
    15376
    15824
    15824
    16128
    16128
    16144
    16144
    16160
    16160
2024-07-26 08:42:38 -04:00
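A minimal sketch of the error-code pattern described in this commit, assuming a toy search loop with a step budget in place of a real timer: the search reports a timeout through its return value, and the caller frees what it allocated before raising. `SKETCH_ERR_TIMEOUT`, `search_sketch`, and `start_with_sketch` are hypothetical; the real code returns `ONIGERR_TIMEOUT` from `onig_search`.

```c
#include <stdlib.h>

/* Hypothetical error code standing in for ONIGERR_TIMEOUT; the real value
   lives in Onigmo's headers and is not reproduced here. */
#define SKETCH_ERR_TIMEOUT (-999)

/* Toy search: returns the index of the first 'x', -1 if none, or
   SKETCH_ERR_TIMEOUT once the step budget is exhausted. Reporting the
   timeout as a value (instead of raising, which would unwind past the
   caller) leaves cleanup to the caller. */
static long search_sketch(const char *str, long budget)
{
  long steps = 0;
  for (long i = 0; str[i] != '\0'; i++) {
    if (++steps > budget) return SKETCH_ERR_TIMEOUT;
    if (str[i] == 'x') return i;
  }
  return -1;
}

/* Caller owns a temporary buffer, frees it on every path, and only then
   reports the timeout (where Ruby would raise Regexp::TimeoutError).
   Returns 0 on completion, 1 on timeout, -1 on allocation failure. */
static int start_with_sketch(const char *str, long budget, long *pos)
{
  char *tmp = malloc(16);           /* e.g. a region/capture buffer */
  if (tmp == NULL) return -1;
  long r = search_sketch(str, budget);
  free(tmp);                        /* runs before any error is raised */
  if (r == SKETCH_ERR_TIMEOUT) return 1;
  *pos = r;
  return 0;
}
```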
Peter Zhu
10574857ce Fix memory leak in Regexp capture group when timeout
[Bug #20650]

The capture group allocates memory that is leaked when it times out.

For example:

    re = Regexp.new("^#{"(a*)" * 10_000}x$", timeout: 0.000001)
    str = "a" * 1000000 + "x"

    10.times do
      100.times do
        re =~ str
      rescue Regexp::TimeoutError
      end

      puts `ps -o rss= -p #{$$}`
    end

Before:

    34688
    56416
    78288
    100368
    120784
    140704
    161904
    183568
    204320
    224800

After:

    16288
    16288
    16880
    16896
    16912
    16928
    16944
    17184
    17184
    17200
2024-07-25 09:23:49 -04:00
Daniel Colson
d292a9b98c [Bug #20453] segfault in Regexp timeout
https://bugs.ruby-lang.org/issues/20228 started freeing `stk_base` to
avoid a memory leak. But `stk_base` is sometimes stack allocated (using
`xalloca`), so the free only works if the regex stack has grown enough
to hit `stack_double` (which uses `xmalloc` and `xrealloc`).

To reproduce the problem on master and 3.3.1:

```ruby
Regexp.timeout = 0.001
/^(a*)x$/ =~ "a" * 1000000 + "x"
```

Some details about this potential fix:

`stk_base == stk_alloc` on
[init](dde99215f2/regexec.c (L1153)),
so if `stk_base != stk_alloc` we can be sure we called
[`stack_double`](dde99215f2/regexec.c (L1210))
and it's safe to free. It's also safe to free if we've
[saved](dde99215f2/regexec.c (L1187-L1189))
the stack to `msa->stack_p`, since we do the `stk_base != stk_alloc`
check before saving.

This matches the check we do inside
[`stack_double`](dde99215f2/regexec.c (L1221))
2024-04-25 10:28:18 -04:00
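The free condition described in this commit can be sketched with a toy stack, assuming simplified names: storage starts in a caller-provided buffer (standing in for `xalloca`), moves to the heap on growth, and is freed only when the current base differs from the initial buffer. `stack_sketch`, `stack_double_sketch`, and `stack_release` are hypothetical, and the sketch uses `malloc` plus `memcpy` where regexec.c uses `xmalloc` and `xrealloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical matcher stack: `initial` plays the role of stk_alloc
   (non-heap storage) and `base` the role of stk_base. */
typedef struct {
  int *base;
  int *initial;
  size_t cap;
} stack_sketch;

static void stack_init(stack_sketch *s, int *buf, size_t cap)
{
  s->base = s->initial = buf;       /* base == initial right after init */
  s->cap = cap;
}

/* Analogous to stack_double: move to a larger heap buffer, freeing the
   previous buffer only if it was itself heap-allocated. */
static int stack_double_sketch(stack_sketch *s)
{
  size_t newcap = s->cap * 2;
  int *p = malloc(newcap * sizeof *p);
  if (p == NULL) return -1;
  memcpy(p, s->base, s->cap * sizeof *p);
  if (s->base != s->initial) free(s->base);
  s->base = p;
  s->cap = newcap;
  return 0;
}

/* Cleanup on timeout: freeing is only safe once base has left the
   non-heap buffer. */
static void stack_release(stack_sketch *s)
{
  if (s->base != s->initial) free(s->base);
  s->base = s->initial;
}
```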
Hiroshi SHIBATA
989a235580
Fix Use-After-Free issue for Regexp
Co-authored-by: Isaac Peka <7493006+isaac-peka@users.noreply.github.com>
2024-04-23 19:16:08 +09:00
Isaac Peka
33e5b47c16
Fix handling of reg->dmin in Regex matching 2024-04-23 19:16:05 +09:00
Nobuyoshi Nakada
3a04ea2d03 [Bug #20305] Fix matching against an incomplete character
When matching against an incomplete character, some `enclen` calls are
expected not to exceed the limit, while others are expected to return
the required length, which is then checked against the limit.
2024-02-27 13:58:03 +09:00
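A sketch of the "return the required length, then check it" style mentioned in this commit, assuming UTF-8 input: the length is taken from the lead byte and rejected if it would run past `end`. `utf8_required_len` and `char_len_checked` are hypothetical helpers, not Onigmo's `enclen`.

```c
#include <stddef.h>

/* Length a UTF-8 sequence claims from its lead byte (1..4). Illustrative
   only: Onigmo's enclen dispatches through the encoding object instead. */
static int utf8_required_len(unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 1;  /* invalid lead byte: treat as a single byte */
}

/* Returns the character length if the whole character fits before `end`,
   or -1 when it is incomplete, so the caller never reads past the limit. */
static int char_len_checked(const unsigned char *p, const unsigned char *end)
{
  if (p >= end) return -1;
  int len = utf8_required_len(*p);
  if (len > end - p) return -1;
  return len;
}
```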
Nobuyoshi Nakada
75aaeb35b8
[Bug #20239] Fix overflow at down-casting 2024-02-07 15:14:26 +09:00
Peter Zhu
1c120efe02 Fix memory leak in stk_base when Regexp timeout
[Bug #20228]

If rb_reg_check_timeout raises a Regexp::TimeoutError, then the stk_base
will leak.
2024-02-02 10:39:42 -05:00
Hiroya Fujinami
3e6e3ca262
Correctly handle consecutive lookarounds (#9738)
Fix [Bug #20207]
Fix [Bug #20212]

Handling consecutive lookarounds in init_cache_opcodes is buggy, so it
causes invalid memory access reported in [Bug #20207] and [Bug #20212].
This fixes it by using recursive functions to detect lookaround
nesting correctly.
2024-01-29 23:51:26 +09:00
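The idea of detecting lookaround nesting with recursion can be illustrated on a toy opcode stream. This sketch assumes a simplified instruction set (`TOY_LOOK_BEGIN`, `TOY_LOOK_END`, and the rest are hypothetical); it does not reproduce `init_cache_opcodes`, only the reason recursion keeps consecutive lookarounds at the same depth instead of treating the second one as nested inside the first.

```c
/* Toy opcode stream; the real code walks Onigmo bytecode. */
enum toy_op { TOY_CHAR, TOY_LOOK_BEGIN, TOY_LOOK_END, TOY_END };

/* Scans from ops[i] to the end of the current nesting level and records
   the deepest nesting seen. Each TOY_LOOK_BEGIN recurses and its matching
   TOY_LOOK_END returns to the caller, so consecutive lookarounds such as
   (?=a)(?=b) are siblings at the same depth.
   Returns the index just past the level's closing op, or -1 if unbalanced. */
static int scan_lookarounds(const enum toy_op *ops, int i, int depth,
                            int *max_depth)
{
  if (depth > *max_depth) *max_depth = depth;
  for (;;) {
    switch (ops[i]) {
    case TOY_CHAR:
      i++;
      break;
    case TOY_LOOK_BEGIN:
      i = scan_lookarounds(ops, i + 1, depth + 1, max_depth);
      if (i < 0) return -1;
      break;
    case TOY_LOOK_END:
      return depth > 0 ? i + 1 : -1;   /* close the current lookaround */
    case TOY_END:
      return depth == 0 ? i : -1;      /* end of pattern must be top level */
    }
  }
}
```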
Hiroya Fujinami
597955aae8
Fix to work match cache with peek next optimization (#9459) 2024-01-10 11:22:23 +09:00
Hiroya Fujinami
2571d5376a
Reduce if for decreasing counter on OP_REPEAT_INC (#9393)
This commit also eliminates the warning `'stkp' may be used
uninitialized in this function`.
2023-12-30 01:08:51 +09:00
Hiroya Fujinami
bb59696614
Fix [Bug #20098]: set counter value for {n,m} repetition correctly (#9391) 2023-12-29 19:30:24 +09:00
Hiroya Fujinami
d8702ddbfb
Fix [Bug #20083]: correct a cache point size for atomic groups (#9367) 2023-12-28 23:20:03 +09:00
Alan Wu
9786b909f9 Fix regex match cache out-of-bounds access
Previously the following read and wrote 1 byte out-of-bounds:

    $ valgrind ruby -e 'p /(\W+)[bx]\?/i.match? "aaaaaa aaaaaaaaa aaaa aaaaaaaa aaa aaaaxaaaaaaaaaaa aaaaa aaaaaaaaaaaa a ? aaa aaaa a ?"' 2> >(grep Invalid -A 30)

Because of the `match_cache_point_index + 1` in
memoize_extended_match_cache_point() and
check_extended_match_cache_point(), we need one more byte of space.
2023-11-16 10:23:15 +01:00
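The off-by-one can be sketched with a byte-addressed bitset, assuming `n` cache points indexed from 0: because an extended cache point touches bit `index + 1`, the highest bit accessed is `n`, so the table needs `n / 8 + 1` bytes; `(n + 7) / 8` falls one byte short whenever `n` is a multiple of 8. The helper names are hypothetical, and the real sizing expression in regexec.c may be written differently.

```c
#include <stdlib.h>

/* Hypothetical bitset for match-cache points. Allocates enough bytes to
   address bits 0..n_points inclusive (one bit beyond the last index). */
static unsigned char *cache_alloc(size_t n_points)
{
  size_t bytes = n_points / 8 + 1;
  return calloc(bytes, 1);
}

static void cache_set(unsigned char *cache, size_t bit)
{
  cache[bit >> 3] |= (unsigned char)(1u << (bit & 7));
}

static int cache_test(const unsigned char *cache, size_t bit)
{
  return (cache[bit >> 3] >> (bit & 7)) & 1;
}

/* Mirrors the `match_cache_point_index + 1` access from the commit message. */
static void memoize_extended(unsigned char *cache, size_t index)
{
  cache_set(cache, index + 1);
}
```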
Hiroya Fujinami
34cb174800
Optimize regexp matching for look-around and atomic groups (#7931) 2023-10-30 13:10:42 +09:00
Peter Zhu
7193b404a1 Add function rb_reg_onig_match
rb_reg_onig_match performs preparation, error handling, and cleanup for
matching a regex against a string. This reduces repetitive code and
removes the need for StringScanner to access internal data of regex.
2023-07-27 13:33:40 -04:00
Peter Zhu
58386814a7 Don't check for null pointer in calls to free
According to the C99 specification section 7.20.3.2 paragraph 2:

> If ptr is a null pointer, no action occurs.

So we do not need to check that the pointer is a null pointer.
2023-06-30 09:13:31 -04:00
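Following the quoted rule, cleanup code can call `free` unconditionally. `release_buffer` is a hypothetical helper; resetting the pointer afterwards is an extra convention, not something the standard requires.

```c
#include <stdlib.h>

/* Frees *pp without a NULL guard (free(NULL) does nothing per C99
   7.20.3.2p2), then clears the pointer so repeated calls stay safe. */
static void release_buffer(char **pp)
{
  free(*pp);
  *pp = NULL;
}
```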
TSUYUSATO Kitsune
a5819b5b25
Allow the match cache optimization for atomic groups (#7804) 2023-05-22 11:27:34 +09:00
TSUYUSATO Kitsune
93dd13d97a
Remove warnings and errors in regexec.c with ONIG_DEBUG_... macros (#7803) 2023-05-13 10:04:28 +09:00
TSUYUSATO Kitsune
ac730d3e75
Delay start of the match cache optimization (#7738) 2023-05-04 13:15:51 +09:00
TSUYUSATO Kitsune
a1c2c274ee
Refactor Regexp#match cache implementation (#7724)
* Refactor Regexp#match cache implementation

Improved variable and function names
Fixed [Bug 19537] (Maybe fixed in https://github.com/ruby/ruby/pull/7694)

* Add a comment of the glossary for "match cache"

* Skip to reset match cache when no cache point on null check
2023-04-19 13:08:28 +09:00
Nobuyoshi Nakada
fac814c2dc
Fix PLATFORM_GET_INC
On platforms where unaligned word access is not allowed, and if
`sizeof(val)` and `sizeof(type)` differ:

- if `val` is larger than `type`, `val` will contain garbage.
- if `val` is smaller than `type`, memory outside `val` will be clobbered.
2023-04-16 17:45:27 +09:00
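One way to avoid both failure modes listed in this commit is to copy exactly `sizeof(type)` bytes into a temporary of `type` with `memcpy` (which has no alignment requirement) and then assign the temporary to `val`. `GET_INC_SKETCH` and `read_int_as_long` are hypothetical and are not the actual `PLATFORM_GET_INC` definition.

```c
#include <string.h>

/* Hypothetical "read and advance" macro: copies sizeof(type) bytes into a
   temporary of `type`, so neither the alignment of p nor a size mismatch
   between val and type can corrupt memory; the assignment then converts. */
#define GET_INC_SKETCH(val, p, type) do { \
  type tmp_;                              \
  memcpy(&tmp_, (p), sizeof(type));       \
  (val) = tmp_;                           \
  (p) += sizeof(type);                    \
} while (0)

/* Reads an int stored at an arbitrary (possibly unaligned) offset into a
   wider variable without leaving garbage in its upper bytes. */
static long read_int_as_long(const unsigned char *buf, size_t offset)
{
  const unsigned char *p = buf + offset;
  long val = -1;
  GET_INC_SKETCH(val, p, int);
  return val;
}
```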
Nobuyoshi Nakada
0ac3f2c20e [Bug #19587] Fix reset_match_cache arguments 2023-04-12 18:35:32 +09:00
Nobuyoshi Nakada
1b697d7cb5 Constify 2023-04-12 18:35:32 +09:00
Nobuyoshi Nakada
2e1a95b569 Extract bsearch_cache_index function 2023-04-12 18:35:32 +09:00
TSUYUSATO Kitsune
dddc542e9b
[Bug #19476]: correct cache index computation for repetition (#7457) 2023-03-13 18:31:13 +09:00
TSUYUSATO Kitsune
e22c4e8877
[Bug #19467] correct cache points and counting failure on OP_ANYCHAR_STAR_PEEK_NEXT (#7454) 2023-03-13 15:46:41 +09:00
TSUYUSATO Kitsune
b726d60c98
Fix [Bug 19273], set correct value to outer_repeat on OP_REPEAT (#7035) 2022-12-28 20:03:25 +09:00
Nobuyoshi Nakada
43f4093a31
Adjust style [ci skip] 2022-12-22 15:12:05 +09:00
TSUYUSATO Kitsune
fbedadb61f
Add Regexp.linear_time? (#6901) 2022-12-14 12:57:14 +09:00
Yusuke Endoh
b8e542b463 Make absent operator work at the end of the input string
https://bugs.ruby-lang.org/issues/19104#change-100542
2022-12-12 14:26:38 +09:00
TSUYUSATO Kitsune
189e3c0ada Add default cases for cache point finding function 2022-11-17 23:19:17 +09:00
TSUYUSATO Kitsune
90bfac296e Add OP_CCLASS_MB case 2022-11-17 23:19:17 +09:00
TSUYUSATO Kitsune
1dc4128e92 Reduce warnings 2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
36ff0521c1 Use long instead of int 2022-11-09 23:21:26 +09:00
Yusuke Endoh
d868f4ca31 Check for integer overflow in the allocation of match_cache table 2022-11-09 23:21:26 +09:00
Yusuke Endoh
14845ab4ff Ensure that the table size for CACHE_MATCH fits with int
Currently, the keys for CACHE_MATCH are handled as an `int` type, so we
should make sure the table size is smaller than the range of `int`.
2022-11-09 23:21:26 +09:00
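A sketch of checking that the table size fits in `int` before allocating, assuming the number of cache points is the number of cache opcodes times the input length plus one. That formula and the helper `alloc_cache_table` are assumptions for illustration; comparing against `INT_MAX / num_cache_opcodes` avoids ever computing an overflowing product.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical guard: returns NULL instead of allocating a table whose
   point count would not fit in int (the key type). The check
   input_len >= INT_MAX / n is equivalent to (input_len + 1) * n > INT_MAX
   rounded down, without computing the product. */
static unsigned char *alloc_cache_table(long num_cache_opcodes, long input_len)
{
  if (num_cache_opcodes <= 0 || input_len < 0) return NULL;
  if (input_len >= INT_MAX / num_cache_opcodes) return NULL;
  long num_points = num_cache_opcodes * (input_len + 1);  /* <= INT_MAX */
  return calloc(((size_t)num_points + 7) / 8, 1);         /* one bit per point */
}
```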
Yusuke Endoh
537286d0bb Prevent GCC warnings
```
regexec.c: In function ‘reset_match_cache’:
regexec.c:1259:56: warning: suggest parentheses around ‘-’ inside ‘<<’ [-Wparentheses]
 1259 |     match_cache[k1 >> 3] &= ((1 << (8 - (k2 & 7) - 1)) - 1 << ((k2 & 7) + 1)) | ((1 << (k1 & 7)) - 1);
      |                              ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
regexec.c:1269:60: warning: suggest parentheses around ‘-’ inside ‘<<’ [-Wparentheses]
 1269 |         match_cache[k2 >> 3] &= ((1 << (8 - (k2 & 7) - 1)) - 1 << ((k2 & 7) + 1));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
regexec.c: In function ‘find_cache_index_table’:
regexec.c:1192:11: warning: ‘m’ may be used uninitialized [-Wmaybe-uninitialized]
 1192 |   if (!(0 <= m && m < num_cache_table && table[m].addr == p)) {
      |         ~~^~~~
regexec.c: In function ‘match_at’:
regexec.c:1238:12: warning: ‘m1’ is used uninitialized [-Wuninitialized]
 1238 |   if (table[m1].addr < pbegin && m1 + 1 < num_cache_table) m1++;
      |            ^
regexec.c:1218:39: note: ‘m1’ was declared here
 1218 |   int l = 0, r = num_cache_table - 1, m1, m2;
      |                                       ^~
regexec.c:1239:12: warning: ‘m2’ is used uninitialized [-Wuninitialized]
 1239 |   if (table[m2].addr > pend && m2 - 1 > 0) m2--;
      |            ^
regexec.c:1218:43: note: ‘m2’ was declared here
 1218 |   int l = 0, r = num_cache_table - 1, m1, m2;
      |                                           ^~
```
2022-11-09 23:21:26 +09:00
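Because `-` binds more tightly than `<<`, the expressions flagged by `-Wparentheses` already parse as `(a - 1) << b`, so adding parentheses silences the warning without changing the value. `high_bits_above` is a hypothetical extraction of the `k2` mask from the warnings: it keeps the bits of a byte strictly above position `k2 & 7`.

```c
/* Explicitly parenthesized form of
   `(1 << (8 - (k2 & 7) - 1)) - 1 << ((k2 & 7) + 1)`,
   which C already parses as `((...) - 1) << (...)`. */
static unsigned high_bits_above(int k2)
{
  return ((1u << (8 - (k2 & 7) - 1)) - 1) << ((k2 & 7) + 1);
}
```

For example, `k2 & 7 == 3` yields `0xF0` (bits 4 through 7), and `k2 & 7 == 7` yields `0`.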
Yusuke Endoh
ff5dba8319 Return ONIGERR_MEMORY if it fails to allocate memory for cache_match_opt 2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
a1c1fc558a Revert "Refactor field names"
This reverts commit 1e6673d6bbd2adbf555d82c7c0906ceb148ed6ee.
2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
22294731a8 Refactor field names 2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
ff2998a86c Remove debug printf 2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
37613fea16 Clear cache on OP_NULL_CHECK_END_MEMST 2022-11-09 23:21:26 +09:00
TSUYUSATO Kitsune
f25bb291b4 Support OP_REPEAT and OP_REPEAT_INC 2022-11-09 23:21:26 +09:00