For calls such as:
m(*ary, a: 2, **h)
m(*ary, **h, **h, **h)
Where m does not take a positional argument splat, there was previously
an array allocation (splatarray true) to dup ary, even though it was not
necessary to do so. This is because the elimination of the array allocation
(splatarray false) was performed in the optimizer, and the optimizer didn't
handle this case, because the instructions for the keywords can be of
arbitrary length.
Move part of the optimization from the optimizer to the compiler,
detecting parse trees of the form:
ARGS_PUSH:
head: SPLAT
tail: HASH (without brace)
And using splatarray false instead of splatarray true for them.
Unfortunately, moving part of the optimization to the compiler broke
the hash allocation elimination optimization for calls of the
form:
m(*ary, a: 2)
That's because the compiler had already set splatarray false,
and the optimizer code was looking for splatarray true.
Split the array allocation elimination and hash allocation
elimination in the optimizer so that the hash allocation
elimination will still apply if the compiler performs the
splatarray false optimization.
* YJIT: increase context cache size to 1024
The other day I ran into a mysterious bug while increasing the
cache size to 1024. I was not able to reproduce this locally.
Opening this PR for testing/debugging.
* Add extra debug assertions
* Add more comments to context code
* Update yjit/src/core.rs
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
* Update yjit/src/core.rs
* Comment out potentially problematic assertion
* Revert cache size to 512 so we can merge other changes
---------
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Since dln.c is replaced with dmydln.c for miniruby, we cannot load shared
GC for miniruby. This means that many tests do not have the shared GC
loaded and many tests fail because of the warning that emits in miniruby.
This commit changes shared GC to directly use dlopen for loading the
shared GC.
If there's a lockfile, but it's out of sync with the Gemfile because a
dependency has been deleted, and frozen mode is set, Bundler will print
the following strange error:
```
$ bundle add rake
, but the lockfile can't be updated because frozen mode is set
You have deleted from the Gemfile:
* rake (~> 13.2)
Run `bundle install` elsewhere and add the updated Gemfile to version control.
```
This commit changes the error to:
```
Some dependencies were deleted from your gemfile, but the lockfile can't be updated because frozen mode is set
You have deleted from the Gemfile:
* rake (~> 13.2)
Run `bundle install` elsewhere and add the updated Gemfile to version control.
```
https://github.com/rubygems/rubygems/commit/452da4048d
If Gemfile is empty and there's no lockfile (situation after `bundle init`), and
`frozen` is configured, running `bundle add` will result in an strange
error, like this:
```
$ bundle add rake
, but the lockfile can't be updated because frozen mode is set
You have deleted from the Gemfile:
* rake (~> 13.2)
Run `bundle install` elsewhere and add the updated Gemfile to version control.
```
This commit fixes the problem to instead print
https://github.com/rubygems/rubygems/commit/152331a9dc
Enumerable#all?, #any?, #one?, and #none? do not use yielded arguments
as an Array. So they can use rb_block_call2 to omit array allocatoin.
Enumerable#find does not have to immediately accept yielded arguments as
an Array. It can delay array allocation until the predicate block
returns truthy.
(TODO: Enumerable#count and #find_all seem to be optimizable as well.)
This function accepts flags:
RB_NO_KEYWORDS, RB_PASS_KEYWORDS, RB_PASS_CALLED_KEYWORDS:
Works as the same as rb_block_call_kw.
RB_BLOCK_NO_USE_PACKED_ARGS:
The given block ("bl_proc") does not use "yielded_arg" of rb_block_call_func_t.
Instead, the block accesses the yielded arguments via "argc" and "argv".
This flag allows the called method to yield arguments without allocating an Array.
While working on something else I noticed:
* Usage of uppercased "RUBY" and "JAVA" as platforms, when those don't
really exist.
* Usage of some test gems with "1.0" as gemspec version and "1.0.0" as
actual version.
This commit fixes both inconsistencies to make things more expectable.
https://github.com/rubygems/rubygems/commit/e3ec32e247
The granularity is calculated as `10 ** ndigits.abs` rather than
`ndigits.abs * 10`. For example, if `ndigits` is `-2`, it should be
`10 ** 2 == 100` rather than `2 * 10 == 20`.
This change implements a fallback mode for the `--yjit-dump-disasm`
development command-line option to make it usable in release builds.
Previously, using the option with release builds of YJIT yielded only
a warning asking the user to build with `--enable-yjit=dev`.
While builds that use the `disasm` feature still give the best output,
just having the comments is useful enough for many kinds of debugging.
Having it usable in release builds is nice for new hackers, too, since
this allows for tinkering without having to learn how to build YJIT in
development mode.
Sample output on A64:
```
# regenerate_branch
# Insn: 0001 opt_send_without_block (stack_size: 1)
# guard known object with singleton class
0x11f7e0034: 4b 00 00 58 03 00 00 14 08 ce 9c 04 01 00 00
0x11f7e0043: 00 3f 00 0b eb 81 06 01 54 1f 20 03 d5
# RUBY_VM_CHECK_INTS(ec)
0x11f7e0050: 8b 02 42 b8 cb 07 01 35
# stack overflow check
0x11f7e0058: ab 62 02 91 7f 02 0b eb 69 07 01 54
# save PC to CFP
0x11f7e0064: 0b 3b 9a d2 2b 2f a0 f2 0b 00 cc f2 6b 02 00
0x11f7e0073: f8 ab 82 00 91
```
To ensure this feature doesn't incur too much cost when running without
the `--yjit-dump-disasm` option, I checked that there is no significant
impact to compile time and memory usage with the `compile_time_ns` and
`yjit_alloc_size` entry in `RubyVM::YJIT.runtime_stats`. For each
sample, I ran 3 iterations of the `lobsters` YJIT benchmark. The
statistics summary and done with the `summary` function in R.
Compile time, sample size of 60, lower is better:
```
Before After
Min. :2.054e+09 Min. :2.028e+09
1st Qu.:2.069e+09 1st Qu.:2.044e+09
Median :2.081e+09 Median :2.060e+09
Mean :2.089e+09 Mean :2.066e+09
3rd Qu.:2.109e+09 3rd Qu.:2.085e+09
Max. :2.146e+09 Max. :2.144e+09
```
Allocation size, sample size of 20, lower is better:
```
Before After
Min. :21804742 Min. :21794082
1st Qu.:21826682 1st Qu.:21816282
Median :21844042 Median :21826814
Mean :21960664 Mean :22026291
3rd Qu.:21861228 3rd Qu.:22040439
Max. :22587426 Max. :22930614
```
The `yjit_alloc_size` samples are noisy, but since the average increased
by only 0.3%, and the median is lower, I feel safe saying that there is
no significant change.