ruby/lib/pp.rb
Jeremy Evans e4f85bfc31 Implement Set as a core class
Set has been an autoloaded standard library since Ruby 3.2.
The standard library Set is less efficient than it could be, as it
uses Hash for storage, which stores unnecessary values for each key.

Implementation details:

* Core Set uses a modified version of `st_table`, named `set_table`.
  than `s/st_/set_/`, the main difference is that the stored records
  do not have values, making them 1/3 smaller. `st_table_entry` stores
  `hash`, `key`, and `record` (value), while `set_table_entry` only
  stores `hash` and `key`.  This results in large sets using ~33% less
  memory compared to stdlib Set.  For small sets, core Set uses 12% more
  memory (160 byte object slot and 64 malloc bytes, while stdlib set
  uses 40 for Set and 160 for Hash).  More memory is used because
  the set_table is embedded and 72 bytes in the object slot are
  currently wasted. Hopefully we can make this more efficient and have
  it stored in an 80 byte object slot in the future.

* All methods are implemented as cfuncs, except the pretty_print
  methods, which were moved to `lib/pp.rb` (which is where the
  pretty_print methods for other core classes are defined).  As is
  typical for core classes, internal calls call C functions and
  not Ruby methods.  For example, to check if something is a Set,
  `rb_obj_is_kind_of` is used, instead of calling `is_a?(Set)` on the
  related object.

* Almost all methods use the same algorithm that the pure-Ruby
  implementation used.  The exception is when calling `Set#divide` with a
  block with 2-arity.  The pure-Ruby method used tsort to implement this.
  I developed an algorithm that only allocates a single intermediate
  hash and does not need tsort.

* The `flatten_merge` protected method is no longer necessary, so it
  is not implemented (it could be).

* Similar to Hash/Array, subclasses of Set are no longer reflected in
  `inspect` output.

* RDoc from stdlib Set was moved to core Set, with minor updates.

This includes a comprehensive benchmark suite for all public Set
methods.  As you would expect, the native version is faster in the
vast majority of cases, and multiple times faster in many cases.
There are a few cases where it is significantly slower:

* Set.new with no arguments (~1.6x)
* Set#compare_by_identity for small sets (~1.3x)
* Set#clone for small sets (~1.5x)
* Set#dup for small sets (~1.7x)

These are slower as Set does not currently use the AR table
optimization that Hash does, so a new set_table is initialized for
each call.  I'm not sure it's worth the complexity to have an AR
table-like optimization for small sets (for hashes it makes sense,
as small hashes are used everywhere in Ruby).

The rbs and repl_type_completor bundled gems will need updates to
support core Set.  The pull request marks them as allowed failures.

This passes all set tests with no changes.  The following specs
needed modification:

* Modifying frozen set error message (changed for the better)
* `Set#divide` when passed a 2-arity block no longer yields the same
  object as both the first and second argument (this seems like an issue
  with the previous implementation).
* Set-like objects that override `is_a?` such that `is_a?(Set)` return
  `true` are no longer treated as Set instances.
* `Set.allocate.hash` is no longer the same as `nil.hash`
* `Set#join` no longer calls `Set#to_a` (it calls the underlying C
   function).
* `Set#flatten_merge` protected method is not implemented.

Previously, `set.rb` added a `SortedSet` autoload, which loads
`set/sorted_set.rb`.  This replaces the `Set` autoload in `prelude.rb`
with a `SortedSet` autoload, but I recommend removing it and
`set/sorted_set.rb`.

This moves `test/set/test_set.rb` to `test/ruby/test_set.rb`,
reflecting that switch to a core class.  This does not move the spec
files, as I'm not sure how they should be handled.

Internally, this uses the st_* types and functions as much as
possible, and only adds set_* types and functions as needed.
The underlying set_table implementation is stored in st.c, but
there is no public C-API for it, nor is there one planned, in
order to keep the ability to change the internals going forward.

For internal uses of st_table with Qtrue values, those can
probably be replaced with set_table.  To do that, include
internal/set_table.h.  To handle symbol visibility (rb_ prefix),
internal/set_table.h uses the same macro approach that
include/ruby/st.h uses.

The Set class (rb_cSet) and all methods are defined in set.c.
There isn't currently a C-API for the Set class, though C-API
functions can be added as needed going forward.

Implements [Feature #21216]

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Oliver Nutter <mrnoname1000@riseup.net>
2025-04-26 10:31:11 +09:00

737 lines
19 KiB
Ruby

# frozen_string_literal: true
require 'prettyprint'
##
# A pretty-printer for Ruby objects.
#
##
# == What PP Does
#
# Standard output by #p returns this:
# #<PP:0x81fedf0 @genspace=#<Proc:0x81feda0>, @group_queue=#<PrettyPrint::GroupQueue:0x81fed3c @queue=[[#<PrettyPrint::Group:0x81fed78 @breakables=[], @depth=0, @break=false>], []]>, @buffer=[], @newline="\n", @group_stack=[#<PrettyPrint::Group:0x81fed78 @breakables=[], @depth=0, @break=false>], @buffer_width=0, @indent=0, @maxwidth=79, @output_width=2, @output=#<IO:0x8114ee4>>
#
# Pretty-printed output returns this:
# #<PP:0x81fedf0
# @buffer=[],
# @buffer_width=0,
# @genspace=#<Proc:0x81feda0>,
# @group_queue=
# #<PrettyPrint::GroupQueue:0x81fed3c
# @queue=
# [[#<PrettyPrint::Group:0x81fed78 @break=false, @breakables=[], @depth=0>],
# []]>,
# @group_stack=
# [#<PrettyPrint::Group:0x81fed78 @break=false, @breakables=[], @depth=0>],
# @indent=0,
# @maxwidth=79,
# @newline="\n",
# @output=#<IO:0x8114ee4>,
# @output_width=2>
#
##
# == Usage
#
# pp(obj) #=> obj
# pp obj #=> obj
# pp(obj1, obj2, ...) #=> [obj1, obj2, ...]
# pp() #=> nil
#
# Output <tt>obj(s)</tt> to <tt>$></tt> in pretty printed format.
#
# It returns <tt>obj(s)</tt>.
#
##
# == Output Customization
#
# To define a customized pretty printing function for your classes,
# redefine method <code>#pretty_print(pp)</code> in the class.
# Note that <code>require 'pp'</code> is needed before redefining <code>#pretty_print(pp)</code>.
#
# <code>#pretty_print</code> takes the +pp+ argument, which is an instance of the PP class.
# The method uses #text, #breakable, #nest, #group and #pp to print the
# object.
#
##
# == Pretty-Print JSON
#
# To pretty-print JSON refer to JSON#pretty_generate.
#
##
# == Author
# Tanaka Akira <akr@fsij.org>
class PP < PrettyPrint
# The version string
VERSION = "0.6.2"
# Returns the usable width for +out+.
# As the width of +out+:
# 1. If +out+ is assigned to a tty device, its width is used.
# 2. Otherwise, or it could not get the value, the +COLUMN+
# environment variable is assumed to be set to the width.
# 3. If +COLUMN+ is not set to a non-zero number, 80 is assumed.
#
# And finally, returns the above width value - 1.
# * This -1 is for Windows command prompt, which moves the cursor to
# the next line if it reaches the last column.
def PP.width_for(out)
begin
require 'io/console'
_, width = out.winsize
rescue LoadError, NoMethodError, SystemCallError
end
(width || ENV['COLUMNS']&.to_i&.nonzero? || 80) - 1
end
# Outputs +obj+ to +out+ in pretty printed format of
# +width+ columns in width.
#
# If +out+ is omitted, <code>$></code> is assumed.
# If +width+ is omitted, the width of +out+ is assumed (see
# width_for).
#
# PP.pp returns +out+.
def PP.pp(obj, out=$>, width=width_for(out))
q = new(out, width)
q.guard_inspect_key {q.pp obj}
q.flush
#$pp = q
out << "\n"
end
# Outputs +obj+ to +out+ like PP.pp but with no indent and
# newline.
#
# PP.singleline_pp returns +out+.
def PP.singleline_pp(obj, out=$>)
q = SingleLine.new(out)
q.guard_inspect_key {q.pp obj}
q.flush
out
end
# :stopdoc:
def PP.mcall(obj, mod, meth, *args, &block)
mod.instance_method(meth).bind_call(obj, *args, &block)
end
# :startdoc:
if defined? ::Ractor
class << self
# Returns the sharing detection flag as a boolean value.
# It is false (nil) by default.
def sharing_detection
Ractor.current[:pp_sharing_detection]
end
# Sets the sharing detection flag to b.
def sharing_detection=(b)
Ractor.current[:pp_sharing_detection] = b
end
end
else
@sharing_detection = false
class << self
# Returns the sharing detection flag as a boolean value.
# It is false by default.
attr_accessor :sharing_detection
end
end
# Module that defines helper methods for pretty_print.
module PPMethods
# Yields to a block
# and preserves the previous set of objects being printed.
def guard_inspect_key
if Thread.current[:__recursive_key__] == nil
Thread.current[:__recursive_key__] = {}.compare_by_identity
end
if Thread.current[:__recursive_key__][:inspect] == nil
Thread.current[:__recursive_key__][:inspect] = {}.compare_by_identity
end
save = Thread.current[:__recursive_key__][:inspect]
begin
Thread.current[:__recursive_key__][:inspect] = {}.compare_by_identity
yield
ensure
Thread.current[:__recursive_key__][:inspect] = save
end
end
# Check whether the object_id +id+ is in the current buffer of objects
# to be pretty printed. Used to break cycles in chains of objects to be
# pretty printed.
def check_inspect_key(id)
Thread.current[:__recursive_key__] &&
Thread.current[:__recursive_key__][:inspect] &&
Thread.current[:__recursive_key__][:inspect].include?(id)
end
# Adds the object_id +id+ to the set of objects being pretty printed, so
# as to not repeat objects.
def push_inspect_key(id)
Thread.current[:__recursive_key__][:inspect][id] = true
end
# Removes an object from the set of objects being pretty printed.
def pop_inspect_key(id)
Thread.current[:__recursive_key__][:inspect].delete id
end
private def guard_inspect(object)
recursive_state = Thread.current[:__recursive_key__]
if recursive_state && recursive_state.key?(:inspect)
begin
push_inspect_key(object)
yield
ensure
pop_inspect_key(object) unless PP.sharing_detection
end
else
guard_inspect_key do
push_inspect_key(object)
yield
end
end
end
# Adds +obj+ to the pretty printing buffer
# using Object#pretty_print or Object#pretty_print_cycle.
#
# Object#pretty_print_cycle is used when +obj+ is already
# printed, a.k.a the object reference chain has a cycle.
def pp(obj)
# If obj is a Delegator then use the object being delegated to for cycle
# detection
obj = obj.__getobj__ if defined?(::Delegator) and ::Delegator === obj
if check_inspect_key(obj)
group {obj.pretty_print_cycle self}
return
end
guard_inspect(obj) do
group do
obj.pretty_print self
rescue NoMethodError
text Kernel.instance_method(:inspect).bind_call(obj)
end
end
end
# A convenience method which is same as follows:
#
# group(1, '#<' + obj.class.name, '>') { ... }
def object_group(obj, &block) # :yield:
group(1, '#<' + obj.class.name, '>', &block)
end
# A convenience method, like object_group, but also reformats the Object's
# object_id.
def object_address_group(obj, &block)
str = Kernel.instance_method(:to_s).bind_call(obj)
str.chomp!('>')
group(1, str, '>', &block)
end
# A convenience method which is same as follows:
#
# text ','
# breakable
def comma_breakable
text ','
breakable
end
# Adds a separated list.
# The list is separated by comma with breakable space, by default.
#
# #seplist iterates the +list+ using +iter_method+.
# It yields each object to the block given for #seplist.
# The procedure +separator_proc+ is called between each yields.
#
# If the iteration is zero times, +separator_proc+ is not called at all.
#
# If +separator_proc+ is nil or not given,
# +lambda { comma_breakable }+ is used.
# If +iter_method+ is not given, :each is used.
#
# For example, following 3 code fragments has similar effect.
#
# q.seplist([1,2,3]) {|v| xxx v }
#
# q.seplist([1,2,3], lambda { q.comma_breakable }, :each) {|v| xxx v }
#
# xxx 1
# q.comma_breakable
# xxx 2
# q.comma_breakable
# xxx 3
def seplist(list, sep=nil, iter_method=:each) # :yield: element
sep ||= lambda { comma_breakable }
first = true
kwsplat = EMPTY_KWHASH
list.__send__(iter_method) {|*v|
if first
first = false
else
sep.call
end
kwsplat ? yield(*v, **kwsplat) : yield(*v)
}
end
EMPTY_KWHASH = if RUBY_VERSION >= "3.0"
{}.freeze
end
private_constant :EMPTY_KWHASH
# A present standard failsafe for pretty printing any given Object
def pp_object(obj)
object_address_group(obj) {
seplist(obj.pretty_print_instance_variables, lambda { text ',' }) {|v|
breakable
v = v.to_s if Symbol === v
text v
text '='
group(1) {
breakable ''
pp(obj.instance_eval(v))
}
}
}
end
# A pretty print for a Hash
def pp_hash(obj)
group(1, '{', '}') {
seplist(obj, nil, :each_pair) {|k, v|
group {
pp_hash_pair k, v
}
}
}
end
if RUBY_VERSION >= '3.4.'
# A pretty print for a pair of Hash
def pp_hash_pair(k, v)
if Symbol === k
sym_s = k.inspect
if sym_s[1].match?(/["$@!]/) || sym_s[-1].match?(/[%&*+\-\/<=>@\]^`|~]/)
text "#{k.to_s.inspect}:"
else
text "#{k}:"
end
else
pp k
text ' '
text '=>'
end
group(1) {
breakable
pp v
}
end
else
def pp_hash_pair(k, v)
pp k
text '=>'
group(1) {
breakable ''
pp v
}
end
end
end
include PPMethods
class SingleLine < PrettyPrint::SingleLine # :nodoc:
include PPMethods
end
module ObjectMixin # :nodoc:
# 1. specific pretty_print
# 2. specific inspect
# 3. generic pretty_print
# A default pretty printing method for general objects.
# It calls #pretty_print_instance_variables to list instance variables.
#
# If +self+ has a customized (redefined) #inspect method,
# the result of self.inspect is used but it obviously has no
# line break hints.
#
# This module provides predefined #pretty_print methods for some of
# the most commonly used built-in classes for convenience.
def pretty_print(q)
umethod_method = Object.instance_method(:method)
begin
inspect_method = umethod_method.bind_call(self, :inspect)
rescue NameError
end
if inspect_method && inspect_method.owner != Kernel
q.text self.inspect
elsif !inspect_method && self.respond_to?(:inspect)
q.text self.inspect
else
q.pp_object(self)
end
end
# A default pretty printing method for general objects that are
# detected as part of a cycle.
def pretty_print_cycle(q)
q.object_address_group(self) {
q.breakable
q.text '...'
}
end
# Returns a sorted array of instance variable names.
#
# This method should return an array of names of instance variables as symbols or strings as:
# +[:@a, :@b]+.
def pretty_print_instance_variables
instance_variables.sort
end
# Is #inspect implementation using #pretty_print.
# If you implement #pretty_print, it can be used as follows.
#
# alias inspect pretty_print_inspect
#
# However, doing this requires that every class that #inspect is called on
# implement #pretty_print, or a RuntimeError will be raised.
def pretty_print_inspect
if Object.instance_method(:method).bind_call(self, :pretty_print).owner == PP::ObjectMixin
raise "pretty_print is not overridden for #{self.class}"
end
PP.singleline_pp(self, ''.dup)
end
end
end
class Array # :nodoc:
def pretty_print(q) # :nodoc:
q.group(1, '[', ']') {
q.seplist(self) {|v|
q.pp v
}
}
end
def pretty_print_cycle(q) # :nodoc:
q.text(empty? ? '[]' : '[...]')
end
end
class Hash # :nodoc:
def pretty_print(q) # :nodoc:
q.pp_hash self
end
def pretty_print_cycle(q) # :nodoc:
q.text(empty? ? '{}' : '{...}')
end
end
class Set # :nodoc:
def pretty_print(pp) # :nodoc:
pp.group(1, '#<Set:', '>') {
pp.breakable
pp.group(1, '{', '}') {
pp.seplist(self) { |o|
pp.pp o
}
}
}
end
def pretty_print_cycle(pp) # :nodoc:
pp.text sprintf('#<Set: {%s}>', empty? ? '' : '...')
end
end
class << ENV # :nodoc:
def pretty_print(q) # :nodoc:
h = {}
ENV.keys.sort.each {|k|
h[k] = ENV[k]
}
q.pp_hash h
end
end
class Struct # :nodoc:
def pretty_print(q) # :nodoc:
q.group(1, sprintf("#<struct %s", PP.mcall(self, Kernel, :class).name), '>') {
q.seplist(PP.mcall(self, Struct, :members), lambda { q.text "," }) {|member|
q.breakable
q.text member.to_s
q.text '='
q.group(1) {
q.breakable ''
q.pp self[member]
}
}
}
end
def pretty_print_cycle(q) # :nodoc:
q.text sprintf("#<struct %s:...>", PP.mcall(self, Kernel, :class).name)
end
end
class Data # :nodoc:
def pretty_print(q) # :nodoc:
class_name = PP.mcall(self, Kernel, :class).name
class_name = " #{class_name}" if class_name
q.group(1, "#<data#{class_name}", '>') {
members = PP.mcall(self, Kernel, :class).members
values = []
members.select! do |member|
begin
values << __send__(member)
true
rescue NoMethodError
false
end
end
q.seplist(members.zip(values), lambda { q.text "," }) {|(member, value)|
q.breakable
q.text member.to_s
q.text '='
q.group(1) {
q.breakable ''
q.pp value
}
}
}
end
def pretty_print_cycle(q) # :nodoc:
q.text sprintf("#<data %s:...>", PP.mcall(self, Kernel, :class).name)
end
end if defined?(Data.define)
class Range # :nodoc:
def pretty_print(q) # :nodoc:
begin_nil = self.begin == nil
end_nil = self.end == nil
q.pp self.begin if !begin_nil || end_nil
q.breakable ''
q.text(self.exclude_end? ? '...' : '..')
q.breakable ''
q.pp self.end if !end_nil || begin_nil
end
end
class String # :nodoc:
def pretty_print(q) # :nodoc:
lines = self.lines
if lines.size > 1
q.group(0, '', '') do
q.seplist(lines, lambda { q.text ' +'; q.breakable }) do |v|
q.pp v
end
end
else
q.text inspect
end
end
end
class File < IO # :nodoc:
class Stat # :nodoc:
def pretty_print(q) # :nodoc:
require 'etc'
q.object_group(self) {
q.breakable
q.text sprintf("dev=0x%x", self.dev); q.comma_breakable
q.text "ino="; q.pp self.ino; q.comma_breakable
q.group {
m = self.mode
q.text sprintf("mode=0%o", m)
q.breakable
q.text sprintf("(%s %c%c%c%c%c%c%c%c%c)",
self.ftype,
(m & 0400 == 0 ? ?- : ?r),
(m & 0200 == 0 ? ?- : ?w),
(m & 0100 == 0 ? (m & 04000 == 0 ? ?- : ?S) :
(m & 04000 == 0 ? ?x : ?s)),
(m & 0040 == 0 ? ?- : ?r),
(m & 0020 == 0 ? ?- : ?w),
(m & 0010 == 0 ? (m & 02000 == 0 ? ?- : ?S) :
(m & 02000 == 0 ? ?x : ?s)),
(m & 0004 == 0 ? ?- : ?r),
(m & 0002 == 0 ? ?- : ?w),
(m & 0001 == 0 ? (m & 01000 == 0 ? ?- : ?T) :
(m & 01000 == 0 ? ?x : ?t)))
}
q.comma_breakable
q.text "nlink="; q.pp self.nlink; q.comma_breakable
q.group {
q.text "uid="; q.pp self.uid
begin
pw = Etc.getpwuid(self.uid)
rescue ArgumentError
end
if pw
q.breakable; q.text "(#{pw.name})"
end
}
q.comma_breakable
q.group {
q.text "gid="; q.pp self.gid
begin
gr = Etc.getgrgid(self.gid)
rescue ArgumentError
end
if gr
q.breakable; q.text "(#{gr.name})"
end
}
q.comma_breakable
q.group {
q.text sprintf("rdev=0x%x", self.rdev)
if self.rdev_major && self.rdev_minor
q.breakable
q.text sprintf('(%d, %d)', self.rdev_major, self.rdev_minor)
end
}
q.comma_breakable
q.text "size="; q.pp self.size; q.comma_breakable
q.text "blksize="; q.pp self.blksize; q.comma_breakable
q.text "blocks="; q.pp self.blocks; q.comma_breakable
q.group {
t = self.atime
q.text "atime="; q.pp t
q.breakable; q.text "(#{t.tv_sec})"
}
q.comma_breakable
q.group {
t = self.mtime
q.text "mtime="; q.pp t
q.breakable; q.text "(#{t.tv_sec})"
}
q.comma_breakable
q.group {
t = self.ctime
q.text "ctime="; q.pp t
q.breakable; q.text "(#{t.tv_sec})"
}
}
end
end
end
class MatchData # :nodoc:
def pretty_print(q) # :nodoc:
nc = []
self.regexp.named_captures.each {|name, indexes|
indexes.each {|i| nc[i] = name }
}
q.object_group(self) {
q.breakable
q.seplist(0...self.size, lambda { q.breakable }) {|i|
if i == 0
q.pp self[i]
else
if nc[i]
q.text nc[i]
else
q.pp i
end
q.text ':'
q.pp self[i]
end
}
}
end
end
if defined?(RubyVM::AbstractSyntaxTree)
class RubyVM::AbstractSyntaxTree::Node # :nodoc:
def pretty_print_children(q, names = [])
children.zip(names) do |c, n|
if n
q.breakable
q.text "#{n}:"
end
q.group(2) do
q.breakable
q.pp c
end
end
end
def pretty_print(q)
q.group(1, "(#{type}@#{first_lineno}:#{first_column}-#{last_lineno}:#{last_column}", ")") {
case type
when :SCOPE
pretty_print_children(q, %w"tbl args body")
when :ARGS
pretty_print_children(q, %w[pre_num pre_init opt first_post post_num post_init rest kw kwrest block])
when :DEFN
pretty_print_children(q, %w[mid body])
when :ARYPTN
pretty_print_children(q, %w[const pre rest post])
when :HSHPTN
pretty_print_children(q, %w[const kw kwrest])
else
pretty_print_children(q)
end
}
end
end
end
class Object < BasicObject # :nodoc:
include PP::ObjectMixin
end
[Numeric, Symbol, FalseClass, TrueClass, NilClass, Module].each {|c|
c.class_eval {
def pretty_print_cycle(q)
q.text inspect
end
}
}
[Numeric, FalseClass, TrueClass, Module].each {|c|
c.class_eval {
def pretty_print(q)
q.text inspect
end
}
}
module Kernel
# Returns a pretty printed object as a string.
#
# See the PP module for more information.
def pretty_inspect
PP.pp(self, ''.dup)
end
# prints arguments in pretty form.
#
# +#pp+ returns argument(s).
def pp(*objs)
objs.each {|obj|
PP.pp(obj)
}
objs.size <= 1 ? objs.first : objs
end
module_function :pp
end