C comment: improve statistics computation comment example

Discussion: https://postgr.es/m/CAKFQuwbD672Sc0EXv0ifx3pzfQ5UAEpiAeaBGKz_Ox-4d2NGCA@mail.gmail.com

Author: David G. Johnston

Backpatch-through: master
Bruce Momjian 2023-10-31 11:42:02 -04:00
parent 741ed2065c
commit b706172d22
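
As a rough illustration of the revised formula in the hunk below, this sketch computes the join selectivity for the case where both join columns appear unique. The function name `unique_join_selectivity` and its parameter names are hypothetical and are not PostgreSQL APIs; the inputs (10000 rows per table, null fraction zero) are the values used in the documentation example.

```python
def unique_join_selectivity(null_frac1, null_frac2, num_rows1, num_rows2):
    """Selectivity sketch for an equijoin where both columns look unique.

    Hypothetical helper that mirrors the documentation formula;
    it is not part of PostgreSQL.
    """
    # Fraction of row pairs in which neither join key is NULL.
    non_null = (1.0 - null_frac1) * (1.0 - null_frac2)
    # Divide by the row count of the larger relation.
    return non_null / max(num_rows1, num_rows2)


# Values from the documentation example: 10000 rows each, no NULLs.
selectivity = unique_join_selectivity(0.0, 0.0, 10000, 10000)
print(selectivity)
```

The sketch covers only the all-unique case; as the new text notes, the value is scaled in the non-unique case, which is not modeled here.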

@@ -389,18 +389,20 @@ tablename | null_frac | n_distinct | most_common_vals
 </programlisting>
 In this case there is no <acronym>MCV</acronym> information for
-<structfield>unique2</structfield> because all the values appear to be
-unique, so we use an algorithm that relies only on the number of
-distinct values for both relations together with their null fractions:
+<structname>unique2</structname> and all the values appear to be
+unique (n_distinct = -1), so we use an algorithm that relies on the row
+count estimates for both relations (num_rows, not shown, but "tenk")
+together with the column null fractions (zero for both):
 <programlisting>
-selectivity = (1 - null_frac1) * (1 - null_frac2) * min(1/num_distinct1, 1/num_distinct2)
+selectivity = (1 - null_frac1) * (1 - null_frac2) / max(num_rows1, num_rows2)
             = (1 - 0) * (1 - 0) / max(10000, 10000)
             = 0.0001
 </programlisting>
 This is, subtract the null fraction from one for each of the relations,
-and divide by the maximum of the numbers of distinct values.
+and divide by the row count of the larger relation (this value does get
+scaled in the non-unique case).
 The number of rows
 that the join is likely to emit is calculated as the cardinality of the
 Cartesian product of the two inputs, multiplied by the