[SPARK-57249][SQL] Skip the dead result null assignment in DecimalDivideWithOverflowCheck codegen when overflow cannot produce null#56304

Open
LuciferYang wants to merge 1 commit into apache:master from LuciferYang:decimaldivide-codegen

@LuciferYang
Contributor

What changes were proposed in this pull request?

This is a sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen).

DecimalDivideWithOverflowCheck.doGenCode always emitted the result null check after the toPrecision call in the non-null-left branch:

if (${eval1.isNull}) {
  $nullHandling
} else {
  ${ev.value} = ${eval1.value}.$decimalMethod(${eval2.value}).toPrecision(
      ${dataType.precision}, ${dataType.scale}, Decimal.ROUND_HALF_UP(), $nullOnOverflow, $errorContextCode);
  ${ev.isNull} = ${ev.value} == null;
}

The result can only become null in the nullOnOverflow path: Decimal.toPrecision returns null only when nullOnOverflow is true and overflow occurs; otherwise it throws on overflow and never returns null. On this branch ev.isNull is already initialized from eval1.isNull (which is false here, since the branch runs only when the left operand is non-null). So when nullOnOverflow is false, ev.value == null always evaluates to false, and the assignment reduces to a redundant ev.isNull = false; write that can be dropped.
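
To make the argument above concrete, here is a minimal sketch that models the toPrecision contract in plain Java. It is not Spark code: the class ToPrecisionSketch, its BigDecimal-based toPrecision, and the isNullAfterDivide helper are hypothetical stand-ins for Decimal and for the generated non-null-left branch, and the precision check is a simplification of what Decimal actually does.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical model of the non-null-left branch; not Spark's implementation.
final class ToPrecisionSketch {

    // Stand-in for Decimal.toPrecision: returns null only when nullOnOverflow
    // is true and the rounded value does not fit; otherwise throws on overflow.
    static BigDecimal toPrecision(BigDecimal v, int precision, int scale, boolean nullOnOverflow) {
        BigDecimal rounded = v.setScale(scale, RoundingMode.HALF_UP);
        if (rounded.precision() > precision) {
            if (nullOnOverflow) {
                return null;
            }
            throw new ArithmeticException(
                rounded + " does not fit in DECIMAL(" + precision + "," + scale + ")");
        }
        return rounded;
    }

    // Mirrors the branch taken when the left operand is non-null.
    static boolean isNullAfterDivide(
            BigDecimal left, BigDecimal right, int precision, int scale, boolean nullOnOverflow) {
        boolean isNull = false; // initialized from eval1.isNull, which is false on this branch
        BigDecimal value = toPrecision(
            left.divide(right, MathContext.DECIMAL128), precision, scale, nullOnOverflow);
        isNull = value == null; // the write this PR gates; it can only flip when nullOnOverflow is true
        return isNull;
    }

    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal big = new BigDecimal("1000");
        // Overflow with nullOnOverflow = true: the null check is live.
        System.out.println(isNullAfterDivide(big, one, 3, 1, true));
        // Overflow with nullOnOverflow = false: an exception is thrown before the check runs.
        try {
            isNullAfterDivide(big, one, 3, 1, false);
        } catch (ArithmeticException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

In this model the assignment after toPrecision either never executes (the call throws) or stores false when nullOnOverflow is false, which is the dead write described above.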

This PR gates the assignment on nullOnOverflow:

val setNull = if (nullOnOverflow) s"${ev.isNull} = ${ev.value} == null;" else ""

nullOnOverflow is the right predicate: nullable = nullOnOverflow, and on this branch the left operand is non-null, so child nullability is irrelevant. When nullOnOverflow is true the assignment is kept (it is what turns a division overflow into a null result). The null-left handling and the throw-on-overflow path are unchanged.
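
The effect of the gate on the emitted text can be sketched with a small string builder. This is an illustration only, written in Java rather than in the Scala of doGenCode: the class DivideCodegenSketch, the variable names passed to it, the $div method name, and the errCtx argument are all hypothetical placeholders for what the real template interpolates.

```java
// Hypothetical illustration of the gated template; not the actual doGenCode.
final class DivideCodegenSketch {

    // Builds the body of the non-null-left branch as a string.
    static String genNonNullBranch(
            String isNull, String value, String left, String right,
            int precision, int scale, boolean nullOnOverflow) {
        // Emit the null check only when overflow can actually produce null.
        String setNull = nullOnOverflow ? isNull + " = " + value + " == null;" : "";
        String assign = value + " = " + left + ".$div(" + right + ").toPrecision("
            + precision + ", " + scale + ", Decimal.ROUND_HALF_UP(), "
            + nullOnOverflow + ", errCtx);";
        return setNull.isEmpty() ? assign : assign + "\n" + setNull;
    }

    public static void main(String[] args) {
        System.out.println("// nullOnOverflow = false");
        System.out.println(genNonNullBranch("isNull_0", "value_0", "left_0", "right_0", 38, 6, false));
        System.out.println("// nullOnOverflow = true");
        System.out.println(genNonNullBranch("isNull_0", "value_0", "left_0", "right_0", 38, 6, true));
    }
}
```

With nullOnOverflow = false the branch consists of the toPrecision assignment alone; with nullOnOverflow = true the null check follows it, as before this change.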

Why are the changes needed?

To reduce the size of the generated Java in whole-stage codegen, as tracked by SPARK-56908. The expression is produced by decimal Average (DecimalDivideWithOverflowCheck(sum, count, resultType, ctx, evalMode != ANSI)), so under ANSI mode (the default) nullOnOverflow is false and this dead write was emitted on the decimal avg codegen path.

This completes the decimal dead-write cleanup that also covers MakeDecimal and CheckOverflow (the remaining == null checks in decimalExpressions.scala, e.g. CheckOverflowInSum, are live and not removable).

Does this PR introduce any user-facing change?

No. This is a codegen-only change; eval, nullable, dataType, toString, and results are unchanged, so SQL output and plan/golden files are unaffected.

How was this patch tested?

Adds a direct unit test in DecimalExpressionSuite (the expression previously had no direct test), covering nullOnOverflow false/true with no overflow, with an overflowing result (null when nullOnOverflow, error otherwise), and a null left operand (null vs error). checkEvaluation runs interpreted and codegen (mutable and unsafe projections). Also ran the decimal AVG aggregate tests in DataFrameAggregateSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

…ideWithOverflowCheck codegen when overflow cannot produce null

DecimalDivideWithOverflowCheck.doGenCode always emitted
`ev.isNull = ev.value == null;` after the `toPrecision` call in the
non-null-left branch. The result can only become null in the nullOnOverflow
path: `Decimal.toPrecision` returns null only when `nullOnOverflow` is true and
overflow occurs; otherwise it throws on overflow and never returns null. On
that branch `ev.isNull` is already initialized from `eval1.isNull` (false,
since the branch runs only when the left operand is non-null). So when
nullOnOverflow is false the assignment is a dead `ev.isNull = false` write.

Gate the assignment on nullOnOverflow. This drops the dead write for the common
ANSI path (the expression is produced by decimal Average, with
nullOnOverflow = evalMode != ANSI, so false by default). Behavior is unchanged.
Also adds a direct unit test, which the expression previously lacked.
@HyukjinKwon
Member

cc @gengliangwang
