gh-146306: Optimize float operations by mutating uniquely-referenced operands in place (JIT only) #146307
Open
eendebakpt wants to merge 3 commits into python:main from
…y-referenced operands in place
When the tier 2 optimizer can prove that an operand to a float
operation is uniquely referenced (refcount 1), mutate it in place
instead of allocating a new PyFloatObject.
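A minimal sketch of the idea in plain C, assuming a hypothetical `FloatBox` type standing in for `PyFloatObject` (a reference count plus a `double`). The generic path allocates a new result; the in-place path reuses the left operand when it holds the only reference. In the PR the uniqueness is proven by the tier 2 optimizer ahead of time; the run-time `refcnt == 1` check in `float_add` is only there to keep the sketch self-contained, and all names are illustrative.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PyFloatObject: a reference count and a double. */
typedef struct {
    long refcnt;
    double fval;
} FloatBox;

/* Counts allocations so the two paths can be compared. */
static int float_allocs = 0;

static FloatBox *float_new(double v) {
    FloatBox *b = malloc(sizeof *b);
    if (b == NULL) {
        abort();
    }
    b->refcnt = 1;
    b->fval = v;
    float_allocs++;
    return b;
}

static void float_decref(FloatBox *b) {
    if (--b->refcnt == 0) {
        free(b);
    }
}

/* Generic path: allocate a fresh result, then drop both operand references. */
static FloatBox *float_add_generic(FloatBox *l, FloatBox *r) {
    FloatBox *res = float_new(l->fval + r->fval);
    float_decref(l);
    float_decref(r);
    return res;
}

/* In-place path: valid only when l is uniquely referenced. The caller's
   reference to l becomes the reference to the result, so nothing is
   allocated and l is not decref'd. */
static FloatBox *float_add_inplace_left(FloatBox *l, FloatBox *r) {
    l->fval = l->fval + r->fval;
    float_decref(r);
    return l;
}

/* Illustrative dispatcher; the real optimizer makes this choice statically. */
static FloatBox *float_add(FloatBox *l, FloatBox *r) {
    if (l->refcnt == 1) {
        return float_add_inplace_left(l, r);
    }
    return float_add_generic(l, r);
}
```

With a unique left operand, `float_add` returns the same object and `float_allocs` does not change; once a second reference exists, it falls back to allocating a new box and leaves the shared operand untouched.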
New tier 2 micro-ops:
- _BINARY_OP_{ADD,SUBTRACT,MULTIPLY}_FLOAT_INPLACE (unique LHS)
- _BINARY_OP_{ADD,SUBTRACT,MULTIPLY}_FLOAT_INPLACE_RIGHT (unique RHS)
- _UNARY_NEGATIVE_FLOAT_INPLACE (unique operand)
Speeds up the pyperformance nbody benchmark by ~19%.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
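The LHS/RHS split in the micro-op list above can be sketched as a selection rule. This is a hypothetical simplification, not the code in optimizer_bytecodes.c: it assumes both operands are already known floats and that a unique LHS is tried first when both are unique. The enum and function names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical labels for the three shapes of each float binary op. */
typedef enum {
    FLOAT_OP_GENERIC,        /* allocate a new float for the result */
    FLOAT_OP_INPLACE,        /* e.g. _BINARY_OP_SUBTRACT_FLOAT_INPLACE */
    FLOAT_OP_INPLACE_RIGHT   /* e.g. _BINARY_OP_SUBTRACT_FLOAT_INPLACE_RIGHT */
} FloatOpVariant;

/* Assumed rule: mutate a unique LHS if possible, else a unique RHS,
   else keep the allocating op. Both operands are known floats. */
static FloatOpVariant choose_variant(bool left_unique, bool right_unique) {
    if (left_unique) {
        return FLOAT_OP_INPLACE;
    }
    if (right_unique) {
        return FLOAT_OP_INPLACE_RIGHT;
    }
    return FLOAT_OP_GENERIC;
}

/* The _RIGHT variant of subtract still computes left - right;
   only the destination changes to the right operand's storage. */
static void subtract_into_right(double left, double *right_fval) {
    double value = left - *right_fval;
    *right_fval = value;
}
```

The point of `subtract_into_right` is that the _RIGHT variant does not need the operation to be commutative: the operand order of the arithmetic is unchanged, only the storage that receives the result differs.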
Avoid compound assignment (+=, -=, *=) directly on ob_fval in inplace float ops. On 32-bit Windows, this generates JIT stencils with _xmm register references that MSVC cannot parse. Instead, read into a local double, compute, and write back.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
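The rewrite described in this commit has roughly the following shape. `FloatPayload`, `FloatOp`, and `float_inplace_binop` are hypothetical names, and the `switch` only keeps the sketch short (each real micro-op handles a single operation). The sketch does not demonstrate which form a given compiler turns into `_xmm` references; it only shows the pattern: load `ob_fval` into a local, compute, store once.

```c
/* Hypothetical stand-in for the float object's payload. */
typedef struct {
    double ob_fval;
} FloatPayload;

typedef enum { OP_ADD, OP_SUBTRACT, OP_MULTIPLY } FloatOp;

/* Avoided form:  dst->ob_fval += other;  (compound assignment on the field)
   Used form:     read into a local double, compute, write back once. */
static void float_inplace_binop(FloatPayload *dst, FloatOp op, double other) {
    double value = dst->ob_fval;          /* read */
    switch (op) {                         /* compute on the local */
    case OP_ADD:
        value = value + other;
        break;
    case OP_SUBTRACT:
        value = value - other;
        break;
    case OP_MULTIPLY:
        value = value * other;
        break;
    }
    dst->ob_fval = value;                 /* write back */
}
```

Both forms compute the same value; the change only affects how the load, arithmetic, and store are expressed in the source.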
Add -mno-sse to clang args for i686-pc-windows-msvc target. The COFF32 stencil converter cannot handle _xmm register references that clang emits for inline float arithmetic. Using x87 FPU instructions avoids this. SSE is optional on 32-bit x86; x87 is the baseline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Selected pyperformance benchmarks:
    r = right;
    if (PyJitRef_IsUnique(left)) {
        ADD_OP(_BINARY_OP_SUBTRACT_FLOAT_INPLACE, 0, 0);
        l = PyJitRef_Borrow(left);
Isn't it more correct to write `l = sym_new_null(ctx);`? Same for the cases below?
Comment on lines +804 to +806
    res = left;
    l = PyStackRef_NULL;
    r = right;
This part is a little confusing. Could you please add a comment explaining it? Adding it here is enough, as the other cases follow the same pattern.
Suggested change
    // Transfer ownership of left to res.
    // Original left is now dead.
    res = left;
    l = PyStackRef_NULL;
    r = right;
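The ownership rule in the suggested comment can be illustrated with a minimal sketch. The `Box` and `StackRef` types and the function names are hypothetical simplifications, not CPython's `_PyStackRef` API. The sketch shows why `l` is set to a null reference once `res` has taken over `left`: the later pop of `l` would otherwise release the reference that `res` now owns.

```c
#include <stdlib.h>

/* Hypothetical refcounted float box and simplified stack reference.
   A NULL obj means "no reference held". Not CPython's _PyStackRef API. */
typedef struct {
    long refcnt;
    double fval;
} Box;

typedef struct {
    Box *obj;
} StackRef;

static const StackRef STACKREF_NULL = { NULL };

/* Popping a reference releases it; popping a NULL reference does nothing. */
static void stackref_close(StackRef ref) {
    if (ref.obj != NULL && --ref.obj->refcnt == 0) {
        free(ref.obj);
    }
}

/* In-place add on a uniquely referenced left operand. The outputs mirror
   the snippet above: res, plus l and r, which a following pop closes. */
static void add_float_inplace(StackRef left, StackRef right,
                              StackRef *res, StackRef *l, StackRef *r) {
    left.obj->fval = left.obj->fval + right.obj->fval;
    /* Transfer ownership of left to res. Original left is now dead. */
    *res = left;
    *l = STACKREF_NULL;   /* closing l later must not drop res's reference */
    *r = right;           /* right is still owned and released by its pop */
}
```

After the op, closing `l` is a no-op and closing `r` frees the right operand, while `res` keeps the single reference to the mutated left box.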
We can add the following tier 2 micro-ops that mutate the uniquely-referenced operand:
- `_BINARY_OP_ADD_FLOAT_INPLACE` / `_INPLACE_RIGHT`: unique LHS / RHS
- `_BINARY_OP_SUBTRACT_FLOAT_INPLACE` / `_INPLACE_RIGHT`: unique LHS / RHS
- `_BINARY_OP_MULTIPLY_FLOAT_INPLACE` / `_INPLACE_RIGHT`: unique LHS / RHS
- `_UNARY_NEGATIVE_FLOAT_INPLACE`: unique operand

The `_RIGHT` variants are used when only the RHS is unique. They cover the commutative ops (add, multiply) and also subtract, where the result `left - right` is written into the right operand. The optimizer emits these in `optimizer_bytecodes.c` when `PyJitRef_IsUnique(left)` or `PyJitRef_IsUnique(right)` is true and the operand is a known float. The mutated operand is marked as borrowed, so the following `_POP_TOP` becomes `_POP_TOP_NOP`.

Micro-benchmarks:
- `total += a*b + c`
- `total += a + b`
- `total += a*b + c*d`

pyperformance nbody (20k iterations):
Follow-up
Some operations that could be added in follow-up PRs: division of floats, operations between a float and an int where the float is uniquely referenced, integer operations (more involved because of small ints and the variable number of digits), and operations on complex numbers.
Script