feat: Add bigframes.pandas.job_history() API to track BigQuery jobs#2435
@dataclasses.dataclass
class JobMetadata:
Can we add a static factory method to build this from a BigQuery SDK query job object?
Done! Added a from_job classmethod (and from_row_iterator) to handle building the metadata object directly from the jobs.
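A minimal sketch of what such a factory could look like, assuming the job object exposes the attribute names used by the BigQuery client's `QueryJob` (`job_id`, `location`, `created`, `started`, `ended`, `total_bytes_processed`, `cache_hit`, `error_result`, `query`). The field subset and the `getattr` fallbacks are illustrative assumptions rather than the PR's actual code, and `from_row_iterator` is not shown.

```python
import dataclasses
import datetime
from types import SimpleNamespace
from typing import Any, Mapping, Optional


@dataclasses.dataclass
class JobMetadata:
    # Illustrative subset of fields; the PR's real field list may differ.
    job_id: Optional[str] = None
    location: Optional[str] = None
    created: Optional[datetime.datetime] = None
    started: Optional[datetime.datetime] = None
    ended: Optional[datetime.datetime] = None
    total_bytes_processed: Optional[int] = None
    error_result: Optional[Mapping[str, Any]] = None
    cached: Optional[bool] = None
    query: Optional[str] = None

    @classmethod
    def from_job(cls, job: Any) -> "JobMetadata":
        """Build metadata from a QueryJob-like object.

        getattr() with a None default tolerates job types (for example
        load or copy jobs) that lack query-specific attributes.
        """
        return cls(
            job_id=getattr(job, "job_id", None),
            location=getattr(job, "location", None),
            created=getattr(job, "created", None),
            started=getattr(job, "started", None),
            ended=getattr(job, "ended", None),
            total_bytes_processed=getattr(job, "total_bytes_processed", None),
            error_result=getattr(job, "error_result", None),
            # QueryJob calls this attribute cache_hit.
            cached=getattr(job, "cache_hit", None),
            query=getattr(job, "query", None),
        )


# Usage with a stand-in object; a real call would pass a bigquery.QueryJob.
fake_job = SimpleNamespace(job_id="job_123", location="US", cache_hit=True, query="SELECT 1")
meta = JobMetadata.from_job(fake_job)
```

Duck typing keeps the factory testable without network access, at the cost of silently producing `None` for misspelled attribute names.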
    error_result: Optional[Mapping[str, Any]] = None
    cached: Optional[bool] = None
    job_url: Optional[str] = None
    query: Optional[str] = None
I worry that, at some point, storing all the query text generated by the session might clog up memory.
Good point! To prevent memory bloat during long sessions, I've added truncation that caps each stored query string at 1024 characters.
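A small sketch of the truncation described above. The 1024-character cap comes from the comment; the `"..."` marker and the helper name `truncate_query` are assumptions for illustration. Note that this bounds the size of each stored entry, while total memory still grows with the number of jobs recorded.

```python
from typing import Optional

# Cap taken from the review thread; the suffix marker is an assumption.
MAX_QUERY_CHARS = 1024
TRUNCATION_SUFFIX = "..."


def truncate_query(query: Optional[str], max_chars: int = MAX_QUERY_CHARS) -> Optional[str]:
    """Return the query unchanged if it fits, else a marked prefix of max_chars."""
    if query is None or len(query) <= max_chars:
        return query
    if max_chars <= len(TRUNCATION_SUFFIX):
        # Too small to hold a marker; fall back to a plain prefix.
        return query[:max_chars]
    # Reserve room for the suffix so the result never exceeds max_chars.
    return query[: max_chars - len(TRUNCATION_SUFFIX)] + TRUNCATION_SUFFIX
```

Keeping a visible marker lets a reader of the history tell a truncated query apart from one that was genuinely short.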
sycai left a comment
I'm concerned about placing job_history under the bigframes.pandas package. We might consider the bigframes module or Session instances as its home instead, mainly because functionality under bigframes.ml and bigframes.bigquery can also trigger jobs, and those do not belong to bigframes.pandas.
Migration Notice: This library is moving to the google-cloud-python monorepo soon. We closed this PR due to inactivity to ensure a clean migration. Please re-open this work in the new monorepo once the migration is complete!
Branch updated from 2fbbfa1 to 8d3b0c5.
I agree. I've moved it entirely out of bigframes.pandas. The API is now named execution_history() to reflect its broader scope, and it is exposed both at the root module (bigframes.execution_history()) and on the Session instance.
This PR is not ready for review. I need it for Colab notebook testing.
This PR introduces a new function bigframes.pandas.job_history() that allows users to retrieve a pandas DataFrame listing the BigQuery jobs initiated by BigFrames in the current Python session. This provides visibility into the underlying BigQuery execution, including query text, resource usage, and job duration, which is invaluable for monitoring and optimization.
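A sketch of how per-job records could be flattened into rows with a derived duration column, assuming each record carries `started` and `ended` datetimes as the BigQuery client's job objects do. The helper names and the `duration_seconds` column are assumptions; the output is a list of dicts that `pandas.DataFrame(rows)` could consume.

```python
import datetime
from typing import Any, Dict, Iterable, List, Optional


def duration_seconds(
    started: Optional[datetime.datetime], ended: Optional[datetime.datetime]
) -> Optional[float]:
    """Wall-clock job duration, or None if the job has not started or finished."""
    if started is None or ended is None:
        return None
    return (ended - started).total_seconds()


def to_rows(jobs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy each job record and add a duration_seconds column."""
    rows = []
    for job in jobs:
        row = dict(job)
        row["duration_seconds"] = duration_seconds(job.get("started"), job.get("ended"))
        rows.append(row)
    return rows
```

Computing duration at read time rather than storing it keeps the recorded metadata minimal and avoids stale values for jobs still running when they were first recorded.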
Key Changes:
Usage Example:
Verified in a VS Code notebook: screen/8u2yhaRV9iHbDbF
Fixes #<481840739> 🦕