End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails#141
@copilot |
Updated the PR description to show the wildcard case — calling |
Pull request overview
Adds public schema discovery methods under client.tables so SDK users can programmatically discover table columns (attributes) and relationship definitions, which is particularly helpful when authoring client.query.sql() queries.
Changes:
- Added `TableOperations.list_columns()`, `list_relationships()`, and `list_table_relationships()` public APIs that delegate to data-layer helpers.
- Implemented the corresponding data-layer methods `_ODataClient._list_columns()` and `_RelationshipOperationsMixin._list_relationships()` / `_list_table_relationships()`.
- Added unit tests and README examples for the new discovery APIs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/PowerPlatform/Dataverse/operations/tables.py` | Adds the three new public `client.tables.*` schema discovery methods and docstrings/examples. |
| `src/PowerPlatform/Dataverse/data/_odata.py` | Implements `_list_columns()` to fetch `EntityDefinitions({MetadataId})/Attributes`. |
| `src/PowerPlatform/Dataverse/data/_relationships.py` | Implements `_list_relationships()` and `_list_table_relationships()` (1:N + N:N) metadata discovery. |
| `tests/unit/test_tables_operations.py` | Verifies the new public methods delegate correctly to the underlying `_odata` methods. |
| `tests/unit/data/test_relationships.py` | Adds data-layer tests for `_list_relationships()` and `_list_table_relationships()`. |
| `tests/unit/data/test_list_columns.py` | New data-layer tests for `_ODataClient._list_columns()`. |
| `README.md` | Documents usage examples for the new schema discovery APIs. |
…ith tests and README examples Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
…me bridge, guardrails, examples
- Schema discovery APIs: list_columns, list_relationships, list_table_relationships
- SELECT * auto-expansion via list_columns (server blocks SELECT *)
- client.dataframe.sql() for SQL -> DataFrame bridge
- SQL safety guardrails: block writes, auto-inject TOP 5000, warn on leading-wildcard LIKE and implicit cross joins
- Comprehensive sql_examples.py (28 sections): JOINs, aggregates, GROUP BY, DISTINCT, OFFSET FETCH, polymorphic lookups, audit trail, SQL read -> DataFrame transform -> SDK write-back
- Updated query.sql() docstring with accurate SQL capabilities
- 732 tests (was 318), all passing
e9a25aa to
7f06533
Compare
…dd SELECT * + JOIN warning
Live-tested against Aurora VM (aurorabapenv71aff.crm10.dynamics.com):
- Server auto-caps at 5000 rows without TOP -> no client-side injection needed
- Server blocks SELECT * on both single-table and JOIN queries -> SDK expansion confirmed needed
- Added warning when SELECT * used with JOIN (expansion only includes first table columns)
732 tests passing.
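The SELECT * expansion described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `expand_select_star` and both regexes are hypothetical names, and the column list is passed in directly instead of being fetched via `list_columns`.

```python
from __future__ import annotations

import re
import warnings

# Hypothetical patterns for illustration; the SDK's real regexes may differ.
_SELECT_STAR_RE = re.compile(r"^\s*SELECT\s+(TOP\s+\d+\s+)?\*\s+FROM\b", re.IGNORECASE)
_JOIN_RE = re.compile(r"\bJOIN\b", re.IGNORECASE)


def expand_select_star(sql: str, columns: list[str]) -> str:
    """Rewrite a leading SELECT * into an explicit column list.

    `columns` stands in for the FROM table's column names, which the
    SDK is described as discovering from metadata.
    """
    match = _SELECT_STAR_RE.match(sql)
    if match is None:
        return sql  # not a SELECT * query, leave untouched
    if _JOIN_RE.search(sql):
        # Only the first (FROM) table is expanded, so warn the caller.
        warnings.warn(
            "SELECT * with JOIN expands only the first table's columns",
            UserWarning,
            stacklevel=2,
        )
    top = match.group(1) or ""  # preserve an optional TOP N clause
    return f"SELECT {top}{', '.join(columns)} FROM{sql[match.end():]}"
```

Queries that do not start with `SELECT *` pass through unchanged, so a helper like this could be applied to every query without a separate check.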
New methods on client.query for SQL-first developers:
- sql_columns(table) -> simplified column metadata list
- sql_select(table) -> comma-separated column list for SELECT
- sql_joins(table) -> all possible JOINs with ready-to-use clauses
- sql_join(from, to) -> auto-generated JOIN clause between tables
Key finding from live testing: SQL JOINs use the raw attribute name (e.g. parentcustomerid), NOT the _value suffix. The ReferencingAttribute from relationship metadata matches exactly.
13 new unit tests, 745 total passing.
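To illustrate the key finding, here is a hedged sketch of how a JOIN clause could be derived from one relationship metadata record. `build_join_clause` is a hypothetical helper, not the SDK's `sql_join`; it assumes the record carries the standard `ReferencingEntity`, `ReferencingAttribute`, `ReferencedEntity`, and `ReferencedAttribute` fields.

```python
from __future__ import annotations


def build_join_clause(
    relationship: dict,
    from_alias: str | None = None,
    to_alias: str | None = None,
) -> str:
    """Build a JOIN clause from a relationship metadata dict.

    The raw ReferencingAttribute (e.g. parentcustomerid) is used as-is,
    without the _value suffix seen in OData payloads.
    """
    child = relationship["ReferencingEntity"]    # table holding the lookup
    parent = relationship["ReferencedEntity"]    # table being looked up
    left = from_alias or child
    right = to_alias or parent
    target = f"{parent} {to_alias}" if to_alias else parent
    return (
        f"JOIN {target} ON {left}.{relationship['ReferencingAttribute']}"
        f" = {right}.{relationship['ReferencedAttribute']}"
    )


# Example record (values chosen for illustration).
contact_to_account = {
    "ReferencingEntity": "contact",
    "ReferencingAttribute": "parentcustomerid",
    "ReferencedEntity": "account",
    "ReferencedAttribute": "accountid",
}
print(build_join_clause(contact_to_account, from_alias="c", to_alias="a"))
```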
New methods on client.query for OData users (parallel to SQL helpers):
- odata_select(table) -> list[str] for records.get(select=)
- odata_expands(table) -> all navigation properties with entity sets
- odata_expand(from, to) -> PascalCase nav property name for expand=
- odata_bind(from, to, id) -> @odata.bind dict for create/update payloads
These eliminate the most error-prone parts of OData queries:
- No more guessing PascalCase navigation property names for expand=
- No more manually constructing @odata.bind with entity set names
- Column discovery matches records.get(select=) format directly
11 new unit tests, 756 total passing.
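For readers unfamiliar with the `@odata.bind` shape, a minimal sketch follows. `build_odata_bind` is a hypothetical function that takes the navigation property and entity set name as explicit arguments; the SDK's `odata_bind(from, to, id)` is described as discovering both from metadata instead. The GUID is a placeholder.

```python
from __future__ import annotations


def build_odata_bind(nav_property: str, entity_set: str, record_id: str) -> dict[str, str]:
    """Return a single-entry dict that associates a lookup on create/update.

    The key is '<nav property>@odata.bind' and the value is the
    entity-set path of the target record.
    """
    return {f"{nav_property}@odata.bind": f"/{entity_set}({record_id})"}


# Placeholder GUID for illustration only.
payload = {"firstname": "Ada"}
payload.update(
    build_odata_bind(
        "parentcustomerid_account",
        "accounts",
        "00000000-0000-0000-0000-000000000001",
    )
)
print(payload)
```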
_list_table_relationships now fetches all 3 relationship types:
- OneToManyRelationships (incoming: where table is referenced)
- ManyToOneRelationships (outgoing: where table has a lookup) <-- NEW
- ManyToManyRelationships
This was critical for sql_joins() and odata_expands() which need outgoing lookups. Without ManyToOne, custom table lookups were invisible to the helpers.
Also fixed: create_lookup_field requires lowercase logical names (not SchemaName) -- documented in copilot-instructions.
Live-validated: 47/47 SQL tests passed on Aurora VM. Unit tests: 756 passed.
Columns with AttributeOf set are auto-generated display names (e.g. createdbyname, createdbyyominame) that cause errors in OData \ and are not real data columns. Now filtered out. Also: _list_table_relationships docstring updated for ManyToOne. 756 unit tests passing.
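The AttributeOf filter can be pictured with a short sketch. `real_columns` is a hypothetical name, and the input is assumed to be a list of attribute metadata dicts with `LogicalName` and an optional `AttributeOf` field.

```python
from __future__ import annotations


def real_columns(attributes: list[dict]) -> list[str]:
    """Keep only columns that are not derived from another attribute.

    Entries with AttributeOf set (e.g. createdbyname, which belongs to
    createdby) are auto-generated display names and are dropped.
    """
    return [a["LogicalName"] for a in attributes if not a.get("AttributeOf")]


# Sample metadata for illustration.
sample = [
    {"LogicalName": "createdby", "AttributeOf": None},
    {"LogicalName": "createdbyname", "AttributeOf": "createdby"},
    {"LogicalName": "createdbyyominame", "AttributeOf": "createdby"},
    {"LogicalName": "name"},
]
print(real_columns(sample))
```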
… depth
- Moved write statement check (INSERT/UPDATE/DELETE) before _extract_logical_table() so users get the clear 'SQL endpoint is read-only' error instead of confusing 'no FROM clause' ValueError
- Live-tested JOIN depth: 8-table JOINs confirmed working on Aurora VM
- Live-tested nested polymorphic: customerid -> account -> owner -> businessunit chain works
- Fixed validate_sql_full.py: lookup existence check (no more 95s waits), COUNT assertion relaxed for accumulated data, AttributeOf filter
- 66/66 validation tests passed, 12/12 JOIN depth tests passed
- 756 unit tests passing
…vs OData comparison
New sections (27-31):
- 27: AND/OR, NOT IN, NOT LIKE boolean logic
- 28: Deep JOINs (5-8 tables) with built-in tables
- 29: SQL helper functions (sql_columns, sql_select, sql_joins, sql_join)
- 30: OData helper functions (odata_select, odata_expands, odata_expand, odata_bind)
- 31: SQL vs OData side-by-side comparison with live benchmark
- 32: Updated summary table with all new features
- 33: Cleanup
Summary table now includes: AND/OR, NOT IN/LIKE, 8+ table JOINs, nested polymorphic, self-JOIN, DISTINCT+JOIN, all helper functions.
SQL-first workflow updated with helper-driven steps.
756 unit tests passing.
…vel cap)
Live-tested on Aurora VM:
- 9-table: 2.14s PASS
- 10-table: 0.69s PASS (= OData limit)
- 11-table: 0.56s PASS (BEYOND OData limit)
- 12-table: 0.66s PASS
- 13-table: 0.63s PASS
- 15-table: 0.63s PASS (no degradation!)
Updated sql_examples.py summary: 15+ tables (no depth limit)
Updated SQL vs OData comparison table.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
New section 32: IMPORTANT Anti-Patterns & Best Practices
- Cartesian products (FROM a, b without ON) -- AVOID
- Leading-wildcard LIKE -- forces full table scans
- No filter on large system tables -- returns max rows
- SELECT * on wide tables -- transfers 260+ columns
- Deep JOINs without TOP -- always use TOP N
Also: toned down JOIN depth messaging to 'no limit (tested 15)' instead of promoting 15-table JOINs as a feature.
Fixed duplicate SQL vs OData table in examples.
34 sections, 1194 lines. 756 unit tests passing.
Implicit cross joins (FROM a, b without ON) now raise ValidationError instead of UserWarning. The server allows these but they produce cartesian products (N*M intermediate rows) that degrade shared DB.
SDK now blocks with clear error: 'Implicit cross join detected... Use explicit JOIN...ON syntax instead.'
New subcode: validation_sql_cross_join_blocked
SDK guardrail summary:
- INSERT/UPDATE/DELETE -> ValidationError (blocked)
- FROM a, b (cartesian) -> ValidationError (blocked) <-- upgraded
- LIKE '%value' -> UserWarning (performance advisory)
- SELECT * with JOIN -> UserWarning (partial expansion)
756 unit tests passing.
Principle: SDK should not be more restrictive than the server.
The server allows cartesian queries (FROM a, b) -- they work and return results (capped at 5000). The SDK warns users but does not block, because:
1. Small-table cartesian products are safe (3 rows * 3 rows = 9)
2. Server has its own resource governance (5000-row cap, timeouts)
3. Blocking what the server allows creates user frustration
Final guardrail design:
- ValidationError: INSERT/UPDATE/DELETE (server blocks too, but SDK catches earlier with clearer message)
- UserWarning: FROM a, b (cartesian), LIKE '%value', SELECT * + JOIN
756 unit tests passing.
All checks in one place (_sql_guardrails) for easy future bypass.
BLOCKED (ValidationError -- saves network round-trip):
- INSERT/UPDATE/DELETE/DROP/etc. (write statements)
- CROSS JOIN, RIGHT JOIN, FULL OUTER JOIN (unsupported join types)
- UNION / UNION ALL
- HAVING
- CTE (WITH ... AS)
- Subqueries (IN (SELECT ...), EXISTS (SELECT ...))
WARNED (UserWarning -- query executes, advisory only):
- LIKE '%value' (leading-wildcard, full table scan)
- FROM a, b (implicit cartesian, server allows but risky)
- SELECT * with JOIN (partial expansion)
Principle: block what server blocks (save time), warn what server allows but is risky (respect user intent).
14 new tests, 770 total passing.
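The block/warn split above could be organised along these lines. This is a sketch under assumptions, not the SDK's `_sql_guardrails`: `check_sql`, `SqlValidationError`, and every regex here are hypothetical, and the patterns are deliberately naive (they would also match keywords inside string literals).

```python
from __future__ import annotations

import re
import warnings


class SqlValidationError(ValueError):
    """Hypothetical stand-in for the SDK's ValidationError."""


# Constructs the server rejects anyway: fail fast, skip the round-trip.
_BLOCKED = [
    (re.compile(r"^\s*(?:INSERT|UPDATE|DELETE|DROP)\b", re.I), "SQL endpoint is read-only"),
    (re.compile(r"\b(?:CROSS|RIGHT|FULL)\s+(?:OUTER\s+)?JOIN\b", re.I), "unsupported join type"),
    (re.compile(r"\bUNION\b", re.I), "UNION is not supported"),
    (re.compile(r"\bHAVING\b", re.I), "HAVING is not supported"),
    (re.compile(r"^\s*WITH\b", re.I), "CTEs are not supported"),
    (re.compile(r"\(\s*SELECT\b", re.I), "subqueries are not supported"),
]

# Constructs the server allows but that are risky: advisory only.
_WARNED = [
    (re.compile(r"\bLIKE\s+'%", re.I), "leading-wildcard LIKE forces a full table scan"),
    (re.compile(r"\bFROM\s+\w+(?:\s+(?:AS\s+)?\w+)?\s*,", re.I), "implicit cross join (FROM a, b)"),
]
_SELECT_STAR = re.compile(r"^\s*SELECT\s+(?:TOP\s+\d+\s+)?\*", re.I)
_JOIN = re.compile(r"\bJOIN\b", re.I)


def check_sql(sql: str) -> None:
    """Raise for blocked constructs, emit UserWarning for risky ones."""
    for pattern, reason in _BLOCKED:
        if pattern.search(sql):
            raise SqlValidationError(reason)
    for pattern, reason in _WARNED:
        if pattern.search(sql):
            warnings.warn(reason, UserWarning, stacklevel=2)
    if _SELECT_STAR.search(sql) and _JOIN.search(sql):
        warnings.warn("SELECT * with JOIN: partial expansion", UserWarning, stacklevel=2)
```

Keeping both rule tables in one place mirrors the stated goal of a single guardrail entry point that is easy to bypass or extend later.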
…* + JOIN behavior
- SELECT * with JOIN only expands first (FROM) table columns
- Joined table columns are NOT included in expansion
- Added sql_select()/sql_columns() as recommended alternative
- Updated guardrail summary to mention all blocked patterns
1. Remove unused imports (patch, PropertyMock) from test_select_star_expansion
2. Fix JOIN depth '6+' -> 'no depth limit' in query.py docstring
3. Fix stale 'auto-inject TOP' comment in _odata.py
4. Fix Learn-incompatible docstring types in tables.py list_columns
5. Fix Learn-incompatible docstring types in tables.py list_relationships
6. Fix list_table_relationships docstring: add ManyToOne
7. Fix cross-join regex to catch unaliased FROM account, contact
8. Fix write regex to catch comment-prefixed writes (/* */ and --)
9. Narrow odata_expands exception handling (KeyError/AttributeError/ValueError)
10. Consistent JOIN depth claims across all docs
770 unit tests passing.
All 11 review comments addressed in commit a595a72:
770 unit tests passing.
…taverse API)
Dataverse Web API supports EntityDefinitions({id})/ManyToOneRelationships
as a standard metadata endpoint. ManyToOne is the outgoing lookup
perspective (child -> parent), complementing OneToMany (parent -> child).
- README: 'one-to-many + many-to-many' -> 'one-to-many + many-to-one + many-to-many'
- _relationships.py docstring updated to match
- copilot-instructions.md: added SKILL.md reference to governance list
770 unit tests passing.
…ributeOf test
Critical fixes:
- tables.py: Learn-incompatible ':class:`list` of :class:`str`' -> 'list[str]'
- _odata.py, _relationships.py, _error_codes.py: add __all__ = [] per SKILL.md
Important fixes:
- _odata.py: _delete_multiple param 'logical_name' -> 'table_schema_name'
- _odata.py: _lowercase_list docstring removed misleading \ mention
- _odata.py: _query_sql guardrails comment clarified (defense-in-depth)
New test:
- test_excludes_attribute_of_columns: verifies computed display-name columns (with AttributeOf set) are filtered from sql_columns()
771 unit tests passing.
…ng approach
CodeQL found exponential backtracking in _SQL_WRITE_RE which used nested quantifiers for SQL comment matching: (?:/\*.*?\*/\s*|...)*
Fix: separate comment stripping (_SQL_COMMENT_RE) from write detection.
1. Strip SQL comments with a safe non-backtracking regex
2. Check for write keywords with the simple anchored regex
The comment regex uses [^*]*\*+(?:[^/*][^*]*\*+)*/ which is the standard safe pattern for matching C-style block comments.
3 new tests for comment-prefixed write detection. 774 unit tests passing.
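The two-step approach can be sketched as below. The block-comment pattern is the one quoted in the commit message; the `--` line-comment alternative, the exact keyword list, and the function name `is_write_statement` are assumptions made for illustration.

```python
import re

# Step 1: comment stripping. The block-comment part is the pattern quoted
# above; the "--" line-comment alternative is an assumption.
_SQL_COMMENT_RE = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|--[^\n]*")

# Step 2: simple anchored write detection with no nested quantifiers.
# The keyword list is illustrative, not the SDK's actual list.
_SQL_WRITE_RE = re.compile(
    r"^\s*(?:INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|MERGE)\b",
    re.IGNORECASE,
)


def is_write_statement(sql: str) -> bool:
    """Return True if the statement starts with a write keyword once comments are removed."""
    stripped = _SQL_COMMENT_RE.sub(" ", sql)
    return _SQL_WRITE_RE.match(stripped) is not None


print(is_write_statement("/* audit */ DELETE FROM account"))
print(is_write_statement("SELECT name FROM account -- DELETE later"))
```

Because each regex is applied in a single linear pass, neither step relies on a repeated group wrapped around a lazy quantifier, which is the shape CodeQL flagged. One limitation of this sketch: a `--` or `/*` inside a string literal is also treated as a comment.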
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
- sql_joins(): fix docstring example to not use aliases with join_clause (join_clause uses full table name; use sql_join() for aliased queries)
- sql_examples: replace _value JOIN pattern with sql_join() helper
- list_table_relationships: docstring now mentions ManyToOne
- remove unused VALIDATION_SQL_CROSS_JOIN_BLOCKED import
- PR description: cartesian updated from ValidationError to UserWarning
SQL developers working with Dataverse shouldn't need to learn OData or the Web API. This PR closes that gap by making the Python SDK's SQL endpoint usable end-to-end -- from schema discovery, through querying (including SELECT *, JOINs, aggregates), to writing results back via DataFrames.
Live-validated against Aurora VM: 66/66 SQL tests + 15-table JOIN confirmed + all anti-patterns guarded.
What's new
Schema discovery APIs

`list_columns(table)`, `list_relationships()`, `list_table_relationships(table)` (OneToMany + ManyToOne + ManyToMany)

SELECT * auto-expansion

SDK transparently rewrites `SELECT *` into explicit column names (server blocks * directly). Warns when used with JOINs.

DataFrame SQL bridge

`client.dataframe.sql()` -- SQL to pandas DataFrame. Full bidirectional: SQL read -> pandas transform -> SDK write-back.

SQL helpers (`client.query`)

`sql_columns`, `sql_select`, `sql_joins`, `sql_join` -- auto-discover columns and JOINs from metadata.

OData helpers (`client.query`)

`odata_select`, `odata_expands`, `odata_expand`, `odata_bind` -- auto-discover nav properties, build @odata.bind.

SQL safety guardrails (3 layers of protection)

Layer 1 -- Server rejects (user cannot do these at all):
- `JOIN` without `ON` -- server error
- `INSERT/UPDATE/DELETE` -- server error
- `SELECT *` -- server error (SDK rewrites transparently)
- `TOP > 5000` -- server error
- `RIGHT/FULL OUTER/CROSS JOIN` -- server error

Layer 2 -- SDK blocks or warns (before hitting the server):
- `INSERT/UPDATE/DELETE` -- `ValidationError` with clear message
- `FROM a, b` (cartesian) -- `UserWarning` (server allows but produces N*M intermediate rows)
- `LIKE '%value'` -- `UserWarning` (full table scan advisory)
- `SELECT * with JOIN` -- `UserWarning` (partial expansion advisory)

Layer 3 -- Documentation warns (sql_examples.py Section 32):
Anti-patterns and best practices (Section 32 of examples)
Dedicated section documenting 5 SQL anti-patterns with clear explanations of WHY they're dangerous (shared database impact) and HOW to fix them. The SDK's guardrails automatically catch patterns 1 and 2.
Key findings from live testing
- SQL JOINs use the raw attribute name (e.g. `parentcustomerid`), not the `_value` suffix
- … limit (SQL JOINs are flat, not nested)

Test coverage

Comprehensive examples (`sql_examples.py` -- 34 sections, 1194 lines)