
End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails#141

Open
Copilot wants to merge 21 commits into main from
copilot/add-public-api-for-metadata

Copilot AI commented Mar 13, 2026

SQL developers working with Dataverse shouldn't need to learn OData or the Web API. This PR closes that gap by making the Python SDK's SQL endpoint usable end-to-end -- from schema discovery, through querying (including SELECT *, JOINs, aggregates), to writing results back via DataFrames.

Live-validated against Aurora VM: 66/66 SQL tests + 15-table JOIN confirmed + all anti-patterns guarded.

What's new

Schema discovery APIs

  • list_columns(table), list_relationships(), list_table_relationships(table) (OneToMany + ManyToOne + ManyToMany)

SELECT * auto-expansion

SDK transparently rewrites SELECT * into explicit column names (server blocks * directly). Warns when used with JOINs.
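
The rewrite described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `expand_select_star` and both regexes are hypothetical names, and the column list is assumed to have been fetched already (for example via list_columns).

```python
import re
import warnings

# Hypothetical pattern: a leading "SELECT *", optionally with DISTINCT and/or TOP n.
_SELECT_STAR_RE = re.compile(
    r"^(\s*SELECT\s+(?:DISTINCT\s+)?(?:TOP\s+\d+\s+)?)\*(?=\s+FROM\b)",
    re.IGNORECASE,
)
_JOIN_RE = re.compile(r"\bJOIN\b", re.IGNORECASE)


def expand_select_star(sql: str, columns: list[str]) -> str:
    """Replace a leading SELECT * with an explicit column list.

    `columns` is assumed to hold the FROM table's column names only.
    """
    if not _SELECT_STAR_RE.match(sql):
        return sql  # no wildcard to expand
    if _JOIN_RE.search(sql):
        # Only the FROM table's columns are known here, so the expansion is partial.
        warnings.warn(
            "SELECT * with JOIN expands only the FROM table's columns",
            UserWarning,
            stacklevel=2,
        )
    # A function replacement avoids backslash-escape surprises in column names.
    return _SELECT_STAR_RE.sub(
        lambda m: m.group(1) + ", ".join(columns), sql, count=1
    )


print(expand_select_star("SELECT TOP 5 * FROM account", ["accountid", "name"]))
# SELECT TOP 5 accountid, name FROM account
```

Queries that already name their columns pass through unchanged, so the function is safe to apply to every statement.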

DataFrame SQL bridge

client.dataframe.sql() -- SQL to pandas DataFrame. Full bidirectional: SQL read -> pandas transform -> SDK write-back.

SQL helpers (client.query)

sql_columns, sql_select, sql_joins, sql_join -- auto-discover columns and JOINs from metadata.

OData helpers (client.query)

odata_select, odata_expands, odata_expand, odata_bind -- auto-discover nav properties, build @odata.bind.

SQL safety guardrails (3 layers of protection)

Layer 1 -- Server rejects (user cannot do these at all):

  • JOIN without ON -- server error
  • INSERT/UPDATE/DELETE -- server error
  • SELECT * -- server error (SDK rewrites transparently)
  • TOP > 5000 -- server error
  • RIGHT/FULL OUTER/CROSS JOIN -- server error
  • Subqueries -- server error

Layer 2 -- SDK blocks or warns (before hitting the server):

  • INSERT/UPDATE/DELETE -- ValidationError with clear message
  • FROM a, b (cartesian) -- UserWarning (server allows but produces N*M intermediate rows)
  • LIKE '%value' -- UserWarning (full table scan advisory)
  • SELECT * with JOIN -- UserWarning (partial expansion advisory)

Layer 3 -- Documentation warns (sql_examples.py Section 32):

  • Cartesian products, leading-wildcard LIKE, unfiltered system tables, SELECT * on wide tables, deep JOINs without TOP

Anti-patterns and best practices (Section 32 of examples)

Dedicated section documenting 5 SQL anti-patterns with clear explanations of WHY they're dangerous (shared database impact) and HOW to fix them. The SDK's guardrails automatically catch patterns 1 and 2.

Key findings from live testing

  • SQL JOINs use raw attribute name (parentcustomerid), not _value suffix
  • No server depth limit on JOINs -- tested up to 15 tables (all PASS, ~0.6s)
  • JOINs with proper ON clauses use indexed FK lookups -- safe for production
  • Server enforces: 5000-row cap, ON clause required for JOINs, blocks write SQL
  • This exceeds OData's 10-level $expand limit (SQL JOINs are flat, not nested)

Test coverage

  • 756 unit tests passing
  • 66 live validation tests (custom + built-in tables + all helpers)
  • 15-table JOIN depth confirmed on Aurora VM
  • SQL vs OData comparison with live benchmark in examples

Comprehensive examples (sql_examples.py -- 34 sections, 1194 lines)

Copilot AI changed the title [WIP] Add public API for discovering entity/table column definitions Add public schema discovery APIs: list_columns, list_relationships, list_table_relationships Mar 13, 2026
Copilot AI requested a review from saurabhrb March 13, 2026 16:22
@saurabhrb

@copilot
Update the PR description to also show the case where no columns are selected, i.e., with the wildcard *.


Copilot AI commented Mar 13, 2026

@copilot
Update the PR description to also show the case where no columns are selected, i.e., with the wildcard *.

Updated the PR description to show the wildcard case — calling list_columns("account") with no select parameter returns all column properties (equivalent to *). The description now clearly distinguishes between "all properties" and "narrowed via select".

@saurabhrb saurabhrb marked this pull request as ready for review March 20, 2026 23:44
@saurabhrb saurabhrb requested a review from a team as a code owner March 20, 2026 23:44
Copilot AI review requested due to automatic review settings March 20, 2026 23:44

Copilot AI left a comment

Pull request overview

Adds public schema discovery methods under client.tables so SDK users can programmatically discover table columns (attributes) and relationship definitions, which is particularly helpful when authoring client.query.sql() queries.

Changes:

  • Added TableOperations.list_columns(), list_relationships(), and list_table_relationships() public APIs that delegate to data-layer helpers.
  • Implemented the corresponding data-layer methods _ODataClient._list_columns() and _RelationshipOperationsMixin._list_relationships() / _list_table_relationships().
  • Added unit tests and README examples for the new discovery APIs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File | Description
src/PowerPlatform/Dataverse/operations/tables.py | Adds the three new public client.tables.* schema discovery methods and docstrings/examples.
src/PowerPlatform/Dataverse/data/_odata.py | Implements _list_columns() to fetch EntityDefinitions({MetadataId})/Attributes.
src/PowerPlatform/Dataverse/data/_relationships.py | Implements _list_relationships() and _list_table_relationships() (1:N + N:N) metadata discovery.
tests/unit/test_tables_operations.py | Verifies the new public methods delegate correctly to the underlying _odata methods.
tests/unit/data/test_relationships.py | Adds data-layer tests for _list_relationships() and _list_table_relationships().
tests/unit/data/test_list_columns.py | New data-layer tests for _ODataClient._list_columns().
README.md | Documents usage examples for the new schema discovery APIs.

Copilot AI and others added 2 commits March 20, 2026 16:54
…ith tests and README examples

Co-authored-by: saurabhrb <32964911+saurabhrb@users.noreply.github.com>
@saurabhrb saurabhrb changed the title Add public schema discovery APIs: list_columns, list_relationships, list_table_relationships End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, and comprehensive examples Mar 21, 2026
…me bridge, guardrails, examples

- Schema discovery APIs: list_columns, list_relationships, list_table_relationships
- SELECT * auto-expansion via list_columns (server blocks SELECT *)
- client.dataframe.sql() for SQL -> DataFrame bridge
- SQL safety guardrails: block writes, auto-inject TOP 5000, warn on
  leading-wildcard LIKE and implicit cross joins
- Comprehensive sql_examples.py (28 sections): JOINs, aggregates, GROUP BY,
  DISTINCT, OFFSET FETCH, polymorphic lookups, audit trail, SQL read ->
  DataFrame transform -> SDK write-back
- Updated query.sql() docstring with accurate SQL capabilities
- 732 tests (was 318), all passing
@saurabhrb saurabhrb force-pushed the copilot/add-public-api-for-metadata branch from e9a25aa to 7f06533 Compare March 21, 2026 05:30
Saurabh Badenkal added 6 commits March 20, 2026 22:35
…dd SELECT * + JOIN warning

Live-tested against Aurora VM (aurorabapenv71aff.crm10.dynamics.com):
- Server auto-caps at 5000 rows without TOP -> no client-side injection needed
- Server blocks SELECT * on both single-table and JOIN queries -> SDK expansion confirmed needed
- Added warning when SELECT * used with JOIN (expansion only includes first table columns)

732 tests passing.
New methods on client.query for SQL-first developers:
- sql_columns(table) -> simplified column metadata list
- sql_select(table) -> comma-separated column list for SELECT
- sql_joins(table) -> all possible JOINs with ready-to-use clauses
- sql_join(from, to) -> auto-generated JOIN clause between tables

Key finding from live testing: SQL JOINs use the raw attribute name
(e.g. parentcustomerid), NOT the _value suffix. The ReferencingAttribute
from relationship metadata matches exactly.

13 new unit tests, 745 total passing.
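
A rough sketch of how a JOIN clause could be derived from one relationship record, following the finding above that the raw ReferencingAttribute is used as-is. `build_join_clause` is a hypothetical helper, not the SDK's sql_join(); the dict keys follow Dataverse relationship metadata property names, and the sample record is illustrative.

```python
def build_join_clause(rel: dict, from_alias: str = "", to_alias: str = "") -> str:
    """Build a JOIN ... ON clause from one lookup (many-to-one) relationship record."""
    left = from_alias or rel["ReferencingEntity"]
    right = to_alias or rel["ReferencedEntity"]
    # Append the alias to the joined table only when one was requested.
    target = rel["ReferencedEntity"] + (f" {to_alias}" if to_alias else "")
    return (
        f"JOIN {target} ON "
        f"{left}.{rel['ReferencingAttribute']} = {right}.{rel['ReferencedAttribute']}"
    )


# Illustrative record: contact.parentcustomerid -> account.accountid
rel = {
    "ReferencingEntity": "contact",
    "ReferencingAttribute": "parentcustomerid",  # raw name, no _value suffix
    "ReferencedEntity": "account",
    "ReferencedAttribute": "accountid",
}

print(build_join_clause(rel))
# JOIN account ON contact.parentcustomerid = account.accountid
print(build_join_clause(rel, from_alias="c", to_alias="a"))
# JOIN account a ON c.parentcustomerid = a.accountid
```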
New methods on client.query for OData users (parallel to SQL helpers):
- odata_select(table) -> list[str] for records.get(select=)
- odata_expands(table) -> all navigation properties with entity sets
- odata_expand(from, to) -> PascalCase nav property name for expand=
- odata_bind(from, to, id) -> @odata.bind dict for create/update payloads

These eliminate the most error-prone parts of OData queries:
- No more guessing PascalCase navigation property names for $expand
- No more manually constructing @odata.bind with entity set names
- Column discovery matches records.get(select=) format directly

11 new unit tests, 756 total passing.
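
For context, the @odata.bind payload that odata_bind is described as producing has roughly the shape sketched below. `build_odata_bind` is a hypothetical stand-in: it takes an already-resolved navigation property and entity set name, whereas the SDK helper is described as discovering both from metadata. The property name, entity set and GUID are sample values.

```python
def build_odata_bind(nav_property: str, entity_set: str, record_id: str) -> dict[str, str]:
    """Return a single-entry dict that associates a lookup on create/update."""
    # Tolerate GUIDs written with surrounding braces.
    clean_id = record_id.strip("{}")
    return {f"{nav_property}@odata.bind": f"/{entity_set}({clean_id})"}


payload = {"firstname": "Ada"}
payload.update(
    build_odata_bind(
        "parentcustomerid_account",              # sample navigation property
        "accounts",                              # sample entity set name
        "00000000-0000-0000-0000-000000000001",  # sample record id
    )
)
print(payload)
# {'firstname': 'Ada', 'parentcustomerid_account@odata.bind': '/accounts(00000000-0000-0000-0000-000000000001)'}
```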
_list_table_relationships now fetches all 3 relationship types:
- OneToManyRelationships (incoming: where table is referenced)
- ManyToOneRelationships (outgoing: where table has a lookup) <-- NEW
- ManyToManyRelationships

This was critical for sql_joins() and odata_expands() which need
outgoing lookups. Without ManyToOne, custom table lookups were
invisible to the helpers.

Also fixed: create_lookup_field requires lowercase logical names
(not SchemaName) -- documented in copilot-instructions.

Live-validated: 47/47 SQL tests passed on Aurora VM.
Unit tests: 756 passed.
Columns with AttributeOf set are auto-generated display names
(e.g. createdbyname, createdbyyominame) that cause errors in OData
$select and are not real data columns. Now filtered out.

Also: _list_table_relationships docstring updated for ManyToOne.
756 unit tests passing.
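
The AttributeOf filter amounts to something like the following sketch. `real_columns` is a hypothetical name and the attribute dicts are trimmed sample records, not actual server output.

```python
def real_columns(attributes: list[dict]) -> list[str]:
    """Keep logical names of attributes that are not derived from another attribute."""
    # A non-empty AttributeOf marks an auto-generated companion column.
    return [a["LogicalName"] for a in attributes if not a.get("AttributeOf")]


sample = [
    {"LogicalName": "createdby", "AttributeOf": None},
    {"LogicalName": "createdbyname", "AttributeOf": "createdby"},
    {"LogicalName": "createdbyyominame", "AttributeOf": "createdby"},
    {"LogicalName": "name"},  # key absent: treated as a real column
]

print(real_columns(sample))
# ['createdby', 'name']
```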
… depth

- Moved write statement check (INSERT/UPDATE/DELETE) before
  _extract_logical_table() so users get the clear 'SQL endpoint is
  read-only' error instead of confusing 'no FROM clause' ValueError
- Live-tested JOIN depth: 8-table JOINs confirmed working on Aurora VM
- Live-tested nested polymorphic: customerid -> account -> owner -> businessunit chain works
- Fixed validate_sql_full.py: lookup existence check (no more 95s waits),
  COUNT assertion relaxed for accumulated data, AttributeOf filter
- 66/66 validation tests passed, 12/12 JOIN depth tests passed
- 756 unit tests passing
@saurabhrb saurabhrb changed the title End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, and comprehensive examples End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails (8-table JOINs validated) Mar 22, 2026
Saurabh Badenkal added 2 commits March 21, 2026 21:30
…vs OData comparison

New sections (27-31):
- 27: AND/OR, NOT IN, NOT LIKE boolean logic
- 28: Deep JOINs (5-8 tables) with built-in tables
- 29: SQL helper functions (sql_columns, sql_select, sql_joins, sql_join)
- 30: OData helper functions (odata_select, odata_expands, odata_expand, odata_bind)
- 31: SQL vs OData side-by-side comparison with live benchmark
- 32: Updated summary table with all new features
- 33: Cleanup

Summary table now includes: AND/OR, NOT IN/LIKE, 8+ table JOINs,
nested polymorphic, self-JOIN, DISTINCT+JOIN, all helper functions.
SQL-first workflow updated with helper-driven steps.

756 unit tests passing.
…vel cap)

Live-tested on Aurora VM:
- 9-table: 2.14s PASS
- 10-table: 0.69s PASS (= OData $expand limit)
- 11-table: 0.56s PASS (BEYOND OData limit)
- 12-table: 0.66s PASS
- 13-table: 0.63s PASS
- 15-table: 0.63s PASS (no degradation!)

Updated sql_examples.py summary: 15+ tables (no depth limit)
Updated SQL vs OData comparison table.
@saurabhrb saurabhrb changed the title End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails (8-table JOINs validated) End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails (15-table JOINs validated) Mar 22, 2026
@saurabhrb saurabhrb requested a review from Copilot March 22, 2026 04:37

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.


Saurabh Badenkal added 5 commits March 21, 2026 21:49
New section 32: IMPORTANT Anti-Patterns & Best Practices
- Cartesian products (FROM a, b without ON) -- AVOID
- Leading-wildcard LIKE -- forces full table scans
- No filter on large system tables -- returns max rows
- SELECT * on wide tables -- transfers 260+ columns
- Deep JOINs without TOP -- always use TOP N

Also: toned down JOIN depth messaging to 'no limit (tested 15)'
instead of promoting 15-table JOINs as a feature. Fixed duplicate
SQL vs OData table in examples.

34 sections, 1194 lines. 756 unit tests passing.
Implicit cross joins (FROM a, b without ON) now raise ValidationError
instead of UserWarning. The server allows these but they produce
cartesian products (N*M intermediate rows) that degrade shared DB.

SDK now blocks with clear error: 'Implicit cross join detected...
Use explicit JOIN...ON syntax instead.'

New subcode: validation_sql_cross_join_blocked

SDK guardrail summary:
- INSERT/UPDATE/DELETE -> ValidationError (blocked)
- FROM a, b (cartesian) -> ValidationError (blocked) <-- upgraded
- LIKE '%value' -> UserWarning (performance advisory)
- SELECT * with JOIN -> UserWarning (partial expansion)

756 unit tests passing.
Principle: SDK should not be more restrictive than the server.
The server allows cartesian queries (FROM a, b) -- they work and
return results (capped at 5000). The SDK warns users but does not
block, because:
1. Small-table cartesian products are safe (3 rows * 3 rows = 9)
2. Server has its own resource governance (5000-row cap, timeouts)
3. Blocking what the server allows creates user frustration

Final guardrail design:
- ValidationError: INSERT/UPDATE/DELETE (server blocks too, but SDK
  catches earlier with clearer message)
- UserWarning: FROM a, b (cartesian), LIKE '%value', SELECT * + JOIN

756 unit tests passing.
All checks in one place (_sql_guardrails) for easy future bypass.

BLOCKED (ValidationError -- saves network round-trip):
- INSERT/UPDATE/DELETE/DROP/etc. (write statements)
- CROSS JOIN, RIGHT JOIN, FULL OUTER JOIN (unsupported join types)
- UNION / UNION ALL
- HAVING
- CTE (WITH ... AS)
- Subqueries (IN (SELECT ...), EXISTS (SELECT ...))

WARNED (UserWarning -- query executes, advisory only):
- LIKE '%value' (leading-wildcard, full table scan)
- FROM a, b (implicit cartesian, server allows but risky)
- SELECT * with JOIN (partial expansion)

Principle: block what server blocks (save time), warn what server
allows but is risky (respect user intent).

14 new tests, 770 total passing.
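
The block-versus-warn split above could look roughly like this. It is a sketch covering only a subset of the listed patterns; `check_guardrails`, the regexes and this `ValidationError` class are stand-ins defined here for illustration, not the SDK's _sql_guardrails.

```python
import re
import warnings


class ValidationError(ValueError):
    """Stand-in for the SDK's ValidationError (hypothetical definition)."""


_UNSUPPORTED_JOIN_RE = re.compile(
    r"\b(?:CROSS|RIGHT|FULL)\s+(?:OUTER\s+)?JOIN\b", re.IGNORECASE
)
_LEADING_WILDCARD_RE = re.compile(r"\bLIKE\s+'%", re.IGNORECASE)
# FROM <table> [[AS] alias] followed directly by a comma, aliased or not.
_IMPLICIT_CROSS_JOIN_RE = re.compile(
    r"\bFROM\s+\w+(?:\s+(?:AS\s+)?\w+)?\s*,", re.IGNORECASE
)


def check_guardrails(sql: str) -> None:
    """Raise for patterns the server rejects; warn for risky ones it allows."""
    if _UNSUPPORTED_JOIN_RE.search(sql):
        raise ValidationError("Unsupported join type; use JOIN ... ON instead.")
    if _LEADING_WILDCARD_RE.search(sql):
        warnings.warn(
            "Leading-wildcard LIKE forces a full table scan.",
            UserWarning,
            stacklevel=2,
        )
    if _IMPLICIT_CROSS_JOIN_RE.search(sql):
        warnings.warn(
            "Implicit cross join (FROM a, b) produces N*M intermediate rows.",
            UserWarning,
            stacklevel=2,
        )


check_guardrails("SELECT name FROM account WHERE name LIKE 'Con%'")  # silent
```

Because the warnings go through the standard `warnings` module, callers who accept the risk can silence them with `warnings.filterwarnings`, while the blocked patterns fail before any network round-trip.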
…* + JOIN behavior

- SELECT * with JOIN only expands first (FROM) table columns
- Joined table columns are NOT included in expansion
- Added sql_select()/sql_columns() as recommended alternative
- Updated guardrail summary to mention all blocked patterns
@saurabhrb saurabhrb changed the title End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails (15-table JOINs validated) End-to-end SQL support: schema discovery, SELECT * expansion, DataFrame bridge, helpers, guardrails Mar 22, 2026
1. Remove unused imports (patch, PropertyMock) from test_select_star_expansion
2. Fix JOIN depth '6+' -> 'no depth limit' in query.py docstring
3. Fix stale 'auto-inject TOP' comment in _odata.py
4. Fix Learn-incompatible docstring types in tables.py list_columns
5. Fix Learn-incompatible docstring types in tables.py list_relationships
6. Fix list_table_relationships docstring: add ManyToOne
7. Fix cross-join regex to catch unaliased FROM account, contact
8. Fix write regex to catch comment-prefixed writes (/* */ and --)
9. Narrow odata_expands exception handling (KeyError/AttributeError/ValueError)
10. Consistent JOIN depth claims across all docs

770 unit tests passing.
@saurabhrb

All 11 review comments addressed in commit a595a72:

  1. Removed unused imports (patch, PropertyMock) from test_select_star_expansion.py
  2. Fixed JOIN depth claim from '6+' to 'no depth limit' in query.py docstring
  3. Fixed stale 'auto-inject TOP' comment in _odata.py
  4. Fixed Learn-incompatible docstring types in tables.py list_columns() -- uses bracket generics now
  5. Fixed Learn-incompatible docstring types in tables.py list_relationships()
  6. Fixed list_table_relationships() docstring to include ManyToOne (matches implementation)
  7. Broadened cross-join regex to catch both aliased and unaliased comma joins (FROM account, contact)
  8. Improved write regex to catch comment-prefixed writes (/* */DELETE, --\nDELETE)
  9. Narrowed odata_expands() exception handling to (KeyError, AttributeError, ValueError)
  10. Consistent JOIN depth claims across all documentation
  11. README comment about one-to-many+many-to-many was already corrected in earlier commits

770 unit tests passing.

Saurabh Badenkal added 2 commits March 21, 2026 22:31
…taverse API)

Dataverse Web API supports EntityDefinitions({id})/ManyToOneRelationships
as a standard metadata endpoint. ManyToOne is the outgoing lookup
perspective (child -> parent), complementing OneToMany (parent -> child).

- README: 'one-to-many + many-to-many' -> 'one-to-many + many-to-one + many-to-many'
- _relationships.py docstring updated to match
- copilot-instructions.md: added SKILL.md reference to governance list

770 unit tests passing.
…ributeOf test

Critical fixes:
- tables.py: Learn-incompatible ':class:`list` of :class:`str`' -> 'list[str]'
- _odata.py, _relationships.py, _error_codes.py: add __all__ = [] per SKILL.md

Important fixes:
- _odata.py: _delete_multiple param 'logical_name' -> 'table_schema_name'
- _odata.py: _lowercase_list docstring removed misleading \ mention
- _odata.py: _query_sql guardrails comment clarified (defense-in-depth)

New test:
- test_excludes_attribute_of_columns: verifies computed display-name
  columns (with AttributeOf set) are filtered from sql_columns()

771 unit tests passing.
…ng approach

CodeQL found exponential backtracking in _SQL_WRITE_RE which used
nested quantifiers for SQL comment matching: (?:/\*.*?\*/\s*|...)*

Fix: separate comment stripping (_SQL_COMMENT_RE) from write detection.
1. Strip SQL comments with a safe non-backtracking regex
2. Check for write keywords with the simple anchored regex

The comment regex uses [^*]*\*+(?:[^/*][^*]*\*+)*/ which is the
standard safe pattern for matching C-style block comments.

3 new tests for comment-prefixed write detection.
774 unit tests passing.
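
A sketch of the two-step approach described above: strip comments first, then test for a leading write keyword. The names mirror the commit message, but the bodies are illustrative assumptions. In particular the `--` line-comment branch and the exact keyword list are guesses, and comment markers inside string literals are not handled.

```python
import re

# Block comments use the pattern quoted above; "--" line comments are an added assumption.
_SQL_COMMENT_RE = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|--[^\n]*")
# Simple anchored check; the keyword list beyond INSERT/UPDATE/DELETE/DROP is illustrative.
_SQL_WRITE_RE = re.compile(
    r"^\s*(?:INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|MERGE)\b",
    re.IGNORECASE,
)


def is_write_statement(sql: str) -> bool:
    """Return True if the statement starts with a write keyword once comments are removed."""
    # Replace each comment with a space so adjacent tokens stay separated.
    stripped = _SQL_COMMENT_RE.sub(" ", sql)
    return _SQL_WRITE_RE.match(stripped) is not None


print(is_write_statement("/* audit */DELETE FROM account"))  # True
print(is_write_statement("--\nDELETE FROM account"))         # True
print(is_write_statement("SELECT name FROM account"))        # False
```

Keeping the two regexes separate means neither one nests a quantifier around an alternation, which is the construct CodeQL flagged.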

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.


- sql_joins(): fix docstring example to not use aliases with join_clause
  (join_clause uses full table name; use sql_join() for aliased queries)
- sql_examples: replace _value JOIN pattern with sql_join() helper
- list_table_relationships: docstring now mentions ManyToOne
- remove unused VALIDATION_SQL_CROSS_JOIN_BLOCKED import
- PR description: cartesian updated from ValidationError to UserWarning

3 participants