
Implement batch API with changeset, upsert, and DataFrame integration#129

Open
sagebree wants to merge 24 commits into main from users/sagebree/batch


@sagebree sagebree commented Feb 27, 2026

Summary

  • Adds client.batch namespace -- a deferred-execution batch API that packs multiple
    Dataverse Web API operations into a single POST $batch HTTP request
  • Adds client.batch.dataframe namespace -- pandas DataFrame wrappers for batch operations
  • Adds client.records.upsert() and client.batch.records.upsert() backed by the
    UpsertMultiple bound action with alternate-key support
  • Fixes a bug where alternate key fields were merged into the UpsertMultiple request
    body, causing 400 Bad Request on the create path

Batch API Design

Implements the Batch API Design spec from @sagebree:

| Capability | How to use | Status |
| --- | --- | --- |
| Record CRUD (create / update / delete / get) | batch.records.* | Done |
| Upsert by alternate key | batch.records.upsert(...) | Done |
| Table metadata (create / delete / columns / relationships) | batch.tables.* | Done |
| SQL queries | batch.query.sql(...) | Done |
| Atomic write groups | batch.changeset() | Done |
| Continue past failures | batch.execute(continue_on_error=True) | Done |
| DataFrame integration | batch.dataframe.create/update/delete | Done (new) |
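The deferred-execution model above can be sketched with a minimal stand-in: operations are queued locally and nothing touches the network until execute(). The class and method internals here are illustrative assumptions, not the real implementation (the real execute() serializes the queue into one POST $batch request and returns a BatchResult).

```python
# Illustrative stand-in for the deferred-execution batch model; the real
# client.batch internals are assumed, not shown.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class _QueuedOp:
    method: str
    url: str
    body: Optional[Dict[str, Any]] = None


@dataclass
class BatchSketch:
    """Queues operations locally; nothing is sent until execute()."""
    _ops: List[_QueuedOp] = field(default_factory=list)

    def create(self, table: str, record: Dict[str, Any]) -> None:
        self._ops.append(_QueuedOp("POST", f"/api/data/v9.2/{table}s", record))

    def delete(self, table: str, record_id: str) -> None:
        self._ops.append(_QueuedOp("DELETE", f"/api/data/v9.2/{table}s({record_id})"))

    def execute(self) -> List[_QueuedOp]:
        # A real implementation serializes all queued ops into a single
        # multipart POST $batch request here.
        sent, self._ops = self._ops, []
        return sent


batch = BatchSketch()
batch.create("account", {"name": "Contoso"})
batch.delete("account", "00000000-0000-0000-0000-000000000001")
sent = batch.execute()  # one round trip covers both operations
assert len(sent) == 2
```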

Design constraints enforced:

  • Maximum 1000 operations per batch (validated before sending)
  • records.get paginated overload not supported -- single-record only
  • GET operations cannot be placed inside a changeset (enforced by API design)
  • Content-ID references are only valid within the same changeset
  • File upload operations not batchable
  • tables.create returns no table metadata on success (HTTP 204)
  • tables.add_columns / tables.remove_columns do not flush the picklist cache
  • client.flush_cache() not supported in batch (client-side operation)
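Two of these constraints lend themselves to simple client-side checks before serialization. The helpers below are hypothetical sketches of that validation; the real library's check points and error messages are assumptions.

```python
# Hypothetical validation helpers mirroring two of the constraints above.
MAX_BATCH_OPERATIONS = 1000


def validate_batch_size(ops: list) -> None:
    # Enforced client-side before the $batch request is serialized.
    if len(ops) > MAX_BATCH_OPERATIONS:
        raise ValueError(
            f"Batch contains {len(ops)} operations; "
            f"the limit is {MAX_BATCH_OPERATIONS}."
        )


def validate_changeset_methods(methods: list) -> None:
    # OData changesets are write-only: GETs must stay outside them.
    for m in methods:
        if m.upper() == "GET":
            raise ValueError("GET operations cannot be placed inside a changeset.")


validate_batch_size(list(range(1000)))          # at the limit: OK
validate_changeset_methods(["POST", "PATCH"])   # writes only: OK
```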

What's included

New: client.batch API

  • batch.records.create / get / update / delete / upsert
  • batch.tables.create / get / list / add_columns / remove_columns / delete
  • batch.tables.list(filter=..., select=...) -- parity with client.tables.list() from #112 (Add filter and select parameters to client.tables.list())
  • batch.tables.create_one_to_many_relationship / create_many_to_many_relationship / delete_relationship / get_relationship / create_lookup_field
  • batch.query.sql
  • batch.changeset() context manager for transactional (all-or-nothing) operations
  • Content-ID reference chaining inside changesets (globally unique across all changesets via shared counter)
  • execute(continue_on_error=True) for mixed success/failure batches
  • BatchResult with .responses, .succeeded, .failed, .created_ids, .has_errors
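The "globally unique across all changesets via shared counter" point can be sketched as follows. Class and attribute names are illustrative assumptions; only the shared-counter idea and the "$<content-id>" reference convention come from the description above.

```python
# Sketch of globally unique Content-IDs via a counter shared by all
# changesets in one batch; names here are illustrative, not the real ones.
import itertools


class ChangeSetSketch:
    def __init__(self, counter):
        self._counter = counter  # shared across all changesets in the batch
        self.parts = []

    def create(self, table, record):
        content_id = next(self._counter)
        self.parts.append((content_id, "POST", f"/{table}s", record))
        return content_id  # referenced as "$<id>" by later parts

    def update_ref(self, content_id, changes):
        # "$<content_id>" means: the entity created by that earlier part.
        self.parts.append((None, "PATCH", f"${content_id}", changes))


counter = itertools.count(1)
cs1 = ChangeSetSketch(counter)
cs2 = ChangeSetSketch(counter)
ref = cs1.create("account", {"name": "Contoso"})
cs1.update_ref(ref, {"telephone1": "555-0100"})
other = cs2.create("contact", {"lastname": "Doe"})
assert (ref, other) == (1, 2)  # unique even across changesets
```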

New: client.batch.dataframe API

  • batch.dataframe.create(table, df) -- DataFrame rows to CreateMultiple batch item
  • batch.dataframe.update(table, df, id_column) -- DataFrame rows to update batch items
  • batch.dataframe.delete(table, ids_series) -- pandas Series to delete batch items
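The row-to-record conversion behind these wrappers can be sketched as below, assuming the semantics implied above: one record per DataFrame row, with NaN cells dropped rather than sent as null. The function name and exact behavior are assumptions, not the library's real signature.

```python
# Illustrative DataFrame-to-batch-payload conversion (one record per row,
# NaN cells omitted); not the library's actual implementation.
import math

import pandas as pd


def dataframe_to_records(df: pd.DataFrame) -> list:
    records = []
    for row in df.to_dict(orient="records"):
        # Drop NaN cells so the field is omitted instead of sent as null.
        records.append({k: v for k, v in row.items()
                        if not (isinstance(v, float) and math.isnan(v))})
    return records


df = pd.DataFrame({"name": ["Contoso", "Fabrikam"],
                   "telephone1": ["555-0100", float("nan")]})
assert dataframe_to_records(df) == [
    {"name": "Contoso", "telephone1": "555-0100"},
    {"name": "Fabrikam"},
]
```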

Changed: refactored existing APIs

  • Payload generation shared between batch and direct API via _build_* / _RawRequest pattern
  • Execution of batch operations deferred to execute()
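The shared-payload pattern can be sketched like this: a _build_* helper produces a transport-neutral request object that the direct path sends immediately and the batch path queues. Names are modeled on the PR description (_build_* / _RawRequest); the actual fields and signatures are assumptions.

```python
# Sketch of the shared _build_* / raw-request pattern: one payload builder
# feeds both the direct and the batch execution paths. Internals assumed.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class RawRequestSketch:
    method: str
    url: str
    body: Optional[Dict[str, Any]] = None


def build_create(table: str, record: Dict[str, Any]) -> RawRequestSketch:
    # Shared by both paths, so serialization logic exists exactly once.
    return RawRequestSketch("POST", f"/api/data/v9.2/{table}s", record)


req = build_create("account", {"name": "Contoso"})
# Direct path: send req now.  Batch path: append req to the queue instead.
assert req.method == "POST" and req.url.endswith("/accounts")
```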

OData $batch spec compliance

  • Audited against Microsoft Learn docs
  • Content-Transfer-Encoding: binary per part
  • Content-Type: application/http per part
  • Content-Type: application/json; type=entry for POST/PATCH bodies
  • CRLF line endings throughout
  • Absolute URLs in batch parts
  • Empty changesets silently skipped (prevents invalid multipart)
  • Top-level batch error handling (non-multipart 4xx/5xx raises HttpError with parsed Dataverse error details)
  • Accepts 200, 202 Accepted, 207 Multi-Status, and 400 batch response codes
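Several of these spec points can be seen in a minimal serializer for a single batch part: per-part application/http and binary transfer-encoding headers, a JSON body content type for writes, CRLF line endings, and an absolute URL in the request line. Boundary handling and response parsing are omitted, and this is a sketch, not the library's serializer.

```python
# Minimal single-part $batch serializer following the spec points above;
# multipart boundaries and response parsing are deliberately omitted.
import json

CRLF = "\r\n"


def serialize_part(method: str, absolute_url: str, body=None) -> str:
    lines = [
        "Content-Type: application/http",
        "Content-Transfer-Encoding: binary",
        "",  # blank line separates part headers from the embedded request
        f"{method} {absolute_url} HTTP/1.1",
    ]
    if body is not None:
        lines += ["Content-Type: application/json; type=entry",
                  "", json.dumps(body)]
    return CRLF.join(lines) + CRLF


part = serialize_part("POST",
                      "https://org.crm.dynamics.com/api/data/v9.2/accounts",
                      {"name": "Contoso"})
assert "Content-Transfer-Encoding: binary\r\n" in part
```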

Review comment fixes

  • Fixed expected status codes to include 202/207 for all Dataverse environments
  • Fixed _split_multipart / _parse_mime_part return type annotations: List[Tuple[Dict[str, str], str]]
  • Fixed OptionSet string check regression: now uses dict key lookup instead of JSON string search
  • Fixed _build_get to lowercase select column names (consistency with _get_multiple)
  • Added RFC 3986 %20 encoding documentation in _build_sql docstring
  • Fixed content-id response parsing for non-changeset parts
  • Fixed test assertions after merge: data bytes instead of json kwarg
  • Exception type parity: batch.records.upsert() raises TypeError (matching client.records.upsert())

Testing

Unit tests -- 579 tests passing:

  • test_batch_operations.py -- BatchRequest, BatchRecordOperations, BatchTableOperations, BatchQueryOperations, ChangeSet, BatchItemResponse, BatchResult
  • test_batch_serialization.py -- multipart serialization, response parsing, intent resolution, upsert dispatch, batch size limit, content-ID uniqueness, top-level error handling
  • test_batch_edge_cases.py -- 40 edge case tests: empty changeset, changeset rollback, content-ID in standalone parts, mixed batch, multiple changesets, batch size limits, top-level errors, continue-on-error, serialization compliance, multipart parsing, content-ID references, intent validation
  • test_batch_dataframe.py -- 18 tests: DataFrame create/update/delete, validation, NaN handling, empty series, bulk delete
  • test_odata_internal.py -- _build_upsert_multiple body exclusion, conflict detection, URL/method correctness

E2E tests -- 14 tests passing against live Dataverse (crm10.dynamics.com):

  1. Basic batch CRUD (single create + CreateMultiple, update, get, delete)
  2. Changeset happy path (create + update via $ref content-ID)
  3. Changeset rollback (failing op rolls back entire changeset)
  4. Multiple changesets (globally unique content-IDs)
  5. Continue-on-error (mixed success/failure)
  6. Batch SQL query
  7. Batch tables.get + tables.list
  8. DataFrame batch create
  9. DataFrame batch update
  10. DataFrame batch delete
  11. Mixed batch (changeset + standalone GET)
  12. Empty changeset (silently skipped)
  13. Content-ID chaining (2 creates + 2 updates via $ref)
  14. Table setup/teardown

Examples & docs

  • examples/advanced/batch.py -- reference examples for all batch operation types
  • examples/advanced/walkthrough.py -- batch section added (section 11)
  • examples/basic/functional_testing.py -- test_batch_all_operations() covering all operation categories against a live environment

