fix(ssh): make execution timeout configurable and remove internal 300s cutoff #3394

Open
Danigm-dev wants to merge 8 commits into simstudioai:staging from Danigm-dev:fix/ssh-timeout-configurable-staging

@Danigm-dev

Summary

Makes the internal SSH/tool execution timeout configurable by forwarding the effective timeout to
internal secure fetch calls, which removes the hard internal 300s cutoff. Also documents how to
configure EXECUTION_TIMEOUT_FREE for local Docker and Helm/cloud deployments.

Fixes #(issue)

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: tests

Testing

  • Ran targeted suite:
    • cd apps/sim && bunx vitest run tools/index.test.ts
  • Result:
    • 1 passed, 52 passed

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

N/A

@vercel

vercel bot commented Mar 2, 2026

@Danigm-dev is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR removes the hardcoded 300s (5-minute) internal timeout for SSH/tool execution by forwarding the configurable EXECUTION_TIMEOUT_FREE value to the underlying secureFetchWithPinnedIP calls. The implementation replaces the local AbortController pattern with direct timeout forwarding, allowing SSH blocks to run for up to 10 minutes when configured.

  • Refactored internal route handling in apps/sim/tools/index.ts to use secureFetchWithPinnedIP with configurable timeout
  • Added comprehensive test coverage for timeout forwarding behavior with two dedicated test cases
  • Updated documentation for both Docker Compose and Helm deployments with clear configuration examples
  • Test expectations aligned with current error parser behavior (HTTP status text fallback when response body parsing fails)
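
The timeout forwarding described above can be sketched roughly as follows. This is only an illustration, not the code in apps/sim/tools/index.ts: `PinnedFetch` is a hypothetical stand-in for secureFetchWithPinnedIP (its real signature is not shown in this conversation), `callInternalRoute` and `resolveEffectiveTimeout` are made-up helper names, and the 300 s value for DEFAULT_EXECUTION_TIMEOUT_MS is an assumption based on the cutoff this PR describes.

```typescript
// Hypothetical stand-in for secureFetchWithPinnedIP; the real signature may differ.
type PinnedFetchOptions = { method: string; timeout: number }
type PinnedFetch = (
  url: string,
  resolvedIP: string,
  options: PinnedFetchOptions
) => Promise<unknown>

// Assumed value: 300 s, matching the cutoff discussed in this PR.
const DEFAULT_EXECUTION_TIMEOUT_MS = 300 * 1000

// Use the tool's own timeout when it is a positive finite number,
// otherwise fall back to the default.
function resolveEffectiveTimeout(toolTimeoutMs?: number): number {
  return typeof toolTimeoutMs === 'number' &&
    Number.isFinite(toolTimeoutMs) &&
    toolTimeoutMs > 0
    ? toolTimeoutMs
    : DEFAULT_EXECUTION_TIMEOUT_MS
}

// Forward the effective timeout to the fetcher rather than racing the
// request against a local AbortController with its own fixed deadline.
function callInternalRoute(
  pinnedFetch: PinnedFetch,
  url: string,
  resolvedIP: string,
  toolTimeoutMs?: number
): Promise<unknown> {
  return pinnedFetch(url, resolvedIP, {
    method: 'POST',
    timeout: resolveEffectiveTimeout(toolTimeoutMs),
  })
}
```

Passing the fetcher in as a parameter keeps the sketch testable with a fake, which is presumably similar in spirit to how the two dedicated test cases check explicit and default timeout forwarding.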

Confidence Score: 5/5

  • This PR is safe to merge with no identified risks
  • The implementation is clean and focused, with comprehensive test coverage verifying both explicit and default timeout forwarding. The refactoring properly handles DNS resolution for internal routes, constructs Response objects correctly for different status codes, and maintains clear error messages. Documentation is thorough for both local and cloud deployments.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/tools/index.ts | Replaced hardcoded 300s timeout with configurable timeout forwarding to secureFetchWithPinnedIP, enabling SSH blocks to run beyond 5 minutes when configured |
| apps/sim/tools/index.test.ts | Added comprehensive test coverage for timeout forwarding behavior and updated error message expectations to match current parser behavior |
| README.md | Documented SSH block timeout configuration for local deployments, explaining how to set EXECUTION_TIMEOUT_FREE environment variable |
| docker-compose.prod.yml | Added EXECUTION_TIMEOUT_FREE environment variable with default value of 300 seconds to enable timeout configuration |
| helm/sim/README.md | Documented Helm deployment steps for configuring SSH 10-minute timeout in cloud environments with verification commands |

Last reviewed commit: a624850

Contributor

@greptile-apps greptile-apps bot left a comment

5 files reviewed, no comments

@Danigm-dev force-pushed the fix/ssh-timeout-configurable-staging branch from 3d83b39 to f70f154 on March 18, 2026, 13:47
@cursor

cursor bot commented Mar 18, 2026

PR Summary

Medium Risk
Changes internal tool HTTP execution to use SSRF-protected secureFetchWithPinnedIP with DNS/IP resolution and body handling, which could affect all internal /api/* tool calls and timeout/error behavior. Adds new timeout-related tests and deployment docs, but regressions would surface as tool execution failures or altered error messages.

Overview
Removes the hard ~300s internal cutoff for tool calls to internal /api/* routes by forwarding the effective tool timeout (or DEFAULT_EXECUTION_TIMEOUT_MS) into secureFetchWithPinnedIP, including resolving/pinning the internal hostname to an IP.

Adjusts internal-response handling to correctly deal with no-body statuses (e.g. 204) and to parse error bodies more safely via response.clone() (preserving plain-text errors when JSON parsing fails). Adds focused Vitest coverage for internal timeout propagation, plain-text error preservation, and 204 behavior.
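
The no-body handling matters because, in runtimes that follow the Fetch standard, the Response constructor throws a TypeError when it is given a non-null body together with a null-body status such as 204. A minimal sketch of the idea, assuming a hypothetical `buildResponse` helper (not the function used in this PR):

```typescript
// Null-body statuses that the Response constructor accepts (it only allows
// status codes in the 200-599 range, so 1xx codes are left out here).
const NULL_BODY_STATUSES = new Set([204, 205, 304])

// Hypothetical helper: rebuild a standard Response from a status and body
// text, dropping the body for statuses that must not carry one.
function buildResponse(
  status: number,
  bodyText: string,
  headers: Record<string, string> = {}
): Response {
  const body = NULL_BODY_STATUSES.has(status) ? null : bodyText
  return new Response(body, { status, headers })
}
```

Callers should likewise avoid reading a body from such a response, which is what the 204 test case in this PR is described as checking.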

Updates self-hosting docs and deployment configs to expose EXECUTION_TIMEOUT_FREE (Docker Compose + Helm) and explains how to align workflow execution timeouts with the internal tool HTTP timeout path for 10-minute SSH runs.
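
As a rough illustration of how such a setting might be consumed, the sketch below converts an EXECUTION_TIMEOUT_FREE value into milliseconds. It assumes the variable is expressed in seconds (the Greptile summary mentions a default of 300 seconds); `timeoutMsFromEnv` is a hypothetical helper, and the project's actual parsing may differ.

```typescript
// Assumption: the variable holds whole seconds, with 300 as the default.
const DEFAULT_TIMEOUT_SECONDS = 300

// Hypothetical helper: turn the raw environment value into milliseconds,
// falling back to the default when it is missing or not a positive number.
function timeoutMsFromEnv(raw: string | undefined): number {
  const seconds = Number.parseInt(raw ?? '', 10)
  const valid = Number.isFinite(seconds) && seconds > 0
  return (valid ? seconds : DEFAULT_TIMEOUT_SECONDS) * 1000
}

// Typical call site (Node-style runtime):
//   const timeoutMs = timeoutMsFromEnv(process.env.EXECUTION_TIMEOUT_FREE)
```

Under these assumptions, setting EXECUTION_TIMEOUT_FREE to 600 would correspond to the 10-minute SSH runs mentioned above.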

Written by Cursor Bugbot for commit c589adf.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@Danigm-dev
Author

Fixed in c589adfd0.

The issue was valid: the internal secure-fetch path rebuilt a standard Response, and the error handler then tried response.json() before falling back to response.text(), which could consume the body and lose plain-text error messages.

This is now fixed by probing JSON from response.clone() and preserving the original body for the text fallback.
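
A minimal sketch of the clone-based probe, for illustration only: `readErrorMessage` and the `error`/`message` field names are assumptions, not the parser's actual shape.

```typescript
// Hypothetical helper: extract an error message from a failed response
// without losing plain-text bodies.
async function readErrorMessage(response: Response): Promise<string> {
  try {
    // Parse a clone so a failed JSON parse leaves the original body unread.
    const data: unknown = await response.clone().json()
    if (data && typeof data === 'object') {
      const { error, message } = data as { error?: unknown; message?: unknown }
      if (typeof error === 'string') return error
      if (typeof message === 'string') return message
    }
  } catch {
    // Body is not JSON: fall through to the plain-text fallback.
  }
  // The original body is still available here because only the clone was read.
  const text = await response.text()
  return text || response.statusText || `HTTP ${response.status}`
}
```

Calling response.json() directly and then response.text() would fail on the second read, since a body stream can only be consumed once.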

I also added targeted coverage for:

  • internal secure-fetch responses with text/plain error bodies
  • internal 204 responses to ensure we do not attempt to read a body

Relevant tests pass locally for the affected paths.
