
feat(connectors): add 7 knowledge base connectors (Google Forms, Typeform, Azure DevOps, YouTube, JSM, S3, Sentry) #4880

Merged: waleedlatif1 merged 16 commits into staging from feat/connect on Jun 4, 2026


@waleedlatif1 (Collaborator)

Summary

  • Add 7 new KB connectors: Google Forms, Typeform, Azure DevOps (wiki + work items + repo files), YouTube, Jira Service Management, Amazon S3 (incl. R2/MinIO custom endpoints), and Sentry (incl. self-hosted)
  • X connector is built but deregistered — X's per-app rate limits and global monthly read caps don't fit multi-tenant background syncing; re-enabling is a two-line registry change
  • Each connector went through 4 validation passes against live provider API docs (build → audit → adversarial ship-gate → convergence pass); S3's SigV4 signing verified against AWS reference vectors
  • Harden add-connector/validate-connector skills with the listingCapped deletion-reconciliation requirement (the most common bug class found during validation)
  • Update KB connector docs: count (30 → 49), full categorized table, API-key sources, config examples
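The SigV4 piece that the AWS reference vectors exercise is the signing-key derivation, a fixed chain of HMAC-SHA256 steps over the date, region, and service. A minimal sketch using Node's `node:crypto`; `deriveSigningKey` and `signString` are hypothetical names, not the code in `apps/sim/connectors/s3/s3.ts`, and canonical-request and string-to-sign construction are left out.

```typescript
import { createHmac } from "node:crypto";

// HMAC-SHA256 returning raw bytes, which become the key for the next step.
function hmac(key: string | Buffer, data: string): Buffer {
  return createHmac("sha256", key).update(data, "utf8").digest();
}

/**
 * Derives the SigV4 signing key: kDate -> kRegion -> kService -> kSigning.
 * dateStamp is the request date as YYYYMMDD.
 */
function deriveSigningKey(
  secretAccessKey: string,
  dateStamp: string,
  region: string,
  service: string,
): Buffer {
  const kDate = hmac(`AWS4${secretAccessKey}`, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, "aws4_request");
}

// Final signature: hex-encoded HMAC of the string-to-sign under the derived key.
function signString(signingKey: Buffer, stringToSign: string): string {
  return createHmac("sha256", signingKey).update(stringToSign, "utf8").digest("hex");
}
```

Because the key depends only on the secret, date, region, and service, a connector can derive it once per day and region and reuse it across requests.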

Type of Change

  • New feature

Testing

Connector test suite passes (100 tests), full type-check and Biome clean. Validated against live provider API docs; not yet smoke-tested against live accounts.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

…form, Azure DevOps, YouTube, JSM, S3, Sentry)
@vercel (Bot) commented Jun 4, 2026:

The latest updates on your projects.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| docs | Skipped | Skipped | Jun 4, 2026 9:47pm |

@cursor (Bot) commented Jun 4, 2026:

PR Summary

Medium Risk
Large new sync surface area (external APIs, credentials, and deletion-reconciliation behavior); mistakes in listing completeness or caps could delete or miss KB documents until caught in live smoke tests.

Overview
Adds seven knowledge-base connectors and wires them into CONNECTOR_REGISTRY: Google Forms (Drive listing + deferred form/response rendering), Typeform, Azure DevOps (multi-phase wiki / WIQL work items / git files with PAT auth), YouTube, Jira Service Management (service-desk requests + optional comments), Amazon S3 (SigV4 list/get, custom endpoints for R2/MinIO), and Sentry (issues + latest-event hydration, self-hosted host).
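Azure DevOps lists wiki pages, WIQL work items, and repository files behind one pagination cursor. One way to do that, sketched here under assumptions, is to encode the current phase plus that phase's own continuation token as an opaque base64url JSON string. The `PhaseCursor` shape and the helper names are hypothetical, not the connector's actual cursor format.

```typescript
// Hypothetical phase order for a multi-phase listing.
const PHASES = ["wiki", "workItems", "files"] as const;
type Phase = (typeof PHASES)[number];

interface PhaseCursor {
  phase: Phase;
  inner?: string; // provider continuation token within the current phase
}

function encodeCursor(cursor: PhaseCursor): string {
  return Buffer.from(JSON.stringify(cursor), "utf8").toString("base64url");
}

// Falls back to the first phase when the cursor is absent or unreadable.
function decodeCursor(raw: string | undefined): PhaseCursor {
  if (!raw) return { phase: PHASES[0] };
  try {
    const parsed = JSON.parse(Buffer.from(raw, "base64url").toString("utf8"));
    if (PHASES.includes(parsed?.phase)) {
      const inner = typeof parsed.inner === "string" ? parsed.inner : undefined;
      return { phase: parsed.phase, inner };
    }
  } catch {
    // Malformed cursor: restart from the first phase.
  }
  return { phase: PHASES[0] };
}

// Next-page cursor: stay in the phase while it has more pages, else move on; undefined when done.
function nextCursor(current: Phase, innerNext: string | undefined): string | undefined {
  if (innerNext) return encodeCursor({ phase: current, inner: innerNext });
  const following = PHASES[PHASES.indexOf(current) + 1];
  return following ? encodeCursor({ phase: following }) : undefined;
}
```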

New implementations consistently set syncContext.listingCapped when listings are truncated (caps, API limits, auth gaps, or per-item errors) so full-sync deletion reconciliation does not remove documents that were never returned. Azure DevOps is the most elaborate (20k WIQL cap probe, phase cursors, maxItems across phases).
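The listingCapped contract can be sketched as a listing loop that shares one `maxItems` budget across phases and flags the sync context whenever documents are left unreturned, either because the budget ran out while items remained or because a phase failed (for example, an auth gap). `SyncContext`, `Page`, `FetchPage`, and `listAcrossPhases` are hypothetical stand-ins, and the fetcher is synchronous only to keep the example short.

```typescript
interface SyncContext {
  listingCapped?: boolean;
}

interface Page<T> {
  items: T[];
  next?: string; // continuation token within the current phase
}

// Hypothetical fetcher; a real connector would call the provider API asynchronously.
type FetchPage<T> = (phase: string, cursor?: string) => Page<T>;

// Returns null instead of throwing so the caller can record the gap.
function safeFetch<T>(fetchPage: FetchPage<T>, phase: string, cursor?: string): Page<T> | null {
  try {
    return fetchPage(phase, cursor);
  } catch {
    return null;
  }
}

function listAcrossPhases<T>(
  phases: string[],
  fetchPage: FetchPage<T>,
  maxItems: number,
  ctx: SyncContext,
): T[] {
  const out: T[] = [];
  for (const phase of phases) {
    let cursor: string | undefined;
    do {
      const page = safeFetch(fetchPage, phase, cursor);
      if (!page) {
        // Auth gap or API error: this phase's remaining documents were never returned.
        ctx.listingCapped = true;
        break;
      }
      for (const item of page.items) {
        if (out.length >= maxItems) {
          // Shared budget exhausted while items remain: the listing is incomplete.
          ctx.listingCapped = true;
          return out;
        }
        out.push(item);
      }
      cursor = page.next;
    } while (cursor);
  }
  return out;
}
```

The cap is flagged only when an unseen item actually turns up, so a source that ends exactly at `maxItems` still allows deletion reconciliation.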

Docs/skills: add-connector / validate-connector now require listingCapped checks; user docs expand the connector catalog (30 → 49), API-key sources, and setup examples. mapTags.test.ts covers tag mapping for the new connectors. Minor docs UI fix raises search dialog z-index above the sticky navbar.

Reviewed by Cursor Bugbot for commit 3bfec6e.

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts
Comment thread apps/sim/connectors/azure-devops/azure-devops.ts
Comment thread apps/sim/connectors/s3/s3.ts Outdated
@greptile-apps (Contributor, Bot) commented Jun 4, 2026:

Greptile Summary

This PR adds 7 new knowledge-base connectors (Google Forms, Typeform, Azure DevOps wiki+work items+repo files, YouTube, JSM, Amazon S3/R2/MinIO, and Sentry) plus a deregistered X connector, and hardens the sync engine with a new listingTruncated guard and an extracted shouldReconcileDeletions helper.

  • New connectors are well-structured: each implements listingCapped semantics consistently, uses metadata-based content hashes (ETag, revision ID, status date, publish time), and defers per-document hydration to getDocument when appropriate.
  • Sync engine fix: the new listingTruncated flag correctly blocks deletion reconciliation when pagination is forcibly stopped (MAX_PAGES or a missing cursor), and — unlike listingCapped — cannot be overridden by a forced full-sync.
  • readBodyWithLimit helper in utils.ts streams the response body with a hard byte cap and cancels the reader on overflow, fixing the previously-reported S3 content-length bypass.
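A streaming byte cap along the lines of the readBodyWithLimit helper described above can be sketched with the WHATWG stream reader available on `fetch` responses. The signature and the choice to return `null` on overflow are assumptions, not the actual `utils.ts` API; the point is that the running byte count, not the `Content-Length` header, decides when to cancel.

```typescript
/**
 * Reads a fetch Response body up to maxBytes. Returns the decoded text, or null when
 * the body exceeds the cap (the reader is cancelled so no further bytes are pulled).
 */
async function readBodyWithLimit(res: Response, maxBytes: number): Promise<string | null> {
  if (!res.body) return "";
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const result = await reader.read();
    if (result.done) break;
    const chunk = result.value;
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Count real bytes instead of trusting Content-Length; stop on overflow.
      await reader.cancel();
      return null;
    }
    chunks.push(chunk);
  }
  // Join chunks before decoding so multi-byte characters split across chunks survive.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return new TextDecoder().decode(body);
}
```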

Confidence Score: 5/5

Safe to merge; the connector logic is well-reasoned and the sync-engine changes are covered by new tests.

The connectors follow established patterns correctly: content-hash invariance between listing and hydration, proper listingCapped guards in all new connectors, and streaming body reads to prevent oversized-object buffering. The sync-engine refactor is additive and tested. The only findings are a minor pattern inconsistency in the YouTube listingCapped guard (unreachable in practice due to how effectivePageSize is constrained) and a docs-site global.css edit that would be cleaner in a component-local file.
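Content-hash invariance means the listing stub and the hydrated document compute the same hash from the same metadata, so an unchanged item is skipped without calling getDocument. A sketch using the two Google Forms fields the review mentions (revisionId and latestResponseTime); the `FormMeta` and `ConnectorDoc` shapes, the `externalId` field, and the SHA-256 wrapping are assumptions.

```typescript
import { createHash } from "node:crypto";

// Metadata the review names for Google Forms change detection.
interface FormMeta {
  revisionId: string; // changes when the form structure changes
  latestResponseTime?: string; // changes on new submissions; absent before the first one
}

// Hypothetical document shape; contentDeferred mirrors the flowchart's stub/full split.
interface ConnectorDoc {
  externalId: string;
  contentHash: string;
  contentDeferred: boolean;
  content?: string;
}

// Single source of truth for the hash: listing and hydration both call this.
function formContentHash(meta: FormMeta): string {
  const key = `${meta.revisionId}|${meta.latestResponseTime ?? "none"}`;
  return createHash("sha256").update(key, "utf8").digest("hex");
}

// Listing emits a cheap stub; content is fetched later only if the hash changed.
function toStub(externalId: string, meta: FormMeta): ConnectorDoc {
  return { externalId, contentHash: formContentHash(meta), contentDeferred: true };
}

// Hydration reuses the metadata-derived hash so it matches the stub for an unchanged form.
function toFullDoc(externalId: string, meta: FormMeta, content: string): ConnectorDoc {
  return { externalId, contentHash: formContentHash(meta), contentDeferred: false, content };
}
```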

No files require special attention beyond the minor notes on apps/sim/connectors/youtube/youtube.ts and apps/docs/app/global.css.

Important Files Changed

| Filename | Overview |
|---|---|
| apps/sim/connectors/youtube/youtube.ts | New YouTube connector; good publishedAfter early-stop logic for uploads playlists, and Shorts detection via a batched videos.list call. Minor: listingCapped guard is missing the overflow > 0 branch that all other connectors include. |
| apps/sim/connectors/s3/s3.ts | New S3 connector with custom SigV4 signing, path-style addressing for R2/MinIO, streaming readBodyWithLimit body guard, and ETag-based change detection. Previously-flagged content-length bypass has been addressed. |
| apps/sim/connectors/sentry/sentry.ts | New Sentry connector syncing issues with latest-event enrichment. Link-header cursor parsing, two-probe validation (project + issues), and self-hosted support look correct. |
| apps/sim/connectors/google-forms/google-forms.ts | New Google Forms connector. Content hash combines Drive revisionId (structure) and latestResponseTime (new submissions) for accurate change detection. |
| apps/sim/connectors/azure-devops/azure-devops.ts | New Azure DevOps connector covering wiki pages, work items (WIQL), and repository files across three cursor-encoded phases. Binary detection and readBodyWithLimit file size capping are present. |
| apps/sim/lib/knowledge/connectors/sync-engine.ts | Extracts shouldReconcileDeletions into a tested helper and adds a new listingTruncated guard that is non-overridable even by a forced full-sync. Correct behaviour. |
| apps/sim/connectors/utils.ts | Adds readBodyWithLimit, parseMultiValue, and joinTagArray helpers. readBodyWithLimit streams and cancels on overflow, fixing the previously-reported S3 issue. |
| apps/docs/app/global.css | Adds z-index and !important overrides for the docs search dialog to fix navbar overlap. Rule c3b5e4b0 recommends against editing global.css for scoped component fixes. |
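For the Sentry row above: Sentry's web API paginates with a `Link` header whose entries carry `rel`, `results`, and `cursor` attributes, and a `rel="next"` entry with `results="false"` marks the last page. A sketch of extracting the next cursor; `parseNextCursor` is a hypothetical name, and splitting on commas assumes attribute values contain none.

```typescript
/**
 * Returns the next-page cursor from a Sentry-style Link header, or undefined when
 * there is no next page (missing header, no rel="next", or results="false").
 */
function parseNextCursor(linkHeader: string | null): string | undefined {
  if (!linkHeader) return undefined;
  // Assumes entries are comma-separated and attribute values contain no commas.
  for (const entry of linkHeader.split(",")) {
    const attrs = new Map<string, string>();
    // Collect key="value" attributes; the <url> part has no quotes and is skipped.
    for (const match of entry.matchAll(/(\w+)="([^"]*)"/g)) {
      const [, key = "", value = ""] = match;
      attrs.set(key, value);
    }
    if (attrs.get("rel") === "next") {
      return attrs.get("results") === "true" ? attrs.get("cursor") : undefined;
    }
  }
  return undefined;
}
```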

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    LS[listDocuments page N] --> DOCS{documents fetched}
    DOCS -->|contentDeferred: true| STUB[emit stub]
    DOCS -->|contentDeferred: false| FULL[emit full doc]
    STUB --> HASH{content hash changed?}
    FULL --> HASH
    HASH -->|unchanged| SKIP[skip - docsUnchanged++]
    HASH -->|changed| GD[getDocument - hydrate content]
    GD -->|null| DEL[mark for deletion]
    GD -->|doc| UPSERT[add/update in KB]
    LS --> CAP{listingCapped?}
    CAP -->|true| NORECONCILE[skip deletion reconciliation]
    CAP -->|false| RECONCILE[hard-delete absent docs]
    TRUNCATED{listingTruncated?} -->|true| NORECONCILE
    FULLSYNC{forced fullSync?} -->|true + listingCapped| RECONCILE
    FULLSYNC -->|true + listingTruncated| NORECONCILE
```
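Read off the flowchart, the reconciliation decision reduces to a small predicate: engine-detected truncation always blocks deletions, while a connector-reported cap yields only to a forced full sync. This is a sketch; the real shouldReconcileDeletions in `sync-engine.ts` may take a different signature or additional inputs.

```typescript
interface ListingOutcome {
  listingCapped: boolean; // set by the connector (caps, API limits, auth gaps, item errors)
  listingTruncated: boolean; // set by the engine (MAX_PAGES hit or missing cursor)
}

function shouldReconcileDeletions(outcome: ListingOutcome, forcedFullSync: boolean): boolean {
  // Engine-detected truncation blocks deletions unconditionally.
  if (outcome.listingTruncated) return false;
  // A connector-reported cap can be overridden only by an explicit full sync.
  if (outcome.listingCapped) return forcedFullSync;
  // Complete listing: documents absent from it can be hard-deleted.
  return true;
}
```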

Reviews (11): Last reviewed commit: "fix(connectors): ado discards foreign-ph..."

Comment thread apps/sim/connectors/google-forms/google-forms.ts
Comment thread apps/sim/connectors/s3/s3.ts
Comment thread apps/sim/connectors/jsm/jsm.ts
@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

Comment thread apps/sim/connectors/google-forms/google-forms.ts
Comment thread apps/sim/connectors/azure-devops/azure-devops.ts

@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts
Comment thread apps/sim/connectors/jsm/jsm.ts
Comment thread apps/sim/connectors/google-forms/google-forms.ts

@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts

@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

@waleedlatif1 (Collaborator Author): @greptile

@cursor (Bot) left a review comment:

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 5f4516b.

…(fullSync cannot override); s3 byte-exact size fallback; ado tsdoc accuracy
@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts
Comment thread apps/sim/connectors/azure-devops/azure-devops.ts

@waleedlatif1 (Collaborator Author): @cursor review

@waleedlatif1 (Collaborator Author): @greptile

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts

@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

Comment thread apps/sim/connectors/azure-devops/azure-devops.ts
Comment thread apps/sim/connectors/google-forms/google-forms.ts Outdated

@waleedlatif1 (Collaborator Author): @greptile

@waleedlatif1 (Collaborator Author): @cursor review

@cursor (Bot) left a review comment:

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 3bfec6e.

- registry: register x connector (was dead code, never wired in)
- google-docs/google-drive/google-forms: gate deletion reconciliation on
  Drive incompleteSearch; google-docs also now sets listingCapped on its
  maxDocs cap path
- jsm: add read:jira-user scope so reporter resolves on requests
- gong: only set listingCapped on genuine truncation, not exact-cap
  source exhaustion
- gitlab: issues phase switched to keyset pagination (removes ~50k
  offset ceiling), matching the repo-tree phase
- grain: parallelize recording + transcript fetch in getDocument
- ashby: document updatedAt-based content-hash limitation for
  notes/feedback change detection
- tests: mapTags coverage for x, granola, greenhouse, fathom, rootly
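Two of the completeness rules above can be sketched together: Drive v3 `files.list` responses include an `incompleteSearch` flag, which must block deletion reconciliation, and (per the gong fix) stopping at a cap counts as truncation only if files were actually left behind. `DriveListPage`, `isListingCapped`, `keptFromPage`, and `reachedCap` are hypothetical names for illustration, not the connectors' code.

```typescript
// Fields of a Drive v3 files.list response that matter for completeness.
interface DriveListPage {
  files: { id: string }[];
  nextPageToken?: string;
  incompleteSearch?: boolean;
}

/**
 * Returns true when deletion reconciliation must be skipped for this listing.
 * keptFromPage: how many of page.files were kept before stopping (hypothetical).
 * reachedCap: whether the configured document cap was hit (hypothetical).
 */
function isListingCapped(page: DriveListPage, keptFromPage: number, reachedCap: boolean): boolean {
  // Drive reports that it did not search every matching file.
  if (page.incompleteSearch) return true;
  if (!reachedCap) return false;
  // Hitting the cap is genuine truncation only if files remain on this page or beyond it.
  return keptFromPage < page.files.length || page.nextPageToken !== undefined;
}
```

A `nextPageToken` that leads to an empty page still flags the listing as capped here; that errs toward keeping documents rather than deleting them.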
waleedlatif1 merged commit 2ccd9a3 into staging on Jun 4, 2026 (9 checks passed).
waleedlatif1 deleted the feat/connect branch on June 4, 2026 21:49.