1,602 changes: 1,602 additions & 0 deletions .github/workflows/daily-ambient-context-optimizer.lock.yml

Large diffs are not rendered by default.

297 changes: 297 additions & 0 deletions .github/workflows/daily-ambient-context-optimizer.md
@@ -0,0 +1,297 @@
---
emoji: "🌫️"
name: Daily Ambient Context Optimizer
description: Samples recent agentic workflow runs, inspects the first DLLM request text, and recommends prompt, skill, and agent changes to shrink ambient context
on:
  schedule: daily
  workflow_dispatch:
permissions:
  contents: read
  actions: read
  issues: read
  pull-requests: read
tracker-id: daily-ambient-context-optimizer
strict: true
max-daily-effective-tokens: 100M
network:
  allowed: [defaults, github]
tools:
  agentic-workflows:
  bash: true
safe-outputs:
  mentions: false
  allowed-github-references: []
  create-issue:
    title-prefix: "[ambient-context] "
    labels: [automation, report, workflow-optimization, analysis]
    close-older-issues: true
    expires: 7d
    max: 1
timeout-minutes: 45
steps:
  - name: Setup Python runtime
    uses: actions/setup-python@v6.2.0
    with:
      python-version: "3.12"
  - name: Prepare analysis workspace
    run: |
      mkdir -p /tmp/gh-aw/ambient-context
imports:
  - shared/otlp.md
---

# Daily Ambient Context Optimizer

You are a cost-optimization analyst for `${{ github.repository }}`.

Your job is to inspect the **first request sent to the DLLM** for several recent workflow runs, identify avoidable ambient context, and publish exactly one issue with concrete workflow improvements.

## Goals

1. Sample several agentic workflow runs from the last 24 hours.
2. Inspect the first DLLM request text for each sampled run.
3. Use deterministic Python analysis to measure prompt bloat and repetition.
4. Recommend the highest-leverage improvements to workflow `.md` files, skill usage, and the set of agents/sub-agents.
5. Create exactly one detailed issue report.

## Data Collection

### Step 1 — Download recent runs

Use the `logs` tool from the `agentic-workflows` MCP server with:

- `start_date: "-1d"`
- `count: 120`
- `parse: true`

The tool downloads run data under `/tmp/gh-aw/aw-mcp/logs/`.

### Step 2 — Pick the sample set

Sample **6 runs** when available. If fewer than 6 eligible runs exist, sample all of them. If fewer than 3 eligible runs exist, fall back to a reduced-data report.

Eligibility rules:

- `status == "completed"`
- exclude this workflow itself
- prefer successful runs, but include up to 2 failed runs when they have usable prompt artifacts
- prefer breadth: no more than 2 runs from the same workflow when alternatives exist
- require a usable first-request source:
  - preferred: `prompt.txt`
  - fallback: the first `user.message` event in `events.jsonl`

Within those rules, prefer higher-cost runs, ranking candidates by `effective_tokens`, `token_usage`, `turns`, or prompt size, whichever is available.
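A minimal Python sketch of one way to apply these rules. It assumes each run from the `logs` output has been flattened into a dict with `run_id`, `workflow_name`, `status`, `conclusion`, and numeric cost fields, plus a hypothetical `has_request_source` flag meaning `prompt.txt` or `events.jsonl` exists; the real field names in the downloaded data may differ. Any conclusion other than `success` is treated as failed.

```python
from collections import Counter

# Assumption: the name this optimizer's own runs carry in the logs.
SELF_WORKFLOW = "Daily Ambient Context Optimizer"


def cost_key(run):
    """Cost signals in priority order; missing or null values count as 0."""
    return (
        run.get("effective_tokens") or 0,
        run.get("token_usage") or 0,
        run.get("turns") or 0,
        run.get("request_chars") or 0,  # hypothetical prompt-size field
    )


def select_sample(runs, target=6, max_failed=2, max_per_workflow=2):
    """Pick up to `target` runs following the eligibility rules."""
    eligible = [
        r
        for r in runs
        if r.get("status") == "completed"
        and r.get("workflow_name") != SELF_WORKFLOW
        and r.get("has_request_source")  # hypothetical flag
    ]
    # Successful runs first, then highest cost first within each group.
    eligible.sort(
        key=lambda r: (r.get("conclusion") != "success",)
        + tuple(-value for value in cost_key(r))
    )
    picked, picked_ids, per_workflow, failed = [], set(), Counter(), 0
    # The first pass enforces the per-workflow cap; the second pass relaxes
    # it only when no alternatives were left to fill the sample.
    for enforce_cap in (True, False):
        for run in eligible:
            if len(picked) >= target:
                return picked
            if run["run_id"] in picked_ids:
                continue
            is_failed = run.get("conclusion") != "success"
            if is_failed and failed >= max_failed:
                continue
            if enforce_cap and per_workflow[run["workflow_name"]] >= max_per_workflow:
                continue
            picked.append(run)
            picked_ids.add(run["run_id"])
            per_workflow[run["workflow_name"]] += 1
            failed += is_failed
    return picked
```

The two-pass loop is one reading of "no more than 2 runs from the same workflow when alternatives exist": the cap is dropped only if the first pass cannot reach the target.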

### Step 3 — Enrich a subset with audits

Run the `audit` tool from the `agentic-workflows` MCP server for the **3 most expensive sampled runs** so you have richer token context and references.

## First-Request Extraction Rules

Treat the first DLLM request text as:

1. `prompt.txt` when present, because it is the generated prompt sent to the agent
2. otherwise, extract the first user-message payload from the run's `events.jsonl`
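A minimal sketch of this fallback order in Python. It assumes each run's files sit in their own directory under `/tmp/gh-aw/aw-mcp/logs/`, and that each `events.jsonl` line is a JSON object whose `type` is `"user.message"` with the text at `data.content`; both the layout and those field names are assumptions to verify against the downloaded files.

```python
import json
from pathlib import Path


def extract_first_request(run_dir):
    """Return (text, source) for one downloaded run, or (None, None)."""
    run_dir = Path(run_dir)
    prompt = run_dir / "prompt.txt"
    if prompt.is_file():
        # Preferred source: the generated prompt sent to the agent.
        return prompt.read_text(encoding="utf-8", errors="replace"), "prompt.txt"
    events = run_dir / "events.jsonl"
    if events.is_file():
        with events.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than failing the run
                if not isinstance(event, dict) or event.get("type") != "user.message":
                    continue
                # Assumption: the message text lives at event["data"]["content"].
                data = event.get("data")
                content = data.get("content") if isinstance(data, dict) else None
                if isinstance(content, str):
                    return content, "events.jsonl"
    return None, None
```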

For each sampled run, save the extracted text to:

- `/tmp/gh-aw/ambient-context/samples/run-<id>.txt`

Also save one metadata JSON file per run at:

- `/tmp/gh-aw/ambient-context/samples/run-<id>.json`

Include at least:

- `run_id`
- `workflow_name`
- `workflow_path`
- `run_url`
- `status`
- `conclusion`
- `effective_tokens`
- `token_usage`
- `turns`
- `request_chars`
- `request_lines`
- `request_source`
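A minimal sketch of writing both sample files in Python. The `run` dict is assumed to carry the fields listed above under the same names; any that are missing are written as `null` rather than guessed.

```python
import json
from pathlib import Path

SAMPLES_DIR = Path("/tmp/gh-aw/ambient-context/samples")

# Fields copied from the run record; missing ones become null in the JSON.
RUN_FIELDS = (
    "run_id", "workflow_name", "workflow_path", "run_url", "status",
    "conclusion", "effective_tokens", "token_usage", "turns",
)


def save_sample(run, text, source, samples_dir=SAMPLES_DIR):
    """Write run-<id>.txt and run-<id>.json, then return the metadata dict."""
    samples_dir = Path(samples_dir)
    samples_dir.mkdir(parents=True, exist_ok=True)
    run_id = run["run_id"]
    (samples_dir / f"run-{run_id}.txt").write_text(text, encoding="utf-8")
    meta = {field: run.get(field) for field in RUN_FIELDS}
    meta["request_chars"] = len(text)
    meta["request_lines"] = len(text.splitlines())
    meta["request_source"] = source
    (samples_dir / f"run-{run_id}.json").write_text(
        json.dumps(meta, indent=2, sort_keys=True), encoding="utf-8"
    )
    return meta
```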

## Deterministic Analysis

Write and run a Python script at `/tmp/gh-aw/ambient-context/analyze_requests.py`.

Use only the Python standard library. Do **not** install third-party packages.

The script must read every sampled `run-*.txt` and `run-*.json` file and produce:

- `/tmp/gh-aw/ambient-context/request-analysis.json`
- `/tmp/gh-aw/ambient-context/request-analysis.md`

The script must compute deterministic metrics for each sampled first request:

- bytes, characters, lines, words
- markdown heading count
- list item count
- code fence count
- HTML `<details>` count
- table row count
- inline agent count (`## agent:`)
- inline skill count (`## skill:`)
- imported skill reference count (`SKILL.md`)
- duplicate line ratio
- duplicate paragraph ratio
- longest 5 sections by heading
- top repeated non-trivial lines or paragraphs
- count of lines mentioning tools, skills, agents, safe outputs, and workflow instructions
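A minimal stdlib-only sketch of the per-request metrics in Python. The patterns are line-based heuristics, not a Markdown parser: `#` comment lines inside code blocks count as headings, fence lines are counted individually (two per block), and the topic counts use simple keyword regexes chosen here for illustration, with "workflow instructions" approximated by the word "workflow".

```python
import re
from collections import Counter

FENCE = "`" * 3  # built this way so the sketch can sit inside a Markdown fence
HEADING = re.compile(r"^#{1,6}\s")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s")
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$")
# Heuristic keyword patterns for the "lines mentioning ..." counts.
TOPICS = {
    "tools": re.compile(r"\btools?\b", re.I),
    "skills": re.compile(r"\bskills?\b", re.I),
    "agents": re.compile(r"\bagents?\b", re.I),
    "safe_outputs": re.compile(r"\bsafe[-_ ]outputs?\b", re.I),
    "workflow_instructions": re.compile(r"\bworkflows?\b", re.I),
}


def duplicate_ratio(items):
    """Share of non-blank items that repeat an earlier identical item."""
    items = [item.strip() for item in items if item.strip()]
    return (len(items) - len(set(items))) / len(items) if items else 0.0


def longest_sections(lines, n=5):
    """(heading, size in chars) per heading-delimited section, largest first."""
    sections, title, size = [], "(preamble)", 0
    for line in lines:
        if HEADING.match(line):
            sections.append((title, size))
            title, size = line.strip(), 0
        else:
            size += len(line) + 1  # +1 for the newline removed by splitlines()
    sections.append((title, size))
    return sorted(sections, key=lambda s: s[1], reverse=True)[:n]


def count(lines, predicate):
    return sum(1 for line in lines if predicate(line))


def request_metrics(text, min_fragment_len=20):
    lines = text.splitlines()
    paragraphs = re.split(r"\n\s*\n", text)
    repeated = Counter(
        line.strip() for line in lines if len(line.strip()) >= min_fragment_len
    )
    return {
        "bytes": len(text.encode("utf-8")),
        "chars": len(text),
        "lines": len(lines),
        "words": len(text.split()),
        "headings": count(lines, HEADING.match),
        "list_items": count(lines, LIST_ITEM.match),
        "code_fence_lines": count(lines, lambda ln: ln.lstrip().startswith(FENCE)),
        "details_blocks": len(re.findall(r"<details\b", text)),
        "table_rows": count(lines, TABLE_ROW.match),
        "inline_agents": count(lines, lambda ln: ln.startswith("## agent:")),
        "inline_skills": count(lines, lambda ln: ln.startswith("## skill:")),
        "skill_md_refs": text.count("SKILL.md"),
        "duplicate_line_ratio": round(duplicate_ratio(lines), 4),
        "duplicate_paragraph_ratio": round(duplicate_ratio(paragraphs), 4),
        "longest_sections": longest_sections(lines),
        "repeated_fragments": [(f, c) for f, c in repeated.most_common(10) if c > 1],
        "topic_lines": {name: count(lines, p.search) for name, p in TOPICS.items()},
    }
```

The `min_fragment_len` threshold of 20 characters is an arbitrary cut for "non-trivial" lines; tune it against real prompts.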

Aggregate metrics across the sample set:

- sampled run count
- distinct workflow count
- median request chars
- p95 request chars
- top workflows by first-request size
- most common repeated fragments
- most common large-section headings
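A minimal sketch of the aggregation step in Python, assuming each sample is the metadata dict merged with the per-request metrics (including `repeated_fragments` and `longest_sections` as pairs). The p95 here uses the nearest-rank definition, one of several common percentile conventions, computed with integer math so it does not depend on float rounding.

```python
import statistics
from collections import Counter


def nearest_rank(values, pct):
    """Nearest-rank percentile: the ceil(pct * n / 100)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def aggregate(samples, top_n=5):
    chars = [s["request_chars"] for s in samples]
    largest_by_workflow, fragments, headings = {}, Counter(), Counter()
    for s in samples:
        name = s["workflow_name"]
        largest_by_workflow[name] = max(largest_by_workflow.get(name, 0), s["request_chars"])
        # Count each fragment or heading once per run: totals mean "seen in N runs".
        fragments.update({frag for frag, _count in s.get("repeated_fragments", [])})
        headings.update({title for title, _size in s.get("longest_sections", [])})
    top = sorted(largest_by_workflow.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "sampled_run_count": len(samples),
        "distinct_workflow_count": len(largest_by_workflow),
        "median_request_chars": statistics.median(chars) if chars else 0,
        "p95_request_chars": nearest_rank(chars, 95) if chars else 0,
        "top_workflows_by_size": top[:top_n],
        "common_repeated_fragments": fragments.most_common(top_n),
        "common_large_headings": headings.most_common(top_n),
    }
```

With only 3 to 6 samples, the nearest-rank p95 is simply the largest request, which is worth stating in the report.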

## Source Review

For every sampled run, read the current workflow source file from the repository when `workflow_path` resolves to a local `.github/workflows/*.md` file.

Assess whether the request size is likely driven by:

- verbose workflow markdown
- overly broad or duplicated skill instructions
- too many inline agents or agent definitions that are not justified
- duplicated guardrails, examples, or formatting rules
- context that should be moved to deterministic `steps:` or smaller sub-agents

## Sub-Agent Usage

After the deterministic Python script finishes, and whenever at least 3 runs were sampled, invoke `request-optimizer` **once per sampled run**. Pass it that run's compact JSON summary, not the raw full prompt.

Each sub-agent invocation may return at most 3 opportunities for its run. Aggregate and deduplicate those per-run opportunities, then do the final prioritization yourself.
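A minimal Python sketch of the aggregate-and-deduplicate step, assuming each sub-agent response follows the JSON shape defined for `request-optimizer`. The dedupe key (category plus a case- and whitespace-normalized finding) is a deliberately naive heuristic; near-duplicates worded differently still need your judgment, and the sort by impact and then by number of affected workflows is only a starting order for your final prioritization.

```python
IMPACT_RANK = {"low": 1, "medium": 2, "high": 3}


def merge_opportunities(responses, max_per_run=3):
    """Deduplicate per-run opportunities and order them by impact, then breadth."""
    merged = {}
    for response in responses:
        workflow = response.get("workflow_name", "unknown")
        # Enforce the sub-agent contract even if a response returns too many.
        for opp in response.get("opportunities", [])[:max_per_run]:
            key = (opp["category"], " ".join(opp["finding"].lower().split()))
            entry = merged.setdefault(key, {
                "category": opp["category"],
                "finding": opp["finding"],
                "workflows": [],
                "evidence": [],
                "impact": "low",
            })
            if workflow not in entry["workflows"]:
                entry["workflows"].append(workflow)
            for item in opp.get("evidence", []):
                if item not in entry["evidence"]:
                    entry["evidence"].append(item)
            # Keep the highest impact any run assigned to this finding.
            if IMPACT_RANK.get(opp.get("impact"), 0) > IMPACT_RANK[entry["impact"]]:
                entry["impact"] = opp["impact"]
    return sorted(
        merged.values(),
        key=lambda e: (-IMPACT_RANK[e["impact"]], -len(e["workflows"]), e["finding"]),
    )
```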

## Recommendation Rules

Produce **3 to 7** recommendations total.

Each recommendation must include:

- category: `workflow-md`, `skills`, or `agents`
- affected workflow(s)
- evidence from deterministic metrics
- why it should shrink the first request
- expected impact: `high`, `medium`, or `low`
- whether the change is likely safe immediately or needs manual review

Prioritize recommendations that:

1. remove repeated context shared across many runs
2. reduce broad skill loading or oversized skill fusion
3. simplify or remove low-value inline agents
4. move deterministic data gathering out of the main prompt

Do not recommend changes that would obviously weaken safety or remove necessary task context.

## Report Requirements

Create exactly one issue titled:

`[ambient-context] Daily Ambient Context Optimizer - YYYY-MM-DD`

Use only `###` or lower-level headings.

Keep the issue structured like this:

### Executive Summary
- runs sampled
- workflows covered
- median and p95 first-request size
- highest-level conclusion

### Highest-Leverage Changes
- a concise numbered list of the top recommendations

### Key Metrics
| Metric | Value |
|---|---|
| Sampled runs | ... |
| Distinct workflows | ... |
| Median chars | ... |
| P95 chars | ... |
| Largest sampled request | ... |

<details>
<summary>Per-Run First-Request Metrics</summary>

Include a markdown table with one row per sampled run.

</details>

<details>
<summary>Repeated Ambient Context Signals</summary>

Summarize repeated sections, duplicated fragments, and bloated headings.

</details>

<details>
<summary>Deterministic Analysis Output</summary>

Summarize the Python script outputs and cite the most relevant metrics.

</details>

### Recommendations by Category
#### Workflow Markdown
#### Skills
#### Agents

### References
- Include up to 3 sampled run links in `[§12345](https://github.com/owner/repo/actions/runs/12345)` format

## Reduced-Data Behavior

If fewer than 3 eligible runs exist, still create the issue.

In that case:

- explain the reduced sample size clearly
- report whatever evidence is available
- prioritize repository-wide recommendations only when supported by the sampled data

Do not use `noop` merely because the sample is small or imperfect. Create exactly one issue whenever logs are available. Use `noop` only if no run logs can be downloaded at all or the repository context is unavailable.

## agent: `request-optimizer`
---
description: Ranks prompt-shrinking opportunities for one sampled run from compact deterministic metrics
model: small
---
You are a compact optimization classifier.

Input:
- one JSON object for a sampled run
- optional workflow source excerpt

Return JSON only:

```json
{
  "run_id": 123,
  "workflow_name": "name",
  "opportunities": [
    {
      "category": "workflow-md|skills|agents",
      "finding": "short statement",
      "evidence": ["metric or source detail"],
      "impact": "high|medium|low"
    }
  ]
}
```

Rules:
- return at most 3 opportunities
- use only provided evidence
- prefer opportunities that reduce first-request size without reducing safety
39 changes: 39 additions & 0 deletions actions/setup/js/copilot_harness.test.cjs
@@ -425,12 +425,51 @@ describe("copilot_harness.cjs", () => {
expect(onPermissionRequest({ kind: "mcp", serverName: "github", toolName: "get_file_contents" })).toEqual({ kind: "approve-once" });
expect(onPermissionRequest({ kind: "url", url: "https://example.com" })).toEqual({ kind: "approve-once" });
expect(onPermissionRequest({ kind: "write", fileName: "a.txt", diff: "", intention: "" })).toEqual({ kind: "approve-once" });
expect(onPermissionRequest({ kind: "read", fileName: "a.txt" })).toEqual({
kind: "reject",
feedback: "Tool invocation is not allowed by workflow tool permissions.",
});
expect(onPermissionRequest({ kind: "shell", commands: [{ identifier: "rm" }], fullCommandText: "rm -rf /tmp/x" })).toEqual({
kind: "reject",
feedback: "Tool invocation is not allowed by workflow tool permissions.",
});
});

it("allows read requests when read is explicitly allowlisted", async () => {
const disconnect = vi.fn().mockResolvedValue(undefined);
const stop = vi.fn().mockResolvedValue(undefined);
const createSession = vi.fn().mockResolvedValue({
sessionId: "session-read-allowed",
on: () => {},
sendAndWait: vi.fn().mockResolvedValue({ data: { content: "ok" } }),
disconnect,
});
class FakeCopilotClient {
start = vi.fn().mockResolvedValue(undefined);
createSession = createSession;
stop = stop;
}

const result = await runWithCopilotSDK({
sdkUri: "http://127.0.0.1:3002",
prompt: "test prompt",
logger: () => {},
permissionConfig: {
allowedTools: ["read"],
},
sdkModule: {
CopilotClient: FakeCopilotClient,
RuntimeConnection: { forUri: vi.fn(() => ({})) },
approveAll: () => ({ kind: "approve-once" }),
},
});

expect(result.exitCode).toBe(0);
const sessionConfig = createSession.mock.calls[0][0];
const onPermissionRequest = sessionConfig.onPermissionRequest;
expect(onPermissionRequest({ kind: "read", fileName: "a.txt" })).toEqual({ kind: "approve-once" });
});

it("logs permission-denied SDK requests as core warnings", async () => {
const disconnect = vi.fn().mockResolvedValue(undefined);
const stop = vi.fn().mockResolvedValue(undefined);
5 changes: 3 additions & 2 deletions actions/setup/js/copilot_sdk_driver.cjs
@@ -77,6 +77,8 @@ function summarizePermissionRequest(request) {
return `url(${request.url || "unknown"})`;
case "write":
return `write(${request.fileName || "unknown"})`;
+case "read":
+return "read";
case "custom-tool":
return `custom-tool(${request.toolName || "unknown"})`;
default:
@@ -164,8 +166,7 @@ function buildCopilotSDKPermissionHandler(permissionConfig, approveAll, logOptio
case "write":
return allowedToolEntries.has("write");
case "read":
-// Read permissions are low-risk and are broadly expected by the agent flow.
-return true;
+return allowedToolEntries.has("read");
case "url":
return allowedToolEntries.has("web_fetch");
case "mcp":