19 Commits

Author SHA1 Message Date
Yury Semikhatsky
0f7fd1362f chore: mark 0.0.12 (#176) 2025-04-14 19:39:10 -07:00
Yury Semikhatsky
de08c24b96 fix: consider DISPLAY only on linux (#175) 2025-04-14 19:07:39 -07:00
Pavel Feldman
71e51ea42a chore: mark v0.0.11 (#173) 2025-04-14 16:48:36 -07:00
Pavel Feldman
0c5a104e0f chore: default to headless when DISPLAY is missing (#172)
Fixes https://github.com/microsoft/playwright-mcp/issues/165
2025-04-14 16:47:32 -07:00
Pavel Feldman
606b898a71 chore: allow reusing tab over cdp (#170)
Fixes https://github.com/microsoft/playwright-mcp/issues/164
2025-04-14 16:39:58 -07:00
Simon Knott
e729494bd9 feat: browser_resize (#92) 2025-04-14 16:09:48 -07:00
Cameron
77080e8ca4 Restore package-lock.json module hashes (#151)
- Adds integrity hashes that were missing for 5 npm packages in
`package-lock.json`

<hr />

Is there a reason hashes for some of these dependencies are missing from
`package-lock.json`?

Right now these omissions prevent me from packaging a nix derivation for
this mcp server directly off this repo
(https://github.com/cameronfyfe/nix-mcp-servers/blob/main/pkgs/servers/mcp-server-playwright/default.nix#L17)
and I was wondering if they might just be missing due to a bad merge or
something odd like that at some point. `npm install` under normal use
doesn't seem to care if the hashes are missing and installs the packages
anyway, but it's a blocker for hermetic build systems like nix.

If this is intentional for some reason I'm not familiar with feel free
to ignore and close.
2025-04-10 15:24:36 +02:00
Simon Knott
31ac1ed191 fix: exit watchdog should listen for SIGINT/SIGTERM (#144) 2025-04-07 14:51:57 -07:00
Paul Irish
b8ff009b0a chore: add back stable vscode install button (#145) 2025-04-07 14:18:01 -07:00
Yoshiki Nakagawa
42167878fb chore: Update README.md (#140)
Remove old documents
2025-04-07 08:54:05 +02:00
Pavel Feldman
6b15c7e422 chore: mark v0.0.10 (#138) 2025-04-05 19:14:50 -07:00
Pavel Feldman
abd56f514b chore: introduce capabilities argument (#135) 2025-04-04 17:14:30 -07:00
Pavel Feldman
707ebbf4d4 chore: group tools, prepare for capabilities (#134) 2025-04-04 15:22:00 -07:00
Pavel Feldman
fc0cccf4a5 chore: reuse the first tab when navigating (#131) 2025-04-03 22:39:55 -07:00
Pavel Feldman
e36d4ea695 chore: allow multiple tabs (#129) 2025-04-03 19:24:17 -07:00
Pavel Feldman
b358e47d71 chore: prep for multiple pages in context (#124) 2025-04-03 10:30:05 -07:00
Yury Semikhatsky
38f038a5dc chore: typo in description (#127) 2025-04-02 17:26:45 -07:00
Yury Semikhatsky
2291011dc7 feat: add slowly option for typing one character at a time (#121) 2025-04-02 14:36:30 -07:00
Pavel Feldman
89627fd23a chore: extract page snapshot, prep for multipage (#120) 2025-04-02 11:42:39 -07:00
30 changed files with 1553 additions and 657 deletions

View File

@@ -22,6 +22,9 @@ jobs:
- name: Install dependencies
run: npm ci
- name: Playwright install
run: npx playwright install --with-deps
- name: Run linting
run: npm run lint

152
README.md
View File

@@ -43,7 +43,7 @@ const urlForWebsites = `vscode:mcp/install?${encodeURIComponent(config)}`;
const urlForGithub = `https://insiders.vscode.dev/redirect?url=${encodeURIComponent(urlForWebsites)}`;
-->
[<img alt="Install in VS Code Insiders" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Server&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522playwright%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522-y%2522%252C%2522%2540playwright%252Fmcp%2540latest%2522%255D%257D)
[<img src="https://img.shields.io/badge/VS_Code-VS_Code?style=flat-square&label=Install%20Server&color=0098FF" alt="Install in VS Code">](https://insiders.vscode.dev/redirect?url=vscode%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522playwright%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522-y%2522%252C%2522%2540playwright%252Fmcp%2540latest%2522%255D%257D) [<img alt="Install in VS Code Insiders" src="https://img.shields.io/badge/VS_Code_Insiders-VS_Code_Insiders?style=flat-square&label=Install%20Server&color=24bfa5">](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Amcp%2Finstall%3F%257B%2522name%2522%253A%2522playwright%2522%252C%2522command%2522%253A%2522npx%2522%252C%2522args%2522%253A%255B%2522-y%2522%252C%2522%2540playwright%252Fmcp%2540latest%2522%255D%257D)
Alternatively, you can install the Playwright MCP server using the VS Code CLI:
@@ -68,6 +68,7 @@ The Playwright MCP server supports the following command-line options:
- Chrome channels: `chrome-beta`, `chrome-canary`, `chrome-dev`
- Edge channels: `msedge-beta`, `msedge-canary`, `msedge-dev`
- Default: `chrome`
- `--caps <caps>`: Comma-separated list of capabilities to enable, possible values: tabs, pdf, history, wait, files, install. Default is all.
- `--cdp-endpoint <endpoint>`: CDP endpoint to connect to
- `--executable-path <path>`: Path to the browser executable
- `--headless`: Run browser in headless mode (headed by default)
@@ -167,22 +168,7 @@ transport = new SSEServerTransport("/messages", res);
server.connect(transport);
```
### Snapshot Mode
The Playwright MCP provides a set of tools for browser automation. Here are all available tools:
- **browser_navigate**
- Description: Navigate to a URL
- Parameters:
- `url` (string): The URL to navigate to
- **browser_go_back**
- Description: Go back to the previous page
- Parameters: None
- **browser_go_forward**
- Description: Go forward to the next page
- Parameters: None
### Snapshot-based Interactions
- **browser_click**
- Description: Perform click on a web page
@@ -210,109 +196,121 @@ The Playwright MCP provides a set of tools for browser automation. Here are all
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `text` (string): Text to type into the element
- `submit` (boolean): Whether to submit entered text (press Enter after)
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
- `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
- **browser_select_option**
- Description: Select option in a dropdown
- Description: Select an option in a dropdown
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `values` (array): Array of values to select in the dropdown.
- `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
- **browser_choose_file**
- Description: Choose one or multiple files to upload
- **browser_snapshot**
- Description: Capture accessibility snapshot of the current page, this is better than screenshot
- Parameters: None
- **browser_take_screenshot**
- Description: Take a screenshot of the current page. You can't perform actions based on the screenshot, use browser_snapshot for actions.
- Parameters:
- `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
- `raw` (boolean, optional): Whether to return without compression (in PNG format). Default is false, which returns a JPEG image.
### Vision-based Interactions
- **browser_screen_move_mouse**
- Description: Move mouse to a given position
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `x` (number): X coordinate
- `y` (number): Y coordinate
- **browser_screen_capture**
- Description: Take a screenshot of the current page
- Parameters: None
- **browser_screen_click**
- Description: Click left mouse button
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `x` (number): X coordinate
- `y` (number): Y coordinate
- **browser_screen_drag**
- Description: Drag left mouse button
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `startX` (number): Start X coordinate
- `startY` (number): Start Y coordinate
- `endX` (number): End X coordinate
- `endY` (number): End Y coordinate
- **browser_screen_type**
- Description: Type text
- Parameters:
- `text` (string): Text to type
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
- **browser_press_key**
- Description: Press a key on the keyboard
- Parameters:
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- **browser_snapshot**
- Description: Capture accessibility snapshot of the current page (better than screenshot)
### Tab Management
- **browser_tab_list**
- Description: List browser tabs
- Parameters: None
- **browser_save_as_pdf**
- Description: Save page as PDF
- Parameters: None
- **browser_take_screenshot**
- Description: Capture screenshot of the page
- **browser_tab_new**
- Description: Open a new tab
- Parameters:
- `raw` (string): Optionally returns lossless PNG screenshot. JPEG by default.
- `url` (string, optional): The URL to navigate to in the new tab. If not provided, the new tab will be blank.
- **browser_wait**
- Description: Wait for a specified time in seconds
- **browser_tab_select**
- Description: Select a tab by index
- Parameters:
- `time` (number): The time to wait in seconds (capped at 10 seconds)
- `index` (number): The index of the tab to select
- **browser_close**
- Description: Close the page
- Parameters: None
- **browser_tab_close**
- Description: Close a tab
- Parameters:
- `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
### Vision Mode
Vision Mode provides tools for visual-based interactions using screenshots. Here are all available tools:
### Navigation
- **browser_navigate**
- Description: Navigate to a URL
- Parameters:
- `url` (string): The URL to navigate to
- **browser_go_back**
- **browser_navigate_back**
- Description: Go back to the previous page
- Parameters: None
- **browser_go_forward**
- **browser_navigate_forward**
- Description: Go forward to the next page
- Parameters: None
- **browser_screenshot**
- Description: Capture screenshot of the current page
- Parameters: None
- **browser_move_mouse**
- Description: Move mouse to specified coordinates
- Parameters:
- `x` (number): X coordinate
- `y` (number): Y coordinate
- **browser_click**
- Description: Click at specified coordinates
- Parameters:
- `x` (number): X coordinate to click at
- `y` (number): Y coordinate to click at
- **browser_drag**
- Description: Perform drag and drop operation
- Parameters:
- `startX` (number): Start X coordinate
- `startY` (number): Start Y coordinate
- `endX` (number): End X coordinate
- `endY` (number): End Y coordinate
- **browser_type**
- Description: Type text at specified coordinates
- Parameters:
- `text` (string): Text to type
- `submit` (boolean): Whether to submit entered text (press Enter after)
### Keyboard
- **browser_press_key**
- Description: Press a key on the keyboard
- Parameters:
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- **browser_choose_file**
### Files and Media
- **browser_file_upload**
- Description: Choose one or multiple files to upload
- Parameters:
- `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
- **browser_save_as_pdf**
- **browser_pdf_save**
- Description: Save page as PDF
- Parameters: None
### Utilities
- **browser_wait**
- Description: Wait for a specified time in seconds
- Parameters:
@@ -321,3 +319,7 @@ Vision Mode provides tools for visual-based interactions using screenshots. Here
- **browser_close**
- Description: Close the page
- Parameters: None
- **browser_install**
- Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
- Parameters: None

7
index.d.ts vendored
View File

@@ -18,6 +18,8 @@
import type { LaunchOptions } from 'playwright';
import type { Server } from '@modelcontextprotocol/sdk/server/index.js';
type ToolCapability = 'core' | 'tabs' | 'pdf' | 'history' | 'wait' | 'files' | 'install';
type Options = {
/**
* Path to the user data directory.
@@ -35,6 +37,11 @@ type Options = {
* @default false
*/
vision?: boolean;
/**
* Capabilities to enable.
*/
capabilities?: ToolCapability[];
};
export function createServer(options?: Options): Server;

14
package-lock.json generated
View File

@@ -1,12 +1,12 @@
{
"name": "@playwright/mcp",
"version": "0.0.9",
"version": "0.0.12",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@playwright/mcp",
"version": "0.0.9",
"version": "0.0.12",
"license": "Apache-2.0",
"dependencies": {
"@modelcontextprotocol/sdk": "^1.6.1",
@@ -377,6 +377,8 @@
},
"node_modules/@types/node": {
"version": "22.13.10",
"resolved": "https://registry.npmjs.org/@types/node/-/node-22.13.10.tgz",
"integrity": "sha512-I6LPUvlRH+O6VRUqYOcMudhaIdUVWfsjnZavnsraHvpBwaEyMN29ry+0UVJhImYL16xsscu0aske3yA+uPOWfw==",
"dev": true,
"license": "MIT",
"dependencies": {
@@ -1019,6 +1021,8 @@
},
"node_modules/commander": {
"version": "13.1.0",
"resolved": "https://registry.npmjs.org/commander/-/commander-13.1.0.tgz",
"integrity": "sha512-/rFeCpNJQbhSZjGVwO9RFV3xPqbnERS8MmIQzCtD/zl6gpJuV/bMLuN92oG3F7d8oDEHHRrujSXNUr8fpjntKw==",
"license": "MIT",
"engines": {
"node": ">=18"
@@ -4188,6 +4192,8 @@
},
"node_modules/undici-types": {
"version": "6.20.0",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-6.20.0.tgz",
"integrity": "sha512-Ny6QZ2Nju20vw1SRHe3d9jVu6gJ+4e3+MMpqu7pqE5HT6WsTSlce++GQmK5UXS8mzV8DSYHrQH+Xrf2jVcuKNg==",
"dev": true,
"license": "MIT"
},
@@ -4376,6 +4382,8 @@
},
"node_modules/zod": {
"version": "3.24.2",
"resolved": "https://registry.npmjs.org/zod/-/zod-3.24.2.tgz",
"integrity": "sha512-lY7CDW43ECgW9u1TcT3IoXHflywfVqDYze4waEz812jR/bZ8FHDsl7pFQoSZTz5N+2NqRXs8GBwnAwo3ZNxqhQ==",
"license": "MIT",
"funding": {
"url": "https://github.com/sponsors/colinhacks"
@@ -4383,6 +4391,8 @@
},
"node_modules/zod-to-json-schema": {
"version": "3.24.4",
"resolved": "https://registry.npmjs.org/zod-to-json-schema/-/zod-to-json-schema-3.24.4.tgz",
"integrity": "sha512-0uNlcvgabyrni9Ag8Vghj21drk7+7tp7VTwwR7KxxXXc/3pbXz2PHlDgj3cICahgF1kHm4dExBFj7BXrZJXzig==",
"license": "ISC",
"peerDependencies": {
"zod": "^3.24.1"

View File

@@ -1,6 +1,6 @@
{
"name": "@playwright/mcp",
"version": "0.0.9",
"version": "0.0.12",
"description": "Playwright Tools for MCP",
"repository": {
"type": "git",

View File

@@ -14,12 +14,12 @@
* limitations under the License.
*/
import { fork } from 'child_process';
import path from 'path';
import * as playwright from 'playwright';
import yaml from 'yaml';
import { waitForCompletion } from './tools/utils';
import { ToolResult } from './tools/tool';
export type ContextOptions = {
browserName?: 'chromium' | 'firefox' | 'webkit';
userDataDir: string;
@@ -28,147 +28,286 @@ export type ContextOptions = {
remoteEndpoint?: string;
};
type PageOrFrameLocator = playwright.Page | playwright.FrameLocator;
type RunOptions = {
captureSnapshot?: boolean;
waitForCompletion?: boolean;
status?: string;
noClearFileChooser?: boolean;
};
export class Context {
private _options: ContextOptions;
readonly options: ContextOptions;
private _browser: playwright.Browser | undefined;
private _page: playwright.Page | undefined;
private _console: playwright.ConsoleMessage[] = [];
private _createPagePromise: Promise<playwright.Page> | undefined;
private _fileChooser: playwright.FileChooser | undefined;
private _lastSnapshotFrames: (playwright.Page | playwright.FrameLocator)[] = [];
private _browserContext: playwright.BrowserContext | undefined;
private _tabs: Tab[] = [];
private _currentTab: Tab | undefined;
constructor(options: ContextOptions) {
this._options = options;
this.options = options;
}
async createPage(): Promise<playwright.Page> {
if (this._createPagePromise)
return this._createPagePromise;
this._createPagePromise = (async () => {
const { browser, page } = await this._createPage();
page.on('console', event => this._console.push(event));
page.on('framenavigated', frame => {
if (!frame.parentFrame())
this._console.length = 0;
});
page.on('close', () => this._onPageClose());
page.on('filechooser', chooser => this._fileChooser = chooser);
page.setDefaultNavigationTimeout(60000);
page.setDefaultTimeout(5000);
this._page = page;
this._browser = browser;
return page;
})();
return this._createPagePromise;
tabs(): Tab[] {
return this._tabs;
}
private _onPageClose() {
currentTab(): Tab {
if (!this._currentTab)
throw new Error('No current snapshot available. Capture a snapshot of navigate to a new location first.');
return this._currentTab;
}
async newTab(): Promise<Tab> {
const browserContext = await this._ensureBrowserContext();
const page = await browserContext.newPage();
this._currentTab = this._tabs.find(t => t.page === page)!;
return this._currentTab;
}
async selectTab(index: number) {
this._currentTab = this._tabs[index - 1];
await this._currentTab.page.bringToFront();
}
async ensureTab(): Promise<Tab> {
const context = await this._ensureBrowserContext();
if (!this._currentTab)
await context.newPage();
return this._currentTab!;
}
async listTabs(): Promise<string> {
if (!this._tabs.length)
return 'No tabs open';
const lines: string[] = ['Open tabs:'];
for (let i = 0; i < this._tabs.length; i++) {
const tab = this._tabs[i];
const title = await tab.page.title();
const url = tab.page.url();
const current = tab === this._currentTab ? ' (current)' : '';
lines.push(`- ${i + 1}:${current} [${title}] (${url})`);
}
return lines.join('\n');
}
async closeTab(index: number | undefined) {
const tab = index === undefined ? this.currentTab() : this._tabs[index - 1];
await tab.page.close();
return await this.listTabs();
}
private _onPageCreated(page: playwright.Page) {
const tab = new Tab(this, page, tab => this._onPageClosed(tab));
this._tabs.push(tab);
if (!this._currentTab)
this._currentTab = tab;
}
private _onPageClosed(tab: Tab) {
const index = this._tabs.indexOf(tab);
if (index === -1)
return;
this._tabs.splice(index, 1);
if (this._currentTab === tab)
this._currentTab = this._tabs[Math.min(index, this._tabs.length - 1)];
const browser = this._browser;
const page = this._page;
void page?.context()?.close().then(() => browser?.close()).catch(() => {});
if (this._browserContext && !this._tabs.length) {
void this._browserContext.close().then(() => browser?.close()).catch(() => {});
this._browser = undefined;
this._browserContext = undefined;
}
}
this._createPagePromise = undefined;
this._browser = undefined;
this._page = undefined;
async close() {
if (!this._browserContext)
return;
await this._browserContext.close();
}
private async _ensureBrowserContext() {
if (!this._browserContext) {
const context = await this._createBrowserContext();
this._browser = context.browser;
this._browserContext = context.browserContext;
for (const page of this._browserContext.pages())
this._onPageCreated(page);
this._browserContext.on('page', page => this._onPageCreated(page));
}
return this._browserContext;
}
private async _createBrowserContext(): Promise<{ browser?: playwright.Browser, browserContext: playwright.BrowserContext }> {
if (this.options.remoteEndpoint) {
const url = new URL(this.options.remoteEndpoint);
if (this.options.browserName)
url.searchParams.set('browser', this.options.browserName);
if (this.options.launchOptions)
url.searchParams.set('launch-options', JSON.stringify(this.options.launchOptions));
const browser = await playwright[this.options.browserName ?? 'chromium'].connect(String(url));
const browserContext = await browser.newContext();
return { browser, browserContext };
}
if (this.options.cdpEndpoint) {
const browser = await playwright.chromium.connectOverCDP(this.options.cdpEndpoint);
const browserContext = browser.contexts()[0];
return { browser, browserContext };
}
const browserContext = await this._launchPersistentContext();
return { browserContext };
}
private async _launchPersistentContext(): Promise<playwright.BrowserContext> {
try {
const browserType = this.options.browserName ? playwright[this.options.browserName] : playwright.chromium;
return await browserType.launchPersistentContext(this.options.userDataDir, this.options.launchOptions);
} catch (error: any) {
if (error.message.includes('Executable doesn\'t exist'))
throw new Error(`Browser specified in your config is not installed. Either install it (likely) or change the config.`);
throw error;
}
}
}
class Tab {
readonly context: Context;
readonly page: playwright.Page;
private _console: playwright.ConsoleMessage[] = [];
private _fileChooser: playwright.FileChooser | undefined;
private _snapshot: PageSnapshot | undefined;
private _onPageClose: (tab: Tab) => void;
constructor(context: Context, page: playwright.Page, onPageClose: (tab: Tab) => void) {
this.context = context;
this.page = page;
this._onPageClose = onPageClose;
page.on('console', event => this._console.push(event));
page.on('framenavigated', frame => {
if (!frame.parentFrame())
this._console.length = 0;
});
page.on('close', () => this._onClose());
page.on('filechooser', chooser => this._fileChooser = chooser);
page.setDefaultNavigationTimeout(60000);
page.setDefaultTimeout(5000);
}
private _onClose() {
this._fileChooser = undefined;
this._console.length = 0;
this._onPageClose(this);
}
async install(): Promise<string> {
const channel = this._options.launchOptions?.channel ?? this._options.browserName ?? 'chrome';
const cli = path.join(require.resolve('playwright/package.json'), '..', 'cli.js');
const child = fork(cli, ['install', channel], {
stdio: 'pipe',
});
const output: string[] = [];
child.stdout?.on('data', data => output.push(data.toString()));
child.stderr?.on('data', data => output.push(data.toString()));
return new Promise((resolve, reject) => {
child.on('close', code => {
if (code === 0)
resolve(channel);
else
reject(new Error(`Failed to install browser: ${output.join('')}`));
});
async navigate(url: string) {
await this.page.goto(url, { waitUntil: 'domcontentloaded' });
// Cap load event to 5 seconds, the page is operational at this point.
await this.page.waitForLoadState('load', { timeout: 5000 }).catch(() => {});
}
async run(callback: (tab: Tab) => Promise<void>, options?: RunOptions): Promise<ToolResult> {
try {
if (!options?.noClearFileChooser)
this._fileChooser = undefined;
if (options?.waitForCompletion)
await waitForCompletion(this.page, () => callback(this));
else
await callback(this);
} finally {
if (options?.captureSnapshot)
this._snapshot = await PageSnapshot.create(this.page);
}
const tabList = this.context.tabs().length > 1 ? await this.context.listTabs() + '\n\nCurrent tab:' + '\n' : '';
const snapshot = this._snapshot?.text({ status: options?.status, hasFileChooser: !!this._fileChooser }) ?? options?.status ?? '';
return {
content: [{
type: 'text',
text: tabList + snapshot,
}],
};
}
async runAndWait(callback: (tab: Tab) => Promise<void>, options?: RunOptions): Promise<ToolResult> {
return await this.run(callback, {
waitForCompletion: true,
...options,
});
}
existingPage(): playwright.Page {
if (!this._page)
throw new Error('Navigate to a location to create a page');
return this._page;
async runAndWaitWithSnapshot(callback: (snapshot: PageSnapshot) => Promise<void>, options?: RunOptions): Promise<ToolResult> {
return await this.run(tab => callback(tab.lastSnapshot()), {
captureSnapshot: true,
waitForCompletion: true,
...options,
});
}
lastSnapshot(): PageSnapshot {
if (!this._snapshot)
throw new Error('No snapshot available');
return this._snapshot;
}
async console(): Promise<playwright.ConsoleMessage[]> {
return this._console;
}
async close() {
if (!this._page)
return;
await this._page.close();
}
async submitFileChooser(paths: string[]) {
if (!this._fileChooser)
throw new Error('No file chooser visible');
await this._fileChooser.setFiles(paths);
this._fileChooser = undefined;
}
}
hasFileChooser() {
return !!this._fileChooser;
class PageSnapshot {
private _frameLocators: PageOrFrameLocator[] = [];
private _text!: string;
constructor() {
}
clearFileChooser() {
this._fileChooser = undefined;
static async create(page: playwright.Page): Promise<PageSnapshot> {
const snapshot = new PageSnapshot();
await snapshot._build(page);
return snapshot;
}
private async _createPage(): Promise<{ browser?: playwright.Browser, page: playwright.Page }> {
if (this._options.remoteEndpoint) {
const url = new URL(this._options.remoteEndpoint);
if (this._options.browserName)
url.searchParams.set('browser', this._options.browserName);
if (this._options.launchOptions)
url.searchParams.set('launch-options', JSON.stringify(this._options.launchOptions));
const browser = await playwright[this._options.browserName ?? 'chromium'].connect(String(url));
const page = await browser.newPage();
return { browser, page };
text(options?: { status?: string, hasFileChooser?: boolean }): string {
const results: string[] = [];
if (options?.status) {
results.push(options.status);
results.push('');
}
if (this._options.cdpEndpoint) {
const browser = await playwright.chromium.connectOverCDP(this._options.cdpEndpoint);
const browserContext = browser.contexts()[0];
let [page] = browserContext.pages();
if (!page)
page = await browserContext.newPage();
return { browser, page };
if (options?.hasFileChooser) {
results.push('- There is a file chooser visible that requires browser_file_upload to be called');
results.push('');
}
const context = await this._launchPersistentContext();
const [page] = context.pages();
return { page };
results.push(this._text);
return results.join('\n');
}
private async _launchPersistentContext(): Promise<playwright.BrowserContext> {
try {
const browserType = this._options.browserName ? playwright[this._options.browserName] : playwright.chromium;
return await browserType.launchPersistentContext(this._options.userDataDir, this._options.launchOptions);
} catch (error: any) {
if (error.message.includes('Executable doesn\'t exist'))
throw new Error(`Browser specified in your config is not installed. Either install it (likely) or change the config.`);
throw error;
}
private async _build(page: playwright.Page) {
const yamlDocument = await this._snapshotFrame(page);
const lines = [];
lines.push(
`- Page URL: ${page.url()}`,
`- Page Title: ${await page.title()}`
);
lines.push(
`- Page Snapshot`,
'```yaml',
yamlDocument.toString().trim(),
'```',
''
);
this._text = lines.join('\n');
}
async allFramesSnapshot() {
this._lastSnapshotFrames = [];
const yaml = await this._allFramesSnapshot(this.existingPage());
return yaml.toString().trim();
}
private async _allFramesSnapshot(frame: playwright.Page | playwright.FrameLocator): Promise<yaml.Document> {
const frameIndex = this._lastSnapshotFrames.push(frame) - 1;
private async _snapshotFrame(frame: playwright.Page | playwright.FrameLocator) {
const frameIndex = this._frameLocators.push(frame) - 1;
const snapshotString = await frame.locator('body').ariaSnapshot({ ref: true });
const snapshot = yaml.parseDocument(snapshotString);
@@ -189,7 +328,7 @@ export class Context {
const ref = value.match(/\[ref=(.*)\]/)?.[1];
if (ref) {
try {
const childSnapshot = await this._allFramesSnapshot(frame.frameLocator(`aria-ref=${ref}`));
const childSnapshot = await this._snapshotFrame(frame.frameLocator(`aria-ref=${ref}`));
return snapshot.createPair(node.value, childSnapshot);
} catch (error) {
return snapshot.createPair(node.value, '<could not take iframe snapshot>');
@@ -206,11 +345,11 @@ export class Context {
}
refLocator(ref: string): playwright.Locator {
let frame = this._lastSnapshotFrames[0];
let frame = this._frameLocators[0];
const match = ref.match(/^f(\d+)(.*)/);
if (match) {
const frameIndex = parseInt(match[1], 10);
frame = this._lastSnapshotFrames[frameIndex];
frame = this._frameLocators[frameIndex];
ref = match[2];
}

View File

@@ -15,53 +15,46 @@
*/
import { createServerWithTools } from './server';
import * as snapshot from './tools/snapshot';
import * as common from './tools/common';
import * as screenshot from './tools/screenshot';
import { console } from './resources/console';
import common from './tools/common';
import files from './tools/files';
import install from './tools/install';
import keyboard from './tools/keyboard';
import navigate from './tools/navigate';
import pdf from './tools/pdf';
import snapshot from './tools/snapshot';
import tabs from './tools/tabs';
import screen from './tools/screen';
import { console as consoleResource } from './resources/console';
import type { Tool } from './tools/tool';
import type { Tool, ToolCapability } from './tools/tool';
import type { Resource } from './resources/resource';
import type { Server } from '@modelcontextprotocol/sdk/server/index.js';
import type { LaunchOptions } from 'playwright';
const commonTools: Tool[] = [
common.pressKey,
common.wait,
common.pdf,
common.close,
common.install,
];
const snapshotTools: Tool[] = [
common.navigate(true),
common.goBack(true),
common.goForward(true),
common.chooseFile(true),
snapshot.snapshot,
snapshot.click,
snapshot.hover,
snapshot.type,
snapshot.selectOption,
snapshot.screenshot,
...commonTools,
...common(true),
...files(true),
...install,
...keyboard(true),
...navigate(true),
...pdf,
...snapshot,
...tabs(true),
];
const screenshotTools: Tool[] = [
common.navigate(false),
common.goBack(false),
common.goForward(false),
common.chooseFile(false),
screenshot.screenshot,
screenshot.moveMouse,
screenshot.click,
screenshot.drag,
screenshot.type,
...commonTools,
...common(false),
...files(false),
...install,
...keyboard(false),
...navigate(false),
...pdf,
...screen,
...tabs(false),
];
const resources: Resource[] = [
console,
consoleResource,
];
type Options = {
@@ -70,12 +63,14 @@ type Options = {
launchOptions?: LaunchOptions;
cdpEndpoint?: string;
vision?: boolean;
capabilities?: ToolCapability[];
};
const packageJSON = require('../package.json');
export function createServer(options?: Options): Server {
const tools = options?.vision ? screenshotTools : snapshotTools;
const allTools = options?.vision ? screenshotTools : snapshotTools;
const tools = allTools.filter(tool => !options?.capabilities || tool.capability === 'core' || options.capabilities.includes(tool.capability));
return createServerWithTools({
name: 'Playwright',
version: packageJSON.version,

View File

@@ -29,6 +29,7 @@ import { ServerList } from './server';
import type { LaunchOptions } from 'playwright';
import assert from 'assert';
import { ToolCapability } from './tools/tool';
const packageJSON = require('../package.json');
@@ -36,6 +37,7 @@ program
.version('Version ' + packageJSON.version)
.name(packageJSON.name)
.option('--browser <browser>', 'Browser or chrome channel to use, possible values: chrome, firefox, webkit, msedge.')
.option('--caps <caps>', 'Comma-separated list of capabilities to enable, possible values: tabs, pdf, history, wait, files, install. Default is all.')
.option('--cdp-endpoint <endpoint>', 'CDP endpoint to connect to.')
.option('--executable-path <path>', 'Path to the browser executable.')
.option('--headless', 'Run browser in headless mode, headed by default')
@@ -72,7 +74,7 @@ program
}
const launchOptions: LaunchOptions = {
headless: !!options.headless,
headless: !!(options.headless ?? (os.platform() === 'linux' && !process.env.DISPLAY)),
channel,
executablePath: options.executablePath,
};
@@ -85,6 +87,7 @@ program
launchOptions,
vision: !!options.vision,
cdpEndpoint: options.cdpEndpoint,
capabilities: options.caps?.split(',').map((c: string) => c.trim() as ToolCapability),
}));
setupExitWatchdog(serverList);
@@ -97,11 +100,15 @@ program
});
function setupExitWatchdog(serverList: ServerList) {
process.stdin.on('close', async () => {
const handleExit = async () => {
setTimeout(() => process.exit(0), 15000);
await serverList.closeAll();
process.exit(0);
});
};
process.stdin.on('close', handleExit);
process.on('SIGINT', handleExit);
process.on('SIGTERM', handleExit);
}
program.parse(process.argv);

View File

@@ -24,7 +24,7 @@ export const console: Resource = {
},
read: async (context, uri) => {
const messages = await context.console();
const messages = await context.currentTab().console();
const log = messages.map(message => `[${message.type().toUpperCase()}] ${message.text()}`).join('\n');
return [{
uri,

View File

@@ -14,74 +14,17 @@
* limitations under the License.
*/
import os from 'os';
import path from 'path';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import { captureAriaSnapshot, runAndWait, sanitizeForFilePath } from './utils';
import type { ToolFactory, Tool } from './tool';
const navigateSchema = z.object({
url: z.string().describe('The URL to navigate to'),
});
export const navigate: ToolFactory = snapshot => ({
schema: {
name: 'browser_navigate',
description: 'Navigate to a URL',
inputSchema: zodToJsonSchema(navigateSchema),
},
handle: async (context, params) => {
const validatedParams = navigateSchema.parse(params);
const page = await context.createPage();
await page.goto(validatedParams.url, { waitUntil: 'domcontentloaded' });
// Cap load event to 5 seconds, the page is operational at this point.
await page.waitForLoadState('load', { timeout: 5000 }).catch(() => {});
if (snapshot)
return captureAriaSnapshot(context);
return {
content: [{
type: 'text',
text: `Navigated to ${validatedParams.url}`,
}],
};
},
});
const goBackSchema = z.object({});
export const goBack: ToolFactory = snapshot => ({
schema: {
name: 'browser_go_back',
description: 'Go back to the previous page',
inputSchema: zodToJsonSchema(goBackSchema),
},
handle: async context => {
return await runAndWait(context, 'Navigated back', async page => page.goBack(), snapshot);
},
});
const goForwardSchema = z.object({});
export const goForward: ToolFactory = snapshot => ({
schema: {
name: 'browser_go_forward',
description: 'Go forward to the next page',
inputSchema: zodToJsonSchema(goForwardSchema),
},
handle: async context => {
return await runAndWait(context, 'Navigated forward', async page => page.goForward(), snapshot);
},
});
import type { Tool, ToolFactory } from './tool';
const waitSchema = z.object({
time: z.number().describe('The time to wait in seconds'),
});
export const wait: Tool = {
const wait: Tool = {
capability: 'wait',
schema: {
name: 'browser_wait',
description: 'Wait for a specified time in seconds',
@@ -99,48 +42,10 @@ export const wait: Tool = {
},
};
const pressKeySchema = z.object({
key: z.string().describe('Name of the key to press or a character to generate, such as `ArrowLeft` or `a`'),
});
export const pressKey: Tool = {
schema: {
name: 'browser_press_key',
description: 'Press a key on the keyboard',
inputSchema: zodToJsonSchema(pressKeySchema),
},
handle: async (context, params) => {
const validatedParams = pressKeySchema.parse(params);
return await runAndWait(context, `Pressed key ${validatedParams.key}`, async page => {
await page.keyboard.press(validatedParams.key);
});
},
};
const pdfSchema = z.object({});
export const pdf: Tool = {
schema: {
name: 'browser_save_as_pdf',
description: 'Save page as PDF',
inputSchema: zodToJsonSchema(pdfSchema),
},
handle: async context => {
const page = context.existingPage();
const fileName = path.join(os.tmpdir(), sanitizeForFilePath(`page-${new Date().toISOString()}`)) + '.pdf';
await page.pdf({ path: fileName });
return {
content: [{
type: 'text',
text: `Saved as ${fileName}`,
}],
};
},
};
const closeSchema = z.object({});
export const close: Tool = {
const close: Tool = {
capability: 'core',
schema: {
name: 'browser_close',
description: 'Close the page',
@@ -157,37 +62,34 @@ export const close: Tool = {
},
};
const chooseFileSchema = z.object({
paths: z.array(z.string()).describe('The absolute paths to the files to upload. Can be a single file or multiple files.'),
const resizeSchema = z.object({
width: z.number().describe('Width of the browser window'),
height: z.number().describe('Height of the browser window'),
});
export const chooseFile: ToolFactory = snapshot => ({
const resize: ToolFactory = captureSnapshot => ({
capability: 'core',
schema: {
name: 'browser_choose_file',
description: 'Choose one or multiple files to upload',
inputSchema: zodToJsonSchema(chooseFileSchema),
name: 'browser_resize',
description: 'Resize the browser window',
inputSchema: zodToJsonSchema(resizeSchema),
},
handle: async (context, params) => {
const validatedParams = chooseFileSchema.parse(params);
return await runAndWait(context, `Chose files ${validatedParams.paths.join(', ')}`, async () => {
await context.submitFileChooser(validatedParams.paths);
}, snapshot);
const validatedParams = resizeSchema.parse(params);
const tab = context.currentTab();
return await tab.run(
tab => tab.page.setViewportSize({ width: validatedParams.width, height: validatedParams.height }),
{
status: `Resized browser window`,
captureSnapshot,
}
);
},
});
export const install: Tool = {
schema: {
name: 'browser_install',
description: 'Install the browser specified in the config. Call this if you get an error about the browser not being installed.',
inputSchema: zodToJsonSchema(z.object({})),
},
handle: async context => {
const channel = await context.install();
return {
content: [{
type: 'text',
text: `Browser ${channel} installed`,
}],
};
},
};
export default (captureSnapshot: boolean) => [
close,
wait,
resize(captureSnapshot)
];

48
src/tools/files.ts Normal file
View File

@@ -0,0 +1,48 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import type { ToolFactory } from './tool';
const uploadFileSchema = z.object({
paths: z.array(z.string()).describe('The absolute paths to the files to upload. Can be a single file or multiple files.'),
});
const uploadFile: ToolFactory = captureSnapshot => ({
capability: 'files',
schema: {
name: 'browser_file_upload',
description: 'Upload one or multiple files',
inputSchema: zodToJsonSchema(uploadFileSchema),
},
handle: async (context, params) => {
const validatedParams = uploadFileSchema.parse(params);
const tab = context.currentTab();
return await tab.runAndWait(async () => {
await tab.submitFileChooser(validatedParams.paths);
}, {
status: `Chose files ${validatedParams.paths.join(', ')}`,
captureSnapshot,
noClearFileChooser: true,
});
},
});
export default (captureSnapshot: boolean) => [
uploadFile(captureSnapshot),
];

61
src/tools/install.ts Normal file
View File

@@ -0,0 +1,61 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { fork } from 'child_process';
import path from 'path';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import type { Tool } from './tool';
const install: Tool = {
capability: 'install',
schema: {
name: 'browser_install',
description: 'Install the browser specified in the config. Call this if you get an error about the browser not being installed.',
inputSchema: zodToJsonSchema(z.object({})),
},
handle: async context => {
const channel = context.options.launchOptions?.channel ?? context.options.browserName ?? 'chrome';
const cli = path.join(require.resolve('playwright/package.json'), '..', 'cli.js');
const child = fork(cli, ['install', channel], {
stdio: 'pipe',
});
const output: string[] = [];
child.stdout?.on('data', data => output.push(data.toString()));
child.stderr?.on('data', data => output.push(data.toString()));
await new Promise<void>((resolve, reject) => {
child.on('close', code => {
if (code === 0)
resolve();
else
reject(new Error(`Failed to install browser: ${output.join('')}`));
});
});
return {
content: [{
type: 'text',
text: `Browser ${channel} installed`,
}],
};
},
};
export default [
install,
];

46
src/tools/keyboard.ts Normal file
View File

@@ -0,0 +1,46 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { z } from 'zod';
import zodToJsonSchema from 'zod-to-json-schema';
import type { ToolFactory } from './tool';
const pressKeySchema = z.object({
key: z.string().describe('Name of the key to press or a character to generate, such as `ArrowLeft` or `a`'),
});
const pressKey: ToolFactory = captureSnapshot => ({
capability: 'core',
schema: {
name: 'browser_press_key',
description: 'Press a key on the keyboard',
inputSchema: zodToJsonSchema(pressKeySchema),
},
handle: async (context, params) => {
const validatedParams = pressKeySchema.parse(params);
return await context.currentTab().runAndWait(async tab => {
await tab.page.keyboard.press(validatedParams.key);
}, {
status: `Pressed key ${validatedParams.key}`,
captureSnapshot,
});
},
});
export default (captureSnapshot: boolean) => [
pressKey(captureSnapshot),
];

87
src/tools/navigate.ts Normal file
View File

@@ -0,0 +1,87 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import type { ToolFactory } from './tool';
const navigateSchema = z.object({
url: z.string().describe('The URL to navigate to'),
});
const navigate: ToolFactory = captureSnapshot => ({
capability: 'core',
schema: {
name: 'browser_navigate',
description: 'Navigate to a URL',
inputSchema: zodToJsonSchema(navigateSchema),
},
handle: async (context, params) => {
const validatedParams = navigateSchema.parse(params);
const currentTab = await context.ensureTab();
return await currentTab.run(async tab => {
await tab.navigate(validatedParams.url);
}, {
status: `Navigated to ${validatedParams.url}`,
captureSnapshot,
});
},
});
const goBackSchema = z.object({});
const goBack: ToolFactory = snapshot => ({
capability: 'history',
schema: {
name: 'browser_navigate_back',
description: 'Go back to the previous page',
inputSchema: zodToJsonSchema(goBackSchema),
},
handle: async context => {
return await context.currentTab().runAndWait(async tab => {
await tab.page.goBack();
}, {
status: 'Navigated back',
captureSnapshot: snapshot,
});
},
});
const goForwardSchema = z.object({});
const goForward: ToolFactory = snapshot => ({
capability: 'history',
schema: {
name: 'browser_navigate_forward',
description: 'Go forward to the next page',
inputSchema: zodToJsonSchema(goForwardSchema),
},
handle: async context => {
return await context.currentTab().runAndWait(async tab => {
await tab.page.goForward();
}, {
status: 'Navigated forward',
captureSnapshot: snapshot,
});
},
});
export default (captureSnapshot: boolean) => [
navigate(captureSnapshot),
goBack(captureSnapshot),
goForward(captureSnapshot),
];

51
src/tools/pdf.ts Normal file
View File

@@ -0,0 +1,51 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import os from 'os';
import path from 'path';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import { sanitizeForFilePath } from './utils';
import type { Tool } from './tool';
const pdfSchema = z.object({});
const pdf: Tool = {
capability: 'pdf',
schema: {
name: 'browser_pdf_save',
description: 'Save page as PDF',
inputSchema: zodToJsonSchema(pdfSchema),
},
handle: async context => {
const tab = context.currentTab();
const fileName = path.join(os.tmpdir(), sanitizeForFilePath(`page-${new Date().toISOString()}`)) + '.pdf';
await tab.page.pdf({ path: fileName });
return {
content: [{
type: 'text',
text: `Saved as ${fileName}`,
}],
};
},
};
export default [
pdf,
];

View File

@@ -17,20 +17,19 @@
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import { runAndWait } from './utils';
import type { Tool } from './tool';
export const screenshot: Tool = {
const screenshot: Tool = {
capability: 'core',
schema: {
name: 'browser_screenshot',
name: 'browser_screen_capture',
description: 'Take a screenshot of the current page',
inputSchema: zodToJsonSchema(z.object({})),
},
handle: async context => {
const page = context.existingPage();
const screenshot = await page.screenshot({ type: 'jpeg', quality: 50, scale: 'css' });
const tab = await context.ensureTab();
const screenshot = await tab.page.screenshot({ type: 'jpeg', quality: 50, scale: 'css' });
return {
content: [{ type: 'image', data: screenshot.toString('base64'), mimeType: 'image/jpeg' }],
};
@@ -46,17 +45,18 @@ const moveMouseSchema = elementSchema.extend({
y: z.number().describe('Y coordinate'),
});
export const moveMouse: Tool = {
const moveMouse: Tool = {
capability: 'core',
schema: {
name: 'browser_move_mouse',
name: 'browser_screen_move_mouse',
description: 'Move mouse to a given position',
inputSchema: zodToJsonSchema(moveMouseSchema),
},
handle: async (context, params) => {
const validatedParams = moveMouseSchema.parse(params);
const page = context.existingPage();
await page.mouse.move(validatedParams.x, validatedParams.y);
const tab = context.currentTab();
await tab.page.mouse.move(validatedParams.x, validatedParams.y);
return {
content: [{ type: 'text', text: `Moved mouse to (${validatedParams.x}, ${validatedParams.y})` }],
};
@@ -68,19 +68,22 @@ const clickSchema = elementSchema.extend({
y: z.number().describe('Y coordinate'),
});
export const click: Tool = {
const click: Tool = {
capability: 'core',
schema: {
name: 'browser_click',
name: 'browser_screen_click',
description: 'Click left mouse button',
inputSchema: zodToJsonSchema(clickSchema),
},
handle: async (context, params) => {
return await runAndWait(context, 'Clicked mouse', async page => {
return await context.currentTab().runAndWait(async tab => {
const validatedParams = clickSchema.parse(params);
await page.mouse.move(validatedParams.x, validatedParams.y);
await page.mouse.down();
await page.mouse.up();
await tab.page.mouse.move(validatedParams.x, validatedParams.y);
await tab.page.mouse.down();
await tab.page.mouse.up();
}, {
status: 'Clicked mouse',
});
},
};
@@ -92,42 +95,56 @@ const dragSchema = elementSchema.extend({
endY: z.number().describe('End Y coordinate'),
});
export const drag: Tool = {
const drag: Tool = {
capability: 'core',
schema: {
name: 'browser_drag',
name: 'browser_screen_drag',
description: 'Drag left mouse button',
inputSchema: zodToJsonSchema(dragSchema),
},
handle: async (context, params) => {
const validatedParams = dragSchema.parse(params);
return await runAndWait(context, `Dragged mouse from (${validatedParams.startX}, ${validatedParams.startY}) to (${validatedParams.endX}, ${validatedParams.endY})`, async page => {
await page.mouse.move(validatedParams.startX, validatedParams.startY);
await page.mouse.down();
await page.mouse.move(validatedParams.endX, validatedParams.endY);
await page.mouse.up();
return await context.currentTab().runAndWait(async tab => {
await tab.page.mouse.move(validatedParams.startX, validatedParams.startY);
await tab.page.mouse.down();
await tab.page.mouse.move(validatedParams.endX, validatedParams.endY);
await tab.page.mouse.up();
}, {
status: `Dragged mouse from (${validatedParams.startX}, ${validatedParams.startY}) to (${validatedParams.endX}, ${validatedParams.endY})`,
});
},
};
const typeSchema = z.object({
text: z.string().describe('Text to type into the element'),
submit: z.boolean().describe('Whether to submit entered text (press Enter after)'),
submit: z.boolean().optional().describe('Whether to submit entered text (press Enter after)'),
});
export const type: Tool = {
const type: Tool = {
capability: 'core',
schema: {
name: 'browser_type',
name: 'browser_screen_type',
description: 'Type text',
inputSchema: zodToJsonSchema(typeSchema),
},
handle: async (context, params) => {
const validatedParams = typeSchema.parse(params);
return await runAndWait(context, `Typed text "${validatedParams.text}"`, async page => {
await page.keyboard.type(validatedParams.text);
return await context.currentTab().runAndWait(async tab => {
await tab.page.keyboard.type(validatedParams.text);
if (validatedParams.submit)
await page.keyboard.press('Enter');
await tab.page.keyboard.press('Enter');
}, {
status: `Typed text "${validatedParams.text}"`,
});
},
};
export default [
screenshot,
moveMouse,
click,
drag,
type,
];

View File

@@ -17,12 +17,11 @@
import { z } from 'zod';
import zodToJsonSchema from 'zod-to-json-schema';
import { captureAriaSnapshot, runAndWait } from './utils';
import type * as playwright from 'playwright';
import type { Tool } from './tool';
export const snapshot: Tool = {
const snapshot: Tool = {
capability: 'core',
schema: {
name: 'browser_snapshot',
description: 'Capture accessibility snapshot of the current page, this is better than screenshot',
@@ -30,7 +29,8 @@ export const snapshot: Tool = {
},
handle: async context => {
return await captureAriaSnapshot(context);
const tab = await context.ensureTab();
return await tab.run(async () => {}, { captureSnapshot: true });
},
};
@@ -39,7 +39,8 @@ const elementSchema = z.object({
ref: z.string().describe('Exact target element reference from the page snapshot'),
});
export const click: Tool = {
const click: Tool = {
capability: 'core',
schema: {
name: 'browser_click',
description: 'Perform click on a web page',
@@ -48,7 +49,12 @@ export const click: Tool = {
handle: async (context, params) => {
const validatedParams = elementSchema.parse(params);
return runAndWait(context, `"${validatedParams.element}" clicked`, () => context.refLocator(validatedParams.ref).click(), true);
return await context.currentTab().runAndWaitWithSnapshot(async snapshot => {
const locator = snapshot.refLocator(validatedParams.ref);
await locator.click();
}, {
status: `Clicked "${validatedParams.element}"`,
});
},
};
@@ -59,7 +65,8 @@ const dragSchema = z.object({
endRef: z.string().describe('Exact target element reference from the page snapshot'),
});
export const drag: Tool = {
const drag: Tool = {
capability: 'core',
schema: {
name: 'browser_drag',
description: 'Perform drag and drop between two elements',
@@ -68,15 +75,18 @@ export const drag: Tool = {
handle: async (context, params) => {
const validatedParams = dragSchema.parse(params);
return runAndWait(context, `Dragged "${validatedParams.startElement}" to "${validatedParams.endElement}"`, async () => {
const startLocator = context.refLocator(validatedParams.startRef);
const endLocator = context.refLocator(validatedParams.endRef);
return await context.currentTab().runAndWaitWithSnapshot(async snapshot => {
const startLocator = snapshot.refLocator(validatedParams.startRef);
const endLocator = snapshot.refLocator(validatedParams.endRef);
await startLocator.dragTo(endLocator);
}, true);
}, {
status: `Dragged "${validatedParams.startElement}" to "${validatedParams.endElement}"`,
});
},
};
export const hover: Tool = {
const hover: Tool = {
capability: 'core',
schema: {
name: 'browser_hover',
description: 'Hover over element on page',
@@ -85,16 +95,23 @@ export const hover: Tool = {
handle: async (context, params) => {
const validatedParams = elementSchema.parse(params);
return runAndWait(context, `Hovered over "${validatedParams.element}"`, () => context.refLocator(validatedParams.ref).hover(), true);
return await context.currentTab().runAndWaitWithSnapshot(async snapshot => {
const locator = snapshot.refLocator(validatedParams.ref);
await locator.hover();
}, {
status: `Hovered over "${validatedParams.element}"`,
});
},
};
const typeSchema = elementSchema.extend({
text: z.string().describe('Text to type into the element'),
submit: z.boolean().describe('Whether to submit entered text (press Enter after)'),
submit: z.boolean().optional().describe('Whether to submit entered text (press Enter after)'),
slowly: z.boolean().optional().describe('Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.'),
});
export const type: Tool = {
const type: Tool = {
capability: 'core',
schema: {
name: 'browser_type',
description: 'Type text into editable element',
@@ -103,12 +120,17 @@ export const type: Tool = {
handle: async (context, params) => {
const validatedParams = typeSchema.parse(params);
return await runAndWait(context, `Typed "${validatedParams.text}" into "${validatedParams.element}"`, async () => {
const locator = context.refLocator(validatedParams.ref);
await locator.fill(validatedParams.text);
return await context.currentTab().runAndWaitWithSnapshot(async snapshot => {
const locator = snapshot.refLocator(validatedParams.ref);
if (validatedParams.slowly)
await locator.pressSequentially(validatedParams.text);
else
await locator.fill(validatedParams.text);
if (validatedParams.submit)
await locator.press('Enter');
}, true);
}, {
status: `Typed "${validatedParams.text}" into "${validatedParams.element}"`,
});
},
};
@@ -116,7 +138,8 @@ const selectOptionSchema = elementSchema.extend({
values: z.array(z.string()).describe('Array of values to select in the dropdown. This can be a single value or multiple values.'),
});
export const selectOption: Tool = {
const selectOption: Tool = {
capability: 'core',
schema: {
name: 'browser_select_option',
description: 'Select an option in a dropdown',
@@ -125,10 +148,12 @@ export const selectOption: Tool = {
handle: async (context, params) => {
const validatedParams = selectOptionSchema.parse(params);
return await runAndWait(context, `Selected option in "${validatedParams.element}"`, async () => {
const locator = context.refLocator(validatedParams.ref);
return await context.currentTab().runAndWaitWithSnapshot(async snapshot => {
const locator = snapshot.refLocator(validatedParams.ref);
await locator.selectOption(validatedParams.values);
}, true);
}, {
status: `Selected option in "${validatedParams.element}"`,
});
},
};
@@ -136,7 +161,8 @@ const screenshotSchema = z.object({
raw: z.boolean().optional().describe('Whether to return without compression (in PNG format). Default is false, which returns a JPEG image.'),
});
export const screenshot: Tool = {
const screenshot: Tool = {
capability: 'core',
schema: {
name: 'browser_take_screenshot',
description: `Take a screenshot of the current page. You can't perform actions based on the screenshot, use browser_snapshot for actions.`,
@@ -145,11 +171,21 @@ export const screenshot: Tool = {
handle: async (context, params) => {
const validatedParams = screenshotSchema.parse(params);
const page = context.existingPage();
const tab = context.currentTab();
const options: playwright.PageScreenshotOptions = validatedParams.raw ? { type: 'png', scale: 'css' } : { type: 'jpeg', quality: 50, scale: 'css' };
const screenshot = await page.screenshot(options);
const screenshot = await tab.page.screenshot(options);
return {
content: [{ type: 'image', data: screenshot.toString('base64'), mimeType: validatedParams.raw ? 'image/png' : 'image/jpeg' }],
};
},
};
export default [
snapshot,
click,
drag,
hover,
type,
selectOption,
screenshot,
];

109
src/tools/tabs.ts Normal file
View File

@@ -0,0 +1,109 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
import type { ToolFactory, Tool } from './tool';
const listTabs: Tool = {
capability: 'tabs',
schema: {
name: 'browser_tab_list',
description: 'List browser tabs',
inputSchema: zodToJsonSchema(z.object({})),
},
handle: async context => {
return {
content: [{
type: 'text',
text: await context.listTabs(),
}],
};
},
};
const selectTabSchema = z.object({
index: z.number().describe('The index of the tab to select'),
});
const selectTab: ToolFactory = captureSnapshot => ({
capability: 'tabs',
schema: {
name: 'browser_tab_select',
description: 'Select a tab by index',
inputSchema: zodToJsonSchema(selectTabSchema),
},
handle: async (context, params) => {
const validatedParams = selectTabSchema.parse(params);
await context.selectTab(validatedParams.index);
const currentTab = await context.ensureTab();
return await currentTab.run(async () => {}, { captureSnapshot });
},
});
const newTabSchema = z.object({
url: z.string().optional().describe('The URL to navigate to in the new tab. If not provided, the new tab will be blank.'),
});
const newTab: Tool = {
capability: 'tabs',
schema: {
name: 'browser_tab_new',
description: 'Open a new tab',
inputSchema: zodToJsonSchema(newTabSchema),
},
handle: async (context, params) => {
const validatedParams = newTabSchema.parse(params);
await context.newTab();
if (validatedParams.url)
await context.currentTab().navigate(validatedParams.url);
return await context.currentTab().run(async () => {}, { captureSnapshot: true });
},
};
const closeTabSchema = z.object({
index: z.number().optional().describe('The index of the tab to close. Closes current tab if not provided.'),
});
const closeTab: ToolFactory = captureSnapshot => ({
capability: 'tabs',
schema: {
name: 'browser_tab_close',
description: 'Close a tab',
inputSchema: zodToJsonSchema(closeTabSchema),
},
handle: async (context, params) => {
const validatedParams = closeTabSchema.parse(params);
await context.closeTab(validatedParams.index);
const currentTab = context.currentTab();
if (currentTab)
return await currentTab.run(async () => {}, { captureSnapshot });
return {
content: [{
type: 'text',
text: await context.listTabs(),
}],
};
},
});
export default (captureSnapshot: boolean) => [
listTabs,
newTab,
selectTab(captureSnapshot),
closeTab(captureSnapshot),
];

View File

@@ -18,6 +18,8 @@ import type { ImageContent, TextContent } from '@modelcontextprotocol/sdk/types'
import type { JsonSchema7Type } from 'zod-to-json-schema';
import type { Context } from '../context';
export type ToolCapability = 'core' | 'tabs' | 'pdf' | 'history' | 'wait' | 'files' | 'install';
export type ToolSchema = {
name: string;
description: string;
@@ -30,6 +32,7 @@ export type ToolResult = {
};
export type Tool = {
capability: ToolCapability;
schema: ToolSchema;
handle: (context: Context, params?: Record<string, any>) => Promise<ToolResult>;
};

View File

@@ -15,10 +15,8 @@
*/
import type * as playwright from 'playwright';
import type { ToolResult } from './tool';
import type { Context } from '../context';
async function waitForCompletion<R>(page: playwright.Page, callback: () => Promise<R>): Promise<R> {
export async function waitForCompletion<R>(page: playwright.Page, callback: () => Promise<R>): Promise<R> {
const requests = new Set<playwright.Request>();
let frameNavigated = false;
let waitCallback: () => void = () => {};
@@ -71,42 +69,6 @@ async function waitForCompletion<R>(page: playwright.Page, callback: () => Promi
}
}
export async function runAndWait(context: Context, status: string, callback: (page: playwright.Page) => Promise<any>, snapshot: boolean = false): Promise<ToolResult> {
const page = context.existingPage();
const dismissFileChooser = context.hasFileChooser();
await waitForCompletion(page, () => callback(page));
if (dismissFileChooser)
context.clearFileChooser();
const result: ToolResult = snapshot ? await captureAriaSnapshot(context, status) : {
content: [{ type: 'text', text: status }],
};
return result;
}
export async function captureAriaSnapshot(context: Context, status: string = ''): Promise<ToolResult> {
const page = context.existingPage();
const lines = [];
if (status)
lines.push(`${status}`);
lines.push(
'',
`- Page URL: ${page.url()}`,
`- Page Title: ${await page.title()}`
);
if (context.hasFileChooser())
lines.push(`- There is a file chooser visible that requires browser_choose_file to be called`);
lines.push(
`- Page Snapshot`,
'```yaml',
await context.allFramesSnapshot(),
'```',
''
);
return {
content: [{ type: 'text', text: lines.join('\n') }],
};
}
export function sanitizeForFilePath(s: string) {
return s.replace(/[\x00-\x2C\x2E-\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+/g, '-');
}

View File

@@ -15,66 +15,17 @@
*/
import fs from 'fs/promises';
import { spawn } from 'node:child_process';
import path from 'node:path';
import { test, expect } from './fixtures';
test('test tool list', async ({ client, visionClient }) => {
const { tools } = await client.listTools();
expect(tools.map(t => t.name)).toEqual([
'browser_navigate',
'browser_go_back',
'browser_go_forward',
'browser_choose_file',
'browser_snapshot',
'browser_click',
'browser_hover',
'browser_type',
'browser_select_option',
'browser_take_screenshot',
'browser_press_key',
'browser_wait',
'browser_save_as_pdf',
'browser_close',
'browser_install',
]);
const { tools: visionTools } = await visionClient.listTools();
expect(visionTools.map(t => t.name)).toEqual([
'browser_navigate',
'browser_go_back',
'browser_go_forward',
'browser_choose_file',
'browser_screenshot',
'browser_move_mouse',
'browser_click',
'browser_drag',
'browser_type',
'browser_press_key',
'browser_wait',
'browser_save_as_pdf',
'browser_close',
'browser_install',
]);
});
test('test resources list', async ({ client }) => {
const { resources } = await client.listResources();
expect(resources).toEqual([
expect.objectContaining({
uri: 'browser://console',
mimeType: 'text/plain',
}),
]);
});
test('test browser_navigate', async ({ client }) => {
test('browser_navigate', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
Navigated to data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
@@ -85,7 +36,7 @@ test('test browser_navigate', async ({ client }) => {
);
});
test('test browser_click', async ({ client }) => {
test('browser_click', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
@@ -99,7 +50,7 @@ test('test browser_click', async ({ client }) => {
element: 'Submit button',
ref: 's1e3',
},
})).toHaveTextContent(`"Submit button" clicked
})).toHaveTextContent(`Clicked "Submit button"
- Page URL: data:text/html,<html><title>Title</title><button>Submit</button></html>
- Page Title: Title
@@ -110,34 +61,8 @@ test('test browser_click', async ({ client }) => {
`);
});
test('test reopen browser', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
});
expect(await client.callTool({
name: 'browser_close',
})).toHaveTextContent('Page closed');
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`);
});
test('single option', async ({ client }) => {
test('browser_select_option', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
@@ -165,7 +90,7 @@ test('single option', async ({ client }) => {
`);
});
test('multiple option', async ({ client }) => {
test('browser_select_option (multiple)', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
@@ -194,51 +119,7 @@ test('multiple option', async ({ client }) => {
`);
});
test('browser://console', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><script>console.log("Hello, world!");console.error("Error"); </script></html>',
},
});
const resource = await client.readResource({
uri: 'browser://console',
});
expect(resource.contents).toEqual([{
uri: 'browser://console',
mimeType: 'text/plain',
text: '[LOG] Hello, world!\n[ERROR] Error',
}]);
});
test('stitched aria frames', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: `data:text/html,<h1>Hello</h1><iframe src="data:text/html,<button>World</button><main><iframe src='data:text/html,<p>Nested</p>'></iframe></main>"></iframe><iframe src="data:text/html,<h1>Should be invisible</h1>" style="display: none;"></iframe>`,
},
})).toContainTextContent(`
\`\`\`yaml
- heading "Hello" [level=1] [ref=s1e3]
- iframe [ref=s1e4]:
- button "World" [ref=f1s1e3]
- main [ref=f1s1e4]:
- iframe [ref=f1s1e5]:
- paragraph [ref=f2s1e3]: Nested
\`\`\`
`);
expect(await client.callTool({
name: 'browser_click',
arguments: {
element: 'World',
ref: 'f1s1e3',
},
})).toContainTextContent('"World" clicked');
});
test('browser_choose_file', async ({ client }) => {
test('browser_file_upload', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
@@ -252,20 +133,20 @@ test('browser_choose_file', async ({ client }) => {
element: 'Textbox',
ref: 's1e3',
},
})).toContainTextContent('There is a file chooser visible that requires browser_choose_file to be called');
})).toContainTextContent('There is a file chooser visible that requires browser_file_upload to be called');
const filePath = test.info().outputPath('test.txt');
await fs.writeFile(filePath, 'Hello, world!');
{
const response = await client.callTool({
name: 'browser_choose_file',
name: 'browser_file_upload',
arguments: {
paths: [filePath],
},
});
expect(response).not.toContainTextContent('There is a file chooser visible that requires browser_choose_file to be called');
expect(response).not.toContainTextContent('There is a file chooser visible that requires browser_file_upload to be called');
expect(response).toContainTextContent('textbox [ref=s3e3]: C:\\fakepath\\test.txt');
}
@@ -278,7 +159,7 @@ test('browser_choose_file', async ({ client }) => {
},
});
expect(response).toContainTextContent('There is a file chooser visible that requires browser_choose_file to be called');
expect(response).toContainTextContent('There is a file chooser visible that requires browser_file_upload to be called');
expect(response).toContainTextContent('button "Button" [ref=s4e4]');
}
@@ -291,80 +172,83 @@ test('browser_choose_file', async ({ client }) => {
},
});
expect(response, 'not submitting browser_choose_file dismisses file chooser').not.toContainTextContent('There is a file chooser visible that requires browser_choose_file to be called');
expect(response, 'not submitting browser_file_upload dismisses file chooser').not.toContainTextContent('There is a file chooser visible that requires browser_file_upload to be called');
}
});
test('sse transport', async () => {
const cp = spawn('node', [path.join(__dirname, '../cli.js'), '--port', '0'], { stdio: 'pipe' });
try {
let stdout = '';
const url = await new Promise<string>(resolve => cp.stdout?.on('data', data => {
stdout += data.toString();
const match = stdout.match(/Listening on (http:\/\/.*)/);
if (match)
resolve(match[1]);
}));
// need dynamic import b/c of some ESM nonsense
const { SSEClientTransport } = await import('@modelcontextprotocol/sdk/client/sse.js');
const { Client } = await import('@modelcontextprotocol/sdk/client/index.js');
const transport = new SSEClientTransport(new URL(url));
const client = new Client({ name: 'test', version: '1.0.0' });
await client.connect(transport);
await client.ping();
} finally {
cp.kill();
}
});
test('cdp server', async ({ cdpEndpoint, startClient }) => {
const client = await startClient({ args: [`--cdp-endpoint=${cdpEndpoint}`] });
expect(await client.callTool({
test('browser_type', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`
);
});
test('save as pdf', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`
);
const response = await client.callTool({
name: 'browser_save_as_pdf',
});
expect(response).toHaveTextContent(/^Saved as.*page-[^:]+.pdf$/);
});
test('executable path', async ({ startClient }) => {
const client = await startClient({ args: [`--executable-path=bogus`] });
const response = await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
url: `data:text/html,<input type='keypress' onkeypress="console.log('Key pressed:', event.key, ', Text:', event.target.value)"></input>`,
},
});
expect(response).toContainTextContent(`executable doesn't exist`);
await client.callTool({
name: 'browser_type',
arguments: {
element: 'textbox',
ref: 's1e3',
text: 'Hi!',
submit: true,
},
});
const resource = await client.readResource({
uri: 'browser://console',
});
expect(resource.contents).toEqual([{
uri: 'browser://console',
mimeType: 'text/plain',
text: '[LOG] Key pressed: Enter , Text: Hi!',
}]);
});
test('browser_type (slowly)', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: `data:text/html,<input type='text' onkeydown="console.log('Key pressed:', event.key, 'Text:', event.target.value)"></input>`,
},
});
await client.callTool({
name: 'browser_type',
arguments: {
element: 'textbox',
ref: 's1e3',
text: 'Hi!',
submit: true,
slowly: true,
},
});
const resource = await client.readResource({
uri: 'browser://console',
});
expect(resource.contents).toEqual([{
uri: 'browser://console',
mimeType: 'text/plain',
text: [
'[LOG] Key pressed: H Text: ',
'[LOG] Key pressed: i Text: H',
'[LOG] Key pressed: ! Text: Hi',
'[LOG] Key pressed: Enter Text: Hi!',
].join('\n'),
}]);
});
test('browser_resize', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Resize Test</title><body><div id="size">Waiting for resize...</div><script>new ResizeObserver(() => { document.getElementById("size").textContent = `Window size: ${window.innerWidth}x${window.innerHeight}`; }).observe(document.body);</script></body></html>',
},
});
const response = await client.callTool({
name: 'browser_resize',
arguments: {
width: 390,
height: 780,
},
});
expect(response).toContainTextContent('Resized browser window');
expect(response).toContainTextContent('Window size: 390x780');
});

View File

@@ -0,0 +1,94 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('test snapshot tool list', async ({ client }) => {
const { tools } = await client.listTools();
expect(new Set(tools.map(t => t.name))).toEqual(new Set([
'browser_click',
'browser_drag',
'browser_file_upload',
'browser_hover',
'browser_select_option',
'browser_type',
'browser_close',
'browser_install',
'browser_navigate_back',
'browser_navigate_forward',
'browser_navigate',
'browser_pdf_save',
'browser_press_key',
'browser_resize',
'browser_snapshot',
'browser_tab_close',
'browser_tab_list',
'browser_tab_new',
'browser_tab_select',
'browser_take_screenshot',
'browser_wait',
]));
});
test('test vision tool list', async ({ visionClient }) => {
const { tools: visionTools } = await visionClient.listTools();
expect(new Set(visionTools.map(t => t.name))).toEqual(new Set([
'browser_close',
'browser_file_upload',
'browser_install',
'browser_navigate_back',
'browser_navigate_forward',
'browser_navigate',
'browser_pdf_save',
'browser_press_key',
'browser_resize',
'browser_screen_capture',
'browser_screen_click',
'browser_screen_drag',
'browser_screen_move_mouse',
'browser_screen_type',
'browser_tab_close',
'browser_tab_list',
'browser_tab_new',
'browser_tab_select',
'browser_wait',
]));
});
test('test resources list', async ({ client }) => {
const { resources } = await client.listResources();
expect(resources).toEqual([
expect.objectContaining({
uri: 'browser://console',
mimeType: 'text/plain',
}),
]);
});
test('test capabilities', async ({ startClient }) => {
const client = await startClient({
args: ['--caps="core"'],
});
const { tools } = await client.listTools();
const toolNames = tools.map(t => t.name);
expect(toolNames).not.toContain('browser_file_upload');
expect(toolNames).not.toContain('browser_pdf_save');
expect(toolNames).not.toContain('browser_screen_capture');
expect(toolNames).not.toContain('browser_screen_click');
expect(toolNames).not.toContain('browser_screen_drag');
expect(toolNames).not.toContain('browser_screen_move_mouse');
expect(toolNames).not.toContain('browser_screen_type');
});

61
tests/cdp.spec.ts Normal file
View File

@@ -0,0 +1,61 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('cdp server', async ({ cdpEndpoint, startClient }) => {
const client = await startClient({ args: [`--cdp-endpoint=${cdpEndpoint}`] });
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
Navigated to data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`
);
});
test('cdp server reuse tab', async ({ cdpEndpoint, startClient }) => {
const client = await startClient({ args: [`--cdp-endpoint=${cdpEndpoint}`] });
expect(await client.callTool({
name: 'browser_click',
arguments: {
element: 'Hello, world!',
ref: 'f0',
},
})).toHaveTextContent(`Error: No current snapshot available. Capture a snapshot of navigate to a new location first.`);
expect(await client.callTool({
name: 'browser_snapshot',
arguments: {},
})).toHaveTextContent(`
- Page URL: data:text/html,hello world
- Page Title:
- Page Snapshot
\`\`\`yaml
- text: hello world
\`\`\`
`);
});

35
tests/console.spec.ts Normal file
View File

@@ -0,0 +1,35 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('browser://console', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><script>console.log("Hello, world!");console.error("Error"); </script></html>',
},
});
const resource = await client.readResource({
uri: 'browser://console',
});
expect(resource.contents).toEqual([{
uri: 'browser://console',
mimeType: 'text/plain',
text: '[LOG] Hello, world!\n[ERROR] Error',
}]);
});

View File

@@ -20,11 +20,12 @@ import { chromium } from 'playwright';
import { test as baseTest, expect as baseExpect } from '@playwright/test';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { spawn } from 'child_process';
type Fixtures = {
client: Client;
visionClient: Client;
startClient: (options?: { args?: string[], vision?: boolean }) => Promise<Client>;
startClient: (options?: { args?: string[] }) => Promise<Client>;
wsEndpoint: string;
cdpEndpoint: string;
};
@@ -36,7 +37,7 @@ export const test = baseTest.extend<Fixtures>({
},
visionClient: async ({ startClient }, use) => {
await use(await startClient({ vision: true }));
await use(await startClient({ args: ['--vision'] }));
},
startClient: async ({ }, use, testInfo) => {
@@ -45,8 +46,6 @@ export const test = baseTest.extend<Fixtures>({
use(async options => {
const args = ['--headless', '--user-data-dir', userDataDir];
if (options?.vision)
args.push('--vision');
if (options?.args)
args.push(...options.args);
const transport = new StdioClientTransport({
@@ -70,12 +69,25 @@ export const test = baseTest.extend<Fixtures>({
cdpEndpoint: async ({ }, use, testInfo) => {
const port = 3200 + (+process.env.TEST_PARALLEL_INDEX!);
const browser = await chromium.launchPersistentContext(testInfo.outputPath('user-data-dir'), {
channel: 'chrome',
args: [`--remote-debugging-port=${port}`],
const executablePath = chromium.executablePath();
const browserProcess = spawn(executablePath, [
`--user-data-dir=${testInfo.outputPath('user-data-dir')}`,
`--remote-debugging-port=${port}`,
`--no-first-run`,
`--no-sandbox`,
`--headless`,
`data:text/html,hello world`,
], {
stdio: 'pipe',
});
await new Promise<void>(resolve => {
browserProcess.stderr.on('data', data => {
if (data.toString().includes('DevTools listening on '))
resolve();
});
});
await use(`http://localhost:${port}`);
await browser.close();
browserProcess.kill();
},
});
@@ -86,10 +98,17 @@ export const expect = baseExpect.extend({
const isNot = this.isNot;
try {
const text = (response.content as any)[0].text;
if (isNot)
baseExpect(text).not.toMatch(content);
else
baseExpect(text).toMatch(content);
if (typeof content === 'string') {
if (isNot)
baseExpect(text.trim()).not.toBe(content.trim());
else
baseExpect(text.trim()).toBe(content.trim());
} else {
if (isNot)
baseExpect(text).not.toMatch(content);
else
baseExpect(text).toMatch(content);
}
} catch (e) {
return {
pass: isNot,

43
tests/iframes.spec.ts Normal file
View File

@@ -0,0 +1,43 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('stitched aria frames', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: `data:text/html,<h1>Hello</h1><iframe src="data:text/html,<button>World</button><main><iframe src='data:text/html,<p>Nested</p>'></iframe></main>"></iframe><iframe src="data:text/html,<h1>Should be invisible</h1>" style="display: none;"></iframe>`,
},
})).toContainTextContent(`
\`\`\`yaml
- heading "Hello" [level=1] [ref=s1e3]
- iframe [ref=s1e4]:
- button "World" [ref=f1s1e3]
- main [ref=f1s1e4]:
- iframe [ref=f1s1e5]:
- paragraph [ref=f2s1e3]: Nested
\`\`\`
`);
expect(await client.callTool({
name: 'browser_click',
arguments: {
element: 'World',
ref: 'f1s1e3',
},
})).toContainTextContent('Clicked "World"');
});

57
tests/launch.spec.ts Normal file
View File

@@ -0,0 +1,57 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('test reopen browser', async ({ client }) => {
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
});
expect(await client.callTool({
name: 'browser_close',
})).toHaveTextContent('Page closed');
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
Navigated to data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`);
});
test('executable path', async ({ startClient }) => {
const client = await startClient({ args: [`--executable-path=bogus`] });
const response = await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
});
expect(response).toContainTextContent(`executable doesn't exist`);
});

55
tests/pdf.spec.ts Normal file
View File

@@ -0,0 +1,55 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { test, expect } from './fixtures';
test('save as pdf unavailable', async ({ startClient }) => {
const client = await startClient({ args: ['--caps="no-pdf"'] });
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
});
expect(await client.callTool({
name: 'browser_pdf_save',
})).toHaveTextContent(/Tool \"browser_pdf_save\" not found/);
});
test('save as pdf', async ({ client }) => {
expect(await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<html><title>Title</title><body>Hello, world!</body></html>',
},
})).toHaveTextContent(`
Navigated to data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page URL: data:text/html,<html><title>Title</title><body>Hello, world!</body></html>
- Page Title: Title
- Page Snapshot
\`\`\`yaml
- text: Hello, world!
\`\`\`
`
);
const response = await client.callTool({
name: 'browser_pdf_save',
});
expect(response).toHaveTextContent(/^Saved as.*page-[^:]+.pdf$/);
});

42
tests/sse.spec.ts Normal file
View File

@@ -0,0 +1,42 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { spawn } from 'node:child_process';
import path from 'node:path';
import { test } from './fixtures';
test('sse transport', async () => {
const cp = spawn('node', [path.join(__dirname, '../cli.js'), '--port', '0'], { stdio: 'pipe' });
try {
let stdout = '';
const url = await new Promise<string>(resolve => cp.stdout?.on('data', data => {
stdout += data.toString();
const match = stdout.match(/Listening on (http:\/\/.*)/);
if (match)
resolve(match[1]);
}));
// need dynamic import b/c of some ESM nonsense
const { SSEClientTransport } = await import('@modelcontextprotocol/sdk/client/sse.js');
const { Client } = await import('@modelcontextprotocol/sdk/client/index.js');
const transport = new SSEClientTransport(new URL(url));
const client = new Client({ name: 'test', version: '1.0.0' });
await client.connect(transport);
await client.ping();
} finally {
cp.kill();
}
});

121
tests/tabs.spec.ts Normal file
View File

@@ -0,0 +1,121 @@
/**
* Copyright (c) Microsoft Corporation.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import { chromium } from 'playwright';
import { test, expect } from './fixtures';
import type { Client } from '@modelcontextprotocol/sdk/client/index.js';
async function createTab(client: Client, title: string, body: string) {
return await client.callTool({
name: 'browser_tab_new',
arguments: {
url: `data:text/html,<title>${title}</title><body>${body}</body>`,
},
});
}
test('create new tab', async ({ client }) => {
expect(await createTab(client, 'Tab one', 'Body one')).toHaveTextContent(`
Open tabs:
- 1: [] (about:blank)
- 2: (current) [Tab one] (data:text/html,<title>Tab one</title><body>Body one</body>)
Current tab:
- Page URL: data:text/html,<title>Tab one</title><body>Body one</body>
- Page Title: Tab one
- Page Snapshot
\`\`\`yaml
- text: Body one
\`\`\``);
expect(await createTab(client, 'Tab two', 'Body two')).toHaveTextContent(`
Open tabs:
- 1: [] (about:blank)
- 2: [Tab one] (data:text/html,<title>Tab one</title><body>Body one</body>)
- 3: (current) [Tab two] (data:text/html,<title>Tab two</title><body>Body two</body>)
Current tab:
- Page URL: data:text/html,<title>Tab two</title><body>Body two</body>
- Page Title: Tab two
- Page Snapshot
\`\`\`yaml
- text: Body two
\`\`\``);
});
test('select tab', async ({ client }) => {
await createTab(client, 'Tab one', 'Body one');
await createTab(client, 'Tab two', 'Body two');
expect(await client.callTool({
name: 'browser_tab_select',
arguments: {
index: 2,
},
})).toHaveTextContent(`
Open tabs:
- 1: [] (about:blank)
- 2: (current) [Tab one] (data:text/html,<title>Tab one</title><body>Body one</body>)
- 3: [Tab two] (data:text/html,<title>Tab two</title><body>Body two</body>)
Current tab:
- Page URL: data:text/html,<title>Tab one</title><body>Body one</body>
- Page Title: Tab one
- Page Snapshot
\`\`\`yaml
- text: Body one
\`\`\``);
});
test('close tab', async ({ client }) => {
await createTab(client, 'Tab one', 'Body one');
await createTab(client, 'Tab two', 'Body two');
expect(await client.callTool({
name: 'browser_tab_close',
arguments: {
index: 3,
},
})).toHaveTextContent(`
Open tabs:
- 1: [] (about:blank)
- 2: (current) [Tab one] (data:text/html,<title>Tab one</title><body>Body one</body>)
Current tab:
- Page URL: data:text/html,<title>Tab one</title><body>Body one</body>
- Page Title: Tab one
- Page Snapshot
\`\`\`yaml
- text: Body one
\`\`\``);
});
test('reuse first tab when navigating', async ({ startClient, cdpEndpoint }) => {
const browser = await chromium.connectOverCDP(cdpEndpoint);
const [context] = browser.contexts();
const pages = context.pages();
const client = await startClient({ args: [`--cdp-endpoint=${cdpEndpoint}`] });
await client.callTool({
name: 'browser_navigate',
arguments: {
url: 'data:text/html,<title>Title</title><body>Body</body>',
},
});
expect(pages.length).toBe(1);
expect(await pages[0].title()).toBe('Title');
});