chore: group tools, prepare for capabilities (#134)

2026-02-03 08:53:38 +00:00 · 2025-04-04 15:22:00 -07:00
parent fc0cccf4a5
commit 707ebbf4d4
14 changed files with 500 additions and 361 deletions
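The commit title says the tools are being grouped in preparation for capabilities. The commit's source is not shown, so the following is only a hypothetical sketch of what such grouping could look like. The tool names come from the updated README below, and each group mirrors one of its new headings. The `Capability` labels, the `ToolDescriptor` type, and the `selectTools` helper are illustrative assumptions, not identifiers from the Playwright MCP codebase.

```typescript
// Hypothetical capability labels, one per README heading; not the real identifiers.
type Capability = "navigation" | "snapshot" | "vision" | "tabs" | "files" | "utilities";

interface ToolDescriptor {
  name: string; // tool name as listed in the README
  capability: Capability; // assumed group, taken from the README heading
}

// A subset of the tools documented in the README, grouped by section heading.
const tools: ToolDescriptor[] = [
  { name: "browser_navigate", capability: "navigation" },
  { name: "browser_navigate_back", capability: "navigation" },
  { name: "browser_navigate_forward", capability: "navigation" },
  { name: "browser_select_option", capability: "snapshot" },
  { name: "browser_snapshot", capability: "snapshot" },
  { name: "browser_take_screenshot", capability: "snapshot" },
  { name: "browser_screen_move_mouse", capability: "vision" },
  { name: "browser_screen_capture", capability: "vision" },
  { name: "browser_screen_click", capability: "vision" },
  { name: "browser_tab_list", capability: "tabs" },
  { name: "browser_tab_new", capability: "tabs" },
  { name: "browser_tab_select", capability: "tabs" },
  { name: "browser_tab_close", capability: "tabs" },
  { name: "browser_choose_file", capability: "files" },
  { name: "browser_pdf_save", capability: "files" },
  { name: "browser_wait", capability: "utilities" },
  { name: "browser_close", capability: "utilities" },
  { name: "browser_install", capability: "utilities" },
];

// Return the names of tools whose group is in the enabled set.
function selectTools(all: ToolDescriptor[], enabled: Set<Capability>): string[] {
  return all.filter((t) => enabled.has(t.capability)).map((t) => t.name);
}
```

For example, `selectTools(tools, new Set<Capability>(["navigation", "tabs"]))` yields the three navigation tools followed by the four tab tools. Keeping the group on each descriptor means enabling or disabling a capability is a single set membership check, with no change needed to the tool definitions themselves.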
--- a/README.md
+++ b/README.md
@@ -167,22 +167,7 @@ transport = new SSEServerTransport("/messages", res);
 server.connect(transport);
 ```

-### Snapshot Mode
-
-The Playwright MCP provides a set of tools for browser automation. Here are all available tools:
-
- **browser_navigate**
-  - Description: Navigate to a URL
-  - Parameters:
-    - `url` (string): The URL to navigate to
-
- **browser_go_back**
-  - Description: Go back to the previous page
-  - Parameters: None
-
- **browser_go_forward**
-  - Description: Go forward to the next page
-  - Parameters: None
+### Snapshot-based Interactions

 - **browser_click**
  - Description: Perform click on a web page
@@ -210,109 +195,121 @@ The Playwright MCP provides a set of tools for browser automation. Here are all
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `ref` (string): Exact target element reference from the page snapshot
    - `text` (string): Text to type into the element
-    - `submit` (boolean): Whether to submit entered text (press Enter after)
+    - `submit` (boolean, optional): Whether to submit entered text (press Enter after)
+    - `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default, the entire text is filled in at once.

 - **browser_select_option**
-  - Description: Select option in a dropdown
+  - Description: Select an option in a dropdown
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `ref` (string): Exact target element reference from the page snapshot
-    - `values` (array): Array of values to select in the dropdown.
+    - `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.

- **browser_choose_file**
-  - Description: Choose one or multiple files to upload
+- **browser_snapshot**
+  - Description: Capture an accessibility snapshot of the current page; this is better than a screenshot
+  - Parameters: None
+
+- **browser_take_screenshot**
+  - Description: Take a screenshot of the current page. You can't perform actions based on the screenshot; use browser_snapshot for actions.
  - Parameters:
-    - `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
+    - `raw` (boolean, optional): Whether to return without compression (in PNG format). Default is false, which returns a JPEG image.
+
+### Vision-based Interactions
+
+- **browser_screen_move_mouse**
+  - Description: Move the mouse to a given position
+  - Parameters:
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `x` (number): X coordinate
+    - `y` (number): Y coordinate
+
+- **browser_screen_capture**
+  - Description: Take a screenshot of the current page
+  - Parameters: None
+
+- **browser_screen_click**
+  - Description: Click the left mouse button
+  - Parameters:
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `x` (number): X coordinate
+    - `y` (number): Y coordinate
+
+- **browser_screen_drag**
+  - Description: Drag with the left mouse button
+  - Parameters:
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `startX` (number): Start X coordinate
+    - `startY` (number): Start Y coordinate
+    - `endX` (number): End X coordinate
+    - `endY` (number): End Y coordinate
+
+- **browser_screen_type**
+  - Description: Type text
+  - Parameters:
+    - `text` (string): Text to type
+    - `submit` (boolean, optional): Whether to submit entered text (press Enter after)

 - **browser_press_key**
  - Description: Press a key on the keyboard
  - Parameters:
    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`

- **browser_snapshot**
-  - Description: Capture accessibility snapshot of the current page (better than screenshot)
+### Tab Management
+
+- **browser_tab_list**
+  - Description: List browser tabs
  - Parameters: None

- **browser_save_as_pdf**
-  - Description: Save page as PDF
-  - Parameters: None
-
- **browser_take_screenshot**
-  - Description: Capture screenshot of the page
+- **browser_tab_new**
+  - Description: Open a new tab
  - Parameters:
-    - `raw` (string): Optionally returns lossless PNG screenshot. JPEG by default.
+    - `url` (string, optional): The URL to navigate to in the new tab. If not provided, the new tab will be blank.

- **browser_wait**
-  - Description: Wait for a specified time in seconds
+- **browser_tab_select**
+  - Description: Select a tab by index
  - Parameters:
-    - `time` (number): The time to wait in seconds (capped at 10 seconds)
+    - `index` (number): The index of the tab to select

- **browser_close**
-  - Description: Close the page
-  - Parameters: None
+- **browser_tab_close**
+  - Description: Close a tab
+  - Parameters:
+    - `index` (number, optional): The index of the tab to close. Closes the current tab if not provided.

-
-### Vision Mode
-
-Vision Mode provides tools for visual-based interactions using screenshots. Here are all available tools:
+### Navigation

 - **browser_navigate**
  - Description: Navigate to a URL
  - Parameters:
    - `url` (string): The URL to navigate to

- **browser_go_back**
+- **browser_navigate_back**
  - Description: Go back to the previous page
  - Parameters: None

- **browser_go_forward**
+- **browser_navigate_forward**
  - Description: Go forward to the next page
  - Parameters: None

- **browser_screenshot**
-  - Description: Capture screenshot of the current page
-  - Parameters: None
-
- **browser_move_mouse**
-  - Description: Move mouse to specified coordinates
-  - Parameters:
-    - `x` (number): X coordinate
-    - `y` (number): Y coordinate
-
- **browser_click**
-  - Description: Click at specified coordinates
-  - Parameters:
-    - `x` (number): X coordinate to click at
-    - `y` (number): Y coordinate to click at
-
- **browser_drag**
-  - Description: Perform drag and drop operation
-  - Parameters:
-    - `startX` (number): Start X coordinate
-    - `startY` (number): Start Y coordinate
-    - `endX` (number): End X coordinate
-    - `endY` (number): End Y coordinate
-
- **browser_type**
-  - Description: Type text at specified coordinates
-  - Parameters:
-    - `text` (string): Text to type
-    - `submit` (boolean): Whether to submit entered text (press Enter after)
+### Keyboard

 - **browser_press_key**
  - Description: Press a key on the keyboard
  - Parameters:
    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`

+### Files and Media
+
 - **browser_choose_file**
  - Description: Choose one or multiple files to upload
  - Parameters:
    - `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.

- **browser_save_as_pdf**
+- **browser_pdf_save**
  - Description: Save page as PDF
  - Parameters: None

+### Utilities
+
 - **browser_wait**
  - Description: Wait for a specified time in seconds
  - Parameters:
@@ -321,3 +318,10 @@ Vision Mode provides tools for visual-based interactions using screenshots. Here
 - **browser_close**
  - Description: Close the page
  - Parameters: None
+
+- **browser_install**
+  - Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
+  - Parameters: None
+
+### Vision Mode
+
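The Vision-based tools listed above act on page coordinates rather than snapshot references. As a minimal sketch, assuming a client sends MCP's JSON-RPC 2.0 `tools/call` request directly instead of going through an SDK, a call to `browser_screen_click` could be built as follows. The `ScreenClickArgs` type and the `screenClickRequest` helper are hypothetical; only the tool name and its `element`, `x`, and `y` parameters come from the README.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Parameters documented for browser_screen_click (hypothetical type name).
interface ScreenClickArgs {
  element: string; // human-readable description, used to obtain permission
  x: number; // X coordinate
  y: number; // Y coordinate
}

// Build the request, rejecting coordinates that are NaN or infinite.
function screenClickRequest(id: number, args: ScreenClickArgs): ToolCallRequest {
  if (!Number.isFinite(args.x) || !Number.isFinite(args.y)) {
    throw new Error("x and y must be finite numbers");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "browser_screen_click", arguments: { ...args } },
  };
}
```

For example, `screenClickRequest(1, { element: "Submit button", x: 120, y: 48 })` produces a request whose `params.name` is `browser_screen_click`. The coordinates would normally be read off an image from `browser_screen_capture`, because the vision tools do not use the `ref` values from `browser_snapshot`.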