usage-demo through flawed architect gem output and interaction
226
CURRENT-V2/usage-demo/architecture-draft.md
Normal file
@@ -0,0 +1,226 @@
|
||||
# BMad Hacker Daily Digest Architecture Document
|
||||
|
||||
## Technical Summary
|
||||
|
||||
This document outlines the technical architecture for the BMad Hacker Daily Digest, a command-line tool built with TypeScript and Node.js v22. It adheres to the structure provided by the "bmad-boilerplate". The system fetches the top 10 Hacker News stories and their comments daily via the Algolia HN API, attempts to scrape linked articles, generates summaries for both articles (if scraped) and discussions using a local Ollama instance, persists intermediate data locally, and sends an HTML digest email via Nodemailer upon manual CLI execution. The architecture emphasizes modularity through distinct clients and processing stages, facilitating independent stage testing as required by the PRD. Execution is strictly local for the MVP.
|
||||
|
||||
## High-Level Overview
|
||||
|
||||
The application follows a sequential pipeline architecture triggered by a single CLI command (`npm run dev` or `npm start`). Data flows through distinct stages: HN Data Acquisition, Article Scraping, LLM Summarization, and Digest Assembly/Email Dispatch. Each stage persists its output to a date-stamped local directory, allowing subsequent stages to operate on this data and enabling stage-specific testing utilities.
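As a minimal sketch of the date-stamped layout described above, the helpers below build the per-run directory and per-story artifact paths. The function names (`dateStamp`, `dateDirPath`, `artifactPath`) are hypothetical, and using the UTC date is an assumption; the document only specifies the `./output/YYYY-MM-DD/{storyId}_*` naming.

```typescript
import * as path from "node:path";

// Artifact suffixes named in the pipeline description.
type ArtifactKind = "data.json" | "article.txt" | "summary.json";

// Formats a date as YYYY-MM-DD. Assumption: the UTC date is acceptable;
// a local-time stamp may be preferred in practice.
function dateStamp(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Builds the run directory, e.g. output/2025-05-04.
function dateDirPath(outputRoot: string, date: Date): string {
  return path.join(outputRoot, dateStamp(date));
}

// Builds the path of one per-story artifact, e.g. <dateDir>/123_data.json.
function artifactPath(
  dateDir: string,
  storyId: string | number,
  kind: ArtifactKind,
): string {
  return path.join(dateDir, `${storyId}_${kind}`);
}
```

Each stage could call `artifactPath` with its own suffix, which keeps the naming convention in one place for both the main pipeline and the stage testing utilities.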
|
||||
|
||||
**(Diagram Suggestion for Canvas: Create a flowchart showing the stages below)**
|
||||
|
||||
```mermaid
graph TD
    A["CLI Trigger (npm run dev/start)"] --> B("Initialize: Load Config, Setup Logger, Create Output Dir");
    B --> C{"Fetch HN Data (Top 10 Stories + Comments)"};
    C -- "Story/Comment Data" --> D("Persist HN Data: ./output/YYYY-MM-DD/{storyId}_data.json");
    D --> E{"Attempt Article Scraping (per story)"};
    E -- "Scraped Text (if successful)" --> F("Persist Article Text: ./output/YYYY-MM-DD/{storyId}_article.txt");
    F --> G{"Generate Summaries (Article + Discussion via Ollama)"};
    G -- "Summaries" --> H("Persist Summaries: ./output/YYYY-MM-DD/{storyId}_summary.json");
    H --> I{"Assemble Digest (Read persisted data)"};
    I -- "HTML Content" --> J{"Send Email via Nodemailer"};
    J --> K("Log Final Status & Exit");

    subgraph StageTests ["Stage Testing Utilities"]
        direction LR
        T1["npm run stage:fetch"] --> D;
        T2["npm run stage:scrape"] --> F;
        T3["npm run stage:summarize"] --> H;
        T4["npm run stage:email"] --> J;
    end

    %% If no comments
    C --> |Error/Skip| G;
    %% If no URL or scrape fails
    E --> |Skip/Fail| G;
    %% Persist null summaries
    G --> |Summarization Fail| H;
    %% Skip email if assembly fails
    I --> |Assembly Fail| K;
```
|
||||
|
||||
## Component View
|
||||
|
||||
The application logic resides primarily within the `src/` directory, organized into modules responsible for specific pipeline stages or cross-cutting concerns.
|
||||
|
||||
**(Diagram Suggestion for Canvas: Create a component diagram showing modules and dependencies)**
|
||||
|
||||
```mermaid
graph TD
    subgraph src ["Source Code (src/)"]
        direction LR
        Entry["index.ts (Main Orchestrator)"]

        subgraph Config ["Configuration"]
            ConfMod["config.ts"]
            EnvFile[".env File"]
        end

        subgraph Utils ["Utilities"]
            Logger["logger.ts"]
        end

        subgraph Clients ["External Service Clients"]
            Algolia["clients/algoliaHNClient.ts"]
            Ollama["clients/ollamaClient.ts"]
        end

        Scraper["scraper/articleScraper.ts"]

        subgraph Email ["Email Handling"]
            Assembler["email/contentAssembler.ts"]
            Templater["email/templater.ts (or within Assembler)"]
            Sender["email/emailSender.ts"]
            Nodemailer["(nodemailer library)"]
        end

        subgraph Stages ["Stage Testing Scripts (src/stages/)"]
            FetchStage["fetch_hn_data.ts"]
            ScrapeStage["scrape_articles.ts"]
            SummarizeStage["summarize_content.ts"]
            SendStage["send_digest.ts"]
        end

        Entry --> ConfMod;
        Entry --> Logger;
        Entry --> Algolia;
        Entry --> Scraper;
        Entry --> Ollama;
        Entry --> Assembler;
        Entry --> Templater;
        Entry --> Sender;

        Algolia -- uses --> NativeFetch["Node.js v22 Native Fetch"];
        Ollama -- uses --> NativeFetch;
        Scraper -- uses --> NativeFetch;
        Scraper -- uses --> ArticleExtractor["(@extractus/article-extractor)"];
        Sender -- uses --> Nodemailer;
        ConfMod -- reads --> EnvFile;

        Assembler -- reads --> LocalFS["Local Filesystem (./output)"];
        Entry -- writes --> LocalFS;

        FetchStage --> Algolia;
        FetchStage --> LocalFS;
        ScrapeStage --> Scraper;
        ScrapeStage --> LocalFS;
        SummarizeStage --> Ollama;
        SummarizeStage --> LocalFS;
        SendStage --> Assembler;
        SendStage --> Templater;
        SendStage --> Sender;
        SendStage --> LocalFS;
    end

    CLI["CLI (npm run ...)"] --> Entry;
    CLI -- runs --> FetchStage;
    CLI -- runs --> ScrapeStage;
    CLI -- runs --> SummarizeStage;
    CLI -- runs --> SendStage;
```
|
||||
|
||||
_Module Descriptions:_
|
||||
|
||||
- **`src/index.ts`**: The main entry point, orchestrating the entire pipeline flow from initialization to final email dispatch. Imports and calls functions from other modules.
- **`src/config.ts`**: Responsible for loading and validating environment variables from the `.env` file using the `dotenv` library.
- **`src/logger.ts`**: Provides a simple console logging utility used throughout the application.
- **`src/clients/algoliaHNClient.ts`**: Encapsulates interaction with the Algolia Hacker News Search API using the native `fetch` API for fetching stories and comments.
- **`src/clients/ollamaClient.ts`**: Encapsulates interaction with the local Ollama API endpoint using the native `fetch` API for generating summaries.
- **`src/scraper/articleScraper.ts`**: Handles fetching article HTML using native `fetch` and extracting text content using `@extractus/article-extractor`. Includes robust error handling for fetch and extraction failures.
- **`src/email/contentAssembler.ts`**: Reads persisted story data and summaries from the local output directory.
- **`src/email/templater.ts` (or integrated)**: Renders the HTML email content using the assembled data.
- **`src/email/emailSender.ts`**: Configures and uses Nodemailer to send the generated HTML email.
- **`src/stages/*.ts`**: Individual scripts designed to run specific pipeline stages independently for testing, using persisted data from previous stages as input where applicable.
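To make the client modules more concrete, here is a rough sketch of what the top-stories call in `algoliaHNClient.ts` might look like. The endpoint path, the `front_page` tag, and the `Story` shape are assumptions based on the public Algolia HN Search API rather than anything fixed by this document, and `fetchFn` is injected (hypothetically) so the function can be exercised without network access.

```typescript
// Minimal shape of the fetch function used here, so a stub can be injected in tests.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical story shape; the real data model is defined elsewhere.
interface Story {
  storyId: string;
  title: string;
  url: string | null; // null for Ask HN style posts without an external link
  hnUrl: string;
}

// Assumption: base URL of the public Algolia HN Search API.
const ALGOLIA_BASE = "https://hn.algolia.com/api/v1";

// Builds the search URL for the current front-page stories.
function topStoriesUrl(count: number): string {
  const params = new URLSearchParams({
    tags: "front_page",
    hitsPerPage: String(count),
  });
  return `${ALGOLIA_BASE}/search?${params.toString()}`;
}

// Fetches the top stories and maps the raw hits to the Story shape.
async function fetchTopStories(fetchFn: FetchLike, count = 10): Promise<Story[]> {
  const res = await fetchFn(topStoriesUrl(count));
  if (!res.ok) {
    throw new Error(`Algolia request failed with status ${res.status}`);
  }
  const body = (await res.json()) as {
    hits?: Array<{ objectID: string; title?: string; url?: string | null }>;
  };
  return (body.hits ?? []).map((hit) => ({
    storyId: hit.objectID,
    title: hit.title ?? "",
    url: hit.url ?? null,
    hnUrl: `https://news.ycombinator.com/item?id=${hit.objectID}`,
  }));
}
```

In the real module the native `fetch` could be passed directly, for example `fetchTopStories(fetch, 10)`.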
|
||||
|
||||
## Key Architectural Decisions & Patterns
|
||||
|
||||
- **Pipeline Architecture:** A sequential flow where each stage processes data and passes artifacts to the next via the local filesystem. Chosen for simplicity and to easily support independent stage testing.
- **Local Execution & File Persistence:** All execution is local, and intermediate artifacts (`_data.json`, `_article.txt`, `_summary.json`) are stored in a date-stamped `./output` directory. This avoids database setup for MVP and facilitates debugging/stage testing.
- **Native `fetch` API:** Mandated by constraints for all HTTP requests (Algolia, Ollama, Article Scraping). Ensures usage of the latest Node.js features.
- **Modular Clients:** External interactions (Algolia, Ollama) are encapsulated in dedicated client modules (`src/clients/`). This promotes separation of concerns and makes swapping implementations (e.g., different LLM API) easier.
- **Configuration via `.env`:** Standard approach using `dotenv` for managing API keys, endpoints, and behavioral parameters (as per boilerplate).
- **Stage Testing Utilities:** Dedicated scripts (`src/stages/*.ts`) allow isolated testing of fetching, scraping, summarization, and emailing, fulfilling a key PRD requirement.
- **Graceful Error Handling (Scraping):** Article scraping failures are logged but do not halt the main pipeline, allowing the process to continue with discussion summaries only, as required. Other errors (API, LLM) are logged.
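The graceful scraping behaviour in the last decision could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `scrapeOrNull`, `ScrapeFn`, and `LogFn` are hypothetical names, and the actual scraper and logger are passed in so the wrapper stays independent of both.

```typescript
// Hypothetical function types standing in for the real scraper and logger modules.
type ScrapeFn = (url: string) => Promise<string | null>;
type LogFn = (message: string) => void;

// Attempts to scrape an article and never throws: any failure is logged and
// reported as null so the pipeline can continue with the discussion summary only.
async function scrapeOrNull(
  url: string | null,
  scrape: ScrapeFn,
  log: LogFn,
): Promise<string | null> {
  if (!url) {
    log("No article URL; skipping scrape");
    return null;
  }
  try {
    return await scrape(url);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`Scrape failed for ${url}: ${reason}`);
    return null;
  }
}
```

Returning `null` rather than rethrowing matches the `articleContent (string | null)` result used by the orchestrator, which only needs to check for a missing article before summarizing.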
|
||||
|
||||
## Core Workflow / Sequence Diagrams (Simplified)
|
||||
|
||||
**(Diagram Suggestion for Canvas: Create a Sequence Diagram showing interactions)**
|
||||
|
||||
```mermaid
sequenceDiagram
    participant CLI
    participant Index as index.ts
    participant Config as config.ts
    participant Logger as logger.ts
    participant OutputDir as Output Dir Setup
    participant Algolia as algoliaHNClient.ts
    participant Scraper as articleScraper.ts
    participant Ollama as ollamaClient.ts
    participant Assembler as contentAssembler.ts
    participant Templater as templater.ts
    participant Sender as emailSender.ts
    participant FS as Local Filesystem (./output/YYYY-MM-DD)

    CLI->>Index: npm run dev
    Index->>Config: Load .env vars
    Index->>Logger: Initialize
    Index->>OutputDir: Create/Verify Date Dir
    Index->>Algolia: fetchTopStories()
    Algolia-->>Index: stories[]
    loop For Each Story
        Index->>Algolia: fetchCommentsForStory(storyId, MAX_COMMENTS)
        Algolia-->>Index: comments[]
        Index->>FS: Write {storyId}_data.json
        alt Has Valid story.url
            Index->>Scraper: scrapeArticle(story.url)
            Scraper-->>Index: articleContent (string | null)
            alt Scrape Success
                Index->>FS: Write {storyId}_article.txt
            end
        end
        alt Has articleContent
            Index->>Ollama: generateSummary(ARTICLE_PROMPT, articleContent)
            Ollama-->>Index: articleSummary (string | null)
        end
        alt Has comments[]
            Index->>Ollama: generateSummary(DISCUSSION_PROMPT, formattedComments)
            Ollama-->>Index: discussionSummary (string | null)
        end
        Index->>FS: Write {storyId}_summary.json
    end
    Index->>Assembler: assembleDigestData(dateDirPath)
    Assembler->>FS: Read _data.json, _summary.json files
    Assembler-->>Index: digestData[]
    alt digestData is not empty
        Index->>Templater: renderDigestHtml(digestData, date)
        Templater-->>Index: htmlContent
        Index->>Sender: sendDigestEmail(subject, htmlContent)
        Sender-->>Index: success (boolean)
    end
    Index->>Logger: Log final status
```
|
||||
|
||||
## Infrastructure and Deployment Overview
|
||||
|
||||
- **Cloud Provider(s):** N/A (Local Machine Execution Only for MVP)
- **Core Services Used:** N/A
- **Infrastructure as Code (IaC):** N/A
- **Deployment Strategy:** Manual CLI execution (`npm run dev` for development with `ts-node`, `npm run build && npm start` for running compiled JS). No automated deployment pipeline for MVP.
- **Environments:** Single: Local development machine.
|
||||
|
||||
## Key Reference Documents
|
||||
|
||||
- docs/prd.md
- docs/epic1-draft.txt, docs/epic2-draft.txt, ... docs/epic5-draft.txt
- docs/tech-stack.md
- docs/project-structure.md
- docs/coding-standards.md
- docs/api-reference.md
- docs/data-models.md
- docs/environment-vars.md
- docs/testing-strategy.md
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change        | Date       | Version | Description                        | Author      |
| ------------- | ---------- | ------- | ---------------------------------- | ----------- |
| Initial draft | 2025-05-04 | 0.1     | Initial draft based on PRD & Epics | 3-Architect |
|
||||
25
CURRENT-V2/usage-demo/demo.md
Normal file
@@ -0,0 +1,25 @@
|
||||
# Demonstration of the Full BMad Workflow Agent Gem Usage
|
||||
|
||||
## BA Brainstorming
|
||||
|
||||
The following link shows the full chat thread with the BA, demonstrating many features of this amazing agent. I started out not even knowing what to build, and it helped me ideate toward something interesting for tutorial purposes, refine the idea, and do some deep research (in thinking mode; I did not switch models). It offered some great alternative details and ideas, then prompted me section by section to eventually produce the brief. It worked amazingly well. You can read the full transcript and output here:
|
||||
|
||||
https://gemini.google.com/share/fec063449737
|
||||
|
||||
## PM Brainstorming (Oops it was not the PM LOL)
|
||||
|
||||
I took the final Markdown brief, with the prompt for the PM at the end of the last chat, and created a Google Doc to make it easier to share with the PM (I could probably have just pasted it into the new chat, but a doc is easier if I want to start over). In Google Docs it is easy to create a new doc, right-click and select 'Paste from Markdown', then click in the title, and it will automatically name and save the doc with the title of the document. I then started a chat with the 2-PM Gem, also in Gemini 2.5 Pro thinking mode, by attaching the Google Doc and telling it to reference the prompt. This is the transcript. I realized that I had accidentally pasted the BA prompt into the PM prompt as well, so this actually ended up producing a pretty nicely refined brief 2.0 instead LOL
|
||||
|
||||
https://g.co/gemini/share/3e09f04138f2
|
||||
|
||||
So I took that output file and put it into the actual BA again to produce a new version with prompt, as seen in [this file](final-brief-with-pm-prompt.txt).
|
||||
|
||||
## PM Brainstorming Take 2
|
||||
|
||||
Going forward, for the rest of the process I will not use Google Docs, even though it is preferred; instead I will attach txt files of the previous phase's documents. This is required, or else the link will be unshareable.
|
||||
|
||||
Of note here is how I am not passive in this process, and you should not be either. After answering the initial questions, I looked at the proposed epics in its first PRD draft and spotted something really dumb: it had a final epic for doing file output and logging all the way at the end, when really this should be happening incrementally with each epic. I hope the Architect or PO would have caught this later, and the PM might have too if I had let it get to the checklist phase, but if you work with it actively you will get quicker and better results.
|
||||
|
||||
Also notice that, since we came to the PM with the amazing brief and the prompt embedded in it, it only had about one question before producing the first draft. Amazing!!!
|
||||
|
||||
The PM did a great job of asking the right questions and producing the [Draft PRD](./draft-prd.txt) and each epic: [1](./epic1-draft.txt), [2](./epic2-draft.txt), [3](./epic3-draft.txt), [4](./epic4-draft.txt), [5](epic5-draft.txt).
|
||||
189
CURRENT-V2/usage-demo/draft-prd.txt
Normal file
@@ -0,0 +1,189 @@
|
||||
# BMad Hacker Daily Digest Product Requirements Document (PRD)
|
||||
|
||||
## Intro
|
||||
|
||||
The BMad Hacker Daily Digest is a command-line tool designed to address the time-consuming nature of reading extensive Hacker News (HN) comment threads. It aims to provide users with a time-efficient way to grasp the collective intelligence and key insights from discussions on top HN stories. The service will fetch the top 10 HN stories daily, retrieve a configurable number of comments for each, attempt to scrape the linked article, generate separate summaries for the article (if scraped) and the comment discussion using a local LLM, and deliver these summaries in a single daily email briefing triggered manually. This project also serves as a practical learning exercise in agent-driven development, TypeScript, Node.js, API integration, and local LLM usage, starting from the provided "bmad-boilerplate" template.
|
||||
|
||||
## Goals and Context
|
||||
|
||||
- **Project Objectives:**
|
||||
- Provide a quick, reliable, automated way to stay informed about key HN discussions without reading full threads.
|
||||
- Successfully fetch top 10 HN story metadata via Algolia HN API.
|
||||
- Retrieve a _configurable_ number of comments per story (default 50) via Algolia HN API.
|
||||
- Attempt basic scraping of linked article content, handling failures gracefully.
|
||||
- Generate distinct Article Summaries (if scraped) and Discussion Summaries using a local LLM (Ollama).
|
||||
- Assemble summaries for 10 stories into an HTML email and send via Nodemailer upon manual CLI trigger.
|
||||
- Serve as a learning platform for agent-driven development, TypeScript, Node.js v22, API integration, local LLMs, and configuration management, leveraging the "bmad-boilerplate" structure and tooling.
|
||||
- **Measurable Outcomes:**
|
||||
- The tool completes its full process (fetch, scrape attempt, summarize, email) without crashing on manual CLI trigger across multiple test runs.
|
||||
- The generated email digest consistently contains results for 10 stories, including correct links, discussion summaries, and article summaries where scraping was successful.
|
||||
- Errors during article scraping are logged, and the process continues for affected stories using only comment summaries, without halting the script.
|
||||
- **Success Criteria:**
|
||||
- Successful execution of the end-to-end process via CLI trigger for 3 consecutive test runs.
|
||||
- Generated email is successfully sent and received, containing summaries for all 10 fetched stories (article summary optional based on scraping success).
|
||||
- Scraping failures are logged appropriately without stopping the overall process.
|
||||
- **Key Performance Indicators (KPIs):**
|
||||
- Successful Runs / Total Runs (Target: 100% for MVP tests)
|
||||
- Stories with Article Summaries / Total Stories (Measures scraping effectiveness)
|
||||
- Stories with Discussion Summaries / Total Stories (Target: 100%)
|
||||
- Manual Qualitative Check: Relevance and coherence of summaries in the digest.
|
||||
|
||||
## Scope and Requirements (MVP / Current Version)
|
||||
|
||||
### Functional Requirements (High-Level)
|
||||
|
||||
- **HN Story Fetching:** Retrieve IDs and metadata (title, URL, HN link) for the top 10 stories from Algolia HN Search API.
|
||||
- **HN Comment Fetching:** For each story, retrieve comments from Algolia HN Search API up to a maximum count defined in a `.env` configuration variable (`MAX_COMMENTS_PER_STORY`, default 50).
|
||||
- **Article Content Scraping:** Attempt to fetch HTML and extract main text content from the story's external URL using basic methods (e.g., Node.js native fetch, optionally `article-extractor` or similar basic library).
|
||||
- **Scraping Failure Handling:** If scraping fails, log the error and proceed with generating only the Discussion Summary for that story.
|
||||
- **LLM Summarization:**
|
||||
- Generate an "Article Summary" from scraped text (if successful) using a configured local LLM (Ollama endpoint).
|
||||
- Generate a "Discussion Summary" from the fetched comments using the same LLM.
|
||||
- Initial Prompts (Placeholders - refine in Epics):
|
||||
- _Article Prompt:_ "Summarize the key points of the following article text: {Article Text}"
|
||||
- _Discussion Prompt:_ "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}"
|
||||
- **Digest Formatting:** Combine results for the 10 stories into a single HTML email. Each story entry should include: Story Title, HN Link, Article Link, Article Summary (if available), Discussion Summary.
|
||||
- **Email Dispatch:** Send the formatted HTML email using Nodemailer to a recipient list defined in `.env`. Use credentials also stored in `.env`.
|
||||
- **Main Execution Trigger:** Initiate the _entire implemented pipeline_ via a manual command-line interface (CLI) trigger, using the standard scripts defined in the boilerplate (`npm run dev`, `npm start` after build). Each functional epic should add its capability to this main execution flow.
|
||||
- **Configuration:** Manage external parameters (Algolia API details (if needed), LLM endpoint URL, `MAX_COMMENTS_PER_STORY`, Nodemailer credentials, recipient email list, output directory path) via a `.env` file, based on the provided `.env.example`.
|
||||
- **Incremental Logging & Data Persistence:**
|
||||
- Implement basic console logging for key steps and errors throughout the pipeline.
|
||||
- Persist intermediate data artifacts (fetched stories/comments, scraped text, generated summaries) to local files within a configurable, date-stamped directory structure (e.g., `./output/YYYY-MM-DD/`).
|
||||
- This persistence should be implemented incrementally within the relevant functional epics (Data Acquisition, Scraping, Summarization).
|
||||
- **Stage Testing Utilities:**
|
||||
- Provide separate utility scripts or CLI commands to allow testing individual pipeline stages in isolation (e.g., fetching HN data, scraping URLs, summarizing text, sending email).
|
||||
- These utilities should support using locally saved files as input (e.g., test scraping using a file containing story URLs, test summarization using a file containing text). This facilitates development and debugging.
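The placeholder prompts listed under LLM Summarization could be filled in along these lines. This is only a sketch: the prompt texts are copied from the requirement above, while `fillPrompt`, `formatComments`, and the "Comment N:" layout are hypothetical choices made for illustration.

```typescript
// Placeholder prompts as given in the requirements (to be refined in the epics).
const ARTICLE_PROMPT =
  "Summarize the key points of the following article text: {Article Text}";
const DISCUSSION_PROMPT =
  "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}";

// Replaces every occurrence of {placeholder} in the template with the content.
// split/join is used so "$" sequences in the content are not treated specially.
function fillPrompt(template: string, placeholder: string, content: string): string {
  return template.split(`{${placeholder}}`).join(content);
}

// Joins up to maxComments comment texts into one block for the discussion prompt.
// The numbering format is an assumption, not something the PRD prescribes.
function formatComments(comments: string[], maxComments: number): string {
  return comments
    .slice(0, maxComments)
    .map((text, i) => `Comment ${i + 1}: ${text}`)
    .join("\n\n");
}
```

For example, `fillPrompt(DISCUSSION_PROMPT, "Comment Texts", formatComments(comments, 50))` would produce the text sent to the LLM for one story, with 50 being the default `MAX_COMMENTS_PER_STORY`.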
|
||||
|
||||
### Non-Functional Requirements (NFRs)
|
||||
|
||||
- **Performance:** MVP focuses on functionality over speed. Should complete within a reasonable time (e.g., < 5 minutes) on a typical developer machine for local LLM use. No specific response time targets.
|
||||
- **Scalability:** Designed for single-user, local execution. No scaling requirements for MVP.
|
||||
- **Reliability/Availability:**
|
||||
- The script must handle article scraping failures gracefully (log and continue).
|
||||
- Basic error handling for API calls (e.g., log network errors).
|
||||
- Local LLM interaction may fail; basic error logging is sufficient for MVP.
|
||||
- No requirement for automated retries or production-grade error handling.
|
||||
- **Security:**
|
||||
- Email credentials must be stored securely via `.env` file and not committed to version control (as per boilerplate `.gitignore`).
|
||||
- No other specific security requirements for local MVP.
|
||||
- **Maintainability:**
|
||||
- Code should be well-structured TypeScript.
|
||||
- Adherence to the linting (ESLint) and formatting (Prettier) rules configured in the "bmad-boilerplate" is required. Use `npm run lint` and `npm run format`.
|
||||
- Modularity is desired to potentially swap LLM providers later and facilitate stage testing.
|
||||
- **Usability/Accessibility:** N/A (CLI tool for developer).
|
||||
- **Other Constraints:**
|
||||
- Must use TypeScript and Node.js v22.
|
||||
- Must run locally on the developer's machine.
|
||||
- Must use Node.js v22 native `fetch` API for HTTP requests.
|
||||
- Must use Algolia HN Search API for HN data.
|
||||
- Must use a local Ollama instance via a configurable HTTP endpoint.
|
||||
- Must use Nodemailer for email dispatch.
|
||||
- Must use `.env` for configuration based on `.env.example`.
|
||||
- Must use local file system for logging and intermediate data storage. Ensure output/log directories are gitignored.
|
||||
- Focus on a functional pipeline for learning/demonstration.
|
||||
|
||||
### User Experience (UX) Requirements (High-Level)
|
||||
|
||||
- The primary UX goal is to deliver a time-saving digest.
|
||||
- For the developer user, the main CLI interaction should be simple: using standard boilerplate scripts like `npm run dev` or `npm start` to trigger the full process.
|
||||
- Feedback during CLI execution (e.g., "Fetching stories...", "Summarizing story X/10...", "Sending email...") is desirable via console logging.
|
||||
- Separate CLI commands/scripts for testing individual stages should provide clear input/output mechanisms.
|
||||
|
||||
### Integration Requirements (High-Level)
|
||||
|
||||
- **Algolia HN Search API:** Fetching top stories and comments. Requires understanding API structure and query parameters.
|
||||
- **Ollama Service:** Sending text (article content, comments) and receiving summaries via its API endpoint. Endpoint URL must be configurable.
|
||||
- **SMTP Service (via Nodemailer):** Sending the final digest email. Requires valid SMTP credentials and recipient list configured in `.env`.
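For the Ollama integration listed above, a minimal sketch of the summary call might look like this. It assumes Ollama's `/api/generate` endpoint with `stream: false`, which returns the generated text in a `response` field; the parameter list is illustrative and differs from the two-argument `generateSummary` shown in the architecture draft, and the model name is assumed to come from configuration.

```typescript
// Minimal POST-capable fetch shape, so a stub can be injected in tests.
type PostJson = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Sends one prompt to a local Ollama instance and returns the summary text,
// or null on any failure so the caller can persist a null summary and move on.
async function generateSummary(
  fetchFn: PostJson,
  endpointUrl: string, // configurable, e.g. "http://localhost:11434"
  model: string, // assumption: model name supplied via configuration
  prompt: string,
): Promise<string | null> {
  try {
    const res = await fetchFn(`${endpointUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    });
    if (!res.ok) return null;
    const body = (await res.json()) as { response?: unknown };
    return typeof body.response === "string" ? body.response.trim() : null;
  } catch {
    // Network errors or invalid JSON: basic handling is enough for the MVP.
    return null;
  }
}
```

Keeping the endpoint URL and model as parameters is one way to leave room for swapping in a different LLM API later, as the modularity requirement asks.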
|
||||
|
||||
### Testing Requirements (High-Level)
|
||||
|
||||
- MVP success relies on manual end-to-end test runs confirming successful execution and valid email output.
|
||||
- Unit/integration tests are encouraged using the **Jest framework configured in the boilerplate**. Focus testing effort on the core pipeline components. Use `npm run test`.
|
||||
- **Stage-specific testing utilities (as defined in Functional Requirements) are required** to support development and verification of individual pipeline components.
|
||||
|
||||
## Epic Overview (MVP / Current Version)
|
||||
|
||||
_(Revised proposal)_
|
||||
|
||||
- **Epic 1: Project Initialization & Core Setup** - Goal: Initialize the project using "bmad-boilerplate", manage dependencies, setup `.env` and config loading, establish basic CLI entry point, setup basic logging and output directory structure.
|
||||
- **Epic 2: HN Data Acquisition & Persistence** - Goal: Implement fetching top 10 stories and their comments (respecting limits) from Algolia HN API, and persist this raw data locally. Implement stage testing utility for fetching.
|
||||
- **Epic 3: Article Scraping & Persistence** - Goal: Implement best-effort article scraping/extraction, handle failures gracefully, and persist scraped text locally. Implement stage testing utility for scraping.
|
||||
- **Epic 4: LLM Summarization & Persistence** - Goal: Integrate with Ollama to generate article/discussion summaries from persisted data and persist summaries locally. Implement stage testing utility for summarization.
|
||||
- **Epic 5: Digest Assembly & Email Dispatch** - Goal: Format collected summaries into an HTML email using persisted data and send it using Nodemailer. Implement stage testing utility for emailing (with dry-run option).
|
||||
|
||||
## Key Reference Documents
|
||||
|
||||
- `docs/project-brief.md`
|
||||
- `docs/prd.md` (This document)
|
||||
- `docs/architecture.md` (To be created by Architect)
|
||||
- `docs/epic1.md`, `docs/epic2.md`, ... (To be created)
|
||||
- `docs/tech-stack.md` (Partially defined by boilerplate, to be finalized by Architect)
|
||||
- `docs/api-reference.md` (If needed for Algolia/Ollama details)
|
||||
- `docs/testing-strategy.md` (Optional - low priority for MVP, Jest setup provided)
|
||||
|
||||
## Post-MVP / Future Enhancements
|
||||
|
||||
- Advanced scraping techniques (handling JavaScript, anti-bot measures).
|
||||
- Processing all comments (potentially using MapReduce summarization).
|
||||
- Automated scheduling (e.g., using cron).
|
||||
- Database integration for storing results or tracking.
|
||||
- Cloud deployment and web frontend.
|
||||
- User management (sign-ups, preferences).
|
||||
- Production-grade error handling, monitoring, and email deliverability.
|
||||
- Fine-tuning LLM prompts or models.
|
||||
- Sophisticated retry logic for API calls or scraping.
|
||||
- Cloud LLM integration.
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change                  | Date       | Version | Description                             | Author |
| ----------------------- | ---------- | ------- | --------------------------------------- | ------ |
| Refined Epics & Testing | 2025-05-04 | 0.3     | Removed Epic 6, added stage testing req | 2-pm   |
| Boilerplate Added       | 2025-05-04 | 0.2     | Updated to reflect use of boilerplate   | 2-pm   |
| Initial Draft           | 2025-05-04 | 0.1     | First draft based on brief              | 2-pm   |
|
||||
|
||||
## Initial Architect Prompt
|
||||
|
||||
### Technical Infrastructure
|
||||
|
||||
- **Starter Project/Template:** **Mandatory: Use the provided "bmad-boilerplate".** This includes TypeScript setup, Node.js v22 compatibility, Jest, ESLint, Prettier, `ts-node`, `.env` handling via `.env.example`, and standard scripts (`dev`, `build`, `test`, `lint`, `format`).
|
||||
- **Hosting/Cloud Provider:** Local machine execution only for MVP. No cloud deployment.
|
||||
- **Frontend Platform:** N/A (CLI tool).
|
||||
- **Backend Platform:** Node.js v22 with TypeScript (as provided by the boilerplate). No specific Node.js framework mandated, but structure should support modularity and align with boilerplate setup.
|
||||
- **Database Requirements:** None. Local file system for intermediate data storage and logging only. Structure TBD (e.g., `./output/YYYY-MM-DD/`). Ensure output directory is configurable via `.env` and gitignored.
|
||||
|
||||
### Technical Constraints
|
||||
|
||||
- Must adhere to the structure and tooling provided by "bmad-boilerplate".
|
||||
- Must use Node.js v22 native `fetch` for HTTP requests.
|
||||
- Must use the Algolia HN Search API for fetching HN data.
|
||||
- Must integrate with a local Ollama instance via a configurable HTTP endpoint. Design should allow potential swapping to other LLM APIs later.
|
||||
- Must use Nodemailer for sending email.
|
||||
- Configuration (LLM endpoint, email credentials, recipients, `MAX_COMMENTS_PER_STORY`, output dir path) must be managed via a `.env` file based on `.env.example`.
|
||||
- Article scraping must be basic, best-effort, and handle failures gracefully without stopping the main process.
|
||||
- Intermediate data must be persisted locally incrementally.
|
||||
- Code must adhere to the ESLint and Prettier configurations within the boilerplate.
|
||||
|
||||
### Deployment Considerations
|
||||
|
||||
- Execution is manual via CLI trigger only, using `npm run dev` or `npm start`.
|
||||
- No CI/CD required for MVP.
|
||||
- Single environment: local development machine.
|
||||
|
||||
### Local Development & Testing Requirements
|
||||
|
||||
- The entire application runs locally.
|
||||
- The main CLI command (`npm run dev`/`start`) should execute the _full implemented pipeline_.
|
||||
- **Separate utility scripts/commands MUST be provided** for testing individual pipeline stages (fetch, scrape, summarize, email) potentially using local file I/O. Architecture should facilitate creating these stage runners. (e.g., `npm run stage:fetch`, `npm run stage:scrape -- --inputFile <path>`, `npm run stage:summarize -- --inputFile <path>`, `npm run stage:email -- --inputFile <path> [--dry-run]`).
|
||||
- The boilerplate provides `npm run test` using Jest for running automated unit/integration tests.
|
||||
- The boilerplate provides `npm run lint` and `npm run format` for code quality checks.
|
||||
- Basic console logging is required. File logging can be considered by the architect.
|
||||
- Testability of individual modules (API clients, scraper, summarizer, emailer) is crucial and should leverage the Jest setup and stage testing utilities.
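The stage runner flags mentioned above (`--inputFile <path>`, `--dry-run`) could be parsed with Node's built-in `util.parseArgs`, as in this sketch. The `StageOptions` shape and `parseStageArgs` name are hypothetical; only the flag names come from the example commands.

```typescript
import { parseArgs } from "node:util";

// Hypothetical options shared by the stage testing scripts.
interface StageOptions {
  inputFile: string | undefined;
  dryRun: boolean;
}

// Parses the arguments that follow "--" in e.g.
// `npm run stage:email -- --inputFile <path> --dry-run`.
// Unknown flags cause parseArgs to throw, since strict mode is the default.
function parseStageArgs(argv: string[]): StageOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      inputFile: { type: "string" },
      "dry-run": { type: "boolean" },
    },
  });
  return {
    inputFile: values.inputFile,
    dryRun: values["dry-run"] === true,
  };
}
```

A stage script would typically call `parseStageArgs(process.argv.slice(2))` at startup and skip the actual send when `dryRun` is set.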
|
||||
|
||||
### Other Technical Considerations
|
||||
|
||||
- **Modularity:** Design components (HN client, scraper, LLM client, emailer) with clear interfaces to facilitate potential future modifications (e.g., changing LLM provider) and independent stage testing.
|
||||
- **Error Handling:** Focus on robust handling of scraping failures and basic handling of API/network errors. Implement within the boilerplate structure. Logging should clearly indicate errors.
|
||||
- **Resource Management:** Be mindful of local resources when interacting with the LLM, although optimization is not a primary MVP goal.
|
||||
- **Dependency Management:** Add necessary production dependencies (e.g., `nodemailer`, potentially `article-extractor`, libraries for date handling or file system operations if needed) to the boilerplate's `package.json`. Keep dependencies minimal.
|
||||
- **Configuration Loading:** Implement a robust way to load and validate settings from the `.env` file early in the application startup.
|
||||
90
CURRENT-V2/usage-demo/epic1-draft.txt
Normal file
@@ -0,0 +1,90 @@
|
||||
# Epic 1: Project Initialization & Core Setup
|
||||
|
||||
**Goal:** Initialize the project using the "bmad-boilerplate", manage dependencies, setup `.env` and config loading, establish basic CLI entry point, setup basic logging and output directory structure. This provides the foundational setup for all subsequent development work.
|
||||
|
||||
## Story List
|
||||
|
||||
### Story 1.1: Initialize Project from Boilerplate
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to set up the initial project structure using the `bmad-boilerplate`, so that I have the standard tooling (TS, Jest, ESLint, Prettier), configurations, and scripts in place.
|
||||
- **Detailed Requirements:**
|
||||
- Copy or clone the contents of the `bmad-boilerplate` into the new project's root directory.
|
||||
- Initialize a git repository in the project root directory (if not already done by cloning).
|
||||
- Ensure the `.gitignore` file from the boilerplate is present.
|
||||
- Run `npm install` to download and install all `devDependencies` specified in the boilerplate's `package.json`.
|
||||
- Verify that the core boilerplate scripts (`lint`, `format`, `test`, `build`) execute without errors on the initial codebase.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The project directory contains the files and structure from `bmad-boilerplate`.
|
||||
- AC2: A `node_modules` directory exists and contains packages corresponding to `devDependencies`.
|
||||
- AC3: `npm run lint` command completes successfully without reporting any linting errors.
|
||||
- AC4: `npm run format` command completes successfully, potentially making formatting changes according to Prettier rules. Running it a second time should result in no changes.
|
||||
- AC5: `npm run test` command executes Jest successfully (it may report "no tests found" which is acceptable at this stage).
|
||||
- AC6: `npm run build` command executes successfully, creating a `dist` directory containing compiled JavaScript output.
|
||||
- AC7: The `.gitignore` file exists and includes entries for `node_modules/`, `.env`, `dist/`, etc. as specified in the boilerplate.
|
||||
|
||||
---
|
||||
|
||||
### Story 1.2: Setup Environment Configuration
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to establish the environment configuration mechanism using `.env` files, so that secrets and settings (like output paths) can be managed outside of version control, following boilerplate conventions.
|
||||
- **Detailed Requirements:**
|
||||
- Add a production dependency for loading `.env` files (e.g., `dotenv`). Run `npm install dotenv --save-prod` (or the equivalent command for a similar library).
|
||||
- Verify the `.env.example` file exists (from boilerplate).
|
||||
- Add an initial configuration variable `OUTPUT_DIR_PATH=./output` to `.env.example`.
|
||||
- Create the `.env` file locally by copying `.env.example`. Populate `OUTPUT_DIR_PATH` if needed (can keep default).
|
||||
- Implement a utility module (e.g., `src/config.ts`) that loads environment variables from the `.env` file at application startup.
|
||||
- The utility should export the loaded configuration values (initially just `OUTPUT_DIR_PATH`).
|
||||
- Ensure the `.env` file is listed in `.gitignore` and is not committed.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The chosen `.env` library (e.g., `dotenv`) is listed under `dependencies` in `package.json` and `package-lock.json` is updated.
|
||||
- AC2: The `.env.example` file exists, is tracked by git, and contains the line `OUTPUT_DIR_PATH=./output`.
|
||||
- AC3: The `.env` file exists locally but is NOT tracked by git.
|
||||
- AC4: A configuration module (`src/config.ts` or similar) exists and successfully loads the `OUTPUT_DIR_PATH` value from `.env` when the application starts.
|
||||
- AC5: The loaded `OUTPUT_DIR_PATH` value is accessible within the application code.
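
A minimal sketch of the configuration module from Story 1.2, under stated assumptions: the names `AppConfig` and `loadConfig` are illustrative and not part of the boilerplate, and the loader takes an environment-like object (in practice `process.env`, after `dotenv` has populated it) so it can be exercised without a real `.env` file.

```typescript
// src/config.ts (hypothetical sketch, not the boilerplate's actual module)
// In the real module, dotenv's config() would run first so that values
// from `.env` are present in process.env before loadConfig is called.

export interface AppConfig {
  outputDirPath: string;
}

type Env = Record<string, string | undefined>;

// Reads and validates settings from an environment-like object.
// Falls back to "./output", the default listed in `.env.example`.
export function loadConfig(env: Env): AppConfig {
  const raw = env["OUTPUT_DIR_PATH"];
  const outputDirPath =
    raw !== undefined && raw.trim() !== "" ? raw.trim() : "./output";
  return { outputDirPath };
}
```

In `src/index.ts` this could be called once at startup as `loadConfig(process.env)`, with the result passed to later stages.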
|
||||
|
||||
---
|
||||
|
||||
### Story 1.3: Implement Basic CLI Entry Point & Execution
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a basic `src/index.ts` entry point that can be executed via the boilerplate's `dev` and `start` scripts, providing a working foundation for the application logic.
|
||||
- **Detailed Requirements:**
|
||||
- Create the main application entry point file at `src/index.ts`.
|
||||
- Implement minimal code within `src/index.ts` to:
|
||||
- Import the configuration loading mechanism (from Story 1.2).
|
||||
- Log a simple startup message to the console (e.g., "BMad Hacker Daily Digest - Starting Up...").
|
||||
- (Optional) Log the loaded `OUTPUT_DIR_PATH` to verify config loading.
|
||||
- Confirm execution using boilerplate scripts.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The `src/index.ts` file exists.
|
||||
- AC2: Running `npm run dev` executes `src/index.ts` via `ts-node` and logs the startup message to the console.
|
||||
- AC3: Running `npm run build` successfully compiles `src/index.ts` (and any imports) into the `dist` directory.
|
||||
- AC4: Running `npm start` (after a successful build) executes the compiled code from `dist` and logs the startup message to the console.
|
||||
|
||||
---
|
||||
|
||||
### Story 1.4: Setup Basic Logging and Output Directory
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a basic console logging mechanism and the dynamic creation of a date-stamped output directory, so that the application can provide execution feedback and prepare for storing data artifacts in subsequent epics.
|
||||
- **Detailed Requirements:**
|
||||
- Implement a simple, reusable logging utility module (e.g., `src/logger.ts`). Initially, it can wrap `console.log`, `console.warn`, `console.error`.
|
||||
- Refactor `src/index.ts` to use this `logger` for its startup message(s).
|
||||
- In `src/index.ts` (or a setup function called by it):
|
||||
- Retrieve the `OUTPUT_DIR_PATH` from the configuration (loaded in Story 1.2).
|
||||
- Determine the current date in 'YYYY-MM-DD' format.
|
||||
- Construct the full path for the date-stamped subdirectory (e.g., `${OUTPUT_DIR_PATH}/YYYY-MM-DD`).
|
||||
- Check if the base output directory exists; if not, create it.
|
||||
- Check if the date-stamped subdirectory exists; if not, create it recursively. Use Node.js `fs` module (e.g., `fs.mkdirSync(path, { recursive: true })`).
|
||||
- Log (using the logger) the full path of the output directory being used for the current run (e.g., "Output directory for this run: ./output/2025-05-04").
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: A logger utility module (`src/logger.ts` or similar) exists and is used for console output in `src/index.ts`.
|
||||
- AC2: Running `npm run dev` or `npm start` logs the startup message via the logger.
|
||||
- AC3: Running the application creates the base output directory (e.g., `./output` defined in `.env`) if it doesn't already exist.
|
||||
- AC4: Running the application creates a date-stamped subdirectory (e.g., `./output/2025-05-04`) within the base output directory if it doesn't already exist.
|
||||
- AC5: The application logs a message indicating the full path to the date-stamped output directory created/used for the current execution.
|
||||
- AC6: The application exits gracefully after performing these setup steps (for now).
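
The setup steps in Story 1.4 can be sketched as follows. This is an illustration only: `formatDateStamp` and `ensureRunDirectory` are hypothetical names, and using the machine's local time zone for the date stamp is an assumption the story does not settle.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Thin wrapper over console, as Story 1.4 suggests for the first iteration.
export const logger = {
  info: (msg: string): void => console.log(`[INFO] ${msg}`),
  warn: (msg: string): void => console.warn(`[WARN] ${msg}`),
  error: (msg: string): void => console.error(`[ERROR] ${msg}`),
};

// Formats a date as YYYY-MM-DD using local time (an assumption).
export function formatDateStamp(date: Date): string {
  const yyyy = String(date.getFullYear());
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  return `${yyyy}-${mm}-${dd}`;
}

// Creates <baseDir>/<YYYY-MM-DD> (and baseDir itself) if missing.
export function ensureRunDirectory(baseDir: string, date: Date = new Date()): string {
  const runDir = path.join(baseDir, formatDateStamp(date));
  fs.mkdirSync(runDir, { recursive: true }); // no error if it already exists
  logger.info(`Output directory for this run: ${runDir}`);
  return runDir;
}
```

Because `recursive: true` creates missing parents and tolerates existing directories, the separate existence checks for the base and date-stamped directories collapse into a single call.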
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change | Date | Version | Description | Author |
|
||||
| ------------- | ---------- | ------- | ------------------------- | -------------- |
|
||||
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 1 | 2-pm |
|
||||
99
CURRENT-V2/usage-demo/epic2-draft.txt
Normal file
@@ -0,0 +1,99 @@
|
||||
# Epic 2: HN Data Acquisition & Persistence
|
||||
|
||||
**Goal:** Implement fetching top 10 stories and their comments (respecting limits) from Algolia HN API, and persist this raw data locally into the date-stamped output directory created in Epic 1. Implement a stage testing utility for fetching.
|
||||
|
||||
## Story List
|
||||
|
||||
### Story 2.1: Implement Algolia HN API Client
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a dedicated client module to interact with the Algolia Hacker News Search API, so that fetching stories and comments is encapsulated, reusable, and uses the required native `fetch` API.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new module: `src/clients/algoliaHNClient.ts`.
|
||||
- Implement an async function `fetchTopStories` within the client:
|
||||
- Use native `fetch` to call the Algolia HN Search API endpoint for front-page stories (e.g., `http://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=10`). Adjust `hitsPerPage` if needed to ensure 10 stories.
|
||||
- Parse the JSON response.
|
||||
- Extract required metadata for each story: `objectID` (use as `storyId`), `title`, `url` (article URL), `points`, `num_comments`. Handle potential missing `url` field gracefully (log warning, maybe skip story later if URL needed).
|
||||
- Construct the `hnUrl` for each story (e.g., `https://news.ycombinator.com/item?id={storyId}`).
|
||||
- Return an array of structured story objects.
|
||||
- Implement a separate async function `fetchCommentsForStory` within the client:
|
||||
- Accept `storyId` and `maxComments` limit as arguments.
|
||||
- Use native `fetch` to call the Algolia HN Search API endpoint for comments of a specific story (e.g., `http://hn.algolia.com/api/v1/search?tags=comment,story_{storyId}&hitsPerPage={maxComments}`).
|
||||
- Parse the JSON response.
|
||||
- Extract required comment data: `objectID` (use as `commentId`), `comment_text`, `author`, `created_at`.
|
||||
- Filter out comments where `comment_text` is null or empty. Ensure only up to `maxComments` are returned.
|
||||
- Return an array of structured comment objects.
|
||||
- Implement basic error handling using `try...catch` around `fetch` calls and check the `response.ok` status. Log errors using the logger utility from Epic 1.
|
||||
- Define TypeScript interfaces/types for the expected structures of API responses (stories, comments) and the data returned by the client functions (e.g., `Story`, `Comment`).
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The module `src/clients/algoliaHNClient.ts` exists and exports `fetchTopStories` and `fetchCommentsForStory` functions.
|
||||
- AC2: Calling `fetchTopStories` makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of 10 `Story` objects containing the specified metadata.
|
||||
- AC3: Calling `fetchCommentsForStory` with a valid `storyId` and `maxComments` limit makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of `Comment` objects (up to `maxComments`), filtering out empty ones.
|
||||
- AC4: Both functions use the native `fetch` API internally.
|
||||
- AC5: Network errors or non-successful API responses (e.g., status 4xx, 5xx) are caught and logged using the logger.
|
||||
- AC6: Relevant TypeScript types (`Story`, `Comment`, etc.) are defined and used within the client module.
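
A hedged sketch of the story-fetching half of this client. The field names (`objectID`, `title`, `url`, `points`, `num_comments`) come from the requirements above, and the response is assumed to carry its results in a `hits` array. The `FetchLike` type, the `toStory` helper, passing `fetch` in as a parameter, and returning an empty array on failure are assumptions made here so the logic can be tested with a stub; `console.error` stands in for the Epic 1 logger.

```typescript
// Shape of one Algolia hit, limited to the fields Story 2.1 names.
interface AlgoliaStoryHit {
  objectID: string;
  title: string;
  url?: string | null;
  points: number;
  num_comments: number;
}

export interface Story {
  storyId: string;
  title: string;
  url: string | null;
  hnUrl: string;
  points: number;
  numComments: number;
}

// Minimal structural type so a stub can replace the native fetch in tests.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

const FRONT_PAGE_URL =
  "http://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=10";

// Maps a raw hit to the client's Story shape; a missing url becomes null.
export function toStory(hit: AlgoliaStoryHit): Story {
  return {
    storyId: hit.objectID,
    title: hit.title,
    url: hit.url ?? null,
    hnUrl: `https://news.ycombinator.com/item?id=${hit.objectID}`,
    points: hit.points,
    numComments: hit.num_comments,
  };
}

// Returns up to 10 stories, or an empty array on any failure (an assumption).
export async function fetchTopStories(fetchFn: FetchLike): Promise<Story[]> {
  try {
    const response = await fetchFn(FRONT_PAGE_URL);
    if (!response.ok) {
      console.error(`Algolia request failed with status ${response.status}`);
      return [];
    }
    const body = (await response.json()) as { hits?: AlgoliaStoryHit[] };
    return (body.hits ?? []).slice(0, 10).map(toStory);
  } catch (err) {
    console.error("Algolia request threw:", err);
    return [];
  }
}
```

In the application this would be called as `fetchTopStories(fetch)`; a comment-fetching function could follow the same pattern with the `tags=comment,story_{storyId}` query.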
|
||||
|
||||
---
|
||||
|
||||
### Story 2.2: Integrate HN Data Fetching into Main Workflow
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to integrate the HN data fetching logic into the main application workflow (`src/index.ts`), so that running the app retrieves the top 10 stories and their comments after completing the setup from Epic 1.
|
||||
- **Detailed Requirements:**
|
||||
- Modify the main execution flow in `src/index.ts` (or a main async function called by it).
|
||||
- Import the `algoliaHNClient` functions.
|
||||
- Import the configuration module to access `MAX_COMMENTS_PER_STORY`.
|
||||
- After the Epic 1 setup (config load, logger init, output dir creation), call `fetchTopStories()`.
|
||||
- Log the number of stories fetched.
|
||||
- Iterate through the array of fetched `Story` objects.
|
||||
- For each `Story`, call `fetchCommentsForStory()`, passing the `story.storyId` and the configured `MAX_COMMENTS_PER_STORY`.
|
||||
- Store the fetched comments within the corresponding `Story` object in memory (e.g., add a `comments: Comment[]` property to the `Story` object).
|
||||
- Log progress using the logger utility (e.g., "Fetched 10 stories.", "Fetching up to X comments for story {storyId}...").
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: Running `npm run dev` executes Epic 1 setup steps followed by fetching stories and then comments for each story.
|
||||
- AC2: Logs clearly show the start and successful completion of fetching stories, and the start of fetching comments for each of the 10 stories.
|
||||
- AC3: The configured `MAX_COMMENTS_PER_STORY` value is read from config and used in the calls to `fetchCommentsForStory`.
|
||||
- AC4: After successful execution, story objects held in memory contain a nested array of fetched comment objects. (Can be verified via debugger or temporary logging).
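
The integration loop in Story 2.2 might look roughly like this. The `HnClient` interface and the `fetchStoriesWithComments` name are hypothetical; the client is passed in as a parameter (an assumption made here) so that a stub can replace the real Algolia client during testing.

```typescript
export interface Comment {
  commentId: string;
  text: string;
  author: string;
  createdAt: string;
}

export interface Story {
  storyId: string;
  title: string;
  comments?: Comment[];
}

// Hypothetical interface over the two client functions from Story 2.1.
export interface HnClient {
  fetchTopStories(): Promise<Story[]>;
  fetchCommentsForStory(storyId: string, maxComments: number): Promise<Comment[]>;
}

// Fetches stories, then attaches comments to each story one at a time.
export async function fetchStoriesWithComments(
  client: HnClient,
  maxComments: number,
  log: (message: string) => void = console.log,
): Promise<Story[]> {
  const stories = await client.fetchTopStories();
  log(`Fetched ${stories.length} stories.`);
  for (const story of stories) {
    log(`Fetching up to ${maxComments} comments for story ${story.storyId}...`);
    story.comments = await client.fetchCommentsForStory(story.storyId, maxComments);
  }
  return stories;
}
```

Fetching comments sequentially keeps the log output ordered and avoids bursts of parallel requests; with only 10 stories the extra latency is small.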
|
||||
|
||||
---
|
||||
|
||||
### Story 2.3: Persist Fetched HN Data Locally
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to save the fetched HN stories (including their comments) to JSON files in the date-stamped output directory, so that the raw data is persisted locally for subsequent pipeline stages and debugging.
|
||||
- **Detailed Requirements:**
|
||||
- Define a consistent JSON structure for the output file content. Example: `{ storyId: "...", title: "...", url: "...", hnUrl: "...", points: ..., fetchedAt: "ISO_TIMESTAMP", comments: [{ commentId: "...", text: "...", author: "...", createdAt: "ISO_TIMESTAMP", ... }, ...] }`. Include a timestamp for when the data was fetched.
|
||||
- Import Node.js `fs` (specifically `fs.writeFileSync`) and `path` modules.
|
||||
- In the main workflow (`src/index.ts`), within the loop iterating through stories (after comments have been fetched and added to the story object in Story 2.2):
|
||||
- Get the full path to the date-stamped output directory (determined in Epic 1).
|
||||
- Construct the filename for the story's data: `{storyId}_data.json`.
|
||||
- Construct the full file path using `path.join()`.
|
||||
- Serialize the complete story object (including comments and fetch timestamp) to a JSON string using `JSON.stringify(storyObject, null, 2)` for readability.
|
||||
- Write the JSON string to the file using `fs.writeFileSync()`. Use a `try...catch` block for error handling.
|
||||
- Log (using the logger) the successful persistence of each story's data file or any errors encountered during file writing.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: After running `npm run dev`, the date-stamped output directory (e.g., `./output/YYYY-MM-DD/`) contains exactly 10 files named `{storyId}_data.json`.
|
||||
- AC2: Each JSON file contains valid JSON representing a single story object, including its metadata, fetch timestamp, and an array of its fetched comments, matching the defined structure.
|
||||
- AC3: The number of comments in each file's `comments` array does not exceed `MAX_COMMENTS_PER_STORY`.
|
||||
- AC4: Logs indicate that saving data to a file was attempted for each story, reporting success or specific file writing errors.
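
As an illustration of the persistence step, the sketch below writes one story per file using the JSON shape given in the requirements. The `PersistedStory` type and `persistStory` function are hypothetical names, the boolean return value is an assumption, and `console` stands in for the Epic 1 logger.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface PersistedStory {
  storyId: string;
  title: string;
  url: string | null;
  hnUrl: string;
  points: number;
  fetchedAt: string; // ISO timestamp of the fetch
  comments: Array<{
    commentId: string;
    text: string;
    author: string;
    createdAt: string;
  }>;
}

// Writes one story to <runDir>/<storyId>_data.json; returns true on success.
export function persistStory(runDir: string, story: PersistedStory): boolean {
  const filePath = path.join(runDir, `${story.storyId}_data.json`);
  try {
    fs.writeFileSync(filePath, JSON.stringify(story, null, 2), "utf-8");
    console.log(`Saved story data to ${filePath}`);
    return true;
  } catch (err) {
    console.error(`Failed to write ${filePath}:`, err);
    return false;
  }
}
```

Returning a boolean rather than throwing lets the main loop continue with the remaining stories when a single write fails.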
|
||||
|
||||
---
|
||||
|
||||
### Story 2.4: Implement Stage Testing Utility for HN Fetching
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a separate, executable script that *only* performs the HN data fetching and persistence, so I can test and trigger this stage independently of the full pipeline.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new standalone script file: `src/stages/fetch_hn_data.ts`.
|
||||
- This script should perform the essential setup required for this stage: initialize logger, load configuration (`.env`), determine and create output directory (reuse or replicate logic from Epic 1 / `src/index.ts`).
|
||||
- The script should then execute the core logic of fetching stories via `algoliaHNClient.fetchTopStories`, fetching comments via `algoliaHNClient.fetchCommentsForStory` (using loaded config for limit), and persisting the results to JSON files using `fs.writeFileSync` (replicating logic from Story 2.3).
|
||||
- The script should log its progress using the logger utility.
|
||||
- Add a new script command to `package.json` under `"scripts"`: `"stage:fetch": "ts-node src/stages/fetch_hn_data.ts"`.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The file `src/stages/fetch_hn_data.ts` exists.
|
||||
- AC2: The script `stage:fetch` is defined in `package.json`'s `scripts` section.
|
||||
- AC3: Running `npm run stage:fetch` executes successfully, performing only the setup, fetch, and persist steps.
|
||||
- AC4: Running `npm run stage:fetch` creates the same 10 `{storyId}_data.json` files in the correct date-stamped output directory as running the main `npm run dev` command (at the current state of development).
|
||||
- AC5: Logs generated by `npm run stage:fetch` reflect only the fetching and persisting steps, not subsequent pipeline stages.
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change | Date | Version | Description | Author |
|
||||
| ------------- | ---------- | ------- | ------------------------- | -------------- |
|
||||
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 2 | 2-pm |
|
||||
111
CURRENT-V2/usage-demo/epic3-draft.txt
Normal file
@@ -0,0 +1,111 @@
|
||||
# Epic 3: Article Scraping & Persistence
|
||||
|
||||
**Goal:** Implement a best-effort article scraping mechanism to fetch and extract plain text content from the external URLs associated with fetched HN stories. Handle failures gracefully and persist successfully scraped text locally. Implement a stage testing utility for scraping.
|
||||
|
||||
## Story List
|
||||
|
||||
### Story 3.1: Implement Basic Article Scraper Module
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a module that attempts to fetch HTML from a URL and extract the main article text using basic methods, handling common failures gracefully, so article content can be prepared for summarization.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new module: `src/scraper/articleScraper.ts`.
|
||||
- Add a suitable HTML parsing/extraction library dependency (e.g., `@extractus/article-extractor` recommended for simplicity, or `cheerio` for more control). Run `npm install @extractus/article-extractor --save-prod` (or chosen alternative).
|
||||
- Implement an async function `scrapeArticle(url: string): Promise<string | null>` within the module.
|
||||
- Inside the function:
|
||||
- Use native `fetch` to retrieve content from the `url`. Set a reasonable timeout (e.g., 10-15 seconds). Include a `User-Agent` header to mimic a browser.
|
||||
- Handle potential `fetch` errors (network errors, timeouts) using `try...catch`.
|
||||
- Check the `response.ok` status. If not okay, log error and return `null`.
|
||||
- Check the `Content-Type` header of the response. If it doesn't indicate HTML (e.g., does not include `text/html`), log warning and return `null`.
|
||||
- If HTML is received, attempt to extract the main article text using the chosen library (`@extractus/article-extractor` preferred).
|
||||
- Wrap the extraction logic in a `try...catch` to handle library-specific errors.
|
||||
- Return the extracted plain text string if successful. Ensure it's just text, not HTML markup.
|
||||
- Return `null` if extraction fails or results in empty content.
|
||||
- Log all significant events, errors, or reasons for returning null (e.g., "Scraping URL...", "Fetch failed:", "Non-HTML content type:", "Extraction failed:", "Successfully extracted text") using the logger utility.
|
||||
- Define TypeScript types/interfaces as needed.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The `articleScraper.ts` module exists and exports the `scrapeArticle` function.
|
||||
- AC2: The chosen scraping library (e.g., `@extractus/article-extractor`) is added to `dependencies` in `package.json`.
|
||||
- AC3: `scrapeArticle` uses native `fetch` with a timeout and User-Agent header.
|
||||
- AC4: `scrapeArticle` correctly handles fetch errors, non-OK responses, and non-HTML content types by logging and returning `null`.
|
||||
- AC5: `scrapeArticle` uses the chosen library to attempt text extraction from valid HTML content.
|
||||
- AC6: `scrapeArticle` returns the extracted plain text on success, and `null` on any failure (fetch, non-HTML, extraction error, empty result).
|
||||
- AC7: Relevant logs are produced for success, failure modes, and errors encountered during the process.
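
A sketch of the scraper's control flow under stated assumptions. The story specifies `scrapeArticle(url)`; the two extra parameters here are an illustrative choice so the function can run against stubs. `FetchLike`, `ExtractText`, and `isHtmlContentType` are hypothetical names, and `extractText` stands in for whichever extraction library is chosen (it is expected to return plain text, not HTML). The User-Agent string is a placeholder.

```typescript
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}>;

// Hypothetical hook wrapping the chosen extraction library.
type ExtractText = (html: string, url: string) => Promise<string | null>;

export function isHtmlContentType(contentType: string | null): boolean {
  return contentType !== null && contentType.toLowerCase().includes("text/html");
}

export async function scrapeArticle(
  url: string,
  fetchFn: FetchLike,
  extractText: ExtractText,
  timeoutMs = 15000,
): Promise<string | null> {
  // Abort the request if it has not completed within timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchFn(url, {
      headers: { "User-Agent": "Mozilla/5.0 (compatible; BMadDigest/0.1)" },
      signal: controller.signal,
    });
    if (!response.ok) {
      console.error(`Fetch failed: ${url} returned status ${response.status}`);
      return null;
    }
    if (!isHtmlContentType(response.headers.get("content-type"))) {
      console.warn(`Non-HTML content type for ${url}`);
      return null;
    }
    const text = await extractText(await response.text(), url);
    if (text === null || text.trim() === "") {
      console.warn(`Extraction failed or returned no text for ${url}`);
      return null;
    }
    return text.trim();
  } catch (err) {
    // Covers network errors, timeouts (abort), and extractor exceptions.
    console.error(`Scraping failed for ${url}:`, err);
    return null;
  } finally {
    clearTimeout(timer);
  }
}
```

Every failure path returns `null`, which matches AC6 and lets callers treat "no article text" uniformly regardless of the cause.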
|
||||
|
||||
---
|
||||
|
||||
### Story 3.2: Integrate Article Scraping into Main Workflow
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to integrate the article scraper into the main workflow (`src/index.ts`), attempting to scrape the article for each HN story that has a valid URL, after fetching its data.
|
||||
- **Detailed Requirements:**
|
||||
- Modify the main execution flow in `src/index.ts`.
|
||||
- Import the `scrapeArticle` function from `src/scraper/articleScraper.ts`.
|
||||
- Within the main loop iterating through the fetched stories (after comments are fetched in Epic 2):
|
||||
- Check if `story.url` exists and appears to be a valid HTTP/HTTPS URL. A simple check for starting with `http://` or `https://` is sufficient.
|
||||
- If the URL is missing or invalid, log a warning ("Skipping scraping for story {storyId}: Missing or invalid URL") and proceed to the next story's processing step.
|
||||
- If a valid URL exists, log ("Attempting to scrape article for story {storyId} from {story.url}").
|
||||
- Call `await scrapeArticle(story.url)`.
|
||||
- Store the result (the extracted text string or `null`) in memory, associated with the story object (e.g., add property `articleContent: string | null`).
|
||||
- Log the outcome clearly (e.g., "Successfully scraped article for story {storyId}", "Failed to scrape article for story {storyId}").
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: Running `npm run dev` executes Epic 1 & 2 steps, and then attempts article scraping for stories with valid URLs.
|
||||
- AC2: Stories with missing or invalid URLs are skipped, and a corresponding log message is generated.
|
||||
- AC3: For stories with valid URLs, the `scrapeArticle` function is called.
|
||||
- AC4: Logs clearly indicate the start and success/failure outcome of the scraping attempt for each relevant story.
|
||||
- AC5: Story objects held in memory after this stage contain an `articleContent` property holding the scraped text (string) or `null` if scraping was skipped or failed.
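
The "simple check" for a scrapable URL mentioned in the requirements could be as small as the following. `hasScrapableUrl` is a hypothetical name; the prefix test mirrors the wording of the story and is deliberately not a full URL validation.

```typescript
// True only for strings that start with http:// or https://.
// Written as a type guard so callers get `string` after the check.
export function hasScrapableUrl(url: string | null | undefined): url is string {
  return (
    typeof url === "string" &&
    (url.startsWith("http://") || url.startsWith("https://"))
  );
}
```

Stories for which this returns `false` would be logged and skipped, leaving `articleContent` as `null`.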
|
||||
|
||||
---
|
||||
|
||||
### Story 3.3: Persist Scraped Article Text Locally
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to save successfully scraped article text to a separate local file for each story, so that the text content is available as input for the summarization stage.
|
||||
- **Detailed Requirements:**
|
||||
- Import Node.js `fs` and `path` modules if not already present in `src/index.ts`.
|
||||
- In the main workflow (`src/index.ts`), immediately after a successful call to `scrapeArticle` for a story (where the result is a non-null string):
|
||||
- Retrieve the full path to the current date-stamped output directory.
|
||||
- Construct the filename: `{storyId}_article.txt`.
|
||||
- Construct the full file path using `path.join()`.
|
||||
- Get the successfully scraped article text string (`articleContent`).
|
||||
- Use `fs.writeFileSync(fullPath, articleContent, 'utf-8')` to save the text to the file. Wrap in `try...catch` for file system errors.
|
||||
- Log the successful saving of the file (e.g., "Saved scraped article text to {filename}") or any file writing errors encountered.
|
||||
- Ensure *no* `_article.txt` file is created if `scrapeArticle` returned `null` (due to skipping or failure).
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: After running `npm run dev`, the date-stamped output directory contains `_article.txt` files *only* for those stories where `scrapeArticle` succeeded and returned text content.
|
||||
- AC2: The name of each article text file is `{storyId}_article.txt`.
|
||||
- AC3: The content of each `_article.txt` file is the plain text string returned by `scrapeArticle`.
|
||||
- AC4: Logs confirm the successful writing of each `_article.txt` file or report specific file writing errors.
|
||||
- AC5: No empty `_article.txt` files are created. Files only exist if scraping was successful.
|
||||
|
||||
---
|
||||
|
||||
### Story 3.4: Implement Stage Testing Utility for Scraping
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a separate script/command to test the article scraping logic using HN story data from local files, allowing independent testing and debugging of the scraper.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new standalone script file: `src/stages/scrape_articles.ts`.
|
||||
- Import necessary modules: `fs`, `path`, `logger`, `config`, `scrapeArticle`.
|
||||
- The script should:
|
||||
- Initialize the logger.
|
||||
- Load configuration (to get `OUTPUT_DIR_PATH`).
|
||||
- Determine the target date-stamped directory path (e.g., `${OUTPUT_DIR_PATH}/YYYY-MM-DD`, using the current date or potentially an optional CLI argument). Ensure this directory exists.
|
||||
- Read the directory contents and identify all `{storyId}_data.json` files.
|
||||
- For each `_data.json` file found:
|
||||
- Read and parse the JSON content.
|
||||
- Extract the `storyId` and `url`.
|
||||
- If a valid `url` exists, call `await scrapeArticle(url)`.
|
||||
- If scraping succeeds (returns text), save the text to `{storyId}_article.txt` in the same directory (using logic from Story 3.3). Overwrite if the file exists.
|
||||
- Log the progress and outcome (skip/success/fail) for each story processed.
|
||||
- Add a new script command to `package.json`: `"stage:scrape": "ts-node src/stages/scrape_articles.ts"`. Consider adding argument parsing later if needed to specify a date/directory.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The file `src/stages/scrape_articles.ts` exists.
|
||||
- AC2: The script `stage:scrape` is defined in `package.json`.
|
||||
- AC3: Running `npm run stage:scrape` (assuming a directory with `_data.json` files exists from a previous `stage:fetch` run) reads these files.
|
||||
- AC4: The script calls `scrapeArticle` for stories with valid URLs found in the JSON files.
|
||||
- AC5: The script creates/updates `{storyId}_article.txt` files in the target directory corresponding to successfully scraped articles.
|
||||
- AC6: The script logs its actions (reading files, attempting scraping, saving results) for each story ID processed.
|
||||
- AC7: The script operates solely based on local `_data.json` files and fetching from external article URLs; it does not call the Algolia HN API.
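
The file-discovery part of this stage runner can be sketched as below. `StoryRef` and `readStoryRefs` are hypothetical names, and skipping unreadable or malformed files with a log message (rather than aborting the run) is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface StoryRef {
  storyId: string;
  url: string | null;
}

// Reads every *_data.json file in runDir and returns the storyId/url pairs.
export function readStoryRefs(runDir: string): StoryRef[] {
  const refs: StoryRef[] = [];
  for (const name of fs.readdirSync(runDir)) {
    if (!name.endsWith("_data.json")) continue;
    const filePath = path.join(runDir, name);
    try {
      const parsed = JSON.parse(fs.readFileSync(filePath, "utf-8")) as {
        storyId?: unknown;
        url?: unknown;
      };
      if (typeof parsed.storyId !== "string") {
        console.warn(`Skipping ${filePath}: no storyId found`);
        continue;
      }
      refs.push({
        storyId: parsed.storyId,
        url: typeof parsed.url === "string" ? parsed.url : null,
      });
    } catch (err) {
      console.error(`Could not read ${filePath}:`, err);
    }
  }
  return refs;
}
```

The stage script would then loop over the returned references, call `scrapeArticle` for those with a valid URL, and write `{storyId}_article.txt` for each success.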
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change | Date | Version | Description | Author |
|
||||
| ------------- | ---------- | ------- | ------------------------- | -------------- |
|
||||
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 3 | 2-pm |
|
||||
138
CURRENT-V2/usage-demo/epic4-draft.txt
Normal file
@@ -0,0 +1,138 @@
|
||||
# Epic 4: LLM Summarization & Persistence
|
||||
|
||||
**Goal:** Integrate with the configured local Ollama instance to generate summaries for successfully scraped article text and fetched comments. Persist these summaries locally. Implement a stage testing utility for summarization.
|
||||
|
||||
## Story List
|
||||
|
||||
### Story 4.1: Implement Ollama Client Module
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a client module to interact with the configured Ollama API endpoint via HTTP, handling requests and responses for text generation, so that summaries can be generated programmatically.
|
||||
- **Detailed Requirements:**
|
||||
- **Prerequisite:** Ensure a local Ollama instance is installed and running, accessible via the URL defined in `.env` (`OLLAMA_ENDPOINT_URL`), and that the model specified in `.env` (`OLLAMA_MODEL`) has been downloaded (e.g., via `ollama pull model_name`). Instructions for this setup should be in the project README.
|
||||
- Create a new module: `src/clients/ollamaClient.ts`.
|
||||
- Implement an async function `generateSummary(prompt: string, content: string): Promise<string | null>`.
|
||||
- Add configuration variables `OLLAMA_ENDPOINT_URL` (e.g., `http://localhost:11434`) and `OLLAMA_MODEL` (e.g., `llama3`) to `.env.example`. Ensure they are loaded via the config module (`src/config.ts`). Update local `.env` with actual values.
|
||||
- Inside `generateSummary`:
|
||||
- Construct the full prompt string (e.g., `${prompt}\n\n${content}`).
|
||||
- Construct the Ollama API request payload (JSON): `{ model: configured_model, prompt: full_prompt, stream: false }`. Refer to Ollama `/api/generate` documentation.
|
||||
- Use native `fetch` to send a POST request to the configured Ollama endpoint + `/api/generate`. Set appropriate headers (`Content-Type: application/json`). Set a reasonable timeout (e.g., 1-2 minutes, as LLM generation can be slow).
|
||||
- Handle `fetch` errors (network, timeout) using `try...catch`.
|
||||
- Check `response.ok`. If not OK, log the status/error and return `null`.
|
||||
- Parse the JSON response from Ollama. Extract the generated text (typically in the `response` field).
|
||||
- Check for potential errors within the Ollama response structure itself.
|
||||
- Return the extracted summary string on success. Return `null` on any failure.
|
||||
- Log key events: initiating request (mention model), receiving response, success, failure reasons, potentially request/response time using the logger.
|
||||
- Define necessary TypeScript types for the Ollama request payload and expected response structure.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The `ollamaClient.ts` module exists and exports `generateSummary`.
|
||||
- AC2: `OLLAMA_ENDPOINT_URL` and `OLLAMA_MODEL` are defined in `.env.example`, loaded via config, and used by the client.
|
||||
- AC3: `generateSummary` sends a correctly formatted POST request (model, full prompt, stream:false) to the configured Ollama endpoint/path using native `fetch`.
|
||||
- AC4: Network errors, timeouts, and non-OK API responses are handled gracefully, logged, and result in a `null` return (this AC assumes the prerequisite Ollama service is running).
|
||||
- AC5: A successful Ollama response is parsed correctly, the generated text is extracted, and returned as a string.
|
||||
- AC6: Unexpected Ollama response formats or internal errors are handled, logged, and result in a `null` return.
|
||||
- AC7: Logs provide visibility into the client's interaction with the Ollama API.
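
A minimal sketch of the request and response handling for this client. The payload shape and the `response` field follow the requirements above. The helper names (`buildGenerateRequest`, `extractSummary`), the injected `postJson` function (a hypothetical wrapper around native `fetch` that sets the POST method, JSON header, and timeout), and the handling of an `error` field in the body are assumptions. The story's `generateSummary(prompt, content)` signature is widened here so the endpoint and model are explicit.

```typescript
export interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: false;
}

// Combines the instruction prompt and the content into one payload.
export function buildGenerateRequest(
  model: string,
  prompt: string,
  content: string,
): OllamaGenerateRequest {
  return { model, prompt: `${prompt}\n\n${content}`, stream: false };
}

// Pulls the generated text out of a parsed response body.
// Returns null if the text is missing or empty, or if an error is reported.
export function extractSummary(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;
  const record = body as Record<string, unknown>;
  if (typeof record["error"] === "string") return null;
  const text = record["response"];
  return typeof text === "string" && text.trim() !== "" ? text.trim() : null;
}

// Hypothetical transport hook; in the app it would wrap native fetch.
type PostJson = (
  url: string,
  payload: OllamaGenerateRequest,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

export async function generateSummary(
  endpointUrl: string,
  model: string,
  prompt: string,
  content: string,
  postJson: PostJson,
): Promise<string | null> {
  try {
    console.log(`Requesting summary from model ${model}`);
    const response = await postJson(
      `${endpointUrl}/api/generate`,
      buildGenerateRequest(model, prompt, content),
    );
    if (!response.ok) {
      console.error(`Ollama returned status ${response.status}`);
      return null;
    }
    return extractSummary(await response.json());
  } catch (err) {
    console.error("Ollama request failed:", err);
    return null;
  }
}
```

Keeping payload construction and response parsing as pure functions means both can be unit tested with Jest without a running Ollama instance.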
|
||||
|
||||
---
|
||||
|
||||
### Story 4.2: Define Summarization Prompts
|
||||
|
||||
- **User Story / Goal:** As a developer, I want standardized base prompts for generating article summaries and HN discussion summaries, ensuring consistent instructions are sent to the LLM.
|
||||
- **Detailed Requirements:**
|
||||
- Define two string constants or configuration variables for the prompts:
|
||||
- `ARTICLE_SUMMARY_PROMPT`: e.g., "Summarize the key points, arguments, and conclusions of the following article text concisely:"
|
||||
- `DISCUSSION_SUMMARY_PROMPT`: e.g., "Summarize the main themes, diverse viewpoints, key insights, and overall sentiment expressed in the following Hacker News comments:"
|
||||
- Store these prompts in a suitable location (e.g., a new `src/summarizer/prompts.ts` module or within `src/config.ts`). Make them easily accessible to the main workflow.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The `ARTICLE_SUMMARY_PROMPT` constant/variable is defined with appropriate instructional text.
|
||||
- AC2: The `DISCUSSION_SUMMARY_PROMPT` constant/variable is defined with appropriate instructional text.
|
||||
- AC3: These prompts are exported or otherwise made available for use in the main workflow.
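
One possible shape for the prompts module is sketched below. The constant names and wording come from the requirements above; `buildFullPrompt` is a hypothetical helper added only for illustration, and whether prompts end up in `src/summarizer/prompts.ts` or `src/config.ts` is left open as in the story.

```typescript
// Base instructions sent to the LLM, using the example wording from Story 4.2.
export const ARTICLE_SUMMARY_PROMPT =
  "Summarize the key points, arguments, and conclusions of the following article text concisely:";

export const DISCUSSION_SUMMARY_PROMPT =
  "Summarize the main themes, diverse viewpoints, key insights, and overall sentiment expressed in the following Hacker News comments:";

// Hypothetical helper: joins a base prompt and the content to be summarized.
export function buildFullPrompt(basePrompt: string, content: string): string {
  return `${basePrompt}\n\n${content.trim()}`;
}
```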
|
||||
|
||||
---
|
||||
|
||||
### Story 4.3: Integrate Summarization into Main Workflow
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to integrate the Ollama client into the main workflow to generate summaries for each story's scraped article text (if available) and fetched comments.
|
||||
- **Detailed Requirements:**
|
||||
- Modify the main execution flow in `src/index.ts`.
|
||||
- Import `ollamaClient.generateSummary` and the defined `ARTICLE_SUMMARY_PROMPT`, `DISCUSSION_SUMMARY_PROMPT`.
|
||||
- Within the main loop iterating through stories (after article scraping/persistence in Epic 3):
|
||||
- **Article Summary Generation:**
|
||||
- Check if the `story` object has non-null `articleContent`.
|
||||
- If yes: log "Attempting article summarization for story {storyId}", call `await generateSummary(ARTICLE_SUMMARY_PROMPT, story.articleContent)`, store the result (string or null) as `story.articleSummary`, log success/failure.
|
||||
- If no: set `story.articleSummary = null`, log "Skipping article summarization: No content".
|
||||
- **Discussion Summary Generation:**
|
||||
- Check if the `story` object has a non-empty `comments` array.
|
||||
- If yes:
|
||||
- Format the `story.comments` array into a single text block suitable for the LLM prompt (e.g., concatenating `comment.text` with separators like `---`). *Note: Be aware of potential LLM context window limits; for MVP, concatenate all fetched comments but log a warning if total character count is very high (e.g., > 10000 characters).*
|
||||
- Log "Attempting discussion summarization for story {storyId}".
|
||||
- Call `await generateSummary(DISCUSSION_SUMMARY_PROMPT, formattedCommentsText)`.
|
||||
- Store the result (string or null) as `story.discussionSummary`. Log success/failure.
|
||||
- If no: set `story.discussionSummary = null`, log "Skipping discussion summarization: No comments".
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: Running `npm run dev` executes steps from Epics 1-3, then attempts summarization using the Ollama client.
|
||||
- AC2: Article summary is attempted only if `articleContent` exists for a story.
|
||||
- AC3: Discussion summary is attempted only if `comments` exist for a story.
|
||||
- AC4: `generateSummary` is called with the correct prompts and corresponding content (article text or formatted comments).
|
||||
- AC5: Logs clearly indicate the start, success, or failure (including null returns from the client) for both article and discussion summarization attempts per story.
|
||||
- AC6: Story objects in memory now contain `articleSummary` (string/null) and `discussionSummary` (string/null) properties.
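
The per-story logic described above might look roughly like this. It is a sketch only: the `Story` shape is inferred from the field names in this story (`articleContent`, `comments`, `comment.text`), the summarizer is injected as a hypothetical `Summarizer` parameter rather than imported from `ollamaClient`, and `console` stands in for the logger.

```typescript
interface StoryComment {
  text: string;
}

interface Story {
  storyId: string;
  articleContent: string | null;
  comments: StoryComment[];
  articleSummary?: string | null;
  discussionSummary?: string | null;
}

// Hypothetical injection point; matches the generateSummary(prompt, content) call form.
type Summarizer = (prompt: string, content: string) => Promise<string | null>;

const COMMENT_SEPARATOR = "\n---\n";
const MAX_COMMENT_CHARS = 10000; // warning threshold suggested in the requirements

// Concatenates non-empty comment texts into one block for the LLM prompt.
export function formatComments(comments: StoryComment[]): string {
  return comments
    .map((c) => c.text.trim())
    .filter((t) => t.length > 0)
    .join(COMMENT_SEPARATOR);
}

export async function summarizeStory(
  story: Story,
  generateSummary: Summarizer,
  prompts: { article: string; discussion: string },
): Promise<Story> {
  if (story.articleContent) {
    console.log(`Attempting article summarization for story ${story.storyId}`);
    story.articleSummary = await generateSummary(prompts.article, story.articleContent);
    console.log(story.articleSummary ? "Article summary generated" : "Article summary failed");
  } else {
    story.articleSummary = null;
    console.log("Skipping article summarization: No content");
  }

  const commentsText = formatComments(story.comments);
  if (commentsText.length > 0) {
    if (commentsText.length > MAX_COMMENT_CHARS) {
      console.warn(`Comment text is ${commentsText.length} chars; may exceed LLM context`);
    }
    console.log(`Attempting discussion summarization for story ${story.storyId}`);
    story.discussionSummary = await generateSummary(prompts.discussion, commentsText);
    console.log(story.discussionSummary ? "Discussion summary generated" : "Discussion summary failed");
  } else {
    story.discussionSummary = null;
    console.log("Skipping discussion summarization: No comments");
  }
  return story;
}
```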
|
||||
|
||||
---
|
||||
|
||||
### Story 4.4: Persist Generated Summaries Locally
|
||||
|
||||
- **User Story / Goal:** As a developer, I want to save the generated article and discussion summaries (or null placeholders) to a local JSON file for each story, making them available for the email assembly stage.
|
||||
- **Detailed Requirements:**
|
||||
- Define the structure for the summary output file: `{storyId}_summary.json`. Content example: `{ "storyId": "...", "articleSummary": "...", "discussionSummary": "...", "summarizedAt": "ISO_TIMESTAMP" }`. Note that `articleSummary` and `discussionSummary` can be `null`.
|
||||
- Import `fs` and `path` in `src/index.ts` if needed.
|
||||
- In the main workflow loop, after *both* summarization attempts (article and discussion) for a story are complete:
|
||||
- Create a summary result object containing `storyId`, `articleSummary` (string or null), `discussionSummary` (string or null), and the current ISO timestamp (`new Date().toISOString()`).
|
||||
- Get the full path to the date-stamped output directory.
|
||||
- Construct the filename: `{storyId}_summary.json`.
|
||||
- Construct the full file path using `path.join()`.
|
||||
- Serialize the summary result object to JSON (`JSON.stringify(..., null, 2)`).
|
||||
- Use `fs.writeFileSync` to save the JSON to the file, wrapping in `try...catch`.
|
||||
- Log the successful saving of the summary file or any file writing errors.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: After running `npm run dev`, the date-stamped output directory contains 10 files named `{storyId}_summary.json`.
|
||||
- AC2: Each `_summary.json` file contains valid JSON adhering to the defined structure.
|
||||
- AC3: The `articleSummary` field contains the generated summary string if successful, otherwise `null`.
|
||||
- AC4: The `discussionSummary` field contains the generated summary string if successful, otherwise `null`.
|
||||
- AC5: A valid ISO timestamp is present in the `summarizedAt` field.
|
||||
- AC6: Logs confirm successful writing of each summary file or report file system errors.
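
The persistence steps above can be sketched as a small helper. The function name `saveSummary` and its boolean return value are assumptions made for illustration; the file name, JSON layout, and use of `fs.writeFileSync` inside `try...catch` follow the requirements.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface SummaryRecord {
  storyId: string;
  articleSummary: string | null;
  discussionSummary: string | null;
  summarizedAt: string; // ISO timestamp
}

// Hypothetical helper: writes {storyId}_summary.json into the date-stamped directory.
export function saveSummary(
  outputDir: string,
  storyId: string,
  articleSummary: string | null,
  discussionSummary: string | null,
): boolean {
  const record: SummaryRecord = {
    storyId,
    articleSummary,
    discussionSummary,
    summarizedAt: new Date().toISOString(),
  };
  const filePath = path.join(outputDir, `${storyId}_summary.json`);
  try {
    fs.writeFileSync(filePath, JSON.stringify(record, null, 2), "utf-8");
    console.log(`Saved summary file: ${filePath}`);
    return true;
  } catch (err) {
    // console stands in for the project logger here.
    console.error(`Failed to write summary file: ${filePath}`, err);
    return false;
  }
}
```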
|
||||
|
||||
---
|
||||
|
||||
### Story 4.5: Implement Stage Testing Utility for Summarization
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a separate script/command to test the LLM summarization logic using locally persisted data (HN comments, scraped article text), allowing independent testing of prompts and Ollama interaction.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new standalone script file: `src/stages/summarize_content.ts`.
|
||||
- Import necessary modules: `fs`, `path`, `logger`, `config`, `ollamaClient`, prompts.
|
||||
- The script should:
|
||||
- Initialize logger, load configuration (Ollama endpoint/model, prompts, output dir).
|
||||
- Determine target date-stamped directory path.
|
||||
- Find all `{storyId}_data.json` files in the directory.
|
||||
- For each `storyId` found:
|
||||
- Read `{storyId}_data.json` to get comments. Format them into a single text block.
|
||||
- *Attempt* to read `{storyId}_article.txt`. Handle file-not-found gracefully (means article wasn't scraped). Store content or null.
|
||||
- Call `ollamaClient.generateSummary` for article text (if not null) using `ARTICLE_SUMMARY_PROMPT`.
|
||||
- Call `ollamaClient.generateSummary` for formatted comments (if comments exist) using `DISCUSSION_SUMMARY_PROMPT`.
|
||||
- Construct the summary result object (with summaries or nulls, and timestamp).
|
||||
- Save the result object to `{storyId}_summary.json` in the same directory (using logic from Story 4.4), overwriting if exists.
|
||||
- Log progress (reading files, calling Ollama, saving results) for each story ID.
|
||||
- Add script to `package.json`: `"stage:summarize": "ts-node src/stages/summarize_content.ts"`.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The file `src/stages/summarize_content.ts` exists.
|
||||
- AC2: The script `stage:summarize` is defined in `package.json`.
|
||||
- AC3: Running `npm run stage:summarize` (after `stage:fetch` and `stage:scrape` runs) reads `_data.json` and attempts to read `_article.txt` files from the target directory.
|
||||
- AC4: The script calls the `ollamaClient` with correct prompts and content derived *only* from the local files (requires Ollama service running per Story 4.1 prerequisite).
|
||||
- AC5: The script creates/updates `{storyId}_summary.json` files in the target directory reflecting the results of the Ollama calls (summaries or nulls).
|
||||
- AC6: Logs show the script processing each story ID found locally, interacting with Ollama, and saving results.
|
||||
- AC7: The script does not call Algolia API or the article scraper module.
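
Two pieces of the stage script lend themselves to a short sketch: discovering story IDs from the directory listing and reading the optional article file. Both helper names below are hypothetical, and `console` stands in for the logger; the file-name patterns are the ones given in the requirements.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const DATA_SUFFIX = "_data.json";

// Derives story IDs from file names matching {storyId}_data.json.
export function extractStoryIds(fileNames: string[]): string[] {
  return fileNames
    .filter((name) => name.endsWith(DATA_SUFFIX) && name.length > DATA_SUFFIX.length)
    .map((name) => name.slice(0, -DATA_SUFFIX.length));
}

// Returns the scraped article text, or null when the file is absent or unreadable.
export function readArticleText(dir: string, storyId: string): string | null {
  const filePath = path.join(dir, `${storyId}_article.txt`);
  try {
    return fs.readFileSync(filePath, "utf-8");
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code !== "ENOENT") {
      // A missing file just means the article was not scraped; anything else is logged.
      console.error(`Could not read ${filePath}`, err);
    }
    return null;
  }
}
```

A script built this way would call `extractStoryIds(fs.readdirSync(targetDir))` and then, per ID, feed the results into the same summarize-and-save steps used by the main workflow.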
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change                   | Date       | Version | Description                         | Author |
| ------------------------ | ---------- | ------- | ----------------------------------- | ------ |
| Added Ollama Prereq Note | 2025-05-04 | 0.2     | Added note about local Ollama setup | 2-pm   |
| Initial Draft            | 2025-05-04 | 0.1     | First draft of Epic 4               | 2-pm   |
|
||||
152
CURRENT-V2/usage-demo/epic5-draft.txt
Normal file
@@ -0,0 +1,152 @@
|
||||
# Epic 5: Digest Assembly & Email Dispatch
|
||||
|
||||
**Goal:** Assemble the collected story data and summaries from local files, format them into a readable HTML email digest, and send the email using Nodemailer with configured credentials. Implement a stage testing utility for emailing with a dry-run option.
|
||||
|
||||
## Story List
|
||||
|
||||
### Story 5.1: Implement Email Content Assembler
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a module that reads the persisted story metadata (`_data.json`) and summaries (`_summary.json`) from a specified directory, consolidating the necessary information needed to render the email digest.
|
||||
- **Detailed Requirements:**
|
||||
- Create a new module: `src/email/contentAssembler.ts`.
|
||||
- Define a TypeScript type/interface `DigestData` representing the data needed per story for the email template: `{ storyId: string, title: string, hnUrl: string, articleUrl: string | null, articleSummary: string | null, discussionSummary: string | null }`.
|
||||
- Implement an async function `assembleDigestData(dateDirPath: string): Promise<DigestData[]>`.
|
||||
- The function should:
|
||||
- Use Node.js `fs` to read the contents of the `dateDirPath`.
|
||||
- Identify all files matching the pattern `{storyId}_data.json`.
|
||||
- For each `storyId` found:
|
||||
- Read and parse the `{storyId}_data.json` file. Extract `title`, `hnUrl`, and `url` (use as `articleUrl`). Handle potential file read/parse errors gracefully (log and skip story).
|
||||
- Attempt to read and parse the corresponding `{storyId}_summary.json` file. Handle file-not-found or parse errors gracefully (treat `articleSummary` and `discussionSummary` as `null`).
|
||||
- Construct a `DigestData` object for the story, including the extracted metadata and summaries (or nulls).
|
||||
- Collect all successfully constructed `DigestData` objects into an array.
|
||||
- Return the array. It should ideally contain 10 items if all previous stages succeeded.
|
||||
- Log progress (e.g., "Assembling digest data from directory...", "Processing story {storyId}...") and any errors encountered during file processing using the logger.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The `contentAssembler.ts` module exists and exports `assembleDigestData` and the `DigestData` type.
|
||||
- AC2: `assembleDigestData` correctly reads `_data.json` files from the provided directory path.
|
||||
- AC3: It attempts to read corresponding `_summary.json` files, correctly handling cases where the summary file might be missing or unparseable (resulting in null summaries for that story).
|
||||
- AC4: The function returns a promise resolving to an array of `DigestData` objects, populated with data extracted from the files.
|
||||
- AC5: Errors during file reading or JSON parsing are logged, and the function returns data for successfully processed stories.
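
A rough sketch of the assembler follows, assuming the `_data.json` files carry `title`, `hnUrl`, and `url` as top-level string fields (as the requirements imply). The `readJson` and `asStringOrNull` helpers are hypothetical, and `console` stands in for the logger.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

export interface DigestData {
  storyId: string;
  title: string;
  hnUrl: string;
  articleUrl: string | null;
  articleSummary: string | null;
  discussionSummary: string | null;
}

// Hypothetical helper: returns the parsed object, or null on read/parse failure.
async function readJson(filePath: string): Promise<Record<string, unknown> | null> {
  try {
    const parsed: unknown = JSON.parse(await fs.readFile(filePath, "utf-8"));
    return typeof parsed === "object" && parsed !== null
      ? (parsed as Record<string, unknown>)
      : null;
  } catch {
    return null;
  }
}

const asStringOrNull = (v: unknown): string | null =>
  typeof v === "string" && v.length > 0 ? v : null;

export async function assembleDigestData(dateDirPath: string): Promise<DigestData[]> {
  const suffix = "_data.json";
  console.log(`Assembling digest data from directory ${dateDirPath}`);
  let names: string[];
  try {
    names = await fs.readdir(dateDirPath);
  } catch (err) {
    console.error(`Cannot read directory ${dateDirPath}`, err);
    return [];
  }
  const digest: DigestData[] = [];
  for (const name of names.filter((n) => n.endsWith(suffix)).sort()) {
    const storyId = name.slice(0, -suffix.length);
    console.log(`Processing story ${storyId}`);
    const data = await readJson(path.join(dateDirPath, name));
    const title = data?.title;
    const hnUrl = data?.hnUrl;
    if (typeof title !== "string" || typeof hnUrl !== "string") {
      console.error(`Skipping story ${storyId}: unreadable or incomplete data file`);
      continue;
    }
    // A missing or unparseable summary file yields null summaries, not a skipped story.
    const summary = await readJson(path.join(dateDirPath, `${storyId}_summary.json`));
    digest.push({
      storyId,
      title,
      hnUrl,
      articleUrl: asStringOrNull(data?.url),
      articleSummary: asStringOrNull(summary?.articleSummary),
      discussionSummary: asStringOrNull(summary?.discussionSummary),
    });
  }
  return digest;
}
```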
|
||||
|
||||
---
|
||||
|
||||
### Story 5.2: Create HTML Email Template & Renderer
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a basic HTML email template and a function to render it with the assembled digest data, producing the final HTML content for the email body.
|
||||
- **Detailed Requirements:**
|
||||
- Define the HTML structure. This can be done using template literals within a function or potentially using a simple template file (e.g., `src/email/templates/digestTemplate.html`) and `fs.readFileSync`. Template literals are simpler for MVP.
|
||||
- Create a function `renderDigestHtml(data: DigestData[], digestDate: string): string` (e.g., in `src/email/contentAssembler.ts` or a new `templater.ts`).
|
||||
- The function should generate an HTML string with:
|
||||
- A suitable title in the body (e.g., `<h1>Hacker News Top 10 Summaries for ${digestDate}</h1>`).
|
||||
- A loop through the `data` array.
|
||||
- For each `story` in `data`:
|
||||
- Display `<h2><a href="${story.articleUrl || story.hnUrl}">${story.title}</a></h2>`.
|
||||
- Display `<p><a href="${story.hnUrl}">View HN Discussion</a></p>`.
|
||||
- Conditionally display `<h3>Article Summary</h3><p>${story.articleSummary}</p>` *only if* `story.articleSummary` is not null/empty.
|
||||
- Conditionally display `<h3>Discussion Summary</h3><p>${story.discussionSummary}</p>` *only if* `story.discussionSummary` is not null/empty.
|
||||
- Include a separator (e.g., `<hr style="margin-top: 20px; margin-bottom: 20px;">`).
|
||||
- Use basic inline CSS for minimal styling (margins, etc.) to ensure readability. Avoid complex layouts.
|
||||
- Return the complete HTML document as a string.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: A function `renderDigestHtml` exists that accepts the digest data array and a date string.
|
||||
- AC2: The function returns a single, complete HTML string.
|
||||
- AC3: The generated HTML includes a title with the date and correctly iterates through the story data.
|
||||
- AC4: For each story, the HTML displays the linked title, HN link, and conditionally displays the article and discussion summaries with headings.
|
||||
- AC5: Basic separators and margins are used for readability. The HTML is simple and likely to render reasonably in most email clients.
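
The renderer described above could be written with template literals along these lines. The `escapeHtml` helper is an addition not called for in the requirements; it is included here as an assumption because summaries and titles are untrusted text that would otherwise be interpolated directly into markup.

```typescript
interface DigestData {
  storyId: string;
  title: string;
  hnUrl: string;
  articleUrl: string | null;
  articleSummary: string | null;
  discussionSummary: string | null;
}

// Hypothetical helper: escapes characters that are significant in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

export function renderDigestHtml(data: DigestData[], digestDate: string): string {
  const stories = data.map((story) => {
    const parts = [
      `<h2><a href="${escapeHtml(story.articleUrl || story.hnUrl)}">${escapeHtml(story.title)}</a></h2>`,
      `<p><a href="${escapeHtml(story.hnUrl)}">View HN Discussion</a></p>`,
    ];
    // Summary sections appear only when the summary is non-null and non-empty.
    if (story.articleSummary) {
      parts.push(`<h3>Article Summary</h3><p>${escapeHtml(story.articleSummary)}</p>`);
    }
    if (story.discussionSummary) {
      parts.push(`<h3>Discussion Summary</h3><p>${escapeHtml(story.discussionSummary)}</p>`);
    }
    parts.push(`<hr style="margin-top: 20px; margin-bottom: 20px;">`);
    return parts.join("\n");
  });

  return [
    "<!DOCTYPE html>",
    `<html><body style="font-family: sans-serif; margin: 20px;">`,
    `<h1>Hacker News Top 10 Summaries for ${escapeHtml(digestDate)}</h1>`,
    ...stories,
    "</body></html>",
  ].join("\n");
}
```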
|
||||
|
||||
---
|
||||
|
||||
### Story 5.3: Implement Nodemailer Email Sender
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a module to send the generated HTML email using Nodemailer, configured with credentials stored securely in the environment file.
|
||||
- **Detailed Requirements:**
|
||||
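- Add Nodemailer dependencies: `npm install nodemailer` for the runtime package and `npm install --save-dev @types/nodemailer` for the type definitions, which are only needed at build time.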
|
||||
- Add required configuration variables to `.env.example` (and local `.env`): `EMAIL_HOST`, `EMAIL_PORT` (e.g., 587), `EMAIL_SECURE` (e.g., `false` for STARTTLS on 587, `true` for 465), `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_FROM` (e.g., `"Your Name <you@example.com>"`), `EMAIL_RECIPIENTS` (comma-separated list).
|
||||
- Create a new module: `src/email/emailSender.ts`.
|
||||
- Implement an async function `sendDigestEmail(subject: string, htmlContent: string): Promise<boolean>`.
|
||||
- Inside the function:
|
||||
- Load the `EMAIL_*` variables from the config module.
|
||||
- Create a Nodemailer transporter using `nodemailer.createTransport` with the loaded config (host, port, secure flag, auth: { user, pass }).
|
||||
- Verify transporter configuration using `transporter.verify()` (optional but recommended). Log verification success/failure.
|
||||
- Parse the `EMAIL_RECIPIENTS` string into an array or comma-separated string suitable for the `to` field.
|
||||
- Define the `mailOptions`: `{ from: EMAIL_FROM, to: parsedRecipients, subject: subject, html: htmlContent }`.
|
||||
- Call `await transporter.sendMail(mailOptions)`.
|
||||
- If `sendMail` succeeds, log the success message including the `messageId` from the result. Return `true`.
|
||||
- If `sendMail` fails (throws error), log the error using the logger. Return `false`.
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: `nodemailer` and `@types/nodemailer` dependencies are added.
|
||||
- AC2: `EMAIL_*` variables are defined in `.env.example` and loaded from config.
|
||||
- AC3: `emailSender.ts` module exists and exports `sendDigestEmail`.
|
||||
- AC4: `sendDigestEmail` correctly creates a Nodemailer transporter using configuration from `.env`. Transporter verification is attempted (optional AC).
|
||||
- AC5: The `to` field is correctly populated based on `EMAIL_RECIPIENTS`.
|
||||
- AC6: `transporter.sendMail` is called with correct `from`, `to`, `subject`, and `html` options.
|
||||
- AC7: Email sending success (including message ID) or failure is logged clearly.
|
||||
- AC8: The function returns `true` on successful sending, `false` otherwise.
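
The sending logic above might be sketched as follows. To keep the example free of external packages, the transporter is passed in through a hypothetical `MailTransport` interface that mirrors the promise form of Nodemailer's `sendMail`; in the real module it would be created with `nodemailer.createTransport({ host, port, secure, auth: { user, pass } })`. The extra `config` and `transporter` parameters are assumptions for testability and differ from the two-argument signature in the requirements.

```typescript
// Hypothetical subset of a Nodemailer transporter (promise form of sendMail).
interface MailTransport {
  sendMail(options: {
    from: string;
    to: string[];
    subject: string;
    html: string;
  }): Promise<{ messageId: string }>;
}

interface EmailConfig {
  from: string; // value of EMAIL_FROM
  recipients: string; // value of EMAIL_RECIPIENTS (comma-separated)
}

// Splits the comma-separated recipient list and drops empty entries.
export function parseRecipients(raw: string): string[] {
  return raw
    .split(",")
    .map((r) => r.trim())
    .filter((r) => r.length > 0);
}

export async function sendDigestEmail(
  subject: string,
  htmlContent: string,
  config: EmailConfig,
  transporter: MailTransport,
): Promise<boolean> {
  const to = parseRecipients(config.recipients);
  if (to.length === 0) {
    console.error("No recipients configured in EMAIL_RECIPIENTS");
    return false;
  }
  try {
    const info = await transporter.sendMail({
      from: config.from,
      to,
      subject,
      html: htmlContent,
    });
    console.log(`Digest email sent, messageId=${info.messageId}`);
    return true;
  } catch (err) {
    // console stands in for the project logger here.
    console.error("Failed to send digest email", err);
    return false;
  }
}
```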
|
||||
|
||||
---
|
||||
|
||||
### Story 5.4: Integrate Email Assembly and Sending into Main Workflow
|
||||
|
||||
- **User Story / Goal:** As a developer, I want the main application workflow (`src/index.ts`) to orchestrate the final steps: assembling digest data, rendering the HTML, and triggering the email send after all previous stages are complete.
|
||||
- **Detailed Requirements:**
|
||||
- Modify the main execution flow in `src/index.ts`.
|
||||
- Import `assembleDigestData`, `renderDigestHtml`, `sendDigestEmail`.
|
||||
- Execute these steps *after* the main loop (where stories are fetched, scraped, summarized, and persisted) completes:
|
||||
- Log "Starting final digest assembly and email dispatch...".
|
||||
- Determine the path to the current date-stamped output directory.
|
||||
- Call `const digestData = await assembleDigestData(dateDirPath)`.
|
||||
- Check if `digestData` array is not empty.
|
||||
- If yes:
|
||||
- Get the current date string (e.g., 'YYYY-MM-DD').
|
||||
- `const htmlContent = renderDigestHtml(digestData, currentDate)`.
|
||||
- `` const subject = `BMad Hacker Daily Digest - ${currentDate}` ``.
|
||||
- `const emailSent = await sendDigestEmail(subject, htmlContent)`.
|
||||
- Log the final outcome based on `emailSent` ("Digest email sent successfully." or "Failed to send digest email.").
|
||||
- If no (`digestData` is empty or assembly failed):
|
||||
- Log an error: "Failed to assemble digest data or no data found. Skipping email."
|
||||
- Log "BMad Hacker Daily Digest process finished."
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: Running `npm run dev` executes all stages (Epics 1-4) and then proceeds to email assembly and sending.
|
||||
- AC2: `assembleDigestData` is called correctly with the output directory path after other processing is done.
|
||||
- AC3: If data is assembled, `renderDigestHtml` and `sendDigestEmail` are called with the correct data, subject, and HTML.
|
||||
- AC4: The final success or failure of the email sending step is logged.
|
||||
- AC5: If `assembleDigestData` returns no data, email sending is skipped, and an appropriate message is logged.
|
||||
- AC6: The application logs a final completion message.
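
The final orchestration step can be sketched as a single function that receives the three collaborators. Passing them in through a hypothetical `deps` object is an assumption made so the flow can be exercised without real files or SMTP; in `src/index.ts` they would simply be imported. `toDateStamp` is also hypothetical and uses the UTC date.

```typescript
interface DigestStageDeps<T> {
  assembleDigestData(dateDirPath: string): Promise<T[]>;
  renderDigestHtml(data: T[], digestDate: string): string;
  sendDigestEmail(subject: string, htmlContent: string): Promise<boolean>;
}

// Hypothetical helper: formats a date as YYYY-MM-DD (UTC).
export function toDateStamp(date: Date): string {
  return date.toISOString().slice(0, 10);
}

export async function runDigestStage<T>(
  dateDirPath: string,
  currentDate: string,
  deps: DigestStageDeps<T>,
): Promise<boolean> {
  console.log("Starting final digest assembly and email dispatch...");
  let emailSent = false;
  const digestData = await deps.assembleDigestData(dateDirPath);
  if (digestData.length > 0) {
    const htmlContent = deps.renderDigestHtml(digestData, currentDate);
    const subject = `BMad Hacker Daily Digest - ${currentDate}`;
    emailSent = await deps.sendDigestEmail(subject, htmlContent);
    console.log(emailSent ? "Digest email sent successfully." : "Failed to send digest email.");
  } else {
    console.error("Failed to assemble digest data or no data found. Skipping email.");
  }
  console.log("BMad Hacker Daily Digest process finished.");
  return emailSent;
}
```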
|
||||
|
||||
---
|
||||
|
||||
### Story 5.5: Implement Stage Testing Utility for Emailing
|
||||
|
||||
- **User Story / Goal:** As a developer, I want a separate script/command to test the email assembly, rendering, and sending logic using persisted local data, including a crucial `--dry-run` option to prevent accidental email sending during tests.
|
||||
- **Detailed Requirements:**
|
||||
- Add `yargs` dependency for argument parsing: `npm install yargs @types/yargs --save-dev`.
|
||||
- Create a new standalone script file: `src/stages/send_digest.ts`.
|
||||
- Import necessary modules: `fs`, `path`, `logger`, `config`, `assembleDigestData`, `renderDigestHtml`, `sendDigestEmail`, `yargs`.
|
||||
- Use `yargs` to parse command-line arguments, specifically looking for a `--dry-run` boolean flag (defaulting to `false`). Allow an optional argument for specifying the date-stamped directory, otherwise default to current date.
|
||||
- The script should:
|
||||
- Initialize logger, load config.
|
||||
- Determine the target date-stamped directory path (from arg or default). Log the target directory.
|
||||
- Call `await assembleDigestData(dateDirPath)`.
|
||||
- If data is assembled and not empty:
|
||||
- Determine the date string for the subject/title.
|
||||
- Call `renderDigestHtml(digestData, dateString)` to get HTML.
|
||||
- Construct the subject string.
|
||||
- Check the `dryRun` flag:
|
||||
- If `true`: Log "DRY RUN enabled. Skipping actual email send.". Log the subject. Save the `htmlContent` to a file in the target directory (e.g., `_digest_preview.html`). Log that the preview file was saved.
|
||||
- If `false`: Log "Live run: Attempting to send email...". Call `await sendDigestEmail(subject, htmlContent)`. Log success/failure based on the return value.
|
||||
- If data assembly fails or is empty, log the error.
|
||||
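- Add script to `package.json`: `"stage:email": "ts-node src/stages/send_digest.ts"`. Arguments are passed after npm's own `--` separator, e.g. `npm run stage:email -- --dry-run`. Do not append `--` to the script definition itself: the script would then receive `-- --dry-run`, and the flag would be parsed as a positional argument rather than as an option.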
|
||||
- **Acceptance Criteria (ACs):**
|
||||
- AC1: The file `src/stages/send_digest.ts` exists. `yargs` dependency is added.
|
||||
- AC2: The script `stage:email` is defined in `package.json` allowing arguments.
|
||||
- AC3: Running `npm run stage:email -- --dry-run` reads local data, renders HTML, logs the intent, saves `_digest_preview.html` locally, and does *not* call `sendDigestEmail`.
|
||||
- AC4: Running `npm run stage:email` (without `--dry-run`) reads local data, renders HTML, and *does* call `sendDigestEmail`, logging the outcome.
|
||||
- AC5: The script correctly identifies and acts upon the `--dry-run` flag.
|
||||
- AC6: Logs clearly distinguish between dry runs and live runs and report success/failure.
|
||||
- AC7: The script operates using only local files and the email configuration/service; it does not invoke prior pipeline stages (Algolia, scraping, Ollama).
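
The dry-run branching can be sketched as below. To stay dependency-free, this sketch swaps `yargs` for Node's built-in `util.parseArgs`; the `--date` option name, the `parseStageArgs` and `deliverDigest` helpers, and the injected `deps` object are all hypothetical and only illustrate the control flow.

```typescript
import { parseArgs } from "node:util";

interface StageOptions {
  dryRun: boolean;
  date: string; // YYYY-MM-DD, selects the date-stamped directory
}

// Parses --dry-run (boolean, default false) and an optional --date value.
export function parseStageArgs(args: string[], today: string): StageOptions {
  const { values } = parseArgs({
    args,
    options: {
      "dry-run": { type: "boolean" },
      date: { type: "string" },
    },
  });
  return {
    dryRun: values["dry-run"] === true,
    date: typeof values.date === "string" ? values.date : today,
  };
}

interface DeliveryDeps {
  savePreview(html: string): void; // e.g. writes _digest_preview.html
  sendDigestEmail(subject: string, html: string): Promise<boolean>;
}

// Returns true when the preview was saved (dry run) or the email was sent (live run).
export async function deliverDigest(
  dryRun: boolean,
  subject: string,
  html: string,
  deps: DeliveryDeps,
): Promise<boolean> {
  if (dryRun) {
    console.log("DRY RUN enabled. Skipping actual email send.");
    console.log(`Subject: ${subject}`);
    deps.savePreview(html);
    console.log("Preview file saved.");
    return true;
  }
  console.log("Live run: Attempting to send email...");
  const sent = await deps.sendDigestEmail(subject, html);
  console.log(sent ? "Email sent." : "Email sending failed.");
  return sent;
}
```

Keeping the dry-run decision in one small function makes it straightforward to assert that `sendDigestEmail` is never reached when the flag is set.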
|
||||
|
||||
## Change Log
|
||||
|
||||
| Change        | Date       | Version | Description           | Author |
| ------------- | ---------- | ------- | --------------------- | ------ |
| Initial Draft | 2025-05-04 | 0.1     | First draft of Epic 5 | 2-pm   |
|
||||
111
CURRENT-V2/usage-demo/final-brief-with-pm-prompt.txt
Normal file
@@ -0,0 +1,111 @@
|
||||
# Project Brief: BMad Hacker Daily Digest
|
||||
|
||||
## Introduction / Problem Statement
|
||||
|
||||
Hacker News (HN) comment threads contain valuable insights but can be prohibitively long to read thoroughly. The BMad Hacker Daily Digest project aims to solve this by providing a time-efficient way to stay informed about the collective intelligence within HN discussions. The service will automatically fetch the top 10 HN stories daily, retrieve a manageable subset of their comments using the Algolia HN API, generate concise summaries of both the linked article (when possible) and the comment discussion using an LLM, and deliver these summaries in a daily email briefing. This project also serves as a practical learning exercise focused on agent-driven development, TypeScript, Node.js backend services, API integration, and local LLM usage with Ollama.
|
||||
|
||||
## Vision & Goals
|
||||
|
||||
- **Vision:** To provide a quick, reliable, and automated way for users to stay informed about the key insights and discussions happening within the Hacker News community without needing to read lengthy comment threads.
|
||||
- **Primary Goals (MVP - SMART):**
|
||||
- **Fetch HN Story Data:** Successfully retrieve the IDs and metadata (title, URL, HN link) of the top 10 Hacker News stories using the Algolia HN Search API when triggered.
|
||||
- **Retrieve Limited Comments:** For each fetched story, retrieve a predefined, limited set of associated comments using the Algolia HN Search API.
|
||||
- **Attempt Article Scraping:** For each story's external URL, attempt to fetch the raw HTML and extract the main article text using basic methods (Node.js native fetch, article-extractor/Cheerio), handling failures gracefully.
|
||||
- **Generate Summaries (LLM):** Using a local LLM (via Ollama, configured endpoint), generate: an "Article Summary" from scraped text (if successful), and a separate "Discussion Summary" from fetched comments.
|
||||
- **Assemble & Send Digest (Manual Trigger):** Format results for 10 stories into a single HTML email and successfully send it to recipients (list defined in config) using Nodemailer when manually triggered via CLI.
|
||||
- **Success Metrics (Initial Ideas for MVP):**
|
||||
- **Successful Execution:** The entire process completes successfully without crashing when manually triggered via CLI for 3 different test runs.
|
||||
- **Digest Content:** The generated email contains results for 10 stories (correct links, discussion summary, article summary where possible). Spot checks confirm relevance.
|
||||
- **Error Handling:** Scraping failures are logged, and the process continues using only comment summaries for affected stories without halting the script.
|
||||
|
||||
## Target Audience / Users
|
||||
|
||||
**Primary User (MVP):** The developer undertaking this project. The primary motivation is learning and demonstrating agent-driven development, TypeScript, Node.js (v22), API integration (Algolia, LLM, Email), local LLMs (Ollama), and configuration management (`.env`). The key need is an interesting, achievable project scope utilizing these technologies.
|
||||
|
||||
**Secondary User (Potential):** Time-constrained HN readers/tech enthusiasts needing automated discussion summaries. Addressing their needs fully is outside MVP scope but informs potential future direction.
|
||||
|
||||
## Key Features / Scope (High-Level Ideas for MVP)
|
||||
|
||||
- Fetch Top HN Stories (Algolia API).
|
||||
- Fetch Limited Comments (Algolia API).
|
||||
- Local File Storage (Date-stamped folder, structured text/JSON files).
|
||||
- Attempt Basic Article Scraping (Node.js v22 native fetch, basic extraction).
|
||||
- Handle Scraping Failures (Log error, proceed with comment-only summary).
|
||||
- Generate Summaries (Local Ollama via configured endpoint: Article Summary if scraped, Discussion Summary always).
|
||||
- Format Digest Email (HTML: Article Summary (opt.), Discussion Summary, HN link, Article link).
|
||||
- Manual Email Dispatch (Nodemailer, credentials from `.env`, recipient list from `.env`).
|
||||
- CLI Trigger (Manual command to run full process).
|
||||
|
||||
**Explicitly OUT of Scope for MVP:** Advanced scraping (JS render, anti-bot), processing _all_ comments/MapReduce summaries, automated scheduling (cron), database integration, cloud deployment/web frontend, user management (sign-ups etc.), production-grade error handling/monitoring/deliverability, fine-tuning LLM prompts, sophisticated retry logic.
|
||||
|
||||
## Known Technical Constraints or Preferences
|
||||
|
||||
- **Constraints/Preferences:**
|
||||
|
||||
- **Language/Runtime:** TypeScript running on Node.js v22.
|
||||
- **Execution Environment:** Local machine execution for MVP.
|
||||
- **Trigger Mechanism:** Manual CLI trigger only for MVP.
|
||||
- **Configuration Management:** Use a `.env` file for configuration: LLM endpoint URL, email credentials, recipient email list, potentially comment fetch limits etc.
|
||||
- **HTTP Requests:** Use Node.js v22 native fetch API (no Axios).
|
||||
- **HN Data Source:** Algolia HN Search API.
|
||||
- **Web Scraping:** Basic, best-effort only (native fetch + static HTML extraction). Must handle failures gracefully.
|
||||
- **LLM Integration:** Local Ollama via configurable endpoint for MVP. Design for potential swap to cloud LLMs. Functionality over quality for MVP.
|
||||
- **Summarization Strategy:** Separate Article/Discussion summaries. Limit comments processed per story (configurable). No MapReduce.
|
||||
- **Data Storage:** Local file system (structured text/JSON in date-stamped folders). No database.
|
||||
- **Email Delivery:** Nodemailer. Read credentials and recipient list from `.env`. Basic setup, no production deliverability focus.
|
||||
- **Primary Goal Context:** Focus on functional pipeline for learning/demonstration.
|
||||
|
||||
- **Risks:**
|
||||
- Algolia HN API Issues: Changes, rate limits, availability.
|
||||
- Web Scraping Fragility: High likelihood of failure limiting Article Summaries.
|
||||
- LLM Variability & Quality: Inconsistent performance/quality from local Ollama; potential errors.
|
||||
- Incomplete Discussion Capture: Limited comment fetching may miss key insights.
|
||||
- Email Configuration/Deliverability: Fragility of personal credentials; potential spam filtering.
|
||||
- Manual Trigger Dependency: Digest only generated on manual execution.
|
||||
- Configuration Errors: Incorrect `.env` settings could break the application.
|
||||
_(User Note: Risks acknowledged and accepted given the project's learning goals.)_
|
||||
|
||||
## Relevant Research (Optional)
|
||||
|
||||
Feasibility: Core concept confirmed technically feasible with available APIs/libraries.
|
||||
Existing Tools & Market Context: Similar tools exist (validating interest), but daily email format appears distinct.
|
||||
API Selection: Algolia HN Search API chosen for filtering/sorting capabilities.
|
||||
Identified Technical Challenges: Confirmed complexities of scraping and handling large comment volumes within LLM limits, informing MVP scope.
|
||||
Local LLM Viability: Ollama confirmed as viable for local MVP development/testing, with potential for future swapping.
|
||||
|
||||
## PM Prompt
|
||||
|
||||
**PM Agent Handoff Prompt: BMad Hacker Daily Digest**
|
||||
|
||||
**Summary of Key Insights:**
|
||||
|
||||
This Project Brief outlines the "BMad Hacker Daily Digest," a command-line tool designed to provide daily email summaries of discussions from top Hacker News (HN) comment threads. The core problem is the time required to read lengthy but valuable HN discussions. The MVP aims to fetch the top 10 HN stories, retrieve a limited set of comments via the Algolia HN API, attempt basic scraping of linked articles (with fallback), generate separate summaries for articles (if scraped) and comments using a local LLM (Ollama), and email the digest to the developer using Nodemailer. This project primarily serves as a learning exercise and demonstration of agent-driven development in TypeScript.
|
||||
|
||||
**Areas Requiring Special Attention (for PRD):**
|
||||
|
||||
- **Comment Selection Logic:** Define the specific criteria for selecting the "limited set" of comments from Algolia (e.g., number of comments, recency, token count limit).
|
||||
- **Basic Scraping Implementation:** Detail the exact steps for the basic article scraping attempt (libraries like Node.js native fetch, article-extractor/Cheerio), including specific error handling and the fallback mechanism.
|
||||
- **LLM Prompting:** Define the precise prompts for generating the "Article Summary" and the "Discussion Summary" separately.
|
||||
- **Email Formatting:** Specify the exact structure, layout, and content presentation within the daily HTML email digest.
|
||||
- **CLI Interface:** Define the specific command(s), arguments, and expected output/feedback for the manual trigger.
|
||||
- **Local File Structure:** Define the structure for storing intermediate data and logs in local text files within date-stamped folders.
|
||||
|
||||
**Development Context:**
|
||||
|
||||
This brief was developed through iterative discussion, starting from general app ideas and refining scope based on user interest (HN discussions) and technical feasibility for a learning/demo project. Key decisions include prioritizing comment summarization, using the Algolia HN API, starting with local execution (Ollama, Nodemailer), and including only a basic, best-effort scraping attempt in the MVP.
|
||||
|
||||
**Guidance on PRD Detail:**
|
||||
|
||||
- Focus detailed requirements and user stories on the core data pipeline: HN API Fetch -> Comment Selection -> Basic Scrape Attempt -> LLM Summarization (x2) -> Email Formatting/Sending -> CLI Trigger.
|
||||
- Keep potential post-MVP enhancements (cloud deployment, frontend, database, advanced scraping, scheduling) as high-level future considerations.
|
||||
- Technical implementation details for API/LLM interaction should allow flexibility for potential future swapping (e.g., Ollama to cloud LLM).
|
||||
|
||||
**User Preferences:**
|
||||
|
||||
- Execution: Manual CLI trigger for MVP.
|
||||
- Data Storage: Local text files for MVP.
|
||||
- LLM: Ollama for local development/MVP. Ability to potentially switch to cloud API later.
|
||||
- Summaries: Generate separate summaries for article (if available) and comments.
|
||||
- API: Use Algolia HN Search API.
|
||||
- Email: Use Nodemailer for self-send in MVP.
|
||||
- Tech Stack: TypeScript, Node.js v22.
|
||||