# BMad Hacker Daily Digest Product Requirements Document (PRD)

## Intro

The BMad Hacker Daily Digest is a command-line tool designed to address the time-consuming nature of reading extensive Hacker News (HN) comment threads. It aims to provide users with a time-efficient way to grasp the collective intelligence and key insights from discussions on top HN stories. The service will fetch the top 10 HN stories daily, retrieve a configurable number of comments for each, attempt to scrape the linked article, generate separate summaries for the article (if scraped) and the comment discussion using a local LLM, and deliver these summaries in a single daily email briefing triggered manually. This project also serves as a practical learning exercise in agent-driven development, TypeScript, Node.js, API integration, and local LLM usage, starting from the provided "bmad-boilerplate" template.

## Goals and Context

- **Project Objectives:**
  - Provide a quick, reliable, automated way to stay informed about key HN discussions without reading full threads.
  - Successfully fetch top 10 HN story metadata via the Algolia HN Search API.
  - Retrieve a _configurable_ number of comments per story (default 50) via the Algolia HN Search API.
  - Attempt basic scraping of linked article content, handling failures gracefully.
  - Generate distinct Article Summaries (if scraped) and Discussion Summaries using a local LLM (Ollama).
  - Assemble summaries for 10 stories into an HTML email and send via Nodemailer upon manual CLI trigger.
  - Serve as a learning platform for agent-driven development, TypeScript, Node.js v22, API integration, local LLMs, and configuration management, leveraging the "bmad-boilerplate" structure and tooling.
- **Measurable Outcomes:**
  - The tool completes its full process (fetch, scrape attempt, summarize, email) without crashing on manual CLI trigger across multiple test runs.
  - The generated email digest consistently contains results for 10 stories, including correct links, discussion summaries, and article summaries where scraping was successful.
  - Errors during article scraping are logged, and the process continues for affected stories using only comment summaries, without halting the script.
- **Success Criteria:**
  - Successful execution of the end-to-end process via CLI trigger for 3 consecutive test runs.
  - The generated email is successfully sent and received, containing summaries for all 10 fetched stories (article summary optional, depending on scraping success).
  - Scraping failures are logged appropriately without stopping the overall process.
- **Key Performance Indicators (KPIs):**
  - Successful Runs / Total Runs (target: 100% for MVP tests)
  - Stories with Article Summaries / Total Stories (measures scraping effectiveness)
  - Stories with Discussion Summaries / Total Stories (target: 100%)
  - Manual qualitative check: relevance and coherence of summaries in the digest.

## Scope and Requirements (MVP / Current Version)

### Functional Requirements (High-Level)

- **HN Story Fetching:** Retrieve IDs and metadata (title, URL, HN link) for the top 10 stories from the Algolia HN Search API.
- **HN Comment Fetching:** For each story, retrieve comments from the Algolia HN Search API up to a maximum count defined in a `.env` configuration variable (`MAX_COMMENTS_PER_STORY`, default 50). A hedged fetch-and-persist sketch follows this list.
- **Article Content Scraping:** Attempt to fetch the HTML and extract the main text content from the story's external URL using basic methods (e.g., the Node.js native fetch API, optionally `article-extractor` or a similar basic library).
- **Scraping Failure Handling:** If scraping fails, log the error and proceed with generating only the Discussion Summary for that story.
- **LLM Summarization:**
  - Generate an "Article Summary" from scraped text (if successful) using a configured local LLM (Ollama endpoint).
  - Generate a "Discussion Summary" from the fetched comments using the same LLM.
  - Initial prompts (placeholders, to be refined in the epics):
    - _Article Prompt:_ "Summarize the key points of the following article text: {Article Text}"
    - _Discussion Prompt:_ "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}"
- **Digest Formatting:** Combine results for the 10 stories into a single HTML email. Each story entry should include: Story Title, HN Link, Article Link, Article Summary (if available), and Discussion Summary.
- **Email Dispatch:** Send the formatted HTML email using Nodemailer to a recipient list defined in `.env`, using credentials also stored in `.env`.
- **Main Execution Trigger:** Initiate the _entire implemented pipeline_ via a manual command-line interface (CLI) trigger, using the standard scripts defined in the boilerplate (`npm run dev`, `npm start` after build). Each functional epic should add its capability to this main execution flow.
- **Configuration:** Manage external parameters (Algolia API details if needed, LLM endpoint URL, `MAX_COMMENTS_PER_STORY`, Nodemailer credentials, recipient email list, output directory path) via a `.env` file, based on the provided `.env.example`.
- **Incremental Logging & Data Persistence:**
  - Implement basic console logging for key steps and errors throughout the pipeline.
  - Persist intermediate data artifacts (fetched stories/comments, scraped text, generated summaries) to local files within a configurable, date-stamped directory structure (e.g., `./output/YYYY-MM-DD/`).
  - This persistence should be implemented incrementally within the relevant functional epics (Data Acquisition, Scraping, Summarization).
- **Stage Testing Utilities:**
  - Provide separate utility scripts or CLI commands to allow testing individual pipeline stages in isolation (e.g., fetching HN data, scraping URLs, summarizing text, sending email).
  - These utilities should support using locally saved files as input (e.g., test scraping using a file containing story URLs, test summarization using a file containing text). This facilitates development and debugging.
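
To make the fetch-and-persist requirements concrete, here is a minimal sketch using the public Algolia HN Search API endpoints and Node.js v22's global `fetch`. The module path, function names, and persisted file layout are illustrative assumptions, not prescribed by this PRD.

```typescript
// Hypothetical module src/clients/algoliaHN.ts -- endpoint shapes follow the
// public Algolia HN Search API; file layout and names are illustrative only.
import { mkdir, writeFile } from "node:fs/promises";
import path from "node:path";

const API_BASE = "https://hn.algolia.com/api/v1";

interface Story {
  objectID: string;
  title: string;
  url?: string; // absent for Ask HN / text-only posts
}

// Fetch metadata for the current top-10 front-page stories.
async function fetchTopStories(): Promise<Story[]> {
  const res = await fetch(`${API_BASE}/search?tags=front_page&hitsPerPage=10`);
  if (!res.ok) throw new Error(`Story fetch failed: HTTP ${res.status}`);
  const body = (await res.json()) as { hits: Story[] };
  return body.hits;
}

// Fetch up to `max` comments for one story (MAX_COMMENTS_PER_STORY from .env).
async function fetchComments(storyId: string, max: number): Promise<unknown[]> {
  const res = await fetch(
    `${API_BASE}/search?tags=comment,story_${storyId}&hitsPerPage=${max}`,
  );
  if (!res.ok) throw new Error(`Comment fetch failed: HTTP ${res.status}`);
  const body = (await res.json()) as { hits: unknown[] };
  return body.hits;
}

// Persist raw data into the date-stamped directory, e.g. ./output/2025-05-04/.
export async function acquireHNData(outputRoot: string, maxComments: number) {
  const dateDir = path.join(outputRoot, new Date().toISOString().slice(0, 10));
  await mkdir(dateDir, { recursive: true });
  const stories = await fetchTopStories();
  for (const story of stories) {
    console.log(`Fetching comments for "${story.title}"...`);
    const comments = await fetchComments(story.objectID, maxComments);
    const file = path.join(dateDir, `${story.objectID}.json`);
    await writeFile(file, JSON.stringify({ story, comments }, null, 2), "utf8");
  }
  return stories;
}
```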

### Non-Functional Requirements (NFRs)

- **Performance:** The MVP focuses on functionality over speed. A full run should complete within a reasonable time (e.g., under 5 minutes) on a typical developer machine when using a local LLM. No specific response-time targets.
- **Scalability:** Designed for single-user, local execution. No scaling requirements for MVP.
- **Reliability/Availability:**
  - The script must handle article scraping failures gracefully (log and continue); a hedged failure-handling sketch follows this list.
  - Basic error handling for API calls (e.g., log network errors).
  - Local LLM interaction may fail; basic error logging is sufficient for MVP.
  - No requirement for automated retries or production-grade error handling.
- **Security:**
  - Email credentials must be stored securely via the `.env` file and not committed to version control (as per the boilerplate `.gitignore`).
  - No other specific security requirements for the local MVP.
- **Maintainability:**
  - Code should be well-structured TypeScript.
  - Adherence to the linting (ESLint) and formatting (Prettier) rules configured in the "bmad-boilerplate" is required. Use `npm run lint` and `npm run format`.
  - Modularity is desired to potentially swap LLM providers later and to facilitate stage testing.
- **Usability/Accessibility:** N/A (CLI tool for a developer).
- **Other Constraints:**
  - Must use TypeScript and Node.js v22.
  - Must run locally on the developer's machine.
  - Must use the Node.js v22 native `fetch` API for HTTP requests.
  - Must use the Algolia HN Search API for HN data.
  - Must use a local Ollama instance via a configurable HTTP endpoint.
  - Must use Nodemailer for email dispatch.
  - Must use `.env` for configuration, based on `.env.example`.
  - Must use the local file system for logging and intermediate data storage. Ensure output/log directories are gitignored.
  - Focus on a functional pipeline for learning/demonstration.
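
As an illustration of the log-and-continue requirement above, the sketch below returns `null` on any scraping failure so the pipeline can fall back to a discussion-only summary. The naive tag-stripping is a stand-in for a real extractor (e.g., `article-extractor`); all names are assumptions.

```typescript
// Hypothetical module src/scraper.ts -- best-effort scraping with
// log-and-continue semantics. The naive tag-stripping below is a placeholder
// for a real extraction library.
export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const res = await fetch(url, {
      signal: AbortSignal.timeout(15_000), // don't let one slow site stall the run
    });
    const contentType = res.headers.get("content-type") ?? "";
    if (!res.ok || !contentType.includes("text/html")) {
      console.warn(`Scrape skipped for ${url}: status ${res.status}, type ${contentType}`);
      return null;
    }
    const html = await res.text();
    // Placeholder extraction: drop scripts/styles/tags, collapse whitespace.
    const text = html
      .replace(/<script[\s\S]*?<\/script>/gi, " ")
      .replace(/<style[\s\S]*?<\/style>/gi, " ")
      .replace(/<[^>]+>/g, " ")
      .replace(/\s+/g, " ")
      .trim();
    return text.length > 0 ? text : null;
  } catch (err) {
    // Best-effort: log and let the caller fall back to a discussion-only summary.
    console.error(`Scrape failed for ${url}:`, err);
    return null;
  }
}
```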

### User Experience (UX) Requirements (High-Level)

- The primary UX goal is to deliver a time-saving digest.
- For the developer user, the main CLI interaction should be simple: using standard boilerplate scripts like `npm run dev` or `npm start` to trigger the full process.
- Feedback during CLI execution (e.g., "Fetching stories...", "Summarizing story X/10...", "Sending email...") is desirable via console logging.
- Separate CLI commands/scripts for testing individual stages should provide clear input/output mechanisms.

### Integration Requirements (High-Level)

- **Algolia HN Search API:** Fetching top stories and comments. Requires understanding the API structure and query parameters.
- **Ollama Service:** Sending text (article content, comments) and receiving summaries via its API endpoint. The endpoint URL must be configurable. A hedged client sketch follows this list.
- **SMTP Service (via Nodemailer):** Sending the final digest email. Requires valid SMTP credentials and a recipient list configured in `.env`.
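
A minimal sketch of the Ollama integration, assuming its `/api/generate` endpoint with streaming disabled; the endpoint URL and model name would come from `.env`, and all identifiers here are illustrative.

```typescript
// Hypothetical module src/clients/ollama.ts -- assumes Ollama's /api/generate
// endpoint with streaming disabled; endpoint and model come from .env.
export async function summarize(
  endpoint: string, // e.g. OLLAMA_ENDPOINT_URL, typically http://localhost:11434
  model: string,    // e.g. "llama3" -- whichever model is pulled locally
  prompt: string,
): Promise<string | null> {
  try {
    const res = await fetch(`${endpoint}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt, stream: false }),
    });
    if (!res.ok) {
      console.error(`Ollama request failed: HTTP ${res.status}`);
      return null;
    }
    const body = (await res.json()) as { response: string };
    return body.response.trim();
  } catch (err) {
    console.error("Ollama request error:", err);
    return null; // MVP: log and continue without this summary
  }
}
```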

### Testing Requirements (High-Level)

- MVP success relies on manual end-to-end test runs confirming successful execution and valid email output.
- Unit/integration tests are encouraged using the **Jest framework configured in the boilerplate**. Focus testing effort on the core pipeline components. Use `npm run test`. An illustrative test follows this list.
- **Stage-specific testing utilities (as defined in the Functional Requirements) are required** to support development and verification of individual pipeline components.
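
As one example of a focused Jest unit test, the sketch below exercises a hypothetical pure HTML-formatting helper; the module path and field names are assumptions, not part of this PRD.

```typescript
// Hypothetical unit test src/digest/format.test.ts -- assumes a pure
// formatStoryHtml helper; adapt the names to the real modules.
import { formatStoryHtml } from "./format";

describe("formatStoryHtml", () => {
  it("omits the article summary block when scraping failed", () => {
    const html = formatStoryHtml({
      title: "Example Story",
      hnUrl: "https://news.ycombinator.com/item?id=1",
      articleUrl: "https://example.com/post",
      articleSummary: null, // scraping failed for this story
      discussionSummary: "Commenters debated trade-offs of X and Y.",
    });
    expect(html).toContain("Example Story");
    expect(html).toContain("Commenters debated trade-offs of X and Y.");
    expect(html).not.toContain("Article Summary");
  });
});
```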

## Epic Overview (MVP / Current Version)

_(Revised proposal)_

- **Epic 1: Project Initialization & Core Setup** - Goal: Initialize the project using "bmad-boilerplate", manage dependencies, set up `.env` and config loading, establish a basic CLI entry point, and set up basic logging and the output directory structure.
- **Epic 2: HN Data Acquisition & Persistence** - Goal: Implement fetching the top 10 stories and their comments (respecting limits) from the Algolia HN Search API, and persist this raw data locally. Implement the stage testing utility for fetching.
- **Epic 3: Article Scraping & Persistence** - Goal: Implement best-effort article scraping/extraction, handle failures gracefully, and persist scraped text locally. Implement the stage testing utility for scraping.
- **Epic 4: LLM Summarization & Persistence** - Goal: Integrate with Ollama to generate article/discussion summaries from persisted data and persist the summaries locally. Implement the stage testing utility for summarization.
- **Epic 5: Digest Assembly & Email Dispatch** - Goal: Format the collected summaries into an HTML email using persisted data and send it via Nodemailer. Implement the stage testing utility for emailing (with a dry-run option). A hedged dispatch sketch follows this list.
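
For Epic 5, a minimal Nodemailer dispatch sketch; the environment variable names are illustrative assumptions and would be defined in `.env.example`.

```typescript
// Hypothetical module src/emailer.ts -- the environment variable names are
// illustrative and would be defined in .env.example.
import nodemailer from "nodemailer";

export async function sendDigest(htmlBody: string): Promise<void> {
  const transporter = nodemailer.createTransport({
    host: process.env.EMAIL_HOST,
    port: Number(process.env.EMAIL_PORT ?? 587),
    secure: process.env.EMAIL_PORT === "465", // implicit TLS on port 465
    auth: { user: process.env.EMAIL_USER, pass: process.env.EMAIL_PASS },
  });

  const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  await transporter.sendMail({
    from: process.env.EMAIL_FROM,
    to: process.env.EMAIL_RECIPIENTS, // Nodemailer accepts a comma-separated list
    subject: `BMad Hacker Daily Digest - ${today}`,
    html: htmlBody,
  });
  console.log("Digest email sent.");
}
```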

## Key Reference Documents

- `docs/project-brief.md`
- `docs/prd.md` (This document)
- `docs/architecture.md` (To be created by Architect)
- `docs/epic1.md`, `docs/epic2.md`, ... (To be created)
- `docs/tech-stack.md` (Partially defined by boilerplate, to be finalized by Architect)
- `docs/api-reference.md` (If needed for Algolia/Ollama details)
- `docs/testing-strategy.md` (Optional - low priority for MVP, Jest setup provided)

## Post-MVP / Future Enhancements

- Advanced scraping techniques (handling JavaScript, anti-bot measures).
- Processing all comments (potentially using MapReduce summarization).
- Automated scheduling (e.g., using cron).
- Database integration for storing results or tracking.
- Cloud deployment and web frontend.
- User management (sign-ups, preferences).
- Production-grade error handling, monitoring, and email deliverability.
- Fine-tuning LLM prompts or models.
- Sophisticated retry logic for API calls or scraping.
- Cloud LLM integration.

## Change Log

| Change                  | Date       | Version | Description                             | Author |
| ----------------------- | ---------- | ------- | --------------------------------------- | ------ |
| Refined Epics & Testing | 2025-05-04 | 0.3     | Removed Epic 6, added stage testing req | 2-pm   |
| Boilerplate Added       | 2025-05-04 | 0.2     | Updated to reflect use of boilerplate   | 2-pm   |
| Initial Draft           | 2025-05-04 | 0.1     | First draft based on brief              | 2-pm   |

## Initial Architect Prompt

### Technical Infrastructure

- **Starter Project/Template:** **Mandatory: use the provided "bmad-boilerplate".** This includes the TypeScript setup, Node.js v22 compatibility, Jest, ESLint, Prettier, `ts-node`, `.env` handling via `.env.example`, and standard scripts (`dev`, `build`, `test`, `lint`, `format`).
- **Hosting/Cloud Provider:** Local machine execution only for MVP. No cloud deployment.
- **Frontend Platform:** N/A (CLI tool).
- **Backend Platform:** Node.js v22 with TypeScript (as provided by the boilerplate). No specific Node.js framework is mandated, but the structure should support modularity and align with the boilerplate setup.
- **Database Requirements:** None. Local file system for intermediate data storage and logging only. Structure TBD (e.g., `./output/YYYY-MM-DD/`). Ensure the output directory is configurable via `.env` and gitignored.

### Technical Constraints

- Must adhere to the structure and tooling provided by "bmad-boilerplate".
- Must use the Node.js v22 native `fetch` API for HTTP requests.
- Must use the Algolia HN Search API for fetching HN data.
- Must integrate with a local Ollama instance via a configurable HTTP endpoint. The design should allow potential swapping to other LLM APIs later.
- Must use Nodemailer for sending email.
- Configuration (LLM endpoint, email credentials, recipients, `MAX_COMMENTS_PER_STORY`, output directory path) must be managed via a `.env` file based on `.env.example`.
- Article scraping must be basic, best-effort, and handle failures gracefully without stopping the main process.
- Intermediate data must be persisted locally and incrementally.
- Code must adhere to the ESLint and Prettier configurations within the boilerplate.

### Deployment Considerations

- Execution is manual via CLI trigger only, using `npm run dev` or `npm start`.
- No CI/CD required for MVP.
- Single environment: the local development machine.

### Local Development & Testing Requirements

- The entire application runs locally.
- The main CLI command (`npm run dev`/`npm start`) should execute the _full implemented pipeline_.
- **Separate utility scripts/commands MUST be provided** for testing individual pipeline stages (fetch, scrape, summarize, email), potentially using local file I/O. The architecture should facilitate creating these stage runners (e.g., `npm run stage:fetch`, `npm run stage:scrape -- --inputFile <path>`, `npm run stage:summarize -- --inputFile <path>`, `npm run stage:email -- --inputFile <path> [--dry-run]`). A hedged stage-runner sketch follows this list.
- The boilerplate provides `npm run test` using Jest for running automated unit/integration tests.
- The boilerplate provides `npm run lint` and `npm run format` for code quality checks.
- Basic console logging is required. File logging can be considered by the architect.
- Testability of individual modules (API clients, scraper, summarizer, emailer) is crucial and should leverage the Jest setup and the stage testing utilities.
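
One way to back those `npm run stage:*` scripts is a single dispatching entry point, sketched below with Node's built-in `parseArgs`. The file name, stage function names, and script wiring are illustrative assumptions.

```typescript
// Hypothetical stage runner src/stage.ts, exposed via package.json scripts
// such as "stage:scrape": "ts-node src/stage.ts scrape". All names are
// illustrative; each case would delegate to the matching pipeline module.
import { parseArgs } from "node:util";

async function main(): Promise<void> {
  const { positionals, values } = parseArgs({
    allowPositionals: true,
    options: {
      inputFile: { type: "string" },
      "dry-run": { type: "boolean", default: false },
    },
  });

  switch (positionals[0]) {
    case "fetch": /* await runFetchStage(); */ break;
    case "scrape": /* await runScrapeStage(values.inputFile); */ break;
    case "summarize": /* await runSummarizeStage(values.inputFile); */ break;
    case "email": /* await runEmailStage(values.inputFile, values["dry-run"]); */ break;
    default:
      console.error(
        "Usage: stage <fetch|scrape|summarize|email> [--inputFile <path>] [--dry-run]",
      );
      process.exitCode = 1;
  }
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```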

### Other Technical Considerations

- **Modularity:** Design components (HN client, scraper, LLM client, emailer) with clear interfaces to facilitate potential future modifications (e.g., changing the LLM provider) and independent stage testing.
- **Error Handling:** Focus on robust handling of scraping failures and basic handling of API/network errors. Implement within the boilerplate structure. Logging should clearly indicate errors.
- **Resource Management:** Be mindful of local resources when interacting with the LLM, although optimization is not a primary MVP goal.
- **Dependency Management:** Add necessary production dependencies (e.g., `nodemailer`, potentially `article-extractor`, and libraries for date handling or file system operations if needed) to the boilerplate's `package.json`. Keep dependencies minimal.
- **Configuration Loading:** Implement a robust way to load and validate settings from the `.env` file early in application startup. A hedged loader sketch follows this list.
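
A minimal fail-fast loader sketch, assuming `dotenv` populates `process.env` at startup; the variable names are illustrative and the real set is defined by `.env.example` in the boilerplate.

```typescript
// Hypothetical module src/config.ts -- variable names are illustrative; the
// real set is defined by .env.example in the boilerplate.
import "dotenv/config"; // populate process.env from the .env file

export interface AppConfig {
  ollamaEndpointUrl: string;
  maxCommentsPerStory: number;
  emailRecipients: string[];
  outputDir: string;
}

// Fail fast at startup if a required variable is missing.
function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required .env variable: ${name}`);
  return value;
}

export function loadConfig(): AppConfig {
  return {
    ollamaEndpointUrl: required("OLLAMA_ENDPOINT_URL"),
    maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? "50"),
    emailRecipients: required("EMAIL_RECIPIENTS")
      .split(",")
      .map((s) => s.trim()),
    outputDir: process.env.OUTPUT_DIR ?? "./output",
  };
}
```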