# BMad Hacker Daily Digest Product Requirements Document (PRD)

## Intro

The BMad Hacker Daily Digest is a command-line tool designed to address the time-consuming nature of reading extensive Hacker News (HN) comment threads. It aims to provide users with a time-efficient way to grasp the collective intelligence and key insights from discussions on top HN stories. The service will fetch the top 10 HN stories daily, retrieve a configurable number of comments for each, attempt to scrape the linked article, generate separate summaries for the article (if scraped) and the comment discussion using a local LLM, and deliver these summaries in a single daily email briefing triggered manually. This project also serves as a practical learning exercise in agent-driven development, TypeScript, Node.js, API integration, and local LLM usage, starting from the provided "bmad-boilerplate" template.

## Goals and Context

- **Project Objectives:**
  - Provide a quick, reliable, automated way to stay informed about key HN discussions without reading full threads.
  - Successfully fetch top 10 HN story metadata via the Algolia HN API.
  - Retrieve a _configurable_ number of comments per story (default 50) via the Algolia HN API.
  - Attempt basic scraping of linked article content, handling failures gracefully.
  - Generate distinct Article Summaries (if scraped) and Discussion Summaries using a local LLM (Ollama).
  - Assemble summaries for 10 stories into an HTML email and send via Nodemailer upon manual CLI trigger.
  - Serve as a learning platform for agent-driven development, TypeScript, Node.js v22, API integration, local LLMs, and configuration management, leveraging the "bmad-boilerplate" structure and tooling.
- **Measurable Outcomes:**
  - The tool completes its full process (fetch, scrape attempt, summarize, email) without crashing on manual CLI trigger across multiple test runs.
  - The generated email digest consistently contains results for 10 stories, including correct links, discussion summaries, and article summaries where scraping was successful.
  - Errors during article scraping are logged, and the process continues for affected stories using only comment summaries, without halting the script.
- **Success Criteria:**
  - Successful execution of the end-to-end process via CLI trigger for 3 consecutive test runs.
  - Generated email is successfully sent and received, containing summaries for all 10 fetched stories (article summary optional based on scraping success).
  - Scraping failures are logged appropriately without stopping the overall process.
- **Key Performance Indicators (KPIs):**
  - Successful Runs / Total Runs (Target: 100% for MVP tests)
  - Stories with Article Summaries / Total Stories (measures scraping effectiveness)
  - Stories with Discussion Summaries / Total Stories (Target: 100%)
  - Manual Qualitative Check: Relevance and coherence of summaries in the digest.

## Scope and Requirements (MVP / Current Version)

### Functional Requirements (High-Level)

- **HN Story Fetching:** Retrieve IDs and metadata (title, URL, HN link) for the top 10 stories from the Algolia HN Search API (see the fetch sketch after this list).
- **HN Comment Fetching:** For each story, retrieve comments from the Algolia HN Search API up to a maximum count defined in a `.env` configuration variable (`MAX_COMMENTS_PER_STORY`, default 50).
- **Article Content Scraping:** Attempt to fetch HTML and extract main text content from the story's external URL using basic methods (e.g., Node.js native fetch, optionally `article-extractor` or a similar basic library); a scraping sketch follows this list.
- **Scraping Failure Handling:** If scraping fails, log the error and proceed with generating only the Discussion Summary for that story.
- **LLM Summarization:**
  - Generate an "Article Summary" from scraped text (if successful) using a configured local LLM (Ollama endpoint).
  - Generate a "Discussion Summary" from the fetched comments using the same LLM.
  - Initial Prompts (placeholders - refine in Epics):
    - _Article Prompt:_ "Summarize the key points of the following article text: {Article Text}"
    - _Discussion Prompt:_ "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}"
- **Digest Formatting:** Combine results for the 10 stories into a single HTML email. Each story entry should include: Story Title, HN Link, Article Link, Article Summary (if available), and Discussion Summary.
- **Email Dispatch:** Send the formatted HTML email using Nodemailer to a recipient list defined in `.env`. Use credentials also stored in `.env`.
- **Main Execution Trigger:** Initiate the _entire implemented pipeline_ via a manual command-line interface (CLI) trigger, using the standard scripts defined in the boilerplate (`npm run dev`, `npm start` after build). Each functional epic should add its capability to this main execution flow.
- **Configuration:** Manage external parameters (Algolia API details if needed, LLM endpoint URL, `MAX_COMMENTS_PER_STORY`, Nodemailer credentials, recipient email list, output directory path) via a `.env` file, based on the provided `.env.example`.
- **Incremental Logging & Data Persistence:**
  - Implement basic console logging for key steps and errors throughout the pipeline.
  - Persist intermediate data artifacts (fetched stories/comments, scraped text, generated summaries) to local files within a configurable, date-stamped directory structure (e.g., `./output/YYYY-MM-DD/`); a directory-helper sketch follows this list.
  - This persistence should be implemented incrementally within the relevant functional epics (Data Acquisition, Scraping, Summarization).
- **Stage Testing Utilities:**
  - Provide separate utility scripts or CLI commands to allow testing individual pipeline stages in isolation (e.g., fetching HN data, scraping URLs, summarizing text, sending email).
  - These utilities should support using locally saved files as input (e.g., test scraping using a file containing story URLs, test summarization using a file containing text). This facilitates development and debugging.
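To ground the two fetching requirements, here is a minimal TypeScript sketch using Node's native fetch against the public Algolia HN Search API. The `front_page` and `comment,story_<id>` tag queries and the `hitsPerPage` parameter are real Algolia conventions; the `Story` type, function names, and error handling are illustrative assumptions, not a prescribed design.

```typescript
// Sketch only: the field names (objectID, title, url, comment_text) match the
// Algolia HN Search API response; everything else here is hypothetical.
interface Story {
  objectID: string; // HN item id; HN link is https://news.ycombinator.com/item?id=<objectID>
  title: string;
  url: string | null; // null for self-posts (e.g., Ask HN)
}

const ALGOLIA_BASE = "https://hn.algolia.com/api/v1";

export async function fetchTopStories(): Promise<Story[]> {
  const res = await fetch(`${ALGOLIA_BASE}/search?tags=front_page&hitsPerPage=10`);
  if (!res.ok) throw new Error(`Algolia story fetch failed: ${res.status}`);
  const { hits } = (await res.json()) as { hits: Story[] };
  return hits;
}

export async function fetchComments(storyId: string, max = 50): Promise<string[]> {
  // max would come from MAX_COMMENTS_PER_STORY in .env
  const res = await fetch(
    `${ALGOLIA_BASE}/search?tags=comment,story_${storyId}&hitsPerPage=${max}`
  );
  if (!res.ok) throw new Error(`Algolia comment fetch failed: ${res.status}`);
  const { hits } = (await res.json()) as { hits: { comment_text: string | null }[] };
  return hits.map((h) => h.comment_text ?? "").filter(Boolean);
}
```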
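For the scraping and failure-handling requirements, a best-effort sketch that returns `null` on any failure so the pipeline can continue with a discussion-only summary. It is shown with `@extractus/article-extractor`, one candidate for the basic extraction library the requirement leaves open; treating that choice as an assumption, any similar library (or plain fetch plus naive extraction) would fit the same shape.

```typescript
// Assumed library choice: @extractus/article-extractor exposes extract(url),
// which resolves to an article object (with a `content` field) or null.
import { extract } from "@extractus/article-extractor";

export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const article = await extract(url);
    return article?.content ?? null;
  } catch (err) {
    // Log and continue: the story falls back to a Discussion Summary only.
    console.error(`Scraping failed for ${url}:`, err);
    return null;
  }
}
```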
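The date-stamped persistence layout can be handled by one small helper; `OUTPUT_DIR` is an assumed variable name pending the final `.env.example`.

```typescript
import fs from "node:fs";
import path from "node:path";

// Resolve and create the date-stamped output directory,
// e.g. ./output/2025-05-04/ (OUTPUT_DIR is a placeholder env var).
export function ensureDateDir(baseDir = process.env.OUTPUT_DIR ?? "./output"): string {
  const stamp = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const dir = path.join(baseDir, stamp);
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}
```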
### Non-Functional Requirements (NFRs)

- **Performance:** MVP focuses on functionality over speed. A full run should complete within a reasonable time (e.g., < 5 minutes) on a typical developer machine using a local LLM. No specific response-time targets.
- **Scalability:** Designed for single-user, local execution. No scaling requirements for MVP.
- **Reliability/Availability:**
  - The script must handle article scraping failures gracefully (log and continue).
  - Basic error handling for API calls (e.g., log network errors).
  - Local LLM interaction may fail; basic error logging is sufficient for MVP.
  - No requirement for automated retries or production-grade error handling.
- **Security:**
  - Email credentials must be stored securely via the `.env` file and not committed to version control (as per the boilerplate `.gitignore`).
  - No other specific security requirements for the local MVP.
- **Maintainability:**
  - Code should be well-structured TypeScript.
  - Adherence to the linting (ESLint) and formatting (Prettier) rules configured in the "bmad-boilerplate" is required. Use `npm run lint` and `npm run format`.
  - Modularity is desired to allow swapping LLM providers later and to facilitate stage testing.
- **Usability/Accessibility:** N/A (CLI tool for a developer).
- **Other Constraints:**
  - Must use TypeScript and Node.js v22.
  - Must run locally on the developer's machine.
  - Must use the Node.js v22 native `fetch` API for HTTP requests.
  - Must use the Algolia HN Search API for HN data.
  - Must use a local Ollama instance via a configurable HTTP endpoint.
  - Must use Nodemailer for email dispatch.
  - Must use `.env` for configuration based on `.env.example` (a possible shape is sketched after this section).
  - Must use the local file system for logging and intermediate data storage. Ensure output/log directories are gitignored.
  - Focus on a functional pipeline for learning/demonstration.
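For orientation only, one possible shape for `.env.example`. Of these keys, only `MAX_COMMENTS_PER_STORY` is fixed by this PRD; every other variable name is a placeholder for the Architect to finalize.

```
# Illustrative .env.example - all names except MAX_COMMENTS_PER_STORY are placeholders
OLLAMA_ENDPOINT_URL=http://localhost:11434
MAX_COMMENTS_PER_STORY=50
OUTPUT_DIR=./output
EMAIL_HOST=smtp.example.com
EMAIL_PORT=587
EMAIL_USER=user@example.com
EMAIL_PASS=change-me
EMAIL_FROM=digest@example.com
EMAIL_RECIPIENTS=you@example.com,team@example.com
```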
### User Experience (UX) Requirements (High-Level)

- The primary UX goal is to deliver a time-saving digest.
- For the developer user, the main CLI interaction should be simple: using standard boilerplate scripts like `npm run dev` or `npm start` to trigger the full process.
- Feedback during CLI execution (e.g., "Fetching stories...", "Summarizing story X/10...", "Sending email...") is desirable via console logging.
- Separate CLI commands/scripts for testing individual stages should provide clear input/output mechanisms.

### Integration Requirements (High-Level)

- **Algolia HN Search API:** Fetching top stories and comments. Requires understanding the API structure and query parameters.
- **Ollama Service:** Sending text (article content, comments) and receiving summaries via its API endpoint. The endpoint URL must be configurable (see the sketch after this list).
- **SMTP Service (via Nodemailer):** Sending the final digest email. Requires valid SMTP credentials and a recipient list configured in `.env` (see the second sketch after this list).
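A minimal sketch of the Ollama integration, assuming Ollama's standard `/api/generate` endpoint with streaming disabled. The model name and the `OLLAMA_ENDPOINT_URL` variable are placeholders.

```typescript
// Sends a prompt (article text or concatenated comments, wrapped in one of
// the initial prompt templates) to a local Ollama instance.
export async function summarize(prompt: string): Promise<string> {
  const endpoint = process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434";
  const res = await fetch(`${endpoint}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt, stream: false }), // model is a placeholder
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```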
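And a corresponding sketch of the SMTP dispatch via Nodemailer; all `EMAIL_*` variable names are assumptions to be finalized in `.env.example`.

```typescript
import nodemailer from "nodemailer";

// Sends the assembled HTML digest to the configured recipient list.
export async function sendDigest(html: string): Promise<void> {
  const transporter = nodemailer.createTransport({
    host: process.env.EMAIL_HOST,
    port: Number(process.env.EMAIL_PORT ?? 587),
    auth: { user: process.env.EMAIL_USER, pass: process.env.EMAIL_PASS },
  });
  await transporter.sendMail({
    from: process.env.EMAIL_FROM,
    to: process.env.EMAIL_RECIPIENTS, // comma-separated list from .env
    subject: `BMad Hacker Daily Digest - ${new Date().toISOString().slice(0, 10)}`,
    html,
  });
}
```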
### Testing Requirements (High-Level)

- MVP success relies on manual end-to-end test runs confirming successful execution and valid email output.
- Unit/integration tests are encouraged using the **Jest framework configured in the boilerplate**. Focus testing effort on the core pipeline components. Use `npm run test`.
- **Stage-specific testing utilities (as defined in Functional Requirements) are required** to support development and verification of individual pipeline components.

## Epic Overview (MVP / Current Version)

_(Revised proposal)_

- **Epic 1: Project Initialization & Core Setup**
  - Goal: Initialize the project using "bmad-boilerplate", manage dependencies, set up `.env` and config loading, establish a basic CLI entry point, and set up basic logging and the output directory structure.
- **Epic 2: HN Data Acquisition & Persistence**
  - Goal: Implement fetching the top 10 stories and their comments (respecting limits) from the Algolia HN API, and persist this raw data locally. Implement the stage testing utility for fetching.
- **Epic 3: Article Scraping & Persistence**
  - Goal: Implement best-effort article scraping/extraction, handle failures gracefully, and persist scraped text locally. Implement the stage testing utility for scraping.
- **Epic 4: LLM Summarization & Persistence**
  - Goal: Integrate with Ollama to generate article/discussion summaries from persisted data and persist summaries locally. Implement the stage testing utility for summarization.
- **Epic 5: Digest Assembly & Email Dispatch**
  - Goal: Format collected summaries into an HTML email using persisted data and send it using Nodemailer. Implement the stage testing utility for emailing (with a dry-run option).

## Key Reference Documents

- `docs/project-brief.md`
- `docs/prd.md` (this document)
- `docs/architecture.md` (to be created by the Architect)
- `docs/epic1.md`, `docs/epic2.md`, ... (to be created)
- `docs/tech-stack.md` (partially defined by the boilerplate, to be finalized by the Architect)
- `docs/api-reference.md` (if needed for Algolia/Ollama details)
- `docs/testing-strategy.md` (optional - low priority for MVP, Jest setup provided)

## Post-MVP / Future Enhancements

- Advanced scraping techniques (handling JavaScript, anti-bot measures).
- Processing all comments (potentially using MapReduce-style summarization).
- Automated scheduling (e.g., using cron).
- Database integration for storing results or tracking.
- Cloud deployment and a web frontend.
- User management (sign-ups, preferences).
- Production-grade error handling, monitoring, and email deliverability.
- Fine-tuning LLM prompts or models.
- Sophisticated retry logic for API calls or scraping.
- Cloud LLM integration.

## Change Log

| Change                  | Date       | Version | Description                             | Author |
| ----------------------- | ---------- | ------- | --------------------------------------- | ------ |
| Refined Epics & Testing | 2025-05-04 | 0.3     | Removed Epic 6, added stage testing req | 2-pm   |
| Boilerplate Added       | 2025-05-04 | 0.2     | Updated to reflect use of boilerplate   | 2-pm   |
| Initial Draft           | 2025-05-04 | 0.1     | First draft based on brief              | 2-pm   |

## Initial Architect Prompt

### Technical Infrastructure

- **Starter Project/Template:** **Mandatory: Use the provided "bmad-boilerplate".** This includes the TypeScript setup, Node.js v22 compatibility, Jest, ESLint, Prettier, `ts-node`, `.env` handling via `.env.example`, and standard scripts (`dev`, `build`, `test`, `lint`, `format`).
- **Hosting/Cloud Provider:** Local machine execution only for MVP. No cloud deployment.
- **Frontend Platform:** N/A (CLI tool).
- **Backend Platform:** Node.js v22 with TypeScript (as provided by the boilerplate). No specific Node.js framework is mandated, but the structure should support modularity and align with the boilerplate setup.
- **Database Requirements:** None. Local file system for intermediate data storage and logging only. Structure TBD (e.g., `./output/YYYY-MM-DD/`). Ensure the output directory is configurable via `.env` and gitignored.

### Technical Constraints

- Must adhere to the structure and tooling provided by "bmad-boilerplate".
- Must use Node.js v22 native `fetch` for HTTP requests.
- Must use the Algolia HN Search API for fetching HN data.
- Must integrate with a local Ollama instance via a configurable HTTP endpoint. The design should allow potential swapping to other LLM APIs later.
- Must use Nodemailer for sending email.
- Configuration (LLM endpoint, email credentials, recipients, `MAX_COMMENTS_PER_STORY`, output dir path) must be managed via a `.env` file based on `.env.example`.
- Article scraping must be basic, best-effort, and handle failures gracefully without stopping the main process.
- Intermediate data must be persisted locally incrementally.
- Code must adhere to the ESLint and Prettier configurations within the boilerplate.

### Deployment Considerations

- Execution is manual via CLI trigger only, using `npm run dev` or `npm start`.
- No CI/CD required for MVP.
- Single environment: local development machine.

### Local Development & Testing Requirements

- The entire application runs locally.
- The main CLI command (`npm run dev`/`npm start`) should execute the _full implemented pipeline_.
- **Separate utility scripts/commands MUST be provided** for testing individual pipeline stages (fetch, scrape, summarize, email), potentially using local file I/O (e.g., `npm run stage:fetch`, `npm run stage:scrape -- --inputFile <path>`, `npm run stage:summarize -- --inputFile <path>`, `npm run stage:email -- --inputFile <path> [--dry-run]`). The architecture should facilitate creating these stage runners; a minimal argument-parsing sketch follows this list.
- The boilerplate provides `npm run test` using Jest for running automated unit/integration tests.
- The boilerplate provides `npm run lint` and `npm run format` for code quality checks.
- Basic console logging is required. File logging can be considered by the architect.
- Testability of individual modules (API clients, scraper, summarizer, emailer) is crucial and should leverage the Jest setup and stage testing utilities.
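A sketch of shared stage-runner argument handling using Node's built-in `util.parseArgs` (available in Node.js v22); the script wiring and input file shape are assumptions for the Architect to refine.

```typescript
import { parseArgs } from "node:util";
import { readFile } from "node:fs/promises";

// Hypothetical entry point for a stage runner, e.g. invoked via
//   npm run stage:summarize -- --inputFile ./output/2025-05-04/comments.json
async function main(): Promise<void> {
  const { values } = parseArgs({
    options: {
      inputFile: { type: "string" },
      "dry-run": { type: "boolean", default: false },
    },
  });
  if (!values.inputFile) {
    console.error("Usage: --inputFile <path> [--dry-run]");
    process.exit(1);
  }
  const input = await readFile(values.inputFile, "utf-8");
  console.log(`Loaded ${input.length} bytes; dry-run=${values["dry-run"]}`);
  // ...invoke the stage-specific module here (fetch, scrape, summarize, email)...
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```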
### Other Technical Considerations

- **Modularity:** Design components (HN client, scraper, LLM client, emailer) with clear interfaces to facilitate potential future modifications (e.g., changing the LLM provider) and independent stage testing.
- **Error Handling:** Focus on robust handling of scraping failures and basic handling of API/network errors. Implement within the boilerplate structure. Logging should clearly indicate errors.
- **Resource Management:** Be mindful of local resources when interacting with the LLM, although optimization is not a primary MVP goal.
- **Dependency Management:** Add necessary production dependencies (e.g., `nodemailer`, potentially `article-extractor`, and libraries for date handling or file system operations if needed) to the boilerplate's `package.json`. Keep dependencies minimal.
- **Configuration Loading:** Implement a robust way to load and validate settings from the `.env` file early in application startup (see the sketch below).
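Finally, a fail-fast configuration-loading sketch, assuming `dotenv` (or whatever `.env` handling the boilerplate provides) and the placeholder variable names used in the earlier sketches.

```typescript
import "dotenv/config"; // assumed: loads .env before anything reads process.env

// Throw at startup if a required variable is missing, rather than midway
// through the pipeline.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required env var: ${name}`);
  return value;
}

export const config = {
  ollamaEndpoint: requireEnv("OLLAMA_ENDPOINT_URL"),
  maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? 50),
  outputDir: process.env.OUTPUT_DIR ?? "./output",
  emailRecipients: requireEnv("EMAIL_RECIPIENTS").split(","),
};
```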