# BMad Hacker Daily Digest Product Requirements Document (PRD)
## Intro
The BMad Hacker Daily Digest is a command-line tool designed to address the time-consuming nature of reading extensive Hacker News (HN) comment threads. It aims to provide users with a time-efficient way to grasp the collective intelligence and key insights from discussions on top HN stories. The service will fetch the top 10 HN stories daily, retrieve a configurable number of comments for each, attempt to scrape the linked article, generate separate summaries for the article (if scraped) and the comment discussion using a local LLM, and deliver these summaries in a single daily email briefing triggered manually. This project also serves as a practical learning exercise in agent-driven development, TypeScript, Node.js, API integration, and local LLM usage, starting from the provided "bmad-boilerplate" template.
## Goals and Context
- Project Objectives:
  - Provide a quick, reliable, automated way to stay informed about key HN discussions without reading full threads.
  - Successfully fetch top 10 HN story metadata via the Algolia HN Search API.
  - Retrieve a configurable number of comments per story (default 50) via the Algolia HN Search API.
  - Attempt basic scraping of linked article content, handling failures gracefully.
  - Generate distinct Article Summaries (if scraped) and Discussion Summaries using a local LLM (Ollama).
  - Assemble summaries for 10 stories into an HTML email and send via Nodemailer upon manual CLI trigger.
  - Serve as a learning platform for agent-driven development, TypeScript, Node.js v22, API integration, local LLMs, and configuration management, leveraging the "bmad-boilerplate" structure and tooling.
- Measurable Outcomes:
  - The tool completes its full process (fetch, scrape attempt, summarize, email) without crashing on manual CLI trigger across multiple test runs.
  - The generated email digest consistently contains results for 10 stories, including correct links, discussion summaries, and article summaries where scraping was successful.
  - Errors during article scraping are logged, and the process continues for affected stories using only comment summaries, without halting the script.
- Success Criteria:
  - Successful execution of the end-to-end process via CLI trigger for 3 consecutive test runs.
  - The generated email is successfully sent and received, containing summaries for all 10 fetched stories (article summary optional based on scraping success).
  - Scraping failures are logged appropriately without stopping the overall process.
- Key Performance Indicators (KPIs):
  - Successful Runs / Total Runs (Target: 100% for MVP tests)
  - Stories with Article Summaries / Total Stories (Measures scraping effectiveness)
  - Stories with Discussion Summaries / Total Stories (Target: 100%)
  - Manual Qualitative Check: Relevance and coherence of summaries in the digest.
## Scope and Requirements (MVP / Current Version)
### Functional Requirements (High-Level)
- HN Story Fetching: Retrieve IDs and metadata (title, URL, HN link) for the top 10 stories from Algolia HN Search API.
- HN Comment Fetching: For each story, retrieve comments from the Algolia HN Search API up to a maximum count defined in a `.env` configuration variable (`MAX_COMMENTS_PER_STORY`, default 50); see the fetch sketch after this list.
- Article Content Scraping: Attempt to fetch HTML and extract main text content from the story's external URL using basic methods (e.g., Node.js native `fetch`, optionally `article-extractor` or a similar basic library).
- Scraping Failure Handling: If scraping fails, log the error and proceed with generating only the Discussion Summary for that story.
- LLM Summarization:
  - Generate an "Article Summary" from scraped text (if successful) using a configured local LLM (Ollama endpoint).
  - Generate a "Discussion Summary" from the fetched comments using the same LLM.
  - Initial Prompts (Placeholders - refine in Epics):
    - Article Prompt: "Summarize the key points of the following article text: {Article Text}"
    - Discussion Prompt: "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}"
- Digest Formatting: Combine results for the 10 stories into a single HTML email. Each story entry should include: Story Title, HN Link, Article Link, Article Summary (if available), Discussion Summary.
- Email Dispatch: Send the formatted HTML email using Nodemailer to a recipient list defined in `.env`. Use credentials also stored in `.env`.
- Main Execution Trigger: Initiate the entire implemented pipeline via a manual command-line interface (CLI) trigger, using the standard scripts defined in the boilerplate (`npm run dev`, `npm start` after build). Each functional epic should add its capability to this main execution flow.
- Configuration: Manage external parameters (Algolia API details (if needed), LLM endpoint URL, `MAX_COMMENTS_PER_STORY`, Nodemailer credentials, recipient email list, output directory path) via a `.env` file, based on the provided `.env.example`.
- Incremental Logging & Data Persistence:
  - Implement basic console logging for key steps and errors throughout the pipeline.
  - Persist intermediate data artifacts (fetched stories/comments, scraped text, generated summaries) to local files within a configurable, date-stamped directory structure (e.g., `./output/YYYY-MM-DD/`).
  - This persistence should be implemented incrementally within the relevant functional epics (Data Acquisition, Scraping, Summarization).
- Stage Testing Utilities:
  - Provide separate utility scripts or CLI commands to allow testing individual pipeline stages in isolation (e.g., fetching HN data, scraping URLs, summarizing text, sending email).
  - These utilities should support using locally saved files as input (e.g., test scraping using a file containing story URLs, test summarization using a file containing text). This facilitates development and debugging.
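To make the data acquisition requirement concrete, the sketch below uses the Node.js native `fetch` API against the public Algolia HN Search API. The endpoint paths and response fields (`objectID`, `title`, `url`, `comment_text`) belong to that API; the function names, types, and module layout are illustrative only, not the mandated design.

```typescript
// hnClient.ts - illustrative sketch, not the final module layout
const ALGOLIA_BASE = 'https://hn.algolia.com/api/v1';

interface StoryHit {
  objectID: string;
  title: string;
  url: string | null;
}

interface CommentHit {
  comment_text: string | null;
  author: string | null;
}

// Fetch metadata for the current front-page stories (top 10 by default).
export async function fetchTopStories(count = 10): Promise<StoryHit[]> {
  const res = await fetch(`${ALGOLIA_BASE}/search?tags=front_page&hitsPerPage=${count}`);
  if (!res.ok) throw new Error(`Algolia story request failed: ${res.status}`);
  const body = (await res.json()) as { hits: StoryHit[] };
  return body.hits;
}

// Fetch up to MAX_COMMENTS_PER_STORY comments for one story.
export async function fetchComments(storyId: string, maxComments: number): Promise<CommentHit[]> {
  const res = await fetch(
    `${ALGOLIA_BASE}/search?tags=comment,story_${storyId}&hitsPerPage=${maxComments}`,
  );
  if (!res.ok) throw new Error(`Algolia comment request failed: ${res.status}`);
  const body = (await res.json()) as { hits: CommentHit[] };
  return body.hits;
}

// HN discussion link for a digest entry.
export const hnLink = (storyId: string) => `https://news.ycombinator.com/item?id=${storyId}`;
```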
### Non-Functional Requirements (NFRs)
- Performance: MVP focuses on functionality over speed. Should complete within a reasonable time (e.g., < 5 minutes) on a typical developer machine for local LLM use. No specific response time targets.
- Scalability: Designed for single-user, local execution. No scaling requirements for MVP.
- Reliability/Availability:
  - The script must handle article scraping failures gracefully (log and continue); see the sketch after this list.
  - Basic error handling for API calls (e.g., log network errors).
  - Local LLM interaction may fail; basic error logging is sufficient for MVP.
  - No requirement for automated retries or production-grade error handling.
- Security:
  - Email credentials must be stored securely via the `.env` file and not committed to version control (as per the boilerplate `.gitignore`).
  - No other specific security requirements for local MVP.
- Maintainability:
  - Code should be well-structured TypeScript.
  - Adherence to the linting (ESLint) and formatting (Prettier) rules configured in the "bmad-boilerplate" is required. Use `npm run lint` and `npm run format`.
  - Modularity is desired to potentially swap LLM providers later and facilitate stage testing.
- Usability/Accessibility: N/A (CLI tool for developer).
- Other Constraints:
  - Must use TypeScript and Node.js v22.
  - Must run locally on the developer's machine.
  - Must use the Node.js v22 native `fetch` API for HTTP requests.
  - Must use the Algolia HN Search API for HN data.
  - Must use a local Ollama instance via a configurable HTTP endpoint.
  - Must use Nodemailer for email dispatch.
  - Must use `.env` for configuration based on `.env.example`.
  - Must use the local file system for logging and intermediate data storage. Ensure output/log directories are gitignored.
  - Focus on a functional pipeline for learning/demonstration.
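As an illustration of the graceful-failure constraint, the sketch below wraps a best-effort scrape in a try/catch that logs and returns `null`, letting the pipeline fall back to a discussion-only summary. The naive tag-stripping is only a stand-in for whatever basic extraction is ultimately chosen (e.g., the `article-extractor` library mentioned earlier); the function name is illustrative.

```typescript
// scraper.ts - best-effort article scraping sketch
export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const res = await fetch(url, { redirect: 'follow' });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const contentType = res.headers.get('content-type') ?? '';
    if (!contentType.includes('text/html')) throw new Error(`Unsupported content type: ${contentType}`);
    const html = await res.text();
    // Naive extraction: drop scripts/styles, strip tags, collapse whitespace.
    const text = html
      .replace(/<script[\s\S]*?<\/script>/gi, ' ')
      .replace(/<style[\s\S]*?<\/style>/gi, ' ')
      .replace(/<[^>]+>/g, ' ')
      .replace(/\s+/g, ' ')
      .trim();
    return text.length > 0 ? text : null;
  } catch (err) {
    // Log and continue: the story proceeds with a Discussion Summary only.
    console.error(`Scraping failed for ${url}:`, err);
    return null;
  }
}
```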
### User Experience (UX) Requirements (High-Level)
- The primary UX goal is to deliver a time-saving digest.
- For the developer user, the main CLI interaction should be simple: using standard boilerplate scripts like `npm run dev` or `npm start` to trigger the full process.
- Feedback during CLI execution (e.g., "Fetching stories...", "Summarizing story X/10...", "Sending email...") is desirable via console logging.
- Separate CLI commands/scripts for testing individual stages should provide clear input/output mechanisms.
### Integration Requirements (High-Level)
- Algolia HN Search API: Fetching top stories and comments. Requires understanding API structure and query parameters.
- Ollama Service: Sending text (article content, comments) and receiving summaries via its API endpoint. The endpoint URL must be configurable (see the sketch after this list).
- SMTP Service (via Nodemailer): Sending the final digest email. Requires valid SMTP credentials and a recipient list configured in `.env`.
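For the Ollama integration, a minimal non-streaming call to its `/api/generate` HTTP endpoint could look like the sketch below. `OLLAMA_ENDPOINT_URL` and `OLLAMA_MODEL` are illustrative `.env` variable names, and the commented usage reuses the placeholder prompts from the Functional Requirements.

```typescript
// llmClient.ts - sketch of a non-streaming call to a local Ollama instance
const OLLAMA_ENDPOINT = process.env.OLLAMA_ENDPOINT_URL ?? 'http://localhost:11434';
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? 'llama3';

export async function summarize(prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_ENDPOINT}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: OLLAMA_MODEL, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const body = (await res.json()) as { response: string };
  return body.response.trim();
}

// Usage with the placeholder prompts:
// const articleSummary = await summarize(`Summarize the key points of the following article text: ${articleText}`);
// const discussionSummary = await summarize(`Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: ${commentTexts}`);
```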
### Testing Requirements (High-Level)
- MVP success relies on manual end-to-end test runs confirming successful execution and valid email output.
- Unit/integration tests are encouraged using the Jest framework configured in the boilerplate. Focus testing effort on the core pipeline components. Use `npm run test` (see the example after this list).
- Stage-specific testing utilities (as defined in the Functional Requirements) are required to support development and verification of individual pipeline components.
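As a small example of the kind of unit test intended here, the following Jest spec exercises the hypothetical `hnLink` helper from the fetch sketch in the Functional Requirements.

```typescript
// hnClient.test.ts - example unit test against the illustrative hnLink helper
import { hnLink } from './hnClient';

describe('hnLink', () => {
  it('builds the HN discussion URL from a story id', () => {
    expect(hnLink('43012345')).toBe('https://news.ycombinator.com/item?id=43012345');
  });
});
```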
## Epic Overview (MVP / Current Version)
(Revised proposal)
- Epic 1: Project Initialization & Core Setup - Goal: Initialize the project using "bmad-boilerplate", manage dependencies, set up `.env` and config loading, establish a basic CLI entry point, and set up basic logging and the output directory structure.
- Epic 2: HN Data Acquisition & Persistence - Goal: Implement fetching top 10 stories and their comments (respecting limits) from the Algolia HN API, and persist this raw data locally. Implement stage testing utility for fetching.
- Epic 3: Article Scraping & Persistence - Goal: Implement best-effort article scraping/extraction, handle failures gracefully, and persist scraped text locally. Implement stage testing utility for scraping.
- Epic 4: LLM Summarization & Persistence - Goal: Integrate with Ollama to generate article/discussion summaries from persisted data and persist summaries locally. Implement stage testing utility for summarization.
- Epic 5: Digest Assembly & Email Dispatch - Goal: Format collected summaries into an HTML email using persisted data and send it using Nodemailer. Implement stage testing utility for emailing (with dry-run option).
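Anticipating Epic 5, a minimal Nodemailer dispatch could look like the sketch below; the `SMTP_*` and `DIGEST_RECIPIENTS` variable names are illustrative, and the HTML body is assumed to be assembled elsewhere.

```typescript
// emailer.ts - sketch of digest dispatch via Nodemailer
// SMTP_* and DIGEST_RECIPIENTS are illustrative .env variable names.
import nodemailer from 'nodemailer';

export async function sendDigest(subject: string, htmlBody: string): Promise<void> {
  const transporter = nodemailer.createTransport({
    host: process.env.SMTP_HOST,
    port: Number(process.env.SMTP_PORT ?? 587),
    auth: {
      user: process.env.SMTP_USER,
      pass: process.env.SMTP_PASS,
    },
  });

  await transporter.sendMail({
    from: process.env.SMTP_USER,
    to: process.env.DIGEST_RECIPIENTS, // Nodemailer accepts a comma-separated recipient list
    subject,
    html: htmlBody,
  });
}
```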
## Key Reference Documents
- `docs/project-brief.md`
- `docs/prd.md` (this document)
- `docs/architecture.md` (to be created by the Architect)
- `docs/epic1.md`, `docs/epic2.md`, ... (to be created)
- `docs/tech-stack.md` (partially defined by the boilerplate, to be finalized by the Architect)
- `docs/api-reference.md` (if needed for Algolia/Ollama details)
- `docs/testing-strategy.md` (optional - low priority for MVP, Jest setup provided)
## Post-MVP / Future Enhancements
- Advanced scraping techniques (handling JavaScript, anti-bot measures).
- Processing all comments (potentially using MapReduce summarization).
- Automated scheduling (e.g., using cron).
- Database integration for storing results or tracking.
- Cloud deployment and web frontend.
- User management (sign-ups, preferences).
- Production-grade error handling, monitoring, and email deliverability.
- Fine-tuning LLM prompts or models.
- Sophisticated retry logic for API calls or scraping.
- Cloud LLM integration.
## Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Refined Epics & Testing | 2025-05-04 | 0.3 | Removed Epic 6, added stage testing req | 2-pm |
| Boilerplate Added | 2025-05-04 | 0.2 | Updated to reflect use of boilerplate | 2-pm |
| Initial Draft | 2025-05-04 | 0.1 | First draft based on brief | 2-pm |
## Initial Architect Prompt
### Technical Infrastructure
- Starter Project/Template: Mandatory: Use the provided "bmad-boilerplate". This includes TypeScript setup, Node.js v22 compatibility, Jest, ESLint, Prettier, `ts-node`, `.env` handling via `.env.example`, and standard scripts (`dev`, `build`, `test`, `lint`, `format`).
- Hosting/Cloud Provider: Local machine execution only for MVP. No cloud deployment.
- Frontend Platform: N/A (CLI tool).
- Backend Platform: Node.js v22 with TypeScript (as provided by the boilerplate). No specific Node.js framework mandated, but structure should support modularity and align with boilerplate setup.
- Database Requirements: None. Local file system for intermediate data storage and logging only. Structure TBD (e.g., `./output/YYYY-MM-DD/`). Ensure the output directory is configurable via `.env` and gitignored.
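A minimal sketch of the date-stamped persistence described above; `OUTPUT_DIR` and the per-artifact file naming are illustrative.

```typescript
// persistence.ts - sketch of writing intermediate artifacts to ./output/YYYY-MM-DD/
import { mkdir, writeFile } from 'node:fs/promises';
import path from 'node:path';

export async function persistArtifact(name: string, data: unknown): Promise<string> {
  const baseDir = process.env.OUTPUT_DIR ?? './output'; // OUTPUT_DIR is an illustrative name
  const dateStamp = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const dir = path.join(baseDir, dateStamp);
  await mkdir(dir, { recursive: true });
  const filePath = path.join(dir, `${name}.json`);
  await writeFile(filePath, JSON.stringify(data, null, 2), 'utf-8');
  return filePath;
}
```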
### Technical Constraints
- Must adhere to the structure and tooling provided by "bmad-boilerplate".
- Must use the Node.js v22 native `fetch` API for HTTP requests.
- Must use the Algolia HN Search API for fetching HN data.
- Must integrate with a local Ollama instance via a configurable HTTP endpoint. Design should allow potential swapping to other LLM APIs later.
- Must use Nodemailer for sending email.
- Configuration (LLM endpoint, email credentials, recipients, `MAX_COMMENTS_PER_STORY`, output dir path) must be managed via a `.env` file based on `.env.example`.
- Article scraping must be basic, best-effort, and handle failures gracefully without stopping the main process.
- Intermediate data must be persisted locally incrementally.
- Code must adhere to the ESLint and Prettier configurations within the boilerplate.
### Deployment Considerations
- Execution is manual via CLI trigger only, using `npm run dev` or `npm start`.
- No CI/CD required for MVP.
- Single environment: local development machine.
### Local Development & Testing Requirements
- The entire application runs locally.
- The main CLI command (`npm run dev` / `npm start`) should execute the full implemented pipeline.
- Separate utility scripts/commands MUST be provided for testing individual pipeline stages (fetch, scrape, summarize, email), potentially using local file I/O. The architecture should facilitate creating these stage runners (e.g., `npm run stage:fetch`, `npm run stage:scrape -- --inputFile <path>`, `npm run stage:summarize -- --inputFile <path>`, `npm run stage:email -- --inputFile <path> [--dry-run]`); see the sketch after this list.
- The boilerplate provides `npm run test` using Jest for running automated unit/integration tests.
- The boilerplate provides `npm run lint` and `npm run format` for code quality checks.
- Basic console logging is required. File logging can be considered by the architect.
- Testability of individual modules (API clients, scraper, summarizer, emailer) is crucial and should leverage the Jest setup and stage testing utilities.
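One possible shape for a stage runner entry point, using Node's built-in `parseArgs`; only the flags mirror the examples above, while the file name and wiring are illustrative.

```typescript
// runStage.ts - sketch of a stage runner (e.g. npm run stage:summarize -- --inputFile <path>)
import { parseArgs } from 'node:util';
import { readFile } from 'node:fs/promises';

const { values } = parseArgs({
  options: {
    inputFile: { type: 'string' },
    'dry-run': { type: 'boolean', default: false },
  },
});

async function main(): Promise<void> {
  if (!values.inputFile) {
    console.error('Usage: npm run stage:<name> -- --inputFile <path> [--dry-run]');
    process.exit(1);
  }
  const input = await readFile(values.inputFile, 'utf-8');
  console.log(`Loaded ${input.length} characters from ${values.inputFile}`);
  if (values['dry-run']) {
    console.log('Dry run: the email stage would render the digest but skip sending.');
  }
  // Hand the loaded input to the stage under test (fetch, scrape, summarize, or email).
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```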
### Other Technical Considerations
- Modularity: Design components (HN client, scraper, LLM client, emailer) with clear interfaces to facilitate potential future modifications (e.g., changing LLM provider) and independent stage testing.
- Error Handling: Focus on robust handling of scraping failures and basic handling of API/network errors. Implement within the boilerplate structure. Logging should clearly indicate errors.
- Resource Management: Be mindful of local resources when interacting with the LLM, although optimization is not a primary MVP goal.
- Dependency Management: Add necessary production dependencies (e.g., `nodemailer`, potentially `article-extractor`, libraries for date handling or file system operations if needed) to the boilerplate's `package.json`. Keep dependencies minimal.
- Configuration Loading: Implement a robust way to load and validate settings from the `.env` file early in the application startup.
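Finally, a minimal sketch of early configuration loading and validation. It assumes the `dotenv` package (or equivalent `.env` handling already present in the boilerplate); apart from `MAX_COMMENTS_PER_STORY`, the variable names are illustrative.

```typescript
// config.ts - sketch of loading and validating settings at startup
// Assumes dotenv; variable names other than MAX_COMMENTS_PER_STORY are illustrative.
import 'dotenv/config';

export interface AppConfig {
  maxCommentsPerStory: number;
  ollamaEndpointUrl: string;
  outputDir: string;
  smtpHost: string;
  digestRecipients: string[];
}

function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

export function loadConfig(): AppConfig {
  return {
    maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? 50),
    ollamaEndpointUrl: required('OLLAMA_ENDPOINT_URL'),
    outputDir: process.env.OUTPUT_DIR ?? './output',
    smtpHost: required('SMTP_HOST'),
    digestRecipients: required('DIGEST_RECIPIENTS')
      .split(',')
      .map((s) => s.trim()),
  };
}
```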