# BMad Hacker Daily Digest Product Requirements Document (PRD)
## Intro
The BMad Hacker Daily Digest is a command-line tool designed to address the time-consuming nature of reading extensive Hacker News (HN) comment threads. It aims to provide users with a time-efficient way to grasp the collective intelligence and key insights from discussions on top HN stories. The service will fetch the top 10 HN stories daily, retrieve a configurable number of comments for each, attempt to scrape the linked article, generate separate summaries for the article (if scraped) and the comment discussion using a local LLM, and deliver these summaries in a single daily email briefing triggered manually. This project also serves as a practical learning exercise in agent-driven development, TypeScript, Node.js, API integration, and local LLM usage, starting from the provided "bmad-boilerplate" template.
## Goals and Context
- Project Objectives:
  - Provide a quick, reliable, automated way to stay informed about key HN discussions without reading full threads.
  - Successfully fetch top 10 HN story metadata via the Algolia HN Search API.
  - Retrieve a configurable number of comments per story (default 50) via the Algolia HN Search API.
  - Attempt basic scraping of linked article content, handling failures gracefully.
  - Generate distinct Article Summaries (if scraped) and Discussion Summaries using a local LLM (Ollama).
  - Assemble summaries for 10 stories into an HTML email and send via Nodemailer upon manual CLI trigger.
  - Serve as a learning platform for agent-driven development, TypeScript, Node.js v22, API integration, local LLMs, and configuration management, leveraging the "bmad-boilerplate" structure and tooling.
- Measurable Outcomes:
  - The tool completes its full process (fetch, scrape attempt, summarize, email) without crashing on manual CLI trigger across multiple test runs.
  - The generated email digest consistently contains results for 10 stories, including correct links, discussion summaries, and article summaries where scraping was successful.
  - Errors during article scraping are logged, and the process continues for affected stories using only comment summaries, without halting the script.
- Success Criteria:
  - Successful execution of the end-to-end process via CLI trigger for 3 consecutive test runs.
  - The generated email is successfully sent and received, containing summaries for all 10 fetched stories (article summary optional based on scraping success).
  - Scraping failures are logged appropriately without stopping the overall process.
- Key Performance Indicators (KPIs):
  - Successful Runs / Total Runs (Target: 100% for MVP tests)
  - Stories with Article Summaries / Total Stories (Measures scraping effectiveness)
  - Stories with Discussion Summaries / Total Stories (Target: 100%)
  - Manual Qualitative Check: Relevance and coherence of summaries in the digest.
## Scope and Requirements (MVP / Current Version)
### Functional Requirements (High-Level)
- HN Story Fetching: Retrieve IDs and metadata (title, URL, HN link) for the top 10 stories from Algolia HN Search API.
- HN Comment Fetching: For each story, retrieve comments from the Algolia HN Search API up to a maximum count defined in a `.env` configuration variable (`MAX_COMMENTS_PER_STORY`, default 50); see the fetch sketch after this list.
- Article Content Scraping: Attempt to fetch HTML and extract main text content from the story's external URL using basic methods (e.g., Node.js native `fetch`, optionally `article-extractor` or a similar basic library).
- Scraping Failure Handling: If scraping fails, log the error and proceed with generating only the Discussion Summary for that story.
- LLM Summarization:
  - Generate an "Article Summary" from scraped text (if successful) using a configured local LLM (Ollama endpoint).
  - Generate a "Discussion Summary" from the fetched comments using the same LLM.
  - Initial Prompts (Placeholders - refine in Epics):
    - Article Prompt: "Summarize the key points of the following article text: {Article Text}"
    - Discussion Prompt: "Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: {Comment Texts}"
- Digest Formatting: Combine results for the 10 stories into a single HTML email. Each story entry should include: Story Title, HN Link, Article Link, Article Summary (if available), Discussion Summary.
- Email Dispatch: Send the formatted HTML email using Nodemailer to a recipient list defined in `.env`. Use credentials also stored in `.env`.
- Main Execution Trigger: Initiate the entire implemented pipeline via a manual command-line interface (CLI) trigger, using the standard scripts defined in the boilerplate (`npm run dev`, `npm start` after build). Each functional epic should add its capability to this main execution flow.
- Configuration: Manage external parameters (Algolia API details (if needed), LLM endpoint URL, `MAX_COMMENTS_PER_STORY`, Nodemailer credentials, recipient email list, output directory path) via a `.env` file, based on the provided `.env.example`.
- Incremental Logging & Data Persistence:
  - Implement basic console logging for key steps and errors throughout the pipeline.
  - Persist intermediate data artifacts (fetched stories/comments, scraped text, generated summaries) to local files within a configurable, date-stamped directory structure (e.g., `./output/YYYY-MM-DD/`).
  - This persistence should be implemented incrementally within the relevant functional epics (Data Acquisition, Scraping, Summarization).
- Stage Testing Utilities:
  - Provide separate utility scripts or CLI commands to allow testing individual pipeline stages in isolation (e.g., fetching HN data, scraping URLs, summarizing text, sending email).
  - These utilities should support using locally saved files as input (e.g., test scraping using a file containing story URLs, test summarization using a file containing text). This facilitates development and debugging.
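To make the data acquisition requirement concrete, the sketch below uses the Node.js native `fetch` API against the public Algolia HN Search API. The endpoint paths and response fields (`objectID`, `title`, `url`, `comment_text`) belong to that API; the function names, types, and module layout are illustrative only, not the mandated design.

```typescript
// hnClient.ts - illustrative sketch, not the final module layout
const ALGOLIA_BASE = 'https://hn.algolia.com/api/v1';

interface StoryHit {
  objectID: string;
  title: string;
  url: string | null;
}

interface CommentHit {
  comment_text: string | null;
  author: string | null;
}

// Fetch metadata for the current front-page stories (top 10 by default).
export async function fetchTopStories(count = 10): Promise<StoryHit[]> {
  const res = await fetch(`${ALGOLIA_BASE}/search?tags=front_page&hitsPerPage=${count}`);
  if (!res.ok) throw new Error(`Algolia story request failed: ${res.status}`);
  const body = (await res.json()) as { hits: StoryHit[] };
  return body.hits;
}

// Fetch up to MAX_COMMENTS_PER_STORY comments for one story.
export async function fetchComments(storyId: string, maxComments: number): Promise<CommentHit[]> {
  const res = await fetch(
    `${ALGOLIA_BASE}/search?tags=comment,story_${storyId}&hitsPerPage=${maxComments}`,
  );
  if (!res.ok) throw new Error(`Algolia comment request failed: ${res.status}`);
  const body = (await res.json()) as { hits: CommentHit[] };
  return body.hits;
}

// HN discussion link for a digest entry.
export const hnLink = (storyId: string) => `https://news.ycombinator.com/item?id=${storyId}`;
```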
### Non-Functional Requirements (NFRs)
- Performance: MVP focuses on functionality over speed. Should complete within a reasonable time (e.g., < 5 minutes) on a typical developer machine for local LLM use. No specific response time targets.
- Scalability: Designed for single-user, local execution. No scaling requirements for MVP.
- Reliability/Availability:
  - The script must handle article scraping failures gracefully (log and continue); see the sketch after this list.
  - Basic error handling for API calls (e.g., log network errors).
  - Local LLM interaction may fail; basic error logging is sufficient for MVP.
  - No requirement for automated retries or production-grade error handling.
- Security:
  - Email credentials must be stored securely via the `.env` file and not committed to version control (as per the boilerplate `.gitignore`).
  - No other specific security requirements for local MVP.
- Maintainability:
  - Code should be well-structured TypeScript.
  - Adherence to the linting (ESLint) and formatting (Prettier) rules configured in the "bmad-boilerplate" is required. Use `npm run lint` and `npm run format`.
  - Modularity is desired to potentially swap LLM providers later and facilitate stage testing.
- Usability/Accessibility: N/A (CLI tool for developer).
- Other Constraints:
  - Must use TypeScript and Node.js v22.
  - Must run locally on the developer's machine.
  - Must use the Node.js v22 native `fetch` API for HTTP requests.
  - Must use the Algolia HN Search API for HN data.
  - Must use a local Ollama instance via a configurable HTTP endpoint.
  - Must use Nodemailer for email dispatch.
  - Must use `.env` for configuration based on `.env.example`.
  - Must use the local file system for logging and intermediate data storage. Ensure output/log directories are gitignored.
  - Focus on a functional pipeline for learning/demonstration.
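As an illustration of the graceful-failure constraint, the sketch below wraps a best-effort scrape in a try/catch that logs and returns `null`, letting the pipeline fall back to a discussion-only summary. The naive tag-stripping is only a stand-in for whatever basic extraction is ultimately chosen (e.g., the `article-extractor` library mentioned earlier); the function name is illustrative.

```typescript
// scraper.ts - best-effort article scraping sketch
export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const res = await fetch(url, { redirect: 'follow' });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const contentType = res.headers.get('content-type') ?? '';
    if (!contentType.includes('text/html')) throw new Error(`Unsupported content type: ${contentType}`);
    const html = await res.text();
    // Naive extraction: drop scripts/styles, strip tags, collapse whitespace.
    const text = html
      .replace(/<script[\s\S]*?<\/script>/gi, ' ')
      .replace(/<style[\s\S]*?<\/style>/gi, ' ')
      .replace(/<[^>]+>/g, ' ')
      .replace(/\s+/g, ' ')
      .trim();
    return text.length > 0 ? text : null;
  } catch (err) {
    // Log and continue: the story proceeds with a Discussion Summary only.
    console.error(`Scraping failed for ${url}:`, err);
    return null;
  }
}
```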
### User Experience (UX) Requirements (High-Level)
- The primary UX goal is to deliver a time-saving digest.
- For the developer user, the main CLI interaction should be simple: using standard boilerplate scripts like `npm run dev` or `npm start` to trigger the full process.
- Feedback during CLI execution (e.g., "Fetching stories...", "Summarizing story X/10...", "Sending email...") is desirable via console logging.
- Separate CLI commands/scripts for testing individual stages should provide clear input/output mechanisms.
### Integration Requirements (High-Level)
- Algolia HN Search API: Fetching top stories and comments. Requires understanding API structure and query parameters.
- Ollama Service: Sending text (article content, comments) and receiving summaries via its API endpoint. The endpoint URL must be configurable (see the sketch after this list).
- SMTP Service (via Nodemailer): Sending the final digest email. Requires valid SMTP credentials and a recipient list configured in `.env`.
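For the Ollama integration, a minimal non-streaming call to its `/api/generate` HTTP endpoint could look like the sketch below. `OLLAMA_ENDPOINT_URL` and `OLLAMA_MODEL` are illustrative `.env` variable names, and the commented usage reuses the placeholder prompts from the Functional Requirements.

```typescript
// llmClient.ts - sketch of a non-streaming call to a local Ollama instance
const OLLAMA_ENDPOINT = process.env.OLLAMA_ENDPOINT_URL ?? 'http://localhost:11434';
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? 'llama3';

export async function summarize(prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_ENDPOINT}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: OLLAMA_MODEL, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const body = (await res.json()) as { response: string };
  return body.response.trim();
}

// Usage with the placeholder prompts:
// const articleSummary = await summarize(`Summarize the key points of the following article text: ${articleText}`);
// const discussionSummary = await summarize(`Summarize the main themes, viewpoints, and key insights from the following Hacker News comments: ${commentTexts}`);
```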
### Testing Requirements (High-Level)
- MVP success relies on manual end-to-end test runs confirming successful execution and valid email output.
- Unit/integration tests are encouraged using the Jest framework configured in the boilerplate. Focus testing effort on the core pipeline components. Use `npm run test` (see the example after this list).
- Stage-specific testing utilities (as defined in the Functional Requirements) are required to support development and verification of individual pipeline components.
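As a small example of the kind of unit test intended here, the following Jest spec exercises the hypothetical `hnLink` helper from the fetch sketch in the Functional Requirements.

```typescript
// hnClient.test.ts - example unit test against the illustrative hnLink helper
import { hnLink } from './hnClient';

describe('hnLink', () => {
  it('builds the HN discussion URL from a story id', () => {
    expect(hnLink('43012345')).toBe('https://news.ycombinator.com/item?id=43012345');
  });
});
```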
## Epic Overview (MVP / Current Version)
(Revised proposal)
- Epic 1: Project Initialization & Core Setup - Goal: Initialize the project using "bmad-boilerplate", manage dependencies, set up `.env` and config loading, establish a basic CLI entry point, and set up basic logging and the output directory structure.
- Epic 2: HN Data Acquisition & Persistence - Goal: Implement fetching top 10 stories and their comments (respecting limits) from the Algolia HN API, and persist this raw data locally. Implement stage testing utility for fetching.
- Epic 3: Article Scraping & Persistence - Goal: Implement best-effort article scraping/extraction, handle failures gracefully, and persist scraped text locally. Implement stage testing utility for scraping.
- Epic 4: LLM Summarization & Persistence - Goal: Integrate with Ollama to generate article/discussion summaries from persisted data and persist summaries locally. Implement stage testing utility for summarization.
- Epic 5: Digest Assembly & Email Dispatch - Goal: Format collected summaries into an HTML email using persisted data and send it using Nodemailer. Implement stage testing utility for emailing (with dry-run option).
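Anticipating Epic 5, a minimal Nodemailer dispatch could look like the sketch below; the `SMTP_*` and `DIGEST_RECIPIENTS` variable names are illustrative, and the HTML body is assumed to be assembled elsewhere.

```typescript
// emailer.ts - sketch of digest dispatch via Nodemailer
// SMTP_* and DIGEST_RECIPIENTS are illustrative .env variable names.
import nodemailer from 'nodemailer';

export async function sendDigest(subject: string, htmlBody: string): Promise<void> {
  const transporter = nodemailer.createTransport({
    host: process.env.SMTP_HOST,
    port: Number(process.env.SMTP_PORT ?? 587),
    auth: {
      user: process.env.SMTP_USER,
      pass: process.env.SMTP_PASS,
    },
  });

  await transporter.sendMail({
    from: process.env.SMTP_USER,
    to: process.env.DIGEST_RECIPIENTS, // Nodemailer accepts a comma-separated recipient list
    subject,
    html: htmlBody,
  });
}
```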
## Key Reference Documents
- `docs/project-brief.md`
- `docs/prd.md` (this document)
- `docs/architecture.md` (to be created by the Architect)
- `docs/epic1.md`, `docs/epic2.md`, ... (to be created)
- `docs/tech-stack.md` (partially defined by the boilerplate, to be finalized by the Architect)
- `docs/api-reference.md` (if needed for Algolia/Ollama details)
- `docs/testing-strategy.md` (optional - low priority for MVP, Jest setup provided)
## Post-MVP / Future Enhancements
- Advanced scraping techniques (handling JavaScript, anti-bot measures).
- Processing all comments (potentially using MapReduce summarization).
- Automated scheduling (e.g., using cron).
- Database integration for storing results or tracking.
- Cloud deployment and web frontend.
- User management (sign-ups, preferences).
- Production-grade error handling, monitoring, and email deliverability.
- Fine-tuning LLM prompts or models.
- Sophisticated retry logic for API calls or scraping.
- Cloud LLM integration.
## Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Refined Epics & Testing | 2025-05-04 | 0.3 | Removed Epic 6, added stage testing req | 2-pm |
| Boilerplate Added | 2025-05-04 | 0.2 | Updated to reflect use of boilerplate | 2-pm |
| Initial Draft | 2025-05-04 | 0.1 | First draft based on brief | 2-pm |
## Initial Architect Prompt
### Technical Infrastructure
- Starter Project/Template: Mandatory: Use the provided "bmad-boilerplate". This includes TypeScript setup, Node.js v22 compatibility, Jest, ESLint, Prettier, `ts-node`, `.env` handling via `.env.example`, and standard scripts (`dev`, `build`, `test`, `lint`, `format`).
- Hosting/Cloud Provider: Local machine execution only for MVP. No cloud deployment.
- Frontend Platform: N/A (CLI tool).
- Backend Platform: Node.js v22 with TypeScript (as provided by the boilerplate). No specific Node.js framework mandated, but structure should support modularity and align with boilerplate setup.
- Database Requirements: None. Local file system for intermediate data storage and logging only. Structure TBD (e.g., `./output/YYYY-MM-DD/`). Ensure the output directory is configurable via `.env` and gitignored.
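A minimal sketch of the date-stamped persistence described above; `OUTPUT_DIR` and the per-artifact file naming are illustrative.

```typescript
// persistence.ts - sketch of writing intermediate artifacts to ./output/YYYY-MM-DD/
import { mkdir, writeFile } from 'node:fs/promises';
import path from 'node:path';

export async function persistArtifact(name: string, data: unknown): Promise<string> {
  const baseDir = process.env.OUTPUT_DIR ?? './output'; // OUTPUT_DIR is an illustrative name
  const dateStamp = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const dir = path.join(baseDir, dateStamp);
  await mkdir(dir, { recursive: true });
  const filePath = path.join(dir, `${name}.json`);
  await writeFile(filePath, JSON.stringify(data, null, 2), 'utf-8');
  return filePath;
}
```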
### Technical Constraints
- Must adhere to the structure and tooling provided by "bmad-boilerplate".
- Must use the Node.js v22 native `fetch` API for HTTP requests.
- Must use the Algolia HN Search API for fetching HN data.
- Must integrate with a local Ollama instance via a configurable HTTP endpoint. Design should allow potential swapping to other LLM APIs later.
- Must use Nodemailer for sending email.
- Configuration (LLM endpoint, email credentials, recipients, `MAX_COMMENTS_PER_STORY`, output dir path) must be managed via a `.env` file based on `.env.example`.
- Article scraping must be basic, best-effort, and handle failures gracefully without stopping the main process.
- Intermediate data must be persisted locally incrementally.
- Code must adhere to the ESLint and Prettier configurations within the boilerplate.
### Deployment Considerations
- Execution is manual via CLI trigger only, using `npm run dev` or `npm start`.
- No CI/CD required for MVP.
- Single environment: local development machine.
### Local Development & Testing Requirements
- The entire application runs locally.
- The main CLI command (`npm run dev` / `npm start`) should execute the full implemented pipeline.
- Separate utility scripts/commands MUST be provided for testing individual pipeline stages (fetch, scrape, summarize, email), potentially using local file I/O. The architecture should facilitate creating these stage runners (e.g., `npm run stage:fetch`, `npm run stage:scrape -- --inputFile <path>`, `npm run stage:summarize -- --inputFile <path>`, `npm run stage:email -- --inputFile <path> [--dry-run]`); see the sketch after this list.
- The boilerplate provides `npm run test` using Jest for running automated unit/integration tests.
- The boilerplate provides `npm run lint` and `npm run format` for code quality checks.
- Basic console logging is required. File logging can be considered by the architect.
- Testability of individual modules (API clients, scraper, summarizer, emailer) is crucial and should leverage the Jest setup and stage testing utilities.
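One possible shape for a stage runner entry point, using Node's built-in `parseArgs`; only the flags mirror the examples above, while the file name and wiring are illustrative.

```typescript
// runStage.ts - sketch of a stage runner (e.g. npm run stage:summarize -- --inputFile <path>)
import { parseArgs } from 'node:util';
import { readFile } from 'node:fs/promises';

const { values } = parseArgs({
  options: {
    inputFile: { type: 'string' },
    'dry-run': { type: 'boolean', default: false },
  },
});

async function main(): Promise<void> {
  if (!values.inputFile) {
    console.error('Usage: npm run stage:<name> -- --inputFile <path> [--dry-run]');
    process.exit(1);
  }
  const input = await readFile(values.inputFile, 'utf-8');
  console.log(`Loaded ${input.length} characters from ${values.inputFile}`);
  if (values['dry-run']) {
    console.log('Dry run: the email stage would render the digest but skip sending.');
  }
  // Hand the loaded input to the stage under test (fetch, scrape, summarize, or email).
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```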
### Other Technical Considerations
- Modularity: Design components (HN client, scraper, LLM client, emailer) with clear interfaces to facilitate potential future modifications (e.g., changing LLM provider) and independent stage testing.
- Error Handling: Focus on robust handling of scraping failures and basic handling of API/network errors. Implement within the boilerplate structure. Logging should clearly indicate errors.
- Resource Management: Be mindful of local resources when interacting with the LLM, although optimization is not a primary MVP goal.
- Dependency Management: Add necessary production dependencies (e.g., `nodemailer`, potentially `article-extractor`, libraries for date handling or file system operations if needed) to the boilerplate's `package.json`. Keep dependencies minimal.
- Configuration Loading: Implement a robust way to load and validate settings from the `.env` file early in the application startup.
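Finally, a minimal sketch of early configuration loading and validation. It assumes the `dotenv` package (or equivalent `.env` handling already present in the boilerplate); apart from `MAX_COMMENTS_PER_STORY`, the variable names are illustrative.

```typescript
// config.ts - sketch of loading and validating settings at startup
// Assumes dotenv; variable names other than MAX_COMMENTS_PER_STORY are illustrative.
import 'dotenv/config';

export interface AppConfig {
  maxCommentsPerStory: number;
  ollamaEndpointUrl: string;
  outputDir: string;
  smtpHost: string;
  digestRecipients: string[];
}

function required(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

export function loadConfig(): AppConfig {
  return {
    maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? 50),
    ollamaEndpointUrl: required('OLLAMA_ENDPOINT_URL'),
    outputDir: process.env.OUTPUT_DIR ?? './output',
    smtpHost: required('SMTP_HOST'),
    digestRecipients: required('DIGEST_RECIPIENTS')
      .split(',')
      .map((s) => s.trim()),
  };
}
```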