# BMad Hacker Daily Digest Architecture Document
## Technical Summary
This document outlines the technical architecture for the BMad Hacker Daily Digest, a command-line tool built with TypeScript and Node.js v22. It adheres to the structure provided by the "bmad-boilerplate". The system fetches the top 10 Hacker News stories and their comments daily via the Algolia HN API, attempts to scrape linked articles, generates summaries for both articles (if scraped) and discussions using a local Ollama instance, persists intermediate data locally, and sends an HTML digest email via Nodemailer upon manual CLI execution. The architecture emphasizes modularity through distinct clients and processing stages, facilitating independent stage testing as required by the PRD. Execution is strictly local for the MVP.
## High-Level Overview
The application follows a sequential pipeline architecture triggered by a single CLI command (npm run dev or npm start). Data flows through distinct stages: HN Data Acquisition, Article Scraping, LLM Summarization, and Digest Assembly/Email Dispatch. Each stage persists its output to a date-stamped local directory, allowing subsequent stages to operate on this data and enabling stage-specific testing utilities.
(Diagram Suggestion for Canvas: Create a flowchart showing the stages below)
```mermaid
graph TD
A["CLI Trigger (npm run dev / npm start)"] --> B("Initialize: Load Config, Setup Logger, Create Output Dir");
B --> C{"Fetch HN Data (Top 10 Stories + Comments)"};
C -- "Story/Comment Data" --> D("Persist HN Data: ./output/YYYY-MM-DD/{storyId}_data.json");
D --> E{"Attempt Article Scraping (per story)"};
E -- "Scraped Text (if successful)" --> F("Persist Article Text: ./output/YYYY-MM-DD/{storyId}_article.txt");
F --> G{"Generate Summaries (Article + Discussion via Ollama)"};
G -- "Summaries" --> H("Persist Summaries: ./output/YYYY-MM-DD/{storyId}_summary.json");
H --> I{"Assemble Digest (Read persisted data)"};
I -- "HTML Content" --> J{"Send Email via Nodemailer"};
J --> K("Log Final Status & Exit");
subgraph Stage Testing Utilities
direction LR
T1[npm run stage:fetch] --> D;
T2[npm run stage:scrape] --> F;
T3[npm run stage:summarize] --> H;
T4[npm run stage:email] --> J;
end
C -- "Error / no comments" --> G;
E -- "Skip / fail (no URL or scrape fails)" --> G;
G -- "Summarization fail (persist null summaries)" --> H;
I -- "Assembly fail (skip email)" --> K;
```
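The pipeline above translates into a straightforward sequential orchestrator. Below is a minimal sketch of how `src/index.ts` might wire the stages together; the function signatures, field names (`storyId`, `url`, `text`), prompts, and comment limit are illustrative assumptions, not the final API.

```typescript
// Sketch of the sequential pipeline in src/index.ts. Module paths and function
// names follow this document; exact signatures, field names, and prompts are
// illustrative assumptions.
import fs from "node:fs/promises";
import path from "node:path";
import { fetchTopStories, fetchCommentsForStory } from "./clients/algoliaHNClient";
import { scrapeArticle } from "./scraper/articleScraper";
import { generateSummary } from "./clients/ollamaClient";
import { assembleDigestData } from "./email/contentAssembler";
import { renderDigestHtml } from "./email/templater";
import { sendDigestEmail } from "./email/emailSender";

const MAX_COMMENTS = 50; // in practice loaded from .env via config.ts

async function main(): Promise<void> {
  // Create the date-stamped output directory, e.g. ./output/2025-05-04
  const date = new Date().toISOString().slice(0, 10);
  const dateDir = path.join("output", date);
  await fs.mkdir(dateDir, { recursive: true });

  const stories = await fetchTopStories(10);
  for (const story of stories) {
    const comments = await fetchCommentsForStory(story.storyId, MAX_COMMENTS);
    await fs.writeFile(
      path.join(dateDir, `${story.storyId}_data.json`),
      JSON.stringify({ story, comments }, null, 2),
    );

    // Scraping failures return null and never halt the pipeline.
    const articleContent = story.url ? await scrapeArticle(story.url) : null;
    if (articleContent) {
      await fs.writeFile(path.join(dateDir, `${story.storyId}_article.txt`), articleContent);
    }

    const articleSummary = articleContent
      ? await generateSummary("Summarize this article:", articleContent)
      : null;
    const discussionSummary = comments.length
      ? await generateSummary("Summarize this discussion:", comments.map((c) => c.text).join("\n"))
      : null;
    await fs.writeFile(
      path.join(dateDir, `${story.storyId}_summary.json`),
      JSON.stringify({ storyId: story.storyId, articleSummary, discussionSummary }, null, 2),
    );
  }

  // Assemble and send the digest only if at least one story was processed.
  const digestData = await assembleDigestData(dateDir);
  if (digestData.length > 0) {
    const html = renderDigestHtml(digestData, date);
    await sendDigestEmail(`BMad Hacker Daily Digest - ${date}`, html);
  }
}

main().catch((err) => {
  console.error("Pipeline failed:", err);
  process.exit(1);
});
```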
## Component View
The application logic resides primarily within the src/ directory, organized into modules responsible for specific pipeline stages or cross-cutting concerns.
(Diagram Suggestion for Canvas: Create a component diagram showing modules and dependencies)
```mermaid
graph TD
subgraph src ["Source Code (src/)"]
direction LR
Entry["index.ts (Main Orchestrator)"]
subgraph Config ["Configuration"]
ConfMod["config.ts"]
EnvFile[".env File"]
end
subgraph Utils ["Utilities"]
Logger["logger.ts"]
end
subgraph Clients ["External Service Clients"]
Algolia["clients/algoliaHNClient.ts"]
Ollama["clients/ollamaClient.ts"]
end
Scraper["scraper/articleScraper.ts"]
subgraph Email ["Email Handling"]
Assembler["email/contentAssembler.ts"]
Templater["email/templater.ts (or within Assembler)"]
Sender["email/emailSender.ts"]
Nodemailer["(nodemailer library)"]
end
subgraph Stages ["Stage Testing Scripts (src/stages/)"]
FetchStage["fetch_hn_data.ts"]
ScrapeStage["scrape_articles.ts"]
SummarizeStage["summarize_content.ts"]
SendStage["send_digest.ts"]
end
Entry --> ConfMod;
Entry --> Logger;
Entry --> Algolia;
Entry --> Scraper;
Entry --> Ollama;
Entry --> Assembler;
Entry --> Templater;
Entry --> Sender;
Algolia -- uses --> NativeFetch["Node.js v22 native fetch API"];
Ollama -- uses --> NativeFetch;
Scraper -- uses --> NativeFetch;
Scraper -- uses --> ArticleExtractor["(@extractus/article-extractor)"];
Sender -- uses --> Nodemailer;
ConfMod -- reads --> EnvFile;
Assembler -- reads --> LocalFS["Local Filesystem (./output)"];
Entry -- writes --> LocalFS;
FetchStage --> Algolia;
FetchStage --> LocalFS;
ScrapeStage --> Scraper;
ScrapeStage --> LocalFS;
SummarizeStage --> Ollama;
SummarizeStage --> LocalFS;
SendStage --> Assembler;
SendStage --> Templater;
SendStage --> Sender;
SendStage --> LocalFS;
end
CLI["CLI (npm run ...)"] --> Entry;
CLI -- runs --> FetchStage;
CLI -- runs --> ScrapeStage;
CLI -- runs --> SummarizeStage;
CLI -- runs --> SendStage;
```
Module Descriptions:
- `src/index.ts`: The main entry point, orchestrating the entire pipeline flow from initialization to final email dispatch. Imports and calls functions from the other modules.
- `src/config.ts`: Loads and validates environment variables from the `.env` file using the `dotenv` library.
- `src/logger.ts`: Provides a simple console logging utility used throughout the application.
- `src/clients/algoliaHNClient.ts`: Encapsulates interaction with the Algolia Hacker News Search API, using the native fetch API to retrieve stories and comments.
- `src/clients/ollamaClient.ts`: Encapsulates interaction with the local Ollama API endpoint, using the native fetch API to generate summaries.
- `src/scraper/articleScraper.ts`: Fetches article HTML with native fetch and extracts text content using `@extractus/article-extractor`. Includes robust error handling for fetch and extraction failures.
- `src/email/contentAssembler.ts`: Reads persisted story data and summaries from the local output directory.
- `src/email/templater.ts` (or integrated into the assembler): Renders the HTML email content from the assembled data.
- `src/email/emailSender.ts`: Configures and uses Nodemailer to send the generated HTML email.
- `src/stages/*.ts`: Individual scripts that run specific pipeline stages independently for testing, consuming persisted data from previous stages where applicable.
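To illustrate the client pattern, here is a minimal sketch of `fetchTopStories` in `src/clients/algoliaHNClient.ts` built on the native fetch API. The Algolia HN Search endpoint is public; the `Story` shape, the error handling, and the use of the `front_page` tag are assumptions made for this sketch.

```typescript
// Sketch of src/clients/algoliaHNClient.ts using Node's native fetch.
// The Story shape and the query parameters are illustrative assumptions.
const ALGOLIA_BASE_URL = "https://hn.algolia.com/api/v1";

export interface Story {
  storyId: string;
  title: string;
  url: string | null; // some HN posts (e.g. Ask HN) have no external URL
  hnUrl: string;
}

export async function fetchTopStories(count = 10): Promise<Story[]> {
  const res = await fetch(
    `${ALGOLIA_BASE_URL}/search?tags=front_page&hitsPerPage=${count}`,
  );
  if (!res.ok) {
    throw new Error(`Algolia request failed: ${res.status} ${res.statusText}`);
  }
  const body = (await res.json()) as { hits: Array<Record<string, unknown>> };
  return body.hits.map((hit) => ({
    storyId: String(hit.objectID),
    title: String(hit.title ?? ""),
    url: typeof hit.url === "string" ? hit.url : null,
    hnUrl: `https://news.ycombinator.com/item?id=${hit.objectID}`,
  }));
}
```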
## Key Architectural Decisions & Patterns
- Pipeline Architecture: A sequential flow where each stage processes data and passes artifacts to the next via the local filesystem. Chosen for simplicity and to easily support independent stage testing.
- Local Execution & File Persistence: All execution is local, and intermediate artifacts (`_data.json`, `_article.txt`, `_summary.json`) are stored in a date-stamped `./output` directory. This avoids database setup for the MVP and facilitates debugging and stage testing.
- Native fetch API: Mandated by constraints for all HTTP requests (Algolia, Ollama, article scraping). Ensures usage of the latest Node.js features.
- Modular Clients: External interactions (Algolia, Ollama) are encapsulated in dedicated client modules (`src/clients/`). This promotes separation of concerns and makes swapping implementations (e.g., a different LLM API) easier.
- Configuration via `.env`: Standard approach using `dotenv` for managing API keys, endpoints, and behavioral parameters (as per the boilerplate).
- Stage Testing Utilities: Dedicated scripts (`src/stages/*.ts`) allow isolated testing of fetching, scraping, summarization, and emailing, fulfilling a key PRD requirement.
- Graceful Error Handling (Scraping): Article scraping failures are logged but do not halt the main pipeline, allowing the process to continue with discussion summaries only, as required (see the sketch below). Other errors (API, LLM) are logged.
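The graceful-failure behavior for scraping could look roughly like the following sketch: any fetch or extraction problem yields `null` so the orchestrator can continue with the discussion summary alone. The exact `@extractus/article-extractor` call (`extractFromHtml`) and the naive HTML-to-text step are assumptions, not a prescribed implementation.

```typescript
// Sketch of src/scraper/articleScraper.ts: native fetch plus
// @extractus/article-extractor, returning null on any failure so the
// pipeline keeps going. Timeout value and text-cleanup step are assumptions.
import { extractFromHtml } from "@extractus/article-extractor";

export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(15_000) });
    const contentType = res.headers.get("content-type") ?? "";
    if (!res.ok || !contentType.includes("text/html")) {
      return null; // HTTP error or non-HTML resource: skip gracefully
    }
    const html = await res.text();
    const article = await extractFromHtml(html, url);
    if (!article?.content) {
      return null; // extractor found no usable article body
    }
    // article.content is sanitized HTML; strip tags naively to get plain text.
    return article.content.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  } catch {
    // Fetch/extraction failures are logged by the caller and never halt the run.
    return null;
  }
}
```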
## Core Workflow / Sequence Diagrams (Simplified)
(Diagram Suggestion for Canvas: Create a Sequence Diagram showing interactions)
```mermaid
sequenceDiagram
participant CLI
participant Index as index.ts
participant Config as config.ts
participant Logger as logger.ts
participant OutputDir as Output Dir Setup
participant Algolia as algoliaHNClient.ts
participant Scraper as articleScraper.ts
participant Ollama as ollamaClient.ts
participant Assembler as contentAssembler.ts
participant Templater as templater.ts
participant Sender as emailSender.ts
participant FS as Local Filesystem (./output/YYYY-MM-DD)
CLI->>Index: npm run dev
Index->>Config: Load .env vars
Index->>Logger: Initialize
Index->>OutputDir: Create/Verify Date Dir
Index->>Algolia: fetchTopStories()
Algolia-->>Index: stories[]
loop For Each Story
Index->>Algolia: fetchCommentsForStory(storyId, MAX_COMMENTS)
Algolia-->>Index: comments[]
Index->>FS: Write {storyId}_data.json
alt Has Valid story.url
Index->>Scraper: scrapeArticle(story.url)
Scraper-->>Index: articleContent (string | null)
alt Scrape Success
Index->>FS: Write {storyId}_article.txt
end
end
alt Has articleContent
Index->>Ollama: generateSummary(ARTICLE_PROMPT, articleContent)
Ollama-->>Index: articleSummary (string | null)
end
alt Has comments[]
Index->>Ollama: generateSummary(DISCUSSION_PROMPT, formattedComments)
Ollama-->>Index: discussionSummary (string | null)
end
Index->>FS: Write {storyId}_summary.json
end
Index->>Assembler: assembleDigestData(dateDirPath)
Assembler->>FS: Read _data.json, _summary.json files
Assembler-->>Index: digestData[]
alt digestData is not empty
Index->>Templater: renderDigestHtml(digestData, date)
Templater-->>Index: htmlContent
Index->>Sender: sendDigestEmail(subject, htmlContent)
Sender-->>Index: success (boolean)
end
Index->>Logger: Log final status
```
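The `generateSummary` calls in the diagram map onto a single non-streaming request to Ollama's `/api/generate` endpoint. A minimal sketch of `src/clients/ollamaClient.ts` follows; the environment variable names, default model, and null-on-failure contract are assumptions for illustration.

```typescript
// Sketch of generateSummary in src/clients/ollamaClient.ts using native fetch.
// Env variable names and default model are assumed; the request shape follows
// Ollama's documented /api/generate REST API.
const OLLAMA_ENDPOINT = process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434";
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? "llama3";

export async function generateSummary(
  promptInstruction: string,
  content: string,
): Promise<string | null> {
  try {
    const res = await fetch(`${OLLAMA_ENDPOINT}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: OLLAMA_MODEL,
        prompt: `${promptInstruction}\n\n${content}`,
        stream: false, // return one JSON object instead of a token stream
      }),
    });
    if (!res.ok) {
      console.error(`Ollama request failed: ${res.status} ${res.statusText}`);
      return null;
    }
    const body = (await res.json()) as { response?: string };
    return body.response?.trim() ?? null;
  } catch (err) {
    // Summarization failures are logged; a null summary is persisted instead.
    console.error("Ollama request error:", err);
    return null;
  }
}
```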
## Infrastructure and Deployment Overview
- Cloud Provider(s): N/A (Local Machine Execution Only for MVP)
- Core Services Used: N/A
- Infrastructure as Code (IaC): N/A
- Deployment Strategy: Manual CLI execution (`npm run dev` for development with `ts-node`; `npm run build && npm start` for running the compiled JS). No automated deployment pipeline for the MVP.
- Environments: Single environment: the local development machine.
## Key Reference Documents
- docs/prd.md
- docs/epic1-draft.txt, docs/epic2-draft.txt, ... docs/epic5-draft.txt
- docs/tech-stack.md
- docs/project-structure.md
- docs/coding-standards.md
- docs/api-reference.md
- docs/data-models.md
- docs/environment-vars.md
- docs/testing-strategy.md
## Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial draft | 2025-05-04 | 0.1 | Initial draft based on PRD & Epics | 3-Architect |