# BMad Hacker Daily Digest Architecture Document
## Technical Summary
The BMad Hacker Daily Digest is a command-line interface (CLI) tool that delivers concise summaries of top Hacker News (HN) stories and their associated comment discussions. Built with TypeScript and Node.js (v22), it operates entirely on the user's local machine. The core functionality is a sequential pipeline: fetch story and comment data from the Algolia HN Search API, attempt to scrape linked article content, generate summaries using a local Ollama LLM instance, persist intermediate data to the local filesystem, and finally assemble and email an HTML digest using Nodemailer. The architecture emphasizes modularity and testability, including mandatory standalone scripts for testing each pipeline stage. The project starts from the bmad-boilerplate template.
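Each run persists its intermediate artifacts under a date-stamped directory on the local filesystem. As a minimal sketch of that convention, assuming a `yyyy-MM-dd` naming scheme under the `output/` base directory and using the `date-fns` library from the tech stack:

```typescript
import { mkdirSync } from "node:fs";
import { join } from "node:path";
import { format } from "date-fns";

// Create (if needed) the date-stamped directory that holds a run's
// intermediate artifacts ({storyId}_data.json, _article.txt, _summary.json).
// The yyyy-MM-dd naming is an assumption for illustration.
export function ensureOutputDir(baseDir = "output"): string {
  const dateDirPath = join(baseDir, format(new Date(), "yyyy-MM-dd"));
  mkdirSync(dateDirPath, { recursive: true });
  return dateDirPath;
}
```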
## High-Level Overview
The application follows a simple, sequential pipeline architecture executed via a manual CLI command (`npm run dev` or `npm start`). There is no persistent database; the local filesystem stores intermediate data artifacts (fetched data, scraped text, summaries) between steps within a date-stamped directory. All external HTTP communication (Algolia API, article scraping, Ollama API) uses the native Node.js `fetch` API; a minimal sketch of such a call follows the diagram below.
```mermaid
graph LR
subgraph "BMad Hacker Daily Digest (Local CLI)"
A[index.ts / CLI Trigger] --> B(core/pipeline.ts);
B --> C{Fetch HN Data};
B --> D{Scrape Articles};
B --> E{Summarize Content};
B --> F{Assemble & Email Digest};
C --> G["Local FS (_data.json)"];
D --> H["Local FS (_article.txt)"];
E --> I["Local FS (_summary.json)"];
F --> G;
F --> H;
F --> I;
end
subgraph External Services
X[Algolia HN API];
Y[Article Websites];
Z["Ollama API (Local)"];
W[SMTP Service];
end
C --> X;
D --> Y;
E --> Z;
F --> W;
style G fill:#eee,stroke:#333,stroke-width:1px
style H fill:#eee,stroke:#333,stroke-width:1px
style I fill:#eee,stroke:#333,stroke-width:1px
```
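As noted above, all outbound HTTP goes through the native `fetch` API. A minimal helper sketch, assuming a hypothetical `fetchJson` wrapper with a 10-second timeout (the actual client modules may differ):

```typescript
// Thin wrapper over Node.js v22's built-in fetch with a timeout.
// The helper name, timeout value, and error message are assumptions.
export async function fetchJson<T>(url: string, timeoutMs = 10_000): Promise<T> {
  const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return (await response.json()) as T;
}

// Usage example against the public Algolia HN Search endpoint:
// const result = await fetchJson("https://hn.algolia.com/api/v1/search?tags=front_page");
```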
## Component View
The application code (`src/`) is organized into logical modules based on the defined project structure (`docs/project-structure.md`). Key components include:

- `src/index.ts`: The main entry point, handling CLI invocation and initiating the pipeline.
- `src/core/pipeline.ts`: Orchestrates the sequential execution of the main pipeline stages (fetch, scrape, summarize, email).
- `src/clients/`: Modules responsible for interacting with external APIs.
  - `algoliaHNClient.ts`: Communicates with the Algolia HN Search API.
  - `ollamaClient.ts`: Communicates with the local Ollama API.
- `src/scraper/articleScraper.ts`: Handles fetching and extracting text content from article URLs; a sketch follows the component diagram below.
- `src/email/`: Manages digest assembly, HTML rendering, and email dispatch via Nodemailer.
  - `contentAssembler.ts`: Reads persisted data.
  - `templates.ts`: Renders HTML.
  - `emailSender.ts`: Sends the email.
- `src/stages/`: Contains standalone scripts (`fetch_hn_data.ts`, `scrape_articles.ts`, etc.) for testing individual pipeline stages independently, using local data where applicable.
- `src/utils/`: Shared utilities for configuration loading (`config.ts`), logging (`logger.ts`), and date handling (`dateUtils.ts`).
- `src/types/`: Shared TypeScript interfaces and types.
```mermaid
graph TD
subgraph AppComponents ["Application Components (src/)"]
Idx(index.ts) --> Pipe(core/pipeline.ts);
Pipe --> HNClient(clients/algoliaHNClient.ts);
Pipe --> Scraper(scraper/articleScraper.ts);
Pipe --> OllamaClient(clients/ollamaClient.ts);
Pipe --> Assembler(email/contentAssembler.ts);
Pipe --> Renderer(email/templates.ts);
Pipe --> Sender(email/emailSender.ts);
Pipe --> Utils(utils/*);
Pipe --> Types(types/*);
HNClient --> Types;
OllamaClient --> Types;
Assembler --> Types;
Renderer --> Types;
subgraph StageRunnersSubgraph ["Stage Runners (src/stages/)"]
SFetch(fetch_hn_data.ts) --> HNClient;
SFetch --> Utils;
SScrape(scrape_articles.ts) --> Scraper;
SScrape --> Utils;
SSummarize(summarize_content.ts) --> OllamaClient;
SSummarize --> Utils;
SEmail(send_digest.ts) --> Assembler;
SEmail --> Renderer;
SEmail --> Sender;
SEmail --> Utils;
end
end
subgraph Externals ["Filesystem & External"]
FS["Local Filesystem (output/)"]
Algolia((Algolia HN API))
Websites((Article Websites))
Ollama["Ollama API (Local)"]
SMTP((SMTP Service))
end
HNClient --> Algolia;
Scraper --> Websites;
OllamaClient --> Ollama;
Sender --> SMTP;
Pipe --> FS;
Assembler --> FS;
SFetch --> FS;
SScrape --> FS;
SSummarize --> FS;
SEmail --> FS;
%% Apply style to the subgraph using its ID after the block
style StageRunnersSubgraph fill:#f9f,stroke:#333,stroke-width:1px
```
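The scraper's contract in the diagrams above is "text or null": any failure is logged and the pipeline moves on. A sketch under that assumption, using `@extractus/article-extractor` from the tech stack (note the library returns sanitized HTML in `content`; stripping it to plain text for `{storyId}_article.txt` is omitted here):

```typescript
import { extract } from "@extractus/article-extractor";

// Fetch and extract article content, returning null on any failure so the
// pipeline can log the miss and continue. Error handling is illustrative.
export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const article = await extract(url);
    return article?.content ?? null;
  } catch (error) {
    console.warn(`Scraping failed for ${url}:`, error);
    return null;
  }
}
```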
## Key Architectural Decisions & Patterns
- Architecture Style: Simple sequential pipeline executed via CLI.
- Execution Environment: Local machine only; no cloud deployment and no database for MVP.
- Data Handling: Intermediate data persisted to the local filesystem in a date-stamped directory.
- HTTP Client: Mandatory use of the native Node.js v22 `fetch` API for all external HTTP requests.
- Modularity: Code organized into distinct modules for clients, scraping, email, core logic, utilities, and types to promote separation of concerns and testability.
- Stage Testing: Mandatory standalone scripts (`src/stages/*`) allow independent testing of each pipeline phase.
- Configuration: Environment variables loaded natively from the `.env` file; no `dotenv` package required (see the sketch after this list).
- Error Handling: Graceful handling of scraping failures (log and continue); basic logging for other API/network errors.
- Logging: Basic console logging via a simple wrapper (`src/utils/logger.ts`) for MVP; structured file logging is a post-MVP consideration.
- Key Libraries: `@extractus/article-extractor`, `date-fns`, `nodemailer`, `yargs`. (See `docs/tech-stack.md`.)
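The configuration decision above relies on Node.js v22's built-in `.env` support instead of `dotenv`. A minimal sketch; the variable names below are assumptions for illustration:

```typescript
// config.ts-style loader using process.loadEnvFile (native in Node.js v22),
// so no dotenv dependency is needed. Variable names are hypothetical.
export interface AppConfig {
  maxCommentsPerStory: number;
  ollamaEndpointUrl: string;
}

export function loadConfig(): AppConfig {
  process.loadEnvFile(); // reads ./.env into process.env natively
  return {
    maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? "50"),
    ollamaEndpointUrl: process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434",
  };
}
```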
## Core Workflow / Sequence Diagram (Main Pipeline)
```mermaid
sequenceDiagram
participant CLI_User as CLI User
participant Idx as src/index.ts
participant Pipe as core/pipeline.ts
participant Cfg as utils/config.ts
participant Log as utils/logger.ts
participant HN as clients/algoliaHNClient.ts
participant FS as Local FS [output/]
participant Scr as scraper/articleScraper.ts
participant Oll as clients/ollamaClient.ts
participant Asm as email/contentAssembler.ts
participant Tpl as email/templates.ts
participant Snd as email/emailSender.ts
participant Alg as Algolia HN API
participant Web as Article Website
participant Olm as Ollama API [Local]
participant SMTP as SMTP Service
Note right of CLI_User: Triggered via 'npm run dev'/'start'
CLI_User ->> Idx: Execute script
Idx ->> Cfg: Load .env config
Idx ->> Log: Initialize logger
Idx ->> Pipe: runPipeline()
Pipe ->> Log: Log start
Pipe ->> HN: fetchTopStories()
HN ->> Alg: Request stories
Alg -->> HN: Story data
HN -->> Pipe: stories[]
loop For each story
Pipe ->> HN: fetchCommentsForStory(storyId, max)
HN ->> Alg: Request comments
Alg -->> HN: Comment data
HN -->> Pipe: comments[]
Pipe ->> FS: Write {storyId}_data.json
end
Pipe ->> Log: Log HN fetch complete
loop For each story with URL
Pipe ->> Scr: scrapeArticle(story.url)
Scr ->> Web: Request article HTML [via fetch]
alt Scraping Successful
Web -->> Scr: HTML content
Scr -->> Pipe: articleText: string
Pipe ->> FS: Write {storyId}_article.txt
else Scraping Failed / Skipped
Web -->> Scr: Error / Non-HTML / Timeout
Scr -->> Pipe: articleText: null
Pipe ->> Log: Log scraping failure/skip
end
end
Pipe ->> Log: Log scraping complete
loop For each story
alt Article content exists
Pipe ->> Oll: generateSummary(prompt, articleText)
Oll ->> Olm: POST /api/generate [article]
Olm -->> Oll: Article Summary / Error
Oll -->> Pipe: articleSummary: string | null
else No article content
Pipe -->> Pipe: Set articleSummary = null
end
alt Comments exist
Pipe ->> Pipe: Format comments to text block
Pipe ->> Oll: generateSummary(prompt, commentsText)
Oll ->> Olm: POST /api/generate [comments]
Olm -->> Oll: Discussion Summary / Error
Oll -->> Pipe: discussionSummary: string | null
else No comments
Pipe -->> Pipe: Set discussionSummary = null
end
Pipe ->> FS: Write {storyId}_summary.json
end
Pipe ->> Log: Log summarization complete
Pipe ->> Asm: assembleDigestData(dateDirPath)
Asm ->> FS: Read _data.json, _summary.json files
FS -->> Asm: File contents
Asm -->> Pipe: digestData[]
alt Digest data assembled
Pipe ->> Tpl: renderDigestHtml(digestData, date)
Tpl -->> Pipe: htmlContent: string
Pipe ->> Snd: sendDigestEmail(subject, htmlContent)
Snd ->> Cfg: Load email config
Snd ->> SMTP: Send email
SMTP -->> Snd: Success/Failure
Snd -->> Pipe: success: boolean
Pipe ->> Log: Log email result
else Assembly failed / No data
Pipe ->> Log: Log skipping email
end
Pipe ->> Log: Log finished
```
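The Ollama step in the diagram is a single non-streaming POST to `/api/generate`. A sketch of that call; the endpoint path and request shape follow Ollama's documented API, while the model name, prompt wiring, and error-to-null behavior are assumptions:

```typescript
interface OllamaGenerateResponse {
  response: string; // the generated text when stream is false
}

// Returns the summary text, or null on any error so the pipeline can
// record a null summary and continue with the next story.
export async function generateSummary(
  prompt: string,
  content: string,
  endpoint = "http://localhost:11434",
  model = "llama3",
): Promise<string | null> {
  try {
    const res = await fetch(`${endpoint}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, prompt: `${prompt}\n\n${content}`, stream: false }),
    });
    if (!res.ok) return null;
    const data = (await res.json()) as OllamaGenerateResponse;
    return data.response;
  } catch {
    return null;
  }
}
```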
## Infrastructure and Deployment Overview
- Cloud Provider(s): N/A. Executes locally on the user's machine.
- Core Services Used: N/A (relies on external Algolia API, local Ollama, target websites, SMTP provider).
- Infrastructure as Code (IaC): N/A.
- Deployment Strategy: Manual execution via CLI (`npm run dev`, or `npm run start` after `npm run build`). No CI/CD pipeline required for MVP.
- Environments: Single environment: the local development machine.
## Key Reference Documents
- `docs/prd.md`
- `docs/epic1.md` ... `docs/epic5.md`
- `docs/tech-stack.md`
- `docs/project-structure.md`
- `docs/data-models.md`
- `docs/api-reference.md`
- `docs/environment-vars.md`
- `docs/coding-standards.md`
- `docs/testing-strategy.md`
- `docs/prompts.md`
## Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial draft | 2025-05-04 | 0.1 | Initial draft based on PRD | 3-Architect |