Files
BMAD-METHOD/V2-FULL-DEMO-WALKTHROUGH/architecture.txt
Brian Madison f7d6a4d2b5 V2 Frozen
2025-06-04 22:16:41 -05:00

255 lines
11 KiB
Plaintext

# BMad Hacker Daily Digest Architecture Document
## Technical Summary
The BMad Hacker Daily Digest is a command-line interface (CLI) tool designed to provide users with concise summaries of top Hacker News (HN) stories and their associated comment discussions . Built with TypeScript and Node.js (v22) , it operates entirely on the user's local machine . The core functionality involves a sequential pipeline: fetching story and comment data from the Algolia HN Search API , attempting to scrape linked article content , generating summaries using a local Ollama LLM instance , persisting intermediate data to the local filesystem , and finally assembling and emailing an HTML digest using Nodemailer . The architecture emphasizes modularity and testability, including mandatory standalone scripts for testing each pipeline stage . The project starts from the `bmad-boilerplate` template .
## High-Level Overview
The application follows a simple, sequential pipeline architecture executed via a manual CLI command (`npm run dev` or `npm start`) . There is no persistent database; the local filesystem is used to store intermediate data artifacts (fetched data, scraped text, summaries) between steps within a date-stamped directory . All external HTTP communication (Algolia API, article scraping, Ollama API) utilizes the native Node.js `Workspace` API .
```mermaid
graph LR
subgraph "BMad Hacker Daily Digest (Local CLI)"
A[index.ts / CLI Trigger] --> B(core/pipeline.ts);
B --> C{Fetch HN Data};
B --> D{Scrape Articles};
B --> E{Summarize Content};
B --> F{Assemble & Email Digest};
C --> G["Local FS (_data.json)"];
D --> H["Local FS (_article.txt)"];
E --> I["Local FS (_summary.json)"];
F --> G;
F --> H;
F --> I;
end
subgraph External Services
X[Algolia HN API];
Y[Article Websites];
Z["Ollama API (Local)"];
W[SMTP Service];
end
C --> X;
D --> Y;
E --> Z;
F --> W;
style G fill:#eee,stroke:#333,stroke-width:1px
style H fill:#eee,stroke:#333,stroke-width:1px
style I fill:#eee,stroke:#333,stroke-width:1px
```
## Component View
The application code (`src/`) is organized into logical modules based on the defined project structure (`docs/project-structure.md`). Key components include:
- **`src/index.ts`**: The main entry point, handling CLI invocation and initiating the pipeline.
- **`src/core/pipeline.ts`**: Orchestrates the sequential execution of the main pipeline stages (fetch, scrape, summarize, email).
- **`src/clients/`**: Modules responsible for interacting with external APIs.
- `algoliaHNClient.ts`: Communicates with the Algolia HN Search API.
- `ollamaClient.ts`: Communicates with the local Ollama API.
- **`src/scraper/articleScraper.ts`**: Handles fetching and extracting text content from article URLs.
- **`src/email/`**: Manages digest assembly, HTML rendering, and email dispatch via Nodemailer.
- `contentAssembler.ts`: Reads persisted data.
- `templates.ts`: Renders HTML.
- `emailSender.ts`: Sends the email.
- **`src/stages/`**: Contains standalone scripts (`Workspace_hn_data.ts`, `scrape_articles.ts`, etc.) for testing individual pipeline stages independently using local data where applicable.
- **`src/utils/`**: Shared utilities for configuration loading (`config.ts`), logging (`logger.ts`), and date handling (`dateUtils.ts`).
- **`src/types/`**: Shared TypeScript interfaces and types.
```mermaid
graph TD
subgraph AppComponents ["Application Components (src/)"]
Idx(index.ts) --> Pipe(core/pipeline.ts);
Pipe --> HNClient(clients/algoliaHNClient.ts);
Pipe --> Scraper(scraper/articleScraper.ts);
Pipe --> OllamaClient(clients/ollamaClient.ts);
Pipe --> Assembler(email/contentAssembler.ts);
Pipe --> Renderer(email/templates.ts);
Pipe --> Sender(email/emailSender.ts);
Pipe --> Utils(utils/*);
Pipe --> Types(types/*);
HNClient --> Types;
OllamaClient --> Types;
Assembler --> Types;
Renderer --> Types;
subgraph StageRunnersSubgraph ["Stage Runners (src/stages/)"]
SFetch(fetch_hn_data.ts) --> HNClient;
SFetch --> Utils;
SScrape(scrape_articles.ts) --> Scraper;
SScrape --> Utils;
SSummarize(summarize_content.ts) --> OllamaClient;
SSummarize --> Utils;
SEmail(send_digest.ts) --> Assembler;
SEmail --> Renderer;
SEmail --> Sender;
SEmail --> Utils;
end
end
subgraph Externals ["Filesystem & External"]
FS["Local Filesystem (output/)"]
Algolia((Algolia HN API))
Websites((Article Websites))
Ollama["Ollama API (Local)"]
SMTP((SMTP Service))
end
HNClient --> Algolia;
Scraper --> Websites;
OllamaClient --> Ollama;
Sender --> SMTP;
Pipe --> FS;
Assembler --> FS;
SFetch --> FS;
SScrape --> FS;
SSummarize --> FS;
SEmail --> FS;
%% Apply style to the subgraph using its ID after the block
style StageRunnersSubgraph fill:#f9f,stroke:#333,stroke-width:1px
```
## Key Architectural Decisions & Patterns
- **Architecture Style:** Simple Sequential Pipeline executed via CLI.
- **Execution Environment:** Local machine only; no cloud deployment, no database for MVP.
- **Data Handling:** Intermediate data persisted to local filesystem in a date-stamped directory.
- **HTTP Client:** Mandatory use of native Node.js v22 `Workspace` API for all external HTTP requests.
- **Modularity:** Code organized into distinct modules for clients, scraping, email, core logic, utilities, and types to promote separation of concerns and testability.
- **Stage Testing:** Mandatory standalone scripts (`src/stages/*`) allow independent testing of each pipeline phase.
- **Configuration:** Environment variables loaded natively from `.env` file; no `dotenv` package required.
- **Error Handling:** Graceful handling of scraping failures (log and continue); basic logging for other API/network errors.
- **Logging:** Basic console logging via a simple wrapper (`src/utils/logger.ts`) for MVP; structured file logging is a post-MVP consideration.
- **Key Libraries:** `@extractus/article-extractor`, `date-fns`, `nodemailer`, `yargs`. (See `docs/tech-stack.md`)
## Core Workflow / Sequence Diagram (Main Pipeline)
```mermaid
sequenceDiagram
participant CLI_User as CLI User
participant Idx as src/index.ts
participant Pipe as core/pipeline.ts
participant Cfg as utils/config.ts
participant Log as utils/logger.ts
participant HN as clients/algoliaHNClient.ts
participant FS as Local FS [output/]
participant Scr as scraper/articleScraper.ts
participant Oll as clients/ollamaClient.ts
participant Asm as email/contentAssembler.ts
participant Tpl as email/templates.ts
participant Snd as email/emailSender.ts
participant Alg as Algolia HN API
participant Web as Article Website
participant Olm as Ollama API [Local]
participant SMTP as SMTP Service
Note right of CLI_User: Triggered via 'npm run dev'/'start'
CLI_User ->> Idx: Execute script
Idx ->> Cfg: Load .env config
Idx ->> Log: Initialize logger
Idx ->> Pipe: runPipeline()
Pipe ->> Log: Log start
Pipe ->> HN: fetchTopStories()
HN ->> Alg: Request stories
Alg -->> HN: Story data
HN -->> Pipe: stories[]
loop For each story
Pipe ->> HN: fetchCommentsForStory(storyId, max)
HN ->> Alg: Request comments
Alg -->> HN: Comment data
HN -->> Pipe: comments[]
Pipe ->> FS: Write {storyId}_data.json
end
Pipe ->> Log: Log HN fetch complete
loop For each story with URL
Pipe ->> Scr: scrapeArticle(story.url)
Scr ->> Web: Request article HTML [via Workspace]
alt Scraping Successful
Web -->> Scr: HTML content
Scr -->> Pipe: articleText: string
Pipe ->> FS: Write {storyId}_article.txt
else Scraping Failed / Skipped
Web -->> Scr: Error / Non-HTML / Timeout
Scr -->> Pipe: articleText: null
Pipe ->> Log: Log scraping failure/skip
end
end
Pipe ->> Log: Log scraping complete
loop For each story
alt Article content exists
Pipe ->> Oll: generateSummary(prompt, articleText)
Oll ->> Olm: POST /api/generate [article]
Olm -->> Oll: Article Summary / Error
Oll -->> Pipe: articleSummary: string | null
else No article content
Pipe -->> Pipe: Set articleSummary = null
end
alt Comments exist
Pipe ->> Pipe: Format comments to text block
Pipe ->> Oll: generateSummary(prompt, commentsText)
Oll ->> Olm: POST /api/generate [comments]
Olm -->> Oll: Discussion Summary / Error
Oll -->> Pipe: discussionSummary: string | null
else No comments
Pipe -->> Pipe: Set discussionSummary = null
end
Pipe ->> FS: Write {storyId}_summary.json
end
Pipe ->> Log: Log summarization complete
Pipe ->> Asm: assembleDigestData(dateDirPath)
Asm ->> FS: Read _data.json, _summary.json files
FS -->> Asm: File contents
Asm -->> Pipe: digestData[]
alt Digest data assembled
Pipe ->> Tpl: renderDigestHtml(digestData, date)
Tpl -->> Pipe: htmlContent: string
Pipe ->> Snd: sendDigestEmail(subject, htmlContent)
Snd ->> Cfg: Load email config
Snd ->> SMTP: Send email
SMTP -->> Snd: Success/Failure
Snd -->> Pipe: success: boolean
Pipe ->> Log: Log email result
else Assembly failed / No data
Pipe ->> Log: Log skipping email
end
Pipe ->> Log: Log finished
```
## Infrastructure and Deployment Overview
- **Cloud Provider(s):** N/A. Executes locally on the user's machine.
- **Core Services Used:** N/A (relies on external Algolia API, local Ollama, target websites, SMTP provider).
- **Infrastructure as Code (IaC):** N/A.
- **Deployment Strategy:** Manual execution via CLI (`npm run dev` or `npm run start` after `npm run build`). No CI/CD pipeline required for MVP.
- **Environments:** Single environment: local development machine.
## Key Reference Documents
- `docs/prd.md`
- `docs/epic1.md` ... `docs/epic5.md`
- `docs/tech-stack.md`
- `docs/project-structure.md`
- `docs/data-models.md`
- `docs/api-reference.md`
- `docs/environment-vars.md`
- `docs/coding-standards.md`
- `docs/testing-strategy.md`
- `docs/prompts.md`
## Change Log
| Change | Date | Version | Description | Author |
| ------------- | ---------- | ------- | -------------------------- | ----------- |
| Initial draft | 2025-05-04 | 0.1 | Initial draft based on PRD | 3-Architect |