# BMad Hacker Daily Digest Architecture Document

## Technical Summary

This document outlines the technical architecture for the BMad Hacker Daily Digest, a command-line tool built with TypeScript and Node.js v22. It adheres to the structure provided by the "bmad-boilerplate". The system fetches the top 10 Hacker News stories and their comments daily via the Algolia HN API, attempts to scrape linked articles, generates summaries for both articles (if scraped) and discussions using a local Ollama instance, persists intermediate data locally, and sends an HTML digest email via Nodemailer upon manual CLI execution. The architecture emphasizes modularity through distinct clients and processing stages, facilitating independent stage testing as required by the PRD. Execution is strictly local for the MVP.

## High-Level Overview

The application follows a sequential pipeline architecture triggered by a single CLI command (`npm run dev` or `npm start`). Data flows through distinct stages: HN Data Acquisition, Article Scraping, LLM Summarization, and Digest Assembly/Email Dispatch. Each stage persists its output to a date-stamped local directory, allowing subsequent stages to operate on this data and enabling stage-specific testing utilities.

**(Diagram Suggestion for Canvas: Create a flowchart showing the stages below)**

```mermaid
graph TD
    A["CLI Trigger (npm run dev/start)"] --> B("Initialize: Load Config, Setup Logger, Create Output Dir");
    B --> C{"Fetch HN Data (Top 10 Stories + Comments)"};
    C -- "Story/Comment Data" --> D("Persist HN Data: ./output/YYYY-MM-DD/{storyId}_data.json");
    D --> E{"Attempt Article Scraping (per story)"};
    E -- "Scraped Text (if successful)" --> F("Persist Article Text: ./output/YYYY-MM-DD/{storyId}_article.txt");
    F --> G{"Generate Summaries (Article + Discussion via Ollama)"};
    G -- "Summaries" --> H("Persist Summaries: ./output/YYYY-MM-DD/{storyId}_summary.json");
    H --> I{"Assemble Digest (Read persisted data)"};
    I -- "HTML Content" --> J{"Send Email via Nodemailer"};
    J --> K("Log Final Status & Exit");

    subgraph StageTests["Stage Testing Utilities"]
        direction LR
        T1["npm run stage:fetch"] --> D;
        T2["npm run stage:scrape"] --> F;
        T3["npm run stage:summarize"] --> H;
        T4["npm run stage:email"] --> J;
    end

    C -->|"Error / Skip (no comments)"| G;
    E -->|"Skip / Fail (no URL or scrape fails)"| G;
    G -->|"Summarization fail (persist null summaries)"| H;
    I -->|"Assembly fail (skip email)"| K;
```

## Component View

The application logic resides primarily within the `src/` directory, organized into modules responsible for specific pipeline stages or cross-cutting concerns.
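Before the component breakdown, a minimal orchestration sketch shows how these modules could fit together end to end. This is a hedged sketch, not the implementation: the function names follow the sequence diagram later in this document, while the module paths, config field names (`MAX_COMMENTS`, `ARTICLE_PROMPT`, `DISCUSSION_PROMPT`), logger calls, and the `formatComments` helper are illustrative assumptions.

```typescript
// Hypothetical sketch of the pipeline orchestration in src/index.ts.
// Function names mirror the sequence diagram below; module paths, config
// keys, logger calls, and formatComments() are illustrative assumptions.
import { mkdir, writeFile } from "node:fs/promises";
import path from "node:path";
import { loadConfig } from "./config";
import { logger } from "./logger";
import { fetchTopStories, fetchCommentsForStory } from "./clients/algoliaHNClient";
import { scrapeArticle } from "./scraper/articleScraper";
import { generateSummary } from "./clients/ollamaClient";
import { assembleDigestData } from "./email/contentAssembler";
import { renderDigestHtml } from "./email/templater";
import { sendDigestEmail } from "./email/emailSender";

// Assumed helper: flatten comment objects into plain text for the LLM prompt.
const formatComments = (comments: Array<{ author?: string; text?: string }>): string =>
  comments.map((c) => `${c.author ?? "anon"}: ${c.text ?? ""}`).join("\n");

async function main(): Promise<void> {
  const config = loadConfig();
  const date = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const outputDir = path.join("./output", date);
  await mkdir(outputDir, { recursive: true });

  const stories = await fetchTopStories(10);
  for (const story of stories) {
    const comments = await fetchCommentsForStory(story.storyId, config.MAX_COMMENTS);
    await writeFile(
      path.join(outputDir, `${story.storyId}_data.json`),
      JSON.stringify({ ...story, comments }, null, 2),
    );

    // Scraping is best-effort: a null result simply skips the article summary.
    let articleContent: string | null = null;
    if (story.url) {
      articleContent = await scrapeArticle(story.url);
      if (articleContent) {
        await writeFile(path.join(outputDir, `${story.storyId}_article.txt`), articleContent);
      }
    }

    const articleSummary = articleContent
      ? await generateSummary(config.ARTICLE_PROMPT, articleContent)
      : null;
    const discussionSummary = comments.length > 0
      ? await generateSummary(config.DISCUSSION_PROMPT, formatComments(comments))
      : null;
    await writeFile(
      path.join(outputDir, `${story.storyId}_summary.json`),
      JSON.stringify({ storyId: story.storyId, articleSummary, discussionSummary }, null, 2),
    );
  }

  const digestData = await assembleDigestData(outputDir);
  if (digestData.length > 0) {
    const html = renderDigestHtml(digestData, date);
    const success = await sendDigestEmail(`BMad Hacker Daily Digest - ${date}`, html);
    logger.info(`Digest email ${success ? "sent" : "failed"}`);
  }
  logger.info("Pipeline finished.");
}

main().catch((err) => {
  logger.error("Pipeline failed", err);
  process.exit(1);
});
```

Each per-story write corresponds to an artifact consumed by a later stage or by one of the stage testing scripts described below.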
**(Diagram Suggestion for Canvas: Create a component diagram showing modules and dependencies)**

```mermaid
graph TD
    subgraph src ["Source Code (src/)"]
        direction LR
        Entry["index.ts (Main Orchestrator)"]

        subgraph Config ["Configuration"]
            ConfMod["config.ts"]
            EnvFile[".env File"]
        end

        subgraph Utils ["Utilities"]
            Logger["logger.ts"]
        end

        subgraph Clients ["External Service Clients"]
            Algolia["clients/algoliaHNClient.ts"]
            Ollama["clients/ollamaClient.ts"]
        end

        Scraper["scraper/articleScraper.ts"]

        subgraph Email ["Email Handling"]
            Assembler["email/contentAssembler.ts"]
            Templater["email/templater.ts (or within Assembler)"]
            Sender["email/emailSender.ts"]
            Nodemailer["(nodemailer library)"]
        end

        subgraph Stages ["Stage Testing Scripts (src/stages/)"]
            FetchStage["fetch_hn_data.ts"]
            ScrapeStage["scrape_articles.ts"]
            SummarizeStage["summarize_content.ts"]
            SendStage["send_digest.ts"]
        end

        Entry --> ConfMod;
        Entry --> Logger;
        Entry --> Algolia;
        Entry --> Scraper;
        Entry --> Ollama;
        Entry --> Assembler;
        Entry --> Templater;
        Entry --> Sender;

        Algolia -- uses --> NativeFetch["Node.js v22 Native fetch"];
        Ollama -- uses --> NativeFetch;
        Scraper -- uses --> NativeFetch;
        Scraper -- uses --> ArticleExtractor["(@extractus/article-extractor)"];
        Sender -- uses --> Nodemailer;
        ConfMod -- reads --> EnvFile;
        Assembler -- reads --> LocalFS["Local Filesystem (./output)"];
        Entry -- writes --> LocalFS;

        FetchStage --> Algolia;
        FetchStage --> LocalFS;
        ScrapeStage --> Scraper;
        ScrapeStage --> LocalFS;
        SummarizeStage --> Ollama;
        SummarizeStage --> LocalFS;
        SendStage --> Assembler;
        SendStage --> Templater;
        SendStage --> Sender;
        SendStage --> LocalFS;
    end

    CLI["CLI (npm run ...)"] --> Entry;
    CLI -- runs --> FetchStage;
    CLI -- runs --> ScrapeStage;
    CLI -- runs --> SummarizeStage;
    CLI -- runs --> SendStage;
```

_Module Descriptions:_

- **`src/index.ts`**: The main entry point, orchestrating the entire pipeline flow from initialization to final email dispatch. Imports and calls functions from the other modules.
- **`src/config.ts`**: Responsible for loading and validating environment variables from the `.env` file using the `dotenv` library.
- **`src/logger.ts`**: Provides a simple console logging utility used throughout the application.
- **`src/clients/algoliaHNClient.ts`**: Encapsulates interaction with the Algolia Hacker News Search API, using the native `fetch` API to retrieve stories and comments.
- **`src/clients/ollamaClient.ts`**: Encapsulates interaction with the local Ollama API endpoint, using the native `fetch` API to generate summaries.
- **`src/scraper/articleScraper.ts`**: Handles fetching article HTML with the native `fetch` API and extracting text content using `@extractus/article-extractor`. Includes robust error handling for fetch and extraction failures.
- **`src/email/contentAssembler.ts`**: Reads persisted story data and summaries from the local output directory.
- **`src/email/templater.ts` (or integrated)**: Renders the HTML email content using the assembled data.
- **`src/email/emailSender.ts`**: Configures and uses Nodemailer to send the generated HTML email.
- **`src/stages/*.ts`**: Individual scripts designed to run specific pipeline stages independently for testing, using persisted data from previous stages as input where applicable.

## Key Architectural Decisions & Patterns

- **Pipeline Architecture:** A sequential flow in which each stage processes data and passes artifacts to the next via the local filesystem. Chosen for simplicity and to easily support independent stage testing.
- **Local Execution & File Persistence:** All execution is local, and intermediate artifacts (`_data.json`, `_article.txt`, `_summary.json`) are stored in a date-stamped `./output` directory. This avoids database setup for the MVP and facilitates debugging and stage testing.
- **Native `fetch` API:** Mandated by constraints for all HTTP requests (Algolia, Ollama, article scraping). Ensures usage of the latest Node.js features (hedged sketches of the scraper and Ollama client follow the Infrastructure overview below).
- **Modular Clients:** External interactions (Algolia, Ollama) are encapsulated in dedicated client modules (`src/clients/`). This promotes separation of concerns and makes swapping implementations (e.g., a different LLM API) easier.
- **Configuration via `.env`:** Standard approach using `dotenv` for managing API keys, endpoints, and behavioral parameters (as per the boilerplate).
- **Stage Testing Utilities:** Dedicated scripts (`src/stages/*.ts`) allow isolated testing of fetching, scraping, summarization, and emailing, fulfilling a key PRD requirement.
- **Graceful Error Handling (Scraping):** Article scraping failures are logged but do not halt the main pipeline, allowing the process to continue with discussion summaries only, as required (see the scraper sketch below). Other errors (API, LLM) are logged.

## Core Workflow / Sequence Diagrams (Simplified)

**(Diagram Suggestion for Canvas: Create a Sequence Diagram showing interactions)**

```mermaid
sequenceDiagram
    participant CLI
    participant Index as index.ts
    participant Config as config.ts
    participant Logger as logger.ts
    participant OutputDir as Output Dir Setup
    participant Algolia as algoliaHNClient.ts
    participant Scraper as articleScraper.ts
    participant Ollama as ollamaClient.ts
    participant Assembler as contentAssembler.ts
    participant Templater as templater.ts
    participant Sender as emailSender.ts
    participant FS as Local Filesystem (./output/YYYY-MM-DD)

    CLI->>Index: npm run dev
    Index->>Config: Load .env vars
    Index->>Logger: Initialize
    Index->>OutputDir: Create/Verify Date Dir
    Index->>Algolia: fetchTopStories()
    Algolia-->>Index: stories[]

    loop For Each Story
        Index->>Algolia: fetchCommentsForStory(storyId, MAX_COMMENTS)
        Algolia-->>Index: comments[]
        Index->>FS: Write {storyId}_data.json
        alt Has Valid story.url
            Index->>Scraper: scrapeArticle(story.url)
            Scraper-->>Index: articleContent (string | null)
            alt Scrape Success
                Index->>FS: Write {storyId}_article.txt
            end
        end
        alt Has articleContent
            Index->>Ollama: generateSummary(ARTICLE_PROMPT, articleContent)
            Ollama-->>Index: articleSummary (string | null)
        end
        alt Has comments[]
            Index->>Ollama: generateSummary(DISCUSSION_PROMPT, formattedComments)
            Ollama-->>Index: discussionSummary (string | null)
        end
        Index->>FS: Write {storyId}_summary.json
    end

    Index->>Assembler: assembleDigestData(dateDirPath)
    Assembler->>FS: Read _data.json, _summary.json files
    Assembler-->>Index: digestData[]
    alt digestData is not empty
        Index->>Templater: renderDigestHtml(digestData, date)
        Templater-->>Index: htmlContent
        Index->>Sender: sendDigestEmail(subject, htmlContent)
        Sender-->>Index: success (boolean)
    end
    Index->>Logger: Log final status
```

## Infrastructure and Deployment Overview

- **Cloud Provider(s):** N/A (local machine execution only for the MVP)
- **Core Services Used:** N/A
- **Infrastructure as Code (IaC):** N/A
- **Deployment Strategy:** Manual CLI execution (`npm run dev` for development with `ts-node`; `npm run build && npm start` to run the compiled JS). No automated deployment pipeline for the MVP.
- **Environments:** Single environment: the local development machine.
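The sketch below illustrates the graceful scraping behaviour described in the Key Architectural Decisions: a minimal, hedged version of `src/scraper/articleScraper.ts` that fetches HTML with the native `fetch` API, extracts readable text with `@extractus/article-extractor`, and returns `null` instead of throwing so the main pipeline can continue with the discussion summary only. The timeout value, User-Agent string, logger import, and tag-stripping step are assumptions, not the actual implementation.

```typescript
// Hypothetical sketch of src/scraper/articleScraper.ts: fetch article HTML with
// the native fetch API, extract readable text with @extractus/article-extractor,
// and never throw — a null return lets the pipeline fall back to the discussion
// summary. Timeout, User-Agent, and logging style are assumptions.
import { extractFromHtml } from "@extractus/article-extractor";
import { logger } from "../logger";

export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    const response = await fetch(url, {
      signal: AbortSignal.timeout(15_000), // assumed 15s timeout
      headers: { "User-Agent": "BMadHackerDigest/0.1" },
    });
    if (!response.ok) {
      logger.warn(`Scrape skipped: HTTP ${response.status} for ${url}`);
      return null;
    }
    const contentType = response.headers.get("content-type") ?? "";
    if (!contentType.includes("text/html")) {
      logger.warn(`Scrape skipped: non-HTML content (${contentType}) for ${url}`);
      return null;
    }
    const html = await response.text();
    const article = await extractFromHtml(html, url);
    if (!article?.content) {
      logger.warn(`Scrape skipped: no extractable content for ${url}`);
      return null;
    }
    // article.content is HTML; strip tags to plain text for the LLM prompt.
    return article.content.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  } catch (err) {
    logger.warn(`Scrape failed for ${url}: ${(err as Error).message}`);
    return null;
  }
}
```

Keeping the failure mode as a `null` return (rather than an exception) keeps the orchestrator's per-story loop free of scraping-specific try/catch blocks.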
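For the Ollama interaction, the hedged sketch below shows how `src/clients/ollamaClient.ts` could call a local Ollama instance over HTTP using the native `fetch` API. The request shape follows Ollama's standard `/api/generate` endpoint; the environment variable names (`OLLAMA_ENDPOINT_URL`, `OLLAMA_MODEL`), prompt concatenation, and logging style are assumptions rather than the actual implementation.

```typescript
// Hypothetical sketch of src/clients/ollamaClient.ts using the native fetch API
// against a local Ollama instance. Env var names and prompt handling are
// illustrative assumptions; the /api/generate request shape follows Ollama's
// standard API with stream: false for a single JSON response.
import { logger } from "../logger";

const OLLAMA_ENDPOINT = process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434";
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? "llama3";

export async function generateSummary(prompt: string, content: string): Promise<string | null> {
  try {
    const response = await fetch(`${OLLAMA_ENDPOINT}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: OLLAMA_MODEL,
        prompt: `${prompt}\n\n${content}`,
        stream: false, // request one JSON response rather than a token stream
      }),
    });
    if (!response.ok) {
      logger.warn(`Ollama request failed: HTTP ${response.status}`);
      return null;
    }
    const body = (await response.json()) as { response?: string };
    return body.response?.trim() ?? null;
  } catch (err) {
    logger.warn(`Ollama request failed: ${(err as Error).message}`);
    return null;
  }
}
```

As with the scraper, returning `null` on failure lets the pipeline persist a summary file with null fields and continue, matching the "persist null summaries" path in the high-level flowchart.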
## Key Reference Documents

- docs/prd.md
- docs/epic1-draft.txt, docs/epic2-draft.txt, ... docs/epic5-draft.txt
- docs/tech-stack.md
- docs/project-structure.md
- docs/coding-standards.md
- docs/api-reference.md
- docs/data-models.md
- docs/environment-vars.md
- docs/testing-strategy.md

## Change Log

| Change        | Date       | Version | Description                        | Author      |
| ------------- | ---------- | ------- | ---------------------------------- | ----------- |
| Initial draft | 2025-05-04 | 0.1     | Initial draft based on PRD & Epics | 3-Architect |