BMad Hacker Daily Digest Architecture Document

Technical Summary

The BMad Hacker Daily Digest is a command-line interface (CLI) tool designed to provide users with concise summaries of top Hacker News (HN) stories and their associated comment discussions. Built with TypeScript and Node.js (v22), it operates entirely on the user's local machine. The core functionality involves a sequential pipeline: fetching story and comment data from the Algolia HN Search API, attempting to scrape linked article content, generating summaries using a local Ollama LLM instance, persisting intermediate data to the local filesystem, and finally assembling and emailing an HTML digest using Nodemailer. The architecture emphasizes modularity and testability, including mandatory standalone scripts for testing each pipeline stage. The project starts from the bmad-boilerplate template.

High-Level Overview

The application follows a simple, sequential pipeline architecture executed via a manual CLI command (npm run dev or npm start). There is no persistent database; the local filesystem is used to store intermediate data artifacts (fetched data, scraped text, summaries) between steps within a date-stamped directory. All external HTTP communication (Algolia API, article scraping, Ollama API) uses the native Node.js fetch API.
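
Each run writes its artifacts beneath a per-run, date-stamped directory. As a minimal, purely illustrative sketch of how that directory might be derived (the helper name and output/ base path are assumptions; date-fns is one of the listed libraries):

```typescript
// Hypothetical sketch: create the date-stamped output directory for a run.
// The helper name and "output" base path are illustrative assumptions.
import { mkdir } from "node:fs/promises";
import path from "node:path";
import { format } from "date-fns";

export async function ensureDateStampedDir(baseDir = "output"): Promise<string> {
  const dirPath = path.join(baseDir, format(new Date(), "yyyy-MM-dd")); // e.g. output/2025-05-04
  await mkdir(dirPath, { recursive: true });
  return dirPath;
}
```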

```mermaid
graph LR
    subgraph "BMad Hacker Daily Digest (Local CLI)"
        A[index.ts / CLI Trigger] --> B(core/pipeline.ts);
        B --> C{Fetch HN Data};
        B --> D{Scrape Articles};
        B --> E{Summarize Content};
        B --> F{Assemble & Email Digest};
        C --> G["Local FS (_data.json)"];
        D --> H["Local FS (_article.txt)"];
        E --> I["Local FS (_summary.json)"];
        F --> G;
        F --> H;
        F --> I;
    end

    subgraph External Services
        X[Algolia HN API];
        Y[Article Websites];
        Z["Ollama API (Local)"];
        W[SMTP Service];
    end

    C --> X;
    D --> Y;
    E --> Z;
    F --> W;

    style G fill:#eee,stroke:#333,stroke-width:1px
    style H fill:#eee,stroke:#333,stroke-width:1px
    style I fill:#eee,stroke:#333,stroke-width:1px
```
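
Because all HTTP traffic goes through the built-in fetch API, the client modules need no extra HTTP dependency. The sketch below shows what a front-page request against the Algolia HN Search API could look like; the exact endpoint, query parameters, and response shape are assumptions rather than details taken from this document:

```typescript
// Hypothetical sketch of an Algolia HN Search request using the native fetch API (Node.js 22).
// Endpoint, query parameters, and the FetchedStory shape are illustrative assumptions.
interface FetchedStory {
  objectID: string;
  title: string;
  url: string | null;
  points: number;
}

export async function fetchTopStories(count = 10): Promise<FetchedStory[]> {
  const url = `https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=${count}`;
  const response = await fetch(url); // built-in fetch; no axios/node-fetch dependency
  if (!response.ok) {
    throw new Error(`Algolia request failed: ${response.status} ${response.statusText}`);
  }
  const body = (await response.json()) as { hits: FetchedStory[] };
  return body.hits.slice(0, count);
}
```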

Component View

The application code (src/) is organized into logical modules based on the defined project structure (docs/project-structure.md). Key components include:

  • src/index.ts: The main entry point, handling CLI invocation and initiating the pipeline.
  • src/core/pipeline.ts: Orchestrates the sequential execution of the main pipeline stages (fetch, scrape, summarize, email).
  • src/clients/: Modules responsible for interacting with external APIs.
    • algoliaHNClient.ts: Communicates with the Algolia HN Search API.
    • ollamaClient.ts: Communicates with the local Ollama API.
  • src/scraper/articleScraper.ts: Handles fetching and extracting text content from article URLs.
  • src/email/: Manages digest assembly, HTML rendering, and email dispatch via Nodemailer.
    • contentAssembler.ts: Reads persisted data.
    • templates.ts: Renders HTML.
    • emailSender.ts: Sends the email.
  • src/stages/: Contains standalone scripts (fetch_hn_data.ts, scrape_articles.ts, etc.) for testing individual pipeline stages independently using local data where applicable; a sketch of one such runner follows this list.
  • src/utils/: Shared utilities for configuration loading (config.ts), logging (logger.ts), and date handling (dateUtils.ts).
  • src/types/: Shared TypeScript interfaces and types.
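
To make the stage-runner pattern concrete, here is a minimal sketch of what a script such as src/stages/scrape_articles.ts might contain. Module imports, file naming, and the persisted JSON shape are assumptions based on the component list above, not the actual implementation:

```typescript
// Hypothetical stage runner: re-runs only the scraping stage against an existing
// date-stamped output directory, without touching the rest of the pipeline.
// Imports, CLI arguments, and persisted field names are illustrative assumptions.
import { promises as fs } from "node:fs";
import path from "node:path";
import { scrapeArticle } from "../scraper/articleScraper";
import { logger } from "../utils/logger";

async function main(): Promise<void> {
  const dateDir = process.argv[2] ?? new Date().toISOString().slice(0, 10);
  const outputDir = path.join("output", dateDir);

  const files = await fs.readdir(outputDir);
  for (const file of files.filter((f) => f.endsWith("_data.json"))) {
    const story = JSON.parse(await fs.readFile(path.join(outputDir, file), "utf8"));
    if (!story.url) continue; // nothing to scrape for stories without an external URL

    const text = await scrapeArticle(story.url);
    if (text === null) {
      logger.warn(`Scrape failed or skipped for ${story.storyId}`);
      continue; // graceful failure: log and move on
    }
    await fs.writeFile(path.join(outputDir, `${story.storyId}_article.txt`), text, "utf8");
  }
}

main().catch((err) => {
  logger.error(err);
  process.exit(1);
});
```
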
```mermaid
graph TD
    subgraph AppComponents ["Application Components (src/)"]
        Idx(index.ts) --> Pipe(core/pipeline.ts);
        Pipe --> HNClient(clients/algoliaHNClient.ts);
        Pipe --> Scraper(scraper/articleScraper.ts);
        Pipe --> OllamaClient(clients/ollamaClient.ts);
        Pipe --> Assembler(email/contentAssembler.ts);
        Pipe --> Renderer(email/templates.ts);
        Pipe --> Sender(email/emailSender.ts);

        Pipe --> Utils(utils/*);
        Pipe --> Types(types/*);
        HNClient --> Types;
        OllamaClient --> Types;
        Assembler --> Types;
        Renderer --> Types;

        subgraph StageRunnersSubgraph ["Stage Runners (src/stages/)"]
            SFetch(fetch_hn_data.ts) --> HNClient;
            SFetch --> Utils;
            SScrape(scrape_articles.ts) --> Scraper;
            SScrape --> Utils;
            SSummarize(summarize_content.ts) --> OllamaClient;
            SSummarize --> Utils;
            SEmail(send_digest.ts) --> Assembler;
            SEmail --> Renderer;
            SEmail --> Sender;
            SEmail --> Utils;
        end
    end

    subgraph Externals ["Filesystem & External"]
        FS["Local Filesystem (output/)"]
        Algolia((Algolia HN API))
        Websites((Article Websites))
        Ollama["Ollama API (Local)"]
        SMTP((SMTP Service))
    end

    HNClient --> Algolia;
    Scraper --> Websites;
    OllamaClient --> Ollama;
    Sender --> SMTP;

    Pipe --> FS;
    Assembler --> FS;

    SFetch --> FS;
    SScrape --> FS;
    SSummarize --> FS;
    SEmail --> FS;

    %% Apply style to the subgraph using its ID after the block
    style StageRunnersSubgraph fill:#f9f,stroke:#333,stroke-width:1px
```
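
The client modules shown above each wrap a single external endpoint. For clients/ollamaClient.ts, a hedged sketch might look like the following; it assumes Ollama's standard POST /api/generate endpoint with stream: false, while the function signature, environment variable names, and default model are illustrative:

```typescript
// Hypothetical sketch of clients/ollamaClient.ts using the native fetch API.
// Assumes Ollama's standard /api/generate endpoint; env var names, model, and
// the "return null on error" policy are illustrative assumptions.
const OLLAMA_URL = process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434";
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? "llama3";

export async function generateSummary(prompt: string, content: string): Promise<string | null> {
  try {
    const response = await fetch(`${OLLAMA_URL}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: OLLAMA_MODEL,
        prompt: `${prompt}\n\n${content}`,
        stream: false, // return a single JSON object instead of a token stream
      }),
    });
    if (!response.ok) return null;
    const body = (await response.json()) as { response: string };
    return body.response.trim();
  } catch {
    return null; // treat network/parse errors as "no summary", matching the pipeline's null handling
  }
}
```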

Key Architectural Decisions & Patterns

  • Architecture Style: Simple Sequential Pipeline executed via CLI.
  • Execution Environment: Local machine only; no cloud deployment, no database for MVP.
  • Data Handling: Intermediate data persisted to local filesystem in a date-stamped directory.
  • HTTP Client: Mandatory use of the native Node.js v22 fetch API for all external HTTP requests.
  • Modularity: Code organized into distinct modules for clients, scraping, email, core logic, utilities, and types to promote separation of concerns and testability.
  • Stage Testing: Mandatory standalone scripts (src/stages/*) allow independent testing of each pipeline phase.
  • Configuration: Environment variables loaded natively from the .env file; no dotenv package required (see the configuration sketch after this list).
  • Error Handling: Graceful handling of scraping failures (log and continue); basic logging for other API/network errors.
  • Logging: Basic console logging via a simple wrapper (src/utils/logger.ts) for MVP; structured file logging is a post-MVP consideration.
  • Key Libraries: @extractus/article-extractor, date-fns, nodemailer, yargs. (See docs/tech-stack.md)
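
As a sketch of the dotenv-free configuration approach, Node 22 can load a .env file natively, either by starting the process with --env-file=.env or by calling process.loadEnvFile() inside src/utils/config.ts. The variable names and defaults below are assumptions:

```typescript
// Hypothetical src/utils/config.ts: native .env loading, no dotenv dependency.
// Variable names (OUTPUT_DIR, MAX_COMMENTS_PER_STORY, ...) are illustrative assumptions.
export interface AppConfig {
  outputDir: string;
  maxCommentsPerStory: number;
  ollamaEndpointUrl: string;
}

export function loadConfig(): AppConfig {
  // Alternatively, launch with `node --env-file=.env dist/index.js` and skip this call.
  // Note: process.loadEnvFile() throws if the file is missing.
  process.loadEnvFile(".env");

  return {
    outputDir: process.env.OUTPUT_DIR ?? "./output",
    maxCommentsPerStory: Number(process.env.MAX_COMMENTS_PER_STORY ?? 10),
    ollamaEndpointUrl: process.env.OLLAMA_ENDPOINT_URL ?? "http://localhost:11434",
  };
}
```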

Core Workflow / Sequence Diagram (Main Pipeline)

```mermaid
sequenceDiagram
    participant CLI_User as CLI User
    participant Idx as src/index.ts
    participant Pipe as core/pipeline.ts
    participant Cfg as utils/config.ts
    participant Log as utils/logger.ts
    participant HN as clients/algoliaHNClient.ts
    participant FS as Local FS [output/]
    participant Scr as scraper/articleScraper.ts
    participant Oll as clients/ollamaClient.ts
    participant Asm as email/contentAssembler.ts
    participant Tpl as email/templates.ts
    participant Snd as email/emailSender.ts
    participant Alg as Algolia HN API
    participant Web as Article Website
    participant Olm as Ollama API [Local]
    participant SMTP as SMTP Service

    Note right of CLI_User: Triggered via 'npm run dev'/'start'

    CLI_User ->> Idx: Execute script
    Idx ->> Cfg: Load .env config
    Idx ->> Log: Initialize logger
    Idx ->> Pipe: runPipeline()
    Pipe ->> Log: Log start
    Pipe ->> HN: fetchTopStories()
    HN ->> Alg: Request stories
    Alg -->> HN: Story data
    HN -->> Pipe: stories[]
    loop For each story
        Pipe ->> HN: fetchCommentsForStory(storyId, max)
        HN ->> Alg: Request comments
        Alg -->> HN: Comment data
        HN -->> Pipe: comments[]
        Pipe ->> FS: Write {storyId}_data.json
    end
    Pipe ->> Log: Log HN fetch complete

    loop For each story with URL
        Pipe ->> Scr: scrapeArticle(story.url)
        Scr ->> Web: Request article HTML [via fetch]
        alt Scraping Successful
            Web -->> Scr: HTML content
            Scr -->> Pipe: articleText: string
            Pipe ->> FS: Write {storyId}_article.txt
        else Scraping Failed / Skipped
            Web -->> Scr: Error / Non-HTML / Timeout
            Scr -->> Pipe: articleText: null
            Pipe ->> Log: Log scraping failure/skip
        end
    end
    Pipe ->> Log: Log scraping complete

    loop For each story
        alt Article content exists
            Pipe ->> Oll: generateSummary(prompt, articleText)
            Oll ->> Olm: POST /api/generate [article]
            Olm -->> Oll: Article Summary / Error
            Oll -->> Pipe: articleSummary: string | null
        else No article content
            Pipe -->> Pipe: Set articleSummary = null
        end
        alt Comments exist
            Pipe ->> Pipe: Format comments to text block
            Pipe ->> Oll: generateSummary(prompt, commentsText)
            Oll ->> Olm: POST /api/generate [comments]
            Olm -->> Oll: Discussion Summary / Error
            Oll -->> Pipe: discussionSummary: string | null
        else No comments
            Pipe -->> Pipe: Set discussionSummary = null
        end
        Pipe ->> FS: Write {storyId}_summary.json
    end
    Pipe ->> Log: Log summarization complete

    Pipe ->> Asm: assembleDigestData(dateDirPath)
    Asm ->> FS: Read _data.json, _summary.json files
    FS -->> Asm: File contents
    Asm -->> Pipe: digestData[]
    alt Digest data assembled
        Pipe ->> Tpl: renderDigestHtml(digestData, date)
        Tpl -->> Pipe: htmlContent: string
        Pipe ->> Snd: sendDigestEmail(subject, htmlContent)
        Snd ->> Cfg: Load email config
        Snd ->> SMTP: Send email
        SMTP -->> Snd: Success/Failure
        Snd -->> Pipe: success: boolean
        Pipe ->> Log: Log email result
    else Assembly failed / No data
        Pipe ->> Log: Log skipping email
    end
    Pipe ->> Log: Log finished
```
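
Condensed into code, the sequence above amounts to one linear async function in core/pipeline.ts. The sketch below mirrors the diagram; module paths, function names, and data shapes are assumptions consistent with the component view, not the actual implementation:

```typescript
// Hypothetical shape of core/pipeline.ts, mirroring the sequence diagram above.
// Module paths, function names, and data shapes are illustrative assumptions.
import { fetchTopStories, fetchCommentsForStory } from "../clients/algoliaHNClient";
import { scrapeArticle } from "../scraper/articleScraper";
import { generateSummary } from "../clients/ollamaClient";
import { assembleDigestData } from "../email/contentAssembler";
import { renderDigestHtml } from "../email/templates";
import { sendDigestEmail } from "../email/emailSender";
import { logger } from "../utils/logger";

export async function runPipeline(dateDirPath: string): Promise<void> {
  logger.info("Pipeline started");

  // Stage 1: fetch stories and comments, persisting {storyId}_data.json per story.
  const stories = await fetchTopStories();
  for (const story of stories) {
    story.comments = await fetchCommentsForStory(story.storyId, 50);
    // ...write `${story.storyId}_data.json` into dateDirPath...
  }
  logger.info("HN fetch complete");

  // Stage 2: scrape linked articles; failures are logged and the pipeline continues.
  for (const story of stories) {
    if (!story.url) continue;
    story.articleText = await scrapeArticle(story.url); // null on failure or skip
    // ...write `${story.storyId}_article.txt` when text was extracted...
  }
  logger.info("Scraping complete");

  // Stage 3: summarize article text and comment threads via the local Ollama instance.
  for (const story of stories) {
    const articleSummary = story.articleText
      ? await generateSummary("Summarize this article:", story.articleText)
      : null;
    const commentsText = story.comments.map((c) => c.text).join("\n---\n");
    const discussionSummary = story.comments.length
      ? await generateSummary("Summarize this discussion:", commentsText)
      : null;
    // ...write `${story.storyId}_summary.json` containing { articleSummary, discussionSummary }...
  }
  logger.info("Summarization complete");

  // Stage 4: assemble persisted data, render HTML, and send the digest email.
  const digestData = await assembleDigestData(dateDirPath);
  if (digestData.length > 0) {
    const html = renderDigestHtml(digestData, dateDirPath);
    const sent = await sendDigestEmail(`BMad Hacker Daily Digest - ${dateDirPath}`, html);
    logger.info(`Email ${sent ? "sent" : "failed"}`);
  } else {
    logger.warn("No digest data assembled; skipping email");
  }

  logger.info("Pipeline finished");
}
```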

Infrastructure and Deployment Overview

  • Cloud Provider(s): N/A. Executes locally on the user's machine.
  • Core Services Used: N/A (relies on external Algolia API, local Ollama, target websites, SMTP provider).
  • Infrastructure as Code (IaC): N/A.
  • Deployment Strategy: Manual execution via CLI (npm run dev or npm run start after npm run build). No CI/CD pipeline required for MVP.
  • Environments: Single environment: local development machine.

Key Reference Documents

  • docs/prd.md
  • docs/epic1.md ... docs/epic5.md
  • docs/tech-stack.md
  • docs/project-structure.md
  • docs/data-models.md
  • docs/api-reference.md
  • docs/environment-vars.md
  • docs/coding-standards.md
  • docs/testing-strategy.md
  • docs/prompts.md

Change Log

| Change        | Date       | Version | Description                | Author      |
| ------------- | ---------- | ------- | -------------------------- | ----------- |
| Initial draft | 2025-05-04 | 0.1     | Initial draft based on PRD | 3-Architect |