Epic 1 File
Epic 1: Project Initialization & Core Setup
Goal: Initialize the project using the `bmad-boilerplate`, manage dependencies, set up `.env` and config loading, establish a basic CLI entry point, and set up basic logging and the output directory structure. This provides the foundational setup for all subsequent development work.
Story List
Story 1.1: Initialize Project from Boilerplate
- User Story / Goal: As a developer, I want to set up the initial project structure using the `bmad-boilerplate`, so that I have the standard tooling (TS, Jest, ESLint, Prettier), configurations, and scripts in place.
- Detailed Requirements:
  - Copy or clone the contents of the `bmad-boilerplate` into the new project's root directory.
  - Initialize a git repository in the project root directory (if not already done by cloning).
  - Ensure the `.gitignore` file from the boilerplate is present.
  - Run `npm install` to download and install all `devDependencies` specified in the boilerplate's `package.json`.
  - Verify that the core boilerplate scripts (`lint`, `format`, `test`, `build`) execute without errors on the initial codebase.
- Acceptance Criteria (ACs):
  - AC1: The project directory contains the files and structure from `bmad-boilerplate`.
  - AC2: A `node_modules` directory exists and contains packages corresponding to `devDependencies`.
  - AC3: The `npm run lint` command completes successfully without reporting any linting errors.
  - AC4: The `npm run format` command completes successfully, potentially making formatting changes according to Prettier rules. Running it a second time should result in no changes.
  - AC5: The `npm run test` command executes Jest successfully (it may report "no tests found", which is acceptable at this stage).
  - AC6: The `npm run build` command executes successfully, creating a `dist` directory containing compiled JavaScript output.
  - AC7: The `.gitignore` file exists and includes entries for `node_modules/`, `.env`, `dist/`, etc., as specified in the boilerplate.
Story 1.2: Setup Environment Configuration
- User Story / Goal: As a developer, I want to establish the environment configuration mechanism using `.env` files, so that secrets and settings (like output paths) can be managed outside of version control, following boilerplate conventions.
- Detailed Requirements:
  - Add a production dependency for loading `.env` files (e.g., `dotenv`). Run `npm install dotenv --save-prod` (or the equivalent for the chosen library).
  - Verify the `.env.example` file exists (from the boilerplate).
  - Add an initial configuration variable `OUTPUT_DIR_PATH=./output` to `.env.example`.
  - Create the `.env` file locally by copying `.env.example`. Populate `OUTPUT_DIR_PATH` if needed (can keep the default).
  - Implement a utility module (e.g., `src/config.ts`) that loads environment variables from the `.env` file at application startup (see the sketch after this story).
  - The utility should export the loaded configuration values (initially just `OUTPUT_DIR_PATH`).
  - Ensure the `.env` file is listed in `.gitignore` and is not committed.
- Acceptance Criteria (ACs):
  - AC1: The chosen `.env` library (e.g., `dotenv`) is listed under `dependencies` in `package.json`, and `package-lock.json` is updated.
  - AC2: The `.env.example` file exists, is tracked by git, and contains the line `OUTPUT_DIR_PATH=./output`.
  - AC3: The `.env` file exists locally but is NOT tracked by git.
  - AC4: A configuration module (`src/config.ts` or similar) exists and successfully loads the `OUTPUT_DIR_PATH` value from `.env` when the application starts.
  - AC5: The loaded `OUTPUT_DIR_PATH` value is accessible within the application code.
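A minimal sketch of what such a config module could look like, assuming `dotenv` is the chosen library; the exported `AppConfig` shape and the `./output` fallback are illustrative, not mandated by this story:

```typescript
// src/config.ts — loads .env at startup and exposes typed config values.
import dotenv from 'dotenv';

// Populate process.env from the local .env file.
dotenv.config();

export interface AppConfig {
  outputDirPath: string;
}

export const config: AppConfig = {
  // Fall back to ./output when the variable is absent.
  outputDirPath: process.env.OUTPUT_DIR_PATH ?? './output',
};
```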
Story 1.3: Implement Basic CLI Entry Point & Execution
- User Story / Goal: As a developer, I want a basic `src/index.ts` entry point that can be executed via the boilerplate's `dev` and `start` scripts, providing a working foundation for the application logic.
- Detailed Requirements:
  - Create the main application entry point file at `src/index.ts`.
  - Implement minimal code within `src/index.ts` to (see the sketch after this story):
    - Import the configuration loading mechanism (from Story 1.2).
    - Log a simple startup message to the console (e.g., "BMad Hacker Daily Digest - Starting Up...").
    - (Optional) Log the loaded `OUTPUT_DIR_PATH` to verify config loading.
  - Confirm execution using the boilerplate scripts.
- Acceptance Criteria (ACs):
  - AC1: The `src/index.ts` file exists.
  - AC2: Running `npm run dev` executes `src/index.ts` via `ts-node` and logs the startup message to the console.
  - AC3: Running `npm run build` successfully compiles `src/index.ts` (and any imports) into the `dist` directory.
  - AC4: Running `npm start` (after a successful build) executes the compiled code from `dist` and logs the startup message to the console.
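A minimal sketch of the entry point at this stage, assuming the config module shape from the Story 1.2 sketch; the `console.log` calls are replaced by the logger utility in Story 1.4:

```typescript
// src/index.ts — minimal startup logic for Story 1.3.
import { config } from './config';

console.log('BMad Hacker Daily Digest - Starting Up...');
// Optional: verify that config loading works.
console.log(`Configured output directory: ${config.outputDirPath}`);
```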
Story 1.4: Setup Basic Logging and Output Directory
- User Story / Goal: As a developer, I want a basic console logging mechanism and the dynamic creation of a date-stamped output directory, so that the application can provide execution feedback and prepare for storing data artifacts in subsequent epics.
- Detailed Requirements:
  - Implement a simple, reusable logging utility module (e.g., `src/logger.ts`). Initially, it can wrap `console.log`, `console.warn`, and `console.error`.
  - Refactor `src/index.ts` to use this `logger` for its startup message(s).
  - In `src/index.ts` (or a setup function called by it), as shown in the sketch after this story:
    - Retrieve the `OUTPUT_DIR_PATH` from the configuration (loaded in Story 1.2).
    - Determine the current date in 'YYYY-MM-DD' format.
    - Construct the full path for the date-stamped subdirectory (e.g., `${OUTPUT_DIR_PATH}/YYYY-MM-DD`).
    - Check if the base output directory exists; if not, create it.
    - Check if the date-stamped subdirectory exists; if not, create it recursively using the Node.js `fs` module (e.g., `fs.mkdirSync(path, { recursive: true })`).
    - Log (using the logger) the full path of the output directory being used for the current run (e.g., "Output directory for this run: ./output/2025-05-04").
- Acceptance Criteria (ACs):
  - AC1: A logger utility module (`src/logger.ts` or similar) exists and is used for console output in `src/index.ts`.
  - AC2: Running `npm run dev` or `npm start` logs the startup message via the logger.
  - AC3: Running the application creates the base output directory (e.g., `./output` defined in `.env`) if it doesn't already exist.
  - AC4: Running the application creates a date-stamped subdirectory (e.g., `./output/2025-05-04`) within the base output directory if it doesn't already exist.
  - AC5: The application logs a message indicating the full path to the date-stamped output directory created/used for the current execution.
  - AC6: The application exits gracefully after performing these setup steps (for now).
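A minimal sketch of the logger wrapper and the output-directory setup, shown in one file for brevity; the `ensureOutputDir` helper name and the `info`/`warn`/`error` method names are illustrative assumptions:

```typescript
// src/logger.ts — a thin console wrapper plus date-stamped directory setup.
import fs from 'fs';
import path from 'path';
import { config } from './config';

export const logger = {
  info: (msg: string) => console.log(msg),
  warn: (msg: string) => console.warn(msg),
  error: (msg: string) => console.error(msg),
};

// Creates {OUTPUT_DIR_PATH}/YYYY-MM-DD (both levels) if needed and returns the path.
export function ensureOutputDir(): string {
  const dateStamp = new Date().toISOString().slice(0, 10); // 'YYYY-MM-DD' (UTC)
  const dirPath = path.join(config.outputDirPath, dateStamp);
  fs.mkdirSync(dirPath, { recursive: true }); // no-op if it already exists
  logger.info(`Output directory for this run: ${dirPath}`);
  return dirPath;
}
```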
Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 1 | 2-pm |
Epic 2 File
Epic 2: HN Data Acquisition & Persistence
Goal: Implement fetching the top 10 stories and their comments (respecting limits) from the Algolia HN API, and persist this raw data locally into the date-stamped output directory created in Epic 1. Implement a stage testing utility for fetching.
Story List
Story 2.1: Implement Algolia HN API Client
- User Story / Goal: As a developer, I want a dedicated client module to interact with the Algolia Hacker News Search API, so that fetching stories and comments is encapsulated, reusable, and uses the required native `fetch` API.
- Detailed Requirements:
  - Create a new module: `src/clients/algoliaHNClient.ts` (a partial sketch follows this story).
  - Implement an async function `fetchTopStories` within the client:
    - Use native `fetch` to call the Algolia HN Search API endpoint for front-page stories (e.g., `http://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=10`). Adjust `hitsPerPage` if needed to ensure 10 stories.
    - Parse the JSON response.
    - Extract the required metadata for each story: `objectID` (use as `storyId`), `title`, `url` (article URL), `points`, `num_comments`. Handle a potentially missing `url` field gracefully (log a warning; the story may be skipped later if a URL is needed).
    - Construct the `hnUrl` for each story (e.g., `https://news.ycombinator.com/item?id={storyId}`).
    - Return an array of structured story objects.
  - Implement a separate async function `fetchCommentsForStory` within the client:
    - Accept `storyId` and a `maxComments` limit as arguments.
    - Use native `fetch` to call the Algolia HN Search API endpoint for comments of a specific story (e.g., `http://hn.algolia.com/api/v1/search?tags=comment,story_{storyId}&hitsPerPage={maxComments}`).
    - Parse the JSON response.
    - Extract the required comment data: `objectID` (use as `commentId`), `comment_text`, `author`, `created_at`.
    - Filter out comments where `comment_text` is null or empty. Ensure only up to `maxComments` are returned.
    - Return an array of structured comment objects.
  - Implement basic error handling using `try...catch` around `fetch` calls and check the `response.ok` status. Log errors using the logger utility from Epic 1.
  - Define TypeScript interfaces/types for the expected structures of API responses (stories, comments) and the data returned by the client functions (e.g., `Story`, `Comment`).
- Acceptance Criteria (ACs):
  - AC1: The module `src/clients/algoliaHNClient.ts` exists and exports `fetchTopStories` and `fetchCommentsForStory` functions.
  - AC2: Calling `fetchTopStories` makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of 10 `Story` objects containing the specified metadata.
  - AC3: Calling `fetchCommentsForStory` with a valid `storyId` and `maxComments` limit makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of `Comment` objects (up to `maxComments`), filtering out empty ones.
  - AC4: Both functions use the native `fetch` API internally.
  - AC5: Network errors or non-successful API responses (e.g., status 4xx, 5xx) are caught and logged using the logger.
  - AC6: Relevant TypeScript types (`Story`, `Comment`, etc.) are defined and used within the client module.
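A partial sketch of the client covering `fetchTopStories` only (the comments function follows the same pattern); it assumes Node 18+ global `fetch` and the Algolia response shape described above (`hits` with `objectID`, `title`, `url`, `points`, `num_comments`):

```typescript
// src/clients/algoliaHNClient.ts — fetchTopStories sketch.
import { logger } from '../logger';

export interface Story {
  storyId: string;
  title: string;
  url: string | null;
  hnUrl: string;
  points: number;
  numComments: number;
}

export async function fetchTopStories(): Promise<Story[]> {
  const endpoint =
    'http://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=10';
  try {
    const response = await fetch(endpoint);
    if (!response.ok) {
      logger.error(`Algolia request failed with status ${response.status}`);
      return [];
    }
    const body = (await response.json()) as { hits: any[] };
    return body.hits.map((hit) => {
      if (!hit.url) {
        logger.warn(`Story ${hit.objectID} has no article URL`);
      }
      return {
        storyId: hit.objectID,
        title: hit.title,
        url: hit.url ?? null,
        hnUrl: `https://news.ycombinator.com/item?id=${hit.objectID}`,
        points: hit.points,
        numComments: hit.num_comments,
      };
    });
  } catch (err) {
    logger.error(`Failed to fetch top stories: ${err}`);
    return [];
  }
}
```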
Story 2.2: Integrate HN Data Fetching into Main Workflow
- User Story / Goal: As a developer, I want to integrate the HN data fetching logic into the main application workflow (`src/index.ts`), so that running the app retrieves the top 10 stories and their comments after completing the setup from Epic 1.
- Detailed Requirements:
  - Modify the main execution flow in `src/index.ts` (or a main async function called by it).
  - Import the `algoliaHNClient` functions.
  - Import the configuration module to access `MAX_COMMENTS_PER_STORY`.
  - After the Epic 1 setup (config load, logger init, output dir creation), call `fetchTopStories()`.
  - Log the number of stories fetched.
  - Iterate through the array of fetched `Story` objects.
  - For each `Story`, call `fetchCommentsForStory()`, passing the `story.storyId` and the configured `MAX_COMMENTS_PER_STORY`.
  - Store the fetched comments within the corresponding `Story` object in memory (e.g., add a `comments: Comment[]` property to the `Story` object).
  - Log progress using the logger utility (e.g., "Fetched 10 stories.", "Fetching up to X comments for story {storyId}...").
- Acceptance Criteria (ACs):
  - AC1: Running `npm run dev` executes the Epic 1 setup steps followed by fetching stories and then comments for each story.
  - AC2: Logs clearly show the start and successful completion of fetching stories, and the start of fetching comments for each of the 10 stories.
  - AC3: The configured `MAX_COMMENTS_PER_STORY` value is read from config and used in the calls to `fetchCommentsForStory`.
  - AC4: After successful execution, story objects held in memory contain a nested array of fetched comment objects. (This can be verified via a debugger or temporary logging.)
Story 2.3: Persist Fetched HN Data Locally
- User Story / Goal: As a developer, I want to save the fetched HN stories (including their comments) to JSON files in the date-stamped output directory, so that the raw data is persisted locally for subsequent pipeline stages and debugging.
- Detailed Requirements:
  - Define a consistent JSON structure for the output file content. Example: `{ storyId: "...", title: "...", url: "...", hnUrl: "...", points: ..., fetchedAt: "ISO_TIMESTAMP", comments: [{ commentId: "...", text: "...", author: "...", createdAt: "ISO_TIMESTAMP", ... }, ...] }`. Include a timestamp for when the data was fetched.
  - Import the Node.js `fs` (specifically `fs.writeFileSync`) and `path` modules.
  - In the main workflow (`src/index.ts`), within the loop iterating through stories (after comments have been fetched and added to the story object in Story 2.2), as sketched after this story:
    - Get the full path to the date-stamped output directory (determined in Epic 1).
    - Construct the filename for the story's data: `{storyId}_data.json`.
    - Construct the full file path using `path.join()`.
    - Serialize the complete story object (including comments and the fetch timestamp) to a JSON string using `JSON.stringify(storyObject, null, 2)` for readability.
    - Write the JSON string to the file using `fs.writeFileSync()`. Use a `try...catch` block for error handling.
  - Log (using the logger) the successful persistence of each story's data file or any errors encountered during file writing.
- Acceptance Criteria (ACs):
  - AC1: After running `npm run dev`, the date-stamped output directory (e.g., `./output/YYYY-MM-DD/`) contains exactly 10 files named `{storyId}_data.json`.
  - AC2: Each JSON file contains valid JSON representing a single story object, including its metadata, fetch timestamp, and an array of its fetched comments, matching the defined structure.
  - AC3: The number of comments in each file's `comments` array does not exceed `MAX_COMMENTS_PER_STORY`.
  - AC4: Logs indicate that saving data to a file was attempted for each story, reporting success or specific file writing errors.
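A minimal sketch of the per-story persistence step; the `persistStoryData` helper name, its module location, and the `StoryWithComments` type are illustrative assumptions built on earlier stories:

```typescript
// Persists one story (with comments and a fetch timestamp) to {storyId}_data.json.
import fs from 'fs';
import path from 'path';
import { logger } from '../logger';
import type { Story } from '../clients/algoliaHNClient';

type StoryWithComments = Story & { comments: unknown[] };

export function persistStoryData(outputDirPath: string, story: StoryWithComments): void {
  const filePath = path.join(outputDirPath, `${story.storyId}_data.json`);
  try {
    const payload = { ...story, fetchedAt: new Date().toISOString() };
    fs.writeFileSync(filePath, JSON.stringify(payload, null, 2), 'utf-8');
    logger.info(`Saved story data to ${filePath}`);
  } catch (err) {
    logger.error(`Failed to write ${filePath}: ${err}`);
  }
}
```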
Story 2.4: Implement Stage Testing Utility for HN Fetching
- User Story / Goal: As a developer, I want a separate, executable script that only performs the HN data fetching and persistence, so I can test and trigger this stage independently of the full pipeline.
- Detailed Requirements:
  - Create a new standalone script file: `src/stages/fetch_hn_data.ts` (a sketch follows this story).
  - This script should perform the essential setup required for this stage: initialize the logger, load configuration (`.env`), and determine and create the output directory (reuse or replicate logic from Epic 1 / `src/index.ts`).
  - The script should then execute the core logic of fetching stories via `algoliaHNClient.fetchTopStories`, fetching comments via `algoliaHNClient.fetchCommentsForStory` (using the loaded config for the limit), and persisting the results to JSON files using `fs.writeFileSync` (replicating logic from Story 2.3).
  - The script should log its progress using the logger utility.
  - Add a new script command to `package.json` under `"scripts"`: `"stage:fetch": "ts-node src/stages/fetch_hn_data.ts"`.
- Acceptance Criteria (ACs):
  - AC1: The file `src/stages/fetch_hn_data.ts` exists.
  - AC2: The script `stage:fetch` is defined in `package.json`'s `scripts` section.
  - AC3: Running `npm run stage:fetch` executes successfully, performing only the setup, fetch, and persist steps.
  - AC4: Running `npm run stage:fetch` creates the same 10 `{storyId}_data.json` files in the correct date-stamped output directory as running the main `npm run dev` command (at the current state of development).
  - AC5: Logs generated by `npm run stage:fetch` reflect only the fetching and persisting steps, not subsequent pipeline stages.
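A minimal sketch of the standalone stage runner, assuming the helpers sketched in earlier stories (`ensureOutputDir`, `persistStoryData`, `fetchCommentsForStory`) and a config field mirroring `MAX_COMMENTS_PER_STORY`; module paths are illustrative:

```typescript
// src/stages/fetch_hn_data.ts — fetch and persist only, no later pipeline stages.
import { logger, ensureOutputDir } from '../logger';
import { config } from '../config';
import { fetchTopStories, fetchCommentsForStory } from '../clients/algoliaHNClient';
import { persistStoryData } from '../persistStoryData';

async function main(): Promise<void> {
  const outputDir = ensureOutputDir();
  const stories = await fetchTopStories();
  logger.info(`Fetched ${stories.length} stories.`);
  for (const story of stories) {
    logger.info(`Fetching up to ${config.maxCommentsPerStory} comments for story ${story.storyId}...`);
    const comments = await fetchCommentsForStory(story.storyId, config.maxCommentsPerStory);
    persistStoryData(outputDir, { ...story, comments });
  }
}

main().catch((err) => {
  logger.error(`stage:fetch failed: ${err}`);
  process.exit(1);
});
```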
Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 2 | 2-pm |
Epic 3 File
Epic 3: Article Scraping & Persistence
Goal: Implement a best-effort article scraping mechanism to fetch and extract plain text content from the external URLs associated with fetched HN stories. Handle failures gracefully and persist successfully scraped text locally. Implement a stage testing utility for scraping.
Story List
Story 3.1: Implement Basic Article Scraper Module
- User Story / Goal: As a developer, I want a module that attempts to fetch HTML from a URL and extract the main article text using basic methods, handling common failures gracefully, so article content can be prepared for summarization.
- Detailed Requirements:
  - Create a new module: `src/scraper/articleScraper.ts` (a sketch follows this story).
  - Add a suitable HTML parsing/extraction library dependency (e.g., `@extractus/article-extractor`, recommended for simplicity, or `cheerio` for more control). Run `npm install @extractus/article-extractor --save-prod` (or the chosen alternative).
  - Implement an async function `scrapeArticle(url: string): Promise<string | null>` within the module.
  - Inside the function:
    - Use native `fetch` to retrieve content from the `url`. Set a reasonable timeout (e.g., 10-15 seconds). Include a `User-Agent` header to mimic a browser.
    - Handle potential `fetch` errors (network errors, timeouts) using `try...catch`.
    - Check the `response.ok` status. If not okay, log an error and return `null`.
    - Check the `Content-Type` header of the response. If it doesn't indicate HTML (e.g., does not include `text/html`), log a warning and return `null`.
    - If HTML is received, attempt to extract the main article text using the chosen library (`article-extractor` preferred).
    - Wrap the extraction logic in a `try...catch` to handle library-specific errors.
    - Return the extracted plain text string if successful. Ensure it is just text, not HTML markup.
    - Return `null` if extraction fails or results in empty content.
  - Log all significant events, errors, or reasons for returning null (e.g., "Scraping URL...", "Fetch failed:", "Non-HTML content type:", "Extraction failed:", "Successfully extracted text") using the logger utility.
  - Define TypeScript types/interfaces as needed.
- Acceptance Criteria (ACs):
  - AC1: The `articleScraper.ts` module exists and exports the `scrapeArticle` function.
  - AC2: The chosen scraping library (e.g., `@extractus/article-extractor`) is added to `dependencies` in `package.json`.
  - AC3: `scrapeArticle` uses native `fetch` with a timeout and a User-Agent header.
  - AC4: `scrapeArticle` correctly handles fetch errors, non-OK responses, and non-HTML content types by logging and returning `null`.
  - AC5: `scrapeArticle` uses the chosen library to attempt text extraction from valid HTML content.
  - AC6: `scrapeArticle` returns the extracted plain text on success, and `null` on any failure (fetch, non-HTML, extraction error, empty result).
  - AC7: Relevant logs are produced for success, failure modes, and errors encountered during the process.
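A minimal sketch of the scraper, assuming `@extractus/article-extractor` (whose `extractFromHtml` returns the article body as HTML, so a naive tag strip is applied) and Node 18+ for global `fetch` and `AbortSignal.timeout`:

```typescript
// src/scraper/articleScraper.ts — best-effort article text extraction.
import { extractFromHtml } from '@extractus/article-extractor';
import { logger } from '../logger';

export async function scrapeArticle(url: string): Promise<string | null> {
  try {
    logger.info(`Scraping URL: ${url}`);
    const response = await fetch(url, {
      headers: { 'User-Agent': 'Mozilla/5.0 (compatible; BMadDigest/0.1)' },
      signal: AbortSignal.timeout(15_000), // abort after 15 seconds
    });
    if (!response.ok) {
      logger.error(`Fetch failed: status ${response.status} for ${url}`);
      return null;
    }
    const contentType = response.headers.get('content-type') ?? '';
    if (!contentType.includes('text/html')) {
      logger.warn(`Non-HTML content type: ${contentType}`);
      return null;
    }
    const html = await response.text();
    const article = await extractFromHtml(html, url);
    // article?.content is HTML; strip tags to approximate plain text.
    const text = article?.content
      ?.replace(/<[^>]+>/g, ' ')
      .replace(/\s+/g, ' ')
      .trim();
    if (!text) {
      logger.warn(`Extraction failed or produced no content for ${url}`);
      return null;
    }
    logger.info(`Successfully extracted text (${text.length} chars)`);
    return text;
  } catch (err) {
    logger.error(`Scraping failed for ${url}: ${err}`);
    return null;
  }
}
```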
Story 3.2: Integrate Article Scraping into Main Workflow
- User Story / Goal: As a developer, I want to integrate the article scraper into the main workflow (`src/index.ts`), attempting to scrape the article for each HN story that has a valid URL, after fetching its data.
- Detailed Requirements:
  - Modify the main execution flow in `src/index.ts`.
  - Import the `scrapeArticle` function from `src/scraper/articleScraper.ts`.
  - Within the main loop iterating through the fetched stories (after comments are fetched in Epic 2):
    - Check if `story.url` exists and appears to be a valid HTTP/HTTPS URL. A simple check for starting with `http://` or `https://` is sufficient.
    - If the URL is missing or invalid, log a warning ("Skipping scraping for story {storyId}: Missing or invalid URL") and proceed to the next story's processing step.
    - If a valid URL exists, log ("Attempting to scrape article for story {storyId} from {story.url}").
    - Call `await scrapeArticle(story.url)`.
    - Store the result (the extracted text string or `null`) in memory, associated with the story object (e.g., add a property `articleContent: string | null`).
    - Log the outcome clearly (e.g., "Successfully scraped article for story {storyId}", "Failed to scrape article for story {storyId}").
- Acceptance Criteria (ACs):
  - AC1: Running `npm run dev` executes the Epic 1 & 2 steps, and then attempts article scraping for stories with valid URLs.
  - AC2: Stories with missing or invalid URLs are skipped, and a corresponding log message is generated.
  - AC3: For stories with valid URLs, the `scrapeArticle` function is called.
  - AC4: Logs clearly indicate the start and success/failure outcome of the scraping attempt for each relevant story.
  - AC5: Story objects held in memory after this stage contain an `articleContent` property holding the scraped text (string) or `null` if scraping was skipped or failed.
Story 3.3: Persist Scraped Article Text Locally
- User Story / Goal: As a developer, I want to save successfully scraped article text to a separate local file for each story, so that the text content is available as input for the summarization stage.
- Detailed Requirements:
  - Import the Node.js `fs` and `path` modules if not already present in `src/index.ts`.
  - In the main workflow (`src/index.ts`), immediately after a successful call to `scrapeArticle` for a story (where the result is a non-null string):
    - Retrieve the full path to the current date-stamped output directory.
    - Construct the filename: `{storyId}_article.txt`.
    - Construct the full file path using `path.join()`.
    - Get the successfully scraped article text string (`articleContent`).
    - Use `fs.writeFileSync(fullPath, articleContent, 'utf-8')` to save the text to the file. Wrap it in `try...catch` for file system errors.
    - Log the successful saving of the file (e.g., "Saved scraped article text to {filename}") or any file writing errors encountered.
  - Ensure no `_article.txt` file is created if `scrapeArticle` returned `null` (due to skipping or failure).
- Acceptance Criteria (ACs):
  - AC1: After running `npm run dev`, the date-stamped output directory contains `_article.txt` files only for those stories where `scrapeArticle` succeeded and returned text content.
  - AC2: The name of each article text file is `{storyId}_article.txt`.
  - AC3: The content of each `_article.txt` file is the plain text string returned by `scrapeArticle`.
  - AC4: Logs confirm the successful writing of each `_article.txt` file or report specific file writing errors.
  - AC5: No empty `_article.txt` files are created. Files only exist if scraping was successful.
Story 3.4: Implement Stage Testing Utility for Scraping
- User Story / Goal: As a developer, I want a separate script/command to test the article scraping logic using HN story data from local files, allowing independent testing and debugging of the scraper.
- Detailed Requirements:
  - Create a new standalone script file: `src/stages/scrape_articles.ts`.
  - Import the necessary modules: `fs`, `path`, `logger`, `config`, `scrapeArticle`.
  - The script should:
    - Initialize the logger.
    - Load configuration (to get `OUTPUT_DIR_PATH`).
    - Determine the target date-stamped directory path (e.g., `${OUTPUT_DIR_PATH}/YYYY-MM-DD`, using the current date or potentially an optional CLI argument). Ensure this directory exists.
    - Read the directory contents and identify all `{storyId}_data.json` files.
    - For each `_data.json` file found:
      - Read and parse the JSON content.
      - Extract the `storyId` and `url`.
      - If a valid `url` exists, call `await scrapeArticle(url)`.
      - If scraping succeeds (returns text), save the text to `{storyId}_article.txt` in the same directory (using logic from Story 3.3). Overwrite the file if it exists.
      - Log the progress and outcome (skip/success/fail) for each story processed.
  - Add a new script command to `package.json`: `"stage:scrape": "ts-node src/stages/scrape_articles.ts"`. Consider adding argument parsing later if needed to specify a date/directory.
- Acceptance Criteria (ACs):
  - AC1: The file `src/stages/scrape_articles.ts` exists.
  - AC2: The script `stage:scrape` is defined in `package.json`.
  - AC3: Running `npm run stage:scrape` (assuming a directory with `_data.json` files exists from a previous `stage:fetch` run) reads these files.
  - AC4: The script calls `scrapeArticle` for stories with valid URLs found in the JSON files.
  - AC5: The script creates/updates `{storyId}_article.txt` files in the target directory corresponding to successfully scraped articles.
  - AC6: The script logs its actions (reading files, attempting scraping, saving results) for each story ID processed.
  - AC7: The script operates solely based on local `_data.json` files and fetching from external article URLs; it does not call the Algolia HN API.
Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 3 | 2-pm |
Epic 4 File
Epic 4: LLM Summarization & Persistence
Goal: Integrate with the configured local Ollama instance to generate summaries for successfully scraped article text and fetched comments. Persist these summaries locally. Implement a stage testing utility for summarization.
Story List
Story 4.1: Implement Ollama Client Module
- User Story / Goal: As a developer, I want a client module to interact with the configured Ollama API endpoint via HTTP, handling requests and responses for text generation, so that summaries can be generated programmatically.
- Detailed Requirements:
  - Prerequisite: Ensure a local Ollama instance is installed and running, accessible via the URL defined in `.env` (`OLLAMA_ENDPOINT_URL`), and that the model specified in `.env` (`OLLAMA_MODEL`) has been downloaded (e.g., via `ollama pull model_name`). Instructions for this setup should be in the project README.
  - Create a new module: `src/clients/ollamaClient.ts` (a sketch follows this story).
  - Implement an async function `generateSummary(promptTemplate: string, content: string): Promise<string | null>`. (Note: the parameter name was changed for clarity.)
  - Add configuration variables `OLLAMA_ENDPOINT_URL` (e.g., `http://localhost:11434`) and `OLLAMA_MODEL` (e.g., `llama3`) to `.env.example`. Ensure they are loaded via the config module (`src/utils/config.ts`). Update the local `.env` with actual values. Add an optional `OLLAMA_TIMEOUT_MS` to `.env.example` with a default like `120000`.
  - Inside `generateSummary`:
    - Construct the full prompt string using the `promptTemplate` and the provided `content` (e.g., replacing a placeholder like `{Content Placeholder}` in the template, or simple concatenation if templates are basic).
    - Construct the Ollama API request payload (JSON): `{ model: configured_model, prompt: full_prompt, stream: false }`. Refer to the Ollama `/api/generate` documentation and `docs/data-models.md`.
    - Use native `fetch` to send a POST request to the configured Ollama endpoint + `/api/generate`. Set appropriate headers (`Content-Type: application/json`). Use the configured `OLLAMA_TIMEOUT_MS` or a reasonable default (e.g., 2 minutes).
    - Handle `fetch` errors (network, timeout) using `try...catch`.
    - Check `response.ok`. If not OK, log the status/error and return `null`.
    - Parse the JSON response from Ollama. Extract the generated text (typically in the `response` field). Refer to `docs/data-models.md`.
    - Check for potential errors within the Ollama response structure itself (e.g., an `error` field).
    - Return the extracted summary string on success. Return `null` on any failure.
    - Log key events: initiating the request (mention the model), receiving the response, success, failure reasons, and potentially request/response time, using the logger.
  - Define the necessary TypeScript types for the Ollama request payload and expected response structure in `src/types/ollama.ts` (referenced in `docs/data-models.md`).
- Acceptance Criteria (ACs):
  - AC1: The `ollamaClient.ts` module exists and exports `generateSummary`.
  - AC2: `OLLAMA_ENDPOINT_URL` and `OLLAMA_MODEL` are defined in `.env.example`, loaded via config, and used by the client. The optional `OLLAMA_TIMEOUT_MS` is handled.
  - AC3: `generateSummary` sends a correctly formatted POST request (model, full prompt based on template and content, `stream: false`) to the configured Ollama endpoint/path using native `fetch`.
  - AC4: Network errors, timeouts, and non-OK API responses are handled gracefully, logged, and result in a `null` return (given the prerequisite Ollama service is running).
  - AC5: A successful Ollama response is parsed correctly, and the generated text is extracted and returned as a string.
  - AC6: Unexpected Ollama response formats or internal errors (e.g., `{"error": "..."}`) are handled, logged, and result in a `null` return.
  - AC7: Logs provide visibility into the client's interaction with the Ollama API.
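A minimal sketch of the client; the payload and the `response`/`error` fields follow Ollama's documented `/api/generate` contract, while the camel-cased config accessors are assumptions about the config module's shape:

```typescript
// src/clients/ollamaClient.ts — non-streaming summary generation via /api/generate.
import { logger } from '../logger';
import { config } from '../utils/config';

export async function generateSummary(
  promptTemplate: string,
  content: string,
): Promise<string | null> {
  const fullPrompt = `${promptTemplate}\n\n${content}`; // simple concatenation
  try {
    logger.info(`Requesting summary from model ${config.ollamaModel}...`);
    const response = await fetch(`${config.ollamaEndpointUrl}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: config.ollamaModel, prompt: fullPrompt, stream: false }),
      signal: AbortSignal.timeout(config.ollamaTimeoutMs ?? 120_000),
    });
    if (!response.ok) {
      logger.error(`Ollama returned status ${response.status}`);
      return null;
    }
    const body = (await response.json()) as { response?: string; error?: string };
    if (body.error || !body.response) {
      logger.error(`Ollama error: ${body.error ?? 'empty response'}`);
      return null;
    }
    return body.response;
  } catch (err) {
    logger.error(`Ollama request failed: ${err}`);
    return null;
  }
}
```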
Story 4.2: Define Summarization Prompts
- User Story / Goal: As a developer, I want standardized base prompts for generating article summaries and HN discussion summaries documented centrally, ensuring consistent instructions are sent to the LLM.
- Detailed Requirements:
  - Define two standardized base prompts (`ARTICLE_SUMMARY_PROMPT`, `DISCUSSION_SUMMARY_PROMPT`) and document them in `docs/prompts.md`.
  - Ensure these prompts are accessible within the application code, for example, by defining them as exported constants in a dedicated module like `src/utils/prompts.ts`, which reads from or mirrors the content in `docs/prompts.md` (see the sketch after this story).
- Acceptance Criteria (ACs):
  - AC1: The `ARTICLE_SUMMARY_PROMPT` text is defined in `docs/prompts.md` with appropriate instructional content.
  - AC2: The `DISCUSSION_SUMMARY_PROMPT` text is defined in `docs/prompts.md` with appropriate instructional content.
  - AC3: The prompt texts documented in `docs/prompts.md` are available as constants or variables within the application code (e.g., via `src/utils/prompts.ts`) for use by the Ollama client integration.
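A sketch of the mirroring module; the prompt wording below is placeholder text only, since the canonical prompt content lives in `docs/prompts.md`:

```typescript
// src/utils/prompts.ts — exported constants mirroring docs/prompts.md.
export const ARTICLE_SUMMARY_PROMPT =
  'Summarize the following article in a few concise paragraphs, focusing on its key points:';

export const DISCUSSION_SUMMARY_PROMPT =
  'Summarize the main themes and notable viewpoints in the following Hacker News discussion:';
```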
Story 4.3: Integrate Summarization into Main Workflow
- User Story / Goal: As a developer, I want to integrate the Ollama client into the main workflow to generate summaries for each story's scraped article text (if available) and fetched comments, using centrally defined prompts and handling potential comment length limits.
- Detailed Requirements:
  - Modify the main execution flow in `src/index.ts` or `src/core/pipeline.ts`.
  - Import `ollamaClient.generateSummary` and the prompt constants/variables (e.g., from `src/utils/prompts.ts`, which reflects `docs/prompts.md`).
  - Load the optional `MAX_COMMENT_CHARS_FOR_SUMMARY` configuration value from `.env` via the config utility.
  - Within the main loop iterating through stories (after article scraping/persistence in Epic 3):
    - Article Summary Generation:
      - Check if the `story` object has non-null `articleContent`.
      - If yes: log "Attempting article summarization for story {storyId}", call `await generateSummary(ARTICLE_SUMMARY_PROMPT, story.articleContent)`, store the result (string or null) as `story.articleSummary`, and log success/failure.
      - If no: set `story.articleSummary = null` and log "Skipping article summarization: No content".
    - Discussion Summary Generation:
      - Check if the `story` object has a non-empty `comments` array.
      - If yes:
        - Format the `story.comments` array into a single text block suitable for the LLM prompt (e.g., concatenating `comment.text` with separators like `---`); a helper sketch follows this story.
        - Check the truncation limit: if `MAX_COMMENT_CHARS_FOR_SUMMARY` is configured to a positive number and the `formattedCommentsText` length exceeds it, truncate `formattedCommentsText` to the limit and log a warning: "Comment text truncated to {limit} characters for summarization for story {storyId}".
        - Log "Attempting discussion summarization for story {storyId}".
        - Call `await generateSummary(DISCUSSION_SUMMARY_PROMPT, formattedCommentsText)` (passing the potentially truncated text).
        - Store the result (string or null) as `story.discussionSummary`. Log success/failure.
      - If no: set `story.discussionSummary = null` and log "Skipping discussion summarization: No comments".
- Acceptance Criteria (ACs):
  - AC1: Running `npm run dev` executes the steps from Epics 1-3, then attempts summarization using the Ollama client.
  - AC2: An article summary is attempted only if `articleContent` exists for a story.
  - AC3: A discussion summary is attempted only if `comments` exist for a story.
  - AC4: `generateSummary` is called with the correct prompts (sourced consistently with `docs/prompts.md`) and the corresponding content (article text or formatted/potentially truncated comments).
  - AC5: If `MAX_COMMENT_CHARS_FOR_SUMMARY` is set and the comment text exceeds it, the text passed to `generateSummary` is truncated, and a warning is logged.
  - AC6: Logs clearly indicate the start, success, or failure (including null returns from the client) of both article and discussion summarization attempts per story.
  - AC7: Story objects in memory now contain `articleSummary` (string/null) and `discussionSummary` (string/null) properties.
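A minimal sketch of the comment formatting and truncation helper; the function name and the `text` field on comments are assumptions carried over from earlier stories, and `maxChars` mirrors `MAX_COMMENT_CHARS_FOR_SUMMARY`:

```typescript
// Formats comments into one text block and optionally truncates it for the LLM.
import { logger } from '../logger';

export function formatCommentsForSummary(
  comments: Array<{ text: string }>,
  maxChars?: number,
): string {
  // Join comment bodies with a separator the LLM can treat as a boundary.
  let text = comments.map((c) => c.text).join('\n---\n');
  if (maxChars && maxChars > 0 && text.length > maxChars) {
    logger.warn(`Comment text truncated to ${maxChars} characters for summarization`);
    text = text.slice(0, maxChars);
  }
  return text;
}
```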
Story 4.4: Persist Generated Summaries Locally
(No changes needed for this story based on recent decisions)
- User Story / Goal: As a developer, I want to save the generated article and discussion summaries (or null placeholders) to a local JSON file for each story, making them available for the email assembly stage.
- Detailed Requirements:
  - Define the structure for the summary output file: `{storyId}_summary.json`. Content example: `{ "storyId": "...", "articleSummary": "...", "discussionSummary": "...", "summarizedAt": "ISO_TIMESTAMP" }`. Note that `articleSummary` and `discussionSummary` can be `null`.
  - Import `fs` and `path` in `src/index.ts` or `src/core/pipeline.ts` if needed.
  - In the main workflow loop, after both summarization attempts (article and discussion) for a story are complete:
    - Create a summary result object containing `storyId`, `articleSummary` (string or null), `discussionSummary` (string or null), and the current ISO timestamp (`new Date().toISOString()`). Add this timestamp to the in-memory `story` object as well (`story.summarizedAt`).
    - Get the full path to the date-stamped output directory.
    - Construct the filename: `{storyId}_summary.json`.
    - Construct the full file path using `path.join()`.
    - Serialize the summary result object to JSON (`JSON.stringify(..., null, 2)`).
    - Use `fs.writeFileSync` to save the JSON to the file, wrapping it in `try...catch`.
    - Log the successful saving of the summary file or any file writing errors.
- Acceptance Criteria (ACs):
  - AC1: After running `npm run dev`, the date-stamped output directory contains 10 files named `{storyId}_summary.json`.
  - AC2: Each `_summary.json` file contains valid JSON adhering to the defined structure.
  - AC3: The `articleSummary` field contains the generated summary string if successful, otherwise `null`.
  - AC4: The `discussionSummary` field contains the generated summary string if successful, otherwise `null`.
  - AC5: A valid ISO timestamp is present in the `summarizedAt` field.
  - AC6: Logs confirm the successful writing of each summary file or report file system errors.
Story 4.5: Implement Stage Testing Utility for Summarization
(Changes needed to reflect prompt sourcing and optional truncation)
- User Story / Goal: As a developer, I want a separate script/command to test the LLM summarization logic using locally persisted data (HN comments, scraped article text), allowing independent testing of prompts and Ollama interaction.
- Detailed Requirements:
  - Create a new standalone script file: `src/stages/summarize_content.ts`.
  - Import the necessary modules: `fs`, `path`, `logger`, `config`, `ollamaClient`, and the prompt constants (e.g., from `src/utils/prompts.ts`).
  - The script should:
    - Initialize the logger and load configuration (Ollama endpoint/model, output dir, optional `MAX_COMMENT_CHARS_FOR_SUMMARY`).
    - Determine the target date-stamped directory path.
    - Find all `{storyId}_data.json` files in the directory.
    - For each `storyId` found:
      - Read `{storyId}_data.json` to get the comments. Format them into a single text block.
      - Attempt to read `{storyId}_article.txt`. Handle file-not-found gracefully. Store the content or null.
      - Call `ollamaClient.generateSummary` for the article text (if not null) using `ARTICLE_SUMMARY_PROMPT`.
      - Apply the truncation logic: if comments exist, check `MAX_COMMENT_CHARS_FOR_SUMMARY` and truncate the formatted comment text block if needed, logging a warning.
      - Call `ollamaClient.generateSummary` for the formatted comments (if comments exist) using `DISCUSSION_SUMMARY_PROMPT` (passing the potentially truncated text).
      - Construct the summary result object (with summaries or nulls, and a timestamp).
      - Save the result object to `{storyId}_summary.json` in the same directory (using logic from Story 4.4), overwriting it if it exists.
      - Log progress (reading files, calling Ollama, truncation warnings, saving results) for each story ID.
  - Add the script to `package.json`: `"stage:summarize": "ts-node src/stages/summarize_content.ts"`.
- Acceptance Criteria (ACs):
  - AC1: The file `src/stages/summarize_content.ts` exists.
  - AC2: The script `stage:summarize` is defined in `package.json`.
  - AC3: Running `npm run stage:summarize` (after `stage:fetch` and `stage:scrape` runs) reads `_data.json` and attempts to read `_article.txt` files from the target directory.
  - AC4: The script calls the `ollamaClient` with the correct prompts (sourced consistently with `docs/prompts.md`) and content derived only from the local files (this requires the Ollama service to be running, per the Story 4.1 prerequisite).
  - AC5: If `MAX_COMMENT_CHARS_FOR_SUMMARY` is set and applicable, the comment text is truncated before calling the client, and a warning is logged.
  - AC6: The script creates/updates `{storyId}_summary.json` files in the target directory reflecting the results of the Ollama calls (summaries or nulls).
  - AC7: Logs show the script processing each story ID found locally, interacting with Ollama, and saving results.
  - AC8: The script does not call the Algolia API or the article scraper module.
Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Integrate prompts.md refs | 2025-05-04 | 0.3 | Updated stories 4.2, 4.3, 4.5 | 3-Architect |
| Added Ollama Prereq Note | 2025-05-04 | 0.2 | Added note about local Ollama setup | 2-pm |
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 4 | 2-pm |
Epic 5 File
Epic 5: Digest Assembly & Email Dispatch
Goal: Assemble the collected story data and summaries from local files, format them into a readable HTML email digest, and send the email using Nodemailer with configured credentials. Implement a stage testing utility for emailing with a dry-run option.
Story List
Story 5.1: Implement Email Content Assembler
- User Story / Goal: As a developer, I want a module that reads the persisted story metadata (`_data.json`) and summaries (`_summary.json`) from a specified directory, consolidating the information needed to render the email digest.
- Detailed Requirements:
  - Create a new module: `src/email/contentAssembler.ts` (a partial sketch follows this story).
  - Define a TypeScript type/interface `DigestData` representing the data needed per story for the email template: `{ storyId: string, title: string, hnUrl: string, articleUrl: string | null, articleSummary: string | null, discussionSummary: string | null }`.
  - Implement an async function `assembleDigestData(dateDirPath: string): Promise<DigestData[]>`.
  - The function should:
    - Use Node.js `fs` to read the contents of the `dateDirPath`.
    - Identify all files matching the pattern `{storyId}_data.json`.
    - For each `storyId` found:
      - Read and parse the `{storyId}_data.json` file. Extract `title`, `hnUrl`, and `url` (use as `articleUrl`). Handle potential file read/parse errors gracefully (log and skip the story).
      - Attempt to read and parse the corresponding `{storyId}_summary.json` file. Handle file-not-found or parse errors gracefully (treat `articleSummary` and `discussionSummary` as `null`).
      - Construct a `DigestData` object for the story, including the extracted metadata and summaries (or nulls).
    - Collect all successfully constructed `DigestData` objects into an array.
    - Return the array. It should ideally contain 10 items if all previous stages succeeded.
  - Log progress (e.g., "Assembling digest data from directory...", "Processing story {storyId}...") and any errors encountered during file processing using the logger.
- Acceptance Criteria (ACs):
  - AC1: The `contentAssembler.ts` module exists and exports `assembleDigestData` and the `DigestData` type.
  - AC2: `assembleDigestData` correctly reads `_data.json` files from the provided directory path.
  - AC3: It attempts to read the corresponding `_summary.json` files, correctly handling cases where a summary file is missing or unparseable (resulting in null summaries for that story).
  - AC4: The function returns a promise resolving to an array of `DigestData` objects, populated with data extracted from the files.
  - AC5: Errors during file reading or JSON parsing are logged, and the function returns data for the successfully processed stories.
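A partial sketch of the assembler; the `_data.json` and `_summary.json` field names follow Stories 2.3 and 4.4, and the synchronous `fs` calls are a simplification:

```typescript
// src/email/contentAssembler.ts — consolidates per-story files into DigestData.
import fs from 'fs';
import path from 'path';
import { logger } from '../logger';

export interface DigestData {
  storyId: string;
  title: string;
  hnUrl: string;
  articleUrl: string | null;
  articleSummary: string | null;
  discussionSummary: string | null;
}

export async function assembleDigestData(dateDirPath: string): Promise<DigestData[]> {
  logger.info(`Assembling digest data from ${dateDirPath}...`);
  const results: DigestData[] = [];
  const dataFiles = fs.readdirSync(dateDirPath).filter((f) => f.endsWith('_data.json'));
  for (const file of dataFiles) {
    const storyId = file.replace('_data.json', '');
    try {
      const data = JSON.parse(fs.readFileSync(path.join(dateDirPath, file), 'utf-8'));
      // Summaries are optional: a missing or unparseable summary file yields nulls.
      let summary: { articleSummary?: string | null; discussionSummary?: string | null } = {};
      try {
        summary = JSON.parse(
          fs.readFileSync(path.join(dateDirPath, `${storyId}_summary.json`), 'utf-8'),
        );
      } catch {
        logger.warn(`No usable summary file for story ${storyId}`);
      }
      results.push({
        storyId,
        title: data.title,
        hnUrl: data.hnUrl,
        articleUrl: data.url ?? null,
        articleSummary: summary.articleSummary ?? null,
        discussionSummary: summary.discussionSummary ?? null,
      });
    } catch (err) {
      logger.error(`Skipping story ${storyId}: ${err}`);
    }
  }
  return results;
}
```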
Story 5.2: Create HTML Email Template & Renderer
- User Story / Goal: As a developer, I want a basic HTML email template and a function to render it with the assembled digest data, producing the final HTML content for the email body.
- Detailed Requirements:
  - Define the HTML structure. This can be done using template literals within a function or potentially using a simple template file (e.g., `src/email/templates/digestTemplate.html`) and `fs.readFileSync`. Template literals are simpler for the MVP (see the sketch after this story).
  - Create a function `renderDigestHtml(data: DigestData[], digestDate: string): string` (e.g., in `src/email/contentAssembler.ts` or a new `templater.ts`).
  - The function should generate an HTML string with:
    - A suitable title in the body (e.g., `<h1>Hacker News Top 10 Summaries for ${digestDate}</h1>`).
    - A loop through the `data` array.
    - For each `story` in `data`:
      - Display `<h2><a href="${story.articleUrl || story.hnUrl}">${story.title}</a></h2>`.
      - Display `<p><a href="${story.hnUrl}">View HN Discussion</a></p>`.
      - Conditionally display `<h3>Article Summary</h3><p>${story.articleSummary}</p>` only if `story.articleSummary` is not null/empty.
      - Conditionally display `<h3>Discussion Summary</h3><p>${story.discussionSummary}</p>` only if `story.discussionSummary` is not null/empty.
      - Include a separator (e.g., `<hr style="margin-top: 20px; margin-bottom: 20px;">`).
  - Use basic inline CSS for minimal styling (margins, etc.) to ensure readability. Avoid complex layouts.
  - Return the complete HTML document as a string.
- Acceptance Criteria (ACs):
  - AC1: A function `renderDigestHtml` exists that accepts the digest data array and a date string.
  - AC2: The function returns a single, complete HTML string.
  - AC3: The generated HTML includes a title with the date and correctly iterates through the story data.
  - AC4: For each story, the HTML displays the linked title, the HN link, and conditionally displays the article and discussion summaries with headings.
  - AC5: Basic separators and margins are used for readability. The HTML is simple and likely to render reasonably in most email clients.
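A minimal sketch of the renderer using template literals, per the MVP suggestion above; the document skeleton and inline styles are illustrative:

```typescript
// renderDigestHtml — builds the complete HTML email body as one string.
import type { DigestData } from './contentAssembler';

export function renderDigestHtml(data: DigestData[], digestDate: string): string {
  const stories = data
    .map(
      (story) => `
  <h2><a href="${story.articleUrl || story.hnUrl}">${story.title}</a></h2>
  <p><a href="${story.hnUrl}">View HN Discussion</a></p>
  ${story.articleSummary ? `<h3>Article Summary</h3><p>${story.articleSummary}</p>` : ''}
  ${story.discussionSummary ? `<h3>Discussion Summary</h3><p>${story.discussionSummary}</p>` : ''}
  <hr style="margin-top: 20px; margin-bottom: 20px;">`,
    )
    .join('\n');
  return `<!DOCTYPE html>
<html><body style="font-family: sans-serif; margin: 20px;">
  <h1>Hacker News Top 10 Summaries for ${digestDate}</h1>
  ${stories}
</body></html>`;
}
```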
Story 5.3: Implement Nodemailer Email Sender
- User Story / Goal: As a developer, I want a module to send the generated HTML email using Nodemailer, configured with credentials stored securely in the environment file.
- Detailed Requirements:
  - Add the Nodemailer dependencies: `npm install nodemailer @types/nodemailer --save-prod`.
  - Add the required configuration variables to `.env.example` (and the local `.env`): `EMAIL_HOST`, `EMAIL_PORT` (e.g., 587), `EMAIL_SECURE` (e.g., `false` for STARTTLS on 587, `true` for 465), `EMAIL_USER`, `EMAIL_PASS`, `EMAIL_FROM` (e.g., `"Your Name <you@example.com>"`), `EMAIL_RECIPIENTS` (comma-separated list).
  - Create a new module: `src/email/emailSender.ts` (a sketch follows this story).
  - Implement an async function `sendDigestEmail(subject: string, htmlContent: string): Promise<boolean>`.
  - Inside the function:
    - Load the `EMAIL_*` variables from the config module.
    - Create a Nodemailer transporter using `nodemailer.createTransport` with the loaded config (host, port, secure flag, `auth: { user, pass }`).
    - Verify the transporter configuration using `transporter.verify()` (optional but recommended). Log verification success/failure.
    - Parse the `EMAIL_RECIPIENTS` string into an array or comma-separated string suitable for the `to` field.
    - Define the `mailOptions`: `{ from: EMAIL_FROM, to: parsedRecipients, subject: subject, html: htmlContent }`.
    - Call `await transporter.sendMail(mailOptions)`.
    - If `sendMail` succeeds, log the success message including the `messageId` from the result. Return `true`.
    - If `sendMail` fails (throws an error), log the error using the logger. Return `false`.
- Acceptance Criteria (ACs):
  - AC1: The `nodemailer` and `@types/nodemailer` dependencies are added.
  - AC2: The `EMAIL_*` variables are defined in `.env.example` and loaded from config.
  - AC3: The `emailSender.ts` module exists and exports `sendDigestEmail`.
  - AC4: `sendDigestEmail` correctly creates a Nodemailer transporter using the configuration from `.env`. Transporter verification is attempted (optional AC).
  - AC5: The `to` field is correctly populated based on `EMAIL_RECIPIENTS`.
  - AC6: `transporter.sendMail` is called with the correct `from`, `to`, `subject`, and `html` options.
  - AC7: Email sending success (including the message ID) or failure is logged clearly.
  - AC8: The function returns `true` on successful sending, `false` otherwise.
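A minimal sketch of the sender; `nodemailer.createTransport`, `transporter.verify()`, and `sendMail` are Nodemailer's documented API, while the camel-cased config accessors are assumptions about the config module's shape:

```typescript
// src/email/emailSender.ts — sends the rendered digest via SMTP.
import nodemailer from 'nodemailer';
import { logger } from '../logger';
import { config } from '../utils/config';

export async function sendDigestEmail(subject: string, htmlContent: string): Promise<boolean> {
  const transporter = nodemailer.createTransport({
    host: config.emailHost,
    port: config.emailPort,
    secure: config.emailSecure, // true for 465, false for STARTTLS on 587
    auth: { user: config.emailUser, pass: config.emailPass },
  });
  try {
    await transporter.verify(); // optional sanity check of connection/credentials
    const info = await transporter.sendMail({
      from: config.emailFrom,
      to: config.emailRecipients, // Nodemailer accepts a comma-separated string
      subject,
      html: htmlContent,
    });
    logger.info(`Digest email sent: ${info.messageId}`);
    return true;
  } catch (err) {
    logger.error(`Failed to send digest email: ${err}`);
    return false;
  }
}
```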
Story 5.4: Integrate Email Assembly and Sending into Main Workflow
- User Story / Goal: As a developer, I want the main application workflow (`src/index.ts`) to orchestrate the final steps: assembling the digest data, rendering the HTML, and triggering the email send after all previous stages are complete.
- Detailed Requirements:
  - Modify the main execution flow in `src/index.ts`.
  - Import `assembleDigestData`, `renderDigestHtml`, and `sendDigestEmail`.
  - Execute these steps after the main loop (where stories are fetched, scraped, summarized, and persisted) completes:
    - Log "Starting final digest assembly and email dispatch...".
    - Determine the path to the current date-stamped output directory.
    - Call `const digestData = await assembleDigestData(dateDirPath)`.
    - Check if the `digestData` array is not empty.
      - If yes:
        - Get the current date string (e.g., 'YYYY-MM-DD').
        - `const htmlContent = renderDigestHtml(digestData, currentDate)`.
        - Construct the subject as a template literal: "BMad Hacker Daily Digest - ${currentDate}".
        - `const emailSent = await sendDigestEmail(subject, htmlContent)`.
        - Log the final outcome based on `emailSent` ("Digest email sent successfully." or "Failed to send digest email.").
      - If no (`digestData` is empty or assembly failed):
        - Log an error: "Failed to assemble digest data or no data found. Skipping email."
    - Log "BMad Hacker Daily Digest process finished."
- Acceptance Criteria (ACs):
  - AC1: Running `npm run dev` executes all stages (Epics 1-4) and then proceeds to email assembly and sending.
  - AC2: `assembleDigestData` is called correctly with the output directory path after the other processing is done.
  - AC3: If data is assembled, `renderDigestHtml` and `sendDigestEmail` are called with the correct data, subject, and HTML.
  - AC4: The final success or failure of the email sending step is logged.
  - AC5: If `assembleDigestData` returns no data, email sending is skipped, and an appropriate message is logged.
  - AC6: The application logs a final completion message.
Story 5.5: Implement Stage Testing Utility for Emailing
- User Story / Goal: As a developer, I want a separate script/command to test the email assembly, rendering, and sending logic using persisted local data, including a crucial `--dry-run` option to prevent accidental email sending during tests.
- Detailed Requirements:
  - Add the `yargs` dependency for argument parsing: `npm install yargs @types/yargs --save-dev`.
  - Create a new standalone script file: `src/stages/send_digest.ts`.
  - Import the necessary modules: `fs`, `path`, `logger`, `config`, `assembleDigestData`, `renderDigestHtml`, `sendDigestEmail`, `yargs`.
  - Use `yargs` to parse command-line arguments, specifically looking for a `--dry-run` boolean flag (defaulting to `false`). Allow an optional argument for specifying the date-stamped directory; otherwise default to the current date. (A parsing sketch follows this story.)
  - The script should:
    - Initialize the logger and load config.
    - Determine the target date-stamped directory path (from the argument or the default). Log the target directory.
    - Call `await assembleDigestData(dateDirPath)`.
    - If data is assembled and not empty:
      - Determine the date string for the subject/title.
      - Call `renderDigestHtml(digestData, dateString)` to get the HTML.
      - Construct the subject string.
      - Check the `dryRun` flag:
        - If `true`: log "DRY RUN enabled. Skipping actual email send.". Log the subject. Save the `htmlContent` to a file in the target directory (e.g., `_digest_preview.html`). Log that the preview file was saved.
        - If `false`: log "Live run: Attempting to send email...". Call `await sendDigestEmail(subject, htmlContent)`. Log success/failure based on the return value.
    - If data assembly fails or is empty, log the error.
  - Add the script to `package.json`: `"stage:email": "ts-node src/stages/send_digest.ts --"`. The `--` allows passing arguments like `--dry-run`.
- Acceptance Criteria (ACs):
  - AC1: The file `src/stages/send_digest.ts` exists. The `yargs` dependency is added.
  - AC2: The script `stage:email` is defined in `package.json`, allowing arguments.
  - AC3: Running `npm run stage:email -- --dry-run` reads local data, renders the HTML, logs the intent, saves `_digest_preview.html` locally, and does not call `sendDigestEmail`.
  - AC4: Running `npm run stage:email` (without `--dry-run`) reads local data, renders the HTML, and does call `sendDigestEmail`, logging the outcome.
  - AC5: The script correctly identifies and acts upon the `--dry-run` flag.
  - AC6: Logs clearly distinguish between dry runs and live runs and report success/failure.
  - AC7: The script operates using only local files and the email configuration/service; it does not invoke prior pipeline stages (Algolia, scraping, Ollama).
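A minimal sketch of the argument parsing with `yargs` (v17-style API); the optional `date` option for selecting a directory is an assumption:

```typescript
// Argument parsing for src/stages/send_digest.ts.
import yargs from 'yargs';
import { hideBin } from 'yargs/helpers';

const argv = yargs(hideBin(process.argv))
  .option('dry-run', {
    type: 'boolean',
    default: false,
    describe: 'Render and save the digest HTML without sending email',
  })
  .option('date', {
    type: 'string',
    describe: 'Target date-stamped directory (YYYY-MM-DD); defaults to today',
  })
  .parseSync();

if (argv.dryRun) {
  console.log('DRY RUN enabled. Skipping actual email send.');
}
```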
Change Log
| Change | Date | Version | Description | Author |
|---|---|---|---|---|
| Initial Draft | 2025-05-04 | 0.1 | First draft of Epic 5 | 2-pm |