# Epic 2: HN Data Acquisition & Persistence

**Goal:** Fetch the top 10 stories and their comments (respecting configured limits) from the Algolia HN Search API, and persist this raw data locally into the date-stamped output directory created in Epic 1. Also implement a stage testing utility for the fetch step.

## Story List
### Story 2.1: Implement Algolia HN API Client

- **User Story / Goal:** As a developer, I want a dedicated client module to interact with the Algolia Hacker News Search API, so that fetching stories and comments is encapsulated, reusable, and uses the required native `fetch` API.

- **Detailed Requirements:**
  - Create a new module: `src/clients/algoliaHNClient.ts`.
  - Implement an async function `fetchTopStories` within the client:
    - Use native `fetch` to call the Algolia HN Search API endpoint for front-page stories (e.g., `https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=10`). Adjust `hitsPerPage` if needed to ensure 10 stories.
    - Parse the JSON response.
    - Extract the required metadata for each story: `objectID` (use as `storyId`), `title`, `url` (article URL), `points`, and `num_comments`. Handle a potentially missing `url` field gracefully (log a warning; the story may be skipped later if a URL is required).
    - Construct the `hnUrl` for each story (e.g., `https://news.ycombinator.com/item?id={storyId}`).
    - Return an array of structured story objects.
  - Implement a separate async function `fetchCommentsForStory` within the client:
    - Accept `storyId` and a `maxComments` limit as arguments.
    - Use native `fetch` to call the Algolia HN Search API endpoint for the comments of a specific story (e.g., `https://hn.algolia.com/api/v1/search?tags=comment,story_{storyId}&hitsPerPage={maxComments}`).
    - Parse the JSON response.
    - Extract the required comment data: `objectID` (use as `commentId`), `comment_text`, `author`, and `created_at`.
    - Filter out comments whose `comment_text` is null or empty. Ensure no more than `maxComments` comments are returned.
    - Return an array of structured comment objects.
  - Implement basic error handling using `try...catch` around `fetch` calls and check the `response.ok` status. Log errors using the logger utility from Epic 1.
  - Define TypeScript interfaces/types for the expected API response shapes (stories, comments) and for the data returned by the client functions (e.g., `Story`, `Comment`).

- **Acceptance Criteria (ACs):**
  - AC1: The module `src/clients/algoliaHNClient.ts` exists and exports the `fetchTopStories` and `fetchCommentsForStory` functions.
  - AC2: Calling `fetchTopStories` makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of 10 `Story` objects containing the specified metadata.
  - AC3: Calling `fetchCommentsForStory` with a valid `storyId` and `maxComments` limit makes a network request to the correct Algolia endpoint and returns a promise resolving to an array of up to `maxComments` `Comment` objects, with empty comments filtered out.
  - AC4: Both functions use the native `fetch` API internally.
  - AC5: Network errors and non-successful API responses (e.g., status 4xx, 5xx) are caught and logged using the logger.
  - AC6: Relevant TypeScript types (`Story`, `Comment`, etc.) are defined and used within the client module.
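
The client described above could be sketched as follows. This is a minimal sketch, not a final implementation: it assumes Node 18+ (native `fetch`), uses `console` as a stand-in for the Epic 1 logger utility, and the `fetchTopStories`/`fetchCommentsForStory` names follow the references in Story 2.4.

```typescript
// src/clients/algoliaHNClient.ts -- sketch; console stands in for the Epic 1 logger.

export interface Comment {
  commentId: string;
  text: string;
  author: string;
  createdAt: string;
}

export interface Story {
  storyId: string;
  title: string;
  url?: string; // article URL; may be absent (e.g., Ask HN posts)
  hnUrl: string;
  points: number;
  numComments: number;
  comments?: Comment[];
}

const ALGOLIA_BASE = "https://hn.algolia.com/api/v1/search";

export function buildHnUrl(storyId: string): string {
  return `https://news.ycombinator.com/item?id=${storyId}`;
}

export async function fetchTopStories(): Promise<Story[]> {
  const res = await fetch(`${ALGOLIA_BASE}?tags=front_page&hitsPerPage=10`);
  if (!res.ok) throw new Error(`Algolia story request failed: ${res.status}`);
  const body = (await res.json()) as { hits: any[] };
  return body.hits.map((hit) => {
    if (!hit.url) console.warn(`Story ${hit.objectID} has no article URL`);
    return {
      storyId: hit.objectID,
      title: hit.title,
      url: hit.url ?? undefined,
      hnUrl: buildHnUrl(hit.objectID),
      points: hit.points,
      numComments: hit.num_comments,
    };
  });
}

export async function fetchCommentsForStory(
  storyId: string,
  maxComments: number,
): Promise<Comment[]> {
  const url = `${ALGOLIA_BASE}?tags=comment,story_${storyId}&hitsPerPage=${maxComments}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Algolia comment request failed: ${res.status}`);
  const body = (await res.json()) as { hits: any[] };
  return body.hits
    .filter((hit) => hit.comment_text) // drop null/empty comment bodies
    .slice(0, maxComments)             // enforce the maxComments cap
    .map((hit) => ({
      commentId: hit.objectID,
      text: hit.comment_text,
      author: hit.author,
      createdAt: hit.created_at,
    }));
}
```

The `buildHnUrl` helper is split out so the URL construction stays trivially testable without a network call.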

---

### Story 2.2: Integrate HN Data Fetching into Main Workflow

- **User Story / Goal:** As a developer, I want to integrate the HN data fetching logic into the main application workflow (`src/index.ts`), so that running the app retrieves the top 10 stories and their comments after completing the setup from Epic 1.

- **Detailed Requirements:**
  - Modify the main execution flow in `src/index.ts` (or a main async function called by it).
  - Import the `algoliaHNClient` functions.
  - Import the configuration module to access `MAX_COMMENTS_PER_STORY`.
  - After the Epic 1 setup (config load, logger init, output dir creation), call `fetchTopStories()`.
  - Log the number of stories fetched.
  - Iterate through the array of fetched `Story` objects.
  - For each `Story`, call `fetchCommentsForStory()`, passing `story.storyId` and the configured `MAX_COMMENTS_PER_STORY`.
  - Store the fetched comments on the corresponding `Story` object in memory (e.g., add a `comments: Comment[]` property to the `Story` object).
  - Log progress using the logger utility (e.g., "Fetched 10 stories.", "Fetching up to X comments for story {storyId}...").

- **Acceptance Criteria (ACs):**
  - AC1: Running `npm run dev` executes the Epic 1 setup steps, then fetches stories, then fetches comments for each story.
  - AC2: Logs clearly show the start and successful completion of story fetching, and the start of comment fetching for each of the 10 stories.
  - AC3: The configured `MAX_COMMENTS_PER_STORY` value is read from config and used in the calls to `fetchCommentsForStory`.
  - AC4: After successful execution, the story objects held in memory contain a nested array of fetched comment objects (verifiable via debugger or temporary logging).
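
The fetch-and-attach loop described above might look like this sketch. The `Story`/`Comment` shapes are trimmed to the fields the loop touches, `console` stands in for the logger, and the comment fetcher is passed in as a parameter (an assumption, made so the loop can be exercised without network access):

```typescript
// Sketch of the comment-attachment loop from src/index.ts.
// In the real app, fetchComments would be algoliaHNClient.fetchCommentsForStory
// and maxComments would come from the MAX_COMMENTS_PER_STORY config value.

interface Comment { commentId: string; text: string; }
interface Story { storyId: string; title: string; comments?: Comment[]; }

type CommentFetcher = (storyId: string, max: number) => Promise<Comment[]>;

export async function attachComments(
  stories: Story[],
  fetchComments: CommentFetcher,
  maxComments: number,
): Promise<Story[]> {
  console.info(`Fetched ${stories.length} stories.`);
  for (const story of stories) {
    console.info(`Fetching up to ${maxComments} comments for story ${story.storyId}...`);
    // Store the comments directly on the story object, per the requirement above.
    story.comments = await fetchComments(story.storyId, maxComments);
  }
  return stories;
}
```

Injecting the fetcher keeps the loop unit-testable with a stub while the production wiring stays a one-liner.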

---

### Story 2.3: Persist Fetched HN Data Locally

- **User Story / Goal:** As a developer, I want to save the fetched HN stories (including their comments) to JSON files in the date-stamped output directory, so that the raw data is persisted locally for subsequent pipeline stages and debugging.

- **Detailed Requirements:**
  - Define a consistent JSON structure for the output file content. Example: `{ "storyId": "...", "title": "...", "url": "...", "hnUrl": "...", "points": ..., "fetchedAt": "ISO_TIMESTAMP", "comments": [{ "commentId": "...", "text": "...", "author": "...", "createdAt": "ISO_TIMESTAMP" }, ...] }`. Include a timestamp recording when the data was fetched.
  - Import the Node.js `fs` (specifically `fs.writeFileSync`) and `path` modules.
  - In the main workflow (`src/index.ts`), within the loop iterating through stories (after comments have been fetched and added to the story object in Story 2.2):
    - Get the full path to the date-stamped output directory (determined in Epic 1).
    - Construct the filename for the story's data: `{storyId}_data.json`.
    - Construct the full file path using `path.join()`.
    - Serialize the complete story object (including comments and the fetch timestamp) to a JSON string using `JSON.stringify(storyObject, null, 2)` for readability.
    - Write the JSON string to the file using `fs.writeFileSync()`, inside a `try...catch` block for error handling.
  - Log (using the logger) the successful persistence of each story's data file, and any errors encountered during file writing.

- **Acceptance Criteria (ACs):**
  - AC1: After running `npm run dev`, the date-stamped output directory (e.g., `./output/YYYY-MM-DD/`) contains exactly 10 files named `{storyId}_data.json`.
  - AC2: Each JSON file contains valid JSON representing a single story object, including its metadata, fetch timestamp, and an array of its fetched comments, matching the defined structure.
  - AC3: The number of comments in each file's `comments` array does not exceed `MAX_COMMENTS_PER_STORY`.
  - AC4: Logs indicate that saving data to a file was attempted for each story, reporting success or specific file-writing errors.
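
The per-story persistence step could be sketched as below. `console` again stands in for the logger, and stamping `fetchedAt` inside this function is a simplification; the real app may record it at fetch time instead:

```typescript
// Sketch of the per-story persistence step from Story 2.3.
import { writeFileSync } from "node:fs";
import { join } from "node:path";

export function persistStory(
  outputDir: string,
  story: { storyId: string } & Record<string, unknown>,
): string {
  // Filename convention from the requirements: {storyId}_data.json
  const filePath = join(outputDir, `${story.storyId}_data.json`);
  // Pretty-print with a 2-space indent for readability, adding the fetch timestamp.
  const payload = JSON.stringify(
    { ...story, fetchedAt: new Date().toISOString() },
    null,
    2,
  );
  try {
    writeFileSync(filePath, payload);
    console.info(`Saved story data to ${filePath}`);
  } catch (err) {
    console.error(`Failed to write ${filePath}:`, err);
    throw err;
  }
  return filePath;
}
```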

---

### Story 2.4: Implement Stage Testing Utility for HN Fetching

- **User Story / Goal:** As a developer, I want a separate, executable script that performs *only* the HN data fetching and persistence, so I can test and trigger this stage independently of the full pipeline.

- **Detailed Requirements:**
  - Create a new standalone script file: `src/stages/fetch_hn_data.ts`.
  - The script should perform the essential setup required for this stage: initialize the logger, load configuration (`.env`), and determine and create the output directory (reusing or replicating the logic from Epic 1 / `src/index.ts`).
  - The script should then execute the core logic of this stage: fetch stories via `algoliaHNClient.fetchTopStories`, fetch comments via `algoliaHNClient.fetchCommentsForStory` (using the configured limit), and persist the results to JSON files using `fs.writeFileSync` (replicating the logic from Story 2.3).
  - The script should log its progress using the logger utility.
  - Add a new script command to `package.json` under `"scripts"`: `"stage:fetch": "ts-node src/stages/fetch_hn_data.ts"`.

- **Acceptance Criteria (ACs):**
  - AC1: The file `src/stages/fetch_hn_data.ts` exists.
  - AC2: The `stage:fetch` script is defined in the `"scripts"` section of `package.json`.
  - AC3: Running `npm run stage:fetch` executes successfully, performing only the setup, fetch, and persist steps.
  - AC4: Running `npm run stage:fetch` creates the same 10 `{storyId}_data.json` files in the correct date-stamped output directory as running the main `npm run dev` command (at the current state of development).
  - AC5: Logs generated by `npm run stage:fetch` reflect only the fetching and persisting steps, not subsequent pipeline stages.
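
One piece of setup shared between `src/index.ts` and the stage script is computing the date-stamped output directory. A sketch under the assumption (from the `./output/YYYY-MM-DD/` example in Story 2.3) that the base directory is `output`:

```typescript
// Sketch: compute the date-stamped output directory path shared by the main
// workflow and src/stages/fetch_hn_data.ts. The "output" base dir is an
// assumption; Epic 1 defines the real location.
import { join } from "node:path";

export function dateStampedOutputDir(base = "output", now = new Date()): string {
  const yyyy = now.getFullYear();
  const mm = String(now.getMonth() + 1).padStart(2, "0"); // months are 0-based
  const dd = String(now.getDate()).padStart(2, "0");
  return join(base, `${yyyy}-${mm}-${dd}`);
}
```

Accepting `now` as a parameter keeps the function deterministic in tests while defaulting to the current date in production.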
## Change Log

| Change        | Date       | Version | Description           | Author |
| ------------- | ---------- | ------- | --------------------- | ------ |
| Initial Draft | 2025-05-04 | 0.1     | First draft of Epic 2 | 2-pm   |