# Project Brief: BMad News DiCaster
## Introduction / Problem Statement
- **Core Idea:** BMad News DiCaster is a web app, built with Next.js and Supabase and hosted on Vercel, that generates daily podcasts/newsletters summarizing the top 10 Hacker News stories, including article and comment summaries. The project emphasizes flexible development, supporting both local environments (with an optional local LLM and local Supabase) and remote deployed environments.
- **Problem being solved / Opportunity being addressed:**
  - The primary problem is the difficulty for individuals to efficiently keep up with the high volume of content and discussions on Hacker News.
  - The opportunity is to provide a curated, easily digestible summary in both text and audio formats, catering to busy tech enthusiasts who want to stay informed, while also serving as a comprehensive demonstration project for modern web application development practices.
## Vision & Goals
- **Vision:** To be the go-to daily digest for Hacker News enthusiasts, offering text and audio summaries that fit busy schedules, while demonstrating a modern web application architecture with robust local and remote development and deployment capabilities.
- **Primary Goals (SMART for MVP):**
  - **Goal 1:** Successfully generate and store a daily top-10 Hacker News summary (text content plus an audio link from Play.ai's PlayNote API) within a defined daily processing window, working in both local and deployed environments.
  - **Goal 2:** For unauthenticated users, display a limited list of older summaries (e.g., only the 3rd, 4th, and 5th most recent editions, excluding the two newest), with a clear call to action to register for access to the latest content and email delivery.
  - **Goal 3:** Enable registered users to log in, view or listen to all past summaries (including the latest), and manage their email-subscription preference for the daily newsletter.
- **Success Metrics (Initial Ideas for MVP):**
  - Consistent daily generation of the top-10 Hacker News summary content (locally and deployed).
  - Successful retrieval of the audio link from the Play.ai PlayNote API webhook.
  - Successful storage of all retrieved assets (HN posts, comments, scraped articles, summaries, podcast URLs) in Supabase (local Docker and deployed).
  - Ability for registered users to toggle email notifications and receive them successfully.
  - Ability for unauthenticated users to view a limited set of older content.
  - Ability for authenticated users to view all content.
  - A CLI command successfully triggers on-demand content generation and storage in the local environment.
## Target Audience / Users
- **Primary End-Users:** Tech-savvy individuals, likely regular Hacker News readers, who are often busy and would benefit from curated, efficiently consumable summaries of top stories and discussions, available both as text and as an engaging two-person podcast.
- **Secondary Audience / Developer Persona:** Developers interested in learning from or contributing to a practical, real-world example application. They are keen on understanding the architecture (Next.js, Supabase, Vercel), specific API integrations (Play.ai PlayNote, `hnangolia`, Cheerio), LLM usage (interchangeable local Ollama and remote API-based models), local development setup (including Dockerized Supabase and CLI tooling), and deployment best practices.
## Key Features / Scope (High-Level Ideas for MVP)
- **Feature 1 (Content Sourcing):** Automated daily fetching of top Hacker News stories (e.g., top 10) using the `hnangolia` library.
- **Feature 2 (Content Scraping):** Scraping of linked article content using Cheerio for summarization.
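As an illustration of the sourcing step, the sketch below calls the public Algolia HN Search REST API directly via Node's built-in `fetch` (the brief's `hnangolia` library presumably wraps a similar endpoint, and its actual interface would replace this); the ranking helper is a plain function the pipeline can reuse:

```typescript
// Sketch: fetch front-page Hacker News stories and rank the top N by points.
// The direct Algolia endpoint is an assumption standing in for `hnangolia`.

export interface HNStory {
  objectID: string;
  title: string;
  url: string | null;
  points: number;
  num_comments: number;
}

// Pure helper: rank stories by points and keep the top N (default 10).
export function pickTopStories(stories: HNStory[], n = 10): HNStory[] {
  return [...stories].sort((a, b) => b.points - a.points).slice(0, n);
}

// Network step (assumes Node 18+ global fetch).
export async function fetchFrontPage(): Promise<HNStory[]> {
  const res = await fetch(
    "https://hn.algolia.com/api/v1/search?tags=front_page&hitsPerPage=30",
  );
  if (!res.ok) throw new Error(`HN API request failed: ${res.status}`);
  const body = (await res.json()) as { hits: HNStory[] };
  return body.hits;
}
```

The scraped article text from Feature 2 would then be attached to each selected story before summarization.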
- **Feature 3 (Content Summarization):** LLM-powered summarization of articles and associated Hacker News comments to create a "top 10 countdown"-style daily briefing.
  - Must support a local LLM (e.g., via Ollama) during local development.
  - Must support a remote, API-key-based LLM in deployed/remote environments or as an alternative in local development.
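The local/remote flexibility above can be sketched as a small runtime switch. The env-var names and the remote endpoint are assumptions, not a settled design; the Ollama request shape matches its documented `/api/generate` endpoint:

```typescript
// Sketch: choose between a local Ollama instance and a remote, API-key-based
// LLM at runtime. LLM_PROVIDER, OLLAMA_URL, etc. are hypothetical settings.

export interface LlmConfig {
  kind: "ollama" | "remote";
  baseUrl: string;
  model: string;
}

// Pure helper: resolve the provider from environment-style settings.
export function resolveLlm(env: Record<string, string | undefined>): LlmConfig {
  if (env.LLM_PROVIDER === "ollama") {
    return {
      kind: "ollama",
      baseUrl: env.OLLAMA_URL ?? "http://localhost:11434",
      model: env.OLLAMA_MODEL ?? "llama3",
    };
  }
  return {
    kind: "remote",
    baseUrl: env.LLM_API_URL ?? "https://api.example.com/v1",
    model: env.LLM_MODEL ?? "remote-model",
  };
}

// Ollama's generate endpoint accepts { model, prompt, stream } and returns
// { response } when streaming is disabled.
export async function summarize(cfg: LlmConfig, prompt: string): Promise<string> {
  if (cfg.kind === "ollama") {
    const res = await fetch(`${cfg.baseUrl}/api/generate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model: cfg.model, prompt, stream: false }),
    });
    const body = (await res.json()) as { response: string };
    return body.response;
  }
  throw new Error("remote provider call not sketched here");
}
```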
- **Feature 4 (Data Storage):** Comprehensive storage of all generated and retrieved assets (HN posts, comments, full scraped content, summaries, podcast URLs) in Supabase.
  - Supports a local Docker version of Supabase for local development.
  - Utilizes a cloud-hosted Supabase instance for the deployed application.
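One possible shape for the stored record is sketched below. The table and column names are assumptions (the actual schema is a PRD/architecture decision); only the row-building logic is shown, with persistence left to supabase-js:

```typescript
// Sketch: assembling a daily-summary record before it is written to Supabase.
// "daily_summaries" and all field names below are hypothetical.

export interface StoryAssets {
  hnId: string;
  title: string;
  articleUrl: string | null;
  articleText: string; // full scraped content
  comments: string[];  // raw HN comments
  summary: string;     // LLM output
}

export interface DailySummaryRow {
  edition_date: string;       // ISO date, e.g. "2024-05-01"
  stories: StoryAssets[];
  briefing_text: string;
  podcast_url: string | null; // filled in later by the PlayNote webhook
}

// Pure helper: build the row from pipeline outputs. With supabase-js this
// would then be persisted via `supabase.from("daily_summaries").insert(row)`.
export function buildDailySummaryRow(
  editionDate: string,
  stories: StoryAssets[],
  briefingText: string,
): DailySummaryRow {
  return {
    edition_date: editionDate,
    stories,
    briefing_text: briefingText,
    podcast_url: null,
  };
}
```

Because the same code runs against local Docker Supabase and the hosted instance, only the connection URL and keys differ between environments.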
- **Feature 5 (Audio Generation):** Integration with the Play.ai **PlayNote API** to submit the text summary and receive a two-person AI-generated podcast link via a webhook.
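The webhook side of this integration could be a Next.js route handler along these lines. The callback payload shape (`jobId`, `audioUrl`) is an assumption — the real field names must come from Play.ai's PlayNote documentation:

```typescript
// Sketch: app/api/playnote-webhook/route.ts — receives the PlayNote
// completion callback. Payload fields below are hypothetical.

export interface PlayNoteCallback {
  jobId: string;
  audioUrl: string;
}

// Pure helper: validate the incoming JSON before touching the database.
export function parseCallback(body: unknown): PlayNoteCallback | null {
  if (typeof body !== "object" || body === null) return null;
  const b = body as Record<string, unknown>;
  if (typeof b.jobId !== "string" || typeof b.audioUrl !== "string") return null;
  return { jobId: b.jobId, audioUrl: b.audioUrl };
}

export async function POST(req: Request): Promise<Response> {
  const payload = parseCallback(await req.json().catch(() => null));
  if (!payload) return new Response("bad payload", { status: 400 });
  // Here the podcast URL would be stored against the day's summary row
  // (e.g., an update on the Supabase record created earlier in the pipeline).
  return new Response("ok", { status: 200 });
}
```

Returning 400 on malformed payloads gives the provider (or a retry mechanism) a clear failure signal, which matters for the webhook-reliability risk noted later in this brief.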
- **Feature 6 (Content Generation Workflow - Automated):** A daily automated process (e.g., cron job, Vercel cron) orchestrating content sourcing, scraping, summarization, storage, and audio generation submission.
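On Vercel, the daily trigger can be declared in `vercel.json`; the route path and the 06:00 UTC schedule below are placeholders, not decided values:

```json
{
  "crons": [
    { "path": "/api/cron/generate", "schedule": "0 6 * * *" }
  ]
}
```

Vercel invokes the listed path on the cron schedule, so the route handler at that path becomes the pipeline's entry point.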
- **Feature 7 (Content Generation Workflow - Manual CLI):** A command-line interface (CLI) tool to trigger the entire content generation and storage process on-demand in the local development environment.
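A minimal version of this CLI could be a plain Node/TypeScript script run with something like `npx tsx scripts/generate.ts --date 2024-05-01`. The flag names are assumptions:

```typescript
// Sketch: on-demand local trigger for the content-generation pipeline.
// `--date` and `--dry-run` are hypothetical flags.

export interface CliOptions {
  date: string;    // edition date to (re)generate
  dryRun: boolean; // skip writes when true
}

// Pure helper: parse the arguments after the script name
// (i.e., process.argv.slice(2)).
export function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = {
    date: new Date().toISOString().slice(0, 10),
    dryRun: false,
  };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--date" && argv[i + 1]) opts.date = argv[++i];
    else if (argv[i] === "--dry-run") opts.dryRun = true;
  }
  return opts;
}

// Entry point: parse flags, then run the same pipeline the cron job uses
// (the pipeline call itself is elided here).
export async function main(): Promise<void> {
  const opts = parseArgs(process.argv.slice(2));
  console.log(`Generating edition for ${opts.date} (dryRun=${opts.dryRun})`);
}
```

Sharing the pipeline code between the CLI and the cron route keeps the local and deployed behavior identical.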
- **Feature 8 (Web Interface - List View):** A public web page listing generated summaries/podcasts.
  - Unauthenticated users see a limited list (e.g., the 3rd-5th most recent) with a call to register for full access.
  - Authenticated users see all available summaries/podcasts.
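The gating rule above reduces to a small pure function over a newest-first list of editions:

```typescript
// Sketch: which editions a visitor may see. Anonymous users get only the
// 3rd-5th most recent editions; signed-in users get everything.

export interface Edition {
  editionDate: string; // ISO date
  title: string;
}

// `editions` must be sorted newest-first.
export function visibleEditions(
  editions: Edition[],
  isAuthenticated: boolean,
): Edition[] {
  if (isAuthenticated) return editions;
  // Skip the two newest, then show at most three older editions.
  return editions.slice(2, 5);
}
```

Keeping this as a shared helper lets both the server-rendered list page and any API route enforce the same rule.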
- **Feature 9 (Web Interface - Detail View):** A detail page for each daily summary, displaying the full text briefing and an embedded audio player for the podcast.
- **Feature 10 (User Authentication):** User registration (email/password) and login/logout functionality using Supabase Auth.
- **Feature 11 (User Profile - Email Notifications):** Authenticated users can access a setting to toggle on/off daily email notifications for new briefings.
- **Feature 12 (Email Dispatch):** Automated daily email dispatch of the new newsletter (containing text summary highlights and podcast link) to all subscribed registered users.
## Post MVP Features / Scope and Ideas
- **Feature 1 (Admin Interface):** A web-based administrative interface for managing the application (e.g., view generation logs, manually trigger/re-run generation, potentially manage users or content).
- **Feature 2 (Flexible Scheduling & Editions):** Ability to configure and generate different summary cadences and types (e.g., weekly digests, "night and weekend" editions, topic-focused summaries).
- **Feature 3 (User Customization):** Allow users to customize content (e.g., choose number of stories, filter by keywords/topics).
- **Feature 4 (Expanded Content Sources):** Integrate other news sources beyond Hacker News.
## Known Technical Constraints or Preferences
- **Core Stack:** Next.js (frontend/backend), Supabase (database, auth, storage), Vercel (hosting, serverless functions, cron jobs).
- **Hosting Tier:** Vercel Pro tier to leverage potentially longer function execution times and other professional-grade features.
- **Content Fetching:** `hnangolia` library for Hacker News data.
- **Content Scraping:** Cheerio for HTML parsing of articles.
- **LLM:**
  - Flexibility to use local LLMs (e.g., Ollama) for local development.
  - Ability to use API-based LLMs (e.g., OpenAI, Anthropic - specific model to be decided) for production or as an alternative in local development. Configurable via API keys.
- **Audio Generation:** Specifically use the Play.ai **PlayNote API** for its two-person podcast generation feature; webhook for callback.
- **Local Development Environment:**
  - Must support running the full application stack locally.
  - Local Supabase instance running in Docker.
  - CLI for on-demand content generation.
- **Data Persistence:** All fetched (HN posts, comments), scraped (article content), and generated (summaries, podcast URLs) data must be stored, both locally and in the deployed environment.
- **Architecture on Vercel:**
  - The daily content generation process will likely be architected as a **pipeline of multiple, chained serverless functions** to manage execution time and resources efficiently within Vercel's limits.
  - Webhooks (e.g., from Play.ai) are critical for handling asynchronous operations from external services.
  - Efficient batching or queuing of tasks (e.g., article scraping and summarization) should be considered to avoid function timeouts or resource limits. Vercel KV or external queues (such as Upstash QStash) might be explored if simple direct invocation chaining becomes insufficient.
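The chained-function idea can be sketched with a step registry, where each serverless function triggers the next over HTTP when it finishes. The step names and the `/api/pipeline/<step>` route convention are assumptions:

```typescript
// Sketch: chaining pipeline steps as separate serverless functions so no
// single invocation approaches Vercel's execution limit.

export const PIPELINE_STEPS = [
  "fetch-stories",
  "scrape-articles",
  "summarize",
  "store",
  "request-audio", // submits to PlayNote; the webhook completes the run
] as const;

export type Step = (typeof PIPELINE_STEPS)[number];

// Pure helper: which step follows the current one (null when the chain ends).
export function nextStep(current: Step): Step | null {
  const i = PIPELINE_STEPS.indexOf(current);
  return i >= 0 && i < PIPELINE_STEPS.length - 1 ? PIPELINE_STEPS[i + 1] : null;
}

// After a step finishes, invoke the next function in the chain.
export async function triggerNext(baseUrl: string, current: Step): Promise<void> {
  const next = nextStep(current);
  if (!next) return;
  await fetch(`${baseUrl}/api/pipeline/${next}`, { method: "POST" });
}
```

If direct HTTP chaining proves fragile, the same step registry could drive a queue such as Upstash QStash instead, as the constraints above anticipate.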
- **Risks:**
  - **Vercel Function Execution Limits:** Even on the Pro tier, individual serverless functions have execution time limits. The pipeline approach is intended to mitigate this, but complex or slow steps (especially numerous LLM calls or heavy scraping) need careful management and optimization.
  - **LLM Processing Time/Cost:** LLM summarization can be time-consuming and/or costly depending on the model and the number of tokens processed. This needs to be factored into the daily processing window and operational budget.
  - **Scraping Reliability:** Websites can change their structure, breaking scrapers. Anti-scraping measures could also pose a challenge.
  - **External API Dependencies:** Reliance on Hacker News (`hnangolia`), the LLM provider, and Play.ai means that any downtime or API changes from these services can impact the application.
  - **Webhook Management:** Ensuring reliable receipt and processing of webhooks, including handling retries and failures.
  - **Cold Starts:** Serverless functions can have "cold starts," which may introduce latency, especially for the first request after an idle period or for less frequently used functions in the pipeline. This must be acceptable for the user experience and the generation window.
## Relevant Research (Optional)
None at this time.
## PM Prompt
This Project Brief provides the full context for "BMad News DiCaster," a daily Hacker News summary and podcast generation service. Please operate in **MODE 1** and review the brief thoroughly to create the Product Requirements Document (PRD). Work through each section of the PRD, asking clarifying questions about any ambiguities in this brief and suggesting improvements or alternative approaches where appropriate, adhering to your standard operational mode for PRD development.