.. _gsoc-proposal-template:

A Conversational AI Assistant for BeagleBoard using RAG and Fine-tuning - Fayez Zouari
######################################################################################

Introduction
************

Summary links
=============

- **Contributor:** `Fayez Zouari `_
- **Mentors:** `Aryan Nanda `_
- **Code:** `TBD`_
- **Documentation:** `TBD`_
- **GSoC:** `TBD`_

Status
======

This project is currently just a proposal.

Proposal
========

- Created accounts on `OpenBeagle `_ and the `Beagle Forum `_
- Submitted a pull request for cross-compilation: `#197 `_
- Created this project proposal using the `proposed template `_

About
=====

* Forum: FAYEZ_ZOUARI
* OpenBeagle: fayezzouari
* Discord ID: .kageyamo
* GitHub: fayezzouari
* School: INSAT (National Institute of Applied Science and Technology)
* Country: Tunisia
* Typical work hours: 9:00 AM - 6:00 PM (UTC+1)
* Previous GSoC participation: No

Project
*******

**Project name:** BeagleMind - Documentation Assistant with Fine-tuned LLM and RAG

Description
===========

BeagleMind combines a fine-tuned LLM with RAG to create an accurate documentation assistant that:

1. Uses PEFT/LoRA fine-tuning on BeagleBoard documentation
2. Implements RAG to ground responses in facts and reduce LLM hallucination
3. Is served through a Hugging Face (HF) inference endpoint
4. Deploys via:

   - a CLI tool for local usage
   - a web interface with websockets

5. Includes an agentic evaluation framework

Technical Implementation
========================

LLM Fine-tuning Architecture
----------------------------

The system will employ the selected LLM as its base model, using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters to specialize the model for BeagleBoard documentation (a minimal sketch follows this section). The training pipeline processes OpenBeagle resources through:

- Semantic segmentation of technical documentation
- Generation of instruction-response pairs
- Dynamic masking of code samples for focused learning

Evaluation will combine:

- Perplexity measurements on held-out documentation
- Task-specific accuracy on BeagleBoard API questions
- Human review of generated troubleshooting steps
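To make the fine-tuning plan concrete, here is a minimal sketch of a PEFT/LoRA setup using Hugging Face ``transformers`` and ``peft``. The base model name and every hyperparameter below are placeholder assumptions, not final choices; model selection happens during community bonding.

.. code-block:: python

   # Hypothetical PEFT/LoRA setup; base model and hyperparameters are
   # placeholders pending the model-selection work in community bonding.
   from peft import LoraConfig, get_peft_model
   from transformers import AutoModelForCausalLM, AutoTokenizer

   BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # placeholder base model

   tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
   model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

   # LoRA trains small low-rank adapter matrices instead of the full
   # weights, keeping GPU memory requirements modest during fine-tuning.
   lora_config = LoraConfig(
       r=16,                                 # adapter rank
       lora_alpha=32,                        # scaling factor
       target_modules=["q_proj", "v_proj"],  # attention projections
       lora_dropout=0.05,
       task_type="CAUSAL_LM",
   )
   model = get_peft_model(model, lora_config)
   model.print_trainable_parameters()  # typically well under 1% of weights

Because only the adapter weights are trained, the resulting artifact is small enough to publish separately on Hugging Face and swap between experiments during Milestone 3.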
RAG Integration Pipeline
------------------------

The retrieval-augmented generation system implements a three-stage accuracy-enforcement process:

1. **Document Processing:**

   - Hierarchical chunking that preserves code-sample context
   - Metadata enrichment with section headers
   - Cross-document relationship mapping

2. **Vector Retrieval:**

   - Hybrid dense-sparse retrieval using BAAI embeddings
   - Query-adaptive reranking
   - Confidence-based fallback mechanisms

3. **Response Generation:**

   - Contextual grounding with retrieved passages
   - Automatic citation injection
   - Confidence thresholding for uncertain responses

Hosting Infrastructure
----------------------

The production deployment features:

.. list-table:: Hosting Specifications
   :widths: 30 70
   :header-rows: 1

   * - Component
     - Implementation
   * - Inference Endpoint
     - Hugging Face TGI with 4-bit quantization
   * - Load Balancing
     - Round-robin with health checks
   * - Monitoring
     - Prometheus metrics for token generation latency, retrieval hit rate, and hallucination alerts

Deployment Targets
------------------

Multi-platform accessibility through:

1. **Web Interface:**

   - React.js frontend with response streaming
   - Interactive citation visualization
   - Session-based query history

2. **CLI Tool:**

   - Access to the hosted LLM through an API key
   - Configurable verbosity levels
   - Automated test script integration

Evaluation Framework
--------------------

The agentic evaluation system employs three specialized test agents:

1. **Fact-Verification Agent:**

   - Cross-references answers with source docs
   - Flags unsupported technical claims
   - Maintains accuracy heatmaps

2. **Completeness Auditor:**

   - Scores answer depth on API reference coverage, troubleshooting steps, and example code relevance

3. **Stress-Test Bot:**

   - Generates adversarial queries
   - Measures failure modes
   - Identifies documentation gaps

Software
========

- **Programming Languages:** Python
- **ML Tools:** PEFT, LoRA, quantization
- **Frameworks:** FastAPI, Hugging Face Transformers
- **Database:** ChromaDB/Weaviate/Qdrant
- **Frontend:** React
- **Deployment:** Docker, Nginx, PyPI, Hugging Face Spaces
- **Version Control:** Git, GitHub/GitLab

Hardware
========

- **Development Boards:**

  - BeagleBone AI-64
  - BeagleY-AI

- **Cloud Services:**

  - Hugging Face Spaces / Inference Endpoints
  - Vercel

Architecture and Diagrams
*************************

The following diagrams illustrate the workflows of the methods described above.

.. figure:: finetuning.png
   :align: center
   :alt: Fine-Tuning Architecture

   Fine-Tuning Architecture

.. figure:: rag.png
   :align: center
   :alt: RAG Integration Pipeline

   RAG Integration Pipeline

.. figure:: deployment.png
   :align: center
   :alt: Deployment Structure

   Deployment Structure

Timeline
********

.. list-table::
   :widths: 20 30 50
   :header-rows: 1
   :class: milestone-table

   * - **Deadline**
     - **Milestone**
     - **Deliverables**
   * - May 27
     - Coding Begins
     - Finalize architecture diagrams
   * - June 3
     - M1: Foundation
     - CLI prototype, fine-tuning strategy doc
   * - June 17
     - M2: Data Preparation
     - Curated dataset, vector DB ready
   * - July 1
     - M3: Model Training
     - Fine-tuned model on HF, initial benchmarks
   * - July 8
     - Midterm Evaluation
     - Working CLI with local inference
   * - July 22
     - M4: Agentic Evaluation
     - Test agents implemented, accuracy reports
   * - Aug 5
     - M5: Web Interface
     - Websocket server, React frontend
   * - Aug 19
     - Final Submission
     - Full documentation, demo video

Detailed Timeline
=================

Community Bonding (May 9 - May 26)
----------------------------------

- Develop workflow diagrams:

  - Data collection pipeline
  - Fine-tuning process
  - RAG integration flow

- Finalize model selection criteria
- Establish evaluation metrics with the mentor

.. _gsoc-beaglemind-m1:

Milestone 1: Foundation (June 3)
--------------------------------

1. **CLI Prototype:**

   - Basic question-answering interface
   - RAG-only chatbot to demonstrate the proof of concept
   - Helpful flags such as ``-h`` for help, ``-p`` for the prompt, and ``-l`` to specify a log file
   - Simple evaluation script

2. **Video Demonstration:**

   - Record a demonstration presenting the proof of concept
   - Highlight that the final solution will combine a hosted fine-tuned LLM with RAG to reduce hallucination

3. **Fine-tuning Prep:**

   - Document preprocessing scripts
   - Training environment setup

.. _gsoc-beaglemind-m2:

Milestone 2: Data Preparation (June 17)
---------------------------------------

1. **Document Processing:**

   - Data formatting
   - Generate synthetic Q&A pairs
   - Convert all docs to clean Markdown
   - Extract code samples, diagrams, circuit schematics, and any other resource that could help in troubleshooting

2. **Vector Database** (a retrieval sketch follows this milestone):

   - Implement chunking strategy
   - Test retrieval accuracy
   - Optimize embedding selection
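As a concrete starting point for the vector-database work above, here is a minimal sketch of a chunking and retrieval-accuracy experiment using ChromaDB with a BAAI embedding model loaded through ``sentence-transformers``. The file path, collection name, chunk size, and test query are illustrative assumptions, to be replaced with real BeagleBoard documentation and a proper evaluation set.

.. code-block:: python

   # Hypothetical Milestone 2 experiment: chunk one Markdown doc, embed it,
   # and check whether a known query retrieves the expected chunk.
   import chromadb
   from sentence_transformers import SentenceTransformer

   embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # assumed model

   def chunk_markdown(text: str, max_chars: int = 1200) -> list[str]:
       """Naive header-aware chunking: split on second-level headers,
       then cap chunk length so code samples mostly stay intact."""
       chunks = []
       for section in text.split("\n## "):
           for start in range(0, len(section), max_chars):
               chunks.append(section[start:start + max_chars])
       return chunks

   client = chromadb.Client()
   collection = client.create_collection("beagleboard-docs")  # placeholder name

   docs = chunk_markdown(open("docs/beaglebone_ai64.md").read())  # placeholder path
   collection.add(
       ids=[f"chunk-{i}" for i in range(len(docs))],
       documents=docs,
       embeddings=embedder.encode(docs).tolist(),
   )

   # Retrieval-accuracy probe: does the right chunk come back for a known query?
   hits = collection.query(
       query_embeddings=embedder.encode(
           ["How do I enable a PRU on the BeagleBone AI-64?"]
       ).tolist(),
       n_results=3,
   )
   print(hits["documents"][0])

Running probes like this over a held-out question set is how chunk size and embedding choice will be compared before committing to one configuration.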
.. _gsoc-beaglemind-m3:

Milestone 3: Model Training (July 1)
------------------------------------

1. **Fine-tuning:**

   - Training runs with different parameters
   - Loss/accuracy tracking
   - Quantization tests

2. **Deployment:**

   - HF Inference Endpoint setup
   - Performance benchmarks
   - Hallucination tests

Midterm Evaluation (July 8)
---------------------------

- Functional CLI with:

  - Model inference
  - Basic RAG integration
  - Accuracy metrics

- Video demonstration
- Mentor review session

.. _gsoc-beaglemind-m4:

Milestone 4: Agentic Evaluation (July 22)
-----------------------------------------

1. **Evaluation Agents:**

   - Fact-checking agent
   - Completeness evaluator
   - Hallucination detector

2. **Automated Testing:**

   - 100-question test suite
   - Continuous integration setup
   - Performance dashboard

.. _gsoc-beaglemind-m5:

Milestone 5: Web Interface (Aug 5)
----------------------------------

1. **Backend** (a websocket sketch appears in the appendix at the end of this proposal):

   - FastAPI websocket server
   - Dockerized server
   - Async model loading
   - Rate limiting

2. **Frontend:**

   - React-based chat UI
   - Response visualization
   - Mobile responsiveness

Final Submission (Aug 19)
-------------------------

- Comprehensive documentation:

  - Installation guides
  - API references
  - Training methodology

- 5-minute demo video
- Performance report

Benefit
=======

BeagleMind will provide:

- 24/7 documentation assistance
- Reduced maintainer workload
- Visualized technical answers
- Accelerated debugging
- Offline documentation access
- Improved onboarding experience

Experience and Approach
***********************

Personal Background
===================

As an Embedded Systems Engineering student with a passion for AI and robotics, I find that the BeagleMind project aligns perfectly with my academic specialization and technical interests. My coursework in embedded systems, combined with self-study of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), has prepared me to bridge the gap between hardware documentation and AI-powered assistance.

Experience
==========

As an Embedded Systems Engineering student with an AI specialization, I bring:

1. **LENS Platform:**

   - Developed a retrieval-augmented chatbot that provides answers with detailed references: URL, page number, and file name

2. **Chatautomation Platform:**

   - Built multimodal data loaders (PDFs, images, audio)
   - Implemented a voice interaction system (STT + LLM + TTS)
   - Developed WhatsApp/Instagram chatbot integrations

3. **Orange Digital Center Internship:**

   - Created a MEPS monitoring system
   - Developed a biogas forecast model
   - Implemented agentic workflows for production reports

4. **x2x Modality Project:**

   - Hexastack Hackathon 1st place (open-source contribution)
   - Speech-to-text for effortless communication
   - Text-to-speech for improved accessibility
   - Image and document processing into text for smoother integration

Contingency
===========

If blockers occur, I will:

1. Research the documentation and source code
2. Seek community support (Discord/Forum)
3. Implement alternative approaches
4. Escalate to the mentor if unresolved

Misc
====

- I will comply with all GSoC requirements
- Merge requests will be submitted to the BeagleBoard GitHub
- Current demo available at `bb-gsoc.fayez-zouari.tn `_ | `CLI GitHub Repo `_

References
==========

1. `Hugging Face Transformers `_
2. `ChromaDB Documentation `_
3. `BeagleBoard Documentation `_
4. `PEFT Fine-tuning `_
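Appendix: Websocket Endpoint Sketch
===================================

To illustrate the Milestone 5 backend, here is a minimal sketch of a FastAPI websocket endpoint that streams a response token by token to the React UI. The ``generate_answer`` function is a stand-in for the real retrieval-plus-hosted-LLM call, and the route name and end-of-response marker are assumptions; API-key checks and rate limiting are omitted here.

.. code-block:: python

   # Hypothetical streaming chat endpoint for the Milestone 5 backend.
   import asyncio

   from fastapi import FastAPI, WebSocket, WebSocketDisconnect

   app = FastAPI()

   async def generate_answer(question: str):
       """Stand-in for RAG retrieval followed by hosted-LLM inference."""
       for token in f"(answer to: {question})".split():
           await asyncio.sleep(0.05)  # simulate token-by-token generation
           yield token + " "

   @app.websocket("/ws/chat")
   async def chat(ws: WebSocket):
       await ws.accept()
       try:
           while True:
               question = await ws.receive_text()
               async for token in generate_answer(question):
                   await ws.send_text(token)  # stream tokens to the frontend
               await ws.send_text("[DONE]")   # end-of-response marker
       except WebSocketDisconnect:
           pass

Streaming over a websocket is what lets the web interface show partial answers while generation is still running, matching the response-streaming requirement in the Deployment Targets section.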