This project requires the development of an intelligent text-splitting system that divides documents into optimally sized chunks while preserving context and meaning.

The system supports multiple splitting techniques, including:
- Sentence-based splitting
- Token-based splitting
- Recursive character splitting
- Semantic-based splitting

Key Features:
✅ Dynamic Chunk Sizing – Adjusts chunk size based on text complexity.
✅ Context-Aware Overlap – Smart overlap calculation to maintain coherence.
✅ Arabic & English Support – Advanced preprocessing for Arabic text (reshaping, normalization).
✅ Handles Structured Text – Code snippets, tables, dialogues, and long paragraphs.
✅ Automatic Splitter Selection – Chooses the best splitting method based on text analysis.
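
To make the sentence-based splitting and overlap requirements more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the expected deliverable: the names `split_sentences` and `chunk_sentences`, the character-based size limit, and the fixed one-sentence overlap are all hypothetical choices, and the brief's dynamic sizing and smart overlap calculation would replace the fixed parameters.

```python
import re

# Hypothetical helpers for illustration; none of these names come from the brief.
# Sentence boundary: whitespace that follows ., !, ? or the Arabic question mark.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on simple end-of-sentence punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def chunk_sentences(
    text: str, max_chars: int = 200, overlap_sentences: int = 1
) -> list[str]:
    """Greedily pack sentences into chunks of roughly max_chars characters.

    The last `overlap_sentences` sentences of each chunk are repeated at the
    start of the next one. Assumption: sizes are measured in characters, and
    a chunk may exceed max_chars when the overlap plus a single sentence is
    already longer than the limit.
    """
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    for chunk in chunk_sentences("One. Two. Three. Four.", max_chars=10):
        print(chunk)
```

A token-based variant could follow the same shape by measuring length with a tokenizer instead of `len`, and an automatic selector could choose between such functions after inspecting the text.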
Job milestones
Project delivery
Deliver the project as agreed.
Required skills
Artificial Intelligence
Data Science
Data Integration