Shajara is my primary independent R&D project exploring how large language models can be given reliable access to curated knowledge sources.
I built multiple data pipelines from scratch for downloading, transcribing, and processing large text corpora, and developed a retrieval-augmented generation (RAG) system that grounds LLM responses in specific source material.
The project also includes prototype web interfaces with custom LLM chat functionality and ongoing research into techniques for steering and guiding LLM reasoning. I manage all scope, architecture, and technical decisions independently.
- Built data pipelines for downloading, transcribing, and processing large text corpora.
- Developed a retrieval-augmented generation system grounded in curated source material.
- Researched techniques for steering and guiding LLM reasoning processes.
- Built and iterated on prototype web interfaces with custom LLM chat functionality.
- Acquired working knowledge of Python, embeddings, vector databases, and AI APIs.
- Manage all scope, direction, architecture, and technical decisions independently.