Experience
Data Scientist
Data Annotation
Remote, part-time position focused on AI model evaluation and improvement.
- Prompt Analytics & Model Diagnostics: Analysed more than 2,000 live user prompts and completions to uncover systematic error patterns, bias and security vulnerabilities in frontier LLMs.
- Insight Reporting: Produced interactive dashboards and weekly briefs that prioritised fine‑tuning work, lifting factual accuracy by 8 percentage points in A/B tests.
- Tooling & Automation: Built Python pipelines to cluster prompts, label edge cases and automatically surface regressions, cutting manual triage by 35%.
AI Trainer
Data Annotation
Remote, part-time position focused on AI model evaluation and improvement.
- AI Model Evaluation & Debugging: Conducted in-depth analysis of AI-generated coding solutions, identifying and resolving errors to improve output accuracy.
- Prompt Engineering & Model Assessment: Designed and tested diverse prompt variations to benchmark LLM performance across multiple tasks and improve response consistency.
- Comparative Analysis & Feedback: Evaluated outputs from various LLMs, identifying key improvement areas and providing structured feedback that contributed to an increase in model precision.
AI Researcher
BT Group
Master's thesis project in collaboration with BT Group and Prof. Hani Hagras.
- AI Model Development & Optimisation: Designed and implemented an AI Labelling Assistant capable of accurately classifying images with as few as 10 training samples per class, achieving 87.3% accuracy on the Intel Image Classification dataset.
- Generalisation & Adaptability: Engineered the model to handle diverse image types and used a novel approach to out-of-distribution (OOD) detection that achieved 100% recall on certain OOD tests, supporting robust performance in open-world scenarios.
- Industry Collaboration & Deployment: Partnered with BT Group and Prof. Hani Hagras to refine and integrate the AI solution, leading to a high-accuracy automated labelling system adopted for industrial use.
- Advanced CV R&D: Built a self‑training ensemble that leveraged test data to raise accuracy from 87.3% to 89.3% on Intel Image Classification and from 63.3% to 65.7% on CIFAR‑10.
- Prototype-Based OOD Detection: Designed a novel approach to detect out-of-distribution images, achieving 100% OOD recall on Fashion-MNIST ↔ Intel tests and ≥95% precision overall.
- Clustering Workflow: Re-engineered K-Means with a cosine metric to batch OOD images (87% clustering accuracy on Intel, 73% on Caltech-101), halving expert triage time.