Kirollos Hanna - AI/ML Engineer Portfolio

Experience

Data Annotation

April 2025 - Present

Remote, Part-time position focusing on AI model evaluation and improvement.

Prompt Analytics & Model Diagnostics: Analysed more than 2,000 live user prompts and completions to uncover systematic error patterns, bias and security vulnerabilities in frontier LLMs.
Insight Reporting: Produced interactive dashboards and weekly briefs that prioritised fine‑tuning work, lifting factual accuracy by 8 pp in A/B tests.
Tooling & Automation: Built Python pipelines to cluster prompts, label edge cases and automatically surface regressions, cutting manual triage by 35%.

Data Annotation

Sept 2024 - April 2025

Remote, Part-time position focusing on AI model evaluation and improvement.

AI Model Evaluation & Debugging: Conducted in-depth analysis of AI-generated coding solutions, identifying and resolving errors to improve output accuracy.
Prompt Engineering & Model Assessment: Designed and tested diverse prompt variations to benchmark LLMs’ performance across multiple tasks, enhancing response consistency.
Comparative Analysis & Feedback: Evaluated outputs from various LLMs, identifying key improvement areas and providing structured feedback that contributed to an increase in model precision.

BT Group

Jan 2024 - Sept 2024

Master's Thesis Project in collaboration with BT Group and Prof. Hani Hagras.

AI Model Development & Optimisation: Designed and implemented an AI Labelling Assistant capable of accurately classifying images with as few as 10 training samples per class, achieving 87.3% accuracy on the Intel Image Classification dataset.
Generalisation & Adaptability: Engineered the model to handle diverse image types and used a novel approach to out-of-distribution (OOD) detection that achieved 100% recall on certain OOD tests, ensuring robust performance in open-world scenarios.
Industry Collaboration & Deployment: Partnered with BT Group and Prof. Hani Hagras to refine and integrate the AI solution, leading to a high-accuracy automated labelling system adopted for industrial use.
Advanced CV R&D: Built a self-training ensemble that leveraged test data, raising accuracy from 87.3% to 89.3% on Intel and from 63.3% to 65.7% on CIFAR-10.
Prototype-Based OOD Detection: Designed a novel approach to detect out-of-distribution images, hitting 100% OOD recall on Fashion-MNIST↔Intel tests and ≥95% precision overall.
Clustering Workflow: Re-engineered K-Means with a cosine metric to batch OOD images (87% clustering accuracy on Intel, 73% on Caltech-101), halving expert triage time.
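
Illustrative sketch: the cosine-metric K-Means mentioned above could look roughly like the following. This is a minimal spherical k-means written for illustration only, not the thesis implementation; the function name `cosine_kmeans`, its parameters, and the toy feature vectors are all assumptions. In practice the rows of `X` would be image embeddings from a feature extractor.

```python
import numpy as np

def cosine_kmeans(X, k, n_iter=50, seed=0):
    """Spherical k-means sketch (hypothetical helper): cluster rows of X by cosine similarity."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Unit-normalise rows so that a dot product equals cosine similarity.
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    Xn = X / norms
    # Initialise centroids from k distinct data points.
    centroids = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    labels = np.full(len(Xn), -1)
    for _ in range(n_iter):
        # Assign each point to the centroid with the highest cosine similarity.
        new_labels = np.argmax(Xn @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Xn[labels == j]
            if len(members) == 0:
                continue  # keep the old centroid for an empty cluster
            mean = members.mean(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                # Re-project the mean onto the unit sphere.
                centroids[j] = mean / norm
    return labels, centroids

# Toy usage: two groups that differ in direction, not in magnitude.
X = np.array([[1.0, 0.0], [9.0, 1.0], [1.0, 0.1],
              [0.0, 1.0], [1.0, 9.0], [0.1, 1.0]])
labels, centroids = cosine_kmeans(X, k=2)
```

Because every vector is normalised before the comparison, vector magnitude does not affect the assignment, which is the usual motivation for preferring a cosine metric over Euclidean distance on embedding features.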