About

I am a final-year BSc Statistics and Programming student at Kenyatta University, Nairobi. My work is method-first, assumption-aware, and decision-ready: I prioritize model validity, reproducibility, and clear uncertainty quantification over leaderboard metrics. Before this, I worked as a Machine Learning Data Specialist at CloudFactory Kenya, building ground-truth datasets for computer vision systems.

My core statistical stack is R, which I use for inference, mixed-effects modeling, Bayesian estimation, and residual diagnostics. On the engineering side, I use Python for machine learning pipelines, RAG architectures, and deployment workflows. I treat statistical rigor as a precondition for any modeling decision, not an afterthought.

Statistical Projects

02
R tidymodels Classification

Loan Default Prediction

Credit-risk classification pipeline with dimensionality reduction, three tuned model families, and calibrated probability outputs evaluated against asymmetric cost metrics.
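Calibration matters here because, when predicted probabilities are well calibrated, the expected-cost-minimizing cutoff follows directly from the two misclassification costs: predict default when p > C_FP / (C_FP + C_FN). The Python sketch below illustrates that rule; the 1:5 cost ratio and the six toy predictions are hypothetical, not taken from the project.

```python
def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Cutoff that minimizes expected cost, assuming calibrated probabilities."""
    # Predict default when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)


def mean_cost(probs, labels, threshold, cost_fp, cost_fn):
    """Average misclassification cost of thresholding predicted default probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        predicted_default = p >= threshold
        if predicted_default and y == 0:
            total += cost_fp      # good borrower wrongly rejected
        elif not predicted_default and y == 1:
            total += cost_fn      # defaulter wrongly approved
    return total / len(probs)


# Hypothetical costs: approving a defaulter costs five times as much as rejecting a good borrower.
COST_FP, COST_FN = 1.0, 5.0
probs = [0.10, 0.20, 0.30, 0.60, 0.05, 0.40]   # toy predicted default probabilities
labels = [0, 1, 1, 1, 0, 0]                    # 1 = defaulted

t_star = bayes_threshold(COST_FP, COST_FN)
print(f"cost-optimal threshold: {t_star:.3f}")
print("mean cost at 0.5:   ", mean_cost(probs, labels, 0.5, COST_FP, COST_FN))
print("mean cost at t_star:", mean_cost(probs, labels, t_star, COST_FP, COST_FN))
```

With asymmetric costs the default 0.5 cutoff is rarely the right one; the rule only holds if the probabilities are calibrated, which is why calibration precedes thresholding.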

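Sparse PCA ahead of model fitting addresses two problems at once: it replaces correlated predictors with a small set of components, curbing the collinearity that inflates the variance of coefficient estimates, and it compresses the feature space into a few sparse, interpretable loadings. Because PCA is unsupervised, whether those components actually separate defaulters from non-defaulters is checked downstream through cross-validation.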
R · tidymodels · xgboost · ranger · ggplot2 · PCA/SPCA · Bayesian Optimization · cross-validation · ROC-AUC
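A minimal Python sketch of the decorrelation argument, under stated assumptions: it uses ordinary PCA on two simulated, nearly collinear predictors (seed, sample size, and noise level are arbitrary), solved in closed form for the 2 x 2 case. Ordinary PCA scores are exactly uncorrelated; sparse PCA trades some of that orthogonality for sparse loadings, so this shows the idealized case rather than the project's SPCA step. With two predictors the variance inflation factor is 1 / (1 - r^2).

```python
import math
import random


def center(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def cov(a, b):
    """Sample covariance of two already-centred sequences."""
    return sum(x * y for x, y in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))


def pca_2d(x1, x2):
    """Rotate two centred features onto their principal axes (closed form for 2 x 2)."""
    v1, v2, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    theta = 0.5 * math.atan2(2 * c12, v1 - v2)  # angle of the first (max-variance) axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    pc1 = [cos_t * a + sin_t * b for a, b in zip(x1, x2)]
    pc2 = [-sin_t * a + cos_t * b for a, b in zip(x1, x2)]
    return pc1, pc2


# Simulated, nearly collinear predictors (purely illustrative, not project data).
rng = random.Random(42)
raw1 = [rng.gauss(0, 1) for _ in range(500)]
raw2 = [v + rng.gauss(0, 0.1) for v in raw1]
x1, x2 = center(raw1), center(raw2)

r_raw = corr(x1, x2)
vif_raw = 1 / (1 - r_raw ** 2)   # VIF with two predictors: 1 / (1 - r^2)
pc1, pc2 = pca_2d(x1, x2)
r_pc = corr(pc1, pc2)             # zero up to floating-point error
print(f"raw predictors: r = {r_raw:.3f}, VIF = {vif_raw:.1f}")
print(f"principal components: r = {r_pc:.1e}, VIF = {1 / (1 - r_pc ** 2):.1f}")
```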
03
R Classification Statistical Analysis

Breast Cancer Malignancy Classification

Eight-model comparative study on 30 nucleus morphology features, using Boruta feature selection, PCA biplots, and SMOTE to correct class imbalance, producing a clinically interpretable SVM-RBF decision boundary.
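SMOTE balances classes by interpolating between minority-class neighbors rather than duplicating rows. The Python sketch below shows only that core step, assuming Euclidean distance, k = 3 neighbors, and hypothetical two-feature points; it is not the implementation the project used.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority rows by SMOTE-style interpolation (core step only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbors of the base point (Euclidean, excluding itself)
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class (malignant) rows with two scaled features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.9)]
new_rows = smote(minority, n_synthetic=10)
for row in new_rows[:3]:
    print(tuple(round(v, 3) for v in row))
```

Because every synthetic row lies on a segment between two real minority rows, SMOTE should be applied inside each resampling fold only, so that the assessment folds never contain interpolated points.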

The three worst-case measurements (concavity, concave point count, and perimeter) carry substantially more diagnostic signal than the mean measurements. Each worst-case feature summarizes the most abnormal nuclei in a sample, and malignancy shows up most clearly in those extreme nuclei rather than uniformly across all of them.
R · tidymodels · kernlab · ggplot2 · GGally · SVM-RBF · BART · Boruta · SMOTE · PCA biplots · Racing ANOVA · hierarchical clustering
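Assuming the project uses the standard Wisconsin Diagnostic Breast Cancer (WDBC) data, which the 30 features suggest, each measurement comes in three versions per image: the mean, the standard error, and the "worst" value, defined as the mean of the three largest values across the nuclei. The Python sketch below computes the mean and worst summaries for two hypothetical images with invented concavity values, showing how a few extreme nuclei widen the gap on the worst-case summary more than on the mean.

```python
import heapq


def mean_value(values):
    return sum(values) / len(values)


def worst_value(values):
    """WDBC-style "worst" summary: the mean of the three largest per-nucleus values."""
    return mean_value(heapq.nlargest(3, values))


# Hypothetical per-nucleus concavity values for one benign and one malignant image.
benign = [0.02, 0.03, 0.04, 0.03, 0.05, 0.04, 0.03, 0.02]
malignant = [0.03, 0.04, 0.03, 0.05, 0.04, 0.30, 0.35, 0.28]   # a few very irregular nuclei

mean_ratio = mean_value(malignant) / mean_value(benign)
worst_ratio = worst_value(malignant) / worst_value(benign)
print(f"malignant/benign ratio, mean summary:  {mean_ratio:.1f}")
print(f"malignant/benign ratio, worst summary: {worst_ratio:.1f}")
```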
04
R Bayesian In Progress

Bayesian Structural Time Series

Interrupted time series causal inference with BSTS decomposition and INLA posterior estimation for policy impact analysis.

Counterfactual forecasting separates intervention effects from baseline trend and seasonality.
R · INLA · BSTS · interrupted time series · causal inference · posterior uncertainty
Active development
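The counterfactual logic can be illustrated with a deliberately simplified, non-Bayesian stand-in: fit trend and seasonal terms on the pre-intervention period only, project them forward, and read the effect as observed minus counterfactual. This Python sketch uses ordinary least squares on a noiseless simulated series (intervention at t = 20, period 4, true level shift of 3, all assumed). It is not BSTS or INLA and produces no posterior uncertainty, which is exactly what those methods add.

```python
def fit_trend(ts, ys):
    """Ordinary least squares fit of y = intercept + slope * t."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def counterfactual_effect(y, start, period):
    """Fit trend + seasonal offsets on y[:start], project past `start`, return effects."""
    pre_t = list(range(start))
    intercept, slope = fit_trend(pre_t, y[:start])
    resid = [y[t] - (intercept + slope * t) for t in pre_t]
    # Seasonal offset = mean detrended residual at each position in the cycle.
    seasonal = [
        sum(resid[t] for t in pre_t if t % period == j) / sum(1 for t in pre_t if t % period == j)
        for j in range(period)
    ]
    counterfactual = [intercept + slope * t + seasonal[t % period] for t in range(start, len(y))]
    effects = [obs - cf for obs, cf in zip(y[start:], counterfactual)]
    return counterfactual, effects


# Simulated (assumed) series: trend 10 + 0.5 t, period-4 seasonality, +3 level shift from t = 20.
season = [1.0, -1.0, -1.0, 1.0]
y = [10 + 0.5 * t + season[t % 4] + (3.0 if t >= 20 else 0.0) for t in range(32)]
counterfactual, effects = counterfactual_effect(y, start=20, period=4)
avg_effect = sum(effects) / len(effects)
print(f"estimated average intervention effect: {avg_effect:.3f}")
```

The seasonal pattern here is chosen to be uncorrelated with time over the pre-period; with real data, fitting trend and seasonality jointly keeps one component from absorbing the other.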
05
R Bayesian In Progress

Bayesian Survival Analysis

Bayesian hierarchical survival modeling for censored time-to-event data with competing risks and time-varying covariates.

Cause-specific hazard modeling accounts for right-censoring and subgroup heterogeneity while carrying full posterior uncertainty through to the estimates.
R · Bayesian hierarchical · right-censored data · competing risks · time-varying covariates
Active development
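A minimal conjugate sketch of a single cause-specific hazard, assuming a constant hazard per cause (exponential survival) and an illustrative Gamma(1, 1) prior, with no hierarchy and no time-varying covariates. Every subject contributes follow-up time, only events of the target cause count as events, and censored or competing-event subjects are treated as censored for that cause. Under those assumptions the posterior is Gamma(a + d, b + T), where d is the number of cause-specific events and T the total follow-up. The records are hypothetical.

```python
import random

# Hypothetical follow-up records: (time, status), status 0 = censored,
# 1 = event from cause 1, 2 = event from the competing cause 2.
records = [(2.0, 1), (3.5, 0), (1.0, 2), (4.0, 1), (5.0, 0), (2.5, 2), (3.0, 1)]


def cause_specific_posterior(data, cause, prior_shape=1.0, prior_rate=1.0):
    """Gamma posterior (shape, rate) for a constant cause-specific hazard."""
    events = sum(1 for _, status in data if status == cause)   # only this cause counts
    exposure = sum(time for time, _ in data)                   # everyone contributes time at risk
    return prior_shape + events, prior_rate + exposure


def summarise(shape, rate, draws=4000, seed=1):
    """Posterior mean plus a Monte Carlo 95% equal-tailed credible interval."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    samples = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    return shape / rate, (samples[int(0.025 * draws)], samples[int(0.975 * draws) - 1])


for cause in (1, 2):
    shape, rate = cause_specific_posterior(records, cause)
    mean, (lo, hi) = summarise(shape, rate)
    print(f"cause {cause}: posterior mean hazard {mean:.3f}, 95% CrI ({lo:.3f}, {hi:.3f})")
```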


Production AI Systems

Alongside statistical work, I build Python AI systems: RAG pipelines, agentic architectures, and automated research tools. These projects demonstrate software engineering discipline and practical ML deployment.

06
Python RAG

Medical Research Assistant

Question-answering pipeline over medical PDFs, with retrieval, reranking, and source-level traceability for each generated response.

Python · LangChain · FAISS · NetworkX · Streamlit · reranking · semantic caching
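The retrieve-then-rerank flow with source traceability can be sketched in plain Python. Here bag-of-words cosine similarity stands in for FAISS embedding search and a query-term coverage score stands in for the reranker; both scorers, the chunk texts, and the file names are hypothetical. The point is the shape of the pipeline: every returned passage carries its source file and page.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical PDF file name
    page: int
    text: str


def tokens(text):
    return [w.strip(".,;:()").lower() for w in text.split()]


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters (stand-in for embedding search)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    q = Counter(tokens(query))
    return sorted(chunks, key=lambda c: cosine(q, Counter(tokens(c.text))), reverse=True)[:k]


def rerank(query, candidates, top_n=1):
    """Second stage: fraction of query terms a chunk covers (stand-in for a cross-encoder)."""
    q = set(tokens(query))
    return sorted(
        candidates, key=lambda c: len(q & set(tokens(c.text))) / max(len(q), 1), reverse=True
    )[:top_n]


def answer_context(query, chunks):
    """Return the passages that ground an answer, each tagged with its source and page."""
    return [(c.source, c.page, c.text) for c in rerank(query, retrieve(query, chunks))]


corpus = [
    Chunk("guideline_a.pdf", 4, "Section on metformin dosage in renal impairment."),
    Chunk("trial_b.pdf", 12, "Insulin therapy outcomes in type 1 diabetes."),
    Chunk("review_c.pdf", 7, "Renal function monitoring during antihypertensive treatment."),
]
for source, page, text in answer_context("metformin dosage renal impairment", corpus):
    print(f"[{source}, p. {page}] {text}")
```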
07
Python Agents

Agentic RAG

Agent-driven RAG architecture that combines planning, shared memory, and runtime tool calling for multi-step, context-aware tasks.

Python · LangChain · LangGraph · OpenRouter · tool-calling agents · shared memory
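The plan, shared-memory, and tool-dispatch loop can be reduced to a small plain-Python sketch. The planner is a hypothetical fixed list (an agentic system would ask the LLM to produce it), and both tools are hypothetical stand-ins; none of this is LangChain or LangGraph API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Memory = Dict[str, list]
Tool = Callable[[Memory, str], str]


def lookup_tool(memory: Memory, query: str) -> str:
    # Hypothetical tool: a fixed table standing in for a retrieval or web-search call.
    table = {"capital of kenya": "Nairobi"}
    return table.get(query.lower(), "not found")


def remember_tool(memory: Memory, fact: str) -> str:
    # Writes to shared memory so later steps (or other agents) can read it.
    memory.setdefault("facts", []).append(fact)
    return f"stored: {fact}"


TOOLS: Dict[str, Tool] = {"lookup": lookup_tool, "remember": remember_tool}


def make_plan(task: str) -> List[Tuple[str, str]]:
    # Hypothetical fixed planner; an agentic system would ask the LLM to produce this list.
    return [("lookup", task), ("remember", "{previous}")]


def run_agent(task: str, memory: Optional[Memory] = None) -> Tuple[str, Memory]:
    memory = {} if memory is None else memory
    previous = ""
    for tool_name, arg in make_plan(task):
        arg = arg.replace("{previous}", previous)   # feed the last tool output forward
        previous = TOOLS[tool_name](memory, arg)    # runtime dispatch by tool name
        memory.setdefault("trace", []).append((tool_name, arg, previous))
    return previous, memory


answer, shared = run_agent("capital of Kenya")
print(answer)           # stored: Nairobi
print(shared["facts"])  # ['Nairobi']
```

Passing one memory dict into every tool call is what makes the memory "shared"; the trace entries record which tool ran with which argument, which is useful when debugging multi-step runs.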
08
Python DSPy + LangGraph

MedReportAI

Multi-agent LangGraph pipeline orchestrating a two-phase research protocol to synthesize publication-ready biomedical reports from live PubMed and web evidence.

Python · DSPy · LangGraph · Streamlit · PubMed · FAISS · hybrid retrieval · HITL · astream_events
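Hybrid retrieval needs a way to merge a lexical ranking with a dense-vector ranking. One common choice is reciprocal rank fusion, score(d) = sum over rankings of 1 / (k + rank(d)), with k = 60 a commonly used default. The source does not say which fusion MedReportAI uses, so treat this Python sketch and its document ids as an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids; higher fused score ranks first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from a keyword (BM25-style) retriever and a dense FAISS retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Because the fusion uses only ranks, it needs no normalization between BM25 scores and cosine similarities, which live on different scales.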
09
Python Computer Vision Kaggle Notebook

Tomato Ripeness Detection

Object detection model for ripe versus unripe tomatoes, reaching an mAP@50 (mean average precision at an IoU threshold of 0.5) of 93.2%.

Python · YOLO · PyTorch · object detection · mAP@50: 93.2%
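For reference, mAP@50 can be computed as follows: match each detection (highest confidence first) to an unmatched ground-truth box with IoU of at least 0.5, build the precision-recall curve, take the area under its monotone envelope as the class AP, then average over classes. This Python sketch uses greedy matching and all-point interpolation on hypothetical boxes for the two classes. Evaluators such as the one bundled with YOLO tooling may interpolate differently, so it is not meant to reproduce the reported 93.2%.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class in one image: greedy matching, all-point interpolation."""
    matched, hits = set(), []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            hits.append(0)          # false positive
        else:
            matched.add(best_j)
            hits.append(1)          # true positive
    precisions, recalls, cum_tp = [], [], 0
    for i, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / i)
        recalls.append(cum_tp / len(ground_truth))
    # Monotone envelope: precision at recall r becomes the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical (confidence, box) detections and ground-truth boxes per class.
ripe_gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
ripe_det = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 31, 31))]
unripe_gt = [(5, 5, 15, 15)]
unripe_det = [(0.95, (5, 5, 15, 15))]

ap_per_class = {
    "ripe": average_precision(ripe_det, ripe_gt),
    "unripe": average_precision(unripe_det, unripe_gt),
}
map50 = sum(ap_per_class.values()) / len(ap_per_class)
print(ap_per_class, round(map50, 4))
```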

Skills and Stack

Statistical Modeling and Inference

Core competency, R

  • GLMs / GLMMs: glmmTMB, rainfall
  • Bayesian inference: INLA, BSTS
  • Hypothesis testing: across projects
  • Residual diagnostics: DHARMa
  • Survival analysis: in progress
  • Time series: BSTS, causal ITS
  • Resampling / CV: tidymodels
  • Reproducible reports: Quarto

Machine Learning

R + Python

  • tidymodels (core): loan default, cancer
  • XGBoost / Random Forest: xgboost, ranger
  • SVM: kernlab
  • ggplot2 (core): all R projects
  • Scikit-learn: Python baselines
  • PyTorch: deep learning, YOLO
  • Hyperparameter tuning: Racing ANOVA, Bayesian optimization

Systems and Deployment

Python, Infrastructure

  • LangChain / LangGraph: RAG, agents
  • DSPy: MedReportAI
  • FAISS: vector search
  • Streamlit: RAG UI
  • FastAPI: API serving
  • Docker: containerization
  • PostgreSQL / SQL: structured data
  • Git / GitHub: all projects

Experience

Sep 2022 to Apr 2024

Machine Learning Data Specialist

CloudFactory Kenya

Nairobi, Kenya

  • Developed high-precision ground-truth datasets for computer vision tasks, including 3D bounding box estimation and semantic segmentation for autonomous systems.
  • Engineered data validation pipelines and performed statistical quality control to minimize label noise, directly improving client model mAP and F1 scores.
  • Collaborated on iterative model error analysis, identifying edge cases in complex spatial datasets to refine training data distribution.

Education

2017 to Mar 2021

Kenya Certificate of Secondary Education (KCSE)

Bungoma High School

Grade B+

Sep 2021 to Aug 2022

Analytical Chemistry

University of Nairobi

Transferred to BSc Statistics and Programming at Kenyatta University in 2022.

Sep 2022 to Dec 2026

BSc Statistics and Programming

Kenyatta University

Expected: First Class Honours

Contact

I am available for full-time roles from mid-2026, with a focus on Statistician and Statistical Data Scientist positions. I am also open to research collaborations and consulting engagements in the interim. If my work is relevant to what you are building, email is the best way to reach me.

Resume / CV: Download .pdf
Email (preferred): olandechris@gmail.com
GitHub: @Chrisolande
LinkedIn: Chris Olande
Kaggle: @chrisolande