

I am a Research Fellow at the University of Melbourne (School of Mathematics & Statistics), within Melbourne Integrative Genomics (MIG) and the ARC Centre of Excellence for the Mathematical Analysis of Cellular Systems (MACSYS). Previously, I was a Doctoral Researcher at the Finnish Center for Artificial Intelligence (FCAI) and the Probabilistic Machine Learning (PML) group at Aalto University (Department of Computer Science) in Helsinki, Finland.
I work at the intersection of AI/ML (artificial intelligence and machine learning), computer science, applied mathematics and statistics to develop advanced quantitative technologies for complex system domains such as biology and economics/finance.
My career spans research and industry/startups across Australia and Europe, including experience as a co-founder of an AI digital health startup (Velmio) in Estonia and as a technology consultant (Deloitte) in Australia.
I hold a PhD in Computer Science (Doctor of Science) from Aalto University, where my research focused on probabilistic machine learning and deep learning for health, genetics and personalized medicine. I previously completed a First Class Honours degree in Applied Mathematics at the University of Sydney and a Bachelor of Science in Mathematics, Statistics and Economics at the University of Western Australia.
Connect on LinkedIn
RESEARCH
PhD thesis
Advancing towards personalized medicine: probabilistic machine learning and deep learning for health and genetics
Full text of thesis
Publications (Google Scholar)



RESEARCH HIGHLIGHTS
Improving Generalisability of Health Prediction Models
Machine learning is a powerful tool for health prediction, but models often generalise poorly when making predictions for new patients. A promising strategy is to pool learning across related supervised learning tasks. This work introduces a new Bayesian meta-learning approach that also models the similarity between the causal mechanisms of the tasks. It is applied in a stroke prediction case study using health record and genetics data from the UK Biobank and FinnGen datasets.
Learn More
Published in
Preprint, under review
Keywords
Meta-learning, Bayesian hierarchical models, causal inference, deep learning, transfer learning, OOD generalisation, robust ML
Modeling disease risk in families with graph neural networks
Electronic health records (EHRs) spanning multiple generations offer a new way to examine health trends in families. In collaboration with the Institute of Molecular Medicine Finland, an AI system was developed to analyze a network of EHR data from over 7 million patients. The findings demonstrate that a geometric deep learning approach is beneficial for modeling the shared genetic, environmental, and lifestyle factors influencing disease risk in families.
Learn More
Published in
Machine Learning for Healthcare Conference (MLHC), PMLR 2023
Keywords
Graph neural networks, geometric deep learning, deep learning for sequential data, electronic health records, genetics, familial factors of disease
Synthetic data for genetics research
HAPNEST is a new software tool that efficiently generates large synthetic datasets that closely mimic real genetic and phenotypic data. This work was carried out with the Europe-wide INTERVENE consortium, enabling researchers to test new computational methods for polygenic risk scoring across diverse ancestry groups while protecting sensitive health information. The software and a synthetic dataset of 6.8 million common variants and nine phenotypes for over 1 million individuals have been made publicly available.
Learn More
Published in
Bioinformatics
Keywords
Computational biology, statistical genetics, simulation-based inference, generative modeling, polygenic risk scoring
BLOG
IN THE MEDIA

March 2024
Interview by the Finnish Center for AI about Proof of Concept funding from the Research Council of Finland for research on AI-powered personalised medicine

September 2020
Featured in Sifted (Financial Times) for AI-powered digital health and mobile health tools for women’s health

April 2020
Featured in numerous media outlets (including CNN, Forbes and the Sydney Morning Herald) for digital health tools for the COVID-19 pandemic
© 2025 Sophie Wharrie