Developing a deep learning algorithm for de novo variant detection in PacBio long-read sequencing data

Start date: 05-09-2022
End date: 05-03-2023

Clinical problem

Many developmental disorders, such as intellectual disability, autism spectrum disorder and multiple congenital anomalies, are known to be caused by de novo mutations (DNMs). The reliable identification of DNMs is therefore of paramount importance for both genetic testing and research studies. Because the disorders in which DNMs play a major role are genetically heterogeneous, DNMs are typically identified from whole-exome sequencing (WES) or whole-genome sequencing (WGS) data. In our recent study, we developed DeNovoCNN, a deep-learning-based tool that identifies de novo variants in WES and WGS short-read data more accurately than existing tools. However, short-read sequencing approaches are limited in their ability to identify variants in genomic regions that are difficult to assess. This limitation may contribute significantly to the diagnostic gap in patients who have undergone standard WES and WGS. Emerging long-read sequencing (LRS) technologies improve the characterization of genetic variation, including in these difficult regions, and LRS is therefore considered the genetics technology of the future. Further development of DeNovoCNN is necessary to take advantage of LRS technologies.


In previous work, we showed that deep-learning algorithms can efficiently identify de novo variants in short-read sequencing data (WES and WGS). In this project, we will further improve DeNovoCNN so that it can detect de novo variants in long-read sequencing data.
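To make the notion of a de novo variant concrete, the following is a minimal sketch of the naive genotype-based definition: an allele carried by the child that is seen in neither parent. This is not how DeNovoCNN works; the function name, genotype encoding and example sites are hypothetical and chosen for illustration only. Such a simple filter is sensitive to genotyping errors, which is one reason a learned classifier is of interest.

```python
from typing import Tuple

# Assumed encoding: a genotype is a pair of allele indices,
# where 0 is the reference allele and >0 is an alternate allele.
Genotype = Tuple[int, int]


def is_candidate_de_novo(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child carries a non-reference allele seen in neither parent."""
    parental_alleles = set(father) | set(mother)
    return any(allele != 0 and allele not in parental_alleles for allele in child)


# Hypothetical genotype calls at three sites: (child, father, mother)
sites = {
    "site_a": ((0, 1), (0, 0), (0, 0)),  # alternate allele only in the child
    "site_b": ((0, 1), (0, 1), (0, 0)),  # allele inherited from the father
    "site_c": ((0, 0), (0, 0), (0, 0)),  # no variant in the trio
}

candidates = [name for name, trio in sites.items() if is_candidate_de_novo(*trio)]
print(candidates)  # ['site_a']
```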


We have a set of 33 child-parent trios in-house, 8 of which have 80 fully validated de novo events per trio. Published datasets are also available (Genome in a Bottle consortium data, >1000 events; Noyes et al., AJHG, ~200 events). Additional datasets might be collected through collaboration with PacBio.


Long-read sequencing (LRS) is expected to replace short-read sequencing (SRS) within the next 10 years. LRS is already offered for research projects and is expected to become the standard for genetic diagnostics. Integration and support will be handled jointly with the existing DeNovoCNN by the genetics department (Production and Support team).


You will be embedded in the Department of Human Genetics at Radboudumc. We provide access to a GPU machine and a research cluster.


Ole ten Hove

Gelana Khazeeva

PhD student


Christian Gilissen

Associate professor

Human Genetics, Radboudumc