Data scientists at the Icahn School of Medicine at Mount Sinai in New York have created an artificial intelligence model that may help predict which existing medicines, not currently classified as harmful, could in fact lead to birth defects.
The model is a type of “knowledge graph,” and it could also flag preclinical compounds that may harm the developing foetus.
Birth defects are abnormalities that affect about 1 in 33 births in the U.S. They can be functional or structural and can result from various factors, including genetics. However, the specific causes of most birth defects remain unknown. Certain substances found in medicines, cosmetics, food, and environmental pollutants are thought to be capable of causing birth defects if a fetus is exposed to them during pregnancy.
According to lead researcher Avi Ma’ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics: “Although identifying the underlying causes is a complicated task, we offer hope that through complex data analysis like this that integrates evidence from multiple sources, we will be able, in some cases, to better predict, regulate, and protect against the significant harm that congenital disabilities could cause.”
To build the AI model, the researchers combined several datasets of birth-defect associations reported in published work, aiming to show how integrating these resources can lead to discoveries that no single source would reveal on its own.
The integrated data covered the known genetics of reproductive health, the classification of medicines by their risk during pregnancy, and the ways drugs and preclinical compounds affect biological mechanisms inside human cells. Specifically, the sources included genetic association studies, drug- and preclinical-compound-induced gene expression changes in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small-molecule drugs.
Using semi-supervised learning, the scientists prioritized 30,000 preclinical small-molecule drugs by their potential to cross the placenta and induce birth defects. This machine learning approach uses a small amount of labelled data to guide predictions for a much larger pool of unlabelled data.
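The article does not specify which semi-supervised algorithm the team used. As a rough illustration of the general idea only, the sketch below uses label propagation, one common semi-supervised technique, on a toy similarity graph: a few compounds carry known labels (1.0 for associated with birth defects, 0.0 for considered safe), and scores spread from them to unlabelled neighbours. The function name, compound names, edges, and weights are all hypothetical and are not the Mount Sinai team's data or method.

```python
# Minimal label-propagation sketch (hypothetical toy data, NOT the study's method).
# Nodes are compounds; weighted edges stand in for an assumed similarity between them.

def propagate_labels(edges, seeds, iterations=50):
    """Spread scores from labelled seed nodes to unlabelled nodes.

    edges: dict mapping (a, b) -> similarity weight (treated as undirected)
    seeds: dict mapping node -> fixed label score (1.0 = harmful, 0.0 = safe)
    Returns a dict of scores for every node that appears in an edge.
    """
    # Build an undirected adjacency list.
    neighbours = {}
    for (a, b), w in edges.items():
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))

    # Unlabelled nodes start at a neutral 0.5.
    scores = {node: seeds.get(node, 0.5) for node in neighbours}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                # Labelled nodes keep their known value.
                updated[node] = seeds[node]
            else:
                # Unlabelled nodes take the weighted average of their neighbours.
                total = sum(w for _, w in nbrs)
                updated[node] = sum(scores[n] * w for n, w in nbrs) / total
        scores = updated
    return scores


# Hypothetical toy graph: compound_A is strongly similar to a known teratogen,
# compound_B is strongly similar to a drug considered safe.
edges = {
    ("known_teratogen", "compound_A"): 0.9,
    ("compound_A", "compound_B"): 0.2,
    ("known_safe_drug", "compound_B"): 0.8,
}
seeds = {"known_teratogen": 1.0, "known_safe_drug": 0.0}

scores = propagate_labels(edges, seeds)
# Rank only the unlabelled compounds, highest predicted risk first.
ranked = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
print(ranked)  # ['compound_A', 'compound_B']
```

The few labelled examples anchor the scores, while the much larger unlabelled set is ranked by how closely it connects to them, which is the trade-off the paragraph above describes.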
The analysis also identified more than 500 birth-defect/gene/drug cliques that may help explain the molecular mechanisms underlying drug-induced birth defects.
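To make the idea of a birth-defect/gene/drug clique concrete, the sketch below treats a clique as a triangle in a knowledge graph: a birth defect is linked to a gene, that gene is linked to a drug (for example, as a drug target), and the same drug is linked back to the same birth defect. This three-way reading of "clique," the function name `find_cliques`, and every node in the toy graph are assumptions for illustration, not the study's data or its exact definition.

```python
from collections import defaultdict


def find_cliques(defect_gene, gene_drug, drug_defect):
    """Return (defect, gene, drug) triples where all three pairwise links exist.

    Each argument is a set of (x, y) pairs for one edge type in a
    hypothetical toy knowledge graph.
    """
    # Index drugs by gene so each defect-gene edge only checks relevant drugs.
    drugs_by_gene = defaultdict(set)
    for gene, drug in gene_drug:
        drugs_by_gene[gene].add(drug)

    cliques = []
    for defect, gene in defect_gene:
        for drug in drugs_by_gene.get(gene, ()):
            # The drug must close the triangle by linking to the same defect.
            if (drug, defect) in drug_defect:
                cliques.append((defect, gene, drug))
    return sorted(cliques)


# Hypothetical edges; the names do not refer to real associations.
defect_gene = {("defect_A", "GENE_1"), ("defect_B", "GENE_2")}
gene_drug = {("GENE_1", "drug_X"), ("GENE_2", "drug_Y")}
drug_defect = {("drug_X", "defect_A")}

cliques = find_cliques(defect_gene, gene_drug, drug_defect)
print(cliques)  # [('defect_A', 'GENE_1', 'drug_X')]
```

Here drug_Y touches GENE_2 but has no link to defect_B, so it does not form a clique. Requiring all three links is what lets a clique suggest a possible mechanism (the gene) connecting a drug to a birth defect, rather than a bare drug/defect association.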
The researchers caution that the findings are preliminary and that further experiments are needed to validate them. Future work could extend the approach to more complex data, such as gene expression from specific tissues and cell types collected at multiple stages of development.
The study, titled “Toxicology Knowledge Graph for Structural Birth Defects,” appears in Communications Medicine, a Nature Portfolio journal.
