Refining Protein-Level MicroRNA Target Interactions in Disease from Prediction Databases Using Sentence-BERT
- Posted
- Server: bioRxiv
- DOI: 10.1101/2024.05.17.594604
MicroRNAs (miRNAs) regulate gene expression by binding to mRNAs, thereby inhibiting translation or promoting mRNA degradation, and they play an important role in the development of various diseases. A variety of miRNA target prediction tools are currently available that analyze sequence complementarity, thermodynamic stability, and evolutionary conservation to predict miRNA-target interactions (MTIs) within the 3’ untranslated region (3’UTR). We propose further screening human sequence-based predicted MTIs by the disease similarity between miRNAs and genes, in order to establish a prediction database of disease-specific MTIs. We fine-tuned the Sentence-BERT model to calculate the semantic similarity between diseases. The method achieved an F1 score of 0.88 in distinguishing human MTIs validated experimentally at the protein level (Western blot, reporter assay, etc.) from predicted MTIs. Moreover, the method generalizes well across different databases. We applied it to calculate disease similarity for 1,220,904 human MTIs from miRTarBase, miRDB, and miRWalk, involving 6,085 genes and 1,261 pre-miRNAs. The study has the potential to offer insights into miRNA-gene regulatory networks and to advance disease diagnosis, treatment, and drug development.
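The screening idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, so cosine similarity is assumed here, and the embedding vectors, the `screen_mtis` helper, the placeholder names, and the 0.5 threshold are all hypothetical. In practice the vectors would come from the fine-tuned Sentence-BERT model applied to disease terms; toy vectors stand in for them. The `f1_score` function shows how the reported metric is defined.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def screen_mtis(candidates, threshold):
    """Keep predicted MTIs whose miRNA and gene disease embeddings are similar.

    candidates: list of (mirna, gene, mirna_disease_vec, gene_disease_vec).
    threshold: hypothetical cut-off; not taken from the paper.
    """
    return [
        (mirna, gene)
        for mirna, gene, mirna_vec, gene_vec in candidates
        if cosine_similarity(mirna_vec, gene_vec) >= threshold
    ]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy stand-ins for disease embeddings (placeholder names, not real data).
toy_candidates = [
    ("miRNA_A", "GENE_X", [1.0, 0.0], [1.0, 0.0]),  # identical direction
    ("miRNA_B", "GENE_Y", [1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
kept = screen_mtis(toy_candidates, threshold=0.5)
```

With these toy vectors only the first pair passes the threshold, since its two embeddings point in the same direction while the second pair's are orthogonal.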