Background: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers.

Results: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting two different PPI networks.
The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.

Conclusions: Our newly developed method for HMC takes network information into account in the learning phase: when used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: the best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
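To make "network autocorrelation in functional annotations" concrete, the following is a minimal sketch that computes global Moran's I, a standard spatial autocorrelation statistic, over an undirected, unweighted interaction graph for a single binary functional annotation. This is an illustrative assumption, not necessarily the measure used inside NHMC: the sketch handles one label at a time and does not show how autocorrelation might be aggregated over a class hierarchy. The gene names, edges, and labels (`g1` to `g6`) are hypothetical toy data, and `morans_i` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def morans_i(labels: Dict[str, float], edges: Iterable[Tuple[str, str]]) -> float:
    """Global Moran's I of a per-gene value over an undirected, unweighted network.

    Values near +1 mean interacting genes tend to share the annotation,
    values near 0 mean no autocorrelation, and negative values mean
    interacting genes tend to differ. (Illustrative sketch only.)
    """
    nodes = list(labels)
    n = len(nodes)
    mean = sum(labels.values()) / n
    # Deviation of each gene's label from the mean label.
    dev = {g: labels[g] - mean for g in nodes}
    denom = sum(d * d for d in dev.values())
    if denom == 0:
        raise ValueError("annotation is constant; Moran's I is undefined")

    num = 0.0
    total_weight = 0.0
    for a, b in edges:
        # An undirected edge counts in both directions: w_ab = w_ba = 1.
        num += 2 * dev[a] * dev[b]
        total_weight += 2
    if total_weight == 0:
        raise ValueError("network has no edges")
    return (n / total_weight) * (num / denom)


# Hypothetical toy network: two triangles of genes joined by one edge.
# Genes g1-g3 carry the annotation (1), genes g4-g6 do not (0).
labels = {"g1": 1, "g2": 1, "g3": 1, "g4": 0, "g5": 0, "g6": 0}
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
         ("g4", "g5"), ("g5", "g6"), ("g4", "g6"),
         ("g3", "g4")]

print(round(morans_i(labels, edges), 3))  # → 0.714 (strong positive autocorrelation)
```

Because most interactions in this toy network link genes with the same annotation, the statistic is clearly positive. Rewiring the same genes so that every edge joins an annotated gene to an unannotated one (for example `g1`-`g4`, `g2`-`g5`, `g3`-`g6`) drives it to -1, which is the kind of structure a learner assuming i.i.d. instances cannot exploit.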