Background: Large comparative genomics studies and tools are becoming increasingly compute-intensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive as this number grows, especially as the breadth of questions being asked continues to widen. Alternative computing architectures, in particular cloud computing environments, may help relieve this pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Compute Cloud (EC2). We then used RSD-cloud to calculate orthologs across a wide selection of fully sequenced genomes.

Results: We ran more than 300,000 RSD-cloud processes within EC2. These jobs were farmed out simultaneously to 100 high-capacity compute nodes using Amazon Web Services Elastic MapReduce and covered a wide mix of large and small genomes. The total computation took just under 70 hours and cost $6,302 (USD) in total.

Conclusions: Moving existing comparative genomics algorithms off local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost at manageable cost. The procedure we designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
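To give a rough sense of scale, the headline figures from the Results can be turned into per-unit costs. The sketch below is an illustrative back-of-the-envelope calculation, not part of the original analysis. It assumes that all 100 nodes were billed for the whole run and that the $6,302 total covers every charge, and it rounds "just under 70 hours" and "more than 300,000" to 70 and 300,000. Under those assumptions the per-node-hour figure is a lower bound and the per-process figure an upper bound. The function names are hypothetical.

```python
# Back-of-the-envelope figures derived only from the numbers in the abstract.
# Assumptions (not from the source): all 100 nodes were billed for the full run,
# the total cost includes every charge, and "just under 70 hours" / "more than
# 300,000" are rounded to 70 and 300,000.
TOTAL_COST_USD = 6302.0
NODES = 100
WALL_HOURS = 70.0
PROCESSES = 300_000


def cost_per_node_hour(cost: float, nodes: int, hours: float) -> float:
    """Average cost of one node-hour if every node ran for the whole job."""
    return cost / (nodes * hours)


def cost_per_process(cost: float, processes: int) -> float:
    """Average total cost attributed to each RSD-cloud process."""
    return cost / processes


if __name__ == "__main__":
    # Lower bound: the real run was slightly shorter than 70 hours.
    print(f"~${cost_per_node_hour(TOTAL_COST_USD, NODES, WALL_HOURS):.2f} per node-hour")
    # Upper bound: the real run had slightly more than 300,000 processes.
    print(f"~${cost_per_process(TOTAL_COST_USD, PROCESSES):.3f} per process")
```

Running it prints roughly $0.90 per node-hour and $0.021 per process.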