MSc student in CS @ University of Salerno
BGP D2D is a distributed library for computing the similarity matrix of a set of genomic sequences, using D2 as the similarity metric.
The library is implemented with Apache Hadoop. The distributed D2 implementation consists of a first MapReduce phase, which reads k-mer occurrences from the KMC output file and computes partial D2 scores, and an optional second phase, run only when more than one task is created, which sums the partial scores.
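The two phases can be illustrated with a minimal single-machine sketch in plain Python. It is not the library's code and does not use the Hadoop API. It assumes the usual definition of D2 (the sum, over all k-mers, of the product of the two sequences' occurrence counts), counts k-mers in memory instead of reading KMC output, and splits the k-mers across tasks round-robin. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_kmers(sequence, k):
    """Count every overlapping k-mer (in-memory stand-in for KMC output)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def partial_d2(counts_by_seq, kmers):
    """Phase 1, one task: D2 scores restricted to this task's k-mers."""
    partial = {}
    for a, b in combinations(sorted(counts_by_seq), 2):
        # A Counter returns 0 for a k-mer that a sequence does not contain.
        partial[(a, b)] = sum(counts_by_seq[a][w] * counts_by_seq[b][w]
                              for w in kmers)
    return partial


def sum_partials(partials):
    """Phase 2: add up the partial scores produced by all tasks."""
    total = {}
    for partial in partials:
        for pair, score in partial.items():
            total[pair] = total.get(pair, 0) + score
    return total


def d2_matrix(sequences, k, num_tasks=2):
    """Return {(name_a, name_b): D2 score} for every pair of sequences."""
    counts = {name: count_kmers(seq, k) for name, seq in sequences.items()}
    all_kmers = sorted(set().union(*counts.values()))
    # Assumption: k-mers are distributed round-robin across the tasks.
    chunks = [all_kmers[i::num_tasks] for i in range(num_tasks)]
    partials = [partial_d2(counts, chunk) for chunk in chunks]
    # The summing phase is only needed when there is more than one task.
    return partials[0] if len(partials) == 1 else sum_partials(partials)


if __name__ == "__main__":
    seqs = {"s1": "ACGTAC", "s2": "ACGGAC", "s3": "TTTT"}
    print(d2_matrix(seqs, k=2))
```

Because D2 is a sum over k-mers, each task can score its own subset of k-mers independently, and the final score for a pair is the sum of its partial scores regardless of how the k-mers were split.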
For further details, see the project README.