Repeat Aware Evaluation of Scaffolding Tools


The aim of this project is to provide a framework for repeat-aware evaluation of scaffolding tools. The first comprehensive scaffolding evaluation was performed by Hunt et al. (Genome Biology, 2014). Its main drawback is that it considers only the “best match” for each contig, i.e. the alignment of the contig to the genome with the highest similarity score.

We propose a new evaluation framework with the following advantages:

Repeat-awareness
Fair evaluation
Ability to evaluate repeat-aware scaffolders (like OPERA-LG)
Ease-of-use

The picture below shows a simple example illustrating the main idea.

Reference Scaffolding is the “golden truth” scaffolding
Inferred Scaffolding is the scaffolding produced by a tool

Contig 2 has two copies in the reference, namely 2a and 2b. Contig 4 has only one copy in the reference, but the inferred scaffolding contains two copies of it, 4a and 4b.

Our approach is to assign contigs in the Inferred Scaffolding to contigs of the Reference Scaffolding so as to maximize the number of correct links. Assigning Contig 2 and Contig 4a from the Inferred Scaffolding to Contig 2a and Contig 4 in the Reference Scaffolding, respectively, we obtain 2 correct links. Any other assignment yields fewer correct links.
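The idea of choosing the assignment that maximizes correct links can be sketched with a small exhaustive search. This is only an illustration under stated assumptions, not the framework's actual algorithm: the function name `best_assignment` is hypothetical, each scaffold is simplified to an ordered list of contig names, a link counts as correct only when two neighbouring inferred contigs are mapped to neighbouring reference positions in the same order (orientation and gap sizes are ignored), and the toy layout is invented to resemble the example rather than copied from the picture.

```python
from itertools import product


def best_assignment(inferred, reference):
    """Exhaustively search for the assignment with the most correct links.

    Hypothetical helper for illustration only. Both arguments are lists of
    contig names in scaffold order. Returns (score, assignment), where
    assignment[i] is the reference position chosen for inferred[i], or None
    if that inferred contig is left unassigned.
    """
    # Each inferred contig may map to any reference copy with the same name,
    # or stay unassigned (None).
    candidates = [
        [j for j, name in enumerate(reference) if name == contig] + [None]
        for contig in inferred
    ]
    best_score, best = -1, None
    for assignment in product(*candidates):
        used = [j for j in assignment if j is not None]
        if len(used) != len(set(used)):
            continue  # two inferred contigs cannot share one reference copy
        # A link is correct when consecutive inferred contigs land on
        # consecutive reference positions.
        score = sum(
            1
            for a, b in zip(assignment, assignment[1:])
            if a is not None and b is not None and b == a + 1
        )
        if score > best_score:
            best_score, best = score, assignment
    return best_score, best


# Assumed toy layout: contig "2" occurs twice in the reference (2a, 2b),
# contig "4" occurs once in the reference but twice in the inferred scaffold.
reference = ["1", "2", "4", "3", "2", "5"]
inferred = ["1", "2", "4", "5", "4"]
score, assignment = best_assignment(inferred, reference)
print(score, assignment)
```

In this toy layout the best assignment maps the inferred Contig 2 to the first reference copy and the first inferred Contig 4 to the single reference Contig 4, which gives 2 correct links. The search is exponential in the number of repeated contigs, so it is suitable only for tiny examples like this one.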
