Efficiently Managing Intervals: A Trip to Interval Tree Algorithm

Yangyang Li

3/8/23

Why manipulate intervals

  • One of fundamental and important tasks in bioinformatics
  • Identify overlapping exons, introns, or regulatory regions
  • Compare genomic intervals between different samples
  • Detect candidate genes or regulatory elements.
  • Detect structural variants.

…and much more

How to manipulate intervals

What is Interval Tree

How to optimize

  • Use self-balanced tree

  • Use continue memory to store data

What is Coitree (Cache Oblivious Interval Trees)

  • Implement a data structure for very fast overlap queries of a static set of integer intervals
  • Store intervals in contiguous memory
  • Improve query performance by storing the nodes in in-order van Emde Boas layout layout.

Benchmarking Results

A (4,816,112) VS B (44,426,501)
Command Mean[s] Min[s] Max[s] Relative
coitree-default 4.393 4.308 4.469 1.24
bedtools 14.004 13.822 14.353 3.96
ailist 5.768 5.675 5.963 1.63
B (44,426,501) VS A (4,816,112)
Command Mean [s] Min [s] Max [s] Relative
ailist 7.047 6.919 7.268 1.00
coitree-default 8.326 8.209 8.522 1.18
bedtools 29.385 28.971 29.887 4.17

Why SIMD (single instruction multiple data)

Benchmarking Results

Table 1: A (4,816,112) VS B (44,426,501)
Command Mean[s] Min[s] Max[s] Relative
coitree-neon 3.535 3.512 3.608 1.00
coitree-default 4.393 4.308 4.469 1.24
bedtools 14.004 13.822 14.353 3.96
ailist 5.768 5.675 5.963 1.63
Table 2: B (44,426,501) VS A (4,816,112)
Command Mean [s] Min [s] Max [s] Relative
ailist 7.047 6.919 7.268 1.00
coitree-neon 8.106 8.040 8.323 1.15
coitree-default 8.326 8.209 8.522 1.18
bedtools 29.385 28.971 29.887 4.17
Table 3: B (44,426,501) VS B (44,426,501)
Command Mean[s] Min[s] Max[s] Relative
coitree-neon 11.263 11.052 11.529 1.00
coitree-default 12.625 12.429 12.957 1.12
bedtools 38.661 38.218 39.473 3.43
ailist 12.282 12.125 12.628 1.19
Table 4: A (4,816,112) VS B-unsorted (44,426,501)
Command Mean[s] Min[s] Max[s] Relative
coitree-neon 5.636 5.386 7.295 1.00
coitree-default 6.351 6.304 6.406 1.13
bedtools 223.421 223.585 224.342 44.60
ailist 6.559 6.508 6.743 1.16

Take Home Message

  • A high-performance interval tree implementation is available.
  • Select an appropriate library based on your requirements.
  • Reconsider your algorithm before coding.
  • Take a look https://github.com/cauliyang/coitrees/ for more information.
  • Attempt to provide an API for other programming languages.

The End