Efficient Genomic Interval Search Using SIMD-Enhanced COITree

Sun, 12 Mar 2023 00:00:00 +0000

Background

In bioinformatics, researchers routinely analyze many types of genomic data, such as DNA sequencing, RNA sequencing, and epigenetic data. Manipulating genomic intervals is central to understanding the genetic basis of disease and identifying potential therapeutic targets. A genomic interval is a region spanning a start position to an end position; it can cover genes, regulatory elements, and other functional elements of the genome. One primary application is the analysis of ChIP-seq data. Interval manipulation also allows ChIP-seq data to be integrated with other genomic data types, such as gene expression and genetic variation, which gives a more comprehensive view of biological processes and their contribution to normal development or disease. However, integrating these data types into a single data structure is challenging, especially for large datasets.

Cache Oblivious Interval Trees (COITree), with their cache-oblivious design and efficient query algorithms, have the potential to hold multiple types of genomic data in a single data structure. A COITree stores its intervals in contiguous memory and uses an in-order van Emde Boas layout to reduce the number of cache misses during traversal. Even so, COITree still suffers from performance bottlenecks on large datasets. One way to address them is Single Instruction Multiple Data (SIMD) execution, which applies a single operation to several data elements at once. I therefore hypothesize that a SIMD-enhanced COITree is a viable way to improve the speed and efficiency of genomic interval analysis.
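To make the overlap query concrete, here is a minimal sketch of the core test that any interval search performs. It is not the COITree implementation and uses no explicit SIMD intrinsics; it assumes closed intervals with 32-bit coordinates, and the names `IntervalColumns` and `count_overlaps` are hypothetical. The loop is written branch-free over contiguous arrays, which is the shape of code that compilers can typically auto-vectorize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct-of-arrays container (an assumption, not COITree's layout):
// starts and ends live in separate contiguous arrays so the loop below reads
// memory sequentially.
struct IntervalColumns {
    std::vector<std::int32_t> first;  // inclusive start positions
    std::vector<std::int32_t> last;   // inclusive end positions
};

// Count the stored intervals that overlap the closed query [qfirst, qlast].
// Two closed intervals overlap iff each one starts no later than the other ends.
inline std::size_t count_overlaps(const IntervalColumns& iv,
                                  std::int32_t qfirst, std::int32_t qlast) {
    std::size_t hits = 0;
    const std::size_t n = iv.first.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Bitwise & (not &&) evaluates both comparisons unconditionally,
        // keeping the loop body free of data-dependent branches.
        hits += static_cast<std::size_t>((iv.first[i] <= qlast) &
                                         (iv.last[i] >= qfirst));
    }
    return hits;
}
```

For example, with stored intervals [10, 20], [50, 60], and [100, 200], the query [15, 55] overlaps the first two. A linear scan like this only illustrates the comparison itself; a tree layout exists precisely to avoid visiting every interval.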

Bioinformatics Algorithm Library aka BINARY

Mon, 26 Sep 2022 00:00:00 +0000

The library is a collection of algorithms and data structures that are designed for modern C++ bioinformatics applications. You can use the library in your own projects or as a part of a larger project.

Nei Saitou Neighbor Joining

Wed, 03 Apr 2019 00:00:00 +0000

1. Background

Before diving into the code, a description of the NJ algorithm can be found in

, where the first column is the parent node, the second column is its child node, and the last column is the edge value.

Algorithms on Bytes of Life
