<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Bioinformatics on Bytes of Life</title><link>https://yangyangli.top/categories/bioinformatics/</link><description>Recent content in Bioinformatics on Bytes of Life</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>2026 &lt;a class='hover:underline hover:decoration-primary-400 hover:text-primary-500' href=https://yangyangli.top target=_blank rel='noopener noreferrer'&gt;Yangyang Li&lt;/a&gt;</copyright><lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://yangyangli.top/categories/bioinformatics/index.xml" rel="self" type="application/rss+xml"/><item><title>DeepChopper: A Genomic Language Model that Cleans Up Nanopore Direct RNA Sequencing</title><link>https://yangyangli.top/posts/027-deepchopper/</link><pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate><guid>https://yangyangli.top/posts/027-deepchopper/</guid><description>&lt;h2 class="relative group"&gt;The chimera mystery
&lt;div id="the-chimera-mystery" class="anchor"&gt;&lt;/div&gt;
&lt;span
class="absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100 select-none"&gt;
&lt;a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline" href="#the-chimera-mystery" aria-label="Anchor"&gt;#&lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;Direct RNA sequencing (dRNA-seq) on Oxford Nanopore looks, on paper, like a transcriptomics dream.
You sequence native RNA molecules end to end, you keep the modifications, and you skip every reverse-transcription and PCR step that has been quietly polluting short-read data for years.
For a while, that was the story we were telling ourselves.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/posts/027-deepchopper/featured.jpg"/></item><item><title>A language model enables accurate structural variant detection in whole-genome amplified long-read sequencing</title><link>https://yangyangli.top/projects/006-chimeralm/</link><pubDate>Thu, 23 Jan 2025 00:00:00 +0000</pubDate><guid>https://yangyangli.top/projects/006-chimeralm/</guid><description/><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/projects/006-chimeralm/featured.png"/></item><item><title>Genomic Language Model Mitigates Chimera Artifacts in Nanopore Direct RNA Sequencing</title><link>https://yangyangli.top/projects/005-deepchopper/</link><pubDate>Thu, 31 Oct 2024 00:00:00 +0000</pubDate><guid>https://yangyangli.top/projects/005-deepchopper/</guid><description/><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/projects/005-deepchopper/featured.png"/></item><item><title>Aurora Is a Web Application for Visualizing Non-linear Graph</title><link>https://yangyangli.top/talks/001-codex/</link><pubDate>Thu, 04 Apr 2024 00:00:00 +0000</pubDate><guid>https://yangyangli.top/talks/001-codex/</guid><description/><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/talks/001-codex/featured.png"/></item><item><title>PxBLAT: An Efficient and Ergonomic Python Binding Library for BLAT</title><link>https://yangyangli.top/projects/001-pxblat/</link><pubDate>Sun, 25 Jun 2023 00:00:00 +0000</pubDate><guid>https://yangyangli.top/projects/001-pxblat/</guid><description/><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/projects/001-pxblat/featured.png"/></item><item><title>Efficient Genomic Interval Search Using SIMD-Enhanced 
COITree</title><link>https://yangyangli.top/posts/019-efficient-genoimc-interval-search/</link><pubDate>Sun, 12 Mar 2023 00:00:00 +0000</pubDate><guid>https://yangyangli.top/posts/019-efficient-genoimc-interval-search/</guid><description>&lt;h2 class="relative group"&gt;Background
&lt;div id="background" class="anchor"&gt;&lt;/div&gt;
&lt;span
class="absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100 select-none"&gt;
&lt;a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline" href="#background" aria-label="Anchor"&gt;#&lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;In bioinformatics, researchers frequently analyze various types of genomic data, such as DNA sequencing data, RNA sequencing data, and epigenetic data.
Manipulating genomic intervals is a crucial task in comprehending the genetic basis of diseases and identifying potential therapeutic targets.
Genomic intervals are defined as regions that span from a starting position to an ending position and can encompass genes, regulatory elements, and other functional elements of the genome.
One primary application of genomic interval manipulation is analyzing ChIP-seq data.
Moreover, manipulating genomic intervals allows for the integration of ChIP-seq data with other genomic data types, such as gene expression and genetic variations.
This integration provides a more comprehensive understanding of biological processes and their contribution to normal development or disease.
However, integrating these data types into a single data structure can pose challenges, especially when handling large datasets.
A Cache Oblivious Interval Tree (COITree), with its cache-oblivious design and efficient query algorithms, has the potential to handle and integrate multiple types of genomic data in a single data structure.
It stores the intervals in contiguous memory and employs an in-order van Emde Boas layout to enhance query performance.
The tree is designed to optimize cache performance by reducing the number of cache misses during traversal.
However, COITree still suffers from performance bottlenecks, particularly when dealing with large datasets.
One approach to addressing this bottleneck is to use Single Instruction Multiple Data (SIMD), which is optimized for vector operations, to improve the performance of COITree.
Thus, I hypothesize that SIMD acceleration is a viable way to improve the speed and efficiency of genomic interval analysis.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/posts/019-efficient-genoimc-interval-search/featured.jpg"/></item><item><title>How to Use Noodles Library in Rust</title><link>https://yangyangli.top/posts/001-rust-noodles/</link><pubDate>Sat, 04 Mar 2023 00:00:00 +0000</pubDate><guid>https://yangyangli.top/posts/001-rust-noodles/</guid><description>&lt;h2 class="relative group"&gt;1. Introduction
&lt;div id="1-introduction" class="anchor"&gt;&lt;/div&gt;
&lt;span
class="absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100 select-none"&gt;
&lt;a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline" href="#1-introduction" aria-label="Anchor"&gt;#&lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;&lt;a
href="https://github.com/zaeleus/noodles"
target="_blank"
&gt;Noodles&lt;/a&gt; and &lt;a
href="https://github.com/rust-bio/rust-htslib"
target="_blank"
&gt;Rust-htslib&lt;/a&gt; are two widely used Rust libraries for genomic data handling.
While both libraries are designed to work with genomic data, they take different approaches to achieve this goal.
This post explores Noodles and compares it with &lt;a
href="https://github.com/rust-bio/rust-htslib"
target="_blank"
&gt;Rust-htslib&lt;/a&gt;, while also discussing its potential pitfalls.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/posts/001-rust-noodles/featured.png"/></item><item><title>Bioinformatics Algorithm Library aka BINARY</title><link>https://yangyangli.top/projects/003-bioinformatics-algorithm-library/</link><pubDate>Mon, 26 Sep 2022 00:00:00 +0000</pubDate><guid>https://yangyangli.top/projects/003-bioinformatics-algorithm-library/</guid><description>&lt;p&gt;The library is a collection of algorithms and data structures that are designed for modern C++ bioinformatics applications.
You can use the library in your own projects or as a part of a larger project.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/projects/003-bioinformatics-algorithm-library/featured.png"/></item><item><title>Bioinformatics Toolbox Aka Boss</title><link>https://yangyangli.top/projects/002-new-tool-bioinformatics-toolbox-aka-boss/</link><pubDate>Sun, 25 Sep 2022 00:00:00 +0000</pubDate><guid>https://yangyangli.top/projects/002-new-tool-bioinformatics-toolbox-aka-boss/</guid><description>&lt;p&gt;BOSS is a bioinformatics toolbox, which will contain efficient tools. It is written in modern C++ and is tested exhaustively. It is designed to be easy to use and time-efficient. BOSS is free software distributed under the terms of the GNU General Public License v3.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/projects/002-new-tool-bioinformatics-toolbox-aka-boss/featured.png"/></item><item><title>C++ Development in Bioinformatics</title><link>https://yangyangli.top/posts/003-build-cpp-development-env-with-htslib/</link><pubDate>Wed, 15 Jun 2022 00:00:00 +0000</pubDate><guid>https://yangyangli.top/posts/003-build-cpp-development-env-with-htslib/</guid><description>&lt;h2 class="relative group"&gt;1.1 Config Compile Environment
&lt;div id="11-config-compile-environment" class="anchor"&gt;&lt;/div&gt;
&lt;span
class="absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100 select-none"&gt;
&lt;a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline" href="#11-config-compile-environment" aria-label="Anchor"&gt;#&lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;I am currently planning to develop a tool using &lt;em&gt;C++&lt;/em&gt; in both Linux and macOS environments.
However, I often lack the root access needed to install dependencies directly with &lt;code&gt;apt-get install -y dependencies&lt;/code&gt; on Ubuntu.
Navigating the complicated dependency chain and compiling each library individually can be time-consuming, often taking a night or even a week to complete.
One solution to this issue is to use a package manager such as &lt;strong&gt;Conda&lt;/strong&gt;, which is primarily used in the &lt;em&gt;data science&lt;/em&gt; domain.
&lt;strong&gt;Conda&lt;/strong&gt; offers support for other languages such as &lt;em&gt;C++&lt;/em&gt;, &lt;em&gt;Rust&lt;/em&gt; and &lt;em&gt;R&lt;/em&gt; as well.
Package names may change at any time, so it&amp;rsquo;s necessary to search for the current name of each package.
Therefore, &lt;strong&gt;Conda&lt;/strong&gt; can be a useful tool for installing &lt;em&gt;C++&lt;/em&gt; dependencies, particularly in the bioinformatics domain.
It&amp;rsquo;s worth mentioning that several other solutions exist for managing &lt;em&gt;C++&lt;/em&gt; dependencies, such as &lt;a
href="https://vcpkg.io/en/index.html"
target="_blank"
&gt;Vcpkg&lt;/a&gt; and &lt;a
href="https://conan.io/"
target="_blank"
&gt;Conan&lt;/a&gt;; I use &lt;a
href="https://github.com/cpm-cmake/CPM.cmake"
target="_blank"
&gt;CPM&lt;/a&gt; as an alternative option.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/posts/003-build-cpp-development-env-with-htslib/featured.jpg"/></item><item><title>Nei Saitou Neighbor Joining</title><link>https://yangyangli.top/posts/008-nei-saitou-neighbor-joining/</link><pubDate>Wed, 03 Apr 2019 00:00:00 +0000</pubDate><guid>https://yangyangli.top/posts/008-nei-saitou-neighbor-joining/</guid><description>&lt;h2 class="relative group"&gt;1. Background
&lt;div id="1-background" class="anchor"&gt;&lt;/div&gt;
&lt;span
class="absolute top-0 w-6 transition-opacity opacity-0 ltr:-left-6 rtl:-right-6 not-prose group-hover:opacity-100 select-none"&gt;
&lt;a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700 !no-underline" href="#1-background" aria-label="Anchor"&gt;#&lt;/a&gt;
&lt;/span&gt;
&lt;/h2&gt;
&lt;p&gt;Before diving into code, the description of the NJ algorithm can be found in
&lt;figure&gt;
&lt;img class="my-0 rounded-md" loading="lazy" alt="This Link" src="https://cdn.jsdelivr.net/gh/cauliyang/blog-image@main//img/1605172209524.png"&gt;
&lt;/figure&gt;
, where the first column indicates the parent node, the second column is its child node, and the last column is the edge value.&lt;/p&gt;</description><media:content xmlns:media="http://search.yahoo.com/mrss/" url="https://yangyangli.top/posts/008-nei-saitou-neighbor-joining/featured.jpg"/></item></channel></rss>