Let’s keep some helpful links to bioinformatics blogs and similar resources.

>Bioinformatics Adventures in Unix:

>Genomic services lab on hudsonAlpha
>Annotating a SAM file: the best method, this is what I want to do 😛
>Extracting regions from .bam file using a .gff or .gtf file
>Question: Samtools or Bedtools: How to filter a bam file with a bed file using strand information
>DNA functional data miner


Almost any bioinformatics process involves an alignment tool. The best known is BLAST. For my work, though, BLAT is better: it gives me the chromosome as output, so I don’t need to write extra code to look up the sequence ID on GenBank.

But what are the differences between these two alignment tools, apart from the S in their name?

First of all, the algorithms are structured differently. On DNA, BLAT works by keeping an index of an entire genome in memory. Thus, the target database of BLAT is not a set of GenBank sequences, but an index derived from the assembly of the entire genome. By default, the index consists of all non-overlapping 11-mers except for those heavily involved in repeats, and it uses less than a gigabyte of RAM. This smaller size means that BLAT is far more easily mirrored than BLAST. DNA BLAT is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments.
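To make the indexing idea concrete, here is a minimal Python sketch of a non-overlapping k-mer index and a lookup against it. It is only an illustration of the principle, not BLAT's actual implementation: the function names `build_index` and `find_hits` are made up for this example, the toy sequences use k=4 instead of 11 to stay readable, and the filtering of repeat-heavy k-mers is left out.

```python
from collections import defaultdict


def build_index(genome, k=11):
    """Map each non-overlapping k-mer of the target to its start positions."""
    index = defaultdict(list)
    # Step by k, so consecutive indexed k-mers do not overlap.
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index


def find_hits(index, query, k=11):
    """Look up every (overlapping) k-mer of the query in the target index.

    Returns a list of (query_position, target_position) seed hits.
    """
    hits = []
    for q_pos in range(len(query) - k + 1):
        kmer = query[q_pos:q_pos + k]
        for t_pos in index.get(kmer, ()):
            hits.append((q_pos, t_pos))
    return hits


# Toy example with k=4 (hypothetical sequences, for illustration only).
toy_genome = "ACGTTGCAAGGCTTAA"
toy_query = "GTTGCAAG"
toy_index = build_index(toy_genome, k=4)
toy_hits = find_hits(toy_index, toy_query, k=4)
```

The target is indexed once with non-overlapping k-mers, which keeps the index small, while the query is scanned at every position so that a match is not missed because of its offset. A real aligner would then extend and chain these seed hits into alignments.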

On proteins, BLAT uses 4-mers rather than 11-mers, finding protein sequences of 80% and greater similarity to the query of length 20+ amino acids. The protein index requires slightly more than 2 GB of RAM. In practice, due to sequence divergence rates over evolutionary time, DNA BLAT works well within humans and primates, while protein BLAT continues to find good matches within terrestrial vertebrates and even earlier organisms for conserved proteins. Within humans, protein BLAT gives a much better picture of gene families (paralogs) than DNA BLAT. However, BLAST and psi-BLAST at NCBI can find much more remote matches.

From a practical standpoint, BLAT has several advantages over BLAST:

  1. speed (no queues, response in seconds) at the price of lesser homology depth
  2. the ability to submit a long list of simultaneous queries in FASTA format.
  3. five convenient output sort options
  4. a direct link into the UCSC browser
  5. alignment block details in natural genomic order
  6. an option to launch the alignment later as part of custom track

BLAT is commonly used to look up the location of a sequence in the genome or determine the exon structure of an mRNA, but expert users can run large batch jobs and make internal parameter sensitivity changes by installing command-line BLAT on their own Linux server.

1. BLAT indexes the genome/protein database, retains the index in memory, and then scans the query sequence for matches. BLAST, on the other hand, builds an index of the query sequences and searches through the database for matches. A BLAST variant called MegaBLAST indexes 4 databases to speed up alignments.
2. BLAT can extend on multiple perfect and near-perfect matches (default is 2 perfect matches of length 11 for nucleotide searches and 3 perfect matches of length 4 for protein searches), while BLAST extends only when one or two matches occur close together.
3. BLAT requires query sequences in FASTA format, while BLAST accepts both FASTA and queries by accession number.
4. BLAT connects each homologous area between two sequences into a single larger alignment, in contrast to BLAST which returns each homologous area as a separate local alignment. The result of BLAST is a list of exons with each alignment extending just past the end of the exon. BLAT, however, correctly places each base of the mRNA onto the genome, using each base only once and can be used to identify intron-exon boundaries (splice sites).
5. BLAT is less sensitive than BLAST.

Source: Wikipedia

In short, BLAT is fast but demands more exact matches. BLAST allows lower-scoring hits and more gaps in alignments, so you will get more hits with BLAST, but it may be slower.
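
Since the chromosome and the block structure are what I want out of BLAT, here is a small Python sketch that reads them from one line of BLAT's tab-separated PSL output (21 columns). The helper names `PslHit`, `parse_psl_line` and `target_gaps` are my own, and the example line is invented for illustration, not real BLAT output. The sketch only handles the simple plus-strand case and does not interpret minus-strand coordinates.

```python
from typing import List, NamedTuple, Tuple


class PslHit(NamedTuple):
    query: str                      # qName, column 10
    target: str                     # tName, column 14 (e.g. the chromosome)
    strand: str                     # column 9
    matches: int                    # column 1
    blocks: List[Tuple[int, int]]   # (target start, block size) per block


def parse_psl_line(line):
    """Parse one PSL data line (not the optional header lines)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    sizes = [int(x) for x in fields[18].split(",") if x]
    t_starts = [int(x) for x in fields[20].split(",") if x]
    return PslHit(
        query=fields[9],
        target=fields[13],
        strand=fields[8],
        matches=int(fields[0]),
        blocks=list(zip(t_starts, sizes)),
    )


def target_gaps(hit):
    """Return (start, end) of the target gaps between consecutive blocks.

    For an mRNA aligned to a genome these gaps are intron candidates.
    """
    gaps = []
    for (start, size), (next_start, _) in zip(hit.blocks, hit.blocks[1:]):
        gaps.append((start + size, next_start))
    return gaps


# Invented example: a 200-base query aligned to chr7 in two blocks.
example_line = "\t".join([
    "200", "0", "0", "0", "0", "0", "1", "500", "+",
    "mRNA1", "200", "0", "200",
    "chr7", "159345973", "1000", "1700",
    "2", "120,80,", "0,120,", "1000,1620,",
])
example_hit = parse_psl_line(example_line)
example_gaps = target_gaps(example_hit)
```

In this made-up example the hit lands on `chr7` as two blocks, and the single gap between them is what would correspond to an intron if the query were an mRNA.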


An easy programming language that is widely used in Bioinformatics.

Let’s start with some links about it before we start learning.

Favorite links, Microsoft coding.

One of the things I hate is having a long list of favorites in my browser, along with browser history and cookies (the browser ones, not the ones I dip in my coffee). That’s why I will add some links here to keep them in mind.

In this article I will store some Programming Links.

  1. Microsoft App Studio
  2. TouchDevelop
  3. Hour of Code

A BAM to FASTA Odyssey

For the bioinformatics work I had to do in the research center, I had to work with FASTA files, but the lab had an Illumina machine whose output came as BAM files. So the question was clear: «how do I convert a BAM file to FASTA». If you search for that on the internet, you will realize that it is one of the most common problems/questions among people who work in bioinformatics.

So I started looking around. I thought of using R/Bioconductor to do it, but that doesn’t seem like a good idea, because R/Bioconductor is developed for statistical research on genomic data: you have the file, you open it, and R/Bioconductor helps you view the data and produce statistics and graphs. Though I kinda found a way, didn’t manage to do something but here it is.
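
For comparison, and separate from the R route mentioned above, here is a minimal Python sketch of the conversion idea. It works on SAM text, which is the text form of BAM (for example what `samtools view` prints), and turns each read into a FASTA record. The function names are my own, the demo records are invented, and the choices to skip secondary/supplementary alignments and to add `/1` and `/2` suffixes for paired reads are assumptions for this example. samtools also has its own `samtools fasta` subcommand for this job.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sam_to_fasta(sam_lines):
    """Yield FASTA records (as text) from an iterable of SAM lines."""
    for line in sam_lines:
        if line.startswith("@"):        # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if seq == "*":                  # sequence not stored
            continue
        if flag & 0x900:                # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x10:
            # Read aligned to the reverse strand: SAM stores it
            # reverse-complemented, so flip it back to the original read.
            seq = reverse_complement(seq)
        if flag & 0x40:                 # first read of a pair
            name += "/1"
        elif flag & 0x80:               # second read of a pair
            name += "/2"
        yield f">{name}\n{seq}\n"


# Invented demo records (tab-separated SAM fields).
demo_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTTGCA\tIIIIIIII",
    "read2\t16\tchr1\t200\t60\t8M\t*\t0\t0\tAAACCGGT\tIIIIIIII",
]
demo_fasta = "".join(sam_to_fasta(demo_sam))
```

To use it on a real file, one option is to feed it the lines of a SAM file opened with `open(...)` and write the yielded records to an output file.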


Hello gnosis

As in most blogs, the first article explains why the blog was created. Well, I won’t reinvent the wheel: the reason for this blog is to keep track of the things I learn, mostly about science, as this is my field of work.

Let’s start with the domain of the blog. The general idea is that knowledge is chaotic: you can never learn too much and, most importantly, you can’t keep the knowledge you gain nice and tidy in your head. That’s why I picked the name «khaognosis» (from chaos [chaos was already in use :P] and gnosis). So I will try to keep the knowledge I gain in here, at least a portion of it, and also to learn about blogging, as this is my first attempt at making a blog.

In this blog, the articles will be mostly in English and/or Greek. The topics will be bioinformatics, biology, physics, programming and probably anything that has to do with knowledge. I am also making this blog as a personal workblog. I don’t expect people to find it great, or even to read it, but I am creating it to help myself and anyone else who finds it useful or charming to learn or maintain knowledge.

Good Start.