Wednesday, April 7, 2010

FASTX-Toolkit

The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.

Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information).

The main processing of such FASTA/FASTQ files is mapping (also known as aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ, and many others.

However, it is sometimes more productive to preprocess the FASTA/FASTQ files before mapping the sequences to the genome, manipulating the sequences to produce better mapping results.

The FASTX-Toolkit tools perform some of these preprocessing tasks.

Available Tools

  • FASTQ-to-FASTA converter: Convert FASTQ files to FASTA files.
  • FASTQ Information: Chart Quality Statistics and Nucleotide Distribution
  • FASTQ/A Collapser: Collapsing identical sequences in a FASTQ/A file into a single sequence (while maintaining reads counts)
  • FASTQ/A Trimmer: Shortening reads in a FASTQ or FASTA file (removing barcodes or noise).
  • FASTQ/A Renamer: Renames the sequence identifiers in a FASTQ/A file.
  • FASTQ/A Clipper: Removing sequencing adapters / linkers
  • FASTQ/A Reverse-Complement: Producing the Reverse-complement of each sequence in a FASTQ/FASTA file.
  • FASTQ/A Barcode splitter: Splitting a FASTQ/FASTA file containing multiple samples
  • FASTA Formatter: Changes the width of sequence lines in a FASTA file
  • FASTA Nucleotide Changer: Converts FASTA sequences from/to RNA/DNA
  • FASTQ Quality Filter: Filters sequences based on quality
  • FASTQ Quality Trimmer: Trims (cuts) sequences based on quality
  • FASTQ Masker: Masks nucleotides with 'N' (or other character) based on quality

These tools can be used in two forms:

  • Web-based (with Galaxy): Galaxy's Test website already contains some of the FASTX-toolkit tools.
  • Command-line: running the tools from command line (or as part of a script).

For fastx_quality_stats use with an ASCII 33 offset:

    There is an undocumented argument "-Q" that determines the input quality ASCII offset (33, 64, or other). Every program in the fastx-toolkit understands numeric quality values and automatically detects ASCII-encoded ones (subtracting 64 from the ASCII value). If you need a different ASCII offset (e.g. 33 instead of 64), you can use the (undocumented) argument "-Q", like so:

    $ fastx_quality_stats -Q 33 -i INPUT.txt -o OUTPUT.txt

    -Gordon

    (this Post content was reproduced from: http://hannonlab.cshl.edu/fastx_toolkit/index.html)
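
To illustrate what the ASCII offset in the note above means, here is a minimal Python sketch (not part of the FASTX-Toolkit, and not taken from the original page). It assumes each character of a FASTQ quality line encodes one score as its ASCII code minus the offset. The function name decode_quality and the sample quality strings are hypothetical, chosen only for illustration.

```python
from typing import List


def decode_quality(quality_line: str, offset: int = 64) -> List[int]:
    """Turn an ASCII-encoded quality line into numeric scores.

    Hypothetical helper for illustration only: each score is the
    character's ASCII code minus the given offset (64 or 33 here).
    """
    scores = [ord(ch) - offset for ch in quality_line]
    # A negative score suggests the wrong offset was assumed for this file.
    if any(score < 0 for score in scores):
        raise ValueError(
            "negative quality score; offset %d is probably wrong" % offset
        )
    return scores


if __name__ == "__main__":
    # 'I' is ASCII 73, so with offset 33 each score is 40.
    print(decode_quality("IIII", offset=33))  # [40, 40, 40, 40]
    # 'h' is ASCII 104, so with offset 64 each score is also 40.
    print(decode_quality("hhhh", offset=64))  # [40, 40, 40, 40]
```

Reading an offset-33 file with offset 64 can produce negative values (for example '5' is ASCII 53, giving -11), which is the kind of mismatch that passing "-Q 33" is meant to avoid.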