PAVED: A software suite for the analysis of epigenome-derived next-generation sequencing data

Filter BAM by insert size

The filterBAMbyInsertSize utility takes a BAM file as input and filters it so that only fragments whose insert size falls within user-specified limits are retained. Reads produced by next-generation sequencing are short, typically on the order of 100 bp. Because a short read carries little sequence information, it may align equally well to several loci across the genome, and the aligner has to decide where to place it. This ambiguity allows random alignment errors, which can be corrected using the alignment information of the read's mate in the pair. The utility keeps only those fragments whose insert size lies within the specified range, thereby removing noise that may have been introduced during alignment.
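The filtering idea described above can be illustrated with a minimal Python sketch that works on plain-text SAM lines rather than BAM. It treats the absolute value of the SAM TLEN column (the ninth mandatory field) as the insert size. The function name `filter_by_insert_size`, the inclusive bounds, and the decision to drop records with TLEN = 0 (unpaired reads or mates on different chromosomes) are all assumptions made for illustration; PAVED's own criteria are not documented here and may differ.

```python
from typing import Iterable, Iterator


def filter_by_insert_size(sam_lines: Iterable[str],
                          min_size: int,
                          max_size: int) -> Iterator[str]:
    """Yield SAM header lines plus alignments whose |TLEN| is in [min_size, max_size].

    Hypothetical helper, not part of PAVED. Assumptions: bounds are inclusive,
    and records with TLEN == 0 (no usable mate information) are discarded.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines (@HD, @SQ, ...) are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        insert_size = abs(int(fields[8]))  # TLEN: negative for the downstream mate
        if insert_size != 0 and min_size <= insert_size <= max_size:
            yield line


if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6\tSO:coordinate",
        "pairA\t99\tchr5\t1000\t60\t100M\t=\t1050\t150\t*\t*",
        "pairA\t147\tchr5\t1050\t60\t100M\t=\t1000\t-150\t*\t*",
        "pairB\t99\tchr5\t2000\t60\t100M\t=\t2400\t500\t*\t*",
    ]
    kept = list(filter_by_insert_size(demo, 104, 328))
    print(len(kept))  # header + both pairA mates -> 3
```

Taking the absolute value keeps both mates of a pair together, since the SAM format reports the same insert size with opposite signs on the two mates.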

Prerequisites

1) Align the paired-end FASTQ files to the genome of interest using the alignment algorithm of your choice (BWA, Bowtie or Novoalign).
2) Convert the alignments to BAM (Binary Alignment/Map) format and sort them using SAMtools.
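As one possible way to carry out these two prerequisites, the Python sketch below pipes BWA-MEM output straight into `samtools sort`. BWA is just one of the aligners named above; the choice of `bwa mem`, the helper names `build_commands` and `align_and_sort`, and all file names are hypothetical and not part of PAVED. It assumes `bwa` and `samtools` are on the PATH and that the reference has already been indexed with `bwa index`.

```python
import shutil
import subprocess
from typing import List, Tuple


def build_commands(ref_index: str, fastq1: str, fastq2: str,
                   out_bam: str) -> Tuple[List[str], List[str]]:
    """Return argv lists for paired-end alignment and coordinate sorting (hypothetical helper)."""
    align = ["bwa", "mem", ref_index, fastq1, fastq2]  # bwa mem writes SAM to stdout
    sort = ["samtools", "sort", "-o", out_bam, "-"]    # "-" makes samtools read from stdin
    return align, sort


def align_and_sort(ref_index: str, fastq1: str, fastq2: str, out_bam: str) -> None:
    """Run 'bwa mem ... | samtools sort -o out.bam -' (assumes both tools are installed)."""
    for tool in ("bwa", "samtools"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    align, sort = build_commands(ref_index, fastq1, fastq2, out_bam)
    p_align = subprocess.Popen(align, stdout=subprocess.PIPE)
    p_sort = subprocess.Popen(sort, stdin=p_align.stdout)
    p_align.stdout.close()  # lets bwa receive SIGPIPE if samtools exits early
    sort_rc = p_sort.wait()
    align_rc = p_align.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"pipeline failed (bwa={align_rc}, samtools={sort_rc})")
```

Streaming the SAM output directly into `samtools sort` avoids writing an intermediate, uncompressed SAM file; the sorted BAM it produces is the kind of input the `-i` option expects.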

How to run it?

Run java -jar PAVED.jar filterBAMbyInsertSize -h to see the list of available options.

Run the utility as follows:

java -jar C:\Britta\manuscript\Analysis\PAVED.jar filterBAMbyInsertSize -i "C:\Britta\manuscript\Analysis\data\BAMFile\ControlRep1Chr5.bam" -o "C:\Britta\manuscript\Analysis\data\BAMFilesInsertSize\ControlRep1Chr5.bam" -m 104 -n 328

Here, "C:\Britta\manuscript\Analysis\" is the directory on the local disk that contains PAVED.jar; -i specifies the input sorted BAM file, -o the output filtered BAM file, -m the minimum insert size and -n the maximum insert size.
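When several BAM files (for example, multiple replicates) need the same filtering, it can help to script the call. The Python sketch below builds the documented command using only the -i, -o, -m and -n options described above. The helper names `paved_filter_command` and `run_paved_filter`, the sanity check that the minimum does not exceed the maximum, and the creation of the output directory are assumptions added for illustration; whether PAVED itself validates these values or creates missing directories is not documented here.

```python
import subprocess
from pathlib import Path
from typing import List


def paved_filter_command(jar: str, in_bam: str, out_bam: str,
                         min_insert: int, max_insert: int) -> List[str]:
    """Build the filterBAMbyInsertSize command line (hypothetical helper)."""
    # Assumed sanity check; PAVED's own handling of invalid bounds is unknown.
    if min_insert < 0 or max_insert < min_insert:
        raise ValueError("require 0 <= min_insert <= max_insert")
    return ["java", "-jar", jar, "filterBAMbyInsertSize",
            "-i", in_bam, "-o", out_bam,
            "-m", str(min_insert), "-n", str(max_insert)]


def run_paved_filter(jar: str, in_bam: str, out_bam: str,
                     min_insert: int, max_insert: int) -> None:
    """Run the filter, raising CalledProcessError if Java exits with an error."""
    cmd = paved_filter_command(jar, in_bam, out_bam, min_insert, max_insert)
    # Precaution: create the output directory in case PAVED does not.
    Path(out_bam).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string means paths containing spaces (common on Windows) need no extra quoting.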