
PAVED: A software suite for the analysis of epigenome-derived next-generation sequencing data


Pipeline to Extract Annotated Peaks

This pipeline walks you through the steps to extract peaks from FAIRE-Seq-like data. To use it, you need a sequenced control dataset as well as an experimental dataset.

Prerequisites

1) Align the FASTQ files to the genome of interest using the alignment algorithm of your choice (for example BWA, Bowtie, or Novoalign).
2) Convert the alignments to binary alignment map (BAM) format and sort them by genomic position using Samtools.
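
Prerequisite 2 can be sketched as below. This is only an illustration, not part of PAVED: the helper names `samtools_sort_command` and `run_command` and the `.sorted.bam` output naming are assumptions made for this example. It relies on `samtools sort`, which sorts by genomic coordinate by default and writes BAM when given `-O bam`, so conversion and sorting happen in one call.

```python
import subprocess
from pathlib import Path


def samtools_sort_command(sam_path):
    """Build the argument list that converts a SAM file to a
    coordinate-sorted BAM file with `samtools sort`.

    The output name (<input stem>.sorted.bam) is an assumption of this sketch.
    """
    sam = Path(sam_path)
    sorted_bam = sam.with_suffix(".sorted.bam")
    # -O bam : write BAM output
    # -o     : output file name
    return ["samtools", "sort", "-O", "bam", "-o", str(sorted_bam), str(sam)]


def run_command(command):
    """Run one command and raise CalledProcessError if it fails."""
    subprocess.run(command, check=True)
```

With Samtools installed and on the PATH, `run_command(samtools_sort_command("control.sam"))` would write `control.sorted.bam`; repeat the call for the experimental dataset.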

Pipeline

Step1: Find the median insert size for each of your datasets using the script findInsertSizeStatistics. From the output files it generates, decide on the minimum and maximum tolerable insert sizes. For single-end reads, skip this step.

Step2: Using the filterBAMbyInsertSize utility, construct the fragments whose insert sizes fall within the limits found in Step 1 and filter out the rest of the reads. For single-end reads, skip this step.

Step3: Find the fragment depth for each dataset using the findFragmentDepth utility. For single-end reads, use the findReadDepth utility instead.

Step4: Find the average fragment depth (or read depth) for the control and experimental datasets using the data4HistogramDepth utility.

Step5: Using the normalizeDepthFile utility, normalize the depth values in the files from Step 3 by a factor based on the values found in Step 4.

Step6: Find the fold change values between the normalized control and experimental data using the foldChangeReadDepth utility.

Step7: Find the peaks using the findPeaks utility. The thresholds can be inferred from observations made with the data4Histogram and trackNRest utilities.

Step8: Find the annotations for the peaks using the findAnnotations utility.
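
The idea behind Steps 4 to 7 can be illustrated with a small, self-contained sketch. It is not the PAVED implementation, and the PAVED utilities may work differently. Every function name here is hypothetical, and three choices are assumptions made for illustration: depths are scaled so that each dataset has a mean of 1, a small pseudocount is added to avoid division by zero, and a peak is any run of consecutive positions whose fold change meets a fixed threshold.

```python
from statistics import mean


def normalize_depth(depths):
    """Scale per-position depths so that their mean is 1.0 (assumed scheme)."""
    average = mean(depths)
    if average == 0:
        raise ValueError("cannot normalize a dataset with zero average depth")
    return [d / average for d in depths]


def fold_change(experiment, control, pseudocount=0.1):
    """Per-position ratio of experimental to control depth.

    The pseudocount (an assumption of this sketch) keeps the ratio defined
    where the control depth is zero.
    """
    return [(e + pseudocount) / (c + pseudocount)
            for e, c in zip(experiment, control)]


def call_peaks(values, threshold, min_width=1):
    """Return half-open (start, end) index intervals where values >= threshold."""
    peaks = []
    start = None
    for i, value in enumerate(values):
        if value >= threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None           # the run has ended
    # Close a run that extends to the last position.
    if start is not None and len(values) - start >= min_width:
        peaks.append((start, len(values)))
    return peaks


# Toy per-position depths for one short region (made-up numbers).
control = [2, 2, 2, 2, 2, 2, 2, 2]
experiment = [4, 4, 4, 20, 24, 20, 4, 4]

ratios = fold_change(normalize_depth(experiment), normalize_depth(control))
print(call_peaks(ratios, threshold=1.5))  # [(3, 6)]
```

In this toy region only positions 3 to 5 exceed the threshold after normalization, even though the raw experimental depth is higher than the control everywhere. This is why the depths are normalized before the fold change is computed, and why the threshold is best chosen after inspecting the distribution of values.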