PAVED: A software suite for the analysis of epigenome-derived next-generation sequencing data
Pipeline to Extract Annotated Peaks

This pipeline walks you through the steps to extract peaks from FAIRE-Seq-like data. To use this pipeline, you need a sequenced control dataset as well as an experimental dataset.

Prerequisites

1) Align the FASTQ files to the genome of interest using the alignment algorithm of your choice (BWA, Bowtie, or Novoalign).
2) Convert the alignments to binary alignment map (BAM) format and sort them by genomic position using Samtools.

Pipeline

Step 1: Find the median insert size for each of your datasets using the script findInsertSizeStatistics. From the output files generated, decide on the minimum and maximum tolerable insert sizes. For single-end reads, skip this step.

Step 2: Using the filterBAMbyInsertSize utility, construct the fragments that fall within the insert size limits found in Step 1 and filter out the rest of the reads. For single-end reads, skip this step.

Step 3: Find the fragment depth for each of the datasets using the findFragmentDepth utility. For single-end reads, use the findReadDepth utility instead.

Step 4: Find the average fragment depth (or read depth) for the control and experimental datasets using the data4HistogramDepth utility.

Step 5: Using the normalizeDepthFile utility, normalize the depth values in the files from Step 3 by a factor based on the values found in Step 4.

Step 6: Find the fold change values between the normalized control and experimental data using the foldChangeReadDepth utility.

Step 7: Find the peaks using the findPeaks utility. The thresholds can be inferred from observations made with the data4Histogram and trackNRest utilities.

Step 8: Find the annotations for the peaks using the findAnnotations utility.
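The arithmetic behind Steps 4 to 7 can be illustrated with a small Python sketch. This is not the PAVED implementation and does not call the PAVED utilities. The function names, the choice to scale each dataset to a mean depth of 1, the pseudocount, and the threshold values are all assumptions made for illustration; the actual normalization factor and peak criteria used by normalizeDepthFile and findPeaks may differ.

```python
from statistics import mean


def normalize(depths, target_mean=1.0):
    """Scale per-position depths so their average equals target_mean (assumed scheme)."""
    avg = mean(depths)
    if avg == 0:
        raise ValueError("average depth is zero; cannot normalize")
    return [d * target_mean / avg for d in depths]


def fold_change(experiment, control, pseudocount=0.5):
    """Per-position ratio of experiment to control.

    The pseudocount (an assumption) avoids division by zero where the
    control has no coverage.
    """
    if len(experiment) != len(control):
        raise ValueError("depth vectors must have the same length")
    return [(e + pseudocount) / (c + pseudocount)
            for e, c in zip(experiment, control)]


def find_peaks(values, threshold, min_width=1):
    """Return half-open (start, end) runs where values stay >= threshold."""
    peaks = []
    start = None
    for i, v in enumerate(values):
        if v >= threshold:
            if start is None:
                start = i          # a new candidate run begins here
        elif start is not None:
            if i - start >= min_width:
                peaks.append((start, i))
            start = None
    # Close a run that extends to the end of the region.
    if start is not None and len(values) - start >= min_width:
        peaks.append((start, len(values)))
    return peaks


# Toy depth vectors for one short region (made-up numbers).
control_depth = [2, 2, 2, 2, 2, 2, 2, 2]
experiment_depth = [1, 1, 6, 8, 6, 1, 1, 0]

fc = fold_change(normalize(experiment_depth), normalize(control_depth))
print(find_peaks(fc, threshold=1.5, min_width=2))  # prints [(2, 5)]
```

Scaling both datasets to the same mean before taking the ratio keeps a difference in sequencing depth between control and experiment from being mistaken for enrichment. In practice the threshold and minimum width would be chosen after inspecting the fold change distribution, as Step 7 suggests.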