PAVED: A software suite for the analysis of epigenome-derived next-generation sequencing data
Pipeline to Extract Annotated Valleys

This pipeline walks you through the steps to extract valleys from MNase-Seq-like data. To use it, you need a sequenced control dataset as well as an experimental dataset.

Prerequisites

1) Align the fastq files to the genome of interest using the alignment algorithm of your choice (for example BWA, Bowtie, or Novoalign).
2) Convert the alignments to binary alignment map (BAM) format and sort them by genomic position using Samtools.

Pipeline

Step 1: Find the median insert size for each of your datasets using the script findInsertSizeStatistics. Use the output files to decide on the minimum and maximum tolerable insert sizes. For single-end reads, skip this step.

Step 2: Using the filterBAMbyInsertSize utility, construct the fragments whose insert sizes fall within the limits chosen in Step 1, and filter out the rest of the reads. For single-end reads, skip this step.

Step 3: Find the fragment depth for each of the datasets using the findFragmentDepth utility. For single-end reads, use the findReadDepth utility instead.

Step 4: Find the average fragment depth (or read depth) for the control and experimental datasets using the utility data4HistogramDepth.

Step 5: Using the normalizeDepthFile utility, normalize the depth values in the files from Step 3 by a factor based on the values found in Step 4.

Step 6: Find the fold change values between the normalized control and experimental data using the foldChangeReadDepth utility.

Step 7: Find the valleys using the utility findValleys. The thresholds can be inferred from observations made with the data4Histogram and trackNRest utilities.

Step 8: Find the annotations for the valleys using the utility findAnnotations.
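To make the logic of Steps 5 to 7 more concrete, the following is a minimal Python sketch of the general idea: scale each depth track by its average depth, take the per-position ratio of experiment to control, and report runs of positions where that ratio stays below a threshold. It does not reproduce the PAVED utilities normalizeDepthFile, foldChangeReadDepth, or findValleys, whose exact algorithms, inputs, and options are not described here. The function names, the pseudocount, the threshold, the minimum width, and the toy depth values are all assumptions made for illustration.

```python
from statistics import mean


def normalize_depth(depth, scale):
    """Divide every per-position depth by a scale factor (assumed: the mean depth)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [d / scale for d in depth]


def fold_change(experiment, control, pseudocount=1e-9):
    """Per-position ratio experiment/control; the pseudocount avoids division by zero."""
    if len(experiment) != len(control):
        raise ValueError("tracks must cover the same positions")
    return [(e + pseudocount) / (c + pseudocount)
            for e, c in zip(experiment, control)]


def find_valleys(values, threshold, min_width=1):
    """Return half-open (start, end) index intervals where values stay below threshold."""
    valleys = []
    start = None
    for i, v in enumerate(values):
        if v < threshold:
            if start is None:
                start = i          # a new run below the threshold begins
        elif start is not None:
            if i - start >= min_width:
                valleys.append((start, i))
            start = None           # the run has ended
    # Close a run that extends to the end of the track.
    if start is not None and len(values) - start >= min_width:
        valleys.append((start, len(values)))
    return valleys


# Toy per-position depths (hypothetical values, not real data).
control = [10, 10, 10, 10, 10, 10, 10, 10]
experiment = [20, 20, 2, 2, 2, 20, 20, 20]

norm_control = normalize_depth(control, mean(control))
norm_experiment = normalize_depth(experiment, mean(experiment))
fc = fold_change(norm_experiment, norm_control)
valleys = find_valleys(fc, threshold=0.5, min_width=2)
print(valleys)  # [(2, 5)]
```

In this toy example the experimental track dips at positions 2 to 4 while the control stays flat, so the normalized fold change falls below 0.5 only there. With real data, the threshold and minimum width would be chosen from the observed distribution of fold change values, as Step 7 suggests.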