#PANGEA+

A new implementation of the PANGEA pipeline for faster and more accurate metagenomics, with multiple classification methods and consensus analysis.


#Download

(LINUX):

    wget https://github.com/Bioinfo-Tools/PANGEA-plus/tarball/master -O BioinfoTools_PANGEA-plus.tar.gz

(MAC):

    curl -L https://github.com/Bioinfo-Tools/PANGEA-plus/tarball/master -o BioinfoTools_PANGEA-plus.tar.gz

#Extract the files:

    tar -xvf BioinfoTools_PANGEA-plus.tar.gz


Set your working directory to the PANGEA-plus directory:

    cd BioinfoTools_PANGEA-plus
    export PANGEAWD=$PWD
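
Before continuing, it can help to confirm that `PANGEAWD` points at the extracted tree. The following is a minimal sketch and is not part of PANGEA+: the function name `check_pangea_layout` is hypothetical, and the directory list is simply collected from the paths used later in this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PANGEA+): report which of the
# directories referenced in this README are missing under a given root.
check_pangea_layout() {
    root=$1
    missing=0
    for sub in Trim Classify/Runblast Classify/RunRDP Classify/Runsoap \
               Tax_class Consensus Megaclust Megaclustable Scripts; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected directory exists.
    [ "$missing" -eq 0 ]
}
```

For example: `check_pangea_layout "$PANGEAWD" || echo "PANGEAWD does not look like a PANGEA-plus tree"`.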

#Install parallel BLAST (for High Performance Computing clusters)

    cd $PANGEAWD/Classify/Runblast
    sh install_MPI_blast.sh

#Trimming your input sequences

    cd $PANGEAWD/Trim
    perl trim2.3.pl -a ../input_A.txt -b ../input_B.txt -g 100

where: perl trim2.3.pl ...
	-a raw Illumina input file, read 1
	-b raw Illumina input file, read 2 (if any)
	-g size of the gap between paired ends (if any)
	-t truncate size (if any)
	-q quality file (for FASTA input)
	-qc quality cutoff value
	-lc minimum length

Supported formats: FASTA, FASTQ and QSEQ.

Results will be saved in the $PANGEAWD/output/trim2 folder.
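
To see how many reads survive trimming, the record counts of the raw and trimmed files can be compared. This is a minimal sketch, not part of PANGEA+: the helper names `count_fasta` and `count_fastq` are hypothetical, and the FASTQ counter assumes the common four-lines-per-record layout.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with PANGEA+).

# Count FASTA records: every record starts with a '>' header line.
count_fasta() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Count FASTQ records, assuming exactly four lines per record
# (header, sequence, '+', qualities) with no wrapped sequences.
count_fastq() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

For example, run `count_fastq ../input_A.txt` on the raw reads and the matching counter on the file written to the trim2 folder.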

#Download / Install Blast

    cd $PANGEAWD/Classify/Runblast
    sh install_blast.sh

#Download NCBI database for classification

    cd $PANGEAWD
    wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz
    gunzip nt.gz

#Format the database

    $PANGEAWD/Classify/Runblast/makeblastdb -in $PANGEAWD/nt -out $PANGEAWD/nt -dbtype nucl

#Classify your sequences using parallel BLAST search

    cd $PANGEAWD/Classify/Runblast

Example of parallel BLAST (MPI-blastn) executed in a PBS/Torque/Maui HPC cluster:

Use the example submission script available in the $PANGEAWD/Scripts directory.

    *EDIT THE FILE submit_MPI-blast.job FIRST!

    qsub $PANGEAWD/Scripts/submit_MPI-blast.job

where: 	input.fasta refers to your sequences after trimming.

To run parallel BLAST on multiple input files at the same time:

    *EDIT THE FILE submit_multiple_MPI-blast.job FIRST! Follow the instructions in the file.

    *Replace ./dir/ with your input directory and change the values of these parameters before running: "database="; "total_processes="; "nodes="
    
    for i in ./dir/*.fasta; do qsub -v in="$i",out="$i.txt",database=database_name,nodes=4,total_processes=16 submit_multiple_MPI-blast.job; done

where: 
./dir/ is your input sequences directory
nodes= is the number of requested nodes
total_processes= is the total number of processes requested
database= is the name of database
The output files will have the same names as your inputs, with a .txt suffix appended.
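
Before submitting many jobs, it may be worth printing the commands first to check the variable list. The sketch below is a dry run only and is not part of PANGEA+: the function name `print_mpiblast_jobs` is hypothetical, while the variable names (`in`, `out`, `database`, `nodes`, `total_processes`) are the ones used in the loop above.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not shipped with PANGEA+): print one qsub
# command per FASTA file instead of submitting it.
print_mpiblast_jobs() {
    dir=$1; database=$2; nodes=$3; total=$4
    for f in "$dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no FASTA files.
        [ -e "$f" ] || continue
        echo "qsub -v in=$f,out=$f.txt,database=$database,nodes=$nodes,total_processes=$total submit_multiple_MPI-blast.job"
    done
}
```

For example: `print_mpiblast_jobs ./dir database_name 4 16`. Once the printed lines look right, they can be run as they are or piped to `sh`.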

Example using your own blastn installation:

    export PATH=$PANGEAWD/Classify/Runblast:$PATH
    blastn -query input.fasta -db database.formated -outfmt 6 -out blast_output.txt 
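
The `-outfmt 6` output is a tab-separated table with one line per hit, so a query can appear many times. If only the strongest hit per query is wanted for inspection, something like the following could be used. This is a sketch under assumptions, not a step of the PANGEA+ pipeline: the function name `best_hits` is hypothetical, and it assumes the default 12-column layout, where column 1 is the query ID and column 12 is the bit score.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PANGEA+): keep the hit with the
# highest bit score for each query in BLAST tabular output (-outfmt 6).
best_hits() {
    awk -F '\t' '
        # Remember each query ID in the order it first appears.
        !($1 in score) { order[++n] = $1 }
        # Store the line when the query is new or this hit scores higher.
        !($1 in score) || $12 + 0 > score[$1] {
            score[$1] = $12 + 0
            line[$1] = $0
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$1"
}
```

For example: `best_hits blast_output.txt > blast_best_hits.txt` (the output file name here is only illustrative).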

#Parse the taxonomic classification based on the NCBI taxonomy databases

Running NCBI-taxcollector:

    cd $PANGEAWD/Tax_class
    make all
    wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
    wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.tar.gz
    tar -xvf taxdump.tar.gz
    tar -xvf gi_taxid_nucl.dmp.tar.gz
    ./tax_class -c
    perl NCBI-taxcollector-0.01.pl -f $PANGEAWD/parallel_output.txt -o $PANGEAWD/parallel_output_class.txt > report.txt

where: 	parallel_output.txt contains the mpiblastn results, and
parallel_output_class.txt is the parsed and classified output generated by this program.


#Classify your sequences using RDP Classifier search

    cd $PANGEAWD/Classify/RunRDP/
    sh install_RDPClassifier.sh
    java -Xmx1g -jar rdp_classifier-2.5.jar -q $PANGEAWD/input_trimmed.txt -o output_rdp.txt
    
Where:  -q refers to the query file.
   	-o refers to the output file.
   	
   	
#Classify your sequences using parallel SOAP Aligner search

Format your database:

     cd $PANGEAWD/Classify/Runsoap/soap2.21release
     ./2bwt-builder $PANGEAWD/database.fasta

Run sequence search:     

     ./soap -a $PANGEAWD/input_trimmed.fasta -D $PANGEAWD/database.fasta.index -o $PANGEAWD/output_soap.txt -p 8 -M 4

Where:  -D   Prefix name for reference index [*.index].
	-a   Query file, for SE reads alignment or one end of PE reads.
	-b   Query b file, one end of PE reads.
	-o   Output file
	-p   Number of threads
	-M   INT   Match mode for each read or the seed part of read,  which
	shouldn't contain more than 2 mismatches, [4]
	0: exact match only
 	1: 1 mismatch match only
  	2: 2 mismatch match only
  	3: [gap] (coming soon)
  	4: find the best hits

#Run Consensus Analysis

    cd $PANGEAWD/Consensus
    perl Consensus_BLAST_SOAP_RDP-1.1.pl -b output_blast_class.txt -r output_rdp.txt -o output_consensus.txt

Where:
	-b Classification results (Blast) parsed by NCBI-taxcollector
	-r Classification results (RDP)
	-s Classification results (SOAP2)
	-o Output file (txt)

The output should look like this:

         S000008953	[0]Bacteria;[1]Firmicutes;[2]Bacilli;[3]Bacillales;[4]Bacillaceae;[5]Bacillus;[6]Bacillus_sp._8A18S6;		92.61	1435	81	23	29	1452	1	1421	0.0	2039
         #Matches found: 4
         S000010870	[0]Bacteria;[1]Firmicutes;[2]Bacilli;[3]Bacillales;[4]Bacillaceae;[5]Bacillus;[6]Bacillus_sp._8A18S6;		91.78	1435	90	26	49	1469	1	1421	0.0	1971
         #Matches found: 4
         S000014058	[0]Bacteria;[1]Firmicutes;[2]Bacilli;[3]Bacillales;[4]Bacillaceae;[5]Bacillus;[6]Bacillus_sp._8A18S6;		92.20	1435	88	22	29	1453	1	1421	0.0	2008
         #Matches found: 4
         S000016099	[0]Bacteria;[1]Firmicutes;[2]Bacilli;[3]Bacillales;[4]Bacillaceae;[5]Bacillus;[6]Bacillus_sp._8A18S6;		91.66	1438	86	29	49	1469	1	1421	0.0	1960
         #Matches found: 4
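
The bracketed numbers in the taxonomy column mark the rank of each name, which makes quick tallies possible. The following is a minimal sketch for inspecting the file, not part of PANGEA+: the function name `tally_rank` is hypothetical, and it assumes the fields are tab-separated with the taxonomy string in the second column, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PANGEA+): count sequences per
# taxon at one rank of the consensus output, e.g. rank 5 for "[5]" names.
tally_rank() {
    rank=$1; file=$2
    awk -F '\t' -v tag="[$rank]" '
        # Skip the "#Matches found" lines.
        /^[ \t]*#/ { next }
        NF >= 2 {
            n = split($2, parts, ";")
            for (i = 1; i <= n; i++)
                # A name belongs to the rank when it starts with the tag.
                if (index(parts[i], tag) == 1)
                    count[substr(parts[i], length(tag) + 1)]++
        }
        END { for (name in count) print count[name] "\t" name }
    ' "$file" | sort -k1,1nr -k2,2
}
```

For example, `tally_rank 5 output_consensus.txt` lists each `[5]` name with its count, most frequent first. On the four records shown above, all counts would fall under Bacillus.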


#Cluster your results by identity:

Example for 80% identity*:

    perl $PANGEAWD/Megaclust/megaclust2.pl -i $PANGEAWD/output_consensus.txt -o $PANGEAWD/output_consensus.megaclust_80_hits.txt -b 100 -s 80 -e 1e-20

*More examples and automatic scripts at $PANGEAWD/Scripts


#Generate summary table for classified results:

Example for the Domain level (80% similarity)*:

    perl $PANGEAWD/Megaclustable/megaclustable.pl -m $PANGEAWD/output_consensus.megaclust_80_hits.txt -t 0 -o $PANGEAWD/results/megaclustable/DomainTable.txt

*Note: in the -m option you must list all the output files generated by the megaclust execution for every sample. More examples and automatic scripts are available in $PANGEAWD/Scripts.

The classification output should be like this:

    		1	2	3	4	5	6	7	8	9	10
    Bacteria	479	4	32	7507	11977	13245	2129	11222	539	2411	
    Eukaryota	1	4	5	5	2	17	78	3	10	3	
    Archaea		1	0	0	0	0	0	0	0	0	1		
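
A useful check on such a table is the total number of classified sequences per sample. The sketch below is not part of PANGEA+: the function name `sample_totals` is hypothetical, and it assumes a whitespace-separated table whose first row holds the sample labels and whose taxon names contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PANGEA+): sum the counts in each
# sample column of a megaclustable-style table.
sample_totals() {
    awk '
        # First row: sample labels only (no taxon name in front).
        NR == 1 { for (i = 1; i <= NF; i++) label[i] = $i; ncol = NF; next }
        # Other rows: taxon name followed by one count per sample.
        NF > 1  { for (i = 2; i <= NF; i++) total[i - 1] += $i }
        END     { for (i = 1; i <= ncol; i++) print label[i] "\t" total[i] + 0 }
    ' "$1"
}
```

For the table above, sample 1 would sum to 479 + 1 + 1 = 481.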
