High throughput protein function annotation with Human Readable Descriptions (HRDs) and Gene Ontology (GO) Terms.

h1. Automated Assignment of Human Readable Descriptions (AHRD)

Short descriptions in sequence databases are useful to quickly gain insight into important information about a sequence, for example in search results. We developed a new program called "Automated Assignment of Human Readable Descriptions" (AHRD) with the aim to select descriptions and Gene Ontology terms that are concise, informative and precise. AHRD outperforms competing methods and can overcome problems caused by wrong annotations, lack of similar sequences and partial alignments.

h2. Table of contents

# "Getting started":#1-getting-started
## "Requirements":#11-requirements
## "Installation":#12-installation
### "Get AHRD":#121-get-ahrd
### "Build the executable jar":#122-build-the-executable-jar
# "Usage":#2-usage
## "AHRD example usages":#21-ahrd-example-usages
## "Input":#22-input
### "Required input data":#221-required-input-data
### "Optional input data":#222-optional-input-data
### "Required config files":#223-required-config-files
#### "Test custom blacklists and filters":#2231-test-custom-blacklists-and-filters
## "Batcher":#23-batcher
## "Output":#24-output
### "Tab-Delimited Table":#241-tab-delimited-table
### "Fasta-Format":#242-fasta-format
## "AHRD run using BLASTX results":#25-ahrd-run-using-blastx-results
## "Parameter Optimization":#26-parameter-optimization
### "Optimization in parallel (Trainer-Batcher)":#261-optimization-in-parallel-(trainer-batcher)
## "Computing F-Scores for selected parameter sets (AHRD-Evaluator)":#27-computing-f-scores-for-selected-parameter-sets-ahrd-evaluator
# "Algorithm":#3-algorithm
## "Pseudo-Code":#31-pseudo-code
## "Used Formulae and Parameters":#32-used-formulae-and-parameters
## "Parameters":#33-parameters
### "Parameters controlling the parsing of tabular sequence similarity search result tables (legacy BLAST, BLAST+, and BLAT)":#331-parameters-controlling-the-parsing-of-tabular-sequence-similarity-search-result-tables-legacy-blast-blast-and-blat
### "Parameters controlling Gene Ontology term annotations":#332-parameters-controlling-gene-ontology-term-annotations
#### "Prefer reference proteins as candidates that have GO Term annotations":#3320-prefer-reference-proteins-as-candidates-that-have-go-term-annotations
#### "Custom reference Gene Ontology annotations (non UniprotKB GOA)":#3321-custom-reference-gene-ontology-annotations-non-uniprotkb-goa
##### "Custom Gene Ontology Database":#33221-custom-gene-ontology-database
# "Testing":#4-testing
# "License":#5-license
# "Authors":#6-authors
# "References":#7-references

h2. 1 Getting started

h3. 1.1 Requirements

AHRD is a Java program and requires @Java 1.7@ or higher as well as @ant@.

h3. 1.2 Installation

h4. 1.2.1 Get AHRD

Clone AHRD to your computer using git on the command line, change into AHRD's directory, and check out the latest stable version:

<pre>
git clone https://github.com/groupschoof/AHRD.git
cd AHRD
git checkout tags/v3.3.3
</pre>

Alternatively, without using @git@, you can download AHRD version @v3.3.3@ ("zip":https://github.com/groupschoof/AHRD/archive/v3.3.3.zip or "tar.gz":https://github.com/groupschoof/AHRD/archive/v3.3.3.tar.gz) and extract it.

h4. 1.2.2 Build the executable jar

Running

<pre>
ant dist
</pre>

will create the executable JAR file @./dist/ahrd.jar@.

h2. 2 Usage

All AHRD inputs are passed to AHRD in a single YML file. See @./ahrd_example_input.yml@ for details. (About the YAML format, see Wikipedia/YAML.)
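
To give a first impression of such an input file, the sketch below assembles the parameters discussed in the following sections. The key names used for the query FASTA, the per-database result file, and the output path (@proteins_fasta@, @blast_dbs@, @file@, @output@) are illustrative assumptions and are not guaranteed to match AHRD's actual schema; use the shipped @./ahrd_example_input.yml@ as the authoritative template.

<pre>
# Sketch of an AHRD input file. Keys marked "assumed" are illustrative only;
# the weights and blacklist/filter paths are the ones documented in this README.
proteins_fasta: ./query_sequences_AA.fasta        # assumed key: query proteins (FASTA)
token_score_bit_score_weight: 0.468
token_score_database_score_weight: 0.2098
token_score_overlap_score_weight: 0.3221
output: ./ahrd_output.tsv                         # assumed key: result table
blast_dbs:                                        # assumed key: one block per searched database
  swissprot:
    weight: 653
    description_score_bit_score_weight: 2.717061
    file: ./query_vs_swissprot.txt                # assumed key: tabular blastp result
    blacklist: ./test/resources/blacklist_descline.txt
    filter: ./test/resources/filter_descline_sprot.txt
    token_blacklist: ./test/resources/blacklist_token.txt
</pre>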

Basically, AHRD needs a FASTA file of amino acid sequences and one file per database containing the results of the respective BLAST searches; in our example we searched three databases: Uniprot/trEMBL, Uniprot/Swissprot and TAIR10. Note that AHRD is generic and can make use of any number of different BLAST databases, which do not necessarily have to be the ones above. If, for example, you annotate genes from a fungal genome, searching yeast databases may be more appropriate than using TAIR (_Arabidopsis thaliana_). All parameters can be set manually, or the defaults can be used as given in the example input file @ahrd_example_input.yml@ (see sections "2.1":#21-ahrd-example-usages and "3.2":#32-used-formulae-and-parameters for more details).

In order to parallelize the protein function annotation process, AHRD can be run on batches of a recommended size of 1,000 to 2,000 proteins. If you want to annotate very large protein sets or have little memory available, use the included Batcher to split your input data into batches of appropriate size (see section "2.3":#23-batcher).

_Note:_ As of Java 7 or higher AHRD is quite fast and batching might no longer be necessary.

h3. 2.1 AHRD example usages

There are _two_ template AHRD input files provided; use the one matching your use case. All example input files are stored in @./test/resources@ and are named @ahrd_example_input*.yml@. You can run AHRD on any of these use cases with

<pre>
java -Xmx2g -jar ./dist/ahrd.jar your_use_case_file.yml
</pre>

| *Use Case* | *Template File* |
| Annotate your Query proteins with Human Readable Descriptions (HRD) | @./test/resources/ahrd_example_input.yml@ |
| Annotate your Query proteins with HRD _and_ Gene Ontology (GO) terms | @./test/resources/ahrd_example_input_go_prediction.yml@ |

h3. 2.2 Input

Example files for all inputs can be found under @./test/resources/@.

_NOTE:_ Only files containing @example@ in their filename should be used as template input files. The other YAML files are used for testing purposes.

h4. 2.2.1 Required input data

# Protein sequences in FASTA format
# Sequence similarity search (@blastp@ or @blat@) results in tabular format (if you run AHRD in batches, the BLAST search results need to be batched in the same way as the FASTA files)

Recommended sequence similarity search: for your query proteins you should start independent BLAST searches, e.g. in the three different databases mentioned above:

<pre>
blastp -outfmt 6 -query query_sequences_AA.fasta -db uniprot_swissprot.fasta -out query_vs_swissprot.txt
</pre>

h4. 2.2.2 Optional input data

If you want AHRD to predict your query proteins' functions with Gene Ontology (GO) terms, you need to provide the GO annotations of the reference proteins. See section "3.3.2":#332-parameters-controlling-gene-ontology-term-annotations for more details.

h4. 2.2.3 Required config files

# Input YML with all paths and parameters set according to your needs (see @ahrd_example_input.yml@ and the section on parameters)
# Blacklists and filters (they can either be used as provided or adapted to your needs and databases). Each of these files contains a list of valid Java regular expressions, one per line. For details on Java regular expressions please refer to http://docs.oracle.com/javase/tutorial/essential/regex/.
## *Description blacklist* (Argument @blacklist: ./test/resources/blacklist_descline.txt@) - Any Blast-Hit description matching one of the regular expressions in this file will be ignored.
## *Description filter* for each single Blast database (Argument @filter: ./test/resources/filter_descline_sprot.txt@) - Any part of a Blast-Hit description that matches one of the regular expressions in this file will be deleted from the description.
## *Token blacklist* (Argument @token_blacklist: ./test/resources/blacklist_token.txt@) - Blast-Hit descriptions are composed of words. Any such word matching one of the regular expressions in this file will be ignored by AHRD's scoring, although it will not be deleted from the description and thus might still appear in the final output.

h5. 2.2.3.1 Test custom blacklists and filters

As explained in 2.2.3, AHRD makes use of blacklists and filters provided as Java regular expressions. You can test your own custom blacklists and filters as follows:

# Put the strings representing Blast-Hit descriptions, or words contained in them, into the file @./test/resources/regex_list.txt@. Note that each line is interpreted as a single entry.
# Put the Java regular expressions you want to test into the file @./test/resources/match_list.txt@, one regular expression per line.
# Execute @ant test.regexs@ and study the output.

Example output for the test string "activity" and the regular expressions "(?i)interacting" and "(?i)activity", applied in series:

<pre>
[junit] activity
[junit] (?i)interacting -> activity
[junit] (?i)activity ->
</pre>

The above example demonstrates that the first regular expression does not match anything in the test string "activity", while after matching against the second regular expression nothing remains, because the matched substring has been filtered out. As you can see, this test applies all provided regular expressions _in order of appearance_ and shows what _remains_ of the provided test string after filtering with them.

h3. 2.3 Batcher

AHRD provides a function to generate several input.yml files from large datasets, consisting of _batches_ of query proteins. For each of these batches the user is expected to provide the batch's query proteins in FASTA format and one Blast result file per searched database. The AHRD Batcher will then generate one unique input.yml file per batch and an entry in a batch shell script, so that AHRD can be executed on the respective batches in parallel, for example on a compute cluster using a batch system like LSF. We recommend this for very large datasets (more than a genome) or computers with little RAM.

To generate the mentioned input.yml files and the batch shell script that can subsequently be used to start AHRD in parallel, use the Batcher function as follows:

<pre>
java -cp ./dist/ahrd.jar ahrd.controller.Batcher ./batcher_input_example.yml
</pre>

You will have to edit @./batcher_input_example.yml@ and provide the following arguments (a collected sketch follows below). Note that, within the mentioned directories, files are interpreted as belonging to the same unique batch if and only if they have identical file names.

# @shell_script:@ Path to the shell script file which will contain all the statements to invoke AHRD on each supplied batch in parallel.
# @ahrd_call: "java -Xmx2048m -jar ./dist/ahrd.jar #batch#"@ This line will be the basis for executing AHRD on each supplied batch. The line _must_ contain @#batch#@, which will be replaced with the generated input.yml files. To use a batch system like LSF this line has to be modified to something like @bsub -q normal 'java -Xmx2048m -jar ./dist/ahrd.jar #batch#'@
# @proteins_dir:@ The path to the directory the query protein batches are stored in.
# @batch_ymls_dir:@ The path to the directory the AHRD Batcher should store the generated input.yml files in.
# @dir:@ Each database entry requires this argument: the path to the directory holding each batch's Blast result file from searches in the corresponding Blast database.
# @output_dir:@ The directory in which each AHRD run should create a subdirectory with the output for the processed batch.

_Batch-Name requirement:_ All of the above files belonging to the same batch _must_ have the same name. This name must start with alphanumeric characters and may end with digits indicating the batch's number. File extensions may vary.
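
As a rough illustration of how these arguments fit together, the sketch below shows a possible Batcher input file. The placement of the per-database @dir:@ entries under a @blast_dbs@ block is an assumption, and all paths are placeholders; compare with the shipped @./batcher_input_example.yml@ before running.

<pre>
# Sketch of a Batcher input file; paths are placeholders and the nesting of the
# per-database "dir" entries (blast_dbs/swissprot) is an assumption.
shell_script: ./start_ahrd_batched.sh
ahrd_call: "java -Xmx2048m -jar ./dist/ahrd.jar #batch#"
proteins_dir: ./protein_batches          # one FASTA file per batch
batch_ymls_dir: ./batch_ymls             # generated input.yml files are written here
output_dir: ./ahrd_batch_results
blast_dbs:
  swissprot:
    dir: ./swissprot_blast_results       # one tabular Blast result file per batch, same base name
</pre>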

h3. 2.4 Output

AHRD supports two different output formats. The default one is a tab-delimited table; the other is FASTA format.

h4. 2.4.1 Tab-Delimited Table

AHRD writes out a tab-delimited table with the following columns:

# Protein-Accession -- The query protein's accession
# Blast-Hit-Accession -- The accession of the protein the assigned description was taken from
# AHRD-Quality-Code -- Explained below
# Human-Readable-Description -- The assigned HRD
# Gene-Ontology-ID -- If AHRD was requested to generate Gene Ontology annotations, they are appended here

AHRD's quality code consists of a three-character string, where each character is either '*' if the respective criterion is met or *'-'* otherwise. Their meaning is explained in the following table:

| Position | Criterion |
| 1 | Bit score of the blast result is >50 and e-value is < e-10 |
| 2 | Overlap of the blast result is >60% |
| 3 | Top token score of assigned HRD is >0.5 |

h4. 2.4.2 Fasta-Format

To make AHRD write its output in FASTA format, set the following switch in the input.yml:

<pre>
output_fasta: true
</pre>

AHRD will write a valid FASTA file of your query proteins in which the header is composed of the same parts as above, but separated by whitespace.

h3. 2.5 AHRD run using BLASTX results

In order to run AHRD on BLASTX results instead of BLASTP results you have to modify the following parameters in @ahrd_example_input.yml@:

<pre>
token_score_bit_score_weight: 0.5
token_score_database_score_weight: 0.3
token_score_overlap_score_weight: 0.2
</pre>

Since the algorithm is based on protein sequences while BLASTX searches are based on nucleotide sequences, the overlap score of the Blast result cannot be calculated. To overcome this problem, @token_score_overlap_score_weight@ has to be set to 0.0 and the other two weights have to be raised accordingly, because these three parameters must sum up to 1. The resulting parameter configuration could look like this:

<pre>
token_score_bit_score_weight: 0.6
token_score_database_score_weight: 0.4
token_score_overlap_score_weight: 0.0
</pre>

h3. 2.6 Parameter Optimization

If you have well annotated reference proteins (at least 1000) from an organism closely related to the proteins you want to annotate, you can optimize AHRD's parameters to make its protein function predictions more similar to your references. In order to do so you need to start a simulated annealing run:

<pre>
java -cp dist/ahrd.jar ahrd.controller.Trainer trainer_example_input.yml
</pre>

_Note:_ See @./test/resources/trainer_example_input.yml@ for details. The input file is almost identical to that of a standard AHRD run; only the following additional parameters are available. All numeric values given show the _default_ used for the respective parameter; a collected sketch follows the list.

# @references_fasta: path_to_your_references.fasta@ _Required_ parameter pointing to the FASTA-formatted file containing the reference proteins, _including_ the reference Human Readable Descriptions (see @./test/resources/references.fasta@ for details).
# @path_log: your_log_file.tsv@ Stores the parameters evaluated in each optimization iteration, their F1-score, and the difference in score to the currently accepted parameter set.
# @temperature: 75000@ Temperature used in _simulated annealing_.
# @cool_down_by: 1@ Diminish the temperature by this value after each simulated annealing iteration.
# @f_measure_beta_parameter: 1.0@ The scaling factor used to compute the F-score of an assigned HRD when comparing it with the reference. See Wikipedia/F1_score for details.
# @optimization_acceptance_probability_scaling_factor: 2500000000.0@ Used to scale the probability of accepting a currently evaluated parameter set: @P(Accept) := exp(-delta_scores*scaling_factor/current-temperature)@
# @mutator_mean: 0.25@ Mutate a randomly selected parameter by a value drawn from a Gaussian (normal) distribution with this mean.
# @mutator_deviation: 0.15@ Mutate a randomly selected parameter by a value drawn from a Gaussian (normal) distribution with this standard deviation.
# @remember_simulated_annealing_path: false@ Set to @true@ if you want the optimization to remember already visited parameter sets and their performance. This increases memory usage but improves final results.
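
Collected in one place, the Trainer-specific additions to an otherwise standard AHRD input file could look like the following sketch. Parameter names and numeric defaults are the ones listed above; the two file paths are placeholders.

<pre>
# Trainer-specific parameters (defaults as documented above); added to a standard AHRD input file.
references_fasta: ./my_references.fasta   # placeholder path: reference proteins with reference HRDs
path_log: ./trainer_log.tsv               # placeholder path: per-iteration log
temperature: 75000
cool_down_by: 1
f_measure_beta_parameter: 1.0
optimization_acceptance_probability_scaling_factor: 2500000000.0
mutator_mean: 0.25
mutator_deviation: 0.15
remember_simulated_annealing_path: false
</pre>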

h4. 2.6.1 Optimization in parallel (Trainer-Batcher)

Simulated annealing is a time and resource consuming process, and because AHRD uses quite a number of parameters, the search space is accordingly large. Hence it is recommendable to start independent simulated annealing runs in parallel. Do this using

<pre>
java -cp ./dist/ahrd.jar ahrd.controller.TrainerBatcher ./test/resources/trainer_batcher_example.yml
</pre>

The following parameters are specific to the Trainer-Batcher:

# @shell_script: ./start_ahrd_batched.sh@ The path to the shell script you will use to start the simulated annealing jobs in parallel.
# @ahrd_call: "java -Xmx2048m -cp ./dist/ahrd.jar ahrd.controller.Trainer #batch#"@ Defines a line in the resulting shell script; requires the presence of @#batch#@, which will be replaced with the respective generated Trainer inputs. This line can be used to submit the parallel optimizations into a job queue.
# @no_start_positions_in_parameter_space: 1024@ Defines how many parallel optimization runs you want to start.
# @batch_ymls_dir: ./trainer_batch_ymls@ Path to the directory in which to store the generated Trainer input files.
# @output_dir: ./trainer_results@ Path to the directory in which to store each Trainer run's output(s).

_Note_ that the Trainer-Batcher works very much like the AHRD-Batcher (section "2.3":#23-batcher). In particular you need to respect the _batch-name requirements_ explained there. The above Trainer-Batcher example input shows how to automatically generate a desired number of input files with different starting points in parameter space. These input files can then directly be used with the AHRD Trainer documented above (section "2.6":#26-parameter-optimization).

h3. 2.7 Computing F-Scores for selected parameter sets (AHRD-Evaluator)

Given different parameter sets, AHRD enables you to compute their performance in terms of F-scores for each reference protein. Optionally you can also obtain the theoretically maximum attainable F-score and see how well the best hits from each sequence similarity search perform. In order to do so, use the Evaluator function:

<pre>
java -cp ./dist/ahrd.jar ahrd.controller.Evaluator ./evaluator_example.yml
</pre>

See @./test/resources/evaluator_example.yml@ for more details. Parameters specific to the Evaluator function are (see the collected sketch at the end of this section):

# @find_highest_possible_evaluation_score: false@ Set to @true@ if you want to see how the best fitting candidate description among the sequence similarity search hits performs. This gives you the theoretical best possible performance.
# @write_scores_to_output: false@ Set to @true@ if you want to see all internal intermediate scores: Token-Scores, Lexical-Scores, and Description-Scores. Use with extreme caution, because this option is meant for developers _only_.
# @write_best_blast_hits_to_output: false@ Set to @true@ if you want to see the respective best sequence similarity search hits and their performance.
# @f_measure_beta_parameter: 1.0@ This parameter is also used by the AHRD-Trainer. See section "2.6":#26-parameter-optimization for details.

Optionally you can apply AHRD's blacklisting and filtering to the reference descriptions, too. Performance is then assessed only on non-blacklisted and filtered descriptions, of which only meaningful words, i.e. non-blacklisted tokens, will be scored. In order to do so, you need to set _all_ three of the following parameters. Note that you can point any of them to empty dummy files if you wish.

# @references_description_blacklist: ./test/resources/blacklist_descline.txt@ Regular expressions that identify reference descriptions not to score.
# @references_description_filter: ./test/resources/filter_descline_sprot.txt@ Regular expressions that filter out parts of the reference descriptions to be ignored for performance scoring.
# @references_token_blacklist: ./test/resources/blacklist_token.txt@ Regular expressions that identify non-meaningful words (tokens) not to be considered when computing the performance scores.
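
The Evaluator-specific switches listed above could be collected roughly as sketched below; all names and defaults are taken from this section, and the file is otherwise a standard AHRD input file with reference descriptions available.

<pre>
# Evaluator-specific parameters (names and defaults as documented above); added to a standard AHRD input file.
find_highest_possible_evaluation_score: true    # report the theoretically best attainable score
write_best_blast_hits_to_output: true           # compare against the best hit of each search
write_scores_to_output: false                   # developer option, leave off
f_measure_beta_parameter: 1.0
# Optional: apply blacklisting/filtering to the reference descriptions (set all three or none)
references_description_blacklist: ./test/resources/blacklist_descline.txt
references_description_filter: ./test/resources/filter_descline_sprot.txt
references_token_blacklist: ./test/resources/blacklist_token.txt
</pre>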

h2. 3 Algorithm

Based on e-values, the 200 best scoring Blast results are chosen from each database search (e.g. Swissprot, TAIR, trEMBL). For all resulting candidate description lines a score is calculated using a lexical approach. First, each description line is passed through two regular expression filters. The first filter discards any matching description line, in order to ignore descriptions like e.g. 'Whole genome shotgun sequence', while the second filter tailors the description lines by deleting matching parts, in order to discard e.g. trailing species descriptions such as 'OS=Arabidopsis thaliana [...]'. In the second step of the scoring, each description line is split into single tokens, which are passed through a blacklist filter, ignoring all matching tokens in terms of score. Tokens are sequences of characters with a collective meaning. For each token a score is calculated from three single scores with different weights: the bit score, the database score and the overlap score. The bit score is provided within the Blast result. The database score is a fixed score for each Blast database, based on the description quality of the database. The overlap score reflects the overlap of the query and subject sequence. Next, the sum of all token scores of a description line is divided by a correction factor that prevents the scoring system from being biased towards longer or shorter description lines. From this ranking the best scoring description line can then be chosen. In the last step a domain name provided by InterProScan results, if available, is extracted and appended to the best scoring description line for each uncharacterized protein.

In the end, for each uncharacterized protein a description line is selected that comes from a high-scoring BLAST match, that contains words occurring frequently in the descriptions of the highest scoring BLAST matches, and that does not contain meaningless "fill words". Each HRD line will contain an evaluation section that reflects the significance of the assigned human readable description.

h3. 3.1 Pseudo-Code

# Choose the best scoring Blast results, 200 from each database searched.
# Filter the description lines of the above Blast results using regular expressions:
## Reject those matched by any regex given in e.g. @./test/resources/blacklist_descline.txt@,
## Delete those parts of each description line matching any regex in e.g. @./test/resources/filter_descline_sprot.txt@.
# Divide each description line into tokens (characters of collective meaning):
## In terms of score, ignore any tokens matching regexs given e.g. in @./test/resources/blacklist_token.txt@.
# Token score (calculated from: bit score, database weight, overlap score)
# Lexical score (calculated from: token score, high score factor, pattern factor, correction factor)
# Description score (calculated from: lexical score and Blast score)
# Choose the best scoring description line.

h3. 3.2 Used Formulae and Parameters

!https://raw.githubusercontent.com/groupschoof/AHRD/master/images/formulae.jpg!

h3. 3.3 Parameters

The above formulae use the following parameters, as given in *./ahrd_example_input.yml*. These parameters can either be used as provided or adapted to your needs. The weights in the above formulae are

<pre>
token_score_bit_score_weight: 0.468
token_score_database_score_weight: 0.2098
token_score_overlap_score_weight: 0.3221
</pre>

and, specific to each Blast database, for example for UniprotKB/Swissprot:

<pre>
weight: 653
description_score_bit_score_weight: 2.717061
</pre>

h4. 3.3.1 Parameters controlling the parsing of tabular sequence similarity search result tables (legacy BLAST, BLAST+, and BLAT)

AHRD has been designed to work with sequence similarity search results in tabular format. By default it parses the tabular "Blast 8" output:

| Sequence Similarity Search Tool | recommended output format (@command line switch@) |
| Blast+ | @-outfmt 6@ |
| legacy Blast | @-m 8@ |
| Blat | @-out=blast8@ |

The following parameters can be set optionally if your tabular sequence similarity search result deviates from the above "Blast 8" format. See the example file @./test/resources/ahrd_input_seq_sim_table.yml@ for more details.

| Optional Parameter | Example | Meaning of parameter |
| seq_sim_search_table_comment_line_regex | @"#"@ | single character that starts a comment line to be ignored |
| seq_sim_search_table_sep | @"\t"@ | single character that separates columns |
| seq_sim_search_table_query_col | @10@ | number of the column holding the query protein's accession |
| seq_sim_search_table_subject_col | @11@ | number of the column holding the hit protein's accession |
| seq_sim_search_table_query_start_col | @16@ | number of the column holding the query's start amino acid position in the local alignment |
| seq_sim_search_table_query_end_col | @17@ | number of the column holding the query's end amino acid position in the local alignment |
| seq_sim_search_table_subject_start_col | @18@ | number of the column holding the hit's start amino acid position in the local alignment |
| seq_sim_search_table_subject_end_col | @19@ | number of the column holding the hit's end amino acid position in the local alignment |
| seq_sim_search_table_e_value_col | @20@ | number of the column holding the hit's E-value |
| seq_sim_search_table_bit_score_col | @21@ | number of the column holding the hit's bit score |

_NOTE:_ All above column numbers start counting at zero, i.e. the first column has number 0.

h4. 3.3.2 Parameters controlling Gene Ontology term annotations

AHRD is capable of annotating the query proteins with Gene Ontology (GO) terms. It does so by transferring the reference GO terms of the Blast hit AHRD selects as the source of the resulting HRD. To be able to transfer these reference GO terms, AHRD needs a reference GO annotation (GOA) file. By default AHRD expects this GOA file to be in the standard Uniprot format. You can download the latest GOA file from the "Uniprot server":http://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/. To obtain GO annotations for all UniprotKB proteins, download the file @goa_uniprot_all.gaf.gz@ (last visited Feb 16th, 2017).

To have AHRD annotate your proteins with GO terms, you just need to provide the optional parameter @gene_ontology_result@ (see the sketch below). See the example file @./test/resources/ahrd_input_seq_sim_table_go_prediction.yml@ for more details.

h5. 3.3.2.0 Prefer reference proteins as candidates that have GO Term annotations

The parameter @prefer_reference_with_go_annos: true@ is highly recommended when aiming to annotate your query proteins with GO terms. If this parameter is set to @true@, only those candidate references are considered that also have GO term annotations. However, if you put more emphasis on Human Readable Descriptions and are prepared to accept that a couple of your queries will not get any GO term predictions, you can switch this off with @prefer_reference_with_go_annos: false@ or simply omit the parameter, as it is set to @false@ by default.
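
Putting the two GO-related switches together, the relevant additions to a standard AHRD input file might look like the sketch below. The value given to @gene_ontology_result@ is read here as the path to the reference GOA file described above, which is an interpretation of the text; the path itself is a placeholder, so check the shipped @./test/resources/ahrd_input_seq_sim_table_go_prediction.yml@ to confirm the exact usage.

<pre>
# GO term prediction: additions to a standard AHRD input file (sketch; path is a placeholder).
gene_ontology_result: ./goa_uniprot_all.gaf      # assumed to point to the reference GOA file
prefer_reference_with_go_annos: true             # recommended when GO terms are the main goal
</pre>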

h5. 3.3.2.1 Custom reference Gene Ontology annotations (non UniprotKB GOA)

Unfortunately, UniprotKB GOA files use short protein accessions like @W9QFR0@, while the UniprotKB FASTA databases use long protein accessions with pipes like @tr|W9QFR0|W9QFR0_9ROSA@. In order to enable AHRD to match the correct short and long accessions, and thus the reference GO annotations, it uses regular expressions to parse both the long accessions and the GOA files. By default AHRD is set up to handle the Uniprot formats, but you can provide custom regular expressions for your own custom GOA files:

# Set the regular expression used to extract short protein accessions mapped to GO terms from the reference GOA with @reference_go_regex@, e.g. @reference_go_regex: "^UniProtKB\\s+(?<shortAccession>\\S+)\\s+\\S+\\s+(?<goTerm>GO:\\d{7})"@
# Set the Blast-database-specific regular expression used to extract the short accessions from the long ones with @short_accession_regex:@, e.g. for TAIR10 use @short_accession_regex: "^(?<shortAccession>.+)$"@.

_Note:_ You must provide the above named match groups @shortAccession@ and @goTerm@, respectively.

h2. 4 Testing

If you want to run the complete JUnit test suite, execute:

<pre>
ant
</pre>

If you want to execute a test run on example proteins, use:

<pre>
ant test.run
</pre>

h2. 5 License

See the attached file LICENSE.txt for details.

h2. 6 Authors

Dr. Asis Hallab, Kathrin Klee, Florian Boecker, Dr. Girish Srinivas, and Prof. Dr. Heiko Schoof

INRES Crop Bioinformatics
University of Bonn
Katzenburgweg 2
53115 Bonn
Germany

h2. 7 References

High quality genome projects that used AHRD:

fn1. Young, Nevin D., Frédéric Debellé, Giles E. D. Oldroyd, Rene Geurts, Steven B. Cannon, Michael K. Udvardi, Vagner A. Benedito, et al. "The Medicago Genome Provides Insight into the Evolution of Rhizobial Symbioses." Nature 480, no. 7378 (December 22, 2011): 520–24. doi:10.1038/nature10625.

fn2. The Tomato Genome Consortium. "The Tomato Genome Sequence Provides Insights into Fleshy Fruit Evolution." Nature 485, no. 7400 (May 31, 2012): 635–41. doi:10.1038/nature11119.

fn3. International Wheat Genome Sequencing Consortium (IWGSC). "A Chromosome-Based Draft Sequence of the Hexaploid Bread Wheat (Triticum Aestivum) Genome." Science 345, no. 6194 (July 18, 2014): 1251788. doi:10.1126/science.1251788.

fn4. International Barley Genome Sequencing Consortium, Klaus F. X. Mayer, Robbie Waugh, John W. S. Brown, Alan Schulman, Peter Langridge, Matthias Platzer, et al. "A Physical, Genetic and Functional Sequence Assembly of the Barley Genome." Nature 491, no. 7426 (November 29, 2012): 711–16. doi:10.1038/nature11543.

fn5. Wang, W., G. Haberer, H. Gundlach, C. Gläßer, T. Nussbaumer, M. C. Luo, A. Lomsadze, et al. "The Spirodela Polyrhiza Genome Reveals Insights into Its Neotenous Reduction Fast Growth and Aquatic Lifestyle." Nature Communications 5 (2014): 3311. doi:10.1038/ncomms4311.

fn6. Guo, Shaogui, Jianguo Zhang, Honghe Sun, Jerome Salse, William J. Lucas, Haiying Zhang, Yi Zheng, et al. "The Draft Genome of Watermelon (Citrullus Lanatus) and Resequencing of 20 Diverse Accessions." Nature Genetics 45, no. 1 (January 2013): 51–58. doi:10.1038/ng.2470.

AHRD was applied to all plant genomes present in the public database *PlantsDB*:

fn7. Spannagl, Manuel, Thomas Nussbaumer, Kai C. Bader, Mihaela M. Martis, Michael Seidel, Karl G. Kugler, Heidrun Gundlach, and Klaus F. X. Mayer. "PGSB PlantsDB: Updates to the Database Framework for Comparative Plant Genome Research." Nucleic Acids Research 44, no. D1 (January 4, 2016): D1141–47. doi:10.1093/nar/gkv1130.