Web interface and SQL database
All data, including taxon and voucher information, gene sequences, and intermediate results, are stored in an SQL database. Registered users access the data through a secure web interface built upon the Zope application server (www.zope.org). Participants of AFTOL use the interface to enter specimen voucher information, to maintain sequencing primer information, and to access and verify the results of their sequencing reactions, automated analyses, and BLAST searches during all stages of the automated analyses. The web interface also provides functionality to include GenBank sequences for further analysis and to facilitate submission of finalized gene sequences to GenBank. The database and web interface are closely linked to a set of software applications that analyze and verify the user-generated sequence data. Each time sequences are entered, corrected, or deleted, the analyses that depend on these changes are triggered again to keep all information in the database consistent.
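The re-triggering of dependent analyses can be pictured as a walk over a dependency graph. The following is only a minimal sketch under stated assumptions: the step names and the strictly linear dependency map are hypothetical, chosen to mirror the order of the sections on this page, and do not describe the actual AFTOL implementation.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each step lists the steps that consume its output.
DOWNSTREAM = {
    "sequence_entry": ["base_calling"],
    "base_calling": ["contig_assembly"],
    "contig_assembly": ["verification"],
    "verification": ["alignment"],
    "alignment": ["congruence_test"],
    "congruence_test": ["phylogenetic_analysis"],
    "phylogenetic_analysis": [],
}

def analyses_to_rerun(changed_step, downstream):
    """Return every step downstream of changed_step, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(downstream.get(changed_step, []))
    while queue:
        step = queue.popleft()
        if step in seen:
            continue  # already scheduled via another path
        seen.add(step)
        order.append(step)
        queue.extend(downstream.get(step, []))
    return order
```

For example, `analyses_to_rerun("verification", DOWNSTREAM)` lists the alignment, congruence test, and phylogenetic analysis steps, while a change at the last step schedules nothing further.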
Base-calling and contig assembly
Sequencing is performed on two ABI 3700 automated sequencers at the Duke Biology Sequencing Facility. The resulting electropherograms are assembled and evaluated using phred and phrap (www.phrap.org). Each base of a sequence read is assigned a quality value based on the characteristics of the read, the chemistry, and the sequencing hardware. All primer sequences used for sequencing reactions, with information about their respective target genes and their orientation, are stored in the database. The subsequent contig assembly takes the base qualities into account when assembling the single reads into a contig sequence. Each base of the contig sequence is itself assigned a quality score, which can indicate problematic regions and the overall quality of the contig. Sequences that pass all quality checks (length, quality, BLAST) are transferred to their respective gene tables.
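To make the role of the quality values concrete, here is a minimal sketch of how per-base scores could be used. The conversion follows the standard phred definition, Q = -10 log10(P); the thresholds and function names are hypothetical and are not the values or code used by AFTOL.

```python
def phred_to_error_prob(q):
    """Convert a phred quality value Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def low_quality_regions(qualities, threshold=20):
    """Return (start, end) index pairs (end exclusive) of runs below threshold."""
    regions = []
    start = None
    for i, q in enumerate(qualities):
        if q < threshold:
            if start is None:
                start = i  # a low-quality run begins here
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(qualities)))
    return regions

def passes_quality_checks(qualities, min_length=200, min_mean_quality=20.0):
    """Length and mean-quality check; both thresholds are assumed values."""
    if len(qualities) < min_length:
        return False
    return sum(qualities) / len(qualities) >= min_mean_quality
```

A quality value of 20 corresponds to an expected error rate of 1 in 100 bases, and a value of 30 to 1 in 1000. The BLAST check mentioned above is not covered by this sketch.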
Sequence verification and local BLAST
Both single sequence reads and assembled contig sequences are subjected to several steps of verification:
Alignment (Work in progress)
Verified gene sequences are automatically aligned to a core alignment. The core alignment contains information about intron positions, ambiguous regions, and other nonalignable elements that are excluded from the alignment process. Starting with the largest block, alignable regions of the core alignment are successively aligned to the new sequence. As a result, the sequence is broken down into smaller regions, which makes the alignment of the remaining, shorter regions easier with each step. This algorithm generates high-quality alignments, especially when many ambiguous regions or introns are present, e.g., in the fungal LSU and SSU nrDNA.
Test for congruence
Prior to the phylogenetic analysis, data sets are tested for congruence. Sequences that cause conflict between data sets are excluded to ensure compatibility between the alignments.
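The largest-block-first alignment strategy described above can be sketched as follows. This is a deliberately simplified illustration, not the AFTOL algorithm: it assumes each alignable block is given as a plain string and locates it by exact substring search, where a real implementation would presumably use approximate matching. The function name and the data layout are hypothetical.

```python
def place_blocks(blocks, sequence):
    """Locate core-alignment blocks in a new sequence, largest block first.

    blocks: list of strings, in their order along the core alignment.
    Returns a dict mapping block index to (start, end) in sequence;
    blocks that cannot be located are left out.
    """
    placed = {}
    # Visit blocks from longest to shortest.
    for idx in sorted(range(len(blocks)), key=lambda i: -len(blocks[i])):
        # Blocks already placed to the left and right bound the search window,
        # so every placement shrinks the region left for the remaining blocks.
        lo = max((end for j, (start, end) in placed.items() if j < idx), default=0)
        hi = min((start for j, (start, end) in placed.items() if j > idx),
                 default=len(sequence))
        pos = sequence.find(blocks[idx], lo, hi)
        if pos != -1:
            placed[idx] = (pos, pos + len(blocks[idx]))
    return placed
```

Because each placed block narrows the window for its neighbours, a short block that occurs more than once in the sequence is only matched in the position consistent with the larger blocks around it.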
Phylogenetic analysis (Work in progress)
Data sets that have been verified for congruence are analyzed on a regular schedule. Single-gene data sets, as well as selected combinations, are analyzed using a variety of methods, such as maximum parsimony, maximum likelihood, and Bayesian MCMC. Support is estimated with bootstrap, Bayesian posterior probabilities, and Bayesian bootstrap. The results are available to the participants of AFTOL through the web interface.
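As an illustration of bootstrap support, the sketch below shows the standard nonparametric bootstrap step: alignment columns are resampled with replacement to build a pseudo-replicate, and support for a clade is the fraction of replicate trees that contain it. The data layout (a dict of aligned strings, trees as sets of clades) and the function names are assumptions for this example; tree inference itself is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    rng: a random.Random instance, passed in for reproducibility.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def clade_support(clade, replicate_trees):
    """Fraction of replicate trees containing the clade.

    Each tree is represented as a set of frozensets of taxon names.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return hits / len(replicate_trees)
```

Each pseudo-replicate has the same taxa and the same length as the original alignment; only the set of sampled columns differs between replicates.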
AFTOL and WASABI
WASABI – Web Accessible Sequence Analysis for Biological Inference: 'Fungal trees grow faster with computer help' - Science, Vol 309, Issue 5733, 374, 15 July 2005.
General information about AFTOL can be obtained at:
For more information about the bioinformatic package developed for AFTOL and to access WASABI go to:
Data section of the AFTOL website
Data sets used in publications of AFTOL can be downloaded at:
Department of Biology, Duke University, Box 90338, Durham, NC 27708
Database-driven website by J. Bélisle and E. Rivas Plata