TIGR Assembler: Assembling Large Shotgun Sequencing Projects

Granger G. Sutton
The Institute for Genomic Research, Gaithersburg, MD

In large shotgun sequencing projects, DNA fragments are assembled into a consensus sequence. The basic approach is to compare each pair of fragments to find overlaps and to use this information to build the consensus. Two obstacles are the large number of pairwise comparisons and the presence of repetitive elements. TIGR Assembler uses a fast initial comparison of fragments (similar to BLAST) to eliminate the need for a more sensitive comparison between most fragment pairs, greatly reducing the computer search time. TIGR Assembler recognizes potential repetitive elements by determining which fragments have more potential overlaps than expected given a random distribution of fragments. Repetitive elements are dealt with in a number of ways: repetitive regions are assembled last so that maximum information from nonrepetitive regions can be used, the stringency of the match criteria is increased in repetitive regions, and constraints involving fragments sequenced from both ends of a clone are used. Repetitive elements shorter than half the average fragment length are usually not a problem because they are most often spanned by a single fragment. Likewise, repetitive elements whose copies are significantly less similar to one another than the fragment sequencing accuracy (e.g. 94% similar vs. 98% accurate) can be handled by increasing the match stringency. For long, nearly identical repetitive elements, sequencing from both ends of clones of known average length and reasonably small variance is essential. This allows fragments that are totally contained in a repetitive element to be properly placed by TIGR Assembler based on the position of their corresponding clone mate. This technique will not work for repetitive regions longer than the average clone length.
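The fast initial comparison and the overlap-count test for repeats described above can be illustrated with a minimal Python sketch. It is not TIGR Assembler's actual implementation: the shared k-mer screen stands in for the BLAST-like comparison, the expected overlap count is approximated as 2NL/G for N fragments of mean length L placed at random on a genome of length G, and the function names and the parameters `k`, `min_shared`, `genome_len`, and `factor` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(fragments, k=8, min_shared=2):
    """Cheap screen: return pairs of fragment ids sharing >= min_shared k-mers.

    Only these pairs would go on to a more sensitive comparison.
    (Assumption: a shared k-mer count stands in for the fast initial comparison.)
    """
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for fid, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(fid)

    shared = defaultdict(int)  # (id_a, id_b) -> number of distinct shared k-mers
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}


def flag_repeats(fragments, pairs, genome_len, factor=2.0):
    """Flag fragments with more potential overlaps than random placement predicts.

    Assumption: expected overlaps per fragment is about 2 * N * L / G.
    A fragment is flagged when its count exceeds factor * expected.
    """
    counts = {fid: 0 for fid in fragments}
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    n = len(fragments)
    mean_len = sum(len(s) for s in fragments.values()) / n
    expected = 2 * n * mean_len / genome_len
    return {fid for fid, c in counts.items() if c > factor * expected}
```

In a fuller pipeline, the flagged fragments would be held back until the nonrepetitive regions are assembled and then matched under stricter criteria; the threshold `factor` is a tuning choice in this sketch, not a value taken from the abstract.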
For very long, nearly identical repetitive regions, a second library of much longer clones sequenced from both ends is necessary for TIGR Assembler to determine which flanking regions should be joined. TIGR Assembler can fill the very long repetitive regions with a consensus sequence, or the exact sequence can be determined by walking the repeat-containing clone. TIGR Assembler has been used to assemble the complete genomes of H. influenzae and M. genitalium.

See also: Sutton, G.G., White, O., Adams, M.D., and Kerlavage, A.R. (1995). TIGR Assembler: a new tool for assembling large shotgun sequencing projects. Genome Science and Technology, 1, 9-19.