For audio portion of webcast please dial: +44 (0)870 22 333 65 (please omit zero if calling from outside the UK)
PIN = 444888
Copyright 2004 Synamatix sdn bhd

Personal Introductions

Robert Hercus - MD and Inventor, Synamatix
  • Over 30 years IT experience
  • Pioneered many large-scale IT projects
  • Language of Biology basis of Synamatix
  • Interests: Linguistics, Genomics, Artificial Intelligence

Ali Zamli - Bioinformatician
  • Research Scientist
  • Synamatix applications development

Dr. Arif Anwar - VP, Synamatix
  • 10 yrs+ post-Ph.D.
  • US and EU genomics background
  • Ex Agilent, CLONTECH and Axon Instruments

Copyright 2006 Synamatix sdn bhd (538481-U)

Questions to answer today?
  1. What is a SynaBASE?
  2. What are the advantages of using SynaBASE?
  3. In which situations has SynaBASE been applied?
  4. Does the use of SynaBASE offer any advantages for phylogenetics?

  • Core IP - SynaBASE - PLATFORM
  • Main partners and users in US and EU
  • 50+ staff split across group
  • Open approach to development: engine, not software
  • Focused on efficient HPC for Genomics and Life Sciences

[Diagram: platform architecture. Labels: CORE Database platform; API calls; Graphical Interface; Command line interface; Applications; Data analysis; Develop Tools. Components named: SXParse, SXSequenceRefs, SynaSearch Bulk, SXLRESearch, SynaRex Bulk, SXFuzzyPatternSearch, SynaProbe Bulk, SXAlign, Sxpet]

Software policy
  • More than 40 existing applications
  • All open source to licensees of SynaBASE
  • Users can also develop, modify and share all applications

What do we know about data?
  • Similarity & association
  • Common PATTERNS and functionality

Pattern Trie
  • More memory efficient than variable length data structures
  • Going to leaf node finds all sources and positions
  [Diagram: trie nodes A, AA, C, AC, AAC, T, CT, ACT, AACT, TC, CTC, ACTC, AACTC]

Pattern Trie
  • Low complexity repeats - filtered (e.g. AAA)
  • High frequency patterns removed from alignment seeding
  [Diagram: trie nodes A, AA, AAA, C, AC, AAC, T, CT, ACT, TC, CTC, AACT, ACTC, AACTC; frequency annotations f=100 and f=2 0]

Building a SynaBASE: easy and fast

  • Takes 8 minutes for Swissprot
  • The fields in the build form are equivalent to the command-line XML configuration
  • Field data is converted into XML format and added to the existing entry in the SynaBASE XML configuration file

Pattern Trie
  • Trie boundary: frequency is greater than build limit
  [Diagram: trie nodes A, AA, C, AC, AAC, T, CT, ACT, AACT, TC, CTC, ACTC, AACTC]

Flexibility to use CMD line
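The Pattern Trie slides describe patterns that lead to all of their sources and positions, with high-frequency patterns removed from alignment seeding. The idea can be sketched in a few lines of Python. This is not SynaBASE's data structure: a flat dictionary of substrings stands in for the trie, and the names `build_pattern_index` and `seed_patterns`, as well as the `max_freq` cutoff, are hypothetical.

```python
from collections import defaultdict


def build_pattern_index(sequences, max_len):
    """Map every substring of length 1..max_len to its (source, position) hits.

    `sequences` is a dict of {source_name: sequence}. A real trie would share
    prefixes between patterns; a flat dict is used here only for brevity.
    """
    index = defaultdict(list)
    for source, seq in sequences.items():
        for start in range(len(seq)):
            for length in range(1, max_len + 1):
                if start + length > len(seq):
                    break
                index[seq[start:start + length]].append((source, start))
    return index


def seed_patterns(index, max_freq):
    """Keep only patterns that occur at most max_freq times (hypothetical cutoff)."""
    return {p: hits for p, hits in index.items() if len(hits) <= max_freq}


if __name__ == "__main__":
    index = build_pattern_index({"s1": "AACTC", "s2": "AAACT"}, max_len=5)
    print(index["AACT"])    # every source and position holding AACT
    print(len(index["A"]))  # frequency of the single-base pattern A
    seeds = seed_patterns(index, max_freq=2)
    print("A" in seeds, "AACT" in seeds)
```

Looking up a pattern returns every source and position at once, which is the property the slide attributes to reaching a leaf node. The frequency cutoff plays the role of the filter that keeps very common patterns out of seeding.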

Single-server IT architecture

SynaBASE & SynaSuite Server
  • HP Integrity rx4640 server
  • Dual Intel Itanium2 1.5GHz CPU
  • 64 GB DDR memory
  • 146GB Ultra320 SCSI hard disk x 2
  • Red Hat Enterprise Linux AS 3 for IA64

1. SynaBASE scales efficiently
  [Chart: database size (Mbytes) against number of Streptococcus pneumoniae R6 genomes (up to 100), comparing flat file with SynaBASE. S. pneumoniae R6 genome size = 2.068 Mbytes]

2. SynaBASE enables very fast access
  • Number of levels small
  • For a query:
      • Match 1st longest pattern
      • Follow Eulerian path through network, picking up longest matching pattern for each position in query
  • Processing time is proportional to query size to obtain all unique subpatterns
  [Diagram: pattern network with nodes A, C, AA, T, AC, CT, TC, AAC, ACT, AACT, CTC, ACTC, AACTC, CTCG, ACTCG, TCGA, CTCGA]
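The "longest matching pattern for each position in query" step can be illustrated with a small Python sketch. It uses an ordinary nested-dictionary trie and restarts from the root at every query position, so it does not reproduce the network traversal named on the slide, and its cost is the query length times the longest stored pattern rather than the query length alone. The function names are hypothetical.

```python
def build_trie(patterns):
    """Build a nested-dict trie; the key None marks the end of a stored pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = True  # end-of-pattern marker
    return root


def longest_matches(trie, query):
    """For each query position, return the longest stored pattern starting there."""
    matches = []
    for start in range(len(query)):
        node, best = trie, ""
        for end in range(start, len(query)):
            node = node.get(query[end])
            if node is None:
                break  # no stored pattern continues with this character
            if None in node:
                best = query[start:end + 1]
        matches.append(best)
    return matches


if __name__ == "__main__":
    stored = ["A", "C", "T", "AA", "AC", "CT", "TC", "AAC", "ACT",
              "CTC", "AACT", "ACTC", "AACTC"]
    print(longest_matches(build_trie(stored), "AACTCG"))
```

An empty string in the result means no stored pattern starts at that position, as happens for the trailing G in the example query.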

Efficiency leads to high performance
  • Only 15 million nodes are needed to represent 56 million residues
  • The storage of the shorter nodes has little effect

3. SynaBASE is very fast - Q * logN base A
  [Chart: speed (milliseconds) against size of database (mega bp, 1 to 1000), comparing Conventional with SynaBASE]

BLASTN vs. SynaSearch-Bulk
  • Cumulative number of hits shows SynaSearch Bulk found extra hits at low-mid identities

  [Chart: novel hits. SynaBASE and Blast of 700000 Bacterial ORFs DB queried with 100 1kb sequences]

4. Novel annotation using SynaBASE

  "The elephant and the giraffe walked up the mountain"
  • A graph showing frequency of string (word) patterns in a sentence does not reflect meaning

  "The elephant and the giraffe walked up the mountain"
  • A graph showing probabilities of predicting predecessor and successor characters/events (string significance) reflects meaning

Expected frequency:
  $$Ef(a_1 a_2 a_3) = \frac{F(a_1 a_2) \cdot F(a_2 a_3)}{F(a_2)}$$

SIGNIFICANCE = actual frequency / expected frequency:
  $$Sig(a_1 a_2 a_3) = \frac{F(a_1 a_2 a_3)}{Ef(a_1 a_2 a_3)} = \frac{F(a_1 a_2 a_3) \cdot F(a_2)}{F(a_1 a_2) \cdot F(a_2 a_3)}$$

Gene models correlate with SIGNIFICANCE
  [Figure labels: Ensembl Gene, F2, F3, PIM1 Oncogene]
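The significance ratio defined on the slide (actual frequency of a three-symbol string divided by the frequency expected from its two overlapping pairs) can be worked through on a toy string. The sketch below is an illustration only: the helper names and the choice to return 0.0 when a component never occurs are assumptions, not part of the Synamatix software.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every contiguous n-gram in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def significance(text, trigram):
    """Sig(a1a2a3) = F(a1a2a3) * F(a2) / (F(a1a2) * F(a2a3)).

    Returns 0.0 when either overlapping pair is absent (an assumption made
    here to avoid dividing by zero).
    """
    a1a2, a2, a2a3 = trigram[:2], trigram[1], trigram[1:]
    f1 = ngram_counts(text, 1)
    f2 = ngram_counts(text, 2)
    f3 = ngram_counts(text, 3)
    denominator = f2[a1a2] * f2[a2a3]
    if denominator == 0:
        return 0.0
    return f3[trigram] * f1[a2] / denominator


if __name__ == "__main__":
    text = "ABABAC"
    for tri in ("ABA", "BAB", "BAC"):
        print(tri, significance(text, tri))
```

In "ABABAC" the string BAC occurs once while its pairs BA and AC occur twice and once and the middle symbol A occurs three times, so its significance is 1 * 3 / (2 * 1) = 1.5: it appears more often than its parts alone would predict.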

Example 1 - 454 assembly result
  • 400,000 reads assembled into 11 contigs in 11 minutes, 2 minutes for error correction
  • Genome coverage 99.89%

FragBASE using the SynaBASE structure
  • Use corrected FragBASE
  • Select patterns of high coverage
  • Use FragBASE network* to extend patterns
  • Increase pattern size to overcome shorter repeat sections

Example 2 - Microarrays
  • Probe design
      • 30000 75mer probes, 8 per gene, in 8h compared to previous 3 month+ process
  • Probe evaluation and mapping
      • Mapping of 600,000 Affymetrix 25mer probes to Human genome in 17s
      • Compares to over 2 weeks with BLAST

Example 3 - Comparative Genomics
  [Chart: bars labelled 3 yrs, 22 days and 6h; tools compared: SynaBASE, PatternHunter, BLAST]

Example 4 - Genome mapping
Aims:
  • Mapping of whole genome shotgun reads from a mammalian genome to the Human Genome, to facilitate genome assembly using Synamatix and public tools.
  • Compare sensitivity, specificity and performance advantages of Synamatix technologies.
Results: In comparison to BLASTz, SynaSearch:
  • Is 219 fold faster
  • Finds 11% more true positives
  • Finds 17% more unique hits to queries
  • Has a higher specificity:
      • 113% fewer false positives
      • fewer multiple placements per read: 2.7 v 5.3
Benefits:

  • Enables significant enhancements in workflow throughput.
      • 219 fold compute time improvement
      • SynaSearch requires only 1 search process whereas BLASTz requires genome to be separated into 5MB chunks and apportioned across multiple processors.
  • Results in better assemblies of new genomes.
  • Reduces current reliance on outsourcing of BLASTz analysis.

Further example of use of SynaBASE engine: applying SynaBASE to Phylogenetics
Inference of a phylogenetic network of whole prokaryotic genomes using SynaBASE

Outline of study
  • Primary data set 1: 101 Bacterial and Archaeal Genomes
  • Used SynaTree
      • exhaustive comparison between sequences in SynaBASE structure
      • generates phylogenetic tree
  • Used prototype Synamatix application: SXComparePattern
      • exhaustive pattern based similarity matching
  • Evaluation of methods using:
      • C-score method*
      • Group visualisation and clustering analysis
  • Tested SXComparePattern method with a larger 488 Bacterial Genome data set
  *Henz S.R., Huson D.H., Auch A.F., Nieselt-Struwe K. and Schuster S.C. (2005) Whole-genome prokaryotic phylogeny. Bioinformatics 21(10): 2329-2335

Phylogenetics using SynaTree
  • For each query genome, can search SynaBASE for all alignments with all other genome sequences {srefs, posn, length}
  • The alignment scores can then be used to calculate a distance matrix:
      $$D_{ij} = -\log\left(\frac{2 A_{ij}}{L_i + L_j}\right)$$
    where A = alignment score and L = length of respective genomes
  • The distance matrix is used to generate a phylogenetic tree

SynaTree Interface
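The distance-matrix step from the "Phylogenetics using SynaTree" slide can be sketched as follows. The operators in the slide's formula did not survive extraction, so this sketch assumes the form D_ij = -log(2 A_ij / (L_i + L_j)), with A the pairwise alignment score and L the genome length. The function name and the example numbers are hypothetical, and pairs with no alignment are given an infinite distance as a further assumption.

```python
import math


def distance_matrix(scores, lengths):
    """Return D with D[i][j] = -log(2 * A[i][j] / (L[i] + L[j])).

    `scores` is a symmetric matrix of pairwise alignment scores and
    `lengths` the genome lengths. The diagonal is set to 0.
    """
    n = len(lengths)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            similarity = 2 * scores[i][j] / (lengths[i] + lengths[j])
            # No alignment at all: treat the pair as infinitely distant.
            dist[i][j] = -math.log(similarity) if similarity > 0 else math.inf
    return dist


if __name__ == "__main__":
    scores = [[0, 50, 10],
              [50, 0, 0],
              [10, 0, 0]]
    lengths = [100, 100, 100]
    for row in distance_matrix(scores, lengths):
        print(row)
```

A matrix of this kind is the usual input to distance-based tree builders such as neighbour joining; the slides do not say which tree-building method SynaTree uses.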

SynaTree uses the SXAlign API for comparing alignments
  • It can be seen from the chart that the resulting triplets in a sliding window include significant alignments and also spurious short matches that are not significant.
  • The SynaBASE align function, SXAlign, includes a filter to remove the random short alignments or 'noise' from the alignment data.
  • The alignment scores are then used to calculate a distance matrix.

Example of filtering
  • Chart shows the effect of using diagonal alignment filter on the alignment of 2 Serine Kinase aa sequences

SynaTree for 101 bacterial & archaeal genomes
  • 95 minutes! Compared to 7 days with BLAST

SynaTree for 101 bacterial & archaeal genomes
  [Tree figure; labelled groups: Cyanobacteria, Firmicute, Chlamydiae]

2nd method: SXComparePattern

  Genome 1: AGGCTGAGGCTG
    Pattern  1st    2nd    3rd     4th     6th    7th    8th    9th
             AGGCT  GGCTG  GCTGAG  CTGAGG  TGAGG  GAGGC  AGGCT  GGCTG

  Genome 2: TTGTAGGCTCCGAGC
    Pattern  1st    2nd    3rd    4th    5th    6th      7th
             TTGTA  TGTAG  GTAGG  TAGGC  AGGCT  GGCTCCG  GCTCCGA

  Genome 3: TGCGCTGAGCCT
    Pattern  1st   2nd   3rd    4th     5th     6th     7th
             TGCG  GCGC  CGCTG  GCTGAG  CTGAGC  TGAGCC  GAGCCT

  Frequency of each pattern
                AGGCT  GCTGAG
    Genome 1      2      1
    Genome 2      1      0
    Genome 3      0      1

  Raw score for patterns
                Genome 1  Genome 2  Genome 3
    Genome 1      2+1        1         1
    Genome 2       1        1+1        0
    Genome 3       1         0        1+1

  Calculation of distance matrix from raw score by distance formula

SXComparePattern Approach
  • Distance matrix calculated is the same as before with some exceptions:

      $$D_{ij} = -\log\left(\frac{2 A_{ij}}{L_i + L_j}\right)$$
    where A = shared patterns between genomes i and j, and L = number of patterns for respective genomes
  • Here, the calculation is based on shared patterns between the genomic sequences

SXComparePattern tree for 101 bacterial and archaeal genomes
  • 23 seconds! Compared to 7 days with BLAST

SXComparePattern tree for 101 bacterial and archaeal genomes
  [Tree figure; labelled groups: Chlamydiae, Cyanobacteria, Firmicute]

Performance based on grouping

                     SynaTree           Henz method        SXComparePattern    NCBI
  Phylum / Class     grouped  outliers  grouped  outliers  grouped  outliers   reference
  Fusobacteria          1        0         1        0         1        0        100%
  Cyanobacteria         3        0         3        0         3        0        100%
  Chlamydia             6        0         5        0         6        0        100%
  Firmicute            20        5        16        7        18        7        100%
  Proteobacteria        8        3         6        2         4        7        100%
  Proteobacteria        2        1         2        1         2        1        100%
  Proteobacteria       11        8        12        4        11        8        100%
  Proteobacteria        2        2         4        0         2        2        100%
  Archae               14        2        16        0        13        3        100%
  Actinobacteria        3        3         6        0         3        3        100%
  Deinococcus           2        0         1        0         2        0        100%
  Thermotoga            1        0         1        0         1        0        100%
  Aquificales           1        0         1        0         1        0        100%
  Spirochaete           1        1         1        1         1        1        100%
  Green bacteria        1        0         1        0         1        0        100%

Evaluation of phylogenetic networks
  • Evaluation of phylogenetic networks based on c-score proposed by Henz et al. (2005):
      $$c\text{-score} = \frac{T_c}{T_o}$$
  • Which is essentially a sum of compatible nontrivial splits (Tc) divided by the sum of all nontrivial splits in the test tree (To)
  • Assumption is that the compatibility of nontrivial splits is compared against a reference tree which is deemed 'correct'.
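The c-score described above (compatible nontrivial splits divided by all nontrivial splits of the test tree, judged against a reference tree) can be sketched in Python. This is an unweighted illustration under stated assumptions, not the Henz et al. implementation: each split is given as the set of taxa on one side, every split counts as 1, and a test split counts as compatible only if it is compatible with every reference split. All function names are hypothetical.

```python
def nontrivial(split, taxa):
    """A split is nontrivial when both sides hold at least two taxa."""
    return 2 <= len(split) <= len(taxa) - 2


def compatible(a, b, taxa):
    """Two splits are compatible if one of the four side intersections is empty."""
    a_c, b_c = taxa - a, taxa - b
    return not (a & b) or not (a & b_c) or not (a_c & b) or not (a_c & b_c)


def c_score(test_splits, reference_splits, taxa):
    """Fraction of nontrivial test splits compatible with every reference split."""
    taxa = frozenset(taxa)
    test = [frozenset(s) for s in test_splits if nontrivial(s, taxa)]
    ref = [frozenset(s) for s in reference_splits]
    if not test:
        return 0.0
    t_c = sum(all(compatible(s, r, taxa) for r in ref) for s in test)
    return t_c / len(test)


if __name__ == "__main__":
    taxa = {"A", "B", "C", "D", "E"}
    reference = [{"A", "B"}, {"A", "B", "C"}]  # reference tree ((A,B),C,(D,E))
    test = [{"A", "B"}, {"B", "C"}]            # second split conflicts with (A,B)
    print(c_score(test, reference, taxa))
```

In the example one of the two nontrivial test splits agrees with the reference tree, so the score is 0.5.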

NCBI Reference Tree

Zoomed tree of 488 Bacterial Genomes

Performance comparison
  Rapid method for inferring phylogenetic networks.

  Approach                 Filter  Pattern length limit  Score [a]  Elapsed time (min)
  SXComparePattern         On      15                    0.542      0.4
  SynaTree                 N/A     15                    0.596      95
  Henz et al. (2005) [b]   N/A     N/A                   0.627      10200
  SXComparePattern*        On      15                    -          3.56

  SXComparePattern marked with * is with 488 bacterial sequences
  [Chart: elapsed time (minutes) by approach]

Summary
  • SynaBASE platform extensible to phylogenetics
  • Pattern based approach provides for a very rapid and scalable means of clustering genomes into phylogenetic networks
  • Enables multi-supercomputer performance from a single server
  • This same approach can be used to cluster and analyse previously improbable data sets, e.g.
      • All primate genomes
      • All genes
      • Iterative analysis of evolutionary phylogenetics

END OF WEBCAST
Thank you for your participation!
  • Next webcast will be on April 30: Use of SynaBASE for assembly of reads from 454 Life Sciences sequencing platform
  • A full paper of the work presented will be sent to you on Monday next week
  • Please email [email protected] if you have any questions or would like a free trial
