Dr. Bertil Schmidt

Associate Professor, School of Computer Engineering, Nanyang Technological University

Welcome!

Latest news

Workshop at ICCS 2010: I am organizing the "2nd Workshop on Emerging Parallel Architectures" - to be held at ICCS 2010.

PRIB 2008: I will be presenting two papers at the PRIB 2008 conference in Melbourne, 15-17 Oct 2008.

Program Committees: I am on the Programme Committees of IEEE HiCOMB 2009, IEEE CEC 2009, and PBC 2009.

EFL: My soccer team, South Buona Vista Saints, is currently leading the EFL.

Links:

Bertil's Research

Posted on Tuesday, July 15, 2008 at 2:53 PM by Bertil Schmidt

Research Interests

  • High Performance Computing
  • Bioinformatics
  • Reconfigurable Computing
  • GPGPU
  • Heterogeneous Multicores

Research Background

New high performance computing challenges emerge daily. Each problem raises questions such as:

  • What algorithms and data structures to use?
  • How to exploit parallelism?
  • Which computer architectures will minimize the execution time?

My group is working on solutions to emerging high performance computing problems on hybrid parallel computer architectures with a very low price-performance ratio. Such architectures can provide the flexibility to speed up a wide range of algorithms at both the fine-grained and the coarse-grained parallel level.

Examples of architectures at the fine-grained parallel level are

  • reconfigurable architectures (FPGA)
  • graphics architectures (GPU)
  • heterogeneous multi-core architectures (in particular the Cell BE).

Examples of coarse-grained parallel and distributed architectures include

  • PC clusters
  • Computational Grids (based on the BOINC middleware)

We are currently investigating the use of this infrastructure to support advanced algorithms and applications in several different domains. Some examples are described in the following.

1. Bioinformatics

Algorithms and Systems for New Sequencing Technologies (such as Solexa/Illumina):

Next-generation, rapid, low-cost genome sequencing promises to address a broad range of genetic analysis applications. One of the ambitious goals for these technologies is to produce a complete human genome in a reasonable time frame for US$100,000, and eventually US$1,000. To do this, throughput must be increased dramatically, which is achieved by carrying out many reactions in parallel. Although the read length is short (currently around 35 base pairs for Solexa/Illumina), the overall throughput is enormous: each run produces up to several hundred million reads and billions of base pairs of sequence data. Computational methods for analyzing and managing the massive numbers of short reads produced by these platforms are therefore urgently needed. In particular, existing assembly tools and algorithms have been designed and optimized for shotgun sequencing. Since new sequencing technologies use much shorter reads, these approaches cannot be applied to assembling data from next-generation sequencing technologies (in particular, the characterization of sequencing errors is different).

We are currently working on the design of efficient assembly algorithms on high performance computers for new sequencing technologies.
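To make the assembly problem concrete, here is a minimal Python sketch of greedy overlap-based assembly on a handful of error-free toy reads. It is not the algorithm our group is developing; the function names (overlap, greedy_assemble) and the min_len parameter are assumptions chosen purely for illustration. Real short-read data contain sequencing errors and repeats, and the quadratic pair search used here would not scale to hundreds of millions of reads.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no exact overlap of at least min_len characters exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest exact overlap.

    Toy illustration only: it assumes error-free reads and compares all
    pairs in every round.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three overlapping 7-base reads taken from a 15-base toy sequence.
toy_reads = ["CAATCCG", "ATGGCGT", "CGTGCAA"]
contigs = greedy_assemble(toy_reads, min_len=3)
```

With reads this short, the minimum overlap length becomes a delicate trade-off: a small value admits spurious merges, while a large value leaves many reads unmerged.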

Partners: Shi Haixiang (NTU), Bryan Beresford-Smith (NICTA), Jan Schroeder (Kiel Uni), Heiko Schroder (RMIT), Simon Puglisi (RMIT), Ranjan Sinha (Melbourne Uni)

Pandemic Control System.

Recent occurrences of pandemics such as SARS or avian influenza clearly display the threat and seriousness of global diseases. Steadily growing globalization makes it difficult to contain pandemics within a certain region. Therefore, pandemic control is of the highest importance to human health. Unfortunately, the segmented nature of viruses is very conducive to genetic shift, and their rapid spread across various genera augments genetic drift. For instance, the H5N1 outbreak in Hong Kong in 1997 demonstrated the ability of an avian virus to jump directly from birds to humans. This project investigates a new approach to pandemic control by constantly monitoring molecular evolution at both the macro level (within the group of viruses) and the micro level (within the group of strains). The goal is to facilitate prediction of how viruses are evolving in the spatial, temporal, and host dimensions, and thereby allow fast and efficient responses to new outbreaks as well as their diagnosis. To achieve high-quality predictions, computational analysis of these viruses is required at both the gene and the genome level. However, the corresponding algorithms suffer from long runtimes due to their high computational complexity as well as the large datasets involved. Therefore, it is necessary to develop IT solutions that use suitable algorithms and take advantage of high performance technologies.

Partners: D.T. Singh (Genvea Biosciences)

Related Publication: D.T. Singh, R. Trehan, B. Schmidt, T. Bretschneider: "Comparative Phyloinformatics of Virus Genes at Micro and Macro Levels in a Distributed Computing Environment", BMC Bioinformatics, Vol. 9:S23, 2008

2. Simulation

Computational simulation of complex systems is of the highest importance to R&D in science and engineering. Areas that require extensive simulation include the engineering design of modern aircraft, the development of new materials (such as carbon nanotubes) and drugs, and climate prediction. Most of these simulations are highly compute-intensive. Examples include computational electrodynamics modeling techniques (such as FMM) and molecular dynamics simulations for protein docking. However, highly accurate simulations require large input data sets. This, in turn, leads to prohibitive runtimes on traditional computer architectures. The purpose of this project is to investigate and evaluate the suitability of new classes of emerging high performance architectures (such as FPGAs, GPUs, and the Cell BE) for engineering simulations.

Partners: Liu Weiguo (NTU), G. Alleon (EADS), P.K. Kolatkar (GIS), Francis Nai (PhD Student)

Related Publication: W. Liu, B. Schmidt, G. Voss, W. Mueller-Wittig: "Accelerating Molecular Dynamics simulations using Graphics Processing Units with CUDA", Computer Physics Communications, in press, doi:10.1016/j.cpc.2008.05.008

3. Data Mining and Database Searching

Data mining is increasingly used in science and engineering to extract information from the enormous data sets generated by modern experimental and observational methods. The process of finding statistically overrepresented patterns in a data set can be divided into three phases: model specification, model evaluation, and search. Model specification includes the selection of a suitable analysis method from a whole range of available algorithms. The algorithms used are usually iterative optimization methods such as Expectation Maximization (EM), Gibbs sampling, or evolutionary algorithms. To avoid getting trapped in local minima, a large number of starting points or populations have to be tested. This, in turn, can be highly compute-intensive, depending on the size of the data set, the model representation, and the objective function. Using high performance computers for this task allows more complex model representations and objective functions to be applied to larger input data sets. One example is the set of hybrid parallel EM algorithms for the identification of Transcription Factor Binding Sites (TFBS) in genomic sequences, which we have recently developed.
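The multi-start idea can be sketched in a few lines of Python. The example below is not our TFBS method: as a stand-in, it runs EM on a toy two-component Gaussian mixture with unit variances, and all names (make_demo_data, em_run, multi_start_em) and parameter values are assumptions made for illustration. Each restart is independent of the others, which is what makes this search step a natural candidate for coarse-grained parallelism.

```python
import math
import random


def make_demo_data(seed=1, n=100):
    """Toy data: n points around 0.0 and n points around 5.0 (unit variance)."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n)]
            + [rng.gauss(5.0, 1.0) for _ in range(n)])


def log_likelihood(data, w, mu1, mu2):
    """Log-likelihood of a two-component, unit-variance Gaussian mixture."""
    total = 0.0
    for x in data:
        p1 = w * math.exp(-0.5 * (x - mu1) ** 2)
        p2 = (1.0 - w) * math.exp(-0.5 * (x - mu2) ** 2)
        total += math.log((p1 + p2) / math.sqrt(2.0 * math.pi))
    return total


def em_run(data, mu1, mu2, w=0.5, iterations=50):
    """One EM run from a given starting point; returns (loglik, w, mu1, mu2)."""
    for _ in range(iterations):
        # E-step: responsibility of component 1 for every data point.
        resp = []
        for x in data:
            p1 = w * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1.0 - w) * math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: re-estimate the means and the mixing weight.
        n1 = max(sum(resp), 1e-12)
        n2 = max(sum(1.0 - r for r in resp), 1e-12)
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n2
        w = min(max(n1 / len(data), 1e-6), 1.0 - 1e-6)
    return log_likelihood(data, w, mu1, mu2), w, mu1, mu2


def multi_start_em(data, restarts=10, seed=0):
    """Run EM from several random starting points and keep the best result."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    runs = [em_run(data, rng.uniform(lo, hi), rng.uniform(lo, hi))
            for _ in range(restarts)]
    best = max(runs, key=lambda run: run[0])  # highest log-likelihood wins
    return best, runs


best, runs = multi_start_em(make_demo_data(), restarts=10)
```

The total cost grows linearly with the number of restarts, the number of iterations, and the size of the data set, so realistic models quickly become expensive on a single processor.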

Another example that we are currently working on is fast searching of large genomic databases using BLAST and BLAST-like tools.
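The seed-and-extend principle behind such tools can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not BLAST itself: it uses exact k-mer matches and ungapped, exact-match extension only, whereas BLAST scores matches with substitution matrices and applies statistical significance tests. The function names (build_kmer_index, find_seeds, extend_seed) and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every k-mer to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query, index, k):
    """Return (name, query position, database position) for each exact k-mer hit."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((name, qpos, dpos))
    return hits


def extend_seed(query, subject, qpos, dpos, k):
    """Extend a seed to the right while characters match; return the match length."""
    length = k
    while (qpos + length < len(query)
           and dpos + length < len(subject)
           and query[qpos + length] == subject[dpos + length]):
        length += 1
    return length


# Toy database and query (made-up sequences).
database = {"seq1": "ACGTACGTGA", "seq2": "TTGACCA"}
index = build_kmer_index(database, k=4)
seeds = find_seeds("CGTGAC", index, k=4)
```

Building the index is a one-off cost per database, while seed lookup and extension are repeated for every query; the latter is the part that benefits most from acceleration.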

Partners: Liu Weiguo (NTU), D. Maskell (NTU), V. Brusic (Harvard), M. Rajapakse (I2R), H. Schroder (RMIT)

Related Publications:

  • C. Chen, B. Schmidt, W. Liu, W. Mueller-Wittig: "GPU-MEME: Using Graphics Hardware to Accelerate Motif Finding in DNA Sequences", PRIB 2008, Lecture Notes in Bioinformatics, Springer, to appear
  • H. Zhang, B. Schmidt, W. Mueller-Wittig: "Accelerating BLASTP on the Cell Broadband Engines", PRIB 2008, Lecture Notes in Bioinformatics, Springer, to appear
  • M. Rajapakse, B. Schmidt, F. Lin, V. Brusic: "Predicting peptides binding to MHC class II molecules using multi-objective evolutionary algorithms", BMC Bioinformatics, 2007, 8:459
Edited on: Wednesday, July 16, 2008 9:29 AM
