Technology Overview


BIOwulf is advancing Machine Learning through the development and application of kernel-based learning algorithms such as "Support Vector Machines." Our enabling technology uses supervised learning techniques to tailor the analysis to a given problem domain. This methodology requires the development of specialized algorithms, a formidable task.

Machine Learning approaches do not rely upon explicitly programmed rules for solving problems. Instead, a set of "training examples" is assembled, and a learning algorithm automatically builds an intelligent system from them, learning the pre-existing organization within the dataset, i.e. extracting its hidden structure through pattern recognition. Machine Learning is thus concerned with the design and analysis of computer programs that improve with experience.
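
As a minimal sketch of this idea (assuming the scikit-learn library and a synthetic toy dataset, neither of which is part of BIOwulf's technology), the following Python fragment builds a classifier purely from labeled training examples and then applies it to unseen data:

```python
# Minimal illustration of supervised learning from "training examples":
# no problem-specific rules are coded; the classifier infers structure
# from labeled data alone. (Illustrative only; scikit-learn assumed.)
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate a toy set of labeled training examples: feature vectors X, labels y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The learning algorithm builds the system automatically from the examples.
model = SVC(kernel="rbf")
model.fit(X, y)

# The trained system can now classify previously unseen data points.
print(model.predict(X[:5]))
```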

Algorithms based on descriptive statistics can sometimes lead to acceptable results for simple tasks. The real difficulty, however, lies in problems of higher dimensionality, i.e. where many input variables exist and the underlying patterns are of a complex nature. In order to exploit patterns in the data, two capabilities are necessary: (1) detecting the patterns and (2) ensuring that the patterns are reliable and not the spurious product of chance. The first is a computational problem, the second a statistical one. Fortunately, Machine Learning has seen a breakthrough on both fronts: a new generation of learning systems has been created that combines remarkable generalization performance with computational efficiency and theoretical elegance.
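
The statistical side of the problem, i.e. checking that detected patterns generalize rather than arise by chance, is commonly assessed by evaluating the system on data held out from training. The sketch below (again assuming scikit-learn and a synthetic high-dimensional dataset, both illustrative assumptions) shows one standard way of doing this with cross-validation:

```python
# Sketch of the "statistical problem": checking that discovered patterns
# generalize rather than arise by chance. Held-out evaluation via
# cross-validation is one standard way to do this. (Illustrative only;
# scikit-learn assumed.)
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# High-dimensional data: many input variables relative to the sample size.
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

# Accuracy on data the model has never seen estimates whether the
# detected patterns are real or spurious.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print("held-out accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```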

A prominent example of this breakthrough is a new class of learning machines built on kernel similarity measures and support vectors. The idea is that the choice of a certain nonlinear similarity measure (the kernel) used to compare data can be mathematically understood as a mapping into a high-dimensional feature space. The learning algorithm operates implicitly in that feature space; because it accesses the data only through the similarity measure, the mapping itself never has to be computed explicitly. In the case of pattern recognition, for instance, a separating hyperplane is constructed in the feature space. Two aspects of this hyperplane are crucial. First, it is unique, since it is found by solving a so-called convex quadratic programming problem. This avoids a standard problem of Neural Networks, which suffer from multiple local solutions whose quality is hard to assess. Second, the hyperplane can be characterized by a small subset of the training data, called the support vectors. These are identified as the crucial data points in the training set.
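
The following sketch (assuming scikit-learn; the dataset and kernel choice are illustrative, not BIOwulf's proprietary algorithms) shows a kernel Support Vector Machine in action: the nonlinear kernel defines the similarity measure, the convex optimization is solved internally by the fitting routine, and the resulting hyperplane is characterized by a small subset of the training points, the support vectors.

```python
# Sketch of a kernel-based Support Vector Machine: the RBF kernel acts as
# a nonlinear similarity measure, implicitly mapping the data into a
# high-dimensional feature space where a separating hyperplane is found
# by convex quadratic programming. (Illustrative only; scikit-learn assumed.)
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A nonlinearly separable two-class problem.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# The kernel choice defines the similarity measure / implicit feature space.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# The hyperplane is characterized by a small subset of the training data:
# the support vectors.
print("training points:", len(X))
print("support vectors:", len(clf.support_vectors_))
```

Because the underlying optimization problem is convex, repeated training runs on the same data yield the same hyperplane, in contrast to the multiple local solutions encountered when training Neural Networks.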