Computational Knowledge Discovery and Learning in Complex Systems
Simon Kasif
University of Illinois at Chicago &
Johns Hopkins University, Baltimore
The goal of scientific research is to understand the world around us in general, and the behavior of complex systems and processes (physical, biological, social, organizational and computational) in particular.
This understanding allows us to create new methods to monitor and manipulate complex systems, to invent new processes, and subsequently to create numerous opportunities for improving the quality of human life and endeavor.
The scientific method typically involves two essential steps:
Model Formation:
While this approach to science has had, and still has, enormous successes (largely due to the remarkable human intellect behind it), the scientific method sketched above was, before computers, necessarily limited in scope, largely because of a severe lack of computational resources, infrastructure and tools supporting model formation and verification.
In particular:
It is therefore not surprising that the physical processes we understood in previous centuries are often captured by a single equation. However, many processes have substantial inherent complexity (e.g., Kolmogorov complexity) and are difficult to specify, understand and analyze without computational tools. Such tools expand our ability to probe into the mystery of vastly more complex systems (such as the brain, biological systems, economic processes, computational finance, and organizational structures) than have been tackled in the past.
Modern computers allow us to fundamentally change our approach to the scientific process in general and scientific discovery, modeling, and verification in particular.
The new "Computational Method" for discovery in science and engineering includes:
The most interesting and perhaps most promising aspect of this method is that it tends to have a profound effect on many rather diverse disciplines. This is perhaps not surprising, considering that the "mathematical method" also had a profound effect on a variety of disciplines.
Two particularly prominent examples of this new approach to scientific discovery stand out. The first is the use of probabilistic methods, which became feasible only due to recent computational developments in theory, algorithms and hardware, and which now dominate a wide variety of disciplines such as gene finding and location, protein function understanding, speech understanding, natural language understanding, user modeling in Microsoft systems and other domains.
At the same time, computational biology and bioinformatics is a field that is becoming technology rich and is proceeding at enormous speed, building on Web-based biological databases and on systems built to facilitate and aid scientific discovery.
In this talk we will describe several new-generation systems based on intelligent systems technology that perform gene finding, DNA sequence modeling and, more generally, retrieval in biological databases. The systems are based on learning algorithms and adaptive probabilistic representations that became feasible only recently due to the remarkable breakthroughs in computing speed and memory capacity.
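As a purely illustrative sketch of what a learned probabilistic representation of DNA sequence can look like, the Python fragment below fits a first-order Markov chain to example sequences and compares two such models with a log-odds score. This is not the systems described in the talk: the function names (train_markov, log_likelihood, log_odds), the add-one smoothing, and the toy sequences are all assumptions chosen for illustration.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate P(next base | previous base) from example sequences.

    Hypothetical helper: every transition starts at `pseudocount`
    (additive smoothing) so unseen transitions keep nonzero probability.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model

def log_likelihood(model, seq):
    """Sum of log transition probabilities along seq (first base is not scored)."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def log_odds(model_a, model_b, seq):
    """Positive when model_a explains seq better than model_b."""
    return log_likelihood(model_a, seq) - log_likelihood(model_b, seq)

# Toy training data (invented for illustration, not real genomic sequence).
gc_model = train_markov(["GCGCGGCC", "CCGGCGCG"])
at_model = train_markov(["ATATTAAT", "TTAATATA"])

print(log_odds(gc_model, at_model, "GCGGCC") > 0)  # GC-rich query favors gc_model
print(log_odds(gc_model, at_model, "ATTATA") < 0)  # AT-rich query favors at_model
```

Scoring a sequence under two competing models is one simple way to decide which class of region it resembles; practical gene finders use considerably richer models and far more training data than this sketch.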
We expect the technology and the scientific knowledge created by this research to have a major impact on scientific discovery and computational learning of complex systems in the physical sciences, engineering, business and social science communities. This computational approach to knowledge discovery appears to enhance the opportunities for scientific breakthroughs and for learning complex systems, and to aid in probing the most challenging problems of our society.