Getting AI to find scientific laws sometimes works, but it's a long way from science.
As machine-learning algorithms grow more sophisticated, artificial intelligence seems poised to revolutionize the practice of science itself. In part, this will come from the software enabling scientists to work more effectively. But some advocates are hoping for a fundamental transformation in the process of science. The Nobel Turing Challenge, issued in 2021 by noted computer scientist Hiroaki Kitano, tasked the scientific community with producing a computer program capable of making a discovery worthy of a Nobel Prize by 2050.
Part of the work of scientists is to uncover laws of nature—basic principles that distill the fundamental workings of our Universe. Many of them, like Newton’s laws of motion or the law of conservation of mass in chemical reactions, are expressed in a rigorous mathematical form. Others, like the law of natural selection or Mendel’s law of genetic inheritance, are more conceptual.
The scientific community consists of theorists, data analysts, and experimentalists who collaborate to uncover these laws. The dream behind the Nobel Turing Challenge is to offload the tasks of all three onto artificial intelligence.
Outsourcing (some) science
Outsourcing the work of scientists to machines is not a new idea. As far back as the 1970s, Carnegie Mellon University professor Patrick Langley developed a program he called BACON, after Francis Bacon, who pioneered the use of empirical reasoning in science. BACON was capable of looking at data and putting it together in different ways until it found something that looked like a pattern, akin to discovering a new physical law. Given the right data, BACON discovered Kepler’s laws, which govern the orbits planets make around the Sun. However, limited computing power kept BACON from taking on more complex tasks.
In the 1990s, with more computing power at their fingertips, scientists developed automated tools that could search through formulas until they found one that fit a given dataset. This technique, called symbolic regression, bred formulas as if they were living species, with genetic inheritance and mutations, where only the ones that fit the data best would survive. The approach, and variants thereof, spurred a new era of AI scientists, many with similarly referential names like Eureqa and AI Feynman.
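To make the evolutionary metaphor concrete, here is a minimal sketch of that breed-mutate-select loop. For brevity it evolves only the constant and exponent of a power law, not the full expression trees that tools like Eureqa manipulate; the target formula and all parameters are illustrative.

```python
# Toy illustration of the evolutionary idea behind symbolic regression:
# candidate formulas (here restricted to power laws y = c * x**p) are
# mutated and selected by fitness. Real systems evolve full expression
# trees; this sketch only conveys the breed/mutate/select loop.
import random

random.seed(0)

# Synthetic data drawn from y = 2 * x**1.5 (the "unknown" law to recover).
xs = [0.5 + 0.25 * i for i in range(20)]
ys = [2.0 * x**1.5 for x in xs]

def fitness(genome):
    """Mean squared error of the candidate formula y = c * x**p."""
    c, p = genome
    return sum((c * x**p - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(genome):
    """Perturb one 'gene' (the constant or the exponent) slightly."""
    c, p = genome
    if random.random() < 0.5:
        c += random.gauss(0, 0.1)
    else:
        p += random.gauss(0, 0.1)
    return (c, p)

# Start from a random population and evolve: keep the fittest half,
# then refill the population with mutated copies ("offspring").
population = [(random.uniform(0, 5), random.uniform(0, 3)) for _ in range(50)]
for generation in range(200):
    population.sort(key=fitness)
    survivors = population[:25]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(25)]

best = min(population, key=fitness)
print(f"best formula: y = {best[0]:.3f} * x**{best[1]:.3f}")  # ~ 2 * x**1.5
```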
These sophisticated algorithms can extract new formulas, which may describe scientific laws, from vast datasets. Present them with enough raw information, and they'll identify and quantify underlying relationships, spitting out plausible hypotheses and equations. They play the role of the data analyst, but experts caution that this approach is far from replacing human scientists.
“The biggest roadblock is knowledge representation,” says Ross King, a machine-learning researcher at the University of Cambridge. “Because if you look at big breakthroughs, like Einstein’s theory of special relativity, it came from a philosophical question about magnetism. And it’s a reformulation of our knowledge. We’re nowhere near a computer being able to do that.”
Leveraging existing knowledge
To truly make groundbreaking discoveries, King argues, the way the machines represent knowledge has to be more sophisticated than simply pushing around algebraic expressions until they find one that fits. There needs to be a way to represent more of the abstract, almost philosophical formulations of knowledge and understanding—they have to handle laws in both their mathematical and non-mathematical forms.
As a step in that direction, researchers at IBM have created a new AI scientist with a novel feature: incorporating prior knowledge. Human scientists often start with well-established basic principles and deduce more intricate or specific relationships from there; they don’t solely rely on new data.
The IBM program, named AI Descartes, merges data-driven discovery with a knowledge of theory for the first time. “This is what real scientists do,” said Cristina Cornelio, a research scientist now at Samsung AI who led the effort. Like many previous machine scientists, AI Descartes looks at new data and compiles a list of potential underlying formulas. Unlike previous software, however, it doesn’t stop there: it then considers relevant prior knowledge, checking how well the suggested formulas fit into the bigger picture.
AI Descartes is a three-step system designed to make the most sense out of a set of data, given some theoretical background. Its first step is similar to previous machine scientists: looking at noisy data and searching for a formula that fits without being overly complicated. For example, one of the classic equations it re-discovered was Kepler's third law, which describes how planets orbit the Sun. Descartes' handlers fed the system the masses of the Sun and each planet, their distances to the Sun, and the number of days each takes to complete one revolution. The system used a version of symbolic regression to construct possible formulas from component terms and searched for one that could predict the orbital period of any planet based on mass and distance. Usually, this procedure results in a few possible formulas with varying levels of complexity (with simpler ones typically being less accurate).
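As a rough illustration of that first step, a simple log-space fit to real orbital data (semi-major axis in astronomical units, period in years) already recovers the distance exponent of Kepler's third law. AI Descartes' actual search is a far more general symbolic regression over candidate expressions, not this shortcut.

```python
# Rough illustration of step one on the Kepler example: given orbital data,
# find the exponent p so that T ~ C * a**p. Because the Sun's mass dominates
# every orbit in this dataset, the mass dependence cannot be recovered here
# (a point the article returns to below).
import numpy as np

# Semi-major axis a (AU) and orbital period T (years) for six planets.
a = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])
T = np.array([0.241, 0.615, 1.000, 1.881, 11.862, 29.457])

# In log space, T = C * a**p becomes log T = log C + p * log a,
# an ordinary linear regression.
p, logC = np.polyfit(np.log(a), np.log(T), 1)
print(f"T = {np.exp(logC):.3f} * a**{p:.3f}")  # exponent ~ 1.5: Kepler's third law
```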
For the second step, AI Descartes turns to the known background theory to check whether any of the candidate formulas make scientific sense and to help break the tie. To do this, it makes use of a “logical reasoning module” that works essentially as a theorem prover—verifying logical connections without the need for actual data. It starts with fundamental rules and concepts, expressed as a set of equations entered by human researchers. For the case of Kepler's law, these included expressions for gravitational and centrifugal forces, as well as basic premises, like the requirement that mass always be positive. Then, the reasoning module tries to expand its background knowledge one logical step at a time, using the fundamental rules to generate more and more formulas that are still valid.
If one of the first step’s candidate formulas pops up in that list, it immediately becomes the favorite, since it is provable from the background theory alone.
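To get a feel for what the reasoning module verifies, here is a hand-rolled version of the derivation in this Kepler example, using SymPy as a stand-in: the force-balance axioms mentioned above fix the orbital period symbolically, with no data involved. The actual module is a general theorem prover, not this single scripted derivation.

```python
# Hand-rolled sketch of the kind of derivation the reasoning module can
# check: starting from the supplied axioms (gravitational force, centrifugal
# force for a circular orbit), solve symbolically for the orbital period.
import sympy as sp

G, M, m, r, T = sp.symbols("G M m r T", positive=True)

# Axiom 1: gravitational force on the orbiting body.
F_grav = G * M * m / r**2

# Axiom 2: centrifugal force for a circular orbit of period T,
# with orbital speed v = 2*pi*r / T.
v = 2 * sp.pi * r / T
F_cent = m * v**2 / r

# A stable circular orbit requires the two forces to balance; solve for T.
period = sp.solve(sp.Eq(F_grav, F_cent), T)[0]
print(period)  # 2*pi*r**(3/2)/(sqrt(G)*sqrt(M)), i.e. T**2 = 4*pi**2*r**3/(G*M)
```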
Imperfect matches
Of course, it’s more likely the theorem prover won’t generate an exact match for a candidate formula—if the formula is easily derived from the background theory alone, one might question the necessity of the data in the first place. In the Kepler’s law example, none of the three formulas it identified in the first step could be derived from existing knowledge alone.
But the ways in which the candidate formulas fall short can be enlightening. This is the crucial third step: determining which candidate formula is closest to the possibilities suggested by the background theory. To do this, AI Descartes uses three separate ways of describing the distance between the candidate data-driven formulas and those derivable from background theory—something that can be done even without an explicit ‘correct’ formula. “That’s the magic of the theorem prover,” Cornelio says.
These definitions of distance vary, but they all involve trying to derive the candidate formula from the background theory under a few different assumptions. The distances help tease out why the formula might be underivable from the background theory and thus suggest future courses of action. One checks that the data itself isn’t inconsistent with the theory; the second examines whether the formula has overfit the noisy data; and the third checks whether the candidate formula has a sensible dependence on each of the variables (for example, the masses and distances of the planets in the Solar System).
By looking at all three error measures, AI Descartes picked the least offensive version of Kepler’s law. All three candidate formulas did reasonably well on the first and second tests, but the third revealed that none of them had a theory-approved dependence on mass, and only one had the appropriate dependence on the distance between the planets and the Sun. So, the AI concluded that the distance-dependent formula is a good approximation for the range of masses of bodies in the Solar System.
To do better, the team turned to a dataset that included pairs of stars orbiting each other. Then, the AI learned the proper dependence on mass and fully re-discovered Kepler’s law.
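A toy numerical comparison shows why the extra dataset was needed. In simplified units with G = 1, a candidate formula with no mass dependence agrees perfectly with the theory-derived one as long as every orbit circles the same central mass, and it only diverges once the central mass varies, as it does for binary stars. (The paper's actual distance measures are defined through the theorem prover, not by a direct numerical comparison like this.)

```python
# Toy check of the missing mass dependence: in units with G = 1, theory
# gives T = 2*pi * r**1.5 / sqrt(M). The candidate below drops the mass term.
import numpy as np

def T_theory(r, M):
    return 2 * np.pi * r**1.5 / np.sqrt(M)

def T_candidate(r, M):
    return 2 * np.pi * r**1.5  # no dependence on the central mass M

r = np.linspace(0.4, 10.0, 50)

# Every Solar System fit uses the same central mass (the Sun, M = 1),
# so the candidate looks flawless despite the missing term...
print(np.max(np.abs(T_theory(r, 1.0) - T_candidate(r, 1.0))))  # 0.0

# ...but data spanning different central masses (binary stars) exposes it.
print(np.max(np.abs(T_theory(r, 2.0) - T_candidate(r, 2.0))))  # clearly nonzero
```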
If the program fails to find a formula that at least partially fits both data and theory, it can recommend follow-up experiments to produce additional data that would help it distinguish between candidate formulas.
A long road ahead
Consulting prior knowledge allows the program to make meaningful inferences from far fewer data points. Aside from Kepler’s law, AI Descartes has re-derived several well-known laws in physics and chemistry from as few as 10 pieces of data, and it may soon help scientists crack unsolved problems. “In many problems, making the measurement is difficult,” Cornelio says, “both from the experiment point of view and also in terms of cost. So in many cases, you have really noisy data with very few points. That’s where AI Descartes would be most useful.”
It's not going to win the Nobel Turing Challenge on its own, says King, but “AI Descartes is a step towards that. It’s one of the pieces that’s required.” Machine science expert George Karniadakis at Brown University agrees: “I commend the effort because it’s in the right direction,” he says, “but we are not at the point where we have enough intelligence yet.”
One issue is that, while AI Descartes can analyze data and recommend experiments, the system cannot perform the experiments itself. And even more serious is the lack of systematized sets of background knowledge axioms for it to build on in its second step. It will be even harder to give an AI the ability to re-formulate that knowledge by starting from completely different premises or an alternate conceptual framework, rather than by adding formulas to the existing structure. Yet that ability is critical for navigating fields where there are multiple competing hypotheses, like finding a quantum-compatible version of gravity.
“If you think about the history of science,” Karniadakis says, “the big discoveries came from ‘aha’ moments. An aha moment is like you're going down the road and you discover you’ve had the wrong assumptions and you realize they’re the wrong assumptions. These machines cannot realize that.”
But AI Descartes is showing one possible way to start getting us there, and the researchers are already at work on the next steps. “Theory can be incomplete, and at times incorrect,” says Lior Horesh, a senior manager at MIT-IBM Research who led the project. “So, our next question is, ‘Can we somehow bring both numerical data and theorems to a common ground, where they can exchange value and simultaneously guide us towards the discovery of new models?’ One way or another, I hope that AI Descartes and future AI advancements can help us unveil some of the mysteries of the Universe.”