Most of us have encountered the excitement surrounding Artificial Intelligence (AI) and Machine Learning during the past two or three years. In the popular press, at business conferences and in Gartner reports, we are frequently reminded that AI has progressed by leaps and bounds in the recent past. In particular, a method known as Deep Learning has been used to develop systems that perform surprisingly well in a wide range of tasks. These successes include image recognition (for instance, tagging Facebook pictures), machine translation (most prominently used in Google Translate) and self-driving cars, which are being developed at various companies and universities. More recently, systems using Deep Learning have achieved notable victories in board games, with a program known as AlphaGo beating one of the world's top players at the ancient East Asian game of Go, and a related program (AlphaZero) performing at superhuman levels in chess.
Extrapolating from these unprecedented achievements, many technology watchers have predicted that Deep Learning is poised to transform AI, business and society in rapid succession. Gartner, for example, has predicted that AI will be the core element in three of the top ten technology trends for 2018, with Deep Learning being an implicit component in each of those predicted advances: “AI Foundation”, “Intelligent Apps and Analytics” and “Intelligent Things”. (At least two of the other “technology trends” foreseen by Gartner, namely “Conversational Platforms” and “Continuous Adaptive Risk and Trust”, are also likely to require Deep Learning.) In the same vein, publications such as the New York Times and The Economist have published several articles in recent months on the looming importance of AI in areas ranging from poverty alleviation to consumer electronics. Beyond these prospects lie visions of superhuman intelligence solving the world’s problems and, eventually, the technological singularity which will dramatically alter the entire human project.
Of course, predictions – especially about the future – are tricky, and a number of issues need to be considered when the potential of Deep Learning is assessed. In this blog, I will briefly mention three of the current controversies that are most relevant to the future of Deep Learning, and in future blogs I will explore each of these topics in more detail.
One area of widespread concern amongst the experts is our very limited theoretical understanding of Deep Learning. Our current Deep-Learning algorithms have evolved from concepts that were developed in the 1980s; those algorithms were themselves not thoroughly understood, and the dramatic improvements of the past five years were mostly achieved by trial and error. Consequently, we do not have a good model for explaining why Deep Learning performs so well, or how to make systematic improvements to its capabilities. One popular theoretical model from the past focused on the relative paucity of parameters that are used to describe a large data set (the so-called “Occam’s razor” principle). However, it is becoming increasingly clear that this is not a sufficient basis for understanding Deep Learning: different Deep Neural Networks with the same number of parameters but different network structures differ systematically in their performance. Several proposals for theoretical models have recently appeared in the Deep-Learning literature, but it is probably safe to say that these have not fully explained why Deep Learning succeeds so well – they certainly have not influenced the practical development of systems to a significant extent. Our understanding of issues such as generalization and learning algorithms is still highly immature.
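To make that last point concrete, a parameter count by itself cannot distinguish between architectures. The sketch below (plain Python, with arbitrary layer sizes chosen purely for illustration) shows two fully connected networks with identical parameter counts but different depths; depth is exactly the kind of structural property that, empirically, changes performance in ways the parameter-counting view cannot explain.

```python
def n_params(layer_sizes):
    """Parameter count of a fully connected network:
    each layer contributes (inputs x outputs) weights plus one bias per output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

shallow = [8, 8, 1]     # one hidden layer of width 8
deep    = [8, 5, 5, 1]  # two hidden layers of width 5

# Same "description length", different structure:
print(n_params(shallow))  # 81
print(n_params(deep))     # 81
```

An Occam's-razor account that scores models only by the number of parameters would treat these two networks as equivalent, which is precisely what the empirical results contradict.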
On a somewhat related note, there is also much skepticism in the literature about how far the current batch of approaches will take us. Although Deep Learning has been applied in a wide range of domains, those applications share many similarities: in each case, the task can be represented as a mapping between an input of fixed dimensionality and an output of fixed dimensionality; that mapping can be adequately captured by a set of “training examples”; and a large number of such training examples is available for developing the system. Biological intelligence, including human intelligence, on the other hand operates under a much wider range of conditions. We generate our own “training examples” from a continuous stream of experience, and solve many problems which are not naturally cast as input-output mappings of the type required for Deep Learning. A variety of extensions to Deep Learning have been proposed in order to address these differences, but many informed critics remain doubtful that anything like human intelligence can be achieved in this manner.
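The shape of the tasks where Deep Learning thrives can be shown with a deliberately tiny sketch: a model with a fixed-size input and a fixed-size output, fitted to a finite list of labelled training examples. (A toy linear model trained by gradient descent stands in for a deep network here; the example data and learning rate are invented for illustration.)

```python
# The supervised-learning mold: a finite set of (input, output) pairs,
# here generated from the hidden rule y = 2x + 1.
examples = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

w, b = 0.0, 0.0            # model parameters, randomly-ish initialised
lr = 0.01                  # learning rate
for _ in range(2000):      # gradient descent on squared error
    for x, y in examples:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err

print(f"learned w={w:.2f}, b={b:.2f}")  # approaches w=2, b=1
```

Everything a problem must offer to fit this mold is visible above: fixed input and output dimensions, and a pre-packaged list of examples. Tasks that arrive as an open-ended stream of experience, with no one handing us labelled pairs, do not slot into this template.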
Finally, there are good reasons to wonder whether it is even desirable that Deep Learning should achieve anything like human-level intelligence. One type of concern is represented by the warning from Stephen Hawking, when he said “I think the development of full artificial intelligence could spell the end of the human race.” Others worry that vastly improved AI will exacerbate inequality and unemployment – if most human occupations can be performed by intelligent algorithms, the owners of systems executing those algorithms will become extremely wealthy whereas most people will no longer be able to support themselves through salaried employment.
Although both of these concerns may seem overblown given the limitations of Deep Learning mentioned above, it would be foolish not to think about them at an early stage. One clear lesson from the past decade is that the capabilities of algorithms can improve very rapidly, and even several decades may be too short a period to prepare for changes of such magnitude.
All these unknowns are signs of a field in rapid transition, and in future blogs I will take a more detailed look at each of the above issues.