Your Personal AI (PAI): Pt 4 — Deep Agents (Deep Learning and Natural Intelligence)
A Multi-Part Series
This is an excerpt from my book, The Foresight Guide, free online at ForesightGuide.com. The Guide introduces the field of professional foresight and offers a Big Picture view of our accelerating 21st century future.
Why will our smart agents and PAIs soon become as indispensable as the web and our smartphones are today? Why will most of us be joking, and some of us seriously thinking, that our PAIs are “our better selves” in 2040, and for some of us, even 2030? To understand this key aspect of our global future, our next two posts will take a deep look at deep learning, a new paradigm not only for machine learning but for future computer development.
This will be a long post, as it is about the technology behind the greatest story of our collective future, the advent of machines that think and feel like us, so I make no apologies for its length. Plenty of people will write the short versions. But there are many doubts and misconceptions about these topics, and I hope the length will clear up a few of each.
There are also some rewards at the end, to make up for this post’s length. The first reward is the “PAI superlongevity” (a new “superpower”: the ability to live as long as we consider ourselves useful) and “Mind Meld” (aka “Merging With our PAIs”) prediction. It describes how humanity will increasingly use these highly personalized and naturally intelligent technologies, deep learning and personal AIs, to solve the “death problem” in a few ways we’ve never seen before. The second reward is much more prosaic: some good investment tips and a few calls to action.
Like any long-term exponential process, the growth of PAIs will start out looking slow, then become lightning fast, and at some point we’ll see PAIs as simply a natural extension of us. For a nice intro to the power of exponentials, see Adrian Paenza’s TED-Ed lesson, How folding paper can get you to the moon (2012). Fold an ordinary piece of paper twenty-three times, and you get to the top of Big Ben. That’s surprising. Fold it twenty-two more times, and you get to the Moon. That’s difficult to imagine. Fold it fifty-five more times (100 total folds, or informational doublings), and you’re now eight billion light years away from Earth. That’s almost impossible to imagine. But that’s how exponentials run. So knowing where they operate most on Earth (infotech and nanotech), and when they’ll end, has become strategically critical.
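The doubling arithmetic is easy to check for yourself. The sketch below assumes a sheet 0.1 mm thick; that value is an assumption, the lesson's own figures may rest on different ones, and every landmark moves by a few folds as the assumed thickness changes. Under this assumption the stack passes the Moon's average distance (about 384,400 km) at fold 42.

```python
# Stack height after repeated folding: each fold doubles the thickness.
# ASSUMPTION: a 0.1 mm sheet; change SHEET_M to see how the landmarks move.
SHEET_M = 1e-4               # sheet thickness in metres (assumed)
LIGHT_YEAR_M = 9.4607e15     # metres per light year
MOON_M = 384_400e3           # average Earth-Moon distance in metres


def thickness_after(folds, sheet_m=SHEET_M):
    """Height of the folded stack, in metres, after `folds` doublings."""
    return sheet_m * 2 ** folds


def folds_to_reach(distance_m, sheet_m=SHEET_M):
    """Smallest number of folds whose stack height reaches `distance_m`."""
    folds = 0
    while thickness_after(folds, sheet_m) < distance_m:
        folds += 1
    return folds


if __name__ == "__main__":
    print("folds to pass the Moon:", folds_to_reach(MOON_M))
    print("light years after 100 folds:", thickness_after(100) / LIGHT_YEAR_M)
```

Each extra fold adds only one to the exponent, which is why a few more folds carry the stack from a tower to the Moon.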
It was nice to see Klaus Schwab, Chairman of the World Economic Forum, promote acceleration-awareness in his Fourth Industrial Revolution theme at Davos 2016, and in his new book, The Fourth Industrial Revolution (2016). But for the time being, this message about exponentials is likely to continue to be ignored by most of the folks who should get it. Most politicians and policy, institutional, and corporate leaders are still stuck in old ways of thinking. They are definitely ignoring deep learning, and they lack any understanding of its major implications for their strategy, partnering, R&D, operations, marketing, business development, and corporate foresight work. But the longer they wait, the worse their competitive position will be.
It has been clear to insiders for about five years now, and to everyone for about three, that deep learning, which uses hierarchical (many-layered) neural networks and other variations of brain-inspired computing, will increasingly take over the field of machine learning, and perhaps, in its next feat, the entire field of high-end computer design.
With these bottom-up approaches, the code (and, in computer design, the circuits) is grown, trained, and tested rather than built by humans. While we have a basic logical and mathematical understanding of the inputs, we can’t understand or describe the algorithms or connectivity that emerge. As with our own brains, their complexity exceeds our ability to comprehend it.
The people who make these deep learners will increasingly move away from what Sam Arbesman, author of the great new book Overcomplicated: Technology at the Limits of Comprehension (2016), calls “physics thinking”, where math, logic, rationality, and engineering dominate. Instead, they’ll be doing “biological thinking”. Beyond their basic architecture, which is very likely built to exploit some still-poorly-understood version of Bayesian (probabilistic) learning, the vast majority of the evolved internal complexity and connectivity of these deep learners won’t be describable by human-understood science. Here’s the key point to understand: most of our top-down models in physics, math, logic, and other tools of our limited rationality will continue to fail to explain their higher features, because those features are too complex for such conceptual reduction.
What we’ll have instead is what we are now building in biology — a set of low-level (molecular, cellular, physiologic) and “bottom-up” learning and control algorithms, another set of grossly useful generalizations and analogies about emergent high-level forms and functions, and a lot of practical experience and intuition with what kind of data, training methods, and selection environments have been best, so far, at creating and improving the performance and intelligence in biological systems.
Recall NVIDIA’s work on self-driving cars, mentioned in our first post. Central to their approach is a toaster-sized supercomputer sitting in the car, the Drive PX, running a neural network that does computer vision, and which talks to a second neural net in the cloud, DriveNet, both of which “see” the world in a way roughly similar to the way human brains see (more on that later in this post). These learning networks are grown and trained like babies, not coded by humans. This brain-like approach in software and eventually in hardware will migrate to our industrial and domestic robots, and be at the heart of all our most complex systems.
The Next Chapter of Machine Intelligence
The next chapter in machine intelligence will involve what is called biologically-inspired computing. See Floreano & Mattiussi’s Bio-Inspired Artificial Intelligence (2008) for a good older overview. When we borrow deeply from biology to guide our hardware and software design, we take advantage of the only known methods of making increasingly self-improving technology, the methods that led to our own emergence. Like biology itself, bio-inspired methods are mostly bottom-up and self-directed, rather than top-down, human-directed, or engineered. A mostly bottom-up, slightly top-down approach is how our own genes work in living systems, as the field of evo-devo biology demonstrates.
In my opinion, IARPA’s 2016 $100M MICORNS project (Machine Intelligence from CORtical NetworkS), seeking to reverse-engineer the structure and function of mammalian cerebral cortex to improve the performance of our deep learning software, is the best public money the US has spent on science and tech funding in the last 10 years. MICORNS (stupidly acronymed as MICrONS; just re-acronym it as MICORNS and you’ll remember it) employs powerful new tools in automated connectome imaging in dead and chemically-preserved brains, and in realtime observation, using two-photon microscopy, of learning and control processes in living brains. Here are two excellent 2016 articles (Scientific American, SingularityHub) that explain the profound and largely unrecognized benefits of MICORNS for the world.
The money the US is spending on these reverse-engineering technologies through the 2013 BRAIN Initiative ($150M in 2016) is on par with what Europe committed to the 2013 Human Brain Project (HBP, $1.3B over 10 years). But while we smartly took a bottom-up approach, working on maps, tools, and data, the Europeans awarded all their money to a longshot top-down approach guided by one overly-ambitious person, Henry Markram, trying to simulate cortex in supercomputers, based on what (little) we presently know. By 2014, over 800 neuroscientists had signed letters saying the HBP project was doomed, and they wouldn’t cooperate with it. By 2015, the Eurocrats realized they needed to copy the US approach.
As we’ll see in this series, the science and technologies around deep learning are presently the greatest lever for improving the human condition, because all our science, engineering, and society (think of intelligent agents) will improve greatly along with them. This post makes the case that a very bottom-up and user-involved approach to agent and PAI development, with key roles for open source, open data, modularity, and mass user testing and training, will allow us to make our software and computers more like our biology, and in the process, achieve our best individual and collective futures.
In a more foresighted world, the US would be funding ten times as much research on the neuroscience, computer science, and engineering of smarter computers. There are only three teams, at five institutions, being supported by MICORNS at present, and $100M is a pittance. Fortunately, China may jump into this reverse-engineering work soon as well. China already outspends the US on deep learning research and nanotechnology research, even though it has only a third of our discretionary budget. It’s sad that the US has ceded funding leadership in these critical S&T domains to another nation. Eventually we’ll realize that this was a major foresight mistake. See the 2016 OSTP report, Preparing for the Future of Artificial Intelligence, for details.
“Artificial” intelligence (AI) is a good description of where computer intelligence sits today. This intelligence is human-constructed, simplistic, and brittle. It feels as natural as a building, which can’t adapt beyond its design, and begins decaying as soon as humans stop repairing it. A single bit out of place in a configuration file in many current software systems can cause total system failure. We program most of today’s computers top-down, using rational, logical, engineered approaches. They aren’t yet autopoietic, or capable of self-replication and adaptation.
But they will be. They’ll be not only robust to error, but antifragile. That means they will have not just security, but immune systems, which learn from catastrophe and error. Catastrophes and errors actually make antifragile systems stronger, just as dirt and infections strengthen our biological immune systems. Rather than saying they’ll be “built”, it’s better to say they’ll be “seeded”, grown, and trained. Folks like Dipankar Dasgupta have been researching artificial immune systems (AIS) for twenty years, with little recognition by mainstream computer science. Here’s his latest book. I am convinced that the better we understand neuroimmunology, the better we’ll realize that the combination of bio-inspired computers and technological immune systems is the only reliable and proven path to real security, in both biology and technology.
We will address NI safety in our post on Safe Agents. If naturally intelligent (NI) machines prove to be rapidly self-correcting and antifragile after bad things happen to them, just as living systems naturally are, and unlike almost all of today’s AI machines, it seems clear that we as a society will continue to build and use them to solve our pressing human problems.
Self-improving, antifragile intelligence is so different from today’s artificial intelligence it deserves a new name. So let’s call it “natural” intelligence (NI), and recognize that it must be deeply biologically-inspired. Again, bio-inspired machines aren’t coded and designed, but rather are grown, gardened, and tested by us, against big data and the world. They have the equivalent of both brains and immune systems, and an ever-growing ability to self-explore, self-repair, and self-improve.
The symbolic, rule-based, top-down, engineered, and human-comprehensible approaches to AI, which have delivered modest progress for fifty years, capture just a small part of what the human brain does. You can be sure they’ll also be a small part of the machine brains to come. Our systems, software, and computer designers will keep sliding toward naturally intelligent machines because working with them, once they reach a threshold level of intelligence and self-improvement ability, will be far more efficient and effective than continuing to design top-down, using the old paradigms.
We can also call bio-inspired computer hardware and software design “natural computing”, to distinguish it from the engineered, discrete, serial, rule-based, “nonbiological computing” that we still use in the vast majority of our IT systems. We saw the earliest signs of natural computing in one of the first crude neural networks, Frank Rosenblatt’s perceptron, in 1957. But a single-layer perceptron can learn only linearly separable patterns, and there was no good algorithm for training networks with multiple layers, so this kind of computing made little progress for thirty years. Such an algorithm, backpropagation, was popularized by David Rumelhart, Geoff Hinton, and Ronald Williams in 1986 (it had been derived earlier, by Paul Werbos and others), and neural nets began to make progress after that. But natural computing had to wait another twenty years, for fast processors with good hardware parallelism and access to large amounts of data. All told, it took fifty years for neural networks to become an overnight success.
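To make the perceptron's limitation concrete, here is a minimal sketch of the classic perceptron learning rule (a software illustration, not Rosenblatt's original hardware) on two toy truth tables. The weights settle on a solution for AND within a few passes, but never for XOR, because no single straight line separates XOR's true and false cases. The function and data names are hypothetical.

```python
def train_perceptron(samples, epochs=25, lr=1.0):
    """Perceptron learning rule. samples: list of ((x1, x2), target), targets 0 or 1.
    Returns (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - output            # -1, 0, or +1
            if delta:
                errors += 1
                w[0] += lr * delta * x1         # nudge the weights toward the target
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:                         # a full pass with no mistakes
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    print("AND converged:", train_perceptron(AND)[2])
    print("XOR converged:", train_perceptron(XOR)[2])
```

Adding a hidden layer removes the XOR limitation, but only once there is a way to train that hidden layer, which is what backpropagation supplied.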
Natural computing’s successes began in earnest around 2005, as we will see. By 2009, nonbiological approaches to machine learning began losing out to biological approaches. Natural computing includes minimally biologically-similar hardware, like NVIDIA’s Pascal (optimized for running neural net software, but not yet deeply biological), more strongly biological hardware like IBM’s TrueNorth (built under DARPA’s SyNAPSE program) and other neuromorphic chips, and a wide array of biologically-similar machine learning software and algorithms, like recurrent and convolutional neural nets, reinforcement learning, hierarchies, modularity, swarm intelligence, evolutionary developmental methods, and much more.
Bio-inspired computing methods include biomimicry (biomimetics), the imitation of models, systems, and elements of nature to solve human problems, described well in Janine Benyus’s Biomimicry (2002). But they also take us beyond biology, which the word biomimicry doesn’t convey. Naturally intelligent computers will do things biology can’t, at speeds biological brains will never reach. They will learn to replicate, and generate their own adaptive complexity and intelligence, far faster and more stably than we ever could.
Our naturally intelligent PAIs will help us with many things, as this series explores. But of everything our PAIs can and will help us with, thinking about how they will advance evidence-based thinking and collaborative scientific and technological research, and where that will take us, is perhaps the most exciting of all our opportunities ahead. Demis Hassabis, CEO of Google DeepMind, makes that point in this lovely 14 min video at Falling Walls 2015, which is well worth a watch.
Even though deep learning systems are not yet anywhere near as complex as biological brains, they will keep learning and operating at least seven million times faster than biological brains, which are limited by electrochemical rather than electrical communication speeds. So it won’t be much longer before they “learn their way up” to our level of complexity. In fact, this NI future seems so useful and powerful that I predict future science will show it is a developmental outcome that emerges on all technological planets, an “attractor” that humanity cannot avoid.
Many people currently talking about machine intelligence are still missing the increasingly bio-inspired, bottom-up, and evolutionary developmental (evo devo) nature of the new generation of machines. They still think in terms of the top-down, rationalist, engineered way that most machine intelligence has emerged to date. But that top-down approach depends on our slow and limited biological human minds to grow it, and has far less potential than the bottom-up, self-replicating methods now emerging.
Top-down, rational design schemes for creating machine ethics and engineering “safe AI” in our PAIs and robots will always be very limited in usefulness in a world of increasingly bottom-up NI systems. Even in today’s rationally engineered computing environments, our leading computer science algorithms and data structures are not fully rational; they are rationality-guided but computationally incomplete guesses at how to represent the world in a useful way. Logic, rationality, probability theory, and other top-down tools let us make better guesses, but they are still just guesses.
Most fundamentally, all the most complex things in the world, including life and minds, are both evolutionary and developmental. That means they are almost entirely bottom-up, experimental systems (evolutionary), with just a few empirically-found rules for top-down, systemic guidance (development). Evo-devo biology is precisely how the most complex organisms on our planet self-organized their own amazing complexity. Evo devo methods are how tomorrow’s smart machines and agents will emerge, as these methods alone allow computers to increasingly guide their own self-improvement.
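As a loose illustration of that balance, the sketch below swaps in a standard (1+1) evolutionary algorithm on the toy OneMax problem (maximize the number of 1 bits). Random bit flips supply bottom-up variation, and a single fixed rule (keep the child only if it is no worse than its parent) supplies the top-down guidance. It illustrates the general idea only, not a model of evo-devo biology; the problem, names, and parameters are hypothetical choices.

```python
import random


def one_max(genome):
    """Toy fitness: the number of 1 bits (the optimum is all ones)."""
    return sum(genome)


def evolve(length=20, generations=5000, seed=42):
    """(1+1) evolutionary algorithm: one parent, one mutated child per generation."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    history = [one_max(parent)]
    for _ in range(generations):
        # Bottom-up variation: flip each bit independently with probability 1/length.
        child = [1 - g if rng.random() < 1.0 / length else g for g in parent]
        # One fixed top-down rule: keep the child only if it is no worse.
        if one_max(child) >= one_max(parent):
            parent = child
        history.append(one_max(parent))
        if history[-1] == length:
            break
    return parent, history


if __name__ == "__main__":
    genome, history = evolve()
    print("fitness went from", history[0], "to", history[-1],
          "in", len(history) - 1, "generations")
```

Nothing in the loop knows where the optimum is; blind variation plus one selection rule is enough to climb to it on this easy landscape.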
In his beautifully-written book on machine learning, The Master Algorithm, (2015), recommended earlier as background reading, computer scientist Pedro Domingos identifies “Five Tribes of Machine Learning”. Each tribe has been successful, to some degree, in building learning computers to date. Domingos’ Five Tribes, and in parentheses, the current favorite algorithms used by each, are:
1. Bayesianism (probabilistic inference)
2. Evolutionism (genetic programming)
3. Connectionism (backpropagation)
4. Analogizers (support vector machines)
5. Symbolists (inverse deduction)
Deep learning, which we’ll discuss at length in this post, is a kind of Connectionism, the Third Tribe on this list.
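Backpropagation, the Connectionist tribe's signature algorithm, is the chain rule from calculus applied layer by layer, from the output back toward the input. The minimal sketch below computes those gradients by hand for a hypothetical 2-2-1 sigmoid network on the XOR truth table, checks them against finite-difference estimates, and takes one small gradient step. The starting weights, learning rate, and helper names are arbitrary assumptions; real deep learning frameworks automate all of this.

```python
import copy
import math

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
# Parameters as [W1 (2x2), b1 (2), W2 (2), b2]; arbitrary hypothetical starting values.
PARAMS = [[[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05]
PATHS = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0), (1, 1), (2, 0), (2, 1), (3,)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x):
    W1, b1, W2, b2 = p
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    return h, sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)


def loss(p):
    """Half the sum of squared errors over the four XOR cases."""
    return sum(0.5 * (forward(p, x)[1] - t) ** 2 for x, t in XOR)


def backprop(p):
    """Gradient of loss(p), in the same nested layout as p, via the chain rule."""
    W2 = p[2]
    g = [[[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0]
    for x, t in XOR:
        h, y = forward(p, x)
        d_out = (y - t) * y * (1.0 - y)                 # error signal at the output unit
        g[3] += d_out
        for j in range(2):
            g[2][j] += d_out * h[j]
            d_hid = d_out * W2[j] * h[j] * (1.0 - h[j])  # error sent back to hidden unit j
            g[1][j] += d_hid
            g[0][j][0] += d_hid * x[0]
            g[0][j][1] += d_hid * x[1]
    return g


def get(p, path):
    for i in path:
        p = p[i]
    return p


def shifted(p, path, delta):
    """Copy of p with the parameter at `path` moved by `delta`."""
    q = copy.deepcopy(p)
    holder = q
    for i in path[:-1]:
        holder = holder[i]
    holder[path[-1]] += delta
    return q


def max_gradient_error(p, eps=1e-6):
    """Largest gap between backprop and a central finite-difference estimate."""
    g = backprop(p)
    return max(abs(get(g, path) - (loss(shifted(p, path, eps)) - loss(shifted(p, path, -eps))) / (2 * eps))
               for path in PATHS)


def gradient_step(p, lr=0.05):
    g = backprop(p)
    for path in PATHS:
        p = shifted(p, path, -lr * get(g, path))
    return p


if __name__ == "__main__":
    print("max gradient error:", max_gradient_error(PARAMS))
    print("loss before / after one step:", loss(PARAMS), loss(gradient_step(PARAMS)))
```

The finite-difference check is a common way to confirm a hand-written backward pass; repeating `gradient_step` many times is what "training" means here.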
When we ask ourselves to write the story of life’s Intelligence Emergence Stack (the evolutionary developmental hierarchy in which intelligence emerged in living systems on Earth), there are good arguments that biology followed the order laid out above. This is not Domingos’s order in his book, as he does not (yet) view the universe from an evo devo perspective. But I for one am hopeful that one day, he will.
Let’s briefly back up these claims:
- Bayesian Intelligence. Molecular precursors to our first cells must have used chemistry to do probabilistic inference, in replicating chemical networks, to model and react to their immediate surroundings, and to support their survival, in molecular evo devo. One good book that takes this perspective is John Campbell’s Darwin Does Physics (2015). Campbell is a scholar in our Evo Devo Universe research community.
- Evolutionary Intelligence. Eventually life emerged, with its cells and genes, which are both evolutionary and developmental. While life arose from and still uses Bayesian processes, it encodes 3D form, function, and constraint at a much higher level of informational and computational abstraction than those processes typically do.
- Connectionist Intelligence. Eventually, a special subset of dominant multicellular life built neural networks (brains). All biological neural networks use a still-only-partly understood set of Bayesian inference algorithms, most neuroscientists think. But their evolutionary and connectionist architectures and abilities make them considerably more complex than what we understand as standard Bayesianism (we might call them “SuperBayesian”).
- Analogical Intelligence. Eventually, the most intelligent and dominant of these animals with brains began thinking in analogies, a process that all higher animals, including crows, can do.
- Symbolic Intelligence. Finally, humans began their runaway partnership with technology, and evolved and developed symbolic language, and later, formal symbolic reasoning in the Enlightenment (1600–1800).
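“Probabilistic inference”, the Bayesian tribe's core operation, means updating a belief in light of evidence using Bayes’ rule. The sketch below is a toy illustration with made-up numbers: a hypothetical sensor that signals “food nearby” 80% of the time when food is present and 30% of the time when it is absent. It is not a model of prebiotic chemistry or of any real organism.

```python
def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    """Posterior P(H | obs) from prior P(H) and the likelihood of obs under H and not-H."""
    evidence = p_obs_if_true * prior + p_obs_if_false * (1.0 - prior)
    return p_obs_if_true * prior / evidence


# Hypothetical sensor model: P(signal | food) = 0.8, P(signal | no food) = 0.3.
P_SIGNAL_IF_FOOD, P_SIGNAL_IF_NONE = 0.8, 0.3


def track_belief(observations, prior=0.5):
    """Belief that food is nearby after each observation (True = signal seen)."""
    beliefs = []
    belief = prior
    for signal in observations:
        if signal:
            belief = bayes_update(belief, P_SIGNAL_IF_FOOD, P_SIGNAL_IF_NONE)
        else:
            belief = bayes_update(belief, 1 - P_SIGNAL_IF_FOOD, 1 - P_SIGNAL_IF_NONE)
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    print([round(b, 3) for b in track_belief([True, True, False])])
```

Two confirming signals push the belief up; one miss pulls it back down, but not all the way to where it started.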
As might be expected on reflection, artificial intelligence research has emerged in the exact reverse of this order. In the 1960s, we began working on machine intelligence using top-down, rule-based, and discrete symbolic reasoning, the epitome of Arbesman’s precise yet oversimplistic “physics thinking.” That was where the easiest work could be done at first, and “Artificial” was a great word to describe this entire process. Symbolic strategies made lots of early progress, and were greatly overhyped by some, but anyone with a biology background had little faith that they alone would create truly smart machines.
As symbolic progress slowed, we moved to support vector machines (analogizers) in the 1990s, a promising step deeper into the nature of intelligence. We also began experimenting with genetic programming and neural networks in the 1980s and 1990s, but both were still too early then to make much progress. In the early 1990s, we began making progress with Bayesian networks. Since 2009, as we’ll see below, connectionism, via deep learning, has become the latest important advance.
The dramatic recent success of deep learners and the return of connectionism mark a big transition, and I think we need new language for that transition. From now on, whenever we talk about the future of thinking machines, I think we should favor the phrase “Natural Intelligence” over Artificial Intelligence, and begin phasing out the latter phrase, as it is increasingly irrelevant and incorrect.
That change of language can help signify, to those ready to hear it, just how momentous this shift to deep learning actually is. We’re finally working earnestly across all the layers of the stack. Our best strategy to build smarter machines, from here forward, is to try to recapitulate all the key intelligence innovations that nature has made to bring us to this point. What’s more, we will increasingly let our machines lead us in that journey, as they get ever more effective at their own natural learning.
Our machine learning community has a lot of work still to do in creating natural intelligence. Our current understanding of evolutionary developmental (evo devo) computing is quite primitive. Just like evolutionary biologists who continue to ignore evo-devo biology, all the processes of convergent evolution, and the way development controls evolutionary processes, today’s leading conferences on evolutionary computing, like GECCO, still don’t pay much attention to development. Evo devo computing, for its part, must be tied to the development, variation, and maintenance of connectionist networks in machines, just as genes guide a living brain’s neural networks. Finally, all of these tribes must be tied into Bayesianism. We need to understand why Bayesian methods led inevitably to the kinds of intelligences that life uses. Computational neuroscientists have built early Bayesian models of brain functions, and biologists use Bayesian networks to discover gene associations, but it will be a while before we understand evo devo systems in Bayesian terms. All this will be needed to create deeply naturally intelligent machines, and the technological singularity, in my opinion.
At present, a tiny but rapidly growing number of computer scientists now train and guide, rather than program and engineer, the new deep learning systems that are driving cars, and acting as the cloud-based “brains” behind our current smartphone agents. Most computer science will be done this way in the years ahead. Large numbers of computer scientists and users will be experimenting with and training, far more than designing or programming, tomorrow’s leading PAIs. For the future of NI, bet on evo devo, which is 95% bottom up, not rational design, or other top-down approaches. And bet on evo devo machines and the environment doing the “programming,” not human brains.
The growth of life and mind has always been a lot of evolutionary trial and error balanced by a small amount of slightly improved developmental processes, in each replication cycle. So too it seems likely to be with tomorrow’s computers. For more on that perspective, see my book precis, Evo Devo Universe (2008) and our interdisciplinary research community EvoDevoUniverse.com.
When we view the world from the wrong frameworks, life has a way of showing us our mistakes. I am hopeful that deep learning’s continued juggernaut in the machine learning space will make the many currently top-down, rationalist philosophers of AI understand the unique advantages of applying the evo devo paradigm to the future of technology. We shall see.
So we now have a rough roadmap for how the much-vaunted “technological singularity” will arrive, later this century. In fact, it is no longer a “singularity,” a point at which our models and foresight break down, but rather a rapidly approaching and natural transition that many of us now expect. So let’s call it a predictable phase transition in natural intelligence (NI), not a singularity, and bring it into the realm of hypothesis and science.
Why Neural Networks are So Naturally Intelligent
Let’s take a look now at neural networks, both in brains and machines, to see why they are so important to the future of postbiological intelligence.
To better understand our own natural intelligence, consider just three great advantages of neural networks (connectomes), which are at the heart of today’s deep learning machines:
First, neural networks fail gracefully when damaged by the environment, because useful information is never stored in one single place. Concepts, models, ideas, and predictions are always stored “a little bit everywhere”, represented in the number, locations, and strengths of synaptic weights. Such systems undergo what is called “graceful degradation” when damaged: as links are lost, performance slowly decreases, and the network rarely fails all at once. In today’s artificially intelligent computers, changing one single bit in a config file can crash the whole program. Not so with natural intelligence. If a neural connection is destroyed by trauma, disease, or biochemical error, we may partially forget some aspect of the information we wanted to keep, but we can often repair and reestablish the memory by concentrating on some other aspect of the thing in question and “routing around the damage”. This is what you do when you forget a person’s name but think about some other aspect of the person until their name suddenly comes back. It’s also what you do when you walk back to the place where you were thinking about what you wanted to do next, in order to remember it, thus returning to the original net of mental associations in which you formed the idea. All human thinking and memory works in this incredible associative way.
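A classic way to see graceful degradation in software is a Hopfield-style associative memory, a swapped-in textbook model far simpler than a deep learning system. The sketch below stores one random pattern of +1/-1 values “a little bit everywhere” in a weight matrix, randomly deletes about 30% of the connections, and then recalls the full pattern from a cue with 8 of its 64 values flipped. The sizes, damage level, seed, and names are arbitrary assumptions for illustration.

```python
import random


def store(pattern):
    """Hebbian weights: w[i][j] = p[i] * p[j] / n, with no self-connections."""
    n = len(pattern)
    return [[0.0 if i == j else pattern[i] * pattern[j] / n for j in range(n)]
            for i in range(n)]


def damage(weights, fraction, rng):
    """Delete (zero out) each connection independently with probability `fraction`."""
    return [[0.0 if rng.random() < fraction else w for w in row] for row in weights]


def recall(weights, cue):
    """One synchronous update: each unit takes the sign of its weighted input."""
    return [1 if sum(w * s for w, s in zip(row, cue)) >= 0 else -1 for row in weights]


rng = random.Random(0)
n = 64
memory = [rng.choice([-1, 1]) for _ in range(n)]
weights = damage(store(memory), fraction=0.3, rng=rng)   # lose roughly 30% of connections
cue = list(memory)
for i in rng.sample(range(n), 8):                         # corrupt 8 of the 64 values
    cue[i] = -cue[i]
recalled = recall(weights, cue)

if __name__ == "__main__":
    print("cue matched memory:", cue == memory)
    print("recall matched memory:", recalled == memory)
```

With one stored pattern, every unit still receives far more intact, agreeing inputs than deleted or corrupted ones, so recall survives heavy lesioning; packing many patterns into the same weights makes the trade-offs more interesting.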
Second, neural networks can access vast amounts of stored information in each processing step, because all information in the brain is just a few “degrees of separation” (switching circuits) away from all the other information. Our brains have neural switching speeds of roughly a thousand times a second. Electronic transistors can switch on and off billions of times a second, making them millions of times (roughly six to seven orders of magnitude) faster at this task than biological brains. But because we store information associatively, in the number and strength of connections between neurons, we can search our memory almost instantaneously to see if we already know a concept, a name, or a face. It may take just a hundred neural processing steps to scan our entire memory for a concept, as each step has access to so much information, due to the massive parallelism of our connectome. That means, within seconds, we can say with confidence whether we know something, have a partial memory of it, or it feels fully new, at least according to our current search, which covers our entire brain! Conventional serial computers cannot do this. Even though they are millions of times faster, they are not parallel, or naturally intelligent. Each search step accesses so little information that trying to search a similarly large database takes forever. They can’t make realtime, dynamic estimates of what they know and don’t know. But deep learning systems, especially hardware-based ones, can do this. They remember like us.
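The “few degrees of separation” point can be put in numbers with an idealized calculation: if every neuron fanned out to k distinct new neurons at each step, the number reachable after h hops would be k to the power h. The neuron count (about 86 billion, a commonly cited estimate) and the fan-outs of 1,000 and 10,000 connections per neuron are assumptions, and real wiring overlaps heavily, so the result is a lower bound on hops, not a measurement. Under these assumptions it takes 4 hops at the lower fan-out and 3 at the higher one.

```python
def hops_to_reach(population, fan_out):
    """Fewest hops for an idealized, non-overlapping fan-out tree to cover `population` units."""
    reached, hops = 1, 0
    while reached < population:
        reached *= fan_out
        hops += 1
    return hops


NEURONS = 86_000_000_000  # commonly cited estimate for a human brain (assumption)

if __name__ == "__main__":
    for fan_out in (1_000, 10_000):
        print(fan_out, "connections each ->", hops_to_reach(NEURONS, fan_out), "hops")
```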
Third, neural networks are always simultaneously comparing a vast number of parameters of anything of interest, as they both remember and think via synaptic connections, and in doing so they perform what machine learning folks and statisticians call dimensionality reduction. Connectomes offer the most powerful informational and computational architecture that we know to continually explore, and efficiently reduce the dimensionality of, a “hyperparameter space” of large numbers of potentially interacting parameters.
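Dimensionality reduction can be seen in miniature with principal component analysis (PCA), a standard statistical technique swapped in here for illustration; nothing here claims brains run PCA. The sketch builds hypothetical 3-D points that really vary along one hidden direction, then recovers that direction by power iteration on their covariance matrix, collapsing three numbers per point to roughly one.

```python
import math


def covariance(points):
    """Sample covariance matrix of a list of equal-length tuples."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
            for a in range(dims)]


def top_component(cov, iterations=100):
    """Power iteration: multiply by the covariance and renormalize until the direction settles."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: 21 points spread along one hidden unit direction, plus a tiny wobble.
HIDDEN = (1 / 3, 2 / 3, 2 / 3)
points = []
for i in range(21):
    t = (i - 10) / 2.0
    wobble = 0.01 * ((i % 3) - 1)
    points.append((t * HIDDEN[0] + wobble, t * HIDDEN[1], t * HIDDEN[2] - wobble))

pc = top_component(covariance(points))
alignment = abs(sum(a * b for a, b in zip(pc, HIDDEN)))  # 1.0 means the same direction

if __name__ == "__main__":
    print("first principal component:", [round(x, 3) for x in pc])
    print("alignment with hidden direction:", round(alignment, 4))
```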
Our associative brains are the ultimate “relational databases”, relating everything to everything else. The central problem of intelligence is always the appropriate mapping, fanning out (evolution), and pruning (development) of a mind, to best navigate the combinatorial explosions of possible representations of reality (model parameters). See Alice Zheng’s (@RainyData) “hyperparameter tuning” post for more on this “metalearning task” (something that must be done prior to actual learning).
When they are properly connected, neural networks can quickly sift and pay attention to just that small combination of parameters that seem most adaptive to the problem at hand. Associational architectures quickly “fan out” (an evolutionary process) into a vast number of possible associations, and then just as quickly “fan in”, or prune (a developmental process) to just the information that they think is still worth attending to, and this process is how we make predictions.
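The fan-out-then-prune cycle can be sketched as a coarse-to-fine search over two made-up hyperparameters. This is a hypothetical toy, loosely in the spirit of pruning-based tuning schemes such as successive halving, and not the method from Zheng's post: each round spreads candidates around the current survivors (fan out), keeps only the three best (fan in), and narrows the step size. The loss function and its optimum are invented for illustration.

```python
def toy_loss(lr, reg):
    """Hypothetical validation loss with its minimum at lr=0.33, reg=0.71."""
    return (lr - 0.33) ** 2 + (reg - 0.71) ** 2


def fan_out(center, step, width=5):
    """Evolutionary step: spread a grid of candidates around one survivor."""
    cx, cy = center
    return [(cx + i * step, cy + j * step)
            for i in range(-width, width + 1) for j in range(-width, width + 1)]


def prune(candidates, keep=3):
    """Developmental step: keep only the few candidates that scored best."""
    return sorted(candidates, key=lambda c: toy_loss(*c))[:keep]


def search(start=(0.5, 0.5), step=0.1, rounds=3):
    survivors = [start]
    for _ in range(rounds):
        candidates = [c for s in survivors for c in fan_out(s, step)]
        survivors = prune(candidates)
        step /= 10  # narrow the search around the survivors
    return survivors[0]


if __name__ == "__main__":
    print(search())
```

Only a few hundred candidates are ever scored, yet the search homes in on the optimum, because pruning keeps the fan-out from exploding.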
This ability to continually fan out and fan back in, while simultaneously comparing a vast number of competing information sources to form an intuition, a model, a prediction, or a plan, is an evo devo process that allows us to elegantly manage a torrent of incoming information, and simultaneously compare thousands of potentially relevant parameters in the world. Again, conventional computers can’t do this. But deep learning systems are learning how, which means they will increasingly not just remember, but also think — like us.
Neural networks aren’t perfect. Whether biological or technological, they can and do eventually become overtrained. Their weights can become inflexible, like an old human mind that has been trained exclusively on just one type of data and can no longer see other points of view. But we can get out of that trap by rejuvenating them, opening new connection space, and retraining on new data. We are a long way from figuring out how to do that with human biology, but we are already learning how to do that renewal with many of our deep learning machines.
As another major current limitation, today’s artificial neural networks also are not “compositional”, meaning they don’t yet know how to combine different pieces of information sequentially, in different ways, to do chains of thinking, following sequential rules. So the symbolic processing that today’s computers can do very well, and humans can do to a limited degree, needs to emerge in the deep learning networks of the future, to move them fully into natural intelligence. But we’ll get there, by better understanding our biology, and porting over more of the kinds of specialty processing it uses into our machines.
As Kerri Smith reports in How to map the circuits that define us, Nature, 9 Aug 2017, when you incorporate their size, shape, firing speed, receptor types, and what genes they express, some neuroscientists expect that mouse (and human) cortex has as many as 10,000 neuronal types. Then there are the networks themselves, typically small world clusters of various types, loosely linked or sequentially chained together to do valuable things. So there is tremendous fascinating complexity there. Fortunately our ability to map and upload it all is also growing at superexponential rates. Check out David Cyranoski’s piece, China launches (a new) brain-imaging factory, 16 Aug 2017 for one of several exciting recent examples.
So as neuroscience keeps advancing, we’ll keep using all the brain’s neural network structure and algorithms that we can copy, algorithms we will likely never fully understand, and create experimental versions of them in our hardware and software. We’ll train those neural networks with data and our feedback, not program them. Those systems will in turn themselves run vast numbers of new experiments, in their reconfigurable hardware, in their software, and in the way they interact in the world. Many of those experiments, of course, will be initiated by our PAIs and agents, and run on us, and the world. They’ll learn just like a baby learns, with progress and failures too, but constantly getting better by trial and error.
In a famous recent example, published in Feb 2015, Google DeepMind’s deep learning network learned by itself how to play 49 video games from the Atari 2600, with no game-specific human instruction, learning only from the screen pixels and the game score. By the end of its training it was better than the best human players on 23 of these games, and in a few games, like Breakout, it uncovered optimal play strategies that humans didn’t realize were available.
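DeepMind’s Atari player (the “deep Q-network”) was built on Q-learning, a reinforcement learning rule that improves estimates of how much future reward each action is worth, purely by trial and error. The minimal sketch below shows the same update rule in tabular form on a hypothetical five-cell corridor with a reward at one end. It is an assumption-laden toy: DQN replaced the lookup table with a deep convolutional network reading raw pixels, and added further machinery this sketch omits.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, reward at cell 4
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Toy environment: move along the corridor; reaching the last cell pays 1 and ends."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick the best-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit what was learned, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Q-learning: move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Nobody tells the agent that “right” is good; it discovers this from the reward signal alone, which is the sense in which such systems learn “by themselves.”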
As mediated reality grows (our last post) deep learning-backed software agents will be able to learn even faster from many virtual realities than from physical reality, once enough data and accuracy are in the simulation. They’ll continually take their most useful virtual learning back into the physical world. Learning is particularly rapid in virtual space because more iterations can be tried faster, as long as computational power and simulation detail are sufficient, with no risk to physical life, and with much less need for physical resources.
Biological neural networks do this virtual world simulation constantly already. It’s called dreaming, and imagination. So do deep learners now. See the dramatic visual examples of “inceptionism” by Mordvintsev et al. at Google for how today’s deep networks can “dream” or “imagine” the world around them. I’ve got a few of these artworks on my wall now, to remind me that our most bio-inspired computers are just now learning to dream, in limited ways. It’s truly a brave new world!
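The core trick behind “inceptionism” is to run learning backwards: instead of adjusting the network’s weights to fit an image, you adjust the image to excite a feature the network has already learned. The minimal sketch below assumes a single hypothetical linear feature detector standing in for a unit of a deep network, plus a penalty on image size so the ascent settles. Google’s actual DeepDream work operates on full convolutional networks and amplifies the activity of whole layers; this only shows the gradient-ascent-on-the-input idea.

```python
def dream(image, detector, steps=100, lr=0.1, decay=1.0):
    """Gradient ascent on the *input*, not the weights.

    Objective: dot(detector, image) - (decay / 2) * |image|**2.
    Its gradient with respect to the image is detector - decay * image,
    so the image drifts toward the pattern the detector "likes".
    """
    x = list(image)
    for _ in range(steps):
        grad = [d - decay * xi for d, xi in zip(detector, x)]
        x = [xi + lr * g for xi, g in zip(x, grad)]
    return x


stripe_detector = [1.0, -1.0, 1.0, -1.0]   # hypothetical learned feature
gray_image = [0.5, 0.5, 0.5, 0.5]          # a featureless starting image
print([round(v, 3) for v in dream(gray_image, stripe_detector)])
```

Starting from a flat gray image, the output turns into the stripe pattern the detector was tuned to: a tiny version of a network “imagining” what it has learned to see.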
Again, remember that evolution and development in electronic systems, whether hardware or software, can happen far faster than in human brains. Evolutionary pattern recognition (thinking, imagination, dreaming) runs at roughly 100 mph (a round figure for the speed of neural communication) in human brains. That’s fast within a small human brain, and this speed keeps us alive in the world, but the same processes can run with signals approaching the speed of light inside dynamically reconfigurable hardware-based neural networks, in neuromorphic chips. At the speed of light, that’s roughly seven million times faster than human brains. So you can see where all this is going.
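The arithmetic behind that speed comparison is easy to check. A minimal sketch, assuming the post’s round 100 mph for neural signaling (the fastest myelinated axons reach roughly 120 m/s, about 270 mph, while many are far slower), and noting that electrical signals in real chip wiring travel at some fraction of the speed of light rather than at the full speed:

```python
MPH_TO_M_PER_S = 0.44704          # exact: 1609.344 m per mile / 3600 s per hour
SPEED_OF_LIGHT = 299_792_458.0    # m/s, exact by definition


def speedup(fraction_of_c=1.0, neural_mph=100.0):
    """How many times faster a signal at fraction_of_c * c is than one at neural_mph."""
    return fraction_of_c * SPEED_OF_LIGHT / (neural_mph * MPH_TO_M_PER_S)


for fraction in (1.0, 0.5):
    print(f"signals at {fraction:.1f}c: about {speedup(fraction):,.0f} times faster")
```

At the full speed of light the ratio is about 6.7 million; at half that speed it is still several million, so the order of magnitude holds either way.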
Deep Learning: 2005 to the Present
Let’s now do a quick history of the most recent star of natural computing, deep learning, to see it in broader context. Again, deep learning is a type of bio-inspired computing that uses neural networks of different varieties (hierarchical, recurrent, convolutional, goal-directed, reinforcement-driven, etc.). It is the hottest new area of machine intelligence, and like any rapidly improving area, it is easily overhyped, especially for what it can deliver in the next five years. But beyond that, all bets are off with what these systems can deliver. They’re on the path to natural intelligence.
An interesting and unconventional place to begin our deep learning story is in 2005. In that year, Moore’s law in MOS integrated circuits ran into the first of a series of endings that will increasingly move us out of its fifty-year long “magic shrinking transistor” paradigm. All exponential growth in any substrate can only run for so long, then it must jump to a new substrate. 2005 brought the end of something called Dennard scaling, the rule that as transistors shrank, their voltage could shrink too, keeping the power used per unit of chip area roughly constant. Once that ended, chips got too hot (and leaked too much current) if you shrunk them any further while raising their clock speeds, so around that year chip companies began producing multicore CPUs. The chip industry didn’t want to go multicore, as no one knew how to connect multicore chips in useful ways (parallel computing). But the end of Dennard scaling forced them to start making a bunch of first-gen, weakly parallel CPUs. As miniaturization limits grow, Intel’s former Chief Architect, Bob Colwell, predicted in 2013 that Moore’s law will be totally “dead within a decade.” If you care about natural intelligence, please pray for that prediction to be true! Only then will deep learning truly dominate, in both hardware and software domains, as we’ll see.
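The classic Dennard scaling rules can be checked with the standard formula for dynamic switching power, P = C·V²·f, divided by chip area. Shrinking every linear dimension by a factor k divides capacitance by k, voltage by k, and area by k², while frequency rises by k, so power density stays flat. Once supply voltage could no longer keep falling (in part because lower threshold voltages sharply increase leakage current), each shrink instead raised power density by about k². A minimal sketch under those textbook assumptions; it models only dynamic power and ignores leakage entirely:

```python
def power_density(capacitance, voltage, frequency, area):
    """Dynamic switching power per unit area, C * V**2 * f / A (activity factor ignored)."""
    return capacitance * voltage ** 2 * frequency / area


def density_after_shrink(k, scale_voltage=True):
    """Relative power density after shrinking linear dimensions by a factor k > 1.

    Classic Dennard rules: capacitance / k, voltage / k, frequency * k, area / k**2.
    With scale_voltage=False, voltage stays fixed, as it largely has since the mid-2000s.
    """
    before = power_density(1.0, 1.0, 1.0, 1.0)
    voltage = 1.0 / k if scale_voltage else 1.0
    after = power_density(1.0 / k, voltage, k, 1.0 / k ** 2)
    return after / before


k = 1.4  # roughly one process generation (about a 0.7x linear shrink)
print(density_after_shrink(k), density_after_shrink(k, scale_voltage=False))
```

With voltage scaling, density stays at 1.0; without it, one generation nearly doubles the heat per unit area, which is why clock speeds stalled and multicore designs took over.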
As Moore’s law was hitting its first ending in 2005, companies like NVIDIA, which had been making graphics processing units (GPUs) to run video games since the mid-1990s, began realizing they were in a unique position to take the lead in the future of machine learning. At first, their chips used simple parallel processing in hardware and software, primarily for graphics. But as the video game industry exploded, GPUs rapidly improved, with performance doubling times that were much faster than for CPUs (often doubling their performance per price every 12–16 months, instead of 18–24 months). By the late 2000s, GPUs, not the CPUs on motherboards, had become the best places to run the computationally intensive algorithms being used by the machine learning community. These simple parallel GPUs, in the graphics cards on our desktop computers, running our ever larger screens and our video games, can be thought of as Earth’s first mass-produced weakly bio-inspired hardware brains.
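Small differences in doubling time compound dramatically. A minimal sketch comparing a decade of growth at the GPU-like and CPU-like doubling times quoted in the paragraph, assuming each rate holds steady for the whole ten years (real hardware progress is lumpier than this):

```python
def growth(years, doubling_months):
    """Total improvement factor after `years` at a fixed doubling time."""
    return 2 ** (years * 12 / doubling_months)


for label, months in (("GPU-like, 12 mo", 12), ("GPU-like, 16 mo", 16),
                      ("CPU-like, 18 mo", 18), ("CPU-like, 24 mo", 24)):
    print(f"{label}: {growth(10, months):,.0f}x over a decade")
```

A 12-month doubling time yields about 1,000x in ten years, while a 24-month one yields only 32x: a thirtyfold gap from what looks like a modest difference in pace.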
Thus 2005 can be argued to be the time when the chip industry began to move from “miniaturization exponentiation” into “parallelization exponentiation”, doubling the number of processors and circuits that can work together simultaneously in useful ways. Parallel exponentiation is much harder, because we humans don’t know how to best connect up parallel systems. When we were in the middle of the Moore’s law era of continually shrinking circuits, attempts to build massively parallel machines, like Danny Hillis’s impressive Connection Machine in the 1980s, unfortunately just couldn’t compete. Their hardware became obsolete almost immediately after they were built. But just as importantly, we had no idea how to program those deeply parallel machines, and no incentive to do so, as we got so much more performance return by continuing to shrink standard, nonbiological, and serial Von Neumann computer architectures.
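One classic way to see why parallel exponentiation is so much harder is Amdahl’s law (Gene Amdahl, 1967): if only a fraction p of a program can run in parallel, then n processors give a speedup of 1 / ((1 − p) + p/n), which can never exceed 1 / (1 − p) no matter how many processors you add. A minimal sketch with illustrative fractions; note that the law assumes a fixed workload, and problems that grow with the machine, like training on ever larger data sets, often parallelize better than it suggests:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n) for a fixed workload."""
    p, n = parallel_fraction, processors
    return 1.0 / ((1.0 - p) + p / n)


for p in (0.5, 0.95, 0.999):
    print(p, [round(amdahl_speedup(p, n), 1) for n in (8, 1024, 1_000_000)])
```

Even a program that is 95% parallel tops out below a 20x speedup, which is why massive parallelism pays off mainly for workloads that are almost entirely parallel, as neural networks largely are.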
Fortunately, biology has had billions of years to make massively parallel self-improving systems, and after 2005, computer hardware and software began to get parallel enough for us to start using bio-inspired methods. On my website in 2002, I predicted we’d need an end of Moore’s law and a rise of massive parallelism, neural nets, and bio-inspired computing to get real machine intelligence. So I’ve been gratified to see these emerge over the last decade.
Scholars who publish on exponential technology growth, in journals like Technological Forecasting & Social Change, tell us that individual exponentials always end. But if we live in a universe where nanotech and infotech are special, as I argued in Post 3 (The Agent Environment), then whenever any productive technology exponential ends, it creates technical and market opportunities for new exponentials to emerge, out of nanotech or infotech strategies that couldn’t work before. So as exponential miniaturization of digital circuits began to end in 2005, we created the first real opportunities for exponential parallelization of those circuits, and thus deep learning, to emerge. That new exponential is now the one to watch. The bottom line, for those of us who do foresight work, is to be very careful to identify the appropriate exponentials relevant to our problem. They may not be the ones that most people are thinking about.
Ironically then, the beginning of the ending of Moore’s law is one of the best things that has happened to machine intelligence. As chips are stopping their magic shrinking game, it is becoming economically possible, for the first time ever, for chip companies to massively parallelize them, bringing more brainlike machines, what we can call Natural Intelligence, to the world. Artificial intelligence is top-down, human engineered machine learning. We’re moving out of that paradigm right now. Natural intelligence is bottom-up, self-guided, and deeply biologically inspired.
Natural intelligence will be the future of our most advanced CPUs and GPUs. They’ll become increasingly neuromorphic (brain-architecture inspired), like the experimental chips built by IBM and others under DARPA’s SyNAPSE program, and those architectures will be controlled by technological versions of genes, hardware description languages that can evolve, and that each specify the kinds of neural network architectures that develop in each replication cycle. Again, human beings won’t program these naturally intelligent machines, as we aren’t smart enough, but I’m convinced they’ll be tomorrow’s best self-learning systems.
Let’s jump ahead now to 2009, another big year in the deep learning story. Neural networks can’t work well unless they have a lot of data to crunch, as well as machine learning professionals who believe crunching all that data will yield powerful results. In that year, Halevy, Norvig and Pereira of Google published a seminal opinion paper, The Unreasonable Effectiveness of Data, which described big progress being made in statistical, associational approaches to speech recognition, language translation, and language understanding. This widely-discussed paper was an important signal, telling both machine learners and the technically literate community just how important statistical approaches and web scale data were becoming, and would increasingly be, to the future of machine intelligence.
Also in 2009, a type of deep learning system called a Long Short-Term Memory (LSTM) network, a recurrent neural network developed by my friend Juergen Schmidhuber and his team at IDSIA in Switzerland, became the first deep learning system to win an international machine learning competition, against other traditional, much less bio-inspired approaches. It took first place for handwriting recognition (ICDAR 2009), and the team’s deep networks went on to win for traffic sign recognition (IJCNN 2011), then for a variety of image recognition tests (ISBI and ICPR 2012). Their 2011 win was the first to achieve what Schmidhuber calls “superhuman performance” in complex visual recognition, beating humans at recognizing traffic signs in the wild.
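What makes an LSTM special is a separate “cell state” that can carry information across many time steps, guarded by learned gates that decide what to write, what to keep, and what to output. A minimal sketch of one step of a single-unit LSTM, following the now-standard formulation with a forget gate (added in later work by Schmidhuber’s group). The weights below are hypothetical, hand-picked so the cell stores a brief input spike and then holds onto it; in a real network they would be learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell; p holds the weights."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate: write?
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate: keep?
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate: reveal?
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g    # cell state: the long-term memory track
    h = o * math.tanh(c)      # hidden state: what the cell outputs this step
    return h, c


# Hypothetical weights: the input gate opens only for strong input,
# and the forget gate stays almost fully open (so memories persist).
p = {"w_i": 10.0, "u_i": 0.0, "b_i": -5.0,
     "w_f": 0.0, "u_f": 0.0, "b_f": 10.0,
     "w_o": 0.0, "u_o": 0.0, "b_o": 0.0,
     "w_g": 2.0, "u_g": 0.0, "b_g": 0.0}


def run(inputs):
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
    return h, c


print(run([1.0, 0.0, 0.0, 0.0, 0.0]))  # a spike, then silence: memory persists
print(run([0.0, 0.0, 0.0, 0.0, 0.0]))  # no spike: nothing stored
```

Because the cell state is updated by multiplication with a near-1 forget gate rather than being overwritten each step, information can survive long gaps, which is what made LSTMs good at sequences like handwriting.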
In 2010, Kaggle, now the leading predictive modelling competition platform, was founded, giving data scientists a new place to openly compete to produce the best predictive software. They’ve grown to half a million registered “Kagglers” since. Many of the world’s deep learning practitioners engage in contests and share code on Kaggle today.
In 2011 and 2012, academic teams using neural networks again won character recognition, traffic sign recognition, and medical imaging tests against other machine learning approaches. The ILSVRC 2012 ImageNet competition was perhaps the turning point event for deep learning, as neural networks were so successful discriminating images on ImageNet (a common image data set used by the machine learning community) in that competition that most machine learners then turned away from hand-built “feature engineering” toward features learned automatically by deep networks. Google, Facebook, Microsoft, and other majors immediately noticed this change and began acquiring deep learning research teams and startups around the world.
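The basic building block of the image-recognition networks in these competitions is convolution: sliding a small grid of weights (a kernel) across an image and summing the products at each position. The sketch below uses a hand-designed vertical-edge kernel, which is exactly the kind of “feature engineering” practitioners used to do by hand; in a deep convolutional network, the kernel values are instead learned from data, layer upon layer. The image and kernel here are hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


image = [[0, 0, 1, 1]] * 4      # hypothetical image: dark left half, bright right half
edge = [[-1, 1], [-1, 1]]       # hand-engineered vertical-edge detector
print(conv2d(image, edge))
```

The output is large only in the column where dark meets bright, so the kernel has “detected” the edge. A trained network discovers thousands of such detectors on its own.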
By 2011, NVIDIA was also doing increasingly complex parallel hardware and software design, using their GPUs as accelerators for large financial and supercomputing clients. After 2012, inspired by deep learning’s advances, NVIDIA began to plan a major pivot of their company toward artificial intelligence, to try to sustain their market leadership position in this rapidly emerging field.
The success of deep learners entered the public consciousness in June 2012, with John Markoff’s New York Times article, How Many Computers to Identify a Cat? 16,000. This article described Andrew Ng and Jeff Dean’s team at Google, which used 16,000 processor cores, in a network of one billion connections, that identified cats, and other objects, from 10 million still frames taken from YouTube videos, using an unsupervised (autonomous) approach.
This Google Brain network is a nine layer system, only three of which are particularly complex (structures called sparse autoencoders). It could only recognize cat faces head on, while humans can recognize them in any pose. But the cat was out of the bag, so to speak :) Not just industry insiders, but techies everywhere began following the deep learning story, which has been accelerating ever since.
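A sparse autoencoder learns without labels by trying to reconstruct its own input from a compressed internal code, while being penalized for using too many active units. That combination pushes each hidden unit toward detecting a distinct, recurring pattern (such as a cat face). A minimal sketch of that generic objective, assuming sigmoid hidden units, a decoder that reuses the encoder weights, and an L1 sparsity penalty; Google’s published network used a more elaborate variant with pooling and normalization layers, which this omits. All weights and inputs are hypothetical.

```python
import math


def sparse_autoencoder_loss(x, W, beta=0.1):
    """Reconstruction error plus a sparsity penalty on the hidden code.

    W is a list of hidden-unit weight vectors; the decoder reuses W (tied weights).
    Returns (total_loss, reconstruction_error, sparsity_penalty).
    """
    # Encode: each hidden unit responds to how well x matches its weights.
    h = [1.0 / (1.0 + math.exp(-sum(w_j * x_j for w_j, x_j in zip(w, x)))) for w in W]
    # Decode: rebuild the input as a weighted mix of the hidden units' patterns.
    x_hat = [sum(h[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in h)   # L1 penalty: fewer active units is cheaper
    return recon + beta * sparsity, recon, sparsity


x = [1.0, 0.0, 1.0]                              # hypothetical input
W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.6]]         # hypothetical weights
print(sparse_autoencoder_loss(x, W))
```

Training would adjust W to lower this total, trading off faithful reconstruction against keeping most hidden units quiet for any given input.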
After 2012, deep learning began working well in a variety of applications, such as image auto-captioning, language translation, computer vision, and several other fields. For an excellent window into this prolific period, see Jeremy Howard, “The Wonderful and Terrifying Implications of Computers That Can Learn,” TEDxBrussels 2014.
See also Steve Omohundro’s (@steveom) great TEDx Talk, What’s Happening With Artificial Intelligence? (2016). His second slide highlights a few of the multi-billion dollar investments we’ve seen in AI over the last three years.
Let’s look at a few highlights from this most recent period. In 2014, Andrew Ng, formerly at Google, joined Baidu to build a speech recognition system entirely via deep learning. This was very ambitious, as all previous speech recognition systems had involved significant amounts of human-directed training and feature engineering. Also in that year, several companies made some huge investments in deep learning, as summarized in the slide above.
In 2015, Baidu announced their deep learning network was the first to reach superhuman performance in the recognition of short clips of speech spoken over the phone (“Baidu’s Deep-Learning System Rivals People at Speech Recognition,” Tech Review, 2015). Coincident with this, Baidu has launched a smart agent, Duer, to help smartphone users do various tasks. China is now investing heavily in Baidu and other Chinese deep learning firms, and in deep learning education, in an effort to match the West on this critical technology. If our politicians stay ignorant of its strategic value, we may eventually end up getting the lead snatched away from us, the way Britain invented the chemical industry, then lost it to the more pragmatic Germans over four decades prior to WW I.
Also in 2015, we saw NVIDIA’s self-driving car, a much more rapidly emerging, and more bottom up system than the mapping-based approach to self-driving cars that Google has been developing for ten years, since Sebastian Thrun’s team won the DARPA 2005 self-driving car competition. Perhaps the most amazing thing about the NVIDIA car was that it learned to reach near-human level performance over just six months in 2015. With the right hardware, software, the right problem, and good training data, these systems can rapidly gain human level proficiency (picture below).
The layers in these deep learning systems aren’t nearly as complex as the human brain yet. The human visual system, for example, is still much more elaborate. A task like face recognition in our brain begins with neural nets in the retina of your eye, then goes to thalamic relay nets called the LGN (lateral geniculate nucleus), then goes to six layers of visual cortex at the back of your brain in V1, then to the six layers of V2, then V3 and V4, then to the fusiform face area (another six layer region of cortex, specialized to process faces) and then to individual cells, including a number of so-called grandmother cells, single cells (or sometimes, small networks) that are tuned to recognize individual faces, like the face of your nanny, and no one else. We have a ways to go before our deep learning systems are as complex as this. But we will get there, exponentially.
Facebook’s Yann LeCun (@ylecun) is a deep learning leader who is presently building the best face recognition solution available on the planet. It may have superhuman performance narrowly already, and it will achieve it broadly soon. The FBI launched a $1B face recognition project in 2012, but knowing how federal institutions contract such work, I predict it will be junk, and one of the deep learning IT leaders listed above will get there first.
If the FBI really wanted to get their solution on time and on budget, they could have done a few large parallel contracts with a variety of IT leaders, not defense contractors, and a majority of smaller incremental competitions on Kaggle, with tens of millions in education and startup bounties available for any small team or sole practitioner who deployed anything semi-competent during the competition.
That would spend a lot less for much faster, better and deeper technical and social returns. But taking a mostly bottom-up, evo devo strategy would have required their recognizing that face recognition is a tool all of us will have. They won’t be able to corner it, or even get there first. A society of Little Brothers (mass sousveillance of each other, via our PAIs) is not only inevitable, it is far safer and more antifragile than the Big Brother (surveillance) society that some of our security leaders falsely envision.
Deep learning apps dominated NVIDIA’s GTC 2016. See this May 6th NVIDIA piece on how their engineers taught a car to drive using their Drive PX hardware and other software and lots of training data. NVIDIA shipped a board last year, the GTX Titan X, that folks can use to train neural networks on their home PCs, and they’ve got a new GPU architecture (Pascal) and board (Tesla P100) that will be 10X faster at running deep networks, shipping next month.
In March 2016, Google DeepMind’s computer scientists and neuroscientists built a program, AlphaGo, that beat Lee Sedol, one of the world’s top-ranked Go players, four games out of five. Go is exponentially more complex than chess. See this lovely video for more on how a relatively small team of fifteen employees at DeepMind accomplished this amazing feat, using a blend of deep learning and reinforcement learning, and clever training and goal-development architecting for this amazing hardware and software “brain.” [Note: It also turned out, per Google CEO Sundar Pichai (@sundar_pichai) at Google I/O on May 18th, that Google built a custom ASIC chip for their deep learners, what they call a Tensor Processing Unit, which they say is ten times more efficient per watt than commercial GPUs and FPGAs. It’s great to see Google in the chip-making business for machine learning! I hope that continues.]
So we’re truly off to the races now with deep learning, and we’ll see a new generation of programmers using these increasingly biologically-inspired approaches to machine learning in the coming decade, for a vast range of uses. See Eric Siegel’s Predictive Analytics (2013) for some of the areas machine learning is already disrupting. We will see deep learning increasingly prevalent in automation and robotics of all kinds in coming years.
These successes are vindications for folks like Geoff Hinton, who in 1986, with David Rumelhart and Ronald Williams, popularized connectionist computing’s most useful algorithm to date, backpropagation. At that time, computers weren’t fast or parallel enough, and data sets weren’t big enough, for neural networks to deliver many human-surpassing results. Now they are, and Hinton leads a large deep learning team at Google.
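Backpropagation is the chain rule of calculus applied layer by layer: the error at the output is passed backwards through the network to tell every weight which way to move. A minimal sketch for a hypothetical two-input network with two tanh hidden units and one linear output, using squared error. The finite-difference check at the end is a standard way to confirm that backpropagated gradients are correct; the weights and training example are made up for illustration.

```python
import math


def forward(params, x):
    """Tiny network: 2 tanh hidden units feeding one linear output."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sum(wi * hi for wi, hi in zip(w2, h)) + b2
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss with respect to every weight, via the chain rule."""
    W1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target                                           # dL/dy
    dw2 = [dy * hi for hi in h]                               # output weights
    db2 = dy
    dz = [dy * wi * (1 - hi ** 2) for wi, hi in zip(w2, h)]   # back through tanh
    dW1 = [[dzi * xj for xj in x] for dzi in dz]              # hidden weights
    db1 = dz
    return dW1, db1, dw2, db2


def numeric_grad_W1(params, x, target, i, j, eps=1e-6):
    """Central finite difference for one hidden weight, to check backprop."""
    W1, b1, w2, b2 = params
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss((plus, b1, w2, b2), x, target)
            - loss((minus, b1, w2, b2), x, target)) / (2 * eps)


# Hypothetical weights and a single training example.
params = ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1], [1.0, -2.0], 0.05)
x, target = [1.0, 2.0], 0.5
dW1, db1, dw2, db2 = backprop(params, x, target)
print(dW1[0][1], numeric_grad_W1(params, x, target, 0, 1))
```

Backprop gets every gradient in one backward pass, while the numeric check needs two extra forward passes per weight; that efficiency is what makes training networks with billions of connections feasible.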
They are also a vindication for technologist Jeff Hawkins, who published an influential book, On Intelligence, in 2004, arguing that a special kind of neural network, an HTM network, modeled after the human cortex, would be key to the future of machine intelligence. Hawkins and his colleague, Dileep George, now running Vicarious, made some progress with their HTM-variant networks, and they opened their platform to community use. But without the resources of a Google, Microsoft, IBM, or an NVIDIA, they couldn’t quite jump start this field at the time. They also must be smiling today.
There are now many entry-level resources for learning more about deep learning. There are tons of YouTube videos on deep learning, many on recent achievements, including cat recognizers, speech recognizers, autocaptioning, video and board game playing, and self-driving cars. NVIDIA has good deep learning tutorials, including Deep Learning in a Nutshell (2015). Michael Nielsen has a great free online textbook, Neural Networks and Deep Learning (2016). See also presentations like A Short History of and Intro to Deep Learning, by John Kaufhold (89 slides). Browse the DeepLearning.net wiki for conferences and resources. See Quora’s tags for Deep Learning, Convolutional Neural Networks, etc. Join Reddit’s Machine Learning and Deep Learning communities. For places to work or invest, see Venture Scanner’s list of nearly 1,000 AI companies. Some analysts have estimated that about a fifth of these are presently employing or developing deep learning competencies in their solutions. That percentage will obviously rise, among the future leaders.
Deep Learning Captures Real Neurobiology
Are deep learning neural networks really biologically inspired, or are they just a “toy model”, slightly useful but not complex enough to capture the way the brain actually works? A new paper by Yamins and DiCarlo, Using goal-driven deep learning models to understand sensory cortex, Nature Neuroscience 19:356–365, Mar 2016, takes a big step forward in putting this question to rest.
Their paper demonstrates that even today’s simple deep learners duplicate many powerful features of how neurons in human visual sensory cortex process information and predict visual images. It also gives research guidance to computer scientists and neuroscientists over the next five years. The paper is behind a paywall, but here is an excerpt of the front page.
See also DiCarlo et al.’s 2014 paper, which directly compares the representational performance for visual object recognition of DNNs (deep neural networks) to the primate brain, finding them both efficient at constructing representational spaces in which objects of the same category are close, and objects of different categories are far apart, even with large variations in the object example, position, scale, and background. This isn’t our father’s A.I., it’s natural intelligence, or N.I.
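The idea of a good “representational space” can be made concrete with a simple score: the average distance between items of different categories divided by the average distance between items of the same category. A minimal sketch, assuming hypothetical two-dimensional embeddings of four images; the 2014 paper uses more careful statistics on high-dimensional neural and network responses, so this is only a simplified stand-in.

```python
import math
from itertools import combinations


def separation_ratio(embeddings, labels):
    """Mean between-category distance divided by mean within-category distance.

    Values well above 1 mean same-category items sit close together and
    different categories sit far apart in the representational space.
    """
    within, between = [], []
    for (a, label_a), (b, label_b) in combinations(zip(embeddings, labels), 2):
        (within if label_a == label_b else between).append(math.dist(a, b))
    return (sum(between) / len(between)) / (sum(within) / len(within))


# Hypothetical 2-D embeddings of four images.
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
print(separation_ratio(points, ["cat", "cat", "car", "car"]))
print(separation_ratio(points, ["cat", "car", "cat", "car"]))  # scrambled labels
```

With the true labels the ratio is large; with scrambled labels it drops below 1, showing that the space, not chance, is doing the sorting.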
Papers like these show us that deep learners already strongly mimic how we mammals make sense of and remember the world. DNNs are likely still missing some of our basic algorithms, however. We don’t really know, because long-term memory encoding has not yet been fully cracked by neuroscientists, though we are fast closing in on the prize.
One of the things we do know about human memory is that its most important and basic component by far is the shape and variety of synapses of the 10,000 dendritic spines (on average) that lead into every individual cortical neuron in our brains. This gross basic connectivity and synaptic weighting is crudely captured in today’s deep learners. A good book on spines, which explores how they form neural circuits and memories, is Rafael Yuste’s Dendritic Spines (2010).
In Nobel prize-worthy work published in 2013, Steve Ramirez and Xu Liu implanted a false memory of a traumatic event, a foot shock, into a living mouse’s brain. They tagged the neurons that encoded a memory with an optically sensitive transgenic protein (ChR2), then used laser light to reactivate those neurons while the mouse received a shock, in an area called the hippocampus, which holds our most recent memories, and which writes some of those short term memories to long term memory (in cortex) when we sleep. This and similar experiments have confirmed decades-old theories that our memories are stored in the architecture and connectivity of the thousands of dendritic spines that connect every one of our pyramidal neurons to each other in our brains.
These very special neurons are 80% of our 25 billion cortical neurons, and they hold all of our higher memory and personality. Curiously, the pyramidal neurons in our prefrontal cortex, where we conduct all our highest thinking and planning, have totally maxed out the number of connections they can make to other neurons. Prefrontal cortex pyramidal neurons have on average 23 times more dendritic spines than the same neurons in our primary visual cortex. There is simply no more room around these particularly helpful neurons to make more physical connections to neighboring neurons. But there will be such room in your PAI’s neural network, you can be sure.
Many of the dynamic features of neural architecture still elude us. Most molecular features can likely be ignored in a first model, as they exist to keep biological cells alive, not to allow them to think or remember. But some dynamic features are central to learning and memory. They involve things like Attractor Networks (Scholarpedia article) and Neurotransmitter Field Theory (Greer & Tuceryan 2010), and it will take a while to figure them out. But with 30,000 bright neuroscientists attending the annual Society for Neuroscience meeting, and hundreds of specialty neuroscience conferences, we’re getting closer to learning the full rules of neural learning and memory every year. If you want to study these topics further, or explore a career in this field, here’s a great free online textbook, Computational Cognitive Neuroscience (2014).
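Attractor networks are among the more approachable of these dynamics. The classic example is the Hopfield network (1982): memories are stored as stable states of the whole network, and a partial or corrupted cue settles “downhill” into the nearest stored memory. A minimal sketch, storing one hypothetical eight-unit pattern with a Hebbian (“fire together, wire together”) rule and recovering it from a copy with two units flipped; real cortical attractor dynamics are far richer than this.

```python
def train_hopfield(patterns):
    """Hebbian weights: W[i][j] = sum over patterns of p[i] * p[j], zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def recall(W, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its weighted input."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            total = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if total >= 0 else -1
    return s


stored = [1, -1, 1, -1, 1, -1, 1, -1]   # hypothetical memory (+1 = firing)
W = train_hopfield([stored])
noisy = [1, 1, 1, -1, 1, -1, -1, -1]    # same memory with two units flipped
print(recall(W, noisy) == stored)
```

The memory here lives entirely in the pattern of connection weights, not in any single unit, which echoes the idea that our memories are stored in synaptic connectivity.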
Consider this insight about our brains that recent neuroscience work has suggested. Bourne and Harris (2007) tell us that in human brains, roughly 65% of our spines are ‘thin’, 25% are ‘mushroom’ spines, and the remaining 10% are stubby, branched, or other ‘immature’ forms. See the picture at left for the different shapes. They propose that thin spines are what we do our thinking with, interpreting our sensory data and relating it to our memories and motor outputs, and mushroom spines are where we store our stable long-term memories. If this educated guess proves true, it will turn out that about one quarter of the connections in human cortex are dedicated to memory storage, and the rest dedicated to thinking, about our outside world, and our own memories. That would make us each 75% thinking, and 25% memory machines. Pretty neat, huh?
Our PAIs Will Manage Our Biases, and Make Us Perpetual Learners
To get back to where we are today, we know there will be many valleys and swamps to cross before most of us view ourselves as part-agent, part-biology. We’ll continue to experience social prejudice and conflict from biased, inflexible, extremist biological brains in human society, for decades to come. So besides improving our PAIs, we’ve got to keep empowering human beings, growing their empathy, and moving them from ideology to evidence-based thinking. But now that deep learning is on the scene, I believe we’ll make increasingly more progress decreasing human prejudice and bias by improving our PAIs, even more than our brains. Both strategies are important, but the first is far more exponential, for deep universal reasons.
As we come to see our agents as a natural part of us, we’ll re-understand ourselves as lifelong learners, as perpetual children, as experimenters, and as investigators. Our accelerating personal learning abilities via our agents will make us much less inflexible, dogmatic and judgmental of others. When it isn’t so hard to change our views, via our agents’ views, and when our agents know and have mapped our cognitive and social biases, and are helping us to manage them, every position will become more lightly held, able to be improved by the latest theories and data. At least in our PAI’s mind.
In short, we’re on the edge of an amazing world, and there’s never been a better time to be intelligent optimists. Thank you for reading.
Calls to Action
● Consider putting some of your speculative investment savings into a company using or improving deep learning. My top pick at present is NVIDIA (NVDA). They are trading at 45 with a P/E of 39. They have gained 225% over the last 18 months, and they may drop a bit soon due to profit-taking, as they’ve just recently run up a 125% increase. Nevertheless I predict they will gain at least 80% in value yet again over the next 12–36 mos. The impact of deep learning is still greatly undervalued in the business world, and NVIDIA is a solid company in the right place and time to be a Levi Strauss & Co. to the coming Gold Rush.
● Consider funding an individual’s deep learning, computer science or neuroscience training or research on GoFundMe or a similar site, and investing for equity in a deep learning startup on an equity crowdfunding site like StartEngine or Crowdfunder (available to any of us) or one of the (presently) 122 deep learning startups on AngelList (for accredited investors only, unless you join a syndicate).
CC 4.0. Anyone may share or adapt, but please with link and attribution.