Philosophical Reflections on Deep Neural Networks

January 5, 2021 / Last Modified January 6, 2021
Neural Networks

In this article, "deep neural network" means a neural network with more than one hidden layer.

The earliest perceptron networks had only two layers, an input layer and an output layer, which is why Minsky could show that such a network cannot learn even a rule as simple as XOR. But Minsky also pointed out that adding layers to the network overcomes the XOR limitation; few people noticed this second point at the time, and neural network research entered an ice age.

An MIT professor named Marvin Minsky (who was a grade behind Rosenblatt at the same high school!), along with Seymour Papert, wrote a book called _Perceptrons_ (MIT Press), about Rosenblatt's invention. They showed that a single layer of these devices was unable to learn some simple but critical mathematical functions (such as XOR). In the same book, they also showed that using multiple layers of the devices would allow these limitations to be addressed. Unfortunately, only the first of these insights was widely recognized. As a result, the global academic community nearly entirely gave up on neural networks for the next two decades.
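To make the XOR point concrete, here is a minimal sketch (assuming NumPy, sigmoid activations, a small hidden layer, and plain gradient descent; none of this comes from the quoted book) of a two-layer network learning XOR, which no single-layer perceptron can represent:

```python
import numpy as np

# XOR truth table: not linearly separable, so a single-layer
# perceptron cannot learn it; one hidden layer is enough.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # hidden layer, 4 units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer

for _ in range(10_000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)         # backprop, squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```

With so few units, training can occasionally settle in a poor local minimum; a different random seed usually fixes that.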

Neural networks kept developing regardless. Researchers proved that a network with a hidden layer is in fact a universal approximator: it can approximate any continuous function to arbitrary accuracy. In 1986, backpropagation, the algorithm that makes gradient computation tractable, resurfaced, its value was finally recognized, and neural networks enjoyed another spring. Even so, much practical work stayed with architectures that had a single hidden layer. One hidden layer is enough in theory, but in practice it consumes more computing resources. Given two networks of equal capability, the one with multiple hidden layers tends to cost less to compute than the one with a single hidden layer, most likely because its total number of neurons is smaller even though it has more layers. (A software-development analogy: a good layered design reduces the amount of code, improves maintainability, and makes optimization easier.)
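As a back-of-the-envelope illustration of this depth-versus-width trade-off (the layer sizes below are hypothetical, chosen only to show the accounting, and equal capability between the two networks is assumed rather than demonstrated):

```python
def mlp_params(layer_sizes):
    """Total weights + biases of a fully connected network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

# One wide hidden layer vs. three narrow ones, same input/output sizes.
print(mlp_params([100, 2000, 10]))        # 222010 parameters
print(mlp_params([100, 64, 64, 64, 10]))  # 15434 parameters
```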

In the 1980's most models were built with a second layer of neurons, thus avoiding the problem that had been identified by Minsky and Papert (this was their "pattern of connectivity among units," to use the framework above). And indeed, neural networks were widely used during the '80s and '90s for real, practical projects. However, again a misunderstanding of the theoretical issues held back the field. In theory, adding just one extra layer of neurons was enough to allow any mathematical function to be approximated with these neural networks, but in practice such networks were often too big and too slow to be useful.

Although researchers showed 30 years ago that to get practical good performance you need to use even more layers of neurons, it is only in the last decade that this principle has been more widely appreciated and applied. Neural networks are now finally living up to their potential, thanks to the use of more layers, coupled with the capacity to do so due to improvements in computer hardware, increases in data availability, and algorithmic tweaks that allow neural networks to be trained faster and more easily. We now have what Rosenblatt promised: "a machine capable of perceiving, recognizing, and identifying its surroundings without any human training or control."

After this long detour, the first point I want to make is this: the pattern of deep neural networks, with their many hidden layers, is remarkably consistent with how we handle many other things!

For example, when solving a problem we habitually decompose the big problem into small ones, and often decompose those further. Once the small problems are solved one by one, the big problem takes care of itself. This pattern of splitting big problems into small ones is precisely layering.

Computer science itself is a science of layers. Hardware is the bottom layer, then the OS, then the applications. Applications do not talk to the hardware directly; they go through the OS. That is the coarse layering, and inside both hardware and software there are many finer layers. I don't know hardware well, so take software: a classic design principle is the separation of mechanism and policy. That is layering too: mechanism below, policy above, and a lower layer's policy can in turn serve as the mechanism for a still higher-level policy. Software design also values encapsulation and decoupling (isolation), which is layering in essence as well: components communicate only through APIs and never see each other's implementation details.

TCP/IP networking is another classic layered structure: physical layer, link layer, IP layer, TCP/UDP layer, application layer.

Here is an excerpt explaining how, in hardware circuit design, hierarchical design can dramatically reduce the number of logic gates:

So deep circuits make the process of design easier. But they're not just helpful for design. There are, in fact, mathematical proofs showing that for some functions very shallow circuits require exponentially more circuit elements to compute than do deep circuits. For instance, a famous series of papers in the early 1980s* (*The history is somewhat complex, so I won't give detailed references. See Johan Håstad's 2012 paper On the correlation of parity and small-depth circuits for an account of the early history and references. ) showed that computing the parity of a set of bits requires exponentially many gates, if done with a shallow circuit. On the other hand, if you use deeper circuits it's easy to compute the parity using a small circuit: you just compute the parity of pairs of bits, then use those results to compute the parity of pairs of pairs of bits, and so on, building up quickly to the overall parity. Deep circuits thus can be intrinsically much more powerful than shallow circuits.
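To make the quoted parity construction concrete, here is a minimal sketch in plain Python (lists of 0/1 bits are an assumption of the sketch): XOR adjacent pairs, then pairs of pairs, so the "circuit" uses n-1 XOR gates arranged in O(log n) layers:

```python
def parity_deep(bits):
    """Compute parity layer by layer: XOR adjacent pairs, then pairs
    of pairs. Uses n-1 XOR gates in O(log n) layers."""
    layer = list(bits)
    while len(layer) > 1:
        nxt = [layer[i] ^ layer[i + 1] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:          # odd element passes through unchanged
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

assert parity_deep([1, 0, 1, 1, 0]) == 1   # three 1s -> odd parity
assert parity_deep([1, 1]) == 0
```

The quoted theorem is about circuit depth, not Python, but the shape of the computation is the point: each layer does something trivial, and depth is what keeps the total gate count small.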

The benefits of layering are obvious: each layer solves its own problem, the layers stay relatively independent and can evolve independently, and the internal structure remains clear. Anything that can be decomposed into layers can be solved, or developed, better. All of this, I feel, bears a striking resemblance to neural networks with many hidden layers: between the input layer and the output layer sit many hidden layers, each doing something we still cannot quite put into words.

The second fascinating thing about neural networks: they are computational structures built from a vast number of extremely simple units, and the assembly exhibits a measure of intelligence! This structure, in which many simple small units combine into a powerful large one, feels like it is onto something deep!

For example, the famous Game of Life... (which I don't fully understand yet; a minimal sketch follows).
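For readers equally unfamiliar with it, here is a sketch of one step of Conway's Game of Life (plain Python, a finite grid without wraparound; both are assumptions of the sketch). Every cell follows the same trivial local rule, yet the global patterns can be arbitrarily complex:

```python
def life_step(grid):
    """One Game of Life step on a finite 0/1 grid (no wraparound)."""
    rows, cols = len(grid), len(grid[0])
    def neighbors(r, c):
        # Count live cells among the 8 surrounding positions.
        return sum(grid[r + dr][c + dc]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr or dc)
                   and 0 <= r + dr < rows and 0 <= c + dc < cols)
    # Live cell survives with 2 or 3 live neighbors; dead cell is born with 3.
    return [[1 if ((n := neighbors(r, c)) == 3 or (grid[r][c] and n == 2))
             else 0 for c in range(cols)] for r in range(rows)]

blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(life_step(blinker))  # the row flips to a column: period-2 oscillator
```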

For example, the structure of organisms: they are made of countless cells. A single cell amounts to nothing on its own, but combined, cells become the living natural world.

For example, human society: each person is a small unit, and together they form families, companies, nations... If human society were a computer, what would its output layer be?

Words fail me here. I can think of many more examples but no longer know how to express them. Deep multi-layer neural networks, like so many other things in nature, agree with one another at a very deep, essential level, and the agreement is astonishing. Seen from this angle, the current success of neural networks and deep learning is only natural. If neural networks are not the ultimate vehicle for artificial intelligence, what else could be?

-- EOF --
