Deep learning, a new generation of computing model, has achieved unprecedented breakthroughs in recent years and set off a new wave of artificial intelligence development. Deep learning is essentially a multi-layer artificial neural network algorithm: a neural network that mimics the human brain, simulating its operating mechanism from the most basic units upward. Because the brain's operating mechanism is fundamentally different from that of a computer, deep learning differs greatly from the traditional computing model.
Unlike the traditional computing model, the artificial neural network algorithms behind deep learning can spontaneously summarize rules from large amounts of input data and then generalize to cases never seen before, so there is no need to manually extract the features required to solve a problem or to distill rules into a program. In effect, an artificial neural network establishes a mapping between input data and output data by training on a large amount of sample data. The most direct application is classification and recognition: if the training samples are voice data, the trained network performs speech recognition; if the training samples are face images, the trained network performs face recognition.
Traditional computer software is written by a programmer according to the functional principles to be realized and then run on a computer; the computation consists mainly of executing instructions. A deep learning artificial neural network algorithm, by contrast, involves two computational processes (see the sketch after this list):
1. Training the artificial neural network with existing sample data;
2. Running new data through the trained artificial neural network. This difference increases the need for training data and parallel computing power while reducing the need for a manual understanding of the underlying functional principles.
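As a minimal sketch of these two phases (the tiny network and random data below are illustrative placeholders, not a real speech or face dataset), the loop trains a network on labeled samples and the final lines run a new, unseen input through the trained network:

```python
# Illustrative sketch: (1) train a network on sample data, (2) run new,
# unseen data through the trained network. Model and data are placeholders.
import torch
import torch.nn as nn

# 1. Training phase: learn a mapping from inputs to labels from sample data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(256, 20)            # training samples
labels = torch.randint(0, 2, (256,))     # their known classes

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()

# 2. Inference phase: apply the trained network to data it has never seen.
with torch.no_grad():
    new_sample = torch.randn(1, 20)
    predicted_class = model(new_sample).argmax(dim=1).item()
print("predicted class:", predicted_class)
```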
Traditional computing architectures cannot support the massively parallel data processing that deep learning requires
From the analysis above, the biggest difference between deep learning and the traditional computing model is that deep learning requires no programming of rules, but does require massively parallel computation over huge amounts of data.
Traditional processor architectures (including x86, ARM, and others) often require hundreds or even thousands of instructions to complete the processing of a single neuron, and therefore cannot support the massively parallel computing that deep learning needs.
Why can't traditional computing architectures support the massively parallel computing that deep learning needs? Because they devote only limited resources to computation itself.
A traditional computing architecture generally consists of an arithmetic unit (which executes instruction calculations), a control unit (which directs instruction execution), memory (which stores instructions), input (for entering program instructions), and output (for results). Integrating the arithmetic unit and the control unit onto a single chip gives us the CPU we usually talk about today.
Looking at the internal structure of a CPU, essentially only the ALU (arithmetic logic unit) module performs the actual data computation; the other modules exist to ensure that instructions are executed one after another. This general-purpose structure is ideal for traditional program-driven computation, and performance can be raised by increasing the CPU's clock frequency (shortening the time needed to execute each instruction).
But for the computational profile of deep learning, which needs relatively few program instructions yet massive amounts of data computation, this structure is very clumsy. In particular, under today's power constraints it is no longer possible to speed up instruction execution simply by raising the CPU frequency, and this contradiction is becoming increasingly irreconcilable. Deep learning therefore requires new underlying hardware that better fits its algorithms in order to accelerate computation; in other words, new hardware plays a very important role in speeding up deep learning. The main approach so far is to use existing general-purpose chips such as GPUs and FPGAs.
A new computing platform ecosystem is taking shape
GPUs were the first chips brought into deep learning because of their parallel computing advantages
As chips built for the needs of image processing, GPUs can process massive amounts of data in parallel, which matches the requirements of deep learning; this is why they were the first chips applied to it.
In 2011, Andrew Ng (Wu Enda) took the lead in applying GPUs to the Google Brain project and achieved remarkable results, showing that 12 NVIDIA GPUs could deliver deep learning performance equivalent to about 2,000 CPUs. Researchers at New York University, the University of Toronto, and the Swiss Artificial Intelligence Lab subsequently accelerated their deep neural networks on GPUs.
NVIDIA is the world leader in programmable graphics processing technology, and its core product is the GPU.
NVIDIA quickly moved into the artificial intelligence field on the strength of the GPU's outstanding performance in deep learning, and greatly enhanced its programming efficiency, openness, and richness by building the NVIDIA CUDA platform, which supports algorithms including CNNs, DNNs, deep perception networks, RNNs, LSTMs, and reinforcement learning networks.
According to NVIDIA's public announcements, in just two years the number of companies collaborating with NVIDIA on deep learning surged nearly 35-fold to more than 3,400, spanning the medical, life sciences, energy, financial services, automotive, manufacturing, and entertainment industries.
NVIDIA develops GPUs for all types of intelligent computing devices, enabling deep learning to penetrate every kind of smart machine
IT giants vie to open-source their artificial intelligence platforms
On the one hand, a deep learning system needs huge amounts of data for training; on the other hand, it contains tens of thousands of parameters that must be tuned.
The IT giants open-source their artificial intelligence platforms in order to mobilize more outstanding engineers to participate in developing their AI systems. An open development platform brings booming downstream applications; the most typical example is Google's open-source Android platform, which directly contributed to the unprecedented prosperity of downstream mobile Internet applications.
Open-sourcing an artificial intelligence platform enhances the appeal and competitiveness of the cloud computing business
Take Google as an example. Users train and export the artificial intelligence models they need on the open-source TensorFlow platform, then import those models directly into TensorFlow Serving to offer prediction services in the cloud. The TensorFlow family thus covers the entire pipeline from building a deep learning model to serving it.
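A minimal sketch of that train-and-export flow, assuming the public Keras and SavedModel APIs (the model, data, and export path below are placeholders, not Google's actual workflow):

```python
# Illustrative sketch: train a small model with open-source TensorFlow, then
# export it in the SavedModel format that TensorFlow Serving can load.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(np.random.rand(128, 10), np.random.randint(0, 2, 128),
          epochs=2, verbose=0)

# TensorFlow Serving watches a versioned directory like this one and exposes
# the exported model as a prediction service.
tf.saved_model.save(model, "/tmp/my_model/1")
```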
In essence, users of these open-source deep learning tools are converted directly into users of the vendors' cloud computing services. Cloud computing providers, including Alibaba and Amazon, likewise embed machine learning platforms as a way to enhance their competitiveness and attract more users.
Since 2015, the world's top artificial intelligence giants have been racing to open-source their core artificial intelligence platforms, and a variety of open-source deep learning frameworks have emerged, including Caffe, CNTK, MXNet, Neon, TensorFlow, Theano, and Torch.
Artificial intelligence is spawning a new generation of dedicated computing chips
Looking back at the history of the computer industry, new computing models often spawn new dedicated computing chips. The strong demand for new kinds of computation in the artificial intelligence era is likewise giving rise to a new generation of dedicated chips.
GPUs and their limitations
At present, the new computing demands of artificial intelligence, represented by deep learning, are met mainly by accelerating workloads on GPUs, FPGAs, and other general-purpose chips suited to parallel computing.
While industrial applications have yet to scale up, using such existing general-purpose chips avoids the high investment and high risk of developing custom chips (ASICs). However, because these general-purpose chips were not designed specifically for deep learning, they naturally face bottlenecks in performance and power consumption, and as artificial intelligence applications grow in scale these problems will become increasingly prominent.
The GPU is an image processor designed to meet the massively parallel computing needs of image processing. When applied to deep learning algorithms, it therefore has three limitations:
First, its parallel computing advantage cannot be fully exploited in the application (inference) phase. Deep learning involves two kinds of computation: training and inference. The GPU is very efficient at training deep learning algorithms, but when applying a trained model it often processes only one input image at a time, so its parallelism cannot be fully realized.
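As a rough illustration of this point (a toy example, not a benchmark from the article; the model and sizes are arbitrary placeholders), the sketch below compares feeding a trained network one image at a time with feeding it a whole batch. With a batch of one, a parallel processor has far less work to spread across its compute units.

```python
# Illustrative sketch: per-image inference vs. batched inference on a small
# convolutional network, showing why batch-of-1 under-utilizes parallel hardware.
import time
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 64 * 64, 10),
).eval()

images = torch.randn(64, 3, 64, 64)     # 64 synthetic "input images"

with torch.no_grad():
    start = time.time()
    for img in images:                  # one image at a time (batch size 1)
        model(img.unsqueeze(0))
    one_by_one = time.time() - start

    start = time.time()
    model(images)                       # all 64 images as a single batch
    batched = time.time() - start

print(f"batch-of-1: {one_by_one:.3f}s  batched: {batched:.3f}s")
```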
Second, its hardware structure is fixed and cannot be reprogrammed. Deep learning algorithms are not yet completely stable; if an algorithm changes significantly, the GPU cannot flexibly reconfigure its hardware structure the way an FPGA can.
Third, running deep learning algorithms on a GPU is far less energy-efficient than on an FPGA. Research in academia and industry has shown that, for the same performance on deep learning algorithms, GPUs consume much more power than FPGAs. For example, the artificial intelligence chips of the domestic startup Shenjian Technology (DeePhi Tech), built on FPGA platforms, achieved an order-of-magnitude improvement in energy efficiency over GPUs within a comparable development cycle.
FPGAs and their limitations
The FPGA (field-programmable gate array) is a programmable logic device. It was originally designed to serve as a semi-custom chip, that is, hardware whose structure can be flexibly reconfigured in real time as requirements change.
Market research shows that the FPGA market is currently dominated by Xilinx and Altera, which together hold about 85% market share. Altera was acquired by Intel for US$16.7 billion in 2015 (the largest acquisition in Intel's history), while Xilinx has chosen to work closely with IBM. Behind these moves lies the importance of the FPGA in the artificial intelligence era.
Although FPGAs are viewed very favorably, and even the new generation of Baidu Brain is built on an FPGA platform, they were not developed specifically for deep learning algorithms and still have many limitations:
First, the basic cells have limited computing power. To achieve reconfigurability, an FPGA contains a huge number of very fine-grained basic cells, but the computing power of each cell (which relies mainly on LUTs, or lookup tables) is far lower than that of the ALU modules in CPUs and GPUs.
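As a rough illustration (a hypothetical example, not any FPGA vendor's actual implementation), the sketch below shows the idea behind a LUT: a basic cell evaluates a logic function by indexing a small precomputed truth table rather than by performing arithmetic in an ALU.

```python
# Illustrative sketch: a 4-input LUT is just a 16-entry truth table; evaluating
# the configured logic function is a single table lookup.

def make_lut4(func):
    """Precompute a 16-entry truth table for a 4-input boolean function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)
            for i in range(16)]

# Configure the LUT to implement (a AND b) XOR (c OR d).
lut = make_lut4(lambda a, b, c, d: (a & b) ^ (c | d))

def lut4(lut, a, b, c, d):
    """Evaluate the function with one lookup, as an FPGA cell does."""
    return lut[(a << 3) | (b << 2) | (c << 1) | d]

print(lut4(lut, 1, 1, 0, 0))  # -> 1, since (1 AND 1) XOR (0 OR 0) = 1
```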
Second, there is still a large gap in speed and power consumption compared with dedicated custom chips (ASICs).
Third, FPGAs are relatively expensive: at scale, the cost of a single FPGA is much higher than that of a dedicated custom chip.
Custom artificial intelligence chips are the general trend
From the perspective of development trends, custom artificial intelligence chips will be the main direction for computing chips:
First, the performance gains from custom chips are substantial. For example, NVIDIA's Tesla P100, its first chip designed from the ground up for deep learning, is 12 times faster than the GPUs it shipped in 2014.
Google's custom machine learning chip, the TPU, improved hardware performance to a level equivalent to roughly seven years of progress under Moore's Law. It should be pointed out that such rapid performance gains are significant for the development of artificial intelligence.
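As a back-of-the-envelope check of what "seven years of Moore's Law" means (the 18-24 month doubling period is my assumption, not a figure from the article), the improvement works out to roughly 11x-25x:

```python
# Illustrative arithmetic: translate "7 years of Moore's Law" into a speedup
# factor under an assumed 18- or 24-month performance doubling period.
for months_per_doubling in (18, 24):
    doublings = 7 * 12 / months_per_doubling
    print(f"{months_per_doubling} months/doubling -> "
          f"~{2 ** doublings:.0f}x in 7 years")
```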
Dr. Chen Yunji, a researcher at the Institute of Computing Technology of the Chinese Academy of Sciences and founder of the Cambricon (Cambrian) deep learning processor chip, wrote in the Communications of the China Computer Federation that by designing a specialized instruction set, microarchitecture, artificial neural circuits, and storage hierarchy, it should be possible within three to five years to raise the intelligent processing efficiency of a brain-like computer running deep learning models by a factor of 10,000 (relative to Google Brain).
The significance of a 10,000-fold increase is that a deep learning supercomputer on the scale of Google Brain could be put into a mobile phone, helping us understand and recognize images, speech, and text locally and in real time. More importantly, with real-time training capability, it could continuously improve itself by observing people's behavior and become an intelligent assistant in our daily lives.
Second, downstream demand is large enough to amortize the cost of developing custom chips. The market for artificial intelligence will not be confined to traditional computing platforms such as computers and mobile phones. From driverless cars and drones to smart home appliances, a population of devices at least dozens of times larger than that of smartphones will need perception and interaction capabilities.
Because of real-time requirements and the privacy of training data, these capabilities cannot depend entirely on the cloud and must be supported by local hardware and software. From this perspective alone, demand for custom artificial intelligence chips will be many times that of smartphones.
Third, companies that entered artificial intelligence through algorithms hope to profit by turning those algorithms into chips and products. Many companies currently enter the field through algorithms, including providers of speech recognition, image recognition, and ADAS (advanced driver assistance systems). Because they provide high-frequency, basic functional services, making a profit from algorithms alone often hits a bottleneck. By turning their core artificial intelligence algorithms into chips and products, they not only improve performance but also hope to clear a path to commercial profitability.
Well-known artificial intelligence companies, including Mobileye, SenseTime (Shangtang Technology), and Horizon Robotics, are currently working on turning their core algorithms into chips.
The tide of custom chips for artificial intelligence has begun to rise. NVIDIA announced this year that it has invested more than US$2 billion in dedicated deep learning chips, and Google quietly ran its TPU deep learning chip for a year before revealing it; that chip directly powered the human-machine Go match that shocked the world. China's Cambricon chip is also scheduled to begin industrialization this year. The AlphaGo used in that man-machine match relied on about 170 GPUs and 1,200 CPUs, which required a dedicated machine room, high-power air conditioning, and a team of experts for system maintenance. If the chips AlphaGo currently uses were replaced with chips of the Chinese-developed Cambricon architecture, the whole system is estimated to fit into a small box, which would also mean an AlphaGo that runs faster.
The emergence of dedicated artificial intelligence chips signals that a new round of computing-model change, starting at the chip level, has begun, and it marks the turning point at which the artificial intelligence industry formally matures.
The artificial intelligence chip development roadmap
The purpose of designing these chips is to accelerate deep learning algorithms and, ultimately, to emulate the human brain at the level of the underlying structure in order to achieve better intelligence.
At present, artificial intelligence chips span three stages: FPGA-based semi-custom chips, fully custom chips for deep learning algorithms, and brain-like computing chips.
Semi-custom artificial intelligence chips based on FPGAs
While chip demand is not yet large and deep learning algorithms are still unstable and being iterated, FPGAs, with their reconfigurable nature, are the best vehicle for semi-custom artificial intelligence chips. An outstanding representative of this approach is the domestic startup Shenjian Technology (DeePhi Tech), which designed the "Deep Processing Unit" (DPU) chip in the hope of delivering better-than-GPU performance at ASIC-level power consumption; its first products are built on FPGA platforms. Although this semi-custom chip relies on the FPGA platform, by abstracting an instruction set and a compiler it can be developed and iterated quickly, and it holds clear advantages over single-purpose FPGA accelerator products.
Fully custom artificial intelligence chips for deep learning algorithms
These chips are fully customized using ASIC design methodologies, with performance, power, and area optimized for deep learning algorithms. Google's TPU and the Cambricon deep learning processor from the Institute of Computing Technology of the Chinese Academy of Sciences are typical representatives of this class.
Taking the Cambricon processors as an example, the series already includes three prototype processor architectures:
Cambricon No. 1 (DianNao, a prototype processor architecture for neural networks), Cambricon No. 2 (DaDianNao, for large-scale neural networks), and Cambricon No. 3 (PuDianNao, for a broader range of machine learning algorithms).
Among them, Cambricon No. 2 (DaDianNao) runs at 606 MHz, occupies 67.7 mm² in a 28 nm process, and consumes about 16 W. A single chip delivers 21 times the performance of a mainstream GPU while consuming only 1/330 of the power, and a high-performance computing system built from 64 such chips can outperform a mainstream GPU by up to 450 times while consuming only 1/150 of the energy.
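Combining the quoted ratios gives the implied performance-per-watt advantage; the arithmetic below is purely illustrative and uses only the figures cited above.

```python
# Illustrative arithmetic: combine the quoted performance and power ratios to
# get the implied energy-efficiency (performance-per-watt) advantage over a GPU.
single_chip = 21 * 330      # 21x performance at 1/330 the power
system_64 = 450 * 150       # 450x performance at 1/150 the energy
print(f"single chip: ~{single_chip:,}x perf/W, "
      f"64-chip system: ~{system_64:,}x perf/W")
```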
The third stage: brain-like computing chips
At this stage, chip design is no longer limited to accelerating deep learning algorithms; the aim is to develop entirely new, brain-like computer architectures at the level of the chip's basic structure and even its devices, for example by using new devices such as memristors and ReRAM to increase storage density.
Research on such chips is still a long way from technology mature enough for large-scale commercial use, and it even carries great risk. In the long run, however, brain-like chips may bring about a revolution in computing architecture. A typical example is IBM's TrueNorth chip.
The TrueNorth processor consists of 5.4 billion interconnected transistors forming an array of 1 million digital neurons that communicate with one another through 256 million electrical synapses. The chip uses a structure different from the traditional von Neumann architecture, fully integrating memory, processing units, and communication components, so information is processed entirely locally. Because the amount of data processed locally is small, the bottleneck between memory and CPU found in traditional computers no longer exists. At the same time, neurons can communicate with one another conveniently and quickly: as soon as they receive pulses (action potentials) from other neurons, they act, giving the circuit its event-driven, asynchronous character. Because it needs no synchronous clock, the chip consumes very little power: 16 TrueNorth chips together consume only 2.5 watts, comparable to a tablet computer.
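To make the event-driven idea concrete, here is a minimal conceptual sketch of integrate-and-fire neurons driven purely by incoming spike events; it is a toy model of the behavior described above, not IBM's actual TrueNorth design.

```python
# Conceptual sketch: event-driven, integrate-and-fire neurons. Neurons do no
# work until a spike arrives; when a neuron's potential crosses its threshold,
# it fires and queues spikes for the neurons it connects to. No clock needed.
from collections import deque

THRESHOLD = 1.0

class Neuron:
    def __init__(self, name):
        self.name = name
        self.potential = 0.0
        self.synapses = []          # list of (target_neuron, weight)

    def connect(self, target, weight):
        self.synapses.append((target, weight))

def run(events):
    """Process spike events until the queue is empty (event-driven)."""
    queue = deque(events)           # each event: (neuron, input_weight)
    while queue:
        neuron, weight = queue.popleft()
        neuron.potential += weight
        if neuron.potential >= THRESHOLD:
            neuron.potential = 0.0  # reset after firing
            print(f"{neuron.name} fired")
            for target, w in neuron.synapses:
                queue.append((target, w))

a, b, c = Neuron("A"), Neuron("B"), Neuron("C")
a.connect(b, 0.6)
a.connect(c, 1.0)
run([(a, 1.0)])                     # one input pulse drives A, which drives C
```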
The market for brain-like computing chips is huge
It is predicted that by 2022 the market for brain-like computing chips, including consumer devices, will reach US$100 billion. Consumer devices will be the largest segment, accounting for 98.17% of the total; other demand will come from industrial inspection, aviation, and military and defense applications.
The core chip is the strategic commanding height of the artificial intelligence era
The core chip will determine the infrastructure and future ecosystem of a new computing era. Global IT giants such as Google, Microsoft, IBM, and Facebook are therefore investing heavily to accelerate research and development of core artificial intelligence chips, aiming to seize the strategic commanding heights of the new computing era and control the direction of the artificial intelligence age.
Looking back at the evolution of the x86 and ARM architectures that dominated the PC and mobile Internet eras shows how important it is to gain a first-mover advantage at the source: the core chip architecture.
Intel's x86 processors monopolized the PC era
Computer instruction set architectures fall into two types: complex instruction set computing (CISC) and reduced instruction set computing (RISC). The x86 architecture that monopolized the PC era is a complex instruction set. CISC has inherent advantages in handling complex instructions, but it also suffers from complex designs, difficult pipelining, and high power consumption. RISC was designed in the 1980s precisely to address these shortcomings of CISC, and at the time the academic community generally agreed that the reduced instruction set was more advanced.
However, Intel's 8086, the processor chip of the early PC era, predated the invention of RISC and adopted the CISC-based x86 architecture. Its successors, the 80286, 80386, and later processors, continued to use the compatible x86 architecture while strengthening each generation's compatibility with the software layers above it, and together with Microsoft, Intel formed the Wintel alliance that firmly underpinned the entire PC application ecosystem.
Software companies accustomed to Intel's x86 processors were no longer willing to use processors with other architectures, even better-performing ones. The result: in the 1990s, Intel was almost the only company that persisted in developing x86-based processors, yet it defeated the RISC processors of MIPS, PowerPC, IBM, HP, DEC, and others, and the x86 architecture firmly controlled the PC era.
ARM became the hegemon of the mobile Internet era
In the mobile Internet era, Intel did not carry its PC-era advantage forward; instead, ARM, a previously little-known British chip design company, became the new dominant force monopolizing mobile processor chips.
There are three reasons for ARM's success:
First, ARM began by developing CPUs for Apple in the early 1990s (ARM was jointly funded by Acorn, Apple, and VLSI Technology). Its relationship with Apple put it in this fast-growing market at the very start of the smartphone revolution and gave its architecture a first-mover advantage in mobile processors.
Second, ARM processors use a reduced instruction set architecture and are inherently more power-efficient than CISC-based x86 processors, which is extremely important in the mobile market.
Third, ARM created a business model of licensing its core design IP without manufacturing chips itself, quickly winning over the major chip makers and building its own ecosystem alliance.
The implications of ARM's success are:
First, the arrival of a new computing era is often an excellent opportunity for emerging companies to overtake on the bend, and even the strongest incumbents inevitably face a reshuffle.
Second, seizing the first-mover advantage in the core chip architecture and quickly building an ecosystem around it is the key to winning in a new era of computing change.
Third, the GPUs and FPGAs in use today are not custom artificial intelligence chips and have inherent limitations; dedicated artificial intelligence chips are the same blue ocean for giants and startups alike.
We are at an important turning point from the information age to the intelligent age. Artificial intelligence will drive a new round of computing revolution, and the chip industry, as the most upstream part of the industry chain, is the vanguard of the artificial intelligence era:
On the one hand, it serves as a leading indicator for the industry; on the other hand, it is the first segment to take off, and the most flexible one, in the early stage of the artificial intelligence industry. The information age produced chip giants such as Intel; the artificial intelligence era, with its even larger application market, will surely breed more "Intels."