A few days ago, Google, Baidu, Intel, AMD, Harvard University, and Stanford University jointly released a new benchmarking tool, MLPerf. The tool, recommended by AI luminary Andrew Ng and Google machine learning lead Jeff Dean, is designed specifically to measure the execution speed of machine learning software and hardware. Its arrival marks the formal launch of a standardized AI performance comparison effort and puts this still-nascent area on the right track. In short, the AI performance figures released by major vendors will no longer rest on the vendors' own say-so.
Last week, the RiseML blog compared Google's TPUv2 with NVIDIA's V100. Today, Intel published another blog post on using RNNs for machine translation, claiming that on the AWS Sockeye (https://github.com/awslabs/sockeye) neural machine translation model, "Intel Xeon Scalable processors deliver up to 4 times the performance of the NVIDIA V100."
For a long time, the industry has debated fiercely over the practical value of AI benchmarking. Supporters argue that the absence of benchmarking tools severely limits the practical application of AI technology.
As AI pioneer Andrew Ng put it in the MLPerf statement, "AI is transforming multiple industries, but to realize its full potential we still need faster hardware and software." Everyone wants more powerful compute platforms, and a standardized benchmarking program will help AI developers create such products, while helping adopters make smarter choices among the AI options that fit their needs.
It is not only Andrew Ng; Google machine learning lead Jeff Dean also strongly recommended the tool on Twitter:
To paraphrase the tweet: Google is pleased to join Stanford, Berkeley, Harvard, Baidu, Intel, AMD, and others as one of the organizations dedicated to making MLPerf a universal standard for measuring machine learning performance.
The main goals of the MLPerf project include:
Accelerate the development of machine learning through fair and practical metrics.
Enable fair comparison of competing systems while encouraging innovation that advances state-of-the-art machine learning techniques.
Keep the cost of benchmarking reasonable and allow everyone to participate.
Serve both the commercial and research communities.
Provide repeatable and reliable test results.
Until now, AI performance comparisons (covering both hardware and software) have mostly been published by vested interests; Intel's blog post on inference performance with Intel Xeon Scalable processors, mentioned above, is a prime example.
We are not singling out Intel here, but it must be admitted that although such comparisons contain important insights, they are often deliberately constructed so that a given vendor's product outperforms its competitors. A standardized benchmark would go a long way toward restoring neutrality and providing fair, objective comparisons.
The MLPerf project takes its cue from past efforts such as SPEC (the Standard Performance Evaluation Corporation). The MLPerf statement notes that "the SPEC benchmarks contributed significantly to improvements in general-purpose computing. SPEC was launched by a consortium of computing companies in 1988 and helped drive average CPU performance improvements of 1.6x per year over the following 15 years. MLPerf draws on the best practices of existing benchmarks, including SPEC's use of a suite of programs, SORT's split between a performance-comparison division and an innovation division, DeepBench's coverage of software deployed in production, and DAWNBench's time-to-accuracy metric."
Addison Snell, CEO of Intersect360 Research, pointed out: "AI has become a technological force that many companies can no longer ignore, so neutral benchmarking results are very important, especially when selecting among competing technology solutions. At the same time, AI is a highly diverse field, so it is unlikely that any single benchmark will become the dominant option over time. Five years ago, big data and analytics stirred similar enthusiasm across the technology industry, yet there is still no unified, universal benchmark in that area. I think the same may happen in the AI field."
Steve Conway, senior vice president of research at Hyperion Research, said that MLPerf represents a "positive and practical" step, "because for many years buyers and sellers have lacked the benchmarks needed to demonstrate the differences between AI products and solutions.
Existing benchmarks were built to address the bounded problems of AI's early days. As the number of unbounded AI problems grows rapidly, we clearly need additional benchmarking tools to assess them, and this matters enormously at the economic level. Bounded problems are relatively simple, such as speech and image recognition or game-playing AI. Unbounded problems include tasks such as diagnosing cancer or interpreting medical images, where the goal is to provide advice and decisions on genuinely complex issues."
MLPerf has been released on GitHub but remains in an early stage of development. As the MLPerf statement notes, "The current version is still in a 'pre-alpha' phase, so there is plenty of room for improvement. The benchmark suite is still being developed and refined; see the suggestions section below to learn how to contribute. Based on user feedback, we expect a major update to the project by the end of May."
Currently, reference implementations are available for all seven benchmarks in the MLPerf suite (from GitHub):
Image Classification – ResNet-50 v1 for ImageNet.
Object Detection – Mask R-CNN for COCO.
Speech Recognition – DeepSpeech2 for LibriSpeech.
Translation – Transformer for WMT English-German.
Recommendation – Neural Collaborative Filtering for MovieLens 20 Million (ml-20m).
Sentiment Analysis – Seq-CNN for the IMDB dataset.
Reinforcement Learning – MiniGo, for predicting game moves.
Each reference implementation provides the following: code implementing the model in at least one framework, a Dockerfile for running the benchmark inside a container, a script to download the corresponding dataset, a script to run and time the model, and documentation on the dataset, model, and machine settings.
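To make that workflow concrete, here is a minimal Python sketch of how one of these reference implementations might be driven end to end. The benchmark directory, image tag, data path, and script names (download_dataset.sh, run_and_time.sh) are assumptions for illustration only and may not match the actual repository layout.

```python
import subprocess
import time

# Hypothetical names for illustration only; the real reference
# implementations define their own directories, tags, and scripts.
BENCHMARK_DIR = "reference/translation"   # e.g. the Transformer/WMT benchmark
IMAGE_TAG = "mlperf/translation:latest"
DATA_DIR = "/data"

def run(cmd):
    """Run a shell command, echoing it and failing loudly on error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Build the container image from the benchmark's Dockerfile.
run(["docker", "build", "-t", IMAGE_TAG, BENCHMARK_DIR])

# 2. Download the dataset using the benchmark's download script
#    (name assumed), mounting a host directory to hold the data.
run(["docker", "run", "-v", f"{DATA_DIR}:{DATA_DIR}", IMAGE_TAG,
     "bash", "download_dataset.sh"])

# 3. Run and time the model; MLPerf-style benchmarks report the
#    wall-clock time needed to reach a target quality level.
start = time.time()
run(["docker", "run", "--runtime=nvidia", "-v", f"{DATA_DIR}:{DATA_DIR}",
     IMAGE_TAG, "bash", "run_and_time.sh"])
print(f"Total elapsed: {time.time() - start:.1f} s")
```

The point is simply that the Dockerfile, download script, and timing script described above give every benchmark the same reproducible shape, regardless of framework.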
According to the instructions on the GitHub page, the benchmarks have been verified on the following configuration:
16 CPUs and a single NVIDIA P100 GPU.
Ubuntu 16.04, including Docker with NVIDIA hardware support.
600 GB of disk (in practice, most benchmarks require far less storage).
We look forward to seeing how benchmarking in the AI industry ultimately shakes out: a handful of dominant players, or a hundred schools of thought contending. In such a young market, many vendors are likely to offer benchmarking tools and services. Stanford University, itself a member of the MLPerf project, has just released the first DAWNBench v1 deep learning results.
The Stanford report states: "On April 20, 2018, the first end-to-end deep learning benchmarking and performance competition officially came to a close, recording the time and cost required for common deep learning tasks to reach a target level of accuracy, as well as the inference latency and cost at that accuracy level. Focusing on end-to-end performance means we provide a more objective approach that normalizes and compares different computing frameworks, hardware, optimization algorithms, hyperparameter settings, and the other factors that affect real-world performance."
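To illustrate the time-to-accuracy idea at the heart of DAWNBench (and adopted by MLPerf), here is a minimal Python sketch. The target accuracy, hourly price, and the train_one_epoch/evaluate callbacks are placeholders for whatever model, dataset, and hardware a real submission would use.

```python
import time

def time_to_accuracy(train_one_epoch, evaluate,
                     target_accuracy=0.93,   # illustrative target, e.g. a top-5 accuracy goal
                     cost_per_hour=3.00,     # hypothetical hourly price of the machine
                     max_epochs=100):
    """Train until validation accuracy reaches the target and report the
    DAWNBench-style metrics: wall-clock training time and estimated cost."""
    start = time.time()
    for epoch in range(max_epochs):
        train_one_epoch()            # user-supplied training pass
        accuracy = evaluate()        # user-supplied validation pass
        elapsed = time.time() - start
        print(f"epoch {epoch}: accuracy={accuracy:.4f}, elapsed={elapsed:.0f}s")
        if accuracy >= target_accuracy:
            cost = elapsed / 3600.0 * cost_per_hour
            return {"time_seconds": elapsed, "estimated_cost": cost}
    raise RuntimeError("target accuracy not reached within max_epochs")
```

Inference latency and cost at the target accuracy are measured separately in the same spirit; the essential point is that every factor, from framework to hardware to hyperparameters, is judged by a single end-to-end outcome.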
Among the contestants, fast.ai, a young company focused on AI training and AI software tools, achieved outstanding results. These benchmark results matter, and Stanford University clearly takes the competition seriously. Even so, we still need more objective and fair comparison platforms of this kind, and the arrival of MLPerf should help the industry get there sooner, letting adopters choose the AI solutions that genuinely fit their actual needs.