22 results found for this scholar

Uploaded: November 4, 2020

[Journal Article] Robust Design Space Modeling

ACM Transactions on Design Automation of Electronic Systems, 2015, 20(2): 18

Published: March 1, 2015

Abstract

Architectural design spaces of microprocessors are often exponentially large in the number of tunable processor parameters. To avoid simulating all configurations in the design space, machine learning and statistical techniques have been utilized to build regression models that characterize the relationship between architectural configurations and responses (e.g., performance or power consumption). However, this article shows that the accuracy variability of many learning techniques over different design spaces and benchmarks can be significant enough to mislead decision-making. This indicates a high risk in applying techniques that worked well on previous modeling tasks (each involving a design space, benchmark, and design objective) to a new task, which can render these otherwise powerful tools impractical. Inspired by ensemble learning in the machine learning domain, we propose a robust framework called ELSE to reduce the accuracy variability of design space modeling. Rather than employing a single learning technique as in previous investigations, ELSE employs distinct learning techniques to build multiple base regression models for each modeling task. This is not a trivial combination of different techniques (e.g., always trusting the regression model with the smallest error). Instead, ELSE carefully maintains the diversity of the base regression models and constructs a metamodel from them that can provide accurate predictions even when the base models are far from accurate. Consequently, the number of cases in which the final prediction errors are unacceptably large is reduced. Experimental results validate the robustness of ELSE: compared with the widely used artificial neural network over 52 distinct modeling tasks, ELSE reduces accuracy variability by about 62%. Moreover, ELSE reduces the average prediction error by 27% and 85% for the investigated MIPS and POWER design spaces, respectively.
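
As a rough sketch of the ensemble idea described above (not the authors' ELSE implementation; the base models, toy data, and scikit-learn choices are illustrative assumptions), a metamodel can be stacked on top of deliberately diverse base regressors:

    # Sketch: stacking diverse base regressors into a metamodel, in the
    # spirit of ELSE. Data, models, and hyperparameters are illustrative.
    import numpy as np
    from sklearn.ensemble import StackingRegressor, RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Toy stand-in for a design space: each row is one architectural
    # configuration, the target is a simulated response (e.g., CPI).
    X = rng.uniform(size=(200, 6))
    y = X @ rng.uniform(size=6) + 0.1 * rng.standard_normal(200)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Diversity among base models is what limits accuracy variability:
    # the metamodel can stay accurate even when one base model is off.
    base = [
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                             random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ]
    meta = StackingRegressor(estimators=base, final_estimator=Ridge())
    meta.fit(X_tr, y_tr)
    rel_err = np.mean(np.abs(meta.predict(X_te) - y_te) / np.abs(y_te))
    print(f"mean relative error: {rel_err:.3f}")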

Uploaded: November 4, 2020

[Journal Article] A Small-Footprint Accelerator for Large-Scale Neural Networks

ACM Transactions on Computer Systems, 2015, 33(2): 6

Published: May 1, 2015

Abstract

Machine-learning tasks are becoming pervasive in a broad range of domains and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially convolutional and deep neural networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve toward heterogeneous multicores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design an accelerator with high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neuron output additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87× faster and reduces total energy by 21.08×. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the use of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
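
The operations named above (synaptic weight multiplications and neuron output additions) reduce to matrix-vector multiply-accumulate, and the memory emphasis comes from staging weights through small on-chip buffers. The sketch below is a software analogy of that tiled dataflow; the tile size and layer shapes are illustrative assumptions, not the paper's hardware parameters:

    # Sketch: tiled matrix-vector product as a software analogy of an
    # accelerator datapath; each tile models one pass over weights staged
    # into a small on-chip buffer. Sizes are illustrative.
    import numpy as np

    def tiled_layer(x, W, tile=16):
        n_out, n_in = W.shape
        y = np.zeros(n_out)
        for o in range(0, n_out, tile):        # tile of output neurons
            for i in range(0, n_in, tile):     # tile of input synapses
                y[o:o+tile] += W[o:o+tile, i:i+tile] @ x[i:i+tile]
        return y

    x = np.random.rand(256)
    W = np.random.rand(512, 256)
    assert np.allclose(tiled_layer(x, W), W @ x)  # matches untiled result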

Uploaded: November 4, 2020

[Journal Article] Iterative optimization for the data center

ACM SIGPLAN Notices, 2012, 47(4)

Published: March 1, 2012

Abstract

Iterative optimization is a simple but powerful approach that searches for the best possible combination of compiler optimizations for a given workload. However, each program, if not each data set, potentially favors a different combination. As a result, iterative optimization is plagued by several practical issues that prevent it from being widely used in practice: a large number of runs are required to find the best combination; the process can be data set dependent; and the exploration process incurs significant overhead that needs to be compensated for by performance benefits. Therefore, while iterative optimization has been shown to have significant performance potential, it is seldom used in production compilers. In this paper, we propose Iterative Optimization for the Data Center (IODC): we show that servers and data centers offer a context in which all of the above hurdles can be overcome. The basic idea is to spawn different combinations across workers and collect performance statistics at the master, which then evolves toward the optimal combination of compiler optimizations. IODC carefully manages costs and benefits, and is transparent to the end user. We evaluate IODC using both MapReduce and throughput compute-intensive server applications. To reflect the large number of users interacting with the system, we gather a very large collection of data sets (at least 1,000 and up to several million unique data sets per program), for a total storage of 10.7 TB and 568 days of CPU time. We report an average performance improvement of 1.48×, and up to 2.08×, for the MapReduce applications, and 1.14×, and up to 1.39×, for the throughput compute-intensive server applications.
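
A minimal sketch of the master-worker loop described above, assuming a toy cost function in place of real compile-and-run measurements; the flag names and the simple evolutionary policy are illustrative, not IODC's actual mechanism:

    # Sketch: the master spawns flag combinations across workers, collects
    # "run times", and evolves the population. measure() is a stand-in for
    # compiling and timing the real workload on a worker.
    import random

    FLAGS = ["-funroll-loops", "-ftree-vectorize", "-finline-functions",
             "-fomit-frame-pointer", "-fprefetch-loop-arrays"]

    def measure(combo):
        r = random.Random(hash(combo))         # deterministic toy cost
        return 10.0 - sum(combo) + r.random()  # lower is better

    def evolve(generations=20, pop_size=8):
        pop = [tuple(random.randint(0, 1) for _ in FLAGS)
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=measure)              # master gathers statistics
            parents = pop[:pop_size // 2]      # keep the fastest half
            children = []
            for p in parents:                  # mutate one flag per child
                c = list(p)
                c[random.randrange(len(FLAGS))] ^= 1
                children.append(tuple(c))
            pop = parents + children
        best = min(pop, key=measure)
        return [f for f, on in zip(FLAGS, best) if on]

    print("best combination:", evolve())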

Uploaded: November 4, 2020

[Journal Article] Leveraging the Error Resilience of Neural Networks for Designing Highly Energy Efficient Accelerators

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015, 34(8): 1223-123

Published: April 3, 2015

Abstract

In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for slashing energy consumption in applications that can tolerate a certain degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy for significant resource savings (in energy consumed, critical-path delay, and silicon area), this approach has so far been limited to application-specific integrated circuits (ASICs). These ASIC realizations have a narrow application scope and, as currently designed, are often rigid in their tolerance to inaccuracy; the latter often determines the extent of the resource savings that can be achieved. In this paper, we propose to improve the application scope, error resilience, and energy savings of inexact computing by combining it with hardware neural networks. These neural networks are fast emerging as popular candidate accelerators for future heterogeneous multicore platforms and have flexible error-resilience limits owing to their ability to be trained. Our results in 65-nm technology demonstrate that the proposed inexact neural network accelerator can achieve 1.78-2.67× savings in energy consumption (with corresponding delay and area savings of 1.23× and 1.46×, respectively) compared to the existing baseline neural network implementation, at the cost of a small accuracy loss (mean squared error increases from 0.14 to 0.20 on average).
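
As a software stand-in for the accuracy-for-energy trade described above, one can compute a neuron layer with reduced-precision operands and measure the induced error; the uniform quantization below merely emulates inexact operators and is not the paper's circuit-level technique:

    # Sketch: an "inexact" neuron layer via reduced-precision operands,
    # compared against the exact computation. Bit width is illustrative.
    import numpy as np

    def quantize(a, frac_bits=4):
        scale = 2.0 ** frac_bits               # uniform fixed-point grid
        return np.round(a * scale) / scale

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=128)
    W = rng.uniform(-1, 1, size=(64, 128))

    exact = np.tanh(W @ x)
    inexact = np.tanh(quantize(W) @ quantize(x))
    mse = np.mean((exact - inexact) ** 2)
    print(f"MSE introduced by inexact arithmetic: {mse:.5f}")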

Uploaded: November 4, 2020

[Journal Article] Accelerating Architectural Simulation Via Statistical Techniques: A Survey

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2015, 35(3): 433-446

Published: September 24, 2015

Abstract

In computer architecture research and development, simulation is a powerful way of acquiring and predicting processor behaviors. While architectural simulation has been extensively utilized for computer performance evaluation, design space exploration, and computer architecture assessment, it still suffers from high computational cost in practice. Specifically, the total simulation time is determined by the simulator's raw speed and the total number of simulated instructions. The simulator's speed can be improved by enhanced simulation infrastructures (e.g., simulators with high-level abstraction, parallel simulators, and hardware-assisted simulators). Orthogonal to such infrastructure work, recent studies have also managed to significantly reduce the total number of simulated instructions at a slight loss of accuracy. Interestingly, we observe that most of these studies are built upon statistical techniques. This survey presents a comprehensive review of such studies and proposes a taxonomy based on the sources of reduction. In addition to identifying the similarities and differences of state-of-the-art approaches, we further discuss insights gained from these studies as well as implications for future research.
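
One family of techniques such a survey covers is sampling-based simulation: simulate only a random sample of intervals in detail and bound the estimation error statistically. A minimal sketch, with synthetic per-interval CPI values standing in for detailed-simulation results:

    # Sketch: estimate mean CPI from a small random sample of intervals,
    # with a 95% confidence interval. The CPI population is synthetic.
    import numpy as np

    rng = np.random.default_rng(1)
    cpi_all = rng.gamma(shape=4.0, scale=0.3, size=1_000_000)  # full run

    sample = rng.choice(cpi_all, size=2_000, replace=False)    # simulated
    mean = sample.mean()
    half = 1.96 * sample.std(ddof=1) / np.sqrt(sample.size)    # CI half-width

    print(f"estimated CPI: {mean:.3f} +/- {half:.3f}")
    print(f"true CPI:      {cpi_all.mean():.3f}")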

Collaborating Scholars

  • No co-authors listed yet