中国科技论文在线

上传时间

2020年11月04日

【期刊论文】Performance Portability Across Heterogeneous SoCs Using a Generalized Library-Based Approach

ACM Transactions on Architecture and Code Optimization，2014，11（2）：21

2014年06月01日

Because of tight power and energy constraints, industry is progressively shifting toward heterogeneous system-on-chip (SoC) architectures composed of a mix of general-purpose cores along with a number of accelerators. However, such SoC architectures can be very challenging to efficiently program for the vast majority of programmers, due to numerous programming approaches and languages. Libraries, on the other hand, provide a simple way to let programmers take advantage of complex architectures, which does not require programmers to acquire new accelerator-specific or domain-specific languages. Increasingly, library-based, also called algorithm-centric, programming approaches propose to generalize the usage of libraries and to compose programs around these libraries, instead of using libraries as mere complements. In this article, we present a software framework for achieving performance portability by leveraging a generalized library-based approach. Inspired by the notion of a component, as employed in software engineering and HW/SW codesign, we advocate nonexpert programmers to write simple wrapper code around existing libraries to provide simple but necessary semantic information to the runtime. To achieve performance portability, the runtime employs machine learning (simulated annealing) to select the most appropriate accelerator and its parameters for a given algorithm. This selection factors in the possibly complex composition of algorithms used in the application, the communication among the various accelerators, and the tradeoff between different objectives (i.e., accuracy, performance, and energy). Using a set of benchmarks run on a real heterogeneous SoC composed of a multicore processor and a GPU, we show that the runtime overhead is fairly small at 5.1% for the GPU and 6.4% for the multi-core. We then apply our accelerator selection approach to a simulated SoC platform containing multiple inexact accelerators. We show that accelerator selection together with hardware parameter tuning achieves an average 46.2% energy reduction and a speedup of 2.1× while meeting the desired application error target.

无

0

33浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月04日

【期刊论文】Robust Design Space Modeling

ACM Transactions on Design Automation of Electronic Systems，2015，20（2）：18

2015年03月01日

摘要

Architectural design spaces of microprocessors are often exponentially large with respect to the pending processor parameters. To avoid simulating all configurations in the design space, machine learning and statistical techniques have been utilized to build regression models for characterizing the relationship between architectural configurations and responses (e.g., performance or power consumption). However, this article shows that the accuracy variability of many learning techniques over different design spaces and benchmarks can be significant enough to mislead the decision-making. This clearly indicates a high risk of applying techniques that work well on previous modeling tasks (each involving a design space, benchmark, and design objective) to a new task, due to which the powerful tools might be impractical. Inspired by ensemble learning in the machine learning domain, we propose a robust framework called ELSE to reduce the accuracy variability of design space modeling. Rather than employing a single learning technique as in previous investigations, ELSE employs distinct learning techniques to build multiple base regression models for each modeling task. This is not a trivial combination of different techniques (e.g., always trusting the regression model with the smallest error). Instead, ELSE carefully maintains the diversity of base regression models and constructs a metamodel from the base models that can provide accurate predictions even when the base models are far from accurate. Consequently, we are able to reduce the number of cases in which the final prediction errors are unacceptably large. Experimental results validate the robustness of ELSE: compared with the widely used artificial neural network over 52 distinct modeling tasks, ELSE reduces the accuracy variability by about 62%. Moreover, ELSE reduces the average prediction error by 27% and 85% for the investigated MIPS and POWER design spaces, respectively.

无

0

29浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月04日

【期刊论文】Statistical Performance Comparisons of Computers

IEEE Transactions on Computers，2014，65（5）： 1442 - 14

2014年04月04日

摘要

As a fundamental task in computer architecture research, performance comparison has been continuously hampered by the variability of computer performance. In traditional performance comparisons, the impact of performance variability is usually ignored (i.e., the means of performance observations are compared regardless of the variability), or in the few cases directly addressed with i-statistics without checking the number and normality of performance observations. In this paper, we formulate a performance comparison as a statistical task, and empirically illustrate why and how common practices can lead to incorrect comparisons. We propose a non-parametric hierarchical performance testing (HPT) framework for performance comparison, which is significantly more practical than standard i-statistics because it does not require to collect a large number of performance observations in order to achieve a normal distribution of sample mean. In particular, the proposed HPT can facilitate quantitative performance comparison, in which the performance speedup of one computer over another is statistically evaluated. Compared with the HPT, a common practice which uses geometric mean performance scores to estimate the performance speedup has errors of 8.0 to 56.3 percent on SPEC CPU2006 or SPEC MPI2007, which demonstrates the necessity of using appropriate statistical techniques. This HPT framework has been implemented as an open-source software, and integrated in the PARSEC 3.0 benchmark suite.

无

0

27浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月04日

【期刊论文】Architecture Support for Task Out-of-Order Execution in MPSoCs

IEEE Transactions on Computers，2014，64（5）：1296 - 131

2014年04月09日

摘要

Multi-processor system on chip (MPSoC) has been widely applied in embedded systems in the past decades. However, it has posed great challenges to efficiently design and implement a rapid prototype for diverse applications due to heterogeneous instruction set architectures (ISA), programming interfaces and software tool chains. In order to solve the problem, this paper proposes a novel high level architecture support for automatic out-of-order (OoO) task execution on FPGA based heterogeneous MPSoCs. The architecture support is composed of a hierarchical middleware with an automatic task level OoO parallel execution engine. Incorporated with a hierarchical OoO layer model, the middleware is able to identify the parallel regions and generate the sources codes automatically. Besides, a runtime middleware Task-Scoreboarding analyzes the inter-task data dependencies and automatically schedules and dispatches the tasks with parameter renaming techniques. The middleware has been verified by the prototype built on FPGA platform. Examples and a JPEG case study demonstrate that our model can largely ease the burden of programmers as well as uncover the task level parallelism.

无

0

27浏览
0点赞
0收藏
0分享
0下载
0

引用

上传时间

2020年11月04日

【期刊论文】Effective and efficient microprocessor design space exploration using unlabeled design configurations

ACM Transactions on Intelligent Systems and Technology，2014，5（1）：20

2014年01月01日

摘要

Ever-increasing design complexity and advances of technology impose great challenges on the design of modern microprocessors. One such challenge is to determine promising microprocessor configurations to meet specific design constraints, which is called Design Space Exploration (DSE). In the computer architecture community, supervised learning techniques have been applied to DSE to build regression models for predicting the qualities of design configurations. For supervised learning, however, considerable simulation costs are required for attaining the labeled design configurations. Given limited resources, it is difficult to achieve high accuracy. In this article, inspired by recent advances in semisupervised learning and active learning, we propose the COAL approach which can exploit unlabeled design configurations to significantly improve the models. Empirical study demonstrates that COAL significantly outperforms a state-of-the-art DSE technique by reducing mean squared error by 35% to 95%, and thus, promising architectures can be attained more efficiently.

无

0

27浏览
0点赞
0收藏
0分享
0下载
0

引用