Sciencepaper Online (中国科技论文在线)

Uploaded: November 4, 2020

[Journal article] Godson-3: A Scalable Multicore RISC Processor with x86 Emulation

IEEE Micro, 2009, 29(2): 17-29

April 7, 2009

The Godson-3 microprocessor aims at high-throughput server applications, high-performance scientific computing, and high-end embedded applications. It offers a scalable network on chip, hardware support for x86 emulation, and a reconfigurable architecture. The four-core Godson-3 chip is fabricated with 65-nm CMOS technology. Eight- and 16-core Godson-3 chips are in development.

Uploaded: November 4, 2020

[Journal article] Global Clock, Physical Time Order and Pending Period Analysis in Multiprocessor Systems

arXiv, 2009

July 12, 2009

Abstract

In multiprocessor systems, various problems are treated with Lamport's logical clock and the resulting logical time orders between operations. In practice, however, one often faces high complexity caused by the lack of logical time order information. In this paper, we use the global clock to assign a so-called pending period to each operation in a multiprocessor system, where the pending period is a time interval that contains the time at which the operation is performed. Further, we define the physical time order for any two operations with disjoint pending periods. The physical time order is obeyed by any real execution in a multiprocessor system because it is part of the operation orders that actually occurred under the global clock, and it is proven to be independent of and consistent with traditional logical time orders. These novel yet fundamental concepts enable new, effective approaches for analyzing multiprocessor systems, collectively named pending period analysis. With pending period analysis, many important problems of multiprocessor systems can be tackled effectively. As a significant application example, complete memory consistency verification, which is known to be NP-hard, can be solved with complexity O(n^2), where n is the number of operations. Moreover, the two event ordering problems, which were proven to be Co-NP-hard and NP-hard respectively, can both be solved in O(n) time when restricted by pending period information.
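
The definition of physical time order in this abstract can be illustrated with a minimal sketch. It assumes each pending period is a closed interval of integer global-clock ticks, and that "disjoint" means one interval ends strictly before the other begins. The `Operation` class, its field names, and the sample values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    start: int  # assumed: earliest global-clock tick at which the operation may have performed
    end: int    # assumed: latest global-clock tick at which the operation may have performed


def physically_before(u: Operation, v: Operation) -> bool:
    """True if u's pending period ends strictly before v's begins."""
    return u.end < v.start


def physical_time_order(ops):
    """Return every ordered pair of names whose pending periods are disjoint."""
    return [(u.name, v.name) for u in ops for v in ops if physically_before(u, v)]


# Hypothetical pending periods for three memory operations.
ops = [
    Operation("st_x", 0, 4),
    Operation("ld_x", 6, 9),
    Operation("st_y", 3, 7),
]
pairs = physical_time_order(ops)
```

In this sample only `st_x` and `ld_x` have disjoint pending periods, so they are the only ordered pair. `st_y` overlaps both of the others, so the physical time order alone says nothing about where it falls.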

Uploaded: November 4, 2020

[Journal article] System Architecture of Godson-3 Multi-Core Processors

Journal of Computer Science and Technology, 2010, 25: 181–191

March 16, 2010

Abstract

Godson-3 is the latest generation of the Godson microprocessor family. It adopts a scalable multi-core architecture with hardware support for accelerating applications, including x86 emulation and signal processing. This paper introduces the system architecture of Godson-3 from several aspects, including system scalability, organization of the memory hierarchy, network-on-chip, inter-chip connection, and the I/O subsystem.

Uploaded: November 4, 2020

[Journal article] Linear Time Memory Consistency Verification

IEEE Transactions on Computers, 2011, 61(4): 502-516

February 10, 2011

Abstract

Verifying the execution of a parallel program against a given memory consistency model (memory consistency verification) is a crucial problem in the functional validation of Chip Multiprocessors (CMPs). In the absence of additional information, this problem is known to be NP-hard. By adopting pending period information, this paper proposes the first linear-time software-based approach to memory consistency verification. Our approach relies on a novel technique called reusable cycle checking, which reuses previous order information when repeatedly checking for cycles at different frontiers. In the context of pending period information, this technique significantly reduces the overall computational cost of cycle checking, enabling linear-time (in the number of memory operations) memory consistency verification for any given multicore system with a constant number of processors. From a practical perspective, an industrial memory consistency verification tool named XCHECK has been developed based on our approach. XCHECK requires neither test-program constraints nor dedicated hardware support in the postsilicon verification of many multiprocessor systems. Experimental results show that XCHECK is 3-10 times faster than a state-of-the-art software-based approach. XCHECK has been integrated into the verification platforms of the industrial multicore processor Godson-3B and has found several bugs in the design.
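
The cycle checking this abstract mentions can be pictured with a plain, non-reusable check on an order graph. The sketch below is not the paper's reusable cycle checking and is not XCHECK. It assumes that operations are numbered from 0, that each edge `(u, v)` means "u must be ordered before v", and that a cycle among these edges indicates a violation. The function name and the sample edge lists are hypothetical.

```python
from collections import defaultdict, deque


def has_cycle(num_ops, edges):
    """Kahn's algorithm: return True if the directed order graph contains a cycle."""
    successors = defaultdict(list)
    indegree = [0] * num_ops
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Start from operations that nothing is required to precede.
    ready = deque(i for i in range(num_ops) if indegree[i] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Any operation never released sits on, or behind, a cycle.
    return visited < num_ops


# Hypothetical order edges over three operations.
consistent_edges = [(0, 1), (1, 2), (0, 2)]
violating_edges = [(0, 1), (1, 2), (2, 0)]
```

One such check runs in time linear in the number of operations and edges. The abstract's contribution concerns repeated checks at different frontiers, where earlier order information is reused rather than recomputed, and this sketch does not attempt that.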

Uploaded: November 4, 2020

[Journal article] Program Regularization in Memory Consistency Verification

IEEE Transactions on Parallel and Distributed Systems, 2012, 23(11): 2163-217

January 31, 2012

Abstract

A widely adopted methodology for verifying the memory subsystem of a Chip Multiprocessor (CMP) is to verify executions of parallel test programs on the CMP against the given memory consistency model, which has long been known to be time consuming in both theory and practice. To accelerate memory consistency verification, previous approaches have had to sacrifice either availability (e.g., by relying on dedicated hardware support that many commodity CMPs do not offer) or completeness (e.g., by missing some bugs). Meanwhile, the impact of parallel programs on memory consistency verification has been largely overlooked. One piece of evidence is that few investigations have been dedicated to finding appropriate test programs that enable more efficient verification. From the novel perspective of the test program, we devise a practical technique called "program regularization," which can effectively reduce the computation time of memory consistency verification. The key intuition behind program regularization is that any parallel program, if reformed appropriately, can enable efficient memory consistency verification. More specifically, for an original program, program regularization introduces some auxiliary memory addresses and periodically inserts load/store operations that access these addresses into the original program. With the regularized program, memory consistency verification can be accomplished in linear time (with respect to the number of memory operations) when the number of processors is fixed. Experimental results show that program regularization can significantly accelerate memory consistency verification. Last but not least, our technique, which does not rely on a specific verification algorithm or dedicated hardware support, can be smoothly integrated into existing presilicon/postsilicon verification platforms of industrial CMPs to speed up memory consistency verification.
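
The periodic insertion this abstract describes can be sketched for a single thread. This is a rough illustration only: the abstract does not specify which operations are inserted, how often, or how the auxiliary addresses are chosen. The sketch assumes a store to one auxiliary address after every `period` original operations. The `regularize` function, the tuple encoding of operations, and the address name `AUX` are hypothetical.

```python
def regularize(program, period, aux_address="AUX"):
    """Return a copy of `program` with an auxiliary store after every `period` operations.

    Each inserted store writes a fresh counter value to `aux_address`
    (an assumption made here so the inserted stores are distinguishable).
    """
    regularized = []
    for index, op in enumerate(program, start=1):
        regularized.append(op)
        if index % period == 0:
            regularized.append(("store", aux_address, index // period))
    return regularized


# Hypothetical single-thread test program: (kind, address[, value]) tuples.
original = [
    ("load", "x"),
    ("store", "y", 1),
    ("load", "y"),
    ("store", "x", 2),
]
regular = regularize(original, period=2)
```

Removing the operations on `AUX` from `regular` gives back `original`, so the inserted accesses leave the original operations and their program order untouched.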
