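IRDiff: Code Difference Analysis Based on LLVM Intermediate Representation
First published: 2018-02-09
Abstract: Code difference analysis is one of the important research problems in multi-version program analysis. Existing approaches, including line-based comparison and analysis based on abstract syntax trees, are often affected by code movement: the compared code becomes misaligned, which makes the difference analysis imprecise. This paper proposes a source code difference analysis algorithm based on the LLVM intermediate representation (LLVM IR). The algorithm first converts the source code of the two program versions under comparison into LLVM IR. It then uses the syntactic structure, control flow, and data flow information of the IR to search for isomorphic nodes in the control flow graphs of the two versions, producing a series of isomorphic subgraphs and a set of difference nodes, from which the code differences between the two versions are derived. Exploiting the hierarchical structure of the IR together with control flow information avoids the misalignment caused by code movement, and combining control dependence, data dependence, and type information improves the accuracy of the analysis. Based on this algorithm, we implemented a code difference analysis tool, IRDiff, and evaluated it experimentally on the SIR program set. The experiments show that the tool analyzes the code differences between versions of C/C++ programs well.
Keywords: code evolution analysis; source code differencing; control flow graph; intermediate representation; program comprehension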
IRDiff: Source Code Differencing Based on LLVM IR
Abstract: Source code differencing is one of the most important research issues in multi-version program analysis. Existing work, including text differencing and abstract syntax tree-based code analysis, is often affected by code movement, which misaligns the compared code and makes the comparison inaccurate. This paper proposes a new approach to source code difference analysis based on the LLVM intermediate representation (LLVM IR). The two versions of the source code are taken as input and transformed into LLVM IR. Then, based on the syntax structure, control flow, and data flow information represented in the LLVM IR, the approach finds isomorphic nodes in the control flow graphs of the two versions. This yields a series of isomorphic subgraphs and a set of difference nodes, from which the code differences between the two versions of the program are obtained. By using LLVM IR's hierarchical structure and control flow information, the code misalignment caused by code movement can be avoided, and the accuracy of the difference analysis can be improved by combining control dependence, data dependence, and type information. Based on this algorithm, we implemented a code difference analysis tool, called IRDiff, and used the SIR benchmark for experimental evaluation. Experiments show that the tool can analyze the code differences between different versions of C/C++ programs well.
Keywords: source evolution analysis; control flow graph; program comprehension; graph differencing; intermediate representation
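The core idea in the abstract, pairing nodes of two control flow graphs and reporting the unpaired ones as differences, can be illustrated with a minimal sketch. This is not IRDiff's algorithm: the function name `diff_cfgs`, the representation of a basic block as a tuple of opcode names, and the sample graphs are all assumptions made for illustration, and the data dependence and type information that the paper also uses are omitted. The sketch matches blocks by walking control flow edges from the entry block, so a block is paired by its content and position in the graph rather than by its name or textual order.

```python
from collections import deque

def diff_cfgs(cfg_a, cfg_b, entry_a="entry", entry_b="entry"):
    """Pair blocks of two CFGs by walking control flow from the entries.

    Each CFG is a dict: block name -> (tuple of opcodes, successor names).
    Returns (matched, only_a, only_b). Hypothetical helper, not IRDiff's API.
    """
    matched = {}      # block in version A -> block in version B
    used_b = set()    # blocks of version B that are already paired
    queue = deque([(entry_a, entry_b)])
    while queue:
        a, b = queue.popleft()
        if a in matched or b in used_b:
            continue
        if cfg_a[a][0] != cfg_b[b][0]:
            continue  # opcode sequences differ: not treated as the same node
        matched[a] = b
        used_b.add(b)
        # Propose successor pairs whose contents agree, in any branch order.
        for sa in cfg_a[a][1]:
            for sb in cfg_b[b][1]:
                if cfg_a[sa][0] == cfg_b[sb][0]:
                    queue.append((sa, sb))
    only_a = sorted(set(cfg_a) - set(matched))
    only_b = sorted(set(cfg_b) - used_b)
    return matched, only_a, only_b

# Version 1 of a small function (made-up opcode sequences).
cfg_v1 = {
    "entry": (("alloca", "store", "br"), ["check"]),
    "check": (("load", "icmp", "br"), ["then", "else"]),
    "then":  (("load", "add", "store", "br"), ["exit"]),
    "else":  (("load", "sub", "store", "br"), ["exit"]),
    "exit":  (("load", "ret"), []),
}
# Version 2: blocks renamed, branch order swapped, one block edited (add -> mul).
cfg_v2 = {
    "entry": (("alloca", "store", "br"), ["cond"]),
    "cond":  (("load", "icmp", "br"), ["neg", "pos"]),
    "neg":   (("load", "sub", "store", "br"), ["done"]),
    "pos":   (("load", "mul", "store", "br"), ["done"]),
    "done":  (("load", "ret"), []),
}

matched, only_a, only_b = diff_cfgs(cfg_v1, cfg_v2)
print(matched)
print(only_a, only_b)
```

In this example the renamed and reordered blocks are still paired, and only the edited block is reported on each side. A limitation of the sketch is that blocks reachable only through a changed block are never visited, so they are reported as differences even when their contents are identical.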