International Journal of Software Engineering and Knowledge Engineering, pages 1-23

GMRepair: Graph Mining Template-based Automated Software Repair

Heling Cao 1, 2, 3, 4
Yanlong Guo 1, 2, 3, 4
Yun Wang 1, 2, 3, 4
Fangchao Tian 1, 2, 3, 4
Zhaolong Wang 1, 2, 3, 4
Yonghe Chu 1, 2, 3, 4
Miaolei Deng 1, 2, 3, 4
Panpan Wang 1, 2, 3, 4
Zhenghao He 1, 2, 3, 4
Shuting Wei 1, 2, 3, 4
Publication type: Journal Article
Publication date: 2025-01-27
Scimago: Q3
SJR: 0.251
CiteScore: 1.9
Impact factor: 0.6
ISSN: 02181940, 17936403
Abstract

As software grows in scale and complexity, automated software bug repair has become increasingly important. However, current automated repair approaches suffer from coarse-grained repair granularity and poor patch quality. To address these problems, we propose GMRepair, a graph mining template-based automated software repair approach. First, GMRepair applies the Ochiai fault localization technique to generate a list of suspicious statements. We use the GumTree tool to parse the buggy and repaired program files and generate edit scripts, which are then transformed into a graph representation. Second, we use a frequent graph miner to obtain graph mining templates, and we match the context of the suspicious statements against the context of these templates to generate an initial population. The buggy program is then evolved using genetic programming, with mutation and crossover operations generating new individuals. Finally, we run the candidate patches (CPs) sequentially against their corresponding test cases, which are ordered using test case prioritization techniques. Patches that fail the test cases are filtered out, and patches that pass are output. We conducted experiments on two datasets, QuixBugs and Defects4J. GMRepair successfully repaired 41 defects in Defects4J and 15 defects in QuixBugs. Compared with existing methods, GMRepair offers a higher success rate and efficiency in defect repair.
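The abstract names Ochiai as the fault localization step but does not spell it out. As a minimal sketch, the standard Ochiai score of a statement is ef / sqrt(total_failed * (ef + ep)), where ef and ep are the numbers of failing and passing tests that execute the statement. The function names, the coverage layout, and the toy data below are illustrative assumptions, not part of GMRepair.

```python
import math

def ochiai(ef, ep, total_failed):
    """Standard Ochiai suspiciousness for one statement.

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    total_failed: all failing tests in the suite
    """
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(coverage, outcomes):
    """Return (statement, score) pairs, most suspicious first.

    coverage: test name -> set of executed statement ids (assumed layout)
    outcomes: test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and outcomes[t])
        scores[stmt] = ochiai(ef, ep, total_failed)
    # Sort by descending score, breaking ties by statement id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy suite: t1 fails, t2 and t3 pass.
coverage = {"t1": {1, 2, 3}, "t2": {1, 3}, "t3": {1, 2}}
outcomes = {"t1": False, "t2": True, "t3": True}
ranking = rank_statements(coverage, outcomes)
```

In this toy suite, statement 1 is executed by every test and so ranks below statements 2 and 3, which are each covered by the failing test and only one passing test. A repair tool would try candidate patches at the top-ranked statements first.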

Cao H., Han D., Chu Y., Tian F., Wang Y., Liu Y., Jia J., Ge H.
2024-04-18 citations by CoLab: 1 Abstract  
Automatic program repair (APR) is crucial to improve software quality. Recently, neural machine translation (NMT) based modeling for bug fixes has demonstrated great potential. However, these approaches still have two major challenges. One is that their search space is limited due to the out-of-vocabulary (OOV) problem. The other is that the NMT-based APR models tend to ignore past translation information, which often leads to over-translation and under-translation. To address the above challenges, we propose MNRepair, a new NMT-based APR approach that combines multiple mechanisms to fix bugs in source code. Specifically, we devise an encoder-decoder NMT framework with the attention mechanism. Our framework combines the copy mechanism to overcome the OOV problem that occurs with source code. To deal with the over-translation and under-translation, we utilize a coverage mechanism to record past translation information. MNRepair is able to capture a wide range of repair operators and fix 26 bugs in Defects4J. Our evaluation shows the effectiveness of multiple mechanisms in the repair process.
Chen L., Pei Y., Furia C.A.
2021-12-01 citations by CoLab: 14 Abstract  
Most techniques for automated program repair (APR) use tests to drive the repair process; this makes them prone to generating spurious repairs that overfit the available tests unless additional information about expected program behavior is available. Our previous work on Jaid, an APR technique for Java programs, showed that constructing detailed state abstractions (similar to those employed by techniques for programs with contracts) from plain Java code without any special annotations provides valuable additional information, and hence helps mitigate the overfitting problem. This paper extends the work on Jaid with a comprehensive experimental evaluation involving 693 bugs in three different benchmark suites. The evaluation shows, among other things, that: 1) Jaid is effective: it produced correct fixes for over 15 percent of all bugs, with a precision of nearly 60 percent; 2) Jaid is reasonably efficient: on average, it took less than 30 minutes to output a correct fix; 3) Jaid is competitive with the state of the art, as it fixed more bugs than any other technique, and 11 bugs that no other tool can fix; 4) Jaid is robust: its heuristics are complementary and their effectiveness does not depend on the fine-tuning of parameters. The experimental results also indicate the main trade-offs involved in designing an APR technique based on tests, as well as possible directions for further progress in this line of work.
Nowack V., Bowes D., Counsell S., Hall T., Haraldsson S., Winter E., Woodward J.
2021-10-01 citations by CoLab: 5 Abstract  
Automatic Program Repair (APR) has been proposed to help developers and reduce the time spent repairing programs. Recent APR tools have applied learned templates (fix patterns) to fix code using knowledge from fixes successfully applied in the past. However, there is still no general agreement on the representation of fix patterns, making their application and comparison with a baseline difficult. As a consequence, it is also difficult to expand fix patterns and further enable APR. We automatically generate fix patterns from similar fixes and compare the generated fix patterns against a state-of-the-art taxonomy. Our automated approach splits fixes into smaller, method-level chunks and calculates their similarity. A threshold-based clustering algorithm groups similar chunks and finds matches with state-of-the-art fix patterns. In our evaluation, we present 33 clusters whose fix patterns were generated from the fixes of 835 Defects4J bugs. Of those 33 clusters, 22 matched a state-of-the-art taxonomy with good agreement. The remaining 11 clusters were thematically analysed and generated new fix patterns that expanded the taxonomy. Our new fix patterns should enable APR researchers and practitioners to expand their tools to fix a greater range of bugs in the future.
Wong C., Santiesteban P., Kästner C., Le Goues C.
2021-08-18 citations by CoLab: 32 Abstract  
Automatically repairing a buggy program is essentially a search problem, searching for code transformations that pass a set of tests. Various search strategies have been explored, but they either navigate the search space in an ad hoc way using heuristics, or systematically but at the cost of limited edit expressiveness in the kinds of supported program edits. In this work, we explore the possibility of systematically navigating the search space without sacrificing edit expressiveness. The key enabler of this exploration is variational execution, a dynamic analysis technique that has been shown to be effective at exploring many similar executions in large search spaces. We evaluate our approach on IntroClassJava and Defects4J, showing that a systematic search is effective at leveraging and combining fixing ingredients to find patches, including many high-quality patches and multi-edit patches.
Ye H., Martinez M., Durieux T., Monperrus M.
Journal of Systems and Software scimago Q1 wos Q1
2021-01-01 citations by CoLab: 44 Abstract  
Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic repair on a benchmark of bugs called QuixBugs, which has been little studied. In this paper, 1) We report on the characteristics of QuixBugs; 2) We study the effectiveness of 10 program repair tools on it; 3) We apply three patch correctness assessment techniques to comprehensively study the presence of overfitting patches in QuixBugs. Our key results are: 1) 16/40 buggy programs in QuixBugs can be repaired with at least a test suite adequate patch; 2) A total of 338 plausible patches are generated on the QuixBugs by the considered tools, and 53.3% of them are overfitting patches according to our manual assessment; 3) The three automated patch correctness assessment techniques, RGT_Evosuite, RGT_InputSampling and GT_Invariants, achieve an accuracy of 98.2%, 80.8% and 58.3% in overfitting detection, respectively. To our knowledge, this is the largest empirical study of automatic repair on QuixBugs, combining both quantitative and qualitative insights. All our empirical results are publicly available on GitHub in order to facilitate future research on automatic program repair.
Lutellier T., Pham H.V., Pang L., Li Y., Wei M., Tan L.
2020-07-18 citations by CoLab: 205 Abstract  
Automated generate-and-validate (G&V) program repair techniques (APR) typically rely on hard-coded rules, thus only fixing bugs following specific fix patterns. These rules require a significant amount of manual effort to discover and it is hard to adapt these rules to different programming languages. To address these challenges, we propose a new G&V technique, CoCoNuT, which uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, we introduce a new context-aware NMT architecture that represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs), since CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs and combines these models using ensemble learning to fix more bugs. Our evaluation on six popular benchmarks for four programming languages (Java, C, Python, and JavaScript) shows that CoCoNuT correctly fixes (i.e., the first generated patch is semantically equivalent to the developer’s patch) 509 bugs, including 309 bugs that are fixed by none of the 27 techniques with which we compare.
Koyuncu A., Liu K., Bissyandé T.F., Kim D., Klein J., Monperrus M., Le Traon Y.
Empirical Software Engineering scimago Q1 wos Q1
2020-03-14 citations by CoLab: 139 Abstract  
Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner’s generated plausible patches are correct.
Tufano M., Watson C., Bavota G., Penta M.D., White M., Poshyvanyk D.
2019-09-02 citations by CoLab: 210 Abstract  
Millions of open source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. First, we mine millions of bug-fixes from the change histories of projects hosted on GitHub in order to extract meaningful examples of such bug-fixes. Next, we abstract the buggy and corresponding fixed code, and use them to train an Encoder-Decoder model able to translate buggy code into its fixed version. In our empirical investigation, we found that such a model is able to fix thousands of unique buggy methods in the wild. Overall, this model is capable of predicting fixed patches generated by developers in 9-50% of the cases, depending on the number of candidate patches we allow it to generate. Also, the model is able to emulate a variety of different Abstract Syntax Tree operations and generate candidate patches in a split second.
Saha S., Saha R.K., Prasad M.R.
2019-05-01 citations by CoLab: 72 Abstract  
Despite significant advances in automatic program repair (APR) techniques over the past decade, practical deployment remains an elusive goal. One of the important challenges in this regard is the general inability of current APR techniques to produce patches that require edits in multiple locations, i.e., multi-hunk patches. In this work, we present a novel APR technique that generalizes single-hunk repair techniques to include an important class of multi-hunk bugs, namely bugs that may require applying a substantially similar patch at a number of locations. We term such sets of repair locations as evolutionary siblings - similar looking code, instantiated in similar contexts, that are expected to undergo similar changes. At the heart of our proposed method is an analysis to accurately identify a set of evolutionary siblings, for a given bug. This analysis leverages three distinct sources of information, namely the test-suite spectrum, a novel code similarity analysis, and the revision history of the project. The discovered siblings are then simultaneously repaired in a similar fashion. We instantiate this technique in a tool called HERCULES and demonstrate that it is able to correctly fix 46 bugs in the Defects4J dataset, the highest of any individual APR technique to date. This includes 15 multi-hunk bugs and overall 11 bugs which have not been fixed by any other technique so far.
Liu K., Koyuncu A., Bissyande T.F., Kim D., Klein J., Le Traon Y.
2019-04-01 citations by CoLab: 101 Abstract  
Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by "tweaking" the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievement with respect to the different tuning parameters implemented in APR systems.
Liu K., Koyuncu A., Kim D., Bissyande T.F.
2019-02-01 citations by CoLab: 109 Abstract  
Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. In this paper, we propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the AVATAR APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. Evaluated on the Defects4J benchmark, we show that, assuming a perfect localization of faults, AVATAR can generate correct patches to fix 34/39 bugs. We further find that AVATAR yields performance metrics that are comparable to that of the closely-related approaches in the literature. While AVATAR outperforms many of the state-of-the-art pattern-based APR systems, it is mostly complementary to current approaches. Overall, our study highlights the relevance of static bug finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases.
Martinez M., Monperrus M.
2018-08-21 citations by CoLab: 55 Abstract  
Astor is a program repair library which has different modes. In this paper, we present the Cardumen mode of Astor, a repair approach based on mined templates that has an ultra-large search space. We evaluate the capacity of Cardumen to discover test-suite adequate patches (aka plausible patches) over the 356 real bugs from Defects4J [11]. Cardumen finds 8935 patches over 77 bugs of Defects4J. This is the largest number of automatically synthesized patches ever reported, all patches being available in an open-science repository. Moreover, Cardumen identifies 8 unique patches, that are patches for Defects4J bugs that were never repaired in the whole history of program repair.
Hua J., Zhang M., Wang K., Khurshid S.
2018-05-27 citations by CoLab: 97 Abstract  
Effective program repair techniques, which modify faulty programs to fix them with respect to given test suites, can substantially reduce the cost of manual debugging. A common repair approach is to iteratively first generate candidate programs with possible bug fixes and then validate them against the given tests until a candidate that passes all the tests is found. While this approach is conceptually simple, due to the potentially high number of candidates that need to first be generated and then be compiled and tested, existing repair techniques that embody this approach have relatively low effectiveness, especially for faults at a fine granularity.