International Journal of Machine Learning and Cybernetics

Periodic frequent subgraph mining in dynamic graphs

Jiayu Cai
Zhaoming Chen
Guoting Chen
Wensheng Gan
Amaël Broustet
Publication typeJournal Article
Publication date2025-02-25
scimago Q1
SJR0.988
CiteScore7.9
Impact factor3.1
ISSN18688071, 1868808X
He C., Chen X., Chen G., Gan W., Fournier-Viger P.
2024-07-01 citations by CoLab: 2 Abstract  
Graph mining has numerous real-world applications. The goal is to extract interesting subgraphs or patterns in graph databases. Dynamic attributed graphs are more complex databases in which graphs change over time, and each vertex has multiple attributes. However, most algorithms only pay attention to finding rules that show relationships between nodes, and they typically have substantial limitations, such as finding rules with a single attribute or vertex and edges that do not change over time. To address these limitations, we propose an algorithm called CAR-Miner, which aims to mine credible attribute rules in dynamic attributed graphs. The algorithm incorporates a novel objective interestingness measure that is stable, anti-monotonic, and can eliminate cross-support core patterns. We conducted several experiments on real-life databases. The results show that our algorithm outperforms the state-of-the-art algorithm, especially as the number of attributes increases. Additionally, we find patterns that have practical significance and can be interpreted in various ways. Overall, CAR-Miner is a promising algorithm for mining patterns in dynamic attributed graphs, which can help researchers and practitioners identify valuable patterns that were previously difficult to discover.
Chen Z., Chen X., Chen G., Gan W.
2023-12-15 citations by CoLab: 3
Dong Y., Ma J., Wang S., Chen C., Li J.
2023-10-01 citations by CoLab: 40
Cheng Z., Andriamampianina L., Ravat F., Song J., Vallès-Parlangeau N., Fournier-Viger P., Selmaoui-Folcher N.
2023-05-27 citations by CoLab: 1 Abstract  
Mining patterns in a dynamic attributed graph has received more and more attention recently. However, it is a complex task because both graph topology and attributes values of each vertex can change over time. In this work, we focus on the discovery of frequent sequential subgraph evolutions (FSSE) in such a graph. These FSSE patterns occur both spatially and temporally, representing frequent evolutions of attribute values for general sets of connected vertices. A novel algorithm, named FSSEMiner, is proposed to mine FSSE patterns. This algorithm is based on a new strategy (graph addition) to guarantee mining efficiency. Experiments performed on benchmark and real-world datasets show the interest of our approach and its scalability.
Peng H., Zhang D.
Information Sciences scimago Q1
2023-05-01 citations by CoLab: 6 Abstract  
The extraction of frequent subgraphs is a basic and well studied operation on graphs. Thus, mining frequent graph patterns and problems associated with it is very important. However, the number of frequent subgraphs is potentially exponential while mining large graph patterns. This issue can be partly overcome using closed frequent graphs mining. Instead of mining all frequent subgraphs, it is more efficient to enumerate only the closed frequent graphs. Thus, in this paper, we propose a novel closed frequent subgraph mining algorithm: CFGM. In this algorithm, a stack-based architecture is used to enumerate the frequent graph represented by the depth-first search. Moreover, this algorithm defines a strict partial order among frequent graphs. We demonstrate that, with respect to this strict partial order, only maximal elements (frequent graphs) need to be discovered. A pruning strategy is developed based on this strict partial order that dramatically reduces unnecessary frequent subgraphs to be enumerated. Computational results show our algorithm displays excellent performance, especially for some large asymmetric frequent graph patterns.
Inoubli W., Aridhi S., Mezni H., Maddouri M., Mephu Nguifo E.
2022-09-01 citations by CoLab: 6 Abstract  
Graph clustering is one of the key techniques to understand structures that are presented in networks. In addition to clusters, bridges and outliers detection is also a critical task as it plays an important role in the analysis of networks. Recently, several graph clustering methods are developed and used in multiple application domains such as biological network analysis, recommendation systems and community detection. Most of these algorithms are based on the structural clustering algorithm. Yet, this kind of algorithm is based on the structural similarity. This latter requires to parse all graph’ edges in order to compute the structural similarity. However, the height needs of similarity computing make this algorithm more adequate for small graphs, without significant support to deal with large-scale networks. In this paper, we propose a novel distributed graph clustering algorithm based on structural graph clustering. The experimental results show the efficiency in terms of running time of the proposed algorithm in large networks compared to existing structural graph clustering methods. • An adaptation of the edge partitioning method in a distributed setting. • A novel scalable clustering method for distributed networks. • An incremental graph clustering algorithm for both large and dynamic graphs. • An experimental study to evaluate the novel scalable clustering method for distributed networks.
Banerjee S.
2022-06-28 citations by CoLab: 4 Abstract  
An uncertain graph (also known as probabilistic graph) is a generic model to represent many real-world networks from social to biological. In recent times, analysis and mining of uncertain graphs have drawn significant attention from the researchers of the data management community. Several noble problems have been introduced, and efficient methodologies have been developed to solve those problems. Hence, there is a need to summarize the existing results on this topic in a self-organized way. In this paper, we present a comprehensive survey on uncertain graph mining focusing on mainly three aspects: (i) different problems studied, (ii) computational challenges for solving those problems, and (iii) proposed methodologies. Finally, we list out important future research directions.
Shaul Z., Naaz S.
2021-12-15 citations by CoLab: 7 Abstract  
gSpan is a popular algorithm for mining frequent subgraphs. cgSpan (closed graph-based substructure pattern mining) is a gSpan extension that only mines closed subgraphs. A subgraph g is closed in the graphs database if there is no proper frequent supergraph of g that has equivalent occurrence with g. cgSpan adds the Early Termination pruning method to the gSpan pruning methods, while leaving the original gSpan steps unchanged.cgSpan also detects and handles cases in which Early Termination should not be applied.To the best of our knowledge, cgSpan is the first publicly available implementation for closed graphs mining.
Nguyen L.B., Nguyen L.T., Zelinka I., Snasel V., Nguyen H.S., Vo B.
IEEE Access scimago Q1 wos Q2 Open Access
2021-12-07 citations by CoLab: 11 Abstract  
Mining frequent subgraphs is an interesting and important problem in the graph mining field, in that mining frequent subgraphs from a single large graph has been strongly developed, and has recently attracted many researchers. Among them, MNI-based approaches are considered as state-of-the-art, such as the GraMi algorithm. Besides frequent subgraph mining (FSM), frequent closed frequent subgraph mining was also developed. This has many practical applications and is a fundamental premise for many studies. This paper proposes the CloGraMi (Closed Frequent Subgraph Mining) algorithm based on GraMi to find all closed frequent subgraphs in a single large graph. Two effective strategies are also developed, the first one is a new level order traversal strategy to quickly determine closed subgraphs in the searching process, and the second is setting a condition for early pruning a large portion of non-closed candidates, both of them aim to reduce the running time as well as the memory requirements, improve the performance of the proposed algorithm. Our experiments are performed on five real datasets (both directed and undirected graphs) and the results show that the running time as well as the memory requirements of our algorithm are significantly better than those of the GraMi-based algorithm.
Pourhatami A., Kaviyani-Charati M., Kargar B., Baziyad H., Kargar M., Olmeda-Gómez C.
Scientometrics scimago Q1 wos Q1
2021-06-15 citations by CoLab: 29 Abstract  
Over the two last decades, coronaviruses have affected human life in different ways, especially in terms of health and economy. Due to the profound effects of novel coronaviruses, growing tides of research are emerging in various research fields. This paper employs a co-word analysis approach to map the intellectual structure of the coronavirus literature for a better understanding of how coronavirus research and the disease itself have developed during the target timeframe. A strategic diagram has been drawn to depict the coronavirus domain’s structure and development. A detailed picture of coronavirus literature has been extracted from a huge number of papers to provide a quick overview of the coronavirus literature. The main themes of past coronavirus-related publications are (a) “Antibody-Virus Interactions,” (b) “Emerging Infectious Diseases,” (c) “Protein Structure-based Drug Design and Antiviral Drug Discovery,” (d) “Coronavirus Detection Methods,” (e) “Viral Pathogenesis and Immunity,” and (f) “Animal Coronaviruses.” The emerging infectious diseases are mostly related to fatal diseases (such as Middle East respiratory syndrome, severe acute respiratory syndrome, and COVID-19) and animal coronaviruses (including porcine, turkey, feline, canine, equine, and bovine coronaviruses and infectious bronchitis virus), which are capable of placing animal-dependent industries such as the swine and poultry industries under strong economic pressure. Although considerable research into coronavirus has been done, this unique field has not yet matured sufficiently. Therefore, “Antibody-virus Interactions,” “Emerging Infectious Diseases,” and “Coronavirus Detection Methods” hold interesting, promising research gaps to be both explored and filled in the future.
Huang J., Jaysawal B.P., Wang C.
2021-04-05 citations by CoLab: 9 Abstract  
Periodic pattern has been utilized in many real life applications, such as weather conditions in a particular season, transactions in a superstore, power consumption, computer network fault analysis, and analysis of DNA and protein sequences. Periodic pattern mining is a popular though challenging research field in data mining because periodic patterns are of different types (namely full, inner, and tail patterns) and varied periodicities (namely perfect, imperfect, and asynchronous periodicity). Previous periodic pattern mining methods have some disadvantages: (1) Previous methods have to find different patterns separately; (2) They require postprocessing such as level-by-level join strategies for mining complex periodic patterns which have wildcards between two items. They cannot mine full, tail, and inner periodic patterns with perfect, imperfect, and asynchronous periodicities simultaneously. Therefore, an effective and comprehensive approach capable of discovering the above specified kinds of periodic patterns is needed. We propose a novel suffix tree-based algorithm, Mining dIfferent kinds of Periodic Patterns Simultaneously, MIPPS, to address the above issues. MIPPS finds different kinds of periodic patterns with different periodicities simultaneously without level-by-level join techniques using a novel incremental propagation generator. In addition, MIPPS mines periodic patterns efficiently using some pruning strategies. For the performance evaluation, we use both synthetic and real data to confirm good performance and scalability with complex periodic patterns.
Hosseini S., Baziyad H., Norouzi R., Jabbedari Khiabani S., Gidófalvi G., Albadvi A., Alimohammadi A., Seyedabrishami S.
Scientometrics scimago Q1 wos Q1
2021-02-19 citations by CoLab: 27 Abstract  
Using geographic information systems (GIS) widely for dealing with transportation problems (is well-known as GIS-T), has made it nessasary for researchers to discover the current state-of-the-art and predict the trends of future research. This paper aims to contribute to a better understanding of GIS-T research area from a longitudinal perspective, over the period 2008–2019. A co-word analysis was used to illustrate all the underlying subfields of GIS-T based on published papers in the Web of Science (WoS) database service. The main knowledge areas representing the intellectual structure of GIS-T including (a) sustainability, (b) health, (c) planning and management, and (d) methods and tools, were detected. Finally, in order to illustrate the structure and development of the identified clusters, two-dimensional maps and strategic diagrams for each period were drawn. This study is the first attempt to employ a text mining method so as to detect the conceptual structure of GIS-T research area from a complex and interdisciplinary literature.
Fournier-Viger P., Wang Y., Yang P., Lin J.C., Yun U., Kiran R.U.
Applied Intelligence scimago Q2 wos Q2
2021-02-03 citations by CoLab: 31 Abstract  
Discovering periodic patterns consists of identifying all sets of items (values) that periodically co-occur in a discrete sequence. Although traditional periodic pattern mining algorithms have multiple applications, they have two key limitations. First, they consider that a pattern is not periodic if the time difference between two of its successive occurrences is greater than a maxPer threshold. But this constraint is too strict, as a pattern may be discarded based on only two of its occurrences, although it may be usually periodic. Second, traditional algorithms use a constraint that the support (occurrence frequency) of a pattern must be no less than a minSup threshold. But setting that parameter is not intuitive. Hence, it is usually set by trial and error, which is time-consuming. This paper addresses the first limitation by introducing a concept of stability to find periodic patterns that have a stable periodic behavior. Then, the second limitation is addressed by proposing an algorithm named TSPIN (Top-k Stable Periodic pattern mINer) to find the top-k stable periodic patterns, where the user can directly specify the number of patterns k to be found rather than using the minSup threshold. Several experiments have been performed to assess TSPIN’s performance, and it was found that it is efficient and can discover patterns that reveal interesting insights in real data.
Zhang Q., Guo D., Zhao X., Li X., Wang X.
2020-10-19 citations by CoLab: 5 Abstract  
\emphSeasonal periodicity is a frequent phenomenon for social interactions in temporal networks. A key property of this behavior is that it exhibits periodicity for multiple particular periods in temporal networks. Mining such seasonal-periodic patterns is significant since it can indicate interesting relationships between the individuals involved in the interactions. Unfortunately, most previous studies for periodic pattern mining ignore the seasonal feature. This motivates us to explore mining seasonal-periodic subgraphs, and the investigation presents a novel model, called maximal σ-periodic $ømega$-seasonal k-subgraph. It represents a subgraph with size larger than k and that appears at least σ times periodically in at least $ømega$ particular periods on the temporal graph. Since seasonal-periodic patterns do not satisfy the anti-monotonic property, we propose a weak version of support measure with an anti-monotonic property to reduce the search space efficiently. Then, we present an effective mining algorithm to seek all maximal σ-periodic $ømega$-seasonal k-subgraphs. Experimental results on real-life datasets show the effectiveness and efficiency of our approach.

Are you a researcher?

Create a profile to get free access to personal recommendations for colleagues and new articles.
Share
Cite this
GOST | RIS | BibTex
Found error?