A Systematic Literature Review on Identifying Patterns Using Unsupervised Clustering Algorithms: A Data Mining Perspective
Article
Open access
English
Abstract
Data mining is an analytical approach that helps solve many problems by extracting previously unknown, interesting, nontrivial, and potentially valuable information from massive datasets. In data mining, clustering splits data items (points) into meaningful groups, or clusters, by placing items that are close to one another under a chosen distance or similarity measure in the same group. This paper covers algorithmic methodologies, applications, clustering evaluation measures, and researcher-proposed enhancements, together with their impact on data mining, to give a thorough grasp of clustering algorithms, their applications, and the advances achieved in the existing literature. The review draws on conference and journal papers published between 1995 and 2023. It begins by outlining fundamental clustering techniques and their algorithmic improvements, emphasizing their advantages and limitations compared with other clustering algorithms. It then investigates evaluation measures for clustering algorithms, with an emphasis on metrics used to gauge clustering quality, such as the F-measure and the Rand Index. It also addresses methods proposed to increase the convergence speed, robustness, and accuracy of clustering, such as initialization procedures, distance measures, and optimization strategies. The work concludes by emphasizing that clustering remains an active research area, driven by the need to identify significant patterns and structures in data, enhance knowledge acquisition, and improve decision making across different domains.
Through a thorough analysis of algorithmic enhancements, clustering evaluation metrics, and optimization strategies, this study aims to contribute to the broader knowledge base of data mining practitioners and researchers, facilitating informed decision making and fostering advancements in the field.
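The abstract names initialization procedures, distance measures, and the Rand Index and F-measure, but not how they fit together. The following is an illustrative sketch only, not the review's own method, and it makes several assumptions: k-means (Lloyd's algorithm) as the clustering technique, k-means++ as the initialization procedure, Euclidean distance as the distance measure, and the pair-counting variant of the F-measure (the literature also uses a class-to-cluster matching variant). All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance (assumed here as the distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding (assumed initialization procedure): each new centre
    is sampled with probability proportional to its squared distance from
    the nearest centre chosen so far."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(euclidean(p, c) ** 2 for c in centres) for p in points]
        if sum(d2) == 0:  # all points coincide with existing centres
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centre.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


def pair_counts(labels_true, labels_pred):
    """Classify every pair of points by whether the two partitions agree."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two partitions agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def pair_f_measure(labels_true, labels_pred):
    """Harmonic mean of pair-level precision and recall."""
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two well-separated toy groups with known ground-truth labels.
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    truth = [0, 0, 0, 1, 1, 1]
    labels, _ = kmeans(pts, k=2, seed=42)
    print(rand_index(truth, labels), pair_f_measure(truth, labels))
```

Both scores compare pairs of points, so they do not depend on which integer a cluster happens to receive. That is why k-means output can be scored against ground-truth classes without first matching cluster IDs to class IDs.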
Authors: Chaudhry, Mahnoor; Shafi, Imran; Mahnoor, Mahnoor; Ramírez-Vargas, Debora L.; Bautista Thompson, Ernesto; Ashraf, Imran
Contact: Debora L. Ramírez-Vargas (debora.ramirez@unini.edu.mx); Ernesto Bautista Thompson (ernesto.bautista@unini.edu.mx)
Citation: Chaudhry, Mahnoor, Shafi, Imran, Mahnoor, Mahnoor, Ramírez-Vargas, Debora L., Bautista Thompson, Ernesto and Ashraf, Imran (2023). A Systematic Literature Review on Identifying Patterns Using Unsupervised Clustering Algorithms: A Data Mining Perspective. Symmetry, 15 (9). p. 1679. ISSN 2073-8994
Full text: symmetry-15-01679-v2.pdf (PDF, 2 MB), available under a Creative Commons Attribution license.
| Field | Value |
|---|---|
| Item Type | Article |
| Uncontrolled Keywords | clustering; distance measures; data mining; evolution measures; symmetry |
| Subjects | Subjects > Engineering |
| Divisions | Europe University of Atlantic > Research > Scientific Production; Ibero-american International University > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Scientific Production; University of La Romana > Research > Scientific Production |
| Date Deposited | 02 Jan 2024 23:30 |
| Last Modified | 02 Jan 2024 23:30 |
| URI | https://repositorio.uniromana.edu.do/id/eprint/8657 |
Novel Transfer Learning Approach for Detecting Infected and Healthy Maize Crop Using Leaf Images
Maize is a staple crop worldwide, essential for food security, livestock feed, and industrial uses. Its health directly impacts agricultural productivity and economic stability. Effective detection of maize crop health is crucial for preventing disease spread and ensuring high yields. This study presents VG-GNBNet, an innovative transfer learning model that accurately detects healthy and infected maize crops through a two-step feature extraction process. The proposed model begins by leveraging the visual geometry group (VGG-16) network to extract initial pixel-based spatial features from the crop images. These features are then further refined using the Gaussian Naive Bayes (GNB) model and feature decomposition-based matrix factorization mechanism, which generates more informative features for classification purposes. This study incorporates machine learning models to ensure a comprehensive evaluation. By comparing VG-GNBNet's performance against these models, we validate its robustness and accuracy. Integrating deep learning and machine learning techniques allows VG-GNBNet to capitalize on the strengths of both approaches, leading to superior performance. Extensive experiments demonstrate that the proposed VG-GNBNet+GNB model significantly outperforms other models, achieving an impressive accuracy score of 99.85%. This high accuracy highlights the model's potential for practical application in the agricultural sector, where the precise detection of crop health is crucial for effective disease management and yield optimization.
Muhammad Usama Tanveer, Kashif Munir, Ali Raza, Laith Abualigah, Helena Garay (helena.garay@uneatlantico.es), Luis Eduardo Prado González (uis.prado@uneatlantico.es), Imran Ashraf
Novel transfer learning based bone fracture detection using radiographic images
A bone fracture is a medical condition characterized by a partial or complete break in the continuity of the bone. Fractures are primarily caused by injuries and accidents, affecting millions of people worldwide. The healing process for a fracture can take anywhere from one month to one year, leading to significant economic and psychological challenges for patients. The detection of bone fractures is crucial, and radiographic images are often relied on for accurate assessment. An efficient neural network method is essential for the early detection and timely treatment of fractures. In this study, we propose a novel transfer learning-based approach called MobLG-Net for feature engineering purposes. Initially, the spatial features are extracted from bone X-ray images using a transfer model, MobileNet, and then input into a tree-based light gradient boosting machine (LGBM) model for the generation of class probability features. Several machine learning (ML) techniques are applied to the subsets of newly generated transfer features to compare the results. K-nearest neighbor (KNN), LGBM, logistic regression (LR), and random forest (RF) are implemented using the novel features with optimized hyperparameters. The LGBM and LR models trained on proposed MobLG-Net (MobileNet-LGBM) based features outperformed others, achieving an accuracy of 99% in predicting bone fractures. A cross-validation mechanism is used to evaluate the performance of each model. The proposed study can improve the detection of bone fractures using X-ray images.
Aneeza Alam, Ahmad Sami Al-Shamayleh, Nisrean Thalji, Ali Raza, Edgar Aníbal Morales Barajas, Ernesto Bautista Thompson (ernesto.bautista@unini.edu.mx), Isabel de la Torre Diez, Imran Ashraf
Nut Consumption Is Associated with Cognitive Status in Southern Italian Adults
Background: Nut consumption has been considered a potential protective factor against cognitive decline. The aim of this study was to test whether higher total and specific nut intake was associated with better cognitive status in a sample of older Italian adults. Methods: A cross-sectional analysis of 883 older adults (>50 y) was conducted. A 110-item food frequency questionnaire was used to collect information on the consumption of various types of nuts. The Short Portable Mental Status Questionnaire was used to assess cognitive status. Multivariate logistic regression analyses were performed to calculate odds ratios (ORs) and 95% confidence intervals (CIs) for the association between nut intake and cognitive status after adjusting for potential confounding factors. Results: The median intake of total nuts was 11.7 g/day and served as a cut-off to categorize low and high consumers (mean intake 4.3 g/day vs. 39.7 g/day, respectively). Higher total nut intake was significantly associated with a lower prevalence of impaired cognitive status among older individuals (OR = 0.35, 95% CI: 0.15, 0.84) after adjusting for potential confounding factors. Notably, this association remained significant after additional adjustment for adherence to the Mediterranean dietary pattern as an indicator of diet quality (OR = 0.32, 95% CI: 0.13, 0.77). No significant associations were found between cognitive status and specific types of nuts. Conclusions: Habitual nut intake is associated with better cognitive status in older adults.
Justyna Godos, Francesca Giampieri (francesca.giampieri@uneatlantico.es), Evelyn Frias-Toral, Raynier Zambrano-Villacres, Angel Olider Rojas Vistorte (angel.rojas@uneatlantico.es), Vanessa Yélamos Torres (vanessa.yelamos@funiber.org), Maurizio Battino (maurizio.battino@uneatlantico.es), Fabio Galvano, Sabrina Castellano, Giuseppe Grosso
Background: Co-infection with dengue and COVID-19 has increased the health burden worldwide. We found a significant knowledge gap in the epidemiology and risk factors of co-infection in Bangladesh. Methods: This study included 2458 participants from Dhaka city from December 1, 2021, to November 30, 2023. We performed the Kruskal-Wallis test and the χ² test, as well as multivariable logistic regression. Results: Co-infection with dengue and COVID-19 was found in 31% of the participants. Co-prevalence of dengue and COVID-19 was highest in Jatrabari (14%) and Motijhil (11%). Severe (65%, p-value 0.001) and very severe (78%, p-value 0.005) symptoms were prevalent among participants aged >50 years. Long-term illness was prevalent among participants with co-infection (35%, 95% CI 33%-36%) and with COVID-19 (28%, 95% CI 26%-30%). Co-infected participants had a higher frequency of heart damage (31.6%, p-value 0.005), brain fog (22%, p-value 0.03), and kidney damage (49.3%, p-value 0.001). Fever (100%) was the most prevalent symptom, followed by weakness (89.6%), chills (82.4%), fatigue (81.4%), headache (80.6%), feeling thirsty (76.3%), myalgia (75%), pressure in the chest (69.1%), and shortness of breath (68.3%). Area of residence (OR 2.26, 95% CI 1.96-2.49, p-value 0.01), number of family members (OR 1.45, 95% CI 1.08-1.87, p-value <0.001), and population density (OR 2.43, 95% CI 2.15-3.01, p-value 0.001) were associated with higher odds of co-infection. Co-infected participants had about four times higher odds of developing severe health conditions (OR 4.22, 95% CI 4.11-4.67, p-value 0.02). Conclusions: This is one of the early epidemiologic studies of dengue and COVID-19 co-infection in Bangladesh.
Nadim Sharif, Rubayet Rayhan Opu, Afsana Khan, Tama Saha, Abdullah Ibna Masud, Jannatin Naim, Zaily Leticia Velázquez Martínez (zaily.velazquez@unini.edu.mx), Carlos Manuel Osorio García (carlos.osorio@uneatlantico.es), Meshari A Alsuwat, Fuad M Alzahrani, Khalid J Alzahrani, Isabel De la Torre Díez, Shuvra Kanti Dey
Plant stress reduction research has advanced significantly with the use of artificial intelligence (AI) techniques such as machine learning and deep learning, a significant step toward sustainable agriculture. Combining algorithms such as gradient boosting, support vector machines (SVM), recurrent neural networks (RNN), and long short-term memory (LSTM) networks with a thorough examination of the TYRKC and RBR-E3 domains in stress-associated signaling proteins across a range of crop species has revealed new insights into the physiological responses of plants, mostly crops, to drought stress. The study used the UniProt protein database for physicochemical properties of crop proteins associated with specific signaling domains and the SMART database for signaling protein domains; after careful data processing, these data were applied to machine learning and deep learning models. Rigorous metric evaluations and an ablation analysis showed that the algorithms recognize and classify stress events effectively and reliably: SVM reached 82% accuracy, RNN 94%, gradient boosting 96%, and LSTM 97%, so gradient boosting and LSTM outperformed the others and show potential for accurate stress categorization. Alongside these results, the study highlights ongoing obstacles to AI adoption in agriculture and the need for creative thinking and interdisciplinary cooperation. Beyond its scholarly value, the collected data has significant implications for improving resource efficiency, guiding precision agriculture methods, and supporting global food security programs.
This work highlights the revolutionary potential of AI to completely disrupt the agricultural industry while simultaneously advancing our understanding of plant stress responses.
Tariq Ali, Saif Ur Rehman, Shamshair Ali, Khalid Mahmood, Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Tahir Khurshaid, Imran Ashraf