A Systematic Literature Review on Identifying Patterns Using Unsupervised Clustering Algorithms: A Data Mining Perspective

Chaudhry, Mahnoor; Shafi, Imran; Mahnoor, Mahnoor; Ramírez-Vargas, Debora L.; Bautista Thompson, Ernesto and Ashraf, Imran (2023) A Systematic Literature Review on Identifying Patterns Using Unsupervised Clustering Algorithms: A Data Mining Perspective. Symmetry, 15 (9). p. 1679. ISSN 2073-8994

Contact: debora.ramirez@unini.edu.mx, ernesto.bautista@unini.edu.mx

Access:	Open
Language:	English

Text: symmetry-15-01679-v2.pdf (2MB)
Available under License Creative Commons Attribution.

Official URL: http://doi.org/10.3390/sym15091679

Abstract

Data mining is an analytical approach that helps solve many problems by extracting previously unknown, interesting, nontrivial, and potentially valuable information from massive datasets. Clustering in data mining segments data items or points into meaningful groups by placing together the items that are near each other according to certain statistics. This paper covers various elements of clustering, such as algorithmic methodologies, applications, clustering assessment measures, and researcher-proposed enhancements, along with their impact on data mining, to give a thorough grasp of clustering algorithms, their applications, and the advances achieved in the existing literature. The study includes a literature search for conference and journal papers published between 1995 and 2023. It begins by outlining fundamental clustering techniques and their improvements, emphasizing their advantages and limitations in comparison to other clustering algorithms. It then investigates the evaluation measures for clustering algorithms, with an emphasis on metrics used to gauge clustering quality, such as the F-measure and the Rand Index. It also addresses numerous methodologies offered to increase the convergence speed, resilience, and accuracy of clustering, such as initialization procedures, distance measures, and optimization strategies. The work concludes by emphasizing clustering as an active research area driven by the need to identify significant patterns and structures in data, enhance knowledge acquisition, and improve decision making across different domains.
This study aims to contribute to the broader knowledge base of data mining practitioners and researchers, facilitating informed decision making and fostering advancements in the field through a thorough analysis of algorithmic enhancements, clustering assessment metrics, and optimization strategies.
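
The abstract names the Rand Index and the F-measure as clustering quality metrics. The following is a minimal sketch of how both might be computed, assuming the common pair-counting definitions (every pair of points is classified by whether the reference labeling and the clustering agree on putting the two points together). It is not taken from the paper; the function names and the sample labelings are hypothetical.

```python
from itertools import combinations


def pair_counts(labels_true, labels_pred):
    """Classify every pair of points by agreement between two labelings.

    Returns (tp, fp, fn, tn):
      tp: pair shares a group in both labelings
      fp: pair shares a group only in the predicted clustering
      fn: pair shares a group only in the reference labeling
      tn: pair is separated in both labelings
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0


def pairwise_f_measure(labels_true, labels_pred):
    """Harmonic mean of pairwise precision and recall (F1 over pairs)."""
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Hypothetical example: six points, two reference classes, three clusters.
reference = [0, 0, 0, 1, 1, 1]
clustering = [0, 0, 1, 1, 2, 2]
print(rand_index(reference, clustering))
print(pairwise_f_measure(reference, clustering))
```

Both scores depend only on which points are grouped together, so renaming the cluster labels leaves them unchanged. The loop over all pairs is quadratic in the number of points, which is fine for illustration but would call for a contingency-table formulation on large datasets.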

Item Type:	Article
Keywords:	clustering; distance measures; data mining; evolution measures; symmetry
Subject classification:	Subjects > Engineering
Divisions:	Universidad Europea del Atlántico > Research > Scientific Production; Universidad Internacional Iberoamericana México > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Scientific Production; Universidad de La Romana > Research > Scientific Production
Date Deposited:	02 Jan 2024 23:30
Last Modified:	02 Jan 2024 23:30
URI:	https://repositorio.uniromana.edu.do/id/eprint/8657

open

Detecting hate in diversity: a survey of multilingual code-mixed image and video analysis

The proliferation of damaging content on social media in today’s digital environment has increased the need for efficient hate speech identification systems. This paper presents a thorough examination of hate speech detection methods in a variety of settings, including code-mixed, multilingual, visual, audio, and textual scenarios. Unlike previous research focusing on single modalities, our study examines hate speech identification across multiple modalities. We classify the numerous types of hate speech, showing how it appears on different platforms and emphasizing the unique difficulties in multi-modal and multilingual settings. We address research gaps by assessing a variety of methods, including deep learning, machine learning, and natural language processing, especially for complex data such as code-mixed and cross-lingual text. Additionally, we compare key techniques, acknowledging their benefits and drawbacks, and suggest future research avenues that prioritize multi-modal analysis and ethical data handling. This study aims to support scholarly research and real-world applications on social media platforms by serving as a resource for improving hate speech identification across various data sources.

Scientific Production

Hafiz Muhammad Raza Ur Rehman, Mahpara Saleem, Muhammad Zeeshan Jhandir, Eduardo René Silva Alvarado (eduardo.silva@funiber.org), Helena Garay (helena.garay@uneatlantico.es), Imran Ashraf

open

Novel hybrid transfer neural network for wheat crop growth stages recognition using field images

Wheat is one of the world’s most widely cultivated cereal crops and is a primary food source for a significant portion of the population. Wheat goes through several distinct developmental phases, and accurately identifying these stages is essential for precision farming and for increasing the efficiency of agricultural yield. Preliminary research identified obstacles in distinguishing between these stages, which negatively impact crop yields. To address this, this study introduces an innovative approach, MobDenNet, based on data collection and real-time wheat crop stage recognition. The data collection produced a diverse dataset of 4496 images covering seven growth phases: ‘Crown Root’, ‘Tillering’, ‘Mid Vegetative’, ‘Booting’, ‘Heading’, ‘Anthesis’, and ‘Milking’. The collected image dataset underwent rigorous preprocessing and advanced data augmentation to refine it and minimize biases. This study employed deep and transfer learning models, including MobileNetV2, DenseNet-121, NASNet-Large, InceptionV3, and a convolutional neural network (CNN), for performance comparison. Experimental evaluations demonstrated that MobileNetV2 achieved 95% accuracy, DenseNet-121 94%, NASNet-Large 76%, InceptionV3 74%, and the CNN 68%. The proposed novel hybrid approach, MobDenNet, which merges the architectures of the MobileNetV2 and DenseNet-121 neural networks, yields highly accurate results, with precision, recall, and F1 score of 99%. We validated the robustness of the proposed approach using k-fold cross-validation. The proposed approach shows great promise for boosting agricultural productivity and management practices, empowering farmers to optimize resource distribution and make informed decisions.

Scientific Production

Aisha Naseer, Madiha Amjad, Ali Raza, Kashif Munir, Aseel Smerat, Henry Fabian Gongora (henry.gongora@uneatlantico.es), Carlos Eduardo Uc Ríos (carlos.uc@unini.edu.mx), Imran Ashraf

open

Client engagement solution for post implementation issues in software industry using blockchain

In the rapidly evolving information technology industry, adequate client engagement plays a critical role: it is essential to understand the client’s concerns and requirements, to keep the records, authorizations, and go-ahead for previously agreed requirements, and to provide a feasible solution accordingly. Multiple solutions have previously been proposed to enhance the efficiency of client engagement, but they lack traceability, trust, and transparency, and they give rise to conflicts over previously agreed contracts. Because of these shortcomings, client requirements are delayed, causing client escalations, integrity issues, project failure, and penalties. In this study, we propose the UniferCollab framework, which uses blockchain technology to overcome the issues of collaboration between various teams, transparency, the record of client authorizations, and the go-ahead on previous developments. In the proposed approach, we store the data on a permissioned network. This allows us to compile all the requirements and information shared by clients on a permissioned blockchain, securing a large amount of data and enhancing the traceability of all the requirements. Every client authorization for a change to the current system generates a push notification, executed through smart contracts. This removes ambiguity between development teams when the client has shared a requirement with only one team. The data is stored in the decentralized network from which information is gathered, which resolves the traceability, transparency, and trust issues. Lastly, evaluations involved a total of 800 hypertext transfer protocol (HTTP) requests tested using Postman, with blockchain block sizes ranging from 0.568 KB to 550 KB; an average size increase of 280 KB was observed as new blocks were added. The longest chain in the network was observed during 800 repetitions of blockchain operations. Latency analysis across various experiments revealed that delays in processing HTTP requests were influenced by decentralized node processing, local machine response times, and internet bandwidth. Results show that the proposed framework resolves client engagement issues in implementation between all stakeholders, which enhances trust and transparency, improves client experience, and helps manage disputes effectively.

Scientific Production

Muhammad Shoaib Farooq, Khurram Irshad, Danish Riaz, Nagwan Abdel Samee, Ernesto Bautista Thompson (ernesto.bautista@unini.edu.mx), Daniel Gavilanes Aray (daniel.gavilanes@uneatlantico.es), Imran Ashraf

open

Ensemble stacked model for enhanced identification of sentiments from IMDB reviews

The emergence of social media platforms has led to the sharing of ideas, thoughts, events, and reviews. The shared views and comments contain people’s sentiments, and the analysis of these sentiments has emerged as one of the most popular fields of study. Sentiment analysis in the Urdu language is as important a research problem as in other languages; however, it has not been investigated well. On social media platforms like X (Twitter), millions of native Urdu speakers use the Urdu script, which makes sentiment analysis in the Urdu language important. In this regard, an ensemble model, RRLS, is proposed that stacks random forest, recurrent neural network, logistic regression (LR), and support vector machine (SVM). The Internet Movie Database (IMDB) movie reviews and Urdu tweets are examined in this study using Urdu sentiment analysis. The Urduhack library was used to preprocess the Urdu data, with operations that include normalizing individual letters, merging them, and handling spaces around punctuation. The normalization module fixes the problem of accurately encoding Urdu characters and replaces Arabic letters with their Urdu equivalents. Several models are adopted in this study for extensive evaluation of their accuracy for Urdu sentiment analysis. The results are promising: among machine learning models, the SVM and LR attained an accuracy of 87%, evaluated with performance criteria such as F-measure, accuracy, recall, and precision. The accuracy of the long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) models was 84%. The suggested ensemble RRLS model performs better than the other learning algorithms and achieves a 90% accuracy rate, outperforming current methods. Using the synthetic minority oversampling technique (SMOTE) improves the performance further, leading to 92.77% accuracy.

Scientific Production

Komal Azim, Alishba Tahir, Mobeen Shahroz, Hanen Karamti, Annia A. Vázquez (annia.almeyda@uneatlantico.es), Angel Olider Rojas Vistorte (angel.rojas@uneatlantico.es), Imran Ashraf

open

Tensiomyography, functional movement screen and counter movement jump for the assessment of injury risk in sport: a systematic review of original studies of diagnostic tests

Background: Scientific research should be carried out to prevent sports injuries. For this purpose, new assessment technologies must be used to analyze and identify the risk factors for injury. The main objective of this systematic review was to compile, synthesize and integrate international research published in different scientific databases on Countermovement Jump (CMJ), Functional Movement Screen (FMS) and Tensiomyography (TMG) tests and technologies for the assessment of injury risk in sport. In this way, the review determines the current state of knowledge on this topic and allows a better understanding of the existing problems, facilitating the development of future lines of research. Methodology: A structured search was carried out following the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA) guidelines and the PICOS model until November 30, 2024, in the MEDLINE/PubMed, Web of Science (WOS), ScienceDirect, Cochrane Library, SciELO, EMBASE, SPORTDiscus and Scopus databases. The risk of bias was assessed and the PEDro scale was used to analyze methodological quality. Results: A total of 510 articles were obtained in the initial search. After the inclusion and exclusion criteria were applied, the final sample was 40 articles. These studies maintained a high standard of quality. They revealed the effects of the CMJ, FMS and TMG methods for sports injury assessment, indicating the sample population, sport modality, assessment methods, type of research design, study variables, main findings and intervention effects. Conclusions: The CMJ vertical jump allows us to evaluate the power capacity of the lower extremities, both unilaterally and bilaterally, detect neuromuscular asymmetries and evaluate fatigue. Likewise, FMS could be used to assess an athlete's basic movement patterns, mobility and postural stability. Finally, TMG is a non-invasive method to assess the contractile properties of superficial muscles, monitor the effects of training, detect muscle asymmetries and symmetries, provide information on muscle tone and evaluate fatigue. Therefore, they should be considered as assessment tests and technologies to individualize training programs and identify injury risk factors.

Scientific Production

Álvaro Velarde-Sotres (alvaro.velarde@uneatlantico.es), Antonio Bores-Cerezal (antonio.bores@uneatlantico.es), Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Julio Calleja-González
