An improved deep convolutional neural network-based YouTube video classification using textual features

Open access. Language: English.
Raza, Ali; Younas, Faizan; Siddiqui, Hafeez Ur Rehman; Rustam, Furqan; Gracia Villar, Mónica (monica.gracia@uneatlantico.es); Silva Alvarado, Eduardo René (eduardo.silva@funiber.org) and Ashraf, Imran (2024) An improved deep convolutional neural network-based YouTube video classification using textual features. Heliyon, 10 (16). e35812. ISSN 2405-8440

Text: PIIS2405844024118439.pdf (3MB)
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Video content on the web has increased explosively during the past decade, thanks to open access to platforms such as Facebook and YouTube. YouTube is currently the second-largest social media platform, with more than 37 million channels. YouTube revealed at a recent press event that 30,000 new videos are posted per hour, or 720,000 per day. An advanced deep learning-based approach is needed to categorize this huge database of videos, and this study aims to develop one. It analyzes the textual information associated with videos, such as titles, descriptions, and user tags, using YouTube exploratory data analysis (YEDA) and shows that such information can potentially be used to categorize videos. A deep convolutional neural network (DCNN) is designed to categorize YouTube videos efficiently and with high accuracy. A recurrent neural network (RNN) and a gated recurrent unit (GRU) are also employed for performance comparison, along with logistic regression, support vector machine, decision tree, and random forest models. A large dataset with 9 classes is used for the experiments. Experimental findings indicate that the proposed DCNN achieves the highest receiver operating characteristic (ROC) area under the curve (AUC) score, 99%, together with 96% accuracy, which is better than existing approaches. The proposed approach can be used to suggest relevant videos to YouTube users and to sort videos by category.
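The abstract reports a ROC AUC score for a nine-class problem but does not say how the per-class curves were aggregated. As a rough illustration only, the sketch below assumes a one-vs-rest scheme with macro averaging, which is one common choice and not necessarily the one the authors used. The function names and the three-class toy data are hypothetical.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of the one-vs-rest AUC over all classes (assumed scheme)."""
    n_classes = len(probs[0])
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c against the rest
        scores = [row[c] for row in probs]             # predicted probability of c
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


# Hypothetical toy predictions for three video categories (not data from the study).
y_true = [0, 1, 2, 0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.2, 0.6],
]
toy_auc = macro_ovr_auc(y_true, probs)
print(f"macro one-vs-rest AUC on toy data: {toy_auc:.4f}")
```

The pairwise comparison is quadratic in the number of examples, which is acceptable for a sketch; a rank-based computation would be preferable on a large dataset.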

Document Type: Article
Keywords: YouTube video categorization; Convolutional neural network; Text categorization; Text features
Subject classification: Subjects > Engineering
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Universidad Internacional Iberoamericana México > Research > Scientific Production
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Universidad de La Romana > Research > Scientific Production
Deposited: 23 Sep 2024 23:30
Last Modified: 23 Sep 2024 23:30
URI: https://repositorio.uniromana.edu.do/id/eprint/14337


Text: s41598-024-74127-8.pdf | Language: English | Access: open

Smart agriculture: utilizing machine learning and deep learning for drought stress identification in crops

Research on reducing plant stress has advanced significantly with the use of artificial intelligence (AI) techniques such as machine learning and deep learning, a significant step toward sustainable agriculture. Algorithms such as gradient boosting, support vector machines (SVM), recurrent neural networks (RNN), and long short-term memory (LSTM), combined with a thorough examination of the TYRKC and RBR-E3 domains in stress-associated signaling proteins across a range of crop species, have revealed new insights into the physiological responses of plants, mostly crops, to drought stress. The study drew on the UniProt protein database for crop physicochemical properties associated with specific signaling domains and on the SMART database for signaling protein domains. After careful data processing, these data were used with the machine learning and deep learning techniques. Rigorous metric evaluations and an ablation analysis highlighted the algorithms' effectiveness and dependability in recognizing and classifying stress events. SVM reached 82% accuracy, gradient boosting 96%, RNN 94%, and LSTM 97%, so gradient boosting and LSTM outperformed the other models and show potential for accurate stress categorization. The study also highlights the ongoing obstacles to AI adoption in agriculture, emphasizing the need for creative thinking and interdisciplinary cooperation. In addition to its scholarly value, the collected data has significant implications for improving resource efficiency, directing precision agricultural methods, and supporting global food security programs. This work highlights the potential of AI to transform the agricultural industry while advancing our understanding of plant stress responses.

Scientific Production

Tariq Ali, Saif Ur Rehman, Shamshair Ali, Khalid Mahmood, Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Tahir Khurshaid, Imran Ashraf

Text: nutrients-16-03859.pdf | Language: English | Access: open

Carotenoids Intake and Cardiovascular Prevention: A Systematic Review

Background: Cardiovascular diseases (CVDs) encompass a variety of conditions that affect the heart and blood vessels. Carotenoids, a group of fat-soluble organic pigments synthesized by plants, fungi, algae, and some bacteria, may have a beneficial effect in reducing cardiovascular disease (CVD) risk. This study aims to examine and synthesize current research on the relationship between carotenoids and CVDs. Methods: A systematic review was conducted using MEDLINE and the Cochrane Library to identify relevant studies on the efficacy of carotenoid supplementation for CVD prevention. Interventional analytical studies (randomized and non-randomized clinical trials) published in English from January 2011 to February 2024 were included. Results: A total of 38 studies were included in the qualitative analysis. Of these, 17 epidemiological studies assessed the relationship between carotenoids and CVDs, 9 examined the effect of carotenoid supplementation, and 12 evaluated dietary interventions. Conclusions: Elevated serum carotenoid levels are associated with reduced CVD risk factors and inflammatory markers. Increasing the consumption of carotenoid-rich foods appears to be more effective than supplementation, though the specific effects of individual carotenoids on CVD risk remain uncertain.

Scientific Production

Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Imanol Eguren García (imanol.eguren@uneatlantico.es), Álvaro Lasarte García, Thomas Prola (thomas.prola@uneatlantico.es), Raquel Martínez Díaz (raquel.martinez@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es)

Text: nutrients-16-03907.pdf | Language: English | Access: open

Youth Healthy Eating Index (YHEI) and Diet Adequacy in Relation to Country-Specific National Dietary Recommendations in Children and Adolescents in Five Mediterranean Countries from the DELICIOUS Project

Background/Objectives: The diet quality of younger individuals is decreasing globally, with alarming trends also in the Mediterranean region. The aim of this study was to assess diet quality and adequacy in relation to country-specific dietary recommendations for children and adolescents living in the Mediterranean area. Methods: A cross-sectional survey was conducted of 2011 parents of the target population participating in the DELICIOUS EU-PRIMA project. Dietary data, cross-references with food-based recommendations, and the application of the youth healthy eating index (YHEI) were assessed through 24 h recalls and food frequency questionnaires. Results: Adherence to recommendations on plant-based foods was low (less than ∼20%), including fruit and vegetable adequacy in all countries, legume adequacy in all countries except Italy, and cereal adequacy in all countries except Portugal. For animal products and dietary fats, adequacy in relation to the national food-based dietary recommendations was slightly better (∼40% on average) in most countries, although the Eastern countries reported worse rates. Higher scores on the YHEI predicted adequacy in relation to vegetables (except Egypt), fruit (except Lebanon), cereals (except Spain), and legumes (except Spain) in most countries. Younger children (p < 0.005) reporting an adequate sleep duration of 8–10 h (p < 0.001), <2 h/day of screen time (p < 0.001), and a medium/high physical activity level (p < 0.001) displayed a better diet quality. Moreover, older respondents (p < 0.001) with a medium/high educational level (p = 0.001) and living with a partner (p = 0.003) reported that their children had a better diet quality. Conclusions: Plant-based food groups, including fruit, vegetables, legumes, and even (whole-grain) cereals, are underrepresented in the diets of Mediterranean children and adolescents. Moreover, the adequate consumption of other important dietary components, such as milk and dairy products, is rather disregarded, leading to substantially suboptimal diets and poor adequacy in relation to dietary guidelines.

Scientific Production

Francesca Giampieri (francesca.giampieri@uneatlantico.es), Alice Rosi, Francesca Scazzina, Evelyn Frias-Toral, Osama Abdelkarim, Mohamed Aly, Raynier Zambrano-Villacres, Juancho Pons, Laura Vázquez-Araújo, Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Lorenzo Monasta, Ana Mata, María Isabel Pardo, Pablo Busó, Giuseppe Grosso

Text: journal.pone.0313835.pdf | Language: English | Access: open

StackIL10: A stacking ensemble model for the improved prediction of IL-10 inducing peptides

Interleukin-10 (IL-10), a highly effective cytokine recognized for its anti-inflammatory properties, plays a critical role in the immune system. In addition to its well-documented capacity to mitigate inflammation, IL-10 can unexpectedly demonstrate pro-inflammatory characteristics under specific circumstances. The presence of both aspects emphasizes the vital need to identify IL-10-inducing peptides. To mitigate the drawbacks of manual identification, including its high cost, this study introduces StackIL10, a stacking-based ensemble learning model, to identify IL-10-inducing peptides precisely and efficiently. Ten amino-acid-composition-based feature extraction approaches are considered. The StackIL10 stacking ensemble was constructed with five optimized machine learning algorithms (LGBM, RF, SVM, Decision Tree, and KNN) as the base learners and Logistic Regression as the meta-learner; it reached an identification rate of 91.7%, an MCC of 0.833, and a specificity of 0.9078. Experiments were conducted to examine the impact of various enhancement techniques on the correctness of IL-10 prediction, including comparisons between single models and various combinations of stacking-based ensemble models. The proposed model was shown to be more effective than single models and produced satisfactory results, thereby improving the identification of peptides that induce IL-10.

Scientific Production

Salman Sadullah Usmani, Izaz Ahmmed Tuhin, Md. Rajib Mia, Md. Monirul Islam, Imran Mahmud, Carlos Eduardo Uc Ríos (carlos.uc@unini.edu.mx), Henry Fabian Gongora (henry.gongora@uneatlantico.es), Imran Ashraf, Md. Abdus Samad

Text: s41598-024-79106-7.pdf | Language: English | Access: open

Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization

With the rapid increase of users on social media, cyberbullying and hate speech problems have arisen over the past years. Automatic hate speech detection (HSD) from text is an emerging research problem in natural language processing (NLP). Researchers have developed various approaches to automatic hate speech detection using different corpora in various languages; however, research on the Urdu language is rather scarce. This study aims to address the HSD task on Twitter using Roman Urdu text. The contribution of this research is the development of a hybrid model for Roman Urdu HSD, which has not been previously explored. The novel hybrid model integrates deep learning (DL) and transformer models for automatic feature extraction, combined with machine learning algorithms (MLAs) for classification. To further enhance model performance, we employ several hyperparameter optimization (HPO) techniques, including Grid Search (GS), Randomized Search (RS), and Bayesian Optimization with Gaussian Processes (BOGP). Evaluation is carried out on two publicly available benchmark Roman Urdu corpora: the HS-RU-20 corpus and the RUHSOLD hate speech corpus. Results demonstrate that the Multilingual BERT (MBERT) feature learner, paired with a Support Vector Machine (SVM) classifier and optimized using RS, achieves state-of-the-art performance. On the HS-RU-20 corpus, this model attained an accuracy of 0.93 and an F1 score of 0.95 for the Neutral-Hostile classification task, and an accuracy of 0.89 with an F1 score of 0.88 for the Hate Speech-Offensive task. On the RUHSOLD corpus, the same model achieved an accuracy of 0.95 and an F1 score of 0.94 for the Coarse-grained task, alongside an accuracy of 0.87 and an F1 score of 0.84 for the Fine-grained task. These results demonstrate the effectiveness of our hybrid approach for Roman Urdu hate speech detection.
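Randomized Search, the HPO technique the abstract credits with the best result, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the search space (log-uniform ranges for an SVM-style `C` and `gamma`), the trial count, and `toy_validation_score` are all hypothetical. In practice the objective would be a cross-validated score of the real classifier, which the abstract does not detail.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose base-10 logarithm is uniform between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Tuple[float, float]],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Evaluate n_trials random configurations and keep the highest-scoring one."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params: Params = {}
    best_score = -math.inf
    history: List[Tuple[Params, float]] = []
    for _ in range(n_trials):
        params = {name: sample_log_uniform(rng, lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


def toy_validation_score(params: Params) -> float:
    """Hypothetical stand-in for a validation score; peaks at C=10, gamma=0.01."""
    c_err = (math.log10(params["C"]) - 1.0) ** 2
    g_err = (math.log10(params["gamma"]) + 2.0) ** 2
    return 1.0 - 0.1 * (c_err + g_err)


space = {"C": (1e-2, 1e3), "gamma": (1e-4, 1e0)}  # assumed ranges
best_params, best_score, history = random_search(toy_validation_score, space, n_trials=50, seed=42)
print("best parameters:", best_params)
print("best toy score:", round(best_score, 4))
```

Compared with Grid Search, random sampling spends the same budget on many distinct values of each hyperparameter instead of a few fixed ones, which tends to help when only some hyperparameters matter.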

Scientific Production

Waqar Ashiq, Samra Kanwal, Adnan Rafique, Muhammad Waqas, Tahir Khurshaid, Elizabeth Caro Montero (elizabeth.caro@uneatlantico.es), Alicia Bustamante Alonso (alicia.bustamante@uneatlantico.es), Imran Ashraf