Classification of news categories using BERT
Abstract
This project develops a Natural Language Processing model that classifies news using existing, previously evaluated datasets. The main objective is to create a system that automatically assigns each news item to one of five predefined categories: business, entertainment, politics, sports, or technology. The work involves data preprocessing, feature extraction, training a machine learning model, and evaluating its performance with metrics such as accuracy, recall, and F1-score. These metrics indicate how well the model predicts the correct category for a new or unlabeled news item. If the performance of the model is satisfactory, it can be used to classify unlabeled news in real time. In summary, the project seeks to provide an efficient and accurate solution for organizing and labeling the informative content of a news item with the help of Artificial Intelligence.
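As an illustration of the evaluation step, the following minimal sketch computes accuracy, recall, and F1-score for the five categories from lists of true and predicted labels. It is not the authors' implementation: the function name `evaluate`, the label strings, and the choice of macro averaging across categories are assumptions made for this example.

```python
from collections import Counter

# Category names taken from the abstract; the exact label strings are an assumption.
CATEGORIES = ["business", "entertainment", "politics", "sports", "technology"]


def evaluate(y_true, y_pred, labels=CATEGORIES):
    """Return accuracy plus per-class and macro-averaged recall and F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a wrong item
            fn[truth] += 1   # true class missed one of its items

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0

    per_class = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: every category counts equally, regardless of its size.
    macro_recall = sum(m["recall"] for m in per_class.values()) / len(labels)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)

    return {
        "accuracy": accuracy,
        "macro_recall": macro_recall,
        "macro_f1": macro_f1,
        "per_class": per_class,
    }


if __name__ == "__main__":
    # Toy labels for demonstration only; these are not results from the project.
    y_true = ["business", "sports", "politics", "technology", "entertainment", "sports"]
    y_pred = ["business", "sports", "business", "technology", "entertainment", "politics"]
    report = evaluate(y_true, y_pred)
    print(report["accuracy"], report["macro_recall"], report["macro_f1"])
```

Macro averaging weights each category equally, which can be preferable when some categories have far fewer news items than others; a weighted average would be an equally reasonable choice.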
- Conceptualization
- Data curation
- Formal Analysis
- Investigation
- Methodology
- Software
- Validation
- Visualization
- Writing - original draft
- Writing - review & editing
Copyright (c) 2023 Innovation and Software
This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors exclusively grant the right to publish their article to the Innovation and Software Journal, which may formally edit or modify the approved text to comply with its own editorial standards and with universal grammatical standards prior to publication. Likewise, the journal may translate the approved manuscripts into as many languages as it deems necessary and disseminate them in several countries, always giving public recognition to the author or authors of the research.