Occupational groups prediction in Turkish Twitter data by using machine learning algorithms with multinomial approach

Çıplak, Zeki; Yıldız, Kazım

dc.contributor.author	Çıplak, Zeki
dc.contributor.author	Yıldız, Kazım
dc.date.accessioned	2024-06-13T20:15:53Z
dc.date.available	2024-06-13T20:15:53Z
dc.date.issued	2024
dc.department	Vocational School, Gedik Meslek Yüksekokulu (Gedik Vocational School), Computer Programming Program
dc.description.abstract	Much research has used users' posts on social networks for personality and sentiment analysis and to study demographic and professional attributes, and Twitter data in particular has been used for information extraction and value creation. This study aims to use machine learning methods to predict the occupational groups of users who post in Turkish on Twitter. First, occupational groups, and the Twitter accounts of people in the occupations within each group, were identified manually, and the tweets posted from these accounts were scraped. All tweets were then grouped by occupation into groups of one, five and ten tweets, creating datasets with different characteristics, each containing more than 500,000 tweets. Some datasets were preprocessed using the Zemberek library, which is used in many Turkish NLP studies, and experiments were conducted with a total of 6 datasets. Because ready-made stopword lists were not considered sufficient, lists of unnecessary single words and two-word phrases were created manually during preprocessing. Count and TF-IDF vectorizers were used to convert the textual data into numerical form. Since each word represents a variable in text classification, additional variables were created by extracting two- and three-word phrases (n-grams) as features. In the experiments, in which 24 different models were run, a method of “determining the optimal number of features”, which keeps only the most valuable features, was used instead of all the generated features. The most successful model, using machine learning algorithms with a multinomial approach, achieved 97.3% on all calculated metrics.
dc.identifier.doi	10.1016/j.eswa.2024.124175
dc.identifier.issn	0957-4174
dc.identifier.scopus	2-s2.0-85192984274
dc.identifier.scopusquality	Q1
dc.identifier.uri	https://doi.org/10.1016/j.eswa.2024.124175
dc.identifier.uri	https://hdl.handle.net/11501/924
dc.identifier.volume	252
dc.identifier.wos	WOS:001240775500001
dc.identifier.wosquality	Q1
dc.indekslendigikaynak	Scopus
dc.indekslendigikaynak	Web of Science
dc.institutionauthor	Çıplak, Zeki
dc.institutionauthorid	0000-0002-0086-3223
dc.language.iso	en
dc.publisher	Elsevier Ltd
dc.relation.ispartof	Expert Systems with Applications
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Occupation Prediction
dc.subject	Machine Learning
dc.subject	Turkish Twitter Data Analysis
dc.subject	Multinomial Approach
dc.subject	Data Mining
dc.title	Occupational groups prediction in Turkish Twitter data by using machine learning algorithms with multinomial approach
dc.type	Article
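The abstract names the pipeline stages (grouping each account's tweets into documents, stopword filtering, n-gram count features, keeping a reduced feature set, a multinomial classifier) but gives no code. The following is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation. The stopword lists, the toy tweets, and the names `group_tweets`, `ngrams` and `MultinomialNB` are hypothetical. Multinomial naive Bayes is used here only as one example of a classifier for multiclass (multinomial) prediction; the sketch uses raw counts rather than TF-IDF, a most-frequent-k cutoff as a stand-in for the paper's optimal-feature-count method, and no Zemberek preprocessing.

```python
from collections import Counter
from math import log

# Hypothetical stopword lists for illustration only; the study built its own
# lists of single words and two-word phrases manually (not reproduced here).
STOPWORDS = {1: {"ve", "bir", "bu"}, 2: {"bir de"}}


def group_tweets(tweets, size):
    """Join consecutive tweets of one account into documents of `size` tweets
    (the study used groups of 1, 5 and 10). A shorter final group is kept."""
    return [" ".join(tweets[i:i + size]) for i in range(0, len(tweets), size)]


def ngrams(text, n_max=3):
    """Return 1- to n_max-gram features, dropping any listed stopword n-gram.
    Note: str.lower() is not Turkish-aware (e.g. "I" becomes "i", not "ı")."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram not in STOPWORDS.get(n, set()):
                feats.append(gram)
    return feats


class MultinomialNB:
    """Hypothetical multinomial naive Bayes over raw n-gram counts with
    add-alpha smoothing. `max_features` keeps only the k most frequent
    n-grams, a simple stand-in for the paper's feature-count selection."""

    def __init__(self, alpha=1.0, max_features=None):
        self.alpha = alpha
        self.max_features = max_features

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(ngrams(doc))
        total = Counter()
        for c in self.classes:
            total.update(self.counts[c])
        if self.max_features is not None:
            keep = {f for f, _ in total.most_common(self.max_features)}
            for c in self.classes:
                self.counts[c] = Counter(
                    {f: k for f, k in self.counts[c].items() if f in keep})
            self.vocab = keep
        else:
            self.vocab = set(total)
        label_counts = Counter(labels)
        self.log_prior = {c: log(label_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        feats = [f for f in ngrams(doc) if f in self.vocab]  # unseen n-grams ignored
        v = len(self.vocab)

        def log_posterior(c):
            denom = self.totals[c] + self.alpha * v
            return self.log_prior[c] + sum(
                log((self.counts[c][f] + self.alpha) / denom) for f in feats)

        return max(self.classes, key=log_posterior)


# Toy, made-up tweets; not data from the study.
doctor = group_tweets(["hasta muayene ettim", "ameliyat başarılı geçti", "nöbet çok yoğundu"], 1)
teacher = group_tweets(["öğrenciler sınava hazır", "ders planı hazırladım", "okulda veli toplantısı"], 1)
model = MultinomialNB().fit(doctor + teacher, ["doctor"] * 3 + ["teacher"] * 3)
print(model.predict("ameliyat sonrası hasta"))  # doctor
```

Grouping several tweets into one document gives the classifier more n-grams per sample, which is one plausible reason the study compared group sizes of one, five and ten; the sketch makes no claim about which size works best.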

Files

Original bundle

Name: Full Text
Size: 2.3 MB
Format: Adobe Portable Document Format

Collections

Scopus-Indexed Publications Collection
Gedik Meslek Yüksekokulu Collection
WoS-Indexed Publications Collection