Letter based text scoring method for language identification

Takci, HİDAYET; Sogukpinar, I

ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS, vol. 3261, pp. 283-290, 2004 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume Number: 3261
Publication Date: 2004
Journal Name: ADVANCES IN INFORMATION SYSTEMS, PROCEEDINGS
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 283-290
Affiliated with Sivas Cumhuriyet University: No

Abstract

In recent years, the volume of text documents on the internet, intranets, digital libraries, and newsgroups has grown unexpectedly fast. Obtaining useful information and meaningful patterns from these documents is an important issue. Identifying the language of these text documents is an important problem that many researchers have studied. These studies have generally used words (terms) for language identification, and researchers have explored different approaches, such as linguistic and statistical ones. In this work, a Letter Based Text Scoring Method is proposed for language identification. The method is based on the letter distributions of texts. Text scoring is performed to identify the language of each text document, and the text scores are calculated from the letter distribution of the new text document. Besides its acceptable accuracy, the proposed method is easier and faster than short terms and n-gram methods.
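
The abstract does not give the scoring formula, so the following Python sketch only illustrates the general idea of scoring a text by its letter distribution. It assumes cosine similarity between letter distributions as the score, which may differ from the authors' actual method. The function names (`letter_distribution`, `score_text`, `identify_language`) and the two toy training samples are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def letter_distribution(text):
    """Return the relative frequency of each letter in `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def score_text(text, profile):
    """Score `text` against one language profile.

    Assumption: cosine similarity between letter distributions stands in
    for the scoring function, which the abstract does not define.
    """
    dist = letter_distribution(text)
    dot = sum(freq * profile.get(ch, 0.0) for ch, freq in dist.items())
    norm = math.sqrt(sum(f * f for f in dist.values())) * math.sqrt(
        sum(f * f for f in profile.values())
    )
    return dot / norm if norm else 0.0


def identify_language(text, profiles):
    """Return the language whose profile gives the highest score."""
    return max(profiles, key=lambda lang: score_text(text, profiles[lang]))


# Toy training samples (illustrative only, far too small for real use).
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog while the weather "
        "is nice and the children are playing in the garden"
    ),
    "turkish": (
        "bugün hava çok güzel ve çocuklar bahçede oynuyor ben de kitap "
        "okuyup çay içiyorum çünkü yağmur yağmıyor"
    ),
}

# One letter-distribution profile per language.
PROFILES = {lang: letter_distribution(s) for lang, s in SAMPLES.items()}

# Score a new text against every profile and report the best match.
print(identify_language("yağmur yağıyor", PROFILES))
```

Because each profile holds at most one number per letter of the alphabet, scoring a new text takes one pass over its characters plus a short sum per language. This is one plausible reason a letter-based approach could be cheaper than word-list or n-gram methods, though the sketch says nothing about the accuracy reported in the paper.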