Amharic Text Chunker Using Conditional Random Fields

Birhan Hailu Getaneh; Birchiko Achamyeleh; Gebeyehu Belay

doi:10.24114/cess.v7i1.26935


Keywords:

Amharic text chunker, base phrase chunker, conditional random fields, clause boundary identification

Abstract

This paper introduces an Amharic text chunker based on conditional random fields (CRFs). To find the optimal feature set for the chunker, the researchers conducted experiments under different scenarios until a promising result was obtained. Sentences were collected from Amharic grammar books, news articles, magazines, and news from the Walta Information Center (WIC). These sentences were analyzed and tagged manually, then used as the corpus for training and testing the model. The entire dataset was chunk-tagged manually and approved by linguistic professionals. Phrase boundaries are marked using the IOB2 chunk specification. Across all experiments, the best overall accuracy was 97.26%, obtained with a window of two tokens on each side together with the POS tag of each token; the worst was 84.57%, obtained using only a window of one word on each side.
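The two ideas the abstract relies on, context-window features with POS tags and IOB2 boundary tags, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `<PAD>` marker, the placeholder tokens (`w1`, `w2`, ...), and the POS and chunk labels are all hypothetical, and the abstract does not say which CRF toolkit was used. Feature dictionaries of this shape are what a CRF library such as sklearn-crfsuite typically accepts as input.

```python
from typing import Dict, List, Tuple


def window_features(tokens: List[str], pos_tags: List[str],
                    i: int, window: int = 2) -> Dict[str, object]:
    """Build features for token i from the words and POS tags of its
    neighbours within `window` positions on each side (hypothetical layout)."""
    feats: Dict[str, object] = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        # Positions outside the sentence get a padding marker.
        feats[f"word[{offset}]"] = tokens[j] if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


def iob2_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags into (label, start, end) spans, end exclusive.
    In IOB2 every chunk starts with B-X; I-X continues it; O is outside."""
    chunks: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes the last chunk
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            chunks.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


# Placeholder sentence: tokens and tags are illustrative, not from the corpus.
tokens = ["w1", "w2", "w3", "w4"]
pos_tags = ["ADJ", "N", "N", "V"]
chunk_tags = ["B-NP", "I-NP", "B-NP", "B-VP"]

features = [window_features(tokens, pos_tags, i, window=2)
            for i in range(len(tokens))]
chunks = iob2_to_chunks(chunk_tags)
```

Changing `window` from 2 to 1 and dropping the `pos[...]` entries gives the kind of smaller feature set that the abstract reports as the weakest configuration.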


