SAP: Standard Arabic profiling toolset for textual analysis

Khalid M.O. Nahar, Ahmed F. Al Eroud, Malek Barahoush, Abdallah M. Al-Akhras

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This paper defines a Standard Arabic Profiling (SAP) toolset that helps researchers perform textual analysis and compare different Arabic corpora. Because tools for the Arabic language are needed, we present the SAP toolset to simplify the textual analysis process. The toolset consists of three profilers. The Part of Speech (POS) profiler gives a statistical analysis of a given document. The vocabulary profiler gives the user an indication of the vocabulary used in a document with reference to the Open Source Arabic Corpus (OSAC) of two news agencies (CNN and BBC); it does so by computing the similarity between the document and the corpus using the log-likelihood measure. Lastly, the newly added readability profiler is used to 1) assess the readability level of a document according to the Flesch Reading Ease readability formula, and 2) measure the simplicity and ambiguity levels of the document. We describe the toolset's existing part-of-speech profiler and how its functionality can be extended to include vocabulary and readability profiling.
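The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the SAP implementation: the abstract does not specify which log-likelihood variant is used, so the sketch assumes the common two-corpus G2 statistic for word-frequency comparison, and it uses the standard Flesch Reading Ease constants, which were defined for English. The function names (`log_likelihood`, `vocabulary_profile`, `flesch_reading_ease`) are hypothetical, and tokenization and syllable counting for Arabic are assumed to be done elsewhere.

```python
import math
from collections import Counter


def log_likelihood(freq_doc, size_doc, freq_ref, size_ref):
    """G2 score for one word (assumed variant, not confirmed by the paper).

    freq_doc / freq_ref: occurrences of the word in the document / reference corpus.
    size_doc / size_ref: total token counts of the document / reference corpus.
    """
    total = freq_doc + freq_ref
    # Expected counts if the word were equally frequent in both texts.
    expected_doc = size_doc * total / (size_doc + size_ref)
    expected_ref = size_ref * total / (size_doc + size_ref)
    score = 0.0
    for observed, expected in ((freq_doc, expected_doc), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


def vocabulary_profile(doc_tokens, ref_tokens):
    """Rank the document's words by how strongly their frequency differs
    from the reference corpus (highest G2 first)."""
    doc_counts, ref_counts = Counter(doc_tokens), Counter(ref_tokens)
    n_doc, n_ref = len(doc_tokens), len(ref_tokens)
    scores = {
        word: log_likelihood(count, n_doc, ref_counts[word], n_ref)
        for word, count in doc_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch Reading Ease formula; counts are supplied by the caller."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)
```

A word that is equally frequent in the document and the reference corpus scores 0 under this statistic, while words that occur far more or far less often than expected score higher. For the readability score, higher values indicate easier text.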

Original language: English (US)
Pages (from-to): 222-229
Number of pages: 8
Journal: International Journal of Machine Learning and Computing
Volume: 9
Issue number: 2
State: Published - Apr 1 2019
Externally published: Yes

Keywords

  • Part-of-speech tagging (POST)
  • Software
  • Arabic natural language processing
  • Text analysis

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
