Concept Extraction from Plain Text for Ontology Builder


Title: Concept Extraction from Plain Text for Ontology Builder
Author(s): Akhil K. Meshram
Publisher: Common Ground Research Networks
Collection: Common Ground Research Networks
Series: Technology, Knowledge & Society
Journal Title: The International Journal of Technology, Knowledge, and Society
Keywords: Lexicon, Synset, Universal Decimal Classification (UDC), Statistically Indexed Table, Ontology, Concept Extraction, Syntactic Parsing
Volume: 3
Issue: 1
Date: August 14, 2007
ISSN: 1832-3669 (Print)
DOI: https://doi.org/10.18848/1832-3669/CGP/v03i01/55708
Citation: Meshram, Akhil K. 2007. "Concept Extraction from Plain Text for Ontology Builder." The International Journal of Technology, Knowledge, and Society 3 (1): 107-116. doi:10.18848/1832-3669/CGP/v03i01/55708.
Extent: 10 pages

Abstract

Ontologies play a crucial role in the evolution of the intelligent web, the Semantic Web. An ontology not only describes the underlying concepts with their relationships and properties; it also plays a major role in semantic applications such as eCommerce, eLearning, and expert systems. Ontologies developed so far are domain specific and hence specific to an application. Developing a domain-independent semantic application requires generic ontologies that can fit almost any domain and are therefore applicable to various applications. To construct such ontologies, concept extraction must be general in nature. This paper proposes a technique for extracting concepts from plain text to build generic ontologies. The extraction is based on existing linguistic resources such as a lexicon and synsets. A Universal Decimal Classification (UDC) class is associated with each concept to classify it. Syntactic parsing is done with the Vibhakti Parser, which preprocesses the text and converts compound and complex sentences into simpler sentences. Nouns and noun phrases are extracted from the preprocessed text and passed to the concept extractor, which extracts the potential nouns as concepts. A statistically indexed table is generated to validate each concept against the text, and the concepts that occur most frequently in the text are extracted. Because the concept extraction relies on linguistic resources, the technique can be applied in a multilingual environment, which expands the domain of input.
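
The frequency-based selection step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the noun phrases have already been extracted by a parser, and the names `build_frequency_table`, `select_concepts`, and the `min_count` threshold are hypothetical choices made only for illustration. The lexicon, synset, and UDC lookups are left out.

```python
from collections import Counter


def build_frequency_table(noun_phrases):
    """Count how often each noun phrase occurs (hypothetical helper).

    Phrases are lowercased and whitespace-normalized so that
    "Ontology" and "ontology" are counted as the same candidate.
    """
    normalized = [" ".join(p.lower().split()) for p in noun_phrases if p.strip()]
    return Counter(normalized)


def select_concepts(noun_phrases, min_count=2):
    """Keep candidates that occur at least min_count times.

    The threshold is an assumption; the abstract only says that the
    most frequently occurring concepts are extracted.
    """
    table = build_frequency_table(noun_phrases)
    # most_common() orders entries by descending count.
    return [(term, n) for term, n in table.most_common() if n >= min_count]


# Assumed input: noun phrases produced by an earlier parsing step.
candidates = [
    "Ontology", "concept", "ontology", "semantic web",
    "Concept", "parser", "ontology",
]
concepts = select_concepts(candidates)
print(concepts)
```

A fixed count threshold is the simplest possible cutoff; a relative cutoff (for example, the top few candidates per document) would serve equally well under the same assumptions.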
