Concept Extraction from Plain Text for Ontology Builder

T07 1

Copyright © 2007, Common Ground Research Networks, All Rights Reserved

Abstract

In the evolution of the intelligent web, or Semantic Web, ontologies play a crucial role. An ontology not only describes the underlying concepts together with their relationships and properties; it also plays a major role in semantic applications such as eCommerce, eLearning, and expert systems. The ontologies developed so far are domain-specific and therefore tied to a particular application. Developing domain-independent semantic applications requires generic ontologies that fit almost any domain and are thus applicable to many applications. Constructing such ontologies requires concept extraction that is itself general in nature. This paper proposes a technique for extracting concepts from plain text to build generic ontologies. The extraction relies on existing linguistic resources such as a lexicon and synsets. Each concept is associated with a Universal Decimal Classification (UDC) class, which classifies it. The text is first preprocessed by syntactic parsing with a Vibhakti parser, which converts compound and complex sentences into simple sentences. Nouns and noun phrases are then extracted from the preprocessed text and passed to the concept extractor, which selects the potential nouns as concepts. A statistically indexed table is generated to validate each concept against the text, and the concepts that occur most frequently in the text are extracted. Because the technique relies on linguistic resources for concept extraction, it can be applied in a multilingual environment, which broadens the range of input it can handle.
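The extraction stages described above can be illustrated with a minimal sketch. These stages are noun-phrase extraction, validation against linguistic resources, a frequency-indexed table, and selection of the most frequent concepts. The sketch does not reproduce the Vibhakti parser. It assumes its output is already available as simple sentences tagged with Penn Treebank-style part-of-speech tags. `LEXICON` is a hypothetical stand-in for the lexicon and synset resources, and the synset IDs and UDC labels in it are placeholders, not real classification codes. Two further choices are assumptions rather than the paper's method: the fallback to the head noun when a full phrase is not in the lexicon, and the fixed `top_k` cut-off for "most frequent".

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# A simplified sentence after preprocessing: (token, POS tag) pairs.
# Assumption: Penn Treebank-style tags, where nouns start with "NN".
TaggedSentence = List[Tuple[str, str]]

# Hypothetical lexicon/synset resource: term -> (synset_id, udc_class).
# All values are illustrative placeholders.
LEXICON: Dict[str, Tuple[str, str]] = {
    "ontology": ("syn-001", "udc-A"),
    "concept": ("syn-002", "udc-A"),
    "web": ("syn-003", "udc-B"),
}


def extract_noun_phrases(sentence: TaggedSentence) -> List[str]:
    """Group maximal runs of consecutive noun tokens into lowercased phrases."""
    phrases: List[str] = []
    current: List[str] = []
    for token, tag in sentence:
        if tag.startswith("NN"):
            current.append(token.lower())
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def validate(phrase: str, lexicon: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Return the phrase if the lexicon knows it, else its head noun, else None.

    The head-noun fallback (last word of an English noun phrase) is an assumption.
    """
    if phrase in lexicon:
        return phrase
    head = phrase.split()[-1]
    return head if head in lexicon else None


def build_frequency_table(
    sentences: List[TaggedSentence], lexicon: Dict[str, Tuple[str, str]]
) -> Counter:
    """Count occurrences of every validated candidate concept in the text."""
    table: Counter = Counter()
    for sentence in sentences:
        for phrase in extract_noun_phrases(sentence):
            term = validate(phrase, lexicon)
            if term is not None:
                table[term] += 1
    return table


def extract_concepts(
    sentences: List[TaggedSentence],
    lexicon: Dict[str, Tuple[str, str]],
    top_k: int,
) -> List[Tuple[str, int, str]]:
    """Return the top_k most frequent concepts with their counts and UDC labels."""
    table = build_frequency_table(sentences, lexicon)
    return [(term, count, lexicon[term][1]) for term, count in table.most_common(top_k)]


SENTENCES: List[TaggedSentence] = [
    [("The", "DT"), ("ontology", "NN"), ("describes", "VBZ"), ("each", "DT"), ("concept", "NN")],
    [("The", "DT"), ("semantic", "JJ"), ("web", "NN"), ("uses", "VBZ"), ("an", "DT"), ("ontology", "NN")],
    [("An", "DT"), ("ontology", "NN"), ("groups", "VBZ"), ("concept", "NN"), ("labels", "NNS")],
    [("The", "DT"), ("web", "NN"), ("grows", "VBZ")],
]

if __name__ == "__main__":
    print(extract_concepts(SENTENCES, LEXICON, top_k=2))
    # prints [('ontology', 3, 'udc-A'), ('web', 2, 'udc-B')]
```

In this toy input, "concept labels" is dropped because neither the full phrase nor its head noun "labels" is in the lexicon. This mirrors the idea that only candidates confirmed by the linguistic resources enter the frequency table. Swapping `LEXICON` for resources in another language would leave the counting logic unchanged, in line with the multilingual claim above.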