In this work, we have taken significant steps towards learning ontologies from text without prior knowledge (i.e. ontology learning from scratch), in a language- and domain-independent way. In addition, we have proposed a novel framework for the evaluation of ontology learning.
In particular, the contributions of this work regarding the learning of ontologies are summarized as follows:
- Concept/topic discovery and identification independent of the terms' surface appearance
- Formation of the ontology subsumption hierarchy backbone using only statistical information regarding the discovered topics
- Determination of the depth of the learned hierarchy in an automatic and statistical way
- Conception of a language-neutral and domain-independent ontology learning method
In the context of ontology evaluation, we have proposed a flexible framework along with a novel set of similarity methods that are able to assess the learned ontology with respect to a gold standard. In particular, the main contributions are:
- Highlighting the importance of the mapping between the learned and the gold ontology as a prerequisite for assessing the former
- Matching of learned concepts to gold concepts beyond the lexical layer of the ontology
- Taking into account both the structural and the lexical layers when assessing the learned ontology
- Allowing any method from the field of Ontology Alignment to be used to match the ontologies
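As a purely illustrative sketch of the evaluation idea summarized above (first map learned concepts to gold concepts, then assess the learned ontology through that mapping), the following Python fragment may help. It is not the framework proposed in this work. The representation of a concept as a set of terms, the Jaccard similarity, the greedy one-to-one alignment, the threshold of 0.5, and all function and variable names are assumptions made for the example. Only concept matching is covered; the structural layer is left out. The `sim` parameter stands in for any alignment method that could be plugged in.

```python
from typing import Callable, Dict, Set, Tuple

# Assumption: a concept is represented by the set of terms associated with it.
Concept = Set[str]
Ontology = Dict[str, Concept]
Similarity = Callable[[Concept, Concept], float]


def jaccard(a: Concept, b: Concept) -> float:
    """Term-overlap similarity; a stand-in for any alignment method."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(learned: Ontology, gold: Ontology,
          sim: Similarity = jaccard, threshold: float = 0.5) -> Dict[str, str]:
    """Greedy one-to-one mapping from learned concept ids to gold concept ids."""
    # Score every (learned, gold) pair and visit the pairs from best to worst.
    pairs = sorted(
        ((sim(lc, gc), lid, gid)
         for lid, lc in learned.items()
         for gid, gc in gold.items()),
        reverse=True,
    )
    mapping: Dict[str, str] = {}
    used_gold: Set[str] = set()
    for score, lid, gid in pairs:
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if lid in mapping or gid in used_gold:
            continue  # keep the mapping one-to-one
        mapping[lid] = gid
        used_gold.add(gid)
    return mapping


def precision_recall(mapping: Dict[str, str], learned: Ontology,
                     gold: Ontology) -> Tuple[float, float]:
    """Concept-level precision and recall computed from the mapping."""
    matched = len(mapping)
    precision = matched / len(learned) if learned else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for the example.
learned = {
    "t1": {"gene", "protein", "cell"},
    "t2": {"market", "stock"},
    "t3": {"foo", "bar"},
}
gold = {
    "Biology": {"gene", "protein", "dna"},
    "Finance": {"market", "stock", "bond"},
}

mapping = align(learned, gold)
precision, recall = precision_recall(mapping, learned, gold)
```

Because the learned concept ids (`t1`, `t2`, `t3`) share no labels with the gold concepts, the match relies on the associated terms rather than on concept names, which loosely mirrors the idea of matching beyond the lexical layer. Passing a different `sim` function swaps the alignment method without touching the assessment step.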
Main contributions through reports and publications:
- A specification of the Ontology Learning process as a 3-step procedure that involves concept identification, taxonomy construction and semantic relations extraction, as well as the evaluation of learned ontologies, taking into account the state of the art in the relevant literature.
- A novel method for evaluating ontology learning methods that takes into account the evaluation aspects mentioned earlier.
- A method for identifying concepts in large text collections and arranging them in a taxonomy in a language-neutral way.
- A novel method for learning a Hierarchical Probabilistic Topic Model, which we apply to the task of ontology learning. Given the text collection, the method estimates the topics and the structure of the taxonomy at the same time, as well as the depth of the hierarchy and the branching factor at each level.