Linguistic summarization with interval type-2 fuzzy sets


Institution Of The Thesis: Gazi University, Graduate School of Natural and Applied Sciences (Fen Bilimleri Enstitüsü), Turkey

Approval Date: 2013

Thesis Language: Turkish

Student: Fatih Emre Boran

Consultant: Diyar Akay

Abstract:

Advances in information technology have made it easy to collect and store huge amounts of data from a variety of sources. Extracting knowledge that humans can easily understand from these data has become more important than merely storing them. Linguistic summarization, which represents knowledge as natural-language sentences, has therefore recently become one of the important data mining techniques. Most studies on linguistic summarization have used ordinary (type-1) fuzzy sets to model words. A type-1 fuzzy set can handle intrapersonal uncertainty, but it cannot handle interpersonal uncertainty. To model both types of uncertainty, linguistic summarization has been performed with type-2 fuzzy sets, and a degree of truth based on scalar cardinality has been used to evaluate the linguistic summaries. In some situations, however, the scalar-cardinality-based degree of truth has led to inconsistent linguistic summaries that are not supported by enough data. The main objective of this thesis is to generate more consistent linguistic summaries that are supported by enough data. To achieve this objective, two degrees of truth are proposed for evaluating linguistic summaries labelled with interval type-2 fuzzy sets. The first proposed degree of truth is based on representing an interval type-2 fuzzy set by interval alpha-cuts. First, it is proved that an interval type-2 fuzzy set can be represented as the union of its interval alpha-cuts. Next, a method for obtaining interval mass assignments from interval type-2 fuzzy sets is suggested, and it is proved that the proposed method satisfies the properties given in the literature. Finally, a constrained nonlinear mathematical programming model is established to find the minimum and maximum values of the degree of truth.
The second proposed degree of truth is based on representing an interval type-2 fuzzy set by the lower and upper crisp sets obtained from its alpha-cuts. First, the lower and upper crisp sets are found. Then the possible values for each alpha-cut are obtained. Finally, the degree of truth is computed using the probability distribution and the minimum and maximum values taken by a quantifier over the possible values. New quality measures, such as the interval coverage degree, the interval usefulness degree and the interval outlier degree, are also proposed in this thesis. Furthermore, a new similarity measure is introduced to measure the similarity between two linguistic summaries. It is proved that the proposed similarity measure satisfies the axioms given in the literature, and the areas in which it can be applied are discussed. The time series of the Europe Brent Spot Price (per barrel) is linguistically summarized using the two proposed degrees of truth and the existing degree of truth from the literature. A comparison of the results shows that the two proposed degrees of truth yield more consistent summaries than the existing degree of truth.
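
As a rough illustration of the lower and upper crisp sets mentioned above, the following minimal Python sketch cuts the lower and upper membership functions of an interval type-2 fuzzy set at a level alpha and derives bounds on the proportion of records that satisfy the summarizer. It is not the degree of truth proposed in the thesis. The membership values, the piecewise-linear quantifier `most`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def alpha_cut(memberships: Sequence[float], alpha: float) -> List[int]:
    """Return the indices whose membership degree is at least alpha (a crisp set)."""
    return [i for i, m in enumerate(memberships) if m >= alpha]


def interval_alpha_cut(
    lower: Sequence[float], upper: Sequence[float], alpha: float
) -> Tuple[List[int], List[int]]:
    """Cut the lower and upper membership functions at the same level alpha.

    Because lower[i] <= upper[i] for every record, the cut of the lower
    membership function is always contained in the cut of the upper one.
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same length")
    if any(lo > up for lo, up in zip(lower, upper)):
        raise ValueError("lower membership must not exceed upper membership")
    return alpha_cut(lower, alpha), alpha_cut(upper, alpha)


def proportion_bounds(
    lower: Sequence[float], upper: Sequence[float], alpha: float
) -> Tuple[float, float]:
    """Smallest and largest proportion of records in the alpha-cut."""
    lower_set, upper_set = interval_alpha_cut(lower, upper, alpha)
    n = len(lower)
    return len(lower_set) / n, len(upper_set) / n


def most(r: float) -> float:
    """Hypothetical non-decreasing quantifier 'most' on proportions in [0, 1]."""
    if r <= 0.3:
        return 0.0
    if r >= 0.8:
        return 1.0
    return (r - 0.3) / 0.5


# Hypothetical lower/upper membership degrees of five records in a summarizer.
lower = [0.0, 0.2, 0.5, 0.8, 1.0]
upper = [0.2, 0.5, 0.8, 1.0, 1.0]

r_min, r_max = proportion_bounds(lower, upper, alpha=0.5)
# 'most' is non-decreasing, so its extreme values on [r_min, r_max]
# are attained at the endpoints.
print(most(r_min), most(r_max))
```

Since the assumed quantifier is monotone, evaluating it at the two proportion bounds is enough to obtain its minimum and maximum at that alpha level; a non-monotone quantifier would require checking every attainable proportion between the bounds.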