An Extension Of Fuzzy Linguistic Summarization Considering Probabilistic Uncertainty

Thesis Type: Post Graduate

Institution Of The Thesis: Gazi Üniversitesi, Fen Bilimleri Enstitüsü, Turkey

Approval Date: 2015


Advisor: DİYAR AKAY


Linguistic summarization is a descriptive data mining technique. It expresses large volumes of data in natural-language forms that humans can easily understand. Real-world information involves various forms of uncertainty, which are typically modeled with methods such as probability theory, possibility theory, or evidence theory. In areas such as engineering, transportation, and decision-making, more than one type of uncertainty is often present, and modeling these distinct uncertainties jointly is important for representing the information more accurately. The Z-number is a recent concept that takes both probability and possibility into account. A Z-number is an ordered pair (A, B) of fuzzy sets: A is a fuzzy restriction on the value of a variable, and B is a fuzzy measure of the certainty (reliability) of A. The most important step in linguistic summarization is determining the truth degree of a generated summary, and computing the degree to which A satisfies B in the Z-number concept is very similar to calculating this truth degree. Various methods based on scalar cardinality and fuzzy cardinality have been proposed for linguistic summarization in the literature. However, no method has been proposed for the linguistic summarization of data in which different types of uncertainty coexist. Hence, building on the similarity between linguistic summarization and Z-numbers, a new Z-number-based method for linguistic summarization that considers probabilistic and possibilistic uncertainty together is proposed. In the proposed method, copulas are employed to obtain the joint probability distribution of the different variables for type-II quantified sentences. The dependence between the variables in the database (independence, positive correlation, or negative correlation) is also incorporated into the computations through copulas.
The proposed method is tested on a sample dataset and compared with the existing methods in the literature.
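In the classic scalar-cardinality (sigma-count) approach from the literature, the truth degree of a summary is computed with Zadeh's calculus of linguistically quantified propositions: for a type-I sentence "Q objects are S" it is mu_Q of the average membership in S, and for a type-II sentence "Q R objects are S" it is mu_Q of the sigma-count of "R and S" divided by that of R. The following is a minimal Python sketch of that baseline, assuming trapezoidal membership functions and min as the t-norm. The function names, fuzzy sets, and records are hypothetical, and this is the existing baseline, not the Z-number-based method proposed in the thesis.

```python
from typing import Callable, Sequence

MembershipFn = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> MembershipFn:
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)  # rising edge
        return (d - x) / (d - c)      # falling edge
    return mu


def truth_type1(q: MembershipFn, s: MembershipFn, ys: Sequence[float]) -> float:
    """Truth of 'Q objects are S': mu_Q applied to the relative sigma-count of S."""
    proportion = sum(s(y) for y in ys) / len(ys)
    return q(proportion)


def truth_type2(q: MembershipFn, r: MembershipFn, s: MembershipFn,
                xs: Sequence[float], ys: Sequence[float]) -> float:
    """Truth of 'Q R objects are S': sigma-count of (R and S) over that of R,
    with min as the t-norm; returns 0 when no object belongs to R at all."""
    num = sum(min(r(x), s(y)) for x, y in zip(xs, ys))
    den = sum(r(x) for x in xs)
    return q(num / den) if den > 0 else 0.0


# Hypothetical fuzzy sets and records, for illustration only.
most = trapezoid(0.3, 0.8, 1.0, 1.0)
high_speed = trapezoid(50, 70, 100, 100)
young = trapezoid(0, 0, 30, 40)
high_salary = trapezoid(4000, 6000, 10**9, 10**9)

print(truth_type1(most, high_speed, [80, 75, 60, 30, 90]))      # about 0.8
print(truth_type2(most, young, high_salary,
                  [25, 35, 45, 28], [7000, 5000, 8000, 3000]))  # about 0.6
```

Here the sigma-count collapses all memberships into a single scalar proportion; fuzzy-cardinality methods instead keep a fuzzy set of possible counts.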
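The similarity between Z-numbers and linguistic summarization can be made concrete. In Zadeh's formulation, the reliability component B constrains the probability of the fuzzy event A, P(A) = ∫ mu_A(x) p(x) dx, so a density p is compatible with Z = (A, B) to degree mu_B(P(A)). If p is replaced by the empirical distribution of n records, P(A) becomes the sigma-count proportion and mu_B(P(A)) is exactly the scalar-cardinality truth degree of "B objects are A". The Python sketch below assumes a normal density for X and midpoint-rule integration; all names and parameters are hypothetical, and it illustrates the analogy only, not the method proposed in the thesis.

```python
from statistics import NormalDist
from typing import Callable, Sequence

MembershipFn = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> MembershipFn:
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    def mu(x: float) -> float:
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def prob_fuzzy_event(mu_a: MembershipFn, pdf: Callable[[float], float],
                     lo: float, hi: float, steps: int = 10_000) -> float:
    """P(A) = integral of mu_A(x) * p(x) dx over [lo, hi], by the midpoint rule.
    [lo, hi] should cover the support of A."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += mu_a(x) * pdf(x)
    return total * h


def z_degree(mu_a: MembershipFn, mu_b: MembershipFn,
             pdf: Callable[[float], float], lo: float, hi: float) -> float:
    """Degree to which the density p is compatible with Z = (A, B): mu_B(P(A))."""
    return mu_b(prob_fuzzy_event(mu_a, pdf, lo, hi))


def z_degree_empirical(mu_a: MembershipFn, mu_b: MembershipFn,
                       samples: Sequence[float]) -> float:
    """Same with the empirical distribution (mass 1/n per record): P(A) is the
    sigma-count proportion, so this equals the truth of 'B objects are A'."""
    return mu_b(sum(mu_a(x) for x in samples) / len(samples))


# Hypothetical components: A = "high", B = "likely"; X assumed N(70, 10).
high = trapezoid(50, 70, 100, 100)
likely = trapezoid(0.5, 0.9, 1.0, 1.0)
x_dist = NormalDist(mu=70, sigma=10)

print(prob_fuzzy_event(high, x_dist.pdf, 50, 100))              # about 0.803
print(z_degree(high, likely, x_dist.pdf, 50, 100))              # about 0.759
print(z_degree_empirical(high, likely, [80, 75, 60, 30, 90]))   # about 0.5
```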
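Copulas couple marginal distributions into a joint one through Sklar's theorem, H(x, y) = C(F(x), G(y)). The three dependence cases named in the abstract correspond, at their extremes, to the independence copula Pi(u, v) = uv and the Frechet-Hoeffding bounds M(u, v) = min(u, v) (perfect positive dependence) and W(u, v) = max(u + v - 1, 0) (perfect negative dependence); the one-parameter Frank family moves between them. Which copula family the thesis employs is not stated in the abstract, so these families, the normal marginals, and the crisp thresholds below are assumptions chosen for illustration. The sketch computes the joint upper-tail probability P(X > x, Y > y) and the conditional P(Y > y | X > x), the kind of ratio a type-II sentence "Q R objects are S" relies on.

```python
import math
from statistics import NormalDist
from typing import Callable

Copula = Callable[[float, float], float]


def independence(u: float, v: float) -> float:
    return u * v                   # Pi(u, v): X and Y independent


def comonotonic(u: float, v: float) -> float:
    return min(u, v)               # M(u, v): perfect positive dependence


def countermonotonic(u: float, v: float) -> float:
    return max(u + v - 1.0, 0.0)   # W(u, v): perfect negative dependence


def frank(theta: float) -> Copula:
    """Frank copula: theta > 0 gives positive dependence, theta < 0 negative,
    and theta = 0 (the limiting case) independence."""
    if theta == 0:
        return independence

    def c(u: float, v: float) -> float:
        num = math.expm1(-theta * u) * math.expm1(-theta * v)
        return -math.log1p(num / math.expm1(-theta)) / theta
    return c


def joint_upper_tail(copula: Copula, fx: float, gy: float) -> float:
    """P(X > x, Y > y) = 1 - F(x) - G(y) + C(F(x), G(y)), by Sklar's theorem."""
    return 1.0 - fx - gy + copula(fx, gy)


def conditional_upper(copula: Copula, fx: float, gy: float) -> float:
    """P(Y > y | X > x); requires F(x) < 1."""
    return joint_upper_tail(copula, fx, gy) / (1.0 - fx)


# Hypothetical marginals X ~ N(70, 10), Y ~ N(5000, 1000); thresholds at the means.
fx = NormalDist(70, 10).cdf(70)        # F(x) = 0.5
gy = NormalDist(5000, 1000).cdf(5000)  # G(y) = 0.5
for name, cop in [("independent", independence), ("positive (M)", comonotonic),
                  ("negative (W)", countermonotonic), ("Frank, theta=5", frank(5.0))]:
    print(name, joint_upper_tail(cop, fx, gy), conditional_upper(cop, fx, gy))
```

With both thresholds at the medians, the joint probability is 0.25 under independence, 0.5 under M, and 0.0 under W, which shows how strongly the assumed dependence structure can change the proportion that a type-II summary's truth degree is computed from.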