LECTURE: Subgroup discovery in data sets with multidimensional responses
The next lecture at the Biostatistical Center will take place on Thursday, 1/24/2013, at 1:00 pm at IBMI.
Subgroup discovery (SD) is an applied data analysis technique that aims to find interesting subsets of a random sample with respect to a predefined target concept. Most existing SD approaches have been developed for data sets with a single binary output variable (class); the interestingness of a subgroup has therefore been tied to distributional unusualness.
In the talk we will present a subgroup discovery algorithm that can handle multiple output variables simultaneously. Such data sets are becoming increasingly available, and suitable methods for handling them are needed. The proposed approach performs hierarchical clustering in the output space and then analyses the resulting clustering tree. Each node of the dendrogram corresponds to a particular subgroup, whose interestingness is then measured using the input variables and supervised data mining techniques. By default, subgroups are evaluated in terms of the area under the ROC curve.
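The idea can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the algorithm presented in the talk: the linkage rule (average linkage on squared Euclidean distance), the `min_size` threshold, the toy data and all function names are hypothetical. As a stand-in for a supervised learner, each input variable is used directly as a score and the best single-variable AUC is reported.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two output vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(outputs):
    """Average-linkage agglomerative clustering in the output space.

    Returns the member-index set of every internal dendrogram node,
    in the order in which the nodes are created.
    """
    clusters = [frozenset([i]) for i in range(len(outputs))]
    nodes = []

    def linkage(pair):
        a, b = pair
        total = sum(sq_dist(outputs[i], outputs[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
        nodes.append(merged)
    return nodes


def auc(scores, labels):
    """Area under the ROC curve: the probability that a random member
    scores higher than a random non-member (ties count one half)."""
    pos = [s for s, flag in zip(scores, labels) if flag]
    neg = [s for s, flag in zip(scores, labels) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_subgroups(inputs, outputs, min_size=5):
    """Treat each dendrogram node as a candidate subgroup and score how
    well the input variables separate its members from the rest."""
    n = len(inputs)
    results = []
    for members in agglomerate(outputs):
        # Skip tiny nodes and nodes with almost no complement (e.g. the root).
        if len(members) < min_size or n - len(members) < min_size:
            continue
        labels = [i in members for i in range(n)]
        per_feature = (
            auc([row[f] for row in inputs], labels)
            for f in range(len(inputs[0]))
        )
        # An AUC below 0.5 separates just as well with the sign flipped.
        best = max(max(a, 1 - a) for a in per_feature)
        results.append((best, sorted(members)))
    return sorted(results, reverse=True)


# Toy data (assumed): two groups of 20 cases with two output variables each.
# The first input variable separates the groups; the second is pure noise.
random.seed(0)
outputs = [
    (c + random.uniform(-0.5, 0.5), c + random.uniform(-0.5, 0.5))
    for c in [0.0] * 20 + [5.0] * 20
]
inputs = [(float(i), random.random()) for i in range(40)]

ranked = score_subgroups(inputs, outputs)
for score, members in ranked[:3]:
    print(f"AUC = {score:.2f}, size = {len(members)}")
```

In practice the single-variable score would be replaced by a cross-validated classifier, and the clustering by a library routine; the sketch only shows how dendrogram nodes become candidate subgroups that are then ranked by AUC.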
The algorithm's performance will be compared to that of predictive clustering techniques. For illustration, it will be applied to data from the European Social Survey (ESS).
Everyone is welcome!