Machine learning on imbalanced data

Event Date: Tuesday, 22 April 2014, 12:00
Location: IBMI
Lecturer: Assoc. Prof. Marko Robnik-Šikonja, PhD

Imbalanced data sets arise in many practically important classification problems. Typically, in these types of problems we have a sufficient number of majority class instances, while the class value of interest is rare. An example is the diagnosis of rare diseases, where the vast majority of tested patients are negative, while we want to learn the characteristics of the rare positive cases. Similar imbalances arise in genetics, detection of illegal stock market transactions, insurance fraud, production faults, etc. These problems are difficult for general data analytics approaches, but due to their importance there are many specialized approaches for tackling them. We present sampling-based approaches, cost-sensitive learning, and some adaptations of well-known learning algorithms intended to cope with data imbalance. In the last part we focus on the methods that are the topic of our research, namely feature evaluation with imbalanced data, generation of semi-artificial data, and ensemble methods.
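
As a rough illustration of two of the ideas named in the abstract, the sketch below shows random oversampling (one simple sampling-based approach) and inverse-frequency class weights (one common starting point for cost-sensitive learning). It is not taken from the lecture: the function names, the toy data, and the weighting formula n_samples / (n_classes * class_count) are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def inverse_frequency_weights(y):
    """Per-class misclassification weights, larger for rarer classes
    (assumed formula: n_samples / (n_classes * class_count))."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Hypothetical toy data: 5 positive cases among 100 instances.
X = [[i] for i in range(100)]
y = [1 if i < 5 else 0 for i in range(100)]

X_bal, y_bal = random_oversample(X, y)
weights = inverse_frequency_weights(y)
```

Oversampling changes the training set itself, while class weights leave the data untouched and instead change how heavily a learner penalizes errors on each class. Which of the two works better depends on the learner and the data.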

About IBMI

The Institute for Biostatistics and Medical Informatics (IBMI), formerly the Institute for BioMedical Informatics (so still IBMI), was founded by the Faculty of Medicine in response to the need for a unit that would perform or coordinate tasks related to data analysis and to providing information relevant to research in medicine. The programme of the institute and its development have adjusted through time to changes in financing and to technological progress, but the basic aim remains the same: to support research in medicine. This is achieved through the following tasks:

Contact

Institute for Biostatistics and Medical Informatics
University of Ljubljana, Faculty of Medicine
Vrazov trg 2, 1000 Ljubljana
Slovenia

tel: +386 1 543-77-70
fax: +386 1 543-77-71
email: ibmi (at) mf.uni-lj.si