Events in 2005

A CHECKLIST FOR AUTHORS: STATISTICAL PROBLEMS TO DOCUMENT AND TO AVOID
1 December, 2005
Prof. Frank E. Harrell, PhD
Vanderbilt University School of Medicine, Nashville

This talk will cover a list of common statistical errors in medical, epidemiologic, and other literature, found at http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/ManuscriptChecklist. Some of the areas discussed are categorizing continuous variables, treatment of missing data, testing data for normality, underuse of nonparametric tests, underuse of confidence intervals, misinterpretation of P-values, and model overfitting.

Photos: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

REGRESSION MODELING STRATEGIES WITH APPLICATION TO LOGISTIC REGRESSION COURSE

The course Regression modeling strategies with application to logistic regression was held at the IBMI from 28 November to 1 December, 2005, by Prof. Frank E. Harrell, PhD, of the Vanderbilt University School of Medicine. Prof. Harrell gave the course as part of the Statistics in Medicine course within the graduate study of Statistics, but other participants also attended. The course was based on chapters 1 to 6 and 9 to 12 of Prof. Harrell's textbook Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis (New York: Springer, 2001).

Course notes: PDF

Photos: 1, 2, 3, 4, 5, 6, 7, 8

THE CANCER REGISTRY OF SLOVENIA
22 November, 2005
Prof. Maja Primic Zakelj, PhD
Epidemiology and Cancer Registries, Institute of Oncology Ljubljana

Without data, there would be no statistics. The importance of the data can hardly be overemphasised, but they are all too seldom discussed.

We will learn what cancer registries do, how they assess their quality, what data they process routinely, and what such data can tell us about cancer in Slovenia. We will also learn what (else) the registries can offer to users, which international databases include Slovenian data (and where people can find those data), and in which international research projects the Slovenian registry is taking part. The Cancer Registry of the Republic of Slovenia is an internationally renowned institution that established its reputation under the leadership of Prof. Vera Pompe Kirn, PhD; under the leadership of Prof. Zakelj, that reputation is being successfully maintained and the registry is being upgraded in accordance with new technological and methodological demands.

Slides: PowerPoint (in Slovenian)

Photos: 1, 2, 3, 4, 5, 6, 7

STATISTICS WITH EXCEL COURSES

From August to October, 2005, four courses on Statistics with Excel were held at the Statistical Office of the Republic of Slovenia, followed by one at the IBMI (from 7 to 15 November, 2005). All the courses were given by Gaj Vidmar, MSc, from the IBMI.

Course outline: PDF (in Slovenian)

ANALYSIS AND VISUALISATION OF LARGE NETWORKS WITH THE PAJEK SOFTWARE
4 October, 2005
Assist. Prof. Andrej Mrvar, PhD
University of Ljubljana, Faculty of Social Sciences

In daily data analysis, one often faces sizeable networks with thousands of points (vertices) and/or connections (edges). Such networks can be obtained automatically from electronic data sources.

Managing such large networks is very demanding in terms of both time (only algorithms with sub-quadratic time complexity are acceptable) and computer memory. We will see approaches for visualising such networks based on determining and displaying the global structure of the network, as well as its individual parts. The network structure can have several levels.

Such approaches are implemented in the freely available software package Pajek. The package puts special emphasis on the visualisation (automated view determination) of networks. We will see some examples of using Pajek with real-life networks and skim through the book Exploratory Social Network Analysis with Pajek by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj.

Slides: PDF (in Slovenian)

APPLIED STATISTICS 2005 CONFERENCE

From 18 to 21 September, 2005, the Applied Statistics 2005 international conference was held in Ribno near Bled. As is traditional, there were interesting invited lectures by distinguished guests; from the biostatistical point of view, the highlight was the lecture by Professor Stephen Senn.

FEATURE SELECTION AND CLASSIFICATION ON TYPE 2 DIABETIC PATIENT'S DATA
14 June, 2005
Prof. Paul McCullagh, PhD
University of Ulster

Diabetes is a disorder of the metabolism in which the amount of glucose in the blood is too high because the body cannot produce or properly use insulin. In order to achieve more effective diabetes clinic management, data mining techniques have been applied to a patient database. In an attempt to improve the efficiency of the data mining algorithms, a feature selection technique, ReliefF, is applied to the data to rank the important attributes affecting Type 2 diabetes control. After suitable attributes have been selected, classification techniques are applied to the data to predict how well the patients are controlling their condition. Preliminary results have been confirmed by the clinician, which provides optimism that data mining can be used to generate prediction models. ReliefF is a typical feature mining technique that has been used successfully in data mining applications. However, ReliefF is sensitive to the definition of relevance used in its implementation, and it is computationally expensive when handling a large data set. I will present an optimisation approach called FSSMC (Feature Selection via Supervised Model Construction), which optimises ReliefF through data transformation and starter selection, and evaluate its effectiveness with a common machine learning algorithm, C4.5. Experiments indicate that the proposed method improves both classification accuracy and computational efficiency on the trial data sets.
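
To give a flavour of how Relief-type methods rank attributes, here is a minimal Python sketch of the basic two-class Relief idea: a feature gains weight when it differs between an instance and its nearest neighbour from the other class (the "miss"), and loses weight when it differs from its nearest neighbour of the same class (the "hit"). This is a deliberate simplification and is neither ReliefF (which uses several neighbours and handles multiple classes and missing values) nor the FSSMC approach from the talk. The function name relief_weights and the toy data are illustrative assumptions.

```python
import numpy as np

def relief_weights(X, y):
    """Basic two-class Relief weights (simplified sketch, not ReliefF)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Rescale every feature to [0, 1] so that differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    n, p = Xs.shape
    weights = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distance to instance i
        dist[i] = np.inf                        # never pick the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class neighbour
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class neighbour
        weights += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return weights / n

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.4],
     [0.8, 0.2], [0.9, 0.8], [1.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
print(weights)  # the informative feature should receive the larger weight
```

Each pass over the data requires a nearest-neighbour search for every instance, which illustrates why such methods become expensive on large data sets.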

Slides: PowerPoint

Photos: 1, 2, 3

VISUALIZATION, EXTRACTION AND REPRESENTATION TOOLS FOR KNOWLEDGE DISCOVERY
6 June, 2005
Prof. Sherrilynne Fuller, PhD
University of Washington, Seattle

With the rapid expansion of the scientific literature, effectively finding or integrating new domain knowledge in the sciences is proving increasingly difficult. The development of methods and tools that help researchers rapidly extract research findings from heterogeneous and massive information sources, and use this knowledge in problem-solving, is one of the most fundamental research directions for the information sciences today. Based on extensive research into the information preferences and information-seeking behavior of biomedical researchers, the goal of the Telemakus system is to enhance the knowledge discovery process through innovative retrieval, visual and search interaction tools. Utilizing schema-based approaches to knowledge representation, semantic and syntactic parsing tools from the Unified Medical Language System (UMLS), and interactive mapping tools for users to navigate research findings and linkages across domains, the Telemakus system offers an innovative approach to representing, retrieving, and assimilating research findings in support of knowledge discovery in the biosciences.

RECENT DEVELOPMENTS IN RELATIVE SURVIVAL ANALYSIS
23 May, 2005
Prof. Timo Hakulinen, PhD
Director, Finnish Cancer Registry

Relative survival is used to summarize the excess mortality that cancer patients experience in comparison with a corresponding general population group. It is often interpreted as the patients' survival if mortality from competing causes of death other than the patients' cancer were eliminated.

An analysis of the total mortality of the patients does not differentiate whether variables such as age and sex are related to the disease-specific mortality, to the "natural" mortality in the source population, or to both. On the other hand, information on causes of death may be unavailable or unreliable, which prevents cause-specific survival analysis. Relative survival thus provides an "objective" way to account for mortality due to competing risks or for natural mortality, and it is used particularly widely by population-based cancer registries.

Relative survival is based on an additive hazards model where the total hazard of dying is regarded as a sum of a known baseline hazard (of a comparable general population group) and an excess hazard associated with the diagnosis of cancer. The excess hazard is assumed to depend on prognostic factors, and non-proportional effects can also be accommodated in the calculations. In fact, non-proportional excess hazards are the rule rather than the exception.
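
The additive structure described above can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the talk: \lambda is the total hazard, \lambda^{*} the known population (baseline) hazard, \nu the excess hazard, x the prognostic factors, and S and S^{*} the corresponding observed and expected survival functions.

```latex
% Total hazard = known population hazard + excess hazard due to cancer
\lambda(t \mid x) = \lambda^{*}(t) + \nu(t \mid x)

% Relative survival: ratio of observed to expected survival,
% which under the additive model depends only on the excess hazard
r(t \mid x) = \frac{S(t \mid x)}{S^{*}(t)}
            = \exp\!\left( -\int_{0}^{t} \nu(u \mid x)\, du \right)

% One common special case: proportional excess hazards
\nu(t \mid x) = \nu_{0}(t)\, \exp(x^{\top}\beta)
```

Non-proportional effects correspond to letting \beta vary with t in the last expression.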

There are at least five ways to perform the analyses in practice. Currently, regression analyses of relative survival can be conducted easily using mainstream statistical software packages (e.g. SAS and Stata), thereby removing the reliance on special-purpose software. As a rule, the results do not depend on which of the five ways is used. The relative survival methodology has recently also been generalized to analyses of cancer-specific relative survival of patients with multiple tumours.

The main alternative to relative survival analysis is cause-specific survival analysis, which presupposes patient-specific knowledge of the causes of death. Multiplicative hazards models may also be considered as alternatives.

Photos: 1, 2, 3, 4

MODELS FOR COST ANALYSIS IN HEALTH CARE: A CRITICAL AND SELECTIVE REVIEW
1 April, 2005
Prof. Dario Gregori, PhD
University of Torino, Department of Public Health and Microbiology

Considerable attention has been paid to models and methods for the analysis of costs associated with a specific treatment or disease. The models mostly revolve around the issues of zero costs and censoring: a substantial proportion of observations usually has zero costs, and costs are usually not recorded up to the final event of interest. Models that deal with these issues include robust regression methods, mixture models and the relatively new Lin and Bang estimators.

The aim of this talk is to organize the ideas about the various models as presented in the literature, highlighting in particular the assumptions underlying them. Both simulations and the data coming from a cardiovascular trial and a diabetic cohort will be used to illustrate the various approaches.

Slides: PowerPoint

Photos: 1, 2, 3, 4, 5

SOLUTIONS TO THE PROBLEM OF MONOTONE LIKELIHOOD IN LOGISTIC AND COX REGRESSION MODELS - THEORY AND APPLICATIONS
14 March, 2005
Prof. Georg Heinze, PhD
Medical University of Vienna, Core Unit for Medical Statistics and Informatics

The phenomenon of monotone likelihood, or separation, is observed in the maximum likelihood fitting process of models such as the Cox proportional hazards model or the logistic model if the likelihood seems to converge while at least one parameter estimate diverges to infinity. It primarily occurs in small samples with highly predictive covariates. The simplest possible case of monotone likelihood is an analysis with a dichotomous covariate where no events are observed in one of the two groups. Monotone likelihood implies infinite or zero hazard or odds ratio estimates, which are usually considered absurd. The adaptation of a penalized likelihood bias-reduction approach by Firth (1993, Biometrika 80, 27-38) has been shown to be superior to previous options for dealing with monotone likelihood (Heinze and Schemper, 2001, Biometrics 57, 114-119; 2002, Statistics in Medicine 21, 2409-2419). Profile penalized likelihood (PPL) based confidence intervals for parameter estimates show excellent behaviour in terms of coverage probability. I will briefly review the theory behind the penalized likelihood estimation of parameters and confidence intervals. Furthermore, I will introduce software packages for maximizing the penalized likelihood in Cox and logistic models, which are available in R, S-PLUS and SAS (Heinze and Ploner, 2002 & 2003, Computer Methods and Programs in Biomedicine), and show their application to some clinical data sets. Finally, I would like to discuss some common and differential effects of various bias-correction and penalized likelihood methods.
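
The simplest case mentioned above (a dichotomous covariate with no events in one group) can be sketched in a few lines of Python with NumPy. This is an illustrative sketch of Firth-type penalized logistic regression using the modified score equations; it is not the R, S-PLUS or SAS software referred to in the abstract, and the function name firth_logistic and the toy data are assumptions made here. With these data the ordinary maximum likelihood estimate of the log odds ratio is infinite, while the penalized estimate is finite.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression via Newton steps on the modified score."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        w = pi * (1.0 - pi)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score: the usual score plus a bias-reducing term
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical toy data: 0 events out of 10 in group 0, 5 out of 10 in group 1.
x = np.repeat([0.0, 1.0], 10)
y = np.concatenate([np.zeros(10), np.ones(5), np.zeros(5)])
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print(beta[1])  # finite log odds ratio, log(21), about 3.04
```

In this saturated two-group setting the penalized estimate coincides with the estimate obtained by adding 0.5 to each cell of the 2x2 table, which is why the log odds ratio equals log(21) here. The sketch gives point estimates only; the profile penalized likelihood confidence intervals discussed in the abstract are not implemented.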

Slides: PDF

Photos: 1, 2, 3, 4

WHERE ARE OUR LEADING BRANDS ON THE MARKETS OF FORMER YUGOSLAVIA?
22 February, 2005
Zenel Batagelj
CATI Ltd., Ljubljana

First, we will define the question "where are" within the context of brands. We will briefly discuss brand assessment, brand strength measurement, "brand equity" and the like. In these fields, of course, we cannot do without statistics. But in order not to stick to the "theoretical level", we will also have a look at the fresh results of the PGM study (http://pgm.cati.si), which is aimed at measuring the strength of brands on the consumables market in Slovenia, Croatia, Bosnia and Herzegovina, and Serbia.

DISCOVERING INTERACTIONS BETWEEN VARIABLES
20 January, 2005
Aleks Jakulin, MSc
University of Ljubljana, Faculty of Computer and Information Science

Interactions are frequently mentioned and often a challenge. There are many different definitions, so the concept of an interaction remains vague. Among the many, we have selected two notable and useful definitions: the information-theoretic notion of interaction information reveals synergy or redundancy in the predictions made by two variables; inspired by the concepts of categorical data analysis, interaction gain is the benefit obtained by assuming an interaction between variables in a statistical model. Both definitions directly quantify the "amount" of interaction between variables. We will examine statistical models necessary for interaction analysis, and will show how to estimate the confidence intervals and test the statistical significance of interactions. In the remaining time, we will focus on the examples of visualizations involving interactions in data sets from medicine, economics, ecology and political science.
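
As a small illustration of the information-theoretic definition, the Python sketch below computes interaction information from empirical frequencies as I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), using the sign convention in which positive values indicate synergy and negative values indicate redundancy (some authors use the opposite sign). The function names and the toy data are assumptions made for this example, and no confidence intervals or significance tests are computed.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of samples."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def interaction_information(a, b, c):
    """I(A;B;C) = I(A,B;C) - I(A;C) - I(B;C); positive means synergy."""
    joint_ab = list(zip(a, b))
    return (mutual_information(joint_ab, c)
            - mutual_information(a, c)
            - mutual_information(b, c))

# Synergy: C is the exclusive-or of A and B, so neither variable alone
# says anything about C, but together they determine it completely.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
xor = [u ^ v for u, v in zip(a, b)]
synergy = interaction_information(a, b, xor)

# Redundancy: B is a copy of A, so it adds nothing to what A says about C.
redundancy = interaction_information(a, a, a)

print(synergy, redundancy)  # 1.0 -1.0
```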

Websites:

Slides: PowerPoint (in Slovenian) 
