Events in 2003

PREDICTIVE ACCURACY AND EXPLAINED VARIATION
9 December, 2003
Prof. Michael Schemper, PhD
Vienna University Medical School,
Department of Medical Computer Sciences,
Section of Clinical Biometrics

Measures of the predictive accuracy of regression models quantify the extent to which covariates determine an individual outcome. Explained variation measures the relative gain in predictive accuracy when prediction based on covariates replaces unconditional prediction. A unified concept of predictive accuracy and explained variation based on the absolute prediction error is presented for models with continuous, binary, polytomous and survival outcomes. The measures are given both in a model-based formulation and in a formulation directly contrasting observed and expected outcomes. Various aspects of application are demonstrated by examples from three forms of regression models. It is emphasized that the achievable degree of absolute or relative predictive accuracy is often low, even in the presence of highly significant and relatively strong covariates.
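The general idea can be conveyed by a minimal numerical sketch: compare the mean absolute prediction error of an unconditional prediction with that of a covariate-based prediction, and report the relative gain. This is only an illustration of the concept; the simulated data, the least-squares fit and the notation D0, DX, V are assumptions of the sketch, not Prof. Schemper's actual measures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the outcome y depends on a covariate x (illustrative only).
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

# Unconditional prediction: every subject is predicted by the marginal mean.
d0 = np.mean(np.abs(y - y.mean()))

# Covariate-based prediction: a least-squares fit on x.
b, a = np.polyfit(x, y, 1)          # slope, intercept
dx = np.mean(np.abs(y - (a + b * x)))

# Explained variation: relative gain in predictive accuracy.
v = (d0 - dx) / d0
print(f"D0={d0:.3f}  DX={dx:.3f}  V={v:.3f}")
```

With a strong covariate, DX is clearly smaller than D0 and V lies between 0 and 1; with a weak covariate, V stays close to 0 even when the slope is statistically significant.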


BASICS OF GEOSTATISTICS AND ITS APPLICATION IN CHARTING VARIOUS VARIABLES
11 November, 2003
Damijana Kastelec, PhD
University of Ljubljana, Biotechnical Faculty

Geostatistics is used for analysing spatial variables which describe a process in space, e.g., the distribution of air temperature over Slovenia, the distribution of various illnesses in a selected region, or the distribution of various disease-presence indicators. In addition to its value, every measurement of a spatial variable carries a definite geographical location. Geostatistical methods enable us to say something about the distribution of a variable over an area on the basis of measurements taken at selected locations. In practice, this means that charts of selected variables are drawn with added information on their precision. An important part of spatial data analysis is the analysis of the spatial correlation of a variable and the selection of a suitable mathematical model for it. The values of a random variable at points where no measurements were taken (i.e., interpolated values) are calculated as linear combinations of the measured values in a selected neighbourhood, taking into account the distances between points and the spatial correlation of the variable. The problem can also be approached in a multivariate way, where the correlation of the studied variable with other available variables is taken into account during interpolation. This geostatistical procedure of spatial interpolation is called kriging.

Apart from a few unavoidable formulas, I will try to avoid advanced theory during my lecture. The theory of simple and general kriging will be illustrated with examples of spatial interpolation of selected ecological variables (spatial distribution of precipitation over Slovenia, concentration of heavy metals in the soil in the Celje region, etc.). I expect the discussion to indicate which specific health-care problems can be approached in that way.
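The interpolation step described above — a linear combination of neighbouring measurements weighted according to distance and spatial correlation — can be sketched as simple kriging with a known mean. The exponential covariance model, its parameters and the toy measurement points below are assumptions chosen for the sketch, not part of the lecture material.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_par=10.0):
    # Exponential covariance model: correlation decays with distance h.
    return sill * np.exp(-h / range_par)

def simple_kriging(pts, vals, target, mean=0.0):
    # Distances among measurement points, and from each point to the target.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    d0 = np.linalg.norm(pts - target, axis=-1)
    # Kriging weights solve the linear system C w = c0.
    w = np.linalg.solve(exp_cov(d), exp_cov(d0))
    # Prediction: mean plus weighted residuals; the kriging variance
    # supplies the "added information on precision" for the chart.
    est = mean + w @ (vals - mean)
    var = exp_cov(0.0) - w @ exp_cov(d0)
    return est, var

# Three measurements on a 10 x 10 grid, interpolated at (3, 3).
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
vals = np.array([1.0, 3.0, 2.0])
est, var = simple_kriging(pts, vals, np.array([3.0, 3.0]), mean=2.0)
```

Evaluating this at every grid node yields both the interpolated chart and a companion chart of kriging variances.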

EXPLORATORY DATA ANALYSIS (EDA)
23 October, 2003
Assist. Prof. Andrej Blejec, PhD
National Institute of Biology, Ljubljana, Slovenia

EDA, conceived by John Tukey in the 1970s, is a set of statistical and graphical methods for displaying data. It is primarily useful when the analyst is not sure what the underlying distribution of the data is. EDA introduces relatively simple displays of the distribution that are applied to empirical data in order to provide clues about the distribution in the population. EDA has lately been gaining wide acceptance as a concise and instructive means of visualising data, particularly valuable in the initial phase of research. Basic EDA methods will be presented together with guidelines for their use.
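One classic EDA tool can serve as a small taste: Tukey's five-number summary with boxplot fences for flagging candidate outliers. The example data are made up, and the quartile convention used here is the one built into Python's standard library rather than Tukey's hinges.

```python
import statistics

def five_number_summary(data):
    # Five-number summary: minimum, lower quartile, median, upper quartile, maximum.
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4)
    return xs[0], q1, med, q3, xs[-1]

data = [2, 3, 3, 5, 7, 8, 9, 12, 13, 14, 41]
lo, q1, med, q3, hi = five_number_summary(data)

# Boxplot fences: values beyond 1.5 * IQR from the quartiles are
# flagged as candidate outliers, worth a closer look.
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
```

The same five numbers are what a box-and-whisker plot displays graphically, which is why it is such a compact first look at a distribution.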

Young researcher Maja Pohar from the IBMI, a graduate student of statistics, presented the solutions to selected exam problems from the Probability and Statistics course of Prof. Mihael Perman in two lectures (on 23 September and 9 October, 2003) at the IBMI. In her second lecture, she focused on a particularly important and interesting topic: the EM (expectation-maximization) algorithm for handling missing data. She presented the basic idea and some simple examples.
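The flavour of the algorithm can be conveyed by a tiny sketch: a two-component normal mixture, in which each point's unobserved group label plays the role of the missing data. The E step fills in the expected labels given the current means; the M step re-estimates the means using those expected labels as weights. The data, starting values and fixed unit variances are illustrative assumptions, not an example from the lecture.

```python
import math
import random

random.seed(1)
# Toy data: two groups; which group each point came from is "missing".
data = [random.gauss(0.0, 1.0) for _ in range(200)] + \
       [random.gauss(4.0, 1.0) for _ in range(200)]

def norm_pdf(x, mu):
    # Standard-normal density shifted to mean mu (unit variance assumed).
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

mu1, mu2 = -1.0, 1.0  # rough starting values
for _ in range(50):
    # E step: expected membership of each point, given the current means.
    r = [norm_pdf(x, mu1) / (norm_pdf(x, mu1) + norm_pdf(x, mu2))
         for x in data]
    # M step: update the means, weighting points by expected membership.
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r)
```

Each iteration provably does not decrease the observed-data likelihood, which is the key property of EM.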

COURSE ON THE R STATISTICAL SOFTWARE ENVIRONMENT

From 29 September to 2 October, 2003, a 15-hour course on the R statistical software environment was held at the IBMI by Andrej Blejec, PhD, from the National Institute of Biology. The free R software is particularly suitable for the development of new statistical methods and has excellent graphical capabilities, and is thus rapidly gaining popularity as an interactive tool for research and teaching. The course covered the basics of R (and S-plus): the basic data management and data analysis functions, graphical displays, programming new functions, and transferring data from other software (Excel, SPSS).

If there is sufficient interest, the course will be repeated.

METHODOLOGY AND STATISTICS CONFERENCE

From 14 to 17 September, 2003, the traditional Methodology and Statistics conference was held in Ljubljana (at the City Hotel Turist). Within the biostatistical section, Dr. Ronghui Xu from Harvard University gave an invited lecture entitled Random effects models for right-censored data. Dr. Xu's PhD thesis (at UCSD) was supervised by Prof. John O'Quigley, and she has been collaborating for some time with the Institute of Biomedical Informatics (IBMI) in Ljubljana.

BROWNIAN MOTION
13 June, 2003
Prof. John O'Quigley, PhD
University of California, San Diego, Department of Mathematics

A stochastic process is simply a collection of random variables indexed by some label, usually the label t corresponding to time. We write X(t) or Xt. One of the most commonly used stochastic processes is known as Brownian motion, in which X(t) has a normal distribution. The importance of Brownian motion in the context of stochastic processes is analogous to that of the usual normal distribution for simple random variables. Furthermore, there is an analogous central limit theorem known as the functional central limit theorem. This theorem, together with some other basic theorems of probability, such as Slutsky's theorem, enables us to construct a framework for inference in many settings in which we have some continuous variable like t and our models focus on conditional distributions given t.

In this seminar we will review the basics of Brownian motion and related processes such as Brownian motion with drift, integrated Brownian motion, the Brownian bridge and the Ornstein-Uhlenbeck process. We will discuss the basic results and how they can be used in practice. The main application of these processes is to empirical processes, which we discuss, in particular the empirical cumulative distribution function. The results can then be extended in order to derive inferential results for almost any parameter estimated from data X1 to Xn.
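As a minimal sketch, some of the processes mentioned above can be simulated by summing independent normal increments on a time grid; the grid size and the drift and volatility parameters below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, T = 1000, 1.0
t = np.linspace(0.0, T, n + 1)

# Standard Brownian motion: cumulative sum of independent N(0, T/n)
# increments, so that X(t) ~ N(0, t).
dX = rng.normal(0.0, np.sqrt(T / n), size=n)
X = np.concatenate([[0.0], np.cumsum(dX)])

# Brownian bridge: condition the path to return to 0 at time T.
B = X - (t / T) * X[-1]

# Brownian motion with drift mu and volatility sigma.
mu, sigma = 0.5, 1.0
Y = mu * t + sigma * X
```

The bridge construction is the one underlying the limiting behaviour of the empirical cumulative distribution function, which is why these simulated paths are a useful mental picture for the seminar's applications.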

RELATIVE SURVIVAL
22 May, 2003
Prof. Janez Stare, PhD
University of Ljubljana, Faculty of Medicine, Institute of Biomedical Informatics

Relative survival is the ratio of the observed survival of the group under investigation to the survival that would be expected if the group's mortality did not differ from that of the general population. Since we are usually interested in the survival of groups at greater risk than the general population, the value of relative survival is usually less than 1. To calculate expected survival, population mortality tables are required; these are usually produced for each calendar year, broken down by gender and one-year age group. In Slovenia, population mortality tables are produced by the Statistical Office of the Republic of Slovenia, on request even per quarter-year interval.
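A minimal numerical sketch of the computation: expected survival is the cumulative product of the matching population survival probabilities, and relative survival is the year-by-year ratio of observed to expected. The life-table probabilities and observed survival values below are invented for illustration, not real Slovenian data.

```python
# Hypothetical population life table: annual probability of dying at each
# age (illustrative numbers only, not actual life-table values).
q_pop = {70: 0.030, 71: 0.033, 72: 0.036, 73: 0.040}

# Expected survival of a patient aged 70 at diagnosis: cumulative product
# of the matching population survival probabilities, year by year.
expected = [1.0]
for age in sorted(q_pop):
    expected.append(expected[-1] * (1.0 - q_pop[age]))

# Observed survival of the patient group (hypothetical values, as would
# come from a Kaplan-Meier estimate), at years 0 through 4.
observed = [1.00, 0.85, 0.74, 0.66, 0.60]

# Relative survival: the ratio at each follow-up time.
relative = [o / e for o, e in zip(observed, expected)]
```

In a real analysis the expected curve is built per patient, matching each one's gender, age and calendar year in the life table, and then averaged over the group.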

Relative survival tells us what fraction of mortality within the group under observation can be attributed to the disease (or other selected factor). It is a particularly interesting type of information, which is very hard to obtain by means of studying cause-specific mortality, since cause of death is usually not recorded precisely or is sometimes even not recorded at all.

In the introductory part of the lecture I will describe the method for calculating expected survival, and then I will focus on regression methods for relative survival. I will explain in some detail the most frequently cited method of Hakulinen and Tenkanen (1987), as well as the less widely used method of Andersen et al. (1985). I will conclude with the method developed at our institute, and I will illustrate all the methods with actual data from a study of survival after myocardial infarction. We shall see that in this case, the usual survival analysis and the analysis of relative survival yield completely opposite results regarding the influence of age and gender on (absolute or relative) survival. It will probably come as a surprise to the audience that all those results are correct (!?).

The lecture is primarily intended for statisticians, but the topic is not very demanding. One of the aims of my lecture is to draw attention to the possibilities for analysing relative survival in Slovenia. Hence, I shall try to make the lecture interesting for a wide non-expert audience. 


WARNING: You are viewing the old IBMI site

The content on this site is outdated and is no longer being updated. The old site contains certain articles and materials that still need to remain accessible.

For new, up-to-date content, please visit http://ibmi.mf.uni-lj.si/