Events in 2004

R in EXCEL, EXCEL IN R ... AND MORE
17 December, 2004
Jaro Lajovic, MD
independent consultant

R is already considered by many to be the best statistical package available, while Excel is probably the most popular spreadsheet program in use today. The capabilities of the two overlap in some respects and differ in others but, most importantly, the two applications can be made to complement each other effectively. How, then, can such a synergy be achieved to get "the best of both worlds"?

The lecture will be devoted to answering this question.

The interconnection of R with other programs under the Microsoft® Windows® platform will be discussed, with emphasis on the Office suite and particularly on connectivity with Excel. The presentation will be practically oriented, covering the required (freely available) software components, their download sites and installation; a brief description of which application acts as the client and which as the server; and the different ways of using R from Excel (scratchpad mode, worksheet functions, macro mode), concluding with some practical examples.
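
In the "Excel in R" direction, a minimal sketch of reading a worksheet into R is given below; it assumes a Windows machine with the RODBC package installed, and the file name and sheet name are hypothetical:

    library(RODBC)                                 # ODBC interface for R (Windows)
    ch  <- odbcConnectExcel("c:/data/trial.xls")   # open the workbook via the Excel ODBC driver
    dat <- sqlFetch(ch, "Sheet1")                  # read the whole worksheet into a data frame
    odbcClose(ch)                                  # release the connection
    summary(dat)                                   # the data are now ordinary R objects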

Although the topic is of potential interest to both beginners and advanced R users, no special expertise beyond the basics of R and Excel is required to follow the lecture.

Handouts: (in Slovenian)

AN OVERVIEW OF SURVEY METHODOLOGY AND QUALITY CONTROL IN THE NATIONAL OFFICIAL STATISTICS SYSTEM
23 November, 2004
Metka Zaletel
Statistical Office of the Republic of Slovenia

As the principal co-ordinator of the national official statistics system, the Statistical Office of the Republic of Slovenia faces daily both the advantages and the limitations of two partially overlapping systems: the European official statistics system and the Slovenian governmental system. At the same time, its interactions with companies, households and farms are based on the same principles as those of marketing companies. Hence, the development of survey methodology and total quality management are of crucial importance.

In the first part of the lecture, fundamentals of survey methodology regarding households and individuals (sampling methods, approaching surveyed units, form processing) will be presented together with corresponding statistical methodology (with emphasis on automated data processing). In the second part, the standard definition of quality of services, processes and products will be presented, as well as other mechanisms for monitoring quality in an institution such as a national statistical office.

Slides: PDF (in Slovenian)

Photos: 1, 2, 3, 4

NETWORKS AND DATA ANALYSIS
22 October, 2004
Prof. Vladimir Batagelj, PhD
University of Ljubljana, Faculty of Mathematics and Physics; Institute of Mathematics, Physics and Mechanics

Let U be a set of units in which a distance d(X,Y) is defined for each pair of units X and Y. Two types of networks can then be created on the basis of the set U; in both cases, the weight of a link is the distance between its two end-points.

For smaller sets (up to a few hundred units), both networks can be generated with relatively short procedures in R. Applying network analysis methods to these networks can provide new and interesting insight into the sets of units.
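
A minimal base-R sketch of one such procedure is given below; it assumes the common construction in which every pair of units closer than a chosen threshold r is linked (the data and the threshold are arbitrary illustrative choices):

    x <- as.matrix(USArrests)                    # example units: 50 states, 4 variables
    d <- as.matrix(dist(x))                      # Euclidean distances between all pairs
    r <- 30                                      # hypothetical threshold
    idx <- which(d < r & upper.tri(d), arr.ind = TRUE)     # indices of linked pairs
    edges <- data.frame(from   = rownames(d)[idx[, 1]],
                        to     = rownames(d)[idx[, 2]],
                        weight = d[idx])                   # link weight = distance
    head(edges)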

For larger sets (thousands of units), the procedures for creating the two networks become too slow, and no more efficient general procedures are known yet. However, an efficient partial solution does exist.

Suppose that in the given set of units there is an additional relational constraint (e.g., geographic adjacency, friendship, business contacts etc.); then the set can be extended to form a network in which the weight of a link is the distance between its two end-points. This procedure can also be easily implemented in R.
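
Continuing the sketch above, a hypothetical relation (here a hand-made adjacency list) can be turned into such a network by attaching the distances as link weights:

    d   <- as.matrix(dist(as.matrix(USArrests)))        # distances, as above
    rel <- data.frame(from = c("Alabama", "Georgia"),   # hypothetical relational constraint,
                      to   = c("Georgia", "Florida"))   # e.g. geographic adjacency
    rel$weight <- d[cbind(match(rel$from, rownames(d)),
                          match(rel$to,   rownames(d)))]  # link weight = distance
    rel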

The use of the described networks will be demonstrated on selected datasets. For network analysis, the Pajek software will be used.

Slides: PDF (in Slovenian)

SOME UNDER-USED, BUT SIMPLE AND USEFUL, DATA ANALYSIS TECHNIQUES
30 September, 2004
Prof. K. Larry Weldon, PhD
Simon Fraser University, Vancouver, Canada, Department of Statistics and Actuarial Science

Resampling and graphical methods have experienced remarkable growth in both the theory and practice of statistics, largely as a result of advances in computer speed and software ease-of-use. However, early undergraduate courses seldom incorporate these techniques, even though simple versions of them require very little prerequisite knowledge. In this lecture I will describe some simple approaches to kernel estimation of densities and nonparametric smoothing, multivariate data display, and the use of the bootstrap as a general-purpose inference procedure. It is suggested that these techniques may have more utility for students than some of the inference techniques more commonly taught.
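
A hedged taste of the techniques in base R (the simulated data, the smoother and the bootstrapped statistic are arbitrary choices, not those from the lecture):

    set.seed(1)
    x <- rnorm(100, mean = 10, sd = 2)           # simulated sample
    plot(density(x))                             # kernel estimate of the density

    y <- 2 * x + rnorm(100, sd = 3)              # noisy linear relationship
    plot(x, y); lines(lowess(x, y))              # nonparametric (lowess) smooth

    # bootstrap: resample with replacement to gauge the variability of the median
    boot.med <- replicate(1000, median(sample(x, replace = TRUE)))
    quantile(boot.med, c(0.025, 0.975))          # simple percentile confidence interval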

The lecture was organised by Dr. Andrej Blejec and held at the National Institute of Biology in Ljubljana.

APPLIED STATISTICS 2004 CONFERENCE

From 19 to 22 September, 2004, the Applied Statistics international conference was held in Ljubljana (in the City Hotel Turist). Once more, we had the opportunity to listen to excellent invited speakers and participate in an interesting biostatistical section. In addition, a workshop on data visualisation was held by Prof. Frank E. Harrell.

POPULATION-BASED CANCER SURVIVAL ANALYSIS
22 June, 2004
Prof. Paul W. Dickman, PhD
Karolinska Institutet, Stockholm, Department of Medical Epidemiology and Biostatistics

Relative survival is commonly used as a measure of patient survival when working with data from population-based cancer registries. The relative survival ratio is estimated from life tables as the ratio of the observed survival of the patients (where all deaths are considered events) to the expected survival of a comparable group from the general population. I will introduce the field of population-based cancer survival analysis and the concept of relative survival, and describe how relative survival can be estimated using period life tables. Hermann Brenner's suggestion, first published in 1996, to estimate patient survival using period rather than cohort methods was initially met with scepticism, but recent analyses based on historical data have demonstrated that period estimation provides better predictions of survival for recently diagnosed patients, and detects temporal trends in patient survival earlier, than cohort-based analysis.
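
As a minimal numerical sketch of the concept in R (the interval-specific survival proportions below are hypothetical, not taken from any registry):

    p.obs <- c(0.90, 0.85, 0.88, 0.90, 0.92)   # observed annual survival of the patients
    p.exp <- c(0.98, 0.97, 0.97, 0.96, 0.96)   # expected survival of a comparable group
    cumprod(p.obs) / cumprod(p.exp)            # cumulative relative survival ratio, years 1-5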

Prof. Dickman's website (publications, teaching materials): PaulDickman.com

COURSE ON STATISTICAL ANALYSIS WITH EXCEL

From 2 to 9 June, 2004, a 15-hour course on the use of Microsoft® Excel in statistics was held at the IBMI by Gaj Vidmar, MSc, of the IBMI.

Excel is by far the most widespread and one of the most powerful spreadsheet programs, yet it is not very popular with statisticians. This is partly justified, but partly also based on prejudice and a lack of familiarity, particularly with the 2002 and 2003 versions. It is an indisputable fact that spreadsheet software is a natural means of organising, displaying and analysing large amounts of data. Add to this the fact that the vast majority of personal computers purchased by both the public and the private sector ship with the Microsoft® Windows® platform and Microsoft® Office suite pre-installed, and there are reasons enough to learn how to analyse data with Excel.

A skilful user can successfully employ Excel for a wide range of statistical tasks.

By focusing on Excel, the majority of occasional users of statistical packages can avoid having to learn complex procedures from scratch, not to mention the money saved.

PRINCIPLES OF DATA VISUALISATION
28 May, 2004
Gaj Vidmar, MSc
University of Ljubljana, Faculty of Medicine, Institute of Biomedical Informatics

People tend to be convinced that they are good (if not the best) at certain activities, such as driving a car, even though this may be far from true. The same goes for graph drawing: contrary to widespread opinion, only systematic and thorough study of the subject, supported by some talent, can lead to true competence.

When visualising data, one faces many issues, ranging from the fundamental and general ones (data type, aim of visualisation, intended message, target audience etc.) to the most complex ones, requiring wide inter-disciplinary knowledge (e.g., selection from the huge number of techniques for visualising multivariate data, development and visualisation of statistical models, characteristics of human perception and cognition involved in interpreting charts).

The lecture will try to present the basic principles and methods of data visualisation through the work of the pioneers (Playfair, Minard) and the most important researchers in the field (Tukey, Bertin, Cleveland, Tufte, Friendly, Wilkinson). Along the way, we shall examine some of the famous successes and notorious failures of visualising data from different eras, from Napoleon to the space shuttle. Finally, we shall stress the importance and role of modern computers in terms of dynamic and interactive data visualisation.

MARKOV CHAINS
16 April, 2004
Prof. Mihael Perman, PhD

Markov chains are used as models in genetics, statistical physics, operational research and elsewhere, but have also become a tool for parameter estimation in statistics.

In this lecture I will first present the basic ideas of Markov chains, illustrated by some examples. Asymptotic behaviour of Markov chains and the rate of convergence towards the limiting distribution are often of interest in applications. Using examples I will explain the notion of the stationary distribution, and then look at what can be deduced from knowing the limiting distributions.
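
A minimal sketch in R of a two-state chain (the transition probabilities are an arbitrary illustration) shows two routes to the stationary distribution:

    P <- matrix(c(0.9, 0.1,                 # transition matrix: rows sum to one
                  0.4, 0.6), nrow = 2, byrow = TRUE)
    Pn <- diag(2)                           # identity matrix = P^0
    for (i in 1:50) Pn <- Pn %*% P          # P^50: convergence to the limiting distribution
    round(Pn, 4)                            # each row approaches (0.8, 0.2)

    e  <- eigen(t(P))                       # stationary distribution = left eigenvector for eigenvalue 1
    st <- Re(e$vectors[, 1])
    st / sum(st)                            # normalised: 0.8, 0.2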

To conclude, I will briefly mention Markov chains in continuous time.

SIMULATION MODEL OF RUBELLA - THE EFFECTS OF VACCINATION STRATEGIES
19 March, 2004
Assist. Prof. Anamarija Jazbec, PhD
University of Zagreb, Faculty of Forestry

This work deals with the construction of a deterministic multi-stage simulation SIR (Susceptible-Infectious-Removed) model with age and sex structure, based on the natural history of rubella. The dynamics of rubella are described by a non-linear system of differential equations. The model was validated by simulating the course of rubella in Tresnjevka, a part of Zagreb, from 1961 to 1991. For these simulations, the epidemiological parameters commonly accepted for rubella were used (the epidemiological classes, the coefficients of transfer among the classes, and the durations of stay in the classes); only the force of infection, which was age- and sex-dependent, was estimated by iterative simulation until the average observed incidence in non-epidemic and epidemic years was reproduced within computational error. The effects of three different vaccination strategies were simulated, assuming vaccination over 60 years with an immunisation coverage of 75% and a vaccine effectiveness of 95%: the natural flow of rubella was simulated for the first five years, and the different vaccination strategies in years six to sixty.
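
A minimal sketch of the underlying SIR dynamics in R, without the age and sex structure of the actual model and with purely hypothetical parameter values, integrated with simple Euler steps:

    N <- 10000; beta <- 0.0001; gamma <- 1/14   # hypothetical contact and recovery rates
    S <- N - 10; I <- 10; R <- 0                # initial state: 10 infectious cases
    dt <- 1                                     # time step: one day
    for (t in 1:365) {                          # one year of simulated transmission
      new.inf <- beta * S * I * dt              # force of infection acting on the susceptibles
      new.rec <- gamma * I * dt                 # transfer out of the infectious class
      S <- S - new.inf
      I <- I + new.inf - new.rec
      R <- R + new.rec
    }
    round(c(S = S, I = I, R = R))               # final class sizes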

INTRODUCTION TO DATA MINING AND KNOWLEDGE DISCOVERY IN DATABASES
27 February, 2004
Prof. Nada Lavrac, PhD
Jozef Stefan Institute, Department of Knowledge Technologies

In every field of human activity, the gap between the amount of electronically stored data and the amount of data humans can process is growing ever wider. Traditional statistical analysis is focused on testing given hypotheses, while modern knowledge technologies enable the user to automatically extract new hypotheses, possible interpretations and potential knowledge from the data. The methods of data mining and knowledge discovery in databases will be illustrated with various medical data (diagnostics, prognostics, risk-group identification) and datasets from the Celje Health Care Centre.
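
As a small illustration of one standard data-mining method, the sketch below grows a classification tree in R; it uses the kyphosis data shipped with the rpart package, not the datasets from the lecture:

    library(rpart)                                   # recursive partitioning (classification trees)
    fit <- rpart(Kyphosis ~ Age + Number + Start,    # predict post-operative kyphosis
                 data = kyphosis)
    print(fit)                                       # the induced decision rules
    plot(fit); text(fit)                             # the tree as a diagram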

PRINCIPAL COMPONENT ANALYSIS AND FACTOR ANALYSIS - POSSIBILITIES FOR APPLICATIONS IN MEDICINE
23 January, 2004
Gregor Socan, PhD
University of Ljubljana, Faculty of Arts, Department of Psychology

Principal component analysis (PCA) and exploratory factor analysis (EFA) are popular multivariate methods for structuring the relationships between variables. PCA provides weighted sums of the variables which explain as much of the variance of the original variables as possible (i.e., a kind of "summary" of the observed variables), while EFA searches for latent variables (called factors) which explain the inter-dependence between the observed variables (factors can thus be conceptualised as potential "hidden" or unmeasurable causes of the observed correlations). The general availability of personal computers has made both methods accessible to a broad range of researchers, which also often leads to incorrect use. The lecture will focus on several critical aspects of using PCA and EFA.

The overview of these issues will be supplemented by contemplating whether the criticism of EFA's place in scientific research is justified or not. The lecture will conclude with a discussion on the usefulness of the two methods in the field of biomedicine, aided by a small bibliometric survey (contributed by Gaj Vidmar from the IBMI).
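
A minimal base-R sketch contrasting the two methods (the USArrests data and the number of factors are arbitrary illustrative choices):

    pc <- prcomp(USArrests, scale. = TRUE)   # PCA on standardised variables
    summary(pc)                              # variance explained by each component

    fa <- factanal(USArrests, factors = 1)   # maximum-likelihood exploratory factor analysis
    print(fa$loadings)                       # loadings of the single latent factor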

Slides: PDF (in Slovenian)
Survey: PDF (in Slovenian) 
