Step 2: Importing Data

Before loading data, don’t forget to specify the number of stimuli (or conditions) in the upper left corner of the ILLMO interface. Then click File/Open [choose your kind of data] to import your data. If your file is not in one of the listed formats, you may have to process your data first.

The ILLMO download directory contains a number of example data files, some of which are used in the book “Insights in Experimental Data”. In the example on this website we use the CSV file ‘data40.csv’, which contains (simulated) data from a comparison of the ease of use of the Mac and Windows operating systems by both novice and experienced users. In the data file, the column named ‘Between’ contains the tested system (1 or 2), while the column ‘Score’ contains the (simulated) outcomes of a questionnaire. The higher the score, the more the participant likes the operating system. The column ‘Time’ specifies whether the measurement was performed with novices (1) or experienced users (2). This data set was used as an example in the paper by Kaptein and Robertson, Rethinking Statistical Analysis Methods for CHI, Proceedings of CHI’2012, 1105-1114.

Select “File/Open CSV file with attributes (field separator: semicolon)” and identify the file “data40.csv” in the file manager. In the dialog window that appears, select the column ‘Between’ as the one that contains the number of the condition and select the column ‘Score’ as the one with dependent values. We can extract the data for novices only by specifying the column ‘Time’ as the one with the selection variable and setting 1 as the value to select (changing this to 2 would select the data for the experienced users). Note that the text window on the left of the dialog window provides a more detailed description of how to use the interface.


Dialog window for specifying how to interpret the data in the CSV file data40.csv
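
For readers who want to check outside ILLMO what this dialog selects, the same selection can be sketched in a few lines of Python. This is only an illustration, not ILLMO's own code: the function name `select_scores` is hypothetical, and the rows in `SAMPLE` are made up to mimic the column layout described above (they are not the values in data40.csv).

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; NOT the contents of data40.csv.
# Only the column names and the semicolon separator follow the text.
SAMPLE = """Between;Time;Score
1;1;4.0
1;2;6.5
2;1;5.5
2;1;6.0
1;1;3.5
2;2;7.0
"""

def select_scores(text, condition_col="Between", value_col="Score",
                  select_col="Time", select_value="1"):
    """Group the dependent values by condition, keeping only selected rows."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for row in reader:
        # Skip rows whose selection variable does not match (e.g. Time != 1).
        if row[select_col] != select_value:
            continue
        groups[int(row[condition_col])].append(float(row[value_col]))
    return dict(groups)

novices = select_scores(SAMPLE)                        # Time == 1
experienced = select_scores(SAMPLE, select_value="2")  # Time == 2
print(novices)
print(experienced)
```

To apply the sketch to a real file, the text passed to `select_scores` could be read with `open("data40.csv").read()`, assuming the file uses the column names given above.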

Once the data has been read, ILLMO will ask the user whether the data should be interpreted as discrete (integer numbers, also called categorical data) or continuous (real numbers). If the data is continuous, click ‘No’. If the data is discrete, click ‘Yes’. This is only an initial choice, as ILLMO allows the user to switch between a discrete and a continuous interpretation of the data at any time.

Discrete data arises, for instance, when collecting responses on a Likert scale. Note that discrete data can always be processed as continuous data, but the conversion from continuous to discrete requires a quantization process that approximates the data and is hence not reversible.
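
The asymmetry between the two conversions can be illustrated with a small Python sketch. The rounding rule and the 1 to 7 range used here are assumptions chosen for the example; the text does not say how ILLMO itself quantizes data.

```python
def quantize(values, low=1, high=7):
    """Round each value to the nearest integer category and clip to [low, high].

    The 1..7 range is an assumed Likert-style scale, used only for illustration.
    """
    return [min(high, max(low, int(round(v)))) for v in values]

continuous = [2.2, 2.4, 5.6, 6.9]   # made-up continuous scores
discrete = quantize(continuous)      # continuous -> discrete loses information
as_continuous = [float(v) for v in discrete]  # discrete -> continuous is lossless

print(discrete)
print(as_continuous)
```

The values 2.2 and 2.4 end up in the same category, so the original scores cannot be recovered from the discrete data, whereas treating the integers as real numbers changes nothing.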

Once the data is loaded (as continuous), the main screen shows both data and model characteristics. The default setting of the ILLMO program is to approximate the observed histograms in the different conditions by Gaussian (normal) distributions with different averages but a constant standard deviation.

ILLMO interface after reading the file data40.csv as continuous data
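
To give a feel for what “different averages but a constant standard deviation” means, the following Python sketch computes one mean per condition and a single shared standard deviation. It uses the textbook maximum-likelihood estimate for equal-variance Gaussians, which is an assumption made for illustration; ILLMO determines its parameters by minimizing its own lack-of-fit criterion, and its results need not coincide with this estimate. The function name and the data are hypothetical.

```python
import math

def fit_equal_variance_gaussians(groups):
    """Return per-condition means and one shared standard deviation.

    Textbook maximum-likelihood estimate; not necessarily what ILLMO computes.
    """
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    n_total = sum(len(v) for v in groups.values())
    # Pool the squared deviations from each condition's own mean.
    ss = sum((x - means[c]) ** 2 for c, v in groups.items() for x in v)
    return means, math.sqrt(ss / n_total)

groups = {1: [3.0, 4.0, 5.0], 2: [5.0, 6.0, 7.0]}  # made-up scores per condition
means, sd = fit_equal_variance_gaussians(groups)
print(means, sd)
```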

The bottom-right graph displays the observed cumulative histograms (the stepwise curves in black and red) together with the estimated cumulative distributions (the smooth curves in blue and green).

Observed cumulative histograms (stepwise curves) being approximated by cumulative distributions (continuous curves)
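
The relation between a stepwise and a smooth curve can be reproduced with a minimal Python sketch that evaluates an empirical cumulative histogram and a Gaussian cumulative distribution at the same points. The sample values are made up, and the Gaussian parameters are simply the sample mean and standard deviation, which is an assumption and not necessarily the fit that ILLMO reports.

```python
import math

def empirical_cdf(sample, x):
    """Fraction of observations less than or equal to x (the stepwise curve)."""
    return sum(1 for v in sample if v <= x) / len(sample)

def gaussian_cdf(x, mean, sd):
    """Cumulative normal distribution (the smooth curve)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

sample = [3.0, 4.0, 4.5, 5.0, 6.0]  # made-up scores for one condition
mean = sum(sample) / len(sample)
sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / len(sample))

# Compare the two curves at a few points along the score axis.
for x in (3.0, 4.5, 6.0):
    print(x, empirical_cdf(sample, x), round(gaussian_cdf(x, mean, sd), 3))
```

Large gaps between the two columns at many points would correspond to a visibly poor match between the stepwise and smooth curves in the graph.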

The bottom-left graph displays the regular Gaussian distributions that correspond to the cumulative distributions in the graph described above.

Regular distributions used to model individual observations in both conditions for the Score data in data40.csv

By varying the number of the selected condition in the upper-right corner of the interface, one can specify a single condition that is rendered in different colors (red and green) from the other conditions (which are rendered in black and blue). The value next to LLC expresses the lack-of-fit between the observed histograms and the modeled distributions (summed over all conditions). The optimal model parameters are determined by minimizing this LLC, which ILLMO has done automatically after reading in the data.

It is important that the user can inspect how well the observed histograms are approximated by the modeled distributions, as the follow-up steps in the statistical analysis use the distributions rather than the histograms to draw conclusions. These follow-up steps may hence be unreliable if the approximation is obviously flawed.

Next step: Model Optimization in Illmo
