A basic requirement for test and survey items is that they are able to detect variance with respect to a latent variable. To do this, an item scale must discriminate between test subjects and must have a systematic, clear, and sufficiently strong relationship with the underlying construct.

One way to examine the variability of an item is to compute its relative information content. The relative information content (also called relative entropy) is a dispersion measure for variables that are at least nominally scaled, but it can also be computed on higher scale levels.

Mathematically, relative entropy can be expressed as follows:

**H = -1/ln(k) · Σ p_j · ln(p_j)**  (sum over j = 1, …, k)

where **k** represents the number of response categories of the investigated item and **p_j** represents the relative frequency in category *j*. The more evenly the frequencies are distributed across the categories, the larger the value. However, the relative entropy also depends on the number of categories, so it cannot be interpreted independently of that number. If all response categories occur equally frequently, the relative entropy is at its maximum of **1 (100 %)**. If only one category occurs (i.e. the frequencies of all other categories are 0), it is at its minimum of **0 (0 %)**.
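To make the formula concrete, here is a minimal sketch that evaluates it by hand for a hypothetical four-category item (the proportions are made-up values, not data from this post):

```r
# Relative entropy for an assumed item with k = 4 categories
p.even <- c(0.25, 0.25, 0.25, 0.25)   # perfectly even distribution
p.skew <- c(0.70, 0.10, 0.10, 0.10)   # strongly skewed distribution

# H = -1/ln(k) * sum(p_j * ln(p_j))
rel.ent <- function(p, k) -sum(p * log(p)) / log(k)

rel.ent(p.even, 4)   # maximum: 1 (100 %)
rel.ent(p.skew, 4)   # clearly below 1
```

As expected, the even distribution reaches the maximum of 1, while the skewed distribution falls well below it.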

Unfortunately, there is no function implemented in base R to compute the relative information content of an item, and I am not aware of a package that includes such a function. For this reason I wrote my own R function:

```r
Rel.Entropy <- function(x, cat, na.rm = TRUE) {

  # Frequency table, with or without NA counts
  if (na.rm == TRUE) {
    tbl <- table(x, useNA = "no")
  } else {
    tbl <- table(x, useNA = "always")
  }

  # Relative frequency of each observed category
  prp <- tbl / sum(tbl)

  # Relative entropy: -1/ln(cat) * sum(p * ln(p))
  eq1 <- -1 / log(cat)
  eq2 <- sum(prp * log(prp))
  eq3 <- eq1 * eq2
  res <- paste(round(eq3 * 100, 2), "%", sep = "")

  # Summary table with counts and percentages
  sum.tbl <- data.frame(tbl)
  sum.prp <- data.frame(
    "pcnt" = paste(round(100 * prp, digits = 2), "%", sep = ""))

  sum.all <- data.frame(sum.tbl, sum.prp)

  sum.all <- data.frame(
    "Cat"  = sum.all$x,
    "Freq" = sum.all$Freq,
    "Prop" = sum.all$pcnt)

  return(list(Table = sum.all, Entropy = res))
}
```

* *Rel.Entropy*: computes the relative entropy of an item.

* *x*: one or more objects which can be interpreted as factors (including character strings), or a list or data frame whose components can be so interpreted.

* *cat*: number of response categories (required; the relative entropy level depends on the number of response categories).

* *na.rm*: controls whether the frequency table includes counts of *NA* values: **na.rm = TRUE** removes NAs, **na.rm = FALSE** keeps NAs in the dataset.
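The *na.rm* switch boils down to the *useNA* argument of *table()*. A quick sketch of the difference, using a small made-up vector:

```r
x <- c(1, 2, 2, NA)

table(x, useNA = "no")      # NA dropped: categories 1 and 2 only
table(x, useNA = "always")  # NA counted as its own category
```

Note that with *na.rm = FALSE* on data that contains no NAs, the NA category gets a count of 0, and 0 * log(0) evaluates to NaN in R, which propagates into the entropy sum. For complete data, *na.rm = TRUE* (the default) is therefore the safer choice.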

O.k., let's see how it works. First we create three example datasets based on a Likert scale with 10 response categories.

```r
set.seed(3436)
Ex.1 <- round(runif(1000, min = 1, max = 10))
Ex.2 <- round(runif(1000, min = 3, max = 7))
Ex.3 <- round(runif(1000, min = 4, max = 6))
```

To visualize the distributions of the example data we can plot histograms:

```r
library(ggplot2)
library(ggpubr)

Fig.1 <- qplot(Ex.1,
               geom = "histogram",
               binwidth = 1,
               xlim = c(0, 11),
               col = I("black"),
               fill = I("grey"),
               xlab = "Example 1",
               ylab = "Count") +
  theme_classic()

Fig.2 <- qplot(Ex.2,
               geom = "histogram",
               binwidth = 1,
               xlim = c(0, 11),
               col = I("black"),
               fill = I("grey"),
               xlab = "Example 2",
               ylab = "Count") +
  theme_classic()

Fig.3 <- qplot(Ex.3,
               geom = "histogram",
               binwidth = 1,
               xlim = c(0, 11),
               col = I("black"),
               fill = I("grey"),
               xlab = "Example 3",
               ylab = "Count") +
  theme_classic()

ggarrange(Fig.1, Fig.2, Fig.3,
          labels = c("A", "B", "C"),
          ncol = 3,
          nrow = 1)
```

As we can see, the level of dispersion varies from high (Example 1) to low (Example 3).

Now let's call the *Rel.Entropy* function from above:

```r
Rel.Entropy(Ex.1, cat = 10, na.rm = TRUE)
Rel.Entropy(Ex.2, cat = 10, na.rm = TRUE)
Rel.Entropy(Ex.3, cat = 10, na.rm = TRUE)
```

When running these lines we get the following output:

```r
# Example 1
> Rel.Entropy(Ex.1, cat = 10, na.rm = TRUE)
$Table
   Cat Freq  Prop
1    1   48  4.8%
2    2   93  9.3%
3    3  109 10.9%
4    4  109 10.9%
5    5  102 10.2%
6    6  129 12.9%
7    7  107 10.7%
8    8  119 11.9%
9    9  121 12.1%
10  10   63  6.3%

$Entropy
[1] "98.54%"

# Example 2
> Rel.Entropy(Ex.2, cat = 10, na.rm = TRUE)
$Table
  Cat Freq  Prop
1   3  141 14.1%
2   4  237 23.7%
3   5  266 26.6%
4   6  231 23.1%
5   7  125 12.5%

$Entropy
[1] "68.1%"

# Example 3
> Rel.Entropy(Ex.3, cat = 10, na.rm = TRUE)
$Table
  Cat Freq  Prop
1   4  252 25.2%
2   5  500   50%
3   6  248 24.8%

$Entropy
[1] "45.15%"
```

The function returns a summary table with category, frequency, and proportion columns, followed by the relative entropy value as a percentage.

As we can see, the relative information content decreases as the variability in the example data decreases (*Ex.1 = 98.54 %* down to *Ex.3 = 45.15 %*).
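This also illustrates the earlier caveat that relative entropy cannot be interpreted independently of the number of categories: scoring the Ex.2 frequencies against the 5 categories actually used, instead of the 10 offered, gives a much higher value. A sketch using the proportions from the Ex.2 output above:

```r
# Observed proportions of Ex.2 (from the summary table above)
p <- c(141, 237, 266, 231, 125) / 1000

# Unnormalized Shannon entropy: -sum(p * ln(p))
H.raw <- -sum(p * log(p))

round(H.raw / log(10), 4)   # relative to the 10 offered categories: 0.681
round(H.raw / log(5), 4)    # relative to the 5 used categories: 0.9743
```

The same response distribution looks only moderately dispersed over a 10-point scale but almost maximally dispersed over the 5 categories it actually occupies.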

The R scripts can be downloaded at: https://osf.io/ky3f2/