Jamie Hale

Sunday, October 10, 2021

Measures in Science

An instrument can provide accuracy and precision but lack value if the measurement is not valid. When determining the validity of a measurement, one must ask: does the measurement really measure the concept in question?

The key aspects concerning the quality of scientific measures are reliability and validity (Hale, 2011). Reliability is a measure of the internal consistency and stability of a measuring device. Validity gives us an indication of whether the measuring device measures what it claims to.

Internal consistency is the degree to which the items or questions on a measure consistently assess the same construct; on an internally consistent measure, the items are positively correlated with each other. Internal consistency is particularly important for self-report measures, where each question should be aimed at measuring the same thing; it is less important for performance-based measures, tests, or surveys. Stability is often measured by test-retest reliability: the same person takes the same test twice, and the scores from the two administrations are compared. Interrater reliability is also sometimes used in assessing reliability. With interrater reliability, different judges or raters (two or more) make observations, record their findings, and then compare their observations. If the raters are reliable, the percentage of agreement should be high.
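The two reliability checks just described can be sketched in a few lines of code. This is only an illustration: the scores, raters, and sample sizes below are invented, not taken from any real measure.

```python
# A minimal sketch of two reliability checks; all scores are invented.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Test-retest reliability: the same six people take the same test twice;
# a high correlation between the two administrations indicates stability.
test1 = [12, 15, 9, 20, 17, 11]
test2 = [13, 14, 10, 19, 18, 12]
retest_r = pearson_r(test1, test2)

# Interrater reliability: two raters code the same five observations;
# reliability is summarized as the percentage of agreement.
rater_a = ["yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "no", "no", "yes", "no"]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"test-retest r = {retest_r:.2f}, agreement = {agreement:.0%}")
```

With the made-up scores above, the retest correlation is high and the raters agree on four of five observations, which is the pattern a reliable measure should show.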

When asking if a measure is valid, we are asking if it measures what it is supposed to. Validity is a judgment based on collected data; it is not a statistical test. Two primary ways to determine validity are existing measures and known group differences.

The existing-measures test determines whether the new measure correlates with existing, relevant, valid measures. The new measure should produce results similar to those recorded with already-established valid measuring devices. The known-group-differences test determines whether the new measure distinguishes between groups known to differ: different groups are given the same measure and are expected to score differently. For example, if you were to give Democrats and Republicans a test assessing the strength of certain political views, you would expect them to score differently. Various sub-categories of validity (external, internal, statistical, and construct) are also important in some contexts. Validity rating is not highly objective; in fact, it is relatively subjective in some areas. No measure has perfect validity.
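Both validity checks can be sketched the same way. The scales, scores, and groups below are hypothetical stand-ins for the kinds of data described above.

```python
# Hypothetical data illustrating the two validity checks above.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Existing-measures check: six people score on a hypothetical new scale
# and on an already-established, valid scale of the same construct.
# A strong positive correlation supports the new measure's validity.
new_scale = [30, 42, 25, 50, 38, 33]
established_scale = [28, 45, 27, 48, 40, 30]
convergent_r = pearson_r(new_scale, established_scale)

# Known-group-differences check: two groups expected to differ
# (e.g. on strength of certain political views) should produce
# clearly different average scores on the new measure.
group_a = [8, 9, 7, 9, 8]
group_b = [3, 2, 4, 3, 2]
gap = mean(group_a) - mean(group_b)

print(f"convergent r = {convergent_r:.2f}, group gap = {gap:.1f}")
```

A high convergent correlation and a clear gap between the known groups are the two patterns that would count as evidence for the new measure's validity.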

It is possible to have a reliable but not valid measure. However, a valid measure is always a reliable measure.

Often, when unsystematic (non-scientific) approaches to knowledge are used, measures are neither reliable nor valid. That is, they do not measure the trait or characteristic of interest consistently, nor do they measure what they are intended to measure. Quality scientific approaches generally make great efforts to ensure reliability and validity.

What about Replication in Science?

Replicable (reproducible) findings are important to science; they are a sub-component of converging evidence. When referring to the replication crisis, it is important to understand what is meant: a failure to replicate statistically significant findings. It would be more precise to say there is a "statistically significant replication crisis." Consider replication from another perspective: the original study failed to detect statistical significance (using the NHST criteria prevalent in frequentist statistics), but additional studies do detect statistical significance. What would the implications be? College instructors should make an effort to address this condition, in which non-significant findings precede significant findings. Students are often advised that there is no need to replicate non-significant findings, but that significant findings should be replicated. This implies that the non-significant findings must be accurate (if they occurred first), even though all studies are susceptible to flaws.
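The scenario above, where an underpowered original study misses a real effect that a larger follow-up then detects, can be illustrated with a rough simulation. Everything here is invented: the effect size, the sample sizes, and the use of |t| > 1.96 as a crude stand-in for the p < .05 NHST criterion.

```python
# Rough simulation: a real but modest effect, one small original study,
# one larger replication. |t| > 1.96 stands in for p < .05 (NHST).
import random
from statistics import mean, stdev

def significant(control, treatment, crit=1.96):
    """Welch-style two-sample t statistic against a rough critical value."""
    se = (stdev(control) ** 2 / len(control)
          + stdev(treatment) ** 2 / len(treatment)) ** 0.5
    t = (mean(treatment) - mean(control)) / se
    return abs(t) > crit

def study(n, effect=0.3):
    """Run one two-group study with n people per group; the treatment
    group's scores are shifted by a true effect of 0.3 SD."""
    control = [random.gauss(0, 1) for _ in range(n)]
    treatment = [random.gauss(effect, 1) for _ in range(n)]
    return significant(control, treatment)

random.seed(1)
# With only 20 people per group, a true 0.3 SD effect is usually missed;
# with 400 per group it is almost always detected.
print("original (n=20) significant:", study(20))
print("replication (n=400) significant:", study(400))
```

Run many times, the small study detects the real effect only a minority of the time while the large study almost always does, which is why a first, non-significant result is weak grounds for abandoning a question.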

Learn more about the need for science, rationality, and statistics in In Evidence We Trust.




Monday, April 26, 2021

Nonsense Detection Kit 2.0

The Nonsense Detection Kit 2.0 is a revision of The Nonsense Detection Kit. The contents of the full kit can be found in In Evidence We Trust 2nd Edition.

The impetus for writing the Nonsense Detection Kit was previous suggestions made by Sagan (1996), Lilienfeld et al. (2012), and Shermer (2001). The Nonsense Detection Kit refers to nonsense in terms of “scientific nonsense”. So, nonsense as referred to here means “nonscientific information” that is often perpetuated as scientific when in fact it is not.

The Nonsense Detection Kit provides guidelines that can be used to separate sense from nonsense. There is no single criterion for distinguishing sense from nonsense, but it is possible to identify indicators, or warning signs. The more warning signs that appear, the more likely it is that the claims are nonsense.

Below is a brief description of indicators that should be useful when separating sense from nonsense. These indicators should be useful when evaluating claims made by the media, on the Internet, in peer-reviewed publications, in lectures, by friends, or in everyday conversations with colleagues.

Nonsense indicator: claims haven’t been verified by an independent source

Nonsense perpetuators often claim special knowledge. That is, they claim to have made specific discoveries that only they know about; others lack the know-how, or do not have the proper equipment, to make the finding. These claims are often reflected in phrases such as “revolutionary breakthrough”, “what scientists don’t want you to know”, “what only a limited few have discovered”, and so on. These findings are not subject to criticism or replication. That is not how science works. When conducting studies, it is imperative that researchers operationalize variables (provide operational definitions: the precise, observable operations used to manipulate or measure a variable) so the specifics can be criticized and replicated. Non-scientists are not concerned with others being able to replicate their findings, because they know attempted replications will probably be unsuccessful. If a finding cannot be replicated, this is a big problem, and it is unreasonable to consider a single finding as evidence. It is also problematic when only those who made the original finding have replicated it successfully. When independent researchers using the same methods as the original study are unable to replicate it, this is a sign that something was faulty with the original research.

Nonsense indicator: claimant has only searched for confirmatory evidence

The confirmation bias is a cognitive error (cognitive bias) defined as the tendency to seek out confirmatory evidence while rejecting or ignoring non-confirming evidence (Gilovich, 1991). Confirmation bias is pervasive, and may be the most common cognitive bias. Most people have a tendency to look for supporting evidence while ignoring, or not looking very hard for, disconfirmatory evidence. This is displayed when people cherry-pick the evidence. Of course, when you’re a lawyer, this is what you need to do: you don’t want any evidence entering the case that is incongruent with the evidence you present. As a scientist, however, it is important to look for disconfirming evidence; in fact, it has been suggested that a good scientist goes out of their way to look for it. Why? Because when discovering reality is the objective, it is necessary to look at all the available data, not just the data supporting one’s own assertions. Confirmation bias occurs when the only good evidence, according to the claimant, is the evidence that supports their claim. Often, perpetuators of nonsense may not even be aware of disconfirmatory evidence; they have no interest in looking at it.

A study by Frey and Stahlberg (1986) examined how people cherry-pick the evidence. Participants took an IQ test and were given feedback indicating that their IQ was either high or low. After receiving feedback, participants had a chance to read magazine articles about IQ tests. Participants who were told they had low IQ scores spent more time looking at articles that criticized the validity of IQ tests, while those who were told they had high IQ scores spent more time looking at articles supporting the claim that IQ tests are valid measures of intelligence.

Scientific thinking should involve an effort to minimize confirmation bias. However, science does involve confirmation bias to a degree; this is often demonstrated in publication bias and forms of myside bias. The late Richard Feynman (Nobel Laureate, Physics) suggested that science is a set of processes that detects self-deception (Feynman, 1999). That is, science makes sure we don’t fool ourselves.