Written by Dr Hannes Nel
Technicist scientists are of the opinion that natural science is the most reliable path to the truth.
They, furthermore, claim that the truth can only be known through scientific proof.
Truth is based on logic, they say.
They believe in the singularity of meaning.
Something is either true or false – there is no in-between.
That is why they often base their research on hypotheses rather than a research question or statement.
They also believe that the truth can be proven and expressed numerically.
And they believe that the truth is not dependent on context or time.
Social scientists do not agree.
They feel that experience and reflection should not be neglected.
Individual and group perceptions can often be the truth.
The truth can be different in different contexts and at different times.
Logic is simply common sense.
I will discuss the following issues related to statistical research methods in this article:
- Investigating a statistical hypothesis.
- Conducting statistical regression analysis.
Investigating a statistical hypothesis
You will mostly use a hypothesis in statistical research, although it is also possible to base your research on a problem statement or question.
You will need to formulate two opposing hypotheses – the null hypothesis and the alternative hypothesis.
The null hypothesis, indicated with H0 (H-naught), is a statement about the population that you assume to be true unless the evidence shows otherwise.
The alternative hypothesis, indicated with H1, is a claim about the population that contradicts H0. It is what you will conclude when you reject H0.
Statistical research can test a null hypothesis by showing whether the sample data provide enough evidence to reject it.
The sample information will tend to favour either H0 or H1.
You will reject H0 if the sample information favours H1.
If the sample information is insufficient to reject H0, you will not reject it.
For example, your H0 hypothesis can be:
30% or less of the people who contracted the COVID-19 virus lived in rural areas.
You can also write the null hypothesis like this: H0: p ≤ 0.3, where p is the proportion of people who contracted the virus and lived in rural areas.
Your H1 hypothesis will then be:
More than 30% of the people who contracted the COVID-19 virus lived in rural areas.
You can also write the alternative hypothesis like this: H1: p > 0.3
You will also need to calculate how large your sample must be to give results at a chosen level of confidence and accuracy.
Dedicated computer programs will do this for you.
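As a rough illustration of what such programs calculate, the Python sketch below uses one common textbook formula for estimating a single proportion, n = z² · p(1 - p) / E², where z comes from the chosen confidence level and E is the acceptable margin of error. It assumes simple random sampling from a large population and the normal approximation. The function name is hypothetical, and the inputs (an expected proportion of 0.3, borrowed from the example hypothesis, and a 5-percentage-point margin) are illustrative assumptions, not the settings of any particular software package.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_guess: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n for estimating a proportion to within +/- margin (hypothetical helper).

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / margin^2
    and assumes simple random sampling from a large population.
    """
    # Two-sided critical value: about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p_guess * (1 - p_guess) / margin ** 2
    return math.ceil(n)  # round up, because you cannot survey part of a person


# Assumed inputs: expected proportion 0.3, margin of error 5 percentage points
n_needed = sample_size_for_proportion(0.3, 0.05)
print(n_needed)  # 323
```

Using p_guess = 0.5 gives the most cautious (largest) sample size, which is why it is often used when nothing is known about the population in advance.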
Once you have drawn a sample that will give you answers at an acceptable level of probability, you will need to interpret the data, which will probably have been analyzed with dedicated software.
You will first need to set certain norms, or criteria, for analyzing the data that you collect about the population.
The samples also need to meet those norms, criteria or parameters.
A null hypothesis is usually tested by comparing two sets of data.
If you reject the null hypothesis, you can conclude that there is enough evidence to support the alternative hypothesis.
In our example, that means more than 30% of the people who contracted the COVID-19 virus lived in rural areas.
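A minimal sketch of how software might test the COVID-19 example is a one-sided z-test for a single proportion, which compares one sample with the 30% benchmark. It assumes a simple random sample large enough for the normal approximation. The counts (140 rural cases out of 400 people sampled) are hypothetical numbers chosen only to show the mechanics, the 5% significance level is an assumed criterion, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_proportion_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Test H0: p <= p0 against H1: p > p0 with a normal-approximation z-test."""
    p_hat = successes / n                    # sample proportion
    se = sqrt(p0 * (1 - p0) / n)             # standard error if p were exactly p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)        # probability in the upper tail only
    return z, p_value, p_value < alpha       # last item True means "reject H0"


# Hypothetical sample: 140 of 400 infected people lived in rural areas
z, p_value, reject_h0 = one_sided_proportion_test(140, 400, p0=0.3)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these made-up numbers the sample proportion is 35%, the p-value falls below 0.05, and H0 would be rejected in favour of H1.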
You will probably compare the means of the observations or responses in the two sets of data.
It might sometimes be necessary to use the mode, median or correlation between the sets of data.
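One way to compare the means of two sets of data without assuming a particular distribution is a permutation test, sketched below in Python. This is a swapped-in technique chosen for illustration, not necessarily what the author or any specific software uses, and the scores are hypothetical. The idea is that if there were no real difference between the groups, the group labels would be interchangeable, so reshuffling them many times shows how often a difference as large as the observed one arises by chance alone.

```python
import random
from statistics import mean


def permutation_test_means(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference between the means of a and b."""
    rng = random.Random(seed)               # fixed seed so the result is repeatable
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                 # reassign values to groups at random
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):      # at least as large as what we observed
            extreme += 1
    # Adding 1 to both counts keeps the estimated p-value above zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical test scores for two groups of respondents
group_a = [72, 75, 78, 80, 83, 85]
group_b = [60, 62, 65, 68, 70, 71]
difference, p_value = permutation_test_means(group_a, group_b)
print(f"difference in means = {difference:.2f}, p-value = {p_value:.4f}")
```

Swapping `mean` for `statistics.median` in the function compares medians instead.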
Random variability between different samples will also always be present.
There might also be small differences between the statistical relationship in the sample and that in the population.
Such a difference may simply be the result of sampling error.
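The short simulation below illustrates sampling error. It assumes a very large population in which exactly 30% of infected people lived in rural areas, draws several random samples of 200 people, and reports the proportion found in each. The population, sample size, seed and function name are all hypothetical.

```python
import random
from math import sqrt


def sample_proportions(true_p=0.3, sample_size=200, n_samples=5, seed=1):
    """Proportion of 'rural' cases in repeated random samples.

    Each person is drawn independently with probability true_p of being rural,
    which models sampling from a very large population.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        rural = sum(rng.random() < true_p for _ in range(sample_size))
        results.append(rural / sample_size)
    return results


proportions = sample_proportions()
standard_error = sqrt(0.3 * 0.7 / 200)  # expected spread of a single sample's proportion
print(proportions)
print(f"standard error = {standard_error:.3f}")
```

The sample proportions scatter around 0.3 rather than hitting it exactly; for samples of 200 people the standard error is about 0.032, so differences of a few percentage points from the population value are to be expected.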
Dedicated computer software will do the statistical calculations for you.
Testing a null hypothesis does not “prove” anything to be true; at most, it shows that the null hypothesis is probably false.
If you cannot show that the two phenomena or populations are different, you cannot conclude that they are the same; you can only say that your data do not show a difference.
In other words, if the statistical analysis does not enable you to reject the null hypothesis, it does not necessarily mean that the null hypothesis is true.
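To see why failing to reject H0 is not the same as proving it, the sketch below applies a one-sided z-test (H0: p ≤ 0.3 against H1: p > 0.3) to two hypothetical samples that both show 35% rural cases, one of 100 people and one of 1,000. Only the sample size differs. The helper name and the 5% significance level are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(successes: int, n: int, p0: float) -> float:
    """p-value for H0: p <= p0 against H1: p > p0 (normal approximation)."""
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


alpha = 0.05  # assumed significance level
small = upper_tail_p_value(35, 100, 0.3)    # hypothetical: 35% rural, n = 100
large = upper_tail_p_value(350, 1000, 0.3)  # hypothetical: 35% rural, n = 1,000
print(f"n = 100:  p = {small:.3f}, reject H0: {small < alpha}")
print(f"n = 1000: p = {large:.4f}, reject H0: {large < alpha}")
```

With 100 people the p-value is roughly 0.14, so H0 is not rejected; with 1,000 people it is well below 0.05. The smaller sample did not show that H0 is true; it simply lacked enough evidence to detect the difference.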
Conducting statistical regression analysis
Statistical regression analysis is a generic term for all methods in which quantitative data is collected and interpreted to numerically express the relationship between two groups of variables.
The expression may be used to describe the relationship between the two groups of variables.
It can also be used to predict values, although one must be careful when trying to predict future trends from statistical data.
The two data groups, commonly represented by X and Y, are compared numerically or graphically to identify a relationship between them.
Such comparisons are mainly used to identify trends and correlations between variables.
It might, for example, be possible to identify a correlation between the hours that a student spends studying and his or her eventual performance in the exam.
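The simplest regression method, an ordinary least-squares fit of a straight line, can be sketched in a few lines of Python. The study-hours and exam-mark figures below are hypothetical, and the function name is illustrative rather than taken from any package.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the fitted line passes through the means
    return slope, intercept


# Hypothetical data: weekly study hours and exam marks (out of 100) for five students
hours = [2, 4, 6, 8, 10]
marks = [50, 58, 65, 71, 80]
slope, intercept = fit_line(hours, marks)
print(f"mark = {intercept:.2f} + {slope:.2f} * hours")
print(f"predicted mark for 7 hours:  {intercept + slope * 7:.1f}")
print(f"predicted mark for 20 hours: {intercept + slope * 20:.1f}")
```

For these made-up figures the fitted line is mark ≈ 42.9 + 3.65 × hours. Extending it to 20 hours of study predicts a mark of about 116, which is impossible on a 100-point scale: a concrete reminder of the earlier warning about predicting trends from statistical data. Python 3.10 and later also provide `statistics.linear_regression`, which returns the same slope and intercept.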
Correlation measures the strength of association between two variables and the direction of the relationship.
The correlation coefficient always lies between -1 and +1; the closer it is to either extreme, the stronger the relationship, and its sign shows the direction.
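The correlation coefficient meant here is usually Pearson's r: the covariance of the two variables divided by the product of their standard deviations. A minimal Python sketch, using hypothetical study-hours and exam-mark data, shows how values near +1 and exactly +1 or -1 arise. The function name is illustrative; Python 3.10 and later also offer `statistics.correlation` for the same calculation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance over the product of spreads."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


hours = [2, 4, 6, 8, 10]          # hypothetical study hours
marks = [50, 58, 65, 71, 80]      # hypothetical exam marks
print(round(pearson_r(hours, marks), 3))           # 0.998: strong positive
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 3))   # 1.0: perfectly positive
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 3))   # -1.0: perfectly negative
```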
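A value of +1 indicates a perfectly positive association between the two variables.
That means that whenever one variable increases by one unit, the other increases by the same fixed amount, so that all the data points lie on a straight, upward-sloping line.
For example, if you are paid a fixed hourly rate, your earnings rise by the same amount for every extra hour you work.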
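A value of -1 indicates a perfectly negative relationship between two variables: whenever one increases by one unit, the other decreases by the same fixed amount.
For example, the faster you drive, the less time it will take you to reach your destination, although this is a strongly negative relationship rather than a perfectly linear one.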
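The driving example can be checked numerically. Over a hypothetical 120 km trip, travel time equals 120 divided by speed, which is a curve rather than a straight line, so Pearson's r is strongly negative but not exactly -1. A quantity that falls by a fixed amount per unit, such as the distance still to go on a hypothetical 200 km trip as the kilometres driven increase, gives exactly -1. The `pearson_r` helper is an illustrative name, defined here so the block stands alone.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical 120 km trip: travel time in hours = distance / speed (a curve)
speeds = [60, 80, 100, 120]
times = [120 / s for s in speeds]
r_speed_time = pearson_r(speeds, times)

# Hypothetical 200 km trip: distance left falls by exactly 1 km per km driven (a line)
driven = [0, 50, 100, 150]
remaining = [200 - d for d in driven]
r_driven_remaining = pearson_r(driven, remaining)

print(round(r_speed_time, 2))        # -0.98: strongly negative, but not -1
print(round(r_driven_remaining, 2))  # -1.0
```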
Correlation does not always make prediction possible, however. For example, the decrease in the number of new individuals who test positive for the COVID-19 virus does not enable us to predict when the pandemic will come to an end.
You can, perhaps, argue that correlation enables you to predict what will happen to one variable if a second variable changes.
However, predicting that such a change will take place is often difficult, if not impossible, in the social sciences.
You can predict with a good measure of accuracy what will happen if you add certain amounts of yeast to the dough for baking bread.
But you cannot always predict how the baker will respond if she or he serves you the bread and you criticize it.
The situation is different in exact sciences, such as chemistry, where the scientist can initiate the change and control the size, measure and frequency of change.
You will probably use two opposing hypotheses in statistical research.
The null hypothesis is a statement about a population that you assume to be true unless the evidence shows otherwise.
The alternative hypothesis should contradict the null hypothesis.
You will usually compare two samples to test your hypotheses.
The findings that you gather from your analysis of the samples should apply to the population as well.
There might, however, be a sampling error of which you should take note.
Statistical regression analysis investigates the relationship between two sets of variables.
It can show a correlation between the sets of variables.
It can also sometimes be used to predict values.
I would rather call it “foresee” values, because prediction based on statistics can be risky.
Relationships can be compared numerically or graphically.
Correlation between two variables can be anything between +1 and -1.
A value of +1 would indicate a perfectly positive correlation.
A value of -1 would indicate a perfectly negative correlation.