## ARTICLE 40: Research Methods for Ph. D. and Master’s Degree Studies: Statistical Research Methods Part 2 of 2

Written by Dr. Hannes Nel

Introduction

It is said that Albert Einstein wrote in a letter to his daughter in 1938 that there is an extremely powerful force that governs all the universe, a variable that scientists often forget.

He confessed that he omitted the variable when he developed the relativity theory.

That variable is love.

I don’t know if the story is true.

And if it is true, then Einstein probably wrote a letter to his daughter in which he declared his love for her in a most romantic and creative manner.

Even so, it made me think. When scientists start their findings by writing: “all other factors being constant,” they are admitting that their findings are wrong.

Because other factors are never constant.

I guess I am just being difficult because it is sometimes necessary to investigate the influence of single factors on an event, behaviour or phenomenon.

Even so, considering the interrelationship between different factors is also important.

I will discuss the following issues related to statistical research methods in this article:

1. Determining validity from statistical data.
2. Calculating statistical significance.
3. Analyzing statistics.
4. Coming to conclusions from statistical data.

Determining validity from statistical data

Statistical validity refers to whether conclusions drawn from a statistical study agree with statistical and scientific laws.

There are different kinds of statistical validities that are relevant to research.

The following are examples of such statistical validities.

• Construct validity. Construct validity ensures that the results of the data that you collected conform to the theory of your research.

For example, a questionnaire on the quality of learning provided by universities and completed by employers must provide a true picture of the value that university studies have for the workplace.

• Content validity. Content validity ensures that the test or questionnaire that you prepared covers all aspects of the variable that is being studied.

For example, if you were to do research on the exam papers that students studying towards a degree in accounting must write, then the exam papers must test all the exit level outcomes of the subject to have content validity.

• Face validity. Face validity is related to content validity and is a quick initial estimate to check if the test that you will conduct is in line with the hypothesis that you are investigating. It is more subjective than content validity.

For example, if you were to do research on the exam papers that students studying towards a degree in accounting must write, and it looks like a good exam paper that meets the requirements for assessment, then on appearance the exam papers can have face validity.

• Conclusion validity. Conclusion validity is achieved when the conclusions that you reach from the data that you collected are accurate and justified.

This will be the case if the sample or samples that you used are large enough, randomly chosen and taken from the population being investigated.

• Criterion validity. Criterion validity measures how closely the results that you obtain with a data collection instrument match those of a different instrument.

For example, if you use a questionnaire to measure the extent to which university studies add value to the workplace, and you measure the same research question by making use of a different, proven questionnaire that was used for the same purpose previously, then your questionnaire will have criterion validity if it delivers the same results, or at least nearly the same results, as the proven questionnaire.

• Internal validity. Internal validity is achieved if you can claim that the results that you achieved with your research can be attributed to the factors that you considered and not to other factors which you did not consider.

It is a measure of the inherent cause and effect relationship between the factors that you considered in your research.

For example, if you can prove that a certain symptom is an indication of only one specific illness, then your finding will have internal validity.

• External validity. External validity relates to how you apply the results of your investigation to the wider population.

It tells you if your findings apply generally or only to the target group of your research.

For example, if you can prove that your findings, based on a sample taken from a certain population also apply to any other group from any other population, then your research findings will have external validity.

Calculating statistical significance

Statistical significance means that the result that you observed is unlikely to be the product of chance alone, so that you can treat the statistics that you generated as reliable.

It does not necessarily mean that your findings are important.

Statistical significance can, for example, be skewed by the size of the sample that you investigate.

The larger the sample, the more likely it is that even small differences between two groups will appear to be significant.
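
To see how sample size alone can change the picture, here is a minimal sketch in Python. It applies a standard two-proportion z-test to the same small difference at two sample sizes. The function name `two_proportion_p_value` and the figures (52% against 50%) are my own illustrative assumptions, not data from any study.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value for the difference between two proportions,
    each observed in its own sample of size n (pooled z-test)."""
    pooled = (p1 + p2) / 2  # equal sample sizes, so the pooled rate is the average
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p1 - p2) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical rates: 52% versus 50%, a difference of 2 percentage points.
small = two_proportion_p_value(0.52, 0.50, n=100)
large = two_proportion_p_value(0.52, 0.50, n=100_000)

print(f"n = 100 per group:     p = {small:.3f}")
print(f"n = 100,000 per group: p = {large:.6f}")
```

With these assumed figures the p-value is well above 0.05 for the small samples and effectively zero for the large ones, although the difference itself has not changed. Whether a difference of two percentage points matters in practice is a separate question that the test cannot answer.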

Analyzing statistics

There are many ways in which statistics can be analyzed.

Numbers as such seldom provide a clear picture of any value for coming to conclusions.

Dedicated computer software will mostly provide you with such numbers.

Ultimately, however, it is up to you to interpret the numbers and to make sense of them.

For this purpose, you might need to summarize the data or rearrange it in tabular or graphic format.

Some computer programmes might do this for you. They might even interpret the data to an extent, but they will not come to conclusions or findings.

You might, for example, need to group statistics in different age groups, gender, nominal ranges, etc. to see the bigger picture from which to come to conclusions.

This can be made visual in the form of tables, line graphs, bar charts, histograms, etc.
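
As a rough illustration of grouping, the sketch below sorts respondents into ten-year age bands and prints a simple frequency table with a text bar for each band. The ages and the helper `age_group` are hypothetical and serve only to show the idea.

```python
from collections import Counter

# Hypothetical ages of survey respondents (illustrative data only).
ages = [19, 23, 25, 31, 34, 38, 42, 47, 51, 58, 63, 67]


def age_group(age):
    """Place an age in a ten-year band, e.g. 23 -> '20-29'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


counts = Counter(age_group(age) for age in ages)

# Print a simple frequency table with a text bar for each group.
for group in sorted(counts):
    print(f"{group:>6} | {'#' * counts[group]} ({counts[group]})")
```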

The computer software might do some handy calculations for you, for example calculating the average, also called the mean; the median (the midpoint of the data); the mode (the most common value in a set of data); the range (the difference between the smallest and the largest value); the standard deviation (the typical spread around the mean); the variance (the square of the standard deviation), etc.
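
These calculations can be sketched with Python's built-in `statistics` module. The exam marks are hypothetical. Note my assumption that the data is a sample: `stdev` and `variance` then divide by n - 1, while `pstdev` and `pvariance` are the versions for a whole population.

```python
import statistics

# Hypothetical exam marks (illustrative data only).
marks = [52, 61, 61, 68, 70, 74, 79, 85]

mean = statistics.mean(marks)            # the average
median = statistics.median(marks)        # the midpoint of the sorted data
mode = statistics.mode(marks)            # the most common value
value_range = max(marks) - min(marks)    # largest minus smallest value
std_dev = statistics.stdev(marks)        # sample standard deviation
variance = statistics.variance(marks)    # sample variance

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"range = {value_range}, std dev = {std_dev:.2f}, variance = {variance:.2f}")
```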

Coming to conclusions from statistical data

Statistics are often used as a basis for coming to conclusions about presumed effects and relationships.

There are several principles of statistics that, if violated, can affect the inferences made from results as well as subsequent conclusions of the research.

Sophisticated statistics do not guarantee valid conclusions.

You will, therefore, need to obtain the assistance of an expert statistician to help you interpret statistical data if you are not one.

However, coming to conclusions and developing findings from them are still your responsibility.

Summary

The statistical validity of conclusions is tested against statistical and scientific laws.

There are different kinds of validities, depending on how and against what your conclusions are tested.

Kinds of validity include:

1. Construct validity.
2. Content validity.
3. Face validity.
4. Conclusion validity.
5. Criterion validity.
6. Internal validity.
7. External validity.

Statistical significance is achieved if your statistics are reliable.

Reliability is often damaged when the size or composition of your sample is wrong.

Statistics can be analyzed by consolidating them in a table or graphs.

Some analysis and calculations are often done by computer.

You will need to come to your own conclusions based on your interpretation of the data.

Finally, you will need to derive findings from your conclusions.

## ARTICLE 39: Research Methods for Ph. D. and Master’s Degree Studies: Statistical Research Methods Part 1 of 2.

Written by Dr Hannes Nel

Introduction

Technicist scientists are of the opinion that natural science is the most reliable path to the truth.

They, furthermore, claim that the truth can only be known through scientific proof.

Truth is based on logic, they say.

They believe in the singularity of meaning.

Something is either true or false – there is no in-between.

That is why they often base their research on hypotheses rather than a research question or statement.

They also believe that the truth can be proven and expressed numerically.

And they believe that the truth is not dependent on context or time.

Social scientists do not agree.

They feel that experience and reflection should not be neglected.

Individual and group perceptions can often be the truth.

The truth can be different in different contexts and at different times.

Logic is simply common sense.

I will discuss the following issues related to statistical research methods in this article:

1. Investigating a statistical hypothesis.
2. Conducting statistical regression analysis.

Investigating a statistical hypothesis

You will mostly use a hypothesis in statistical research, although it is also possible to base your research on a problem statement or question.

You will need to formulate two opposing hypotheses – the null hypothesis and the alternative hypothesis.

The null hypothesis, indicated with H0 (H-naught), is a statement about the population that you believe to be true.

The alternative hypothesis, indicated with H1, is a claim about the population that is contradictory to H0. It is what you will conclude when you reject H0.

A null hypothesis can often be tested by means of statistical research.

The evidence from your sample will tend to support either the H0 hypothesis or the H1 hypothesis.

You will reject the H0 hypothesis if the sample information favours the H1 hypothesis.

Or you will not reject the H0 hypothesis if the sample information is insufficient to reject it.

For example, your H0 hypothesis can be:

30% or less of the people who contracted the COVID-19 virus lived in rural areas.

You can also write the null hypothesis like this: H0: p ≤ 0.3, where p is the proportion of infected people who lived in rural areas.

Your H1 hypothesis will then be:

More than 30% of the people who contracted the COVID-19 virus lived in rural areas.

You can also write the alternative hypothesis like this: H1: p > 0.3

You will also need to calculate the size of the sample that you should use for a given level of accuracy and probability.

Dedicated computer programmes will do this for you.
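
To give an idea of what such a programme does, here is a minimal sketch of one common textbook formula for the sample size needed to estimate a proportion, n = z² p(1 − p) / e². I am assuming a simple random sample from a large population. The function name and the default values are illustrative, not taken from any particular package.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, expected_p=0.5):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    expected_p = 0.5 is the most cautious guess, because it gives the
    largest required sample.
    """
    # z value that leaves (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_p * (1 - expected_p) / margin_of_error ** 2
    return math.ceil(n)  # round up, because you cannot survey part of a person


print(sample_size_for_proportion(0.05))
```

With a margin of error of 5 percentage points and 95% confidence, this gives the familiar figure of 385 respondents. A smaller margin of error or a higher confidence level will require a larger sample.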

Once you have composed a sample that will give you some answers with an acceptable level of probability, you will need to interpret the data, which will probably have been analyzed with dedicated software.

You will need to set certain norms, or criteria, for the analysis of the data that you collected for the population first.

The samples also need to meet those norms, criteria or parameters.

A null hypothesis needs to be tested by comparing two sets of data.

If you reject the null hypothesis, then you can assume that there is enough evidence to support the alternative hypothesis.

That is: More than 30% of the people who contracted the COVID-19 virus lived in rural areas.
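
The decision to reject or not to reject can be sketched as a one-sided z-test for a proportion. Here H0 states that the proportion of infected people who lived in rural areas is at most 0.3, and H1 states that it is larger. The sample figures, the function name and the 5% significance level are my own assumptions for the sake of illustration, and the test relies on a normal approximation that needs a reasonably large sample.

```python
import math
from statistics import NormalDist


def one_sided_proportion_test(successes, n, p0):
    """Test H0: p <= p0 against H1: p > p0 with a normal approximation.

    Returns the z statistic and the one-sided p-value.
    """
    p_hat = successes / n                    # proportion observed in the sample
    std_err = math.sqrt(p0 * (1 - p0) / n)   # standard error if p equals p0
    z = (p_hat - p0) / std_err
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical sample: 340 of 1,000 infected people lived in rural areas.
z, p_value = one_sided_proportion_test(successes=340, n=1000, p0=0.3)
alpha = 0.05  # assumed significance level

if p_value < alpha:
    print(f"z = {z:.2f}, p = {p_value:.4f}: reject H0")
else:
    print(f"z = {z:.2f}, p = {p_value:.4f}: do not reject H0")
```

With these assumed figures the sample proportion of 34% is far enough above 30% for H0 to be rejected. Had the sample shown 31%, the same test would not have rejected H0, which is not the same as showing that H0 is true.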

You will probably compare the mean of observations or responses for the two sets of data.

It might sometimes be necessary to use the mode, median or correlation between the sets of data.

Random variability between different samples will also always be present.

There might also be small differences between the statistical relationship in the sample and the population.

It is possible that this can be just a matter of sample error.

Dedicated computer software will do the statistical calculations for you.

Testing a null hypothesis does not “prove” anything to be true; at most it shows that the null hypothesis is probably false.

If you cannot prove the two phenomena or populations to be different, then they are probably the same.

Then again, if the statistical analysis does not enable you to reject the null hypothesis, it does not necessarily mean that the null hypothesis is true.

Conducting statistical regression analysis

Statistical regression analysis is a generic term for all methods in which quantitative data is collected and interpreted to numerically express the relationship between two groups of variables.

The expression may be used to describe the relationship between the two groups of variables.

It can also be used to predict values, although one must be careful of trying to predict future trends based on statistical data.

The two data groups, popularly represented by X and Y, are compared numerically or graphically to identify a relationship between the items or groups of items X and Y.

You can mostly use such comparisons to determine trends and correlation between variables.

It might, for example, be possible to identify a correlation between the hours that a student spends studying and his or her eventual performance in the exam.
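
As a minimal sketch of this example, the code below fits a straight line through hours studied (X) and exam marks (Y) by ordinary least squares, which is one common form of regression. The six data points and the function name `fit_line` are hypothetical.

```python
# Hypothetical data: hours studied (x) and exam mark (y) for six students.
hours = [2, 4, 5, 7, 8, 10]
marks = [50, 55, 62, 70, 71, 82]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y vary together, and how x varies on its own
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(hours, marks)
print(f"mark = {intercept:.1f} + {slope:.2f} * hours")
```

For this made-up data the fitted line is mark = 41.0 + 4.00 * hours, in other words roughly four extra marks for every additional hour of study. The line only describes the students in the data. Using it to predict the mark of a student who studies 30 hours would be exactly the kind of risky prediction that one should be careful of.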

Correlation measures the strength of association between two variables and the direction of the relationship.

In terms of the strength of the relationship, the values of the correlation coefficient will always vary between +1 and -1.
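
The correlation coefficient that is mostly used is Pearson's r. A minimal sketch of how it is calculated follows. The three small data sets are made up to show a value at each end of the scale and one in between.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # rises in step with x
falling = [10, 8, 6, 4, 2]   # falls in step with x
mixed = [3, 1, 4, 1, 5]      # no clear pattern

print(round(pearson_r(x, rising), 2))
print(round(pearson_r(x, falling), 2))
print(round(pearson_r(x, mixed), 2))
```

The first two data sets lie on perfect straight lines and give values at the two ends of the scale. The third gives a weak positive value of about 0.35, which is the kind of result that you are far more likely to meet in real research.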

A value of +1 indicates a perfect positive association between the two variables.

That means that if one thing happens, then something else will also happen.

For example, if you cut your arm you will bleed.

A value of -1 indicates a perfect negative relationship between two variables.

For example, the faster you drive, the less time it will take you to reach your destination.

Correlation does not, however, make prediction safe. The decrease in the number of new individuals who test positive for the COVID-19 virus does not, for example, enable us to predict when the pandemic will come to an end.

You can, perhaps, argue that correlation enables you to predict what will happen to one variable if a second variable changes.

However, predicting that such change will take place is often difficult, if not impossible, in the social sciences.

You can predict with a good measure of accuracy what will happen if you add certain amounts of yeast to the dough for baking bread.

But you cannot always predict how the baker will respond if she or he serves you the bread and you criticize it.

The situation is different in exact sciences, such as chemistry, where the scientist can initiate the change and control the size, measure and frequency of change.

Summary

You will probably use two opposing hypotheses in statistical research.

The null hypothesis is a statement about a population that you believe to be true.

The alternative hypothesis should contradict the null hypothesis.

You will use sample data to test your hypothesis.

The findings that you gather from your analysis of the samples should apply to the population as well.

There might, however, be a sample error of which you should take note.

Statistical regression analysis investigates the relationship between two sets of variables.

It can show a correlation between the sets of variables.

It can also sometimes be used to predict values.

I would rather say that it can “foresee” values, because prediction based on statistics can be risky.

Relationships can be compared numerically or graphically.

Correlation between two variables can be anything between +1 and -1.

A value of +1 would indicate a perfectly positive correlation.

A value of -1 would indicate a perfectly negative correlation.