ARTICLE 95: Research Methods for Ph.D. and Master’s Degree Studies: Methods for Organising and Analysing Data (Part 1 of 2)

Written by Dr. Hannes Nel

Data needs to be organised before it can be analysed.

Depending on whether a qualitative or quantitative approach is followed, the data needs to be arranged in a logical sequence or quantified.

This can be done by quantifying, sequencing, coding or memoing the data.

I discuss quantifying, sequencing and coding data in this article.

I will discuss memoing data in my second video on methods for organising and analysing data.

Quantifying data. Most data analysis today is conducted with computers, ranging from large mainframe computers to small personal laptops. Many computer programs are dedicated to analysing social science data, and it is worth obtaining and learning to use such software if you need to write a thesis or dissertation. This holds even if you do not use quantitative research methodology exclusively, because you might need to interpret some statistics, or you might use some quantitative methods to enhance, support or corroborate your qualitative findings. However, if your research is largely qualitative, you will probably not need much more than office software.

Almost all research software requires some form of coding. This can differ substantially from one software program to the next, so you will need to find out exactly how it works even before you purchase the software. Your study leader will probably know which software will be the most suitable for your research and give you advice on this. You will only quantify data if statistical analysis is necessary, so do not do this unless you know that you will need it in your thesis or dissertation.

Many people are intimidated by empirical research because they feel uncomfortable with mathematics and statistics. And indeed, many research reports are filled with unspecified computations. The role of statistics in research is quite important, but unless you write an assignment or thesis on statistics or mathematics you will not be assessed on your statistical or mathematical proficiency. That is why most universities offer statistical services. Several private and public universities offer such services, so use them. There is also nothing wrong with purchasing dedicated software to do your statistical analysis, although it might be necessary to do a course on the software before you are able to use it properly.

Sequencing the data. Many researchers are of the opinion that organising the data in a specific sequence offers the clearest available picture of the logic of causal analysis in research. This is called the elaboration model. Used especially with contingency tables, this method portrays the logical process of scientific analysis.
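As a rough illustration of the idea, the sketch below builds a contingency table in plain Python and then repeats it within each level of a third variable, which is the basic move in elaboration. The records, the variable names and the helper `contingency_table` are all invented for this example; they do not come from any particular study or software package.

```python
from collections import Counter

# Hypothetical survey records: (study_mode, completed_on_time, funding)
records = [
    ("full-time", "yes", "funded"),
    ("full-time", "yes", "funded"),
    ("full-time", "no", "unfunded"),
    ("part-time", "no", "unfunded"),
    ("part-time", "no", "unfunded"),
    ("part-time", "yes", "funded"),
]


def contingency_table(rows, row_idx, col_idx):
    """Count how often each (row value, column value) pair occurs."""
    return Counter((r[row_idx], r[col_idx]) for r in rows)


# Step 1: the original two-variable relationship (study mode by completion).
overall = contingency_table(records, 0, 1)

# Step 2: the same table within each level of a third (control) variable.
by_funding = {
    level: contingency_table([r for r in records if r[2] == level], 0, 1)
    for level in sorted({r[2] for r in records})
}

print(overall)
print(by_funding)
```

In this invented data the apparent link between study mode and completion disappears once funding is held constant, which is the kind of pattern that elaboration is meant to expose.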

When collecting material for interpretive analysis, you experience events, or the things people say, in a linear, chronological order. When you then immerse yourself in field notes or transcripts, the material is again viewed in a linear sequence. This sequence can be broken down by inducing themes and coding concepts so that events or remarks that were far away from each other in a document, or perhaps even in different documents, are now brought close together. This gives you a fresh view of the data and allows you to carefully compare sections of text that appear to belong together. At this stage, you are likely to find that there are all sorts of ways in which extracts that you grouped together under a single theme differ, or that there are all kinds of sub-issues and themes that come to light.

Exploring themes more closely in this way is called elaboration. The purpose is to capture the finer nuances of meaning not captured by your original, possibly crude, coding system. This is also an opportunity to revise the coding system, either in small ways or drastically. If you use software, it might even be necessary to start your coding all over again. This can be extremely time-consuming, but at least every time you start over you end up with a much better structured research report.

Coding. In most qualitative research, the original text is a set of field notes, data obtained through literature study, interviews, and focus groups. One of the first steps that you will need to take before studying and analysing data is to code the information. You can use cards for this, but dedicated computer software can save you time, effort and costs. Codes are typically short pieces of text referencing other pieces of text, graphical, audio, or video data. From a methodological standpoint, codes serve a variety of purposes. They capture meaning in the data. They also serve as tools for finding specific occurrences in the data that cannot be found by simple text-based search techniques. Codes also help you organise and structure the data that you collected.

Their main purpose is to classify many textual or other data units in such a manner that the data that belongs together can be grouped as such for easy analysis and structuring. One can, perhaps, think of coding as “indexing” your data. You can also see it as a way to mark keywords so that you can find, retrieve and group them more easily at a later stage. A code should be short and concise.
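The “indexing” idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the segments, the code names and the helpers `apply_code` and `retrieve` are hypothetical and do not reflect how any specific research software stores its data.

```python
from collections import defaultdict

# Hypothetical text segments, each identified by (document, segment number).
segments = {
    ("interview_1", 1): "I never had enough time to read.",
    ("interview_1", 2): "My supervisor replied within a day.",
    ("interview_2", 1): "Evenings were the only hours I had left.",
}

# The index: each code points to the segments it has been applied to.
code_index = defaultdict(list)


def apply_code(code, segment_id):
    """Link a code to a segment, avoiding duplicate links."""
    if segment_id not in code_index[code]:
        code_index[code].append(segment_id)


def retrieve(code):
    """Return the text of every segment linked to a code."""
    return [segments[s] for s in code_index.get(code, [])]


apply_code("time pressure", ("interview_1", 1))
apply_code("time pressure", ("interview_2", 1))
apply_code("supervisor support", ("interview_1", 2))

print(retrieve("time pressure"))
```

Note that the second “time pressure” segment does not contain the word “time” at all, so a simple text search would miss it, whereas the code retrieves it.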

Codes can also be used to classify data at different levels of abstraction, to group sets of related information units together for the purpose of comparison. This is what you would often use to consider and compare related arguments to draw conclusions that can be the motivation for new knowledge. Dedicated computer software does not create new knowledge; it only helps you as the researcher to structure existing knowledge and experiences in such a manner that it will be easier for you to think creatively, that is, to create new knowledge.

Formal coding will be necessary if you make use of dedicated research software. Even if you do not use research software, you will probably need a method of coding to arrange your data according to the structure of your thesis or dissertation. Your original data will probably include additional data, such as the time, date and place where the data was collected.

It is also a purpose of coding data to move to a higher conceptual level. The codes will inevitably represent the meanings that you infer from the original data, thereby moving closer towards the solution of your problem statement, or confirmation or rejection of your null hypothesis. By coding data, you will, of course, rearrange the data that you collected under different headings representing steps in the research process.

Five coding procedures are popularly used: open coding, in vivo coding, coding by list, quick coding and free coding.

With most qualitative research software, you can create codes first and then link them to sections in your data. Creating new codes is called open coding. The nature of the initial codes, which can be referred to as Level 1 codes or open codes, can vary and might change as you progress with your research. You should give a name to each new code that you open, and you can usually create one or more codes in a single step. These codes can stick closely to the original data, perhaps even reusing the exact words in the original data. Such codes can be deduced from research questions. In vivo coding is mostly used for this purpose.

In vivo coding means creating a code for selected text as and when you come across text, or just a word in the text, that can and should serve as a code. This would normally be a word or short piece of text that would probably appear in other pieces of data that should be linked and grouped with the data in which you identified the code.

If you know where you are going with your study, you will probably create codes first (up front), then link them to sections of data. This would be coding by list. Coding by list allows you to select existing codes from a code list that you prepared in advance. You would typically select one or more codes associated with the current data selection.

You can also create codes as you work through your data, which would then be quick coding. In the case of quick coding, you will continue with the selected code that you are working with. This is an efficient method for the consecutive coding of segments using the most recently used code.
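The difference between coding by list and quick coding can be sketched as follows. This is a minimal illustration, assuming that quick coding simply reapplies the most recently used code to the next segment; the code list and the functions `code_by_list` and `quick_code` are hypothetical names chosen for the example, not the commands of any real program.

```python
# A code list prepared in advance (hypothetical codes).
code_list = ["time pressure", "supervisor support", "funding"]

codings = []      # (segment, code) pairs recorded so far
last_used = None  # the most recently applied code


def code_by_list(segment, code):
    """Apply a code chosen from the prepared code list."""
    global last_used
    if code not in code_list:
        raise ValueError(f"{code!r} is not in the code list")
    codings.append((segment, code))
    last_used = code


def quick_code(segment):
    """Apply the most recently used code to the next segment."""
    if last_used is None:
        raise RuntimeError("no code has been used yet")
    codings.append((segment, last_used))


code_by_list("segment 1", "funding")  # choose a code from the list
quick_code("segment 2")               # reuse "funding" without choosing again
quick_code("segment 3")

print(codings)
```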

You can create codes that have not yet been used for coding or creating networks. Such codes are called free codes and they are a form of quick coding, although they can be prepared in advance. The reasons why you would create free codes can be:

  1. To prepare a stock of predefined codes in the framework of a given theory. This is especially useful in the context of teamwork when creating a base project.
  2. To code in a “top-down” (or deductive) way with all necessary concepts already at hand. This complements the “bottom-up” (or inductive) open coding stage, in which concepts emerge from the data.
  3. To create codes that come to mind during normal coding work and that cannot be applied to the current segment but will be useful later.

It will be easier to code data if you already have a good idea of what you are trying to achieve with your research. Sometimes the data will actually “steer” you towards codes that you did not even think of in the beginning. This is typical of a grounded theory approach, although you should always keep an open mind about your research, regardless of which approach you follow. Coding also helps to develop a schematic diagram of the structure of your thesis or dissertation. This can be based on your initial study proposal. A mind map can, for example, be used to structure your research process and to identify initial codes to start with.

A code may contain more than a single word but should be concise. There should be a comment area on your screen that you can use to write a definition for each code, if you need one. As you progress in doing the first level coding, you may start to understand how your data might relate to broader conceptual issues. Some of your field experiences may in fact be sufficiently similar so that you might be able to group different coded data together on a higher conceptual level. Your coding has then proceeded to a higher set of codes, referred to as Level 2 or category codes.
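The step from Level 1 codes to Level 2 or category codes can be pictured as a simple grouping. The sketch below is only an illustration: the codes, the segment identifiers and the function `segments_for_category` are made up for the example.

```python
# Hypothetical Level 1 (open) codes and the segments they were applied to.
level1 = {
    "time pressure": ["int1:s1", "int2:s1"],
    "family duties": ["int2:s4"],
    "supervisor support": ["int1:s2"],
    "peer support": ["int3:s2", "int3:s5"],
}

# Hypothetical Level 2 (category) codes, each grouping related Level 1 codes.
level2 = {
    "obstacles": ["time pressure", "family duties"],
    "support": ["supervisor support", "peer support"],
}


def segments_for_category(category):
    """Collect all segments coded with any Level 1 code in the category."""
    collected = []
    for code in level2[category]:
        collected.extend(level1[code])
    return collected


print(segments_for_category("obstacles"))
```

Keeping the two levels in separate structures means that you can regroup the Level 1 codes under different categories later without recoding the original data.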

After a code has been created, it appears as a new entry in several locations (drop-down list, code manager). In this respect the following are important to remember:

  1. Groundedness: Groundedness refers to the number of quotations associated with the code. Large numbers indicate strong evidence already found for this code.
  2. Density: The number of codes connected to this code is indicated as the density. Large numbers can be interpreted as a high degree of theoretical density.
  3. Comment: The tilde character “~” can, as an example, be used to flag commented codes. It is not used for codes only but for all commented objects.
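Groundedness and density, as defined above, are simple counts. The following sketch shows how they could be computed under those definitions; the data and the function names `groundedness` and `density` are assumptions for illustration, and real software may count links differently.

```python
# Hypothetical quotations linked to each code.
quotations = {
    "time pressure": ["q1", "q4", "q7"],
    "family duties": ["q2"],
    "obstacles": [],
}

# Hypothetical links between pairs of codes.
code_links = [
    ("time pressure", "obstacles"),
    ("family duties", "obstacles"),
]


def groundedness(code):
    """Number of quotations associated with the code."""
    return len(quotations.get(code, []))


def density(code):
    """Number of distinct other codes connected to the code."""
    neighbours = set()
    for a, b in code_links:
        if a == code:
            neighbours.add(b)
        elif b == code:
            neighbours.add(a)
    return len(neighbours)


for code in quotations:
    print(code, groundedness(code), density(code))
```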

It is not only text that can be coded. You can also code graphic documents, audio and video material. There are many other ways in which codes can be utilised; for example, they can be sorted, modified, renamed, deleted, merged and, of course, reported.

Axial coding. Axial coding is the process of putting data back together after it has been restructured by means of open coding. Open coding allows you to select data that belong together (under a certain code or sub-code) taken from a variety of sources containing the original or primary data. Categories of data are, thus, systematically developed and linked with subcategories. You can then develop a new narrative through a process of reconstruction. The new narrative might apply to a different context and should be articulated to the purpose of your research.

The articulation of selected data can typically relate to a condition, strategy or consequence. Data relating to a condition or strategy should address conditions that lead to the achievement of the purpose of the study. The purpose of the study will always be to solve a problem statement, answer a research question, or test a null hypothesis. Consequential data include all outcomes of action or interaction.

Selective coding. Selective coding refers to the process of selecting a core category, systematically relating it to other categories, validating those relationships, and filling in categories that need further refinement and development. Categories are, thus, integrated and refined. The core category would be the central phenomenon to which all the other categories are linked. To use an example from fiction, in a novel you will identify the plot first, then the storyline, which you should analyse to identify the elements of the storyline that relate to the plot. From this you should be able to deduce lessons learned or a moral for the story.


Data is mostly organised by making use of dedicated computer programs.

Most such computer programs require some form of coding.

Data can be sequenced by following an elaboration model.

Contingency tables are commonly used to portray the logical process of scientific analysis.

Data is usually collected, and first read, in a linear, chronological order.

Codes are typically short pieces of text referencing other pieces of text, graphical, audio or video data.


Codes serve several purposes. They:

  1. Capture meaning.
  2. Serve as tools for finding specific occurrences in the data.
  3. Help you to organise and structure the data.
  4. Classify textual or other data units in related groups and at different levels of abstraction.

Dedicated computer software does not create new knowledge.

Five coding procedures are popularly used.

They are open coding, in vivo coding, coding by list, quick coding and free coding.

Open coding means creating new codes.

In vivo coding means creating a code for selected text as and when you come across text, or just a word in text, that can and should serve as a code.

Coding by list is used when you know where you are going with your study so that you can create the codes even before collecting data.

Quick coding means creating codes as you work through your data.

Free codes are codes that have not been used yet. They can be the result of coding by list or quick coding.

Axial coding and selective coding should be added to the five coding procedures.

Axial coding is the process of putting data back together after it has been restructured by means of open coding.

Selective coding refers to the process of selecting a core category, systematically relating it to other categories, validating those relationships, and filling in categories that need further refinement and development.

You should always keep an open mind about your research and the codes that you create.


If what I discussed here sounds confusing and alien, then it is probably because of what we discussed under schema analysis in my previous video.

It is unlikely that the level of language used here is beyond you.

If that were the case, you would not have watched this video.

No doubt you will understand everything if you watch this video again after having tried out one or two of the computer programs that deal especially with qualitative research.

Enjoy your studies.

Thank you.
