A common feature of all primary research is that empirical data are used to support arguments and claims. In quantitative research this evidence is almost entirely numerical. In a study such as a questionnaire survey, you will need to analyse the results statistically and present them in tables and graphs to illustrate and support your findings.
When writing about your findings in the results section of your report, it is important to remember that the purpose is to present the results of your data analysis. It is normally not appropriate at this stage to discuss these results. That takes place in the discussion and conclusions.
The primary purpose of the results section is to present the data in a standard way. It is important to structure the results section, addressing each hypothesis in order. The normal format is for the results of the research to be reported factually and formally without detailed analysis. Results are presented both verbally and with figures and tables to help understanding.
There are standard conventions to follow when reporting statistics. It is usual to start by providing an overview of the hypothesis to be tested and a description of the test(s) used. This is followed by descriptive statistics, such as central tendency and standard deviation/variance. These are then followed by the inferential statistics, such as t-tests or correlations. When reporting the results of inferential tests, it is important to include the obtained value of the test statistic, the degrees of freedom, and the level of probability, with the implications for the null and alternative hypotheses. In addition, it is common for effect sizes to be reported.
This information is reported in a concise statement, such as:
An independent-samples t-test was conducted to evaluate the hypothesis that using mental images produces a significant difference in memory performance. The group using mental images recalled more words (M = 25, SD = 4.71) than the group that did not use mental images (M = 19, SD = 4.22). This difference was significant, t(18) = -3.00, p < .05, two-tailed. (Gravetter & Wallnau, 1996, p. 299)
In the first sentence the hypothesis to be tested and the test used are introduced. In the second sentence, the mean (M = 25) and the standard deviation (SD = 4.71) are presented. The next sentence provides the results of the statistical analysis. Note that the degrees of freedom are reported in parentheses immediately after the symbol t. The value for the obtained t statistic follows (-3.00), and next is the probability of committing a Type I error (less than 5%). Finally, the type of test (one- versus two-tailed) is noted.
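The test statistic in this example can be reproduced from the summary statistics alone. The following is a minimal sketch in Python, assuming a pooled-variance t-test and 10 participants in each group (18 degrees of freedom imply 20 participants in total, but the quotation does not give the group sizes). The function name pooled_t is purely illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# No-imagery group minus imagery group; n = 10 per group is an assumption.
t, df = pooled_t(19, 4.22, 10, 25, 4.71, 10)
print(f"t({df}) = {t:.2f}")
```

The sign of t depends only on which group is entered first, which is why the direction of the effect is also stated in words in the report.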
We have carried out an independent-samples t-test to compare the happiness scores for American men and women. There was a significant difference in scores for men (M = 1.76, SD = .69) and women (M = 1.83, SD = .63), t(1502) = -2.20, p < .05, but the magnitude of the difference in the means was very small (eta squared = .003), with sex explaining only .3 per cent of the variance in happiness (Dörnyei, 2007, p. 217)
In the first sentence the test used and the comparison being made are introduced. The second sentence presents the mean and the standard deviation for each group (for men, M = 1.76, SD = .69), followed by the results of the statistical analysis. Note that the degrees of freedom are again reported in parentheses immediately after the symbol t. The value for the obtained t statistic follows (-2.20), and next is the probability of committing a Type I error (less than 5%). Finally, the effect size (.003) is noted.
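The effect size in the Dörnyei example can be recovered from the reported t value and degrees of freedom. The sketch below assumes the commonly used formula eta squared = t² / (t² + df); the source does not say how the value was calculated, and the function name is hypothetical.

```python
def eta_squared_from_t(t, df):
    """Eta squared for a t-test, assuming the formula t^2 / (t^2 + df)."""
    return t**2 / (t**2 + df)

# Values taken from the quoted example: t(1502) = -2.20
eta_sq = eta_squared_from_t(-2.20, 1502)
print(f"eta squared = {eta_sq:.3f}")
print(f"variance explained = {eta_sq:.1%}")
```

With more than 1,500 participants, even a very small effect reaches significance, which is why the effect size is reported alongside the p-value.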
According to the American Psychological Association (2001, p. 22):
When reporting inferential statistics (e.g., t tests, F tests, and chi-square), include information about the obtained magnitude or value of the test statistic, the degrees of freedom, the probability of obtaining a value as extreme as or more extreme than the one obtained, and the direction of the effect. Be sure to include sufficient descriptive statistics (e.g., per-cell sample size, means, correlations, standard deviations) so that the nature of the effect being reported can be understood by the reader and for future meta-analyses. This information is important, even if no significant effect is being reported. When point estimates are provided, always include an associated measure of variability (precision), specifying its nature (e.g., the standard error).
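The elements listed above can be assembled into a report string in a consistent way. The following is a rough sketch only: the helper names are invented for illustration, and the formatting choices (no leading zero for p, and "p < .001" for very small values) are assumptions based on the style of the examples in this guide.

```python
def format_p(p):
    """Format a p-value without a leading zero; very small values become 'p < .001'."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")

def format_t_report(t, df, p):
    """Combine the test statistic, degrees of freedom and probability."""
    return f"t({df}) = {t:.2f}, {format_p(p)}"

print(format_t_report(1.62, 434, 0.11))
```

Descriptive statistics (M, SD, sample sizes) and the direction of the effect still need to be stated in the surrounding sentence.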
To begin the quantitative research process, the researcher often states two opposing hypotheses:
The first is the null hypothesis, or H_{0}. This hypothesis states that the treatment has no effect, that there is no change, no difference, that nothing happened.
The null hypothesis (H_{0}) predicts that the independent variable (treatment) has no effect on the dependent variable for the population.
The second hypothesis is usually called the alternative hypothesis (H_{1}). This hypothesis states that the treatment does have an effect on the dependent variable.
The alternative hypothesis (H_{1}) predicts that the independent variable (treatment) does have an effect on the dependent variable for the population.
After data collection, the researcher compares the data with the null hypothesis and makes a decision according to criteria established earlier. There are two possible decisions, and both are stated in terms of the null hypothesis.
One possibility is that the researcher decides to reject the null hypothesis.
In this case, the data provides strong evidence that the treatment does have an effect.
The second possibility is to fail to reject the null hypothesis.
In this case, the data does not provide evidence that the treatment has an effect.
Rejecting or disproving the null hypothesis is a central task in modern scientific practice.
However, when you are writing about your findings, you will not usually write about the null hypothesis. In research reports, the researcher does not actually state that “the null hypothesis was rejected.” Instead, you report that the effect of the treatment was statistically significant. Likewise, when H_{0} is not rejected, you simply state that the treatment effect was not statistically significant or that there was no evidence for a treatment effect. In fact, when you read scientific reports, you will note that the terms null hypothesis and alternative hypothesis are rarely mentioned.
Findings are said to be statistically significant when the null hypothesis has been rejected. Thus, if results achieve statistical significance, the researcher concludes that a treatment effect occurred.
(Gravetter & Wallnau, 1996, chs. 8 & 9).
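The decision rule and the wording that follows from it can be sketched as a small function. This is an illustration only: the function name is hypothetical, and an alpha level of .05 is assumed as the criterion set before data collection.

```python
def report_wording(p, alpha=0.05):
    """Return report wording for a p-value; alpha = .05 is an assumed criterion."""
    if p < alpha:
        # H0 is rejected, but the report states the effect, not the decision.
        return "The effect of the treatment was statistically significant."
    # H0 is not rejected.
    return "The effect of the treatment was not statistically significant."

print(report_wording(0.026))
print(report_wording(0.12))
```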
Figure 1 shows the mean attractiveness ratings given by participants in each of the four experimental conditions. When participants were drunk, the attractiveness ratings were higher than when participants were sober, supporting the idea that the beer-goggles effect is alcohol dependent. The level of lighting appeared to have an effect in sober participants who rated the stooges as more attractive in dim lighting, M = 40.77, 95% CI [36.66, 44.77], than a light setting, M = 34.31, 95% CI [30.46, 38.65]; for drunk participants the difference between ratings in dim, M = 51.58, 95% CI [48.04, 55.00], and light, M = 55, 95% CI [51.15, 58.65], settings was less pronounced and in the opposite direction.
Figure 1. Graph showing the mean physical attractiveness ratings (and 95% confidence interval) given by participants when sober and drunk, and in dim and bright lighting.
A two-way 2 (alcohol: 0 pints or 6 pints) × 2 (lighting: dim vs. bright) repeated-measures ANOVA was conducted on the attractiveness ratings. This revealed a significant main effect of alcohol, F(1, 25) = 68.64, p < .001, ω^{2} = .34, indicating that attractiveness ratings were significantly higher when participants were drunk. There was not a significant main effect of lighting, F(1, 25) = 0.50, p = .484, indicating that attractiveness ratings were similar overall in dim and bright conditions. The alcohol × lighting interaction was significant, F(1, 25) = 8.82, p = .006, ω^{2} = .21, indicating that the difference in attractiveness ratings due to lighting was present in the sober participants but not the drunk ones. (Field, A. (2016). Discovering statistics. London: Sage)
Mean sales for the organisation’s 30 employees were £46,600. As the mean, median and mode are virtually the same, this suggests these data are normally distributed. Consequently the standard deviation of 18.46 indicates that 95 per cent of sales fell within the range £10,318 to £82,682, the complete range being £68,000. (Saunders & Lewis, 2012, p. 178)
There is a statistically significant strong positive relationship between the number of enquiries and the number of sales (r = .726, p < 0.001) and a statistically significant but weak to moderate relationship between the number of television advertisements and the number of enquiries (r = .362, p = 0.006). However, there is no statistically significant relationship between the number of television advertisements and the number of sales (r = .204, p = 0.131). (Saunders, Lewis & Thornhill, 2012, p. 522)
The relationship between perceived control of internal states (as measured by the PCOISS) and perceived stress (as measured by the Perceived Stress Scale) was investigated using Pearson product-moment correlation coefficient. Preliminary analyses were performed to ensure no violation of the assumptions of normality, linearity and homoscedasticity. There was a strong, negative correlation between the two variables, r = -.58, n = 426, p < .0005, with high levels of perceived control associated with lower levels of perceived stress. (Pallant, 2010, p. 135)
A set of Pearson correlations were computed to determine if there were any significant relationships between a number of employee variables. The correlation between starting and current salary is +.735; this is significant at the .01 level. The null hypothesis can be rejected. Starting salary appears to provide a moderate guide to current salary as it predicts around 54% of current salary level. The remainder of the unexplained variance may involve inter alia qualifications/skills developed over the time period and differential opportunities for promotion. (Burns & Burns, 2008, p. 354)
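The figure of "around 54%" in this example is the coefficient of determination, that is, the square of the correlation coefficient. A short check in Python, using only the reported r:

```python
r = 0.735                      # correlation reported in the example
r_squared = r**2               # proportion of variance shared by the two variables
print(f"r = {r:.3f}, r squared = {r_squared:.2f} ({r_squared:.0%} of variance)")
```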
A linear regression analysis was conducted to evaluate the prediction of monthly sales value from floor area of a set of 14 branches of a large multiple store. The scattergraph indicates that they are positively and strongly linearly related such that as floor area increases so does monthly sales income, in fact by $1,686 per sq mt. A histogram and residual plots indicate that linear regression assumptions are met. (Burns & Burns, 2008, p. 384)
Hierarchical multiple regression was used to assess the ability of two control measures (Mastery Scale, Perceived Control of Internal States Scale: PCOISS) to predict levels of stress (Perceived Stress Scale), after controlling for the influence of social desirability and age. Preliminary analyses were conducted to ensure no violation of the assumptions of normality, linearity, multicollinearity and homoscedasticity. Age and social desirability were entered at Step 1, explaining 6% of the variance in perceived stress. After entry of the Mastery Scale and PCOISS Scale at Step 2 the total variance explained by the model as a whole was 47.4%, F(4, 421) = 94.78, p < .001. The two control measures explained an additional 42% of the variance in stress, after controlling for age and socially desirable responding, R squared change = .42, F change (2, 421) = 166.87, p < .001. In the final model, only the two control measures were statistically significant, with the Mastery Scale recording a higher beta value (beta = -.44, p < .001) than the PCOISS Scale (beta = -.33, p < .001). (Pallant, 2010, p. 167)
A chi-square goodness-of-fit test indicates there was no significant difference in the proportion of smokers identified in the current sample (19.5%) as compared with the value of 20% that was obtained in a previous nationwide study, χ^{2} (1, n = 436) = .07, p = .79. (Pallant, 2010, p. 216)
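The chi-square value in this example can be approximated from the percentages given. The sketch below assumes that 19.5% of 436 corresponds to 85 smokers (the exact count is not given in the quotation) and applies the Pearson formula, the sum of (observed - expected)² / expected; the function name is illustrative.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n = 436
smokers = 85                        # assumed count: about 19.5% of 436
observed = [smokers, n - smokers]   # smokers, non-smokers
expected = [0.20 * n, 0.80 * n]     # 20% smokers, as in the earlier study
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1              # number of categories minus one
print(f"chi-square({df}, n = {n}) = {chi2:.2f}")
```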
A chi-square test for independence indicated no significant association between gender and smoking status, χ^{2} (1, n = 436) = .34, p = .56, phi = -.03. (Pallant, 2010, p. 222)
An independent-samples t-test was run to determine if there were differences in engagement to an advertisement between males and females. There were no outliers in the data, as assessed by inspection of a boxplot. Engagement scores for each level of gender were normally distributed, as assessed by the Shapiro-Wilk test (p > .05). Homogeneity of variances was violated, as assessed by Levene's Test for Equality of Variances (p = .013), so separate variances and the Welch-Satterthwaite correction were used. The advertisement was more engaging to male viewers (M = 5.56, SD = 0.35) than female viewers (M = 5.30, SD = 0.35), a statistically significant difference, M = 0.26, 95% CI (0.03, 0.48), t(37.998) = 2.325, p = .026.
An independent-samples t-test was conducted to compare the self-esteem scores for males and females. There was no significant difference in scores for males (M = 34.02, SD = 4.91) and females (M = 33.17, SD = 5.71; t (434) = 1.62, p = .11, two-tailed). The magnitude of the differences in the means (mean difference = .85, 95% CI: -1.80 to 1.87) was very small (eta squared = .006). (Pallant, 2010, p. 243)
An independent-samples t-test was conducted to evaluate the hypothesis that smokers and non-smokers differ significantly in their self-concept levels. The mean self-concept score of non-smokers (M = 46.61, sd = 11.17) was statistically significantly different (t = 21.579, df = 423.3, two-tailed p = .000) from that of smokers (M = 28.28, sd = 6.54). The effect size d = 2.09 implies a very strong effect. (Burns & Burns, 2008, pp. 268-269)
The Mann-Whitney U-test showed that there was no significant difference in absence rates in 2006 between male and female employees (U = 168.0, p = .413). (Burns & Burns, 2008, p. 272)
A paired samples t test (N = 40) was conducted to evaluate whether there was a significant difference between initial and current salaries. The mean scores between initial and current salaries differed significantly (t = 17.385, df = 39, p < .001) with current salary having a significantly higher mean than the starting salary. The calculated effect size (d) was 2.75, indicating a large effect. (Burns & Burns, 2008, p. 276)
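The effect size in this example is consistent with one common way of obtaining Cohen's d for a paired-samples t-test, namely dividing t by the square root of the number of pairs. The source does not state how d was calculated, so the formula here is an assumption, and the function name is hypothetical.

```python
import math

def cohens_d_paired(t, n_pairs):
    """Cohen's d for a paired-samples t-test, assuming d = t / sqrt(N)."""
    return t / math.sqrt(n_pairs)

# Values taken from the quoted example: t = 17.385, N = 40
d = cohens_d_paired(17.385, 40)
print(f"d = {d:.2f}")
```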
A one-way analysis of variance indicated that there was a significant difference in happiness amongst white people (M = 1.77, SD = .60), black people (M = 1.97, SD = .65) and other races (M = 1.94, SD = .67), F (2, 1501) = 10.23, p < .001. The effect size was small (eta squared = .013). S-N-K post hoc tests showed that white people were significantly happier than members of the non-white races (black and other), p < .05, whereas the latter two groups did not differ from each other significantly. (Dörnyei, 2007, p. 221)
A one-way between-groups analysis of variance was conducted to explore the impact of age on levels of optimism, as measured by the Life Orientation Test (LOT). Participants were divided into three groups according to their age (Group 1: 29yrs or less; Group 2: 30 to 44yrs; Group 3: 45yrs and above). There was a statistically significant difference at the p < .05 level in LOT scores for the three age groups: F (2, 432) = 4.6, p = .01. Despite reaching statistical significance, the actual difference in mean scores between the groups was quite small. The effect size, calculated using eta squared, was .02. Post-hoc comparisons using the Tukey HSD test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from Group 3 (M = 22.96, SD = 4.49). Group 2 (M = 22.10, SD = 4.15) did not differ significantly from either Group 1 or 3. (Pallant, 2010, p. 255)
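For a one-way ANOVA, eta squared (the between-groups sum of squares divided by the total sum of squares) can also be recovered from the reported F value and its degrees of freedom. The following is a minimal sketch with an illustrative function name; it relies on the identity eta squared = (F × df between) / (F × df between + df within).

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    explained = f_value * df_between
    return explained / (explained + df_within)

# Values taken from the quoted example: F(2, 432) = 4.6
eta_sq = eta_squared_from_f(4.6, 2, 432)
print(f"eta squared = {eta_sq:.2f}")
```

The same calculation applied to F(2, 1501) = 10.23 gives approximately .013, which agrees with the value reported in the happiness example.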
On average, participants reported that they took the evaluation process somewhat seriously (M = 6.81, SD = 2.78). However, on the forced choice question, only 20% of participants indicated that they took the evaluation process seriously all the time, and 4% of participants indicated that they never took the evaluation process seriously. The majority of participants (76%) indicated that they sometimes took the process seriously, but that at other times they just bubbled in answers in order to get done quickly. Bassett, J., Cleveland, A., Acorn, D., Nix, M. & Snyder, T. (2015). Are they paying attention? Students’ lack of motivation and attention potentially threaten the utility of course evaluations. Assessment & Evaluation in Higher Education, 42, 431-442.
The analysis revealed reliably higher percentages of overlap when participants were required to cite three sources (M = 10.26%, SD = 5.66) than when citations were optional (M = 4.76%, SD = 7.30), F(1,85) = 8.35, p < .001, η^{2} = .17. Contrary to expectation, all three participants who were identified by the researcher as having committed plagiarism were assigned papers that required citations. Finally, there was no significant interaction between warnings and assignment type, F(1,85) = .96, ns. Youmans, R. J. (2011). Does the adoption of plagiarism-detection software in higher education reduce plagiarism? Studies in Higher Education, 36, 749–761.
There was a significant effect of condition upon self-reported disgust [interpersonal, M = 5.33, SD = 0.44; outgroup, M = 4.74, SD = 0.91; ingroup, M = 3.26, SD = 1.02, F(2, 42) = 25.09, P < 0.01, η^{2} = 0.54]. As predicted, post hoc Tukey tests revealed that the disgust score was lower in the ingroup condition than in either the outgroup or interpersonal conditions (both p < 0.001) and that there was no significant difference between the interpersonal and outgroup conditions. Reicher, S. D., Templeton, A., Neville, F., Ferrari, L. & Drury, J. (2015). Core disgust is attenuated by ingroup relations. Proceedings of the National Academy of Sciences of the United States of America, 113, 2631-2635.
Results indicate that laptop use by fellow students was the single most reported distracter (n = 229), accounting for 64% of all responses. This was significantly greater than all other responses combined (n = 130), χ^{2} (1, N = 359) = 29.2, p < .001. Fried, C. B. (2008). In-class laptop use and its effects on student learning. Computers & Education, 50, 906–914.
The average age of participants was ... (SD = ...).
"The average age of participants was 25.5 years (SD = 7.94)."
The age of participants ranged from ... to ... years (M = ..., SD = ...).
"The age of participants ranged from 18 to 70 years (M = 25.5, SD = 7.94). "
Age was non-normally distributed, with skewness of ... (SE = ...) and kurtosis of ... (SE = ...).
"Age was non-normally distributed, with skewness of 1.87 (SE = 0.05) and kurtosis of 3.93 (SE = 0.10)."
Participants were ... and ..., aged ... to ... years.
"Participants were 98 men and 132 women, aged 17 to 25 years (men: M = 19.2, SD = 2.32; women: M = 19.6, SD = 2.54)."
An independent-samples t-test was conducted to compare ...
"An independent-samples t-test was conducted to compare salary in manual and non-manual conditions."
An independent-samples t-test was run to determine ...
"An independent-samples t-test was run to determine if there were differences in engagement to an advertisement between males and females."
We have carried out an independent-samples t-test to compare ...
"We have carried out an independent-samples t-test to compare the happiness scores for American men and women."
A set of Pearson correlations were computed to determine ...
"A set of Pearson correlations were computed to determine if there were any significant relationships between a number of employee variables."
A paired samples t test (N = ...) was conducted to evaluate ...
"A paired samples t test (N = 40) was conducted to evaluate whether there was a significant difference between initial and current salaries."
A chi-square test was performed ...
"A chi-square test was performed to investigate the relationship between gender and salary."
Data were analysed using a mixed-design ANOVA with a within-subjects factor of ... and a between-subject factor of ...
"Data were analysed using a mixed-design ANOVA with a within-subjects factor of type of work (manual, semi-skilled, skilled, professional) and a between-subject factor of sex (male, female)."
A chi-square test of independence was performed to ...
"A chi-square test of independence was performed to examine the relation between ethnicity and subject interest."
A chi-square test of goodness-of-fit was performed to determine whether ...
"A chi-square test of goodness-of-fit was performed to determine whether the three types of car were equally preferred."
We ran a chi-squared test to ...
"We ran a chi-squared test to examine whether gross national product (GNP) per capita of a country (GNPSPLIT) is related to its level of political freedom."
Correlational analyses were used to ...
"Correlational analyses were used to examine the relationship between the ages of younger and older participants' first memories and their scores on three psychometric measures."
A Mann-Whitney test indicated that ...
"A Mann-Whitney test indicated that self-rated intelligence was greater for women who were not working (Md = 5) than for women who were working (Md = 4), U = 68.5, p = .035, r = .39."
A chi-square test was performed and ...
"A chi-square test was performed and no relationship was found between gender and the frequency of social talk, χ^{2} (2, N = 170) = 1.10, p = .58."
A paired-samples t-test indicated that ...
"A paired-samples t-test indicated that scores were significantly higher for the salary scale (M = 26.4, SD = 7.41) than for the security scale (M = 18.0, SD = 9.49), t(721) = 23.3, p < .001, d = 0.87."
An independent-samples t-test indicated that ...
"An independent-samples t-test indicated that scores were significantly higher for women (M = 27.0, SD = 7.21) than for men (M = 24.2, SD = 7.69), t(734) = 4.30, p < .001, d = 0.35."
An analysis of variance showed that ...
"An analysis of variance showed that the effect of noise was significant, F(3,27) = 5.94, p = .007."
...were positively correlated.
"Preferences for femininity in male and female faces were positively correlated, Pearson’s r(1282) = .13, p < .001."
...were strongly positively correlated.
"Hours spent studying and GPA were strongly positively correlated, r(123) = .61, p = .011. "
... were moderately negatively correlated.
"Hours spent playing video games and GPA were moderately negatively correlated, r(123) = -.32, p = .041."
We failed to find a significant correlation between ...
"We failed to find a significant correlation between participants’ personality scores at age 14 and their scores on the same items at the age of 77."
... reported more ... than ....
"Students taking statistics courses in business at the University of Hertfordshire reported studying more hours for tests (M = 121, SD = 14.2) than did UH students in general, t(33) = 2.10, p = .034."
... a preference for ... over ....
"Results indicate a significant preference for cod and chips (M = 3.45, SD = 1.11) over haddock and chips (M = 3.00, SD = .80), t(15) = 4.00, p = .001."
For most research, a significance level of .05 is appropriate, and results that reach it are generally described as statistically significant. The .01 level is used in situations where you want to make a strong demonstration of a treatment effect, and results that reach it are generally described as highly significant (Gravetter & Wallnau, 1996, p. 243).
"This difference was significant, t(18) = -3.00, p < .05, two-tailed."
"S-N-K post hoc tests showed that white people were significantly happier than members of the non-white races (black and other), p < .05, whereas the latter two groups did not differ from each other significantly."
"Post-hoc comparisons using the Tukey HSD test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from Group 3 (M = 22.96, SD = 4.49). Group 2 (M = 22.10, SD = 4.15) did not differ significantly from either Group 1 or 3."
"Results indicate a significant preference for cod and chips (M = 3.45, SD = 1.11) over haddock and chips (M = 3.00, SD = .80), t(15) = 4.00, p = .001."
"All effects were statistically significant at the .05 significance level."
"With an alpha level of .05, the effect of age was statistically significant, F(1, 123) = 7.27, p = .008."
"The main effect of touch was non-significant, F(1, 108) = 2.24, p > .05. However, the interaction effect was significant, F(1, 108) = 5.55, p < .05."
"There was a significant difference in the scores for degree (M = 4.2, SD = 1.3) and no degree (M = 2.2, SD = 0.84) conditions; t(8) = 2.89, p = 0.020."
"Finally, there was no significant interaction between warnings and assignment type, F(1,85) = .96, ns."
"We found a highly significant association between schizophrenia and a COMT haplotype (p = 9.5 × 10^{-8})."
"However, there is no statistically significant relationship between the number of television advertisements and the number of sales (r =.204, p = 0.131)."
"There was not a significant main effect of lighting, F(1, 25) = 0.50, p = .484, indicating that attractiveness ratings were similar overall in dim and bright conditions."
"This is not significant and indicates a random relationship."
"The interaction effect was non-significant, F(1, 24) = 1.22, p > .05."
"The main effect of touch was non-significant, F(1, 108) = 2.24, p > .05. However, the interaction effect was significant, F(1, 108) = 5.55, p < .05."
"S-N-K post hoc tests showed that white people were significantly happier than members of the non-white races (black and other), p < .05, whereas the latter two groups did not differ from each other significantly."
"The effect of age was not statistically significant, F(1, 123) = 2.45, p = .12."
"These results suggest that salary really does have an effect on creativity at work. Specifically, our results suggest that when humans have a higher salary, they are more creative."
"This suggests that smarter individuals have earlier first memories."
"The study showed that white people were significantly happier than members of the non-white races (black and other)."
"There is no evidence to suggest that absence is more frequent at one age rather than another."