IB Maths AI – Chapter 4 – Statistics and Probability – Correlation Coefficients + Hypothesis Testing (using the amazing and life-changing GDC)


Introduction

Chapter 4 of the IB Maths AI curriculum is one of the topics that I noticed many people struggle with. I can see why, considering its many rules and GDC procedures. Fortunately, I am fairly comfortable with this topic, so I am going to explain the rules of the different formulae and tests, and how to find the answers. I will start with Correlation Coefficients and Hypothesis Testing.

Here is a link to the IB Maths AI Formula Booklet: https://www.rcboe.org/cms/lib/GA01903614/Centricity/Domain/1351/Formula%20Book.pdf

Correlation Coefficients

Pearson’s r Correlation Coefficient

Pearson's correlation coefficient, r, measures the strength and direction of the linear correlation between 2 variables. For example, it could measure the strength of the correlation between people's ages and their reaction times.

The r-value always lies in the range $-1 \le r \le 1$, where:

  • r = 1, perfect positive linear correlation
  • r = -1, perfect negative linear correlation
  • r = 0, no linear correlation

Essentially, the closer r is to 1 or -1, the stronger the linear correlation, and the closer r is to 0, the weaker the correlation.

Now, how do you find the r-value? You could calculate it by hand, but your life will be made so much easier with a Graphic Display Calculator (GDC). In this case, I will be using the TI-Nspire CX II.

Here are the steps to finding the r-value:

  1. Open up a spreadsheet and enter the values provided in the question. Don’t forget to label each column. (Figure 1)
  2. Once done, put the cursor on an empty column and carry out a Linear Regression (mx+b) calculation. To do this, click MENU – Statistics – Stat Calculations – Linear Regression (mx+b). SHORTCUT: [MENU – 4 – 1 – 3]
  3. From here, select the labels of your columns in the X List and Y List boxes. There is no need to change anything in the other boxes. (Figure 2)
  4. The results will be displayed, including the r-value. (Figure 3)

Figure 1

Figure 2

Figure 3
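If you are curious what the GDC is doing behind the scenes, the r-value comes from Pearson's formula, r = Sxy / √(Sxx · Syy), where Sxy is the sum of the products of the deviations from the means, and Sxx and Syy are the sums of squared deviations. Below is a minimal Python sketch of that formula. The function name pearson_r is made up for this illustration, and it assumes two lists of equal length whose values are not all identical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Points that lie exactly on a straight line give r = 1 or r = -1
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))  # -1.0
```

The two example lists are chosen so the answers match the rules above: a perfect positive line gives r = 1, and a perfect negative line gives r = -1.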

Sometimes, the question will ask you to draw a scatter diagram with the linear regression line. Here is how to do that:

  1. Go to the 'Data and Statistics' feature of the GDC.
  2. Assign the X and Y lists from before to the X and Y axes of the graph. (Figure 4)
  3. Display the linear regression line. To do this, click MENU – Regression – Show Linear (mx+b). SHORTCUT: [MENU – 4 – 6 – 1] (Figure 5)

Figure 4

Figure 5
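The regression line itself is the least-squares line y = mx + b, with gradient m = Sxy / Sxx and intercept b = ȳ – m·x̄. Here is a minimal Python sketch of those formulas. The helper name linear_regression is hypothetical, and the example data are chosen so the answer is easy to check by eye.

```python
def linear_regression(xs, ys):
    """Least-squares line y = m*x + b for paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx              # gradient
    b = mean_y - m * mean_x      # the line passes through (mean_x, mean_y)
    return m, b

m, b = linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0, i.e. y = 2x + 1
```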

Spearman’s Rank Correlation Coefficient

Spearman's rank correlation coefficient, rs, is found in a similar way to Pearson's coefficient, but the data in each column have to be ranked first. Its value is interpreted in the same way as Pearson's r: it lies between -1 and 1, and the closer it is to 1 or -1, the stronger the correlation.

To find the rs value, the data will be ranked in descending order (the largest value is rank 1). Ranking in ascending order gives the same rs, as long as both columns are ranked the same way.

Let’s follow a worked example:

Boys | Girls
45   | 20
107  | 46
38   | 120
58   | 35
59   | 35

Consider this spread of data. We rank each column separately, starting with the largest value as rank 1:

Ranked Boys | Ranked Girls
4           | 5
1           | 2
5           | 1
3           | 3.5
2           | 3.5

Two of the Girls' values are the same (35), and they would occupy rank places 3 and 4. Instead of giving them different ranks, we add these rank places together and divide by 2: (3 + 4) / 2 = 3.5. Both tied values receive rank 3.5.

From here, a Linear Regression (mx+b) calculation can be carried out on the ranked columns to find the rs value. This is done in the same way as for Pearson's coefficient, just with the ranked data. Here is the shortcut again as a reminder: [MENU – 4 – 1 – 3]

In this question, the rs value is:

rs = -0.0513 (3 s.f.)

This is very close to 0, so there is almost no correlation between the two sets of ranks.
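To check the ranking and the rs value, here is a minimal Python sketch. The helper names rank_descending and pearson_r are made up for this example. It ranks each column with the largest value as rank 1, gives tied values the average of the rank places they share, and then applies Pearson's formula to the ranks, which is what the Linear Regression (mx+b) calculation does with the ranked columns.

```python
import math

def rank_descending(values):
    """Rank values with the largest as rank 1; tied values share the average rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first rank place this value occupies
        count = order.count(v)          # number of tied copies of this value
        ranks.append(first + (count - 1) / 2)  # average of places first .. first+count-1
    return ranks

def pearson_r(xs, ys):
    """Pearson's r, applied here to the ranks (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

boys = [45, 107, 38, 58, 59]
girls = [20, 46, 120, 35, 35]
boys_ranks = rank_descending(boys)
girls_ranks = rank_descending(girls)
r_s = pearson_r(boys_ranks, girls_ranks)
print(boys_ranks)     # [4.0, 1.0, 5.0, 3.0, 2.0]
print(girls_ranks)    # [5.0, 2.0, 1.0, 3.5, 3.5]
print(round(r_s, 3))  # -0.051
```

Using list.index and list.count to find ties keeps the sketch short; it is fine for small IB-sized lists but would be slow for very large data sets.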


Hypothesis Testing

Common Attributes

Before getting into the different hypothesis tests, I would like to mention that all of these tests produce a p-value, which is used to determine the outcome of the test. The p-value is the probability of getting results at least as extreme as the observed data, assuming the null hypothesis is true. All of these tests also involve a significance level, and the p-value is compared with the significance level to decide the outcome. Most of the time, the question will provide a significance level, but if it does not, a commonly used significance level is 5%, or 0.05 as a decimal.

Chi-Squared Test for Independence

The Chi-Squared Independence Test is used to test whether there is an association between 2 categorical variables, that is, whether they are dependent on or independent of each other. For example, a question might ask whether studying physics and studying higher-level mathematics are associated.

The Chi-Squared Independence Test follows this rule:

If p-value < significance level, reject the null hypothesis. Otherwise, do not reject it.

Here is a rough outline of how to carry out this test:

  1. Identify the null and alternative hypotheses, H0 and H1.
  2. Open the Calculator function of the GDC.
  3. Enter the raw data from the question into a matrix, EXCLUDING THE TOTAL VALUES. To do this, click Menu – Matrix & Vector – Create – Matrix.
  4. Store the matrix under the name 'observed' [ctrl – var, then type observed].
  5. Conduct a Chi-Squared Test for Independence. To do this, click Menu – Statistics – Stat Tests – χ² 2-way Test.
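For reference, here are the standard formulas behind the χ² 2-way test. O_ij and E_ij are the observed and expected frequencies in row i, column j of the table (excluding totals); the χ²_calc formula in the formula booklet uses f_o and f_e for the same quantities. For a table with 2 rows and 3 columns, for example, df = (2 – 1)(3 – 1) = 2.

```latex
E_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{grand total}}
\qquad
\chi^2_{\text{calc}} = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\qquad
\text{df} = (\text{number of rows} - 1)(\text{number of columns} - 1)
```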

Let’s follow through a sample question to understand how this hypothesis test works.

In most cases, the question will provide the data in a contingency table:

          | Pasta | Fish | Shrimp | Total
Adults    | 24    | 25   | 32     | 81
Children  | 20    | 14   | 35     | 69
Total     | 44    | 39   | 67     | 150
The significance level is 10%.

Next, we can state the null and alternative hypotheses of this test:

H0: Customer age and chosen meal are independent.

H1: Customer age and chosen meal are not independent (they are dependent).

Next, enter the data into a matrix. SHORTCUT: [MENU – 7 – 1 – 1]. Then store it as 'observed' [ctrl – var, then type observed].

Now the test can be carried out. SHORTCUT: [MENU – 6 – 7 – 8]. Choose the observed matrix and press enter.

Conclusion: Since p-value > significance level (0.265 > 0.100), there is not enough evidence to reject the null hypothesis. We therefore conclude that customer age and chosen meal are independent.
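To see where the p-value comes from, here is a minimal Python sketch of the same test on the meal data (the function names are hypothetical). One assumption to flag: instead of a general χ² distribution function, it uses the closed-form tail probability that only works when the degrees of freedom are even, which happens to cover this 2 × 3 table (df = 2).

```python
import math

def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-squared variable with an EVEN number of degrees of freedom.

    Uses exp(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)^i / i!,
    which is exact only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 0.0          # term starts at (x/2)^0 / 0! = 1
    for i in range(df // 2):
        if i > 0:
            term *= half / i        # turns (x/2)^(i-1)/(i-1)! into (x/2)^i/i!
        total += term
    return math.exp(-half) * total

def chi2_independence(observed):
    """Chi-squared test for independence on a table of observed counts (no totals)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2_sf_even_df(chi2, df)

observed = [[24, 25, 32],   # Adults: Pasta, Fish, Shrimp
            [20, 14, 35]]   # Children: Pasta, Fish, Shrimp
chi2, df, p = chi2_independence(observed)
print(round(chi2, 3), df, round(p, 3))  # 2.658 2 0.265
```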


Chi-Squared Goodness of Fit (GOF) Test

The Chi-Squared GOF Test is used to determine whether data fits a particular distribution.

Here is a rough outline of how to conduct a Chi-Squared GOF test:

  1. Identify the null and alternative hypotheses, H0 and H1.
  2. Input the data into the Spreadsheet function. There will be 2 columns: Observed and Expected.
  3. The observed values will be provided in the question. Sometimes, the expected values will not be provided. If the null hypothesis says every category is equally likely, each expected value is the total of the observed values divided by the number of categories (columns).
  4. Determine the degrees of freedom: df = number of categories – 1.
  5. Conduct a Chi-Squared GOF test. To do this, click Menu – Statistics – Stat Tests – χ² GOF Test.
  6. Compare the p-value with the significance level.

Let’s follow through an example to better understand how to conduct a GOF test.

As with the other hypothesis tests, the data will be given in a table:

Day                   | Monday | Tuesday | Wednesday | Thursday | Friday
Number of copies sold | 74     | 97      | 91        | 86       | 112
The significance level is 5%.

First, let's state the null and alternative hypotheses:

H0: The same number of copies is sold each day.

H1: The number of copies sold is not the same every day.

Alternatively, you can express this in terms of the proportion of copies sold on each day, where P1 is the proportion sold on Monday, P2 on Tuesday, and so on:

H0: P1 = P2 = P3 = P4 = P5 = 20%

H1: At least one of the proportions is not equal to 20%

Then, we can determine the Degrees of Freedom (df):

df = 5 – 1 = 4

Next, input the data into a spreadsheet. Under H0, every day has the same expected value, which is the total number of copies divided by 5:

74 + 97 + 91 + 86 + 112 = 460

460 / 5 = 92


Now we can conduct the Chi-Squared GOF Test. SHORTCUT: [MENU – 4 – 4 – 7]. Input the observed list, the expected list, and the degrees of freedom.

Finally, we can draw a conclusion:

Since p-value > significance level (0.0736 > 0.05), there is not enough evidence to reject the null hypothesis, so we conclude that the number of newspapers sold does not depend on the day.
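Here is a similar minimal Python sketch for the newspaper data (the function names are hypothetical). The expected values assume H0, that each day is equally likely, and the p-value again uses the even-degrees-of-freedom closed form for the χ² tail probability, which works here because df = 4.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-squared variable with an EVEN number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i        # builds (x/2)^i / i! from the previous term
        total += term
    return math.exp(-half) * total

def chi2_gof(observed, expected):
    """Chi-squared goodness-of-fit statistic, df = categories - 1, and p-value."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, chi2_sf_even_df(chi2, df)

observed = [74, 97, 91, 86, 112]                      # Monday to Friday
total = sum(observed)                                 # 460
expected = [total / len(observed)] * len(observed)    # 92 each under H0
chi2, df, p = chi2_gof(observed, expected)
print(round(chi2, 3), df, round(p, 4))  # 8.543 4 0.0736
```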


Two-Sample t-test

The Two-Sample t-test is used to compare the means of 2 independent sets of data. There are other types of t-tests, but most IB questions use the Two-Sample t-test, so I will be focusing on that.

Here is a brief rundown on how to conduct a Two-Sample t-test:

  1. Identify the null and alternative hypotheses, H0 and H1.
  2. Input the given data into the Spreadsheet function.
  3. Conduct a Two-Sample t-test. To do this, click Menu – Statistics – Stat Tests – 2-Sample t Test.
  4. Input the 1st and 2nd lists, set the alternative hypothesis to match the question, and select 'Yes' for Pooled (this assumes the two populations have equal variances).
  5. Compare the p-value and significance level.
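For reference, these are the standard pooled two-sample t-test formulas, where x̄1 and x̄2 are the sample means, s1² and s2² are the sample variances (calculated with n – 1 in the denominator), and n1 and n2 are the sample sizes:

```latex
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
\qquad
\text{df} = n_1 + n_2 - 2
```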

Let’s follow through a sample question to better understand how the Two-Sample t-test works:

Weight of chinchilla rabbits (kg) | 4.9 | 4.2 | 4.1 | 4.4 | 4.3 | 4.6 | 4.0 | 4.7 | 4.5 | 4.4
Weight of sable rabbits (kg)      | 4.2 | 4.1 | 4.1 | 4.2 | 4.5 | 4.4 | 4.5 | 3.9 | 4.2 | 4.0
The significance level is 5%.

First, let's state the null and alternative hypotheses, where μ1 is the mean weight of chinchilla rabbits and μ2 is the mean weight of sable rabbits:

H0: μ1 = μ2 (chinchilla rabbits have the same mean weight as sable rabbits)

H1: μ1 > μ2 (chinchilla rabbits have a greater mean weight than sable rabbits)

Then, input the data into a spreadsheet, with one column for each type of rabbit.

Now we can conduct the Two-Sample t-test. SHORTCUT: [MENU – 4 – 4 – 4]. For this question, set the alternative hypothesis to μ1 > μ2, with the chinchilla weights as List 1 and the sable weights as List 2.

Finally, we can draw a conclusion:

Since p-value < significance level (0.0408 < 0.05), there is enough evidence to reject the null hypothesis, meaning that chinchilla rabbits have a greater mean weight than sable rabbits.
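Finally, here is a minimal Python sketch of the one-tailed pooled t-test on the rabbit data (the function names are hypothetical). Python's standard library has no t-distribution function, so, as an assumption of this sketch, the p-value is found by numerically integrating the t-distribution's density with Simpson's rule.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) = 0.5 minus the area under the density from 0 to t.

    The area is approximated with Simpson's rule (steps must be even).
    """
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2
        area += weight * t_pdf(k * h, df)
    return 0.5 - area * h / 3

def pooled_t_test_greater(sample1, sample2):
    """Pooled two-sample t-test with H1: mean of sample1 > mean of sample2."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled variance: assumes both populations have the same variance
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, t_upper_tail(t, df)

chinchilla = [4.9, 4.2, 4.1, 4.4, 4.3, 4.6, 4.0, 4.7, 4.5, 4.4]
sable = [4.2, 4.1, 4.1, 4.2, 4.5, 4.4, 4.5, 3.9, 4.2, 4.0]
t, df, p = pooled_t_test_greater(chinchilla, sable)
print(round(t, 3), df, round(p, 3))  # 1.845 18 0.041
```

With 2000 Simpson steps, the numerical error is far smaller than the 3 significant figures used in IB answers.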


Conclusion

I hope this short guide to correlation coefficients and hypothesis testing helps you, whether you are reviewing or re-learning these topics. I used actual IB sample questions to angle this blog towards us IB students, so I hope that helps. Anyway, thank you for reading!

For other review material for other IB subjects, please check out Prodat Blog, where we have review material for Economics, Business Management, and Psychology: https://prodatblog.org/ib-revision-guides/

  • M