🥸 Testing of Hypotheses

Hypothesis, Types, Important Terms

  • The estimate based on sample values does not, in general, equal the true value in the population, due to inherent variation in the population.
  • Different samples drawn from the same population will give different estimates of the true value. It has to be verified whether the difference between the sample estimate and the population value is due to sampling fluctuation or is a real difference.
  • If the difference is due to sampling fluctuation only, it can be safely said that the sample belongs to the population under question; if the difference is real, we have every reason to believe that the sample may not belong to the population under question.
  • The following are a few technical terms in this context.

Hypothesis

  • The assumption made about any unknown characteristic of a population is called a hypothesis.
  • It may or may not be true.
  • Ex:
    • μ = 2.3, where μ is the population mean
    • σ = 2.1, where σ is the population standard deviation
    • The population follows a normal distribution.
  • There are two types of hypotheses, namely the null hypothesis and the alternative hypothesis.

Null Hypothesis

  • A null hypothesis is a statement about the population parameters. Such a hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis and is denoted by H0.
  • Equivalently, any statistical hypothesis under test is called the null hypothesis, denoted by H0.

Ex.

  • H0: μ = μ0
  • H0: μ1 = μ2

Alternative Hypothesis

  • Any hypothesis, which is complementary to the null hypothesis, is called an alternative hypothesis, usually denoted by H1.

Ex:

  • H1: μ ≠ μ0
  • H1: μ1 ≠ μ2

Parameter

  • A characteristic of the population is known as a parameter, for example, the population mean (μ) and the population variance (σ²).
  • In practice, parameter values are usually not known, and estimates based on the sample values are used in their place.

Statistic

  • A characteristic of the sample values is called a statistic, for example, the sample mean (x̄) and the sample variance (s²), where

    x̄ = (Σ xᵢ)/n  and  s² = Σ (xᵢ − x̄)²/(n − 1)
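As a small illustrative sketch (not part of the original notes), these two statistics can be computed directly; the data values below are made up, and the variance uses the n − 1 divisor to match the formula above.

```python
# Minimal sketch: computing the sample mean and sample variance by hand.
# The data values are hypothetical, chosen only for illustration.
data = [2.1, 2.5, 1.9, 2.3, 2.7]

n = len(data)
x_bar = sum(data) / n                                   # sample mean x̄
s_sq = sum((x - x_bar) ** 2 for x in data) / (n - 1)    # sample variance s² (n − 1 divisor)

print(f"x̄ = {x_bar:.3f}, s² = {s_sq:.3f}")
```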

Sampling Distributions

  • The distribution of a statistic computed from all possible samples is known as sampling distribution of that statistic.

Standard Error

  • The standard deviation of the sampling distribution of a statistic is known as its standard error, abbreviated as S.E.

S.E. (x̄) = σ/√n

  • Where, σ = population standard deviation and n = sample size
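To connect the last two definitions, here is a rough simulation sketch: draw many samples from an assumed normal population (μ = 50 and σ = 10 are made-up values), compute x̄ for each, and compare the standard deviation of those means with σ/√n.

```python
import random
import statistics

# Hypothetical population: normal with μ = 50 and σ = 10 (assumed for illustration).
mu, sigma, n = 50, 10, 25
random.seed(1)

# Draw many samples of size n and record each sample mean:
# these means approximate the sampling distribution of x̄.
sample_means = []
for _ in range(10_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

# The standard deviation of the sampling distribution should be close to σ/√n.
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5
print(f"empirical S.E. ≈ {empirical_se:.3f}, theoretical σ/√n = {theoretical_se:.3f}")
```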

Sample

  • A finite subset of statistical objects in a population is called a sample and the number of objects in a sample is called the sample size.

Population

  • In a statistical investigation the interest usually lies in the assessment of the general magnitude and the study of variation with respect to one or more characteristics relating to objects belonging to a group. This group of objects under study is called population or universe.

Random sampling

  • If the sampling units in a population are drawn independently, each with an equal chance of being included in the sample, then the sampling is called random sampling. It is also referred to as simple random sampling and denoted SRS.
  • Thus, if the population consists of N units, the chance of selecting any particular unit is 1/N.
  • A theoretical definition of SRS is as follows:

Suppose we draw a sample of size n from a population of size N; then there are C(N, n) = N!/(n!(N − n)!) possible samples of size n. If every possible sample has an equal chance, 1/C(N, n), of being drawn, then the sampling is said to be simple random sampling.
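As a minimal sketch, assuming a small hypothetical population of N = 10 labelled units, `random.sample` draws a simple random sample without replacement and `math.comb` counts the C(N, n) equally likely samples described above.

```python
import math
import random

# Hypothetical population of N = 10 labelled units (values made up for illustration).
population = list(range(1, 11))
N, n = len(population), 3

# Number of possible samples of size n; each has probability 1 / C(N, n) of being drawn.
num_samples = math.comb(N, n)
print(f"C({N}, {n}) = {num_samples} possible samples, each with chance 1/{num_samples}")

# random.sample draws n distinct units, every unit equally likely to be included.
srs = random.sample(population, n)
print("one simple random sample:", srs)
```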

Simple Hypothesis

  • A hypothesis is said to be simple if it completely specifies the distribution of the population.
  • For instance, in the case of a normal population with mean μ and known standard deviation σ, a simple null hypothesis is of the form H0: μ = μ0; since σ is known, knowledge of μ is enough to determine the entire distribution.
  • For such a test, the probability of committing a Type-I error is exactly α.

Composite Hypothesis

  • If the hypothesis does not specify the distribution of the population completely, it is said to be a composite hypothesis.
  • Following are some examples:
    • H0 : μ ≤ μ0 and σ is known
    • H0 : μ ≥ μ0 and σ is known
  • All these are composite because none of them specifies the distribution completely.
  • Hence, for such a test the LOS is specified not as α but as ‘at most α’.

Types of Errors

  • In testing a statistical hypothesis there are four possible decisions:
    • Rejecting H0 when H0 is true
    • Rejecting H0 when H0 is false
    • Accepting H0 when H0 is true
    • Accepting H0 when H0 is false
  • The 1st and 4th possibilities lead to erroneous decisions.
  • Statisticians give these errors specific names, namely Type-I error and Type-II error respectively.
  • The above decisions can be arranged as follows:
    • Reject H0 when H0 is true → Type-I error
    • Reject H0 when H0 is false → correct decision
    • Accept H0 when H0 is true → correct decision
    • Accept H0 when H0 is false → Type-II error

Type I Error

  • The probability of a Type-I error is denoted by alpha (α).
  • It is the error of rejecting H0 when it is true, i.e., accepting H1 when H1 is false.

Type II Error

  • The probability of a Type-II error is denoted by beta (β).
  • It is the error of accepting H0 when it is false, i.e., rejecting H1 when it is true.
  • It is considered more severe than a Type-I error.
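The two error rates can be illustrated with a rough simulation sketch (assuming SciPy is available): a two-sided z-test of H0: μ = 50 is repeated many times, first with H0 actually true and then with a made-up alternative μ = 53. The rejection rate in the first case estimates α, and the acceptance rate in the second estimates β.

```python
import random
import statistics
from scipy.stats import norm

random.seed(0)
mu0, sigma, n, alpha = 50, 10, 25, 0.05      # hypothetical values for illustration
z_crit = norm.ppf(1 - alpha / 2)             # two-sided critical value

def rejects_h0(true_mu):
    """Draw one sample from N(true_mu, σ) and test H0: μ = μ0."""
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = (statistics.mean(sample) - mu0) / (sigma / n ** 0.5)
    return abs(z) >= z_crit

trials = 5_000
# Type-I error: H0 is true (μ = μ0) but the test rejects it; the rate should be near α.
type_1_rate = sum(rejects_h0(mu0) for _ in range(trials)) / trials
# Type-II error: H0 is false (μ = 53, an assumed alternative) but the test accepts it.
type_2_rate = sum(not rejects_h0(53) for _ in range(trials)) / trials
print(f"estimated α ≈ {type_1_rate:.3f}, estimated β ≈ {type_2_rate:.3f}")
```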

Degrees of Freedom

  • In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
  • The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it, is called number of degrees of freedom.
  • It is defined as the difference between the total number of items and the total number of constraints.
  • If ‘n’ is the total number of items and ‘k’ the total number of constraints then the degrees of freedom (d.f.) is given by

    d.f. = n - k
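For example, when the sample variance is computed, the n deviations (xᵢ − x̄) must sum to zero, which is one constraint (k = 1), giving d.f. = n − 1. A tiny sketch with made-up data:

```python
# Made-up observations, used only to illustrate degrees of freedom.
data = [4.0, 5.5, 6.0, 5.0, 4.5]
n = len(data)
x_bar = sum(data) / n

# The deviations about the mean are forced to sum to zero: one constraint, so k = 1.
deviations = [x - x_bar for x in data]
print(f"sum of deviations = {sum(deviations):.10f}")
print(f"degrees of freedom = n - k = {n} - 1 = {n - 1}")
```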

Level of Significance (LOS)

  • The maximum probability of committing a Type-I error is known as the level of significance, denoted by alpha (α).
  • Generally, a 5% level of significance (e.g., in field experiments) or a 1% level is used.
  • The level of significance is always fixed in advance, before collecting the sample information. A 5% LOS means that the conclusion reached will be correct in about 95 out of 100 cases and may be wrong in about 5 out of 100 cases.

Critical Value

  • While testing for the difference between the means of two populations, our concern is whether the observed difference is too large to believe that it has occurred just by chance.
  • But then the question is how much difference should be treated as too large? Based on sampling distribution of the means, it is possible to define a cut-off or threshold value such that if the difference exceeds this value, we say that it is not an occurrence by chance and hence there is sufficient evidence to claim that the means are different. Such a value is called the critical value and it is based on the level of significance.
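As an illustrative sketch (assuming SciPy is available and a normal sampling distribution), the critical value for a test at α = 0.05 can be read off with `scipy.stats.norm.ppf`.

```python
from scipy.stats import norm

alpha = 0.05

# Two-sided test: put α/2 in each tail of the standard normal distribution.
two_sided_crit = norm.ppf(1 - alpha / 2)
print(f"two-sided critical value at α = {alpha}: ±{two_sided_crit:.3f}")   # ≈ ±1.960

# One-sided test at the same level uses the upper tail only.
one_sided_crit = norm.ppf(1 - alpha)
print(f"one-sided critical value at α = {alpha}: {one_sided_crit:.3f}")    # ≈ 1.645
```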

Steps involved in test of hypothesis

  • The null and alternative hypotheses will be formulated
  • Test statistic will be constructed
  • Level of Significance will be fixed
  • The table (critical) value will be obtained from statistical tables for the chosen level of significance. The null hypothesis will be rejected at that level of significance if the value of the test statistic is greater than or equal to the critical value.
  • Otherwise, the null hypothesis will be accepted.
  • In the case of rejection, the variation in the estimates is called “significant”; in the case of acceptance, it is called “not significant”. These steps are sketched in the example below.
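A minimal sketch of these steps for a one-sample z-test with σ known (assuming SciPy is available); μ0 = 50, σ = 10, and the sample values are made-up numbers used only for illustration.

```python
from scipy.stats import norm

# Step 1: formulate hypotheses. H0: μ = μ0 versus H1: μ ≠ μ0 (two-sided).
mu0, sigma = 50, 10                                   # hypothesized mean, known σ (assumed)
sample = [53, 48, 57, 55, 51, 49, 60, 52, 54, 56]     # made-up sample values
n = len(sample)
x_bar = sum(sample) / n

# Step 2: construct the test statistic  z = (x̄ − μ0) / (σ/√n).
z = (x_bar - mu0) / (sigma / n ** 0.5)

# Step 3: fix the level of significance.
alpha = 0.05

# Step 4: find the critical value from the normal table.
z_crit = norm.ppf(1 - alpha / 2)

# Step 5: decide by comparing the test statistic with the critical value.
if abs(z) >= z_crit:
    print(f"z = {z:.3f} ≥ {z_crit:.3f}: reject H0; the difference is significant")
else:
    print(f"z = {z:.3f} < {z_crit:.3f}: accept H0; the difference is not significant")
```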

Confidence limit

  • The range within which the true population mean is expected to lie, at a stated level of confidence, is called the confidence interval; its end points are the confidence limits, also known as fiducial limits.
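A short sketch of computing 95% confidence limits for the mean, assuming σ is known and reusing the same made-up sample as above; the limits are x̄ ± z·σ/√n.

```python
from scipy.stats import norm

# Made-up sample and assumed known population standard deviation (illustration only).
sample = [53, 48, 57, 55, 51, 49, 60, 52, 54, 56]
sigma = 10
n = len(sample)
x_bar = sum(sample) / n

alpha = 0.05
z = norm.ppf(1 - alpha / 2)          # ≈ 1.96 for 95% confidence
se = sigma / n ** 0.5                # standard error of the mean

lower, upper = x_bar - z * se, x_bar + z * se
print(f"95% confidence limits for μ: ({lower:.2f}, {upper:.2f})")
```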
