First steps into forcing

William Gaudelier

Forcing is a technique invented by Paul Cohen in 1963, first used in set theory to prove that the continuum hypothesis is independent from the axioms of set theory (i.e. it can neither be proved nor disproved from them). It requires quite a bit of machinery, so the aim of this text is to provide mathematicians and university students with an entry point into forcing, with almost no prior knowledge of logic needed. Many things are not proved or made explicit, but I tried my best to outline what is going on and to justify the definitions. So it should be easier for you to dive into more detailed books (references are given at the end).


An informal recap on set theory and (first-order) logic

Set theory is often described as a rigorous foundation for mathematics. But in my opinion (and others') it is best seen as a theory which studies the notion of infinity. To do so it uses objects called sets. What these sets can do is described by a list of axioms based on our intuition of how they should behave; this list is called ZFC (Zermelo-Fraenkel + the axiom of Choice). A mathematical structure satisfying these axioms is called a model of set theory, just like a structure can satisfy the axioms that define a group.

The definitions of models, true statements and proofs are all made precise by the formalism of first-order logic. The latter also provides powerful theorems and tools, for example:

  1. The completeness theorem: a statement is provable from a theory if and only if it is true in every model of that theory; in particular, a theory is consistent if and only if it has a model.
  2. The Löwenheim-Skolem theorem: if a theory (expressed in a countable language) has an infinite model, then it also has a countable model.

A fundamental question quickly arose in set theory when Georg Cantor proved that the infinity of the natural numbers (written \(\aleph_0\)) was strictly smaller than the infinity of the reals (called the continuum, written \(2^{\aleph_0}\)): is there an in-between? The continuum hypothesis (CH), which Cantor believed to be true, is the statement answering "no" to this question; it can be written as \(2^{\aleph_0}=\aleph_1\), where \(\aleph_1\) is the first uncountable cardinal (and \(\aleph_2\) is the first strictly bigger than \(\aleph_1\), and so on...). But despite its simple appearance, no one managed to prove or disprove it.

Skolem's paradox and absoluteness

The last assertion I made about the Löwenheim-Skolem theorem might have shocked you. Indeed it implies that if ZFC has a model at all, then it also has a countable model. But there are uncountable sets like \(\mathcal{P}(\mathbb{N})\), so how could that be? Recall that a set \(A\) is uncountable if there is no bijection between \(\mathbb{N}\) and \(A\). We actually have to distinguish two things: the point of view of the model, and the point of view from outside of it. From outside the model we can see that it is countable, but from inside we can't: even if you can see from the exterior that \(A\) is countable, there is in fact no bijection between \(A\) and \(\mathbb{N}\) inside the model. The same sort of reasoning applies to similar situations, involving the powerset for example. Do not hesitate to pause and ponder over all this, as it is quite mind-boggling; you can also search online for explanations from other people.

In model theory, different models of the same theory may not agree on what is true and what is not; this applies to ZFC as well. For example, consider the structures \(\langle\{-1, 1\}, \times\rangle\) and \(\langle\mathbb{Q}^*, \times\rangle\) (the nonzero rationals), both satisfying the axioms of groups. For the former the sentence \(\forall x, x^2=1\) is true, but not for the latter. Formulas whose truth value is the same across all models are called absolute. Absoluteness is important to point out when you need to be formal, but in this text I will try not to get into that, to keep things a bit simpler.

The idea of forcing

Around 1940, Gödel proved (via inner models) that if ZFC is consistent then so is ZFC + CH. In other words, one cannot disprove CH from ZFC: otherwise ZFC + CH would prove both CH and ¬CH (the negation of CH), and thus would not be consistent. So two possibilities remain: either ZFC proves CH, or CH cannot be proved either, in which case CH would be independent from ZFC. For the second option we need to prove that if ZFC is consistent then so is ZFC + ¬CH. Thanks to the completeness theorem, we can equivalently show that if there is a model of ZFC then there is a model of ZFC + ¬CH.

So suppose we have a model \(M\) of ZFC, what should we do? Refine Gödel's method maybe? Unfortunately I've been told it cannot work... But in that case maybe we can have more luck with extensions. An analogy of the situation can be made with field extensions. If \(K\) is a field in which a root \(\alpha\) of a polynomial \(P\) is missing, we can construct an extension \(K(\alpha)\) of \(K\) in which \(P(\alpha)=0\) this time. This construction can be controlled "from the ground" by considering the polynomial ring \(K[X]\) and then quotienting (by the ideal generated by \(P\), when \(P\) is irreducible). The trouble is, the axioms of ZFC are much more complicated than the ones for fields; a substantial part of mathematics can be carried out with them, after all. So it is not possible to just adjoin an arbitrary set and expect to get another model of ZFC every time.

But in fact, and surprisingly, you can find sets \(G\) such that \(M[G]\) remains a model of ZFC, and such that the properties of \(M[G]\) have been controlled from the ground model \(M\)! This is what the forcing technique manages to do, by using a set (in the ground model) of approximations of the desired object, and a relation \(\Vdash\), called the forcing relation, relating the truth value of statements in \(M[G]\) to these approximations. To set all of this up, we are going to suppose that \(M\) is a countable transitive model (which is possible thanks to techniques like Löwenheim-Skolem and the Mostowski collapse); we will see what this means and explain why later.

An introductory example: Cantor's diagonal argument

The proof of this theorem can be seen as a simple application of the genericity used in forcing; we will see how later, but first let's prove it.

There are more real numbers (represented as \(2^\mathbb{N}\), because \(\mathbb{R}\) is in bijection with \(]0, 1[\), and members of \(]0, 1[\) can be written in binary, e.g. \(0.010101\ldots\)) than natural ones, i.e. there is no surjection from \(\mathbb{N}\) to \(2^\mathbb{N}\).

By contradiction, suppose \(F:\mathbb{N}\to2^\mathbb{N}\) is such a surjection. We will show that \(F\) is actually not surjective by finding a function \(g\in2^\mathbb{N}\) which is not listed by \(F\).

To do so we can try to construct one, making sure it is different from all the functions listed by \(F\). Since this seems like a cumbersome task all at once, we would rather proceed one function at a time, by making sure the following requirement is satisfied for all \(n\): \[\mathcal{R}_n := g\neq F(n)\] This can be spelled out a tad further: \(\mathcal{R}_n := \exists x, g(x)\neq F(n)(x)\). And to make sure each of them is fulfilled, we are going to define \(g\) one value at a time by induction, ensuring \(\mathcal{R}_n\) is fulfilled at step \(n\).

We start with the empty function for the initialisation (step -1). For the induction step, suppose \(g\) is defined on \(\{0, \ldots, n-1\}\); we wish to extend \(g\) so that it is also defined at \(n\) and fulfils \(\mathcal{R}_n\). Maybe \(g\) is already different (on its domain) from \(F(n)\), but maybe not. In all cases we can just define: \[g(n):=\begin{cases}0&\text{ if }F(n)(n)=1\\ 1&\text{ otherwise}\end{cases}\] Or in a more concise manner: \(g(n):=1-F(n)(n)\).
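If you like seeing things computationally, here is a minimal Python sketch of this diagonal step; encoding \(F\) as a function that takes \(n\) and returns a function is my own illustration, not part of the proof.

```python
# A minimal sketch of the diagonal step, assuming F is encoded as a Python
# function that takes n and returns a function from N to {0, 1}.

def diagonal(F):
    """Return g with g(n) = 1 - F(n)(n), so g differs from every F(n) at n."""
    return lambda n: 1 - F(n)(n)

# Toy example: F(n) is the constant function with value n % 2.
F = lambda n: (lambda x: n % 2)
g = diagonal(F)
assert all(g(n) != F(n)(n) for n in range(10))  # requirement R_n holds here
```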

Before proceeding, here's another way to look at the induction: write \(g_n\) for \(g\) as it is defined at the \(n\)-th step of the induction; we shall call it an approximation. We know that it has the finite domain \(\{0, \ldots, n\}\), and also that \(g_{n+1}\) is an extension of \(g_n\). Using these two facts, the union \(\bigcup g_n\) (in set theory a function \(f\) is represented as the set of pairs \((x, y)\) where \(x\) is in the domain of \(f\) and \(y\) is its unique image) is a well-defined function from \(\mathbb{N}\) to \(\{0, 1\}\), and it is in fact a definition of \(g\). And since each \(g_n\) satisfies the requirements up to \(\mathcal{R}_n\), we have that \(g\) satisfies all of them.

Analysing the proof

To see how this relates to forcing we are going to analyse the proof a bit to unveil some definitions. We can distinguish three types of objects here:

  1. The "limit objects" we wish to construct, in our case elements of \(\mathbb{N}\to\{0, 1\}\) such as \(g\)
  2. Their approximations, such as the functions \(g_n\)
  3. And the formulas we want to satisfy, such as the requirements \(\mathcal{R}_n\)

Formulas and limit objects are going to be "replaced" by sets of approximations later on, because we will have to manipulate them in the ground model \(M\) to construct the extension. It is good to keep in mind what they represent though.

Forcing notion

The finite approximations have a certain structure; some can be extensions of others, like \(0, 1\mapsto 0\) extending \(0\mapsto 0\), but others can also be incompatible, like \(0\mapsto 0\) and \(0\mapsto 1\). They form a partial order (\(f\leqslant g\) iff \(\operatorname{dom}f\subseteq\operatorname{dom}g\) and \(g_{\upharpoonright\operatorname{dom}f}=f\)), and moreover this order contains a smallest element (\(m\) is a smallest element of a set \(S\) iff \(\forall s\in S, m\leqslant s\)): the empty function. All this motivates our first definition.

A forcing notion is a partial order \(\langle\mathbb{P}, \leqslant\rangle\) (actually it can just be a preorder, i.e. a relation which is reflexive and transitive, so not necessarily antisymmetric) containing a smallest element \(\varnothing_\mathbb{P}\). Its elements are called conditions.
In several textbooks the order is reversed, the reason being that if \(p\) brings more information than \(q\), then there are fewer limit objects extending \(p\) than extending \(q\); consequently a greatest element is required instead of a smallest one.

In our case we had the forcing notion of partial functions from \(\mathbb{N}\) to \(\{0, 1\}\) with finite domain, \(\{f:\mathbb{N}\to\{0, 1\}\mid\operatorname{dom}f\text{ is finite}\}\); it is called Cohen forcing.
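To make the structure concrete, here is a small Python sketch; representing a condition as a dict is my own toy encoding, not part of the mathematics.

```python
# A sketch of Cohen forcing: a condition (a partial function from N to
# {0, 1} with finite domain) is represented as a Python dict.

def extends(q, p):
    """q >= p in our order: q carries at least all the information of p."""
    return all(x in q and q[x] == v for x, v in p.items())

def compatible(p, q):
    """p and q have a common extension iff they agree on shared inputs."""
    return all(q[x] == v for x, v in p.items() if x in q)

p, q, r = {0: 0}, {0: 0, 1: 0}, {0: 1}
assert extends(q, p)                       # 0,1 |-> 0 extends 0 |-> 0
assert not compatible(p, r)                # 0 |-> 0 and 0 |-> 1 clash at 0
assert all(extends(s, {}) for s in (p, q, r))  # the empty function is smallest
```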

Ideals

As we saw earlier, to construct \(g\) from the approximations \(g_n\), it was crucial for them to be pairwise compatible. To be more precise, we needed any two approximations to have a common extension. Moreover, to avoid ending up with a trivial limit object, the collection cannot be empty. This leads us to our new representation of limit objects:

A subset \(G\) of a forcing notion \(\mathbb{P}\) is an ideal if:
  1. \(G\neq\varnothing\)
  2. It is directed: if \(p, q\in G\) then there exists \(r\in G\) such that \(r\geqslant p, q\)
  3. It is downward closed: if \(p\in G\) and \(p\geqslant q\) then \(q\in G\)
We can now say "\(p\) is an approximation of \(G\)" simply by writing \(p\in G\).
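Here is a sketch checking the three axioms on a finite family of Cohen conditions, with the same dict encoding as before; the helper restrictions is mine, introduced only to test downward closure.

```python
from itertools import combinations

def extends(q, p):  # same order as in the previous sketch
    return all(x in q and q[x] == v for x, v in p.items())

def restrictions(p):
    """All conditions below p, i.e. all restrictions of p to sub-domains."""
    items = list(p.items())
    return [dict(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

def is_ideal(G):
    """Check the three axioms on a finite list G of Cohen conditions."""
    nonempty = len(G) > 0
    directed = all(any(extends(r, p) and extends(r, q) for r in G)
                   for p in G for q in G)
    downward = all(q in G for p in G for q in restrictions(p))
    return nonempty and directed and downward

assert is_ideal(restrictions({0: 0, 1: 1}))  # all approximations of one condition
assert not is_ideal([{0: 0}, {0: 1}])        # incompatible, hence not directed
```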

Density

Our induction worked because at each step we had the possibility to extend \(g_n\), which did not necessarily satisfy \(\mathcal{R}_{n+1}\), into a function satisfying it. Requirements possessing this property are the ones we are interested in, due to their flexibility: we can satisfy them at any time during a construction.

Now a requirement \(\mathcal{R}\) is represented by the subset \(R:=\{p\in\mathbb{P}\mid\mathcal{R}\text{ is satisfied by }p\}\subseteq\mathbb{P}\); this way, any subset of \(\mathbb{P}\) represents a requirement. Notice how "a condition satisfies \(\mathcal{R}\)" and "an ideal satisfies \(\mathcal{R}\)" can now be written simply \(p\in R\) and \(G\cap R\neq\varnothing\). This leads us to:
A set \(D\subseteq\mathbb{P}\) is dense iff for any \(p\in\mathbb{P}\) there is a \(q\in D\) extending \(p\).
Furthermore, to formalise what it means for a limit object to satisfy all given requirements, we have the following:
Let \(\mathbb{P}\) be a forcing notion, and \(\mathcal{D}:=(D_i)_{i\in I}\) be an indexed family of dense sets. An ideal \(G\) is \(\mathcal{D}\)-generic if it intersects all the dense sets, i.e. \(\forall i\in I, G\cap D_i\neq\varnothing\)
We also say that \(G\) is \(\mathbb{P}\)-generic over \(M\) when it is generic for the family of all dense subsets of \(\mathbb{P}\) that belong to \(M\).
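As a toy illustration, density can be tested by brute force on a finite fragment of Cohen forcing; the fragment conds (conditions with domain included in \(\{0, 1\}\)) and the helper names are my own, chosen just to fix the definition.

```python
# Density, brute-forced on the finite fragment of Cohen forcing whose
# conditions have domain included in {0, 1}.
conds = [{}, {0: 0}, {0: 1}, {1: 0}, {1: 1},
         {0: 0, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 0}, {0: 1, 1: 1}]

def extends(q, p):
    return all(x in q and q[x] == v for x, v in p.items())

def is_dense(D):
    """Every condition has an extension lying in D."""
    return all(any(extends(q, p) for q in D) for p in conds)

assert is_dense([p for p in conds if 0 in p])             # "0 is in the domain"
assert not is_dense([p for p in conds if p.get(0) == 1])  # {0 |-> 0} can't reach it
```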

Why this name, "generic"? When studying the sets that made adjunction to the ground model impossible, Cohen realised that they all had specific features. So he had the idea that adjoining a set with no particularities could make it work. Here is how he puts it:

"Now one can trivially adjoin an element already in M. To test the intuition, one should try to adjoin to M an element which enjoys no “specific” property to M, i.e., something akin to a variable adjunction to a field. I called such an element a “generic” element. Now the problem is to make precise this notion of a generic element." (exctract of The origin of forcing)

With that in mind we can see how being \(\mathbb{P}\)-generic over \(M\) captures this idea, as it intuitively means "possessing all the properties you can have in \(M\)".

With all those definitions behind us, we can now state a simple but powerful lemma.
: Existence lemma
Let \(\mathbb{P}\) be a forcing notion, \(\mathcal{D}:=(D_n)_{n\in\mathbb{N}}\) a countable family of dense sets, and \(p\) a condition. Then there exists a \(\mathcal{D}\)-generic ideal \(G\) containing \(p\).
We proceed by induction, starting with \(p_{-1}:=p\) and finding \(p_n\in D_n\) such that \(p_n\geqslant p_{n-1}\), by density of \(D_n\). The set \(\{p_{-1}, p_0, \ldots\}\) is not necessarily downward closed though, so we must be careful and define \(G:=\{q\in\mathbb{P}\mid \exists n, q\leqslant p_n\}\) as our \(\mathcal{D}\)-generic ideal.
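Computationally, the proof suggests representing each dense set \(D_n\) by a witness function sending any condition to an extension inside \(D_n\) (which is exactly what density provides). Here is a sketch of the induction under that assumption; the toy order at the end is mine.

```python
def generic_chain(p, meets):
    """p_{-1} := p, then p_n := meets[n](p_{n-1}); return the whole chain.
    The D-generic ideal is then G := {q : q <= p_n for some n}."""
    chain = [p]
    for meet in meets:
        chain.append(meet(chain[-1]))
    return chain

# Toy run: conditions are natural numbers ordered by <=, with smallest
# element 0, and D_n := {k : k >= n}, whose witness is p |-> max(p, n).
chain = generic_chain(0, [lambda p, n=n: max(p, n) for n in range(5)])
assert chain == [0, 0, 1, 2, 3, 4]
```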

Cantor's diagonal argument revisited

We can now give an alternative proof of the result, using all the forcing paraphernalia we have developed so far:

Let \(\mathbb{P}\) be Cohen forcing, and let \[ D_n := \{p\in\mathbb{P}\mid \exists x, p(x)\neq F(n)(x)\}\\ E_m := \{p\in\mathbb{P}\mid m\in\operatorname{dom}p\}\] These sets are dense. Indeed, for any \(D_n\) and \(p\in\mathbb{P}\), since \(p\) has finite domain we can find \(x\notin\operatorname{dom}p\) and then form the extension \(q:=p\cup\{(x, 1-F(n)(x))\}\in D_n\). And for any \(E_m\) and \(p\in\mathbb{P}\), either \(m\) is already in \(\operatorname{dom}p\), or it is not and we can form the extension \(q:=p\cup\{(m, 0)\}\in E_m\).

Thus we can use the existence lemma to get an ideal \(G\) generic for all the \(D_n\) and \(E_m\), and set \(g:=\bigcup G\). By \(D\)-genericity \(g\) is not in the range of \(F\), and by \(E\)-genericity it is defined on all of \(\mathbb{N}\).
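Putting the pieces together gives a runnable version of this proof for finitely many requirements; as before, each dense set is represented by a witness function, and the particular F below is a toy stand-in for the purported surjection.

```python
def meet_D(n, F):
    """Witness of density for D_n: extend p to disagree somewhere with F(n)."""
    def ext(p):
        if any(p[x] != F(n)(x) for x in p):
            return p                       # p is already in D_n
        x = max(p, default=-1) + 1         # an input outside dom(p)
        return {**p, x: 1 - F(n)(x)}
    return ext

def meet_E(m):
    """Witness of density for E_m: put m into the domain."""
    return lambda p: p if m in p else {**p, m: 0}

N = 8                                      # check the first N requirements
F = lambda n: (lambda x: (n * x) % 2)      # toy "surjection" to diagonalise
meets = [meet_D(n, F) for n in range(N)] + [meet_E(m) for m in range(N)]

g = {}
for meet in meets:                         # the existence-lemma induction
    g = meet(g)                            # g ends up as the union of the chain
assert all(any(g[x] != F(n)(x) for x in g) for n in range(N))  # D-genericity
assert all(m in g for m in range(N))                           # E-genericity
```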

Names

Due to the axioms of ZFC, adding \(G\) to our ground model will also add all the sets that can be built from it. So we need to be able to know what those "stowaway" sets are while we are building \(G\) from approximations, in order to influence its construction, and thus get \(M[G]\) to be a model of ZFC + our desired property. To achieve all this while remaining in our ground model \(M\), a bit of work is needed, and this is what names are for.

Names are basically recipes or blueprints which use an unknown ideal to describe a set. You can also keep in mind the analogy with polynomials, whose value is only determined once the values of the unknowns are given. For example: \[A:=\{n\in\mathbb{N}\mid 2n\in G\}\quad\text{or}\quad B:=\begin{cases}G&\text{ if }\varnothing\in G\\\varnothing&\text{otherwise}\end{cases}\] Notice how you can describe these sets without knowing what \(G\) contains: you don't know their actual value until you know \(G\). Besides, you can use previously defined recipes to define new ones, much like polynomials, which you can add or multiply together to get new ones. For example: if \(A\) and \(B\) are defined recipes then you can define \(C:=\{A, B\}\)

So a name is a set which contains potential elements; the ones we end up keeping will depend on the ideal we have. Thus we need a way to keep track of that for each potential element. A simple solution would be to have some sort of labelling, where labels represent ideals. We can't use ideals directly though, because we will ultimately use generic ideals, which are not in our ground model (indeed if \(G\in M\) then so is \(\mathbb{P}-G\), and you can check that it is dense, so it intersects \(G\)... a contradiction). But there is no problem in using their approximations, the conditions, which leads us to the following definition:

A \(\mathbb{P}\)-name is a set whose elements are pairs of the form \((\sigma, p)\), where \(\sigma\) is a \(\mathbb{P}\)-name and \(p\in\mathbb{P}\). (This seemingly circular definition is legitimate: it is a definition by \(\in\)-recursion, with the empty set as base case.)
The definition of the evaluation of names follows quite simply:
Let \(\mathbb{P}\) be a forcing notion, \(G\) an ideal, and \(\tau\) a \(\mathbb{P}\)-name. The evaluation of \(\tau\) is the set: \[\tau[G]:=\{\sigma[G]\mid\exists p\in G, (\sigma, p)\in\tau\}\]
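Evaluation is easy to sketch in code, representing a name as a frozenset of (name, condition) pairs; the toy conditions "a", "b" and the tiny sets passed as G below are illustrative only (a real \(G\) would be a generic ideal).

```python
def evaluate(tau, G):
    """tau[G] := { sigma[G] : (sigma, p) in tau for some p in G }."""
    return frozenset(evaluate(sigma, G) for sigma, p in tau if p in G)

empty = frozenset()                  # a name that always evaluates to {}
one = frozenset({(empty, "a")})      # contains {} provided "a" gets into G
tau = frozenset({(empty, "a"), (one, "b")})

assert evaluate(tau, {"a"}) == frozenset({empty})            # i.e. { {} }
assert evaluate(tau, {"a", "b"}) == frozenset({empty, frozenset({empty})})
```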
From there the new model can be defined: \[M[G]:=\{\tau[G]\mid\tau\text{ is a }\mathbb{P}\text{-name in }M\}\] And at that point we could check that, given a generic ideal \(G\), \(M[G]\) is the smallest transitive model of ZFC containing \(M\cup\{G\}\).

The forcing relation

When we proved Cantor's theorem, there was no problem in establishing that the limit object would satisfy the same requirements as its approximations. This is because we could prove that all the limit objects you could get starting from a certain approximation \(p\) would satisfy the same requirements. But in more general situations things can be more complicated; this is why we need the forcing relation. Its definition corresponds to what we just described:

Let \(\mathbb{P}\) be a forcing notion. Given a formula \(\varphi\), a condition \(p\) and \(\mathbb{P}\)-names \(\tau_0, \ldots, \tau_n\), we say that "\(p\) forces \(\varphi(\tau_0, \ldots, \tau_n)\)" and write \[p\Vdash\varphi(\tau_0, \ldots, \tau_n)\] iff for every \(\mathbb{P}\)-generic ideal \(G\) such that \(p\in G\), we have \(M[G]\models\varphi(\tau_0[G], \ldots, \tau_n[G])\).
\(M\models\varphi\) means that the structure \(M\) satisfies the formula \(\varphi\).

This definition is great, but it depends on all possible \(G\), and such generic sets are not in \(M\). This means that from within \(M\) we cannot know whether \(p\) forces \(\varphi\) or not.

To address this issue, the idea consists of defining a syntactic equivalent of the forcing relation, i.e. we define a relation \(p\Vdash^*\varphi\) by induction on \(\varphi\) (in first-order logic, formulas are defined by an inductive process, so by following the same process we can define things that apply to all formulas). We can then verify the following:
: Truth lemma \[ M[G]\models\varphi(\tau_0[G], \ldots, \tau_n[G]) \iff \exists p\in G, (p\Vdash^*\varphi(\tau_0, \ldots, \tau_n))^M \] (where the \(M\) in exponent basically means the formula is seen from the point of view of \(M\))
This equivalence means that the formulas satisfied by the structure \(M[G]\) are precisely the ones forced (in \(M\)) by this new relation. This is exactly what we needed to control \(M[G]\) from \(M\). Finally, the main theorem of forcing establishes the equivalence between \(\Vdash\) and \(\Vdash^*\):
: Main theorem of forcing \[p\Vdash\varphi(\tau_0, \ldots, \tau_n)\iff(p\Vdash^*\varphi(\tau_0, \ldots, \tau_n))^M\]
This means that we can use the property of the truth lemma, without having to tediously work with the syntactic relation \(\Vdash^*\).

Forcing ¬CH

The basic idea to tackle the problem is to add \(\aleph_2\)-many different elements of \(2^\mathbb{N}\) to our ground model \(M\). Consequently, using the truth lemma, we will have \(|2^\mathbb{N}|\geqslant\aleph_2>\aleph_1\) in \(M[G]\) (which is a model of ZFC since \(G\) is generic), hence refuting CH.

To do so we are going to use the existence lemma on the Cohen forcing \(\mathbb{P}:=\{f:\aleph_2^M\times\mathbb{N}\to\{0, 1\}\mid\operatorname{dom}f\text{ is finite}\}\) (finite partial functions again) to find an \(M\)-generic ideal \(G\); this is possible because \(M\) is countable. Then \(\bigcup G\) is a total function encoding \(\aleph_2^M\) pairwise different elements of \(2^\mathbb{N}\), because the following sets are dense (for \(\alpha\neq\beta\)): \[ D_{\alpha, \beta}:=\{p\in\mathbb{P}\mid\exists n\in\mathbb{N}, p(\alpha, n)\neq p(\beta, n)\}\\ E_{\alpha, n}:=\{p\in\mathbb{P}\mid (\alpha, n)\in\operatorname{dom}p\}\]
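Here is a toy sketch of this density argument, with a small finite stand-in for \(\aleph_2^M\) (which is of course faking the whole point: a real construction needs genuine genericity over \(M\)); conditions are dicts keyed by pairs (alpha, n), and the dense sets are again given by witness functions.

```python
def meet_D(alpha, beta):
    """Witness for D_{alpha,beta}: make rows alpha and beta differ somewhere."""
    def ext(p):
        if any((alpha, n) in p and (beta, n) in p
               and p[(alpha, n)] != p[(beta, n)] for _, n in p):
            return p
        n = 1 + max((m for _, m in p), default=-1)  # a fresh column
        return {**p, (alpha, n): 0, (beta, n): 1}
    return ext

def meet_E(alpha, n):
    """Witness for E_{alpha,n}: put (alpha, n) into the domain."""
    return lambda p: p if (alpha, n) in p else {**p, (alpha, n): 0}

A = range(4)                          # toy stand-in for aleph_2^M
meets = ([meet_D(a, b) for a in A for b in A if a < b] +
         [meet_E(a, n) for a in A for n in range(3)])

p = {}
for meet in meets:                    # the existence-lemma induction
    p = meet(p)
cols = 1 + max(m for _, m in p)
rows = [tuple(p.get((a, n), 0) for n in range(cols)) for a in A]
assert len(set(rows)) == len(A)       # the encoded functions are pairwise distinct
```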

A last hurdle: cardinal collapse

Being a cardinal is not an absolute notion, so it is possible that \(\aleph_2^M\neq\aleph_2^{M[G]}\), which would invalidate our proof. Luckily for us, Cohen forcing possesses a property which ensures that \(\aleph_2^M=\aleph_2^{M[G]}\):
A partial order \(\mathbb{P}\) satisfies the countable chain condition (ccc) iff every antichain of \(\mathbb{P}\) is countable. An antichain is a subset \(A\subseteq\mathbb{P}\) whose elements are pairwise incompatible, i.e. \(\forall p, q\in A, p\neq q\implies (\forall r\in\mathbb{P}, r\not\geqslant p\text{ or }r\not\geqslant q)\).
If \(\mathbb{P}\in M\) and \((\mathbb{P}\text{ has the ccc})^M\) then \(\mathbb{P}\) preserves cardinals.
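For Cohen forcing, incompatibility is easy to characterise: two conditions are incompatible exactly when they disagree on a shared input. A quick sketch of the antichain test, on the dict encoding used throughout:

```python
def incompatible(p, q):
    """No common extension: p and q disagree on some shared input."""
    return any(x in q and q[x] != v for x, v in p.items())

def is_antichain(A):
    return all(incompatible(p, q) for p in A for q in A if p is not q)

assert is_antichain([{0: 0}, {0: 1}])            # a small antichain
assert not is_antichain([{0: 0}, {0: 0, 1: 1}])  # comparable, hence compatible
```

Proving that Cohen forcing actually has the ccc takes more work; the usual proof goes through the Δ-system lemma.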

Further notes and resources

We made the assumption that our model was countable and transitive, which was the historical approach, but other approaches have been developed since. One of them, attributed to Scott and Solovay, is based on Boolean-valued models and doesn't require the ground model to be countable and transitive. It is well covered in Thomas Jech's Set Theory and Timothy Chow's A beginner's guide to forcing. Note also that nowadays forcing is not restricted to set theory anymore: it has reached many other fields, has connections with topology, and there are many things I omitted, like Martin's axiom or iterated forcing for instance.

Here are some resources I gathered on the subject; I used most of them.

Disclaimer

I have an MSc in logic but I am not an expert in set theory or even forcing. The present document was produced, in a hurry, for the first Summer of Math Exposition. I'm not entirely satisfied with it, so if you find a mistake, have a suggestion or a comment, you are more than welcome to contact me at william.gaudelier@gmail.com