# Blog

Mathematics

## Linear Algebra: Introduction

[latexpage]

The quintessential linear algebra problem will ask for the solution of a set of linear equations.

• Example: Find the solution $(x,y)$ for the linear system

$\begin{matrix} 3x-y=2 \\ 2x+3y=4 \end{matrix}$

• There are two ways we can interpret the solution to these equations:
• The point(s) at which these equations intersect when plotted as a line in $\mathbb{R}^2$
• For our example, we can rewrite each equation in our system as a function $y=f(x)$ and plot each function on $\mathbb{R}^2$ to visualize this interpretation of the solution:
• If we rewrite the system as a vector equation, the solution becomes the set of scalars that multiply the respective column vectors on the left-hand side of the vector equation so that the result equals the right-hand side
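As a quick sanity check, a system like this can be solved numerically. Below is a minimal sketch using NumPy (assuming the system is $3x-y=2$, $2x+3y=4$):

```python
import numpy as np

# Coefficient matrix A and right-hand side b for the system
#   3x - y  = 2
#   2x + 3y = 4
A = np.array([[3.0, -1.0],
              [2.0,  3.0]])
b = np.array([2.0, 4.0])

# Solve A @ [x, y] = b
solution = np.linalg.solve(A, b)
x, y = solution
print(x, y)  # the point where the two lines intersect
```

The same solution is both the intersection point of the two lines and the pair of scalars in the column-vector interpretation.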

Don’t worry, I will be adding more content to this page in the near future

## Sources

• Linear Algebra and its Applications, 3rd Edition. Gilbert Strang (1986)

[latexpage]

## Introduction – Toothpaste Brand Example

Consider the following scenario:

• Brand T is a toothpaste company that controls 20% of the toothpaste market
• A market researcher predicts the following effects of an ad campaign:
• A consumer using brand T will continue using brand T with a 90% probability
• A consumer not using brand T will switch to brand T with a 70% probability
• For any given customer in the toothpaste market, let…
• $T$ denote a state of using brand T
• $T’$ denote the state of using a toothpaste brand other than T

Definitions:

• A transition diagram is a weighted directed graph whose nodes denote the various states a system can exist in and whose edges denote the probability of the system transitioning from one state to another after some time step $\Delta t$
• Transition diagram describing our toothpaste brand example:

• A transition probability matrix $\mathrm{P}$ is an $n \times n$ matrix whose $(i,j)$ entry gives the probability that a “sample” in the $i$th state of an $n$-state system will transition to its $j$th state
• Importantly, the entries of each row of $\mathrm{P}$ must sum to 1
• Here is the transition probability matrix describing our toothpaste brand example:

$\mathrm{P} = \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}$

• An initial state distribution matrix $\mathrm{S_0}$ for an $n$-state system is an $n$-dimensional row vector whose $i$th entry denotes the percentage of “samples” that are in state $i$ at time $t=0$
• Here is the initial state distribution matrix for our toothpaste example:

$\mathrm{S_0} = \begin{bmatrix} 0.2 & 0.8 \end{bmatrix}$

• The first ($i=1$) entry indicates that 20% of customers in the toothpaste market use brand T (state $T$) and the second ($i=2$) entry indicates that 80% use brand T' (state $T'$)
• A probability tree gives the probability that a “sample” will transition to some state after some time
• For our toothpaste example, let's construct a probability tree that tells us the probability a person will use brand T vs. T' after one month

• In order to determine the probability a customer is using brand T after one month, we need to sum the products of the transition probabilities along every path that ends in state T (Fig 5)
• Here, there are two possible state transition sequences that end in state T (highlighted in orange and green):
• The orange path gives the probability that a customer will use toothpaste brand T for the entire month ⇒ $P(T \rightarrow T) = (0.2)(0.9) = 0.18$
• The green path gives the probability of a customer using brand T' at the beginning of the month, then switching to brand T at the end of the month ⇒ $P(T' \rightarrow T) = (0.8)(0.7) = 0.56$
• Now, we can sum the probabilities of each possible transition sequence ending in state $T$ in order to determine the total probability a customer will be using brand T by the end of the month: $P(T) = P(T \rightarrow T) + P(T' \rightarrow T) = 0.18 + 0.56 = 0.74$
• There is a 74% chance a random customer in the toothpaste market will be using brand T by the end of the month
• Since we only have two possible states in our system, subtracting $P(T)$ from 1 will give us the probability a random customer will be using brand T' by the end of the month: $P(T') = 1 - P(T) = 1 - 0.74 = 0.26$
• Importantly, we can construct a state distribution matrix giving the probabilities for the toothpaste brand a randomly sampled customer will be using after one month:

$\mathrm{S_1} = \begin{bmatrix} 0.74 & 0.26 \end{bmatrix}$

Theorem: $\mathrm{S_i} \cdot \mathrm{P} = \mathrm{S_{i+1}}$

• Specifically, this says that the product of the state distribution matrix for the $i$th time step and the corresponding transition probability matrix returns the state distribution matrix for the $(i+1)$th time step
• Let's check that this holds true for our example:

$\mathrm{S_0} \cdot \mathrm{P} = \begin{bmatrix} 0.2 & 0.8 \end{bmatrix} \cdot \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix} = \begin{bmatrix} (0.2)(0.9) + (0.8)(0.7) & (0.2)(0.1) + (0.8)(0.3) \end{bmatrix} = \begin{bmatrix} 0.74 & 0.26 \end{bmatrix} = \mathrm{S_{1}}$

• Results agree with our probability tree!

Assuming $\mathrm{P}$ remains valid, we can determine the expected state distribution matrix for any time step (i.e., month)

• State distribution matrix for second time step:

$\mathrm{S_1} \cdot \mathrm{P} = \begin{bmatrix} 0.74 & 0.26 \end{bmatrix} \cdot \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}$

$= \begin{bmatrix} (0.74)(0.9) + (0.26)(0.7) & (0.74)(0.1) + (0.26)(0.3) \end{bmatrix}$

$= \begin{bmatrix} 0.848 & 0.152 \end{bmatrix} = \mathrm{S_{2}}$

• State distribution matrix for third time step

$\mathrm{S_2} \cdot \mathrm{P} = \begin{bmatrix} 0.848 & 0.152 \end{bmatrix} \cdot \begin{bmatrix} 0.9 & 0.1 \\ 0.7 & 0.3 \end{bmatrix}$

$= \begin{bmatrix} (0.848)(0.9) + (0.152)(0.7) & (0.848)(0.1) + (0.152)(0.3) \end{bmatrix}$

$= \begin{bmatrix} 0.8696 & 0.1304 \end{bmatrix} = \mathrm{S_{3}}$
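The repeated application of the theorem above is easy to verify numerically. Here is a minimal sketch in Python/NumPy that reproduces $\mathrm{S_1}$ through $\mathrm{S_3}$ for the toothpaste example:

```python
import numpy as np

# Transition probability matrix from the toothpaste example
P = np.array([[0.9, 0.1],
              [0.7, 0.3]])

# Initial state distribution: 20% use brand T, 80% use T'
S = np.array([0.2, 0.8])

# Repeatedly apply S_{i+1} = S_i . P for three time steps (months)
for i in range(3):
    S = S @ P
    print(f"S_{i+1} =", S)
```

The printed rows match the hand computations above: $[0.74,\ 0.26]$, $[0.848,\ 0.152]$, and $[0.8696,\ 0.1304]$.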

## Regular Markov Chains: Stationary Matrices and Steady State Markov Chains

Example: assume a company initially has 10% of the market share

• Using an advertising campaign, the transition probability matrix is given by

$\mathrm{P} = \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix}$
• Notations:
• $A =$ state where a customer is using brand A
• $A' =$ state where a customer is using a brand other than A
• Question: what happens to the company's market share over a long period of time (assuming $\mathrm{P}$ continues to be valid)?
• Solution:

$\mathrm{S_0} = \begin{bmatrix} 0.1 & 0.9 \end{bmatrix}$

$\mathrm{S_1} = \mathrm{S_0} \cdot \mathrm{P} = \begin{bmatrix} 0.1 & 0.9 \end{bmatrix} \cdot \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} = \begin{bmatrix} 0.62 & 0.38 \end{bmatrix}$

$\mathrm{S_2} = \mathrm{S_1} \cdot \mathrm{P} = \begin{bmatrix} 0.62 & 0.38 \end{bmatrix} \cdot \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} = \begin{bmatrix} 0.724 & 0.276 \end{bmatrix}$

$\mathrm{S_3} = \mathrm{S_2} \cdot \mathrm{P} = \begin{bmatrix} 0.7448 & 0.2552 \end{bmatrix}$

$\mathrm{S_4} = \mathrm{S_3} \cdot \mathrm{P} = \begin{bmatrix} 0.74896 & 0.25104 \end{bmatrix}$

$\mathrm{S_5} = \mathrm{S_4} \cdot \mathrm{P} = \begin{bmatrix} 0.749792 & 0.250208\end{bmatrix}$

$\mathrm{S_6} = \mathrm{S_5} \cdot \mathrm{P} = \begin{bmatrix} 0.7499584 & 0.2500416\end{bmatrix}$

• Here, the state distribution matrices $\mathrm{S_i}$ get closer and closer to $\begin{bmatrix} 0.75 & 0.25 \end{bmatrix}$ as $i \rightarrow \infty$
• Moreover, if we take the dot product between $\mathrm{S} = \begin{bmatrix} 0.75 & 0.25 \end{bmatrix}$ and $\mathrm{P}$, we get $\mathrm{S}$:
• $\mathrm{S} = \mathrm{S} \cdot \mathrm{P} = \begin{bmatrix} 0.75 & 0.25 \end{bmatrix} \cdot \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} = \begin{bmatrix} 0.75 & 0.25 \end{bmatrix}$
• No change occurs!
• The matrix $\mathrm{S}$ is called a stationary matrix and the system is said to be at steady state
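The approach toward the stationary matrix can be checked numerically. Here is a minimal sketch in Python/NumPy that iterates $\mathrm{S_{i+1}} = \mathrm{S_i} \cdot \mathrm{P}$ until the distribution stops changing (the iteration cap and tolerance are arbitrary choices, not from the example):

```python
import numpy as np

# Transition matrix for the brand A example
P = np.array([[0.8, 0.2],
              [0.6, 0.4]])

S = np.array([0.1, 0.9])  # initial: 10% market share

# Iterate until the state distribution stops changing
for _ in range(100):
    S_next = S @ P
    if np.allclose(S_next, S, atol=1e-12):
        break
    S = S_next

print(S)  # approaches the stationary matrix [0.75, 0.25]
```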

Questions:

1. Does every Markov chain have a unique stationary matrix?
2. If a Markov chain does have a unique stationary matrix, will the successive state matrices always approach this stationary matrix?

Answers:

• Generally, the answer to both questions is no
• However, if a Markov chain is a regular Markov chain, then the answer to both questions is yes

[latexpage]

## Luminiferous ether

Mid-nineteenth century: mainstream science believed light was a (mechanical) wave traveling through a medium called the luminiferous ether

Mid-nineteenth-century definition of a wave:

• Waves = a disturbance traveling through a medium
• Note: this is essentially what our current definition of a mechanical wave is today
• Medium = the material or substance through which a wave is traveling
• Example 1: a dewdrop falling into a pond (Figure 1)
• Here, the medium is the water in the pond
• The initial disturbance is the dewdrop landing in the water
• The water particles initially disturbed by the dewdrop further disturb the positions of surrounding water particles
• This disturbance is further propagated throughout the medium (i.e., the pond)
• Example 2: sound waves from clapping
• Medium = air
• Initial disturbance = compression of air molecules
• The compression causes air molecules to collide with one another and generate sound waves
• Note: both of these examples are mechanical waves

We knew from research such as Young’s double-slit experiment (1801) that light has wave-like properties

• Specifically, it showed one of the hallmark signs of wave behavior: interference
• Some unanswered questions of the mid-19th century
• How could they define light in terms of its wave-like properties?
• Note: they were trying to define light in terms of mechanical waves (they didn't know about electromagnetic waves back then)
• They theorized that light (e.g., light traveling from the Earth to the Sun) could be explained as a disturbance propagating through a medium
• People called this medium the luminiferous ether
• Big question: does the luminiferous ether exist?

Luminiferous ether = medium through which light (supposedly) propagates

• One major goal of mainstream science back then was to detect/validate the existence of this medium
• Note: if there is a luminiferous ether, the Earth must be traveling fast relative to it
• Not only is the Earth rotating on its own axis, but it is also traveling along an elliptical orbit around the Sun at $\approx 30 \mathrm{km/s}$
• Moreover, the Sun is estimated to orbit around the center of the galaxy at $\approx 200 \mathrm{km/s}$
• As far as our galaxy is concerned, we don't really know what it's doing; we just know it's moving
• Most scientists theorize our galaxy is rotating around a black hole
• Take-home-message: if the luminiferous ether exists, Earth’s position should be constantly changing relative to it
• Reasoning behind this:
• The odds of us being stationary relative to such a medium are essentially zero
• We should either be moving relative to the ether or the ether should be moving relative to us
• Thus, we should be able to detect some sort of “ether wind” or the “current” associated with the luminiferous ether
• Aside: waves propagate faster in the direction in which the current is moving
• Example: a dew drop falling in a stream with a current flowing (Figure 2)
• Here, the medium is the water in the stream and the initial disturbance is the dewdrop falling into the stream
• Propagation of medium distortion (i.e., the waves/ripples in the stream) will occur more quickly in the direction of the current (i.e., movement to the left)

### The Michelson-Morley Experiment:

#### Experiment Background

Assuming there did exist a luminiferous ether, let $\overrightarrow{s}$ be the velocity of its ether wind

• From our discussion of wave propagation speed and currents, light propagated in the same direction as $\overrightarrow{s}$ should travel at a faster velocity than light propagated in the $-\overrightarrow{s}$ direction
• For a while, no one could figure out how to test this because the tools/technology that could detect velocity differences near the speed of light did not yet exist (thus, any differences they expected to find were immeasurable)

Eventually Michelson and Morley designed an experiment that was able to work around this issue using wave interference

• Recall: interference is a hallmark behavior of waves
• Instead of attempting to measure the speed of light emitted in different directions, they split light into two different directions, recombined them, and observed the interference patterns
• They reasoned that if light emitted in different directions traveled at different speeds, then different interference patterns would result
• However, this isn’t what happened!!
• No matter how they oriented the apparatus (no matter the time of day and/or year), they always observed the same interference pattern
• Conclusion: the luminiferous ether doesn't seem to affect light waves ⇒ breakdown of the idea of a luminiferous ether and/or an “absolute” inertial frame of reference through which light travels
• Often called one of “the most famous failed experiments”
• Note: there were other experiments besides this one at the time that were also causing people to question the existence of a luminiferous ether

As it turns out, no matter the reference frame, light always travels at a constant speed!

[latexpage]

## Lecture 1

### 1D Random Walk

• $\delta =$ distance unit
• $\tau =$ time unit
• Unbiased random walk: $P(\mathrm{left\ step}) = P(\mathrm{right\ step}) = 0.5$
• Biased random walk: $P(\mathrm{left\ step}) \neq P(\mathrm{right\ step})$
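The 1D random walk above can be simulated in a few lines. Here is a minimal sketch in Python (the function name and step conventions are my own; each loop iteration represents one time unit $\tau$, and each step moves a distance $\delta$):

```python
import random

def random_walk_1d(n_steps, p_right=0.5, delta=1.0):
    """Positions visited by a 1D random walk.

    p_right = 0.5 gives an unbiased walk; any other value biases it.
    """
    x = 0.0
    positions = [x]
    for _ in range(n_steps):
        # Step right with probability p_right, otherwise step left
        x += delta if random.random() < p_right else -delta
        positions.append(x)
    return positions

random.seed(0)  # fixed seed so the run is reproducible
walk = random_walk_1d(1000)
print(walk[-1])  # final position after 1000 steps
```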

### Applications of Randomness in Biology

Macromolecule structures

• Entropy = the number of ways something can be arranged
• Equilibrium systems vs. nonequilibrium systems
• Order of magnitude estimates ⇒ sanity checks
• Force-extension experiment ⇒ entropy vs. energy

Chemotaxis = movement in response to chemicals

• E. coli example ⇒ flagella rotation
• Counterclockwise spin ⇒ propel straight
• Clockwise spin ⇒ “tumble”

Diffusion – biased vs. unbiased

Molecular motors

• Kinesin on microtubules
• Polymerase on DNA (during DNA replication)
• Ribosome on RNA (during RNA synthesis)

Ion channels – amount of time spent open vs. closed

Noisy cellular reaction networks

• Bursting ⇒ off state vs. on state
• Time-averaging ⇒ protein translation/degradation
• Propagation ⇒ gene activation

Reaction-diffusion and biological pattern formation (e.g., B-Z reactions)

Viral evolution – high mutation rate (compared to higher-order organisms)

Extinction

• Extinction vs. reintroduction of disease in a local population
• Critical community size = population size needed to sustain infection without extinction
• Small local populations ⇒ more stochastic

## Lecture 2

### Review of the average, variance, and standard deviation

Consider the following set of data points: 1, 1, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8

• Here, the number of data points is $N=12$
• Average = point that data points are centered about
• The mean average is defined as follows:

$\boxed{\mu = \left< x \right> = \frac{1}{N} \sum _{i=1}^{N}{ x_i}}$

• For our $N=12$ data points, $\left< x \right> = 4.5$
• Alternatively, we can find $\left< x \right>$ using the probabilities associated with each data point, $P(x_i)$
• For our $N=12$ data points…
• There are two data points (the first two listed) that have a value of 1
• ⇒ $P(x_1) = P(x_2) = 2/12$
• There is one data point (the third listed) with a value of 2
• ⇒ $P(x_3) = 1/12$
• There is one data point (the fourth listed) with a value of 3
• ⇒ $P(x_4) = 1/12$
• There is one data point (the fifth listed) with a value of 4
• ⇒ $P(x_5) = 1/12$
• There are three data points (the sixth, seventh, and eighth listed) with a value of 5
• ⇒ $P(x_6) = P(x_7) = P(x_8) = 3/12$
• There is one data point (the ninth listed) with a value of 6
• ⇒ $P(x_9) = 1/12$
• There are two data points (the tenth and eleventh listed) with a value of 7
• ⇒ $P(x_{10}) = P(x_{11}) = 2/12$
• There is one data point (the twelfth listed) with a value of 8
• ⇒ $P(x_{12}) = 1/12$
• We can use this information to determine $\left< x \right>$ using the following equation:
• $\boxed{\left< x \right> = \sum _{j=1}^{m}{ x_j \cdot P(x_j)}}$, where $m$ is the number of distinct values in the data set and $P(x_j)$ is the probability of the $j$th distinct value
• For our specific example, we would write,
• $\left< x \right> = 1(\frac{2}{12}) + 2(\frac{1}{12}) + 3(\frac{1}{12}) + 4(\frac{1}{12}) + 5(\frac{3}{12}) + 6(\frac{1}{12}) + 7(\frac{2}{12}) + 8(\frac{1}{12}) = 4.5$
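Both ways of computing the mean can be checked in Python. A minimal sketch using the example data set above:

```python
from collections import Counter

data = [1, 1, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8]
N = len(data)

# Mean as a plain average over all N data points
mean_direct = sum(data) / N

# Mean as a probability-weighted sum over the distinct values
counts = Counter(data)
mean_weighted = sum(value * count / N for value, count in counts.items())

print(mean_direct, mean_weighted)  # both give 4.5
```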

Variance = measures how far the data points tend to be spread about the mean

• Mathematical expression:

$\boxed{\sigma ^2 = \left< (x - \mu)^2 \right> = \left< x^2 - 2x\mu + \mu ^2 \right> = \left<x^2 \right> - 2\mu \left< x \right> + \mu ^2}$

• Recall: $\mu = \left< x \right>$
• $\sigma ^2 = \left<x^2 \right> - 2 \left< x \right> \left< x \right> + \left< x \right> ^2 = \left<x^2 \right> - 2 \left< x \right> ^2 + \left< x \right> ^2 = \left< x^2 \right> - \left< x \right> ^2$
• Thus, $\boxed{\sigma ^2 = \left< x^2 \right> - \left< x \right> ^2}$ (or, $\sigma ^2 = \left< x^2 \right> - \mu ^2$)
• IMPORTANTLY, $\left< x^2 \right>$ is DISTINCT from $\left< x \right> ^2$
• $\left< x^2 \right>$ denotes the average of a set of data points that have been squared from their original value
• $\left< x \right> ^2$ denotes squaring the average of a set of data points
• Let's compute the variance for the $N=12$ data set we defined earlier: 1, 1, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8
• First, we need to find $\left< x^2 \right>$:
• Let $m = 8$ denote the number of distinct values a data point may take
• The mathematical expression for $\left< x^2 \right>$ is…
• $\left< x^2 \right> = \sum _{j=1}^{m}{ x_{j}^2 \cdot P(x_j)} = \frac{1}{N} \sum _{i=1}^{N}{ x_{i}^2}$
• Thus, for our example
• $\left< x^2 \right> = 1^2 (\frac{2}{12}) + 2^2 (\frac{1}{12}) + 3^2 (\frac{1}{12}) + 4^2 (\frac{1}{12}) + 5^2 (\frac{3}{12}) + 6^2 (\frac{1}{12}) + 7^2 (\frac{2}{12}) + 8^2 (\frac{1}{12}) \approx 25.333$
• Now, we can compute the variance for our data set:
• $\sigma ^2 = \left< x^2 \right> - \left< x \right> ^2 \approx 25.333 - (4.5)^2 \approx 5.0833$
• Important notes on the variance formula:
• The variance formula squares the deviations to account for any possible negative deviations ⇒ prevents data points from canceling each other out
• Although an operator such as the absolute value initially seems like an appropriate way to describe the variance of a data set, it doesn't always provide “good” answers (i.e., it can cause problems with certain data sets that squaring doesn't create)
• Any measure of variance should have squared units with respect to the units used in the data set we are taking the variance of
• E.g., if our example data set has units in $\mathrm{mm}$, the units of the variance will be $\mathrm{mm^2}$
• However, interpreting values that have squared units of whatever we were originally measuring can be counterintuitive
• So, we usually calculate something called the standard deviation to get a better intuition when interpreting the variance of a data set
• Equation: $\boxed{\sigma = \sqrt{\sigma ^2} = \sqrt{\left< x^2 \right> - \left< x \right>^2}}$
• For our example data set, $\sigma = \sqrt{5.0833\ \mathrm{mm^2}} \approx 2.2546\ \mathrm{mm}$ (units aren't squared!)
• Take-home-message: the standard deviation is better for gaining intuition about the spread of data than variance because it gives a value in units rather than units squared
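The variance and standard deviation calculations above can be verified with a short Python sketch:

```python
data = [1, 1, 2, 3, 4, 5, 5, 5, 6, 7, 7, 8]
N = len(data)

mean = sum(data) / N                    # <x>
mean_sq = sum(x * x for x in data) / N  # <x^2>

variance = mean_sq - mean ** 2          # sigma^2 = <x^2> - <x>^2
std_dev = variance ** 0.5               # sigma = sqrt(sigma^2)

print(variance, std_dev)  # ~5.0833 and ~2.2546
```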

### Review of probabilities

Suppose $A$ and $B$ are two independent events (i.e., the chances of event $B$ occurring are independent of whether event $A$ occurs and vice versa)

Some notations:

• Let $S$ denote the set of all possible events that can occur in some sample space
• $P(A \cap B)=$ the probability of event $A$ AND event $B$ occurring
• Venn diagram (shaded region represents the case where both events occur):
• We can find the probability of multiple independent events occurring by multiplying the probabilities of each individual event together
• For our events $A$ and $B$, $\boxed{P(A \cap B)= P(A)P(B)}$
• Example: the probability of flipping two heads in a row if we flip a coin twice
• Let event $A$ be the event where the first coin toss is heads
• For an unbiased coin, $P(A) = 0.5$
• Let event $B$ be the event where the second coin toss is heads
• For an unbiased coin, $P(B) = 0.5$
• Since events $A$ and $B$ are independent of one another (the outcome of the second coin toss is independent of the outcome of the first and vice versa), we can use our expression for $P(A \cap B)$ to determine the probability of getting two heads:
• $P(A \cap B) = P(A)P(B) = 0.5 \cdot 0.5 = 0.25$
• Thus, there is a 25% chance of getting two heads after flipping an unbiased coin twice
• $P(A \cup B)=$ the probability of event $A$ OR event $B$ occurring
• Venn diagram (shaded region represents cases where at least one of our two events of interest occurs):
• We can find the probability of at least one of these two events occurring by summing the individual probabilities and subtracting any double counting
• For our events $A$ and $B$, $\boxed{P(A \cup B) = P(A) + P(B) - P(A \cap B)}$
• Example: if we roll two dice, what is the probability that at least one die gives an even number?
• Let event $A$ be the event that the first die gives an even number
• $P(A) = 0.5$ is the probability of event $A$ occurring
• Let event $B$ be the event that the second die gives an even number
• $P(B) = 0.5$ is the probability of event $B$ occurring
• Note: if we just added $P(A)$ and $P(B)$ together to determine $P(A \cup B)$, we would be double-counting the probability associated with both events $A$ and $B$ occuring ⇒ need to subtract $P(A \cap B)$ from our summed probabilities
• $P(A \cup B) = P(A) + P(B) – P(A \cap B)$
• First determine $P(A \cap B)$:
• $P(A) = 0.5$
• $P(B) = 0.5$
• $P(A \cap B) = P(A)P(B) = 0.5(0.5) = 0.25$
• $P(A \cup B) = 0.5 + 0.5 - 0.25 = 0.75$
• Thus, there is a 75% chance of rolling at least one even number if we roll two dice
• $P(\mathrm{neither}\ A\ \mathrm{nor}\ B) = P(\mathrm{not}\ A) \cdot P(\mathrm{not}\ B)$
• Venn Diagram (shaded region represents all possible events besides events $A$ or $B$):
• Note: $P(A \cup B) = 1 - P(\mathrm{neither}\ A\ \mathrm{nor}\ B)$
• Oftentimes, it is convenient to find a probability by considering the probability of the “opposite” of the event of interest and subtracting that probability from 1
• Example: use this strategy to compute the probability of rolling at least one even die if we roll two dice
• The “opposite” of rolling at least one even number is to roll two odd numbers
• Let $A$ denote the event where the first die gives an odd number
• $P(A) = 0.5$
• Let $B$ denote the event where the second die gives an odd number
• $P(B) = 0.5$
• Subtracting the product of the probabilities of our “opposite” events from 1 will give us the total probability of at least one “non-opposite” event occurring
• $P(\mathrm{at\ least\ one\ even}) = 1 - P(A)P(B) = 1 - 0.5(0.5) = 1 - 0.25 = 0.75$
• 75% chance of rolling at least one even number (agrees with the probability we found for this earlier!)
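Rather than simulating, we can enumerate all 36 equally likely outcomes of two dice to confirm both calculations of $P(\mathrm{at\ least\ one\ even})$. A minimal Python sketch:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice
outcomes = list(product(range(1, 7), repeat=2))

# Direct count: at least one die shows an even number
p_direct = sum(1 for a, b in outcomes if a % 2 == 0 or b % 2 == 0) / len(outcomes)

# Complement: 1 - P(both dice odd)
p_complement = 1 - sum(1 for a, b in outcomes if a % 2 == 1 and b % 2 == 1) / len(outcomes)

print(p_direct, p_complement)  # both 0.75
```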

## Sources

• Dr. Leah Shaw – BIOL 356: Random Walks in Biology, Lecture 1 (College of William and Mary)
Physics

## Gravity

[latexpage]

### Background

Gravity is fundamental to everything humans have experienced in (written) history

• Newton was the first person recorded in modern history to question why objects always fall towards the ground
• Used gravity to explain this phenomenon
• Led to the development of classical mechanics (later updated by Einstein)
• Note: classical mechanics applies to macroscopic objects that are traveling far slower than the speed of light

Importantly, we currently do not have a very good understanding of what is causing gravity

• Question: what is mass and why do bodies of mass “gravitate” towards each other?
• Answer: we don't know
• Comments/random thoughts: this is kind of like the opposite of diffusion (diffusion = the tendency of particles to move from areas of high concentration, i.e., areas with many particles, to areas of low concentration, i.e., areas with few particles)
• Like diffusion, gravity seems to occur spontaneously, so maybe it is the result of something that can be described as “energetically favorable”
• Q: is there a “randomness” component to gravity like there is for the diffusion of particles?

Although we still don't know exactly why gravity exists, we are pretty good at describing how gravity behaves (in the context of systems that follow the principles of classical mechanics)

### Newton’s Law of Gravitation

• Generally, gravity is defined as the attractive force $\overrightarrow{F}_g$ between two objects with positive nonzero masses $m_1$ and $m_2$ whose centers of mass are separated by a distance vector $\overrightarrow{r}$
• Note: Newton’s law is stated in the context of particles
• Equation for the gravitational force between two objects:

$\left\| \overrightarrow{F}_g \right\| = G \frac{m_1 m_2}{\left\| \overrightarrow{r} \right\| ^2}$

• Variables:
• $\left\| \overrightarrow{F}_g \right\| =$ the magnitude of the gravitational force between object 1 and object 2 (SI units: Newton)
• $G =$ the gravitational constant $\approx 6.67 \cdot 10^{-11} \mathrm{N (\frac{m}{kg})^2}$
• $m_i$ = the mass of object $i$ (SI units: $\mathrm{kg}$)
• $\left\| \overrightarrow{r} \right\| =$ the length of the vector describing the distance between the two objects (SI units: $\mathrm{m}$)
• Also, assume $m_1$ is located at a point $P$ and $m_2$ is located at a point $Q$
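To make the formula concrete, here is a minimal Python sketch that evaluates Newton's law for the Earth-Moon system (the masses and separation below are rough estimates I've supplied for illustration, not values from this post):

```python
G = 6.67e-11  # gravitational constant, N (m/kg)^2

def gravitational_force(m1, m2, r):
    """Magnitude of the gravitational force between two masses (Newtons)."""
    return G * m1 * m2 / r ** 2

# Approximate values for the Earth-Moon system
m_earth = 5.97e24  # kg
m_moon = 7.35e22   # kg
r = 3.84e8         # m, average Earth-Moon distance

F = gravitational_force(m_earth, m_moon, r)
print(F)  # roughly 2e20 N
```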