2.6: Models and Comparative Methods - Biology

For the rest of this book I will introduce several models that can be applied to evolutionary data. I will discuss how to simulate evolutionary processes under these models, how to compare data to these models, and how to use model selection to discriminate amongst them. In each section, I will describe standard statistical tests (when available) along with ML and Bayesian approaches.

One theme of this book is its emphasis on fitting models to data and estimating parameters. I think this approach is very useful for the future of comparative statistics, for three main reasons. First, it is flexible: one can easily compare a wide range of competing models against the same data. Second, it is extendable: one can create new models and fit them within a preexisting framework for data analysis. Finally, it is powerful: a model-fitting approach allows us to construct comparative tests that relate directly to particular biological hypotheses.

Comparative Analysis of Single-Cell RNA Sequencing Methods

Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols.
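
The role of UMIs mentioned in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that each read has already been assigned a cell barcode, a gene, and a UMI sequence; the function name `umi_counts` and the toy reads are hypothetical and are not taken from any of the six protocols' pipelines.

```python
from collections import defaultdict

def umi_counts(reads):
    """Count distinct UMIs per (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples. Reads that
    share all three fields are treated as amplification duplicates of one
    original mRNA molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Toy input (made up): four reads for one gene, three of them duplicates.
reads = [
    ("cell1", "Pou5f1", "AACG"),
    ("cell1", "Pou5f1", "AACG"),  # amplification duplicate
    ("cell1", "Pou5f1", "AACG"),  # amplification duplicate
    ("cell1", "Pou5f1", "TTGC"),
    ("cell1", "Nanog", "GGAT"),
]
counts = umi_counts(reads)
```

Counting raw reads would give 4 for Pou5f1 here, whereas counting distinct UMIs gives 2, which is the sense in which UMIs remove amplification noise from the quantification.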

Keywords: cost-effectiveness; method comparison; power analysis; simulation; single-cell RNA-seq; transcriptomics.

Number of enrolled learners: 26 thousand

How do children overcome hazardous experiences to succeed in life? What can be done to protect young people at risk from trauma, war, disasters, and other adversities? Learn about the importance of fostering resilience in children at risk. During this course, participants will: learn how trauma can affect children and the systems they depend on; gain insight into core concepts, research methods, and lessons learned in the last 50 years of resilience research; learn how research is being applied in the real world through interventions that promote resilience; and engage in discussions with others who are working with children at risk around the world. Participants are welcome to take the MOOC at no cost or to register for a Course Certificate ($49). Those who register and earn a Course Certificate from Coursera are also eligible to sign up for continuing education clock hours through the University of Minnesota. Participants can earn 10 clock hours of continuing education credit (added cost $99) from the College of Education and Human Development at the University of Minnesota.


Fantastic course, I team training others in an organisation on Trauma and Adverse child experiences, so to see and hear so much current research really supported my work.

I honestly enjoyed this course. It was well-thought out. There were a lot of examples that the instructor drew from to describe resilience. Overall, I learned a lot!

Week 2: Methods and Models of Research on Resilience (including case studies)

This module highlights the models and methods used in resilience science, including person-focused methods and variable-focused methods. The case study of Dr. Maddaus continues and the case of resilience in early childhood is presented.


Ann S. Masten, Ph.D., LP

Regents Professor, Irving B. Harris Professor of Child Development

Video transcript

Hello. In this session, let's take a closer look at moderator models and the topic of protective factors, which is a very important topic in the study of resilience. There has always been an interest in trying to figure out what really matters when children are facing adversity. What can we do to protect our children when they encounter difficulties, and to help them recover? Although we know that assets and resources can make a difference, and that they help children at all levels of risk, there's a special interest in understanding what we can do that might really matter in circumstances where adversity is very high. What is especially important in protecting children when they're in very difficult situations? That's when the topic of moderators and protective factors becomes really important.

Here is a model of a moderator. A moderator is something that influences the relationship between an adversity or risk factor and its impact on individuals, in terms of the outcomes that they experience. So here we're looking at adversity predicting a good outcome, and that relationship would be negative: in most cases, adversity has a negative impact on how well a child is doing. But there's a moderator here that is making a difference, and it's changing how the adversity predicts the outcome of the child. Many kinds of moderators show that pattern and have been studied in resilience science. They include things as different in level as genes. You can now measure the genetic characteristics of an individual and try to see whether genetic variables are related to outcomes, and there's some very interesting research suggesting that that's the case. Or you can measure other biological systems in a child. A popular topic in resilience now is looking at how stress systems work in a child, and whether the stress systems operating in our bodies alter the way in which experiences affect us, either in the short term or the long term. People have been interested in personality. There are individual differences in how reactive people are to adversity. Some people are more easygoing, and some people are more stress-reactive, and that could make a difference in how children do.

But there are other levels of variables, outside the child, that appear to play a very important role in resilience. These are measures like relationships: how close you feel to your family, the parenting quality you're experiencing, the family routines and practices, the cultural beliefs, and even the kind of emergency services that are available in your community and the social policy of a community or a nation. All of these could be, and have been, investigated as possible moderators of the relationship between adversity and how a child is doing.

There's another particular kind of moderator that I want to focus on as well, and this one is called a threat-activated protective factor. That's just a fancy way of saying that there are some protections that are actually triggered by adversity. The risk occurs, the threat occurs, and then that triggers the protective factor and sets it into motion. There are some great examples of this kind of moderator in our everyday life. One example is the airbag in many of the automobiles now manufactured around the world. Your airbag usually isn't doing anything when you're driving around; it's just stored in the car. But when the airbag system detects that an accident is happening, the airbag suddenly deploys in order to protect you from serious injuries during a car accident. Airbags are a kind of protective factor that moderates the impact of a car accident on the human body. It's a classic example of a threat-activated protective factor.

Another example would be the way your immune system works in your body. Your body is capable of making antibodies, and many of us get vaccinated in order to protect us from diseases. When we're vaccinated, it stimulates our body to make antibodies to various kinds of diseases, ranging from tetanus to measles to smallpox, whatever it is. Then, when our body encounters that infectious agent, our immune system responds by protecting us. The antibodies attack the foreign agents, and we're able to maintain our health. Vaccination is an extremely important kind of intervention that we'll be talking about more later in the class.

So here are some other examples of threat-activated protective factors that are not automobile airbags or our immune system. One that is very important for children is the way that parents respond in an emergency. Parents are always looking after their children; they're always promoting their development. But in an emergency, when a parent sees that a child is in imminent danger, a parent will respond by taking a special kind of action. So a parent can be a threat-activated protective factor as well. I think as you get older, your friends often play the same role in your life. If they observe that you're being threatened in some way, they may respond by trying to help you out. So friends can be threat-activated protective factors as well. And there are emergency services for children that work this way. In our community, we have emergency services. If a family feels they're unable to cope with whatever's going on in their life, they can call and get emergency help from social services in the environment. We also have emergency services in the context of disasters and terrorist attacks, where we have first responders that come: fire people, police people, military responses, emergency services of many different kinds, to try to respond and help people who are in the middle of an unfolding disaster. We're also going to see in a future segment that interventions can be thought about as threat-activated protective factors as well.

We've looked at this graph before, and this is simply an example of a moderator. I want to tell you a little bit more about this study. This moderator analysis is from a study of homeless children. I've been using a body of research on families experiencing homelessness in the course of this methods section. In this particular study, these were young children who were being studied, and one of the things that was measured was parenting quality: how well is the parent managing to take care of the children in the family, even though the family is in a very difficult time, living in a temporary shelter? We observe many families in that situation who are still able to take care of their children, get them ready for school, read to them, and try to ensure that they're getting the sleep and nutrition they need, doing everything that you would hope that parents do, even though the parent is in a very difficult situation. What we found in this particular study, which was done by Jeanette Herbers and her group, is that the children who had that really effective parenting were always doing better. However, when the risk level in the family was very, very high, when they had a lot of stressful life experiences, that good parenting mattered even more. Parenting quality appears to be a very powerful moderator of how children do in the midst of adversity. We'll be hearing more about that as we go forward in the class, in other sections on war and disaster as well.

Next time, we're going to take a look at combined models. We've been looking at a group of person-focused models and a group of variable-focused models. Now we're going to look at strategies that try to combine both of those models into one study.
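
In statistical terms, a moderator of the kind described in this session is often represented as an interaction: the slope relating adversity to outcome differs depending on the level of the moderator. The sketch below is only an illustration under that assumption. The generating equation, its coefficients, and the names `simulated_outcome` and `ols_slope` are hypothetical and are not taken from the study of homeless families.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def simulated_outcome(adversity, parenting):
    # Hypothetical model: adversity lowers the outcome, but less steeply
    # when parenting quality is high (the adversity * parenting term).
    return 10.0 - 2.0 * adversity + 1.0 * parenting + 1.5 * adversity * parenting

adversity_levels = [0, 1, 2, 3, 4, 5]
low_parenting = [simulated_outcome(a, 0) for a in adversity_levels]
high_parenting = [simulated_outcome(a, 1) for a in adversity_levels]

slope_low = ols_slope(adversity_levels, low_parenting)    # steeper decline
slope_high = ols_slope(adversity_levels, high_parenting)  # shallower decline
moderation_effect = slope_high - slope_low

# Gap between the two groups at the lowest and highest adversity levels.
gap_at_low_risk = high_parenting[0] - low_parenting[0]
gap_at_high_risk = high_parenting[-1] - low_parenting[-1]
```

In this made-up example the high-parenting group does better at every level of adversity, and the gap widens as adversity rises, which is the pattern the transcript describes as good parenting mattering even more when risk is very high.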

1. Introduction

The elucidation of the structure and dynamics of biopathways is a central objective of systems biology. A standard approach is to view a biopathway as a network of biochemical reactions, which is modeled as a system of ordinary differential equations (ODEs). Following Barenco et al. (2006), this system can typically be expressed as:

$$\frac{dx_i(t)}{dt} = g_i\big(\mathbf{x}(t), \boldsymbol{\rho}_i\big) - \delta_i x_i(t) \qquad (1)$$

where i ∈ {1, …, n} denotes one of n components (henceforth referred to as “species”) in the biopathway, x_i(t) denotes the concentration of species i at time t, δ_i is a decay rate, and x(t) is a vector of concentrations of all system components that influence or regulate the concentration of species i at time t. If, for instance, species i is an mRNA, then x(t) may contain the concentrations of transcription factors (proteins) that bind to the promoter of the gene from which i is transcribed. The regulation is modeled by the regulation function g_i. Depending on the species involved, g_i may define different types of regulatory interactions, e.g., mass action kinetics, Michaelis–Menten kinetics, or allosteric Hill kinetics. All of these interactions depend on a vector of kinetic parameters, ρ_i. For complex biopathways, only a small fraction of ρ_i can typically be measured. Hence, the explication of the biopathway dynamics requires the majority of kinetic parameters to be inferred from observed (typically noisy and sparse) time course concentration profiles. In principle, this can be accomplished with standard techniques from machine learning and statistical inference. These techniques first quantify the difference between predicted and measured time course profiles by some appropriate metric to obtain the likelihood of the data. The parameters are then either optimized to maximize the likelihood (or a regularized version thereof), or sampled from a distribution based on the likelihood (the posterior distribution).
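
As a concrete illustration of this kind of model, the sketch below simulates a single species whose production follows Michaelis–Menten regulation by one transcription factor and which decays at a constant rate, i.e. a regulation term minus a linear decay term. The single-regulator setup, the parameter values, and the names `michaelis_menten` and `simulate_species` are assumptions made for illustration, not the systems studied in the review.

```python
import math

def michaelis_menten(tf_conc, v_max, k_m):
    """Hypothetical regulation function: saturating activation by a transcription factor."""
    return v_max * tf_conc / (k_m + tf_conc)

def simulate_species(x0, tf_profile, v_max, k_m, delta, t_end, n_steps):
    """Integrate dx/dt = g(tf(t)) - delta * x with the classical RK4 scheme.

    `tf_profile` is a function of time giving the regulator concentration.
    Returns a list of (time, concentration) pairs.
    """
    def rhs(t, x):
        return michaelis_menten(tf_profile(t), v_max, k_m) - delta * x

    h = t_end / n_steps
    t, x = 0.0, x0
    trajectory = [(t, x)]
    for _ in range(n_steps):
        k1 = rhs(t, x)
        k2 = rhs(t + h / 2, x + h * k1 / 2)
        k3 = rhs(t + h / 2, x + h * k2 / 2)
        k4 = rhs(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        trajectory.append((t, x))
    return trajectory

# Constant regulator concentration of 1.0, so the production rate is 2*1/(1+1) = 1.
traj = simulate_species(x0=0.0, tf_profile=lambda t: 1.0,
                        v_max=2.0, k_m=1.0, delta=0.5,
                        t_end=10.0, n_steps=1000)
final_time, final_conc = traj[-1]
```

With a constant regulator the equation has the closed-form solution x(t) = (g/δ)(1 − e^(−δt)), which gives a convenient check on the numerical integrator; with a time-varying regulator or several interacting species, no such closed form is generally available.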

However, the nature of the ODE-based model in equation (1) renders the inference problem computationally challenging in two respects. First, the ODE system often does not permit closed-form solutions. One therefore has to resort to numerical simulations every time the kinetic parameters ρ_i are adapted, which is computationally onerous. Second, the likelihood function in the space of parameters ρ_i is typically not unimodal, but suffers from multiple local optima. Hence, even if a closed-form solution of the ODEs existed, inference by maximum likelihood would face an NP-hard optimization problem, and Bayesian inference would suffer from poor mixing and convergence of the Markov chain Monte Carlo (MCMC) simulations.

Conventional inference methods involve numerically integrating the system of ODEs to produce a signal, which is compared to the data by some appropriate metric defined by the chosen noise model, allowing for the calculation of a likelihood. This process is repeated as part of an iterative optimization or sampling procedure to produce estimates of the parameters. Figure 1A is a graphical representation of the model for these conventional inference methods. For a given set of initial concentrations of the entire system X(0) and set of ODE parameters θ [where θ = (θ_1, …, θ_n) and θ_i = (ρ_i, δ_i)], a signal can be produced by integration of the ODEs. As mentioned previously, for many ODE systems a closed-form solution does not exist, so in practice numerical integration is used instead. Assuming an appropriate noise model (for example, a Gaussian additive noise model) with standard deviation (SD) of the observational error σ, the differences between the resultant signal and the data Y can be used to calculate the likelihood of the parameters θ. The process is repeated for different parameters θ until the maximum likelihood of the parameters is found (in the classical approach) or until convergence to the posterior distribution is reached (in the Bayesian approach). However, the computational costs involved in repeatedly solving the ODEs numerically are large.
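
The loop described above (integrate, compare with the data under a Gaussian noise model, repeat for new parameters) can be sketched as follows. This is a minimal illustration, assuming a one-species toy model dx/dt = ρ − δx with ρ known, a fixed noise SD, and a coarse grid search in place of a real optimizer; the data perturbations are deterministic stand-ins for noise, and all names and values are hypothetical.

```python
import math

def solve_ode(rho, delta, x0, times, substeps=50):
    """Numerically integrate dx/dt = rho - delta * x (RK4); return x at each time."""
    def rhs(x):
        return rho - delta * x

    x, t, out = x0, 0.0, []
    for t_obs in times:
        h = (t_obs - t) / substeps
        for _ in range(substeps):
            k1 = rhs(x)
            k2 = rhs(x + h * k1 / 2)
            k3 = rhs(x + h * k2 / 2)
            k4 = rhs(x + h * k3)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t_obs
        out.append(x)
    return out

def gaussian_log_likelihood(data, signal, sigma):
    """Log-likelihood of the data under additive Gaussian noise with SD sigma."""
    n = len(data)
    squared_error = sum((y - s) ** 2 for y, s in zip(data, signal))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)

times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sigma = 0.05

# Synthetic data: true parameters rho = 1.0, delta = 0.5, plus small
# alternating perturbations standing in for observational error.
true_signal = solve_ode(1.0, 0.5, 0.0, times)
data = [s + (0.01 if i % 2 == 0 else -0.01) for i, s in enumerate(true_signal)]

# Every candidate value of delta requires a fresh numerical solution of the ODE.
delta_grid = [0.1 * k for k in range(1, 11)]
log_liks = [gaussian_log_likelihood(data, solve_ode(1.0, d, 0.0, times), sigma)
            for d in delta_grid]
best_delta = delta_grid[log_liks.index(max(log_liks))]
```

Even in this toy case the ODE is solved once per candidate parameter value; with many species, many parameters, and an iterative optimizer or MCMC sampler, that repeated integration is the cost the text refers to.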

Figure 1. Graphical representations of (A, left) the explicit solution of the ODE system, as shown in Calderhead et al. (2008), and (B, right) gradient matching with Gaussian processes, as proposed in Calderhead et al. (2008) and Dondelinger et al. (2013). (A) The noisy data signals Y are described by some initial concentration X(0), ODE parameters θ, and observational errors with SD σ. For a given set of initial concentrations X(0) and set of ODE parameters θ, the ODEs can be integrated to produce a signal, which is then compared to the data signal by some metric defined by the chosen noise model. (B) The gradients Ẋ obtained from two modeling approaches are compared: the Gaussian process model and the ODEs themselves. The distribution of Y is given in equation (4), the Gaussian process on X is defined in equation (5), the derivatives of the Gaussian process Ẋ in equation (10), the ODE model in equation (2), and the gradient matching in equation (17). All symbols are detailed in Section 2.1.

To reduce the computational complexity, several authors have adopted an approach based on gradient matching [e.g., Calderhead et al. (2008) and Liang and Wu (2008)]. The idea is based on the following two-step procedure. In a preliminary smoothing step, the time series data are interpolated. Then, in a second step, the parameters θ of the ODEs are optimized so as to minimize some metric measuring the difference between the slopes of the tangents to the interpolants and the θ-dependent time derivatives from the ODEs. In this way, the ODEs never have to be solved explicitly, and the typically unknown initial conditions are effectively profiled over. A disadvantage of this two-step scheme is that the results of parameter inference critically hinge on the quality of the initial interpolant. A better approach, first suggested in Ramsay et al. (2007), is to regularize the interpolants by the ODEs themselves. Dondelinger et al. (2013) applied this idea to the non-parametric Bayesian approach of Calderhead et al. (2008), using Gaussian processes (GPs), and demonstrated that it substantially improves the accuracy of parameter inference and robustness with respect to noise. As opposed to Ramsay et al. (2007), all smoothness hyperparameters are consistently inferred in the framework of non-parametric Bayesian statistics, dispensing with the need to adopt heuristics and approximations.
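
The two-step idea can be sketched on a toy problem. The example assumes the one-species model dx/dt = ρ − δx, noise-free observations on a regular time grid, and central finite differences as a crude stand-in for the smoothing interpolant (the methods reviewed here use splines or Gaussian processes instead). Because this toy ODE is linear in its parameters, minimizing the squared gradient mismatch reduces to an ordinary least-squares line fit. All names and values are hypothetical.

```python
import math

def central_slopes(times, values):
    """Step 1 (stand-in for an interpolant): slope estimates at interior time points."""
    return [(values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
            for i in range(1, len(times) - 1)]

def fit_line(xs, ys):
    """Closed-form least squares for ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Synthetic observations from the closed-form solution with x(0) = 0.
true_rho, true_delta = 1.0, 0.5
times = [0.1 * k for k in range(41)]
data = [(true_rho / true_delta) * (1.0 - math.exp(-true_delta * t)) for t in times]

# Step 1: estimate dx/dt from the data alone.
slopes = central_slopes(times, data)

# Step 2: choose (rho, delta) so that rho - delta * x matches those slopes.
intercept, slope = fit_line(data[1:-1], slopes)
rho_hat, delta_hat = intercept, -slope
```

Note that the ODE is never integrated and the initial condition is never used. The estimates differ slightly from the true values because the finite-difference slopes are themselves approximate, which illustrates the point that the result hinges on the quality of the interpolant.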

This review compares the current state-of-the-art in gradient matching, specifically in the context of parameter inference in ODEs. This comparison aids in understanding the difference between key components of methods without confounding influence from other modeling choices. For instance, we compare the inference paradigm of the parameter that governs the degree of mismatch between the gradients of the interpolants and ODEs [using the method in Dondelinger et al. (2013)] with a tempering approach [from the method in Macdonald and Husmeier (2015)], using the same interpolation scheme (namely, Gaussian processes). This way, we are able to gain an understanding as to what approach may be more suitable, without concern that differences may be due to interpolation choice. If the ODEs provide the correct mathematical description of the system, ideally there should be no difference between the interpolant gradients and those predicted from the ODEs. In practice, however, forcing the gradients to be equal is likely to cause parameter inference techniques to converge to a local optimum of the likelihood. A parallel tempering scheme is the natural way to deal with such local optima, as opposed to inferring the degree of mismatch, since different tempering levels correspond to different strengths of penalizing the mismatch between the gradients. A parallel tempering scheme (which uses smoothed versions of the posterior distribution as well as the usual posterior distribution, see Section 2.2 for more details) was explored by Campbell and Steele (2012).
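
For readers unfamiliar with parallel tempering, the following generic sketch shows the mechanism on a one-dimensional bimodal target. It is not the scheme of Campbell and Steele (2012) or of Macdonald and Husmeier (2015); the target density, the temperature ladder `betas`, and the step size are hypothetical choices made only to show how flattened ("hot") chains and state swaps fit together.

```python
import math
import random

def log_target(theta):
    """Hypothetical bimodal log-density: unnormalized mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    a = -0.5 * ((theta + 3.0) / 0.5) ** 2
    b = -0.5 * ((theta - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def parallel_tempering(n_iter, betas, step=1.0, seed=0):
    """Run one chain per inverse temperature; return the samples of chain 0.

    Chain k targets the density raised to the power betas[k]. A power of 1.0
    is the original density; smaller powers flatten it.
    """
    rng = random.Random(seed)
    states = [-3.0 for _ in betas]
    first_chain_samples = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each tempered chain.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[k]))
            if math.log(1.0 - rng.random()) < log_ratio:
                states[k] = proposal
        # Propose exchanging the states of one random pair of neighbouring chains.
        k = rng.randrange(len(betas) - 1)
        log_swap = (betas[k] - betas[k + 1]) * (log_target(states[k + 1]) - log_target(states[k]))
        if math.log(1.0 - rng.random()) < log_swap:
            states[k], states[k + 1] = states[k + 1], states[k]
        first_chain_samples.append(states[0])
    return first_chain_samples

samples = parallel_tempering(n_iter=2000, betas=[1.0, 0.3, 0.1, 0.03])
```

The flattened chains cross the low-probability region between the modes easily, and accepted swaps pass those states down to the chain that targets the original density. In the gradient matching setting, the analogous ladder corresponds to different strengths of the penalty on the mismatch between the gradients.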

When comparing one method to another in order to assess the strengths and weaknesses of an approach, results are often not directly comparable, since different approaches use different methodological paradigms. For example, if the method by Campbell and Steele (2012) (which uses B-splines interpolation) was compared to Dondelinger et al. (2013) (which uses a GP approach) in order to examine the difference between parallel tempering and inference of the parameter controlling the degree of mismatch between the gradients, then the results would be confounded by the choice of interpolation scheme. In this review, we present a comparative evaluation of parallel tempering versus inference in the context of gradient matching for the same modeling framework, i.e., without any confounding influence from the model choice. We also compare the method of Bayesian inference with Gaussian processes with other methodological paradigms, within the specific context of adaptive gradient matching, which is highly relevant to current computational systems biology. We look at the methods of:

  • Campbell and Steele (2012), who carry out parameter inference using adaptive gradient matching and B-splines interpolation;
  • González et al. (2013), who implement a reproducing kernel Hilbert space (RKHS) and penalized maximum likelihood approach in a non-Bayesian fashion;
  • Ramsay et al. (2007), who optimize the gradient mismatch, interpolant, and ODE parameters using a hierarchical regularization method and penalize the difference between the gradients using B-splines in a non-Bayesian approach;
  • Dondelinger et al. (2013), who use adaptive gradient matching with Gaussian processes, inferring the degree of mismatch between the gradients; and
  • Macdonald and Husmeier (2015), who use adaptive gradient matching with Gaussian processes and temper the parameter that controls the degree of mismatch between the gradients.

Comparative transcriptomic analysis of the mechanisms underpinning ageing and fecundity in social insects

The exceptional longevity of social insect queens despite their lifelong high fecundity remains poorly understood in ageing biology. To gain insights into the mechanisms that might underlie ageing in social insects, we compared gene expression patterns between young and old castes (both queens and workers) across different lineages of social insects (two termite, two bee and two ant species). After global analyses, we paid particular attention to genes of the insulin/insulin-like growth factor 1 signalling (IIS)/target of rapamycin (TOR)/juvenile hormone (JH) network, which is well known to regulate lifespan and the trade-off between reproduction and somatic maintenance in solitary insects. Our results reveal a major role of the downstream components and target genes of this network (e.g. JH signalling, vitellogenins, major royal jelly proteins and immune genes) in affecting ageing and the caste-specific physiology of social insects, but an apparently lesser role of the upstream IIS/TOR signalling components. Together with a growing appreciation of the importance of such downstream targets, this leads us to propose the TI–J–LiFe (TOR/IIS–JH–Lifespan and Fecundity) network as a conceptual framework for understanding the mechanisms of ageing and fecundity in social insects and beyond.

This article is part of the theme issue ‘Ageing and sociality: why, when and how does sociality change ageing patterns?’

1. Introduction

Why do organisms age? This is a major question in evolutionary biology, given that an unlimited lifespan associated with continuous reproduction would increase fitness and hence should be favoured. The classical evolutionary theory of ageing, developed by Medawar, Williams and Hamilton [1–3], has, in principle, explained why ageing evolves. However, we still understand very little about the tremendous diversity of ageing rates among organisms and the mechanisms that might underlie this diversity [4] (reviewed in [5,6]).

Over recent decades, results from model organisms have revealed the existence of a conserved set of gene networks and pathways involved in ageing in animals ranging from nematodes and flies to mice and humans (see [6–20], and references therein). In many insects, for example, the insulin/insulin-like growth factor 1 signalling (IIS)/target of rapamycin (TOR)/juvenile hormone (JH) network has emerged as a key regulator of lifespan and somatic maintenance, growth and fecundity, and explains trade-offs between these processes (figure 1). The IIS and TOR pathways sense the availability of nutrients, such as carbohydrates and amino acids. Through a cascade of signalling activities, they positively affect the production of the lipophilic sesquiterpenoid hormone JH (as well as the steroid hormone 20-hydroxy-ecdysone) and regulate various physiological processes including reproductive physiology (e.g. egg maturation, by affecting the expression of yolk proteins or the yolk precursor protein vitellogenin; see [13–19]), somatic maintenance (e.g. humoral innate immunity and oxidative stress resistance) and lifespan (see reviews in [6–20] and references therein). In particular, results from the fruit fly Drosophila melanogaster as well as from other relatively short-lived insects (e.g. grasshoppers, butterflies, bugs and planthoppers) suggest that downregulation of this signalling network (e.g. via experimental ablation of insulin-producing cells or of the gland that produces JH) promotes somatic maintenance and longevity at the expense of fecundity (e.g. [7,13,15–17,20] and references therein). Because of its central role in modulating insect life history and ageing, we herein refer to this integrated network and the downstream processes that it affects as the TI–J–LiFe network (TOR/IIS–JH–Lifespan and Fecundity) (figure 1).

Figure 1. The ‘TI–J–LiFe’ network. The TI–J–LiFe network represents a set of interacting pathways that comprise the nutrient sensing TOR (target of rapamycin) and IIS (insulin/insulin-like growth factor 1 signalling) pathways, the Juvenile Hormone (JH, a major lipophilic hormone whose production is regulated by IIS and TOR), as well as downstream processes targeted by this network, including somatic maintenance functions (e.g. immunity and oxidative stress resistance) and reproductive physiology (including vitellogenins and yolk proteins), that have profound effects upon insect life history, especially on Lifespan and Fecundity. This network is thought to be one of the major regulatory circuits underpinning variation of insect lifespan and the trade-off between fecundity and longevity. The core components and feedback loops depicted here are mainly based on experimental findings in Drosophila melanogaster (for detailed information, see e.g. IIS gene lists). Previous work suggests that this network and its effects are evolutionarily highly conserved among insects beyond Drosophila. In some social insects (e.g. Apis mellifera), some parts of this network might be ‘wired’ differently, but whether such a ‘rewiring’ is common among social insects remains largely unknown (for further discussion, see [18]). (Online version in colour.)

Considerably less is known, however, about the role of this signalling system in the ageing of social insects, in which queens have extraordinarily long lifespans of up to several decades and seemingly defy the commonly observed trade-off between fecundity and longevity [21–24]. Social insects (termites and ants as well as some bees and wasps) are further characterized by a reproductive division of labour: within a colony, the typically long-lived queens (and in termites, also kings) are the only reproducing individuals, while the other colony members (workers and sometimes soldiers) perform all non-reproductive tasks, such as foraging, brood care and defence, and are comparatively short-lived. Thus, as is the case in long-lived social mole-rats [25,26], reproductive individuals with exceptionally long lifespans (queens) have evolved in social insects. The convergent evolution of sociality and reproductive division of labour (‘castes’, comprising reproductives, workers and sometimes soldiers) appears to be associated with selection for long lifespans in reproductives (see also [21,24,27]). This calls for investigation of the convergent, or possibly parallel, evolution of the mechanisms underlying a long lifespan in reproductives.

Social animals are especially suited for ageing studies because both short- and long-lived phenotypes are encoded by the same genome within a colony (e.g. [17,24,28] and references therein). Indeed, outside social insects and mole-rats, such extreme (and in this case phenotypically plastic) differences in lifespan are only found in a few, distantly related taxa (e.g. [4]), which makes controlled comparisons difficult. The shared genetic background among castes within a colony furthermore means that caste-associated differences in longevity are generally not the result of genetic variation among individuals but are due to differences in gene expression. Transcriptomic studies of social insects therefore hold great promise for uncovering the physiological mechanisms underlying large differences in lifespan (e.g. [22,28,29]). To date, however, most such studies have focused on single species and not leveraged the potential power of comparative transcriptomics across taxa.

Here, we have examined the mechanisms underlying ageing in social insects by comparing gene expression patterns between young and old queens (and for termites, also kings) and workers across different social insect lineages: two termite (Blattodea, Isoptera), two bee (Hymenoptera, Apoidea) and two ant species (Hymenoptera, Formicidae) (for species and lifespan characteristics, see table 1). We studied patterns of life history and ageing of these species comparatively within a collaborative framework, the ‘So-Long’ consortium. This consortium tackles major questions about the apparent ‘reversal’ of the fecundity–longevity trade-off in the context of insect sociality by using species of different social complexity for each lineage and applying standardized methods when technically feasible. However, major biological differences among the species studied by our consortium sometimes necessitated the use of, for example, different tissues for transcriptomic analysis, since the amount and quality of tissue that could be obtained constrained our use of specific tissues. In brief, we employed gene expression data derived from transcriptomes of target species to identify putative differences and commonalities in ageing-related expression patterns across three social insect lineages, with a special focus on the TI–J–LiFe network (figure 1; electronic supplementary material, §S1.0 and table S1). By comparing our results with published work from the well-established ageing model D. melanogaster, we begin to uncover how long-lived social insects might differ in their molecular underpinning of ageing and life-history traits when compared with short-lived solitary insects.

Table 1. Overview of samples included in this study.

a In [30], the same samples were referred to as ‘head’, yet the prothorax was attached to the head.


Phylogenetic comparative approaches can complement other ways of studying adaptation, such as studying natural populations, experimental studies, and mathematical models. [6] Interspecific comparisons allow researchers to assess the generality of evolutionary phenomena by considering independent evolutionary events. Such an approach is particularly useful when there is little or no variation within species. And because they can be used to explicitly model evolutionary processes occurring over very long time periods, they can provide insight into macroevolutionary questions, once the exclusive domain of paleontology. [4]

Phylogenetic comparative methods are commonly applied to such questions as:

Example: do canids have larger hearts than felids?

Example: do carnivores have larger home ranges than herbivores?

Example: where did endothermy evolve in the lineage that led to mammals?

Example: where, when, and why did placentas and viviparity evolve?

  • Does a trait exhibit significant phylogenetic signal in a particular group of organisms? Do certain types of traits tend to "follow phylogeny" more than others?

Example: are behavioral traits more labile during evolution?

Example: why do small-bodied species have shorter life spans than their larger relatives?

In 1985, Felsenstein [1] proposed the first general statistical method for incorporating phylogenetic information, i.e., the first that could use any arbitrary topology (branching order) and a specified set of branch lengths. The method is now recognized as an algorithm that implements a special case of what are termed phylogenetic generalized least-squares models. [8] The logic of the method is to use phylogenetic information (and an assumed Brownian-motion-like model of trait evolution) to transform the original tip data (mean values for a set of species) into values that are statistically independent and identically distributed.

The algorithm involves computing values at internal nodes as an intermediate step, but they are generally not used for inferences by themselves. An exception occurs for the basal (root) node, which can be interpreted as an estimate of the ancestral value for the entire tree (assuming that no directional evolutionary trends [e.g., Cope's rule] have occurred) or as a phylogenetically weighted estimate of the mean for the entire set of tip species (terminal taxa). The value at the root is equivalent to that obtained from the "squared-change parsimony" algorithm and is also the maximum likelihood estimate under Brownian motion. The independent contrasts algebra can also be used to compute a standard error or confidence interval.
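The recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the nested-tuple tree encoding, the function name `independent_contrasts`, the three-tip tree ((A:1,B:1):1,C:2), and the tip values are all hypothetical choices made for the example, and the tree is assumed to be fully bifurcating with positive branch lengths.

```python
import math

def independent_contrasts(node, tip_values):
    """Return (node value, extra branch length, list of standardized contrasts).

    Hypothetical encoding: a tip is its name (str); an internal node is a
    pair ((left_child, left_branch_length), (right_child, right_branch_length)).
    """
    if isinstance(node, str):
        return tip_values[node], 0.0, []
    (left, v1), (right, v2) = node
    x1, extra1, contrasts1 = independent_contrasts(left, tip_values)
    x2, extra2, contrasts2 = independent_contrasts(right, tip_values)
    # Branches below already-processed nodes are lengthened to reflect the
    # uncertainty in the values computed for those nodes.
    v1 += extra1
    v2 += extra2
    # Standardized contrast: difference scaled by its Brownian-motion standard deviation.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Node value: average of the two descendants, weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    extra = v1 * v2 / (v1 + v2)
    return value, extra, contrasts1 + contrasts2 + [contrast]

# Hypothetical tree ((A:1,B:1):1,C:2) with made-up tip means.
tree = (((("A", 1.0), ("B", 1.0)), 1.0), ("C", 2.0))
tips = {"A": 1.0, "B": 3.0, "C": 5.0}
root_value, _, pics = independent_contrasts(tree, tips)
print(root_value, pics)
```

For n tips the function yields n - 1 contrasts, and the value returned for the root is the phylogenetically weighted mean discussed in the paragraph above.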

Probably the most commonly used PCM is phylogenetic generalized least squares (PGLS). [8] [9] This approach is used to test whether there is a relationship between two (or more) variables while accounting for the fact that lineages are not independent. The method is a special case of generalized least squares (GLS), and as such the PGLS estimator is also unbiased, consistent, efficient, and asymptotically normal. [10] In many statistical situations where GLS (or ordinary least squares [OLS]) is used, the residual errors ε are assumed to be independent and identically distributed normal random variables,

$$\varepsilon \mid X \sim \mathcal{N}(0, \sigma_{\varepsilon}^{2} I),$$

whereas in PGLS the errors are assumed to be distributed as

$$\varepsilon \mid X \sim \mathcal{N}(0, \mathbf{V}),$$

where V is a matrix of the expected variances and covariances of the residuals given an evolutionary model and a phylogenetic tree. Therefore, it is the residuals, and not the variables themselves, that are assumed to show phylogenetic signal. This has long been a source of confusion in the scientific literature. [11] A number of models have been proposed for the structure of V, such as Brownian motion, [8] Ornstein-Uhlenbeck, [12] and Pagel's λ model. [13] (When a Brownian motion model is used, PGLS is identical to the independent contrasts estimator. [14]) In PGLS, the parameters of the evolutionary model are typically co-estimated with the regression parameters.

PGLS can only be applied to questions where the dependent variable is continuously distributed; however, the phylogenetic tree can also be incorporated into the residual distribution of generalized linear models, making it possible to generalize the approach to a broader set of distributions for the response. [15] [16] [17]
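As a rough illustration of the PGLS estimator, the sketch below computes the standard GLS solution β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y with NumPy. Everything specific here is an assumption made for the example: the helper name `gls_fit`, the three-species data, and the matrix V, which was written out by hand for a hypothetical tree ((A:1,B:1):1,C:2) under Brownian motion (each entry is the branch length shared by two tips). The evolutionary model is held fixed rather than co-estimated.

```python
import numpy as np

def gls_fit(X, y, V):
    """Generalized least squares estimate of the regression coefficients."""
    Vinv_X = np.linalg.solve(V, X)   # V^{-1} X without forming the inverse
    Vinv_y = np.linalg.solve(V, y)   # V^{-1} y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Brownian-motion covariance for the hypothetical tree ((A:1,B:1):1,C:2):
# diagonal = root-to-tip distance, off-diagonal = shared path length.
V = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

x = np.array([1.0, 2.0, 4.0])              # made-up predictor values
y = np.array([1.0, 3.0, 5.0])              # made-up response values
X = np.column_stack([np.ones_like(x), x])  # intercept + slope design matrix

beta_pgls = gls_fit(X, y, V)               # accounts for shared ancestry
beta_ols = gls_fit(X, y, np.eye(3))        # V = I reduces GLS to OLS
print(beta_pgls, beta_ols)
```

Passing an identity matrix for V recovers ordinary least squares, which makes the role of the phylogenetic covariance easy to see by comparing the two fits.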

Proto-Indo-European reconstruction

Reconstruction of the Proto-Indo-European labial stops (made with the lips) and dental stops (made with the tip of the tongue touching the teeth) is fairly straightforward. More controversial is the reconstruction of the Proto-Indo-European sounds underlying the correspondences shown in Table 2.

Table 2. Velar and palatal stops in the Indo-European languages

Greek        Latin     Gothic   Sanskrit   Slavic
k            k         h        sh         s
g            g         k        j          z
kh           h/g/f     g        h          z
p/t/k        qu        wh       k          k
b/d/g        v/gu      q        g          g
ph/th/kh     f/v/gu    w        gh         g

According to the most generally accepted hypothesis, there were in Proto-Indo-European at least two distinct series of velar (or “guttural”) consonants: simple velars (or palatals), symbolized as *k, *g, and *gh, and labiovelars, symbolized as *kʷ, *gʷ, and *gʷh. The labiovelars may be thought of as velar stops articulated with simultaneous lip-rounding. In one group of languages, the labial component is assumed to have been lost, and in another group the velar component; it is only in the Latin reflex of the voiceless *kʷ that both labiality and velarity are retained (compare Latin quis from *kʷi-). It is notable that the languages that have a velar for the Proto-Indo-European labiovelar stops (e.g., Sanskrit and Slavic) have a sibilant or palatal sound (s or ś) for the Proto-Indo-European simple velars. Earlier scholars attached great significance to this fact and thought that it represented a fundamental division of the Indo-European family into a western and an eastern group. The western group (comprising Celtic, Germanic, Italic, and Greek) is commonly referred to as the centum group; the eastern group (comprising Sanskrit, Iranian, Slavic, and others) is called the satem (satəm) group. (The words centum and satem come from Latin and Iranian, respectively, and mean “hundred.” They exemplify, with their initial consonant, the two different treatments of the Proto-Indo-European simple velars.) Nowadays less importance is attached to the centum–satem distinction. But it is still generally held that in an early period of Indo-European, a sound law was operative in the dialect or dialects from which Sanskrit, Iranian, Slavic, and the other so-called satem languages developed, and that this law palatalized the original Proto-Indo-European velars and eventually converted them to sibilants.

Rule-Based Models and Applications in Biology

Complex systems are governed by dynamic processes whose underlying causal rules are difficult to unravel. However, chemical reactions, molecular interactions, and many other complex systems can usually be represented as concentrations or quantities that vary over time, which provides a framework to study these dynamic relationships. An increasing number of tools use these quantifications to simulate complex systems dynamically and so better understand their underlying processes. The application of such methods covers several research areas, from biology and chemistry to ecology and even the social sciences. In the following chapter, we introduce the concept of rule-based simulations based on the Stochastic Simulation Algorithm (SSA), as well as other mathematical methods, such as Ordinary Differential Equation (ODE) models, to describe agent-based systems. We also describe the mathematical framework behind Kappa (κ), a rule-based language for the modeling of complex systems, and some extensions for spatial models implemented in PISKaS (Parallel Implementation of a Spatial Kappa Simulator). To facilitate the understanding of these methods, we include examples of how these models can be used to describe population dynamics in a simple predator-prey ecosystem or to simulate circadian rhythm changes.

Keywords: Rule-based modeling; Stochastic simulation; κ language.
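To make the SSA idea concrete, here is a minimal sketch of Gillespie's direct method applied to a simple predator-prey system. It is written in plain Python and is not Kappa or PISKaS syntax; the three rules (prey birth, predation, predator death), the function name `gillespie_predator_prey`, and all rate constants and initial counts are assumptions chosen only for illustration.

```python
import random

def gillespie_predator_prey(prey, predators, rates, t_end, rng):
    """Simulate one stochastic trajectory as a list of (time, prey, predators)."""
    birth, predation, death = rates
    t = 0.0
    trajectory = [(t, prey, predators)]
    while True:
        # Propensity of each rule given the current population counts.
        propensities = [
            birth * prey,                  # prey -> 2 prey
            predation * prey * predators,  # prey + predator -> 2 predators
            death * predators,             # predator -> (nothing)
        ]
        total = sum(propensities)
        if total == 0.0:
            break                          # both populations extinct
        # Waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick which rule fires, with probability proportional to its propensity.
        r = rng.random() * total
        if r < propensities[0]:
            prey += 1
        elif r < propensities[0] + propensities[1]:
            prey -= 1
            predators += 1
        else:
            predators -= 1
        trajectory.append((t, prey, predators))
    return trajectory

# Hypothetical parameters: 50 prey, 20 predators, simulated for 5 time units.
trajectory = gillespie_predator_prey(50, 20, (1.0, 0.01, 0.5), 5.0, random.Random(1))
print(len(trajectory), trajectory[-1])
```

Each run with a different seed gives a different trajectory; averaging many runs approximates the behaviour that the corresponding ODE model describes deterministically.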

Mathematical Concepts and Methods in Modern Biology

Mathematical Concepts and Methods in Modern Biology offers a quantitative framework for analyzing, predicting, and modulating the behavior of complex biological systems. The book presents important mathematical concepts, methods and tools in the context of essential questions raised in modern biology.

Designed around the principles of project-based learning and problem-solving, the book considers biological topics such as neuronal networks, plant population growth, metabolic pathways, and phylogenetic tree reconstruction. The mathematical modeling tools brought to bear on these topics include Boolean and ordinary differential equations, projection matrices, agent-based modeling and several algebraic approaches. Heavy computation in some of the examples is eased by the use of freely available open-source software.

Phytools development web-log and other resources

This package so far implements a number of methods for phylogenetic comparative biology, phylogeny inference, tree manipulation and graphing. However, the phytools project is a work in progress. To keep users of phytools up to date on bugs, improvements, and new functionality, I maintain an active web-log (i.e. ‘blog’). This blog acts both as a conduit between the developer (presently myself) and users of the phytools package, and as a sort of open lab notebook (Butler 2005; Bradley et al. 2011) in which I document the details of bug fixes, software implementation, and use. Most of the functions listed earlier have already been featured on the blog (in the course of their development and refinement). Future work on phytools will also be documented there.

Finally, in addition to my blog, there are a number of other helpful web and email forums for phytools and for phylogenetics in the R language generally. The phylogenetics CRAN Task View and the R-sig-phylo email mailing list are two of the most important such resources.