## Design of Experiments@Southampton

Our monthly study group is organised by Mia Tackney and provides an opportunity for the group to present current work and discuss recent design of experiments papers. Please contact Mia to be added to the study group mailing list.

Our next meeting is being held on **5 July 2019**.
**Time/venue:** 12:00 in 58/2097
**Speaker:** Damianos Michaelides
**Title:** A review of Bayesian design of experiments
**Abstract:** The talk focuses on basic concepts of Bayesian design of experiments, with a view to demonstrating how Bayesian optimal designs are obtained. After considering a simple linear scenario for which optimal designs are analytically tractable, the practical and computational challenges that arise when finding Bayesian designs for other types of models are described. Problems of intractability and high dimensionality raise the need for Monte Carlo methods that approximate the expected utility under certain utility functions. These approximation methods are then implemented in practice for a generalised linear model, with the aim of finding Bayesian designs using various modern algorithms. The latter is particularly important because it allows the algorithms to be evaluated and compared in terms of computing cost and accuracy.

**Previous meetings**

**Date:** 8 May 2019
**Speaker:** Antony Overstall
**Title:** Bayesian design for physical models using computer experiments
**Abstract:** Design of experiments is an "a-priori" activity making the Bayesian approach particularly attractive. Bayesian inference allows any available prior information to be included in both design and analysis, yielding posterior distributions for quantities of interest that are more interpretable by researchers. Furthermore, the design process allows the precise aim of the experiment to be incorporated. Mathematically, a Bayesian design is given by the maximisation of an expected utility function over all unknown quantities. While straightforward in principle, finding a Bayesian design in practice is difficult. The utility and expected utility functions are rarely available in closed form and require approximation, and the space of all designs can be high dimensional. These problems are compounded when the data-generating process is thought to depend on an analytically intractable physical model, i.e. an intractable likelihood model. This talk will review a recent research programme that has developed methodology to find practically-relevant Bayesian designs for a range of physical models. The methodology uses several layers of computer experiments to both approximate quantities of interest (utilities and expected utilities) and to use these approximations to find Bayesian designs.
**Speaker:** Sam Jackson
**Title:** Approaches to the emulation of chains of computer models with application to epidemic policy making
**Abstract:** We have developed novel Bayesian emulation methodology to analyse chains of computer models, where the outputs of one model feed into the next model. Such computer models, with inputs and outputs representing quantities of interest, are frequently developed to aid the understanding of the general behaviour of real-world processes. In particular, the motivation for this work comes from epidemic disease modelling, linking, for example, atmospheric dispersion, dose-response and epidemiological models. Emulation techniques are well discussed in the literature as an approach to efficiently understanding computationally intensive models; however, for the purposes of analysing chains of computer models, they tend to focus on approximating the entire chain using a single emulator.

Our work focusses on several methods to link Bayes linear emulators of each component model of a chain. We have developed emulators for models where the input is uncertain (as the inputs to all but the first model are the uncertain emulated outputs of another model). The first method proposes analysing each emulator's behaviour for a sample of inputs arising from a probability distribution commensurate with our beliefs about the output of the previous emulator. The second method extends the field of emulation to directly incorporate uncertain inputs within each emulator itself. We demonstrate the potential of these novel emulation approaches using intuitive examples, before demonstrating their advantage over the single emulator approach in our application of modelling of epidemic diseases. Application of our techniques to models of such epidemics permits detailed uncertainty quantification via, for example, thorough sensitivity analysis into the effect of unknown quantities, thus aiding online policy decision making in the event of an epidemic.
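The first linking method, sampling inputs to the downstream emulator from a distribution matching the upstream emulator's predictive output, can be sketched roughly in Python. The two "emulators" below are invented closed-form stand-ins (in practice they would be Bayes linear or Gaussian process emulators); all numbers are for illustration only.

```python
import random
import statistics

# Hypothetical emulators: each returns a (mean, std) predictive summary
# at a given input. These toy functions stand in for fitted emulators.
def emulator_A(x):
    return 2.0 * x + 1.0, 0.1            # predictive mean, predictive std

def emulator_B(y):
    return y ** 2, 0.05 * (1 + abs(y))   # predictive mean, predictive std

def linked_prediction(x, n_samples=10_000, seed=0):
    """Propagate uncertainty through the chain A -> B by sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        mu_a, sd_a = emulator_A(x)
        y = rng.gauss(mu_a, sd_a)        # draw from A's predictive distribution
        mu_b, sd_b = emulator_B(y)
        samples.append(rng.gauss(mu_b, sd_b))
    return statistics.mean(samples), statistics.stdev(samples)

mean, sd = linked_prediction(1.0)
print(round(mean, 2))  # close to (2*1 + 1)**2 = 9, slightly inflated by A's variance
```

The key point is that the downstream prediction inherits the upstream uncertainty: a single "plug-in" evaluation at the upstream mean would understate the spread.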
**Date:** 4 April 2019
**Speaker:** Stephen Gow
**Title:** Emulation for chains of computational models and a dispersion-dose response application
**Abstract:** There are many fields in which it is of interest to make predictions from a chain of computational models or simulators, in which the output of one simulator in the chain forms one of the inputs to the next simulator. For example, chains of simulators can be used to describe the impact of the release of a biological agent by combining atmospheric dispersion and casualty models. Each simulator will often be computationally intensive, and for computational feasibility must be approximated by a simpler Gaussian process emulator. We review methods to predict the output of the final simulator in such a chain, introduce sensitivity analysis techniques, and demonstrate methodology for an application of biological agent dispersion.
**Speaker:** Mia Tackney
**Title:** Sequential design for personalized medicine
**Abstract:** Advances in genomics are making it possible to personalize medicine so that treatments are tailored to patients' genetic information. For example, cancers can now be characterized at the molecular level, and treatments can be targeted at specific genetic and biological mechanisms (biomarkers). Personalized clinical trials need to be able to identify effective treatment-biomarker combinations. We demonstrate a method of designing a sequential experiment with an adaptive treatment allocation scheme which seeks to both find effective treatments and estimate the corresponding treatment-biomarker interactions. This method weights the probabilities of treatment assignment according to an optimality criterion which takes into account the biomarkers, treatment assignment and response of the patients in the trial so far. We provide examples of both myopic and non-myopic sequential strategies. In the former, decisions on which treatment to apply to the current patient ignore any potential information about future patients; in the latter, we account for potential treatment allocations to future patients when choosing the treatment for the current patient. We describe some computational challenges in implementing the non-myopic method and, through simulation studies, we describe possible settings in which it may provide benefit over the myopic approach.
**Date:** 22 March 2019
**Title:** Paper discussion - Joseph, Gu, Ba and Myers (2019). Space-filling designs for robustness experiments. Technometrics, 61, 24-37 (https://amstat.tandfonline.com/doi/full/10.1080/00401706.2018.1451390#.XIYcZYj7SUl)
**Date:** 25 February 2019
**Speaker:** Dasha Semochkina
**Title:** Optimisation using emulation in disease modelling. How to use potential improvements to identify new design points
**Abstract:** The primary interest of this project is in optimising computationally expensive (and possibly competing) objective functions. Without loss of generality, we consider minimisation. In this talk I will present some emulation-based optimisation techniques.

The simplest emulation-based (also known as surrogate-based) optimisation recipe goes like this: we build a sampling plan, calculate the responses at these points and fit a surrogate model to the data, then locate the input parameters that appear closest to the true minimum of the function. However, the optimisation task is not complete until we validate this function value against the true, expensive function itself. We can, and indeed should, retain some of our computational budget so that this search-and-update process can be repeated many times, adding multiple so-called infill points.

When performing a search-and-infill strategy, we wish to position the next infill point at the value which will lead to an improvement on our best observed value so far. One possibility is to calculate a so-called expected improvement at a finite set of points and select the best one. This, however, assumes that our current minimum observation correctly represents the function’s value at that point. If noise is present, using only the noisy observations is a highly risky strategy, since the noise may introduce errors in the ranking of the observations. Expected quantile improvement introduces a different infill criterion in order to overcome this drawback.
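Expected improvement has a closed form when the surrogate's prediction at a point is Gaussian. The Python sketch below evaluates it over a finite candidate set, as described above; the candidate points and their predictive summaries are invented for illustration.

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Expected improvement (for minimisation) at a point whose surrogate
    prediction is Gaussian with mean mu and standard deviation sigma."""
    if sigma <= 0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))         # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal pdf
    return (f_best - mu) * Phi + sigma * phi

# Pick the best infill point from a finite candidate set (invented values):
# each tuple is (x, predictive mean, predictive std).
candidates = [(0.2, 1.3, 0.4), (0.5, 0.9, 0.8), (0.8, 1.1, 0.1)]
f_best = 1.0  # best observed value so far
best_x = max(candidates, key=lambda c: expected_improvement(c[1], c[2], f_best))[0]
print(best_x)  # 0.5: the candidate balancing a low mean with high uncertainty
```

Note how the winning candidate is not the one with the lowest predicted mean alone; the criterion trades off exploitation (low mean) against exploration (high predictive uncertainty).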
**Date:** 1 February 2019
**Speaker:** Kim Lee (MRC Biostatistics Unit, University of Cambridge)
**Title:** Introduction to methods that allow for the incorporation of historical control data
**Abstract:** In this talk, I will introduce methods that allow for the incorporation of historical control data. The rationale for doing this is to borrow information across studies, aiming to increase the precision of inference. I will describe some work on designs that account for these analysis methods. A review of the analysis methods that I will be discussing can be found in the paper by van Rosmalen et al. (2018).
**Date:** 7 December 2018
**Speaker:** Stefana Juncu (Department of Psychology, University of Portsmouth)
**Title:** The role of presenting multiple images on prospective person memory
**Abstract:** To find a missing person, the public must be on the lookout and alert the authorities if they encounter the person during their everyday activities. Border control officers use the same type of memory process when looking for wanted individuals whilst doing their everyday jobs. In laboratory and field-based studies, performance on these types of prospective person memory tasks has been poor. In this study, we will explore whether presenting different types (i.e., low variability vs high variability) of images of the targets improves prospective person memory accuracy. Moreover, we will also look at the impact of the way these photographs are presented (i.e., simultaneous vs massed sequential vs distributed sequential) on performance. This study will use a 2 (variability: low vs high) x 3 (presentation mode: simultaneous, massed sequential or distributed sequential) mixed design. The experimental design aspects of interest in this study are sample size calculations and finding an appropriate stopping rule.
**Date:** 26 October 2018
**Speaker:** David Arcia (Cancer Sciences, Faculty of Medicine)
**Title:** Cancer Immunotherapy: training the immune system to eliminate tumours
**Abstract:** Cancer represents one of the deadliest diseases in humans, and currently approved therapeutic strategies fail to eliminate tumours from patients and are associated with many adverse effects. Over the last few years, cancer immunotherapy drugs have proven their utility for the treatment of this disease, showing complete long-lasting remission in some cases. The main goal of immunotherapy is to activate the host immune system so it can detect and eliminate cancer cells. Particular components of the immune system are considered the main effectors of these anti-cancer responses, specifically CD8+ T cells, which have been the target cells for most immunotherapy studies. However, in basal conditions, CD8+ T cells are heavily immunosuppressed by different mechanisms in the tumour microenvironment, which include the presence of regulatory cells and the high expression of co-inhibitory molecules, such as CTLA-4 and PD-1. Several monoclonal antibodies have been developed that block these co-inhibitory molecules and have been shown to release CD8+ T cells from their immunosuppressive state, thus inducing strong anti-tumour immune responses that translate into beneficial clinical outcomes. Moreover, different strategies aim to boost the activity of CD8+ T cells, including vaccination strategies comprising short cancer-derived protein fragments, known as epitopes, which can activate the anti-tumour activity of these cells. These strategies have shown exciting results in pre-clinical models; however, an appropriate study design is imperative for a successful translation of the results into the human setting. This talk will focus on the design of relevant experiments in cancer immunotherapy pre-clinical models, which can help pave the way for novel therapeutic approaches against cancer.
**Date:** 25 September 2018
**Title:** Paper discussion - Binois, Huang, Gramacy and Ludkovski (2018). Replication or exploration? Sequential design for stochastic simulation. Technometrics, in press (https://amstat.tandfonline.com/doi/full/10.1080/00401706.2018.1469433#.W4mZBy2ZNGx)
**Date:** 27 July 2018
**Speaker:** Ben Parker
**Title:** Optimal design for queueing systems: maximal information by appropriate measurement on queues
**Abstract:** Queues occur frequently in many areas of nature and technology, for example in transport modelling, but queueing models are currently used most heavily in modelling communications networks.

Whilst probabilistic results have been derived from queueing models, there has been limited research about the optimal measurement times in order to make inference about the parameters which determine the queues' behaviour. In particular, in some applications (e.g. communications networks) measuring queues can require adding customers to the queue to act as survey customers. This has the effect of altering the future behaviour of the queue, and potentially changing the optimal measurement pattern of the queues: observations interfere with the experiment. We look in some detail at this interesting interfering case.

We examine the optimal design of measurements on queues with particular reference to the M/M/1 queue. Using the statistical theory of design of experiments, we find optimal times to measure the queue when the parameters of the queue are unknown. We present some guidelines on how to extend for other probabilistic models (differential equation models and more general Markov chain models).
**Speaker:** Dave Woods
**Title:** Design of experiments for the calibration of computational models
**Abstract:** Computational modelling now underpins much research in the sciences and engineering, allowing in silico investigations and predictions of complex systems. Reliable and accurate computational modelling often relies on the calibration of the model using physical data, usually collected via designed experiments. These data are used to tune, or estimate, unknown model parameters and, perhaps, to learn the discrepancy between the computational model and reality.

In this talk, we present some new methods for the optimal design of physical experiments for this calibration problem. A Bayesian approach is adopted, with a Gaussian process prior assumed for the output from the computational model. New decision-theoretic optimal designs are sought for this problem, using novel methods for the numerical approximation of the expected utility of a design. The results are motivated by, and demonstrated on, problems from science and technology.

Joint work with Yiolanda Englezou and Tim Waite.
**Date:** 13 July 2018
**Speaker:** Meshayil Alsolmi
**Title:** Decision-theoretic optimal experimental design for generalized linear models
**Abstract:** Bayesian designs are found by maximising the expectation of a utility function, where the utility function is chosen to represent the aim of the experiment. There are several hurdles to overcome when considering Bayesian design for intractable models. Firstly, common to nearly all Bayesian design problems, the expected utility function is not analytically tractable and requires approximation. Secondly, this approximate expected utility needs to be maximised over a potentially high-dimensional design space. To compound these problems, thirdly, the model is intractable, i.e. has no closed form. New approaches to maximise an approximation to the expected utility for intractable models are developed and applied to exemplar design problems with experimental aims of parameter estimation and model selection.
**Speaker:** Antony Overstall
**Title:** Bayesian design for intractable models
**Abstract:** Alphabetical optimal designs, found by minimising a scalar function of the inverse Fisher information matrix, represent the de-facto standard in optimal design of experiments. For example, the well-known D-optimal design is found by minimising the log determinant of the inverse Fisher information matrix. An alternative decision-theoretic basis for frequentist design is proposed whereby designs are found by minimising the risk function defined as the expectation of an appropriate loss function. The conceptual advantages of the decision-theoretic framework over alphabetical optimal designs will be discussed. However, finding such designs is complicated due to the considerable computational challenges of minimising an analytically intractable risk function, leading to the need for suitable approximation methods. A number of approximation methods are proposed to generate frequentist decision-theoretic optimal designs for different classes of models. Comparison between these approximations will be considered in terms of performance and computing time.
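As a minimal illustration of the alphabetical criterion mentioned above (D-optimality as minimising the log determinant of the inverse Fisher information matrix), the following Python sketch compares two invented designs for a simple linear model; it is a toy calculation, not taken from the talk.

```python
import math

def log_det_inv_info(xs):
    """log det of (X^T X)^{-1} for the simple linear model y = b0 + b1*x,
    where X has rows (1, x). Lower values are better under D-optimality."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    det = n * s2 - s1 * s1  # determinant of the 2x2 information matrix
    return -math.log(det)   # log det of the inverse

# A D-optimal design for a straight line on [-1, 1] puts runs at the
# endpoints; compare with an evenly spread design of the same size.
endpoints = [-1, -1, 1, 1]
spread = [-1, -1/3, 1/3, 1]
print(log_det_inv_info(endpoints) < log_det_inv_info(spread))  # True
```

The endpoint design wins under this criterion, which matches the classical result that D-optimal designs for a straight line place equal weight on the extremes of the design region.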

**Date:** 15 May 2018
**Speaker:** Sam Jackson
**Title:** Design of physical systems experiments using history matching methodology
**Abstract:** History matching is a method of efficiently identifying the subset of the input parameter space of a model giving rise to acceptable matches to observed data, given our state of uncertainty about the model and the measurements. We achieve this by iteratively removing parts of the space classed as implausible with the aid of a series of increasingly accurate Bayes linear emulators. Analysis of the resulting subset is informative for answering specific scientific questions about the physical system corresponding to various relevant features of the model. I will discuss history matching and how we have developed a powerful technique for the design of future system experiments based on history matching criteria relating to current scientific research aims. We will demonstrate our novel methodology in a systems biology setting, analysing hormonal crosstalk in the roots of *Arabidopsis thaliana*.

**Date:** 19 April 2018
**Speaker:** Danesh Tarapore (Electronics and Computer Science)
**Title:** Data-efficient machine learning for fault-tolerant robots
**Abstract:** Robots have transformed many industries such as manufacturing and logistics. However, a major obstacle to their widespread adoption in more complex environments outside factories is their fragility. Whereas animals can quickly adapt to injuries, current robots cannot ‘think outside the box’ to find a compensatory behaviour when they are damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots. A promising approach to reducing robot fragility involves having robots learn appropriate behaviours in response to damage, but current techniques are slow even with small, constrained search spaces. In this talk, I will introduce a new type of evolutionary algorithm (EA) that enables robots to adapt to damage in less than two minutes in large search spaces without requiring self-diagnosis or pre-specified contingency plans. Before the robot is deployed, it uses the EA to illuminate a detailed map of the space of high-performing behaviours. This map represents the robot’s prior knowledge about what behaviours it can perform and their value. When the robot is damaged, it uses this prior knowledge to guide a Bayesian optimization process to rapidly discover behaviours that work despite the damage. Such a combination of bio-inspired algorithms and data-efficient machine learning techniques allows us to optimize the parameters of walking-robot controllers in 36 dimensions, discovering compensatory behaviours for damage (e.g., broken joints) in just a few trials (less than two minutes). The work was featured on the front cover of Nature.

**Date:** 16 February 2018
**Speaker:** Stefanie Biedermann
**Title:** Optimal Design for Experiments with Potentially Missing Data
**Abstract:** The presence of missing response values complicates statistical analyses. However, incomplete data are particularly problematic when constructing optimal designs, as it is not known at the design stage which values will be missing. When data are missing at random (MAR) it is possible to incorporate this information into the optimality criterion that is used to find designs. However, when data are not missing at random (NMAR) such a framework can lead to inefficient designs.

We address the specific challenges that NMAR values present when finding optimal designs for linear regression models. We show that the optimality criteria will depend on model parameters that traditionally do not affect the design, such as regression coefficients and the residual variance. We also develop a framework that improves efficiency of designs over those found assuming values are MAR.

**Date:** 22 January 2018
**Speaker:** Mia Tackney
**Title:** Design of Experiments for Personalized Medicine
**Abstract:** Advances in genomics are making it possible to personalize medicine so that treatments are tailored to patients’ genetic information. For example, cancers can now be characterized at the molecular level, and treatments can be targeted at specific genetic and biological mechanisms (biomarkers). There is a need for clinical trials to be able to validate effective treatment-biomarker combinations. We consider how to design a sequential experiment with an adaptive treatment allocation scheme which seeks to find optimal treatment-biomarker combinations. This method weights the probabilities of treatment assignment according to accruing data on treatment effectiveness for each biomarker subgroup. We show some simulation results and discuss issues around constructing the initial design and how to define effective sample size.

This is joint work with Kim May Lee (MRC Biostatistics Unit, University of Cambridge)

**Date:** 11 December 2017
**Speaker:** Stephen Gow
**Title:** Prediction for chains of computer models using Gaussian process emulators
**Abstract:** There are many important applications in which it is of interest to make predictions from a chain of computational models, in which the output of a model in the early part of the chain forms one of the inputs to a later model. Each model in the chain will often be computationally intensive, and must therefore be approximated by a simpler emulator in order to be used feasibly. We introduce a method to predict the output of a chain of models approximated by Gaussian process emulators, and consider sensitivity analysis and the problem of experimental design for such a chain.

**Date:** 24 November 2017
**Speaker:** Antony Overstall
**Title:** Bayesian design for intractable models using indirect inference and multivariate emulators
**Abstract:** There are several hurdles to overcome when considering Bayesian design for intractable models. Firstly, common to nearly all Bayesian design problems, the expected loss function is not analytically tractable and requires approximation. Secondly, this approximate expected loss needs to be minimised over a potentially high-dimensional design space. To compound these problems, thirdly, the model is intractable, i.e. has no closed form. The gold standard for inference in this latter case is to use Approximate Bayesian Computation (ABC) to evaluate the posterior distribution. However this, in the case of design, is highly inefficient. Instead, indirect inference will be used. This is where an auxiliary model is developed to approximate the intractable model. The difficulty now lies in choosing the auxiliary model. An automatic approach is described based on multivariate emulators. This approach is used to find approximately optimal designs in several illustrative examples.

**Date:** 4 October 2017
**Speaker:** Yiolanda Englezou
**Title:** New methods for approximating the expected utility in Bayesian design for nonlinear models
**Abstract:** The estimation of empirical and physical models is often performed using data collected via experimentation. Hence, the design of the experiment is crucial in determining the quality of the results. For complex models, an optimal design often depends on features, particularly model parameters, which are uncertain prior to experimentation. This dependence leads naturally to a Bayesian approach which can (a) make use of any prior information on these features, and (b) be tailored to the reduction of posterior uncertainty.

Optimal Bayesian design for most realistic models is complicated by the need to approximate an analytically intractable expected utility; for example, the expected gain in Shannon information from the prior to posterior distribution. For models which are nonlinear in the uncertain parameters, this expected gain must be approximated numerically. The standard approach employs "double-loop" Monte Carlo integration using nested sampling from the prior distribution. Although this method is easy to implement, it produces biased approximations and is computationally expensive.
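The double-loop Monte Carlo estimator described above can be sketched in Python for a deliberately simple toy model: a standard normal prior on theta and data y ~ N(theta * d, 1), where d is the design variable. This model is chosen for illustration and is not taken from the talk.

```python
import math
import random

def log_lik(y, theta, d):
    """Log density of y ~ N(theta * d, 1): a toy stand-in model."""
    return -0.5 * (y - theta * d) ** 2 - 0.5 * math.log(2 * math.pi)

def double_loop_eig(d, n_outer=500, n_inner=500, seed=0):
    """Nested ('double-loop') Monte Carlo estimate of the expected
    Shannon information gain from prior to posterior at design d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0, 1)      # outer loop: draw from the prior
        y = rng.gauss(theta * d, 1)  # simulate data at design d
        # inner loop: estimate the marginal likelihood p(y) with fresh prior draws
        inner = [math.exp(log_lik(y, rng.gauss(0, 1), d)) for _ in range(n_inner)]
        total += log_lik(y, theta, d) - math.log(sum(inner) / n_inner)
    return total / n_outer

# A larger |d| makes the experiment more informative about theta
print(double_loop_eig(2.0) > double_loop_eig(0.1))  # True
```

The nested structure is what makes the method expensive (n_outer * n_inner likelihood evaluations), and the inner average's finite sample size is the source of the bias mentioned above.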

In this talk, we will describe, assess and compare some recent alternatives to simple Monte Carlo sampling from the prior for the approximation of expected utilities. The presented methods include combinations of features from importance sampling and Laplace approximations. Assessments will include both computational cost and the statistical qualities of the resulting approximations.

**Date:** 13 September 2017
**Speaker:** Ben Parker
**Title:** Optimal Design of Experiments on Connected Units: How to use experiments to measure networks better, and how to use networks to make experiments better
**Abstract:** We review our recent method for designing experiments on social networks. In previous work, we introduced a linear network effects model, which provides a framework for finding the optimal design of experiments when experimental units were connected according to some relationship, which was specified by an adjacency matrix. We showed how these networks could be useful in a variety of experimental applications: for example, agricultural experiments where experimental units (plots) were connected by some spatial relationship, and also in crossover trials, where experimental units were connected by temporal networks.

In this current work, we argue that there is a wide class of experiments that can be reformulated into a problem of design on a network, and that by presenting the problem in this networked form, we can develop faster new algorithms that allow optimal designs on the original problems to be found more quickly. In other words, by regarding experimental design models as problems in network science, we can improve experimental design algorithms even when there is no obvious network relationship.

**Date:** 27 July 2017
**Speaker:** Dave Woods
**Title:** Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion
**Abstract:** It is often desirable to build a statistical emulator of a complex computer simulator in order to perform analysis which would otherwise be computationally infeasible. We propose methodology to model multivariate output from a computer simulator taking into account output structure in the responses. The utility of this approach is demonstrated by applying it to a chemical and biological hazard prediction model. Predicting the hazard area which results from an accidental or deliberate chemical or biological release is imperative in civil and military planning and also in emergency response. The hazard area resulting from such a release is highly structured in space, and we therefore propose the use of a thin-plate spline to capture the spatial structure and fit a Gaussian process emulator to the coefficients of the resultant basis functions. We compare and contrast four different techniques for emulating multivariate output: dimension reduction using (i) a fully Bayesian approach with a principal component basis, (ii) a fully Bayesian approach with a thin-plate spline basis, assuming that the basis coefficients are independent, (iii) a "plug-in" Bayesian approach with a thin-plate spline basis and a separable covariance structure; and (iv) a functional data modelling approach using a tensor-product (separable) Gaussian process. We develop methodology for the two thin-plate spline emulators and demonstrate that these emulators significantly outperform the principal component emulator. Further, the separable thin-plate spline emulator, which accounts for the dependence between basis coefficients, provides substantially more realistic quantification of uncertainty, and is also computationally more tractable, allowing fast emulation. For high resolution output data, it also offers substantial predictive and computational advantages over the tensor-product Gaussian process emulator.

**Date:** 19 June 2017
**Speaker:** Maria Adamou
**Title:** Optimal design with profile factors
**Abstract:** Increasing numbers of experiments in science and engineering involve profile factors, whose values can be varied (e.g. as a function of time) within a single run of the experiment. A typical example of a profile factor is temperature, which might be varied monotonically or as a step function. The design problem then becomes choosing suitable functions for the profile factor for each run of the experiment.

Motivated by biopharmaceutical studies, we present some initial results on optimal design for factorial experiments with profile factors. Both frequentist and Bayesian designs are considered, and some connections are made between smoothness of functional parameters and sparsity inducing prior distributions.

**Date:** 23 May 2017
**Speaker:** Martina Testori
**Title:** Modelling the effect of psychopathic traits on strategies in the iterated prisoner's dilemma game
**Abstract:** One of the first and central assumptions of game theory is that players are rational, i.e. they will always choose the action which gives the best outcome, given what they expect their opponents will do. Despite this, researchers have started investigating which other considerations may affect the decision-making process, having noticed "irrational" behaviour in experimental research. Emotions and psychological notions have been included in economic models to explain human behaviour in social games. Additionally, it is well known that visual facial expressions communicate messages and intentions, and a particular focus has been placed on the emotional content of smiles and frowns. We have conducted experiments in which smiling and frowning faces have been used to communicate the opponent’s state of mind to the players. The aim of this study is to investigate how psychopathic traits affect the rate of cooperation in the presence of emotional feedback in an Iterated Prisoner’s Dilemma (IPD) game, in order to answer the question of how empathy affects human behaviour. Psychopaths are characterised by a general lack of emotional empathy and attenuated response to emotional stimuli. Therefore, people scoring high values in psychopathic measures will not be as affected by their opponents’ facial expressions as people scoring low values in those measures. Hence, we aim to show a negative correlation between cooperation and psychopathic personality traits. In this talk, I will describe the methodology of our work so far and potential ideas for a future experiment.

**Date:** 28 April 2017
**Speaker:** Antony Overstall
**Title:** knitr: reproducible reports
**Abstract:** knitr is an R package that allows you to integrate R code into LaTeX. Reports that present numerical results (e.g. theses, papers, lecture notes, practical sheets) can then be generated reproducibly, free of the errors caused by copy-and-paste mistakes, typos, etc. In this talk, I will briefly introduce knitr based on my own experience of using it to create papers, lecture notes, and R worksheets.
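By way of illustration (not material from the talk itself), a minimal knitr source file might look like the following; the chunk label is hypothetical, and when the document is knitted each chunk and each `\Sexpr{}` call is replaced by computed output, so reported numbers can never drift out of sync with the code:

```latex
\documentclass{article}
\begin{document}

% An R code chunk: knitr runs this and inserts the result.
<<summary, echo=FALSE>>=
x <- rnorm(100)
mean(x)
@

% Inline R output embedded directly in a sentence.
The mean of our sample is \Sexpr{round(mean(x), 2)}.

\end{document}
```

The file would typically be saved with an `.Rnw` extension, processed with `knitr::knit()` to produce a `.tex` file, and then compiled with LaTeX as usual.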

**Date:** 31 March 2017
**Speaker:** Mia Tackney
**Title:** Seesaws for sequential design: balancing covariates for treatment groups in clinical trials
**Abstract:** In many clinical trials, patients enrol for the study sequentially over time. Covariate information such as the patients’ age, sex and medical history is taken at enrolment, and a treatment is allocated shortly after. Complete randomization is one way to assign treatments to patients. However, a given realization of randomization could lead to a design that is highly unbalanced for one or more of the covariates. This can lead to confounding bias and is particularly problematic for small experiments. Further, since covariate information is only known for patients who have enrolled, stratified randomization to ensure balance is not possible. I will present an approach from clinical trials methodology called "minimization", which aims to allocate treatments in a way that preserves covariate balance between treatment groups. I will also present alternative methods from an optimal design perspective, and illustrate some connections between them.
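The minimization idea can be sketched as follows. This is a generic illustration in the spirit of Pocock and Simon's method, not necessarily the formulation discussed in the talk; the covariates, arm labels and allocation probability are purely illustrative:

```python
import random

def minimization_assign(new_patient, patients, treatments=("A", "B"), p=0.8):
    """Assign a treatment to new_patient (a dict of covariate levels),
    aiming to keep covariate counts balanced across treatment groups.

    patients: list of (covariates_dict, treatment) for those already enrolled.
    p: probability of choosing the imbalance-minimising arm
       (p < 1 retains an element of randomness in the allocation).
    """
    scores = {}
    for arm in treatments:
        # Total imbalance if the new patient were placed on this arm:
        # for each covariate, count existing patients sharing the new
        # patient's level on each arm, then take the range across arms.
        total = 0
        for cov, level in new_patient.items():
            counts = {t: 0 for t in treatments}
            for covs, t in patients:
                if covs.get(cov) == level:
                    counts[t] += 1
            counts[arm] += 1  # hypothetically add the new patient here
            total += max(counts.values()) - min(counts.values())
        scores[arm] = total
    best = min(scores, key=scores.get)
    if random.random() < p:
        return best
    return random.choice([t for t in treatments if t != best])
```

For example, if two females have already been assigned to arm A, a newly enrolled female is steered towards arm B, since that choice minimises the resulting imbalance.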

**Date:** 14 February 2017
**Speaker:** Alistair Bailey (Cancer Studies, Southampton)
**Title:** Identifying allergenic peptides presented by skin cells
**Abstract:** In this talk I will describe our work to identify the protein fragments, called peptides, responsible for an allergic reaction in skin using mass spectrometry proteomics. Due to technical and financial constraints, our experiments are limited to four experimental samples under two conditions. The statistical challenge therefore is to determine the likelihood that peptides observed under these two conditions are from different underlying populations.

**Date:** 24 January 2017
**Speaker:** Helen Ogden
**Title:** Inference with approximate likelihoods, and some links with design of experiments
**Abstract:** Many statistical models have likelihoods which are intractable: it is impossible or too expensive to compute the likelihood exactly. In such settings, a common approach is to replace the likelihood with an approximation, and then do inference as if the approximate likelihood were the exact likelihood. It is natural to ask whether the resulting inference will be close to the inference we would have obtained with the exact likelihood. I will describe some conditions on the approximation to the likelihood which ensure that inference using the approximate likelihood will "tend towards" the inference with the exact likelihood, as the amount of information available about the parameters grows. As an example, I will describe some implications of these results for inference for Generalized Linear Mixed Models using a Laplace approximation to the likelihood. I will finish with a discussion of some possible links with problems in design of experiments, such as finding approximately optimal designs for models with intractable likelihoods, and using emulators to conduct inference for simulator models.
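As a toy one-dimensional illustration of the Laplace approximation mentioned above (not material from the talk): the log-integrand is expanded to second order about its mode, giving a Gaussian integral with a closed-form value.

```python
import math

def laplace_approx(log_f, mode, d2_at_mode):
    """Laplace approximation to the integral of exp(log_f(theta)).

    Expands log_f to second order about its mode, so that
    ∫ exp(log_f) dθ ≈ exp(log_f(mode)) * sqrt(2π / -log_f''(mode)).
    """
    return math.exp(log_f(mode)) * math.sqrt(2 * math.pi / -d2_at_mode)

# For a Gaussian integrand the approximation is exact:
# log f(θ) = -θ²/(2σ²), mode at 0, second derivative -1/σ².
sigma = 2.0
approx = laplace_approx(lambda t: -t**2 / (2 * sigma**2), 0.0, -1 / sigma**2)
exact = math.sqrt(2 * math.pi) * sigma  # ∫ exp(-θ²/(2σ²)) dθ
```

For non-Gaussian integrands (such as the random-effects integrals in a GLMM), the approximation is no longer exact, and the quality of the resulting inference is precisely the question the talk addresses.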

**Date:** 30 November 2016
**Speaker:** Dave Woods
**Title:** Closed-loop automatic experimentation for optimisation
**Abstract:** Automated experimental systems, involving minimal human intervention, are becoming increasingly common, providing economical and fast data collection. We discuss some statistical issues around the design of experiments and data modelling for such systems. Our application is to “closed-loop” optimisation of chemical processes, where automation of reaction synthesis, chemical analysis, and statistical design and modelling increases lab efficiency and allows 24/7 use of equipment.

Our approach uses nonparametric regression modelling, specifically Gaussian process regression, to allow flexible and robust modelling of potentially complex relationships between reaction conditions and measured responses. A Bayesian approach is adopted to uncertainty quantification, facilitated through computationally efficient Sequential Monte Carlo algorithms for the approximation of the posterior predictive distribution. We propose a new criterion, Expected Gain in Utility (EGU), for optimisation of a noisy response via fully-sequential design of experiments, and we compare the performance of EGU to extensions of the Expected Improvement criterion, which is popular for optimisation of deterministic functions. We also show how the modelling and design can be adapted to identify, and then down-weight, potentially outlying observations to obtain a more robust analysis.
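The Expected Improvement criterion that the talk compares against can be sketched in its standard textbook form for minimising a deterministic objective (this is not the speakers' EGU criterion; the Gaussian-process posterior mean and standard deviation at a candidate point are assumed given):

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_improvement(mu, sigma, f_best):
    """Expected Improvement at a candidate point, for minimisation.

    mu, sigma: GP posterior mean and standard deviation of the
               objective at the candidate point.
    f_best:    smallest response value observed so far.
    """
    if sigma <= 0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)
```

The next design point is the candidate maximising this quantity, which trades off exploiting points with low predicted mean against exploring points with high predictive uncertainty. For noisy responses this criterion needs modification, which is one motivation for criteria such as EGU.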

Joint work with Tim Waite (University of Manchester)