# Design of Experiments

The Uptimai **Design of Experiments** method builds sampling schemes commonly used
in operational research. It can also extend an existing matrix of data points with
new samples that optimally fill the design space, which is useful when increasing the precision
of the data approximation. The method can be used separately or as an initial step of the
**Data Analyzer** method.

## How to use the interface

The general appearance of the program window, especially its left section, is described in detail
under the **Input preparation** link. Here the main focus is on the other part,
which is to a certain extent specific to each of the supported methods. The initial state
of the GUI window when preparing inputs for the **Design of Experiments** is shown in
Figure 1, where the user starts from scratch and needs to
**Define Input Variables**.

### Define Input Variables

In the beginning, the user creates a new variable (parameter) of the input domain with the
*Add input variable* button. Each added variable appears at the bottom
of the list of already existing variables.

Adding an input variable extends the input domain by one dimension. At least one input variable is required for the model to be created. Figure 2 shows the situation with multiple input variables already created (see this link for more info about the borehole problem), one of which is about to be edited. An input variable is set using the following controls:

- **Variable name**: Label of the input parameter, which is used throughout the whole process up to the postprocessing. The variable name cannot contain spaces; these are automatically replaced with underscores.
- **Distribution**: Selection box where the user sets the shape of the probability distribution of the input variable. Depending on the distribution type selected, additional entries with shape parameters appear. A detailed description of the featured probability distribution types can be found in the section **Input distribution types**.
- **Confirm**: Any changes need to be confirmed with this button to take effect.
- **"X"**: Clicking this icon deletes the input variable.
- **"="**: Allows dragging the input variable to change the ordering of inputs in the project.
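
As a rough illustration of the controls above, an input variable definition can be thought of as a small record holding a name, a distribution type, and its shape parameters. The sketch below is not the Uptimai data model: the class `InputVariable`, its fields, and the distribution label and parameter keys are hypothetical names chosen for this example. It only mirrors the documented rule that spaces in a variable name are replaced with underscores.

```python
from dataclasses import dataclass, field


@dataclass
class InputVariable:
    """Hypothetical record for one input variable (not the Uptimai format)."""
    name: str
    distribution: str                      # label such as "normal" is an assumption
    params: dict = field(default_factory=dict)

    def __post_init__(self):
        # Documented behaviour: spaces in the name become underscores.
        self.name = self.name.replace(" ", "_")


var = InputVariable("borehole radius", "normal", {"mean": 0.10, "std": 0.016})
print(var.name)
```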

Adding one or more input variables activates the *Prepare distributions* button, which invokes the
preparation of randomly distributed samples according to the settings. If there are invalid
entries in the input variable definition, the user is informed and cannot continue to the next
step until all entries are valid. Then, the button itself turns into *Tweak distribution options*,
sending the user to this next step. Also, the **Tweak Distribution Options** item is activated
in the fishbone navigation bar on the left.

### Tweak Distribution Options

At this point, the user adjusts the boundaries of the input domain and the so-called nominal sample. Adjusting the boundaries is recommended especially for distribution shapes defined by parameters such as mean value and standard deviation. In these cases, the edges of the domain depend on the randomization of samples within the input variable, so a modification is usually required to set the exact range for such inputs. For certain distribution shapes, such as uniform or discrete, the edges of the domain are given exactly by the distribution shape definition and cannot be changed afterwards.
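
The dependence of the domain edges on randomization can be seen in a small sketch. It assumes a normal distribution sampled with Python's standard `random` module. The function `sample_truncated_normal`, the rejection approach, and the bounds of mean ± 3 standard deviations are illustrative choices, not a description of how Uptimai applies the boundaries internally.

```python
import random


def sample_truncated_normal(mean, std, lower, upper, n, seed=0):
    """Draw n normal samples, rejecting any that fall outside [lower, upper]."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, std)
        if lower <= x <= upper:
            samples.append(x)
    return samples


mean, std = 10.0, 2.0

# Without fixed boundaries, the observed range changes with every random seed.
for seed in (1, 2):
    rng = random.Random(seed)
    raw = [rng.gauss(mean, std) for _ in range(200)]
    print(f"seed {seed}: min={min(raw):.2f}, max={max(raw):.2f}")

# With explicit boundaries, the range of the input is known exactly.
lower, upper = mean - 3 * std, mean + 3 * std
bounded = sample_truncated_normal(mean, std, lower, upper, n=200)
print(f"bounded: min={min(bounded):.2f}, max={max(bounded):.2f}")
```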

The nominal point is a sample acting as a baseline for the created surrogate model and analysis.
In the model, the results of all data samples are compared with the result value of the nominal
sample. This process allows handling the effects of input parameters and their interactions
separately as increments to the nominal value. The nominal sample must lie within the range of each
input variable and must not be equal to any of its boundaries. Although not strictly necessary, it is recommended
to place the nominal sample into the statistical centre of the domain, where the process of the
surrogate model creation is most efficient and precise. The nominal sample's default position is
suggested as the mean of the probability distribution of each input variable. When changing its position
(shown in Figure 3), it is advised not to shift it by more than 10% of the range
of each input. As in the case of input variable distribution definition, all changes must be saved
using the *Confirm* button.
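
The rules for the nominal sample can be summarised as a simple check per input variable. The following is a minimal sketch, assuming that the 10% guideline refers to the distance from the default position (the distribution mean). The function `check_nominal` and its messages are hypothetical and are not part of the Uptimai interface.

```python
def check_nominal(nominal, lower, upper, mean, max_shift=0.10):
    """Return a list of issues found for a proposed nominal value of one input."""
    issues = []
    # Hard rule: strictly inside the range, not on a boundary.
    if not (lower < nominal < upper):
        issues.append("error: nominal must lie strictly inside the range")
    # Recommendation: stay within 10% of the range from the default (mean) position.
    if abs(nominal - mean) > max_shift * (upper - lower):
        issues.append("warning: nominal shifted by more than 10% of the range")
    return issues


print(check_nominal(5.5, lower=0.0, upper=10.0, mean=5.0))   # no issues
print(check_nominal(7.0, lower=0.0, upper=10.0, mean=5.0))   # warning only
print(check_nominal(10.0, lower=0.0, upper=10.0, mean=5.0))  # error and warning
```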

##### Special considerations

For the sampling of variables leading to periodic or symmetrical functions
(typically, but not exclusively, angles of any kind), extra
caution is required. It is highly recommended **not** to set their
nominal value exactly to the centre of symmetry of the
corresponding input distribution. Typical examples are the
angular position of a crankshaft, wave phase, etc.

Clicking the *Generate data* button at the bottom right saves `.txt` files
with randomly distributed samples according to the settings and input domain info.
Then, the button itself turns into *Setup Experiment Design*,
sending the user to this next step. Also, the **Setup Experiment Design** item is activated
in the fishbone navigation bar on the left. For fundamental changes in any of the input distributions,
the user can go back to the previous step with the *Return to Input Variables* button.

### Setup Experiment Design

Here the user creates the matrix of data samples' coordinates. To the right of the fishbone
navigation bar, there is the *Design option* section where one can choose the sampling method.
Currently, the following methods commonly used in the response surface methodology are featured:

- **LHS**: Latin hypercube sampling is used to create randomly distributed samples across the domain, which is then covered optimally in all dimensions. Each input variable distribution respects the distribution shape selected in the *Define Input Variables* section. The number of samples needs to be set by the user in the *Number of samples* entry field.
- **Face-centered**: A type of central composite design (CCD) method, where all samples (besides the central point) are located on the edges of the domain. Offers good quality of domain coverage; however, it might not give the best performance for models containing quadratic shapes. The number of samples is fixed and depends on the number of input variables.
- **Box-Behnken**: Very efficient design scheme for cases with a low number of parameters. Does not contain points in the corners of the domain; thus, a model based on Box-Behnken can be insufficient in describing the effects of combined extremes of input parameters. The number of samples is fixed and depends on the number of input variables.
- **BGELHS**: Basic General Extension algorithm of LHS. When adding new samples to the domain, this method reuses as many existing samples as possible in the new set; however, some previous samples might still be lost. BGELHS can be significantly time-consuming for larger domains with a higher number of input variables and a higher total number of required samples.
- **GGELHS**: General Extension Algorithm of LHS Based on Greedy Algorithm. Allows adding samples to the domain. Not as effective as BGELHS in terms of the number of samples kept from the previous run, but works much faster.
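
To make the idea behind LHS concrete, here is a minimal textbook-style sketch using only the Python standard library. It is not the Uptimai implementation: the functions `latin_hypercube` and `scale_uniform` are hypothetical, no space-filling optimisation is performed, and only uniform inputs are handled (other distribution shapes would typically be obtained by passing the unit-interval values through the inverse cumulative distribution function).

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in the unit hypercube [0, 1)^n_dims.

    Each dimension is split into n_samples equal strata, and every
    stratum contains exactly one sample in each dimension.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random value inside each stratum, then a random stratum order.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose the per-dimension columns into a list of sample points.
    return [list(point) for point in zip(*columns)]


def scale_uniform(u, lower, upper):
    """Map a unit-interval value onto a uniform input range."""
    return lower + u * (upper - lower)


unit = latin_hypercube(n_samples=5, n_dims=2, seed=42)
samples = [[scale_uniform(u, 0.0, 10.0), scale_uniform(v, -1.0, 1.0)] for u, v in unit]
for point in samples:
    print([round(x, 3) for x in point])
```

Projecting the five points onto either axis places exactly one point in each fifth of the range, which is the defining property of a Latin hypercube.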

For further information about methods and algorithms used for domain space sampling check the links here and here.
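
The fixed sample counts of the face-centered and Box-Behnken designs follow from how their points are constructed. The sketch below builds both in coded units (-1 for the lower bound, 0 for the centre, +1 for the upper bound) using the usual textbook constructions with a single centre point. The function names are hypothetical, the pairwise Box-Behnken construction shown here corresponds to the standard design only for a small number of factors (3 to 5), and the exact point counts produced by Uptimai may differ, for example through replicated centre points.

```python
from itertools import combinations, product


def face_centered_ccd(k):
    """Face-centered central composite design for k factors in coded units."""
    corners = [list(p) for p in product((-1, 1), repeat=k)]  # 2^k corner points
    faces = []
    for i in range(k):
        for level in (-1, 1):
            point = [0] * k
            point[i] = level
            faces.append(point)                               # 2k face centres
    return corners + faces + [[0] * k]                        # plus one centre point


def box_behnken(k):
    """Pairwise Box-Behnken design for k factors in coded units."""
    points = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * k
            point[i], point[j] = a, b                         # two factors at extremes
            points.append(point)
    return points + [[0] * k]                                 # plus one centre point


for k in (2, 3, 4):
    print(k, "factors: face-centered CCD has", len(face_centered_ccd(k)), "points")
for k in (3, 4):
    print(k, "factors: Box-Behnken has", len(box_behnken(k)), "points")
```

In this construction, every Box-Behnken point keeps at least one factor at its centre level, which is why the corners of the domain are never sampled.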

Clicking the *Save and generate samples* button shows the coordinates of the created points
on the right as *Generated samples*. Also, the samples are stored in the `*.txt` file
named as set in the *Storage file name* entry field. This file can be found in the project folder.
Two more buttons are located under the table. *Return to Distribution Options* brings users back
to the previous step **Tweak Distribution Options**, where they can fix the boundaries or the
position of the nominal sample. The other button is *Close Preprocessor*.