# Data Analyzer

Uptimai **Data Analyzer** builds a complete surrogate mathematical
model of the solved problem. It allows a full
study of how output values depend on uncertain inputs, not only of sensitivities to
input variables. On the full model, it is possible to prepare a complete set of visualisations
of these dependencies and to walk in detail through all statistical characteristics of the model.
It also allows statistics-based optimization, in which results are again delivered as easily
readable graphics, giving the user straightforward hints for increased performance over a range
of, for example, operational and environmental conditions or manufacturing tolerances. In this respect it is
very similar to the **Uncertainty Quantification** method. However, the
**Data Analyzer** works with already prepared datasets and does not call for
outputs corresponding to specific combinations of input parameters.

## How to use the interface

The general appearance of the program window, especially its left section, is described in detail
in the **Input preparation** section. Here the main focus is on the other part,
which is to some extent specific to each of the supported methods. The initial
state of the GUI window when preparing inputs for the **Data Analyzer** is shown in
Figure 1: the user starts from scratch and needs to
**Select Initial Data**.

### Select Initial Data

First, the user selects the source files containing the necessary data. Under
*Samples*, select the file with the input coordinates of the data points (matrix X).
The file with the corresponding output values (matrix Y) is selected in the *Results* section.
Both are plain `*.txt` files with comma-separated values, where the number of lines is the
number of samples used to create the mathematical model.

The number of lines must be
the same in both matrices. Otherwise, the user cannot continue to
**Define Input Variables**, either with the *Define input variables* button or with the
fishbone item of the same name.
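The row-count requirement can be checked before the files are loaded into the GUI. The following is a minimal sketch and not part of the Uptimai software: the function names and the in-memory sample data are hypothetical, and it assumes one sample per non-empty line.

```python
import csv
import io


def count_rows(text_stream):
    """Count the non-empty comma-separated rows in an open text stream."""
    reader = csv.reader(text_stream)
    return sum(1 for row in reader if row)


def check_matching_samples(samples_stream, results_stream):
    """Return the shared number of samples, or raise if X and Y differ."""
    n_x = count_rows(samples_stream)
    n_y = count_rows(results_stream)
    if n_x != n_y:
        raise ValueError(f"Samples has {n_x} rows but Results has {n_y} rows")
    return n_x


# In-memory stand-ins for the Samples (X) and Results (Y) files.
# The numbers are made up for illustration only.
samples = io.StringIO("0.10,63070.0\n0.12,80000.0\n0.15,115600.0\n")
results = io.StringIO("50.2\n71.9\n120.4\n")

n_samples = check_matching_samples(samples, results)
print(n_samples)
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.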

### Define Input Variables

At the top of the window, the entry field *# of Monte-Carlo samples* sets the
number of Monte Carlo samples used for model propagation
and visualizations. It must be an integer value between 1,000 and 1,000,000. The default value of
100,000 is a
best-practice trade-off between the speed of the solver and postprocessor, file
sizes, and model precision.
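The constraint on this entry field can be expressed as a small validation routine. This is only an illustrative sketch: the function name is hypothetical, and treating both limits as inclusive is an assumption, since the text does not say whether the bounds themselves are accepted.

```python
MIN_SAMPLES = 1_000
MAX_SAMPLES = 1_000_000
DEFAULT_SAMPLES = 100_000


def validate_mc_samples(value=DEFAULT_SAMPLES):
    """Return value if it is an integer in the allowed range, else raise."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("the number of Monte Carlo samples must be an integer")
    # Assumption: both limits are inclusive.
    if not MIN_SAMPLES <= value <= MAX_SAMPLES:
        raise ValueError(
            f"expected a value between {MIN_SAMPLES} and {MAX_SAMPLES}, got {value}"
        )
    return value


print(validate_mc_samples())
print(validate_mc_samples(250_000))
```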

The number of input variables is determined by the sampling loaded from files in the previous step. This prevents the loaded data from being incompatible with the definition of inputs for the Uptimai solver. The ordering of variables cannot be changed for the same reason. However, types and boundaries can still be edited as in Figure 3 (the borehole problem is used here as an example). Each input variable can be set using the following controls:

- **Variable name**: Label of the input parameter, which is being used throughout the whole process up to the postprocessing. The variable name cannot contain empty spaces, these are automatically replaced with underscores.
- **Distribution**: Selection box where the user sets the shape of the probabilistic function for the input variable. According to the distribution type selected, additional entries with shape parameters appear. A detailed description of featured probability distribution types can be found in the section **Input distribution types**.
- **Confirm**: Any changes need to be confirmed with this button to take effect.
- **+ Advanced Options: Activation Type**: Allows change between *Active* (by default) and *Inactive*. *Active* means that the intrinsic uncertainty of the variable will be propagated and *Inactive* means that only the nominal value will be used (the variable won’t be studied).
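The effect of the name rule and of the activation type can be illustrated with a short sketch. Everything here is hypothetical and is not the Uptimai API: the `InputVariable` class, its fields, and the uniform distribution are assumptions chosen only to keep the example small.

```python
import random
from dataclasses import dataclass


@dataclass
class InputVariable:
    name: str
    low: float
    high: float
    nominal: float
    active: bool = True

    def __post_init__(self):
        # Names cannot contain spaces; each space becomes an underscore.
        self.name = self.name.replace(" ", "_")

    def draw(self, rng, n):
        """Draw n values: random if active, the nominal value if inactive."""
        if self.active:
            # Assumption: a uniform distribution stands in for the selected shape.
            return [rng.uniform(self.low, self.high) for _ in range(n)]
        return [self.nominal] * n


rng = random.Random(0)
radius = InputVariable("borehole radius", 0.05, 0.15, 0.10)
length = InputVariable("borehole length", 1120.0, 1680.0, 1400.0, active=False)

print(radius.name)            # borehole_radius
print(length.draw(rng, 3))    # [1400.0, 1400.0, 1400.0]
```

An inactive variable contributes no spread to the propagated samples, which is why it is not studied in the analysis.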

The *Prepare distributions* button starts the
preparation of randomly distributed samples according to the settings. If there are invalid
entries in the input variable definition, the user is informed and cannot continue to the next
step until all entries are valid. The button then turns into *Tweak Distribution Options*,
which takes the user to the next step. The **Tweak Distribution Options** item is also activated
in the fishbone navigation bar on the left.

### Tweak Distribution Options

At this point, the user adjusts the boundaries of the input domain and the so-called nominal sample. Adjusting the boundaries is recommended especially for distribution shapes that the user defines through parameters such as mean value and standard deviation. In these cases, the edges of the domain depend on the randomization of samples within the input variable, so a modification is usually required to set the exact range for such inputs. For certain distribution shapes, such as uniform or discrete, the edges of the domain are given exactly by the distribution shape definition and cannot be changed afterwards.

The nominal point is a sample acting as a baseline for the created surrogate model and analysis.
In the model, the results of all data samples are compared with the result value of the nominal
sample. This process allows handling the effects of input parameters and their interactions
separately as increments to the nominal value. The nominal point must lie within the range of each input variable
and must not be equal to its boundaries. Although not strictly necessary, it is recommended
to place the nominal sample in the statistical centre of the domain, where the process of
surrogate model creation is most efficient and precise. The nominal sample's default position is
suggested as the mean of the probability distribution of each input variable. When changing its position
(shown in Figure 4), it is advised not to shift it by more than 10% of the range
of each input. As with the definition of input variable distributions, all changes must be saved
using the *Confirm* button.
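The two rules for the nominal sample (strictly inside the range, and preferably within 10% of the range from the default position) can be sketched as a check for a single input variable. The function name and arguments are hypothetical and do not correspond to any Uptimai interface; the default position is passed in explicitly because it depends on the chosen distribution.

```python
def check_nominal(nominal, low, high, default, max_shift=0.10):
    """Validate a nominal value for one input variable.

    Raises ValueError if the nominal value is outside the range or on a
    boundary. Returns True if the shift from the default position stays
    within max_shift (a fraction of the range), and False if it exceeds
    the recommended limit.
    """
    if not low < nominal < high:
        raise ValueError("nominal value must lie strictly inside the range")
    relative_shift = abs(nominal - default) / (high - low)
    return relative_shift <= max_shift


# Range 0..10 with the default nominal position at the mean, 5.
print(check_nominal(5.5, 0.0, 10.0, default=5.0))  # True: shift is 5% of the range
print(check_nominal(7.0, 0.0, 10.0, default=5.0))  # False: shift is 20% of the range
```

A `False` result marks a position that is allowed but not recommended, whereas a value on or outside the boundaries is rejected outright.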

##### Special considerations

Extra caution is required when sampling variables that lead to periodic or symmetrical functions
(typically, but not exclusively, angles of any kind).
It is highly recommended **not** to set their
nominal value exactly to the centre of symmetry of the
corresponding input distribution. Typical examples are the
angular position of a crankshaft or a wave phase.

Clicking the *Generate data* button at the bottom right saves `.txt` files
with randomly distributed samples according to the settings and input domain info.
The button then turns into *View Data Histogram*,
which takes the user to the next step. The **View Data Histogram** item is also activated
in the fishbone navigation bar on the left. For fundamental changes in any input distribution,
the user can return to the previous step with the *Return to Input Variables* button.

### View Data Histogram

In this section, all created input variables can be reviewed to check their probability distribution shapes. Users can see the histograms of the randomized samples as they are about to be used. It is recommended to perform this check before an actual data analysis run to prevent solver crashes and avoid misinterpretation of results.

To the left of the histogram plot, there are controls for the figure to be shown. The box labeled
*Variables* contains the list of input parameters available in the domain. Clicking an item
selects it and shows the corresponding distribution shape. The appearance of the
plot can be changed using the settings in the *Plot options* section:

- **Plot title**: Displayed above the plot, input variable name by default.
- **X label**: Label of the X axis, input variable name by default.
- **Y label**: Label of the Y axis. The default text contains the number of samples used for the histogram and the number of bins these are split into.
- **Title size**: Size of the title font.
- **Label size**: Size of the font for both axis labels.
- **Show legend**: Switch turning the legend of the plot on or off.
- **Legend size**: Size of the legend font.
- **Range**: Double-sided slider that shows a slice of the input distribution in detail. Dragging one of the slider's points limits the depicted range. The selected section can be moved along the X-axis by dragging the green bar of the slider (both edge points are highlighted).
- **Adjust axes**: Toggles whether the X-axis range of the plot is only the range adjusted with the slider above (on) or the full range of the input distribution (off).
- **Normalize plot**: Turns normalization of the histogram on or off. Y-axis values change accordingly, and the default Y-axis label changes from *N* to *Density*.
- **Log. vertical axis**: Turns logarithmic scaling of the Y-axis on or off.
- **Bin count**: Sets the number of bins for the histogram. The recommended value is below 200. Needs to be confirmed with the *Apply* button.

There are two more buttons under the plot. *Return to Distribution Options* brings users back
to the previous step, **Tweak Distribution Options**, where they can fix the boundaries or
the nominal sample position. The other button, *Close Preprocessor*, closes the preprocessor, since all the required input files
are ready for the next step, which is **Core Solver Setup**.

Additional vertical lines in the plot show the boundaries of the input variable distribution (input domain edges) and the position of the nominal sample. Clicking in the plot also displays a crosshair with a label showing the exact value of the selected point in the histogram.
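To make the difference between the *N* and *Density* vertical axes concrete, the following sketch computes a histogram both ways for a fixed bin count. It is an illustration under assumptions rather than the postprocessor's actual implementation: the `histogram` function is hypothetical, bins are of equal width, and density is taken as the count divided by the total number of samples times the bin width, so that the bar areas sum to one.

```python
import random


def histogram(values, bin_count, normalize=False):
    """Return (heights, bin_width) for equal-width bins over the data range."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; the bin width would be zero")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value belongs to the last bin rather than one past it.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    if normalize:
        total = len(values)
        return [c / (total * width) for c in counts], width
    return counts, width


rng = random.Random(0)
values = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

counts, width = histogram(values, bin_count=50)
density, _ = histogram(values, bin_count=50, normalize=True)

print(sum(counts))                      # total number of samples
print(sum(d * width for d in density))  # total bar area, approximately 1
```

Normalization changes only the scale of the vertical axis, not the shape of the histogram, which is why it is convenient when comparing variables sampled with different sample counts or bin widths.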