# Input distribution types

When defining the design space of the project, the user needs to decide not only the range of values for each input distribution but also its probability distribution. Especially for variables representing e.g. environmental variables or material characteristics, this allows to setup of the case to be much closer to real-world conditions. As a result, the obtained mathematical model represents the reality in a better way.

## Featured distribution types

### Normal distribution

Normal distribution, also known as Gaussian distribution, is a continuous probability distribution that is symmetric around its mean value. According to the empirical rule, 99.7% of all its values lie within three standard deviations of the mean value. It is expected for physical quantities that are a sum of many independent processes affected by random errors, e.g. material properties.

### Uniform distribution

It is considered a continuous symmetric distribution where the probability of occurrence in the distribution is equal for all values from a given range. It is a type of distribution well-suited for all input parameters dependent on the designer's decision.

### Lognormal distribution

It is a continuous probability distribution where the logarithm of its values is distributed normally. This distribution consists of real positive values only. It can be observed among various phenomena in the fields of e.g. biology, sociology, chemistry, technology, and finance.

### Gumbel distribution

It is a continuous distribution describing the distribution of the maximum (or the minimum) values of a series of distributions, e.g. maximum rainfall rate in a series of distributions measured over time. Since it is related to the duration of time up to an event, its use can be in e.g. reliability analysis in engineering.

### Laplace distribution

It is a symmetric continuous probability distribution also known as the double exponential distribution, since it can be seen as an exponential distribution mirrored around the location parameter. It is the distribution of differences between two independent varieties with identical exponential distributions. It can be used for describing the distribution of errors in observed values.

### Beta distribution

It is a continuous probability distribution defined on the interval (0,1), thus, it can be used to model random variables of prescribed length. Settings of its two parameters can result in a wide variety of shapes of the distribution (e.g. it turns out to be the uniform distribution with default values of parameters, same as in Figure 2), therefore, there is a wide variety of its use.

### Logistic distribution

It is a symmetric continuous probability distribution very similar to the normal distribution in its appearance but has heavier tails, which can increase the robustness of analyses based on it when compared to those using the normal distribution. It is used for logistic regression model predicting the likelihood of an event based on an input variable.

### Poisson distribution

It is a discrete probability distribution used to represent integer-valued counts, such as the number of events observed within an interval of time or other specified dimensions such as distance, area or volume. Therefore, it is used for applications in many fields related to counting individual objects, e.g. astronomy, biology, and telecommunication.

### Logarithmic distribution

Also known as the logarithmic series distribution, is a discrete probability distribution based on the standard power series expansion of the natural logarithm function. It can be used in cases involving several event occurrences within a time period or space.

### Power distribution

It is a continuous probability distribution defined on the interval (0,1), also known as the power function distribution. It also may be seen as the inverse of the Pareto distribution, thus, its application can cover a similar range of problems.

### Discrete distribution

As the name suggests, it is a discrete type of distribution, in this
case completely arbitrary according to the user's needs. As inputs, two
vectors of comma-separated values are required. One is the *Vector of
positions* setting distribution values, the other is the *Vector of
weights* setting ratios of values in the input distributions. At least
three distinct position values must be given, weights must be greater
than zero.

### Boolean distribution

A special case of the discrete distribution. Although in its traditional meaning it is used as a $true/false$ or $0/1$ value switch, here it is not limited so strictly. In general, it can be used for any parameter splitting data into two categories. To define the distribution, the user only needs to enter two position values (i.e. categories, characteristics, limits, etc.) and their weights. As for the regular discrete distributions, weights have to be greater than zero.

### Data-driven distribution

When dealing with the **Data Analysis** method, it might be required to
work with distributions as close as possible to those of training data in a loaded
**X** matrix. Therefore, input distributions respecting the loaded data are possible to be
created. In principle, the loaded data is split into clusters in the first step and then these
clusters are populated according to the selected rule. Distributions of samples are created
according to the setup that can be found under the *Configure generator* button at the top of
the screen:

**Type of local sampling**: Distribution type used for the sampling around the center of each cluster.**# of clusters coef.**: The coefficient used for the computation of the total number of clusters. The number of clusters affects their diameter and thus the spread of samples across the domain. The higher number of clusters results in distributions with more distinct edges and more concentrated around loaded samples.

There are also sub-variants of this type of distribution dealing with *discrete* or *boolean*
samples if the solved problem requires these.

### User defined distribution

The principle is similar to the discrete distribution, but this custom set
distribution is continuous. Its parameters are set through a standalone
GUI whose features are described in a separate sheet available with the
**Distribution Creator** tool described below.

## Distribution Creator

With the help of the **Distribution Creator**, a distribution of an
arbitrary shape according to the user's needs can be prepared. It is an
add-on of the **Input Preprocessor** program popping up on *+ Define new*
item selection from the distribution selector. The distribution shape is designed directly in the
graphic interface allowing one to see the resulting shape immediately.
The distribution shape itself can be also loaded from a `*.csv`

file when needed.

### How to use the interface

The initial state of the application's window can be seen in
Figure 11. On the left, there is the *Existing custom distributions*
section, where the management of user defined distributions present in the project can be done.
By clicking on items in the list below the user can either *Edit* the existing distribution
or *Make a copy* of it, which can be renamed and adjusted separately. Below the list, there are two
buttons. *Create new* starts a new distribution preparation from scratch. The button *Load form
data file* allows users to upload coordinates of the distribution's control points directly from a
`*.csv`

file. It is a very useful option for situations where the probability distribution can be
e.g. measured or found in the literature. The source file needs to have two comma-separated
columns defining X and Y coordinates of control points. This formatting is very similar to what the
user can see in the GUI when creating the distribution directly in the tool.

The central section of the GUI window is dedicated to the graphical definition of the input
distribution. On the top of this section, there is an entry field where the input distribution
name can be changed. This modification appears instantly in the list on the left. Be aware
that the name of each input distribution has to be unique. Right under the *Name* entry, the
*Boundaries* of the input distribution shape can be adjusted. It is the setting of limits
applicable for all definition points of the distribution, it is not possible to create a
definition point outside the boundaries.

There are two possibilities for how to handle definition points in case of restricting/cropping boundaries of an existing distribution shape. First, points not fitting into cropped boundaries can be deleted completely. The second option is to scale the whole distribution based on the new range of boundaries.

The gridded canvas making the most of the area is an interactive space for the definition of the input distribution shape. Just by the left mouse button clicking, the user can create new definition points of the input distribution shapes - their position within the boundaries and the weight of each point ranging from 0 to 1. Unwanted definition points can be easily removed by the right mouse button click at these. Dragging a point with the mouse adjusts its position and weight but with the restriction given by distribution boundaries and/or positions of surrounding definition points.

As shown in Figure 12, all created definition points are listed in the
list of entries on the right side of the window, one row of the table for each point. Within the
limits given by boundaries, positions and weights of definition points can be edited. Confirmation
of changes is required and done using the *Save* button. For better clarification, the edited point is
highlighted in the canvas area (vice versa the definition point manipulated in the canvas area gets
highlighted in the table). At the bottom of the table of points, there is a possibility to add a new
definition point by inserting its exact coordinates into empty entries and confirming with the *Add point*
button. If necessary, for both adding and editing definition points of the distribution, their order in the table
changes automatically to keep their position coordinates in ascending order. Similarly, each
distribution point can be removed by clicking on the corresponding *X* button in the table (see
Figure 13).

To start with the distribution shape again, one can use the *Clear points* button underneath the
table. Two remaining buttons deal with the closing of the **Distribution Creator**
window. *Close without saving* button discards the window without any effect on the project. The
*Save and close* button adds the newly created input distribution to the project. Then, the
distribution can be found and selected in the list of available input distribution types
(see Figure 14).

Files of each user defined distribution shape are stored in a dedicated subfolder inside the
*custom_distributions* folder, located inside the project. There are two files: `info.json`

with general information about the distribution name and boundaries, the `points.csv`

contains a
table of the distribution points' coordinates.