Skip to main content

Input distribution types

When defining the design space of the project, the user needs to decide not only the range of values for each input distribution but also its probability distribution. Especially for variables representing e.g. environmental variables or material characteristics, this allows to setup of the case to be much closer to real-world conditions. As a result, the obtained mathematical model represents the reality in a better way.

Featured distribution types

Normal distribution

Normal distribution, also known as Gaussian distribution, is a continuous probability distribution that is symmetric around its mean value. According to the empirical rule, 99.7% of all its values lie within three standard deviations of the mean value. It is expected for physical quantities that are a sum of many independent processes affected by random errors, e.g. material properties.

Figure 1: Normal distribution, 100,000 random samples, μ=0.0, σ=1.0

Uniform distribution

It is considered a continuous symmetric distribution where the probability of occurrence in the distribution is equal for all values from a given range. It is a type of distribution well-suited for all input parameters dependent on the designer's decision.

Figure 2: Uniform distribution, 100,000 random samples, a=0.0, b=1.0

Lognormal distribution

It is a continuous probability distribution where the logarithm of its values is distributed normally. This distribution consists of real positive values only. It can be observed among various phenomena in the fields of e.g. biology, sociology, chemistry, technology, and finance.

Figure 3: Lognormal distribution, 100,000 random samples, μ=0.0, σ=1.0

Gumbel distribution

It is a continuous distribution describing the distribution of the maximum (or the minimum) values of a series of distributions, e.g. maximum rainfall rate in a series of distributions measured over time. Since it is related to the duration of time up to an event, its use can be in e.g. reliability analysis in engineering.

Figure 4: Gumbel distribution, 100,000 random samples, μ=0.0, β=1.0

Laplace distribution

It is a symmetric continuous probability distribution also known as the double exponential distribution, since it can be seen as an exponential distribution mirrored around the location parameter. It is the distribution of differences between two independent varieties with identical exponential distributions. It can be used for describing the distribution of errors in observed values.

Figure 5: Laplace distribution, 100,000 random samples, μ=0.0, λ=1.0

Beta distribution

It is a continuous probability distribution defined on the interval (0,1), thus, it can be used to model random variables of prescribed length. Settings of its two parameters can result in a wide variety of shapes of the distribution (e.g. it turns out to be the uniform distribution with default values of parameters, same as in Figure 2), therefore, there is a wide variety of its use.

Figure 6: Beta distribution, 100,000 random samples, α=1.0, β=1.0

Logistic distribution

It is a symmetric continuous probability distribution very similar to the normal distribution in its appearance but has heavier tails, which can increase the robustness of analyses based on it when compared to those using the normal distribution. It is used for logistic regression model predicting the likelihood of an event based on an input variable.

Figure 7: Logistic distribution, 100,000 random samples, μ=0.0, s=1.0

Poisson distribution

It is a discrete probability distribution used to represent integer-valued counts, such as the number of events observed within an interval of time or other specified dimensions such as distance, area or volume. Therefore, it is used for applications in many fields related to counting individual objects, e.g. astronomy, biology, and telecommunication.

Figure 8: Poisson distribution, 100,000 random samples, λ=1.0

Logarithmic distribution

Also known as the logarithmic series distribution, is a discrete probability distribution based on the standard power series expansion of the natural logarithm function. It can be used in cases involving several event occurrences within a time period or space.

Figure 9: Logarithmic distribution, 100,000 random samples, p=0.5

Power distribution

It is a continuous probability distribution defined on the interval (0,1), also known as the power function distribution. It also may be seen as the inverse of the Pareto distribution, thus, its application can cover a similar range of problems.

Figure 10: Power distribution, 100,000 random samples, p=2.0

Discrete distribution

As the name suggests, it is a discrete type of distribution, in this case completely arbitrary according to the user's needs. As inputs, two vectors of comma-separated values are required. One is the Vector of positions setting distribution values, the other is the Vector of weights setting ratios of values in the input distributions. At least three distinct position values must be given, weights must be greater than zero.

Boolean distribution

A special case of the discrete distribution. Although in its traditional meaning it is used as a true/falsetrue/false or 0/10/1 value switch, here it is not limited so strictly. In general, it can be used for any parameter splitting data into two categories. To define the distribution, the user only needs to enter two position values (i.e. categories, characteristics, limits, etc.) and their weights. As for the regular discrete distributions, weights have to be greater than zero.

Data-driven distribution

When dealing with the Data Analysis method, it might be required to work with distributions as close as possible to those of training data in a loaded X matrix. Therefore, input distributions respecting the loaded data are possible to be created. In principle, the loaded data is split into clusters in the first step and then these clusters are populated according to the selected rule. Distributions of samples are created according to the setup that can be found under the Configure generator button at the top of the screen:

  • Type of local sampling : Distribution type used for the sampling around the center of each cluster.
  • # of clusters coef. : The coefficient used for the computation of the total number of clusters. The number of clusters affects their diameter and thus the spread of samples across the domain. The higher number of clusters results in distributions with more distinct edges and more concentrated around loaded samples.

There are also sub-variants of this type of distribution dealing with discrete or boolean samples if the solved problem requires these.

User defined distribution

The principle is similar to the discrete distribution, but this custom set distribution is continuous. Its parameters are set through a standalone GUI whose features are described in a separate sheet available with the Distribution Creator tool described below.

Distribution Creator

With the help of the Distribution Creator, a distribution of an arbitrary shape according to the user's needs can be prepared. It is an add-on of the Input Preprocessor program popping up on + Define new item selection from the distribution selector. The distribution shape is designed directly in the graphic interface allowing one to see the resulting shape immediately. The distribution shape itself can be also loaded from a *.csv file when needed.

How to use the interface

The initial state of the application's window can be seen in Figure 11. On the left, there is the Existing custom distributions section, where the management of user defined distributions present in the project can be done. By clicking on items in the list below the user can either Edit the existing distribution or Make a copy of it, which can be renamed and adjusted separately. Below the list, there are two buttons. Create new starts a new distribution preparation from scratch. The button Load form data file allows users to upload coordinates of the distribution's control points directly from a *.csv file. It is a very useful option for situations where the probability distribution can be e.g. measured or found in the literature. The source file needs to have two comma-separated columns defining X and Y coordinates of control points. This formatting is very similar to what the user can see in the GUI when creating the distribution directly in the tool.

Figure 11: Distribution Creator - Initial state of the window

The central section of the GUI window is dedicated to the graphical definition of the input distribution. On the top of this section, there is an entry field where the input distribution name can be changed. This modification appears instantly in the list on the left. Be aware that the name of each input distribution has to be unique. Right under the Name entry, the Boundaries of the input distribution shape can be adjusted. It is the setting of limits applicable for all definition points of the distribution, it is not possible to create a definition point outside the boundaries.

Cropping boundaries

There are two possibilities for how to handle definition points in case of restricting/cropping boundaries of an existing distribution shape. First, points not fitting into cropped boundaries can be deleted completely. The second option is to scale the whole distribution based on the new range of boundaries.

The gridded canvas making the most of the area is an interactive space for the definition of the input distribution shape. Just by the left mouse button clicking, the user can create new definition points of the input distribution shapes - their position within the boundaries and the weight of each point ranging from 0 to 1. Unwanted definition points can be easily removed by the right mouse button click at these. Dragging a point with the mouse adjusts its position and weight but with the restriction given by distribution boundaries and/or positions of surrounding definition points.

Figure 12: Distribution Creator - Prepared custom distribution shape

As shown in Figure 12, all created definition points are listed in the list of entries on the right side of the window, one row of the table for each point. Within the limits given by boundaries, positions and weights of definition points can be edited. Confirmation of changes is required and done using the Save button. For better clarification, the edited point is highlighted in the canvas area (vice versa the definition point manipulated in the canvas area gets highlighted in the table). At the bottom of the table of points, there is a possibility to add a new definition point by inserting its exact coordinates into empty entries and confirming with the Add point button. If necessary, for both adding and editing definition points of the distribution, their order in the table changes automatically to keep their position coordinates in ascending order. Similarly, each distribution point can be removed by clicking on the corresponding X button in the table (see Figure 13).

Figure 13: Distribution Creator - Removing the definition point of the distribution

To start with the distribution shape again, one can use the Clear points button underneath the table. Two remaining buttons deal with the closing of the Distribution Creator window. Close without saving button discards the window without any effect on the project. The Save and close button adds the newly created input distribution to the project. Then, the distribution can be found and selected in the list of available input distribution types (see Figure 14).

Files of each user defined distribution shape are stored in a dedicated subfolder inside the custom_distributions folder, located inside the project. There are two files: info.json with general information about the distribution name and boundaries, the points.csv contains a table of the distribution points' coordinates.

Figure 14: Distribution Creator - Enhanced list of distribution types available for the project