Head of Laboratory

Derkach, Denis

PhD in Physics and Mathematics, Associate Professor
Publications
831
Citations
35 739
h-index
90
Lab team

The Laboratory of Big Data Analysis Methods develops and applies machine learning and data analysis methods to problems in fundamental sciences such as particle physics and astrophysics. Its main direction is the search for answers to the mysteries of the universe together with leading scientists from these fields. In particular, we cooperate with the European Organization for Nuclear Research (CERN): our joint work covers both research on the physics of events at the Large Hadron Collider and improving the efficiency of data processing. The laboratory's educational activities include organizing and running academic seminars and summer/winter schools on big data analysis, and supervising graduation theses and dissertations. The Laboratory of Big Data Analysis Methods was founded in 2015.

  1. New materials
  2. Computer search for materials
  3. Artificial intelligence
Denis Derkach
Head of Laboratory
Fedor Ratnikov
Leading researcher
Andrey Ustyuzhanin
Leading researcher
Mikhail Hushchyn
Senior researcher
Sergei Mokhnenko
Researcher
Mikhail Lazarev
Researcher
Ekaterina Trofimova
Junior researcher
Artem Ryzhikov
Junior researcher
Evgenii Kurbatov
Junior researcher
Vladimir Bocharnikov
Junior researcher
Kenenbek Arzymatov
Junior researcher
Maxim Karpov
Junior researcher
Alexander Rogachev
Research intern
Andrey Shevelev
Research intern
Foma Shipilov
Research intern
Leonid Gremyachikh
Research intern
Abdalaziz Rashid
Research intern
Tigran Ramazyan
Research intern
David Kagramanyan
Research intern
Sergey Popov
Research intern
Aziz Temirkhanov
Research intern

Research directions

Natural language for machine learning

Designing data analysis pipelines around machine learning models usually comes down to combining repetitive, common patterns. Such pipelines are nevertheless very important for specialists in subject areas not directly related to data analysis: biologists, chemists, physicists, and humanities scholars, among others, have a great demand for advanced ML pipelines. This project aims to develop an assistant bot (agent) that generates pipelines for ML-related tasks from a natural-language task description. Such an assistant must rely heavily on natural language processing and program synthesis techniques.

Interpretable machine learning models and the search for the laws of nature

Many problems in physics, biology, and other natural sciences are ones where symbolic regression can provide valuable information and help discover new laws of nature. Widely used deep neural networks do not offer interpretable solutions, whereas symbolic expressions make the relationship between the observations and the target variable explicit. At the moment, however, there is no dominant solution to the symbolic regression problem, and our project aims to close this gap. Our laboratory has started research in this direction; our approach to finding a symbolic representation of a law combines generative models with constrained optimization methods. It can be applied to closed-form equations or to systems of differential equations. The objective of the study is to improve the model using active and zero-shot learning methods.
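To make the idea of symbolic regression concrete, here is a minimal sketch. It is not the laboratory's method (which uses generative models and constrained optimization); it simply searches a tiny, hypothetical library of candidate expressions and fits one scale constant per candidate by least squares. The names `CANDIDATES`, `fit_scale`, and `symbolic_fit` are illustrative assumptions.

```python
import math

# Hypothetical candidate library: expression name -> basis function f(x).
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
    "1/x": lambda x: 1.0 / x,
}


def fit_scale(f, xs, ys):
    """Closed-form least-squares fit of the constant a in y = a * f(x)."""
    fx = [f(x) for x in xs]
    a = sum(v * y for v, y in zip(fx, ys)) / sum(v * v for v in fx)
    mse = sum((a * v - y) ** 2 for v, y in zip(fx, ys)) / len(xs)
    return a, mse


def symbolic_fit(xs, ys):
    """Return (expression name, fitted constant, MSE) of the best candidate."""
    best = None
    for name, f in CANDIDATES.items():
        a, mse = fit_scale(f, xs, ys)
        if best is None or mse < best[2]:
            best = (name, a, mse)
    return best


# Synthetic observations generated from the "law" y = 3 * x**2 (x > 0).
xs = [0.5 + 0.25 * i for i in range(12)]
ys = [3.0 * x * x for x in xs]
name, a, mse = symbolic_fit(xs, ys)
print(f"recovered: y = {a:.3f} * {name} (MSE = {mse:.2e})")
```

Unlike a neural network fit, the result is a readable formula. Realistic expression spaces are far too large to enumerate, which is why guided search is needed in practice.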

Platforms for evaluating ML models

Moving predictive deep learning models from a research environment to an industrial one involves significant costs for comprehensive verification of such models: operation under load, operation with limited RAM, and streaming data access. This project aims to implement algorithms for continuous monitoring of deep learning models in an industrial environment and for early detection of the need to retrain these models on the minimum required data set. The goal is to introduce this platform into the LHCb experiment at CERN.
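As an illustration of continuous monitoring with an early retraining signal, the sketch below tracks the mean absolute error over a sliding window of recent predictions and raises a flag once it exceeds a baseline by a chosen factor. The class `RetrainMonitor` and its parameters (`baseline_error`, `window`, `tolerance`) are hypothetical and do not describe the platform itself.

```python
from collections import deque


class RetrainMonitor:
    """Flags a model for retraining when its recent error drifts upward."""

    def __init__(self, baseline_error, window=20, tolerance=1.5):
        self.baseline_error = baseline_error  # error measured at deployment
        self.tolerance = tolerance            # allowed degradation factor
        self.errors = deque(maxlen=window)    # most recent absolute errors

    def update(self, y_true, y_pred):
        """Record one prediction; return True if retraining looks necessary."""
        self.errors.append(abs(y_true - y_pred))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough evidence yet
        mean_error = sum(self.errors) / len(self.errors)
        return mean_error > self.tolerance * self.baseline_error


monitor = RetrainMonitor(baseline_error=0.1)
# Phase 1: the model behaves as it did at deployment (error about 0.1).
stable_flags = [monitor.update(1.0, 1.1) for _ in range(20)]
# Phase 2: the data drift and the error grows to 1.0.
drift_flags = [monitor.update(1.0, 2.0) for _ in range(20)]
print(any(stable_flags), any(drift_flags))
```

A fixed-size window keeps memory use constant, which matters under the RAM limits and streaming access mentioned above.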

High-precision digital twin of data storage systems (DSS)

High-precision modeling of installations and systems is one of the main directions of industrial data analysis today. Models of systems, their digital twins, are used to predict system behavior under various conditions. We have developed a digital twin of a data storage system (DSS) using generative machine learning models. The system consists of several types of components: HDD and SSD disks, disk pools with different RAID arrays, caches, and storage controllers. Each storage component is represented by a probabilistic model that describes the distribution of the component's performance parameters as a function of its configuration and the parameters of the external data load. Machine learning makes it possible to build a high-precision digital twin of a specific system with less time and fewer resources than alternative approaches. The twin quickly predicts the performance of the system and its components under different configurations and external data loads, which significantly speeds up the development of new storage systems. In addition, comparing the twin's predictions with the measurements of the real storage system makes it possible to diagnose failures and anomalies, increasing the system's reliability.
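The idea of a per-component probabilistic model and of comparing the twin with the real system can be sketched as follows. This is a deliberately simplified stand-in: instead of a generative model, a single hypothetical component is described by a Gaussian whose mean depends linearly on the load, and an observation is flagged when it falls far outside the predicted distribution. All names and numbers are assumptions for illustration.

```python
import math
import random


def fit_component_model(loads, latencies):
    """Fit latency ~ Normal(slope * load + intercept, sigma) by least squares."""
    n = len(loads)
    mean_x = sum(loads) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, latencies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(loads, latencies)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, sigma


def is_anomalous(model, load, observed, z_max=3.0):
    """Compare a real measurement with the twin's predicted distribution."""
    slope, intercept, sigma = model
    z = (observed - (slope * load + intercept)) / sigma
    return abs(z) > z_max


# Synthetic "healthy" measurements of one component (assumed for the demo).
rng = random.Random(0)
loads = list(range(1, 51))
latencies = [2.0 + 0.1 * x + rng.gauss(0.0, 0.2) for x in loads]
model = fit_component_model(loads, latencies)

print(is_anomalous(model, 25, 4.5))  # measurement close to the prediction
print(is_anomalous(model, 25, 9.0))  # measurement far from the prediction
```

Because the model is a distribution rather than a point estimate, the same object serves both for performance prediction and for anomaly diagnosis.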

Detecting temporal changes for predictive analytics systems

Detecting changes in the behavior of complex systems is an important industrial task in signal processing, statistics, and machine learning. Solutions to this problem are used in many areas: quality control of production processes, monitoring the condition of engineering structures, detecting equipment failures and breakdowns from sensor readings, monitoring distributed computer systems and detecting security violations, video stream segmentation, sound effect recognition, control of chemical processes, monitoring of seismological data, analysis of financial and economic data, and many others. We have developed a number of new methods for detecting regime changes in complex systems using classification and regression models, generative adversarial networks and normalizing flows, as well as neural stochastic differential equations. Theoretical and practical advantages over alternative methods have been demonstrated. We have successfully applied the new methods to detect data storage failures, analyze human activity, and segment videos and texts.
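The classification-based approach can be illustrated with a toy example. For every candidate time point, a classifier tries to tell the window before it from the window after it; the better the two windows can be separated, the more likely a change occurred there. The sketch below uses a trivial midpoint-threshold classifier scored in-sample on a one-dimensional signal, which is an assumption made for brevity and not one of the laboratory's methods.

```python
import random


def window_score(left, right):
    """In-sample accuracy of a midpoint-threshold classifier on two windows."""
    mu_l = sum(left) / len(left)
    mu_r = sum(right) / len(right)
    thr = (mu_l + mu_r) / 2.0
    sign = 1.0 if mu_r >= mu_l else -1.0
    correct = sum(1 for v in left if sign * (v - thr) < 0)
    correct += sum(1 for v in right if sign * (v - thr) >= 0)
    return correct / (len(left) + len(right))


def detect_change(series, window=30):
    """Return the time index where the adjacent windows are most separable."""
    best_t, best_score = None, -1.0
    for t in range(window, len(series) - window + 1):
        score = window_score(series[t - window:t], series[t:t + window])
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Synthetic signal: the mean shifts from 0 to 4 at index 100.
rng = random.Random(0)
series = [rng.gauss(0.0, 1.0) for _ in range(100)]
series += [rng.gauss(4.0, 1.0) for _ in range(100)]
change_at, score = detect_change(series)
print(f"estimated change point: {change_at} (score = {score:.2f})")
```

In stationary regions the score stays near 0.5 (chance level), so a threshold on the score can also be used for online detection.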

Updating the weather forecast

Forecasting and verifying the state of the weather is a problem of extrapolating a set of indicators. Modern weather research and forecasting models work well under well-known conditions and over short time intervals. At the same time, AI methods, available data, and weather simulators are known not to fit together perfectly. This project therefore aims to develop and train new algorithms that tune the parameters of the simulator and produce reliable forecasts more efficiently. This combination, in turn, should improve the accuracy of forecasts of normal and abnormal weather conditions over longer periods.

Investigation of two-dimensional materials: prediction of properties and generation according to specified parameters

Developing new materials capable of storing electric energy is one of the most important tasks of the modern energy industry. Two-dimensional crystals based on the principles of graphene lattices can be used to produce such materials. The search for crystal lattice configurations is complicated by the large number of possible options and the length of the test cycle for a single configuration: many resource-intensive in silico and in vitro tests are required. The algorithms developed in this project aim to predict the energy properties of crystals with a given configuration and to solve the inverse problem of inferring the optimal crystal configuration from a given energy characteristic. Combining such algorithms will significantly reduce the time needed to find and synthesize practically useful energy carriers.

Publications and patents

Partners

Lab address

Moscow, Pokrovsky Boulevard, 11, room S-924