Head of Laboratory

Denis Derkach

PhD in Physics and Mathematics, associate professor
Publications
835
Citations
39 216
h-index
93
Lab team

The Laboratory of Methods for Big Data Analysis develops and applies machine learning and data analysis methods to problems in the fundamental sciences, such as particle physics and astrophysics. Searching for answers to the mysteries of the Universe together with leading scientists in these fields is the main direction of the laboratory's development. In particular, we collaborate with the European Organization for Nuclear Research (CERN): our joint work covers both research into the physics of Large Hadron Collider events and improving the efficiency of data processing. In addition, the laboratory's educational activities include organizing and running academic seminars and summer/winter schools on big data analysis, as well as supervising graduate theses and dissertations. The laboratory was founded in 2015.

  1. New materials
  2. Computer search for materials
  3. Artificial intelligence
Denis Derkach
Head of Laboratory
Fedor Ratnikov
Leading Researcher
Andrey Ustyuzhanin
Leading Researcher
Mikhail Hushchyn
Senior Researcher
Sergei Mokhnenko
Researcher
Mikhail Lazarev
Researcher
Ekaterina Trofimova
Junior Researcher
Artem Ryzhikov
Junior Researcher
Evgenii Kurbatov
Junior Researcher
Vladimir Bocharnikov
Junior Researcher
Kenenbek Arzymatov
Junior Researcher
Maxim Karpov
Junior Researcher
Alexander Rogachev
Research Intern
Andrey Shevelev
Research Intern
Foma Shipilov
Research Intern
Leonid Gremyachikh
Research Intern
Abdalaziz Rashid
Research Intern
Tigran Ramazyan
Research Intern
David Kagramanyan
Research Intern
Sergey Popov
Research Intern
Aziz Temirkhanov
Research Intern

Research directions

Natural language for machine learning

Designing data-analysis pipelines from machine learning models is largely routine work: most pipelines combine a small set of recurring patterns. At the same time, building such pipelines matters greatly to specialists in subject areas far from data analysis, so among non-specialists (biologists, chemists, physicists, scholars in the humanities) there is strong demand for help with ML pipeline development. This project aims to develop an assistant agent capable of generating a pipeline for an ML-related task from its natural-language description. Such an assistant must rely heavily on natural language processing and program synthesis techniques.
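As a toy illustration of the intended contract (natural-language description in, ordered pipeline plan out), the sketch below uses hand-written keyword rules; all step names are invented, and a real assistant would use NLP and program-synthesis models instead:

```python
# Toy sketch: map a task description to an ordered ML pipeline plan.
# Keyword rules and step names are invented for illustration only.
def plan_pipeline(description: str) -> list:
    text = description.lower()
    steps = ["load_data"]
    if "missing" in text:
        steps.append("impute_missing_values")
    if "text" in text:
        steps.append("tfidf_vectorize")   # text tasks need vectorization
    else:
        steps.append("standard_scale")    # tabular tasks get scaling
    if "classif" in text:
        steps.append("fit_classifier")
    elif "regress" in text or "predict" in text:
        steps.append("fit_regressor")
    steps.append("evaluate")
    return steps

print(plan_pipeline("Classify patient records with missing values"))
```

The point of the sketch is the interface, not the rules: the hard research problem is replacing the `if` chain with a model that understands free-form task descriptions.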

Interpretable machine learning models and the search for the laws of nature

There are many problems in physics, biology, and other natural sciences where symbolic regression can provide valuable insight and uncover new laws of nature. Widely used deep neural networks do not offer interpretable solutions, whereas symbolic expressions make the connection between the observations and the target variable explicit. However, no dominant approach to symbolic regression currently exists, and our project aims to close this gap. The laboratory has begun research in this direction: our approach to finding a representation of a symbolic law combines generative models with constrained optimization methods, and it can be applied to closed-form equations or to systems of differential equations. The goal of the study is to improve the model using active learning and zero-shot methods.
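In miniature, symbolic regression is a search over a space of candidate expressions for the one that best explains the data. The sketch below uses a tiny hand-written hypothesis set and exhaustive search; the laboratory's approach replaces both with generative models and constrained optimization. Data and formulas are invented:

```python
# Minimal symbolic-regression sketch: pick the expression (by mean squared
# error) that best fits observations generated by a hidden "law".
candidates = {
    "x": lambda x: x,
    "2*x": lambda x: 2 * x,
    "x + 1": lambda x: x + 1,
    "x**2": lambda x: x ** 2,
    "x**2 + x": lambda x: x ** 2 + x,
}

xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 2 + x for x in xs]          # observations from the hidden law

def mse(f):
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # → x**2 + x
```

Unlike a neural network fit to the same points, the recovered expression is directly readable as a relationship between the observations and the target variable.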

Platforms for evaluating ML models

Transferring predictive deep learning models from a research environment to production involves significant verification costs: behavior under load, under RAM constraints, and with streaming data access must all be checked. This project implements algorithms for continuous monitoring of deep learning models in a production environment and for early diagnosis of when a model needs retraining, using the minimum required amount of data. The goal is to deploy this platform in the LHCb experiment at CERN.

High-precision digital twin of data storage systems (DSS)

High-precision modeling of installations and systems is one of the main directions of industrial data analysis today. Models of systems, their digital twins, are used to predict behavior under various conditions. We have developed a digital twin of a data storage system (DSS) using generative machine learning models. The system consists of several types of components: HDDs and SSDs, disk pools with different RAID levels, caches, and storage controllers. Each component is represented by a probabilistic model that describes the distribution of its performance metrics as a function of its configuration and the parameters of the external workload. Machine learning makes it possible to build a high-precision digital twin of a specific system with less time and fewer resources than alternative approaches. The twin can quickly predict the performance of the system and its components under different configurations and workloads, which significantly speeds up the development of new storage systems. Comparing the twin's forecasts with the metrics of the real storage system also makes it possible to diagnose failures and anomalies, increasing the system's reliability.
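The shape of one such component model can be sketched as follows: a latency distribution conditioned on configuration and load, sampled to produce performance estimates. All component names and numbers here are invented; in the real twin each distribution would be a generative model fitted to benchmark data:

```python
import random

# Invented (mean ms, std ms) latency parameters per component configuration.
LATENCY_MODEL = {
    ("hdd", "raid5"): (8.0, 2.0),
    ("hdd", "raid10"): (6.0, 1.5),
    ("ssd", "raid5"): (0.9, 0.2),
    ("ssd", "raid10"): (0.7, 0.15),
}

def predict_latency(disk, raid, io_depth, n_samples=10_000, seed=0):
    """Monte Carlo estimate of mean latency for a component configuration
    under a given external load (queue depth)."""
    rng = random.Random(seed)
    mu, sigma = LATENCY_MODEL[(disk, raid)]
    mu_loaded = mu * (1 + 0.05 * io_depth)   # toy load penalty
    total = 0.0
    for _ in range(n_samples):
        total += max(0.0, rng.gauss(mu_loaded, sigma))
    return total / n_samples

print(predict_latency("ssd", "raid10", io_depth=4))
```

Because each component is a sampler rather than a point estimate, the twin can report not just expected performance but its spread, which is what makes anomaly diagnosis against the real system possible.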

Change-point detection for predictive analytics systems

Detecting changes in the behavior of complex systems is an important industrial task in signal processing, statistics, and machine learning. Solutions to this problem have found use in many applications: quality control of production processes, structural health monitoring, detecting equipment failures from sensor readings, monitoring distributed computer systems and detecting security violations, video stream segmentation, sound event recognition, control of chemical processes, monitoring of seismological data, analysis of financial and economic data, and many others. We have developed a number of new methods for detecting regime changes in complex systems based on classification and regression models, generative adversarial networks, normalizing flows, and neural stochastic differential equations, and have demonstrated their theoretical and practical advantages over existing alternatives. We have successfully applied these methods to detect storage system failures, analyze human activity, and segment video and text.
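For a concrete baseline of what these methods improve on, the sketch below implements one-sided CUSUM, a classic change-point statistic; the learned methods described above (classifiers, GANs, normalizing flows, neural SDEs) replace this hand-set statistic with models fitted to data. The signal is invented:

```python
def cusum(signal, target_mean, drift=0.0, threshold=5.0):
    """One-sided CUSUM: return the index at which an upward shift in the
    mean is detected, or None if no change is found."""
    s = 0.0
    for i, x in enumerate(signal):
        # accumulate evidence of the mean exceeding target_mean + drift
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return i
    return None

stream = [0.0] * 50 + [1.0] * 50    # mean jumps from 0 to 1 at index 50
print(cusum(stream, target_mean=0.0, drift=0.2, threshold=3.0))
# detection fires a few steps after the true change at index 50
```

The `drift` and `threshold` knobs trade detection delay against false alarms; the appeal of learned detectors is precisely that such knobs, and the test statistic itself, are inferred from data.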

Refining weather forecasts

Forecasting the state of the weather is a task of extrapolating a number of indicators. Modern weather research and forecasting models work well under well-studied conditions and over short time horizons. At the same time, AI methods, the available data, and weather simulators are known to be imperfectly matched to one another. This project therefore aims to develop and train new algorithms that adjust the simulator's parameters and obtain reliable forecasts more efficiently. This synergy will in turn improve the accuracy of forecasts of both normal and abnormal weather conditions over longer horizons.
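The core loop of adjusting simulator parameters against data can be sketched as a calibration problem; here both the "simulator" (exponential cooling) and the "observations" are invented stand-ins for a real weather model and measurements, and the search is a plain grid:

```python
def simulator(decay, hours=24):
    """Toy stand-in for a weather simulator: exponential temperature decay."""
    return [20.0 * (1 - decay) ** h for h in range(hours)]

observed = [20.0 * 0.93 ** h for h in range(24)]   # stand-in measurements

def loss(decay):
    """Squared mismatch between simulated and observed temperatures."""
    return sum((s - o) ** 2 for s, o in zip(simulator(decay), observed))

# Grid search over the simulator parameter; real work would use smarter
# optimization and far richer simulators.
best = min((d / 100 for d in range(1, 20)), key=loss)
print(best)  # → 0.07, the decay rate implied by the observations
```

The project's algorithms aim to do this adjustment efficiently for high-dimensional simulator parameters, where grid search is hopeless.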

Investigation of two-dimensional materials: predicting properties and generating structures with specified properties

Developing new materials for electric energy storage is one of the most important tasks of the modern energy industry. Two-dimensional crystals based on graphene-like lattices can be used to produce such materials. The search for crystal lattice configurations is complicated by the huge number of possible options and the length of the test cycle for a single configuration, which requires many resource-intensive in silico and in vitro experiments. Our algorithms aim to predict the energy-related properties of a crystal with a given configuration and to solve the inverse problem: determining the optimal crystal configuration for a given energy characteristic. Combining these algorithms will significantly reduce the time needed to find and synthesize practically useful energy-storage materials.
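The two coupled tasks (a forward property predictor and an inverse search for a configuration matching a target property) can be sketched in miniature. The "configurations" (doping percent, spacing index) and the surrogate formula below are invented; real work would use models trained on simulation data over vastly larger configuration spaces:

```python
import itertools

def predicted_capacity(config):
    """Invented surrogate for the forward task: config -> energy property."""
    doping_pct, spacing_idx = config
    return doping_pct - 3 * (spacing_idx - 1) ** 2

# Inverse task: among candidate configurations, find the one whose
# predicted capacity is closest to a requested target value.
configs = list(itertools.product([10, 20, 30], [0, 1, 2]))
target = 30
best = min(configs, key=lambda c: abs(predicted_capacity(c) - target))
print(best)  # → (30, 1)
```

The value of a fast surrogate is exactly this: once the forward predictor is cheap, the inverse problem reduces to optimization over configurations instead of a queue of resource-intensive in silico and in vitro tests.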

Publications and patents


Partners

Lab address

11 Pokrovsky Boulevard, room S-924, Moscow