A Repository Database System to Do Data Mining in Drug Discovery
Author | : Jiali Tang |
Release | : 2019 |
ISBN-10 | : OCLC:1132866839 |
Book excerpt: The exponentially increasing amount of drug discovery data generated each year makes extracting useful information from that data increasingly critical. Even with a central repository to hold the data, organizations need tools that help them extract the most useful information from it. A data warehouse can bring data together in a single format, supplemented by metadata, through a set of input mechanisms known as extraction, transformation, and loading (ETL) tools. Extraction pulls in either data that already exists or data newly imported into the database. Transformation translates the data into a format the database can understand and makes it consistent with the existing data. Finally, the formatted data is loaded into files, and the link address of each file is saved in database tables for further analysis. Analysis of the data includes simple querying and reporting, statistical analysis, complex multidimensional analysis, and data mining, in which large quantities of data are searched and analyzed to discover useful patterns or relationships that are then used to predict behavior.

The purpose of this project is to produce a repository database of drugs, drug features (properties), and drug targets in which data can be mined and analyzed. Drug targets are the proteins that drugs bind to in order to stop the protein's activity. For example, γ-secretase is a protein implicated in Alzheimer's disease; certain drugs can bind to γ-secretase and block its function, which in turn may stop Alzheimer's disease.
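The extraction, transformation, and loading flow described in the excerpt can be illustrated with a minimal Python sketch. It is not the author's implementation: the CSV columns, the compound names, the `data_file` table, and every function name are assumptions made up for illustration, and SQLite stands in for whatever database the project actually uses. The sketch only shows the general pattern of writing formatted data to a file and saving that file's path in a table.

```python
import csv
import io
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw input: column names and values are invented for illustration.
RAW_CSV = """Drug Name,Target,MolWeight,LogP
 compound-A ,gamma-secretase,312.4,2.1
compound-B,gamma-secretase,287.9,
"""


def extract(text):
    """Extract: read raw rows from an imported CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and types so rows match the stored format."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "drug": row["Drug Name"].strip().lower(),
            "target": row["Target"].strip().lower(),
            "mol_weight": float(row["MolWeight"]),
            # A missing descriptor becomes None instead of an empty string.
            "logp": float(row["LogP"]) if row["LogP"].strip() else None,
        })
    return cleaned


def load(rows, conn, data_dir):
    """Load: write one file per target and record its path in a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_file "
        "(id INTEGER PRIMARY KEY, target TEXT, path TEXT)"
    )
    by_target = {}
    for row in rows:
        by_target.setdefault(row["target"], []).append(row)
    fields = ["drug", "mol_weight", "logp"]
    for target, items in by_target.items():
        path = Path(data_dir) / f"{target}.csv"
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fields)
            writer.writeheader()
            for item in items:
                writer.writerow({k: item[k] for k in fields})
        # The database keeps only the link to the file, not the data itself.
        conn.execute(
            "INSERT INTO data_file (target, path) VALUES (?, ?)",
            (target, str(path)),
        )
    conn.commit()


def run_etl(text, conn, data_dir):
    """Run the three stages in order."""
    load(transform(extract(text)), conn, data_dir)
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"), some_directory)` would leave one row in `data_file` per target. Keeping the descriptor data in files and only the paths in tables mirrors the excerpt's description, though the real schema may differ.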
Users can mine the database to predict which chemical properties determine a drug's relative efficacy against a specific target, along with the coefficient for each chemical property. The database can be equipped with different data mining approaches and algorithms, including linear, non-linear, and classification models. The data models have been enhanced with Genetic Evolution (GE) algorithms [1-17]. This paper discusses the implementation with linear data models such as Multiple Linear Regression (MLR) [18], Partial Least Squares Regression (PLSR) [19], and Support Vector Machine (SVM) [20].
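To make the idea of a coefficient for each chemical property concrete, here is a minimal multiple linear regression sketch in plain Python. It is an illustration under stated assumptions, not the method used in the work: the descriptor values and activities are synthetic (generated from activity = 1.0 + 2.0*x1 - 0.5*x2), the names `DESCRIPTORS`, `ACTIVITY`, `fit_mlr`, and `predict` are hypothetical, and the fit uses ordinary least squares via the normal equations without any GE-based feature selection.

```python
# Synthetic data: each row holds two made-up chemical descriptors for one drug.
DESCRIPTORS = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
# Synthetic efficacy values, generated as 1.0 + 2.0*x1 - 0.5*x2.
ACTIVITY = [2.0, 4.5, 5.0, 7.5, 8.5]


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Returns [b0, b1, ..., bp], solving the normal equations (A^T A) b = A^T y.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefs, x):
    """Predict efficacy for one drug from its descriptor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


coefs = fit_mlr(DESCRIPTORS, ACTIVITY)
```

Because the synthetic data is noise-free, `coefs` recovers the generating values (intercept 1.0, then 2.0 and -0.5) up to floating-point error. On real descriptor data the coefficients would only approximate the underlying relationship, and a numerical library would be a more robust choice than hand-written elimination.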