- Class 30
- Practice 30
- Independent work 90
Lecturers and associates
High-quality data analysis results depend on careful preparation of the input data. The aim of the course is to demonstrate basic data preparation methods: cleaning, transforming, integrating, normalizing and aggregating data, transforming time series, and working with missing values, as well as basic data reduction methods such as feature reduction, sample reduction, and discretization.
Introduction to data preparation. Data cleaning. Work with missing values. Data Transformation. Sample Reduction. Aggregation of data. Transformation of time series. Data Integration. Normalization of data. Data discretization. Feature Reduction. Practice and Future. Exam preparation.
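As a rough illustration of three of the topics listed above (work with missing values, normalization, and discretization), the following minimal Python sketch may help. It is not taken from the course materials: the function names, the sample readings, and the choice of mean imputation, min-max scaling, and equal-width binning are all assumptions made for this example.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map each value to the 0-based index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample data with two missing readings.
readings = [10.0, None, 20.0, 40.0, None, 30.0]

filled = impute_mean(readings)        # missing values replaced by the mean
scaled = min_max_normalize(filled)    # values rescaled to [0, 1]
bins = equal_width_bins(filled, 3)    # values assigned to 3 equal-width bins

print(filled)
print(scaled)
print(bins)
```

Mean imputation and equal-width binning are only the simplest options; which method is adequate depends on the data and on the analysis that follows.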
1. Crickard, P. (2020) Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python, Birmingham: Packt Publishing
2. Algebra University College (2020) Data Engineering Handbook, Zagreb: Algebra University College
1. Garcia, S., Luengo, J., Herrera, F. (2016) Data Preprocessing in Data Mining, Cham: Springer International Publishing
2. Balamurugan, A.S., Christopher, A.B. (2012) Insight into Data Preprocessing: Theory and Practice: Data Mining Perspective, Chisinau: LAP Lambert Academic Publishing
1. Chakrabarti, S., Cox, E., Eibe, F., Hartmut, R.G., Han, J., Jiang, X., Kamber, M., Lightstone, S.S. (2009) Data Mining: Know It All, Massachusetts: Morgan Kaufmann
Minimal learning outcomes
- Describe possible solutions to data preparation problems.
- Discuss differences between methods for working with missing data and data transformation methods.
- Explain the impact of selected newer technologies on the data preparation process.
- Identify different aggregation functions and methods of time series transformation.
- Explain possible solutions to a particular problem in the process of integration, normalization and discretization of data.
- Explain the available basic methods of feature and sample reduction.
Preferred learning outcomes
- Recommend optimal solutions to data preparation problems.
- Distinguish between adequate methods for working with missing data and adequate data transformation methods.
- Judge the impact of newer technologies on the data preparation process.
- Select adequate aggregation functions and methods of time series transformation.
- Choose an adequate solution for a particular problem in the process of integration, normalization and discretization of data.
- Apply adequate basic methods of feature and sample reduction.