Practical - 5
Aim: Data Pre-processing and text analytics using Orange
Theory:
What is text analytics?
Text analytics is the automated process of converting large volumes of unstructured text into quantitative data in order to uncover insights, trends, and patterns.
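As a rough illustration of turning unstructured text into numbers, the sketch below counts words per document and lays the counts out as a small document-term matrix. It uses only the Python standard library; the function name term_counts and the sample sentences are made up for this example and are not part of Orange.

```python
import re
from collections import Counter

def term_counts(documents):
    """Return one Counter of lowercase word counts per document."""
    return [Counter(re.findall(r"[a-z']+", doc.lower())) for doc in documents]

# Hypothetical sample documents, for illustration only.
docs = [
    "Orange makes text mining easy.",
    "Text mining turns text into numbers.",
]

counts = term_counts(docs)

# Shared vocabulary across all documents, in alphabetical order.
vocabulary = sorted(set().union(*counts))

# One row per document, one column per vocabulary word.
matrix = [[c[word] for word in vocabulary] for c in counts]

for row in matrix:
    print(row)
```

Each row is now a numeric vector, which is the form that later analysis steps (clustering, classification, word clouds) typically work on.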
Preprocessing is a key step in data science. The Orange tool offers several preprocessing methods, including the following:
1. Discretization:
Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers.
In Orange, discretization replaces continuous features with corresponding categorical features.
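One common way to discretize a continuous feature is equal-width binning, sketched below in plain Python. This is an assumption chosen for illustration: it is not necessarily the method Orange applies by default, and the helper name equal_width_bins and the sample ages are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Map each continuous value to a bin index in 0..n_bins-1.

    The range [min, max] is split into n_bins intervals of equal width.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in one bin.
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical sample data.
ages = [18, 22, 35, 41, 58, 63]
bins = equal_width_bins(ages, n_bins=3)

labels = ["low", "mid", "high"]
categories = [labels[b] for b in bins]
print(categories)
```

Equal-frequency binning, where each bin holds roughly the same number of rows, is another common option and behaves better on skewed data.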
2. Continuization:
Continuization replaces discrete features with continuous ones, or removes them:
- binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the argument zero_based;
- multinomial variables are treated according to the argument multinomial_treatment;
- discrete attributes with only one possible value are removed.
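The rules above can be sketched in plain Python as follows. This is a minimal illustration, not Orange's implementation: the function names are hypothetical, and one indicator column per value is only one possible treatment of multinomial variables, assumed here for simplicity.

```python
def continuize_binary(values, categories, zero_based=True):
    """Encode a two-valued feature as 0.0/1.0 (or -1.0/1.0 if not zero_based).

    categories[1] is treated as the "positive" value.
    """
    low = 0.0 if zero_based else -1.0
    return [1.0 if v == categories[1] else low for v in values]

def one_hot(values, categories):
    """Encode a multinomial feature as one 0.0/1.0 indicator column per value."""
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def drop_constant(columns):
    """Remove discrete columns that have only one distinct value."""
    return {name: col for name, col in columns.items() if len(set(col)) > 1}

# Hypothetical sample data.
print(continuize_binary(["no", "yes", "yes"], ["no", "yes"]))
print(continuize_binary(["no", "yes", "yes"], ["no", "yes"], zero_based=False))
print(one_hot(["red", "blue"], ["red", "green", "blue"]))
print(drop_constant({"country": ["IN", "IN"], "colour": ["red", "blue"]}))
```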
3. Normalization:
This preprocessor normalizes features. Given a data table, it returns a new table in which the continuous attributes are normalized.
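Two widely used ways to normalize a continuous attribute are scaling by span (to the range 0 to 1) and scaling by standard deviation (zero mean, unit variance). The sketch below shows both in plain Python. The function names and sample values are hypothetical, and which variant Orange uses in a given workflow depends on its settings.

```python
from statistics import mean, pstdev

def normalize_by_span(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no spread to rescale.
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]

def normalize_by_sd(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]

# Hypothetical sample data.
heights = [10, 20, 30]
print(normalize_by_span(heights))
print(normalize_by_sd(heights))
```

Normalizing puts attributes measured on different scales on a comparable footing, which matters for distance-based methods such as k-nearest neighbours or k-means.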
4. Randomization:
This preprocessor randomizes classes, attributes, and/or metas. Given a data table, it returns a new table in which the data is shuffled.
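As a minimal sketch of class randomization, the snippet below shuffles the class labels while leaving the feature rows in place, which breaks the link between features and classes. The function name randomize_classes, the seed parameter, and the sample data are assumptions for illustration, not Orange's API.

```python
import random

def randomize_classes(rows, labels, seed=None):
    """Return the rows unchanged, paired with a shuffled copy of the labels.

    A fixed seed makes the shuffle reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(labels)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    return rows, shuffled

# Hypothetical sample data.
rows = [[5.1, 3.5], [6.2, 2.9], [4.7, 3.2], [5.9, 3.0]]
labels = ["a", "b", "a", "b"]

same_rows, new_labels = randomize_classes(rows, labels, seed=42)
print(new_labels)
```

A model trained on randomized classes should perform no better than chance, so this is a quick check for overfitting or data leakage.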