Practical - 5

 


 Aim: Data Pre-processing and text analytics using Orange


Theory:


What is text analytics:
Text analytics is the automated process of converting large volumes of unstructured text into quantitative data in order to uncover insights, trends, and patterns.
Preprocessing is a key step in data science. Orange provides several preprocessors for it; the main ones are described below.


1. Discretization: 

In general, discretization is the process of converting continuous functions, models, variables, and equations into discrete counterparts. It is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers.

In data preprocessing, discretization replaces continuous features with corresponding categorical features, for example by grouping numeric values into a small number of intervals (bins).
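The idea can be illustrated with a minimal plain-Python sketch of equal-width binning. This is not Orange's own implementation; the function name equal_width_bins, the sample values, and the choice of equal-width intervals are assumptions made only for illustration (other strategies, such as equal-frequency binning, are equally valid).

```python
def equal_width_bins(values, n_bins):
    """Assign each continuous value to one of n_bins equal-width intervals.

    Returns a list of bin indices in the range 0 .. n_bins - 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous feature.
measurements = [1.0, 2.0, 4.0, 7.0, 10.0]
labels = ["low", "medium", "high"]

bins = equal_width_bins(measurements, 3)
print([labels[b] for b in bins])
# → ['low', 'low', 'medium', 'high', 'high']
```

Each numeric value is replaced by the label of the interval it falls into, which is what turns a continuous feature into a categorical one.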






2. Continuization


Given a data table, return a new table in which the discrete attributes are replaced with continuous ones or removed.
  • binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the argument zero_based;
  • multinomial variables are treated according to the argument multinomial_treatment;
  • discrete attributes with only one possible value are removed.
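The three rules above can be sketched in plain Python as follows. This is a simplified illustration, not Orange's code: the function name continuize and its return format are hypothetical, and the multinomial case assumes one 0.0/1.0 indicator column per value, which is only one of the possible treatments.

```python
def continuize(column, zero_based=True):
    """Turn one discrete column into numeric indicator columns.

    Returns a dict mapping a value name to its indicator column.
    An empty dict means the attribute was removed.
    """
    values = sorted(set(column))
    if len(values) < 2:
        # Only one possible value: the attribute carries no information.
        return {}
    if len(values) == 2:
        # Binary attribute: a single indicator, 0.0/1.0 or -1.0/1.0.
        low = 0.0 if zero_based else -1.0
        positive = values[1]
        return {positive: [1.0 if v == positive else low for v in column]}
    # Multinomial attribute (assumed treatment): one indicator per value.
    return {
        val: [1.0 if v == val else 0.0 for v in column]
        for val in values
    }


print(continuize(["yes", "no", "yes"]))
# → {'yes': [1.0, 0.0, 1.0]}
print(continuize(["yes", "no", "yes"], zero_based=False))
# → {'yes': [1.0, -1.0, 1.0]}
print(continuize(["red", "green", "blue", "green"]))
# → {'blue': [0.0, 0.0, 1.0, 0.0], 'green': [0.0, 1.0, 0.0, 1.0], 'red': [1.0, 0.0, 0.0, 0.0]}
print(continuize(["a", "a", "a"]))
# → {}
```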


3. Normalisation: 
Construct a preprocessor for normalization of features. Given a data table, the preprocessor returns a new table in which the continuous attributes are normalized, that is, rescaled so that attributes measured on different scales become comparable.
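The text above does not say which normalization is applied, so the sketch below shows two common choices as assumptions: scaling by span (min-max, into the range 0 to 1) and scaling by standard deviation (zero mean, unit spread). The function names are hypothetical and the code is plain Python rather than Orange's implementation.

```python
from statistics import mean, pstdev


def normalize_by_span(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no span; map everything to 0.0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_by_sd(values):
    """Center values on their mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map everything to 0.0.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0]  # hypothetical continuous attribute
print(normalize_by_span(heights_cm))
# → [0.0, 0.5, 1.0]
```

Scaling by span is easy to interpret but sensitive to outliers, while scaling by standard deviation is the usual choice when the method that follows assumes centered data.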




4. Randomization: 
Construct a preprocessor for randomization of classes, attributes and/or metas. Given a data table, the preprocessor returns a new table in which the selected part of the data (classes, attributes and/or metas) is shuffled.
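A minimal plain-Python sketch of the idea, assuming that randomizing a column means permuting its values across rows while leaving the other columns untouched. The function name randomize_column, the row-as-dictionary layout, and the sample data are hypothetical; this is not Orange's implementation.

```python
import random


def randomize_column(rows, column, seed=None):
    """Return a copy of rows in which the values of one column are shuffled.

    The other columns keep their original values, so any relationship
    between them and the shuffled column is broken.
    """
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    # Build new row dictionaries; the input rows are not modified.
    return [{**row, column: value} for row, value in zip(rows, shuffled)]


# Hypothetical data table with one attribute and one class column.
table = [
    {"petal_length": 1.4, "species": "setosa"},
    {"petal_length": 4.7, "species": "versicolor"},
    {"petal_length": 6.0, "species": "virginica"},
]

for row in randomize_column(table, "species", seed=0):
    print(row)
```

Shuffling only the class column is a common sanity check: a model trained on such data should perform no better than chance, so good scores on randomized classes suggest overfitting or a flaw in the evaluation.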
