Practical 5

Aim: Data Pre-processing and text analytics using Orange

Theory:

What is text analytics?

Text analytics is the automated process of converting large volumes of unstructured text into quantitative data in order to uncover insights, trends, and patterns.
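To make "turning text into quantitative data" concrete, here is a minimal bag-of-words sketch in plain Python. It is not how Orange does it internally; the function name `bag_of_words` and the sample sentences are assumptions made up for this illustration.

```python
import re
from collections import Counter


def bag_of_words(documents):
    """Turn a list of strings into a vocabulary and a word-count matrix."""
    # Lowercase each document and keep only runs of letters as tokens.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    # The vocabulary is every distinct token, in alphabetical order.
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    # One row per document, one column per vocabulary word.
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample documents.
docs = ["Orange is a data mining tool", "Text mining turns text into data"]
vocabulary, matrix = bag_of_words(docs)
for doc, row in zip(docs, matrix):
    print(doc, "->", dict(zip(vocabulary, row)))
```

Each document becomes a row of numbers, which is the form that the preprocessing steps below expect.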

Preprocessing is a key step in data science. Orange provides several preprocessors for it; four of them are described below.

1. Discretization:

Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. It is usually carried out as a first step toward making them suitable for numerical evaluation and implementation on digital computers.

In Orange, discretization replaces continuous features with the corresponding categorical features, for example by splitting a numeric range into intervals (bins).
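As an illustration, the sketch below discretizes one continuous feature with equal-width binning. This is only one possible discretization method, written in plain Python rather than with Orange; the function name `equal_width_bins` and the sample values are assumptions for the example.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to the index of an equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical continuous feature.
values = [1.0, 2.5, 4.0, 5.5, 7.0, 10.0]
bins = equal_width_bins(values, n_bins=3)

# Give the bin indices categorical names.
names = ["low", "medium", "high"]
categories = [names[b] for b in bins]
print(categories)
```

The numeric column is replaced by a categorical one with three possible values.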

2. Continuization:

Given a data table, continuization returns a new table in which the discrete attributes are replaced with continuous ones or removed:

Binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the argument zero_based.
Multinomial variables are treated according to the argument multinomial_treatment.
Discrete attributes with only one possible value are removed.
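The three rules above can be sketched in plain Python as follows. This is not Orange's implementation: the function `continuize_column` is hypothetical, and it assumes that multinomial variables get one indicator column per value, which is only one of the treatments that multinomial_treatment can select.

```python
def continuize_column(name, column, zero_based=True):
    """Replace one discrete column with numeric indicator columns.

    Returns a dict mapping new column names to lists of floats.
    """
    values = sorted(set(column))
    off = 0.0 if zero_based else -1.0

    # Rule 3: an attribute with a single possible value is removed.
    if len(values) <= 1:
        return {}

    # Rule 1: a binary attribute becomes one indicator column.
    if len(values) == 2:
        positive = values[1]
        return {f"{name}={positive}": [1.0 if v == positive else off for v in column]}

    # Rule 2 (assumed treatment): one indicator column per value.
    return {
        f"{name}={value}": [1.0 if v == value else off for v in column]
        for value in values
    }


# Hypothetical discrete columns.
print(continuize_column("color", ["red", "green", "blue", "green"]))
print(continuize_column("smoker", ["yes", "no", "yes"], zero_based=False))
print(continuize_column("country", ["IN", "IN", "IN"]))
```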

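3. Normalization:

Constructs a preprocessor for normalization of features. Given a data table, the preprocessor returns a new table in which the continuous attributes are normalized, that is, rescaled to a common scale such as a fixed range or zero mean and unit standard deviation.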
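Two common ways to normalize a continuous attribute are rescaling by its span (min-max) and rescaling by its standard deviation. The sketch below shows both in plain Python; the function names and the sample values are assumptions for the example, and Orange's own preprocessor may differ in details such as how it handles constant columns.

```python
from statistics import mean, pstdev


def normalize_by_span(values):
    """Rescale values linearly so that they lie in the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_by_sd(values):
    """Center values on zero and scale them to unit standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical continuous attribute.
heights = [2.0, 4.0, 6.0, 8.0]
by_span = normalize_by_span(heights)
by_sd = normalize_by_sd(heights)
print(by_span)
print(by_sd)
```

After either transformation, attributes measured in different units become comparable in magnitude.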

4. Randomization:

Constructs a preprocessor for randomization of classes, attributes, and/or metas. Given a data table, the preprocessor returns a new table in which the values of the selected part are shuffled across the rows.
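A minimal sketch of randomizing the class column is shown below, assuming the table is a plain list of rows and that only the class values are shuffled. The function `randomize_classes`, its `seed` parameter, and the sample table are made up for this illustration and are not Orange's API.

```python
import random


def randomize_classes(rows, class_index, seed=None):
    """Return a copy of the table with the class column shuffled across rows."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    classes = [row[class_index] for row in rows]
    rng.shuffle(classes)
    # Rebuild each row with its original attributes and a shuffled class value.
    return [
        row[:class_index] + [cls] + row[class_index + 1:]
        for row, cls in zip(rows, classes)
    ]


# Hypothetical table: two attributes followed by a class label.
table = [
    [5.1, 3.5, "setosa"],
    [7.0, 3.2, "versicolor"],
    [6.3, 3.3, "virginica"],
    [4.9, 3.0, "setosa"],
]
shuffled = randomize_classes(table, class_index=2, seed=42)
for row in shuffled:
    print(row)
```

The attribute values stay where they were and the class distribution is preserved; only the pairing between attributes and classes is broken. A model trained on such data gives a baseline for what can be learned by chance.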
