The Knowledge Acquisition Process in Data Mining

The Knowledge Base is a dynamic part of the system that can be supplemented and refreshed through the
Intelligent KB Editor. Note that there are two potential sources of knowledge for
the proposed system: the analysis of the theoretical background behind the feature extraction and
classification methods, and field experiments.

In the first case, knowledge is formulated by an expert in the specific feature extraction methods
and classification schemes, and then represented as a set of rules by a knowledge engineer in a
knowledge representation language supported by the system. We argue that it is possible and
reasonable to categorise the facts and rules present in the Knowledge Base. Categorisation can be
done according to how the knowledge was obtained: whether it came from the analysis of
experimental results or from the domain theory, and whether it was entered automatically by the
Intelligent KB Editor or by a knowledge engineer (who could also be a data miner). Another
categorisation criterion is the level of confidence in a rule. The expert may be certain of one fact
but merely suppose or hypothesise another. Similarly, consider a rule that has just been generated by
analysing the results of experiments on artificially generated data sets but has never been verified on
real-world data sets, and a rule that has been verified on a number of real-world problems. These two
rules should clearly not have the same level of confidence.
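The categorisation described above could be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Origin`, `Author`, `Confidence`, `Rule` and `by_category` are hypothetical, and the three ordered confidence levels are one possible reading of the text rather than the system's actual scheme.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Origin(Enum):
    """How the knowledge was obtained."""
    DOMAIN_THEORY = "domain theory"
    EXPERIMENTS = "analysis of experimental results"


class Author(Enum):
    """Who entered the rule into the Knowledge Base."""
    KB_EDITOR = "Intelligent KB Editor (automatic)"
    KNOWLEDGE_ENGINEER = "knowledge engineer"


class Confidence(IntEnum):
    """Assumed ordering, from least to most trusted."""
    HYPOTHESIS = 1
    ARTIFICIAL_DATA_ONLY = 2
    REAL_WORLD_VERIFIED = 3


@dataclass(frozen=True)
class Rule:
    text: str
    origin: Origin
    author: Author
    confidence: Confidence


def by_category(rules, origin=None, author=None,
                min_confidence=Confidence.HYPOTHESIS):
    """Return the rules matching the given categories and confidence floor."""
    return [
        r for r in rules
        if (origin is None or r.origin is origin)
        and (author is None or r.author is author)
        and r.confidence >= min_confidence
    ]
```

With such a structure, a rule derived only from artificial data and a rule verified on real-world problems carry different `Confidence` values and can be filtered or ranked separately.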

In addition to the “trust” criteria that follow from categorising the rules, it is possible to adapt the
system to a particular researcher’s needs and preferences by giving higher weights to the rules that
were contributed by that user.
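One simple way to picture this personalisation is a multiplicative bonus for the user's own rules. The sketch below is an assumption, not the system's mechanism: the function `rank_rules`, the `(name, owner, base_weight)` tuple layout and the default bonus factor are all hypothetical.

```python
def rank_rules(rules, user, own_rule_bonus=2.0):
    """Rank rules by weight, boosting those contributed by `user`.

    `rules` is an iterable of (name, owner, base_weight) tuples.
    Returns (name, adjusted_weight) pairs, highest weight first.
    """
    weighted = [
        (name, weight * (own_rule_bonus if owner == user else 1.0))
        for name, owner, weight in rules
    ]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)
```

How large the bonus should be, and whether it should be multiplicative at all, would have to be decided for the concrete system.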

In the second case, a data miner can discover knowledge, in the form of separate facts, trends and
dependencies, while analysing the results obtained from the experiments. As in the first case, the
discovered knowledge is represented as a set of rules by a knowledge engineer using the knowledge
representation language. Alternatively, the knowledge acquisition process can be automatic, i.e. the
knowledge discovery process is accomplished without any intervention by a human expert. This is
possible by deriving new rules and updating old ones based on the analysis of results obtained
during self-run experiments.

In both of the latter cases we face the learning problem of how the Intelligent KB Editor should build a
classification or regression model on the meta-data resulting from the experiments. In this context, the
input attributes for a classification model are specific data set characteristics and a classification
model’s outputs, which include accuracy, sensitivity, specificity, time complexity, etc. The combination
of the names of a feature extraction method and a classification model, together with their parameter
values, represents a class label. When building a regression model, the meta-data set attributes are the
data set characteristics and the names of the feature extraction method and the classification model,
and one of the model’s output characteristics is the attribute whose (continuous) value has to be
predicted.
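The two meta-data layouts described above can be made concrete with a small sketch. It assumes each experiment is stored as one record; the names `ExperimentRecord`, `classification_row` and `regression_row`, the dictionary keys, and the method names used in the usage note are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    characteristics: dict  # data set characteristics
    extractor: str         # feature extraction method name with parameters
    classifier: str        # classification model name with parameters
    outputs: dict          # e.g. accuracy, sensitivity, specificity


def classification_row(rec):
    """Attributes: characteristics plus model outputs; label: method combination."""
    attributes = {**rec.characteristics, **rec.outputs}
    label = f"{rec.extractor}+{rec.classifier}"
    return attributes, label


def regression_row(rec, target="accuracy"):
    """Attributes: characteristics plus method names; target: one continuous output."""
    attributes = {
        **rec.characteristics,
        "extractor": rec.extractor,
        "classifier": rec.classifier,
    }
    return attributes, rec.outputs[target]
```

For a record with `extractor="PCA(k=5)"` and `classifier="kNN(k=3)"`, `classification_row` yields the class label `"PCA(k=5)+kNN(k=3)"`, while `regression_row` moves both names into the attributes and returns the chosen output as the value to predict.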

The results obtained at the present stage of the research show a high level of complexity in the
dependencies between the data set characteristics and the best-suited scheme for the data mining process.
To develop our understanding further, the research needs to proceed through the following
iterations:

  • Generation of artificial data sets with known characteristics (simple, statistical and information theoretic
    measures);
  • Design of experiments on the generated artificial data sets;
  • Derivation of dependencies and definition of the criteria from the obtained results;
  • Development of a knowledge base defining a set of rules on the set of obtained criteria;
  • Validation of the constructed theory with a set of experiments on real-world data sets.

Thus, three basic research methods are used: the theoretical approach, the constructive
approach, and the experimental approach. These approaches are closely related and are applied in parallel.
The theoretical background is exploited during the constructive work, and the constructions are used for
experimentation. The results of the constructive and experimental work are used to refine the theory.
An example of such a procedure is as follows:

  1. Generation of artificial data sets with the number of attributes from 2 to 100, the number of
    instances from 150 to 5000, the number of classes from 2 to 10, the average correlation
    between the attributes from 10% to 90%, the average noisiness of attributes from 10% to 50%,
    and the percentage of irrelevant attributes among all attributes from 10% to 50%.
  2. Design of the experiments on the generated artificial data sets and analysis of the accuracy and
    efficiency of classification models built with different learning algorithms and different feature
    extraction methods. Tuning of the input parameters for each combination is required.
  3. Analysis of the dependencies and trends between output accuracies and efficiencies, feature
    extraction methods and classifiers, their input parameters, and pre-defined data set characteristics.
  4. Definition of a set of rules that reflect the dependencies and trends found.
  5. Execution of a number of experiments on UCI data sets, using the DSS to select the best-suited
    feature extraction method and classifier.
  6. Addition of the derived rules that were successfully validated during the tests on the benchmark
    data sets to the knowledge base.
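As an illustration of step 1, the sketch below draws data set specifications from the ranges listed above. It only picks the target characteristics and does not generate the data itself. Uniform sampling and the names `DataSetSpec`, `sample_spec` and `sample_specs` are assumptions made for the example; the text does not say how values within the ranges are chosen.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetSpec:
    n_attributes: int
    n_instances: int
    n_classes: int
    avg_correlation: float      # fraction, 0.10 to 0.90
    avg_noise: float            # fraction, 0.10 to 0.50
    irrelevant_fraction: float  # fraction, 0.10 to 0.50


def sample_spec(rng):
    """Draw one specification uniformly from the ranges in step 1."""
    return DataSetSpec(
        n_attributes=rng.randint(2, 100),      # randint includes both ends
        n_instances=rng.randint(150, 5000),
        n_classes=rng.randint(2, 10),
        avg_correlation=rng.uniform(0.10, 0.90),
        avg_noise=rng.uniform(0.10, 0.50),
        irrelevant_fraction=rng.uniform(0.10, 0.50),
    )


def sample_specs(n, seed=0):
    """Draw n specifications reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_spec(rng) for _ in range(n)]
```

A fixed seed keeps the set of generated specifications reproducible, so that the experiments in step 2 can be repeated on data sets with the same characteristics.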