Negative Association Rule
Association Rule Mining
The problem of association rule mining is defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction in $D$ has a unique transaction ID and contains a subset of the items in $I$. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items $X$ and $Y$ are called the antecedent (left-hand side) and the consequent (right-hand side) of the rule.
To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set that contain the itemset. The confidence of a rule is defined as $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$.
Confidence can be interpreted as an estimate of the probability P(Y | X), that is, the probability of finding the RHS of the rule in transactions given that these transactions also contain the LHS.
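The definitions of support and confidence can be illustrated with a minimal Python sketch. The function names `support` and `confidence` and the five-transaction toy database are assumptions introduced here for illustration only; they are not taken from the source.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Proportion of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)  # subset test
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X), an estimate of P(Y | X)."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    supp_x = support(x, transactions)
    if supp_x == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(x | y, transactions) / supp_x


# Hypothetical toy database (illustrative data, not from the source).
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]

supp_milk_bread = support({"milk", "bread"}, transactions)
conf_rule = confidence({"milk", "bread"}, {"butter"}, transactions)

print(supp_milk_bread)  # 0.4: two of the five transactions contain both items
print(conf_rule)        # 0.5: one of those two also contains butter
```

In this toy data the rule {milk, bread} ⇒ {butter} would pass a minimum support threshold of 0.2 and a minimum confidence threshold of 0.5; the thresholds themselves are chosen by the analyst.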
The interestingness of an association rule can be defined in terms of the measure associated with it, as well as in terms of the form in which an association can be found. The most common framework for generating association rules is the “support-confidence” framework. Although these two parameters allow many of the associations discovered in data to be pruned, many uninteresting rules may still be produced. The measure interest is used to discover
4. Chi-Square Test
Generally speaking, the chi-square test is a statistical test (Glenn A. Walker) used to examine differences between categorical variables. Many features of the social world are characterized through categorical variables, such as religion or political preference. Hypotheses involving such variables can be examined with the chi-square test. The chi-square test is used in two similar but distinct circumstances:
For estimating how closely an observed distribution matches an expected distribution – we’ll refer to this as the “goodness-of-fit” test.
For estimating whether two random variables are independent.
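The second use, testing independence, can be sketched in Python for a contingency table of observed counts. This is a minimal illustration under stated assumptions: the function name `chi_square_statistic` and the counts in the example table are hypothetical, and the decision uses the standard critical value 3.841 for one degree of freedom at the 0.05 significance level, which applies to a 2×2 table only.

```python
from typing import List


def chi_square_statistic(observed: List[List[float]]) -> float:
    """Pearson chi-square statistic for a table of observed counts.

    Expected counts are computed under the independence hypothesis:
    E[i][j] = row_total[i] * col_total[j] / grand_total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("table contains no observations")

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("an expected count is zero; test not applicable")
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table (illustrative counts, not from the source):
# rows = X present / X absent, columns = Y present / Y absent.
table = [
    [30, 10],
    [10, 50],
]

# Critical value of the chi-square distribution, df = 1, alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

stat = chi_square_statistic(table)
dependent = stat > CRITICAL_VALUE_DF1_ALPHA_05

print(round(stat, 4))  # 34.0278
print(dependent)       # True: independence is rejected at the 0.05 level
```

A statistic near zero would indicate that the observed counts are close to what independence predicts; for tables larger than 2×2 the critical value must be taken for (rows − 1) × (columns − 1) degrees of freedom.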