Thursday, October 3, 2019

Candidate Set Essay Example for Free

Part of the fast-changing science of database management is the improvement of association rule generation. Several algorithms have been proposed and implemented on different platforms to generate these rules. An association rule states the confidence with which the occurrence of one entity or event predicts the occurrence of another. One popular algorithm for generating the association rules of a given data set is the Apriori Algorithm. It uses a bottom-up approach to find all the significant association rules, given the minimum support an item set must have. With the help of a pruning step that uses the property of infrequent sets defined in the paper Fast Algorithms for Discovering the Maximum Frequent Set [Lin98], the database scans needed to obtain the maximum frequent set (MFS) are minimized. Another way to find the maximum frequent set is the top-down approach. Its main aim is to discover the Maximum Frequent Candidate Set (MFCS), which quickly gives all the other frequent sets by the property of frequent sets. In this paper, we compare the disadvantages of both algorithms and discuss how an integration of the two would work and be implemented.

Apriori Algorithm's Dilemma

FIGURE 2.1: Lattices 1, 2, and 3, illustrating the discovery of frequent sets [Dun03].

PROPERTY 1: If an item set is infrequent, all its supersets must be infrequent, and they do not need to be examined further.

The Apriori Algorithm needs to check all the item sets with one element, {A}, {B}, {C}, and {D}, in order to know the MFCS. With the help of the pruning step that uses the property of infrequent sets stated above, we can determine the MFCS of the universe ABCD in Figure 2.1 by performing the Apriori Algorithm. In Figure 2.1 we must perform four database scans, checking the item sets A, B, C, and D respectively, before we can determine the MFCS for any of the lattices. Lattice 1 needs four database scans before determining that A is the MFCS. Lattice 2 needs four scans to determine ACD, and the same holds for lattice 3, which needs four scans before we can conclude that ABCD is the MFCS. What if we consider a lattice with 5 items, with 6 items, and so on? We come to the conclusion that the Apriori Algorithm needs n database scans for n items. With that in mind, consider the lattice of ABCDEFGHIJKLMNOPQRSTUVWXYZ. It has 26 items, so the MFCS would be determined only after 26 database scans using the Apriori Algorithm.

The Top-down Approach and the MFCS

The top-down approach works well when the MFCS is long. What if the database to be examined has up to 100 items? The Apriori Algorithm would then need 100 database scans to come up with the MFCS. In contrast, the top-down approach starts with the set containing all the elements of the item set under consideration and works down to its subsets. In Figure 2.1 the top-down approach first checks the frequency of ABCD, then BCD, and so on. The advantage of the top-down approach over the Apriori Algorithm is that it only needs to find the first occurrence of a frequent set to get the MFCS. This follows from the second property of frequent sets.

PROPERTY 2: If an item set is frequent, all its subsets must be frequent, and they do not need to be examined further.

Let's examine the performance of the top-down approach for the three lattices in Figure 2.1. The top-down approach works best when all of the items in the item set are frequent. In lattice 3, the top-down approach needs only one database scan to come up with the complete frequent sets.
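Properties 1 and 2 both reduce to subset tests, which the following Java sketch tries to make concrete. It is only an illustration under assumed conventions: the class name FrequentSetProperties, the method names, and the bitmask encoding of item sets (bit 0 for A, bit 1 for B, and so on) are hypothetical and are not taken from [Lin98] or from the program discussed in this paper.

```java
final class FrequentSetProperties {
    // Assumed encoding: an item set such as "ACD" becomes a bitmask,
    // with bit 0 standing for A, bit 1 for B, and so on.
    static int encode(String items) {
        int mask = 0;
        for (char c : items.toCharArray()) {
            mask |= 1 << (c - 'A');
        }
        return mask;
    }

    // True when every item of sub also appears in sup.
    static boolean isSubset(int sub, int sup) {
        return (sub & sup) == sub;
    }

    // Property 1: a candidate that contains a known infrequent set
    // is itself infrequent and can be skipped.
    static boolean knownInfrequent(String candidate, String infrequentSet) {
        return isSubset(encode(infrequentSet), encode(candidate));
    }

    // Property 2: a candidate contained in a known frequent set
    // is itself frequent and can be skipped.
    static boolean knownFrequent(String candidate, String frequentSet) {
        return isSubset(encode(candidate), encode(frequentSet));
    }

    public static void main(String[] args) {
        System.out.println(knownFrequent("AC", "ABCD"));  // true
        System.out.println(knownInfrequent("ABD", "B"));  // true
        System.out.println(knownFrequent("AB", "ACD"));   // false
    }
}
```

In lattice 3, for instance, a single frequent result for ABCD settles every other item set, because each of them passes the Property 2 test against ABCD.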
Lattice 3's MFCS is ABCD, so once ABCD is found to be frequent, all of its subsets are known to be frequent as well. The problem with the top-down approach arises when the MFCS is short. On lattice 2, the number of database scans needed to find the MFCS is still lower than the number needed by the Apriori Algorithm, three as compared to four. But in the case of lattice 1, the top-down approach needs to traverse all the points in the lattice in order to determine the MFCS, which is A. The table below gives a view of the database scans needed to determine the complete MFS.

Table 2.1 Apriori and Top-down Approach Comparison

Items   Apriori   Top-down Approach
4       4         Best case: 1, Worst case: 15
5       5         Best case: 1, Worst case: 31
...
n       n         Best case: 1, Worst case: 2^n - 1

Having considered the advantages and disadvantages of the two algorithms discussed above, I decided to merge their good properties. To come up with an integrative algorithm that makes use of the concepts of both the Apriori Algorithm and the top-down approach, we should first understand, or simulate, how the two algorithms generate their sets of possible candidates for frequent sets.

Here is program code that generates the Apriori Algorithm's set of possible candidates, given the starting candidate {0} and the number of items to be considered. Note that I opted to start the representation of the possible candidates at zero because the Java program that I use to perform the discussed algorithms uses zero as the start index of its array data structures. Accompanying this program code is an explanation of how the recursive property comes up with the set of possible candidates.
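The listing itself does not appear in this text. As a stand-in, the following is a minimal sketch of one way a recursive routine could enumerate the possible candidates from the starting candidate {0} and the number of items. The class name AprioriCandidateSketch, the method names, and the final sort by size (used here to mimic the bottom-up order) are assumptions made for illustration, not the author's actual program.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AprioriCandidateSketch {
    // Returns every non-empty item set over the items 0 .. itemCount-1,
    // ordered by size so that smaller candidates come first.
    static List<List<Integer>> generate(int itemCount) {
        List<List<Integer>> out = new ArrayList<>();
        for (int first = 0; first < itemCount; first++) {
            List<Integer> start = new ArrayList<>();
            start.add(first);          // {0} is the first starting candidate
            extend(start, itemCount, out);
        }
        // List.sort is stable, so candidates of equal size keep their order.
        out.sort(Comparator.comparingInt((List<Integer> c) -> c.size()));
        return out;
    }

    // Records the candidate, then recursively appends each larger item index.
    static void extend(List<Integer> candidate, int itemCount,
                       List<List<Integer>> out) {
        out.add(new ArrayList<>(candidate));
        int last = candidate.get(candidate.size() - 1);
        for (int next = last + 1; next < itemCount; next++) {
            candidate.add(next);
            extend(candidate, itemCount, out);
            candidate.remove(candidate.size() - 1);  // backtrack (removes by index)
        }
    }

    public static void main(String[] args) {
        // prints [[0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
        System.out.println(generate(3));
    }
}
```

Under these assumptions, generate(4) yields 15 candidates and generate(5) yields 31, which matches the 2^n - 1 worst case in Table 2.1: without pruning, every point of the lattice is a possible candidate.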
