In this talk we present a system for inducing a forest of functional trees from data streams that is able to detect and react to concept drift.
The Ultra Fast Forest of Trees (UFFT) is an incremental algorithm that works online, processing each example in constant time and performing a single scan over the training examples. It uses analytical techniques to choose the splitting criteria, and information gain to estimate the merit of each possible splitting-test. Decision nodes and leaves contain naive-Bayes classifiers that play different roles during the induction process.
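As an illustration of how the merit of a candidate splitting-test can be estimated, the following minimal sketch computes the information gain of a binary split from class counts alone. The function names and the count-based interface are assumptions made for this example; the abstract does not specify how UFFT represents these quantities.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(left_counts, right_counts):
    """Entropy reduction obtained by sending examples to two branches.

    left_counts[k] and right_counts[k] are the numbers of class-k examples
    that a (hypothetical) splitting-test routes to each branch.
    """
    parent = [l + r for l, r in zip(left_counts, right_counts)]
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    if n == 0:
        return 0.0
    children = (n_left / n) * entropy(left_counts) \
        + (n_right / n) * entropy(right_counts)
    return entropy(parent) - children
```

A test that separates two balanced classes perfectly has a gain of one bit, while a test that leaves the class proportions unchanged in both branches has a gain of zero.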
Naive-Bayes classifiers in leaves are used to classify test examples; those in inner nodes can be used either as multivariate splitting-tests, if chosen by the splitting criteria, or to detect drift in the distribution of the examples that traverse the node.
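One common way to detect drift from a classifier's online behaviour is to monitor its error rate and raise an alarm when it climbs significantly above the best level seen so far. The sketch below follows that idea. The abstract does not say which statistical test UFFT applies, so the class name, the thresholds (2 and 3 standard deviations) and the 30-example warm-up are all assumptions chosen for illustration.

```python
import math


class ErrorRateDriftMonitor:
    """Hypothetical per-node monitor fed with the node classifier's mistakes."""

    def __init__(self, min_examples=30, warning_level=2.0, drift_level=3.0):
        self.min_examples = min_examples    # warm-up before any decision
        self.warning_level = warning_level  # assumed threshold multiplier
        self.drift_level = drift_level      # assumed threshold multiplier
        self.n = 0
        self.errors = 0
        self.p_min = math.inf               # best error rate seen so far
        self.s_min = math.inf               # its standard deviation

    def update(self, misclassified):
        """Register one prediction outcome; return the current state."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_examples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            # New best operating point: remember it as the reference.
            self.p_min, self.s_min = p, s
            return "stable"
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

In use, the node would call `update(prediction != label)` for every example that traverses it and react once the returned state is `"drift"`.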
The three uses of naive-Bayes (classifying test examples at the leaves, acting as splitting-tests based on the classifier's outcome, and detecting drift at decision nodes) are all obtained directly from the sufficient statistics required to compute the splitting criteria, without any additional computation.
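The reuse of sufficient statistics can be sketched as follows: a node keeps only per-class counts and per-attribute running sums, updates them in constant time per example, and derives a naive-Bayes prediction from them on demand. This sketch assumes numeric attributes with Gaussian class-conditional distributions; the class `NodeStatistics` and its methods are hypothetical names, not the authors' implementation.

```python
import math
from collections import defaultdict


class NodeStatistics:
    """Per-class counts and running sums: the only state the node stores."""

    def __init__(self, n_attributes):
        self.n_attributes = n_attributes
        self.class_counts = defaultdict(int)
        self.sums = defaultdict(lambda: [0.0] * n_attributes)
        self.sq_sums = defaultdict(lambda: [0.0] * n_attributes)

    def update(self, x, y):
        """Constant-time update for one example (x: values, y: class label)."""
        self.class_counts[y] += 1
        for j, value in enumerate(x):
            self.sums[y][j] += value
            self.sq_sums[y][j] += value * value

    def mean_var(self, y, j):
        """Mean and variance of attribute j for class y, from the sums."""
        n = self.class_counts[y]
        mean = self.sums[y][j] / n
        # Floor the variance to avoid division by zero (assumed safeguard).
        var = max(self.sq_sums[y][j] / n - mean * mean, 1e-9)
        return mean, var

    def predict(self, x):
        """Gaussian naive-Bayes prediction using only the stored statistics."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for j, value in enumerate(x):
                mean, var = self.mean_var(y, j)
                score += -0.5 * math.log(2.0 * math.pi * var) \
                    - (value - mean) ** 2 / (2.0 * var)
            if score > best_score:
                best_label, best_score = y, score
        return best_label
```

The same class counts, means and variances are the kind of quantities a split evaluation would need, which is why the classifier comes at no extra cost per example in this sketch.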
This is a major advantage in the context of high-speed data streams. The methodology was tested on artificial and real-world data sets.
The experimental results show very good performance in comparison to a batch decision tree learner, and a high capacity to detect and react to drift.