Publications
Dynamic Adaptation of Online Ensembles for Drifting Data Streams
Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies, US. Apr, 2017. doi:10.1007/s10844-017-0460-9
Abstract
The success of data stream mining techniques has allowed decision makers to analyze their data in multiple domains, ranging from monitoring network intrusion to financial markets analysis and online sales transactions exploration. Specifically, online ensembles that construct accurate models against drifting data streams have been developed. Recently, there has been a surge in interest in mobile (or so-called pocket) data stream mining, aiming to construct near real-time models for data stream mining applications that run on mobile devices. In such a setting, it follows that the computational resources are limited and that there is a need to adapt analytics to match the resource usage requirements. Consequently, the resultant models should not only be highly accurate, but they should also adapt swiftly to changes. In addition, the data mining techniques should be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider Return on Investment (ROI) issues such as storage requirements and memory utilization. This paper introduces the Adaptive Ensemble Size (AES) algorithm, an extension of the Online Bagging method, to address these issues. Our AES method dynamically adapts the sizes of ensembles, based on ROI usage patterns. We illustrate our approach by analyzing its performance against both synthetic and real-world data streams. The results, when comparing our AES algorithm with the state-of-the-art, indicate that we are able to obtain a high ROI and to swiftly adapt to change, without compromising on the predictive accuracy.
Tweets as a Vote: Exploring Political Sentiments on Twitter for Opinion Mining
Foundations of Intelligent Systems: 22nd International Symposium, Lyon, France. Oct, 2015. doi:10.1007/978-3-319-25252-0_19
Abstract
Twitter feeds provide data scientists with a large repository for entity-based sentiment analysis. Specifically, the tweets of individual users may be used in order to track the ebb and flow of their sentiments and opinions. However, this domain poses a challenge for traditional classifiers, since the vast majority of tweets are unlabeled. Further, tweets arrive at high speeds and in very large volumes. They are also subject to change over time (so-called concept drift). In this paper, we present the PyStream algorithm that addresses these issues. Our method starts with a small annotated training set and bootstraps the learning process. We employ online analytical processing (OLAP) to aggregate the opinions of the individuals we track, expressed in terms of the votes they would cast in a national election. Our results indicate that we are able to capture the sentiments of individuals as they evolve over time.
Intelligent Adaptive Ensembles for Data Stream Mining: A High Return on Investment Approach
4th International Workshop - New Frontiers in Mining Complex Patterns, Porto, Portugal. Sep, 2015. doi:10.1007/978-3-319-39315-5_5
Abstract
Online ensemble methods have been very successful at creating accurate models against data streams that are susceptible to concept drift. The success of data stream mining has allowed diverse users to analyse their data in multiple domains, ranging from monitoring stock markets to analysing network traffic and exploring ATM transactions. Increasingly, data stream mining applications are running on mobile devices, utilizing the variety of data generated by sensors and network technologies. Consequently, there has been a surge in interest in mobile (or so-called pocket) data stream mining, aiming to construct near real-time models. However, it follows that the computational resources are limited and that there is a need to adapt analytics to match the resource usage requirements. In this context, the resultant models produced by such algorithms should not only be highly accurate and able to swiftly adapt to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider Return on Investment (ROI) issues such as storage space needs and memory utilization. This paper introduces the Adaptive Ensemble Size (AES) algorithm, an extension of the Online Bagging method, to address this issue. Our AES method dynamically adapts the sizes of ensembles, based on the most recent memory usage requirements. Our results, when comparing our AES algorithm with the state-of-the-art, indicate that we are able to obtain a high ROI without compromising on the accuracy of the results.
Master's Thesis: Intelligent Adaptation of Ensemble Size in Data Streams Using Online Learning
uO Research, Ottawa, Canada. May, 2015. doi:10.20381/ruor-4304
Abstract
In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications, such as network intrusion detection, fraud detection and financial forecasting, among others. In this setting, it is crucial to create data mining algorithms that are able to seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The resultant models produced by such algorithms should not only be highly accurate and able to swiftly adapt to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research work focuses on mining in a data stream with concept drift, using an online bagging method, with consideration of the memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, where the models of multiple classifiers are combined into an ensemble, an approach that has been very successful when building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift, in order to vary the ensemble size during the data mining process. We aim to minimize the memory usage, while maintaining highly accurate models with a high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state-of-the-art.
Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, in contrast to the time and memory invested. We aimed to achieve high ROI without compromising on the accuracy of the result. Our experimental results indicate that we achieved this goal.
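The abstracts above describe the adaptive approach only at a high level, so the following is a minimal Python sketch of the general idea and not the AES algorithm itself. It shows standard Online Bagging, in which each base learner is trained on every incoming example k times with k drawn from a Poisson(1) distribution, together with a hook for changing the ensemble size. The names MajorityClassLearner, OnlineBagging, resize and target_size, as well as the memory-budget rule, are all hypothetical and chosen purely for illustration.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class MajorityClassLearner:
    """Toy incremental base learner (hypothetical): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


class OnlineBagging:
    """Online Bagging: each learner sees each example Poisson(1) times."""

    def __init__(self, make_learner, size=10, seed=0):
        self.make_learner = make_learner
        self.rng = random.Random(seed)
        self.learners = [make_learner() for _ in range(size)]

    def learn_one(self, x, y):
        for learner in self.learners:
            for _ in range(poisson(1.0, self.rng)):
                learner.learn_one(x, y)

    def predict_one(self, x):
        # Majority vote over learners that have seen at least one example.
        votes = Counter(learner.predict_one(x) for learner in self.learners)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None

    def resize(self, new_size):
        """Hypothetical hook: drop the newest learners or add fresh ones."""
        new_size = max(1, new_size)
        while len(self.learners) > new_size:
            self.learners.pop()
        while len(self.learners) < new_size:
            self.learners.append(self.make_learner())


def target_size(current_size, memory_used, memory_budget, min_size=1, max_size=50):
    """Assumed rule, not taken from the papers: shrink by one when over the
    memory budget, grow by one when under half of it, otherwise keep the size."""
    if memory_used > memory_budget:
        return max(min_size, current_size - 1)
    if memory_used < 0.5 * memory_budget:
        return min(max_size, current_size + 1)
    return current_size
```

In use, a stream loop would call learn_one for each arriving example and periodically call resize(target_size(...)) with whatever memory measurement is available on the device. The one-step grow and shrink rule is only a placeholder; the published method bases its decision on ROI and recent memory usage, the details of which are not given in these abstracts.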