Google Chairman Eric Schmidt has said that every two days the world produces five exabytes of data, as much data as the world produced from the start of humanity until 2003 [1]. The volume of streaming data is growing rapidly, driven by an expanding variety of interconnected machines, devices, sensors, and consumer content. Even a single source can produce a surprising amount of data: Virgin Atlantic’s new fleet of highly connected planes, for example, generates half a terabyte of data on every flight [2]. These volumes are expected to keep growing, giving rise to what is termed “Big Data”. According to Gartner’s predictions for business intelligence, by 2017 more than 50 percent of analytics implementations will make use of event data streams generated from instrumented machines, applications and/or individuals [3]. Managing such rapidly growing data will pose a major challenge to current stream mining techniques and will require rethinking how their results are handled today.
The healthcare industry is expected to see a drastic change both in its data input and in what it requires from the decision support systems that use this data. Expectations will rise from assisting doctors in diagnosis to tracking a patient’s state in real time, so that physicians can analyze the patient’s information throughout treatment and decide on further steps accordingly. At UCLA’s School of Medicine, Dr. Paul Vespa is leading research in which real-time signal streams from the brain are analyzed using IBM’s Watson Foundation to give physicians decision support for brain analysis [4]. This kind of real-time analysis will increasingly be expected of stream processing in healthcare.
FIGURE 1 – Healthcare Decision Support Systems [7]
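To make “tracking a patient’s state in real time” concrete, the following minimal Python sketch flags readings in a single vital-sign stream that deviate strongly from the statistics of earlier readings. It is not the method used in the UCLA/IBM work; the class name `VitalSignMonitor`, the z-score threshold of 3.0 and the warm-up length of 10 readings are hypothetical choices. Welford’s online algorithm is used so that each update takes constant time and memory regardless of how long the stream runs.

```python
import math
from dataclasses import dataclass


@dataclass
class VitalSignMonitor:
    """Hypothetical streaming monitor for one vital-sign signal (e.g. heart rate).

    Keeps a running mean and variance with Welford's online algorithm, so each
    update is O(1) in time and memory no matter how long the stream runs.
    """
    threshold: float = 3.0   # assumed z-score cutoff for raising an alert
    warmup: int = 10         # assumed number of readings before alerts are allowed
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0          # sum of squared deviations from the running mean

    def update(self, value: float) -> bool:
        """Score `value` against earlier readings, then fold it into the statistics.

        Returns True if the reading is an outlier relative to previous readings.
        """
        alert = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                alert = True
        # Welford update: incorporate the new reading into mean and m2.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return alert


if __name__ == "__main__":
    monitor = VitalSignMonitor()
    readings = [72, 74, 71, 73, 75, 72, 70, 74, 73, 72, 71, 73, 130, 72]
    for t, bpm in enumerate(readings):
        if monitor.update(bpm):
            print(f"t={t}: reading {bpm} deviates sharply from earlier readings")
```

Because every reading, including an outlier, is folded into the running statistics, a single extreme value widens the spread used for later readings; a monitor meant for real patients would more likely use a sliding or exponentially weighted window.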
Streaming data has the potential to radically reduce the time needed to deliver business-critical information, provided it can be captured, processed and analyzed in real time. To make effective use of this vast information, visual data processing needs to be seamlessly integrated with traditional stream analysis. Stream analytics needs to shift from static, report-based solutions to user-driven, interactive visualization of information. Users should be able to access and combine information from multiple data streams and view recent results from all business functions. Currently, the size and recency of the data make user-friendly visualizations hard to find. Going forward, new techniques need to be developed that build easy data visualization into all stream mining software.
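One building block behind such user-driven views is an always-current summary of the most recent data from several streams at once. The sketch below keeps only the events from the last `window_seconds` for each source and returns a per-source summary that a dashboard could redraw on each refresh. The class `RecentWindowView` and the source labels (`"sales"`, `"support_tickets"`) are hypothetical, and the example illustrates a time-based sliding window rather than any particular visualization product.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class RecentWindowView:
    """Hypothetical per-source sliding time window over several event streams."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # source name -> deque of (timestamp, value), oldest event on the left
        self._events: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)

    def add(self, source: str, timestamp: float, value: float) -> None:
        """Append an event; events are assumed to arrive in timestamp order per source."""
        self._events[source].append((timestamp, value))

    def _evict(self, now: float) -> None:
        # Drop events older than the window [now - window_seconds, now].
        cutoff = now - self.window_seconds
        for events in self._events.values():
            while events and events[0][0] < cutoff:
                events.popleft()

    def snapshot(self, now: float) -> Dict[str, Dict[str, float]]:
        """Return count, sum and mean of each source's events inside the window."""
        self._evict(now)
        summary: Dict[str, Dict[str, float]] = {}
        for source, events in self._events.items():
            if events:
                total = sum(value for _, value in events)
                summary[source] = {"count": len(events), "sum": total, "mean": total / len(events)}
        return summary


if __name__ == "__main__":
    view = RecentWindowView(window_seconds=60)
    view.add("sales", 0, 120.0)
    view.add("sales", 30, 80.0)
    view.add("support_tickets", 45, 1.0)
    view.add("sales", 90, 50.0)
    print(view.snapshot(now=100))
```

Evicting from the left of a `deque` keeps each refresh proportional to the number of expired events rather than to the full history.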
Our research [5], [6] suggests that the following are the major areas where current data stream mining techniques and implementations are likely to run into problems.
1. Converging real-time and historical data – If data streams from various sources grow as expected, making sense of the incoming Big Data together with the existing historical data will be a challenge. Architectures that combine real-time and historical data will need to be developed to make efficient use of all the information at hand.
2. Situation-aware data stream mining – Real-time data stream mining that delivers results within seconds using the most recent data will be in demand. Present data stream techniques adapt their models to the characteristics of incoming streams, and such systems will become computationally slow as the volume of data arriving every second increases. New methods should recall models that were built in similar past situations, rather than building a new model for every scenario, in order to analyze the current state faster.
3. Mobile data stream mining – Given that the number of mobile technology users has grown dramatically in the past decade, data stream mining will need to be performed on remote mobile devices. For example, systems should be able to analyze a driver’s behavior and a vehicle’s health from incoming streams in order to prevent accidents and other adverse events. Such systems will face computational and connectivity challenges, so techniques that yield efficient, fast results using minimal resources need to be researched.
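The model-recall idea in item 2 can be sketched as a small repository that stores each trained model together with a “situation signature” summarizing the data it was trained on, and reuses the closest stored model when the current signature is similar enough. In this minimal sketch, the signature (mean and standard deviation of a window), the Euclidean distance, the `max_distance` threshold and the trivial mean-predicting “model” are all hypothetical placeholders for whatever features and learners a real system would use.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Signature = Tuple[float, float]
Model = Callable[[float], float]


def situation_signature(window: Sequence[float]) -> Signature:
    """Summarize a window of readings; (mean, std) is an assumed, simple choice."""
    return (statistics.fmean(window), statistics.pstdev(window))


def train_model(window: Sequence[float]) -> Model:
    """Placeholder learner: predicts the window mean for any input."""
    level = statistics.fmean(window)
    return lambda _x: level


class ModelRepository:
    """Hypothetical store of models indexed by the situation they were built in."""

    def __init__(self, max_distance: float) -> None:
        self.max_distance = max_distance
        self._entries: List[Tuple[Signature, Model]] = []
        self.trained = 0  # how many models had to be built from scratch

    def model_for(self, window: Sequence[float]) -> Model:
        sig = situation_signature(window)
        # Recall: find the stored model whose situation is closest to the current one.
        best = min(self._entries, key=lambda e: math.dist(e[0], sig), default=None)
        if best is not None and math.dist(best[0], sig) <= self.max_distance:
            return best[1]
        # No similar situation seen before: build a new model and remember it.
        model = train_model(window)
        self._entries.append((sig, model))
        self.trained += 1
        return model


if __name__ == "__main__":
    repo = ModelRepository(max_distance=2.0)
    calm = [10.0, 11.0, 9.5, 10.5]
    busy = [50.0, 55.0, 48.0, 52.0]
    calm_again = [10.2, 10.8, 9.9, 10.1]
    for window in (calm, busy, calm_again):
        model = repo.model_for(window)
        print(f"prediction={model(0.0):.2f}, models trained so far={repo.trained}")
```

The third window resembles the first, so its model is recalled instead of retrained. The linear scan over stored signatures is kept for clarity; with many stored models, a spatial index would be the more natural choice.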
References
[1] “http://techcrunch.com/2010/08/04/schmidt-data/”, TechCrunch, Aug 2010
[2] “http://sandbox.macworld.com.au/news/boeing-787s-to-create-half-a-terabyte-of-data-per-flight-says-virgin-atlantic-88897/#.UzWkA8eDpEg”, MacWorld, March 2013
[3] Gartner predicts Business Intelligence, “http://www.gartner.com/newsroom/id/2637615”, Gartner, Dec 2013
[4] Using data stream analysis in brain research, “http://www.ibmbigdatahub.com/blog/using-data-stream-analysis-brain-research-ucla%E2%80%99s-school-medicine”, IBM, March 2014
[5] Advances in Data Stream Mining, “http://www.immagic.com/eLibrary/ARCHIVES/GENERAL/JOURNALS/W120101G.pdf”, Mohamed Medhat Gaber, Feb 2012
[6] Big Data Mining future challenges, “http://albertbifet.com/big-data-mining-future-challenges/”, Albert Bifet, April 2013
[7] Clinical Decision Support Systems, “http://www.philblock.info/hitkb/c/clinical_decision_support_systems_part1.html”, philblock.info