The Future of Multi-Source Data Analysis
We believe that multi-source data analysis will become increasingly pervasive in the future, due to:
- the expansion of networking and connectivity – allowing data collection from more sources
- increasing ease and lower cost of data collection, storage and manipulation
- the increased role of machine learning and automation in decision-making. This is particularly relevant for multi-source data: in many industries, complex decisions that rely on a subjective weighing of evidence from multiple sources are still made by humans rather than computers. For instance, a UX designer weighs digital analytics, market research, and A/B testing results while creating a user-centered design. Increasingly, such arrays of multi-source data, with diverse dimensions and differing analytical requirements, will be combined into a single, easy-to-use decision-making model.
- growing reliance on simulation rather than physical experimentation; simulation frequently draws on multi-source data
- the expansion of Geographic Information Systems, which rely on the analysis of spatial data from multiple sources
- increased use of sensors and sensor algorithms to track multi-source data, generate alarm signals, and make predictions
- across the board, a higher bar for accuracy and a lower tolerance for risk as data analysis capabilities mature, especially in more established data analysis applications
- increased investment in data technologies and methods
As reliance on data grows across activities and users in both business and personal spheres, technology and analytical methods will increasingly be tailored to the rapid, continuous combination and analysis of multi-source data. In particular, we believe that entrepreneurs and new technologies will expand upon the following offerings:
- Multi-source data visualization
- Enterprise-level automation solutions for decision-making based on multi-source data
- Information systems focused on multi-source data integration
- Information systems focused on multi-source information sharing among actors within sectors such as private business, health, research, and government
- A more established set of standardized algorithms for cleaning, integrating, and fusing multi-source data
- Continued and expanded offerings of platforms for large-scale multi-source data integration and storage (e.g., Hadoop)
Throughout the literature on multi-source data analysis, we have encountered common challenges, regardless of analytical application. Some of these challenges are:
- Data integration: combining heterogeneous, dynamic, distributed, large-scale data. Heterogeneity can take many forms: different dimensions, file formats, schemas, and so on.
- Understanding large multi-source data sets: collecting dynamic data from many sensors and satellites can produce an enormous, complex collection that is difficult to understand, and running a model over it can be computationally demanding. Decisions about pruning the data sets or reducing their dimensionality require further investigation.
- Fusion: empirically determining when “early” fusion (combining raw data or features before modeling) or “late” fusion (combining the outputs of per-source models) is appropriate
- Measurement of model performance: models do not always benefit from the addition of data from new sources, so the contribution of each source must be measured
- Data visualization: Difficulty of visualizing heterogeneous, massive data.
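The integration challenge above can be illustrated with a toy sketch. In this hypothetical example (all field names and values are invented for illustration), two sources describe the same entities under different field names, formats, and units, and each is mapped into a common record layout before analysis:

```python
# Hypothetical integration sketch: map two heterogeneous sources
# (CSV-like rows and JSON records) into one shared schema.
import json

# Source 1: CSV-like rows -> (id, temperature in Fahrenheit)
csv_rows = ["s1,98.6", "s2,100.4"]

# Source 2: JSON records -> (sensor_id, temp_c in Celsius)
json_blob = '[{"sensor_id": "s1", "temp_c": 37.0}, {"sensor_id": "s3", "temp_c": 39.0}]'

def from_csv(row):
    """Parse one CSV row and convert Fahrenheit to Celsius."""
    sid, temp_f = row.split(",")
    return {"id": sid, "temp_c": round((float(temp_f) - 32) * 5 / 9, 1)}

def from_json(rec):
    """Rename JSON fields into the common schema."""
    return {"id": rec["sensor_id"], "temp_c": rec["temp_c"]}

# Integrate: group all normalized readings by entity id.
merged = {}
for rec in [from_csv(r) for r in csv_rows] + [from_json(r) for r in json.loads(json_blob)]:
    merged.setdefault(rec["id"], []).append(rec["temp_c"])

print(merged)  # {'s1': [37.0, 37.0], 's2': [38.0], 's3': [39.0]}
```

Even this tiny case shows the recurring work items: field renaming, unit conversion, and entity resolution across sources.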
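The distinction between early and late fusion can also be sketched concretely. The following minimal example (the data, the nearest-centroid model, and the scoring rule are all illustrative assumptions, not a method from the literature) classifies samples described by two sources, once by concatenating features before modeling (early fusion) and once by modeling each source separately and summing the scores (late fusion):

```python
# Hypothetical fusion sketch with a toy nearest-centroid classifier.

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def score(x, c):
    """Negative squared distance to a class centroid (higher = closer)."""
    return -sum((a - b) ** 2 for a, b in zip(x, c))

# Two sources describing the same samples, keyed by class label.
src_a = {0: [[1.0, 1.1], [0.9, 1.0]], 1: [[3.0, 3.1], [3.2, 2.9]]}
src_b = {0: [[10.0], [9.5]],          1: [[20.0], [21.0]]}

# --- Early fusion: concatenate raw features, then model once. ---
early = {k: [a + b for a, b in zip(src_a[k], src_b[k])] for k in (0, 1)}
early_centroids = {k: centroid(v) for k, v in early.items()}

def predict_early(xa, xb):
    x = xa + xb
    return max(early_centroids, key=lambda k: score(x, early_centroids[k]))

# --- Late fusion: model each source, then combine the scores. ---
cent_a = {k: centroid(v) for k, v in src_a.items()}
cent_b = {k: centroid(v) for k, v in src_b.items()}

def predict_late(xa, xb):
    return max(cent_a, key=lambda k: score(xa, cent_a[k]) + score(xb, cent_b[k]))

print(predict_early([1.0, 1.0], [10.0]))  # 0
print(predict_late([3.1, 3.0], [20.5]))   # 1
```

Which strategy wins is exactly the empirical question the challenge names: early fusion can capture cross-source interactions, while late fusion tolerates sources with very different scales and reliabilities.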
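The model-performance challenge can likewise be made concrete with a small ablation sketch. In this invented example (the labels, the per-source predictions, and the majority-vote combiner are all assumptions for illustration), adding a noisy third source lowers accuracy, showing why each source's contribution should be measured:

```python
# Hypothetical ablation sketch: does adding a new source help?

def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

labels = [0, 1, 0, 1, 0, 1]
preds_by_source = {
    "a": [0, 1, 0, 1, 1, 1],  # fairly reliable source
    "b": [0, 1, 0, 0, 0, 1],  # fairly reliable source
    "c": [1, 0, 1, 0, 1, 0],  # noisy source, anti-correlated with labels
}

def vote(sources):
    """Majority vote of per-source binary predictions (ties -> class 0)."""
    cols = zip(*(preds_by_source[s] for s in sources))
    return [1 if sum(col) * 2 > len(sources) else 0 for col in cols]

base = accuracy(vote(["a", "b"]), labels)
with_c = accuracy(vote(["a", "b", "c"]), labels)
print(f"a+b: {base:.2f}, a+b+c: {with_c:.2f}")  # a+b: 0.83, a+b+c: 0.67
```

The same compare-with-and-without pattern scales to real pipelines: evaluate the model on a held-out set before and after integrating each candidate source, and keep the source only if performance improves.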
Conclusions:
While reviewing the research literature, we have observed that commonly used data mining methods are applied to multi-source data analysis in novel, experimental ways. We believe that, beyond prompting advances in technology, growing interest in multi-source data analysis will lead to a generation of new or adapted data mining algorithms.