The Future of Multi-Source Data Analysis

We believe that multi-source data analysis will become increasingly pervasive in the future, due to:

  • the expansion of networking and connectivity – allowing data collection from more sources
  • increasing ease and lower cost of data collection, storage and manipulation
  • increased role of machine learning and automation in decision-making. This is particularly relevant for multi-source data: currently, in certain industries, human beings rather than computers tend to make the complex decisions that rely on a subjective weighing of evidence from multiple sources. For instance, a UX designer will consider digital analytics, market research, and A/B testing while creating a user-centered design. In the future (and even now), this array of multi-source data, with its diverse dimensions and different analytical requirements, will be combined into a single, easy-to-use decision-making model.
  • growing dependence on simulation versus physical experimentation. Simulation frequently relies on multi-source data
  • the expansion of Geographic Information Systems, which rely on the analysis of spatial data from multiple sources
  • increased use of sensors and sensor algorithms to track multi-source data, generate alarm signals, and make predictions
  • across the board, a higher threshold for accuracy and a lower tolerance for risk as data analysis capabilities evolve, especially in more established data analysis applications
  • increased investment in data technologies and methods

As reliance on data grows across activities and users in both business and personal spheres, technology and analytical methods will be tailored increasingly toward the rapid, continuous combination and analysis of multi-source data. In particular, we believe that entrepreneurs and new technologies will expand upon the following offerings:

  • Multi-source data visualization
  • Enterprise-level automation solutions for decision-making based on multi-source data
  • Information systems focused on multi-source data integration
  • An information-systems focus on intra-sector, multi-source information-sharing between actors in sectors such as private business, health, research, and government
  • A more established set of standardized algorithms for cleaning, integrating, and fusing multi-source data
  • Continued and expanded offerings for big multi-source data integration and storage, both in distributed frameworks (e.g. Hadoop) and in the cloud

Throughout the literature on multi-source data analysis, we have encountered common challenges, regardless of analytical application. Some of these challenges are:

  • Data integration: combining heterogeneous, dynamic, distributed, large-scale data. Heterogeneity can be defined in different ways: different dimensions, file formats, etc.
  • Understanding large multi-source data sets: collecting dynamic data from various sensors and satellites can create an enormous, complex collection that is challenging to understand. Running a model over it can also be computationally demanding, and decisions about pruning the data sets or reducing dimensionality require further investigation.
  • Fusion: empirical evidence on when "early" or "late" fusion is appropriate (a minimal sketch of the distinction follows this list)
  • Measurement of model performance: models do not always benefit from the addition of data from new sources.
  • Data visualization: the difficulty of visualizing heterogeneous, massive data.
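To make the "early" versus "late" fusion distinction concrete, here is a minimal sketch. The two feature matrices, the labels, and the use of scikit-learn logistic regression are assumptions made purely for illustration; they do not come from any of the surveyed papers.

```python
# Minimal sketch of "early" vs. "late" fusion of two hypothetical sources.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_sensor = rng.normal(size=(200, 5))      # features from source A (assumed)
X_satellite = rng.normal(size=(200, 8))   # features from source B (assumed)
y = rng.integers(0, 2, size=200)          # binary target (assumed)

# Early fusion: concatenate raw features and train a single model.
X_early = np.hstack([X_sensor, X_satellite])
early_model = LogisticRegression(max_iter=1000).fit(X_early, y)

# Late fusion: train one model per source, then combine their predictions.
model_a = LogisticRegression(max_iter=1000).fit(X_sensor, y)
model_b = LogisticRegression(max_iter=1000).fit(X_satellite, y)
late_scores = 0.5 * (model_a.predict_proba(X_sensor)[:, 1]
                     + model_b.predict_proba(X_satellite)[:, 1])
late_pred = (late_scores > 0.5).astype(int)
```

Which option works better is exactly the kind of empirical question the fusion challenge refers to: early fusion lets the model learn cross-source interactions, while late fusion is more robust when sources arrive at different rates or with different quality.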

Conclusions:

While reading research papers, we have observed that commonly used data mining methods are applied to multi-source data analysis in novel, experimental ways. We believe that, beyond prompting advances in technology, the increasing interest in multi-source data analysis will lead to a generation of new or adapted data mining algorithms.

The Future of Deep Learning


The arc of Deep Learning is an interesting one. One could rightly argue that "deep learning" is a rebranding, reminiscent of the introductory remarks delivered in class about the evolution of terminology from "business intelligence" to "data mining" to "data science." Perhaps that accusation is accurate, but perhaps this evolution of terminology is necessary to convey the effect when incremental changes (such as distributed computing and massive data storage) result in a qualitative transition. Indeed, according to Abdel-rahman Mohamed, a PhD computer scientist trained at the University of Toronto and slated to join IBM Research, "we're now at the intersection of so many things that we didn't have in the past."(1) He is speaking about computer hardware, algorithm sophistication, and data acquisition and storage.

In addition to the changes above, the development of Graphics Processing Unit (GPU) designs has also aided the shift from traditional ANNs to modern deep learning. According to Jürgen Schmidhuber of the Swiss Artificial Intelligence Lab, "GPUs … accelerate learning by a factor of 50."(4)

That concept – that the practical application of deep learning is, indeed, qualitatively different from previous attempts at applying ANN (artificial neural network) technology, in part because of the simultaneous development of "so many things" – serves as the key to understanding the future of deep learning. However, another kernel from the above – Mohamed's departure from the University of Toronto for IBM Research – is also key to understanding the state of deep learning at the present moment.

[Figure: the relationship between basic and applied research, adapted from Pallister '13 (2)]

The figure above, adapted from a publication by the embedded systems company Embecosm, illustrates the relationship between basic research and applied research. Deep learning is quickly making the leap into the domain of applied, commercial research and development. Interestingly, it is often the very same figures who are quite literally making the transition in their careers: in 2013 Yann LeCun, an expert in deep learning at NYU, announced that he would leave his post at New York University to join Facebook. In the 1980s, LeCun was instrumental in the development of the back-propagation neural networks that are directly responsible for what is now called deep learning.(1) It seems fitting, then, that he would take the step into industry in 2013.

LeCun is not the only member of the NYU computer science faculty to make the move to Facebook. Rob Fergus, director of NYU’s M.S. in Data Science program, also announced that he would join the company alongside his colleague.(3)

Taken together with the numerous startup companies mentioned in our second blog post, all vying to commercialize the technology, these moves suggest that the time has come for deep learning to prove whether it will succeed in building profits and providing competitive advantages, or whether this wave of interest will end as another bust in the tumultuous relationship between "artificial intelligence" and private industry.

One can find a glimmer of hope, however, in the possibility that perhaps this time the technology will be deployed to solve those problems – and only those problems – that are peculiarly well-suited for deep learning. Indeed, a 2014 article from Business Insider calls attention to marketing department efforts in “social listening” in the domain of social media images and photos, using deep learning to take a step beyond the kind of straightforward sentiment analysis that can be tackled with bag-of-words representation and information retrieval techniques.(5)

In conclusion, the debate over whether “deep learning” should be kept conceptually distinct from ANN is functionally a moot point. Whatever the technology is called, it is currently experiencing a renaissance as developments across several fronts (algorithms, computational power, data availability, storage capacity/cost, and GPU design) have allowed for quick and accurate unsupervised learning on data types previously inaccessible such as images and videos. The technology has burst from the walls of academia and is, as of 2013 and 2014, in the midst of a transition from academia and basic research to industry and applied research. Finally, the success or failure of deep learning is likely to hinge upon the ability of firms to leverage it in the right spot in their overall analytics strategy. Deep learning is likely to be a strong contender for problems such as “social listening,” a task that fits the problem paradigm nicely due to the highly unstructured nature of inputs and relatively nominal nature of outputs.

 

(1)    Metz, Cade. “60 years later, Facebook heralds a new dawn for artificial intelligence.” Wired. 12/10/2013. http://www.wired.com/2013/12/facebook-deep-learning/

(2)    Pallister, James. "Pasteurized computing: the relationship between academia and industry." Embecosm. 2/1/2013. http://www.embecosm.com/2013/02/01/academia-and-industry/

(3)    Fergus, Rob. “I’m joining Facebook!” (blog post) http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php

(4)    Angelica, Amara. “How bio-inspired deep learning keeps winning competitions.” 11/28/2012. http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions

(5)    Smith, Cooper. “Social Media’s Big Data Future.” Business Insider Australia. 2/8/2014. http://www.businessinsider.com.au/social-medias-big-data-future-from-deep-learning-to-predictive-marketing-2014-2

 

Customer Segmentation: Future and upcoming challenges


Here comes the final episode of the customer segmentation series and we are going to discuss the future and potential problems inherent in customer segmentation.

 

Customer segmentation is now widely used in different business settings, and we would say this area promises a lot. Customer segmentation helps attract revenue and enhances user satisfaction through user-centric services and content, which is a win-win for both businesses and consumers. However, in reality, there are still several obstacles that organizations face and that have to be resolved.

 

The major types of customer segmentation are value-based, behavioral, loyalty-based, need-based, and so on. One of the major challenges for most businesses is to choose the proper type of segmentation to implement and to develop effective business strategies catered to their objectives. If a business chooses an inadequate method, it runs the risk of losing opportunities to generate more revenue. On the other hand, customer segmentation provides specific, centered information or services only to the segmented users, which can mean losing opportunities to explore new markets.

 

Another major hurdle in this exciting field is the cost businesses incur in segmenting their users. Customer segmentation costs can be substantial. If there are too many clusters, which could be a result of overfitting the data, a business needs to design a separate service or workflow for each type of customer, which in turn considerably increases administrative cost. Moreover, the maintenance cost of customer segmentation is also high: organizations have to continuously collect data over time and modify their segmentation strategies in order to respond to upcoming trends in product purchases.
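One way to guard against the "too many clusters" problem is to compare a few candidate segment counts before committing to one. The sketch below is a minimal illustration using synthetic customer features and scikit-learn's silhouette score; the feature names and the loop bounds are assumptions, not a prescribed methodology.

```python
# Compare candidate segment counts so "too many clusters" (overfitting)
# can be spotted before a workflow is designed for each segment.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
customers = rng.normal(size=(500, 4))  # e.g. recency, frequency, monetary value, tenure (assumed)

for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(customers)
    print(k, round(silhouette_score(customers, labels), 3))
# A silhouette score that keeps falling as k grows is one signal that extra
# segments add administrative cost without capturing real structure.
```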

 

Despite some difficulties faced by organizations in real implementations, data science for customer segmentation is, from our perspective, going to be a driving force for most businesses and will become more popular and more prosperous. In the early stages of a business cycle, organizations may have to invest in customer segmentation research and development, but later on they will be able to provide better user experiences to their customers based on the results of that research, which will help them recoup their upfront investment and earn more revenue. It will be a win-win situation for both sides.

 

So are you ready to play with customer segmentation?

 

Future of Sentiment Analysis and Problems faced


Introduction

Sentiment analysis has been more than just a social analytics tool; it has been an interesting field of study in its own right. It is still an active area of research, though progress is limited by the intricacy of the analysis: the field involves tasks that are very complicated for machines to handle. Understanding sarcasm, hyperbole, and positive or negative feelings has been difficult for machines that lack feelings themselves, and algorithms have struggled to predict the feelings people express with more than 60% accuracy. Yet despite these limitations, this is a field that is growing at great pace within many industries. Companies want to bring sentiment analysis tools into customer feedback, marketing, CRM, and e-commerce.

 

Way Ahead


Sentiment analysis methods have so far been used to detect the polarity of the thoughts and opinions of users on social media. Researchers and businesses are very interested in understanding what people think and how they respond to everything happening around them, and companies use this to evaluate their advertising campaigns and to improve their products.


Machine learning has great potential to take over some of the labor-intensive, lexicon-based tasks. For example, building a sentiment lexicon is labor intensive, and unsupervised methods to create lexicons already exist; this is where machine learning will play a crucial role. Such algorithms will also have to understand and analyze natural text concept-wise and context-wise. Time will be another crucial element, given the amount of data being generated on the Web today. Collecting opinions on the web will still require processing that can filter out un-opinionated user-generated content and test the trustworthiness of an opinion and its source.

There is also a lot of scope in analyzing video and images on the web. Nowadays, with the advent of Facebook, Instagram, and video Vines, people express their thoughts with pictures and videos along with text, and sentiment analysis will have to keep pace with this change. Tools that help companies change strategies based on Facebook and Twitter will also have to account for the number of likes and re-tweets a post generates. Many people follow and unfollow others on social media but never comment, so there is scope in analyzing these aspects of the Web as well.


The use of punctuation is another obstacle in sentiment analysis that is under research. Sentiment analysis has started helping us predict events, as in the Obama vs. Romney election, but it is still naïve in most cases. The sentiment analysis tool Tweview predicted the winner of the show X Factor, but that contestant eventually came second. Improving the accuracy of such analyses is therefore an ongoing effort among the many tools available on the web.

As new text types appear on the social Web, the techniques used to pre-process them, as well as to tackle their informal style, must be adapted so as to maintain acceptable levels of performance in sentiment analysis systems. The field will have to combine with affective computing, psychology, and neuroscience to converge on a unified approach to understanding sentiment better.

Roadblocks

Many tools and algorithms rely on the polarity of individual words, and the scoring depends on this polarity. This means that accuracy drops, since the semantics of the complete sentence is lost: scoring polarity word by word makes it difficult to capture the meaning of the sentence as a whole. For example, in "This car is anything but useful", the word "useful" can make the sentence look positive, yet the sentence is negative overall. Limitations like these are hampering progress on the accuracy of the models.
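The example above can be reproduced with a toy word-polarity scorer. The tiny lexicon below is an invented stand-in for a real sentiment resource; the point is only to show how word-level scoring misses sentence-level meaning.

```python
# Toy illustration: word-level polarity scoring loses sentence semantics.
LEXICON = {"useful": 1, "good": 1, "bad": -1, "useless": -1}  # invented mini-lexicon

def naive_polarity(sentence):
    # Sum the polarity of individual words, ignoring their context.
    return sum(LEXICON.get(w.strip(".,!").lower(), 0) for w in sentence.split())

print(naive_polarity("This car is anything but useful"))  # prints 1, i.e. "positive"
# The sentence is actually negative: "anything but" flips the polarity of
# "useful", which a word-by-word score cannot see.
```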


A positive or negative word might mean the complete opposite depending on the context of the sentence, for example "My car is very good at using up the petrol at a faster rate." Sentence ambiguity can also be a problem: some positive or negative words mean nothing in the context of the sentence, while words with no individual polarity sometimes express a lot of sentiment. Sarcasm is the biggest challenge that sentiment analysis faces; machines or algorithms with no emotion find it extremely difficult to detect when users are commenting sarcastically.

The language used across social media also differs. The financial industry has its own language, which differs completely from that of the entertainment industry, making it hard for any tool to predict the emotion or semantics of a sentence. People also use a lot of slang and hashtags, which lowers the accuracy of the algorithms. It can even be difficult for a tool to work out who the object of the sentence is, for example "I feel the browser is working fine but my friend hates working on it".

 

Sustenance

 

Sentiment analysis is not all that smooth after all. There are several issues related to it that could lead to a loss of popularity for the technique.

  • Opinion spam: Sentiment analysis can be exploited by competitors to portray a negative image of a company. Once sentiment analysis gains popularity as a metric to gauge the performance and brand image of a company, such malpractice may become very common, which will in turn reduce the popularity of sentiment analysis.
  • Result measure: The outputs of sentiment analysis are useful as a reactive measure; they cannot be used to predict the performance of a company or other metrics. In some cases, sentiment analysis can be redundant, serving only as a reporting measure after the damage has been done.
  • Lack of complete information and biased results based on the sources: The sources from which information is extracted can be a major roadblock in sentiment analysis. Analyzing a scenario on incomplete information can lead to skewed results. Sources like Twitter and Facebook can be mined for fairly complete information, but other sources like blogs, posts, and forums can be difficult to retrieve information from, which can lead to a biased result set.

Conclusion

Despite all the challenges and potential problems that threaten sentiment analysis, one cannot ignore the value it adds to the industry. Because sentiment analysis bases its results on factors that are so inherently human, it is bound to become one of the major drivers of many business decisions in the future. Improved accuracy and consistency in text mining techniques can help overcome some of the current problems faced in sentiment analysis. Looking ahead, what we can see is a true social democracy created using sentiment analysis, where we can harness the wisdom of the crowd rather than a select few "experts": a democracy where every opinion counts and every sentiment affects decision making.

 

References:

http://www.scoop.it/t/social-media-monitoring-tools-and-solutions 1st picture.

http://www.saama.com/sentiment-analytics-the-gold-mine-which-you-didn-t-mine/ 3rd picture

http://www.brandwatch.com/2013/12/social-data-gets-the-x-factor/ Tweview

www.niemanlab.org/2013/01/feelings-nothing-more-than-feelings-the-measured-rise-of-sentiment-analysis-in-journalism/ 2nd picture


PSM: Future

PSM: So What Next??

So far you have been exposed to the fundamental questions: what is propensity matching, why is it a good choice for causal effect evaluation, how is it used in day-to-day applications, and what are the different techniques and challenges? Now for the NEXT BIG QUESTION: WHAT NEXT? We hope to address this in detail in this post so that you can get a (right) direction to proceed in while employing this technique.


Propensity matching is a fairly young technique, only around 20 years old or so (we are counting in turtle years, by the way). The scope of this technique is therefore growing with time, and every day (figuratively) new applications of PSM are being discovered. Let's explore these applications together.

In one of the earlier blog posts, we analyzed two case studies which employed PSM: one dealing with the effect of coaching/tuition on SAT scores, the other with the outcome evaluation of medical therapy management at a retail pharmacy. Although the origin of PSM is in the healthcare and medical industry, its applications are branching out into education and even economics. This comes primarily from the fact that PSM is an easy and convenient technique to use: easy because it is fairly simple to understand and employ, and convenient because it adheres to moral and ethical principles. And lastly, it is the technique you run to when you need to compare apples with oranges (back to our fruit salad).

Applications of PSM nowadays range widely, from economic questions, such as comparing the performance of firms that received government assistance [1], to making a difference in the social sector by analyzing whether a child being a victim of bullying results in future delinquency [2]. Since the factors that decide whether to provide assistance to a firm are not under the researcher's control, PSM provides a solid platform for comparing the two samples by looking at the after-effects of the treatment. Similarly, whether a person is bullied or not is an unfortunate event which can't be predicted, but the adverse effects of such an event can be analyzed more objectively through the use of PSM. It provides a basis for comparison between a victim and a non-victim so as to understand the extent to which such events alter a person's life.
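For readers who prefer code to prose, here is a minimal sketch of the mechanics of PSM: estimate each unit's probability of receiving the treatment from observed covariates, then pair treated and control units with similar scores and compare their outcomes. The synthetic data, the logistic-regression propensity model, and the greedy one-to-one matching are simplifying assumptions for illustration only, not a recommended research workflow.

```python
# Minimal propensity score matching sketch on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                 # observed covariates (assumed)
p_treat = 1 / (1 + np.exp(1.0 - X[:, 0]))      # treatment is more likely for high X[:, 0]
treated = (rng.random(1000) < p_treat).astype(int)
outcome = 2.0 * treated + X.sum(axis=1) + rng.normal(size=1000)  # true effect = 2

# 1. Propensity scores: P(treated | covariates)
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# 2. Greedy one-to-one nearest-neighbour matching on the score, without replacement
control_pool = list(np.where(treated == 0)[0])
diffs = []
for i in np.where(treated == 1)[0]:
    j = min(control_pool, key=lambda c: abs(ps[i] - ps[c]))
    control_pool.remove(j)
    diffs.append(outcome[i] - outcome[j])

print("Estimated treatment effect on the treated:", round(float(np.mean(diffs)), 2))
```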

PSM : Pros and Cons!!

However, PSM is not a magic trick that can be applied to each and every A/B testing scenario. Just as Mr. Hyde is the counterpart to Dr. Jekyll, PSM has its own dark side, and this dark side lies within the assumptions of PSM. It is important to understand these assumptions and verify whether they are applicable to your individual scenario. One of these assumptions concerns hidden bias.


No matter how many attributes we measure or how much data we collect, there are a few factors that remain unmeasured but still have the power to influence the outcome. The PSM technique assumes that the effect of these factors on the outcome is not significant. This assumption may be untrue in many cases where such factors actually have a significant effect. Consider a case in the healthcare industry: we judge the effectiveness of a drug based on the observable vital stats of a person. It makes sense to employ PSM here, and PSM works well in such scenarios, since most of the factors affecting a person's survival (the drug's positive effect) depend on these vital stats, which can be measured.

Now consider a scenario where we compare the effect of coaching on SAT scores. The confounding variables, such as demographic attributes, aptitude, and IQ, are measurable. However, hidden bias exists in the form of a student's attention span: it is not possible to measure how closely a student pays attention during tuition or while studying. This unmeasured confounder, however, does have a significant effect on the outcome. PSM doesn't handle this type of bias well and hence might give inaccurate comparisons.

Fortunately, these days sensitivity analysis is a growing practice in which you measure the possible effect of such unmeasurable confounders and determine whether this effect is significant enough to change the outcome.


In a nutshell:

  • PSM is a handy technique to employ when you need to compare the outcomes of a non-randomized experiment.
  • PSM is finding applications from the healthcare industry to the economic sector, the social sector, and even the housing industry.
  • It is important to understand the underlying assumptions made while employing this technique, since these assumptions are not always applicable to business scenarios.
  • If biases exist, it is important to gauge the effect of unmeasured confounders on the outcome and then determine whether the PSM results can be safely used.

Key Terminology

Unmeasurable confounders: Factors which exist but which can’t be accurately measured and which have an effect on the outcome of the experiment

Hidden Bias: Effect that unmeasurable confounders have on the outcome


References

[1] Cristian Rotaru, Sezim Dzhumasheva, and Franklin Soriano (2012), "Propensity Score Matching: An Application using the ABS Business Characteristics Survey"

[2] Dr. Jennifer Wong (2013), "Does Bully Victimization Predict Future Delinquency?" URL: http://cjb.sagepub.com/content/40/11/1184.abstract?rss=1

THE FUTURE OF DATA SCIENCE IN FINANCE


 

“Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, and everyone thinks everyone else is doing it, so everyone claims they are doing it.” – Dan Ariely


Data science has become the new buzzword in the industry today, and everyone wants to make use of this power. Gartner recently released a report saying that 64% of companies are deploying data analytics projects, yet 56% struggle to know how to get value from their data.[1] Many universities, including CMU, now have specialized data science courses and degrees, and a great deal of research in this field is being funded by prestigious institutions and governments. The future for data science as such is very bright and fruitful.

Also, from our previous blogs, we have seen that data science has tremendous potential even in the ever-changing industry of finance. The financial services industry has realized this and has started harnessing data in the form of transaction data, real-time market feeds, social media trends, etc. Data science in the financial industry can not only help create a very customer-driven enterprise, but also help optimize risk management, support intelligent decision-making, and streamline operations for any financial institution.

But for any institution to be able to make full use of the potential of data analytics, it is very important to determine the use cases that will generate significant business value.

The areas where the financial industry will want to focus its attention are:

  •  Leveraging Mobile wallet for marketing their services better.
  •  Fraud Detection
  •  Risk management
  •  Customer segmentation and targeting.
  •  Pricing securities and derivatives
  •  Competition analysis

Here are some of the areas where innovation can be applied to make the combination of data science and finance more powerful.

An Ensemble of Sentiment and Scenario Analysis 

Traders today are constantly on the lookout for new insights that would give them an edge on the trading platform. This is where scenario and sentiment analysis could be used effectively. Sentiment analysis thrives on data analyzed from social media and news platforms and plays a vital role in the financial industry, considering the sensitivity of market trends to investor sentiment [1]. The price of a specific stock is usually determined by the speculation surrounding the company, which spreads across the community of investors through platforms like Facebook, Twitter, financial blogs, and RSS news feeds.

Scenario analysis, too, plays a major role in finance, especially when it comes to predicting stock prices using a simulation model. An investor usually inputs specific scenario data based on previous market trends, along with the outcome observed at that time.

However, all possible scenarios are created and input by the analyst based on past trends in the market data, and they are usually specific to financial data alone; no consideration is given to the sentiment of the market at the time of analysis. Sentiment analysis is usually conducted separately, and the judgement of whether a stock should be sold or bought is left to the investor.

What if sentiment-analyzed data were used to select a scenario? For example, Facebook and Twitter data feeds from the time of the recession could be collected, and the recession scenario could be invoked in a future simulation model whenever current sentiment data resembles the sentiment observed at that time. [2] Sentiment analysis, in our opinion, should be a part of scenario analysis so that it can eventually be used in simulation models to determine the risk of a particular trade.
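As a rough sketch of what this could look like, the snippet below switches a simple Monte Carlo return simulation into a "recession-like" regime whenever an aggregated sentiment score resembles the one observed during a past downturn. Every number in it (the regimes, the sentiment threshold, the return parameters) is invented for illustration; this is a thought experiment, not a trading model.

```python
# Rough sketch: let a sentiment signal pick the scenario used in a simulation.
import numpy as np

rng = np.random.default_rng(7)

SCENARIOS = {
    # (mean daily return, daily volatility) -- hypothetical values
    "normal":    (0.0004, 0.010),
    "recession": (-0.0020, 0.030),
}
RECESSION_SENTIMENT = -0.6   # assumed sentiment level seen during the past crisis

def simulate(sentiment_today, days=250, n_paths=10_000):
    regime = "recession" if sentiment_today <= RECESSION_SENTIMENT else "normal"
    mu, sigma = SCENARIOS[regime]
    daily = rng.normal(mu, sigma, size=(n_paths, days))
    return regime, (1 + daily).prod(axis=1)   # simulated one-year growth factors

regime, paths = simulate(sentiment_today=-0.7)
print(regime, "5% worst outcome:", round(float(np.quantile(paths, 0.05)), 3))
```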

Intelligent Trading Models

Decision models that execute trades on their own (with ranges of stock price values predicted)

Today, algorithmic trading has revolutionized the buying and selling of stocks. High-frequency trading has taken trading to an entirely new level, but has also made it riskier. The price ranges used in a trading algorithm are usually entered by the traders themselves after analyzing market trends with financial models. This process could be simplified if the ranges predicted by the financial models were fed directly into the algorithm without human intervention; furthermore, the results of past trades could be fed back into the financial model so that future predicted trades improve. However, such a setup removes the human judgement that at crucial times may be required for major financial decisions. The instinct humans bring to deciding whether to trade a stock is something a model might not be able to replicate.
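A stripped-down sketch of the idea, with a human-override hook for unusually large moves, might look like the following. The prediction function and thresholds are placeholders invented for the example, not a real trading system.

```python
# Sketch: feed a model-predicted price range directly into a trading rule,
# but escalate to a human when the market moves unusually far.
def predicted_range(history):
    # Placeholder "model": a band around the recent average price.
    avg = sum(history[-20:]) / 20
    return 0.98 * avg, 1.02 * avg

def decide(history, last_price, escalate_above=0.05):
    low, high = predicted_range(history)
    move = abs(last_price - history[-1]) / history[-1]
    if move > escalate_above:
        return "escalate"          # large move: keep the human in the loop
    if last_price < low:
        return "buy"
    if last_price > high:
        return "sell"
    return "hold"

prices = [100 + 0.1 * i for i in range(60)]
print(decide(prices, last_price=101.0))  # -> "buy" under these made-up numbers
```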


Challenges:

Most financial firms are jumping on the data analytics bandwagon. However, this does not necessarily mean that they are leveraging tools and data effectively. Here are some barriers to effectively implementing data analytics in financial institutions.

  • Lack of a centralized approach to capture and analyse financial data.
  • Insufficient infrastructure and technologies to capture and handle transactional data and customer data on a massive scale.
  • Leadership does not support the use of data analytics and is skeptical about the impact it could have on their predictions.
  • Dearth of talent to deal with the data and derive meaningful patterns that corroborate evidence for predicting market shifts and financial meltdowns.
  • Defining metrics to measure the role of analytics in transforming the financial sectors.

The future of data science with respect to the financial services industry is moving towards a model that is easy for the average analyst (and company) to use. The goal is to get usable, real-time, easy-to-understand insights using cutting-edge technologies and techniques that overcome the aforementioned challenges [3]. The use of analytics is becoming a necessity in the financial services industry, and using it appropriately will serve as the key differentiator between firms that become successful and firms that fail in the long run.

References:

[1]http://tomfishburne.com/2014/01/big-data.html

[2]http://www.informationweek.com/software/information-management/seven-breakthrough-sentiment-analysis-scenarios/d/d-id/1096156?page_number=1

[3]http://blogs.adobe.com/digitalmarketing/analytics/future-analytics-adobe-summit/

 

Future of Data Visualization


 

In the last post, we focused on the current techniques and challenges of data visualization. As we discussed, there are several new and impressive toolkits and dashboards. However, we did not focus on how the nature of data visualization is changing. This blog post will address that topic by discussing two features of data visualization that have begun to gain traction: recommender engines and the fusion of data visualization with business intelligence tools.

 

Recommender Engines

 

It is well known that companies such as Netflix have gained prominence in recent years through the use of recommender systems. Recommender systems are popular because they simplify decision making by pointing users to something there is a high probability they will like. This same model can be applied to data visualization; for example, a visualization tool might provide recommendations on color schemes based on how you set the type on an infographic (Scheidegger, 2013). Described as an exciting prospect, this sort of product unfortunately does not currently exist in the wild. Recommender systems that use collaborative filtering could predict which features of a visualization will be most impactful to an audience.
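To illustrate the mechanism (not any existing product), here is a minimal user-based collaborative filtering sketch: it predicts how much a designer would like an untried color scheme from the ratings of similar users. The rating matrix is invented for the example.

```python
# Minimal user-based collaborative filtering over ratings of colour schemes.
import numpy as np

# rows = users, columns = colour schemes; 0 means "not rated yet" (invented data)
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 4, 4],
], dtype=float)

def predict(user, item):
    target = R[user]
    score, weight = 0.0, 0.0
    for other in range(R.shape[0]):
        if other == user or R[other, item] == 0:
            continue
        mask = (target > 0) & (R[other] > 0)      # items both users rated
        if not mask.any():
            continue
        sim = np.dot(target[mask], R[other][mask]) / (
            np.linalg.norm(target[mask]) * np.linalg.norm(R[other][mask]))
        score += sim * R[other, item]
        weight += sim
    return score / weight if weight else 0.0

print(round(predict(user=1, item=2), 2))  # predicted rating for an unseen scheme
```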

 

The Fusion of Data Visualization with Business Intelligence Tools

 

These days you will be hard pressed to find a company that does not consider business intelligence tools a necessity. Like data visualization, BI provides the ability to leverage and explore data, and several companies are therefore fusing the two together in order to make powerful visuals easily available for consumption and manipulation. For example, SAS promotes its product Visual Analytics with the tagline "Better analytics. Faster insights. Built for everyone." (http://www.sas.com/en_us/software/business-intelligence/visual-analytics.html#). To make the product even more appealing, Visual Analytics is deployable onsite, in the private cloud, and in the public cloud. Similarly, Tableau, a well-known provider of BI-driven data visualization, has launched Tableau Online in order to pursue the same market (http://www.tableausoftware.com/products/online).

 

Upcoming Problems with Data Visualization & Aspects in Decline

 

While it is certainly an exciting time to be in data visualization, there are a few problems to bear in mind. For starters, several of the available products have complicated pricing schemes that inhibit mass adoption. Luckily, Tableau Online has simplified its model to $500 per user per year. Complications in the field do not end with pricing: JavaScript visualization tools still need to become simpler to use in order to reach a mass audience. Finally, it is always worth repeating that visualizations can still be used to deceive audiences. For example, a recent chart featured on Business Insider caused a commotion because of its inverted y-axis:

A misleading chart

(http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2)

 

It is very difficult to interpret the above graph, and a quick glance might lead one to conclude the opposite of what is true: gun deaths in Florida have been on the rise since 2005’s ‘Stand Your Ground’ law.

Our current obsession with mobile devices has also influenced data visualization: visualizations were previously restricted in the mediums on which they could be viewed. Luckily, this limitation is in decline; HTML5/JavaScript and design principles such as "Responsive Web Design" have increased the accessibility of visualizations on all devices. Major visualization providers such as SAS have also done their part to ensure that visuals can be accessed on the go.

 

In conclusion, data visualization tools have begun to make exciting advancements and seem to be on the cusp of simplifying the creation of powerful visualizations. While much of this technology is currently targeted at corporate users, it does not seem unreasonable that these leaps in data visualization will eventually reach everyday consumers. Soon anyone will be able to create a powerful visual to present for any sort of reason, and accessibility means that these visuals will be interpretable regardless of the device and programs available.

 

Works Cited

Booker, Ellis. “How Data Visualization Experts See the Future – Information Week.” Information Week. UBM Tech, 9 Sept. 2013. Web. 18 Apr. 2014.

Engel, Pamela. “This Chart Shows An Alarming Rise In Florida Gun Deaths After ‘Stand Your Ground’ Was Enacted.” Business Insider. Business Insider, Inc, 18 Feb. 2014. Web. 18 Apr. 2014.

Scheidegger, Carlos. “The Future of Data Visualization Tools.” Visually Blog The Future of Data Visualization Tools Comments. Visual.ly, 11 Mar. 2013. Web. 18 Apr. 2014.

FUTURE OF RECOMMENDATION SYSTEMS



In our previous blogs we discussed in detail the different technologies used in recommendation systems and their current applications. In this blog we will focus on trends in the field of recommendation systems and possible domains where they can be used, and we will finally touch upon the challenges that next-generation recommendation systems will face.

New Directions

There are several new domains and avenues that recommendation systems could be put to use in. Described below are some of these areas.

Local Businesses:

A potential area where recommendation systems would be beneficial is local business. Recommendation systems could serve as a bridge between businesses such as hotels, tourist spots, and eateries and their customers. In recent times, with the advent of smartphones, users rely heavily on apps such as Yelp and TripAdvisor when deciding which place to visit. This can be leveraged, and the data from such companies could be used to provide recommendations to customers. For example, a recommendation system could suggest which restaurant a customer should visit based on whether the cuisine is vegetarian, the calorie content, the cooking style, and so on.
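A small content-based filtering sketch makes the idea concrete: score each restaurant against a diner's preference profile over attributes such as vegetarian options, calorie level, and spiciness. The restaurants, attributes, and weights below are invented for illustration; a real system would derive the profile from Yelp- or TripAdvisor-style data.

```python
# Content-based recommendation: rank restaurants by how well their
# attributes match a diner's stated preferences (all values invented).
RESTAURANTS = {
    "Green Bowl":   {"vegetarian": 1.0, "low_calorie": 0.8, "spicy": 0.2},
    "Smoke House":  {"vegetarian": 0.1, "low_calorie": 0.2, "spicy": 0.6},
    "Curry Corner": {"vegetarian": 0.7, "low_calorie": 0.4, "spicy": 0.9},
}

def recommend(preferences, top_n=2):
    # Score = dot product of preference weights with each restaurant's attributes.
    scored = {
        name: sum(preferences.get(k, 0) * v for k, v in attrs.items())
        for name, attrs in RESTAURANTS.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

print(recommend({"vegetarian": 1.0, "low_calorie": 0.5, "spicy": 0.1}))
```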

Configuration Systems:

Configuration is a design activity which involves composing a target product from a set of predefined components. [2] In such systems, usability is always a concern. Recommendation systems could be used to help users select the features that are relevant to them. For instance, not all users may be interested in the GPS feature of a camera; recommendations could be used to filter out features that are unnecessary from the user's point of view.

Near Real-Time Item-Based Deals:

Conventionally, the shopping-related recommendations provided to customers are static. In the future, recommender systems could be used to provide almost real-time deal recommendations and targeted ads.

It is possible for credit card companies to provide information regarding discounts and deals in nearby shops based on the customer's purchasing behavior. [1] For instance, if a customer purchased a winter coat, she could be provided with information on the discounts available for boots almost immediately.

These real time deal recommendations would be extremely valuable for both customers and businesses, and will eventually facilitate creating personalized malls for customers.

Currently, billboards in subway stations display ads chosen according to the time of day and the group of people present in the area. These customizations could leverage the real-time nature of interactive marketing, and with this data recommendation systems could decide which ad to place on a billboard depending on who is nearby. This makes way for a more flexible and centralized advertisement-broadcasting setup that delivers a very specific ad to a connected billboard depending on the people near it. Such possibilities are also fueled by the boom of the Internet of Things.

Persuasive Software Development:

Development teams work in high pressure conditions and are faced with the challenge of presenting deliverables on time as well as adhering to quality guidelines. Quality can be tied to the understandability of the software. Recommendation systems can be used to inform users about critical sections in the code as well as measures to increase software quality [2].

Pribik et al. introduced such an environment which has been implemented as an Eclipse plugin (www.eclipse.org).[3]

Smart Homes :

Recommendation systems can be used to improve the quality of life at home and to provide a better experience overall. For instance , these systems could suggest the desired setting for the heating ,raise alerts when an unsafe situation is detected, optimizing energy usage etc. This idea could be extended beyond homes to schools and workplaces as well.

POTENTIAL ROADBLOCKS

As the importance of recommendation systems grows, there are third party vendors whose primary focus is to create and sell recommendation systems to clients. The challenges that these companies are faced with have been explained below.

Commoditization:

Vast amounts of information pertaining to recommender systems are freely available and easily accessible. In addition, there are relatively few patents in this domain. Hence, developing a recommender system in-house is not an insurmountable task today, and most companies would prefer to do so. From the perspective of third parties whose sole product is recommendation systems, obtaining a client contract will therefore be challenging. This threat can be negated to some extent if recommendation system providers bring unique value and provide relevant and contextual recommendations.

Limited Control:

Recommendation systems provided by third-party vendors usually make use of proprietary technology that cannot be easily altered. As a result, potential clients feel that they have limited control over these systems and the algorithms used. Keeping this in mind, it is important for recommendation system providers to keep clients in the loop. For instance, if the client is an online retailer, giving the client the option to alter the algorithms or select a custom algorithm would provide them with more control and would ultimately result in increased revenues.

Privacy:

As we have seen from our previous blogs, the more data recommendation systems have access to the better they perform. This brings in several considerations.

Firstly, there are legal factors that come into the picture. For instance, the Data Protection Directive that the European Union follows dictates that organizations need to gain user consent to process their data. [1] Such guidelines make data collection for recommendation systems considerably more challenging.

In addition to this, there is the customer’s perspective. Customers get bothered when sensitive information is used and when their data is shared with third party recommendation system vendors. They view this as over personalization.

MOVING FORWARD

Until recently, recommendation systems were widely used only in developed nations. Currently there is a huge demand for these systems in developing nations as well, which can be attributed to two reasons: e-commerce in these countries is only now beginning to leave its mark, and these nations are in the middle of a smartphone boom. Cisco predicts that by 2020, 50 billion devices will be connected to the internet [4]; these devices will bring in more real-time data that may influence the way recommendation systems work. Both of these factors provide unique opportunities that recommendation systems can leverage [1].

Considering the wide range of domains that recommendation systems can be used in as well as the demand for them throughout the world, we believe that the importance of these systems will continue to grow. As we have highlighted by means of our previous blogs, the field of recommendation systems is a dynamic and complex one. It will be interesting to see what the future holds!

REFERENCES

[1] Third-party Recommendation Systems Industry: Current Trends and Future Directions – Amit Sharma

[2] Toward the Next Generation of Recommender Systems: Applications and Research Challenges – Preprint version of paper published in: Multimedia Services in Intelligent Environments: Recommendation Services, Springer, 8767:81-98, 2013, see: www.springer.com.

[3] Pribik, I., Felfernig, A.: Towards Persuasive Technology for Software Development Environments: An Empirical Study. In: Persuasive Technology Conference (Persuasive 2012). pp. 227-238 (2012)

[4] https://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf

The Future of Crowdsourcing


As we mentioned in our previous discussion, the idea of crowdsourcing has been adopted by more and more organizations. As the "crowd", we too have become familiar with the idea and more involved in contributing our resources, whether by volunteering or for rewards. We may wonder: what is the future of crowdsourcing?

 

We introduced crowdsourcing in the scope of outsourcing, and we believe the essence of crowdsourcing is the same. The reason crowdsourcing has become a buzzword is that it also introduces open innovation. Based on our exploration, we believe crowdsourcing will evolve along two dimensions in the future. Horizontally, the business process of crowdsourcing will be segmented into sub-tasks, and for every sub-task professional organizations will occupy the market. Vertically, crowdsourcing will be applied to different fields of business, which means the required inputs and outputs of a crowdsourcing task will be strictly defined and the overall quality of the final output will improve.

 

Horizontal Evolvement:

The following life cycle can be applied to a typical crowdsourcing task.

1. An issue is identified;

2. Potential solutions to the problem are developed;

3. The best solution is further developed through micro-tasks by the crowd;

4. The crowd funds the final solution;

5. The solution is implemented using the crowd as a delivery mechanism;

6. The final solution is validated through social media sentiment analysis.

 


 

Vertical Evolvement:

Crowdsourcing will be introduced into businesses that require different levels of expertise:

1. Unconscious Behavior: where the outsourcing tasks are done unconsciously as part of other tasks.

2. Common Knowledge: where tasks require basic knowledge, like voting and labeling.

3. Required Satisfaction: where requirements are specified and only tasks meeting the requirements can get the benefit.

4. Professional and Experts: where tasks require domain knowledge and usually the tasks are presented in a contest format.

 


 

Upcoming Problems:

However, as the idea of crowdsourcing blooms, problems also occur. Crowdsourcing's resource is the crowd itself, which brings with it the crowd's unavoidable characteristics: the need for motivation, unpredictable respondents, uncontrollable quality, and limited professional resources.

1. Need for motivation: Though implicit crowdsourcing offers a model for collecting users' output without their knowing, creating such a model is difficult, and it is usually a by-product of another application. Usually, companies prefer to motivate target users with wages or prizes. Yet reports say most crowd workers receive minimal wages, and prizes are awarded to only a few, sometimes falling short of crowd workers' expectations.

2. Unpredictable respondents and uncontrollable quality: to reach the crowd at scale, organizations usually lack the resources for organizational management and quality control. Therefore, organizations looking for predictable and stable outcomes would rather try in-house or outsourced development. Crowdsourcing may be a good choice for organizations that are looking for innovative solutions or are less concerned about output quality.

3. Limited professional resources: crowdsourcing agencies that require professional resources face diminishing returns as their numbers grow. Crowdfunding websites like Kickstarter are an example: today individuals may be able to find large amounts of funding, but as more agencies enter the market and more people realize it is a good way to seek initial funding, the funding sources will be split into smaller streams. The same can happen with contests requiring skills such as machine learning, since the talent pool is limited.

 

Reference:

[1]: The Future of Crowdsourcing, Dustin Haisler, https://medium.com/p/67ee31b88b5b

[2]: The Future of Crowd Work, Aniket Kittur, Jeffrey V. Nickerson, Michael S. Bernstein, http://hci.stanford.edu/publications/2013/CrowdWork/futureofcrowdwork-cscw2013

 

The Future of Fraud Detection


By Rohan Nanda, Nicholas Garcia, & Alejandra Caro Rincon

Advances in technology give criminals increasingly powerful tools to commit fraud, especially using credit cards or internet bots. To combat the evolving face of fraud, researchers are developing increasingly sophisticated tools, with algorithms and data structures capable of handling large-scale complex data analysis and storage.

 

[Image: credit card fraud trends. Source: Merchant911 [1]]

 

The most popular area of current fraud detection research has been credit cards, but we see online bots and ad-click fraud as growing concerns for the future. With the rapid reduction in the cost of computing power, publishers can exploit vulnerabilities by creating bots that click on ads to generate more revenue.

[Figure source: Phua, Clifton, et al. "A comprehensive survey of data mining-based fraud detection research." arXiv preprint arXiv:1009.6119 (2010).]

Credit-card Fraud Detection

Banks typically implement a single fraud detection and prevention system that tries to capture fraudulent transactions based on a model generalized to all their customers. This network model incorporates general fraud trends from different products across the bank. However, this approach is ineffective in the long run, as such models are too broad to catch ever more sophisticated forms of fraud. Credit card associations are therefore combining network models with custom models to build comprehensive systems that detect fraud at the point of sale. For instance, MasterCard implements the following approach:

[Figure source: MasterCard Fraud Analytics]

 

With a diverse set of data mining and neural network analysis techniques, and over 100 parameters to evaluate, MasterCard’s Expert Monitoring system aids issuer banks in detecting fraud within minutes of the transaction[2].

[Figure source: MasterCard Fraud Analytics]

Custom models, or targeted modeling, enhance the accuracy of fraud detection by pulling in customer-specific data points [2]. In the future, this technique will be standardized across all card associations and banks. Nonetheless, this approach is difficult because of customers' privacy concerns about their data, so the challenge credit companies must master is implementing such a system without spooking the customer. A second challenge is the timeliness of detection: customers want their transactions approved in seconds, not minutes. To address this, better machine learning algorithms are needed to flag fraudulent transactions in real time. Standardized techniques are desirable across industries, but they must account for user heterogeneity and security preferences, and models have to be constantly updated in order to detect and learn emerging fraudulent behaviors.

 

Other challenges in fraud detection systems include but are not limited to:

  • Imbalanced data distribution: the number of fraudulent transactions is much smaller than the number of legitimate ones. Models trained on such data have historically not performed well; bootstrapping and other resampling techniques are used to counter this, in effect 'conning' the model into thinking it has more data to work with [3] (see the sketch after this list)
  • Non-stationary data: with a continuous stream of transactions arriving, models have to be retrained often. This problem is compounded by imbalanced class distributions [3]
  • Non-availability of public data: due to the sensitive nature of the topic, datasets are often not available to effectively evaluate existing methods of fraud detection [3]
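As promised above, here is a brief sketch of the resampling idea from the imbalanced-data bullet: randomly oversample the minority (fraud) class before training so the classifier sees a balanced problem. The synthetic transactions and plain random oversampling are assumptions for illustration; production systems use more refined resampling and validation.

```python
# Sketch: random oversampling of the fraud class before training a classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 6))                 # transaction features (assumed)
y = (rng.random(5000) < 0.01).astype(int)      # ~1% fraudulent transactions

fraud = np.where(y == 1)[0]
legit = np.where(y == 0)[0]
extra = rng.choice(fraud, size=len(legit) - len(fraud), replace=True)  # resample fraud rows
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("fraction of transactions flagged:", round(float(model.predict(X).mean()), 3))
```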

 

Online Ad Click Fraud Detection

In most of our entries we have focused on fraud detection in the financial industry. In this entry we also want to mention alternative emerging fraud behaviors, how they harm some businesses, and the strategies the industry uses to detect them.

 

Online click fraud is the act of clicking on advertisements without a specific interest in the product. The practice is usually performed by software in a systematic way, increasing the marketing expenses of the business offering the product. It also harms the credibility of the advertising companies and the online advertising industry as a whole. Click Forensics estimates that fraudulent clicks correspond to about 19% of all clicks on ads. [4]

 

In order to identify fraudulent clicks, several machine learning techniques are being developed. For instance, detecting duplicate clicks over decaying windows is an important technique for this task. These models discard expired information according to the number of objects collected or the activity within a certain period of time, over which the analysis is performed. Some of the most common algorithms are based on Bloom filters, a data structure for testing whether an element belongs to a set. The particular characteristic of this approach is that these probabilistic data structures don't allow false negatives, thus avoiding classifying fraudulent duplicate clicks as legitimate.
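A compact Bloom filter sketch shows how this works for flagging repeated click identifiers within a window: a "not seen" answer is always correct (no false negatives), while a "seen" answer can occasionally be a false positive. The filter size, number of hash functions, and click-ID format below are arbitrary assumptions for the example.

```python
# Compact Bloom filter for flagging repeated click IDs within a window.
import hashlib

class BloomFilter:
    def __init__(self, size=10_000, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def seen(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))

window = BloomFilter()
for click_id in ["ad42:user7", "ad42:user9", "ad42:user7"]:
    if window.seen(click_id):
        print("possible duplicate click:", click_id)
    window.add(click_id)
```

A decaying window can then be approximated by rotating filters: periodically discard the oldest filter and start a fresh one, so expired clicks stop counting as duplicates.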

 

Yet, beyond the technical approach to this problem, it is important to note the important role of regulation around the world and the challenges it poses. The heterogeneity of regulatory frameworks across countries poses great challenges for many industries in detecting fraud. For instance, in countries where electronic privacy laws are very strict, it is harder to gather data, detect fraudulent patterns, and thus track and identify fraudsters. To learn more about the specific tools being implemented to combat fraud, please see our Overview of the Industry blog post here.

Works Cited

[1]Merchant911. (n.d.). Credit Card Fraud Trends. Retrieved April 17, 2014, from http://www.merchant911.org/fraud-trends.html

[2]MasterCard. (n.d.). How the Past Changes the Future of Fraud. Retrieved April 17, 2014, from http://www.mastercard.com/us/company/en/docs/Modeling_white_paper.pdf

[3] Pozzolo, A. D. (n.d.). Learned lessons in credit card fraud detection from a practitioner perspective. Retrieved April 18, 2014, from http://www.sciencedirect.com/science/article/pii/S095741741400089X

[4] http://searchengineland.com/click-fraud-q42010-62471