The arc of Deep Learning is an interesting one. One could rightly accuse “deep learning” of being a rebranding, reminiscent of the introductory remarks delivered in class about the evolution of terminology from “business intelligence” to “data mining” to “data science.” Perhaps that accusation is accurate, but perhaps this evolution of terminology is necessary to convey what happens when incremental changes (such as distributed computing and massive data storage) produce a qualitative transition. Indeed, according to Abdel-rahman Mohamed, a PhD computer scientist trained at the University of Toronto and slated to join IBM Research, “we’re now at the intersection of so many things that we didn’t have in the past.”(1) He is speaking about computer hardware, algorithm sophistication, and data acquisition and storage.
In addition to the changes above, the development of Graphics Processing Unit (GPU) designs has also aided the shift from traditional ANNs to modern deep learning. According to Jürgen Schmidhuber of the Swiss Artificial Intelligence Lab, “GPUs … accelerate learning by a factor of 50.”(4)
That concept – that the practical application of deep learning is indeed qualitatively different from previous attempts at applying artificial neural network (ANN) technology, in part because of the simultaneous development of “so many things” – serves as the key to understanding the future of deep learning. However, another kernel from the above – Mohamed’s departure from the University of Toronto for IBM Research – is key to understanding the state of deep learning in the present moment.
[Figure: the relationship between basic research and applied research (Pallister ’13)(2)]
The figure above, adapted from a publication by the embedded systems company Embecosm, illustrates the relationship between basic research and applied research. Deep learning is quickly making the leap into the domain of applied, commercial research and development. Interestingly, it is often the very same figures who are quite literally making this transition in their own careers: in 2013, Yann LeCun, an expert in deep learning at New York University, announced that he would leave his post there to join Facebook. In the 1980s, LeCun was instrumental in developing the back-propagation-trained neural networks that are directly responsible for what is now called deep learning.(1) It seems fitting, then, that he would take the step into industry in 2013.
LeCun is not the only member of the NYU computer science faculty to make the move to Facebook. Rob Fergus, director of NYU’s M.S. in Data Science program, also announced that he would join the company alongside his colleague.(3)
Taken together with the numerous startup companies mentioned in our second blog post, all vying to commercialize the technology, these moves suggest that the time has come for deep learning to prove whether it can build profits and provide competitive advantages, or whether this wave of interest will end as another bust in the tumultuous relationship between “artificial intelligence” and private industry.
One can find a glimmer of hope, however, in the possibility that this time the technology will be deployed to solve those problems – and only those problems – that are peculiarly well suited to deep learning. Indeed, a 2014 article from Business Insider calls attention to marketing departments’ efforts at “social listening” in the domain of social media images and photos, using deep learning to take a step beyond the kind of straightforward sentiment analysis that can be tackled with bag-of-words representations and information retrieval techniques.(5)
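To make the contrast concrete, here is a minimal sketch of the kind of bag-of-words sentiment analysis described above. The tiny lexicons and example sentences are purely illustrative (a real system would use a curated sentiment lexicon or a trained classifier); the point is that the representation discards word order and works only on text – precisely the limitation that deep learning on images sidesteps.

```python
from collections import Counter

# Toy sentiment lexicons, purely for illustration.
POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "awful"}

def bag_of_words(text):
    """Represent a document as unordered word counts (order is discarded)."""
    return Counter(text.lower().split())

def sentiment_score(text):
    """Count positive minus negative lexicon hits; the sign gives polarity."""
    counts = bag_of_words(text)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg

print(sentiment_score("I love this great phone"))  # 2 (positive)
print(sentiment_score("terrible battery and bad screen"))  # -2 (negative)
```

Because the representation is just a multiset of words, it cannot see anything in a photo shared alongside the text – which is why image-based “social listening” calls for a different class of model.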
In conclusion, the debate over whether “deep learning” should be kept conceptually distinct from ANNs is functionally moot. Whatever the technology is called, it is currently experiencing a renaissance as developments across several fronts (algorithms, computational power, data availability, storage capacity and cost, and GPU design) have allowed for quick and accurate unsupervised learning on previously inaccessible data types such as images and video. The technology has burst from the walls of academia and is, as of 2013 and 2014, in the midst of a transition from academia and basic research to industry and applied research. Finally, the success or failure of deep learning is likely to hinge on the ability of firms to deploy it in the right spot in their overall analytics strategy. Deep learning is likely to be a strong contender for problems such as “social listening,” a task that fits the problem paradigm nicely due to the highly unstructured nature of its inputs and the relatively nominal nature of its outputs.
(1) Metz, Cade. “60 years later, Facebook heralds a new dawn for artificial intelligence.” Wired. 12/10/2013. http://www.wired.com/2013/12/facebook-deep-learning/
(2) Pallister, James. “Pasteurized computing: the relationship between academia and industry.” EMBecosm. 2/1/2013. http://www.embecosm.com/2013/02/01/academia-and-industry/
(3) Fergus, Rob. “I’m joining Facebook!” (blog post) http://cs.nyu.edu/~fergus/pmwiki/pmwiki.php
(4) Angelica, Amara. “How bio-inspired deep learning keeps winning competitions.” KurzweilAI.net. 11/28/2012. http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions
(5) Smith, Cooper. “Social Media’s Big Data Future.” Business Insider Australia. 2/8/2014. http://www.businessinsider.com.au/social-medias-big-data-future-from-deep-learning-to-predictive-marketing-2014-2