Web scraping is a technique for fetching data from the web and delivering it to users, usually on a routine basis. Any business that consumes data can potentially use web scraping to significantly reduce the manual labor of browsing the web. Government agencies, educational institutions, industry players and entrepreneurs can all make use of web scraping. Google is an exemplar of building web scraping into a business model: Google constantly scrapes information from other websites and stores the relevant information in its database. When end users type a search string into Google, the search engine returns a list of websites based on Google’s prior scraping effort. Likewise, Kayak finds the lowest flight fare by scraping ticket information from a number of online flight ticket vendors. [1]
[2]
Web scraping can be used to solve a wide range of problems, as long as users are able to derive structured information from websites. [3] For instance, real estate companies can apply this technology to compile property listings from other real estate websites. Websites that serve price-minded customers can gather current deals and coupon codes from other online stores. University career offices can automatically collect tailored lists of internship and job openings from websites such as Glassdoor.com, Indeed.com and company career websites. [4] Web scraping is therefore a flexible tool that can tackle various tasks for different users.
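The following sketch shows what "deriving structured information" can look like in practice. It assumes a hypothetical job board whose listings are marked up as `<li class="job">` elements containing `title`, `company` and `location` spans. Python's standard-library `html.parser` walks the markup and collects each listing into a dictionary. The markup, class names and sample page are invented for illustration. A real site would need its own extraction rules.

```python
from html.parser import HTMLParser


class JobListingParser(HTMLParser):
    """Event-driven parser for HYPOTHETICAL job-board markup of the form
    <li class="job"><span class="title">..</span><span class="company">..</span>
    <span class="location">..</span></li>. Real sites need their own rules."""

    FIELDS = ("title", "company", "location")

    def __init__(self):
        super().__init__()
        self.jobs = []        # completed records, one dict per listing
        self._current = None  # record being built; None outside a listing
        self._field = None    # field whose text is being read; None otherwise

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "job" in classes:
            self._current = {}
        elif tag == "span" and self._current is not None:
            # Remember which field (if any) this span holds.
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        # Accumulate text only while inside a recognised field span.
        if self._current is not None and self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None and self._field is not None:
            self._current[self._field] = self._current.get(self._field, "").strip()
            self._field = None
        elif tag == "li" and self._current is not None:
            self.jobs.append(self._current)
            self._current = None
            self._field = None


def parse_jobs(html):
    """Return the job records found in an HTML string as a list of dicts."""
    parser = JobListingParser()
    parser.feed(html)
    parser.close()
    return parser.jobs


# Usage on an inline sample page (invented markup, no network access needed).
SAMPLE = """
<ul>
  <li class="job"><span class="title">Data Intern</span>
      <span class="company">Acme Corp</span><span class="location">Boston, MA</span></li>
  <li class="job"><span class="title">Junior Analyst</span>
      <span class="company">Example Inc</span><span class="location">Remote</span></li>
</ul>
"""

for job in parse_jobs(SAMPLE):
    print(job)
```

An event-driven parser like this handles each tag as it streams past and never builds a full tree. That keeps memory use low, but the extraction state (`_current`, `_field`) has to be tracked by hand.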
The most rudimentary technique in web scraping is simple copy and paste. However, web scraping can also be automated with programming languages and tools such as Excel VBA, R, Python and Perl. Common methods include DOM parsing and the use of HTML parsers. Instead of hiring programmers to write code that “hacks” other organizations’ websites, organizations can also adopt ready-made web scraping software. Available tools include commercial products such as Mozenda and Import.io, and open-source tools such as DeiXTO and Scrapy. [5] Users can choose a web scraping method based on the problems they need to tackle, as well as organizational factors such as budget constraints and security policies.
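Automating the copy-and-paste step has two parts: downloading the page and parsing it. The sketch below shows DOM parsing with Python's standard library, applied to a price-comparison task like the Kayak example. `xml.etree.ElementTree` builds a tree of the whole page, and the code then queries that tree for offer rows. It is a minimal sketch under stated assumptions:

- The `fetch` helper, its User-Agent string and the `offer`/`vendor`/`price` markup are hypothetical.
- `ElementTree` only accepts well-formed XHTML, so real-world pages usually call for a lenient HTML parser such as lxml or Beautiful Soup.

The example parses an inline string, so it runs without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=10):
    """Download a page as text: the automated stand-in for copy and paste.
    Hypothetical helper; not called below, so the example runs offline."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def cheapest_offer(xhtml):
    """Build a DOM-style tree from well-formed XHTML and return (vendor, price)
    for the lowest price, or None. Assumes HYPOTHETICAL rows of the form
    <tr class="offer"><td class="vendor">..</td><td class="price">$..</td></tr>
    and no xmlns declaration (a namespace would change the tag names)."""
    root = ET.fromstring(xhtml.strip())
    offers = []
    # ElementTree's limited XPath: every <tr> at any depth with class="offer".
    for row in root.findall(".//tr[@class='offer']"):
        vendor = row.findtext("td[@class='vendor']", default="").strip()
        price_text = row.findtext("td[@class='price']", default="").strip()
        try:
            price = float(price_text.lstrip("$").replace(",", ""))
        except ValueError:
            continue  # skip rows whose price cell is missing or not numeric
        offers.append((vendor, price))
    return min(offers, key=lambda offer: offer[1]) if offers else None


# Usage on an inline sample page (invented vendors and prices).
SAMPLE = """
<html><body><table>
  <tr class="offer"><td class="vendor">Vendor A</td><td class="price">$412.00</td></tr>
  <tr class="offer"><td class="vendor">Vendor B</td><td class="price">$389.50</td></tr>
  <tr class="offer"><td class="vendor">Vendor C</td><td class="price">$1,020.00</td></tr>
</table></body></html>
"""

print(cheapest_offer(SAMPLE))  # ('Vendor B', 389.5)
```

Because the tree is built before any query runs, the same parsed page can be queried several times (for vendors, prices, links and so on) without re-reading the markup. The trade-off is holding the whole document in memory.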
[6]
[1] https://www.udemy.com/learn-web-scraping-in-minutes/
[2] http://promptcloud.com/web-scraping-companies.php
[3] http://www.scrapegoat.com/faqs.php
[4] http://deixto.blogspot.com/2012/03/uses-and-applications-of-web-scraping.html
[5] http://www.kdnuggets.com/software/web-content-mining.html
[6] http://online.wsj.com/news/articles/SB10001424052748703358504575544381288117888