BusinessSearch Project: Workflow and Description
Objective of the BusinessSearch Project:
The main goal of this project is to collect business profiles automatically and to develop a system that can update each profile on a quarterly or half-yearly basis.
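Here is a minimal sketch of the update side. It assumes each stored profile records the date it was last refreshed, so the scheduler only has to decide whether a quarterly or half-yearly interval has elapsed. The names `REFRESH_MONTHS`, `add_months`, and `is_due_for_refresh` are hypothetical and do not come from the project.

```python
import calendar
from datetime import date

# Refresh intervals in months. The post only says profiles should be updated
# on a quarterly or half-yearly basis; these policy names are hypothetical.
REFRESH_MONTHS = {"quarterly": 3, "half-yearly": 6}


def add_months(d: date, months: int) -> date:
    """Move a date forward by whole months, clamping the day (e.g. 2024-01-31 -> 2024-02-29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]  # number of days in the target month
    return date(year, month, min(d.day, last_day))


def is_due_for_refresh(last_updated: date, today: date, policy: str = "quarterly") -> bool:
    """True once the profile's refresh interval has elapsed since its last update."""
    return today >= add_months(last_updated, REFRESH_MONTHS[policy])
```

A daily job could run `is_due_for_refresh` over all stored profiles and re-crawl only the hosts it flags.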
Flow Description:
The target of our project is to collect as many business profiles as possible. However, a business profile depends entirely on the host name: given a single host name, we can build a profile from its home page, "Contact Us" page, and "About Us" page. We do not yet have enough host names to extract the business profiles this project needs, so our main target is to collect as many host names as possible and convert each one into a complete business profile.
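One way to get from a host name to its profile pages is to fetch the home page and pick out the links that look like "Contact Us" or "About Us" pages. The sketch below makes several assumptions: the home page HTML has already been downloaded, the site is reachable over plain `http://`, and matching the keywords `contact` and `about` in the link text or URL is good enough. `ProfileLinkFinder` and `profile_pages` are illustrative names only. The code uses only Python's standard `html.parser` and `urllib.parse`.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# Assumed keywords for "Contact Us" / "About Us" links; real sites vary widely.
KEYWORDS = ("contact", "about")


class ProfileLinkFinder(HTMLParser):
    """Collect links on a home page whose URL or text looks like a contact/about page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self._href: Optional[str] = None  # href of the <a> tag currently open
        self._text: List[str] = []        # text seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a":
            return
        if self._href:
            label = (self._href + " " + "".join(self._text)).lower()
            if any(keyword in label for keyword in KEYWORDS):
                url = urljoin(self.base_url, self._href)  # resolve relative links
                if url not in self.links:
                    self.links.append(url)
        self._href = None


def profile_pages(host: str, home_html: str) -> List[str]:
    """Home page URL followed by any contact/about page URLs found on it."""
    base = f"http://{host}/"
    finder = ProfileLinkFinder(base)
    finder.feed(home_html)
    finder.close()
    return [base] + finder.links
```

The keyword matching is deliberately simple. Sites in other languages, or sites with image-only navigation, would need more cues.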
Currently available attributes for a business profile:
· Fax
· Email
· Website
· Zip code
· Address
· Phone no
· GPS location
· Contact person
· Contact person’s designation
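The attributes above can be held in a simple record. This dataclass is a hypothetical layout, and its field names and types are assumptions. The list fields allow for several phone, fax, or email values per business, and `filled_attributes` reports which attributes were actually extracted for a given host.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BusinessProfile:
    """Hypothetical record for the currently available business profile attributes."""
    website: str
    fax: List[str] = field(default_factory=list)
    email: List[str] = field(default_factory=list)
    phone: List[str] = field(default_factory=list)
    zip_code: Optional[str] = None
    address: Optional[str] = None
    gps: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    contact_person: Optional[str] = None
    contact_designation: Optional[str] = None

    def filled_attributes(self) -> List[str]:
        """Names of attributes that hold a non-empty value, in field order."""
        return [name for name, value in asdict(self).items() if value]
```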
Upcoming (future) attributes:
· Business type
· Business Description
· Owner/ Proprietor/ Director
· Establishment date
· Registration Date
· Logo
Challenges (web crawler):
· Robustness
· Mirror sites
· Hashing
· Unexpected bugs
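For the mirror and hashing challenges, a common first step is to fingerprint each page's content and skip any page whose fingerprint has already been seen under another URL. This is a minimal sketch, and it assumes that pages which are identical after crude tag stripping and whitespace/case normalisation count as mirrors. `content_fingerprint` and `MirrorDetector` are hypothetical names.

```python
import hashlib
import re
from typing import Dict, Optional


def content_fingerprint(html: str) -> str:
    """SHA-256 of the page text after crude tag stripping and whitespace/case normalisation."""
    text = re.sub(r"<[^>]+>", " ", html)             # drop tags (assumption: crude is enough)
    text = re.sub(r"\s+", " ", text).strip().lower()  # ignore spacing and case differences
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class MirrorDetector:
    """Remember the first URL seen for each content fingerprint."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # fingerprint -> first URL with that content

    def check(self, url: str, html: str) -> Optional[str]:
        """Return the URL this page mirrors, or None if its content is new."""
        fp = content_fingerprint(html)
        if fp in self._seen:
            return self._seen[fp]
        self._seen[fp] = url
        return None
```

An exact hash only catches pages that are identical after normalisation. Near-duplicate mirrors (for example pages that differ only in ads, timestamps, or session IDs) would need a similarity hash such as SimHash or MinHash instead.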
Work done so far in BusinessSearch (approximate extraction accuracy per attribute):
· Fax: 80-90%
· Phone: 80-90%
· Zip code: 70-80%
· Website: 85-90%
· Email: 85-90%
· Address: 45-50%
· Company name: 45-50%
· Contact person: 50-60%
· Contact designation: 60-70%
· GPS coordinates: 60-70% (to be revisited later with the support of iSearch/Lucene)
· Branch: 70-80%
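The post does not say how these attributes are extracted. As an illustration only, assuming pattern matching on the page text, a regular expression can find email addresses, and the label that precedes a number can tell phone numbers apart from fax numbers. The patterns below are loose assumptions rather than the project's rules, and `extract_contacts` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Loose email pattern (assumption): local part, "@", domain with a 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# A number preceded by a "Fax" or "Tel/Telephone/Phone" label (assumption):
# optional "+", then digits, spaces, hyphens, or parentheses, ending in a digit.
LABELLED_NUMBER_RE = re.compile(
    r"\b(fax|telephone|tel|phone)\b\s*[:.\-]?\s*(\+?\d[\d\s\-()]{5,}\d)",
    re.IGNORECASE,
)


def extract_contacts(text: str) -> Dict[str, List[str]]:
    """Split labelled numbers into phone/fax and collect unique email addresses."""
    result: Dict[str, List[str]] = {
        "email": sorted(set(EMAIL_RE.findall(text))),
        "phone": [],
        "fax": [],
    }
    for label, number in LABELLED_NUMBER_RE.findall(text):
        key = "fax" if label.lower() == "fax" else "phone"
        result[key].append(re.sub(r"\s+", " ", number.strip()))  # normalise spacing
    return result
```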
Yet to develop:
· Business type (machine learning)
· Business description (snippets)
· Owner/Proprietor/Director (natural language processing)
· Establishment date (parsing)
· Registration date (parsing)
· Logo (parsing + tricks + crawling)
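Establishment date is listed as a parsing task. A minimal heuristic might look like the sketch below. It assumes the year appears within a few words after a cue such as "established", "estd", "founded", or "since". The cue list, the 30-character window, and the accepted year range (1800 to 2099) are all assumptions, and `establishment_year` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed cue words that tend to precede an establishment year.
CUE_RE = re.compile(r"\b(established|estd|founded|since)\b", re.IGNORECASE)
# Four-digit years from 1800 to 2099 (assumed plausible range).
YEAR_RE = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def establishment_year(text: str, window: int = 30) -> Optional[int]:
    """Return the first plausible year found shortly after a cue word, else None."""
    for cue in CUE_RE.finditer(text):
        following = text[cue.end(): cue.end() + window]  # look only a few words ahead
        year = YEAR_RE.search(following)
        if year:
            return int(year.group(1))
    return None
```

The short window keeps common words like "since" from matching an unrelated year later in the sentence. A full date parser would be needed to recover the day and month as well.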