Spam Free Googling Technique to Retrieve Optimized and Multilingual Results, Translation and Speech

This paper describes a system that filters spam links and spam results out of the result set that google.com returns for a query. The system can also return results in multiple languages. Google's servers already cover almost all of the world's languages, so a translator for multiple languages can be built on top of them. A synthesizer can then convert the multilingual text to speech in the common MP3 format. The reverse process is also possible: speech spoken in any of several languages can supply the source text for translation.

 I. INTRODUCTION

Google search is the traditional way of surfing the web, but users spend a lot of time wading through advertisement links that are of no use to a normal user. An advanced user can get exact results, but a new user of Google is often visibly frustrated.

Another issue arises when a user wishes to search in a language other than English. This is possible, because Google's servers cover almost all languages and Google runs country-specific domains, for example google.co.in for India, google.co.uk for the UK, and google.ru for Russia. However, there is still no facility to translate from one language to another.

There is no tool that accepts multilingual text input and translates it into another language, nor a tool for speech synthesis that converts the translated text into an MP3 file so that the user can learn how to pronounce it.

II. TOOLS AND TECHNOLOGIES

We therefore decided to work on new techniques and algorithms and to explore new possibilities for surfing the web and extracting knowledge from the internet, which is a flood of information.

Information is valuable only when it is precise and relevant to our context. For the user's convenience, we implemented and introduced algorithms that can:

1. Eliminate the spam links
2. Eliminate the advertisement links
3. Reorder the links by relevance
(Here, relevance means precision with respect to the given context.)

The modules proposed for the system are all refined by our backend to give the user cleaner and more accurate results. Information gathering itself is done by Google, which searches for the information; we built our approach on top of that.

III. The Search Filter

To obtain optimized results from Google, we follow three steps. The first step is to filter the query.

The second step is to filter the advertisement links out of the result set returned by the Google server. The third and last step is to rank and rearrange the results in the result set.
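The first two steps can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: the result format (dictionaries with a `url` key and an optional `is_ad` flag), the function names, and the list of blocked hosts are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical hosts treated as advertisement or spam sources (assumption for this sketch).
BLOCKED_HOSTS = {"ads.example.com", "spam.example.net"}


def normalize_query(query):
    """Step 1: filter the query by collapsing whitespace and lowercasing it."""
    return " ".join(query.split()).lower()


def is_advertisement(result):
    """Return True if a result is flagged as an ad or comes from a blocked host."""
    host = urlparse(result["url"]).netloc.lower()
    return result.get("is_ad", False) or host in BLOCKED_HOSTS


def filter_results(results):
    """Step 2: drop advertisement and spam links, keeping the original order."""
    return [r for r in results if not is_advertisement(r)]
```

Keeping the ad test in its own function means a different rule, such as a URL pattern check, could replace it without touching the filtering loop.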

 

Ranking can be done with ranker algorithms or with the repetition count algorithm.
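The paper does not define the repetition count algorithm in detail. One plausible reading, shown here purely as an assumption, is to count how often the query terms occur in each result's title and snippet and to sort the results by that count. The `title` and `snippet` keys and both function names are hypothetical.

```python
import re


def repetition_count(query, text):
    """Count how often the query terms occur in the text (case-insensitive)."""
    terms = set(re.findall(r"\w+", query.lower()))
    words = re.findall(r"\w+", text.lower())
    return sum(1 for w in words if w in terms)


def rank_results(query, results):
    """Order results by descending repetition count; ties keep their original order."""
    return sorted(
        results,
        key=lambda r: repetition_count(query, r["title"] + " " + r["snippet"]),
        reverse=True,
    )
```

Because Python's sort is stable, results with equal counts stay in the order in which the search engine returned them.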

IV. The Search Context

The search can target any type of object, including multimedia objects, for example:

  • Image
  • Web
  • Videos
  • Contact
  • Patent
  • News
  • Blogs

Consider a user who needs to search for images for editing and presentation work. The basic problem is that the user does not get what he searches for: a search for HD images also returns icon images, which are of no use. A user-centric approach is therefore provided in the form of options.

The idea is to show only the results the user is interested in and to eliminate all the rest.

The strategy is as follows:

Size:

If the user wants to search on the basis of file size:
If size = huge then
Populate the huge images
Else if size = large then
Populate the large images
Else if size = small then
Show the low-resolution images
Else if size = icon then
Populate the icon images
End if

Color:

If the user asks for color images in the search results, return color images.
If image query = color then
Populate the color image results
Else if image query = gray scale then
Populate the gray scale image results
End if
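The size and color strategies above can be combined into a single filter. The sketch below assumes that each image is a dictionary with a `size` label (`huge`, `large`, `small`, or `icon`) and a `color` label (`color` or `grayscale`); this data layout and the function name are illustrative and not taken from the system.

```python
def filter_images(images, size=None, color=None):
    """Keep only images matching the requested size class and color mode.

    A value of None means the user expressed no preference for that option.
    """
    selected = []
    for image in images:
        if size is not None and image["size"] != size:
            continue  # wrong size class, e.g. an icon when "huge" was requested
        if color is not None and image["color"] != color:
            continue  # wrong color mode
        selected.append(image)
    return selected
```

For example, `filter_images(images, size="huge", color="color")` would return only the huge color images and eliminate everything else.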

IV. The Blog Search Algorithm

Sometimes you need information with a personal touch that no professional source provides. So why not search for information written by ordinary people just like us? All we need to do is search the blogs of ordinary users, who establish their identity on the net by sharing information that is useful and free of cost. It is free because it is not produced by professional groups or companies.

The benefit is that you do not need to pay a single penny to obtain the information.

Sometimes blog owners also help quickly, since they are keen to increase their page views.
We get a quick and instant source of information.

The strategy is as follows:
The information is searched for using the HTTP header. Search for blogs all over the internet, then collect the results in a collection object. The collection is then iterated:

If all elements of the collection are visited then
STOP
Else
Repeat the above process
End if
Get all the results and then
Pass them to the next module, print()

Print means passing the results to the front end and displaying those links on the user’s web page, along with a short summary or description of the topic.

If all the results are passed to the front end then
Stop
Else
Repeat the same process until all the collection elements are visited
End if
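The collect-and-iterate strategy can be sketched as below. To keep the example self-contained, the actual HTTP search is replaced by a hypothetical `fetch` callable that returns (url, text) pairs; `search_blogs` and `summarize` are illustrative names, not part of the described system.

```python
def summarize(text, max_words=12):
    """Build a short description from the first few words of a post."""
    words = text.split()
    summary = " ".join(words[:max_words])
    return summary + ("..." if len(words) > max_words else "")


def search_blogs(query, fetch):
    """Collect blog results for a query and prepare them for the front end.

    `fetch` is a hypothetical callable returning an iterable of
    (url, text) pairs; a real system would issue HTTP requests here.
    """
    collection = list(fetch(query))
    output = []
    for url, text in collection:  # the loop stops once every element is visited
        output.append({"link": url, "summary": summarize(text)})
    return output
```

The returned list plays the role of the data handed to print(): each entry carries the link and the short summary shown on the user's web page.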

V. Translation

As shown in the diagram above, the user asks the translator to translate the source text into the target language. The query first goes to the regional server, which connects to the web.

Through the web, the query is sent to the indexed dictionary of languages. The grammar is analyzed in this phase.

Google provides the patterns and grammar of the target language. Then, according to the DFA tree built for the source language, sentences are constructed in the target language. In this way the text is translated sentence by sentence into the target language.
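The sentence-by-sentence flow can be illustrated with a toy example. The sketch below only shows the outer loop: a word-for-word lookup in a small hand-made dictionary stands in for the grammar analysis and DFA-based sentence construction, which it does not attempt to reproduce. The dictionary contents and all function names are assumptions.

```python
import re

# Toy English-to-Spanish dictionary (assumption for this sketch).
TOY_DICTIONARY = {"hello": "hola", "world": "mundo", "friend": "amigo"}


def split_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_sentence(sentence, dictionary):
    """Word-for-word stand-in for the grammar-aware step; unknown words pass through."""
    def lookup(match):
        word = match.group(0)
        return dictionary.get(word.lower(), word)

    return re.sub(r"\w+", lookup, sentence)


def translate_text(text, dictionary):
    """Translate the text one sentence at a time and join the results."""
    return " ".join(translate_sentence(s, dictionary) for s in split_sentences(text))
```

A real translator would replace `translate_sentence` with a step that reorders and inflects words according to the target grammar; the surrounding loop would stay the same.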

VI. Speech to Text

In computer science, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition". Some SR systems use training, where an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called "Speaker Independent" systems. Systems that use training are called "Speaker Dependent" systems.

Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call home"), call routing, domotic appliance control, search (e.g. finding a podcast where particular words were spoken), simple data entry, speech-to-text processing (e.g. word processors or emails), and aircraft.

 

Performance:

The performance of speech recognition systems is usually evaluated in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).

However, speech recognition by a machine is a very complex problem. Vocalizations vary in terms of accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed. Speech is distorted by background noise, echoes, and electrical characteristics. Accuracy of speech recognition varies with the following:
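As a worked illustration of these measures, WER is commonly computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. Speed is commonly expressed as a real time factor, that is, processing time divided by audio duration. The function names below are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1 means faster than real time."""
    return processing_seconds / audio_seconds
```

For the reference "call home now" and the hypothesis "call phone now" there is one substitution in three reference words, so the WER is 1/3.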

  • Vocabulary size and confusability
  • Speaker dependence vs. independence
  • Isolated, discontinuous, or continuous speech
  • Task and language constraints
  • Read vs. spontaneous speech
  • Adverse conditions

Neural Network Classification

Once a machine has digitized the basic sound blocks, it has a set of numbers that describe a wave, and waves describe words. Each frame holds a unit block of sound, which is broken into basic sound waves and represented by numbers after a Fourier transform; these numbers can be statistically evaluated to decide which class of sounds the frame belongs to.
The nodes in the figure on the slide each represent a feature of a sound. A feature of the wave is passed from the first layer of nodes to a second layer of nodes based on some statistical analysis, which depends on the programmer's instructions. The second layer of nodes then represents higher-level features of the sound input, which are again statistically evaluated to see which class they belong to.

The last level of nodes should be output nodes that tell us, with high probability, what the original sound really was.
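The layered evaluation described above can be sketched in a few lines. The example is a toy under stated assumptions: a frame is turned into Fourier magnitudes, passed through one hidden layer, and mapped to class probabilities by output nodes. The weights are supplied by hand rather than trained, and all function names are illustrative.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the discrete Fourier transform for the first half of the spectrum."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2)
    ]


def layer(inputs, weights, biases):
    """One fully connected layer with a sigmoid activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(frame, hidden_w, hidden_b, out_w, out_b):
    """Frame -> spectrum features -> hidden layer -> class probabilities."""
    features = dft_magnitudes(frame)
    hidden = layer(features, hidden_w, hidden_b)
    scores = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(out_w, out_b)]
    return softmax(scores)
```

The class with the highest probability in the returned list corresponds to the output node that most likely matches the original sound. In a practical system the weights would be learned from labeled recordings instead of being set by hand.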

VII. Text to Speech

Speech synthesis
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.

Qualities

The most important qualities of a speech synthesis system are naturalness and intelligibility. Naturalness describes how closely the output sounds like human speech, while intelligibility is the ease with which the output is understood. The ideal speech synthesizer is both natural and intelligible. Speech synthesis systems usually try to maximize both characteristics.

The two primary technologies for generating synthetic speech waveforms are concatenative synthesis and formant synthesis. Each technology has strengths and weaknesses, and the intended uses of a synthesis system will typically determine which approach is used.

VIII. Conclusion

Finally, Google search will no longer be just the traditional way of surfing the web.

Users will not need to spend a lot of time searching for the desired result, and non-advanced and new users can also get exact results.

Another benefit is that a user who wishes to translate from one language to another can do so whenever he wants.

The user will also be able to convert speech into the source text and receive MP3 audio in the target language as a reply.

About Authors:

Neve Jitesh R.
Department of I.T., AVCOE, Sangamner
Email: jiteshneve@gmail.com
Patil Ekata I.
Department of I.T., AVCOE, Sangamner
Email: ekatapatil18@gmail.com
Chaudhari Suresh F.
Department of I.T., AVCOE, Sangamner
Email: Chaudharisuresh01@gmail.com
Shimpi Pankaj S.
Department of I.T., AVCOE, Sangamner
Email: pankajshimpi83@gmail.com

 








