Dev:APIyacysearch


Search Interface

The main YaCy search page is at /index.html, but all search results are presented on the result page at /yacysearch.html. Besides rendering the visual results, that page can also serve technical search requests with XML or JSON output. The search query follows the SRU (Search/Retrieve via URL) schema, and the XML result format follows the OpenSearch schema, which is an extension of RSS. YaCy search results are therefore readable with any RSS reader out of the box. To get this output format, click the 'RSS' box in the upper right of the search result page, or replace 'html' with 'rss' in the result URL.

An example path for a search query is

/yacysearch.rss?query=freedom&resource=global&urlmaskfilter=.*&prefermaskfilter=&nav=all

The API can also return JSON using

/yacysearch.json?query=freedom&resource=global&urlmaskfilter=.*&prefermaskfilter=&nav=all

For example, try http://localhost:8090/yacysearch.rss?query=freedom&resource=global&urlmaskfilter=.*&prefermaskfilter=&nav=all or http://localhost:8090/yacysearch.json?query=freedom&resource=global&urlmaskfilter=.*&prefermaskfilter=&nav=all
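
The request paths above can also be assembled programmatically, so that parameter values are URL-encoded correctly. The following is a minimal Python sketch, not part of YaCy itself; the function name build_search_url is hypothetical, and the peer address is the localhost example used above.

```python
from urllib.parse import urlencode


def build_search_url(base, query, fmt="rss", **params):
    """Return a yacysearch URL for the given peer.

    base   -- peer address, e.g. "http://localhost:8090" (assumption: your peer)
    fmt    -- output format taken from the file extension: "rss", "json" or "html"
    params -- any further GET parameters, passed through unchanged
    """
    fields = {"query": query}
    fields.update(params)
    # urlencode percent-encodes reserved characters such as '*' in regular expressions
    return f"{base.rstrip('/')}/yacysearch.{fmt}?{urlencode(fields)}"


url = build_search_url(
    "http://localhost:8090",
    "freedom",
    resource="global",
    urlmaskfilter=".*",
    prefermaskfilter="",
    nav="all",
)
print(url)
```

Note that the encoded form writes the regular expression .* as .%2A; both spellings denote the same parameter value.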

GET Parameters for HTTP Requests

The GET parameter names are based on SRU, with some additional ones for YaCy-specific features.

query = [URL encoded search string]

if the search contains the keyword /date, YaCy sorts the search results by date
if the search consists of two search terms and the keyword NEAR, YaCy raises the ranking of results in which the two terms appear close together
the keyword LANGUAGE:lang selects the desired language, e.g. LANGUAGE:en
see also En:SearchParameters. Special keywords are:

  • /date
  • /heuristic
  • /language/
  • /location
  • /near
  • /radius/
  • /vocabulary/
  • inlink:
  • inurl:
  • tld:
startRecord = [number] of the first record to return, e.g. for maximumRecords=10 and startRecord=11 YaCy returns results 11-20
maximumRecords = [number] of items YaCy should return, e.g. maximumRecords=10; queries without authentication are limited to 10 results
contentdom = [ app|audio|ctrl|image|text|video ]
resource = local
  • global: ask all peers in network for results
  • local: ask only peer queried
urlmaskfilter = RegExp to limit the search to matching URLs
prefermaskfilter = RegExp for preferring matching results after the search
verify = cacheonly
  • true: YaCy verifies the result URLs and returns a snippet
  • false: can be used to speed up the search
  • iffresh: use the cache if a cache entry exists and is fresh according to the proxy-fresh rules
  • ifexist: use the cache if a cache entry exists, without checking freshness; otherwise use the online source
  • cacheonly: never go online, use only content from the cache; if no cache entry exists, consider the content available nevertheless
lr = desired language e.g. lr=lang_en
meancount = number of maximum alternative queries for 'did you mean?'
nav =
  • all: show all navigators
  • none: no navigators
display = (obsolete?)
  • 0
  • 1
  • 2
Enter =search
  • Search
  • Search again
  • ?
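
To illustrate how the paging parameters combine with a query modifier, here is a small Python sketch. It is an illustration under assumptions, not YaCy code: the function name page_params is hypothetical, and the offset follows the convention stated in the startRecord description (startRecord=11 with maximumRecords=10 gives results 11-20). Since the sample request further below uses startRecord=0, the offset base should be checked against your own peer.

```python
from urllib.parse import urlencode


def page_params(query, page, per_page=10, **extra):
    """Build the GET parameters for one result page.

    Assumption: pages are numbered from 1 and startRecord counts from 1,
    as in the startRecord description (startRecord=11 -> results 11-20).
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    params = {
        "query": query,
        "startRecord": (page - 1) * per_page + 1,
        "maximumRecords": per_page,
    }
    params.update(extra)
    return params


# Second page of a date-sorted search, asking only the queried peer
params = page_params("freedom /date", page=2, resource="local", verify="false")
query_string = urlencode(params)
print(query_string)
```

The space and the slash of the /date keyword are encoded by urlencode, so the keyword can be written in the query string exactly as a user would type it.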

Result

Calling yacysearch.rss returns an XML-formatted list of search results, together with some additional information about the search, that may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='/yacysearch.xsl' version='1.0'?>
<rss version="2.0"
	xmlns:yacy="http://www.yacy.net/"
	xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:media="http://search.yahoo.com/mrss/"
    xmlns:atom="http://www.w3.org/2005/Atom">
	<!-- YaCy Search Engine; http://yacy.net -->
	<channel>
		<title>dulcedoSearch</title>
		<description>Search for yacy</description>
		<link>http://localhost:8000/yacysearch.html?query=yacy&resource=global&contentdom=text&verify=true</link>
		<image>
			<url>http://localhost:8090/env/grafics/yacy.gif</url>
			<title>Search for yacy</title>
			<link>http://localhost:8090/yacysearch.html?query=yacy&resource=global&contentdom=text&verify=true</link>
		</image>
		<opensearch:totalResults>145.650</opensearch:totalResults>
		<opensearch:startIndex>0</opensearch:startIndex>
		<opensearch:itemsPerPage>10</opensearch:itemsPerPage>
		<atom:link rel="related" href="opensearchdescription.xml" type="application/opensearchdescription+xml"/>
		<opensearch:Query role="request" searchTerms="yacy" />

Search results are stored as 'item' child elements of the RSS feed's channel:

<item>
 <title>YaCyWeb.de - unzensierte Suchmaschine - uncensored search engine - YaCy (not YaCi)</title> 
 <link>http://yacyweb.de/</link>
 <description>YaCyWeb.de - unzensierte Suchmaschine - uncensored search engine - <b>YaCy</b> (not YaCi)</description> 
 <pubDate>Wed, 14 Oct 2009 02:00:00 +0200</pubDate> 
 <yacy:size>11667</yacy:size>
 <yacy:sizename>11 kbyte</yacy:sizename>
 <yacy:host>yacyweb.de</yacy:host>
 <yacy:path>/</yacy:path>
 <yacy:file>/</yacy:file>
 <guid isPermaLink="false">g_L5h78wvMRA</guid>
</item>

The YaCy RSS result has many more attributes, which also reflect the search facets; these are not part of the OpenSearch specification. The JSON result is adapted from the RSS result and contains the same information, just in JSON syntax.
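
As an illustration of reading the item elements, here is a minimal Python sketch using the standard library XML parser. The embedded sample is a shortened, hand-made version of the result shown above, and the function name parse_items and the chosen output fields are hypothetical. The yacy namespace URI is the one declared in the sample feed.

```python
import xml.etree.ElementTree as ET

# Namespace prefix taken from the xmlns declaration of the sample feed
NS = {"yacy": "http://www.yacy.net/"}

# Shortened sample, modelled on the result shown above
SAMPLE = """<rss version="2.0" xmlns:yacy="http://www.yacy.net/">
  <channel>
    <title>Search for yacy</title>
    <item>
      <title>YaCyWeb.de - uncensored search engine</title>
      <link>http://yacyweb.de/</link>
      <description>uncensored search engine</description>
      <pubDate>Wed, 14 Oct 2009 02:00:00 +0200</pubDate>
      <yacy:size>11667</yacy:size>
      <yacy:host>yacyweb.de</yacy:host>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return one dict per <item> element of the feed's channel."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
            # YaCy-specific elements live in the yacy namespace
            "host": item.findtext("yacy:host", default="", namespaces=NS),
            "size": int(item.findtext("yacy:size", default="0", namespaces=NS)),
        })
    return items


items = parse_items(SAMPLE)
for entry in items:
    print(entry["link"], entry["size"])
```

Unlike an array-based conversion, this approach returns a list even when the feed contains only a single item.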

Retrieve Search Results with PHP

This is an example using a YaCy PHP API from YaCyAPI4:

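// using YaCyapi.php; $search is assumed to be an already created search object
$search->query($searchWeb)
       ->setSources('Web')
       ->setFormat('xml')              // no semicolon here, the call chain continues
       ->setOptions('startRecord=21');
$results = $search->search();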

or native PHP:

  // open connection to peer
  $YaCyURL="http://mypeer.tld:8090/";
  $cu=$YaCyURL."yacysearch.rss";   // lowercase path, as in the request examples above
  $cu=$cu."?query=yacy";
  $cu=$cu."&maximumRecords=10";
  $cu=$cu."&startRecord=21";
  $queryServer = curl_init($cu);
  curl_setopt($queryServer, CURLOPT_HEADER, 0);
  curl_setopt($queryServer, CURLOPT_RETURNTRANSFER, 1);
  // $appID holds the credentials and must be set beforehand
  curl_setopt($queryServer, CURLOPT_USERPWD,$appID);
  $results = curl_exec($queryServer);
  curl_close($queryServer);

and continue with the results:

  // now we have the XML, put it in a simple array
  // xml2array() is not a PHP built-in; it is assumed to be a helper provided elsewhere
  $resultarray=xml2array($results);  #, $get_attributes = 1, $priority = 'tag');
  // get items
  $items=$resultarray['rss']['channel']['item'];
  if ($items)
  {
   foreach ($items as $item)
   {
    echo "<p><a href=\"".$item['link']."\">".$item['title']."</a>";
    echo "<br>".$item['description']."</p>";
   }
  } else {
    echo "no results";
  }

The complete native PHP example below shows results 21 to 30 of a query for 'yacy':

<?php
  //open connection to peer    
  $YaCyURL="http://mypeer.tld:8090/";  
  $cu=$YaCyURL."yacysearch.rss";
  $cu=$cu."?query=yacy";
  $cu=$cu."&maximumRecords=10";
  $cu=$cu."&startRecord=21";
  $queryServer = curl_init($cu);     
  curl_setopt($queryServer, CURLOPT_HEADER, 0);
  curl_setopt($queryServer, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($queryServer, CURLOPT_USERPWD,$appID);
  $results = curl_exec($queryServer);
  curl_close($queryServer);  
  //now we have xml/json, put it in a simple array
  $resultarray=xml2array($results); 
  //item childgroup 
  $items=$resultarray['rss']['channel']['item'];
  if ($items)
  {
   foreach ($items as $item)
   {   
    echo "<a href=".$item['link'].">".$item['title']."</a>";
   }
  } else {
    echo "no results";
  }
?>
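
For comparison, the same request can be prepared in Python with the standard library only. This is a hedged sketch rather than an official client: the function names build_request and fetch are hypothetical, the credentials "user:secret" are placeholders, and HTTP Basic authentication is assumed because that is what CURLOPT_USERPWD sends by default; a peer configured for a different authentication scheme would need a different handler.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def build_request(peer, query, start, count, credentials=None):
    """Prepare a yacysearch.rss request; credentials is 'user:password' or None."""
    params = urlencode({
        "query": query,
        "maximumRecords": count,
        "startRecord": start,
    })
    request = urllib.request.Request(f"{peer.rstrip('/')}/yacysearch.rss?{params}")
    if credentials:
        # Assumption: the peer accepts HTTP Basic authentication
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch(request, timeout=10):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Same parameters as the PHP example: 10 results starting at record 21
request = build_request("http://mypeer.tld:8090/", "yacy", 21, 10, "user:secret")
print(request.full_url)
# body = fetch(request)  # uncomment to contact a real peer
```

Separating request construction from the network call keeps the URL and header logic checkable without a running peer.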

Another example of an enhanced search: show 5 results beginning at record 50 for 'yacy', limit the results to PNG images, and prefer results from yacy.net. To speed up the search a bit, verify is set to false; all peers should be asked, so 'resource' is 'global'.

http://localhost:8090/yacysearch.rss?query=yacy&contentdom=text&maximumRecords=5&startRecord=50&verify=false&resource=global&urlmaskfilter=png&prefermaskfilter=yacy.net

Example

/yacysearch.rss?display=0&query=test&Enter=Search&former=&verify=true&contentdom=text&maximumRecords=10&startRecord=0&resource=global&urlmaskfilter=.*&prefermaskfilter=&indexof=off

<rss version="2.0"
  xmlns:yacy="http://www.yacy.net/"
  xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>YaCy P2P-Search for test</title>
    <description>Search for test</description>
    <link>http://localhost:8090/yacysearch.html?query=test&resource=global&contentdom=text&verify=true</link>
    <image>
      <url>http://localhost:8090/env/grafics/yacy.gif</url>
      <title>Search for test</title>
      <link>http://localhost:8090/yacysearch.html?query=test&resource=global&contentdom=text&verify=true</link>
    </image>
    <opensearch:totalResults>0</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>5</opensearch:itemsPerPage>
    <atom:link rel="related" href="opensearchdescription.xml" type="application/opensearchdescription+xml"/>
    <opensearch:Query role="request" searchTerms="test" />
    <item>
      <title>The Jam - Live On The Old Grey Whistle Test</title>
      <link>http://www.youtube.com/watch?v=Hc2ZIQ-xp7o&feature=related</link>
      <description></description>
      <pubDate>Tue, 16 Sep 2008 02:00:00 +0200</pubDate>
      <guid isPermaLink="false">Bblo-QQnvQBY</guid>
    </item>
    ...
    <yacy:topwords>
      <yacy:item>adsl</yacy:item>
      <yacy:item>cociente</yacy:item>
    ...
    </yacy:topwords>
  </channel>
</rss>
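
The channel-level metadata and the topwords list of such a response can be read with namespace-aware lookups. The Python sketch below is an illustration only: the function name parse_channel is hypothetical, the embedded sample is a shortened, hand-made feed, and the treatment of "145.650" as a number with a thousands separator is an assumption based on the first sample result on this page.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the sample feeds
NS = {
    "yacy": "http://www.yacy.net/",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Shortened sample feed for illustration
SAMPLE = """<rss version="2.0"
  xmlns:yacy="http://www.yacy.net/"
  xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>YaCy P2P-Search for test</title>
    <opensearch:totalResults>145.650</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>5</opensearch:itemsPerPage>
    <yacy:topwords>
      <yacy:item>adsl</yacy:item>
      <yacy:item>cociente</yacy:item>
    </yacy:topwords>
  </channel>
</rss>"""


def parse_channel(xml_text):
    """Extract the OpenSearch counters and the YaCy topwords from a feed."""
    channel = ET.fromstring(xml_text).find("channel")

    def as_int(tag):
        text = channel.findtext(tag, default="", namespaces=NS)
        # Assumption: non-digit characters are thousands separators ("145.650")
        digits = "".join(ch for ch in text if ch.isdigit())
        return int(digits) if digits else 0

    return {
        "total": as_int("opensearch:totalResults"),
        "start": as_int("opensearch:startIndex"),
        "per_page": as_int("opensearch:itemsPerPage"),
        "topwords": [e.text for e in channel.findall("yacy:topwords/yacy:item", NS)],
    }


meta = parse_channel(SAMPLE)
print(meta)
```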

Used

  • opensearch
  • YaCy-UI