Dev:API

From YaCyWiki

Introduction

Besides the web interface, YaCy offers a rich XML- and JSON-based API for interaction. Some of these interfaces can also be accessed as HTML, and these pages are integrated into the YaCy web interface. When you access such a page, an 'API' tooltip icon appears in the upper right corner of the web page, and a mouseover shows a short introduction to the API. The API icon itself links to the XML, JSON or similar API file that presents the shown data in annotated form. Please note that these tooltips and the underlying link to the API path change every time you navigate to another YaCy page. Even if the icon looks the same, it always links to the data you currently see on the web page.

API reference

There are different 'generations' of YaCy APIs:

  • servlets in /yacy/* .. these were there first and contain the basic peer-to-peer bootstrap, search and DHT transfer servlets. These servlets are used only for peer-to-peer communication
  • servlets in /api/* .. additional servlets to support AJAX functions of the YaCy administration interface, to be used locally on the same peer
  • any other servlet which clones the content of a web page but has a .xml or .json extension. These API servlets are marked within the administration interface with an orange 'API' icon in the top right corner of the administration web page
  • the Solr search servlet at /solr/select of the embedded Solr search server, which provides exactly the same query functionality as the original Solr, as described in the Solr wiki.


Search Interface

/
/yacysearch.rss and /yacysearch.json
YaCy search page returning xml (opensearch) or json results
/suggest.xml and /suggest.json
YaCy suggest interface returning xml (opensearch-compliant) or json results
/solr
/solr/select
The (original) Solr search api embedded into YaCy
/gsa
/gsa/searchresult
The (re-implemented) Google Search Appliance API embedded into YaCy
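
As a rough illustration of how the search endpoints listed above can be addressed, the following Ruby sketch builds a request URL for /yacysearch.json or /yacysearch.rss. The helper name yacy_search_url, the default peer address and the default of 10 results are assumptions made for this example; the parameter names query and maximumrecords are taken from the Ruby samples further down this page.

```ruby
require 'uri'

# Hypothetical helper (not part of YaCy): builds a search URL for a peer.
# base   - peer address; a local peer on port 8090 is assumed here
# format - 'json' or 'rss', matching /yacysearch.json and /yacysearch.rss
# max    - number of results to ask for
def yacy_search_url(query, base: 'http://localhost:8090', format: 'json', max: 10)
  raise ArgumentError, "unsupported format: #{format}" unless %w[json rss].include?(format)

  # encode_www_form takes care of escaping spaces and special characters
  params = { 'query' => query, 'maximumrecords' => max }
  "#{base}/yacysearch.#{format}?#{URI.encode_www_form(params)}"
end

puts yacy_search_url('open source search')
```

The resulting URL can be passed to any HTTP client. Keeping the URL construction separate from the request itself makes it easy to check the parameters before a peer is contacted.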

Peer-to-Peer Communication

/yacy
/yacy/seedlist.html / /yacy/seedlist.json / /yacy/seedlist.xml
the YaCy p2p network bootstrapping seed list
/yacy/crawlReceipt.html
feedback from a remote peer to transmit metadata of loaded (remote-crawled) content
/yacy/hello.html
called for a peer-ping, the network keep-alive process
/yacy/idx.json
retrieve the known internet network structure based on inter-host links
/yacy/list.html
lists shared blacklists
/yacy/message.html
send a message to a remote peer
/yacy/profile.html
get the profile of a remote peer
/yacy/query.html
query specific throughput and sizing parameters from the remote peer
/yacy/search.html
DHT search on the remote peer
/yacy/transferRWI.html
send an RWI to a remote peer
/yacy/transferURL.html
send URL metadata to a remote peer
/yacy/urls.xml
ask for remote crawl URL lists that the requesting peer wants to load

AJAX Services for the local peer

Some of these servlets are protected (all servlets ending with '_p') and can only be accessed from localhost or after authorization.
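
A client can use this naming rule to decide in advance whether credentials will be needed. The Ruby sketch below is only an illustration of the '_p' convention described above; the helper name protected_servlet? is made up for this example and is not part of YaCy.

```ruby
# Hypothetical helper: returns true when the servlet name, without its
# extension and without any query string, ends in "_p".
def protected_servlet?(path)
  # drop the query string, then strip the directory part and the extension
  name = File.basename(path.split('?').first.to_s, '.*')
  name.end_with?('_p')
end

puts protected_servlet?('/api/status_p.xml')
puts protected_servlet?('/api/version.xml')
```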

/api
/api/blacklists_p.xml
/api/config_p.xml
/api/feed.rss
/api/push_p.json
/api/queues_p.xml
/api/schema.xml
/api/status_p.xml
/api/table_p.xml
/api/version.xml
YaCy SVN version
/api/webstructure.xml
/api/yacydoc.xml
/api/util
/api/util/getpageinfo_p.xml
crawling information for single url
/api/util/ynetSearch.xml
/api/blacklists
/api/blacklists/get_metadata_p[.xml | .json]
list of all blacklists and their metadata
/api/blacklists/get_list_p[.xml | .json]
metadata and content of a specific list
/api/blacklists/add_entry_p[.xml | .json]
adds new entry to blacklist
/api/blacklists/delete_entry_p[.xml | .json]
deletes an entry from a blacklist
/api/bookmarks
/api/bookmarks/get_bookmarks[.xml | .json]
/api/bookmarks/posts
/api/bookmarks/posts/add_p.xml
/api/bookmarks/posts/all.xml
/api/bookmarks/posts/delete_p.xml
/api/bookmarks/posts/get.xml
/api/bookmarks/tags
/api/bookmarks/tags/getTag.xml
/api/bookmarks/tags/rename_p.xml
/api/bookmarks/xbel
/api/bookmarks/xbel/xbel.xml

HTML Servlets

These servlets are used for online administration of a YaCy peer, but they can also be used for remote steering by calling the HTTP interface with a tool such as wget or curl.

/
/AccessTracker_p.xml
peer access statistics
/Blog.xml
YaCy blog
/Crawler_p.html
start a web crawl
/CrawlStartExpert.html
start a web crawl in expert mode
/CrawlProfileEditor_p.xml
show and edit crawl profiles
/IndexDeletion_p.html
delete documents from the Solr index
/IndexSchema_p.html
show and edit the Solr index schema
/Messages_p.xml
/Network.xml
peer and network statistics
/News.rss
/opensearchdescription.xml
/PerformanceMemory_p.xml
peer memory status
/PerformanceQueues_p.xml
peer status of busy queues
/QuickCrawlLink_p.xml
single url crawl start with immediate confirmation
/Status.html
peer steering: shutdown, restart, pause/resume crawls
/Steering.html
update peer
/ViewProfile.xml
view peer profile

Accessing the APIs using non-Java frameworks

Information can be retrieved from a peer by simply calling the appropriate servlet and decoding the delivered XML. The easiest way to explore other API calls is to perform the desired action in the YaCy admin interface and then use the same parameters when calling the RSS or XML servlet, e.g. Network.xml instead of Network.html. Most actions that have been issued through the YaCy interface to change the configuration or to request crawl actions can be examined on the page Table_API_p.html.
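
The renaming step described above can be sketched in a few lines of Ruby, the language also used for the samples further down. The helper name api_variant is an assumption for this example, and not every admin page necessarily has an XML or JSON counterpart, so the result should be checked against the peer.

```ruby
# Hypothetical helper: turns the path of an admin page into the path of its
# API variant, e.g. Network.html -> Network.xml. A query string is kept as-is.
def api_variant(page_path, format = 'xml')
  path, query = page_path.split('?', 2)
  # only a trailing ".html" is replaced; other extensions are left untouched
  converted = path.sub(/\.html\z/, ".#{format}")
  query ? "#{converted}?#{query}" : converted
end

puts api_variant('/Network.html?page=1')
```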
After the query results have been received, the delivered XML or JSON must be converted into a SimpleXML object or an array. The client then iterates over the elements in the response with a foreach() loop, processing each one and retrieving the information sent by the peer. Here's how some sample peer information is retrieved using PHP or similar languages for web applications.

Handling XML with PHP5

Open a connection to the desired peer and send an HTTP request. A PHP5 class, Dev:YaCyAPIforPHP, is available for simple handling of requests to one or multiple YaCy peers.
A native HTTP request can be handled with cURL as shown in this example:

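<?php
  // method using native php-curl
  $YaCyURL = "http://mypeer.tld:8090/";
  // placeholder credentials in "user:password" form; replace with your own
  $appID = "admin:password";
  $cu = $YaCyURL."Status.html";
  $queryServer = curl_init($cu);
  curl_setopt($queryServer, CURLOPT_HEADER, 0);          // do not include response headers
  curl_setopt($queryServer, CURLOPT_RETURNTRANSFER, 1);  // return the body as a string
  curl_setopt($queryServer, CURLOPT_USERPWD, $appID);    // HTTP authentication
  $results = curl_exec($queryServer);
  curl_close($queryServer);
?>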
  1. The peer's friendly name is stored in the <your> node collection. The sample accesses this node collection as $yourpeer and reads values such as the name or hash from $yourpeer['name'] or $yourpeer['hash'].
  2. The network's URL count is stored in the <all> node collection. The sample accesses this node collection as $allpeers and reads the value from $allpeers['count'].
<?php
  // method using YaCyAPI2.php
  require 'YaCyAPI2.php';
  // start the class
  $search = new YaCyAPI();
  $results = $search->peerCommand("Network.xml");
  // now we have xml, put it in a simple array
  // (xml2array() is not a PHP built-in; it is assumed to come with the YaCyAPI class file)
  $resultarray = xml2array($results);
  // get items
  $yourpeer = $resultarray['peers']['your'];
  $peername = $yourpeer['name'];
  $peerhash = $yourpeer['hash'];
  // network-wide figures
  $allpeers = $resultarray['peers']['all'];
  $urlcount = $allpeers['count'];
?>

The returned XML string is converted to an array by xml2array().
The next example calls Network.xml with the page parameter to retrieve information about all peers in the queried network.

<?php
// continues the previous example: $search is the YaCyAPI instance created there
$results = $search->peerCommand("Network.xml?page=1");
// now we have xml, put it in a simple array
$resultarray = xml2array($results);
// get items only
$items = $resultarray['peers']['peer'];
// columns to print, in display order
$fields = array('hash', 'fullname', 'type', 'version', 'ppm', 'qph', 'uptime',
                'links', 'words', 'rurls', 'lastseen', 'sendWords', 'receivedWords',
                'sendURLs', 'receivedURLs', 'direct', 'acceptcrawl', 'dhtreceive',
                'rankingreceive', 'location', 'seedurl', 'age', 'seeds', 'connects');
if ($items)
{
  echo "<h1>Active Peers</h1>";
  echo "<table>";
  $tr = "aaaaaa"; // start value, so the first row is rendered with ffffff
  foreach ($items as $item)
  {
    // alternate the row background colour
    if ($tr == "ffffff") {$tr = "aaaaaa";} else {$tr = "ffffff";}
    echo "<tr bgcolor=\"#".$tr."\">";
    foreach ($fields as $field)
    {
      echo "<td>".$item[$field]."</td>";
    }
    echo "</tr>";
  }
  echo "</table>";
}
?>

Handling JSON with PHP5

Some servlets can also be called to deliver JSON instead of XML. Results are delivered a bit faster and most parsers decode the returned data more quickly, so this format should be preferred when speed matters.

Handling XML or JSON with Ruby

The Ruby gem [httparty] makes it easy to consume REST-like APIs and offers great flexibility. It is used in this example to run a quick search for 25 global results for the query 'test' and to print the links found.

XML Example

require 'httparty'

class YaCy
  include HTTParty

  format :xml
  base_uri 'http://localhost:8090'

  def self.search(q)
    get('/yacysearch.rss', :query => {
      :query => q,
      :resource => 'global',
      :verify => false,
      :maximumrecords => 25
    })
  end
end

begin
  channel = YaCy.search('test')
  channel['rss']['channel']['item'].each do |item|
    puts item['link']
  end
rescue
  p 'oops nothing found'
end

JSON Example

require 'httparty'

class YaCy
  include HTTParty

  format :json
  base_uri 'http://localhost:8090'

  def self.search(q)
    get('/yacysearch.json', :query => {
      :query => q,
      :resource => 'global',
      :verify => false,
      :maximumrecords => 25
    })
  end
end

begin
  channel = YaCy.search('test')
  channel['channels'].first['items'].each do |item|
    puts item['link']
  end
rescue
  p 'oops nothing found'
end

Handling XML or JSON with Perl

For Perl, a library [Ismael] is available to handle requests and the returned results.

Steering a peer

To initiate functions without waiting for a delivered result, like pausing/resuming crawls or shutting down the peer, just call the servlet in the same way as the admin interface does.

http://localhost:8090/Steering.html?restart= 

will restart the peer after asking for admin credentials, unless they were delivered with the query via HTTP basic auth. As the peer does not have to confirm this action and has no need to deliver any data, nothing has to be parsed by the client.
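
A steering call with credentials attached could look like the following Ruby sketch, which uses only the standard library. The helper names steering_request and steer, the default peer address and the credentials shown in the usage lines are assumptions for this example. HTTP basic auth is used because it is the scheme named above; a peer configured for a different scheme would need a different client setup.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) an authenticated GET
# request for a steering action such as 'restart'.
def steering_request(action, user:, password:, base: 'http://localhost:8090')
  # produces e.g. http://localhost:8090/Steering.html?restart=
  uri = URI("#{base}/Steering.html?#{URI.encode_www_form({ action => '' })}")
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password) # adds the Authorization header
  [uri, request]
end

# Hypothetical helper: sends the request and returns only the HTTP status
# code, because the response body does not need to be parsed.
def steer(action, **opts)
  uri, request = steering_request(action, **opts)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }.code
end

# Usage (contacts the peer, so it is left commented out here):
# steer('restart', user: 'admin', password: 'secret')
```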

Resources

As these examples show, the YaCy API is very useful when you want to mash up data found or delivered by YaCy with data from other services, or simply build a customized interface for the YaCy community.
For more information about REST, XML, JSON and implementations in popular web programming languages see also