Dev:APICrawlProfileEditor
From YaCyWiki
/CrawlProfileEditor_p.xml
A crawl profile is the set of parameters that defines a running crawl job, including its start URL, crawl depth and URL filters. The crawl profiles loaded on a peer can be retrieved as XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<crawlProfiles>
  <crawlProfile>
    <name>snippetLocalMedia</name>
    <status>active</status>
    <starturl />
    <depth>0</depth>
    <mustmatch>.*</mustmatch>
    <mustnotmatch />
    <crawlingIfOlder>02.01.2010 16:10:00</crawlingIfOlder>
    <crawlingDomFilterDepth>inactive</crawlingDomFilterDepth>
    <crawlingDomFilterContent />
    <crawlingDomMaxPages>unlimited</crawlingDomMaxPages>
    <withQuery>yes</withQuery>
    <storeCache>no</storeCache>
    <indexText>no</indexText>
    <indexMedia>no</indexMedia>
    <remoteIndexing>no</remoteIndexing>
  </crawlProfile>
  <crawlProfile>
    <name>/autoReCrawl/daily/http://www.acer-userforum.de/</name>
    <status>active</status>
    <starturl>http://www.acer-userforum.de/</starturl>
    <depth>3</depth>
    <mustmatch>.*</mustmatch>
    <mustnotmatch>.*memberlist.*|.*previous.*|.*next.*|.*p=.*</mustnotmatch>
    <crawlingIfOlder>31.01.2010 14:56:24</crawlingIfOlder>
    <crawlingDomFilterDepth>inactive</crawlingDomFilterDepth>
    <crawlingDomFilterContent />
    <crawlingDomMaxPages>unlimited</crawlingDomMaxPages>
    <withQuery>yes</withQuery>
    <storeCache>no</storeCache>
    <indexText>yes</indexText>
    <indexMedia>yes</indexMedia>
    <remoteIndexing>no</remoteIndexing>
  </crawlProfile>
</crawlProfiles>
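To use these fields in a script, the response can be parsed with PHP's built-in SimpleXML extension. The sketch below is one possible approach, not part of YaCy. It assumes the element names from the sample output, and the function name parseCrawlProfiles() is hypothetical. It also converts crawlingIfOlder to ISO form, which assumes every profile uses the day.month.year layout seen in the sample.

```php
<?php
// Sketch: convert the /CrawlProfileEditor_p.xml response into a plain PHP array.
// parseCrawlProfiles() is a hypothetical helper name, not a YaCy API.

function parseCrawlProfiles(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Response is not well-formed XML');
    }
    $profiles = [];
    // Iterating works whether the peer returns one <crawlProfile> or many
    foreach ($doc->crawlProfile as $node) {
        $profile = [];
        foreach ($node->children() as $field) {
            // Empty elements such as <starturl /> become ""
            $profile[$field->getName()] = trim((string) $field);
        }
        // ASSUMPTION: crawlingIfOlder is always day.month.year, e.g. "31.01.2010 14:56:24"
        $date = DateTime::createFromFormat('d.m.Y H:i:s', $profile['crawlingIfOlder'] ?? '');
        $profile['crawlingIfOlderIso'] = $date ? $date->format('Y-m-d H:i:s') : null;
        $profiles[] = $profile;
    }
    return $profiles;
}

// Shortened copy of the sample response shown above
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<crawlProfiles>
  <crawlProfile>
    <name>snippetLocalMedia</name>
    <status>active</status>
    <starturl />
    <depth>0</depth>
    <crawlingIfOlder>02.01.2010 16:10:00</crawlingIfOlder>
  </crawlProfile>
  <crawlProfile>
    <name>/autoReCrawl/daily/http://www.acer-userforum.de/</name>
    <status>active</status>
    <starturl>http://www.acer-userforum.de/</starturl>
    <depth>3</depth>
    <crawlingIfOlder>31.01.2010 14:56:24</crawlingIfOlder>
  </crawlProfile>
</crawlProfiles>
XML;

$profiles = parseCrawlProfiles($sample);
foreach ($profiles as $p) {
    echo $p['name'], ': depth ', $p['depth'], ', start URL ',
        ($p['starturl'] === '' ? '(none)' : $p['starturl']), "\n";
}
```

SimpleXML iteration behaves the same for one or several crawlProfile elements, so the caller does not need a special case for a peer that holds a single profile.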
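The following plain PHP example uses cURL to request all crawl profiles loaded on a peer and prints them as an HTML table. The page name ends in _p, which marks a protected page, so the request sends the administrator's credentials with CURLOPT_USERPWD.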
<?php
$command = "CrawlProfileEditor_p.xml";
$YaCyURL = "http://mypeer.tld:8080/";
// admin credentials as "user:password" (placeholder values)
$appID = "admin:password";

// open connection to peer
$queryServer = curl_init($YaCyURL . $command);
curl_setopt($queryServer, CURLOPT_HEADER, 0);
curl_setopt($queryServer, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($queryServer, CURLOPT_USERPWD, $appID);
$results = curl_exec($queryServer);
curl_close($queryServer);
if ($results === false) {
    die("Could not connect to the peer.");
}

// parse xml with PHP's built-in SimpleXML extension
$xml = simplexml_load_string($results);
if ($xml === false) {
    die("The peer did not return valid XML.");
}

// fields to display, named exactly as in the XML output
$fields = array("name", "status", "starturl", "depth", "mustmatch", "mustnotmatch",
    "crawlingIfOlder", "crawlingDomFilterDepth", "crawlingDomFilterContent",
    "crawlingDomMaxPages", "withQuery", "storeCache", "indexText", "indexMedia",
    "remoteIndexing");

// get items only
if (isset($xml->crawlProfile)) {
    echo "<h1>Crawl Profiles</h1>";
    echo "<table>";
    $tr = "aaaaaa";
    foreach ($xml->crawlProfile as $item) {
        // alternate the row background colour
        $tr = ($tr == "ffffff") ? "aaaaaa" : "ffffff";
        echo "<tr bgcolor=\"" . $tr . "\">";
        foreach ($fields as $field) {
            // escape values, since URLs and filters may contain & or <
            echo "<td>" . htmlspecialchars((string) $item->$field) . "</td>";
        }
        echo "</tr>";
    }
    echo "</table>";
}
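The mustmatch and mustnotmatch fields hold regular expressions that select which URLs a crawl job may follow. The sketch below shows how a client script could test a URL against a profile's filters. It rests on assumptions that this page does not confirm: that YaCy applies the filters as whole-URL matches (as Java's String.matches() does), and that an empty filter element means no restriction. The helpers fullMatch() and urlPassesFilters() are hypothetical and are not YaCy APIs.

```php
<?php
// Sketch: check a URL against a profile's <mustmatch> and <mustnotmatch> filters.
// ASSUMPTION: filters are applied as whole-URL matches, so patterns are anchored here.
// fullMatch() and urlPassesFilters() are hypothetical helpers, not YaCy APIs.

function fullMatch(string $pattern, string $subject): bool
{
    // "~" is the delimiter and is escaped; a pattern that already
    // contains "\~" is not handled by this simple approach
    $regex = '~\A(?:' . str_replace('~', '\~', $pattern) . ')\z~';
    $result = preg_match($regex, $subject);
    if ($result === false) {
        throw new InvalidArgumentException("Invalid filter pattern: $pattern");
    }
    return $result === 1;
}

function urlPassesFilters(string $url, string $mustMatch, string $mustNotMatch): bool
{
    // ASSUMPTION: an empty filter element means "no restriction"
    if ($mustMatch !== '' && !fullMatch($mustMatch, $url)) {
        return false;
    }
    if ($mustNotMatch !== '' && fullMatch($mustNotMatch, $url)) {
        return false;
    }
    return true;
}

// Filters copied from the /autoReCrawl/daily/... profile in the sample XML
$deny = '.*memberlist.*|.*previous.*|.*next.*|.*p=.*';
var_dump(urlPassesFilters('http://www.acer-userforum.de/memberlist.php', '.*', $deny));    // bool(false)
var_dump(urlPassesFilters('http://www.acer-userforum.de/viewforum.php?f=5', '.*', $deny)); // bool(true)
```

Anchoring with \A and \z instead of ^ and $ avoids accepting a URL that merely ends in a newline, which PCRE's $ would otherwise allow.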