<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6298254854261232655</id><updated>2011-07-05T05:36:50.465-07:00</updated><category term='xml'/><category term='crawler web'/><category term='interfete evoluate'/><category term='browsere'/><category term='xhtml'/><category term='DespreProiect'/><category term='sgml'/><title type='text'>Proiect Interfete Evoluate</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/00092100451467307725</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>11</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-2304754104596392844</id><published>2008-01-13T13:28:00.000-08:00</published><updated>2008-01-15T09:50:13.385-08:00</updated><title type='text'>SEO poisoning</title><content type='html'>&lt;div style="text-align: justify;"&gt;Google has a serious reason to block and blacklist sites that do aggressive SEO. Lately there has been a massive increase in SEO poisoning attempts that promote phishing sites or sites containing malware. As a result, even a fairly common and innocent query can return such sites in the results, right on the first page.&lt;br /&gt;   Phishing sites copy the look of well-known sites (usually banks or institutions) to trick users into entering their personal data.&lt;br /&gt;   Other sites install malware on the visitor's computer without the visitor's knowledge. The installation works by exploiting vulnerabilities in the browser (usually Internet Explorer) or in other programs that render web content (Flash Player, QuickTime, Windows Media Player, etc.).&lt;br /&gt;   Once caught, these sites are blacklisted by the search engine and also blocked or shut down. Their lifespan is short, but long enough to catch a few careless users. To be effective, these sites are created in bulk. Administering them is very simple thanks to new tools (&lt;a href="http://en.wikipedia.org/wiki/MPack_%28software%29"&gt;MPack&lt;/a&gt;, Icepack) built specifically for managing malware sites. 
These tools have become more and more complete and make creating such a site a "no-brainer".&lt;br /&gt;   Besides SEO poisoning, there is another trick for luring users onto a site: using domain names that resemble those of well-known sites, for example googl, gogle, microsof, etc.&lt;br /&gt;   Here are a few tips to protect yourself from these attacks:&lt;br /&gt;       - Update your programs often: updates usually contain fixes for bugs and vulnerabilities.&lt;br /&gt;       - Use Firefox or Opera: the vast majority of attacks have been found to target only Internet Explorer.&lt;br /&gt;       - Use an antivirus that also detects unsafe sites (McAfee, Bitdefender, TrendMicro).&lt;br /&gt;       - Use &lt;a href="http://noscript.net/"&gt;NoScript&lt;/a&gt; to protect yourself from dangerous active content in pages.&lt;br /&gt;       - Look at the domain name; if it seems odd, stop for a moment and think about whether it can be safe.&lt;br /&gt;&lt;br /&gt;References:&lt;br /&gt;&lt;a href="http://www.downloadsquad.com/2007/11/29/google-removes-thousands-of-malware-sites/"&gt;http://www.downloadsquad.com/2007/11/29/google-removes-thousands-of-malware-sites/&lt;/a&gt;&lt;br /&gt;&lt;a href="http://blogs.zdnet.com/security/?p=700"&gt;http://blogs.zdnet.com/security/?p=700&lt;/a&gt;&lt;br /&gt;&lt;a href="http://honeynet.org/papers/wek/"&gt;Know Your Enemy: Behind the Scenes of Malicious Web Servers&lt;/a&gt;&lt;br /&gt;&lt;a href="http://blog.trendmicro.com/"&gt;http://blog.trendmicro.com/&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.f-secure.com/weblog/"&gt;http://www.f-secure.com/weblog/&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-2304754104596392844?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/2304754104596392844/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=2304754104596392844' title='7 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/2304754104596392844'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/2304754104596392844'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2008/01/seo-poisoning.html' title='SEO poisoning'/><author><name>Echipa WebLab</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-8052438301029492649</id><published>2008-01-09T09:51:00.000-08:00</published><updated>2008-01-15T10:48:00.453-08:00</updated><title type='text'>Standarde Web si cine le reglementeaza</title><content type='html'>&lt;div style="text-align: justify;"&gt;As we well know from a renowned course this year, it is not enough to know how to apply and use the technologies we have learned; we also need to know the standards and the organizations behind them. That is why I thought I would give a short overview of web services, more precisely of the web standards and who exactly is responsible for them.&lt;br /&gt;The most important standards are:&lt;br /&gt;- XML (Extensible Markup Language) - The XML meta-language is a simplification of SGML (from which HTML also derives) and was designed for transferring data between applications over the internet. 
XML is now also a model for storing unstructured and semi-structured data in native XML databases.&lt;br /&gt;- SOAP (Simple Object Access Protocol) - a set of rules for how web service requests and responses are packaged in the exchange between a user and a web service provider.&lt;br /&gt;- WSDL (Web Services Description Language) - an XML-based language used to describe the services offered by a provider and the protocols (such as SOAP) that must be used to consume them.&lt;br /&gt;- UDDI (Universal Description, Discovery and Integration) - a registry that lets a web service provider publish and describe the services it offers, so that potential users learn that they are available.&lt;br /&gt;The standards mentioned above are developed and certified by the following organizations:&lt;br /&gt;- World Wide Web Consortium (W3C) - a non-profit organization that steers the evolution of the Web. With the motto &lt;span style="color: rgb(128, 128, 128);"&gt;"To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web"&lt;/span&gt;, W3C has so far published more than 90 recommendations - &lt;a href="http://www.w3.org/TR/"&gt;W3C Recommendations&lt;/a&gt;. Work is currently under way on &lt;a href="http://www.w3.org/TR/2006/WD-xhtml2-20060726/"&gt;XHTML&lt;/a&gt; (version 2.0), a reformulation of &lt;a href="http://www.w3.org/TR/html401/"&gt;HTML 4.01&lt;/a&gt; - an official W3C standard since December 1999 - as an &lt;a href="http://www.w3.org/XML/"&gt;XML&lt;/a&gt; application.&lt;br /&gt;- OASIS (Organization for the Advancement of Structured Information Standards) - an organization formed at the initiative of companies with an interest in web services, such as IBM, Microsoft and Sun Microsystems. 
OASIS has overseen the development of standards such as UDDI and ebXML.&lt;br /&gt;- WS-I (Web Services Interoperability Organization) - another organization, made up of more than 120 members, including IBM, Microsoft, Sun Microsystems and Oracle, whose goal is to develop standards that let web services be accessed across different platforms. The common set of standard tools it uses is meant to let web services run on different operating systems.&lt;br /&gt;As a conclusion, a question for you: &lt;span style="color: rgb(0, 0, 153);"&gt;Which standards are you familiar with, and in which fields?&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-8052438301029492649?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/8052438301029492649/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=8052438301029492649' title='0 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/8052438301029492649'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/8052438301029492649'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2008/01/standarde-web-si-cine-le-reglementeaza.html' title='Standarde Web si cine le reglementeaza'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/00092100451467307725</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-6341619098369907359</id><published>2007-12-23T11:17:00.000-08:00</published><updated>2008-01-15T11:31:51.323-08:00</updated><title type='text'>Am intrat in concurs!</title><content type='html'>&lt;div style="text-align: justify;"&gt;A rival team (&lt;span style=";font-family:Arial;font-size:85%;"  &gt;&lt;a href="http://www.interfeteevoluate.ro/" target="_blank"&gt;www.interfeteevoluate.ro&lt;/a&gt;&lt;/span&gt;) has apparently come up with the idea of organizing a contest: "Site of the Year 2007 at Interfete Evoluate". We are happy to announce that we are among the nominees. We are waiting for your votes, if you have not voted already. &lt;a href="http://interfete-evoluate.blogspot.com/2007/12/3-serie-de-nominalizari.html"&gt;Here&lt;/a&gt; you can see the nominations.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-6341619098369907359?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/6341619098369907359/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=6341619098369907359' title='0 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/6341619098369907359'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/6341619098369907359'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/12/am-intrat-in-concurs.html' title='Am intrat in concurs!'/><author><name>Echipa 
WebLab</name><uri>http://www.blogger.com/profile/00092100451467307725</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-4700042007457631455</id><published>2007-12-02T08:54:00.000-08:00</published><updated>2007-12-02T13:46:47.514-08:00</updated><title type='text'>Tema 3 Interfete Evoluate</title><content type='html'>&lt;div style="text-align: justify;"&gt;Assignment 3 for Interfete Evoluate was posted a while ago. For this assignment each team member has to pick one JavaScript component to implement. Which is the most interesting: the counter, the auto-complete, the calendar or the top articles list? We haven't decided yet...&lt;br /&gt;For a more detailed description of what we are going to implement, you can find the assignment statement &lt;a href="http://www.weblab.3x.ro/enunttema3.html"&gt;here&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-4700042007457631455?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/4700042007457631455/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=4700042007457631455' title='1 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/4700042007457631455'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/4700042007457631455'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/12/tema-3-interfete-evoluate.html' 
title='Tema 3 Interfete Evoluate'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/17364903723368433050</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-5570482792599867252</id><published>2007-11-25T12:34:00.000-08:00</published><updated>2007-12-02T13:48:24.857-08:00</updated><title type='text'>Cum sa-ti faci propriul Crawler? Partea a 2-a</title><content type='html'>&lt;div style="text-align: justify;"&gt;Alright, we're back with the second (and last) part about crawlers; at the end there will also be some code available.&lt;br /&gt;OK, where were we? Hmm... I can't remember. Ah, yes! Implementation problems. So far we have assumed that the information (the links) we are looking for is valid. It is not quite that simple: some links may be temporarily or permanently unavailable, others may simply be misspelled. I won't go into detail, because most languages provide a way to handle such mistakes/errors/exceptions. A small note: for links that are no longer valid you can use the &lt;a href="http://www.archive.org/web/web.php"&gt;Wayback Machine&lt;/a&gt; (which also relies on a crawler), where you can find a snapshot of a page at a given date.&lt;br /&gt;Now we have all the valid links, but not all of them are needed. For example, some links point to the current site. Filtering those out is easy as well.&lt;br /&gt;Let's say we're done. As I said earlier, I chose to write this script in Python, and the result will be a directed graph. 
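&lt;br /&gt;The steps described so far (collect the links, drop the ones that point back to the current site, emit a directed graph) can be sketched roughly as follows. This is not the script used for this post, just a minimal illustration under a few assumptions: the page has already been downloaded, so fetching and error handling are left out, and the names LinkCollector, external_links and to_dot are made up for the example. It uses Python's standard html.parser module, which is event-driven in the spirit of SAX and tolerates malformed markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Event-driven parser that keeps only the href of each anchor tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def external_links(page_url, html):
    """Return the sorted, de-duplicated http(s) links that leave page_url's host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    own_host = urlparse(page_url).netloc
    found = set()
    for link in collector.links:
        parts = urlparse(link)
        # Skip non-web schemes (mailto, javascript) and links to the current site.
        if parts.scheme in ("http", "https") and parts.netloc and parts.netloc != own_host:
            found.add(link)
    return sorted(found)


def to_dot(edges):
    """Render a {source: [targets]} mapping as Graphviz DOT text (directed graph)."""
    lines = ["digraph crawl {"]
    for source in sorted(edges):
        for target in edges[source]:
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Passing a page address and its HTML text to external_links, and then the result to to_dot as a dictionary keyed by the page address, gives text that the Graphviz dot tool should be able to render.&lt;br /&gt;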
This is what the resulting graph looks like:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp1.blogger.com/_uFbgSO_Pn6k/R0qOFDQVOhI/AAAAAAAAAB4/sSliv96r2J0/s1600-h/theveryshortstory3.blogspot.com.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://bp1.blogger.com/_uFbgSO_Pn6k/R0qOFDQVOhI/AAAAAAAAAB4/sSliv96r2J0/s320/theveryshortstory3.blogspot.com.jpg" alt="" id="BLOGGER_PHOTO_ID_5137074542393768466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;I used Graphviz to define the graph. It is easy to use, highly customizable, and the graph can take various shapes.&lt;br /&gt;That's about it for our mini crawler. More implementations are &lt;a href="http://weblab.3x.ro/enuntteme2.html#"&gt;here&lt;/a&gt;, that is, on our site &lt;a href="http://www.weblab.3x.ro"&gt;weblab.3x.ro&lt;/a&gt; . I hope it was at least somewhat interesting.&lt;br /&gt;&lt;br /&gt;PS: A lot can be said about the Wayback Machine and its associated crawler. The crawler is open source. I don't even want to think about how many ways the Wayback Machine can be used. Still, I'll give one example: a temporal crawler that can observe not only the present relationships between sites/entities but also how those relationships change over time. 
Enough said.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-5570482792599867252?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/5570482792599867252/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=5570482792599867252' title='1 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5570482792599867252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5570482792599867252'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/11/cum-sa-ti-faci-propriul-crawler-partea_25.html' title='Cum sa-ti faci propriul Crawler? 
Partea a 2-a'/><author><name>Echipa WebLab</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp1.blogger.com/_uFbgSO_Pn6k/R0qOFDQVOhI/AAAAAAAAAB4/sSliv96r2J0/s72-c/theveryshortstory3.blogspot.com.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-5641842702067459599</id><published>2007-11-15T00:21:00.000-08:00</published><updated>2007-11-24T13:47:28.369-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xhtml'/><category scheme='http://www.blogger.com/atom/ns#' term='sgml'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><category scheme='http://www.blogger.com/atom/ns#' term='crawler web'/><title type='text'>Cum sa-ti faci propriul Crawler? Partea 1</title><content type='html'>&lt;div style="text-align: justify;"&gt;Has it ever crossed your mind to build something like this? Does the idea sound interesting? It's not hard at all. The idea is simple: you write a program/script that parses a site's pages and extracts their content (only what interests you). Depending on the content, you can choose to visit certain links. All this information is then put into an easily accessible form (an XML file or a database).&lt;br /&gt;&lt;br /&gt;There are a few choices you will have to make beforehand:&lt;br /&gt;&lt;br /&gt;- Which language do I use? Pretty much whatever suits you, from C, C++ and Java to scripting languages such as Perl, Python, Ruby, etc. Almost all of them have several HTML and XML parser implementations. I chose Python because I find it very comfortable and it is quick to implement in.&lt;br /&gt;&lt;br /&gt;- Which parser do I use? Here it matters what kind of site you want to parse. Let's say we want to extract information from a blog on blogger.com. That means we have XHTML. 
Great, simple, we use an XML parser! Or not... There is a small problem here, related to the correctness of the XML: if we validate it, the validation fails. What can be done? Well, the simplest option is to take a step back in history and remember &lt;a href="http://en.wikipedia.org/wiki/SGML"&gt;SGML&lt;/a&gt;, the father of XML and HTML. An SGML-style parser gives us more flexibility and tolerates malformed markup, which is exactly what we are looking for. I haven't specified the XML parsing technology, SAX or DOM. Well, SAX suits our purpose better, because it is faster and we don't need all the information in the page.&lt;br /&gt;&lt;br /&gt;- How much do I want to parse, what volume of data will be handled, and what speed do I want? These determine the complexity and scalability of the crawler. For our exercise we don't need anything too complicated: the volume of data is not large and fairly little data is parsed. If you still want something better, it is advisable to look for &lt;a href="http://en.wikipedia.org/wiki/Web_crawler"&gt;existing&lt;/a&gt; crawlers. If you insist on building your own crawler that parses a very large volume of data and is also fast, here are some ideas: use multiple threads (d'oh), use a parallel processing model (boss/worker, work crew, etc.), and to remember the sites already visited use one or more stacks (though other data structures would work too...).&lt;br /&gt;&lt;br /&gt;- What do I want to do with the data I get? A crawler is useless if we don't use the data it produces. Depending on your needs, the data can be written to an XML file, to your own format, or to a database. An interesting exercise is to build a graph of the data read. 
With &lt;a href="http://www.graphviz.org/"&gt;Graphviz&lt;/a&gt; this is very easy: a graph is represented by a text file that specifies the links between nodes.&lt;br /&gt;&lt;br /&gt;Once you have answered all of these you can get started on the easiest part, the implementation ;) . Just kidding, it's not the easiest, but it's not hard either, because there are plenty of parsers to help us.&lt;br /&gt;&lt;br /&gt;That's all for part 1. Part two will also include an implementation.&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-5641842702067459599?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/5641842702067459599/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=5641842702067459599' title='2 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5641842702067459599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5641842702067459599'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/11/cum-sa-ti-faci-propriul-crawler-partea.html' title='Cum sa-ti faci propriul Crawler? 
Partea 1'/><author><name>Echipa WebLab</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-3789900516887590171</id><published>2007-11-06T16:09:00.000-08:00</published><updated>2007-11-07T02:37:04.220-08:00</updated><title type='text'>Sa ne facem si site-ul cunoscut...</title><content type='html'>Our site is &lt;a href="http://weblab.3x.ro/"&gt;weblab.3x.ro&lt;/a&gt; and contains (at least so far) assignment 1 for Interfete Evoluate. We will also add the future assignments, as well as articles about the technologies used to build them. As of today the site is registered on &lt;a href="http://www.trafic.ro/?"&gt;trafic.ro&lt;/a&gt;, so visit us as often as you can.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-3789900516887590171?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/3789900516887590171/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=3789900516887590171' title='3 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/3789900516887590171'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/3789900516887590171'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/11/sa-ne-facem-si-site-ul-cunoscut.html' title='Sa ne facem si site-ul cunoscut...'/><author><name>Echipa 
WebLab</name><uri>http://www.blogger.com/profile/17364903723368433050</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-3515250849085556359</id><published>2007-11-06T14:50:00.000-08:00</published><updated>2007-11-06T15:57:56.476-08:00</updated><title type='text'>Lucruri esentiale</title><content type='html'>&lt;div style="text-align: justify;"&gt;When we created the blog, the first post should have been a description of the team members. Since it's never too late, we're doing that in this post. We are 4 students, former or current group mates, some even roommates, but we all have one thing in common (besides this blog): Facultatea de Automatica si Calculatoare (the Faculty of Automatic Control and Computers). A description as short (not least because 2 of our members don't like being introduced) and concise as possible: Ana and Dragos are the ones who will definitely write the technical articles, while Alex and I will write posts like this one. We haven't decided yet who our leader is. It's rather hard with 2 Aries, a Leo and a Cancer. That's about all I can say. If anyone wants to add something to their own description (or even to the team's description), they can leave a comment.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-3515250849085556359?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/3515250849085556359/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=3515250849085556359' title='2 comentarii'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6298254854261232655/posts/default/3515250849085556359'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/3515250849085556359'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/11/lucruri-esentiale.html' title='Lucruri esentiale'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/00092100451467307725</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-5180674746173204186</id><published>2007-11-03T14:13:00.000-07:00</published><updated>2007-11-24T13:50:04.442-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='browsere'/><category scheme='http://www.blogger.com/atom/ns#' term='interfete evoluate'/><title type='text'>Ce ne pregateste Firefox?</title><content type='html'>&lt;div style="text-align: justify;"&gt;    Why do we like Firefox? There are many answers, but its extensibility is near the top of the list. Recently a new project under development was announced on &lt;a href="http://labs.mozilla.com/"&gt;Mozilla Labs&lt;/a&gt;. It is called &lt;a href="http://labs.mozilla.com/2007/10/prism/"&gt;Prism&lt;/a&gt;, and it is a platform for detaching web applications from the browser. Web services thus become standalone applications that use the resources provided by Prism. For example, Gmail or Google Reader can be accessed directly from the desktop, as separate applications. It is not a very big feature, but it can change mindsets. On the other hand, it may come across as a failed offshoot of widgets. It remains to be seen whether it will catch on with users.&lt;br /&gt;    One thing is certain: Google benefits directly from this project. 
Which is to be expected, since they are the most important financial backer of the Mozilla Foundation. More details &lt;a href="http://www.cnet.com/8301-13739_1-9776759-46.html?tag=blg.orig"&gt;here&lt;/a&gt;.&lt;br /&gt;Mozilla Labs is the playground of the Firefox community, and there are a few more interesting projects under way. Check them out!&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-5180674746173204186?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/5180674746173204186/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=5180674746173204186' title='0 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5180674746173204186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/5180674746173204186'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/11/ce-ne-pregateste-firefox.html' title='Ce ne pregateste Firefox?'/><author><name>Echipa WebLab</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-8738911785633004087</id><published>2007-10-29T15:17:00.000-07:00</published><updated>2007-11-24T13:49:27.063-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xhtml'/><category scheme='http://www.blogger.com/atom/ns#' term='interfete evoluate'/><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><title 
type='text'>Altova XMLSpy</title><content type='html'>&lt;div style="text-align: justify;"&gt;If you ever need to edit or model XML-related documents, XMLSpy is the best solution. It is a powerful development environment from Altova that makes the work much easier and runs on three of the best-known platforms: Windows, Linux and MacOS.&lt;br /&gt;Although so far I have only used it to edit and validate XML documents against a DTD, XMLSpy can do much more. The tool supports XSLT, XPath, XQuery, WSDL, SOAP, Office Open XML documents, plugins for Visual Studio .NET and Eclipse, and much more. XMLSpy 2008 (released on September 12) even supports the Office Open XML format (used by Office 2007). Office 2007 documents can now be stored as a zip archive that holds the data in XML format, which is a big advantage because it makes the data very easy to standardize.&lt;br /&gt;&lt;div style="text-align: left;"&gt;You can download it here (it is not free :( ): &lt;a href="http://www.altova.com/altovaxml.html"&gt;Download&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;Details about XML programming (specifically the Eclipse integration) can be found here: &lt;a href="http://www.altova.com/features_eclipse.html"&gt;Eclipse integration&lt;/a&gt;. I think it might help you with assignment 2 for Interfete Evoluate.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-8738911785633004087?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/8738911785633004087/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=8738911785633004087' title='0 comentarii'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/8738911785633004087'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/8738911785633004087'/><link rel='alternate' type='text/html' href='http://interfeteevoluate2007.blogspot.com/2007/10/altova-xmlspy.html' title='Altova XMLSpy'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/17364903723368433050</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6298254854261232655.post-29262864681746311</id><published>2007-10-29T14:44:00.000-07:00</published><updated>2007-11-06T16:19:36.096-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='DespreProiect'/><title type='text'>Proiect Interfete Evoluate</title><content type='html'>This is how our project for Interfete Evoluate begins... with this blog. To be continued...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6298254854261232655-29262864681746311?l=interfeteevoluate2007.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://interfeteevoluate2007.blogspot.com/feeds/29262864681746311/comments/default' title='Postare comentarii'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6298254854261232655&amp;postID=29262864681746311' title='4 comentarii'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/29262864681746311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6298254854261232655/posts/default/29262864681746311'/><link rel='alternate' type='text/html' 
href='http://interfeteevoluate2007.blogspot.com/2007/10/proiect-interfete-evoluate.html' title='Proiect Interfete Evoluate'/><author><name>Echipa WebLab</name><uri>http://www.blogger.com/profile/00092100451467307725</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry></feed>
