Posts Tagged ‘File’

FTPS (FTP over SSL) vs. SFTP (SSH File Transfer Protocol): What to Choose

Wednesday, January 20th, 2010

File transfer over the network using the FTP protocol (defined by RFC 959 and later additions) traces its roots to 1980, when the first RFC for the FTP protocol was published. FTP provides functions to upload, download and delete files, create and delete directories, and read directory contents. While FTP is very popular, it has certain disadvantages that make it harder to use. The major drawbacks are the lack of a uniform format for directory listings (this problem has been partially solved by the introduction of the MLST command, but some servers don’t support it) and the presence of a secondary connection (the DATA connection). Security in FTP is provided by employing the SSL/TLS protocol for channel encryption, as defined in RFC 2228. The secured version of FTP is called FTPS.

In UNIX systems another security standard has grown: the SSH family of protocols. The primary function of SSH was to secure remote shell access to UNIX systems. Later SSH was extended with file transfer protocols: first SCP (in SSH 1.x), then SFTP (in SSH2). Version 1 of the SSH protocol is outdated, insecure and generally not recommended for use. Consequently SCP is used less and less, and SFTP gains popularity day by day.

The “SFTP” abbreviation is often mistakenly used to specify some kind of Secure FTP, by which people most often mean FTPS. Another (similar) mistake is that SFTP is thought to be some kind of FTP over SSL. In fact SFTP is an abbreviation of “SSH File Transfer Protocol”. This is not FTP over SSL and not FTP over SSH (which is also technically possible, but very rare).

SFTP is a binary protocol. It runs over the SSH transport, which is standardized in RFC 4253; the SFTP protocol itself is described in a series of IETF drafts (draft-ietf-secsh-filexfer) rather than in a published RFC. All commands (requests) are packed into binary messages and sent to the server, which answers with binary reply packets. In later versions SFTP has been extended to provide not just file upload/download operations, but also some file-system operations, such as file locking, symbolic link creation etc.
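To make the “binary messages” concrete, here is a minimal Python sketch of how a single SFTP packet is framed, following the layout in the filexfer drafts: a 4-byte big-endian length, one type byte, then the payload. The helper names are hypothetical, and protocol version 3 is used only because it is the version most implementations speak; this illustrates the framing and is not a working client.

```python
import struct

SSH_FXP_INIT = 1  # packet type of the client's first message (per the filexfer drafts)


def build_sftp_packet(packet_type: int, payload: bytes) -> bytes:
    """Frame a message: uint32 length, one type byte, then the payload."""
    body = bytes([packet_type]) + payload
    return struct.pack(">I", len(body)) + body


def parse_sftp_packet(data: bytes):
    """Split a framed packet back into (type, payload)."""
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    return body[0], body[1:]


# SSH_FXP_INIT carries only the protocol version as a uint32 (3 is an assumption here).
init_packet = build_sftp_packet(SSH_FXP_INIT, struct.pack(">I", 3))
print(init_packet.hex())  # 000000050100000003
```

Nothing in such a packet is readable as text, which is why SFTP traffic cannot be logged “as is” the way FTP commands can.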

Both FTPS and SFTP use a combination of an asymmetric algorithm (RSA, DSA), a symmetric algorithm (DES/3DES, AES, Twofish etc.) and a key-exchange algorithm. For authentication FTPS (or, to be more precise, the SSL/TLS protocol under FTP) uses X.509 certificates, while SFTP (the SSH protocol) uses SSH keys.

X.509 certificates include the public key and certain information about the certificate owner. This information lets the other side verify the integrity of the certificate itself and the authenticity of the certificate owner. Verification can be done both by computer and, to some extent, by a human. An X.509 certificate has an associated private key, which is usually stored separately from the certificate for security reasons.
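As an illustration of certificate-based verification on the FTPS side, the sketch below uses Python’s standard ftplib and ssl modules to prepare a client that validates the server’s X.509 chain and host name. The function name is hypothetical, and the commented-out host and credentials are placeholders, not values from this article.

```python
import ssl
from ftplib import FTP_TLS


def make_verified_ftps_client() -> FTP_TLS:
    """Return an FTPS client that validates the server certificate."""
    # The default context checks the certificate chain against the system
    # trust store and compares the certificate with the host name.
    context = ssl.create_default_context()
    return FTP_TLS(context=context)


client = make_verified_ftps_client()
# Typical use (placeholders, not executed here):
# client.connect("ftp.example.com", 21)
# client.login("user", "password")
# client.prot_p()  # also encrypt the DATA connection
```

No connection is opened until connect() is called, so the snippet runs offline as written.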

An SSH key contains only a public key (the associated private key is stored separately). It doesn’t contain any information about the owner of the key. Nor does it contain information that lets one reliably validate its integrity and authenticity. Some SSH software implementations use X.509 certificates for authentication, but in fact they don’t validate the whole certificate chain: only the public key is used (which makes such authentication incomplete and similar to SSH key authentication).
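Because an SSH key carries no owner information, in practice it is usually checked by comparing fingerprints obtained over a trusted channel. The following sketch computes an OpenSSH-style SHA256 fingerprint from a public key line. The function name is hypothetical and the key data is deliberately fake, built only so that the example is self-contained.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Compute an OpenSSH-style SHA256 fingerprint of a public key line."""
    # A public key line looks like: "<type> <base64 blob> <comment>"
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest in base64 without the trailing padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Fake blob for illustration only; a real line comes from a .pub file.
example_blob = base64.b64encode(b"not a real key, illustration only").decode("ascii")
example_line = f"ssh-ed25519 {example_blob} user@example"
fingerprint = ssh_fingerprint(example_line)
print(fingerprint)
```

The fingerprint proves only that two parties are looking at the same key; unlike an X.509 certificate, it says nothing about who owns it.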

Here’s the brief list of Pros and Cons of the two protocols:

FTPS

Pros:

Widely known and used

The communication can be read and understood by a human

Provides services for server-to-server file transfer

SSL/TLS has excellent certification mechanisms (X.509 certificate features)

FTP and SSL/TLS support is built into many internet communication frameworks.

Cons:

Doesn’t have a uniform directory listing format

Requires a secondary DATA channel, which makes it hard to use behind firewalls

Doesn’t define a standard for file name character sets (encodings)

Not all FTP servers support SSL/TLS

Doesn’t have a standard way to get and exchange file and directory attributes

SFTP

Pros:

Has excellent standards background which strictly defines most (if not all) aspects of operations

Has only one connection (no need for DATA connection)

The connection is always secured

The directory listing is uniform and machine-readable

The protocol includes operations for permission and attribute manipulation, file locking and more functionality

Cons:

The communication is binary and can’t be logged “as is” for human reading

SSH keys are harder to manage and validate

The standards define certain things as optional or recommended, which leads to certain compatibility problems between different software titles from different vendors.

No server-to-server copy and recursive directory removal operations

No built-in SSH/SFTP support in VCL and .NET frameworks

What to choose

As usual, the answer depends on what your goals and requirements are. In general, SFTP is technologically superior to FTPS. Of course, it’s a good idea to implement support for both protocols, but they are different in concepts, in supported commands and in many other things.

It’s a good idea to use FTPS when you have a server that needs to be accessed from personal devices (smartphones, PDAs etc.) or from some specific operating systems which have FTP support but don’t have SSH / SFTP clients. If you are building a custom security solution, SFTP is probably the better option.

As for the client side, the requirements are defined by the server(s) that you plan to connect to. When connecting to Internet servers, SFTP is more popular because it’s supported by Linux and UNIX servers by default.

For private host-to-host transfer you can use both SFTP and FTPS. For FTPS you would need to search for free FTPS client and server software or buy a license for a commercial one. For SFTP support you can install the OpenSSH package, which provides free client and server software.

Developer tools

If you are a software developer and need to implement file transfer capability in your application, you will be searching for the components to do the job.

In .NET you have built-in support for FTPS in .NET Framework (see FtpWebRequest class). But functionality of this class is severely limited, especially in SSL/TLS control aspect.

.NET Framework doesn’t include any support for SSH or SFTP.

In VCL you have a selection of free components and libraries which provide FTP functionality. When you add OpenSSL to them, you can get FTPS for free. If you don’t want to deal with OpenSSL DLLs, you can use one of the commercially available libraries for SSL and FTPS support. Again, there are no freeware SFTP components available for VCL.

If you use a tool with which you have to use ActiveX controls, you need to search for commercial FTPS or SFTP controls. No free controls are available.

SecureBlackbox library provides both FTPS and SFTP support for .NET, VCL and ActiveX technologies.

3 Tips For Fully Utilizing Your Robots TXT File

Tuesday, December 29th, 2009

When it comes to administering the back-end part of your website, especially if this is a sales site that you are going to use to promote a product to make money, one of the important items that should be included in your server files is the “robots.txt” file. Depending on the type of website you have and the purpose that it serves for you, the information that you include in your robots file is of varying importance. When you are first setting up your website, this is something that you only need to do once, but it can have important benefits months and years into the future.

The purpose of the “robots.txt” file is to instruct search engine web bots or spiders as to which content should be indexed and which content should be avoided. There are three vital tips that can help you to gain the maximum benefits from using this file in the right way on your server.

Protect Your Files, Software, And Documents

Many online businesses have a business model based on the digital delivery of a product, whether that product is a piece of software or an ebook that contains certain important information that the buyer needs. Software piracy is a major concern for this type of business model, and unfortunately with an ebook product it is much easier to hurt your business because your documents can be stored in a search engine quite easily. By instructing web bots not to catalog certain material you can help to make sure that these files or software remain private.

Keep Others From Infringing Your Copyrighted Material

Many types of websites such as a photography website, a stock photo marketplace, a premium desktop wallpaper site or any other type of site that is largely graphics-intensive might have a large images folder which you would not want to be stored in the memory of any search engines. You may also have articles or documents that you sell to provide a solution to a given problem, or maybe you simply do not want other authors out there to copy your articles word-for-word. By including a statement in your “robots.txt” file that says “Disallow: /folder/” where you insert the name of the folder where your material is stored you can prevent search engine spiders from indexing any of this content.
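To see how a well-behaved crawler interprets such a Disallow line, here is a small sketch using Python’s standard urllib.robotparser module. The folder name /images/, the bot name ExampleBot and the example.com URLs are assumptions made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay out of one folder.
rules = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

images_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/images/photo.jpg")
home_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print(images_allowed, home_allowed)  # False True
```

This only models crawlers that choose to obey the file; it does nothing to stop a client that ignores it.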

Prevent Web Bots From Utilizing Excessive Bandwidth

If you have a large website then there is a chance that your images folder could have as many as thousands of different images which could take up gigabytes of space. If a search engine spider stumbles upon this folder it could potentially lead to an unwanted increase in server bandwidth. Taking steps to prevent this from happening by instructing web bots to ignore your images folder or other folders containing large files could make sure that you do not receive higher website hosting invoices due to increased bandwidth.

It is vital to remember that while most search engine spiders are programmed to honor the data that is presented in the robots file, do not fully assume that all of the files and archives on your site will never be indexed or copied simply because this file says that they shouldn’t be. A computer programmer that does not have your best interest at heart can program a web bot to simply store all information and files it finds into its own cache memory, and if you are running a website where you sell a digital product then this could potentially harm your business because once your documents are copied they can then be distributed or catalogued in the search engines.

Robots.txt File: How to Benefit From it Most

Thursday, December 24th, 2009

 

On the web, you will find many types of websites where you can get the information that you need. You can simply query a search engine and it will already have lists of web pages that may match your needs. Anyone can also have their own site and publish whatever content they want to share with other people. However, some people create web pages that they do not intend others to find through search. For this, they use the robots.txt file.

You may have pages that you do not want others to view because they are not yet finished, or because they contain information that is irrelevant to most people. In that case you can use the file to keep search engines from leading others to those pages.

A search engine crawler generally follows the robots exclusion protocol, that is, the robots.txt file, if one is present on the server. The main use of this file is to determine which pages of a website may be accessed by the search engine and which may not. This keeps web robots from crawling pages that have sensitive content not intended for other viewers. However, the file only prevents crawlers from accessing the pages; it does not necessarily keep the site from being indexed.

Some people have problems with their sites not being listed in search engines. When this is the case, the cause is often incorrect use of the robots.txt file: the file explains why certain sites are not listed in the search engines, or why crawlers cannot access the whole site. Once you fix the problem with the file, the site will be indexed and should soon get better traffic.

When a person does not use the file correctly or writes its rules the wrong way, the result can be the opposite of what was intended. You should therefore know what the file does and how it should work for your needs. People who want their whole website to be viewed by others do not need to restrict anything in robots.txt. But for those who want certain unfinished or private pages not to be indexed, proper use of the file will be a huge help.

The robots.txt file has its limits, since there is no way to stop other sites from linking to your site. Still, the file helps you keep search engines from accessing some web pages or your whole website; the site remains available, only not through the search engines. Use the file carefully so that the results match what you want.

Search Engine Spiders And Your Robots.txt File

Wednesday, December 23rd, 2009

In this article we will discuss search engine spiders and what they do. You will also learn how to make a robots.txt file and why you might need one.
Search engine spiders are automated software programs that crawl the Web looking for pages to feed to search engines. They are also called crawlers, robots and bots. Spiders are one of the most useful programs on the internet. They are a key part in how the search engines operate. Spiders allow your site to be found by the millions of people who use search engines. Feed the spiders right and they will tell the search engines about your site.
How Spiders Work
A search engine is an index to the Internet; search engines point to relevant web sites depending on your search. Search engines need a tool that is able to visit websites, navigate them, decide what each website is about and add that data to the search engine.
Spiders are essentially programs that “crawl” sites and report back to their boss their findings. Their purpose in life is to make it simple for your site to get listed in search engines.
Spiders work by finding links to web sites, visiting those web sites, going through the content of a web site and then reporting the content of the site back to the database of the search engine they work for. From there, the information is added to the search engine, and the site then shows up in search results.
The robots.txt file
By defining a few rules, you can tell robots not to crawl certain directories or files within your site. Web sites do not absolutely have to have a robots.txt file; they can get along just fine without one. Most spiders look for a robots.txt file as soon as they arrive on your site. Take a look at your site statistics. If your statistics have a “files not found” section, you may see many entries where spiders failed to find the file on your site.
The default behavior is to allow all unless you have a Disallow for that resource. If you wish to exclude some of your pages from search engine indexing, this is the tool approved by the search engines. Making a robots.txt file that guides spiders is simple.
If you want to allow the spiders to crawl your site but exclude directories of your choice, copy and paste the following into a blank txt file:
User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/
To exclude files of your choice, type in the path to the files you want to exclude:
User-agent: *
Disallow: /directory1/page1.html
Disallow: /directory2/page2.html
Disallow: /directory3/page3.html
To exclude all the search engine spiders from your entire web site, copy and paste the following into the txt file:
User-agent: *
Disallow: /
This will keep a specific search engine spider from indexing your site:
User-agent: Name_of_Robot
Disallow: /
To allow a single robot and exclude all other robots:
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
There can only be one robots.txt on a site, and you may not have blank lines within a record (blank lines separate records). Once you have it the way you want, save the file as “robots” with a .txt extension. Upload the file to the root directory of your site, that is, the directory where your home page or index page is. Place the robots.txt file right alongside the index file.
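Before uploading, the rules can be checked locally. The sketch below feeds the “allow a single robot” example to Python’s standard urllib.robotparser and asks whether two crawlers may fetch a page. The helper name check_access, the name OtherBot and the /page.html path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def check_access(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(check_access(ROBOTS_TXT, "Googlebot", "/page.html"))  # True
print(check_access(ROBOTS_TXT, "OtherBot", "/page.html"))   # False
```

The empty Disallow line in the Googlebot record blocks nothing, while the catch-all record blocks everything for every other robot.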

Magic of .htaccess File in Internet Marketing

Monday, December 21st, 2009

The .htaccess file has the power to improve your website. Even though .htaccess is only a file, it can change settings on the server and allow you to do many different things, the most popular being custom 404 error pages. .htaccess isn’t hard to use and is really just made up of a few simple instructions in a text file. If your server runs the Apache web server, as most Unix and Linux hosts do, it will support .htaccess, although your host may not allow you to use it.

There is a huge range of things .htaccess can do including:

password protecting folders, redirecting users automatically, custom error pages, changing your file extensions, banning users with certain IP addresses, only allowing users with certain IP addresses, stopping directory listings and using a different file as the index file.

Custom Error Pages

The first use of the .htaccess file which I will cover is custom error pages. These will allow you to have your own, personal error pages (for example when a file is not found) instead of using your host’s error pages or having no page.

This will make your site seem much more professional in the unlikely event of an error. It will also allow you to create scripts to notify you if there is an error (for example I use a PHP script on Free Webmaster Help to automatically e-mail me when a page is not found).

You can use custom error pages for any error as long as you know its number (like 404 for page not found) by adding the following to your .htaccess file:

ErrorDocument errornumber /file.html

If the file is not in the root directory of your site, you just need to include the path to it:

ErrorDocument 500 /errorpages/500.html

These are some of the most common errors:

401 – Authorization Required

400 – Bad Request

403 – Forbidden

500 – Internal Server Error

404 – Not Found

Then, all you need to do is to make a file to show when the error happens and upload it and the .htaccess file.

The Easy Guide to Making a Robots.txt File

Monday, December 21st, 2009

If you have a website you really need to have a robots.txt file. It gives search engine spiders specific commands and it is easy to use and easy to maintain. Here is an easy guide to a robots.txt file in five minutes.

There are times when you don’t want a search engine to index a page or a folder on your website. Maybe you have some information you just don’t want to have show up in Google. This may include your statistics page, a page of notes, or a dynamic page. And, importantly, if you use Google AdSense and the search tool that displays search results on your website, Google mandates that you exclude this page from search engines, which means you need a robots.txt file.

A robots.txt file is a simple document named robots.txt and saved in the root folder of your website. Search engines see this and follow any commands it contains. Create a simple text document using any plain-text editor such as Notepad and put these two lines in it:

User-agent: *

Disallow:

The first line tells all spiders to listen up because the following command is for them. The second line lists what should not be indexed; left empty, as here, it blocks nothing. It is here that you put the URL path of any pages you don’t want spidered. So if you wanted the spiders to skip your private page it looks like this:

Disallow: /privatepage.htm

If you want the spiders to skip a whole folder you put the path of that folder with a trailing slash like this:

Disallow: /privatefolder/

Simply place this text file in the root folder of your website and you are done. In the future you can add and remove commands easily.
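If you maintain the list of excluded pages in a script, the file can be generated rather than typed by hand. The following is a minimal sketch under that assumption; build_robots_txt is a hypothetical helper, and the generated text is checked with Python’s standard urllib.robotparser before it would be uploaded.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build one robots.txt record; an empty list produces a bare 'Disallow:'."""
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        lines.append("Disallow:")  # nothing is blocked
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(["/privatepage.htm", "/privatefolder/"])
print(robots_txt)

# Sanity check: parse the generated text the way a polite crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
private_blocked = not parser.can_fetch("AnyBot", "/privatefolder/notes.htm")
public_allowed = parser.can_fetch("AnyBot", "/index.htm")
```

Writing robots_txt to a file named robots.txt in the site root is then the only remaining step.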

The robots.txt file is a very simple file to write and maintain and it is a very powerful tool that will help you interact successfully with search engines. This disallow command is the simplest and most used command but there are also many other commands you can use and if you have a website it is well worth your time to have a robots.txt file and even to research it a bit further.

For more interesting insights into being a creative webmaster and making your website work for you, visit the author’s site at: The Creative Webmaster – Forging the Iron of Creativity on the Anvil of a Website

Create A .htaccess File Without Referral Spam

Monday, December 21st, 2009

At present, there is a growing nuisance for users and administrators alike of sites that run web servers and, more particularly, blogs. This nuisance is referred to as comment, trackback and referrer spam. Various solutions have been proposed, some of which address two of these forms of spam with a single solution.
What is Referral Spam?
The Referer request-header field allows the client to specify the address (URI) of the resource from which the Request-URI was obtained. It is a way for an HTTP client to send, in the headers, the URI of the page that sent it there. This is especially handy for a site administrator, as it provides insight into where the traffic on his web server is coming from. It is also relied upon by the most popular web server log analyzers to provide statistics on the most common referrers.
The HTTP Referer: header (HTTP spells the field name with a single “r”) is very useful but it is also completely unreliable. Any web browser or HTTP client is free to send a forged Referer: header with any request to a web server. Spammers have taken advantage of the fact that there is no provision for authenticating this header in HTTP and have used the existing openness to specially craft requests with their website in the Referer: header.
Most people will find it hard to understand why anyone would bother spamming something which only the site administrator will see in the logs. One probable motivation is the boosting of search engine ranking. Another is simply to show up in any stats published by the site. If a site being spammed runs web server log analysis software and publishes the results, the spammer easily obtains a link to his URL in the top referrers section.
A serious consequence of referrer spam is that the process is often performed via an HTTP “GET” or “POST” request which retrieves the entire body of the document being spammed. A 30k document, for example, will have all 30k transferred across one’s Internet pipe. This results in a considerable amount of traffic on the web server, which can be very costly since bandwidth is not cheap.
Referrer spam wastes CPU and disk space and can be a source of endless annoyance to server operators. It is being actively fought by search engine developers, so its initial effectiveness in boosting a site’s ranking has been greatly lessened. But the problem persists and much has to be done to overcome it.
Some recommended practices in countering the threat of referral spam include the non-publication of referrers by bloggers, inclusion of the page in robots.txt when referrers have to be published, use of the rel="nofollow" attribute and gathering a cleaner list of referrers using JavaScript and beacon images. Some bloggers have begun fighting referrer spammers at the .htaccess level. Others have even taken steps to automate this.
Blocking Users by Referrer Notes
A very useful feature of .htaccess is the ability to block users or sites that originate from a particular domain. When there are tons of referrals from a particular site with no single visible link to one’s own site from the said site, the referral probably isn’t a legitimate one. The other site is most likely hotlinking to certain files such as images, CSS files or other files. Blocking access by referrer in .htaccess requires the help of the Apache module mod_rewrite to identify the referrer first. There is a fear that spam would still come in even as the .htaccess file continues to grow. Blacklisting certain referrers in .htaccess is another option, the effectiveness of which has been greatly diminished due to the ease with which spammers are able to register thousands of domains and rotate them as quickly as they are blacklisted.
An .htaccess generator can be used to prevent people from certain IP addresses, domains or even countries from gaining access to a site or to specific folders. The full IP address has to be typed to block a specific IP. A partial IP address is used to block a range of IPs. Blocking a particular domain can be done by typing the domain without the www. The country’s domain extension is to be typed when blocking a country.
There is no limit to the entries that can be added one at a time. “Add” should be clicked after each entry, and the generated code is then copied and pasted into a plain text file. This file is then named .htaccess. Note the “.” before the file name as well as the absence of any file extension.
If there is already an .htaccess file in the root of the docs directory or the folder where it is to be applied, the generated code should be added to the end of the current .htaccess file, taking extra care not to disturb the existing code. It is then uploaded in ASCII mode.
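What such a generator produces can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the output of any particular generator: it emits Apache 2.2-style access directives (Order, Allow from, Deny from), which on Apache 2.4 require mod_access_compat or a rewrite to the Require syntax. The function names are hypothetical and the addresses come from documentation ranges.

```python
def build_deny_rules(blocked_entries):
    """Return Apache 2.2-style lines that allow everyone except the given entries.

    Entries may be full IPs ("192.0.2.10"), partial IPs ("198.51.100")
    or domains without the www ("example.com").
    """
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {entry}" for entry in blocked_entries]
    return "\n".join(lines) + "\n"


def append_rules(existing_htaccess: str, new_rules: str) -> str:
    """Append generated rules to the end of existing content, leaving it untouched."""
    if existing_htaccess and not existing_htaccess.endswith("\n"):
        existing_htaccess += "\n"
    return existing_htaccess + new_rules


rules = build_deny_rules(["192.0.2.10", "198.51.100", "example.com"])
merged = append_rules("ErrorDocument 404 /404.html", rules)
print(merged)
```

Appending rather than rewriting mirrors the advice above: whatever is already in the file stays exactly as it was.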
The rel="nofollow" solution
A coalition of blogging and search engine companies have joined together to support an HTML attribute designed primarily to combat comment spam but with high potential as well for effective use against referral spam. This attribute, known as rel="nofollow", is being praised by many bloggers as the ultimate solution to the prevailing problem. The idea is simple enough, with the toughest part being the matter of convincing the major players such as Google, Yahoo! and MSN to agree on it.
Tagging a link with the rel="nofollow" attribute prevents it from contributing to the linked site’s PageRank. This means that comment and referral spammers will not be rewarded for their illegitimate activities on websites that implement the attribute. The problem gets solved partially, but this solution is unable to end it.
This is explained by the fact that it is impossible to reach 100% adoption, so there will always be an incentive to spam. Spammers essentially do not care whether their techniques are effective on a specific site as long as they are effective in general. They need no particular reason to hit any site and will do so because their main target is the blogosphere as a whole. It is also quite unfortunate that the resources required to fight spam, particularly referral spam, are far larger than the resources needed to create it.
Referral spam is an HTTP request. The client doesn’t even need to acknowledge the response. All it may need is a simple packet with formatted text.
Spammers take pains to make a request look legitimate. The User-Agent string would look very much like MSIE. It used to be that spam came from a single IP, but things have certainly gotten more complex since then.
Referrer IPs can also be filtered against spam blacklists. Listing the referring URL in any section of a site’s web stats should be avoided if the IP is blacklisted. Do not process the request further once a given site is identified as a referral spam host name.
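The filtering step can be sketched as follows. This is a simplified illustration that matches on the referrer’s host name rather than on IP addresses, which is an assumption made to keep the example self-contained; the function names and the blacklisted domain are hypothetical.

```python
from urllib.parse import urlparse


def is_blacklisted(referrer_url, blacklist):
    """Return True if the referrer's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blacklist)


def publishable_referrers(referrers, blacklist):
    """Keep only the referrers that are safe to list in public web stats."""
    return [url for url in referrers if not is_blacklisted(url, blacklist)]


blacklist = {"spam.example"}
referrers = [
    "http://www.example.org/links.html",
    "http://spam.example/cheap-pills",
    "http://shop.spam.example/",
]
clean = publishable_referrers(referrers, blacklist)
print(clean)  # ['http://www.example.org/links.html']
```

As noted above, spammers rotate domains quickly, so a list like this needs constant upkeep and only reduces the problem.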

Importance of the Robots.txt File

Monday, December 21st, 2009

Despite the importance of the robots.txt file in getting your website indexed with the major search engines, many webmasters don’t put one on their site. What is the robots.txt file, you ask? If you don’t know, you are far from alone. The robots.txt file is a simple text file (no HTML) that is placed in your website’s root directory in order to tell the search engines which pages to index and which to skip.

When a search engine sends its webcrawler to your site, one of the first things the webcrawler will do is search the root directory for the robots.txt file. A correctly formatted robots.txt file will consist of several records, each providing instructions for a particular search-bot. A record will generally consist of two components. The first is called the user-agent and is where the name of the search-bot is listed. The second consists of one or more “disallow” lines. These lines tell the webcrawler which files or folders should not be indexed (e.g. a cgi-bin folder).

If you currently have a website and do not have a robots.txt file, you can create one easily. As mentioned earlier, the files are plain text, so just open up Notepad and save the file as robots.txt. Most webmasters can use one record that will apply to all of the search engine crawlers. Once you have opened Notepad, enter the following:

User-agent: *
Disallow:

The “*” applies this rule to all bots. In this example, there is nothing listed in the disallow line. This tells the robot to index the entire site. You can also enter a folder path here such as “/confidential” if there is a folder that shouldn’t be indexed. This can be very useful if you are still testing a part of your website or if a section is still under construction.

Now that you know what should go into your robots.txt file, there are several common mistakes people make when creating these files. Avoid entering notes or comments into the file, as these can cause confusion for some webcrawlers. Also, the format should always be the user-agent on the first line, followed by the disallow(s). Do not reverse the order. Another common mistake involves using the wrong case. If the disallowed folder is /confidential, make sure your robots.txt file does not list the folder as /Confidential. It seems like a very minor issue, but it will cause problems if done incorrectly. Finally, the original robots exclusion standard has no Allow command, although some major crawlers recognize one as an extension. To be safe, assume you cannot tell the webcrawler what to look at, only what not to look at.

If you are still curious about the robots.txt file you can find many more complex examples online. Just visit one of your favorite websites and look for their robots.txt file. For example you can go to http://www.cnn.com/robots.txt. If you need help making a robots.txt file for your site, there are plenty of places online that will make the file for you for free. One example is http://www.seochat.com/seo-tools/robots-generator/. Despite its apparent simplicity, this file can make or break your site’s chances with the search engines. Make sure you have your robots.txt file in place and correctly formatted today.

How To Use Your .htaccess File To Keep Spammers Out

Monday, December 21st, 2009

Spammers have a knack for developing “overrides” to even the most open aspect of the system including those that are not readily recognized as potential targets. The .htaccess file can be used to keep e-mail harvesters away. This is considered very effectual since all of these harvesters get to identify themselves in some way using the user agent files which gives .htaccess the capability to block them.
Spams Countered by .htaccess
Terrible bots are the spiders that are considered to do a lot more harm than excellent to a site such as an e-mail harvester. Site rippers are offline browsing programs that a surfer may unleash on a site to crawl and download every one of its pages for offline viewing. Both cases would result to a jacking up a site’s bandwidth and resource usage even up to the point of loud the site’s server. Since terrible bots would typically ignore the desires of ones’ robots.txtfile they can be banned using the .htaccess essentially by identifying the terrible bots.
There is a useful code block that can be inserted into the .htaccess file for blocking a lot of the known terrible bots and site rippers currently existing. Affected bots will receive a 403 Forbidden Error when they attempt to view a protected site. This usually results to a significant bandwidth saving and decrease in server resource usage.
Bandwidth stealing, commonly referred to as hotlinking in the web community, means linking directly to non-HTML objects that are not on one’s own server, such as images and CSS files. The victim’s server is robbed of bandwidth and money, while the perpetrator shows the content without having to pay for its delivery.
Hotlinking to one’s own server can be disallowed with .htaccess. Anyone who attempts to link to an image or CSS file on a protected site is either blocked or served different content. Being blocked usually means a failed request, which shows up as a broken image. An example of different content would be an image of an angry man, presumably to send a clear message to the violators. mod_rewrite must be enabled on the server for this use of .htaccess to work.
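One way this might look, as a sketch rather than a definitive rule set. The domain example.com is a placeholder for the protected site, and the list of file extensions is an assumption:

```apache
RewriteEngine On
# Let through requests that carry no Referer header (direct visits, some proxies).
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred by the site itself (example.com is a placeholder).
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Everything else asking for these file types gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png|css)$ - [F,NC,L]
```

To serve a substitute image instead of an error, the final rule can rewrite to a replacement file, taking care that the replacement itself is not caught by the same rule.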
Disabling hotlinking of certain file types requires adding directives to an .htaccess file, which is uploaded either to the root directory or to a particular subdirectory to limit the effect to one section of the site. A server is typically configured to prevent directory listing. If this is not the case, the required line should be placed in the .htaccess file of the image directory so that nothing in that directory can be listed.
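For the directory listing case, the line in question is usually a single directive. This sketch assumes the host's AllowOverride setting permits Options in .htaccess:

```apache
# Turn off automatic directory listings for this directory and everything below it.
Options -Indexes
```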
The .htaccess file can also reliably password protect directories on a website. Other options can be used, but .htaccess is regarded as the most secure of them. Anyone wishing to get into the directory must know the password, and no “back doors” are provided. Password protection using .htaccess requires adding the appropriate lines to the .htaccess file in the directory that is to be protected.
Password protecting a directory takes a little more work than the other functions of .htaccess, because a file containing the usernames and passwords that are allowed to access the site has to be created. It can be placed anywhere within the website, although it is advisable to store it outside the web root so that it cannot be accessed from the web.
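A minimal sketch of the lines involved, using HTTP Basic authentication. The realm name and the password file path are hypothetical, and the path is deliberately outside the web root:

```apache
# Prompt for a username and password using HTTP Basic authentication.
AuthType Basic
AuthName "Restricted Area"
# Hypothetical location of the password file, kept outside the web root.
AuthUserFile /home/example/.htpasswd
# Any user listed in the password file may enter.
Require valid-user
```

Basic authentication sends credentials with only light encoding, so it is best combined with HTTPS where that is available.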
Recommended Practices to Deter Spam
Not publishing referrers is one way of discouraging spammers. There is little point in sending spoofed requests to blogs when the referrer information is never displayed. Unfortunately, most bloggers believe that being able to click on a link such as “sites referring to me” is a neat feature and have not considered its detrimental effect on the blogosphere as a whole.
If publishing referrers is a definite must, there should be built-in support for a referral spam blacklist, and the referrers page should be listed in robots.txt. This specifically tells Googlebot and its relatives not to index the referrers page, so spammers are unable to get the page rank they seek. This only works, however, when referrers are published separately from the rest of the site’s content.
Using rel="nofollow" likewise denies spammers their desired page rank, and does so at the link level rather than only at the page level as robots.txt does. Every link in the referrer section of the website that points to an external website should carry this attribute, without exception, for maximum protection.
Referrer statistics gathered from beacon images loaded via JavaScript document.write statements are more reliable than what the raw web server logs contain. One option is to disregard the referrer section of a site’s server logs completely and gather a cleaner list of referrers from JavaScript and beacon images instead.
The current Master Blacklist File can be a powerful and efficient weapon against spam. A log file analysis program that filters referrers against this list can help root out spam. The Master Blacklist is a simple text file that can be downloaded from a website or simply mirrored. It is far from perfect, however: checking the file against the referrers that got through shows that few or none of them were listed.
The idea of combating comment spam with DNS-based blackhole lists could also be used to ferret out other forms of spam, such as referral spam. The proposal is quite simple: for any request that carries a referrer, query the requesting IP against a blacklist. If the IP is blacklisted, or has a high score across a number of blacklists, the referring URL should not be listed in any section of the site’s web stats. Once a given site has been identified as a referral spam host name, there is no need, as a matter of efficiency, to query the blacklist again for other IPs that send the same host name in the HTTP request.
Various forms of spam have grown exponentially along with the popularity of blogs. This is probably due to the very few restrictions placed on who can post a comment, which is easily exploited by spammers intent on getting their goods in front of people. Spammers have automated tools on constant look-out for blogs that can easily be spammed. Spamming in all its forms carries grave consequences for those trying to use the Internet and the World Wide Web in a productive way.

What A .htaccess File Is And How To Make One

Monday, December 21st, 2009

A .htaccess file is a simple ASCII file, like one produced with a text editor such as Notepad or SimpleText. Most people are confused by the naming convention for the file. The name is not file.htaccess or somepage.htaccess: .htaccess is the file extension, and the file is simply named that. Its best-known uses are implementing custom error pages and password-protected directories.
Making the File
The file is created by opening a text editor and saving an empty page as .htaccess. If the editor does not allow saving an empty page, simply type in one character. An editor will probably append its default file extension to the name. Notepad, for one, would call the file .htaccess.txt, but the .txt or other file extension needs to be removed before the user can start “htaccessing”. This can be done by renaming the file and removing anything that doesn’t say .htaccess. It can also be renamed via telnet or the FTP program.
These files must be uploaded in ASCII mode, not binary. Users can CHMOD the .htaccess file to 644 so that the server can read it while other users cannot modify it. The file should also never be readable through a browser, since this can seriously compromise security. When there are password-protected directories and a browser can read the .htaccess file, the location of the authentication file can be obtained and used to reverse engineer the list, giving full access to any part that had previously been protected. This can be prevented either by placing all authentication files above the web root, so that they are inaccessible from the web, or through a series of .htaccess commands that prevents the file itself from being accessed by a browser.
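A sketch of the self-protecting commands mentioned above, written in Apache 2.4 syntax. On Apache 2.2 and earlier the equivalent would be "Order allow,deny" followed by "Deny from all". Many default Apache configurations already include a rule like this, so it may be redundant on a given host:

```apache
# Refuse browser requests for .htaccess, .htpasswd and similarly named files.
<Files ".ht*">
    Require all denied
</Files>
```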
Most commands in .htaccess are meant to be placed on one line only, so if a text editor uses word wrap, it should be disabled, as it might insert characters that conflict with Apache. .htaccess is an Apache feature and is not for NT servers. Apache is not tolerant of malformed content in an .htaccess file: a syntax error typically produces a 500 Internal Server Error for every request to that directory.
The directory in which the .htaccess file is placed is affected, as are all of its sub-directories. If a user does not want certain .htaccess commands to affect a specific directory, this is done by placing a new .htaccess file in that directory whose commands override the ones that should not apply there. Commands in the .htaccess file nearest to the requested directory take precedence over those in files higher up. A global .htaccess located in the root, if it is the nearest, affects every single directory in the entire site.
Placement of .htaccess files should not be done haphazardly, as this may result in redundancy and may produce an infinite loop of redirects or errors. Some sites do not allow .htaccess files, because a server hosting many domains can be slowed down when all of them use .htaccess files. An .htaccess file can also compromise a server configuration specifically set up by the administrator. It is therefore necessary to make sure that the use of .htaccess is allowed before actually using it.
Error documents are only one of the general uses of .htaccess. Specifying one’s own customized error documents requires a command within the .htaccess file. The pages can be named anything and can be placed anywhere within the site as long as they are web-accessible through a URL. The best names are those that make it obvious what the page is used for.
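As a brief illustration, the command is ErrorDocument, followed by the status code and the URL path of the page. The file names here are hypothetical:

```apache
# Map HTTP status codes to custom pages (paths are relative to the site root).
ErrorDocument 403 /errors/forbidden.html
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /errors/servererror.html
```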
Password protection is handled effectively by .htaccess. A file called .htpasswd is created, and the usernames and encrypted passwords of the people to be allowed access are placed in it. For maximum security, the .htpasswd file should likewise not be uploaded to a directory that is web accessible.
Entire directories of a site can be redirected using the .htaccess file without the need to specify each file. Any request made to the old site is then redirected to the new site, with the extra information in the URL added on. This is a very powerful feature when used correctly.
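A minimal sketch using the Redirect directive from mod_alias. The directory names and the example.com host are placeholders:

```apache
# Requests under /oldsite go to the same path on the new location,
# e.g. /oldsite/page.html -> http://www.example.com/newsite/page.html
Redirect 301 /oldsite http://www.example.com/newsite
```

The 301 status marks the move as permanent. A plain "Redirect" without a status sends a temporary (302) redirect instead.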
Aside from custom error pages, password-protected folders and automatic redirection of users, .htaccess can also change file extensions, ban users with certain IP addresses, allow only users with certain IP addresses, stop directory listing and use a different file as the index file. Accessing a site that has been protected by .htaccess causes the browser to pop up a standard username/password dialog box. However, there are scripts available that allow a username/password box to be embedded in a web page to do the authentication. The wide variety of uses of .htaccess offers time-saving options and increased security for a website.
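Two of these uses might be sketched as follows, in Apache 2.4 syntax. The IP address is a documentation-only placeholder and the file names are assumptions:

```apache
# Ban a single address while admitting everyone else.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.10
</RequireAll>

# Use home.html as the index file, falling back to index.html.
DirectoryIndex home.html index.html
```

To allow only certain addresses instead, the block can be replaced with a single "Require ip" line listing them.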
Many hosts support .htaccess but do not publicize it, while many others have the capability for it but do not allow their users to have an .htaccess file. Generally, a server that runs UNIX or any version of the Apache web server will support .htaccess, even though the host may not allow its use.
When to Use .htaccess Files
The .htaccess files should be used only when there is no access to the main server configuration file. Contrary to common belief, user authentication does not always have to be done in .htaccess files. The preferred way is to place user authentication configuration in the main server configuration.
It should be used in situations where the content provider needs to make configuration changes to the server on a per-directory basis but does not have root access on the server system. Individual users can be permitted to make these changes in .htaccess files themselves if the server administrator is unwilling to make frequent configuration changes. As a general rule, the use of .htaccess should be avoided when possible, since the same configuration can be made just as effectively in a Directory section in the main server configuration file.
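To illustrate the equivalence, here is a sketch of a Directory section in the main server configuration. The path and the AddType line are illustrative assumptions. Placing the same AddType line in an .htaccess file inside /www/htdocs/example would have the same effect:

```apache
# In the main server configuration file (e.g. httpd.conf), not in .htaccess.
<Directory "/www/htdocs/example">
    AddType text/example ".exm"
</Directory>
```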
Two main factors warrant avoiding the use of .htaccess files: performance and security. Permitting .htaccess files causes a performance hit whether or not any are actually used, since Apache looks in every directory for such a file. The .htaccess file is also read every time a document is requested. Apache’s search includes .htaccess files in all higher-level directories, so that it has the full complement of directives to apply. As a result, a file requested from a directory such as /www/htdocs/example leads to four additional file-system accesses, one for each directory level up to the root, even if none of those files is present.
The use of .htaccess permits users to modify server configuration, which may produce uncontrolled changes. This privilege should be carefully considered before it is given to users. The use of .htaccess files can be completely disabled by setting the AllowOverride directive to None.
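A short sketch of that setting. It belongs in the main server configuration, and the directory path is a placeholder:

```apache
# With AllowOverride None, Apache does not read .htaccess files under this directory.
<Directory "/www/htdocs">
    AllowOverride None
</Directory>
```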