GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Mon Mar 23, 2009 3:07 pm Post subject: [Solved] Is Binary File ? |
|
|
Hello all!
Is there a way to tell whether a file is binary or not?
(For example, return 1 for a binary file and 0 for a non-binary file.)
I thought it was possible with a binary command or function, but I couldn't find one...
Is there an easy way to do that?
Thanks to all.
Last edited by GregLand on Sun Mar 29, 2009 11:08 am; edited 1 time in total |
|
|
 |
Hooligan VDS Developer


Joined: 28 Oct 2003 Posts: 480 Location: California
|
Posted: Tue Mar 24, 2009 2:49 am Post subject: |
|
|
Essentially, all files are binary. It's just that some of them can be read directly as ASCII or Unicode text. This is usually identified by the extension (e.g. filename.txt). Programmatically determining whether or not a file is a text file can be quite challenging, as almost any file can be read in as ASCII, although it wouldn't make any sense when you read it...
Hooligan _________________ Hooligan
Why be normal? |
|
|
 |
Garrett Moderator Team
Joined: 04 Oct 2001 Posts: 2149 Location: A House
|
Posted: Tue Mar 24, 2009 5:09 am Post subject: |
|
|
I think you're just going to have to put together a list of file extensions known to be used only in binary format. _________________ 'What you do not want done to yourself, do not do to others.' - Confucius (550 b.c. to 479 b.c.) |
|
|
 |
GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Tue Mar 24, 2009 9:37 am Post subject: |
|
|
| Quote: | | I think you're just going to have to put together a list of file extensions known to be used only in binary format. |
In fact, I have a list of files (binary and text) and I would like to remove every binary file (Unicode files) from the list...
I can't use the file extension because some extensions are unknown (.gep for example) and the extension is not a reliable reference... .log files, for example, can be binary or text.
| Quote: | | It's just that some of those can be read directly as ascii or unicode text |
Does ASCII correspond to text files and Unicode to binary files? |
|
|
 |
Hooligan VDS Developer


Joined: 28 Oct 2003 Posts: 480 Location: California
|
Posted: Tue Mar 24, 2009 12:22 pm Post subject: |
|
|
| Quote: | | Does ASCII correspond to text files and Unicode to binary files? |
Strictly speaking, ASCII is a 7-bit character code with 128 characters: control characters, upper and lower case letters, digits and punctuation. The 8-bit character sets commonly called "ASCII" on Windows are code pages that extend it to 256 characters by adding special characters in the upper half. To get different characters, you have to change code pages (kind of like changing fonts).
Unicode was originally designed as a 16-bit character code, which allows for 65,536 values; it has since grown beyond that and is stored in encodings such as UTF-8 and UTF-16. This makes it easy to include characters from international languages, such as Japanese, Korean and Chinese.
We won't even go into EBCDIC...
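One practical way to spot a Unicode text file is to look for a byte-order mark (BOM) at the very start of the file. The sketch below is in C rather than VDS, and `detect_bom` and `bom_kind` are hypothetical names made up for illustration. It assumes the file was saved with a BOM, which is common for Windows tools but not guaranteed; a Unicode file without one will come back as `BOM_NONE`. It also does not tell UTF-32 apart from UTF-16, since the little-endian UTF-32 mark begins with the same two bytes.

```c
#include <stddef.h>

/* Hypothetical helper, not part of VDS: classify the first bytes of a
   file by their byte-order mark (BOM). */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    /* UTF-8 BOM: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    /* UTF-16 little-endian BOM: FF FE */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    /* UTF-16 big-endian BOM: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    /* No BOM: plain ASCII, a code page, BOM-less Unicode, or binary. */
    return BOM_NONE;
}
```

Read the first three bytes of the file into a buffer and pass them in; anything other than `BOM_NONE` suggests Unicode text rather than an arbitrary binary file.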
Hooligan _________________ Hooligan
Why be normal? |
|
|
 |
GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Tue Mar 24, 2009 1:27 pm Post subject: |
|
|
Ok Hooligan, thanks for the clarification...
So can VDS tell the difference between Unicode and ASCII files? |
|
|
 |
vdsalchemist Admin Team

Joined: 23 Oct 2001 Posts: 1448 Location: Florida, USA
|
Posted: Tue Mar 24, 2009 2:43 pm Post subject: |
|
|
GregLand,
VDS does not make a distinction between Unicode and ASCII files. Usually you can find a header (i.e. a structure that defines the layout of the file in question), in which case the file would be considered binary by most people, but not all. The Windows operating system does not make that distinction either: to Windows, all files are just streams of bytes. Many people here in the US see any file that has characters outside the printable ASCII character set (with the exception of carriage return, line feed, space and tab) as binary, but people in other countries may see it differently, since their files are in Unicode or use a different code page. VDS controls are all ASCII controls; they do not support Unicode. VDS does support multiple code pages, but they are still the 256 ASCII characters being presented differently. So the only real help that I can offer is this: first determine what you consider text, then look for characters in the file using the new VDS 6 I/O commands/functions or the older VDS BINFILE/@BINFILE commands/functions. As soon as you find a character that does not fit your definition of a text file, the file must be binary. _________________ Home of
Give VDS a new purpose!
 |
|
|
 |
GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Tue Mar 24, 2009 2:57 pm Post subject: |
|
|
Ok I understand your method.
I will try to find a character not common between ASCII and Unicode.
Thanks |
|
|
 |
uvedese Contributor


Joined: 21 Jan 2006 Posts: 169 Location: Spain
|
Posted: Tue Mar 24, 2009 3:33 pm Post subject: |
|
|
Hi GregLand:
My experience tells me that a text file usually uses characters whose ASCII code is between 32 and 255, with the exception of codes 13 (0D) and 10 (0A), which define new lines of text... |
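The rule above (codes 32 to 255, plus 13 and 10) can be written as a lookup table, which makes it easy to change the definition of "text" later. This is only a sketch in C, not VDS; `build_text_table` and `first_non_text` are hypothetical helper names, and the table holds nothing more than the rule as stated.

```c
#include <stddef.h>

/* Hypothetical helper: mark which byte values count as "text".
   Encodes the rule above: codes 32..255, plus CR (13) and LF (10). */
void build_text_table(unsigned char is_text[256])
{
    int i;
    for (i = 0; i < 256; i++)
        is_text[i] = (unsigned char)(i >= 32);
    is_text[10] = 1;  /* line feed */
    is_text[13] = 1;  /* carriage return */
}

/* Returns the offset of the first byte that is not text according to
   the table, or -1 if every byte in buf fits. */
long first_non_text(const unsigned char *buf, size_t len,
                    const unsigned char is_text[256])
{
    size_t i;
    for (i = 0; i < len; i++)
        if (!is_text[buf[i]])
            return (long)i;
    return -1;
}
```

Note that a tab (code 9) is not covered by this rule, so a tab-separated file would be flagged as binary; setting `is_text[9] = 1` after building the table allows tabs as well.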
|
|
 |
vdsalchemist Admin Team

Joined: 23 Oct 2001 Posts: 1448 Location: Florida, USA
|
Posted: Tue Mar 24, 2009 6:39 pm Post subject: |
|
|
| GregLand wrote: | Ok I understand your method.
I will try to find a character not common between ASCII and Unicode.
Thanks |
IMHO you would be better off building a list of known text file extensions and, if the file is not one of those, assuming it to be a binary file. _________________ Home of
Give VDS a new purpose!
 |
|
|
 |
GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Tue Mar 24, 2009 9:09 pm Post subject: |
|
|
I don't think I will use file extensions, because files can have the same extension but Unicode or ASCII content (e.g. .log, .dat ...).
I think this is not as reliable as checking the characters.
I am continuing my research, thank you for your help |
|
|
 |
DanTheMan Contributor


Joined: 15 Mar 2002 Posts: 56 Location: Sweden
|
Posted: Wed Mar 25, 2009 12:55 pm Post subject: |
|
|
This snippet of C code can give you an idea.
It opens the file as a byte stream, assuming it to be ASCII at first. The while loop checks
for a non-ASCII byte. I don't know if it's too slow, but it's
an idea anyway.
//Dan
| Code: | #include <stdio.h>
/* Usage: pass the file to check as the first command-line argument. */
int main(int argc, char *argv[])
{
    unsigned char ch;
    FILE *file;
    int binaryFile = 0;
    if(argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return(1);
    }
    file = fopen(argv[1], "rb"); // Open in binary mode for the first pass.
    if(file == NULL)
    {
        perror(argv[1]);
        return(1);
    }
    // Stop at the first control byte (or 255) that plain text normally does not contain.
    while(!binaryFile && (fread(&ch, 1, 1, file) == 1))
    {
        if(ch < 9 || ch == 11 || (ch > 13 && ch < 32) || ch == 255)
        {
            binaryFile = 1;
        }
    }
    fclose(file);
    // Reopen in binary or text mode depending on the result.
    file = fopen(argv[1], binaryFile ? "rb" : "r");
    if(file == NULL)
    {
        perror(argv[1]);
        return(1);
    }
    if(binaryFile)
    {
        while(fread(&ch, 1, 1, file) == 1)
        {
            // Do whatever you want here with the binary file byte...
        }
    }
    else
    {
        while(fread(&ch, 1, 1, file) == 1)
        {
            // This is ASCII data, can easily print it!
            putchar(ch);
        }
    }
    fclose(file);
    return(0);
} |  |
|
|
 |
DanTheMan Contributor


Joined: 15 Mar 2002 Posts: 56 Location: Sweden
|
Posted: Thu Mar 26, 2009 10:58 am Post subject: |
|
|
Here is an example in VDS code; I hope it works. The only problem
is that if the ASCII file contains special non-ASCII characters, it will be treated as binary...
| Code: | %%File = c:\x.x
BINFILE OPEN,1,%%File,READ
if @not(@ok())
Warn File not found !
Exit
end
%%Break =
while @both(@not(@binfile(EOF,1)),@not(%%Break))
%C = @binfile(read,1,BINARY,1)
%C = @SUBSTR(%C,1,@pred(@len(%C)))
# Check if binary
if @equal(%C,255)@equal(11,%C)@greater(9,%C)@both(@greater(%C,13),@greater(32,%C))
%%Break = 1
end
Wend
if %%break
Info File %%File is binary
else
Info File %%File is Ascii
end
exit
|
|
|
|
 |
GregLand Valued Contributor


Joined: 15 Jun 2004 Posts: 212 Location: FRANCE
|
Posted: Thu Mar 26, 2009 9:14 pm Post subject: |
|
|
Thank you very much!
I will try your code !  |
|
|
 |
Dr. Dread Professional Member


Joined: 03 Aug 2001 Posts: 1065 Location: Copenhagen, Denmark
|
Posted: Fri Mar 27, 2009 5:54 am Post subject: |
|
|
Dan's code should be quite reliable. But I'm also pretty sure that with VDS this kind of looping
would be sloooow if the file isn't small.
With binary files it would be OK, because the loop will probably find a binary-flagged char quite
soon and exit the loop. But with ASCII files you will loop through the entire file and that means
more than a million loops just for a 1MB file.
So I would suggest that you just do the check for perhaps the first 1000 chars only. Count the loops
and exit if you go beyond that count.
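The idea of sampling only the start of the file can be sketched in C like this. It is an illustration, not VDS code: `is_binary_prefix` and `SAMPLE_SIZE` are hypothetical names, the 1000-byte limit is just the figure suggested above, and the byte test is the one from Dan's C snippet. A file whose first 1000 bytes look like text but which turns binary later will be misclassified, which is the trade-off for speed.

```c
#include <stdio.h>

#define SAMPLE_SIZE 1000  /* number of leading bytes to inspect (assumed) */

/* Hypothetical helper: returns 1 if the first SAMPLE_SIZE bytes contain
   a byte flagged as binary, 0 if they do not, and -1 if the file cannot
   be opened. */
int is_binary_prefix(const char *path)
{
    unsigned char buf[SAMPLE_SIZE];
    size_t n, i;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, f);  /* one read instead of a byte loop */
    fclose(f);

    for (i = 0; i < n; i++) {
        unsigned char ch = buf[i];
        /* Same test as in Dan's snippet. */
        if (ch < 9 || ch == 11 || (ch > 13 && ch < 32) || ch == 255)
            return 1;
    }
    return 0;
}
```

Reading the sample in a single call also avoids the per-byte overhead; the same approach should carry over to VDS by reading one block and exiting the loop after a fixed count.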
Greetz
Dr. Dread _________________ ~~ Alcohol and calculus don't mix... Don't drink and derive! ~~
String.DLL * advanced string processing |
|
|
 |
|