forum.vdsworld.com Forum Index forum.vdsworld.com
[Solved] Is Binary File ?
forum.vdsworld.com Forum Index -> General Help
GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Mon Mar 23, 2009 3:07 pm    Post subject: [Solved] Is Binary File ?

Hello all!
Is there a way to know whether a file is binary or text?
(For example, return 1 for a binary file or 0 for a text file.)

I thought it might be possible with a binary command or function, but I couldn't find one...

Is there an easy way to do that?

Thanks to all!


Last edited by GregLand on Sun Mar 29, 2009 11:08 am; edited 1 time in total
Hooligan
VDS Developer


Joined: 28 Oct 2003
Posts: 480
Location: California

Posted: Tue Mar 24, 2009 2:49 am

Essentially, all files are binary; it's just that some of them can be read directly as ASCII or Unicode text. This is usually indicated by the extension (e.g. filename.txt). Programmatically determining whether a file is a text file can be quite challenging, as almost any file can be read in as ASCII, even though the result wouldn't make any sense...
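Because almost any file "reads" as ASCII, one common way to turn this point into a usable check is a statistical heuristic: count how many bytes look like plain text and compare the proportion against a threshold. The sketch below is in C (the compiled language used elsewhere in this thread). The function names `text_ratio` and `looks_like_text` and the 95% cut-off are hypothetical choices for illustration, not anything VDS provides.

```c
#include <stddef.h>

/* Returns the fraction (0.0 to 1.0) of bytes in buf that look like plain
   ASCII text: printable characters 32..126 plus TAB, LF and CR.
   An empty buffer is reported as 1.0 (nothing suspicious found). */
double text_ratio(const unsigned char *buf, size_t len)
{
    size_t textish = 0;
    size_t i;

    if (len == 0)
        return 1.0;

    for (i = 0; i < len; i++)
    {
        unsigned char c = buf[i];
        if ((c >= 32 && c <= 126) || c == '\t' || c == '\n' || c == '\r')
            textish++;
    }
    return (double)textish / (double)len;
}

/* Hypothetical cut-off: call the data "text" if at least 95% of it is
   text-like. The 0.95 value is an arbitrary assumption, not a standard. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    return text_ratio(buf, len) >= 0.95;
}
```

You would typically call `looks_like_text` on the first few kilobytes of a file read with `fread`. A ratio is more forgiving than stopping at the first odd byte, which helps with text that contains the occasional stray control character, but it also means the threshold has to be tuned for your files.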

Hooligan

_________________
Hooligan

Why be normal?
Garrett
Moderator Team


Joined: 04 Oct 2001
Posts: 2149
Location: A House

Posted: Tue Mar 24, 2009 5:09 am

I think you're just going to have to put together a list of file extensions known to be used only in binary format.
_________________
'What you do not want done to yourself, do not do to others.' - Confucius (550 b.c. to 479 b.c.)
GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Tue Mar 24, 2009 9:37 am

Quote:
I think you're just going to have to put together a list of file extensions known to be used only in binary format.

In fact, I have a list of files (binary and text) and I would like to remove every binary file (Unicode files) from the list...

I can't rely on file extensions, because some extensions may be unknown (.gep, for example) and an extension isn't a reliable indicator anyway: a .log file, for example, can be either a binary or a text file.

Quote:
It's just that some of those can be read directly as ascii or unicode text

Does ASCII correspond to text files, and Unicode to binary files?
Hooligan
VDS Developer


Joined: 28 Oct 2003
Posts: 480
Location: California

Posted: Tue Mar 24, 2009 12:22 pm

Quote:
Does ASCII correspond to text files, and Unicode to binary files?

ASCII is a 7-bit character code with 128 characters; the 8-bit "extended ASCII" code pages used on Windows give a total of 256 possible characters. These are generally made up of control characters, upper and lower case letters, digits, punctuation and extended (special) characters. To get a different set of extended characters, you have to switch code pages (kind of like changing fonts).
Unicode was originally designed as a 16-bit character code, which allows for 65,536 characters (it has since grown beyond 16 bits). This makes it easy to include characters from other languages, such as Japanese, Korean and Chinese.
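To see why a "Unicode" text file (UTF-16LE, which is what Windows usually means by Unicode) looks binary to a byte-by-byte check: in UTF-16LE, every character from U+0000 to U+00FF is stored as its code followed by a 0x00 byte, so "Hi" becomes the bytes 48 00 69 00. A minimal C sketch, assuming UTF-16LE without a byte order mark; `count_nuls` and `looks_like_utf16le` are hypothetical names and the "at least half" threshold is an arbitrary assumption:

```c
#include <stddef.h>

/* Counts NUL (0x00) bytes at even and at odd offsets in buf.
   In UTF-16LE, characters U+0000..U+00FF are stored as
   <code byte> 0x00, so the NULs land at odd offsets. */
void count_nuls(const unsigned char *buf, size_t len,
                size_t *even_nuls, size_t *odd_nuls)
{
    size_t i;
    *even_nuls = 0;
    *odd_nuls = 0;
    for (i = 0; i < len; i++)
    {
        if (buf[i] == 0)
        {
            if (i % 2 == 0)
                (*even_nuls)++;
            else
                (*odd_nuls)++;
        }
    }
}

/* Heuristic (an assumption, not a standard): treat the data as
   Latin-script UTF-16LE if there are no NULs at even offsets and
   at least half of the odd (high-byte) positions are NUL. */
int looks_like_utf16le(const unsigned char *buf, size_t len)
{
    size_t even_nuls, odd_nuls;
    if (len < 2)
        return 0;
    count_nuls(buf, len, &even_nuls, &odd_nuls);
    return even_nuls == 0 && odd_nuls * 2 >= len / 2;
}
```

Text written mostly in scripts such as Japanese or Chinese contains few zero bytes in UTF-16, so this heuristic only recognizes mostly Latin-script content.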

We won't even go into EBCDIC...

Hooligan

GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Tue Mar 24, 2009 1:27 pm

OK Hooligan, thanks for the clarification!
So can VDS tell the difference between Unicode and ASCII files?
vdsalchemist
Admin Team


Joined: 23 Oct 2001
Posts: 1448
Location: Florida, USA

Posted: Tue Mar 24, 2009 2:43 pm

GregLand,
VDS does not distinguish between Unicode and ASCII files. Often a file has a header (i.e. a structure that defines the layout of the file in question), in which case most people, though not all, would consider it binary. The Windows operating system does not make that distinction either: to Windows, all files are just streams of bytes.

Many of us here in the US consider any file containing characters outside the printable ASCII character set (with the exception of carriage returns, line feeds, spaces and tabs) to be binary, but people in other countries may see it differently, since their files are in Unicode or use a different code page. VDS controls are all ASCII controls; they do not support Unicode. VDS does support multiple code pages, but those are still the same 256 character codes presented differently.

So the only real help I can offer is this: first decide what you consider text, then examine the characters in the file using the new VDS 6 I/O commands/functions or the older VDS BINFILE command and @BINFILE function. As soon as you find a character that does not fit your definition of a text file, the file must be binary.
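The header idea can be checked directly: many binary formats start with a fixed signature (a "magic number") at offset 0. A minimal C sketch, assuming you only care about a handful of formats; the table is a small illustrative sample and `match_signature` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* A few well-known file signatures found at offset 0.
   This table is a small illustrative sample, not a complete list. */
struct signature {
    const char *name;
    const unsigned char *bytes;
    size_t len;
};

static const unsigned char SIG_EXE[] = { 'M', 'Z' };
static const unsigned char SIG_PNG[] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
static const unsigned char SIG_PDF[] = { '%', 'P', 'D', 'F', '-' };
static const unsigned char SIG_ZIP[] = { 'P', 'K', 0x03, 0x04 };

static const struct signature SIGNATURES[] = {
    { "Windows executable (MZ)", SIG_EXE, sizeof SIG_EXE },
    { "PNG image",               SIG_PNG, sizeof SIG_PNG },
    { "PDF document",            SIG_PDF, sizeof SIG_PDF },
    { "ZIP archive",             SIG_ZIP, sizeof SIG_ZIP },
};

/* Returns the name of the first signature that matches the start of
   buf, or NULL if none of the entries in the table match. */
const char *match_signature(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof SIGNATURES / sizeof SIGNATURES[0]; i++)
    {
        const struct signature *s = &SIGNATURES[i];
        if (len >= s->len && memcmp(buf, s->bytes, s->len) == 0)
            return s->name;
    }
    return NULL;
}
```

A match is strong evidence of a binary format, but no match proves nothing: unknown formats (like the .gep files mentioned above) have signatures that are not in the table, and a plain text file could in principle begin with "MZ" or "PK".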

_________________
Home of

Give VDS a new purpose!
GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Tue Mar 24, 2009 2:57 pm

OK, I understand your method.
I will try to find a character that distinguishes ASCII files from Unicode files.

Thanks
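One marker that often separates Unicode text files from ASCII ones is a byte order mark (BOM) at the very start of the file: EF BB BF for UTF-8, FF FE for UTF-16LE and FE FF for UTF-16BE. Windows Notepad's "Unicode" option, for instance, writes UTF-16LE with an FF FE mark. A minimal C sketch; `detect_bom` and the enum names are hypothetical, and many Unicode files have no BOM at all, so a missing mark does not prove the file is ASCII:

```c
#include <stddef.h>

enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16LE, BOM_UTF16BE };

/* Checks for a Unicode byte order mark at the start of buf.
   UTF-32 marks are not handled: FF FE 00 00 (UTF-32LE) would be
   reported as UTF-16LE by this sketch. */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16BE;
    return BOM_NONE;
}
```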
uvedese
Contributor


Joined: 21 Jan 2006
Posts: 169
Location: Spain

Posted: Tue Mar 24, 2009 3:33 pm

Hi GregLand:

In my experience, a text file usually uses only characters whose codes are between 32 and 255, the only exceptions being codes 13 (0D) and 10 (0A), which mark new lines of text...
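Taken literally, this rule of thumb is easy to express in C. A minimal sketch; `is_text_uvedese` is a hypothetical name, and note that the rule as stated also rejects TAB (9), which does appear in many text files:

```c
#include <stddef.h>

/* Rule of thumb from the post above: text uses only codes 32..255,
   plus 13 (CR) and 10 (LF) for line breaks. Anything else below 32
   (including TAB and NUL) makes the data count as binary. */
int is_text_uvedese(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
    {
        unsigned char c = buf[i];
        if (c < 32 && c != 13 && c != 10)
            return 0;
    }
    return 1;
}
```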
vdsalchemist
Admin Team


Joined: 23 Oct 2001
Posts: 1448
Location: Florida, USA

Posted: Tue Mar 24, 2009 6:39 pm

GregLand wrote:
OK, I understand your method.
I will try to find a character that distinguishes ASCII files from Unicode files.

Thanks


IMHO you would be better off building a list of known text file extensions and, if a file's extension is not in that list, assuming it is binary.
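A minimal C sketch of the whitelist approach, assuming Windows-style case-insensitive file names. The extension list and the function names are hypothetical; as GregLand points out in the next post, extensions such as .log or .dat can hold either kind of content, so this only works if your list contains extensions you trust.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of extensions treated as text. Extend to taste. */
static const char *TEXT_EXTENSIONS[] = { "txt", "ini", "csv", "htm", "html", "xml", "bat" };

/* Case-insensitive string equality (Windows file names ignore case). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 1 if the file name's extension is on the text list, 0 otherwise.
   Files without an extension, or with an unknown one, are assumed binary. */
int has_text_extension(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    size_t i;

    if (dot == NULL || dot[1] == '\0')
        return 0;
    if (strchr(dot, '\\') != NULL || strchr(dot, '/') != NULL)
        return 0; /* the dot belongs to a directory name, not the file name */
    for (i = 0; i < sizeof TEXT_EXTENSIONS / sizeof TEXT_EXTENSIONS[0]; i++)
    {
        if (equal_nocase(dot + 1, TEXT_EXTENSIONS[i]))
            return 1;
    }
    return 0;
}
```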

GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Tue Mar 24, 2009 9:09 pm

I don't think I will use file extensions, because files with the same extension (e.g. .log, .dat ...) can have either Unicode or ASCII content.

I think extensions are less reliable than checking the characters.

I am continuing my research; thank you for your help.
DanTheMan
Contributor


Joined: 15 Mar 2002
Posts: 56
Location: Sweden

Posted: Wed Mar 25, 2009 12:55 pm

This snippet of C code may give you an idea.

It first opens the file in binary mode and assumes it is ASCII. The while loop then checks each byte for control characters that rarely appear in text (and for 255). I don't know if it's too slow, but it's an idea anyway.

//Dan

Code:
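#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned char ch;
    FILE *file;
    int binaryFile = 0;    /* 0 = assume text until proven otherwise */

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* First pass: open in binary mode so no newline translation happens. */
    file = fopen(argv[1], "rb");
    if (file == NULL)
    {
        perror(argv[1]);
        return 1;
    }

    /* Stop at the first byte that is unlikely to appear in a text file:
       control codes other than TAB (9), LF (10), FF (12) and CR (13), or 255. */
    while (!binaryFile && fread(&ch, 1, 1, file) == 1)
    {
        if (ch < 9 || ch == 11 || (ch > 13 && ch < 32) || ch == 255)
        {
            binaryFile = 1;
        }
    }

    fclose(file);

    /* Second pass: reopen in the mode that matches what we found. */
    file = fopen(argv[1], binaryFile ? "rb" : "r");
    if (file == NULL)
    {
        perror(argv[1]);
        return 1;
    }

    if (binaryFile)
    {
        while (fread(&ch, 1, 1, file) == 1)
        {
            /* Do whatever you want here with the binary file byte... */
        }
    }
    else
    {
        while (fread(&ch, 1, 1, file) == 1)
        {
            /* This is ASCII data, so it can be printed directly. */
            putchar(ch);
        }
    }

    fclose(file);
    return 0;
}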
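On the speed question: calling `fread` for one byte at a time works, but a common alternative is to read a large block at once and scan it in memory. A minimal sketch of the same byte rule applied to a buffer; `is_binary_byte` and `buffer_is_binary` are hypothetical names, and the rule itself is copied from the code above:

```c
#include <stddef.h>

/* Same rule as the code above: a byte is "binary" if it is a control
   code other than TAB (9), LF (10), FF (12) or CR (13), or if it is 255. */
static int is_binary_byte(unsigned char ch)
{
    return ch < 9 || ch == 11 || (ch > 13 && ch < 32) || ch == 255;
}

/* Applies the rule to a whole buffer that was read in one go, e.g. with
   a single fread of several kilobytes, instead of one fread per byte.
   Returns 1 at the first binary byte, 0 if none is found. */
int buffer_is_binary(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
    {
        if (is_binary_byte(buf[i]))
            return 1;
    }
    return 0;
}
```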
DanTheMan
Contributor


Joined: 15 Mar 2002
Posts: 56
Location: Sweden

Posted: Thu Mar 26, 2009 10:58 am

Here is an example in VDS code; I hope it works. The only problem is that if a text file contains special non-ASCII characters that the check rejects (byte 255, for example), it will be treated as binary...

Code:
%%File = c:\x.x
BINFILE OPEN,1,%%File,READ
if @not(@ok())
   Warn File not found !
   Exit
end
%%Break =
while @both(@not(@binfile(EOF,1)),@not(%%Break))
    %C = @binfile(read,1,BINARY,1)
    %C = @SUBSTR(%C,1,@pred(@len(%C)))
    # Check if binary
    if @equal(%C,255)@equal(11,%C)@greater(9,%C)@both(@greater(%C,13),@greater(32,%C))
       %%Break = 1
    end         
Wend
if %%Break
  Info File %%File is binary
else
  Info File %%File is Ascii
end
exit
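One way to make the accepted character set easy to adjust (for instance, to stop treating byte 255, which is 'ÿ' in Windows-1252, as binary) is a 256-entry lookup table. The sketch below is C rather than VDS, and `init_binary_table`, `table_is_binary` and the `allow_255` switch are hypothetical; by default the table encodes the same rule as the VDS code above (0 to 8, 11, 14 to 31, and 255).

```c
#include <stddef.h>
#include <string.h>

/* binary_byte[c] is 1 if byte value c should mark the data as binary. */
static unsigned char binary_byte[256];

/* Fills the table: control codes 0..31 except 9, 10, 12 and 13 are
   binary, and so is 255 unless allow_255 is nonzero. */
void init_binary_table(int allow_255)
{
    int c;
    memset(binary_byte, 0, sizeof binary_byte);
    for (c = 0; c < 32; c++)
        binary_byte[c] = 1;
    binary_byte[9] = binary_byte[10] = binary_byte[12] = binary_byte[13] = 0;
    binary_byte[255] = allow_255 ? 0 : 1;
}

/* Returns 1 at the first byte flagged in the table, 0 otherwise. */
int table_is_binary(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
    {
        if (binary_byte[buf[i]])
            return 1;
    }
    return 0;
}
```

Call `init_binary_table` once before scanning; each byte then costs a single array lookup, and changing the definition of "text" means editing one function instead of the comparison expression.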
GregLand
Valued Contributor


Joined: 15 Jun 2004
Posts: 212
Location: FRANCE

Posted: Thu Mar 26, 2009 9:14 pm

Thank you very much!
I will try your code!
Dr. Dread
Professional Member


Joined: 03 Aug 2001
Posts: 1065
Location: Copenhagen, Denmark

Posted: Fri Mar 27, 2009 5:54 am

Dan's code should be quite reliable. But I'm also pretty sure that with VDS this kind of looping
would be sloooow if the file isn't small.
With binary files it would be OK, because the loop will probably find a binary-flagged char quite
soon and exit the loop. But with ASCII files you will loop through the entire file and that means
more than a million loops just for a 1MB file.

So I would suggest that you just do the check for perhaps the first 1000 chars only. Count the loops
and exit if you go beyond that count.
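Applied to Dan's C version, this suggestion means reading at most a fixed number of bytes from the start of the file. A minimal sketch; `stream_is_binary` is a hypothetical name and the 4096-byte buffer size is an arbitrary choice. As noted above, a binary byte that appears only after the limit will be missed.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads at most `limit` bytes from the current position of `f` and
   applies Dan's byte rule to them. Returns 1 (binary) as soon as a
   suspicious byte is found, 0 if none appears within the limit. */
int stream_is_binary(FILE *f, size_t limit)
{
    unsigned char buf[4096];
    size_t total = 0;

    while (total < limit)
    {
        size_t want = limit - total;
        size_t got, i;

        if (want > sizeof buf)
            want = sizeof buf;
        got = fread(buf, 1, want, f);
        for (i = 0; i < got; i++)
        {
            unsigned char ch = buf[i];
            if (ch < 9 || ch == 11 || (ch > 13 && ch < 32) || ch == 255)
                return 1;
        }
        if (got < want)
            break; /* end of file (or read error) */
        total += got;
    }
    return 0;
}
```

For example, with the limit of 1000 characters suggested above: `FILE *f = fopen(path, "rb"); if (f != NULL) { int binary = stream_is_binary(f, 1000); fclose(f); }`.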

Greetz
Dr. Dread

_________________
~~ Alcohol and calculus don't mix... Don't drink and derive! ~~

String.DLL * advanced string processing
All times are GMT
Page 1 of 2

 