
Topic: UTF8UpperCase (Read 6712 times)

Ocye

  • Hero Member
  • Posts: 518
    • Scrabble3D
UTF8UpperCase
« on: December 03, 2012, 11:02:34 pm »
When I call UTF8UpperCase('ɑbɑri'); (as PChar: #201#145'b'#201#145'ri'), any further action leads to an external SIGSEGV. I guess some pointers are faulty; e.g. OutCounter at "Final correction of the buffer size" is 9.
If someone can confirm this, I'll post a bug report.

Code:
unit Unit1;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls,
  LCLProc;

type

  { TForm1 }

  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { private declarations }
  public
    { public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.lfm}

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
begin
  Caption:=UTF8UpperCase('ɑbɑri');  // 'ɑ' is U+0251, i.e. the bytes #$C9#$91
  Memo1.Lines.Add(Caption);
end;

end.

Lazarus 1.1 r39360M FPC 2.6.0 x86_64-linux-qt
Lazarus 1.7 (SVN) FPC 3.0.0
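The byte counts in this report can be checked with a small standalone Free Pascal program. This is only a sketch: it does not call UTF8UpperCase, and the helper name CodePointToUTF8 is hypothetical (not an LCL function). It assumes the uppercase of ɑ (U+0251) is Ɑ (U+2C6D), so 'ɑbɑri' (7 bytes) becomes 'ⱭBⱭRI' (9 bytes), the same value as the OutCounter of 9 seen at the final buffer-size correction.

```pascal
program Utf8Growth;

{$mode objfpc}{$H+}

{ Hypothetical helper (not part of the LCL): encodes one Unicode code point
  as UTF-8 using the standard 1- to 4-byte layout.
  Assumes cp is a valid code point (<= U+10FFFF). }
function CodePointToUTF8(cp: Cardinal): AnsiString;
begin
  if cp < $80 then
  begin
    SetLength(Result, 1);
    Result[1] := Char(Byte(cp));
  end
  else if cp < $800 then
  begin
    SetLength(Result, 2);
    Result[1] := Char(Byte($C0 or (cp shr 6)));
    Result[2] := Char(Byte($80 or (cp and $3F)));
  end
  else if cp < $10000 then
  begin
    SetLength(Result, 3);
    Result[1] := Char(Byte($E0 or (cp shr 12)));
    Result[2] := Char(Byte($80 or ((cp shr 6) and $3F)));
    Result[3] := Char(Byte($80 or (cp and $3F)));
  end
  else
  begin
    SetLength(Result, 4);
    Result[1] := Char(Byte($F0 or (cp shr 18)));
    Result[2] := Char(Byte($80 or ((cp shr 12) and $3F)));
    Result[3] := Char(Byte($80 or ((cp shr 6) and $3F)));
    Result[4] := Char(Byte($80 or (cp and $3F)));
  end;
end;

var
  Lower, Upper: AnsiString;
begin
  // 'ɑbɑri' = U+0251, 'b', U+0251, 'r', 'i'
  Lower := CodePointToUTF8($0251) + 'b' + CodePointToUTF8($0251) + 'ri';
  // 'ⱭBⱭRI' = U+2C6D, 'B', U+2C6D, 'R', 'I'
  Upper := CodePointToUTF8($2C6D) + 'B' + CodePointToUTF8($2C6D) + 'RI';
  WriteLn('lower: ', Length(Lower), ' bytes');  // lower: 7 bytes
  WriteLn('upper: ', Length(Upper), ' bytes');  // upper: 9 bytes
end.
```

The uppercase string is two bytes longer than its input, one extra byte per ɑ.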

Bart

  • Hero Member
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: UTF8UpperCase
« Reply #1 on: December 04, 2012, 12:09:06 am »
I can confirm.
It seems that this happens when a character has an uppercase equivalent that is longer in bytes:
#$C9#$91 -> #$E2#$B1#$AD, and apparently you need at least two of those:
UTF8UpperCase(#$C9#$91) will not crash, UTF8UpperCase(#$C9#$91#$C9#$91) will.

Bart
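When a case mapping can make a character longer in bytes, the converted string can no longer be assumed to fit in a buffer sized from the input. A minimal sketch of one way around that, assuming a toy mapping that knows only ASCII a..z and U+0251 -> U+2C6D: the result string grows as each character is converted. TinyUTF8Upper and BytesToStr are hypothetical names; this is neither the LCL's UTF8UpperCase nor the workaround posted for issue #23428. Byte values are passed as Byte arrays rather than string literals, so the program does not depend on the source file's codepage setting.

```pascal
program GrowingUpper;

{$mode objfpc}{$H+}

{ Hypothetical helper: builds a raw byte string from byte values, so the
  example does not rely on how the compiler encodes non-ASCII literals. }
function BytesToStr(const B: array of Byte): AnsiString;
var
  k: Integer;
begin
  Result := '';
  for k := Low(B) to High(B) do
    Result := Result + Char(B[k]);
end;

{ Hypothetical toy upper-casing routine. It only maps ASCII a..z and
  U+0251 (bytes $C9 $91) to U+2C6D (bytes $E2 $B1 $AD); all other bytes
  are copied unchanged. Result is appended to, so it grows whenever a
  replacement is longer than the original. }
function TinyUTF8Upper(const S: AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    if (Ord(S[i]) = $C9) and (i < Length(S)) and (Ord(S[i + 1]) = $91) then
    begin
      // U+0251 -> U+2C6D: 2 bytes in, 3 bytes out
      Result := Result + BytesToStr([$E2, $B1, $AD]);
      Inc(i, 2);
    end
    else
    begin
      // UpCase only changes 'a'..'z'; every other byte passes through
      Result := Result + UpCase(S[i]);
      Inc(i);
    end;
  end;
end;

var
  Src, Dst: AnsiString;
begin
  // 'ɑbɑri' as bytes: $C9 $91 'b' $C9 $91 'r' 'i'
  Src := BytesToStr([$C9, $91, Ord('b'), $C9, $91, Ord('r'), Ord('i')]);
  Dst := TinyUTF8Upper(Src);
  WriteLn(Length(Src), ' -> ', Length(Dst), ' bytes');  // 7 -> 9 bytes
end.
```

Appending to a managed AnsiString trades some speed for safety; a routine that works on a preallocated buffer would instead have to grow or re-check the buffer before writing each replacement.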

Bart

  • Hero Member
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: UTF8UpperCase
« Reply #2 on: December 04, 2012, 12:33:27 am »
Reported as issue #23428.

Bart

Ocye

  • Hero Member
  • *****
  • Posts: 518
    • Scrabble3D
Re: UTF8UpperCase
« Reply #3 on: December 04, 2012, 01:12:14 am »
Quote from Bart: Reported as issue #23428.
Thank you! But is the resulting character still UTF-8, with more than two bytes?
Anyway, at least no SIGSEGV should be thrown.

Bart

  • Hero Member
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: UTF8UpperCase
« Reply #4 on: December 04, 2012, 11:25:44 am »
Quote from Ocye: Thank you! But is the resulting character still UTF-8, with more than two bytes?

Yes, the uppercase of #201#145 (2 bytes) is #$E2#$B1#$AD (3 bytes).

Quote from Ocye: Anyway, at least no SIGSEGV should be thrown.

I don't get a SIGSEGV; it just crashes (the app vanishes) with the following message:

Marked memory at $0023DBE8 invalid
Wrong signature $7D64F58E instead of 4C796538

Also, this only happens when HeapTrace is enabled (option -gh).
(Tested on Win7-64, Lazarus 1.1/Fpc 2.6.2r1 (32-bit))

Bart

Bart

  • Hero Member
  • Posts: 5290
    • Bart en Mariska's Webstek
Re: UTF8UpperCase
« Reply #5 on: December 04, 2012, 12:23:28 pm »
I posted a workaround in the bugtracker.

Bart

Ocye

  • Hero Member
  • Posts: 518
    • Scrabble3D
Re: UTF8UpperCase
« Reply #6 on: December 06, 2012, 08:09:32 pm »
Unfortunately the bug tracker didn't send me an email, so I have to thank Felipe here. Great work!
And thanks to you, Bart, for the management :-[.

 
