Sorting the characters in a UTF-16 string in Java

tl;dr

Java's UTF-16 strings use two chars (a surrogate pair) for each character outside the Basic Multilingual Plane. Sorting the char[] with Arrays.sort splits those pairs and corrupts the text. Should I convert the char[] to an int[], or is there a better way?

Details

Java strings are UTF-16, but the Character class wraps a single 16-bit char. A supplementary character such as an emoji is therefore stored as an array of two chars (32 bits in total).

Sorting a String of such characters with the built-in sort corrupts the data.
(Arrays.sort uses a dual-pivot quicksort for primitive arrays, and Collections.sort relies on Arrays.sort to do the heavy lifting.)

To be specific: do you convert char[] to int[], or is there a better way to sort?



import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        // Three supplementary code points (emoji); each is stored as two chars.
        int[] utfCodes = {128513, 128531, 128557};
        String emojis = new String(utfCodes, 0, 3);
        System.out.println("Initial String: " + emojis);

        // Sorting individual chars separates the halves of each surrogate pair.
        char[] chars = emojis.toCharArray();
        Arrays.sort(chars);
        System.out.println("Sorted String: " + new String(chars));
    }
}


Output:



Initial String: 😁😓😭
Sorted String: ??😁??
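
To see why the sorted output is garbled, it can help to print the UTF-16 code units before and after the sort. The following is only an illustrative sketch; the class name SurrogateDump and the helper hexUnits are made up for this example and are not part of the question.

```java
import java.util.Arrays;

class SurrogateDump {
    // Formats each UTF-16 code unit of s as four hex digits, separated by spaces.
    static String hexUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128513, 128531, 128557}, 0, 3);
        // Each emoji is a high surrogate followed by a low surrogate.
        System.out.println(hexUnits(emojis));   // D83D DE01 D83D DE13 D83D DE2D

        char[] chars = emojis.toCharArray();
        Arrays.sort(chars);
        // After sorting, the three high surrogates come first, so only one valid pair remains.
        System.out.println(hexUnits(new String(chars)));   // D83D D83D D83D DE01 DE13 DE2D
    }
}
```

All three emoji share the high surrogate D83D, so the sort groups those together and leaves a single valid pair (D83D DE01) in the middle, which matches the ??😁?? output above.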






java string sorting utf-16






edited 2 hours ago by jtahlborn

asked 2 hours ago by dingy




This is what we call a "Collation". You should use a library for this because there are many collations to choose from.

– Guillaume F.
2 hours ago
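
If locale-aware ordering (collation) is what is wanted rather than plain numeric code point order, one possible sketch uses the JDK's own java.text.Collator. The class name CollatedSort, the method sortWithCollator, and the choice of Locale.ENGLISH are assumptions made for illustration; the comment does not say which library or collation it has in mind.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.stream.Collectors;

class CollatedSort {
    // Splits s into one-code-point strings, orders them with a locale-aware
    // Collator instead of raw code point values, and joins them back together.
    static String sortWithCollator(String s, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .sorted(collator)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // Locale.ENGLISH is an arbitrary choice for this sketch.
        System.out.println(sortWithCollator("cab", Locale.ENGLISH));
    }
}
```

Because each element handed to the Collator is a whole code point, surrogate pairs stay intact. How a given collation orders emoji relative to each other is up to its rules, so this sketch makes no claim about that.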

















3 Answers
I looked around for a bit and couldn't find a clean way to sort an array in groups of two elements without using a library.

Luckily, the code points of the String are exactly what you used to create it in this example, so you can simply sort those and create a new String from the result.

public static void main(String[] args) {
    int[] utfCodes = {128531, 128557, 128513};
    String emojis = new String(utfCodes, 0, 3);
    System.out.println("Initial String: " + emojis);

    // Sort whole code points rather than individual chars.
    int[] codePoints = emojis.codePoints().sorted().toArray();
    System.out.println("Sorted String: " + new String(codePoints, 0, 3));
}

Initial String: 😓😭😁

Sorted String: 😁😓😭

I switched the order of the characters in your example because they were already sorted.
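
The 3 passed to new String(codePoints, 0, 3) is a count of code points, not of chars. A small sketch of the difference follows, assuming a hypothetical helper sortCodePoints that uses the array length so it works for any input; the class and method names are invented for this example.

```java
class CodePointLength {
    // Rebuilds a string from its sorted code points, using the array length
    // rather than a hard-coded count.
    static String sortCodePoints(String s) {
        int[] codePoints = s.codePoints().sorted().toArray();
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println(emojis.length());                            // 6 UTF-16 code units
        System.out.println(emojis.codePointCount(0, emojis.length()));  // 3 code points
        System.out.println(sortCodePoints(emojis));
    }
}
```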







edited 2 hours ago

answered 2 hours ago by Jacob G.
We can't use char for Unicode, because Java's char handling of Unicode is broken: a char holds only one 16-bit UTF-16 code unit.

In the early days of Java, every Unicode code point fit in 16 bits (exactly one char). However, the Unicode specification changed to allow supplementary characters. That means Unicode characters are now variable-width and can be longer than one char. Unfortunately, it was too late to change Java's char implementation without breaking a ton of production code.

So the best way to manipulate Unicode characters is to use code points directly, e.g. String.codePointAt(index), or the String.codePoints() stream on JDK 1.8 and above.
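
As a rough illustration of the char versus code point distinction, the sketch below compares what charAt and codePointAt report for a single emoji. The class name CharVersusCodePoint and the helper describe are invented for this example, and describe assumes a non-empty string.

```java
class CharVersusCodePoint {
    // Summarises how many chars and how many code points s contains,
    // plus its first code point in U+XXXX notation. Assumes s is non-empty.
    static String describe(String s) {
        return "chars=" + s.length()
                + " codePoints=" + s.codePointCount(0, s.length())
                + " first=U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase();
    }

    public static void main(String[] args) {
        String grin = new String(Character.toChars(0x1F601));
        System.out.println(describe(grin));   // chars=2 codePoints=1 first=U+1F601

        // charAt sees only the first half of the surrogate pair.
        System.out.println(Character.isHighSurrogate(grin.charAt(0)));   // true
    }
}
```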







answered 1 hour ago by peekay
If you are using Java 8 or later, then this is a simple way to sort the characters in a string while respecting (not breaking) multi-char code points:

int[] codepoints = someString.codePoints().sorted().toArray();
String sorted = new String(codepoints, 0, codepoints.length);

Prior to Java 8, I think you either need to use a loop to iterate over the code points in the original string, or use a 3rd-party library method.

Fortunately, sorting the code points in a String is uncommon enough that the clunkiness and inefficiency of the solutions above are rarely a concern.

(When was the last time you tested for anagrams of emojis?)
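
For the pre-Java 8 case, the loop mentioned above might look something like the following. This is a sketch rather than a canonical implementation: the class name LegacyCodePointSort and the method sortCodePoints are hypothetical, and it relies only on methods available since Java 5 (codePointAt, codePointCount, Character.charCount, and the String(int[], int, int) constructor).

```java
import java.util.Arrays;

class LegacyCodePointSort {
    // Copies the code points of s into an int[] with an explicit loop,
    // sorts them numerically, and rebuilds the string.
    static String sortCodePoints(String s) {
        int count = s.codePointCount(0, s.length());
        int[] codePoints = new int[count];
        for (int i = 0, j = 0; i < s.length(); j++) {
            int cp = s.codePointAt(i);
            codePoints[j] = cp;
            // Advance by 1 char for BMP characters, 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        Arrays.sort(codePoints);
        return new String(codePoints, 0, count);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println(sortCodePoints(emojis));
    }
}
```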







edited 31 mins ago

answered 1 hour ago by Stephen C